regression analysis

Errors of the Estimates in a Straight-Line Fit - 02

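To make the estimation for a straight-line fit concrete, take the following data:

x	y
1	3
2	5
3	8
4	9
5	10

We fit these points with the line

\( \Large y = ax+b \)

so the quantities we need are the number of data points \( n \), the number of fitted parameters \( p \), the means, and the sums of squares: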

\( \Large n = 5 \)

\( \Large p = 2 \)

\( \Large \overline{x} = \displaystyle \frac{1+2+3+4+5}{5} = 3 \)

\( \Large \overline{y} = \displaystyle \frac{3+5+8+9+10}{5} = 7 \)

\( \Large S_{xx} = \displaystyle \sum_{i=1}^n (x_i - \bar{x})^2 \)

\( \Large = \displaystyle (1-3)^2 +(2-3)^2+(3-3)^2+(4-3)^2+(5-3)^2 \)

\( \Large = 4+1+0+1+4= 10 \)

\( \Large S_{xy} = \displaystyle \sum_{i=1}^n (x_i - \bar{x}) (y_i - \bar{y}) \)

\( \Large = \displaystyle (1-3)(3-7) +(2-3)(5-7)+(3-3)(8-7)+(4-3)(9-7)+(5-3)(10-7) \)

\( \Large = 8+2+0+2+6= 18 \)

\( \Large a = \displaystyle \frac{ S_{xy} }{ S_{xx}} = \frac{18}{10} = \color{red}{1.8} \)

\( \Large b = \displaystyle \overline{y} - \frac{ S_{xy} }{ S_{xx}} \overline{x} = 7 - 1.8 \times 3 = \color{blue}{1.6} \)
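The arithmetic above can be checked numerically. The following is a minimal sketch using NumPy; the variable names (`s_xx`, `s_xy`, and so on) are chosen here for illustration and only mirror the symbols in the formulas.

```python
import numpy as np

# Data from the table above
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([3, 5, 8, 9, 10], dtype=float)

x_bar = x.mean()                          # mean of x
y_bar = y.mean()                          # mean of y
s_xx = np.sum((x - x_bar) ** 2)           # S_xx
s_xy = np.sum((x - x_bar) * (y - y_bar))  # S_xy

a = s_xy / s_xx        # slope
b = y_bar - a * x_bar  # intercept

print(f"S_xx = {s_xx:.1f}, S_xy = {s_xy:.1f}")
print(f"a = {a:.1f}, b = {b:.1f}")
```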

With these coefficients, the fitted values \( \hat{y} = 1.8x + 1.6 \) and the residuals \( y - \hat{y} \) are:

x	y	\(\Large \hat{y} \)	\(\Large y-\hat{y} \)
1	3	3.4	-0.4
2	5	5.2	-0.2
3	8	7	1
4	9	8.8	0.2
5	10	10.6	-0.6

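From these residuals, the sum of squared deviations of y from its fitted values is as follows. Note that \( \sigma_y^2 \) here denotes the raw residual sum of squares; the division by the degrees of freedom \( n - p \) is applied later, in the standard errors.

\( \Large \displaystyle \sigma_y^2 = \sum_{i=1}^n ( y_i - \hat{y}_i)^2 \)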

\( \Large \displaystyle = (3-3.4)^2 + (5-5.2)^2 + (8-7)^2 + (9-8.8)^2 + (10-10.6)^2 \)

\( \Large \displaystyle = 0.16 + 0.04 + 1 + 0.04 + 0.36 = 1.6 \)
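As a quick check of the residual table and this sum, here is a small sketch. It takes the coefficients 1.8 and 1.6 from the hand calculation as given; `y_hat`, `residuals`, and `rss` are illustrative names, not part of the original derivation.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([3, 5, 8, 9, 10], dtype=float)

a, b = 1.8, 1.6              # slope and intercept from the hand calculation

y_hat = a * x + b            # fitted values
residuals = y - y_hat        # y minus fitted value, one per data point
rss = np.sum(residuals ** 2) # sigma_y^2 in the text (residual sum of squares)

for xi, yi, fi, ri in zip(x, y, y_hat, residuals):
    print(f"{xi:.0f}\t{yi:.0f}\t{fi:.1f}\t{ri:+.1f}")
print(f"residual sum of squares = {rss:.2f}")
```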

The variance terms and the standard errors of the estimates are then:

\( \Large \displaystyle \sigma_a^2 = \displaystyle \frac{ \sigma_y^2}{S_{xx}} = \frac{1.6}{10} = 0.16 \)

\( \Large \displaystyle SE_a = \sqrt{ \frac{0.16}{5-2}} = \color{red}{0.23094} \)

\( \Large \displaystyle \sigma_b^2 = \sigma_y^2 \left\{ \frac{1}{n}+ \frac{\left( \bar{x} \right)^2 }{S_{xx}} \right\} = 1.6 \times \left( \frac{1}{5} + \frac{3^2}{10} \right) = 1.76 \)

\( \Large \displaystyle SE_b =\sqrt{ \frac{1.76}{5-2}} = \color{blue}{0.765942}\)

These are the standard errors of the slope \( a \) (red) and of the intercept \( b \) (blue).
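The whole chain from the data to the two standard errors can be collected into one function. This is only a sketch under the formulas used in this article; the function name `line_fit_standard_errors` is hypothetical and introduced here for illustration.

```python
import numpy as np

def line_fit_standard_errors(x, y, p=2):
    """Slope, intercept and their standard errors for y = a*x + b.

    p is the number of fitted parameters (2 for a straight line).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    s_xx = np.sum((x - x_bar) ** 2)
    s_xy = np.sum((x - x_bar) * (y - y_bar))
    a = s_xy / s_xx
    b = y_bar - a * x_bar
    rss = np.sum((y - (a * x + b)) ** 2)         # sigma_y^2 in the text
    var_a = rss / s_xx                           # sigma_a^2 in the text
    var_b = rss * (1.0 / n + x_bar ** 2 / s_xx)  # sigma_b^2 in the text
    se_a = np.sqrt(var_a / (n - p))              # divide by degrees of freedom
    se_b = np.sqrt(var_b / (n - p))
    return a, b, se_a, se_b

a, b, se_a, se_b = line_fit_standard_errors([1, 2, 3, 4, 5], [3, 5, 8, 9, 10])
print(f"a = {a:.4f} +/- {se_a:.5f}")
print(f"b = {b:.4f} +/- {se_b:.5f}")
```

For the data in this article the printed standard errors should match the values 0.23094 and 0.765942 obtained by hand.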

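Standard errors in Excel

Excel can also compute the standard errors for a linear fit. With x in B2:B6 and y in C2:C6, entering the LINEST function as follows returns the array shown after it: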

=LINEST(C2:C6,B2:B6,TRUE,TRUE)

a	b
1.8	1.6
0.230940108	0.765941686
0.952941176	0.730296743
60.75	3
32.4	1.6

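The cells of this array are laid out as follows:

Row / Column	Column 1	Column 2
Row 1	Slope (a, called m in Excel)	Intercept (b)
Row 2	Standard error of the slope	Standard error of the intercept
Row 3	R² (coefficient of determination)	Standard error of y
Row 4	F statistic	Degrees of freedom
Row 5	Regression sum of squares	Residual sum of squares

The second row agrees with the hand-calculated \( SE_a \) and \( SE_b \), and the residual sum of squares 1.6 agrees with \( \sigma_y^2 \).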
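The remaining rows of the LINEST output can be reproduced from the same data. The sketch below uses the standard definitions of these statistics for a straight-line fit; the variable names are illustrative, and `np.polyfit` is used only as a convenient way to get the slope and intercept.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([3, 5, 8, 9, 10], dtype=float)
n, p = len(x), 2

a, b = np.polyfit(x, y, 1)  # degree-1 fit returns [slope, intercept]
y_hat = a * x + b

ss_res = np.sum((y - y_hat) ** 2)         # residual sum of squares (row 5, column 2)
ss_reg = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares (row 5, column 1)
dof = n - p                               # degrees of freedom (row 4, column 2)
r2 = ss_reg / (ss_reg + ss_res)           # R^2 (row 3, column 1)
se_y = np.sqrt(ss_res / dof)              # standard error of y (row 3, column 2)
f_stat = (ss_reg / (p - 1)) / (ss_res / dof)  # F statistic (row 4, column 1)

print(f"R^2 = {r2:.9f}, SE_y = {se_y:.9f}")
print(f"F = {f_stat:.2f}, dof = {dof}")
print(f"SS_reg = {ss_reg:.1f}, SS_res = {ss_res:.1f}")
```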

Standard errors in R

In R, the same fit can be done both with nls and with lm:

....................................

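# Data
xx <- c(1,2,3,4,5)
yy <- c(3,5,8,9,10)

plot(xx,yy)

# Nonlinear least squares with the same straight-line model (a0: slope, a1: intercept)
fm<-nls(yy~(a0*xx+a1),start=c(a0=1,a1=5),trace=TRUE)
summary(fm)

# Ordinary linear regression, with the fitted line added to the plot
m <- lm(yy~xx)
abline(m)
summary(m)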

....................................

Running this, summary(m) prints:

Coefficients:

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	1.6000	0.7659	2.089	0.1279
xx	1.8000	0.2309	7.794	0.0044 **

The standard errors again agree with the hand calculation.
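The t value and Pr(>|t|) columns follow directly from the estimates and their standard errors: the t value is the estimate divided by its standard error, and the p-value is the two-sided tail probability of a t distribution with \( n - p = 3 \) degrees of freedom. A minimal sketch with `scipy.stats` follows; the dictionary layout is just an illustrative choice.

```python
from scipy import stats

n, p = 5, 2
dof = n - p  # degrees of freedom

# (estimate, standard error) taken from the results above
estimates = {
    "(Intercept)": (1.6, 0.765941686),
    "xx": (1.8, 0.230940108),
}

results = {}
for name, (est, se) in estimates.items():
    t_value = est / se                            # t statistic
    p_value = 2 * stats.t.sf(abs(t_value), dof)   # two-sided p-value
    results[name] = (t_value, p_value)
    print(f"{name}\t{est:.4f}\t{se:.4f}\t{t_value:.3f}\t{p_value:.4f}")
```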

Standard errors in Python

In Python, the same fit can be done with scipy.optimize.curve_fit:

....................................

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Data
xx = np.array([1,2,3,4,5])
yy = np.array([3,5,8,9,10])

# Model function (a0: intercept, a1: slope)
def model(x, a0, a1):
    return a0 + a1 * x

# Initial guess
p0 = [1, 1]

# Fit
params, cov = curve_fit(model, xx, yy, p0=p0)

# Estimated parameters
a0, a1 = params

# Standard errors (= square roots of the diagonal of the covariance matrix)
stderr = np.sqrt(np.diag(cov))
se_a0, se_a1 = stderr

print("Estimated parameters:")
print(f"a0 = {a0:.5f} ± {se_a0:.5f}")
print(f"a1 = {a1:.5f} ± {se_a1:.5f}")

# x axis for drawing the fitted line
xx_new = np.linspace(min(xx), max(xx), 200)
yy_fit = model(xx_new, *params)

# Plot
plt.scatter(xx, yy, label="Data")
plt.plot(xx_new, yy_fit, label="Fit", linewidth=2)
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()

....................................

Running this prints:

Estimated parameters:

a0 = 1.60000 ± 0.76594

a1 = 1.80000 ± 0.23094

which agrees with the hand calculation, Excel, and R. Note that in this model a0 is the intercept and a1 is the slope, the reverse of the naming in the nls call in R.
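The agreement is expected: when no `sigma` is given and `absolute_sigma` is left at its default of `False`, `curve_fit` scales the covariance matrix by the residual variance, which for a linear model gives \( s^2 (X^T X)^{-1} \) with \( s^2 \) equal to the residual sum of squares divided by \( n - p \). The sketch below builds that matrix directly with NumPy; the design-matrix formulation and the variable names are an illustration added here, not part of the original derivation.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([3, 5, 8, 9, 10], dtype=float)

# Design matrix: first column for the intercept, second for the slope
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

# Least-squares solution [intercept, slope]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ beta
s2 = resid @ resid / (n - p)       # residual variance: RSS / (n - p)
cov = s2 * np.linalg.inv(X.T @ X)  # covariance matrix of the estimates
se = np.sqrt(np.diag(cov))         # [SE of intercept, SE of slope]

print(f"intercept = {beta[0]:.5f} ± {se[0]:.5f}")
print(f"slope     = {beta[1]:.5f} ± {se[1]:.5f}")
```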