Mathematical representation of the model
\[
\begin{aligned}
Y &= \text{Model} + \text{Error} \\[8pt]
&= f(X) + \epsilon \\[8pt]
&= E(Y|X) + \epsilon \\[8pt]
&= \beta_0 + \beta_1 X + \epsilon
\end{aligned}
\]
where the errors are independent and normally distributed:
- independent: knowing the error term for one observation tells you nothing about the error term for any other observation
- normally distributed: \(\epsilon \sim N(0, \sigma_\epsilon^2)\), i.e., centered at zero with constant variance \(\sigma_\epsilon^2\)
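As a quick sketch of what this model says, we can simulate data from it and check that least squares recovers the parameters. The parameter values below are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical parameter values, for illustration only
beta_0, beta_1, sigma_eps = 2.0, 0.5, 1.0
rng = np.random.default_rng(42)

n = 1000
X = rng.uniform(0, 10, size=n)
eps = rng.normal(0, sigma_eps, size=n)   # independent, normal errors
Y = beta_0 + beta_1 * X + eps            # Y = f(X) + error

# Least-squares fit recovers beta_0 and beta_1 up to sampling error
b1, b0 = np.polyfit(X, Y, 1)
```

With `n = 1000` observations, the fitted intercept and slope land close to the true values of 2.0 and 0.5.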
Mathematical representation, visualized
\[
Y|X \sim N(\beta_0 + \beta_1 X, \sigma_\epsilon^2)
\]
- Mean: \(\beta_0 + \beta_1 X\), the predicted value based on the regression model
- Variance: \(\sigma_\epsilon^2\), constant across the range of \(X\)
- How do we estimate \(\sigma_\epsilon^2\)?
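The conditional-distribution statement \(Y|X \sim N(\beta_0 + \beta_1 X, \sigma_\epsilon^2)\) can be checked with a small simulation: hold \(X\) fixed and look at the distribution of \(Y\). The parameter values are again hypothetical:

```python
import numpy as np

# Hypothetical parameters, for illustration only
beta_0, beta_1, sigma_eps = 2.0, 0.5, 1.0
rng = np.random.default_rng(0)

# Hold X fixed at a single value and simulate many Y's
x0 = 4.0
y = beta_0 + beta_1 * x0 + rng.normal(0, sigma_eps, size=100_000)

# The simulated mean approximates beta_0 + beta_1 * x0,
# and the simulated SD approximates sigma_eps
mean_y, sd_y = y.mean(), y.std()
```

The sample mean of the simulated `y` values is close to \(\beta_0 + \beta_1 x_0 = 4.0\) and the sample SD is close to \(\sigma_\epsilon = 1\), matching the formula above.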
Regression standard error
Once we fit the model, we can use the residuals to estimate the regression standard error \(\hat{\sigma}_\epsilon\), the average distance between the observed values and the regression line:
\[
\hat{\sigma}_\epsilon = \sqrt{\frac{\sum\limits_{i=1}^n (y_i - \hat{y}_i)^2}{n-2}} = \sqrt{\frac{\sum\limits_{i=1}^n e_i^2}{n-2}}
\]
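A minimal sketch of this computation, using made-up data for illustration: fit the line, form the residuals \(e_i = y_i - \hat{y}_i\), and divide the residual sum of squares by \(n - 2\) before taking the square root.

```python
import numpy as np

# Toy data for illustration; any (x, y) pairs would do
x = np.array([1., 2., 3., 4., 5., 6.])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

# Fit the least-squares line and form residuals e_i = y_i - yhat_i
b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)

# Regression standard error: RSS divided by n - 2, then square root
n = len(x)
sigma_hat = np.sqrt(np.sum(resid**2) / (n - 2))
```

The divisor is \(n - 2\) rather than \(n\) because two parameters (\(\hat{\beta}_0\) and \(\hat{\beta}_1\)) were estimated from the data.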
Standard error of \(\hat{\beta}_1\)
The standard error of \(\hat{\beta}_1\) quantifies the sampling variability of the estimated slope, i.e., how much \(\hat{\beta}_1\) would vary from sample to sample:
\[
SE_{\hat{\beta}_1} = \hat{\sigma}_\epsilon\sqrt{\frac{1}{(n-1)s_X^2}}
\]
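Continuing the toy data from above, a sketch of this formula. Note that \((n-1)s_X^2 = \sum_i (x_i - \bar{x})^2\), so the expression agrees with the familiar form \(\hat{\sigma}_\epsilon / \sqrt{\sum_i (x_i - \bar{x})^2}\):

```python
import numpy as np

# Same toy data as before, for illustration only
x = np.array([1., 2., 3., 4., 5., 6.])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)
n = len(x)
sigma_hat = np.sqrt(np.sum(resid**2) / (n - 2))

# SE of the slope: (n - 1) * s_X^2 equals sum((x_i - xbar)^2)
s_x2 = np.var(x, ddof=1)                      # sample variance of X
se_b1 = sigma_hat * np.sqrt(1 / ((n - 1) * s_x2))

# Equivalent form, written directly in terms of the deviations
se_b1_alt = sigma_hat / np.sqrt(np.sum((x - x.mean())**2))
```

The two expressions are algebraically identical; the version with \(s_X^2\) makes it clear that the slope is estimated more precisely when \(X\) is more spread out.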
| Term        | Estimate  | Std. Error | t-statistic | p-value |
|-------------|-----------|------------|-------------|---------|
| (Intercept) | 116652.33 | 53302.46   | 2.19        | 0.03    |
| area        | 159.48    | 18.17      | 8.78        | 0.00    |
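The test statistics in this output are just each estimate divided by its standard error, which we can verify from the printed values (agreement is to two decimal places, since the table is rounded):

```python
# t-statistic = estimate / standard error, using the table's values
t_intercept = 116652.33 / 53302.46
t_area = 159.48 / 18.17
```

Rounding each ratio to two decimals reproduces the 2.19 and 8.78 shown in the table.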