Linear Regression in Matrix Notation
Notation

Let $y$ be the $n \times 1$ vector of observed responses, $X$ the $n \times (p+1)$ design matrix whose first column is all ones, $\beta$ the $(p+1) \times 1$ vector of coefficients, and $\varepsilon$ the $n \times 1$ vector of errors. The model is

$$y = X\beta + \varepsilon.$$

The least squares estimators are

$$\hat{\beta} = (X^T X)^{-1} X^T y.$$

Define the vectors of fitted values and residuals as

$$\hat{y} = X\hat{\beta}, \qquad e = y - \hat{y},$$

and it follows that

$$X^T e = 0.$$
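As a concrete sketch, these formulas can be computed directly; the following minimal numpy example uses simulated data, with the design, true coefficients, and noise scale assumed purely for illustration.

```python
import numpy as np

# Minimal sketch of the least squares formulas on simulated data; the
# design, true coefficients, and noise scale are illustrative assumptions.
rng = np.random.default_rng(0)
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept + p predictors
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(scale=0.5, size=n)

# beta_hat = (X^T X)^{-1} X^T y, via the normal equations (solving is
# numerically preferable to forming the inverse explicitly).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta_hat            # fitted values
e = y - y_hat                   # residuals
print(np.allclose(X.T @ e, 0))  # True: residuals are orthogonal to the columns of X
```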
The Hat Matrix
Note that

$$\hat{y} = X\hat{\beta} = X(X^T X)^{-1} X^T y = Hy,$$

where

$$H = X(X^T X)^{-1} X^T$$

is called the hat matrix, because it puts a "hat" on $y$.
The diagonal entries $h_{ii}$ of $H$ are referred to as the leverages.
Mathematically, $H$ is a projection matrix: it "projects" the vector of observed responses $y$ onto the space of all vectors that are linear combinations of the columns of $X$ (the column space of $X$). Note also that $H$ is symmetric and idempotent, meaning that $H^T = H$ and $H^2 = H$.
$I - H$ projects the vector $y$ onto the null space of $X^T$ (the orthogonal complement of the column space of $X$), because

$$(I - H)y = y - \hat{y} = e \qquad \text{and} \qquad X^T e = 0.$$

Note that $I - H$ is also symmetric and idempotent.
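These projection properties are easy to verify numerically; here is a short sketch under the same kind of simulated setup (all data-generating values are illustrative assumptions).

```python
import numpy as np

# Sketch: numerical checks of the hat matrix properties (the simulated
# data-generating values are illustrative assumptions).
rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)   # H = X (X^T X)^{-1} X^T
y_hat = H @ y                           # H puts a "hat" on y
e = y - y_hat

print(np.allclose(H, H.T))              # H is symmetric
print(np.allclose(H @ H, H))            # H is idempotent
leverages = np.diag(H)                  # the leverages h_ii
print(np.allclose((np.eye(n) - H) @ y, e))  # (I - H) y gives the residuals
```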
Additional Results
The variance of $\hat{\beta}$ is

$$\operatorname{Var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1}.$$
If $\varepsilon$ is assumed normal with mean zero and variance $\sigma^2 I$, the following hold:
- $\hat{\beta}$ is the maximum likelihood estimator of $\beta$
- $y$ is multivariate normal with mean $X\beta$ and covariance $\sigma^2 I$
- $\hat{\beta}$ is multivariate normal with mean $\beta$ and covariance $\sigma^2 (X^T X)^{-1}$
- $\hat{y}$ is multivariate normal with mean $X\beta$ and covariance $\sigma^2 H$
- $e$ is multivariate normal with mean zero and covariance $\sigma^2 (I - H)$
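As a sanity check on the distribution of $\hat{\beta}$, here is a small simulation sketch comparing its empirical covariance with $\sigma^2 (X^T X)^{-1}$; the design, $\sigma$, and replication count are illustrative assumptions.

```python
import numpy as np

# Simulation sketch: the empirical covariance of beta_hat over repeated
# normal errors should match sigma^2 (X^T X)^{-1}. The design, sigma, and
# replication count are illustrative assumptions.
rng = np.random.default_rng(1)
n, sigma = 100, 0.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -0.5])
XtX_inv = np.linalg.inv(X.T @ X)

draws = []
for _ in range(5000):
    y = X @ beta + rng.normal(scale=sigma, size=n)  # fresh normal errors
    draws.append(XtX_inv @ X.T @ y)                 # beta_hat for this sample

print(np.round(np.cov(np.array(draws).T), 4))       # empirical covariance
print(np.round(sigma**2 * XtX_inv, 4))              # theoretical covariance
```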
Degrees of Freedom
Suppose the design matrix $X$ has $p + 1$ columns ($p$ independent variables and one intercept); then there are $n - p - 1$ degrees of freedom in the residuals. Therefore:
- The unbiased estimator of $\sigma^2$ is

  $$\hat{\sigma}^2 = \frac{e^T e}{n - p - 1}.$$

- When the errors are i.i.d. normal, the statistic

  $$\frac{\hat{\beta}_j - \beta_j}{\widehat{\operatorname{se}}(\hat{\beta}_j)}, \qquad \widehat{\operatorname{se}}(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 \left[(X^T X)^{-1}\right]_{jj}},$$

  has the t-distribution with $n - p - 1$ degrees of freedom. Hence, a $(1 - \alpha)$ confidence interval for $\beta_j$ is formed as

  $$\hat{\beta}_j \pm t_{\alpha/2,\, n-p-1}\, \widehat{\operatorname{se}}(\hat{\beta}_j),$$

  and hypothesis tests concerning $\beta_j$ should compare the test statistic $\hat{\beta}_j / \widehat{\operatorname{se}}(\hat{\beta}_j)$ (the "t value") with the t-distribution with $n - p - 1$ degrees of freedom.

- For large $n$, we can appeal to the central limit theorem and construct an approximate confidence interval for $\beta_j$ using

  $$\hat{\beta}_j \pm z_{\alpha/2}\, \widehat{\operatorname{se}}(\hat{\beta}_j).$$
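Putting these formulas together, here is a minimal sketch; the simulated data are again assumed for illustration, and the t quantile comes from scipy.

```python
import numpy as np
from scipy import stats

# Minimal sketch: residual df, sigma^2 estimate, standard errors, and
# t confidence intervals. All data-generating values are illustrative.
rng = np.random.default_rng(0)
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat

df = n - p - 1                               # residual degrees of freedom
sigma2_hat = e @ e / df                      # unbiased estimator of sigma^2
se = np.sqrt(sigma2_hat * np.diag(XtX_inv))  # se(beta_hat_j)

t_crit = stats.t.ppf(0.975, df)              # for a 95% interval
for j in range(p + 1):
    lo, hi = beta_hat[j] - t_crit * se[j], beta_hat[j] + t_crit * se[j]
    print(f"beta_{j}: {beta_hat[j]:.3f}, 95% CI ({lo:.3f}, {hi:.3f}), "
          f"t value = {beta_hat[j] / se[j]:.2f}")
```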
Robust Regression
In the least squares approach to regression, we seek $\beta$ that minimizes the residual sum of squares

$$\sum_{i=1}^{n} (y_i - x_i^T \beta)^2,$$

which is equivalent to minimizing

$$\sum_{i=1}^{n} \rho(y_i - x_i^T \beta),$$

where $\rho(x) = x^2$.

The general concern with $\rho(x) = x^2$ is that it may place too much weight on extreme observations, because $x^2$ increases much more quickly than $|x|$. This choice may be optimal when the errors are normal, but it is not very robust to deviations from this assumption.
There are many possible choices for $\rho$; for example, the Huber loss function,

$$\rho_c(x) = \begin{cases} x^2 & \text{if } |x| \le c, \\ 2c|x| - c^2 & \text{if } |x| > c, \end{cases}$$

where $c$ is set by the user. For technical reasons (it gives roughly 95% efficiency when the errors really are normal), the default choice is $c = 1.345$.
- If $c$ is chosen large, the Huber loss function gets closer to the least squares result, since then $\rho_c(x) = x^2$ for most residuals.
- If $c$ is chosen closer to 0, the result is similar to minimizing $\rho(x) = |x|$ (i.e. L1 regression).
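A standard way to minimize the Huber criterion is iteratively reweighted least squares (IRLS): observations with $|e_i| \le c$ get weight 1 and the rest get weight $c/|e_i|$. The sketch below is a minimal version with illustrative data, a fixed iteration count, and no rescaling of residuals by a robust estimate of $\sigma$ (which a full implementation would include).

```python
import numpy as np

# Minimal sketch of Huber regression via iteratively reweighted least
# squares (IRLS). The data, fixed iteration count, and use of unscaled
# residuals in the weights are illustrative simplifications.
def huber_irls(X, y, c=1.345, n_iter=50):
    beta = np.linalg.solve(X.T @ X, X.T @ y)   # start from least squares
    for _ in range(n_iter):
        r = y - X @ beta
        # Huber weights: 1 inside [-c, c], c/|r| outside, so extreme
        # observations are downweighted rather than squared.
        w = np.minimum(1.0, c / np.maximum(np.abs(r), 1e-12))
        XtW = X.T * w                          # reweight each observation
        beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)
y[:5] += 10                                    # contaminate with extreme observations

print(np.linalg.solve(X.T @ X, X.T @ y))       # least squares: pulled toward outliers
print(huber_irls(X, y))                        # Huber: closer to the truth (1, 2)
```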
Example
It can be observed that in both instances above, the Huber loss function effectively downweights the extreme observations and, as a result, yields a closer approximation to the truth.

Robust regression (using a loss function other than $\rho(x) = x^2$) is useful when it is suspected that the error distribution is not normal and there is concern over extreme observations.