📊 FDSI-7 Multiple Regression in Matrix Form

Linear Regression in Matrix Notation

Notations
The model is
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},$$
where $\mathbf{y}$ is the $n \times 1$ vector of responses, $\mathbf{X}$ is the $n \times (p+1)$ design matrix, $\boldsymbol{\beta}$ is the vector of coefficients, and $\boldsymbol{\varepsilon}$ is the vector of errors.
The least squares estimator is
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}.$$
Define the vectors of fitted values and residuals as
$$\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}, \qquad \mathbf{e} = \mathbf{y} - \hat{\mathbf{y}},$$
and it follows that
$$\hat{\mathbf{y}} = \mathbf{X}(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}.$$
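To make the notation concrete, here is a minimal NumPy sketch of these formulas; the data, dimensions, and coefficient values below are simulated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: n = 100 observations, p = 2 predictors plus an intercept.
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix
beta = np.array([1.0, 2.0, -0.5])                           # true coefficients
y = X @ beta + rng.normal(scale=0.3, size=n)                # y = X beta + eps

# Least squares estimator: beta_hat = (X'X)^{-1} X'y.
# (np.linalg.solve is preferred over forming the inverse explicitly.)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta_hat      # fitted values
e = y - y_hat             # residuals

print(beta_hat)
```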

The Hat Matrix

Note that
$$\hat{\mathbf{y}} = \mathbf{H}\mathbf{y},$$
where
$$\mathbf{H} = \mathbf{X}(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top$$
is called the hat matrix, because it puts a "hat" on $\mathbf{y}$.
The diagonal entries $h_{ii}$ of $\mathbf{H}$ are referred to as the leverages.
Mathematically, $\mathbf{H}$ is a projection matrix, as it projects the vector of observed responses $\mathbf{y}$ onto the space of all vectors that are linear combinations of the columns of $\mathbf{X}$. Note also that $\mathbf{H}$ is symmetric and idempotent, meaning that $\mathbf{H}\mathbf{H} = \mathbf{H}$.
$\mathbf{I} - \mathbf{H}$ projects the vector $\mathbf{y}$ onto the null space of $\mathbf{X}^\top$ (the space orthogonal to the columns of $\mathbf{X}$), because
$$(\mathbf{I} - \mathbf{H})\mathbf{y} = \mathbf{y} - \hat{\mathbf{y}} = \mathbf{e}.$$
Note that $\mathbf{I} - \mathbf{H}$ is also symmetric and idempotent.
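Continuing the sketch above, these properties can be checked numerically (forming the $n \times n$ matrix explicitly is fine at this scale, though not advisable for large $n$):

```python
# Hat matrix H = X (X'X)^{-1} X'.
H = X @ np.linalg.solve(X.T @ X, X.T)

leverages = np.diag(H)             # diagonal entries h_ii
assert np.allclose(H, H.T)         # H is symmetric
assert np.allclose(H @ H, H)       # H is idempotent: HH = H
assert np.allclose(H @ y, y_hat)   # H puts a "hat" on y

M = np.eye(n) - H                  # I - H
assert np.allclose(M @ y, e)       # (I - H) y gives the residuals
assert np.allclose(M, M.T) and np.allclose(M @ M, M)  # also symmetric, idempotent
```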

Additional Results

The variance of $\hat{\boldsymbol{\beta}}$ is
$$\operatorname{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{X}^\top \mathbf{X})^{-1}.$$
If $\boldsymbol{\varepsilon}$ is assumed normal with mean zero and variance $\sigma^2 \mathbf{I}$, the following hold:
  • $\hat{\boldsymbol{\beta}}$ is the maximum likelihood estimator
  • $\mathbf{y}$ is multivariate normal with mean $\mathbf{X}\boldsymbol{\beta}$ and covariance $\sigma^2 \mathbf{I}$
  • $\hat{\boldsymbol{\beta}}$ is multivariate normal with mean $\boldsymbol{\beta}$ and covariance $\sigma^2 (\mathbf{X}^\top \mathbf{X})^{-1}$
  • $\hat{\mathbf{y}}$ is multivariate normal with mean $\mathbf{X}\boldsymbol{\beta}$ and covariance $\sigma^2 \mathbf{H}$
  • $\mathbf{e}$ is multivariate normal with mean zero and covariance $\sigma^2 (\mathbf{I} - \mathbf{H})$ (see the simulation sketch below)
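As a sanity check on these distributional results, here is a small Monte Carlo sketch, continuing the simulated example above (the error standard deviation $\sigma = 0.3$ matches the simulation): over repeated draws of the errors, the empirical covariance of $\hat{\boldsymbol{\beta}}$ should match $\sigma^2 (\mathbf{X}^\top \mathbf{X})^{-1}$.

```python
# Refit beta_hat across many fresh error draws and compare its empirical
# covariance with the theoretical covariance sigma^2 (X'X)^{-1}.
sigma = 0.3
draws = np.array([
    np.linalg.solve(X.T @ X, X.T @ (X @ beta + rng.normal(scale=sigma, size=n)))
    for _ in range(20_000)
])
print(np.cov(draws.T))                    # empirical covariance of beta_hat
print(sigma**2 * np.linalg.inv(X.T @ X))  # theoretical covariance
```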

Degrees of Freedom

Suppose the design matrix $\mathbf{X}$ has $p+1$ columns ($p$ independent variables and one intercept); then there are $n - p - 1$ degrees of freedom in the residuals. Therefore
  • The unbiased estimator of $\sigma^2$ is
    $$\hat{\sigma}^2 = \frac{1}{n - p - 1} \sum_{i=1}^{n} e_i^2$$
  • When the errors are i.i.d. normal, the statistic
    $$\frac{\hat{\beta}_j - \beta_j}{\widehat{\operatorname{se}}(\hat{\beta}_j)},
    \qquad \text{where } \widehat{\operatorname{se}}(\hat{\beta}_j) = \hat{\sigma} \sqrt{[(\mathbf{X}^\top \mathbf{X})^{-1}]_{jj}},$$
    has the t-distribution with $n - p - 1$ degrees of freedom. Hence, a $1 - \alpha$ confidence interval for $\beta_j$ is formed as
    $$\hat{\beta}_j \pm t_{n-p-1,\,\alpha/2} \, \widehat{\operatorname{se}}(\hat{\beta}_j),$$
    and hypothesis tests concerning $\beta_j$ should compare the test statistic $\hat{\beta}_j / \widehat{\operatorname{se}}(\hat{\beta}_j)$ (the "t value") with the t-distribution with $n - p - 1$ degrees of freedom.
  • For large $n$, we can appeal to the central limit theorem and construct an approximate $1 - \alpha$ confidence interval for $\beta_j$ using
    $$\hat{\beta}_j \pm z_{\alpha/2} \, \widehat{\operatorname{se}}(\hat{\beta}_j)$$
    (both intervals are computed in the sketch below)
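Continuing the NumPy example, a minimal sketch of these computations (the 95% confidence level is chosen for illustration):

```python
from scipy import stats

# Residual degrees of freedom and the unbiased estimate of sigma^2.
df = n - p - 1
sigma2_hat = (e @ e) / df

# Estimated standard errors: sqrt of the diagonal of sigma2_hat * (X'X)^{-1}.
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))

# 95% t-based confidence intervals and t values for H0: beta_j = 0.
t_crit = stats.t.ppf(0.975, df)
print(np.column_stack([beta_hat - t_crit * se, beta_hat + t_crit * se]))
print(beta_hat / se)  # the "t values"

# Large-n approximation: replace the t quantile with the normal quantile.
z_crit = stats.norm.ppf(0.975)
print(np.column_stack([beta_hat - z_crit * se, beta_hat + z_crit * se]))
```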

Robust Regression

In the least squares approach to regression, we seek $\hat{\boldsymbol{\beta}}$ that minimizes the residual sum of squares
$$\sum_{i=1}^{n} \left( y_i - \mathbf{x}_i^\top \boldsymbol{\beta} \right)^2,$$
which is equivalent to minimizing
$$\sum_{i=1}^{n} \rho(e_i),$$
where $\rho(t) = t^2$ and $e_i = y_i - \mathbf{x}_i^\top \boldsymbol{\beta}$.
The general concern with $\rho(t) = t^2$ is that it may place too much weight on extreme observations, because $t^2$ increases much more quickly than $|t|$.
The choice $\rho(t) = t^2$ may be optimal when the errors are normal, but it is not very robust to deviations from this assumption.
There are many possible choices for $\rho$; for example, the Huber loss function,
$$\rho_c(t) =
\begin{cases}
t^2 / 2, & |t| \le c, \\
c|t| - c^2 / 2, & |t| > c,
\end{cases}$$
where $c > 0$ is set by the user. For technical reasons, the default choice is $c = 1.345$ (applied to standardized residuals).
(Figure: Comparison of Loss Functions.)
  • If $c$ is chosen large, the Huber loss function gets closer to the least squares result
  • If $c$ is chosen closer to 0, the result is similar to using $\rho(t) = |t|$ (i.e. L1 regression); both limits are illustrated in the sketch below
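A short sketch of the Huber loss (the function name huber is ours) showing both limiting behaviors:

```python
def huber(t, c):
    """Huber loss: t^2/2 for |t| <= c, and c|t| - c^2/2 otherwise."""
    t = np.asarray(t, dtype=float)
    quad = np.abs(t) <= c
    return np.where(quad, 0.5 * t**2, c * np.abs(t) - 0.5 * c**2)

t = np.linspace(-3, 3, 7)
print(huber(t, c=100.0))        # large c: matches t^2/2 here (least squares)
print(huber(t, c=1e-6) / 1e-6)  # tiny c: proportional to |t| (L1 regression)
```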
Example
(Figure: least squares vs. Huber fits on two datasets with extreme observations.)
It can be observed that in both instances above, the Huber loss function effectively downweights the extreme observations and, as a result, produces a fit closer to the truth.
Robust regression (using a loss function other than $\rho(t) = t^2$) is useful when it is suspected that the error distribution is not normal, or when there is concern over extreme observations.
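To make this concrete, here is a minimal sketch of Huber regression fitted by iteratively reweighted least squares (IRLS), one standard way to minimize the Huber objective; the function name huber_irls and all tuning choices are ours, and this is a simplified stand-in for production implementations. It reuses X, y, and beta from the NumPy example above and contaminates a few responses with outliers.

```python
def huber_irls(X, y, c=1.345, n_iter=50):
    """Fit Huber regression by iteratively reweighted least squares (sketch).

    Residuals are standardized by a MAD-based scale estimate, so c = 1.345
    is on the usual standardized-residual scale.
    """
    b = np.linalg.solve(X.T @ X, X.T @ y)   # start from the least squares fit
    for _ in range(n_iter):
        r = y - X @ b
        s = np.median(np.abs(r - np.median(r))) / 0.6745  # robust scale (MAD)
        u = r / s
        w = np.where(np.abs(u) <= c, 1.0, c / np.abs(u))  # Huber weights
        Xw = X * w[:, None]
        b = np.linalg.solve(Xw.T @ X, Xw.T @ y)           # weighted LS step
    return b

# Contaminate a few responses with large outliers and compare the two fits.
y_out = y.copy()
y_out[:5] += 15.0
print(np.linalg.solve(X.T @ X, X.T @ y_out))  # least squares: pulled by outliers
print(huber_irls(X, y_out))                   # Huber: close to the true beta
```

The weights $w_i = \min(1, c/|u_i|)$ come from rewriting the Huber estimating equations as a weighted least squares problem: observations with large standardized residuals receive weight less than one, which is exactly the downweighting of extreme observations described above.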
