
Ch8: Heteroskedasticity

Getting the correct standard errors is central to statistical inference and important in applied work. There are two common issues: heteroskedasticity (i.e. $\mathrm{Var}(u_i \mid x) \neq \sigma^2$) and serial/spatial correlation (i.e. $\mathrm{Cov}(u_i, u_j \mid x) \neq 0$ for $i \neq j$).

Consequence of Heteroskedasticity for OLS

  • Homoskedasticity: $\mathrm{Var}(u \mid x) = \sigma^2$;
  • Heteroskedasticity: $\mathrm{Var}(u \mid x) = \sigma^2(x)$, which varies with $x$.

1. Heteroskedasticity is Common

Heteroskedasticity means that the variance of $u$ conditional on $x$ varies with $x$. It is common and reasonable. Recap how we interpret OLS descriptively:
  • We approximate the conditional mean of $y$ on $x$ using a linear function.
  • If $E(y \mid x)$ is non-linear, then the quality of this approximation depends on $x$. At values of $x$ where the fit is poorer, the variance of $u$ can be larger.
  • Even when $E(y \mid x)$ is indeed linear, homoskedasticity is not guaranteed.
    • For example, the linear probability model, where the dependent variable $y$ is a dummy and $E(y \mid x) = P(y = 1 \mid x) = \beta_0 + \beta_1 x$.
      • $\mathrm{Var}(y \mid x) = P(y = 1 \mid x)\,[1 - P(y = 1 \mid x)]$, which also varies with $x$.
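A minimal sketch of the dummy-variable case, assuming a linear probability model with made-up coefficients (the function name and the numbers are hypothetical, chosen only for illustration). It evaluates $p(x)[1 - p(x)]$ at a few values of $x$ to show that the conditional variance moves with $x$ even though the conditional mean is exactly linear.

```python
def lpm_conditional_variance(beta0, beta1, x):
    """Var(y | x) = p(x) * (1 - p(x)) for a binary y with P(y = 1 | x) = beta0 + beta1 * x."""
    p = beta0 + beta1 * x
    if not 0.0 <= p <= 1.0:
        raise ValueError("fitted probability outside [0, 1]")
    return p * (1.0 - p)

# Hypothetical coefficients, not taken from any data set.
beta0, beta1 = 0.2, 0.1
variances = {x: lpm_conditional_variance(beta0, beta1, x) for x in range(6)}
for x, v in variances.items():
    print(f"x = {x}: Var(y | x) = {v:.2f}")
```

The variance peaks where the fitted probability is 0.5 and shrinks as the probability moves toward 0 or 1.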

2. Consequences of Heteroskedasticity

For model
$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$$
Heteroskedasticity will:
  1. Not cause bias in OLS estimators
  2. Not cause inconsistency in OLS estimators
  3. Not affect $R^2$ and $\bar{R}^2$ (the measures of goodness-of-fit)
  4. Make $\widehat{\mathrm{Var}}(\hat{\beta}_j)$ biased
  • 1 and 2 are obvious because they only need MLR.1 - MLR.4.
  • As for 3, both $R^2$ and $\bar{R}^2$ are calculated from sample data to estimate the population $R^2$ ($1 - \sigma_u^2 / \sigma_y^2$). Both variances in this ratio are unconditional, so the population $R^2$ does not depend on whether $\mathrm{Var}(u \mid x)$ is constant. Besides, $SSR/n$ consistently estimates $\sigma_u^2$, and $SST/n$ consistently estimates $\sigma_y^2$, whether or not $\mathrm{Var}(u \mid x)$ is constant.
  • Further consequences of 4:
    • The standard errors, t-statistics and confidence intervals are no longer valid;
    • The statistics we use to test hypotheses under the Gauss-Markov assumptions are not valid;
    • A large sample size does not solve the problem;
    • OLS is no longer BLUE (it is still linear and unbiased, but no longer best).

Heteroskedasticity-Robust Standard Error

1. Estimating $\mathrm{Var}(\hat{\beta}_j)$ under Heteroskedasticity

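For the simple linear regression model:
$$y_i = \beta_0 + \beta_1 x_i + u_i$$
Assume SLR.1 - SLR.4 are satisfied, and there exists heteroskedasticity:
$$\mathrm{Var}(u_i \mid x_i) = \sigma_i^2$$
which takes on different values when $x_i$ varies and can be any function of $x_i$.
The OLS estimator is
$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^n (x_i - \bar{x})\, u_i}{\sum_{i=1}^n (x_i - \bar{x})^2}$$
so conditional on $x$,
$$\mathrm{Var}(\hat{\beta}_1 \mid x) = \frac{\sum_{i=1}^n (x_i - \bar{x})^2 \sigma_i^2}{SST_x^2}, \qquad SST_x = \sum_{i=1}^n (x_i - \bar{x})^2$$
One valid estimator replaces $\sigma_i^2$ with the squared OLS residual $\hat{u}_i^2$:
$$\widehat{\mathrm{Var}}(\hat{\beta}_1) = \frac{\sum_{i=1}^n (x_i - \bar{x})^2 \hat{u}_i^2}{SST_x^2}$$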
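For the multiple regression model:
$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + u_i$$
The valid estimator is:
$$\widehat{\mathrm{Var}}(\hat{\beta}_j) = \frac{\sum_{i=1}^n \hat{r}_{ij}^2 \hat{u}_i^2}{SSR_j^2}$$
where $\hat{r}_{ij}$ is the $i$th residual from regressing $x_j$ on all other independent variables, and $SSR_j$ is the sum of squared residuals of this regression.
$\sqrt{\widehat{\mathrm{Var}}(\hat{\beta}_j)}$ is called the heteroskedasticity-robust standard error, or simply, robust standard error.
Note:
  1. It can be proved that the estimators are consistent.
  2. We approximate $\sigma_i^2$ using $\hat{u}_i^2$, which creates bias in the estimation. There are some variations, for instance, correcting the degrees of freedom by multiplying $\widehat{\mathrm{Var}}(\hat{\beta}_j)$ by $n/(n-k-1)$ before taking the square root.
  3. All these adjustments result in asymptotically equivalent estimators. For example, when $n \to \infty$, $n/(n-k-1) \to 1$.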
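As a hedged illustration for the simple regression case, the sketch below computes the conventional standard error ($\hat{\sigma}^2 / SST_x$ under the square root), the robust standard error ($\sum (x_i - \bar{x})^2 \hat{u}_i^2 / SST_x^2$ under the square root), and the variant with the $n/(n-k-1)$ correction. These are often labelled HC0 and HC1 in software. The function name and the data are made up for illustration.

```python
import math

def slope_standard_errors(x, y):
    """Return (slope, conventional se, robust se, df-corrected robust se)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sst_x = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
    b0 = ybar - b1 * xbar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Conventional: sigma_hat^2 / SST_x, with sigma_hat^2 = SSR / (n - 2).
    se_conv = math.sqrt(sum(u ** 2 for u in resid) / (n - 2) / sst_x)

    # Robust: sum_i (x_i - xbar)^2 * u_hat_i^2 / SST_x^2.
    var_robust = sum((xi - xbar) ** 2 * u ** 2 for xi, u in zip(x, resid)) / sst_x ** 2
    se_robust = math.sqrt(var_robust)

    # Degrees-of-freedom correction: multiply by n / (n - k - 1), here k = 1.
    se_robust_df = math.sqrt(n / (n - 2) * var_robust)
    return b1, se_conv, se_robust, se_robust_df

# Made-up data whose spread grows with x (illustration only).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.7, 5.6, 5.1, 8.0, 6.4]
b1, se_conv, se_robust, se_robust_df = slope_standard_errors(x, y)
print(f"slope = {b1:.4f}")
print(f"conventional se = {se_conv:.4f}, robust se = {se_robust:.4f}, "
      f"df-corrected robust se = {se_robust_df:.4f}")
```

The two robust versions differ only by the factor $\sqrt{n/(n-2)}$, which goes to one as the sample grows.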

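2. Robust se vs. Conventional se

Under homoskedasticity, $\mathrm{Var}(\hat{\beta}_1 \mid x)$ is simplified as
$$\mathrm{Var}(\hat{\beta}_1 \mid x) = \frac{\sigma^2}{SST_x}$$
Under heteroskedasticity,
$$\mathrm{Var}(\hat{\beta}_1 \mid x) = \frac{\sum_{i=1}^n (x_i - \bar{x})^2 \sigma_i^2}{SST_x^2} = \frac{\sum_{i=1}^n w_i \sigma_i^2}{SST_x}$$
where $w_i = (x_i - \bar{x})^2 / SST_x$, and thus $w_i \geq 0$ and $\sum_{i=1}^n w_i = 1$.
Under heteroskedasticity, $\sigma^2$ is replaced by a weighted average of $\sigma_i^2$.
For small samples, there is bias in the robust standard error formula. Besides, robust standard errors have larger sampling variance.
In practice:
  • With large sample sizes, use robust standard errors.
  • Sometimes, especially with small sample sizes, report both standard errors, and use whichever is larger for inference.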

3. Inference

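Construct the t-statistic with the robust standard error in the denominator:
$$t = \frac{\hat{\beta}_j - \beta_{j,0}}{\mathrm{se}_{\text{robust}}(\hat{\beta}_j)}$$
where $\beta_{j,0}$ is the value of $\beta_j$ under the null hypothesis.
For the F test, use the Stata command reg y x1 x2, vce(robust), followed by the test command.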

Weighted Least Squares Estimation

Now we put some restrictions on the heteroskedasticity: its form is known up to a multiplicative constant.
So that the variance of $u$ can be written as:
$$\mathrm{Var}(u \mid x) = \sigma^2 h(x)$$
where $h(x) > 0$ for all possible values of $x$ because variances must be positive.

1. Weighted Least Squares estimator (WLS)

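Suppose $h(x)$ is known, so we can write $\sigma_i^2 = \mathrm{Var}(u_i \mid x_i) = \sigma^2 h(x_i) = \sigma^2 h_i$.
An alternative regression model divides every term by $\sqrt{h_i}$:
$$\frac{y_i}{\sqrt{h_i}} = \beta_0 \frac{1}{\sqrt{h_i}} + \beta_1 \frac{x_{i1}}{\sqrt{h_i}} + \dots + \beta_k \frac{x_{ik}}{\sqrt{h_i}} + \frac{u_i}{\sqrt{h_i}}$$
  • Let $x_i$ denote all the explanatory variables. Conditional on $x_i$, $E(u_i / \sqrt{h_i} \mid x_i) = 0$.
  • $\mathrm{Var}(u_i / \sqrt{h_i} \mid x_i) = \sigma^2 h_i / h_i = \sigma^2$, satisfying homoskedasticity.
Denote the OLS estimator after the transformation as $\beta_j^*$. We can prove that $\beta_j^*$ minimizes
$$\sum_{i=1}^n \frac{(y_i - b_0 - b_1 x_{i1} - \dots - b_k x_{ik})^2}{h_i}$$
Weighted least squares estimator (WLS): the weight for each squared residual is $1/h_i$. We give less weight to observations with higher variance. Intuitively, they provide less information.
  • $\beta_j^*$ is still an estimator for the original model, and has the same interpretation.
  • The transformed model satisfies MLR.1-MLR.5, so $\beta_j^*$ is BLUE under heteroskedasticity of the form $\mathrm{Var}(u \mid x) = \sigma^2 h(x)$.
  • $\beta_j^*$ is also called the generalized least squares (GLS) estimator.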
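The equivalence between WLS and OLS on the transformed model can be checked numerically. The sketch below is a minimal example for one regressor, assuming $h_i = x_i$ (variance proportional to $x$). The data, the choice of $h$, and both function names are hypothetical. One function minimizes the weighted sum of squares directly, the other runs OLS without an intercept on the variables divided by $\sqrt{h_i}$.

```python
import math

def wls_simple(x, y, h):
    """Minimise sum_i (y_i - b0 - b1 * x_i)^2 / h_i using weighted means."""
    w = [1.0 / hi for hi in h]
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b1 = (sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
          / sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x)))
    return yw - b1 * xw, b1

def ols_transformed(x, y, h):
    """OLS (no intercept) of y/sqrt(h) on the regressors 1/sqrt(h) and x/sqrt(h)."""
    s = [math.sqrt(hi) for hi in h]
    z0 = [1.0 / si for si in s]
    z1 = [xi / si for xi, si in zip(x, s)]
    ys = [yi / si for yi, si in zip(y, s)]
    # Normal equations for two regressors, solved with Cramer's rule.
    a00 = sum(a * a for a in z0)
    a01 = sum(a * b for a, b in zip(z0, z1))
    a11 = sum(b * b for b in z1)
    c0 = sum(a * c for a, c in zip(z0, ys))
    c1 = sum(b * c for b, c in zip(z1, ys))
    det = a00 * a11 - a01 ** 2
    return (c0 * a11 - a01 * c1) / det, (a00 * c1 - a01 * c0) / det

# Made-up data; h_i = x_i is an assumed variance function.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.5]
h = list(x)
b_wls = wls_simple(x, y, h)
b_transformed = ols_transformed(x, y, h)
print("WLS:                ", b_wls)
print("OLS on transformed: ", b_transformed)
```

Both routes return the same intercept and slope up to floating-point error, which is why the coefficients keep their original interpretation.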

2. Feasible Generalized Least Squares (FGLS)

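In practice, we often do not know $h(x)$, so we have to estimate it.
Assume $\mathrm{Var}(u \mid x)$ takes the following form:
$$\mathrm{Var}(u \mid x) = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k), \qquad u^2 = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k)\, \nu$$
where $\nu$ has a mean of one.
  • The $\exp(\cdot)$ is to guarantee that $h(x) > 0$.
  • Equivalently: $\log(u^2) = \alpha_0 + \delta_1 x_1 + \dots + \delta_k x_k + e$
  • Replace the unobserved $u$ with the OLS residuals $\hat{u}$. Estimate $\log(\hat{u}^2) = \alpha_0 + \delta_1 x_1 + \dots + \delta_k x_k + e$, calculate the fitted value $\hat{g}$. Then $\hat{h} = \exp(\hat{g})$.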
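A FGLS Procedure to Correct for Heteroskedasticity
  1. Run the regression of $y$ on $x_1, \dots, x_k$, get the residual $\hat{u}$.
  2. Calculate $\log(\hat{u}^2)$.
  3. Estimate $\log(\hat{u}^2) = \alpha_0 + \delta_1 x_1 + \dots + \delta_k x_k + e$, get the fitted value $\hat{g}$.
  4. Compute $\hat{h} = \exp(\hat{g})$.
  5. Use $1/\hat{h}$ as weights, estimate $y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$ using WLS.
FGLS is consistent and, when the variance function is correctly specified, has smaller asymptotic variance than OLS.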
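The five steps above can be sketched for a single regressor as follows. This is a minimal illustration, not a full implementation: the data are made up, the helper names are hypothetical, and the code assumes no OLS residual is exactly zero (otherwise the logarithm in step 2 is undefined).

```python
import math

def weighted_fit(x, y, w=None):
    """Intercept and slope minimising sum_i w_i * (y_i - b0 - b1 * x_i)^2 (plain OLS if w is None)."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b1 = (sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
          / sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x)))
    return yw - b1 * xw, b1

def fgls_simple(x, y):
    """FGLS with the exponential variance function, one regressor."""
    b0, b1 = weighted_fit(x, y)                          # step 1: OLS residuals
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    log_u2 = [math.log(u ** 2) for u in resid]           # step 2: log of squared residuals
    a0, d1 = weighted_fit(x, log_u2)                     # step 3: regress log(u^2) on x
    h_hat = [math.exp(a0 + d1 * xi) for xi in x]         # step 4: h_hat = exp(g_hat)
    weights = [1.0 / hi for hi in h_hat]
    return weighted_fit(x, y, weights), h_hat            # step 5: WLS with weights 1/h_hat

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.7, 5.6, 5.1, 8.0, 6.4]
(b0_fgls, b1_fgls), h_hat = fgls_simple(x, y)
print(f"FGLS intercept = {b0_fgls:.4f}, slope = {b1_fgls:.4f}")
```

Because $\hat{h}$ comes from an exponential, every estimated weight is positive by construction.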
Example: Demand for Cigarettes
[Figure: Demand for Cigarettes example]
If the heteroskedasticity function is misspecified:
  1. WLS is still consistent under MLR.1 - MLR.4;
  2. The usual WLS se and test statistics are no longer valid;
  3. There is no guarantee that WLS is more efficient than OLS;
  4. Robust se should be computed.
Note that if we observe significant differences in the point estimates using OLS versus WLS, it often suggests MLR.4 is violated, because both estimators are consistent when MLR.1 - MLR.4 hold.
WLS vs. Robust se
There are two ways to handle heteroskedasticity so far:
  1. Use OLS to estimate the model, calculate the robust se (or use the max of the conventional se and robust se);
  2. Use FGLS to estimate the model, report conventional se or robust se.

Testing for Heteroskedasticity

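Consider the model:
$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$$
Test:
$$H_0: \mathrm{Var}(u \mid x_1, \dots, x_k) = \sigma^2$$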

1. Visual Inspection

Using the OLS residual $\hat{u}$ as an estimate of $u$, we can check how $\hat{u}$ varies with $x$ in a graph.
Stata command: rvfplot or rvpplot can plot $\hat{u}$ against $\hat{y}$ (or $x_j$). It should be used after the reg command.
[Figure: residuals plotted against fitted values under homoskedasticity and under heteroskedasticity]
  • Note that the x-axis is the fitted value ($\hat{y}$), because it is a linear combination of $x_1, \dots, x_k$.
  • Under homoskedasticity, the spread of the residuals shows no pattern as the fitted value changes, while under heteroskedasticity the spread clearly changes with the fitted value.

2. LM test

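The LM statistic can be used to test multiple exclusion restrictions in large samples.
Test
$$H_0: \beta_{k-q+1} = \dots = \beta_k = 0 \quad \text{in} \quad y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$$
We can use the F test as well as the LM test.
First estimate the restricted model:
$$y = \tilde{\beta}_0 + \tilde{\beta}_1 x_1 + \dots + \tilde{\beta}_{k-q} x_{k-q} + \tilde{u}$$
If $H_0$ is true, then $\tilde{u}$ should be uncorrelated with $x_{k-q+1}, \dots, x_k$. Regress $\tilde{u}$ on all $x_1, \dots, x_k$.
Let $R_u^2$ denote the R-squared of this regression. $LM = n R_u^2$, and the smaller the $R_u^2$, the more likely $H_0$ is true.
We must include all $x_1, \dots, x_k$ because the omitted $x_{k-q+1}, \dots, x_k$ in the restricted model might be correlated with the $x_1, \dots, x_{k-q}$ that appear in the restricted model.
It can be proved that under $H_0$, $LM$ asymptotically follows a chi-square distribution with $q$ degrees of freedom: $LM = n R_u^2 \overset{a}{\sim} \chi_q^2$.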
Reject $H_0$ if $LM >$ critical value (p < significance level).
  • With a large sample, the outcomes of the LM and F tests are close.
  • Stata command:
    • di 1-chi2(q,LM) , where LM is the LM statistic we obtain. This gives the p-value.
    • di invchi2(q,1-a) , where $a$ is the significance level. This gives the critical value.
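For readers without Stata at hand, the same two numbers can be approximated with the Python standard library. The sketch below is an illustrative stand-in, not Stata's implementation: it evaluates the chi-square CDF with the power series for the regularized lower incomplete gamma function and finds the critical value by bisection. The function names are hypothetical, and the p-value loses precision when it is extremely small because it is computed as one minus the CDF.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for X ~ chi-square(df), via the series for the lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-16 * abs(total) and n < 100000:
        n += 1
        term *= z / (a + n)
        total += term
    return min(1.0, math.exp(a * math.log(z) - z - math.lgamma(a)) * total)

def chi2_p_value(lm, q):
    """Counterpart of `di 1-chi2(q,LM)`: upper-tail probability of the LM statistic."""
    return 1.0 - chi2_cdf(lm, q)

def chi2_critical_value(q, alpha):
    """Counterpart of `di invchi2(q,1-a)`: the (1 - alpha) quantile, found by bisection."""
    lo, hi = 0.0, float(q) + 10.0
    while chi2_cdf(hi, q) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, q) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical LM statistic with q = 2 restrictions.
lm_stat, q = 7.3, 2
print(f"p-value = {chi2_p_value(lm_stat, q):.4f}")
print(f"5% critical value = {chi2_critical_value(q, 0.05):.4f}")
```

With $q = 2$ the CDF has the closed form $1 - e^{-x/2}$, which gives a quick sanity check on the series.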

3. B-P Test for Heteroskedasticity

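We want to know, in the model $y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$, whether $u^2$ is correlated with $x_1, \dots, x_k$.
  1. Estimate $y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$, get the residual $\hat{u}$.
  2. Estimate the following model and get $R_{\hat{u}^2}^2$: $\hat{u}^2 = \delta_0 + \delta_1 x_1 + \dots + \delta_k x_k + \text{error}$
  3. Test $H_0: \delta_1 = \dots = \delta_k = 0$
  4. Calculate
    1. LM-stat: $LM = n R_{\hat{u}^2}^2 \overset{a}{\sim} \chi_k^2$, or
    2. F-stat: $F = \dfrac{R_{\hat{u}^2}^2 / k}{(1 - R_{\hat{u}^2}^2)/(n - k - 1)} \sim F_{k,\, n-k-1}$
  5. Reject homoskedasticity if
      • test stat > critical value;
      • p < significance level.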
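A minimal sketch of these steps for one regressor ($k = 1$), using made-up data and hypothetical function names. With a single regressor the LM statistic has one degree of freedom, so its p-value can be written with the complementary error function: $P(\chi_1^2 > c) = \mathrm{erfc}(\sqrt{c/2})$. The code assumes the squared residuals are not all identical (otherwise the auxiliary R-squared is undefined).

```python
import math

def fit_line(x, y):
    """OLS intercept and slope of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    return ybar - b1 * xbar, b1

def breusch_pagan_simple(x, y):
    """Return (LM, F, p-value of LM) for the B-P test with one regressor."""
    n, k = len(x), 1
    b0, b1 = fit_line(x, y)                                   # step 1: OLS residuals
    u2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    d0, d1 = fit_line(x, u2)                                  # step 2: regress u_hat^2 on x
    u2_bar = sum(u2) / n
    sst = sum((v - u2_bar) ** 2 for v in u2)
    ssr = sum((v - d0 - d1 * xi) ** 2 for xi, v in zip(x, u2))
    r2 = max(0.0, 1.0 - ssr / sst)                            # guard against rounding below zero
    lm = n * r2                                               # step 4: LM = n * R^2
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))            # step 4: F form
    p_lm = math.erfc(math.sqrt(lm / 2.0))                     # chi-square(1) upper tail
    return lm, f_stat, p_lm

# Made-up data in which the error spread grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.1, 1.8, 3.3, 3.6, 5.5, 5.4, 7.7, 7.2, 9.9, 9.0]
lm, f_stat, p_lm = breusch_pagan_simple(x, y)
print(f"LM = {lm:.3f}, F = {f_stat:.3f}, p-value (LM) = {p_lm:.4f}")
```

Homoskedasticity is rejected when the printed p-value falls below the chosen significance level.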
Example: Housing Price
[Figure: Housing Price example]

4. White Test for Heteroskedasticity

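The homoskedasticity assumption can be replaced with the weaker assumption that the squared error, $u^2$, is uncorrelated with all the independent variables ($x_j$), the squares of the independent variables ($x_j^2$) and all the cross products ($x_j x_h$ for $j \neq h$).
For example, when the model contains $k = 3$ independent variables, the White test is based on an estimation of:
$$\hat{u}^2 = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \delta_3 x_3 + \delta_4 x_1^2 + \delta_5 x_2^2 + \delta_6 x_3^2 + \delta_7 x_1 x_2 + \delta_8 x_1 x_3 + \delta_9 x_2 x_3 + \text{error}$$
The White test for heteroskedasticity is the LM statistic for testing that all of the $\delta_j$ are zero, except for the intercept.
If there are many independent variables, we can use $\hat{y}$ and $\hat{y}^2$ instead:
$$\hat{u}^2 = \delta_0 + \delta_1 \hat{y} + \delta_2 \hat{y}^2 + \text{error}$$
Test:
$$H_0: \delta_1 = \delta_2 = 0$$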
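A short count, offered as a sketch, shows why the version based on $\hat{y}$ and $\hat{y}^2$ is attractive when there are many regressors. With $k$ independent variables, the full White regression uses the $k$ levels, the $k$ squares, and every distinct cross product, while the shorter form always tests two restrictions:

```latex
\begin{aligned}
\text{full White regression:}\quad
  & k + k + \binom{k}{2} = \frac{k(k+3)}{2} \ \text{regressors} \\
  & k = 3 \;\Rightarrow\; 9, \qquad k = 6 \;\Rightarrow\; 27 \\
\text{special form:}\quad
  & LM = n R_{\hat{u}^2}^2 \overset{a}{\sim} \chi_2^2 \quad \text{for any } k
\end{aligned}
```

The degrees of freedom of the full test therefore grow quadratically in $k$, while the special form stays at two.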
Stata command:
  • BP test: run reg y x1 x2, then estat hettest, rhs iid
  • White test: run reg y x1 x2, then estat imtest, white
Caution:
  • All the above tests are based on the assumption that MLR.1-MLR.4 hold.
  • If MLR.4 does not hold (the functional form is misspecified, there is an omitted variable, etc.), then the tests are no longer valid.
