TOC

- Consequence of Heteroskedasticity for OLS
  1. Heteroskedasticity is Common
  2. Consequences of Heteroskedasticity
- Heteroskedasticity-Robust Standard Error
  1. Estimating under Heteroskedasticity
  2. Robust se v.s. Conventional se
  3. Inference
- Weighted Least Squares Estimation
  1. Weighted Least Squares estimator (WLS)
  2. Feasible Generalized Least Squares (FGLS)
- Testing for Heteroskedasticity
  1. Visual Inspection
  2. LM test
  3. B-P Test for Heteroskedasticity
  4. White Test for Heteroskedasticity
Getting the correct standard errors is central in statistical inference and important in applied work. There are two common issues: heteroskedasticity (i.e. $\mathrm{Var}(u_i \mid \mathbf{x}_i)$ varies across $i$) and serial/spatial correlation (i.e. $\mathrm{Cov}(u_i, u_j) \neq 0$ for some $i \neq j$).
Consequence of Heteroskedasticity for OLS
- Homoskedasticity: $\mathrm{Var}(u \mid \mathbf{x}) = \sigma^2$;
- Heteroskedasticity: $\mathrm{Var}(u \mid \mathbf{x})$ varies with $\mathbf{x}$.
1. Heteroskedasticity is Common
Heteroskedasticity means that the variance of $u$ conditional on $\mathbf{x}$ varies with $\mathbf{x}$. It is common and reasonable. Recall how we interpret OLS descriptively:
- We approximate the conditional mean of $y$ given $\mathbf{x}$ using a linear function.
- If $E(y \mid \mathbf{x})$ is non-linear, then the quality of this approximation depends on $\mathbf{x}$. At values of $\mathbf{x}$ where the fit is poorer, the variance of $u$ can be larger.
- Even when $E(y \mid \mathbf{x})$ is indeed linear, homoskedasticity is not guaranteed.
- For example, the linear probability model, where the dependent variable $y$ is a dummy and $P(y = 1 \mid \mathbf{x}) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k = p(\mathbf{x})$. Then
$$\mathrm{Var}(y \mid \mathbf{x}) = p(\mathbf{x})\left[1 - p(\mathbf{x})\right],$$
which also varies with $\mathbf{x}$.
2. Consequences of Heteroskedasticity
For the model
$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u,$$
heteroskedasticity will:
1. Not cause bias in OLS estimators
2. Not cause inconsistency in OLS estimators
3. Not affect $R^2$ and $\bar{R}^2$ (the measurements of goodness-of-fit)
4. Make $\widehat{\mathrm{Var}}(\hat{\beta}_j)$ biased
- 1 and 2 are immediate because they only need MLR.1 - MLR.4.
- As for 3, we know that both $R^2$ and $\bar{R}^2$ are calculated using sample data to estimate the population R-squared ($1 - \sigma_u^2 / \sigma_y^2$), which is decided by nature. Intuitively, neither nature nor its estimate changes when we pick different samples. Besides, $SSR/n$ consistently estimates $\sigma_u^2$, and $SST/n$ consistently estimates $\sigma_y^2$, whether or not $\mathrm{Var}(u \mid \mathbf{x})$ is constant.
- Further Consequences of 4 :
- the standard errors, t-statistics and CIs are no longer valid;
- The statistics we use to test hypotheses under the Gauss-Markov assumptions are no longer valid;
- A large sample size would not help solve the problem 😢
- OLS is no longer BLUE (Best ❌, Linear ✔️, Unbiased ✔️)
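A quick simulation illustrates points 1 and 2. The data-generating process below is a hypothetical example (not from the text): the error's standard deviation grows with $x$, yet the OLS slope estimates still center on the true coefficient.

```python
import numpy as np

# Hypothetical DGP: y = 1 + 2x + u with sd(u|x) = x, so Var(u|x) = x^2
# varies with x. OLS should nevertheless remain unbiased for the slope.
rng = np.random.default_rng(42)
slopes = []
for _ in range(500):
    x = rng.uniform(1, 3, size=200)
    u = x * rng.normal(size=200)            # heteroskedastic error
    y = 1 + 2 * x + u
    X = np.column_stack([np.ones(200), x])  # design matrix with constant
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    slopes.append(beta[1])
print(np.mean(slopes))                      # centers on the true slope 2
```

The average of the 500 slope estimates is very close to 2: heteroskedasticity distorts the standard errors, not the point estimates.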
Heteroskedasticity-Robust Standard Error
1. Estimating under Heteroskedasticity
For the simple linear regression model
$$y = \beta_0 + \beta_1 x + u,$$
assume SLR.1 - SLR.4 are satisfied, and there exists heteroskedasticity:
$$\mathrm{Var}(u_i \mid x_i) = \sigma_i^2,$$
which takes on different values as $x_i$ varies and can be any function of $x_i$.
The OLS estimator is
$$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^n (x_i - \bar{x}) u_i}{\sum_{i=1}^n (x_i - \bar{x})^2},$$
so conditional on $x_1, \dots, x_n$,
$$\mathrm{Var}(\hat{\beta}_1) = \frac{\sum_{i=1}^n (x_i - \bar{x})^2 \sigma_i^2}{SST_x^2}, \quad \text{where } SST_x = \sum_{i=1}^n (x_i - \bar{x})^2.$$
One valid estimator is
$$\widehat{\mathrm{Var}}(\hat{\beta}_1) = \frac{\sum_{i=1}^n (x_i - \bar{x})^2 \hat{u}_i^2}{SST_x^2}.$$
For the multiple regression model
$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u,$$
the valid estimator is:
$$\widehat{\mathrm{Var}}(\hat{\beta}_j) = \frac{\sum_{i=1}^n \hat{r}_{ij}^2 \hat{u}_i^2}{SSR_j^2},$$
where $\hat{r}_{ij}$ is the residual from regressing $x_j$ on all other independent variables, and $SSR_j$ is the sum of squared residuals from this regression.
The square root of $\widehat{\mathrm{Var}}(\hat{\beta}_j)$ is called the heteroskedasticity-robust standard error, or simply, robust standard error.
Note:
- It can be proved that these estimators are consistent.
- We approximate $\sigma_i^2$ using $\hat{u}_i^2$, which creates bias in finite samples. There are some variations; for instance, to correct degrees of freedom, multiply by $\frac{n}{n-k-1}$ before taking the square root.
- All these adjustments result in asymptotically equivalent estimators. For example, when $n \to \infty$, $\frac{n}{n-k-1} \to 1$.
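The sandwich formula above can be sketched in a few lines of numpy. This is an illustrative sketch, not the text's implementation; `df_correct` toggles the degrees-of-freedom adjustment just discussed (here `X` has $p = k + 1$ columns including the constant, so $n/(n-p)$ matches $n/(n-k-1)$).

```python
import numpy as np

def ols_robust(X, y, df_correct=True):
    """OLS coefficients with heteroskedasticity-robust (sandwich) SEs.

    X: (n, p) design matrix including the constant column.
    Returns (beta_hat, robust_se).
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta                          # OLS residuals
    # sandwich: (X'X)^{-1} [sum_i u_i^2 x_i x_i'] (X'X)^{-1}
    meat = X.T @ (X * (u ** 2)[:, None])
    V = XtX_inv @ meat @ XtX_inv
    if df_correct:                            # n/(n-k-1) correction
        V *= n / (n - p)
    return beta, np.sqrt(np.diag(V))
```

For example, on data simulated with $\mathrm{sd}(u \mid x) = |x|$, the robust standard errors remain valid while the conventional formula would not be.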
2. Robust se v.s. Conventional se
Under homoskedasticity, $\widehat{\mathrm{Var}}(\hat{\beta}_j)$ simplifies to
$$\widehat{\mathrm{Var}}(\hat{\beta}_j) = \frac{\hat{\sigma}^2}{SSR_j} = \frac{\hat{\sigma}^2 \sum_{i=1}^n \hat{r}_{ij}^2}{SSR_j^2},$$
since $\sum_{i=1}^n \hat{r}_{ij}^2 = SSR_j$. Under heteroskedasticity,
$$\widehat{\mathrm{Var}}(\hat{\beta}_j) = \frac{\sum_{i=1}^n \hat{r}_{ij}^2 \hat{u}_i^2}{SSR_j^2},$$
so $\hat{\sigma}^2$ is replaced by a weighted average of the $\hat{u}_i^2$, with weights proportional to $\hat{r}_{ij}^2$.
For small samples, the robust standard error formula is biased. Besides, robust standard errors have larger sampling variance than conventional ones.
In Practice:
- With large sample sizes, use robust standard errors.
- Sometimes, especially with small sample sizes, report both standard errors, and use whichever is larger for inference.
Example
3. Inference
Construct the t-stat as usual, but with the robust standard error:
$$t = \frac{\hat{\beta}_j - a_j}{\mathrm{se}_{\mathrm{robust}}(\hat{\beta}_j)},$$
where $a_j$ is the hypothesized value.
For the F test, use the STATA command:
reg y x1 x2, vce(robust)
and then the test command.

Weighted Least Squares Estimation
Now we put a restriction on the heteroskedasticity: it is known up to a multiplicative constant. So the variance of $u$ can be written as:
$$\mathrm{Var}(u \mid \mathbf{x}) = \sigma^2 h(\mathbf{x}),$$
where $h(\mathbf{x}) > 0$ for all possible values of $\mathbf{x}$ because variances must be positive.
1. Weighted Least Squares estimator (WLS)
Suppose $h(\mathbf{x})$ is known; for each observation, write $h_i = h(\mathbf{x}_i)$.
An alternative regression model:
$$\frac{y_i}{\sqrt{h_i}} = \beta_0 \frac{1}{\sqrt{h_i}} + \beta_1 \frac{x_{i1}}{\sqrt{h_i}} + \dots + \beta_k \frac{x_{ik}}{\sqrt{h_i}} + \frac{u_i}{\sqrt{h_i}}.$$
- Let $\mathbf{x}_i$ denote all the explanatory variables. Conditional on $\mathbf{x}_i$, $E(u_i / \sqrt{h_i}) = 0$.
- $\mathrm{Var}(u_i / \sqrt{h_i} \mid \mathbf{x}_i) = \sigma^2$, satisfying homoskedasticity.
Denote the OLS estimator after the transformation as $\hat{\beta}^*$; we can prove that $\hat{\beta}^*$ minimizes
$$\sum_{i=1}^n \frac{(y_i - b_0 - b_1 x_{i1} - \dots - b_k x_{ik})^2}{h_i}.$$
Weighted least squares estimator (WLS): the weight for each observation is $1/h_i$. We give less weight to observations with higher variance. Intuitively, they provide less information.
- $\hat{\beta}^*$ is still an estimator for the original model, and the coefficients have the same interpretation.
- It satisfies MLR.1-MLR.5, so it is BLUE under heteroskedasticity of the form $\sigma^2 h(\mathbf{x})$.
- $\hat{\beta}^*$ is also called the generalized least squares (GLS) estimator.
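The transformation can be sketched directly, assuming $h_i$ is known as above (an illustrative sketch, not the text's implementation):

```python
import numpy as np

def wls(X, y, h):
    """WLS for Var(u|x) = sigma^2 * h(x) with h known.

    Equivalent to plain OLS on the transformed data
    y_i / sqrt(h_i) and x_i / sqrt(h_i).
    """
    w = 1.0 / np.sqrt(h)                  # divide each obs by sqrt(h_i)
    return np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
```

Observations with large $h_i$ are scaled toward zero and contribute less to the weighted sum of squares, matching the intuition above.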
2. Feasible Generalized Least Squares (FGLS)
Since in practice we often do not know $h(\mathbf{x})$, we must estimate it. Assume $h(\mathbf{x})$ takes the following form:
$$u^2 = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k) \, v,$$
where $v$ has a mean of one, conditional on $\mathbf{x}$.
- The $\exp(\cdot)$ is to guarantee that $h(\mathbf{x}) > 0$.
- Equivalently:
$$\log(u^2) = \alpha_0 + \delta_1 x_1 + \dots + \delta_k x_k + e,$$
where $e$ has zero mean and is independent of $\mathbf{x}$.
- Replace the unobserved $u$ with the OLS residuals $\hat{u}$. Estimate this equation, calculate the fitted values $\hat{g}_i$. Then $\hat{h}_i = \exp(\hat{g}_i)$.
A FGLS Procedure to Correct for Heteroskedasticity
- Run the regression of $y$ on $x_1, \dots, x_k$, get the residuals $\hat{u}$.
- Calculate $\log(\hat{u}^2)$.
- Estimate the regression of $\log(\hat{u}^2)$ on $x_1, \dots, x_k$, get the fitted values $\hat{g}$.
- Compute $\hat{h} = \exp(\hat{g})$.
- Using $1/\hat{h}_i$ as weights, estimate the original model by WLS.
FGLS is consistent, and has smaller asymptotic variance than OLS.
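The five-step procedure above can be sketched as follows (a sketch assuming the exponential variance form; `X` includes the constant column):

```python
import numpy as np

def fgls(X, y):
    """FGLS under Var(u|x) = sigma^2 * exp(delta_0 + delta_1 x_1 + ...)."""
    # Step 1: OLS residuals from y on the x's
    u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    # Steps 2-3: regress log(u_hat^2) on the x's, keep fitted values g_hat
    d = np.linalg.lstsq(X, np.log(u ** 2), rcond=None)[0]
    g = X @ d
    # Step 4: estimated variance function h_hat = exp(g_hat)
    h = np.exp(g)
    # Step 5: WLS with weights 1/h_hat
    w = 1.0 / np.sqrt(h)
    return np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
```

On simulated data whose variance truly follows the exponential form, this recovers the coefficients consistently.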
Example: Demand for Cigarettes
If the heteroskedasticity function is misspecified:
1. WLS is still consistent under MLR.1 - MLR.4;
2. The usual WLS se and test statistics are no longer valid;
3. There is no guarantee that WLS is more efficient than OLS;
4. Robust se should be computed.
Note that if we observe significant differences in the point estimates using OLS versus WLS, it often suggests MLR.4 is violated.
WLS v.s. Robust se
There are two ways to handle heteroskedasticity so far:
- Use OLS to estimate the model, calculate the robust se (or use the max of the conventional se and robust se);
- Use FGLS to estimate the model, report conventional se or robust se.
Testing for Heteroskedasticity
Consider the model:
$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u.$$
Test:
$$H_0: \mathrm{Var}(u \mid x_1, \dots, x_k) = \sigma^2.$$
1. Visual Inspection
Using the OLS residual $\hat{u}$ as an estimate of $u$, we can check how $\hat{u}$ varies with $\hat{y}$ (or an $x_j$) in a graph.
STATA command:
rvfplot
or rvpplot
can plot $\hat{u}$ against $\hat{y}$ (or $x_j$); they should be used after the reg command.
- Note that the x-axis of rvfplot is the fitted value ($\hat{y}$), because it is the linear combination of the $x_j$.
- Under homoskedasticity the spread of the residuals shows no clear pattern across the fitted values, while under heteroskedasticity the spread clearly varies with them.
2. LM test
The LM statistic can be used to test multiple exclusion restrictions in large samples.
Test
$$H_0: \beta_{k-q+1} = \dots = \beta_k = 0.$$
We can use the F test as well as the LM test.
First estimate the restricted model:
$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_{k-q} x_{k-q} + \tilde{u}.$$
If $H_0$ is true, then $\tilde{u}$ should be uncorrelated with $x_{k-q+1}, \dots, x_k$. Regress $\tilde{u}$ on all of $x_1, \dots, x_k$.
Let $R_{\tilde{u}}^2$ denote the R-squared of this regression. $LM = n R_{\tilde{u}}^2$, and the smaller the $R_{\tilde{u}}^2$, the more likely $H_0$ is true.
We must include all the $x_j$ because the $x_j$ omitted from the restricted model might be correlated with the $x_j$ that appear in it.
It can be proved that $LM$ asymptotically follows a chi-square distribution with $q$ degrees of freedom: $LM \overset{a}{\sim} \chi_q^2$.
Reject $H_0$ if $LM >$ critical value (p < significance level).
- With a large sample, the outcomes of the LM and F tests are close.
- STATA command:
di 1 - chi2(q, LM)
gives the p-value, where LM is the LM statistic we obtain;
di invchi2(q, 1-a)
gives the critical value, where a is the significance level.
3. B-P Test for Heteroskedasticity
We want to know, in the model $y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$, whether $u^2$ is correlated with $x_1, \dots, x_k$.
- Estimate the model by OLS, get the residuals $\hat{u}$.
- Estimate the following model and get $R_{\hat{u}^2}^2$:
$$\hat{u}^2 = \delta_0 + \delta_1 x_1 + \dots + \delta_k x_k + error.$$
- Test
$$H_0: \delta_1 = \dots = \delta_k = 0.$$
- Calculate the
- LM-stat: $LM = n R_{\hat{u}^2}^2 \overset{a}{\sim} \chi_k^2$, or
- F-stat: $F = \dfrac{R_{\hat{u}^2}^2 / k}{(1 - R_{\hat{u}^2}^2)/(n - k - 1)} \sim F_{k,\, n-k-1}$
- Reject homoskedasticity if
- test stat > critical value;
- p < significance level.
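The B-P steps can be sketched as follows (an illustrative sketch, not the text's code); the function returns the LM statistic $n R_{\hat{u}^2}^2$, to be compared with a $\chi_k^2$ critical value:

```python
import numpy as np

def bp_lm(X, y):
    """Breusch-Pagan LM statistic: n * R^2 from regressing u_hat^2 on the x's.

    X is (n, p) with a constant column; under H0 the statistic is
    approximately chi-square with p - 1 degrees of freedom.
    """
    n = X.shape[0]
    u2 = (y - X @ np.linalg.lstsq(X, y, rcond=None)[0]) ** 2
    fitted = X @ np.linalg.lstsq(X, u2, rcond=None)[0]
    r2 = 1.0 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    return n * r2
```

With a single regressor the 5% critical value is $\chi_1^2 \approx 3.84$, so a strongly heteroskedastic sample should produce a statistic far above it.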
Example: Housing Price
4. White Test for Heteroskedasticity
The homoskedasticity assumption can be replaced with the weaker assumption that the squared error $u^2$ is uncorrelated with all the independent variables ($x_j$), their squares ($x_j^2$), and all the cross products ($x_j x_h$, $j \neq h$).
For example, when the model contains $k = 3$ independent variables, the White test is based on an estimation of:
$$\hat{u}^2 = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \delta_3 x_3 + \delta_4 x_1^2 + \delta_5 x_2^2 + \delta_6 x_3^2 + \delta_7 x_1 x_2 + \delta_8 x_1 x_3 + \delta_9 x_2 x_3 + error.$$
The White test for heteroskedasticity is the LM statistic for testing that all of the $\delta_j$ are zero, except for the intercept.
If there are many independent variables, we can use $\hat{y}$ and $\hat{y}^2$ instead:
$$\hat{u}^2 = \delta_0 + \delta_1 \hat{y} + \delta_2 \hat{y}^2 + error.$$
Test:
$$H_0: \delta_1 = \delta_2 = 0.$$
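The special-case White test using $\hat{y}$ and $\hat{y}^2$ can be sketched the same way (again a sketch, not the text's code); under $H_0$ the statistic is approximately $\chi_2^2$, with a 5% critical value of about 5.99:

```python
import numpy as np

def white_lm(X, y):
    """Special-case White test: LM = n * R^2 from regressing u_hat^2
    on (1, y_hat, y_hat^2); approximately chi-square with 2 df under H0."""
    n = X.shape[0]
    yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]   # OLS fitted values
    u2 = (y - yhat) ** 2                              # squared residuals
    Z = np.column_stack([np.ones(n), yhat, yhat ** 2])
    fitted = Z @ np.linalg.lstsq(Z, u2, rcond=None)[0]
    r2 = 1.0 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    return n * r2
```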
STATA command:
- BP test:
reg y x1 x2
estat hettest, rhs iid
- White test:
reg y x1 x2
estat imtest, white
- All the above tests are based on the assumption that MLR.1-MLR.4 hold.
- If MLR.4 does not hold (the functional form is misspecified, there are omitted variables, etc.), then the tests are no longer valid.