
T3. Instrumental Variables Regression

1. Instrumental Variables

1.1 Definitions

An exogenous variable is a variable that is uncorrelated with the error term $u$.
An endogenous variable is a variable that is correlated with $u$.
An Instrumental Variable (IV) $Z$ is a variable that is correlated with the endogenous regressor $X$ but uncorrelated with $u$.
We can use the IV to estimate the effect on $Y$ of only that part of $X$ that is correlated with $Z$. Because $Z$ is uncorrelated with $u$, the part of $X$ that is correlated with $Z$ must also be uncorrelated with $u$.

1.2 Examples

Survey on twins: measurement error
When an economist is worried about measurement error, a good choice of instrument is simply a different measure of the same variable. The new measure may have its own errors, but these errors are unlikely to be correlated with the mistakes in the first measure, or with any other component of $u$. For example, Ashenfelter and Rouse studied the effect of education on earnings. Their data came from a survey of twins. They were concerned that individuals might misreport their own years of schooling, leading to measurement error bias. However, Ashenfelter and Rouse had two separate measures of each individual's years of schooling: the survey asked each individual to report both his or her own years of schooling and the years of schooling of his or her twin. The twin's report of an individual's schooling served as an instrumental variable for the individual's self-report.
It's a good instrument because it is relevant (both reports measure the same years of schooling, so they are strongly correlated) and plausibly exogenous (the twin's reporting errors are unlikely to be correlated with the individual's own reporting errors).
Cigarettes Sold: Simultaneity bias
Suppose we are studying the effect of price on the demand for cigarettes, using a cross-section of different states' cigarette consumption $Q_i$ and average price $P_i$:
$$Q_i = \beta_0 + \beta_1 P_i + u_i$$
where $i$ indexes each state.
  • Each state's cigarette excise tax is not a good IV:
    • taxes reflect the level of anti-smoking sentiment in the state, thus the tax is likely correlated with the demand error $u_i$
  • A measure of state anti-smoking laws:
    • A proxy of anti-smoking sentiment
    • highly correlated with $u_i$, so it is not a good IV either
  • Each state's sales tax:
    • State sales taxes are correlated with cigarette prices
    • a relatively good IV

2. Deriving IV estimator

2.1 MOM Regression

Consider the case of a single regressor and a single IV. Since there is only one IV, we can derive the IV estimator intuitively as follows.
If $Z$ increases by 1 unit, then $X_1$ (hence $X$) will increase by, say, $\pi_1$ units. If $X_1$ increases by 1 unit, then $Y$ will increase by $\beta_1$ units ⇒ if $Z$ increases by 1 unit, then $Y$ will increase by $\pi_1\beta_1$ units.
$$Y = \beta_0 + \beta_1 X + u, \qquad X = X_1 + X_2$$
where $X_1$ is the "clean" part and $X_2$ is the "dirty" part: $\mathrm{Cov}(X_1, u) = 0$ and $\mathrm{Cov}(X_2, u) \neq 0$.
  • regress $X$ on $Z$: the slope is $\hat\pi_1 = \dfrac{s_{ZX}}{s_Z^2}$
  • regress $Y$ directly on $Z$: the slope is $\widehat{\pi_1\beta_1} = \dfrac{s_{ZY}}{s_Z^2}$
thus
$$\hat\beta_1^{IV} = \frac{\widehat{\pi_1\beta_1}}{\hat\pi_1} = \frac{s_{ZY}}{s_{ZX}}$$
where $s_{ZY}$ and $s_{ZX}$ are the sample covariances of $Z$ with $Y$ and with $X$, and $s_Z^2$ is the sample variance of $Z$.
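Another way to derive the IV estimator (Method of Moments, MOM)
Instrument exogeneity gives the population moment condition $\mathrm{Cov}(Z, u) = 0$. Substituting $u = Y - \beta_0 - \beta_1 X$ gives $\mathrm{Cov}(Z, Y) = \beta_1 \mathrm{Cov}(Z, X)$, so
$$\beta_1 = \frac{\mathrm{Cov}(Z, Y)}{\mathrm{Cov}(Z, X)}$$
where instrument relevance guarantees $\mathrm{Cov}(Z, X) \neq 0$. Replacing population moments with sample moments gives
$$\hat\beta_1^{IV} = \frac{\sum_i (Z_i - \bar Z)(Y_i - \bar Y)}{\sum_i (Z_i - \bar Z)(X_i - \bar X)} = \frac{s_{ZY}}{s_{ZX}}$$
Supplement: MOM estimator is equivalent to OLS estimator
By the law of large numbers, sample moments converge in probability to the corresponding population moments, which is what justifies replacing one with the other.
For the regression model $Y = \beta_0 + \beta_1 X + u$ with $E(u) = 0$ and $\mathrm{Cov}(X, u) = 0$, the same steps give $\beta_1 = \mathrm{Cov}(X, Y) / \mathrm{Var}(X)$,
thus
$$\hat\beta_1^{MOM} = \frac{\sum_i (X_i - \bar X)(Y_i - \bar Y)}{\sum_i (X_i - \bar X)^2} = \hat\beta_1^{OLS}$$
That is, the OLS estimator is an MOM estimator.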
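The two covariance-ratio formulas can be checked on simulated data. The following is a minimal sketch, assuming a made-up data-generating process (the coefficients 0.8 and 0.5, the true slope of 2, and the helper names `cov`, `iv_slope`, and `ols_slope` are illustrative and not part of these notes). It uses only the Python standard library.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)

def iv_slope(y, x, z):
    """Single-instrument IV (MOM) slope: s_ZY / s_ZX."""
    return cov(z, y) / cov(z, x)

def ols_slope(y, x):
    """OLS slope: s_XY / s_X^2."""
    return cov(x, y) / cov(x, x)

# Hypothetical DGP: Y = 1 + 2 X + u, where X depends on Z and on u,
# so X is endogenous while Z is a valid instrument.
rng = random.Random(0)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + 0.5 * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1 + 2 * xi + ui for xi, ui in zip(x, u)]

beta_ols = ols_slope(y, x)    # drifts away from 2 because Cov(X, u) != 0
beta_iv = iv_slope(y, x, z)   # should land near the true slope of 2
print(f"OLS slope: {beta_ols:.3f}   IV slope: {beta_iv:.3f}")
```

In this design the OLS slope converges to $2 + 0.5/1.89 \approx 2.26$, while the IV slope converges to 2.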

2.2 2SLS and Multiple Regression

2.2.1 Multiple regression model
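$$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \dots + \beta_{k+r} W_{ri} + u_i$$
where $X_{1i}, \dots, X_{ki}$ are endogenous regressors and $W_{1i}, \dots, W_{ri}$ are included exogenous regressors.
Let $Z_{1i}, \dots, Z_{mi}$ be instruments, so $\mathrm{Cov}(Z_{ji}, u_i) = 0$ for $j = 1, \dots, m$. The model is
  • exactly (just) identified when $m = k$;
  • over-identified when $m > k$;
  • under-identified when $m < k$.
Note that we cannot use $W$ as an instrument for $X$; otherwise, there will be a multicollinearity problem in the second stage, because $W$ already appears in the equation. Therefore, we require that the instruments $Z$ be excluded from the equation and that $m \geq k$.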
2.2.2 2SLS Procedure
1st Stage Regression:
  • Regress each of the $X_j$ on ALL exogenous variables (including all $Z$ and $W$) to get the predicted $\hat X_j$
  • We should include $W$ in the first stage because $W$ and $X$ may be correlated. Otherwise, the first-stage residual $X - \hat X$ may be correlated with $W$, and then in stage 2 the error term, which absorbs that residual, may be correlated with $W$, introducing new endogeneity.
2nd Stage Regression:
  • Regress $Y$ on the predicted $\hat X$ and $W$
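The two stages can be sketched with plain least squares. This is a minimal illustration assuming NumPy, one endogenous regressor, and a made-up data-generating process. The helper names `ols` and `two_stage_ls` and all coefficients are hypothetical. Standard errors taken from the second-stage OLS would not be valid, so only point estimates are computed.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients for y = X b + e (X already contains a constant)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_ls(y, x_endog, W, Z):
    """2SLS point estimates with one endogenous regressor.
    W: included exogenous regressors (n x r); Z: excluded instruments (n x m)."""
    n = len(y)
    const = np.ones((n, 1))
    exog = np.hstack([const, Z, W])            # stage 1 uses ALL exogenous variables
    x_hat = exog @ ols(x_endog, exog)          # predicted X
    second = np.hstack([const, x_hat[:, None], W])
    return ols(y, second)                      # [intercept, coef on X, coefs on W]

# Hypothetical DGP: W is correlated with both Z and X; u makes X endogenous.
rng = np.random.default_rng(0)
n = 20_000
W = rng.normal(size=(n, 1))
Z = 0.5 * W + rng.normal(size=(n, 1))
u = rng.normal(size=n)
x = 1.0 * Z[:, 0] + 0.7 * W[:, 0] + 0.6 * u + rng.normal(size=n)
y = 1.0 + 2.0 * x - 1.0 * W[:, 0] + u

beta_2sls = two_stage_ls(y, x, W, Z)
beta_ols = ols(y, np.hstack([np.ones((n, 1)), x[:, None], W]))
print("2SLS:", np.round(beta_2sls, 3))
print("OLS: ", np.round(beta_ols, 3))
```

The 2SLS coefficient on $X$ should be close to the true value of 2, while the OLS coefficient is pulled upward by the correlation between $X$ and $u$.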

3. Properties of IV estimator

3.1 Biased & Consistent

3.1.1 IV estimator is biased (single IV)
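For the single-IV model $Y_i = \beta_0 + \beta_1 X_i + u_i$,
$$\hat\beta_1^{IV} = \frac{\sum_i (Z_i - \bar Z)(Y_i - \bar Y)}{\sum_i (Z_i - \bar Z)(X_i - \bar X)} = \beta_1 + \frac{\sum_i (Z_i - \bar Z)\, u_i}{\sum_i (Z_i - \bar Z)(X_i - \bar X)}$$
If $E(u_i \mid Z, X) = 0$, then by the law of iterated expectations the second term has mean zero, thus $E(\hat\beta_1^{IV}) = \beta_1$.
But since $X$ is endogenous, $E(u_i \mid Z, X)$ cannot be zero. Therefore, the IV estimator is biased.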
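3.1.2 IV estimator is consistent (single IV)
$$\hat\beta_1^{IV} = \frac{s_{ZY}}{s_{ZX}} \xrightarrow{p} \frac{\mathrm{Cov}(Z, Y)}{\mathrm{Cov}(Z, X)} = \beta_1 + \frac{\mathrm{Cov}(Z, u)}{\mathrm{Cov}(Z, X)} = \beta_1$$
because $\mathrm{Cov}(Z, u) = 0$ (instrument exogeneity) and $\mathrm{Cov}(Z, X) \neq 0$ (instrument relevance).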

3.2 Asymptotic Properties

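3.2.1 Asymptotic Normality of IV estimator (single IV)
$$\sqrt{n}\,(\hat\beta_1^{IV} - \beta_1) \xrightarrow{d} N\!\left(0,\ \frac{\mathrm{Var}[(Z_i - \mu_Z)\, u_i]}{[\mathrm{Cov}(Z_i, X_i)]^2}\right)$$
Estimation:
$$\widehat{\mathrm{Var}}(\hat\beta_1^{IV}) = \frac{1}{n} \cdot \frac{\frac{1}{n}\sum_i (Z_i - \bar Z)^2\, \hat u_i^2}{\left[\frac{1}{n}\sum_i (Z_i - \bar Z)(X_i - \bar X)\right]^2}$$
where $\hat u_i = Y_i - \hat\beta_0^{IV} - \hat\beta_1^{IV} X_i$.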
3.2.2 Asymptotic Variance of IV estimator under conditional homoskedasticity
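Under conditional homoskedasticity, $E(u_i^2 \mid Z_i) = \sigma^2$, plus $E(u_i \mid Z_i) = 0$, we have
$$\mathrm{Var}[(Z_i - \mu_Z)\, u_i] = \sigma^2 \sigma_Z^2$$
Thus
$$\mathrm{Avar}(\hat\beta_1^{IV}) = \frac{\sigma^2 \sigma_Z^2}{n\,[\mathrm{Cov}(Z, X)]^2} = \frac{\sigma^2}{n\,\sigma_X^2\,\rho_{XZ}^2}$$
  • The standard error in the IV case differs from OLS only in the $R^2$ from regressing $X$ on $Z$ ($R^2_{X,Z} = \rho_{XZ}^2$)
  • Since $R^2_{X,Z} < 1$, the IV standard errors are larger than the OLS standard errors
  • The stronger the correlation between $Z$ and $X$, the smaller the IV standard errors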
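A small numeric sketch of the comparison above, assuming the homoskedastic formulas $\sigma^2 / (n \sigma_X^2 \rho_{XZ}^2)$ for IV and $\sigma^2 / (n \sigma_X^2)$ for OLS. The function names and the input values are made up for illustration.

```python
import math

def se_iv(sigma2, n, var_x, rho_xz):
    """Asymptotic standard error of the single-IV slope under homoskedasticity."""
    return math.sqrt(sigma2 / (n * var_x * rho_xz ** 2))

def se_ols(sigma2, n, var_x):
    """Asymptotic standard error of the OLS slope under homoskedasticity."""
    return math.sqrt(sigma2 / (n * var_x))

# Illustrative (made-up) values for the error variance, sample size and Var(X)
sigma2, n, var_x = 1.0, 400, 4.0
for rho in (0.8, 0.5, 0.1):
    ratio = se_iv(sigma2, n, var_x, rho) / se_ols(sigma2, n, var_x)
    print(f"rho_XZ = {rho}: SE_IV / SE_OLS = {ratio:.2f}")
```

The ratio of the two standard errors is $1 / |\rho_{XZ}|$, so halving the correlation between $Z$ and $X$ doubles the IV standard error.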
3.2.3 Asymptotic Variance of IV/2SLS estimator
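Consider a model having a single endogenous explanatory variable
$$Y = \beta_0 + \beta_1 X + \beta_2 W + u$$
Assume $X$ is endogenous and $W$ is exogenous. Following the 2SLS procedure, after regressing $X$ on its IVs $Z$ and $W$, we obtain $\hat X$. Then regress $Y$ on $\hat X$ and $W$. Thus the (asymptotic) variance of $\hat\beta_1^{2SLS}$ is approximately
$$\mathrm{Var}(\hat\beta_1^{2SLS}) \approx \frac{\sigma^2}{\widehat{SST}_X\,(1 - \hat R_X^2)}$$
where $\sigma^2 = \mathrm{Var}(u)$, $\widehat{SST}_X$ is the total variation in $\hat X$, and $\hat R_X^2$ is the R-squared from a regression of $\hat X$ on $W$.
Compare with the OLS estimator, which directly regresses $Y$ on $X$ and $W$:
$$\mathrm{Var}(\hat\beta_1^{OLS}) = \frac{\sigma^2}{SST_X\,(1 - R_X^2)}$$
where $SST_X$ is the total variation in $X$ and $R_X^2$ is the R-squared from a regression of $X$ on $W$.
We reach the same conclusion as in the MOM case above: the variance of the IV (here 2SLS) estimator is larger than the variance of the OLS estimator:
  • No difference in $\sigma^2$
  • $\widehat{SST}_X < SST_X$: because total sum of squares = explained sum of squares + residual sum of squares. $SST_X$ is the total sum of squares, while $\widehat{SST}_X$ is the explained sum of squares from the first stage regression.
  • $\hat R_X^2 > R_X^2$: the correlation between $\hat X$ and $W$ is larger than the correlation between $X$ and $W$, because $\hat X$ is itself a linear function of $W$ and $Z$ from the first stage regression.
Because $\hat R_X^2 > R_X^2$, when there is a multicollinearity issue (the variance or SE of the estimator gets large), the 2SLS estimator will suffer even more.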

4. Instrument Relevance

4.1 Weak Instruments

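Focus on a single included endogenous regressor:
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \dots + \beta_{1+r} W_{ri} + u_i$$
First stage regression is
$$X_i = \pi_0 + \pi_1 Z_{1i} + \dots + \pi_m Z_{mi} + \pi_{m+1} W_{1i} + \dots + \pi_{m+r} W_{ri} + v_i$$
  • The instruments are relevant if at least one of $\pi_1, \dots, \pi_m$ is nonzero
  • The instruments are said to be weak if all the $\pi_1, \dots, \pi_m$ are either zero or nearly zero
Weak instruments explain very little of the variation in $X$, beyond that explained by the $W$'s.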
If the instruments are weak, the sampling distribution of the 2SLS estimators and t-statistics are not normal even in large samples.
[Figure: an example of the sampling distribution of the TSLS t-statistic with weak instruments]
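The existence of weak instruments makes the IV estimator less desirable. Take the one-IV example:
$$\operatorname{plim} \hat\beta_1^{IV} = \beta_1 + \frac{\mathrm{Corr}(Z, u)}{\mathrm{Corr}(Z, X)} \cdot \frac{\sigma_u}{\sigma_X}$$
It shows that, even if $\mathrm{Corr}(Z, u)$ is small, the inconsistency in the IV estimator can be very large if $\mathrm{Corr}(Z, X)$ is also small. Thus, even if we only focus on consistency, it is not necessarily better to use IV than OLS, even if the correlation between $Z$ and $u$ is smaller than that between $X$ and $u$, because
$$\operatorname{plim} \hat\beta_1^{OLS} = \beta_1 + \mathrm{Corr}(X, u) \cdot \frac{\sigma_u}{\sigma_X}$$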
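As a worked illustration of the two probability limits, the sketch below plugs in made-up correlations: the instrument is only slightly correlated with $u$, but it is also weak. The function names and all numbers are hypothetical.

```python
def iv_inconsistency(corr_zu, corr_zx, sd_u, sd_x):
    """plim(beta_IV) - beta_1 for the single-IV case."""
    return (corr_zu / corr_zx) * (sd_u / sd_x)

def ols_inconsistency(corr_xu, sd_u, sd_x):
    """plim(beta_OLS) - beta_1."""
    return corr_xu * (sd_u / sd_x)

# Made-up values: Corr(Z, u) = 0.03 is much smaller than Corr(X, u) = 0.2,
# but the instrument is weak, Corr(Z, X) = 0.1.
sd_u, sd_x = 1.0, 1.0
iv_gap = iv_inconsistency(corr_zu=0.03, corr_zx=0.1, sd_u=sd_u, sd_x=sd_x)
ols_gap = ols_inconsistency(corr_xu=0.2, sd_u=sd_u, sd_x=sd_x)
print(f"IV inconsistency:  {iv_gap:.2f}")
print(f"OLS inconsistency: {ols_gap:.2f}")
```

With these numbers the IV inconsistency (0.30) exceeds the OLS inconsistency (0.20). IV is preferable on consistency grounds only when $\mathrm{Corr}(Z, u) / \mathrm{Corr}(Z, X)$ is smaller in magnitude than $\mathrm{Corr}(X, u)$.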

4.2 Detecting Weak Instruments

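We can use an F-test in the first stage regression to detect weak instruments. For the regression
$$X_i = \pi_0 + \pi_1 Z_{1i} + \dots + \pi_m Z_{mi} + \pi_{m+1} W_{1i} + \dots + \pi_{m+r} W_{ri} + v_i$$
we wish to test the hypotheses
$$H_0: \pi_1 = \dots = \pi_m = 0 \quad \text{vs.} \quad H_1: \pi_j \neq 0 \text{ for at least one } j \in \{1, \dots, m\}$$
Note that the test is only on the coefficients of the $Z$'s, not the $W$'s.
Rule of thumb: $F > 10$ means that instruments are not weak.
The intuition of comparing $F$ with 10 is to test whether the bias of 2SLS, relative to OLS, is less than 10%. If $F$ is smaller than 10, the relative bias can exceed 10%, that is, 2SLS can have substantial bias.
For the general case where there are multiple endogenous regressors $X$, rank condition and matrix algebra are needed.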
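The first-stage F-statistic can be computed by comparing a restricted fit ($W$ only) with an unrestricted fit ($Z$ and $W$). The sketch below assumes NumPy and the homoskedasticity-only F formula. The simulated data and the names `ssr` and `first_stage_F` are illustrative.

```python
import numpy as np

def ssr(y, X):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)

def first_stage_F(x, Z, W):
    """Homoskedasticity-only F-statistic for H0: all coefficients on Z are zero.
    The W's stay in both the restricted and the unrestricted regression."""
    n, m = Z.shape
    const = np.ones((n, 1))
    restricted = np.hstack([const, W])
    unrestricted = np.hstack([const, Z, W])
    ssr_r, ssr_u = ssr(x, restricted), ssr(x, unrestricted)
    df = n - unrestricted.shape[1]
    return ((ssr_r - ssr_u) / m) / (ssr_u / df)

# Hypothetical first stages: same instruments, strong vs. nearly irrelevant
rng = np.random.default_rng(1)
n = 500
W = rng.normal(size=(n, 1))
Z = rng.normal(size=(n, 2))
v = rng.normal(size=n)
x_strong = 0.5 * Z[:, 0] + 0.5 * Z[:, 1] + 0.3 * W[:, 0] + v
x_weak = 0.01 * Z[:, 0] + 0.01 * Z[:, 1] + 0.3 * W[:, 0] + v

F_strong = first_stage_F(x_strong, Z, W)
F_weak = first_stage_F(x_weak, Z, W)
print(f"F (strong instruments): {F_strong:.1f}")
print(f"F (weak instruments):   {F_weak:.1f}")
```

Under heteroskedasticity a robust version of the F-statistic would be used instead. This sketch does not cover that case.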

5. Instrument Exogeneity

For the simplest model, $Y = \beta_0 + \beta_1 X + u$ with a single instrument $Z$,
if $\mathrm{Cov}(Z, u) \neq 0$, then the IV estimator is inconsistent.
Order condition: We cannot test exogeneity when we have exact identification (i.e. the number of instruments equals the number of endogenous regressors).
Suppose we have two instruments, $Z_1$ and $Z_2$, for the model $Y = \beta_0 + \beta_1 X + u$. Then we have two possible first-stage regressions:
$$X = \pi_0 + \pi_1 Z_1 + v_1, \qquad X = \delta_0 + \delta_1 Z_2 + v_2$$
These two first-stage regressions will lead to two 2SLS estimates. If both instruments are exogenous, then these two estimates are expected to be close to each other, as both are consistent. If the two estimates are far apart, then it would be reasonable to believe that one or both IVs are not exogenous.
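If we have more instruments than endogenous regressors, it is possible to test for exogeneity. The exogeneity of instruments means that they are uncorrelated with $u$. This suggests that the 2SLS residuals $\hat u$ should be approximately uncorrelated with the instruments. Test Procedure:
(1) Run 2SLS using all potential IVs and obtain the 2SLS residuals $\hat u_i^{2SLS}$
(2) Run the regression
$$\hat u_i^{2SLS} = \delta_0 + \delta_1 Z_{1i} + \dots + \delta_m Z_{mi} + \delta_{m+1} W_{1i} + \dots + \delta_{m+r} W_{ri} + e_i$$
and compute the F-statistic for $H_0: \delta_1 = \dots = \delta_m = 0$
$J = mF \sim \chi^2_{m-k}$ in large samples under the null (with homoskedastic errors), where $m$ is the number of excluded IVs ($Z$) and $k$ is the number of endogenous regressors ($X$).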
We reject the null for large values of the J-statistic. Note that we require $m > k$; otherwise $J = 0$ always.
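The test procedure can be sketched as follows, assuming NumPy, homoskedastic errors, one endogenous regressor, two instruments, and no $W$'s (a simplification made here to keep the example short). The function names and the simulated data are illustrative. The critical value 3.841 is the 5% point of $\chi^2_1$.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients for y = X b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def j_statistic(y, x, Z):
    """J = m * F for one endogenous regressor x and instruments Z (n x m), no W's."""
    n, m = Z.shape
    exog = np.hstack([np.ones((n, 1)), Z])
    x_hat = exog @ ols(x, exog)                          # first stage
    beta = ols(y, np.column_stack([np.ones(n), x_hat]))  # second stage
    u_hat = y - beta[0] - beta[1] * x     # 2SLS residuals use the actual x
    ssr_u = np.sum((u_hat - exog @ ols(u_hat, exog)) ** 2)  # unrestricted
    ssr_r = np.sum((u_hat - u_hat.mean()) ** 2)             # constant only
    F = ((ssr_r - ssr_u) / m) / (ssr_u / (n - m - 1))
    return m * F

# Hypothetical data: in y_invalid the second instrument enters the error term.
rng = np.random.default_rng(2)
n = 2000
Z = rng.normal(size=(n, 2))
e = rng.normal(size=n)
x = Z[:, 0] + Z[:, 1] + 0.5 * e + rng.normal(size=n)
y_valid = 1 + 2 * x + e
y_invalid = 1 + 2 * x + 0.5 * Z[:, 1] + e

CRIT_5PCT = 3.841  # chi-square critical value with m - k = 2 - 1 = 1 df
J_valid = j_statistic(y_valid, x, Z)
J_invalid = j_statistic(y_invalid, x, Z)
print(f"J (valid instruments):   {J_valid:.2f}")
print(f"J (one invalid):         {J_invalid:.2f}  reject: {J_invalid > CRIT_5PCT}")
```

A rejection says that at least one instrument is suspect, but it does not say which one.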

6. OLS or IV

Considerations
If explanatory variable is exogenous:
  • IV estimator and OLS estimator are both consistent
  • OLS estimator is more efficient (it has the smaller variance)
  • Therefore, use OLS estimator
If explanatory variable is endogenous:
  • IV estimator is consistent
  • OLS estimator is not consistent
  • Therefore, use IV estimator
Test for Endogeneity of a Single Explanatory Variable
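Suppose $X$ is endogenous, and the IV is $Z$:
$$Y = \beta_0 + \beta_1 X + \beta_2 W + u$$
In the 1st stage of 2SLS,
$$X = \pi_0 + \pi_1 Z + \pi_2 W + v$$
we know that if $X$ is correlated with $u$, it must be that $v$ is correlated with $u$, because $Z$ and $W$ are exogenous. So regress $X$ on $Z$ (there may be multiple $Z$) and $W$ to get the residuals $\hat v$, and regress
$$Y = \beta_0 + \beta_1 X + \beta_2 W + \delta \hat v + e$$
Test $H_0: \delta = 0$ (which means $X$ is exogenous).
STATA command: estat endogenous
If p < significance level, reject $H_0$ and conclude that $X$ is endogenous.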
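The regression-based test can be sketched by hand as below. This assumes NumPy and homoskedasticity-only standard errors, and it uses the normal critical value 1.96 in place of a p-value. It is an illustration of the idea on made-up data, not a reproduction of what the Stata command reports.

```python
import numpy as np

def ols_with_se(y, X):
    """OLS coefficients and homoskedasticity-only standard errors."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)
    return beta, np.sqrt(s2 * np.diag(XtX_inv))

def endogeneity_t(y, x, Z, W):
    """t-statistic on the first-stage residual v_hat added to the structural equation."""
    n = len(y)
    const = np.ones((n, 1))
    exog = np.hstack([const, Z, W])
    v_hat = x - exog @ np.linalg.lstsq(exog, x, rcond=None)[0]  # first-stage residuals
    X_aug = np.hstack([const, x[:, None], W, v_hat[:, None]])
    beta, se = ols_with_se(y, X_aug)
    return beta[-1] / se[-1]

# Hypothetical DGP: u shares the component v with x, so x is endogenous.
rng = np.random.default_rng(3)
n = 2000
Z = rng.normal(size=(n, 1))
W = rng.normal(size=(n, 1))
v = rng.normal(size=n)
u = 0.8 * v + rng.normal(size=n)
x = Z[:, 0] + 0.5 * W[:, 0] + v
y = 1 + 2 * x + W[:, 0] + u

t_stat = endogeneity_t(y, x, Z, W)
print(f"t-statistic on v_hat: {t_stat:.2f}  reject exogeneity: {abs(t_stat) > 1.96}")
```

A large $|t|$ on $\hat v$ is evidence against $H_0: \delta = 0$, that is, evidence that $X$ is endogenous.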

7. IV/2SLS Matrix Form

7.1 IV Matrix Form

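Let the equation of interest be
$$y_i = x_i'\beta + e_i$$
where $x_i$ is a $k \times 1$ vector. Assume that $E(x_i e_i) \neq 0$ so there is endogeneity. We call this equation the structural equation. In matrix notation, this can be written as
$$y = X\beta + e$$
Definition of IV: The $\ell \times 1$ random vector $z_i$ is an instrumental variable for the above structural model if:
  1. $E(z_i e_i) = 0$ (instrument exogeneity)
  2. $\mathrm{rank}\big(E(z_i x_i')\big) = k$ (instrument relevance)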
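In a typical set-up, some regressors in $x_i$ (at least the intercept) will be uncorrelated with $e_i$. Thus we make the partition
$$x_i = \begin{pmatrix} x_{1i} \\ x_{2i} \end{pmatrix} \begin{matrix} k_1 \\ k_2 \end{matrix}$$
where $E(x_{1i} e_i) = 0$ and $E(x_{2i} e_i) \neq 0$. We call $x_{1i}$ exogenous and $x_{2i}$ endogenous. $x_{1i}$ should be included in the instrument vector $z_i$. So we have the partition
$$z_i = \begin{pmatrix} x_{1i} \\ z_{2i} \end{pmatrix} \begin{matrix} k_1 \\ \ell_2 \end{matrix}$$
where $x_{1i}$ contains the included exogenous variables and $z_{2i}$ contains the excluded exogenous variables, with $k = k_1 + k_2$ and $\ell = k_1 + \ell_2$.
The model is just-identified if $\ell = k$ (i.e., $\ell_2 = k_2$) and is over-identified if $\ell > k$ (i.e., $\ell_2 > k_2$).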
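The reduced form relationship between $x_i$ and the instrument $z_i$ is found by linear projection:
$$x_i = \Gamma' z_i + u_i$$
where $\Gamma$ is an $\ell \times k$ matrix of coefficients, and $u_i$ is the projection error such that $E(z_i u_i') = 0$. In matrix notation, we can write this as
$$X = Z\Gamma + U$$
where $U$ is an $n \times k$ matrix.
By construction, a linear projection can be estimated by OLS
$$X = Z\hat\Gamma + \hat U$$
where $\hat\Gamma = (Z'Z)^{-1}Z'X$.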
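The reduced form for $y_i$ is
$$y_i = (\Gamma' z_i + u_i)'\beta + e_i = z_i'\lambda + v_i$$
where $\lambda = \Gamma\beta$ is an $\ell \times 1$ vector and $v_i = u_i'\beta + e_i$. Observe that
$$E(z_i v_i) = E(z_i u_i')\beta + E(z_i e_i) = 0$$
The above equation can be estimated by OLS
$$y = Z\hat\lambda + \hat v$$
where $\hat\lambda = (Z'Z)^{-1}Z'y$. The reduced form equation for the system is
$$y = Z\lambda + v, \qquad X = Z\Gamma + U$$
Since $\lambda = \Gamma\beta$, the structural parameter $\beta$ can be recovered from the reduced form:
  • If $\ell = k$, then $\beta = \Gamma^{-1}\lambda$
  • If $\ell > k$, then for any p.d. matrix $W$, $\beta = (\Gamma' W \Gamma)^{-1}\Gamma' W \lambda$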

7.2 2SLS Matrix Form

The two-stage least squares (2SLS) estimation
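Stage 1: Regress $X$ on $Z$ to obtain $\hat\Gamma = (Z'Z)^{-1}Z'X$, and save the predicted value $\hat X = Z\hat\Gamma = P_Z X$, where $P_Z = Z(Z'Z)^{-1}Z'$.
Stage 2: Regress $y$ on $\hat X$ to obtain the 2SLS estimator
$$\hat\beta_{2SLS} = (\hat X'\hat X)^{-1}\hat X'y = (X'P_Z X)^{-1}X'P_Z y$$
If the model is just-identified, so that $Z'X$ is a square $k \times k$ matrix, then the formula for $\hat\beta_{2SLS}$ can be simplified to:
$$\hat\beta_{2SLS} = (Z'X)^{-1}Z'y$$
which is also called the instrumental variable (IV) estimator and written as $\hat\beta_{IV}$ in the literature.
In the just-identified case, $\hat\beta_{IV}$ can also have another interpretation. Since $\beta = \Gamma^{-1}\lambda$, and the reduced form for $y$ is estimated by $\hat\lambda = (Z'Z)^{-1}Z'y$, we can construct the indirect least squares (ILS) estimator:
$$\hat\beta_{ILS} = \hat\Gamma^{-1}\hat\lambda = \big((Z'Z)^{-1}Z'X\big)^{-1}(Z'Z)^{-1}Z'y = (Z'X)^{-1}Z'y = \hat\beta_{IV}$$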
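The equality of the three formulas in the just-identified case can be verified numerically. The sketch below assumes NumPy and a made-up design with an intercept, one endogenous regressor, and one excluded instrument. The function names `tsls`, `iv`, and `ils` are illustrative.

```python
import numpy as np

def tsls(y, X, Z):
    """2SLS: (X' P_Z X)^{-1} X' P_Z y, computed through X_hat = P_Z X."""
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    # X_hat' X equals X' P_Z X because P_Z is symmetric and idempotent
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

def iv(y, X, Z):
    """Just-identified IV estimator: (Z'X)^{-1} Z'y."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)

def ils(y, X, Z):
    """Indirect least squares: Gamma_hat^{-1} lambda_hat."""
    gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)    # reduced form for X
    lambda_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)   # reduced form for y
    return np.linalg.solve(gamma_hat, lambda_hat)

# Hypothetical just-identified design: l = k = 2
rng = np.random.default_rng(4)
n = 500
z2 = rng.normal(size=n)
e = rng.normal(size=n)
x2 = 0.8 * z2 + 0.5 * e + rng.normal(size=n)   # endogenous regressor
X = np.column_stack([np.ones(n), x2])
Z = np.column_stack([np.ones(n), z2])
y = X @ np.array([1.0, 2.0]) + e

b_2sls, b_iv, b_ils = tsls(y, X, Z), iv(y, X, Z), ils(y, X, Z)
print("2SLS:", b_2sls)
print("IV:  ", b_iv)
print("ILS: ", b_ils)
```

All three vectors agree up to floating-point error. With more instruments than regressors only `tsls` remains defined, since $Z'X$ and $\hat\Gamma$ are no longer square.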
