
Ch15: Instrumental Variable (IV)


IV Estimator

1. Endogeneity & Exogeneity

Zero conditional mean condition: $E(u \mid x) = 0$
  • $x$ is endogenous if it is correlated with $u$.
  • $x$ is exogenous if it is not correlated with $u$.
Violating the zero conditional mean condition will cause the OLS estimator to be biased and inconsistent.

2. Omitted Variable Bias

Example
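Assume $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$ with $E(u \mid x_1, x_2) = 0$. Our primary interest is to estimate $\beta_1$ consistently, and we do not really care about $\beta_2$.
Suppose we do not have data on $x_2$, and thus we only regress $y$ on $x_1$:
$$y = \beta_0 + \beta_1 x_1 + v$$
Note that
$$v = \beta_2 x_2 + u.$$
So if $\beta_2 \neq 0$ and $\mathrm{Cov}(x_1, x_2) \neq 0$, the zero conditional mean condition is violated. That is, the error term contains a variable correlated with $x_1$.
  • Since $E(v \mid x_1) \neq 0$, OLS is biased.
  • Since $\mathrm{Cov}(x_1, v) \neq 0$, OLS is inconsistent.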

3. Instrumental Variable

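For the model $y = \beta_0 + \beta_1 x + u$ with $\mathrm{Cov}(x, u) \neq 0$:
  • $E(u)$ is still 0 because there is an intercept ($\beta_0$) in the equation.
  • Since $\mathrm{Cov}(x, u) \neq 0$, we no longer have $E(xu) = 0$.
  • To carry out the sample analogue (method of moments), we should find another variable $z$ (the instrumental variable) which satisfies:
$$\mathrm{Cov}(z, u) = 0, \qquad \mathrm{Cov}(z, x) \neq 0$$
Then we can get
$$E(zu) = 0.$$
Using $E(u) = 0$ and $E(zu) = 0$, the sample analogues are
$$\frac{1}{n}\sum_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0, \qquad \frac{1}{n}\sum_i z_i (y_i - \hat\beta_0 - \hat\beta_1 x_i) = 0.$$
Therefore, the IV estimators are
$$\hat\beta_1^{IV} = \frac{\sum_i (z_i - \bar z)(y_i - \bar y)}{\sum_i (z_i - \bar z)(x_i - \bar x)}, \qquad \hat\beta_0^{IV} = \bar y - \hat\beta_1^{IV} \bar x.$$
  • OLS can be viewed as a special case of IV: when $z = x$, $\hat\beta_1^{IV} = \hat\beta_1^{OLS}$. In other words, an exogenous variable is its own instrument.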
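As a rough illustration of the estimator above, the following sketch simulates data in which $x$ is endogenous and compares the OLS and IV slopes. The data-generating process, the coefficient values and the function names (`cov`, `ols_slope`, `iv_slope`, `simulate`) are all assumptions made for this example, not part of the notes.

```python
import random

def cov(a, b):
    """Sample covariance; the 1/n factor cancels in the ratios below."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n

def ols_slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    return cov(x, y) / cov(x, x)

def iv_slope(z, x, y):
    """IV slope using z as the instrument for x."""
    return cov(z, y) / cov(z, x)

def simulate(n=20000, beta0=1.0, beta1=2.0, seed=0):
    """Hypothetical data: x depends on z and on the error u, so x is endogenous."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)
        ui = rng.gauss(0, 1)
        xi = 0.8 * zi + 0.5 * ui + rng.gauss(0, 1)  # Cov(x, u) != 0, Cov(z, u) = 0
        z.append(zi)
        x.append(xi)
        y.append(beta0 + beta1 * xi + ui)
    return z, x, y

z, x, y = simulate()
b1_ols = ols_slope(x, y)
b1_iv = iv_slope(z, x, y)
b0_iv = sum(y) / len(y) - b1_iv * sum(x) / len(x)
print(f"OLS slope: {b1_ols:.3f}  IV slope: {b1_iv:.3f}  IV intercept: {b0_iv:.3f}")
```

With this design the OLS slope should settle well above the true value of 2, while the IV slope should be close to it. Passing `x` as its own instrument, `iv_slope(x, x, y)`, reproduces the OLS slope.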

4. Assumptions of the IV estimator

  1. Instrument relevance: $\mathrm{Cov}(z, x) \neq 0$
      • That is, $z$ is relevant for explaining variation in $x$.
      • We can test it by regressing $x$ on $z$, $x = \pi_0 + \pi_1 z + v$, and testing $H_0: \pi_1 = 0$.
  2. Instrument exogeneity: $\mathrm{Cov}(z, u) = 0$
      • This assumption guarantees consistency.
      • In general, we cannot test this directly from the data; we appeal to economic theory or introspection.
Instrumental Variable vs. Proxy Variable
Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$
  • $x_1$: the variable whose coefficient we care about.
  • $x_2$: a variable whose coefficient we do not care about.
  • $z$: the proxy variable or instrumental variable used to address the problem.
Problem: $x_2$ is not observed, and $\mathrm{Cov}(x_1, x_2) \neq 0$, which makes $x_2$ an omitted variable and leaves the OLS estimator $\hat\beta_1$ biased and inconsistent.
Assumptions & Intuition
  • Similarity: whether $z$ serves as a proxy variable or as an instrumental variable, we require it to be exogenous, i.e. $\mathrm{Cov}(z, u) = 0$.
  • Difference: the two approaches deal with the omitted variable $x_2$ in different ways.
    • As a proxy variable, we require $z$ to be correlated with the omitted variable $x_2$ (that is why $z$ is called a proxy). Besides, we require $E(x_2 \mid x_1, z) = E(x_2 \mid z)$: all of the part of $x_2$ that is correlated with $x_1$ is captured by $z$. Otherwise $\hat\beta_1$ will still be biased, because part of the omitted variable remains in the error term.
    • As an IV, we require $z$ to be uncorrelated with the omitted variable $x_2$. Otherwise $z$ would be correlated with the error $v = \beta_2 x_2 + u$, and we would not have $\mathrm{Cov}(z, v) = 0$ and hence $E(zv) = 0$. With a valid IV we do have $E(zv) = 0$ and can carry out the sample analogue.
A more straightforward difference is that a proxy variable stands in for a variable that cannot be measured, so the proxy actually appears in the final model. An IV is only used as an instrument and does not appear in the final model.
Example: Smoking and Birth Weights
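Potential IVs are: cigarette price, cigarette tax, etc.
In this case, the coefficient on cigarette price in the regression of smoking on price is positive rather than negative as demand theory would predict, which suggests that price is not meaningfully correlated with smoking (perhaps because of addiction). So cigarette price cannot be used as the IV.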

Properties and Inference with IV Estimator

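  • When $\mathrm{Cov}(x, u) = 0$, $\hat\beta^{OLS}$ is unbiased, consistent and BLUE.
  • When $\mathrm{Cov}(x, u) \neq 0$, $\hat\beta^{OLS}$ is biased and inconsistent.
  • Whether or not $\mathrm{Cov}(x, u) = 0$, if $\mathrm{Cov}(z, u) = 0$ then $\hat\beta^{IV}$ is biased but consistent, and has a larger variance than $\hat\beta^{OLS}$.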

1. Properties of IV Estimator

IV estimator is consistent
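A sketch of the consistency argument, assuming the simple model $y = \beta_0 + \beta_1 x + u$ and an instrument $z$ with $\mathrm{Cov}(z,u) = 0$ and $\mathrm{Cov}(z,x) \neq 0$ (the notation is an assumption of this sketch): substitute the model into the IV formula and apply the law of large numbers to the numerator and the denominator.

```latex
\hat\beta_1^{IV}
  = \frac{\sum_i (z_i-\bar z)(y_i-\bar y)}{\sum_i (z_i-\bar z)(x_i-\bar x)}
  = \beta_1 + \frac{\frac{1}{n}\sum_i (z_i-\bar z)\,u_i}{\frac{1}{n}\sum_i (z_i-\bar z)(x_i-\bar x)}
  \;\xrightarrow{\;p\;}\;
  \beta_1 + \frac{\mathrm{Cov}(z,u)}{\mathrm{Cov}(z,x)} = \beta_1
```

The last equality uses instrument exogeneity for the numerator and instrument relevance to keep the denominator away from zero.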
IV estimator is Biased
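Conditional on $z$:
$$E(\hat\beta_1^{IV} \mid z) = \beta_1 + E\left[\frac{\sum_i (z_i - \bar z) u_i}{\sum_i (z_i - \bar z)(x_i - \bar x)} \,\Big|\, z\right]$$
Since $x$ is not a constant given $z$, we cannot simplify further.
Conditional on both $z$ and $x$:
$$E(\hat\beta_1^{IV} \mid z, x) = \beta_1 + \frac{\sum_i (z_i - \bar z)\, E(u_i \mid z, x)}{\sum_i (z_i - \bar z)(x_i - \bar x)}$$
But $E(u \mid z, x) \neq 0$. Otherwise $E(u \mid x) = 0$ by the law of iterated expectations, which contradicts $\mathrm{Cov}(x, u) \neq 0$.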
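Variance of $\hat\beta_1^{IV}$
Add another assumption (homoskedasticity):
$$E(u^2 \mid z) = \sigma^2 = \mathrm{Var}(u)$$
It can be proved that under these assumptions the asymptotic variance of $\hat\beta_1^{IV}$ is
$$\frac{\sigma^2}{n\,\sigma_x^2\,\rho_{x,z}^2}$$
where $\sigma_x^2$ is the population variance of $x$, and $\rho_{x,z}^2$ is the square of the population correlation between $x$ and $z$.
  • $\sigma_x^2$ is estimated by the sample variance of $x$.
  • To estimate $\rho_{x,z}^2$, run the regression of $x$ on $z$ to obtain $R^2_{x,z}$.
  • To estimate $\sigma^2$, use the IV residuals: $\hat u_i = y_i - \hat\beta_0^{IV} - \hat\beta_1^{IV} x_i$
    • A consistent estimator of $\sigma^2$ is: $\hat\sigma^2 = \frac{1}{n-2}\sum_i \hat u_i^2$
Therefore,
$$\widehat{\mathrm{Var}}(\hat\beta_1^{IV}) = \frac{\hat\sigma^2}{SST_x \cdot R^2_{x,z}}$$
Note that the variance of the OLS estimator is
$$\mathrm{Var}(\hat\beta_1^{OLS}) = \frac{\sigma^2}{SST_x}$$
So the IV estimator has a larger variance, because $R^2_{x,z} < 1$.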
If $z$ and $x$ are only slightly correlated, then $R^2_{x,z}$ can be small, and this translates into a large sampling variance of the IV estimator.
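To see how the strength of the instrument feeds into the standard error, the sketch below applies the variance formula $\hat\sigma^2 / (SST_x \cdot R^2_{x,z})$ to two simulated data sets that differ only in how strongly $z$ drives $x$. The data-generating process, the `strength` parameter and the function names are assumptions of this example.

```python
import math
import random

def iv_with_se(z, x, y):
    """Return (IV slope, its standard error, R^2 of x on z) for a simple regression."""
    n = len(y)
    zb, xb, yb = sum(z) / n, sum(x) / n, sum(y) / n
    s_zx = sum((zi - zb) * (xi - xb) for zi, xi in zip(z, x))
    s_zy = sum((zi - zb) * (yi - yb) for zi, yi in zip(z, y))
    s_zz = sum((zi - zb) ** 2 for zi in z)
    sst_x = sum((xi - xb) ** 2 for xi in x)
    b1 = s_zy / s_zx
    b0 = yb - b1 * xb
    # IV residuals use the original x; sigma^2-hat = SSR / (n - 2).
    sigma2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    # In a simple regression of x on z, R^2 is the squared sample correlation.
    r2 = s_zx ** 2 / (s_zz * sst_x)
    return b1, math.sqrt(sigma2 / (sst_x * r2)), r2

def simulate(strength, n=20000, seed=1):
    """Hypothetical data; `strength` is the coefficient on z in the equation for x."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)
        ui = rng.gauss(0, 1)
        xi = strength * zi + 0.5 * ui + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(1.0 + 2.0 * xi + ui)
    return z, x, y

b_strong, se_strong, r2_strong = iv_with_se(*simulate(strength=0.8))
b_weak, se_weak, r2_weak = iv_with_se(*simulate(strength=0.1))
print(f"strong IV: R2={r2_strong:.3f}  se={se_strong:.4f}")
print(f"weak IV:   R2={r2_weak:.4f}  se={se_weak:.4f}")
```

The weaker instrument has a much smaller $R^2_{x,z}$, and its standard error is several times larger even though the sample size is the same.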

2. Extend to Multiple Regression Model

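Multiple Regression Model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$$
where $x_1$ is endogenous, $x_2$ is exogenous, and we have an IV $z$ for $x_1$.
Then use the moment conditions $E(u) = 0$, $E(x_2 u) = 0$, $E(zu) = 0$, whose sample analogues are:
$$\sum_i (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2}) = 0$$
$$\sum_i x_{i2} (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2}) = 0$$
$$\sum_i z_i (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2}) = 0$$
Assumption: $z$ is correlated with $x_1$ (even after partialling out $x_2$) and $\mathrm{Cov}(z, u) = 0$. In the reduced form
$$x_1 = \pi_0 + \pi_1 z + \pi_2 x_2 + v$$
relevance requires $\pi_1 \neq 0$.
Then we can estimate the reduced form using OLS and test $H_0: \pi_1 = 0$ with a $t$ test.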

Two Stage Least Squares

In practice, we may have more than one instrumental variable. Two-stage least squares (2SLS) uses a linear combination of the multiple IVs to construct a new IV.

1. Steps of 2SLS

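Consider $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$, where $x_1$ is endogenous with IVs $z_1, z_2$ and $x_2$ is exogenous.
Stage 1: estimate (using OLS)
$$x_1 = \pi_0 + \pi_1 z_1 + \pi_2 z_2 + \pi_3 x_2 + v$$
and calculate the fitted values $\hat x_1$.
  • We decompose the endogenous variable $x_1 = \hat x_1 + \hat v$ into two parts:
      1. $\hat x_1$, which is uncorrelated with $u$ because $z_1$, $z_2$ and $x_2$ are all uncorrelated with $u$.
      2. $\hat v$, which is correlated with $u$. It contains the "endogenous" part of $x_1$.
  • We should include all exogenous variables in stage 1, or there will be an omitted variable problem in stage 2.
    • Proof sketch: substituting $x_1 = \hat x_1 + \hat v$, the stage 2 equation has error $u + \beta_1 \hat v$. The residual $\hat v$ is orthogonal to $x_2$ by construction only if $x_2$ is a stage 1 regressor; if $x_2$ is left out, $\hat v$ can be correlated with $x_2$ and the stage 2 estimates are inconsistent.
Stage 2: use $\hat x_1$ as an IV for $x_1$. Or directly estimate (using OLS)
$$y = \beta_0 + \beta_1 \hat x_1 + \beta_2 x_2 + \text{error}$$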
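Proof: the two methods are equivalent. A sketch for the simple case without other regressors: using $\hat x$ as the IV for $x$ gives
$$\hat\beta_1 = \frac{\sum_i (\hat x_i - \bar{\hat x})(y_i - \bar y)}{\sum_i (\hat x_i - \bar{\hat x})(x_i - \bar x)}$$
Since $x_i = \hat x_i + \hat v_i$ and the OLS residuals satisfy $\sum_i \hat v_i = 0$ and $\sum_i \hat x_i \hat v_i = 0$, the denominator equals $\sum_i (\hat x_i - \bar{\hat x})^2$, so $\hat\beta_1$ is exactly the OLS slope from regressing $y$ on $\hat x$.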
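The two stages can be sketched in plain Python as below, for one endogenous regressor `x1`, one exogenous regressor `x2` and two instruments `z1`, `z2`. The simulated data, the coefficient values and the helpers `ols` and `fitted` are assumptions of this example; the standard errors from a manual second stage are not valid, so only coefficients are shown.

```python
import random

def ols(columns, y):
    """OLS of y on an intercept plus the regressors in `columns`.
    Returns [intercept, slope_1, slope_2, ...]."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
         + [sum(r[p] * yi for r, yi in zip(rows, y))] for p in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[pivot] = a[pivot], a[p]
        for r in range(k):
            if r != p:
                f = a[r][p] / a[p][p]
                a[r] = [v - f * w for v, w in zip(a[r], a[p])]
    return [a[p][k] / a[p][p] for p in range(k)]

def fitted(coefs, columns):
    """Fitted values for coefficients returned by `ols`."""
    return [coefs[0] + sum(c * col[i] for c, col in zip(coefs[1:], columns))
            for i in range(len(columns[0]))]

def simulate(n=5000, seed=2):
    """Hypothetical data: x1 is endogenous because it loads on the error u."""
    rng = random.Random(seed)
    names = ("y", "x1", "x2", "z1", "z2")
    data = {name: [] for name in names}
    for _ in range(n):
        z1, z2, x2, u = (rng.gauss(0, 1) for _ in range(4))
        x1 = 0.6 * z1 + 0.4 * z2 + 0.3 * x2 + 0.5 * u + rng.gauss(0, 1)
        y = 1.0 + 2.0 * x1 - 1.0 * x2 + u
        for name, value in zip(names, (y, x1, x2, z1, z2)):
            data[name].append(value)
    return data

d = simulate()
# Stage 1: regress x1 on ALL exogenous variables (instruments and x2).
exog = [d["z1"], d["z2"], d["x2"]]
x1_hat = fitted(ols(exog, d["x1"]), exog)
v_hat = [x - xh for x, xh in zip(d["x1"], x1_hat)]
# Stage 2: regress y on the fitted values and x2.
b_2sls = ols([x1_hat, d["x2"]], d["y"])
b_ols = ols([d["x1"], d["x2"]], d["y"])
print("2SLS:", [round(b, 3) for b in b_2sls])
print("OLS: ", [round(b, 3) for b in b_ols])
```

In this design the 2SLS coefficient on `x1` should be close to the true value of 2, while the OLS coefficient is pushed upward by the endogeneity. The first-stage residual `v_hat` is orthogonal to `x1_hat` by construction.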

2. Multiple Endogenous Variables

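For model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$$
where $x_1$ and $x_2$ are endogenous (with IVs $z_1$ and $z_2$ respectively), and $x_3$ is exogenous.
In the first stage, we need to include all IVs and exogenous variables on the right side:
$$x_1 = \pi_{10} + \pi_{11} z_1 + \pi_{12} z_2 + \pi_{13} x_3 + v_1$$
$$x_2 = \pi_{20} + \pi_{21} z_1 + \pi_{22} z_2 + \pi_{23} x_3 + v_2$$
STATA command:
ivregress 2sls y x3 (x1 x2 = z1 z2)
If we think there is heteroskedasticity, use
ivregress 2sls y x3 (x1 x2 = z1 z2), vce(robust)
Although we can carry out 2SLS step by step manually (using 3 STATA commands) and the coefficients will be the same as with the single command (ivregress), the standard errors of the estimators will be different, because the manual second stage computes its residuals from the fitted values rather than from the original endogenous variables. So in practice, just run the single command in STATA.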

Issues with IV

1. Sample Size

We decompose $x$ into the exogenous part $x^* = \pi_0 + \pi_1 z$ and the endogenous part $v$. But in the 2nd stage, we plug in $\hat x = \hat\pi_0 + \hat\pi_1 z$, not $x^*$. In a small sample, the estimates $\hat\pi$ could deviate a lot from the true values, which causes bias. But since IV is consistent, we can address the issue with a large sample.

2. Weak Instruments

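Weak instruments means low correlation between $z$ and $x$.
Suppose there is some small correlation between $z$ and $u$. The probability limit of the IV estimator is
$$\mathrm{plim}\, \hat\beta_1^{IV} = \beta_1 + \frac{\mathrm{Corr}(z, u)}{\mathrm{Corr}(z, x)} \cdot \frac{\sigma_u}{\sigma_x}$$
where $\sigma_u$ and $\sigma_x$ are the sd of $u$ and $x$ in the population respectively.
Besides,
$$\mathrm{plim}\, \hat\beta_1^{OLS} = \beta_1 + \mathrm{Corr}(x, u) \cdot \frac{\sigma_u}{\sigma_x}$$
So if $\mathrm{Corr}(z, x)$ is small enough, then even if $\mathrm{Corr}(z, u)$ is small, the IV estimator could have a larger asymptotic bias than the OLS estimator.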
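A small worked example of the comparison above. The correlation values are hypothetical numbers picked only to illustrate the arithmetic, and the function names are assumptions of this sketch.

```python
def iv_asymptotic_bias(corr_zu, corr_zx, sd_u, sd_x):
    """plim(beta1_IV) - beta1 = Corr(z,u) / Corr(z,x) * sd_u / sd_x."""
    return corr_zu / corr_zx * sd_u / sd_x

def ols_asymptotic_bias(corr_xu, sd_u, sd_x):
    """plim(beta1_OLS) - beta1 = Corr(x,u) * sd_u / sd_x."""
    return corr_xu * sd_u / sd_x

# Hypothetical population values: the instrument is almost exogenous
# (Corr(z,u) = 0.02) but weak (Corr(z,x) = 0.05); x is moderately
# endogenous (Corr(x,u) = 0.2).
iv_bias = iv_asymptotic_bias(corr_zu=0.02, corr_zx=0.05, sd_u=1.0, sd_x=1.0)
ols_bias = ols_asymptotic_bias(corr_xu=0.2, sd_u=1.0, sd_x=1.0)
print(f"IV asymptotic bias:  {iv_bias:.2f}")
print(f"OLS asymptotic bias: {ols_bias:.2f}")
```

With these numbers the IV bias is about 0.4, twice the OLS bias of about 0.2, even though the instrument is ten times less correlated with the error than $x$ is.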
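Even when $\mathrm{Cov}(z, u) = 0$, we have
$$\hat\beta_1^{IV} = \beta_1 + \frac{\frac{1}{n}\sum_i (z_i - \bar z) u_i}{\frac{1}{n}\sum_i (z_i - \bar z)(x_i - \bar x)}$$
The second term converges to 0 as the sample size grows. But in a small sample the numerator is not exactly 0, and a weak IV (a denominator close to 0) magnifies it a lot.
Compared with OLS, a weak instrument magnifies any such error once it arises.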
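Detecting Weak Instruments
Model
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$$
Assume $x_1$ is endogenous with two IVs $z_1, z_2$, and $x_2$ is exogenous. Then estimate:
$$x_1 = \pi_0 + \pi_1 z_1 + \pi_2 z_2 + \pi_3 x_2 + v$$
Test: $H_0: \pi_1 = \pi_2 = 0$
Rule of thumb: the F-stat should be larger than 10 to conclude there is no weak IV problem.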
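The first-stage F statistic can be computed from the restricted and unrestricted sums of squared residuals, $F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}$. The sketch below does this for simulated data with a strong and a weak pair of instruments; the data-generating process, the coefficient values and the helper names are assumptions of this example.

```python
import random

def ols(columns, y):
    """OLS of y on an intercept plus `columns`; returns [intercept, slopes...]."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    a = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
         + [sum(r[p] * yi for r, yi in zip(rows, y))] for p in range(k)]
    for p in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[pivot] = a[pivot], a[p]
        for r in range(k):
            if r != p:
                f = a[r][p] / a[p][p]
                a[r] = [v - f * w for v, w in zip(a[r], a[p])]
    return [a[p][k] / a[p][p] for p in range(k)]

def ssr(columns, y):
    """Sum of squared OLS residuals."""
    coefs = ols(columns, y)
    return sum((yi - coefs[0] - sum(c * col[i] for c, col in zip(coefs[1:], columns))) ** 2
               for i, yi in enumerate(y))

def first_stage_f(x1, instruments, controls):
    """F statistic for H0: all instrument coefficients are zero in the first stage."""
    n = len(x1)
    ssr_ur = ssr(instruments + controls, x1)   # unrestricted: instruments + controls
    ssr_r = ssr(controls, x1)                  # restricted: controls only
    q = len(instruments)
    df = n - len(instruments) - len(controls) - 1
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df)

def simulate(pi1, pi2, n=2000, seed=3):
    """Hypothetical first stage: x1 = pi1*z1 + pi2*z2 + 0.5*x2 + v."""
    rng = random.Random(seed)
    x1, z1, z2, x2 = [], [], [], []
    for _ in range(n):
        a, b, c, v = (rng.gauss(0, 1) for _ in range(4))
        z1.append(a)
        z2.append(b)
        x2.append(c)
        x1.append(pi1 * a + pi2 * b + 0.5 * c + v)
    return x1, [z1, z2], [x2]

f_strong = first_stage_f(*simulate(pi1=0.3, pi2=0.2))
f_weak = first_stage_f(*simulate(pi1=0.02, pi2=0.01))
print(f"F (strong instruments): {f_strong:.1f}")
print(f"F (weak instruments):   {f_weak:.1f}")
```

The strong instruments should clear the rule-of-thumb threshold of 10 by a wide margin, while the weak ones give a much smaller statistic.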

3. Testing for Endogeneity

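Suppose $x_1$ is endogenous in $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$, and the IV is $z$.
In the 1st stage of 2SLS, $x_1 = \pi_0 + \pi_1 z + \pi_2 x_2 + v$, we know that if $x_1$ is correlated with $u$, it must be that $v$ is correlated with $u$. So regress $x_1$ on $z$ and $x_2$ to get $\hat v$, and regress
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \delta \hat v + \text{error}$$
Test $H_0: \delta = 0$ (which means $x_1$ is exogenous)
STATA command: estat endogenous
If p < significance level, reject $H_0$ and conclude that $x_1$ is endogenous.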

4. Testing Overidentification Restrictions

The key assumption for a valid IV is $\mathrm{Cov}(z, u) = 0$, which cannot be directly tested using the data. But when we have more IVs than endogenous explanatory variables, we can provide more evidence on whether this is true.
Suppose $x_1$ is endogenous, and the IVs are $z_1$ and $z_2$. Using each of them in turn, we calculate two IV estimates of $\beta_1$, one with $z_1$ and one with $z_2$. If the two estimates are very different, then at least one of the IVs does not satisfy $\mathrm{Cov}(z, u) = 0$. If they are close, maybe both IVs satisfy it, or maybe neither does.
When there are many parameters and IVs, it is cumbersome to compare the estimates one by one. Instead we can compute a test statistic based on the 2SLS residuals. The idea is that, if all instruments are exogenous, the 2SLS residuals should be uncorrelated with the instruments, up to sampling error.
Therefore, the test checks whether the 2SLS residuals are correlated with linear functions of the instruments.
Steps of Testing Overidentification Restrictions
  1. Estimate the model by 2SLS and obtain the 2SLS residuals, $\hat u$.
  2. Regress $\hat u$ on all exogenous variables. Obtain the $R^2$.
  3. Under the $H_0$ that all IVs are uncorrelated with $u$, $nR^2 \overset{a}{\sim} \chi^2_q$, where $q$ is the number of IVs from outside the model minus the total number of endogenous explanatory variables. Reject $H_0$ if $nR^2$ exceeds the critical value of the $\chi^2_q$ distribution.
STATA command: estat overid
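The three steps can be sketched in plain Python for one endogenous regressor and two instruments, so that $q = 1$. The simulated data, in which the second instrument is either valid or enters the error term directly, and all helper names are assumptions of this example. It computes only the $nR^2$ statistic, to be compared with the 5% critical value of $\chi^2_1$, about 3.84.

```python
import random

def ols(columns, y):
    """OLS of y on an intercept plus `columns`; returns [intercept, slopes...]."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    a = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
         + [sum(r[p] * yi for r, yi in zip(rows, y))] for p in range(k)]
    for p in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[pivot] = a[pivot], a[p]
        for r in range(k):
            if r != p:
                f = a[r][p] / a[p][p]
                a[r] = [v - f * w for v, w in zip(a[r], a[p])]
    return [a[p][k] / a[p][p] for p in range(k)]

def fitted(coefs, columns):
    return [coefs[0] + sum(c * col[i] for c, col in zip(coefs[1:], columns))
            for i in range(len(columns[0]))]

def overid_stat(y, x1, instruments):
    """n * R^2 from regressing the 2SLS residuals on the instruments."""
    n = len(y)
    # Step 1: 2SLS; residuals are built from the original x1, not the fitted values.
    x1_hat = fitted(ols(instruments, x1), instruments)
    b0, b1 = ols([x1_hat], y)
    u_hat = [yi - b0 - b1 * xi for yi, xi in zip(y, x1)]
    # Step 2: regress the residuals on all exogenous variables and get R^2.
    fit = fitted(ols(instruments, u_hat), instruments)
    mean_u = sum(u_hat) / n
    sst = sum((ui - mean_u) ** 2 for ui in u_hat)
    ssr = sum((ui - fi) ** 2 for ui, fi in zip(u_hat, fit))
    # Step 3: the test statistic n * R^2.
    return n * (1.0 - ssr / sst)

def simulate(z2_valid, n=3000, seed=4):
    """Hypothetical data; when z2_valid is False, z2 enters the error term u."""
    rng = random.Random(seed)
    y, x1, z1, z2 = [], [], [], []
    for _ in range(n):
        a, b, e_u, e_x = (rng.gauss(0, 1) for _ in range(4))
        u = e_u if z2_valid else 0.5 * b + e_u
        x = 0.6 * a + 0.6 * b + 0.5 * e_u + e_x
        y.append(1.0 + 2.0 * x + u)
        x1.append(x)
        z1.append(a)
        z2.append(b)
    return y, x1, [z1, z2]

stat_valid = overid_stat(*simulate(z2_valid=True))
stat_invalid = overid_stat(*simulate(z2_valid=False))
print(f"n*R^2 with valid instruments:      {stat_valid:.2f}")
print(f"n*R^2 with one invalid instrument: {stat_invalid:.2f}")
```

When one instrument is correlated with the error, the statistic should be far above 3.84 and the null is rejected. As noted above, a small statistic does not prove that both instruments are valid.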
Cookbook of Using IV:
  1. First determine the endogenous variables of the model using economic theory, common sense, or the endogeneity test.
  2. Draw a large sample.
  3. Find valid IVs that:
      • Satisfy $\mathrm{Cov}(z, x) \neq 0$: test for weak instruments.
      • Satisfy $\mathrm{Cov}(z, u) = 0$: judge using economic theory etc. or the overidentifying restrictions test.
  4. Having identified the endogenous variables and their IVs, use STATA to estimate the model.
