T2. Endogeneity

1. Definition & Influence

1.1 Endogeneity

An endogenous variable is a regressor $x$ that is correlated with the error term $u$, that is, $\mathrm{Cov}(x, u) \neq 0$.
An exogenous variable is a regressor $x$ that is uncorrelated with $u$, that is, $\mathrm{Cov}(x, u) = 0$.
Endogeneity: the correlation between $x$ and $u$ implies that the ceteris paribus assumption does not hold, where ceteris paribus is a Latin phrase meaning "all other things being equal".

1.2 Influence of endogeneity

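When $\mathrm{Cov}(x, u) \neq 0$, the consequence is that the OLS estimator is biased and inconsistent.
For the simple linear regression
$$y = \beta_0 + \beta_1 x + u$$
then
$$\operatorname{plim} \hat{\beta}_1 = \beta_1 + \frac{\mathrm{Cov}(x, u)}{\mathrm{Var}(x)} \neq \beta_1$$
If $\mathrm{Cov}(x, u) > 0$, the effect is as shown in the figure below: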
The red/solid line is the true population. The blue/dotted line is the fitted line. Because the errors are positively correlated with the regressor, the fitted OLS line is steeper than the true line: positive bias.
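The upward bias can be checked with a small simulation. This is a minimal sketch, not part of the original notes: the parameter values, the helper `ols_slope`, and the way $x$ is made to depend on $u$ (through a hypothetical weight `rho`) are all assumptions chosen for illustration.

```python
import random

def ols_slope(x, y):
    """Slope of the least-squares line of y on x: sample Cov(x, y) / Var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)      # fixed seed so the run is reproducible
n = 50_000
beta0, beta1 = 1.0, 2.0     # assumed true intercept and slope
rho = 0.5                   # assumed weight of u in x

x, y = [], []
for _ in range(n):
    u = rng.gauss(0, 1)
    xi = rho * u + rng.gauss(0, 1)   # Cov(x, u) = rho > 0, so x is endogenous
    x.append(xi)
    y.append(beta0 + beta1 * xi + u)

slope = ols_slope(x, y)
# Population prediction: beta1 + Cov(x, u) / Var(x), with Var(x) = rho^2 + 1
predicted = beta1 + rho / (rho ** 2 + 1)
print(f"true slope {beta1:.2f}, OLS slope {slope:.3f}, predicted limit {predicted:.3f}")
```

With these assumed values the fitted slope should settle near the predicted limit rather than near the true slope, which is the steeper fitted line described above.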

2. Sources of Endogeneity

Main sources of endogeneity include Omitted variable bias (OVB), Wrong functional form, Measurement error, Simultaneous causality, Sample selection, etc.

2.1 Omitted variable bias (OVB)

2.1.1 Definition
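Suppose the true model is
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u, \quad \mathrm{Cov}(x_1, u) = 0$$
When $x_2$ is omitted, we have
$$y = \beta_0 + \beta_1 x_1 + v, \quad v = \beta_2 x_2 + u$$
Now
$$\operatorname{plim} \hat{\beta}_1 = \beta_1 + \beta_2 \frac{\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)} \neq \beta_1$$
if $\beta_2 \neq 0$ and $\mathrm{Cov}(x_1, x_2) \neq 0$.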
The intuitive reason is that, in addition to its direct effect $\beta_1$, the included regressor $x_1$ has an apparent indirect effect as a consequence of acting as a proxy for the missing variable $x_2$. The strength of the proxy effect depends on two factors: the strength of the effect of $x_2$ on $y$, which is given by $\beta_2$, and the ability of $x_1$ to mimic $x_2$, i.e. $\mathrm{Cov}(x_1, x_2) / \mathrm{Var}(x_1)$.
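For example, with $x_2$ the omitted variable and $\beta_2$ its coefficient:
  • when $\beta_2 \mathrm{Cov}(x_1, x_2) > 0$, $\hat{\beta}_1$ has a positive bias;
  • when $\beta_2 \mathrm{Cov}(x_1, x_2) < 0$, $\hat{\beta}_1$ has a negative bias.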
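The sign rule can be illustrated numerically. The sketch below is an illustration under assumed values, not the notes' own example: it generates $y$ from an included regressor `x1` and an omitted regressor `x2`, regresses $y$ on `x1` alone, and compares the slope with the limit $\beta_1 + \beta_2 \mathrm{Cov}(x_1, x_2) / \mathrm{Var}(x_1)$. The function names and the parameter `delta` (the assumed loading of `x1` on `x2`) are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope of the least-squares line of y on x: sample Cov(x, y) / Var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def short_regression_slope(beta2, delta, n=50_000, seed=0):
    """Regress y on x1 only, although the true model also contains x2."""
    rng = random.Random(seed)
    beta0, beta1 = 1.0, 2.0                      # assumed true parameters
    x1, y = [], []
    for _ in range(n):
        x2 = rng.gauss(0, 1)
        a = delta * x2 + rng.gauss(0, 1)         # Cov(x1, x2) = delta
        u = rng.gauss(0, 1)
        x1.append(a)
        y.append(beta0 + beta1 * a + beta2 * x2 + u)
    return ols_slope(x1, y)

def predicted_limit(beta2, delta, beta1=2.0):
    """beta1 + beta2 * Cov(x1, x2) / Var(x1), with Var(x1) = delta^2 + 1."""
    return beta1 + beta2 * delta / (delta ** 2 + 1)

pos = short_regression_slope(beta2=3.0, delta=0.6)    # beta2 * Cov > 0
neg = short_regression_slope(beta2=3.0, delta=-0.6)   # beta2 * Cov < 0
print(f"positive case: {pos:.3f} vs limit {predicted_limit(3.0, 0.6):.3f}")
print(f"negative case: {neg:.3f} vs limit {predicted_limit(3.0, -0.6):.3f}")
```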
2.1.2 Solutions to OVB
  • If the variable can be measured, include it as an additional regressor in multiple regression
  • Possibly, use panel data in which each entity (individual) is observed more than once
  • If the variable cannot be measured, use instrumental variable (IV) regression
  • If the variable cannot be measured, use proxy variable (another variable which is correlated with the omitted variable but can be measured and easily accessed)
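    • Good proxy variables $z$ should satisfy two conditions: $z$ is uncorrelated with the error term $u$ of the true model, and the part of the omitted variable not explained by $z$ is uncorrelated with the included regressors;
      then regressing $y$ on the included regressors and $z$ gives consistent estimates of the coefficients on the included regressors.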

2.2 Wrong Functional Form

2.2.1 Definition
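Wrong functional form arises if the functional form used in the regression is incorrect. For example, the true relationship between $y$ and $x$ is
$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + u$$
If we run a regression
$$y = \beta_0 + \beta_1 x + v$$
Then
$$v = \beta_2 x^2 + u$$
and, in general,
$$\mathrm{Cov}(x, v) = \beta_2 \mathrm{Cov}(x, x^2) \neq 0$$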
2.2.2 Testing
To test whether there are omitted nonlinear terms, we can follow the steps below:
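  1. Regress $y$ on $x$ and the candidate nonlinear terms, for example $y = \beta_0 + \beta_1 x + \gamma_1 x^2 + \gamma_2 x^3 + e$
      • test whether $\gamma_1 = \gamma_2 = 0$. If this hypothesis is not rejected, there is no evidence of omitted nonlinear terms. Otherwise, there is.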
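A test of this kind can be sketched in a few lines. The version below is a simplified illustration that assumes a single candidate term, $x^2$, and uses a t statistic on its coefficient; the data-generating values and the function names are hypothetical. To stay within the standard library, the coefficient on $x^2$ is computed by partialling out $x$ (the Frisch-Waugh-Lovell result), which gives the same estimate and residuals as the regression of $y$ on a constant, $x$ and $x^2$.

```python
import random

def residuals(y, x):
    """Residuals from the simple regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((a - mx) * (c - my) for a, c in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
    return [c - my - b * (a - mx) for a, c in zip(x, y)]

def quad_t_stat(x, y):
    """t statistic for gamma in y = b0 + b1*x + gamma*x^2 + e."""
    x2 = [a * a for a in x]
    ry = residuals(y, x)        # y with the constant and x partialled out
    rx2 = residuals(x2, x)      # x^2 with the constant and x partialled out
    sxx = sum(r * r for r in rx2)
    gamma = sum(a * b for a, b in zip(rx2, ry)) / sxx
    rss = sum((a - gamma * b) ** 2 for a, b in zip(ry, rx2))
    se = (rss / (len(x) - 3) / sxx) ** 0.5   # 3 estimated coefficients
    return gamma / se

rng = random.Random(1)
n = 2_000
x = [rng.uniform(0, 4) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
y_linear = [1 + 2 * a + e for a, e in zip(x, u)]                 # truly linear
y_quad = [1 + 2 * a + 0.5 * a * a + e for a, e in zip(x, u)]     # omitted x^2

t_linear = quad_t_stat(x, y_linear)
t_quad = quad_t_stat(x, y_quad)
print(f"t statistic, linear truth: {t_linear:.2f}; quadratic truth: {t_quad:.2f}")
```

A t statistic far from zero in the second case signals an omitted nonlinear term; with several candidate terms, a joint F test would replace the single t test.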
2.2.3 Solutions to functional form misspecification
  • For a continuous dependent variable: use "appropriate" nonlinear specifications in $x$ (logarithms, interactions, etc.)
  • For a discrete (e.g. binary) dependent variable: an extension of multiple regression methods is needed ("probit" or "logit" analysis for binary dependent variables)
  • Other nonparametric econometric methods

2.3 Measurement Error

2.3.1 Definition
In reality, economic data often contain measurement error, for several reasons:
  • Data entry errors in administrative data
  • Recollection errors in surveys (e.g. when did you start your current job?)
  • Ambiguous questions (e.g. what was your income last year?)
  • Intentionally false response problems with surveys (e.g. What is the current value of your financial assets?)
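Assume the model we want to estimate is
$$y = \beta_0 + \beta_1 x^* + u$$
but we can only access a measurement $x$, which differs from the true value $x^*$ by an error $e$, i.e. $x = x^* + e$. It is intuitive to assume:
$$E(e) = 0, \quad \mathrm{Cov}(x^*, e) = 0, \quad \mathrm{Cov}(e, u) = 0$$
Then
$$y = \beta_0 + \beta_1 x + (u - \beta_1 e)$$
and the OLS estimate of $\beta_1$ satisfies
$$\operatorname{plim} \hat{\beta}_1 = \beta_1 + \frac{\mathrm{Cov}(x, u - \beta_1 e)}{\mathrm{Var}(x)} = \beta_1 \frac{\sigma_{x^*}^2}{\sigma_{x^*}^2 + \sigma_e^2}$$
The bias is called attenuation bias, a bias towards zero (the estimated coefficient is always smaller in absolute value):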
  • When $\beta_1 < 0$, the OLS estimator is biased upward (positive bias, the estimated $\beta_1$ tends to be larger)
  • When $\beta_1 > 0$, the OLS estimator is biased downward (negative bias, the estimated $\beta_1$ tends to be smaller)
The explanation for the bias towards zero is that we are trying to use the association between the measured $x$ and $y$ to capture the strength of the causal link between the true $x^*$ and $y$. However, due to the presence of the noise $e$, the association is a dampened measure (smaller in absolute value) of the causal link.
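Attenuation can be seen in a short simulation. This is a minimal sketch under assumed values: the true regressor and the measurement error are both drawn with unit variance, so the slope should shrink by a factor of about $\sigma_{x^*}^2 / (\sigma_{x^*}^2 + \sigma_e^2) = 0.5$ whatever the sign of the true coefficient. The function names are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope of the least-squares line of y on x: sample Cov(x, y) / Var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def slope_with_noisy_x(beta1, sd_true=1.0, sd_err=1.0, n=50_000, seed=0):
    """OLS slope of y on the mismeasured regressor x = x_true + e."""
    rng = random.Random(seed)
    x_obs, y = [], []
    for _ in range(n):
        x_true = rng.gauss(0, sd_true)
        e = rng.gauss(0, sd_err)          # measurement error, independent of x_true
        u = rng.gauss(0, 1)
        x_obs.append(x_true + e)          # what the researcher observes
        y.append(1.0 + beta1 * x_true + u)
    return ols_slope(x_obs, y)

shrink = 1.0 / (1.0 + 1.0)                # sd_true^2 / (sd_true^2 + sd_err^2)
slope_pos = slope_with_noisy_x(beta1=2.0)
slope_neg = slope_with_noisy_x(beta1=-2.0)
print(f"beta1 =  2: slope {slope_pos:.3f}, predicted {2.0 * shrink:.3f}")
print(f"beta1 = -2: slope {slope_neg:.3f}, predicted {-2.0 * shrink:.3f}")
```

In both cases the estimate moves towards zero: downward for the positive coefficient and upward for the negative one.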
2.3.2 Solutions
  • Obtain better data
  • Develop a specific model of the measurement error process
    • This is only possible if a lot is known about the nature of the measurement error
  • Instrumental variable (IV) regression
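Supplement:
When there is noise in $y$, that is, we can only access a measurement $y = y^* + w$, where $w$ is a random error uncorrelated with $x$, then
$$y = \beta_0 + \beta_1 x + (u + w)$$
and $\hat{\beta}_1$ is still a consistent and unbiased estimator of $\beta_1$, but it has a larger variance (recall that $\mathrm{Var}(\hat{\beta}_1) = \sigma^2 / \sum_i (x_i - \bar{x})^2$, where $\sigma^2$ is the variance of the composite error $u + w$)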
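A small Monte Carlo experiment illustrates the contrast with error in the regressor. It is a sketch under assumed values (sample size, number of replications, and a noise standard deviation of 2 are arbitrary choices), and `slope_draws` is a hypothetical helper: it re-estimates the slope on many simulated samples, once with $y$ observed exactly and once with noise added to $y$.

```python
import random
from statistics import mean, pvariance

def ols_slope(x, y):
    """Slope of the least-squares line of y on x: sample Cov(x, y) / Var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def slope_draws(noise_sd, reps=2_000, n=50, seed=0):
    """OLS slopes over many samples when y is observed with noise of sd noise_sd."""
    rng = random.Random(seed)
    beta0, beta1 = 1.0, 2.0               # assumed true parameters
    draws = []
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        # observed y = true y plus measurement noise that is independent of x
        y = [beta0 + beta1 * a + rng.gauss(0, 1) + rng.gauss(0, noise_sd) for a in x]
        draws.append(ols_slope(x, y))
    return draws

clean = slope_draws(noise_sd=0.0)
noisy = slope_draws(noise_sd=2.0)
print(f"no noise in y: mean {mean(clean):.3f}, variance {pvariance(clean):.4f}")
print(f"noise in y:    mean {mean(noisy):.3f}, variance {pvariance(noisy):.4f}")
```

Both sets of estimates should be centred on the true slope, while the spread is visibly larger when $y$ is noisy.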

2.4 Simultaneity

Definition
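Endogeneity can also arise in structural models, for example the supply and demand model.
There are two variables: quantity $q$ and price $p$.
  • (D): $q^d = \alpha_0 + \alpha_1 p + u$, $\alpha_1 < 0$
  • (S): $q^s = \beta_0 + \beta_1 p + v$, $\beta_1 > 0$
In market equilibrium, $q^d = q^s = q$. In addition, we assume that price and quantity are both endogenous, i.e. they are determined simultaneously within the model.
From the market equilibrium condition, we have
$$\alpha_0 + \alpha_1 p + u = \beta_0 + \beta_1 p + v$$
thus
$$p = \frac{\alpha_0 - \beta_0 + u - v}{\beta_1 - \alpha_1}$$
and thus, assuming $\mathrm{Cov}(u, v) = 0$,
$$\mathrm{Cov}(p, u) = \frac{\sigma_u^2}{\beta_1 - \alpha_1} \neq 0, \quad \mathrm{Cov}(p, v) = -\frac{\sigma_v^2}{\beta_1 - \alpha_1} \neq 0$$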
We can explain the endogeneity from another perspective.
The demand and supply shocks $u$ and $v$ can be regarded as inputs to a (market) system, and the price $p$ and quantity $q$ can be regarded as its outputs. In general, $p$ will be correlated with both $u$ and $v$.
Furthermore, if we run the regression
$$q = \gamma_0 + \gamma_1 p + \varepsilon$$
using market data, then we get something that is a mix of the supply and demand curves: $\hat{\gamma}_1$ tends to be between the demand slope $\alpha_1$ and the supply slope $\beta_1$.
$p$ has two effects on $q$:
  • For producers, a larger $p$ causes $q^s$ to increase
  • For consumers, a larger $p$ causes $q^d$ to decrease
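Example
Assume the demand curve is $q = \alpha_0 + \alpha_1 p + u$ and the supply curve is $q = \beta_0 + \beta_1 p + v$, with $\mathrm{Var}(u) = \sigma_u^2$, $\mathrm{Var}(v) = \sigma_v^2$ and $\mathrm{Cov}(u, v) = 0$.
Regress $q = \gamma_0 + \gamma_1 p + \varepsilon$, prove that
$$\operatorname{plim} \hat{\gamma}_1 = \frac{\alpha_1 \sigma_v^2 + \beta_1 \sigma_u^2}{\sigma_u^2 + \sigma_v^2}$$
According to market equilibrium, $q^d = q^s$, that is $\alpha_0 + \alpha_1 p + u = \beta_0 + \beta_1 p + v$. Therefore, $p = \frac{\alpha_0 - \beta_0 + u - v}{\beta_1 - \alpha_1}$.
since
$$\mathrm{Var}(p) = \frac{\sigma_u^2 + \sigma_v^2}{(\beta_1 - \alpha_1)^2}, \quad \mathrm{Cov}(p, q) = \alpha_1 \mathrm{Var}(p) + \mathrm{Cov}(p, u) = \alpha_1 \mathrm{Var}(p) + \frac{\sigma_u^2}{\beta_1 - \alpha_1}$$
therefore
$$\operatorname{plim} \hat{\gamma}_1 = \frac{\mathrm{Cov}(p, q)}{\mathrm{Var}(p)} = \alpha_1 + \frac{(\beta_1 - \alpha_1) \sigma_u^2}{\sigma_u^2 + \sigma_v^2} = \frac{\alpha_1 \sigma_v^2 + \beta_1 \sigma_u^2}{\sigma_u^2 + \sigma_v^2}$$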
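The "mix of supply and demand" can be reproduced by simulation. The sketch below assumes a linear demand curve $q = \alpha_0 + \alpha_1 p + u$ and a linear supply curve $q = \beta_0 + \beta_1 p + v$ with independent shocks; all numerical values and variable names are illustrative assumptions. Under these assumptions the slope from regressing $q$ on $p$ should approach the variance-weighted average $(\alpha_1 \sigma_v^2 + \beta_1 \sigma_u^2) / (\sigma_u^2 + \sigma_v^2)$, which lies between the two slopes.

```python
import random

def ols_slope(x, y):
    """Slope of the least-squares line of y on x: sample Cov(x, y) / Var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)
n = 50_000
alpha0, alpha1 = 10.0, -1.0   # assumed demand intercept and slope (alpha1 < 0)
beta0, beta1 = 2.0, 1.0       # assumed supply intercept and slope (beta1 > 0)
sd_u, sd_v = 1.0, 2.0         # assumed sd of the demand and supply shocks

p, q = [], []
for _ in range(n):
    u = rng.gauss(0, sd_u)
    v = rng.gauss(0, sd_v)
    # equilibrium price solves alpha0 + alpha1*p + u = beta0 + beta1*p + v
    price = (alpha0 - beta0 + u - v) / (beta1 - alpha1)
    p.append(price)
    q.append(beta0 + beta1 * price + v)   # quantity read off the supply curve

slope = ols_slope(p, q)
weighted = (alpha1 * sd_v ** 2 + beta1 * sd_u ** 2) / (sd_u ** 2 + sd_v ** 2)
print(f"OLS slope {slope:.3f}, weighted average {weighted:.3f}, "
      f"demand slope {alpha1}, supply slope {beta1}")
```

Because the supply shocks are assumed to be the more variable ones here, the fitted slope sits closer to the demand slope; reversing the two standard deviations would pull it towards the supply slope.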
