
Ch9: Proxy Variable and Measurement Error

Zero conditional mean condition: $E(u \mid x_1, \dots, x_k) = 0$.
  • $x_j$ is endogenous if it is correlated with $u$.
  • $x_j$ is exogenous if it is not correlated with $u$.
Violating the zero conditional mean condition will cause the OLS estimator to be biased and inconsistent.

Proxy Variable

1. Necessity of Proxy Variable

Example
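Consider the model: $\log(wage) = \beta_0 + \beta_1 educ + \beta_2 abil + u$.
Assume that $E(u \mid educ, abil) = 0$. Our primary interest is to estimate $\beta_1$ consistently; we do not really care about $\beta_2$. However, we do not have data on $abil$, so we regress $\log(wage)$ on $educ$ only. If $\beta_2 \neq 0$ and $Cov(educ, abil) \neq 0$, there will be omitted variable bias.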
We can use a proxy variable to address omitted variable bias.
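The omitted variable bias described above can be checked with a small simulation. This is a minimal sketch, assuming a wage-style model with one included regressor (`educ`) and one omitted regressor (`abil`); the variable names, coefficient values, sample size and the `ols` helper are all assumptions made for illustration, not taken from the notes.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, ...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 100_000
beta0, beta1, beta2 = 1.0, 0.5, 0.8      # hypothetical population values

abil = rng.normal(size=n)                # Var(abil) = 1
educ = 0.6 * abil + rng.normal(size=n)   # educ is correlated with abil
u = rng.normal(size=n)                   # satisfies E(u | educ, abil) = 0
y = beta0 + beta1 * educ + beta2 * abil + u

b_short = ols(y, educ)        # abil omitted
b_long = ols(y, educ, abil)   # infeasible in practice: abil is unobserved

# Textbook omitted-variable formula: beta1 + beta2 * Cov(educ, abil) / Var(educ)
expected_short = beta1 + beta2 * 0.6 / (0.6**2 + 1.0)

print(f"slope on educ, abil omitted : {b_short[1]:.3f}")
print(f"omitted-variable formula    : {expected_short:.3f}")
print(f"slope on educ, abil included: {b_long[1]:.3f}")
```

In this design the short regression should land near the value given by the omitted-variable formula rather than near `beta1`, while the long regression should recover `beta1`.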

2. Definition and Assumptions

A proxy variable is a variable related to the unobserved variable that we would like to control for in our analysis.
For the above example, we can use $IQ$ as a proxy variable. We don't require it to be the same thing as $abil$; we only require $IQ$ to be correlated with $abil$.
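Formally, we have a model
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3^* + u$$
Assume that $E(u \mid x_1, x_2, x_3^*) = 0$. Moreover, $x_1$ and $x_2$ are observed while $x_3^*$ is not. We have a proxy variable $x_3$ for $x_3^*$, which satisfies:
$$x_3^* = \delta_0 + \delta_3 x_3 + v_3$$
where $v_3$ is an error term that allows for the possibility that $x_3^*$ and $x_3$ are not exactly related. Additionally, $\delta_3 \neq 0$, so that $x_3$ is genuinely related to $x_3^*$.
Then we can replace the omitted variable by the proxy variable:
$$y = (\beta_0 + \beta_3 \delta_0) + \beta_1 x_1 + \beta_2 x_2 + \beta_3 \delta_3 x_3 + u + \beta_3 v_3 \quad (1)$$
To get unbiased and consistent estimators for $\beta_1$ and $\beta_2$, we require:
$$E(u + \beta_3 v_3 \mid x_1, x_2, x_3) = 0$$
Break this down into two assumptions: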
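  1. $E(u \mid x_1, x_2, x_3) = 0$:
      • The proxy variable should be exogenous. Intuitively, since $x_3^*$ is exogenous, a proxy that is related to it is only useful if it is also exogenous.
  2. $E(v_3 \mid x_1, x_2, x_3) = 0$:
      • This is equivalent to $E(x_3^* \mid x_1, x_2, x_3) = E(x_3^* \mid x_3) = \delta_0 + \delta_3 x_3$. Once $x_3$ is controlled for, the expected value of $x_3^*$ does not depend on $x_1$ or $x_2$. (That is, any part of $x_3^*$ that is related to $x_1$ and $x_2$ is already captured by $x_3$.)
      • Otherwise, $\hat\beta_1$ and $\hat\beta_2$ will be biased and inconsistent.
        • Desc: Suppose $x_3^*$ is related to $x_1$, $x_2$ and $x_3$, that is, $x_3^* = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \delta_3 x_3 + v_3$, where $E(v_3 \mid x_1, x_2, x_3) = 0$. Plugging this into the regression equation, we get
          $$y = (\beta_0 + \beta_3 \delta_0) + (\beta_1 + \beta_3 \delta_1) x_1 + (\beta_2 + \beta_3 \delta_2) x_2 + \beta_3 \delta_3 x_3 + u + \beta_3 v_3$$
          We can show that $\operatorname{plim}(\hat\beta_1) = \beta_1 + \beta_3 \delta_1$ and $\operatorname{plim}(\hat\beta_2) = \beta_2 + \beta_3 \delta_2$. So if $\beta_3 \neq 0$ and $\delta_1 \neq 0$ (or $\delta_2 \neq 0$), the corresponding estimator is inconsistent.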
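The failure of the second assumption can be illustrated numerically. The sketch below assumes a simplified version of the model with a single observed regressor `x1` (no `x2`) and lets the unobserved variable depend on `x1` even after the proxy `x3` is controlled for. All names and parameter values are hypothetical choices for illustration.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, ...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(2)
n = 100_000
beta0, beta1, beta3 = 1.0, 0.5, 0.8        # hypothetical structural parameters
delta0, delta1, delta3 = 0.2, 0.4, 0.7     # delta1 != 0 violates assumption 2

x1 = rng.normal(size=n)
x3 = 0.5 * x1 + rng.normal(size=n)         # observed proxy
v3 = rng.normal(scale=0.5, size=n)         # E(v3 | x1, x3) = 0 by construction
x3_star = delta0 + delta1 * x1 + delta3 * x3 + v3   # unobserved variable
u = rng.normal(size=n)
y = beta0 + beta1 * x1 + beta3 * x3_star + u

b = ols(y, x1, x3)                         # regression using the proxy
limit_b1 = beta1 + beta3 * delta1          # probability limit of the x1 slope
limit_b3 = beta3 * delta3                  # probability limit of the proxy slope

print(f"slope on x1: {b[1]:.3f}  (beta1 = {beta1}, limit = {limit_b1:.2f})")
print(f"slope on x3: {b[2]:.3f}  (limit = {limit_b3:.2f})")
```

With `delta1` set to zero instead, the same regression should recover `beta1`, which is the content of the second assumption.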
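For the example above, we estimate the model using the proxy variable $IQ$:
$$\log(wage) = \alpha_0 + \beta_1 educ + \alpha_3 IQ + e$$
The two assumptions are thus:
  1. $E(u \mid educ, IQ) = 0$. That is, $IQ$ is exogenous in the wage equation.
  2. $E(abil \mid educ, IQ) = E(abil \mid IQ) = \delta_0 + \delta_3 IQ$. That is, the average level of ability only changes with $IQ$, not with education.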
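In equation (1):
  • $\alpha_0 = \beta_0 + \beta_3 \delta_0$ is the new intercept.
  • $\alpha_3 = \beta_3 \delta_3$ is the slope parameter on the proxy variable $x_3$.
  • Under the two assumptions, we do not get unbiased estimators of $\beta_0$ and $\beta_3$. The important thing is that we get unbiased estimators of $\beta_1$ and $\beta_2$.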
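The claims about equation (1) can be checked with a short simulation. This sketch assumes a simplified model with one observed regressor `x1` and a proxy `x3` that satisfies both assumptions by construction; the parameter values and helper names are hypothetical.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, ...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 100_000
beta0, beta1, beta3 = 1.0, 0.5, 0.8      # hypothetical structural parameters
delta0, delta3 = 0.2, 0.7                # hypothetical proxy relationship

x3 = rng.normal(size=n)                  # observed proxy
v3 = rng.normal(scale=0.5, size=n)       # independent of x1 and x3
x3_star = delta0 + delta3 * x3 + v3      # unobserved variable
x1 = 0.5 * x3 + rng.normal(size=n)       # related to x3_star only through x3
u = rng.normal(size=n)                   # independent of x1, x3, x3_star
y = beta0 + beta1 * x1 + beta3 * x3_star + u

b_omit = ols(y, x1)                      # unobserved variable simply dropped
b_proxy = ols(y, x1, x3)                 # proxy plugged in

alpha0 = beta0 + beta3 * delta0          # new intercept
alpha3 = beta3 * delta3                  # slope on the proxy

print(f"x1 slope, variable dropped: {b_omit[1]:.3f}  (beta1 = {beta1})")
print(f"x1 slope, with proxy      : {b_proxy[1]:.3f}")
print(f"intercept, with proxy     : {b_proxy[0]:.3f}  (alpha0 = {alpha0:.2f})")
print(f"proxy slope               : {b_proxy[2]:.3f}  (alpha3 = {alpha3:.2f})")
```

The proxy regression targets `alpha0` and `alpha3` rather than `beta0` and `beta3`, but its slope on `x1` should be close to `beta1`.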
Example
One trick when applying proxy variables: use a lagged dependent variable as the proxy.

Measurement Error

1. Definition

Sometimes, we cannot collect data on the variable that truly affects economic behavior. The difference between the data we collect and the variable that actually influences the decisions of individuals, firms, etc. is called measurement error.

2. Measurement Error in the Dependent Variable

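Let $y^*$ denote the variable that we would like to explain and $y$ denote the observed measure of $y^*$. The measurement error is thus defined as
$$e_0 = y - y^*$$
Consider the model:
$$y^* = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u$$
which satisfies the Gauss-Markov assumptions.
Plugging in $y^* = y - e_0$ and rearranging, we get
$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + (u + e_0)$$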
Regress $y$ on $x_1, \dots, x_k$:
  1. The OLS slope estimators are still unbiased and consistent if we assume that $e_0$ is uncorrelated with the independent variables.
  2. If $e_0$ and $u$ are uncorrelated (which is usually assumed), then $Var(u + e_0) = \sigma_u^2 + \sigma_0^2 > \sigma_u^2$. That is, measurement error in the dependent variable results in a larger error variance.
  3. Nevertheless, if the measurement error is a random reporting error in $y$ that is independent of the explanatory variables, then OLS is still perfectly appropriate.
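A quick simulation can illustrate these points. The sketch below assumes a single regressor and a reporting error `e0` that is independent of both the regressor and `u`; the coefficient values, error scales and the `ols_fit` helper are assumptions for illustration.

```python
import numpy as np

def ols_fit(y, x):
    """Return (coefficients, residual variance) from regressing y on x."""
    X = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, resid.var()

rng = np.random.default_rng(3)
n = 100_000
beta0, beta1 = 1.0, 0.5                # hypothetical population values

x = rng.normal(size=n)
u = rng.normal(size=n)                 # Var(u) = 1
y_star = beta0 + beta1 * x + u         # true dependent variable
e0 = rng.normal(scale=2.0, size=n)     # reporting error, Var(e0) = 4
y = y_star + e0                        # observed dependent variable

coef_star, var_star = ols_fit(y_star, x)
coef_obs, var_obs = ols_fit(y, x)

print(f"slope using y*: {coef_star[1]:.3f}, residual variance {var_star:.2f}")
print(f"slope using y : {coef_obs[1]:.3f}, residual variance {var_obs:.2f}")
```

Both slopes should be close to `beta1`, while the residual variance in the second regression should be close to `Var(u) + Var(e0)` rather than `Var(u)`.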

3. Measurement Error in the Independent Variable

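Let $x_1^*$ denote the explanatory variable of interest and $x_1$ denote the observed measure of $x_1^*$. The measurement error is thus defined as
$$e_1 = x_1 - x_1^*$$
Consider the simple regression model:
$$y = \beta_0 + \beta_1 x_1^* + u$$
Assume it satisfies the Gauss-Markov assumptions and $E(e_1) = 0$.
Plugging in $x_1^* = x_1 - e_1$, we get
$$y = \beta_0 + \beta_1 x_1 + (u - \beta_1 e_1)$$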
To derive the properties of the OLS estimators, we need further assumptions:
  1. Assume that $E(u \mid x_1^*, x_1) = 0$.
    1. This implies that $E(y \mid x_1^*, x_1) = E(y \mid x_1^*)$; that is, $x_1$ does not affect $y$ after $x_1^*$ has been controlled for.
  2. Consider two mutually exclusive cases for how the measurement error $e_1$ is correlated with $x_1$ and $x_1^*$.
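Case I: $Cov(x_1, e_1) = 0$
This means that $e_1$ is uncorrelated with the observed measure $x_1$. Note that if $Cov(x_1, e_1) = 0$, then $Cov(x_1^*, e_1) = Cov(x_1 - e_1, e_1) = -\sigma_{e_1}^2 \neq 0$.
Plug in $x_1^* = x_1 - e_1$:
$$y = \beta_0 + \beta_1 x_1 + (u - \beta_1 e_1)$$
Then $Cov(x_1, u - \beta_1 e_1) = 0$; therefore, the OLS estimator is unbiased and consistent.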
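What's more, if $u$ is uncorrelated with $e_1$, then:
$$Var(u - \beta_1 e_1) = \sigma_u^2 + \beta_1^2 \sigma_{e_1}^2$$
In this case, OLS retains all its nice properties; the only cost is a larger error variance (unless $\beta_1 = 0$).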
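Case I can be sketched numerically by drawing the measurement error independently of the observed measure, so that it is necessarily correlated with the true variable. The parameter values and the `ols_fit` helper below are assumptions for illustration.

```python
import numpy as np

def ols_fit(y, x):
    """Return (coefficients, residual variance) from regressing y on x."""
    X = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, resid.var()

rng = np.random.default_rng(6)
n = 100_000
beta0, beta1 = 1.0, 2.0                  # hypothetical population values

x1 = rng.normal(size=n)                  # observed measure
e1 = rng.normal(scale=0.5, size=n)       # Var(e1) = 0.25, independent of x1
x1_star = x1 - e1                        # true variable, since e1 = x1 - x1_star
u = rng.normal(size=n)                   # Var(u) = 1
y = beta0 + beta1 * x1_star + u

coef, resid_var = ols_fit(y, x1)
cov_star_e1 = np.cov(x1_star, e1)[0, 1]  # population value is -Var(e1)
implied_var = 1.0 + beta1**2 * 0.25      # Var(u) + beta1^2 * Var(e1)

print(f"slope on x1      : {coef[1]:.3f}  (beta1 = {beta1})")
print(f"residual variance: {resid_var:.3f}  (formula = {implied_var:.2f})")
print(f"Cov(x1*, e1)     : {cov_star_e1:.3f}")
```

The slope should be close to `beta1` even though `x1` is mismeasured, and the residual variance should be close to the composite error variance.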
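Case II: $Cov(x_1^*, e_1) = 0$
This is the classical errors-in-variables (CEV) assumption: $e_1$ is uncorrelated with the unobserved variable $x_1^*$.
Plug in $x_1 = x_1^* + e_1$:
$$Cov(x_1, e_1) = E(x_1 e_1) = E(x_1^* e_1) + E(e_1^2) = 0 + \sigma_{e_1}^2 = \sigma_{e_1}^2$$
Then
$$Cov(x_1, u - \beta_1 e_1) = -\beta_1 Cov(x_1, e_1) = -\beta_1 \sigma_{e_1}^2$$
The last equation uses the fact that $u$ and $x_1$ are uncorrelated.
Since $Cov(x_1, u - \beta_1 e_1) \neq 0$, the OLS regression of $y$ on $x_1$ gives a biased and inconsistent estimator of $\beta_1$.
The probability limit of $\hat\beta_1$:
$$\operatorname{plim}(\hat\beta_1) = \beta_1 + \frac{Cov(x_1, u - \beta_1 e_1)}{Var(x_1)} = \beta_1 - \frac{\beta_1 \sigma_{e_1}^2}{\sigma_{x_1^*}^2 + \sigma_{e_1}^2} = \beta_1 \left( \frac{\sigma_{x_1^*}^2}{\sigma_{x_1^*}^2 + \sigma_{e_1}^2} \right)$$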
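  • $|\operatorname{plim}(\hat\beta_1)| < |\beta_1|$ whenever $\beta_1 \neq 0$ and $\sigma_{e_1}^2 > 0$: the probability limit is always closer to zero than $\beta_1$. This is called the attenuation bias in OLS due to CEV.
  • If the variance of $x_1^*$ is large relative to the variance of the measurement error, then the inconsistency in OLS will be small.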
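Attenuation bias under CEV is easy to reproduce in a simulation. The sketch below assumes $Var(x_1^*) = 1$ and compares a large and a small measurement error variance; the function name `attenuated_slope` and all numeric values are hypothetical.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, ...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def attenuated_slope(sigma_e1, rng, n=200_000, beta0=1.0, beta1=2.0):
    """OLS slope of y on the mismeasured x1 under the CEV assumption."""
    x1_star = rng.normal(size=n)               # true regressor, variance 1
    e1 = rng.normal(scale=sigma_e1, size=n)    # independent of x1_star (CEV)
    x1 = x1_star + e1                          # observed measure
    u = rng.normal(size=n)
    y = beta0 + beta1 * x1_star + u
    return ols(y, x1)[1]

rng = np.random.default_rng(4)
slope_noisy = attenuated_slope(1.0, rng)       # Var(e1) = 1
slope_clean = attenuated_slope(0.1, rng)       # Var(e1) = 0.01

# Probability limits from the formula: beta1 * Var(x1*) / (Var(x1*) + Var(e1))
plim_noisy = 2.0 * 1.0 / (1.0 + 1.0)
plim_clean = 2.0 * 1.0 / (1.0 + 0.01)

print(f"Var(e1) = 1.00: slope {slope_noisy:.3f}, formula {plim_noisy:.3f}")
print(f"Var(e1) = 0.01: slope {slope_clean:.3f}, formula {plim_clean:.3f}")
```

When the measurement error variance equals the variance of the true regressor, the slope should shrink to about half of `beta1`; when the error variance is tiny, the shrinkage should be barely noticeable.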
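Consider the multiple linear regression case:
$$y = \beta_0 + \beta_1 x_1^* + \beta_2 x_2 + \beta_3 x_3 + u$$
$x_1^*$ is unobserved; we only observe $x_1 = x_1^* + e_1$.
  • If $e_1$ is correlated with $x_1$ (as it is under CEV), then in general all OLS estimators will be biased and inconsistent.
  • It can be shown that
    $$\operatorname{plim}(\hat\beta_1) = \beta_1 \left( \frac{\sigma_{r_1^*}^2}{\sigma_{r_1^*}^2 + \sigma_{e_1}^2} \right)$$
    • where $r_1^*$ is the population error in the equation $x_1^* = \alpha_0 + \alpha_1 x_2 + \alpha_2 x_3 + r_1^*$.
  • The formula for the coefficients on the variables with no measurement error is more complicated. In the special case that $x_1^*$ is uncorrelated with $x_2$ and $x_3$, $\hat\beta_2$ and $\hat\beta_3$ are consistent.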
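The multiple regression results can be sketched with a simplified design that has only one error-free regressor `x2` (no `x3`). The hypothetical parameter `gamma` controls how strongly the true variable depends on `x2`; all names and values are assumptions for illustration.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients [intercept, slope_1, ...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def simulate(gamma, rng, n=200_000):
    """Slopes on (x1, x2) when x1 measures x1_star with CEV error."""
    beta0, beta1, beta2 = 1.0, 2.0, 1.0    # hypothetical population values
    x2 = rng.normal(size=n)                # measured without error
    r1 = rng.normal(size=n)                # Var(r1*) = 1
    x1_star = gamma * x2 + r1              # true variable
    e1 = rng.normal(size=n)                # Var(e1) = 1, independent of the rest
    x1 = x1_star + e1                      # observed measure
    u = rng.normal(size=n)
    y = beta0 + beta1 * x1_star + beta2 * x2 + u
    return ols(y, x1, x2)[1:]

rng = np.random.default_rng(5)
b_corr = simulate(0.5, rng)      # x1_star correlated with x2
b_uncorr = simulate(0.0, rng)    # x1_star uncorrelated with x2

# Formula for the mismeasured regressor: beta1 * Var(r1*) / (Var(r1*) + Var(e1))
plim_b1 = 2.0 * 1.0 / (1.0 + 1.0)

print(f"gamma = 0.5: slope on x1 {b_corr[0]:.3f}, slope on x2 {b_corr[1]:.3f}")
print(f"gamma = 0.0: slope on x1 {b_uncorr[0]:.3f}, slope on x2 {b_uncorr[1]:.3f}")
print(f"formula for the x1 slope: {plim_b1:.3f}")
```

In both designs the slope on `x1` should be attenuated toward the value given by the formula. The slope on `x2` should be close to its true value of 1 only when `gamma` is zero.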
Case III: $Cov(x_1, e_1) \neq 0$ and $Cov(x_1^*, e_1) \neq 0$
There are no simple formulas for this case. All we know is that the OLS estimators are likely biased and inconsistent.
