
T5. Panel Data Models


1. Pooling Cross Sections

1.1 Definition

Pooled cross-section means sampling randomly from a large population at different points in time. For instance, in each year, we can draw a random sample on hourly wages, education, experience, and so on, from the population of working people in the U.S. From a statistical standpoint, these data sets have an important feature: they consist of independently sampled observations. Among other things, this rules out correlation in the error terms across different observations.
An independently pooled cross section differs from a single random sample in that sampling from the population at different points in time likely leads to observations that are not identically distributed. We can use pooling cross sections over time to evaluate policy changes.
Reasons for using Pooling Cross Sections:
  • bigger sample size: valid if the relationship between the dependent variable $y$ and at least some of the explanatory variables remains constant over time
  • investigate the effect of time
  • investigate whether relationships have changed over time

1.2 Chow Test

Motivation: test whether a multiple regression function differs across two (or multiple) different time periods.
Chow test
Idea: allow the intercepts to change over time and then test whether the slope coefficients have changed over time.
Procedure:
  1. Estimate the restricted model by doing a pooled regression allowing for different time intercepts; this gives $SSR_r$.
  2. Run a regression for each of the $T$ time periods and obtain the sum of squared residuals for each period; then $SSR_{ur} = SSR_1 + SSR_2 + \cdots + SSR_T$.
If there are $k$ explanatory variables (not including the intercept or the time dummies) with $T$ time periods, then we are testing $k(T-1)$ restrictions, and there are $T + Tk$ parameters estimated in the unrestricted model. If the total number of observations is $n$, then the df of the $F$ test are $k(T-1)$ and $n - T - Tk$:
$$F = \frac{SSR_r - SSR_{ur}}{SSR_{ur}} \cdot \frac{n - T - Tk}{k(T-1)}$$
This test is not robust to heteroskedasticity, including changing variances across time. To obtain a heteroskedasticity-robust test, we must construct the interaction terms between the time dummies and the explanatory variables, do a pooled regression, and test the interactions jointly with a heteroskedasticity-robust test.
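A minimal sketch of the Chow $F$ statistic above in Python, assuming a pandas DataFrame `df` with an outcome `y`, regressors `x1` and `x2`, and a period identifier `year` (all hypothetical names); it implements the formula directly rather than calling a packaged routine.

```python
# Chow test across time periods: pooled regression with period intercepts
# (restricted) vs. separate regressions per period (unrestricted).
import statsmodels.formula.api as smf
from scipy import stats

def chow_test(df, yvar="y", xvars=("x1", "x2"), period="year"):
    k, T, n = len(xvars), df[period].nunique(), len(df)
    rhs = " + ".join(xvars)

    # Restricted: common slopes, different intercepts across periods
    ssr_r = smf.ols(f"{yvar} ~ {rhs} + C({period})", data=df).fit().ssr

    # Unrestricted: one regression per period, SSRs summed
    ssr_ur = sum(smf.ols(f"{yvar} ~ {rhs}", data=g).fit().ssr
                 for _, g in df.groupby(period))

    df1, df2 = k * (T - 1), n - T - T * k
    F = (ssr_r - ssr_ur) / ssr_ur * df2 / df1
    return F, 1 - stats.f.cdf(F, df1, df2)
```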

1.3 Policy Analysis (DID)

Policy analysis using pooled cross sections has important applications in natural experiments (or quasi-experiments). A natural experiment occurs when some exogenous event (often a change in government policy) changes the environment in which individuals, families, or firms operate.
A natural experiment always has a control group, which is not affected by the policy change, and a treatment group, which is thought to be affected by the policy change.
Unlike a true experiment, in which treatment and control groups are randomly and explicitly chosen, the control and treatment groups in natural experiments arise from the particular policy change.
To control for systematic differences between the control and treatment groups, we need two years of data, one before the policy change and one after the change. And thus, our sample is usefully broken down into FOUR groups:
  • control group ($dT = 0$) before the change
  • control group after the change
  • treatment group ($dT = 1$) before the change
  • treatment group after the change
Let $dT$ equal unity for those in the treatment group and zero otherwise, and let $d2$ denote a dummy variable for the second (post-policy-change) time period. The equation of interest is
$$y = \beta_0 + \delta_0 d2 + \beta_1 dT + \delta_1 d2 \cdot dT + \text{other factors}$$
where $y$ is the outcome variable of interest. When there are no other factors in the regression, $\hat{\delta}_1$ is the difference-in-differences (DID) estimator:
$$\hat{\delta}_1 = (\bar{y}_{2,T} - \bar{y}_{2,C}) - (\bar{y}_{1,T} - \bar{y}_{1,C})$$
where the bar denotes a sample average, the first subscript indexes the time period, and $T$ and $C$ denote the treatment and control groups.
When explanatory variables are added to the equation to control for the fact that the populations sampled may differ systematically over the two periods, the OLS estimate of $\delta_1$ no longer has the simple DID form, but its interpretation is similar.
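A minimal sketch of the DID regression, assuming a DataFrame `df` with outcome `y`, post-period dummy `d2`, treatment-group dummy `dT`, and controls `x1`, `x2` (hypothetical names); without controls the interaction coefficient reproduces the four-group-means formula above.

```python
import statsmodels.formula.api as smf

# Without controls: the coefficient on d2:dT is the DID estimate...
did = smf.ols("y ~ d2 + dT + d2:dT", data=df).fit()

# ...and equals the difference in before/after mean changes across groups
m = df.groupby(["dT", "d2"])["y"].mean()
dd = (m[1, 1] - m[1, 0]) - (m[0, 1] - m[0, 0])
print(did.params["d2:dT"], dd)

# With controls, the coefficient is no longer this simple difference of means,
# but its interpretation as the policy effect is the same
did_ctrl = smf.ols("y ~ d2 + dT + d2:dT + x1 + x2", data=df).fit()
```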

2. Panel Data Analysis

2.1 Panel Data

A panel dataset contains observations on multiple entities (individuals, families, cities, states, etc.), where each entity is observed at two or more points in time.
Notation: a double subscript distinguishes entities (cross section) and time periods:
  • $i$ = entity (individual, state, ...), $N$ = number of entities, so $i = 1, \dots, N$
  • $t$ = time period (year), $T$ = number of time periods, so $t = 1, \dots, T$
Suppose we have one regressor; then the data are $(X_{it}, Y_{it})$, $i = 1, \dots, N$, $t = 1, \dots, T$.
Panel data with $k$ regressors: $(X_{1it}, X_{2it}, \dots, X_{kit}, Y_{it})$, $i = 1, \dots, N$, $t = 1, \dots, T$.
Some jargon:
  • Another term for panel data is longitudinal data
  • balanced panel: no missing observations, that is, all variables are observed for all entities and all time periods.
Panel Data are useful because with panel data, we can control for factors that:
  • vary across entities but do NOT vary over time
  • could cause omitted variable bias (OVB) if they are omitted
  • are unobserved or unmeasured, and therefore cannot be included in the regression using multiple regression
Key idea: If an omitted variable does not change over time, then any changes in $Y$ over time cannot be caused by the omitted variable.

2.2 Policy Analysis: Panel Data version of DID

Let $y_{it}$ denote the outcome variable and let $prog_{it}$ be a program participation dummy variable. The simplest unobserved effects model is
$$y_{it} = \beta_0 + \delta_0 d2_t + \beta_1 prog_{it} + a_i + u_{it}$$
If program participation only occurred in the second period, then the OLS estimator of $\beta_1$ in the differenced equation has a very simple representation:
$$\hat{\beta}_1 = \overline{\Delta y}_{treat} - \overline{\Delta y}_{control}$$
That is, we compute the average change in $y$ over the two time periods for the treatment and control groups. This is the panel data version of the DID estimator. If program participation takes place in both periods, $\hat{\beta}_1$ cannot be written as a difference in differences, but we interpret it in the same way: it is the change in the average value of $y$ due to program participation.
Controlling for time-varying factors does not change anything of significance. We simply difference those variables and include them along with $\Delta prog$. This allows us to control for time-varying variables that might be correlated with program designation.
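A minimal sketch of this two-period estimator, assuming a long-format DataFrame `df` with columns `id`, `year`, `y`, and `prog` (hypothetical names): difference within each unit, then regress the change in `y` on the change in `prog`.

```python
import statsmodels.formula.api as smf

# One row per unit: within-unit change between period 1 and period 2
wide = (df.sort_values("year")
          .groupby("id")
          .agg(dy=("y", lambda s: s.iloc[-1] - s.iloc[0]),
               dprog=("prog", lambda s: s.iloc[-1] - s.iloc[0]))
          .reset_index())

# If participation happens only in period 2, the coefficient on dprog is the
# panel DID: the average change in y for participants minus non-participants
fd_did = smf.ols("dy ~ dprog", data=wide).fit()
print(fd_did.params["dprog"])
```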

2.3 Pooled OLS

Example: Effect of unemployment on city crime rate. Assume that no other explanatory variables are available. If cities are observed for at least two periods and other factors affecting crime stay approximately constant over those periods, we can estimate the causal effect of unemployment on crime.
$$crime_{it} = \beta_0 + \delta_0 d2_t + \beta_1 unem_{it} + a_i + u_{it}, \quad t = 1, 2$$
where
  • $d2_t$: the time dummy for the second period
  • $a_i$: unobserved time-constant factors (= fixed effect)
  • $u_{it}$: other unobserved factors (= idiosyncratic error)
Pooled OLS
We can directly pool the two years and use OLS; the main drawback of this is heterogeneity bias, which is caused by omitting the time-constant factor $a_i$.
When we use pooled OLS, we estimate
$$crime_{it} = \beta_0 + \delta_0 d2_t + \beta_1 unem_{it} + v_{it}$$
where $v_{it} = a_i + u_{it}$ is the composite error.
Even if we assume the idiosyncratic error $u_{it}$ is uncorrelated with $unem_{it}$, pooled OLS is biased and inconsistent if $a_i$ and $unem_{it}$ are correlated.
There are several methods to handle this issue, including First Difference Estimation (FD), Fixed Effect Estimation (FE) and Random Effect Estimation (RE).

3. First Difference Estimation (FD)

3.1 FD Estimator

Two-period Example
Consider the two-period model
$$y_{it} = \beta_0 + \delta_0 d2_t + \beta_1 x_{it} + a_i + u_{it}, \quad t = 1, 2$$
Subtracting the $t = 1$ equation from the $t = 2$ equation, we have
$$\Delta y_i = \delta_0 + \beta_1 \Delta x_i + \Delta u_i$$
where the fixed effect $a_i$ drops out.
Key assumption needed: $\Delta u_i$ is uncorrelated with $\Delta x_i$. This assumption holds if the idiosyncratic error at each time $t$, $u_{it}$, is uncorrelated with the explanatory variable in both time periods (strict exogeneity).
Note that:
  • The first-differenced panel estimator is a way to consistently estimate causal effects in the presence of time-invariant endogeneity
  • Strict exogeneity has to hold in the original equation
  • First-differenced estimates will be imprecise if the explanatory variables vary only a little over time (unless we have a rather large sample size)
General Situation
Suppose we have $N$ individuals and $T = 3$ time periods for each individual. A general fixed effects model is
$$y_{it} = \delta_1 + \delta_2 d2_t + \delta_3 d3_t + \beta_1 x_{it1} + \cdots + \beta_k x_{itk} + a_i + u_{it}$$
for $t = 1, 2, 3$.
The key assumption is that the idiosyncratic errors are uncorrelated with each explanatory variable in every time period: $\mathrm{Cov}(x_{itj}, u_{is}) = 0$ for all $t$, $s$, and $j$.
Eliminate $a_i$ by differencing adjacent periods; in the $T = 3$ case, this yields
$$\Delta y_{it} = \delta_2 \Delta d2_t + \delta_3 \Delta d3_t + \beta_1 \Delta x_{it1} + \cdots + \beta_k \Delta x_{itk} + \Delta u_{it}$$
for $t = 2, 3$. Note that there is no differenced equation for $t = 1$. Also notice that there are two dummy variables but no intercept, which is inconvenient for certain purposes, including the computation of $R^2$. Unless the time intercepts in the original model are of direct interest, it is better to estimate the first-differenced equation with an intercept and a single time-period dummy for the third period. Thus the equation becomes
$$\Delta y_{it} = \alpha_0 + \alpha_3 d3_t + \beta_1 \Delta x_{it1} + \cdots + \beta_k \Delta x_{itk} + \Delta u_{it}$$
With more than three time periods, and when $T$ is small relative to $N$, the FD equation is
$$\Delta y_{it} = \alpha_0 + \alpha_3 d3_t + \cdots + \alpha_T dT_t + \beta_1 \Delta x_{it1} + \cdots + \beta_k \Delta x_{itk} + \Delta u_{it}, \quad t = 2, \dots, T$$
where we have $T - 1$ time periods on each unit $i$, so the total number of observations is $N(T - 1)$. Then we can use pooled OLS.
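A minimal sketch of FD estimation by pooled OLS, assuming a long DataFrame `df` with columns `id`, `year`, `y`, `x1`, `x2` (hypothetical names); the year dummies play the role of the $d3_t, \dots, dT_t$ terms above.

```python
import statsmodels.formula.api as smf

df = df.sort_values(["id", "year"])

# First differences within each unit; the first period drops out (NaN -> dropna)
d = df.groupby("id")[["y", "x1", "x2"]].diff().add_prefix("d_")
d["year"] = df["year"]
d = d.dropna()

# Pooled OLS on the differenced data with an intercept and period dummies
fd = smf.ols("d_y ~ d_x1 + d_x2 + C(year)", data=d).fit()
print(fd.params)
```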
Potential Pitfalls in FD
  • when strict exogeneity is not satisfied
  • when the key explanatory variables do not vary much over time
  • when one or more of the explanatory variables is subject to measurement error

3.2 Assumptions

Assumptions for Pooled OLS Using First Differences
FD.1:
For each $i$, the model is
$$y_{it} = \beta_1 x_{it1} + \cdots + \beta_k x_{itk} + a_i + u_{it}, \quad t = 1, \dots, T$$
where the $\beta_j$ are the parameters to estimate and $a_i$ is the unobserved effect.
FD.2:
We have a random sample from the cross section
FD.3:
Each explanatory variable changes over time (for at least some $i$), and no perfect linear relationships exist among the explanatory variables.
FD.4:
Let $\mathbf{X}_i$ denote the explanatory variables for all time periods for cross-sectional observation $i$; thus $\mathbf{X}_i$ contains $x_{itj}$, $t = 1, \dots, T$, $j = 1, \dots, k$.
For each $t$, the expected value of the idiosyncratic error given the explanatory variables in all time periods and the unobserved effect is zero: $\mathrm{E}(u_{it} \mid \mathbf{X}_i, a_i) = 0$.
When this assumption holds, we sometimes say that the $x_{itj}$ are strictly exogenous conditional on the unobserved effect. That is, once we control for $a_i$, there is no correlation between $x_{isj}$ and the remaining idiosyncratic error, $u_{it}$, for all $s$, $t$, and $j$.
Under FD.1~FD.4, the FD estimators are unbiased and consistent (with a fixed $T$ and as $N \to \infty$).
FD.5:
The variance of the differenced errors, conditional on all explanatory variables, is constant: $\mathrm{Var}(\Delta u_{it} \mid \mathbf{X}_i) = \sigma^2$, $t = 2, \dots, T$ (homoskedasticity).
FD.6:
For all $t \neq s$, the differences in the idiosyncratic errors are uncorrelated (conditional on all explanatory variables): $\mathrm{Cov}(\Delta u_{it}, \Delta u_{is} \mid \mathbf{X}_i) = 0$.
(The differenced errors are serially uncorrelated, which means that the $u_{it}$ follow a random walk across time.)
Under FD.1~FD.6, the FD estimator of the $\beta_j$ is the best linear unbiased estimator (conditional on the explanatory variables).
FD.7
Conditional on $\mathbf{X}_i$, the $\Delta u_{it}$ are independent and identically distributed normal random variables.
When we add FD.7, the FD estimators are normally distributed, and the $t$ and $F$ statistics from pooled OLS on the differences have exact $t$ and $F$ distributions. Without this assumption, we can rely on the usual asymptotic approximations.

4. Fixed Effects Estimation (FE)

4.1 FE Estimator

First differencing is just one of the many ways to eliminate the fixed effect . An alternative method, which works better under certain assumptions, is called the fixed effects transformation.
Consider a model with a single explanatory variable: for each $i$,
$$y_{it} = \beta_1 x_{it} + a_i + u_{it}, \quad t = 1, \dots, T$$
Now, for each $i$, average this equation over time; we get
$$\bar{y}_i = \beta_1 \bar{x}_i + a_i + \bar{u}_i$$
where $\bar{y}_i = T^{-1} \sum_{t=1}^{T} y_{it}$, and so on. Subtracting the averaged equation from the original equation for each $t$, we get
$$y_{it} - \bar{y}_i = \beta_1 (x_{it} - \bar{x}_i) + u_{it} - \bar{u}_i, \quad t = 1, \dots, T$$
or
$$\ddot{y}_{it} = \beta_1 \ddot{x}_{it} + \ddot{u}_{it}, \quad t = 1, \dots, T$$
where $\ddot{y}_{it} = y_{it} - \bar{y}_i$ is the time-demeaned data on $y$, and similarly for $\ddot{x}_{it}$ and $\ddot{u}_{it}$. The fixed effects transformation is also called the within transformation. In the equation above, the unobserved (fixed) effect $a_i$ has disappeared, so we can use pooled OLS.
When there are more explanatory variables, the time-demeaned equation is
$$\ddot{y}_{it} = \beta_1 \ddot{x}_{it1} + \cdots + \beta_k \ddot{x}_{itk} + \ddot{u}_{it}, \quad t = 1, \dots, T$$
which we estimate by pooled OLS.
Note that time-constant explanatory variables are swept away by the fixed effects transformation: $\ddot{x}_{itj} = 0$ for all $t$ if $x_{itj}$ is constant across $t$. Therefore, we cannot include variables such as gender or a city's distance from a river.
Degree of Freedom
When we estimate the time-demeaned equation by pooled OLS, we have $NT$ total observations and $k$ independent variables. (Note there is no intercept in the equation; it is eliminated by the fixed effects transformation.) But the degrees of freedom are not $NT - k$, because for each cross-sectional observation $i$ we lose one df due to the time-demeaning: for each $i$, the demeaned errors add up to zero when summed across $t$. Therefore, the appropriate degrees of freedom are $df = NT - N - k = N(T-1) - k$.
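A minimal sketch of the within transformation and the degrees-of-freedom correction, assuming a long balanced DataFrame `df` with columns `id`, `year`, `y`, `x1`, `x2` (hypothetical names). Packaged implementations (e.g. `PanelOLS` with `entity_effects=True` in the linearmodels package) apply this correction automatically.

```python
import numpy as np
import statsmodels.api as sm

cols = ["y", "x1", "x2"]
dm = df[cols] - df.groupby("id")[cols].transform("mean")   # time-demeaning

# Pooled OLS on demeaned data; no intercept (swept out by the transformation)
fe = sm.OLS(dm["y"], dm[["x1", "x2"]]).fit()

# Correct the df: OLS uses NT - k, but the demeaning costs one df per unit
N, T, k = df["id"].nunique(), df["year"].nunique(), 2
df_correct = N * (T - 1) - k
se_fe = fe.bse * np.sqrt(fe.df_resid / df_correct)
print(fe.params, se_fe)
```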
Time-Constant Variables
Although time-constant variables cannot be included by themselves in a fixed effects model, they can be interacted with variables that change over time and, in particular, with year dummy variables.
For example, we can interact education with each year dummy to see how the return to education has changed over time. But we cannot use fixed effects to estimate the return to education in the base period, which means we cannot estimate the return to education in any period. We can only see how the return to education in each year differs from that in the base period.
Example: see Wooldridge, Example 14.2.

4.2 Assumptions

Assumptions for Fixed Effects
FE.1:
For each $i$, the model is
$$y_{it} = \beta_1 x_{it1} + \cdots + \beta_k x_{itk} + a_i + u_{it}, \quad t = 1, \dots, T$$
where the $\beta_j$ are the parameters to estimate and $a_i$ is the unobserved effect.
FE.2
We have a random sample from the cross section.
FE.3
Each explanatory variable changes over time (for at least some $i$), and no perfect linear relationships exist among the explanatory variables.
FE.4
For each $t$, the expected value of the idiosyncratic error given the explanatory variables in all time periods and the unobserved effect is zero: $\mathrm{E}(u_{it} \mid \mathbf{X}_i, a_i) = 0$, where $\mathbf{X}_i$ denotes the explanatory variables for all time periods of observation $i$.
This means there are no omitted lagged effects (any lagged effects of the explanatory variables must enter explicitly) and there is no feedback from $u_{it}$ to future values of the explanatory variables.
Under FE.1~FE.4 (which are identical to the assumptions for the FD estimator), the FE estimator is unbiased. Again, the key assumption is the strict exogeneity.
Under FE.1~FE.4, the FE estimator is consistent with a fixed $T$ as $N \to \infty$.
FE.5
$\mathrm{Var}(u_{it} \mid \mathbf{X}_i, a_i) = \mathrm{Var}(u_{it}) = \sigma_u^2$, for all $t = 1, \dots, T$.
FE.6
For all $t \neq s$, the idiosyncratic errors are uncorrelated (conditional on all explanatory variables and $a_i$): $\mathrm{Cov}(u_{it}, u_{is} \mid \mathbf{X}_i, a_i) = 0$.
Under FE.1~FE.6, the FE estimator of the $\beta_j$ is the Best Linear Unbiased Estimator.
The assumption that makes FE better than FD is FE.6, which implies that the idiosyncratic errors are serially uncorrelated.
FE.7
Conditional on $\mathbf{X}_i$ and $a_i$, the $u_{it}$ are independent and identically distributed as $\mathrm{Normal}(0, \sigma_u^2)$.
Assumption FE.7 implies FE.4, FE.5, and FE.6. If we add FE.7, the FE estimator is normally distributed, and t and F statistics have exact t and F distributions. Without FE.7, we can rely on asymptotic approximations, which require large $N$ and small $T$, without making special assumptions.

4.3 FE v.s. FD

When $T = 2$:
the FE and FD estimates, as well as all test statistics, are identical. Note that the equivalence between the FE and FD estimates requires that we estimate the same model in each case. It is natural to include an intercept in the FD equation, which is actually the intercept for the second time period in the original model written for the two time periods. Therefore, FE estimation must include a dummy variable for the second time period in order to be identical to the FD estimates that include an intercept.
FD has the advantage of being straightforward to implement in any econometrics or statistical package that supports basic data manipulation, and it is easy to compute heteroskedasticity-robust statistics after FD estimation (because when $T = 2$, FD estimation is just a cross-sectional regression).
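A minimal simulated check of this equivalence (an assumed data-generating process, not from the notes): with $T = 2$, FE with a second-period dummy and FD with an intercept give the same slope estimate.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
N = 500
a = rng.normal(size=N)                                # unit fixed effects
x = rng.normal(size=(N, 2)) + a[:, None]              # regressor correlated with a
y = 1.0 + 0.5 * np.array([0.0, 1.0]) + 2.0 * x + a[:, None] + rng.normal(size=(N, 2))

# FD with an intercept
fd = sm.OLS(y[:, 1] - y[:, 0], sm.add_constant(x[:, 1] - x[:, 0])).fit()

# FE: time-demean y, x, and the period dummy within each unit, then pooled OLS
d2 = np.tile([0.0, 1.0], N)
g = np.repeat(np.arange(N), 2)
Y, X = y.ravel(), np.column_stack([d2, x.ravel()])
Yd = Y - pd.Series(Y).groupby(g).transform("mean").to_numpy()
Xd = X - pd.DataFrame(X).groupby(g).transform("mean").to_numpy()
fe = sm.OLS(Yd, Xd).fit()

print(fd.params[1], fe.params[1])   # identical slope estimates on x
```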
When $T \geq 3$:
Under FE.1~FE.4 (also FD.1~FD.4)
  • both are unbiased
  • both are consistent with fixed $T$ as $N \to \infty$
If the $u_{it}$ are serially uncorrelated, FE is more efficient than FD, and the standard errors reported from FE are valid. Since the unobserved effects model is typically stated with serially uncorrelated idiosyncratic errors, the FE estimator is used more than the FD estimator.
If the $u_{it}$ are severely serially correlated, say, $u_{it}$ follows a random walk, which means there is very substantial, positive serial correlation, then the difference $\Delta u_{it}$ is serially uncorrelated, and FD is better. If $T$ is very large (and $N$ not so large), the panel has a pronounced time series character and problems such as strong dependence arise; in these cases, it is probably better to use FD.
In many cases, the $u_{it}$ exhibit some positive serial correlation, but perhaps not as much as a random walk. Then we cannot easily compare the efficiency of the FE and FD estimators.
Generally, it is difficult to choose between FE and FD when they give substantively different results. It makes sense to report both sets of results and to try to determine why they differ.

4.4 Time Fixed Effects

Formulations of Regression
The discussion above concerns individual (entity) fixed effects. An omitted variable might instead vary over time but not across entities: e.g., changes in national laws, national macroeconomic shocks.
There are two formulations of regression with time fixed effects:
  1. "T−1 binary regressor" formulation:
$$Y_{it} = \beta_0 + \beta_1 X_{it} + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it}$$
     where $B2_t = 1$ if $t = 2$ and $0$ otherwise, etc.
  2. "Time effects" formulation:
$$Y_{it} = \beta_1 X_{it} + \lambda_t + u_{it}$$
Estimation with both Entity and Time FE
When $T = 2$, computing first differences and including an intercept is equivalent to including entity and time fixed effects.
When $T \geq 3$, there are various equivalent ways to incorporate both entity and time FEs.
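A minimal sketch of one equivalent way when $T \geq 3$, the dummy-variable (LSDV) form, assuming a long DataFrame `df` with columns `id`, `year`, `y`, `x` (hypothetical names).

```python
import statsmodels.formula.api as smf

# LSDV form of two-way fixed effects; equivalent slope estimates come from
# linearmodels' PanelOLS with entity_effects=True and time_effects=True
twfe = smf.ols("y ~ x + C(id) + C(year)", data=df).fit()
print(twfe.params["x"])
```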

5. Random Effects Estimation (RE)

5.1 RE Estimator

The unobserved effects model is as before:
$$y_{it} = \beta_0 + \beta_1 x_{it1} + \cdots + \beta_k x_{itk} + a_i + u_{it}$$
where we explicitly include an intercept so that we can make the assumption that the unobserved effect, $a_i$, has zero mean (without loss of generality).
In FD and FE estimation, the goal is to eliminate $a_i$ because it is thought to be correlated with one or more of the $x_{itj}$. But suppose we think $a_i$ is uncorrelated with each explanatory variable in all time periods. Then using a transformation to eliminate $a_i$ results in inefficient estimators.
The equation becomes a random effects model when we assume that the unobserved effect $a_i$ is uncorrelated with each explanatory variable:
$$\mathrm{Cov}(x_{itj}, a_i) = 0, \quad t = 1, \dots, T; \; j = 1, \dots, k$$
We define the composite error term as $v_{it} = a_i + u_{it}$; then the model can be written as
$$y_{it} = \beta_0 + \beta_1 x_{it1} + \cdots + \beta_k x_{itk} + v_{it}$$
Because $a_i$ is in the composite error in each time period, the $v_{it}$ are serially correlated across time. In fact, under the RE assumptions,
$$\mathrm{Corr}(v_{it}, v_{is}) = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_u^2}, \quad t \neq s$$
where $\sigma_a^2 = \mathrm{Var}(a_i)$ and $\sigma_u^2 = \mathrm{Var}(u_{it})$. This (necessarily) positive serial correlation in the error term can be substantial, and, because the usual pooled OLS standard errors ignore this correlation, they will be incorrect, as will the usual test statistics.
We can use GLS to solve the serial correlation problem. For the procedure to have good properties, we should have large $N$ and relatively small $T$. Define
$$\theta = 1 - \left[\frac{\sigma_u^2}{\sigma_u^2 + T\sigma_a^2}\right]^{1/2}$$
which is between zero and one. Then the transformed equation turns out to be
$$y_{it} - \theta\bar{y}_i = \beta_0(1-\theta) + \beta_1(x_{it1} - \theta\bar{x}_{i1}) + \cdots + \beta_k(x_{itk} - \theta\bar{x}_{ik}) + (v_{it} - \theta\bar{v}_i)$$
Recall that the FE transformation subtracts the time averages from the corresponding variables. The RE transformation, however, subtracts only a fraction $\theta$ of each time average, where the fraction depends on $\sigma_u^2$, $\sigma_a^2$, and the number of time periods, $T$. The GLS estimator is simply the pooled OLS estimator of this transformed equation.
The parameter $\theta$ is never known in practice, but it can always be estimated:
$$\hat{\theta} = 1 - \left[\frac{\hat{\sigma}_u^2}{\hat{\sigma}_u^2 + T\hat{\sigma}_a^2}\right]^{1/2}$$
where $\sigma_a^2$ and $\sigma_u^2$ are estimated by
$$\hat{\sigma}_a^2 = \left[\frac{NT(T-1)}{2} - (k+1)\right]^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T-1}\sum_{s=t+1}^{T}\hat{v}_{it}\hat{v}_{is}, \qquad \hat{\sigma}_u^2 = \hat{\sigma}_v^2 - \hat{\sigma}_a^2$$
where the $\hat{v}_{it}$ are the residuals from estimating the composite-error equation by pooled OLS, and $\hat{\sigma}_v^2$ is the square of the usual standard error of the regression from pooled OLS.
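A minimal sketch of this feasible GLS procedure, assuming a balanced panel DataFrame `df` with columns `id`, `year`, `y`, `x1`, `x2` (hypothetical names); $\hat{\sigma}_a^2$ follows the cross-product formula above. The linearmodels package implements the same estimator directly as `RandomEffects`.

```python
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

N, T, k = df["id"].nunique(), df["year"].nunique(), 2
pooled = smf.ols("y ~ x1 + x2", data=df).fit()
v = pooled.resid

# sigma_a^2 from residual cross-products across periods within each unit
sum_i = v.groupby(df["id"]).sum()
sumsq_i = (v ** 2).groupby(df["id"]).sum()
sigma2_a = ((sum_i ** 2 - sumsq_i) / 2).sum() / (N * T * (T - 1) / 2 - (k + 1))
sigma2_u = pooled.mse_resid - sigma2_a
theta = 1 - np.sqrt(sigma2_u / (sigma2_u + T * sigma2_a))

# Quasi-demean and run pooled OLS on the transformed data (the RE/GLS estimator)
cols = ["y", "x1", "x2"]
qd = df[cols] - theta * df.groupby("id")[cols].transform("mean")
X = qd[["x1", "x2"]].copy()
X.insert(0, "const", 1 - theta)       # the intercept is quasi-demeaned too
re = sm.OLS(qd["y"], X).fit()
print(theta, re.params)
```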

5.2 Assumptions

RE.3:
There are no perfect linear relationships among the explanatory variables. The cost of allowing time-constant regressors is that we must add assumptions about how the unobserved effect, $a_i$, is related to the explanatory variables.
RE.4:
For each $t$, the expected value of the idiosyncratic error given the explanatory variables in all time periods and the unobserved effect is zero: $\mathrm{E}(u_{it} \mid \mathbf{X}_i, a_i) = 0$ (FE.4).
In addition, the expected value of $a_i$ given all explanatory variables is constant: $\mathrm{E}(a_i \mid \mathbf{X}_i) = \beta_0$.
This is the assumption that rules out correlation between the unobserved effect and the explanatory variables, and it is the key distinction between fixed effects and random effects.
RE.5:
$\mathrm{Var}(u_{it} \mid \mathbf{X}_i, a_i) = \sigma_u^2$, for all $t = 1, \dots, T$ (FE.5).
In addition, the variance of $a_i$ given all explanatory variables is constant: $\mathrm{Var}(a_i \mid \mathbf{X}_i) = \sigma_a^2$.
Under FE.1, FE.2, RE.3, RE.4, RE.5, and FE.6, the RE estimator is consistent and asymptotically normally distributed as $N$ gets large for fixed $T$.

5.3 RE v.s. FE

In economics, unobserved individual effects are seldom uncorrelated with the explanatory variables, so fixed effects is usually more convincing.
When the explanatory variable of interest is time-invariant, FE cannot estimate its effect, so we use RE.
When our interest is in time-varying regressors, we can use Hausman's test to decide between FE and RE.
Hausman's Test
Under the null hypothesis $H_0$, the RE assumptions hold (in particular, $\mathrm{Cov}(x_{itj}, a_i) = 0$). Apply both RE and FE, and then formally test for statistically significant differences in the coefficients on the time-varying regressors.
If we reject $H_0$, use FE instead.
Often, discussions of the Hausman test comparing FE and RE assume that only time-varying regressors are included in the RE estimation.
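A minimal sketch of the test statistic, assuming `b_fe`, `b_re` are the coefficient vectors on the time-varying regressors from the FE and RE fits and `V_fe`, `V_re` their estimated covariance matrices (hypothetical inputs; many packages report this test directly).

```python
import numpy as np
from scipy import stats

def hausman(b_fe, V_fe, b_re, V_re):
    # Under H0 (RE assumptions), RE is efficient, so Var(b_fe - b_re) = V_fe - V_re
    diff = np.asarray(b_fe) - np.asarray(b_re)
    stat = float(diff @ np.linalg.inv(V_fe - V_re) @ diff)
    dof = diff.size
    return stat, 1 - stats.chi2.cdf(stat, dof)   # small p-value -> reject H0, use FE
```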

5.4 Correlated Random Effects Approach (CRE)

In applications where it makes sense to view the $a_i$ (unobserved effects) as random variables drawn along with the observed variables, there is an alternative to fixed effects that still allows $a_i$ to be correlated with the observed explanatory variables. Write
$$a_i = \alpha + \gamma_1 \bar{x}_{i1} + \cdots + \gamma_k \bar{x}_{ik} + r_i$$
where $r_i$ is uncorrelated with each $\bar{x}_{ij}$.
The individual-specific effect is split up into a part that is related to the time averages of the explanatory variables and a part $r_i$ that is uncorrelated with the explanatory variables. Therefore,
$$y_{it} = \alpha + \beta_1 x_{it1} + \cdots + \beta_k x_{itk} + \gamma_1 \bar{x}_{i1} + \cdots + \gamma_k \bar{x}_{ik} + r_i + u_{it}$$
The resulting model is an ordinary random effects model with an uncorrelated random effect, but with the time averages as additional regressors. It turns out that in this model, the resulting estimates for the time-varying explanatory variables are identical to those of the FE estimator, that is, $\hat{\beta}_{j,CRE} = \hat{\beta}_{j,FE}$.
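A minimal sketch of the CRE (Mundlak) device, assuming a balanced panel DataFrame `df` with columns `id`, `y`, `x1`, `x2` (hypothetical names): add the unit-level time averages as regressors, and the coefficients on the time-varying regressors reproduce the FE estimates.

```python
import statsmodels.formula.api as smf

df["x1_bar"] = df.groupby("id")["x1"].transform("mean")
df["x2_bar"] = df.groupby("id")["x2"].transform("mean")

# Pooled OLS with the time averages added (RE/GLS gives the same slopes for a
# balanced panel); cluster by unit for inference
cre = smf.ols("y ~ x1 + x2 + x1_bar + x2_bar", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["id"]})
print(cre.params[["x1", "x2"]])      # equal to the FE (within) estimates
```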

6. Applications to Other Data Structures

The various panel data methods can be applied to certain data structures that do not involve time. For example, it is common in demography to use siblings (sometimes twins) to account for unobserved family and background characteristics.
For example,
$$y_{is} = \beta_0 + \beta_1 x_{is1} + \cdots + \beta_k x_{isk} + a_f + u_{is}, \quad s = 1, 2$$
where $s$ indexes the sibling within family $f$ and $a_f$ captures the unobserved genetic and family characteristics that do not vary across the twins. With the two equations, one for each twin, we can difference across siblings within a family and estimate the differenced equation by OLS.

7. Clustered Standard Errors

Autocorrelation
Suppose a variable $u$ is observed for unit $i$ at different dates $t$, so the observations are $u_{it}$, $t = 1, \dots, T$. Then $u_{it}$ is said to be autocorrelated or serially correlated if $\mathrm{Cov}(u_{it}, u_{is}) \neq 0$ for some dates $t \neq s$.
  • Autocorrelation (serial correlation) means correlation with itself.
  • $\mathrm{Cov}(u_{it}, u_{i,t-j})$ is called the $j$-th autocovariance of $u_{it}$.
  • In many panel data applications, $u_{it}$ is plausibly autocorrelated.
Independence and Autocorrelation in Panel Data, in a Picture:
If units are sampled by simple random sampling, then $u_{it}$ is independent of $u_{js}$ for different units $i \neq j$. But if the omitted factors comprising $u_{it}$ are serially correlated within a unit, then $u_{it}$ is serially correlated.
Clustered Standard Errors
The OLS fixed effects estimator is unbiased, consistent, and asymptotically normally distributed. However, the usual OLS standard errors (both homoskedasticity-only and heteroskedasticity-robust) will in general be wrong because they assume that $u_{it}$ is serially uncorrelated. This problem is solved by using "clustered" standard errors.
Clustered SEs estimate the variance of $\hat{\beta}_1$ when the variables are i.i.d. across units but are potentially autocorrelated within a unit.
To see the logic, consider the simpler problem of estimating a mean. Assume $\mathrm{E}(Y_{it}) = \mu$, with observations i.i.d. across units but possibly autocorrelated within a unit. The estimator of $\mu$ is
$$\bar{Y} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} Y_{it} = \frac{1}{N}\sum_{i=1}^{N}\bar{Y}_i$$
where $\bar{Y}_i = \frac{1}{T}\sum_{t=1}^{T} Y_{it}$ is the sample mean for unit $i$.
Because observations are i.i.d. across entities, $\bar{Y}_1, \dots, \bar{Y}_N$ are i.i.d. Thus, if $N$ is large, the CLT applies and
$$\bar{Y} \stackrel{a}{\sim} N\!\left(\mu, \; \sigma^2_{\bar{Y}_i}/N\right)$$
where $\sigma^2_{\bar{Y}_i} = \mathrm{Var}(\bar{Y}_i)$. The SE of $\bar{Y}$ is the square root of an estimator of $\sigma^2_{\bar{Y}_i}/N$.
This delivers the clustered standard error formula for $\bar{Y}$ computed using panel data:
$$SE_{clustered}(\bar{Y}) = \sqrt{s^2_{\bar{Y}_i}/N}, \qquad s^2_{\bar{Y}_i} = \frac{1}{N-1}\sum_{i=1}^{N}\left(\bar{Y}_i - \bar{Y}\right)^2$$
Note that in the clustered SE derivation, we never assumed that observations are i.i.d. within a unit; thus we have implicitly allowed for serial correlation within a unit. Serial correlation in $Y_{it}$ actually influences $\mathrm{Var}(\bar{Y}_i)$: for example, with $T = 2$,
$$\mathrm{Var}(\bar{Y}_i) = \tfrac{1}{4}\left[\mathrm{Var}(Y_{i1}) + \mathrm{Var}(Y_{i2}) + 2\,\mathrm{Cov}(Y_{i1}, Y_{i2})\right]$$
If $Y_{it}$ is serially correlated, the autocovariances are nonzero; the usual SE formula, which sets them to zero, will be wrong.
For the clustered SE, expanding $s^2_{\bar{Y}_i}$ (again with $T = 2$) gives
$$s^2_{\bar{Y}_i} = \tfrac{1}{4}\left[s^2_{Y_1} + s^2_{Y_2} + 2\,s_{Y_1 Y_2}\right]$$
and the final term estimates the autocovariance between $Y_{i1}$ and $Y_{i2}$. Thus the clustered SE formula implicitly estimates all the autocovariances and then uses them to estimate $\sigma^2_{\bar{Y}_i}$. In contrast, the usual SE formula zeros out the autocovariances by omitting all the cross terms, which is valid only if those autocovariances are truly all zero.
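A minimal sketch comparing usual and clustered SEs for an entity-and-time FE regression, assuming a long DataFrame `df` with columns `id`, `year`, `y`, `x` (hypothetical names).

```python
import statsmodels.formula.api as smf

model = smf.ols("y ~ x + C(id) + C(year)", data=df)

usual = model.fit()                 # SEs assume u_it is serially uncorrelated
clustered = model.fit(cov_type="cluster", cov_kwds={"groups": df["id"]})

# The clustered SE is typically larger when u_it is positively autocorrelated within units
print(usual.bse["x"], clustered.bse["x"])
```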

8. Panel Data Models with Endogeneity

We consider panel data models where some of the regressors are endogenous and study FE 2SLS estimation of the model.
The 2SLS estimation based on the first-differenced (FD) model, the between (BE) equation, or the random effects (RE) specification can be analyzed similarly.
Consider the following panel data model with endogeneity:
$$y_{it} = x_{it}\beta_1 + w_{it}\beta_2 + a_i + u_{it}$$
where
  • $x_{it}$ is a $1 \times k_1$ vector of included exogenous variables
  • $w_{it}$ is a $1 \times k_2$ vector of endogenous variables
  • $z_{it}$ is a $1 \times l_2$ vector of excluded instrumental variables
  • $a_i$ are the individual-specific fixed effects
  • $u_{it}$ are idiosyncratic error terms
Let $\tilde{x}_{it} = (x_{it}, w_{it})$, $\tilde{z}_{it} = (x_{it}, z_{it})$, and $\beta = (\beta_1', \beta_2')'$. For identification purposes, we assume $l_2 \geq k_2$.
To derive the FE 2SLS estimator of $\beta$, we need to deal with both the fixed effect $a_i$ and the endogeneity of $w_{it}$. First, remove $a_i$ by time-demeaning:
$$\ddot{y}_{it} = \ddot{x}_{it}\beta_1 + \ddot{w}_{it}\beta_2 + \ddot{u}_{it}$$
where, e.g., $\ddot{y}_{it} = y_{it} - \bar{y}_i$ with $\bar{y}_i = T^{-1}\sum_{t=1}^{T} y_{it}$.
Let $\ddot{z}_{it} = z_{it} - \bar{z}_i$ and define the other demeaned variables analogously; then apply pooled 2SLS to the demeaned equation, using $(\ddot{x}_{it}, \ddot{z}_{it})$ as instruments. It does not matter whether or not we time-demean the instruments, but using $\ddot{z}_{it}$ emphasizes that FE 2SLS works only when the instruments vary over time.
To obtain a consistent FE IV estimator, we need the instruments to be strictly exogenous conditional on $a_i$: $\mathrm{E}(u_{it} \mid \tilde{z}_{i1}, \dots, \tilde{z}_{iT}, a_i) = 0$.
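A minimal sketch of FE 2SLS along these lines, assuming a long DataFrame `df` with unit id `id`, outcome `y`, endogenous regressor `w`, included exogenous regressor `x`, and excluded instrument `z` (hypothetical names), using `IV2SLS` from the linearmodels package on the time-demeaned data.

```python
from linearmodels.iv import IV2SLS

cols = ["y", "w", "x", "z"]
dm = df[cols] - df.groupby("id")[cols].transform("mean")   # within transformation

# 2SLS on the demeaned data: demeaned z instruments demeaned w; cluster by unit.
# (A packaged FE-IV routine would also correct the dof for the demeaning.)
fe2sls = IV2SLS(dm["y"], dm[["x"]], dm[["w"]], dm[["z"]]).fit(
    cov_type="clustered", clusters=df["id"])
print(fe2sls.params)
```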
