
T5. Panel Data Models


1. Pooling Cross Sections

1.1 Definition

Pooled cross sections are obtained by sampling randomly from a large population at different points in time. For instance, in each year we can draw a random sample on hourly wages, education, experience, and so on, from the population of working people in the U.S. From a statistical standpoint, these data sets have an important feature: they consist of independently sampled observations. Among other things, this rules out correlation in the error terms across different observations.
An independently pooled cross section differs from a single random sample in that sampling from the population at different points in time likely leads to observations that are not identically distributed. We can use pooling cross sections over time to evaluate policy changes.
Reasons for using Pooling Cross Sections:
  • bigger sample size (provided the relationship between y and at least some of the x's remains constant over time)
  • investigate the effect of time
  • investigate whether relationships have changed over time

1.2 Chow Test

Motivation: test whether a multiple regression function differs across two (or multiple) different time periods.
Chow test
F\_stat=\frac{(SSR-(SSR_1+SSR_2))/(k+1)}{(SSR_1+SSR_2)/(n-2(k+1))}\sim F(k+1,n-2(k+1)),\quad (k+1)F\_stat\to_d \chi^2(k+1)
Idea: allow the intercepts to change over time and then test whether the slope coefficients have changed over time.
Procedure:
  1. estimate the restricted model by doing a pooled regression allowing for different time intercepts, which gives SSR_r
  2. run a regression for each of the T time periods and obtain the sum of squared residuals for each time period. Then SSR_{ur}=SSR_1+SSR_2+...+SSR_T
If there are k explanatory variables (not including the intercept or the time dummies) with T time periods, then we are testing (T-1)k restrictions, and there are T+Tk parameters estimated in the unrestricted model. Assume the total number of observations is n=n_1+n_2+...+n_T; then the df of the F test are (T-1)k and n-T-Tk.
F\_stat=\frac{(SSR_r-SSR_{ur})/[(T-1)k]}{SSR_{ur}/(n-T-Tk)}
This test is not robust to heteroskedasticity including changing variances across time. To obtain a heteroskedasticity-robust test, we must construct the interaction terms and do a pooled regression.
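A minimal numpy sketch of the two-period Chow test on simulated data (the data-generating process and all variable names here are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def ols_ssr(y, X):
    """OLS sum of squared residuals via least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

# Two simulated cross sections with the same slope (no structural break)
n1, n2, k = 200, 200, 1
x1 = rng.normal(size=n1); y1 = 1.0 + 0.5 * x1 + rng.normal(size=n1)
x2 = rng.normal(size=n2); y2 = 1.2 + 0.5 * x2 + rng.normal(size=n2)

# Restricted model: one pooled regression over both periods
y = np.concatenate([y1, y2]); x = np.concatenate([x1, x2])
ssr_pooled = ols_ssr(y, np.column_stack([np.ones_like(x), x]))

# Unrestricted model: a separate regression per period
ssr1 = ols_ssr(y1, np.column_stack([np.ones_like(x1), x1]))
ssr2 = ols_ssr(y2, np.column_stack([np.ones_like(x2), x2]))

n = n1 + n2
F = ((ssr_pooled - (ssr1 + ssr2)) / (k + 1)) / ((ssr1 + ssr2) / (n - 2 * (k + 1)))
print(F)  # compare with the F(k+1, n-2(k+1)) critical value
```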

1.3 Policy Analysis (DID)

Policy analysis using pooled cross sections has applications especially in natural experiments (or quasi-experiments). A natural experiment occurs when some exogenous event (often a change in government policy) changes the environment in which individuals, families, or firms operate.
A natural experiment always has a control group, which is not affected by the policy change, and a treatment group, which is thought to be affected by the policy change.
Unlike a true experiment, in which treatment and control groups are randomly and explicitly chosen, the control and treatment groups in natural experiments arise from the particular policy change.
To control for systematic differences between the control and treatment groups, we need two years of data, one before the policy change and one after the change. And thus, our sample is usefully broken down into FOUR groups:
  • control group (C) before the change
  • control group after the change
  • treatment group (T) before the change
  • treatment group after the change
Let dT equal unity for those in the treatment group T, and zero otherwise. Let d2 denote a dummy variable for the second (post-policy-change) time period. The equation of interest is
y=\beta_0+\delta_0 d2+\beta_1 dT+\delta_1 d2\cdot dT+\text{other factors}\ \ \ (*)
where y is the outcome variable of interest. When there are no other factors in the regression, \hat{\delta}_1 is the difference-in-differences (DID) estimator:
\hat{\delta}_1=(\bar{y}_{2,T}-\bar{y}_{2,C})-(\bar{y}_{1,T}-\bar{y}_{1,C})
When explanatory variables are added to equation (*) to control for the fact that the populations sampled may differ systematically over the two periods, the OLS estimate of \delta_1 no longer has the simple form of the DID estimator, but its interpretation is similar.
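A numpy sketch on simulated data showing that the DID of the four group means equals the OLS coefficient on the interaction in the saturated regression (the simulated effect sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated two-period natural experiment; true treatment effect delta1 = 2.0
n = 400
dT = rng.integers(0, 2, size=n)   # treatment-group dummy
d2 = rng.integers(0, 2, size=n)   # post-period dummy
y = 5.0 + 1.0 * d2 + 0.5 * dT + 2.0 * d2 * dT + rng.normal(size=n)

# DID as the difference of group-mean differences
did = (y[(d2 == 1) & (dT == 1)].mean() - y[(d2 == 1) & (dT == 0)].mean()) \
    - (y[(d2 == 0) & (dT == 1)].mean() - y[(d2 == 0) & (dT == 0)].mean())

# Same number from OLS on y = b0 + d0*d2 + b1*dT + d1*d2*dT + u
X = np.column_stack([np.ones(n), d2, dT, d2 * dT])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(did, beta[3])  # identical up to floating-point error
```

The equality is exact because the saturated dummy regression fits the four cell means perfectly.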

2. Panel Data Analysis

2.1 Panel Data

A panel dataset contains observations on multiple entities (individuals, families, cities, states, etc.), where each entity is observed at two or more points in time.
Notations: A double subscript distinguishes cross section and time periods
  • i = entity, n = number of entities, i=1,...,n
  • t = time period, T = number of time periods, t=1,...,T
Suppose we have one regressor; then the data are
(X_{it},Y_{it}),\ i=1,...,n,\ t=1,...,T
Panel data with k regressors:
(X_{1,it},X_{2,it},...,X_{k,it},Y_{it}),\ i=1,...,n,\ t=1,...,T
Some jargon:
  • Another term for panel data is longitudinal data
  • balanced panel: no missing observations, that is, all variables are observed for all entities and all time periods.
Panel Data are useful because with panel data, we can control for factors that:
  • vary across entities but do NOT vary over time
  • could cause omitted variable bias (OVB) if they are omitted
  • are unobserved or unmeasured, and therefore cannot be included in the regression using multiple regression
Key idea: If an omitted variable does not change over time, then any changes in Y over time cannot be caused by the omitted variable.

2.2 Policy Analysis: Panel Data version of DID

Let y_{it} denote an outcome variable and let prog_{it} be a program participation dummy variable. The simplest unobserved effects model is
y_{it}=\beta_0+\delta_0d2_t+\beta_1prog_{it}+a_i+u_{it}
If program participation only occurred in the second period, then the OLS estimator of ฮฒ1\beta_1 in the differenced equation has a very simple representation:
\hat{\beta}_1=\overline{\Delta y}_{treat}-\overline{\Delta y}_{control}
We compute the average change in y over the two time periods for the treatment and control groups. This is the panel data version of the DID estimator. If program participation takes place in both periods, \hat{\beta}_1 cannot be written in the DID form, but we interpret it in the same way: it is the change in the average value of y due to program participation.
Controlling for time-varying factors does not change anything of significance. We simply difference those variables and include them along with \Delta prog. This allows us to control for time-varying variables that might be correlated with program designation.

2.3 Pooled OLS

Example: Effect of unemployment on city crime rate. Assume that no other explanatory variables are available. If cities are observed for at least two periods and other factors affecting crime stay approximately constant over those periods, we can estimate the causal effect of unemployment on crime.
crmrte_{it}=\beta_0+\delta_0 d87_{it}+\beta_1 unem_{it}+a_i+u_{it},\ \ t=1982,1987
where
  • d87_{it}: the time dummy for the second period
  • a_i: unobserved time-constant factors (the fixed effect)
  • u_{it}: other unobserved factors (the idiosyncratic error)
Pooled OLS
We can directly pool the two years and use OLS. The main drawback is heterogeneity bias, which arises when Cov(a_i,x_{it})\ne0.
When using pooled OLS,
y_{it}=\beta_0+\delta_0d2_t+\beta_1x_{it}+v_{it},\ \ t=1,2
where v_{it}=a_i+u_{it} is the composite error.
Even if we assume the idiosyncratic error u_{it} is uncorrelated with x_{it}, pooled OLS is biased and inconsistent if a_i and x_{it} are correlated.
There are several methods to handle this issue, including First Difference Estimation (FD), Fixed Effect Estimation (FE) and Random Effect Estimation (RE).
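A numpy simulation sketch of heterogeneity bias (the data-generating process is illustrative): pooled OLS is biased when a_i is correlated with x_{it}, while first differencing removes a_i.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-period panel; true beta1 = 1.0, but a_i is correlated with x_it
N, beta1 = 5000, 1.0
a = rng.normal(size=N)
x1 = a + rng.normal(size=N)   # Cov(a_i, x_it) != 0
x2 = a + rng.normal(size=N)
y1 = beta1 * x1 + a + rng.normal(size=N)
y2 = 1.0 + beta1 * x2 + a + rng.normal(size=N)  # period-2 intercept

# Pooled OLS of y on (1, d2, x): suffers heterogeneity bias
y = np.concatenate([y1, y2]); x = np.concatenate([x1, x2])
d2 = np.concatenate([np.zeros(N), np.ones(N)])
Xp = np.column_stack([np.ones(2 * N), d2, x])
b_pooled = np.linalg.lstsq(Xp, y, rcond=None)[0]

# First differencing removes a_i
Xd = np.column_stack([np.ones(N), x2 - x1])
b_fd = np.linalg.lstsq(Xd, y2 - y1, rcond=None)[0]
print(b_pooled[2], b_fd[1])  # pooled slope biased upward; FD slope near 1
```

With this DGP the pooled OLS plim is 1.5 (bias = Cov(a,x)/Var(x) = 0.5), while FD is consistent for 1.0.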

3. First Difference Estimation (FD)

3.1 FD Estimator

Two-period Example
\begin{aligned} crmrte_{i1987}&=\beta_0+\delta_0\cdot1+\beta_1 unem_{i1987}+a_i+u_{i1987}\\ crmrte_{i1982}&=\beta_0+\delta_0\cdot0+\beta_1 unem_{i1982}+a_i+u_{i1982} \end{aligned}
Subtracting, we have
\Delta crmrte_i=\delta_0+\beta_1 \Delta unem_i+\Delta u_i
where the fixed effect drops out.
Key assumption: \Delta u_i is uncorrelated with \Delta x_i. This holds if the idiosyncratic error at each time t, u_{it}, is uncorrelated with the explanatory variable in both time periods (strict exogeneity).
Note that:
  • The first-differenced panel estimator is a way to consistently estimate causal effects in the presence of time-invariant endogeneity
  • Strict exogeneity has to hold in the original equation
  • First-differenced estimates will be imprecise if the explanatory variables vary only a little over time (unless we have a rather large sample size)
General Situation
Suppose we have N individuals and T=3 time periods for each individual. A general fixed effects model is
y_{it}=\delta_1+\delta_2d2_t+\delta_3d3_t+\beta_1 x_{it1}+...+\beta_kx_{itk}+a_i+u_{it}\ \ (3.1*)
for t=1,2,3.
The key assumption is that the idiosyncratic errors are uncorrelated with the explanatory variables in each time period: Cov(x_{itj},u_{is})=0,\text{ for all }t,s,j.
Eliminating a_i by differencing adjacent periods in the T=3 case yields
\Delta y_{it}=\delta_2 \Delta d2_t+\delta_3 \Delta d3_t+\beta_1\Delta x_{it1}+...+\beta_k\Delta x_{itk}+\Delta u_{it}
for t=2,3. Note that there is no differenced equation for t=1. Also notice that there are two dummy variables but no intercept, which is inconvenient for certain purposes, including the computation of R^2. Unless the time intercepts in (3.1*) are of direct interest, it's better to estimate the first-differenced equation with an intercept and a single time-period dummy. Thus the equation becomes
\Delta y_{it}=\alpha_0+\alpha_3 d3_t+\beta_1\Delta x_{it1}+...+\beta_k\Delta x_{itk}+\Delta u_{it},\ \ t=2,3
For more than three periods, when T is small relative to N, the FD equation is
\Delta y_{it}=\alpha_0+\alpha_3 d3_t+\alpha_4d4_t+...+\alpha_T dT_t+\beta_1 \Delta x_{it1}+...+\beta_k\Delta x_{itk}+\Delta u_{it},\ \ t=2,3,...,T
where we have T-1 time periods on each unit i for the FD equation. The total number of observations is N(T-1). Then we can use pooled OLS.
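The T=3 first-difference estimation above can be sketched with numpy on simulated data (the DGP and coefficient values are illustrative): difference away a_i, drop t=1, and run pooled OLS on the N(T-1) observations with an intercept and a d3 dummy.

```python
import numpy as np

rng = np.random.default_rng(3)

# N units, T = 3 periods; fixed effect a_i correlated with x; true beta1 = 0.8
N, T, beta1 = 1000, 3, 0.8
a = rng.normal(size=(N, 1))
x = a + rng.normal(size=(N, T))
delta = np.array([0.0, 0.3, 0.7])          # time intercepts delta_t
y = delta + beta1 * x + a + rng.normal(size=(N, T))

# First differences: one equation per unit for t = 2, 3 (none for t = 1)
dy = np.diff(y, axis=1)                    # shape (N, T-1)
dx = np.diff(x, axis=1)
d3 = np.tile(np.array([0.0, 1.0]), (N, 1))  # dummy for t = 3

# Pooled OLS on N(T-1) observations: intercept + d3 dummy + differenced x
X = np.column_stack([np.ones(N * (T - 1)), d3.ravel(), dx.ravel()])
b = np.linalg.lstsq(X, dy.ravel(), rcond=None)[0]
print(b)  # [alpha0, alpha3, beta1-hat]; beta1-hat should be near 0.8
```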
Potential Pitfalls in FD
  • when strict exogeneity is not satisfied
  • when the key explanatory variables do not vary much over time
  • when one or more of the explanatory variables is subject to measurement error

3.2 Assumptions

Assumptions for Pooled OLS Using First Differences
FD.1:
For each ii, the model is
y_{it}=\beta_1 x_{it1}+...+\beta_kx_{itk}+a_i+u_{it},\ \ t=1,...,T
where the \beta_j are the parameters to estimate and a_i is the unobserved effect.
FD.2:
We have a random sample from the cross section
FD.3:
Each explanatory variable changes over time (for at least some i), and no perfect linear relationships exist among the explanatory variables.
FD.4:
Let \bold{X}_i denote the explanatory variables for all time periods for cross-sectional observation i; thus \bold{X}_i contains x_{itj}, t=1,...,T, j=1,...,k.
For each t, the expected value of the idiosyncratic error given the explanatory variables in all time periods and the unobserved effect is zero: E(u_{it}|\bold{X}_i,a_i)=0.
When this assumption holds, we say the x_{itj} are strictly exogenous conditional on the unobserved effect. That is, once we control for a_i, there is no correlation between the x_{isj} and the remaining idiosyncratic error u_{it}, for all s and t.
Under FD.1~FD.4, the FD estimators are unbiased and consistent (with a fixed T and as N\to\infty).
FD.5:
The variance of the differenced errors, conditional on all explanatory variables, is constant: Var(\Delta u_{it}|\bold{X}_{i})=\sigma^2, t=2,...,T (homoskedasticity).
FD.6:
For all t\ne s, the differences in the idiosyncratic errors are uncorrelated (conditional on all explanatory variables): Cov(\Delta u_{it},\Delta u_{is}|\bold{X}_i)=0, t\ne s.
(The differenced errors are serially uncorrelated; this holds, for example, when the u_{it} follow a random walk across time.)
Under FD.1~FD.6, the FD estimator of the \beta_j is the best linear unbiased estimator (conditional on the explanatory variables).
FD.7
Conditional on \bold{X}_i, the \Delta u_{it} are independent and identically distributed normal random variables.
When we add FD.7, the FD estimators are normally distributed, and the t and F statistics from pooled OLS on the differences have exact t and F distributions. Without this assumption, we rely on the usual asymptotic approximations.

4. Fixed Effects Estimation (FE)

4.1 FE Estimator

First differencing is just one of the many ways to eliminate the fixed effect aia_i. An alternative method, which works better under certain assumptions, is called the fixed effects transformation.
Consider a model with a single explanatory variable: for each i,
y_{it}=\beta_1x_{it}+a_i+u_{it},\ \ t=1,2,...,T
Now, for each i, average this equation over time to get
\bar{y}_i=\beta_1\bar{x}_i+a_i+\bar{u}_i
where \bar{y}_i=T^{-1}\sum_{t=1}^T y_{it}, and so on. Subtracting gives
y_{it}-\bar{y}_i=\beta_1(x_{it}-\bar{x}_i)+u_{it}-\bar{u}_i,\ \ t=1,2,...,T
or
\ddot{y}_{it}=\beta_1\ddot{x}_{it}+\ddot{u}_{it},\ \ t=1,2,...,T
where \ddot{y}_{it}=y_{it}-\bar{y}_i is the time-demeaned data on y, and similarly for \ddot{x}_{it} and \ddot{u}_{it}. The fixed effects transformation is also called the within transformation. In the above equation, the unobserved (fixed) effect a_i has disappeared, so we can use pooled OLS.
When there are more explanatory variables,
\begin{aligned} y_{it} &=\beta_1 x_{it1}+\beta_2 x_{it2}+...+\beta_kx_{itk}+a_i+u_{it},\ \ t=1,2,...,T\\ \ddot{y}_{it} &=\beta_1\ddot{x}_{it1}+\beta_2\ddot{x}_{it2}+...+\beta_k\ddot{x}_{itk}+\ddot{u}_{it},\ \ t=1,2,...,T\ \ \ (4.1*) \end{aligned}
which we estimate by pooled OLS.
Note that time-constant explanatory variables are swept away by the fixed effects transformation: \ddot{x}_{it}=0 for all i,t if x_{it} is constant across t. Therefore, we cannot include variables such as gender or a city's distance from a river.
Degree of Freedom
When we estimate (4.1*) by pooled OLS, we have NT total observations and k independent variables. (There is no intercept; it is eliminated by the fixed effects transformation.) But the degrees of freedom are not NT-k: for each cross-sectional observation i, we lose one df because of the time-demeaning, since the demeaned errors \ddot{u}_{it} add up to zero when summed across t. Therefore, the appropriate degrees of freedom are df=NT-N-k=N(T-1)-k.
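A numpy sketch of the within (FE) transformation on simulated data, including the degrees-of-freedom correction just described (the DGP is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# N units, T = 4; a_i correlated with x; true beta1 = 1.5
N, T, beta1 = 500, 4, 1.5
a = rng.normal(size=(N, 1))
x = a + rng.normal(size=(N, T))
y = beta1 * x + a + rng.normal(size=(N, T))

# Within transformation: subtract each unit's time mean
xdd = x - x.mean(axis=1, keepdims=True)
ydd = y - y.mean(axis=1, keepdims=True)

# Pooled OLS on the demeaned data (no intercept)
X = xdd.reshape(-1, 1)
b_fe = np.linalg.lstsq(X, ydd.ravel(), rcond=None)[0][0]

# Degrees of freedom: N(T-1) - k, not NT - k
k = 1
resid = ydd.ravel() - X[:, 0] * b_fe
sigma2 = resid @ resid / (N * (T - 1) - k)
se_fe = np.sqrt(sigma2 / (xdd ** 2).sum())
print(b_fe, se_fe)  # slope near 1.5
```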
Time-Constant Variables
Although time-constant variables cannot be included by themselves in a fixed effects model, they can be interacted with variables that change over time and, in particular, with year dummy variables.
For example, we can interact education with each year dummy to see how the return to education has changed over time. But we cannot use fixed effects to estimate the return to education in the base period, which means we cannot estimate the return to education in any period. We can only see how the return to education in each year differs from that in the base period.
Example (Wooldridge Example 14.2)

4.2 Assumptions

Assumptions for Fixed Effects
FE.1:
For each ii, the model is
y_{it}=\beta_1x_{it1}+...+\beta_kx_{itk}+a_i+u_{it},\ \ t=1,...,T
where the \beta_j are the parameters to estimate and a_i is the unobserved effect.
FE.2
We have a random sample from the cross section.
FE.3
Each explanatory variable changes over time (for at least some i), and no perfect linear relationships exist among the explanatory variables.
FE.4
For each t, the expected value of the idiosyncratic error given the explanatory variables in all time periods and the unobserved effect is zero: E(u_{it}|\bold{X}_i,a_i)=0, where \bold{X}_i=(\bm{X}_{i1},...,\bm{X}_{iT}).
This means there are no omitted lagged effects (any lagged effects of \bold{X} must enter explicitly) and there is no feedback from u to future \bm{X}.
Under FE.1~FE.4 (which are identical to the assumptions for the FD estimator), the FE estimator is unbiased. Again, the key assumption is strict exogeneity.
Under FE.1~FE.4, the FE estimator is consistent with a fixed T as N\to\infty.
FE.5
Var(u_{it}|\bold{X}_{i},a_i)=Var(u_{it})=\sigma_u^2, for all t=1,...,T
FE.6
For all t\ne s, the idiosyncratic errors are uncorrelated (conditional on all explanatory variables and a_i): Cov(u_{it},u_{is}|\bold{X}_{i},a_i)=0.
Under FE.1~FE.6, the FE estimator of the \beta_j is the best linear unbiased estimator.
The assumption that makes FE better than FD is FE.6, which says the idiosyncratic errors are serially uncorrelated.
FE.7
Conditional on \bold{X}_i and a_i, the u_{it} are independent and identically distributed as N(0, \sigma_u^2).
Assumption FE.7 implies FE.4, FE.5, and FE.6. If we add FE.7, the FE estimator is normally distributed, and t and F statistics have exact t and F distributions. Without FE.7, we rely on asymptotic approximations, which require large N and small T.

4.3 FE vs. FD

When T=2
the FE and FD estimates, as well as all test statistics, are identical. Note that the equivalence between the FE and FD estimates requires that we estimate the same model in each case. It's natural to include an intercept in the FD equation, which is actually the intercept for the second time period in the original model written for the two time periods. Therefore, FE estimation must include a dummy variable for the second time period in order to be identical to the FD estimates that include an intercept.
FD has the advantage of being straightforward to implement in any econometrics or statistics package that supports basic data manipulation, and it's easy to compute heteroskedasticity-robust statistics after FD estimation (because when T=2, FD estimation is just a cross-sectional regression).
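The T=2 equivalence can be checked numerically with numpy on simulated data (the DGP is illustrative): FD with an intercept and FE with a period-2 dummy return the same coefficients.

```python
import numpy as np

rng = np.random.default_rng(5)

# T = 2 panel; true slope = 2.0, period-2 intercept = 0.5
N = 300
a = rng.normal(size=N)
x1 = a + rng.normal(size=N); x2 = a + rng.normal(size=N)
y1 = 2.0 * x1 + a + rng.normal(size=N)
y2 = 0.5 + 2.0 * x2 + a + rng.normal(size=N)

# FD estimate: regress delta-y on an intercept and delta-x
dX = np.column_stack([np.ones(N), x2 - x1])
b_fd = np.linalg.lstsq(dX, y2 - y1, rcond=None)[0]

# FE estimate: within-demean y, the period-2 dummy, and x; no intercept
y = np.column_stack([y1, y2]); x = np.column_stack([x1, x2])
d2 = np.tile([0.0, 1.0], (N, 1))
ydd = (y - y.mean(1, keepdims=True)).ravel()
Z = np.column_stack([(d2 - 0.5).ravel(),
                     (x - x.mean(1, keepdims=True)).ravel()])
b_fe = np.linalg.lstsq(Z, ydd, rcond=None)[0]
print(b_fd, b_fe)  # the two coefficient vectors agree
```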
When T\ge 3
Under FE.1~FE.4 (equivalently FD.1~FD.4):
  • both are unbiased
  • both are consistent with T fixed as N\to\infty
If the u_{it} are serially uncorrelated, FE is more efficient than FD, and the standard errors reported from FE are valid. Since the unobserved effects model is typically stated with serially uncorrelated idiosyncratic errors, the FE estimator is used more often than the FD estimator.
If the u_{it} are severely serially correlated, say they follow a random walk (very substantial positive serial correlation), then the difference \Delta u_{it} is serially uncorrelated, and FD is better. If T is very large (and N not so large), the panel has a pronounced time series character and problems such as strong dependence arise; in these cases, it's probably better to use FD.
In many cases, the u_{it} exhibit some positive serial correlation, but less than a random walk. Then we cannot easily compare the efficiency of the FE and FD estimators.
Generally, it's difficult to choose between FE and FD when they give substantively different results. It makes sense to report both sets of results and to try to determine why they differ.

4.4 Time Fixed Effects

Formulations of Regression
The discussion above concerns individual FE. An omitted variable might vary over time but not across states: e.g., changes in national laws, national macro shocks.
There are two formulations of regression with time fixed effects:
  1. "T-1 binary regressor" formulation:
    Y_{it}=\beta_0+\beta_1 X_{it}+\delta_2B2_t+...+\delta_TBT_t+u_{it}
    where B2_t=1 if t=2 (year #2), 0 otherwise, etc.
  2. "Time effects" formulation:
    Y_{it}=\beta_1X_{it}+\lambda_t+u_{it}
Estimation with both Entity and Time FE
Y_{it}=\beta_1 X_{it}+\alpha_i+\lambda_t+u_{it}
When T=2, computing FD and including an intercept is equivalent to including entity and time FEs.
When T>2, there are various equivalent ways to incorporate both entity and time FEs.
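One such equivalence can be sketched with numpy on a simulated balanced panel (the DGP is illustrative): the two-way within transformation (subtract entity means and time means, add back the grand mean) gives the same slope as the dummy-variable (LSDV) regression with entity and time dummies.

```python
import numpy as np

rng = np.random.default_rng(6)

# Balanced panel with entity effects alpha_i and time effects lambda_t; beta1 = 1.0
N, T = 200, 5
alpha = rng.normal(size=(N, 1))
lam = rng.normal(size=(1, T))
x = alpha + lam + rng.normal(size=(N, T))
y = 1.0 * x + alpha + lam + rng.normal(size=(N, T))

# Two-way within transformation (valid for a balanced panel)
xdd = x - x.mean(1, keepdims=True) - x.mean(0, keepdims=True) + x.mean()
ydd = y - y.mean(1, keepdims=True) - y.mean(0, keepdims=True) + y.mean()
b_within = (xdd.ravel() @ ydd.ravel()) / (xdd.ravel() @ xdd.ravel())

# Equivalent LSDV regression: x plus N entity dummies and T-1 time dummies
D_ent = np.kron(np.eye(N), np.ones((T, 1)))          # entity dummies
D_time = np.kron(np.ones((N, 1)), np.eye(T)[:, 1:])  # time dummies (base t=1)
X = np.column_stack([x.ravel(), D_ent, D_time])
b_lsdv = np.linalg.lstsq(X, y.ravel(), rcond=None)[0][0]
print(b_within, b_lsdv)  # identical up to floating-point error
```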

5. Random Effects Estimation (RE)

5.1 RE Estimator

The unobserved effects model is as before
y_{it}=\beta_0+\beta_1x_{it1}+...+\beta_kx_{itk}+a_i+u_{it}\ \ \ (5.1*)
where we explicitly include an intercept so that we can assume the unobserved effect a_i has zero mean (without loss of generality).
In FD and FE estimation, the goal is to eliminate a_i because it is thought to be correlated with one or more of the x_{itj}. But suppose we think a_i is uncorrelated with each explanatory variable in all time periods. Then using a transformation to eliminate a_i results in inefficient estimators.
Equation (5.1*) becomes a random effects model when we assume that the unobserved effect a_i is uncorrelated with each explanatory variable:
Cov(x_{itj},a_i)=0,\ \ t=1,2,...,T;\ j=1,2,...,k
We define the composite error term v_{it}=a_i+u_{it}; then (5.1*) can be written as
y_{it}=\beta_0+\beta_1x_{it1}+...+\beta_kx_{itk}+v_{it}\ \ (5.1**)
Because a_i is in the composite error in each time period, the v_{it} are serially correlated across time. In fact, under the RE assumptions,
Corr(v_{it},v_{is})=\sigma_a^2/(\sigma_a^2+\sigma_u^2),\ t\ne s
where \sigma_a^2=Var(a_i) and \sigma_u^2=Var(u_{it}). This (necessarily positive) serial correlation in the error term can be substantial, and because the usual pooled OLS standard errors ignore it, they will be incorrect, as will the usual test statistics.
We can use GLS to solve the serial correlation problem. For the procedure to have good properties, we should have large N and relatively small T. Define
\theta=1-\big[\sigma_u^2/(\sigma_u^2+T\sigma_a^2)\big]^{1/2}
which is between zero and one. The transformed equation turns out to be
y_{it}-\theta\bar{y}_i=\beta_0(1-\theta)+\beta_1(x_{it1}-\theta\bar{x}_{i1})+...+\beta_k(x_{itk}-\theta \bar{x}_{ik})+(v_{it}-\theta\bar{v}_i)\ \ \ (5.1***)
Recall that the FE transformation subtracts the time averages from the corresponding variable. The RE transformation subtracts only a fraction of the time average, where the fraction depends on \sigma_u^2, \sigma_a^2 and the number of time periods T. The GLS estimator is simply the pooled OLS estimator of equation (5.1***).
The parameter \theta is never known in practice, but it can be estimated:
\hat{\theta}=1-\Big(\frac{1}{1+T\cdot(\hat{\sigma}_a^2/\hat{\sigma}_u^2)}\Big)^{1/2}
where \hat{\sigma}_a^2 and \hat{\sigma}_u^2 are estimated by
ฯƒ^a2=1[NT(Tโˆ’1)/2โˆ’(k+1)]โˆ‘i=1Nโˆ‘t=1Tโˆ’1โˆ‘s=t+1Tv^itv^isฯƒ^u2=ฯƒ^v2โˆ’ฯƒ^a2\hat{\sigma}_a^2=\frac{1}{[NT(T-1)/2-(k+1)]}\sum_{i=1}^N\sum_{t=1}^{T-1}\sum_{s=t+1}^T\hat{v}_{it}\hat{v}_{is}\\ \hat{\sigma}_u^2=\hat{\sigma}_v^2-\hat{\sigma}_a^2
where the \hat{v}_{it} are the residuals from estimating (5.1**) by pooled OLS, and \hat{\sigma}_v^2 is the square of the usual standard error of the regression from pooled OLS.
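A numpy sketch of the RE quasi-demeaning step on simulated data. For clarity it uses the true variance components to form theta; in practice sigma_a^2 and sigma_u^2 would be replaced by the estimates above (the DGP is illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

# RE model: a_i uncorrelated with x; sigma_a = 2, sigma_u = 1, slope = 2.0
N, T = 2000, 4
sig_a, sig_u = 2.0, 1.0
a = sig_a * rng.normal(size=(N, 1))
x = rng.normal(size=(N, T))               # independent of a_i
y = 1.0 + 2.0 * x + a + sig_u * rng.normal(size=(N, T))

# Quasi-demeaning parameter theta (true variance components, for clarity)
theta = 1.0 - np.sqrt(sig_u**2 / (sig_u**2 + T * sig_a**2))

# GLS = pooled OLS on the quasi-demeaned data; intercept column is (1 - theta)
ystar = (y - theta * y.mean(1, keepdims=True)).ravel()
xstar = (x - theta * x.mean(1, keepdims=True)).ravel()
X = np.column_stack([np.full(N * T, 1.0 - theta), xstar])
b_re = np.linalg.lstsq(X, ystar, rcond=None)[0]
print(theta, b_re)  # theta about 0.757; slope estimate near 2.0
```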

5.2 Assumptions

RE.3:
There are no perfect linear relationships among the explanatory variables. The cost of allowing time-constant regressors is that we must add assumptions about how the unobserved effect a_i is related to the explanatory variables.
RE.4:
For each t, the expected value of the idiosyncratic error given the explanatory variables in all time periods and the unobserved effect is zero: E(u_{it}|\bold{X}_i,a_i)=0 (FE.4).
In addition, the expected value of a_i given all explanatory variables is constant: E(a_i|\bold{X}_i)=\beta_0.
This is the assumption that rules out correlation between the unobserved effect and the explanatory variables, and it is the key distinction between fixed effects and random effects.
RE.5:
Var(u_{it}|\bold{X}_{i},a_i)=Var(u_{it})=\sigma_u^2, for all t=1,...,T (FE.5).
In addition, the variance of a_i given all explanatory variables is constant: Var(a_i|\bold{X}_i)=\sigma_a^2.
Under FE.1, FE.2, RE.3, RE.4, RE.5 and FE.6, the RE estimator is consistent and asymptotically normally distributed as N gets large for fixed T.

5.3 RE vs. FE

In economics, unobserved individual effects are seldom uncorrelated with explanatory variables, so fixed effects is usually more convincing.
When the explanatory variable of interest is time-invariant, use RE (FE cannot estimate its effect).
When our interest is in time-varying regressors, we can use the Hausman test to decide between FE and RE.
Hausmanโ€™s Test
Assume the RE assumptions hold (H_0). Apply both RE and FE, and then formally test for statistically significant differences in the coefficients on the time-varying regressors.
If we reject H_0, use FE instead.
Often, discussions of the Hausman test comparing FE and RE assume only time-varying regressors are included in the RE estimation.

5.4 Correlated Random Effects Approach (CRE)

In applications where it makes sense to view the a_i (unobserved effects) as random variables drawn along with the observed variables, there is an alternative to fixed effects that still allows a_i to be correlated with the observed explanatory variables:
Y_{it}=\beta_0+\beta_1 X_{1,it}+...+\beta_k X_{k,it}+a_i+u_{it},\ \ i=1,...,n;\ t=1,2,...,T
where
a_i=a+\gamma_1\bar{X}_{1,i}+...+\gamma_k\bar{X}_{k,i}+r_i
The individual-specific effect a_i is split into a part that is related to the time averages of the explanatory variables and a part r_i that is uncorrelated with the explanatory variables. Therefore,
Y_{it}=\beta_0+a+\beta_1 X_{1,it}+...+\beta_k X_{k,it}+\gamma_1\bar{X}_{1,i}+...+\gamma_k\bar{X}_{k,i}+r_i+u_{it}
The resulting model is an ordinary random effects model with an uncorrelated random effect r_i but with the time averages as additional regressors. It turns out that the resulting estimates for the time-varying explanatory variables are identical to those of the FE estimator, that is,
\hat{\beta}_{CRE}=\hat{\beta}_{FE}
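This identity can be verified with numpy on simulated data (the DGP is illustrative): pooled OLS with the time averages added as regressors reproduces the FE slope exactly.

```python
import numpy as np

rng = np.random.default_rng(8)

# a_i correlated with x through its time average; true slope = 1.0
N, T = 400, 5
x = rng.normal(size=(N, T))
xbar = x.mean(1, keepdims=True)
a = 2.0 * xbar[:, 0] + rng.normal(size=N)   # a_i depends on the time average
y = 1.0 * x + a[:, None] + rng.normal(size=(N, T))

# CRE: pooled OLS of y on (1, x, x-bar)
Xcre = np.column_stack([np.ones(N * T), x.ravel(),
                        np.repeat(xbar[:, 0], T)])
b_cre = np.linalg.lstsq(Xcre, y.ravel(), rcond=None)[0][1]

# FE: within transformation
xdd = (x - xbar).ravel()
ydd = (y - y.mean(1, keepdims=True)).ravel()
b_fe = (xdd @ ydd) / (xdd @ xdd)
print(b_cre, b_fe)  # numerically identical
```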

6. Applications to Other Data Structures

The various panel data methods can be applied to certain data structures that do not involve time. For example, it is common in demography to use siblings (sometimes twins) to account for unobserved family and background characteristics.
For example
logโก(wagei1)=ฮฒ0+ฮฒ1educi1+...+ai+ui1logโก(wagei2)=ฮฒ0+ฮฒ1educi2+...+ai+ui2โ€…โ€ŠโŸนโ€…โ€Šฮ”logโก(wagei)=ฮฒ1ฮ”educi+...+ฮ”ui\begin{aligned} \log(wage_{i1}) &=\beta_0+\beta_1educ_{i1}+...+a_i+u_{i1}\\ \log(wage_{i2}) &=\beta_0+\beta_1educ_{i2}+...+a_i+u_{i2}\\ \implies \Delta \log(wage_i)&=\beta_1\Delta educ_i+...+\Delta u_i \end{aligned}
where aia_i is the unobserved genetic and family characteristics that do not vary across twins. With two equations for twins, we can estimate differenced equation by OLS.

7. Clustered Standard Errors

Autocorrelation
Suppose a variable Z is observed at different dates t, so the observations are Z_t, t=1,...,T. Then Z_t is said to be autocorrelated or serially correlated if corr(Z_t,Z_{t+j})\ne0 for some j\ne0.
  • Autocorrelation (serial correlation) means correlation with itself.
  • Cov(Z_t,Z_{t+j}) is called the j-th autocovariance of Z_t.
  • In many panel data applications, u_{it} is plausibly autocorrelated.
Independence and Autocorrelation in Panel Data
If units are sampled by simple random sampling, then (u_{i1},...,u_{iT}) is independent of (u_{j1},...,u_{jT}) for different units i\ne j. But if the omitted factors comprising u_{it} are serially correlated, then u_{it} is serially correlated.
Clustered Standard Errors
The OLS fixed effects estimator \hat{\beta}_1 is unbiased, consistent, and asymptotically normally distributed. However, the usual OLS standard errors (both homoskedasticity-only and heteroskedasticity-robust) will in general be wrong, because they assume that u_{it} is serially uncorrelated. This problem is solved by using clustered standard errors.
Clustered SEs estimate the variance of \hat{\beta}_1 when the variables X and u are i.i.d. across units but potentially autocorrelated within a unit.
Assume Y_{it}=\mu+u_{it},\ i=1,...,n;\ t=1,...,T. The estimator of \mu is
Yห‰=1nTโˆ‘i=1nโˆ‘t=1TYit=1nโˆ‘i=1n(1Tโˆ‘t=1TYit)=1nโˆ‘i=1nYห‰i\bar{Y}=\frac{1}{nT}\sum_{i=1}^n\sum_{t=1}^T Y_{it}=\frac{1}{n}\sum_{i=1}^n\Big(\frac{1}{T}\sum_{t=1}^T Y_{it}\Big)=\frac{1}{n}\sum_{i=1}^n\bar{Y}_i
where \bar{Y}_i=\frac{1}{T}\sum_{t=1}^TY_{it} is the sample mean for unit i.
Because observations are i.i.d. across entities, (\bar{Y}_1,...,\bar{Y}_n) are i.i.d. Thus, if n is large, the CLT applies and
\sqrt{n}(\bar{Y}-\mu)\to_d N(0,\sigma^2_{\bar{Y}_i})
where \sigma^2_{\bar{Y}_i}=Var(\bar{Y}_i). The SE of \bar{Y} is the square root of an estimator of \sigma^2_{\bar{Y}_i}/n.
This delivers the clustered standard error formula for \bar{Y} computed using panel data:
Clustered\ SE\ of\ \bar{Y}=\sqrt{\frac{s_{\bar{Y}_i}^2}{n}}
where s_{\bar{Y}_i}^2=\frac{1}{n-1}\sum_{i=1}^n(\bar{Y}_i-\bar{Y})^2
Note that in the clustered SE derivation, we never assumed that observations are i.i.d. within a unit; thus we have implicitly allowed for serial correlation within a unit. Serial correlation in Y_{it} influences \sigma_{\bar{Y}_i}^2:
ฯƒYห‰i2=Var(Yห‰i)=Var(1Tโˆ‘t=1TYit)=1T2Var(Yi1+Yi2+...+YiT)=1T2{Var(Yi1)+...+Var(YiT)+2Cov(Yi1,Yi2)+...+2Cov(YiTโˆ’1,YiT)}\begin{aligned} \sigma_{\bar{Y}_i}^2 &=Var(\bar{Y}_i)\\ &=Var(\frac{1}{T}\sum_{t=1}^TY_{it})\\ &=\frac{1}{T^2}Var(Y_{i1}+Y_{i2}+...+Y_{iT})\\ &=\frac{1}{T^2}\Big\{ Var(Y_{i1})+...+Var(Y_{iT})+2Cov(Y_{i1},Y_{i2})+...+2Cov(Y_{iT-1},Y_{iT}) \Big\} \end{aligned}
If YitY_{it} is serially correlated, the autocovariances are nonzero, so the usual formula, which sets them to 0, will be wrong.
For the clustered SE,
sYห‰i2=1nโˆ’1โˆ‘i=1n(Yห‰iโˆ’Yห‰)2=1nโˆ’1โˆ‘i=1n(1Tโˆ‘t=1TYitโˆ’Yห‰)2=1nโˆ’1โˆ‘i=1n(1Tโˆ‘t=1T(Yitโˆ’Yห‰))2=1nโˆ’1โˆ‘i=1n(1Tโˆ‘t=1T(Yitโˆ’Yห‰))(1Tโˆ‘t=1T(Yitโˆ’Yห‰))=1nโˆ’1โˆ‘i=1n1T2โˆ‘t=1Tโˆ‘s=1T(Yisโˆ’Yห‰)(Yitโˆ’Yห‰)=1T2โˆ‘t=1Tโˆ‘s=1T[1nโˆ’1โˆ‘i=1n(Yisโˆ’Yห‰)(Yitโˆ’Yห‰)]\begin{aligned} s_{\bar{Y}_i}^2 &=\frac{1}{n-1}\sum_{i=1}^n(\bar{Y}_i-\bar{Y})^2\\ &=\frac{1}{n-1}\sum_{i=1}^n\big(\frac{1}{T}\sum_{t=1}^T Y_{it}-\bar{Y}\big)^2\\ &=\frac{1}{n-1}\sum_{i=1}^n\big(\frac{1}{T}\sum_{t=1}^T (Y_{it}-\bar{Y})\big)^2\\ &=\frac{1}{n-1}\sum_{i=1}^n\big(\frac{1}{T}\sum_{t=1}^T (Y_{it}-\bar{Y})\big)\big(\frac{1}{T}\sum_{t=1}^T (Y_{it}-\bar{Y})\big)\\ &=\frac{1}{n-1}\sum_{i=1}^n\frac{1}{T^2}\sum_{t=1}^T\sum_{s=1}^T(Y_{is}-\bar{Y})(Y_{it}-\bar{Y})\\ &=\frac{1}{T^2}\sum_{t=1}^T\sum_{s=1}^T\Big[\frac{1}{n-1}\sum_{i=1}^n(Y_{is}-\bar{Y})(Y_{it}-\bar{Y})\Big] \end{aligned}
the final term 1nโˆ’1โˆ‘i=1n(Yisโˆ’Yห‰)(Yitโˆ’Yห‰)\frac{1}{n-1}\sum_{i=1}^n(Y_{is}-\bar{Y})(Y_{it}-\bar{Y}) estimates the autocovariance between YisY_{is} and YitY_{it}. Thus the clustered SE formula is implicitly estimating all the autocovariances and then using them to estimate ฯƒYห‰i2\sigma_{\bar{Y}_i}^2. In contrast, the usual SE formula zeros out the autocovariances by omitting all the cross terms, which is valid only if those autocovariances are truly all zero.
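The algebra above can be checked numerically: the unit-mean form and the double-sum form of sYห‰i2s_{\bar{Y}_i}^2 coincide for any balanced panel (a small NumPy check on arbitrary simulated data):

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 50, 6
Y = rng.normal(size=(n, T))          # any balanced panel works for this identity

Y_bar_i = Y.mean(axis=1)             # unit means \bar{Y}_i
Y_bar = Y.mean()                     # overall mean \bar{Y}

# Unit-mean form: (1/(n-1)) * sum_i (\bar{Y}_i - \bar{Y})^2
s2_means = np.sum((Y_bar_i - Y_bar) ** 2) / (n - 1)

# Double-sum form:
# (1/T^2) * sum_t sum_s [ (1/(n-1)) sum_i (Y_is - \bar{Y})(Y_it - \bar{Y}) ]
D = Y - Y_bar                        # deviations from the overall mean, shape (n, T)
acov = D.T @ D / (n - 1)             # (T, T) matrix of estimated autocovariances
s2_double = acov.sum() / T**2

print(np.isclose(s2_means, s2_double))  # the two forms agree
```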

8. Panel Data Models with Endogeneity

We consider panel data models where some of the regressors are endogenous and study the FE 2SLS estimation of the model.
The 2SLS estimation based on the first-differenced (FD) model, the between (BE) equation, or the random effects (RE) specification can be analyzed similarly.
Consider the following panel data model with endogeneity:
Yit=ฮฒ1โ€ฒX1it+ฮฒ2โ€ฒX2it+ฮฑi+uitXit=ฮ โ€ฒZit+ฮทi+vit\begin{aligned} Y_{it}&=\bm{\beta}_1'\bm{X}_{1it}+\bm{\beta}_2'\bm{X}_{2it}+\alpha_i+u_{it}\\ \bm{X}_{it}&=\bm{\Pi}' \bm{Z}_{it}+\eta_i+v_{it} \end{aligned}
where
  • X1it\bm{X}_{1it} is a k1ร—1k_1\times 1 vector of included exogenous variables
  • X2it\bm{X}_{2it} is a k2ร—1k_2\times 1 vector of endogenous variables
  • Zit=(X1,itโ€ฒ,Z2,itโ€ฒ)โ€ฒ\bm{Z}_{it}=(\bm{X}'_{1,it},\bm{Z}'_{2,it})', Z2,it\bm{Z}_{2,it} is an l2ร—1l_2\times 1 vector of excluded instrumental variables
  • ฮฑi,ฮทi\alpha_i,\eta_i are the individual-specific fixed effects
  • uit,vitu_{it},v_{it} are idiosyncratic error terms
Let ฮฒ=(ฮฒ1โ€ฒ,ฮฒ2โ€ฒ)โ€ฒ,Xit=(X1itโ€ฒ,X2itโ€ฒ)โ€ฒ\bm{\beta}=(\bm{\beta}'_1,\bm{\beta}_2')',\bm{X}_{it}=(\bm{X}_{1it}',\bm{X}_{2it}')', and k=k1+k2,l=k1+l2k=k_1+k_2,l=k_1+l_2. For identification purposes (the order condition), we assume l2โ‰ฅk2l_2\ge k_2.
To derive the FE 2SLS estimator of ฮฒ\bm{\beta}, we need to remove both ฮฑi\alpha_i and ฮทi\eta_i first.
Yitโˆ’Yห‰i=(X1itโˆ’Xห‰1i)โ€ฒฮฒ1+(X2itโˆ’Xห‰2i)โ€ฒฮฒ2+(uitโˆ’uห‰i)Xitโˆ’Xห‰i=ฮ โ€ฒ(Zitโˆ’Zห‰i)+(vitโˆ’vห‰i)\begin{aligned} Y_{it}-\bar{Y}_i &=(\bm{X}_{1it}-\bar{\bm{X}}_{1i})'\bm{\beta}_1+(\bm{X}_{2it}-\bar{\bm{X}}_{2i})'\bm{\beta}_2+(u_{it}-\bar{u}_i)\\ \bm{X}_{it}-\bar{\bm{X}}_i &=\bm{\Pi}'(\bm{Z}_{it}-\bar{\bm{Z}}_i)+(\bm{v}_{it}-\bar{\bm{v}}_i) \end{aligned}
where, e.g. Yห‰i=1Tโˆ‘t=1TYit\bar{Y}_i=\frac{1}{T}\sum_{t=1}^T Y_{it}
Let Yยจit=Yitโˆ’Yห‰i\ddot{Y}_{it}=Y_{it}-\bar{Y}_i and define Xยจ1it,Xยจ2it,Zยจ2it,uยจit,vยจit\ddot{\bm{X}}_{1it},\ddot{\bm{X}}_{2it},\ddot{\bm{Z}}_{2it},\ddot{u}_{it},\ddot{\bm{v}}_{it} analogously, then
Yยจit=Xยจ1itโ€ฒฮฒ1+Xยจ2itโ€ฒฮฒ2+uยจitXยจit=ฮ โ€ฒZยจit+vยจit\begin{aligned} \ddot{Y}_{it} &=\ddot{\bm{X}}_{1it}'\bm{\beta}_1+\ddot{\bm{X}}_{2it}'\bm{\beta}_2+\ddot{u}_{it}\\ \ddot{\bm{X}}_{it} &=\bm{\Pi}'\ddot{\bm{Z}}_{it}+\ddot{\bm{v}}_{it} \end{aligned}
It does not matter whether or not we time-demean the instruments, but using Zยจit\ddot{\bm{Z}}_{it} emphasizes that FE 2SLS works only when the instruments vary over time.
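The within-demean-then-2SLS procedure can be sketched in NumPy. The data-generating process below, with one included exogenous regressor, one endogenous regressor, one excluded instrument, and uitu_{it} set to 0 so that the estimator recovers ฮฒ\bm{\beta} exactly, is a hypothetical illustration, not the notes' example:

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 100, 5
beta = np.array([1.5, -0.5])        # (beta_1, beta_2): exogenous, endogenous slopes

alpha = rng.normal(size=(n, 1))     # fixed effect alpha_i in the outcome equation
eta = rng.normal(size=(n, 1))       # fixed effect eta_i in the first stage
X1 = rng.normal(size=(n, T))        # included exogenous regressor
Z2 = rng.normal(size=(n, T))        # excluded instrument
X2 = 1.0 * Z2 + eta + rng.normal(size=(n, T))   # first stage for the endogenous regressor
Y = beta[0] * X1 + beta[1] * X2 + alpha         # u_it = 0: a noiseless check

def within(A):
    """Within transformation: subtract each unit's time mean."""
    return A - A.mean(axis=1, keepdims=True)

# Stack the time-demeaned data into (nT,) and (nT, 2) arrays;
# demeaning wipes out alpha_i and eta_i.
Yd = within(Y).ravel()
Xd = np.column_stack([within(X1).ravel(), within(X2).ravel()])
Zd = np.column_stack([within(X1).ravel(), within(Z2).ravel()])  # instruments (X1, Z2)

# 2SLS on the demeaned equation: beta_hat = (Xhat'Xd)^{-1} Xhat'Yd with Xhat = P_Z Xd
Xhat = Zd @ np.linalg.solve(Zd.T @ Zd, Zd.T @ Xd)
beta_hat = np.linalg.solve(Xhat.T @ Xd, Xhat.T @ Yd)
print(beta_hat)                      # recovers beta up to numerical precision
```

With a nonzero uitu_{it} the estimate is no longer exact but remains consistent, and clustered standard errors should then be used for inference.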
To obtain a consistent FE 2SLS (FEIV) estimator, we need the instruments to be strictly exogenous:
E[uitโˆฃZi,ฮฑi]=0, t=1,...,TE[u_{it}|\bm{Z}_i,\alpha_i]=0,\ t=1,...,T
where Zi\bm{Z}_i collects unit ii's instruments for all time periods.