Variables with quantitative meaning: hourly wage rate, years of education, college GPA.
Variables with qualitative meaning: gender, race, province.
This chapter covers regression analysis with qualitative independent variables.
A Single Dummy Independent Variable
1. Analyze in One Regression Framework
When we want to analyze the average wage difference between men and women, we can collect a sample, calculate men's and women's average wages separately, and then compare the two.
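In practice, we study this within a regression framework. Define a binary variable (also called a zero-one or dummy variable):
$$\text{female} = \begin{cases} 1 & \text{if the person is a woman} \\ 0 & \text{if the person is a man} \end{cases}$$
Theoretically, we could use any two different values instead of zero and one. However, zero-one leads to natural interpretations of the regression parameters.
Define
$$\mu_0 = E(\text{wage} \mid \text{female} = 0), \qquad \mu_1 = E(\text{wage} \mid \text{female} = 1).$$
Then we have
$$E(\text{wage} \mid \text{female}) = \mu_0 + (\mu_1 - \mu_0)\,\text{female} = \beta_0 + \delta_0\,\text{female},$$
where $\beta_0 = \mu_0$ and $\delta_0 = \mu_1 - \mu_0$. So to test whether $\mu_1 = \mu_0$, we can test whether $\delta_0 = 0$.
Model:
$$\text{wage} = \beta_0 + \delta_0\,\text{female} + u,$$
thus $E(\text{wage} \mid \text{female} = 0) = \beta_0$ and $E(\text{wage} \mid \text{female} = 1) = \beta_0 + \delta_0$.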
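The equivalence between comparing group means and regressing on a single dummy can be checked numerically. The following is a minimal sketch on simulated data; the wage numbers, the sample size, and the variable names are assumptions made for illustration only.

```python
import numpy as np

# Simulated data (hypothetical numbers, for illustration only)
rng = np.random.default_rng(0)
n = 200
female = rng.integers(0, 2, size=n)                     # 1 = woman, 0 = man
wage = 7.0 - 2.5 * female + rng.normal(0, 1, size=n)    # made-up wage process

# OLS of wage on an intercept and the female dummy
X = np.column_stack([np.ones(n), female])
beta0_hat, delta0_hat = np.linalg.lstsq(X, wage, rcond=None)[0]

# Plain group averages
mean_men = wage[female == 0].mean()
mean_women = wage[female == 1].mean()

print("intercept vs. men's mean:        ", beta0_hat, mean_men)
print("dummy coef vs. difference in means:", delta0_hat, mean_women - mean_men)
```

The intercept estimate reproduces the men's average and the dummy coefficient reproduces the women-minus-men difference, so a t test on the dummy coefficient is a test of equal means.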
2. Add Control Variables
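- We are interested in whether there is a wage difference across genders after controlling for education, labor market experience, and years working with the current employer (tenure):
$$\text{wage} = \beta_0 + \delta_0\,\text{female} + \beta_1\,\text{educ} + \beta_2\,\text{exper} + \beta_3\,\text{tenure} + u$$
- $\delta_0$: holding the other variables fixed, the difference in wage between women and men.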
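When using dummy variables, be cautious about perfect collinearity:
- In the above case, we include female and set men as the base group (the group for which the dummy variable is 0), also called the baseline group.
- We cannot set the model as $\text{wage} = \beta_0 + \delta_0\,\text{female} + \gamma_0\,\text{male} + \dots + u$, because of perfect collinearity: $\text{female} + \text{male} = 1$, which duplicates the intercept.
- It is okay, however, to set the model as $\text{wage} = \beta_0\,\text{male} + \alpha_0\,\text{female} + \dots + u$, which does not have an intercept.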
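The collinearity problem can be seen directly from the rank of the design matrix. This is a small sketch on simulated data; the variable names and the single control `educ` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
female = rng.integers(0, 2, size=n)
male = 1 - female                                    # female + male = 1 for everyone
educ = rng.integers(8, 19, size=n).astype(float)     # hypothetical control variable
const = np.ones(n)

X_trap = np.column_stack([const, female, male, educ])   # intercept + both dummies
X_base = np.column_stack([const, female, educ])         # intercept + one dummy
X_noint = np.column_stack([female, male, educ])         # both dummies, no intercept

rank_trap = np.linalg.matrix_rank(X_trap)
rank_base = np.linalg.matrix_rank(X_base)
rank_noint = np.linalg.matrix_rank(X_noint)

print("columns / rank with intercept and both dummies:", X_trap.shape[1], rank_trap)
print("columns / rank with intercept and one dummy:   ", X_base.shape[1], rank_base)
print("columns / rank with both dummies, no intercept:", X_noint.shape[1], rank_noint)
```

Only the first design matrix has fewer independent columns than regressors, which is why OLS cannot separately identify the intercept and both dummy coefficients there.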
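However, models without an intercept have some annoying shortcomings.
- It is more difficult to test the difference between the two groups. In a no-intercept model such as $\text{wage} = \beta_0\,\text{male} + \alpha_0\,\text{female} + \dots + u$, the hypothesis is $H_0: \alpha_0 = \beta_0$, so we need to calculate $\text{se}(\hat\alpha_0 - \hat\beta_0)$.
- The conventional $R^2 = 1 - SSR/SST$ may be negative, which makes it hard to interpret.
  - Explanation: Consider two models:
$$(1)\quad y = \beta_0 + u \qquad\qquad (2)\quad y = \beta_1 x_1 + \beta_2 x_2 + u$$
    The OLS estimator in (1) is $\hat\beta_0 = \bar y$ (the FOC of $\min_{b_0} \sum_i (y_i - b_0)^2$ is $\sum_i (y_i - \hat\beta_0) = 0$). Since $SST = \sum_i (y_i - \bar y)^2$, $SST$ can be viewed as the sum of squared residuals of model (1). Given that $SSR$ is the sum of squared residuals in (2), $R^2 = 1 - SSR/SST$ compares the sums of squared residuals of the two models. In general, $x_1$ and $x_2$ can be any variables, so neither model is a special case of the other. There is therefore no guarantee that $SSR \le SST$.
- To address the issue, some researchers use the uncentered $R_0^2$ when there is no intercept in the model:
$$R_0^2 = 1 - \frac{SSR}{SST_0}, \qquad \text{where } SST_0 = \sum_i y_i^2.$$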
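A quick numerical illustration of the negative $R^2$: the sketch below fits a regression through the origin on simulated data whose mean is far from zero. The data-generating numbers are assumptions picked to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.normal(0, 1, size=n)
y = 10.0 + 0.1 * x + rng.normal(0, 1, size=n)   # large mean, weak slope (made up)

# OLS without an intercept: y = b * x + u
b = (x @ y) / (x @ x)
resid = y - b * x
ssr = resid @ resid

sst = ((y - y.mean()) ** 2).sum()   # centered total sum of squares
sst0 = (y ** 2).sum()               # uncentered total sum of squares

r2_centered = 1 - ssr / sst
r2_uncentered = 1 - ssr / sst0

print("conventional R^2:", r2_centered)
print("uncentered R^2:  ", r2_uncentered)
```

Because the no-intercept fit cannot capture the mean of `y`, its residual sum of squares exceeds the centered total sum of squares and the conventional measure turns negative, while the uncentered version stays between zero and one.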
Using Dummy Variables for Multiple Categories
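Suppose we now have four types: single men, single women, married men, and married women. If we want to study the wage differences among these types, we can define three dummy variables.
Set single men as the base (baseline) group, then
$$\text{wage} = \beta_0 + \delta_1\,\text{marrmale} + \delta_2\,\text{marrfem} + \delta_3\,\text{singfem} + \dots + u,$$
where each $\delta_j$ measures the difference in wage between that group and single men, holding other factors fixed.
Note that we can also use only two dummy variables in this case (for example, female and married) and include their interaction in the model.
Generally, if we have $g$ categories, then we include $g - 1$ dummy variables along with an intercept. The coefficient on the dummy for group $j$ is the difference in the expected value of $y$ between group $j$ and the baseline group.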
Example
We want to estimate the effect of city credit ratings $CR$ on the municipal bond interest rate $MBR$. $CR$ has five categories: 0 - 4.
One intuitive thought is to regress the model
$$MBR = \beta_0 + \beta_1\,CR + \text{other factors}.$$
This may not match reality, because the change in $MBR$ may not be the same each time $CR$ increases by one category.
A better approach is to define four dummy variables.
- Let $CR_j = 1$ when $CR = j$ (for $j = 1, 2, 3, 4$); otherwise it equals zero.
- Estimate the model
$$MBR = \beta_0 + \delta_1\,CR_1 + \delta_2\,CR_2 + \delta_3\,CR_3 + \delta_4\,CR_4 + \text{other factors}.$$
- Then $\delta_j$ is the difference in $MBR$ (holding other factors fixed) between a municipality with a credit rating of $j$ and a municipality with a credit rating of zero (baseline group).
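The dummy-variable specification for an ordinal rating can be sketched as follows. The data are simulated with a deliberately uneven rating effect, there are no other controls, and all names and numbers are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
cr = rng.integers(0, 5, size=n)                         # credit rating 0..4
true_effect = np.array([0.0, -0.2, -0.3, -1.0, -1.1])   # hypothetical, uneven steps
mbr = 5.0 + true_effect[cr] + rng.normal(0, 0.1, size=n)

def fit(X, y):
    """Return OLS coefficients and the sum of squared residuals."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    return coef, float(resid @ resid)

# Four dummies CR_1..CR_4, with CR = 0 as the baseline group
D = (cr[:, None] == np.arange(1, 5)).astype(float)
coef_dummy, ssr_dummy = fit(np.column_stack([np.ones(n), D]), mbr)
delta_hat = coef_dummy[1:]

# Restrictive alternative: CR entered as a single numeric variable
coef_lin, ssr_lin = fit(np.column_stack([np.ones(n), cr]), mbr)

group_means = np.array([mbr[cr == j].mean() for j in range(5)])
print("delta_hat:              ", delta_hat)
print("mean_j - mean_0:        ", group_means[1:] - group_means[0])
print("SSR dummies vs. linear: ", ssr_dummy, ssr_lin)
```

Without other controls, each dummy coefficient equals the gap between that rating's average and the baseline average. The single-variable model is the special case in which the coefficient on the rating-$j$ dummy is $j$ times a common step, so its sum of squared residuals can never be smaller.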
Interactions Involving Dummy Variables
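In a model such as $\text{wage} = \beta_0 + \delta_0\,\text{female} + \beta_1\,\text{educ} + u$, we assume that the return on education is the same among women and men, since the slope on educ is $\beta_1$ for both groups.
We now want to test this assumption.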
1. Regression with Interactive Term
Method 1: Run Separate Regression
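Assume that
$$\text{men: } \text{wage} = \alpha_0 + \alpha_1\,\text{educ} + u, \qquad \text{women: } \text{wage} = \gamma_0 + \gamma_1\,\text{educ} + u.$$
Intuitively, we can compare $\hat\alpha_1$ and $\hat\gamma_1$, but it is difficult to test whether $\alpha_1 = \gamma_1$ from two separate regressions.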
Method 2: Run Regression with Interactive Term
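Consider the model:
$$\text{wage} = \beta_0 + \delta_0\,\text{female} + \beta_1\,\text{educ} + \delta_1\,\text{female} \cdot \text{educ} + u,$$
then
$$E(\text{wage} \mid \text{female} = 0, \text{educ}) = \beta_0 + \beta_1\,\text{educ}, \qquad E(\text{wage} \mid \text{female} = 1, \text{educ}) = (\beta_0 + \delta_0) + (\beta_1 + \delta_1)\,\text{educ}.$$
Therefore,
- $\delta_0$: the difference in intercepts between women and men.
- $\delta_1$: the difference in slopes (i.e. return on education) between women and men.
- To test whether the return on education is the same among men and women, we test whether $\delta_1 = 0$.
- To test whether the model is the same for men and women, we test $H_0: \delta_0 = 0, \delta_1 = 0$.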
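The link between the two methods can be verified numerically: the fully interacted regression reproduces the coefficients of the two separate regressions. The sketch below uses simulated data, and the coefficient values and names are assumptions for illustration.

```python
import numpy as np

def ols(X, y):
    """OLS coefficient vector via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(4)
n = 400
female = rng.integers(0, 2, size=n)
educ = rng.integers(8, 19, size=n).astype(float)
# Hypothetical wage process with a different intercept and slope for women
wage = 1.0 + 0.8 * educ - 0.5 * female - 0.2 * female * educ + rng.normal(0, 1, size=n)

# Method 2: one regression with a dummy and an interaction term
X = np.column_stack([np.ones(n), female, educ, female * educ])
b0, d0, b1, d1 = ols(X, wage)

# Method 1: separate regressions for men and women
men = female == 0
women = female == 1
coef_men = ols(np.column_stack([np.ones(men.sum()), educ[men]]), wage[men])
coef_women = ols(np.column_stack([np.ones(women.sum()), educ[women]]), wage[women])

print("men   (intercept, slope):", (b0, b1), coef_men)
print("women (intercept, slope):", (b0 + d0, b1 + d1), coef_women)
```

The point estimates coincide, but the single regression also delivers standard errors for the differences in intercept and slope, which is what makes the hypotheses above easy to test.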
2. Testing for Differences across Groups
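To test whether all the coefficients are the same for men and women, we can include interaction terms for all variables:
$$\text{wage} = \beta_0 + \delta_0\,\text{female} + \beta_1\,\text{educ} + \delta_1\,\text{female} \cdot \text{educ} + \beta_2\,\text{exper} + \delta_2\,\text{female} \cdot \text{exper} + \beta_3\,\text{tenure} + \delta_3\,\text{female} \cdot \text{tenure} + u$$
Test $H_0: \delta_0 = \delta_1 = \delta_2 = \delta_3 = 0$.
We can use the F-test,
or use the Chow statistic.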
3. Chow Statistic
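In the general model with $k$ explanatory variables and an intercept, suppose we have two groups, $g = 1$ and $g = 2$.
We would like to test whether the intercept and all slopes are the same across the two groups. Model:
$$y = \beta_{g,0} + \beta_{g,1} x_1 + \dots + \beta_{g,k} x_k + u, \qquad g = 1, 2.$$
The null hypothesis that every coefficient is the same across the two groups imposes $k + 1$ restrictions.
It can be proved that the sum of squared residuals from the unrestricted model can be obtained from two separate regressions, one for each group, that is,
$$SSR_{ur} = SSR_1 + SSR_2.$$
The F-stat:
$$F = \frac{SSR_P - (SSR_1 + SSR_2)}{SSR_1 + SSR_2} \cdot \frac{n - 2(k + 1)}{k + 1}$$
- $SSR_P$: SSR from pooling the groups and estimating a single equation.
- Note the conditions to satisfy when using the Chow statistic:
  - the model satisfies homoskedasticity
  - we want to test for no differences at all between the groups.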
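The Chow statistic can be computed with three regressions. The following is a minimal sketch on simulated data with $k = 2$ regressors; the group indicator `g`, the coefficients, and the helper name `ssr` are assumptions for illustration.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(5)
n, k = 300, 2
g = rng.integers(0, 2, size=n)                 # 0 = first group, 1 = second group
x = rng.normal(size=(n, k))
# Hypothetical process: the groups differ only in the intercept
y = 1.0 + x @ np.array([0.5, -0.3]) + 0.4 * g + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])

ssr_p = ssr(X, y)                              # pooled (restricted) regression
ssr_1 = ssr(X[g == 0], y[g == 0])              # first group only
ssr_2 = ssr(X[g == 1], y[g == 1])              # second group only

# Unrestricted model: every regressor (and the intercept) interacted with g
X_ur = np.column_stack([X, g[:, None] * X])
ssr_ur = ssr(X_ur, y)

chow = (ssr_p - (ssr_1 + ssr_2)) / (ssr_1 + ssr_2) * (n - 2 * (k + 1)) / (k + 1)

print("SSR_ur vs. SSR_1 + SSR_2:", ssr_ur, ssr_1 + ssr_2)
print("Chow F statistic:", chow)
```

Under the null and homoskedasticity, the statistic would be compared with an F distribution with $k + 1$ and $n - 2(k + 1)$ degrees of freedom.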
Program Evaluation with Dummy Variable
We would like to know whether economic or social programs have effects. To find out, we can use two groups of subjects: a control group, which does not participate in the program, and an experimental group (treatment group), which does take part in the program.
Things to consider: the self-selection problem. Do subjects decide for themselves whether to participate, so that the two groups may differ systematically?
If yes, we need to think about what could possibly be different between the two groups and then control for these factors.
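The consequence of self-selection can be illustrated with a small simulation. This is a sketch under strong assumptions: a single confounder called `ability` (hypothetical) drives both participation and the outcome, it is fully observed, and the true program effect is set to 1.0.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5000
ability = rng.normal(size=n)                                # hypothetical confounder
# Self-selection: higher-ability subjects are more likely to participate
treat = (ability + rng.normal(size=n) > 0).astype(float)
y = 2.0 + 1.0 * treat + 2.0 * ability + rng.normal(size=n)  # true effect = 1.0

const = np.ones(n)

# Regression on the program dummy alone
naive = np.linalg.lstsq(np.column_stack([const, treat]), y, rcond=None)[0][1]

# Regression on the program dummy plus the factor that differs between groups
controlled = np.linalg.lstsq(
    np.column_stack([const, treat, ability]), y, rcond=None
)[0][1]

print("dummy coefficient without controls:", naive)
print("dummy coefficient with control:    ", controlled)
```

Without the control, the dummy coefficient picks up the ability gap between participants and non-participants on top of the program effect. Adding the control brings the estimate close to the value used in the simulation, although in real data the relevant factors are rarely all observed.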