
Ch17: Limited Dependent Variable Models


Linear Probability Model

1. Limited Dependent Variable

A limited dependent variable is one whose range of values is restricted, such as a binary outcome indicating:
  • whether an individual is married
  • whether a firm exits the market, etc.
Recap: Bernoulli Distribution
If a random variable $Y$ takes on only the values 0 and 1, then $Y$ follows a Bernoulli distribution. Assume $P(Y=1) = p$, thus $P(Y=0) = 1 - p$ and
$$f(y) = p^y (1-p)^{1-y}, \quad y \in \{0, 1\}, \qquad E(Y) = p$$

2. Linear Probability Model

$y$ takes on two values: 0 and 1. Suppose $y$ and $x$ have the linear relation:
$$y = \beta_0 + \beta_1 x + u$$
Suppose $E(u \mid x) = 0$. Then
$$P(y = 1 \mid x) = E(y \mid x) = \beta_0 + \beta_1 x$$
  • $\beta_1$ represents the impact on the probability that $y = 1$ when $x$ increases by one unit.
  • $\beta_1$ measures the marginal effect of $x$ on the probability that $y = 1$.
Note that whether $y$ is binary or continuous does not affect how we interpret the model.
Descriptive interpretation
$\beta_1$ is the expected difference in the probability that $y = 1$ if $x$ changes by one unit.
Causal interpretation
A one-unit increase in $x$ causes the probability of $y = 1$ to change by $\beta_1$ on average.
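As a purely illustrative calculation (the coefficient value is hypothetical, not from the notes): if $\hat\beta_1 = 0.04$, then a one-unit increase in $x$ changes the estimated probability by
$$\Delta \widehat{P}(y = 1 \mid x) = \hat\beta_1 \cdot \Delta x = 0.04 \times 1 = 0.04,$$
i.e., the probability that $y = 1$ rises by 4 percentage points.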
Heteroskedasticity
The model violates homoskedasticity, because
$$\mathrm{Var}(y \mid x) = p(x)[1 - p(x)], \quad \text{where } p(x) = P(y = 1 \mid x) = \beta_0 + \beta_1 x,$$
is a function of $x$.
We can use FGLS to estimate the model:
  1. $h(x) = p(x)[1 - p(x)]$ can be estimated using the OLS fitted values, $\hat h_i = \hat y_i (1 - \hat y_i)$, and WLS is then applied with weights $1/\hat h_i$.
  2. We can also estimate by OLS and report heteroskedasticity-robust standard errors.
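A minimal Stata sketch of the two options, using hypothetical variable names y, x1, x2 (not from the notes):

```stata
* Option 2: OLS with heteroskedasticity-robust standard errors
reg y x1 x2, robust

* Option 1 (FGLS/WLS): estimate h(x) = yhat*(1 - yhat) from a first-stage OLS
reg y x1 x2
predict yhat, xb
* drop fitted probabilities outside (0,1), where the weight is undefined
gen h = yhat*(1 - yhat) if yhat > 0 & yhat < 1
reg y x1 x2 [aw = 1/h]
```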
Example: Labor Participation Rate
How education affects women's labor force participation: inlf is a binary variable indicating participation in the labor force.
[Figure: OLS regression output of inlf on educ]
How should we understand the coefficient of educ?
A one-year increase in education causes the probability of inlf = 1 to change by the coefficient on educ, on average.
Note that the fitted value $\hat y$ could be larger than 1 or smaller than 0 (a fault of linear models). However, that is not a problem if our interest is in how educ affects $y$.
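A sketch of this example in Stata, assuming Wooldridge's MROZ dataset is available through the bcuse package (an assumption; the original output is not reproduced here):

```stata
* ssc install bcuse   // one-time install; fetches Wooldridge's datasets
bcuse mroz, clear
* LPM: regress the binary inlf on education, with robust standard errors
reg inlf educ, robust
```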

Non-linear Model

In some applications, we want to make sure the fitted values are between 0 and 1.
Non-linear model:
$$P(y = 1 \mid x) = G(\beta_0 + \beta_1 x)$$
where $G(\cdot)$ is a function mapping any real value into the range $(0, 1)$, which makes sure $P(y = 1 \mid x)$ stays between 0 and 1.
Two common forms of $G(\cdot)$:
  • the logistic function (logit)
  • the standard normal CDF (probit)

1. Logit Model

$$G(z) = \Lambda(z) \equiv \frac{\exp(z)}{1 + \exp(z)}$$
This is the CDF of the standard logistic random variable.
[Figure: the standard logistic CDF]
We can observe that $G(z) \to 0$ as $z \to -\infty$, and $G(z) \to 1$ as $z \to \infty$.
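A standard algebraic fact worth noting here, since it reappears in the partial-effects section below: the logistic density equals $G(1 - G)$,
$$g(z) = \frac{dG(z)}{dz} = \frac{e^{-z}}{(1 + e^{-z})^2} = G(z)\,[1 - G(z)]$$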

2. Probit Model

$$G(z) = \Phi(z) \equiv \int_{-\infty}^{z} \phi(v)\, dv, \qquad \phi(v) = \frac{1}{\sqrt{2\pi}} e^{-v^2/2}$$
$\Phi(z)$ is the CDF of the standard normal random variable. Also, $G(z) \to 0$ as $z \to -\infty$, and $G(z) \to 1$ as $z \to \infty$.
[Figure: the standard normal CDF]

3. Properties of Logit and Probit

Logit and probit can be derived from an underlying latent variable $y^*$ (which cannot be observed).
Suppose the random variable $e$ has a CDF $G(\cdot)$ that is symmetric about zero. Here $G$ can be either the logistic CDF (logit) or the standard normal CDF (probit).
Let $y^* = \beta_0 + \mathbf{x}\boldsymbol\beta + e$, where $e$ is independent of $\mathbf{x}$.
$y$ can be obtained from $y^*$ by the transformation $y = 1[y^* > 0]$.
Therefore,
$$P(y = 1 \mid \mathbf{x}) = P(y^* > 0 \mid \mathbf{x}) = P\big(e > -(\beta_0 + \mathbf{x}\boldsymbol\beta)\big) = 1 - G\big(-(\beta_0 + \mathbf{x}\boldsymbol\beta)\big) = G(\beta_0 + \mathbf{x}\boldsymbol\beta),$$
where the last equality uses the symmetry of $G$.

4. Partial Effect

The marginal effect of $x$ on the probability that $y = 1$:
$$\frac{\partial P(y = 1 \mid x)}{\partial x} = g(\beta_0 + \beta_1 x)\, \beta_1, \quad \text{where } g(z) \equiv \frac{dG(z)}{dz}$$
When we have more than one independent variable:
$$\frac{\partial P(y = 1 \mid \mathbf{x})}{\partial x_j} = g(\beta_0 + \mathbf{x}\boldsymbol\beta)\, \beta_j$$
so the ratio of the partial effects of $x_j$ and $x_h$ is $\beta_j / \beta_h$.
One cost of using logit/probit instead of OLS is that the partial effects are harder to summarize, because the scale factor $g(\beta_0 + \mathbf{x}\boldsymbol\beta)$ depends on $\mathbf{x}$.
One possibility: plug in values of interest for the $x_j$, such as their means, and then see how the partial effect changes:
  • Partial effect at the average (PEA): $g(\hat\beta_0 + \bar{\mathbf{x}}\hat{\boldsymbol\beta})\, \hat\beta_j$ (evaluated at the independent variables' averages)
  • Average partial effect (APE): $\frac{1}{n} \sum_{i=1}^{n} g(\hat\beta_0 + \mathbf{x}_i \hat{\boldsymbol\beta})\, \hat\beta_j$ (averaged over the sample vectors $\mathbf{x}_i$)
⇒ logit/probit allow the marginal effect to differ across different $\mathbf{x}$
⇒ However, the calculated marginal effect relies on the assumption that $e$ follows a specific distribution
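Both summaries can be computed in Stata after a logit or probit fit via margins (a sketch with the same hypothetical variables as above):

```stata
logit y x1 x2
* Average partial effect (APE): average g(xb)*b_j over the sample
margins, dydx(*)
* Partial effect at the average (PEA): evaluate g(.) at the covariate means
margins, dydx(*) atmeans
```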

5. Linear Probability Model or Logit/Probit

If our primary interest is to predict whether $y = 1$ or to estimate $P(y = 1 \mid \mathbf{x})$, then logit and probit are preferred. However, in many other applications our interest is in the causal impact of $x$ on $y$ (causal interpretation), or in how the conditional mean of $y$ changes with $x$ (descriptive interpretation). In these cases OLS is easier to interpret and compute (it is a linear model), and it relies on fewer assumptions. In practice, the partial effects calculated using linear models are quite close to those calculated using logit or probit.
💡
If your interest is to estimate the partial effect of $x$ on $y$, OLS is enough. If you want to be cautious, put logit and probit estimations in robustness checks.

Maximum Likelihood Estimation

1. Steps of MLE

Suppose we have a random sample of size $n$. Fixing $\mathbf{x}_i$ and $\boldsymbol\beta$ (where $\mathbf{x}_i$ includes a constant, so the intercept is absorbed into $\mathbf{x}_i \boldsymbol\beta$), the probability that $y_i = 1$ is $G(\mathbf{x}_i \boldsymbol\beta)$:
$$P(y_i = 1 \mid \mathbf{x}_i) = G(\mathbf{x}_i \boldsymbol\beta)$$
Then for any observation $y_i = 0$ or $1$, its probability density function is (Bernoulli distribution):
$$f(y_i \mid \mathbf{x}_i; \boldsymbol\beta) = [G(\mathbf{x}_i \boldsymbol\beta)]^{y_i} [1 - G(\mathbf{x}_i \boldsymbol\beta)]^{1 - y_i}$$
For a random sample, all observations are independent of each other. Then the probability that we observe the sample is ($i$ is the index for each observation):
$$L(\boldsymbol\beta) = \prod_{i=1}^{n} [G(\mathbf{x}_i \boldsymbol\beta)]^{y_i} [1 - G(\mathbf{x}_i \boldsymbol\beta)]^{1 - y_i}$$
MLE: maximize the probability that we observe the data:
$$\hat{\boldsymbol\beta} = \arg\max_{\boldsymbol\beta} L(\boldsymbol\beta)$$
Take the natural logarithm and define:
$$\ell_i(\boldsymbol\beta) = y_i \ln G(\mathbf{x}_i \boldsymbol\beta) + (1 - y_i) \ln[1 - G(\mathbf{x}_i \boldsymbol\beta)]$$
Then we can equivalently write:
$$\hat{\boldsymbol\beta} = \arg\max_{\boldsymbol\beta} \sum_{i=1}^{n} \ell_i(\boldsymbol\beta)$$
๐Ÿ‘‰๐Ÿป
General steps of MLE: 1. Write the likelihood function 2. For a random sample, we can write the likelihood function for one observation, and then multiply to get the sample likelihood funtion since we assume that the observations are iid. 3. Take the natural logarithm and turn it to an optimization problem
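To make step 3 concrete for the logit case (a standard result, not spelled out in the original notes): using $g = G(1 - G)$ for the logistic CDF, the first-order condition simplifies to
$$\frac{\partial}{\partial \boldsymbol\beta} \sum_{i=1}^{n} \ell_i(\boldsymbol\beta) = \sum_{i=1}^{n} \big[\, y_i - G(\mathbf{x}_i \boldsymbol\beta) \,\big]\, \mathbf{x}_i' = \mathbf{0},$$
which has no closed-form solution, so the maximization is carried out numerically.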
It can be proved that MLE is consistent and asymptotically efficient.
Stata commands
  • When $G$ is the standard logistic CDF, $\hat{\boldsymbol\beta}$ is the logit estimator ⇒ logit y x
  • When $G$ is the standard normal CDF, $\hat{\boldsymbol\beta}$ is the probit estimator ⇒ probit y x
To calculate the partial effects, use the command margins, dydx().
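Putting the commands together (the data loading assumes the bcuse sketch from the LPM example above):

```stata
bcuse mroz, clear
logit inlf educ
margins, dydx(educ)    // average partial effect of education (logit)
probit inlf educ
margins, dydx(educ)    // probit APE, typically close to the logit APE
```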

2. MLE and OLS

OLS is used to estimate linear models while MLE can be used to estimate both linear and non-linear models.
For the linear model $y = \mathbf{x}\boldsymbol\beta + u$, when $u$ follows a normal distribution, the OLS estimator is the same as the MLE estimator.
Proof:
Suppose $u \sim N(0, \sigma^2)$, thus $y \mid \mathbf{x} \sim N(\mathbf{x}\boldsymbol\beta, \sigma^2)$.
Conditional on $\mathbf{x}_i$ and $\boldsymbol\beta$ (fixing them), use the normal density formula:
$$f(y_i \mid \mathbf{x}_i; \boldsymbol\beta, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y_i - \mathbf{x}_i\boldsymbol\beta)^2}{2\sigma^2} \right)$$
The objective function of MLE:
$$\ln L(\boldsymbol\beta, \sigma^2) = -\frac{n}{2}\ln(2\pi) - n\ln\sigma - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mathbf{x}_i\boldsymbol\beta)^2$$
Maximizing it over $\boldsymbol\beta$ is equivalent to minimizing $\sum_{i=1}^{n} (y_i - \mathbf{x}_i\boldsymbol\beta)^2$, which is the objective function of the OLS estimator.
