
T4. Binary Dependent Variable


1. Binary Dependent Variables

A binary (dummy) dependent variable is a dependent variable that takes only two values: 0 or 1. By the definition of conditional expectation,
$$E(Y \mid X) = 1 \cdot \Pr(Y = 1 \mid X) + 0 \cdot \Pr(Y = 0 \mid X) = \Pr(Y = 1 \mid X).$$
So when we estimate $E(Y \mid X)$, we are estimating the probability that $Y = 1$ given $X$. Suppose $E(Y \mid X = x) = p$; then $p$ is exactly the probability that $Y = 1$ given $X = x$. Thus, for binary dependent variables, the key question is how to model $\Pr(Y = 1 \mid X)$.

2. Linear Probability Model (LPM)

Single Regressor
The linear probability model (LPM) assumes that
$$\Pr(Y = 1 \mid X) = \beta_0 + \beta_1 X.$$
This is the familiar simple linear regression model. The model for the observed data is
$$Y_i = \beta_0 + \beta_1 X_i + u_i,$$
where $E(u_i \mid X_i) = 0$. It is easy to verify that
$$\operatorname{Var}(u_i \mid X_i) = (\beta_0 + \beta_1 X_i)\bigl(1 - \beta_0 - \beta_1 X_i\bigr).$$
Thus, the error term is conditionally heteroskedastic by construction.
Example
We are interested in whether race is a factor in denying a mortgage application. We have data on mortgage applications in the Boston area. An important determinant is the payment-to-income ratio (P/I ratio).
The Linear Probability Model:
The population regression is
$$\text{deny} = \beta_0 + \beta_1 \,(P/I\ \text{ratio}) + u.$$
Graphical Representation
The estimation results are shown below:
(figure: fitted LPM regression line for deny against the P/I ratio; the estimated slope is 0.604)
Interpretation: if the P/I ratio increases by 0.1, the probability of denial increases by 0.604 × 0.1 = 0.0604, that is, by about 6 percentage points.
If we are interested in the effect of race on the probability of denial, holding the P/I ratio constant, we can add a race dummy variable as a regressor:
(figure: LPM estimates including the race dummy; its estimated coefficient is 0.177)
Interpretation: African-American applicants have a 17.7 percentage-point higher probability of having a mortgage application denied than white applicants, holding the P/I ratio constant.
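To make the mechanics concrete, here is a minimal sketch of estimating an LPM with heteroskedasticity-robust standard errors in Python using statsmodels. The Boston HMDA data are not reproduced here: the data below are synthetic, and the variable names (`pi_ratio`, `black`) and the coefficients used to simulate them are assumptions that only mimic the example, so the estimates will not match the published numbers.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
pi_ratio = rng.uniform(0.1, 0.8, n)          # hypothetical payment-to-income ratio
black = rng.integers(0, 2, n)                # hypothetical race dummy
# Synthetic denial outcomes: a higher P/I ratio and black = 1 raise the denial probability
p_deny = np.clip(-0.09 + 0.6 * pi_ratio + 0.18 * black, 0, 1)
deny = rng.binomial(1, p_deny)

X = sm.add_constant(np.column_stack([pi_ratio, black]))
lpm = sm.OLS(deny, X).fit(cov_type="HC1")    # heteroskedasticity-robust (White) SEs
print(lpm.summary(xname=["const", "pi_ratio", "black"]))
```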
LPM in the General Case
With several regressors, the model is
$$\Pr(Y = 1 \mid X_1, \dots, X_k) = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k.$$
The coefficient $\beta_j$ can be interpreted as the change in the probability that $Y = 1$ for a unit change in $X_j$, holding the other regressors fixed. Inference can be based on White (heteroskedasticity-robust) standard errors.
Summary of LPM
  • Key feature: model $\Pr(Y = 1 \mid X_1, \dots, X_k)$ as a linear function of the regressors
  • Advantages:
    • simple to estimate and to interpret
    • the inference is the same as for linear multiple regression models but we need to use heteroskedasticity-robust standard errors
  • Disadvantages:
    • Predicted probabilities can be less than 0 or greater than 1
    • It makes no sense that the probability of $Y = 1$ should be linear in $X$ for all values of $X$

3. Probit and Logit Models

3.1 Models

We need a “translator” that takes a value from $(-\infty, +\infty)$ and returns a value between 0 and 1 such that:
  • The closer to $+\infty$ the value from the linear regression model is, the closer the predicted probability is to 1.
  • The closer to $-\infty$ the value from the linear regression model is, the closer the predicted probability is to 0.
  • No predicted probabilities are less than 0 or greater than 1.
In common practice, econometricians use TWO such “translators”:
  • Probit (standard normal CDF)
  • Logit (standard logistic CDF)
The differences between the two “translators” are small. In particular, there is no practical difference between them if we only care about predicted probabilities in the middle range of the data.
The nonlinear functions are the “translators”
Both the probit and logit models have the same basic structure. Define a “z-index” as $Z = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$. Use a nonlinear, S-shaped, CDF-type function $G(\cdot)$ to transform $Z$ into a predicted value between 0 and 1. The model is
$$\Pr(Y = 1 \mid X_1, \dots, X_k) = G(Z).$$
  • The probit model uses the standard normal CDF:
    • $\Pr(Y = 1 \mid X_1, \dots, X_k) = \Phi(Z)$, where $\Phi(\cdot)$ is the standard normal CDF
  • The logit model uses the logistic CDF:
    • $\Pr(Y = 1 \mid X_1, \dots, X_k) = F(Z)$, where $F(Z) = \dfrac{1}{1 + e^{-Z}}$
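As a quick numerical sketch, the two “translators” can be evaluated directly with scipy and numpy; the z-index values below are arbitrary illustrative inputs:

```python
import numpy as np
from scipy.stats import norm

def probit_prob(z):
    """Probit translator: standard normal CDF."""
    return norm.cdf(z)

def logit_prob(z):
    """Logit translator: standard logistic CDF."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])   # arbitrary z-index values
print(probit_prob(z))   # approx. [0.0013 0.1587 0.5 0.8413 0.9987]
print(logit_prob(z))    # approx. [0.0474 0.2689 0.5 0.7311 0.9526]
# Both map (-inf, +inf) into (0, 1), so no predicted probability falls outside [0, 1].
```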

3.2 NLS & MLE

Nonlinear least squares (NLS)
The model is
$$Y_i = \Phi(\beta_0 + \beta_1 X_i) + u_i,$$
and the NLS estimators are given by
$$(\hat\beta_0, \hat\beta_1) = \arg\min_{b_0,\, b_1} \sum_{i=1}^{n} \bigl[\, Y_i - \Phi(b_0 + b_1 X_i) \,\bigr]^2.$$
The NLS estimators are consistent and asymptotically normally distributed, but they are inefficient.
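A minimal sketch of probit estimation by NLS, using scipy to minimize the sum of squared residuals on synthetic data; the “true” parameter values used to simulate the data are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
beta0_true, beta1_true = -0.5, 1.0           # assumed parameters for the simulation
y = (rng.uniform(size=n) < norm.cdf(beta0_true + beta1_true * x)).astype(float)

def ssr(beta):
    """Sum of squared residuals for the probit regression function."""
    resid = y - norm.cdf(beta[0] + beta[1] * x)
    return np.sum(resid ** 2)

nls = minimize(ssr, x0=np.zeros(2))          # NLS: minimize the SSR numerically
print(nls.x)                                 # consistent but inefficient estimates
```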
Maximum Likelihood Estimation (MLE)
The probability that $Y_i = 1$ conditional on $X_i$ is $p_i = \Phi(\beta_0 + \beta_1 X_i)$. The conditional probability distribution of the $i$-th observation is
$$\Pr(Y_i = y_i \mid X_i) = p_i^{\,y_i}\,(1 - p_i)^{1 - y_i}, \qquad y_i \in \{0, 1\}.$$
Assuming that $(Y_i, X_i)$, $i = 1, \dots, n$, are i.i.d., the joint probability distribution of $Y_1, \dots, Y_n$ conditional on the $X$'s is
$$\Pr(Y_1 = y_1, \dots, Y_n = y_n \mid X_1, \dots, X_n) = \prod_{i=1}^{n} p_i^{\,y_i}\,(1 - p_i)^{1 - y_i}.$$
The likelihood function is this joint probability distribution treated as a function of the unknown coefficients. The ML estimators are
$$(\hat\beta_0, \hat\beta_1) = \arg\max_{b_0,\, b_1} \sum_{i=1}^{n} \Bigl[\, y_i \ln \Phi(b_0 + b_1 X_i) + (1 - y_i)\ln\bigl(1 - \Phi(b_0 + b_1 X_i)\bigr) \Bigr].$$
The ML estimators are consistent and asymptotically normally distributed. They are also efficient and commonly used in practice.
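In practice, packaged routines compute the ML estimator directly. A minimal sketch with statsmodels on synthetic data:

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = (rng.uniform(size=n) < norm.cdf(-0.5 + 1.0 * x)).astype(int)

X = sm.add_constant(x)
probit_res = sm.Probit(y, X).fit()           # MLE for the probit model
logit_res = sm.Logit(y, X).fit()             # MLE for the logit model
print(probit_res.params, logit_res.params)   # logit coefficients are larger in scale
```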

3.3 Comparison

Both the probit and logit are nonlinear “translators”. There is no real reason to prefer one over the other.
Predicted probabilities from estimated probit and logit models are usually very close, as in the mortgage example above.
Traditionally we saw more of the logit, mainly because the logistic function leads to a more easily computed model. Nowadays, the probit is just as easy to compute with standard packages and has become more popular.

3.4 Interpretation, Estimation, Inference

Interpretation
Clearly, the coefficient estimates across the three models are not directly comparable; it is the probability of denial that is of interest. We can, however, compare the sign and significance (based on a standard z-test) of the coefficients.
In general we care about the effect of $X_j$ on $\Pr(Y = 1 \mid X_1, \dots, X_k)$, that is, we care about $\partial \Pr(Y = 1 \mid X_1, \dots, X_k)/\partial X_j$:
  • For the linear case, this is simply the coefficient on $X_j$.
  • For the nonlinear probit and logit models,
    • $\dfrac{\partial \Pr(Y = 1 \mid X_1, \dots, X_k)}{\partial X_j} = g(Z)\,\beta_j$, where $Z = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$ and $g(\cdot)$ is the derivative of the CDF ($\phi$ for the probit, $F(1 - F)$ for the logit)
    • The adjustment factor $g(Z)$ depends on $Z$, and hence on the values of all the regressors (a computational sketch follows this list)
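A minimal sketch of computing the probit partial effects by hand on synthetic data: the adjustment factor is the standard normal density evaluated at each observation's fitted z-index.

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = (rng.uniform(size=n) < norm.cdf(-0.5 + 1.0 * x)).astype(int)

res = sm.Probit(y, sm.add_constant(x)).fit(disp=0)
b0, b1 = res.params
z = b0 + b1 * x
partial_effects = norm.pdf(z) * b1                   # dP/dx at each observation: phi(z_i) * beta_1
print(partial_effects.min(), partial_effects.max())  # the effect varies with x, unlike in the LPM
```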
Estimation and Inference
For probit and logit models, the difficulty is that partial effects are not constant but depend on the values of the regressors. Thus the PEA and the APE are introduced (a statsmodels sketch follows this list):
  • PEA: Partial Effect at the Average
    • The partial effect of an explanatory variable is evaluated for an “average” individual, i.e., at the sample means of the regressors: $g(\bar{x}'\hat\beta)\,\hat\beta_j$. This is problematic for explanatory variables such as gender, for which an “average” value is not meaningful.
    • For a discrete explanatory variable, say a change in $X_k$ from $c_k$ to $c_k + 1$, the PEA is $G(\cdot)$ evaluated at $X_k = c_k + 1$ minus $G(\cdot)$ evaluated at $X_k = c_k$, with the other regressors held at their sample means.
  • APE: Average Partial Effect
    • The partial effect of an explanatory variable is computed for each individual in the sample and then averaged across all sample members: $\frac{1}{n}\sum_{i=1}^{n} g(x_i'\hat\beta)\,\hat\beta_j$. This method makes more sense in most applications.
    • For a discrete explanatory variable, say a change in $X_k$ from $c_k$ to $c_k + 1$, the difference in $G(\cdot)$ between $X_k = c_k + 1$ and $X_k = c_k$ is computed for each individual (at that individual's values of the other regressors) and then averaged.
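A minimal sketch of the PEA and the APE using statsmodels' `get_margeff` on synthetic data: `at="mean"` evaluates the partial effect at the regressor means (PEA), while the default `at="overall"` averages the individual partial effects (APE).

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = (rng.uniform(size=n) < norm.cdf(-0.5 + 1.0 * x)).astype(int)

res = sm.Probit(y, sm.add_constant(x)).fit(disp=0)
pea = res.get_margeff(at="mean")             # Partial Effect at the Average
ape = res.get_margeff(at="overall")          # Average Partial Effect (the default)
print(pea.margeff, ape.margeff)              # the two summaries generally differ
```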

3.5 Goodness-of-fit measures

Percent correctly predicted
Individual $i$'s outcome is predicted as $\hat{y}_i = 1$ if the fitted probability of that event exceeds 0.5, and $\hat{y}_i = 0$ otherwise; the percentages of correctly predicted zeros and ones are then counted. There are four possible outcomes for each pair $(y_i, \hat{y}_i)$: $(0,0)$, $(0,1)$, $(1,0)$, $(1,1)$. Then,
  • percent correctly predicted for $y_i = 0$: the share of observations with $y_i = 0$ for which $\hat{y}_i = 0$
  • percent correctly predicted for $y_i = 1$: the share of observations with $y_i = 1$ for which $\hat{y}_i = 1$
The overall percent correctly predicted is the weighted average of the two, with weights equal to the fractions of zeros and ones in the sample.
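A minimal sketch of the percent-correctly-predicted calculation on synthetic data, using the 2×2 prediction table that statsmodels provides:

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = (rng.uniform(size=n) < norm.cdf(-0.5 + 1.0 * x)).astype(int)

res = sm.Probit(y, sm.add_constant(x)).fit(disp=0)
table = res.pred_table(threshold=0.5)        # rows: actual 0/1, columns: predicted 0/1
correct_0 = table[0, 0] / table[0].sum()     # percent correctly predicted for y = 0
correct_1 = table[1, 1] / table[1].sum()     # percent correctly predicted for y = 1
overall = (table[0, 0] + table[1, 1]) / table.sum()
print(correct_0, correct_1, overall)         # overall is the weighted average of the two
```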
Example of Percent Correctly Predicted
Pseudo R-squared
Compare the maximized log-likelihood of the model, $\ln L_{ur}$, with that of a model that contains only a constant (and no explanatory variables), $\ln L_0$:
$$\text{pseudo-}R^2 = 1 - \frac{\ln L_{ur}}{\ln L_0}.$$
  • Log-likelihoods are negative, so $\ln L_{ur}/\ln L_0 = |\ln L_{ur}|/|\ln L_0|$, and $0 \le \text{pseudo-}R^2 \le 1$.
  • If none of the $X_j$ are significant, $\ln L_{ur}$ is close to $\ln L_0$ and the pseudo-$R^2$ is close to 0.
  • If $\ln L_{ur} = 0$, the pseudo-$R^2$ equals 1.
  • $\ln L_{ur}$ cannot actually reach zero in a probit/logit model: that would require the estimated probabilities to equal one for every observation with $y_i = 1$ and zero for every observation with $y_i = 0$, which the probit/logit functional forms only approach asymptotically.
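A minimal sketch of the pseudo R-squared on synthetic data; statsmodels reports the same McFadden measure as `prsquared`:

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = (rng.uniform(size=n) < norm.cdf(-0.5 + 1.0 * x)).astype(int)

res = sm.Probit(y, sm.add_constant(x)).fit(disp=0)
pseudo_r2 = 1 - res.llf / res.llnull         # 1 - lnL_ur / lnL_0 (McFadden's pseudo R-squared)
print(pseudo_r2, res.prsquared)              # the two numbers coincide
```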
Correlation-based measures
Define the fitted probabilities $\tilde{y}_i = G(x_i'\hat\beta)$ and calculate the squared correlation between $y_i$ and $\tilde{y}_i$. In any case, goodness-of-fit is usually less important than obtaining convincing estimates of the ceteris paribus effects of the explanatory variables.

3.6 Hypothesis Test after MLE

  • The usual z-tests and confidence intervals can be used.
  • Likelihood ratio test (restricted and unrestricted models needed):
$$LR = 2\bigl(\ln L_{ur} - \ln L_r\bigr) \;\overset{a}{\sim}\; \chi^2_q,$$
where $\ln L_{ur}$ ($\ln L_r$) is the log-likelihood value of the unrestricted (restricted) model and $q$ is the number of restrictions. This is based on the same concept as the F-test in a linear model. Basic idea: $\ln L_{ur} \ge \ln L_r$, and under $H_0$, $LR$ is close to zero.
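A minimal sketch of the likelihood ratio test on synthetic data, where the restriction tested is that the coefficient on a second regressor (here called `x2`, an assumption for illustration) is zero:

```python
import numpy as np
from scipy.stats import norm, chi2
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = (rng.uniform(size=n) < norm.cdf(-0.5 + 1.0 * x1 + 0.5 * x2)).astype(int)

unrestricted = sm.Probit(y, sm.add_constant(np.column_stack([x1, x2]))).fit(disp=0)
restricted = sm.Probit(y, sm.add_constant(x1)).fit(disp=0)   # H0: coefficient on x2 is zero

lr = 2 * (unrestricted.llf - restricted.llf)  # LR = 2 (lnL_ur - lnL_r)
q = 1                                         # number of restrictions
print(lr, chi2.sf(lr, df=q))                  # LR ~ chi2(q) under H0; small p-value rejects H0
```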

3.7 Latent Variable Model

Probit and logit models can be derived from an underlying latent variable model.
Let
$$y^* = \beta_0 + x\beta + e, \qquad y = \mathbf{1}[\,y^* > 0\,].$$
Here $y^*$ is an unobserved, or latent, variable, which rarely has a well-defined unit of measurement; for example, $y^*$ might be the difference in utility levels between two actions. The error $e$ has either the standard normal or the standard logistic distribution, symmetric about zero, which implies $1 - G(-z) = G(z)$. Only $y$ is observable. $\mathbf{1}[\cdot]$ is the indicator function, which takes the value one if the event in the brackets is true and zero otherwise.
The response probability for $y$ is
$$\Pr(y = 1 \mid x) = \Pr(y^* > 0 \mid x) = \Pr\bigl(e > -(\beta_0 + x\beta) \mid x\bigr) = 1 - G\bigl(-(\beta_0 + x\beta)\bigr) = G(\beta_0 + x\beta).$$
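A minimal sketch of the latent variable derivation as a simulation: generate $y^*$ with a standard normal error, observe only $y = \mathbf{1}[y^* > 0]$, and check that a probit fit recovers estimates close to the assumed parameter values.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
beta0, beta1 = -0.5, 1.0                          # assumed latent-model parameters
y_star = beta0 + beta1 * x + rng.normal(size=n)   # latent variable with standard normal error
y = (y_star > 0).astype(int)                      # observed binary outcome: y = 1[y* > 0]

res = sm.Probit(y, sm.add_constant(x)).fit(disp=0)
print(res.params)                                 # close to (beta0, beta1), as the derivation implies
```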
