
T4. Binary Dependent Variable


1. Binary Dependent Variables

A binary (dummy) dependent variable is a dependent variable that takes only two values: 0 or 1. By the definition of conditional expectation,
$$E(Y \mid X) = 1 \cdot \Pr(Y = 1 \mid X) + 0 \cdot \Pr(Y = 0 \mid X) = \Pr(Y = 1 \mid X).$$
So when we estimate $E(Y \mid X)$, we are estimating the probability that $Y = 1$ given $X$. Suppose $E(Y \mid X = x) = p$; then $p$ is exactly the probability that $Y = 1$ given $X = x$. Thus, for binary dependent variables, the key question is how to model $\Pr(Y = 1 \mid X)$.

2. Linear Probability Model (LPM)

Single Regressor
The linear probability model (LPM) assumes that
$$\Pr(Y = 1 \mid X) = \beta_0 + \beta_1 X.$$
This is the familiar simple linear regression model. The model for the observed data is
$$Y_i = \beta_0 + \beta_1 X_i + u_i,$$
where $E(u_i \mid X_i) = 0$. It is easy to verify that
$$\operatorname{Var}(u_i \mid X_i) = (\beta_0 + \beta_1 X_i)\bigl(1 - \beta_0 - \beta_1 X_i\bigr).$$
Thus, the error term is conditionally heteroskedastic by construction.
Example
We are interested in whether race is a factor in denying a mortgage application. We have data on mortgage applications in the Boston area. An important determinant is the payment-to-income ratio (P/I ratio).
The Linear Probability Model:
The population regression is
$$\text{deny} = \beta_0 + \beta_1 \,(P/I\ \text{ratio}) + u.$$
Graphical Representation
The estimation results are shown below:
(figure: fitted LPM regression line for deny against the P/I ratio; the estimated slope is 0.604)
Interpretation: if the P/I ratio increases by 0.1, the probability of denial increases by 0.604 × 0.1 = 0.0604, that is, by about 6 percentage points.
If we are interested in the effect of race on the probability of denial, holding the P/I ratio constant, we can add a race dummy variable as a regressor:
(figure: LPM estimates including the race dummy; its estimated coefficient is 0.177)
Interpretation: African-American applicants have a 17.7 percentage-point higher probability of having a mortgage application denied than white applicants, holding the P/I ratio constant.
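To make the mechanics concrete, here is a minimal sketch of estimating an LPM with heteroskedasticity-robust standard errors in Python using statsmodels. The Boston HMDA data are not reproduced here: the data below are synthetic, and the variable names (`pi_ratio`, `black`) and the coefficients used to simulate them are assumptions that only mimic the example, so the estimates will not match the published numbers.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000
pi_ratio = rng.uniform(0.1, 0.8, n)          # hypothetical payment-to-income ratio
black = rng.integers(0, 2, n)                # hypothetical race dummy
# Synthetic denial outcomes: a higher P/I ratio and black = 1 raise the denial probability
p_deny = np.clip(-0.09 + 0.6 * pi_ratio + 0.18 * black, 0, 1)
deny = rng.binomial(1, p_deny)

X = sm.add_constant(np.column_stack([pi_ratio, black]))
lpm = sm.OLS(deny, X).fit(cov_type="HC1")    # heteroskedasticity-robust (White) SEs
print(lpm.summary(xname=["const", "pi_ratio", "black"]))
```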
LPM in the General Case
With several regressors, the model is
$$\Pr(Y = 1 \mid X_1, \dots, X_k) = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k.$$
The coefficient $\beta_j$ can be interpreted as the change in the probability that $Y = 1$ for a unit change in $X_j$, holding the other regressors fixed. Inference can be based on White (heteroskedasticity-robust) standard errors.
Summary of LPM
  • Key feature: model $\Pr(Y = 1 \mid X_1, \dots, X_k)$ as a linear function of the regressors
  • Advantages:
    • simple to estimate and to interpret
    • the inference is the same as for linear multiple regression models but we need to use heteroskedasticity-robust standard errors
  • Disadvantages:
    • Predicted probabilities can be less than 0 or greater than 1
    • It makes no sense that the probability of $Y = 1$ should be linear in $X$ for all values of $X$

3. Probit and Logit Models

3.1 Models

We need a “translator” that takes a value from $(-\infty, +\infty)$ and returns a value between 0 and 1 such that:
  • The closer to $+\infty$ the value from the linear regression model is, the closer the predicted probability is to 1.
  • The closer to $-\infty$ the value from the linear regression model is, the closer the predicted probability is to 0.
  • No predicted probabilities are less than 0 or greater than 1.
In common practice, econometricians use TWO such “translators”:
  • Probit (standard normal CDF)
  • Logit (standard logistic CDF)
The differences between the two “translators” are small. In particular, there is no practical difference between them if we only care about predicted probabilities in the middle range of the data.
The nonlinear functions are the “translators”
Both the probit and logit models have the same basic structure. Define a “z-index” as $Z = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$. Use a nonlinear, S-shaped, CDF-type function $G(\cdot)$ to transform $Z$ into a predicted value between 0 and 1. The model is
$$\Pr(Y = 1 \mid X_1, \dots, X_k) = G(Z).$$
  • The probit model uses the standard normal CDF:
    • $\Pr(Y = 1 \mid X_1, \dots, X_k) = \Phi(Z)$, where $\Phi(\cdot)$ is the standard normal CDF
  • The logit model uses the logistic CDF:
    • $\Pr(Y = 1 \mid X_1, \dots, X_k) = F(Z)$, where $F(Z) = \dfrac{1}{1 + e^{-Z}}$
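As a quick numerical sketch, the two “translators” can be evaluated directly with scipy and numpy; the z-index values below are arbitrary illustrative inputs:

```python
import numpy as np
from scipy.stats import norm

def probit_prob(z):
    """Probit translator: standard normal CDF."""
    return norm.cdf(z)

def logit_prob(z):
    """Logit translator: standard logistic CDF."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])   # arbitrary z-index values
print(probit_prob(z))   # approx. [0.0013 0.1587 0.5 0.8413 0.9987]
print(logit_prob(z))    # approx. [0.0474 0.2689 0.5 0.7311 0.9526]
# Both map (-inf, +inf) into (0, 1), so no predicted probability falls outside [0, 1].
```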

3.2 NLS & MLE

Nonlinear least squares (NLS)
The model is
$$Y_i = \Phi(\beta_0 + \beta_1 X_i) + u_i,$$
and the NLS estimators are given by
$$(\hat\beta_0, \hat\beta_1) = \arg\min_{b_0,\, b_1} \sum_{i=1}^{n} \bigl[\, Y_i - \Phi(b_0 + b_1 X_i) \,\bigr]^2.$$
The NLS estimators are consistent and asymptotically normally distributed, but they are inefficient.
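A minimal sketch of probit estimation by NLS, using scipy to minimize the sum of squared residuals on synthetic data; the “true” parameter values used to simulate the data are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
beta0_true, beta1_true = -0.5, 1.0           # assumed parameters for the simulation
y = (rng.uniform(size=n) < norm.cdf(beta0_true + beta1_true * x)).astype(float)

def ssr(beta):
    """Sum of squared residuals for the probit regression function."""
    resid = y - norm.cdf(beta[0] + beta[1] * x)
    return np.sum(resid ** 2)

nls = minimize(ssr, x0=np.zeros(2))          # NLS: minimize the SSR numerically
print(nls.x)                                 # consistent but inefficient estimates
```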
Maximum Likelihood Estimation (MLE)
The probability that $Y_i = 1$ conditional on $X_i$ is $p_i = \Phi(\beta_0 + \beta_1 X_i)$. The conditional probability distribution of the $i$-th observation is
$$\Pr(Y_i = y_i \mid X_i) = p_i^{\,y_i}\,(1 - p_i)^{1 - y_i}, \qquad y_i \in \{0, 1\}.$$
Assuming that $(Y_i, X_i)$, $i = 1, \dots, n$, are i.i.d., the joint probability distribution of $Y_1, \dots, Y_n$ conditional on the $X$'s is
$$\Pr(Y_1 = y_1, \dots, Y_n = y_n \mid X_1, \dots, X_n) = \prod_{i=1}^{n} p_i^{\,y_i}\,(1 - p_i)^{1 - y_i}.$$
The likelihood function is this joint probability distribution treated as a function of the unknown coefficients. The ML estimators are
$$(\hat\beta_0, \hat\beta_1) = \arg\max_{b_0,\, b_1} \sum_{i=1}^{n} \Bigl[\, y_i \ln \Phi(b_0 + b_1 X_i) + (1 - y_i)\ln\bigl(1 - \Phi(b_0 + b_1 X_i)\bigr) \Bigr].$$
The ML estimators are consistent and asymptotically normally distributed. They are also efficient and commonly used in practice.
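In practice, packaged routines compute the ML estimator directly. A minimal sketch with statsmodels on synthetic data:

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = (rng.uniform(size=n) < norm.cdf(-0.5 + 1.0 * x)).astype(int)

X = sm.add_constant(x)
probit_res = sm.Probit(y, X).fit()           # MLE for the probit model
logit_res = sm.Logit(y, X).fit()             # MLE for the logit model
print(probit_res.params, logit_res.params)   # logit coefficients are larger in scale
```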

3.3 Comparison

Both the probit and logit are nonlinear “translators”. There is no real reason to prefer one over the other.
Predicted probabilities from estimated probit and logit models are usually very close, as in the mortgage example above.
Traditionally we saw more of the logit, mainly because the logistic function leads to a more easily computed model. Nowadays, the probit is just as easy to compute with standard packages and has become more popular.

3.4 Interpretation, Estimation, Inference

Interpretation
Clearly, the coefficient estimates across the three models are not directly comparable; it is the probability of denial that is of interest. We can, however, compare the sign and significance (based on a standard z-test) of the coefficients.
In general we care about the effect of $X_j$ on $\Pr(Y = 1 \mid X_1, \dots, X_k)$, that is, we care about $\partial \Pr(Y = 1 \mid X_1, \dots, X_k)/\partial X_j$:
  • For the linear case, this is simply the coefficient on $X_j$.
  • For the nonlinear probit and logit models,
    • $\dfrac{\partial \Pr(Y = 1 \mid X_1, \dots, X_k)}{\partial X_j} = g(Z)\,\beta_j$, where $Z = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$ and $g(\cdot)$ is the derivative of the CDF ($\phi$ for the probit, $F(1 - F)$ for the logit)
    • The adjustment factor $g(Z)$ depends on $Z$, and hence on the values of all the regressors (a computational sketch follows this list)
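A minimal sketch of computing the probit partial effects by hand on synthetic data: the adjustment factor is the standard normal density evaluated at each observation's fitted z-index.

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = (rng.uniform(size=n) < norm.cdf(-0.5 + 1.0 * x)).astype(int)

res = sm.Probit(y, sm.add_constant(x)).fit(disp=0)
b0, b1 = res.params
z = b0 + b1 * x
partial_effects = norm.pdf(z) * b1                   # dP/dx at each observation: phi(z_i) * beta_1
print(partial_effects.min(), partial_effects.max())  # the effect varies with x, unlike in the LPM
```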
Estimation and Inference
For probit and logit models, the difficulty is that partial effects are not constant but depend on the values of the regressors. Thus the PEA and the APE are introduced (a statsmodels sketch follows this list):
  • PEA: Partial Effect at the Average
    • The partial effect of an explanatory variable is evaluated for an “average” individual, i.e., at the sample means of the regressors: $g(\bar{x}'\hat\beta)\,\hat\beta_j$. This is problematic for explanatory variables such as gender, for which an “average” value is not meaningful.
    • For a discrete explanatory variable, say a change in $X_k$ from $c_k$ to $c_k + 1$, the PEA is $G(\cdot)$ evaluated at $X_k = c_k + 1$ minus $G(\cdot)$ evaluated at $X_k = c_k$, with the other regressors held at their sample means.
  • APE: Average Partial Effect
    • The partial effect of an explanatory variable is computed for each individual in the sample and then averaged across all sample members: $\frac{1}{n}\sum_{i=1}^{n} g(x_i'\hat\beta)\,\hat\beta_j$. This method makes more sense in most applications.
    • For a discrete explanatory variable, say a change in $X_k$ from $c_k$ to $c_k + 1$, the difference in $G(\cdot)$ between $X_k = c_k + 1$ and $X_k = c_k$ is computed for each individual (at that individual's values of the other regressors) and then averaged.
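A minimal sketch of the PEA and the APE using statsmodels' `get_margeff` on synthetic data: `at="mean"` evaluates the partial effect at the regressor means (PEA), while the default `at="overall"` averages the individual partial effects (APE).

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = (rng.uniform(size=n) < norm.cdf(-0.5 + 1.0 * x)).astype(int)

res = sm.Probit(y, sm.add_constant(x)).fit(disp=0)
pea = res.get_margeff(at="mean")             # Partial Effect at the Average
ape = res.get_margeff(at="overall")          # Average Partial Effect (the default)
print(pea.margeff, ape.margeff)              # the two summaries generally differ
```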

3.5 Goodness-of-fit measures

Percent correctly predicted
Individual $i$'s outcome is predicted as $\hat{y}_i = 1$ if the fitted probability of that event exceeds 0.5, and $\hat{y}_i = 0$ otherwise; the percentages of correctly predicted zeros and ones are then counted. There are four possible outcomes for each pair $(y_i, \hat{y}_i)$: $(0,0)$, $(0,1)$, $(1,0)$, $(1,1)$. Then,
  • percent correctly predicted for $y_i = 0$: the share of observations with $y_i = 0$ for which $\hat{y}_i = 0$
  • percent correctly predicted for $y_i = 1$: the share of observations with $y_i = 1$ for which $\hat{y}_i = 1$
The overall percent correctly predicted is the weighted average of the two, with weights equal to the fractions of zeros and ones in the sample.
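A minimal sketch of the percent-correctly-predicted calculation on synthetic data, using the 2×2 prediction table that statsmodels provides:

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = (rng.uniform(size=n) < norm.cdf(-0.5 + 1.0 * x)).astype(int)

res = sm.Probit(y, sm.add_constant(x)).fit(disp=0)
table = res.pred_table(threshold=0.5)        # rows: actual 0/1, columns: predicted 0/1
correct_0 = table[0, 0] / table[0].sum()     # percent correctly predicted for y = 0
correct_1 = table[1, 1] / table[1].sum()     # percent correctly predicted for y = 1
overall = (table[0, 0] + table[1, 1]) / table.sum()
print(correct_0, correct_1, overall)         # overall is the weighted average of the two
```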
Example of Percent Correctly Predicted
Pseudo R-squared
Compare the maximized log-likelihood of the model, $\ln L_{ur}$, with that of a model that contains only a constant (and no explanatory variables), $\ln L_0$:
$$\text{pseudo-}R^2 = 1 - \frac{\ln L_{ur}}{\ln L_0}.$$
  • Log-likelihoods are negative, so $\ln L_{ur}/\ln L_0 = |\ln L_{ur}|/|\ln L_0|$, and $0 \le \text{pseudo-}R^2 \le 1$.
  • If none of the $X_j$ are significant, $\ln L_{ur}$ is close to $\ln L_0$ and the pseudo-$R^2$ is close to 0.
  • If $\ln L_{ur} = 0$, the pseudo-$R^2$ equals 1.
  • $\ln L_{ur}$ cannot actually reach zero in a probit/logit model: that would require the estimated probabilities to equal one for every observation with $y_i = 1$ and zero for every observation with $y_i = 0$, which the probit/logit functional forms only approach asymptotically.
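A minimal sketch of the pseudo R-squared on synthetic data; statsmodels reports the same McFadden measure as `prsquared`:

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = (rng.uniform(size=n) < norm.cdf(-0.5 + 1.0 * x)).astype(int)

res = sm.Probit(y, sm.add_constant(x)).fit(disp=0)
pseudo_r2 = 1 - res.llf / res.llnull         # 1 - lnL_ur / lnL_0 (McFadden's pseudo R-squared)
print(pseudo_r2, res.prsquared)              # the two numbers coincide
```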
Correlation-based measures
Define the fitted probabilities $\tilde{y}_i = G(x_i'\hat\beta)$ and calculate the squared correlation between $y_i$ and $\tilde{y}_i$. In any case, goodness-of-fit is usually less important than obtaining convincing estimates of the ceteris paribus effects of the explanatory variables.

3.6 Hypothesis Test after MLE

  • The usual z-tests and confidence intervals can be used.
  • Likelihood ratio test (restricted and unrestricted models needed):
$$LR = 2\bigl(\ln L_{ur} - \ln L_r\bigr) \;\overset{a}{\sim}\; \chi^2_q,$$
where $\ln L_{ur}$ ($\ln L_r$) is the log-likelihood value of the unrestricted (restricted) model and $q$ is the number of restrictions. This is based on the same concept as the F-test in a linear model. Basic idea: $\ln L_{ur} \ge \ln L_r$, and under $H_0$, $LR$ is close to zero.
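A minimal sketch of the likelihood ratio test on synthetic data, where the restriction tested is that the coefficient on a second regressor (here called `x2`, an assumption for illustration) is zero:

```python
import numpy as np
from scipy.stats import norm, chi2
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = (rng.uniform(size=n) < norm.cdf(-0.5 + 1.0 * x1 + 0.5 * x2)).astype(int)

unrestricted = sm.Probit(y, sm.add_constant(np.column_stack([x1, x2]))).fit(disp=0)
restricted = sm.Probit(y, sm.add_constant(x1)).fit(disp=0)   # H0: coefficient on x2 is zero

lr = 2 * (unrestricted.llf - restricted.llf)  # LR = 2 (lnL_ur - lnL_r)
q = 1                                         # number of restrictions
print(lr, chi2.sf(lr, df=q))                  # LR ~ chi2(q) under H0; small p-value rejects H0
```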

3.7 Latent Variable Model

Probit and logit models can be derived from an underlying latent variable model.
Let
$$y^* = \beta_0 + x\beta + e, \qquad y = \mathbf{1}[\,y^* > 0\,].$$
Here $y^*$ is an unobserved, or latent, variable, which rarely has a well-defined unit of measurement; for example, $y^*$ might be the difference in utility levels between two actions. The error $e$ has either the standard normal or the standard logistic distribution, symmetric about zero, which implies $1 - G(-z) = G(z)$. Only $y$ is observable. $\mathbf{1}[\cdot]$ is the indicator function, which takes the value one if the event in the brackets is true and zero otherwise.
The response probability for $y$ is
$$\Pr(y = 1 \mid x) = \Pr(y^* > 0 \mid x) = \Pr\bigl(e > -(\beta_0 + x\beta) \mid x\bigr) = 1 - G\bigl(-(\beta_0 + x\beta)\bigr) = G(\beta_0 + x\beta).$$
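A minimal sketch of the latent variable derivation as a simulation: generate $y^*$ with a standard normal error, observe only $y = \mathbf{1}[y^* > 0]$, and check that a probit fit recovers estimates close to the assumed parameter values.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
beta0, beta1 = -0.5, 1.0                          # assumed latent-model parameters
y_star = beta0 + beta1 * x + rng.normal(size=n)   # latent variable with standard normal error
y = (y_star > 0).astype(int)                      # observed binary outcome: y = 1[y* > 0]

res = sm.Probit(y, sm.add_constant(x)).fit(disp=0)
print(res.params)                                 # close to (beta0, beta1), as the derivation implies
```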
