๐Ÿ“Ž

Ch6: Multiple Regression Analysis: Further Issues


Effects of Data Scaling on OLS Statistics

1. Changing Unit of Measurement in Linear Form

๐Ÿ“Œ
When we change the unit of measurement of a variable:
  • The statistical significance (the t statistics) does not change
  • The R² does not change
Proof
  1. Statistical Significance
Consider the simple regression model: ŷ = β̂₀ + β̂₁x.
Suppose y* = c·y for some constant c > 0 and x is unchanged. Then for this model: β̂₀* = c·β̂₀ and β̂₁* = c·β̂₁.
Since se(β̂₁*) = c·se(β̂₁), then t* = β̂₁*/se(β̂₁*) = c·β̂₁/(c·se(β̂₁)) = t.
Therefore, the t statistics and p-values are unchanged, and statistical significance is unaffected.
  2. R²
Rescaling y multiplies both SSR and SST by c², so R² = 1 − SSR/SST does not change either.
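The invariance above can be checked with a minimal NumPy sketch (the data are simulated, so all numbers are illustrative): rescaling y by c = 100 scales the slope and its standard error by 100 while leaving the t statistic and R² untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 + 3.0 * x + rng.normal(size=50)

def simple_ols(y, x):
    # slope, its standard error, t statistic, and R^2 for y = b0 + b1*x + u
    n = len(y)
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    sigma2 = resid @ resid / (n - 2)
    se_b1 = np.sqrt(sigma2 / ((n - 1) * np.var(x, ddof=1)))
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return b1, se_b1, b1 / se_b1, r2

b1, se1, t1, r2 = simple_ols(y, x)
b1c, se1c, t1c, r2c = simple_ols(100 * y, x)  # measure y in new units (c = 100)

assert np.isclose(b1c, 100 * b1)   # slope and its se both scale by c ...
assert np.isclose(t1c, t1)         # ... so the t statistic is unchanged
assert np.isclose(r2c, r2)         # and R^2 is unchanged
```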

2. Changing Unit of Measurement in Log Form

  • If the dependent variable appears in logarithmic form, changing its unit of measurement does not affect the slope coefficients:
    • Since log(c·y) = log(c) + log(y), the slopes are unchanged and the new intercept will be β̂₀ + log(c).
  • Similarly, if an independent variable appears in log form, changing its unit of measurement only affects the intercept estimate.

3. Beta Coefficients

When we regress with all variables standardized (converted to z-scores), we subtract off the effect of scale, which sometimes makes the coefficients easier to compare across variables:
z_y = b̂₁·z₁ + b̂₂·z₂ + … + b̂_k·z_k + error
  • z_y denotes the z-score of y and z_j the z-score of x_j. The new coefficients, called beta (standardized) coefficients, are b̂_j = (σ̂_{x_j}/σ̂_y)·β̂_j.
  • In this model, when x_j increases by one standard deviation, ŷ changes by b̂_j standard deviations.
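A small sketch of this equivalence on simulated data (variable names and scales are mine, chosen so that x2 has a much larger scale than x1): the beta coefficient from the standardized regression matches the raw coefficient rescaled by sd(x_j)/sd(y).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(0, 1, n)
x2 = rng.normal(0, 10, n)                    # deliberately larger scale
y = 1 + 0.5 * x1 + 0.05 * x2 + rng.normal(0, 1, n)

def fit(y, X):
    # OLS with an intercept; returns [b0, b1, ..., bk]
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

zscore = lambda v: (v - v.mean()) / v.std(ddof=1)

b = fit(y, np.column_stack([x1, x2]))                           # raw coefficients
bz = fit(zscore(y), np.column_stack([zscore(x1), zscore(x2)]))  # beta coefficients

# b_j(standardized) = (sd(x_j) / sd(y)) * b_j(raw)
assert np.isclose(bz[1], b[1] * x1.std(ddof=1) / y.std(ddof=1))
assert np.isclose(bz[2], b[2] * x2.std(ddof=1) / y.std(ddof=1))
```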

Functional Forms

1. Logarithmic Functional Form

i.e. log(y) = β₀ + β₁·log(x₁) + β₂·x₂ + u. Reasons to use the log form:
  • It can depict a non-linear relationship between y and x.
  • The coefficients are sometimes easier to interpret (as percentage changes or elasticities).
  • For variables taking negative values or 0, use the inverse hyperbolic sine transformation, asinh(y) = log(y + √(y² + 1)), instead of log(y).
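A quick illustration of the inverse hyperbolic sine (NumPy's np.arcsinh): it is defined at zero and for negatives, and for large y it behaves like log(2y), so coefficients can be read much like those of a log model.

```python
import numpy as np

y = np.array([-5.0, 0.0, 3.0, 1000.0])
ihs = np.arcsinh(y)   # asinh(y) = log(y + sqrt(y**2 + 1)), defined for all reals

assert np.isclose(ihs[1], 0.0)              # handles zero, unlike log
assert np.isclose(ihs[0], -np.arcsinh(5.0)) # odd function: handles negatives
# For large y, asinh(y) ~ log(2y)
assert np.isclose(ihs[3], np.log(2 * 1000.0), atol=1e-6)
```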

2. Models with Quadratics

If the partial effect of x on y is not linear, we can consider a model with a quadratic term. Example: Housing Prices
log(price) = β̂₀ + β̂₁·rooms + β̂₂·rooms² (+ other controls), with β̂₁ < 0 and β̂₂ > 0.
To interpret the effect of rooms on log(price), take the derivative of log(price) with respect to rooms:
∂log(price)/∂rooms = β̂₁ + 2·β̂₂·rooms
It means that at low values of rooms, an additional room has a negative effect on log(price), while at some point the effect becomes positive. The turnaround value of rooms is rooms* = −β̂₁/(2·β̂₂) = |β̂₁|/(2·β̂₂).
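The turnaround calculation can be sketched numerically; the coefficient values below are invented for illustration, not estimates from any dataset.

```python
# Quadratic fit: yhat = b0 + b1*rooms + b2*rooms**2
b1, b2 = -0.545, 0.062            # illustrative values (negative, then positive effect)

def partial_effect(rooms):
    # d(yhat)/d(rooms) = b1 + 2*b2*rooms
    return b1 + 2 * b2 * rooms

turnaround = -b1 / (2 * b2)       # rooms* where the effect switches sign
assert abs(partial_effect(turnaround)) < 1e-9
assert partial_effect(turnaround - 1) < 0 < partial_effect(turnaround + 1)
```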

3. Models with Interaction Terms

For example,
price = β₀ + β₁·sqrft + β₂·bdrms + β₃·sqrft·bdrms + u
where price is the house price, sqrft is the size of the house, and bdrms is the number of bedrooms. Holding all other variables fixed, ∂price/∂bdrms = β₂ + β₃·sqrft.
  • If β₃ > 0, this implies that an additional bedroom yields a higher increase in housing price for larger houses.
  • We often evaluate the partial effect at interesting values of sqrft, for example the mean value.
  • We can also test whether β₃ = 0, i.e. whether the interaction term matters.
๐Ÿ‘‰๐Ÿป
For models with quadratics or interaction terms, to interpret the partial effect of an independent variable, just take the derivative of the dependent variable with respect to it.
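A tiny sketch of evaluating such a partial effect at an interesting value; the coefficients and the mean square footage below are hypothetical numbers, not real estimates.

```python
import math

# Hypothetical fit: price = b0 + b1*sqrft + b2*bdrms + b3*sqrft*bdrms
b2, b3 = -20.0, 0.05   # invented for illustration

def bedroom_effect(sqrft):
    # Holding sqrft fixed, d(price)/d(bdrms) = b2 + b3*sqrft
    return b2 + b3 * sqrft

mean_sqrft = 2000.0
assert math.isclose(bedroom_effect(mean_sqrft), 80.0)   # -20 + 0.05*2000
assert bedroom_effect(3000.0) > bedroom_effect(1000.0)  # b3 > 0: larger houses gain more
```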

Goodness of Fit

In Ch4, we saw that we can use the F test to choose between nested models, that is, one model (the restricted model) is a special case of the other (the unrestricted model). If we reject H₀, we should use the unrestricted model; otherwise we use the restricted model.
When deciding between nonnested models, we can use the adjusted R²:
R̄² = 1 − [SSR/(n − k − 1)] / [SST/(n − 1)] = 1 − (1 − R²)·(n − 1)/(n − k − 1)
The model with the higher adjusted R² fits the data better.
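A minimal sketch of the adjusted-R² penalty (the R² values and sample size are invented): adding regressors always raises raw R², but can lower R̄².

```python
def adj_r2(r2, n, k):
    # adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 30
small = adj_r2(0.60, n, k=2)   # parsimonious model
big = adj_r2(0.62, n, k=8)     # 6 extra regressors, slightly higher raw R^2
assert small > big             # the penalty outweighs the small gain in fit
```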
๐Ÿšซ
When we want to find the causal effect of an independent variable and must choose between models, we cannot simply pick the one with the higher adjusted R², because the model with the higher R̄² may suffer from over-controlling.
Example 1
We want to estimate the effect of a beer tax on traffic fatalities. If we pick the model that includes an independent variable measuring beer consumption (which may have a higher adjusted R²), we cannot properly interpret the causal effect of the beer tax on traffic fatalities. The beer tax affects fatalities largely through beer consumption, and when we interpret a coefficient we hold the other variables fixed. Thus the coefficient in the "better" model only reflects the effect of the beer tax net of the channel that operates through beer consumption.
Example 2
We want to estimate the effects of pesticide usage among farmers on family health expenditure.
If we include the number of doctor visits as an explanatory variable (which may yield a higher adjusted R²), the coefficient only reflects the causal effect of pesticide usage on health expenditure net of the channel that operates through doctor visits.

Prediction Analysis

1. Confidence Intervals for E(y|x)

Suppose we have estimated the equation
ŷ = β̂₀ + β̂₁x₁ + … + β̂_k x_k
Let c₁, …, c_k denote particular values for each of the independent variables. Denote θ₀ = β₀ + β₁c₁ + … + β_k c_k; the estimator of θ₀ is θ̂₀ = β̂₀ + β̂₁c₁ + … + β̂_k c_k.
As for the confidence interval of θ₀, we can transform the regression equation so that θ₀ is the intercept parameter: substituting β₀ = θ₀ − β₁c₁ − … − β_k c_k gives
y = θ₀ + β₁(x₁ − c₁) + … + β_k(x_k − c_k) + u
Regress y on (x₁ − c₁), …, (x_k − c_k) and obtain the standard error of the intercept, se(θ̂₀).
Then the confidence interval of θ₀ is θ̂₀ ± t_{α/2}·se(θ̂₀).
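The recentering trick can be sketched in NumPy on simulated data (all names and values below are illustrative; the 1.96 critical value is a large-sample normal approximation): after shifting each regressor by its c_j, the intercept of the new regression is exactly θ̂₀, and its standard error comes for free.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + 2 * x1 - x2 + rng.normal(size=n)
c1, c2 = 0.5, -1.0                      # covariate values of interest

def ols(y, X):
    # OLS with intercept; returns coefficients and their standard errors
    X = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return b, se

# Recenter each regressor at c_j: the intercept becomes theta0-hat
b, se = ols(y, np.column_stack([x1 - c1, x2 - c2]))
theta0, se_theta0 = b[0], se[0]

# Sanity check: theta0 equals the fitted value at (c1, c2) from the raw regression
braw, _ = ols(y, np.column_stack([x1, x2]))
assert np.isclose(theta0, braw[0] + braw[1] * c1 + braw[2] * c2)

ci = (theta0 - 1.96 * se_theta0, theta0 + 1.96 * se_theta0)  # ~95% CI for E(y|x=c)
```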
๐Ÿ’ก
In this way, we obtain a CI for the average value of y for the subpopulation with a given set of covariates. However, the CI for the subpopulation average is not the same as the CI for a particular unit (individual, family, firm, etc.), because there is another very important source of variation: the variance of the unobserved error u, which measures our ignorance of the unobserved factors that affect y.

2. Prediction Interval

We want to form a confidence interval for y⁰, which is an unknown outcome.
Let x₁⁰, …, x_k⁰ be the new values of the independent variables, which are observed, while u⁰ is the unobserved error:
y⁰ = β₀ + β₁x₁⁰ + … + β_k x_k⁰ + u⁰
The OLS prediction is
ŷ⁰ = β̂₀ + β̂₁x₁⁰ + … + β̂_k x_k⁰
Thus the prediction error from using ŷ⁰ to predict y⁰ is
ê⁰ = y⁰ − ŷ⁰ = (β₀ + β₁x₁⁰ + … + β_k x_k⁰) + u⁰ − ŷ⁰
Since E(ŷ⁰) = β₀ + β₁x₁⁰ + … + β_k x_k⁰ (because the β̂_j are unbiased), and E(u⁰) = 0 because of MLR.4, we have E(ê⁰) = 0. Note that u⁰ is uncorrelated with each β̂_j, because u⁰ is uncorrelated with the errors in the sample used to obtain the β̂_j.
So, conditional on all in-sample values of the independent variables, the variance of the prediction error is:
Var(ê⁰) = Var(ŷ⁰) + Var(u⁰) = Var(ŷ⁰) + σ²
The standard error of ê⁰ is se(ê⁰) = {[se(ŷ⁰)]² + σ̂²}^{1/2}, thus
the 95% prediction interval for y⁰ is ŷ⁰ ± t_{.025}·se(ê⁰).
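A minimal simulated sketch of the prediction interval (illustrative data; 1.96 is again the large-sample critical value): se(ê⁰) adds σ̂² under the square root, so the PI is strictly wider than the CI for the conditional mean at the same point.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)
x0 = 0.5                                   # new point to predict at

# Recenter x at x0 so the intercept is yhat0 and its se is se(yhat0)
X = np.column_stack([np.ones(n), x - x0])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
sigma2_hat = resid @ resid / (n - 2)
se_yhat0 = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[0, 0])

yhat0 = b[0]
se_e0 = np.sqrt(se_yhat0**2 + sigma2_hat)  # se of the prediction error

ci = (yhat0 - 1.96 * se_yhat0, yhat0 + 1.96 * se_yhat0)  # CI for E(y | x0)
pi = (yhat0 - 1.96 * se_e0, yhat0 + 1.96 * se_e0)        # PI for y0
assert pi[1] - pi[0] > ci[1] - ci[0]       # the PI is always wider
```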
๐Ÿ“Œ
Note that the PI for y⁰ is wider than the CI for E(y|x = x⁰) because of σ̂², which reflects the factors in u⁰ that we have not accounted for.
