Here are some transformations that aim to maximize the quality of the fit of a normal distribution to the sample data.
The motivations for such a transformation include
- to make a skewed distribution more symmetric
- to allow for the use of the normal model in a case where it otherwise may not be clear how to model the variable
- to transform the response of a regression model so that the normal approximation is a better fit for the model error
Box-Cox Transformation
This is only useful when the variable is strictly positive.
Define the Box-Cox power transformation function:

$$
y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \log y, & \lambda = 0 \end{cases}
$$
This defines a wide range of possible transformations, including (up to affine rescaling) the logarithm ($\lambda = 0$), the square root ($\lambda = 1/2$), the reciprocal ($\lambda = -1$), the identity ($\lambda = 1$), etc.
Note that the function is written in this form so that the transformation is continuous in $\lambda$: since $y^{\lambda} = e^{\lambda \log y} = 1 + \lambda \log y + O(\lambda^2)$, we have $(y^{\lambda} - 1)/\lambda \to \log y$ as $\lambda \to 0$, matching the $\lambda = 0$ case.
In Python, Box-Cox is implemented via
scipy.stats.boxcox
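A minimal usage sketch (the lognormal sample below is just an illustrative assumption, chosen so that the estimated $\lambda$ should come out near 0):

import numpy as np
from scipy.stats import boxcox

rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=1.0, size=500)  # strictly positive, right-skewed sample

# With lmbda omitted, scipy estimates lambda by maximum likelihood and returns
# the transformed data together with the estimate.
yt, lmbda_hat = boxcox(y)
print(lmbda_hat)                 # expected to be close to 0 for lognormal data

# A fixed lambda can also be supplied; then only the transformed data is returned.
yt_log = boxcox(y, lmbda=0.0)    # identical to np.log(y)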
Box-Cox for Regression
For the model

$$
y^{(\lambda)} = X\beta + \varepsilon,
$$

the question is: what choice of $\lambda$ yields the best-fitting model, if we want to assume the errors $\varepsilon$ are i.i.d. Normal?
Note that we cannot directly apply the Box-Cox transformation to the responses, because Box-Cox tries to make the transformed values i.i.d. normal, whereas in a regression the responses have different means depending on the covariates; it is the errors, not the responses, that should look i.i.d. normal. Instead, we can try different values of $\lambda$, fit the regression for each, and compute the corresponding log-likelihood under the assumption of normal errors. We then choose the $\lambda$ that maximizes this (profile) likelihood.
Implementation Code
def BoxCoxRegression(X, y, ax=None, alpha=0.05):
    # X is the design matrix. Include a column of ones if you require the intercept.
    # y is the vector of response values
    import numpy as np
    from scipy.stats import chi2

    # Calculate the log likelihood for a fixed value of lambda
    def loglike(lmbda, hatmatrix, y):
        from scipy.stats import boxcox
        # Transform using proposed lambda value
        yt = boxcox(y, lmbda=lmbda)
        # Residual sum of squares
        RSS = np.sum((np.matmul(hatmatrix, yt) - yt)**2.0)
        # Compare this with Equations (8) and (9) in Box and Cox (1964).
        # Note that they multiply the last term by (lmbda-1), but I am
        # changing to just lmbda to make it consistent with the R function.
        # Since it adds a constant, it is irrelevant.
        return -len(y)/2.0*np.log(RSS) + lmbda*np.sum(np.log(y))

    # Construct the hat matrix. This is the same for all lambdas
    XtXinv = np.linalg.inv(np.matmul(np.transpose(X), X))
    hatmatrix = np.matmul(np.matmul(X, XtXinv), np.transpose(X))

    # Calculate the loglikelihoods for a range of lambda values
    lambdarange = np.linspace(-2, 2, 100)
    loglikes = np.array([loglike(lmbda, hatmatrix, y) for lmbda in lambdarange])

    # Find the best choice of lambda
    lambdaopt = lambdarange[np.argmax(loglikes)]

    # Find the confidence limits for lambda
    conflim = np.max(loglikes) - chi2.ppf(1-alpha, 1)/2
    ininterval = lambdarange[loglikes >= conflim]
    confint = [np.min(ininterval), np.max(ininterval)]

    # Create a plot
    if ax is not None:
        ax.plot(lambdarange, loglikes, lw=1)
        ax.set(xlabel=r'$\lambda$', ylabel="log likelihood")
        ax.plot([lambdarange[0], lambdarange[-1]], [conflim, conflim], "--", color="red", lw=0.5)
        ax.plot([confint[0], confint[0]], [np.min(loglikes), conflim], "--", color="red", lw=0.5)
        ax.plot([confint[1], confint[1]], [np.min(loglikes), conflim], "--", color="red", lw=0.5)
        ax.plot([lambdaopt, lambdaopt], [np.min(loglikes), np.max(loglikes)], "--", color="red", lw=0.5)
        ax.text(confint[0], np.min(loglikes), round(confint[0], 1), ha="center", va="top", size=8)
        ax.text(confint[1], np.min(loglikes), round(confint[1], 1), ha="center", va="top", size=8)
        ax.text(lambdaopt, np.min(loglikes), round(lambdaopt, 1), ha="center", va="top", size=8)

    return lambdaopt, confint, lambdarange, loglikes
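As an illustration, here is a hypothetical way the function above might be called; the simulated data, seed, and true $\lambda = 0.5$ below are assumptions for the sketch, not part of the original example.

import numpy as np
import matplotlib.pyplot as plt

# Simulate data whose transformed response is linear in x with normal errors,
# using a true lambda of 0.5, so the estimated lambda should land near 0.5.
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(1.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])                 # design matrix with intercept column
yt = 2.0 + 0.8 * x + rng.normal(scale=0.3, size=n)   # y^(lambda) = X beta + eps
y = (0.5 * yt + 1.0) ** 2                            # invert the Box-Cox transform at lambda = 0.5

fig, ax = plt.subplots()
lambdaopt, confint, lambdarange, loglikes = BoxCoxRegression(X, y, ax=ax)
print("lambda estimate:", lambdaopt, "confidence interval:", confint)
plt.show()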
The Yeo-Johnson Transformation
The Box-Cox Transformation suffers from the limitation that it can only be applied to strictly positive variables. A generalization that works for any variable is the Yeo-Johnson Transformation,

$$
y^{(\lambda)} = \begin{cases}
\dfrac{(y+1)^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\ y \geq 0 \\
\log(y+1), & \lambda = 0,\ y \geq 0 \\
-\dfrac{(1-y)^{2-\lambda} - 1}{2-\lambda}, & \lambda \neq 2,\ y < 0 \\
-\log(1-y), & \lambda = 2,\ y < 0.
\end{cases}
$$
In Python, this is implemented in
scipy.stats.yeojohnson
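A minimal usage sketch along the same lines (the cubed-normal sample is an illustrative assumption, giving skewed data with both positive and negative values):

import numpy as np
from scipy.stats import yeojohnson

rng = np.random.default_rng(2)
y = rng.normal(size=500) ** 3            # skewed sample containing both signs

# As with boxcox, omitting lmbda makes scipy estimate it by maximum likelihood.
yt, lmbda_hat = yeojohnson(y)
print(lmbda_hat)

# A fixed lambda can also be supplied.
yt_identity = yeojohnson(y, lmbda=1.0)   # lambda = 1 is the identity transformation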