Intro
In some situations it is not possible to derive the MLE using an analytical approach, so numerical optimization methods are required.
When taking a numerical approach, it is still a good idea to work with the log likelihood, as this function is usually better behaved. (The product in the likelihood function will often lead to a likelihood with either very small or very large values.)
Methods and Examples
Python includes general-purpose optimization capability; here we will focus on `minimize()`, part of `scipy.optimize`, which minimizes a function specified by the user.

```python
from scipy.optimize import minimize
```
For example, suppose $X$ is $\text{Binomial}(n, p)$ and $p$ is unknown. We have seen that the sample proportion $\hat{p} = X/n$ is unbiased and has standard error $\sqrt{p(1-p)/n}$. We will use `minimize()` to find the value of $p$ that maximizes this standard error; since `minimize()` minimizes, we work with the negative of the standard error.
```python
import numpy as np

def neg_se_of_phat(p, n=100):
    # negative of the standard error of the sample proportion
    return (-1) * np.sqrt(p * (1 - p) / n)

n = 100
res = minimize(neg_se_of_phat, 0.2, method='Nelder-Mead',
               options={'disp': True}, args=(n,))
```
- The first argument is the function to be minimized.
- The second argument gives the starting values for the parameters over which the function will be minimized. These should be on the interior (not the boundary) of the space of possible parameter values. If there is more than one parameter, this argument should be an array.
- The `method` argument tells `minimize()` which minimization technique to utilize.
- The `options` argument can set various properties, including the maximum number of iterations, the tolerance, and, as in the example, the amount of information displayed.
As expected, the maximizing value is $p = 0.5$:

```python
res.x  # array([0.5])
```
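The returned object also carries other useful fields (these are standard attributes of SciPy's `OptimizeResult`; the annotated values are what we would expect for this run, not output from the original notes):

```python
res.fun      # objective value at the minimum, approximately -0.05 here
res.success  # True if the optimizer reports successful convergence
res.nit      # number of iterations performed
```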
Numerical MLE for Gamma Dist
Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. $\text{Gamma}(\alpha, \beta)$. The MLEs for $\alpha$ and $\beta$ have no closed form, so we minimize the negative log likelihood numerically, using the method of moments estimates as starting values.
```python
def gamma_mle(x):
    from scipy.stats import gamma

    def neg_log_likelihood(pars, x):
        # pars = [alpha, beta]; scipy's gamma uses shape and scale = 1/rate
        return (-1) * sum(gamma.logpdf(x, pars[0], scale=1/pars[1]))

    # Method of moments estimates, used as starting values
    beta_hat_mom = np.mean(x) / (np.mean(x**2) - np.mean(x)**2)
    alpha_hat_mom = beta_hat_mom * np.mean(x)

    mle_out = minimize(neg_log_likelihood, [alpha_hat_mom, beta_hat_mom],
                       args=(x,), method='Nelder-Mead')
    return mle_out
```
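As a quick check (a sketch; the simulated data, seed, and true parameter values below are illustrative assumptions, not part of the original example), we can simulate Gamma data and confirm that the estimates land near the true parameters:

```python
rng = np.random.default_rng(0)
x_gamma = rng.gamma(shape=2.0, scale=1/3.0, size=500)  # true alpha = 2, rate beta = 3

fit_gamma = gamma_mle(x_gamma)
print(fit_gamma.x)  # estimates of (alpha, beta); should be roughly (2, 3)
```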
Regression with t-distributed Errors
Consider the following regression model:

$$Y_i = \beta_0 + \beta_1 x_i + \sigma\,\epsilon_i,$$

where the $\epsilon_i$ are assumed to have the t-distribution with $\nu$ degrees of freedom. We will assume that $\nu$ is set by the user (not estimated).

There are three unknown parameters in this model: $\beta_0$, $\beta_1$, and $\sigma$. We want to estimate them via maximum likelihood.
We assume that we will observe $n$ pairs of values $(x_i, Y_i)$, and that the $Y_i$ are independent. First, we derive the density for a single observation $Y_i$.

Since the $\epsilon_i$ are independent, the $Y_i$ are also independent. But the $Y_i$ are not i.i.d., since they have different $x_i$ values, and hence different means.

We can write $\epsilon_i = (Y_i - \beta_0 - \beta_1 x_i)/\sigma$, where $\epsilon_i$ has the $t_\nu$ density $f_\nu$, and thus

$$f_{Y_i}(y) = \frac{1}{\sigma}\, f_\nu\!\left(\frac{y - \beta_0 - \beta_1 x_i}{\sigma}\right),$$

therefore the log likelihood is

$$\ell(\beta_0, \beta_1, \sigma) = \sum_{i=1}^{n} \log f_\nu\!\left(\frac{y_i - \beta_0 - \beta_1 x_i}{\sigma}\right) - n \log \sigma.$$
In preparation for using `minimize()`, here is a function that returns the negative log likelihood as a function of the three parameters.

```python
def neglogliketreg(pars, x, y, nu):
    from scipy.stats import t
    # pars = [beta_0, beta_1, sigma]
    resids = y - (x * pars[1] + pars[0])
    # -sum(log f_nu(resid_i / sigma)) + n * log(sigma)
    return (-1) * sum(t.logpdf(resids / pars[2], nu)) + len(x) * np.log(pars[2])
```
The following function fits the model using the assumed t-distributed errors.
```python
def treg(x, y, nu):
    from scipy.stats import t

    def neglogliketreg(pars, x, y, nu):
        resids = y - (x * pars[1] + pars[0])
        return (-1) * sum(t.logpdf(resids / pars[2], nu)) + len(x) * np.log(pars[2])

    holdcov = np.cov(x, y)

    # MLE for beta_1 & beta_0, under assumption of epsilon ~ N(0, 1)
    beta_1_init = holdcov[0, 1] / holdcov[0, 0]
    beta_0_init = np.mean(y) - np.mean(x) * beta_1_init

    # MLE for sigma, under assumption of epsilon ~ N(0, 1)
    resids = y - (beta_0_init + beta_1_init * x)
    sigma_init = np.sqrt(sum(resids**2) / len(x))

    output = minimize(neglogliketreg,
                      [beta_0_init, beta_1_init, sigma_init],
                      args=(x, y, nu),
                      method='Nelder-Mead')
    return output
```
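A quick usage sketch (the simulated data, seed, and true parameter values are illustrative assumptions):

```python
rng = np.random.default_rng(1)
x_reg = rng.uniform(0, 10, size=200)
eps = rng.standard_t(df=5, size=200)
y_reg = 1.0 + 2.0 * x_reg + 0.5 * eps   # beta_0 = 1, beta_1 = 2, sigma = 0.5

fit_t = treg(x_reg, y_reg, nu=5)
print(fit_t.x)                          # estimates of (beta_0, beta_1, sigma)
```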
Fisher Information
In cases where the likelihood is maximized numerically, it is typically not possible to derive the Fisher Information analytically. Instead, we can use an approximation based on the result

$$n I(\theta) = -E\!\left[\frac{\partial^2}{\partial \theta^2}\,\ell(\theta)\right],$$

where $\ell(\theta)$ is the log likelihood of the sample. This result suggests that we can approximate $nI(\hat\theta)$ by using the curvature of the observed log likelihood function at its peak:

$$n I(\hat\theta) \approx -\frac{\partial^2}{\partial \theta^2}\,\ell(\theta)\,\Big|_{\theta = \hat\theta}.$$
Example:
Assume $X_1, \ldots, X_n$ are i.i.d. $\text{Gamma}(\alpha, 1)$, that is,

$$f(x \mid \alpha) = \frac{1}{\Gamma(\alpha)}\, x^{\alpha - 1} e^{-x}, \qquad x > 0.$$
```python
def one_d_gamma_mle(x):
    from scipy.stats import gamma
    import numpy as np
    from scipy.optimize import minimize
    import numdifftools as nd

    def neg_log_likelihood(pars, x):
        return (-1) * sum(gamma.logpdf(x, pars[0], scale=1))

    # Method of moments estimate (E[X] = alpha when the scale is 1), used as the starting value
    alpha_hat_mom = np.mean(x)

    mle_out = minimize(neg_log_likelihood, alpha_hat_mom,
                       args=(x,), method='Nelder-Mead')

    hessfunc = nd.Hessian(neg_log_likelihood)  # hessfunc is a function
    mle_var = 1 / hessfunc(mle_out.x, x)       # mle_out.x is alpha hat

    return [mle_out.x, mle_var]
```
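A quick usage sketch (the simulated data, seed, and true value $\alpha = 3$ are illustrative assumptions; the trigamma comparison is an added sanity check, since for $\text{Gamma}(\alpha, 1)$ the second derivative of the log likelihood is exactly $-n\,\psi'(\alpha)$):

```python
from scipy.special import polygamma

rng = np.random.default_rng(2)
x_g = rng.gamma(shape=3.0, scale=1.0, size=400)   # true alpha = 3

alpha_hat, alpha_var = one_d_gamma_mle(x_g)
print(alpha_hat, np.sqrt(alpha_var))              # MLE of alpha and its approximate standard error

# Observed information (1 / alpha_var) should be close to n * trigamma(alpha_hat)
print(1 / alpha_var, len(x_g) * polygamma(1, alpha_hat))
```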
Inside `one_d_gamma_mle()`, the call `hessfunc(mle_out.x, x)` returns the observed Fisher Information, i.e., the approximation of $nI(\hat\alpha)$. Note that the Hessian is being evaluated on the negative log likelihood.

Multiparameter Case
In a case where the MLE is found numerically, we can use the observed Fisher Information in place of $n I(\hat{\boldsymbol{\theta}})$.
In a multidimensional optimization using `minimize()`, the procedure minimizes the negative of the log likelihood. Thus, the Hessian of this objective is the matrix of second derivatives of the negative log likelihood evaluated at the peak; we do not need to scale it by $n$ or multiply by $-1$.
To get the approximate covariance matrix, simply invert this Hessian.
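A minimal sketch of this for the t-regression fit above (reusing the standalone `neglogliketreg` and the `fit_t`, `x_reg`, `y_reg` objects from the earlier sketch, and `numdifftools` for the Hessian, as in the Gamma example):

```python
import numdifftools as nd

hess = nd.Hessian(neglogliketreg)(fit_t.x, x_reg, y_reg, 5)  # observed information matrix
cov_approx = np.linalg.inv(hess)                             # approximate covariance of the MLEs
se_approx = np.sqrt(np.diag(cov_approx))                     # approximate standard errors
print(se_approx)
```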