
FDSI-3 Properties of MLEs

Intro

Briefly stated, we can prove that, asymptotically, maximum likelihood estimators cannot be beaten, in the sense that they have minimal MSE, and they are also approximately normally distributed with a variance that is easy to calculate. This makes quantifying the error in the estimators straightforward.

Invariance of the MLE

Statement: If $\hat{\theta}$ is the MLE for $\theta$, then $g(\hat{\theta})$ is the MLE for $g(\theta)$.
This is especially useful when we are interested in estimating some function of the parameter $\theta$, not $\theta$ itself. For example, if $X_1, \ldots, X_n$ are i.i.d. $N(\mu, \sigma^2)$, then the MLE for $(\mu, \sigma^2)$ is
$$(\hat{\mu}, \hat{\sigma}^2) = \left(\bar{X}, \ \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right),$$
and the MLE for the signal-to-noise ratio $\mu/\sigma$ is $\hat{\mu}/\hat{\sigma}$.
One consequence of this property is that estimation is not affected by how one chooses to parameterize a model.
For example, some choose to represent the Exponential distribution using the "mean parameterization" $f(x; \mu) = \frac{1}{\mu}e^{-x/\mu}$, which leads to $\hat{\mu} = \bar{X}$, whereas some use the "rate parameterization" $f(x; \lambda) = \lambda e^{-\lambda x}$, which leads to $\hat{\lambda} = 1/\bar{X}$. Because of invariance, these two approaches lead to the same estimates, i.e., $\hat{\lambda} = 1/\hat{\mu}$.
Remarks:
  • Unbiasedness does not have this invariance property, i.e., if $\hat{\theta}$ is an unbiased estimator for $\theta$, there is no guarantee that $g(\hat{\theta})$ is unbiased for $g(\theta)$. For example, $\bar{X}^2$ is a biased estimator for $\mu^2$, even though $\bar{X}$ is unbiased for $\mu$, since $E[\bar{X}^2] = \mu^2 + \mathrm{Var}(\bar{X}) > \mu^2$.
Example:
Suppose that $X_1, \ldots, X_n$ are i.i.d. Exponential($\lambda$). What is the MLE of the tail probability $P(X > t)$?
Since $P(X > t) = e^{-\lambda t}$, and the MLE for $\lambda$ is $\hat{\lambda} = 1/\bar{X}$, the MLE for $P(X > t)$ is $e^{-\hat{\lambda} t} = e^{-t/\bar{X}}$.
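To make this concrete, here is a minimal Python sketch (the sample size, true rate, and threshold $t$ are invented for illustration, not values from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: n draws from Exponential(rate = 2.0).
lam_true = 2.0
x = rng.exponential(scale=1.0 / lam_true, size=500)

# MLE for the rate: lambda_hat = 1 / sample mean.
lam_hat = 1.0 / x.mean()

# By invariance, the MLE of P(X > t) = exp(-lambda * t) is
# obtained by plugging in lambda_hat.
t = 1.5
print(lam_hat, np.exp(-lam_hat * t), np.exp(-lam_true * t))
```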

Consistency of the MLE

An estimator $\hat{\theta}_n$ is consistent for $\theta$ if, as the sample size increases, the estimator converges in probability to the true value of the parameter.
Formally, we would write: for any $\epsilon > 0$,
$$\lim_{n \to \infty} P\left(\left|\hat{\theta}_n - \theta\right| > \epsilon\right) = 0.$$
This is a fairly low standard to place on an estimator. Under certain "regularity conditions", MLEs are consistent.
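Consistency is easy to see in simulation. A minimal sketch (the Exponential model and the sample sizes are illustrative assumptions) shows the MLE $\hat{\lambda} = 1/\bar{X}$ settling down on the true rate as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(1)
lam_true = 2.0

# The MLE 1/x-bar concentrates around lam_true as n increases.
for n in (10, 100, 1_000, 10_000):
    x = rng.exponential(scale=1.0 / lam_true, size=n)
    print(n, 1.0 / x.mean())
```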

Asymptotic Normality of the MLE

The MLE is asymptotically normal under certain "regularity conditions". Utilizing this property, we can easily:
  • Calculate standard errors for MLEs
  • Construct confidence intervals for parameters
  • Construct confidence intervals for general functions of parameters (via the Delta method)
  • Construct hypothesis tests for parameters

Regularity Conditions

(1) We observe $X_1, \ldots, X_n$, where $X_i$ are i.i.d. $f(x; \theta)$.
(2) The parameter is identifiable, that is, if $\theta \neq \theta'$, then $f(x; \theta) \neq f(x; \theta')$.
(3) The densities $f(x; \theta)$ have common support, and $f(x; \theta)$ is differentiable in $\theta$.
(4) The parameter space $\Omega$ contains an open set $\omega$ of which the true parameter value $\theta_0$ is an interior point.
(5) For every $x$, the density $f(x; \theta)$ is three times differentiable with respect to $\theta$, the third derivative is continuous in $\theta$, and $\int f(x; \theta)\,dx$ can be differentiated three times under the integral sign.
(6) For any $\theta_0 \in \Omega$, there exist a positive number $c$ and a function $M(x)$ (both of which may depend on $\theta_0$) such that
$$\left|\frac{\partial^3}{\partial \theta^3} \log f(x; \theta)\right| \leq M(x) \quad \text{for all } x \text{ and all } \theta_0 - c < \theta < \theta_0 + c,$$
with $E_{\theta_0}[M(X)] < \infty$.
Conditions (1)-(6) are sufficient to prove asymptotic normality and efficiency of MLEs.

One-dimensional Situation

Statement: if $X_1, \ldots, X_n$ are i.i.d. $f(x; \theta)$, then, under suitable regularity conditions, as $n$ increases,
$$\hat{\theta} \approx N\left(\theta, \ \frac{1}{n I(\theta)}\right),$$
where
$$I(\theta) = \mathrm{Var}_\theta\left(\frac{\partial}{\partial \theta} \log f(X; \theta)\right) = E_\theta\left[\left(\frac{\partial}{\partial \theta} \log f(X; \theta)\right)^2\right]$$
is the Fisher Information.
In the definition of $I(\theta)$, $X$ is random, and the subscript $\theta$ on $E_\theta$ is meant to emphasize that $X$ has the distribution specified by $\theta$.
From this approximation we have
$$E[\hat{\theta}] \approx \theta \quad \text{and} \quad \mathrm{Var}(\hat{\theta}) \approx \frac{1}{n I(\theta)},$$
so $\hat{\theta}$ is asymptotically unbiased, and the SE of $\hat{\theta}$ is approximately $1/\sqrt{n I(\theta)}$.
Geometric Understanding
It can be observed that the variance of the MLE is approximately equal to the reciprocal of the expected value of the negative second derivative of the log likelihood at its peak. The "negative of the second derivative" is actually the amount of curvature at the peak of the log likelihood. If the log likelihood has a "sharp" peak, then $I(\theta)$ will be large, and the variance of $\hat{\theta}$ will be small.
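This curvature interpretation can be checked numerically. The sketch below (the Exponential model and all numbers are my own illustrative choices) estimates the curvature of the log likelihood at its peak by a finite difference and compares it with the analytic value $n I(\hat{\lambda}) = n/\hat{\lambda}^2$:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=0.5, size=300)  # illustrative data, true rate = 2
lam_hat = 1.0 / x.mean()

def loglik(lam):
    # Exponential log likelihood: sum of log(lam) - lam * x_i.
    return np.sum(np.log(lam) - lam * x)

# Curvature at the peak via a central second difference:
# observed information = -(second derivative of the log likelihood).
h = 1e-4
curv = -(loglik(lam_hat + h) - 2 * loglik(lam_hat) + loglik(lam_hat - h)) / h**2
print(curv, len(x) / lam_hat**2)  # the two values should nearly agree
```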
Relation to Log-likelihood
It can be shown that
$$I(\theta) = -E_\theta\left[\frac{\partial^2}{\partial \theta^2} \log f(X; \theta)\right].$$
Proof: Since $\int f(x; \theta)\,dx = 1$, differentiating under the integral sign gives $E_\theta\left[\frac{\partial}{\partial \theta} \log f(X; \theta)\right] = 0$. Differentiating once more, and using
$$\frac{\partial^2}{\partial \theta^2} \log f = \frac{\partial^2 f/\partial \theta^2}{f} - \left(\frac{\partial}{\partial \theta} \log f\right)^2,$$
together with $E_\theta\left[\frac{\partial^2 f/\partial \theta^2}{f}\right] = \int \frac{\partial^2}{\partial \theta^2} f(x; \theta)\,dx = 0$, gives $E_\theta\left[\frac{\partial^2}{\partial \theta^2} \log f(X; \theta)\right] = -I(\theta)$.

The Multidimensional Case

Consider the case where $\theta \in \mathbb{R}^p$. We define the Fisher Information matrix to be the $p$ by $p$ matrix $I(\theta)$ whose $(j, k)$ entry is
$$I_{jk}(\theta) = \mathrm{Cov}_\theta\left(\frac{\partial}{\partial \theta_j} \log f(X; \theta), \ \frac{\partial}{\partial \theta_k} \log f(X; \theta)\right).$$
The Main Result: Under appropriate "regularity conditions", if $X_1, \ldots, X_n$ are i.i.d. $f(x; \theta)$, then
$$\sqrt{n}\left(\hat{\theta} - \theta\right) \rightsquigarrow N_p\left(0, \ I(\theta)^{-1}\right),$$
where $N_p$ denotes the $p$-dimensional multivariate normal distribution.
Practical Version: Under "regularity conditions", $\hat{\theta}$ is approximately normal with mean $\theta$ and variance $\frac{1}{n} I(\theta)^{-1}$.

Examples
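As one worked example (the Exponential model and all numeric settings are my own illustrative choices): for $X_i$ i.i.d. Exponential($\lambda$), $\log f(x; \lambda) = \log \lambda - \lambda x$, so $I(\lambda) = 1/\lambda^2$ and $\mathrm{SE}(\hat{\lambda}) \approx \lambda/\sqrt{n}$. The sketch below checks this against the simulated spread of the MLE:

```python
import numpy as np

rng = np.random.default_rng(2)
lam_true, n, reps = 2.0, 200, 5_000

# For Exponential(lambda), d^2/dlam^2 log f = -1/lam^2, so I(lam) = 1/lam^2
# and the asymptotic SE of the MLE is 1/sqrt(n * I(lam)) = lam / sqrt(n).
se_theory = lam_true / np.sqrt(n)

# Simulate many datasets, compute the MLE 1/x-bar for each, and compare
# the empirical standard deviation with the theoretical SE.
mles = 1.0 / rng.exponential(scale=1.0 / lam_true, size=(reps, n)).mean(axis=1)
print(se_theory, mles.std())
```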

Efficiency of the MLE

It is possible to show that the lower bound on the variance of any unbiased estimator of $\theta$ is $\frac{1}{n I(\theta)}$. This is called the Cramér-Rao lower bound.
Since the MLE is asymptotically unbiased with variance approximately $\frac{1}{n I(\theta)}$, we know that we are achieving the Cramér-Rao lower bound, at least asymptotically.
Hence, we are minimizing the MSE (asymptotically). This is the best rationale for utilizing maximum likelihood estimators.

The Delta Method

Often, our objective is not to estimate $\theta$, but instead some function of $\theta$, call it $g(\theta)$. For example, we may want to estimate the probability of some event, such as $P(X > t)$. This probability is not a parameter, but it is a function of $\theta$.
From the invariance property we know that the MLE of $g(\theta)$ is $g(\hat{\theta})$. But we also need to calculate an SE or construct a confidence interval for this estimate.

One-dimensional Delta Method

If $\hat{\theta}$ is approximately $N(\theta, \sigma^2_{\hat{\theta}})$ and $\sigma^2_{\hat{\theta}} \to 0$ as $n$ increases, then $g(\hat{\theta})$ is approximately $N\left(g(\theta), \ g'(\theta)^2 \sigma^2_{\hat{\theta}}\right)$, assuming that $g'(\theta)$ exists and is not zero.
We can use this result with $\frac{1}{n I(\theta)}$ in place of $\sigma^2_{\hat{\theta}}$: for the MLE $\hat{\theta}$, we can conclude that $g(\hat{\theta})$ is approximately
$$N\left(g(\theta), \ \frac{g'(\theta)^2}{n I(\theta)}\right).$$
In practice, we use the approximate variance
$$\widehat{\mathrm{Var}}\left(g(\hat{\theta})\right) = \frac{g'(\hat{\theta})^2}{n I(\hat{\theta})}.$$
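Continuing the earlier tail-probability example (all numbers are illustrative assumptions): for $g(\lambda) = P(X > t) = e^{-\lambda t}$ we have $g'(\lambda) = -t e^{-\lambda t}$ and $I(\lambda) = 1/\lambda^2$, so a delta-method SE and confidence interval can be computed as in this sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
lam_true, n, t = 2.0, 400, 1.5  # illustrative values
x = rng.exponential(scale=1.0 / lam_true, size=n)
lam_hat = 1.0 / x.mean()

# MLE of g(lambda) = P(X > t) = exp(-lambda * t), by invariance.
g_hat = np.exp(-lam_hat * t)

# Delta method: SE(g_hat) ~= |g'(lam_hat)| / sqrt(n * I(lam_hat))
#             = t * exp(-lam_hat * t) * lam_hat / sqrt(n).
se_g = t * np.exp(-lam_hat * t) * lam_hat / np.sqrt(n)

# Approximate 95% confidence interval for P(X > t).
print(g_hat, (g_hat - 1.96 * se_g, g_hat + 1.96 * se_g))
```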

Multivariate Delta Method

In general, suppose that $\hat{\theta}$ is approximately $N_p(\theta, \Sigma)$. Now consider an $m$-dimensional function $g: \mathbb{R}^p \to \mathbb{R}^m$, where $m \leq p$:
$$g(\theta) = \left(g_1(\theta), \ldots, g_m(\theta)\right)^T.$$
Then, $g(\hat{\theta})$ is approximately $N_m\left(g(\theta), \ G \Sigma G^T\right)$, where the $(i, j)$ entry of the $m \times p$ matrix $G$ is $\partial g_i(\theta)/\partial \theta_j$. This all assumes that $G$ is of full rank.

Examples
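As one worked example (the Normal model and all numeric settings are my own illustrative choices): the signal-to-noise ratio $g(\mu, \sigma^2) = \mu/\sigma$ from the invariance section is a scalar function of the two Normal parameters, with gradient $G = \left(1/\sigma, \ -\mu/(2\sigma^3)\right)$ and $\Sigma \approx \frac{1}{n} I(\theta)^{-1} = \frac{1}{n}\,\mathrm{diag}(\sigma^2, 2\sigma^4)$:

```python
import numpy as np

rng = np.random.default_rng(5)
mu_true, sigma_true, n = 1.0, 2.0, 500  # illustrative values
x = rng.normal(mu_true, sigma_true, size=n)

# MLEs for (mu, sigma^2) under the Normal model.
mu_hat = x.mean()
s2_hat = np.mean((x - mu_hat) ** 2)

# Approximate covariance of (mu_hat, s2_hat): (1/n) * I(theta)^{-1},
# which for the Normal model is (1/n) * diag(sigma^2, 2*sigma^4).
Sigma = np.diag([s2_hat, 2 * s2_hat**2]) / n

# g(mu, s2) = mu / sqrt(s2), with gradient (1/sigma, -mu/(2*sigma^3)).
snr_hat = mu_hat / np.sqrt(s2_hat)
G = np.array([1 / np.sqrt(s2_hat), -mu_hat / (2 * s2_hat**1.5)])

# Delta method: Var(snr_hat) ~= G Sigma G^T.
print(snr_hat, np.sqrt(G @ Sigma @ G))
```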

Case Study: Geometric Brownian Motion
The Standard Brownian Motion (Wiener Process)
A stochastic process $\{W(t) : t \geq 0\}$ is a standard Brownian motion if the following hold:
  • $W(0) = 0$, and $W(t)$ is continuous as a function of $t$
  • $W(t_2) - W(t_1)$ and $W(t_4) - W(t_3)$ are independent for $t_1 \leq t_2 \leq t_3 \leq t_4$
  • The distribution of $W(t_2) - W(t_1)$ depends only on $t_2 - t_1$ for $t_1 \leq t_2$
  • $W(t+s) - W(s)$ is Normal$(0, t)$ for all $s, t \geq 0$
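A minimal simulation sketch of this definition (the grid resolution is an arbitrary choice): independent Normal$(0, \Delta t)$ increments are accumulated into a path, so the terminal value $W(1)$ is a single draw from Normal$(0, 1)$.

```python
import numpy as np

rng = np.random.default_rng(7)

# Standard Brownian motion on [0, 1]: cumulative sums of
# independent Normal(0, dt) increments, starting from W(0) = 0.
n = 1_000
dt = 1.0 / n
w = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), size=n))])
print(w[-1])  # W(1) ~ Normal(0, 1)
```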
Brownian Motion with Drift and Scaling
A stochastic process $\{X(t) : t \geq 0\}$ is a Brownian motion with drift $\mu$ and scaling $\sigma$ if $W(t)$ is a standard Brownian motion and
$$X(t) = \mu t + \sigma W(t).$$
This implies the following:
  • $X(t)$ is continuous as a function of $t$
  • $X(t_2) - X(t_1)$ and $X(t_4) - X(t_3)$ are independent for $t_1 \leq t_2 \leq t_3 \leq t_4$
  • The distribution of $X(t_2) - X(t_1)$ depends only on $t_2 - t_1$ for $t_1 \leq t_2$
  • $X(t+s) - X(s)$ is Normal$(\mu t, \sigma^2 t)$ for all $s, t \geq 0$
Geometric Brownian Motion
A stochastic process $\{Y(t) : t \geq 0\}$ is a geometric Brownian motion if $X(t)$ is a Brownian motion with drift $\mu$ and scaling $\sigma$ and
$$Y(t) = y_0 e^{X(t)},$$
where $y_0 = Y(0)$ is the initial value. This implies:
  • $Y(t)$ is continuous as a function of $t$
  • $Y(t_2)/Y(t_1)$ and $Y(t_4)/Y(t_3)$ are independent for any $t_1 \leq t_2 \leq t_3 \leq t_4$
  • The distribution of $Y(t_2)/Y(t_1)$ depends only on $t_2 - t_1$ for any $t_1 \leq t_2$
  • $Y(t+s)/Y(s)$ is Lognormal$(\mu t, \sigma^2 t)$ for all $s, t \geq 0$
  • $Y(t)$ has expected value $E[Y(t)] = y_0 e^{(\mu + \sigma^2/2) t}$
  • $\log\left(Y(t+s)/Y(s)\right)$ is Normal$(\mu t, \sigma^2 t)$ for all $s, t \geq 0$
Question: Can we use data (observed prices) to estimate $\mu$ and $\sigma$?
Suppose we observe the stock price over the period $[0, T]$. We will model the price as a geometric Brownian motion $Y(t)$.
Divide the time interval into $n$ equal-width subintervals, each with width $\Delta = T/n$, and define
$$D_i = \log\left(\frac{Y(t_i)}{Y(t_{i-1})}\right), \quad i = 1, \ldots, n,$$
where $t_i = i \Delta$, the times at which the price is observed. From the last property of geometric Brownian motion, $D_1, \ldots, D_n$ are i.i.d. $N(\mu \Delta, \sigma^2 \Delta)$.
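Since the $D_i$ are i.i.d. Normal, the usual Normal MLEs apply: $\hat{\mu} = \bar{D}/\Delta$ and $\hat{\sigma}^2 = \frac{1}{n\Delta}\sum_{i=1}^{n}(D_i - \bar{D})^2$. A minimal sketch (all numerical settings are invented for illustration) simulates a price path and recovers the parameters from the log-returns:

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulate a geometric Brownian motion path (illustrative parameters).
mu_true, sigma_true, y0 = 0.05, 0.2, 100.0
T, n = 1.0, 252
delta = T / n
steps = rng.normal(mu_true * delta, sigma_true * np.sqrt(delta), size=n)
prices = y0 * np.exp(np.cumsum(np.concatenate([[0.0], steps])))

# Log-returns D_i = log(Y(t_i) / Y(t_{i-1})) are i.i.d. N(mu*delta, sigma^2*delta).
d = np.diff(np.log(prices))

# Normal MLEs for the mean and variance of D, rescaled by delta.
mu_hat = d.mean() / delta
sigma_hat = np.sqrt(np.mean((d - d.mean()) ** 2) / delta)
print(mu_hat, sigma_hat)
```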
