Introduction
Recall that a probability model is typically incompletely specified: a particular family of models is assumed, but some components remain unknown, often real-valued parameters. We use available data (a sample) to estimate these parameters.
Terminology
- Parameters: the constants that distinguish probability models within the same “family”. Such families include the normal, the exponential, etc.
- Statistic: a function of the RVs being used to model the observable data, e.g. the sample mean, the sample variance, etc. A statistic is a random variable.
- Estimator: a statistic that is used to estimate an unknown parameter. An estimator is therefore also a random variable.
- The “hat notation” is standard for denoting an estimator: $\hat{\theta}$ is an estimator for $\theta$.
- An estimator $\hat{\theta}$ is said to be an unbiased estimator for $\theta$ if $E[\hat{\theta}] = \theta$.
- The variability (or precision) of an estimator is quantified by its variance, $\operatorname{Var}(\hat{\theta})$.
- The standard error (SE) of $\hat{\theta}$ is the square root of its variance, $\operatorname{SE}(\hat{\theta}) = \sqrt{\operatorname{Var}(\hat{\theta})}$. The SE is often reported along with an estimate in order to give a sense of the amount of error one should anticipate in that estimate.
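To make these definitions concrete, here is a minimal simulation sketch (assuming NumPy; the normal model with $\mu = 5$, $\sigma = 2$, and $n = 50$ is chosen purely for illustration). It treats the sample mean as a random variable by recomputing it across many repeated samples, so that its average reveals (un)biasedness and its spread matches the theoretical SE $\sigma/\sqrt{n}$.

```python
# Simulation sketch: the sample mean as an estimator of mu.
# (mu, sigma, n are illustrative choices, not values from the notes.)
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 5.0, 2.0, 50

# Draw many independent samples and compute the estimator on each one.
estimates = np.array([rng.normal(mu, sigma, n).mean() for _ in range(10_000)])

print("mean of estimates:", estimates.mean())   # close to mu => unbiased
print("empirical SE:     ", estimates.std())    # close to sigma / sqrt(n)
print("theoretical SE:   ", sigma / np.sqrt(n))
```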
The Method of Moments
Recall that for any positive integer $k$, the quantity $\mu_k = E[X^k]$ is called the $k$-th moment of the distribution of $X$.
Suppose that $X_1, X_2, \dots, X_n$ are random variables that model the observable sample; then the $k$-th sample moment is

$$\hat{\mu}_k = \frac{1}{n} \sum_{i=1}^{n} X_i^k.$$
Note that $E[\hat{\mu}_k] = E[X^k] = \mu_k$, i.e., the $k$-th sample moment is an unbiased estimator of the $k$-th population moment.
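As a quick numerical check (assuming NumPy; the exponential sample with scale $0.5$ is an arbitrary illustrative choice, for which $\mu_1 = 0.5$ and $\mu_2 = 2 \cdot 0.5^2 = 0.5$), the sample moments computed below should land close to these population moments:

```python
# Sketch: sample moments approximate population moments for large n.
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=0.5, size=100_000)   # E[X] = 0.5, E[X^2] = 0.5

for k in (1, 2):
    print(f"sample moment {k}: {np.mean(x**k):.4f}")  # both approx 0.5
```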
Further, suppose that $X_1, \dots, X_n$ are i.i.d., each with density $f(x; \theta_1, \dots, \theta_m)$. Here $(\theta_1, \dots, \theta_m)$ is a vector of parameters. Set up a system of equations, setting the first $m$ sample moments equal to the first $m$ population moments:

$$\hat{\mu}_k = \mu_k(\theta_1, \dots, \theta_m), \qquad k = 1, \dots, m,$$

then solve the system of equations for

$$\theta_j = g_j(\hat{\mu}_1, \dots, \hat{\mu}_m), \qquad j = 1, \dots, m.$$

These equations are built on the (incorrect) belief that $\hat{\mu}_k = \mu_k$. In fact $\hat{\mu}_k \approx \mu_k$ for large $n$ (by the law of large numbers), so

$$g_j(\hat{\mu}_1, \dots, \hat{\mu}_m) \approx g_j(\mu_1, \dots, \mu_m) = \theta_j,$$

and we construct our estimator as

$$\hat{\theta}_j = g_j(\hat{\mu}_1, \dots, \hat{\mu}_m).$$
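Here is a worked sketch of this procedure (assuming NumPy; the Gamma model with shape $\alpha$ and scale $\beta$, and the true values below, are illustrative choices rather than an example from the notes). Matching $\hat{\mu}_1 = \alpha\beta$ and $\hat{\mu}_2 = \alpha\beta^2 + (\alpha\beta)^2$ and solving the two-equation system gives $\hat{\beta} = (\hat{\mu}_2 - \hat{\mu}_1^2)/\hat{\mu}_1$ and $\hat{\alpha} = \hat{\mu}_1/\hat{\beta}$.

```python
# Method-of-moments sketch for a Gamma(alpha, beta) model (illustrative).
import numpy as np

rng = np.random.default_rng(2)
alpha_true, beta_true = 3.0, 2.0
x = rng.gamma(shape=alpha_true, scale=beta_true, size=100_000)

m1 = np.mean(x)        # first sample moment
m2 = np.mean(x**2)     # second sample moment

beta_hat = (m2 - m1**2) / m1   # solves the two-equation moment system
alpha_hat = m1 / beta_hat

print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")
# should be close to (3, 2)
```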
Maximum Likelihood Estimation
Suppose $X_1, \dots, X_n$ are i.i.d. with density $f(x; \theta)$, where $\theta$ is the parameter to be estimated. In this case, the joint density of the random variables is

$$f(x_1, \dots, x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta),$$

and the likelihood function is

$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta),$$

viewed as a function of $\theta$ with the observed data held fixed.
Note that MLE does not actually require the observations to be i.i.d., or even independent; all it needs is a joint density for the data.
Since the likelihood is positive at its maximum and the logarithm is strictly increasing, the likelihood and the logarithm of the likelihood are maximized at the same value of $\theta$. We often work with the log-likelihood instead of the likelihood. In the above case, we have

$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta).$$
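A minimal numerical sketch of this recipe (assuming NumPy and SciPy; the Exponential($\lambda$) model with density $f(x; \lambda) = \lambda e^{-\lambda x}$ is an illustrative choice): here $\ell(\lambda) = n \log \lambda - \lambda \sum_i x_i$, which has the closed-form maximizer $\hat{\lambda} = 1/\bar{X}$, so the numerical optimizer should agree with it.

```python
# MLE sketch for an Exponential(lambda) model (illustrative choice).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
lam_true = 1.5
x = rng.exponential(scale=1 / lam_true, size=1_000)

def neg_log_likelihood(lam):
    # ell(lam) = n*log(lam) - lam*sum(x); we minimize its negative
    return -(len(x) * np.log(lam) - lam * x.sum())

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100), method="bounded")
print("numerical MLE:", result.x)
print("closed form:  ", 1 / x.mean())   # the two should agree
```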