
FDSI-1 Introduction to Modelling

"All models are wrong; some are useful."

Terminology

  • parameters: the constants that distinguish probability models in the same family.
  • statistic: a function of the RVs being used to model the observable data, e.g. the sample mean, the sample variance, etc. A statistic is a random variable.
  • estimator: a statistic which is used to estimate an unknown parameter or an unknown function of a parameter. An estimator is a random variable.

Performance of Estimators

An ideal estimator $\hat{\theta}$ for a parameter $\theta$ would have two properties:
  • the estimator should be accurate, i.e., its distribution is centered on the true value $\theta$
  • the estimator should be precise, i.e. its distribution has a small spread (small variance)
[Figure: sampling distributions of three estimators. $\hat{\theta}_1$ is the optimal estimator; $\hat{\theta}_2$ has lower precision; $\hat{\theta}_3$ has bad accuracy.]
An estimator $\hat{\theta}$ is said to be an unbiased estimator for $\theta$ if $E[\hat{\theta}] = \theta$.
The variability in an estimator (its precision) is quantified by its variance $\mathrm{Var}(\hat{\theta})$. The standard error (SE) of $\hat{\theta}$ is the square root of its variance: $\mathrm{SE}(\hat{\theta}) = \sqrt{\mathrm{Var}(\hat{\theta})}$. The SE is often reported along with an estimate in order to give a sense of the amount of error one should anticipate in that estimate.
An estimator $\hat{\theta}$ is said to be the minimum variance unbiased estimator (MVUE) for $\theta$ if $\hat{\theta}$ is unbiased for $\theta$ and, for any other unbiased estimator $\tilde{\theta}$, $\mathrm{Var}(\hat{\theta}) \le \mathrm{Var}(\tilde{\theta})$.
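Since an estimator is a random variable, its bias and SE can be approximated by simulating its sampling distribution. Below is a minimal sketch (assuming `numpy` is available; the exponential population and all constants are illustrative choices, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 25, 100_000
mu = 2.0  # true mean of the illustrative Exponential population (scale = 2, so sigma = 2)

# Each row is one sample of size n; each row's mean is one draw
# from the sampling distribution of the estimator X̄.
means = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)

print("bias ≈", means.mean() - mu)   # near 0: X̄ is unbiased for mu
print("SE   ≈", means.std(ddof=1))   # near sigma / sqrt(n) = 2 / 5 = 0.4
```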
Bias-Variance Tradeoff
In some cases, allowing some bias leads to reduced variance. For an estimator $\hat{\theta}$ for $\theta$,
$$\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = \mathrm{Var}(\hat{\theta}) + \mathrm{bias}(\hat{\theta})^2$$
where $\mathrm{bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$.
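As a hedged numerical check of this decomposition, the sketch below compares the unbiased estimator $\bar{X}$ with a deliberately shrunken estimator $0.9\,\bar{X}$ (the factor 0.9 is an arbitrary illustrative choice); the biased estimator ends up with the smaller MSE here:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, mu, sigma = 10, 200_000, 1.0, 2.0

xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
for est, name in [(xbar, "unbiased"), (0.9 * xbar, "shrunken")]:
    bias = est.mean() - mu
    var = est.var(ddof=1)
    mse = np.mean((est - mu) ** 2)
    # The two printed numbers should agree: MSE = Var + bias^2
    print(f"{name}: var + bias^2 = {var + bias**2:.4f}, MSE = {mse:.4f}")
```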

Important Cases

Case 1: $\hat{p} = \frac{X}{n}$ for $X \sim \mathrm{Binomial}(n, p)$,
where $n$ is the sample size and $X$ is the number of successes in the sample.
  • $E[\hat{p}] = p$ and $\mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n}$, which is often estimated by $\frac{\hat{p}(1-\hat{p})}{n}$
Besides, according to the CLT, $\hat{p}$ is approximately normal: $\hat{p} \approx N\!\left(p, \frac{p(1-p)}{n}\right)$. Rule of thumb: when $np \ge 10$ and $n(1-p) \ge 10$, this approximation is acceptable.
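A minimal sketch of these formulas (assuming `numpy`; the counts `n = 200` and `x = 34` are illustrative):

```python
import numpy as np

n, x = 200, 34                              # sample size and observed number of successes
p_hat = x / n
se_hat = np.sqrt(p_hat * (1 - p_hat) / n)   # plug-in estimate of SE(p̂)

# Rule-of-thumb check for the CLT approximation, with p̂ standing in for p:
ok = n * p_hat >= 10 and n * (1 - p_hat) >= 10
print(f"p̂ = {p_hat:.3f}, SE ≈ {se_hat:.4f}, normal approximation OK: {ok}")
```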
Case 2: Uncorrelated random variables $X_1, \dots, X_n$ with $E[X_i] = \mu$ and $\mathrm{Var}(X_i) = \sigma^2$
Standard estimators:
  • $\mu$'s estimator is the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
    • $E[\bar{X}] = \mu$, $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$
  • $\sigma^2$'s estimator is the sample variance: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$
    • Note that $\sum_{i=1}^{n} (X_i - c)^2$ is minimized when $c = \bar{X}$. $S^2$ is an unbiased estimator for $\sigma^2$ because $E\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right] = (n-1)\sigma^2$, a computation that ultimately rests on $E[(X_i - \mu)^2] = \sigma^2$, which is the definition of variance.
    • thus, $E[S^2] = \frac{1}{n-1}(n-1)\sigma^2 = \sigma^2$
    • $\frac{S^2}{n}$ is also an unbiased estimator of $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$
    • $\frac{1}{n+1}\sum_{i=1}^{n} (X_i - \bar{X})^2$ will minimize the MSE among estimators of this form for $\sigma^2$, in the case where $X_i \sim N(\mu, \sigma^2)$ i.i.d. (see the simulation sketch below)
Note that the case where $X_1, \dots, X_n$ are i.i.d. is a special instance of this situation.
If we did assume $X_1, \dots, X_n$ are i.i.d., by the CLT we have $\bar{X} \approx N\!\left(\mu, \frac{\sigma^2}{n}\right)$ provided $n$ is large.
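To make the divisor comparison above concrete, here is a simulation sketch (assuming `numpy`; all constants are illustrative). Under i.i.d. normal sampling, the divisor $n-1$ should show roughly zero bias, while the divisor $n+1$ should show the smallest MSE:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, sigma2 = 10, 200_000, 4.0

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
ss = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)  # sum of squared deviations

for d in (n - 1, n, n + 1):
    est = ss / d
    print(f"divisor {d}: bias = {est.mean() - sigma2:+.4f}, "
          f"MSE = {np.mean((est - sigma2) ** 2):.4f}")
```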
Case 3: i.i.d. random variables $X_1, \dots, X_n$ following $N(\mu, \sigma^2)$
A special case of Case 2. The sample mean and sample variance remain unbiased estimators of $\mu$ and $\sigma^2$, respectively. There are some stronger statements as well (classic results):
  • $\bar{X}$ and $S^2$ are independent
  • The quantity $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ has the standard normal distribution $N(0, 1)$
  • The quantity $\frac{(n-1)S^2}{\sigma^2}$ has the chi-squared distribution with $n-1$ degrees of freedom
  • The quantity $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ has the t-distribution with $n-1$ degrees of freedom
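These classic results can be sanity-checked by simulation. The sketch below (assuming `numpy` and `scipy`; the constants are illustrative) compares simulated quantiles of the last two pivot quantities to the theoretical $\chi^2_{n-1}$ and $t_{n-1}$ quantiles:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps, mu, sigma = 8, 100_000, 5.0, 3.0

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)

chi2_pivot = (n - 1) * s2 / sigma**2       # should follow chi²(n-1)
t_pivot = (xbar - mu) / np.sqrt(s2 / n)    # should follow t(n-1)

q = [0.05, 0.5, 0.95]
print(np.quantile(chi2_pivot, q), stats.chi2.ppf(q, df=n - 1))
print(np.quantile(t_pivot, q), stats.t.ppf(q, df=n - 1))
```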

Confidence Intervals

The standard error (SE) of an estimator is a useful tool for quantifying error in an estimate, but a common alternative is the use of confidence intervals for unknown parameters.
Formally, we call $(L, U)$ a $100(1-\alpha)\%$ confidence interval for $\theta$ if $P(L \le \theta \le U) = 1 - \alpha$.
Four important and commonly used confidence intervals:
(1) CI for Population Mean, $\mu$, when $n$ is large ($n \ge 30$)
Assume the sample $X_1, \dots, X_n$ is i.i.d. from a distribution with mean $\mu$ and variance $\sigma^2$ (both unknown). Then, a $100(1-\alpha)\%$ confidence interval for $\mu$ is
$$\bar{X} \pm z_{\alpha/2}\,\frac{S}{\sqrt{n}}$$
where $z_{\alpha/2}$ is such that $P(Z > z_{\alpha/2}) = \alpha/2$ when $Z$ has the standard normal distribution.
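A minimal sketch of this interval (assuming `numpy`/`scipy`; the helper name `mean_ci_large_n` and the gamma population are illustrative), with an empirical coverage check against the nominal 95% level:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
alpha = 0.05
z = stats.norm.ppf(1 - alpha / 2)   # z_{alpha/2} ≈ 1.96

def mean_ci_large_n(x):
    # X̄ ± z_{α/2} · S/√n, justified by the CLT for large n
    xbar, s, n = x.mean(), x.std(ddof=1), len(x)
    half = z * s / np.sqrt(n)
    return xbar - half, xbar + half

# Coverage check: the interval should contain the true mean about 95% of the time.
mu, covered = 10.0, 0
for _ in range(5_000):
    lo, hi = mean_ci_large_n(rng.gamma(shape=2.0, scale=5.0, size=100))  # mean = 10
    covered += lo <= mu <= hi
print("empirical coverage:", covered / 5_000)
```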
(2) CI for Population Mean, $\mu$, when $n$ is small ($n < 30$)
Assume the sample $X_1, \dots, X_n$ is i.i.d. from the normal distribution $N(\mu, \sigma^2)$ (both unknown). Then a $100(1-\alpha)\%$ confidence interval for $\mu$ is
$$\bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}$$
where $t_{\alpha/2,\,n-1}$ is such that $P(T > t_{\alpha/2,\,n-1}) = \alpha/2$ when $T$ has the t-distribution with $n-1$ degrees of freedom.
Note $t_{\alpha/2,\,n-1} > z_{\alpha/2}$ for all $n$ (the cost for not knowing $\sigma$), but $t_{\alpha/2,\,n-1} \approx z_{\alpha/2}$ for large $n$.
Remark:
If $n$ is small and we do not have the information that $X_1, \dots, X_n$ are i.i.d. from a normal distribution, then we cannot find a corresponding confidence interval.
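A hedged sketch of the small-sample interval (the helper name `mean_ci_t` and the data values are illustrative):

```python
import numpy as np
from scipy import stats

def mean_ci_t(x, alpha=0.05):
    # X̄ ± t_{α/2, n-1} · S/√n, assuming the data are i.i.d. normal
    n = len(x)
    t = stats.t.ppf(1 - alpha / 2, df=n - 1)
    half = t * np.std(x, ddof=1) / np.sqrt(n)
    return np.mean(x) - half, np.mean(x) + half

x = np.array([4.9, 5.3, 5.1, 4.7, 5.0, 5.4, 4.8])  # illustrative small sample (n = 7)
print(mean_ci_t(x))
```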
(3) CI for Population Variance, $\sigma^2$
Assume the sample $X_1, \dots, X_n$ is i.i.d. from the normal distribution $N(\mu, \sigma^2)$ (both unknown). Then a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is
$$\left( \frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}},\; \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}} \right)$$
where $\chi^2_{q,\,n-1}$ is such that $P(W > \chi^2_{q,\,n-1}) = q$ when $W$ has the chi-squared distribution with $n-1$ degrees of freedom.
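A sketch for this interval (the helper name `variance_ci` and the data are illustrative). Note that, under the upper-tail convention above, the larger $\chi^2$ critical value goes in the denominator of the lower endpoint:

```python
import numpy as np
from scipy import stats

def variance_ci(x, alpha=0.05):
    # ((n-1)S² / χ²_{α/2,n-1}, (n-1)S² / χ²_{1-α/2,n-1}), assuming i.i.d. normal data
    n = len(x)
    s2 = np.var(x, ddof=1)
    chi2_hi = stats.chi2.ppf(1 - alpha / 2, df=n - 1)  # upper-tail α/2 point: χ²_{α/2,n-1}
    chi2_lo = stats.chi2.ppf(alpha / 2, df=n - 1)      # upper-tail 1-α/2 point: χ²_{1-α/2,n-1}
    return (n - 1) * s2 / chi2_hi, (n - 1) * s2 / chi2_lo

x = np.array([12.1, 9.8, 11.4, 10.3, 12.7, 9.5, 11.0, 10.9])  # illustrative sample
print(variance_ci(x))
```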
(4) CI for Population Proportion, $p$
Assume that $n$ is large enough that you are confident $np \ge 10$ and $n(1-p) \ge 10$. Let $\hat{p} = \frac{X}{n}$ where $X$ is $\mathrm{Binomial}(n, p)$, and $p$ is unknown. Then, a $100(1-\alpha)\%$ confidence interval for $p$ is
$$\hat{p} \pm z_{\alpha/2}\,\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
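A minimal sketch for the proportion interval (this is the large-sample Wald interval; the counts are illustrative):

```python
import numpy as np
from scipy import stats

def proportion_ci(x, n, alpha=0.05):
    # p̂ ± z_{α/2} · sqrt(p̂(1-p̂)/n)
    p_hat = x / n
    z = stats.norm.ppf(1 - alpha / 2)
    half = z * np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

print(proportion_ci(x=137, n=400))  # e.g. 137 successes in 400 trials
```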
