🎲 Prob Prep

Part 1. Basic Discrete Probability

…

Conditional Probability Rules

  • For any events $A, B$ with $P(B) > 0$: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
  • Multiplication Rule: $P(A \cap B) = P(A \mid B)\,P(B)$
  • Law of Total Probability: Suppose events $B_1, \dots, B_n$ form a partition of $\Omega$. Then for any event $A$: $P(A) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)$
  • Bayes’ Theorem: Suppose events $B_1, \dots, B_n$ form a partition of $\Omega$. Then for any event $A$ with $P(A) > 0$: $P(B_j \mid A) = \frac{P(A \mid B_j)\,P(B_j)}{\sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)}$
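As a sanity check, here is the two-event partition version of these rules in code; the prevalence and test-accuracy numbers are made up for illustration (a minimal sketch, plain Python):

```python
# Hypothetical numbers for a diagnostic-test example of Bayes' theorem.
# Partition: B1 = "has condition", B2 = complement of B1.
p_b1 = 0.01                 # P(B1): prior prevalence (assumed)
p_b2 = 1 - p_b1             # P(B2)
p_a_given_b1 = 0.95         # P(A | B1): sensitivity (assumed)
p_a_given_b2 = 0.05         # P(A | B2): false-positive rate (assumed)

# Law of Total Probability: P(A) = sum_i P(A | Bi) P(Bi)
p_a = p_a_given_b1 * p_b1 + p_a_given_b2 * p_b2

# Bayes' Theorem: P(B1 | A) = P(A | B1) P(B1) / P(A)
p_b1_given_a = p_a_given_b1 * p_b1 / p_a
print(f"P(A) = {p_a:.4f}, P(B1 | A) = {p_b1_given_a:.4f}")  # 0.0590, 0.1610
```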

Independent Events

  • Events $A, B$ are independent $\Leftrightarrow P(A \cap B) = P(A)\,P(B)$; this implies $P(A \mid B) = P(A)$ when $P(B) > 0$
  • Pairwise independence of $A_1, \dots, A_n$ does NOT ⇒ they are (mutually) independent

Conditional Independence

Suppose events $A, B$ are not necessarily independent, but there is another event $C$ such that $P(A \cap B \mid C) = P(A \mid C)\,P(B \mid C)$;
then we say that $A$ and $B$ are independent conditional on $C$.
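A quick simulation of this idea: below, $A$ and $B$ share a common cause $C$ (a setup chosen purely for illustration; assumes numpy). They are dependent marginally, but become independent once we condition on $C$:

```python
import numpy as np

# A and B are driven by a common cause C: dependent marginally,
# independent once we condition on C.
rng = np.random.default_rng(0)
n = 1_000_000
C = rng.random(n) < 0.5            # P(C) = 0.5
p = np.where(C, 0.9, 0.1)          # success probability depends on C
A = rng.random(n) < p
B = rng.random(n) < p              # drawn independently of A given C

print("P(A and B)     =", (A & B).mean(), "vs P(A)P(B) =", A.mean() * B.mean())
print("P(A and B | C) =", (A & B)[C].mean(), "vs P(A|C)P(B|C) =",
      A[C].mean() * B[C].mean())
```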

Part 2. Random Variables

…

Expectation

Remark:
  • Some distributions do not have an expectation, for example the Cauchy distribution (fat tails).
  • If $X, Y$ are random variables, and $a, b$ are constants, then:
    • $E[aX + bY] = a\,E[X] + b\,E[Y]$ if $E[X]$ and $E[Y]$ are finite
    • $E[XY] = E[X]\,E[Y]$ if $X$ and $Y$ are independent.
  • Jensen’s inequality (a numeric check follows this list):
    • If $g$ is convex on the support of the random variable $X$, then $E[g(X)] \ge g(E[X])$
    • If $g$ is concave on the support of the random variable $X$, then $E[g(X)] \le g(E[X])$
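A quick numeric check of Jensen’s inequality, using the illustrative choices $g(x) = e^x$ (convex) and $X \sim N(0, 1)$; assumes numpy:

```python
import numpy as np

# Jensen check: g(x) = exp(x) is convex, X ~ N(0, 1).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1_000_000)

lhs = np.exp(x).mean()   # E[g(X)]; exact value is e^{1/2} ~ 1.6487
rhs = np.exp(x.mean())   # g(E[X]); exact value is e^0 = 1
print(f"E[g(X)] = {lhs:.4f} >= g(E[X]) = {rhs:.4f}")
```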

Rule for the Lazy Statistician

Discrete version: $E[g(X)] = \sum_x g(x)\,p_X(x)$
(this is a theorem, not a definition, of $E[g(X)]$)
Continuous version: $E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx$
Joint distribution version: $E[g(X, Y)] = \iint g(x, y)\,f_{X,Y}(x, y)\,dx\,dy$
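A sketch of LOTUS in practice, with the illustrative choices $X \sim \mathrm{Exp}(1)$ and $g(x) = x^2$ (assumes numpy and scipy); the exact answer is $E[X^2] = \mathrm{Var}(X) + (E[X])^2 = 2$:

```python
import numpy as np
from scipy.integrate import quad

g = lambda x: x**2
f = lambda x: np.exp(-x)   # Exp(1) density on [0, inf)

# Continuous LOTUS: E[g(X)] = integral of g(x) f(x) dx
integral, _ = quad(lambda x: g(x) * f(x), 0, np.inf)
# Monte Carlo estimate of the same quantity
mc = g(np.random.default_rng(0).exponential(1.0, 10**6)).mean()
print(f"integral = {integral:.4f}, Monte Carlo = {mc:.4f}, exact = 2")
```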

Bernoulli($p$) Distribution

$P(X = 1) = p$, $P(X = 0) = 1 - p$; $E[X] = p$, $\mathrm{Var}(X) = p(1 - p)$.

Binomial($n, p$) Distribution

$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$ for $k = 0, 1, \dots, n$; $E[X] = np$, $\mathrm{Var}(X) = np(1 - p)$.

Geometric($p$) Distribution

$P(X = k) = (1 - p)^{k - 1} p$ for $k = 1, 2, \dots$; $E[X] = 1/p$, $\mathrm{Var}(X) = (1 - p)/p^2$.

Poisson($\lambda$) Distribution

$P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}$ for $k = 0, 1, 2, \dots$; $E[X] = \mathrm{Var}(X) = \lambda$.

Remark:
  • let $X_n \sim \mathrm{Binomial}(n, \lambda/n)$; as $n$ grows, $X_n$ converges in distribution to Poisson($\lambda$) (numeric check below)
  • if $X \sim \mathrm{Poisson}(\lambda_1)$ and $Y \sim \mathrm{Poisson}(\lambda_2)$ are independent, $X + Y \sim \mathrm{Poisson}(\lambda_1 + \lambda_2)$
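A numeric check of the first bullet (assumes scipy; $\lambda = 3$ is an arbitrary choice):

```python
import numpy as np
from scipy import stats

lam = 3.0
k = np.arange(10)
for n in (10, 100, 10_000):
    gap = np.abs(stats.binom.pmf(k, n, lam / n) - stats.poisson.pmf(k, lam))
    print(f"n = {n:>6}: max pmf gap vs Poisson = {gap.max():.5f}")
```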

Uniform($a, b$) Distribution

$f(x) = \frac{1}{b - a}$ for $a \le x \le b$; $E[X] = \frac{a + b}{2}$, $\mathrm{Var}(X) = \frac{(b - a)^2}{12}$.

Pareto($x_0, \alpha$) Distribution

$f(x) = \frac{\alpha x_0^\alpha}{x^{\alpha + 1}}$ for $x \ge x_0$.

Remark:
  • $P(X > x) = (x_0/x)^\alpha$ for $x \ge x_0$: the tail decays polynomially. This is often used for modelling the tail of a distribution

Exponential($\lambda$) Distribution

$f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$; $E[X] = 1/\lambda$, $\mathrm{Var}(X) = 1/\lambda^2$.

Remark:
  • Memoryless property: if I model the time between "events" as exponential, the probability that the time to the next event exceeds a given number of units is the same no matter how long it has been since the last event. Suppose $X \sim \mathrm{Exp}(\lambda)$, and $s, t \ge 0$. Then $P(X > s + t \mid X > s) = P(X > t) = e^{-\lambda t}$ (see the simulation sketch below)
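A simulation sketch of the memoryless property (assumes numpy; $\lambda = 0.5$ and the values of $s, t$ are arbitrary):

```python
import numpy as np

# P(X > s + t | X > s) should match P(X > t) for X ~ Exp(lambda = 0.5).
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=10**7)   # scale = 1 / lambda
s, t = 1.5, 2.0

cond = (x > s + t).sum() / (x > s).sum()     # P(X > s + t | X > s)
uncond = (x > t).mean()                      # P(X > t) = e^{-t/2} ~ 0.3679
print(f"P(X>s+t | X>s) = {cond:.4f}, P(X>t) = {uncond:.4f}")
```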

Gamma($\alpha, \lambda$) Distribution

For $x > 0$,
$f(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\lambda x}$
where $\Gamma(\alpha) = \int_0^\infty t^{\alpha - 1} e^{-t}\,dt$.
Remark: if $X \sim \mathrm{Gamma}(\alpha, \lambda)$
  • $E[X] = \alpha/\lambda$, $\mathrm{Var}(X) = \alpha/\lambda^2$
  • If $\alpha = 1$, then $X \sim \mathrm{Exp}(\lambda)$
  • For any $c > 0$, the random variable $cX \sim \mathrm{Gamma}(\alpha, \lambda/c)$
  • If $X_1 \sim \mathrm{Gamma}(\alpha_1, \lambda)$ and $X_2 \sim \mathrm{Gamma}(\alpha_2, \lambda)$ are independent, then $X_1 + X_2 \sim \mathrm{Gamma}(\alpha_1 + \alpha_2, \lambda)$
  • The sum of $n$ independent $\mathrm{Exp}(\lambda)$ random variables has the $\mathrm{Gamma}(n, \lambda)$ distribution (Erlang Distribution); a simulation check follows.
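A simulation check of the Erlang fact above, with the arbitrary choices $n = 5$ and $\lambda = 2$ (assumes numpy and scipy):

```python
import numpy as np
from scipy import stats

# Sum of 5 independent Exp(2) draws should follow Gamma(5, 2).
rng = np.random.default_rng(0)
sums = rng.exponential(scale=0.5, size=(10**6, 5)).sum(axis=1)

for q in (1.0, 2.5, 4.0):
    empirical = (sums <= q).mean()
    exact = stats.gamma.cdf(q, a=5, scale=0.5)  # shape alpha = 5, scale = 1/lambda
    print(f"P(S <= {q}): empirical {empirical:.4f} vs Gamma {exact:.4f}")
```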

Normal($\mu, \sigma^2$) Distribution

$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2 / (2\sigma^2)}$; $E[X] = \mu$, $\mathrm{Var}(X) = \sigma^2$.

Remark:
  • For $X \sim N(\mu, \sigma^2)$ and any $a, b$: $aX + b \sim N(a\mu + b, a^2\sigma^2)$
  • Linear combinations of independent normal random variables are also normally distributed.

The Poisson Process

The Poisson distribution can be derived as the limit of Binomial($n, \lambda t/n$). Imagine dividing the interval $[0, t]$ into $n$ subintervals; letting the number of subintervals grow indefinitely leads to the Poisson process with rate $\lambda$.
"Events occur as a Poisson process with rate $\lambda$" means
  • If $N$ equals the number of events during an interval of time of length $t$, then $N$ has the Poisson($\lambda t$) distribution
  • The times between events are random variables with the $\mathrm{Exp}(\lambda)$ distribution
  • The times between events are independent random variables
  • The numbers of events in disjoint time intervals are independent random variables.
  • The waiting time for $k$ events has the Gamma($k, \lambda$) distribution (sum of $k$ $\mathrm{Exp}(\lambda)$ random variables); see the simulation sketch after this list.
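A simulation sketch of the process (assumes numpy; the rate and horizon are arbitrary choices): build event times from i.i.d. exponential gaps, then confirm the unit-interval counts look Poisson.

```python
import numpy as np

# Build event times on [0, T) from i.i.d. Exp(rate) gaps, then check that
# counts over unit-length windows have Poisson-like mean and variance.
rng = np.random.default_rng(0)
rate, T = 4.0, 100_000.0

gaps = rng.exponential(scale=1 / rate, size=int(rate * T * 1.2))
times = np.cumsum(gaps)
times = times[times < T]

counts = np.bincount(times.astype(int), minlength=int(T))  # events per unit window
print("mean count:", counts.mean(), " variance:", counts.var())
# Both should be close to rate = 4: a Poisson(4) count has mean = variance = 4.
```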

Beta($\alpha, \beta$) Distribution

$f(x) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}$ for $0 \le x \le 1$; $E[X] = \frac{\alpha}{\alpha + \beta}$.

Remark:
  • If $\alpha = \beta$, then the distribution is symmetric about 0.5

Summary Table

| Distribution | PMF / PDF | Mean | Variance |
| --- | --- | --- | --- |
| Bernoulli($p$) | $p^x (1-p)^{1-x}$ | $p$ | $p(1-p)$ |
| Binomial($n, p$) | $\binom{n}{x} p^x (1-p)^{n-x}$ | $np$ | $np(1-p)$ |
| Geometric($p$) | $(1-p)^{x-1} p$ | $1/p$ | $(1-p)/p^2$ |
| Poisson($\lambda$) | $e^{-\lambda} \lambda^x / x!$ | $\lambda$ | $\lambda$ |
| Uniform($a, b$) | $1/(b-a)$ | $(a+b)/2$ | $(b-a)^2/12$ |
| Exponential($\lambda$) | $\lambda e^{-\lambda x}$ | $1/\lambda$ | $1/\lambda^2$ |
| Gamma($\alpha, \lambda$) | $\frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\lambda x}$ | $\alpha/\lambda$ | $\alpha/\lambda^2$ |
| Normal($\mu, \sigma^2$) | $\frac{1}{\sqrt{2\pi}\sigma} e^{-(x-\mu)^2/(2\sigma^2)}$ | $\mu$ | $\sigma^2$ |
| Beta($\alpha, \beta$) | $\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1}(1-x)^{\beta-1}$ | $\frac{\alpha}{\alpha+\beta}$ | $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ |

Part 3: Multivariate Distributions

Joint Probability Function

For discrete random variables $X, Y$:
Joint PMF: $p_{X,Y}(x, y) = P(X = x, Y = y)$
Joint CDF: $F_{X,Y}(x, y) = P(X \le x, Y \le y)$
For continuous random variables:
Joint PDF: $f_{X,Y}(x, y)$
  • To calculate probabilities, integrate the pdf over the region of interest (a worked example follows): $P((X, Y) \in A) = \iint_A f_{X,Y}(x, y)\,dx\,dy$
    • $A$ can be any shape
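A worked numeric instance (assumes numpy and scipy): for independent $\mathrm{Exp}(1)$ variables, integrate the joint pdf $e^{-x-y}$ over the triangle $A = \{x + y \le 1\}$; the exact answer is $1 - 2/e$.

```python
import numpy as np
from scipy.integrate import dblquad

# X, Y independent Exp(1): f(x, y) = e^{-x-y}; A = {(x, y): x + y <= 1}.
f = lambda y, x: np.exp(-x - y)          # dblquad expects f(y, x)
prob, _ = dblquad(f, 0, 1, lambda x: 0, lambda x: 1 - x)
print(f"P(X + Y <= 1) = {prob:.4f}, exact = {1 - 2 / np.e:.4f}")  # 0.2642
```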

Marginal Distributions

For the discrete case: $p_X(x) = \sum_y p_{X,Y}(x, y)$
For the continuous case: $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$
Remark:
  • the marginal distributions do not, in general, determine the joint distribution of the random variables

Conditional Distribution

for the discrete case: $p_{Y \mid X}(y \mid x) = P(Y = y \mid X = x) = \frac{p_{X,Y}(x, y)}{p_X(x)}$
for the continuous case: $f_{Y \mid X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$,
with $f_{Y \mid X}$ called the conditional density of $Y$ given $X = x$.
Conditional Distribution vs. Marginal Distribution
(plots: the conditional distribution of $Y$ given $X = 1$ under three dependence scenarios)
  • for case I: $X, Y$ are independent
  • for case II: $X, Y$ are weakly (positively) dependent
  • for case III: $X, Y$ are strongly (positively) dependent
The marginal distributions of $Y$ are the same for all three cases, while the conditional distributions $f_{Y \mid X}(y \mid 1)$ are not the same, as shown, for example, in the second plot.

Covariance & Correlation

$\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$, $\quad \rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} \in [-1, 1]$
Properties:
  • $\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)$, $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$
  • $X, Y$ independent ⇒ $\mathrm{Cov}(X, Y) = 0$. $\mathrm{Cov}(X, Y) = 0$ does NOT ⇒ $X, Y$ independent
    • if $(X, Y)$ is bivariate normal, then $\mathrm{Cov}(X, Y) = 0$ ⇒ $X, Y$ independent

More than two random variables
Let $\mathbf{a}, \mathbf{b}$ denote two vectors of scalars and $A$ a non-random matrix; then $\mathrm{Cov}(\mathbf{a}^\top \mathbf{X}, \mathbf{b}^\top \mathbf{Y}) = \mathbf{a}^\top \mathrm{Cov}(\mathbf{X}, \mathbf{Y})\,\mathbf{b}$ and $\mathrm{Cov}(A\mathbf{X}) = A\,\mathrm{Cov}(\mathbf{X})\,A^\top$ (checked numerically below).
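A numeric check of $\mathrm{Cov}(A\mathbf{X}) = A\,\mathrm{Cov}(\mathbf{X})\,A^\top$, with an arbitrary covariance matrix $\Sigma$ and an arbitrary $2 \times 3$ matrix $A$ (assumes numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.5]])          # an arbitrary covariance matrix
A = np.array([[1.0, -2.0, 0.5],
              [0.0,  1.0, 3.0]])             # an arbitrary 2x3 matrix

X = rng.multivariate_normal(np.zeros(3), Sigma, size=500_000)
empirical = np.cov((X @ A.T).T)              # empirical Cov(AX)
theoretical = A @ Sigma @ A.T                # A Cov(X) A^T
print(np.round(empirical, 3))
print(np.round(theoretical, 3))
```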

Bivariate Normal Distribution

$(X, Y)$ has the bivariate normal distribution if it has joint pdf
$f(\mathbf{x}) = \frac{1}{2\pi |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$
where $\boldsymbol{\mu}$ is the vector of means and $\Sigma$ is the covariance matrix.
Equivalent definition:
$(X, Y)$ has the bivariate normal distribution iff all linear combinations of $X$ and $Y$ are also normally distributed.
Properties:
  • $X, Y$ are normally distributed, i.e., the marginals are normal
  • the conditional distribution is normal: $Y \mid X = x \sim N\!\left(\mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(x - \mu_X),\; \sigma_Y^2 (1 - \rho^2)\right)$ (verified by simulation below)
  • $\mathrm{Cov}(X, Y) = 0 \Leftrightarrow X, Y$ are independent.
  • $X, Y$ independent normals ⇒ $(X, Y)$ is bivariate normal
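A simulation check of the conditional-mean formula (assumes numpy; all parameter values are arbitrary):

```python
import numpy as np

# Check E[Y | X = x0] against mu_Y + rho * (sd_Y / sd_X) * (x0 - mu_X).
rng = np.random.default_rng(0)
mu_x, mu_y, sd_x, sd_y, rho = 1.0, -2.0, 2.0, 1.0, 0.6
cov = [[sd_x**2, rho * sd_x * sd_y], [rho * sd_x * sd_y, sd_y**2]]
xy = rng.multivariate_normal([mu_x, mu_y], cov, size=2_000_000)

x0 = 3.0
near = np.abs(xy[:, 0] - x0) < 0.05          # condition on X close to x0
empirical = xy[near, 1].mean()
formula = mu_y + rho * (sd_y / sd_x) * (x0 - mu_x)
print(f"E[Y | X near {x0}]: empirical {empirical:.3f} vs formula {formula:.3f}")
```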

Multivariate Normal Distribution

$\mathbf{X} = (X_1, \dots, X_n)^\top$ has the multivariate normal distribution if it has joint pdf
$f(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$
where $\Sigma$ is positive definite (otherwise the inverse might not exist)
Properties:
  • All marginal and conditional distributions are multivariate normal.
  • Any random variable/vector of the form $A\mathbf{X} + \mathbf{b}$ will be multivariate normal if $A \Sigma A^\top$ is positive definite.
  • If $\mathrm{Cov}(X_i, X_j) = 0$, then $X_i, X_j$ are independent.
  • If $\Sigma$ is diagonal, $X_1, \dots, X_n$ are mutually independent

Part 4: Conditional Expectation

For a discrete random variable $X$, when conditioning on an event $A$:
$E[X \mid A] = \sum_x x\, P(X = x \mid A)$
When the event is an event concerning another random variable $Y$ itself, the notation becomes $E[X \mid Y = y]$.
  • E.g. $A = \{Y = y\}$, $E[X \mid Y = y] = \sum_x x\, P(X = x \mid Y = y)$
For the continuous case, $E[X \mid Y = y] = \int_{-\infty}^{\infty} x\, f_{X \mid Y}(x \mid y)\,dx$

Laws of Total Probability

$P(A) = \sum_y P(A \mid Y = y)\, P(Y = y)$ (discrete $Y$), $\qquad P(A) = \int P(A \mid Y = y)\, f_Y(y)\,dy$ (continuous $Y$)

Prior and Posterior Distributions

In Bayesian analysis, before data is observed, the unknown parameter $\theta$ is modeled as a random variable having a probability distribution $f_\Theta(\theta)$, called the prior distribution. This distribution represents our prior belief about the value of the parameter. After observing data $x$, we have increased our knowledge about $\theta$, which is now described by the posterior distribution. The equation is
$f_{\Theta \mid X}(\theta \mid x) = \frac{f_{X \mid \Theta}(x \mid \theta)\, f_\Theta(\theta)}{f_X(x)} \propto f_{X \mid \Theta}(x \mid \theta)\, f_\Theta(\theta)$
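A standard conjugate example of this update (the Beta prior and coin-flip data here are hypothetical, not from the notes; assumes scipy): with a Beta($a, b$) prior on $\theta$ and $k$ successes in $n$ Bernoulli($\theta$) trials, the posterior is Beta($a + k$, $b + n - k$).

```python
from scipy import stats

a, b = 2, 2        # prior hyperparameters (assumed)
n, k = 10, 7       # observed data: 7 successes in 10 trials (assumed)

posterior = stats.beta(a + k, b + n - k)    # Beta(9, 5)
print("posterior mean:", posterior.mean())  # (a + k)/(a + b + n) = 9/14 ~ 0.643
```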

Iterated Conditioning

$E\big[E[X \mid Y]\big] = E[X]$
where $E[X \mid Y]$ is itself a random variable (a function of $Y$).
Useful equation (law of total variance):
$\mathrm{Var}(X) = E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}(E[X \mid Y])$
proof: $E[\mathrm{Var}(X \mid Y)] = E[X^2] - E\big[(E[X \mid Y])^2\big]$ and $\mathrm{Var}(E[X \mid Y]) = E\big[(E[X \mid Y])^2\big] - (E[X])^2$; adding the two gives $E[X^2] - (E[X])^2 = \mathrm{Var}(X)$.
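A Monte Carlo check of both identities, using an illustrative hierarchy $Y \sim \mathrm{Poisson}(3)$, $X \mid Y \sim N(Y, 1)$ (assumes numpy):

```python
import numpy as np

# Y ~ Poisson(3); given Y, X ~ N(Y, 1). Then E[X | Y] = Y and Var(X | Y) = 1.
rng = np.random.default_rng(0)
y = rng.poisson(3.0, size=10**6)
x = rng.normal(loc=y, scale=1.0)

print("E[X]        =", x.mean())                     # ~ 3
print("E[E[X | Y]] =", y.mean())                     # ~ 3, matching E[X]
# Law of total variance: Var(X) = E[Var(X|Y)] + Var(E[X|Y]) = 1 + Var(Y) ~ 4
print("Var(X) =", x.var(), " vs 1 + Var(Y) =", 1 + y.var())
```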

Measure-Theoretic Notions of Conditional Expectation

"Information" can be captured via a collection of subsets of $\Omega$. Such collections are denoted using $\mathcal{F}, \mathcal{G}$, etc. Information means we can tell whether each event in the collection occurs or not (instead of being unsure).
When $\mathcal{F}$ satisfies the following properties, it is a $\sigma$-field or $\sigma$-algebra:
  • $\emptyset \in \mathcal{F}$ and $\Omega \in \mathcal{F}$
    • you can always know $\emptyset$ does not occur
  • If $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$
    • If you know the occurrence / non-occurrence status of $A$, then you also know the status of $A^c$
  • If $A_1, A_2, \dots \in \mathcal{F}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$
    • If you know the occurrence / non-occurrence status of every $A_i$, then you also know the status of $\bigcup_i A_i$
Remark:
  • $\mathcal{F}$ is a set of subsets of $\Omega$
  • The trivial $\sigma$-field consists of only $\{\emptyset, \Omega\}$. Conditioning on it is like conditioning on no information
  • The power set $2^\Omega$ is a $\sigma$-field, corresponding to "knowing everything"
  • The information content from conditioning on a random variable $X$ is called the $\sigma$-field generated by $X$, denoted $\sigma(X)$.
    • $E[Y \mid X]$ written in elementary probability theory can be interpreted as $E[Y \mid \sigma(X)]$
  • A random variable $Y$ is said to be $\mathcal{F}$-measurable if $\sigma(Y) \subseteq \mathcal{F}$
Properties of Conditional Expectation
Assume $\mathcal{F}, \mathcal{G}$ are $\sigma$-fields and $X, Y$ are random variables.
  • If $X$ is $\mathcal{F}$-measurable, then $E[X \mid \mathcal{F}] = X$
  • If $X$ is $\mathcal{F}$-measurable, then $E[XY \mid \mathcal{F}] = X\, E[Y \mid \mathcal{F}]$ ("taking out what is known")
  • $E[aX + bY \mid \mathcal{F}] = a\,E[X \mid \mathcal{F}] + b\,E[Y \mid \mathcal{F}]$, for scalars $a, b$
  • If $\mathcal{G} \subseteq \mathcal{F}$, then $E\big[E[X \mid \mathcal{F}] \mid \mathcal{G}\big] = E[X \mid \mathcal{G}]$ (tower property)
  • If $g$ is convex, then $E[g(X) \mid \mathcal{F}] \ge g\big(E[X \mid \mathcal{F}]\big)$ (conditional Jensen)
Measure-Theoretic Independence
  • ($\sigma$-fields) $\mathcal{F}, \mathcal{G}$ are independent iff for any $A \in \mathcal{F}$ and $B \in \mathcal{G}$, $P(A \cap B) = P(A)\,P(B)$
  • (random variables) $X, Y$ are independent iff $\sigma(X)$ and $\sigma(Y)$ are independent

Part 5: Moment Generating Functions

MGF of $X$: $M_X(t) = E[e^{tX}]$,
which is a function of $t$. It can be calculated using the Rule for the Lazy Statistician.
Calculate Moments: $E[X^k] = M_X^{(k)}(0)$, the $k$-th derivative of $M_X$ evaluated at $t = 0$.

Applications

Uniqueness of MGF
If $M_X(t) = M_Y(t)$ for all $t$ in a neighborhood of 0, then $X$ and $Y$ have the same distribution.
  • Note that two random variables can have matching moments, i.e., $E[X^k] = E[Y^k]$ for all $k$, but have different distributions (this can happen when the MGF does not exist).
Sum of Independent Random Variables
Suppose $X_1, \dots, X_n$ are independent random variables, and $Y = X_1 + \dots + X_n$. Then
$M_Y(t) = \prod_{i=1}^{n} M_{X_i}(t)$ for all $t$
Establishing Convergence in Distribution
If $M_{X_n}(t) \to M_X(t)$ as $n \to \infty$ for all $t \in (-\delta, \delta)$ for some $\delta > 0$, then $X_n \xrightarrow{d} X$.
e.g.
for $X_n \sim \mathrm{Binomial}(n, \lambda/n)$, then $M_{X_n}(t) = \left(1 + \frac{\lambda(e^t - 1)}{n}\right)^n \to e^{\lambda(e^t - 1)}$, the Poisson($\lambda$) MGF,
thus $X_n \xrightarrow{d} \mathrm{Poisson}(\lambda)$.
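A numeric version of this example (assumes numpy; $\lambda$ and $t$ are arbitrary choices):

```python
import numpy as np

lam, t = 3.0, 0.4
target = np.exp(lam * (np.exp(t) - 1))            # Poisson(3) MGF at t
for n in (10, 100, 10_000):
    mgf_n = (1 + lam * (np.exp(t) - 1) / n) ** n  # Binomial(n, lam/n) MGF at t
    print(f"n = {n:>6}: MGF = {mgf_n:.5f} (Poisson MGF = {target:.5f})")
```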

The Central Limit Theorem

Suppose $X_1, X_2, \dots$ are i.i.d. and $\mu = E[X_i]$ and $\sigma^2 = \mathrm{Var}(X_i)$ both exist and are finite, then
$\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1)$, where $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$
Equivalent inferences: when $n$ is large, $\bar{X}_n$ is approximately $N(\mu, \sigma^2/n)$ and $\sum_{i=1}^{n} X_i$ is approximately $N(n\mu, n\sigma^2)$.
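A CLT simulation sketch with a deliberately skewed population, $X_i \sim \mathrm{Exp}(1)$ so that $\mu = \sigma = 1$ (assumes numpy and scipy):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 200, 100_000
means = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
z = (means - 1.0) / (1.0 / np.sqrt(n))   # standardized sample means

print("P(Z <= 1.96) empirical:", (z <= 1.96).mean(),
      " normal:", round(float(stats.norm.cdf(1.96)), 4))
```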

The Delta Method

Assume $Y_1, Y_2, \dots$ are such that
$\sqrt{n}\,(Y_n - \theta) \xrightarrow{d} N(0, \sigma^2)$
where $\sigma > 0$ and $\theta$ is a constant. Then, assuming $g$ is a function that is differentiable at $\theta$ and $g'(\theta) \neq 0$,
$\sqrt{n}\,\big(g(Y_n) - g(\theta)\big) \xrightarrow{d} N\big(0, \sigma^2 [g'(\theta)]^2\big)$
This is often applied with $Y_n = \bar{X}_n$ and $\theta = \mu$.
Equivalent description:
if $Y_n$ is approximately $N(\theta, \sigma^2/n)$ and $\sigma/\sqrt{n} \to 0$ as $n \to \infty$, then $g(Y_n)$ is approximately $N\big(g(\theta), [g'(\theta)]^2 \sigma^2/n\big)$,
assuming that $g$ is a function that is differentiable at $\theta$ and $g'(\theta) \neq 0$.
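A simulation sketch of the delta method with the illustrative choices $X_i \sim \mathrm{Exp}(1)$ and $g(x) = x^2$, so $g(\bar{X}_n)$ should be approximately $N(1, 4/n)$ (assumes numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 400, 200_000
g_of_means = rng.exponential(1.0, size=(reps, n)).mean(axis=1) ** 2

print("mean of g(Xbar):", g_of_means.mean(), " (delta method: 1)")
print("var  of g(Xbar):", g_of_means.var(), f" (delta method: {4 / n})")
```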

Part 6: Classic Results

Chi-squared distribution
The chi-squared distribution with $k$ degrees of freedom is a special case of the Gamma distribution: $\chi^2_k = \mathrm{Gamma}(k/2, 1/2)$. If $Z_1, \dots, Z_k$ are i.i.d. $N(0, 1)$, then $Z_1^2 + \dots + Z_k^2$ follows such a distribution, with $E[\chi^2_k] = k$ and $\mathrm{Var}(\chi^2_k) = 2k$.
t-distribution
The t-distribution with $k$ degrees of freedom is defined as
$T = \frac{Z}{\sqrt{W/k}}$
where $Z \sim N(0, 1)$, $W \sim \chi^2_k$ is chi-squared with $k$ degrees of freedom, and $Z, W$ are independent.
The t-distribution has a bell-shaped density, but has heavier tails than the normal distribution. As $k$ increases, the distribution converges to $N(0, 1)$.
When $k = 1$, the distribution is the Cauchy distribution, which has very heavy tails. Its mean (expectation) does not exist.
  • The CLT does not work for the Cauchy distribution
  • $X_1, \dots, X_n$ i.i.d. Cauchy ⇒ $\bar{X}_n$ is also Cauchy distributed, no matter how large $n$ is (see the demo below)
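A quick demo of the non-convergence (assumes numpy): running means of Cauchy draws keep jumping around instead of settling at a limit.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_cauchy(10**6)
running_mean = np.cumsum(x) / np.arange(1, len(x) + 1)

for n in (10**2, 10**4, 10**6):
    print(f"mean of first {n:>8} draws: {running_mean[n - 1]:+.3f}")
```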
Classic Results
Assuming $X_1, \dots, X_n$ are i.i.d. $N(\mu, \sigma^2)$, with $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$, then
  • $\bar{X}$ and $S^2$ are independent
  • The quantity $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
  • The quantity $\frac{(n-1) S^2}{\sigma^2} \sim \chi^2_{n-1}$
  • The quantity $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$
Note that
$E[S^2] = \frac{1}{n-1} E\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right] = \sigma^2$,
so $S^2$ is an unbiased estimator of $\sigma^2$.
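A simulation check of the unbiasedness and of the independence of $\bar{X}$ and $S^2$ (assumes numpy; parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 10, 200_000
x = rng.normal(mu, sigma, size=(reps, n))

xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)                # divides by n - 1
print("E[S^2] ~", s2.mean(), " (sigma^2 =", sigma**2, ")")
print("corr(Xbar, S^2) ~", np.corrcoef(xbar, s2)[0, 1])  # ~ 0 (independence)
```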