๐ŸŸข

GrB: Probability Theory

1. Formula and Basis

1.2 Combinatorial Analysis

Basic principle of counting
Let be a set of length- sequence. If there are
  • possible first entries
  • possible second entries for each first entry
  • possible third entries for each combination of first and second entries, etc.
Then there are a total of possible outcomes.
Permutation
A rearrangement of objects into distinct sequence (i.e., order matters). Property: There are
different permutations of objects, of which are alike, are alike, โ€ฆ, are alike.
Combination
An unordered collection of objects (i.e., order doesnโ€™t matter). Property: There are
different combinations of distinct objects taken at a time.
Binomial theorem
Inclusion-Exclusion Principle
more generally,
where has terms.

1.3 Conditional Probability and Bayesโ€™ formula

Conditional Probability
Multiplication Rule
Law of Total Probability
for any mutually exclusive events , whose union is the entire sample space (), we have
Independent Events
Bayesโ€™ Formula
for any mutually exclusive events , whose union is the entire sample space (), we have

1.4 Discrete and Continuous Distributions

Common function of random variables
  • Cumulative distribution function (pdf)
    • Discrete:
    • Continuous:
  • Probability mass function (pmf) / Probability density function (pdf)
    • Discrete โ‡’ pmf:
    • Continuous โ‡’ pdf:
  • Expected value,
    • Discrete:
    • Continuous:
  • Expected value of ,
    • Discrete:
    • Continuous:
  • Variance of ,
    • Discrete & Continuous:
  • Standard deviation of ,
    • Discrete & Continuous:
Discrete random variables
  • Discrete uniform
    • intuition: Discrete uniform random variable represents the occurance of a value between number and when all values in the set have equal probability.
    • pmf:
  • Binomial
    • intuition: Binomial random variable represents the number of successes in a sequence of experiments when each trial is independently a success with probability .
    • pmf:
  • Poisson
    • intuition: Poisson random variable represents the number of events occuring in a fixed period of time with the expected number of occurrences when events occur with a known average rate and are independent of the time since the last event.
    • pmf:
  • Geometric
    • intuition: Geometric random variable represents the trial number to get the first success when each trial is independently a success with probability .
    • pmf:
  • Negative Binomial
    • intuition: Negative Binomial random variable represents the trial number to get to the r-th success when each trial is independently a success with probability
    • pmf:
Continuous random variables
  • Uniform
    • intuition: Uniform distribution describes a random variable uniformly distributed over the interval
    • pdf:
  • Normal
    • intuition: most popular continuous distribution
    • pdf:
  • Exponential
    • intuition: Exponential distribution models the arrival time of an event if it has a constant arrival rate .
    • pdf:
  • Gamma
    • intuition: Gamma distribution arising in practice, often represents the amount of time one has to wait until a total of events occur.
    • pdf:
  • Beta
    • intuition: Beta distribution models events that are constrained within a defined interval.
    • pdf:
Moment Generating Functions
Moment generating functions are defined as
can be used to calculate higher moments of distributions since
for normal distribution
thus

1.5 Expected Value, Variance, Covariance

If is finite for all , then . This equation holds whether s are independent of each other or not.
Covariance & Correlation
If are independent, . But latter conditions cannot derive that two random variables are independent, can only derive they are not linear correlated instead. The definition of indicate that (whether are independent or not).
General rules of variance and covariance
Conditional expectation and variance
  • Discrete distribution:
  • Continuous distribution:
Law of total expectation

1.6 Order Statistics

Let be a random variable with cdf . The distribution function of and , where are IID following , can be derived as below

2. Questions and Answers

Q for 1.2

Screwy pirates 2
โ”
A team of 11 pirates save their loot into a safe and ask locksmith to put a certain number of locks on it. The requirement is that, only a majority โ€” any majority โ€” of them (โ‰ฅ 6) together can open the safe. Each lock can have multiple keys; but each key only opens one lock. Each pirate can carry multiple keys. What is the smallest number of locks needed? How many keys must each pirate carry?
Solution: highlighted phrase here is โ€œonly a majority โ€” any majorityโ€. It means any combination of 5 pirates cannot open the safe but with any extra rest pirate, they can. So there should be at least one decisive key which these 5 pirates do not have, yet the rest pirates have. Then the minimal locks needed is . The keys needed is . And each pirate will carry keys.
Karting tournament
โ”
There are go-kart racers with skills . It is organized as a knockout tournament ๐Ÿ, which means after each round only the winner proceeds to the next round. Except for the final, opponents in each round are drawn at random. Assume when two players meet in a game, the player with better skills always wins. Whatโ€™s the probability that players 1 and 2 will meet in the final?
Solution:
Using conditional probability method: the probability 1 and 2 meet in round 1 is , prob of otherwise is . Condition on 1 and 2 do not meet in round1, the prob they do not meet in round 2 is . So the probability that 1 and 2 do not meet until the final is
Using counting method:
notion image
ย 
Clearly, to make 1 and 2 meet in the final, 2 has to choose a position among leaves of the right tree. Total positions available to choose is while the one on the right tree is . Thus, the probability is .
โ”
Following above question, what is the probability that players 1, 2, 3, 4 are the ones survive until the semi-final?
notion image
Solution: Similarly as above question, the probability is
Garbage Classification
โ”
A foreigner just arrives in Shanghai, where garbage classification ๐Ÿšฎ is mandatory. He does not know such rule and has four pieces of garbage in hand happening to be one of each type (hazardous waste, recyclables, wet waste, dry waste). In front of four dustbins, he decides to take full advantage of them via throwing only one piece of garbage into one bin. What is the probability that all these garbage are thrown to the wrong bins?
Solution: A classic question of general Inclusion-Exclusion Principle. Denote as the event that -th garbage is thrown into the right dustbin. Then the probability required is .
where ,
Similarly, we have and . Then
The probability to the question is thus .
Birthday problem
โ”
How many people do we need in a class to make the probability that two people have the same birthday more than ? (A year has days.)
Solution: Think about the complement to this question, assume there are people, then
the smallest satisfying above inequity is 23.
64th digit
โ”
What is the 64th digit to the right of the decimal point in the decimal representation of ?
Solution: A classic question of Binomial theorem. Consider the sum of
when is odd, corresponding term in above one and term is below one are exactly the opposite. Therefore, we know that for any , is an integer. Since . Thus the 64-th digit to the right of the decimal point in the decimal representation of must be 9.

Q for 1.3

Paramecium Proliferation
โ”
There is one Paramecium ๐Ÿฆ  in a pond. After every quarter of hour, it may die, stay the same or split into two with . All its offspring, if any, will behave the same (and independent of other paramecia). What is the probability the paramecium population finally die out?
Solution: Suppose the probability for a paramecia to die out is , then
since , .
Danish Cookie
โ”
In a box of cookies, there are 5 triangle cookies, 6 square cookies and 10 round cookies. You randomly take one out of the box and eat it. What is the probability that when you eat the last triangle cookie, there still left at least one square cookie and at least one round cookie in the box?
Solution: imagine instead of eating the cookies you take, you put them in sequence on the table. Suppose the last cookie of ๐Ÿ”บโฌœโšช position as in the sequence. We should ensure .
Suppose . For to be last, we only need to ensure the last position of ๐Ÿ”บโฌœโšช sequence is โšช, probability of which is .
Based on this, requires the last position of ๐Ÿ”บโฌœ sequence is โฌœ, probability of which is . Thus . Similarly, . Then
Coin toss series pattern, origin matters
โ”
You and I are tossing a fair coin in sequence. Whenever we obtain HT (head followed by tail ), the game is over and who tosses tail wins the game. What is the prob you win when you toss first (origin) and what is the prob you win when I toss first?
Solution: the first one toss is A, the second one toss is B. Denote the probability A eventually wins by , obviously . Besides,
where is the probability A wins conditional on he tosses H in the first time while is the other case.
  • If A tosses T in the first time, then whatever B tosses, neither A nor B can win. Actually, B here is the same as A in its first toss, that is, conditional on A has tossed T in the first time, the probability B wins eventually () is equal to . Then
  • If A tosses H in the first time, then with prob of 0.5, B tosses T and A loses; with prob of 0.5, B tosses H then himself have to win. Thus
Thus .
Thus the prob you win when you toss first is while the prob you win when I toss first is .
โ”
You and I are tossing a fair coin in sequence. Whenever we obtain HHT, the game is over and who toss tail win the game. What is the prob you win when you toss first and what is the prob you win when I toss first?
Solution:
where
thus .
Therefore, whether you toss first or I toss first, the prob you win is equal.
Sanity Check: for the first question, A can never win in his first toss while B can. So intuitively, this should be an unfair game for A; for the second question, neither A nor B could win in first toss.
Russian roulette series
โ”
A traditional version of Russian roulette: A single bullet is put into a 6-chamber revolver. The barrel randomly spun so that each chamber is equally likely to be under the hammer. A and B take turns to pull the trigger, pointing at their own heads, without further spinning until the gun goes off and one guy is killed. Assume A has the first trial, what is the prob A dies and what is the prob B dies?
Solution: Chamber series: A B A B A B, if the bullet falls into a A chamber, A dies, otherwise, B dies. So the prob each is .
โ”
Version 2: After one guy pull the trigger (and if he survives), the other person will spin the barrel randomly again before pull the trigger next. Assume A has the first trial, what is the prob A dies and what is the prob B dies.
Solution: now each guy has the prob of shooting himself in each round. Notations: (not shoot), (shoot). Assume the prob A eventually wins is , then
thus . Therefore, the prob A dies is and the prob B dies is . Obviouly, in this kind of circumstance, when adding more bullets, the analysis process is the same.
โ”
Version 3: There are two bullets in the chambers. You are B now. After A pull the trigger for the first time, he survives. You can either spin the barrel or not before you take your turn. Should you spin it?
Solution: if do not spin, bullet positions are not changed, located somewhere in the rest 5 chambers. So the prob B dies is ; if do spin, bullet positions are shuffled again, located somewhere in 6 chambers. So the prob B dies is . Thus you should spin the barrel before pull the trigger.
โ”
Version 4: following version 3, what if the two bullets are in two consecutive positions?
Solution: Still, if spin, the prob B dies is ; if do not spin only if the position A pull is ABABAB, B will die, the prob is . So you should not spin the barrel.
Queen Cards
โ”
52 cards are randomly equally distributed to 4 players (each one has 13 cards). What is the probability that each one will have a QueenCard ๐ŸŽต?
Solution:
Using counting method: the permutations to distribute 52 cards to 4 players with 13 cards each is (imagine queue 52 cards first, then set boundaries to cut them into 4 parts; the denominator 13! is to eliminate inner order of each pile). As for queen cards, first assign each player one of them (4! ways), then distribute the rest 48 cards randomly which has permutations. Thus the prob is
Using conditional probability method: total 52 positions for 4 queen card to be located:
the first queen card has 52 choices out of 52 positions; the second one has 52 - 13 = 39 choices out of 51 positions โ€ฆ; Thus the prob is .
Gamblerโ€™s ruin problem
โ”
A gambler starts with an initial wealth of coins. On each successive game, the gambler wins 1 coin with probability (), or losses 1 coint with . He will stop if he either accumulates coins or loses all his money. What is the probability that he will end up with coins.
Solution: A very classic and basic problem in both textbooks and quant interviews. One inspiring point here is, though, random variable is neither random nor a variable, but a function.
Denote the probability when the gambler has coins and finally end up with coins by a random variable . Then
Besides, . Then
Extending this expression to , we have
thus
Dart Scores
โ”
Richard is throwing darts for 123 times. He scores one point if the dart hits the board and zero point if not. It is already known that he hit the board for the first time and did not hit the board for the second time. Besides, the probability he hits the board following the based on the scores he accumulate in previous rounds, say, scores out of times, then the probability he hits the board in the time is . What is the probability for him to score exactly 17 after all these 123 throws?
Solution: Denote as the probability Richard obtains scores out of rounds. Then . Start with , we have
Note that are โ€œall eventsโ€, from
We can sumrise that . Prove this via induction step, we need to prove that .
Thus . Thus the required probability is .
Time Inverval Segmentation
โ”
A group of students are heading for a lecture starting at 13:30 pm. If the probability that at least one student arrives within 13:00 pm ~ 13:20 pm is , then what is the probability that at least one student arrives within 13:15 pm~13:20 pm? Assume that the probability of every student arriving at any moment is uniform (constant) during this 20 minutes.
Solution: Separate this 20 minutes into four 5-minute intervals. The probability of at least one student arriving during each interval is denoted by . Though intuively but truly, we have . Thus, .

Q for 1.5

Connecting Ropes
โ”
You have 233 ropes in a blindfolded bag. Each time, you choose two end in the bag and connect them. Keep doing so until there is no end in the bag. What is the expected number of circles in the bag when you stop?
Solution: denote as the left number of circles in the end when we have ropes in the beginning. Since there are ends in total, when we happen to choose two ends of the same rope, there left ropes and an extra circle. The ability of this situation is . If we do not choose two ends of certain rope, there left ropes without extra circle. Thus,
Since , we have
Dice Game
โ”
You are rolling a dice. For each roll, you are paid the face value. If a roll gives odd numbers (1, 3, 5), you can roll the dice again. Once you get even numbers (2, 4, 6), the game stops. What is the expected payoff of this game?
Solution: Assume the answer is . Consider the first rolling, the face can be 1~6 all with probability of . The expected payoff of odd numbersโ€™ situation is the face value while the expected payoff of even numbers are face value plus (rolling again). Thus
Queen and Joker
โ”
Turn over 54 cards (a regular 52-card deck plus two jokers) one by one until seeing the first queen card or joker card (i.e. one of the 4+2 = 6 cards). What is the expected number of cards turned over?
Solution: there are 6 special cards and 48 ordinary cards here. Define random variable of to denote those ordinary cards, where
The total number of cards turned over is thenz
since each ordinary card can randomly locate in space separated by those special cards
then . Therefore
Blind Box Collection
โ”
There are distinct types of new toys in blind box and each type, independent of prior selections, is equally likely to be in a box. If you want to collect a complete set of these toys with at least one of each type, what is the expected number of boxes you need to buy?
Solution: label the distinct types we (sequentially) get as , denote as the extra number you need to buy in order to obtain another -th toy when already obtain distinct types. Then the total number you need to buy to get a complete set is
For any , since types already exist, the probability for another new type in following boxes is . Thus and its expectation is . Then,
โ”
If you abandon the ambition to collect a complete set and only plan to buy boxes, what is the expected number of distinct types you will have?
Solution: this time, label these types of toys as (e.g. Spongebob Squarepants โ‡’ 1st, Patrick Star โ‡’ 2nd, Squidward โ‡’ 3rd, โ€ฆ) which is slightly different from above labels. Denote the indicator random variable as
Then
For each box, the probability it does not contain a -th type toy is . Since all boxes are independent, the probability that none of these boxes contain a -th type toy is , thus
โ”
Scene Migration: there are a total of balls and boxes. You randomly put each ball into a box. What is the expected number of empty boxes in the end?
Solution: label boxes respectively as . Denote the indicator random variable
Denote as the number of non-empty boxes, then . Same as above question
Then the expected number of empty box is
Default Correlation
โ”
If there is a 40% probability that bond A will default next year and a 30% probability that bond B will default. What is the range of their correlation? Hint: what is the range of the probability that at least one bond default next year?
Solution: Default or not is binomial. Denote as the indicator random variable of default or not. Since
Imgine a Venn diagram, we can know , thus
Note that we cannot start with the relationship of and and assume the maximum / minimum is obtained when . Because actually cannot be such values.

Q for 1.6

Expected value of max and min
โ”
Let be IID random variables with uniform distribution between 0 and 1. What are the cdf, pdf, and expected value of ? What are the cdf, pdf, and expected value of ?
Solution: following the derivation in 1.6, we have
the expected value is
for , we have
the expected value is
Correlation of max and min
โ”
Let be IID random variables with uniform distribution between 0 and 1, and . What is the probability of given that for any ? What is the correlation of and ?
Solution: the meaning of the prob subspace is shown as following
notion image
the small white square inside the big square (z, z) represents โ€œ given that for any โ€. Hence
As for the correlation of and
from above question: . Besides,
thus
as for , since , thus . Then
Acutally, , thus . Therefore,
Note that , this is actually true for all .
Mario Turtle Shell Collision
โ”
Imagine there are 64 turtles ๐Ÿข randomly put on a 100-brick-long platform. Then they get trampled at the same time, after which each turtle shell randomly moves toward one end of the platform at constant speed of 2 bricks/s until it falls off at one end of the platform. However, during the process, if two shells collide, they bounce off and travel to directions opposite to original ones. What is the expected time for all those shells to fall from the platform?
Solution: Because of symmetry after collision, we can treat the collision as two shells go through each other as they meet. Also, because of symmetry, we can treat the left end and right end of the platform the same. Fix left end as axis origin, denote as -th turtleโ€™s initial position on the platform, then the time it drops from the platform is . And the time for all those shells to fall is . The expectation is then
ย 

Loading Comments...