 
Proof
\[
\operatorname{Var}\big(\tilde{A}(i)\big)
= \operatorname{Var}\Big(\frac{1}{i}\sum_{j=1}^{i}\tilde{X}(j)\Big)
= \frac{1}{i^{2}}\sum_{j=1}^{i}\operatorname{Var}\big(\tilde{X}(j)\big)
= \frac{\sigma^{2}}{i}
\]
Proof
\[
\lim_{i\to\infty} \operatorname{E}\Big[\big(\tilde{A}(i)-\mu\big)^{2}\Big]
= \lim_{i\to\infty} \operatorname{E}\Big[\big(\tilde{A}(i)-\operatorname{E}\big(\tilde{A}(i)\big)\big)^{2}\Big]
= \lim_{i\to\infty} \operatorname{Var}\big(\tilde{A}(i)\big)
= \lim_{i\to\infty} \frac{\sigma^{2}}{i}
= 0
\]
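The identity $\operatorname{Var}(\tilde{A}(i)) = \sigma^2 / i$ used in the proof can be checked empirically. The following is a minimal sketch, assuming iid Gaussian samples with mean 0 and standard deviation `sigma`; the function name `average_variance` and its parameters are illustrative choices, not part of the slides.

```python
import random
import statistics


def average_variance(i, sigma=1.0, trials=5000, seed=0):
    """Empirical variance of A(i), the average of i iid Gaussian samples.

    Assumption: samples are Gaussian with mean 0 and standard deviation sigma.
    """
    rng = random.Random(seed)
    averages = [
        sum(rng.gauss(0.0, sigma) for _ in range(i)) / i
        for _ in range(trials)
    ]
    # Population variance of the simulated averages.
    return statistics.pvariance(averages)


if __name__ == "__main__":
    for i in (1, 10, 100):
        # The second column should be close to the third (sigma^2 / i).
        print(i, average_variance(i), 1.0 / i)
```

The empirical variance should shrink roughly in proportion to 1/i, which is what drives the mean-square limit to 0.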
Strong law of large numbers
Let $\tilde{X}$ be an iid discrete random process with mean $\mu_{\tilde{X}} := \mu$.
The average $\tilde{A}$ of $\tilde{X}$ converges with probability one to $\mu$.
Our Bernoulli Experiment: Look at averages
[Figure: averages of the Bernoulli sequences compared with $\mu$, marked at i = 1000 and i = 2000.]
[Figure: iid standard Gaussian. Moving average vs. mean of the iid sequence, for i up to 50, 500 and 5000.]
[Figure: iid geometric with p = 0.4. Moving average vs. mean of the iid sequence, for i up to 50, 500 and 5000.]
[Figure: iid Cauchy. Moving average vs. median of the iid sequence, for i up to 50, 500 and 5000.]
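The contrast between the Gaussian and Cauchy cases can be reproduced with a short simulation. This is a sketch under stated assumptions: the helper names (`running_average`, `sample_cauchy`, `simulate`) are hypothetical, and the Cauchy samples are drawn by inverse-transform sampling from a standard Cauchy distribution.

```python
import math
import random


def running_average(samples):
    """Return the list A(1), A(2), ..., where A(i) is the mean of the first i samples."""
    averages, total = [], 0.0
    for i, x in enumerate(samples, start=1):
        total += x
        averages.append(total / i)
    return averages


def sample_gaussian(rng):
    """One standard Gaussian sample."""
    return rng.gauss(0.0, 1.0)


def sample_cauchy(rng):
    """One standard Cauchy sample via the inverse cdf: tan(pi * (u - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))


def simulate(sampler, n, seed=0):
    """Running average of n iid draws from the given sampler."""
    rng = random.Random(seed)
    return running_average([sampler(rng) for _ in range(n)])


if __name__ == "__main__":
    gaussian = simulate(sample_gaussian, 5000)
    cauchy = simulate(sample_cauchy, 5000)
    for i in (50, 500, 5000):
        print(i, gaussian[i - 1], cauchy[i - 1])
```

The Gaussian running average settles near 0, while the Cauchy one keeps jumping: the Cauchy distribution has no well-defined mean, so the law of large numbers does not apply to it.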
Strong law of large numbers
Why do we care about the convergence of averages?
It is one of the most fundamental tools a statistician or data scientist has access to.
The SLLN says that as we acquire more iid data, the average converges to the true mean with probability one.
It justifies the convergence of pointwise estimators (coming soon).
Question to think about during break
1. Suppose $\tilde{X}(1), \ldots$ are iid with $\operatorname{E}[\tilde{X}(i)] = \mu$ and $\operatorname{E}[\tilde{X}(i)^2] = \eta$. Construct two sequences of random variables from the $\tilde{X}(i)$ that converge to $\eta$ and $\mu^2$, respectively, with probability one.
Solution.
\[
\frac{1}{n}\sum_{i=1}^{n}\tilde{X}(i)^{2} \to \eta
\qquad\text{and}\qquad
\Big(\frac{1}{n}\sum_{i=1}^{n}\tilde{X}(i)\Big)^{2} \to \mu^{2}.
\]
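The two sequences in the solution can be tried out numerically. The sketch below assumes Gaussian samples with mean 2 and standard deviation 1 (so that $\mu^2 = 4$ and $\eta = \sigma^2 + \mu^2 = 5$); the function name `break_question_estimates` is hypothetical.

```python
import random


def break_question_estimates(samples):
    """Return (average of squares, square of average) for a list of samples.

    By the strong law, the first converges to eta = E[X^2] and the second
    to mu^2 = (E[X])^2 when the samples are iid with finite second moment.
    """
    n = len(samples)
    second_moment = sum(x * x for x in samples) / n
    squared_mean = (sum(samples) / n) ** 2
    return second_moment, squared_mean


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed example distribution: Gaussian with mu = 2, sigma = 1.
    samples = [rng.gauss(2.0, 1.0) for _ in range(50000)]
    print(break_question_estimates(samples))  # expected to be near (5, 4)
```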
Types of convergence Law of Large Numbers Central Limit Theorem Monte Carlo simulation
Central Limit Theorem
Let $\tilde{X}$ be an iid discrete random process with mean $\mu_{\tilde{X}} := \mu$ and finite variance $\sigma^2$.
$\sqrt{i}\,\big(\tilde{A}(i) - \mu\big)$ converges in distribution to a Gaussian random variable with mean 0 and variance $\sigma^2$.
For large $i$, the average $\tilde{A}(i)$ is approximately Gaussian with mean $\mu$ and variance $\sigma^2 / i$.
Height data
Example: data from a population of 25 000 people.
We compare the histogram of the heights with the pdf of a Gaussian random variable fitted to the data.
[Figure: histogram of the real data overlaid with the fitted Gaussian distribution; x-axis: height (inches), from about 60 to 76.]
Sketch of proof
The pdf of the sum of two independent random variables is the convolution of their pdfs:
\[
f_{X+Y}(z) = \int_{y=-\infty}^{\infty} f_X(z-y)\, f_Y(y)\, \mathrm{d}y
\]
Repeated convolutions of any pdf with finite variance result, after centering and scaling, in a Gaussian!
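The effect of repeated convolution can be illustrated with a discrete analogue. The sketch below convolves the pmf of a fair six-sided die with itself (a pmf rather than a pdf, which is a simplifying assumption) and measures how far the result is from a Gaussian density with the same mean and variance. The function names are hypothetical.

```python
import math


def convolve(p, q):
    """Discrete convolution: pmf of the sum of two independent variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for a, pa in enumerate(p):
        for b, qb in enumerate(q):
            out[a + b] += pa * qb
    return out


def pmf_of_sum(pmf, k):
    """pmf of the sum of k iid variables, obtained by k - 1 convolutions."""
    result = pmf
    for _ in range(k - 1):
        result = convolve(result, pmf)
    return result


def gaussian_gap(pmf, k):
    """Largest pointwise gap between the pmf of the k-fold sum and the
    Gaussian density with matching mean and variance."""
    mean = sum(x * p for x, p in enumerate(pmf))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
    m, v = k * mean, k * var
    total = pmf_of_sum(pmf, k)
    return max(
        abs(p - math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v))
        for x, p in enumerate(total)
    )


if __name__ == "__main__":
    die = [1 / 6] * 6  # faces relabelled 0..5; only the shape matters here
    for k in (1, 2, 5, 30):
        print(k, gaussian_gap(die, k))
```

The gap shrinks as the number of convolutions grows, mirroring the panels i = 1 to 5 on the slides.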
Repeated convolutions
[Figure: repeated convolutions of a pdf with itself, panels i = 1, 2, 3, 4, 5.]
[Figure: iid exponential with λ = 2. Distribution of the average for i = 10^2, 10^3 and 10^4.]
[Figure: iid geometric with p = 0.4. Distribution of the average for i = 10^2, 10^3 and 10^4.]
[Figure: iid Cauchy. Distribution of the average for i = 10^2, 10^3 and 10^4.]
Gaussian approximation to the binomial
X is binomial with parameters n and p.
Computing the probability that X is in a certain interval requires summing its pmf over the interval.
The central limit theorem provides a quick approximation:
\[
X = \sum_{i=1}^{n} B_i, \qquad \operatorname{E}(B_i) = p, \qquad \operatorname{Var}(B_i) = p(1-p)
\]
$\frac{1}{n}X$ is approximately Gaussian with mean $p$ and variance $p(1-p)/n$.
$X$ is approximately Gaussian with mean $np$ and variance $np(1-p)$.
Gaussian approximation to the binomial
A basketball player makes each shot with probability p = 0.4 (shots are iid).
What is the probability that she makes at least 420 shots out of n = 1000?
Exact answer:
\[
P(X \ge 420) = \sum_{x=420}^{1000} p_X(x)
= \sum_{x=420}^{1000} \binom{1000}{x}\, 0.4^{x}\, 0.6^{\,n-x}
= 10.4 \cdot 10^{-2}
\]
Approximation ($U$ is standard Gaussian):
\[
P(X \ge 420) \approx P\Big(\sqrt{np(1-p)}\,U + np \ge 420\Big)
= P(U \ge 1.29)
= 1 - \Phi(1.29)
= 9.85 \cdot 10^{-2}
\]
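Both numbers in the basketball example can be recomputed with the standard library. This is a minimal sketch: the function names `binomial_tail` and `gaussian_tail_approx` are hypothetical, the exact sum is evaluated in log space to avoid overflow, and the Gaussian tail uses the identity $1 - \Phi(z) = \tfrac{1}{2}\operatorname{erfc}(z/\sqrt{2})$.

```python
import math


def binomial_tail(n, p, k):
    """Exact P(X >= k) for X binomial with parameters n and p."""
    total = 0.0
    for x in range(k, n + 1):
        # log of C(n, x) * p^x * (1 - p)^(n - x), via log-gamma.
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
            + x * math.log(p) + (n - x) * math.log(1.0 - p)
        )
        total += math.exp(log_pmf)
    return total


def gaussian_tail_approx(n, p, k):
    """CLT approximation of P(X >= k): Gaussian with mean np, variance np(1-p)."""
    z = (k - n * p) / math.sqrt(n * p * (1.0 - p))
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # equals 1 - Phi(z)


if __name__ == "__main__":
    print(binomial_tail(1000, 0.4, 420))         # exact tail probability
    print(gaussian_tail_approx(1000, 0.4, 420))  # Gaussian approximation
```

The approximation is computed without a continuity correction, matching the calculation on the slide.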
CLT: Things to think about
1. The CLT allows us to model many phenomena using Gaussian distributions.
2. It supports the general intuition that an average of random variables with finite variance concentrates tightly around the mean, since the Gaussian distribution has very thin tails (i.e., its pdf decays very quickly).
CLT vs Chebyshev: 5000 Flip Sequences
[Figure: 5000 flip sequences with bands at $\mu i \pm k\,\sigma\sqrt{i}$ for $k = 1, \ldots, 4$, marked at i = 50 and i = 100.]
Chebyshev says $\Pr(|X - \mu| > 3\sigma) \le \frac{1}{9}$.
The CLT approximation says $\Pr(|X - \mu| > 3\sigma) \approx \frac{3}{1000}$.
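The two tail figures can be compared for any number of standard deviations k. The sketch below uses hypothetical function names; the Gaussian value relies on $\Pr(|U| > k) = \operatorname{erfc}(k/\sqrt{2})$ for a standard Gaussian $U$.

```python
import math


def chebyshev_bound(k):
    """Chebyshev upper bound on Pr(|X - mu| > k * sigma), valid for any
    distribution with finite variance."""
    return 1.0 / k ** 2


def gaussian_two_sided_tail(k):
    """Pr(|U| > k) for a standard Gaussian U, i.e. 2 * (1 - Phi(k))."""
    return math.erfc(k / math.sqrt(2.0))


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(k, chebyshev_bound(k), gaussian_two_sided_tail(k))
```

Chebyshev is a guaranteed bound, whereas the Gaussian value is only an approximation justified by the CLT; the gap between them widens quickly as k grows.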
Monte Carlo simulation
Simulation is a powerful tool in probability and statistics.
Models are often too complex to derive closed-form solutions (life is not a homework problem!).
Example: Game of solitaire
Game of solitaire
Aim: Compute the probability that you win at solitaire.
If every permutation of the cards has the same probability,
\[
P(\text{Win}) = \frac{\text{Number of permutations that lead to a win}}{\text{Total number of permutations}}
\]
Problem: Characterizing permutations that lead to a win is very difficult without playing out the game.
We can't just check them all because there are $52! \approx 8 \cdot 10^{67}$ permutations!
Solution: Sample many permutations and compute the fraction of wins.
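The sampling idea can be sketched as follows. The slides do not specify the solitaire rules, so the win rule here is a hypothetical stand-in (`toy_is_win`: no card ends up in its original position), chosen only because its true probability is known to be close to $1/e$ and can be used as a sanity check. The estimator `estimate_win_probability` is likewise an illustrative name; it accepts any win predicate.

```python
import math
import random


def estimate_win_probability(is_win, n_cards=52, n_samples=20000, seed=0):
    """Monte Carlo estimate: fraction of uniformly sampled permutations
    for which is_win(deck) returns True."""
    rng = random.Random(seed)
    deck = list(range(n_cards))
    wins = 0
    for _ in range(n_samples):
        rng.shuffle(deck)  # draws a uniformly random permutation
        if is_win(deck):
            wins += 1
    return wins / n_samples


def toy_is_win(deck):
    """Hypothetical stand-in for playing out a game of solitaire:
    win if no card sits in its original position."""
    return all(card != position for position, card in enumerate(deck))


if __name__ == "__main__":
    estimate = estimate_win_probability(toy_is_win)
    print(estimate, 1.0 / math.e)  # the two values should be close
```

For real solitaire, `toy_is_win` would be replaced by a function that plays out the game on the sampled deck. By the law of large numbers, the fraction of wins converges to P(Win) as the number of samples grows.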