13. The Weak Law and the Strong Law of Large Numbers

James Bernoulli proved the weak law of large numbers (WLLN) around 1700; it was published posthumously in 1713 in his treatise Ars Conjectandi. Poisson generalized Bernoulli’s theorem around 1800, and in 1866 Tchebychev discovered the method bearing his name. Later, one of his students, Markov, observed that Tchebychev’s reasoning can be used to extend Bernoulli’s theorem to dependent random variables as well. In 1909 the French mathematician Emile Borel proved a deeper theorem, known as the strong law of large numbers, that further generalizes Bernoulli’s theorem. In 1926 Kolmogorov derived conditions that were necessary and sufficient for a set of mutually independent random variables to obey the law of large numbers.

Let $X_i$ be independent, identically distributed Bernoulli random variables such that

$$P(X_i = 1) = p, \qquad P(X_i = 0) = 1 - p = q,$$

and let $k = X_1 + X_2 + \cdots + X_n$ represent the number of “successes” in $n$ trials. Then the weak law due to Bernoulli states that [see Theorem 3-1, page 58, Text]

$$P\left\{ \left| \frac{k}{n} - p \right| > \varepsilon \right\} \le \frac{pq}{n \varepsilon^2}, \tag{13-1}$$

i.e., the ratio “total number of successes to the total number of trials” tends to $p$ in probability as $n$ increases.

A stronger version of this result due to Borel and Cantelli states that the above ratio $k/n$ tends to $p$ not only in probability, but with probability 1. This is the strong law of large numbers (SLLN).
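As a rough numerical illustration of (13-1), the sketch below computes the exact deviation probability $P\{|k/n - p| > \varepsilon\}$ from the binomial distribution and prints it next to the bound $pq/(n\varepsilon^2)$. The function names and the values $p = 0.3$, $\varepsilon = 0.05$ are assumptions chosen for illustration; they do not come from the text.

```python
from math import exp, lgamma, log


def binomial_pmf(n: int, k: int, p: float) -> float:
    """P{k successes in n trials}, evaluated in log space to avoid overflow.

    Assumes 0 < p < 1.
    """
    log_pmf = (
        lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        + k * log(p) + (n - k) * log(1.0 - p)
    )
    return exp(log_pmf)


def deviation_probability(n: int, p: float, eps: float) -> float:
    """Exact P{|k/n - p| > eps} for k ~ Binomial(n, p)."""
    return sum(
        binomial_pmf(n, k, p)
        for k in range(n + 1)
        if abs(k / n - p) > eps
    )


def weak_law_bound(n: int, p: float, eps: float) -> float:
    """Right-hand side of (13-1): pq / (n eps^2)."""
    return p * (1.0 - p) / (n * eps**2)


if __name__ == "__main__":
    p, eps = 0.3, 0.05  # illustrative values, not from the text
    for n in (100, 400, 1600):
        exact = deviation_probability(n, p, eps)
        bound = weak_law_bound(n, p, eps)
        print(f"n={n:5d}  exact={exact:.3e}  bound={bound:.3e}")
```

Both columns shrink as $n$ grows, and the exact probability stays below the bound, which is what convergence in probability requires here.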

What is the difference between the weak law and the strong law?

The strong law of large numbers states that for a suitable sequence $\{\varepsilon_n\}$ of positive numbers converging to zero,

$$\sum_{n=1}^{\infty} P\left\{ \left| \frac{k}{n} - p \right| \ge \varepsilon_n \right\} < \infty. \tag{13-2}$$

From the Borel-Cantelli lemma [see (2-69) Text], when (13-2) is satisfied the events $A_n \triangleq \left\{ \left| \frac{k}{n} - p \right| \ge \varepsilon_n \right\}$ can occur, with probability 1, only for a finite number of indices $n$ in an infinite sequence, or equivalently, the events $\left\{ \left| \frac{k}{n} - p \right| < \varepsilon_n \right\}$ occur for all but finitely many $n$, i.e., $k/n$ converges to $p$ almost surely.

Proof: To prove (13-2), we proceed as follows. Since

$$\left| \frac{k}{n} - p \right| \ge \varepsilon \;\Rightarrow\; |k - np|^4 \ge \varepsilon^4 n^4,$$

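we have (the terms with $|k/n - p| < \varepsilon$ are nonnegative and can be dropped)

$$\sum_{k=0}^{n} (k - np)^4 \, p_n(k) \ge \varepsilon^4 n^4 \, P\left\{ \left| \frac{k}{n} - p \right| \ge \varepsilon \right\},$$

and hence

$$P\left\{ \left| \frac{k}{n} - p \right| \ge \varepsilon \right\} \le \frac{\sum_{k=0}^{n} (k - np)^4 \, p_n(k)}{\varepsilon^4 n^4}, \tag{13-3}$$

where

$$p_n(k) = P\left\{ \sum_{i=1}^{n} X_i = k \right\} = \binom{n}{k} p^k q^{n-k}.$$

By direct computation

$$\sum_{k=0}^{n} (k - np)^4 \, p_n(k) = E\left\{ \left( \sum_{i=1}^{n} X_i - np \right)^4 \right\} = E\left\{ \left( \sum_{i=1}^{n} (X_i - p) \right)^4 \right\}$$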

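$$= E\left\{ \left( \sum_{i=1}^{n} Y_i \right)^4 \right\} = \sum_{i=1}^{n} \sum_{k=1}^{n} \sum_{j=1}^{n} \sum_{l=1}^{n} E(Y_i Y_k Y_j Y_l),$$

where $Y_i = X_i - p$, so that $E(Y_i) = 0$. Grouping the terms (in the cross terms the first index can coincide with $j$, $k$ or $l$, and the second index takes $n-1$ values) gives

$$\sum_{i=1}^{n} E(Y_i^4) + 4 \sum_{i=1}^{n} \sum_{j \ne i} E(Y_i^3) E(Y_j) + 3 \sum_{i=1}^{n} \sum_{j \ne i} E(Y_i^2) E(Y_j^2)$$

$$= n (p^3 + q^3) pq + 3 n (n-1) (pq)^2 \le [\, n + 3 n (n-1) \,] \, pq \le 3 n^2 pq, \tag{13-4}$$

since $p^3 + q^3 = (p+q)^3 - 3p^2 q - 3pq^2 < 1$ and $pq \le 1/4 < 1$.

Substituting (13-4) into (13-3) we obtain

$$P\left\{ \left| \frac{k}{n} - p \right| \ge \varepsilon \right\} \le \frac{3pq}{n^2 \varepsilon^4}.$$

Let $\varepsilon = 1/n^{1/8}$, so that the above bound reads

$$P\left\{ \left| \frac{k}{n} - p \right| \ge \frac{1}{n^{1/8}} \right\} \le \frac{3pq}{n^{3/2}},$$

and hence

$$\sum_{n=1}^{\infty} P\left\{ \left| \frac{k}{n} - p \right| \ge \frac{1}{n^{1/8}} \right\} \le 3pq \sum_{n=1}^{\infty} \frac{1}{n^{3/2}} \le 3pq \left( 1 + \int_{1}^{\infty} x^{-3/2} \, dx \right) = 3pq \, (1 + 2) = 9pq < \infty, \tag{13-5}$$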

thus proving the strong law by exhibiting a sequence of positive numbers $\varepsilon_n = 1/n^{1/8}$ that converges to zero and satisfies (13-2).

We return to the same question: “What is the difference between the weak law and the strong law?”

The weak law states that for every $n$ that is large enough, the ratio $\left( \sum_{i=1}^{n} X_i \right) / n = k/n$ is likely to be near $p$, with a probability that tends to 1 as $n$ increases. However, it does not say that $k/n$ is bound to stay near $p$ if the number of trials is increased. Suppose (13-1) is satisfied for a given $\varepsilon$ in a certain number of trials $n_0$. If additional trials are conducted beyond $n_0$, the weak law does not guarantee that the new $k/n$ is bound to stay near $p$ for such trials. In fact there can be events for which $k/n > p + \varepsilon$ for $n > n_0$ in some regular manner. The probability for such an event is the sum of a large number of very small probabilities, and the weak law is unable to say anything specific about the convergence of that sum.

However, the strong law states (through (13-2)) that not only do all such sums converge, but the total number of all such events

where $k/n > p + \varepsilon$ is in fact finite! This implies that the probability of the events $\left\{ \left| \frac{k}{n} - p \right| > \varepsilon \right\}$ becomes and remains small as $n$ increases, since with probability 1 only finitely many violations of the above inequality take place as $n \to \infty$.

Interestingly, it is possible to arrive at the same conclusion using a powerful bound known as Bernstein’s inequality that is based on the WLLN.

Bernstein’s inequality: Note that

$$\frac{k}{n} - p > \varepsilon \;\Rightarrow\; k > n(p + \varepsilon),$$

and for any $\lambda > 0$ this gives $e^{\lambda (k - n(p + \varepsilon))} > 1$. Thus

$$P\left\{ \frac{k}{n} - p > \varepsilon \right\} = \sum_{k > n(p+\varepsilon)}^{n} \binom{n}{k} p^k q^{n-k}$$

$$\le \sum_{k > n(p+\varepsilon)}^{n} e^{\lambda (k - n(p+\varepsilon))} \binom{n}{k} p^k q^{n-k}$$

$$\le \sum_{k=0}^{n} e^{\lambda (k - n(p+\varepsilon))} \binom{n}{k} p^k q^{n-k}.$$

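Rearranging the exponent as $\lambda (k - n(p+\varepsilon)) = -\lambda n \varepsilon + \lambda q k - \lambda p (n-k)$, this reads

$$P\left\{ \frac{k}{n} - p > \varepsilon \right\} \le e^{-\lambda n \varepsilon} \sum_{k=0}^{n} \binom{n}{k} (p e^{\lambda q})^k (q e^{-\lambda p})^{n-k} = e^{-\lambda n \varepsilon} \left( p e^{\lambda q} + q e^{-\lambda p} \right)^n. \tag{13-6}$$

Since $e^x \le x + e^{x^2}$ for any real $x$,

$$p e^{\lambda q} + q e^{-\lambda p} \le p \left( \lambda q + e^{\lambda^2 q^2} \right) + q \left( -\lambda p + e^{\lambda^2 p^2} \right) = p e^{\lambda^2 q^2} + q e^{\lambda^2 p^2} \le e^{\lambda^2}. \tag{13-7}$$

Substituting (13-7) into (13-6), we get

$$P\left\{ \frac{k}{n} - p > \varepsilon \right\} \le e^{n \lambda^2 - \lambda n \varepsilon}.$$

But $n \lambda^2 - \lambda n \varepsilon$ is minimum for $\lambda = \varepsilon / 2$, and hence

$$P\left\{ \frac{k}{n} - p > \varepsilon \right\} \le e^{-n \varepsilon^2 / 4}, \qquad \varepsilon > 0. \tag{13-8}$$

Similarly

$$P\left\{ \frac{k}{n} - p < -\varepsilon \right\} \le e^{-n \varepsilon^2 / 4}$$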

and hence we obtain Bernstein’s inequality

$$P\left\{ \left| \frac{k}{n} - p \right| > \varepsilon \right\} \le 2 e^{-n \varepsilon^2 / 4}. \tag{13-9}$$

Bernstein’s inequality is more powerful than Tchebychev’s inequality, as it states that the chance of the relative frequency $k/n$ deviating from its probability $p$ by more than $\varepsilon$ tends to zero exponentially fast as $n \to \infty$.

Tchebychev’s inequality bounds the probability of $k/n$ lying between $p - \varepsilon$ and $p + \varepsilon$ for a specific $n$. We can use Bernstein’s inequality to estimate the probability for $k/n$ to lie between $p - \varepsilon$ and $p + \varepsilon$ for all large $n$. Towards this, let

$$y_n = \left\{ p - \varepsilon \le \frac{k}{n} \le p + \varepsilon \right\}$$

so that

$$P(y_n^c) = P\left\{ \left| \frac{k}{n} - p \right| > \varepsilon \right\} \le 2 e^{-n \varepsilon^2 / 4}.$$

To compute the probability of the event $\bigcap_{n=m}^{\infty} y_n$, note that its complement is given by

$$\left( \bigcap_{n=m}^{\infty} y_n \right)^c = \bigcup_{n=m}^{\infty} y_n^c,$$

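and using Eq. (2-68) Text,

$$P\left( \bigcup_{n=m}^{\infty} y_n^c \right) \le \sum_{n=m}^{\infty} P(y_n^c) \le \sum_{n=m}^{\infty} 2 e^{-n \varepsilon^2 / 4} = \frac{2 e^{-m \varepsilon^2 / 4}}{1 - e^{-\varepsilon^2 / 4}}.$$

This gives

$$P\left( \bigcap_{n=m}^{\infty} y_n \right) = 1 - P\left( \bigcup_{n=m}^{\infty} y_n^c \right) \ge 1 - \frac{2 e^{-m \varepsilon^2 / 4}}{1 - e^{-\varepsilon^2 / 4}} \to 1 \quad \text{as } m \to \infty,$$

or,

$$P\left\{ p - \varepsilon \le \frac{k}{n} \le p + \varepsilon, \ \text{for all } n \ge m \right\} \to 1 \quad \text{as } m \to \infty.$$

Thus $k/n$ is bound to stay near $p$ for all large enough $n$, in probability, a conclusion already reached by the SLLN.

Discussion: Let $\varepsilon = 0.1$. Thus if we toss a fair coin 1,000 times, from the weak law

$$P\left\{ \left| \frac{k}{n} - \frac{1}{2} \right| \ge 0.1 \right\} \le \frac{1}{40}.$$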
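The bounds used above can be evaluated side by side for the fair-coin case. The sketch below is only illustrative: the function names and the choice of $n$ and $m$ values are assumptions, and the formulas are taken directly from (13-1), (13-9) and the geometric-series bound on $P\left( \bigcup_{n \ge m} y_n^c \right)$.

```python
from math import exp


def weak_law_bound(n: int, p: float, eps: float) -> float:
    """Bound (13-1): pq / (n eps^2)."""
    return p * (1.0 - p) / (n * eps**2)


def bernstein_bound(n: int, eps: float) -> float:
    """Two-sided Bernstein bound (13-9): 2 exp(-n eps^2 / 4)."""
    return 2.0 * exp(-n * eps**2 / 4.0)


def all_large_n_bound(m: int, eps: float) -> float:
    """Bound on P{|k/n - p| > eps for some n >= m}.

    This is the geometric series of (13-9) summed over n = m, m+1, ...
    Values above 1 carry no information.
    """
    r = exp(-eps**2 / 4.0)
    return 2.0 * r**m / (1.0 - r)


if __name__ == "__main__":
    p, eps = 0.5, 0.1  # fair coin, as in the discussion
    for n in (1_000, 5_000, 10_000):  # illustrative trial counts
        print(
            f"n={n:6d}  (13-1)={weak_law_bound(n, p, eps):.3e}  "
            f"(13-9)={bernstein_bound(n, eps):.3e}  "
            f"all n>=m, m=n: {all_large_n_bound(n, eps):.3e}"
        )
```

With these inputs, (13-1) gives 1/40 at $n = 1{,}000$ while (13-9) gives $2e^{-2.5} \approx 0.16$, so the exponential bound is the looser of the two at that sample size. It becomes far smaller by $n = 5{,}000$, and the bound for all $n \ge m$ only drops below 1 once $m$ is in the thousands.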
