18.175: Lecture 11
Independent sums and large deviations
Scott Sheffield, MIT

Outline
- Recollections
- Large deviations

Recall Borel-Cantelli lemmas
- First Borel-Cantelli lemma: If $\sum_{n=1}^{\infty} P(A_n) < \infty$ then $P(A_n \text{ i.o.}) = 0$.
- Second Borel-Cantelli lemma: If the $A_n$ are independent, then $\sum_{n=1}^{\infty} P(A_n) = \infty$ implies $P(A_n \text{ i.o.}) = 1$.
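The first lemma has a one-line justification that may be worth keeping in mind. The sketch below is the standard counting argument, not something stated on the slide; the symbol $N$ (the number of events $A_n$ that occur) is introduced here only for illustration.

```latex
% N counts how many of the events A_n occur; {A_n i.o.} is the event {N = \infty}.
N = \sum_{n=1}^{\infty} 1_{A_n}, \qquad
E[N] = \sum_{n=1}^{\infty} P(A_n) < \infty
\;\Longrightarrow\; P(N < \infty) = 1
\;\Longleftrightarrow\; P(A_n \text{ i.o.}) = 0 .
```

The exchange of expectation and infinite sum is justified by monotone convergence, since the indicators are nonnegative.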

Kolmogorov zero-one law
- Consider a sequence of random variables $X_n$ on some probability space. Write $\mathcal{F}'_n = \sigma(X_n, X_{n+1}, \ldots)$ and $\mathcal{T} = \cap_n \mathcal{F}'_n$.
- $\mathcal{T}$ is called the tail $\sigma$-algebra. It contains the information you can observe by looking only at stuff arbitrarily far into the future. Intuitively, membership in a tail event doesn't change when finitely many $X_n$ are changed.
- The event that the $X_n$ converge to a limit is an example of a tail event. Other examples?
- Theorem: If $X_1, X_2, \ldots$ are independent and $A \in \mathcal{T}$ then $P(A) \in \{0, 1\}$.

Kolmogorov maximal inequality
- Theorem: Suppose the $X_i$ are independent with mean zero and finite variances, and $S_n = \sum_{i=1}^{n} X_i$. Then
  $$P\Big(\max_{1 \le k \le n} |S_k| \ge x\Big) \le x^{-2}\,\mathrm{Var}(S_n) = x^{-2} E|S_n|^2 .$$
- Main idea of proof: Consider the first time $k$ at which $|S_k| \ge x$. Bound $E[S_n^2]$ from below on that event.
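A quick Monte Carlo sanity check of the inequality can be sketched as follows. This is only an illustration under assumed choices that are not in the lecture: the increments are fair $\pm 1$ coin flips (mean zero, variance one, so $\mathrm{Var}(S_n) = n$), and the function name `max_inequality_check` and its default parameters are hypothetical.

```python
import random

def max_inequality_check(n=50, x=10.0, trials=20000, seed=0):
    """Estimate P(max_{k<=n} |S_k| >= x) by simulation and return it
    together with the Kolmogorov bound Var(S_n) / x^2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = 0.0
        running_max = 0.0
        for _ in range(n):
            # Assumed increments: fair +/-1 steps (mean 0, variance 1).
            s += rng.choice((-1.0, 1.0))
            running_max = max(running_max, abs(s))
        if running_max >= x:
            hits += 1
    estimate = hits / trials
    bound = n / x**2  # Var(S_n) = n for these increments
    return estimate, bound

estimate, bound = max_inequality_check()
print(f"estimated P(max |S_k| >= x) = {estimate:.4f}, bound = {bound:.4f}")
```

The estimate should sit below the bound, usually with room to spare: the inequality is a second-moment bound and is not expected to be tight.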

Kolmogorov three-series theorem
- Theorem: Let $X_1, X_2, \ldots$ be independent and fix $A > 0$. Write $Y_i = X_i 1_{(|X_i| \le A)}$. Then $\sum X_i$ converges a.s. if and only if the following are all true:
  - $\sum_{n=1}^{\infty} P(|X_n| > A) < \infty$
  - $\sum_{n=1}^{\infty} E Y_n$ converges
  - $\sum_{n=1}^{\infty} \mathrm{Var}(Y_n) < \infty$
- Main ideas behind the proof: The Kolmogorov zero-one law implies that $\sum X_i$ converges with probability $p \in \{0, 1\}$. We just have to show that $p = 1$ when all hypotheses are satisfied (sufficiency of conditions) and $p = 0$ if any one of them fails (necessity).
- To prove sufficiency, apply Borel-Cantelli to see that the probability that $X_n \ne Y_n$ i.o. is zero. Subtract means from the $Y_n$ to reduce to the case that each $Y_n$ has mean zero. Apply the Kolmogorov maximal inequality.
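As a worked illustration of how the three conditions are checked (an example chosen here, not one from the slides), take independent fair signs $\varepsilon_n$ and scale them by $n^{-\alpha}$ for an assumed exponent $\alpha > 0$. With $A = 1$ no truncation occurs, so the first two series are trivial and everything hinges on the variance series.

```latex
% Random-sign series: X_n = \varepsilon_n n^{-\alpha}, with |X_n| \le 1 for all n \ge 1.
X_n = \varepsilon_n n^{-\alpha}, \quad P(\varepsilon_n = \pm 1) = \tfrac12, \quad A = 1
\;\Longrightarrow\; Y_n = X_n .
\\[4pt]
\sum_{n=1}^{\infty} P(|X_n| > 1) = 0, \qquad
\sum_{n=1}^{\infty} E Y_n = 0, \qquad
\sum_{n=1}^{\infty} \mathrm{Var}(Y_n) = \sum_{n=1}^{\infty} n^{-2\alpha} .
```

The last series is finite exactly when $\alpha > 1/2$, so by the theorem $\sum \varepsilon_n n^{-\alpha}$ converges a.s. if and only if $\alpha > 1/2$. In particular the random-sign harmonic series ($\alpha = 1$) converges a.s.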

Recall: moment generating functions
- Let $X$ be a random variable.
- The moment generating function of $X$ is defined by $M(t) = M_X(t) := E[e^{tX}]$.
- When $X$ is discrete, can write $M(t) = \sum_x e^{tx} p_X(x)$. So $M(t)$ is a weighted average of countably many exponential functions.
- When $X$ is continuous, can write $M(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx$. So $M(t)$ is a weighted average of a continuum of exponential functions.
- We always have $M(0) = 1$.
- If $b > 0$ and $t > 0$ then $E[e^{tX}] \ge E[e^{t \min\{X, b\}}] \ge P\{X \ge b\}\, e^{tb}$.
- If $X$ takes both positive and negative values with positive probability then $M(t)$ grows at least exponentially fast in $|t|$ as $|t| \to \infty$.
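The discrete formula and the lower bound $E[e^{tX}] \ge P\{X \ge b\}\, e^{tb}$ can be checked numerically. The sketch below assumes a fair six-sided die as the example distribution; the helper names `mgf` and `tail` and the particular values of `t` and `b` are illustrative choices, not part of the lecture.

```python
import math

def mgf(pmf, t):
    """M(t) = sum_x e^{t x} p_X(x) for a discrete distribution given as {x: p}."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())

def tail(pmf, b):
    """P(X >= b) for the same discrete distribution."""
    return sum(p for x, p in pmf.items() if x >= b)

# Assumed example: a fair six-sided die.
die = {k: 1 / 6 for k in range(1, 7)}

t, b = 0.5, 5
lower = tail(die, b) * math.exp(t * b)  # P{X >= b} e^{tb}

print("M(0)            =", mgf(die, 0.0))
print("P{X>=b} e^{tb}  =", lower)
print("M(t)            =", mgf(die, t))
```

Rearranged, the same inequality reads $P\{X \ge b\} \le e^{-tb} M(t)$, which is the form used to bound tail probabilities.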

Recall: moment generating functions for i.i.d. sums
- We showed that if $Z = X + Y$ and $X$ and $Y$ are independent, then $M_Z(t) = M_X(t) M_Y(t)$.
- If $X_1, \ldots, X_n$ are i.i.d. copies of $X$ and $Z = X_1 + \ldots + X_n$ then what is $M_Z$?
- Answer: $M_X^n$. Follows by repeatedly applying the formula above.
- This is a big reason for studying moment generating functions. It helps us understand what happens when we sum up a lot of independent copies of the same random variable.
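The identity $M_Z = M_X^n$ can be verified directly for a small discrete example by building the distribution of the sum through repeated convolution. This is a minimal sketch assuming fair dice; `mgf`, `convolve`, and the values of `n` and `t` are hypothetical names and choices made for illustration.

```python
import math

def mgf(pmf, t):
    """M(t) = sum_x e^{t x} p(x) for a discrete distribution given as {x: p}."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())

def convolve(pmf_a, pmf_b):
    """Distribution of X + Y for independent X ~ pmf_a and Y ~ pmf_b."""
    out = {}
    for x, p in pmf_a.items():
        for y, q in pmf_b.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out

# Assumed example: Z is the total of n independent fair dice.
die = {k: 1 / 6 for k in range(1, 7)}
n, t = 4, 0.3

pmf_sum = die
for _ in range(n - 1):
    pmf_sum = convolve(pmf_sum, die)

lhs = mgf(pmf_sum, t)   # M_Z(t), computed from the distribution of Z
rhs = mgf(die, t) ** n  # M_X(t)^n
print(lhs, rhs)
```

The two printed numbers should agree up to floating-point rounding. Computing `rhs` takes one pass over six values, while `lhs` needs the full distribution of the sum, which is the practical appeal of the product formula.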

Large deviations
- Consider i.i.d. random variables $X_i$ and write $S_n = X_1 + \ldots + X_n$. Want to show that if $\phi(\theta) := M_{X_i}(\theta) = E \exp(\theta X_i)$ is less than infinity for some $\theta > 0$, then $P(S_n \ge na) \to 0$ exponentially fast when $a > E[X_i]$.
- Kind of a quantitative form of the weak law of large numbers. The empirical average $A_n = S_n / n$ is very unlikely to be $\epsilon$ away from its expected value (where "very" means with probability less than some exponentially decaying function of $n$).
- Write $\gamma(a) = \lim_{n \to \infty} \frac{1}{n} \log P(S_n \ge na)$. It gives the "rate" of exponential decay as a function of $a$.
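To see the exponential decay concretely, one can compute $\frac{1}{n} \log P(S_n \ge na)$ exactly for a simple case and compare it with the upper bound that follows from the moment generating function inequality recalled earlier: applying $E[e^{\theta S_n}] \ge P\{S_n \ge na\}\, e^{\theta n a}$ with $E[e^{\theta S_n}] = \phi(\theta)^n$ gives $\frac{1}{n} \log P(S_n \ge na) \le -(\theta a - \log \phi(\theta))$ for every $\theta > 0$. The sketch below assumes Bernoulli(1/2) summands and $a = 0.75$; the function names and the grid over $\theta$ are illustrative choices, not part of the lecture.

```python
import math

def log_tail_rate(n, a, p=0.5):
    """Exact (1/n) log P(S_n >= n*a) when S_n ~ Binomial(n, p)."""
    k_min = math.ceil(n * a)
    tail = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))
    return math.log(tail) / n

def chernoff_exponent(a, p=0.5):
    """Largest value of theta*a - log(phi(theta)) over a grid of theta > 0,
    where phi(theta) = 1 - p + p*e^theta is the Bernoulli(p) m.g.f."""
    thetas = [0.01 * j for j in range(1, 501)]
    return max(th * a - math.log(1 - p + p * math.exp(th)) for th in thetas)

a = 0.75  # exactly representable in binary, so n*a is exact for n divisible by 4
rates = {n: log_tail_rate(n, a) for n in (20, 100, 400)}
exponent = chernoff_exponent(a)

for n, rate in rates.items():
    print(f"n={n:4d}  (1/n) log P(S_n >= na) = {rate:.4f}   bound = {-exponent:.4f}")
```

Every exact value should lie below the bound, and one would expect the values to move toward it as $n$ grows, which is the behavior that $\gamma(a)$ is meant to capture.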

MIT OpenCourseWare http://ocw.mit.edu 18.175 Theory of Probability Spring 2014 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
