18.175 Lecture 13: More large deviations
Scott Sheffield, MIT


  1. 18.175 Lecture 13: More large deviations. Scott Sheffield, MIT.

  2. Outline
     - Legendre transform
     - Large deviations

  3. Outline
     - Legendre transform
     - Large deviations

  4. Legendre transform
     - Define the Legendre transform (or Legendre dual) of a function Λ : R^d → R by Λ*(x) = sup_{λ ∈ R^d} {(λ, x) − Λ(λ)}.
     - Let's describe the Legendre dual geometrically when d = 1: −Λ*(x) is the height at which the tangent line to Λ of slope x intersects the vertical axis. We can "roll" this tangent line around the convex hull of the graph of Λ to get all the Λ* values.
     - Is the Legendre dual always convex?
     - What is the Legendre dual of x^2? Of the function equal to 0 at 0 and ∞ everywhere else?
     - How are the derivatives of Λ and Λ* related?
     - What is the Legendre dual of the Legendre dual of a convex function?
     - What's the higher-dimensional analog of rolling the tangent line?
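The questions above can be explored numerically. Below is a minimal sketch (mine, not from the lecture) that approximates the sup in the definition by a maximum over a finite grid of λ values; the grid bounds and sample points are arbitrary choices. It suggests that the dual of λ^2 is x^2/4 and that dualizing twice recovers the convex original.

```python
def legendre(f, grid):
    # f*(x) = sup_lambda { lambda * x - f(lambda) }, with the sup
    # approximated by a maximum over a finite grid of lambda values.
    return lambda x: max(l * x - f(l) for l in grid)

grid = [i / 100 for i in range(-1000, 1001)]  # lambda in [-10, 10]

square = lambda l: l * l
dual = legendre(square, grid)        # expect x^2 / 4
double_dual = legendre(dual, grid)   # expect x^2 back again

for x in (-2.0, 0.0, 1.5):
    print(x, dual(x), double_dual(x))
```

(The slide's other example works by hand: if Λ is 0 at 0 and +∞ elsewhere, only λ = 0 contributes to the sup, so Λ* ≡ 0.)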

  5. Outline
     - Legendre transform
     - Large deviations

  6. Outline
     - Legendre transform
     - Large deviations

  7. Recall: moment generating functions
     - Let X be a random variable. The moment generating function of X is defined by M(t) = M_X(t) := E[e^{tX}].
     - When X is discrete, we can write M(t) = Σ_x e^{tx} p_X(x). So M(t) is a weighted average of countably many exponential functions.
     - When X is continuous, we can write M(t) = ∫_{−∞}^{∞} e^{tx} f(x) dx. So M(t) is a weighted average of a continuum of exponential functions.
     - We always have M(0) = 1.
     - If b > 0 and t > 0 then E[e^{tX}] ≥ E[e^{t min{X, b}}] ≥ P{X ≥ b} e^{tb}.
     - If X takes both positive and negative values with positive probability, then M(t) grows at least exponentially fast in |t| as |t| → ∞.
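As a concrete illustration (my own toy example, not from the slides), take a discrete X with P(X = −1) = 0.6 and P(X = 2) = 0.4. The sketch below checks M(0) = 1, the tail bound, and the exponential growth in both directions:

```python
import math

pmf = {-1: 0.6, 2: 0.4}  # a hypothetical discrete random variable

def M(t):
    # M(t) = sum_x e^{t x} p_X(x): a weighted average of exponentials
    return sum(math.exp(t * x) * p for x, p in pmf.items())

print(M(0.0))  # always 1

# For b > 0 and t > 0: E[e^{tX}] >= P{X >= b} e^{tb}.
t, b = 0.5, 2
tail = sum(p for x, p in pmf.items() if x >= b)
print(M(t), ">=", tail * math.exp(t * b))

# X takes both signs with positive probability, so M blows up both ways.
print(M(10.0), M(-10.0))
```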

  8. Recall: moment generating functions for i.i.d. sums
     - We showed that if Z = X + Y and X and Y are independent, then M_Z(t) = M_X(t) M_Y(t).
     - If X_1, ..., X_n are i.i.d. copies of X and Z = X_1 + ... + X_n, then what is M_Z? Answer: M_X^n.
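A quick sanity check of M_Z = M_X^n (a toy example of mine, with X uniform on {0, 1} and n = 4): enumerate the law of Z directly and compare the two moment generating functions.

```python
import math
from itertools import product

pmf = {0: 0.5, 1: 0.5}  # assumed i.i.d. X_i, uniform on {0, 1}
n = 4

def M_X(t):
    return sum(math.exp(t * x) * p for x, p in pmf.items())

# Law of Z = X_1 + ... + X_n by brute-force enumeration of all n-tuples.
pmf_Z = {}
for combo in product(pmf, repeat=n):
    w = math.prod(pmf[x] for x in combo)
    pmf_Z[sum(combo)] = pmf_Z.get(sum(combo), 0.0) + w

def M_Z(t):
    return sum(math.exp(t * z) * p for z, p in pmf_Z.items())

print(M_Z(0.7), M_X(0.7) ** n)  # agree (up to rounding)
```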

  9. Large deviations
     - Consider i.i.d. random variables X_i. Can we show that P(S_n ≥ na) → 0 exponentially fast when a > E[X_i]?
     - This is a kind of quantitative form of the weak law of large numbers. The empirical average A_n is very unlikely to be more than ε away from its expected value (where "very" means with probability less than some exponentially decaying function of n).
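For a concrete feel (my own example, not the lecture's), take X_i ~ Bernoulli(1/2) and a = 0.7 > E[X_i] = 1/2. The exact tail probabilities below decay exponentially: (1/n) log P(S_n ≥ na) stays bounded away from 0.

```python
import math

a = 0.7  # threshold above the mean 1/2
probs = {}
for n in (20, 40, 80, 160):
    k0 = math.ceil(n * a)
    # exact binomial tail: P(S_n >= n a) = sum_{k >= k0} C(n, k) / 2^n
    p = sum(math.comb(n, k) for k in range(k0, n + 1)) / 2 ** n
    probs[n] = p
    print(n, p, math.log(p) / n)
```

The last column approaches a negative constant, the large deviations rate, which Cramér's theorem (below in the deck) identifies as −Λ*(a).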

  10. General large deviation principle
      - More general framework: a large deviation principle describes the limiting behavior as n → ∞ of a family {μ_n} of measures on a measure space (X, B) in terms of a rate function I.
      - The rate function is a lower-semicontinuous map I : X → [0, ∞]. (The sets {x : I(x) ≤ a} are closed; the rate function is called "good" if these sets are compact.)
      - DEFINITION: the {μ_n} satisfy the LDP with rate function I and speed n if for all Γ ∈ B,
        −inf_{x ∈ Γ°} I(x) ≤ liminf_{n→∞} (1/n) log μ_n(Γ) ≤ limsup_{n→∞} (1/n) log μ_n(Γ) ≤ −inf_{x ∈ Γ̄} I(x),
        where Γ° is the interior and Γ̄ the closure of Γ.
      - INTUITION: "near x" the probability density function for μ_n is tending to zero like e^{−I(x) n} as n → ∞.
      - Simple case: I is continuous and Γ is the closure of its interior.
      - Question: how would I change if we replaced the measures μ_n by the weighted measures e^{n(λ, ·)} μ_n? Replace I(x) by I(x) − (λ, x)? What is inf_x {I(x) − (λ, x)}?

  11. Cramér's theorem
      - Let μ_n be the law of the empirical mean A_n = (1/n) Σ_{j=1}^n X_j for i.i.d. random vectors X_1, X_2, ..., X_n in R^d with the same law as X.
      - Define the log moment generating function of X by Λ(λ) = Λ_X(λ) = log M_X(λ) = log E[e^{(λ, X)}], where (·, ·) is the inner product on R^d.
      - Define the Legendre transform of Λ by Λ*(x) = sup_{λ ∈ R^d} {(λ, x) − Λ(λ)}.
      - CRAMÉR'S THEOREM: the μ_n satisfy the LDP with convex rate function Λ*.
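In the Bernoulli(1/2) toy case (an assumption of mine, with d = 1), Λ(λ) = log((1 + e^λ)/2), and the sup defining Λ*(a) can be taken numerically. For a = 0.7 it matches the closed form a log(2a) + (1 − a) log(2(1 − a)), and the exact tails approach the predicted rate:

```python
import math

a = 0.7

def Lam(l):
    # log moment generating function of Bernoulli(1/2)
    return math.log((1 + math.exp(l)) / 2)

# Lambda*(a) = sup_lambda { lambda a - Lambda(lambda) }, over a grid.
rate = max((i / 1000) * a - Lam(i / 1000) for i in range(-5000, 5001))

# Closed form: relative entropy of Bernoulli(a) relative to Bernoulli(1/2).
closed = a * math.log(2 * a) + (1 - a) * math.log(2 * (1 - a))
print(rate, closed)

# Compare with -(1/n) log P(S_n >= n a) for a largish n.
n = 400
p = sum(math.comb(n, k) for k in range(math.ceil(n * a), n + 1)) / 2 ** n
print(-math.log(p) / n)  # a bit above the rate; the gap shrinks as n grows
```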

  12. Thinking about Cramér's theorem
      - Let μ_n be the law of the empirical mean A_n = (1/n) Σ_{j=1}^n X_j.
      - CRAMÉR'S THEOREM: the μ_n satisfy the LDP with convex rate function I(x) = Λ*(x) = sup_{λ ∈ R^d} {(λ, x) − Λ(λ)}, where Λ(λ) = log M(λ) = log E[e^{(λ, X_1)}].
      - This means that for all Γ ∈ B we have this asymptotic lower bound on the probabilities μ_n(Γ):
        −inf_{x ∈ Γ°} I(x) ≤ liminf_{n→∞} (1/n) log μ_n(Γ),
        so (up to sub-exponential error) μ_n(Γ) ≥ e^{−n inf_{x ∈ Γ°} I(x)},
      - and this asymptotic upper bound on the probabilities μ_n(Γ):
        limsup_{n→∞} (1/n) log μ_n(Γ) ≤ −inf_{x ∈ Γ̄} I(x),
        which says (up to sub-exponential error) μ_n(Γ) ≤ e^{−n inf_{x ∈ Γ̄} I(x)}.

  13. Proving the Cramér upper bound
      - Recall that I(x) = Λ*(x) = sup_{λ ∈ R^d} {(λ, x) − Λ(λ)}.
      - For simplicity, assume that Λ(λ) is finite for all λ (which implies that X has moments of all orders, that Λ and Λ* are strictly convex, and that the derivatives of Λ and Λ* are inverses of each other). It is also enough to consider the case that X has mean zero, which implies that Λ(0) = 0 is a minimum of Λ and Λ*(0) = 0 is a minimum of Λ*.
      - We aim to show (up to subexponential error) that μ_n(Γ) ≤ e^{−n inf_{x ∈ Γ} I(x)}.
      - If Γ were the singleton set {x}, we could find the λ corresponding to x, so that Λ*(x) = (λ, x) − Λ(λ). Note then that E[e^{(nλ, A_n)}] = E[e^{(λ, S_n)}] = M_X(λ)^n = e^{nΛ(λ)}, and also E[e^{(nλ, A_n)}] ≥ e^{n(λ, x)} μ_n({x}). Taking logs and dividing by n gives Λ(λ) ≥ (1/n) log μ_n({x}) + (λ, x), so that (1/n) log μ_n({x}) ≤ −Λ*(x), as desired.
      - General Γ: cut into finitely many pieces and bound each piece?
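The singleton argument above is just the Chernoff bound, which holds exactly for every λ > 0, not only asymptotically: P(S_n ≥ na) ≤ E[e^{λ S_n}] e^{−λna} = e^{−n(λa − Λ(λ))}, and optimizing over λ yields e^{−nΛ*(a)}. A sketch in the same assumed Bernoulli(1/2) example:

```python
import math

a, n = 0.7, 60

def Lam(l):
    return math.log((1 + math.exp(l)) / 2)  # Bernoulli(1/2) log MGF

# Exact tail probability P(S_n >= n a).
p = sum(math.comb(n, k) for k in range(math.ceil(n * a), n + 1)) / 2 ** n

# The Chernoff bound holds for every lambda > 0; it is tightest near
# the maximizer of lambda * a - Lambda(lambda).
for lam in (0.2, 0.85, 2.0):
    bound = math.exp(-n * (lam * a - Lam(lam)))
    print(lam, p, "<=", bound)
```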

  14. Proving the Cramér lower bound
      - Recall that I(x) = Λ*(x) = sup_{λ ∈ R^d} {(λ, x) − Λ(λ)}.
      - We aim to show that asymptotically μ_n(Γ) ≥ e^{−n inf_{x ∈ Γ°} I(x)}.
      - It's enough to show that for each given x ∈ Γ°, we have asymptotically μ_n(Γ) ≥ e^{−n I(x)}.
      - The idea is to weight the law of X by e^{(λ, x)} for some λ and normalize, to get a new measure whose expectation is this point x. Under this new measure, A_n is "typically" in Γ for large n, so the probability is of order 1.
      - But by how much did we have to modify the measure to make this typical? By not more than a factor of e^{n I(x)}.
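The tilting step can be made concrete in the assumed Bernoulli(1/2) example: reweight the pmf by e^{λx} and normalize by M(λ), then pick λ so that the tilted mean is the target point a = 0.7. The cost exponent λa − Λ(λ) then equals I(a) = Λ*(a).

```python
import math

pmf = {0: 0.5, 1: 0.5}
a = 0.7

def tilt(lam):
    # tilted measure: p_lam(x) = e^{lam x} p(x) / M(lam)
    w = {x: math.exp(lam * x) * p for x, p in pmf.items()}
    Z = sum(w.values())  # Z = M(lam)
    return {x: v / Z for x, v in w.items()}

# Tilted mean is Lambda'(lam) = e^lam / (1 + e^lam); solving for 0.7
# gives lam = log(7/3).
lam = math.log(7 / 3)
mean = sum(x * q for x, q in tilt(lam).items())
print(mean)  # 0.7 up to rounding

# Cost of the tilt per sample: lam * a - Lambda(lam) = Lambda*(a).
I = lam * a - math.log((1 + math.exp(lam)) / 2)
print(I)  # about 0.082
```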

  15. MIT OpenCourseWare
      http://ocw.mit.edu
      18.175 Theory of Probability, Spring 2014
      For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
