18.175 Lecture 8: Weak laws and moment-generating/characteristic functions (Scott Sheffield, MIT)




  1. Title slide: 18.175 Lecture 8, Weak laws and moment-generating/characteristic functions. Scott Sheffield, MIT.

  2. Outline
     - Moment generating functions
     - Weak law of large numbers: Markov/Chebyshev approach
     - Weak law of large numbers: characteristic function approach


  4. Moment generating functions
     - Let X be a random variable.
     - The moment generating function of X is defined by M(t) = M_X(t) := E[e^{tX}].
     - When X is discrete, we can write M(t) = Σ_x e^{tx} p_X(x). So M(t) is a weighted average of countably many exponential functions.
     - When X is continuous, we can write M(t) = ∫_{−∞}^{∞} e^{tx} f(x) dx. So M(t) is a weighted average of a continuum of exponential functions.
     - We always have M(0) = 1.
     - If b > 0 and t > 0, then E[e^{tX}] ≥ E[e^{t·min{X,b}}] ≥ P{X ≥ b} e^{tb}.
     - If X takes both positive and negative values with positive probability, then M(t) grows at least exponentially fast in |t| as |t| → ∞.
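
To make the definition concrete, here is a minimal numerical sketch (my addition, not part of the lecture): it estimates M_X(t) = E[e^{tX}] by Monte Carlo for a standard normal X, whose MGF is known to be e^{t²/2}. The seed, sample size, and grid of t values are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(1_000_000)  # samples of a standard normal X

for t in [0.0, 0.5, 1.0, 2.0]:
    mc = np.exp(t * X).mean()   # Monte Carlo estimate of E[e^{tX}]
    exact = np.exp(t**2 / 2)    # known MGF of the standard normal
    print(f"t={t}: MC {mc:.4f} vs exact {exact:.4f}")
```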

  5. Moment generating functions actually generate moments
     - Let X be a random variable and M(t) = E[e^{tX}].
     - Then M′(t) = d/dt E[e^{tX}] = E[(d/dt) e^{tX}] = E[X e^{tX}]. In particular, M′(0) = E[X].
     - Also M″(t) = d/dt M′(t) = d/dt E[X e^{tX}] = E[X² e^{tX}]. So M″(0) = E[X²]. The same argument gives that the n-th derivative of M at zero is E[X^n].
     - Interesting: knowing all of the derivatives of M at a single point tells you the moments E[X^k] for all integers k ≥ 0.
     - Another way to think of this: write e^{tX} = 1 + tX + t²X²/2! + t³X³/3! + ...
     - Taking expectations gives E[e^{tX}] = 1 + t·m_1 + t²·m_2/2! + t³·m_3/3! + ..., where m_k is the k-th moment. The k-th derivative at zero is m_k.
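
A small symbolic sketch (my addition) of "derivatives at zero give moments", assuming sympy is available: differentiate the known MGF M(t) = λ/(λ − t) of an exponential(λ) variable and read off the n-th moment, which should come out to n!/λ^n.

```python
import sympy as sp

t, lam = sp.symbols("t lam", positive=True)
M = lam / (lam - t)  # MGF of an exponential random variable with rate lam

for n in range(1, 5):
    moment = sp.diff(M, t, n).subs(t, 0)  # n-th derivative of M at t = 0
    print(n, sp.simplify(moment))         # expect factorial(n) / lam**n
```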

  6. Moment generating functions for independent sums
     - Let X and Y be independent random variables and Z = X + Y.
     - Write the moment generating functions as M_X(t) = E[e^{tX}], M_Y(t) = E[e^{tY}], and M_Z(t) = E[e^{tZ}].
     - If you knew M_X and M_Y, could you compute M_Z?
     - By independence, M_Z(t) = E[e^{t(X+Y)}] = E[e^{tX} e^{tY}] = E[e^{tX}] E[e^{tY}] = M_X(t) M_Y(t) for all t.
     - In other words, adding independent random variables corresponds to multiplying moment generating functions.
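
A hedged sanity check (mine, not from the slides): estimate both sides of M_Z(t) = M_X(t)·M_Y(t) by simulation for two independent uniform variables; the distributions, seed, and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, 500_000)
Y = rng.uniform(0, 1, 500_000)  # drawn independently of X

t = 0.7
lhs = np.exp(t * (X + Y)).mean()                   # estimate of M_Z(t)
rhs = np.exp(t * X).mean() * np.exp(t * Y).mean()  # M_X(t) * M_Y(t)
print(lhs, rhs)  # the two estimates should nearly agree
```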

  7. Moment generating functions for sums of i.i.d. random variables
     - We showed that if Z = X + Y with X and Y independent, then M_Z(t) = M_X(t) M_Y(t).
     - If X_1, ..., X_n are i.i.d. copies of X and Z = X_1 + ... + X_n, then what is M_Z?
     - Answer: M_X^n. This follows by repeatedly applying the formula above.
     - This is a big reason for studying moment generating functions. It helps us understand what happens when we sum up a lot of independent copies of the same random variable.
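
For instance (my illustration, not from the slides), a Binomial(n, p) variable is a sum of n i.i.d. Bernoulli(p) variables, so its MGF should equal M_X(t)^n = (1 − p + p·e^t)^n; the sketch below checks this by simulation.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, t = 10, 0.3, 0.5

Z = rng.binomial(n, p, 200_000)       # Z = X_1 + ... + X_n with X_i ~ Bernoulli(p)
mc = np.exp(t * Z).mean()             # Monte Carlo estimate of M_Z(t)
exact = (1 - p + p * np.exp(t)) ** n  # Bernoulli MGF raised to the n-th power
print(mc, exact)
```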

  8. Other observations
     - If Z = aX, can I use M_X to determine M_Z?
     - Answer: yes. M_Z(t) = E[e^{tZ}] = E[e^{taX}] = M_X(at).
     - If Z = X + b, can I use M_X to determine M_Z?
     - Answer: yes. M_Z(t) = E[e^{tZ}] = E[e^{tX+bt}] = e^{bt} M_X(t).
     - The latter answer is the special case of M_Z(t) = M_X(t) M_Y(t) where Y is the constant random variable b.
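
Both identities are easy to confirm numerically. A minimal sketch (my addition), using a rate-1 exponential X so that M_X(s) = 1/(1 − s) for s < 1 gives closed forms to compare against:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.exponential(1.0, 300_000)  # rate-1 exponential: M_X(s) = 1/(1-s) for s < 1
a, b, t = 2.0, 1.5, 0.3            # chosen so that a*t = 0.6 < 1 keeps the MGF finite

print(np.exp(t * (a * X)).mean(), 1 / (1 - a * t))          # M_{aX}(t) vs M_X(at)
print(np.exp(t * (X + b)).mean(), np.exp(b * t) / (1 - t))  # M_{X+b}(t) vs e^{bt} M_X(t)
```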

  9. Existence issues
     - It seems that unless f_X(x) decays superexponentially as x tends to infinity, we won't have M_X(t) defined for all t.
     - What is M_X if X is standard Cauchy, so that f_X(x) = 1/(π(1 + x²))?
     - Answer: M_X(0) = 1 (as is true for any X), but otherwise M_X(t) is infinite for all t ≠ 0.
     - Informal statement: moment generating functions are not defined for distributions with fat tails.
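
One way to see the divergence empirically (my addition): Monte Carlo estimates of E[e^{tX}] for a standard Cauchy X fail to stabilize as the sample size grows, since the true expectation is infinite for any t ≠ 0. The seed and sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
t = 0.1

for n in [10**3, 10**4, 10**5, 10**6]:
    X = rng.standard_cauchy(n)
    print(n, np.exp(t * X).mean())  # estimates blow up rather than converge
```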

  10. Outline
     - Moment generating functions
     - Weak law of large numbers: Markov/Chebyshev approach
     - Weak law of large numbers: characteristic function approach


  12. Markov's and Chebyshev's inequalities
     - Markov's inequality: let X be a non-negative random variable and fix a > 0. Then P{X ≥ a} ≤ E[X]/a.
     - Proof: consider the random variable Y defined by Y = a when X ≥ a and Y = 0 when X < a. Since X ≥ Y with probability one, it follows that E[X] ≥ E[Y] = a·P{X ≥ a}. Divide both sides by a to get Markov's inequality.
     - Chebyshev's inequality: if X has finite mean µ and variance σ², and k > 0, then P{|X − µ| ≥ k} ≤ σ²/k².
     - Proof: note that (X − µ)² is a non-negative random variable and P{|X − µ| ≥ k} = P{(X − µ)² ≥ k²}. Now apply Markov's inequality with a = k².
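
A quick empirical comparison (my addition): tail probabilities of an exponential(1) sample against the Markov and Chebyshev bounds. The distribution and thresholds are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.exponential(1.0, 1_000_000)  # non-negative, mean 1, variance 1
mu, var = X.mean(), X.var()

a = 3.0
print((X >= a).mean(), mu / a)                   # P{X >= a} vs Markov bound E[X]/a
k = 2.0
print((np.abs(X - mu) >= k).mean(), var / k**2)  # P{|X - mu| >= k} vs Chebyshev bound
```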

  13. Markov and Chebyshev: rough idea
     - Markov's inequality: let X be a non-negative random variable with finite mean, and fix a constant a > 0. Then P{X ≥ a} ≤ E[X]/a.
     - Chebyshev's inequality: if X has finite mean µ, variance σ², and k > 0, then P{|X − µ| ≥ k} ≤ σ²/k².
     - These inequalities allow us to deduce limited information about a distribution when we know only the mean (Markov) or the mean and variance (Chebyshev).
     - Markov: if E[X] is small, then it is not too likely that X is large.
     - Chebyshev: if σ² = Var[X] is small, then it is not too likely that X is far from its mean.

  14. Statement of weak law of large numbers
     - Suppose the X_i are i.i.d. random variables with mean µ.
     - Then the value A_n := (X_1 + X_2 + ... + X_n)/n is called the empirical average of the first n trials.
     - We'd guess that when n is large, A_n is typically close to µ.
     - Indeed, the weak law of large numbers states that for all ε > 0 we have lim_{n→∞} P{|A_n − µ| > ε} = 0.
     - Example: as n tends to infinity, the probability of seeing more than .50001n heads in n fair coin tosses tends to zero.
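
To make the coin example tangible, here is a simulation sketch (my addition). The slide's margin .00001 decays too slowly to be visible at small n, so the sketch uses the larger margin ε = 0.01; the qualitative conclusion is the same.

```python
import numpy as np

rng = np.random.default_rng(6)
eps = 0.01  # margin above 1/2; larger than the slide's .00001 so the decay shows quickly

for n in [10, 100, 1000, 10_000]:
    heads = rng.binomial(n, 0.5, 100_000)       # 100k experiments of n fair tosses each
    print(n, (heads > (0.5 + eps) * n).mean())  # estimated P{A_n - 1/2 > eps}
```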

  15. Proof of weak law of large numbers in finite variance case
     - As above, let X_i be i.i.d. random variables with mean µ, and write A_n := (X_1 + X_2 + ... + X_n)/n.
     - By additivity of expectation, E[A_n] = µ.
     - Similarly, Var[A_n] = nσ²/n² = σ²/n.
     - By Chebyshev, P{|A_n − µ| ≥ ε} ≤ Var[A_n]/ε² = σ²/(nε²).
     - No matter how small ε is, the RHS will tend to zero as n gets large.
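
The σ²/n scaling is easy to check by simulation (my addition), here with exponential(1) summands, for which σ² = 1:

```python
import numpy as np

rng = np.random.default_rng(7)

for n in [1, 10, 100, 1000]:
    A = rng.exponential(1.0, (10_000, n)).mean(axis=1)  # 10k independent copies of A_n
    print(n, A.var(), 1.0 / n)                          # empirical Var[A_n] vs sigma^2 / n
```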

  16. L² weak law of large numbers
     - Say X_i and X_j are uncorrelated if E(X_i X_j) = E(X_i) E(X_j).
     - The Chebyshev/Markov argument works whenever the variables are uncorrelated (it does not actually require independence).
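
A sketch of this point (my addition, with a construction of my choosing): the variables X_k = cos(2πkU), built from a single uniform U on [0, 1], are pairwise uncorrelated with mean 0 but very far from independent, yet their averages still concentrate near 0.

```python
import numpy as np

rng = np.random.default_rng(8)
U = rng.uniform(size=10_000)  # one uniform draw per experiment

for n in [10, 100, 1000]:
    # X_k = cos(2*pi*k*U): mean 0, pairwise uncorrelated, all functions of the same U
    A = np.mean([np.cos(2 * np.pi * k * U) for k in range(1, n + 1)], axis=0)
    print(n, np.abs(A).mean())  # typical size of A_n shrinks like 1/sqrt(2n)
```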

  17. What else can you do with just variance bounds?
     - Having "almost uncorrelated" X_i is sometimes enough: we just need the variance of A_n to go to zero.
     - Toss αn balls into n bins. How many bins are filled?
     - When n is large, the number of balls in the first bin is approximately a Poisson random variable with expectation α.
     - The probability that the first bin contains no ball is (1 − 1/n)^{αn} ≈ e^{−α}.
     - We can explicitly compute the variance of the number of bins with no balls. This allows us to show that the fraction of bins with no balls concentrates about its expectation, which is e^{−α}.
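
A simulation sketch (my addition) of the balls-in-bins claim that the fraction of empty bins concentrates near e^{−α}; the values of α and n are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(9)
alpha, n = 2.0, 100_000

bins = rng.integers(0, n, int(alpha * n))    # throw alpha*n balls into n bins
counts = np.bincount(bins, minlength=n)      # number of balls landing in each bin
print((counts == 0).mean(), np.exp(-alpha))  # empty-bin fraction vs e^{-alpha}
```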

  18. How do you extend to random variables without variance?
     - Assume the X_n are i.i.d. non-negative instances of a random variable X with finite mean. Can one prove a law of large numbers for these?
     - Try truncating. Fix a large N and write A = X·1_{X>N} and B = X·1_{X≤N}, so that X = A + B.
     - Since E[X] is finite, we can choose N so that E[A] is very small; and B is bounded, so the finite-variance law of large numbers holds for B.
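
To see why the tail term A is negligible (my addition), the sketch below estimates E[A] = E[X·1_{X>N}] as N grows, for a Pareto-type variable with finite mean but infinite variance, sampled by inverse transform; here E[X·1_{X>N}] = 3/√N exactly.

```python
import numpy as np

rng = np.random.default_rng(10)
# Pareto with tail index 1.5: P{X > x} = x^{-1.5}, finite mean 3, infinite variance
X = (1 - rng.uniform(size=2_000_000)) ** (-1 / 1.5)

for N in [10, 100, 1000]:
    print(N, np.where(X > N, X, 0.0).mean())  # E[A] = E[X; X > N] shrinks as N grows
```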

  19. Outline
     - Moment generating functions
     - Weak law of large numbers: Markov/Chebyshev approach
     - Weak law of large numbers: characteristic function approach


  21. Extent of weak law
     - Question: does the weak law of large numbers apply no matter what the probability distribution for X is?
     - Is it always the case that if we define A_n := (X_1 + X_2 + ... + X_n)/n, then A_n is typically close to some fixed value when n is large?
     - What if X is Cauchy?
     - In this strange and delightful case, A_n actually has the same probability distribution as X.
     - In particular, the A_n are not tightly concentrated around any particular value, even when n is very large.
     - But the weak law holds as long as E[|X|] is finite, so that µ is well defined.
     - One standard proof uses characteristic functions.
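
A final illustration (my addition): interquartile ranges of Cauchy sample averages do not shrink as n grows, consistent with A_n being standard Cauchy for every n.

```python
import numpy as np

rng = np.random.default_rng(11)

for n in [1, 10, 1000]:
    A = rng.standard_cauchy((5_000, n)).mean(axis=1)  # 5,000 independent copies of A_n
    # the interquartile range stays near 2 for all n: no concentration
    print(n, np.percentile(A, 75) - np.percentile(A, 25))
```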
