SLIDE 1 CS70: Jean Walrand: Lecture 32.
Chernoff, Jensen, Polling, Confidence Intervals, Linear Regression
- 1. About M3
- 2. Chernoff
- 3. Jensen
- 4. Polling
- 5. Confidence Intervals
- 6. Linear Regression
SLIDE 2
About M3
Not easy. Definitely! Should I worry? Why? Me worry? Probability takes a while to get used to. The math looks trivial, but what the ... are we computing? Be patient! Patience has its own rewards.
SLIDE 3
Some Mysteries
◮ A probability space is a set Ω with probabilities assigned to
elements ...
◮ It is uniform if all the elements have the same probability ...
◮ Let Ω = {1,2,3,4} be a uniform probability space ...
◮ Say what! Never heard of that before!!!
◮ A random variable is a function X : Ω → ℜ ...
◮ Define two random variables on the uniform probability space Ω = {1,2,3,4} so that ...
◮ Let me try: “If you first get an odd number, then X = 2; if you then get an even number, then Y = −3 ...”
◮ What happened?
◮ Gee!!, these are conceptual questions! Not like the homework!! Nor the M3 review!!
◮ Meaning, “to do the homework, we did not need to understand probability spaces or random variables.” Really??
SLIDE 4
Seriously Folks!
You have time to get these ideas straight. If you knew it all already, you would not learn anything from this course. It is not that complicated. You will get to the bottom of this! A midterm de-briefing will take place next week. Time and place TBA on Piazza.
SLIDE 5
Sample Question
Question: On the uniform probability space Ω := {1,2,3,4}, define RVs X and Y such that E[XY] = E[X]E[Y] even though X and Y are not independent. Recall from the M3 review in lecture: E[XY] = E[X]E[Y] if X and Y are independent (that is, “if,” not “only if”). We have to define X : Ω → ℜ and Y : Ω → ℜ so that .... Let us try: We see that XY = 0, so E[XY] = 0 = E[X]E[Y]. Also, X, Y are not independent. Note that X = 0 and Y = ... does not work, because then X and Y are independent.
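One concrete construction that fits the slide's hints (XY = 0 everywhere) can be checked by brute force; the particular values of X and Y below are an assumption for illustration, not necessarily the lecture's choice:

```python
# Uniform probability space Omega = {1, 2, 3, 4}, each outcome with probability 1/4.
omega = [1, 2, 3, 4]
p = {w: 1 / 4 for w in omega}

# Assumed construction: X lives on {1, 2}, Y lives on {3, 4}, so XY = 0 everywhere.
X = {1: 1, 2: -1, 3: 0, 4: 0}
Y = {1: 0, 2: 0, 3: 1, 4: -1}

E = lambda f: sum(f[w] * p[w] for w in omega)   # expectation of an RV given as a dict
E_XY = sum(X[w] * Y[w] * p[w] for w in omega)

print(E_XY, E(X) * E(Y))   # both are 0, so E[XY] = E[X]E[Y]

# Not independent: Pr[X = 1, Y = 1] = 0, but Pr[X = 1] Pr[Y = 1] = (1/4)(1/4) > 0.
pr_X1 = sum(p[w] for w in omega if X[w] == 1)
pr_Y1 = sum(p[w] for w in omega if Y[w] == 1)
pr_both = sum(p[w] for w in omega if X[w] == 1 and Y[w] == 1)
print(pr_both, pr_X1 * pr_Y1)
```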
SLIDE 6
Herman Chernoff
Herman Chernoff (born July 1, 1923, New York) is an American applied mathematician, statistician and physicist, formerly a professor at MIT and currently working at Harvard University.
SLIDE 7 Chernoff Faces
Figure: This example shows Chernoff faces for lawyers’ ratings of twelve judges.
Chernoff faces, invented by Herman Chernoff, display multivariate data in the shape of a human face. The individual parts, such as eyes, ears, mouth and nose, represent values of the variables by their shape, size, placement and orientation.
SLIDE 8
Chernoff’s Inequality
Chernoff’s inequality is due to Herman Rubin, continuing the tradition started with Markov’s inequality (which is due to Chebyshev).
Theorem (Chernoff’s Inequality):
Pr[X ≥ a] ≤ min_{θ>0} E[e^{θX}] / e^{θa}.
Proof: We use Markov’s inequality with f(x) = e^{θx} for θ > 0. We find
Pr[X ≥ a] ≤ E[f(X)] / f(a) = E[e^{θX}] / e^{θa}.
Since the inequality holds for all θ > 0, this concludes the proof.
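A quick numerical sanity check of the theorem (not from the lecture): for X = B(20, 0.5) and a = 15, minimize the bound over a grid of θ values and compare with the exact tail probability. The grid search is an assumption; the lecture optimizes analytically.

```python
import math

n, p, a = 20, 0.5, 15

# Exact binomial tail Pr[X >= a] for X = B(n, p).
binom_pmf = lambda k: math.comb(n, k) * p**k * (1 - p)**(n - k)
tail = sum(binom_pmf(k) for k in range(a, n + 1))

# Chernoff: Pr[X >= a] <= E[e^{theta X}] / e^{theta a} for every theta > 0.
# For B(n, p), E[e^{theta X}] = (p e^theta + 1 - p)^n.
def chernoff(theta):
    return (p * math.exp(theta) + 1 - p)**n / math.exp(theta * a)

# Minimize over a grid of theta in (0, 5).
best = min(chernoff(t / 1000) for t in range(1, 5000))

print(tail, best)
assert tail <= best   # the bound really is an upper bound
```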
SLIDE 9 Chernoff’s Inequality and B(n,p)
Let X = B(n,p). We want a bound on Pr[X ≥ a]. Since X = X1 + ··· + Xn with Pr[Xm = 1] = p = 1 − Pr[Xm = 0], we have
E[e^{θX}] = E[e^{θ(X1+···+Xn)}] = E[e^{θX1} × ··· × e^{θXn}] = (E[e^{θX1}])^n = [pe^θ + (1−p)]^n,
by independence. Thus,
Pr[X ≥ a] ≤ [pe^θ + (1−p)]^n / e^{θa}.
We minimize the RHS over θ > 0 and we find (after some algebra ...)
Pr[X ≥ a] ≤ Pr[X = a] / Pr[Y = a], where Y = B(n, a/n).
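The "some algebra" can be verified numerically: setting the derivative of the exponent to zero gives the minimizer θ* = ln(a(1−p)/((n−a)p)) (valid when a > np), and plugging it in matches the Pr[X = a]/Pr[Y = a] form from the slide. The specific n, p, a are assumptions for illustration:

```python
import math

n, p, a = 20, 0.5, 15   # need a > np for theta* > 0

def binom_pmf(n, p, k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Minimizer of [p e^theta + 1 - p]^n / e^{theta a} over theta > 0.
theta = math.log(a * (1 - p) / ((n - a) * p))
bound = (p * math.exp(theta) + 1 - p)**n / math.exp(theta * a)

# The slide's closed form: Pr[X = a] / Pr[Y = a] with Y = B(n, a/n).
ratio = binom_pmf(n, p, a) / binom_pmf(n, a / n, a)

print(bound, ratio)   # the two agree up to floating-point rounding
```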
SLIDE 10
Chernoff’s Inequality and B(n,p)
Here is a picture: [figure omitted — the Chernoff bound for B(n,p)].
SLIDE 11
Chernoff’s Inequality and P(λ)
Let X = P(λ). We want a bound on Pr[X ≥ a]. We have
E[e^{θX}] = ∑_{n≥0} e^{θn} (λ^n / n!) e^{−λ} = ∑_{n≥0} ((λe^θ)^n / n!) e^{−λ} = exp{λe^θ} exp{−λ} = exp{λ(e^θ − 1)}.
Thus,
Pr[X ≥ a] ≤ E[e^{θX}] / e^{θa} = exp{λ(e^θ − 1) − θa}.
We minimize over θ > 0 and we find (after some algebra)
Pr[X ≥ a] ≤ (λ/a)^a e^{a−λ} = Pr[X = a] / Pr[Y = a], where Y = P(a).
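The Poisson closed form can be checked the same way; λ and a below are assumptions for illustration (with a > λ, the regime where the bound applies):

```python
import math

lam, a = 4.0, 9

pois_pmf = lambda mu, k: mu**k * math.exp(-mu) / math.factorial(k)

# Minimized Chernoff bound for X = P(lambda): (lambda/a)^a * e^{a - lambda}.
bound = (lam / a)**a * math.exp(a - lam)

# The slide's closed form: Pr[X = a] / Pr[Y = a] with Y = P(a).
ratio = pois_pmf(lam, a) / pois_pmf(a, a)

# Exact tail Pr[X >= a] as 1 minus the CDF at a - 1.
tail = 1 - sum(pois_pmf(lam, k) for k in range(a))

print(tail, bound, ratio)
assert tail <= bound
```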
SLIDE 12
Chernoff’s Inequality and P(λ)
Here is a picture: [figure omitted — the Chernoff bound for P(λ)].
SLIDE 13
Chernoff’s Inequality
Chernoff’s inequality is typically used to estimate
Pr[(X1 + ··· + Xn)/n ≥ a],
where X1, ..., Xn are independent and have the same distribution, n ≫ 1, and a > E[X1]. We expect the average (X1 + ··· + Xn)/n to be close to the mean, so that the desired probability is small. Chernoff’s inequality yields useful bounds. It works because
E[exp{θ(X1 + ··· + Xn)/n}] = (E[exp{θX1/n}])^n,
by independence. Thus, Chernoff’s bound is typically used for rare events. Herman Chernoff is now 92, a rare event.
SLIDE 14 Jensen’s Inequality
A function g(x) is convex if it lies above all its tangents. Consider the tangent at the point (E[X], g(E[X])); if a denotes its slope, then
g(X) ≥ g(E[X]) + a(X − E[X]).
Taking expectations, we conclude that
g(·) convex ⇒ E[g(X)] ≥ g(E[X]).
SLIDE 15
Jensen’s Inequality: Examples
◮ E[|X|] ≥ |E[X]|
◮ E[X^4] ≥ E[X]^4
◮ E[e^{θX}] ≥ e^{θE[X]}
◮ E[ln(X)] ≤ ln(E[X]) (ln is concave, so the inequality flips)
◮ E[max{X^2, 1+X}] ≥ max{E[X]^2, 1+E[X]}
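These examples are easy to verify on a small distribution; the values and probabilities below are arbitrary assumptions (chosen positive so that ln is defined):

```python
import math

# A small distribution: values of X and their probabilities.
xs = [1, 2, 3, 4]
ps = [0.1, 0.2, 0.3, 0.4]

E = lambda g: sum(g(x) * q for x, q in zip(xs, ps))   # E[g(X)]
mean = E(lambda x: x)                                  # E[X]

# Convex g: E[g(X)] >= g(E[X]).
for g in (abs, lambda x: x**4, math.exp):
    assert E(g) >= g(mean)

# Concave h (here ln): the inequality flips.
assert E(math.log) <= math.log(mean)
print("Jensen holds on this example")
```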
SLIDE 16
Polling: Problem
Here is a central question about polling. Setup: Assume people vote democrat with probability p. We poll n people. Let An be the fraction of those who vote democrat. Question: How large should n be so that Pr[|An −p| ≥ 0.1] ≤ 0.05?
SLIDE 17 Polling: Analysis
Recall the problem: Find n so that Pr[|An − p| ≥ 0.1] ≤ 0.05.
Approach: Chebyshev!
Recall Chebyshev’s inequality: Pr[|X − E[X]| ≥ a] ≤ var[X] / a^2.
Here, X = An = Y/n where Y is the number of people out of n who vote democrat. Thus, Y = B(n,p). Hence, E[Y] = np and var[Y] = np(1−p). Consequently, E[X] = p and var[X] = p(1−p)/n. This gives
Pr[|An − p| ≥ 0.1] ≤ p(1−p) / (n(0.1)^2) = 100p(1−p) / n.
However, we do not know p. What should we do? We know that p(1−p) ≤ 1/4. Hence,
Pr[|An − p| ≥ 0.1] ≤ 25/n.
Thus, if n = 500, we find Pr[|An − p| ≥ 0.1] ≤ 0.05.
SLIDE 18
Estimation
Common problem: Estimate a mean value. Examples: height, weight, lifetime, arrival rate, job duration, ....
Setup: X1, X2, ... are independent RVs with E[Xn] = µ and var[Xn] = σ^2. We observe {X1, ..., Xn} and want to estimate µ.
Approach: Let An = (X1 + ··· + Xn)/n be the average (sample mean). Then, E[An] = µ and var[An] = σ^2/n. Using Chebyshev:
Pr[|An − µ| ≥ a] ≤ σ^2 / (na^2).
Thus, Pr[|An − µ| ≥ a] → 0 as n → ∞. This is the WLLN, as we know.
SLIDE 19
Confidence Interval
How much confidence do we have in our estimate? Recall the setup: Xm independent with mean µ and variance σ^2. Chebyshev told us
Pr[|An − µ| ≥ a] ≤ σ^2 / (na^2).
This probability is at most δ if σ^2 ≤ na^2δ, i.e., if a ≥ σ/√(nδ). Thus,
Pr[|An − µ| ≥ σ/√(nδ)] ≤ δ.
Equivalently,
Pr[µ ∈ [An − σ/√(nδ), An + σ/√(nδ)]] ≥ 1 − δ.
We say that [An − σ/√(nδ), An + σ/√(nδ)] is a (1−δ)-confidence interval for µ.
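As a sketch, the interval above is a one-line computation; the sample mean, σ, n, and δ below are assumed values for illustration:

```python
import math

def conf_interval(A_n, sigma, n, delta):
    """Chebyshev-based (1 - delta)-confidence interval for mu, sigma known."""
    half = sigma / math.sqrt(n * delta)
    return (A_n - half, A_n + half)

# Assumed numbers: sample mean 10.2, sigma = 2, n = 400 samples, delta = 0.05.
lo, hi = conf_interval(A_n=10.2, sigma=2.0, n=400, delta=0.05)
print(lo, hi)
```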
SLIDE 20
Confidence Interval, continued
We just found out that [An − σ/√(nδ), An + σ/√(nδ)] is a (1−δ)-confidence interval for µ. For δ = 0.05, since 1/√0.05 ≈ 4.5, this shows that [An − 4.5σ/√n, An + 4.5σ/√n] is a 95%-confidence interval for µ. A more refined analysis, using the Central Limit Theorem, allows us to replace 4.5 by 2.
SLIDE 21
CI with Unknown Variance
If σ is not known, we replace it by the estimate sn:
sn^2 = (1/n) ∑_{m=1}^{n} (Xm − An)^2.
Thus, we expect that, for n large enough (e.g., larger than 20),
[An − 2sn/√n, An + 2sn/√n]
is a 95%-confidence interval for µ. Does this work well? The theory says we have to be careful: the error in estimating σ may throw us off. What is known is that if the Xm have a nice distribution (e.g., Gaussian), and if n is not too small (say ≥ 15), then this is fine.
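The whole recipe fits in a few lines; the Gaussian samples, true mean, and σ below are assumptions for the simulation:

```python
import math
import random

random.seed(1)
mu, sigma, n = 5.0, 2.0, 100                 # assumed true mean, std dev, sample size
xs = [random.gauss(mu, sigma) for _ in range(n)]

A_n = sum(xs) / n                             # sample mean
s_n = math.sqrt(sum((x - A_n)**2 for x in xs) / n)   # the slide's estimate of sigma

# 95%-confidence interval using the estimated s_n in place of sigma.
lo, hi = A_n - 2 * s_n / math.sqrt(n), A_n + 2 * s_n / math.sqrt(n)
print((lo, hi), "contains mu:", lo <= mu <= hi)
```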
SLIDE 22
CI for Pr[H]
CIs using the upper bound σ ≤ 0.5 or using the estimated sn.
SLIDE 23
Linear Regression
An example: Random experiment: select people at random and plot their (age, height). You get (Xn,Yn) for n = 1,...,N where Xn = age and Yn = height for person n. The linear regression is a guess a+bXn for Yn that is close to the true values, in some sense to be made precise.
SLIDE 24
Linear Regression
Another example:
SLIDE 25 LR: Two Viewpoints
Linear regression: a+bXn is a guess for Yn. There are two ways to look at the linear regression: Bayesian and non-Bayesian.
◮ Bayesian Viewpoint:
◮ We have a prior: Pr[X = x, Y = y], x = ..., y = ...;
◮ We choose (a,b) to minimize E[(Y − a − bX)^2].
◮ Non-Bayesian Viewpoint:
◮ We have no prior, but samples: {(Xn, Yn), n = 1, ..., N};
◮ We choose (a,b) to minimize ∑_{n=1}^{N} (Yn − a − bXn)^2.
◮ We hope Yk ≈ a+bXk for future samples.
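The non-Bayesian minimization has a closed form: b = cov(X,Y)/var(X) and a = mean(Y) − b·mean(X). A minimal sketch, with made-up (age, height) data for illustration:

```python
def linear_regression(X, Y):
    """Least squares: minimize sum over n of (Y_n - a - b X_n)^2."""
    N = len(X)
    mx, my = sum(X) / N, sum(Y) / N
    cov = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / N
    var = sum((x - mx)**2 for x in X) / N
    b = cov / var
    return my - b * mx, b

# Toy (age, height) data, invented for this example.
X = [4, 8, 12, 16, 20]
Y = [100, 125, 150, 165, 172]
a, b = linear_regression(X, Y)
print(a, b)   # the guess for Y is a + b*X
```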
SLIDE 26
Summary
Chernoff, Jensen, Polling, Confidence Intervals, Linear Regression
◮ Chernoff: Pr[X ≥ a] ≤ min_{θ>0} E[e^{θ(X−a)}].
◮ Jensen: E[c(X)] ≥ c(E[X]) for c(·) convex.
◮ Polling: How many people to poll?
◮ Confidence Interval: sample mean ± 2σ/√n.
◮ Linear Regression: Y ≈ a + bX. B or not B?