
Inequality, Probability, and Joviality
• In many cases, we don't know the true form of a probability distribution
  - E.g., Midterm scores
  - But, we know the mean
  - May also have other measures/properties
    o Variance
    o Non-negativity
    o Etc.
  - Inequalities and bounds still allow us to say something about the probability distribution in such cases
    o May be imprecise compared to knowing true distribution!

Markov's Inequality
• Say X is a non-negative random variable
  $P(X \ge a) \le \frac{E[X]}{a}$, for all $a > 0$
• Proof:
  - $I = 1$ if $X \ge a$, 0 otherwise
  - Since $X \ge 0$, $I \le \frac{X}{a}$
  - Taking expectations:
    $E[I] = P(X \ge a) \le E\left[\frac{X}{a}\right] = \frac{E[X]}{a}$

Andrey Andreyevich Markov
• Andrey Andreyevich Markov (1856-1922) was a Russian mathematician
  - Markov's Inequality is named after him
  - He also invented Markov Chains…
    o …which are the basis for Google's PageRank algorithm
  - His facial hair inspires fear in Charlie Sheen

Markov and the Midterm
• Statistics from last quarter's CS109 midterm
  - X = midterm score
  - Using sample mean $\bar{X} = 78.1 \approx E[X]$
  - What is $P(X \ge 91)$?
    $P(X \ge 91) \le \frac{E[X]}{91} = \frac{78.1}{91} \approx 0.8582$
  - Markov bound: ≤ 85.82% of class scored 91 or greater
  - In fact, 34.44% of class scored 91 or greater
    o Markov inequality can be a very loose bound
    o But, it made no assumption at all about form of distribution!
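To make the midterm arithmetic concrete, here is a minimal Python sketch of the Markov bound. The function name `markov_bound` and the simulated exponential sample are illustrative assumptions, not part of the slides; only the numbers 78.1 and 91 come from the midterm example.

```python
import random

def markov_bound(mean, a):
    """Markov upper bound on P(X >= a) for a non-negative X with E[X] = mean."""
    if a <= 0:
        raise ValueError("a must be positive")
    # A probability can never exceed 1, so cap the bound there.
    return min(1.0, mean / a)

# Midterm figures quoted on the slide: sample mean 78.1, threshold 91.
midterm_bound = markov_bound(78.1, 91)
print(f"Markov bound on P(X >= 91): {midterm_bound:.4f}")

# Hypothetical check on simulated non-negative data (an assumption for
# illustration; the real midterm scores are not available here).
random.seed(0)
sample = [random.expovariate(1 / 78.1) for _ in range(100_000)]
sample_mean = sum(sample) / len(sample)
empirical_tail = sum(x >= 91 for x in sample) / len(sample)
sample_bound = markov_bound(sample_mean, 91)
print(f"simulated tail {empirical_tail:.4f} vs bound {sample_bound:.4f}")
```

The bound uses nothing but the mean, which is why it holds for the simulated sample even though that sample looks nothing like real exam scores.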
Chebyshev's Inequality
• X is a random variable with $E[X] = \mu$, $Var(X) = \sigma^2$
  $P(|X - \mu| \ge k) \le \frac{\sigma^2}{k^2}$, for all $k > 0$
• Proof:
  - Since $(X - \mu)^2$ is a non-negative random variable, apply Markov's Inequality with $a = k^2$:
    $P((X - \mu)^2 \ge k^2) \le \frac{E[(X - \mu)^2]}{k^2} = \frac{\sigma^2}{k^2}$
  - Note that: $(X - \mu)^2 \ge k^2 \iff |X - \mu| \ge k$, yielding:
    $P(|X - \mu| \ge k) \le \frac{\sigma^2}{k^2}$

Pafnuty Chebyshev
• Pafnuty Lvovich Chebyshev (1821-1894) was also a Russian mathematician
  - Chebyshev's Inequality is named after him
    o But actually formulated by his colleague Irénée-Jules Bienaymé
  - He was Markov's doctoral advisor
    o And sometimes credited with first deriving Markov's Inequality
  - There is a crater on the moon named in his honor
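The inequality can be checked numerically. The sketch below is an illustration under stated assumptions: the helper `chebyshev_bound` and the uniform sample on [0, 100] are hypothetical choices, not taken from the slides.

```python
import random
import statistics

def chebyshev_bound(variance, k):
    """Chebyshev upper bound on P(|X - mu| >= k) given Var(X) = variance."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, variance / k**2)

# Hypothetical data: uniform scores on [0, 100] (an assumption for illustration).
random.seed(1)
sample = [random.uniform(0, 100) for _ in range(50_000)]
mu = statistics.fmean(sample)
var = statistics.pvariance(sample, mu)

# Compare the empirical two-sided tail with the bound for a few values of k.
rows = []
for k in (30, 40, 50):
    empirical = sum(abs(x - mu) >= k for x in sample) / len(sample)
    rows.append((k, empirical, chebyshev_bound(var, k)))
    print(f"k={k}: empirical {empirical:.4f} <= bound {rows[-1][2]:.4f}")
```

As with Markov's inequality, the bound is valid but often far from the true tail probability, because it uses only the mean and variance.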

Of the Midterm What Say You Chebyshev?
• Statistics from last quarter's CS109 midterm
  - X = midterm score
  - Using sample mean $\bar{X} = 78.1 \approx E[X]$
  - Using sample variance $S^2 = (24.5)^2 = 600.25 \approx \sigma^2$
  - What is $P(|X - 78.1| \ge 30)$?
    $P(|X - E[X]| \ge 30) \le \frac{\sigma^2}{(30)^2} = \frac{600.25}{900} \approx 0.6669$
    $P(|X - E[X]| < 30) = 1 - P(|X - E[X]| \ge 30) \ge 1 - 0.6669 = 0.3331$
  - Chebyshev bound: ≤ 66.69% scored ≥ 108.1 or ≤ 48.1
  - In fact, 21.85% of class scored ≥ 108.1 or ≤ 48.1
    o Chebyshev's inequality is really a theoretical tool

One-Sided Chebyshev's Inequality
• X is a random variable with $E[X] = 0$, $Var(X) = \sigma^2$
  $P(X \ge a) \le \frac{\sigma^2}{\sigma^2 + a^2}$, for any $a > 0$
  - Equivalently, when $E[Y] = \mu$ and $Var(Y) = \sigma^2$:
    $P(Y \ge E[Y] + a) \le \frac{\sigma^2}{\sigma^2 + a^2}$, for any $a > 0$
    $P(Y \le E[Y] - a) \le \frac{\sigma^2}{\sigma^2 + a^2}$, for any $a > 0$
  - Follows directly by setting $X = Y - E[Y]$, noting $E[X] = 0$

Comments on Midterm, One-Sided One?
• Statistics from last quarter's CS109 midterm
  - X = midterm score
  - Using sample mean $\bar{X} = 78.1 \approx E[X]$
  - Using sample variance $S^2 = (24.5)^2 = 600.25 \approx \sigma^2$
  - What is $P(X \ge 103.1)$?
    $P(X \ge 78.1 + 25) \le \frac{600.25}{600.25 + (25)^2} \approx 0.4899$
  - One-sided Chebyshev bound: ≤ 48.99% scored ≥ 103.1
  - In fact, 13.26% of class scored ≥ 103.1
  - Using Markov's inequality:
    $P(X \ge 103.1) \le \frac{78.1}{103.1} \approx 0.7575$

Chernoff Bound
• Say we have MGF, $M(t)$, for a random variable X
  - Chernoff bounds:
    $P(X \ge a) \le e^{-ta} M(t)$, for all $t > 0$
    $P(X \le a) \le e^{-ta} M(t)$, for all $t < 0$
  - Bounds hold for $t \ne 0$, so use the t that minimizes $e^{-ta} M(t)$
• Proof:
  - X has MGF: $M(t) = E[e^{tX}]$
  - Note $P(X \ge a) = P(e^{tX} \ge e^{ta})$, use Markov's inequality:
    $P(X \ge a) = P(e^{tX} \ge e^{ta}) \le \frac{E[e^{tX}]}{e^{ta}} = e^{-ta} E[e^{tX}] = e^{-ta} M(t)$, for all $t > 0$
  - Similarly for $P(X \le a)$ when $t < 0$

Chernoff's Feeling (Unit) Normal
• Z is standard normal random variable: Z ~ N(0, 1)
  - Moment generating function: $M_Z(t) = e^{t^2/2}$
  - Chernoff bounds for $P(Z \ge a)$:
    $P(Z \ge a) \le e^{-ta} e^{t^2/2} = e^{t^2/2 - ta}$, for all $t > 0$
  - To minimize bound, minimize: $t^2/2 - ta$
    o Differentiate w.r.t. t, and set to 0: $t - a = 0 \Rightarrow t = a$
    $P(Z \ge a) \le e^{-a^2/2}$, for all $t = a > 0$
  - Can proceed similarly for $t = a < 0$ to obtain:
    $P(Z \le a) \le e^{-a^2/2}$, for all $t = a < 0$
  - Compare to:
    $P(Z \ge z) = 1 - P(Z < z) = 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-x^2/2} \, dx$

Herman Chernoff
• Herman Chernoff (1923-) is an American mathematician and statistician
  - Chernoff Bound is named after him
    o And it actually was derived by him!
  - He is Professor Emeritus of Applied Mathematics at MIT and of Statistics at Harvard University
    o I do not know if he is a fan of Charlie Sheen
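For the standard normal, the one-sided Chebyshev bound and the Chernoff bound can be set against the exact tail. The following Python sketch is illustrative only: the function names are assumptions, and the exact tail is computed with `math.erfc` rather than by the integral written on the slide.

```python
import math

def one_sided_chebyshev(variance, a):
    """Bound on P(X >= E[X] + a) for a > 0, given Var(X) = variance."""
    return variance / (variance + a**2)

def chernoff_normal(a):
    """Chernoff bound e^{-a^2/2} on P(Z >= a) for Z ~ N(0, 1), a > 0."""
    return math.exp(-a**2 / 2)

def exact_normal_tail(a):
    """Exact P(Z >= a) for Z ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(a / math.sqrt(2))

# The midterm one-sided bound from the slides: variance 600.25, a = 25.
print(f"midterm one-sided bound: {one_sided_chebyshev(600.25, 25):.4f}")

# Standard normal: variance 1, so the one-sided bound is 1 / (1 + a^2).
for a in (0.5, 1.0, 2.0, 3.0):
    print(f"a={a}: exact {exact_normal_tail(a):.5f}, "
          f"one-sided Chebyshev {one_sided_chebyshev(1.0, a):.5f}, "
          f"Chernoff {chernoff_normal(a):.5f}")
```

Neither bound dominates everywhere. For small a the one-sided Chebyshev bound is the smaller of the two, while the exponentially decaying Chernoff bound takes over as a grows.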

Chernoff's Poisson Pill
• X is Poisson random variable: X ~ Poi($\lambda$)
  - Moment generating function: $M_X(t) = e^{\lambda(e^t - 1)}$
  - Chernoff bounds for $P(X \ge i)$:
    $P(X \ge i) \le e^{\lambda(e^t - 1)} e^{-it} = e^{\lambda(e^t - 1) - it}$, for all $t > 0$
  - To minimize bound, minimize: $\lambda(e^t - 1) - it$
    o Differentiate w.r.t. t, and set to 0: $\lambda e^t - i = 0 \Rightarrow e^t = i/\lambda$
    $P(X \ge i) \le e^{\lambda(i/\lambda - 1)} \left(\frac{\lambda}{i}\right)^i = e^{-\lambda} e^{i} \left(\frac{\lambda}{i}\right)^i = e^{-\lambda} \left(\frac{e\lambda}{i}\right)^i$, for all $i/\lambda > 1$
  - Compare to: $P(X = i) = e^{-\lambda} \frac{\lambda^i}{i!}$

Jensen's Inequality
• If $f(x)$ is a convex function then $E[f(X)] \ge f(E[X])$
  - $f(x)$ is convex if $f''(x) \ge 0$ for all x
  - Intuition: Convex = "bowl". E.g.: $f(x) = x^2$, $f(x) = e^x$
  - If $g(x) = -f(x)$ is convex, then $f(x)$ is concave
  - Proof outline: Taylor series of $f(x)$ about $\mu$. Be happy.
  - Note: $E[f(X)] = f(E[X])$ only holds when $f(x)$ is a line
    o That is when: $f''(x) = 0$ for all x

Johan Jensen
• Johan Ludwig William Valdemar Jensen (1859-1925) was a Danish mathematician
  - He derived Jensen's inequality
  - He was president of the Danish Mathematical Society from 1892 to 1903
  - He has more names than Charlie Sheen

A Brief Digression on Utility Theory
• Utility U(x) is "value" you derive from x
  [Figure: decision tree for "Play?". Yes: $20,000 with probability 0.5 or $0 with probability 0.5. No: $10,000. A second copy of the tree labels the outcomes with utilities U($20,000), U($0), and U($10,000).]
  - Can be monetary, but often includes intangibles
    o E.g., quality of life, life expectancy, personal beliefs, etc.
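The Poisson Chernoff bound can be compared with the exact tail probability obtained by summing the PMF. This is a small sketch under assumptions: the function names and the choice of rate 5 are hypothetical, chosen only for illustration.

```python
import math

def poisson_chernoff(lam, i):
    """Chernoff bound e^{-lam} * (e*lam/i)^i on P(X >= i); requires i > lam."""
    if i <= lam:
        raise ValueError("the optimized bound needs i / lam > 1")
    return math.exp(-lam) * (math.e * lam / i) ** i

def poisson_tail(lam, i):
    """Exact P(X >= i) for integer i, as 1 minus the PMF summed over j < i."""
    below = sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(i))
    return 1.0 - below

lam = 5.0  # hypothetical rate, chosen only for illustration
for i in (6, 10, 15):
    print(f"i={i}: exact tail {poisson_tail(lam, i):.6f}, "
          f"Chernoff bound {poisson_chernoff(lam, i):.6f}")
```

The restriction i > lam mirrors the condition i/λ > 1 in the derivation: the minimizing t = ln(i/λ) must be positive for the upper-tail bound to apply.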
Jensen's Investment Advice
• Example: risk-taking investor, with two choices:
  - Choice 1: Invest money to get return X where $E[X] = \mu$
  - Choice 2: Invest money to get return $\mu$ (probability 1)
• Want to maximize utility: $u(R)$, where R is return
  - If $u(X)$ convex then $E[u(X)] \ge u(\mu)$, so choice 1 better
  - If $u(X)$ concave then $E[u(X)] \le u(\mu)$, so choice 2 better
  - Convex u ⇒ "risk preferring", concave u ⇒ "risk averse"

Utility Curves
  [Figure: utility curves plotted as Utility against Dollars.]
• Utility curve determines your "risk preference"
  - Can be different in different parts of the curve
  - We'll talk more about this near the end of the quarter
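Jensen's inequality can be seen at work on the $20,000-or-$0 gamble from the utility theory slide. In the sketch below the two utility functions (a square for the convex case, a square root for the concave case) are hypothetical choices for illustration, not curves given in the slides.

```python
import math

# The gamble from the utility slide: (probability, dollars).
gamble = [(0.5, 0.0), (0.5, 20_000.0)]

def expected(f, dist):
    """E[f(X)] for a discrete distribution given as (probability, value) pairs."""
    return sum(p * f(x) for p, x in dist)

mu = expected(lambda x: x, gamble)  # expected return of the gamble

def convex_u(x):
    """Hypothetical convex (risk-preferring) utility."""
    return x**2

def concave_u(x):
    """Hypothetical concave (risk-averse) utility."""
    return math.sqrt(x)

for name, u in (("convex", convex_u), ("concave", concave_u)):
    gamble_utility = expected(u, gamble)   # E[u(X)], choice 1
    sure_utility = u(mu)                   # u(mu), choice 2
    better = "gamble" if gamble_utility > sure_utility else "sure thing"
    print(f"{name}: E[u(X)] = {gamble_utility:.2f}, u(mu) = {sure_utility:.2f}, prefer {better}")
```

With the convex utility the gamble wins, and with the concave utility the sure $10,000 wins, matching the two cases on the slide.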
