1. Viva La Correlación!

• Say X and Y are arbitrary random variables
• Correlation of X and Y, denoted ρ(X, Y):

    ρ(X, Y) = Cov(X, Y) / sqrt(Var(X) Var(Y))

• Note: -1 ≤ ρ(X, Y) ≤ 1
• Correlation measures linearity between X and Y
  – ρ(X, Y) = 1  →  Y = aX + b, where a = σ_y / σ_x
  – ρ(X, Y) = -1 →  Y = aX + b, where a = -σ_y / σ_x
  – ρ(X, Y) = 0  →  absence of linear relationship
    ◦ But X and Y can still be related in some other way!
  – If ρ(X, Y) = 0, we say X and Y are "uncorrelated"
    ◦ Note: independence implies uncorrelated, but not vice versa!

Fun with Indicator Variables

• Let I_A and I_B be indicators for events A and B:
  – I_A = 1 if A occurs, 0 otherwise; I_B = 1 if B occurs, 0 otherwise
• E[I_A] = P(A),  E[I_B] = P(B),  E[I_A I_B] = P(AB)
• Cov(I_A, I_B) = E[I_A I_B] − E[I_A] E[I_B]
                = P(AB) − P(A)P(B)
                = P(A | B)P(B) − P(A)P(B)
                = P(B)[P(A | B) − P(A)]
• So Cov(I_A, I_B) is determined by the sign of P(A | B) − P(A):
  – P(A | B) > P(A)  →  ρ(I_A, I_B) > 0
  – P(A | B) = P(A)  →  ρ(I_A, I_B) = 0 (and Cov(I_A, I_B) = 0)
  – P(A | B) < P(A)  →  ρ(I_A, I_B) < 0

Can't Get Enough of that Multinomial

• Multinomial distribution
  – n independent trials of an experiment are performed
  – Each trial results in one of m outcomes, with respective probabilities p_1, p_2, …, p_m, where Σ_i p_i = 1
  – X_i = number of trials with outcome i

    P(X_1 = c_1, X_2 = c_2, …, X_m = c_m) = (n choose c_1, c_2, …, c_m) p_1^c_1 p_2^c_2 ⋯ p_m^c_m

  – E.g., rolling a 6-sided die multiple times and counting how many of each value {1, 2, 3, 4, 5, 6} we get
• Would expect that the X_i are negatively correlated
  – Let's see... when i ≠ j, what is Cov(X_i, X_j)?

Covariance and the Multinomial

• Computing Cov(X_i, X_j)
  – Indicator I_i(k) = 1 if trial k has outcome i, 0 otherwise
  – E[I_i(k)] = p_i,  X_i = Σ_{k=1}^n I_i(k),  X_j = Σ_{k=1}^n I_j(k)
  – Cov(X_i, X_j) = Σ_{a=1}^n Σ_{b=1}^n Cov(I_i(a), I_j(b))
  – When a ≠ b, trials a and b are independent: Cov(I_i(a), I_j(b)) = 0
  – When a = b: Cov(I_i(a), I_j(a)) = E[I_i(a) I_j(a)] − E[I_i(a)] E[I_j(a)]
  – Since trial a cannot have both outcome i and outcome j: E[I_i(a) I_j(a)] = 0

    Cov(X_i, X_j) = Σ_{a=1}^n (−E[I_i(a)] E[I_j(a)]) = Σ_{a=1}^n (−p_i p_j) = −n p_i p_j

  – X_i and X_j are negatively correlated

Multinomials All Around

• Multinomial distributions:
  – Count of strings hashed across buckets in a hash table
  – Number of server requests across machines in a cluster
  – Distribution of words/tokens in an email
  – Etc.
• When m (# outcomes) is large, each p_i is small
  – For equally likely outcomes, p_i = 1/m:  Cov(X_i, X_j) = −n p_i p_j = −n/m²
  – Large m  →  X_i and X_j very mildly negatively correlated
  – Poisson paradigm still applicable

Conditional Expectation

• X and Y are jointly discrete random variables
  – Recall the conditional PMF of X given Y = y:

    p_{X|Y}(x | y) = P(X = x | Y = y) = p_{X,Y}(x, y) / p_Y(y)

• Define the conditional expectation of X given Y = y:

    E[X | Y = y] = Σ_x x P(X = x | Y = y) = Σ_x x p_{X|Y}(x | y)

• Analogously, for jointly continuous random variables:

    f_{X|Y}(x | y) = f_{X,Y}(x, y) / f_Y(y)
    E[X | Y = y] = ∫ x f_{X|Y}(x | y) dx
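The identity Cov(I_A, I_B) = P(AB) − P(A)P(B) = P(B)[P(A | B) − P(A)] can be checked exactly with rational arithmetic. The events below are illustrative choices, not from the slides: one roll of a fair die, A = "roll ≥ 4", B = "roll is even", so P(A | B) = 2/3 > P(A) = 1/2 and the covariance is positive.

```python
from fractions import Fraction as F

# Exact check of Cov(I_A, I_B) = P(AB) - P(A)P(B) = P(B)[P(A|B) - P(A)]
# for one roll of a fair 6-sided die, with (assumed, illustrative) events
# A = "roll >= 4" and B = "roll is even".
A = {4, 5, 6}
B = {2, 4, 6}
pA = F(len(A), 6)            # 1/2
pB = F(len(B), 6)            # 1/2
pAB = F(len(A & B), 6)       # A & B = {4, 6}, so 1/3
cov = pAB - pA * pB          # P(AB) - P(A)P(B)
alt = pB * (pAB / pB - pA)   # P(B)[P(A|B) - P(A)]
print(cov, alt)              # both equal 1/12: P(A|B) = 2/3 > P(A) = 1/2
```

Since P(A | B) > P(A), the slide's sign rule predicts ρ(I_A, I_B) > 0, matching the positive covariance of 1/12.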

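The result Cov(X_i, X_j) = −n p_i p_j can also be sanity-checked by simulation. This sketch uses the die example from the slides with assumed parameters n = 10 rolls and 200,000 repetitions, estimating the covariance between the count of ones and the count of twos.

```python
import random

# Monte Carlo check of Cov(X_i, X_j) = -n * p_i * p_j for the multinomial.
# Assumed setup: n = 10 rolls of a fair 6-sided die (each p_i = 1/6);
# X1 counts ones, X2 counts twos, so theory predicts -10/36 ≈ -0.278.
random.seed(0)
n, trials = 10, 200_000
sum1 = sum2 = sum12 = 0
for _ in range(trials):
    rolls = [random.randint(1, 6) for _ in range(n)]
    x1, x2 = rolls.count(1), rolls.count(2)
    sum1 += x1
    sum2 += x2
    sum12 += x1 * x2
cov = sum12 / trials - (sum1 / trials) * (sum2 / trials)
print(round(cov, 3))   # should land near -0.278
```

With m = 6 equally likely outcomes this is the −n/m² case from the "Multinomials All Around" slide: a mild negative correlation that shrinks as m grows.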
2. Rolling Dice

• Roll two 6-sided dice D_1 and D_2
  – X = value of D_1 + D_2,  Y = value of D_2
  – What is E[X | Y = 6]?

    E[X | Y = 6] = Σ_x x P(X = x | Y = 6)
                 = (7 + 8 + 9 + 10 + 11 + 12)(1/6)
                 = 57/6 = 9.5

  – Intuitively makes sense: 6 + E[value of D_1] = 6 + 3.5 = 9.5

Hyper for the Hypergeometric

• X and Y are independent random variables
  – X ~ Bin(n, p),  Y ~ Bin(n, p)
  – What is E[X | X + Y = m], where m ≤ n?
  – Start by computing P(X = k | X + Y = m):

    P(X = k | X + Y = m) = P(X = k, X + Y = m) / P(X + Y = m)
                         = P(X = k, Y = m − k) / P(X + Y = m)
                         = P(X = k) P(Y = m − k) / P(X + Y = m)
                         = [C(n, k) p^k (1−p)^{n−k}] [C(n, m−k) p^{m−k} (1−p)^{n−(m−k)}] / [C(2n, m) p^m (1−p)^{2n−m}]
                         = C(n, k) C(n, m−k) / C(2n, m)

  – Hypergeometric: (X | X + Y = m) ~ HypG(m, 2n, n): m trials, 2n total trials, n total "successes"
  – E[X | X + Y = m] = nm / 2n = m/2

Properties of Conditional Expectation

• X and Y are jointly distributed random variables

    E[g(X) | Y = y] = Σ_x g(x) p_{X|Y}(x | y)   or   ∫ g(x) f_{X|Y}(x | y) dx

  – This is just a function of y, since we sum (or integrate) over all values of X
• Expectation of a conditional sum:

    E[Σ_{i=1}^n X_i | Y = y] = Σ_{i=1}^n E[X_i | Y = y]

Expectations of Conditional Expectations

• Define g(Y) = E[X | Y]
  – g(Y) is a random variable: for any Y = y, g(y) = E[X | Y = y]
• What is E[E[X | Y]] = E[g(Y)]? (Consider the discrete case)

    E[E[X | Y]] = Σ_y E[X | Y = y] P(Y = y)
                = Σ_y [Σ_x x P(X = x | Y = y)] P(Y = y)
                = Σ_y Σ_x x P(X = x, Y = y)
                = Σ_x x Σ_y P(X = x, Y = y)
                = Σ_x x P(X = x) = E[X]          (Same for continuous)

Analyzing Recursive Code

• Let Y = value returned by Recurse(). What is E[Y]?

    int Recurse() {
        int x = randomInt(1, 3);  // Equally likely values
        if (x == 1) return 3;
        else if (x == 2) return (5 + Recurse());
        else return (7 + Recurse());
    }

    E[Y] = E[Y | X = 1] P(X = 1) + E[Y | X = 2] P(X = 2) + E[Y | X = 3] P(X = 3)
    E[Y | X = 1] = 3,   E[Y | X = 2] = 5 + E[Y],   E[Y | X = 3] = 7 + E[Y]
    E[Y] = 3(1/3) + (5 + E[Y])(1/3) + (7 + E[Y])(1/3) = (1/3)(15 + 2 E[Y])
    ⇒ 3 E[Y] = 15 + 2 E[Y]  ⇒  E[Y] = 15

Random Number of Random Variables

• Say you have a web site: PimentoLoaf.com
  – X = number of people/day who visit your site. X ~ N(50, 25)
  – Y_i = number of minutes spent by visitor i. Y_i ~ Poi(8)
  – X and all the Y_i are independent
  – Time spent by all visitors/day: W = Σ_{i=1}^X Y_i. What is E[W]?

    E[Σ_{i=1}^n Y_i | X = n] = Σ_{i=1}^n E[Y_i | X = n] = Σ_{i=1}^n E[Y_i] = n E[Y_i]
    ⇒ E[Σ_{i=1}^X Y_i | X] = X E[Y_i]
    E[W] = E[E[Σ_{i=1}^X Y_i | X]] = E[X E[Y_i]] = E[X] E[Y_i] = 50 · 8 = 400
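The hypergeometric identity above can be verified numerically: the p terms really do cancel, and the conditional mean comes out to m/2. The parameter values n = 5, p = 0.3, m = 4 below are assumed for illustration only.

```python
from math import comb

# Exact check that P(X = k | X + Y = m) = C(n,k) C(n,m-k) / C(2n,m),
# independent of p, for independent X, Y ~ Bin(n, p).
# n, p, m are assumed illustrative values (m <= n as the slide requires).
n, p, m = 5, 0.3, 4

def binom_pmf(n, p, k):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# P(X + Y = m), computed by summing over the ways to split m between X and Y
total = sum(binom_pmf(n, p, k) * binom_pmf(n, p, m - k) for k in range(m + 1))
for k in range(m + 1):
    cond = binom_pmf(n, p, k) * binom_pmf(n, p, m - k) / total
    hypg = comb(n, k) * comb(n, m - k) / comb(2 * n, m)
    assert abs(cond - hypg) < 1e-12   # p has cancelled out

mean = sum(k * comb(n, k) * comb(n, m - k) / comb(2 * n, m) for k in range(m + 1))
print(mean)   # equals m/2 = 2 up to float rounding
```

Changing p leaves every conditional probability unchanged, which is the point: once X + Y = m is known, p carries no further information about how the m successes split between X and Y.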

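The conditioning argument for Recurse() predicts E[Y] = 15, which is easy to check by running the routine many times. Below is a direct Python transcription (randomInt(1, 3) becomes random.randint(1, 3); the trial count and seed are assumptions of this sketch).

```python
import random

# Monte Carlo check that E[Y] = 15 for the Recurse() routine on the slide.
def recurse() -> int:
    x = random.randint(1, 3)   # 1, 2, 3 equally likely
    if x == 1:
        return 3
    elif x == 2:
        return 5 + recurse()
    else:
        return 7 + recurse()

random.seed(0)
trials = 100_000
est = sum(recurse() for _ in range(trials)) / trials
print(round(est, 2))   # should land near the derived E[Y] = 15
```

The recursion terminates with probability 1 (each call stops with probability 1/3), so the sample mean converges to 15.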
3. Conditional Variance

• Recall definition: Var(X) = E[(X − E[X])²]
  – Define: Var(X | Y) = E[(X − E[X | Y])² | Y]
• Derived: Var(X) = E[X²] − (E[X])²
  – Can derive: Var(X | Y) = E[X² | Y] − (E[X | Y])²
• After a bit more math (in the book):

    Var(X) = E[Var(X | Y)] + Var(E[X | Y])

  – Intuitively, let Y = true temperature, X = thermostat value
  – Variance in thermostat readings depends on:
    ◦ average variance in the thermostat at different temperatures, plus
    ◦ variance in the average thermostat value at different temperatures

Making Predictions

• We observe random variable X
  – Want to make a prediction about Y
  – E.g., X = stock price at 9am, Y = stock price at 10am
• Let g(X) be the function we use to predict Y, i.e. Ŷ = g(X)
  – Choose g(X) to minimize E[(Y − g(X))²]
  – Best predictor: g(X) = E[Y | X]
  – Intuitively: E[(Y − c)²] is minimized when c = E[Y]
    ◦ Now you observe X, and Y depends on X, so use c = E[Y | X]
  – You just got your first baby steps into machine learning
    ◦ We'll go into this more rigorously in a few weeks

Speaking of Babies...

• Say my height is X inches (x = 71)
  – Say, historically, sons grow to heights Y ~ N(X + 1, 4), where X is the height of the father
    ◦ Y = (X + 1) + C, where C ~ N(0, 4)
  – What should I predict for the eventual height of my son?

    E[Y | X = 71] = E[X + 1 + C | X = 71] = E[72 + C] = E[72] + E[C] = 72 + 0 = 72 inches
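The law of total variance, Var(X) = E[Var(X | Y)] + Var(E[X | Y]), can be checked exactly on a small joint PMF. The distribution below is an assumed, illustrative example: Y is a fair coin; given Y = 0, X is uniform on {0, 1}; given Y = 1, X is uniform on {2, 4}.

```python
from fractions import Fraction as F

# Exact check of Var(X) = E[Var(X|Y)] + Var(E[X|Y]) on an assumed,
# illustrative joint PMF: (x, y) -> probability.
joint = {
    (0, 0): F(1, 4), (1, 0): F(1, 4),
    (2, 1): F(1, 4), (4, 1): F(1, 4),
}

def e(f):  # expectation of f(x, y) under the joint PMF
    return sum(p * f(x, y) for (x, y), p in joint.items())

var_x = e(lambda x, y: x * x) - e(lambda x, y: x) ** 2   # left-hand side

# Per-y marginal probability, conditional mean, and conditional variance
cond = {}
for y0 in (0, 1):
    py = sum(p for (x, y), p in joint.items() if y == y0)
    ex = sum(p * x for (x, y), p in joint.items() if y == y0) / py
    ex2 = sum(p * x * x for (x, y), p in joint.items() if y == y0) / py
    cond[y0] = (py, ex, ex2 - ex * ex)

e_var = sum(py * v for py, ex, v in cond.values())        # E[Var(X|Y)]
var_e = sum(py * ex * ex for py, ex, v in cond.values()) - \
        sum(py * ex for py, ex, v in cond.values()) ** 2  # Var(E[X|Y])
print(var_x, e_var + var_e)   # the two sides agree exactly: 35/16
```

In the thermostat analogy, e_var is the average within-temperature spread of readings and var_e is the spread of the average reading across temperatures.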

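The height prediction can be checked by sampling. This sketch reads N(0, 4) in the course's N(mean, variance) convention, so C has standard deviation 2; the seed and trial count are assumptions of the simulation.

```python
import random

# Monte Carlo check of the prediction E[Y | X = 71] = 72 from the slide:
# Y = (X + 1) + C with C ~ N(0, 4), i.e. standard deviation sqrt(4) = 2.
random.seed(0)
x = 71
trials = 200_000
est = sum(x + 1 + random.gauss(0, 2) for _ in range(trials)) / trials
print(round(est, 1))   # should land near 72
```

Conditioning on X = 71 fixes the (X + 1) term, so the sample mean converges to 72, the best mean-squared-error prediction E[Y | X = 71].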