
1 Viva La Correlación!

  • Say X and Y are arbitrary random variables
  • Correlation of X and Y, denoted ρ(X, Y):

$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$

  • Note: -1 ≤ ρ(X, Y) ≤ 1
  • Correlation measures linearity between X and Y
  • ρ(X, Y) = 1  →  Y = aX + b where a = σ_Y/σ_X
  • ρ(X, Y) = -1  →  Y = aX + b where a = -σ_Y/σ_X
  • ρ(X, Y) = 0  →  absence of linear relationship

  • But, X and Y can still be related in some other way!
  • If ρ(X, Y) = 0, we say X and Y are “uncorrelated”
  • Note: Independence implies uncorrelated, but not vice versa!

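The ρ = ±1 cases are easy to see numerically. Here is a minimal Monte Carlo sketch in Java (my own illustration, in the style of the Recurse() example later in these notes; class name and seed are arbitrary): it samples Y = aX + b with a > 0 and estimates ρ(X, Y) from the sample, which should come out ≈ +1; flipping the sign of a gives ≈ -1.

import java.util.Random;

// Sketch: estimate r(X, Y) from samples to check that Y = aX + b
// with a > 0 gives correlation +1 (and -1 for negative a).
public class CorrelationDemo {
    public static void main(String[] args) {
        Random rng = new Random(42);
        int n = 1_000_000;
        double[] x = new double[n], y = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = rng.nextGaussian();      // arbitrary X
            y[i] = 2.0 * x[i] + 3.0;        // Y = aX + b with a > 0
        }
        System.out.printf("rho(X, Y) ~ %.4f%n", corr(x, y));  // ~ +1.0
    }

    static double corr(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;
        double cov = 0, vx = 0, vy = 0;
        for (int i = 0; i < n; i++) {
            cov += (x[i] - mx) * (y[i] - my);
            vx  += (x[i] - mx) * (x[i] - mx);
            vy  += (y[i] - my) * (y[i] - my);
        }
        return cov / Math.sqrt(vx * vy);    // Cov(X,Y) / sqrt(Var(X)Var(Y))
    }
}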

Fun with Indicator Variables

  • Let I_A and I_B be indicators for events A and B:

$$I_A = \begin{cases} 1 & \text{if } A \text{ occurs} \\ 0 & \text{otherwise} \end{cases} \qquad I_B = \begin{cases} 1 & \text{if } B \text{ occurs} \\ 0 & \text{otherwise} \end{cases}$$
  • E[I_A] = P(A), E[I_B] = P(B), E[I_A I_B] = P(AB)
  • Cov(I_A, I_B) = E[I_A I_B] − E[I_A] E[I_B]
                  = P(AB) − P(A)P(B)
                  = P(A | B)P(B) − P(A)P(B)
                  = P(B)[P(A | B) − P(A)]

  • Sign of Cov(I_A, I_B) determined by P(A | B) − P(A):
  • P(A | B) > P(A)  →  ρ(I_A, I_B) > 0
  • P(A | B) = P(A)  →  ρ(I_A, I_B) = 0 (and Cov(I_A, I_B) = 0)
  • P(A | B) < P(A)  →  ρ(I_A, I_B) < 0

   

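These sign relationships are easy to check empirically. Below is a minimal Java sketch (my own example, not from the slides): one fair die roll, with A = {roll ≤ 3} and B = {roll is even}, so P(AB) = 1/6 < P(A)P(B) = 1/4 and the indicators come out negatively correlated.

import java.util.Random;

// Sketch: estimate Cov(I_A, I_B) for one die roll with
// A = {roll <= 3}, B = {roll is even}.
// Exact value: P(AB) - P(A)P(B) = 1/6 - 1/4 = -1/12.
public class IndicatorCov {
    public static void main(String[] args) {
        Random rng = new Random(7);
        int n = 1_000_000;
        double sumA = 0, sumB = 0, sumAB = 0;
        for (int i = 0; i < n; i++) {
            int roll = rng.nextInt(6) + 1;     // uniform on {1..6}
            int iA = (roll <= 3) ? 1 : 0;      // indicator I_A
            int iB = (roll % 2 == 0) ? 1 : 0;  // indicator I_B
            sumA += iA; sumB += iB; sumAB += iA * iB;
        }
        double pA = sumA / n, pB = sumB / n, pAB = sumAB / n;
        // Cov(I_A, I_B) = E[I_A I_B] - E[I_A]E[I_B] = P(AB) - P(A)P(B)
        System.out.printf("Cov ~ %.4f (exact -1/12 = %.4f)%n",
                          pAB - pA * pB, -1.0 / 12);
    }
}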

Can’t Get Enough of that Multinomial

  • Multinomial distribution
  • n independent trials of experiment performed
  • Each trial results in one of m outcomes, with respective probabilities p_1, p_2, …, p_m, where $\sum_{i=1}^{m} p_i = 1$

  • X_i = number of trials with outcome i
  • E.g., rolling a 6-sided die multiple times and counting how many of each value {1, 2, 3, 4, 5, 6} we get

  • Would expect that the X_i are negatively correlated
  • Let’s see... when i ≠ j, what is Cov(X_i, X_j)?

$$P(X_1 = c_1, X_2 = c_2, \dots, X_m = c_m) = \binom{n}{c_1, c_2, \dots, c_m}\, p_1^{c_1} p_2^{c_2} \cdots p_m^{c_m}, \qquad \text{where } \sum_{i=1}^{m} c_i = n$$
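As a quick sanity check on the PMF, here is a small Java sketch (my own, not from the slides) that evaluates it for a fair die rolled n = 6 times with c_i = 1 for every face; the exact answer is 6!/6^6 ≈ 0.0154.

// Sketch: evaluate the multinomial PMF above for a fair die rolled
// n = 6 times, asking for exactly one of each face (all c_i = 1).
public class MultinomialPmf {
    public static void main(String[] args) {
        int[] c = {1, 1, 1, 1, 1, 1};            // counts, sum to n
        double[] p = new double[6];
        java.util.Arrays.fill(p, 1.0 / 6);        // fair die
        System.out.printf("P = %.6f%n", pmf(c, p));  // 6!/6^6 ~ 0.0154
    }

    static double pmf(int[] c, double[] p) {
        int n = 0;
        for (int ci : c) n += ci;
        double coef = factorial(n);               // n! / (c_1! ... c_m!)
        double prob = 1.0;
        for (int i = 0; i < c.length; i++) {
            coef /= factorial(c[i]);
            prob *= Math.pow(p[i], c[i]);         // p_i^{c_i}
        }
        return coef * prob;
    }

    static double factorial(int k) {
        double f = 1;
        for (int i = 2; i <= k; i++) f *= i;
        return f;
    }
}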

Covariance and the Multinomial

  • Computing Cov(X_i, X_j)
  • Indicator I_i(k) = 1 if trial k has outcome i, 0 otherwise:

$$X_i = \sum_{k=1}^{n} I_i(k) \qquad X_j = \sum_{k=1}^{n} I_j(k)$$

$$\mathrm{Cov}(X_i, X_j) = \sum_{a=1}^{n} \sum_{b=1}^{n} \mathrm{Cov}(I_i(b), I_j(a))$$

  • When a ≠ b, trials a and b are independent: Cov(I_i(b), I_j(a)) = 0
  • When a = b:

$$\mathrm{Cov}(I_i(a), I_j(a)) = E[I_i(a)\,I_j(a)] - E[I_i(a)]\,E[I_j(a)] \qquad \text{where } E[I_i(k)] = p_i$$

  • Since trial a cannot have both outcome i and outcome j: E[I_i(a) I_j(a)] = 0

$$\mathrm{Cov}(X_i, X_j) = \sum_{a=1}^{n} \big(E[I_i(a)\,I_j(a)] - E[I_i(a)]\,E[I_j(a)]\big) = \sum_{a=1}^{n} (0 - p_i p_j) = -n\, p_i p_j$$

  →  X_i and X_j negatively correlated
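The closed form Cov(X_i, X_j) = −n p_i p_j can be checked by simulation. A minimal sketch (assumed setup: a fair die rolled n = 60 times, X_1 = count of 1s, X_2 = count of 2s, so the exact covariance is −60/36 ≈ −1.667):

import java.util.Random;

// Sketch: Monte Carlo check of Cov(X_i, X_j) = -n p_i p_j for n = 60
// rolls of a fair die, with X_1 = count of 1s and X_2 = count of 2s.
public class MultinomialCov {
    public static void main(String[] args) {
        Random rng = new Random(1);
        int trials = 200_000, n = 60;
        double s1 = 0, s2 = 0, s12 = 0;
        for (int t = 0; t < trials; t++) {
            int x1 = 0, x2 = 0;
            for (int k = 0; k < n; k++) {
                int roll = rng.nextInt(6) + 1;
                if (roll == 1) x1++;
                if (roll == 2) x2++;
            }
            s1 += x1; s2 += x2; s12 += (double) x1 * x2;
        }
        // sample covariance: E[X1 X2] - E[X1]E[X2]
        double cov = s12 / trials - (s1 / trials) * (s2 / trials);
        System.out.printf("Cov(X1, X2) ~ %.3f (exact %.3f)%n", cov, -5.0 / 3);
    }
}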

Multinomials All Around

  • Multinomial distributions:
  • Count of strings hashed across buckets in hash table
  • Number of server requests across machines in cluster
  • Distribution of words/tokens in an email
  • Etc.
  • When m (# of outcomes) is large, each p_i is small
  • For equally likely outcomes: p_i = 1/m
  • Large m  →  X_i and X_j very mildly negatively correlated:

$$\mathrm{Cov}(X_i, X_j) = -n\, p_i p_j = -\frac{n}{m^2}$$

  • Poisson paradigm still applicable

Conditional Expectation

  • X and Y are jointly discrete random variables
  • Recall conditional PMF of X given Y = y:

$$p_{X|Y}(x \mid y) = P(X = x \mid Y = y) = \frac{p_{X,Y}(x, y)}{p_Y(y)}$$

  • Define conditional expectation of X given Y = y:

$$E[X \mid Y = y] = \sum_x x \, P(X = x \mid Y = y) = \sum_x x \, p_{X|Y}(x \mid y)$$

  • Analogously, for jointly continuous random variables:

$$f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} \qquad E[X \mid Y = y] = \int_{-\infty}^{\infty} x \, f_{X|Y}(x \mid y) \, dx$$
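A small worked example may help. The sketch below (toy joint PMF invented for illustration, not from the slides) computes E[X | Y = y] directly from the definition above:

// Sketch: compute E[X | Y = y] = sum_x x * p_{X,Y}(x, y) / p_Y(y)
// from a small (made-up) joint PMF table.
public class CondExpectation {
    public static void main(String[] args) {
        // p[x][y] = P(X = x, Y = y), rows x in {0,1,2}, cols y in {0,1}
        double[][] p = {
            {0.15, 0.10},
            {0.20, 0.05},
            {0.15, 0.35},
        };
        for (int y = 0; y <= 1; y++) {
            double pY = 0, sum = 0;
            for (int x = 0; x <= 2; x++) {
                pY  += p[x][y];            // marginal p_Y(y)
                sum += x * p[x][y];        // sum_x x * p_{X,Y}(x, y)
            }
            // prints E[X | Y=0] = 1.0 and E[X | Y=1] = 1.5
            System.out.printf("E[X | Y=%d] = %.4f%n", y, sum / pY);
        }
    }
}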


2 Rolling Dice

  • Roll two 6-sided dice D1 and D2
  • X = value of D1 + D2
  • Y = value of D2

  • What is E[X | Y = 6]?

$$E[X \mid Y = 6] = \sum_x x \, P(X = x \mid Y = 6) = \frac{1}{6}(7 + 8 + 9 + 10 + 11 + 12) = \frac{57}{6} = 9.5$$

  • Intuitively makes sense: 6 + E[value of D1] = 6 + 3.5 = 9.5
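The same number falls out of a simulation: sample both dice many times, keep only the trials with D2 = 6, and average X over them. A minimal sketch (seed and trial count arbitrary):

import java.util.Random;

// Sketch: estimate E[X | Y = 6] by simulating dice and averaging X
// only over trials where D2 = 6. Expect ~9.5.
public class DiceCondExp {
    public static void main(String[] args) {
        Random rng = new Random(3);
        long count = 0;
        double sum = 0;
        for (int t = 0; t < 2_000_000; t++) {
            int d1 = rng.nextInt(6) + 1, d2 = rng.nextInt(6) + 1;
            if (d2 == 6) {            // condition on Y = 6
                sum += d1 + d2;       // X = D1 + D2
                count++;
            }
        }
        System.out.printf("E[X | Y=6] ~ %.3f%n", sum / count);
    }
}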

Hyper for the Hypergeometric

  • X and Y are independent random variables
  • X ~ Bin(n, p), Y ~ Bin(n, p)
  • What is E[X | X + Y = m], where m ≤ n?
  • Start by computing P(X = k | X + Y = m), noting that X + Y ~ Bin(2n, p):

$$P(X = k \mid X + Y = m) = \frac{P(X = k,\, X + Y = m)}{P(X + Y = m)} = \frac{P(X = k)\,P(Y = m - k)}{P(X + Y = m)}$$

$$= \frac{\binom{n}{k} p^k (1-p)^{n-k}\, \binom{n}{m-k} p^{m-k} (1-p)^{n-(m-k)}}{\binom{2n}{m} p^m (1-p)^{2n-m}} = \frac{\binom{n}{k}\binom{n}{m-k}}{\binom{2n}{m}}$$

  • Hypergeometric: (X | X + Y = m) ~ HypG(m, 2n, n)
    (m total successes among 2n total trials, of which n are “X” trials)
  • E[X | X + Y = m] = nm/2n = m/2
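The m/2 answer can be verified by rejection sampling: draw X and Y, keep only trials with X + Y = m, and average X. A sketch (assumed parameters n = 10, p = 0.3, m = 8; note the estimate should not depend on p):

import java.util.Random;

// Sketch: X, Y ~ Bin(n, p) i.i.d.; estimate E[X | X + Y = m] by
// rejection and compare with the hypergeometric mean m/2.
public class HypergeometricCheck {
    public static void main(String[] args) {
        Random rng = new Random(11);
        int n = 10, m = 8;
        double p = 0.3;
        long count = 0;
        double sum = 0;
        for (int t = 0; t < 2_000_000; t++) {
            int x = binomial(rng, n, p), y = binomial(rng, n, p);
            if (x + y == m) { sum += x; count++; }  // keep X + Y = m only
        }
        System.out.printf("E[X | X+Y=%d] ~ %.3f (exact %.1f)%n",
                          m, sum / count, m / 2.0);
    }

    static int binomial(Random rng, int n, double p) {
        int k = 0;
        for (int i = 0; i < n; i++) if (rng.nextDouble() < p) k++;
        return k;
    }
}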

Properties of Conditional Expectation

  • X and Y are jointly distributed random variables
  • Conditional expectation of a function g(X):

$$E[g(X) \mid Y = y] = \sum_x g(x)\, p_{X|Y}(x \mid y) \quad \text{or} \quad \int_{-\infty}^{\infty} g(x)\, f_{X|Y}(x \mid y)\, dx$$

  • Expectation of a conditional sum:

$$E\!\left[\sum_{i=1}^{n} X_i \,\Big|\, Y = y\right] = \sum_{i=1}^{n} E[X_i \mid Y = y]$$

Expectations of Conditional Expectations

  • Define g(Y) = E[X | Y]
  • g(Y) is a random variable
  • For any Y = y, g(Y) = E[X | Y = y]
  • This is just a function of Y, since we sum over all values of X
  • What is E[E[X | Y]] = E[g(Y)]? (Consider discrete case)

$$E[E[X \mid Y]] = E[g(Y)] = \sum_y E[X \mid Y = y]\, P(Y = y) \qquad \text{(same for continuous)}$$

$$= \sum_y \left[\sum_x x\, P(X = x \mid Y = y)\right] P(Y = y) = \sum_x x \sum_y P(X = x,\, Y = y) = \sum_x x\, P(X = x) = E[X]$$
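For the two-dice example from earlier (X = D1 + D2, Y = D2) this identity can be checked exactly, since E[X | Y = y] = y + 3.5 and P(Y = y) = 1/6. A minimal sketch:

// Sketch: exact check that E[E[X | Y]] = E[X] for the two-dice example
// (X = D1 + D2, Y = D2); both loops should print 7.00.
public class TotalExpectation {
    public static void main(String[] args) {
        double eOfCond = 0;
        for (int y = 1; y <= 6; y++)
            eOfCond += (y + 3.5) * (1.0 / 6);    // sum_y E[X|Y=y] P(Y=y)

        double eX = 0;
        for (int d1 = 1; d1 <= 6; d1++)
            for (int d2 = 1; d2 <= 6; d2++)
                eX += (d1 + d2) / 36.0;           // E[X] directly
        System.out.printf("E[E[X|Y]] = %.2f, E[X] = %.2f%n", eOfCond, eX);
    }
}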

Analyzing Recursive Code

int Recurse() {
    int x = randomInt(1, 3);  // Equally likely values
    if (x == 1) return 3;
    else if (x == 2) return (5 + Recurse());
    else return (7 + Recurse());
}

  • Let Y = value returned by Recurse(). What is E[Y]?

$$E[Y] = E[Y \mid X = 1]P(X = 1) + E[Y \mid X = 2]P(X = 2) + E[Y \mid X = 3]P(X = 3)$$

$$E[Y \mid X = 1] = 3 \qquad E[Y \mid X = 2] = 5 + E[Y] \qquad E[Y \mid X = 3] = 7 + E[Y]$$

$$E[Y] = 3(1/3) + (5 + E[Y])(1/3) + (7 + E[Y])(1/3) = (1/3)(15 + 2E[Y]) \;\Rightarrow\; E[Y] = 15$$
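The fixed-point answer E[Y] = 15 is easy to confirm empirically by just running the function. A sketch (using java.util.Random in place of the slides’ randomInt helper):

import java.util.Random;

// Sketch: run Recurse() many times and average, to confirm E[Y] = 15.
public class RecurseSim {
    static Random rng = new Random(5);

    static int recurse() {
        int x = rng.nextInt(3) + 1;             // randomInt(1, 3)
        if (x == 1) return 3;
        else if (x == 2) return 5 + recurse();
        else return 7 + recurse();
    }

    public static void main(String[] args) {
        double sum = 0;
        int n = 1_000_000;
        for (int i = 0; i < n; i++) sum += recurse();
        System.out.printf("E[Y] ~ %.3f%n", sum / n);   // ~15
    }
}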

Random Number of Random Variables

  • Say you have a web site: PimentoLoaf.com
  • X = number of people who visit the site per day; X ~ N(50, 25)
  • Y_i = number of minutes spent by visitor i; Y_i ~ Poi(8)
  • X and all the Y_i are independent
  • Time spent by all visitors/day: $W = \sum_{i=1}^{X} Y_i$. What is E[W]?

$$E[W] = E\!\left[\sum_{i=1}^{X} Y_i\right] = E\!\left[E\!\left[\sum_{i=1}^{X} Y_i \,\Big|\, X\right]\right]$$

$$E\!\left[\sum_{i=1}^{X} Y_i \,\Big|\, X = n\right] = E\!\left[\sum_{i=1}^{n} Y_i \,\Big|\, X = n\right] = E\!\left[\sum_{i=1}^{n} Y_i\right] = n\,E[Y_i]$$

$$\text{so } E\!\left[\sum_{i=1}^{X} Y_i \,\Big|\, X\right] = X\,E[Y_i] \;\Rightarrow\; E[W] = E[X\,E[Y_i]] = E[X]\,E[Y_i] = 50 \cdot 8 = 400$$
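A simulation of the whole pipeline should land near 400. A sketch (assumptions: N(50, 25) is read in the slides’ (mean, variance) notation so the standard deviation is 5, the visitor count is rounded to an integer, and Knuth’s product-of-uniforms method samples the Poisson):

import java.util.Random;

// Sketch: estimate E[W] where W = sum_{i=1}^X Y_i, X ~ N(50, 25)
// (rounded to an integer count), Y_i ~ Poi(8). Expect ~50 * 8 = 400.
public class VisitorMinutes {
    public static void main(String[] args) {
        Random rng = new Random(9);
        int trials = 200_000;
        double sum = 0;
        for (int t = 0; t < trials; t++) {
            // N(50, 25) in (mean, variance) notation -> sd = 5
            int x = (int) Math.round(50 + 5 * rng.nextGaussian());
            double w = 0;
            for (int i = 0; i < x; i++) w += poisson(rng, 8.0);
            sum += w;
        }
        System.out.printf("E[W] ~ %.1f%n", sum / trials);
    }

    // Knuth's Poisson sampler (fine for small lambda)
    static int poisson(Random rng, double lambda) {
        double limit = Math.exp(-lambda), prod = 1.0;
        int k = 0;
        while (true) {
            prod *= rng.nextDouble();
            if (prod < limit) return k;
            k++;
        }
    }
}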


3 Conditional Variance

  • Recall definition: Var(X) = E[(X − E[X])²]
  • Define: Var(X | Y) = E[(X − E[X | Y])² | Y]
  • Derived: Var(X) = E[X²] − (E[X])²
  • Can derive: Var(X | Y) = E[X² | Y] − (E[X | Y])²
  • After a bit more math (in the book):
  • Var(X) = E[Var(X | Y)] + Var(E[X | Y]) (checked numerically in the sketch after this list)
  • Intuitively, let Y = true temperature, X = thermostat value
  • Variance in thermostat readings depends on:
  • Average variance in thermostat at different temperatures +
  • Variance in average thermostat value at different temperatures
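The decomposition can be checked exactly on the earlier two-dice example (X = D1 + D2, Y = D2): Var(X | Y = y) = Var(D1) for every y, and E[X | Y = y] = y + 3.5 varies like D2. A minimal sketch:

// Sketch: check Var(X) = E[Var(X | Y)] + Var(E[X | Y]) exactly for the
// two-dice example; all three terms are computable in closed form.
public class TotalVariance {
    public static void main(String[] args) {
        double varDie = dieVariance();        // Var of one fair die = 35/12
        double eCondVar = varDie;             // Var(X | Y=y) = Var(D1), all y
        double varCondMean = varDie;          // E[X | Y=y] = y + 3.5 varies like D2
        System.out.printf("E[Var(X|Y)] + Var(E[X|Y]) = %.4f%n",
                          eCondVar + varCondMean);   // 35/6 ~ 5.8333
        // X = D1 + D2 with D1, D2 independent, so Var(X) = 2 * Var(die)
        System.out.printf("Var(X) directly           = %.4f%n", 2 * varDie);
    }

    static double dieVariance() {
        double mean = 3.5, v = 0;
        for (int d = 1; d <= 6; d++) v += (d - mean) * (d - mean) / 6.0;
        return v;
    }
}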

Making Predictions

  • We observe random variable X
  • Want to make prediction about Y
  • E.g., X = stock price at 9am, Y = stock price at 10am
  • Let g(X) be the function we use to predict Y, i.e.:

$$\hat{Y} = g(X)$$

  • Choose g(X) to minimize E[(Y − g(X))²]
  • Best predictor: g(X) = E[Y | X]
  • Intuitively: E[(Y − c)²] is minimized when c = E[Y] (see the sketch below)
  • Now, when you observe X and Y depends on X, use c = E[Y | X]
  • You just took your first baby steps into Machine Learning
  • We’ll go into this more rigorously in a few weeks

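The claim that E[(Y − c)²] is minimized at c = E[Y] can be seen numerically by scanning candidate constants c against samples of an arbitrary Y. A sketch (Y ~ N(10, 4) chosen arbitrarily):

import java.util.Random;

// Sketch: numerically confirm that E[(Y - c)^2] is minimized at c = E[Y],
// by scanning c over a grid for samples of an arbitrary Y.
public class BestConstantPredictor {
    public static void main(String[] args) {
        Random rng = new Random(13);
        int n = 200_000;
        double[] y = new double[n];
        double mean = 0;
        for (int i = 0; i < n; i++) {
            y[i] = 10 + 2 * rng.nextGaussian();  // arbitrary Y, E[Y] = 10
            mean += y[i] / n;
        }
        double bestC = 0, bestMse = Double.MAX_VALUE;
        for (double c = 5; c <= 15; c += 0.05) {
            double mse = 0;                       // sample E[(Y - c)^2]
            for (double yi : y) mse += (yi - c) * (yi - c) / n;
            if (mse < bestMse) { bestMse = mse; bestC = c; }
        }
        System.out.printf("argmin c ~ %.2f, sample E[Y] = %.2f%n", bestC, mean);
    }
}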

Speaking of Babies...

  • Say my height is X inches (x = 71)
  • My son: say, historically, sons grow to heights Y ~ N(X + 1, 4), where X is the height of the father
  • Y = (X + 1) + C where C ~ N(0, 4)
  • What should I predict for the eventual height of my son?
  • E[Y | X = 71] = E[X + 1 + C | X = 71] = E[72 + C] = E[72] + E[C] = 72 + 0 = 72 inches