EECS 70: Lecture 27. Joint and Conditional Distributions.


EECS 70: Lecture 27.

Joint and Conditional Distributions.

  • 1. Recap of variance of a random variable
  • 2. Joint distributions
  • 3. Recap of indep. rand. variables: Variance of B(n,p)
  • 4. Conditioning of Random Variables (revisit G(p))
Recap

Variance

◮ Variance: var[X] := E[(X − E[X])^2] = E[X^2] − E[X]^2
◮ Fact: var[aX + b] = a^2 var[X]
◮ Thm.: If X, Y are indep., Var(X + Y) = Var(X) + Var(Y).
◮ U[1,...,n]: Pr[X = m] = 1/n, m = 1,...,n; E[X] = (n+1)/2; var(X) = (n^2 − 1)/12.
◮ G(p): Pr[X = n] = (1−p)^(n−1) p, n = 1, 2, ...; E[X] = 1/p; var[X] = (1−p)/p^2.
◮ B(n,p): Pr[X = m] = (n choose m) p^m (1−p)^(n−m), m = 0,...,n; E[X] = np; var(X) = np(1−p).
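As a quick sanity check (not from the slides), the mean and variance formulas above can be verified numerically from the pmfs alone; the parameter values n = 10 and p = 0.3 below are arbitrary choices for illustration.

```python
from math import comb

def mean_var(pmf):
    """Mean and variance of a distribution given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    return mu, sum((x - mu) ** 2 * p for x, p in pmf.items())

n, p = 10, 0.3

# U[1,...,n]: Pr[X = m] = 1/n
uniform = {m: 1 / n for m in range(1, n + 1)}
mu_u, var_u = mean_var(uniform)
assert abs(mu_u - (n + 1) / 2) < 1e-9
assert abs(var_u - (n * n - 1) / 12) < 1e-9

# B(n,p): Pr[X = m] = (n choose m) p^m (1-p)^(n-m)
binom = {m: comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(n + 1)}
mu_b, var_b = mean_var(binom)
assert abs(mu_b - n * p) < 1e-9
assert abs(var_b - n * p * (1 - p)) < 1e-9

# G(p): infinite support, so truncate where the tail mass is negligible
geom = {k: (1 - p) ** (k - 1) * p for k in range(1, 500)}
mu_g, var_g = mean_var(geom)
assert abs(mu_g - 1 / p) < 1e-6
assert abs(var_g - (1 - p) / p**2) < 1e-6
```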
Joint distribution.

Two random variables, X and Y, in a probability space (Ω, P).
What is ∑_x P[X = x]? 1. What is ∑_y P[Y = y]? 1.
Let's think about P[X = x, Y = y]. What is ∑_{x,y} P[X = x, Y = y]?
Are the events "X = x, Y = y" disjoint? Yes! X and Y are functions on Ω.
Do they cover the entire sample space? Yes!
So, ∑_{x,y} P[X = x, Y = y] = 1.
Joint Distribution: P[X = x, Y = y]. Marginal Distributions: P[X = x] and P[Y = y].
Important for inference.
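The marginals fall out of the joint distribution by summing over the other variable. A minimal sketch (the joint pmf values below are made up for illustration):

```python
from collections import defaultdict

def marginals(joint):
    """Given a joint pmf {(x, y): prob}, return the marginal pmfs of X and Y."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), prob in joint.items():
        px[x] += prob  # P[X = x] = sum over y of P[X = x, Y = y]
        py[y] += prob  # P[Y = y] = sum over x of P[X = x, Y = y]
    return dict(px), dict(py)

# Toy joint distribution on {0, 1} x {0, 1}
joint = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}
px, py = marginals(joint)
assert abs(sum(joint.values()) - 1.0) < 1e-9   # joint sums to 1
assert abs(px[0] - 0.5) < 1e-9                 # 0.2 + 0.3
assert abs(py[1] - 0.7) < 1e-9                 # 0.3 + 0.4
```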
Two random variables, same outcome space.

Experiment: pick a random person.
X = number of episodes of Game of Thrones they have seen.
Y = number of episodes of Westworld they have seen.

X     0     1     2     3     5     40    All
P     0.3   0.05  0.05  0.05  0.05  0.1   0.4

Is this a distribution? Yes! All the probabilities are non-negative and add up to 1.

Y     0     1     5     10
P     0.3   0.1   0.1   0.5

slide-53
SLIDE 53

Joint distribution: Example.

The joint distribution of X and Y is: Y/X 1 2 3 5 40 All 0.15 0.1 0.05 =0.3 1 0.05 0.05 =0.1 5 0.05 0.05 =0.1 10 0.15 0.35 =0.5 =0.3 =0.05 =0.05 =0.05 =0.05 =0.1 =0.4

slide-54
SLIDE 54

Joint distribution: Example.

The joint distribution of X and Y is: Y/X 1 2 3 5 40 All 0.15 0.1 0.05 =0.3 1 0.05 0.05 =0.1 5 0.05 0.05 =0.1 10 0.15 0.35 =0.5 =0.3 =0.05 =0.05 =0.05 =0.05 =0.1 =0.4 Is this a valid distribution?

slide-55
SLIDE 55

Joint distribution: Example.

The joint distribution of X and Y is: Y/X 1 2 3 5 40 All 0.15 0.1 0.05 =0.3 1 0.05 0.05 =0.1 5 0.05 0.05 =0.1 10 0.15 0.35 =0.5 =0.3 =0.05 =0.05 =0.05 =0.05 =0.1 =0.4 Is this a valid distribution? Yes!

slide-56
SLIDE 56

Joint distribution: Example.

The joint distribution of X and Y is: Y/X 1 2 3 5 40 All 0.15 0.1 0.05 =0.3 1 0.05 0.05 =0.1 5 0.05 0.05 =0.1 10 0.15 0.35 =0.5 =0.3 =0.05 =0.05 =0.05 =0.05 =0.1 =0.4 Is this a valid distribution? Yes! Notice that P[X = a] and P[Y = b] are (marginal) distributions!

slide-57
SLIDE 57

Joint distribution: Example.

The joint distribution of X and Y is: Y/X 1 2 3 5 40 All 0.15 0.1 0.05 =0.3 1 0.05 0.05 =0.1 5 0.05 0.05 =0.1 10 0.15 0.35 =0.5 =0.3 =0.05 =0.05 =0.05 =0.05 =0.1 =0.4 Is this a valid distribution? Yes! Notice that P[X = a] and P[Y = b] are (marginal) distributions! But now we have more information!

slide-58
SLIDE 58

Joint distribution: Example.

The joint distribution of X and Y is: Y/X 1 2 3 5 40 All 0.15 0.1 0.05 =0.3 1 0.05 0.05 =0.1 5 0.05 0.05 =0.1 10 0.15 0.35 =0.5 =0.3 =0.05 =0.05 =0.05 =0.05 =0.1 =0.4 Is this a valid distribution? Yes! Notice that P[X = a] and P[Y = b] are (marginal) distributions! But now we have more information! For example, if I tell you someone watched 5 episodes of Westworld, they definitely didn’t watch all the episodes of GoT.

Independent random variables.

Definition: Independence. The random variables X and Y are independent if and only if P[Y = b | X = a] = P[Y = b], for all a and b.

Fact: X, Y are independent if and only if P[X = a, Y = b] = P[X = a] P[Y = b], for all a and b.

With independence we don't need a huge table of probabilities like the previous slide.
Independence: examples.

Example 1: Roll two dice. X, Y = number of pips on the two dice. X, Y are independent. Indeed: P[X = a, Y = b] = 1/36 and P[X = a] = P[Y = b] = 1/6, so P[X = a, Y = b] = P[X = a] P[Y = b].

Example 2: Roll two dice. X = total number of pips, Y = number of pips on die 1 minus number on die 2. X and Y are not independent. Indeed: P[X = 12, Y = 1] = 0 ≠ P[X = 12] P[Y = 1] > 0.
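Both examples can be checked exhaustively over the 36 equally likely outcomes; a sketch, with the random variables written as functions on the outcome space as in the lecture:

```python
from itertools import product
from collections import defaultdict

# Uniform outcome space: all 36 ordered rolls of two dice
omega = list(product(range(1, 7), repeat=2))

def joint_pmf(f, g):
    """Joint pmf of the random variables f(w), g(w) under the uniform measure."""
    pmf = defaultdict(float)
    for w in omega:
        pmf[(f(w), g(w))] += 1 / len(omega)
    return pmf

def independent(pmf):
    """Check P[X = a, Y = b] == P[X = a] P[Y = b] for all a, b."""
    px, py = defaultdict(float), defaultdict(float)
    for (a, b), p in pmf.items():
        px[a] += p
        py[b] += p
    return all(abs(pmf.get((a, b), 0.0) - px[a] * py[b]) < 1e-12
               for a in px for b in py)

# Example 1: the two individual dice are independent
assert independent(joint_pmf(lambda w: w[0], lambda w: w[1]))
# Example 2: their sum and difference are not
assert not independent(joint_pmf(lambda w: w[0] + w[1], lambda w: w[0] - w[1]))
```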
Mean of product of independent RVs.

Theorem: Let X, Y be independent RVs. Then E[XY] = E[X]E[Y].

Proof: Recall that E[g(X,Y)] = ∑_{x,y} g(x,y) P[X = x, Y = y]. Hence,

E[XY] = ∑_{x,y} xy P[X = x, Y = y]
      = ∑_{x,y} xy P[X = x] P[Y = y], by ind.
      = ∑_x ∑_y xy P[X = x] P[Y = y]
      = (∑_x x P[X = x]) (∑_y y P[Y = y])
      = (∑_x x P[X = x]) E[Y]
      = E[X] E[Y].
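A quick numeric illustration of the theorem: build a joint pmf that is independent by construction (the marginal values below are made up) and confirm E[XY] = E[X]E[Y].

```python
# Two arbitrary marginals; the joint is their product, so X, Y are independent.
px = {1: 0.2, 2: 0.5, 3: 0.3}
py = {0: 0.4, 4: 0.6}
joint = {(x, y): px[x] * py[y] for x in px for y in py}

e_x = sum(x * p for x, p in px.items())
e_y = sum(y * p for y, p in py.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())
assert abs(e_xy - e_x * e_y) < 1e-12
```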
Variance of sum of two independent random variables

Theorem: If X and Y are independent, then Var(X + Y) = Var(X) + Var(Y).

Proof: Since shifting a random variable does not change its variance, let us subtract the means. That is, we assume that E(X) = 0 and E(Y) = 0. Then, by independence, E(XY) = E(X)E(Y) = 0. Hence,

Var(X + Y) = E((X + Y)^2)
           = E(X^2 + 2XY + Y^2)
           = E(X^2) + 2E(XY) + E(Y^2)
           = E(X^2) + E(Y^2)
           = Var(X) + Var(Y).
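The theorem can be checked exactly on the two-dice example: the variance of the sum of two independent dice should be twice the variance of one die.

```python
from itertools import product

def var(values, probs):
    """Variance of a finite distribution given as parallel value/prob lists."""
    mu = sum(v * p for v, p in zip(values, probs))
    return sum((v - mu) ** 2 * p for v, p in zip(values, probs))

die_vals = list(range(1, 7))
var_die = var(die_vals, [1 / 6] * 6)          # variance of one fair die

sums = [a + b for a, b in product(die_vals, repeat=2)]
var_sum = var(sums, [1 / 36] * 36)            # variance of the sum of two dice
assert abs(var_sum - 2 * var_die) < 1e-12     # Var(X + Y) = Var(X) + Var(Y)
```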
Examples.

(1) Assume that X, Y, Z are (pairwise) independent, with E[X] = E[Y] = E[Z] = 0 and E[X^2] = E[Y^2] = E[Z^2] = 1. Then

E[(X + 2Y + 3Z)^2] = E[X^2 + 4Y^2 + 9Z^2 + 4XY + 12YZ + 6XZ]
                   = 1 + 4 + 9 + 4×0 + 12×0 + 6×0 = 14.

(2) Let X, Y be independent and U{1, 2, ..., n}. Then

E[(X − Y)^2] = E[X^2 + Y^2 − 2XY] = 2E[X^2] − 2E[X]^2 = (1 + 3n + 2n^2)/3 − (n + 1)^2/2.
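Example (2) can be confirmed by brute force over all n² equally likely pairs; n = 7 below is an arbitrary choice.

```python
# Check example (2): E[(X - Y)^2] for independent X, Y uniform on {1, ..., n}.
n = 7
exact = sum((x - y) ** 2
            for x in range(1, n + 1)
            for y in range(1, n + 1)) / n**2
formula = (1 + 3 * n + 2 * n * n) / 3 - (n + 1) ** 2 / 2
assert abs(exact - formula) < 1e-9
```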
Variance: binomial.

E[X^2] = ∑_{i=0}^{n} i^2 (n choose i) p^i (1−p)^(n−i) = Really???!!##... Too hard!

OK... fine. Let's do something else. Maybe not much easier... but there is a payoff.
Variance of Binomial Distribution.

Flip a coin with heads probability p, n times. X = how many heads?

Xi = 1 if the ith flip is heads, 0 otherwise.

E(Xi^2) = 1^2 × p + 0^2 × (1−p) = p.

Var(Xi) = E(Xi^2) − (E(Xi))^2 = p − p^2 = p(1−p).
p = 0 ⇒ Var(Xi) = 0; p = 1 ⇒ Var(Xi) = 0.

X = X1 + X2 + ... + Xn. Xi and Xj are independent: Pr[Xi = 1 | Xj = 1] = Pr[Xi = 1].

Var(X) = Var(X1 + ··· + Xn) = n Var(X1) = np(1−p).
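The indicator decomposition above mirrors how one would simulate B(n, p) directly; a Monte Carlo check (seeded for reproducibility; n = 20, p = 0.3, and the trial count are illustrative choices) shows the sample variance landing near np(1−p).

```python
import random

# Monte Carlo check that B(n, p) has mean near np and variance near np(1-p).
random.seed(0)
n, p, trials = 20, 0.3, 50_000

# Each sample sums n independent indicator flips, exactly as in the derivation.
samples = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]
mean = sum(samples) / trials
variance = sum((x - mean) ** 2 for x in samples) / trials

assert abs(mean - n * p) < 0.1               # E[X] = np = 6
assert abs(variance - n * p * (1 - p)) < 0.2  # Var(X) = np(1-p) = 4.2
```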
slide-127
SLIDE 127

Conditioning of RVs

Recall conditioning on an event A:
P[X = k | A] = P[(X = k) ∩ A] / P[A].

Conditioning on another RV:
P[X = k | Y = m] = P[X = k, Y = m] / P[Y = m] = pX|Y(k | m).

pX|Y(x | y) is called the conditional distribution or conditional probability mass function (pmf) of X given Y:
pX|Y(x | y) = pXY(x,y) / pY(y)
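The definition above can be made concrete with a small joint pmf table. The sketch below (the table values are illustrative, not from the lecture) computes pX|Y(x | y) from pXY(x,y) and pY(y):

```python
# Illustrative joint pmf p_{XY}(x, y) for X, Y in {0, 1}.
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.40,
}

def p_Y(y):
    # Marginal of Y: sum the joint over x.
    return sum(p for (x, yy), p in joint.items() if yy == y)

def p_X_given_Y(x, y):
    # Conditional pmf: p_{X|Y}(x | y) = p_{XY}(x, y) / p_Y(y).
    return joint[(x, y)] / p_Y(y)

# For each fixed y, the conditional pmf sums to 1 over x.
for y in (0, 1):
    assert abs(sum(p_X_given_Y(x, y) for x in (0, 1)) - 1.0) < 1e-12
```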

slide-132
SLIDE 132

Conditional distributions

X | Y is a RV: ∑_x pX|Y(x | y) = ∑_x pXY(x,y)/pY(y) = 1.

Multiplication or Product Rule:
pXY(x,y) = pX(x) pY|X(y | x) = pY(y) pX|Y(x | y)

Total Probability Theorem: If A1, A2, ..., AN partition Ω, and P[Ai] > 0 ∀i, then
pX(x) = ∑_{i=1}^{N} P[Ai] P[X = x | Ai]

Nothing special about just two random variables; this naturally extends to more.
Let's visit the mean and variance of the geometric distribution using conditional expectation.
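The Total Probability Theorem can be checked on a small example by partitioning the sample space by the value of Y. The sketch below (the joint table is illustrative, not from the lecture) verifies pX(x) = ∑_y P[Y = y] P[X = x | Y = y]:

```python
# Illustrative joint pmf p_{XY}(x, y) for X, Y in {0, 1}.
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.40,
}

# Marginals by summing out the other variable.
p_Y = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in (0, 1)}
p_X = {x: sum(p for (xx, y), p in joint.items() if xx == x) for x in (0, 1)}

# Total probability: p_X(x) = sum_y P[Y = y] * P[X = x | Y = y].
for x in (0, 1):
    total = sum(p_Y[y] * (joint[(x, y)] / p_Y[y]) for y in (0, 1))
    assert abs(total - p_X[x]) < 1e-12
```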

slide-142
SLIDE 142

Revisiting mean of geometric RV X ∼ G(p)

X is memoryless: P[X = n + m | X > n] = P[X = m].
Thus E[X | X > 1] = 1 + E[X]. Why?
(Recall E[g(X)] = ∑_l g(l) P[X = l].)

E[X | X > 1] = ∑_{k=1}^∞ k P[X = k | X > 1]
             = ∑_{k=2}^∞ k P[X = k − 1]     (memoryless)
             = ∑_{l=1}^∞ (l + 1) P[X = l]   (l = k − 1)
             = E[X + 1]
             = 1 + E[X]
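The identity E[X | X > 1] = 1 + E[X] can be confirmed numerically by truncating the infinite sums. The sketch below assumes p = 0.3 and a cutoff K = 2000 (both illustrative; the geometric tail beyond K is negligible):

```python
# X ~ Geometric(p): P[X = k] = (1-p)^(k-1) * p for k >= 1.
p, K = 0.3, 2000

def pmf(k):
    return (1 - p) ** (k - 1) * p

# E[X], truncated at K.
mean = sum(k * pmf(k) for k in range(1, K + 1))

# E[X | X > 1] = sum_{k>=2} k * P[X = k] / P[X > 1].
p_gt_1 = 1 - pmf(1)
cond_mean = sum(k * pmf(k) / p_gt_1 for k in range(2, K + 1))

assert abs(mean - 1 / p) < 1e-6          # E[X] = 1/p
assert abs(cond_mean - (1 + mean)) < 1e-6  # E[X | X > 1] = 1 + E[X]
```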

slide-150
SLIDE 150

Revisiting mean of geometric RV X ∼ G(p)

X is memoryless: P[X = k + m | X > k] = P[X = m].
Thus E[X | X > 1] = 1 + E[X].

We have E[X] = P[X = 1] E[X | X = 1] + P[X > 1] E[X | X > 1].
⇒ E[X] = p·1 + (1−p)(E[X] + 1)
⇒ E[X] = p + 1 − p + E[X] − pE[X]
⇒ pE[X] = 1
⇒ E[X] = 1/p

Derive the variance for X ∼ G(p) by finding E[X²] using conditioning.
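The exercise above leads to the standard results E[X²] = (2 − p)/p² and Var(X) = (1 − p)/p² for X ∼ G(p); these are well-known closed forms, not derived on the slides. A quick numerical check by truncated summation (p = 0.4 and cutoff K = 2000 are illustrative):

```python
# X ~ Geometric(p): P[X = k] = (1-p)^(k-1) * p for k >= 1.
p, K = 0.4, 2000

def pmf(k):
    return (1 - p) ** (k - 1) * p

mean = sum(k * pmf(k) for k in range(1, K + 1))        # E[X] = 1/p
second = sum(k * k * pmf(k) for k in range(1, K + 1))  # E[X^2]
var = second - mean ** 2                               # Var(X)

assert abs(second - (2 - p) / p**2) < 1e-6  # E[X^2] = (2-p)/p^2
assert abs(var - (1 - p) / p**2) < 1e-6     # Var(X) = (1-p)/p^2
```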

slide-158
SLIDE 158

Summary of Conditional distribution

For Random Variables X and Y, P[X = x | Y = k] is the conditional distribution of X given Y = k:
P[X = x | Y = k] = P[X = x, Y = k] / P[Y = k].
Numerator: joint distribution of (X,Y). Denominator: marginal distribution of Y.

(Aside: a surprising result using conditioning of RVs.)
Theorem: If X ∼ Poisson(λ1) and Y ∼ Poisson(λ2) are independent, then X + Y ∼ Poisson(λ1 + λ2).
"Sum of independent Poissons is Poisson."
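The theorem can be checked by convolving the two pmfs: for independent X and Y, P[X + Y = n] = ∑_k P[X = k] P[Y = n − k]. The sketch below (λ1 = 2 and λ2 = 3 are illustrative) compares that convolution with the Poisson(λ1 + λ2) pmf:

```python
from math import exp, factorial

# Illustrative rates.
l1, l2 = 2.0, 3.0

def poisson_pmf(lam, k):
    # P[X = k] for X ~ Poisson(lam).
    return exp(-lam) * lam**k / factorial(k)

for n in range(15):
    # Convolution: P[X + Y = n] = sum_k P[X = k] * P[Y = n - k] (independence).
    conv = sum(poisson_pmf(l1, k) * poisson_pmf(l2, n - k) for k in range(n + 1))
    # Matches the Poisson(l1 + l2) pmf term by term.
    assert abs(conv - poisson_pmf(l1 + l2, n)) < 1e-12
```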

slide-167
SLIDE 167

Summary.

Joint and Conditional Distributions.

Joint distributions:

◮ Normalization: ∑_{x,y} P[X = x,Y = y] = 1.
◮ Marginalization: ∑_y P[X = x,Y = y] = P[X = x].
◮ Independence: P[X = x,Y = y] = P[X = x]P[Y = y] for all x,y. Then E[XY] = E[X]E[Y].

Conditional distributions:

◮ Sum of independent Poissons is Poisson.
◮ Conditional expectation: useful for mean & variance calculations.
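The independence bullet above implies E[XY] = E[X]E[Y] when the joint pmf factors into the product of the marginals. A small sketch (the marginal tables are illustrative):

```python
# Illustrative marginals for independent X and Y.
pX = {0: 0.5, 1: 0.5}
pY = {1: 0.3, 2: 0.7}

E_X = sum(x * p for x, p in pX.items())
E_Y = sum(y * p for y, p in pY.items())

# Under independence the joint pmf is pX[x] * pY[y], so:
E_XY = sum(x * y * pX[x] * pY[y] for x in pX for y in pY)

assert abs(E_XY - E_X * E_Y) < 1e-12  # E[XY] = E[X]E[Y]
```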