CS70: Jean Walrand: Lecture 27. Expectation; Conditional Expectation; B(n, p); G(p) - PowerPoint PPT Presentation



SLIDE 1

CS70: Jean Walrand: Lecture 27.

Expectation; Conditional Expectation; B(n, p); G(p)

  • 1. Review of Expectation
  • 2. Linearity of Expectation
  • 3. Conditional Expectation
  • 4. Independence of RVs
  • 5. Applications
  • 6. Important Distributions and Expectations.
SLIDE 2

Expectation

Recall: X : Ω → ℜ; Pr[X = a] = Pr[X⁻¹(a)].

Definition: The expectation of a random variable X is E[X] = ∑_a a×Pr[X = a].

Indicator: Let A be an event. The random variable X defined by

X(ω) = 1, if ω ∈ A; X(ω) = 0, if ω ∉ A

is called the indicator of the event A. Note that Pr[X = 1] = Pr[A] and Pr[X = 0] = 1−Pr[A]. Hence,

E[X] = 1×Pr[X = 1] + 0×Pr[X = 0] = Pr[A].

The random variable X is sometimes written as 1{ω ∈ A} or 1_A(ω).
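The indicator fact E[1_A] = Pr[A] can be checked empirically. A minimal Python sketch (illustrative, not part of the slides), taking A = {die roll is even} so Pr[A] = 1/2:

```python
import random

# Illustration: the indicator of A = {die roll is even} on a fair die.
# E[1_A] = Pr[A] = 1/2; estimate it by averaging the indicator over many rolls.
def indicator_mean(trials=100_000, seed=0):
    rng = random.Random(seed)
    # A boolean is 0 or 1, i.e., exactly the indicator 1_A(ω).
    return sum(rng.randint(1, 6) % 2 == 0 for _ in range(trials)) / trials
```

The empirical average of the indicator converges to Pr[A] ≈ 0.5.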

SLIDE 3

Linearity of Expectation

Theorem: E[X] = ∑_ω X(ω)×Pr[ω].

Theorem: Expectation is linear: E[a1X1 + ··· + anXn] = a1E[X1] + ··· + anE[Xn].

Proof:

E[a1X1 + ··· + anXn] = ∑_ω (a1X1 + ··· + anXn)(ω)Pr[ω]
= ∑_ω (a1X1(ω) + ··· + anXn(ω))Pr[ω]
= a1 ∑_ω X1(ω)Pr[ω] + ··· + an ∑_ω Xn(ω)Pr[ω]
= a1E[X1] + ··· + anE[Xn].

SLIDE 4

Using Linearity - 1: Pips on dice

Roll a die n times. Xm = number of pips on roll m. X = X1 + ··· + Xn = total number of pips in n rolls.

E[X] = E[X1 + ··· + Xn]
= E[X1] + ··· + E[Xn], by linearity
= nE[X1], because the Xm have the same distribution.

Now, E[X1] = 1×(1/6) + ··· + 6×(1/6) = ((6×7)/2)×(1/6) = 7/2. Hence, E[X] = 7n/2.
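The two expectations above can be computed exactly. A short Python sketch (illustrative, not part of the slides), using exact rational arithmetic:

```python
from fractions import Fraction

# Exact check: E[X1] = 7/2 for one die, so by linearity E[X] = 7n/2.
e_one = sum(k * Fraction(1, 6) for k in range(1, 7))  # 1*(1/6) + ... + 6*(1/6)
n = 10
e_total = n * e_one  # linearity: E[X1 + ... + Xn] = n E[X1]
assert e_one == Fraction(7, 2)
assert e_total == 35  # 7 * 10 / 2
```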

SLIDE 5

Using Linearity - 2: Fixed point.

Hand out assignments at random to n students. X = number of students that get their own assignment back. X = X1 + ··· + Xn, where Xm = 1{student m gets his/her own assignment back}. One has

E[X] = E[X1 + ··· + Xn]
= E[X1] + ··· + E[Xn], by linearity
= nE[X1], because all the Xm have the same distribution
= nPr[X1 = 1], because X1 is an indicator
= n(1/n), because student 1 is equally likely to get any one of the n assignments
= 1.

Note that linearity holds even though the Xm are not independent (whatever that means).
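A random handout is a uniformly random permutation, so the claim E[X] = 1 can be tested by simulation. A Python sketch (illustrative, not part of the slides):

```python
import random

# Simulation: a uniformly random permutation models the random handout;
# position i getting value i is a fixed point (student i gets their own work).
# The average number of fixed points should be close to E[X] = 1 for any n.
def avg_fixed_points(n=20, trials=50_000, seed=1):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        total += sum(i == p for i, p in enumerate(perm))
    return total / trials
```

The average is close to 1 regardless of n, exactly as linearity predicts.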

SLIDE 6

Using Linearity - 3: Binomial Distribution.

Flip n coins with heads probability p. X = number of heads.

Binomial Distribution: Pr[X = i], for each i.

Pr[X = i] = (n choose i) pⁱ(1−p)ⁿ⁻ⁱ.

E[X] = ∑_i i×Pr[X = i] = ∑_i i×(n choose i) pⁱ(1−p)ⁿ⁻ⁱ. Uh oh. ...

Or... a better approach: Let Xi = 1 if the i-th flip is heads, 0 otherwise. Then

E[Xi] = 1×Pr["heads"] + 0×Pr["tails"] = p.

Moreover, X = X1 + ··· + Xn, and E[X] = E[X1] + E[X2] + ··· + E[Xn] = n×E[X1] = np.
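The "uh oh" sum does come out to np; a Python sketch (illustrative, not part of the slides) checks it in exact arithmetic for one choice of n and p:

```python
from fractions import Fraction
from math import comb

# Exact check: sum_i i * C(n,i) p^i (1-p)^(n-i) equals np.
n, p = 12, Fraction(1, 3)
e_direct = sum(i * comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1))
assert e_direct == n * p  # np = 4, matching the linearity argument
```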

SLIDE 7

Conditional Expectation

How do observations affect expectation?

Example 1: Roll one die. You are told that the outcome X is at least 3. What is the expected value of X given that information?

Given that X ≥ 3, we know that X is uniform in {3,4,5,6}. Hence, the mean value is 4.5. We write E[X|X ≥ 3] = 4.5. Similarly, we have E[X|X < 3] = 1.5 because, given that X < 3, X is uniform in {1,2}. Note that

E[X|X ≥ 3]×Pr[X ≥ 3] + E[X|X < 3]×Pr[X < 3] = 4.5×(4/6) + 1.5×(2/6) = 3 + 0.5 = 3.5 = E[X].

Is this a coincidence?

SLIDE 8

Conditional Expectation

How do observations affect expectation?

Example 2: Roll two dice. You are told that the total number X of pips is at least 8. What is the expected value of X given that information?

Recall the distribution of X: Pr[X = 2] = Pr[X = 12] = 1/36, Pr[X = 3] = Pr[X = 11] = 2/36, ....

Given that X ≥ 8, the distribution of X becomes {(8, 5/15), (9, 4/15), (10, 3/15), (11, 2/15), (12, 1/15)}. For instance,

Pr[X = 8|X ≥ 8] = Pr[X = 8]/Pr[X ≥ 8] = (5/36)/(15/36) = 5/15.

Hence, E[X|X ≥ 8] = 8×(5/15) + 9×(4/15) + 10×(3/15) + 11×(2/15) + 12×(1/15) = 140/15 ≈ 9.33.
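The conditional distribution and the value 140/15 can be recomputed exactly from the 36 equally likely outcomes. A Python sketch (illustrative, not part of the slides):

```python
from fractions import Fraction
from itertools import product

# Exact E[X | X >= 8] for the total X of two fair dice: condition by keeping
# only the outcomes with total >= 8, which stay equally likely.
totals = [a + b for a, b in product(range(1, 7), repeat=2)]  # 36 outcomes
big = [x for x in totals if x >= 8]                          # 15 outcomes
e_cond = Fraction(sum(big), len(big))
assert e_cond == Fraction(140, 15)
```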

SLIDE 9

Conditional Expectation

How do observations affect expectation?

Example 2, continued: Roll two dice. You are told that the total number X of pips is less than 8. What is the expected value of X given that information?

We find that E[X|X < 8] = 2×(1/21) + 3×(2/21) + ··· + 7×(6/21) = 112/21 ≈ 5.33. Observe that

E[X|X ≥ 8]Pr[X ≥ 8] + E[X|X < 8]Pr[X < 8] = 9.33×(15/36) + 5.33×(21/36) = 7 = E[X].

Coincidence? Probably not.

SLIDE 10

Conditional Expectation

Definition: Let X be a RV and A an event. Then

E[X|A] := ∑_a a×Pr[X = a|A].

It is easy (really) to see that

E[X|A] = ∑_ω X(ω)Pr[ω|A] = (1/Pr[A]) ∑_{ω∈A} X(ω)Pr[ω].

Theorem: Conditional expectation is linear: E[a1X1 + ··· + anXn|A] = a1E[X1|A] + ··· + anE[Xn|A].

Proof:

E[a1X1 + ··· + anXn|A] = ∑_ω [a1X1(ω) + ··· + anXn(ω)]Pr[ω|A]
= a1 ∑_ω X1(ω)Pr[ω|A] + ··· + an ∑_ω Xn(ω)Pr[ω|A]
= a1E[X1|A] + ··· + anE[Xn|A].

SLIDE 11

Conditional Expectation

Theorem: E[X] = E[X|A]Pr[A] + E[X|Ā]Pr[Ā].

Proof: The law of total probability says that Pr[ω] = Pr[ω|A]Pr[A] + Pr[ω|Ā]Pr[Ā]. Hence,

E[X] = ∑_ω X(ω)Pr[ω]
= ∑_ω X(ω)Pr[ω|A]Pr[A] + ∑_ω X(ω)Pr[ω|Ā]Pr[Ā]
= E[X|A]Pr[A] + E[X|Ā]Pr[Ā].
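The theorem can be checked exactly on the single-die example from earlier, with A = {X ≥ 3}. A Python sketch (illustrative, not part of the slides):

```python
from fractions import Fraction

# Check E[X] = E[X|A]Pr[A] + E[X|A']Pr[A'] for one fair die, A = {X >= 3}.
vals = list(range(1, 7))
e = sum(v * Fraction(1, 6) for v in vals)          # E[X] = 7/2
a = [v for v in vals if v >= 3]                    # uniform on {3,4,5,6}
not_a = [v for v in vals if v < 3]                 # uniform on {1,2}
lhs = (Fraction(sum(a), len(a)) * Fraction(len(a), 6)
       + Fraction(sum(not_a), len(not_a)) * Fraction(len(not_a), 6))
assert lhs == e   # 4.5*(4/6) + 1.5*(2/6) = 3.5
```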

SLIDE 12

Geometric Distribution

Let’s flip a coin with Pr[H] = p until we get H. For instance: ω1 = H, or ω2 = T H, or ω3 = T T H, or ωn = T T T T ··· T H. Note that Ω = {ωn,n = 1,2,...}. Let X be the number of flips until the first H. Then, X(ωn) = n. Also, Pr[X = n] = (1−p)n−1p, n ≥ 1.

SLIDE 13

Geometric Distribution

Pr[X = n] = (1−p)n−1p,n ≥ 1.

SLIDE 14

Geometric Distribution

Pr[X = n] = (1−p)^(n−1) p, n ≥ 1. Note that

∑_{n=1}^∞ Pr[X = n] = ∑_{n=1}^∞ (1−p)^(n−1) p = p ∑_{n=1}^∞ (1−p)^(n−1) = p ∑_{n=0}^∞ (1−p)ⁿ.

Now, if |a| < 1, then S := ∑_{n=0}^∞ aⁿ = 1/(1−a). Indeed,

S = 1 + a + a² + a³ + ···
aS = a + a² + a³ + a⁴ + ···
(1−a)S = 1 + a − a + a² − a² + ··· = 1.

Hence,

∑_{n=1}^∞ Pr[X = n] = p × 1/(1−(1−p)) = 1.
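Numerically, the partial sums of the geometric probabilities approach 1 quickly. A Python sketch (illustrative, not part of the slides):

```python
# Partial sums of Pr[X = n] = (1-p)^(n-1) p tend to 1; the remainder
# after N terms is (1-p)^N, which is astronomically small for N = 199.
p = 0.3
s = sum((1 - p) ** (n - 1) * p for n in range(1, 200))
assert abs(s - 1.0) < 1e-12
```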

SLIDE 15

Geometric Distribution: Expectation

X =D G(p), i.e., Pr[X = n] = (1−p)^(n−1) p, n ≥ 1. One has

E[X] = ∑_{n=1}^∞ n Pr[X = n] = ∑_{n=1}^∞ n(1−p)^(n−1) p.

Thus,

E[X] = p + 2(1−p)p + 3(1−p)²p + 4(1−p)³p + ···
(1−p)E[X] = (1−p)p + 2(1−p)²p + 3(1−p)³p + ···
pE[X] = p + (1−p)p + (1−p)²p + (1−p)³p + ···, by subtracting the previous two identities,
= ∑_{n=1}^∞ Pr[X = n] = 1.

Hence, E[X] = 1/p.
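The series ∑ n(1−p)^(n−1) p really does converge to 1/p; a Python sketch (illustrative, not part of the slides) sums enough terms to see it:

```python
# Numerical check: sum_{n>=1} n (1-p)^(n-1) p converges to 1/p.
p = 0.25
e = sum(n * (1 - p) ** (n - 1) * p for n in range(1, 500))
assert abs(e - 1 / p) < 1e-9   # 1/p = 4.0; the tail beyond n=500 is negligible
```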

SLIDE 16

Geometric Distribution: Renewal Trick

A different look at the algebra. We flip the coin once and, if we get T, let ω denote the subsequent flips and Y(ω) the number of flips in ω until the first H. Note that X(Hω) = 1 and X(Tω) = 1 + Y(ω). Hence,

E[X] = ∑_ω 1×Pr[Hω] + ∑_ω (1 + Y(ω))Pr[Tω]
= ∑_ω p Pr[ω] + ∑_ω (1 + Y(ω))(1−p)Pr[ω]
= p + (1−p)(1 + E[Y]) = 1 + (1−p)E[Y].

But Y has the same distribution as X, so E[X] = E[Y]. Thus, E[X] = 1 + (1−p)E[X], so that E[X] = 1/p.

SLIDE 17

Geometric Distribution: Memoryless

Let X be G(p). Then, for n ≥ 0, Pr[X > n] = Pr[first n flips are T] = (1−p)ⁿ.

Theorem: Pr[X > n+m | X > n] = Pr[X > m], m, n ≥ 0.

Proof:

Pr[X > n+m | X > n] = Pr[X > n+m and X > n]/Pr[X > n]
= Pr[X > n+m]/Pr[X > n]
= (1−p)^(n+m)/(1−p)ⁿ = (1−p)ᵐ = Pr[X > m].
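The memoryless identity follows directly from the tail formula, which a Python sketch (illustrative, not part of the slides) can verify for sample values of p, n, m:

```python
from math import isclose

# Memoryless check via the tail formula Pr[X > k] = (1-p)^k:
# Pr[X > n+m | X > n] = tail(n+m)/tail(n) should equal tail(m).
p, n, m = 0.4, 3, 5
def tail(k):
    return (1 - p) ** k
assert isclose(tail(n + m) / tail(n), tail(m))
```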

SLIDE 18

Geometric Distribution: Memoryless - Interpretation

Pr[X > n+m | X > n] = Pr[X > m], m, n ≥ 0.

Pr[X > n+m | X > n] = Pr[A|B] = Pr[A] = Pr[X > m], where B = {the first n flips are T} and A = {flips n+1, ..., n+m are T}; A and B involve different flips, so they are independent. The coin is memoryless, therefore, so is X.

SLIDE 19

Geometric Distribution: Yet another look

Theorem: For a r.v. X that takes values in {0,1,2,...}, one has E[X] = ∑_{i=1}^∞ Pr[X ≥ i]. [See later for a proof.]

If X = G(p), then Pr[X ≥ i] = Pr[X > i−1] = (1−p)^(i−1). Hence,

E[X] = ∑_{i=1}^∞ (1−p)^(i−1) = ∑_{i=0}^∞ (1−p)ⁱ = 1/(1−(1−p)) = 1/p.

SLIDE 20

Expected Value of Integer RV

Theorem: For a r.v. X that takes values in {0,1,2,...}, one has E[X] = ∑_{i=1}^∞ Pr[X ≥ i].

Proof: One has

E[X] = ∑_{i=1}^∞ i×Pr[X = i]
= ∑_{i=1}^∞ i×{Pr[X ≥ i] − Pr[X ≥ i+1]}
= ∑_{i=1}^∞ {i×Pr[X ≥ i] − i×Pr[X ≥ i+1]}
= ∑_{i=1}^∞ {i×Pr[X ≥ i] − (i−1)×Pr[X ≥ i]}, by shifting the index in the second sum,
= ∑_{i=1}^∞ Pr[X ≥ i].
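The tail-sum formula is easy to verify on a small concrete distribution. A Python sketch (illustrative, not part of the slides), using a uniform distribution on {0,1,2,3}:

```python
from fractions import Fraction

# Check E[X] = sum_{i>=1} Pr[X >= i] for X uniform on {0,1,2,3}.
dist = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4)}
e_direct = sum(k * q for k, q in dist.items())                        # 3/2
e_tail = sum(sum(q for k, q in dist.items() if k >= i) for i in range(1, 4))
assert e_direct == e_tail == Fraction(3, 2)
```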

SLIDE 21

Riding the bus.

n buses arrive uniformly at random throughout a 24 hour day. What is the time between buses? What is the time to wait for a bus?

[Figure: typical arrival times, independent and uniform in [0,24], shown on a line and in an alternative picture.]

SLIDE 22

Riding the bus.

Add the black dot uniformly at random and pretend that it represents 0/24. This is legitimate because, given the black dot, the other dots are uniform at random. Then, with the five gaps X1, ..., X5 between consecutive dots,

24 = E[X1 + ··· + X5] = 5E[X1], by linearity and symmetry.

Hence, E[X1] = E[Xm] = 24/5, i.e., 24/(n+1) for n buses.
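The symmetry argument can be tested by simulation: drop the buses and the black dot uniformly on a 24-hour circle and average the gap that follows the dot. A Python sketch (illustrative, not part of the slides), with n = 4 buses so the expected gap is 24/5 = 4.8:

```python
import random

# Simulation: n buses plus the observer's "black dot" are n+1 i.i.d. uniform
# points on a 24-hour circle, so the gap after the dot has mean 24/(n+1).
def mean_gap_after_dot(n=4, trials=20_000, seed=2):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        buses = sorted(rng.uniform(0, 24) for _ in range(n))
        dot = rng.uniform(0, 24)
        later = [t for t in buses if t > dot]
        nxt = later[0] if later else buses[0] + 24  # wrap around the circle
        total += nxt - dot
    return total / trials
```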

SLIDE 23

Summary.

Expectation; Conditional Expectation; B(n, p); G(p)

Expectation: E[X] = ∑_a a×Pr[X = a] = ∑_ω X(ω)Pr[ω].

Linearity: E[a1X1 + ··· + anXn] = a1E[X1] + ··· + anE[Xn].

Binomial: Pr[X = i] = (n choose i) pⁱ(1−p)^(n−i); E[X] = np.

Geometric: Pr[X = i] = (1−p)^(i−1) p; E[X] = 1/p; memoryless.

Conditional Expectation: E[X|A]. Linear, and E[X] = E[X|A]Pr[A] + E[X|Ā]Pr[Ā].