CS70: Jean Walrand: Lecture 35. Conditional Expectation, Continuous Probability - PowerPoint PPT Presentation



SLIDE 1

CS70: Jean Walrand: Lecture 35.

Conditional Expectation, Continuous Probability

Warning: This lecture is rated R.

  • 1. Conditional Expectation

◮ Review ◮ Going Viral ◮ Wald’s Identity ◮ CE = MMSE

  • 2. Continuous Probability

◮ Motivation. ◮ Continuous Random Variables. ◮ Cumulative Distribution Function. ◮ Probability Density Function ◮ Expectation and Variance

SLIDE 2

Conditional Expectation

Definition. Let X and Y be RVs on Ω. The conditional expectation of Y given X is defined as

E[Y|X] = g(X), where g(x) := E[Y|X = x] := ∑_y y·Pr[Y = y|X = x].
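To make the definition concrete, here is a minimal Python sketch that computes g(x) = E[Y|X = x] from a small, made-up joint pmf; the numbers in `joint` are hypothetical, not from the lecture:

```python
# Hypothetical joint pmf Pr[X = x, Y = y] for two {0,1}-valued RVs.
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

def cond_expectation(joint, x):
    """g(x) = E[Y | X = x] = sum_y y * Pr[Y = y | X = x]."""
    p_x = sum(p for (xx, _), p in joint.items() if xx == x)  # Pr[X = x]
    return sum(y * p / p_x for (xx, y), p in joint.items() if xx == x)

print(cond_expectation(joint, 0))  # ≈ 0.3 / 0.4 = 0.75
print(cond_expectation(joint, 1))  # ≈ 0.4 / 0.6 ≈ 0.6667
```

Conditioning on X = x simply renormalizes the slice of the joint pmf whose first coordinate is x.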

SLIDE 3

Properties of Conditional Expectation

E[Y|X = x] = ∑_y y·Pr[Y = y|X = x].

Theorem.
(a) X, Y independent ⇒ E[Y|X] = E[Y];
(b) E[aY + bZ|X] = aE[Y|X] + bE[Z|X];
(c) E[Y h(X)|X] = h(X)E[Y|X], ∀h(·);
(d) E[h(X)E[Y|X]] = E[h(X)Y], ∀h(·);
(e) E[E[Y|X]] = E[Y].
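Property (e), the tower property, can be verified exactly on the same kind of toy joint pmf (again with hypothetical numbers):

```python
# Hypothetical joint pmf Pr[X = x, Y = y].
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

# Marginal Pr[X = x] and conditional expectation g(x) = E[Y | X = x].
p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
g = {x: sum(y * p for (xx, y), p in joint.items() if xx == x) / p_x[x]
     for x in (0, 1)}

# Property (e): E[E[Y|X]] = sum_x g(x) * Pr[X = x] must equal E[Y].
lhs = sum(g[x] * p_x[x] for x in p_x)
rhs = sum(y * p for (_, y), p in joint.items())
print(lhs, rhs)  # both ≈ 0.7
```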

SLIDE 4

Application: Going Viral

Consider a social network (e.g., Twitter). You start a rumor (e.g., Walrand is really weird). You have d friends. Each of your friends retweets w.p. p. Each of your friends has d friends, etc. Does the rumor spread? Does it die out (mercifully)? In this example, d = 4.

SLIDE 5

Application: Going Viral

Fact: Let X = ∑_{n=1}^∞ X_n. Then E[X] < ∞ iff pd < 1.

Proof:

Given X_n = k, X_{n+1} = B(kd, p). Hence, E[X_{n+1}|X_n = k] = kpd. Thus, E[X_{n+1}|X_n] = pd·X_n, and taking expectations, E[X_{n+1}] = pd·E[X_n]. Since X_1 = 1 (the original tweet), E[X_n] = (pd)^{n−1}, n ≥ 1. If pd < 1, then E[X_1 + ··· + X_n] ≤ (1 − pd)^{−1}, so E[X] ≤ (1 − pd)^{−1}. If pd ≥ 1, then for every C one can find n s.t. E[X] ≥ E[X_1 + ··· + X_n] ≥ C. In fact, one can show that pd ≥ 1 ⇒ Pr[X = ∞] > 0.
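The recursion is easy to sanity-check by simulation. This sketch assumes the example values d = 4 and p = 0.2 (so pd = 0.8 < 1): starting from X_1 = 1, each generation is Binomial(X_n·d, p), and the sample mean of X_n should be close to (pd)^{n−1}.

```python
import random

random.seed(0)

def mean_generation(n, d, p, trials=20000):
    """Estimate E[X_n] where X_1 = 1 and X_{k+1} = Binomial(X_k * d, p)."""
    total = 0
    for _ in range(trials):
        x = 1
        for _ in range(n - 1):
            x = sum(random.random() < p for _ in range(x * d))  # B(xd, p)
        total += x
    return total / trials

d, p = 4, 0.2                 # pd = 0.8 < 1: the rumor dies out
est = mean_generation(3, d, p)
print(est, (p * d) ** 2)      # estimate vs. E[X_3] = (pd)^2 = 0.64
```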

SLIDE 6

Application: Going Viral

An easy extension: assume that everyone has an independent number D_i of friends, with E[D_i] = d. Then the same fact holds. To see this, note that given X_n = k, and given the numbers of friends D_1 = d_1, ..., D_k = d_k of these X_n people, one has X_{n+1} = B(d_1 + ··· + d_k, p). Hence, E[X_{n+1}|X_n = k, D_1 = d_1, ..., D_k = d_k] = p(d_1 + ··· + d_k). Thus, E[X_{n+1}|X_n = k, D_1, ..., D_k] = p(D_1 + ··· + D_k). Consequently, E[X_{n+1}|X_n = k] = E[p(D_1 + ··· + D_k)] = pdk. Finally, E[X_{n+1}|X_n] = pd·X_n, and E[X_{n+1}] = pd·E[X_n]. We conclude as before.

SLIDE 7

Application: Wald’s Identity

Here is an extension of an identity we used in the last slide.

Theorem (Wald’s Identity). Assume that X_1, X_2, ... and Z are independent, where Z takes values in {0, 1, 2, ...} and E[X_n] = µ for all n ≥ 1. Then E[X_1 + ··· + X_Z] = µE[Z].

Proof: E[X_1 + ··· + X_Z|Z = k] = µk. Thus, E[X_1 + ··· + X_Z|Z] = µZ. Hence, E[X_1 + ··· + X_Z] = E[µZ] = µE[Z].
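Wald’s identity can also be sanity-checked by simulation. The distributions in this sketch are arbitrary illustrative choices: X_i uniform on {1, ..., 6} (so µ = 3.5) and Z = Binomial(10, 0.3) (so E[Z] = 3), independent of the X_i.

```python
import random

random.seed(1)

def wald_sample():
    """Draw Z ~ Binomial(10, 0.3), then sum Z i.i.d. die rolls X_i."""
    z = sum(random.random() < 0.3 for _ in range(10))
    return sum(random.randint(1, 6) for _ in range(z))

trials = 50000
est = sum(wald_sample() for _ in range(trials)) / trials
print(est, 3.5 * 3.0)   # Wald: E[X_1 + ... + X_Z] = mu * E[Z] = 10.5
```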

SLIDE 8

CE = MMSE

Theorem. E[Y|X] is the ‘best’ guess about Y based on X. Specifically, it is the function g(X) of X that minimizes E[(Y − g(X))²].

SLIDE 9

CE = MMSE

Theorem (CE = MMSE). g(X) := E[Y|X] is the function of X that minimizes E[(Y − g(X))²].

Proof: First recall the projection property of CE: E[(Y − E[Y|X])h(X)] = 0, ∀h(·). That is, the error Y − E[Y|X] is orthogonal to any h(X).

SLIDE 10

CE = MMSE

Theorem (CE = MMSE). g(X) := E[Y|X] is the function of X that minimizes E[(Y − g(X))²].

Proof: Let h(X) be any function of X. Then

E[(Y − h(X))²] = E[(Y − g(X) + g(X) − h(X))²]
= E[(Y − g(X))²] + E[(g(X) − h(X))²] + 2E[(Y − g(X))(g(X) − h(X))].

But E[(Y − g(X))(g(X) − h(X))] = 0 by the projection property. Thus, E[(Y − h(X))²] ≥ E[(Y − g(X))²].
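The final inequality can be checked exactly on a small joint pmf (hypothetical numbers): the conditional expectation g beats any other guess h(X).

```python
# Hypothetical joint pmf Pr[X = x, Y = y].
joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}

p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
g = {x: sum(y * p for (xx, y), p in joint.items() if xx == x) / p_x[x]
     for x in (0, 1)}

def mse(h):
    """E[(Y - h(X))^2] under the joint pmf, for a guess h given as a table."""
    return sum(p * (y - h[x]) ** 2 for (x, y), p in joint.items())

# g(X) = E[Y|X] should have the smallest mean squared error.
print(mse(g))                  # ≈ 0.2083
print(mse({0: 0.0, 1: 1.0}))   # ≈ 0.5
print(mse({0: 0.5, 1: 0.5}))   # ≈ 0.25
```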

SLIDE 11

E[Y|X] and L[Y|X] as projections

L[Y|X] is the projection of Y on {a + bX : a, b ∈ ℜ}: LLSE.
E[Y|X] is the projection of Y on {g(X), g(·) : ℜ → ℜ}: MMSE.

SLIDE 12

Continuous Probability - James Bond.

◮ Escapes from SPECTRE sometime during a 1,000-mile flight.

◮ Uniformly likely to be at any point along the path.

What is the chance he is at any point along the path? Discrete setting: uniform over Ω = {1, ..., 1000}. Continuous setting: probability at any point in [0, 1000]? Probability at any one of an infinite number of points is ... uh ... 0?

SLIDE 13

Continuous Probability: the interval!

Consider [a, b] ⊆ [0, ℓ] (for James, ℓ = 1000). Let [a, b] also denote the event that the point is in the interval [a, b]. Then
Pr[[a, b]] = (length of [a, b])/(length of [0, ℓ]) = (b − a)/ℓ = (b − a)/1000.
Again, [a, b] ⊆ Ω = [0, ℓ] are events. Events in this space are unions of intervals. Example: the event A = “within 50 miles of base” is [0, 50] ∪ [950, 1000].
Pr[A] = Pr[[0, 50]] + Pr[[950, 1000]] = 1/10.

SLIDE 14

Shooting..

Another Bond example: SPECTRE is chasing him in a buggy. Bond shoots at the buggy and hits it at a random spot. What is the chance he hits the gas tank? The gas tank is a circle of radius one foot and the buggy is a 4 × 5 rectangle.
Ω = {(x, y) : x ∈ [0, 4], y ∈ [0, 5]}. The size of the event is π(1)² = π. The “size” of the sample space is 4 × 5 = 20. Since the distribution is uniform, the probability of the event is π/20.
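A Monte Carlo sketch of this computation, assuming the tank is a radius-1 circle centered at (2, 2.5) so it fits inside the 4 × 5 rectangle (the center location is an assumption; only the area matters):

```python
import math
import random

random.seed(2)

trials = 200000
hits = 0
for _ in range(trials):
    # Uniform point on the 4 x 5 buggy.
    x, y = random.uniform(0, 4), random.uniform(0, 5)
    # Inside the assumed radius-1 tank centered at (2, 2.5)?
    if (x - 2) ** 2 + (y - 2.5) ** 2 <= 1:
        hits += 1

print(hits / trials, math.pi / 20)   # ≈ 0.157
```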

SLIDE 15

Buffon’s needle.

Throw a needle on a board with horizontal lines at random. The lines are 1 unit apart and the needle has length 1. What is the probability that the needle hits a line? Clearly... 2/π.

SLIDE 16

Buffon’s needle.

Sample space: possible positions of the needle. Position: center position (X, Y) and orientation Θ.

Relevant quantities: the X coordinate doesn’t matter; Y := distance from the closest line, Y ∈ [0, 1/2]; Θ := angle to the vertical, Θ ∈ [−π/2, π/2]. When Y ≤ (1/2)cos Θ, the needle intersects a line.

Pr[“intersects”] = ∫_{−π/2}^{π/2} Pr[Θ ∈ [θ, θ + dθ]]·Pr[Y ≤ (1/2)cos θ]
= ∫_{−π/2}^{π/2} (dθ/π) × ((1/2)cos θ)/(1/2)
= (2/π)·[(1/2)sin θ]_{−π/2}^{π/2} = 2/π.
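The steps above translate directly into a Monte Carlo sketch: sample Y uniform on [0, 1/2] and Θ uniform on [−π/2, π/2], and count how often Y ≤ (1/2)cos Θ.

```python
import math
import random

random.seed(3)

trials = 200000
crossings = 0
for _ in range(trials):
    y = random.uniform(0.0, 0.5)                       # distance to closest line
    theta = random.uniform(-math.pi / 2, math.pi / 2)  # angle to the vertical
    if y <= 0.5 * math.cos(theta):                     # needle crosses a line
        crossings += 1

print(crossings / trials, 2 / math.pi)   # ≈ 0.6366
```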

SLIDE 17

Continuous Random Variables: CDF

Pr[a ≤ X ≤ b] instead of Pr[X = a]. Specifying this for all a and b specifies the behavior! Simpler: Pr[X ≤ x] for all x. The cumulative distribution function of X is F(x) = Pr[X ≤ x], and
Pr[a < X ≤ b] = Pr[X ≤ b] − Pr[X ≤ a] = F(b) − F(a).
Idea: consider the two events X ≤ a and X ≤ b. Their difference is the event a < X ≤ b.

SLIDE 18

Example: CDF

Example: Bond’s position.
F(x) = Pr[X ≤ x] = 0 for x < 0; x/1000 for 0 ≤ x ≤ 1000; 1 for x > 1000.
Probability that Bond is within 50 miles of the center:
Pr[450 < X ≤ 550] = Pr[X ≤ 550] − Pr[X ≤ 450] = 550/1000 − 450/1000 = 100/1000 = 1/10.

SLIDE 19

Example: CDF

Example: hitting a random location on the gas tank. Random location on a circle of radius 1. Random variable: Y, the distance from the center. Probability of being within y of the center:
Pr[Y ≤ y] = (area of small circle)/(area of dartboard) = πy²/π = y².
Hence, F_Y(y) = Pr[Y ≤ y] = 0 for y < 0; y² for 0 ≤ y ≤ 1; 1 for y > 1.

SLIDE 20

Calculation of event with dartboard..

Probability of being between 0.5 and 0.6 from the center? Recall the CDF:
F_Y(y) = Pr[Y ≤ y] = 0 for y < 0; y² for 0 ≤ y ≤ 1; 1 for y > 1.
Pr[0.5 < Y ≤ 0.6] = Pr[Y ≤ 0.6] − Pr[Y ≤ 0.5] = F_Y(0.6) − F_Y(0.5) = 0.36 − 0.25 = 0.11.

SLIDE 21

Density function.

Is the dart more likely to be (near) 0.5 or 0.1? The probability of being “near x” is Pr[x < X ≤ x + δ], which goes to 0 as δ goes to zero. So try Pr[x < X ≤ x + δ]/δ and take the limit as δ goes to zero:

lim_{δ→0} Pr[x < X ≤ x + δ]/δ = lim_{δ→0} (Pr[X ≤ x + δ] − Pr[X ≤ x])/δ = lim_{δ→0} (F_X(x + δ) − F_X(x))/δ = dF_X(x)/dx.

SLIDE 22

Density

Definition (Density). A probability density function for a random variable X with cdf F_X(x) = Pr[X ≤ x] is the function f_X(x) where

F_X(x) = ∫_{−∞}^{x} f_X(u) du.

Thus, Pr[X ∈ (x, x + δ]] = F_X(x + δ) − F_X(x) ≈ f_X(x)δ.

SLIDE 23

Examples: Density.

Example: uniform over the interval [0, 1000]:
f_X(x) = F′_X(x) = 0 for x < 0; 1/1000 for 0 ≤ x ≤ 1000; 0 for x > 1000.

Example: uniform over the interval [0, ℓ]:
f_X(x) = F′_X(x) = 0 for x < 0; 1/ℓ for 0 ≤ x ≤ ℓ; 0 for x > ℓ.

SLIDE 24

Examples: Density.

Example: “dart” board. Recall that
F_Y(y) = Pr[Y ≤ y] = 0 for y < 0; y² for 0 ≤ y ≤ 1; 1 for y > 1.
Hence,
f_Y(y) = F′_Y(y) = 0 for y < 0; 2y for 0 ≤ y ≤ 1; 0 for y > 1.
The cumulative distribution function (cdf) and the probability density function (pdf) give full information. Use whichever is convenient.

SLIDE 25

U[a,b]

SLIDE 26

Expo(λ)

The exponential distribution with parameter λ > 0 is defined by

f_X(x) = λe^{−λx}·1{x ≥ 0},
F_X(x) = 0 if x < 0; 1 − e^{−λx} if x ≥ 0.
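A quick sketch of this definition via inverse-transform sampling: if U ~ U[0, 1], then −ln(1 − U)/λ has exactly the Expo(λ) CDF above. The choice λ = 2 and the test point x = 0.5 are arbitrary.

```python
import math
import random

random.seed(4)

lam = 2.0
# Inverse-transform samples: -ln(1 - U)/lam ~ Expo(lam).
samples = [-math.log(1.0 - random.random()) / lam for _ in range(100000)]

# Empirical Pr[X <= x] vs. F(x) = 1 - e^{-lam x} at a test point.
x = 0.5
emp = sum(s <= x for s in samples) / len(samples)
print(emp, 1 - math.exp(-lam * x))   # both ≈ 0.632
```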

SLIDE 27

Expectation

Recall that Pr[X ∈ (iδ, (i + 1)δ]] ≈ f_X(iδ)δ. Thus,

E[X] ≈ ∑_{i=−∞}^{∞} (iδ)·Pr[iδ < X ≤ (i + 1)δ] ≈ ∑_{i=−∞}^{∞} (iδ)·f_X(iδ)δ ≈ ∫_{−∞}^{∞} x f_X(x) dx.

Definition. The expectation E[X] of a continuous random variable is defined as

E[X] = ∫_{−∞}^{∞} x f_X(x) dx.

SLIDE 28

Expectation of U[a,b]

Let X = U[a, b]. That is, f_X(x) = (1/(b − a))·1{a ≤ x ≤ b}. Hence,

E[X] = ∫_{−∞}^{∞} x f_X(x) dx = ∫_a^b x·(1/(b − a)) dx = (1/(b − a)) ∫_a^b x dx
= (1/(b − a))·[x²/2]_a^b = (b² − a²)/(2(b − a)) = (a + b)/2.
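The integral can be checked numerically with a midpoint Riemann sum (a = 2, b = 10 are arbitrary choices):

```python
a, b = 2.0, 10.0
n = 100000
dx = (b - a) / n

# E[X] = integral of x * f(x) dx with f(x) = 1/(b - a) on [a, b].
expectation = sum((a + (i + 0.5) * dx) / (b - a) * dx for i in range(n))
print(expectation, (a + b) / 2)   # both ≈ 6.0
```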

SLIDE 29

Expectation: dartboard.

Example: distance from the center on a radius-1 dartboard. Recall: f_Y(y) = 2y·1{0 ≤ y ≤ 1}. Hence,

E[Y] = ∫_{−∞}^{∞} y f_Y(y) dy = 0 + ∫_0^1 2y² dy + 0 = [2y³/3]_0^1 = 2/3.

Try the whole process for a general radius. What do you get?
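For the general-radius question: on a radius-R board, Pr[Y ≤ y] = y²/R², so f_Y(y) = 2y/R² and E[Y] = ∫_0^R y·(2y/R²) dy = 2R/3. A Riemann-sum sketch with R = 3 (an arbitrary choice):

```python
R = 3.0
n = 100000
dy = R / n

# E[Y] = integral of y * (2y / R^2) dy over [0, R].
expectation = sum((i + 0.5) * dy * (2 * (i + 0.5) * dy / R**2) * dy
                  for i in range(n))
print(expectation, 2 * R / 3)   # both ≈ 2.0
```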

SLIDE 30

Expectation: Exponential.

Let X = Expo(λ). Then,

E[X] = ∫_{−∞}^{∞} x f_X(x) dx = ∫_0^∞ xλe^{−λx} dx
= −∫_0^∞ x d(e^{−λx})
=(∗) −{[xe^{−λx}]_0^∞ − ∫_0^∞ e^{−λx} dx}
= ∫_0^∞ e^{−λx} dx = −(1/λ)∫_0^∞ d(e^{−λx})
= −(1/λ)[e^{−λx}]_0^∞ = 1/λ.

(∗) We used the integration by parts formula:

∫_a^b f(x) dg(x) = [f(x)g(x)]_a^b − ∫_a^b g(x) df(x),

which follows from [f(x)g(x)]′ = f′(x)g(x) + f(x)g′(x).
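The same answer can be checked with a truncated Riemann sum of ∫_0^∞ xλe^{−λx} dx (λ = 2 is an arbitrary choice; the tail beyond x = 20/λ is negligible):

```python
import math

lam = 2.0
upper = 20 / lam            # truncation point; the e^{-20} tail is negligible
n = 200000
dx = upper / n

integral = sum((i + 0.5) * dx * lam * math.exp(-lam * (i + 0.5) * dx) * dx
               for i in range(n))
print(integral, 1 / lam)    # both ≈ 0.5
```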

SLIDE 31

Variance

Definition: The variance of a continuous random variable X is

E[(X − E[X])²] = E[X²] − (E[X])² = ∫_{−∞}^{∞} x² f(x) dx − (∫_{−∞}^{∞} x f(x) dx)².

Example: uniform on [0, ℓ].
E[X²] = ∫_0^ℓ x²·(1/ℓ) dx = [x³/(3ℓ)]_0^ℓ = ℓ²/3. And E[X] = ℓ/2. So
Var(X) = ℓ²/3 − ℓ²/4 = ℓ²/12.
Compare with the variance (n² − 1)/12 of the uniform discrete distribution on {1, ..., n}.
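A numerical check of both halves of the comparison: Var(X) = ℓ²/12 for U[0, ℓ] (ℓ = 6 is an arbitrary choice), and (n² − 1)/12 for the discrete uniform on {1, ..., n} with n = 6:

```python
# Continuous: Var(X) = E[X^2] - (E[X])^2 for X = U[0, l], via a Riemann sum.
l = 6.0
steps = 100000
dx = l / steps
ex2 = sum(((i + 0.5) * dx) ** 2 / l * dx for i in range(steps))
var_cont = ex2 - (l / 2) ** 2
print(var_cont, l * l / 12)          # both ≈ 3.0

# Discrete: uniform on {1, ..., 6} has variance (6^2 - 1)/12.
vals = range(1, 7)
mean = sum(vals) / 6
var_disc = sum((v - mean) ** 2 for v in vals) / 6
print(var_disc, (6 ** 2 - 1) / 12)   # both ≈ 2.9167
```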

SLIDE 32

Summary

Conditional Expectation, Continuous Probability

  • 1. E[Y|X = x] := ∑_y y·Pr[Y = y|X = x].
  • 2. Properties: linearity, ..., MMSE.
  • 3. Applications: Diluting, Mixing, Going Viral, Wald.
  • 4. Motivation for continuous probability: the world is continuous ...
  • 5. pdf: Pr[X ∈ (x, x + δ]] ≈ f_X(x)δ.
  • 6. CDF: Pr[X ≤ x] = F_X(x) = ∫_{−∞}^x f_X(y) dy.
  • 7. U[a, b], Expo(λ), target.
  • 8. Expectation: E[X] = ∫_{−∞}^{∞} x f_X(x) dx.
  • 9. Variance: var[X] = E[(X − E[X])²] = E[X²] − E[X]².