SLIDE 1 CS70: Jean Walrand: Lecture 35.
Conditional Expectation, Continuous Probability Warning: This lecture is rated R.
- 1. Conditional Expectation
◮ Review ◮ Going Viral ◮ Wald's Identity ◮ CE = MMSE
- 2. Continuous Probability
◮ Motivation. ◮ Continuous Random Variables. ◮ Cumulative Distribution Function. ◮ Probability Density Function ◮ Expectation and Variance
SLIDE 2
Conditional Expectation
Definition Let X and Y be RVs on Ω. The conditional expectation of Y given X is defined as E[Y|X] = g(X), where g(x) := E[Y|X = x] := ∑_y y Pr[Y = y|X = x].
SLIDE 3
Properties of Conditional Expectation
E[Y|X = x] = ∑_y y Pr[Y = y|X = x]
Theorem (a) X, Y independent ⇒ E[Y|X] = E[Y]; (b) E[aY + bZ|X] = aE[Y|X] + bE[Z|X]; (c) E[Y h(X)|X] = h(X)E[Y|X], ∀h(·); (d) E[h(X)E[Y|X]] = E[h(X)Y], ∀h(·); (e) E[E[Y|X]] = E[Y].
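As a sanity check, the tower property (e) can be verified numerically on a small joint distribution. The numbers below are made up for illustration; any valid joint pmf works.

```python
# Numerical check of the tower property E[E[Y|X]] = E[Y] on a
# hypothetical joint pmf of two discrete RVs (values chosen arbitrarily).
joint = {  # (x, y): Pr[X = x, Y = y]
    (0, 1): 0.1, (0, 2): 0.2,
    (1, 1): 0.3, (1, 2): 0.4,
}

def marginal_x(x):
    return sum(p for (xx, _), p in joint.items() if xx == x)

def cond_exp_y_given(x):
    # g(x) = E[Y | X = x] = sum_y y * Pr[Y = y | X = x]
    px = marginal_x(x)
    return sum(y * p / px for (xx, y), p in joint.items() if xx == x)

# E[E[Y|X]] = sum_x g(x) Pr[X = x]
lhs = sum(cond_exp_y_given(x) * marginal_x(x) for x in {0, 1})
# E[Y] computed directly from the joint pmf
rhs = sum(y * p for (_, y), p in joint.items())
print(lhs, rhs)  # the two agree
```
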
SLIDE 4
Application: Going Viral
Consider a social network (e.g., Twitter). You start a rumor (e.g., Walrand is really weird). You have d friends. Each of your friends retweets w.p. p. Each of your friends has d friends, etc. Does the rumor spread? Does it die out (mercifully)? In this example, d = 4.
SLIDE 5
Application: Going Viral
Fact: Let X = ∑_{n=1}^∞ Xn, where Xn is the number of people who tweet the rumor in generation n. Then, E[X] < ∞ iff pd < 1.
Proof:
Given Xn = k, Xn+1 = B(kd, p). Hence, E[Xn+1 | Xn = k] = kpd. Thus, E[Xn+1 | Xn] = pd·Xn, and taking expectations, E[Xn+1] = pd·E[Xn]. Since E[X1] = 1, it follows that E[Xn] = (pd)^{n−1}, n ≥ 1. If pd < 1, then E[X1 + ··· + Xn] = 1 + pd + ··· + (pd)^{n−1} ≤ (1 − pd)^{−1}, so E[X] ≤ (1 − pd)^{−1}. If pd ≥ 1, then for all C one can find n s.t. E[X] ≥ E[X1 + ··· + Xn] ≥ C. In fact, one can show that pd ≥ 1 ⇒ Pr[X = ∞] > 0.
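The recursion Xn+1 = B(Xn·d, p) is easy to simulate. A quick Monte Carlo sketch (the choice p = 0.2 is arbitrary, giving pd = 0.8 < 1):

```python
import random

# Monte Carlo sketch of the branching process: each of the X_n people in
# generation n has d friends, each retweeting independently w.p. p, so
# X_{n+1} ~ Binomial(X_n * d, p) and E[X_n] = (pd)^(n-1).
random.seed(0)

def simulate_generation_sizes(d, p, generations):
    sizes = [1]                       # X_1 = 1: you start the rumor
    for _ in range(generations - 1):
        k = sizes[-1]
        # Binomial(k*d, p) as a sum of k*d Bernoulli(p) trials
        sizes.append(sum(random.random() < p for _ in range(k * d)))
    return sizes

d, p, n = 4, 0.2, 5                   # pd = 0.8 < 1: rumor dies out
trials = [simulate_generation_sizes(d, p, n) for _ in range(20000)]
avg_X5 = sum(t[-1] for t in trials) / len(trials)
print(avg_X5, (p * d) ** (n - 1))    # both close to 0.8^4 = 0.4096
```
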
SLIDE 6
Application: Going Viral
An easy extension: Assume that everyone has an independent number Di of friends with E[Di] = d. Then, the same fact holds. To see this, note that given Xn = k, and given the numbers of friends D1 = d1,...,Dk = dk of these Xn people, one has Xn+1 = B(d1 +···+dk,p). Hence, E[Xn+1|Xn = k,D1 = d1,...,Dk = dk] = p(d1 +···+dk). Thus, E[Xn+1|Xn = k,D1,...,Dk] = p(D1 +···+Dk). Consequently, E[Xn+1|Xn = k] = E[p(D1 +···+Dk)] = pdk. Finally, E[Xn+1|Xn] = pdXn, and E[Xn+1] = pdE[Xn]. We conclude as before.
SLIDE 7
Application: Wald’s Identity
Here is an extension of an identity we used in the last slide.
Theorem (Wald's Identity) Assume that X1, X2, ... and Z are independent, where Z takes values in {0, 1, 2, ...} and E[Xn] = µ for all n ≥ 1. Then, E[X1 + ··· + XZ] = µE[Z].
Proof: E[X1 + ··· + XZ | Z = k] = µk. Thus, E[X1 + ··· + XZ | Z] = µZ. Hence, E[X1 + ··· + XZ] = E[µZ] = µE[Z].
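A quick Monte Carlo check of Wald's identity; the specific distributions (Xi uniform on {1,...,6}, Z uniform on {0,...,4}) are arbitrary choices for illustration:

```python
import random

# Monte Carlo check of Wald's identity E[X_1 + ... + X_Z] = mu * E[Z],
# with X_i ~ Uniform{1,...,6} (mu = 3.5) and Z ~ Uniform{0,...,4}
# (E[Z] = 2), Z independent of the X_i.
random.seed(0)
N = 200000
total = 0.0
for _ in range(N):
    z = random.randrange(5)                        # Z in {0,...,4}
    total += sum(random.randrange(1, 7) for _ in range(z))
estimate = total / N
mu, ez = 3.5, 2.0
print(estimate, mu * ez)   # both close to 7.0
```
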
SLIDE 8
CE = MMSE
Theorem E[Y|X] is the ‘best’ guess about Y based on X. Specifically, it is the function g(X) of X that minimizes E[(Y −g(X))2].
SLIDE 9
CE = MMSE
Theorem CE = MMSE g(X) := E[Y|X] is the function of X that minimizes E[(Y −g(X))2] . Proof: First recall the projection property of CE: E[(Y −E[Y|X])h(X)] = 0,∀h(·). That is, the error Y −E[Y|X] is orthogonal to any h(X).
SLIDE 10
CE = MMSE
Theorem CE = MMSE g(X) := E[Y|X] is the function of X that minimizes E[(Y −g(X))2] . Proof: Let h(X) be any function of X. Then E[(Y −h(X))2] = E[(Y −g(X)+g(X)−h(X))2] = E[(Y −g(X))2]+E[(g(X)−h(X))2] +2E[(Y −g(X))(g(X)−h(X))]. But, E[(Y −g(X))(g(X)−h(X))] = 0 by the projection property. Thus, E[(Y −h(X))2] ≥ E[(Y −g(X))2].
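The theorem can be illustrated numerically: estimate g(x) = E[Y|X = x] from samples and compare its mean squared error against other guesses. The setup below (X uniform on {0,1,2}, Y = X² plus ±1 noise, and the competing predictors) is made up for illustration:

```python
import random

# Numerical illustration that g(X) = E[Y|X] minimizes E[(Y - h(X))^2].
# Hypothetical setup: X ~ Uniform{0,1,2}, Y = X^2 + noise, noise = +/-1.
random.seed(0)
samples = []
for _ in range(100000):
    x = random.randrange(3)
    samples.append((x, x * x + random.choice([-1, 1])))

# Estimate g(x) = E[Y | X = x] empirically.
g = {}
for x in range(3):
    ys = [y for (xx, y) in samples if xx == x]
    g[x] = sum(ys) / len(ys)

def mse(h):
    return sum((y - h(x)) ** 2 for (x, y) in samples) / len(samples)

mse_ce = mse(lambda x: g[x])           # conditional expectation predictor
mse_lin = mse(lambda x: 2 * x - 0.5)   # an arbitrary linear guess
mse_const = mse(lambda x: 1.5)         # an arbitrary constant guess
print(mse_ce, mse_lin, mse_const)      # mse_ce is smallest
```
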
SLIDE 11
E[Y|X] and L[Y|X] as projections
L[Y|X] is the projection of Y on {a + bX : a, b ∈ ℜ}: LLSE. E[Y|X] is the projection of Y on {g(X) : g(·) : ℜ → ℜ}: MMSE.
SLIDE 12
Continuous Probability - James Bond.
◮ Escapes from SPECTRE sometime during a 1,000-mile flight.
◮ Uniformly likely to be at any point along the path.
What is the chance he is at any given point along the path? Discrete setting: uniform over Ω = {1,...,1000}. Continuous setting: probability at any point in [0,1000]? Probability at any one of an infinite number of points is ... uh ... 0?
SLIDE 13
Continuous Probability: the interval!
Consider [a,b] ⊆ [0,ℓ] (for James, ℓ = 1000). Let [a,b] also denote the event that the point is in the interval [a,b]. Pr[[a,b]] = (length of [a,b])/(length of [0,ℓ]) = (b − a)/ℓ = (b − a)/1000. Again, [a,b] ⊆ Ω = [0,ℓ] are events. Events in this space are unions of intervals. Example: the event A = “within 50 miles of base” is [0,50] ∪ [950,1000]. Pr[A] = Pr[[0,50]] + Pr[[950,1000]] = 1/10.
SLIDE 14
Shooting..
Another Bond example: SPECTRE is chasing him in a buggy. Bond shoots at the buggy and hits it at a random spot. What is the chance he hits the gas tank? The gas tank is a circle of radius one foot and the buggy is a 4 × 5 rectangle. Ω = {(x,y) : x ∈ [0,4], y ∈ [0,5]}. The size of the event is π(1)² = π. The size of the sample space is 4 × 5 = 20. Since the shot is uniform, the probability of the event is π/20.
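A Monte Carlo sketch of this area ratio; the tank's center position (2, 2.5) is an arbitrary choice, since only the areas matter for a uniform shot:

```python
import random

# Monte Carlo estimate of the gas-tank hit probability: a uniform point
# in the 4x5 rectangle lands inside a circle of radius 1, so Pr = pi/20.
# The circle's center (2, 2.5) is a hypothetical placement.
random.seed(0)
N = 400000
hits = sum(
    (random.uniform(0, 4) - 2) ** 2 + (random.uniform(0, 5) - 2.5) ** 2 <= 1
    for _ in range(N)
)
print(hits / N)   # close to pi/20 ~ 0.157
```
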
SLIDE 15
Buffon’s needle.
Throw a needle on a board with horizontal lines at random. The lines are 1 unit apart and the needle has length 1. What is the probability that the needle hits a line? Clearly... 2/π.
SLIDE 16 Buffon’s needle.
Sample space: possible positions of the needle. Position: center position (X,Y) and orientation Θ.
Relevant quantities: the X coordinate does not matter; Y := distance from the center of the needle to the closest line, so Y ∈ [0, 1/2]; Θ := angle to the vertical, Θ ∈ [−π/2, π/2]. When Y ≤ (1/2)cosΘ, the needle intersects a line.
Pr[“intersects”] = ∫_{−π/2}^{π/2} Pr[Θ ∈ [θ, θ + dθ]] · Pr[Y ≤ (1/2)cosθ]
= ∫_{−π/2}^{π/2} (dθ/π) · ((1/2)cosθ)/(1/2)
= (1/π) [sinθ]_{−π/2}^{π/2} = 2/π.
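The (Y, Θ) description above translates directly into a simulation:

```python
import math
import random

# Simulation of Buffon's needle with line spacing 1 and needle length 1:
# draw Y ~ Uniform[0, 1/2] (distance from center to nearest line) and
# Theta ~ Uniform[-pi/2, pi/2]; the needle crosses a line iff
# Y <= (1/2) cos(Theta).
random.seed(0)
N = 400000
crossings = sum(
    random.uniform(0, 0.5) <= 0.5 * math.cos(random.uniform(-math.pi / 2, math.pi / 2))
    for _ in range(N)
)
print(crossings / N, 2 / math.pi)   # both close to 0.6366
```
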
SLIDE 17
Continuous Random Variables: CDF
Use Pr[a ≤ X ≤ b] instead of Pr[X = a]. Specifying this for all a and b specifies the behavior! Simpler: Pr[X ≤ x] for all x. The cumulative distribution function of X is F(x) = Pr[X ≤ x]. Then Pr[a < X ≤ b] = Pr[X ≤ b] − Pr[X ≤ a] = F(b) − F(a). Idea: consider the two events X ≤ a and X ≤ b; their difference is the event a < X ≤ b.
SLIDE 18
Example: CDF
Example: Bond's position.
F(x) = Pr[X ≤ x] = 0 for x < 0; x/1000 for 0 ≤ x ≤ 1000; 1 for x > 1000.
Probability that Bond is within 50 miles of the center:
Pr[450 < X ≤ 550] = Pr[X ≤ 550] − Pr[X ≤ 450] = 550/1000 − 450/1000 = 100/1000 = 1/10.
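The piecewise CDF above, written out as a small function:

```python
# Bond's position: CDF of the uniform distribution on [0, 1000].
def F(x):
    if x < 0:
        return 0.0
    if x <= 1000:
        return x / 1000
    return 1.0

# Pr[450 < X <= 550] = F(550) - F(450)
prob = F(550) - F(450)
print(prob)  # 0.1
```
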
SLIDE 19
Example: CDF
Example: hitting a random location on the gas tank. Random location on a circle of radius 1. Random variable: Y, the distance from the center. Probability of being within y of the center:
Pr[Y ≤ y] = (area of small circle)/(area of dartboard) = πy²/π = y².
Hence, FY(y) = Pr[Y ≤ y] = 0 for y < 0; y² for 0 ≤ y ≤ 1; 1 for y > 1.
SLIDE 20
Calculation of event with dartboard..
Probability of being between 0.5 and 0.6 from the center? Recall the CDF:
FY(y) = Pr[Y ≤ y] = 0 for y < 0; y² for 0 ≤ y ≤ 1; 1 for y > 1.
Pr[0.5 < Y ≤ 0.6] = Pr[Y ≤ 0.6] − Pr[Y ≤ 0.5] = FY(0.6) − FY(0.5) = 0.36 − 0.25 = 0.11.
SLIDE 21 Density function.
Is the dart more likely to be near 0.5 or near 0.1? The probability of being “near x” is Pr[x < X ≤ x + δ], which goes to 0 as δ goes to zero. Try Pr[x < X ≤ x + δ]/δ instead, and take the limit as δ goes to zero:
lim_{δ→0} Pr[x < X ≤ x + δ]/δ = lim_{δ→0} (Pr[X ≤ x + δ] − Pr[X ≤ x])/δ = lim_{δ→0} (FX(x + δ) − FX(x))/δ = dFX(x)/dx.
SLIDE 22
Density
Definition: (Density) A probability density function for a random variable X with cdf FX(x) = Pr[X ≤ x] is a function fX(x) such that
FX(x) = ∫_{−∞}^{x} fX(u) du.
Thus, Pr[X ∈ (x, x + δ]] = FX(x + δ) − FX(x) ≈ fX(x)δ.
SLIDE 23
Examples: Density.
Example: uniform over the interval [0,1000]:
fX(x) = F′X(x) = 0 for x < 0; 1/1000 for 0 ≤ x ≤ 1000; 0 for x > 1000.
Example: uniform over the interval [0,ℓ]:
fX(x) = F′X(x) = 0 for x < 0; 1/ℓ for 0 ≤ x ≤ ℓ; 0 for x > ℓ.
SLIDE 24
Examples: Density.
Example: “Dart” board. Recall that
FY(y) = Pr[Y ≤ y] = 0 for y < 0; y² for 0 ≤ y ≤ 1; 1 for y > 1.
Hence,
fY(y) = F′Y(y) = 0 for y < 0; 2y for 0 ≤ y ≤ 1; 0 for y > 1.
The cumulative distribution function (cdf) and the probability density function (pdf) give full information. Use whichever is convenient.
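The difference quotient from slide 21 can be checked numerically against the dartboard density fY(y) = 2y:

```python
# Numerical check that f_Y(y) = 2y is the derivative of the dartboard
# CDF F_Y(y) = y^2, via the difference quotient (F(y+d) - F(y)) / d.
def F(y):
    return 0.0 if y < 0 else (y * y if y <= 1 else 1.0)

delta = 1e-6
approx = {y: (F(y + delta) - F(y)) / delta for y in (0.1, 0.5, 0.9)}
print(approx)  # close to {0.1: 0.2, 0.5: 1.0, 0.9: 1.8}
```
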
SLIDE 25
U[a,b]
SLIDE 26
Expo(λ)
The exponential distribution with parameter λ > 0 is defined by
fX(x) = λe^{−λx} 1{x ≥ 0},
FX(x) = 0 if x < 0; 1 − e^{−λx} if x ≥ 0.
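One can sample from Expo(λ) by the standard inverse-transform method (not covered on this slide): if U is Uniform(0,1), then X = −ln(1 − U)/λ has the CDF above. A quick empirical check of FX:

```python
import math
import random

# Sampling Expo(lam) by inverse transform and checking the CDF
# FX(1) = 1 - e^(-lam) empirically (lam = 2 chosen arbitrarily).
random.seed(0)
lam = 2.0
N = 200000
xs = [-math.log(1.0 - random.random()) / lam for _ in range(N)]
emp = sum(x <= 1.0 for x in xs) / N          # empirical Pr[X <= 1]
print(emp, 1 - math.exp(-lam))               # both close to 0.8647
```
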
SLIDE 27
Expectation
Recall that Pr[X ∈ (iδ, (i+1)δ]] ≈ fX(iδ)δ. Thus,
E[X] = ∑_{i=−∞}^{∞} (iδ) Pr[iδ < X ≤ (i+1)δ] ≈ ∑_{i=−∞}^{∞} (iδ) fX(iδ)δ → ∫_{−∞}^{∞} x fX(x) dx as δ → 0.
Definition The expectation E[X] of a continuous random variable X is defined as
E[X] = ∫_{−∞}^{∞} x fX(x) dx.
SLIDE 28
Expectation of U[a,b]
Let X = U[a,b]. That is, fX(x) = (1/(b − a)) 1{a ≤ x ≤ b}. Hence,
E[X] = ∫_{−∞}^{∞} x fX(x) dx = ∫_a^b x · (1/(b − a)) dx = (1/(b − a)) ∫_a^b x dx = (1/(b − a)) [x²/2]_a^b = (b² − a²)/(2(b − a)) = (a + b)/2.
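The integral ∫ x fX(x) dx can also be approximated by a Riemann sum, mirroring the discretization on slide 27 (the endpoints a = 2, b = 10 are arbitrary):

```python
# Riemann (midpoint) approximation of E[X] = integral of x * 1/(b-a)
# over [a, b], compared with the closed form (a+b)/2.
a, b = 2.0, 10.0
n = 1_000_000
dx = (b - a) / n
est = sum((a + (i + 0.5) * dx) * (1 / (b - a)) * dx for i in range(n))
print(est, (a + b) / 2)   # both 6.0
```
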
SLIDE 29 Expectation: dartboard.
Example: distance from the center on a radius-1 dartboard. Recall: fY(y) = 2y · 1{0 ≤ y ≤ 1}. Hence,
E[Y] = ∫_{−∞}^{∞} y fY(y) dy = 0 + ∫_0^1 2y² dy + 0 = [2y³/3]_0^1 = 2/3.
Try the whole process for a general radius. What do you get?
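For a general radius r, the same computation gives fY(y) = 2y/r² on [0, r] and E[Y] = 2r/3; a Monte Carlo sketch (sampling uniform points in the disk by rejection from the bounding square) is consistent with this:

```python
import math
import random

# Monte Carlo E[Y] for the distance Y from the center of a radius-r
# disk, to answer the "general radius" question: E[Y] = 2r/3.
random.seed(0)

def mean_distance(r, n=200000):
    total, count = 0.0, 0
    while count < n:
        x, y = random.uniform(-r, r), random.uniform(-r, r)
        if x * x + y * y <= r * r:      # accept points inside the disk
            total += math.hypot(x, y)
            count += 1
    return total / count

for r in (1.0, 3.0):
    print(r, mean_distance(r), 2 * r / 3)
```
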
SLIDE 30
Expectation: Exponential.
Let X = Expo(λ). Then,
E[X] = ∫_{−∞}^{∞} x fX(x) dx = ∫_0^∞ xλe^{−λx} dx = −∫_0^∞ x d(e^{−λx})
=(∗) −{[xe^{−λx}]_0^∞ − ∫_0^∞ e^{−λx} dx} = ∫_0^∞ e^{−λx} dx = −(1/λ) ∫_0^∞ d(e^{−λx}) = −(1/λ) [e^{−λx}]_0^∞ = 1/λ.
(∗) We used the integration by parts formula
∫_a^b f(x) dg(x) = [f(x)g(x)]_a^b − ∫_a^b g(x) df(x),
which follows from [f(x)g(x)]′ = f′(x)g(x) + f(x)g′(x).
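The result E[X] = 1/λ checks out by simulation, again sampling via the inverse transform (λ = 0.5 is an arbitrary choice):

```python
import math
import random

# Monte Carlo check that E[X] = 1/lam for X ~ Expo(lam), sampling by
# the inverse transform X = -ln(1 - U)/lam with U ~ Uniform(0,1).
random.seed(0)
lam = 0.5
N = 400000
mean = sum(-math.log(1.0 - random.random()) / lam for _ in range(N)) / N
print(mean, 1 / lam)   # both close to 2.0
```
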
SLIDE 31 Variance
Definition: The variance of a continuous random variable X is
Var(X) = E((X − E(X))²) = E(X²) − (E(X))² = ∫_{−∞}^{∞} x² f(x) dx − (∫_{−∞}^{∞} x f(x) dx)².
Example: uniform on [0,ℓ].
E(X²) = ∫_0^ℓ x² (1/ℓ) dx = [x³/(3ℓ)]_0^ℓ = ℓ²/3, and E(X) = ℓ/2. Hence,
Var(X) = ℓ²/3 − ℓ²/4 = ℓ²/12.
Compare with the variance (n² − 1)/12 of the uniform discrete distribution on {1,...,n}.
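A quick Monte Carlo check of Var(X) = ℓ²/12 (with the arbitrary choice ℓ = 6, so ℓ²/12 = 3):

```python
import random

# Monte Carlo check that Var(X) = l^2/12 for X ~ U[0, l].
random.seed(0)
l = 6.0
N = 400000
xs = [random.uniform(0, l) for _ in range(N)]
m = sum(xs) / N                      # sample mean, close to l/2 = 3
var = sum((x - m) ** 2 for x in xs) / N
print(var, l * l / 12)               # both close to 3.0
```
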
SLIDE 32 Summary
Conditional Expectation, Continuous Probability
- 1. E[Y|X] = g(X), where g(x) := ∑_y y Pr[Y = y|X = x].
- 2. Properties: Linearity, ...., MMSE.
- 3. Applications: Diluting, Mixing, Going Viral, Wald.
- 4. Motivation for Continuous Probability: The world is
continuous ....
- 5. pdf: Pr[X ∈ (x, x + δ]] ≈ fX(x)δ.
- 6. CDF: Pr[X ≤ x] = FX(x) = ∫_{−∞}^{x} fX(y) dy.
- 7. U[a,b], Expo(λ), target.
- 8. Expectation: E[X] = ∫_{−∞}^{∞} x fX(x) dx.
- 9. Variance: var[X] = E[(X −E[X])2] = E[X 2]−E[X]2.