SLIDE 1
CS70: Jean Walrand: Lecture 34.
Conditional Expectation
SLIDE 2
CS70: Jean Walrand: Lecture 34.
Conditional Expectation
- 1. Review: joint distribution, LLSE
- 2. Definition of Conditional expectation
- 3. Properties of CE
- 4. Applications: Diluting, Mixing, Rumors
- 5. CE = MMSE
SLIDES 3–9
Review
Definitions. Let X and Y be RVs on Ω.
◮ Joint Distribution: Pr[X = x, Y = y]
◮ Marginal Distribution: Pr[X = x] = ∑y Pr[X = x, Y = y]
◮ Conditional Distribution: Pr[Y = y|X = x] = Pr[X = x, Y = y]/Pr[X = x]
◮ LLSE: L[Y|X] = a + bX, where a, b minimize E[(Y − a − bX)^2]. We saw that
L[Y|X] = E[Y] + (cov(X,Y)/var[X])(X − E[X]).
Recall the non-Bayesian and Bayesian viewpoints.
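The review formula for L[Y|X] can be illustrated numerically. The data-generating model below (Y = X² plus noise) is an invented example, not from the lecture; it is chosen because the dependence of Y on X is real but not linear:

```python
import random

# Invented test model: Y = X^2 + small Gaussian noise, X uniform on [-1, 1].
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(100_000)]
ys = [x * x + random.gauss(0, 0.1) for x in xs]

n = len(xs)
ex = sum(xs) / n                                   # E[X]
ey = sum(ys) / n                                   # E[Y]
var_x = sum((x - ex) ** 2 for x in xs) / n         # var[X]
cov_xy = sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / n  # cov(X, Y)

# L[Y|X] = E[Y] + (cov(X,Y)/var[X]) (X - E[X]) = a + bX
b = cov_xy / var_x
a = ey - b * ex
print(a, b)   # b is near 0: the LLSE sees no linear trend, yet Y depends on X
```

Here cov(X, X²) = 0 because X is symmetric about 0, so the best linear estimate is essentially the constant E[Y]; this is exactly the kind of situation that motivates looking for a better, nonlinear estimate.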
SLIDES 10–20
Conditional Expectation: Motivation
There are many situations where a good guess about Y given X is not linear. E.g., (diameter of object, weight), (school years, income), (PSA level, cancer risk).
Our goal: Derive the best estimate of Y given X! That is, find the function g(·) so that g(X) is the best guess about Y given X. Ambitious! Can it be done? Amazingly, yes!
SLIDES 21–25
Conditional Expectation
Definition. Let X and Y be RVs on Ω. The conditional expectation of Y given X is defined as E[Y|X] = g(X), where
g(x) := E[Y|X = x] := ∑y y Pr[Y = y|X = x].
Fact
E[Y|X = x] = ∑ω Y(ω) Pr[ω|X = x].
Proof: E[Y|X = x] = E[Y|A] with A = {ω : X(ω) = x}.
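The definition translates directly into code. A minimal sketch computing g(x) = E[Y|X = x] from a joint pmf; the pmf below is an invented example, not from the lecture:

```python
from collections import defaultdict

# Invented joint pmf: joint[(x, y)] = Pr[X = x, Y = y].
joint = {
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 2): 0.4,
}

def cond_expectation(joint):
    """Return g with g[x] = E[Y | X = x] = sum_y y Pr[Y = y | X = x]."""
    marg = defaultdict(float)    # Pr[X = x]
    num = defaultdict(float)     # sum_y y Pr[X = x, Y = y]
    for (x, y), p in joint.items():
        marg[x] += p
        num[x] += y * p
    # Divide by the marginal to get the conditional expectation.
    return {x: num[x] / marg[x] for x in marg}

g = cond_expectation(joint)
print(g)   # g[0] = 0.2/0.3 = 2/3, g[1] = 0.8/0.7 = 8/7
```

E[Y|X] is then the random variable g(X): when the outcome has X = x, the estimate of Y is g[x].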
SLIDES 26–39
Deja vu, all over again?
Have we seen this before? Yes. Is anything new? Yes. The idea of defining g(x) = E[Y|X = x] and then E[Y|X] = g(X). Big deal? Quite! Simple but most convenient.
Recall that L[Y|X] = a + bX is a function of X. This is similar: E[Y|X] = g(X) for some function g(·). In general, g(X) is not linear, i.e., not a + bX. It could be that g(X) = a + bX + cX^2. Or that g(X) = 2 sin(4X) + exp{−3X}. Or something else.
SLIDES 40–59
Properties of CE
E[Y|X = x] = ∑y y Pr[Y = y|X = x]
Theorem
(a) X, Y independent ⇒ E[Y|X] = E[Y];
(b) E[aY + bZ|X] = aE[Y|X] + bE[Z|X];
(c) E[Y h(X)|X] = h(X)E[Y|X], ∀h(·);
(d) E[h(X)E[Y|X]] = E[h(X)Y], ∀h(·);
(e) E[E[Y|X]] = E[Y].
Proof:
(a), (b) Obvious.
(c) E[Y h(X)|X = x] = ∑ω Y(ω)h(X(ω)) Pr[ω|X = x] = ∑ω Y(ω)h(x) Pr[ω|X = x] = h(x)E[Y|X = x].
(d) E[h(X)E[Y|X]] = ∑x h(x)E[Y|X = x] Pr[X = x]
= ∑x h(x) ∑y y Pr[Y = y|X = x] Pr[X = x]
= ∑x h(x) ∑y y Pr[X = x, Y = y]
= ∑x,y h(x) y Pr[X = x, Y = y] = E[h(X)Y].
(e) Let h(X) = 1 in (d).
SLIDES 60–64
Properties of CE
Theorem (a) X, Y independent ⇒ E[Y|X] = E[Y]; (b) E[aY + bZ|X] = aE[Y|X] + bE[Z|X]; (c) E[Y h(X)|X] = h(X)E[Y|X], ∀h(·); (d) E[h(X)E[Y|X]] = E[h(X)Y], ∀h(·); (e) E[E[Y|X]] = E[Y].
Note that (d) says that E[(Y − E[Y|X])h(X)] = 0. We say that the estimation error Y − E[Y|X] is orthogonal to every function h(X) of X. We call this the projection property. More about this later.
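Properties (d) and (e) can be checked by direct enumeration on a small joint pmf. The pmf and the function h below are invented test data, not from the lecture:

```python
from collections import defaultdict

# Invented joint pmf: joint[(x, y)] = Pr[X = x, Y = y].
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 2): 0.4}

marg = defaultdict(float)
num = defaultdict(float)
for (x, y), p in joint.items():
    marg[x] += p
    num[x] += y * p
g = {x: num[x] / marg[x] for x in marg}      # g(x) = E[Y | X = x]

def h(x):
    return 3 * x + 1                         # an arbitrary test function h

# (d): E[h(X) E[Y|X]] == E[h(X) Y]
lhs = sum(h(x) * g[x] * marg[x] for x in marg)
rhs = sum(h(x) * y * p for (x, y), p in joint.items())
print(abs(lhs - rhs))                        # ~0, up to float rounding

# (e): E[E[Y|X]] == E[Y]  (take h = 1 in (d))
tower = sum(g[x] * marg[x] for x in marg)
ey = sum(y * p for (x, y), p in joint.items())
print(abs(tower - ey))                       # ~0
```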
SLIDES 65–71
Application: Calculating E[Y|X]
Let X, Y, Z be i.i.d. with mean 0 and variance 1. We want to calculate E[2 + 5X + 7XY + 11X^2 + 13X^3 Z^2 | X]. We find
E[2 + 5X + 7XY + 11X^2 + 13X^3 Z^2 | X]
= 2 + 5X + 7X E[Y|X] + 11X^2 + 13X^3 E[Z^2|X]
= 2 + 5X + 7X E[Y] + 11X^2 + 13X^3 E[Z^2]
= 2 + 5X + 11X^2 + 13X^3 (var[Z] + E[Z]^2)
= 2 + 5X + 11X^2 + 13X^3.
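The same answer can be obtained by brute force: since we condition on X, we may fix X = x and average over Y and Z. Taking Y and Z uniform on {−1, +1} (which have mean 0 and variance 1) makes the enumeration exact; this choice of distribution is mine, not the lecture's:

```python
from itertools import product

def cond_exp(x):
    """E[2 + 5x + 7xY + 11x^2 + 13x^3 Z^2 | X = x], by enumerating
    Y, Z uniform on {-1, +1} (mean 0, variance 1)."""
    vals = [2 + 5*x + 7*x*y + 11*x**2 + 13*x**3 * z**2
            for y, z in product((-1, 1), repeat=2)]
    return sum(vals) / len(vals)

# The 7xY term averages to 0 and Z^2 is identically 1, so the result
# should match 2 + 5x + 11x^2 + 13x^3 for every x.
for x in (-2, 0.5, 3):
    assert abs(cond_exp(x) - (2 + 5*x + 11*x**2 + 13*x**3)) < 1e-9
print("matches 2 + 5X + 11X^2 + 13X^3")
```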
SLIDES 72–86
Application: Diluting
At each step, pick a ball from a well-mixed urn and replace it with a blue ball. Let Xn be the number of red balls in the urn at step n. What is E[Xn]?
Given Xn = m, Xn+1 = m − 1 w.p. m/N (if you pick a red ball) and Xn+1 = m otherwise. Hence,
E[Xn+1|Xn = m] = m − (m/N) = m(N − 1)/N, i.e., E[Xn+1|Xn] = ρXn with ρ := (N − 1)/N.
Consequently,
E[Xn+1] = E[E[Xn+1|Xn]] = ρE[Xn], n ≥ 1
⇒ E[Xn] = ρ^{n−1} E[X1] = N((N − 1)/N)^{n−1}, n ≥ 1,
since the urn starts with X1 = N red balls.
SLIDES 87–88
Diluting
Here is a plot of E[Xn] versus n.
SLIDES 89–99
Diluting
By analyzing E[Xn+1|Xn], we found that E[Xn] = N((N − 1)/N)^{n−1}, n ≥ 1.
Here is another argument for that result. Consider one particular red ball, say ball k. At each step, it remains red w.p. (N − 1)/N, when another ball is picked. Thus, the probability that it is still red at step n is [(N − 1)/N]^{n−1}. Let Yn(k) = 1{ball k is red at step n}. Then, Xn = Yn(1) + ··· + Yn(N). Hence,
E[Xn] = E[Yn(1) + ··· + Yn(N)] = NE[Yn(1)] = N Pr[Yn(1) = 1] = N[(N − 1)/N]^{n−1}.
SLIDES 100–110
Application: Mixing
At each step, pick a ball from each well-mixed urn and transfer them to the other urn. Let Xn be the number of red balls in the bottom urn at step n. What is E[Xn]?
Given Xn = m, Xn+1 = m + 1 w.p. p and Xn+1 = m − 1 w.p. q, where p = (1 − m/N)^2 (B goes up, R down) and q = (m/N)^2 (R goes up, B down). Thus,
E[Xn+1|Xn] = Xn + p − q = Xn + 1 − 2Xn/N = 1 + ρXn, ρ := 1 − 2/N.
SLIDES 111–116
Mixing
We saw that E[Xn+1|Xn] = 1 + ρXn, ρ := 1 − 2/N. Hence, E[Xn+1] = 1 + ρE[Xn]. Using E[X1] = N (the bottom urn starts with the N red balls),
E[X2] = 1 + ρN; E[X3] = 1 + ρ(1 + ρN) = 1 + ρ + ρ^2 N;
E[X4] = 1 + ρ(1 + ρ + ρ^2 N) = 1 + ρ + ρ^2 + ρ^3 N;
E[Xn] = 1 + ρ + ··· + ρ^{n−2} + ρ^{n−1} N.
Hence, E[Xn] = (1 − ρ^{n−1})/(1 − ρ) + ρ^{n−1} N, n ≥ 1.
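The recursion can be checked against the closed form. A minimal sketch with an assumed urn size N = 10 and the bottom urn starting all red; both sequences converge to N/2, the fully mixed average:

```python
def mixing_means(N, steps):
    """E[Xn], n = 1..steps, from the recursion E[X_{n+1}] = 1 + rho*E[Xn],
    with X1 = N (bottom urn starts with all N red balls)."""
    rho = 1 - 2 / N
    means = [float(N)]
    for _ in range(steps - 1):
        means.append(1 + rho * means[-1])
    return means

N, steps = 10, 30
rho = 1 - 2 / N
# Closed form: E[Xn] = (1 - rho^(n-1)) / (1 - rho) + rho^(n-1) * N.
closed = [(1 - rho ** (n - 1)) / (1 - rho) + rho ** (n - 1) * N
          for n in range(1, steps + 1)]
means = mixing_means(N, steps)
print(max(abs(a - b) for a, b in zip(means, closed)))   # ~0
print(means[-1])                                        # approaches N/2 = 5
```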
SLIDES 117–118
Application: Mixing
Here is the plot of E[Xn] versus n.
SLIDES 119–129
Application: Going Viral
Consider a social network (e.g., Twitter). You start a rumor (e.g., Walrand is really weird). You have d friends. Each of your friends retweets w.p. p. Each of your friends has d friends, etc. Does the rumor spread? Does it die out (mercifully)? In this example, d = 4.
SLIDES 130–141
Application: Going Viral
Fact: Let X = ∑_{n=1}^∞ Xn, where Xn is the number of people who tweet the rumor at level n (X1 = 1: you). Then, E[X] < ∞ iff pd < 1.
Proof:
Given Xn = k, Xn+1 = B(kd, p). Hence, E[Xn+1|Xn = k] = kpd. Thus, E[Xn+1|Xn] = pdXn. Consequently, E[Xn] = (pd)^{n−1}, n ≥ 1.
If pd < 1, then E[X1 + ··· + Xn] ≤ (1 − pd)^{−1} ⇒ E[X] ≤ (1 − pd)^{−1}.
If pd ≥ 1, then for all C one can find n s.t. E[X] ≥ E[X1 + ··· + Xn] ≥ C.
In fact, one can show that pd ≥ 1 ⇒ Pr[X = ∞] > 0.
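A small Monte Carlo sketch of the subcritical case; the parameters d = 4, p = 0.2 (so pd = 0.8 < 1) and the trial count are arbitrary choices. Here E[X] = ∑_n (pd)^{n−1} = 1/(1 − pd) = 5, so the empirical mean should be close to 5:

```python
import random

def total_retweets(d, p, rng):
    """Total size X = X1 + X2 + ... of the branching process:
    X1 = 1, and given Xn = k, Xn+1 ~ Binomial(k*d, p)."""
    total, current = 0, 1
    while current > 0:
        total += current
        # Draw Binomial(current * d, p) via Bernoulli trials (stdlib only).
        current = sum(rng.random() < p for _ in range(current * d))
    return total

rng = random.Random(0)
d, p = 4, 0.2                       # pd = 0.8 < 1: the rumor dies out
trials = 20_000
est = sum(total_retweets(d, p, rng) for _ in range(trials)) / trials
print(est)                          # close to 1 / (1 - pd) = 5
```

Note that the simulation terminates because pd < 1 makes extinction certain; for pd ≥ 1 the loop could run forever with positive probability, matching Pr[X = ∞] > 0.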
SLIDE 142
Application: Going Viral
An easy extension: Assume that everyone has an independent number Di of friends, with E[Di] = d. Then, the same fact holds.
To see this, note that given Xn = k, and given the numbers of friends D1 = d1, ..., Dk = dk of these Xn people, one has Xn+1 = B(d1 + ··· + dk, p). Hence, E[Xn+1 | Xn = k, D1 = d1, ..., Dk = dk] = p(d1 + ··· + dk). Thus, E[Xn+1 | Xn = k, D1, ..., Dk] = p(D1 + ··· + Dk). Consequently, E[Xn+1 | Xn = k] = E[p(D1 + ··· + Dk)] = pdk. Finally, E[Xn+1 | Xn] = pd·Xn, and E[Xn+1] = pd·E[Xn]. We conclude as before.
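A quick numerical sanity check of the extension (a sketch: the uniform friend-count distribution on {0, ..., 2d} is just one illustrative choice with E[Di] = d; any such distribution works):

```python
import random

def next_generation(current, p, d):
    """Given Xn = current posters, draw each poster's friend count
    Di uniform on {0, ..., 2d} (so E[Di] = d), then
    Xn+1 ~ Binomial(D1 + ... + Dk, p)."""
    friends = sum(random.randint(0, 2 * d) for _ in range(current))
    return sum(1 for _ in range(friends) if random.random() < p)

random.seed(1)
p, d = 0.25, 4                       # E[Xn+1 | Xn] = pd * Xn = 1.0 * Xn
samples = [next_generation(1, p, d) for _ in range(40000)]
est = sum(samples) / len(samples)    # estimates E[X2] given X1 = 1
print(est, p * d)                    # both close to 1.0
```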
SLIDE 154
Application: Wald’s Identity
Here is an extension of an identity we used in the last slide.
Theorem (Wald’s Identity)
Assume that X1, X2, ... and Z are independent, where Z takes values in {0, 1, 2, ...} and E[Xn] = µ for all n ≥ 1. Then, E[X1 + ··· + XZ] = µE[Z].
Proof:
E[X1 + ··· + XZ | Z = k] = µk. Thus, E[X1 + ··· + XZ | Z] = µZ. Hence, E[X1 + ··· + XZ] = E[µZ] = µE[Z].
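The identity is easy to verify numerically. A sketch under assumed distributions: the Xi are fair six-sided die rolls (µ = 3.5) and Z is uniform on {0, ..., 5}, independent of the Xi:

```python
import random

random.seed(2)
mu, trials = 3.5, 40000
lhs_sum, z_sum = 0, 0
for _ in range(trials):
    z = random.randint(0, 5)         # Z, independent of the Xi; E[Z] = 2.5
    z_sum += z
    lhs_sum += sum(random.randint(1, 6) for _ in range(z))  # X1 + ... + XZ
lhs = lhs_sum / trials               # estimates E[X1 + ... + XZ]
rhs = mu * (z_sum / trials)          # estimates mu * E[Z]
print(lhs, rhs)                      # both near 3.5 * 2.5 = 8.75
```

Note that independence of Z from the Xi matters: if Z were allowed to peek at the rolls (e.g., "stop after the first 6"), the identity could fail in this simple form.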
SLIDE 164
CE = MMSE
Theorem (CE = MMSE)
E[Y|X] is the ‘best’ guess about Y based on X. Specifically, g(X) := E[Y|X] is the function of X that minimizes E[(Y − g(X))^2].
Proof:
Let h(X) be any function of X. Then
E[(Y − h(X))^2] = E[(Y − g(X) + g(X) − h(X))^2]
= E[(Y − g(X))^2] + E[(g(X) − h(X))^2] + 2E[(Y − g(X))(g(X) − h(X))].
But, E[(Y − g(X))(g(X) − h(X))] = 0 by the projection property. Thus, E[(Y − h(X))^2] ≥ E[(Y − g(X))^2].
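A brute-force check of the theorem on a small joint distribution (the pmf values are made up for illustration): no table of guesses h(x) drawn from a grid beats g(x) = E[Y | X = x].

```python
from itertools import product

# A small joint pmf Pr[X = x, Y = y] (hypothetical numbers).
pmf = {(0, 0): 0.2, (0, 1): 0.1, (1, 0): 0.1, (1, 2): 0.3, (2, 1): 0.3}

def cond_exp(x):
    """g(x) = E[Y | X = x] = sum_y y * Pr[Y = y | X = x]."""
    px = sum(p for (a, _), p in pmf.items() if a == x)
    return sum(y * p for (a, y), p in pmf.items() if a == x) / px

def mse(h):
    """E[(Y - h(X))^2] under the joint pmf."""
    return sum(p * (y - h(x)) ** 2 for (x, y), p in pmf.items())

mse_ce = mse(cond_exp)
# Exhaustively try every guess h with h(x) on a grid: none does better.
grid = [i / 4 for i in range(-4, 13)]
best_other = min(
    mse(lambda x, t=t: t[x])
    for t in ({0: a, 1: b, 2: c} for a, b, c in product(grid, repeat=3))
)
print(mse_ce, best_other)            # mse_ce <= best_other
```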
SLIDE 176
E[Y|X] and L[Y|X] as projections
L[Y|X] is the projection of Y on {a + bX, a, b ∈ ℜ}: LLSE.
E[Y|X] is the projection of Y on {g(X), g(·) : ℜ → ℜ}: MMSE.
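The gap between the two projections shows up on a nonlinear example (a sketch with assumed distributions): take X uniform on {−1, 0, 1} and Y = X². Then cov(X, Y) = 0, so L[Y|X] collapses to the constant E[Y], while E[Y|X] = X² recovers Y exactly.

```python
# X uniform on {-1, 0, 1}, Y = X^2: a purely nonlinear dependence.
xs, p = [-1, 0, 1], 1 / 3
EX = sum(x * p for x in xs)                        # E[X] = 0
EY = sum(x * x * p for x in xs)                    # E[Y] = 2/3
cov = sum(x * (x * x) * p for x in xs) - EX * EY   # E[XY] - E[X]E[Y] = 0
varX = sum(x * x * p for x in xs) - EX ** 2        # var[X] = 2/3

def llse(x):
    # L[Y|X] = E[Y] + (cov(X,Y)/var[X]) (x - E[X]) = E[Y] here
    return EY + (cov / varX) * (x - EX)

mse_llse = sum(p * (x * x - llse(x)) ** 2 for x in xs)  # 2/9
mse_ce = sum(p * (x * x - x * x) ** 2 for x in xs)      # E[Y|X] = X^2: 0
print(mse_llse, mse_ce)
```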
SLIDE 181
Summary
Conditional Expectation
◮ Definition: E[Y|X = x] := ∑y y Pr[Y = y | X = x]
◮ Properties: Linearity; Y − E[Y|X] ⊥ h(X); E[E[Y|X]] = E[Y]
◮ Some Applications:
  ◮ Calculating E[Y|X]
  ◮ Diluting
  ◮ Mixing
  ◮ Rumors
  ◮ Wald
◮ MMSE: E[Y|X] minimizes E[(Y − g(X))^2] over all g(·)