CS70: Jean Walrand: Lecture 34. Conditional Expectation

SLIDE 1

CS70: Jean Walrand: Lecture 34.

Conditional Expectation

SLIDE 2

CS70: Jean Walrand: Lecture 34.

Conditional Expectation

  • 1. Review: joint distribution, LLSE
  • 2. Definition of Conditional expectation
  • 3. Properties of CE
  • 4. Applications: Diluting, Mixing, Rumors
  • 5. CE = MMSE
SLIDE 9

Review

Definitions Let X and Y be RVs on Ω.

◮ Joint Distribution: Pr[X = x, Y = y]
◮ Marginal Distribution: Pr[X = x] = ∑_y Pr[X = x, Y = y]
◮ Conditional Distribution: Pr[Y = y | X = x] = Pr[X = x, Y = y] / Pr[X = x]
◮ LLSE: L[Y|X] = a + bX, where a, b minimize E[(Y − a − bX)^2].

We saw that L[Y|X] = E[Y] + (cov(X,Y)/var[X])(X − E[X]).

Recall the non-Bayesian and Bayesian viewpoints.
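As a quick numerical check of the LLSE formula, here is a short Python sketch; the `llse` helper and the example pmf are my own illustration, not part of the lecture.

```python
# Sketch: computing L[Y|X] = E[Y] + (cov(X,Y)/var[X])(X - E[X])
# directly from an explicitly given joint pmf.

def llse(joint):
    """joint: dict mapping (x, y) -> Pr[X = x, Y = y]. Returns L[Y|X] as a function of x."""
    EX  = sum(x * p for (x, y), p in joint.items())
    EY  = sum(y * p for (x, y), p in joint.items())
    EXY = sum(x * y * p for (x, y), p in joint.items())
    EX2 = sum(x * x * p for (x, y), p in joint.items())
    cov = EXY - EX * EY
    var = EX2 - EX ** 2
    b = cov / var
    a = EY - b * EX
    return lambda x: a + b * x

# Example: X uniform on {0, 1} and Y = X with probability 1, so L[Y|X] = X.
joint = {(0, 0): 0.5, (1, 1): 0.5}
L = llse(joint)
```

Here the best linear estimate recovers Y exactly because Y is already a linear function of X.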

SLIDE 20

Conditional Expectation: Motivation

There are many situations where a good guess about Y given X is not linear. E.g., (diameter of object, weight), (school years, income), (PSA level, cancer risk). Our goal: Derive the best estimate of Y given X! That is, find the function g(·) so that g(X) is the best guess about Y given X. Ambitious! Can it be done? Amazingly, yes!

SLIDE 25

Conditional Expectation

Definition. Let X and Y be RVs on Ω. The conditional expectation of Y given X is defined as E[Y|X] = g(X), where

g(x) := E[Y|X = x] := ∑_y y Pr[Y = y | X = x].

Fact. E[Y|X = x] = ∑_ω Y(ω) Pr[ω | X = x].

Proof: E[Y|X = x] = E[Y|A] with A = {ω : X(ω) = x}.
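The definition translates directly into code. Here is a minimal sketch (the `cond_exp` helper and the example pmf are illustrative, not from the lecture):

```python
# Sketch: computing g(x) = E[Y | X = x] from a joint pmf, following the
# definition g(x) = sum_y y * Pr[Y = y | X = x].

def cond_exp(joint):
    """joint: dict mapping (x, y) -> Pr[X = x, Y = y]. Returns g."""
    marginal = {}
    for (x, y), p in joint.items():
        marginal[x] = marginal.get(x, 0.0) + p      # Pr[X = x]
    def g(x):
        # Pr[Y = y | X = x] = Pr[X = x, Y = y] / Pr[X = x]
        return sum(y * p / marginal[x] for (xx, y), p in joint.items() if xx == x)
    return g

# Example: Y = X^2 with probability 1, so E[Y | X = x] = x^2 -- not linear in x.
joint = {(-1, 1): 0.25, (0, 0): 0.5, (1, 1): 0.25}
g = cond_exp(joint)
```

This example also previews the point of the next slides: g(x) = x^2 here, which no linear estimate a + bx can match.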

SLIDE 39

Deja vu, all over again?

Have we seen this before? Yes. Is anything new? Yes. The idea of defining g(x) = E[Y|X = x] and then E[Y|X] = g(X). Big deal? Quite! Simple but most convenient. Recall that L[Y|X] = a + bX is a function of X. This is similar: E[Y|X] = g(X) for some function g(·). In general, g(X) is not linear, i.e., not a + bX. It could be that g(X) = a + bX + cX^2. Or that g(X) = 2sin(4X) + exp{−3X}. Or something else.

SLIDE 51

Properties of CE

E[Y|X = x] = ∑_y y Pr[Y = y | X = x]

Theorem
(a) X, Y independent ⇒ E[Y|X] = E[Y];
(b) E[aY + bZ | X] = aE[Y|X] + bE[Z|X];
(c) E[Y h(X) | X] = h(X)E[Y|X], ∀h(·);
(d) E[h(X)E[Y|X]] = E[h(X)Y], ∀h(·);
(e) E[E[Y|X]] = E[Y].

Proof:
(a), (b): Obvious.
(c) E[Y h(X) | X = x] = ∑_ω Y(ω) h(X(ω)) Pr[ω | X = x]
  = ∑_ω Y(ω) h(x) Pr[ω | X = x] = h(x) E[Y|X = x].

SLIDE 57

Properties of CE

E[Y|X = x] = ∑_y y Pr[Y = y | X = x]

Theorem
(a) X, Y independent ⇒ E[Y|X] = E[Y];
(b) E[aY + bZ | X] = aE[Y|X] + bE[Z|X];
(c) E[Y h(X) | X] = h(X)E[Y|X], ∀h(·);
(d) E[h(X)E[Y|X]] = E[h(X)Y], ∀h(·);
(e) E[E[Y|X]] = E[Y].

Proof: (continued)
(d) E[h(X)E[Y|X]] = ∑_x h(x) E[Y|X = x] Pr[X = x]
  = ∑_x h(x) ∑_y y Pr[Y = y | X = x] Pr[X = x]
  = ∑_x h(x) ∑_y y Pr[X = x, Y = y]
  = ∑_{x,y} h(x) y Pr[X = x, Y = y] = E[h(X)Y].

SLIDE 59

Properties of CE

E[Y|X = x] = ∑_y y Pr[Y = y | X = x]

Theorem
(a) X, Y independent ⇒ E[Y|X] = E[Y];
(b) E[aY + bZ | X] = aE[Y|X] + bE[Z|X];
(c) E[Y h(X) | X] = h(X)E[Y|X], ∀h(·);
(d) E[h(X)E[Y|X]] = E[h(X)Y], ∀h(·);
(e) E[E[Y|X]] = E[Y].

Proof: (continued)
(e) Let h(X) = 1 in (d).
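Property (e), the tower property, is easy to verify by brute-force enumeration. A small sketch (the joint pmf is an arbitrary example of my own):

```python
# Sketch: checking (e), E[E[Y|X]] = E[Y], by direct enumeration
# over a small joint pmf mapping (x, y) -> Pr[X = x, Y = y].

joint = {(0, 1): 0.1, (0, 3): 0.2, (1, 2): 0.3, (1, 5): 0.4}

marginal_x = {}
for (x, y), p in joint.items():
    marginal_x[x] = marginal_x.get(x, 0.0) + p      # Pr[X = x]

def g(x):  # g(x) = E[Y | X = x]
    return sum(y * p / marginal_x[x] for (xx, y), p in joint.items() if xx == x)

E_Y = sum(y * p for (x, y), p in joint.items())            # E[Y] directly
E_g_X = sum(g(x) * p for x, p in marginal_x.items())       # E[E[Y|X]]
```

Both computations give the same number, as (e) promises.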

SLIDE 64

Properties of CE

Theorem (a) X,Y independent ⇒ E[Y|X] = E[Y]; (b) E[aY +bZ|X] = aE[Y|X]+bE[Z|X]; (c) E[Yh(X)|X] = h(X)E[Y|X],∀h(·); (d) E[h(X)E[Y|X]] = E[h(X)Y],∀h(·); (e) E[E[Y|X]] = E[Y]. Note that (d) says that E[(Y −E[Y|X])h(X)] = 0. We say that the estimation error Y −E[Y|X] is orthogonal to every function h(X) of X. We call this the projection property. More about this later.
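The projection property can also be checked by enumeration, for any choice of h. A sketch (the pmf and the test functions are my own illustration):

```python
# Sketch: verifying the projection property E[(Y - E[Y|X]) h(X)] = 0
# for several functions h, over a small joint pmf.

joint = {(0, 1): 0.1, (0, 3): 0.2, (1, 2): 0.3, (1, 5): 0.4}

marginal_x = {}
for (x, y), p in joint.items():
    marginal_x[x] = marginal_x.get(x, 0.0) + p

def g(x):  # E[Y | X = x]
    return sum(y * p / marginal_x[x] for (xx, y), p in joint.items() if xx == x)

def error_vs(h):
    # E[(Y - E[Y|X]) h(X)]: should be 0 for every h by property (d)
    return sum((y - g(x)) * h(x) * p for (x, y), p in joint.items())
```

Whatever h you try, the estimation error Y − E[Y|X] averages to zero against h(X).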

SLIDE 71

Application: Calculating E[Y|X]

Let X,Y,Z be i.i.d. with mean 0 and variance 1. We want to calculate E[2+5X +7XY +11X 2 +13X 3Z 2|X]. We find E[2+5X +7XY +11X 2 +13X 3Z 2|X] = 2+5X +7XE[Y|X]+11X 2 +13X 3E[Z 2|X] = 2+5X +7XE[Y]+11X 2 +13X 3E[Z 2] = 2+5X +11X 2 +13X 3(var[Z]+E[Z]2) = 2+5X +11X 2 +13X 3.
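This calculation can be checked exactly with a concrete choice of distribution: X, Y, Z i.i.d. uniform on {−1, +1} have mean 0 and variance 1 (this particular distribution is my choice for the check, not the lecture's):

```python
# Sketch: with X, Y, Z i.i.d. uniform on {-1, +1}, compute
# E[2 + 5X + 7XY + 11X^2 + 13X^3 Z^2 | X = x] by enumerating Y, Z,
# and compare with the derived answer 2 + 5x + 11x^2 + 13x^3.
from itertools import product

def cond_exp_given_x(x):
    # Average over the 4 equally likely (y, z) pairs, with X fixed at x.
    vals = [2 + 5*x + 7*x*y + 11*x**2 + 13*x**3 * z**2
            for y, z in product([-1, 1], repeat=2)]
    return sum(vals) / len(vals)

def predicted(x):
    return 2 + 5*x + 11*x**2 + 13*x**3
```

For this distribution the two agree at every value x can take.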

SLIDE 86

Application: Diluting

At each step, pick a ball from a well-mixed urn and replace it with a blue ball. Let Xn be the number of red balls in the urn at step n. What is E[Xn]?

Given Xn = m, Xn+1 = m − 1 w.p. m/N (if you pick a red ball) and Xn+1 = m otherwise. Hence,

E[Xn+1 | Xn = m] = m − (m/N) = m(N − 1)/N, so E[Xn+1 | Xn] = ρXn, with ρ := (N − 1)/N.

Consequently,

E[Xn+1] = E[E[Xn+1 | Xn]] = ρE[Xn], n ≥ 1
⇒ E[Xn] = ρ^{n−1}E[X1] = N((N − 1)/N)^{n−1}, n ≥ 1,

since the urn starts with X1 = N red balls.
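The closed form can be checked against the exact distribution of the chain, evolved step by step. A sketch (assuming, as the formula does, that the urn starts all red with X1 = N):

```python
# Sketch: evolve the exact distribution of X_n for the diluting chain
# (X_{n+1} = m - 1 w.p. m/N, else m) and compare E[X_n] with the
# closed form N * ((N - 1)/N)**(n - 1).

N = 10
dist = {N: 1.0}                 # X_1 = N: the urn starts with N red balls
expectations = []
for n in range(1, 8):
    expectations.append(sum(m * p for m, p in dist.items()))
    nxt = {}
    for m, p in dist.items():
        nxt[m - 1] = nxt.get(m - 1, 0.0) + p * (m / N)    # picked a red ball
        nxt[m] = nxt.get(m, 0.0) + p * (1 - m / N)        # picked a blue ball
    dist = nxt

closed_form = [N * ((N - 1) / N) ** (n - 1) for n in range(1, 8)]
```

The two sequences agree term by term, as the tower-property derivation predicts.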

SLIDE 99

Diluting

By analyzing E[Xn+1 | Xn], we found that E[Xn] = N((N − 1)/N)^{n−1}, n ≥ 1.

Here is another argument for that result. Consider one particular red ball, say ball k. At each step, it remains red w.p. (N − 1)/N, when another ball is picked. Thus, the probability that it is still red at step n is [(N − 1)/N]^{n−1}. Let Yn(k) = 1{ball k is red at step n}. Then Xn = Yn(1) + ··· + Yn(N). Hence,

E[Xn] = E[Yn(1) + ··· + Yn(N)] = NE[Yn(1)] = N Pr[Yn(1) = 1] = N[(N − 1)/N]^{n−1}.

SLIDE 110

Application: Mixing

At each step, pick a ball from each well-mixed urn and transfer it to the other urn. Let Xn be the number of red balls in the bottom urn at step n. What is E[Xn]?

Given Xn = m: Xn+1 = m + 1 w.p. p, Xn+1 = m − 1 w.p. q, and Xn+1 = m otherwise, where p = (1 − m/N)^2 (B goes up, R down) and q = (m/N)^2 (R goes up, B down). Thus,

E[Xn+1 | Xn] = Xn + p − q = Xn + 1 − 2Xn/N = 1 + ρXn, ρ := 1 − 2/N.

SLIDE 116

Mixing

We saw that E[Xn+1 | Xn] = 1 + ρXn, ρ := 1 − 2/N. Hence, E[Xn+1] = 1 + ρE[Xn]. Starting from E[X1] = N:

E[X2] = 1 + ρN
E[X3] = 1 + ρ(1 + ρN) = 1 + ρ + ρ^2 N
E[X4] = 1 + ρ(1 + ρ + ρ^2 N) = 1 + ρ + ρ^2 + ρ^3 N
...
E[Xn] = 1 + ρ + ··· + ρ^{n−2} + ρ^{n−1}N.

Hence, E[Xn] = (1 − ρ^{n−1})/(1 − ρ) + ρ^{n−1}N, n ≥ 1.
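The recursion and the closed form are easy to compare numerically. A sketch (assuming, as above, that the bottom urn starts with all N balls red, so E[X1] = N):

```python
# Sketch: iterate E[X_{n+1}] = 1 + rho * E[X_n] for the mixing chain and
# compare with the closed form (1 - rho**(n-1))/(1 - rho) + rho**(n-1) * N.

N = 10
rho = 1 - 2 / N

e = N                               # E[X_1] = N
recursion = []
for n in range(1, 8):
    recursion.append(e)
    e = 1 + rho * e

closed_form = [(1 - rho**(n - 1)) / (1 - rho) + rho**(n - 1) * N
               for n in range(1, 8)]
```

As n grows, both converge to 1/(1 − ρ) = N/2: in the long run, the urns mix evenly.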

SLIDE 129

Application: Going Viral

Consider a social network (e.g., Twitter). You start a rumor (e.g., Walrand is really weird). You have d friends. Each of your friends retweets w.p. p. Each of your friends has d friends, etc. Does the rumor spread? Does it die out (mercifully)?

slide-130
SLIDE 130

Application: Going Viral

slide-131
SLIDE 131

Application: Going Viral

Fact: Let X = ∑_{n=1}^∞ Xn, where Xn is the number of people who tweet the rumor in generation n (X1 = 1: you). Then, E[X] < ∞ iff pd < 1.

Proof:

Given Xn = k, Xn+1 = B(kd, p). Hence, E[Xn+1|Xn = k] = kpd. Thus, E[Xn+1|Xn] = pdXn. Consequently, E[Xn] = (pd)^(n−1), n ≥ 1. If pd < 1, then E[X1 + ··· + Xn] ≤ (1 − pd)^(−1), so that E[X] ≤ (1 − pd)^(−1). If pd ≥ 1, then for all C one can find n s.t. E[X] ≥ E[X1 + ··· + Xn] ≥ C. In fact, one can show that pd ≥ 1 ⟹ Pr[X = ∞] > 0.
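The recursion above is easy to check numerically. Here is a small Monte Carlo sketch (the function and parameter names are illustrative, not from the lecture): with d = 4 and p = 0.2 we have pd = 0.8 < 1, so the sample mean of X should be close to (1 − pd)^(−1) = 5.

```python
import random

def total_tweets(d, p, rng, max_gen=200):
    """One run of the branching process: X1 = 1 (you start the rumor);
    given Xn = k, the next generation is Xn+1 ~ Binomial(k * d, p).
    Returns X = X1 + X2 + ... (total number of people who tweet)."""
    total = current = 1
    for _ in range(max_gen):
        if current == 0:
            break
        # Each of the current * d friend slots retweets w.p. p.
        current = sum(1 for _ in range(current * d) if rng.random() < p)
        total += current
    return total

rng = random.Random(0)
d, p = 4, 0.2          # pd = 0.8 < 1, so E[X] = 1 / (1 - 0.8) = 5
n_runs = 20000
estimate = sum(total_tweets(d, p, rng) for _ in range(n_runs)) / n_runs
```

With pd < 1 the rumor dies out in every run and the sample mean settles near 5; with pd ≥ 1 the same code would produce runs that survive until the max_gen cutoff, mirroring Pr[X = ∞] > 0.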

slide-142
SLIDE 142

Application: Going Viral

slide-143
SLIDE 143

Application: Going Viral

An easy extension: Assume that everyone has an independent number Di of friends with E[Di] = d. Then, the same fact holds. To see this, note that given Xn = k, and given the numbers of friends D1 = d1, ..., Dk = dk of these Xn people, one has Xn+1 = B(d1 + ··· + dk, p). Hence, E[Xn+1|Xn = k, D1 = d1, ..., Dk = dk] = p(d1 + ··· + dk). Thus, E[Xn+1|Xn = k, D1, ..., Dk] = p(D1 + ··· + Dk). Consequently, E[Xn+1|Xn = k] = E[p(D1 + ··· + Dk)] = pdk. Finally, E[Xn+1|Xn] = pdXn, and E[Xn+1] = pdE[Xn]. We conclude as before.
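One step of this computation can be sanity-checked by simulation. In the sketch below the degree distribution is invented for illustration: each Di is uniform on {1, ..., 7}, so d = E[Di] = 4; with k = 3 and p = 0.2, the conditional mean E[Xn+1 | Xn = k] should be pdk = 2.4.

```python
import random

def next_generation(k, p, rng):
    """Given Xn = k, draw D1, ..., Dk uniform on {1,...,7} (mean d = 4),
    then draw Xn+1 ~ Binomial(D1 + ... + Dk, p)."""
    friends = sum(rng.randint(1, 7) for _ in range(k))
    return sum(1 for _ in range(friends) if rng.random() < p)

rng = random.Random(0)
k, p, d = 3, 0.2, 4.0
n_runs = 50000
avg = sum(next_generation(k, p, rng) for _ in range(n_runs)) / n_runs
# avg should be close to p * d * k = 2.4
```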

slide-154
SLIDE 154

Application: Wald’s Identity

Here is an extension of an identity we used in the last slide.

Theorem (Wald's Identity). Assume that X1, X2, ... and Z are independent, where Z takes values in {0, 1, 2, ...} and E[Xn] = µ for all n ≥ 1. Then, E[X1 + ··· + XZ] = µE[Z].

Proof: E[X1 + ··· + XZ|Z = k] = µk. Thus, E[X1 + ··· + XZ|Z] = µZ. Hence, E[X1 + ··· + XZ] = E[µZ] = µE[Z].
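Because the proof is a two-line conditioning argument, Wald's identity can be verified exactly on a tiny example by brute-force enumeration. The distributions below are made up for illustration: Xi ∈ {1, 3} each with probability 1/2 (so µ = 2), and Z uniform on {0, 1, 2} (so E[Z] = 1), all independent.

```python
from fractions import Fraction
from itertools import product

xs = [(1, Fraction(1, 2)), (3, Fraction(1, 2))]                 # values of each Xi
zs = [(0, Fraction(1, 3)), (1, Fraction(1, 3)), (2, Fraction(1, 3))]  # values of Z

mu = sum(x * px for x, px in xs)   # E[Xi] = 2
ez = sum(z * pz for z, pz in zs)   # E[Z]  = 1

# E[X1 + ... + XZ] by exhaustive enumeration, using independence of Z and the Xi's.
e_sum = Fraction(0)
for z, pz in zs:
    for combo in product(xs, repeat=z):
        prob, total = pz, 0
        for x, px in combo:
            prob *= px
            total += x
        e_sum += prob * total
```

Exact rational arithmetic makes the check airtight: e_sum comes out to µE[Z] = 2 on the nose.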

slide-164
SLIDE 164

CE = MMSE

Theorem (CE = MMSE). E[Y|X] is the ‘best’ guess about Y based on X: g(X) := E[Y|X] is the function of X that minimizes E[(Y − g(X))²].

Proof: Let h(X) be any function of X. Then

E[(Y − h(X))²] = E[(Y − g(X) + g(X) − h(X))²]
= E[(Y − g(X))²] + E[(g(X) − h(X))²] + 2E[(Y − g(X))(g(X) − h(X))].

But E[(Y − g(X))(g(X) − h(X))] = 0 by the projection property. Thus, E[(Y − h(X))²] ≥ E[(Y − g(X))²].
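The inequality at the end can be checked exactly on a small joint distribution. The pmf below is invented for illustration; we compute g(x) = E[Y | X = x] and confirm that no other function h(X) (here, every function taking values on a small grid) does better.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical joint pmf Pr[X = x, Y = y] (weights sum to 1).
pmf = {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 0): F(1, 8), (1, 2): F(3, 8)}

def cond_exp(pmf):
    """g(x) = E[Y | X = x], computed from the joint pmf."""
    num, den = {}, {}
    for (x, y), p in pmf.items():
        num[x] = num.get(x, F(0)) + p * y
        den[x] = den.get(x, F(0)) + p
    return {x: num[x] / den[x] for x in den}

def mse(pmf, h):
    """E[(Y - h(X))^2] for a function h given as a dict x -> h(x)."""
    return sum(p * (y - h[x]) ** 2 for (x, y), p in pmf.items())

g = cond_exp(pmf)      # g = {0: 1/2, 1: 3/2}
best = mse(pmf, g)     # = 1/2
# Try every h on a grid of candidate values; none should beat g.
grid = [F(k, 4) for k in range(-4, 13)]
worse = all(best <= mse(pmf, {0: a, 1: b}) for a, b in product(grid, grid))
```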

slide-176
SLIDE 176

E[Y|X] and L[Y|X] as projections

L[Y|X] is the projection of Y on {a + bX : a, b ∈ ℜ}: LLSE.

E[Y|X] is the projection of Y on {g(X) : g(·) : ℜ → ℜ}: MMSE.
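Since the linear functions of X form a subset of all functions of X, the projection E[Y|X] can only do better (in mean squared error) than L[Y|X]. A minimal exact example, with an invented distribution: X uniform on {0, 1, 2} and Y = X², so E[Y|X] = X² reproduces Y exactly while the best line misses.

```python
from fractions import Fraction as F

xs = [0, 1, 2]          # X uniform on {0, 1, 2}; Y = X^2 (deterministic)
p = F(1, 3)

ex = sum(p * x for x in xs)                       # E[X]  = 1
ey = sum(p * x**2 for x in xs)                    # E[Y]  = 5/3
cov = sum(p * x * x**2 for x in xs) - ex * ey     # cov(X, Y) = 4/3
var = sum(p * x**2 for x in xs) - ex**2           # var(X)    = 2/3

b = cov / var                                     # slope of L[Y|X] = 2

def L(x):
    return ey + b * (x - ex)                      # L[Y|X] = 2X - 1/3

mse_llse = sum(p * (x**2 - L(x)) ** 2 for x in xs)  # = 2/9
mse_ce = F(0)   # E[Y|X] = X^2 equals Y here, so its MSE is 0
```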

slide-181
SLIDE 181

Summary

Conditional Expectation

◮ Definition: E[Y|X = x] := ∑y y Pr[Y = y|X = x]
◮ Properties: Linearity; Y − E[Y|X] ⊥ h(X); E[E[Y|X]] = E[Y]
◮ Some Applications:
  ◮ Calculating E[Y|X]
  ◮ Diluting
  ◮ Mixing
  ◮ Rumors
  ◮ Wald
◮ MMSE: E[Y|X] minimizes E[(Y − g(X))²] over all g(·)
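The two properties listed above, the orthogonality property Y − E[Y|X] ⊥ h(X) and the tower property E[E[Y|X]] = E[Y], can both be verified exactly on any small joint pmf; the one below is invented for illustration.

```python
from fractions import Fraction as F

# Hypothetical joint pmf Pr[X = x, Y = y] (weights sum to 1).
pmf = {(0, 0): F(1, 4), (0, 1): F(1, 4), (1, 0): F(1, 8), (1, 2): F(3, 8)}

# g(x) = E[Y | X = x]
num, den = {}, {}
for (x, y), p in pmf.items():
    num[x] = num.get(x, F(0)) + p * y
    den[x] = den.get(x, F(0)) + p        # den[x] = Pr[X = x]
g = {x: num[x] / den[x] for x in den}

ey = sum(p * y for (_, y), p in pmf.items())
tower = sum(den[x] * g[x] for x in den)  # E[E[Y|X]], should equal E[Y]

h = {0: F(7), 1: F(-3)}                  # an arbitrary function of X
orth = sum(p * (y - g[x]) * h[x] for (x, y), p in pmf.items())  # should be 0
```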