SLIDE 1
Expectation
DS-GA 1002 Probability and Statistics for Data Science
http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall17
Carlos Fernandez-Granda

Aim: describe random variables with a few numbers: mean, variance, covariance
SLIDE 2
SLIDE 3
Expectation operator Mean and variance Covariance Conditional expectation
SLIDE 4
Discrete random variables
Average of the values of a function weighted by the pmf:

E(g(X)) = ∑_{x∈R_X} g(x) p_X(x)

E(g(X, Y)) = ∑_{x∈R_X} ∑_{y∈R_Y} g(x, y) p_{X,Y}(x, y)

E(g(X)) = ∑_{x_1} ∑_{x_2} · · · ∑_{x_n} g(x) p_X(x)   (for a random vector X)
SLIDE 5
Continuous random variables
Average of the values of a function weighted by the pdf:

E(g(X)) = ∫_{−∞}^{∞} g(x) f_X(x) dx

E(g(X, Y)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f_{X,Y}(x, y) dx dy

E(g(X)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} g(x) f_X(x) dx_1 dx_2 . . . dx_n   (for a random vector X)
SLIDE 6
Discrete and continuous random variables
E(g(C, D)) = ∫_{−∞}^{∞} ∑_{d∈R_D} g(c, d) f_C(c) p_{D|C}(d|c) dc
           = ∑_{d∈R_D} ∫_{−∞}^{∞} g(c, d) p_D(d) f_{C|D}(c|d) dc
SLIDE 7
St Petersburg paradox
A casino offers you a game: flip an unbiased coin until it lands on heads; you get 2^k dollars, where k = number of flips. Expected gain?
SLIDE 8
St Petersburg paradox
E(Gain) = ∑_{k=1}^{∞} 2^k · (1/2^k)
SLIDE 9
St Petersburg paradox
E(Gain) = ∑_{k=1}^{∞} 2^k · (1/2^k) = ∞
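The divergence shows up empirically: sample means of simulated plays never settle down. A Monte Carlo sketch (the helper name is illustrative; standard library only):

```python
import random

def st_petersburg_payout(rng):
    """Play once: flip a fair coin until it lands heads.
    The payout is 2^k dollars, where k is the total number of flips."""
    k = 1
    while rng.random() < 0.5:  # tails with probability 1/2: flip again
        k += 1
    return 2 ** k

rng = random.Random(0)
for n in (10**2, 10**4, 10**6):
    sample_mean = sum(st_petersburg_payout(rng) for _ in range(n)) / n
    print(n, sample_mean)  # no finite limit; grows roughly like log2(n)
```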
SLIDE 10
Linearity of expectation
For any constants a and b and any functions g_1 and g_2:

E(a g_1(X, Y) + b g_2(X, Y)) = a E(g_1(X, Y)) + b E(g_2(X, Y))

Follows from linearity of sums and integrals:

∑_{x∈R_X} ∑_{y∈R_Y} (a g_1(x, y) + b g_2(x, y)) p_{X,Y}(x, y)
  = a ∑_{x∈R_X} ∑_{y∈R_Y} g_1(x, y) p_{X,Y}(x, y) + b ∑_{x∈R_X} ∑_{y∈R_Y} g_2(x, y) p_{X,Y}(x, y)
SLIDE 11
Example: Coffee beans
◮ Company buys coffee beans from two local producers
◮ Beans from Colombia: C tons/year
◮ Beans from Vietnam: V tons/year
◮ Model:
 ◮ C uniform between 0 and 1
 ◮ V uniform between 0 and 2
 ◮ C and V independent
◮ What is the expected total amount of beans B?
SLIDE 12
Example: Coffee beans
E (C + V )
SLIDE 13
Example: Coffee beans
E (C + V ) = E (C) + E (V )
SLIDE 14
Example: Coffee beans
E (C + V ) = E (C) + E (V ) = 0.5 + 1 = 1.5 tons
SLIDE 15
Example: Coffee beans
E (C + V ) = E (C) + E (V ) = 0.5 + 1 = 1.5 tons Holds even if C and V are not independent
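A Monte Carlo sanity check of this number, assuming the uniform model above (standard library only):

```python
import random

rng = random.Random(0)
n = 10**6
# C uniform on [0, 1] and V uniform on [0, 2], drawn independently
mean_total = sum(rng.uniform(0, 1) + rng.uniform(0, 2) for _ in range(n)) / n
print(round(mean_total, 3))  # close to E(C) + E(V) = 0.5 + 1 = 1.5
```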
SLIDE 16
Independence
If X, Y are independent then E (g (X) h (Y )) = E (g (X)) E (h (Y ))
SLIDE 17
Independence
E(g(X) h(Y)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x) h(y) f_{X,Y}(x, y) dx dy
SLIDE 18
Independence
E(g(X) h(Y)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x) h(y) f_{X,Y}(x, y) dx dy
             = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x) h(y) f_X(x) f_Y(y) dx dy
SLIDE 19
Independence
E(g(X) h(Y)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x) h(y) f_{X,Y}(x, y) dx dy
             = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x) h(y) f_X(x) f_Y(y) dx dy
             = E(g(X)) E(h(Y))
SLIDE 20
Expectation operator Mean and variance Covariance Conditional expectation
SLIDE 21
Mean
The mean or first moment of X is E(X). It is the center of mass of the distribution.
SLIDE 22
Bernoulli
E (X) = 0 · pX (0) + 1 · pX (1) = p
SLIDE 23
Binomial
A binomial is a sum of n Bernoulli random variables:

X = ∑_{i=1}^{n} B_i
SLIDE 24
Binomial
A binomial is a sum of n Bernoulli random variables:

X = ∑_{i=1}^{n} B_i

E(X) = E(∑_{i=1}^{n} B_i)
SLIDE 25
Binomial
A binomial is a sum of n Bernoulli random variables:

X = ∑_{i=1}^{n} B_i

E(X) = E(∑_{i=1}^{n} B_i) = ∑_{i=1}^{n} E(B_i)
SLIDE 26
Binomial
A binomial is a sum of n Bernoulli random variables:

X = ∑_{i=1}^{n} B_i

E(X) = E(∑_{i=1}^{n} B_i) = ∑_{i=1}^{n} E(B_i) = np
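The identity E(X) = np can be checked by simulating the sum of Bernoulli variables directly (the parameter values below are illustrative):

```python
import random

def binomial_draw(n, p, rng):
    """Sum of n independent Bernoulli(p) random variables."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(0)
n, p, trials = 20, 0.3, 10**5
sample_mean = sum(binomial_draw(n, p, rng) for _ in range(trials)) / trials
print(sample_mean)  # close to n * p = 6
```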
SLIDE 27
Mean of important random variables
Random variable   Parameters   Mean
Bernoulli         p            p
Geometric         p            1/p
Binomial          n, p         np
Poisson           λ            λ
Uniform           a, b         (a + b)/2
Exponential       λ            1/λ
Gaussian          µ, σ         µ
SLIDE 28
Cauchy random variable
[Plot: the Cauchy pdf f_X(x) = 1/(π(1 + x²)) over x ∈ [−10, 10]]
SLIDE 29
Cauchy random variable
E(X) = ∫_{−∞}^{∞} x/(π(1 + x²)) dx = ∫_{0}^{∞} x/(π(1 + x²)) dx − ∫_{0}^{∞} x/(π(1 + x²)) dx
SLIDE 30
Cauchy random variable
E(X) = ∫_{−∞}^{∞} x/(π(1 + x²)) dx = ∫_{0}^{∞} x/(π(1 + x²)) dx − ∫_{0}^{∞} x/(π(1 + x²)) dx

∫_{0}^{∞} x/(π(1 + x²)) dx = ∫_{0}^{∞} 1/(2π(1 + t)) dt = lim_{t→∞} log(1 + t)/(2π)
SLIDE 31
Cauchy random variable
E(X) = ∫_{−∞}^{∞} x/(π(1 + x²)) dx = ∫_{0}^{∞} x/(π(1 + x²)) dx − ∫_{0}^{∞} x/(π(1 + x²)) dx

∫_{0}^{∞} x/(π(1 + x²)) dx = ∫_{0}^{∞} 1/(2π(1 + t)) dt = lim_{t→∞} log(1 + t)/(2π) = ∞
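Because the mean is undefined, sample averages of Cauchy draws never stabilize. A sketch using inverse-CDF sampling (the standard Cauchy quantile function is tan(π(u − 1/2))):

```python
import math
import random

def cauchy_draw(rng):
    """Standard Cauchy sample via the inverse CDF: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

rng = random.Random(0)
samples = [cauchy_draw(rng) for _ in range(10**5)]
partial = 0.0
for i, x in enumerate(samples, 1):
    partial += x
    if i in (10**3, 10**4, 10**5):
        print(i, partial / i)  # running means jump around; no convergence
```

The median, in contrast, is perfectly stable for Cauchy samples.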
SLIDE 32
Mean of a random vector
Vector formed by the means of its components:

E(X) := (E(X_1), E(X_2), . . . , E(X_n))^T

By linearity of expectation, for any matrix A ∈ R^{m×n} and b ∈ R^m:

E(A X + b) = A E(X) + b
SLIDE 33
The mean as a typical value
The mean is a typical value of the random variable. The probability that X equals E(X) can be zero. The mean can be severely distorted by a subset of extreme values.
SLIDE 34
Density with subset of extreme values
[Plot: pdf of a uniform random variable X with support [−4.5, 4.5] ∪ [99.5, 100.5] (density 0.1 on both intervals)]
SLIDE 35
Density with subset of extreme values
E(X) = ∫_{−4.5}^{4.5} x f_X(x) dx + ∫_{99.5}^{100.5} x f_X(x) dx = (1/10) · (100.5² − 99.5²)/2 = 10
SLIDE 36
Density with subset of extreme values
[Plot: the same pdf as above]
SLIDE 37
Median
Midpoint of the distribution: a number m such that

P(X ≤ m) ≥ 1/2 and P(X ≥ m) ≥ 1/2

For continuous random variables:

F_X(m) = ∫_{−∞}^{m} f_X(x) dx = 1/2
SLIDE 38
Density with subset of extreme values
F_X(m) = ∫_{−4.5}^{m} f_X(x) dx = (m + 4.5)/10
SLIDE 39
Density with subset of extreme values
F_X(m) = ∫_{−4.5}^{m} f_X(x) dx = (m + 4.5)/10 = 1/2 ⇒ m = 0.5
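The gap between the two summaries shows up immediately in simulation. A sketch of this exact density (the helper name is illustrative):

```python
import random

def draw(rng):
    """Uniform on [-4.5, 4.5] w.p. 9/10 and on [99.5, 100.5] w.p. 1/10
    (the two pieces have lengths 9 and 1 and share the density 0.1)."""
    if rng.random() < 0.9:
        return rng.uniform(-4.5, 4.5)
    return rng.uniform(99.5, 100.5)

rng = random.Random(0)
samples = sorted(draw(rng) for _ in range(10**5))
mean = sum(samples) / len(samples)
median = samples[len(samples) // 2]
print(mean, median)  # mean near 10, median near 0.5
```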
SLIDE 40
Density with subset of extreme values
[Plot: the pdf with the mean and the median marked]
SLIDE 41
Variance
The mean square or second moment of X is E(X²)

The variance of X is

Var(X) := E((X − E(X))²) = E(X² − 2X E(X) + E²(X)) = E(X²) − E²(X)

The standard deviation of X is σ_X := √Var(X)
SLIDE 42
Bernoulli
E(X²) = 0 · p_X(0) + 1 · p_X(1) = p

Var(X) = E(X²) − E²(X) = p − p² = p(1 − p)
SLIDE 43
Variance of common random variables
Random variable   Parameters   Variance
Bernoulli         p            p(1 − p)
Geometric         p            (1 − p)/p²
Binomial          n, p         np(1 − p)
Poisson           λ            λ
Uniform           a, b         (b − a)²/12
Exponential       λ            1/λ²
Gaussian          µ, σ         σ²
SLIDE 44
Geometric (p = 0.2)
[Plot of the pmf p_X(k)]
SLIDE 45
Binomial (n = 20, p = 0.5)
[Plot of the pmf p_X(k)]
SLIDE 46
Poisson (λ = 25)
[Plot of the pmf p_X(k)]
SLIDE 47
Uniform [0, 1]
[Plot of the pdf f_X(x)]
SLIDE 48
Exponential (λ = 1)
[Plot of the pdf f_X(x)]
SLIDE 49
Gaussian (µ = 0, σ = 1)
[Plot of the pdf f_X(x)]
SLIDE 50
Variance
The variance operator is not linear, but

Var(aX + b) = E((aX + b − E(aX + b))²)
            = E((aX + b − a E(X) − b)²)
            = a² E((X − E(X))²)
            = a² Var(X)
SLIDE 51
Bounding probabilities using expectations
Aim: Characterize behavior of X to some extent using E (X) and Var (X)
SLIDE 52
Markov’s inequality
For any nonnegative random variable X and any a > 0:

P(X ≥ a) ≤ E(X)/a
SLIDE 53
Markov’s inequality
Consider the indicator variable 1_{X≥a}:

X − a · 1_{X≥a} ≥ 0
SLIDE 54
Markov’s inequality
Consider the indicator variable 1_{X≥a}:

X − a · 1_{X≥a} ≥ 0

E(X) ≥ a E(1_{X≥a})
SLIDE 55
Markov’s inequality
Consider the indicator variable 1_{X≥a}:

X − a · 1_{X≥a} ≥ 0

E(X) ≥ a E(1_{X≥a}) = a P(X ≥ a)
SLIDE 56
Age of students at NYU
Mean: 20 years. How many are younger than 30?
SLIDE 57
Age of students at NYU
Mean: 20 years. How many are younger than 30?

P(A ≥ 30) ≤ E(A)/30
SLIDE 58
Age of students at NYU
Mean: 20 years. How many are younger than 30?

P(A ≥ 30) ≤ E(A)/30 = 2/3, so at least 1/3 are younger than 30
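Markov's bound is loose but valid for any nonnegative variable. A numeric check against exponential samples with mean 1 (the choice of distribution is an illustrative assumption):

```python
import random

rng = random.Random(0)
samples = [rng.expovariate(1.0) for _ in range(10**5)]  # E(X) = 1
results = {}
for a in (1, 2, 5):
    results[a] = sum(x >= a for x in samples) / len(samples)
    print(a, results[a], "<=", 1.0 / a)  # empirical tail vs Markov bound E(X)/a
```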
SLIDE 59
Chebyshev’s inequality
For any positive constant a > 0:

P(|X − E(X)| ≥ a) ≤ Var(X)/a²
SLIDE 60
Chebyshev’s inequality
For any positive constant a > 0:

P(|X − E(X)| ≥ a) ≤ Var(X)/a²

Corollary: If Var(X) = 0 then P(X = E(X)) = 1
SLIDE 61
Chebyshev’s inequality
For any positive constant a > 0:

P(|X − E(X)| ≥ a) ≤ Var(X)/a²

Corollary: If Var(X) = 0 then P(X = E(X)) = 1, since for any ε > 0

P(|X − E(X)| ≥ ε) ≤ Var(X)/ε² = 0
SLIDE 62
Chebyshev’s inequality
Define Y := (X − E(X))². By Markov's inequality,

P(|X − E(X)| ≥ a) = P(Y ≥ a²)
SLIDE 63
Chebyshev’s inequality
Define Y := (X − E(X))². By Markov's inequality,

P(|X − E(X)| ≥ a) = P(Y ≥ a²) ≤ E(Y)/a²
SLIDE 64
Chebyshev’s inequality
Define Y := (X − E(X))². By Markov's inequality,

P(|X − E(X)| ≥ a) = P(Y ≥ a²) ≤ E(Y)/a² = Var(X)/a²
SLIDE 65
Age of students at NYU
Mean: 20 years, standard deviation: 3 years. How many are younger than 30?
SLIDE 66
Age of students at NYU
Mean: 20 years, standard deviation: 3 years. How many are younger than 30?

P(A ≥ 30) ≤ P(|A − 20| ≥ 10)
SLIDE 67
Age of students at NYU
Mean: 20 years, standard deviation: 3 years. How many are younger than 30?

P(A ≥ 30) ≤ P(|A − 20| ≥ 10) ≤ Var(A)/100 = 9/100, so at least 91% are younger than 30
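A sketch of how conservative this bound can be if, say, the ages happened to be Gaussian with the stated mean and standard deviation (the Gaussian is an illustrative assumption; Chebyshev itself needs none):

```python
import random

rng = random.Random(0)
samples = [rng.gauss(20, 3) for _ in range(10**5)]  # mean 20, std 3
frac_30_plus = sum(a >= 30 for a in samples) / len(samples)
print(frac_30_plus)  # far below the Chebyshev bound of 9/100
```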
SLIDE 68
Expectation operator Mean and variance Covariance Conditional expectation
SLIDE 69
Covariance
The covariance of X and Y is

Cov(X, Y) := E((X − E(X))(Y − E(Y)))
           = E(XY − Y E(X) − X E(Y) + E(X) E(Y))
           = E(XY) − E(X) E(Y)

If Cov(X, Y) = 0, X and Y are uncorrelated
SLIDE 70
Covariance
[Scatter plots for Cov(X, Y) = 0.5, 0.9, 0.99 and Cov(X, Y) = −0.9, −0.99]
SLIDE 71
Variance of the sum
Var(X + Y) = E((X + Y − E(X + Y))²)
           = E((X − E(X))²) + E((Y − E(Y))²) + 2 E((X − E(X))(Y − E(Y)))
           = Var(X) + Var(Y) + 2 Cov(X, Y)
SLIDE 72
Variance of the sum
Var(X + Y) = E((X + Y − E(X + Y))²)
           = E((X − E(X))²) + E((Y − E(Y))²) + 2 E((X − E(X))(Y − E(Y)))
           = Var(X) + Var(Y) + 2 Cov(X, Y)

If X and Y are uncorrelated, then Var(X + Y) = Var(X) + Var(Y)
SLIDE 73
Independence implies uncorrelation
If X and Y are independent, then E(XY) = E(X) E(Y), so

Cov(X, Y) = E(XY) − E(X) E(Y) = E(X) E(Y) − E(X) E(Y) = 0
SLIDE 74
Uncorrelation does not imply independence
X, Y are independent Bernoulli with parameter 1/2

Let U = X + Y and V = X − Y. Are U and V independent? Are they uncorrelated?
SLIDE 75
Uncorrelation does not imply independence
p_U(0)
p_V(0)
p_{U,V}(0, 0)
SLIDE 76
Uncorrelation does not imply independence
p_U(0) = P(X = 0, Y = 0) = 1/4
p_V(0)
p_{U,V}(0, 0)
SLIDE 77
Uncorrelation does not imply independence
p_U(0) = P(X = 0, Y = 0) = 1/4
p_V(0) = P(X = 1, Y = 1) + P(X = 0, Y = 0) = 1/2
p_{U,V}(0, 0)
SLIDE 78
Uncorrelation does not imply independence
p_U(0) = P(X = 0, Y = 0) = 1/4
p_V(0) = P(X = 1, Y = 1) + P(X = 0, Y = 0) = 1/2
p_{U,V}(0, 0) = P(X = 0, Y = 0) = 1/4
SLIDE 79
Uncorrelation does not imply independence
p_U(0) = P(X = 0, Y = 0) = 1/4
p_V(0) = P(X = 1, Y = 1) + P(X = 0, Y = 0) = 1/2
p_{U,V}(0, 0) = P(X = 0, Y = 0) = 1/4 ≠ p_U(0) p_V(0) = 1/8
SLIDE 80
Uncorrelation does not imply independence
Cov(U, V) = E(UV) − E(U) E(V)
          = E((X + Y)(X − Y)) − E(X + Y) E(X − Y)
          = E(X²) − E(Y²) − E²(X) + E²(Y)
SLIDE 81
Uncorrelation does not imply independence
Cov(U, V) = E(UV) − E(U) E(V)
          = E((X + Y)(X − Y)) − E(X + Y) E(X − Y)
          = E(X²) − E(Y²) − E²(X) + E²(Y)
          = 0
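Since U and V take only a handful of values, both claims can be verified by enumerating the four equally likely outcomes of (X, Y):

```python
from itertools import product

# X, Y independent Bernoulli(1/2): four equally likely outcomes of (u, v)
outcomes = [(x + y, x - y) for x, y in product((0, 1), repeat=2)]
p = 1 / len(outcomes)

def E(f):
    """Expectation of f(U, V) under the uniform distribution on outcomes."""
    return sum(f(u, v) * p for u, v in outcomes)

cov = E(lambda u, v: u * v) - E(lambda u, v: u) * E(lambda u, v: v)
p_u0 = sum(p for u, v in outcomes if u == 0)
p_v0 = sum(p for u, v in outcomes if v == 0)
p_u0_v0 = sum(p for u, v in outcomes if u == 0 and v == 0)

print(cov)                    # 0.0: uncorrelated
print(p_u0_v0, p_u0 * p_v0)   # 0.25 vs 0.125: not independent
```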
SLIDE 82
Correlation coefficient
Pearson correlation coefficient of X and Y:

ρ_{X,Y} := Cov(X, Y)/(σ_X σ_Y)

It is the covariance between X/σ_X and Y/σ_Y
SLIDE 83
Correlation coefficient
σ_Y = 1, Cov(X, Y) = 0.9, ρ_{X,Y} = 0.9
σ_Y = 3, Cov(X, Y) = 0.9, ρ_{X,Y} = 0.3
σ_Y = 3, Cov(X, Y) = 2.7, ρ_{X,Y} = 0.9
SLIDE 84
Cauchy-Schwarz inequality
For any X and Y:

|E(XY)| ≤ √(E(X²) E(Y²))

In addition,

E(XY) = √(E(X²) E(Y²)) ⟺ Y = √(E(Y²)/E(X²)) X

E(XY) = −√(E(X²) E(Y²)) ⟺ Y = −√(E(Y²)/E(X²)) X
SLIDE 85
Cauchy-Schwarz inequality
We have Cov(X, Y) ≤ σ_X σ_Y, and equivalently |ρ_{X,Y}| ≤ 1

In addition, |ρ_{X,Y}| = 1 ⟺ Y = c X + d, where

c := σ_Y/σ_X if ρ_{X,Y} = 1,   c := −σ_Y/σ_X if ρ_{X,Y} = −1,   d := E(Y) − c E(X)
SLIDE 86
Covariance matrix of a random vector
The covariance matrix of X is defined as

Σ_X :=
[ Var(X_1)        Cov(X_1, X_2)   · · ·   Cov(X_1, X_n) ]
[ Cov(X_2, X_1)   Var(X_2)        · · ·   Cov(X_2, X_n) ]
[ · · ·           · · ·           · · ·   · · ·         ]
[ Cov(X_n, X_1)   Cov(X_n, X_2)   · · ·   Var(X_n)      ]

= E(X X^T) − E(X) E(X)^T
SLIDE 87
Covariance matrix after a linear transformation
Σ_{A X + b}
SLIDE 88
Covariance matrix after a linear transformation
Σ_{A X + b} = E((A X + b)(A X + b)^T) − E(A X + b) E(A X + b)^T
SLIDE 89
Covariance matrix after a linear transformation
Σ_{A X + b} = E((A X + b)(A X + b)^T) − E(A X + b) E(A X + b)^T
            = A E(X X^T) A^T + b E(X)^T A^T + A E(X) b^T + b b^T
              − A E(X) E(X)^T A^T − A E(X) b^T − b E(X)^T A^T − b b^T
SLIDE 90
Covariance matrix after a linear transformation
Σ_{A X + b} = E((A X + b)(A X + b)^T) − E(A X + b) E(A X + b)^T
            = A E(X X^T) A^T + b E(X)^T A^T + A E(X) b^T + b b^T
              − A E(X) E(X)^T A^T − A E(X) b^T − b E(X)^T A^T − b b^T
            = A (E(X X^T) − E(X) E(X)^T) A^T
SLIDE 91
Covariance matrix after a linear transformation
Σ_{A X + b} = E((A X + b)(A X + b)^T) − E(A X + b) E(A X + b)^T
            = A E(X X^T) A^T + b E(X)^T A^T + A E(X) b^T + b b^T
              − A E(X) E(X)^T A^T − A E(X) b^T − b E(X)^T A^T − b b^T
            = A (E(X X^T) − E(X) E(X)^T) A^T
            = A Σ_X A^T
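The identity Σ_{AX+b} = A Σ_X A^T can be verified numerically by comparing the empirical covariance of transformed samples with the prediction (the particular matrices below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 3, 2, 200_000

M = rng.standard_normal((n, n))
Sigma_X = M @ M.T                       # an arbitrary covariance matrix (PSD)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

X = rng.multivariate_normal(np.zeros(n), Sigma_X, size=N)  # rows are samples
Y = X @ A.T + b                          # apply y = A x + b to every sample
empirical = np.cov(Y, rowvar=False)
predicted = A @ Sigma_X @ A.T            # b has no effect on the covariance
print(np.abs(empirical - predicted).max())  # small sampling error
```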
SLIDE 92
Variance in a fixed direction
For any unit vector u:

Var(u^T X) = u^T Σ_X u
SLIDE 93
Direction of maximum variance
To find the direction of maximum variance we must solve

arg max_{||u||_2 = 1} u^T Σ_X u
SLIDE 94
Linear algebra
Symmetric matrices have orthogonal eigenvectors:

Σ_X = U Λ U^T = [u_1 u_2 · · · u_n] diag(λ_1, λ_2, . . . , λ_n) [u_1 u_2 · · · u_n]^T
SLIDE 95
Linear algebra
λ_1 = max_{||u||_2 = 1} u^T A u
u_1 = arg max_{||u||_2 = 1} u^T A u

λ_k = max_{||u||_2 = 1, u ⊥ u_1, ..., u_{k−1}} u^T A u
u_k = arg max_{||u||_2 = 1, u ⊥ u_1, ..., u_{k−1}} u^T A u
SLIDE 96
Direction of maximum variance
[Scatter plots with principal directions: √λ_1 = 1.22, √λ_2 = 0.71; √λ_1 = 1, √λ_2 = 1; √λ_1 = 1.38, √λ_2 = 0.32]
SLIDE 97
Coloring
Goal: Transform uncorrelated samples with unit variance so that they have a prescribed covariance matrix Σ

1. Compute the eigendecomposition Σ = U Λ U^T
2. Set y := U √Λ x, where √Λ := diag(√λ_1, √λ_2, . . . , √λ_n)
SLIDE 98
Coloring
Σ_Y
SLIDE 99
Coloring
Σ_Y = U √Λ Σ_X √Λ^T U^T
SLIDE 100
Coloring
Σ_Y = U √Λ Σ_X √Λ^T U^T = U √Λ I √Λ^T U^T
SLIDE 101
Coloring
Σ_Y = U √Λ Σ_X √Λ^T U^T = U √Λ I √Λ^T U^T = Σ
SLIDE 102
Coloring
[Scatter plots of the samples at each stage: x, √Λ x, and U √Λ x]
SLIDE 103
Generating Gaussian random vectors
Goal: Sample from an n-dimensional Gaussian random vector with mean µ and covariance matrix Σ

1. Generate n independent standard Gaussian samples x
2. Compute the eigendecomposition Σ = U Λ U^T
3. Set y := U √Λ x + µ

For non-Gaussian random vectors, coloring does not necessarily preserve the distribution
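The three steps can be sketched with numpy (the mean and covariance below are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

lam, U = np.linalg.eigh(Sigma)        # eigendecomposition Sigma = U Lam U^T
coloring = U @ np.diag(np.sqrt(lam))  # the coloring matrix U sqrt(Lambda)

N = 200_000
x = rng.standard_normal((2, N))       # step 1: independent standard Gaussians
y = coloring @ x + mu[:, None]        # steps 2-3: color, then shift by mu

print(y.mean(axis=1))                 # close to mu
print(np.cov(y))                      # close to Sigma
```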
SLIDE 104
For Gaussian rvs uncorrelation implies mutual independence
Uncorrelation implies Σ_X = diag(σ_1², σ_2², . . . , σ_n²), which in turn implies

f_X(x) = (1/√((2π)^n |Σ|)) exp(−(1/2)(x − µ)^T Σ^{−1} (x − µ))
SLIDE 105
For Gaussian rvs uncorrelation implies mutual independence
Uncorrelation implies Σ_X = diag(σ_1², σ_2², . . . , σ_n²), which in turn implies

f_X(x) = (1/√((2π)^n |Σ|)) exp(−(1/2)(x − µ)^T Σ^{−1} (x − µ))
       = ∏_{i=1}^{n} (1/(√(2π) σ_i)) exp(−(x_i − µ_i)²/(2σ_i²))
SLIDE 106
For Gaussian rvs uncorrelation implies mutual independence
Uncorrelation implies Σ_X = diag(σ_1², σ_2², . . . , σ_n²), which in turn implies

f_X(x) = (1/√((2π)^n |Σ|)) exp(−(1/2)(x − µ)^T Σ^{−1} (x − µ))
       = ∏_{i=1}^{n} (1/(√(2π) σ_i)) exp(−(x_i − µ_i)²/(2σ_i²))
       = ∏_{i=1}^{n} f_{X_i}(x_i)
SLIDE 107
Expectation operator Mean and variance Covariance Conditional expectation
SLIDE 108
Conditional expectation
Expectation of g(X, Y) given X = x:

E(g(X, Y) | X = x) = ∫_{−∞}^{∞} g(x, y) f_{Y|X}(y|x) dy

This can be interpreted as a function h(x) := E(g(X, Y) | X = x)

The conditional expectation of g(X, Y) given X is E(g(X, Y) | X) := h(X). It is a random variable
SLIDE 109
Iterated expectation
For any X and Y and any function g : R² → R

E(g(X, Y)) = E(E(g(X, Y) | X))
SLIDE 110
Iterated expectation
h(x) := E(g(X, Y) | X = x) = ∫_{−∞}^{∞} g(x, y) f_{Y|X}(y|x) dy
SLIDE 111
Iterated expectation
h(x) := E(g(X, Y) | X = x) = ∫_{−∞}^{∞} g(x, y) f_{Y|X}(y|x) dy

E(E(g(X, Y) | X)) = E(h(X))
SLIDE 112
Iterated expectation
h(x) := E(g(X, Y) | X = x) = ∫_{−∞}^{∞} g(x, y) f_{Y|X}(y|x) dy

E(E(g(X, Y) | X)) = E(h(X)) = ∫_{−∞}^{∞} h(x) f_X(x) dx
SLIDE 113
Iterated expectation
h(x) := E(g(X, Y) | X = x) = ∫_{−∞}^{∞} g(x, y) f_{Y|X}(y|x) dy

E(E(g(X, Y) | X)) = E(h(X)) = ∫_{−∞}^{∞} h(x) f_X(x) dx
                  = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_X(x) f_{Y|X}(y|x) g(x, y) dy dx
SLIDE 114
Iterated expectation
h(x) := E(g(X, Y) | X = x) = ∫_{−∞}^{∞} g(x, y) f_{Y|X}(y|x) dy

E(E(g(X, Y) | X)) = E(h(X)) = ∫_{−∞}^{∞} h(x) f_X(x) dx
                  = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_X(x) f_{Y|X}(y|x) g(x, y) dy dx
                  = E(g(X, Y))
SLIDE 115
Example: Desert
◮ Car traveling through the desert
◮ Time until the car breaks down: T
◮ State of the motor: M
◮ State of the road: R
◮ Model:
 ◮ M uniform between 0 (no problem) and 1 (very bad)
 ◮ R uniform between 0 (no problem) and 1 (very bad)
 ◮ M and R independent
 ◮ T exponential with parameter M + R
SLIDE 116
Example: Desert
E (T) = E (E (T|M, R))
SLIDE 117
Example: Desert
E(T) = E(E(T | M, R)) = E(1/(M + R))
SLIDE 118
Example: Desert
E(T) = E(E(T | M, R)) = E(1/(M + R)) = ∫_0^1 ∫_0^1 1/(m + r) dm dr
SLIDE 119
Example: Desert
E(T) = E(E(T | M, R)) = E(1/(M + R)) = ∫_0^1 ∫_0^1 1/(m + r) dm dr
     = ∫_0^1 (log(r + 1) − log(r)) dr
SLIDE 120
Example: Desert
E(T) = E(E(T | M, R)) = E(1/(M + R)) = ∫_0^1 ∫_0^1 1/(m + r) dm dr
     = ∫_0^1 (log(r + 1) − log(r)) dr
     = log 4 ≈ 1.39
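The iterated expectation can be estimated by averaging the conditional mean 1/(M + R) over simulated motor and road states (a Monte Carlo sketch of this exact model):

```python
import math
import random

rng = random.Random(0)
N = 10**6
# E(T | M, R) = 1/(M + R); average it over M, R uniform on [0, 1], independent
avg = sum(1.0 / (rng.random() + rng.random()) for _ in range(N)) / N
print(avg, math.log(4))  # both close to 1.386...
```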
SLIDE 121
Grizzlies in Yellowstone
Model for the weight of grizzly bears in Yellowstone:

Males: Gaussian with µ := 240 kg and σ := 40 kg
Females: Gaussian with µ := 140 kg and σ := 20 kg

There are about the same number of females and males
SLIDE 122
Grizzlies in Yellowstone
E (W ) = E (E (W |S))
SLIDE 123
Grizzlies in Yellowstone
E(W) = E(E(W | S)) = (E(W | S = 0) + E(W | S = 1))/2
SLIDE 124
Grizzlies in Yellowstone
E(W) = E(E(W | S)) = (E(W | S = 0) + E(W | S = 1))/2 = 190 kg
SLIDE 125
Bayesian coin flip
Bayesian methods often endow parameters of discrete distributions with a continuous marginal distribution
◮ You suspect a coin is biased
◮ You are uncertain about the bias, so you model it as a random variable B with pdf f_B(b) = 2b for b ∈ [0, 1]
◮ What is the expected value of the coin flip X?
SLIDE 126
Bayesian coin flip
E (X) = E (E (X|B))
SLIDE 127
Bayesian coin flip
E (X) = E (E (X|B)) = E (B)
SLIDE 128
Bayesian coin flip
E(X) = E(E(X|B)) = E(B) = ∫_0^1 2b² db
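The full Bayesian model can be simulated: draw the bias B from f_B(b) = 2b via the inverse CDF (F_B(b) = b², so B = √U), then flip the coin with that bias:

```python
import random

rng = random.Random(0)
N = 10**6
heads = 0
for _ in range(N):
    b = rng.random() ** 0.5    # inverse-CDF draw from f_B(b) = 2b on [0, 1]
    heads += rng.random() < b  # flip a coin with bias b
print(heads / N)               # close to E(B) = 2/3
```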
SLIDE 129