SLIDE 1
Multivariate random variables
DS GA 1002 Statistical and Mathematical Models
http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall16 Carlos Fernandez-Granda
SLIDE 2 Joint distributions
Tool to characterize several uncertain numerical quantities of interest within the same probabilistic model
We can group the variables into a random vector

X = (X1, X2, . . . , Xn)
SLIDE 3
Discrete random variables Continuous random variables Joint distributions of discrete and continuous random variables
SLIDE 4 Joint probability mass function
The joint pmf of X and Y is defined as

pX,Y (x, y) := P (X = x, Y = y)

It is the probability of X and Y being equal to x and y respectively
By the definition of a probability measure

pX,Y (x, y) ≥ 0 for any x ∈ RX, y ∈ RY

∑_{x ∈ RX} ∑_{y ∈ RY} pX,Y (x, y) = 1
SLIDE 5 Joint probability mass function
The joint pmf of a discrete random vector X is

pX (x) := P (X1 = x1, X2 = x2, . . . , Xn = xn)

It is the probability of X being equal to x
By the definition of a probability measure

pX (x) ≥ 0

∑_{x1 ∈ RX1} · · · ∑_{xn ∈ RXn} pX (x) = 1
SLIDE 6 Joint probability mass function
By the Law of Total Probability, for any set S ⊆ RX × RY,

P ((X, Y ) ∈ S) = P (∪(x,y)∈S {X = x, Y = y}) (union of disjoint events)
= ∑_{(x,y) ∈ S} P (X = x, Y = y)
= ∑_{(x,y) ∈ S} pX,Y (x, y)

Similarly, for any discrete set S ⊆ Rn

P (X ∈ S) = ∑_{x ∈ S} pX (x)
SLIDE 7 Marginalization
To compute the marginal pmf of X from the joint pmf pX,Y:

pX (x) = P (X = x)
= P (∪y∈RY {X = x, Y = y}) (union of disjoint events)
= ∑_{y ∈ RY} P (X = x, Y = y)
= ∑_{y ∈ RY} pX,Y (x, y)

This is called marginalizing over Y
SLIDE 8 Marginalization
Marginal pmf of a subvector XI, I ⊆ {1, 2, . . . , n}:

pXI (xI) = ∑_{xj1} · · · ∑_{xjn−m} pX (x)

where {j1, j2, . . . , jn−m} := {1, 2, . . . , n} / I
SLIDE 9
Conditional probability mass function
The conditional pmf of Y given X is

pY |X (y|x) = P (Y = y|X = x) = pX,Y (x, y) / pX (x), as long as pX (x) > 0

It is a valid pmf parametrized by x
Chain rule for discrete random variables:

pX,Y (x, y) = pX (x) pY |X (y|x)
SLIDE 10 Conditional probability mass function
The conditional pmf of a random subvector XI, I ⊆ {1, 2, . . . , n}, given another subvector XJ is

pXI |XJ (xI |xJ ) := pX (x) / pXJ (xJ )

where {j1, j2, . . . , jn−m} := {1, 2, . . . , n} / I

Chain rule for discrete random vectors:

pX (x) = pX1 (x1) pX2|X1 (x2|x1) · · · pXn|X1,...,Xn−1 (xn|x1, . . . , xn−1)
= ∏_{i=1}^{n} pXi|X{1,...,i−1} (xi | x{1,...,i−1})
SLIDE 11 Example: Flights and rain (continued)
Probabilistic model for late arrivals at an airport:

P (late, no rain) = 2/20, P (on time, no rain) = 14/20,
P (late, rain) = 3/20, P (on time, rain) = 1/20

L = 1 if the plane is late, 0 otherwise
R = 1 if it rains, 0 otherwise
SLIDE 12
Example: Flights and rain (continued)
Joint pmf:

                 R = 0 (no rain)   R = 1 (rain)
L = 0 (on time)      14/20             1/20
L = 1 (late)          2/20             3/20

Marginal and conditional pmfs:

pL (0) = 15/20,     pL (1) = 5/20
pL|R (0|0) = 7/8,   pL|R (1|0) = 1/8
pL|R (0|1) = 1/4,   pL|R (1|1) = 3/4
pR (0) = 16/20,     pR (1) = 4/20
pR|L (0|0) = 14/15, pR|L (1|0) = 1/15
pR|L (0|1) = 2/5,   pR|L (1|1) = 3/5
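As a sanity check, the marginal and conditional pmfs above can be reproduced mechanically from the joint pmf; a minimal sketch in Python (the variable names are mine, not from the slides), using exact rational arithmetic:

```python
from fractions import Fraction as F

# Joint pmf from the slides: keys are (l, r), l = 1 if late, r = 1 if rain
p_LR = {(0, 0): F(14, 20), (1, 0): F(2, 20),
        (0, 1): F(1, 20),  (1, 1): F(3, 20)}

# Marginalize over R to get pL, and over L to get pR
p_L = {l: sum(p for (li, _), p in p_LR.items() if li == l) for l in (0, 1)}
p_R = {r: sum(p for (_, ri), p in p_LR.items() if ri == r) for r in (0, 1)}

# Conditional pmf pL|R(l|r) = pL,R(l, r) / pR(r)
p_L_given_R = {(l, r): p_LR[(l, r)] / p_R[r] for (l, r) in p_LR}

print(p_L[1])               # 1/4  (marginal probability of a late arrival)
print(p_L_given_R[(1, 1)])  # 3/4  (late given rain)
```

The same pattern (sum out the other variable, then divide by the marginal) reproduces every row of the table.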
SLIDE 13
Independence of discrete random variables
X and Y are independent if and only if

pX,Y (x, y) = pX (x) pY (y) for all x ∈ RX, y ∈ RY

Equivalently,

pX|Y (x|y) = pX (x) and pY |X (y|x) = pY (y) for all x ∈ RX, y ∈ RY
SLIDE 14 Mutually independent random variables
The n entries X1, X2, . . . , Xn of a random vector X are mutually independent if and only if

pX (x) = ∏_{i=1}^{n} pXi (xi)
SLIDE 15 Conditionally mutually independent random variables
The components of a subvector XI, I ⊆ {1, 2, . . . , n}, are conditionally mutually independent given another subvector XJ , J ⊆ {1, 2, . . . , n}, if and only if

pXI |XJ (xI |xJ ) = ∏_{i ∈ I} pXi|XJ (xi |xJ )
SLIDE 16 Pairwise independence
X1 and X2 are outcomes of independent unbiased coin flips

X3 = 1 if X1 = X2, 0 if X1 ≠ X2

Are X1, X2 and X3 independent?
SLIDE 17
Pairwise independence
X1 and X2 are independent by assumption
SLIDE 18
Pairwise independence
X1 and X2 are independent by assumption
The pmf of X3 is

pX3 (1) =
pX3 (0) =
SLIDE 19
Pairwise independence
X1 and X2 are independent by assumption
The pmf of X3 is

pX3 (1) = pX1,X2 (1, 1) + pX1,X2 (0, 0) = 1/2
pX3 (0) = pX1,X2 (0, 1) + pX1,X2 (1, 0) = 1/2
SLIDE 20
Pairwise independence
Are X1 and X3 independent?
SLIDE 21
Pairwise independence
Are X1 and X3 independent?

pX1,X3 (0, 0) =
SLIDE 22
Pairwise independence
Are X1 and X3 independent?

pX1,X3 (0, 0) = pX1,X2 (0, 1) = 1/4
SLIDE 23
Pairwise independence
Are X1 and X3 independent?

pX1,X3 (0, 0) = pX1,X2 (0, 1) = 1/4 = pX1 (0) pX3 (0)
SLIDE 24
Pairwise independence
Are X1 and X3 independent?

pX1,X3 (0, 0) = pX1,X2 (0, 1) = 1/4 = pX1 (0) pX3 (0)
pX1,X3 (1, 0) = pX1,X2 (1, 0) = 1/4 = pX1 (1) pX3 (0)
SLIDE 25
Pairwise independence
Are X1 and X3 independent?

pX1,X3 (0, 0) = pX1,X2 (0, 1) = 1/4 = pX1 (0) pX3 (0)
pX1,X3 (1, 0) = pX1,X2 (1, 0) = 1/4 = pX1 (1) pX3 (0)
pX1,X3 (0, 1) = pX1,X2 (0, 0) = 1/4 = pX1 (0) pX3 (1)
SLIDE 26
Pairwise independence
Are X1 and X3 independent?

pX1,X3 (0, 0) = pX1,X2 (0, 1) = 1/4 = pX1 (0) pX3 (0)
pX1,X3 (1, 0) = pX1,X2 (1, 0) = 1/4 = pX1 (1) pX3 (0)
pX1,X3 (0, 1) = pX1,X2 (0, 0) = 1/4 = pX1 (0) pX3 (1)
pX1,X3 (1, 1) = pX1,X2 (1, 1) = 1/4 = pX1 (1) pX3 (1)
SLIDE 27
Pairwise independence
Are X1 and X3 independent?

pX1,X3 (0, 0) = pX1,X2 (0, 1) = 1/4 = pX1 (0) pX3 (0)
pX1,X3 (1, 0) = pX1,X2 (1, 0) = 1/4 = pX1 (1) pX3 (0)
pX1,X3 (0, 1) = pX1,X2 (0, 0) = 1/4 = pX1 (0) pX3 (1)
pX1,X3 (1, 1) = pX1,X2 (1, 1) = 1/4 = pX1 (1) pX3 (1)

Yes
SLIDE 28
Pairwise independence
X1, X2 and X3 are pairwise independent
Are X1, X2 and X3 mutually independent?
SLIDE 29
Pairwise independence
X1, X2 and X3 are pairwise independent
Are X1, X2 and X3 mutually independent?

pX1,X2,X3 (1, 1, 1) =
pX1 (1) pX2 (1) pX3 (1) =
SLIDE 30
Pairwise independence
X1, X2 and X3 are pairwise independent
Are X1, X2 and X3 mutually independent?

pX1,X2,X3 (1, 1, 1) = P (X1 = 1, X2 = 1) = 1/4
pX1 (1) pX2 (1) pX3 (1) = 1/8
SLIDE 31
Pairwise independence
X1, X2 and X3 are pairwise independent
Are X1, X2 and X3 mutually independent?

pX1,X2,X3 (1, 1, 1) = P (X1 = 1, X2 = 1) = 1/4
pX1 (1) pX2 (1) pX3 (1) = 1/8

No!
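The pairwise-but-not-mutual phenomenon can be verified by brute force over the four equally likely outcomes; a sketch (the helper names are mine):

```python
from fractions import Fraction as F
from itertools import product

# Joint pmf of (X1, X2, X3) with X3 = 1{X1 == X2}: only 4 outcomes possible
p = {}
for x1, x2 in product((0, 1), repeat=2):
    p[(x1, x2, int(x1 == x2))] = F(1, 4)

def marginal(idx, val):
    return sum(q for k, q in p.items() if k[idx] == val)

def pair(i, j, vi, vj):
    return sum(q for k, q in p.items() if k[i] == vi and k[j] == vj)

# Pairwise independence: every pair of variables factorizes
pairwise = all(pair(i, j, a, b) == marginal(i, a) * marginal(j, b)
               for i, j in [(0, 1), (0, 2), (1, 2)]
               for a, b in product((0, 1), repeat=2))
print(pairwise)   # True

# Mutual independence fails: p(1,1,1) = 1/4, product of marginals = 1/8
print(p[(1, 1, 1)], marginal(0, 1) * marginal(1, 1) * marginal(2, 1))
```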
SLIDE 32
Discrete random variables Continuous random variables Joint distributions of discrete and continuous random variables
SLIDE 33 Continuous random variables
We consider events that are composed of unions of Cartesian products of intervals
The joint cumulative distribution function (cdf) of X and Y is

FX,Y (x, y) := P (X ≤ x, Y ≤ y)

In words, the probability of X and Y being smaller than x and y respectively
The joint cdf of a random vector X is

FX (x) := P (X1 ≤ x1, X2 ≤ x2, . . . , Xn ≤ xn)
SLIDE 34
Joint cumulative distribution function
Every joint cdf satisfies

lim_{x→−∞} FX,Y (x, y) = 0
lim_{y→−∞} FX,Y (x, y) = 0
lim_{x→∞, y→∞} FX,Y (x, y) = 1
FX,Y (x1, y1) ≤ FX,Y (x2, y2) if x2 ≥ x1, y2 ≥ y1 (nondecreasing)
SLIDE 35
Joint cumulative distribution function
For any two-dimensional interval, by inclusion-exclusion,

P (x1 ≤ X ≤ x2, y1 ≤ Y ≤ y2)
= P (X ≤ x2, Y ≤ y2) − P (X ≤ x1, Y ≤ y2) − P (X ≤ x2, Y ≤ y1) + P (X ≤ x1, Y ≤ y1)
= FX,Y (x2, y2) − FX,Y (x1, y2) − FX,Y (x2, y1) + FX,Y (x1, y1)

The joint cdf completely characterizes the distribution of the random variables / random vector
SLIDE 36
Joint probability density function
If the joint cdf is differentiable

fX,Y (x, y) := ∂²FX,Y (x, y) / ∂x ∂y

fX (x) := ∂ⁿFX (x) / ∂x1 ∂x2 · · · ∂xn
SLIDE 37
Joint probability density function
Probability of (X, Y ) ∈ (x, x + ∆x) × (y, y + ∆y) for ∆x, ∆y → 0 is fX,Y (x, y) ∆x ∆y

It is a density, not a probability measure!
From the monotonicity of the joint cdf

fX,Y (x, y) ≥ 0, fX (x) ≥ 0
SLIDE 38 Joint probability density function
For any Borel set S ⊆ R2

P ((X, Y ) ∈ S) = ∫∫_S fX,Y (x, y) dx dy

In particular,

∫_{x=−∞}^{∞} ∫_{y=−∞}^{∞} fX,Y (x, y) dx dy = 1
SLIDE 39 Joint probability density function
For any Borel set S ⊆ Rn

P (X ∈ S) = ∫_S fX (x) dx

In particular,

∫_{Rn} fX (x) dx = 1
SLIDE 40
Example: Triangle lake
[Figure: the triangle lake region in the plane, with the subregions A, B, C, D, E, F used to evaluate the cdf case by case]
SLIDE 41
Example: Triangle lake
FX (x) =
0                               if x1 < 0 or x2 < 0
2 x1 x2                         if x1, x2 ≥ 0, x1 + x2 ≤ 1
2 x1 + 2 x2 − x1² − x2² − 1     if x1 ≤ 1, x2 ≤ 1, x1 + x2 ≥ 1
2 x2 − x2²                      if x1 ≥ 1, 0 ≤ x2 ≤ 1
2 x1 − x1²                      if 0 ≤ x1 ≤ 1, x2 ≥ 1
1                               if x1 ≥ 1, x2 ≥ 1
SLIDE 42 Marginalization
We can compute the marginal cdf from the joint cdf:

FX (x) = P (X ≤ x) = lim_{y→∞} FX,Y (x, y)

FX (x) = P (X ≤ x) = ∫_{u=−∞}^{x} ∫_{y=−∞}^{∞} fX,Y (u, y) dy du

Differentiating, we obtain

fX (x) = ∫_{y=−∞}^{∞} fX,Y (x, y) dy
SLIDE 43 Marginalization
Marginal pdf of a subvector XI, I := {i1, i2, . . . , im}:

fXI (xI) = ∫ · · · ∫ fX (x) dxj1 dxj2 · · · dxjn−m

where {j1, j2, . . . , jn−m} := {1, 2, . . . , n} / I
SLIDE 44 Example: Triangle lake (continued)
Marginal cdf of X1:

FX1 (x1) = lim_{x2→∞} FX (x) =
0            if x1 < 0
2 x1 − x1²   if 0 ≤ x1 ≤ 1
1            if x1 ≥ 1

Marginal pdf of X1:

fX1 (x1) = dFX1 (x1) / dx1 = 2 (1 − x1) if 0 ≤ x1 ≤ 1, and 0 otherwise
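The marginal pdf can also be checked numerically by integrating the joint density (constant and equal to 2 on the triangle) over x2; a minimal sketch with my own grid parameters:

```python
import numpy as np

# Joint pdf of the triangle lake example: uniform with height 2 on the
# triangle {x1 >= 0, x2 >= 0, x1 + x2 <= 1}
def f_joint(x1, x2):
    return np.where((x1 >= 0) & (x2 >= 0) & (x1 + x2 <= 1), 2.0, 0.0)

# A Riemann sum over x2 should recover the marginal fX1(x1) = 2 (1 - x1)
x2 = np.linspace(0.0, 1.0, 200001)
dx = x2[1] - x2[0]
for x1 in (0.0, 0.25, 0.5, 0.75):
    numeric = f_joint(x1, x2).sum() * dx
    print(round(numeric, 3), 2 * (1 - x1))  # the two values should agree
```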
SLIDE 45 Joint conditional cdf and pdf given an event
If we know that (X, Y ) ∈ S, for any Borel set S in R2

FX,Y |(X,Y )∈S (x, y) := P (X ≤ x, Y ≤ y | (X, Y ) ∈ S)
= P (X ≤ x, Y ≤ y, (X, Y ) ∈ S) / P ((X, Y ) ∈ S)
= ∫_{u≤x, v≤y, (u,v)∈S} fX,Y (u, v) du dv / ∫_{(u,v)∈S} fX,Y (u, v) du dv

fX,Y |(X,Y )∈S (x, y) := ∂²FX,Y |(X,Y )∈S (x, y) / ∂x ∂y
SLIDE 46
Conditional cdf and pdf
Distribution of Y given X = x? The event has zero probability!
SLIDE 47
Conditional cdf and pdf
Distribution of Y given X = x? The event has zero probability! Define

fY |X (y|x) := fX,Y (x, y) / fX (x), if fX (x) > 0

FY |X (y|x) := ∫_{u=−∞}^{y} fY |X (u|x) du

Chain rule for continuous random variables:

fX,Y (x, y) = fX (x) fY |X (y|x)
SLIDE 48
Conditional cdf and pdf
fX (x) = lim_{∆x→0} P (x ≤ X ≤ x + ∆x) / ∆x

fX,Y (x, y) = lim_{∆x→0} (1/∆x) ∂P (x ≤ X ≤ x + ∆x, Y ≤ y) / ∂y
SLIDE 49
Conditional cdf and pdf
FY |X (y|x) = ∫_{u=−∞}^{y} lim_{∆x→0} (1 / P (x ≤ X ≤ x + ∆x)) ∂P (x ≤ X ≤ x + ∆x, Y ≤ u) / ∂u du
= lim_{∆x→0} (1 / P (x ≤ X ≤ x + ∆x)) ∫_{u=−∞}^{y} ∂P (x ≤ X ≤ x + ∆x, Y ≤ u) / ∂u du
= lim_{∆x→0} P (x ≤ X ≤ x + ∆x, Y ≤ y) / P (x ≤ X ≤ x + ∆x)
= lim_{∆x→0} P (Y ≤ y | x ≤ X ≤ x + ∆x)
SLIDE 50 Conditional pdf of a random subvector
Conditional pdf of a random subvector XI, I ⊆ {1, 2, . . . , n}, given the rest of the vector X{1,...,n}/I:

fXI |X{1,...,n}/I (xI | x{1,...,n}/I) := fX (x) / fX{1,...,n}/I (x{1,...,n}/I)

Chain rule for continuous random vectors:

fX (x) = fX1 (x1) fX2|X1 (x2|x1) · · · fXn|X1,...,Xn−1 (xn|x1, . . . , xn−1)
= ∏_{i=1}^{n} fXi|X{1,...,i−1} (xi | x{1,...,i−1})
SLIDE 51
Example: Triangle lake (continued)
Conditioned on {X1 = 0.75}, what are the pdf and cdf of X2?
SLIDE 52
Example: Triangle lake (continued)
fX2|X1 (x2|x1)
SLIDE 53
Example: Triangle lake (continued)
fX2|X1 (x2|x1) = fX (x) / fX1 (x1)
SLIDE 54
Example: Triangle lake (continued)
fX2|X1 (x2|x1) = fX (x) / fX1 (x1) = 1 / (1 − x1), 0 ≤ x2 ≤ 1 − x1
SLIDE 55
Example: Triangle lake (continued)
fX2|X1 (x2|x1) = fX (x) / fX1 (x1) = 1 / (1 − x1), 0 ≤ x2 ≤ 1 − x1

FX2|X1 (x2|x1) = ∫_{−∞}^{x2} fX2|X1 (u|x1) du = x2 / (1 − x1)
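A quick numeric sketch (my own check, not from the slides) that the conditional density 1/(1 − x1) integrates to one and that its cdf grows linearly:

```python
import numpy as np

# Conditional pdf of X2 given X1 = 0.75: constant 1/(1 - 0.75) = 4 on [0, 0.25]
x1 = 0.75
grid = np.linspace(0.0, 1.0 - x1, 100001)
dx = grid[1] - grid[0]
pdf = np.full_like(grid, 1.0 / (1.0 - x1))

total = pdf.sum() * dx                   # should be close to 1
cdf_at_01 = pdf[grid <= 0.1].sum() * dx  # should be close to 0.1/0.25 = 0.4
print(round(total, 3), round(cdf_at_01, 3))
```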
SLIDE 56 Example: Desert
◮ Car traveling through the desert
◮ Time until the car breaks down: T
◮ State of the motor: M
◮ State of the road: R
◮ Model:
  ◮ M uniform between 0 (no problem) and 1 (very bad)
  ◮ R uniform between 0 (no problem) and 1 (very bad)
  ◮ M and R independent
  ◮ T exponential with parameter M + R
SLIDE 57
Example: Desert
Joint pdf?
SLIDE 58
Example: Desert
Joint pdf? fM,R,T (m, r, t)
SLIDE 59
Example: Desert
Joint pdf? fM,R,T (m, r, t) = fM (m) fR|M (r|m) fT|M,R (t|m, r)
SLIDE 60
Example: Desert
Joint pdf? fM,R,T (m, r, t) = fM (m) fR|M (r|m) fT|M,R (t|m, r) = fM (m) fR (r) fT|M,R (t|m, r) by independence
SLIDE 61 Example: Desert
Joint pdf?

fM,R,T (m, r, t) = fM (m) fR|M (r|m) fT|M,R (t|m, r)
= fM (m) fR (r) fT|M,R (t|m, r) by independence
= (m + r) e^(−(m+r)t) for t ≥ 0, 0 ≤ m ≤ 1, 0 ≤ r ≤ 1
SLIDE 62
Example: Desert
◮ Car breaks down after 15 min (0.25 h), T = 0.25
◮ Road seems OK, R = 0.2
◮ What was the state of the motor M?
SLIDE 63
Example: Desert
◮ Car breaks down after 15 min (0.25 h), T = 0.25
◮ Road seems OK, R = 0.2
◮ What was the state of the motor M?
fM|R,T (m|r, t) = fM,R,T (m, r, t) / fR,T (r, t)
SLIDE 64
Example: Desert
fR,T (r, t) =
SLIDE 65
Example: Desert
fR,T (r, t) = ∫_{m=0}^{1} fM,R,T (m, r, t) dm
SLIDE 66
Example: Desert
fR,T (r, t) = ∫_{m=0}^{1} fM,R,T (m, r, t) dm
= e^(−tr) ( ∫_{m=0}^{1} m e^(−tm) dm + r ∫_{m=0}^{1} e^(−tm) dm )
SLIDE 67 Example: Desert
fR,T (r, t) = ∫_{m=0}^{1} fM,R,T (m, r, t) dm
= e^(−tr) ( ∫_{m=0}^{1} m e^(−tm) dm + r ∫_{m=0}^{1} e^(−tm) dm )
= e^(−tr) ( (1 − (1 + t) e^(−t)) / t² + r (1 − e^(−t)) / t )
SLIDE 68 Example: Desert
fR,T (r, t) = ∫_{m=0}^{1} fM,R,T (m, r, t) dm
= e^(−tr) ( ∫_{m=0}^{1} m e^(−tm) dm + r ∫_{m=0}^{1} e^(−tm) dm )
= e^(−tr) ( (1 − (1 + t) e^(−t)) / t² + r (1 − e^(−t)) / t )
= (e^(−tr) / t²) (1 + tr − e^(−t) (1 + t + tr)) for t ≥ 0, 0 ≤ r ≤ 1
SLIDE 69
Example: Desert
fM|R,T (m|r, t) = fM,R,T (m, r, t) / fR,T (r, t)
= (m + r) e^(−(m+r)t) / ( (e^(−tr) / t²) (1 + tr − e^(−t) (1 + t + tr)) )
= (m + r) t² e^(−tm) / (1 + tr − e^(−t) (1 + t + tr))
SLIDE 70
Example: Desert
fM|R,T (m|r, t) = fM,R,T (m, r, t) / fR,T (r, t)
= (m + r) e^(−(m+r)t) / ( (e^(−tr) / t²) (1 + tr − e^(−t) (1 + t + tr)) )
= (m + r) t² e^(−tm) / (1 + tr − e^(−t) (1 + t + tr))

fM|R,T (m|0.2, 0.25) = (m + 0.2) 0.25² e^(−0.25 m) / (1 + 0.25 · 0.2 − e^(−0.25) (1 + 0.25 + 0.25 · 0.2))
= 1.66 (m + 0.2) e^(−0.25 m) for 0 ≤ m ≤ 1
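As a sketch (the variable names are mine), the normalizing constant in the last expression can be evaluated and the posterior checked to integrate to one:

```python
import numpy as np

# Posterior density of the motor state given R = 0.2, T = 0.25
r, t = 0.2, 0.25
const = t**2 / (1 + t * r - np.exp(-t) * (1 + t + t * r))
print(round(const, 2))   # ~1.66, the constant on the slide

# Riemann-sum check that the posterior is a valid pdf on [0, 1]
m = np.linspace(0.0, 1.0, 100001)
pdf = (m + r) * const * np.exp(-t * m)
print(round(np.sum(pdf) * (m[1] - m[0]), 3))   # ~1.0
```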
SLIDE 71
State of the car
[Figure: fM|R,T (m | 0.2, 0.25) as a function of m]
SLIDE 72
Independent continuous random variables
Two random variables X and Y are independent if and only if

FX,Y (x, y) = FX (x) FY (y) for all (x, y) ∈ R2

Equivalently,

FX|Y (x|y) = FX (x) and FY |X (y|x) = FY (y) for all (x, y) ∈ R2
SLIDE 73
Independent continuous random variables
Two random variables X and Y with joint pdf fX,Y are independent if and only if

fX,Y (x, y) = fX (x) fY (y) for all (x, y) ∈ R2

Equivalently,

fX|Y (x|y) = fX (x) and fY |X (y|x) = fY (y) for all (x, y) ∈ R2
SLIDE 74 Mutually independent continuous random variables
The components of a random vector X are mutually independent if and only if

FX (x) = ∏_{i=1}^{n} FXi (xi)

Equivalently,

fX (x) = ∏_{i=1}^{n} fXi (xi)
SLIDE 75 Mutually conditionally independent random variables
The components of a subvector XI, I ⊆ {1, 2, . . . , n}, are mutually conditionally independent given another subvector XJ , J ⊆ {1, 2, . . . , n}, if and only if

FXI |XJ (xI |xJ ) = ∏_{i ∈ I} FXi|XJ (xi |xJ )

Equivalently,

fXI |XJ (xI |xJ ) = ∏_{i ∈ I} fXi|XJ (xi |xJ )
SLIDE 76 Functions of random variables
U = g (X, Y ) and V = h (X, Y )

FU,V (u, v) = P (U ≤ u, V ≤ v)
= P (g (X, Y ) ≤ u, h (X, Y ) ≤ v)
= ∫∫_{{(x,y) : g(x,y) ≤ u, h(x,y) ≤ v}} fX,Y (x, y) dx dy
SLIDE 77
Sum of independent random variables
X and Y are independent random variables, what is the pdf of Z = X + Y ?
SLIDE 78
Sum of independent random variables
X and Y are independent random variables, what is the pdf of Z = X + Y ? FZ (z)
SLIDE 79
Sum of independent random variables
X and Y are independent random variables, what is the pdf of Z = X + Y ? FZ (z) = P (X + Y ≤ z)
SLIDE 80
Sum of independent random variables
X and Y are independent random variables, what is the pdf of Z = X + Y ?

FZ (z) = P (X + Y ≤ z)
= ∫_{y=−∞}^{∞} ∫_{x=−∞}^{z−y} fX (x) fY (y) dx dy
= ∫_{y=−∞}^{∞} FX (z − y) fY (y) dy
SLIDE 81
Sum of independent random variables
X and Y are independent random variables, what is the pdf of Z = X + Y ?

FZ (z) = P (X + Y ≤ z)
= ∫_{y=−∞}^{∞} ∫_{x=−∞}^{z−y} fX (x) fY (y) dx dy
= ∫_{y=−∞}^{∞} FX (z − y) fY (y) dy

fZ (z) = d/dz lim_{u→∞} ∫_{y=−u}^{u} FX (z − y) fY (y) dy
SLIDE 82
Sum of independent random variables
X and Y are independent random variables, what is the pdf of Z = X + Y ?

FZ (z) = P (X + Y ≤ z)
= ∫_{y=−∞}^{∞} ∫_{x=−∞}^{z−y} fX (x) fY (y) dx dy
= ∫_{y=−∞}^{∞} FX (z − y) fY (y) dy

fZ (z) = d/dz lim_{u→∞} ∫_{y=−u}^{u} FX (z − y) fY (y) dy
= ∫_{y=−∞}^{∞} fX (z − y) fY (y) dy

The pdf of the sum is the convolution of the individual pdfs
SLIDE 83 Example: Coffee beans
◮ Company buys coffee beans from two local producers
◮ Beans from Colombia: C tons/year
◮ Beans from Vietnam: V tons/year
◮ Model:
  ◮ C uniform between 0 and 1
  ◮ V uniform between 0 and 2
  ◮ C and V independent
◮ What is the distribution of the total amount of beans B?
SLIDE 84
Example: Coffee beans
fB (b) =
SLIDE 85
Example: Coffee beans
fB (b) = ∫_{u=−∞}^{∞} fC (b − u) fV (u) du
SLIDE 86
Example: Coffee beans
fB (b) = ∫_{u=−∞}^{∞} fC (b − u) fV (u) du = (1/2) ∫_{u=0}^{2} fC (b − u) du
SLIDE 87
Example: Coffee beans
fB (b) = ∫_{u=−∞}^{∞} fC (b − u) fV (u) du = (1/2) ∫_{u=0}^{2} fC (b − u) du
= (1/2) ∫_{u=0}^{b} du = b/2           if 0 ≤ b ≤ 1
= (1/2) ∫_{u=b−1}^{b} du = 1/2         if 1 ≤ b ≤ 2
= (1/2) ∫_{u=b−1}^{2} du = (3 − b)/2   if 2 ≤ b ≤ 3
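The piecewise result can be cross-checked against a direct numeric convolution of the two uniform densities; a sketch with my own grid choices:

```python
import numpy as np

# Piecewise pdf of B = C + V from the slides
def f_B(b):
    if 0 <= b <= 1:
        return b / 2
    if 1 <= b <= 2:
        return 0.5
    if 2 <= b <= 3:
        return (3 - b) / 2
    return 0.0

# Direct numeric convolution of fC (uniform on [0,1]) and fV (uniform on [0,2])
u = np.linspace(-1.0, 4.0, 500001)
du = u[1] - u[0]
results = {}
for b in (0.5, 1.5, 2.5):
    fC = ((b - u >= 0) & (b - u <= 1)).astype(float)  # fC(b - u), height 1
    fV = ((u >= 0) & (u <= 2)).astype(float) / 2.0    # fV(u), height 1/2
    results[b] = np.sum(fC * fV) * du
    print(round(results[b], 3), f_B(b))  # the two values should agree
```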
SLIDE 88
Example: Coffee beans
[Figure: fC and fV (left) and their convolution fB (right)]
SLIDE 89 Gaussian random vector
A Gaussian random vector X has a joint pdf of the form

fX (x) = (1 / √((2π)ⁿ det (Σ))) exp (−(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ))

where the mean µ ∈ Rn and the covariance matrix Σ is a symmetric positive definite matrix
SLIDE 90 Linear transformation of Gaussian random vectors
If X is a Gaussian random vector of dimension n with mean µ and covariance matrix Σ, then for any matrix A ∈ Rm×n and vector b ∈ Rm

A X + b is Gaussian with mean A µ + b and covariance matrix A Σ Aᵀ
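For a concrete 2-d example (the numbers are mine, not from the slides), the transformed mean and covariance follow directly with NumPy:

```python
import numpy as np

# Mean and covariance of A X + b for a toy 2-d Gaussian X
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 1.0]])   # A X + b sums the two components...
b = np.array([3.0])          # ...and shifts the result by 3

mean_out = A @ mu + b        # A mu + b
cov_out = A @ Sigma @ A.T    # A Sigma A^T
print(mean_out, cov_out)     # [3.] [[4.]]
```

The output covariance 4 = 2 + 1 + 2 · 0.5 is the familiar variance-of-a-sum formula, recovered as a special case.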
SLIDE 91 Marginal distributions are Gaussian
If Z = (X, Y) is a Gaussian random vector with mean and covariance matrix

µ_Z = ( µ_X )     Σ_Z = ( Σ_X     Σ_XY )
      ( µ_Y )           ( Σ_XYᵀ   Σ_Y  )

then X is a Gaussian random vector with mean µ_X and covariance matrix Σ_X
SLIDE 92
Marginal distributions are Gaussian
[Figure: joint pdf fX,Y (x, y) of a Gaussian random vector together with its Gaussian marginals fX and fY]
SLIDE 93
Discrete random variables Continuous random variables Joint distributions of discrete and continuous random variables
SLIDE 94 Discrete and continuous random variables
How do we model the relation between a continuous random variable C and a discrete random variable D?

Conditional cdf and pdf of C given D:

FC|D (c|d) := P (C ≤ c|D = d)
fC|D (c|d) := dFC|D (c|d) / dc

By the Law of Total Probability

FC (c) = ∑_{d ∈ RD} pD (d) FC|D (c|d)
fC (c) = ∑_{d ∈ RD} pD (d) fC|D (c|d)
SLIDE 95
Mixture models
Data are drawn from a continuous distribution whose parameters are chosen from a discrete set
Important example: Gaussian mixture models
SLIDE 96
Grizzlies in Yellowstone
Model for the weight of grizzly bears in Yellowstone:
Males: Gaussian with µ := 240 kg and σ := 40 kg
Females: Gaussian with µ := 140 kg and σ := 20 kg
There are about the same number of females and males
SLIDE 97
Grizzlies in Yellowstone
The distribution of the weight of all bears W can be modeled as a Gaussian mixture with two random variables: S (sex) and W (weight)
SLIDE 98
Grizzlies in Yellowstone
The distribution of the weight of all bears W can be modeled as a Gaussian mixture with two random variables: S (sex) and W (weight) fW (w)
SLIDE 99 Grizzlies in Yellowstone
The distribution of the weight of all bears W can be modeled as a Gaussian mixture with two random variables: S (sex) and W (weight)

fW (w) = ∑_{s=0}^{1} pS (s) fW |S (w|s)
SLIDE 100 Grizzlies in Yellowstone
The distribution of the weight of all bears W can be modeled as a Gaussian mixture with two random variables: S (sex) and W (weight)

fW (w) = ∑_{s=0}^{1} pS (s) fW |S (w|s)
= (1 / (2 √(2π))) ( e^(−(w−240)²/3200) / 40 + e^(−(w−140)²/800) / 20 )
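A numeric sanity check (my own sketch) that the mixture density above is a valid pdf:

```python
import numpy as np

# Equal-weight mixture of N(240, 40^2) and N(140, 20^2), as on the slide
def f_W(w):
    male = np.exp(-(w - 240.0)**2 / 3200.0) / 40.0    # 2 sigma^2 = 3200
    female = np.exp(-(w - 140.0)**2 / 800.0) / 20.0   # 2 sigma^2 = 800
    return (male + female) / (2.0 * np.sqrt(2.0 * np.pi))

# Riemann sum over a range that captures essentially all the mass
w = np.linspace(0.0, 500.0, 500001)
total = np.sum(f_W(w)) * (w[1] - w[0])
print(round(total, 3))   # ~1.0
```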
SLIDE 101
Grizzlies in Yellowstone
[Figure: fW |S (·|0), fW |S (·|1) and the mixture fW (·)]
SLIDE 102
Continuous and discrete random variables
Conditional pmf of D given C? Event {C = c} has zero probability
SLIDE 103
Continuous and discrete random variables
Conditional pmf of D given C? The event {C = c} has zero probability

pD|C (d|c) := lim_{∆→0} P (D = d, c ≤ C ≤ c + ∆) / P (c ≤ C ≤ c + ∆)
SLIDE 104
Continuous and discrete random variables
Conditional pmf of D given C? The event {C = c} has zero probability

pD|C (d|c) := lim_{∆→0} P (D = d, c ≤ C ≤ c + ∆) / P (c ≤ C ≤ c + ∆)

By the Law of Total Probability and a limit argument

pD (d) = ∫_{c=−∞}^{∞} fC (c) pD|C (d|c) dc
SLIDE 105
Bayesian coin flip
Bayesian methods often endow parameters of discrete distributions with a continuous marginal distribution
SLIDE 106
Bayesian coin flip
Bayesian methods often endow parameters of discrete distributions with a continuous marginal distribution
◮ You suspect a coin is biased ◮ You are uncertain about the bias so you model it as a random variable
with pdf fB (b) = 2b for b ∈ [0, 1]
◮ What is the probability of heads?
SLIDE 107
Bayesian coin flip
[Figure: the prior pdf fB (b) = 2b on [0, 1]]
SLIDE 108
Bayesian coin flip
pX (1)
SLIDE 109
Bayesian coin flip
pX (1) = ∫_{b=−∞}^{∞} fB (b) pX|B (1|b) db
SLIDE 110
Bayesian coin flip
pX (1) = ∫_{b=−∞}^{∞} fB (b) pX|B (1|b) db = ∫_{b=0}^{1} 2 b² db
SLIDE 111
Bayesian coin flip
pX (1) = ∫_{b=−∞}^{∞} fB (b) pX|B (1|b) db = ∫_{b=0}^{1} 2 b² db = 2/3
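The prior-predictive integral can be confirmed in closed form (∫ 2b² db = 2b³/3 on [0, 1]) or numerically; a minimal sketch:

```python
import numpy as np

# P(heads) = integral over the bias b of fB(b) * pX|B(1|b) = 2b * b on [0, 1]
b = np.linspace(0.0, 1.0, 200001)
db = b[1] - b[0]
p_heads = np.sum(2.0 * b**2) * db
print(round(p_heads, 3))   # close to 2/3
```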
SLIDE 112
Chain rule for continuous and discrete random variables
pD (d) fC|D (c|d) = lim_{∆→0} P (D = d) P (c ≤ C ≤ c + ∆|D = d) / ∆
= lim_{∆→0} P (D = d, c ≤ C ≤ c + ∆) / ∆
= lim_{∆→0} (P (c ≤ C ≤ c + ∆) / ∆) · (P (D = d, c ≤ C ≤ c + ∆) / P (c ≤ C ≤ c + ∆))
= fC (c) pD|C (d|c)
SLIDE 113
Grizzlies in Yellowstone
You spot a grizzly that is about 180 kg What is the probability that it is male?
SLIDE 114
Grizzlies in Yellowstone
You spot a grizzly that is about 180 kg What is the probability that it is male? pS|W (0|180)
SLIDE 115
Grizzlies in Yellowstone
You spot a grizzly that is about 180 kg
What is the probability that it is male?

pS|W (0|180) = pS (0) fW |S (180|0) / fW (180)
SLIDE 116 Grizzlies in Yellowstone
You spot a grizzly that is about 180 kg
What is the probability that it is male?

pS|W (0|180) = pS (0) fW |S (180|0) / fW (180)
= (1/40) e^(−(180−240)²/3200) / ( (1/40) e^(−(180−240)²/3200) + (1/20) e^(−(180−140)²/800) )
= 0.545
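The 0.545 figure can be reproduced directly; the 1/√(2π) factors and the equal prior pS(0) = pS(1) = 1/2 cancel in the ratio (a sketch):

```python
import numpy as np

# Likelihoods of a 180 kg weight under each component (common factors cancel)
male = np.exp(-(180 - 240)**2 / 3200) / 40     # proportional to fW|S(180|0)
female = np.exp(-(180 - 140)**2 / 800) / 20    # proportional to fW|S(180|1)
posterior_male = male / (male + female)
print(round(posterior_male, 3))   # 0.545
```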
SLIDE 117
Bayesian coin flip
Coin flip is tails What is the distribution of the bias now?
SLIDE 118
Bayesian coin flip
Coin flip is tails What is the distribution of the bias now? fB|X (b|0)
SLIDE 119
Bayesian coin flip
Coin flip is tails
What is the distribution of the bias now?

fB|X (b|0) = fB (b) pX|B (0|b) / pX (0)
SLIDE 120
Bayesian coin flip
Coin flip is tails
What is the distribution of the bias now?

fB|X (b|0) = fB (b) pX|B (0|b) / pX (0) = 2b (1 − b) / (1/3) = 6b (1 − b)
SLIDE 121
Bayesian coin flip
[Figure: prior fB (·) and posterior fB|X (·|0)]