

SLIDE 1

Multivariate random variables

DS GA 1002 Statistical and Mathematical Models

http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall16

Carlos Fernandez-Granda

SLIDE 2

Joint distributions

Tool to characterize several uncertain numerical quantities of interest within the same probabilistic model. We can group the variables into random vectors:

X = (X1, X2, . . . , Xn)

SLIDE 3

• Discrete random variables
• Continuous random variables
• Joint distributions of discrete and continuous random variables

SLIDE 4

Joint probability mass function

The joint pmf of X and Y is defined as

pX,Y (x, y) := P (X = x, Y = y)

It is the probability of X and Y being equal to x and y respectively. By the definition of a probability measure,

pX,Y (x, y) ≥ 0 for any x ∈ RX, y ∈ RY

Σ_{x ∈ RX} Σ_{y ∈ RY} pX,Y (x, y) = 1

SLIDE 5

Joint probability mass function

The joint pmf of a discrete random vector X is

pX (x) := P (X1 = x1, X2 = x2, . . . , Xn = xn)

It is the probability of X being equal to x. By the definition of a probability measure,

pX (x) ≥ 0

Σ_{x1 ∈ R1} Σ_{x2 ∈ R2} · · · Σ_{xn ∈ Rn} pX (x) = 1

SLIDE 6

Joint probability mass function

By the Law of Total Probability, for any set S ⊆ RX × RY,

P ((X, Y ) ∈ S) = P ( ∪_{(x,y) ∈ S} {X = x, Y = y} )   (union of disjoint events)
= Σ_{(x,y) ∈ S} P (X = x, Y = y)
= Σ_{(x,y) ∈ S} pX,Y (x, y)

Similarly, for any discrete set S ⊆ Rn,

P (X ∈ S) = Σ_{x ∈ S} pX (x)

SLIDE 7

Marginalization

To compute the marginal pmf of X from the joint pmf pX,Y:

pX (x) = P (X = x)
= P ( ∪_{y ∈ RY} {X = x, Y = y} )   (union of disjoint events)
= Σ_{y ∈ RY} P (X = x, Y = y)
= Σ_{y ∈ RY} pX,Y (x, y)

This is called marginalizing over Y.
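The marginalization sum has a direct computational analogue (not part of the slides): for a pmf stored as a table, summing over one axis produces the marginal. A minimal NumPy sketch with a made-up 2×3 joint pmf:

```python
import numpy as np

# Hypothetical joint pmf of X (rows) and Y (columns); entries sum to 1
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])

# Marginalize over Y: p_X(x) = sum_y p_{X,Y}(x, y)
p_x = p_xy.sum(axis=1)
# Marginalize over X: p_Y(y) = sum_x p_{X,Y}(x, y)
p_y = p_xy.sum(axis=0)

print(p_x)  # [0.4 0.6]
```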

SLIDE 8

Marginalization

Marginal pmf of a subvector XI, I ⊆ {1, 2, . . . , n}:

pXI (xI) = Σ_{xj1 ∈ Rj1} Σ_{xj2 ∈ Rj2} · · · Σ_{xjn−m ∈ Rjn−m} pX (x)

where {j1, j2, . . . , jn−m} := {1, 2, . . . , n} / I

SLIDE 9

Conditional probability mass function

The conditional pmf of Y given X is

pY |X (y|x) = P (Y = y|X = x) = pX,Y (x, y) / pX (x),  as long as pX (x) > 0

It is a valid pmf parametrized by x.

Chain rule for discrete random variables:

pX,Y (x, y) = pX (x) pY |X (y|x)
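In tabular form (a sketch, with a hypothetical joint table), conditioning on X = x divides each row of the joint pmf by the corresponding marginal, and the chain rule reverses the step:

```python
import numpy as np

# Hypothetical joint pmf of X (rows) and Y (columns)
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])

p_x = p_xy.sum(axis=1)                   # marginal of X
p_y_given_x = p_xy / p_x[:, np.newaxis]  # p_{Y|X}(y|x) = p_{X,Y}(x,y) / p_X(x)

# Each conditional distribution (each row) is a valid pmf
print(p_y_given_x.sum(axis=1))           # [1. 1.]

# Chain rule recovers the joint: p_{X,Y}(x,y) = p_X(x) p_{Y|X}(y|x)
reconstructed = p_x[:, np.newaxis] * p_y_given_x
```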

SLIDE 10

Conditional probability mass function

The conditional pmf of a random subvector XI, I ⊆ {1, 2, . . . , n}, given another subvector XJ is

pXI|XJ (xI|xJ) := pX (x) / pXJ (xJ)

where {j1, j2, . . . , jn−m} := {1, 2, . . . , n} / I

Chain rule for discrete random vectors:

pX (x) = pX1 (x1) pX2|X1 (x2|x1) · · · pXn|X1,...,Xn−1 (xn|x1, . . . , xn−1)
= Π_{i=1}^{n} pXi|X{1,...,i−1} (xi | x{1,...,i−1})

Any order works!
SLIDE 11

Example: Flights and rain (continued)

Probabilistic model for late arrivals at an airport:

P (late, no rain) = 2/20,  P (on time, no rain) = 14/20,
P (late, rain) = 3/20,  P (on time, rain) = 1/20

L = 1 if the plane is late, 0 otherwise
R = 1 if it rains, 0 otherwise
SLIDE 12

Example: Flights and rain (continued)

Joint pmf pL,R (rows: L, columns: R):

             R = 0    R = 1
L = 0        14/20     1/20
L = 1         2/20     3/20

Marginals and conditionals:

pL:           pL (0) = 15/20,  pL (1) = 5/20
pL|R (·|0):   7/8,  1/8
pL|R (·|1):   1/4,  3/4
pR:           pR (0) = 16/20,  pR (1) = 4/20
pR|L (·|0):   14/15,  1/15
pR|L (·|1):   2/5,  3/5
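Every entry of the table can be reproduced mechanically from the joint pmf; a short NumPy sketch:

```python
import numpy as np

# Joint pmf p_{L,R}; rows index L (0 = on time, 1 = late),
# columns index R (0 = no rain, 1 = rain)
p_lr = np.array([[14, 1],
                 [ 2, 3]]) / 20

p_l = p_lr.sum(axis=1)             # marginal of L: [15/20, 5/20]
p_r = p_lr.sum(axis=0)             # marginal of R: [16/20, 4/20]

p_l_given_r = p_lr / p_r           # columns become conditional pmfs p_{L|R}
p_r_given_l = p_lr / p_l[:, None]  # rows become conditional pmfs p_{R|L}

print(p_l_given_r[:, 0])           # [0.875 0.125] = [7/8, 1/8]
print(p_r_given_l[1])              # [0.4 0.6] = [2/5, 3/5]
```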

SLIDE 13

Independence of discrete random variables

X and Y are independent if and only if

pX,Y (x, y) = pX (x) pY (y),  for all x ∈ RX, y ∈ RY

Equivalently,

pX|Y (x|y) = pX (x)
pY |X (y|x) = pY (y)

for all x ∈ RX, y ∈ RY

SLIDE 14

Mutually independent random variables

The n entries X1, X2, . . . , Xn of a random vector X are mutually independent if and only if

pX (x) = Π_{i=1}^{n} pXi (xi)

SLIDE 15

Conditionally mutually independent random variables

The components of a subvector XI, I ⊆ {1, 2, . . . , n}, are conditionally mutually independent given another subvector XJ, J ⊆ {1, 2, . . . , n}, if and only if

pXI|XJ (xI|xJ) = Π_{i ∈ I} pXi|XJ (xi|xJ)

SLIDE 16

Pairwise independence

X1 and X2 are outcomes of independent unbiased coin flips

X3 :=
  1 if X1 = X2,
  0 if X1 ≠ X2

Are X1, X2 and X3 independent?

SLIDE 17–19

Pairwise independence

X1 and X2 are independent by assumption. The pmf of X3 is

pX3 (1) = pX1,X2 (1, 1) + pX1,X2 (0, 0) = 1/2
pX3 (0) = pX1,X2 (0, 1) + pX1,X2 (1, 0) = 1/2

SLIDE 20–27

Pairwise independence

Are X1 and X3 independent?

pX1,X3 (0, 0) = pX1,X2 (0, 1) = 1/4 = pX1 (0) pX3 (0),
pX1,X3 (1, 0) = pX1,X2 (1, 0) = 1/4 = pX1 (1) pX3 (0),
pX1,X3 (0, 1) = pX1,X2 (0, 0) = 1/4 = pX1 (0) pX3 (1),
pX1,X3 (1, 1) = pX1,X2 (1, 1) = 1/4 = pX1 (1) pX3 (1)

Yes (and the same holds for X2 and X3 by symmetry)

SLIDE 28–31

Pairwise independence

X1, X2 and X3 are pairwise independent. Are X1, X2 and X3 mutually independent?

pX1,X2,X3 (1, 1, 1) = P (X1 = 1, X2 = 1) = 1/4   (X1 = X2 = 1 forces X3 = 1)
pX1 (1) pX2 (1) pX3 (1) = 1/8

No!
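The pairwise-but-not-mutual conclusion can be verified by brute force over the four equally likely outcomes; a small Python sketch:

```python
import itertools

# Enumerate the four equally likely outcomes (x1, x2, x3) with x3 = 1 iff x1 == x2
outcomes = [(x1, x2, int(x1 == x2)) for x1, x2 in itertools.product([0, 1], repeat=2)]

def p(event):
    """Probability of an event, given as a predicate on (x1, x2, x3)."""
    return sum(1 for o in outcomes if event(o)) / len(outcomes)

# Pairwise independence: every pair of variables factorizes
for i, j in [(0, 1), (0, 2), (1, 2)]:
    for a, b in itertools.product([0, 1], repeat=2):
        joint = p(lambda o: o[i] == a and o[j] == b)
        assert joint == p(lambda o: o[i] == a) * p(lambda o: o[j] == b)

# But not mutual independence: p(1,1,1) = 1/4 while the product of marginals is 1/8
print(p(lambda o: o == (1, 1, 1)))  # 0.25
```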

SLIDE 32

• Discrete random variables
• Continuous random variables
• Joint distributions of discrete and continuous random variables

SLIDE 33

Continuous random variables

We consider events that are composed of unions of Cartesian products of intervals (Borel sets).

The joint cumulative distribution function (cdf) of X and Y is

FX,Y (x, y) := P (X ≤ x, Y ≤ y)

In words, the probability of X and Y being smaller than x and y respectively. The cdf of a random vector X is

FX (x) := P (X1 ≤ x1, X2 ≤ x2, . . . , Xn ≤ xn)

SLIDE 34

Joint cumulative distribution function

Every joint cdf satisfies

lim_{x→−∞} FX,Y (x, y) = 0
lim_{y→−∞} FX,Y (x, y) = 0
lim_{x→∞, y→∞} FX,Y (x, y) = 1

FX,Y (x1, y1) ≤ FX,Y (x2, y2) if x2 ≥ x1, y2 ≥ y1  (nondecreasing)

SLIDE 35

Joint cumulative distribution function

For any two-dimensional interval, by inclusion-exclusion,

P (x1 < X ≤ x2, y1 < Y ≤ y2)
= P (X ≤ x2, Y ≤ y2) − P (X ≤ x1, Y ≤ y2) − P (X ≤ x2, Y ≤ y1) + P (X ≤ x1, Y ≤ y1)
= FX,Y (x2, y2) − FX,Y (x1, y2) − FX,Y (x2, y1) + FX,Y (x1, y1)

The joint cdf completely characterizes the distribution of the random variables / random vector.

SLIDE 36

Joint probability density function

If the joint cdf is differentiable,

fX,Y (x, y) := ∂²FX,Y (x, y) / ∂x ∂y

fX (x) := ∂ⁿFX (x) / ∂x1 ∂x2 · · · ∂xn

SLIDE 37

Joint probability density function

The probability of (X, Y ) ∈ (x, x + ∆x) × (y, y + ∆y) for ∆x, ∆y → 0 is fX,Y (x, y) ∆x ∆y

It is a density, not a probability measure! From the monotonicity of the joint cdf,

fX,Y (x, y) ≥ 0
fX (x) ≥ 0

SLIDE 38

Joint probability density function

For any Borel set S ⊆ R²,

P ((X, Y ) ∈ S) = ∫∫_S fX,Y (x, y) dx dy

In particular,

∫_{x=−∞}^{∞} ∫_{y=−∞}^{∞} fX,Y (x, y) dx dy = 1

SLIDE 39

Joint probability density function

For any Borel set S ⊆ Rⁿ,

P (X ∈ S) = ∫_S fX (x) dx

In particular,

∫_{Rⁿ} fX (x) dx = 1

SLIDE 40

Example: Triangle lake

[Figure: regions A–F of the plane used to evaluate the joint cdf of the triangle lake example]

SLIDE 41

Example: Triangle lake

FX (x) =
  0,                             if x1 < 0 or x2 < 0,
  2 x1 x2,                       if x1 ≥ 0, x2 ≥ 0, x1 + x2 ≤ 1,
  2 x1 + 2 x2 − x2² − x1² − 1,   if x1 ≤ 1, x2 ≤ 1, x1 + x2 ≥ 1,
  2 x2 − x2²,                    if x1 ≥ 1, 0 ≤ x2 ≤ 1,
  2 x1 − x1²,                    if 0 ≤ x1 ≤ 1, x2 ≥ 1,
  1,                             if x1 ≥ 1, x2 ≥ 1

SLIDE 42

Marginalization

We can compute the marginal cdf from the joint cdf:

FX (x) = P (X ≤ x) = lim_{y→∞} FX,Y (x, y)

or from the joint pdf:

FX (x) = P (X ≤ x) = ∫_{u=−∞}^{x} ∫_{y=−∞}^{∞} fX,Y (u, y) dy du

Differentiating, we obtain

fX (x) = ∫_{y=−∞}^{∞} fX,Y (x, y) dy

SLIDE 43

Marginalization

Marginal pdf of a subvector XI, I := {i1, i2, . . . , im}:

fXI (xI) = ∫_{xj1} ∫_{xj2} · · · ∫_{xjn−m} fX (x) dxj1 dxj2 · · · dxjn−m

where {j1, j2, . . . , jn−m} := {1, 2, . . . , n} / I

SLIDE 44

Example: Triangle lake (continued)

Marginal cdf of X1:

FX1 (x1) = lim_{x2→∞} FX (x) =
  0,            if x1 < 0,
  2 x1 − x1²,   if 0 ≤ x1 ≤ 1,
  1,            if x1 ≥ 1

Marginal pdf of X1:

fX1 (x1) = dFX1 (x1) / dx1 =
  2 (1 − x1),   if 0 ≤ x1 ≤ 1,
  0,            otherwise
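A quick numerical sanity check (not on the slides): the joint pdf equals 2 on the triangle, and integrating out x2 on a grid should recover fX1 (x1) = 2 (1 − x1):

```python
import numpy as np

def f_joint(x1, x2):
    """Joint pdf of the triangle lake: density 2 on the triangle x1, x2 >= 0, x1 + x2 <= 1."""
    return np.where((x1 >= 0) & (x2 >= 0) & (x1 + x2 <= 1), 2.0, 0.0)

x2_grid = np.linspace(0.0, 1.0, 100001)
dx = x2_grid[1] - x2_grid[0]

for x1 in [0.1, 0.5, 0.75]:
    # Marginalize over x2 by a Riemann sum; should match f_X1(x1) = 2 (1 - x1)
    marginal = f_joint(x1, x2_grid).sum() * dx
    assert abs(marginal - 2 * (1 - x1)) < 1e-3
```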
slide-45
SLIDE 45

Joint conditional cdf and pdf given an event

If we know that (X, Y ) ∈ S for any Borel set in R2 FX,Y |(X,Y )∈S (x, y) := P (X ≤ x, Y ≤ y| (X, Y ) ∈ S) = P (X ≤ x, Y ≤ y, (X, Y ) ∈ S) P ((X, Y ) ∈ S) =

  • u≤x,v≤y,(u,v)∈S fX,Y (u, v) du dv
  • (u,v)∈S fX,Y (u, v) du dv

fX,Y |(X,Y )∈S (x, y) := ∂2FX,Y |(X,Y )∈S (x, y) ∂x∂y

SLIDE 46–47

Conditional cdf and pdf

Distribution of Y given X = x? The event has zero probability! Define

fY |X (y|x) := fX,Y (x, y) / fX (x),  if fX (x) > 0

FY |X (y|x) := ∫_{u=−∞}^{y} fY |X (u|x) du

Chain rule for continuous random variables:

fX,Y (x, y) = fX (x) fY |X (y|x)

SLIDE 48

Conditional cdf and pdf

fX (x) = lim_{∆x→0} P (x ≤ X ≤ x + ∆x) / ∆x

fX,Y (x, y) = lim_{∆x→0} (1/∆x) ∂P (x ≤ X ≤ x + ∆x, Y ≤ y) / ∂y

SLIDE 49

Conditional cdf and pdf

FY |X (y|x) = ∫_{u=−∞}^{y} lim_{∆x→0} [1 / P (x ≤ X ≤ x + ∆x)] ∂P (x ≤ X ≤ x + ∆x, Y ≤ u) / ∂u du
= lim_{∆x→0} [1 / P (x ≤ X ≤ x + ∆x)] ∫_{u=−∞}^{y} ∂P (x ≤ X ≤ x + ∆x, Y ≤ u) / ∂u du
= lim_{∆x→0} P (x ≤ X ≤ x + ∆x, Y ≤ y) / P (x ≤ X ≤ x + ∆x)
= lim_{∆x→0} P (Y ≤ y | x ≤ X ≤ x + ∆x)

SLIDE 50

Conditional pdf of a random subvector

The conditional pdf of a random subvector XI, I ⊆ {1, 2, . . . , n}, given the subvector X{1,...,n}/I is

fXI|X{1,...,n}/I (xI | x{1,...,n}/I) := fX (x) / fX{1,...,n}/I (x{1,...,n}/I)

Chain rule for continuous random vectors:

fX (x) = fX1 (x1) fX2|X1 (x2|x1) · · · fXn|X1,...,Xn−1 (xn|x1, . . . , xn−1)
= Π_{i=1}^{n} fXi|X{1,...,i−1} (xi | x{1,...,i−1})

Any order works!
SLIDE 51–55

Example: Triangle lake (continued)

Conditioned on {X1 = 0.75}, what are the pdf and cdf of X2?

fX2|X1 (x2|x1) = fX (x) / fX1 (x1) = 1 / (1 − x1),  0 ≤ x2 ≤ 1 − x1

FX2|X1 (x2|x1) = ∫_{−∞}^{x2} fX2|X1 (u|x1) du = x2 / (1 − x1)

SLIDE 56

Example: Desert

◮ Car traveling through the desert
◮ Time until the car breaks down: T
◮ State of the motor: M
◮ State of the road: R
◮ Model:
  ◮ M uniform between 0 (no problem) and 1 (very bad)
  ◮ R uniform between 0 (no problem) and 1 (very bad)
  ◮ M and R independent
  ◮ T exponential with parameter M + R

SLIDE 57–61

Example: Desert

Joint pdf?

fM,R,T (m, r, t) = fM (m) fR|M (r|m) fT|M,R (t|m, r)
= fM (m) fR (r) fT|M,R (t|m, r)   by independence
= (m + r) e^(−(m+r)t)   for t ≥ 0, 0 ≤ m ≤ 1, 0 ≤ r ≤ 1,
  0   otherwise
SLIDE 62–63

Example: Desert

◮ Car breaks down after 15 min (0.25 h), T = 0.25
◮ Road seems OK, R = 0.2
◮ What was the state of the motor M?

fM|R,T (m|r, t) = fM,R,T (m, r, t) / fR,T (r, t)

SLIDE 64–68

Example: Desert

fR,T (r, t) = ∫_{m=0}^{1} fM,R,T (m, r, t) dm
= e^(−tr) ( ∫_{m=0}^{1} m e^(−tm) dm + r ∫_{m=0}^{1} e^(−tm) dm )
= e^(−tr) ( (1 − (1 + t) e^(−t)) / t² + r (1 − e^(−t)) / t )
= (e^(−tr) / t²) (1 + tr − e^(−t) (1 + t + tr))   for t ≥ 0, 0 ≤ r ≤ 1
SLIDE 69–70

Example: Desert

fM|R,T (m|r, t) = fM,R,T (m, r, t) / fR,T (r, t)
= (m + r) e^(−(m+r)t) / [ (e^(−tr) / t²) (1 + tr − e^(−t) (1 + t + tr)) ]
= (m + r) t² e^(−tm) / (1 + tr − e^(−t) (1 + t + tr))

fM|R,T (m|0.2, 0.25) = (m + 0.2) 0.25² e^(−0.25 m) / (1 + 0.25 · 0.2 − e^(−0.25) (1 + 0.25 + 0.25 · 0.2))
≈ 1.66 (m + 0.2) e^(−0.25 m)   for 0 ≤ m ≤ 1
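As a numerical sketch, the posterior can be evaluated directly; the normalizing constant reproduces the 1.66 above (up to rounding), and the density integrates to 1 over [0, 1]:

```python
import math

def posterior(m, r=0.2, t=0.25):
    """Conditional pdf f_{M|R,T}(m|r,t) from the desert example, for 0 <= m <= 1."""
    const = t**2 / (1 + t * r - math.exp(-t) * (1 + t + t * r))
    return const * (m + r) * math.exp(-t * m)

# The normalizing constant matches the slide's 1.66 (up to rounding)
print(posterior(0) / 0.2)  # ≈ 1.66

# Midpoint-rule check: the posterior integrates to 1 over [0, 1]
n = 100000
integral = sum(posterior((k + 0.5) / n) for k in range(n)) / n
assert abs(integral - 1) < 1e-6
```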

SLIDE 71

State of the car

[Figure: plot of the posterior pdf fM|R,T (m | 0.2, 0.25) for m ∈ [0, 1]]

SLIDE 72

Independent continuous random variables

Two random variables X and Y are independent if and only if

FX,Y (x, y) = FX (x) FY (y),  for all (x, y) ∈ R²

Equivalently,

FX|Y (x|y) = FX (x)
FY |X (y|x) = FY (y)

for all (x, y) ∈ R²

SLIDE 73

Independent continuous random variables

Two random variables X and Y with joint pdf fX,Y are independent if and only if

fX,Y (x, y) = fX (x) fY (y),  for all (x, y) ∈ R²

Equivalently,

fX|Y (x|y) = fX (x)
fY |X (y|x) = fY (y)

for all (x, y) ∈ R²

SLIDE 74

Mutually independent continuous random variables

The components of a random vector X are mutually independent if and only if

FX (x) = Π_{i=1}^{n} FXi (xi)

Equivalently,

fX (x) = Π_{i=1}^{n} fXi (xi)

SLIDE 75

Mutually conditionally independent random variables

The components of a subvector XI, I ⊆ {1, 2, . . . , n}, are mutually conditionally independent given another subvector XJ, J ⊆ {1, 2, . . . , n}, if and only if

FXI|XJ (xI|xJ) = Π_{i ∈ I} FXi|XJ (xi|xJ)

Equivalently,

fXI|XJ (xI|xJ) = Π_{i ∈ I} fXi|XJ (xi|xJ)

SLIDE 76

Functions of random variables

For U = g (X, Y ) and V = h (X, Y ):

FU,V (u, v) = P (U ≤ u, V ≤ v)
= P (g (X, Y ) ≤ u, h (X, Y ) ≤ v)
= ∫∫_{{(x,y) | g(x,y) ≤ u, h(x,y) ≤ v}} fX,Y (x, y) dx dy

SLIDE 77–82

Sum of independent random variables

X and Y are independent random variables; what is the pdf of Z = X + Y ?

FZ (z) = P (X + Y ≤ z)
= ∫_{y=−∞}^{∞} ∫_{x=−∞}^{z−y} fX (x) fY (y) dx dy
= ∫_{y=−∞}^{∞} FX (z − y) fY (y) dy

fZ (z) = d/dz lim_{u→∞} ∫_{y=−u}^{u} FX (z − y) fY (y) dy
= ∫_{y=−∞}^{∞} fX (z − y) fY (y) dy

The convolution of the individual pdfs.
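The convolution can be approximated on a grid with np.convolve. As a sketch (the uniform inputs are chosen here for illustration), two Uniform(0, 1) pdfs convolve to the triangular density on [0, 2] with peak 1 at z = 1:

```python
import numpy as np

dx = 0.001
x = np.arange(0, 1, dx)

# pdfs of two independent Uniform(0, 1) variables, sampled on a grid
f_x = np.ones_like(x)
f_y = np.ones_like(x)

# pdf of Z = X + Y: the convolution of the individual pdfs (Riemann-sum scaling by dx)
f_z = np.convolve(f_x, f_y) * dx
z = np.arange(len(f_z)) * dx

# The result is the triangular density on [0, 2] with peak 1 at z = 1
peak = z[np.argmax(f_z)]
```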

SLIDE 83

Example: Coffee beans

◮ Company buys coffee beans from two local producers
◮ Beans from Colombia: C tons/year
◮ Beans from Vietnam: V tons/year
◮ Model:
  ◮ C uniform between 0 and 1
  ◮ V uniform between 0 and 2
  ◮ C and V independent
◮ What is the distribution of the total amount of beans B?

SLIDE 84–87

Example: Coffee beans

fB (b) = ∫_{u=−∞}^{∞} fC (b − u) fV (u) du
= (1/2) ∫_{u=0}^{2} fC (b − u) du
=
  (1/2) ∫_{u=0}^{b} du = b/2,          if 0 ≤ b ≤ 1
  (1/2) ∫_{u=b−1}^{b} du = 1/2,        if 1 ≤ b ≤ 2
  (1/2) ∫_{u=b−1}^{2} du = (3 − b)/2,  if 2 ≤ b ≤ 3
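A Monte Carlo sketch of the same result: sampling C ~ U(0, 1) and V ~ U(0, 2) and estimating the density of B = C + V near b = 1.5, where the piecewise formula gives 1/2 (sample size and window width are arbitrary choices):

```python
import random

def f_b(b):
    """Piecewise pdf of B = C + V from the coffee-beans example."""
    if 0 <= b <= 1:
        return b / 2
    if 1 <= b <= 2:
        return 0.5
    if 2 <= b <= 3:
        return (3 - b) / 2
    return 0.0

random.seed(0)
n, width, count = 200000, 0.1, 0
for _ in range(n):
    b = random.random() + 2 * random.random()  # C ~ U(0,1), V ~ U(0,2)
    if abs(b - 1.5) < width / 2:
        count += 1

# Fraction of samples in the window, divided by its width, estimates the density
density = count / (n * width)
assert abs(density - f_b(1.5)) < 0.05
```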

SLIDE 88

Example: Coffee beans

[Figure: the pdfs fC and fV (left) and the convolution fB (right)]

SLIDE 89

Gaussian random vector

A Gaussian random vector X has a joint pdf of the form

fX (x) = (1 / √((2π)ⁿ |Σ|)) exp( −(1/2) (x − µ)^T Σ^(−1) (x − µ) )

where the mean µ ∈ Rⁿ and the covariance matrix Σ is symmetric and positive definite.
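A sketch of evaluating this pdf with NumPy; at x = µ the exponent vanishes, so the density equals 1/√((2π)ⁿ |Σ|). The particular µ and Σ below are made up:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Joint pdf of a Gaussian random vector with mean mu and covariance sigma."""
    n = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** n * np.linalg.det(sigma))
    # Solve sigma @ v = diff instead of forming the inverse explicitly
    return np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff)) / norm

mu = np.array([1.0, -1.0])
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# At x = mu the exponent vanishes, so the pdf equals 1 / sqrt((2*pi)^n |sigma|)
at_mean = gaussian_pdf(mu, mu, sigma)
```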

SLIDE 90

Linear transformation of Gaussian random vectors

If X is a Gaussian random vector of dimension n with mean µ and covariance matrix Σ, then for any matrix A ∈ R^(m×n) and b ∈ R^m,

Y = A X + b

is Gaussian with mean A µ + b and covariance matrix A Σ A^T.
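A Monte Carlo sketch of this property, with an arbitrary A and b: the empirical mean and covariance of Y = AX + b should match Aµ + b and AΣA^T:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, 2.0])
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, -1.0]])  # maps R^2 -> R^1
b = np.array([3.0])

# Sample X ~ N(mu, sigma) and apply the affine map
x = rng.multivariate_normal(mu, sigma, size=500000)
y = x @ A.T + b

# Empirical mean and covariance match A mu + b and A sigma A^T
assert np.allclose(y.mean(axis=0), A @ mu + b, atol=0.02)
assert np.allclose(np.cov(y.T), A @ sigma @ A.T, atol=0.05)
```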

SLIDE 91

Marginal distributions are Gaussian

For a Gaussian random vector

Z := [ X
       Y ]

with mean

µ := [ µX
       µY ]

and covariance matrix

ΣZ = [ ΣX      ΣXY
       ΣXY^T   ΣY ]

X is a Gaussian random vector with mean µX and covariance matrix ΣX

SLIDE 92

Marginal distributions are Gaussian

[Figure: joint pdf fX,Y with its Gaussian marginals fX (x) and fY (y)]

SLIDE 93

• Discrete random variables
• Continuous random variables
• Joint distributions of discrete and continuous random variables

SLIDE 94

Discrete and continuous random variables

How do we model the relation between a continuous random variable C and a discrete random variable D?

Conditional cdf and pdf of C given D:

FC|D (c|d) := P (C ≤ c|D = d)
fC|D (c|d) := dFC|D (c|d) / dc

By the Law of Total Probability,

FC (c) = Σ_{d ∈ RD} pD (d) FC|D (c|d)
fC (c) = Σ_{d ∈ RD} pD (d) fC|D (c|d)

SLIDE 95

Mixture models

Data are drawn from a continuous distribution whose parameters are chosen from a discrete set.

Important example: Gaussian mixture models

SLIDE 96

Grizzlies in Yellowstone

Model for the weight of grizzly bears in Yellowstone:
Males: Gaussian with µ := 240 kg and σ := 40 kg
Females: Gaussian with µ := 140 kg and σ := 20 kg
There are about the same number of females and males

SLIDE 97–100

Grizzlies in Yellowstone

The distribution of the weight of all bears W can be modeled as a Gaussian mixture with two random variables: S (sex) and W (weight)

fW (w) = Σ_{s=0}^{1} pS (s) fW |S (w|s)
= (1 / (2 √(2π))) ( e^(−(w−240)²/3200) / 40 + e^(−(w−140)²/800) / 20 )
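A sketch checking that the mixture above is a valid pdf (it integrates to 1), using the two component densities with equal weights:

```python
import numpy as np

def f_w(w):
    """Gaussian mixture pdf for grizzly weight: equal-weight mix of
    N(240, 40^2) (males) and N(140, 20^2) (females)."""
    male = np.exp(-(w - 240) ** 2 / 3200) / (40 * np.sqrt(2 * np.pi))
    female = np.exp(-(w - 140) ** 2 / 800) / (20 * np.sqrt(2 * np.pi))
    return 0.5 * male + 0.5 * female

# Riemann-sum check that the mixture integrates to 1
w = np.linspace(0, 500, 200001)
dw = w[1] - w[0]
assert abs(f_w(w).sum() * dw - 1) < 1e-4
```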

SLIDE 101

Grizzlies in Yellowstone

[Figure: component pdfs fW |S (·|0) and fW |S (·|1) and the mixture pdf fW (·)]

SLIDE 102–104

Continuous and discrete random variables

Conditional pmf of D given C? The event {C = c} has zero probability! Define

pD|C (d|c) := lim_{∆→0} P (D = d, c ≤ C ≤ c + ∆) / P (c ≤ C ≤ c + ∆)

By the Law of Total Probability and a limit argument,

pD (d) = ∫_{c=−∞}^{∞} fC (c) pD|C (d|c) dc

SLIDE 105–106

Bayesian coin flip

Bayesian methods often endow parameters of discrete distributions with a continuous marginal distribution

◮ You suspect a coin is biased
◮ You are uncertain about the bias, so you model it as a random variable B with pdf fB (b) = 2b for b ∈ [0, 1]
◮ What is the probability of heads?

SLIDE 107

Bayesian coin flip

[Figure: plot of the prior pdf fB (b) = 2b on [0, 1]]

SLIDE 108–111

Bayesian coin flip

pX (1) = ∫_{b=−∞}^{∞} fB (b) pX|B (1|b) db = ∫_{b=0}^{1} 2b² db = 2/3
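A midpoint-rule sketch of the same integral (the grid size is an arbitrary choice): the integrand is the prior fB (b) = 2b times the likelihood of heads b:

```python
# Numerically integrate 2b * b over [0, 1]; the exact answer is 2/3
n = 200000
p_heads = sum(2 * ((k + 0.5) / n) ** 2 for k in range(n)) / n
print(p_heads)  # ≈ 0.6667
```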

SLIDE 112

Chain rule for continuous and discrete random variables

pD (d) fC|D (c|d) = lim_{∆→0} P (D = d) P (c ≤ C ≤ c + ∆|D = d) / ∆
= lim_{∆→0} P (D = d, c ≤ C ≤ c + ∆) / ∆
= lim_{∆→0} [P (c ≤ C ≤ c + ∆) / ∆] · [P (D = d, c ≤ C ≤ c + ∆) / P (c ≤ C ≤ c + ∆)]
= fC (c) pD|C (d|c)

SLIDE 113–116

Grizzlies in Yellowstone

You spot a grizzly that is about 180 kg. What is the probability that it is male?

pS|W (0|180) = pS (0) fW |S (180|0) / fW (180)
= (1/40) exp(−60²/3200) / [ (1/40) exp(−60²/3200) + (1/20) exp(−40²/800) ]
≈ 0.545
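A sketch of the computation; the 1/2 prior and the 1/√(2π) factors cancel between numerator and denominator, leaving only the exponentials and the standard deviations:

```python
import math

# Likelihoods of a 180 kg bear under each component, up to a shared constant
male = math.exp(-60**2 / 3200) / 40    # N(240, 40^2): (180 - 240)^2 = 60^2
female = math.exp(-40**2 / 800) / 20   # N(140, 20^2): (180 - 140)^2 = 40^2

posterior_male = male / (male + female)
print(round(posterior_male, 3))  # 0.545
```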

SLIDE 117–120

Bayesian coin flip

The coin flip is tails. What is the distribution of the bias now?

fB|X (b|0) = fB (b) pX|B (0|b) / pX (0) = 2b (1 − b) / (1/3) = 6b (1 − b)
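A numerical sketch confirming the evidence pX (0) = 1/3 and that the posterior 6b (1 − b) integrates to 1 (grid size is an arbitrary choice):

```python
# Midpoint-rule integration over [0, 1]:
# prior f_B(b) = 2b, likelihood of tails p_{X|B}(0|b) = 1 - b
n = 100000
grid = [(k + 0.5) / n for k in range(n)]

evidence = sum(2 * b * (1 - b) for b in grid) / n        # ∫ 2b(1-b) db = 1/3
posterior_mass = sum(6 * b * (1 - b) for b in grid) / n  # posterior integrates to 1

assert abs(evidence - 1 / 3) < 1e-9
assert abs(posterior_mass - 1) < 1e-9
```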

SLIDE 121

Bayesian coin flip

[Figure: prior fB (·) and posterior fB|X (·|0)]