SLIDE 1

Gov 2000: 3. Multiple Random Variables

Matthew Blackwell

Fall 2016

SLIDE 2
  • 1. Distributions of Multiple Random Variables
  • 2. Properties of Joint Distributions
  • 3. Conditional Distributions
  • 4. Wrap-up

SLIDE 3

Where are we? Where are we going?

  • Distributions of one variable: how to describe and summarize uncertainty about one variable.
  • Today: distributions of multiple variables to describe relationships between variables.
  • Later: use data to learn about probability distributions.

SLIDE 4

Why multiple random variables?

[Scatterplot: Log Settler Mortality (x-axis) vs. Log GDP/pop growth (y-axis)]

  • 1. How do we summarize the relationship between two variables, X and Y?
  • 2. What if we have many observations of the same variable, X_1, X_2, …, X_n?

SLIDE 5

1/ Distributions of Multiple Random Variables

SLIDE 6

Joint distributions

[Surface plots: several example joint densities over the (x, y) plane]

  • The joint distribution of two r.v.s, X and Y, describes which pairs of observations, (x, y), are more likely than others.
  ▶ Settler mortality (X) and GDP per capita (Y) for the same country.
  • The shape of the joint distribution now includes the relationship between X and Y.

SLIDE 7

Discrete r.v.s

Joint probability mass function

The joint p.m.f. of a pair of discrete r.v.s, (X, Y), describes the probability of any pair of values:

f_{X,Y}(x, y) = ℙ(X = x, Y = y)

  • Properties of a joint p.m.f.:
  ▶ f_{X,Y}(x, y) ≥ 0 (probabilities can't be negative)
  ▶ ∑_x ∑_y f_{X,Y}(x, y) = 1 (something must happen)
  ▶ ∑_x is shorthand for the sum over all possible values of X

SLIDE 8

Example: Gay marriage and gender

                    Favor Gay Marriage   Oppose Gay Marriage
                          Y = 1                Y = 0
  Female (X = 1)           0.30                 0.21
  Male   (X = 0)           0.22                 0.27

  • The joint p.m.f. can be summarized in a cross-tab:
  ▶ Each cell is the probability of that combination, f_{X,Y}(x, y).
  • Probability that we randomly select a woman who favors gay marriage? f_{X,Y}(1, 1) = ℙ(X = 1, Y = 1) = 0.3 (see the sketch below).

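As a concrete illustration (not from the original slides), one way to encode this cross-tab is as a small numpy array; the row/column layout and index convention here are our own choices:

```python
# A minimal sketch: the cross-tab as a joint p.m.f.
import numpy as np

# Rows: X (gender), columns: Y (support for gay marriage), so
# f[x, y] = P(X = x, Y = y) and f[1, 1] is (female, favor).
f = np.array([[0.27, 0.22],   # X = 0 (male):   Y = 0, Y = 1
              [0.21, 0.30]])  # X = 1 (female): Y = 0, Y = 1

assert np.isclose(f.sum(), 1.0)  # a joint p.m.f. sums to 1
print(f[1, 1])                   # P(X = 1, Y = 1) = 0.3
```
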
SLIDE 9

Marginal distributions

  • Often we need to figure out the distribution of just one of the r.v.s.
  ▶ Called the marginal distribution in this context.
  • Computing marginals from the joint p.m.f.:

  f_Y(y) = ℙ(Y = y) = ∑_x f_{X,Y}(x, y)

  • Intuition: sum the probability that Y = y over all possible values of x.
  ▶ Works because these are mutually exclusive events that partition the space of X.

SLIDE 10

Example: marginals for gay marriage

                    Favor Gay Marriage   Oppose Gay Marriage
                          Y = 1                Y = 0          Marginal
  Female (X = 1)           0.30                 0.21            0.51
  Male   (X = 0)           0.22                 0.27            0.49
  Marginal                 0.52                 0.48

  • What is f_Y(1) = ℙ(Y = 1)?
  ▶ The probability that a man favors gay marriage plus the probability that a woman favors gay marriage:

  f_Y(1) = f_{X,Y}(1, 1) + f_{X,Y}(0, 1) = 0.3 + 0.22 = 0.52

  • Works for all marginals (see the sketch below).

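A minimal sketch of the same marginalization in numpy, assuming the array layout from the earlier snippet:

```python
# Recover the marginals by summing over the other variable.
import numpy as np

f = np.array([[0.27, 0.22],   # X = 0 (male)
              [0.21, 0.30]])  # X = 1 (female)

f_X = f.sum(axis=1)  # sum over y: marginal of X -> [0.49, 0.51]
f_Y = f.sum(axis=0)  # sum over x: marginal of Y -> [0.48, 0.52]
print(f_Y[1])        # P(Y = 1) = 0.22 + 0.30 = 0.52
```
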
SLIDE 11

Continuous r.v.s

π‘Œ 𝑍 𝐡

  • We will focus on getting the probability of being in some

subset of the 2-dimensional plane.

11 / 57

slide-12
SLIDE 12

Continuous joint p.d.f.

Continuous joint distribution

Two continuous r.v.s X and Y have a continuous joint distribution if there is a nonnegative function f_{X,Y}(x, y) such that for any subset B of the xy-plane,

ℙ((X, Y) ∈ B) = ∬_{(x,y)∈B} f_{X,Y}(x, y) dx dy.

  • f_{X,Y}(x, y) is the joint probability density function.
  • {(x, y) : f_{X,Y}(x, y) > 0} is called the support of the distribution.
  • A joint p.d.f. must meet the following conditions:
  • 1. f_{X,Y}(x, y) ≥ 0 for all values of (x, y) (nonnegative)
  • 2. ∫_{-∞}^{∞} ∫_{-∞}^{∞} f_{X,Y}(x, y) dx dy = 1 (probabilities "sum" to 1)
  • ℙ(X = x, Y = y) = 0, for similar reasons as with single r.v.s.

SLIDE 13

Joint densities are 3D

[3D surface plot of a joint density]

  • The X and Y axes are on the "floor"; height is the value of f_{X,Y}(x, y).
  • Remember: f_{X,Y}(x, y) ≠ ℙ(X = x, Y = y).

SLIDE 14

Probability = volume

[3D surface plot with the volume above a region B highlighted]

  • ℙ((X, Y) ∈ B) = ∬_{(x,y)∈B} f_{X,Y}(x, y) dx dy
  • Probability = volume above a specific region.

SLIDE 15

Working with joint p.d.f.s

  • Suppose we have the following form of a joint p.d.f.:

  f_{X,Y}(x, y) = c(x + y) for 0 < x < 2 and 0 < y < 2, and 0 otherwise

  • What does c have to be for this to be a valid p.d.f.?

  1 = ∫_{-∞}^{∞} ∫_{-∞}^{∞} f_{X,Y}(x, y) dx dy
    = ∫_0^2 ∫_0^2 c(x + y) dx dy
    = c ∫_0^2 [x²/2 + xy]_{x=0}^{x=2} dy
    = c ∫_0^2 (2 + 2y) dy
    = [2cy + cy²]_0^2 = 8c

  • Thus, to be a valid p.d.f., we need c = 1/8 (checked in the sketch below).

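A quick symbolic check of this derivation, assuming sympy is available (any computer algebra system would do the same):

```python
# Verify the normalizing constant by mirroring the double integral.
import sympy as sp

x, y, c = sp.symbols("x y c", positive=True)
total = sp.integrate(c * (x + y), (x, 0, 2), (y, 0, 2))
print(total)                          # 8*c
print(sp.solve(sp.Eq(total, 1), c))   # [1/8]
```
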
SLIDE 16

Example continuous distribution

[Surface plot of f_{X,Y}(x, y) = (x + y)/8 over 0 < x < 2, 0 < y < 2]

  f_{X,Y}(x, y) = (x + y)/8 for 0 < x < 2 and 0 < y < 2, and 0 otherwise

SLIDE 17

Continuous marginal distributions

  • We can recover the marginal p.d.f. of one of the variables by integrating over the distribution of the other variable:

  f_Y(y) = ∫_{-∞}^{∞} f_{X,Y}(x, y) dx

  • Works for either variable:

  f_X(x) = ∫_{-∞}^{∞} f_{X,Y}(x, y) dy

SLIDE 18

Visualizing continuous marginals

[3D plot: the joint density flattened onto a single axis]

  • The marginal integrates (sums, basically) over the other r.v.:

  f_Y(y) = ∫_{-∞}^{∞} f_{X,Y}(x, y) dx

  • Pile up/flatten all of the joint density onto a single dimension.

SLIDE 19

Deriving continuous marginals

π‘”π‘Œ,𝑍(𝑦, 𝑧) = ⎧ { ⎨ { ⎩ (𝑦 + 𝑧)/8 for 0 < 𝑦 < 2 and 0 < 𝑧 < 2

  • therwise
  • Let’s calculate the marginals for this p.d.f.:

π‘”π‘Œ(𝑦) = ∫

2

1 8(𝑦 + 𝑧)𝑒𝑧 = (𝑦𝑧 8 + 𝑧2 16)∣

𝑧=2 𝑧=0

= 𝑦 4 + 1 4 = 𝑦 + 1 4

  • By symmetry we have the same for 𝑧:

𝑔𝑍(𝑧) = (𝑧 + 1)/4

19 / 57

slide-20
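The same marginal, checked symbolically (again assuming sympy):

```python
# Derive the marginal p.d.f. of the running example.
import sympy as sp

x, y = sp.symbols("x y", positive=True)
f_xy = (x + y) / 8                   # joint p.d.f. on 0 < x, y < 2
f_x = sp.integrate(f_xy, (y, 0, 2))  # integrate out y
print(sp.factor(f_x))                # (x + 1)/4
print(sp.integrate(f_x, (x, 0, 2)))  # 1, so a valid marginal p.d.f.
```
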
SLIDE 20

Joint c.d.f.s

Joint cumulative distribution function

For two r.v.s X and Y, the joint cumulative distribution function, or joint c.d.f., F_{X,Y}(x, y) is a function such that for finite values x and y,

F_{X,Y}(x, y) = ℙ(X ≤ x, Y ≤ y).

  • Deriving the p.d.f. from the c.d.f.: f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y) / (∂x ∂y)
  • Deriving the c.d.f. from the p.d.f.: F_{X,Y}(x, y) = ∫_{-∞}^{y} ∫_{-∞}^{x} f_{X,Y}(s, t) ds dt

SLIDE 21

2/ Properties of Joint Distributions

SLIDE 22

Properties of joint distributions

  • Single r.v.: we summarized f_X(x) with 𝔼[X] and 𝕍[X].
  • With 2 r.v.s, we can additionally measure how strong the dependence is between the variables.
  • First: expectations over joint distributions, and independence.

SLIDE 23

Expectations over multiple r.v.s

  • 2-d LOTUS: take expectations over the joint distribution.
  • With discrete X and Y:

  𝔼[g(X, Y)] = ∑_x ∑_y g(x, y) f_{X,Y}(x, y)

  • With continuous X and Y:

  𝔼[g(X, Y)] = ∫_x ∫_y g(x, y) f_{X,Y}(x, y) dx dy

  • Marginal expectations:

  𝔼[Y] = ∑_x ∑_y y f_{X,Y}(x, y)

  • Example: expectation of the product:

  𝔼[XY] = ∑_x ∑_y xy f_{X,Y}(x, y)

SLIDE 24

Marginal expectations from joint

π‘”π‘Œ,𝑍(𝑦, 𝑧) = ⎧ { ⎨ { ⎩ (𝑦 + 𝑧)/8 for 0 < 𝑦 < 2 and 0 < 𝑧 < 2

  • therwise
  • Marginal expectation of 𝑍:

𝔽[𝑍] = ∫

2 0 ∫ 2 0 𝑧1

8(𝑦 + 𝑧)𝑒𝑦𝑒𝑧 = ∫

2 0 𝑧 ∫ 2

1 8(𝑦 + 𝑧)𝑒𝑦𝑒𝑧 = ∫

2 0 𝑧1

4(𝑧 + 1)𝑒𝑧 = ( 𝑧3 12 + 𝑧2 8 )∣

2

= 2 3 + 1 2 = 7 6

  • By symmetry, 𝔽[π‘Œ] = 𝔽[𝑍] = 7/6

24 / 57

slide-25
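A sketch verifying 𝔼[Y] = 7/6 with sympy:

```python
# Integrate y against the joint p.d.f. of the running example.
import sympy as sp

x, y = sp.symbols("x y", positive=True)
f_xy = (x + y) / 8
E_y = sp.integrate(y * f_xy, (x, 0, 2), (y, 0, 2))
print(E_y)  # 7/6
```
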
SLIDE 25

Independence

Independence

Two r.v.s X and Y are independent (which we write X ⟂⟂ Y) if for all sets A and B:

ℙ(X ∈ A, Y ∈ B) = ℙ(X ∈ A) ℙ(Y ∈ B).

  • Knowing the value of X gives us no information about the value of Y.
  • If X and Y are independent, then:
  ▶ f_{X,Y}(x, y) = f_X(x) f_Y(y) (the joint is the product of the marginals)
  ▶ F_{X,Y}(x, y) = F_X(x) F_Y(y)
  ▶ g(X) ⟂⟂ h(Y) for any functions g() and h() (functions of independent r.v.s are independent)

SLIDE 26

Key properties of independent r.v.s

  • Theorem: If X and Y are independent r.v.s, then 𝔼[XY] = 𝔼[X]𝔼[Y].
  • Proof for discrete X and Y (a simulation sketch follows below):

  𝔼[XY] = ∑_x ∑_y xy f_{X,Y}(x, y)
        = ∑_x ∑_y xy f_X(x) f_Y(y)
        = (∑_x x f_X(x)) (∑_y y f_Y(y))
        = 𝔼[X]𝔼[Y]

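A Monte Carlo illustration of the product rule; the particular distributions (a normal and a uniform) are our own choice for the sketch:

```python
# When X and Y are drawn independently, E[XY] ~= E[X]E[Y].
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1.0, 2.0, size=1_000_000)   # X ~ N(1, 4), so E[X] = 1
y = rng.uniform(0.0, 3.0, size=1_000_000)  # Y ~ Unif(0, 3), E[Y] = 1.5

print((x * y).mean())       # ~= 1.5
print(x.mean() * y.mean())  # ~= 1.5 as well
```
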
SLIDE 27

Why independence?

  • Independence assumptions are everywhere in theoretical and applied statistics.
  ▶ Each response in a poll is considered independent of all other responses.
  ▶ In a randomized control trial, treatment assignment is independent of background characteristics.
  • Lack of independence is a blessing or a curse:
  ▶ Two variables not independent ⇝ potentially interesting relationship.
  ▶ In observational studies, treatment assignment is usually not independent of background characteristics.

SLIDE 28

Covariance

  • If two variables are not independent, how do we measure the strength of their dependence?
  ▶ Covariance
  ▶ Correlation
  • Covariance: how do two r.v.s vary together?
  ▶ How often do high values of X occur with high values of Y?

SLIDE 29

Defining covariance

  • If two variables are not independent, how do we measure the strength of their dependence?

Covariance

The covariance between two r.v.s, X and Y, is defined as:

Cov[X, Y] = 𝔼[(X − 𝔼[X])(Y − 𝔼[Y])]

  • How often do high values of X occur with high values of Y?
  • Properties of covariances:
  ▶ Cov[X, Y] = 𝔼[XY] − 𝔼[X]𝔼[Y]
  ▶ If X ⟂⟂ Y, then Cov[X, Y] = 𝔼[XY] − 𝔼[X]𝔼[Y] = 𝔼[X]𝔼[Y] − 𝔼[X]𝔼[Y] = 0

SLIDE 30

Covariance intuition

[Scatterplot of (X, Y) draws with reference lines at 𝔼[X] and 𝔼[Y]]

SLIDE 31

Covariance intuition

[Scatterplot divided into four quadrants by 𝔼[X] and 𝔼[Y]]

  • Large values of X tend to occur with large values of Y:
  ▶ (X − 𝔼[X])(Y − 𝔼[Y]) = (pos. num.) × (pos. num.) = +
  • Small values of X tend to occur with small values of Y:
  ▶ (X − 𝔼[X])(Y − 𝔼[Y]) = (neg. num.) × (neg. num.) = +
  • If these dominate ⇝ positive covariance.

SLIDE 32

Covariance intuition

[Scatterplot divided into four quadrants by 𝔼[X] and 𝔼[Y]]

  • Large values of X tend to occur with small values of Y:
  ▶ (X − 𝔼[X])(Y − 𝔼[Y]) = (pos. num.) × (neg. num.) = −
  • Small values of X tend to occur with large values of Y:
  ▶ (X − 𝔼[X])(Y − 𝔼[Y]) = (neg. num.) × (pos. num.) = −
  • If these dominate ⇝ negative covariance.

SLIDE 33

Covariance from joint p.d.f.

  • Using our running example of f_{X,Y}(x, y) = (x + y)/8:
  • From earlier: 𝔼[X] = 𝔼[Y] = 7/6
  • Expectation of the product:

  𝔼[XY] = ∫_0^2 ∫_0^2 xy (1/8)(x + y) dx dy
        = ∫_0^2 ∫_0^2 (1/8)(x²y + xy²) dx dy
        = ∫_0^2 [x³y/24 + x²y²/16]_{x=0}^{x=2} dy
        = ∫_0^2 (y/3 + y²/4) dy
        = [y²/6 + y³/12]_0^2
        = 2/3 + 2/3 = 4/3

  • Covariance (checked in the sketch below):

  Cov[X, Y] = 𝔼[XY] − 𝔼[X]𝔼[Y] = 4/3 − (7/6)² = −1/36

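A symbolic check of both 𝔼[XY] and the covariance, assuming sympy:

```python
# Reproduce E[XY] = 4/3 and Cov[X, Y] = -1/36 for the running example.
import sympy as sp

x, y = sp.symbols("x y", positive=True)
f_xy = (x + y) / 8
E_x = sp.integrate(x * f_xy, (x, 0, 2), (y, 0, 2))       # 7/6
E_xy = sp.integrate(x * y * f_xy, (x, 0, 2), (y, 0, 2))  # 4/3
print(E_xy - E_x**2)  # -1/36, using E[Y] = E[X] by symmetry
```
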
SLIDE 34

Zero covariance doesn’t imply independence

  • We saw that X ⟂⟂ Y ⇝ Cov[X, Y] = 0.
  • Does Cov[X, Y] = 0 imply that X ⟂⟂ Y? No!
  • Counterexample: X ∈ {−1, 0, 1} with equal probability and Y = X² (verified below).
  • Covariance is a measure of linear dependence, so it can miss non-linear dependence.

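The counterexample is small enough to verify by direct enumeration; a sketch in plain Python:

```python
# X uniform on {-1, 0, 1} and Y = X^2: zero covariance, yet knowing X
# pins down Y completely, so they are clearly not independent.
xs = [-1, 0, 1]
p = 1 / 3

E_x = sum(p * x for x in xs)          # 0
E_y = sum(p * x**2 for x in xs)       # 2/3
E_xy = sum(p * x * x**2 for x in xs)  # E[X^3] = 0
print(E_xy - E_x * E_y)               # 0.0
```
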
SLIDE 35

Properties of variances and covariances

  • Properties of covariances:
  • 1. Cov[aX + b, cY + d] = ac Cov[X, Y]
  • 2. Cov[X, X] = 𝕍[X]
  • Properties of variances that we can state now that we know covariance:
  • 1. 𝕍[aX + bY + c] = a²𝕍[X] + b²𝕍[Y] + 2ab Cov[X, Y]
  • 2. If X and Y are independent, 𝕍[X + Y] = 𝕍[X] + 𝕍[Y].

SLIDE 36

Using properties of covariance

  • Rescale our running example: Z = 2X, W = 2Y.
  • What's the covariance of (Z, W)?
  ▶ Ugh, let's avoid more integrals.
  • Use the properties of covariances:

  Cov[Z, W] = Cov[2X, 2Y] = 2 × 2 × Cov[X, Y] = −1/9

SLIDE 37

Correlation

  • Covariance is not scale-free: Cov[2X, Y] = 2 Cov[X, Y]
  ▶ ⇝ hard to compare covariances across different r.v.s.
  ▶ Is a relationship stronger? Or is it just due to rescaling?
  • Correlation is a scale-free measure of linear dependence.

Correlation

The correlation between two r.v.s X and Y is defined as:

ρ = ρ(X, Y) = Cov[X, Y] / √(𝕍[X]𝕍[Y])

  • Covariance after dividing out the scales of the respective variables.
  • Correlation properties:
  ▶ −1 ≤ ρ ≤ 1
  ▶ If |ρ(X, Y)| = 1, then X and Y are perfectly correlated, with a deterministic linear relationship: Y = a + bX.

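As an illustration, the correlation in the running example works out to −1/11; note that the variance 𝕍[X] = 11/36 is computed on the spot here, not taken from the slides:

```python
# Correlation of the running example f(x, y) = (x + y)/8.
import sympy as sp

x, y = sp.symbols("x y", positive=True)
f_xy = (x + y) / 8
E_x = sp.integrate(x * f_xy, (x, 0, 2), (y, 0, 2))              # 7/6
V_x = sp.integrate(x**2 * f_xy, (x, 0, 2), (y, 0, 2)) - E_x**2  # 11/36
cov = sp.Rational(-1, 36)  # from the covariance slide
print(cov / V_x)           # -1/11, since V[X] = V[Y] by symmetry
```
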
SLIDE 38

3/ Conditional Distributions

SLIDE 39

Conditional distributions

  • Conditional distribution: the distribution of Y if we know that X = x.

Conditional probability mass function

The conditional probability mass function, or conditional p.m.f., of Y conditional on X is

f_{Y|X}(y|x) = ℙ(X = x, Y = y) / ℙ(X = x) = f_{X,Y}(x, y) / f_X(x)

  • Intuitive definition:

  f_{Y|X}(y|x) = (probability that X = x and Y = y) / (probability that X = x)

  • This is a valid univariate probability distribution!
  ▶ f_{Y|X}(y|x) ≥ 0 and ∑_y f_{Y|X}(y|x) = 1
  • If X ⟂⟂ Y, then f_{Y|X}(y|x) = f_Y(y) (the conditional is the marginal).

SLIDE 40

Example: conditionals for gay marriage

                    Favor Gay Marriage   Oppose Gay Marriage
                          Y = 1                Y = 0          Marginal
  Female (X = 1)           0.30                 0.21            0.51
  Male   (X = 0)           0.22                 0.27            0.49
  Marginal                 0.52                 0.48

  • Probability of favoring gay marriage conditional on being a man (see the sketch below)?

  f_{Y|X}(y = 1 | x = 0) = ℙ(X = 0, Y = 1) / ℙ(X = 0) = 0.22 / (0.22 + 0.27) ≈ 0.45

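A sketch of the same conditional p.m.f. in numpy, reusing the earlier array layout:

```python
# Conditional p.m.f. of Y given X = 0: divide the X = 0 row by P(X = 0).
import numpy as np

f = np.array([[0.27, 0.22],   # X = 0 (male)
              [0.21, 0.30]])  # X = 1 (female)

f_y_given_x0 = f[0] / f[0].sum()
print(f_y_given_x0)  # [0.551..., 0.449...]: P(Y = 1 | X = 0) ~= 0.45
```
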
SLIDE 41

Example: conditionals for gay marriage

[Bar plots: the conditional distribution of gay marriage support (Y) for men and for women]

  • Two values of X ⇝ two univariate conditional distributions of Y.

SLIDE 42

Continuous conditional distributions

Conditional probability density function

The conditional p.d.f. of a continuous random variable is

f_{Y|X}(y|x) = f_{X,Y}(x, y) / f_X(x),

assuming that f_X(x) > 0.

  • Implies:

  ℙ(a < Y < b | X = x) = ∫_a^b f_{Y|X}(y|x) dy.

  • Based on the definition of the conditional p.m.f./p.d.f., we have the following factorization:

  f_{X,Y}(x, y) = f_{Y|X}(y|x) f_X(x)

SLIDE 43

Conditional distributions as slices

  • 𝑔𝑍|π‘Œ(𝑧|𝑦0) is the conditional p.d.f. of 𝑍 when π‘Œ = 𝑦0
  • 𝑔𝑍|π‘Œ(𝑧|𝑦0) is proportional to joint p.d.f. along 𝑦0: π‘”π‘Œ,𝑍(𝑧, 𝑦0)
  • Normalize by dividing by π‘”π‘Œ(𝑦0) to ensure proper p.d.f.

43 / 57

slide-44
SLIDE 44

Continuous conditional example

  • Using our running example of f_{X,Y}(x, y) = (x + y)/8:
  • Earlier we calculated f_X(x) = (x + 1)/4.
  • Calculate the conditional:

  f_{Y|X}(y|x) = f_{X,Y}(x, y) / f_X(x) = [(x + y)/8] / [(x + 1)/4] = (x + y) / (2(x + 1))

  • Remember the limits: this holds for 0 < y < 2, and the conditional density is 0 otherwise.

SLIDE 45

Conditional Independence

Conditional independence

Two r.v.s X and Y are conditionally independent given Z (written X ⟂⟂ Y | Z) if

f_{X,Y|Z}(x, y|z) = f_{X|Z}(x|z) f_{Y|Z}(y|z).

  • X and Y are independent within levels of Z.
  • Massively important for regression and causal inference.
  • Example:
  ▶ X = swimming accidents, Y = number of ice cream cones sold.
  ▶ In general, dependent.
  ▶ Conditional on Z = temperature, independent.

SLIDE 46

Summarizing conditional distributions

[Plot: two conditional densities, f(y|0) and f(y|1)]

  • Conditional distributions are also univariate distributions, so we can summarize each with its mean and variance.
  • Gives us insight into a key question:
  ▶ How does the mean of Y change as we change X?

SLIDE 47

Defining conditional expectations

Conditional expectation

The conditional expectation of Y conditional on X = x is:

𝔼[Y|X = x] = ∑_y y f_{Y|X}(y|x)              (discrete Y)
𝔼[Y|X = x] = ∫_{-∞}^{∞} y f_{Y|X}(y|x) dy    (continuous Y)

  • Intuition: exactly the same definition of the expected value, with f_{Y|X}(y|x) in place of f_Y(y).
  • The expected value of the (univariate) conditional distribution.
  • This is a function of x!

SLIDE 48

Calculating conditional expectations

                    Favor Gay Marriage   Oppose Gay Marriage
                          Y = 1                Y = 0          Marginal
  Female (X = 1)           0.30                 0.21            0.51
  Male   (X = 0)           0.22                 0.27            0.49
  Marginal                 0.52                 0.48

  • What's the conditional expectation of support for gay marriage (Y) given that someone is a man (X = 0)?

  𝔼[Y|X = 0] = ∑_y y f_{Y|X}(y|x = 0)
             = 0 × f_{Y|X}(y = 0|x = 0) + 1 × f_{Y|X}(y = 1|x = 0)
             = 0.22 / (0.22 + 0.27) ≈ 0.45

SLIDE 49

Conditional expectations are random variables

  • For a particular x, 𝔼[Y|X = x] is a number.
  • But X takes on many possible values with uncertainty ⇝ 𝔼[Y|X] takes on many possible values with uncertainty.
  • ⇝ Conditional expectations are random variables!
  • Binary X:

  𝔼[Y|X] = { 𝔼[Y|X = 0]  with prob. ℙ(X = 0)
           { 𝔼[Y|X = 1]  with prob. ℙ(X = 1)

  • It has an expectation, 𝔼[𝔼[Y|X]], and a variance, 𝕍[𝔼[Y|X]].

SLIDE 50

Law of iterated expectations

  • Average/mean of the conditional expectations: 𝔼[𝔼[Y|X]].
  ▶ Can we connect this to the marginal (overall) expectation?
  • Theorem (The Law of Iterated Expectations): If the expectations exist, then for discrete X,

  𝔼[Y] = 𝔼[𝔼[Y|X]] = ∑_x 𝔼[Y|X = x] f_X(x)

SLIDE 51

Example: law of iterated expectations

                    Favor Gay Marriage   Oppose Gay Marriage
                          Y = 1                Y = 0          Marginal
  Female (X = 1)           0.30                 0.21            0.51
  Male   (X = 0)           0.22                 0.27            0.49
  Marginal                 0.52                 0.48            1

  • 𝔼[Y|X = 1] ≈ 0.59 and 𝔼[Y|X = 0] ≈ 0.45.
  • f_X(1) = 0.51 (female) and f_X(0) = 0.49 (male).
  • Plug into the iterated expectations (see the sketch below):

  𝔼[𝔼[Y|X]] = 𝔼[Y|X = 0] f_X(0) + 𝔼[Y|X = 1] f_X(1)
            ≈ 0.45 × 0.49 + 0.59 × 0.51 = 0.52 = 𝔼[Y]

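The same bookkeeping in numpy, assuming the earlier array layout; in exact arithmetic the weighted sum collapses to 0.22 + 0.30 = 0.52:

```python
# Law of iterated expectations on the cross-tab.
import numpy as np

f = np.array([[0.27, 0.22],   # X = 0 (male)
              [0.21, 0.30]])  # X = 1 (female)

f_X = f.sum(axis=1)          # [0.49, 0.51]
E_y_given_x = f[:, 1] / f_X  # E[Y|X = x] for x = 0, 1 (Y is binary)
print(E_y_given_x)           # [~0.449, ~0.588]
print(E_y_given_x @ f_X)     # 0.52 = E[Y], the marginal mean
```
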
SLIDE 52

Properties of conditional expectations

  • 1. 𝔼[g(X)|X] = g(X) for any function g(X).
  ▶ Example: 𝔼[X²|X] = X² (if we know X, then we also know X²).
  • 2. If X and Y are independent r.v.s, then 𝔼[Y|X = x] = 𝔼[Y].
  • 3. If X ⟂⟂ Y | Z, then 𝔼[Y|X = x, Z = z] = 𝔼[Y|Z = z].

SLIDE 53

Conditional Variance

Conditional variance

The conditional variance of Y given X = x is defined as:

𝕍[Y|X = x] = 𝔼[(Y − 𝔼[Y|X = x])² | X = x]

  • The conditional variance describes the spread of the conditional distribution around the conditional expectation.
  • It is the usual variance formula applied to the conditional distribution.
  • Using LOTUS:
  ▶ Discrete Y: 𝕍[Y|X = x] = ∑_y (y − 𝔼[Y|X = x])² f_{Y|X}(y|x)
  ▶ Continuous Y: 𝕍[Y|X = x] = ∫_y (y − 𝔼[Y|X = x])² f_{Y|X}(y|x) dy

SLIDE 54

Conditional variance is a random variable

  • Again, π•Ž[𝑍|π‘Œ] is a random variable and a function of π‘Œ, just

like 𝔽[𝑍|π‘Œ]. With a binary π‘Œ: π•Ž[𝑍|π‘Œ] = ⎧ { ⎨ { ⎩ π•Ž[𝑍|π‘Œ = 0] with prob. β„™(π‘Œ = 0) π•Ž[𝑍|π‘Œ = 1] with prob. β„™(π‘Œ = 1)

54 / 57

slide-55
SLIDE 55

Law of total variance

  • We can also relate the marginal variance to the conditional variance and the conditional expectation.
  • Theorem (Law of Total Variance / EVE's law):

  𝕍[Y] = 𝔼[𝕍[Y|X]] + 𝕍[𝔼[Y|X]]

  • The total variance can be decomposed into (numeric sketch below):
  • 1. the average of the within-group variances (𝔼[𝕍[Y|X]]), and
  • 2. how much the average varies between groups (𝕍[𝔼[Y|X]]).

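A numeric sketch of EVE's law on the cross-tab; the Bernoulli-variance shortcut p(1 − p) is our own step, valid here because Y is binary:

```python
# Law of total variance: within + between equals the marginal variance.
import numpy as np

f = np.array([[0.27, 0.22],   # X = 0 (male)
              [0.21, 0.30]])  # X = 1 (female)

f_X = f.sum(axis=1)
p = f[:, 1] / f_X                     # E[Y|X = x], a Bernoulli mean
within = (p * (1 - p)) @ f_X          # E[V[Y|X]]
between = ((p - p @ f_X) ** 2) @ f_X  # V[E[Y|X]]
p_y = f[:, 1].sum()                   # P(Y = 1) = 0.52
print(within + between, p_y * (1 - p_y))  # both 0.2496
```
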
SLIDE 56

4/ Wrap-up

SLIDE 57

Review

  • Multiple r.v.s require joint p.m.f.s and joint p.d.f.s.
  • Multiple r.v.s can have distributions that exhibit dependence, as measured by covariance and correlation.
  • The conditional expectation of one variable given the other is an important quantity that we'll see over and over again.