Gov 2000: 3. Multiple Random Variables
Matthew Blackwell
Fall 2016
1. Distributions of Multiple Random Variables
2. Properties of Joint Distributions
3. Conditional Distributions
4. Wrap-up

Where are we? Where are we going?
▶ So far: distributions that describe our uncertainty about one variable.
▶ Now: joint distributions that describe the relationships between variables.
[Figure: scatterplot of Log Settler Mortality against Log GDP/pop growth]
▶ What is the relationship between X and Y?
▶ What about between X_1, X_2, …, X_n?
[Figure: scatterplots of (x, y) pairs drawn from a joint distribution]
▶ With a joint distribution, some pairs of observations, (x, y), are more likely than others.
▶ Example: settler mortality (X) and GDP per capita (Y) for the same country.
▶ The joint distribution describes the relationship between X and Y.
Joint probability mass function

The joint p.m.f. of a pair of discrete r.v.s, (X, Y), describes the probability of any pair of values:

f_{X,Y}(x, y) = P(X = x, Y = y)

▶ f_{X,Y}(x, y) ≥ 0 (probabilities can't be negative)
▶ ∑_x ∑_y f_{X,Y}(x, y) = 1 (something must happen)
▶ ∑_x is shorthand for the sum over all possible values of x
                 Favor Gay Marriage   Oppose Gay Marriage
                      (Y = 1)               (Y = 0)
Female (X = 1)          0.30                  0.21
Male   (X = 0)          0.22                  0.27
▶ Each cell is the probability of that combination, f_{X,Y}(x, y).
▶ What is the probability of being a woman who favors gay marriage? f_{X,Y}(1, 1) = P(X = 1, Y = 1) = 0.3
▶ How do we get the distribution of just one of the two r.v.s?
▶ Called the marginal distribution in this context.
▶ Sum the joint p.m.f. over all values of the other variable:

f_Y(y) = P(Y = y) = ∑_x f_{X,Y}(x, y)

▶ Here ∑_x is the sum over all possible values of x.
▶ Works because these are mutually exclusive events that partition the space of X.
                 Favor Gay Marriage   Oppose Gay Marriage   Marginal
                      (Y = 1)               (Y = 0)
Female (X = 1)          0.30                  0.21              0.51
Male   (X = 0)          0.22                  0.27              0.49
Marginal                0.52                  0.48
▶ The marginal probability of favoring gay marriage is the probability that a man favors gay marriage plus the probability that a woman favors gay marriage:

f_Y(1) = f_{X,Y}(1, 1) + f_{X,Y}(0, 1) = 0.3 + 0.22 = 0.52
[Figure: a region B in the plane of possible (X, Y) values]

▶ For continuous r.v.s, we consider the probability that the pair (X, Y) falls into a subset B of the 2-dimensional plane.
Continuous joint distribution

Two continuous r.v.s X and Y have a continuous joint distribution if there is a nonnegative function f_{X,Y}(x, y) such that for any subset B of the xy-plane,

P((X, Y) ∈ B) = ∬_{(x,y) ∈ B} f_{X,Y}(x, y) dx dy.

▶ ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy = 1 (probabilities "sum" to 1)
[Figure: surface plot of a joint density f_{X,Y}(x, y) over the (x, y) plane]
[Figure: P((X, Y) ∈ B) is the volume under the density surface over B: ∬_{(x,y) ∈ B} f_{X,Y}(x, y) dx dy]
▶ Example: let f_{X,Y}(x, y) = c(x + y) for 0 < x < 2 and 0 < y < 2, and 0 otherwise. What value of c makes this a valid joint p.d.f.?

1 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy
  = ∫_0^2 ∫_0^2 c(x + y) dx dy
  = c ∫_0^2 (x²/2 + xy) |_{x=0}^{x=2} dy
  = c ∫_0^2 (2 + 2y) dy
  = c (2y + y²) |_0^2 = 8c

▶ So c = 1/8.
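The normalizing constant can be sanity-checked numerically. A sketch using a midpoint Riemann sum (the grid size is an arbitrary choice of mine):

```python
# Check that c = 1/8 makes f(x, y) = c(x + y) integrate to 1
# over the square (0, 2) x (0, 2), using a midpoint Riemann sum.
n = 400                 # grid points per axis (arbitrary)
h = 2.0 / n             # cell width
total = 0.0
for i in range(n):
    for j in range(n):
        x = (i + 0.5) * h          # midpoint of cell i
        y = (j + 0.5) * h          # midpoint of cell j
        total += (x + y) / 8.0 * h * h
```

The midpoint rule is exact for linear integrands, so `total` comes out to 1 up to floating-point error.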
[Figure: surface plot of the joint density]

f_{X,Y}(x, y) = (x + y)/8 for 0 < x < 2 and 0 < y < 2, and 0 otherwise
▶ Marginal densities: recover the density of one variable by integrating over the distribution of the other variable:

f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy
[Figure: slice of the joint density surface; integrating over x gives f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx]
f_{X,Y}(x, y) = (x + y)/8 for 0 < x < 2 and 0 < y < 2, and 0 otherwise

f_X(x) = ∫_0^2 (1/8)(x + y) dy = (xy/8 + y²/16) |_{y=0}^{y=2} = x/4 + 1/4 = (x + 1)/4

▶ By the same argument, f_Y(y) = (y + 1)/4
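The marginal-density integral can likewise be checked numerically; f_X(1) should equal (1 + 1)/4 = 0.5 (midpoint-rule sketch, grid size arbitrary):

```python
# Approximate f_X(x) = integral over y of (x + y)/8, evaluated at x = 1.
n = 1000            # grid points (arbitrary)
h = 2.0 / n         # cell width
x = 1.0
fx = sum((x + (j + 0.5) * h) / 8.0 * h for j in range(n))
```

The result matches the closed form (x + 1)/4 = 0.5 up to floating-point error.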
Joint cumulative distribution function

For two r.v.s X and Y, the joint cumulative distribution function, or joint c.d.f., F_{X,Y}(x, y) is the function such that for finite values x and y,

F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y).

▶ For continuous r.v.s: F_{X,Y}(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f_{X,Y}(s, t) ds dt
▶ With joint distributions in hand, we can ask how strong the dependence is between the variables.
▶ Expectation of a function of two r.v.s:

Discrete:   E[g(X, Y)] = ∑_x ∑_y g(x, y) f_{X,Y}(x, y)

Continuous: E[g(X, Y)] = ∫_x ∫_y g(x, y) f_{X,Y}(x, y) dx dy

▶ Examples:

E[Y] = ∑_x ∑_y y f_{X,Y}(x, y)

E[XY] = ∑_x ∑_y xy f_{X,Y}(x, y)
f_{X,Y}(x, y) = (x + y)/8 for 0 < x < 2 and 0 < y < 2, and 0 otherwise

E[Y] = ∫_0^2 ∫_0^2 y (1/8)(x + y) dx dy
     = ∫_0^2 y ∫_0^2 (1/8)(x + y) dx dy
     = ∫_0^2 y (1/4)(y + 1) dy
     = (y³/12 + y²/8) |_0^2
     = 2/3 + 1/2 = 7/6
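A numeric check of E[Y] = 7/6 by midpoint Riemann sum (grid size arbitrary):

```python
# Approximate E[Y] = double integral of y * (x + y)/8 over (0, 2) x (0, 2).
n = 400             # grid points per axis (arbitrary)
h = 2.0 / n         # cell width
ey = 0.0
for i in range(n):
    for j in range(n):
        x = (i + 0.5) * h
        y = (j + 0.5) * h
        ey += y * (x + y) / 8.0 * h * h
```

`ey` lands on 7/6 ≈ 1.1667 up to the discretization error of the grid.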
Independence

Two r.v.s X and Y are independent (which we write X ⊥⊥ Y) if for all sets B and C:

P(X ∈ B, Y ∈ C) = P(X ∈ B) P(Y ∈ C).

▶ Knowing the value of X gives us no information about the value of Y.
▶ f_{X,Y}(x, y) = f_X(x) f_Y(y) (joint is the product of marginals)
▶ F_{X,Y}(x, y) = F_X(x) F_Y(y)
▶ h(X) ⊥⊥ g(Y) for any functions h() and g() (functions of independent r.v.s are independent)
▶ If X ⊥⊥ Y, then E[XY] = E[X] E[Y]:

E[XY] = ∑_x ∑_y xy f_{X,Y}(x, y)
      = ∑_x ∑_y xy f_X(x) f_Y(y)
      = (∑_x x f_X(x)) (∑_y y f_Y(y))
      = E[X] E[Y]
▶ Independence assumptions are everywhere in applied statistics.
▶ Each response in a poll is considered independent of all other responses.
▶ In a randomized control trial, treatment assignment is independent of background characteristics.
▶ Two variables not independent ⇝ potentially interesting relationship.
▶ In observational studies, treatment assignment is usually not independent of background characteristics.
▶ If two r.v.s are not independent, how do we measure the strength of their dependence?
▶ Covariance
▶ Correlation
▶ How often do high values of X occur with high values of Y?
Covariance

The covariance between two r.v.s X and Y is defined as:

Cov[X, Y] = E[(X − E[X])(Y − E[Y])]

▶ Alternative formula: Cov[X, Y] = E[XY] − E[X] E[Y]
▶ If X ⊥⊥ Y, then Cov[X, Y] = E[XY] − E[X] E[Y] = E[X] E[Y] − E[X] E[Y] = 0
[Figure: scatterplot of (x, y) draws with a vertical line at E[X] and a horizontal line at E[Y]]
[Figure: the same scatterplot, divided into four quadrants by E[X] and E[Y]]

▶ When X > E[X] and Y > E[Y]: (X − E[X])(Y − E[Y]) = (pos. num.) × (pos. num.) = +
▶ When X < E[X] and Y < E[Y]: (X − E[X])(Y − E[Y]) = (neg. num.) × (neg. num.) = +
[Figure: the same scatterplot, highlighting the off-diagonal quadrants]

▶ When X > E[X] and Y < E[Y]: (X − E[X])(Y − E[Y]) = (pos. num.) × (neg. num.) = −
▶ When X < E[X] and Y > E[Y]: (X − E[X])(Y − E[Y]) = (neg. num.) × (pos. num.) = −
E[XY] = ∫_0^2 ∫_0^2 xy (1/8)(x + y) dx dy
      = ∫_0^2 ∫_0^2 (1/8)(x²y + xy²) dx dy
      = ∫_0^2 (x³y/24 + x²y²/16) |_{x=0}^{x=2} dy
      = ∫_0^2 (y/3 + y²/4) dy
      = (y²/6 + y³/12) |_0^2
      = 2/3 + 2/3 = 4/3

Cov[X, Y] = E[XY] − E[X] E[Y] = 4/3 − (7/6)² = −1/36
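The covariance of the running example can be checked the same way, approximating E[X], E[Y], and E[XY] on a grid (a sketch; grid size arbitrary):

```python
# Approximate Cov[X, Y] = E[XY] - E[X]E[Y] for f(x, y) = (x + y)/8.
n = 400
h = 2.0 / n
ex = ey = exy = 0.0
for i in range(n):
    for j in range(n):
        x = (i + 0.5) * h
        y = (j + 0.5) * h
        w = (x + y) / 8.0 * h * h   # probability mass of the cell
        ex += x * w
        ey += y * w
        exy += x * y * w
cov = exy - ex * ey
```

`cov` comes out near −1/36 ≈ −0.0278, matching the exact calculation.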
▶ X ⊥⊥ Y ⟹ Cov[X, Y] = 0.
▶ Does Cov[X, Y] = 0 imply X ⊥⊥ Y? No!
▶ Example: take X symmetric about 0 and Y = X². Then Cov[X, Y] = 0 even though Y is a function of X.
▶ Covariance measures linear dependence, so it can miss non-linear dependence.
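A concrete version of this counterexample, with X uniform on {−1, 0, 1} (my choice of distribution; the slides only specify Y = X²):

```python
# X uniform on {-1, 0, 1} and Y = X**2:
# Cov[X, Y] = E[X^3] - E[X]E[X^2] = 0, yet Y is a deterministic function of X.
xs = [-1, 0, 1]
px = 1 / 3
ex = sum(x * px for x in xs)           # E[X] = 0
exy = sum(x * x**2 * px for x in xs)   # E[XY] = E[X^3] = 0
ey = sum(x**2 * px for x in xs)        # E[Y] = 2/3
cov = exy - ex * ey                    # 0
# Dependence is obvious: P(Y = 1 | X = 1) = 1, but P(Y = 1) = 2/3.
```

Zero covariance here coexists with perfect (non-linear) dependence.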
▶ Covariance scales with the variables: Cov[aX, bY] = ab Cov[X, Y].
▶ Example: what is the covariance of 2X and 2Y in our running example?
▶ Ugh, let's avoid more integrals:

Cov[2X, 2Y] = 2 × 2 × Cov[X, Y] = 4 × (−1/36) = −1/9
▶ Covariance depends on the scales of the variables ⇝ hard to compare covariances across different r.v.s.
▶ Is a relationship stronger? Or is it just due to rescaling?

Correlation

The correlation between two r.v.s X and Y is defined as:

ρ = ρ(X, Y) = Cov[X, Y] / √(V[X] V[Y])

▶ Correlation doesn't depend on the scales of the variables.
▶ −1 ≤ ρ ≤ 1
▶ If |ρ(X, Y)| = 1, then X and Y are perfectly correlated, with a deterministic linear relationship: Y = a + bX.
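The slides don't report the correlation for the running example; under the density (x + y)/8 it works out to −1/11 ≈ −0.09, which we can verify numerically (a sketch; grid size arbitrary):

```python
# Approximate rho = Cov[X, Y] / sqrt(V[X] V[Y]) for f(x, y) = (x + y)/8.
n = 400
h = 2.0 / n
ex = ey = exx = eyy = exy = 0.0
for i in range(n):
    for j in range(n):
        x = (i + 0.5) * h
        y = (j + 0.5) * h
        w = (x + y) / 8.0 * h * h   # probability mass of the cell
        ex += x * w
        ey += y * w
        exx += x * x * w
        eyy += y * y * w
        exy += x * y * w
rho = (exy - ex * ey) / ((exx - ex**2) * (eyy - ey**2)) ** 0.5
```

Analytically, V[X] = V[Y] = 5/3 − (7/6)² = 11/36, so ρ = (−1/36)/(11/36) = −1/11: a weak negative relationship.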
Conditional probability mass function

The conditional probability mass function, or conditional p.m.f., of Y conditional on X is

f_{Y|X}(y|x) = P(X = x, Y = y) / P(X = x) = f_{X,Y}(x, y) / f_X(x)

▶ In words:

f_{Y|X}(y|x) = (probability that X = x and Y = y) / (probability that X = x)

▶ f_{Y|X}(y|x) ≥ 0 and ∑_y f_{Y|X}(y|x) = 1
▶ If X ⊥⊥ Y, then f_{Y|X}(y|x) = f_Y(y) (conditional is the marginal)
                 Favor Gay Marriage   Oppose Gay Marriage   Marginal
                      (Y = 1)               (Y = 0)
Female (X = 1)          0.30                  0.21              0.51
Male   (X = 0)          0.22                  0.27              0.49
Marginal                0.52                  0.48

▶ What is the probability of favoring gay marriage conditional on being a man?

f_{Y|X}(y = 1 | x = 0) = P(X = 0, Y = 1) / P(X = 0) = 0.22 / (0.22 + 0.27) ≈ 0.45
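This conditional p.m.f. can be computed directly from the joint table (the dictionary encoding is mine, not from the slides):

```python
# Joint p.m.f.: keys are (x, y), x = 1 female, y = 1 favors gay marriage.
joint = {(1, 1): 0.30, (1, 0): 0.21, (0, 1): 0.22, (0, 0): 0.27}

def marginal_x(x):
    """f_X(x): sum the joint over all values of y."""
    return sum(p for (xx, y), p in joint.items() if xx == x)

def cond_y_given_x(y, x):
    """f_{Y|X}(y|x) = f_{X,Y}(x, y) / f_X(x)."""
    return joint[(x, y)] / marginal_x(x)
```

`cond_y_given_x(1, 0)` reproduces 0.22 / 0.49 ≈ 0.45, and the conditional p.m.f. sums to one for each x.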
[Figure: bar plots of the conditional p.m.f. of gay marriage support (Y) for men and for women]
Conditional probability density function

The conditional p.d.f. of a continuous random variable is

f_{Y|X}(y|x) = f_{X,Y}(x, y) / f_X(x),

assuming that f_X(x) > 0.

▶ Used to compute conditional probabilities:

P(a < Y < b | X = x) = ∫_a^b f_{Y|X}(y|x) dy.

▶ Joint densities have the following factorization: f_{X,Y}(x, y) = f_{Y|X}(y|x) f_X(x)
▶ For the running example:

f_{Y|X}(y|x) = f_{X,Y}(x, y) / f_X(x) = [(x + y)/8] / [(x + 1)/4] = (x + y) / (2(x + 1))
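A quick numeric check that this conditional density integrates to one over 0 < y < 2 for a fixed x (x = 0.5 is an arbitrary choice of mine):

```python
# Integrate f_{Y|X}(y|x) = (x + y) / (2 * (x + 1)) over y in (0, 2)
# for a fixed x, using a midpoint Riemann sum.
n = 1000            # grid points (arbitrary)
h = 2.0 / n         # cell width
x = 0.5
total = sum((x + (j + 0.5) * h) / (2 * (x + 1)) * h for j in range(n))
```

Analytically, ∫_0^2 (x + y)/(2(x + 1)) dy = (2x + 2)/(2(x + 1)) = 1 for any x in (0, 2), and the sum agrees.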
Conditional independence

Two r.v.s X and Y are conditionally independent given Z (written X ⊥⊥ Y | Z) if

f_{X,Y|Z}(x, y|z) = f_{X|Z}(x|z) f_{Y|Z}(y|z).

▶ Example: X = swimming accidents, Y = number of ice cream cones sold.
▶ In general, dependent.
▶ Conditional on Z = temperature, independent.
[Figure: two conditional densities, f(y|0) and f(y|1)]

▶ Conditional distributions are just distributions, so we can summarize them with their mean and variance.
▶ How does the mean of Y change as we change X?
Conditional expectation

The conditional expectation of Y conditional on X = x is:

E[Y | X = x] = ∑_y y f_{Y|X}(y|x)            (discrete Y)

E[Y | X = x] = ∫_{−∞}^{∞} y f_{Y|X}(y|x) dy   (continuous Y)

▶ The same form as the unconditional expectation, with f_{Y|X}(y|x) in place of f_Y(y)
                 Favor Gay Marriage   Oppose Gay Marriage   Marginal
                      (Y = 1)               (Y = 0)
Female (X = 1)          0.30                  0.21              0.51
Male   (X = 0)          0.22                  0.27              0.49
Marginal                0.52                  0.48

▶ What is the expected support for gay marriage, Y, given that someone is a man, X = 0?

E[Y | X = 0] = ∑_y y f_{Y|X}(y | x = 0)
             = 0 × P(Y = 0 | X = 0) + 1 × P(Y = 1 | X = 0)
             = 0.22 / (0.22 + 0.27) ≈ 0.45
▶ Before we observe X, E[Y|X] takes on many possible values with uncertainty: it is itself a random variable. With a binary X:

E[Y|X] = E[Y|X = 0] with prob. P(X = 0)
         E[Y|X = 1] with prob. P(X = 1)
▶ Can we connect this to the marginal (overall) expectation?
▶ Law of iterated expectations: if the expectations exist, then for discrete X,

E[Y] = E[E[Y|X]] = ∑_x E[Y|X = x] f_X(x)
                 Favor Gay Marriage   Oppose Gay Marriage   Marginal
                      (Y = 1)               (Y = 0)
Female (X = 1)          0.30                  0.21              0.51
Male   (X = 0)          0.22                  0.27              0.49
Marginal                0.52                  0.48              1

E[E[Y|X]] = E[Y|X = 0] f_X(0) + E[Y|X = 1] f_X(1)
          = 0.45 × 0.49 + 0.59 × 0.51 ≈ 0.52 = E[Y]
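The law of iterated expectations can be verified exactly on the gay marriage table (dictionary encoding mine):

```python
# Joint p.m.f.: keys are (x, y), x = 1 female, y = 1 favors gay marriage.
joint = {(1, 1): 0.30, (1, 0): 0.21, (0, 1): 0.22, (0, 0): 0.27}

def marginal_x(x):
    return sum(p for (xx, y), p in joint.items() if xx == x)

def cond_exp_y(x):
    """E[Y | X = x]; for binary Y this is just P(Y = 1 | X = x)."""
    return joint[(x, 1)] / marginal_x(x)

# Law of iterated expectations: E[E[Y|X]] = E[Y]
ey_iterated = sum(cond_exp_y(x) * marginal_x(x) for x in (0, 1))
ey_marginal = joint[(0, 1)] + joint[(1, 1)]   # P(Y = 1) = 0.52
```

Averaging the conditional means with the marginal p.m.f. of X recovers E[Y] = 0.52 exactly (no rounding, unlike the slide arithmetic).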
▶ E[g(X)|X] = g(X) for any function g(). Example: E[X²|X] = X² (if we know X, then we also know X²).
▶ If X ⊥⊥ Y, then E[Y|X = x] = E[Y].
▶ If X ⊥⊥ Y | Z, then E[Y|X = x, Z = z] = E[Y|Z = z].
Conditional variance

The conditional variance of Y given X = x is defined as:

V[Y | X = x] = E[(Y − E[Y|X = x])² | X = x]

▶ Measures the spread of the conditional distribution around the conditional expectation.
▶ Discrete Y:

V[Y | X = x] = ∑_y (y − E[Y|X = x])² f_{Y|X}(y|x)

▶ Continuous Y:

V[Y | X = x] = ∫_y (y − E[Y|X = x])² f_{Y|X}(y|x) dy
▶ Before we observe X, the conditional variance V[Y|X] is a random variable, just like E[Y|X]. With a binary X:

V[Y|X] = V[Y|X = 0] with prob. P(X = 0)
         V[Y|X = 1] with prob. P(X = 1)
▶ Law of total variance: the marginal variance decomposes into the average conditional variance plus the variance of the conditional expectation:

V[Y] = E[V[Y|X]] + V[E[Y|X]]
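The decomposition can be verified on the gay marriage table, where each conditional distribution of Y is Bernoulli (dictionary encoding mine):

```python
# Joint p.m.f.: keys are (x, y), x = 1 female, y = 1 favors gay marriage.
joint = {(1, 1): 0.30, (1, 0): 0.21, (0, 1): 0.22, (0, 0): 0.27}

def marginal_x(x):
    return sum(p for (xx, y), p in joint.items() if xx == x)

def p_y1_given(x):
    """P(Y = 1 | X = x), which is also E[Y | X = x] for binary Y."""
    return joint[(x, 1)] / marginal_x(x)

# Marginal variance of binary Y: p(1 - p) with p = P(Y = 1).
p = joint[(0, 1)] + joint[(1, 1)]
var_y = p * (1 - p)

# E[V[Y|X]]: average the conditional (Bernoulli) variances over X.
e_cond_var = sum(p_y1_given(x) * (1 - p_y1_given(x)) * marginal_x(x)
                 for x in (0, 1))

# V[E[Y|X]]: variance of the conditional mean across X.
e_cond_mean = sum(p_y1_given(x) * marginal_x(x) for x in (0, 1))
var_cond_mean = sum((p_y1_given(x) - e_cond_mean) ** 2 * marginal_x(x)
                    for x in (0, 1))
```

The two pieces add back up to V[Y] = 0.52 × 0.48 ≈ 0.25, as the law requires.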
▶ We can now describe the relationships between two r.v.s, as measured by covariance and correlation.
▶ The conditional expectation is an important quantity that we'll see over and over again.