From Statistical Transportability to Estimating the Effect of Stochastic Interventions
Juan D. Correa and Elias Bareinboim
{j.d.correa, eliasb}@columbia.edu
Generalization Challenges
generated by the same process.
learning very accurately the underlying distribution.
same as the one where the model is intended to be used, and will be deployed.
structural similarities between training and target environments.
[Figure: causal diagrams over X (type of ad), Y (bought), and W (age) for the Current Website 𝚸 (training environment) and the New Website 𝚸* (target environment), linked by an arrow labeled "Generalization".]
We use a selection node to represent differences in mechanism or distribution between the two environments. Here the age distribution differs:
P(W) ≠ P*(W), hence, in general, P(y | x) ≠ P*(y | x).
target environments?
We observe P(x,y,w). We want to say something about P*(y|x).
P(x,y,w) = P(w) P(x|w) P(y|x,w)
The factors P(x|w) and P(y|x,w) are the same in both environments, which is implied by this causal model.
[Figure: causal diagram for the New Website 𝚸* (target environment) over X (type of ad), Y (bought), and W (age).]
$$P^*(y \mid x) = \frac{P^*(y, x)}{P^*(x)} = \frac{\sum_w P^*(y \mid x, w)\,P^*(x \mid w)\,P^*(w)}{\sum_w P^*(x \mid w)\,P^*(w)} = \frac{\sum_w P(y \mid x, w)\,P(x \mid w)\,P^*(w)}{\sum_w P(x \mid w)\,P^*(w)}$$
The last step uses the fact that P*(y|x,w) = P(y|x,w) and P*(x|w) = P(x|w): these factors are the same in source and target. P*(w) needs to be measured in the target environment, while the other distributions can be reused from the data collected in the source environment.
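A minimal numeric check of the transport formula above, in Python with NumPy. The binary probability tables and the function names (joint, conditional_y_given_x, transport_y_given_x) are hypothetical; the sketch assumes, as the slide does, that P(x|w) and P(y|x,w) are shared by both websites and only P(w) changes.

```python
import numpy as np

# Hypothetical binary tables for the website example.
# Indexing: p_w[w], p_x_w[w, x] = P(x|w), p_y_xw[w, x, y] = P(y|x,w).

def joint(p_w, p_x_w, p_y_xw):
    """Return P(w, x, y) = P(w) P(x|w) P(y|x,w) as an array indexed [w, x, y]."""
    return p_w[:, None, None] * p_x_w[:, :, None] * p_y_xw

def conditional_y_given_x(p_wxy):
    """Compute P(y|x) from a joint indexed [w, x, y]; the result is indexed [x, y]."""
    p_xy = p_wxy.sum(axis=0)
    return p_xy / p_xy.sum(axis=1, keepdims=True)

def transport_y_given_x(p_wxy_source, p_w_target):
    """sum_w P(y|x,w) P(x|w) P*(w) / sum_w P(x|w) P*(w).

    P(y|x,w) and P(x|w) are estimated from the source joint;
    only P*(w) comes from the target environment.
    """
    p_w = p_wxy_source.sum(axis=(1, 2))
    p_x_w = p_wxy_source.sum(axis=2) / p_w[:, None]
    p_y_xw = p_wxy_source / p_wxy_source.sum(axis=2, keepdims=True)
    num = np.einsum("wxy,wx,w->xy", p_y_xw, p_x_w, p_w_target)
    den = np.einsum("wx,w->x", p_x_w, p_w_target)
    return num / den[:, None]

p_x_w = np.array([[0.7, 0.3], [0.2, 0.8]])
p_y_xw = np.array([[[0.9, 0.1], [0.6, 0.4]],
                   [[0.5, 0.5], [0.1, 0.9]]])
p_w_source = np.array([0.8, 0.2])
p_w_target = np.array([0.3, 0.7])

source = joint(p_w_source, p_x_w, p_y_xw)
target = joint(p_w_target, p_x_w, p_y_xw)  # same mechanisms, different P(w)

truth = conditional_y_given_x(target)
naive = conditional_y_given_x(source)
transported = transport_y_given_x(source, p_w_target)
print(np.allclose(transported, truth))  # True
print(np.allclose(naive, truth))        # False
```

Reusing the source P(y|x) directly gives the wrong answer here, while the transported estimate needs nothing from the target beyond P*(w).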
[Figure: selection diagram D relating the Source (𝚸) and Target (𝚸*) domains.]
P(v): distribution learned from 𝛒 (the source).
P*(w): partial distribution from 𝛒* (the target).
Is there a function f such that P*(y|x) = f(P(v), P*(w))?
Answer: yes (we obtain f) 😁 / no ☹
1. Encode the assumptions about the differences and commonalities across environments, using selection diagrams (with selection nodes).
2. Identify the stable mechanisms across environments.
3. Determine the variables that need to be re-measured.
4. Construct an estimator from the available data.
Exploit causality theory.
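Steps 2 and 3 can be sketched in the simplest setting: a Markovian selection diagram, i.e. one with no latent variables (a restriction the work presented here does not make). The helper split_factors is hypothetical; it only sorts the factors P(v | pa_v) into those reusable from the source and those that must be re-estimated in the target, and it does not decide identifiability in the general, latent-variable case.

```python
from typing import Dict, List, Set, Tuple


def split_factors(
    parents: Dict[str, List[str]], selection_targets: Set[str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Hypothetical helper for a Markovian selection diagram (no latent variables).

    parents           -- maps each variable to its list of parents in the causal diagram
    selection_targets -- variables with an incoming selection node (mechanism may differ)

    Returns (stable, changed, to_measure):
      stable     -- variables whose factor P(v | pa_v) can be reused from the source
      changed    -- variables whose factor must be re-estimated in the target
      to_measure -- variables that must be measured in the target to estimate
                    the changed factors P*(v | pa_v)
    """
    changed = set(selection_targets)
    stable = set(parents) - changed
    to_measure: Set[str] = set()
    for v in changed:
        to_measure.add(v)            # the child itself
        to_measure.update(parents[v])  # and the parents it is conditioned on
    return stable, changed, to_measure


# Website example: W (age) -> X (type of ad), W -> Y (bought), X -> Y; selection node on W.
website = {"W": [], "X": ["W"], "Y": ["X", "W"]}
stable, changed, to_measure = split_factors(website, {"W"})
print(sorted(stable), sorted(changed), sorted(to_measure))  # ['X', 'Y'] ['W'] ['W']
```

In the website example this recovers the conclusion of the earlier derivation: only P*(w) has to be collected in the target.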
1. We introduce a novel graphical decomposition of the observed/learned distribution into factors that take the latent structure into account. It generalizes C-components (Tian & Pearl 2002) and is suitable for reasoning about distributions with different sets of measured variables.
2. We derive a complete algorithm that determines whether a distribution P*(y|x) can be uniquely identified from the distributions P(v) and P*(w) (W ⊆ V), based on the assumptions encoded in the graphs corresponding to the source and target domains.
3. We connect this problem with the problem of identifying the effect of stochastic plans, and show how the latter reduces to the former.
[Figure: causal diagram over observed X, Z, Y and latent U, with X → Z → Y, U → X, and U → Y.]
$$P(v, u) = \prod_i P(v_i \mid pa_i) = P(x \mid u)\,P(z \mid x)\,P(y \mid z, u)\,P(u)$$
(where V is the set of all observable variables and U is latent)
$$P(v) = \sum_u P(x, z, y, u) = \sum_u P(x \mid u)\,P(z \mid x)\,P(y \mid z, u)\,P(u) = P(z \mid x)\Big(\sum_u P(x \mid u)\,P(y \mid z, u)\,P(u)\Big)$$
Causal inference tools give us the means to identify some factors involving latent variables from the observed distribution. For example, the factor Q[Y] = Σ_u P(y | z, u) P(u) can be computed from P(v) alone:
$$Q[Y] = \sum_u P(y \mid z, u)\,P(u) = \sum_{x'} P(y \mid z, x')\,P(x'), \qquad P(x)\,P(z \mid x)\,Q[Y] = P(x)\,P(z \mid x)\Big(\sum_{x'} P(y \mid z, x')\,P(x')\Big)$$
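A minimal numeric check of the two claims above, in Python with NumPy: the factorization of P(v) after marginalizing U, and the identification of Q[Y] from P(v). The random binary tables are hypothetical; only the graph (U → X, X → Z, Z → Y, U → Y) is taken from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(*shape):
    """Random conditional table; the last axis is the child and sums to 1."""
    t = rng.random(shape) + 0.1
    return t / t.sum(axis=-1, keepdims=True)

# Hypothetical binary model for the diagram: U -> X, X -> Z, Z -> Y, U -> Y.
p_u = random_cpt(2)           # P(u)
p_x_u = random_cpt(2, 2)      # P(x | u), indexed [u, x]
p_z_x = random_cpt(2, 2)      # P(z | x), indexed [x, z]
p_y_zu = random_cpt(2, 2, 2)  # P(y | z, u), indexed [z, u, y]

# Full joint P(u, x, z, y) as the product of the mechanisms.
p_uxzy = np.einsum("u,ux,xz,zuy->uxzy", p_u, p_x_u, p_z_x, p_y_zu)
# Observed distribution P(v) = P(x, z, y): marginalize the latent U.
p_xzy = p_uxzy.sum(axis=0)

# Q[Y](z, y) = sum_u P(y | z, u) P(u): written with the latent U.
q_y_true = np.einsum("zuy,u->zy", p_y_zu, p_u)

# Identification from P(v) alone: Q[Y] = sum_{x'} P(y | z, x') P(x').
p_x = p_xzy.sum(axis=(1, 2))
p_y_given_xz = p_xzy / p_xzy.sum(axis=2, keepdims=True)  # indexed [x, z, y]
q_y_identified = np.einsum("xzy,x->zy", p_y_given_xz, p_x)

print(np.allclose(q_y_true, q_y_identified))  # True
```

The identity holds exactly here because X blocks the only back-door path from Z to Y (Z ← X ← U → Y).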
[Figure: diagrams over Z, Y, F, X, A, B, D for the Source (𝚸), the Target (𝚸*), and the factors needed in the target (𝚸*).]
Given P(b,z,f,d,x,a,y) and P*(x,a), the distribution P*(y|x,z) can be written as
$$P^*(y \mid x, z) = \frac{\sum_{a,d} Q^*[A, X]\,Q[D]\,Q[Y]}{\sum_{a,d,y} Q^*[A, X]\,Q[D]\,Q[Y]}$$
which, in terms of the available distributions, is
$$P^*(y \mid x, z) = \sum_a P^*(a \mid x) \sum_d P(d \mid z) \sum_{z'} P(y \mid x, z', d, a)\,P(z')$$
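The second expression can be evaluated directly once each factor has been estimated. This is a minimal sketch assuming every variable is discrete and each factor is given as a NumPy array with the hypothetical layout in the docstring; the function name transported_y_given_xz is illustrative. Because the slide's diagrams are not recoverable here, the example uses random tables and only checks that the result is a proper conditional distribution over y.

```python
import numpy as np

def transported_y_given_xz(p_a_given_x_target, p_d_given_z, p_y_given_xzda, p_z):
    """Evaluate sum_a P*(a|x) sum_d P(d|z) sum_z' P(y|x,z',d,a) P(z').

    Hypothetical array layout (all variables discrete):
      p_a_given_x_target[x, a]      -- P*(a | x), from target-domain data
      p_d_given_z[z, d]             -- P(d | z), from source data
      p_y_given_xzda[x, z, d, a, y] -- P(y | x, z, d, a), from source data
      p_z[z]                        -- P(z), from source data
    Returns an array indexed [x, z, y] holding P*(y | x, z).
    """
    # Innermost sum over z' (letter k): inner[x, d, a, y].
    inner = np.einsum("xkday,k->xday", p_y_given_xzda, p_z)
    # Outer sums over d and a.
    return np.einsum("xa,zd,xday->xzy", p_a_given_x_target, p_d_given_z, inner)

rng = np.random.default_rng(1)

def cpt(*shape):
    """Random conditional table normalized over the last axis."""
    t = rng.random(shape) + 0.1
    return t / t.sum(axis=-1, keepdims=True)

# Sizes: x=2, a=3, z=2, d=2, y=2 (arbitrary illustrative choices).
out = transported_y_given_xz(cpt(2, 3), cpt(2, 2), cpt(2, 2, 2, 3, 2), cpt(2))
print(out.shape)  # (2, 2, 2)
```

Note that only P*(a|x) comes from the target; every other factor is reused from the source, matching the "needed factors" panel.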
Key observation. If the source environment corresponds to the current system, and the target environment corresponds to the source after an intervention, then transporting the distribution P*(y) is the same as identifying the effect of the intervention on an outcome Y.
[Figure: two causal diagrams over X (tutoring), Y (GPA), W (previous GPA), and Z (motivation): the current system, and the system after the intervention σX.]
Students get tutoring of their own volition based
Intervention σX: assign tutoring only to students with low GPA.
P*(y) represents the effect of σX on Y.
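To make the key observation concrete, here is a minimal simulation sketch in Python with NumPy. The structure of the tutoring model is an assumption made for illustration, since the slide's diagrams are not recoverable here: Z → W, Z → X, W → X, W → Y, X → Y, with Z unmeasured. Under that assumed structure W blocks every back-door path from X to Y, so the effect of the policy σX can be computed from observational P(w, x, y) as P*(y) = Σ_{w,x} P(w) σ(x|w) P(y|x,w). The deterministic policy below is a special case of a stochastic one.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(*shape):
    """Random conditional table; the last axis is the child and sums to 1."""
    t = rng.random(shape) + 0.1
    return t / t.sum(axis=-1, keepdims=True)

# Hypothetical binary SCM (assumed, not taken from the slides):
#   Z (motivation) -> W (previous GPA), Z -> X (tutoring), W -> X, W -> Y, X -> Y.
# Index 0 of W means "low previous GPA"; index 1 of X means "tutored".
p_z = random_cpt(2)           # P(z)
p_w_z = random_cpt(2, 2)      # P(w | z), indexed [z, w]
p_x_zw = random_cpt(2, 2, 2)  # P(x | z, w), indexed [z, w, x]: students choose
p_y_xw = random_cpt(2, 2, 2)  # P(y | x, w), indexed [x, w, y]

# Source (current system): joint P(z, w, x, y); Z is treated as unmeasured.
p_zwxy = np.einsum("z,zw,zwx,xwy->zwxy", p_z, p_w_z, p_x_zw, p_y_xw)
p_wxy = p_zwxy.sum(axis=0)    # what can be estimated: P(w, x, y)

# Policy sigma_X as a table sigma[w, x] = sigma(x | w):
# tutor only students with low previous GPA.
sigma = np.array([[0.0, 1.0],   # w = low  -> always tutored
                  [1.0, 0.0]])  # w = high -> never tutored

# Estimate from source data: P*(y) = sum_{w,x} P(w) sigma(x|w) P(y|x,w).
p_w = p_wxy.sum(axis=(1, 2))
p_y_given_wx = p_wxy / p_wxy.sum(axis=2, keepdims=True)  # indexed [w, x, y]
p_star_estimate = np.einsum("w,wx,wxy->y", p_w, sigma, p_y_given_wx)

# Ground truth (target): replace X's mechanism with sigma and marginalize.
p_star_truth = np.einsum("z,zw,wx,xwy->y", p_z, p_w_z, sigma, p_y_xw)

print(np.allclose(p_star_estimate, p_star_truth))  # True
```

The target here is literally the source with one mechanism swapped, which is the sense in which evaluating a stochastic plan is a transportability problem.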
probability distributions across different, but related environments.
transportable from observations in a source domain and partial measurements in the target domain, following the assumptions encoded in graphical models representing the data generating process in the domains.
interventions.