SLIDE 1 Intro DAG Fix Do Exercise Roy Conclusion
The Roy Model and Pearl’s Do Calculus: What “Do” Cannot Do
James Heckman University of Chicago Rodrigo Pinto University of Chicago Econ 312, Spring 2019
James Heckman Roy Does Not
SLIDE 2
- 1. Causal Effects and the Do-calculus
SLIDE 6
Identification of Treatment Effects of a DAG
Pearl’s Do-Calculus:
1. Purpose: Identify causal effects from non-experimental data.
2. Application: Bayesian network structure, i.e., a Directed Acyclic Graph (DAG) that represents causal relationships.
3. Tools: Three inference rules that translate graphical relations of a DAG into causal conditional independence relations (Pearl 1995; Pearl 2000).
4. Identification method: Iterate the do-calculus rules to generate a function that expresses treatment-effect statistics in terms of the observed variables only (Tian and Pearl 2002; Tian and Pearl 2003).
SLIDE 7
Characteristics of Pearl’s Do-Calculus
Completeness: If a causal effect of a DAG is identifiable, then there exists a sequence of applications of the Do-Calculus rules that generates a formula expressing the causal effect solely in terms of observational quantities (Huang and Valtorta 2006; Shpitser and Pearl 2006).
Limitation: Only works for DAGs. It does not allow for additional information outside the DAG framework that could generate identification of causal distributions; it applies only to the information content of a DAG.
SLIDE 8
- 2. Statistical Tools for DAGs
SLIDE 9
Markovian Model
A Markovian Model (Tian and Pearl, 2003) is defined by four elements, $M = \langle N, U, G, P(V_i \mid pa(V_i)) \rangle$, where:
1. $N = \{N_1, \dots, N_m\}$ is a set of observed variables;
2. $U = \{U_1, \dots, U_n\}$ is a set of unobserved variables;
3. $G$ is a directed acyclic graph with nodes corresponding to the variables $V_i$ in $V = N \cup U$, so that $V$ comprises both observed and unobserved variables;
4. $P(V_i \mid pa(V_i))$ is the conditional probability of a variable $V_i \in V$ given its parents $pa(V_i) \subset V$.
SLIDE 10
Factorization of a Markovian Model
The joint probability $\Pr(V_1, \dots, V_{n+m})$ can be factorized as:
$$\Pr(V_1, \dots, V_{n+m}) = \prod_{i=1}^{n+m} \Pr(V_i \mid pa(V_i))$$
Causal Interpretation – Structural Equations
$V_i = f(pa(V_i))$, e.g. $Y = f(X, U)$
Direction: Variables $pa(V_i)$ cause $V_i$ (e.g. X, U cause Y), and not the contrary.
Autonomy: The structural equation f is a deterministic function ($y = f(x, u)$) that is invariant to changes in x or u (Frisch, 1938).
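The factorization can be checked numerically on a toy model. The sketch below assumes a hypothetical binary DAG Z → X → Y with made-up conditional probability tables; the names `parents`, `cpt`, `cond_prob`, and `joint` are illustrative and do not come from the slides.

```python
from itertools import product

# Hypothetical binary DAG Z -> X -> Y; all numbers are illustrative assumptions.
parents = {"Z": [], "X": ["Z"], "Y": ["X"]}
# cpt[v][parent_values] = Pr(v = 1 | parents = parent_values)
cpt = {
    "Z": {(): 0.3},
    "X": {(0,): 0.2, (1,): 0.7},
    "Y": {(0,): 0.1, (1,): 0.6},
}

def cond_prob(var, value, assignment):
    """Pr(var = value | pa(var)), read from the table."""
    key = tuple(assignment[p] for p in parents[var])
    p_one = cpt[var][key]
    return p_one if value == 1 else 1.0 - p_one

def joint(assignment):
    """Joint probability as the product of Pr(V_i | pa(V_i)) over all variables."""
    prob = 1.0
    for var in parents:
        prob *= cond_prob(var, assignment[var], assignment)
    return prob

names = list(parents)  # order: Z, X, Y
table = {vals: joint(dict(zip(names, vals)))
         for vals in product([0, 1], repeat=len(names))}
total = sum(table.values())
print(round(total, 10))
```

Because every factor is a proper conditional distribution, the eight joint probabilities should sum to one.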
SLIDE 14
Statistical Tools of a DAG
Bayesian Networks, Howard and Matheson (1981)
Some Notation
1. Parents: Variables $pa(Y) \subset V$ are termed parents of $Y \in V$. If $pa(Y) = \emptyset$, then Y is not caused by any variable in the model. In such cases, Y is termed an external variable.
2. Descendants: Variables $D(Y) \subset V$ that are caused by Y (directly or indirectly) are termed descendants of $Y \in V$.
$$D(Y) = \bigcup_{j=1}^{|V|} pa^{-j}(Y), \quad \text{where } pa^{-(k+1)}(G) = pa^{-1}(pa^{-k}(G)), \quad pa^{-1}(G) = \bigcup_{T \in G} pa^{-1}(T),$$
such that $G \subset V$ and $pa^{-1}(T) = \{Y \in V : T \in pa(Y)\}$.
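The recursive definition of descendants can be sketched in a few lines. The parent sets below describe a hypothetical four-node DAG (V → X, V → U, X → Y, U → Y), and the function names `children` and `descendants` are illustrative choices, not notation from the slides.

```python
# Hypothetical parent sets: V -> X, V -> U, X -> Y, U -> Y.
pa = {"V": set(), "X": {"V"}, "U": {"V"}, "Y": {"X", "U"}}

def children(nodes):
    """pa^{-1}(G): every variable that has some element of `nodes` as a parent."""
    return {y for y, parents in pa.items() if parents & set(nodes)}

def descendants(y):
    """D(Y): union of pa^{-j}({Y}) for j = 1, ..., |V|."""
    found, frontier = set(), {y}
    for _ in range(len(pa)):
        frontier = children(frontier)
        found |= frontier
    return found

# External variables are those with an empty parent set.
external = {v for v, parents in pa.items() if not parents}

print(sorted(descendants("V")), sorted(external))
```

Iterating the inverse-parent map at most |V| times is enough because no directed path in a DAG can visit more than |V| nodes.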
SLIDE 15
- 3. Fixing versus Conditioning
SLIDE 16
Pearl’s Definition of Causal Effects, the Do-operator
The Do-operator is based on the truncated factorization, in which the probability factor of the fixed variable is deleted. Let $X \subset V$. Then
$$\Pr(V(x) = v) = \Pr(V_1 = v_1, \dots, V_{m+n} = v_{m+n} \mid do(X) = x)$$
and
$$\Pr(V(x) = v) = \begin{cases} \prod_{V_i \in V \setminus X} P(V_i = v_i \mid pa(V_i)) & \text{if } v \text{ is consistent with } x; \\ 0 & \text{if } v \text{ is inconsistent with } x. \end{cases}$$
SLIDE 17
Example of the Do-operator
[Figure: DAG with nodes X, Z, Y; the factorization below implies Z → X → Y.]
Variables: Y, X, Z
Factorization: Pr(Y, X, Z) = Pr(Y|Z, X) Pr(X|Z) Pr(Z) = Pr(Y|X) Pr(X|Z) Pr(Z)
Do-operator: Pr(Z, Y|do(X) = x) = Pr(Y|X = x) Pr(Z)
Conditional operator: Pr(Y, Z|X = x) = Pr(Y|Z, X = x) Pr(X = x|Z, X = x) Pr(Z|X = x) = Pr(Y|X = x) Pr(Z|X = x)
Do-operator targets variables, not causal links.
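The contrast between fixing and conditioning in this example can be made concrete with numbers. The sketch below assumes binary variables and made-up probability tables for Z → X → Y; the function names `do_x` and `given_x` are illustrative.

```python
from itertools import product

# Illustrative tables for the chain Z -> X -> Y (all numbers are assumptions).
p_z = {0: 0.7, 1: 0.3}
p_x_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # p_x_given_z[z][x]
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}  # p_y_given_x[x][y]

def do_x(x):
    """Pr(Z, Y | do(X) = x): the factor Pr(X | Z) is deleted."""
    return {(z, y): p_z[z] * p_y_given_x[x][y]
            for z, y in product([0, 1], repeat=2)}

def given_x(x):
    """Pr(Z, Y | X = x): ordinary conditioning on the joint distribution."""
    joint = {(z, y): p_z[z] * p_x_given_z[z][x] * p_y_given_x[x][y]
             for z, y in product([0, 1], repeat=2)}
    norm = sum(joint.values())
    return {k: v / norm for k, v in joint.items()}

fixed, conditioned = do_x(1), given_x(1)
pz_do = sum(p for (z, _), p in fixed.items() if z == 1)
pz_cond = sum(p for (z, _), p in conditioned.items() if z == 1)
py_do = sum(p for (_, y), p in fixed.items() if y == 1)
py_cond = sum(p for (_, y), p in conditioned.items() if y == 1)
print(round(pz_do, 3), round(pz_cond, 3))
```

With these numbers, fixing X leaves the distribution of Z at its marginal (0.3), while conditioning on X = 1 shifts it (0.6). The distribution of Y is the same under both operations here, since X has no confounded path to Y in this chain.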
SLIDE 18
Example of the Do-operator
[Figure: DAG with nodes X, Y, U, V; the factorization below implies V → X, V → U, X → Y, and U → Y.]
Variables: Y, X, U, V
Factorization: Pr(V, U, X, Y) = Pr(Y|U, X) Pr(X|V) Pr(U|V) Pr(V)
Do-operator: Pr(V, U, Y|do(X) = x) = Pr(Y|U, X = x) Pr(U|V) Pr(V)
Conditional operator: Pr(V, U, Y|X = x) = Pr(Y|U, V, X = x) Pr(U|V, X = x) Pr(V|X = x) = Pr(Y|U, X = x) Pr(U|V) Pr(V|X = x)
SLIDE 19
Empirical and Hypothetical Models
Heckman and Pinto (2013)
[Figure: two DAGs. Empirical Model: nodes X, Y, U, V. Hypothetical Model: nodes X, Y, U, V and the hypothetical variable X̃.]
SLIDE 28
Benefits of a Hypothetical Model
Formalizes Haavelmo’s insight of hypothetical variation;
Clarifies the definition of causal parameters:
1. Causal parameters are defined under the hypothetical model;
2. Observed data are generated through the empirical model;
Distinguishes definition from identification:
1. Identification requires connecting the hypothetical and empirical models in a way that allows us to evaluate causal parameters defined in the hypothetical model using data generated by the empirical model;
Versatility: Targets causal links, not variables;
Completeness: Automatically generates the Do-calculus (Pinto, 2013).
SLIDE 29
Benefits of a Hypothetical Model
Most Important: Fixing in the empirical model is translated to statistical conditioning in the hypothetical model:
$$\underbrace{E_E(Y(x))}_{\text{Causal Operation, Empirical Model}} = \underbrace{E_H(Y \mid \tilde{X} = x)}_{\text{Statistical Operation, Hypothetical Model}}$$
do-Operator and Statistical Conditioning
Let $\tilde{X}$ be the hypothetical variable in $G_H$ associated with variable $X$ in the empirical model $G_E$, such that $Ch_H(\tilde{X}) = Ch_E(X)$; then:
$$P_H(T_E \setminus \{X\} \mid \tilde{X} = x) = P_E(T_E \setminus \{X\} \mid do(X) = x).$$
SLIDE 30
Connecting Hypothetical and Empirical Models
1. L-1: Let $W, Z$ be any disjoint sets of variables in $T_E \setminus D_H(\tilde{X})$; then $P_H(W \mid Z) = P_H(W \mid Z, \tilde{X}) = P_E(W \mid Z)$ for all $\{W, Z\} \subset T_E \setminus D_H(\tilde{X})$.
2. T-1: Let $W, Z$ be any disjoint sets of variables in $T_E$; then $P_H(W \mid Z, X = x, \tilde{X} = x) = P_E(W \mid Z, X = x)$ for all $\{W, Z\} \subset T_E$.
3. C-1: Let $\tilde{X}$ be uniformly distributed on the support of $X$ and let $W, Z$ be any disjoint sets of variables in $T_E$; then $P_H(W \mid Z, X = \tilde{X}) = P_E(W \mid Z)$ for all $\{W, Z\} \subset T_E$.
4. Matching: Let $Z, W$ be any disjoint sets of variables in $T_E$ such that, in the hypothetical model, $X \perp\!\!\!\perp W \mid (Z, \tilde{X})$; then $P_H(W \mid Z, \tilde{X} = x) = P_E(W \mid Z, X = x)$.
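SLIDE 31
Hypothetical and Empirical Front-door Models
[Figure: two DAGs. Empirical Model: nodes X, M, U, Y. Hypothetical Model: nodes X, M, U, Y and the hypothetical variable X̃.]
Empirical Model      Hypothetical Model
Pa(U) = ∅            Pa(U) = Pa(X̃) = ∅
Pa(X) = {U}          Pa(X) = {U}
Pa(M) = {X}          Pa(M) = {X̃}
Pa(Y) = {M, U}       Pa(Y) = {M, U}
L-2: In the Front-Door hypothetical model:
1. $Y \perp\!\!\!\perp \tilde{X} \mid M$,
2. $X \perp\!\!\!\perp M$, and
3. $Y \perp\!\!\!\perp \tilde{X} \mid (M, X)$.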
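SLIDE 32
Using the Hypothetical Model Framework (Front-door)
$$
\begin{aligned}
P_H(Y \mid \tilde{X} = x)
&= \sum_m P_H(Y \mid M = m, \tilde{X} = x)\, P_H(M = m \mid \tilde{X} = x) \\
&= \sum_m P_H(Y \mid M = m)\, P_H(M = m \mid \tilde{X} = x) \\
&= \sum_m \Big( \sum_{x'} P_H(Y \mid X = x', M = m)\, P_H(X = x' \mid M = m) \Big) P_H(M = m \mid \tilde{X} = x) \\
&= \sum_m \Big( \sum_{x'} P_H(Y \mid X = x', M = m)\, P_H(X = x') \Big) P_H(M = m \mid \tilde{X} = x) \\
&= \sum_m \Big( \sum_{x'} P_H(Y \mid X = x', \tilde{X} = x', M = m)\, P_H(X = x') \Big) P_H(M = m \mid \tilde{X} = x) \\
&= \sum_m \Big( \sum_{x'} \underbrace{P_E(Y \mid M = m, X = x')}_{\text{by T-1}}\, \underbrace{P_E(X = x')}_{\text{by L-1}} \Big) \underbrace{P_E(M = m \mid X = x)}_{\text{by Matching}}
\end{aligned}
$$
The second equality follows from (1) $Y \perp\!\!\!\perp \tilde{X} \mid M$ of L-2. The fourth equality follows from (2) $X \perp\!\!\!\perp M$ of L-2. The fifth equality follows from (3) $Y \perp\!\!\!\perp \tilde{X} \mid (M, X)$ of L-2.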
SLIDE 33
- 4. The Do-Calculus (Pearl, 1995)
SLIDE 34
Identifying Causal Effects for Markovian Models
Tools needed:
Rules: A set of graphical/statistical rules that convert expressions of causal inference into probability equations.
Complete algorithm: An algorithm based on these rules that, for any causal effect in question, either generates an expression involving only observed conditional probabilities or reports that the causal effect is not identifiable.
SLIDE 35
Some Notation
DAG Notation
Let X, Y, Z be arbitrary disjoint sets of variables (nodes) in a causal graph G.
$G_{\overline{X}}$: DAG that modifies G by deleting the arrows pointing to X.
$G_{\underline{X}}$: DAG that modifies G by deleting the arrows emerging from X.
$G_{\overline{X}\underline{Z}}$: DAG that modifies G by deleting the arrows pointing to X and emerging from Z.
Probability Notation
Probability of Y when X is fixed at x and Z is observed:
$$\Pr(Y \mid do(X) = x, Z) = \frac{\Pr(Y, Z \mid do(X) = x)}{\Pr(Z \mid do(X) = x)}$$
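The three graph modifications amount to simple edge filters. The sketch below assumes a hypothetical DAG stored as a set of (parent, child) pairs, with edges U → X, U → Y, X → Z, Z → Y chosen only for illustration; the helper names `delete_incoming` and `delete_outgoing` are not from the slides.

```python
# Edges of a hypothetical DAG G, written as (parent, child) pairs.
G = {("U", "X"), ("U", "Y"), ("X", "Z"), ("Z", "Y")}

def delete_incoming(edges, nodes):
    """Overbar operation: drop every arrow pointing to a node in `nodes`."""
    return {(a, b) for a, b in edges if b not in nodes}

def delete_outgoing(edges, nodes):
    """Underbar operation: drop every arrow emerging from a node in `nodes`."""
    return {(a, b) for a, b in edges if a not in nodes}

G_over_X = delete_incoming(G, {"X"})
G_under_X = delete_outgoing(G, {"X"})
# Arrows pointing to X and arrows emerging from Z are both removed.
G_over_X_under_Z = delete_outgoing(delete_incoming(G, {"X"}), {"Z"})

print(sorted(G_over_X_under_Z))
```

The two deletions commute, so the combined graph does not depend on the order in which they are applied.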
SLIDE 36
Example of a DAG G
[Figure: DAG G with nodes X, Z, U, Y.]
This figure represents causal relations between four variables. Arrows represent direct causal relations. Circles represent unobserved variables. Squares represent observed variables.
The “If” Question: A DAG defines causal relations among variables. It answers the question of whether a variable causes or is caused by any other variable.
SLIDE 37
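Example of DAG Notation
[Figure: four panels showing modified versions of the DAG G on nodes X, Z, U, Y: $G_{\underline{X}} = G_{\overline{Z}}$, $G_{\underline{Z}}$, $G_{\overline{X}\,\overline{Z}}$, and $G_{\overline{X}\underline{Z}}$.]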
SLIDE 38
Three Rules of Do-Calculus (Pearl, 2000)
Let G be a DAG; then for any disjoint sets of variables X, Y, Z, W:
Rule 1: Insertion/deletion of observations
$Y \perp\!\!\!\perp Z \mid (X, W)$ under $G_{\overline{X}}$ ⇒ $\Pr(Y \mid do(X), Z, W) = \Pr(Y \mid do(X), W)$
Rule 2: Action/observation exchange
$Y \perp\!\!\!\perp Z \mid (X, W)$ under $G_{\overline{X}\underline{Z}}$ ⇒ $\Pr(Y \mid do(X), do(Z), W) = \Pr(Y \mid do(X), Z, W)$
Rule 3: Insertion/deletion of actions
$Y \perp\!\!\!\perp Z \mid (X, W)$ under $G_{\overline{X}\,\overline{Z(W)}}$ ⇒ $\Pr(Y \mid do(X), do(Z), W) = \Pr(Y \mid do(X), W)$,
where $Z(W)$ is the set of Z-nodes that are not ancestors of any W-node in $G_{\overline{X}}$.
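Each rule's premise is a d-separation statement in a modified graph, which can be checked mechanically. The sketch below is one possible check, using the standard ancestral moral graph criterion rather than anything specific to the slides. It assumes a hypothetical DAG with edges U → X, U → Y, X → Z, Z → Y, and the function names are illustrative.

```python
def ancestors(edges, nodes):
    """The nodes in `nodes` together with all of their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        node = stack.pop()
        for a, b in edges:
            if b == node and a not in result:
                result.add(a)
                stack.append(a)
    return result

def d_separated(edges, xs, ys, zs):
    """True if the sets xs and ys are d-separated by zs in the DAG `edges`."""
    keep = ancestors(edges, set(xs) | set(ys) | set(zs))
    sub = {(a, b) for a, b in edges if a in keep and b in keep}
    # Moralize: undirected parent-child links, plus links between co-parents.
    links = {frozenset(e) for e in sub}
    for a, c in sub:
        for b, d in sub:
            if c == d and a != b:
                links.add(frozenset((a, b)))
    # Search for a path from xs to ys that avoids the conditioning set zs.
    seen, stack = set(xs), list(xs)
    while stack:
        node = stack.pop()
        if node in ys:
            return False
        for link in links:
            if node in link:
                (other,) = link - {node}
                if other not in zs and other not in seen:
                    seen.add(other)
                    stack.append(other)
    return True

# Hypothetical DAG (an assumption for illustration).
G = {("U", "X"), ("U", "Y"), ("X", "Z"), ("Z", "Y")}
G_under_X = {(a, b) for a, b in G if a != "X"}  # arrows emerging from X deleted
G_under_Z = {(a, b) for a, b in G if a != "Z"}  # arrows emerging from Z deleted

rule2_x_z = d_separated(G_under_X, {"X"}, {"Z"}, set())
rule2_z_y = d_separated(G_under_Z, {"Z"}, {"Y"}, {"X"})
print(rule2_x_z, rule2_z_y, d_separated(G, {"X"}, {"Z"}, set()))
```

In this assumed graph both Rule 2 premises hold in the mutilated graphs, while X and Z are not d-separated in G itself.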
SLIDE 39
Understanding Rules of Do-Calculus
Let G be a DAG; then for any disjoint sets of variables X, Y, Z, W:
Rule 1: Insertion/deletion of observations
If $Y \perp\!\!\!\perp Z \mid (X, W)$ under $G_{\overline{X}}$, then
$$\underbrace{\Pr(Y \mid do(X), Z, W) = \Pr(Y \mid do(X), W)}_{\text{Equivalent Probability Expression}}$$
SLIDE 41
Using the Do-Calculus: Task 1 – Compute Pr(Z|do(X))
$X \perp\!\!\!\perp Z$ in $G_{\underline{X}}$; by Rule 2, $\Pr(Z \mid do(X)) = \Pr(Z \mid X)$.
[Figure: DAGs $G$ and $G_{\underline{X}}$ on nodes X, Z, U, Y.]
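SLIDE 42
Using the Do-Calculus: Task 2 – Compute Pr(Y|do(Z))
$Z \perp\!\!\!\perp X$ in $G_{\overline{Z}}$; by Rule 3, $\Pr(X \mid do(Z)) = \Pr(X)$.
$Z \perp\!\!\!\perp Y \mid X$ in $G_{\underline{Z}}$; by Rule 2, $\Pr(Y \mid X, do(Z)) = \Pr(Y \mid X, Z)$.
$$\therefore \Pr(Y \mid do(Z)) = \sum_X \Pr(Y \mid X, do(Z)) \Pr(X \mid do(Z)) = \sum_X \Pr(Y \mid X, Z) \Pr(X)$$
[Figure: DAGs $G$, $G_{\overline{Z}}$, and $G_{\underline{Z}}$ on nodes X, Z, U, Y.]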
SLIDE 43
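Using the Do-Calculus: Task 3 – Compute Pr(Y|Z, do(X))
$Y \perp\!\!\!\perp Z \mid X$ in $G_{\overline{X}\underline{Z}}$; by Rule 2, $\Pr(Y \mid Z, do(X)) = \Pr(Y \mid do(Z), do(X))$.
$Y \perp\!\!\!\perp X \mid Z$ in $G_{\overline{X}\,\overline{Z}}$; by Rule 3, $\Pr(Y \mid do(X), do(Z)) = \Pr(Y \mid do(Z))$.
$$\therefore \Pr(Y \mid Z, do(X)) = \Pr(Y \mid do(Z), do(X)) = \Pr(Y \mid do(Z))$$
[Figure: DAGs $G$, $G_{\overline{X}\underline{Z}}$, and $G_{\overline{X}\,\overline{Z}}$ on nodes X, Z, U, Y.]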
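SLIDE 44
Using the Do-Calculus: Task 4 – Compute Pr(Y|do(X))
$$
\begin{aligned}
\therefore \Pr(Y \mid do(X))
&= \sum_Z \Pr(Y \mid Z, do(X)) \Pr(Z \mid do(X)) \\
&= \sum_Z \Pr(Y \mid do(Z), do(X)) \Pr(Z \mid do(X)) \\
&= \sum_Z \Pr(Y \mid do(Z)) \Pr(Z \mid do(X)) \\
&= \sum_Z \Big( \sum_{X'} \Pr(Y \mid X', Z) \Pr(X') \Big) \Pr(Z \mid X)
\end{aligned}
$$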
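The final expression can be checked numerically against the truncated factorization. The sketch below assumes that G has edges U → X, U → Y, X → Z, Z → Y with U unobserved, which is consistent with Tasks 1 to 3 but is an assumption about the figure; all probability tables and function names are made up for illustration.

```python
from itertools import product

# Assumed binary model with edges U -> X, U -> Y, X -> Z, Z -> Y.
p_u = {0: 0.6, 1: 0.4}
p_x1_given_u = {0: 0.2, 1: 0.8}      # Pr(X = 1 | U)
p_z1_given_x = {0: 0.3, 1: 0.9}      # Pr(Z = 1 | X)
p_y1_given_zu = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}  # Pr(Y = 1 | Z, U)

def bern(p_one, value):
    """Probability that a binary variable with Pr(1) = p_one equals `value`."""
    return p_one if value == 1 else 1.0 - p_one

def full_joint(u, x, z, y):
    return (p_u[u] * bern(p_x1_given_u[u], x)
            * bern(p_z1_given_x[x], z) * bern(p_y1_given_zu[(z, u)], y))

# Observed distribution over (X, Z, Y): the unobserved U is summed out.
obs = {(x, z, y): sum(full_joint(u, x, z, y) for u in (0, 1))
       for x, z, y in product((0, 1), repeat=3)}

def pr(event):
    """Probability of the observed outcomes (x, z, y) that satisfy `event`."""
    return sum(p for xzy, p in obs.items() if event(*xzy))

def truth(x):
    """Pr(Y = 1 | do(X) = x) by truncated factorization, using U directly."""
    return sum(p_u[u] * bern(p_z1_given_x[x], z) * p_y1_given_zu[(z, u)]
               for u, z in product((0, 1), repeat=2))

def task4_formula(x):
    """Task 4 expression, computed from the observed distribution only."""
    total = 0.0
    for z in (0, 1):
        pz_x = pr(lambda a, b, c: a == x and b == z) / pr(lambda a, b, c: a == x)
        inner = sum(
            pr(lambda a, b, c: a == x2 and b == z and c == 1)
            / pr(lambda a, b, c: a == x2 and b == z)
            * pr(lambda a, b, c: a == x2)
            for x2 in (0, 1)
        )
        total += inner * pz_x
    return total

for x in (0, 1):
    print(x, round(truth(x), 6), round(task4_formula(x), 6))
```

Under these assumptions the two columns should agree for each value of x, even though `task4_formula` never touches the tables involving U.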
SLIDE 45
- 6. Do-Calculus and The Roy Model
SLIDE 46
Generalized Roy Model
The Generalized Roy Model stems from six variables:
1. V: unobserved confounding variable, not caused by any variable;
2. X: observed pre-treatment variables, caused by V;
3. Z: instrumental variable, caused by X;
4. T: treatment choice, caused by Z, V and X;
5. U: unobserved variable, caused by T, V and X;
6. Y: outcome of interest, caused by T, U and X.
SLIDE 47
Generalized Roy Model
[Figure: DAG of the Generalized Roy Model with nodes Z, T, Y, U, V, X.]
This figure represents causal relations of the Generalized Roy Model. Arrows represent direct causal relations. Circles represent unobserved variables. Squares represent observed variables.
SLIDE 48
Key Aspects of the Generalized Roy Model
1. T is caused by Z, V;
2. U mediates the effects of V on Y (that is, V causes U);
3. T and U cause Y; and
4. Z (instrument) is not caused by V, U and does not directly cause Y, U.
We are left to examine whether:
1. V causes X (or vice versa),
2. X causes Z (or vice versa),
3. X causes T,
4. X causes U,
5. T causes U, and
6. X causes Y.
The combinations of all these causal relations generate 144 possible models (Pinto, 2013).
SLIDE 49
Key Aspects of the Generalized Roy Model (Pinto, 2013)
[Figure: DAG on nodes Z, T, Y, U, V, X with solid arrows, dashed lines, and dashed arrows.]
Dashed lines denote causal relations that may not exist or, if they exist, whose causal direction can go either way. Dashed arrows denote causal relations that may not exist but, if they exist, must comply with the arrow direction.
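One reading under which the count of 144 works out is the following: the two relations that may run in either direction (V and X, X and Z) each have three states, and the four relations with a fixed direction each have two. The sketch below enumerates these combinations; the labels are illustrative and this reading is an assumption, not a statement from Pinto (2013).

```python
from itertools import product

either_direction = ["absent", "forward", "backward"]  # dashed lines
fixed_direction = ["absent", "present"]               # dashed arrows

# Hypothetical labels for the six undetermined causal relations.
options = {
    "V-X": either_direction,
    "X-Z": either_direction,
    "X->T": fixed_direction,
    "X->U": fixed_direction,
    "T->U": fixed_direction,
    "X->Y": fixed_direction,
}

models = list(product(*options.values()))
print(len(models))  # 3 * 3 * 2 * 2 * 2 * 2
```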
SLIDE 50
Marginalizing the Generalized Roy Model
Without loss of generality, we examine the identification of causal effects of the Generalized Roy Model using a simplified model that suppresses variables X and U. This simplification is usually called marginalization in the DAG literature (Koster 2002; Lauritzen 1996; Wermuth 2011).
SLIDE 51
Marginalizing the Generalized Roy Model
[Figure: DAG $G = G_{\overline{Z}}$ of the Marginalized Roy Model on nodes X, Z, U, Y.]
This figure represents causal relations of the Marginalized Roy Model. Arrows represent direct causal relations. Circles represent unobserved variables. Squares represent observed variables.
Note: Z is exogenous, thus conditioning on Z is equivalent to fixing Z.
SLIDE 52
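Examining the Marginalized Roy Model – 1/4
$Y \perp\!\!\!\perp Z$ in $G_{\overline{X}}$; by Rule 1, $\Pr(Y \mid do(X), Z) = \Pr(Y \mid do(X))$.
$Y \perp\!\!\!\perp Z$ in $G_{\overline{X}\,\overline{Z}}$; by Rule 3, $\Pr(Y \mid do(X), do(Z)) = \Pr(Y \mid do(X))$.
$Y \perp\!\!\!\perp Z \mid X$ in $G_{\overline{X}\underline{Z}}$; by Rule 2, $\Pr(Y \mid do(X), do(Z)) = \Pr(Y \mid do(X), Z)$.
[Figure: DAG $G_{\overline{X}} = G_{\overline{X}\,\overline{Z}} = G_{\overline{X}\underline{Z}}$ on nodes X, Z, U, Y.]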
SLIDE 53
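Examining the Marginalized Roy Model – 2/4
Under $G_{\underline{X}}$, $Y \not\perp\!\!\!\perp X$, thus Rule 2 does not apply.
Under $G_{\overline{Z}\underline{X}}$, $Y \not\perp\!\!\!\perp X \mid Z$, thus Rule 2 does not apply.
[Figure: DAG $G_{\underline{X}} = G_{\overline{Z}\underline{X}}$ on nodes X, Z, U, Y.]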
SLIDE 54
Examining the Marginalized Roy Model – 3/4
$G_{\underline{Z}}$ ⇒ $Y \perp\!\!\!\perp Z$, thus by Rule 2, $\Pr(Y \mid do(Z)) = \Pr(Y \mid Z)$.
[Figure: DAG $G_{\underline{Z}}$ on nodes X, Z, U, Y.]
SLIDE 55
Examining the Marginalized Roy Model – 4/4
Under $G_{\underline{X}\,\underline{Z}}$, $Y \not\perp\!\!\!\perp (X, Z)$, thus Rule 2 does not apply.
[Figure: DAG $G_{\underline{X}\,\underline{Z}}$ on nodes X, Z, U, Y.]
SLIDE 56
The Do-Calculus applied to the Marginalized Roy Model generates:
1. Pr(Y|do(X), do(Z)) = Pr(Y|do(X), Z) = Pr(Y|do(X)),
2. Pr(Y|do(Z)) = Pr(Y|Z).
These relations only corroborate the exogeneity of the instrumental variable Z and are not sufficient to identify Pr(Y|do(X)).
Identification of the Roy Model
To identify the Roy Model, we make assumptions on how Z impacts X, i.e. monotonicity/separability. These assumptions cannot be represented in a DAG: they concern how Z causes X, not only whether Z causes X.
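A small numerical illustration of the two facts above is sketched here. It assumes the Marginalized Roy Model has edges Z → X, U → X, U → Y, X → Y with binary variables, and uses made-up probability tables; the function names and index constants are illustrative. The code checks relation 2 and shows that conditioning on X does not recover Pr(Y|do(X)) in this example. It does not by itself prove non-identifiability.

```python
from itertools import product

# Assumed binary model with edges Z -> X, U -> X, U -> Y, X -> Y.
p_z = {0: 0.5, 1: 0.5}
p_u = {0: 0.6, 1: 0.4}
p_x1_given_zu = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.9}  # Pr(X = 1 | Z, U)
p_y1_given_xu = {(0, 0): 0.2, (0, 1): 0.7, (1, 0): 0.4, (1, 1): 0.8}  # Pr(Y = 1 | X, U)

Z_INDEX, X_INDEX = 0, 2  # positions in the tuple (z, u, x, y)

def bern(p_one, value):
    """Probability that a binary variable with Pr(1) = p_one equals `value`."""
    return p_one if value == 1 else 1.0 - p_one

def joint(z, u, x, y):
    return (p_z[z] * p_u[u] * bern(p_x1_given_zu[(z, u)], x)
            * bern(p_y1_given_xu[(x, u)], y))

def cond_y1(index, value):
    """Pr(Y = 1 | variable at `index` equals `value`), by ordinary conditioning."""
    num = den = 0.0
    for outcome in product((0, 1), repeat=4):
        if outcome[index] == value:
            p = joint(*outcome)
            den += p
            num += p * outcome[3]
    return num / den

def do_z(z):
    """Pr(Y = 1 | do(Z) = z): the factor Pr(Z) is deleted."""
    return sum(p_u[u] * bern(p_x1_given_zu[(z, u)], x) * p_y1_given_xu[(x, u)]
               for u, x in product((0, 1), repeat=2))

def do_x(x):
    """Pr(Y = 1 | do(X) = x): the factor Pr(X | Z, U) is deleted."""
    return sum(p_u[u] * p_y1_given_xu[(x, u)] for u in (0, 1))

print(round(do_z(1), 4), round(cond_y1(Z_INDEX, 1), 4))
print(round(do_x(1), 4), round(cond_y1(X_INDEX, 1), 4))
```

With these tables, fixing and conditioning agree for Z but disagree for X (0.56 versus 0.65 at x = 1), because U confounds the relation between X and Y.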