 
Efficient Least Squares for Estimating Total Causal Effects
Richard Guo, Emilija Perković
Pacific Causal Inference Conference, 2020
Department of Statistics, University of Washington, Seattle

Highlights
• We consider estimating a total causal effect from observational data.
• We assume
  • Linearity: data is generated from a linear structural equation model.
  • Causal sufficiency: no unobserved confounding, no selection bias.
  • The causal DAG is known up to a Markov equivalence class with additional background knowledge.
• We present a least squares estimator that is
  • Complete: applicable whenever the effect is identified,
  • Efficient: relative to a large class of estimators, which is the first of its kind in the literature.

Causal DAG, linear SEM
[Figure: causal DAG D on the nodes S, A, Z, W, Y, T]
Suppose D is the underlying causal DAG. D is unknown.
Suppose data is generated by a linear structural equation model (SEM)
$$X_v = \sum_{u : u \to v} \gamma_{uv} X_u + \epsilon_v, \qquad E\,\epsilon_v = 0, \quad 0 < \operatorname{var} \epsilon_v < \infty.$$
Under causal sufficiency, the errors are mutually independent (no i ↔ j in the path diagram).

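To make the SEM concrete, here is a minimal simulation sketch. Everything numeric is an assumption for illustration: the edge weights are made up, and only the edges A → Z, Z → W, A → W, W → Y are implied by the slides; the edges touching S and T (S → A, S → Y, T → Y) are hypothetical. Gaussian errors are used for convenience, although the model only requires mean zero and finite variance.

```python
import numpy as np

# Nodes listed in a topological order of the assumed DAG.
NODES = ["S", "T", "A", "Z", "W", "Y"]

# Hypothetical edge weights gamma[(u, v)] for each edge u -> v.
# The S and T edges are assumptions, not taken from the slides.
GAMMA = {
    ("S", "A"): 0.8, ("S", "Y"): 0.5, ("T", "Y"): 0.6,
    ("A", "Z"): 1.0, ("Z", "W"): -0.7, ("A", "W"): 0.4, ("W", "Y"): 1.5,
}

def simulate_sem(n, seed=0):
    """Draw n samples: X_v = sum over parents u of gamma_uv * X_u + eps_v."""
    rng = np.random.default_rng(seed)
    X = {}
    for v in NODES:
        eps = rng.standard_normal(n)  # mutually independent errors, mean 0
        X[v] = eps + sum(g * X[u] for (u, w), g in GAMMA.items() if w == v)
    return X

data = simulate_sem(1000)
```

Generating each variable in topological order guarantees that every parent is available before its child is computed.
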
Total effect
Suppose we want to estimate the total (causal) effect of A on Y.
[Figure: causal DAG on the nodes S, A, Z, W, Y, T]
☞ The total effect $\tau_{AY}$ is defined as the slope of $x_a \mapsto E[X_Y \mid do(X_A = x_a)]$, given by a sum-product of Wright (1934):
$$\tau_{AY} = \frac{\partial}{\partial x_a} E[X_Y \mid do(X_A = x_a)] = (\gamma_{AZ}\gamma_{ZW} + \gamma_{AW})\,\gamma_{WY}.$$
Here we consider a point intervention (|A| = 1) for simplicity. For a joint intervention (|A| > 1), the total effect can be defined similarly.

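The sum-product rule can be checked numerically. The sketch below computes the total effect two ways: by recursively summing edge-weight products over directed paths, and via the matrix identity $(I - \Gamma)^{-1} = I + \Gamma + \Gamma^2 + \dots$, which holds because the weight matrix of a DAG is nilpotent. The edge weights and the edges touching S and T are hypothetical values chosen for illustration.

```python
import numpy as np

NODES = ["S", "T", "A", "Z", "W", "Y"]
IDX = {v: i for i, v in enumerate(NODES)}

# Hypothetical weights; only A->Z, Z->W, A->W, W->Y come from the slides.
GAMMA = {
    ("S", "A"): 0.8, ("S", "Y"): 0.5, ("T", "Y"): 0.6,
    ("A", "Z"): 1.0, ("Z", "W"): -0.7, ("A", "W"): 0.4, ("W", "Y"): 1.5,
}

def total_effect_paths(a, y):
    """Sum over directed paths a -> ... -> y of the product of edge weights."""
    if a == y:
        return 1.0
    return sum(g * total_effect_paths(v, y)
               for (u, v), g in GAMMA.items() if u == a)

def total_effect_matrix(a, y):
    """Entry (a, y) of (I - Gamma)^{-1}, where Gamma[u, v] = gamma_uv."""
    p = len(NODES)
    G = np.zeros((p, p))
    for (u, v), g in GAMMA.items():
        G[IDX[u], IDX[v]] = g
    return np.linalg.inv(np.eye(p) - G)[IDX[a], IDX[y]]

tau_paths = total_effect_paths("A", "Y")
tau_matrix = total_effect_matrix("A", "Y")
tau_formula = (GAMMA["A", "Z"] * GAMMA["Z", "W"] + GAMMA["A", "W"]) * GAMMA["W", "Y"]
```

All three quantities agree up to floating-point error, since the only directed paths from A to Y are A → Z → W → Y and A → W → Y.
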
Markov equivalence, CPDAG
Without making further assumptions, the causal DAG D can only be identified from the observed distribution up to a Markov equivalence class.
The Markov equivalence class of D is uniquely represented by a CPDAG/essential graph C.
[Figure: CPDAG C on the nodes S, A, Z, W, Y, T]
☞ Knowing only C is often insufficient to identify the total effect.

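As a reminder of what Markov equivalence means operationally, the sketch below uses the well-known graphical characterization: two DAGs are Markov equivalent exactly when they share the same skeleton and the same v-structures. The function names and the three-node toy graphs are hypothetical and are not the graph from the slides.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of the edge set."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Triples (a, v, b) with a -> v <- b and a, b non-adjacent."""
    skel = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    found = set()
    for v, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, v, b))
    return found

def markov_equivalent(edges1, edges2):
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))

chain = [("A", "B"), ("B", "C")]        # A -> B -> C
reversed_chain = [("C", "B"), ("B", "A")]  # A <- B <- C
collider = [("A", "B"), ("C", "B")]     # A -> B <- C
```

The two chains are equivalent, so the direction of their edges cannot be learned from the observed distribution alone, whereas the collider forms its own class.
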
Identifiability from a partially directed graph
Theorem (Perković, 2020) The total effect $\tau_{AY}$ is identified from a maximally oriented partially directed acyclic graph G if and only if there is no proper, possibly causal path from A to Y in G that starts with an undirected edge.
[Figure: partially directed graph on the nodes S, A, Z, W, Y, T]
☞ In the unidentified case, see also the IDA algorithms (Maathuis, Kalisch, and Bühlmann, 2009; Nandy, Maathuis, and Richardson, 2017), which enumerate the possible total effects.

Background knowledge, MPDAG
However, we often have additional knowledge that can help towards identification.
☞ Suppose we know that S temporally precedes A.
[Figure: partially directed graph on the nodes S, A, Z, W, Y, T, with the newly implied orientations drawn in green]
The green orientations are further implied by the rules of Meek (1995).
☞ In this example, $\tau_{AY}$ is identified from the resulting maximally oriented partially directed acyclic graph (MPDAG) G.

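To illustrate how one orientation can force others, here is a sketch of the first of Meek's orientation rules only: if a → b, b — c, and a and c are not adjacent, then orient b → c (otherwise a new v-structure would appear). The remaining rules are omitted, so this is not a full MPDAG construction; the function name and the toy graph are hypothetical.

```python
def meek_rule_1(directed, undirected):
    """Repeatedly apply: a -> b, b - c, a and c non-adjacent => b -> c."""
    directed = set(directed)
    undirected = {frozenset(e) for e in undirected}

    def adjacent(x, y):
        return ((x, y) in directed or (y, x) in directed
                or frozenset((x, y)) in undirected)

    changed = True
    while changed:
        changed = False
        for a, b in list(directed):
            for e in list(undirected):
                if b not in e:
                    continue
                (c,) = e - {b}  # the other endpoint of b - c
                if c != a and not adjacent(a, c):
                    undirected.discard(e)
                    directed.add((b, c))
                    changed = True
    return directed, undirected

# Toy graph: background knowledge gives S -> A; edges A - Z and Z - W are undirected.
oriented, remaining = meek_rule_1([("S", "A")], [("A", "Z"), ("Z", "W")])
```

In the toy graph, S → A forces A → Z, which in turn forces Z → W, so a single piece of background knowledge propagates along the chain.
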
Adjustment estimator
Our task is to estimate $\tau_{AY}$ from n i.i.d. observations generated by a linear SEM associated with the causal DAG D, given that D ∈ [G] for an MPDAG G and that $\tau_{AY}$ is identifiable from G.
[Figure: MPDAG G on the nodes S, A, Z, W, Y, T]
☞ Adjustment estimator: $\hat{\tau}^{\mathrm{adj}}_{AY}$ is the least squares coefficient of A from Y ∼ A + S.

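The regression Y ∼ A + S can be sketched with ordinary least squares on simulated data. The data-generating weights and the edges touching S and T (S → A, S → Y, T → Y) are assumptions made for illustration; under them, S is a valid adjustment set and the true total effect is (γ_AZ γ_ZW + γ_AW) γ_WY.

```python
import numpy as np

def simulate(n, seed=0):
    """Linear SEM with hypothetical weights; S -> A, S -> Y, T -> Y are assumed."""
    rng = np.random.default_rng(seed)
    e = {v: rng.standard_normal(n) for v in "STAZWY"}
    S, T = e["S"], e["T"]
    A = 0.8 * S + e["A"]
    Z = 1.0 * A + e["Z"]
    W = -0.7 * Z + 0.4 * A + e["W"]
    Y = 1.5 * W + 0.5 * S + 0.6 * T + e["Y"]
    return S, A, Y

def adjustment_estimate(y, a, covariates):
    """Least squares coefficient of a in the regression y ~ 1 + a + covariates."""
    design = np.column_stack([np.ones_like(a), a] + list(covariates))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

S, A, Y = simulate(20_000)
tau_adj = adjustment_estimate(Y, A, [S])   # Y ~ A + S
tau_naive = adjustment_estimate(Y, A, [])  # Y ~ A, confounded by S
tau_true = (1.0 * -0.7 + 0.4) * 1.5
```

With these assumed weights, `tau_adj` should land close to `tau_true`, while `tau_naive` is biased because S affects both A and Y.
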
Adjustment estimator
Adjustment Y ∼ A + S can be justified by looking at the elements of [G].
[Figure: three DAGs in [G], each on the nodes S, A, Z, W, Y, T]
Adjustment estimator
• may not exist when |A| > 1.
• may not be unique.
• The most efficient adjustment estimator was recently characterized by Henckel, Perković, and Maathuis (2019) and Witte et al. (2020).