
Foundations of Causal Discovery
Frederick Eberhardt, KDD Causality Workshop 2016

Causal Discovery: from a data sample over variables x, y, z, w, together with assumptions (e.g. causal Markov, causal faithfulness, functional form, etc.), infer the causal structure.


  1. The independence constraints x ⊥̸⊥ z, y ⊥̸⊥ z, x ⊥⊥ y are sufficient to determine the equivalence class, in this case a unique causal graph (the collider x → z ← y). [Figure: the candidate graphs over x, y, z ruled out by the constraints.] For linear Gaussian and for multinomial causal relations, an algorithm that identifies the Markov equivalence class of the true model is complete. (Pearl & Geiger 1988, Meek 1995)
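To make this concrete, here is a minimal sketch (my own illustration, not part of the slides; the variable names, sample size and significance threshold are arbitrary) that generates data from the collider x → z ← y and recovers exactly these constraints with simple (partial) correlation tests:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = rng.normal(size=n)
z = x + y + 0.5 * rng.normal(size=n)   # collider: x -> z <- y

def indep(a, b, alpha=0.01):
    """Marginal independence test via Pearson correlation."""
    r, p = stats.pearsonr(a, b)
    return p > alpha

def indep_given(a, b, c, alpha=0.01):
    """Conditional independence test via partial correlation (regress out c)."""
    ra = a - np.polyval(np.polyfit(c, a, 1), c)
    rb = b - np.polyval(np.polyfit(c, b, 1), c)
    r, p = stats.pearsonr(ra, rb)
    return p > alpha

print("x indep y     :", indep(x, y))           # expected: True  (x independent of y)
print("x indep z     :", indep(x, z))           # expected: False (x dependent on z)
print("y indep z     :", indep(y, z))           # expected: False (y dependent on z)
print("x indep y | z :", indep_given(x, y, z))  # expected: False (conditioning on the collider opens the path)
```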

  2-5. Staying in business
• Weaken the assumptions (and increase the equivalence class)
  - allow for unmeasured common causes
  - allow for cycles
  - weaken faithfulness [Zhalama talk]
• Exclude the limitations (and reduce the equivalence class)
  - restrict to non-Gaussian error distributions [Tank talk]
  - restrict to non-linear causal relations
  - restrict to specific discrete parameterizations
• Include more general data collection set-ups (and see how assumptions can be adjusted and what equivalence class results)
  - experimental evidence
  - multiple (overlapping) data sets
  - relational data

  6. Limitations: For linear Gaussian and for multinomial causal relations, an algorithm that identifies the Markov equivalence class of the true model is complete. (Pearl & Geiger 1988, Meek 1995)

  7-8. Linear non-Gaussian method (LiNGaM)
• Linear causal relations: x_i = ∑_{x_j ∈ Pa(x_i)} β_ij x_j + ε_i
• Assumptions: causal Markov, causal sufficiency, acyclicity
‣ If the error terms ε_i are non-Gaussian, then the true graph is uniquely identifiable from the joint distribution. [Shimizu et al., 2006]
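As a small illustration of this model class (not from the slides; the coefficients, noise law and variable names are made up), the following sketch generates data from a three-variable linear SEM with uniform, hence non-Gaussian, errors:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10000

# Acyclic linear SEM with non-Gaussian (uniform) error terms:
#   x1 = e1
#   x2 = 0.8 * x1 + e2
#   x3 = -0.5 * x1 + 1.2 * x2 + e3
e = rng.uniform(-1, 1, size=(n, 3))
x1 = e[:, 0]
x2 = 0.8 * x1 + e[:, 1]
x3 = -0.5 * x1 + 1.2 * x2 + e[:, 2]
data = np.column_stack([x1, x2, x3])
# Under the non-Gaussianity assumption, the DAG x1 -> x2 -> x3 plus x1 -> x3 is
# uniquely identifiable from the joint distribution of `data` (Shimizu et al., 2006);
# the full LiNGaM algorithm recovers it via ICA.
```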

  9-14. Two variable case
True model (x → y, with error terms ε_x, ε_y):  y = β x + ε_y,  with x ⊥⊥ ε_y.
Backwards model (y → x, with error terms ε̃_x, ε̃_y):  x = θ y + ε̃_x,  which requires y ⊥⊥ ε̃_x.
But  ε̃_x = x − θ y = x − θ(β x + ε_y) = (1 − θβ) x − θ ε_y.

  15-16. Why Normals are unusual
Forwards model:  y = β x + ε_y.
For the backwards model:  ε̃_x = (1 − θβ) x − θ ε_y. Can this be independent of y = β x + ε_y?
Theorem 1 (Darmois-Skitovich): Let X_1, ..., X_n be independent, non-degenerate random variables. If two linear combinations l_1 = a_1 X_1 + ... + a_n X_n and l_2 = b_1 X_1 + ... + b_n X_n, with a_i ≠ 0 and b_i ≠ 0 for all i, are independent, then each X_i is normally distributed.
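The following sketch illustrates the consequence of the theorem for the two-variable case (my own illustration, not the LiNGaM algorithm itself, which works via ICA): regress in both directions and check whether the residual is independent of the regressor. The crude dependence score below, a correlation between squared centered values, stands in for a proper independence measure such as HSIC.

```python
import numpy as np

def dependence_score(a, b):
    """Crude nonlinear dependence measure: |corr(a^2, b^2)| after centering.
    Near zero when a and b are independent; a stand-in for HSIC or mutual information."""
    a = a - a.mean()
    b = b - b.mean()
    return abs(np.corrcoef(a**2, b**2)[0, 1])

def residual(target, regressor):
    """OLS residual of target regressed on regressor (with intercept)."""
    beta = np.polyfit(regressor, target, 1)
    return target - np.polyval(beta, regressor)

rng = np.random.default_rng(2)
n = 20000
x = rng.uniform(-1, 1, size=n)             # non-Gaussian cause
y = 1.0 * x + rng.uniform(-1, 1, size=n)   # true model: x -> y

forward_score  = dependence_score(residual(y, x), x)   # residual of y = beta*x + eps_y
backward_score = dependence_score(residual(x, y), y)   # residual of x = theta*y + eps_x

print(f"x -> y residual dependence: {forward_score:.3f}")   # close to 0
print(f"y -> x residual dependence: {backward_score:.3f}")  # clearly > 0
# The direction with the (near-)independent residual is the causal one, here x -> y,
# exactly as Darmois-Skitovich predicts for non-Gaussian errors.
```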

  17-19. Algorithms and their assumptions

| algorithm / assumption | PC / GES | FCI | CCD | LiNGaM | lvLiNGaM | cyclic LiNGaM |
| Markov                 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| faithfulness           | ✓ | ✓ | ✓ | ✓ | ~ | ✗ |
| causal sufficiency     | ✓ | ✗ | ✓ | ✓ | ✗ | ✓ |
| acyclicity             | ✓ | ✓ | ✗ | ✓ | ✓ | ✗ |
| parametric assumption  | ✗ | ✗ | ✗ | linear non-Gaussian | linear non-Gaussian | linear non-Gaussian |
| output                 | Markov equivalence class | PAG | PAG | unique DAG | set of DAGs | set of graphs |

  20-21. Limitations: For linear Gaussian and for multinomial causal relations, an algorithm that identifies the Markov equivalence class of the true model is complete. (Pearl & Geiger 1988, Meek 1995)

  22-23. Bivariate Linear Gaussian case
True model:  x = ε_x,  y = x + ε_y,  with ε_x, ε_y ∼ independent Gaussian.
[Figure: scatter of the data with the conditional fits p(y | x) for the forwards (true) model and p(x | y) for the backwards model; graphics from Hoyer et al. 2009.]
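Reusing the illustrative dependence_score and residual helpers from the two-variable sketch above, the same check run with Gaussian noise shows the non-identifiability directly: both directions leave residuals that look independent.

```python
import numpy as np
# assumes dependence_score() and residual() from the earlier sketch are in scope

rng = np.random.default_rng(3)
n = 20000
x = rng.normal(size=n)          # Gaussian cause
y = x + rng.normal(size=n)      # true model x -> y, Gaussian noise

print(dependence_score(residual(y, x), x))  # ~0: forward model fits
print(dependence_score(residual(x, y), y))  # ~0 as well: backward model also fits
# With Gaussian errors both regressions yield independent-looking residuals,
# so the direction cannot be read off the joint distribution.
```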

  24-27. Continuous additive noise models
x_j = f_j(pa(x_j)) + ε_j
• If f_j(.) is linear, then non-Gaussian errors are required for identifiability.
➡ What if the errors are Gaussian, but f_j(.) is non-linear?
➡ More generally, under what circumstances is the causal structure represented by this class of models identifiable?

  28-31. Bivariate non-linear Gaussian additive noise model
True model:  x = ε_x,  y = x + x³ + ε_y,  with ε_x, ε_y ∼ independent Gaussian.
[Figure: the forwards (true) model fit p(y | x) versus the backwards model fit p(x | y); graphics from Hoyer et al. 2009.]
Any backwards model x = g(y) + ε̃_x has y ⊥̸⊥ ε̃_x: no additive noise model fits in the reverse direction.
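A minimal sketch of the corresponding test for this non-linear Gaussian example (again my own illustration, not Hoyer et al.'s procedure, which uses Gaussian-process regression and an HSIC test): fit a flexible regression in each direction and compare how strongly the residuals depend on the regressor.

```python
import numpy as np

def dependence_score(a, b):
    """Crude dependence measure: |corr(a^2, b^2)| after centering; ~0 under independence."""
    a, b = a - a.mean(), b - b.mean()
    return abs(np.corrcoef(a**2, b**2)[0, 1])

def nonlinear_residual(target, regressor, degree=5):
    """Residual of a polynomial regression, a simple stand-in for GP regression."""
    coeffs = np.polyfit(regressor, target, degree)
    return target - np.polyval(coeffs, regressor)

rng = np.random.default_rng(4)
n = 20000
x = rng.normal(size=n)
y = x + x**3 + rng.normal(size=n)   # true model of the slide: x -> y, Gaussian noise

print(dependence_score(nonlinear_residual(y, x), x))  # ~0: additive noise model fits forwards
print(dependence_score(nonlinear_residual(x, y), y))  # > 0: no additive noise model backwards
```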

  32-37. General non-linear additive noise models
Hoyer Condition (HC): a technical condition on the relation between the function, the noise distribution and the parent distribution that, if satisfied, permits a backward model.
• If the error terms are Gaussian, then the only functional form that satisfies HC is linearity; otherwise the model is identifiable.
• If the errors are non-Gaussian, then there are (rather contrived) functions that satisfy HC, but in general identifiability is guaranteed.
  - this generalizes to multiple variables (assuming minimality*)
  - extension to discrete additive noise models
• If the function is linear but the error terms non-Gaussian, then one can't fit a linear backwards model (LiNGaM), but there are cases where one can fit a non-linear backwards model.

  38-39. Algorithms and their assumptions (extended)

| algorithm / assumption | PC / GES | FCI | CCD | LiNGaM | lvLiNGaM | cyclic LiNGaM | non-linear additive noise |
| Markov                 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| faithfulness           | ✓ | ✓ | ✓ | ✓ | ~ | ✗ | minimality |
| causal sufficiency     | ✓ | ✗ | ✓ | ✓ | ✗ | ✓ | ✓ |
| acyclicity             | ✓ | ✓ | ✗ | ✓ | ✓ | ✗ | ✓ |
| parametric assumption  | ✗ | ✗ | ✗ | linear non-Gaussian | linear non-Gaussian | linear non-Gaussian | non-linear additive noise |
| output                 | Markov equivalence class | PAG | PAG | unique DAG | set of DAGs | set of graphs | unique DAG |

  40-51. Experiments, Background Knowledge and all the other Jazz
• How to integrate data from experiments? [Figure: observational and experimental data over x, y, z with latent variables l1, l2]
• How to include background knowledge?
  - pathways (e.g. a known pathway from x to w)
  - tier orderings (e.g. x, z < y, w)
  - "priors" over edges (e.g. on a graph over x, y, z, w)
• Specific search space restrictions
  - biological settings
  - subsampled time series [Tank talk]
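As a small illustration of how such background knowledge restricts the search space (the variable names and tier assignment are hypothetical), a tier ordering like x, z < y, w simply forbids every edge from a later tier into an earlier one before any test is run:

```python
from itertools import permutations

# Hypothetical tier ordering: x and z come before y and w  (x, z < y, w)
tier = {"x": 0, "z": 0, "y": 1, "w": 1}
variables = list(tier)

candidate_edges = [(a, b) for a, b in permutations(variables, 2)]
allowed_edges   = [(a, b) for a, b in candidate_edges if tier[a] <= tier[b]]
forbidden_edges = [(a, b) for a, b in candidate_edges if tier[a] > tier[b]]

print("forbidden:", forbidden_edges)   # edges such as ('y', 'x') or ('w', 'z') are never searched
# In constraint-based or SAT-based search these forbidden edges are simply
# fixed to 'absent' before any independence test is consulted.
```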

  52-59. High-Level
• data: a sample over the variables x, y, z, w
• assumptions, e.g. causal Markov, causal faithfulness, etc.
• background knowledge, e.g. pathways, tier orderings, "priors", etc.
• setting, e.g. time series, internal latent structures, etc.
From the samples, derive (in)dependence constraints of the form x ⊥̸⊥ y | C || J (a test result for x and y given conditioning set C in data set / experimental setting J). Encode these as logical constraints on the underlying graph structure and solve with a (max)SAT-solver.

  60-64. SAT-based Causal Discovery
• Formulate the independence constraints in propositional logic, e.g. x ⊥⊥ y ⇔ ¬A ∧ ¬B ∧ ..., where A = 'x → y is present'.
• Encode all constraints into one formula: ¬A ∧ ¬B ∧ ¬(C ∧ D) ∧ ¬...
• Find satisfying assignments using a SAT-solver: A = false, B = false, ... ⇔ a graph over x, y, z.
➡ Very general setting (allows for cycles and latents) and trivially complete.
➡ BUT: erroneous test results induce conflicting constraints: UNsatisfiable.
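The sketch below is a brute-force stand-in for this encoding, assuming causal sufficiency and acyclicity over three variables (a real implementation would hand clauses over edge variables to a SAT or MaxSAT solver): every candidate graph is kept only if its d-separations reproduce the observed (in)dependence constraints.

```python
from itertools import combinations, product

NODES = ("x", "y", "z")
PAIRS = list(combinations(NODES, 2))    # ('x','y'), ('x','z'), ('y','z')

def all_dags():
    """Enumerate every DAG over the three nodes (at most one edge per pair)."""
    for choice in product((None, "->", "<-"), repeat=len(PAIRS)):
        g = set()
        for (a, b), c in zip(PAIRS, choice):
            if c == "->":
                g.add((a, b))
            elif c == "<-":
                g.add((b, a))
        # drop the two directed 3-cycles; every other edge assignment is acyclic
        if ({("x", "y"), ("y", "z"), ("z", "x")} <= g
                or {("y", "x"), ("z", "y"), ("x", "z")} <= g):
            continue
        yield frozenset(g)

def d_separated(g, a, b, cond):
    """Exact d-separation for 3-node DAGs: the only indirect path runs through the
    remaining node m, so blocking reduces to the collider / non-collider case."""
    if (a, b) in g or (b, a) in g:
        return False                              # adjacent: never separated
    m = next(n for n in NODES if n not in (a, b))
    adjacent_to_both = (((a, m) in g or (m, a) in g) and
                        ((b, m) in g or (m, b) in g))
    if not adjacent_to_both:
        return True                               # no connecting path at all
    collider = (a, m) in g and (b, m) in g
    return (not collider) if m in cond else collider

def consistent_dags(constraints):
    """SAT-solver stand-in: keep every DAG whose d-separations reproduce the
    observed constraints, given as (a, b, conditioning set, independent?)."""
    return [g for g in all_dags()
            if all(d_separated(g, a, b, cond) == indep
                   for a, b, cond, indep in constraints)]

# Constraints from the opening example: x and z dependent, y and z dependent,
# x and y independent.
constraints = [("x", "z", frozenset(), False),
               ("y", "z", frozenset(), False),
               ("x", "y", frozenset(), True)]
for g in consistent_dags(constraints):
    print(sorted(g))    # [('x', 'z'), ('y', 'z')] : only the collider x -> z <- y
```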

  65. Conflicts and Errors
• Statistical independence tests produce errors.
➡ Conflict: no graph can produce the full set of constraints, e.g. the constraints x ⊥̸⊥ z, y ⊥̸⊥ z, x ⊥⊥ y (which identify the collider x → z ← y) together with an erroneous result such as x ⊥⊥ y | z, which that graph cannot generate.
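Continuing the brute-force sketch above (it assumes consistent_dags is in scope), adding such an erroneous constraint to the collider-identifying set leaves no consistent graph, the analogue of the SAT formula becoming unsatisfiable:

```python
# assumes consistent_dags() from the sketch above is in scope
conflicting = [("x", "z", frozenset(), False),
               ("y", "z", frozenset(), False),
               ("x", "y", frozenset(), True),
               ("x", "y", frozenset({"z"}), True)]   # erroneous extra test result
print(consistent_dags(conflicting))   # [] : no graph satisfies all constraints (UNSAT)
```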
