SLIDE 1

Overview Causality Framework Structure Learning GraN-DAG & ext. Conclusion

Learning Causal Structures via Gradient-Based Optimization

Sébastien Lachapelle

Mila, Université de Montréal

March 4th, 2020

Sébastien Lachapelle Mila EAI Science Talk March 4th, 2020 1 / 40

SLIDE 2

Overview

Causality Framework
- Causal Graphical Models
- Motivating example
- Markov Equivalence and Structure Identifiability

Causal Structure Learning
- Problem formulation
- Discrete Search Algorithms
- Gradient-Based Algorithms

GraN-DAG & extensions
- The algorithm
- With interventional data
- Neural Autoregressive Flows

SLIDE 3

Causal graphical models (CGM)

- Random vector X ∈ R^d (d variables)
- Let G be a directed acyclic graph (DAG)
- Assume p(x) = ∏_{i=1}^d p(x_i | x_{π_i^G}), where π_i^G = parents of i in G
- Encodes (conditional) independence statements (via d-separation, see [Koller & Friedman, 2009])
- Almost identical to Bayesian networks, but allows for interventional distributions: p(x | do(z))

Simple example: G = (V, E), the chain X → Z → Y

p(x, y, z) = p(x) p(z | x) p(y | z)  ⟹  p(x, y | z) = p(x | z) p(y | z), i.e. X ⊥ Y | Z

The do operator will be explained in the following example...
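The chain factorization can be checked numerically. A minimal sketch (the linear-Gaussian mechanisms are an assumption, not specified on the slide): sample X → Z → Y ancestrally, then verify that X and Y are marginally dependent but independent given Z, using the partial correlation, which is exactly zero in the linear case.

```python
import numpy as np

# Ancestral sampling of the chain X -> Z -> Y with linear-Gaussian mechanisms.
rng = np.random.default_rng(0)
n = 100_000
x = rng.standard_normal(n)
z = x + rng.standard_normal(n)   # p(z | x)
y = z + rng.standard_normal(n)   # p(y | z)

r_xy = np.corrcoef(x, y)[0, 1]
r_xz = np.corrcoef(x, z)[0, 1]
r_zy = np.corrcoef(z, y)[0, 1]
# partial correlation of X and Y given Z
pc = (r_xy - r_xz * r_zy) / np.sqrt((1 - r_xz**2) * (1 - r_zy**2))

print(abs(r_xy) > 0.3)   # True: marginally dependent
print(abs(pc) < 0.05)    # True: independent given Z (d-separated by Z)
```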

SLIDE 4

Why should you care: Kidney Stone Treatment

- T = Treatment ∈ {A, B}
- Z = Stone size ∈ {small, large}
- R = Patient recovered ∈ {0, 1}

(Example taken from Elements of Causal Inference by Peters et al., p. 111)

SLIDE 6

Why should you care: Kidney Stone Treatment

Pay attention to these two questions, assuming the size of your stone is unknown:
- What is your chance of recovery knowing that the doctor gave you treatment A?
- What is your chance of recovery if you decide to take treatment A?

SLIDE 8

Why should you care: Kidney Stone Treatment

T = Treatment ∈ {A, B}, Z = Stone size ∈ {small, large}, R = Patient recovered ∈ {0, 1}

What is your chance of recovery knowing that the doctor gave you treatment A?
- Knowing that your doctor gave you treatment A tells you that you probably have a large kidney stone: P(Z = large | T = A) = 0.75 ...
- ... which reduces your chance of recovery: P(R = 1 | T = A, Z = large) = 0.73 < 0.93 = P(R = 1 | T = A, Z = small)

What is your chance of recovery if you decide to take treatment A?
- You really don't know anything about your kidney stone
- Your taking treatment A is not a function of any variable

SLIDE 11

Why should you care: Kidney Stone Treatment

T = Treatment ∈ {A, B}, Z = Stone size ∈ {small, large}, R = Patient recovered ∈ {0, 1}

What is your chance of recovery knowing that the doctor gave you treatment A?
- P(R = 1 | T = A) = 0.78
- P(R = 1 | T = B) = 0.83

What is your chance of recovery if you decide to take treatment A?
- P(R = 1 | do(T = A)) = 0.832
- P(R = 1 | do(T = B)) = 0.782

But how do we compute these interventional distributions?!

SLIDE 12

Why should you care: Kidney Stone Treatment

T = Treatment ∈ {A, B}, Z = Stone size ∈ {small, large}, R = Patient recovered ∈ {0, 1}

P(R, Z | do(T = A)) = P(R | Z, T = A) P(Z)

The factor P(T = A | Z) is gone: the decision of taking treatment A does not depend on Z anymore.

Then simply marginalize as usual:

P(R = 1 | do(T = A)) = Σ_Z P(R = 1, Z | do(T = A)) = Σ_Z P(R = 1 | Z, T = A) P(Z) = 0.832
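The adjustment above is a one-liner in code. A minimal sketch: the conditional recovery rates and P(Z | T = A) come from the slides; the stone-size marginal P(Z) and the treatment-B conditionals are assumptions chosen to round out the example (they reproduce the quoted 0.78 / 0.832 numbers).

```python
# Backdoor adjustment for the kidney-stone example (graph: Z -> T, Z -> R, T -> R).
p_r_given_tz = {("A", "small"): 0.93, ("A", "large"): 0.73,   # from the slides
                ("B", "small"): 0.87, ("B", "large"): 0.69}   # assumed
p_z = {"small": 0.51, "large": 0.49}                          # assumed marginal
p_z_given_t = {"A": {"small": 0.25, "large": 0.75},           # P(Z=large|T=A)=0.75
               "B": {"small": 0.77, "large": 0.23}}           # assumed

def p_recover_obs(t):
    # P(R=1 | T=t): observing the treatment is evidence about the stone size
    return sum(p_r_given_tz[(t, z)] * p_z_given_t[t][z] for z in p_z)

def p_recover_do(t):
    # P(R=1 | do(T=t)): the intervention cuts Z -> T, so marginalize over P(Z)
    return sum(p_r_given_tz[(t, z)] * p_z[z] for z in p_z)

print(round(p_recover_obs("A"), 2))  # 0.78
print(round(p_recover_do("A"), 3))   # 0.832
```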

SLIDE 14

Structure Learning

In the kidney stone example, the causal graph was known. What if we don't have it? Learn it!

Purely observational data:

          X1     X2     X3
sample 1  1.76   10.46  0.002
sample 2  3.42   78.6   0.011
...
sample n  4.56   9.35   1.96

Is it even possible?

SLIDE 16

Identifiability

In general, this is impossible without interventional data...

Multiple DAGs can express the same distribution...

SLIDE 18

Identifiability

If we assume causal mechanisms are "simple", then G can be identified...

An example (useful later!): if the data follow this model...

X_i | X_{π_i^G} ∼ N(f_i(X_{π_i^G}), σ_i²)

...then the correct causal DAG G can be identified from purely observational data (see [Peters et al., 2014] for proof and regularity conditions)
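Data from this additive-noise model is easy to simulate. A minimal sketch for a hypothetical chain X1 → X2 → X3, with tanh standing in for the nonlinear f_i (the referenced work samples f_i from a Gaussian Process; tanh is an assumption here):

```python
import numpy as np

# Sample from X_i | X_pa(i) ~ N(f_i(X_pa(i)), sigma_i^2) on the chain X1 -> X2 -> X3.
rng = np.random.default_rng(0)

def sample_anm(n):
    x1 = rng.standard_normal(n)
    x2 = np.tanh(2 * x1) + 0.3 * rng.standard_normal(n)  # nonlinear f_2 + noise
    x3 = np.tanh(2 * x2) + 0.3 * rng.standard_normal(n)  # nonlinear f_3 + noise
    return np.stack([x1, x2, x3], axis=1)

X = sample_anm(5000)
print(X.shape)  # (5000, 3)
```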

SLIDE 19

Structure Learning

          X1     X2     X3
sample 1  1.76   10.46  0.002
sample 2  3.42   78.6   0.011
...
sample n  4.56   9.35   1.96

Score-based algorithms:

Ĝ = argmax_{G ∈ DAG} Score(G)

Often, Score(G) = regularized maximum likelihood under G
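The argmax above ranges over a combinatorially huge set, which is why search strategy matters. A brute-force count (plain enumeration, not from the slides) shows how fast the space of labeled DAGs grows, using the fact that a directed graph is acyclic iff its adjacency matrix is nilpotent (A^d = 0):

```python
import itertools
import numpy as np

def count_dags(d):
    """Count labeled DAGs on d nodes by enumerating all off-diagonal 0/1 matrices."""
    off_diag = [(i, j) for i in range(d) for j in range(d) if i != j]
    count = 0
    for bits in itertools.product([0, 1], repeat=len(off_diag)):
        A = np.zeros((d, d), dtype=int)
        for (i, j), b in zip(off_diag, bits):
            A[i, j] = b
        if not np.linalg.matrix_power(A, d).any():  # nilpotent <=> acyclic
            count += 1
    return count

print([count_dags(d) for d in range(1, 5)])  # [1, 3, 25, 543]
```

The count is super-exponential in d, so discrete algorithms like GES resort to greedy local moves rather than exhaustive search.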

SLIDE 20

Structure Learning

Taxonomy of score-based algorithms (non-exhaustive):

            Discrete optim.              Continuous optim.
Linear      GES [Chickering, 2003]       NOTEARS [Zheng et al., 2018]
Nonlinear   CAM [Bühlmann et al., 2014]  GraN-DAG [Lachapelle et al., 2020]

SLIDE 22

A greedy algorithm - CAM [Bühlmann et al., 2014]

Figures from [Bühlmann et al., 2014]

SLIDE 26

NOTEARS: Continuous optimization for structure learning

Encode the graph as a weighted adjacency matrix U = [u_1 | ... | u_d] ∈ R^{d×d}

(Slide shows a 3×3 example: a binary adjacency matrix A with three entries equal to 1, next to a weighted adjacency matrix U with entries 4.8, 0.2, −1.7 in the corresponding positions.)

U represents the coefficients of a linear model: X_i := u_i^T X + noise_i, ∀i

For an arbitrary U, the associated graph might be cyclic → we need an acyclicity constraint.

NOTEARS [Zheng et al., 2018] uses this differentiable acyclicity constraint:

Tr e^{U⊙U} − d = 0,  where e^M ≜ Σ_{k=0}^∞ M^k / k!
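The linear model above can be sampled in closed form: writing the model as X = XU + E gives X = E(I − U)^{-1} when U is acyclic. A minimal sketch with a hypothetical 3-node chain (the weights 2.0 and −1.5 are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_linear_sem(U, n, noise_std=1.0):
    """Sample n rows of X_i := u_i^T X + noise_i by solving X(I - U) = E."""
    d = U.shape[0]
    E = noise_std * rng.standard_normal((n, d))
    return E @ np.linalg.inv(np.eye(d) - U)  # valid because U is acyclic

U = np.array([[0.,  2.,  0. ],
              [0.,  0., -1.5],
              [0.,  0.,  0. ]])  # hypothetical chain X1 -> X2 -> X3
X = sample_linear_sem(U, 1000)
print(X.shape)  # (1000, 3)
```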

SLIDE 28

NOTEARS: Continuous optimization for structure learning

NOTEARS [Zheng et al., 2018]: Solve this continuous constrained optimization problem:

max_U  −‖X − XU‖_F² − λ‖U‖_1   (Score)
s.t.   Tr e^{U⊙U} − d = 0

where X ∈ R^{n×d} is the design matrix containing all n samples.

Solve approximately using an Augmented Lagrangian method. Amounts to maximizing (with gradient ascent)

−‖X − XU‖_F² − λ‖U‖_1 − α_t (Tr e^{U⊙U} − d) − (μ_t / 2)(Tr e^{U⊙U} − d)²

while gradually increasing α_t and μ_t.
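The augmented Lagrangian above is straightforward to differentiate: the gradient of Tr e^{U⊙U} with respect to U is (e^{U⊙U})^T ⊙ 2U. A sketch (not the reference implementation) that evaluates the objective and its (sub)gradient and checks them against finite differences; λ, α_t, μ_t values are arbitrary:

```python
import numpy as np
from scipy.linalg import expm

def h(U):
    return np.trace(expm(U * U)) - U.shape[0]      # Tr e^{U.U} - d

def grad_h(U):
    return expm(U * U).T * (2 * U)                 # d/dU Tr e^{U.U}

def aug_lagrangian(U, X, lam, alpha, mu):
    score = -np.sum((X - X @ U) ** 2) - lam * np.sum(np.abs(U))
    return score - alpha * h(U) - 0.5 * mu * h(U) ** 2

def grad_aug_lagrangian(U, X, lam, alpha, mu):
    g = 2 * X.T @ (X - X @ U)                      # grad of -||X - XU||_F^2
    g -= lam * np.sign(U)                          # subgradient of -lam*||U||_1
    g -= (alpha + mu * h(U)) * grad_h(U)           # penalty terms
    return g

def num_grad(f, U, eps=1e-6):                      # central finite differences
    G = np.zeros_like(U)
    for i in range(U.shape[0]):
        for j in range(U.shape[1]):
            Up, Um = U.copy(), U.copy()
            Up[i, j] += eps
            Um[i, j] -= eps
            G[i, j] = (f(Up) - f(Um)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
U = 0.3 * rng.standard_normal((3, 3))
ana = grad_aug_lagrangian(U, X, 0.1, 1.0, 2.0)
num = num_grad(lambda V: aug_lagrangian(V, X, 0.1, 1.0, 2.0), U)
print(np.allclose(ana, num, atol=1e-4))  # True
```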

SLIDE 35

NOTEARS: The acyclicity constraint

Tr e^{U⊙U} − d = 0,  where e^M ≜ Σ_{k=0}^∞ M^k / k!

Suppose A ∈ {0, 1}^{d×d} is an adjacency matrix for a certain directed graph.

(A^k)_{ii} = number of cycles of length k passing through i

Graph acyclic ⟺ (A^k)_{ii} = 0 for all i and all k
             ⟺ Tr Σ_{k=1}^∞ A^k / k! = 0   (all terms are nonnegative, so the sum vanishes iff each term does)
             ⟺ Tr (Σ_{k=0}^∞ A^k / k! − A^0) = 0
             ⟺ Tr e^A − d = 0

The argument is almost identical when using the weighted adjacency U instead of A...
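The derivation above can be sketched in a few lines, using scipy's matrix exponential; the 3-node graphs are illustrative:

```python
import numpy as np
from scipy.linalg import expm

def h(U):
    """NOTEARS acyclicity function: zero iff the graph of U is acyclic."""
    d = U.shape[0]
    return np.trace(expm(U * U)) - d   # U*U = elementwise square (U . U)

dag = np.array([[0., 1., 0.],   # 1 -> 2 -> 3: acyclic
                [0., 0., 1.],
                [0., 0., 0.]])
cyc = np.array([[0., 1., 0.],   # 1 -> 2 -> 3 -> 1: a 3-cycle
                [0., 0., 1.],
                [1., 0., 0.]])

print(h(dag))  # ~ 0
print(h(cyc))  # > 0 (every diagonal of A^3, A^6, ... contributes)
```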

SLIDE 38

Gradient-Based Neural DAG Learning

φ(i) ≜ {W^(1)_(i), ..., W^(L+1)_(i)},  where W^(ℓ)_(i) is the ℓ-th weight matrix of NN_φ(i)

φ ≜ {φ(i)}_{i=1}^d

∏_{i=1}^d p(x_i | x_{−i}; θ(i)) does not decompose according to a DAG!

We need to constrain the networks to be acyclic! How?

SLIDE 39

Gradient-Based Neural DAG Learning

Key idea: Construct a weighted adjacency matrix A_φ (analogous to U from the linear case) which can be used in the acyclicity constraint.

Then maximize likelihood under the acyclicity constraint via an augmented Lagrangian:

max_φ  E_{X∼P_X} Σ_{i=1}^d log p_φ(X_i | X_{−i}) − α_t (Tr e^{A_φ} − d) − (μ_t / 2)(Tr e^{A_φ} − d)²   (Augmented Lagrangian)

SLIDE 44

Constructing weighted adjacency matrix Aφ

Let's measure the "strength" of edge X_j → X_i.

Path product: |W^(1)_{h1 j}| · |W^(2)_{h2 h1}| · |W^(3)_{k h2}| ≥ 0

C ≜ |W^(3)| |W^(2)| |W^(1)|

"Connection strength" from X_j to θ(i):  Σ_{k=1}^m C_{kj} ≥ 0

Σ_{k=1}^m C_{kj} = 0 ⇒ all paths from X_j to X_i are inactive!

(A_φ)_{ji} ≜ Σ_{k=1}^m C^(i)_{kj} if i ≠ j, and 0 otherwise
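The construction above reduces to products of absolute weight matrices. A minimal sketch for two nodes with 3-layer-free toy networks (the weight values are made up; in GraN-DAG each NN's own input is additionally masked):

```python
import numpy as np

def connection_strength(weights):
    """Sum over output units k of C = |W_L| ... |W_1|, per input j."""
    C = np.abs(weights[0])
    for W in weights[1:]:
        C = np.abs(W) @ C          # accumulate |W_l| products along paths
    return C.sum(axis=0)           # strength of each input j

def build_A(all_weights):
    """all_weights[i] = list of weight matrices of the NN predicting X_i."""
    d = len(all_weights)
    A = np.zeros((d, d))
    for i, weights in enumerate(all_weights):
        s = connection_strength(weights)
        for j in range(d):
            if j != i:
                A[j, i] = s[j]     # (A_phi)_{ji}: strength of edge X_j -> X_i
    return A

nn0 = [np.zeros((2, 2)), np.array([[1.0, 2.0]])]            # X0 uses no input
nn1 = [np.array([[0.7, 0.], [0.2, 0.]]), np.array([[1.0, 1.0]])]  # X1 uses X0
A = build_A([nn0, nn1])
print(A)  # only the X0 -> X1 entry is nonzero (0.7 + 0.2 = 0.9)
```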

SLIDE 45

Gradient-Based Neural DAG Learning

The algorithm:

1. Preliminary neighborhood selection (analogous to CAM): for each node, select potential parents via any variable selection approach
2. Maximize likelihood under the acyclicity constraint via an augmented Lagrangian:
   max_φ  E_{X∼P_X} Σ_{i=1}^d log p_φ(x_i | x_{−i}) − α_t (Tr e^{A_φ} − d) − (μ_t / 2)(Tr e^{A_φ} − d)²
3. DAG pruning (analogous to CAM): for each node, get rid of some parents via any variable selection approach

Steps 1 and 3 help reduce overfitting. Important, since adding edges cannot reduce the maximum likelihood.

SLIDE 46

Gradient-Based Neural DAG Learning

(Figure: correct edges vs. wrong edges)

SLIDE 47

Experiments

Synthetic data: X_i | X_{π_i^G} ∼ N(f_i(X_{π_i^G}), σ_i²), with f_i ∼ Gaussian Process

Models: GraN-DAG, NOTEARS and CAM make the Gaussian assumption

Real data: measurements of expression levels of proteins and phospholipids in human immune system cells [Sachs et al., 2005]

                      Synthetic (50 nodes)         Protein data set
                      SHD           SID            SHD    SID
Continuous  GraN-DAG  102.6±21.2    1060.1±109.4   13     47
            DAG-GNN   191.9±15.2    2146.2±64      16     44
            NOTEARS   202.3±14.3    2149.1±76.3    21     44
Discrete    CAM       98.8±20.7     1197.2±125.9   12     55
            RANDOM    708.4±234.4   1921.3±203.5   21     60

DAG-GNN [Yu et al., 2019]

SLIDE 48

Experiments

In the previous setup, the synthetic data generation and the model matched. Here: model misspecification.

GSF [Huang et al., 2018a]

SLIDE 49

Experiments: Effect of sample size

- Previous experiment used a relatively small dataset: 1000 examples
- GraN-DAG is more expressive than CAM
- This advantage shows up in large sample size regimes

SLIDE 50

GraN-DAG with interventions [Brouillard et al., 2020]

Can we make use of interventional data?

SLIDE 52

GraN-DAG with interventions [Brouillard et al., 2020]

Some terminology and setting:

- I ⊂ {1, ..., d} is an interventional target (the set of nodes on which we intervene)
- Definition of stochastic intervention:

  p(x_1, ..., x_d | do(X_I)) ≜ ∏_{j ∉ I} p_j(x_j | x_{π_j^G}) ∏_{j ∈ I} p̃_j(x_j)

  where p̃_j(x_j) is the new marginal replacing p_j(x_j | x_{π_j^G}) (parents are "cut out")
- Observed: {(X^(1), I^(1)), ..., (X^(n), I^(n))}, where I^(i) is the interventional target associated with observation X^(i):

  I^(i) ∼ P(I) i.i.d. ∀i
  X^(i) | I^(i) ∼ P(X | I = I^(i)) ≜ p(x_1, ..., x_d | do(X_{I^(i)})) ∀i    (1)

  where P(I) is a distribution over a collection I of interventional targets
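This data-generating setting can be simulated directly. A minimal sketch (the two-node chain, its coefficients, and the standard-normal p̃ are assumptions): each sample carries an interventional target, and intervened nodes are drawn from the new marginal instead of their conditional.

```python
import numpy as np

# Two-node chain X0 -> X1; intervening on node 1 cuts the edge.
rng = np.random.default_rng(0)

def sample_one(target):
    x0 = rng.standard_normal()
    if 1 in target:
        x1 = rng.standard_normal()                    # p~_1: new marginal
    else:
        x1 = 2.0 * x0 + 0.1 * rng.standard_normal()   # p_1(x1 | x0)
    return np.array([x0, x1]), target

targets = [frozenset(), frozenset({1})]   # observational, or do on node 1
data = [sample_one(targets[rng.integers(2)]) for _ in range(20_000)]

obs = np.array([x for x, t in data if not t])
intv = np.array([x for x, t in data if 1 in t])
print(np.corrcoef(obs[:, 0], obs[:, 1])[0, 1] > 0.9)        # True: dependent
print(abs(np.corrcoef(intv[:, 0], intv[:, 1])[0, 1]) < 0.05)  # True: edge cut
```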

SLIDE 53

GraN-DAG with interventions [Brouillard et al., 2020]

Think about a CGM as a family of models of the form

{ ∏_{j ∉ I} p_j(x_j | x_{π_j^G}; φ_j) ∏_{j ∈ I} p̃_j(x_j; ω_j^I)  |  I ∈ I }

where ω^I ≜ {ω_j^I}_{j ∈ I}, for each I ∈ I, are learnable parameters.

The natural optimization problem:

max_{φ, {ω^I}_{I ∈ I}}  E_{(X,I)∼P(X,I)} [ Σ_{j ∉ I} log p_j(X_j | X_{−j}; φ_j) + Σ_{j ∈ I} log p̃_j(X_j; ω_j^I) ]  s.t. Tr e^{A_φ} = d

But we do not really care about learning the p̃_j(X_j; ω_j^I) ...

... and the problem trivially decomposes as a sum of a max_φ term and a max_{ω^I} term, so ...

SLIDE 54

GraN-DAG with interventions [Brouillard et al., 2020]

... we can forget about the p̃_j(X_j; ω_j^I) altogether and get

The optimization problem:

max_φ  E_{(X,I)∼P(X,I)} Σ_{j ∉ I} log p(X_j | X_{−j}; φ_j)  s.t. Tr e^{A_φ} = d

In a nutshell: we throw out the conditionals associated with the intervened variables.
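The masked score above amounts to zeroing out the intervened nodes' terms before summing. A minimal sketch with a hypothetical per-node log-likelihood matrix (the numbers are made up):

```python
import numpy as np

def interventional_score(node_loglik, targets):
    """node_loglik[s, j] = log p(x_j | x_-j; phi_j) for sample s.
    targets[s] = set of intervened nodes for sample s."""
    n, d = node_loglik.shape
    mask = np.ones((n, d))
    for s, I in enumerate(targets):
        for j in I:
            mask[s, j] = 0.0       # throw out intervened conditionals
    return (mask * node_loglik).sum() / n

ll = np.array([[-1.0, -2.0],       # sample 0: observational
               [-1.5, -0.5]])      # sample 1: node 1 intervened
print(interventional_score(ll, [set(), {1}]))  # ((-1-2) + (-1.5)) / 2 = -2.25
```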

SLIDE 55

GraN-DAG with interventions [Brouillard et al., 2020]

- Linear data (unidentifiable without interventions)
- 50 nodes and ≈ 200 edges
- Intervention on one node at a time

SLIDE 56

GraN-DAG with interventions [Brouillard et al., 2020]

(Figures: results on nonlinear data and on linear data.)

More experiments in the workshop paper...

SLIDE 57

GraN-DAG with Neural Autoregressive flows

In previous experiments, GraN-DAG's model was:

X_i = NN_φ(i)(X_{π_i^G}) + σ_i Z,  with Z ∼ N(0, 1) ∀i

GraN-DAG's framework allows for the use of Neural Autoregressive Flows [Huang et al., 2018b]:

X_i = NAF(Z; NN_φ(i)(X_{π_i^G})),  with Z ∼ N(0, 1) ∀i

The function NAF(· ; NN_φ(i)(X_{π_i^G})) is invertible and has a tractable Jacobian, so the likelihood of X can be computed exactly and maximized.
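The exact-likelihood claim is just the change-of-variables formula. A minimal sketch with a simple affine map g(z) = a z + b standing in for the learned monotone NAF transform (the transform and its parameters are assumptions; a real NAF is a neural monotone network):

```python
import numpy as np

# Change of variables: log p_X(x) = log p_Z(g^{-1}(x)) - log |g'(g^{-1}(x))|.
a, b = 2.0, 1.0                       # stand-in "flow" parameters

def log_px(x):
    z = (x - b) / a                   # invert the flow
    log_pz = -0.5 * (z**2 + np.log(2 * np.pi))   # standard-normal base density
    return log_pz - np.log(abs(a))    # minus log-Jacobian of g

# Sanity check: X = a Z + b with Z ~ N(0,1) means X ~ N(b, a^2)
x = 0.7
direct = -0.5 * (((x - b) / a) ** 2 + np.log(2 * np.pi * a**2))
print(np.isclose(log_px(x), direct))  # True
```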

SLIDE 58

GraN-DAG with Neural Autoregressive flows

Without interventions, we run into identifiability problems ...

Future work: make it work with interventional data (since identifiability is less of a problem there)

SLIDE 59

Conclusion and future work

Gradient-based DAG search...
- ... performs similarly to its discrete analogs
- ... scales well with the number of samples (since it is amenable to stochastic optimization)
- ... can easily be adapted to work with interventional data
- ... allows for very expressive density models (Neural Autoregressive Flows)

Future work:
- DAGs appear in many places; could we adapt the neural acyclicity constraint to other problems? (Not causality?)
- Drawing links between causality and representation learning

SLIDE 60

References

Brouillard, P., Drouin, A., Lachapelle, S., Lacoste, A., & Lacoste-Julien, S. (2020). Gradient-based neural DAG learning with interventions.

Bühlmann, P., Peters, J., & Ernest, J. (2014). CAM: Causal additive models, high-dimensional order search and penalized regression. Annals of Statistics.

Chickering, D. (2003). Optimal structure identification with greedy search. Journal of Machine Learning Research.

Huang, B., Zhang, K., Lin, Y., Schölkopf, B., & Glymour, C. (2018a). Generalized score functions for causal discovery. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.

Huang, C.-W., Krueger, D., Lacoste, A., & Courville, A. (2018b). Neural autoregressive flows.

Koller, D., & Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.

Lachapelle, S., Brouillard, P., Deleu, T., & Lacoste-Julien, S. (2020). Gradient-based neural DAG learning. In Proceedings of the Eighth International Conference on Learning Representations.

Peters, J., Mooij, J. M., Janzing, D., & Schölkopf, B. (2014). Causal discovery with continuous additive noise models. Journal of Machine Learning Research.

Sachs, K., Perez, O., Pe'er, D., Lauffenburger, D., & Nolan, G. (2005). Causal protein-signaling networks derived from multiparameter single-cell data. Science.

Yu, Y., Chen, J., Gao, T., & Yu, M. (2019). DAG-GNN: DAG structure learning with graph neural networks. In Proceedings of the 36th International Conference on Machine Learning.

Zheng, X., Aragam, B., Ravikumar, P., & Xing, E. (2018). DAGs with NO TEARS: Continuous optimization for structure learning. In Advances in Neural Information Processing Systems 31.
