

SLIDE 1

Causal Inference – Theory and Applications

  • Dr. Matthias Uflacker, Johannes Huegle, Christopher Schmidt

April 24, 2018

SLIDE 2

Agenda, April 24, 2018

  • Jupyter Notebook "Causal Inference in Application"
  • Recap: Causal Inference in a Nutshell
  • Introduction to Structural Causal Models
    1. Preliminaries
    2. Structural Causal Models
    3. (Local) Markov Condition
    4. Factorization
    5. Global Markov Condition
    6. Functional Model and Markov Conditions
    7. Faithfulness
    8. Constraint-based Causal Inference
    9. Markov Equivalence Class
    10. Summary
    11. Excursion: Maximal Ancestral Graphs

Uflacker, Huegle, Schmidt: Causal Inference – Theory and Applications, Slide 2

SLIDE 3

Jupyter Notebook “Causal Inference in Application”

SLIDE 4

Jupyter Notebook Causal Inference in Application

SLIDE 5

Jupyter Notebook Access Information

System procedure:

  • 1. Log in via LDAP (standard HPI credentials).
  • 2. Use the folder "Causal Inference – Theory and Applications".
  • 3. We provide a master notebook. Please use it as a read-only resource and copy relevant information into your local workspace.
  • 4. Your local workspace lives either in your home directory or in a separate folder within our course's folder.
  • 5. Let us know if you require new packages.

The link will be provided via email once we have the list of participants!

SLIDE 6

Causal Inference in a Nutshell

SLIDE 7

Causal Inference in a Nutshell – Recap: The Concept

Traditional statistical inference paradigm: observed data are governed by a joint distribution P, and inference targets aspects of P. E.g., what is the sailors' probability of recovery when we see a treatment with lemons? Q(P) = P(recovery | lemons).

Paradigm of structural causal models: observed data arise from a data-generating model G, and inference targets aspects of G. E.g., what is the sailors' probability of recovery if we do treat them with lemons? Q(G) = P(recovery | do(lemons)).

SLIDE 8

Introduction to Structural Causal Models

SLIDE 9

Introduction to Causal Graphical Models – Content

1. Preliminaries
2. Structural Causal Models
3. (Local) Markov Condition
4. Factorization
5. Global Markov Condition
6. Functional Model and Markov Conditions
7. Faithfulness
8. Constraint-based Causal Inference
9. Markov Equivalence Class
10. Summary
11. Excursion: Maximal Ancestral Graphs

SLIDE 10

1. Preliminaries – Notation

A, B           events
X, Y, Z        random variables
x              value of a random variable
Pr             probability measure
P_X            probability distribution of X
p              density
p_x or p_X     density of P_X
p(x)           density of P_X evaluated at the point x
X ⊥ Y          independence of X and Y
X ⊥ Y | Z      conditional independence of X and Y given Z

SLIDE 11

1. Preliminaries – Independence of Events

Two events A and B are called independent if
Pr(A ∩ B) = Pr(A) ⋅ Pr(B),
or, rewritten in conditional probabilities, if
Pr(A | B) = Pr(A ∩ B) / Pr(B) = Pr(A),  Pr(B | A) = Pr(A ∩ B) / Pr(A) = Pr(B).

Events A_1, …, A_n are called (mutually) independent if for every subset S ⊂ {1, …, n} we have
Pr(⋂_{i∈S} A_i) = ∏_{i∈S} Pr(A_i).

Note: for n ≥ 3, pairwise independence Pr(A_i ∩ A_j) = Pr(A_i) ⋅ Pr(A_j) for all i, j does not imply (mutual) independence.
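A minimal numeric check of the note above (my own illustration, not from the slides): three events built from two fair coin flips and their XOR are pairwise independent but not mutually independent.

```python
from itertools import product
from fractions import Fraction

# Sample space: two fair coin flips (b1, b2); the third bit is their XOR.
outcomes = [(b1, b2, b1 ^ b2) for b1, b2 in product([0, 1], repeat=2)]
pr = Fraction(1, 4)  # each of the 4 outcomes is equally likely

def prob(event):
    """Probability of the set of outcomes where `event` holds."""
    return sum(pr for w in outcomes if event(w))

A = [lambda w, i=i: w[i] == 1 for i in range(3)]  # A_i: "bit i equals 1"

# Pairwise independence: Pr(A_i ∩ A_j) = Pr(A_i) * Pr(A_j) = 1/4 for i != j
for i in range(3):
    for j in range(i + 1, 3):
        both = prob(lambda w: A[i](w) and A[j](w))
        assert both == prob(A[i]) * prob(A[j])

# Mutual independence fails: no outcome has all three bits set (1 XOR 1 = 0),
# so Pr(A_1 ∩ A_2 ∩ A_3) = 0, while the product of marginals is 1/8.
all_three = prob(lambda w: all(a(w) for a in A))
print(all_three, prob(A[0]) * prob(A[1]) * prob(A[2]))
```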

SLIDE 12

1. Preliminaries – Independence of Random Variables

Two real-valued random variables X and Y are called independent, X ⊥ Y, if for every x, y ∈ ℝ the events {X ≤ x} and {Y ≤ y} are independent; or, in terms of densities: for all x, y,
p(x, y) = p(x) p(y).

Note: If X ⊥ Y, then E[XY] = E[X] E[Y], and cov(X, Y) = E[XY] − E[X] E[Y] = 0. The converse is not true: cov(X, Y) = 0 does not imply X ⊥ Y. However, for a sufficiently large class ℱ of functions we have: if cov(g(X), h(Y)) = 0 for all g, h ∈ ℱ, then X ⊥ Y.

No correlation does not imply independence!
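The classic counterexample for the note above, worked exactly (my own illustration): Y = X² is a deterministic function of X, yet the two are uncorrelated.

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}; Y = X^2 is a deterministic function of X,
# so X and Y are clearly dependent -- yet their covariance is zero.
support = [-1, 0, 1]
p = Fraction(1, 3)

E_X  = sum(p * x for x in support)          # E[X]   = 0
E_Y  = sum(p * x**2 for x in support)       # E[Y]   = 2/3
E_XY = sum(p * x * x**2 for x in support)   # E[X^3] = 0
cov = E_XY - E_X * E_Y
print(cov)  # 0

# Dependence: P(X=0, Y=1) = 0, but P(X=0) * P(Y=1) = 1/3 * 2/3 = 2/9
p_joint = sum(p for x in support if x == 0 and x**2 == 1)
p_prod  = sum(p for x in support if x == 0) * sum(p for x in support if x**2 == 1)
print(p_joint, p_prod)
```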

SLIDE 13

1. Preliminaries – Conditional Independence of Random Variables

Two real-valued random variables X and Y are called conditionally independent given Z, written X ⊥ Y | Z or (X ⊥ Y | Z)_P, if
p(x, y | z) = p(x | z) p(y | z)
for all x, y and all z such that p(z) > 0.

Note: It is possible to find X, Y which are conditionally independent given a variable Z but unconditionally dependent, and vice versa.
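One direction of the note can be checked exactly with a common-cause construction (my own example, not from the slides): X and Y each copy a fair coin Z with independent noise, so X ⊥ Y | Z holds, yet X and Y are marginally dependent.

```python
from fractions import Fraction
from itertools import product

# Z is a fair coin; X and Y each copy Z but are independently flipped with
# probability 1/4. By construction X ⊥ Y | Z; marginally both track Z.
pz = {0: Fraction(1, 2), 1: Fraction(1, 2)}
pflip = Fraction(1, 4)

joint = {}  # joint[(x, y, z)] = probability
for z, fx, fy in product([0, 1], repeat=3):
    w = pz[z] * (pflip if fx else 1 - pflip) * (pflip if fy else 1 - pflip)
    key = (z ^ fx, z ^ fy, z)
    joint[key] = joint.get(key, 0) + w

def marg(pred):
    return sum(w for k, w in joint.items() if pred(*k))

# Conditional independence given Z: p(x, y | z) == p(x | z) p(y | z)
for x, y, z in product([0, 1], repeat=3):
    lhs = marg(lambda a, b, c: (a, b, c) == (x, y, z)) / pz[z]
    rhs = (marg(lambda a, b, c: (a, c) == (x, z)) / pz[z]) * \
          (marg(lambda a, b, c: (b, c) == (y, z)) / pz[z])
    assert lhs == rhs

# Marginal dependence: p(x=1, y=1) = 5/16, but p(x=1) p(y=1) = 1/4
pxy = marg(lambda a, b, c: (a, b) == (1, 1))
px = marg(lambda a, b, c: a == 1)
py = marg(lambda a, b, c: b == 1)
print(pxy, px * py)
```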

SLIDE 14

2. Structural Causal Models – Definition (Pearl)

■ Directed Acyclic Graph (DAG) G = (V, E)
  □ Vertices V_1, …, V_n
  □ Directed edges E, with (V_i, V_j) ∈ E written V_i → V_j
  □ No cycles
■ Kinship terminology is used; e.g., for the path V_i → V_j → V_k:
  □ V_i = Pa(V_j), the parent of V_j
  □ {V_i, V_j} = Anc(V_k), the ancestors of V_k
  □ {V_j, V_k} = Des(V_i), the descendants of V_i
■ Directed edges encode direct causes via
  □ V_j = f_j(Pa(V_j), N_j)
  with independent noise variables N_1, …, N_n

Cooling house example:
▪ V_1 = N(0,1)
▪ V_2 = N(0,1)
▪ V_3 = 3 V_2 + N(0,1)
▪ V_4 = 4 V_1 + 5 V_2 + 0.7 V_3 + N(0,1)
▪ V_5 = V_4 + N(0,1)
▪ V_6 = 1.2 V_4 + N(0,1)

This forms the causal graphical model with edges V_1 → V_4, V_2 → V_3, V_2 → V_4, V_3 → V_4, V_4 → V_5 and V_4 → V_6.

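The cooling house SCM can be simulated by drawing each variable from its structural equation in topological order. A short sketch (sample size and seed are my own choices):

```python
import numpy as np

# Simulation of the cooling house SCM; structural equations as on the slide.
rng = np.random.default_rng(0)
n = 100_000

v1 = rng.normal(0, 1, n)
v2 = rng.normal(0, 1, n)
v3 = 3 * v2 + rng.normal(0, 1, n)
v4 = 4 * v1 + 5 * v2 + 0.7 * v3 + rng.normal(0, 1, n)
v5 = v4 + rng.normal(0, 1, n)
v6 = 1.2 * v4 + rng.normal(0, 1, n)

# Each variable is a function of its parents plus independent noise, so
# regressing V6 on its only parent V4 recovers the structural coefficient
# 1.2 (up to sampling noise).
coef = np.cov(v6, v4)[0, 1] / np.var(v4)
print(round(coef, 2))  # close to 1.2
```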
SLIDE 15

2. Structural Causal Models – Connecting G and P

Basic assumption, Causal Sufficiency: all relevant variables are included in the DAG G.

Key postulate: the (Local) Markov Condition.

Essential mathematical concept: d-separation, which describes the conditional independences required by a causal DAG.

Together these connect the data-generating model G with the joint distribution P:
(X ⊥ Y | Z)_G ⇒ (X ⊥ Y | Z)_P

SLIDE 16

3. (Local) Markov Condition – Theorem

(Local) Markov Condition: V_j is statistically independent of its nondescendants, given its parents Pa(V_j), i.e.,
V_j ⊥ V \ Des(V_j) | Pa(V_j).

I.e., every information exchange with its nondescendants involves its parents.

Example (cooling house DAG):
V_6 ⊥ {V_1, V_2, V_3} | V_4
V_5 ⊥ {V_1, V_2, V_3} | V_4

SLIDE 17

3. (Local) Markov Condition – Supplement (Lauritzen 1996)

Assume V_n has no descendants; then its nondescendants are ND_n = {V_1, …, V_{n−1}}.

Thus the local Markov condition implies V_n ⊥ {V_1, …, V_{n−1}} | Pa(V_n).

Hence the general decomposition
p(v_1, …, v_n) = p(v_n | v_1, …, v_{n−1}) p(v_1, …, v_{n−1})
becomes
p(v_1, …, v_n) = p(v_n | Pa(v_n)) p(v_1, …, v_{n−1}).

Induction over n yields
p(v_1, …, v_n) = ∏_{i=1}^n p(v_i | Pa(v_i)).

I.e., the graph shows us how to factor the joint distribution P_V.

SLIDE 18

4. Factorization – Definition

Factorization:
p(v_1, …, v_n) = ∏_{i=1}^n p(v_i | Pa(v_i)).

I.e., the conditionals act as causal mechanisms generating statistical dependence.

Example (cooling house DAG):
p_V = p(v_1, …, v_6) = p(v_1) ⋅ p(v_2) ⋅ p(v_3 | v_2) ⋅ p(v_4 | v_1, v_2, v_3) ⋅ p(v_5 | v_4) ⋅ p(v_6 | v_4) = ∏_{i=1}^6 p(v_i | Pa(v_i)).

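To see that a joint defined by this factorization really satisfies the local Markov condition, here is a small exact self-check on a binary analogue of the cooling house DAG (same parent sets; the CPT numbers are invented for illustration). It verifies V_5 ⊥ V_1 | V_4 by enumeration.

```python
from fractions import Fraction
from itertools import product

F = Fraction

def bern(p, v):            # p(v) for a Bernoulli(p) variable, v in {0, 1}
    return p if v else 1 - p

def cpt4(v1, v2, v3):      # made-up P(V4=1 | v1, v2, v3)
    return F(1, 10) + F(2, 10) * v1 + F(3, 10) * v2 + F(2, 10) * v3

# Define the joint *by* the factorization p(v) = prod_i p(v_i | Pa(v_i)).
joint = {}
for v in product([0, 1], repeat=6):
    v1, v2, v3, v4, v5, v6 = v
    joint[v] = (bern(F(1, 2), v1) * bern(F(1, 2), v2)
                * bern(F(3, 10) + F(4, 10) * v2, v3)       # P(V3 | V2)
                * bern(cpt4(v1, v2, v3), v4)               # P(V4 | V1,V2,V3)
                * bern(F(2, 10) + F(6, 10) * v4, v5)       # P(V5 | V4)
                * bern(F(1, 10) + F(7, 10) * v4, v6))      # P(V6 | V4)

def marg(pred):
    return sum(w for v, w in joint.items() if pred(*v))

# Local Markov check: p(v5, v1 | v4) == p(v5 | v4) p(v1 | v4) for all values.
for a, b, c in product([0, 1], repeat=3):
    p4  = marg(lambda *v: v[3] == c)
    lhs = marg(lambda *v: (v[4], v[0], v[3]) == (a, b, c)) / p4
    rhs = (marg(lambda *v: (v[4], v[3]) == (a, c)) / p4) * \
          (marg(lambda *v: (v[0], v[3]) == (b, c)) / p4)
    assert lhs == rhs
print("V5 ⊥ V1 | V4 holds in the factorized joint")
```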
SLIDE 19

5. Global Markov Condition – d-Separation (Pearl 1988)

Path: a sequence of pairwise distinct vertices in which consecutive vertices are adjacent.

A path π is said to be blocked by a set S if
  • π contains a chain V_i → V_j → V_k or a fork V_i ← V_j → V_k such that the middle node V_j is in S, or
  • π contains a collider V_i → V_j ← V_k such that the middle node V_j is not in S and no descendant of V_j is in S.

d-separation: S is said to d-separate X and Y in the DAG G, written (X ⊥ Y | S)_G, if S blocks every path from a vertex in X to a vertex in Y.

SLIDE 20

5. Global Markov Condition – Examples of d-Separation

In the cooling house DAG:

▪ The path from V_1 to V_6 is blocked by V_4.
▪ V_1 and V_6 are d-separated by V_4.
▪ The path V_2 → V_3 → V_4 → V_6 is blocked by V_3, by V_4, or by both.
▪ But: V_2 and V_6 are d-separated only by V_4 or {V_3, V_4}, since the direct edge V_2 → V_4 leaves the path V_2 → V_4 → V_6 open unless V_4 is conditioned on.
▪ V_1 and V_2 are not blocked by V_4: conditioning on the collider V_4 opens the path V_1 → V_4 ← V_2.

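These examples can be checked mechanically. Below is a small d-separation tester (my own sketch, using the standard reduction rather than a path search): X ⊥ Y | S holds in a DAG iff X and Y are disconnected in the moralized ancestral graph of X ∪ Y ∪ S after removing S.

```python
# The cooling-house DAG as an adjacency list of children.
dag = {1: [4], 2: [3, 4], 3: [4], 4: [5, 6], 5: [], 6: []}

def parents(g, v):
    return [u for u in g if v in g[u]]

def ancestors(g, nodes):
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents(g, stack.pop()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(g, x, y, s):
    keep = ancestors(g, {x, y} | set(s))
    # Moralize: connect each node to its parents and "marry" co-parents.
    adj = {v: set() for v in keep}
    for u in keep:
        ps = [p for p in parents(g, u) if p in keep]
        for p in ps:
            adj[u].add(p); adj[p].add(u)
        for a in ps:
            for b in ps:
                if a != b:
                    adj[a].add(b); adj[b].add(a)
    # Remove S, then check whether x can still reach y.
    seen, stack = {x} | set(s), [x]
    while stack:
        for n in adj[stack.pop()] - set(s):
            if n == y:
                return False
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return True

print(d_separated(dag, 1, 6, {4}))   # True:  V4 blocks every V1-V6 path
print(d_separated(dag, 2, 6, {3}))   # False: V3 alone leaves V2 → V4 → V6 open
print(d_separated(dag, 1, 2, {4}))   # False: conditioning on the collider V4
```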
SLIDE 21

5. Global Markov Condition – Theorem

Global Markov Condition: For all disjoint subsets of vertices X, Y and Z we have:
X, Y d-separated by Z ⇒ (X ⊥ Y | Z)_P.

I.e., d-separation in the data-generating model G implies conditional independence in the joint distribution P:
(X ⊥ Y | Z)_G ⇒ (X ⊥ Y | Z)_P.

SLIDE 22

6. Functional Model and Markov Conditions – Theorem (Lauritzen 1996, Pearl 2000)

Theorem: Subject to technical conditions, the following are equivalent:
  • Existence of a functional causal model G;
  • Local causal Markov condition: V_j is statistically independent of its nondescendants given its parents (i.e., every information exchange with its nondescendants involves its parents);
  • Global causal Markov condition: d-separation, which characterizes the set of independences implied by the local Markov condition;
  • Factorization: p(v_1, …, v_n) = ∏_{i=1}^n p(v_i | Pa(v_i)).

I.e., (X ⊥ Y | Z)_G ⇒ (X ⊥ Y | Z)_P.

SLIDE 23

7. Causal Faithfulness – The Key Postulate

Causal Faithfulness: p is called faithful relative to G if only those independencies hold that are implied by the Markov condition, i.e.,
(X ⊥ Y | Z)_G ⇐ (X ⊥ Y | Z)_P.

I.e., we assume that any population P produced by the causal graph G has exactly the independence relations obtained by applying d-separation to G. This seems like a hefty assumption, but it really isn't: it assumes that whatever independencies occur arise not from incredible coincidence but from structure, i.e., from the data-generating model G.

SLIDE 24

8. Constraint-based Causal Inference – Concept (Spirtes, Glymour, Scheines and Pearl)

Assumptions:
  • Causal Sufficiency
  • Global Markov Condition
  • Causal Faithfulness

Causal structure learning: accept only those DAGs G as causal hypotheses for which
(X ⊥ Y | Z)_G ⇔ (X ⊥ Y | Z)_P.

  • This defines the basis of constraint-based causal structure learning.
  • It identifies the causal DAG up to its Markov equivalence class (the DAGs that imply the same conditional independencies).

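A compact sketch of the constraint-based idea (my own simplification of the PC algorithm: the independence oracle is hard-coded for the collider X → Z ← Y, and conditioning sets range over all remaining variables rather than following full PC's adjacency-based schedule):

```python
from itertools import combinations

vars_ = ["X", "Y", "Z"]
# Perfect oracle for X → Z ← Y: the only independence is X ⊥ Y (given ∅).
independencies = {("X", "Y", frozenset())}

def indep(a, b, s):
    key = frozenset({a, b})
    return any(frozenset({u, v}) == key and s == c
               for u, v, c in independencies)

# 1. Start from the complete undirected graph; delete an edge A-B whenever
#    some conditioning set S makes A and B independent, recording S.
edges = {frozenset(p) for p in combinations(vars_, 2)}
sepset = {}
for a, b in combinations(vars_, 2):
    others = [v for v in vars_ if v not in (a, b)]
    for k in range(len(others) + 1):
        for s in map(frozenset, combinations(others, k)):
            if indep(a, b, s):
                edges.discard(frozenset({a, b}))
                sepset[frozenset({a, b})] = s
print(sorted(tuple(sorted(e)) for e in edges))  # [('X', 'Z'), ('Y', 'Z')]

# 2. Orient v-structures: for A - C - B with A, B non-adjacent, orient
#    A → C ← B iff C is NOT in the separating set of A and B.
for a, b in combinations(vars_, 2):
    if frozenset({a, b}) in edges:
        continue
    for c in vars_:
        if c in (a, b):
            continue
        if (frozenset({a, c}) in edges and frozenset({b, c}) in edges
                and c not in sepset.get(frozenset({a, b}), {c})):
            print(f"v-structure: {a} -> {c} <- {b}")
```

Run on this oracle, the skeleton X - Z - Y is recovered and the v-structure X → Z ← Y is oriented, since Z does not appear in the separating set of X and Y.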
SLIDE 25

9. Markov Equivalence Class – Theorem (Verma and Pearl)

Skeleton: the corresponding undirected graph.

v-structure: a substructure X → Y ← Z with no edge between X and Z.

Theorem: Two DAGs are Markov equivalent if and only if they have the same skeleton and the same v-structures.

SLIDE 26

9. Markov Equivalence Class – Examples

  • The chain X → Y → Z, the chain X ← Y ← Z and the fork X ← Y → Z have the same skeleton and no v-structure; all three encode exactly X ⊥ Z | Y, so they are Markov equivalent.
  • Two DAGs over W, X, Y, Z with the same skeleton and the same v-structure at W (shown on the slide) are likewise Markov equivalent.

SLIDE 27

10. Summary – Causal Structural Models

  • Causal structures are formalized by a DAG (directed acyclic graph) G with random variables V_1, …, V_n as vertices.
  • Causal Sufficiency, Causal Faithfulness and the Markov Condition imply (X ⊥ Y | Z)_G ⇔ (X ⊥ Y | Z)_P.
  • The local Markov condition states that the density p(v_1, …, v_n) factorizes into p(v_1, …, v_n) = ∏_{i=1}^n p(v_i | Pa(v_i)).
  • The causal conditionals p(v_j | Pa(v_j)) represent causal mechanisms.

SLIDE 28

11. Excursion: Maximal Ancestral Graphs – Motivating Example

Suppose we are given a list of conditional independencies among X, Y, Z and W (shown on the slide). Which DAG could have generated these, and only these, independencies and dependencies? The independencies fix the pattern of dependencies (the skeleton) and force certain colliders, but there is no orientation of the X–Y edge that is consistent with the independencies.

SLIDE 29

11. Excursion: Maximal Ancestral Graphs – DAG Models and Marginalization

Let's include an additional variable U. The extended DAG model generates a probability distribution P_{X,Y,Z,W,U} in which the listed independencies hold. The marginal distribution P_{X,Y,Z,W} = ∫ P_{X,Y,Z,W,U} du must satisfy the same independencies. But: this marginal distribution cannot be faithfully generated by any DAG.

DAG models are not closed under marginalization!

SLIDE 30

11. Excursion: Maximal Ancestral Graphs – Ancestral Graphs (informally)

An Ancestral Graph (AG) is a graph containing both directed and bi-directed edges, where the bi-directed edges stand for latent variables.

m-separation: If S m-separates X and Y in an ancestral graph M, then X ⊥ Y | S in every density p that factorizes according to any DAG G that is represented by the AG M.

SLIDE 31

11. Excursion: Maximal Ancestral Graphs – DAGs vs. AGs

Advantages of AGs:
  • AGs can faithfully represent more probability distributions than DAGs.
  • AG models are closed under marginalization.
  • AGs can (implicitly) represent unobserved variables, which exist in many (possibly almost all) applications.

Disadvantages of AGs:
  • Parameterization is difficult in the general case.
  • Markov equivalence is difficult.

SLIDE 32

References

  • Pearl, J. (2009). Causal Inference in Statistics: An Overview. Statistics Surveys, 3:96-146.
  • Pearl, J. (2009). Causality: Models, Reasoning, and Inference. Cambridge University Press.
  • Spirtes, P., Glymour, C., and Scheines, R. (2000). Causation, Prediction, and Search. The MIT Press.

SLIDE 33

Thank you for your attention!