Data-efficient causal effect estimation Adith Swaminathan - - PowerPoint PPT Presentation



SLIDE 1

Data-efficient causal effect estimation

Adith Swaminathan adswamin@microsoft.com

Joint work with Maggie Makar (MIT) and Emre Kıcıman (MSR AI)

Brown TRIPODS 1.16.2019

SLIDE 2
  • 1. Improve ML applications using Causal Reasoning
  • 2. Use ML tools to perform Causal Inference

ML ⇄ Causal Reasoning

SLIDE 3
  • 1. Causal Reasoning -> ML


“Use logs collected from interactive systems to evaluate and train new interaction policies”

“The data we collect from our interactive systems confounds our ML models”

Simple pragmatic fixes to address confounding!


SLIDE 4

Example: Search


[https://arxiv.org/abs/1608.04468; WSDM’17]

Model the propensity of clicks on documents to de-bias the training set of learning-to-rank models.
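A minimal sketch of the propensity idea, with invented log entries and a hypothetical `new_policy_prob` function (not the cited paper's actual method): each logged click is re-weighted by how the new policy would treat that document relative to the logging policy's propensity.

```python
# Toy inverse-propensity-weighted (IPW) evaluation from click logs.
# Each entry: (doc shown, click indicator, propensity with which the
# logging policy showed that doc). All numbers are made up.
logs = [
    ("d1", 1, 0.8),
    ("d2", 0, 0.5),
    ("d3", 1, 0.2),  # rarely-shown doc: its click gets up-weighted
    ("d1", 0, 0.8),
]

def ipw_estimate(logs, new_policy_prob):
    """Estimate the click rate of a new policy from biased logs.

    new_policy_prob(doc) -> probability the NEW policy shows `doc`.
    Each click is re-weighted by new_prob / logging_propensity.
    """
    total = 0.0
    for doc, click, prop in logs:
        total += click * new_policy_prob(doc) / prop
    return total / len(logs)

# Hypothetical new policy that shows d3 more often than the logger did.
est = ipw_estimate(logs, lambda d: {"d1": 0.4, "d2": 0.2, "d3": 0.9}[d])
```

The rarely-shown but clicked document `d3` dominates the estimate, which is exactly how IPW corrects for presentation bias in the logs.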

SLIDE 5

Pointers to recent results


“Similar IPW-like ideas massively improve learning-to-rank for search”

[Joachims et al,WSDM’17 Best Paper]

“Important to reason about the variance of IPW for counterfactual learning”

[Swaminathan&Joachims,ICML’15]

“We can do much better than IPW for structured treatments (slates)”

[Swaminathan et al,NIPS’17]

“These techniques complement deep learning”

[Joachims et al,ICLR’18]

“Self-normalized estimators are better to use in these applications”

[Swaminathan&Joachims,NIPS’15]

“IPW fixes collaborative filtering for recommendations”

[Schnabel et al,ICML’16]
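To make the self-normalized pointer concrete, here is a toy comparison (invented numbers, not drawn from any of the cited papers) of vanilla IPW against the self-normalized variant:

```python
def ipw(rewards, weights):
    # Vanilla IPW: mean of importance-weighted rewards.
    return sum(r * w for r, w in zip(rewards, weights)) / len(rewards)

def snips(rewards, weights):
    # Self-normalized IPW: divide by the sum of weights instead of n,
    # trading a little bias for much lower variance when weights are skewed.
    return sum(r * w for r, w in zip(rewards, weights)) / sum(weights)

rewards = [1.0, 0.0, 1.0, 1.0]       # binary rewards in [0, 1]
weights = [0.5, 0.5, 6.0, 1.0]       # one large importance weight
ipw_val = ipw(rewards, weights)      # 1.875 -- outside [0, 1]!
snips_val = snips(rewards, weights)  # 0.9375 -- stays in range
```

A single large importance weight pushes vanilla IPW outside the feasible reward range, while the self-normalized estimate stays bounded, which is the property that makes it preferable in these applications.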

SLIDE 6
  • 1. Improve ML applications using Causal Reasoning
  • 2. Use ML tools to perform Causal Inference

ML ⇄ Causal Reasoning

SLIDE 7
  • 2. ML -> Causal Reasoning


“Data efficient treatment effect estimation”

Representation learning + Causal inference = Bias-Variance Trade-off?

[AAAI’19]

SLIDE 8

Problem Setting


Will my patient’s blood pressure increase if I put her on medication A?

Challenges:
  • A question of causal nature
  • Limited data at test time
SLIDE 9

Individual Treatment Effect (ITE)


  • Estimate the causal effect of an intervention: if the treatment T changes, how does the outcome Y change?
  • Target for estimation: Y¹ − Y⁰
  • Target is unobserved: the fundamental problem of causal inference

ITE: τ(x) = 𝔼_{Y¹∼Pr(Y¹|x)}[Y¹] − 𝔼_{Y⁰∼Pr(Y⁰|x)}[Y⁰]
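The fundamental problem can be made concrete with a tiny simulator (the `potential_outcomes` function and all numbers are invented for illustration): only in simulation are both potential outcomes visible, so only there can the true ITE be computed directly.

```python
import random

random.seed(0)

def potential_outcomes(x):
    # Toy simulator with known ground truth: y1 - y0 = 2*x, so the
    # true ITE at covariate value x is tau(x) = 2*x.
    y0 = x
    y1 = 3 * x
    return y0, y1

x = 0.5
y0, y1 = potential_outcomes(x)
true_ite = y1 - y0          # only available in simulation

# In real data we observe just ONE of (y0, y1) per individual:
t = random.randint(0, 1)
observed = y1 if t else y0  # the other outcome is the missing counterfactual
```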

SLIDE 10

ITE estimation from observational data

Two functions:
  • Adjustment for confounding (confounders)
  • Estimation of heterogeneity (effect modifiers)

SLIDE 11

Confounders vs. Effect modifiers


Average treatment effect
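To see why the confounding side matters, here is a toy stratified calculation (all numbers invented): a binary confounder Z flips the sign of the naive treated-vs-control comparison, while averaging the within-stratum contrasts over Pr(Z) (backdoor adjustment) recovers the true average treatment effect.

```python
# Rows: (z, treated mean outcome, control mean outcome, P(z), P(treated | z)).
# The true effect is 1.0 in both strata, but z=1 has a higher baseline
# outcome and mostly control units.
strata = [
    (0, 2.0, 1.0, 0.5, 0.9),  # z=0: mostly treated
    (1, 5.0, 4.0, 0.5, 0.1),  # z=1: mostly control, higher baseline
]

# Naive: compare the overall treated mean to the overall control mean.
pt = sum(p * e for _, _, _, p, e in strata)  # P(treated)
naive = (
    sum(y1 * p * e for _, y1, _, p, e in strata) / pt
    - sum(y0 * p * (1 - e) for _, _, y0, p, e in strata) / (1 - pt)
)

# Adjusted (backdoor): average the per-stratum contrast over P(z).
ate = sum((y1 - y0) * p for _, y1, y0, p, _ in strata)
```

Here the naive contrast comes out negative even though the treatment helps everyone, because treated units are concentrated in the low-baseline stratum.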

SLIDE 12

Confounders vs. Effect modifiers


SLIDE 13

ITE discovery

Data efficient ITE estimation


Adjustment for confounding + Estimation of heterogeneity → ITE prediction

SLIDE 14

Insight


Leverage the difference between tasks at training and test time to reduce data collection burden at test time

ITE Discovery → ITE Prediction

SLIDE 15

Why Trees?

SLIDE 16

Trees identify the most important axes of heterogeneity


SLIDE 17

Trees can be traversed until querying ability is exhausted
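A minimal sketch of that traversal rule, using an invented tree (the features, thresholds, and ITE values below are illustrative only): each node caches an ITE estimate, so stopping early when a split feature cannot be queried still yields a prediction.

```python
# Each node stores an "ite" estimate; internal nodes also store a split.
# The tree below is made up for illustration.
tree = {
    "ite": 1.0, "feature": "age", "threshold": 40,
    "left": {"ite": 0.4, "feature": "smoker", "threshold": 0.5,
             "left": {"ite": 0.1}, "right": {"ite": 0.9}},
    "right": {"ite": 1.8},
}

def predict(node, known):
    """Descend while the split feature is in `known`; else stop early
    and return the current node's cached ITE estimate."""
    while "feature" in node and node["feature"] in known:
        side = "left" if known[node["feature"]] <= node["threshold"] else "right"
        node = node[side]
    return node["ite"]

predict(tree, {"age": 30, "smoker": 1})  # full path -> 0.9
predict(tree, {"age": 30})               # smoker unknown -> 0.4
predict(tree, {})                        # nothing known -> root ITE 1.0
```

Different individuals can answer different feature queries, so each gets as deep a traversal, and as refined an estimate, as their answers allow.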

SLIDE 18

Different individuals → different queries

SLIDE 19

Algorithm: DEITEE


Data Efficient Individual Treatment Effect Estimator

ITE Discovery: base model. ITE Prediction: DEITEE model.
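A rough sketch of the two-stage idea, with the `base_ite` values standing in for BART/GRF output (this illustrates the distillation pattern, not the actual DEITEE algorithm): a flexible base model produces per-individual ITE estimates offline, then a compact tree, here reduced to a single split, is fit to those estimates so that prediction needs only a few features.

```python
# (x, base-model ITE estimate) pairs -- invented numbers standing in
# for the output of a flexible base model such as BART or GRF.
data = [
    (1.0, 0.2), (2.0, 0.3), (3.0, 0.25), (7.0, 1.1), (8.0, 1.0), (9.0, 1.2),
]

def best_split(data):
    """One-split 'tree': pick the threshold minimizing squared error
    of the base-model ITE estimates within each side."""
    def sse(ys):
        m = sum(ys) / len(ys)
        return sum((y - m) ** 2 for y in ys)
    best = None
    for thr, _ in data:
        left = [y for x, y in data if x <= thr]
        right = [y for x, y in data if x > thr]
        if not left or not right:
            continue
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, thr,
                    sum(left) / len(left), sum(right) / len(right))
    return best[1:]  # (threshold, left ITE, right ITE)

thr, left_ite, right_ite = best_split(data)
```

At test time only the split feature needs to be queried, which is the source of the data efficiency.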

SLIDE 20

Experiments: Synthetic


  • Data: ACIC’17 simulated data (“semi-synthetic”); N=5k, d=58
  • Base models: BART and GRF
  • Benchmarks: Train BART/GRF with feature regularization
  • Evaluation: (1) Accuracy relative to true ITE; (2) Number of features queried

SLIDE 21

DEITEE: Features queried


SLIDE 22

DEITEE doesn’t sacrifice accuracy


SLIDE 23

Experiment on real data


What is the effect of a mother’s habits on her newborn’s health? 1989 MA singleton births (CDC); N=90k, d=77.

Mother’s habit (treatment) | MAE vs. proxy ITE (BART) | MAE vs. proxy ITE (DEITEE-BART) | Mean # DEITEE features
Prenatal care              | 580.20                   | 580.20                          | 15.42
Smoking                    | 587.62                   | 587.62                          | 16.2

[Tree diagram: example DEITEE splits on Alcohol?, HS Education, Age, Married?, Health risks?, with Y/N branches]

SLIDE 24

Conclusions


  • DEITEE reduces the number of features required to estimate individual causal effects
  ❖ Leverage the difference between ITE discovery and ITE prediction
  • Ongoing: Careful analysis of distillation error; guarantees on effect-modifier discovery
  • Need: Good, robust method for model selection
SLIDE 25

Thanks!


adswamin@microsoft.com

[Recap diagram: ML ⇄ Causal Reasoning; ITE Discovery (heterogeneity, confounding) → ITE Prediction]