Data-efficient causal effect estimation Adith Swaminathan - - PowerPoint PPT Presentation



SLIDE 1

Data-efficient causal effect estimation

Adith Swaminathan adswamin@microsoft.com

Joint work with Maggie Makar (MIT) and Emre Kıcıman (MSR AI)

Brown TRIPODS 1.16.2019

SLIDE 2
  • 1. Improve ML applications using Causal Reasoning
  • 2. Use ML tools to perform Causal Inference

ML ⇄ Causal Reasoning

SLIDE 3
  • 1. Causal Reasoning -> ML


“Use logs collected from interactive systems to evaluate and train new interaction policies”

“The data we collect from our interactive systems confounds our ML models”

Simple pragmatic fixes to address confounding!


SLIDE 4

Example: Search


[https://arxiv.org/abs/1608.04468; WSDM’17]

Model the propensity of clicks on documents to de-bias the training set of learning-to-rank models.
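A minimal sketch of the propensity idea, with invented log entries and a hypothetical `new_policy_prob` function (not the cited paper's actual method): each logged click is re-weighted by how the new policy would treat that document relative to the logging policy's propensity.

```python
# Toy inverse-propensity-weighted (IPW) evaluation from click logs.
# Each entry: (doc shown, click indicator, propensity with which the
# logging policy showed that doc). All numbers are made up.
logs = [
    ("d1", 1, 0.8),
    ("d2", 0, 0.5),
    ("d3", 1, 0.2),  # rarely-shown doc: its click gets up-weighted
    ("d1", 0, 0.8),
]

def ipw_estimate(logs, new_policy_prob):
    """Estimate the click rate of a new policy from biased logs.

    new_policy_prob(doc) -> probability the NEW policy shows `doc`.
    Each click is re-weighted by new_prob / logging_propensity.
    """
    total = 0.0
    for doc, click, prop in logs:
        total += click * new_policy_prob(doc) / prop
    return total / len(logs)

# Hypothetical new policy that shows d3 more often than the logger did.
est = ipw_estimate(logs, lambda d: {"d1": 0.4, "d2": 0.2, "d3": 0.9}[d])
```

The rarely-shown but clicked document `d3` dominates the estimate, which is exactly how IPW corrects for presentation bias in the logs.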

SLIDE 5

Pointers to recent results


“Similar IPW-like ideas massively improve learning-to-rank for search”

[Joachims et al,WSDM’17 Best Paper]

“Important to reason about the variance of IPW for counterfactual learning”

[Swaminathan&Joachims,ICML’15]

“We can do much better than IPW for structured treatments (slates)”

[Swaminathan et al,NIPS’17]

“These techniques complement deep learning”

[Joachims et al,ICLR’18]

“Self-normalized estimators are better to use in these applications”

[Swaminathan&Joachims,NIPS’15]

“IPW fixes collaborative filtering for recommendations”

[Schnabel et al,ICML’16]
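To make the self-normalized pointer concrete, here is a toy comparison (invented numbers, not drawn from any of the cited papers) of vanilla IPW against the self-normalized variant:

```python
def ipw(rewards, weights):
    # Vanilla IPW: mean of importance-weighted rewards.
    return sum(r * w for r, w in zip(rewards, weights)) / len(rewards)

def snips(rewards, weights):
    # Self-normalized IPW: divide by the sum of weights instead of n,
    # trading a little bias for much lower variance when weights are skewed.
    return sum(r * w for r, w in zip(rewards, weights)) / sum(weights)

rewards = [1.0, 0.0, 1.0, 1.0]       # binary rewards in [0, 1]
weights = [0.5, 0.5, 6.0, 1.0]       # one large importance weight
ipw_val = ipw(rewards, weights)      # 1.875 -- outside [0, 1]!
snips_val = snips(rewards, weights)  # 0.9375 -- stays in range
```

A single large importance weight pushes vanilla IPW outside the feasible reward range, while the self-normalized estimate stays bounded, which is the property that makes it preferable in these applications.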

SLIDE 6
  • 1. Improve ML applications using Causal Reasoning
  • 2. Use ML tools to perform Causal Inference

ML ⇄ Causal Reasoning

SLIDE 7
  • 2. ML -> Causal Reasoning


“Data efficient treatment effect estimation”

Representation learning + Causal inference = Bias-Variance Trade-off?

[AAAI’19]

SLIDE 8

Problem Setting


Will my patient’s blood pressure increase if I put her on medication A?

Challenges:
  • A question of causal nature
  • Limited data at test time
SLIDE 9

Individual Treatment Effect (ITE)


  • Estimate the causal effect of an intervention: if the treatment T changes, how does the outcome Y change?
  • Target for estimation: Y¹ − Y⁰
  • Target is unobserved: the fundamental problem of causal inference

ITE: τ(x) = 𝔼_{Y¹∼Pr(Y¹|x)}[Y¹] − 𝔼_{Y⁰∼Pr(Y⁰|x)}[Y⁰]
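The fundamental problem can be made concrete with a tiny simulator (the `potential_outcomes` function and all numbers are invented for illustration): only in simulation are both potential outcomes visible, so only there can the true ITE be computed directly.

```python
import random

random.seed(0)

def potential_outcomes(x):
    # Toy simulator with known ground truth: y1 - y0 = 2*x, so the
    # true ITE at covariate value x is tau(x) = 2*x.
    y0 = x
    y1 = 3 * x
    return y0, y1

x = 0.5
y0, y1 = potential_outcomes(x)
true_ite = y1 - y0          # only available in simulation

# In real data we observe just ONE of (y0, y1) per individual:
t = random.randint(0, 1)
observed = y1 if t else y0  # the other outcome is the missing counterfactual
```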

SLIDE 10

ITE estimation from observational data

Two functions:
  • Adjustment for confounding (confounders)
  • Estimation of heterogeneity (effect modifiers)

SLIDE 11

Confounders vs. Effect modifiers


Average treatment effect
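To see why the confounding side matters, here is a toy stratified calculation (all numbers invented): a binary confounder Z flips the sign of the naive treated-vs-control comparison, while averaging the within-stratum contrasts over Pr(Z) (backdoor adjustment) recovers the true average treatment effect.

```python
# Rows: (z, treated mean outcome, control mean outcome, P(z), P(treated | z)).
# The true effect is 1.0 in both strata, but z=1 has a higher baseline
# outcome and mostly control units.
strata = [
    (0, 2.0, 1.0, 0.5, 0.9),  # z=0: mostly treated
    (1, 5.0, 4.0, 0.5, 0.1),  # z=1: mostly control, higher baseline
]

# Naive: compare the overall treated mean to the overall control mean.
pt = sum(p * e for _, _, _, p, e in strata)  # P(treated)
naive = (
    sum(y1 * p * e for _, y1, _, p, e in strata) / pt
    - sum(y0 * p * (1 - e) for _, _, y0, p, e in strata) / (1 - pt)
)

# Adjusted (backdoor): average the per-stratum contrast over P(z).
ate = sum((y1 - y0) * p for _, y1, y0, p, _ in strata)
```

Here the naive contrast comes out negative even though the treatment helps everyone, because treated units are concentrated in the low-baseline stratum.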

SLIDE 12

Confounders vs. Effect modifiers


SLIDE 13

ITE discovery

Data efficient ITE estimation


Adjustment for confounding + Estimation of heterogeneity → ITE prediction

SLIDE 14

Insight


Leverage the difference between tasks at training and test time to reduce data collection burden at test time

ITE Discovery → ITE Prediction

SLIDE 15

Why Trees?

SLIDE 16

Trees identify the most important axes of heterogeneity


SLIDE 17

Trees can be traversed until querying ability is exhausted
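A minimal sketch of that traversal rule, using an invented tree (the features, thresholds, and ITE values below are illustrative only): each node caches an ITE estimate, so stopping early when a split feature cannot be queried still yields a prediction.

```python
# Each node stores an "ite" estimate; internal nodes also store a split.
# The tree below is made up for illustration.
tree = {
    "ite": 1.0, "feature": "age", "threshold": 40,
    "left": {"ite": 0.4, "feature": "smoker", "threshold": 0.5,
             "left": {"ite": 0.1}, "right": {"ite": 0.9}},
    "right": {"ite": 1.8},
}

def predict(node, known):
    """Descend while the split feature is in `known`; else stop early
    and return the current node's cached ITE estimate."""
    while "feature" in node and node["feature"] in known:
        side = "left" if known[node["feature"]] <= node["threshold"] else "right"
        node = node[side]
    return node["ite"]

predict(tree, {"age": 30, "smoker": 1})  # full path -> 0.9
predict(tree, {"age": 30})               # smoker unknown -> 0.4
predict(tree, {})                        # nothing known -> root ITE 1.0
```

Different individuals can answer different feature queries, so each gets as deep a traversal, and as refined an estimate, as their answers allow.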

SLIDE 18

Different individuals → different queries

SLIDE 19

Algorithm: DEITEE


Data Efficient Individual Treatment Effect Estimator

ITE Discovery: base model. ITE Prediction: DEITEE model.
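A rough sketch of the two-stage idea, with the `base_ite` values standing in for BART/GRF output (this illustrates the distillation pattern, not the actual DEITEE algorithm): a flexible base model produces per-individual ITE estimates offline, then a compact tree, here reduced to a single split, is fit to those estimates so that prediction needs only a few features.

```python
# (x, base-model ITE estimate) pairs -- invented numbers standing in
# for the output of a flexible base model such as BART or GRF.
data = [
    (1.0, 0.2), (2.0, 0.3), (3.0, 0.25), (7.0, 1.1), (8.0, 1.0), (9.0, 1.2),
]

def best_split(data):
    """One-split 'tree': pick the threshold minimizing squared error
    of the base-model ITE estimates within each side."""
    def sse(ys):
        m = sum(ys) / len(ys)
        return sum((y - m) ** 2 for y in ys)
    best = None
    for thr, _ in data:
        left = [y for x, y in data if x <= thr]
        right = [y for x, y in data if x > thr]
        if not left or not right:
            continue
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, thr,
                    sum(left) / len(left), sum(right) / len(right))
    return best[1:]  # (threshold, left ITE, right ITE)

thr, left_ite, right_ite = best_split(data)
```

At test time only the split feature needs to be queried, which is the source of the data efficiency.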

SLIDE 20

Experiments: Synthetic


  • Data: ACIC’17 simulated data (“semi-synthetic”); N=5k, d=58
  • Base models: BART and GRF
  • Benchmarks: Train BART/GRF with feature regularization
  • Evaluation: (1) Accuracy relative to true ITE; (2) Number of features queried

SLIDE 21

DEITEE: Features queried


SLIDE 22

DEITEE doesn’t sacrifice accuracy


SLIDE 23

Experiment on real data


What is the effect of a mother’s habits on her newborn’s health? 1989 MA singleton births (CDC); N=90k, d=77.

Mother’s habit (treatment) | MAE vs. proxy ITE (BART) | MAE vs. proxy ITE (DEITEE-BART) | Mean # DEITEE features
Prenatal care              | 580.20                   | 580.20                          | 15.42
Smoking                    | 587.62                   | 587.62                          | 16.2

[Tree diagram: example DEITEE splits on Alcohol?, HS Education, Age, Married?, Health risks?, with Y/N branches]

SLIDE 24

Conclusions


  • DEITEE reduces the number of features required to estimate individual causal effects
  ❖ Leverage the difference between ITE discovery and ITE prediction
  • Ongoing: Careful analysis of distillation error; guarantees on effect-modifier discovery
  • Need: Good, robust method for model selection
SLIDE 25

Thanks!


adswamin@microsoft.com

[Recap diagram: ML ⇄ Causal Reasoning; ITE Discovery (heterogeneity, confounding) → ITE Prediction]