data efficient causal effect estimation

Data-efficient causal effect estimation Adith Swaminathan - PowerPoint PPT Presentation

Data-efficient causal effect estimation. Adith Swaminathan (adswamin@microsoft.com). Joint work with Maggie Makar (MIT) and Emre Kıcıman (MSR AI). Brown TRIPODS, 1.16.2019.


  1. Data-efficient causal effect estimation Adith Swaminathan adswamin@microsoft.com Joint work with Maggie Makar (MIT) and Emre Kıcıman (MSR AI) Brown TRIPODS 1.16.2019

  2. 1. Improve ML applications using Causal Reasoning 2. Use ML tools to perform Causal Inference [Diagram labels: Causal Reasoning; ML]

  3. 1. Causal Reasoning -> ML: “Use logs collected from interactive systems to evaluate and train new interaction policies” The data we collect from our interactive systems… …confounds ML models. Simple pragmatic fixes to address confounding!

  4. Example: Search [https://arxiv.org/abs/1608.04468 ; WSDM’17] Model the propensity of clicks on documents to de-bias the training set of learning-to-rank models

  5. Pointers to recent results • “IPW fixes collaborative filtering for recommendations” [Schnabel et al, ICML’16] • “Similar IPW-like ideas massively improve learning-to-rank for search” [Joachims et al, WSDM’17 Best Paper] • “Important to reason about variance of IPW for counterfactual learning” [Swaminathan & Joachims, ICML’15] • “We can do much better than IPW for structured treatments (slates)” [Swaminathan et al, NIPS’17] • “Self-normalized estimators are better to use in these applications” [Swaminathan & Joachims, NIPS’15] • “These techniques complement deep learning” [Joachims et al, ICLR’18]
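
The IPW and self-normalized estimators named on this slide can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not taken from the cited papers: a two-action logging policy, a two-action target policy, Bernoulli rewards, and the helper names `simulate_logs`, `ipw_estimate`, and `snipw_estimate`.

```python
import random

def simulate_logs(n=20000, seed=0):
    """Hypothetical logged data: the logging policy rarely picks the better action."""
    rng = random.Random(seed)
    log_p = {0: 0.8, 1: 0.2}        # logging policy's action probabilities (assumed)
    tgt_p = {0: 0.2, 1: 0.8}        # target policy's action probabilities (assumed)
    mean_reward = {0: 0.3, 1: 0.7}  # expected reward of each action (assumed)
    rewards, logged_props, target_probs = [], [], []
    for _ in range(n):
        a = 1 if rng.random() < log_p[1] else 0
        r = 1.0 if rng.random() < mean_reward[a] else 0.0
        rewards.append(r)
        logged_props.append(log_p[a])
        target_probs.append(tgt_p[a])
    return rewards, logged_props, target_probs

def ipw_estimate(rewards, logged_props, target_probs):
    """Inverse propensity weighting: reweight each logged reward by q(a) / p(a)."""
    n = len(rewards)
    return sum(r * q / p for r, p, q in zip(rewards, logged_props, target_probs)) / n

def snipw_estimate(rewards, logged_props, target_probs):
    """Self-normalized IPW: divide by the sum of weights instead of n."""
    weights = [q / p for p, q in zip(logged_props, target_probs)]
    return sum(w * r for w, r in zip(weights, rewards)) / sum(weights)

if __name__ == "__main__":
    rewards, props, tprobs = simulate_logs()
    print("naive logged mean:", sum(rewards) / len(rewards))
    print("IPW estimate:     ", ipw_estimate(rewards, props, tprobs))
    print("SNIPW estimate:   ", snipw_estimate(rewards, props, tprobs))
```

Under these assumed numbers the target policy's true value is 0.2 × 0.3 + 0.8 × 0.7 = 0.62, while the logging policy's own average reward is 0.38. The naive mean of the logs therefore misses the target, and both reweighted estimates should land near 0.62.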

  6. 1. Improve ML applications using Causal Reasoning 2. Use ML tools to perform Causal Inference [Diagram labels: Causal Reasoning; ML]

  7. 2. ML -> Causal Reasoning: “Data efficient treatment effect estimation” [AAAI’19] Representation learning + Causal inference = Bias-Variance Trade-off?

  8. Problem Setting: Will my patient’s blood pressure increase if I put her on medication A? Challenges: - A question of causal nature - Limited data at test time

  9. Individual Treatment Effect (ITE) ● Estimate the causal effect of an intervention: if $t$ changes, how does the outcome $Y_t$ change? ● Target for estimation: $Y_1 - Y_0$ ● Target is unobserved: the fundamental problem of causal inference ● ITE: $\tau(x) = \mathbb{E}_{Y_1 \sim \Pr(Y_1 \mid x)}[Y_1] - \mathbb{E}_{Y_0 \sim \Pr(Y_0 \mid x)}[Y_0]$
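
As a rough illustration of the ITE definition, the sketch below simulates both potential outcomes for a binary covariate x, reveals only the one matching a randomized treatment, and estimates the conditional effect as a difference in means. The data-generating process, the effect sizes in `TRUE_EFFECT`, and the function names are hypothetical and chosen only to make the example concrete.

```python
import random
from statistics import mean

# Hypothetical conditional effects tau(x) for a binary covariate x (assumed values).
TRUE_EFFECT = {0: -2.0, 1: 5.0}

def draw_potential_outcomes(x, rng):
    """Return (Y_0, Y_1) for one unit with covariate x under an assumed model."""
    y0 = 120.0 + rng.gauss(0, 1)
    y1 = 120.0 + TRUE_EFFECT[x] + rng.gauss(0, 1)
    return y0, y1

def estimate_ite(x, n=4000, seed=0):
    """Estimate tau(x) = E[Y_1 | x] - E[Y_0 | x] from randomized data.

    Each unit reveals only one potential outcome, so Y_1 - Y_0 is never
    observed for any single unit; the conditional means are still estimable.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        y0, y1 = draw_potential_outcomes(x, rng)
        t = rng.random() < 0.5  # randomized treatment assignment
        (treated if t else control).append(y1 if t else y0)
    return mean(treated) - mean(control)

if __name__ == "__main__":
    for x in (0, 1):
        print(f"x={x}: estimated tau = {estimate_ite(x):.2f}, true tau = {TRUE_EFFECT[x]}")
```

Randomized treatment is assumed here so that a plain difference in means is valid; with observational data an adjustment for confounding would also be needed.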

  10. ITE estimation from observational data: Two functions: (1) Adjustment for confounding (confounders); (2) Estimation of heterogeneity (effect modifiers)

  11. Confounders vs. Effect modifiers: Average treatment effect

  12. Confounders vs. Effect modifiers
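
The distinction between confounders and effect modifiers can be sketched on simulated data. The setup below is an assumption for illustration only, not the example from the slides: a binary confounder `c` raises both the chance of treatment and the outcome, while a binary effect modifier `m` changes the size of the treatment effect but not who gets treated.

```python
import random
from statistics import mean

def simulate_observational(n=40000, seed=0):
    """Rows of (c, m, t, y) from an assumed data-generating process."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = int(rng.random() < 0.5)                  # confounder
        m = int(rng.random() < 0.5)                  # effect modifier
        t = int(rng.random() < (0.8 if c else 0.2))  # treatment depends on c only
        effect = 3.0 if m else 1.0                   # effect depends on m only
        y = 2.0 * c + effect * t + rng.gauss(0, 1)
        rows.append((c, m, t, y))
    return rows

def diff_in_means(rows):
    """Naive contrast: mean outcome of treated minus mean outcome of controls."""
    treated = [y for _, _, t, y in rows if t == 1]
    control = [y for _, _, t, y in rows if t == 0]
    return mean(treated) - mean(control)

def adjusted_effect(rows):
    """Stratify on the confounder c and average the per-stratum contrasts."""
    total = 0.0
    for c_val in (0, 1):
        stratum = [r for r in rows if r[0] == c_val]
        total += len(stratum) / len(rows) * diff_in_means(stratum)
    return total

def adjusted_effect_by_modifier(rows, m_val):
    """Confounding-adjusted effect within one level of the effect modifier m."""
    return adjusted_effect([r for r in rows if r[1] == m_val])

if __name__ == "__main__":
    rows = simulate_observational()
    print("naive difference:      ", round(diff_in_means(rows), 2))
    print("adjusted average effect:", round(adjusted_effect(rows), 2))
    for m_val in (0, 1):
        print(f"adjusted effect, m={m_val}:", round(adjusted_effect_by_modifier(rows, m_val), 2))
```

In this assumed setup the true average treatment effect is 2.0, with effects of 1.0 and 3.0 at the two levels of `m`. The naive contrast is biased upward (toward roughly 3.2) because treated units are more likely to have `c = 1`. Adjusting for `c` removes that bias, and conditioning on `m` reveals the heterogeneity.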

  13. Data efficient ITE estimation [Diagram labels: ITE Prediction; Adjustment for confounding; Estimation of heterogeneity; ITE discovery]

  14. Insight: Leverage the difference between tasks at training and test time to reduce data collection burden at test time [Diagram labels: ITE Discovery; ITE Prediction]

  15. Why Trees?

  16. Trees identify the most important axes of heterogeneity [Figure: binary tree whose nodes are numbered 1 (root), 2 (second level), 3 (third level)]

  17. Trees can be traversed till querying ability is exhausted

  18. Different individuals → different queries
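
The idea of traversing a tree until querying ability is exhausted can be sketched as follows. This is a minimal illustration under stated assumptions, not the DEITEE implementation: the `Node` class, the function `predict_with_budget`, the feature names, and all stored effect values are hypothetical. Every node, including internal ones, is assumed to store an ITE estimate so that traversal can stop early.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class Node:
    ite: float                     # ITE estimate for everyone reaching this node
    feature: Optional[str] = None  # feature queried here (None for a leaf)
    threshold: float = 0.0
    left: Optional["Node"] = None  # branch taken when value <= threshold
    right: Optional["Node"] = None # branch taken when value > threshold

def predict_with_budget(root: Node, answers: Dict[str, float], budget: int) -> Tuple[float, int]:
    """Walk down the tree while the individual can still answer queries.

    Stops at a leaf, when a needed feature is unavailable, or when the
    budget of distinct feature queries is used up. Returns the ITE estimate
    stored at the stopping node and the number of features queried.
    """
    node = root
    asked = set()
    while node.feature is not None:
        if node.feature not in answers:
            break  # this individual cannot answer the query
        if node.feature not in asked:
            if len(asked) >= budget:
                break  # querying ability exhausted
            asked.add(node.feature)
        value = answers[node.feature]
        node = node.left if value <= node.threshold else node.right
    return node.ite, len(asked)

# Hypothetical tree: the most important split sits at the root.
tree = Node(
    ite=1.0, feature="age", threshold=30,
    left=Node(ite=0.5),
    right=Node(ite=1.5, feature="smoker", threshold=0.5,
               left=Node(ite=1.2), right=Node(ite=2.4)),
)

if __name__ == "__main__":
    print(predict_with_budget(tree, {"age": 45, "smoker": 1}, budget=2))
    print(predict_with_budget(tree, {"age": 45, "smoker": 1}, budget=1))
    print(predict_with_budget(tree, {"age": 20}, budget=5))
```

Different individuals end up answering different queries: someone routed to the left branch is asked only about `age`, while someone routed right may also be asked about `smoker` if the budget allows.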

  19. Algorithm: DEITEE (Data Efficient Individual Treatment Effect Estimator). ITE Discovery: Base model. ITE Prediction: DEITEE model.

  20. Experiments: Synthetic • Data: ACIC’17 simulated data (“semi-synthetic”); N=5k; d=58 • Base models: BART and GRF • Benchmarks: Train BART/GRF with feature regularization • Evaluation: (1) Accuracy relative to true ITE; (2) Number of features queried

  21. DEITEE: Features queried

  22. DEITEE doesn’t sacrifice accuracy

  23. Experiment on real data: What is the effect of mother’s habits on newborn’s health? 1989 MA singleton births (CDC); N=90k; d=77

      | Mother’s habit (treatment) | Mean Absolute Error relative to proxy ITE: BART | Mean Absolute Error relative to proxy ITE: DEITEE-BART | Mean number of features: DEITEE |
      | --- | --- | --- | --- |
      |  | 580.20 | 580.20 | 15.42 |
      | Smoking | 587.62 | 587.62 | 16.2 |

      [Figure: example tree with Y/N branches; node labels: Alcohol?, HS Education, Age, Prenatal care, Health risks?, Age?, Married?]

  24. Conclusions • DEITEE reduces the number of features required to estimate individual causal effects ❖ Leverage the difference between ITE discovery and ITE prediction • Ongoing: Careful analysis of distillation error; guarantees on effect modifier discovery • Need: A good, robust method for model selection

  25. Thanks! adswamin@microsoft.com [Diagram labels: Confounding; Heterogeneity; ITE Prediction; ITE Discovery; ML]
