Machine Learning for Healthcare HST.956, 6.S897, Lecture 15: Causal Inference Part 2


SLIDE 1

Machine Learning for Healthcare HST.956, 6.S897

Lecture 15: Causal Inference Part 2 David Sontag

Acknowledgement: adapted from slides by Uri Shalit (Technion)

SLIDE 2

Reminder: Potential Outcomes

  • Each unit (individual) x_i has two potential outcomes:

    – Y0(x_i) is the potential outcome had the unit not been treated: the “control outcome”

    – Y1(x_i) is the potential outcome had the unit been treated: the “treated outcome”

  • Conditional average treatment effect for unit i:

    CATE(x_i) = E_{Y1∼p(Y1 | x_i)}[Y1 | x_i] − E_{Y0∼p(Y0 | x_i)}[Y0 | x_i]

  • Average Treatment Effect:

    ATE = E_{x∼p(x)}[CATE(x)]

SLIDE 3

Two common approaches for counterfactual inference:
  • Covariate adjustment
  • Propensity scores

SLIDE 4

Covariate adjustment (reminder)

Explicitly model the relationship between treatment, confounders, and outcome:

[Figure: covariates (features) x_1, x_2, x_3, … and treatment T feed into a regression model g(x, T), which outputs the outcome y]

SLIDE 5

Covariate adjustment (reminder)

  • Under ignorability,

    CATE(x) = E[Y1 | T=1, x] − E[Y0 | T=0, x]

  • Fit a model g(x, t) ≈ E[Y_t | T=t, x], then:

    ĈATE(x_i) = g(x_i, 1) − g(x_i, 0).

SLIDE 6

Covariate adjustment with linear models

  • Assume that:

    Y_t(x) = γ·x + δ·t + ε_t,   E[ε_t] = 0

  • Then:

    CATE(x) := E[Y1(x) − Y0(x)] = E[(γx + δ + ε1) − (γx + ε0)] = δ

[Figure: causal graph relating age (confounder), medication (treatment), and blood pressure (outcome)]
SLIDE 7
Covariate adjustment with linear models

  • Assume that:

    Y_t(x) = γ·x + δ·t + ε_t,   E[ε_t] = 0

  • Then:

    CATE(x) := E[Y1(x) − Y0(x)] = E[(γx + δ + ε1) − (γx + ε0)] = δ

    ATE := E_{p(x)}[CATE(x)] = δ

[Figure: causal graph relating age (confounder), medication (treatment), and blood pressure (outcome)]

SLIDE 8
Covariate adjustment with linear models

  • Assume that:

    Y_t(x) = γ·x + δ·t + ε_t,   E[ε_t] = 0,   so   ATE := E_{p(x)}[CATE(x)] = δ

  • For causal inference we need to estimate δ well, not Y_t(x): identification, not prediction

  • This is a major difference between ML and statistics

[Figure: causal graph relating age (confounder), medication (treatment), and blood pressure (outcome)]

SLIDE 9

What happens if true model is not linear?

  • True data generating process, x ∈ ℝ:

    Y_t(x) = γ·x + δ·t + ε·x²,   so   ATE = E[Y1 − Y0] = δ

  • Hypothesized (linear) model:

    Ŷ_t(x) = γ̂·x + δ̂·t

  • Least squares then yields

    δ̂ = δ + ε · (E[x²]·E[x²t] − E[x³]·E[xt]) / (E[x²]·E[t²] − E[xt]²)

Depending on ε, the estimate δ̂ can be made arbitrarily large or small!
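A short simulation makes the danger concrete (all numbers here are illustrative assumptions): with a quadratic term in the true process, the misspecified linear fit misses δ badly, while adding the x² feature recovers it exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(0.5, 1.0, size=n)
t = rng.binomial(1, 1 / (1 + np.exp(-2 * x)))   # confounded treatment assignment
delta, eps = 0.5, 3.0
y = 1.0 * x + delta * t + eps * x**2            # true ATE = delta = 0.5

# Misspecified hypothesis: y ~= gamma*x + delta*t
delta_lin = np.linalg.lstsq(np.column_stack([x, t]), y, rcond=None)[0][1]

# Correctly specified: y ~= gamma*x + delta*t + eps*x**2
delta_quad = np.linalg.lstsq(np.column_stack([x, t, x**2]), y, rcond=None)[0][1]
```

`delta_lin` is biased away from 0.5 by an amount proportional to `eps`, while `delta_quad` recovers it; good prediction of y is not the same as identification of δ.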

SLIDE 10

Covariate adjustment with non-linear models

  • Random forests and Bayesian trees

Hill (2011), Athey & Imbens (2015), Wager & Athey (2015)

  • Gaussian processes

Hoyer et al. (2009), Zigler et al. (2012)

  • Neural networks

Beck et al. (2000), Johansson et al. (2016), Shalit et al. (2016), Lopez-Paz et al. (2016)

SLIDE 11

Example: Gaussian processes

[Figure: two panels fitting treated and control outcome models Ŷ1(x) and Ŷ0(x) to treated and control data. “GP-Independent” fits separate treated and control models; “GP-Grouped” fits a joint treated and control model. Figures: Vincent Dorie & Jennifer Hill]

SLIDE 12

Example: Neural networks

Shalit, Johansson, Sontag. Estimating Individual Treatment Effect: Generalization Bounds and Algorithms. ICML, 2017

[Figure: neural network architecture. Covariates pass through shared representation layers Φ, which branch into separate heads predicting the two potential outcomes; the learning objective combines the prediction losses with a term on the shared representation]

SLIDE 13

Matching

  • Find each unit’s long-lost counterfactual identical twin, and check up on its outcome

SLIDE 14

Matching

  • Find each unit’s long-lost counterfactual identical twin, and check up on its outcome

[Photos: Obama, had he gone to law school; Obama, had he gone to business school]

SLIDE 15

Matching

  • Find each unit’s long-lost counterfactual identical twin, and check up on its outcome

  • Used for estimating both ATE and CATE
SLIDE 16

Match to nearest neighbor from opposite group

[Figure: scatter plot of treated and control units, axes: age and Charlson comorbidity index]

SLIDE 17

Match to nearest neighbor from opposite group

[Figure: scatter plot of treated and control units, axes: age and Charlson comorbidity index]

SLIDE 18

1-NN Matching

  • Let d(⋅,⋅) be a metric between x’s
  • For each unit i, define

    k(i) = argmin_{j s.t. t_j ≠ t_i} d(x_j, x_i)

    k(i) is the nearest counterfactual neighbor of i

  • If t_i = 1, unit i is treated:

    ĈATE(x_i) = y_i − y_{k(i)}

  • If t_i = 0, unit i is control:

    ĈATE(x_i) = y_{k(i)} − y_i

SLIDE 19

1-NN Matching

  • Let d(⋅,⋅) be a metric between x’s
  • For each unit i, define

    k(i) = argmin_{j s.t. t_j ≠ t_i} d(x_j, x_i)

    k(i) is the nearest counterfactual neighbor of i

  • ĈATE(x_i) = (2t_i − 1)(y_i − y_{k(i)})

  • ÂTE = (1/n) Σ_{i=1}^{n} ĈATE(x_i)
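The estimator on this slide can be sketched directly. The one-dimensional covariate, the absolute-difference metric d, the helper name `matching_ate`, and the synthetic data are all assumptions made for the illustration:

```python
import numpy as np

def matching_ate(x, t, y):
    """1-NN matching: impute each unit's counterfactual outcome with the
    outcome of its nearest neighbor in the opposite treatment group."""
    x, t, y = map(np.asarray, (x, t, y))
    cate = np.empty(len(t))
    for i in range(len(t)):
        opposite = np.flatnonzero(t != t[i])                 # units j with t_j != t_i
        k = opposite[np.argmin(np.abs(x[opposite] - x[i]))]  # k(i) under d(x_j, x_i) = |x_j - x_i|
        cate[i] = (2 * t[i] - 1) * (y[i] - y[k])             # CATE-hat(x_i)
    return cate.mean()                                       # ATE-hat

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(0, 1, size=n)
t = rng.binomial(1, 0.5, size=n)
y = 3.0 * x + 2.0 * t            # true treatment effect = 2
ate_hat = matching_ate(x, t, y)  # close to 2.0
```

With many covariates the choice of metric d dominates the estimate, which is the weakness listed on the next slide.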

SLIDE 20

Matching

  • Interpretable, especially in the small-sample regime
  • Nonparametric
  • Heavily reliant on the underlying metric
  • Could be misled by features which don’t affect the outcome

SLIDE 21

Covariate adjustment and matching

  • Matching is equivalent to covariate adjustment with two 1-nearest-neighbor classifiers:

    Ŷ1(x) = y_{NN1(x)},   Ŷ0(x) = y_{NN0(x)}

    where NN_t(x) is the nearest neighbor of x among units with treatment assignment t ∈ {0, 1}

  • 1-NN matching is in general inconsistent, though only with small bias (Imbens 2004)

SLIDE 22

Two common approaches for counterfactual inference:
  • Covariate adjustment
  • Propensity scores

SLIDE 23

Propensity scores

  • Tool for estimating the ATE
  • Basic idea: turn an observational study into a pseudo-randomized trial by re-weighting samples, similar to importance sampling

SLIDE 24

Inverse propensity score re-weighting

x1 = age,   x2 = Charlson comorbidity index

[Figure: treated and control covariate distributions, with p(x | T=0) ≠ p(x | T=1)]

SLIDE 25

π‘ž 𝑦 𝑒 = 0 β‹… π‘₯$(𝑦) β‰ˆ π‘ž 𝑦 𝑒 = 1 β‹… π‘₯'(𝑦)

reweighted control reweighted treated

Inverse propensity score re-weighting

𝑦' = 𝑏𝑕𝑓 𝑦; = Charlson comorbidity index

Treated Control

SLIDE 26

Propensity score

  • Propensity score: p(T=1 | x), estimated using machine learning tools
  • Samples are re-weighted by the inverse propensity score of the treatment they received
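The slide leaves the estimator open (“machine learning tools”); as one concrete possibility, here is a minimal logistic-regression propensity model fit by gradient ascent on the log-likelihood. The helper name, hyperparameters, and synthetic data are assumptions for illustration:

```python
import numpy as np

def fit_propensity(x, t, lr=0.5, steps=5000):
    """Fit p(T=1 | x) by logistic regression, trained with plain gradient
    ascent on the mean log-likelihood (a minimal stand-in for 'any ML tool')."""
    X = np.column_stack([x, np.ones(len(x))])          # feature + intercept
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))                   # current p(T=1 | x_i)
        w += lr * X.T @ (t - p) / len(t)               # gradient of mean log-likelihood
    return lambda xq: 1 / (1 + np.exp(-(w[0] * xq + w[1])))

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
true_p = 1 / (1 + np.exp(-(1.5 * x - 0.5)))            # true propensity (assumed DGP)
t = rng.binomial(1, true_p)
e_hat = fit_propensity(x, t)(x)                        # estimated propensity scores
```

Any calibrated classifier can play this role; logistic regression is just the simplest choice to write down.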

SLIDE 27

Propensity scores – algorithm

Inverse probability of treatment weighted estimator

How to calculate the ATE with the propensity score for a sample (x_1, t_1, y_1), …, (x_n, t_n, y_n):

  1. Use any ML method to estimate p̂(T=t | x)
  2. Compute

     ÂTE = (1/n) Σ_{i s.t. t_i=1} y_i / p̂(t_i=1 | x_i) − (1/n) Σ_{i s.t. t_i=0} y_i / p̂(t_i=0 | x_i)
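The two-step recipe above, sketched in code. For illustration the true (oracle) propensity is plugged in; in practice step 1 would estimate it with any ML method. Names and numbers are assumptions for the example:

```python
import numpy as np

def ipw_ate(t, y, e):
    """Step 2: inverse probability of treatment weighting.
    e[i] is p-hat(T=1 | x_i) from step 1."""
    n = len(t)
    treated = t == 1
    return (y[treated] / e[treated]).sum() / n \
         - (y[~treated] / (1 - e[~treated])).sum() / n

rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
e = 1 / (1 + np.exp(-x))                          # oracle propensity p(T=1 | x)
t = rng.binomial(1, e)
y = x + 2.0 * t + rng.normal(scale=0.1, size=n)   # true ATE = 2
ate_hat = ipw_ate(t, y, e)                        # close to 2.0
```

With e ≡ 0.5 the expression collapses to the randomized-trial difference of scaled group sums worked out on the following slides.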

SLIDE 28

Propensity scores – algorithm

Inverse probability of treatment weighted estimator

How to calculate the ATE with the propensity score for a sample (x_1, t_1, y_1), …, (x_n, t_n, y_n):

  1. In a randomized trial, p(T=t | x) = 0.5
  2. Compute

     ÂTE = (1/n) Σ_{i s.t. t_i=1} y_i / p̂(t_i=1 | x_i) − (1/n) Σ_{i s.t. t_i=0} y_i / p̂(t_i=0 | x_i)

SLIDE 29

Propensity scores – algorithm

Inverse probability of treatment weighted estimator

How to calculate the ATE with the propensity score for a sample (x_1, t_1, y_1), …, (x_n, t_n, y_n):

  1. In a randomized trial, p(T=t | x) = 0.5
  2. Compute

     ÂTE = (1/n) Σ_{i s.t. t_i=1} y_i / 0.5 − (1/n) Σ_{i s.t. t_i=0} y_i / 0.5

SLIDE 30

Propensity scores – algorithm

Inverse probability of treatment weighted estimator

How to calculate the ATE with the propensity score for a sample (x_1, t_1, y_1), …, (x_n, t_n, y_n):

  1. In a randomized trial, p = 0.5
  2. Compute

     ÂTE = (1/n) Σ_{i s.t. t_i=1} y_i / 0.5 − (1/n) Σ_{i s.t. t_i=0} y_i / 0.5
          = (2/n) Σ_{i s.t. t_i=1} y_i − (2/n) Σ_{i s.t. t_i=0} y_i

SLIDE 31

Propensity scores – algorithm

Inverse probability of treatment weighted estimator

How to calculate the ATE with the propensity score for a sample (x_1, t_1, y_1), …, (x_n, t_n, y_n):

  1. In a randomized trial, p = 0.5
  2. Compute

     ÂTE = (1/n) Σ_{i s.t. t_i=1} y_i / 0.5 − (1/n) Σ_{i s.t. t_i=0} y_i / 0.5
          = (2/n) Σ_{i s.t. t_i=1} y_i − (2/n) Σ_{i s.t. t_i=0} y_i

Each sum is over roughly n/2 terms, so this is the familiar difference of group means.

SLIDE 32

Propensity scores - derivation

  • Recall the average treatment effect:

    ATE = E_{x∼p(x)}[ E[Y1 | x, T=1] − E[Y0 | x, T=0] ]

  • We only have samples for:

    E_{x∼p(x|T=1)}[ E[Y1 | x, T=1] ]   and   E_{x∼p(x|T=0)}[ E[Y0 | x, T=0] ]

SLIDE 33

Propensity scores - derivation

  • We only have samples for:

    E_{x∼p(x|T=1)}[ E[Y1 | x, T=1] ]   and   E_{x∼p(x|T=0)}[ E[Y0 | x, T=0] ]

SLIDE 34

Propensity scores - derivation

  • We only have samples for:

    E_{x∼p(x|T=1)}[ E[Y1 | x, T=1] ]   and   E_{x∼p(x|T=0)}[ E[Y0 | x, T=0] ]

  • We need to turn p(x | T=1) into p(x):

    p(x | T=1) · p(T=1) / p(T=1 | x) = p(x)   ?

SLIDE 35

Propensity scores - derivation

  • We only have samples for:

    E_{x∼p(x|T=1)}[ E[Y1 | x, T=1] ]   and   E_{x∼p(x|T=0)}[ E[Y0 | x, T=0] ]

  • We need to turn p(x | T=1) into p(x):

    p(x | T=1) · p(T=1) / p(T=1 | x) = p(x)

    where p(T=1 | x) is the propensity score

SLIDE 36

Propensity scores - derivation

  • We only have samples for:

    E_{x∼p(x|T=1)}[ E[Y1 | x, T=1] ]   and   E_{x∼p(x|T=0)}[ E[Y0 | x, T=0] ]

  • We need to turn p(x | T=0) into p(x):

    p(x | T=0) · p(T=0) / p(T=0 | x) = p(x)

    where p(T=0 | x) = 1 − p(T=1 | x), one minus the propensity score

SLIDE 37
  • We want: E_{x∼p(x)}[Y1(x)]

  • We know that: p(x | T=1) · p(T=1) / p(T=1 | x) = p(x)

  • Thus:

    E_{x∼p(x|T=1)}[ (p(T=1) / p(T=1 | x)) · Y1(x) ] = E_{x∼p(x)}[Y1(x)]

  • We can approximate this empirically as (similarly for t_i = 0):

    (1/n1) Σ_{i s.t. t_i=1} ((n1/n) / p̂(t_i=1 | x_i)) · y_i = (1/n) Σ_{i s.t. t_i=1} y_i / p̂(t_i=1 | x_i)

SLIDE 38

Problems with IPW

  • Need to estimate the propensity score (a problem in all propensity score methods)
  • If there’s not much overlap, propensity scores become non-informative and easily miscalibrated
  • Weighting by the inverse can create large variance and large errors for small propensity scores
    – Exacerbated when there are more than two treatments

SLIDE 39

Many more ideas and methods

  • Natural experiments & regression discontinuity
  • Instrumental variables
SLIDE 40

Many more ideas and methods – Natural experiments

  • Does stress during pregnancy affect later child development?
  • Confounding: genetics, the mother’s personality, economic factors…
  • Natural experiment: the Cuban missile crisis of October 1962, when many people were afraid a nuclear war was about to break out
  • Compare children who were in utero during the crisis with children from immediately before and after

SLIDE 41

Many more ideas and methods – Instrumental variables

  • Informally: a variable which affects treatment assignment but has no direct effect on the outcome
  • Example: are private schools better than public schools?
  • Confounding: different student populations, different teacher populations
  • We can’t force people to attend a particular school
SLIDE 42

Many more ideas and methods – Instrumental variables

  • Informally: a variable which affects treatment assignment but has no direct effect on the outcome
  • Example: are private schools better than public schools?
  • We can’t force people to attend a particular school
  • We can randomly give out vouchers to some children, giving them an opportunity to attend private schools
  • The voucher assignment is the instrumental variable

SLIDE 43

Summary

  • Two approaches to using machine learning for causal inference:
    1. Predict the outcome given features and treatment, then use the resulting model to impute counterfactuals (covariate adjustment)
    2. Predict the treatment using the features (the propensity score), then use it to reweight outcomes or stratify the data
  • Causal graphs are important for thinking through whether the problem is set up appropriately and whether the assumptions hold