MACHINE LEARNING FOR HEALTHCARE
6.S897, HST.S53, Lecture 3: Causal inference

SLIDE 1

MACHINE LEARNING FOR HEALTHCARE 6.S897, HST.S53

Lecture 3: Causal inference

Prof. David Sontag
MIT EECS, CSAIL, IMES

(Thanks to Uri Shalit for many of the slides)

SLIDE 2

*Last week: Type 2 diabetes

Early detection of Type 2 diabetes (Razavian et al., Big Data, 2016)

[Figure: U.S. maps of diagnosed diabetes prevalence in 1994, 2000, and 2013; legend buckets <4.5%, 4.5%–5.9%, 6.0%–7.4%, 7.5%–8.9%, >9.0%]

SLIDE 3

*Last week: Discovered risk factors (diabetes, 1-year gap)

Highly weighted features                    Odds Ratio
Impaired Fasting Glucose (Code 790.21)      4.17 (3.87–4.49)
Abnormal Glucose NEC (790.29)               4.07 (3.76–4.41)
Hypertension (401)                          3.28 (3.17–3.39)
Obstructive Sleep Apnea (327.23)            2.98 (2.78–3.20)
Obesity (278)                               2.88 (2.75–3.02)
Abnormal Blood Chemistry (790.6)            2.49 (2.36–2.62)
Hyperlipidemia (272.4)                      2.45 (2.37–2.53)
Shortness Of Breath (786.05)                2.09 (1.99–2.19)
Esophageal Reflux (530.81)                  1.85 (1.78–1.93)

Additional disease risk factors include: Pituitary dwarfism (253.3), Hepatomegaly (789.1), Chronic Hepatitis C (070.54), Hepatitis (573.3), Calcaneal Spur (726.73), Thyrotoxicosis without mention of goiter (242.90), Sinoatrial Node dysfunction (427.81), Acute frontal sinusitis (461.1), Hypertrophic and atrophic conditions of skin (701.9), Irregular menstruation (626.4), … (Razavian et al., Big Data, 2016)

SLIDE 4

Thinking about interventions

1. Do highly weighted features suggest avenues for preventing onset of diabetes?
   • Example: gastric bypass surgery has the highest negative weight (9th most predictive feature)
   • What is the mathematical justification for thinking of highly weighted features in this way?
2. What happens if the patient did not get diabetes because of an intervention made in the gap?
   • How do we deconvolve the effect of interventions from the prediction task?
3. Solution: reframe as a causal inference problem, i.e. predict for which patients an intervention will reduce the chances of getting T2D

SLIDE 5

Randomized trials vs. observational studies

Which treatment works better? A or B

SLIDE 6

Randomized controlled trial (RCT)

Which treatment works better? A or B

[Figure: patients along a socio-economic axis (wealthy to poor); treatments A and B are assigned at random across the spectrum]

SLIDE 7

Observational study

Which treatment works better? A or B

[Figure: same socio-economic axis (wealthy to poor); the mix of A's and B's now varies with socio-economic class]

SLIDE 8

Observational study

Which treatment works better? A or B

Socio-economic class is a potential confounder

[Figure: same as previous slide]

SLIDE 9

In many fields randomized studies are the gold standard for causal inference, but…

SLIDE 10

• Does inhaling asbestos cause cancer?
• Does decreasing the interest rate reinvigorate the economy?
• We have a budget for one new anti-diabetic drug experiment. Can we use past health records of 100,000 diabetics to guide us?

SLIDE 11

Even randomized controlled trials have flaws

• Not personalized: only a population-level effect
• The study population might not represent the true population
• Recruiting is hard
• People might drop out of the study
• A study in one company/hospital/state/country could fail to generalize to others

SLIDE 12

Example 1. Precision medicine: Individualized Treatment Effect (ITE)

SLIDE 13

Which treatment is best for me?

• Which anti-hypertensive treatment?
  • Calcium channel blocker (A)
  • ACE inhibitor (B)
• Current situation:
  • Clinical trials
  • Doctor's knowledge & intuition
• Use datasets of patients and their histories:
  • Blood pressure = 150/95
  • WBC count = 6×10⁹/L
  • Temperature = 98°F
  • HbA1c = 6.6%
  • Thickness of heart artery plaque = 3mm
  • Weight = 65kg
SLIDE 14

Which treatment is best for me?

• Which anti-hypertensive treatment?
  • Calcium channel blocker (A)
  • ACE inhibitor (B)
• Future blood pressure under treatment A vs. B
• Individualized Treatment Effect (ITE)
SLIDE 15

Which treatment is best for me?

• Which anti-hypertensive treatment?
  • Calcium channel blocker (A)
  • ACE inhibitor (B)
• Potential confounder: maybe rich patients got medication A more often, and poor patients got medication B more often

SLIDE 16

Example 2. Job training: Average Treatment Effect (ATE)

SLIDE 17

Should the government fund job-training programs?

• Existing job-training programs seem to help the unemployed and underemployed find better jobs
• Should the government fund such programs?
• Maybe training helps, but only marginally? Is it worth the investment?
• Average Treatment Effect (ATE)
• Potential confounder: maybe only motivated people go to job training? Maybe they would have found better jobs anyway?

SLIDE 18

Observational studies

A major challenge in causal inference from observational studies is how to control or adjust for the confounding factors

SLIDE 19

Counterfactuals and causal inference

• Does treatment T cause outcome Y?
• "If T had not occurred, Y would not have occurred" (David Hume)
• Counterfactuals: Kim received job training (T), and her income one year later (Y) is $20,000. What would have been Kim's income had she not had job training?

SLIDE 20

Counterfactuals and causal inference

• Counterfactuals: Kim received job training (T), and her income one year later (Y) is $20,000. What would have been Kim's income had she not had job training?
• If her income would have been $18,000, we say that job training caused an increase of $2,000 in Kim's income
• The problem: you never know what might have been

SLIDE 21

Sliding Doors

[Image: poster for "Sliding Doors" (1998), a film that follows two counterfactual versions of the protagonist's life]

SLIDE 22

Potential Outcomes Framework (Rubin-Neyman Causal Model)

• Each unit x_i has two potential outcomes:
  • Y0(x_i) is the potential outcome had the unit not been treated: "control outcome"
  • Y1(x_i) is the potential outcome had the unit been treated: "treated outcome"
• Individual Treatment Effect for unit i:
  ITE(x_i) = E_{Y1~p(Y1|x_i)}[Y1 | x_i] − E_{Y0~p(Y0|x_i)}[Y0 | x_i]
• Average Treatment Effect:
  ATE := E[Y1 − Y0] = E_{x~p(x)}[ITE(x)]

SLIDE 23

Potential Outcomes Framework (Rubin-Neyman Causal Model)

• Each unit x_i has two potential outcomes:
  • Y0(x_i) is the potential outcome had the unit not been treated: "control outcome"
  • Y1(x_i) is the potential outcome had the unit been treated: "treated outcome"
• Observed factual outcome:
  y_i = t_i·Y1(x_i) + (1 − t_i)·Y0(x_i)
• Unobserved counterfactual outcome (see the sketch below):
  y_i^CF = (1 − t_i)·Y1(x_i) + t_i·Y0(x_i)
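To make the bookkeeping concrete, here is a minimal sketch in Python (simulated data; all numbers and names are ours, not the slides'): it assembles factual and counterfactual outcomes from Y0, Y1, and t exactly as in the formulas above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
Y0 = rng.normal(loc=120.0, scale=5.0, size=n)  # potential control outcomes Y0(x_i)
Y1 = Y0 - 10.0                                 # potential treated outcomes Y1(x_i)
t = rng.binomial(1, 0.5, size=n)               # treatment assignments t_i

y_factual = t * Y1 + (1 - t) * Y0              # y_i: the only outcome we ever observe
y_counterfactual = (1 - t) * Y1 + t * Y0       # y_i^CF: never observed in practice

ite = Y1 - Y0                                  # here exactly -10 for every unit
ate = ite.mean()                               # ATE = E[Y1 - Y0]
```

The "fundamental problem" below is visible here: only y_factual would ever appear in a real dataset.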

SLIDE 24

Terminology

• Unit: data point, e.g. patient, customer, student
• Treatment: binary indicator (in this tutorial); also called intervention
• Treated: units who received treatment = 1
• Control: units who received treatment = 0
• Factual: the set of observed units with their respective treatment assignment
• Counterfactual: the factual set with flipped treatment assignment

SLIDE 25

Example – blood pressure and age

[Figure: x = age, y = blood pressure; two curves, Y1(x) for treated and Y0(x) for control]

SLIDE 26

Blood pressure and age

[Figure: same curves; ITE(x) is the gap between Y1(x) and Y0(x) at a given x]
SLIDE 27

Blood pressure and age

[Figure: same curves; the ATE averages the gap Y1(x) − Y0(x) over the distribution of x]

SLIDE 28

Blood pressure and age

[Figure: same curves, now with observed samples: treated points on Y1(x), control points on Y0(x)]

SLIDE 29

Blood pressure and age

[Figure: observed treated and control samples, plus their unobserved counterfactual counterparts on the opposite curve]

SLIDE 30

The fundamental problem of causal inference

"The fundamental problem of causal inference": we only ever observe one of the two outcomes

SLIDE 31

"The Assumptions" – no unmeasured confounders

Y0, Y1: potential outcomes for control and treated
x: unit covariates (features)
T: treatment assignment

We assume:

(Y0, Y1) ⫫ T | x

The potential outcomes are independent of treatment assignment, conditioned on covariates x

SLIDE 32

"The Assumptions" – no unmeasured confounders

Y0, Y1: potential outcomes for control and treated
x: unit covariates (features)
T: treatment assignment

We assume:

(Y0, Y1) ⫫ T | x

This assumption is called ignorability

SLIDE 33

Ignorability

(Y0, Y1) ⫫ T | x

[Graphical model: covariates x (features) have arrows into the treatment T and into the potential outcomes Y0, Y1; x is the only parent of T]

SLIDE 34

Ignorability

(Y0, Y1) ⫫ T | x

[Graphical model instantiated: x = age, gender, weight, diet, heart rate at rest, …; T = anti-hypertensive medication; Y0, Y1 = blood pressure after medication A / after medication B]

SLIDE 35

No ignorability

[Graphical model: as before, but a hidden confounder ("diabetic") has arrows into both the treatment T (anti-hypertensive medication) and the potential outcomes, so (Y0, Y1) ⫫ T | x no longer holds]

SLIDE 36

"The Assumptions" – common support

Y0, Y1: potential outcomes for control and treated
x: unit covariates (features)
T: treatment assignment

We assume:

p(T = t | X = x) > 0  for all t and x

SLIDE 37

Average Treatment Effect

The expected causal effect of T on Y:

ATE := E[Y1 − Y0]

SLIDE 38

Average Treatment Effect – the adjustment formula

• Assuming ignorability, we will derive the adjustment formula (Hernán & Robins 2010, Pearl 2009)
• The adjustment formula is extremely useful in causal inference
• Also called the G-formula
SLIDE 39

Average Treatment Effect

The expected causal effect of T on Y:

ATE := E[Y1 − Y0]

SLIDE 40

Average Treatment Effect

The expected causal effect of T on Y:

ATE := E[Y1 − Y0]

E[Y1] = E_{x~p(x)}[ E_{Y1~p(Y1|x)}[Y1 | x] ]    (law of total expectation)

SLIDE 41

Average Treatment Effect

The expected causal effect of T on Y:

ATE := E[Y1 − Y0]

E[Y1] = E_{x~p(x)}[ E_{Y1~p(Y1|x)}[Y1 | x] ]
      = E_{x~p(x)}[ E_{Y1~p(Y1|x)}[Y1 | x, T = 1] ]    (ignorability: (Y0, Y1) ⫫ T | x)

SLIDE 42

Average Treatment Effect

The expected causal effect of T on Y:

ATE := E[Y1 − Y0]

E[Y1] = E_{x~p(x)}[ E_{Y1~p(Y1|x)}[Y1 | x] ]
      = E_{x~p(x)}[ E_{Y1~p(Y1|x)}[Y1 | x, T = 1] ]
      = E_{x~p(x)}[ E[Y1 | x, T = 1] ]    (shorter notation)

SLIDE 43

Average Treatment Effect

The expected causal effect of T on Y:

ATE := E[Y1 − Y0]

E[Y0] = E_{x~p(x)}[ E_{Y0~p(Y0|x)}[Y0 | x] ]
      = E_{x~p(x)}[ E_{Y0~p(Y0|x)}[Y0 | x, T = 0] ]    (ignorability)
      = E_{x~p(x)}[ E[Y0 | x, T = 0] ]
slide-44
SLIDE 44

Quantities we can estimate from data

The adjustment formula

(

E [Y1|x, T = 1] E [Y0|x, T = 0]

ATE = E [Y1 − Y0] = Ex∼p(x)[ E [Y1|x, T = 1]−E [Y0|x, T = 0] ] Under the assumption of ignorability, we have that:

slide-45
SLIDE 45

Quantities we cannot directly estimate from data

The adjustment formula

(

ATE = E [Y1 − Y0] = Ex∼p(x)[ E [Y1|x, T = 1]−E [Y0|x, T = 0] ] Under the assumption of ignorability, we have that:

E [Y0|x, T = 1] E [Y1|x, T = 0] E [Y0|x] E [Y1|x]

SLIDE 46

The adjustment formula

Under the assumption of ignorability, we have that:

ATE = E[Y1 − Y0] = E_{x~p(x)}[ E[Y1 | x, T = 1] − E[Y0 | x, T = 0] ]

Empirically we only have samples from p(x | T = 1) or p(x | T = 0); we must extrapolate to p(x)

SLIDE 47

Outline: tools of the trade

• Matching
• Covariate adjustment
• Propensity score

SLIDE 48

Set up

• Samples: x_1, x_2, …, x_n
• Observed binary treatment assignments: t_1, t_2, …, t_n
• Observed outcomes: y_1, y_2, …, y_n

x = (age, gender, married, education, income_last_year, …)
t ∈ {no_job_training, job_training}
y = income_one_year_after_training

• Does job training raise average future income?
SLIDE 49

Outline: tools of the trade

• Matching
• Covariate adjustment
• Propensity score

SLIDE 50

Matching

• Find each unit's long-lost counterfactual identical twin, check up on his outcome

SLIDE 51

Matching

• Find each unit's long-lost counterfactual identical twin, check up on his outcome

[Photos: Obama, had he gone to law school; Obama, had he gone to business school]

SLIDE 52

Matching

• Find each unit's long-lost counterfactual identical twin, check up on his outcome
• Used for estimating both ATE and ITE
SLIDE 53

Match to nearest neighbor from opposite group

[Figure: treated and control units plotted by age vs. years of education]

SLIDE 54

Match to nearest neighbor from opposite group

[Figure: same plot, with each unit linked to its nearest neighbor in the opposite group]

SLIDE 55

1-NN Matching

• Let d(⋅,⋅) be a metric between x's
• For each i, define j(i) = argmin_{j s.t. t_j ≠ t_i} d(x_j, x_i)
  j(i) is the nearest counterfactual neighbor of i
• If t_i = 1 (unit i is treated):  \hat{ITE}(x_i) = y_i − y_{j(i)}
• If t_i = 0 (unit i is control):  \hat{ITE}(x_i) = y_{j(i)} − y_i

SLIDE 56

1-NN Matching

• Let d(⋅,⋅) be a metric between x's
• For each i, define j(i) = argmin_{j s.t. t_j ≠ t_i} d(x_j, x_i)
  j(i) is the nearest counterfactual neighbor of i
• \hat{ITE}(x_i) = (2t_i − 1)(y_i − y_{j(i)})
• \hat{ATE} = (1/n) ∑_{i=1}^n \hat{ITE}(x_i)   (a code sketch follows below)
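A minimal sketch of this estimator (Euclidean distance stands in for the unspecified metric d; the function name is ours):

```python
import numpy as np

def one_nn_matching_ate(X, t, y):
    """1-NN matching: ITE_hat(x_i) = (2 t_i - 1)(y_i - y_{j(i)}), averaged to ATE_hat."""
    X, t, y = np.asarray(X, float), np.asarray(t), np.asarray(y, float)
    ite_hat = np.empty(len(y))
    for i in range(len(y)):
        opposite = np.where(t != t[i])[0]                   # units with flipped treatment
        dists = np.linalg.norm(X[opposite] - X[i], axis=1)  # d(x_j, x_i)
        j = opposite[np.argmin(dists)]                      # nearest counterfactual neighbor j(i)
        ite_hat[i] = (2 * t[i] - 1) * (y[i] - y[j])
    return ite_hat.mean()
```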

SLIDE 57

Matching

• Interpretable, especially in the small-sample regime
• Nonparametric
• Heavily reliant on the underlying metric (however, see below about propensity score matching)
• Could be misled by features which don't affect the outcome

SLIDE 58

Matching

• Many other matching methods we won't discuss:
  • Coarsened exact matching: Iacus et al. (2011)
  • Optimal matching: Rosenbaum (1989, 2002)
  • Propensity score matching: Rosenbaum & Rubin (1983), Austin (2011)
  • Mahalanobis distance matching: Rosenbaum (1989, 2002)

SLIDE 59

Outline: tools of the trade

• Matching
• Covariate adjustment
• Propensity score

SLIDE 60

Covariate adjustment

• Explicitly model the relationship between treatment, confounders, and outcome
• Also called "Response Surface Modeling"
• Used for both ITE and ATE
• A regression problem
SLIDE 61

[Diagram: covariates (features) x_1, x_2, x_3, … and treatment T feed into a regression model g(x, T), whose output is the outcome y]

SLIDE 62

[Diagram: same regression model g(x, T); the covariate effects are nuisance parameters, while the effect of the treatment T is the parameter of interest]

SLIDE 63

Covariate adjustment (parametric g-formula)

• Explicitly model the relationship between treatment, confounders, and outcome
• Under ignorability, the expected causal effect of T on Y is
  E_{x~p(x)}[ E[Y1 | T = 1, x] − E[Y0 | T = 0, x] ]
• Fit a model g(x, t) ≈ E[Y_t | T = t, x]; then
  \hat{ATE} = (1/n) ∑_{i=1}^n ( g(x_i, 1) − g(x_i, 0) )

SLIDE 64

Covariate adjustment (parametric g-formula)

• Explicitly model the relationship between treatment, confounders, and outcome
• Under ignorability, the expected causal effect of T on Y is
  E_{x~p(x)}[ E[Y1 | T = 1, x] − E[Y0 | T = 0, x] ]
• Fit a model g(x, t) ≈ E[Y_t | T = t, x]; then (sketch below)
  \hat{ITE}(x_i) = g(x_i, 1) − g(x_i, 0)
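A sketch of the parametric g-formula with an off-the-shelf regressor (the random forest is one arbitrary choice of g; the slides do not prescribe a model class):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def g_formula(X, t, y):
    """Fit g(x, t) ~ E[Y_t | T = t, x]; return ITE_hat per unit and ATE_hat."""
    Xt = np.column_stack([X, t])                             # treatment appended as a feature
    g = RandomForestRegressor(n_estimators=200, random_state=0).fit(Xt, y)
    g1 = g.predict(np.column_stack([X, np.ones(len(y))]))    # g(x_i, 1)
    g0 = g.predict(np.column_stack([X, np.zeros(len(y))]))   # g(x_i, 0)
    ite_hat = g1 - g0
    return ite_hat, ite_hat.mean()
```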

SLIDE 65

Covariate adjustment

[Figure: x = age, y = blood pressure; observed treated samples near Y1(x) and control samples near Y0(x)]

SLIDE 66

Covariate adjustment

[Figure: same plot; the fitted model g imputes the counterfactual treated and counterfactual control outcomes on the opposite curve]

SLIDE 67

Warning: this is not a classic supervised learning problem

• Our model was optimized to predict outcome, not to differentiate the influence of A vs. B
• What if our high-dimensional model threw away the feature of medication A/B?
• Maybe the model never saw a patient like Anna get medication A? Maybe there's a reason patients like Anna never get A?

SLIDE 68

Covariate adjustment – consistency

• If the model g(x, t) ≈ E[Y_t | T = t, x] is consistent in the limit of infinite samples, then under ignorability the estimated \hat{ATE} will converge to the true ATE
• A sufficient condition: overlap and a well-specified model

SLIDE 69

Covariate adjustment: no overlap

[Figure: x = age, y = blood pressure; treated and control samples occupy disjoint ranges of age, so neither Y1(x) nor Y0(x) can be fit where its group is unobserved]

SLIDE 70

Linear model

• Assume that (x = age, t = medication, Y = blood pressure):
  Y_t(x) = γx + δ·t + ε_t,  with E[ε_t] = 0
• Then:
  ITE(x) := Y1(x) − Y0(x) = (γx + δ + ε_1) − (γx + ε_0) = δ + ε_1 − ε_0
  ATE := E[Y1(x) − Y0(x)] = δ + E[ε_1] − E[ε_0] = δ

SLIDE 71

Linear model

• Assume that:
  Y_t(x) = γᵀx + δ·t + ε_t,  with E[ε_t] = 0
• We care about δ, not about Y_t(x): identification, not prediction
  ATE = E[Y1(x) − Y0(x)] = δ

SLIDE 72

Linear model

• Y_t(x) = γᵀx + δ·t + ε_t
• Hypertension is affected by many variables: lifestyle, weight, genetics, age
• Each of these is often a stronger predictor of blood pressure than the type of medication taken
• Regularization (e.g. Lasso) might remove the treatment variable, as the simulation below illustrates!
• Features split into ("nuisance parameters", "variable of interest"): age, weight, … are nuisance parameters; the medication is the variable of interest; blood pressure is the outcome
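A small simulation of this failure mode (all numbers invented for illustration): strong covariate effects plus a weak treatment effect, and an off-the-shelf Lasso zeroes out the treatment coefficient.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d = 500, 10
X = rng.normal(size=(n, d))           # covariates: lifestyle, weight, age, ...
t = rng.binomial(1, 0.5, size=n)      # treatment (randomized here for simplicity)
gamma = rng.normal(0.0, 2.0, size=d)  # strong covariate effects
delta = 0.3                           # weak true treatment effect
y = X @ gamma + delta * t + rng.normal(size=n)

model = Lasso(alpha=0.5).fit(np.column_stack([X, t]), y)
print(model.coef_[-1])                # treatment coefficient: often shrunk to exactly 0
```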

SLIDE 73

Regression – misspecification

• True data generating process, x ∈ ℝ:
  Y_t(x) = γx + δ·t + ε·x²,  so  ATE = E[Y1 − Y0] = δ
• Hypothesized model:
  \hat{Y}_t(x) = \hat{γ}x + \hat{δ}·t
• The fitted coefficient is biased (numerical check below):
  \hat{δ} = δ + ε · ( E[xt]·E[x²] − E[t²]·E[x²t] ) / ( E[xt]² − E[x²]·E[t²] )
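A quick numerical check of this bias (a hypothetical setup where the treatment probability depends on x, so the omitted x² term leaks into \hat{δ}):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(loc=1.0, scale=1.0, size=n)
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-2.0 * x)))  # assignment depends on x
gamma, delta, eps = 1.0, 1.0, 0.5
y = gamma * x + delta * t + eps * x**2               # true DGP includes eps * x^2

A = np.column_stack([np.ones(n), x, t])              # misspecified linear model [1, x, t]
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef[2])                                       # delta_hat: noticeably biased away from 1.0
```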

SLIDE 74

Using machine learning for causal inference

• Machine learning techniques can be very useful and have recently seen wider adoption:
  • Random forests and Bayesian trees: Hill (2011), Athey & Imbens (2015), Wager & Athey (2015)
  • Gaussian processes: Hoyer et al. (2009), Zigler et al. (2012)
  • Neural nets: Beck et al. (2000), Johansson et al. (2016), Shalit et al. (2016), Lopez-Paz et al. (2016)
  • "Causal" Lasso: Belloni et al. (2013), Farrell (2015), Athey et al. (2016)

SLIDE 75

Using machine learning for causal inference

• Machine learning techniques can be very useful and have recently seen wider adoption
• How is the treatment variable used?
  • Fit two different models for treated and control?
  • Not regularized?
  • Privileged?

SLIDE 76

Example: Gaussian process

[Figure: two panels, "GP-Independent" (separate treated and control models) and "GP-Grouped" (joint treated and control model), each showing fitted curves Y1(x) and Y0(x) through treated and control samples. Figures: Vincent Dorie & Jennifer Hill]

SLIDE 77

Covariate adjustment and matching

• Matching is equivalent to covariate adjustment with two 1-NN regressors:
  \hat{Y}_1(x) = y_{NN_1(x)},  \hat{Y}_0(x) = y_{NN_0(x)},
  where NN_t(x) is the nearest neighbor of x among units with treatment assignment t ∈ {0, 1}
• 1-NN matching is in general inconsistent, though only with small bias (Imbens 2004)

SLIDE 78

Outline: tools of the trade

• Matching
• Covariate adjustment
• Propensity score

SLIDE 79

Propensity score

• A tool for estimating the ATE
• Basic idea: turn an observational study into a pseudo-randomized trial by re-weighting samples, similar to importance sampling

SLIDE 80

Inverse propensity score re-weighting

[Figure: treated and control units plotted by x_1 = age, x_2 = income; the two covariate distributions differ: p(x | t = 0) ≠ p(x | t = 1)]

Re-weighting matches the two distributions:

p(x | t = 0) · w_0(x) ≈ p(x | t = 1) · w_1(x)
(reweighted control ≈ reweighted treated)

SLIDE 81

Propensity score

• Propensity score: p(T = 1 | x), estimated using machine learning tools
• Samples are re-weighted by the inverse propensity score of the treatment they received

SLIDE 82

How to obtain ATE with propensity score

SLIDE 83

Propensity scores – algorithm
Inverse probability of treatment weighted (IPW) estimator

How to calculate the ATE with the propensity score, for a sample (x_1, t_1, y_1), …, (x_n, t_n, y_n):

1. Use any ML method to estimate \hat{p}(T = t | x)
2. \hat{ATE} = (1/n) ∑_{i s.t. t_i=1} y_i / \hat{p}(t_i = 1 | x_i) − (1/n) ∑_{i s.t. t_i=0} y_i / \hat{p}(t_i = 0 | x_i)

(A code sketch follows below.)
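A minimal sketch of steps 1–2 (logistic regression as the "any ML method"; function name is ours):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_ate(X, t, y):
    """Inverse probability of treatment weighted (IPW) estimate of the ATE."""
    t, y = np.asarray(t), np.asarray(y, float)
    # Step 1: estimate the propensity score p_hat(T = 1 | x).
    ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]
    n = len(y)
    treated = t == 1
    # Step 2: re-weight each unit by the inverse propensity of its own treatment.
    return (y[treated] / ps[treated]).sum() / n \
         - (y[~treated] / (1.0 - ps[~treated])).sum() / n
```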

SLIDE 84

Propensity scores – algorithm
Inverse probability of treatment weighted (IPW) estimator

Special case: a randomized trial, where p(T = t | x) = 0.5

\hat{ATE} = (1/n) ∑_{i s.t. t_i=1} y_i / \hat{p}(t_i = 1 | x_i) − (1/n) ∑_{i s.t. t_i=0} y_i / \hat{p}(t_i = 0 | x_i)

SLIDE 85

Propensity scores – algorithm
Inverse probability of treatment weighted (IPW) estimator

Special case: a randomized trial, where p(T = t | x) = 0.5

\hat{ATE} = (1/n) ∑_{i s.t. t_i=1} y_i / 0.5 − (1/n) ∑_{i s.t. t_i=0} y_i / 0.5

SLIDE 86

Propensity scores – algorithm
Inverse probability of treatment weighted (IPW) estimator

Special case: a randomized trial, where p = 0.5

\hat{ATE} = (1/n) ∑_{i s.t. t_i=1} y_i / 0.5 − (1/n) ∑_{i s.t. t_i=0} y_i / 0.5
          = (2/n) ∑_{i s.t. t_i=1} y_i − (2/n) ∑_{i s.t. t_i=0} y_i

SLIDE 87

Propensity scores – algorithm
Inverse probability of treatment weighted (IPW) estimator

Special case: a randomized trial, where p = 0.5

\hat{ATE} = (2/n) ∑_{i s.t. t_i=1} y_i − (2/n) ∑_{i s.t. t_i=0} y_i

Each sum runs over ~ n/2 terms, so with the factor 2/n each term is simply the average outcome within its treatment group

SLIDE 88

Propensity scores – derivation

• Recall the average treatment effect:
  E_{x~p(x)}[ E[Y1 | x, T = 1] − E[Y0 | x, T = 0] ]
• We only have samples for:
  E_{x~p(x|T=1)}[ E[Y1 | x, T = 1] ]  and  E_{x~p(x|T=0)}[ E[Y0 | x, T = 0] ]

SLIDE 89

Propensity scores – derivation

• We only have samples for:
  E_{x~p(x|T=1)}[ E[Y1 | x, T = 1] ]  and  E_{x~p(x|T=0)}[ E[Y0 | x, T = 0] ]

SLIDE 90

Propensity scores – derivation

• We only have samples for:
  E_{x~p(x|T=1)}[ E[Y1 | x, T = 1] ]  and  E_{x~p(x|T=0)}[ E[Y0 | x, T = 0] ]
• We need to turn p(x | T = 1) into p(x):
  p(x | T = 1) · p(T = 1) / p(T = 1 | x) = p(x)    (by Bayes' rule)

SLIDE 91

Propensity scores – derivation

• We only have samples for:
  E_{x~p(x|T=1)}[ E[Y1 | x, T = 1] ]  and  E_{x~p(x|T=0)}[ E[Y0 | x, T = 0] ]
• We need to turn p(x | T = 1) into p(x):
  p(x | T = 1) · p(T = 1) / p(T = 1 | x) = p(x)
  The denominator p(T = 1 | x) is the propensity score

SLIDE 92

Propensity scores – derivation

• We only have samples for:
  E_{x~p(x|T=1)}[ E[Y1 | x, T = 1] ]  and  E_{x~p(x|T=0)}[ E[Y0 | x, T = 0] ]
• We need to turn p(x | T = 0) into p(x):
  p(x | T = 0) · p(T = 0) / p(T = 0 | x) = p(x)
  The denominator p(T = 0 | x) is the propensity score

SLIDE 93

• We only have samples for: E_{x~p(x|T=1)}[ E[Y1 | x, T = 1] ]
• We want: E_{x~p(x)}[ E[Y1 | x, T = 1] ]
• We know that: p(x | T = 1) · p(T = 1) / p(T = 1 | x) = p(x)
• Then (expanded below):
  E_{x~p(x|T=1)}[ (p(T = 1) / p(T = 1 | x)) · E[Y1 | x, T = 1] ] = E_{x~p(x)}[ E[Y1 | x, T = 1] ]
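Spelled out, the "Then" step is a one-line change of measure (our expansion of the slide's identity):

```latex
\mathbb{E}_{x \sim p(x|T=1)}\!\left[ \tfrac{p(T=1)}{p(T=1|x)}\, \mathbb{E}[Y_1 \mid x, T=1] \right]
  = \int p(x|T=1)\, \tfrac{p(T=1)}{p(T=1|x)}\, \mathbb{E}[Y_1 \mid x, T=1]\, dx
  = \int p(x)\, \mathbb{E}[Y_1 \mid x, T=1]\, dx
  = \mathbb{E}_{x \sim p(x)}\big[ \mathbb{E}[Y_1 \mid x, T=1] \big]
```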
SLIDE 94

Calculating the propensity score

• If p(T = t | x) is known, then propensity score re-weighting is consistent
  • Example: an ad-placement algorithm samples T = t based on a known algorithm
• Usually the score is unknown and must be estimated
  • Example: use logistic regression to estimate the probability that patient x received medication T = t
• Calibration: we must estimate the probability correctly, not just the binary assignment variable (see the sketch below)
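One way to implement this (sklearn's calibration_curve is our choice of diagnostic, not the slides'): estimate the score with logistic regression, then check that predicted probabilities match observed treatment frequencies bin by bin.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.calibration import calibration_curve

def estimate_propensity(X, t, n_bins=10):
    """Estimate p_hat(T = 1 | x) and return a simple calibration diagnostic."""
    ps = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]
    # In each bin of predicted probability, the observed fraction of treated
    # units should match the mean prediction; large gaps mean miscalibration.
    frac_treated, mean_pred = calibration_curve(t, ps, n_bins=n_bins)
    return ps, frac_treated, mean_pred
```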

SLIDE 95

"The Assumptions" – ignorability

• If ignorability doesn't hold, then the average treatment effect is not
  E_{x~p(x)}[ E[Y1 | T = 1, x] − E[Y0 | T = 0, x] ],
  invalidating the starting point of the derivation

SLIDE 96

"The Assumptions" – overlap

• If there's not much overlap, propensity scores become non-informative and easily miscalibrated
• The sample variance of inverse propensity score re-weighting scales with
  ∑_{i=1}^n 1 / ( \hat{p}(t = 1 | x_i) · \hat{p}(t = 0 | x_i) ),
  which can grow very large when samples are non-overlapping (Williamson et al., 2014)

SLIDE 97

Propensity score in machine learning

• The same idea appears in importance sampling!
• Used in off-policy evaluation and learning from logged bandit feedback (Swaminathan & Joachims, 2015)
• Similar ideas are used in covariate-shift work (Bickel et al., 2009)