Reliable Decision Support using Counterfactual Models - Suchi Saria (PowerPoint presentation)

SLIDE 1

Suchi Saria Assistant Professor Computer Science, Applied Math & Stats and Health Policy Institute for Computational Medicine

Reliable Decision Support 
 using Counterfactual Models

w/ Peter Schulam, PhD candidate

SLIDE 2

Example: Customer Churn

P(Cancels Account | X)

SLIDE 3

Example: Customer Churn

[Figure: labeled (customer, churn outcome) training examples]

Supervised Learning → P̂

SLIDE 4

Example: Customer Churn

[Figure: labeled (customer, churn outcome) training examples]

Supervised Learning → P̂

Supervised ML models can be biased
 for decision-making problems!

SLIDE 5

Why?

[Figure: customers receiving ad emails, discounts, etc.]

Past actions were determined by some policy.

SLIDE 6

Why?

[Figure: customers receiving ad emails, discounts, etc.]

Actions determined by a policy based on your learned model P̂.

SLIDE 7

Why?

P_{π_train}(Cancels Account | X) ≠ P_{π_test(P̂)}(Cancels Account | X)

Supervised ML leads to models that are unstable to
 shifts in the policy between train and test.
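The train/test policy shift can be reproduced in a few lines. A toy simulation, with all churn probabilities and policies invented for illustration: a supervised estimate of P(Cancels | X) fit under π_train badly understates churn risk once the discount policy changes.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(policy, n=100_000):
    """Simulate customers under a given discount policy.

    x = 1 marks an at-risk customer; a = 1 marks a discount offer.
    All probabilities here are made up for illustration.
    """
    x = rng.binomial(1, 0.5, n)
    a = rng.binomial(1, policy(x))
    # Discounts cut churn risk for at-risk customers
    p_churn = 0.1 + 0.6 * x - 0.5 * x * a
    y = rng.binomial(1, p_churn)
    return x, y

# Training regime: the policy aggressively discounts at-risk customers
x_tr, y_tr = simulate(lambda x: 0.9 * x)
# Test regime: the discounts are withdrawn
x_te, y_te = simulate(lambda x: 0.0 * x)

# A supervised model of P(Cancels | X = at-risk) silently bakes in pi_train
p_hat = y_tr[x_tr == 1].mean()    # about 0.25 under pi_train
p_test = y_te[x_te == 1].mean()   # about 0.70 under pi_test
```

The conditional P(Cancels | X) itself moved when the policy moved, even though nothing about the customers changed.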

SLIDE 8

Example: Risk Monitoring

[Figure: patient timeline with adverse event onset] Is the patient at risk of septic shock?

SLIDE 9

  • Rise in temperature and rise in WBC are indicators of sepsis and death
  • But doctors in hospital H1 aggressively treat patients with high temperature
  • As doctors treat more aggressively, the supervised learning model learns that high temperature is associated with low risk

Dyagilev and Saria, Machine Learning 2015

SLIDE 10

Increasing discrepancy in physician prescription behavior in train vs. test environments

[Figure: treat based on temp vs. treat based on WBC]

Dyagilev and Saria, Machine Learning 2015

A predictive model trained using classical supervised ML creates
 unsafe scenarios where sick patients are overlooked.

SLIDE 11

Run an experiment:

  • Observe the outcome under different scenarios
  • Clone the customer; give a 10% and a 20% discount code to each clone
  • Choose the action that has the better outcome

{Y(d10), Y(d20)}

Y(d10): outcome under the 10% discount.
SLIDE 12

Run an experiment:

  • Observe the outcome under different scenarios
  • Clone the customer; give a 10% and a 20% discount code to each clone
  • Choose the action that has the better outcome

{Y(d10), Y(d20)}

Y(d20): outcome under the 20% discount.
SLIDE 13

  • Factual: outcome observed in the data
  • Counterfactual: outcome is unobserved

{Y(d10), Y(d20)}

Can we learn models of these outcomes from observational data?
SLIDE 14

Potential Outcomes

{Y(a) : a ∈ A}

A: set of actions; Y(a): random variable; a: action

Potential outcomes model the observed outcome under each possible action (or intervention).

Neyman et al., 1923; Rubin, 1974; Rubin, 2005

SLIDE 15

Sequential Decisions in
 Continuous-Time

[Figure: PFVC (lung capacity) vs. years since first symptom]

SLIDE 22

Counterfactual GP

[Figure: PFVC (lung capacity) vs. years since first symptom; future trajectory marked "?"]
SLIDE 23

Counterfactual GP

[Figure: PFVC vs. years since first symptom]

E[Y(a) | H = h]
SLIDE 24

Counterfactual GP

[Figure: PFVC vs. years since first symptom]

E[Y(a1) | H = h], E[Y(a2) | H = h]
SLIDE 25

Counterfactual GP

[Figure: PFVC vs. years since first symptom]

E[Y(a1) | H = h], E[Y(a2) | H = h], E[Y(a3) | H = h]
SLIDE 26
Related Work

  • Counterfactual models: see Schulam and Saria, NIPS 2017 for a discussion of related work.
  • Off-policy evaluation: re-weighting to evaluate the reward of a policy when learning from offline data (Dudik et al., 2011; Paduraru et al., 2013; Jiang and Li, 2016)
  • Ads; single intervention (Bottou et al., 2013; Brodersen et al., 2015)
  • Epidemiology; multiple sequential interventions (Taubman et al., 2009; Lok et al., 2008)
  • Sparse, irregularly sampled longitudinal data; functional outcomes (Xu, Xu, Saria, 2016; Schulam and Saria, 2017)
SLIDE 27

Critical Assumptions

  • To learn the potential outcome models, we will use three

important assumptions:

  • (1) Consistency
  • Links observed outcomes to potential outcomes
  • (2) Treatment Positivity
  • Ensures that we can learn potential outcome models
  • (3) No unmeasured confounders (NUC)
  • Ensures that we do not learn biased models

Rubin, 1974 Neyman et al., 1923 Rubin, 2005

SLIDE 28

(1) Consistency

  • Consider a dataset containing observed outcomes, observed treatments, and covariates: {y_i, a_i, x_i}_{i=1}^n
  • E.g.: blood pressure, exercise, BMI
  • Consistency allows us to replace the observed response with the potential outcome of the observed treatment: Y = Y(a) given A = a
  • Under consistency, our dataset satisfies

{y_i, a_i, x_i}_{i=1}^n = {y_i(a_i), a_i, x_i}_{i=1}^n
SLIDE 29

(2) Positivity

  • When working with observational data, for any set of covariates we need to assume a non-zero probability of seeing each treatment
  • Otherwise, in general, we cannot learn a conditional model of the potential outcomes given those covariates
  • Formally, we assume that

P_Obs(A = a | X = x) > 0, ∀a ∈ A, ∀x ∈ X
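Positivity can be probed empirically by checking that every covariate stratum exhibits every treatment. A minimal sketch on hypothetical records (all values invented for illustration):

```python
from collections import defaultdict

# Hypothetical observational records: (covariate stratum x, treatment a)
records = [(0, 0), (0, 1), (0, 0), (1, 0), (1, 1), (1, 1), (1, 0), (2, 1), (2, 1)]
treatments = {0, 1}

# Collect which treatments each stratum has ever received
seen = defaultdict(set)
for x, a in records:
    seen[x].add(a)

# Positivity requires P_Obs(A = a | X = x) > 0 for every a and x; empirically,
# every stratum must exhibit every treatment at least once
violations = sorted(x for x, acts in seen.items() if acts != treatments)
# here stratum x = 2 never receives a = 0, so positivity fails there
```

In practice this finite-sample check only detects hard violations; near-violations (tiny propensities) are usually flagged by estimating P_Obs(A | X) with a propensity model.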
SLIDE 30

(3) No Unmeasured Confounders (NUC)

  • Formally, NUC is a statistical independence assertion:

Y(a) ⊥ A | X = x, ∀a ∈ A, ∀x ∈ X
SLIDE 31

(3) No Unmeasured Confounders (NUC)

  • Formally, NUC is a statistical independence assertion:

Y(a) ⊥ A | X = x, ∀a ∈ A, ∀x ∈ X

[Causal diagrams over x_BMI, y_BP, and Exercise]
SLIDE 32

Learning Potential Outcome Models

  • The assumptions allow estimation of potential outcomes from (observational) data:

P(Y(a) | X = x) = P(Y(a) | X = x, A = a)   [by (A3) NUC]
               = P(Y | X = x, A = a)      [by (A1) consistency]

  • Estimation requires a statistical model for estimating conditionals
  • To simulate data from a new policy, we need to learn the potential outcome models
  • If we have an observational dataset where assumptions 1-3 hold, then this is possible!

UAI Tutorial: Saria and Soleimani, 2017
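Under the three assumptions, this identity becomes a plug-in (adjustment / g-formula) estimator. A toy sketch on synthetic data, with all coefficients and probabilities invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Synthetic data with a measured confounder x
x = rng.binomial(1, 0.5, n)                 # confounder
a = rng.binomial(1, 0.2 + 0.6 * x)          # treatment depends on x
y = rng.normal(1.0 * a + 2.0 * x, 1.0)      # true effect of a on y is 1.0

# Naive contrast E[Y | A=1] - E[Y | A=0] is confounded by x
naive = y[a == 1].mean() - y[a == 0].mean()

def adjusted_mean(a_val):
    """E[Y(a)] = sum_x P(X = x) E[Y | X = x, A = a] (the identity above)."""
    return sum(np.mean(x == v) * y[(x == v) & (a == a_val)].mean()
               for v in (0, 1))

effect = adjusted_mean(1) - adjusted_mean(0)   # close to the true 1.0
```

The naive contrast overstates the effect because treated units are disproportionately high-x; averaging the stratum-wise conditionals over P(X) removes that bias, exactly as the identification formula prescribes.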

SLIDE 33

Observational Traces

Timing between 
 measurements is 
 irregular and random

Creatinine is a test used to measure kidney function.

SLIDE 34

Observational Traces

And so are times 
 between treatments

SLIDE 35

Challenges w/ Observational Traces

In the discrete-time setting, 
 we did not treat the timing of events as random

SLIDE 36

Counterfactual GP

  • A collection of Gaussian processes:

{ {Y_t(a) : t ∈ [0, τ]} : a ∈ C }

  • [0, τ]: fixed time period; C: set of finite sequences of actions
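The CGP is built from ordinary GP regression. As a point of reference, a minimal sketch of conditioning a single GP on a patient's history (kernel length-scale, noise level, and PFVC values are all hypothetical; this is plain GP regression, not the full CGP):

```python
import numpy as np

def rbf(s, t, ell=2.0):
    """Squared-exponential kernel matrix between time grids s and t."""
    return np.exp(-0.5 * (s[:, None] - t[None, :]) ** 2 / ell ** 2)

# History H_t: noisy PFVC-like measurements (hypothetical values)
t_obs = np.array([0.0, 1.0, 2.5])
y_obs = np.array([70.0, 68.0, 65.0])

# GP posterior mean at future times, conditioned on the history:
# m(t_new) = K(t_new, t_obs) (K(t_obs, t_obs) + noise I)^{-1} y_obs
t_new = np.array([3.0, 4.0, 5.0, 6.0])
K = rbf(t_obs, t_obs) + 0.1 * np.eye(len(t_obs))   # noise on the diagonal
mean = rbf(t_new, t_obs) @ np.linalg.solve(K, y_obs)
```

With a zero prior mean, predictions revert toward 0 far from the observed history; the CGP additionally indexes such curves by the action sequence a.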
SLIDE 37

Learning from Observational Traces

[Figure: pfvc, pdlco, rvsp marker values vs. years since diagnosis; medications: Prednisone, Methotrexate, Cyclophosphamide, Cytoxan]
SLIDE 38

Learning from Observational Traces

[Figure: pfvc, pdlco, rvsp marker values vs. years since diagnosis, with medications]

Treatments administered according to an unknown policy (i.e., not an RCT).
SLIDE 39

Learning from Observational Traces

[Figure: pfvc, pdlco, rvsp marker values vs. years since diagnosis, with medications]

Learning is especially difficult because there is time-dependent feedback between
 actions and outcomes.

Robins 1986
SLIDE 40

Learning Models from Observational Traces

  • Road map:
  • (1) Establish assumptions that connect a probabilistic model of observational traces to the target counterfactual model
  • (2) Posit a probabilistic model of observational traces
  • (3) Derive a maximum likelihood estimator

P({Y_s(a) : s > t} | H_t)

Schulam and Saria, NIPS 2017
SLIDE 41

Modeling Observational Traces

  • We use a marked point process (MPP):
  • Points model the event times: measurements or actions
  • Marks model the type of event

{(T_i, X_i)}_{i=1}^∞

X = (R ∪ {∅}) × (C ∪ {∅}) × {0, 1} × {0, 1}

Schulam and Saria, NIPS 2017
SLIDE 42

Modeling Observational Traces

  • We use a marked point process (MPP):
  • Points model the event times: measurements or actions
  • Marks model the type of event

{(T_i, X_i)}_{i=1}^∞

X = (R ∪ {∅}) × (C ∪ {∅}) × {0, 1} × {0, 1}

  • y ∈ R ∪ {∅}: What is the value of the outcome?
  • a ∈ C ∪ {∅}: What action did we take?
  • z_y ∈ {0, 1}: Did we measure an outcome?
  • z_a ∈ {0, 1}: Did we take an action?
SLIDE 46

Modeling Observational Traces

  • Parameterize the MPP using a hazard λ*(t) and a mark density p*(x | t):
  • λ*(t): probability of an event happening at this time
  • p*(x | t): probability of the mark given the event time
  • The star denotes dependence on history

Schulam and Saria, NIPS 2017
SLIDE 49

Modeling Observational Traces

  • Parameterize the MPP using a hazard and mark density
  • Estimate the MPP by maximizing the probability of the traces:

ℓ(θ) = Σ_{j=1}^n log p*_θ(y_j | t_j, z_{y_j}) + Σ_{j=1}^n log λ*_θ(t_j) p*_θ(a_j, z_{y_j}, z_{a_j} | t_j, y_j) − ∫_0^τ λ*_θ(s) ds

  • Model the conditional probability of the outcome using a GP

Schulam and Saria, NIPS 2017
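As a sanity check on the structure of ℓ(θ), here is a heavily simplified version with a constant hazard λ and an i.i.d. Gaussian mark density standing in for the history-dependent λ*_θ and the GP-based outcome model (both simplifications are mine, not the paper's):

```python
import numpy as np

def mpp_loglik(times, marks, lam, mu, sigma, tau):
    """Log-likelihood of a marked point process observed on [0, tau].

    Simplifications vs. the slide: the hazard is a constant lam rather
    than a history-dependent lambda*_theta(t), and the mark density is a
    single Gaussian N(mu, sigma^2) rather than a GP-based conditional.
    """
    marks = np.asarray(marks, dtype=float)
    # Mark term: sum_j log p(y_j)
    ll_marks = np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2)
                      - (marks - mu) ** 2 / (2 * sigma ** 2))
    # Point-process term: sum_j log lam - integral_0^tau lam ds
    ll_times = len(times) * np.log(lam) - lam * tau
    return ll_marks + ll_times

ll = mpp_loglik(times=[1.0, 2.0], marks=[0.0, 0.0],
                lam=1.0, mu=0.0, sigma=1.0, tau=3.0)
```

The full objective keeps the same three-part shape: a mark term per event, a log-hazard term per event, and the compensator integral over the observation window.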
SLIDE 50

Recovering the CGP

  • When does the MPP model recover the CGP?
  • In addition to consistency, we define two assumptions:
  • Continuous-time NUC
  • Analogue of NUC for the MPP
  • Non-informative measurement times
  • Measurement and action times are conditionally independent of the potential outcomes

Schulam and Saria, NIPS 2017
SLIDE 53

Reliable Decisions with CGPs

[Figure: PFVC vs. years since first symptom]

Should we treat?
SLIDE 54

Classical Supervised Model

[Figure: PFVC vs. years since first symptom]

History H_t

P({Y_s : s > t} | H_t)
SLIDE 55

Counterfactual GP

[Figure: PFVC vs. years since first symptom]

History H_t

P({Y_s(a) : s > t} | H_t)
SLIDE 56

Simulated Data

  • Simulate observational traces from multiple regimes
  • Traces are treated by policies unknown to learners
  • In regimes A and B, policies satisfy our assumptions
  • In regime C, policy violates our assumptions
  • Simulate three training sets (regimes A, B, and C)
  • Simulate one common test set (regime A)
SLIDE 57

Results

  • Risk scores:
  • Use Baseline and CGP to predict final severity marker
  • Normalize predictions to [0, 1]

  • CGP risk scores are stable across regime A and B training data
  • Baseline GP scores change
  • CGP relative risk across patients is also stable across training data A and B
  • Baseline GP's relative risk changes
  • CGP AUC is constant across regimes A and B
  • Baseline GP's AUC is unstable
SLIDE 64

Simulated Data

  • Simulate observational traces from three regimes
  • Traces are treated by policies unknown to learners
  • In regimes A and B, policies satisfy our assumptions
  • In regime C, policy violates our assumptions
  • Simulate three training sets (regimes A, B, and C)
  • Simulate one common test set (regime A)
SLIDE 65

Results

  • Risk scores:
  • Use Baseline and CGP to predict final severity marker
  • Negate predictions and normalize to [0, 1]

CGP risk scores are unstable if the policy in the training data violates our assumptions

SLIDE 66

Medical Decision-Support
 using CGPs

  • Dialysis is expensive, but necessary when kidneys fail
  • Important questions for decision-making:
  • (1) Will this individual be okay if I remove dialysis?
  • (2) Will this individual benefit from dialysis?
  • CGP can help to answer these questions
SLIDE 67

Medical Decision-Support

[Figure: factual trajectory vs. counterfactual (no treatment)]
SLIDE 68

Medical Decision-Support

[Figure: counterfactual trajectory under CVVHD]
SLIDE 69

A Real ICU Patient with AKI

  • 1. Irregularly sampled
  • 2. Unaligned signals
  • 3. Cross correlations

[Figure: BUN, Potassium, HR, Creatinine, Calcium, and Blood Pressure traces over ~500 hours]
SLIDE 70

Continuous-time actions, continuous-time multivariate trajectories

Input x(t) convolved with impulse response h(t) to generate response ρ(t):

ρ(t) = x(t) ∗ h(t) = ∫_{−∞}^{∞} x(τ) h(t − τ) dτ

Example:

h(t) = (αβ / (β − α)) (e^{−αt} − e^{−βt}) 1(t ≥ 0)

[Figure: 2nd- and 3rd-order impulse responses, including complex roots]

To allow sharing across signals:

g_d(t) = ψ ρ_0(t) + (1 − ψ) ρ_d(t),  ψ ∈ [0, 1]

where ρ_0(t) is shared and ρ_d(t) is signal-specific.

Similar ideas in pharmacokinetics: Cutler, 1978; Shargel et al., 2005; Rich et al., 2016

Soleimani, Subbaswamy, Saria, UAI 2017
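The convolution is easy to discretise. A sketch with invented rate constants α, β, driving the second-order impulse response with a step input (a dose switched on at t = 1); because this h(t) integrates to 1, the response approaches 1:

```python
import numpy as np

dt = 0.01
t = np.arange(0.0, 10.0, dt)

# Hypothetical rate constants for the double-exponential impulse response
alpha, beta = 1.0, 2.0
h = alpha * beta / (beta - alpha) * (np.exp(-alpha * t) - np.exp(-beta * t))

# Treatment input: a dose switched on at t = 1 and left on (a step)
x = (t >= 1.0).astype(float)

# Discretised convolution rho(t) = (x * h)(t), truncated to the time grid
rho = np.convolve(x, h)[: len(t)] * dt
```

The response is zero before the dose, then rises smoothly toward its steady state; replacing the step with an impulse train would model repeated dosing.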
SLIDE 71

Quantitative Results

Better relative performance at longer prediction horizons. For horizon 7, on test regions with treatment: 15% better than BART and 8% better than LSTM.

[Figure: NRMSE vs. prediction horizon (1-7 days) for Proposed Model, LSTM (RNN), and BART]

Soleimani, Subbaswamy, Saria, UAI 2017
SLIDE 72

Conclusions

  • Use counterfactual objectives for training predictive models
  • Assumptions are critical for counterfactual models
  • But they are not statistically testable
  • Can we develop formal sensitivity analyses?
  • Are there other structural assumptions under which CGPs can be learned?
  • Counterfactual reasoning is orthogonal to other efforts in interpretability and accountability
  • The counterfactual objective tells us what to fit
  • Interpretable models: how to parameterize for transparency
SLIDE 73

Key References

  • Potential Outcomes
  • Neyman 1923 (English translation: Neyman et al. 1990)
  • Rubin 2005
  • Treatment-Confounder Feedback and G-computation
  • Robins 1986
  • Robins and Hernán 2009
  • Counterfactual Reasoning and Reliable Decision Support
  • Schulam and Saria, NIPS 2017
  • Soleimani, Subbaswamy, and Saria, UAI 2017
  • Xu, Xu and Saria, JMLR 2017
  • Dyagilev and Saria, Machine Learning Journal 2017
  • Saria and Soleimani, UAI Tutorial 2017
  • Saria and Schulam, NIPS Tutorial 2016
SLIDE 74

Thank you!
 ssaria@cs.jhu.edu
 www.suchisaria.com @suchisaria

All references throughout the slides are active, clickable links.
 For errors and edits, please contact: ssaria@cs.jhu.edu. Thanks!