Transforming Medicine and Healthcare through Machine Learning and AI
Mihaela van der Schaar
John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine
University of Cambridge | Alan Turing Institute | ML-AIM Group
Cardiovascular disease, heart failure, cardiac transplantation, hospital care
Cancer: breast, prostate, colon
Asthma, Alzheimer’s disease, cystic fibrosis
Predictive accuracy (for some diseases)
Data-driven, few assumptions
(quality of well-being)
AUROC                  MAGGIC            UK Biobank          UNOS-I           UNOS-II
Best ML algorithm      0.80 ± 0.004      0.76 ± 0.002        0.78 ± 0.002     0.65 ± 0.001
                       (NN)              (GradientBoost)     (ToPs)           (ToPs)
Best clinical score    0.70 ± 0.007      0.70 ± 0.003        0.62 ± 0.001     0.56 ± 0.001
Cox PH                 0.75 ± 0.005      0.74 ± 0.002        0.70 ± 0.001     0.59 ± 0.001
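The AUROC values reported above can be read as the probability that a randomly chosen positive case is ranked above a randomly chosen negative case. As a minimal illustration of the metric itself (not the evaluation code used in these studies), AUROC can be computed directly from that pairwise definition:

```python
def auroc(labels, scores):
    """AUROC as the Mann-Whitney statistic: the fraction of
    (positive, negative) pairs in which the positive example
    receives the higher score (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

This O(n²) pairwise form is only for clarity; production code would use a rank-based O(n log n) implementation.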
Pipeline configuration
Prediction
Survival Models Competing Risks Temporal Models Causal Models
[Lee, Alaa, Zame, vdS, AISTATS 2019] [Alaa, vdS, NIPS 2017] [Bellot, vdS, AISTATS 2018] [in submission] [Alaa, vdS, ICML 2019] [ICML 2018] [Scientific Reports] [PLoS One]
Transparency: essential for trustworthiness, etc.
Black-box model → predictions (with confidence) + explanations
[Yoon, Jordon, vdS, ICLR 2019] [workshop 2018]
Black-box
Bob
Diagnosed with Disease X
[Atan, vdS, 2015, 2018] [Alaa, vdS, 2017, 2018, 2019] [Yoon, Jordon, vdS, 2017] [Lim, Alaa, vdS, 2018] [Bica, Alaa, vdS, 2019]
Observational data: each patient has features, a treatment assignment, and two potential outcomes. Only the factual outcome is observed; the goal is to estimate causal effects.
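A minimal sketch of this setup (a toy generative model; the feature, noise, and effect values are illustrative assumptions, not taken from the talk):

```python
import random

random.seed(0)

def sample_unit():
    """One unit in the potential-outcomes framework: features x,
    two potential outcomes (y0, y1), a treatment assignment t, and
    the factual outcome y; the counterfactual is never observed."""
    x = random.gauss(0.0, 1.0)             # patient features
    y0 = x + random.gauss(0.0, 0.1)        # potential outcome without treatment
    y1 = x + 1.0 + random.gauss(0.0, 0.1)  # potential outcome with treatment
    t = random.randint(0, 1)               # treatment assignment
    y = y1 if t == 1 else y0               # factual outcome (observed)
    return {"x": x, "t": t, "y": y, "true_ite": y1 - y0}

data = [sample_unit() for _ in range(1000)]
# In a real study only (x, t, y) are available; true_ite is kept here
# purely to show what the estimand is.
ate = sum(u["true_ite"] for u in data) / len(data)  # close to 1.0 in this toy model
```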
Observed vs. hidden
Training examples
Ground-truth causal effects
Two challenges:
1. Need to model interventions.
2. Selection bias → covariate shift: training distribution ≠ testing distribution.
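The selection-bias point can be made concrete with a toy simulation (the logistic propensity model below is an illustrative assumption): when treatment assignment depends on the covariates, the treated and control groups are drawn from different covariate distributions, which is exactly the covariate shift a causal model must cope with.

```python
import math
import random

random.seed(1)

def assign_treatment(x):
    """Selection bias: sicker patients (higher x) are more likely treated."""
    propensity = 1.0 / (1.0 + math.exp(-2.0 * x))  # logistic propensity score
    return 1 if random.random() < propensity else 0

xs = [random.gauss(0.0, 1.0) for _ in range(5000)]
ts = [assign_treatment(x) for x in xs]

# The covariate distributions of the two groups no longer match.
mean_treated = sum(x for x, t in zip(xs, ts) if t == 1) / ts.count(1)
mean_control = sum(x for x, t in zip(xs, ts) if t == 0) / ts.count(0)
print(round(mean_treated, 2), round(mean_control, 2))  # treated mean well above control mean
```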
- Bayesian Additive Regression Trees (BART) [Chipman et al., 2010; Hill, 2011]
- Causal Forests [Wager & Athey, 2016]
- Nearest Neighbor Matching (kNN) [Crump et al., 2008]
- Balancing Neural Networks [Johansson, Shalit & Sontag, 2016]
- Causal MARS [Powers, Qian, Jung, Schuler, Shah, Hastie & Tibshirani, 2017]
- Targeted Maximum Likelihood Estimation (TMLE) [Gruber & van der Laan, 2011]
- Counterfactual Regression [Johansson, Shalit & Sontag, 2016]
- CMGP [Alaa & van der Schaar, 2017]
What is possible? How can it be achieved?
(Fundamental limits) (Practical implementation)
[Alaa, van der Schaar, JSTSP 2017] [ICML 2018]
Estimated causal effects are evaluated by the Precision in Estimating Heterogeneous Effects (PEHE) [Hill, 2011].
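For reference, the PEHE metric of Hill (2011) is the mean squared error between estimated and true individualized effects (writing $\tau(x)$ for the true causal effect and $\hat{\tau}(x)$ for its estimate; this notation is chosen here for clarity):

```latex
\mathrm{PEHE} \;=\; \mathbb{E}_{x}\!\left[\big(\hat{\tau}(x) - \tau(x)\big)^{2}\right],
\qquad
\tau(x) \;=\; \mathbb{E}\!\left[Y^{(1)} - Y^{(0)} \mid X = x\right].
```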
Minimax estimation loss: an information-theoretic quantity, independent of the model. It pits the best estimate (an infimum over estimators) against the most “difficult” response surfaces (a supremum over surfaces), i.e. $\inf_{\hat{\tau}} \sup_{f_0, f_1} \mathbb{E}\big[\ell(\hat{\tau}, \tau)\big]$.
Assumption: each of the two response surfaces has its relevant dimensions in a Hölder space.
We prove that the minimax estimation loss depends on the complexity of the two response surfaces, each of which has its relevant dimensions in a Hölder space: their sparsity (number of relevant dimensions) and their smoothness (Hölder exponent).
- Handling selection bias
- Sharing training data between response surfaces
- ML model and hyperparameter tuning
A prior on a vector-valued RKHS (vvRKHS) = a multi-task Gaussian process; a Matérn kernel defines the prior over the response surfaces. The posterior distribution over the potential outcomes induces a posterior ITE distribution, i.e. an individualized uncertainty measure.
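In this Gaussian-process view (the symbols below are generic GP notation introduced here for illustration, not copied from the paper), the joint posterior over the two potential-outcome surfaces $f_0, f_1$ given data $\mathcal{D}$ induces a Gaussian posterior over the ITE at each $x$:

```latex
\hat{\tau}(x) \;=\; \mu_{1}(x) - \mu_{0}(x),
\qquad
\operatorname{Var}\big[\tau(x) \mid \mathcal{D}\big]
\;=\; \sigma_{1}^{2}(x) + \sigma_{0}^{2}(x)
      \;-\; 2\,\operatorname{Cov}\big[f_{1}(x), f_{0}(x) \mid \mathcal{D}\big].
```

Each patient thus receives both a point estimate and an individualized credible interval.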
But how can we know we have selected a correct model?
Supervised learning → cross-validation!
Training Testing
Precision in estimating heterogeneous effects (PEHE) [Hill, 2011]
Testing set
Validating causal inference models
True causal effects
No explicit label: cannot apply supervised cross-validation.
Goal: developing a similar procedure for causal inference
A performance metric is a statistical functional
A functional is a function of a function. A statistical functional is a function of a distribution.
Statistical functional Empirical measure
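Concretely (in standard empirical-process notation), a performance metric maps a distribution to a number, and in practice it is evaluated at the empirical measure of the data:

```latex
\theta : P \mapsto \theta(P),
\qquad
\widehat{P}_{n} \;=\; \frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}},
\qquad
\widehat{\theta} \;=\; \theta\big(\widehat{P}_{n}\big).
```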
Taylor series approximation
The value of a function at a given input can be predicted from its value and higher-order derivatives at a nearby input.
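That is the usual Taylor expansion around a nearby point $x_0$:

```latex
f(x) \;=\; \sum_{k=0}^{\infty} \frac{f^{(k)}(x_{0})}{k!}\,(x - x_{0})^{k}
\;\approx\; f(x_{0}) + f'(x_{0})\,(x - x_{0}) + \tfrac{1}{2} f''(x_{0})\,(x - x_{0})^{2}.
```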
Analogy with Taylor series approximation
The performance of a causal inference model is a statistical functional, i.e. a function of the underlying distribution.
Synthetic distribution (with known counterfactuals) vs. true distribution
Functional calculus: the von Mises expansion (VME)
A distributional analog of the Taylor expansion [Fernholz, 1983]: influence functions play the role of derivatives.
We can predict the performance of a causal inference model using the influence functions of its loss on a “similar” synthetic dataset.
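As a toy illustration of this idea (simplified far below the actual procedure: the functional here is theta(P) = (E_P[X])^2, whose influence function at a distribution with mean m is psi(x) = 2m(x - m)), a first-order correction computed on accessible samples substantially reduces the bias of a plug-in value taken at a "similar" but wrong synthetic distribution:

```python
import random

random.seed(0)

# True distribution P: X ~ N(1, 1), so theta(P) = (E[X])^2 = 1.0.
true_samples = [random.gauss(1.0, 1.0) for _ in range(5000)]

# Synthetic plug-in distribution with a wrong mean.
m_tilde = 0.7
plug_in = m_tilde ** 2  # plug-in estimate: 0.49, badly biased

# First-order von Mises correction: average the influence function of
# the plug-in distribution over the accessible true samples.
correction = sum(2.0 * m_tilde * (x - m_tilde)
                 for x in true_samples) / len(true_samples)
one_step = plug_in + correction

print(round(plug_in, 2), round(one_step, 2))  # the corrected estimate is far closer to 1.0
```

The residual error of the one-step estimate is second order in the mismatch between the synthetic and true distributions, which is the mechanism the VME-based validation exploits.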
Estimating a model’s performance
First-order “Taylor approximation”: the performance functional at the true distribution is approximated by its value at the synthetic (plug-in) distribution plus an influence-function correction,
$\theta(P) \approx \theta(\widetilde{P}) + \mathbb{E}_{X \sim P}\big[\psi_{\widetilde{P}}(X)\big]$,
where the expectation over the inaccessible true distribution is replaced by the accessible empirical measure of the observed data, and $\psi_{\widetilde{P}}$ is the influence function.
No need to simulate an entire observational dataset: just synthesize counterfactuals!
Step 1: plug-in estimation. Step 2: bias correction.
Theorem (sketch). Suppose each response surface is in a Hölder space and the plug-in model is minimax optimal. Let the estimator be an IF-based estimator using a truncated m-term VME. When enough VME terms are included, the estimator is consistent.
Consistency and efficiency
Automating causal inference!
Selecting the right model for the right observational study. Collection of all models published in ICML, NeurIPS and ICLR between 2016 and 2018.
Model                 Venue
BNN                   ICML 2016
CMGP                  NIPS 2017
TARNet                ICML 2017
CFR Wass.             ICML 2017
CFR MMD               ICML 2017
NSGP                  ICML 2018
GAN-ITE               ICLR 2018
SITE                  NIPS 2018
BART; Causal Forest
Average performance on the 77 benchmark datasets.
Method          % winner
BNN             3%
CMGP            12%
NSGP            17%
TARNet          8%
CFR Wass.       9%
CFR MMD         12%
GAN-ITE         7%
SITE            7%
BART            15%
Causal Forest   7%

Model-selection strategy: Random 10%, Factual 53%, IF-based 72%, Supervised 84%.
No absolute winner on all datasets. IF-based selection is better than any single model. Factual selection is vulnerable to selection bias.
Randomized Controlled Trials
Research question → designing a clinical trial → patient recruitment → conducting the trial → disseminating results → clinical practice
RCTs: time-consuming, enormous costs, small sample sizes, population-level conclusions. In contrast, observational data are patient-centric, cheap, big, and quick.
Machine learning can transform clinical trials:
- Recommender systems for individualized treatment planning.
- Designing clinical trials for new drugs using data for similar drugs.
- Post-hoc subgroup analysis for previously conducted clinical trials.
Linked EHR data: clinical practice, clinical research, pharma, OMICS
Observational data → data-induced genetic associations, data-induced causal discovery, machine learning
The augmented MD