Transforming Medicine and Healthcare through Machine Learning and AI - PowerPoint PPT Presentation





SLIDE 1

Transforming Medicine and Healthcare through Machine Learning and AI

Mihaela van der Schaar

John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine

University of Cambridge Alan Turing Institute

SLIDE 2

ML-AIM Group aims to transform medicine and healthcare by developing new methods in Machine Learning & Artificial Intelligence

SLIDE 3

The 5 Challenges of Personalized Medicine and Healthcare

  • 1. Lifestyle optimization and disease prevention
  • 2. Disease detection and prediction of disease progression (longitudinal)
  • 3. Best interventions and treatments
  • 4. State-of-the-art tools for clinicians & healthcare professionals to deliver high-quality care
  • 5. Optimization of healthcare systems (quality, efficiency, cost-effectiveness, robustness, scalability)

SLIDE 4

Why can ML-AIM solve these challenges?

Unique expertise: developing and combining new methods in

  • Machine Learning and Artificial Intelligence
  • Applied Mathematics and Statistics
  • Operations Research
  • Engineering, incl. distributed computing

Working with numerous clinical and medical collaborators to make an impact on medicine and healthcare

SLIDE 5

ML-AIM group: http://www.vanderschaar-lab.com

SLIDE 6

SLIDE 7

https://www.youtube.com/watch?v=TWI-WIoWvfk

SLIDE 8

Cardiovascular disease

  • Risk of CVD events
  • Mortality risk after heart failure
  • Mortality risk – cardiac transplantation

Other areas: hospital care; cancer (breast, prostate, colon); asthma; Alzheimer’s disease; Cystic Fibrosis

Part 1: Automate the process of designing Clinical Predictive Analytics at Scale

SLIDE 9

+ High predictive accuracy (for some diseases)
+ Data-driven, few assumptions

  • Many ML algorithms: Which one to choose?
  • Many hyper-parameters: Need expertise in data science
  • Can we predict in advance which method is best?
  • Can we do better than any individual method?
  • Many metrics of performance (AUROC, AUPRC, C-index, quality of well-being)
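The selection problem listed above (which algorithm, which hyper-parameters, which metric) can be sketched as a small cross-validated search. This is an illustrative sketch on synthetic data, not the actual AutoPrognosis pipeline; the candidate models and grids are assumptions for the example.

```python
# Hypothetical sketch: cross-validated model + hyper-parameter selection,
# in the spirit of automated clinical predictive analytics.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Candidate algorithms with small hyper-parameter grids (illustrative only).
candidates = {
    "logistic": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "forest": (RandomForestClassifier(random_state=0), {"n_estimators": [50, 200]}),
    "gbm": (GradientBoostingClassifier(random_state=0), {"learning_rate": [0.05, 0.1]}),
}

best_name, best_model, best_auc = None, None, -1.0
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, scoring="roc_auc", cv=5).fit(X_tr, y_tr)
    if search.best_score_ > best_auc:
        best_name, best_model, best_auc = name, search.best_estimator_, search.best_score_

test_auc = roc_auc_score(y_te, best_model.predict_proba(X_te)[:, 1])
print(best_name, round(best_auc, 3), round(test_auc, 3))
```

AutoPrognosis goes well beyond this sketch (it also learns imputation and feature-processing stages and ensembles pipelines), but the core question is the same: choose the method and its hyper-parameters from data rather than by hand.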

Machine Learning in Clinical Research

AUROC                 MAGGIC           UK Biobank        UNOS-I           UNOS-II
Best ML algorithm     0.80 ± 0.004     0.76 ± 0.002      0.78 ± 0.002     0.65 ± 0.001
                      (NN)             (GradientBoost)   (ToPs)           (ToPs)
Best Clinical Score   0.70 ± 0.007     0.70 ± 0.003      0.62 ± 0.001     0.56 ± 0.001
Cox PH                0.75 ± 0.005     0.74 ± 0.002      0.70 ± 0.001     0.59 ± 0.001
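The ± figures in the table are uncertainty estimates around the AUROC point estimate. A common way to obtain them is the bootstrap; this sketch uses synthetic labels and scores (not the MAGGIC/UK Biobank/UNOS cohorts) to show the mechanics.

```python
# Sketch: bootstrap standard error for AUROC, the kind of "0.80 ± 0.004"
# figure reported in the table (synthetic labels/scores, not the real cohorts).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)
scores = y * 0.8 + rng.normal(0, 0.5, size=2000)  # informative but noisy scores

point = roc_auc_score(y, scores)
boot = []
for _ in range(200):
    idx = rng.integers(0, len(y), size=len(y))
    if len(np.unique(y[idx])) == 2:  # need both classes to compute AUROC
        boot.append(roc_auc_score(y[idx], scores[idx]))
se = np.std(boot)
print(f"AUROC = {point:.3f} ± {se:.3f}")
```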

SLIDE 10

AutoPrognosis [Alaa & vdS, ICML 2018]: A tool for crafting Clinical Scores

Pipeline configuration

SLIDE 11

Automated ML for clinical analytics (beyond predictions)

Prediction

Survival Models Competing Risks Temporal Models Causal Models

Lee, Alaa, Zame, vdS, AISTATS 2019 Alaa, vdS, NIPS 2017 Bellot, vdS, AISTATS 2018 In submission Alaa, vdS, ICML 2019 ICML 2018 Scientific Reports Plos One

SLIDE 12

AutoPrognosis: Exemplary technology in the Topol Review

Disease areas: Cystic Fibrosis, Cardiovascular Disease, Breast cancer, Prostate cancer etc.

SLIDE 13

Not only black-box predictions, but also interpretations

Essential for trustworthiness, transparency, etc.

Black-box model Predictions (confidence) + Explanations

INVASE: Instance-wise Variable Selection using Deep Learning

[Yoon, Jordon, vdS, ICLR 2019]

Metamodeling [Alaa, vdS, 2019]

Clinician-AI interaction using Reinforcement Learning [Lahav, vdS, NeurIPS workshop 2018]

SLIDE 14

From black-box models to white-box functions

A symbolic metamodel takes as input a trained machine learning model and outputs a transparent equation describing the model’s prediction surface.

Interpretability using symbolic metamodeling [A. Alaa & vdS, NeurIPS 2019]

SLIDE 15

Part 2: From Individualized Predictions to Individualized Treatment Effects

SLIDE 16


Bob

Which treatment is best for Bob?

Diagnosed with Disease X

Problem: Estimate the effect of a treatment/intervention on an individual

Individualized Treatment Recommendations

SLIDE 17

RCTs do not support Personalized Medicine

Randomized Controlled Trials: Average Treatment Effects; non-representative patients; small sample sizes; time consuming; enormous costs; population-level

Adaptive Clinical Trials [Atan, Zame, vdS, AISTATS 2019] [Shen, van der Schaar, 2019]

SLIDE 18

Delivering Personalized (Individualized) Treatments

Randomized Controlled Trials: Average Treatment Effects; non-representative patients; small sample sizes; time consuming; enormous costs; population-level

Machine Learning: Individualized Treatment Effects; patient-centric; real-world observational data; scalable & adaptive implementation; fast deployment; cost-effective

[Atan, vdS, 2015, 2018] [Alaa, vdS, 2017, 2018, 2019] [Yoon, Jordon, vdS, 2017] [Lim, Alaa, vdS, 2018] [Bica, Alaa, vdS, 2019]

SLIDE 19

Potential outcomes framework [Neyman, 1923]

Each patient has features, a treatment assignment, and two potential outcomes; observational data contain only the factual outcomes, from which causal effects must be estimated.
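The potential-outcomes setup can be sketched with a few lines of synthetic data: each patient has features, two potential outcomes, and a treatment assignment, but the data reveal only the factual outcome. All numbers here are invented for illustration.

```python
# Sketch of the potential-outcomes framework: observational data reveal only
# the factual outcome, never both potential outcomes. (Synthetic numbers.)
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)                    # patient features
y0 = x + rng.normal(0, 0.1, n)            # potential outcome under control
y1 = x + 2.0 + rng.normal(0, 0.1, n)      # potential outcome under treatment
w = rng.integers(0, 2, size=n)            # treatment assignment
y_factual = np.where(w == 1, y1, y0)      # the only outcome we ever observe

ite = y1 - y0                             # individualized treatment effect
print(round(float(ite.mean()), 2))        # the ATE is 2.0 by construction
```

In a simulation we can compute `ite` directly; in real observational data `y0` and `y1` are never both available for the same patient, which is exactly the learning problem discussed in the following slides.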

SLIDE 20

Assumptions

No unmeasured confounders (Ignorability); common support

Confounders may be observed or hidden.

Our work on hidden confounders [Lee, Mastronarde, van der Schaar, 2018] [Bica, Alaa, van der Schaar, 2019]
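The common-support assumption can be checked empirically by estimating propensity scores and verifying that they stay away from 0 and 1. This is a hedged sketch on synthetic data; the 0.05/0.95 thresholds are illustrative conventions, not values from the papers above.

```python
# Sketch: checking common support (overlap) by estimating propensity scores
# and flagging units whose estimated probability of treatment is extreme.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 3))
# Treatment depends on features -> confounding, but assignment stays stochastic.
p_true = 1 / (1 + np.exp(-x[:, 0]))
w = rng.binomial(1, p_true)

prop = LogisticRegression().fit(x, w).predict_proba(x)[:, 1]
violations = np.mean((prop < 0.05) | (prop > 0.95))
print(f"share of units near the support boundary: {violations:.2%}")
```

When a large share of units sits near the boundary, some patients essentially never receive (or never avoid) treatment, and their counterfactuals cannot be learned from the data.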

SLIDE 21

The learning problem

Estimate the response surfaces and causal effects from observational data.

SLIDE 22

Training examples

Beyond supervised learning…

“The fundamental problem of causal inference” is that we never observe counterfactual outcomes

Ground-truth causal effects


SLIDE 23


1- Need to model interventions 2- Selection bias → covariate shift: training distribution ≠ testing distribution

Training distribution Testing distribution

Causal modeling ≠ predictive modeling
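The covariate shift caused by selection bias is easy to see in a simulation: when sicker patients are more likely to be treated, the treated and control feature distributions differ, so a model fit on one group is evaluated off-distribution on the other. The numbers below are illustrative.

```python
# Sketch of selection bias -> covariate shift: treatment assignment depends
# on severity, so the treated and control feature distributions are shifted.
import numpy as np

rng = np.random.default_rng(0)
severity = rng.normal(size=5000)
p_treat = 1 / (1 + np.exp(-2 * severity))   # sicker -> more likely treated
w = rng.binomial(1, p_treat)

treated_mean = float(severity[w == 1].mean())
control_mean = float(severity[w == 0].mean())
print(round(treated_mean, 2), round(control_mean, 2))
# The treated mean sits well above the control mean: training distribution
# (factual treated outcomes) != testing distribution (everyone).
```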

SLIDE 24

Previous works on treatment effects

  • Bayesian Additive Regression Trees (BART) [Chipman et al., 2010], [J. Hill, 2011]
  • Causal Forests [Wager & Athey, 2016]
  • Nearest Neighbor Matching (kNN) [Crump et al., 2008]
  • Balancing Neural Networks [Johansson, Shalit and Sontag, 2016]
  • Causal MARS [Powers, Qian, Jung, Schuler, Shah, Hastie, Tibshirani, 2017]
  • Targeted Maximum Likelihood Estimator (TMLE) [Gruber & van der Laan, 2011]
  • Counterfactual regression [Johansson, Shalit and Sontag, 2016]
  • CMGP [Alaa & van der Schaar, 2017]

No theory, ad-hoc models
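A minimal two-model ("T-learner") baseline captures the spirit shared by many of the methods listed above: fit one outcome regressor per treatment arm and take the difference. This is an illustrative sketch on synthetic data, not an implementation of any one cited algorithm.

```python
# Minimal T-learner sketch: one regressor per treatment arm, ITE = difference.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))
w = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))            # confounded assignment
y = x[:, 0] + w * (1.0 + x[:, 1]) + rng.normal(0, 0.1, n)  # true ITE = 1 + x2

m0 = RandomForestRegressor(random_state=0).fit(x[w == 0], y[w == 0])
m1 = RandomForestRegressor(random_state=0).fit(x[w == 1], y[w == 1])

ite_hat = m1.predict(x) - m0.predict(x)
true_ite = 1.0 + x[:, 1]
pehe_sq = float(np.mean((ite_hat - true_ite) ** 2))  # PEHE-style error
print(round(pehe_sq, 3))
```

Each arm-specific model is trained on a biased sample of patients (the selection bias above), which is exactly why such ad-hoc constructions lacked theoretical guarantees until the work described next.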

SLIDE 25

A first theory for causal inference: individualized treatment effects

Theory: What is possible? (Fundamental limits)
Algorithms: How can it be achieved? (Practical implementation)

[Alaa, van der Schaar, JSTSP 2017][ICML 2018]

SLIDE 26

Fundamental limits

Precision in estimating heterogeneous effects (PEHE) [Hill, 2011]: the error of the estimated causal effect.

Minimax estimation loss: the best estimate against the most “difficult” response surfaces. The minimax loss is an information-theoretic quantity, independent of the model.
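The PEHE metric itself is simple to state: the mean squared error between estimated and true individualized effects (often reported as its square root). It is computable only when ground-truth effects are known, i.e. on synthetic or semi-synthetic data; the numbers below are invented for illustration.

```python
# Sketch of the PEHE metric [Hill, 2011] on synthetic ground truth.
import numpy as np

def pehe(tau_hat, tau_true):
    """Precision in Estimating Heterogeneous Effects: square root of the MSE
    between estimated and ground-truth individualized treatment effects."""
    return float(np.sqrt(np.mean((tau_hat - tau_true) ** 2)))

rng = np.random.default_rng(0)
tau_true = 1.0 + rng.normal(size=500)          # ground-truth effects (synthetic)
tau_hat = tau_true + rng.normal(0, 0.2, 500)   # an estimator's output

print(round(pehe(tau_hat, tau_true), 2))       # recovers the noise level ~0.2
```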

SLIDE 27

Theoretical Foundations

Theorem [Alaa & van der Schaar, JSTSP 2017]

Each of the two response surfaces has relevant dimensions in a Hölder space.

SLIDE 28

Characterizing response surfaces

We prove that the minimax estimation loss depends on the complexity of the two response surfaces, each of which has relevant dimensions in a Hölder space: their sparsity and smoothness.

SLIDE 29

Theory – what have we learned?

We want models that do well for small and large samples (small sample regime and large sample regime):

  • Handling selection bias
  • Sharing training data between response surfaces
  • ML model and hyperparameter tuning

SLIDE 30

Multi-task Gaussian Processes [Alaa & van der Schaar, NIPS 2017]

Prior on vvRKHS = Multi-task Gaussian Process

Matern kernel = Prior over

Posterior potential outcomes distribution Posterior ITE distribution

Individualized uncertainty measure
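A simplified sketch of the GP approach: fit a Gaussian process per treatment arm with a Matérn kernel and read an individualized uncertainty measure off the posterior. Note this uses two independent GPs as a stand-in; the paper's multi-task construction with a prior on a vector-valued RKHS shares information across arms.

```python
# Simplified GP sketch for treatment effects: independent per-arm GPs with a
# Matern kernel (the paper's multi-task vvRKHS prior couples the two arms).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(300, 1))
w = rng.binomial(1, 0.5, 300)
y = np.sin(x[:, 0]) + w * 1.0 + rng.normal(0, 0.1, 300)  # true ITE = 1

gp0 = GaussianProcessRegressor(kernel=Matern(), alpha=0.01).fit(x[w == 0], y[w == 0])
gp1 = GaussianProcessRegressor(kernel=Matern(), alpha=0.01).fit(x[w == 1], y[w == 1])

x_new = np.array([[0.5]])                     # a new patient
m0, s0 = gp0.predict(x_new, return_std=True)  # posterior mean and std per arm
m1, s1 = gp1.predict(x_new, return_std=True)
ite_mean = float(m1[0] - m0[0])
ite_std = float(np.sqrt(s0[0] ** 2 + s1[0] ** 2))  # individualized uncertainty
print(round(ite_mean, 2), round(ite_std, 3))
```

The per-patient posterior standard deviation is what makes a Bayesian treatment-effect model clinically useful: it says not only what the effect is, but how confident the model is for this particular patient.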

SLIDE 31

Multiple Treatments: GANITE [Yoon, Jordon, vdS, ICLR 2018]

SLIDE 32

But how can we know how to select a model?

Supervised learning → cross-validation!

Training Testing

Precision in estimating heterogeneous effects (PEHE) [Hill, 2011]

SLIDE 33

Testing set

Validating causal inference models

True causal effects


No explicit label: cannot apply supervised cross-validation.

Goal: developing a similar procedure for causal inference

Solution: Alaa and van der Schaar, ICML 2019

SLIDE 34

A performance metric is a statistical functional

A functional is a function of a function. A statistical functional is a function of a distribution.

PEHE is a statistical functional evaluated at an empirical measure.

SLIDE 35

Taylor series approximation

The value of a function at a given input can be predicted using its value and (higher-order derivatives) at a proximal input.
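The claim above can be checked numerically: a function's value at a point can be predicted from its value and derivatives at a nearby point.

```python
# Numeric illustration: second-order Taylor approximation of exp around a = 1
# predicts exp at a nearby point x = 1.1 to within the third-order remainder.
import math

a, x = 1.0, 1.1
approx = math.exp(a) * (1 + (x - a) + (x - a) ** 2 / 2)
print(round(approx, 4), round(math.exp(x), 4))  # the two values nearly agree
```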

SLIDE 36

Analogy with Taylor series approximation

The performance of a causal inference model is a functional of the data-generating distribution. (A functional is a function of a function.)

Synthetic distribution with known counterfactuals ↔ true distribution

SLIDE 37

Functional calculus: von Mises expansion (VME)

A distributional analog of Taylor expansion [Fernholz, 1983] Influence functions ↔ Derivatives

We can predict the performance of a causal inference model using the influence functions of its loss on a “similar” synthetic dataset.
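The mechanics of a first-order von Mises ("distributional Taylor") expansion can be shown on a simple functional. This sketch uses the variance, whose influence function is (x − μ)² − σ², to approximate the functional's value at a "true" distribution Q from its value at a nearby plug-in distribution P; the distributions are invented for the example, not the PEHE functional of the paper.

```python
# Sketch of a first-order von Mises expansion: T(Q) ≈ T(P) + E_Q[IF_P(X)],
# demonstrated for T = variance, whose influence function at P is
# IF_P(x) = (x - mu_P)^2 - sigma_P^2.
import numpy as np

rng = np.random.default_rng(0)
p = rng.normal(0.0, 1.0, 100_000)   # "synthetic" plug-in sample from P
q = rng.normal(0.1, 1.1, 100_000)   # "true" sample from Q

mu_p, var_p = p.mean(), p.var()
influence = (q - mu_p) ** 2 - var_p          # influence function evaluated on Q
vme_estimate = var_p + influence.mean()      # plug-in value + first-order term

print(round(float(vme_estimate), 3), round(float(q.var()), 3))
```

The residual error is second order in the perturbation between P and Q, which is why a good plug-in (synthetic) distribution makes the influence-function correction accurate.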

SLIDE 38

Estimating a model’s performance

First-order “Taylor approximation”

Synthetic (plug-in) distribution; true distribution; influence function

Inaccessible empirical measure; accessible empirical measure

SLIDE 39

Estimating a model’s performance

No need to simulate an entire observational dataset: just synthesize counterfactuals!

Step 1: Plug-in estimation
  • Plug-in model
  • Plug-in PEHE loss

Step 2: Bias correction
SLIDE 40

Consistency and efficiency

Theorem. Suppose each response surface is in a Hölder space and the plug-in model is minimax optimal. Let the performance estimate be an IF-based estimator using a truncated m-term VME. When enough VME terms are included, the estimator is consistent.

SLIDE 41

Automating causal inference!

Selecting the right model for the right observational study. Collection of all models published in ICML, NeurIPS and ICLR between 2016 and 2018.

  • BNN (ICML 2016)
  • TARNet (ICML 2017)
  • CFR Wass. (ICML 2017)
  • CFR MMD (ICML 2017)
  • CMGP (NIPS 2017)
  • NSGP (ICML 2018)
  • GAN-ITE (ICLR 2018)
  • SITE (NIPS 2018)
  • BART
  • Causal Forest

SLIDE 42

Results

Average performance on the 77 benchmark datasets.

Method       % Winner
BNN          3%
CMGP         12%
NSGP         17%
TARNet       8%
CFR Wass.    9%
CFR MMD      12%
GAN-ITE      7%
SITE         7%
BART         15%
C. Forest    7%
Random       10%
Factual      53%
IF-based     72%
Supervised   84%

No absolute winner on all datasets. IF-based selection is better than any single model. Factual selection is vulnerable to selection bias.

SLIDE 43

Machine Learning and Clinical Trials

Randomized Controlled Trials

Research Question Designing a clinical Trial Patient recruitment Conducting the trial Disseminating Results Clinical practice

Time Consuming + Enormous Costs + Small sample Sizes + Population-level conclusions Patient-centric, cheap, big data, quick

Machine Learning can transform RCTs:

  • Recommender systems for individualized treatment planning.
  • Designing clinical trials for new drugs using data from similar drugs.
  • Post-hoc subgroup analysis for previously conducted clinical trials.

SLIDE 44

Machine Learning & Medicine: Vision

Linked EHR data: clinical practice, clinical research, pharma, OMICS

Observational data; data-induced genetic associations; data-induced causal discovery; machine learning

Augmented MD

SLIDE 45

Details about our software and algorithms: http://www.vanderschaar-lab.com