Learning Engines for Networks, Healthcare and Beyond



slide-1
SLIDE 1

Learning Engines for Networks, Healthcare and Beyond

Mihaela van der Schaar

John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine, University of Cambridge
Alan Turing Institute
Chancellor’s Professor, UCLA

http://www.vanderschaar-lab.com

slide-2
SLIDE 2

My research journey

Multimedia (MPEG, etc.)
Signal Processing
Communication Networks (NSF Career Award, 2004)
Distributed Systems
Game Theory
Network Science
Medicine and Healthcare

slide-3
SLIDE 3

Wireless transmission – A cross-layer optimization problem

  • M. van der Schaar, S. Krishnamachari, S. Choi, and X. Xu, "Adaptive cross-layer protection strategies for robust scalable video transmission over 802.11 WLANs," IEEE J. Sel. Areas Commun., Dec. 2003.
  • M. van der Schaar and S. Shankar, "Cross-layer wireless multimedia transmission: challenges, principles, and new paradigms," IEEE Wireless Commun. Mag., Aug. 2005.

slide-4
SLIDE 4

How to adapt to changing demands and resources?
Model-based adaptation using queuing theory

  • Q. Li and M. van der Schaar, "Providing adaptive QoS to layered video over wireless local area networks through real-time retry limit adaptation," IEEE Trans. Multimedia, vol. 6, no. 2, pp. 278-290, Apr. 2004.
  • H. P. Shiang and M. van der Schaar, "Multi-user video streaming over multi-hop wireless networks: A distributed, cross-layer approach based on priority queuing," IEEE J. Sel. Areas Commun., May 2007.

slide-5
SLIDE 5

How to adapt to changing demands and resources?
Dealing with dynamics: A first machine learning approach

  • M. van der Schaar, D. Turaga, and R. Wong, "Classification-Based System For Cross-Layer Optimized Wireless Video Transmission," IEEE Trans. Multimedia, vol. 8, no. 5, pp. 1082-1095, Oct. 2006.

Use supervised learning! SVM

slide-6
SLIDE 6

How to adapt to changing demands and resources?
Learning and decision making under uncertainty

Reinforcement learning

  • F. Fu and M. van der Schaar, IEEE Trans. Multimedia, Jun. 2007.
slide-7
SLIDE 7

How to adapt to changing demands and resources?
Different Layers – Different dynamics!

  • F. Fu and M. van der Schaar, "A New Systematic Framework for Autonomous Cross-Layer Optimization," IEEE Trans. Veh. Tech., vol. 58, no. 4, pp. 1887-1903, May 2009.

Reinforcement learning

slide-8
SLIDE 8

Reinforcement learning – A centralized approach?

  • F. Fu and M. van der Schaar, "A New Systematic Framework for Autonomous Cross-Layer Optimization," IEEE Trans. Veh. Tech., vol. 58, no. 4, pp. 1887-1903, May 2009.

slide-9
SLIDE 9

Reinforcement learning – A distributed reinforcement learning approach

  • F. Fu and M. van der Schaar, "A New Systematic Framework for Autonomous Cross-Layer Optimization," IEEE Trans. Veh. Tech., vol. 58, no. 4, pp. 1887-1903, May 2009.

Multiple agents acting to optimize a common goal while facing different dynamics
New methods for reinforcement learning

slide-10
SLIDE 10

Cross-layer optimization for delay-sensitive multimedia (Learning with real-time constraints)

  • F. Fu and M. van der Schaar, "Decomposition Principles and Online Learning in Cross-Layer Optimization for Delay-Sensitive Applications," IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1401-1415, Feb. 2010.

slide-11
SLIDE 11

From single to multiple users: Interference Management using Multi-user Learning

  • Y. Su and M. van der Schaar, "Conjectural Equilibrium in Multi-user Power Control Games," IEEE Trans. Signal Process., vol. 57, no. 9, pp. 3638-3650, Sep. 2009.
  • Y. Su and M. van der Schaar, "Dynamic Conjectures in Random Access Networks Using Bio-inspired Learning," IEEE J. Sel. Areas Commun., vol. 28, no. 4, pp. 587-601, May 2010.

slide-12
SLIDE 12

From single to multiple users: Better congestion control by learning to coordinate without communication

  • W. Zame, J. Xu and M. van der Schaar, "Winning the Lottery: Learning Perfect Coordination with Minimal Feedback," IEEE J. Sel. Topics in Signal Process., Oct. 2013.
  • W. Zame, J. Xu and M. van der Schaar, "Cooperative Multi-Agent Learning and Coordination for Cognitive Radio Networks," IEEE J. Sel. Areas Commun., Mar. 2014.

slide-13
SLIDE 13

From benevolent to strategic users: Multi-user wireless games, learning and decisions

  • F. Fu and M. van der Schaar, "Noncollaborative Resource Management for Wireless Multimedia Applications Using Mechanism Design," IEEE Trans. Multimedia, Jun. 2007.
  • F. Fu, T. M. Stoenescu, and M. van der Schaar, "A Pricing Mechanism for Resource Allocation in Wireless Multimedia Applications," IEEE J. Sel. Topics in Signal Process., Aug. 2007.
  • H. Park and M. van der Schaar, "Coalition-based Resource Reciprocation Strategies for P2P Multimedia Broadcasting," IEEE Trans. Broadcast., Sep. 2008.
  • H. Park and M. van der Schaar, "Bargaining Strategies for Networked Multimedia Resource Management," IEEE Trans. Signal Process., Jul. 2007.

slide-14
SLIDE 14

Reinforcement learning for multiple strategic users

  • M. van der Schaar and F. Fu, "Spectrum Access Games and Strategic Learning in Cognitive Radio Networks for Delay-Critical Applications," Proc. of IEEE, Special issue on Cognitive Radio, Apr. 2009.

slide-15
SLIDE 15

Mechanism design based resource allocation and Multi-agent Reinforcement Learning

  • F. Fu and M. van der Schaar, "Noncollaborative Resource Management for Wireless Multimedia Applications Using Mechanism Design," IEEE Trans. Multimedia, Jun. 2007.

slide-16
SLIDE 16

Multi-user wireless games and reinforcement learning

  • F. Fu and M. van der Schaar, "Learning to Compete for Resources in Wireless Stochastic Games," IEEE Trans. Veh. Tech., vol. 58, no. 4, pp. 1904-1919, May 2009.

slide-17
SLIDE 17

Building BIG networks!
Distributed optimization in multi-hop networks

  • Y. Andreopoulos, N. Mastronarde, and M. van der Schaar, "Cross-layer Optimized Video Streaming Over Wireless Multihop Mesh Networks," IEEE J. Sel. Areas Commun., vol. 24, no. 11, pp. 2104-2115, Nov. 2006.

slide-18
SLIDE 18

Building BIG networks!
Multi-agent reinforcement learning in multi-hop nets

  • H. P. Shiang and M. van der Schaar, "Online Learning in Autonomic Multi-Hop Wireless Networks for Transmitting Mission-Critical Applications," IEEE J. Sel. Areas Commun., vol. 28, no. 5, pp. 728-741, June 2010.

slide-19
SLIDE 19

Internet of Things: Communication among smart devices

  • J. Xu, Y. Andreopoulos, Y. Xiao and M. van der Schaar, "Non-stationary Resource Allocation Policies for Delay-constrained Video Streaming: Application to Video over Internet-of-Things-enabled Networks," IEEE J. Sel. Areas Commun., Apr. 2014.
slide-20
SLIDE 20

Incentivizing distributed online exchanges: Fiat money – a first theory

  • M. van der Schaar, J. Xu and W. Zame, "Efficient Online Exchange via Fiat Money," Economic Theory, vol. 54, no. 2, pp. 211-248, Oct. 2013.

slide-21
SLIDE 21

Fiat-Money: Applications in wireless networks
Reinforcement learning to learn how to trade tokens

Multi-hop networks

  • J. Xu and M. van der Schaar, "Token System Design for Autonomic Wireless Relay Networks," IEEE Trans. on Commun., July 2013.

Interference management

  • C. Shen, J. Xu and M. van der Schaar, "Silence is Gold: Strategic Interference Mitigation Using Tokens in Heterogeneous Small Cell Networks," IEEE J. Sel. Areas Commun., June 2015.

Device-to-Device Communication

  • N. Mastronarde, V. Patel, J. Xu, L. Liu, and M. van der Schaar, "To Relay or Not to Relay: Learning Device-to-Device Relaying Strategies in Cellular Networks," IEEE Transactions on Mobile Computing, June 2016.

slide-22
SLIDE 22

Context-Aware Caching – a contextual learning approach

  • S. Li, J. Xu, M. van der Schaar, and W. Li, "Trend-Aware Video Caching through Online Learning," IEEE Transactions on Multimedia, 2016.
  • S. Muller, O. Atan, M. van der Schaar and A. Klein, "Context-Aware Proactive Content Caching With Service Differentiation in Wireless Networks," IEEE Transactions on Wireless Communications, Feb. 2017.

slide-23
SLIDE 23

Network science: Learning in social networks

  • A. Alaa, K. Ahuja, and M. van der Schaar, "A Micro-foundation of Social Capital in Evolving Social Networks," IEEE Transactions on Network Science and Engineering, 2017.
  • S. Zhang and M. van der Schaar, "From Acquaintances to Friends: Homophily and Learning in Networks," IEEE J. Sel. Areas Commun., Game Theory for Networks special issue, 2017.

slide-24
SLIDE 24

Forecasting popularity in networks – a contextual learning approach

  • J. Xu, M. van der Schaar, J. Liu and H. Li, "Forecasting Popularity of Videos using Social Media," IEEE Journal of Selected Topics in Signal Processing (JSTSP), Nov. 2014.

slide-25
SLIDE 25

Communication & networking

  • The journey continued

http://www.vanderschaar-lab.com/NewWebsite/Publications_Communications_and_Networks.html

slide-26
SLIDE 26

My/Our challenge

  • Why was nothing implemented?
      • Costs of technology dropped
      • Simple solution was sufficient to meet requirements/needs
      • Academic – industry divide: difficult to show proof-of-concept
  • What next?
      • Find an area
          • with very hard problems: clever solutions matter
          • cost is a real issue!
      • Medicine and healthcare
slide-27
SLIDE 27

My research now: medicine

Develop cutting-edge machine learning, AI and operations research theory, methods, algorithms and systems to deliver precision medicine at the patient-level:
1) understand the basis of health and disease
2) support clinical decisions for the patient at hand
3) inform and improve clinical pathways, better utilize resources & reduce costs
4) transform public health and policy

slide-28
SLIDE 28

A broad vision of the role of ML for healthcare

Today: Design clinical decision support systems
1. Early warning systems
2. Learning how to act when no experimentation is possible: individualized treatment effects

slide-29
SLIDE 29

Part 1: Building Clinical Decision Support Systems enabling delivery of precision medicine at the patient-level


slide-30
SLIDE 30

One Success Story: ForecastICU

  • Hospitalized patients are vulnerable to adverse events:
      • Cardiopulmonary arrests
      • Acute respiratory failure
      • Septic shocks
      • Unanticipated transfer to the intensive care unit (ICU)

Delayed ICU admission is correlated with mortality [Cardoso, 2011], [Liu et al., 2012]

Each hour delay = 1.5% increased risk of ICU death (Courtesy of: Critical Care Medicine, 2011)

slide-31
SLIDE 31

One Success Story: ForecastICU

Which patients in the wards should be admitted to the ICU and when?

slide-32
SLIDE 32

Challenge: True state is hidden

  • Example: Diastolic blood pressure for a patient hospitalized in a regular ward for more than 1000 hours and then admitted to ICU
  • Patient appeared stable, but was actually deteriorating – the true state was hidden

[Figure: diastolic blood pressure over time, with the ICU admission marked]

slide-33
SLIDE 33

What data is there? Hospital Example

Vital signs: Diastolic blood pressure, Systolic blood pressure, Best motor response, Best verbal response, Eye opening, Glasgow coma scale score, Heart rate, Respiratory rate, Oxygen saturation, Temperature, Oxygen device assistance

Lab tests: Chloride, Creatinine, Glucose, Hemoglobin, Platelet count, Potassium, Sodium, Total CO2, Urea nitrogen, White blood cell count

Admission information: Transfer, Age, Floor ID, Gender, Ethnicity, Race, Stem cell transplant, ICD-9 codes

Sampling frequencies labeled on the slide: 1 measurement / 24 hours; 1 measurement / 4 hours; Constant

slide-34
SLIDE 34

Wide variety of deterioration patterns and diagnosis

slide-35
SLIDE 35

Goal: Early Warning Systems

Understand, infer and forecast patient-level trajectory (diagnosis, evolution of bio-markers and subsequent outcomes, treatment effects, etc.)
Revise patient-level trajectory as care continues
Feed back to stakeholders extracted patient-level intelligence to deliver personalized care

Holistic, patient-level decision support

Cancer, Diabetes, Cardiovascular

slide-36
SLIDE 36

How to think about health and disease? How to track disease?

Use ML to infer present and future health states of the current patient on the basis of observations about him/her

[Figure labels: Observations/Events; States; Current state]

New “Disease categorization”

slide-37
SLIDE 37

How to think about health and disease? How to track disease?

Use ML to infer present and future health states of the current patient on the basis of observations about him/her

Forecast Trajectory = Probability to be in a certain state at a time T in the future

ACT

[Figure labels: Observations/Events; States; Current state]

slide-38
SLIDE 38

Competing Risks

[Figure labels: Stage 3 Cancer; Secondary Cancer Mortality; Primary Cancer Mortality; Cardiac-related Mortality; Other Causes of Death]

Define, infer and forecast health/disease state and state transitions

slide-39
SLIDE 39

Why has AI/ML not been used so far in medicine for decision support and discovery?

Inadequate, simplistic models

  • Unable to capture the complexity of medicine
  • One-size-fits-all
  • Uninterpretable
  • Not easy to act upon
slide-40
SLIDE 40

Current Disease Progression Models – simplistic and wrong

Markov Models, HMMs, Deep Markov Models, etc.

Easy to understand and compute… But WRONG

[Figure label: Current State]

slide-41
SLIDE 41

Markov models?

Ignore history

  • Previous states
  • Order of states
  • Duration in a state

[Figure: sequences of Pathological event 1 and Pathological event 2, leading to "Most likely future: Disease A" and "Most likely future: Disease B"]

History matters!

One size fits all!

Only capture population-level transitions across progression stages
Ignores individual clinical trajectories

slide-42
SLIDE 42

Do existing Deep Learning methods provide suitable solutions?

Modeling using deep learning methods - recurrent neural network (RNN)?

slide-43
SLIDE 43

Current Disease Progression Models based on Deep Learning

RNNs with attention mechanisms: identify important variables for future predictions based on patient’s history
[E. Choi, 2017] [Lim and van der Schaar, ML4HC 2018, NeurIPS 2018]

[Figure labels: Variables and events at time t-1; Attention mechanism!; Predictions of events at time t]

Pragmatic predictions, but no interpretable pathology

slide-44
SLIDE 44

Do existing Deep Learning methods provide suitable solutions?

Modeling using deep learning methods - recurrent neural network (RNN)?

RNN considers the timing and order of events, but no notion of states
Not interpretable!
Cannot use or extract clinical knowledge!
Not able to answer important questions about early diagnosis/progression

slide-45
SLIDE 45

Models for Health and Disease Trajectories - Requirements

Learn from complex data, including event times and order
Learn from clinical annotations, codes, expertise etc.
History matters! Non-stationary models needed
Learn holistically! Multiple morbidities
Heterogeneous patients – personalization matters
Interpretable models

Clinically actionable models for patient-level trajectory needed!

slide-46
SLIDE 46

Our first model: Hidden Absorbing Semi-Markov Model (HASMM) [ICML 2016, JMLR 2017]

  • Hidden (true) state space:
  • one or more observable/absorbing states

[Figure labels: Cardiac-related Mortality; Cancer-related Mortality; Hypertension; Diabetes; COPD with Exacerbation; Hidden States]

Semi-supervised Competing Risks

Transition probabilities depend on sojourn times (Semi-Markov)

slide-47
SLIDE 47

Learn from Censoring and Informative Observation Times

Censoring

An S-HSMM episode

: Intensity parameter

Informative observation times = sampling times correlated with states

  • Physiological data is gathered over irregularly spaced intervals: model the observations via a point process
slide-48
SLIDE 48

Informative Observation Times

Observation times are modeled as a Hawkes process

  • Continuous-time jump process (like Poisson)
  • Jump intensities depend on state (unlike Poisson)

= Intensity
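
The idea of a state-dependent jump process can be illustrated with a small simulation. The sketch below is not the HASMM implementation: it assumes an exponential excitation kernel, a fixed hidden state per run, and hypothetical baseline rates (`BASELINE_BY_STATE`) chosen only for illustration. It uses Ogata's thinning algorithm, a standard way to sample a Hawkes process.

```python
import math
import random


def simulate_hawkes(baseline, jump, decay, horizon, seed=0):
    """Simulate event times on [0, horizon] with Ogata's thinning algorithm.

    Assumed intensity (illustrative choice, not taken from the slides):
    baseline + jump * sum(exp(-decay * (t - t_i)) for past events t_i).
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    excitation = 0.0  # self-exciting part of the intensity at time t
    while True:
        # Between events the intensity only decays, so its current value
        # is a valid upper bound for proposing the next candidate time.
        upper = baseline + excitation
        wait = rng.expovariate(upper)
        if t + wait > horizon:
            return events
        t += wait
        excitation *= math.exp(-decay * wait)
        # Accept the candidate with probability intensity(t) / upper.
        if rng.random() * upper <= baseline + excitation:
            events.append(t)
            excitation += jump


# Hypothetical state-dependent baselines (observations per hour).
BASELINE_BY_STATE = {"stable": 0.2, "deteriorating": 1.0}

samples = {
    state: simulate_hawkes(rate, jump=0.5, decay=1.0, horizon=48.0, seed=1)
    for state, rate in BASELINE_BY_STATE.items()
}
```

Because the baseline differs by state, the spacing of observation times carries information about the hidden state, which is the sense in which observation times are "informative".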

slide-49
SLIDE 49

Hidden Absorbing Semi-Markov Model (HASMM)

  • Sojourn time distribution
  • Semi-Markov transition functions
  • Sampling times of physiological streams: Hawkes point process
  • Observed physiological data: multi-task Gaussian Process

[Annotations on the slide: Gamma distribution; cumulative distribution function of state i’s sojourn time; Multinomial logistic]
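
A toy simulation can make the semi-Markov ingredients concrete. The sketch below assumes, for illustration only, that sojourn times are Gamma distributed and that the next state is drawn from a multinomial logistic (softmax) function of the time just spent in the current state. The state names, `SOJOURN`, and `TRANSITION` values are hypothetical and are not taken from the HASMM papers.

```python
import math
import random


def softmax(scores):
    """Turn real-valued scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_path(start, sojourn, transition, absorbing, rng, max_steps=100):
    """Return a list of (state, duration); absorbing states get duration None."""
    path = []
    state = start
    for _ in range(max_steps):
        if state in absorbing:
            path.append((state, None))
            break
        shape, scale = sojourn[state]
        duration = rng.gammavariate(shape, scale)  # Gamma sojourn time
        path.append((state, duration))
        candidates = sorted(transition[state])
        # Logit of each candidate = intercept + slope * sojourn time, so the
        # transition probabilities depend on how long the state lasted.
        scores = [transition[state][c][0] + transition[state][c][1] * duration
                  for c in candidates]
        state = rng.choices(candidates, weights=softmax(scores))[0]
    return path


# Hypothetical parameters: (shape, scale) in hours, and (intercept, slope).
SOJOURN = {"stable": (2.0, 12.0), "deteriorating": (2.0, 4.0)}
TRANSITION = {
    "stable": {"deteriorating": (0.0, 0.05), "discharge": (0.5, 0.0)},
    "deteriorating": {"stable": (0.0, -0.2), "icu": (0.0, 0.3)},
}
ABSORBING = {"icu", "discharge"}

path = simulate_path("stable", SOJOURN, TRANSITION, ABSORBING, random.Random(0))
```

In an ordinary Markov chain the `slope` terms would be zero, so the next state would not depend on the duration; letting them differ from zero is what makes the toy model semi-Markov.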

slide-50
SLIDE 50

HASMM: A Versatile Model that Subsumes many Others…

Models shown around HASMM: Switching Ornstein-Uhlenbeck, DT-HMM, CT-HMM, ED-HMM, Segment-HMM, Sequential Hypothesis Testing, Switching Gaussian Process, DT-HSMM

References shown: [Alaa and van der Schaar, 2016, 2017] [Rabiner, 1989] [Sontag, 2014], [Liu, 2015] [Liu, 2015] [Veeravalli, 2015] [van der Schaar, 2016] [Chen, 2010] [Johnson and Willsky, 2015], [van der Schaar, 2016] [Dewar, 2011] [Murphy, 2002]

slide-51
SLIDE 51

A General Framework (Clinic, Hospital, Home)

Understand! Model! Learn offline! Infer at run-time!

Step 1 – Offline: Learn longitudinal models of health & disease: maps hidden (clinical) states to observable (physiological) data

Step 2 – Online: Infer diagnosis and forecast risk/prognosis for the current patient

[Figure labels: Hidden States (Health States, History); Observable data (Clinical findings, Physiological measurements, Observation times); Available data + Clinical Knowledge; Model (Parameters)]

slide-52
SLIDE 52

Results - Sensitivity-Precision (at UCLA Ronald Reagan Hospital)

100% PPV improvement
70% TPR improvement

25% reduction of wasted ICU resources (PPV = Precision)
70% more deteriorating patients (TPR = Sensitivity)
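
For readers less familiar with the two metrics, the following sketch shows how PPV (precision) and TPR (sensitivity) are computed from alarm counts, and what a relative improvement such as "100%" means. The counts are made up for illustration and are not the hospital results.

```python
def precision_and_sensitivity(true_pos, false_pos, false_neg):
    """PPV = TP / (TP + FP); TPR = TP / (TP + FN)."""
    alarms = true_pos + false_pos
    deteriorating = true_pos + false_neg
    ppv = true_pos / alarms if alarms else 0.0
    tpr = true_pos / deteriorating if deteriorating else 0.0
    return ppv, tpr


def relative_improvement(old, new):
    """Relative gain of `new` over `old`; 1.0 corresponds to +100%."""
    return (new - old) / old


# Hypothetical counts: 30 correct alarms, 70 false alarms, 20 missed patients.
ppv, tpr = precision_and_sensitivity(true_pos=30, false_pos=70, false_neg=20)
```

A higher PPV means fewer false alarms (fewer wasted ICU resources), while a higher TPR means more deteriorating patients are caught.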

slide-53
SLIDE 53

Results: Timeliness (at UCLA Ronald Reagan Hospital)

4 hours earlier than clinicians

[Figure labels: 180% PPV improvement (shown twice); 12 hours; 8 hours; Sensitivity = 50%]

Cambridge, Southampton, etc.

slide-54
SLIDE 54

Algorithm                        AUC-PR (TPR vs PPV), UCLA
ForecastICU                      0.49
(Sequential) Random Forest       0.36
(Sequential) Logistic Regression 0.27
(Sequential) LASSO               0.26
HMM (Gaussian emission)          0.32
Multitask Gaussian Processes     0.30
Recurrent Neural Networks        0.29
Rothman                          0.25
MEWS                             0.18
APACHE II                        0.13
SOFA                             0.13

slide-55
SLIDE 55

Can we do even better?

PASS [Alaa & van der Schaar, 2018]

Main idea: a general and versatile deep probabilistic model capturing complex, non-stationary representations for patient-level trajectories

Maintain probabilistic structure of HMMs, but use RNNs to model state dynamics

[Figure labels: Emission; Transition]

slide-56
SLIDE 56

PASS: Going Beyond Markov

  • Attention weights determine the influences of past state realizations on future state transitions
  • PASS repeatedly updates attention weights to focus on past state

[Figure labels: Attention weights; Patient context]

slide-57
SLIDE 57

PASS: Overcomes shortcoming of Markov Models

Attention weights create a "soft" version of a non-stationary, variable-order Markov model where underlying dynamics of a patient change over time based on an individual’s clinical context!

[Figure labels: Attention weights; Patient context]

PASS “memory” is shaped by patient’s current context (clinical events, treatments, etc.)
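
One way to picture a "soft" variable-order Markov model is as an attention-weighted mixture over past states. The sketch below is a simplified illustration and not the PASS architecture: the attention scores are hypothetical inputs (in PASS they would be produced by a learned network from the patient's context), and the transition matrix is a toy example.

```python
import math


def attention_weights(scores):
    """Softmax: non-negative weights over past time steps that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_state_distribution(past_states, scores, transition):
    """Mix the transition rows of past states using attention weights."""
    weights = attention_weights(scores)
    dist = [0.0] * len(transition)
    for weight, state in zip(weights, past_states):
        for nxt, prob in enumerate(transition[state]):
            dist[nxt] += weight * prob
    return dist


# Toy 2-state chain: row i gives P(next state | a past state equal to i).
TRANSITION = [[0.9, 0.1],
              [0.3, 0.7]]
history = [1, 0, 0]  # oldest to most recent state

# All attention on the most recent state recovers a first-order Markov model.
markov_like = next_state_distribution(history, [-1000.0, -1000.0, 0.0], TRANSITION)
# Shifting attention to an older state changes the forecast for this patient.
history_aware = next_state_distribution(history, [2.0, 0.0, 0.0], TRANSITION)
```

Since the scores can change at every step and differ between patients, the effective memory length is neither fixed nor shared across the population.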

slide-58
SLIDE 58

PASS: Beyond One-size-fits-all using Contextual Attention

Attentive state-space model: Individualized dynamics
Individualization – two-fold: Static + Dynamic Context
Attention weights explain causative and associative relationships between hidden disease states and past clinical events for that patient!

slide-59
SLIDE 59

PASS: A General, Versatile and Clinically Actionable Model

Models shown around PASS: Variable-order HMM (Context Trees), DT-HMM, CT-HMM, Multi-task RNN, HSMM, HASMM, Sequential Hypothesis Testing, Deep Markov Model, Autoregressive Models

References shown: [Sontag, 2014], [Liu, 2015] [Krishnan et al., 2015] [Alaa and van der Schaar, 2016] [Alaa and van der Schaar, 2018] [Hoiles and van der Schaar, 2016] [Alaa and van der Schaar, 2017] [Lim and van der Schaar, 2018]

slide-60
SLIDE 60

PASS: A tool for decision support and discovery

Dynamic and personalized forecasting of the health and disease trajectory of a patient as data is gathered over time

Dynamic Time-to-event analysis

Holistic and Competing Risks! [Bellot, vdS, NeurIPS 2018] [Lee, Zame, vdS, AAAI 2018] [Alaa, vdS, NIPS 2017]

slide-61
SLIDE 61

Morbidity networks: Personalized

[Figure labels: Personalized morbidity networks; Population-level morbidity network]

slide-62
SLIDE 62

Morbidity networks: Dynamic

  • Dynamic Morbidity Maps – Inferred based on attention weights
  • Causal associations: how much attention is paid to diagnosis of morbidity A when predicting morbidity B

slide-63
SLIDE 63

Personalized Screening/Monitoring: Who to Screen? When to Screen? What to Screen?

Deep Sensing [Yoon, Zame, vdS, ICLR 2018]
Disease Atlas [Lim, vdS, ML4HC 2018]
Which Modality of Screening? [Alaa, Moon, Hsu, vdS, TMM 2016]

[Figure labels: Lab test; Event of Interest]

slide-64
SLIDE 64

Part 2: Personalized medicine needs to go beyond risk predictions

Individualized Treatment Recommendations

slide-65
SLIDE 65


Bob

Which treatment is best for Bob?

Diagnosed with Disease X

Problem: Estimate the effect of a treatment/intervention on an individual

Individualized Treatment Recommendations

slide-66
SLIDE 66

RCTs do not support Personalized Medicine

Randomized Control Trials: Average Treatment Effects
Population-level
Non-representative patients
Small sample sizes
Time consuming
Enormous costs

Adaptive Clinical Trials [Atan, Zame, vdS, AISTATS 2019]

slide-67
SLIDE 67

Delivering Personalized (Individualized) Treatments

Randomized Control Trials: Average Treatment Effects
Population-level
Non-representative patients
Small sample sizes
Time consuming
Enormous costs

Machine Learning: Individualized Treatment Effects
Patient-centric
Real-world observational data
Scalable & adaptive implementation
Fast deployment
Cost-effective

[Atan, vdS, 2015, 2018] [Alaa, vdS, 2017, 2018, 2019] [Yoon, Jordon, vdS, 2017] [Lim, Alaa, vdS, 2018] [Bica, Alaa, vdS, 2019]

slide-68
SLIDE 68

Potential outcomes framework [Neyman, 1923]

Each patient has features
Two potential outcomes
Treatment assignment

Observational data
Factual outcomes
Causal effects
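
The pieces of the potential outcomes framework can be written down as a small data structure. The sketch below is illustrative only: the `Patient` class and its field names are hypothetical, and both potential outcomes are stored side by side, which is possible in a simulation but never in real observational data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    features: tuple      # covariates of the patient
    outcome_control: float   # potential outcome without treatment
    outcome_treated: float   # potential outcome with treatment
    treated: bool        # treatment assignment

    @property
    def factual(self):
        """The outcome that is actually observed."""
        return self.outcome_treated if self.treated else self.outcome_control

    @property
    def counterfactual(self):
        """The outcome under the treatment that was NOT given (unobserved)."""
        return self.outcome_control if self.treated else self.outcome_treated

    @property
    def individual_effect(self):
        """Causal effect for this patient: treated minus control outcome."""
        return self.outcome_treated - self.outcome_control


# Hypothetical patient: feature is age, outcomes are survival probabilities.
bob = Patient(features=(65,), outcome_control=0.4, outcome_treated=0.7, treated=True)
observed_record = (bob.features, bob.treated, bob.factual)
```

An observational dataset contains only records shaped like `observed_record`; the individual effect has to be estimated because `counterfactual` is never recorded.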

slide-69
SLIDE 69

Assumptions

No unmeasured confounders (Ignorability)
Common support

[Figure labels: Observed; Hidden]

Our work on hidden confounders [Lee, Mastronarde, vdS, 2018] [Bica, Alaa, vdS, 2019]

slide-70
SLIDE 70

Estimating individualized treatment effects

Observational data
Treatment response surfaces
Estimate causal effects: individualized treatment effects

slide-71
SLIDE 71

Beyond supervised learning…

Fundamental challenge of causal inference: we never observe counterfactual outcomes

[Figure labels: Training examples; Ground-truth causal effects]

slide-72
SLIDE 72

Causal modeling ≠ predictive modeling

1- Need to model interventions
2- Selection bias → covariate shift: training distribution ≠ testing distribution

[Figure labels: Training distribution; Testing distribution]
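
A tiny worked example shows why selection bias breaks a purely predictive comparison. The data below are constructed by hand (not from any study) so that the true treatment effect is +1 for every patient, while severe patients are treated more often. Stratifying on the confounder is used here only as a simple stand-in for bias correction; it is not the method proposed in the talk.

```python
# Each record: (severe, treated, outcome). Toy data built so that the
# true treatment effect is +1 for every patient.
RECORDS = [
    (0, 1, 1.0), (0, 0, 0.0), (0, 0, 0.0), (0, 0, 0.0),      # mild patients
    (1, 1, -1.0), (1, 1, -1.0), (1, 1, -1.0), (1, 0, -2.0),  # severe patients
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_effect(records):
    """Difference of group means, ignoring who was selected for treatment."""
    treated = mean(y for _, t, y in records if t == 1)
    control = mean(y for _, t, y in records if t == 0)
    return treated - control


def stratified_effect(records):
    """Compare treated and control within each severity level, then average."""
    total = 0.0
    for level in sorted({severe for severe, _, _ in records}):
        stratum = [r for r in records if r[0] == level]
        total += naive_effect(stratum) * len(stratum) / len(records)
    return total


naive = naive_effect(RECORDS)        # treated group is mostly severe patients
adjusted = stratified_effect(RECORDS)
```

Here the naive estimate is 0.0 even though every patient benefits by +1, because the treated and control groups follow different feature distributions; the stratified estimate recovers 1.0.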

slide-73
SLIDE 73

Many recent works on individualized treatment effects (ITEs)

Bayesian Additive Regression Trees (BART) [Chipman et al., 2010], [J. Hill, 2011]
Causal Forests [Wager & Athey, 2016]
Nearest Neighbor Matching (kNN) [Crump et al., 2008]
Balancing Neural Networks [Johansson, Shalit and Sontag, 2016]
Causal MARS [Powers, Qian, Jung, Schuler, N. Shah, T. Hastie, R. Tibshirani, 2017]
Targeted Maximum Likelihood Estimator (TMLE) [Gruber & van der Laan, 2011]
Counterfactual regression [Johansson, Shalit and Sontag, 2016]
CMGP [Alaa & van der Schaar, 2017]
GANITE [Yoon, Jordon & van der Schaar, 2018]

No theory, ad-hoc models

slide-74
SLIDE 74

A first theory for causal inference - individualized treatment effects

Theory: What is possible? (Fundamental limits)
Algorithms: How can it be achieved? (Practical implementation)

[Alaa, vdS, JSTSP 2017] [ICML 2018]

slide-75
SLIDE 75

Bayesian nonparametric ITE estimation

[Figure labels: ITE; True ITE model; ITE estimation]

Prior over response functions:
Point estimator induced by Bayesian posterior
Precision of estimating heterogeneous effects

What can be achieved?

Minimax estimation loss:

[Annotations: Best estimate; Most “difficult” response surfaces]

Minimax loss = information-theoretic quantity, independent of the model

slide-76
SLIDE 76

Minimax Rate for ITE Estimation

Depends on the “complexity” of and …

[Figure labels: Sparsity (relevant dimensions); Smoothness (Hölder ball; rough functions vs. smooth functions)]

slide-77
SLIDE 77

Minimax Rate for ITE Estimation

Theorem 1

The minimax rate for ITE estimation is given by: Measure of response surface complexity =

Smoothness parameter Number of relevant dimensions

Minimax rate depends on the more complex of the two response surfaces

Minimax rate does not depend on selection bias

slide-78
SLIDE 78

Should we care about selection bias?

Assume that and

Minimax-optimal estimator

[Figure labels: Offset; Slope; Rényi Divergence]

Large-sample regime: Complexity of response surfaces dominates (Smoothness & dimensionality)
Small-sample regime: Need to account for selection bias

slide-79
SLIDE 79

Theory guides model design

We want models that do well in both small and large sample regimes

[Figure labels: Small sample regime; Large sample regime]

Handling selection bias
Sharing training data between response surfaces
Flexible model and hyperparameter tuning

slide-80
SLIDE 80

Where Next?

Deliver real-time and personalised decision support direct to individual clinicians and patients

slide-81
SLIDE 81
slide-82
SLIDE 82
slide-83
SLIDE 83
slide-84
SLIDE 84

Select a patient

slide-85
SLIDE 85
slide-86
SLIDE 86

Upload a pathology report

slide-87
SLIDE 87

AutoPrognosis

slide-88
SLIDE 88

INVASE

slide-89
SLIDE 89

ER Positive ER Negative

slide-90
SLIDE 90
slide-91
SLIDE 91

PASS

slide-92
SLIDE 92
slide-93
SLIDE 93
slide-94
SLIDE 94
slide-95
SLIDE 95
slide-96
SLIDE 96
slide-97
SLIDE 97
slide-98
SLIDE 98

Deep Sensing

slide-99
SLIDE 99
slide-100
SLIDE 100
slide-101
SLIDE 101
slide-102
SLIDE 102
  • GANITE
  • NSGP
  • Counterfactual Recurrent Nets

slide-103
SLIDE 103

Deep Predictive Clustering

slide-104
SLIDE 104

Join us! A recruitment video!

https://www.youtube.com/watch?v=TWI-WIoWvfk

slide-105
SLIDE 105

Details about our software: http://www.vanderschaar-lab.com
Details about our algorithms: http://www.vanderschaar-lab.com