Not to Cry Wolf: Distantly Supervised Multitask Learning in Critical Care


SLIDE 1

Not to Cry Wolf:

Distantly Supervised Multitask Learning in Critical Care

Patrick Schwab¹ (@schwabpa), Emanuela Keller², Carl Muroi², David J. Mack², Christian Strässle² and Walter Karlen¹

¹Institute of Robotics and Intelligent Systems, ETH Zurich
²Neurocritical Care Unit, University Hospital Zurich

SLIDE 2

Schwab et al. Not to Cry Wolf: Distantly Supervised Multitask Learning in Critical Care

SLIDE 3

How Can We Help?

SLIDE 4

The Idea

SLIDE 5

The Idea

Smarter Monitoring: alarms are (1) assigned a lower degree of urgency, or (2) suppressed

SLIDE 6

Challenges

  • Large amounts of biosignal monitoring data and alarms available
  • But only a limited amount of labelled data
  • Expert labels are expensive and time-consuming
  • Can we make do with a smaller number of labels?

SLIDE 7

Semi-supervised Learning

SLIDE 8

Existing Approaches

  • Existing approaches to semi-supervised learning in deep networks fall roughly into three groups:
  • 1. Distant / self / weak supervision, e.g. temporal ensembling¹
  • 2. Reconstruction-based objectives, e.g. AE, VAE, Ladder Nets
  • 3. Adversarial learning, e.g. Feature Matching GANs, CatGAN, Triple-GAN, …

¹ Laine & Aila, ICLR 2017


SLIDE 10

A Unified View

  • Reconstruction-based SSL can be viewed as distant supervision where reconstruction is the auxiliary task

[Diagram: compress, then reconstruct, with the reconstruction error acting as supervision]

SLIDE 11

A Unified View

  • Reconstruction-based SSL can be viewed as distant supervision where reconstruction is the auxiliary task
  • Reconstruction is a convenient auxiliary task
  • … generalises to all kinds of models and input data
SLIDE 12

A Unified View

  • Reconstruction-based SSL can be viewed as distant supervision where reconstruction is the auxiliary task
  • Reconstruction is a convenient auxiliary task
  • … generalises to all kinds of models, input data
  • But is it the best?
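The unified view can be made concrete with a minimal stdlib sketch (the names `encode`, `auxiliary_loss` and the toy heads are hypothetical illustrations, not the paper's architecture): reconstruction is just one auxiliary head whose target happens to be the input itself.

```python
def squared_error(pred, target):
    """Mean squared error for scalars or equal-length sequences."""
    if isinstance(pred, (int, float)):
        return (pred - target) ** 2
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def encode(x):
    """Stand-in 'compression': keep every other sample of the signal."""
    return x[::2]

def auxiliary_loss(x, targets, heads):
    """Every auxiliary task contributes a loss term computed from the
    shared code z; the reconstruction head is just one entry whose
    target is the input x itself."""
    z = encode(x)
    total = 0.0
    for name, head in heads.items():
        target = x if name == "reconstruction" else targets[name]
        total += squared_error(head(z), target)
    return total

heads = {
    # reconstruction: naive upsampling back to the input length
    "reconstruction": lambda z: [v for v in z for _ in range(2)],
    # any other signal-derived quantity works as a distant label
    "signal_mean": lambda z: sum(z) / len(z),
}
```

Swapping, or adding, entries in `heads` is all it takes to move from pure reconstruction-based SSL to distant supervision with many auxiliary tasks.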
SLIDE 13

Hypotheses

  • Recent empirical successes¹ with specifically engineered auxiliary tasks lead to two hypotheses:

(1) More “related” auxiliary tasks might be a better choice than reconstruction

(2) Using multiple diverse auxiliary tasks might be better than just one

¹ Oquab et al., 2015; Deriu et al., 2017; Doersch & Zisserman, 2017

SLIDE 14

Supervised Learning

SLIDE 15

Supervised Learning

Specialised per Signal

SLIDE 16

Supervised Learning

Missing Indicators

SLIDE 17

DSMT-Net

SLIDE 18

DSMT-Net

Any Number of Multitask Blocks

….

SLIDE 19

So far so good, but …

1 - Where could we get a large number of auxiliary tasks from?
2 - What about potential adverse interactions between the gradients from all these auxiliary tasks?

SLIDE 20

1 - Large-scale Auxiliary Task Selection

  • How do we select auxiliary tasks for distant supervision?
  • Identify relevant features in a large feature repository (auto-correlations, power spectral densities, …)
  • relevant = significant correlation¹ with the labels
  • Simple strategies: (1) at random out of the relevant set, and (2) in order of importance

¹ Kendall's τ
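The selection step can be sketched as follows. This is a simplified stand-in for the pipeline described on the slide: `kendall_tau` here is the tau-a statistic without tie correction, and the `min_tau` threshold is a hypothetical substitute for a proper significance test; in practice `scipy.stats.kendalltau` (which also reports a p-value) would be used.

```python
import random
from itertools import combinations

def kendall_tau(x, y):
    """Kendall rank correlation (tau-a, no tie correction)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (len(x) * (len(x) - 1) / 2)

def select_auxiliary_tasks(features, labels, k, strategy="importance", min_tau=0.1):
    """Keep features whose |tau| with the labels clears a threshold,
    then pick k of them by importance or at random.

    `features` maps a feature name to its per-sample values."""
    scored = sorted(
        ((abs(kendall_tau(values, labels)), name) for name, values in features.items()),
        reverse=True,
    )
    relevant = [(tau, name) for tau, name in scored if tau >= min_tau]
    if strategy == "importance":
        return [name for _, name in relevant[:k]]
    # "random": draw k tasks uniformly from the relevant set
    return random.sample([name for _, name in relevant], k)
```

Each selected feature then becomes the regression target of one auxiliary head in the network.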

SLIDE 21

2 - Combating Adverse Gradient Interactions

  • A key issue in end-to-end multitask learning is adverse gradient interactions
  • We therefore disentangle training the unsupervised and supervised tasks
  • Train in an alternating fashion in each epoch
  • First the unsupervised tasks, then the supervised tasks
  • Similar to the alternating training regime in GANs
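The alternating regime can be sketched with a toy model; the class and method names below are hypothetical, and the recorded strings stand in for what would be gradient-update steps in a real network.

```python
class ToyModel:
    """Records update calls; stands in for a network with a shared
    encoder, many auxiliary heads and one supervised head."""
    def __init__(self):
        self.updates = []

    def step_auxiliary(self, batch):
        self.updates.append("aux")  # would update encoder + auxiliary heads

    def step_supervised(self, batch):
        self.updates.append("sup")  # would update encoder + supervised head

def train_two_step(model, unsup_batches, sup_batches, epochs):
    """Two-step regime: within each epoch, run all auxiliary-task
    updates first, then all supervised updates, so gradients from the
    two groups of tasks are never mixed in a single step."""
    for _ in range(epochs):
        for batch in unsup_batches:
            model.step_auxiliary(batch)
        for batch in sup_batches:
            model.step_supervised(batch)
    return model
```

As in GAN training, the two objectives share parameters but never contribute to the same gradient step.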
SLIDE 22

Evaluation

SLIDE 23

Results

AUROC @ 12 labels

[Bar chart: AUROC at 12 labels, axis 0.500 to 1.000, for Feature RF, Supervised, Naive Multitask Learning, Ladder Network, Feature Matching GAN, DSMT-Net-6, DSMT-Net-12, DSMT-Net-25, DSMT-Net-50, DSMT-Net-100, No two-step training, DSMT-Net-6R, DSMT-Net-100R, and DSMT-Net-100D]

SLIDE 25

Results

AUROC @ 12 labels

Supervised Baselines

[Same AUROC bar chart, with the supervised baselines highlighted]

SLIDE 26

Results

AUROC @ 12 labels

SSL Baselines

[Same AUROC bar chart, with the SSL baselines highlighted]

SLIDE 27

Results

AUROC @ 12 labels

DSMT-Nets (importance)

[Same AUROC bar chart, with the importance-selected DSMT-Net variants highlighted]

SLIDE 28

Results

AUROC @ 12 labels

DSMT-Nets (R + D)

[Same AUROC bar chart, with the DSMT-Net R and D variants highlighted]

SLIDE 29

DSMT-Nets outperform existing SSL methods

AUROC @ 100, 50 and 25 labels

[Three bar charts: AUROC at 100, 50 and 25 labels, axis 0.500 to 1.000, for Feature RF, Supervised, Naive Multitask Learning, Ladder Network, Feature Matching GAN, DSMT-Net-6 through DSMT-Net-100, No two-step training, DSMT-Net-6R, DSMT-Net-100R, and DSMT-Net-100D]

SLIDE 30

Random outperforms Importance Selection

AUROC @ 100, 50 and 25 labels

[Same three bar charts, with the randomly selected (R) variants highlighted against importance-based selection]

SLIDE 31

Preventing Adverse Gradient Interactions Is Key

AUROC @ 100, 50 and 25 labels

[Same three bar charts, with the no-two-step-training ablation highlighted]

SLIDE 32

Conclusion

SLIDE 33

Conclusion

  • We present an approach to semi-supervised learning that …
  • automatically selects a large set of auxiliary tasks from multivariate time series ✔
  • scales to hundreds of auxiliary tasks in a single neural network ✔
  • combats adverse gradient interactions between tasks ✔
  • We confirm that adverse gradient interactions and auxiliary task diversity are key in multitask learning.
  • We make good progress on a clinically important task.

SLIDE 34

Questions?


Patrick Schwab

patrick.schwab@hest.ethz.ch

Institute of Robotics and Intelligent Systems, ETH Zurich

@schwabpa

Find out more at the poster session (#108, 18.15), and in the paper: Schwab, P., Keller, E., Muroi, C., Mack, D. J., Strässle, C., and Karlen, W. (2018). Not to Cry Wolf: Distantly Supervised Multitask Learning in Critical Care.