Accelerating Machine Learning with Training Data Management

Data Council 4/17/19 | Alexander Ratner Accelerating Machine Learning with Training Data Management
Alex Ratner, Stanford University

Training data is the key ingredient in ML, but it is created and managed in manual, ad hoc ways.

KEY RESEARCH QUESTION: Can we add mathematical & systems structure to the way people build & manage training sets today?

Running Example: Chest X-Ray Triage
Task: flag chest X-rays as “Abnormal”.
Motivation: Case prioritization, e.g. for low-resource hospitals
[Dunnmon et al., Radiology 2018; Dunnmon & Ratner et al., 2019; Khandewala et al., NeurIPS ML4H 2017]

Running Example: Chest X-Ray Triage
Pipeline: Unlabeled data (multi-modal) → Training set creation → Model development (e.g. ResNet) → Model
• Model development: 2-3 days; ± 1 point due to model choice
Model dev is often radically easier today!
(All scores: ROC AUC) [Dunnmon et al., Radiology 2018; Dunnmon & Ratner et al., 2019; Khandewala et al., NeurIPS ML4H 2017]

Running Example: Chest X-Ray Triage
Pipeline: Unlabeled data (multi-modal) → Training set creation → Model development (e.g. ResNet) → Model
• Training set creation: 8 months; ± 9 points due to training set size; ± 8 points due to training set quality
• Model development: 2-3 days; ± 1 point due to model choice
Training data is often the key differentiator
(All scores: ROC AUC) [Dunnmon et al., Radiology 2018; Dunnmon & Ratner et al., 2019; Khandewala et al., NeurIPS ML4H 2017]

Challenges of Training Data Management
• Volume is critical
• But training data is largely hand-labeled: slow & expensive
• Quality is critical
• But this is challenging to assess
• Flexibility is critical
• But training sets are completely static
Label schema shown on the slide: Y ∈ {“Abnormal”, “Normal”} → Y ∈ {“Urgent”, “Emergent”, “Normal”}

Our research: building systems that
1. Let users specify training sets in higher-level, programmatic ways
2. Clean and integrate this input
3. Use it as training data for ML models
A new way to specify ML models: in hours rather than months

Our Research: Training Data Management Systems
Flow: Unlabeled data → Labeling → Augmentation → Multi-Task Supervision → Model
This talk: Three systems that support and accelerate critical steps of training data creation & management

Our Research: Training Data Management Systems
(1) Labeling: Snorkel
Programmatically label training data (e.g. an unlabeled example is labeled “Normal”)

Our Research: Training Data Management Systems
(1) Labeling: Snorkel
(2) Augmentation: TANDA
Programmatically transform training data

Our Research: Training Data Management Systems
(1) Labeling: Snorkel
(2) Augmentation: TANDA
(3) Multi-Task Supervision: MeTaL (tasks Y1, Y2, Y3)
Programmatically integrate training data across multiple tasks

Our Research: Training Data Management Systems
(1) Labeling: Snorkel
(2) Augmentation: TANDA
(3) Multi-Task Supervision: MeTaL (tasks Y1, Y2, Y3)
Deployments: Industry, Government, Medicine

Our Research: Training Data Management Systems
Next: (1) Labeling with Snorkel, followed by (2) Augmentation with TANDA and (3) Multi-Task Supervision with MeTaL.

Problem: Hand-labeling is slow, expensive, & static
Idea: Enable users to label training data programmatically

KEY TECHNICAL IDEA: View training set labeling as a noisy programmatic process that we can model

The Snorkel Pipeline (snorkel.stanford.edu)
Stages: UNLABELED DATA → LABELING FUNCTIONS → LABEL MODEL → TRAINING DATABASE → END MODEL
1. Users write labeling functions to heuristically label data
2. Snorkel cleans and combines the LF labels
3. The resulting training database is used to train an ML model
Example labeling functions:

    def LF_short_report(x):
        if len(x.words) < 15:
            return "NORMAL"

    def LF_off_shelf_classifier(x):
        if off_shelf_classifier(x) == 1:
            return "NORMAL"

    def LF_pneumo(x):
        if re.search(r"pneumo.*", x.text):
            return "ABNORMAL"

    def LF_ontology(x):
        if DISEASES & x.words:
            return "ABNORMAL"

Note: No hand-labeled training data!
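The data flow through the middle of this pipeline can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides describe a label model that cleans and combines the LF labels, whereas the sketch below swaps in a plain majority vote as a stand-in, and the integer label encoding and the names `majority_vote` and `build_training_labels` are hypothetical, not part of Snorkel.

```python
from collections import Counter
from typing import List, Sequence

# Assumed integer encoding (not from the slides): 0 = abstain, 1 = NORMAL, 2 = ABNORMAL.
ABSTAIN = 0


def majority_vote(votes: Sequence[int]) -> int:
    """Combine one example's LF votes; a simple stand-in for a learned label model."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN  # every labeling function abstained
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tie between labels: leave the example unlabeled
    return ranked[0][0]


def build_training_labels(label_matrix: Sequence[Sequence[int]]) -> List[int]:
    """Map a label matrix (one row per example, one column per LF) to training labels."""
    return [majority_vote(row) for row in label_matrix]


if __name__ == "__main__":
    matrix = [[1, 0, 0], [1, 2, 2], [0, 0, 0], [1, 2, 0]]
    print(build_training_labels(matrix))
```

Examples that end up with the abstain value would simply be left out of the training database before the end model is trained.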

Snorkel: Real-World Deployments (snorkel.stanford.edu)
Areas: Science & Medicine, Industry, Government
In many cases: From person-months of hand-labeling to hours

(1) Writing Labeling Functions
Step 1 of the Snorkel pipeline: users write labeling functions (LF_short_report, LF_off_shelf_classifier, LF_pneumo, LF_ontology) to heuristically label data. Snorkel then cleans and combines the LF labels, and the resulting training database is used to train an ML model.

(1) Writing Labeling Functions
Labeling function: λ: 𝒳 ↦ 𝒴 ∪ {0}
Each labeling function maps a data point in 𝒳 to a label in 𝒴, or to 0 to abstain. Examples: LF_short_report, LF_off_shelf_classifier, LF_pneumo, LF_ontology.
A simple abstraction for expressing domain heuristics or other noisy label sources
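As a rough illustration of this abstraction, the sketch below writes three of the slide's labeling functions as runnable Python that returns a label or 0 to abstain. Several pieces are assumptions made for the example and do not come from the talk: the `Report` class, the contents of `DISEASES`, the integer label encoding, and the `apply_lfs` helper. It is not Snorkel's API.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Assumed encoding: 0 = abstain (as in the formalism above), 1 = NORMAL, 2 = ABNORMAL.
ABSTAIN, NORMAL, ABNORMAL = 0, 1, 2

# Hypothetical ontology of disease terms, for illustration only.
DISEASES = {"pneumonia", "pneumothorax", "effusion"}


@dataclass
class Report:
    """Hypothetical container for one radiology report."""
    text: str

    @property
    def words(self) -> List[str]:
        return re.findall(r"[a-z]+", self.text.lower())


def lf_short_report(x: Report) -> int:
    # Domain heuristic from the slides: short reports tend to be normal.
    return NORMAL if len(x.words) < 15 else ABSTAIN


def lf_pneumo(x: Report) -> int:
    # Pattern matching on the report text.
    return ABNORMAL if re.search(r"pneumo.*", x.text) else ABSTAIN


def lf_ontology(x: Report) -> int:
    # Vote ABNORMAL if any ontology term appears in the report.
    return ABNORMAL if DISEASES & set(x.words) else ABSTAIN


def apply_lfs(reports: Sequence[Report],
              lfs: Sequence[Callable[[Report], int]]) -> List[List[int]]:
    """Build the label matrix: one row per report, one column per labeling function."""
    return [[lf(x) for lf in lfs] for x in reports]


if __name__ == "__main__":
    reports = [
        Report("Lungs are clear."),
        Report("Findings: Focal consolidation and pneumothorax."),
    ]
    print(apply_lfs(reports, [lf_short_report, lf_pneumo, lf_ontology]))
```

Returning an explicit abstain value, rather than `None` by falling off the end of the function, keeps every labeling function total over its inputs, which makes the label matrix easy to build and inspect.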

Simple Example: Pattern Matching
Report: “Indication: Chest pain. Findings: Focal consolidation and pneumothorax.”

    def LF_pneumo(x):
        if re.search(r"pneumo.*", x.text):
            return "ABNORMAL"

Labeling functions (LFs) are simple UDFs for expressing domain expertise

Simple Example: Pattern Matching
Report: “Indication: Chest pain. Findings: No focal consolidation or pneumothorax …”

    def LF_pneumo(x):
        if re.search(r"pneumo.*", x.text):
            return "ABNORMAL"

Here the pattern matches even though the finding is negated. LFs can also be noisy: we can estimate their accuracies to handle this (next)
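The two example reports can be run through the pattern-matching LF to see the noise directly. The snippet below is a small sketch: it takes the report text as a plain string and uses an assumed integer encoding (0 for abstain, 2 for ABNORMAL), neither of which comes from the slides.

```python
import re

# Assumed encoding: 0 = abstain, 2 = ABNORMAL.
ABSTAIN, ABNORMAL = 0, 2


def lf_pneumo(text: str) -> int:
    """Vote ABNORMAL whenever the text contains 'pneumo'; otherwise abstain."""
    return ABNORMAL if re.search(r"pneumo.*", text) else ABSTAIN


positive = "Indication: Chest pain. Findings: Focal consolidation and pneumothorax."
negated = "Indication: Chest pain. Findings: No focal consolidation or pneumothorax"

# The LF casts the same vote on both reports, although the second one
# explicitly rules out pneumothorax.
votes = [lf_pneumo(positive), lf_pneumo(negated)]

if __name__ == "__main__":
    print(votes)
```

The pattern has no notion of negation, so the vote on the second report is wrong. This is the kind of error the talk proposes to absorb by estimating each LF's accuracy rather than by perfecting every heuristic.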

A Simple Formalism for Weak Supervision Strategies
• Pattern matching [e.g. Hearst 1992, Zhang 2017]

    def LF_pneumo(x):
        if re.search(r"pneumo.*", x.text):
            return "ABNORMAL"

• Distant supervision [e.g. Mintz 2009]

    def LF_ontology(x):
        if DISEASES & x.words:
            return "ABNORMAL"

• Domain heuristics

    def LF_short_report(x):
        if len(x.words) < 15:
            return "NORMAL"

• Functions of features [e.g. Varma 2017]

    def LF_circular_mass(x):
        c = off_shelf_circle_finder(x)[0]
        if c.radius > 1:
            return "ABNORMAL"

And many others: crowdsourcing, other models, etc.
