SLIDE 1

AAAI DeLBP Workshop 2/3/2018

Snorkel + Data Programming: Beyond Hand-Labeled Training Data

Alex Ratner Stanford University, InfoLab

SLIDE 2

MOTIVATION:

In practice, training data is often:

  • The bottleneck
  • The practical injection point for domain knowledge

SLIDE 3

KEY IDEA:

We can use higher-level, weaker supervision to program ML models

SLIDE 4

Outline

  • The Labeling Bottleneck: The new pain point of ML
  • Data Programming + Snorkel: A framework for weaker, more efficient supervision
  • In practice: Empirical results & user studies
SLIDE 5

My Amazing Collaborators

Chris De Sa (Cornell), Sen Wu, Chris Ré, Henry Ehrenberg (Facebook), Stephen Bach, Jason Fries, Bryan He, Paroma Varma, Braden Hancock

And many more at Stanford & beyond…

SLIDE 6

The ML Pipeline Pre-Deep Learning

[Pipeline diagram: Collection → Labeling → Feature Engineering → Training → predictions (True / False)]

Feature engineering used to be the bottleneck…

SLIDE 7

The ML Pipeline Today

[Pipeline diagram: Collection → Labeling → Representation Learning → Training → predictions (True / False)]

New pain point, new injection point

SLIDE 8

Training Data: Challenges & Opportunities

  • Expensive & slow:
    • Especially when domain expertise is needed
  • Static:
    • Real-world problems change; hand-labeled training data does not
  • An opportunity to inject domain knowledge:
    • Modern ML models are often too complex for hand-tuned structures, priors, etc.

How do we get—and use—training data more effectively?

SLIDE 9

Data Programming + Snorkel

A Framework + System for Creating Training Data with Weak Supervision

NIPS 2016

SIGMOD (Demo) 2017

SLIDE 10

KEY IDEA:

Get users to provide higher-level (but noisier) supervision, then model & de-noise it (using unlabeled data) to train high-quality models

SLIDE 11

Data Programming Pipeline in Snorkel

DOMAIN EXPERT

Input: Labeling Functions, Unlabeled data

    def lf1(x):
        cid = (x.chemical_id, x.disease_id)
        return 1 if cid in KB else 0

    def lf2(x):
        m = re.search(r'.*cause.*', x.between)
        return 1 if m else 0

    def lf3(x):
        m = re.search(r'.*not cause.*', x.between)
        return 1 if m else 0

[Diagram: Generative Model over LF outputs 𝜇1, 𝜇2, 𝜇3 and latent label Z → Output: Probabilistic Training Labels → Noise-Aware Discriminative Model]

1. Users write labeling functions to generate noisy labels
2. We model the labeling functions’ behavior to de-noise them
3. We use the resulting prob. labels to train a model

Ex. Application: Knowledge Base Creation (KBC)
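Concretely, step 1 of the pipeline can be sketched in plain Python. The `Candidate` stand-in, toy KB, and example texts below are hypothetical (real Snorkel candidates carry richer context), but the LFs mirror the ones on the slide:

```python
import re
from types import SimpleNamespace

# Hypothetical toy setup: a small KB of known (chemical, disease) pairs, and
# candidates carrying the text between the two entity mentions (x.between).
KB = {("magnesium", "paralysis")}

def lf1(x):
    cid = (x.chemical_id, x.disease_id)
    return 1 if cid in KB else 0

def lf2(x):
    m = re.search(r".*cause.*", x.between)
    return 1 if m else 0

def lf3(x):
    m = re.search(r".*not cause.*", x.between)
    return 1 if m else 0

candidates = [
    SimpleNamespace(chemical_id="magnesium", disease_id="paralysis",
                    between="was found to cause"),
    SimpleNamespace(chemical_id="aspirin", disease_id="headache",
                    between="does not cause"),
]

# Label matrix Lambda: one row per unlabeled candidate, one column per LF.
Lambda = [[lf(x) for lf in (lf1, lf2, lf3)] for x in candidates]
```

This label matrix (noisy, overlapping, conflicting votes) is exactly what the generative model in steps 2–3 consumes.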

SLIDE 12

Surprising Point: No hand-labeled training data!

SLIDE 13

Step 1: Writing Labeling Functions

A Unifying Framework for Expressing Weak Supervision

DOMAIN EXPERT

    def lf1(x):
        cid = (x.chemical_id, x.disease_id)
        return 1 if cid in KB else 0

    def lf2(x):
        m = re.search(r'.*cause.*', x.between)
        return 1 if m else 0

    def lf3(x):
        m = re.search(r'.*not cause.*', x.between)
        return 1 if m else 0
SLIDE 14

Example: Chemical-Disease Relation Extraction from Text

  • We define candidate entity mentions:
    • Chemicals
    • Diseases
  • Goal: Populate a relational schema with relation mentions

KNOWLEDGE BASE (KB):

  ID | Chemical  | Disease           | Prob.
  00 | magnesium | Myasthenia gravis | 0.84
  01 | magnesium | quadriplegic      | 0.73
  02 | magnesium | paralysis         | 0.96

SLIDE 15

Labeling Functions

  • Traditional “distant supervision” rule relying on an external KB

”Chemical A is found to cause disease B under certain conditions…”

Existing KB contains (A, B) → Label = TRUE

This is likely to be true… but

    def lf1(x):
        cid = (x.chemical_id, x.disease_id)
        return 1 if cid in KB else 0

SLIDE 16

Labeling Functions

  • Traditional “distant supervision” rule relying on an external KB

”Chemical A was found on the floor near a person with disease B…”

Existing KB contains (A, B) → Label = TRUE

…can be false!

    def lf1(x):
        cid = (x.chemical_id, x.disease_id)
        return 1 if cid in KB else 0

We will learn the accuracy of each LF (next)

SLIDE 17

Writing Labeling Functions in Snorkel

  • Labeling functions take in Candidate objects:

CONTEXT HIERARCHY: Document → Sentence → Span → Entity

Candidate(A, B)

  • Three levels of abstraction for writing LFs in Snorkel:
    • Python code
    • LF templates
    • LF generators

    # Python code
    def lf1(x):
        cid = (x.chemical_id, x.disease_id)
        return 1 if cid in KB else 0

    # LF template
    lf1 = LF_DS(KB)

    # LF generator: a knowledge base (KB) with hierarchy
    for lf in LF_DS_hier(KB, cut_level=2):
        yield lf

Key Point: Supervision as code
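The three levels can be mimicked with plain Python closures. `LF_DS` and `LF_DS_hier` follow the slide’s names, but the bodies and the KB-by-level format here are hypothetical sketches, not Snorkel’s actual API:

```python
# LF template: a function that returns a distant-supervision LF bound to a KB.
def LF_DS(kb):
    def lf(x):
        return 1 if (x["chemical_id"], x["disease_id"]) in kb else 0
    return lf

# LF generator: yields one LF per level of a (hypothetical) KB hierarchy,
# up to cut_level, e.g. exact disease pairs vs. rolled-up ancestor classes.
def LF_DS_hier(kb_by_level, cut_level):
    for level in range(cut_level):
        yield LF_DS(kb_by_level[level])

kb_by_level = {
    0: {("magnesium", "paralysis")},      # exact pairs
    1: {("magnesium", "quadriplegic")},   # rolled-up / related pairs
}
lfs = list(LF_DS_hier(kb_by_level, cut_level=2))

x = {"chemical_id": "magnesium", "disease_id": "quadriplegic"}
labels = [lf(x) for lf in lfs]  # only the level-1 LF fires
```

The point is that templates and generators are ordinary code, so a whole family of LFs can be stamped out from one knowledge resource.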

SLIDE 18

Supported by Simple Jupyter Interface

snorkel.stanford.edu

SLIDE 19

Broader Perspective: A Template for Weak Supervision

SLIDE 20

A Unifying Method for Weak Supervision

A labeling function is a mapping 𝜇 : Y ↦ Z ∪ {∅} (emit a label, or abstain), which unifies:

  • Distant supervision
  • Crowdsourcing
  • Weak classifiers
  • Domain heuristics / rules
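One way to read the mapping 𝜇 : Y ↦ Z ∪ {∅}: every source, whatever its origin, is wrapped as a function that either emits a label or abstains (`None` below). A hypothetical sketch, with made-up source names:

```python
# Each weak supervision source becomes mu(y) -> label in {0, 1} or None (abstain).

def heuristic_mu(y):
    # Domain rule: fire only when the pattern appears.
    return 1 if "cause" in y["text"] else None

def crowd_mu(votes):
    # Crowdsourced votes keyed by candidate id; abstain when unlabeled.
    def mu(y):
        return votes.get(y["id"])
    return mu

def classifier_mu(score_fn, hi=0.8, lo=0.2):
    # Weak classifier: label only when confident, abstain in between.
    def mu(y):
        s = score_fn(y)
        return 1 if s >= hi else 0 if s <= lo else None
    return mu

sources = [
    heuristic_mu,
    crowd_mu({"c1": 1}),
    classifier_mu(lambda y: 0.9 if "cause" in y["text"] else 0.1),
]

y1 = {"id": "c1", "text": "A may cause B"}
y2 = {"id": "c2", "text": "A was near B"}
labels1 = [mu(y1) for mu in sources]  # all three sources vote 1
labels2 = [mu(y2) for mu in sources]  # heuristic and crowd abstain
```

Once everything shares this interface, one generative model can weigh them all.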

SLIDE 21

Related Work in Weak Supervision

  • Distant Supervision: Mintz et al. 2009, Alfonseca et al. 2012, Takamatsu et al. 2012, Roth & Klakow 2013, Augenstein et al. 2015, etc.
  • Crowdsourcing: Dawid & Skene 1979, Karger et al. 2011, Dalvi et al. 2013, Ruvolo et al. 2013, Zhang et al. 2014, Berend & Kontorovich 2014, etc.
  • Co-Training: Blum & Mitchell 1998
  • Noisy Learning: Bootkrajang et al. 2012, Mnih & Hinton 2012, Xiao et al. 2015, etc.
  • Indirect Supervision: Clarke et al. 2010, Guu et al. 2017, etc.
  • Feature and Class-distribution Supervision: Zaidan & Eisner 2008, Druck et al. 2009, Liang et al. 2009, Mann & McCallum 2010, etc.
  • Boosting & Ensembling: Schapire & Freund, Platanios et al. 2016, etc.
  • Constraint-Based Supervision: Bilenko et al. 2004, Koestinger et al. 2012, Stewart & Ermon 2017, etc.

Check out our full list @ snorkel.stanford.edu/blog/ws_blog_post.html – we love suggested additions or other feedback!

SLIDE 22

How to handle such a diversity of weak supervision sources?

SLIDE 23

Step 2: Modeling Weak Supervision

[Factor graph: generative model over labeling-function outputs 𝜇1, 𝜇2, 𝜇3 and latent true label Z]
SLIDE 24

Weak Supervision: Core Challenges

  • Unified input format
  • Modeling:
    • Accuracies of sources
    • Correlations between sources
    • Expertise of sources
  • Using the result to train a wide range of models
SLIDE 25

Weak Supervision: Core Challenges

  • Unified input format
  • Modeling:
    • Accuracies of sources
    • Correlations between sources
    • Expertise of sources
  • Using the result to train a wide range of models

NIPS 2016

Intuition: We use agreements / disagreements to learn without ground truth
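This agreement/disagreement intuition can be made concrete with a small simulation. For three conditionally independent LFs over balanced classes in {−1, +1}, pairwise agreement statistics alone identify the accuracies, since E[λiλj] = (2pi − 1)(2pj − 1). This is a method-of-moments illustration of the idea only; the papers learn the full model by MLE:

```python
import random

random.seed(0)
TRUE_ACC = [0.9, 0.8, 0.7]  # hidden LF accuracies we will recover

def noisy_vote(z, p):
    # An LF with accuracy p flips the true label with probability 1 - p.
    return z if random.random() < p else -z

zs = [random.choice([-1, +1]) for _ in range(50000)]  # never observed below
L = [[noisy_vote(z, p) for p in TRUE_ACC] for z in zs]

def second_moment(i, j):
    # Empirical E[l_i * l_j], computable without any ground truth.
    return sum(row[i] * row[j] for row in L) / len(L)

c01, c02, c12 = second_moment(0, 1), second_moment(0, 2), second_moment(1, 2)

# Solve (2p_i - 1) = sqrt(c_ij * c_ik / c_jk) for each LF, using only L.
est_acc = [
    (1 + (c01 * c02 / c12) ** 0.5) / 2,
    (1 + (c01 * c12 / c02) ** 0.5) / 2,
    (1 + (c02 * c12 / c01) ** 0.5) / 2,
]
```

The estimates recover the true accuracies to within a percent or two, despite `zs` never being used, which is exactly the point of the slide.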

SLIDE 26

Basic Generative Labeling Model

[Factor graph: LF outputs Λ_{i,1}, Λ_{i,2}, Λ_{i,3} connected to the latent true label Z_i]

Labeling propensity factor:  g_j^Lab(Λ_i, Z_i) = exp(𝜄_j^Lab Λ_{i,j}²),  with propensity 𝛾_j = q_𝜃(Λ_{i,j} ≠ ∅)

Accuracy factor:  g_j^Acc(Λ_i, Z_i) = exp(𝜄_j^Acc Λ_{i,j} Z_i),  with accuracy 𝛽_j = q_𝜃(Λ_{i,j} = Z_i | Λ_{i,j} ≠ ∅)

Plus correlation factors between LFs (ICML 2017)
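To make the factors concrete, here is a minimal sketch of this model for a single data point, with made-up weights 𝜄_j. Labels live in {−1, 0, +1} with 0 meaning abstain, so Λ² is 1 exactly when the LF did not abstain:

```python
import math

iota_lab = [0.5, 0.5, 0.5]  # labeling-propensity weights (illustrative)
iota_acc = [1.5, 1.0, 0.5]  # accuracy weights (illustrative)

def factor_score(lam, z):
    # Unnormalized log-potential: sum of propensity and accuracy factors.
    return sum(iota_lab[j] * lam[j] ** 2 + iota_acc[j] * lam[j] * z
               for j in range(len(lam)))

def posterior_pos(lam):
    # q(Z = +1 | Lambda = lam), marginalizing the binary latent label.
    p_pos = math.exp(factor_score(lam, +1))
    p_neg = math.exp(factor_score(lam, -1))
    return p_pos / (p_pos + p_neg)

# Two higher-weight LFs voting +1 outweigh a lower-weight LF voting -1:
p = posterior_pos([+1, +1, -1])
# All LFs abstaining gives no evidence either way:
p_uninformative = posterior_pos([0, 0, 0])
```

Learning amounts to fitting the 𝜄 weights by maximum likelihood over the observed label matrix, with Z marginalized out as above.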

SLIDE 27

Intuition: Learning from Disagreements

Learn the generative model 𝜌 = Q(z, Λ) using MLE, without ground truth:

  • LFs have a hidden accuracy parameter
  • Intuition: as with majority vote, estimate each labeling function’s accuracy from its overlaps / conflicts with the others
  • Similar to crowdsourcing, but with different scaling: a small number of LFs, each providing a large number of labels

Produce a set of noisy probabilistic training labels:  𝜈̂(z, 𝜇) = Q_{(z,Λ)∼𝜌}(z | Λ = 𝜇(y))

[Figure: unlabeled objects x1–x5 labeled by LFs λ1–λ3 with learned accuracies P(λj | y) = 0.85, 0.80, 0.65, yielding probabilistic labels P(yi | 𝜇) = 0.95, 0.80, 0.15, 0.85, 0.65]
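A toy EM-style version of this "learning from disagreements" loop (a sketch only; Snorkel's actual implementation performs MLE with SGD / Gibbs sampling): alternately infer soft labels from the current accuracy estimates, then re-estimate each LF's accuracy against those soft labels.

```python
# Five unlabeled items labeled by three LFs with values in {-1, +1}.
L = [[+1, +1, -1],
     [+1, +1, +1],
     [-1, +1, -1],
     [-1, -1, -1],
     [+1, -1, +1]]
acc = [0.6, 0.6, 0.6]  # initialize slightly better than chance

def soft_label(row, acc):
    # P(z = +1 | row), assuming independent LFs and balanced classes.
    p_pos = p_neg = 1.0
    for lam, a in zip(row, acc):
        p_pos *= a if lam == +1 else 1 - a
        p_neg *= a if lam == -1 else 1 - a
    return p_pos / (p_pos + p_neg)

for _ in range(50):  # EM-style alternation
    probs = [soft_label(row, acc) for row in L]
    acc = [min(0.95, max(0.05,  # clamp to avoid degenerate 0/1 accuracies
               sum(p if row[j] == +1 else 1 - p
                   for row, p in zip(L, probs)) / len(L)))
           for j in range(3)]

probs = [soft_label(row, acc) for row in L]
# The first LF agrees with the consensus on every row, so it earns the
# highest accuracy; unanimous rows get confident probabilistic labels.
```

No ground truth appears anywhere: the accuracies come purely from how often each LF sides with the (inferred) consensus.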

SLIDE 28

Step 3: Training a Noise-Aware Model

In a supervised learning setting, we would learn from ground-truth labels:

    ŵ = argmin_w (1/N) Σ_{i=1}^N l(w, y_i, z_i)    e.g. U = {(y1, 0), (y2, 1), (y3, 0), …}

Here, we learn from the noisy probabilistic labels by minimizing the expected loss:

    ŵ = argmin_w (1/N) Σ_{i=1}^N E_{(z,Λ)∼𝜌}[l(w, y_i, z)]    e.g. U = {(y1, 0.1), (y2, 0.6), (y3, 0.3), …}

Only requires a simple tweak to the loss function, and works over many models, including logistic regression, SVMs, and LSTMs.
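The "simple tweak" is just replacing the loss on a hard label with its expectation under the probabilistic label. A minimal sketch for logistic loss:

```python
import math

def logistic_loss(score, z):
    # Standard log loss for a hard label z in {0, 1}, given a model score.
    p = 1.0 / (1.0 + math.exp(-score))
    return -(z * math.log(p) + (1 - z) * math.log(1 - p))

def noise_aware_loss(score, p_label):
    # Expected log loss under a probabilistic label p_label = P(z = 1).
    return (p_label * logistic_loss(score, 1)
            + (1 - p_label) * logistic_loss(score, 0))

# Hard labels are the special case p_label in {0, 1}:
hard = logistic_loss(2.0, 1)
soft = noise_aware_loss(2.0, 1.0)
# A maximally uncertain label (p = 0.5) is minimized at score 0:
at_zero = noise_aware_loss(0.0, 0.5)
```

Because the expectation is linear, any loss written per-example (logistic regression, SVM hinge, an LSTM's cross-entropy) admits the same two-line change.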

SLIDE 29

Theory: Scaling with Unlabeled Data

  • We show that with:
    • O(1) labeling functions of sufficient quality / expressiveness
    • Õ(𝜗⁻²) unlabeled training data points
  • → we get O(𝜗) generalization risk

This is the same asymptotic scaling as in supervised methods!

SLIDE 30

When is modeling the noise worthwhile?

  • Can look at label density:
    • Low: too sparse to beat MV (majority vote)
    • High: MV approaches optimal
    • Medium: just right!
  • Can use a conditional decision rule to safely skip the generative modeling stage
    • E.g. during early LF dev cycles
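For reference, the majority-vote baseline and the label density it is compared against can be computed directly (a sketch, with Λ entries in {−1, 0, +1} and 0 meaning abstain):

```python
def majority_vote(row):
    # Sign of the summed votes; 0 on ties or when every LF abstains.
    s = sum(row)
    return (s > 0) - (s < 0)

def label_density(L):
    # Mean number of non-abstain labels per data point.
    return sum(1 for row in L for v in row if v != 0) / len(L)

L = [[+1, 0, +1],
     [0, 0, -1],
     [+1, -1, 0]]

votes = [majority_vote(row) for row in L]
density = label_density(L)  # 5 non-abstain labels over 3 rows
```

The decision rule above then keys off `density`: at low and high densities majority vote is (near-)optimal, and the generative modeling stage only pays for itself in between.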
SLIDE 31

Putting it All Back Together

DOMAIN EXPERT

Input: Labeling Functions, Unlabeled data

    def lf1(x):
        cid = (x.chemical_id, x.disease_id)
        return 1 if cid in KB else 0

    def lf2(x):
        m = re.search(r'.*cause.*', x.between)
        return 1 if m else 0

    def lf3(x):
        m = re.search(r'.*not cause.*', x.between)
        return 1 if m else 0

[Diagram: Generative Model over LF outputs 𝜇1, 𝜇2, 𝜇3 and latent label Z → Output: Probabilistic Training Labels → Noise-Aware Discriminative Model]

1. Users write labeling functions to generate noisy labels
2. We model the labeling functions’ behavior to de-noise them
3. We use the resulting prob. labels to train a model

SLIDE 32

How well does this work in practice?

Empirical Results

SLIDE 33

Results on Chemical-Disease Relations

Distant Supervision:      Precision: 25.5   Recall: 34.8   F1: 29.4
Generative Model:         Precision: 52.3   Recall: 30.4   F1: 38.5  (+9.1)
Discriminative Model:     Precision: 38.8   Recall: 54.3   F1: 45.3  (+6.8)
Hand Supervision:         Precision: 39.9   Recall: 58.1   F1: 47.3  (+2.0)

SLIDE 34

Snorkel is Powering Real Applications

SLIDE 35

How easy is this to use in practice?

User Study

SLIDE 36

Snorkel User Study

We recently ran a Snorkel biomedical workshop in collaboration with the NIH Mobilize Center; 15 companies and research groups attended. Participants had no machine learning experience and beginner-level Python.

Jason Fries, Stephen Bach, Alex Ratner, Joy Ku, Christopher Ré

How well did these new Snorkel users do?

  • 71%: New Snorkel users matched or beat 7 hours of hand-labeling
  • 3rd place score
  • 2.8x: Faster than hand-labeling data
  • 45.5%: Average improvement in model performance

SLIDE 37

What’s Next: Multi-Task Learning (MTL)?

  • Hierarchical LFs as weakly-supervised MTL
  • And more, see snorkel.stanford.edu
SLIDE 38

Conclusion

  • Snorkel provides a unifying framework for combining and modeling weak supervision
  • Allows us to rapidly generate training data for modern ML models
  • Labeling functions: supervision as code
  • For more, check out snorkel.stanford.edu: code, tutorials, blogs, papers