
Snorkel + Data Programming: Beyond Hand-Labeled Training Data



  1. AAAI DeLBP Workshop, 2/3/2018. Snorkel + Data Programming: Beyond Hand-Labeled Training Data. Alex Ratner, Stanford University, InfoLab.

  2. MOTIVATION: In practice, training data is often: • The bottleneck • The practical injection point for domain knowledge

  3. KEY IDEA: We can use higher-level, weaker supervision to program ML models

  4. Outline • The Labeling Bottleneck: The new pain point of ML • Data Programming + Snorkel: A framework for weaker, more efficient supervision • In practice: Empirical results & user studies

  5. My Amazing Collaborators: Jason Fries, Henry Ehrenberg (Facebook), Chris De Sa (Cornell), Stephen Bach (on the market!), Bryan He, Paroma Varma, Sen Wu, Braden Hancock, Chris Ré, and many more at Stanford & beyond…

  6. The ML Pipeline, Pre-Deep Learning: Collection → Labeling → Feature Engineering → Training. Feature engineering used to be the bottleneck…

  7. The ML Pipeline Today: Collection → Labeling → Representation Learning → Training. New pain point, new injection point.

  8. Training Data: Challenges & Opportunities • Expensive & slow: especially when domain expertise is needed • Static: real-world problems change; hand-labeled training data does not • An opportunity to inject domain knowledge: modern ML models are often too complex for hand-tuned structures, priors, etc. How do we get, and use, training data more effectively?

  9. Data Programming + Snorkel: A Framework + System for Creating Training Data with Weak Supervision [NIPS 2016; SIGMOD (Demo) 2017]

  10. KEY IDEA: Get users to provide higher-level (but noisier) supervision, then model & de-noise it (using unlabeled data) to train high-quality models

  11. Data Programming Pipeline in Snorkel. Example application: Knowledge Base Creation (KBC). Input: labeling functions written by a domain expert, plus unlabeled data. A generative, noise-aware model de-noises the labeling functions' outputs, and the resulting probabilistic training labels are used to train a discriminative model. The domain expert's labeling functions:

    import re

    def lf1(x):
        cid = (x.chemical_id, x.disease_id)
        return 1 if cid in KB else 0

    def lf2(x):
        m = re.search(r'.*cause.*', x.between)
        return 1 if m else 0

    def lf3(x):
        m = re.search(r'.*not cause.*', x.between)
        return -1 if m else 0

1) Users write labeling functions to generate noisy labels. 2) We model the labeling functions' behavior to de-noise them. 3) We use the resulting probabilistic labels to train a model.
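Below is a minimal sketch (not the Snorkel API) of step 1: applying the three LFs above to unlabeled candidates to build the noisy label matrix. The Candidate tuple, the toy KB, and label_matrix are hypothetical stand-ins for illustration, and lf1–lf3 are assumed from the slide above.

    from collections import namedtuple

    # Hypothetical stand-in for Snorkel's candidate objects: a chemical-
    # disease mention pair plus the text between the two mentions.
    Candidate = namedtuple("Candidate", ["chemical_id", "disease_id", "between"])

    KB = {("magnesium", "paralysis")}  # toy knowledge base

    candidates = [
        Candidate("magnesium", "paralysis", "was found to cause"),
        Candidate("magnesium", "quadriplegic", "was seen near a person with"),
    ]

    def label_matrix(lfs, cands):
        # One row per candidate, one column per LF; 0 means "abstain".
        return [[lf(x) for lf in lfs] for x in cands]

    L = label_matrix([lf1, lf2, lf3], candidates)
    # [[1, 1, 0], [0, 0, 0]]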

  12. Surprising Point: No hand-labeled training data!

  13. Step 1: Writing Labeling Functions, a Unifying Framework for Expressing Weak Supervision. [Recap of the pipeline diagram, highlighting the labeling functions lf1–lf3 above.]

  14. Example: Chemical-Disease Relation Extraction from Text • We define candidate entity mentions: chemicals and diseases • Goal: populate a relational schema with relation mentions

    KNOWLEDGE BASE (KB)
    ID | Chemical  | Disease           | Prob.
    00 | magnesium | Myasthenia gravis | 0.84
    01 | magnesium | quadriplegic      | 0.73
    02 | magnesium | paralysis         | 0.96
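As a concrete illustration of one row of the populated schema, here is a hedged sketch; RelationMention is a hypothetical name for illustration, not part of Snorkel.

    from dataclasses import dataclass

    # One row of the populated relational schema: a chemical-disease
    # relation mention together with its probabilistic label.
    @dataclass
    class RelationMention:
        id: int
        chemical: str
        disease: str
        prob: float

    row = RelationMention(0, "magnesium", "Myasthenia gravis", 0.84)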

  15. Labeling Functions • A traditional "distant supervision" rule relying on an external KB:

    def lf1(x):
        cid = (x.chemical_id, x.disease_id)
        return 1 if cid in KB else 0

"Chemical A is found to cause disease B under certain conditions…" → Label = TRUE (the existing KB contains the pair (A, B)). This is likely to be true… but

  16. Labeling Functions • The same distant-supervision rule: "Chemical A was found on the floor near a person with disease B…" → Label = TRUE (the existing KB contains the pair (A, B)) …can be false! We will learn the accuracy of each LF (next); see the sketch below.
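To make the failure mode concrete, here is a small sketch reusing the toy Candidate and KB from the earlier sketch (both hypothetical): lf1 labels both sentences TRUE, since it only checks KB membership and never looks at the sentence context.

    # lf1 fires on both the causal and the coincidental sentence,
    # because the pair (A, B) is in the KB either way.
    KB.add(("A", "B"))
    true_case = Candidate("A", "B", "is found to cause")
    false_case = Candidate("A", "B", "was found on the floor near a person with")
    assert lf1(true_case) == lf1(false_case) == 1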

  17. Writing Labeling Functions in Snorkel • Labeling functions take in Candidate objects, e.g. Candidate(A, B), defined over a context hierarchy: Document → Sentence → Span / Entity • Three levels of abstraction for writing LFs in Snorkel (sketched further below):

    # 1. Python code
    def lf1(x):
        cid = (x.chemical_id, x.disease_id)
        return 1 if cid in KB else 0

    # 2. LF templates
    lf1 = LF_DS(KB)

    # 3. LF generators, e.g. over a knowledge base (KB) with hierarchy
    for lf in LF_DS_hier(KB, cut_level=2):
        yield lf

Key Point: Supervision as code.
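The slide's LF_DS and LF_DS_hier are shorthand; below is a hedged sketch of what such a template and generator might look like. The subtrees/pairs interface on the hierarchical KB is invented for illustration.

    # Template: build a distant-supervision LF from a knowledge base.
    def LF_DS(kb):
        def lf(x):
            return 1 if (x.chemical_id, x.disease_id) in kb else 0
        return lf

    # Generator: yield one distant-supervision LF per subtree of a
    # hierarchical KB, cut at the given level.
    def LF_DS_hier(kb_hier, cut_level):
        for subtree in kb_hier.subtrees(cut_level):  # hypothetical interface
            yield LF_DS(subtree.pairs())             # hypothetical interface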

  18. Supported by Simple Jupyter Interface: snorkel.stanford.edu

  19. Broader Perspective: A Template for Weak Supervision

  20. A Unifying Method for Weak Supervision • Distant supervision • Crowdsourcing • Weak classifiers • Domain heuristics / rules All of these can be expressed as labeling functions λ : X ↦ Y ∪ {∅} (see the sketch below).
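As a sketch of this unification, each source type below is wrapped in the same signature, returning a label or 0 in place of ∅ (abstain). KB is the toy knowledge base from earlier; crowd_votes and weak_model are hypothetical stand-ins.

    # Distant supervision: consult an external knowledge base.
    def lf_distant(x):
        return 1 if (x.chemical_id, x.disease_id) in KB else 0

    # Crowdsourcing: majority vote of crowd workers, if any labeled x.
    def lf_crowd(x):
        votes = crowd_votes.get(x.id, [])
        return max(set(votes), key=votes.count) if votes else 0

    # Weak classifier: only fire when the model is confident.
    def lf_weak_model(x):
        p = weak_model.predict_proba(x)
        return 1 if p > 0.9 else (-1 if p < 0.1 else 0)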

  21. Related Work in Weak Supervision • Distant Supervision: Mintz et al. 2009, Alfonseca et al. 2012, Takamatsu et al. 2012, Roth & Klakow 2013, Augenstein et al. 2015, etc. • Crowdsourcing: Dawid & Skene 1979, Karger et al. 2011, Dalvi et al. 2013, Ruvolo et al. 2013, Zhang et al. 2014, Berend & Kontorovich 2014, etc. • Co-Training: Blum & Mitchell 1998 • Noisy Learning: Bootkrajang et al. 2012, Mnih & Hinton 2012, Xiao et al. 2015, etc. • Indirect Supervision: Clarke et al. 2010, Guu et al. 2017, etc. • Feature and Class-distribution Supervision: Zaidan & Eisner 2008, Druck et al. 2009, Liang et al. 2009, Mann & McCallum 2010, etc. • Boosting & Ensembling: Schapire & Freund, Platanios et al. 2016, etc. • Constraint-Based Supervision: Bilenko et al. 2004, Koestinger et al. 2012, Stewart & Ermon 2017, etc. Check out our full list @ snorkel.stanford.edu/blog/ws_blog_post.html – we love suggested additions or other feedback!

  22. How to handle such a diversity of weak supervision sources?

  23. Step 2: Modeling Weak Supervision. [Recap of the pipeline diagram: the outputs of labeling functions λ₁, λ₂, λ₃ feed a generative model over the latent true label Y.]

  24. Weak Supervision: Core Challenges • Unified input format • Modeling: accuracies of sources, correlations between sources, expertise of sources • Using the results to train a wide range of models

  25. Weak Supervision: Core Challenges (cont.) • Modeling the accuracies of and correlations between sources [NIPS 2016] • Intuition: we use agreements / disagreements to learn without ground truth

  26. Basic Generative Labeling Model. With label matrix Λ (Λᵢⱼ is LF j's label on example i, or ∅ if it abstains) and latent true labels Y, the model has factor types:

Labeling propensity: $\gamma_j = p_\theta(\Lambda_{i,j} \neq \emptyset)$, with factor $\phi^{\mathrm{Lab}}_{i,j}(\Lambda, Y) = \exp(\theta^{\mathrm{Lab}}_j \, \mathbf{1}\{\Lambda_{i,j} \neq \emptyset\})$

Accuracy: $\beta_j = p_\theta(\Lambda_{i,j} = Y_i \mid \Lambda_{i,j} \neq \emptyset)$, with factor $\phi^{\mathrm{Acc}}_{i,j}(\Lambda, Y) = \exp(\theta^{\mathrm{Acc}}_j \, \mathbf{1}\{\Lambda_{i,j} = Y_i\})$

Correlations between labeling functions are modeled as well [ICML 2017].
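A toy sketch of these factor types as indicator functions (illustrative only; the exponential weighting and normalization of the full model are omitted, and 0 encodes ∅ in the label matrix):

    import numpy as np

    def phi_lab(L, y, i, j):
        # Labeling propensity: did LF j label example i at all?
        return 1.0 if L[i, j] != 0 else 0.0

    def phi_acc(L, y, i, j):
        # Accuracy: does LF j's label agree with the (latent) label y_i?
        return 1.0 if L[i, j] == y[i] else 0.0

    def phi_corr(L, y, i, j, k):
        # Correlation: do LFs j and k cast the same non-abstain vote?
        return 1.0 if L[i, j] == L[i, k] != 0 else 0.0

    L = np.array([[1, 1, 0],
                  [0, -1, 1]])
    y = [1, -1]
    phi_acc(L, y, 0, 0)  # -> 1.0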

  27. Intuition: Learning from Disagreements • Learn the model $p_\theta(Y, \Lambda)$ using MLE, with $Y$ latent • Each LF has a hidden accuracy parameter • Intuition: as in majority vote, estimate labeling function accuracy based on overlaps / conflicts among the LFs • Similar to crowdsourcing, but with different scaling: a small number of LFs, each producing a large number of labels • Output: a set of noisy (probabilistic) training labels $\tilde{p}(y \mid x) = p_{\hat{\theta}}(Y = y \mid \Lambda = \lambda(x))$ [Diagram: bipartite graph linking LFs λ₁–λ₃ to unlabeled objects x₁–x₅, annotated with estimated probabilities P(λⱼ | yᵢ) and P(yᵢ | λ).]
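The following toy sketch captures only the intuition (the real model fits the accuracy parameters jointly by maximum likelihood, without ground truth): score each LF against the majority vote of the others, using overlaps and conflicts alone.

    import numpy as np

    def estimate_accuracies(L):
        """L: (n_examples, n_lfs) array with values in {-1, 0, 1}; 0 = abstain.
        Returns a rough per-LF accuracy estimate from agreements alone."""
        n, m = L.shape
        acc = np.full(m, 0.5)
        for j in range(m):
            others = np.delete(L, j, axis=1)
            mv = np.sign(others.sum(axis=1))   # majority vote of the rest
            mask = (L[:, j] != 0) & (mv != 0)  # where both cast a vote
            if mask.any():
                acc[j] = float((L[mask, j] == mv[mask]).mean())
        return acc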
