SLIDE 1

Learning the Structure of Generative Models
 without Labeled Data

Stephen Bach Bryan He Alex Ratner Chris Ré Stanford University

SLIDE 2

This Talk

  • We study structure learning for generative models in which a latent variable generates weak signals
  • The challenge is distinguishing dependencies that hold directly between the weak signals from those induced by the latent class

SLIDE 3

This Talk

  • We propose an l1-regularized pseudolikelihood approach
  • We develop a new analysis technique, since previous analyses of related approaches apply only to the fully supervised case

SLIDE 4

Roadmap

  • Motivation: Denoising Weak Supervision with Generative Models
  • Our Work: Learn their Structure without Ground Truth
  • Results
    • Provable Recovery
    • Consistent Performance Improvements on Existing Systems
SLIDE 5

Motivation: Denoising Weak Supervision with Generative Models

SLIDE 6

Training Data Creation: $$$, Slow, Static

  • Expensive & Slow:
    • Especially when domain expertise is needed
  • With deep learning replacing feature engineering, collecting training data is now often the biggest ML bottleneck

[Figure: "Grad Student Labeler"]

SLIDE 7

Snorkel

  • Open-source system to build ML models with weak supervision
  • Users write labeling functions, model their accuracies and correlations, and train models

snorkel.stanford.edu

SLIDE 8

Example: Chemical-Disease Relations

  • We have entity mentions:
    • Chemicals
    • Diseases
  • Goal: Populate table with relation mentions

ID   Chemical    Disease             Prob.
00   magnesium   Myasthenia gravis   0.84
01   magnesium   quadriplegic        0.73
02   magnesium   paralysis           0.96

SLIDE 9

How can we train without hand-labeling examples?

SLIDE 10

Weak Supervision

Noisy, less expensive labels

Example types:

  • Domain heuristics
  • Distant supervision
  • Crowdsourcing
  • Weak classifiers
SLIDE 11

Generative Models for Weak Supervision

  • Crowdsourcing [Dawid and Skene, 1979; Dalvi et al., WWW 2013]
  • Hierarchical topic models for relation extraction [Alfonseca et al., ACL 2012; Roth and Klakow, EMNLP 2013]
  • Generative models for denoising distant supervision [Takamatsu et al., ACL 2012]
  • Generative models for arbitrary labeling functions [Ratner et al., NIPS 2016]

SLIDE 12

Labeling Functions – Domain Heuristics

“In our study, administering Chemical A caused Disease B under certain conditions…”

import re

def LF_1(x):
    # Domain heuristic: fire when a causal keyword appears in the sentence
    m = re.match('.*caused.*', x.sentence)
    return True if m else None

SLIDE 13

Labeling Functions – Distant Supervision

“In our study, administering Chemical A caused Disease B under certain conditions…”

def LF_2(x):
    # Distant supervision: fire when the pair appears in the knowledge base
    in_kb = (x.chemical, x.disease) in ctd
    return True if in_kb else None

Comparative Toxicogenomics Database: http://ctdbase.org

SLIDE 14

Weak Supervision Pipeline in Snorkel

DOMAIN EXPERT
Input: Labeling Functions → Output: Trained Model

def lf1(x):
    # Distant supervision: is the pair in the knowledge base?
    cid = (x.chemical_id, x.disease_id)
    return 1 if cid in KB else 0

def lf2(x):
    # Domain heuristic: a causal phrase between the two mentions
    m = re.search(r'.*cause.*', x.between)
    return 1 if m else 0

def lf3(x):
    # Negation heuristic: "not cause" between the two mentions
    m = re.search(r'.*not cause.*', x.between)
    return 1 if m else 0

Users write functions to label training data

[Figure: generative model over labeling-function outputs L1, L2, L3 and latent label y]

Generative Model

We model the labeling functions' behavior to denoise their outputs

Noise-Aware Discriminative Model

[Figure: discriminative model over features x1, x2 with hidden units h1, h2, h3 predicting y]

We use estimated labels to train a model
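A minimal sketch of the glue this pipeline implies (the names lfs and candidates and the list-of-lists label matrix are illustrative, not Snorkel's actual API):

# Illustrative only: apply every labeling function to every candidate
# to build the label matrix that the generative model is fit on.
lfs = [lf1, lf2, lf3]
L = [[lf(x) for lf in lfs] for x in candidates]  # candidates: extracted mention pairs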

SLIDE 15

Denoising Weak Supervision

  • The latent true label generates the labeling functions' outputs
  • Factors model the labeling functions' accuracies
  • We maximize the marginal likelihood of the noisy labels
    • Intuitively, this compares their agreements and disagreements

[Figure: factor graph with latent True Label connected to LF1, LF2, LF3 through accuracy (Acc) factors]
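As a concrete sketch, this objective for the simplest accuracy-only version of the model (assuming labeling-function outputs in {-1, 0, +1}, a single binary latent label, and a partition function that factorizes only for this simplified model):

import numpy as np

def log_marginal_likelihood(theta, L):
    # L: (m, n) array of labeling-function outputs in {-1, 0, +1}
    # theta: (n,) accuracy weights in p(y, lam) ∝ exp(sum_j theta_j * y * lam_j)
    scores = L @ theta                          # score of y = +1; y = -1 gives -scores
    log_unnorm = np.logaddexp(scores, -scores)  # log of sum over y of exp(y * score)
    # For this accuracy-only model the partition function factorizes:
    # Z = 2 * prod_j (1 + exp(theta_j) + exp(-theta_j)) = 2 * prod_j (1 + 2*cosh(theta_j))
    log_Z = np.log(2) + np.sum(np.log(1 + 2 * np.cosh(theta)))
    return np.sum(log_unnorm - log_Z)           # maximized over theta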

SLIDE 16

Dependent Labeling Functions

  • Correlated heuristics (see the sketch after this list)
    • E.g., looking for keywords in different sized windows of text
  • Correlated inputs
    • E.g., looking for keywords in raw tokens or lemmas
  • Correlated knowledge sources
    • E.g., distant supervision from overlapping knowledge bases
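For instance, two correlated heuristics might look like this (a sketch; x.sentence and x.between are the hypothetical attributes from the earlier snippets, and the second function fires on a subset of the first's matches, so their outputs are correlated even given the true label):

import re

def lf_cause_sentence(x):
    # Keyword search over the whole sentence
    return 1 if re.search(r'caus', x.sentence) else 0

def lf_cause_between(x):
    # Same keyword over the narrower window between the two mentions;
    # every match here also matches above, hence the correlation
    return 1 if re.search(r'caus', x.between) else 0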
SLIDE 17

Structure Learning

SLIDE 18

Structure Learning

[Figure: factor graph with latent True Label, LF1, LF2, LF3, accuracy (Acc) factors, and candidate correlation (Cor?) factors; legend: latent variable, target variable, conditioning variable, dependency, possible dependency]

SLIDE 19

Structure Learning for Factor Graphs

Challenges

  • Gradient requires approximation
  • Possible dependencies grow quadratically or worse: n labeling functions already admit n(n - 1)/2 candidate pairwise correlations

Prior Work

  • Ravikumar et al. (Ann. of Stats., 2010) proposed using l1-regularized pseudolikelihood for supervised Ising models

SLIDE 20

Structure Learning for Generative Models

  • We maximize the l1-regularized marginal pseudolikelihood
  • With one target variable and one latent variable, the gradient can be computed exactly and efficiently
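One way to write the per-function objective this describes (a sketch with assumed notation, not verbatim from the talk: λ_j is labeling function j's output, λ_{\j} the remaining outputs, y the latent label, ε the regularization strength):

$$\hat{\theta}_j \in \arg\min_{\theta} \; -\sum_{i=1}^{m} \log \sum_{y} p_\theta\!\left(\lambda_j^{(i)}, y \,\middle|\, \lambda_{\setminus j}^{(i)}\right) \; + \; \epsilon \,\|\theta\|_1$$

Dependencies whose learned weights are nonzero are added to the structure. Because the inner sum ranges only over the two values of y and the three values of λ_j, the gradient is exact and cheap to compute.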

SLIDE 21

Structure Learning for Generative Models

[Figure: the same factor graph, with all pairwise correlation (Cor?) factors still candidates]

SLIDE 22

Structure Learning for Generative Models

[Figure: the same factor graph, with two correlation (Cor) dependencies selected]

SLIDE 23

Structure Learning for Generative Models

[Figure: one pseudolikelihood step, with LF1 as the target variable and LF2, LF3 as conditioning variables]

SLIDE 24

Structure Learning for Generative Models

[Figure: the next pseudolikelihood step, with LF2 as the target variable and LF1, LF3 as conditioning variables]

SLIDE 25

Structure Learning for Generative Models

  • Without ground truth, the problem becomes harder
  • The latent variable makes the marginal likelihood nonconvex
SLIDE 26

Analysis

SLIDE 27

Analysis

  • Strategy
    • Focus on the case in which most labeling functions are non-adversarial
    • Show that the true model is contained in a region in which the objective is locally strongly convex
  • Assumptions
    • A feasible set of parameters that contains the true model
    • Over the feasible set, conditioning on a labeling function provides more information than marginalizing it out

SLIDE 28

Theorem: Guaranteed Recovery

For pairwise dependencies, such as correlations, m ≥ Ω(n log(n/δ)) samples are sufficient to recover the true dependency structure over n labeling functions with probability at least 1 - δ.
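For scale (a back-of-the-envelope reading that ignores the constant hidden in the Ω): with n = 100 labeling functions and δ = 0.01, the bound is on the order of 100 · ln(100/0.01) ≈ 921 samples.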

SLIDE 29

Empirical Results

SLIDE 30

Empirical Sample Complexity

  • Better in practice
  • Same as observed in the supervised setting

SLIDE 31

Speed Up: 100x

SLIDE 32

Improvement to End Models

Application        Ind. F1   Struct. F1   F1 Diff   # LF   # Dep.
Disease Tagging    66.3      68.9         +2.6      233    315
Chemical-Disease   54.6      55.9         +1.3       33     21
Device-Polarity    88.1      88.7         +0.6       12     32

SLIDE 33

Conclusion

  • Generative models can help us get around the training data bottleneck, but we need to learn their structure
  • Maximum pseudolikelihood gives:
    • provable recovery
    • 100x speedup
    • end-model improvement

Thank you! snorkel.stanford.edu