Combining Distant and Partial Supervision for Relation Extraction
SLIDE 1

Combining Distant and Partial Supervision for Relation Extraction

Gabor Angeli, Julie Tibshirani, Jean Y. Wu, Christopher D. Manning

Stanford University

October 28, 2014

Angeli, Tibshirani, Wu, Manning (Stanford) Combining Distant and Partial Supervision ... October 28, 2014 1 / 19

SLIDE 2

Motivation: Knowledge Base Completion

Unstructured Text → Structured Knowledge Base

SLIDE 3

Motivation: Question Answering

SLIDE 11

Relation Extraction

Input: sentences containing an (entity, slot value) pair. Output: the relation between the entity and the slot value.

Consider two approaches:

Supervised: trivial as a supervised classifier. Training data: {(sentence, relation)}. But... this training data is expensive to produce.

Distantly supervised: artificially produce “supervised” data. Training data: {(entity, relation, slot value)}. But... this training data is much noisier.
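The distantly supervised setup can be sketched in a few lines: any sentence mentioning both the entity and the slot value of a KB triple is (noisily) labeled with that triple's relation. The KB triples and sentences below are toy examples for illustration, not the actual training data:

```python
# Distant supervision sketch: match KB triples against sentences to
# produce noisy (sentence, relation) training pairs.
KB = {
    ("Barack Obama", "United States"): "EmployedBy",
    ("Barack Obama", "Hawaii"): "BornIn",
}

def distant_labels(sentences, kb):
    """Label any sentence containing both the entity and the slot value
    with the KB relation -- noisy, since the sentence may not actually
    express that relation."""
    data = []
    for sent in sentences:
        for (entity, slot_value), relation in kb.items():
            if entity in sent and slot_value in sent:
                data.append((sent, relation))
    return data

sentences = [
    "Barack Obama is the 44th president of the United States",
    "Barack Obama visited Hawaii last week",  # wrongly labeled BornIn: the noise problem
]
pairs = distant_labels(sentences, KB)
```

The second pair illustrates exactly why this data is noisy: the sentence mentions both arguments of a true triple without expressing the relation.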

SLIDE 16

Contribution: Combine Benefits of Both

Adding carefully selected supervision improves distantly supervised relation extraction.

What is “carefully selected”? We propose a new active learning criterion and evaluate a number of questions:

Is the proposed criterion better than other methods?
Where is the supervision helping?
How far can we get with a supervised classifier?

SLIDE 17

Distant Supervision

(Barack Obama, EmployedBy, United States)

SLIDE 18

Multiple-Instance Multiple-Label (MIML) Learning

(Barack Obama, EmployedBy, United States)

SLIDE 19

Distant Supervision

[Figure: plain distant supervision — each sentence x independently predicts a label y. Example: x = “Barack Obama is the 44th and current president of the United States”, y = EmployedBy.]

SLIDE 21

Multiple-Instance

[Figure: multiple-instance model — one entity-pair label y over latent per-mention relations z1, z2, z3, one per sentence x1, x2, x3.]

SLIDE 23

Multiple-Instance Multiple-Label (MIML-RE)

[Figure: MIML-RE — multiple entity-pair labels y1 ... yn over latent per-mention relations z1, z2, z3, one per sentence x1, x2, x3.]

SLIDE 28

Active Learning

Old problem: supervision is expensive, but very useful. Old solution: active learning!

Select a subset of the latent z to annotate, and fix these labels during training.

Bonus: this creates a supervised training set; we initialize from a supervised classifier trained on it.

Some statistics: 1,208,524 latent z that we could annotate; $0.13 per annotation; $160,000 to annotate everything.

New spin: we have to get it right the first time.

SLIDE 35

Example Selection Criteria

1. Train k MIML-RE models on k subsets of the data.

[Figure: a committee of k MIML-RE models.]

2. For each latent z, each trained model c predicts a multinomial P_c(z).

3. Calculate the Jensen-Shannon disagreement (1/k) Σ_{c=1..k} KL(P_c(z) || P_mean(z)), where P_mean is the mean of the k distributions.

4. This gives a measure of disagreement for each z.

Three selection criteria:

Sample uniformly (uniform).
Take the z with the highest disagreement (highJS).
Sample z weighted by disagreement (sampleJS).
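Steps 2-4 can be sketched as follows; the committee distributions below are toy values, whereas the real P_c(z) come from the trained MIML-RE models:

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as aligned lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def js_disagreement(committee):
    """Generalized Jensen-Shannon divergence of k distributions:
    (1/k) * sum_c KL(P_c || P_mean). Near zero when all committee
    members agree; higher when they disagree on this latent z."""
    k = len(committee)
    mean = [sum(col) / k for col in zip(*committee)]
    return sum(kl(p, mean) for p in committee) / k

# Toy committee predictions P_c(z) over (EmployedBy, BornIn, no_relation):
agree = [[0.9, 0.05, 0.05]] * 3
split = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
# split has far higher disagreement than agree, so it is the better
# annotation candidate under the highJS / sampleJS criteria.
```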

SLIDE 36

Example Selection Criteria

[Figure: histogram of JS disagreement over examples — most of the mass sits at low disagreement (mostly easy examples), while the high-disagreement tail contains potentially non-representative examples.]

SLIDE 42

Example Selection Criteria

Committee Member Judgments

Sentence                      Member A     Member B  Member C
Obama was born in Hawaii      born         born      no relation
Obama grew up in Hawaii       born         lived in  born
Obama Bear visits Hawaii      no relation  born      employee of
President Obama ...           title        title     title
Obama employed president ...  employee of  title     employee of

Uniform: often annotates easy sentences.
High JS (disagreement): more likely to annotate “rare” sentences.
Sample JS (disagreement): a mix of hard and representative sentences.
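The three criteria can be sketched as below, assuming per-example disagreement scores like those produced by the committee (the scores here are toy values):

```python
import random

def select(scores, n, criterion, seed=0):
    """Pick n example indices to annotate, given per-example JS disagreement.
    uniform:  ignore scores, sample uniformly without replacement.
    highJS:   take the top-n by disagreement.
    sampleJS: sample without replacement, weighted by disagreement."""
    rng = random.Random(seed)
    idx = list(range(len(scores)))
    if criterion == "uniform":
        return rng.sample(idx, n)
    if criterion == "highJS":
        return sorted(idx, key=lambda i: scores[i], reverse=True)[:n]
    if criterion == "sampleJS":
        chosen, pool, weights = [], idx[:], list(scores)
        for _ in range(n):
            i = rng.choices(range(len(pool)), weights=weights)[0]
            chosen.append(pool.pop(i))
            weights.pop(i)
        return chosen
    raise ValueError(criterion)

scores = [0.01, 0.9, 0.02, 0.7, 0.05]  # toy JS disagreement per latent z
```

sampleJS trades off the two failure modes above: it favors hard examples like highJS, but the randomness keeps the batch representative like uniform.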

SLIDE 45

Experiments

Recall our questions:

Is the proposed criterion better than other methods?
Where is the supervision helping?
How far can we get with a supervised classifier?

Two experimental setups:

The slot filling evaluation of Surdeanu et al. (2012).
Stanford’s 2013 TAC-KBP slot filling system.

Bonus: a 4.4 F1 improvement in the 2014 TAC-KBP competition.

SLIDE 48

Old News: MIML-RE Works Well

Slot filling evaluation of Surdeanu et al. (2012).

[Figure: precision-recall curves comparing MIML-RE (our baseline), Surdeanu et al. (2012), and plain distant supervision; MIML-RE performs best.]

SLIDE 52

Active learning is important; SampleJS performs well.

Slot filling evaluation of Surdeanu et al. (2012).

[Figure: precision-recall curves for Sample JS, High JS, Uniform, and the MIML-RE baseline; the active learning criteria improve over the baseline, with Sample JS performing best.]

SLIDE 61

SampleJS performs best on the TAC-KBP challenge.

TAC-KBP 2013 Slot Filling Challenge: an end-to-end task, including IR + consistency.
Precision: facts LDC evaluators judged as correct.
Recall: facts other teams (including LDC annotators) also found.

System                    P     R     F1
No active learning        38.0  30.5  33.8
Sample uniformly          34.4  35.0  34.7
Highest JS disagreement   46.2  30.8  37.0
Sample JS disagreement    39.4  36.2  37.7

2014 TAC-KBP Slot Filling Challenge: 27.6 → 32.0 F1.

SLIDE 66

Good initialization is more important than constraining EM.

Is it the initialization or the fixing of the latent z’s during EM that helps? What if we initialize with distant supervision instead?

[Figure: precision-recall curves for Sample JS, High JS, Uniform, and the MIML-RE baseline.]

Hypothesis: Supervision not only smooths the objective but provides better initialization for the non-convex objective.

SLIDE 70

A supervised classifier performs surprisingly well.

TAC-KBP 2013 Slot Filling Challenge: an end-to-end task, including IR + consistency.
Precision: facts LDC evaluators judged as correct.
Recall: facts other teams (including LDC annotators) also found.

System                      P     R     F1
MIML-RE (baseline)          38.0  30.5  33.8
Supervised from SampleJS    33.5  35.0  34.2
MIML-RE, supervised init.   35.1  35.6  35.5
MIML-RE, SampleJS           39.4  36.2  37.7

A bit circular: we need MIML-RE to get the supervised examples.

SLIDE 74

A Case for Supervised Classifiers

Stanford’s KBP system (artist’s rendition) vs. a supervised classifier (150 lines + a featurizer).

Annotating examples: $1,330. For comparison, a flight to Qatar: $1,027; an Apple 27” screen: $999.
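That supervised classifier is essentially an off-the-shelf linear model plus a featurizer. A minimal featurizer sketch follows; the feature templates are illustrative assumptions, not Stanford’s actual feature set:

```python
def featurize(tokens, entity_span, slot_span):
    """Extract simple relation-extraction features for one sentence.
    Spans are (start, end) token indices; features are strings fed to
    a sparse linear classifier such as logistic regression."""
    e0, e1 = entity_span
    s0, s1 = slot_span
    between = tokens[min(e1, s1):max(e0, s0)]  # tokens between the two mentions
    feats = ["between=" + "_".join(between)]
    feats += ["between_word=" + w for w in between]
    feats.append("entity_first" if e0 < s0 else "slot_first")
    feats.append("distance=" + str(max(e0, s0) - min(e1, s1)))
    return feats

toks = "Barack Obama was born in Hawaii".split()
fs = featurize(toks, (0, 2), (5, 6))
```

With hashed or dictionary-vectorized features like these, the whole classifier fits comfortably in the “150 lines” the slide claims.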

SLIDE 77

Conclusions

Things you can use:

A new active learning criterion: sampling by disagreement among a committee of classifiers.
A corpus of supervised examples for TAC-KBP relations.
A 4.4 F1 improvement on 2014 KBP Slot Filling.

Things we’ve learned:

Example selection is very important for performance.
MIML-RE is sensitive to initialization.
Supervised classifiers can perform similarly to distantly supervised methods.

Thank you!

SLIDE 78

Comparison to Pershina et al. (ACL 2014)

Slot filling evaluation of Surdeanu et al. (2012).

[Figure: precision-recall curves comparing Sample JS, Pershina et al. (2014), and MIML-RE.]
