SLIDE 1

Effective Self-Training for Parsing

David McClosky

dmcc@cs.brown.edu

Brown Laboratory for Linguistic Information Processing (BLLIP)

Joint work with Eugene Charniak and Mark Johnson

David McClosky - dmcc@cs.brown.edu - NAACL 2006 - 6.5.2006 - 1

SLIDE 2

Parsing

SLIDE 3

Parsing

“I need a sentence with ambiguity.”

SLIDE 4

Parsing

(S (NP (PRP I)) (VP (VBP need) (NP (NP (DT a) (NN sentence)) (PP (IN with) (NP (NN ambiguity))))) (. .))

“I need a sentence with ambiguity.”

SLIDE 5

Parsing

s is a sentence; π is a parse tree.

parse(s) = argmax_π p(π | s) such that yield(π) = s
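The argmax above can be sketched in a few lines of Python. The `Parse` record and the candidate list are hypothetical stand-ins, not the actual parser's data structures: given candidate trees with probabilities, return the most probable one whose yield matches the input sentence.

```python
from dataclasses import dataclass

@dataclass
class Parse:
    label: str    # name for the tree, e.g. "pi1"
    words: tuple  # yield(pi): the terminal words of the tree
    prob: float   # p(pi | s)

def parse(sentence, candidates):
    """argmax over pi of p(pi | s), subject to yield(pi) = s."""
    valid = [c for c in candidates if c.words == tuple(sentence)]
    return max(valid, key=lambda c: c.prob)

s = "I need a sentence with ambiguity .".split()
candidates = [Parse("pi1", tuple(s), 7.25e-20),
              Parse("pi2", tuple(s), 7.05e-21)]
best = parse(s, candidates)  # pi1: the higher-probability parse wins
```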

SLIDE 6

Flow Chart

SLIDE 7

Flow Chart

SLIDE 8

n-best parsing

π1 (PP attached to the NP): (S (NP (PRP I)) (VP (VBP need) (NP (NP (DT a) (NN sentence)) (PP (IN with) (NP (NN ambiguity))))) (. .))   p(π1) = 7.25 × 10⁻²⁰

π2 (PP attached to the VP): (S (NP (PRP I)) (VP (VBP need) (NP (DT a) (NN sentence)) (PP (IN with) (NP (NN ambiguity)))) (. .))   p(π2) = 7.05 × 10⁻²¹

SLIDE 9

Reranking Parsers

Best parses are not always first, but the correct parse is often in the top 50.

Rerankers rescore parses from the n-best parser using more complex (not necessarily context-free) features.

Oracle rerankers on the Charniak parser’s 50-best list can achieve over 95% f-score.
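The oracle figure can be made concrete with a small sketch. Parses are represented here as sets of labeled brackets; this is an illustration, not the evalb implementation. An oracle reranker simply picks, from each n-best list, the candidate with the highest f-score against the gold tree.

```python
def fscore(candidate, gold):
    """Bracket f-score of a candidate parse against the gold parse."""
    if not candidate or not gold:
        return 0.0
    matched = len(candidate & gold)
    precision = matched / len(candidate)
    recall = matched / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def oracle_pick(nbest, gold):
    """Choose the n-best candidate an oracle reranker would return."""
    return max(nbest, key=lambda cand: fscore(cand, gold))

gold = {("NP", 0, 1), ("VP", 1, 6), ("NP", 2, 6)}
nbest = [{("NP", 0, 1), ("VP", 1, 6)},                 # ranked first by the parser
         {("NP", 0, 1), ("VP", 1, 6), ("NP", 2, 6)}]   # exact match, ranked second
best = oracle_pick(nbest, gold)  # the oracle prefers the second candidate
```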

SLIDE 10

Flow Chart

SLIDE 11

Our reranking parser

Parser and reranker as described in Charniak and Johnson (ACL 2005), with new features.

Lexicalized context-free generative parser; maximum-entropy discriminative reranker.

New reranking features improve the reranking parser’s performance by 0.3% on section 23 over the ACL 2005 model.

SLIDE 12

Unlabeled data

Question: Can we improve the reranking parser with cheap unlabeled data?

SLIDE 13

Unlabeled data

Question: Can we improve the reranking parser with cheap unlabeled data?

  • Self-training
  • Co-training
  • Clustering n-grams, using the clusters as general classes of n-grams
  • Improving vocabulary, n-gram language models, etc.

SLIDE 14

Self-training

1. Train a model from labeled data (train the reranking parser on WSJ)

2. Use the model to annotate unlabeled data (use the model to parse NANC)

3. Combine the annotated data with the labeled training data (merge WSJ training data with parsed NANC data)

4. Train a new model from the combined data (train the reranking parser on WSJ+NANC data)

5. Optional: repeat with the new model on more unlabeled data
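The steps above can be sketched as a small loop. The `train` and `annotate` callables below are hypothetical stand-ins for the reranking parser, not the actual system:

```python
def self_train(labeled, unlabeled, train, annotate, rounds=1):
    """Self-training: retrain on gold data plus the model's own parses."""
    model = train(labeled)                                # 1. train on WSJ
    for _ in range(rounds):                               # 5. optionally repeat
        pseudo = [annotate(model, s) for s in unlabeled]  # 2. parse NANC
        model = train(labeled + pseudo)                   # 3+4. merge and retrain
    return model

# Toy stand-ins: the "model" is just the list of trees it was trained on.
train = lambda data: list(data)
annotate = lambda model, sent: (sent, "auto-parse")
model = self_train(["gold tree"], ["nanc sent 1", "nanc sent 2"], train, annotate)
```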

SLIDE 15

Flow Chart

SLIDE 16

Previous work

Parsing: Charniak (1997), confirmed by Steedman et al. (2003)

insignificant improvement

Part of speech tagging: Clark et al. (2003)

minor improvement/damage depending on amount of training data

Parser adaptation: Bacchiani et al. (2006)

helps when parsing WSJ after training on the Brown corpus and self-training on news data

SLIDE 17

Experiments (overview)

How should we annotate the data? (parser or reranking parser)

How much unlabeled data should we label?

How should we combine the annotated unlabeled data with the true data?

SLIDE 18

Annotating unlabeled data

                   Annotator
Sentences added    Parser    Reranking parser
0 (baseline)       90.3      90.3
50k                90.1      90.7
500k               90.0      90.9
1,000k             90.0      90.8
1,500k             90.0      90.8
2,000k             –         91.0

Parser (not reranking parser) f-scores on all sentences in section 22.

SLIDE 19

Annotating unlabeled data

                   WSJ Section
Sentences added    1       22      24
0 (baseline)       91.8    92.1    90.5
50k                91.8    92.4    90.8
500k               92.0    92.4    90.9
1,000k             92.1    92.2    91.3
2,000k             92.2    92.0    91.3

Reranking parser f-scores for all sentences.

SLIDE 20

Weighting WSJ data

Wall Street Journal data is more reliable than the self-trained data.

Multiply each event in the Wall Street Journal data by a constant c to give it a higher relative weight:

events = c × events_wsj + events_nanc

Increasing the WSJ weight tends to improve f-scores. Based on development data, our best model is WSJ×5 + 1,750k sentences from NANC.
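The event weighting amounts to multiplying WSJ counts by c before merging. A sketch with made-up rule counts, not the parser's actual event types:

```python
from collections import Counter

def combine_counts(wsj_events, nanc_events, c=5):
    """events = c * events_wsj + events_nanc, per event type."""
    events = Counter({e: c * n for e, n in wsj_events.items()})
    events.update(nanc_events)  # Counter.update adds counts
    return events

wsj = Counter({"VP -> VBP NP": 3})
nanc = Counter({"VP -> VBP NP": 10, "NP -> DT NN": 4})
combined = combine_counts(wsj, nanc, c=5)  # "VP -> VBP NP": 5*3 + 10 = 25
```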

SLIDE 21

Evaluation on test section

Model                          f (parser)    f (reranker)
Charniak and Johnson (2005)    –             91.0
Current baseline               89.7          91.3
Self-trained                   91.0          92.1

f-scores from all sentences in WSJ section 23

SLIDE 22

The Story So Far...

Retraining the parser on its own output doesn’t help.

Retraining the parser on the reranker’s output helps.

Retraining the reranker on the reranker’s output doesn’t help.

SLIDE 23

Analysis: Global changes

Oracle f-scores increase; the self-trained parser has greater potential.

Model             1-best    10-best    50-best
Baseline          89.0      94.0       95.9
WSJ×1 + 250k      89.8      94.6       96.2
WSJ×5 + 1,750k    90.4      94.8       96.4

The average of log₂(Pr(1-best) / Pr(50th-best)) increases from 12.0 (baseline parser) to 14.1 (self-trained parser).
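This peakedness statistic is the per-sentence log-probability gap between the top and 50th parse, averaged over sentences. A sketch with toy probability lists, not real parser output:

```python
import math

def avg_log2_ratio(nbest_probs):
    """Average over sentences of log2( p(1-best) / p(n-th best) )."""
    ratios = [math.log2(probs[0] / probs[-1]) for probs in nbest_probs]
    return sum(ratios) / len(ratios)

# Two toy sentences, each with a 4x gap between top and bottom parse.
gap = avg_log2_ratio([[8.0e-20, 2.0e-20], [1.0e-19, 2.5e-20]])
```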

SLIDE 24

Sentence-level Analysis

[Four smoothed histograms: number of sentences parsed better / unchanged / worse under self-training, plotted against sentence length, number of unknown words, number of CCs, and number of INs.]

SLIDE 25

Effect of Sentence Length

[Plot: number of sentences parsed better / unchanged / worse, by sentence length.]

SLIDE 26

The Goldilocks Effect™

[Plot: number of sentences parsed better / unchanged / worse, by sentence length.]

SLIDE 27

. . . and . . .

[Plot: number of sentences parsed better / unchanged / worse, by number of CCs.]

SLIDE 28

Ongoing work

Parser adaptation (McClosky, Charniak, and Johnson, ACL 2006)

Sentence selection

Clustering local trees

Other ways of combining data

SLIDE 29

Conclusions

Self-training can improve on state-of-the-art parsing for the Wall Street Journal.

Reranking parsers can self-train their first-stage parser.

More analysis is needed to understand why reranking is necessary.

Self-trained reranking parser available from: ftp://ftp.cs.brown.edu/pub/nlparser

SLIDE 30

Acknowledgements

This work was supported by NSF grants LIS9720368 and IIS0095940, and DARPA GALE contract HR0011-06-2-0001. Thanks to Michael Collins, Brian Roark, James Henderson, Miles Osborne, and the BLLIP team for their comments.

Questions?
