SLIDE 1

Reranking and Self-Training for Parser Adaptation

David McClosky, Eugene Charniak, and Mark Johnson

{dmcc|ec|mj}@cs.brown.edu

Brown Laboratory for Linguistic Information Processing (BLLIP)

SLIDE 2

Overview

- Introduction and Previous Work
- Parser Portability
- Parser Adaptation
- Reranker Portability
- Analysis
- Future Work and Conclusions

SLIDE 3

Parsing

SLIDE 4

Parameters

Parser as in [Charniak and Johnson ACL 2005].

Corpus   # words    # sentences   Parameters
WSJ      950,028    39,832        ∼2,200,000
BROWN    373,152    19,740        ∼1,300,000

The number of parameters is a function of the training data.

SLIDE 5

Parsing

SLIDE 6

n-best Parsing

SLIDE 7

Reranking Parsers

SLIDE 8

More Parameters

Reranking parser as in [Charniak and Johnson 2005]:
- 14 feature schemas
- extract features according to the schemas, then estimate feature weights

Corpus   Parser parameters   Reranker features
WSJ      ∼2,200,000          ∼1,300,000
BROWN    ∼1,300,000          ∼700,000

Again, the number of parameters is a function of the training data.
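For intuition, here is a minimal sketch of the reranking step. It is not the actual Charniak-Johnson reranker: `extract_features` is a hypothetical stand-in for the 14 feature schemas, and the discriminatively estimated model is reduced to a plain dot product with a learned weight vector.

```python
def rerank(nbest, weights, extract_features):
    """Pick the highest-scoring parse from an n-best list.

    nbest: list of (parse, log_prob) pairs from the base parser.
    weights: dict mapping feature name -> learned weight.
    extract_features: hypothetical stand-in for the 14 feature schemas;
        maps a parse to a dict of feature name -> count.
    """
    def score(parse, log_prob):
        feats = extract_features(parse)
        # The base parser's log probability is itself one reranker feature.
        total = weights.get("log_prob", 0.0) * log_prob
        return total + sum(weights.get(f, 0.0) * v for f, v in feats.items())

    best_parse, _ = max(nbest, key=lambda pair: score(*pair))
    return best_parse
```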

SLIDE 9

Corpora and Domains

- WSJ: labeled news text, about 40,000 parses
- NANC: unlabeled news text, about 24 million sentences
- BROWN: labeled text from various domains, about 24,000 parses total

SLIDE 10

Corpora and Domains

- WSJ: labeled news text, about 40,000 parses
- NANC: unlabeled news text, about 24 million sentences
- BROWN: labeled text from various domains, about 24,000 parses total

BROWN divisions as in [Bacchiani et al. 2006] (based on [Gildea 2001]):
- 19,740 train, 2,078 tune, and 2,425 test sentences
- the treebanked sections are predominantly fiction
- each division of the corpus consists of sentences from all available genres

SLIDE 11

Self-Training

[McClosky, Charniak, and Johnson NAACL 2006]

1. Train a model from labeled data (train the reranking parser on WSJ).
2. Use the model to annotate unlabeled data (parse NANC with that model).
3. Combine the annotated data with the labeled training data (merge the parsed NANC data with the WSJ training data).
4. Train a new model from the combined data (train the reranking parser on WSJ+NANC).
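A schematic of these four steps as one round of self-training. `train` and the model's `.parse` method are hypothetical stand-ins for training and running the reranking parser, and any relative weighting of the gold data explored in the paper is omitted.

```python
def self_train(labeled, unlabeled, train):
    """One round of self-training.

    labeled: gold parse trees (here, WSJ).
    unlabeled: raw sentences (here, NANC).
    train: any function from a list of trees to a model with a
        .parse(sentence) method (here, the reranking parser's trainer).
    """
    model = train(labeled)                      # 1. train on labeled data
    auto = [model.parse(s) for s in unlabeled]  # 2. annotate unlabeled data
    combined = labeled + auto                   # 3. merge the two sets
    return train(combined)                      # 4. retrain on the combination
```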

SLIDE 12

Overtrained?

Question: How does setting so many parameters from Wall Street Journal data affect parsing performance on the Brown corpus?

SLIDE 13

Previous Work

Training    Testing   Gildea   Bacchiani
WSJ         WSJ       86.4     87.0
WSJ         BROWN     80.6     81.1
BROWN       BROWN     84.0     84.7
WSJ+BROWN   BROWN     84.3     85.6

f-measures from [Gildea 2001] and [Bacchiani et al. 2006]

SLIDE 18

Summary of findings

The self-trained WSJ+NANC model does not appear to be overtrained, and both the self-training and reranking techniques are fairly portable across domains. With these techniques, WSJ data alone gives performance almost as good as training on the actual BROWN corpus (though they do not work as well on more distant domains).

SLIDE 19

Overview

- Introduction and Previous Work
- Parser Portability
- Parser Adaptation
- Reranker Portability
- Analysis
- Future Work and Conclusions

SLIDE 20

Parser Portability

Task: use existing data/models from a source domain to parse a target domain.

Train: WSJ
Test: BROWN

Variables:
- parser vs. reranking parser
- effect of self-training with NANC

SLIDE 21

Parser Portability

Train   Test    Parser   Reranking Parser
WSJ     WSJ     89.7     91.0
WSJ     BROWN   83.9     85.8

f-scores on WSJ section 23 and the BROWN development section
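All scores in these tables are labeled-bracketing f-scores in the PARSEVAL style (in practice computed with the standard evalb tool). A minimal sketch of the metric, assuming each parse has already been reduced to a multiset of (label, start, end) constituent spans:

```python
from collections import Counter

def bracket_fscore(gold, test):
    """Labeled-bracketing f-score between two parses.

    gold, test: lists of (label, start, end) constituent spans.
    """
    matched = sum((Counter(gold) & Counter(test)).values())
    precision = matched / len(test) if test else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    # f-score is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```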

SLIDE 22

Parser Portability

Parsing model     Parser   Reranking Parser
WSJ baseline      83.9     85.8
WSJ+50k NANC      84.8     86.6
WSJ+250k NANC     85.7     87.2
WSJ+500k NANC     86.0     87.3
WSJ+1,000k NANC   86.2     87.3
WSJ+1,500k NANC   86.2     87.6
WSJ+2,500k NANC   86.4     87.7

f-scores on the BROWN development section

SLIDE 23

Parser Portability

Parsing model     Parser   Reranking Parser
WSJ baseline      83.9     85.8
WSJ+50k NANC      84.8     86.6
WSJ+250k NANC     85.7     87.2
WSJ+500k NANC     86.0     87.3
WSJ+1,000k NANC   86.2     87.3
WSJ+1,500k NANC   86.2     87.6
WSJ+2,500k NANC   86.4     87.7
BROWN baseline    86.4     87.7

f-scores on the BROWN development section

SLIDE 24

Parser Adaptation

Task: use existing data/models from a source domain, together with some target-domain material, to parse the target domain.

Train: WSJ and/or BROWN
Test: BROWN

Variables:
- number of self-trained sentences added
- amount of BROWN training data

SLIDE 25

Labeled In-domain Data

Parser model   Parser   Reranker
WSJ alone      83.9     85.8
BROWN alone    86.3     87.4
WSJ+BROWN      86.5     88.1

f-scores on the BROWN development section

SLIDE 26

Adding Self-Trained Data

Parser model          Parser   Reranker
WSJ alone             83.9     85.8
WSJ+2,500k NANC       86.4     87.7
BROWN alone           86.3     87.4
BROWN+250k NANC       86.8     88.1
WSJ+BROWN             86.5     88.1
WSJ+BROWN+250k NANC   86.8     88.1

f-scores on the BROWN development section

SLIDE 27

Reranker Portability

Parser model   Parser alone   WSJ reranker   BROWN reranker
WSJ            82.9           85.2           85.2
WSJ+NANC       87.1           87.8           87.9
BROWN          86.7           88.2           88.4

f-scores on the BROWN test section

SLIDE 30

Analysis Overview

- Oracle scores
- Parser agreement
- Per-category f-scores
- Factor analysis

SLIDE 31

Oracle Scores

Model      1-best   10-best   25-best   50-best
WSJ        82.6     88.9      90.7      91.9
WSJ+NANC   86.4     92.1      93.5      94.3
BROWN      86.3     92.0      93.3      94.2

f-scores on the BROWN development section
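The k-best oracle score picks, for each sentence, the candidate parse closest to the gold tree, so it upper-bounds what any reranker could extract from the list. A sketch, reusing the hypothetical `bracket_fscore` from the earlier slide and averaging per-sentence scores (evalb instead aggregates bracket counts over the corpus):

```python
def oracle_fscore(nbest_lists, golds, k):
    """Mean k-best oracle f-score over a corpus.

    nbest_lists: per sentence, the parser's candidate parses (best-first).
    golds: per sentence, the gold parse. Parses are bracket lists as in
        bracket_fscore above.
    """
    scores = []
    for nbest, gold in zip(nbest_lists, golds):
        # Best achievable score if we could choose from the top k parses.
        scores.append(max(bracket_fscore(gold, p) for p in nbest[:k]))
    return sum(scores) / len(scores)
```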

SLIDE 33

Parser Agreement

Agreement of parses from the WSJ+NANC reranking parser with parses from the BROWN reranking parser:
- bracketing agreement f-score: 88.03%
- complete match: 44.92%
- average crossing brackets: 0.94
- POS tagging agreement: 94.85%
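Two of these metrics are less standard than f-score, so here is a sketch of how they can be computed, treating one parser's output as the reference; spans are (start, end) pairs, and the exact bracket conventions (e.g., punctuation handling) are assumptions.

```python
def crossing_brackets(test_spans, ref_spans):
    """Number of test brackets that cross (overlap without nesting) a reference bracket."""
    def crosses(a, b):
        return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]
    return sum(any(crosses(t, r) for r in ref_spans) for t in test_spans)

def complete_match_rate(test_trees, ref_trees):
    """Fraction of sentences on which the two parsers agree exactly
    (trees in any comparable representation, e.g., bracket lists)."""
    return sum(t == r for t, r in zip(test_trees, ref_trees)) / len(ref_trees)
```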

SLIDE 34

Per-Category f-scores

Description       Size   BROWN   WSJ+NANC   ∆
Popular Lore      271    87.3    89.6       +2.28
Letters           281    87.6    87.1       -0.45
General fiction   333    87.2    85.9       -1.29
Mystery           318    88.7    88.3       -0.45
Science fiction    76    87.7    88.8       +1.17
Adventure         378    89.7    89.0       -0.64
Romance           338    88.0    86.6       -1.40
Humor              83    84.6    87.0       +2.45

f-scores on the BROWN development section (Size = number of sentences; ∆ = WSJ+NANC − BROWN)

SLIDE 35

Factor Analysis

A generalized linear model with a binomial link; the predicted variable is whether the BROWN model's f-score exceeds the WSJ+NANC model's f-score on a sentence.

Explanatory variables:
- sentence length
- number of prepositions
- number of conjunctions
- BROWN subcorpus ID

SLIDE 36

Factor Analysis

A generalized linear model with a binomial link; the predicted variable is whether the BROWN model's f-score exceeds the WSJ+NANC model's f-score on a sentence.

Explanatory variables (⋆ marks the factors found significant):
- sentence length
- number of prepositions ⋆
- number of conjunctions
- BROWN subcorpus ID ⋆
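A sketch of one way to fit such a model with statsmodels; the paper does not name its software, and the column names and randomly generated toy data below are purely illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200  # toy stand-in for the ~2,000 BROWN dev sentences
df = pd.DataFrame({
    "length":    rng.integers(5, 45, n),   # sentence length
    "num_prep":  rng.integers(0, 6, n),    # number of prepositions
    "num_conj":  rng.integers(0, 4, n),    # number of conjunctions
    "subcorpus": rng.choice(list("KLNPR"), n),  # BROWN subcorpus ID
})
# Synthetic outcome: 1 iff the BROWN model out-scores WSJ+NANC on the sentence.
df["brown_wins"] = rng.integers(0, 2, n)

# Binomial GLM (logistic link by default); C(...) treats the subcorpus ID
# as a categorical factor.
fit = smf.glm("brown_wins ~ length + num_prep + num_conj + C(subcorpus)",
              data=df, family=sm.families.Binomial()).fit()
print(fit.summary())  # the p-values indicate which factors are significant
```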

SLIDE 37

Per-Category f-scores

Description         Size   BROWN   WSJ+NANC   ∆
Popular Lore        271    87.3    89.6       +2.28
Letters ⋆           281    87.6    87.1       -0.45
General fiction ⋆   333    87.2    85.9       -1.29
Mystery ⋆           318    88.7    88.3       -0.45
Science fiction      76    87.7    88.8       +1.17
Adventure ⋆         378    89.7    89.0       -0.64
Romance ⋆           338    88.0    86.6       -1.40
Humor                83    84.6    87.0       +2.45

f-scores on the BROWN development section (⋆ marks subcorpora whose ID was a significant factor)

SLIDE 38

Future Work

- Self-training on bridging corpora for harder domains (e.g., to parse biomedical text, self-train on biology textbooks)
- Deeper comparison of the BROWN and WSJ rerankers
- Parallel experiments for the Switchboard and biomedical domains
- Further analysis

SLIDE 39

Conclusions

The self-trained WSJ+NANC model does not appear to be overtrained, and both the self-training and reranking techniques are fairly portable across domains. With these techniques, WSJ data alone gives performance almost as good as training on the actual BROWN corpus (though they do not work as well on more distant domains).

SLIDE 40

Acknowledgements

This work was supported by NSF grants LIS9720368 and IIS0095940, and by DARPA GALE contract HR0011-06-2-0001. We would like to thank the BLLIP team for their comments.

Questions?
