Parsing III
Maria Ryskina – CMU
Slides adapted from: Dan Klein – UC Berkeley; Taylor Berg-Kirkpatrick, Yulia Tsvetkov – CMU
Treebank PCFGs [Charniak 96]
▪ Use PCFGs for broad coverage parsing
▪ Can take a grammar right off the trees (doesn’t work well):

  ROOT → S          1
  S → NP VP .       1
  NP → PRP          1
  VP → VBD ADJP     1
  …

Model      F1
Baseline   72.0
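As a concrete sketch, reading the grammar off the trees is just counting rules and normalizing by left-hand side; a minimal Python version, assuming a hypothetical (label, [children]) tree encoding with words as string leaves:

    from collections import Counter

    def tree_rules(tree):
        """Yield (lhs, rhs) pairs for every node of a tree encoded as
        (label, [children]), where leaf children are plain word strings."""
        label, children = tree
        yield label, tuple(c if isinstance(c, str) else c[0] for c in children)
        for c in children:
            if not isinstance(c, str):
                yield from tree_rules(c)

    def estimate_pcfg(trees):
        """Maximum-likelihood PCFG: count each rule, normalize by LHS count."""
        rule_counts, lhs_counts = Counter(), Counter()
        for tree in trees:
            for lhs, rhs in tree_rules(tree):
                rule_counts[(lhs, rhs)] += 1
                lhs_counts[lhs] += 1
        return {(lhs, rhs): n / lhs_counts[lhs]
                for (lhs, rhs), n in rule_counts.items()}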
Conditional independence?
▪ Not every NP expansion can fill every NP slot
▪ A grammar with symbols like “NP” won’t be context-free
▪ Statistically, conditional independence is too strong

Non-independence
▪ Independence assumptions are often too strong.
▪ Example: the expansion of an NP is highly dependent on the parent of the NP (i.e., subjects vs. objects).
▪ Also: the subject and object expansions are correlated!
Expansion   All NPs   NPs under S   NPs under VP
NP PP       11%       9%            23%
DT NN       9%        9%            7%
PRP         6%        21%           4%
▪ Example: PP attachment
▪ Structural Annotation [Johnson ’98, Klein & Manning ’03]
▪ Lexicalization [Collins ’99, Charniak ’00]
▪ Latent Variables [Matsuzaki et al. ’05, Petrov et al. ’06]
▪ Annotation refines base treebank symbols to improve statistical fit of the grammar
▪ Structural annotation
Experimental setup
▪ Corpus: Penn Treebank, WSJ
  ▪ Training: sections 02-21; Development: section 22 (here, first 20 files); Test: section 23
▪ Accuracy: F1, the harmonic mean of per-node labeled precision and recall
▪ Here: also size, the number of symbols in the grammar
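For reference, a minimal sketch of the F1 computation over labeled brackets, assuming constituents are encoded as (label, start, end) spans:

    from collections import Counter

    def labeled_f1(gold_spans, guess_spans):
        """Harmonic mean of per-node labeled precision and recall."""
        gold, guess = Counter(gold_spans), Counter(guess_spans)
        matched = sum((gold & guess).values())   # multiset intersection
        precision = matched / sum(guess.values())
        recall = matched / sum(gold.values())
        return 2 * precision * recall / (precision + recall)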
Markovization
▪ Vertical Markov order: rewrites depend on past k ancestor nodes (cf. parent annotation)
▪ Horizontal Markov order: rewrites depend on past k sibling nodes

[Charts: F1 and grammar size vs. vertical Markov order 1–3 (accuracy axis 72–78%, symbol counts up to 25,000) and vs. horizontal Markov order 1, 2, ∞ (accuracy axis 71–73%, symbol counts up to 12,000)]
Example: binarizing NP → DT JJ NN NN under different Markovizations (v = vertical order, h = horizontal order; @NP marks intermediate symbols):

v=1, h=∞:
  NP → DT @NP[DT]
  @NP[DT] → JJ @NP[DT,JJ]
  @NP[DT,JJ] → NN @NP[DT,JJ,NN]
  @NP[DT,JJ,NN] → NN

v=1, h=1:
  NP → DT @NP[DT]
  @NP[DT] → JJ @NP[…,JJ]
  @NP[…,JJ] → NN @NP[…,NN]
  @NP[…,NN] → NN

v=1, h=0:
  NP → DT @NP
  @NP → JJ @NP
  @NP → NN @NP
  @NP → NN

v=2, h=∞:
  NP^VP → DT^NP @NP^VP[DT]
  @NP^VP[DT] → JJ^NP @NP^VP[DT,JJ]
  @NP^VP[DT,JJ] → NN^NP @NP^VP[DT,JJ,NN]
  @NP^VP[DT,JJ,NN] → NN^NP

v=2, h=1:
  NP^VP → DT^NP @NP^VP[DT]
  @NP^VP[DT] → JJ^NP @NP^VP[…,JJ]
  @NP^VP[…,JJ] → NN^NP @NP^VP[…,NN]
  @NP^VP[…,NN] → NN^NP

v=2, h=0:
  NP^VP → DT^NP @NP
  @NP → JJ^NP @NP
  @NP → NN^NP @NP
  @NP → NN^NP
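A minimal Python sketch of this binarization (it tracks only the horizontal sibling history; unlike the v=2 trees above, it does not parent-annotate the children themselves):

    def binarize(label, children, parent=None, v=1, h=0):
        """Markovized left-to-right binarization of one rule.
        `children` is a list of child labels; pass h=float("inf") for h=∞.
        Returns (lhs, rhs) pairs spelled as in the example above."""
        ann = label + ("^" + parent if v >= 2 and parent else "")
        rules, lhs = [], ann
        for i in range(len(children) - 1):
            if h == 0:
                hist = ""
            elif h == float("inf"):
                hist = "[" + ",".join(children[:i + 1]) + "]"
            else:
                kept = children[max(0, i + 1 - h):i + 1]
                dots = ["…"] if i + 1 > h else []
                hist = "[" + ",".join(dots + kept) + "]"
            nxt = "@" + ann + hist
            rules.append((lhs, (children[i], nxt)))
            lhs = nxt
        rules.append((lhs, (children[-1],)))
        return rules

    # binarize("NP", ["DT", "JJ", "NN", "NN"], parent="VP", v=2, h=1)
    # -> NP^VP → DT @NP^VP[DT], @NP^VP[DT] → JJ @NP^VP[…,JJ], …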
Unary splits
▪ Problem: unary rewrites are used to transmute categories so a high-probability rule can be used.
▪ Solution: mark unary rewrite sites with -U.

Annotation   F1     Size
Base         77.8   7.5K
UNARY        78.3   8.0K
Tag splits
▪ Problem: treebank tags are too coarse.
▪ Example: sentential, PP, and other prepositions are all marked IN.
▪ Partial solution: subdivide the IN tag.

Annotation   F1     Size
Previous     78.3   8.0K
SPLIT-IN     80.3   8.1K
▪ Beats “first generation” lexicalized parsers
▪ Lots of room to improve – more complex models next

Parser          LP     LR     F1     CB     0 CB
Magerman 95     84.9   84.6   84.7   1.26   56.6
Collins 96      86.3   85.8   86.0   1.14   59.9
Unlexicalized   86.9   85.7   86.3   1.10   60.3
Charniak 97     87.4   87.5   87.4   1.00   62.1
Collins 99      88.7   88.6   88.6   0.90   67.1
Coarse and fine grammars
▪ Coarse grammar (v=1, h=0):
  NP → DT @NP,  @NP → JJ @NP,  @NP → NN @NP,  @NP → NN
▪ Fine grammar (v=2, h=1):
  NP^VP → DT^NP @NP^VP[DT],  @NP^VP[DT] → JJ^NP @NP^VP[…,JJ],  @NP^VP[…,JJ] → NN^NP @NP^VP[…,NN],  @NP^VP[…,NN] → NN^NP
▪ Each coarse rule, e.g. NP → DT @NP, corresponds to a set of fine rules, e.g. NP^VP → DT^NP @NP^VP[DT]
▪ Note: X-Bar grammars are projections with rules like XP → Y @X or XP → @X Y or @X → X
Each fine symbol projects onto a coarse symbol, e.g. for the base symbol NP:

Coarse symbol   Fine symbols
NP              NP^VP, NP^S
@NP             @NP^VP[DT], @NP^S[DT], @NP^VP[…,JJ], @NP^S[…,JJ]
DT              DT^NP
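In this spelling, the projection can be computed by stripping the annotations; a small sketch, assuming the symbol format used on these slides:

    def project(fine_symbol):
        """Map a fine symbol to its coarse base symbol,
        e.g. "@NP^VP[…,JJ]" -> "@NP", "DT^NP" -> "DT"."""
        base = fine_symbol.split("[")[0]   # drop the sibling history
        return base.split("^")[0]          # drop the parent annotation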
Coarse-to-fine pruning
For each coarse chart item X[i,j], compute the posterior probability:

  P(X[i,j] | s) = inside(X, i, j) · outside(X, i, j) / P(s)

E.g., consider the span 5 to 12: the coarse chart contains items … QP NP VP … over this span. Any coarse item whose posterior is < threshold is pruned, together with all fine-grammar items that project to it.
▶ The joint probability corresponding to the yellow, red and blue areas of the chart figure (not shown), assuming X spanning (i,j) was the L child of some non-terminal Z (rule Z → X Y):

  outside_L(X, i, j) = Σ_{Z → X Y} Σ_{k > j} P(Z → X Y) · outside(Z, i, k) · inside(Y, j, k)

▶ The joint probability for the same areas, assuming X was the R child of some non-terminal (rule Z → Y X):

  outside_R(X, i, j) = Σ_{Z → Y X} Σ_{k < i} P(Z → Y X) · outside(Z, k, j) · inside(Y, k, i)

▶ The final joint probability is the sum over the L and R cases:

  outside(X, i, j) = outside_L(X, i, j) + outside_R(X, i, j)
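A Python sketch of this pass, assuming a filled inside chart (the chart encoding and the `rules` format are illustrative, not a real parser API):

    from collections import defaultdict

    def outside_scores(inside, rules, root, n):
        """Top-down outside pass using the L- and R-child cases above.
        `inside[(X, i, j)]` holds inside probabilities; `rules` maps each
        parent Z to a list of (Y1, Y2, prob) binary rules."""
        outside = defaultdict(float)
        outside[(root, 0, n)] = 1.0
        for width in range(n, 1, -1):          # parents before children
            for i in range(n - width + 1):
                j = i + width
                for Z, bodies in rules.items():
                    out_Z = outside[(Z, i, j)]
                    if out_Z == 0.0 or (Z, i, j) not in inside:
                        continue
                    for (Y1, Y2, p) in bodies:
                        for k in range(i + 1, j):
                            # Y1 is the L child over (i,k); Y2 the R child over (k,j)
                            outside[(Y1, i, k)] += p * out_Z * inside.get((Y2, k, j), 0.0)
                            outside[(Y2, k, j)] += p * out_Z * inside.get((Y1, i, k), 0.0)
        return outside

    def prune_mask(inside, outside, root, n, threshold=1e-4):
        """Coarse items whose posterior clears the threshold."""
        sent_prob = inside[(root, 0, n)]
        return {item for item in inside
                if inside[item] * outside[item] / sent_prob > threshold}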
▪ Annotation refines base treebank symbols to improve statistical fit of the grammar
▪ Structural annotation [Johnson ’98, Klein & Manning ’03]
▪ Head lexicalization [Collins ’99, Charniak ’00]
▪ If we do no annotation, these trees differ only in one rule:
  ▪ VP → VP PP
  ▪ NP → NP PP
▪ The parse will go one way or the other, regardless of the words
▪ We addressed this in one way with unlexicalized grammars (how?)
▪ Lexicalization allows us to be sensitive to specific words
▪ What’s different between the basic PCFG scores here?
▪ What (lexical) correlations need to be scored?
Lexicalization
▪ Add “head words” to each phrasal node
▪ Syntactic vs. semantic heads
▪ Headship is not annotated in (most) treebanks
▪ Usually use head rules, an ordered list of fallbacks, e.g.:
  ▪ NP: take the leftmost NP; else the rightmost N*; else the rightmost JJ; else the right child
  ▪ VP: take the leftmost VB*; else the leftmost VP; else the left child
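Head rules like these are just priority lists with a search direction; a small sketch (the rule table and its tag sets, e.g. expanding N* to the NN tags, are assumptions):

    HEAD_RULES = {
        "NP": [("left",  ["NP"]),
               ("right", ["NN", "NNS", "NNP", "NNPS"]),   # "rightmost N*"
               ("right", ["JJ"])],                        # else: right child
        "VP": [("left",  ["VB", "VBD", "VBZ", "VBP", "VBG", "VBN"]),
               ("left",  ["VP"])],                        # else: left child
    }

    def find_head(label, child_labels):
        """Index of the head child under the priority list above."""
        for direction, targets in HEAD_RULES.get(label, []):
            order = (range(len(child_labels)) if direction == "left"
                     else range(len(child_labels) - 1, -1, -1))
            for i in order:
                if child_labels[i] in targets:
                    return i
        return len(child_labels) - 1 if label == "NP" else 0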
▪ Problem: we now have to estimate probabilities of entire lexicalized rules
▪ We are never going to get these atomically off of a treebank
▪ Solution: break the derivation up into smaller steps
▪ A derivation of a local tree [Collins 99]:
  1. Choose a head tag and word
  2. Choose a complement bag
  3. Generate children (incl. adjuncts)
  4. Recursively derive children
Lexicalized CKY (naive, O(n⁵)):

  bestScore(X, i, j, h)
    if (j == i+1)
      return tagScore(X, s[i])
    else
      return max over k, h', X→YZ of:
        score(X[h] → Y[h] Z[h']) * bestScore(Y, i, k, h) * bestScore(Z, k, j, h')
        score(X[h] → Y[h'] Z[h]) * bestScore(Y, i, k, h') * bestScore(Z, k, j, h)

(The first case takes the head from the left child Y over (i,k), the second from the right child Z over (k,j). E.g., (VP → VBD •)[saw] combines with NP[her] into (VP → VBD … NP •)[saw].)
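A runnable version of this recursion, in log space (the `rules` encoding and `tag_score` function are assumptions of this sketch; no pruning, so it really is O(n⁵)):

    import math
    from functools import lru_cache

    def lex_parse_score(sent, rules, tag_score, root="S"):
        """Naive lexicalized CKY.  `rules` is a list of
        (X, Y, Z, head_side, logp): head_side "L" means X takes its head
        from Y, "R" from Z."""
        n = len(sent)

        @lru_cache(maxsize=None)
        def best(X, i, j, h):
            if j == i + 1:                 # one word: it must be the head
                return tag_score(X, sent[i]) if h == i else -math.inf
            score = -math.inf
            for (A, Y, Z, side, logp) in rules:
                if A != X:
                    continue
                for k in range(i + 1, j):  # split point
                    # h2 ranges over the non-head child's possible heads
                    for h2 in (range(k, j) if side == "L" else range(i, k)):
                        hy, hz = (h, h2) if side == "L" else (h2, h)
                        score = max(score, logp + best(Y, i, k, hy)
                                                + best(Z, k, j, hz))
            return score

        return max(best(root, 0, n, h) for h in range(n))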
▪ Turns out, you can do (a little) better [Eisner 99]: combine the head child Y[h] with a non-head child Z whose own head position has already been optimized out, which gives an O(n⁴) algorithm
▪ Still prohibitive in practice if not pruned
Pruning with beams
▪ The Collins parser prunes with per-cell beams [Collins 99]
▪ Essentially, run the O(n⁵) CKY, but remember only a few hypotheses for each span (i,j)
▪ If we keep K hypotheses at each span, then we do at most O(nK²) work per span (why?)
▪ This keeps things more or less cubic (and in practice it is more like linear!)
▪ Also: certain spans are forbidden entirely on the basis of punctuation (crucial for speed)
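A per-cell beam is a few lines; a sketch of the top-K idea (this is not Collins’s exact beam definition):

    import heapq

    def prune_cell(cell, K=10):
        """Keep the K best hypotheses for one span (i,j).
        `cell` maps chart items, e.g. (X, head), to log scores."""
        return dict(heapq.nlargest(K, cell.items(), key=lambda kv: kv[1]))

    # With at most K items per span, each split point combines K x K
    # hypotheses, so each span costs O(nK²): roughly cubic overall.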
Pruning with a PCFG
▪ The Charniak parser prunes using a two-pass, coarse-to-fine approach [Charniak 97+]
▪ First, parse with the base (coarse) grammar
▪ For each X[i,j], calculate the posterior P(X | i, j, s)
  ▪ This isn’t trivial, and there are clever speed-ups
▪ Second, do the full O(n⁵) CKY, but skip any X[i,j] which had a low (say, < 0.0001) posterior
▪ This avoids almost all work in the second phase!
▪ Charniak et al ’06: can use more passes; Petrov et al ’07: can use many more passes
▪ Some results:
  ▪ Collins 99 – 88.6 F1 (generative lexical)
  ▪ Charniak and Johnson 05 – 89.7 / 91.3 F1 (generative lexical / reranked)
  ▪ Petrov et al 06 – 90.7 F1 (generative unlexicalized)
  ▪ McClosky et al 06 – 92.1 F1 (gen + rerank + self-train)
▪ However:
  ▪ Bilexical counts rarely make a difference (why?)
  ▪ Gildea 01 – removing bilexical counts costs < 0.5 F1
▪ Annotation refines base treebank symbols to improve statistical fit of the grammar
▪ Parent annotation [Johnson ’98] ▪ Head lexicalization [Collins ’99, Charniak ’00] ▪ Automatic clustering?
Latent variable grammars
[Diagram: a sentence and its parse tree, related through a set of derivations over split subcategories, scored by grammar parameters]
Learning latent annotations
EM algorithm:
▪ Brackets are known
▪ Base categories are known
▪ Only induce subcategories

[Diagram: a training tree with latent subcategory variables X1 … X7 over the sentence “He was right .”, with Forward- and Backward-style quantities computed over the tree]

Just like Forward-Backward for HMMs.
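A sketch of the inside half of this E-step on one training tree, with k subcategories per symbol (the tree and rule-parameter encodings are illustrative):

    import numpy as np

    def tree_inside(node, rule_probs, lex):
        """(k,) inside scores over the subcategories of `node`.
        Leaves are ("word",); internal nodes are (rule_id, left, right).
        rule_probs[rule_id] is a (k,k,k) array P(A_a -> B_b C_c);
        lex[word] is a (k,) array of emission probabilities."""
        if len(node) == 1:
            return lex[node[0]]
        r, left, right = node
        in_l = tree_inside(left, rule_probs, lex)
        in_r = tree_inside(right, rule_probs, lex)
        # sum over the children's subcategories, like one forward step
        return np.einsum("abc,b,c->a", rule_probs[r], in_l, in_r)

The outside pass runs top-down analogously, and expected rule counts (outside of the parent × rule probability × inside of each child, normalized by the root’s inside score) are exactly the statistics the M-step re-normalizes, just as in Forward-Backward.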
Hierarchical refinement
▪ Example: splitting DT hierarchically into DT-1, DT-2, DT-3, DT-4 (split in two, then each half in two again)

[Chart: parsing accuracy (F1, axis 74–91) vs. total number of grammar symbols (100–1800) for flat vs. hierarchical training]

Model                   F1
Flat Training           87.3
Hierarchical Training   88.4
Adaptive splitting
▪ Splitting all categories equally is wasteful:
▪ Want to split complex categories more
▪ Idea: split everything, then roll back the splits which were least useful
[Chart: parsing accuracy (F1, axis 74–91) vs. total number of grammar symbols (100–1700) for flat training, hierarchical training, and hierarchical training with 50% merging]

Model              F1
Previous           88.4
With 50% Merging   89.5
[Chart: number of induced phrasal subcategories (axis 10–40) per category, in decreasing order: NP, VP, PP, ADVP, S, ADJP, SBAR, QP, WHNP, PRN, NX, SINV, PRT, WHPP, SQ, CONJP, FRAG, NAC, UCP, WHADVP, INTJ, SBARQ, RRC, WHADJP, X, ROOT, LST]

[Chart: number of induced lexical subcategories (axis 18–70) per POS tag, in decreasing order: NNP, JJ, NNS, NN, VBN, RB, VBG, VB, VBD, CD, IN, VBZ, VBP, DT, NNPS, CC, JJR, JJS, :, PRP, PRP$, MD, RBR, WP, POS, PDT, WRB, ., EX, WP$, WDT, '', FW, RBS, TO, $, UH, ,, ``, SYM, RP, LS, #]
Learned splits
▪ Proper nouns (NNP):

  NNP-14   Oct.     Nov.       Sept.
  NNP-12   John     Robert     James
  NNP-2    J.       E.         L.
  NNP-1    Bush     Noriega    Peters
  NNP-15   New      San        Wall
  NNP-3    York     Francisco  Street

▪ Personal pronouns (PRP):

  PRP-0    It       He         I
  PRP-1    it       he         they
  PRP-2    it       them       him

▪ Relative adverbs (RBR):

  RBR-0    further  lower      higher
  RBR-1    more     less       More
  RBR-2    earlier  Earlier    later

▪ Cardinal numbers (CD):

  CD-7     two      Three
  CD-4     1989     1990       1988
  CD-11    million  billion    trillion
  CD-0     1        50         100
  CD-3     1        30         31
  CD-9     78       58         34
                                          F1 (≤ 40 words)   F1 (all)
ENG   Charniak & Johnson ’05 (generative)   90.1              89.6
      Split / Merge                         90.6              90.1
GER   Dubey ’05                             76.3              –
      Split / Merge                         80.8              80.1
CHN   Chiang et al. ’02                     80.0              76.6
      Split / Merge                         86.3              83.4

▪ Still higher numbers from reranking / self-training methods
▪ Example: PP attachment
Prune with a hierarchy of increasingly refined grammars:

  coarse:          … QP NP VP …
  split in two:    … QP1 QP2 NP1 NP2 VP1 VP2 …
  split in four:   … QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 …
  split in eight:  …
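The passes chain together as below; a sketch in which `compute_posteriors` (an inside-outside pass over the currently allowed chart), `viterbi_parse`, and `project` are hypothetical helpers, not a real parser API:

    def coarse_to_fine_parse(sentence, grammars, project, threshold=1e-4):
        """Multi-pass pruning over the hierarchy above
        (coarse -> split in two -> split in four -> ...)."""
        allowed = lambda X, i, j: True        # first pass: prune nothing
        for grammar in grammars[:-1]:
            posteriors = compute_posteriors(grammar, sentence, allowed)
            survivors = {it for it, p in posteriors.items() if p > threshold}
            # a finer item is allowed iff its projection survived this pass
            allowed = lambda X, i, j, s=survivors: (project(X), i, j) in s
        return viterbi_parse(grammars[-1], sentence, allowed)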