Natural Language Processing
Parsing II
Dan Klein – UC Berkeley
Learning PCFGs

Treebank PCFGs [Charniak 96]
Use PCFGs for broad-coverage parsing. You can take a grammar right off the trees (but it doesn't work well):
ROOT → S          (1.0)
S → NP VP .       (1.0)
NP → PRP          (1.0)
VP → VBD ADJP     (1.0)
…
Model      F1
Baseline   72.0
[Charniak 96]
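Reading a PCFG off the trees is just relative-frequency estimation over rules. A minimal sketch, assuming trees are encoded as nested (label, children...) tuples with word leaves as plain strings (the encoding is hypothetical):

```python
from collections import Counter, defaultdict

def rules(tree):
    # tree = (label, child, child, ...); a word leaf is a plain string
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for c in children:
        if not isinstance(c, str):
            yield from rules(c)

def treebank_pcfg(trees):
    # P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)
    counts = Counter(r for t in trees for r in rules(t))
    totals = defaultdict(int)
    for (lhs, _), n in counts.items():
        totals[lhs] += n
    return {(lhs, rhs): n / totals[lhs] for (lhs, rhs), n in counts.items()}
```

On a one-tree "treebank" this reproduces the probability-1.0 rules above; on the full treebank most probabilities are fractional.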
NP expansions depend on the parent of the NP (i.e., subjects vs. objects):
Expansion   All NPs   NPs under S   NPs under VP
NP PP       11%       9%            23%
DT NN       9%        9%            7%
PRP         6%        21%           4%
Annotation refines base treebank symbols to improve statistical fit of the grammar (e.g., parent annotation [Johnson 98]).
Accuracy: F1, the harmonic mean of per-node labeled precision and recall.
Training: sections 02-21
Development: section 22 (here, first 20 files)
Test: section 23
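The F1 metric above can be computed directly from labeled-bracket multisets; a small sketch (the bracket encoding (label, i, j) is an assumption):

```python
from collections import Counter

def labeled_f1(gold, guess):
    # gold, guess: lists of (label, i, j) labeled brackets
    g, h = Counter(gold), Counter(guess)
    matched = sum((g & h).values())   # multiset intersection
    precision = matched / sum(h.values()) if h else 0.0
    recall = matched / sum(g.values()) if g else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```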
Vertical Markovization: rewrites depend on the past k ancestor nodes (cf. parent annotation).
[Charts: parsing F1 (72%-79%) and number of grammar symbols (5,000-25,000) vs. vertical Markov order 1, 2v, 2, 3v, 3]
Horizontal Markovization
[Charts: parsing F1 (70%-74%) and number of grammar symbols (3,000-12,000) vs. horizontal Markov order 1, 2v, 2, ∞]
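Both kinds of markovization are relabelings applied before rules are read off. A sketch with hypothetical helper names (v = vertical order, h = horizontal order):

```python
def vertical_annotate(label, ancestors, v):
    # order-v vertical markovization: remember the past v-1 ancestors,
    # e.g. v=2 is plain parent annotation: NP under S becomes "NP^S"
    if v <= 1 or not ancestors:
        return label
    return label + "^" + "^".join(ancestors[-(v - 1):])

def horizontal_binarize(lhs, rhs, h):
    # binarize lhs -> rhs left to right; each intermediate symbol remembers
    # at most the last h children already generated (order-h markovization)
    if len(rhs) < 2:
        return [(lhs, tuple(rhs))]
    out, prev = [], lhs
    for i in range(len(rhs) - 2):
        seen = rhs[max(0, i + 1 - h):i + 1]
        inter = "@%s[%s]" % (lhs, ",".join(seen))
        out.append((prev, (rhs[i], inter)))
        prev = inter
    out.append((prev, (rhs[-2], rhs[-1])))
    return out
```

Lower h collapses intermediate symbols (with h=0 every intermediate is "@VP[]"), which is why lower orders shrink the symbol count in the charts above.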
Problem: unary rewrites are used to transmute categories so a high-probability rule can be used.
Annotation   F1     Size
Base         77.8   7.5K
UNARY        78.3   8.0K
Solution: mark unary rewrite sites with -U.
Problem: treebank tags are too coarse. Example: sentential, PP, and other prepositions are all marked IN.
Annotation   F1     Size
Previous     78.3   8.0K
SPLIT-IN     80.3   8.1K
Parser          LP     LR     F1     CB     0 CB
Magerman 95     84.9   84.6   84.7   1.26   56.6
Collins 96      86.3   85.8   86.0   1.14   59.9
Unlexicalized   86.9   85.7   86.3   1.10   60.3
Charniak 97     87.4   87.5   87.4   1.00   62.1
Collins 99      88.7   88.6   88.6   0.90   67.1
Coarse grammar:  NP → DT N’
Fine grammar:    NP^S → DT^NP N’[…DT]^NP
Note: X-bar grammars are projections with rules like XP → Y X’, XP → X’ Y, or X’ → X.
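The coarse grammar can be computed from the fine one by projection: strip the annotations and sum fine-rule probabilities, weighting each fine LHS by its share of its coarse symbol. A sketch (representations and helper names hypothetical):

```python
from collections import defaultdict

def project_symbol(sym):
    # strip refinement decorations, e.g. "NP^S" -> "NP", "N'[...DT]^NP" -> "N'"
    return sym.split("^")[0].split("[")[0]

def project_grammar(fine, lhs_weight):
    # fine: {(lhs, rhs_tuple): prob}; lhs_weight[lhs]: relative frequency of
    # this fine variant among all fine variants of its coarse symbol
    coarse = defaultdict(float)
    for (lhs, rhs), p in fine.items():
        key = (project_symbol(lhs), tuple(project_symbol(s) for s in rhs))
        coarse[key] += lhs_weight[lhs] * p
    return dict(coarse)
```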
For each coarse chart item X[i,j], compute the posterior probability:

  P(X spans i..j | s) = outside(X,i,j) · inside(X,i,j) / inside(root)

and prune the item if this posterior is below a threshold.
E.g., consider the span 5 to 12:
coarse:  … QP NP VP …
refined: the split subcategories of each coarse symbol
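Given inside and outside tables from the coarse pass, the pruning test is a one-liner. A sketch (the table layout (X, i, j) -> score is an assumption):

```python
def prune_mask(inside, outside, sentence_prob, threshold=1e-4):
    # keep a coarse item X[i,j] iff its posterior
    # outside(X,i,j) * inside(X,i,j) / P(sentence) clears the threshold
    keep = {}
    for item, beta in inside.items():
        alpha = outside.get(item, 0.0)
        keep[item] = alpha * beta / sentence_prob >= threshold
    return keep
```

Fine chart items are then built only over coarse items that survive.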
Can we prune the search without sacrificing optimality?
Best-first: process the most promising items first, ordered by a “figure of merit” [Charniak 98].
With a valid A* heuristic, there is no loss of optimality [Klein & Manning 03].
Annotation refines base treebank symbols to improve statistical fit of the grammar (here: head lexicalization).
Lexicalization: annotate each phrasal node with its head word.
Headship is not marked in (most) treebanks, so heads are recovered with head rules, e.g.: the head of an NP is its last noun, the head of a VP is its first verb.
- Choose a head tag and word
- Choose a complement bag
- Generate children (incl. adjuncts)
- Recursively derive children
bestScore(X, i, j, h):
  if (j == i + 1)
    return tagScore(X, s[i])
  else
    return max of:
      max over k, h’, X->YZ:  score(X[h] -> Y[h] Z[h’]) * bestScore(Y,i,k,h) * bestScore(Z,k,j,h’)
      max over k, h’, X->YZ:  score(X[h] -> Y[h’] Z[h]) * bestScore(Y,i,k,h’) * bestScore(Z,k,j,h)

[Diagram: X[h] over i..j built from Y[h] over i..k and Z[h’] over k..j]
Example: (VP->VBD)[saw] combines with NP[her] to form (VP->VBD...NP)[saw].
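The recurrence runs as written once memoized. Here is a sketch over a toy lexicalized grammar; the sentence, tags, rules, and scores are all invented for illustration:

```python
from functools import lru_cache

SENT = ["she", "saw", "her"]
TAG = {("NP", "she"): 1.0, ("VBD", "saw"): 1.0, ("NP", "her"): 1.0}
# binary rules (X, Y, Z, side): prob; side 0 = head from Y, side 1 = head from Z
RULES = {("VP", "VBD", "NP", 0): 1.0, ("S", "NP", "VP", 1): 1.0}

@lru_cache(maxsize=None)
def best_score(X, i, j, h):
    if j == i + 1:
        return TAG.get((X, SENT[i]), 0.0) if h == i else 0.0
    best = 0.0
    for (A, Y, Z, side), p in RULES.items():
        if A != X:
            continue
        for k in range(i + 1, j):          # split point
            for h2 in range(i, j):         # head of the non-head child
                if side == 0 and i <= h < k and k <= h2 < j:
                    s = p * best_score(Y, i, k, h) * best_score(Z, k, j, h2)
                elif side == 1 and i <= h2 < k and k <= h < j:
                    s = p * best_score(Y, i, k, h2) * best_score(Z, k, j, h)
                else:
                    continue
                best = max(best, s)
    return best
```

The nested loops over spans, split points, and two head positions are exactly what makes naive lexicalized parsing so expensive.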
[Diagram: replacing X[h] → Y[h] Z[h’] with X[h] → Y[h] Z, i.e., not tracking the non-head child’s head h’, reduces the work per span]
Pruning in practice: cell beams [Collins 99]
The chart grows too large, so keep only the best K hypotheses for each span <i,j>.
If we keep K hypotheses per span, we do at most O(nK²) work per span (why?).
Keeps things more or less cubic (and in practice it is more like linear!).
Also: certain spans are forbidden entirely on the basis of punctuation (crucial for speed).
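A cell beam is easy to sketch: after filling a span, keep only its K best entries (the cell representation here is hypothetical):

```python
import heapq

def prune_cell(cell, K):
    # cell: {(X, h): score} hypotheses for one span <i,j>;
    # keep only the K highest-scoring ones (cell beam)
    return dict(heapq.nlargest(K, cell.items(), key=lambda kv: kv[1]))
```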
Pruning with a coarse-to-fine approach [Charniak 97+]
[Results table comparing parsers (… / lexical / reranked)]
Annotation refines base treebank symbols to improve statistical fit of the grammar (here: automatically learned latent refinements).
[Diagram: a sentence plus grammar parameters yield many derivations over refined symbols, which collapse to a single parse tree]
EM algorithm, just like Forward-Backward for HMMs (the analogues of the Forward and Backward passes are the inside and outside passes over the tree).
[Diagram: latent symbols X1 … X7 over the sentence “He was right .”]
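Since the tree is known and only the subcategory assignments are summed over, the E-step's inside pass runs over the fixed tree rather than a lattice. A sketch (tree encoding and grammar layout hypothetical):

```python
def inside(tree, grammar, lex, n_sub):
    # tree: (label, left_subtree, right_subtree) or (tag, word);
    # returns inside scores, one per latent subcategory of the root label
    label = tree[0]
    if isinstance(tree[1], str):                       # preterminal node
        return [lex.get((label, a, tree[1]), 0.0) for a in range(n_sub)]
    ls = inside(tree[1], grammar, lex, n_sub)
    rs = inside(tree[2], grammar, lex, n_sub)
    llab, rlab = tree[1][0], tree[2][0]
    return [sum(grammar.get((label, a, llab, b, rlab, c), 0.0) * ls[b] * rs[c]
                for b in range(n_sub) for c in range(n_sub))
            for a in range(n_sub)]
```

An outside pass plus normalization then gives the expected counts for each split rule, mirroring Forward-Backward.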
Refinement of the DT tag: DT → DT-1, DT-2, DT-3, DT-4
Hierarchical refinement
[Chart: parsing accuracy (F1, y-axis 74-90) vs. total number of grammar symbols (x-axis 100-1700)]
Model                   F1
Flat Training           87.3
Hierarchical Training   88.4
Idea: split everything, then roll back the splits that were least useful.
Model              F1
Previous           88.4
With 50% Merging   89.5
Number of Phrasal Subcategories
[Bar chart, y-axis 5-40: learned subcategory counts per phrasal category, in decreasing order: NP VP PP ADVP S ADJP SBAR QP WHNP PRN NX SINV PRT WHPP SQ CONJP FRAG NAC UCP WHADVP INTJ SBARQ RRC WHADJP X ROOT LST]
Number of Lexical Subcategories
[Bar chart, y-axis 10-70: learned subcategory counts per POS tag, in decreasing order: NNP JJ NNS NN VBN RB VBG VB VBD CD IN VBZ VBP DT NNPS CC JJR JJS : PRP PRP$ MD RBR WP POS PDT WRB . EX WP$ WDT '' FW RBS TO $ UH , `` SYM RP LS #]
NNP-14:  Oct. Nov. Sept.
NNP-12:  John Robert James
NNP-2:   J. E. L.
NNP-1:   Bush Noriega Peters
NNP-15:  New San Wall
NNP-3:   York Francisco Street
PRP-0:   It He I
PRP-1:   it he they
PRP-2:   it them him
RBR-0:   further lower higher
RBR-1:   more less More
RBR-2:   earlier Earlier later
CD-7:    two Three
CD-4:    1989 1990 1988
CD-11:   million billion trillion
CD-0:    1 50 100
CD-3:    1 30 31
CD-9:    78 58 34
Lang   Parser                                F1 (≤ 40 words)   F1 (all)
ENG    Charniak & Johnson ’05 (generative)   90.1              89.6
ENG    Split / Merge                         90.6              90.1
GER    Dubey ’05                             76.3              –
GER    Split / Merge                         80.8              80.1
CHN    Chiang et al. ’02                     80.0              76.6
CHN    Split / Merge                         86.3              83.4

Still higher numbers from reranking / self-training methods.
coarse:          … QP NP VP …
split in two:    … QP1 QP2 NP1 NP2 VP1 VP2 …
split in four:   … QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4 …
split in eight:  …