
Natural Language Processing
Parsing II
Dan Klein, UC Berkeley

Learning PCFGs

Treebank PCFGs [Charniak 96]
- Use PCFGs for broad coverage parsing
- Can take a grammar right off the trees (doesn’t work well):

    ROOT → S          1
    S → NP VP .       1
    NP → PRP          1
    VP → VBD ADJP     1
    …..

    Model      F1
    Baseline   72.0

Conditional Independence?
- Not every NP expansion can fill every NP slot
- A grammar with symbols like “NP” won’t be context-free
- Statistically, conditional independence too strong

Non-Independence
- Independence assumptions are often too strong.
- Example: the expansion of an NP is highly dependent on the parent of the NP (i.e., subjects vs. objects).
- Also: the subject and object expansions are correlated!

[Figure: share of the expansions NP PP, DT NN, and PRP among all NPs, NPs under S, and NPs under VP. Plotted values, in descending order: 23%, 21%, 11%, 9%, 9%, 9%, 7%, 6%, 4%.]

Grammar Refinement
- Example: PP attachment
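As an illustration of taking a grammar "right off the trees", the following is a minimal sketch of relative-frequency estimation. The nested-tuple tree encoding, the function names, and the two-sentence toy treebank are assumptions made for this example; they do not come from the slides.

```python
from collections import Counter, defaultdict

def rules(tree):
    """Yield (parent, children) rewrites from a tree given as nested tuples.

    A tree is (label, child, child, ...); a preterminal is (tag, word).
    Only phrasal rewrites are yielded; tag -> word rules are skipped here.
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return  # preterminal such as ("PRP", "He")
    yield label, tuple(child[0] for child in children)
    for child in children:
        yield from rules(child)

def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(X -> rhs) = count(X -> rhs) / count(X)."""
    rule_counts = Counter(r for tree in treebank for r in rules(tree))
    lhs_totals = defaultdict(int)
    for (lhs, _), count in rule_counts.items():
        lhs_totals[lhs] += count
    return {(lhs, rhs): count / lhs_totals[lhs]
            for (lhs, rhs), count in rule_counts.items()}

# Hypothetical two-tree treebank, for illustration only.
toy_treebank = [
    ("ROOT", ("S", ("NP", ("PRP", "He")),
                   ("VP", ("VBD", "was"), ("ADJP", ("JJ", "right"))),
                   (".", "."))),
    ("ROOT", ("S", ("NP", ("DT", "The"), ("NN", "dog")),
                   ("VP", ("VBD", "barked")),
                   (".", "."))),
]
pcfg = estimate_pcfg(toy_treebank)
```

In this toy treebank NP rewrites as PRP once and as DT NN once, so each NP rule gets probability 0.5 regardless of whether the NP is a subject or an object, which is exactly the independence assumption the slides question.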

Grammar Refinement
- Structure Annotation [Johnson ’98, Klein & Manning ’03]
- Lexicalization [Collins ’99, Charniak ’00]
- Latent Variables [Matsuzaki et al. ’05, Petrov et al. ’06]

Structural Annotation

The Game of Designing a Grammar
- Annotation refines base treebank symbols to improve statistical fit of the grammar
- Structural annotation

Typical Experimental Setup
- Corpus: Penn Treebank, WSJ
    Training:     sections 02-21
    Development:  section 22 (here, first 20 files)
    Test:         section 23
- Accuracy (F1): harmonic mean of per-node labeled precision and recall.
- Here: also size (number of symbols in grammar).

Vertical Markovization
- Vertical Markov order: rewrites depend on past k ancestor nodes. (cf. parent annotation)

[Figure: example trees for Order 1 and Order 2; plots of F1 and of number of grammar symbols against vertical Markov order (1, 2v, 2, 3v, 3).]

Horizontal Markovization

[Figure: example trees for Order 1 and Order ∞; plots of F1 and of number of grammar symbols against horizontal Markov order (0, 1, 2v, 2, inf).]
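Vertical Markovization can be sketched as a tree transform that appends ancestor labels to each phrasal symbol. The version below is a minimal sketch under assumptions of this example: trees are nested tuples, preterminals are left unannotated, and "^" is used as the separator (following the NP^S notation that appears later in the slides). The function name is hypothetical.

```python
def vertical_markovize(tree, order=2, ancestors=()):
    """Annotate each phrasal label with its (order - 1) nearest ancestors.

    Trees are nested tuples (label, child, ...); preterminals are (tag, word)
    and are returned unchanged in this sketch. order=1 leaves labels as they
    are; order=2 is parent annotation.
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return tree
    context = ancestors[:order - 1]          # nearest ancestor first
    new_label = "^".join((label,) + context)
    new_children = tuple(
        vertical_markovize(child, order, (label,) + ancestors)
        for child in children
    )
    return (new_label,) + new_children

# Hypothetical example tree.
tree = ("S",
        ("NP", ("PRP", "He")),
        ("VP", ("VBD", "was"),
               ("NP", ("DT", "a"), ("NN", "parser"))))
parent_annotated = vertical_markovize(tree, order=2)
```

With order 2 the subject becomes NP^S and the object becomes NP^VP, so a grammar read off the transformed trees can give the two positions different expansion distributions, at the cost of more symbols.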

Unary Splits
- Problem: unary rewrites used to transmute categories so a high-probability rule can be used.
- Solution: Mark unary rewrite sites with -U

    Annotation   F1     Size
    Base         77.8   7.5K
    UNARY        78.3   8.0K

Tag Splits
- Problem: Treebank tags are too coarse.
- Example: Sentential, PP, and other prepositions are all marked IN.
- Partial Solution: Subdivide the IN tag.

    Annotation   F1     Size
    Previous     78.3   8.0K
    SPLIT-IN     80.3   8.1K

A Fully Annotated (Unlex) Tree

[Figure: fully annotated unlexicalized tree.]

Some Test Set Results

    Parser          LP     LR     CB     0 CB   F1
    Magerman 95     84.9   84.6   1.26   56.6   84.7
    Collins 96      86.3   85.8   1.14   59.9   86.0
    Unlexicalized   86.9   85.7   1.10   60.3   86.3
    Charniak 97     87.4   87.5   1.00   62.1   87.4
    Collins 99      88.7   88.6   0.90   67.1   88.6

- Beats “first generation” lexicalized parsers.
- Lots of room to improve: more complex models next.

Efficient Parsing for Structural Annotation

Grammar Projections

    Coarse Grammar:  NP → DT N’
    Fine Grammar:    NP^S → DT^NP N’[…DT]^NP

Note: X-Bar Grammars are projections with rules like XP → Y X’ or XP → X’ Y or X’ → X
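A grammar projection maps each fine symbol back to the coarse symbol it refines. The sketch below assumes annotations are written as on the Grammar Projections slide, with "^" introducing an ancestor annotation and square brackets holding horizontal history; the function names are hypothetical, and other annotation schemes would need a different pattern.

```python
import re

# Matches a bracketed history such as "[…DT]" or an ancestor mark such as "^NP".
_ANNOTATION = re.compile(r"\[[^\]]*\]|\^[^\s\[\]^]+")

def project_symbol(fine_symbol):
    """Strip structural annotations, e.g. "NP^S" -> "NP"."""
    return _ANNOTATION.sub("", fine_symbol)

def project_rule(lhs, rhs):
    """Project a fine rule (lhs, rhs tuple) onto the coarse grammar."""
    return project_symbol(lhs), tuple(project_symbol(s) for s in rhs)

coarse_rule = project_rule("NP^S", ("DT^NP", "N’[…DT]^NP"))
```

Applied to the fine rule on the slide, this recovers the coarse rule NP → DT N’. Many fine rules project to the same coarse rule, which is what lets a coarse pass rule out whole groups of fine items.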

Coarse-to-Fine Pruning
- For each coarse chart item X[i,j], compute its posterior probability and prune the item when the posterior is < threshold.
- E.g. consider the span 5 to 12:
    coarse:   … QP NP VP …
    refined:  [figure: refined subsymbols of each coarse symbol]

Computing (Max-)Marginals

[Figure]

Inside and Outside Scores

[Figure: item X over span i to j inside a sentence spanning 0 to n.]

Pruning with A*
- You can also speed up the search without sacrificing optimality
- For agenda-based parsers:
  - Can select which items to process first
  - Can do with any “figure of merit” [Charniak 98]
  - If your figure-of-merit is a valid A* heuristic, no loss of optimality [Klein and Manning 03]

A* Parsing

[Figure]

Lexicalization
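The posterior of a coarse item is commonly computed as inside(X,i,j) × outside(X,i,j) divided by the inside score of the root over the whole sentence. The following is a minimal sketch of that computation for a grammar in Chomsky normal form. The dictionary-based grammar encoding, the function names, and the toy probabilities are assumptions of this example, not the parser described on the slides.

```python
from collections import defaultdict

def inside_outside(words, lexicon, binary, root="S"):
    """Inside and outside scores for a CNF PCFG.

    lexicon: {(tag, word): prob}; binary: {(X, Y, Z): prob} for X -> Y Z.
    Both results are keyed by (X, i, j) for the half-open span [i, j).
    """
    n = len(words)
    inside = defaultdict(float)
    for i, w in enumerate(words):
        for (tag, word), p in lexicon.items():
            if word == w:
                inside[tag, i, i + 1] += p
    for length in range(2, n + 1):               # shorter spans first
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (x, y, z), p in binary.items():
                    inside[x, i, j] += p * inside[y, i, k] * inside[z, k, j]
    outside = defaultdict(float)
    outside[root, 0, n] = 1.0
    for length in range(n, 1, -1):               # longer spans first
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (x, y, z), p in binary.items():
                    out = outside[x, i, j]
                    outside[y, i, k] += p * out * inside[z, k, j]
                    outside[z, k, j] += p * out * inside[y, i, k]
    return inside, outside

def surviving_items(words, lexicon, binary, threshold, root="S"):
    """Chart items whose posterior is at least `threshold`."""
    inside, outside = inside_outside(words, lexicon, binary, root)
    total = inside[root, 0, len(words)]
    if total == 0.0:
        return set()                             # sentence has no parse
    return {item for item in list(inside)
            if inside[item] * outside[item] / total >= threshold}

# Hypothetical toy grammar: "stars" is ambiguous between NP and V.
lexicon = {("NP", "she"): 0.5, ("NP", "stars"): 0.5,
           ("V", "saw"): 0.9, ("V", "stars"): 0.1}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
kept = surviving_items(["she", "saw", "stars"], lexicon, binary, threshold=1e-4)
```

In the toy sentence the verb reading of "stars" has a nonzero inside score but no outside support, so its posterior is zero and it is pruned; a fine-grammar pass would then skip every refinement of that item.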

The Game of Designing a Grammar
- Annotation refines base treebank symbols to improve statistical fit of the grammar
- Structural annotation [Johnson ’98, Klein and Manning 03]
- Head lexicalization [Collins ’99, Charniak ’00]

Problems with PCFGs
- If we do no annotation, these trees differ only in one rule:
  - VP → VP PP
  - NP → NP PP
- Parse will go one way or the other, regardless of words
- We addressed this in one way with unlexicalized grammars (how?)
- Lexicalization allows us to be sensitive to specific words

Problems with PCFGs
- What’s different between basic PCFG scores here?
- What (lexical) correlations need to be scored?

Lexicalized Trees
- Add “head words” to each phrasal node
  - Syntactic vs. semantic heads
  - Headship not in (most) treebanks
  - Usually use head rules, e.g.:
    - NP:
      - Take leftmost NP
      - Take rightmost N*
      - Take rightmost JJ
      - Take right child
    - VP:
      - Take leftmost VB*
      - Take leftmost VP
      - Take left child

Lexicalized PCFGs?
- Problem: we now have to estimate probabilities like [formula not preserved in the extraction]
- Never going to get these atomically off of a treebank
- Solution: break up derivation into smaller steps

Lexical Derivation Steps
- A derivation of a local tree [Collins 99]
  - Choose a head tag and word
  - Choose a complement bag
  - Generate children (incl. adjuncts)
  - Recursively derive children
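The head rules listed on the Lexicalized Trees slide can be applied mechanically to percolate head words up a tree. The sketch below encodes only the NP and VP rules given there. The nested-tuple trees, the X[word] label format, the function names, and the fallback to the left child for categories without rules are assumptions of this example.

```python
# Priority-ordered (direction, pattern) rules from the slide; "*" is a prefix wildcard.
HEAD_RULES = {
    "NP": [("left", "NP"), ("right", "N*"), ("right", "JJ")],
    "VP": [("left", "VB*"), ("left", "VP")],
}
# Last-resort child per the slide; other categories default to the left child
# (an assumption of this sketch).
FALLBACK = {"NP": "right", "VP": "left"}

def matches(label, pattern):
    if pattern.endswith("*"):
        return label.startswith(pattern[:-1])
    return label == pattern

def head_index(parent, child_labels):
    """Index of the head child according to HEAD_RULES."""
    for direction, pattern in HEAD_RULES.get(parent, []):
        indices = range(len(child_labels))
        if direction == "right":
            indices = reversed(indices)
        for idx in indices:
            if matches(child_labels[idx], pattern):
                return idx
    return len(child_labels) - 1 if FALLBACK.get(parent) == "right" else 0

def lexicalize(tree):
    """Return (tree with labels rewritten as X[head], head word)."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        word = children[0]
        return (f"{label}[{word}]", word), word
    results = [lexicalize(child) for child in children]
    head = results[head_index(label, [c[0] for c in children])][1]
    return (f"{label}[{head}]",) + tuple(t for t, _ in results), head

# Hypothetical example tree.
lexicalized, head_word = lexicalize(
    ("VP", ("VBD", "saw"), ("NP", ("DT", "the"), ("NN", "dog"))))
```

Here the VP takes its head from the leftmost VB* child ("saw") and the NP from its rightmost N* child ("dog"), giving VP[saw] over NP[dog], which is the pair of words a lexicalized rule score can then condition on.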

Lexicalized CKY

    (VP->VBD...NP •)[saw]   built from   (VP->VBD •)[saw]  and  NP[her]

    X[h] -> Y[h] Z[h’], with Y[h] over span i..k (head at h) and Z[h’] over span k..j (head at h’)

    bestScore(X,i,j,h)
      if (j = i+1)
        return tagScore(X,s[i])
      else
        return max of
          max over k,h’,X->YZ of
            score(X[h]->Y[h] Z[h’]) * bestScore(Y,i,k,h) * bestScore(Z,k,j,h’)
          max over k,h’,X->YZ of
            score(X[h]->Y[h’] Z[h]) * bestScore(Y,i,k,h’) * bestScore(Z,k,j,h)

Efficient Parsing for Lexical Grammars

Quartic Parsing
- Turns out, you can do (a little) better [Eisner 99]
- Gives an O(n^4) algorithm
- Still prohibitive in practice if not pruned

[Figure: X[h] -> Y[h] Z[h’] over indices i, h, k, h’, j, decomposed through an intermediate item X[h] -> Y[h] Z over indices i, h, k, j.]

Pruning with Beams
- The Collins parser prunes with per-cell beams [Collins 99]
  - Essentially, run the O(n^5) CKY
  - Remember only a few hypotheses for each span <i,j>.
  - If we keep K hypotheses at each span, then we do at most O(nK^2) work per span (why?)
  - Keeps things more or less cubic (and in practice is more like linear!)
- Also: certain spans are forbidden entirely on the basis of punctuation (crucial for speed)

Pruning with a PCFG
- The Charniak parser prunes using a two-pass, coarse-to-fine approach [Charniak 97+]
  - First, parse with the base grammar
  - For each X:[i,j] calculate P(X|i,j,s)
    - This isn’t trivial, and there are clever speed ups
  - Second, do the full O(n^5) CKY
    - Skip any X:[i,j] which had low (say, < 0.0001) posterior
  - Avoids almost all work in the second phase!
- Charniak et al 06: can use more passes
- Petrov et al 07: can use many more passes

Results
- Some results
  - Collins 99: 88.6 F1 (generative lexical)
  - Charniak and Johnson 05: 89.7 / 91.3 F1 (generative lexical / reranked)
  - Petrov et al 06: 90.7 F1 (generative unlexical)
  - McClosky et al 06: 92.1 F1 (gen + rerank + self-train)
- However
  - Bilexical counts rarely make a difference (why?)
  - Gildea 01: Removing bilexical counts costs < 0.5 F1
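The bestScore recurrence from the Lexicalized CKY slide can be written as a memoized function. This is a minimal sketch with no pruning, so it has the O(n^5) behavior the slides warn about. Here h is a head position rather than a word, and the scoring callbacks, their signatures, and the toy score tables are assumptions of this example.

```python
from functools import lru_cache

def make_best_score(words, tag_score, rule_score, binary_rules):
    """Build bestScore(X, i, j, h) for span [i, j) with head position h.

    tag_score(tag, word) and rule_score(X, Y, Z, head_word, other_word, side)
    are caller-supplied; side is "left" if the head comes from Y, else "right".
    """
    @lru_cache(maxsize=None)
    def best_score(x, i, j, h):
        if not i <= h < j:
            return 0.0
        if j == i + 1:
            return tag_score(x, words[i])
        best = 0.0
        for parent, y, z in binary_rules:
            if parent != x:
                continue
            for k in range(i + 1, j):
                if h < k:      # X[h] -> Y[h] Z[h2]
                    for h2 in range(k, j):
                        best = max(best,
                                   rule_score(x, y, z, words[h], words[h2], "left")
                                   * best_score(y, i, k, h) * best_score(z, k, j, h2))
                else:          # X[h] -> Y[h2] Z[h]
                    for h2 in range(i, k):
                        best = max(best,
                                   rule_score(x, y, z, words[h], words[h2], "right")
                                   * best_score(y, i, k, h2) * best_score(z, k, j, h))
        return best
    return best_score

# Hypothetical toy scores, for illustration only.
TAGS = {("NP", "she"): 1.0, ("VBD", "saw"): 1.0, ("NP", "her"): 1.0}
RULES = {("S", "NP", "VP", "right"): 1.0, ("VP", "VBD", "NP", "left"): 1.0}
AFFINITY = {("saw", "her"): 0.8, ("saw", "she"): 0.6}

def tag_score(tag, word):
    return TAGS.get((tag, word), 0.0)

def rule_score(x, y, z, head_word, other_word, side):
    return RULES.get((x, y, z, side), 0.0) * AFFINITY.get((head_word, other_word), 0.1)

best_score = make_best_score(("she", "saw", "her"), tag_score, rule_score,
                             [("S", "NP", "VP"), ("VP", "VBD", "NP")])
top = best_score("S", 0, 3, 1)   # S over the whole sentence, headed by "saw"
```

For the toy sentence the only derivation headed by "saw" scores about 0.48 (0.6 for attaching "she" times 0.8 for attaching "her"). The five free indices i, j, k, h, h2 are where the O(n^5) cost comes from, which is what the beam and coarse-to-fine pruning in the slides are meant to tame.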

Latent Variable PCFGs

The Game of Designing a Grammar
- Annotation refines base treebank symbols to improve statistical fit of the grammar
- Parent annotation [Johnson ’98]
- Head lexicalization [Collins ’99, Charniak ’00]
- Automatic clustering?

Latent Variable Grammars

[Figure: a parse tree over the sentence “He was right .”, its derivations, and the grammar parameters.]

Learning Latent Annotations
EM algorithm:
- Brackets are known
- Base categories are known
- Only induce subcategories

Just like Forward-Backward for HMMs.

[Figure: tree with latent nodes X1 … X7 over “He was right .”, with Forward and Backward passes indicated.]

Refinement of the DT tag

    DT → DT-1, DT-2, DT-3, DT-4

Hierarchical refinement

[Figure]
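One step of hierarchical refinement can be sketched as splitting every symbol in two and spreading each rule's probability over the resulting sub-symbol combinations, with a little noise so that EM can later pull the copies apart. The slides do not spell out this procedure; the binary-rules-only grammar encoding, the "-0"/"-1" naming, the noise level, and the function name are all assumptions of this example, and the EM re-estimation itself is not shown.

```python
import itertools
import random

def split_grammar(binary, noise=0.01, seed=0):
    """Split every symbol in two and return renormalized rule probabilities.

    binary: {(X, Y, Z): P(X -> Y Z)}. Each rule becomes 8 rules over the
    sub-symbols X-0/X-1, Y-0/Y-1, Z-0/Z-1. Small random jitter breaks the
    symmetry between copies; without it EM would leave them identical.
    """
    rng = random.Random(seed)
    split = {}
    for (x, y, z), p in binary.items():
        for a, b, c in itertools.product((0, 1), repeat=3):
            jitter = 1.0 + rng.uniform(-noise, noise)
            # Each parent copy shares p among 4 child combinations.
            split[f"{x}-{a}", f"{y}-{b}", f"{z}-{c}"] = p / 4 * jitter
    totals = {}
    for (parent, _, _), p in split.items():
        totals[parent] = totals.get(parent, 0.0) + p
    return {rule: p / totals[rule[0]] for rule, p in split.items()}

# Hypothetical toy grammar.
toy_binary = {("S", "NP", "VP"): 1.0,
              ("VP", "V", "NP"): 0.7,
              ("VP", "VP", "PP"): 0.3}
refined = split_grammar(toy_binary)
```

Repeating split and re-estimation yields the hierarchy suggested by the DT example, where one tag is refined into DT-1 through DT-4 over two rounds.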

Hierarchical Estimation Results

[Figure: parsing accuracy (F1) against total number of grammar symbols.]

    Model                   F1
    Flat Training           87.3
    Hierarchical Training   88.4

Refinement of the , tag
- Splitting all categories equally is wasteful:

[Figure]

Adaptive Splitting
- Want to split complex categories more
- Idea: split everything, roll back splits which were least useful

Adaptive Splitting Results

    Model              F1
    Previous           88.4
    With 50% Merging   89.5

Number of Phrasal Subcategories

[Figure: number of subcategories learned per phrasal category. Categories shown: NP, VP, PP, ADVP, S, ADJP, SBAR, QP, WHNP, PRN, NX, SINV, PRT, WHPP, SQ, CONJP, FRAG, NAC, UCP, WHADVP, INTJ, SBARQ, RRC, WHADJP, X, ROOT, LST.]

Number of Lexical Subcategories

[Figure: number of subcategories learned per part-of-speech tag. Tags shown: NNP, JJ, NNS, NN, VBN, RB, VBG, VB, VBD, CD, IN, VBZ, VBP, DT, NNPS, CC, JJR, JJS, :, PRP, PRP$, MD, RBR, WP, POS, PDT, WRB, -LRB-, ., EX, WP$, WDT, -RRB-, '', FW, RBS, TO, $, UH, ,, ``, SYM, RP, LS, #.]
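The "split everything, roll back the least useful splits" idea can be sketched as a selection step over per-split usefulness estimates. How usefulness is measured is not given on the slides; the sketch below simply assumes some score per split is available (for example, an estimated likelihood loss from undoing it). The function name and the scores are hypothetical.

```python
def splits_to_merge(split_gain, fraction=0.5):
    """Pick the least useful splits to roll back.

    split_gain: {symbol: estimated gain from keeping that symbol's latest split}.
    Returns the `fraction` of symbols with the smallest gain.
    """
    ranked = sorted(split_gain, key=split_gain.get)   # least useful first
    return set(ranked[: int(len(ranked) * fraction)])

# Hypothetical gains, for illustration only.
gains = {"NP": 41.0, "VP": 35.5, "DT": 12.0, ",": 0.2, "LST": 0.0, "WHADJP": 0.1}
merged_back = splits_to_merge(gains, fraction=0.5)
```

With these made-up numbers the splits of ",", LST, and WHADJP are undone while NP, VP, and DT keep theirs, which mirrors the pattern in the subcategory charts: complex categories end up with many subcategories and simple ones with few.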
