Polynomial time parsing of PCFGs
Gerald Penn (some slides from Pi-Chuan Chang and Christopher Manning)
0. Chomsky Normal Form
- All rules are of the form X → Y Z or X → w.
- A transformation to this form doesn’t change the
weak generative capacity of CFGs.
- With some extra book-keeping in symbol names, you can
even reconstruct the same trees with a detransform
- Unaries/empties are removed recursively
- n-ary rules introduce new nonterminals (n > 2)
- VP → V NP PP becomes VP → V @VP->_V and @VP->_V → NP PP (see the sketch below)
- In practice it’s a pain
- Reconstructing n-aries is easy
- Reconstructing unaries can be trickier
- But it makes parsing easier/more efficient
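As a concrete illustration, here is a minimal Python sketch of the binarization transform, using the @-symbol naming convention from these slides; the encoding of rules as (lhs, children-list) tuples is this sketch's own.

# A minimal sketch: turn one n-ary rule (n > 2) into a chain of binary
# rules, naming intermediates after the children seen so far.
def binarize(lhs, rhs):
    if len(rhs) <= 2:
        return [(lhs, rhs)]
    rules, prev, seen = [], lhs, []
    for child in rhs[:-2]:
        seen.append(child)
        new_sym = "@%s->_%s" % (lhs, "_".join(seen))   # e.g. @VP->_V
        rules.append((prev, [child, new_sym]))
        prev = new_sym
    rules.append((prev, rhs[-2:]))
    return rules

print(binarize("VP", ["V", "NP", "PP"]))
# [('VP', ['V', '@VP->_V']), ('@VP->_V', ['NP', 'PP'])]

The detransform is just the reverse walk: splice out any @-symbol and reattach its children to the parent.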
An example: before binarization…
(ROOT (S (NP (N cats)) (VP (V scratch) (NP (N people)) (PP (P with) (NP (N claws))))))
After binarization…
(ROOT (S (NP (N cats)) (@S->_NP (VP (V scratch) (@VP->_V (NP (N people)) (@VP->_V_NP (PP (P with) (@PP->_P (NP (N claws))))))))))
Treebank: empties and unaries
(Tree diagrams for the one-word sentence "Atone", under successive transforms:
PTB Tree: (TOP (S-HLN (NP-SUBJ (-NONE- *)) (VP (VB Atone))))
NoFuncTags: (TOP (S (NP (-NONE- *)) (VP (VB Atone))))
NoEmpties: (TOP (S (VP (VB Atone))))
NoUnaries, high attachment: (TOP (S Atone)); low attachment: (TOP (VB Atone)))
Constituency Parsing
PCFG rule probabilities θi:
θ0: S → NP VP
θ1: NP → NN NNS
…
θ42: NN → Factory
θ43: NNS → payrolls
…
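For the examples that follow, a PCFG can be kept in plain dictionaries. A minimal sketch using the rules above; the probability for S → NP VP and the unary entry are made up for illustration.

# Rule probabilities as dictionaries (illustrative numbers, not estimates).
binary_rules = {                 # P(A -> B C)
    ("S", "NP", "VP"): 0.9,
    ("NP", "NN", "NNS"): 0.13,
}
unary_rules = {                  # P(A -> B), nonterminal unaries
    ("ROOT", "S"): 1.0,
}
lexical_rules = {                # P(A -> w)
    ("NN", "Factory"): 0.0023,
    ("NNS", "payrolls"): 0.0014,
}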
1. Cocke-Kasami-Younger (CKY) Constituency Parsing
Factory payrolls fell in September
Viterbi (Max) Scores
Factory payrolls:
NN → Factory 0.0023, NNP → Factory 0.001, NNS → payrolls 0.0014
NP → NN NNS (0.13): i_NP = (0.13)(0.0023)(0.0014) = 4.19 × 10^-7
NP → NNP NNS (0.056): i_NP = (0.056)(0.001)(0.0014) = 7.84 × 10^-8
The NP cell keeps the max: 4.19 × 10^-7
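The same cell computation written out in Python, as a sketch using the slide's numbers; the dictionary names are this example's own.

# Viterbi score for NP over "Factory payrolls": max over binary rules.
left  = {"NN": 0.0023, "NNP": 0.001}     # scores for "Factory"
right = {"NNS": 0.0014}                  # scores for "payrolls"
rules = {("NP", "NN", "NNS"): 0.13, ("NP", "NNP", "NNS"): 0.056}

best = max(p * left[b] * right[c]
           for (a, b, c), p in rules.items()
           if b in left and c in right)
print(best)                              # 4.186e-07, from NP -> NN NNS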
Extended CKY parsing
- Unaries can be incorporated into the algorithm
- Messy, but doesn’t increase algorithmic complexity
- Empties can be incorporated
- Use fenceposts
- Doesn’t increase complexity; essentially like unaries
- Binarization is vital
- All sorts of optimizations depend on this
- Binarization may be an explicit transformation or implicit in how the parser works (Earley-style dotted rules), but it's almost always there.
function CKY(words, grammar) returns most probable parse/prob
  score = new double[#(words)+1][#(words)+1][#(nonterms)]
  back = new Pair[#(words)+1][#(words)+1][#(nonterms)]
  for i = 0; i < #(words); i++
    for A in nonterms
      if A -> words[i] in grammar
        score[i][i+1][A] = P(A -> words[i])
    // handle unaries
    boolean added = true
    while added
      added = false
      for A, B in nonterms
        if score[i][i+1][B] > 0 && A -> B in grammar
          prob = P(A -> B) * score[i][i+1][B]
          if prob > score[i][i+1][A]
            score[i][i+1][A] = prob
            back[i][i+1][A] = B
            added = true
The CKY algorithm (1960/1965) … generalized
for span = 2 to #(words)
  for begin = 0 to #(words) - span
    end = begin + span
    for split = begin+1 to end-1
      for A, B, C in nonterms
        prob = score[begin][split][B] * score[split][end][C] * P(A -> B C)
        if prob > score[begin][end][A]
          score[begin][end][A] = prob
          back[begin][end][A] = new Triple(split, B, C)
    // handle unaries
    boolean added = true
    while added
      added = false
      for A, B in nonterms
        prob = P(A -> B) * score[begin][end][B]
        if prob > score[begin][end][A]
          score[begin][end][A] = prob
          back[begin][end][A] = B
          added = true
return buildTree(score, back)
The CKY algorithm (1960/1965) … generalized
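A runnable Python rendering of the two pseudocode fragments above; a sketch, with grammars as plain dictionaries (as in the earlier PCFG sketch) and probabilities at most 1 so the unary closure terminates.

# Viterbi CKY with a unary-closure pass after each cell is filled.
from collections import defaultdict

def cky(words, lexicon, unaries, binaries):
    """lexicon: {(A, w): p}, unaries: {(A, B): p}, binaries: {(A, B, C): p}.
    Returns the (score, back) tables."""
    n = len(words)
    score = defaultdict(float)        # (begin, end, A) -> best probability
    back = {}                         # (begin, end, A) -> backpointer

    def handle_unaries(begin, end):   # the while-added loop from the slides
        added = True
        while added:
            added = False
            for (a, b), p in unaries.items():
                prob = p * score[begin, end, b]
                if prob > score[begin, end, a]:
                    score[begin, end, a] = prob
                    back[begin, end, a] = b
                    added = True

    for i, w in enumerate(words):     # lexical step
        for (a, word), p in lexicon.items():
            if word == w and p > score[i, i + 1, a]:
                score[i, i + 1, a] = p
        handle_unaries(i, i + 1)

    for span in range(2, n + 1):      # binary step, shorter spans first
        for begin in range(n - span + 1):
            end = begin + span
            for split in range(begin + 1, end):
                for (a, b, c), p in binaries.items():
                    prob = score[begin, split, b] * score[split, end, c] * p
                    if prob > score[begin, end, a]:
                        score[begin, end, a] = prob
                        back[begin, end, a] = (split, b, c)
            handle_unaries(begin, end)
    return score, back

def build_tree(back, words, begin, end, a):
    """Follow backpointers; trees come out as nested (label, ...) tuples."""
    bp = back.get((begin, end, a))
    if bp is None:
        return (a, words[begin])                               # lexical
    if not isinstance(bp, tuple):
        return (a, build_tree(back, words, begin, end, bp))    # unary
    split, b, c = bp
    return (a, build_tree(back, words, begin, split, b),
               build_tree(back, words, split, end, c))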
(Chart diagram: the triangular table of cells score[0][1] … score[4][5] up to score[0][5] for the 5-word sentence "cats scratch walls with claws". The lexical step fills each diagonal cell score[i][i+1] with the tag scores for word i, e.g. N→cats, P→cats, V→cats.)
for i = 0; i < #(words); i++
  for A in nonterms
    if A -> words[i] in grammar
      score[i][i+1][A] = P(A -> words[i])
(After the unary pass ("// handle unaries"), each diagonal cell also holds NP→N, @VP->_V→NP and @PP->_P→NP. The length-2 cells are then filled with PP→P @PP->_P and VP→V @VP->_V.)
The binary step:
prob = score[begin][split][B] * score[split][end][C] * P(A -> B C)
e.g. prob = score[0][1][P] * score[1][2][@PP->_P] * P(PP → P @PP->_P)
For each A, only keep the "A → B C" with highest prob.
(Successive chart snapshots: the binary step and the unary pass alternate over longer and longer spans, each cell keeping one Viterbi score per category, until the top cell score[0][5] holds scores for entries such as S→NP @S->_NP and ROOT→S.)
Call buildTree(score, back) to get the best parse
Unary rules: alchemy in the land of treebanks
Same-Span Reachability
(Diagram: same-span reachability among the NoEmpties grammar categories TOP, ADJP, ADVP, FRAG, INTJ, NP, PP, PRN, QP, S, SBAR, UCP, VP, WHNP, LST, CONJP, WHADJP, WHADVP, WHPP, NX, NAC, SBARQ, SINV, RRC, SQ, X, PRT.)
Efficient CKY parsing
- CKY parsing can be made very fast (!), partly due to
the simplicity of the structures used.
- But that means a lot of the speed comes from engineering
details
- And a little from cleverer filtering
- Store chart as (ragged) 3 dimensional array of float (log
probabilities)
- score[start][end][category]
- For treebank grammars the load is high enough that you don’t really
gain from lists of things that were possible
- 50 words: (50×50)/2 × (1,000 to 20,000) × 4 bytes = 5–100 MB for the parse triangle. Large. (Can move to a beam for span [i][j].)
- Use int to represent categories/words (Index); see the sketch below.
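A minimal sketch of this storage scheme; the names and the flat layout are this sketch's own, and a real implementation would allocate the ragged triangle rather than the full square used here.

# Chart as one flat array of 4-byte floats holding log-probabilities;
# -inf marks an empty cell. Categories are interned as ints.
import array, math

NEG_INF = -math.inf
cat_index = {}                                   # category string -> int

def index_of(cat):
    return cat_index.setdefault(cat, len(cat_index))

def make_chart(n_words, n_cats):
    size = (n_words + 1) * (n_words + 1) * n_cats
    return array.array("f", [NEG_INF]) * size    # ~4 bytes per cell

def cell(n_words, n_cats, begin, end, cat):
    # index of score[begin][end][cat] in the flat array
    return (begin * (n_words + 1) + end) * n_cats + cat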
Efficient CKY parsing
- Provide efficient grammar/lexicon accessors:
- E.g., return the list of rules with a given left child category (see the sketch after this list)
- Iterate over left child, check for zero (Neg. inf.) prob of
X:[i,j] (abort loop), otherwise get rules with X on left
- Some X:[i,j] can be filtered based on the input string
- Not enough space to complete a long flat rule?
- No word in the string can be a CC?
- Using a lexicon of possible POS tags for each word gives a lot of constraint, rather than allowing all POS tags for every word
- Cf. later discussion of figures-of-merit/A* heuristics
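A sketch of such an accessor, indexing binary rules by their left child; the class and method names are this sketch's own, not any real parser's API.

# Group binary rules by left child so the inner loop can abort early.
from collections import defaultdict

class Grammar:
    def __init__(self, binaries):                # {(A, B, C): prob}
        self.by_left = defaultdict(list)
        for (a, b, c), p in binaries.items():
            self.by_left[b].append((a, c, p))

    def rules_with_left_child(self, b):
        return self.by_left[b]

# Inner-loop shape: skip dead left children before touching any rules.
#   for b in nonterms:
#       if score[begin][split][b] == NEG_INF:
#           continue                              # abort this left child
#       for a, c, p in grammar.rules_with_left_child(b):
#           ...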
Quiz Question!
Which constituent (with probability) can you make over "runs down"?

runs: NNS 0.0023, VB 0.001
down: PP 0.2, IN 0.0014, NNS 0.0001

PP → IN 0.002
NP → NNS NNS 0.01
NP → NNS NP 0.005
NP → NNS PP 0.01
VP → VB PP 0.045
VP → VB NP 0.015

a) NP (2.3e-9)  b) NP (4.6e-6)  c) PP (.0004)  d) VP (.0002)
3. Evaluating Parsing Accuracy
- Most sentences are not given a completely correct parse
by any currently existing parser.
- For Penn Treebank parsing, the standard evaluation is over the number of correct constituents (labeled spans).
- A constituent is a triple [label, start, finish], which must appear exactly in the true parse for the constituent to be marked correct.
- The LP/LR F1 is the micro-averaged harmonic mean of labeled constituent precision and recall (see the sketch below).
- This isn’t necessarily a great measure … many people
think dependency accuracy or raw data likelihood would be better.
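A sketch of the metric, with constituents as [label, start, finish] triples micro-averaged over sentences; real evalb-style scoring also has conventions for duplicates and punctuation that this sketch ignores.

# Micro-averaged labeled precision/recall/F1 over constituent sets.
def labeled_prf1(gold_sets, guess_sets):
    correct = sum(len(g & h) for g, h in zip(gold_sets, guess_sets))
    precision = correct / sum(len(h) for h in guess_sets)
    recall = correct / sum(len(g) for g in gold_sets)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

gold  = [{("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)}]
guess = [{("S", 0, 5), ("NP", 0, 1), ("VP", 2, 5)}]
print(labeled_prf1(gold, guess))   # (0.667, 0.667, 0.667): 2 of 3 correct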
How good are PCFGs?
- Robust (usually admit everything, but with low
probability)
- Partial solution for grammar ambiguity: a PCFG gives
some idea of the plausibility of a sentence
- But not so good because the independence
assumptions are too strong
- Give a probabilistic language model
- But in a simple case it performs worse than a trigram
model
- WSJ parsing accuracy: about 73% LP/LR F1
- The problem seems to be that PCFGs lack the
lexicalization of a trigram model
Putting words into PCFGs
- A PCFG uses the actual words only to determine the
probability of parts-of-speech (the preterminals)
- In many cases we need to know about words to choose a
parse
- The head word of a phrase gives a good representation of
the phrase’s structure and meaning
- Attachment ambiguities
The astronomer saw the moon with the telescope
- Coordination
the dogs in the house and the cats
- Subcategorization frames
put versus like
(Head) Lexicalization
- put takes both an NP and a PP
- Sue put [ the book ]NP [ on the table ]PP
- * Sue put [ the book ]NP
- * Sue put [ on the table ]PP
- like usually takes an NP and not a PP
- Sue likes [ the book ]NP
- * Sue likes [ on the table ]PP
- We can’t tell this if we just have a VP with a verb, but
we can if we know what verb it is
4. Accurate Unlexicalized Parsing: PCFGs and Independence
- The symbols in a PCFG define independence
assumptions:
- At any node, the material inside that node is independent of the material outside that node, given the label of that node.
- Any information that statistically connects behavior inside
and outside a node must flow through that node.
(Diagram: a tree with S → NP VP; the NP expands by NP → DT NN regardless of its outside context.)
Non-Independence I
- Independence assumptions are often too strong.
- Example: the expansion of an NP is highly dependent on the
parent of the NP (i.e., subjects vs. objects).
Expansion    All NPs   NPs under S   NPs under VP
NP PP          11%          9%            23%
DT NN           9%          9%             7%
PRP             6%         21%             4%
Michael Collins (2003, COLT): a vanilla treebank PCFG gets about 73% accuracy; lexicalized parsing reaches about 88% accuracy.
Non-Independence II
- Who cares?
- NB, HMMs, all make false assumptions!
- For generation/LMs, consequences would be obvious.
- For parsing, does it impact accuracy?
- Symptoms of overly strong assumptions:
- Rewrites get used where they don’t belong.
- Rewrites get used too often or too rarely.
In the PTB, this construction is for possessives
Breaking Up the Symbols
- We can relax independence assumptions by encoding
dependencies into the PCFG symbols:
- What are the most useful features to encode?
(Figure: parent annotation [Johnson 98]; marking possessive NPs.)
Annotations
- Annotations split the grammar categories into sub-
categories.
- Conditioning on history vs. annotating
- P(NP^S → PRP) is a lot like P(NP → PRP | S)
- P(NP-POS → NNP POS) isn't history conditioning.
- Feature grammars vs. annotation
- Can think of a symbol like NP^NP-POS as
NP [parent:NP, +POS]
- After parsing with an annotated grammar, the
annotations are then stripped for evaluation.
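A sketch of annotation and stripping on tuple-encoded trees; the ^parent notation follows the slides, everything else (the tuple encoding, the function names) is this sketch's own.

# Parent annotation (v=2) and its removal before evaluation.
def annotate_parent(tree, parent=None):
    if isinstance(tree, str):                    # a word
        return tree
    label = tree[0] + ("^" + parent if parent else "")
    return (label,) + tuple(annotate_parent(c, tree[0]) for c in tree[1:])

def strip_annotation(tree):
    if isinstance(tree, str):
        return tree
    return (tree[0].split("^")[0],) + tuple(strip_annotation(c) for c in tree[1:])

t = ("S", ("NP", "cats"), ("VP", "scratch"))
print(annotate_parent(t))    # ('S', ('NP^S', 'cats'), ('VP^S', 'scratch'))
print(strip_annotation(annotate_parent(t)) == t)   # True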
Experimental Setup
- Corpus: Penn Treebank, WSJ
- Accuracy – F1: harmonic mean of per-node labeled
precision and recall.
- Size – number of symbols in grammar.
- Passive / complete symbols: NP, NP^S
- Active / incomplete symbols: NP → NP CC •
- Training: sections 02-21
- Development: section 22 (first 20 files)
- Test: section 23
Experimental Process
- We’ll take a highly conservative approach:
- Annotate as sparingly as possible
- Highest accuracy with fewest symbols
- Error-driven, manual hill-climb, adding one annotation type
at a time
Lexicalization
- Lexical heads are important for certain classes of
ambiguities (e.g., PP attachment):
- Lexicalizing grammar creates a much larger
grammar.
- Sophisticated smoothing needed
- Smarter parsing algorithms needed
- More data needed
- How necessary is lexicalization?
- Bilexical vs. monolexical selection
- Closed vs. open class lexicalization
Unlexicalized PCFGs
- What do we mean by an “unlexicalized” PCFG?
- Grammar rules are not systematically specified down to the level of lexical items
- NP-stocks is not allowed
- NP^S-CC is fine
- Closed vs. open class words (NP^S-the)
- Long tradition in linguistics of using function words as features or
markers for selection
- Contrary to the bilexical idea of semantic heads
- Open-class selection really a proxy for semantics
- Honesty checks:
- Number of symbols: keep the grammar very small
- No smoothing: over-annotating is a real danger
Horizontal Markovization
- Horizontal Markovization merges states.
(Charts: parsing accuracy, roughly 70–74%, and grammar size, roughly 3,000–12,000 symbols, as a function of horizontal Markov order ∈ {1, 2v, 2, ∞}.)
Vertical Markovization
- Vertical Markov order: rewrites depend on the past k ancestor nodes (cf. parent annotation).
(Diagrams: example trees at Order 1 and Order 2.)
(Charts: parsing accuracy, roughly 72–79%, and grammar size, roughly 5,000–25,000 symbols, as a function of vertical Markov order ∈ {1, 2v, 2, 3v, 3}.)
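A sketch of both markovizations as label-naming functions; orders are plain integers here (the 2v merged-state variants in the charts are not modeled), and the naming conventions are this sketch's own.

# Vertical: a label remembers its v-1 most recent ancestors.
def vertical(label, ancestors, v):
    history = ancestors[-(v - 1):] if v > 1 else []
    return "^".join([label] + history)

# Horizontal: a binarization intermediate remembers only its last h
# preceding siblings (h >= 1; h=None keeps all, i.e. h = infinity).
def horizontal(lhs, preceding, h=None):
    kept = preceding if h is None else preceding[-h:]
    return "@%s->_%s" % (lhs, "_".join(kept))

print(vertical("NP", ["S", "VP"], 2))             # NP^VP (parent annotation)
print(vertical("NP", ["S", "VP"], 3))             # NP^S^VP
print(horizontal("VP", ["V", "NP", "PP"], 1))     # @VP->_PP: last sibling only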
Vertical and Horizontal
- Examples:
- Raw treebank: v=1, h=∞
- Johnson 98: v=2, h=∞
- Collins 99: v=2, h=2
- Best F1: v=3, h=2v
(Charts: F1, roughly 66–80%, and number of symbols, roughly 5,000–25,000, over combinations of horizontal order ∈ {1, 2v, 2, ∞} and vertical order ∈ {1, 2, 3}.)
Model           F1     Size
Base: v=h=2v    77.8   7.5K
Unary Splits
- Problem: unary rewrites are used to transmute categories so a high-probability rule can be used.
- Solution: mark unary rewrite sites with -U.

Annotation   F1     Size
Base         77.8   7.5K
UNARY        78.3   8.0K
Tag Splits
- Problem: Treebank tags are too coarse.
- Example: sentential, PP, and other prepositions are all marked IN.
- Partial solution: subdivide the IN tag.

Annotation   F1     Size
Previous     78.3   8.0K
SPLIT-IN     80.3   8.1K
Other Tag Splits
- UNARY-DT: mark demonstratives as DT^U ("the X" vs. "those") (F1 80.4, Size 8.1K)
- UNARY-RB: mark phrasal adverbs as RB^U ("quickly" vs. "very") (80.5, 8.1K)
- TAG-PA: mark tags with non-canonical parents ("not" is an RB^VP) (81.2, 8.5K)
- SPLIT-AUX: mark auxiliary verbs with -AUX [cf. Charniak 97] (81.6, 9.0K)
- SPLIT-CC: separate "but" and "&" from other conjunctions (81.7, 9.1K)
- SPLIT-%: "%" gets its own tag (81.8, 9.3K)
Yield Splits
- Problem: sometimes the behavior of a category depends on something inside its future yield.
- Examples:
- Possessive NPs
- Finite vs. infinite VPs
- Lexical heads!
- Solution: annotate future elements into nodes.

Annotation   F1     Size
Previous     82.3   9.7K
POSS-NP      83.1   9.8K
SPLIT-VP     85.7   10.5K
Distance / Recursion Splits
- Problem: vanilla PCFGs cannot distinguish attachment heights.
- Solution: mark a property of higher or lower sites:
- Contains a verb (see the sketch below).
- Is (non)-recursive.
- Base NPs [cf. Collins 99]
- Right-recursive NPs

Annotation     F1     Size
Previous       85.7   10.5K
BASE-NP        86.0   11.7K
DOMINATES-V    86.9   14.1K
RIGHT-REC-NP   87.0   15.2K
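A sketch of the DOMINATES-V idea on tuple-encoded trees: mark every node that dominates a verb tag; the -v suffix and the VB* test are this sketch's assumptions.

# Returns the marked tree and whether it dominates a verb (PTB VB* tags).
def mark_dominates_verb(tree):
    if isinstance(tree, str):                     # a word
        return tree, False
    has_verb = tree[0].startswith("VB")
    children = []
    for c in tree[1:]:
        marked, v = mark_dominates_verb(c)
        children.append(marked)
        has_verb = has_verb or v
    label = tree[0] + ("-v" if has_verb else "")
    return (label,) + tuple(children), has_verb

t = ("S", ("NP", "cats"), ("VP", ("VBP", "scratch"), ("NP", "walls")))
print(mark_dominates_verb(t)[0])
# ('S-v', ('NP', 'cats'), ('VP-v', ('VBP-v', 'scratch'), ('NP', 'walls')))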
(Diagram: a tree whose NP, VP and PP nodes are marked -v where they dominate a verb.)
A Fully Annotated Tree
Final Test Set Results
- Beats “first generation” lexicalized parsers.