Multilevel Coarse-to-Fine PCFG Parsing
Eugene Charniak, Mark Johnson, Micha Elsner, Joseph Austerweil, David Ellis, Isaac Haxton, Catherine Hill, Shrivaths Iyengar, Jeremy Moore, Michael Pozar, and Theresa Vu
Brown Laboratory for Linguistic Information Processing (BLLIP)
Statistical Parsing Speed
- Lexicalized statistical parsing can be slow.
– Charniak: 0.7 seconds per sentence.
- Real applications demand more speed!
– Large corpora, e.g., NANTC (McClosky, Charniak, and Johnson 2006).
– More words to consider: lattices from speech recognition (Hall and Johnson 2004).
– Costly second stage, such as question answering.
Bottom-up Parsing I
[Figure: CKY chart for "Ms. Haag plays Elianti", indexed by beginning word and constituent length (POS through 4 words), with the constituent (VP (VBZ plays) (NP (NNP Elianti))) highlighted.]
- Standard probabilistic CKY chart parsing.
– Computes the inside probability β for each constituent.
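The inside pass can be sketched as below. The grammar encoding (dicts of lexical and binary-rule probabilities) is a toy assumption for illustration; unary rules and the parser's real data structures are omitted.

```python
from collections import defaultdict

def inside_probs(words, lexicon, rules):
    """CKY inside pass for a binarized PCFG (illustrative sketch).

    lexicon: dict mapping (tag, word) -> P(word | tag)
    rules:   dict mapping (parent, left, right) -> P(parent -> left right)
    Returns beta[(i, j)][label] = inside probability of label over words[i:j].
    """
    n = len(words)
    beta = defaultdict(lambda: defaultdict(float))
    # Width-1 spans: part-of-speech tags from the lexicon.
    for i, w in enumerate(words):
        for (tag, word), p in lexicon.items():
            if word == w:
                beta[(i, i + 1)][tag] += p
    # Wider spans: combine adjacent children with binary rules.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (parent, left, right), p in rules.items():
                    b = beta[(i, k)][left] * beta[(k, j)][right]
                    if b > 0:
                        beta[(i, j)][parent] += p * b
    return beta
```

With a toy deterministic grammar for the example sentence, the inside probability of S over the whole string comes out as the product of its rule and lexical probabilities.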
Bottom-up Parsing II
- Some constituents are gold constituents (parts of the correct parse).
– These may not be part of the highest-probability (Viterbi) parse.
– We can use a reranker to try to pick them out later on.
Pruning
- We want to dispose of the incorrect constituents and retain the gold ones.
- Initial idea: prune constituents with low probability (~ outside α times inside β).
p(n^k_{i,j} | s) = α(n^k_{i,j}) β(n^k_{i,j}) / p(s)
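Given inside and outside tables and the sentence probability p(s), posterior pruning is a one-pass filter over the chart. The table layout below is an assumed toy encoding, not the parser's actual data structures.

```python
def prune_chart(beta, alpha, sent_prob, threshold):
    """Keep constituents whose posterior alpha*beta/p(s) clears a threshold.

    beta, alpha: dicts mapping span (i, j) -> {label: probability}
    Returns the set of surviving (i, j, label) triples (a sketch).
    """
    kept = set()
    for span, labels in beta.items():
        for label, b in labels.items():
            a = alpha.get(span, {}).get(label, 0.0)
            if a * b / sent_prob >= threshold:
                kept.add((span[0], span[1], label))
    return kept
```

A constituent with high inside probability but negligible outside probability (it fits no spanning parse) is pruned just like one with low inside probability.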
Outside Probabilities
- We need the full parse of the sentence to get the outside probability α.
– α estimates how well the constituent contributes to spanning parses of the sentence.
- Caraballo and Charniak (1998): agenda reordering method; proper pruning needs an approximation of α.
– Approximated α using n-grams at constituent boundaries.
[Figure: two candidate constituents; one fits into a spanning parse (α ≈ 1), the other does not (α ≈ 0).]
Coarse-to-Fine Parsing
- Parse quickly with a smaller grammar.
- Now calculate α using the full chart.
[Figure: the chart parsed with the coarse grammar, where all phrasal labels collapse to P (shown before and after pruning).]
Coarse-to-Fine Parsing II
- Prune the chart, then reparse with a more specific grammar.
- Repeat the process until the final grammar is reached.
- Reduces the cost of a high grammar constant.
[Figure: the pruned chart reparsed at the next level, with coarse P cells split into more specific labels such as S_ and N_.]
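The prune-and-reparse loop can be sketched abstractly. Here `parse_fn` and the (i, j, label, posterior) item format are hypothetical stand-ins for the real chart parser, which is not shown.

```python
def coarse_to_fine_parse(words, levels, thresholds, parse_fn):
    """Multilevel coarse-to-fine loop (illustrative sketch).

    levels:     grammars ordered coarsest to finest
    thresholds: posterior pruning threshold for each level
    parse_fn:   callable(words, grammar, allowed) returning a list of
                (i, j, label, posterior) items; 'allowed' restricts which
                chart cells may be filled (None = no restriction).
    Returns the constituents surviving at the finest level.
    """
    allowed = None
    survivors = []
    for level, grammar in enumerate(levels):
        items = parse_fn(words, grammar, allowed)
        # Keep only items whose posterior clears this level's threshold;
        # the surviving cells become the search space for the next level.
        survivors = [(i, j, lab) for (i, j, lab, p) in items
                     if p >= thresholds[level]]
        allowed = set(survivors)
    return survivors
```

The key point is that each level's α β estimates come from a complete (coarse) chart, so pruning uses genuine outside information rather than an n-gram approximation.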
Related Work
- Two-stage parsers:
– Maxwell and Kaplan (1993): automatically extracted first stage.
– Goodman (1997): first stage uses regular expressions.
– Charniak (2000): first stage is unlexicalized.
- Agenda reordering:
– Klein and Manning (2003): A* search for the best parse using an upper bound on α.
– Tsuruoka and Tsujii (2004): iterative deepening.
Parser Details
- Binarized grammar based on Klein and Manning (2003).
– Head annotation.
– Vertical (parent) and horizontal (sibling) Markov context.
[Figure: binarizing (NP (DT the) (JJ quick) (JJ brown) (NN fox)) under S; parent annotation yields NP^S, and binarization introduces intermediate symbols such as <NP-NN^S+JJ>, recording the phrase (NP), its head (NN), its parent (S), and the neighboring sibling (JJ).]
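To make the annotated label format concrete, here is a toy generator for the intermediate symbols that head/parent/sibling-annotated binarization introduces. The symbol syntax is modeled on the example above; the head-final assumption and the exact format are illustrations, not the parser's actual convention.

```python
def markov_symbols(phrase, head, vert_parent, children):
    """Intermediate symbols for binarizing phrase -> children (sketch).

    phrase:      the phrasal label being binarized (e.g. "NP")
    head:        the head child's tag (assumed to be the final child here)
    vert_parent: the vertical (parent) annotation (e.g. "S")
    children:    the flat child sequence
    One intermediate symbol is produced per non-head child, carrying the
    phrase, head, parent, and that sibling as horizontal context.
    """
    return [f"<{phrase}-{head}^{vert_parent}+{sib}>"
            for sib in children[:-1]]
```

For the example tree this yields one symbol per modifier, with the two JJ siblings producing the identical annotated symbol, which is exactly how markovization keeps the binarized grammar compact.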
Coarse-to-Fine Scheme
Level 0: S1, P
Level 1: S1; P splits into HP and MP
Level 2: S1; HP splits into S_ and N_, MP splits into A_ and P_
Level 3: full treebank grammar
– S_: S, VP, UCP, SQ, SBAR, SBARQ
– N_: NP, NAC, NX, LST, X, FRAG
– A_: ADJP, QP, CONJP, ADVP, INTJ, PRN
– P_: PP, PRT, RRC, WHADJP, WHADVP, WHNP, WHPP
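The scheme is just a projection table from treebank labels to coarser clusters. The memberships below are transcribed from this slide (labels duplicated in the garbled original are assigned once), so treat them as a reading of the figure rather than the authors' definitive table.

```python
# Cluster memberships at each level, as read off the slide.
LEVEL2 = {
    "S_": ["S", "VP", "UCP", "SQ", "SBAR", "SBARQ"],
    "N_": ["NP", "NAC", "NX", "LST", "X", "FRAG"],
    "A_": ["ADJP", "QP", "CONJP", "ADVP", "INTJ", "PRN"],
    "P_": ["PP", "PRT", "RRC", "WHADJP", "WHADVP", "WHNP", "WHPP"],
}
LEVEL1 = {"HP": ["S_", "N_"], "MP": ["A_", "P_"]}

def coarsen(label, level):
    """Map a treebank label to its cluster at the given level (3 = identity)."""
    if label == "S1" or level == 3:
        return label
    l2 = next(c for c, members in LEVEL2.items() if label in members)
    if level == 2:
        return l2
    l1 = next(c for c, members in LEVEL1.items() if l2 in members)
    if level == 1:
        return l1
    return "P"  # level 0: every phrasal label collapses to P
```

A fine constituent is allowed into the level-k chart only if `coarsen(label, k-1)` survived pruning at the previous level.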
Examples
[Figure: the same parse tree rendered at Level 0, Level 1, Level 2, and Level 3 (Treebank).]
Coarse-to-Fine Probabilities
Heuristic probabilities:
P(N_ → N_ P_) = weighted-avg( P(NP → NP PP), P(NP → NP PRT), ..., P(NP → NAC PP), P(NP → NAC PRT), ..., P(NAC → NP PP), ... )
Using max instead of avg computes an exact upper bound instead of a heuristic (Geman and Kochanek 2001).
No smoothing needed.
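A sketch of this construction, pooling the fine rules that project onto a coarse rule. The fine-to-coarse mapping is passed in, and an unweighted average stands in for the slide's weighted average (the weights are not shown here); `use_max=True` gives the exact-upper-bound variant.

```python
def coarse_rule_prob(coarse_rule, fine_rules, to_coarse, use_max=False):
    """Probability for a coarse rule, pooled from the fine grammar.

    coarse_rule: (parent, left, right) in coarse labels
    fine_rules:  dict (parent, left, right) -> probability (fine labels)
    to_coarse:   function mapping a fine label to its coarse cluster
    With use_max=True the result upper-bounds every matching fine rule;
    otherwise a simple unweighted average is used as a stand-in.
    """
    matches = [p for (a, b, c), p in fine_rules.items()
               if (to_coarse(a), to_coarse(b), to_coarse(c)) == coarse_rule]
    if not matches:
        return 0.0
    return max(matches) if use_max else sum(matches) / len(matches)
```

Because every fine rule maps to some coarse rule, the coarse grammar needs no smoothing: any rule used at the fine level has a nonzero coarse image.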
Pruning Thresholds
[Figure: two plots. Left: pruning threshold vs. probability of pruning a gold constituent. Right: pruning threshold vs. fraction of incorrect constituents remaining.]
Pruning Statistics
Level         Constits produced (millions)   Constits pruned (millions)   % pruned
Level 0       8.82                           7.55                         86.5
Level 1       9.18                           6.51                         70.8
Level 2       11.2                           9.48                         84.4
Level 3       11.8                           –                            –
Total         40.4
Level 3 only  392
Timing Statistics
Level         Time at level   Cumulative time   F-score
Level 0       1598            1598
Level 1       2570            4164
Level 2       4303            8471
Level 3       1527            9998              77.9
Level 3 only  114654          114654            77.9
10x speed increase from pruning.
Discussion
- No loss in f-score from pruning.
- Each pruning level is useful.
– Each level prunes ~80% of the constituents produced.
- Pruning works even at level 0, with only two nonterminals (S1 and P):
– Preterminals are still useful.
– The probability of a rule like P → NN IN (a constituent ending with a preposition) will be very low.
Conclusion
- Multilevel coarse-to-fine parsing allows bottom-up parsing to use top-down information.
– Deciding on good parent labels.
– Using the string boundary.
- Can be combined with agenda reordering methods.
– Use coarser levels to estimate the outside probability.
- More stages of parsing can be added.
– Lexicalization.
Future Work
- The coarse-to-fine scheme we use is hand-generated.
- A coarse-to-fine scheme is just a hierarchical clustering of constituent labels.
– Hierarchical clustering is a well-understood task.
– It should be possible to define an objective function and search for the best scheme.
– This could be used to automatically find useful annotations/lexicalizations.
Acknowledgements
- Class project for CS 241 at Brown University
- Funded by:
– DARPA GALE
– Brown University fellowships
– Parents of undergraduates
- Our thanks to all!