Transition-based Dependency Parsing with Selectional Branching
Jinho D. Choi, University of Massachusetts Amherst
Presented at the 4th Workshop on Statistical Parsing of Morphologically Rich Languages, October 18th, 2013
Greedy vs. Non-greedy Parsing
- Greedy parsing
- Considers only one head for each token.
- Generates one parse tree per sentence.
- e.g., transition-based parsing (2 ms / sentence).
- Non-greedy parsing
- Considers multiple heads for each token.
- Generates multiple parse trees per sentence.
- e.g., transition-based parsing with beam search, graph-based parsing, linear programming, dual decomposition (≥ 93%).
Motivation
- How often do we need non-greedy parsing?
- Our greedy parser performs as accurately as our non-greedy parser about 64% of the time.
- This gap narrows further when the parsers are evaluated on non-benchmark data (e.g., tweets, chats, blogs).
- Many applications are time sensitive.
- Some applications need at least one complete parse tree ready within a limited time period (e.g., search, dialog, Q/A).
- Hard sentences are hard for any parser!
- Considering more heads does not always guarantee more
accurate parse results.
Transition-based Parsing
- Transition-based dependency parsing (greedy)
- Considers one transition for each parsing state.
[Diagram: the greedy parser follows a single transition sequence S → t1 → … → tL, committing to one transition t′ at each state and producing one tree T.]
What if t′ is not the correct transition?
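The greedy loop above can be sketched as follows. This is a toy illustration: the arc-standard-style state, transition set, and fixed scorer are invented stand-ins for the trained classifier, not the authors' implementation.

```python
# A minimal sketch of greedy transition-based parsing: at each parsing
# state, apply only the single highest-scoring transition, so each
# token receives exactly one head and one tree comes out.

def greedy_parse(state, valid, score, apply_t, is_final):
    """Follow the single best transition at every step."""
    while not is_final(state):
        best = max(valid(state), key=lambda t: score(state, t))
        state = apply_t(state, best)
    return state

# Toy arc-standard-style state: (stack, buffer, arcs).
def valid(state):
    stack, buf, _ = state
    ts = []
    if buf:
        ts.append("SHIFT")
    if len(stack) >= 2:
        ts += ["LEFT-ARC", "RIGHT-ARC"]
    return ts

def apply_t(state, t):
    stack, buf, arcs = state
    if t == "SHIFT":
        return (stack + [buf[0]], buf[1:], arcs)
    if t == "LEFT-ARC":   # head = stack top, dependent = second item
        return (stack[:-2] + [stack[-1]], buf, arcs + [(stack[-1], stack[-2])])
    # RIGHT-ARC: head = second item, dependent = stack top
    return (stack[:-1], buf, arcs + [(stack[-2], stack[-1])])

def is_final(state):
    stack, buf, _ = state
    return not buf and len(stack) <= 1

# A fixed toy scorer standing in for the trained classifier.
def toy_score(state, t):
    return {"SHIFT": 1.0, "LEFT-ARC": 0.5, "RIGHT-ARC": 2.0}[t]

stack, buf, arcs = greedy_parse(([], [0, 1, 2], []), valid, toy_score, apply_t, is_final)
```

If any `best` choice here is wrong, the mistake propagates: every later decision is conditioned on it, which is exactly the failure mode beam search addresses.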
Transition-based Parsing
- Transition-based dependency parsing with beam search
- Considers the b highest-scoring transitions for each parsing state.
[Diagram: beam search expands the b best transitions t′1 … t′b from each state, maintaining b parallel sequences through states S1 … Sb and producing b trees T1 … Tb.]
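Under the same toy framing, beam search replaces the single running sequence with the b highest-scoring partial sequences. The state machine and score table below are invented for illustration, not the authors' parser:

```python
import heapq

# Beam-search decoding sketch: instead of committing to the single best
# transition, keep the b highest-scoring partial transition sequences
# at every step.

def beam_parse(init, valid, score, apply_t, is_final, b):
    beam = [(0.0, init)]                    # (cumulative score, state)
    while not all(is_final(s) for _, s in beam):
        candidates = []
        for total, s in beam:
            if is_final(s):
                candidates.append((total, s))       # finished sequences survive
            else:
                for t in valid(s):
                    candidates.append((total + score(s, t), apply_t(s, t)))
        beam = heapq.nlargest(b, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])[1]

# Toy machine: a state is the string of transitions taken so far;
# scores come from a lookup table keyed by the resulting string.
table = {"a": 1.0, "b": 1.1, "aa": 0.1, "ab": 2.0, "ba": 0.5, "bb": 0.6}
args = dict(valid=lambda s: ["a", "b"],
            score=lambda s, t: table[s + t],
            apply_t=lambda s, t: s + t,
            is_final=lambda s: len(s) == 2)

print(beam_parse("", b=1, **args))  # "bb": greedy commits to "b" early
print(beam_parse("", b=2, **args))  # "ab": the beam recovers the better sequence
```

With b = 1 this reduces to greedy decoding; the score table is built so that the locally best first step is globally wrong, which the b = 2 beam recovers.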
Selectional Branching
- Issues with beam search
- Generates a fixed number of parse trees no matter how easy or hard the input sentence is.
- Is it possible to dynamically adjust the beam size for each individual sentence?
- Selectional branching
- The one-best transition sequence is found by a greedy parser.
- The k-best state-transition pairs are collected for each low-confidence transition used to generate the one-best sequence.
- Transition sequences are generated from the b−1 highest-scoring state-transition pairs in the collection.
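The three steps above can be sketched on the same toy state machine (states are strings of transitions taken so far; the score table, margin test, and k = 2 choice are illustrative, not the authors' code):

```python
# Selectional branching sketch: a greedy pass collects (state, 2nd-best
# transition) pairs at low-confidence decisions (k = 2), then the b−1
# highest-scoring pairs are branched and completed greedily.

def greedy_from(state, first_t, valid, score, apply_t, is_final):
    """Apply first_t, then continue greedily to a final state."""
    total, s = score(state, first_t), apply_t(state, first_t)
    while not is_final(s):
        t = max(valid(s), key=lambda t: score(s, t))
        total, s = total + score(s, t), apply_t(s, t)
    return total, s

def selectional_branching(init, valid, score, apply_t, is_final, margin, b):
    lam, s, total = [], init, 0.0           # lam: the collection λ of branch points
    while not is_final(s):
        ranked = sorted(valid(s), key=lambda t: score(s, t), reverse=True)
        if len(ranked) > 1 and score(s, ranked[0]) - score(s, ranked[1]) <= margin:
            lam.append((score(s, ranked[1]), total, s, ranked[1]))  # k = 2: keep 2nd best
        total, s = total + score(s, ranked[0]), apply_t(s, ranked[0])
    candidates = [(total, s)]               # the one-best sequence
    for _, prefix, state, t in sorted(lam, reverse=True)[: b - 1]:
        branch_total, branch_s = greedy_from(state, t, valid, score, apply_t, is_final)
        candidates.append((prefix + branch_total, branch_s))
    return max(candidates)[1]

# Toy machine: scores are close at the first step, so it is a
# low-confidence decision and gets branched.
table = {"a": 1.0, "b": 1.05, "aa": 0.1, "ab": 2.0, "ba": 0.5, "bb": 0.6}
args = dict(valid=lambda s: ["a", "b"],
            score=lambda s, t: table[s + t],
            apply_t=lambda s, t: s + t,
            is_final=lambda s: len(s) == 2)
```

When no transition is low-confidence, λ stays empty and the parser remains purely greedy; the beam is spent only on sentences that need it.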
[Diagram: while the one-best sequence S1 → t11 … t1L → T is generated, each low-confidence decision contributes its alternative state-transition pairs (S1, t′12) … (S1, t′1k), (S2, t′22) … (S2, t′2k), … to the collection λ. The b−1 pairs with the highest scores are picked; for our experiments, k = 2 is used.]
[Diagram: λ = {(S1, t′12), (S2, t′22), (S3, t′32)}; each pair branches off the one-best sequence and is completed greedily, yielding additional trees Ta, Tb, Tc.]
Carries over parsing states from the one-best sequence. Guaranteed to generate fewer trees than beam search when |λ| ≤ b.
Low Confidence Transition
- Let C1 be a classifier that finds the highest-scoring transition given the parsing state x.
- Let Ck be a classifier that finds the k highest-scoring transitions given the parsing state x and the margin m.
- The highest-scoring transition C1(x) has low confidence if |Ck(x, m)| > 1.
C1(x) = argmax_{y ∈ Y} f(x, y)
f(x, y) = exp(w · Φ(x, y)) / Σ_{y′ ∈ Y} exp(w · Φ(x, y′))
Ck(x, m) = k-argmax_{y ∈ Y} f(x, y)  s.t.  f(x, C1(x)) − f(x, y) ≤ m
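These definitions can be illustrated directly. The raw scores below are made-up stand-ins for w · Φ(x, y); the function names mirror the notation above:

```python
import math

# f(x, ·) is a softmax over raw transition scores; C1 returns the best
# transition; Ck keeps every transition whose probability is within
# margin m of the best; C1(x) has low confidence when |Ck(x, m)| > 1.

def f(scores):
    """Softmax over raw scores, keyed by transition label y."""
    z = sum(math.exp(v) for v in scores.values())
    return {y: math.exp(v) / z for y, v in scores.items()}

def C1(scores):
    p = f(scores)
    return max(p, key=p.get)

def Ck(scores, m):
    p = f(scores)
    best = p[C1(scores)]
    return [y for y in p if best - p[y] <= m]

def low_confidence(scores, m):
    return len(Ck(scores, m)) > 1

# Two transitions score almost equally: a classic low-confidence state.
raw = {"SHIFT": 2.0, "LEFT-ARC": 1.9, "RIGHT-ARC": -1.0}
print(C1(raw))                      # SHIFT
print(low_confidence(raw, m=0.05))  # True: LEFT-ARC falls within the margin
print(low_confidence(raw, m=0.01))  # False: the margin excludes LEFT-ARC
```

The margin m thus controls how eagerly the parser branches: a larger m flags more decisions as low-confidence and spends more of the beam budget.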
Experiments
- Parsing algorithm (Choi & McCallum, 2013)
- Hybrid between Nivre's arc-eager and list-based algorithms.
- Projective parsing: O(n).
- Non-projective parsing: expected linear time.
- Features
- Rich non-local features from Zhang & Nivre, 2011.
- For languages with coarse-grained POS tags, feature templates using fine-grained POS tags are replicated.
- For languages with morphological features, the morphological features of σ[0] and β[0] are used as unigram features.
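The last bullet might look roughly like the following sketch; the token representation, feature-name scheme, and example morphology values are hypothetical, not ClearNLP's actual templates:

```python
# Emit the morphological features of the stack top σ[0] and the buffer
# front β[0] as string-valued unigram features (illustrative naming).

def unigram_morph_features(stack, buffer):
    feats = []
    for name, token in (("s0", stack[-1] if stack else None),
                        ("b0", buffer[0] if buffer else None)):
        if token is not None:
            for m in token.get("morph", ()):
                feats.append(f"{name}:morph={m}")
    return feats

# Made-up tokens with FEATS-style morphology values.
stack = [{"form": "vidí", "morph": ["Case=Acc", "Num=Sing"]}]
buffer = [{"form": "psa", "morph": ["Case=Acc"]}]
print(unigram_morph_features(stack, buffer))
```

Each resulting string becomes one sparse binary feature for the transition classifier.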
Number of Transitions
- Number of transitions performed with respect to beam size.
[Plot: total transitions performed vs. beam size (1, 2, 4, 8, 16, 32, 64, 80); y-axis roughly 200,000 to 1,200,000 transitions.]
Projective Parsing
- The benchmark setup using WSJ.
Approach            UAS     LAS     Time (sec)
bt = 80, bd = 80    92.96   91.93   0.009
bt = 80, bd = 64    92.96   91.93   0.009
bt = 80, bd = 32    92.96   91.94   0.009
bt = 80, bd = 16    92.96   91.94   0.008
bt = 80, bd = 8     92.89   91.87   0.006
bt = 80, bd = 4     92.76   91.76   0.004
bt = 80, bd = 2     92.56   91.54   0.003
bt = 80, bd = 1     92.26   91.25   0.002
bt = 1,  bd = 1     92.06   91.05   0.002
- Comparison with previous approaches on the same WSJ benchmark.

Approach                    UAS     LAS     Time (sec)
bt = 80, bd = 80            92.96   91.93   0.009
Zhang & Clark, 2008         92.1    -       -
Huang & Sagae, 2010         92.1    -       0.04
Zhang & Nivre, 2011         92.9    91.8    0.03
Bohnet & Nivre, 2012        93.38   92.44   0.4
McDonald et al., 2005       90.9    -       -
McDonald & Pereira, 2006    91.5    -       -
Sagae & Lavie, 2006         92.7    -       -
Koo & Collins, 2010         93.04   -       -
Zhang & McDonald, 2012      93.06   91.86   -
Martins et al., 2010        93.26   -       -
Rush et al., 2010           93.8    -       -
Non-projective Parsing
- CoNLL-X shared task data
Approach                     Danish          Dutch           Slovene         Swedish
                             LAS    UAS      LAS    UAS      LAS    UAS      LAS    UAS
bt = 80, bd = 80             87.27  91.36    82.45  85.33    77.46  84.65    86.80  91.36
bt = 80, bd = 1              86.75  91.04    80.75  83.59    75.66  83.29    86.32  91.12
Nivre et al., 2006           84.77  89.80    78.59  81.35    70.30  78.72    84.58  89.50
McDonald et al., 2006        84.79  90.58    79.19  83.57    73.44  83.17    82.55  88.93
Nivre, 2009                  84.20  -        -      -        75.20  -        -      -
F.-Gonz. & G.-Rodr., 2012    85.17  90.10    -      -        -      -        83.55  89.30
Nivre & McDonald, 2008       86.67  -        81.63  -        75.94  -        84.66  -
Martins et al., 2010         -      91.50    -      84.91    -      85.53    -      89.80
SPMRL 2013 Shared Task
- Baseline results provided by ClearNLP.

Language     LAS (5K)  UAS (5K)  LS (5K)   LAS (Full)  UAS (Full)  LS (Full)
Arabic       81.72     84.46     93.41     84.19       86.48       94.43
Basque       78.01     84.62     82.71     79.16       85.32       83.63
French       73.39     85.30     81.42     74.51       86.41       82.00
German       82.58     85.36     90.49     86.73       88.80       92.95
Hebrew       75.09     81.74     82.84     -           -           -
Hungarian    81.98     86.09     88.26     82.68       86.56       88.80
Korean       76.28     80.39     87.32     83.55       86.82       92.39
Polish       80.64     88.49     86.47     81.12       89.24       86.59
Swedish      80.96     86.48     85.10     -           -           -
Conclusion
- Selectional branching
- Uses confidence estimates to decide when to employ a beam.
- Shows accuracy comparable to traditional beam search.
- Runs faster than other non-greedy parsing approaches.
- ClearNLP
- Provides several NLP tools, including a morphological analyzer, a dependency parser, and a semantic role labeler.
- Webpage: clearnlp.com.