Transition-Based Dependency Parsing

Saarbrücken, December 23rd 2011
David Przybilla – davida@coli.uni-saarland.de

Outline
1. MaltParser
2. Transition-Based Parsing
   a. Example
   b. Oracle
3. Integrating Graph-Based and Transition-Based Parsing
4. Non-Projective Dependency Parsing

MaltParser
Input: a treebank → MaltParser → Output: a (transition-based) dependency parser
● Works for different languages (no tuning for a specific language)
● Language-independent: accurate parsing for a wide variety of languages
● Accuracy between 80% and 90%
● Deterministic

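Transition-Based Parsing
A configuration consists of a stack (top $W_k$, followed by $W_x$, …) and a buffer ($W_i, W_{i+1}, W_{i+2}, \dots$); transitions move the parser from one configuration to the next.
Actions:
● Shift
● Left-arc
● Right-arc
● Reduction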

Example (initial configuration; stacks are listed bottom to top, arcs are written head → dependent): Stack: [ ] · Buffer: [John, hit, the, ball]

Example, Transition = Shift: Stack: [John] · Buffer: [hit, the, ball]

Example, Transition = Left-arc: Stack: [ ] · Buffer: [hit, the, ball] · New arc: hit → John (Subj). Only if h(John) = 0, i.e. John has no head yet

Example, Transition = Shift: Stack: [hit] · Buffer: [the, ball] · Arcs so far: hit → John (Subj)

Example, Transition = Shift: Stack: [hit, the] · Buffer: [ball]

Example, Transition = Left-arc: Stack: [hit] · Buffer: [ball] · New arc: ball → the (Det). Only if h(the) = 0

Example, Transition = Right-arc: Stack: [hit, ball] · Buffer: [ ] · New arc: hit → ball (Obj). Only if h(ball) = 0

Example: the buffer is empty, so this is a terminal configuration. Final arcs: hit → John (Subj), ball → the (Det), hit → ball (Obj)
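The trace above can be replayed with a small simulator. This is a minimal sketch assuming the arc-eager reading of the transitions (Left-arc attaches the stack top to the first buffer word and pops it; Right-arc attaches the first buffer word to the stack top and pushes it). The `Configuration` class and its method names are hypothetical, not MaltParser's API; tokens are identified by position so that repeated words do not collide.

```python
from typing import List, Set, Tuple

Arc = Tuple[int, str, int]  # (head position, label, dependent position)


class Configuration:
    """Hypothetical arc-eager configuration over token positions 0..n-1."""

    def __init__(self, words: List[str]) -> None:
        self.words = words
        self.stack: List[int] = []                      # top is stack[-1]
        self.buffer: List[int] = list(range(len(words)))
        self.arcs: Set[Arc] = set()

    def has_head(self, i: int) -> bool:
        # h(i) != 0 in the slides' notation
        return any(dep == i for _, _, dep in self.arcs)

    def shift(self) -> None:
        self.stack.append(self.buffer.pop(0))

    def left_arc(self, label: str) -> None:
        s0, b0 = self.stack[-1], self.buffer[0]
        if self.has_head(s0):
            raise ValueError("Left-arc only if h(S0) = 0")
        self.arcs.add((b0, label, s0))                  # B0 becomes head of S0
        self.stack.pop()

    def right_arc(self, label: str) -> None:
        s0, b0 = self.stack[-1], self.buffer[0]
        if self.has_head(b0):
            raise ValueError("Right-arc only if h(B0) = 0")
        self.arcs.add((s0, label, b0))                  # S0 becomes head of B0
        self.stack.append(self.buffer.pop(0))

    def reduce(self) -> None:
        if not self.has_head(self.stack[-1]):
            raise ValueError("Reduction only if h(S0) != 0")
        self.stack.pop()

    def is_terminal(self) -> bool:
        return not self.buffer                          # empty buffer

    def labeled_arcs(self) -> Set[Tuple[str, str, str]]:
        return {(self.words[h], r, self.words[d]) for h, r, d in self.arcs}


# Replay the "John hit the ball" example.
c = Configuration(["John", "hit", "the", "ball"])
c.shift()
c.left_arc("Subj")
c.shift()
c.shift()
c.left_arc("Det")
c.right_arc("Obj")
print(c.is_terminal(), [c.words[i] for i in c.stack])
print(sorted(c.labeled_arcs()))
```

Raising `ValueError` instead of using `assert` keeps the preconditions active even when Python runs with optimizations enabled.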

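Transition-Based Parsing: Reduction
Before: Stack [… , $W_x$, $W_k$] (top: $W_k$) · Buffer [$W_i$, $W_{i+1}$, $W_{i+2}$, …]
After Reduction: Stack [… , $W_x$] · Buffer unchanged
Sentence: … $W_x$ … $W_k$ … $W_i$ $W_{i+1}$ …
Only if $h(W_k) \neq 0$, i.e. $W_k$ already has a head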
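For reference, the four transitions can be written as rewrite rules over configurations $(\sigma, \beta, A)$: stack, buffer and arc set. This follows the usual arc-eager formulation, which matches the worked example; the notation $\sigma \mid W_k$ (stack with top $W_k$), $W_i \mid \beta$ (buffer starting with $W_i$) and arcs as (head, label, dependent) triples is our assumption, while the preconditions are the ones stated on the slides.

```latex
\begin{array}{lll}
\text{Shift:} & (\sigma,\ W_i \mid \beta,\ A) \Rightarrow (\sigma \mid W_i,\ \beta,\ A) & \\
\text{Left-arc}_r: & (\sigma \mid W_k,\ W_i \mid \beta,\ A) \Rightarrow (\sigma,\ W_i \mid \beta,\ A \cup \{(W_i, r, W_k)\}) & \text{only if } h(W_k) = 0 \\
\text{Right-arc}_r: & (\sigma \mid W_k,\ W_i \mid \beta,\ A) \Rightarrow (\sigma \mid W_k \mid W_i,\ \beta,\ A \cup \{(W_k, r, W_i)\}) & \text{only if } h(W_i) = 0 \\
\text{Reduction:} & (\sigma \mid W_k,\ \beta,\ A) \Rightarrow (\sigma,\ \beta,\ A) & \text{only if } h(W_k) \neq 0
\end{array}
```

A configuration with an empty buffer, $(\sigma, [\,], A)$, is terminal, and $A$ is the output tree.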

Oracle
● Greedy algorithm: choose the locally optimal transition, hoping it leads to the globally optimal parse
● It makes the transition-based algorithm deterministic: in general, more than one transition may be applicable in a given configuration, and the oracle chooses one
● Goal: construct the optimal transition sequence for the input sentence
● How to build the oracle? Train a classifier
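Training data for such a classifier is commonly obtained from a static oracle that reads the correct transition sequence off a gold tree. The sketch below is our own illustration for the arc-eager transitions of the example, assuming a projective gold tree; the function name and the `gold` dictionary layout are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def static_oracle(n: int, gold: Dict[int, Tuple[int, str]]) -> List[str]:
    """Hypothetical arc-eager static oracle (assumes a projective gold tree).

    n: number of words, identified by positions 1..n.
    gold: dependent position -> (gold head position, label); head 0 = root.
    """
    stack: List[int] = []
    buffer: List[int] = list(range(1, n + 1))
    attached: Set[int] = set()       # words that already received their head
    seq: List[str] = []
    while buffer:                    # empty buffer = terminal configuration
        b0 = buffer[0]
        s0 = stack[-1] if stack else None
        if s0 is not None and gold[s0][0] == b0:
            # B0 is the gold head of S0 (so h(S0) = 0 still holds)
            seq.append(f"Left-arc({gold[s0][1]})")
            attached.add(s0)
            stack.pop()
        elif s0 is not None and gold[b0][0] == s0:
            # S0 is the gold head of B0
            seq.append(f"Right-arc({gold[b0][1]})")
            attached.add(b0)
            stack.append(buffer.pop(0))
        elif (s0 is not None and s0 in attached
              and all(gold[w][0] != s0 for w in buffer)):
            # S0 has its head and no dependents left in the buffer: pop it
            seq.append("Reduction")
            stack.pop()
        else:
            seq.append("Shift")
            stack.append(buffer.pop(0))
    return seq


# "John hit the ball": John=1, hit=2, the=3, ball=4 (hit is the root).
gold = {1: (2, "Subj"), 2: (0, "Root"), 3: (4, "Det"), 4: (2, "Obj")}
print(static_oracle(4, gold))
```

For "John hit the ball" this reproduces the slides' sequence (Shift, Left-arc, Shift, Shift, Left-arc, Right-arc). Each (configuration, transition) pair visited along the way would become one training example for the classifier.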

Classifier
The classifier maps a feature vector describing the current configuration to one of the transition classes.
Features:
● POS tags of words in the buffer and stack
● The words themselves
● The first word in the stack
● The L words in the buffer
● The current arcs in the graph
Classes:
● Shift
● Left-arc
● Right-arc
● Reduction
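The feature list above can be turned into a feature template. This is a minimal sketch with hypothetical feature names; `lookahead` stands in for "L", and the Penn Treebank POS tags in the example are an assumption. MaltParser's actual feature specification differs.

```python
from typing import Dict, List


def extract_features(stack: List[int], buffer: List[int], words: List[str],
                     pos: List[str], labels: Dict[int, str],
                     lookahead: int = 2) -> Dict[str, str]:
    """Hypothetical feature template for one parser configuration.

    stack/buffer hold token positions (stack top = stack[-1]);
    labels maps a token position to the label of the arc it already received;
    lookahead plays the role of "L" (how many buffer words to look at).
    """
    feats: Dict[str, str] = {}
    if stack:
        s0 = stack[-1]
        feats["S0.word"] = words[s0]                 # the first word in the stack
        feats["S0.pos"] = pos[s0]
        feats["S0.label"] = labels.get(s0, "NONE")   # current arc (if any) on S0
    if len(stack) > 1:
        feats["S1.pos"] = pos[stack[-2]]             # POS deeper in the stack
    for k, b in enumerate(buffer[:lookahead]):
        feats[f"B{k}.word"] = words[b]               # words in the buffer
        feats[f"B{k}.pos"] = pos[b]                  # POS of words in the buffer
    return feats


# Configuration after the second Shift of the example: Stack [hit, the], Buffer [ball].
words = ["John", "hit", "the", "ball"]
pos = ["NNP", "VBD", "DT", "NN"]                     # assumed Penn Treebank tags
feats = extract_features([1, 2], [3], words, pos, labels={0: "Subj"})
print(feats)
```

Any multiclass classifier (for example, a linear model over one-hot encodings of these strings) could then map such a dictionary to one of the four transition classes.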

Results of the MaltParser
● Evaluation metrics:
  ● $AS_U$ (unlabeled attachment score): proportion of tokens assigned the correct head
  ● $AS_L$ (labeled attachment score): proportion of tokens assigned both the correct head and the correct dependency type
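Both metrics can be computed token by token. A minimal sketch, assuming each token is represented as a (head position, label) pair; the labels "Root" and "Iobj" in the example are illustrative.

```python
from typing import List, Tuple

Token = Tuple[int, str]  # (head position, dependency label); head 0 = root


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (AS_U, AS_L) for one sentence."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need equal-length, non-empty token lists")
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))   # correct head
    both_ok = sum(g == p for g, p in zip(gold, pred))         # head and label
    return head_ok / len(gold), both_ok / len(gold)


# "John hit the ball": one wrong head ("the") and one wrong label ("ball").
gold = [(2, "Subj"), (0, "Root"), (4, "Det"), (2, "Obj")]
pred = [(2, "Subj"), (0, "Root"), (2, "Det"), (2, "Iobj")]
print(attachment_scores(gold, pred))
```

Here 3 of 4 heads are correct but only 2 tokens have both head and label right, so $AS_U = 0.75$ and $AS_L = 0.5$; by construction $AS_L \le AS_U$.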

Results of the MaltParser
Languages: Danish, Dutch, Czech, English, Italian, Turkish, Chinese, Swedish, German
(Figure: the languages placed along two axes, from more rigid to more flexible word order and from "poor" to rich morphology)
Goal: evaluate whether MaltParser can parse reasonably accurately across a wide variety of languages

Results of the MaltParser

Results of the MaltParser
● Results:
  ● Above 80% unlabeled attachment score ($AS_U$) for all languages
  ● Morphological richness and word order account for the variation across languages: accuracy is generally lower for languages like Czech and Turkish, which also contain more non-projective structures
  ● Cross-language comparison is difficult:
    – large differences in the amount of annotated data
    – availability of accurate POS taggers
  ● State-of-the-art results for Italian, Swedish, Danish and Turkish

Graph-Based vs Transition-Based
Graph-based:
● Searches for the optimal graph (the highest-scoring graph)
● Globally trained (global optimum)
● Limited history of parsing decisions
● Less rich feature representation
Transition-based:
● Searches for the optimal graph by finding the best transition between two states (locally optimal decisions)
● Locally trained (on configurations)
● Rich history of parsing decisions
● Richer feature representation, but error propagation (greedy algorithm)

Graph-Based vs Transition-Based
Transition-based (MaltParser):
● Better for short dependencies
● More accurate for dependents that are nouns and pronouns
Graph-based (MSTParser):
● Better for long dependencies
● More accurate for dependents that are verbs, adjectives and adverbs
→ Integrate both approaches

Integrating Graph-Based and Transition-Based Parsing
● Integrate both approaches at learning time
● Base MSTParser guided by Malt: treebank T → MaltParser → parsed T → MSTParser
● Base MaltParser guided by MST: treebank T → MSTParser → parsed T → MaltParser

Features used in the Integration
MaltParser guided by MST:
● Is arc $(S_0, B_0, *)$ in $G_{mst}$?
● Is arc $(B_0, S_0, *)$ in $G_{mst}$?
● Head direction of $B_0$ in $G_{mst}$ (left, right, root, …)
● Identity of $l'$ such that $(*, B_0, l')$ is in $G_{mst}$
● …
MSTParser guided by Malt:
● Is arc $(i, j, *)$ in $G_{malt}$?
● Is arc $(i, j, l)$ in $G_{malt}$?
● Is arc $(i, j, *)$ not in $G_{malt}$?
● Identity of $l'$ such that $(i, j, l')$ is in $G_{malt}$
$S_0$ = first element of the stack, $B_0$ = first element of the buffer
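The MST-guided features for MaltParser can be read off the guide parser's output for the current $S_0$ and $B_0$. A minimal sketch, assuming $G_{mst}$ is given as a set of (head, dependent, label) triples over 1-based positions with 0 as the root; the function and feature names are hypothetical, and how the guide output is produced (the guide parser run over the treebank) is not shown.

```python
from typing import Dict, Optional, Set, Tuple, Union

GuideArc = Tuple[int, int, str]  # (head, dependent, label); head 0 = root


def guide_features(s0: Optional[int], b0: int,
                   g_mst: Set[GuideArc]) -> Dict[str, Union[bool, str]]:
    """Hypothetical MST-guided features for a MaltParser configuration."""
    head_of: Dict[int, Tuple[int, str]] = {d: (h, l) for h, d, l in g_mst}
    b0_head, b0_label = head_of.get(b0, (None, "NONE"))
    s0_head = head_of[s0][0] if s0 in head_of else None
    if b0_head is None:
        direction = "none"           # the guide gave B0 no head
    elif b0_head == 0:
        direction = "root"
    elif b0_head < b0:
        direction = "left"
    else:
        direction = "right"
    return {
        "arc(S0,B0,*) in G_mst": s0 is not None and b0_head == s0,
        "arc(B0,S0,*) in G_mst": s0 is not None and s0_head == b0,
        "head direction of B0": direction,
        "l' with (*,B0,l') in G_mst": b0_label,
    }


# Guide parse of "John hit the ball" (John=1, hit=2, the=3, ball=4).
g_mst = {(2, 1, "Subj"), (0, 2, "Root"), (4, 3, "Det"), (2, 4, "Obj")}
print(guide_features(2, 4, g_mst))   # S0 = hit, B0 = ball
```

These guide features would simply be added to the base parser's own feature vector, so the base model learns how far to trust the guide.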

Results of Integration: $AS_L$ (correct head and correct label)

Results of Integration
● Graph-based models predict long arcs better
● Each model learns from the other's strengths
● The integration does improve accuracy
● Chaining more systems does not yield further gains in accuracy

Non-Projectivity
● Some sentences have long-distance dependencies that cannot be parsed with this algorithm,
● because it only considers relations between neighboring words (the top of the stack and the first word of the buffer)
● In some languages, 25% or more of the sentences are non-projective
● Especially relevant for languages with fewer constraints on word order
● Harder problem: relations can span unbounded distances

Non-Projectivity
A dependency tree $T$ is projective if, for every arc $(W_i, W_j, rel)$ and every word $W_k$ that lies between $W_i$ and $W_j$, there is a path from $W_i$ to $W_k$.
Example: from "scheduled" ($W_2$) there is an arc to $W_5$; however, there is no path from $W_2$ to $W_4$ or $W_3$, so the tree is not projective.
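The definition can be checked directly: for each arc, every word strictly between head and dependent must be reachable from the head. A minimal sketch, assuming the tree is encoded as a list of head positions (a hypothetical encoding: `heads[d]` is the head of word `d`, positions are 1-based, 0 is the artificial root) and that it is a well-formed tree.

```python
from typing import List


def is_projective(heads: List[int]) -> bool:
    """heads[d] = head position of word d (1-based); heads[0] is a dummy for the root."""
    n = len(heads) - 1

    def dominates(h: int, k: int) -> bool:
        # Follow head links upward from k; True if h is reached (path h ->* k).
        while k != 0:
            if k == h:
                return True
            k = heads[k]
        return h == 0                # the root dominates every word

    for d in range(1, n + 1):
        h = heads[d]
        lo, hi = min(h, d), max(h, d)
        for k in range(lo + 1, hi):  # words strictly between head and dependent
            if not dominates(h, k):
                return False
    return True


print(is_projective([0, 2, 0, 4, 2]))   # John hit the ball
print(is_projective([0, 3, 0, 2, 2]))   # hypothetical tree with a crossing arc
```

In the second (made-up) tree, word 3 heads word 1 but does not dominate word 2, which lies between them, so the tree is non-projective. This direct check is quadratic or worse, which is fine for illustration.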

Non-Projectivity
● Why would the previous transition algorithm be unable to generate this tree?
(Figure: stack and buffer holding "on", "is" and "hearing")
● "is" can never be reduced, so "hearing" and "on" never get an arc

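Handling Non-Projectivity
● Add a new transition, Swap, which moves a word from the stack back into the buffer (figure: stack and buffer before and after Swap, involving $W_k$, $W_i$ and $W_{i+1}$; afterwards $W_i$ is on the stack and $W_k$ is back at the front of the buffer, followed by $W_{i+1}$)
● In effect, this re-orders the initial input sentence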
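Since the slide's figure is only partly recoverable, here is Swap as defined in Nivre (2009) for reference. There it acts on the two topmost stack words, $(\sigma \mid i \mid j,\ \beta) \Rightarrow (\sigma \mid j,\ i \mid \beta)$, and is allowed only if $0 < i < j$. Note that Nivre (2009) uses an arc-standard system rather than the arc-eager system of the earlier example. The list-based encoding and function name below are our own.

```python
from typing import List, Tuple


def swap(stack: List[int], buffer: List[int]) -> Tuple[List[int], List[int]]:
    """Swap: (sigma|i|j, beta) => (sigma|j, i|beta), only if 0 < i < j.

    Positions are 1-based word indices (0 would be the artificial root).
    Returns new (stack, buffer) lists and leaves the inputs unchanged.
    """
    if len(stack) < 2:
        raise ValueError("Swap needs two words on the stack")
    i, j = stack[-2], stack[-1]      # j is the top of the stack
    if not 0 < i < j:
        raise ValueError("Swap only if 0 < i < j")
    return stack[:-2] + [j], [i] + buffer


stack, buffer = [1, 2, 3], [4]
print(swap(stack, buffer))   # word 2 goes back to the buffer, now after word 3
```

The condition $i < j$ means a swapped pair ends up in the order $j, i$ and can never be swapped back. This is why the number of swaps, and hence the worst-case running time, stays bounded.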

Non-Projectivity: Swap example
(Figure: stack and buffer before and after Swap, with "hearing", "on" and "is")

Non-Projective Dependency Parsing
● Useful for languages with fewer constraints on word order
● Theoretical running time for a sentence of $n$ words:
  ● Best case $O(n)$, that is, no swaps
  ● Worst case $O(n^2)$

Results Non-Projective Dependency Parsing
Running time
● Tested on 5 languages (Danish, Arabic, Czech, Slovene, Turkish)
● In practice the running time is $O(n)$
Parsing accuracy
● Criteria:
  ● Attachment score: percentage of tokens with the correct head and dependency label
  ● Exact match: percentage of sentences with a completely correct labeled dependency tree
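The two criteria differ in granularity: attachment score pools all tokens, whereas exact match counts whole sentences. A minimal sketch over a toy corpus, assuming each sentence is a list of (head position, label) pairs; the data and the label "Iobj" are illustrative.

```python
from typing import List, Tuple

Token = Tuple[int, str]  # (head position, dependency label)


def corpus_scores(gold: List[List[Token]], pred: List[List[Token]]) -> Tuple[float, float]:
    """Return (AS, EM) in percent.

    AS: tokens with correct head AND label, pooled over all tokens.
    EM: sentences whose labeled tree is completely correct.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("need equal, non-empty sentence lists")
    correct = total = exact = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("sentence lengths differ")
        hits = sum(g == p for g, p in zip(g_sent, p_sent))
        correct += hits
        total += len(g_sent)
        exact += int(hits == len(g_sent))      # whole tree correct?
    return 100.0 * correct / total, 100.0 * exact / len(gold)


sent = [(2, "Subj"), (0, "Root"), (4, "Det"), (2, "Obj")]
bad = [(2, "Subj"), (0, "Root"), (4, "Det"), (2, "Iobj")]   # one wrong label
print(corpus_scores([sent, sent], [sent, bad]))
```

A single label error costs only one token out of eight in AS (87.5) but a whole sentence out of two in EM (50.0), which is why EM reacts more strongly to the few arcs a projective parser cannot produce.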

Results Non-Projective Dependency Parsing
● Systems compared:
  ● $S_u$ = allowing non-projective trees
  ● $S_p$ = projective only
  ● $S_{pp}$ = handling non-projectivity as post-processing
● AS: percentage of tokens with the correct head and dependency label
● EM: percentage of sentences with a completely correct labeled dependency tree

Results Non-Projective Dependency Parsing
● AS:
  ● $S_u$ performs better for Czech and Slovene, which have more non-projective arcs
  ● Otherwise, $S_u$ scores lower than $S_p$ in AS, but the drop is not significant
  ● For Arabic the results are not meaningful, since there are only 11 non-projective arcs in the whole set
● EM:
  ● $S_u$ outperforms all other parsers
● The positive effect of $S_u$ depends on how frequent non-projective arcs are in the language

References
● Joakim Nivre, Jens Nilsson, Johan Hall, Atanas Chanev, Gülsen Eryigit, Sandra Kübler, Svetoslav Marinov, and Erwin Marsi. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(1):1–41, 2007.
● Joakim Nivre and Ryan McDonald. Integrating graph-based and transition-based dependency parsers. In Proceedings of ACL-08: HLT, pages 950–958, Columbus, Ohio, June 2008.
● Joakim Nivre. Non-projective dependency parsing in expected linear time. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 351–359, Suntec, Singapore, 2009.
● Sandra Kübler, Ryan McDonald, and Joakim Nivre. Dependency Parsing. Morgan & Claypool Publishers, 2009.
