  1. Ensemble Models for Dependency Parsing: Cheap and Good? Mihai Surdeanu and Christopher D. Manning, Stanford University. June 3, 2010

  2. Ensemble Parsing [Diagram: six base parsers (Parser 1 through Parser 6) feeding into a single Ensemble Parser]

  3. Ensemble Parsing [Diagram: six base parsers feeding into an Ensemble Parser, with a question mark over the combination step] Many questions are still unanswered despite all the previous work. This work: empirical answers for projective English dependency parsing.

  4. Setup Corpus: syntactic dependencies of the CoNLL 2008–09 shared tasks. 7 individual parsing models (AE = arc-eager, AS = arc-standard, CN = Covington non-projective; → = left-to-right parsing, ← = right-to-left):

     Model        Devel LAS   In-domain LAS   Out-of-domain LAS
     MST            85.36        87.07            80.48
     Malt AE →      84.24        85.96            78.74
     Malt CN →      83.75        85.61            78.55
     Malt AS →      83.74        85.36            77.23
     Malt AS ←      82.43        83.90            76.69
     Malt CN ←      81.75        83.53            77.29
     Malt AE ←      80.76        82.51            76.18

  5. Scoring Models for Parser Combination [Diagram: Parser 1, Parser 2, Parser 3 → Dependency Scoring → Output Construction → Ensemble]

  6. Scoring Models for Parser Combination [Diagram: Parser 1, Parser 2, Parser 3 → Dependency Scoring → Output Construction → Ensemble] Which scoring model is best? → Unweighted voting? → Weighted voting? Weighted by what? → Meta-classification?

  7. Scoring Models: Voting LAS as a function of the number of base parsers and the voting scheme:

     Parsers   Unweighted   Weighted by POS of modifier   Weighted by dep. label   Weighted by dep. length
     3          86.03        86.02                         85.53                    85.85
     4          86.79        86.68                         86.38                    86.46
     5          86.98        86.95                         86.60                    86.87
     6          87.14        87.17                         86.74                    86.91
     7          86.81        86.82                         86.50                    86.71

     Weighting does not really make a difference! Adding more individual parsers helps, but only up to a point.
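
A minimal sketch of this word-by-word unweighted voting, in Python. This is not the authors' released code; the data layout (one (head, label) pair per token, per base parser) is an assumption made for illustration:

    from collections import Counter

    def vote_word_by_word(parser_outputs):
        """Unweighted voting: for each token, keep the (head, label) pair
        proposed by the largest number of base parsers.
        parser_outputs: one parse per base parser; each parse is a list of
        (head, label) pairs, one entry per token of the sentence."""
        n_tokens = len(parser_outputs[0])
        combined = []
        for i in range(n_tokens):
            votes = Counter(parse[i] for parse in parser_outputs)
            combined.append(votes.most_common(1)[0][0])
        return combined

    # Three base parsers, two of which agree on every token:
    p1 = [(2, "SBJ"), (0, "ROOT")]
    p2 = [(2, "SBJ"), (0, "ROOT")]
    p3 = [(2, "OBJ"), (1, "ROOT")]
    print(vote_word_by_word([p1, p2, p3]))  # [(2, 'SBJ'), (0, 'ROOT')]

Nothing in this procedure guarantees that the combined dependencies form a tree; that is exactly the issue the re-parsing slides below take up.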

  8. Scoring Models: Meta-classification Can we improve dependency scoring through meta-classification?

  9. Scoring Models: Meta-classification Can we improve dependency scoring through meta-classification? No. → We implemented an L2-regularized logistic regression classifier using as features: identifiers of the base models, POS tags of head and modifier, labels of dependencies, length of dependencies, length of sentence, and combinations of the above. → No improvement over the unweighted voting approach.
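
To make the setup concrete, a hypothetical sketch of such a meta-classifier using scikit-learn (the slide does not say which implementation was used; the feature names and toy data below are illustrative):

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression

    def dependency_features(parser_id, head_pos, mod_pos, label, head, mod, sent_len):
        """One feature dict per candidate dependency, mirroring the slide:
        base-model identifier, POS of head and modifier, dependency label,
        dependency length, sentence length, and combinations thereof."""
        dep_len = abs(head - mod)
        return {
            "parser": parser_id,
            "head_pos": head_pos,
            "mod_pos": mod_pos,
            "label": label,
            "dep_len": str(dep_len),
            "sent_len": str(sent_len),
            "parser+mod_pos": parser_id + "|" + mod_pos,  # combination feature
            "label+dep_len": label + "|" + str(dep_len),  # combination feature
        }

    # y marks whether a candidate dependency appears in the gold tree.
    X = [dependency_features("MST", "VBZ", "NN", "SBJ", 2, 1, 9),
         dependency_features("MaltAE", "VBZ", "NN", "OBJ", 2, 1, 9)]
    y = [1, 0]

    vec = DictVectorizer()
    clf = LogisticRegression(penalty="l2")  # L2-regularized, as on the slide
    clf.fit(vec.fit_transform(X), y)
    # At combination time, clf.predict_proba(...) would score each candidate
    # dependency in place of a raw vote count.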

  10. Meta-classification Analysis Minority dependencies (MDs): dependencies that disagree with the majority vote. Precision of MDs: the ratio of MDs in a given context (e.g., POS of modifier is NN and parser is MST) that are correct. Meta-classification can outperform the majority vote only when the number of MDs in contexts with precision > 50% is large. → But these amount to less than 0.7% of all dependencies!
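
The precision-of-MDs statistic boils down to a per-context precision computation; a minimal sketch, with the context encoding chosen arbitrarily for illustration:

    from collections import defaultdict

    def md_precision_by_context(candidates):
        """candidates: iterable of (context, is_minority, is_correct) triples,
        where context identifies a setting such as ("mod_pos=NN", "parser=MST").
        Returns, per context, the fraction of its minority dependencies (MDs)
        that are correct."""
        correct = defaultdict(int)
        total = defaultdict(int)
        for context, is_minority, is_correct in candidates:
            if is_minority:
                total[context] += 1
                correct[context] += int(is_correct)
        return {c: correct[c] / total[c] for c in total}

    # Meta-classification can only beat the majority vote in contexts whose
    # MD precision exceeds 50% -- and the slide reports that such contexts
    # cover under 0.7% of all dependencies.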

  11. Re-parsing Algorithms [Diagram: Parser 1, Parser 2, Parser 3 → Dependency Scoring → Output Construction → Ensemble] How common are badly-formed trees for word-by-word combination? Which is the best re-parsing strategy?

  12. Re-parsing Algorithms Percentage of badly-formed trees for word-by-word combination:

     Defect           In domain   Out of domain
     Zero roots         0.83%        0.70%
     Multiple roots     3.37%        6.11%
     Cycles             4.29%        4.23%
     Total              7.46%        9.64%

  13. Re-parsing Algorithms (badly-formed-tree percentages repeated from slide 12) Performance of re-parsing algorithms:

     Algorithm                       In-domain LAS   Out-of-domain LAS
     Word by word (O(N))                88.89            82.13∗
     Eisner (exact, O(N³))              88.83∗           81.99
     Attardi (approximate, O(N))        88.70            81.82

     ∗ indicates statistical significance over the next lower-ranked model.

     Badly-formed trees are common! But approximate re-parsing algorithms perform as well as exact ones!
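
A minimal sketch of the well-formedness check behind the defect counts on slide 12, assuming the usual CoNLL convention that heads[i] is the head of token i+1 and head 0 denotes the artificial root (the function is illustrative, not from the released code):

    def tree_defects(heads):
        """Report the defect categories of a combined structure: zero roots,
        multiple roots, and cycles -- the three cases counted on slide 12."""
        defects = []
        roots = sum(1 for h in heads if h == 0)
        if roots == 0:
            defects.append("zero roots")
        elif roots > 1:
            defects.append("multiple roots")
        # Follow head pointers upward from every token; revisiting a token
        # before reaching the root means the structure contains a cycle.
        for start in range(1, len(heads) + 1):
            seen = set()
            node = start
            while node != 0:
                if node in seen:
                    defects.append("cycle")
                    return defects
                seen.add(node)
                node = heads[node - 1]
        return defects

    print(tree_defects([2, 0, 2]))  # [] -> well-formed tree
    print(tree_defects([2, 3, 1]))  # ['zero roots', 'cycle']

When a defect is found, the re-parsing step rescores the candidate dependencies and rebuilds a well-formed tree, exactly (Eisner) or approximately (Attardi).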

  14. Combination Strategies How important is it to combine parsers at learning time? → E.g., stacking: MST^Malt = MST + Malt features

  15. Combination Strategies How important is it to combine parsers at learning time? → E.g., stacking: MST^Malt = MST + Malt features

     Model             In-domain LAS   Out-of-domain LAS
     ensemble^3_100%      88.83∗           81.99∗
     ensemble^1_100%      88.01∗           80.78
     ensemble^3_50%       87.45            81.12
     MST^Malt             87.45∗           80.25∗
     ensemble^1_50%       86.74            79.44

     The advantages gained from combining parsers at learning time can easily be surpassed by runtime combination models that have access to more base parsers! The ensemble models are more robust out of domain.

  16. Comparison with State of the Art Parsers

     Model                                    In-domain LAS   Out-of-domain LAS
     CoNLL 2008 #1 (Johansson and Nugues)        90.13∗           82.81∗
     ensemble^3_100%                             88.83∗           81.99∗
     CoNLL 2008 #2 (Zhang et al.)                88.14            80.80
     ensemble^1_100%                             88.01            80.78

     Our best ensemble model is second. In the out-of-domain corpus, its performance is within 1% LAS of a parser that uses second-order features and is O(N⁴). The ensemble models are more robust out of domain.

  17. Conclusion: Less Is More
     → The diversity of base parsers is more important than complex learning models for parser combination (e.g., meta-classification, stacking).
     → Well-formed dependency trees can be guaranteed without significant performance loss by linear-time approximate re-parsing algorithms.
     → Unweighted voting performs as well as weighted voting for the scoring of candidate dependencies.
     → Ensemble parsers that are both accurate and fast can be rapidly developed with minimal effort.

  18. Thank you! Many thanks to Johan Hall, Joakim Nivre, Ryan McDonald, and Giuseppe Attardi. Code: www.surdeanu.name/mihai/ensemble/ Questions?
