Subdomain Sensitive Statistical Parsing using Raw Corpora - Barbara Plank & Khalil Sima’an - PowerPoint PPT Presentation



SLIDE 1

Subdomain Sensitive Statistical Parsing using Raw Corpora
Barbara Plank & Khalil Sima’an

[Sidebar: Introduction and Motivation · Subdomain Sensitive Statistical Parsing · Subdomain Sensitive Parsers · Parser Combination Techniques · Experiments and Results · Conclusions and Future Work]

Subdomain Sensitive Statistical Parsing using Raw Corpora

Barbara Plank1 and Khalil Sima’an2

1 Alfa Informatica, Faculty of Arts

University of Groningen, The Netherlands b.plank@rug.nl

2 Language and Computation, Faculty of Science

University of Amsterdam, The Netherlands simaan@science.uva.nl

LREC 2008 Marrakech, Morocco

SLIDE 2


Outline

1. Introduction and Motivation
2. Subdomain Sensitive Statistical Parsing using Raw Corpora
   - Subdomain Sensitive Parsers
   - Parser Combination Techniques
3. Experiments and Results
4. Conclusions and Future Work

SLIDE 3


Statistical parsing

Problem: ambiguity of natural language sentences.
Common approach: train a parser/model on a treebank; apply it to new input.
Variations: phrase vs. dependency structure, formal grammar, statistical model and estimator.

SLIDE 4


Motivation

Is there more in a treebank that we might exploit?

We view a treebank as a mixture of subdomains, each addressing certain concepts more than others

“Politics, stock market, financial news etc. can be found in the WSJ” (Kneser and Peters, 1997)

The parsing statistics gathered from the treebank are averages over different subdomains. Averages smooth out the differences between subdomains and weaken their biases.

1. Do subdomains matter?
2. How can subdomain sensitivity be incorporated into an existing state-of-the-art parser?

SLIDE 5


Motivation - Our Approach

Subdomains {c_i} as hidden features:

P(s, t) = Σ_i P(s, c_i) · P(t | s, c_i)   (1)

This work: approximate it by creating an ensemble of parsers.

Assumptions:
- We know a set of subdomains {c_1, . . . , c_k}
- Approximate the sum over i by combining the predictions of subdomain parsers

SLIDE 6


Overview and Problem Statement

SLIDE 7


Creating subdomain-specific parsers

- Weight the trees in treebank TB with subdomain statistics
- Use a domain-dependent raw corpus C (flat sentences)
- Induce a statistical Language Model (LM) θ from C
- Assign a count f to every tree π_i ∈ TB such that f = average per-word “count” of the yield y[π_i] under LM θ
- Retrain the parser on the subdomain-weighted TB_θ

SLIDE 8


Overview of our approach - Details

SLIDE 10


Parser Combination Techniques

How to combine them?

Parser Pre-selection: select a parser up-front (given: s)
Parser Post-selection: select a parser after parsing (given: s, t)

SLIDE 11


Pre-selection: Divergence Model (DVM)

We measure, for every word, how well it discriminates between the subdomains using the notion of divergence. The divergence of a word w in subdomain i ∈ [1 . . . k] from all other (k − 1) subdomains (j ∈ [1 . . . k], j ≠ i):

divergence_i(w) = 1 + ( Σ_{j ≠ i} |log( p_{θi}(w) / p_{θj}(w) )| ) / (k − 1)   (2)

divergence_sent_i(w_1^n) = ( Σ_{x=1}^{n} divergence_i(w_x) ) / n   (3)

Boundary issues: if p_{θi}(w) = 0 then divergence_i(w) = 1, and if p_{θj}(w) = 0, then p_{θj}(w) = 10^{-15} (constant).
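As a minimal sketch, equations (2) and (3) could be computed as follows, assuming each subdomain LM is a simple unigram table (dict from word to probability); the slides do not fix the LM type, so the unigram form and all names here are illustrative:

```python
import math

FLOOR = 1e-15  # constant substituted when p_{theta_j}(w) = 0

def divergence(w, lms, i):
    """Eq. (2): divergence of word w in subdomain i from the other k-1
    subdomains. `lms` is a list of k unigram LMs (word -> probability)."""
    k = len(lms)
    p_i = lms[i].get(w, 0.0)
    if p_i == 0.0:               # boundary case from the slide
        return 1.0
    total = sum(abs(math.log(p_i / (lms[j].get(w, 0.0) or FLOOR)))
                for j in range(k) if j != i)
    return 1.0 + total / (k - 1)

def divergence_sent(words, lms, i):
    """Eq. (3): average word divergence over the sentence w_1 .. w_n."""
    return sum(divergence(w, lms, i) for w in words) / len(words)
```

Words seen only in subdomain i get a large divergence (they discriminate well); words equally likely everywhere score close to 1.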

slide-12
SLIDE 12


Pre-selection: Divergence Model (DVM) - Example

For example, ‘multi-million-dollar’ scores 5.5 in the financial domain, while ‘equal’ scores between 1.6 and 1.9 across all domains.

[Figure: divergence scores (1–7) of sample words (e.g. Securities, trading, multi-million-dollar, self-insurance, cumulative, luxury, equal) in the Politics, Financial, Sports and WSJ subdomains.]

SLIDE 13


Post-Selection: Node Weighting + DVM (NW-DVM)

For parse tree π_i with 1 ≤ i ≤ k and sentence w_1^n:

score(c) = (1/k) · Σ_{i=1}^{k} δ[c, π_i]   (4)

score(π_i) = (1 − λ) · (1/|π_i|) · Σ_{c ∈ π_i} score(c) + λ · divergence_sent_i(w_1^n)   (5)

where |π_i| is the size of the constituent set and 0 < λ < 1 is an interpolation factor.

How well does the parse tree π_i fit the domain? How well does w_1^n fit the domain?
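A sketch of how equations (4) and (5) might be implemented, assuming each candidate parse is represented as a set of labelled constituent spans and the sentence-level divergences of eq. (3) are precomputed; the representation and names are assumptions, not the authors’ code:

```python
def nw_dvm_scores(parses, sent_divergences, lam=0.6):
    """Score each of the k candidate parses by (a) how often its
    constituents recur in the other subdomain parses and (b) the
    sentence's divergence for that subdomain.
    `parses`: list of k constituent sets, e.g. {(label, start, end), ...};
    `sent_divergences[i]`: divergence_sent_i(w_1^n)."""
    k = len(parses)
    # Eq. (4): score(c) = fraction of the k parses containing constituent c
    all_constituents = set().union(*parses)
    c_score = {c: sum(c in p for p in parses) / k for c in all_constituents}
    scores = []
    for p, div in zip(parses, sent_divergences):
        agreement = sum(c_score[c] for c in p) / len(p)   # first term of (5)
        scores.append((1 - lam) * agreement + lam * div)  # Eq. (5)
    return scores
```

The tree of the highest-scoring subdomain parser is then selected, e.g. `parses[max(range(len(scores)), key=scores.__getitem__)]`.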

SLIDE 15


First Experiment: Variance among Parsers

Are subdomain parsers complementary? Optimal decision procedure, an oracle:

π_oracle^best = argmax_i F-score(π_i)   (6)

Sentences ≤ 40 words:

Parser                     LR     LP     F-score
-- Section 00 (development set) --
Baseline                   89.44  89.63  89.53
Sports                     88.95  88.83  88.89
Financial                  89.01  88.84  88.92
Politics                   88.86  88.70  88.78
Oracle combination         90.59  90.66  90.62
Improvement over baseline  +1.15  +1.03  +1.09
-- Section 23 (test set) --
Baseline                   88.77  88.87  88.82
Oracle combination         90.11  90.11  90.11
Improvement over baseline  +1.34  +1.24  +1.29
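The oracle of eq. (6) can be sketched as follows, with a simplified labelled-bracket F1 over constituent sets standing in for the PARSEVAL F-score; all names are illustrative:

```python
def f1(pred, gold):
    """Simplified labelled-bracket F1 between two constituent sets."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    p, r = tp / len(pred), tp / len(gold)
    return 2 * p * r / (p + r)

def oracle_select(candidates, gold):
    """Eq. (6): pi_best = argmax_i F-score(pi_i), i.e. pick the subdomain
    parser's tree that scores best against the gold tree. This is an upper
    bound on combination performance, usable only where gold trees exist."""
    return max(candidates, key=lambda c: f1(c, gold))
```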

SLIDE 16


Effect of Using Domain-awareness - Example

Sent#90: South Korea registered a trade deficit of $ 101 million in October, reflecting the country’s economic sluggishness, according to government figures released Wednesday.

[Trees: two candidate VP parses of “registered a trade deficit of $ 101 million in October”, differing in the attachment of the PP “in October”.]

Parser_baseline F-score: 87.80% (incorrect PP-attachment). Oracle prediction F-score: 100% (Parser_financial or Parser_politics).

SLIDE 17


Short Recap

The example illustrates that a domain-specifically trained parser may find a correct or better parse than the baseline parser. Our first experiment shows that our instantiation of subdomain-sensitive parsing has potential in general. We presented parser combination techniques that aim at realizing this potential.

SLIDE 18


Results of Parser Combination Techniques

Sentences ≤ 40 words, Section 00 (development set):

Parser                                                    LR     LP     F-score
Baseline                                                  89.44  89.63  89.53
Parser Pre-selection: Divergence Model (DVM)              89.50  89.68  89.59
Parser Post-selection: Node Weighting incl. DVM, λ = 0.6  89.53  89.71  89.62

Parser Post-selection NW-DVM highest F-score: 89.62%, i.e. +0.09% over baseline.

SLIDE 19


Results of Parser Combination Techniques

Result of Node Weighting incl. DVM (NW-DVM)

[Plot: F-score (88–90.5) vs. λ (0.2–1) for Node Weighting including DVM on the sentence level; curves for WSJ-40 (SentLevel) and WSJ-100 (SentLevel) against the Baseline WSJ-40 and Baseline WSJ-100 lines.]

SLIDE 20


Results of Parser Combination Techniques

Summary

Post-selection that considers both the parse tree and the sentence performs best. Nevertheless, it is closely followed by Parser Pre-selection based on the sentence only. Results are confirmed on the test set (section 23):

1. Node Weighting incl. DVM with λ = 0.6 (+0.08% F-score)
2. Divergence Model (+0.03%)

SLIDE 21


Conclusions and Future Work

Our first instantiation of subdomain-sensitive parsing has indeed been demonstrated to have potential. However, combining the parsers to obtain a substantially better result is not an easy task. Our approach leaves room to extend, refine or improve various parts:

- Other ways of instantiating domain-dependent parsers (e.g. self-training)
- A more sophisticated notion of domain
- Further exploring parser combination techniques
- Exploring to what extent n-best parsing might benefit from subdomain information

SLIDE 22


Thank you for your attention.

SLIDE 23


Treebank Weighting

Weight the trees in treebank TB with subdomain statistics and retrain the parser. Use a domain-dependent raw corpus C (flat sentences):

C ∈ {sports, financial, politics}

Induce a statistical Language Model (LM) θ from C. Assign a count* f to every tree π_i ∈ TB:

f_θ(π_i) = f_θ(y[π_i]) = −log P_θ(y[π_i]) / n   (7)

Let f_θ^max be the maximum count of a tree in TB according to θ. The weight w_i assigned to π_i is defined as:

w_i = round( (f_θ^max / f_θ(π_i))^a )   (8)

where a ≥ 1 is a scaling constant; in the default setting a = 1.

* f = average per-word “count” of the yield y[π_i] under LM θ
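Equations (7) and (8) could be sketched as follows, assuming a unigram LM induced from the raw domain corpus; the LM type is left open by the slides, so the unigram form, the unseen-word floor and all names are assumptions:

```python
import math
from collections import Counter

def train_unigram_lm(corpus):
    """Induce a unigram LM theta from a raw (flat-sentence) domain corpus."""
    counts = Counter(w for sent in corpus for w in sent)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def f_theta(yield_words, lm, floor=1e-15):
    """Eq. (7): f_theta(pi) = -log P_theta(y[pi]) / n, the average per-word
    'count' of the tree's yield under the LM (unseen words are floored)."""
    logp = sum(math.log(lm.get(w, floor)) for w in yield_words)
    return -logp / len(yield_words)

def tree_weights(yields, lm, a=1):
    """Eq. (8): w_i = round((f_theta_max / f_theta(pi_i)) ** a). Trees whose
    yields are likely under the domain LM get high integer weights."""
    fs = [f_theta(y, lm) for y in yields]
    f_max = max(fs)
    return [round((f_max / f) ** a) for f in fs]
```

Since f_θ is a per-word negative log-probability, in-domain yields have small f_θ and therefore receive large weights, biasing the retrained parser toward the subdomain.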