Multiword Expression Identification with Tree Substitution Grammars
Spence Green, Marie-Catherine de Marneffe, John Bauer, and Christopher D. Manning
Stanford University
EMNLP 2011
Main Idea
Use syntactic context to find multiword expressions
◮ Syntactic context → constituency parses
◮ Multiword expressions → idiomatic constructions
Which languages?
Results and analysis for French
◮ Lexicographic tradition of compiling MWE lists
◮ Annotated data!
English examples in the talk
Motivating Example: Humans get this
1. He kicked the pail.
2. He kicked the bucket.
◮ “He died.”
(Katz and Postal 1963)
Stanford parser can’t tell the difference
(S (NP He) (VP kicked (NP the pail)))
(S (NP He) (VP kicked (NP the bucket)))
What does the lexicon contain?
Single-word entries?
◮ kick : <agent, theme>
◮ die : <theme>
Multi-word entries?
◮ kick the bucket : <theme>
(S (NP He) (VP kicked (NP the bucket)))
Lexicon-Grammar: He kicked the bucket
(S (NP He) (VP died))
(S (NP He) (VP (MWV kicked the bucket)))
(Gross 1986)
MWEs in Lexicon-Grammar
Classified by global POS
Described by internal POS sequence
Flat structures!
(MWV (VBD kicked) (DT the) (NN bucket))
Of theoretical interest but...
Why do we care (in NLP)?
MWE knowledge improves:
◮ Dependency parsing (Nivre and Nilsson 2004)
◮ Constituency parsing (Arun and Keller 2005)
◮ Sentence generation (Hogan et al. 2007)
◮ Machine translation (Carpuat and Diab 2010)
◮ Shallow parsing (Korkontzelos and Manandhar 2010)
Most experiments assume high accuracy identification!
French and the French Treebank
MWEs common in French
◮ ∼5,000 multiword adverbs
Paris 7 French Treebank
◮ ∼16,000 trees
◮ 13% of tokens are MWEs
Example: (MWC (P sous) (N prétexte) (C que)) 'on the grounds that'
French Treebank: MWE types
[Bar chart: % of total MWEs by global POS: N, ADV, P, C, V, D, PRO, CL, ET, I]
Lots of nominal compounds, e.g. N–N numéro deux 'number two'
MWE Identification Evaluation
Identification is a by-product of parsing
◮ Corpus: Paris 7 French Treebank (FTB)
◮ Split: same as Crabbé and Candito (2008)
◮ Metrics: Precision and Recall
◮ Lengths ≤ 40 words
MWE Identification: Parent-Annotated PCFG
PA-PCFG: 32.6 F1
MWE Identification: n-gram methods
PA-PCFG: 32.6 F1
mwetoolkit: 34.7 F1
Standard approach in 2008 MWE Shared Task, MWE Workshops, etc.
n-gram methods: mwetoolkit
Based on surface statistics
Step 1: Lemmatize and POS tag the corpus
Step 2: Compute n-gram statistics:
◮ Maximum likelihood estimator
◮ Dice's coefficient
◮ Pointwise mutual information
◮ Student's t-score
(Ramisch, Villavicencio, and Boitet 2010)
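As a sketch of Step 2, the association measures can be computed directly from corpus counts. This minimal Python is an illustration, not mwetoolkit's actual code; the function name `bigram_scores` is an assumption:

```python
from collections import Counter
from math import log2

def bigram_scores(tokens):
    """Score each adjacent word pair with PMI and Dice's coefficient."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (w1, w2), c12 in bigrams.items():
        p12 = c12 / (n - 1)                       # bigram MLE
        p1, p2 = unigrams[w1] / n, unigrams[w2] / n
        pmi = log2(p12 / (p1 * p2))               # pointwise mutual information
        dice = 2 * c12 / (unigrams[w1] + unigrams[w2])
        scores[(w1, w2)] = (pmi, dice)
    return scores
```

On real data these scores would be computed over lemmas and POS tags, per Step 1.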
n-gram methods: mwetoolkit
Step 3: Create n-gram feature vectors
Step 4: Train a binary classifier
Exploits the statistical idiomaticity of MWEs
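Steps 3 and 4 can be sketched with any linear classifier over the n-gram feature vectors; here is a minimal perceptron in plain Python (an illustration of the pipeline shape, not the classifier mwetoolkit actually ships):

```python
def train_perceptron(vectors, labels, epochs=50):
    """Train a simple perceptron: vectors are n-gram feature vectors
    (e.g. association scores), labels are +1 (MWE) / -1 (not MWE)."""
    w = [0.0] * len(vectors[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:                    # misclassified: update
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```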
Is statistical idiomaticity sufficient?
French multiword verbs
The tree maintains the relationship between the parts of the MWV:
(VN (MWV va) (MWADV d'ailleurs) (MWV bon train)) 'is also well underway'
Recap: French MWE Identification Baselines
PA-PCFG: 32.6 F1
mwetoolkit: 34.7 F1
Let’s build a better grammar
Better PCFGs: Manual grammar splits
Symbol refinement à la Klein and Manning (2003)
◮ Has a verbal nucleus (VN)
(COORD-hasVN (C Ou) (ADV bien) (VN doit -il ...)) 'Otherwise he must ...'
French MWE Identification: Manual Splits
PA-PCFG: 32.6 F1
mwetoolkit: 34.7 F1
Splits: 63.1 F1
MWE features: high frequency POS sequences
Capture more syntactic context?
PCFGs work well!
Larger "rules": Tree Substitution Grammars (TSG)
Relationship with Data-Oriented Parsing (DOP):
◮ Same grammar formalism (TSG)
◮ We include unlexicalized fragments
◮ Different parameter estimation
Which tree fragments do we select?
(S (NP (N He)) (VP (MWV (V kicked) (D the) (N bucket))))
Which tree fragments do we select?
Selected fragments:
(NP (N He))   (V kicked)   (MWV V (D the) (N bucket))   (S NP (VP MWV))
TSG Grammar Extraction as Tree Selection
(MWV V (D the) (N bucket))
◮ Describes MWE context
◮ Allows for inflection: kick, kicked, kicking
Dirichlet process TSG (DP-TSG)
Tree selection as non-parametric clustering¹
Labeled Chinese Restaurant process
◮ Dirichlet process (DP) prior for each non-terminal type c
Supervised case: segment the treebank
¹ Cohn, Goldwater, and Blunsom 2009; Post and Gildea 2009; O'Donnell, Tenenbaum, and Goodman 2009.
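To make the clustering metaphor concrete, here is a minimal simulation of the plain (unlabeled) Chinese Restaurant process; the DP-TSG uses a labeled variant with one process per non-terminal type, which this sketch omits:

```python
import random

def crp_assign(n_customers, alpha, rng=random.Random(0)):
    """Seat customers by the Chinese Restaurant process: customer i joins
    occupied table k with probability n_k / (i + alpha) and opens a new
    table with probability alpha / (i + alpha)."""
    tables = []          # tables[k] = number of customers at table k
    seating = []         # seating[i] = table index of customer i
    for i in range(n_customers):
        weights = tables + [alpha]        # last slot = new table
        r = rng.uniform(0, i + alpha)     # total weight is i + alpha
        acc = 0.0
        for k, w in enumerate(weights):
            acc += w
            if r <= acc:
                break
        if k == len(tables):
            tables.append(1)              # open a new table (cluster)
        else:
            tables[k] += 1
        seating.append(k)
    return seating, tables
```

The "rich get richer" dynamics of the table counts is what lets frequent tree fragments form large clusters.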
DP-TSG: Learning and Inference
DP base distribution from the manually-split CFG
Type-based Gibbs sampler (Liang, Jordan, and Klein 2010)
◮ Fast convergence: 400 iterations
Derivations of a TSG are a CFG forest
◮ SCFG decoder: cdec (Dyer et al. 2010)
French MWE Identification: DP-TSG
PA-PCFG: 32.6 F1
mwetoolkit: 34.7 F1
Splits: 63.1 F1
DP-TSG: 71.1 F1
DP-TSG result is a lower bound
Human-interpretable DP-TSG rules
MWN → coup de N
◮ coup de pied 'kick'
◮ coup de coeur 'favorite'
◮ coup de foudre 'love at first sight'
◮ coup de main 'help'
◮ coup de grâce 'death blow'
n-gram methods: separate feature vectors
DP-TSG errors: Overgeneration
Reference: (NP (D Le) (N marché) (AP (A national))) 'the national market'
DP-TSG: (NP (D Le) (MWN (N marché) (A national)))
MWEs are subtle; the reference annotation is sometimes inconsistent
Standard Parsing Evaluation
Same setup as MWE identification!
◮ Corpus: Paris 7 French Treebank (FTB)
◮ Split: same as Crabbé and Candito (2008)
◮ Metrics: Evalb and Leaf Ancestor
◮ Lengths ≤ 40 words
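Evalb-style labeled bracket scoring reduces to multiset overlap of (label, start, end) spans. A minimal sketch of that computation (not the actual Evalb tool, which also applies parameterized label and punctuation rules):

```python
from collections import Counter

def bracket_f1(gold, guess):
    """Labeled bracket precision/recall/F1 over (label, start, end) spans."""
    g, h = Counter(gold), Counter(guess)
    match = sum((g & h).values())       # multiset intersection of spans
    p = match / sum(h.values())
    r = match / sum(g.values())
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```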
French Parsing Evaluation: All bracketings
PA-PCFG: 67.6 Evalb F1
Splits: 75.2 Evalb F1
DP-TSG: 75.8 Evalb F1
Paper: more results (Stanford, Berkeley, etc.)
Future Directions
Syntactic context for n-gram methods
◮ Parse the corpus!
◮ Adapt lexical context measures to syntactic context
DP-TSG
◮ Better base distribution
Conclusion
Parsers work well for MWE identification
Other languages: combine treebanks with MWE lists
Non-"gold mode" parsing results for French
Code → Google: “Stanford parser”
Un grand merci. 'Thanks a lot.'
Questions?
MWE Identification Results
PA-PCFG: 32.6 F1
mwetoolkit: 34.7 F1
Splits: 63.1 F1
Berkeley: 69.6 F1
Stanford: 70.1 F1
DP-TSG: 71.1 F1
Dirichlet process TSG
DP prior for each non-terminal type c ∈ V:
θ_c | c, α_c, P0(·|c) ∼ DP(α_c, P0)
e | θ_c ∼ θ_c
Binary variable b_s for each non-terminal node in the corpus
◮ Supervised case: segment the treebank²
² Cohn, Goldwater, and Blunsom 2009; Post and Gildea 2009; O'Donnell, Tenenbaum, and Goodman 2009.
DP-TSG: Base distribution P0
Phrasal rules: P0(A⁺ → B⁻ C⁺) = pMLE(A → B C) · s_B · (1 − s_C)
pMLE is the manually-split grammar!
s_B is the stop probability
DP-TSG: Base distribution P0
Lexical insertion rules: P0(C⁺ → t) = pMLE(C → t) · p(t)
p(t) is the unigram probability of word t
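Putting the two formulas together, the P0 score of a whole elementary tree multiplies one pMLE term per rule with the stop probability s for each frontier child and (1 − s) for each child expanded inside the fragment. A sketch of that recursion; the tuple encoding of trees and all the probability values in the example are illustrative assumptions, not values from the paper:

```python
def p0_fragment(node, p_mle, stop, p_word):
    """Score an elementary tree under the base distribution P0.
    node = (label, children); children is [] for a frontier non-terminal,
    ["word"] for a lexical insertion, or a list of child nodes."""
    label, children = node
    if not children:                     # frontier: substitution site ends the fragment
        return 1.0
    if isinstance(children[0], str):     # lexical rule: P0(C+ -> t) = pMLE(C -> t) * p(t)
        return p_mle[(label, children[0])] * p_word[children[0]]
    # phrasal rule: pay pMLE, then s for each child that stops (frontier)
    # and (1 - s) for each child that continues into the fragment
    prob = p_mle[(label, tuple(c[0] for c in children))]
    for child in children:
        child_label, child_kids = child
        s = stop[child_label]
        prob *= s if not child_kids else (1 - s)
        prob *= p0_fragment(child, p_mle, stop, p_word)
    return prob
```

For the fragment (MWV V (D the) (N bucket)), the V child pays s_V (it is a frontier), while D and N pay (1 − s) plus their lexical-insertion terms.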
Tree substitution grammars
A probabilistic TSG is a 5-tuple ⟨V, Σ, R, ♦, θ⟩:
◮ c ∈ V are non-terminals
◮ ♦ ∈ V is a unique start symbol
◮ t ∈ Σ are terminals
◮ e ∈ R are elementary trees
◮ θ_{c,e} ∈ θ are parameters for each tree fragment
elementary tree == tree fragment
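The 5-tuple can be written down directly as a data structure. A minimal sketch; the class name, the bracketed-string fragment encoding, and the `is_proper` check are assumptions for illustration, not the paper's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class PTSG:
    """A probabilistic TSG <V, Sigma, R, start, theta>."""
    nonterminals: set                             # V
    terminals: set                                # Sigma
    start: str                                    # the unique start symbol
    rules: dict = field(default_factory=dict)     # R with theta: root -> {fragment: prob}

    def add_fragment(self, root, fragment, prob):
        """Register elementary tree `fragment` rooted at `root` with weight theta."""
        self.rules.setdefault(root, {})[fragment] = prob

    def is_proper(self, root, tol=1e-9):
        """theta_{c,.} must sum to 1 over the fragments rooted at c."""
        return abs(sum(self.rules[root].values()) - 1.0) < tol
```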