Multiword Expression Identification with Tree Substitution Grammars (PowerPoint PPT Presentation)

SLIDE 1

Multiword Expression Identification with Tree Substitution Grammars

Spence Green, Marie-Catherine de Marneffe, John Bauer, and Christopher D. Manning. Stanford University. EMNLP 2011.

SLIDE 4

Main Idea

Use syntactic context to find multiword expressions:
◮ Syntactic context → constituency parses
◮ Multiword expressions → idiomatic constructions

SLIDE 7

Which languages?

Results and analysis for French:
◮ Lexicographic tradition of compiling MWE lists
◮ Annotated data!

English examples in the talk.

SLIDE 8

Motivating Example: Humans get this

1. He kicked the pail.
2. He kicked the bucket.

◮ Only (2) can mean "He died."

(Katz and Postal 1963)

SLIDE 10

Stanford parser can't tell the difference

(S (NP He) (VP kicked (NP the pail)))
(S (NP He) (VP kicked (NP the bucket)))

SLIDE 11

What does the lexicon contain?

Single-word entries?
◮ kick : <agent, theme>
◮ die : <theme>

Multi-word entries?
◮ kick the bucket : <theme>

(S (NP He) (VP kicked (NP the bucket)))

SLIDE 13

Lexicon-Grammar: He kicked the bucket

(S (NP He) (VP died))
(S (NP He) (VP (MWV kicked the bucket)))

(Gross 1986)

SLIDE 15

MWEs in Lexicon-Grammar

Classified by global POS, described by internal POS sequence. Flat structures!

(MWV (VBD kicked) (DT the) (NN bucket))

Of theoretical interest but...

SLIDE 17

Why do we care (in NLP)?

MWE knowledge improves:
◮ Dependency parsing (Nivre and Nilsson 2004)
◮ Constituency parsing (Arun and Keller 2005)
◮ Sentence generation (Hogan et al. 2007)
◮ Machine translation (Carpuat and Diab 2010)
◮ Shallow parsing (Korkontzelos and Manandhar 2010)

Most experiments assume high accuracy identification!

SLIDE 19

French and the French Treebank

MWEs common in French
◮ ∼5,000 multiword adverbs

Paris 7 French Treebank
◮ ∼16,000 trees
◮ 13% of tokens are part of an MWE

(MWC (P sous) (N prétexte) (C que)) 'on the grounds that'

SLIDE 20

French Treebank: MWE types

[Bar chart: % of total MWEs by global POS category (N, ADV, P, C, V, D, PRO, CL, ET, I)]

Lots of nominal compounds, e.g. N-N numéro deux 'number two'

SLIDE 22

MWE Identification Evaluation

Identification is a by-product of parsing

◮ Corpus: Paris 7 French Treebank (FTB)
◮ Split: same as (Crabbé and Candito 2008)
◮ Metrics: Precision and Recall
◮ Lengths ≤ 40 words
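Precision and recall here are computed over MWE spans. A simplified sketch of span-level scoring (the (start, end, label) span representation is an assumption for illustration; the paper's exact matching criteria may differ):

```python
def mwe_prf(gold, pred):
    """Precision, recall, and F1 over MWE spans.

    Each span is a (start, end, label) tuple; a prediction counts as
    correct only on an exact span-and-label match."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)                       # exact matches
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Exact-span matching is strict: a predicted MWE that overlaps a gold MWE but differs by one token scores zero, which is why the F1 numbers on the following slides are hard to push up.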

SLIDE 23

MWE Identification: Parent-Annotated PCFG

[Bar chart, F1: PA-PCFG 32.6]

SLIDE 25

MWE Identification: n-gram methods

[Bar chart, F1: PA-PCFG 32.6, mwetoolkit 34.7]

Standard approach in 2008 MWE Shared Task, MWE Workshops, etc.

SLIDE 28

n-gram methods: mwetoolkit

Based on surface statistics.
Step 1: Lemmatize and POS tag the corpus
Step 2: Compute n-gram statistics:
◮ Maximum likelihood estimator
◮ Dice's coefficient
◮ Pointwise mutual information
◮ Student's t-score

(Ramisch, Villavicencio, and Boitet 2010)
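Step 2's association measures are standard corpus statistics. A minimal sketch of computing them from raw counts (the function name and count-based interface are illustrative, not mwetoolkit's actual API):

```python
from math import log2, sqrt

def bigram_scores(c_xy, c_x, c_y, n):
    """Association scores for a candidate bigram (x, y).

    c_xy: bigram count; c_x, c_y: unigram counts; n: total bigram tokens."""
    p_x, p_y, p_xy = c_x / n, c_y / n, c_xy / n
    return {
        "mle": p_xy,                               # maximum likelihood estimate
        "dice": 2.0 * c_xy / (c_x + c_y),          # Dice's coefficient
        "pmi": log2(p_xy / (p_x * p_y)),           # pointwise mutual information
        "t": (c_xy - n * p_x * p_y) / sqrt(c_xy),  # Student's t-score
    }
```

High association scores flag statistically idiomatic n-grams; these scores become the features in Steps 3-4 below.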

SLIDE 31

n-gram methods: mwetoolkit

Step 3: Create n-gram feature vectors
Step 4: Train a binary classifier

Exploits statistical idiomaticity of MWEs

SLIDE 32

Is statistical idiomaticity sufficient?

French multiword verbs: the tree maintains the relationship between the parts of the MWV even when they are discontiguous:

(VN (MWV va) (MWADV d' ailleurs) (MWV bon train)) 'is also well underway'

SLIDE 34

Recap: French MWE Identification Baselines

[Bar chart, F1: PA-PCFG 32.6, mwetoolkit 34.7]

Let's build a better grammar

SLIDE 37

Better PCFGs: Manual grammar splits

Symbol refinement à la (Klein and Manning 2003)
◮ Has a verbal nucleus (VN)

(COORD (C Ou) (ADV bien) (VN doit -il ...)) 'Otherwise he must ...'

SLIDE 38

Better PCFGs: Manual grammar splits

Symbol refinement à la (Klein and Manning 2003)
◮ Has a verbal nucleus (VN)

(COORD-hasVN (C Ou) (ADV bien) (VN doit -il ...)) 'Otherwise he must ...'
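The hasVN split above can be sketched as a one-pass tree relabeling that marks any node with a VN child. The (label, children) tuple encoding is illustrative, not the Stanford parser's data structure, and the paper's full set of manual splits is richer than this one rule:

```python
def annotate_has_vn(tree):
    """Append '-hasVN' to a node's label when one of its children is a VN.

    Trees are (label, children) pairs; leaves are plain strings."""
    if isinstance(tree, str):               # terminal word: nothing to do
        return tree
    label, children = tree
    if any(not isinstance(c, str) and c[0] == "VN" for c in children):
        label += "-hasVN"                   # refine the symbol
    return (label, [annotate_has_vn(c) for c in children])
```

Applied to the COORD tree above, this yields the COORD-hasVN refinement, so the PCFG can learn different expansion statistics for coordinations that contain a verbal nucleus.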

SLIDE 40

French MWE Identification: Manual Splits

[Bar chart, F1: PA-PCFG 32.6, mwetoolkit 34.7, Splits 63.1]

MWE features: high frequency POS sequences

SLIDE 43

Capture more syntactic context?

PCFGs work well! Larger "rules": Tree Substitution Grammars (TSG)

Relationship with Data-Oriented Parsing (DOP):
◮ Same grammar formalism (TSG)
◮ We include unlexicalized fragments
◮ Different parameter estimation

SLIDE 45

Which tree fragments do we select?

(S (NP (N He)) (VP (MWV (V kicked) (D the) (N bucket))))

SLIDE 46

Which tree fragments do we select?

Segmented into fragments: (NP (N He)), (V kicked), (MWV V (D the) (N bucket)), (S NP (VP MWV))

SLIDE 48

TSG Grammar Extraction as Tree Selection

(MWV V (D the) (N bucket))

◮ Describes MWE context
◮ Allows for inflection: kick, kicked, kicking
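Given binary cut decisions at tree nodes (the segmentation variables used below), extracting the elementary trees can be sketched as follows. The (label, cut, children) node encoding is an assumption for illustration:

```python
def extract_fragments(node):
    """Segment a parse tree into TSG elementary trees at marked cut points.

    A node is (label, cut, children); leaves are plain strings. A cut child
    becomes a frontier non-terminal (bare label) in its parent's fragment
    and the root of a new fragment of its own."""
    fragments = []

    def build(n, is_root):
        if isinstance(n, str):              # terminal word
            return n
        label, cut, children = n
        if cut and not is_root:             # substitution site
            fragments.append(build(n, True))
            return label                    # frontier non-terminal in parent
        return (label, [build(c, False) for c in children])

    fragments.insert(0, build(node, True))  # root fragment first
    return fragments
```

Cutting below V but not below D or N yields exactly the (MWV V (D the) (N bucket)) fragment above: the determiner and noun are frozen, while V stays open for inflection.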

SLIDE 51

Dirichlet process TSG (DP-TSG)

Tree selection as non-parametric clustering¹
Labeled Chinese Restaurant process
◮ Dirichlet process (DP) prior for each non-terminal type c
Supervised case: segment the treebank

¹Cohn, Goldwater, and Blunsom 2009; Post and Gildea 2009; O'Donnell, Tenenbaum, and Goodman 2009.

SLIDE 55

DP-TSG: Learning and Inference

DP base distribution from the manually-split CFG
Type-based Gibbs sampler (Liang, Jordan, and Klein 2010)
◮ Fast convergence: 400 iterations

Derivations of a TSG are a CFG forest
◮ SCFG decoder: cdec (Dyer et al. 2010)

SLIDE 57

French MWE Identification: DP-TSG

[Bar chart, F1: PA-PCFG 32.6, mwetoolkit 34.7, Splits 63.1, DP-TSG 71.1]

DP-TSG result is a lower bound

SLIDE 59

Human-interpretable DP-TSG rules

MWN → coup de N
◮ coup de pied 'kick'
◮ coup de coeur 'favorite'
◮ coup de foudre 'love at first sight'
◮ coup de main 'help'
◮ coup de grâce 'death blow'

n-gram methods: separate feature vectors

SLIDE 61

DP-TSG errors: Overgeneration

Reference: (NP (D Le) (N marché) (AP (A national))) 'The national market'
DP-TSG: (NP (D Le) (MWN (N marché) (A national)))

MWEs are subtle; the reference is sometimes inconsistent

SLIDE 63

Standard Parsing Evaluation

Same setup as MWE identification!

◮ Corpus: Paris 7 French Treebank (FTB)
◮ Split: same as (Crabbé and Candito 2008)
◮ Metrics: Evalb and Leaf Ancestor
◮ Lengths ≤ 40 words

SLIDE 65

French Parsing Evaluation: All bracketings

[Bar chart, Evalb F1: PA-PCFG 67.6, Splits 75.2, DP-TSG 75.8]

Paper: more results (Stanford, Berkeley, etc.)

SLIDE 67

Future Directions

Syntactic context for n-gram methods
◮ Parse the corpus!
◮ Adapt lexical context measures to syntactic context

DP-TSG
◮ Better base distribution

SLIDE 71

Conclusion

◮ Parsers work well for MWE identification
◮ Other languages: combine treebanks with MWE lists
◮ Non-"gold mode" parsing results for French

Code → Google: "Stanford parser"

SLIDE 72

un grand merci. thanks a lot.

SLIDE 73

Questions?

SLIDE 74

MWE Identification Results

[Bar chart, F1: PA-PCFG 32.6, mwetoolkit 34.7, Splits 63.1, Berkeley 69.6, Stanford 70.1, DP-TSG 71.1]

SLIDE 76

Dirichlet process TSG

DP prior for each non-terminal type c ∈ V:

    θ_c | c, α_c, P_0(·|c) ∼ DP(α_c, P_0)
    e | θ_c ∼ θ_c

Binary variable b_s for each non-terminal node s in the corpus
◮ Supervised case: segment the treebank²

²Cohn, Goldwater, and Blunsom 2009; Post and Gildea 2009; O'Donnell, Tenenbaum, and Goodman 2009.
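Marginalizing θ_c gives the usual Chinese-restaurant predictive probability for an elementary tree e with root type c, which is the quantity a sampler works with: (count_c(e) + α_c · P_0(e|c)) / (n_c + α_c). A sketch (the interface is illustrative):

```python
def crp_predictive(e, counts, alpha, p0):
    """Predictive probability of elementary tree e under one non-terminal's DP.

    counts: observed fragment counts for this non-terminal type;
    alpha: DP concentration; p0: base distribution P0(.|c)."""
    n = sum(counts.values())                       # tables/customers so far
    return (counts.get(e, 0) + alpha * p0(e)) / (n + alpha)
```

Frequently reused fragments get high predictive probability from their counts, while unseen fragments fall back on α · P_0, which is what lets the model propose new elementary trees during sampling.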

SLIDE 78

DP-TSG: Base distribution P_0

Phrasal rules:

    P_0(A+ → B− C+) = p_MLE(A → B C) · s_B · (1 − s_C)

p_MLE is from the manually-split grammar! s_B is the stop probability.

SLIDE 80

DP-TSG: Base distribution P_0

Lexical insertion rules:

    P_0(C+ → t) = p_MLE(C → t) · p(t)

p(t) is the unigram probability of word t.
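The two base-distribution formulas translate directly into code; a sketch with made-up inputs (in the model, p_MLE and the stop probabilities come from the manually-split grammar, not from these example values):

```python
def p0_phrasal(p_mle, s_b, s_c):
    """P0(A+ -> B- C+) = pMLE(A -> B C) * s_B * (1 - s_C).

    The fragment stops at B with probability s_B (B becomes a substitution
    site) and keeps expanding through C with probability 1 - s_C."""
    return p_mle * s_b * (1.0 - s_c)

def p0_lexical(p_mle, p_t):
    """P0(C+ -> t) = pMLE(C -> t) * p(t), with p(t) the unigram probability."""
    return p_mle * p_t
```

P_0 of a whole fragment is then the product of these terms over its rules, so larger fragments get geometrically smaller base probability, a prior bias toward small, reusable fragments.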

SLIDE 82

Tree substitution grammars

A Probabilistic TSG is a 5-tuple ⟨V, Σ, R, ♦, θ⟩:
◮ c ∈ V are non-terminals
◮ ♦ ∈ V is the unique start symbol
◮ t ∈ Σ are terminals
◮ e ∈ R are elementary trees
◮ θ_{c,e} ∈ θ are parameters, one per tree fragment

elementary tree == tree fragment
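A TSG derivation rewrites frontier non-terminals with elementary trees until only terminals remain. A toy sketch of the substitution operation (the tuple encoding is illustrative, and a real derivation would pick fragments stochastically according to θ rather than from a fixed map):

```python
def substitute(fragment, subs):
    """Expand a TSG derivation from a root fragment.

    Fragments are (label, children); a bare string child that is a key of
    subs is a frontier non-terminal to rewrite, any other string is a
    terminal word (toy assumption: words never collide with labels)."""
    label, children = fragment
    out = []
    for c in children:
        if isinstance(c, str) and c in subs:
            out.append(substitute(subs[c], subs))  # substitution site
        elif isinstance(c, str):
            out.append(c)                          # terminal word
        else:
            out.append(substitute(c, subs))        # internal node
    return (label, out)
```

Substituting the MWV fragment from earlier slides at the frontier of (S NP (VP MWV)) reproduces the full "He kicked the bucket" parse, which is why TSG derivations can be decoded as a CFG forest.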