Tree-Adjoining Grammar Parsing and Vector Representations of Supertags - PowerPoint PPT Presentation


SLIDE 1

Background and Motivations · Supertagging Models · Parsing Models · Vector Representations of Supertags · Ongoing TAG Parsing Work · Applications of TAG · Future Work

Tree-Adjoining Grammar Parsing and Vector Representations of Supertags

Jungo Kasai

Yale University

December 14, 2017

SLIDE 2

Outline

1. Background and Motivations
2. Supertagging Models
3. Parsing Models
4. Vector Representations of Supertags
5. Ongoing TAG Parsing Work
6. Applications of TAG
7. Future Work


SLIDE 4

Syntactic Parsing

[S [NP John] [VP [AdvP really] [VP [V likes] [NP Mary]]]]

Why do we need parsing? Does John love Mary, or does Mary love John? The understanding of a sentence depends on its structure.

SLIDE 5

Context-Free Grammars

[S [NP John] [VP [AdvP really] [VP [V likes] [NP Mary]]]]

S → NP VP
VP → AdvP VP
VP → V NP
AdvP → really
NP → Mary
NP → they
NP → John
V → like
V → likes

These production rules generate sentences.
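A minimal sketch of how these production rules generate sentences (the rule table mirrors the slide; the generator itself is illustrative, not from the talk):

```python
import random

# Toy CFG from the slide. Each nonterminal maps to its alternative
# right-hand sides; any symbol without an entry is a terminal.
RULES = {
    "S": [["NP", "VP"]],
    "VP": [["AdvP", "VP"], ["V", "NP"]],
    "AdvP": [["really"]],
    "NP": [["Mary"], ["they"], ["John"]],
    "V": [["like"], ["likes"]],
}

def generate(symbol, rng):
    """Expand a symbol top-down by repeatedly applying production rules."""
    if symbol not in RULES:
        return [symbol]
    rhs = rng.choice(RULES[symbol])
    return [word for sym in rhs for word in generate(sym, rng)]

print(" ".join(generate("S", random.Random(0))))
```

Note that nothing ties the choice of V → like versus V → likes to the subject, so this grammar also derives strings like "they likes Mary"; that is exactly the problem raised on the next slide.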

SLIDE 6

Context-Free Grammars

[S [NP John] [VP [AdvP really] [VP [V likes] [NP Mary]]]]

S → NP VP
VP → AdvP VP
VP → V NP
AdvP → really
NP → Mary
NP → they
NP → John
V → like
V → likes

Fundamental problem: constraints are distributed over separate rules. How do we choose V → like or V → likes?

SLIDE 7

Tree-Adjoining Grammar

Tree-Adjoining Grammar (TAG) localizes grammatical constraints: a finite set of lexicalized elementary trees is combined by a finite set of operations (substitution and adjunction).

likes:  [S [NP0↓] [VP [V♦ likes] [NP1↓]]]
sleep:  [S [NP0↓] [VP [V♦ sleep]]]
John:   [NP [N♦ John]]
really: [VP [AdvP [Ad♦ really]] [VP*]]
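The two combination operations can be sketched with a toy tree structure. The Node class and helper names below are hypothetical; real TAG implementations track node addresses and distinguish initial from auxiliary trees:

```python
import copy
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    subst: bool = False  # substitution site, e.g. NP0↓
    foot: bool = False   # foot node of an auxiliary tree, e.g. VP*

def substitute(tree, site, initial):
    """Replace the first substitution site with the given label by an initial tree."""
    for i, c in enumerate(tree.children):
        if c.subst and c.label == site:
            tree.children[i] = initial
            return True
        if substitute(c, site, initial):
            return True
    return False

def adjoin(tree, target, aux):
    """Splice an auxiliary tree in at the first internal node labeled `target`;
    the displaced subtree moves to the auxiliary tree's foot node.
    (Sketch limitation: cannot adjoin at the root itself.)"""
    for i, c in enumerate(tree.children):
        if c.label == target and c.children:
            new = copy.deepcopy(aux)
            _fill_foot(new, c)
            tree.children[i] = new
            return True
        if adjoin(c, target, aux):
            return True
    return False

def _fill_foot(t, subtree):
    for i, c in enumerate(t.children):
        if c.foot:
            t.children[i] = subtree
            return True
        if _fill_foot(c, subtree):
            return True
    return False

def yield_of(t):
    """The terminal string of a (derived) tree."""
    if not t.children:
        return [] if (t.subst or t.foot) else [t.label]
    return [w for c in t.children for w in yield_of(c)]

# Elementary trees from the slide
likes = Node("S", [Node("NP0", subst=True),
                   Node("VP", [Node("V", [Node("likes")]),
                               Node("NP1", subst=True)])])
john = Node("NP", [Node("N", [Node("John")])])
mary = Node("NP", [Node("N", [Node("Mary")])])
really = Node("VP", [Node("AdvP", [Node("Ad", [Node("really")])]),
                     Node("VP", foot=True)])

substitute(likes, "NP0", john)   # John fills the subject slot
substitute(likes, "NP1", mary)   # Mary fills the object slot
adjoin(likes, "VP", really)      # really adjoins into the VP

assert " ".join(yield_of(likes)) == "John really likes Mary"
```

Substitution plugs an initial tree into a ↓ site; adjunction splices an auxiliary tree around an existing node, with the displaced material ending up under the foot node.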

SLIDE 8

Tree-Adjoining Grammar

Substitution

SLIDE 9

Tree-Adjoining Grammar

Adjunction

SLIDE 10

Tree-Adjoining Grammar

Adjunction allows for unbounded recursion while still enforcing agreement. John smartly occasionally really only likes Mary...

SLIDE 11

Derivation Tree

The derivation tree records the operations and forms a dependency tree (each token has exactly one parent).

(Figure: derivation tree for "John really likes Mary": likes is the root, with John attached by Subst 0, Mary by Subst 1, and really by Adj.)

SLIDE 12

Two Steps in TAG Parsing

Now the reverse process:
Supertagging: assign an elementary tree (supertag) to each token, similar to POS tagging.
Parsing: predict the operations on the elementary trees.

(Figure: two candidate supertags for "left": an intransitive declarative tree [S [NP0↓] [VP [V♦ left]]] and a subject-relative tree containing an empty element (-NONE-).)


SLIDE 14

Supertagging is a bottleneck

Supertagger    Parser        Stag Acc  UAS    LAS
Gold           Chart (MICA)  100.00    97.60  97.30
Maxent (MICA)  Chart (MICA)  88.52     87.60  85.80

Supertagging is almost parsing: there are about 5,000 supertags in the grammar, and about half of them occur only once in the training data (PTB WSJ Sections 1-22).

SLIDE 15

BiLSTM Supertagging

Figure: BiLSTM Supertagger Architecture.

SLIDE 16

Supertagging is still a bottleneck

Supertagger    Parser        Stag Acc
Maxent (MICA)  Chart (MICA)  88.52
BiLSTM         Chart (MICA)  89.32

SLIDE 17

Supertagging is still a bottleneck

Supertagger    Parser        Stag Acc  UAS    LAS
Gold           Chart (MICA)  100.00    97.60  97.30
Maxent (MICA)  Chart (MICA)  88.52     87.60  85.80
BiLSTM         Chart (MICA)  89.32     90.05  88.32

We can compensate for supertagging errors by exploiting structural similarities across elementary trees. Such similarities are not utilized by the chart parser, so we use two alternative families of parsing algorithms.


SLIDE 19

Parsing Models

Prior work: unlexicalized chart parser (MICA) [Bangalore et al., 2009]
Unlexicalized transition-based parser [Kasai et al., 2017, Friedman et al., 2017]
Graph-based parser (work in progress)

SLIDE 20

Transition-based Parsing

Arc-Eager System (MALT) [Nivre et al., 2006]

(Figure: arc-eager derivation for "ROOT John really likes Mary": likes attaches to ROOT, John by Subst 0, really by ADJ, and Mary by Subst 1.)
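A hand-scripted sketch of the arc-eager transitions for this sentence. In the real parser a classifier chooses each action; the sequence below is the gold derivation:

```python
# Sketch of the arc-eager transition system in the style of [Nivre et al., 2006].
# A configuration is (stack, buffer, arcs); tokens are word indices.

words = ["ROOT", "John", "really", "likes", "Mary"]
stack, buffer, arcs = [0], [1, 2, 3, 4], []

def shift():
    stack.append(buffer.pop(0))

def left_arc(label):
    # stack top becomes a dependent of the buffer front (and is popped)
    arcs.append((buffer[0], stack.pop(), label))

def right_arc(label):
    # buffer front becomes a dependent of the stack top (and is pushed)
    arcs.append((stack[-1], buffer[0], label))
    stack.append(buffer.pop(0))

def reduce_():
    # pop a word that already has its head (unused in this short derivation)
    stack.pop()

shift()                # push John
shift()                # push really
left_arc("ADJ")        # really <- likes
left_arc("Subst 0")    # John <- likes
right_arc("ROOT")      # ROOT -> likes
right_arc("Subst 1")   # likes -> Mary

assert set(arcs) == {(3, 2, "ADJ"), (3, 1, "Subst 0"),
                     (0, 3, "ROOT"), (3, 4, "Subst 1")}
```

Each arc is (head, dependent, label); the four arcs together are exactly the derivation tree from the earlier slide.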


SLIDE 28

Transition-based TAG Parsing

How do we learn? Represent the configuration by the top k elements of the stack and buffer, {s_i, b_i}_{i=1}^{k} [Chen and Manning, 2014]. Represent s_i (b_i) by its TAG elementary tree and the substitution operations performed into s_i so far. Encode the TAG elementary trees and the substitution operations with dense vectors.

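The configuration encoding can be sketched with a toy embedding table. The sizes, the zero-padding convention, and the random table are illustrative, not the model's:

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_SUPERTAGS, DIM, K = 10, 4, 2               # toy sizes (hypothetical)
E = rng.standard_normal((NUM_SUPERTAGS, DIM))  # dense supertag embedding table

def featurize(stack, buffer, pad=0):
    """Concatenate embeddings of the top-K stack and first-K buffer items
    (supertag IDs), padding short configurations with a pad ID."""
    top = (stack[::-1] + [pad] * K)[:K]    # s_1 .. s_K, top of stack first
    front = (buffer + [pad] * K)[:K]       # b_1 .. b_K
    return np.concatenate([E[i] for i in top + front])

x = featurize([3, 7], [2, 5, 1])
assert x.shape == (2 * K * DIM,)
```

The resulting fixed-length vector is what the feed-forward network on the next slide consumes.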
SLIDE 29

NN Transition-based Parsing Model

Figure: Transition-based Parser Neural Network Architecture.

SLIDE 30

Example

John really likes Mary

Stack: [ROOT, likes]   Buffer: [Mary]   Relations: {(ROOT, likes, ROOT), (likes, John, 0), ...}   Action: RIGHT:1

SLIDE 31

Parsing Results

                  Gold Stags      Predicted Stags
Parsing Model     UAS    LAS      UAS    LAS
MICA Chart        97.60  97.30    90.05  88.32
Transition-based  97.67  97.45    90.23  88.77

Table: Results on Section 00. Beam size 16.

Predicted supertags are from our BiLSTM supertagger.


SLIDE 33

Embeddings for Elementary Trees

SLIDE 34

Induced Embeddings

The input is a one-hot vector for each supertag; randomly initialized embedding weights are trained with the network. The embeddings therefore have no a priori knowledge of the syntactic properties of the elementary trees.

SLIDE 35

PCA Plots of Vector Representations

Figure: Declarative/subject relative alignment (Atomic embeddings) ex: the man sneezed vs. the man who sneezed
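The PCA projection behind these plots can be sketched with an SVD; the toy random embeddings below stand in for the trained supertag embeddings:

```python
import numpy as np

def pca_2d(X):
    """Project rows of X (one supertag embedding per row) onto the top two
    principal components."""
    Xc = X - X.mean(axis=0)                      # center the embeddings
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                         # coordinates in the PC basis

rng = np.random.default_rng(0)
emb = rng.standard_normal((50, 16))              # toy: 50 embeddings, dim 16
pts = pca_2d(emb)
assert pts.shape == (50, 2)
```

The right singular vectors of the centered matrix are the principal directions, so the first output column carries at least as much variance as the second.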

SLIDE 36

PCA Plots of Vector Representations

Figure: Transitive/intransitive alignment (Atomic embeddings). ex: the man who devoured the pizza vs. the man who sneezed

SLIDE 37

Analogy tests

Semantic analogies have been used to test word embeddings (e.g. [Mikolov et al., 2013]):

king : man :: queen : woman
⇒ vec(king) − vec(man) + vec(woman) ≈ vec(queen)

SLIDE 38

Analogy tests

We use syntactic analogies to test supertag embeddings:

transitive : intransitive :: subj.rel.trans. : subj.rel.intrans.
⇒ vec(transitive) − vec(intransitive) + vec(subj.rel.intrans.) ≈ vec(subj.rel.trans.)

(Figure: the four supertags involved: the transitive tree [S [NP0↓] [VP [V♦] [NP1↓]]], the intransitive tree [S [NP0↓] [VP [V♦]]], and their subject-relative counterparts, which contain an empty NP (-NONE-).)
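The analogy test itself is nearest-neighbor search under cosine similarity. The toy embeddings below are constructed so the analogy holds exactly, which real induced embeddings only approximate:

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def analogy(a, b, c, emb):
    """Answer a : b :: ? : c by the nearest cosine neighbor of
    emb[a] - emb[b] + emb[c], excluding the three query items."""
    q = emb[a] - emb[b] + emb[c]
    candidates = [n for n in emb if n not in (a, b, c)]
    return max(candidates, key=lambda n: cosine(q, emb[n]))

# Toy embeddings (hypothetical vectors): each supertag is a clause-type
# vector plus an optional transitivity vector, so the offsets line up.
rng = np.random.default_rng(0)
decl, rel, trans = (rng.standard_normal(8) for _ in range(3))
emb = {
    "transitive": decl + trans,
    "intransitive": decl,
    "subj.rel.trans.": rel + trans,
    "subj.rel.intrans.": rel,
    "distractor": rng.standard_normal(8),
}

assert analogy("transitive", "intransitive", "subj.rel.intrans.", emb) == "subj.rel.trans."
```

"% correct" on the results slide counts how often this nearest neighbor is the expected right-hand side.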

SLIDE 39

Formulation of Tests

Syntactic transformations used to construct analogies:
Subject relativization
Object relativization
Subject wh-movement
Object wh-movement
Transitivization
Passivization with a by phrase
Passivization without a by phrase
Infinitivization
Dative shift

SLIDE 40

Analogy Test Results

n     # equations  % correct  Avg. position
300   246          50.40      7.98
4724  57220        4.62       289.48

Table: Analogy task results. (n restricts which supertags are considered: for a given n, only equations whose supertags are all among the n most common supertags are included.)

% correct: percent of equations for which the left-hand side's closest cosine neighbor was the right-hand side.
Avg. position: the position of the correct right-hand side in the list of supertag embeddings ranked by cosine distance from the left-hand side.


SLIDE 42

Graph-based Parsing

[McDonald et al., 2005]
Score n² directed edges (n potential parents for each of the n tokens in a sentence) using features.
Find the maximum spanning tree (greedy choice plus cycle fixing).
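The maximum-spanning-tree step can be sketched with a compact Chu-Liu/Edmonds implementation over dense edge scores. The toy scores below are illustrative; the actual parser scores edges with learned features:

```python
def chu_liu_edmonds(score, nodes=None, root=0):
    """Maximum spanning arborescence over dense edge scores:
    score[(h, d)] is the score of head h -> dependent d, defined for every
    ordered pair of distinct nodes. Returns {dependent: head}."""
    if nodes is None:
        nodes = sorted({v for edge in score for v in edge})
    # 1. Greedily pick the best-scoring head for every non-root node.
    head = {d: max((h for h in nodes if h != d), key=lambda h: score[(h, d)])
            for d in nodes if d != root}
    # 2. Look for a cycle among the chosen edges.
    cycle = None
    for start in head:
        path, v = [], start
        while v in head and v not in path:
            path.append(v)
            v = head[v]
        if v in path:
            cycle = path[path.index(v):]
            break
    if cycle is None:
        return head                      # greedy choice is already a tree
    # 3. Contract the cycle into one node, adjust entering scores, recurse.
    c, cyc = min(cycle), set(cycle)
    new_nodes = [v for v in nodes if v not in cyc] + [c]
    new_score, enter, leave = {}, {}, {}
    for h in nodes:
        for d in nodes:
            if h == d or (h in cyc and d in cyc):
                continue
            if d in cyc:                 # edge entering the cycle
                s = score[(h, d)] - score[(head[d], d)]
                if (h, c) not in new_score or s > new_score[(h, c)]:
                    new_score[(h, c)], enter[h] = s, d
            elif h in cyc:               # edge leaving the cycle
                if (c, d) not in new_score or score[(h, d)] > new_score[(c, d)]:
                    new_score[(c, d)], leave[d] = score[(h, d)], h
            else:
                new_score[(h, d)] = score[(h, d)]
    sub = chu_liu_edmonds(new_score, new_nodes, root)
    # 4. Expand: the edge entering c breaks the cycle; other cycle edges stay.
    result = {}
    for d, h in sub.items():
        if d == c:
            result[enter[h]] = h
        elif h == c:
            result[d] = leave[d]
        else:
            result[d] = h
    for v in cycle:
        result.setdefault(v, head[v])
    return result

# Toy 3-node example (0 = ROOT): greedy picks the 1 <-> 2 cycle, which the
# contraction step then breaks in favor of ROOT -> 1 -> 2.
score = {(0, 1): 5, (0, 2): 1, (1, 2): 10, (2, 1): 10,
         (1, 0): -100, (2, 0): -100}
assert chu_liu_edmonds(score) == {1: 0, 2: 1}
```

Greedy head selection alone would pick the 1 ↔ 2 cycle; contracting it and rescoring the entering edges is the "cycle fix" mentioned above.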

SLIDE 43

Graph-based TAG Parsing

SLIDE 44

Comparison between transition-based and graph-based parsing

Rich feature representations with parse history vs. global training/inference.
Transition-based parsers have a parse history that naturally relates supertags differing only by a certain operation (e.g. transitive/intransitive).
Transition-based parsers suffer from error propagation.
Graph-based parsers assign scores independently.
Graph-based parser with BiLSTM feature representations (still no history).

SLIDE 45

New Results

Parser                          UAS    LAS
MICA Chart                      86.66  84.90
Transition-based Parsing        90.97  89.68
Joint Graph Parsing (POS+Stag)  93.26  91.89

Table: Parsing results on the test set.


SLIDE 47

Syntactically-oriented Textual Entailment

[Xu et al., 2017] E.g.:

The guy who left the room saw a squirrel
⇒ The guy left the room
⇒ The guy saw a squirrel
⇏ The room saw a squirrel
⇏ The guy saw an animal (not purely syntactic)

Parse the original sentence and the hypothesis, transform the parses using properties of supertags, and answer YES if the original sentence's parse subsumes the hypothesis one.

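The subsumption check can be sketched over derivation-tree arcs. The arc sets below are hand-written stand-ins for parser output, and the real system first transforms the parses using supertag properties before comparing:

```python
def entails(premise_arcs, hypothesis_arcs):
    """YES iff every hypothesis dependency also occurs in the premise
    derivation (a pure containment check)."""
    return hypothesis_arcs <= premise_arcs

# Hand-written (head, dependent, operation) arcs for the example.
premise = {("saw", "guy", "Subst 0"), ("guy", "the", "ADJ"),
           ("guy", "left", "ADJ"), ("left", "who", "Subst 0"),
           ("left", "room", "Subst 1"), ("room", "the", "ADJ"),
           ("saw", "squirrel", "Subst 1"), ("squirrel", "a", "ADJ")}

hyp_yes = {("saw", "guy", "Subst 0"), ("guy", "the", "ADJ"),
           ("saw", "squirrel", "Subst 1"), ("squirrel", "a", "ADJ")}
hyp_no = {("saw", "room", "Subst 0"), ("room", "the", "ADJ"),
          ("saw", "squirrel", "Subst 1"), ("squirrel", "a", "ADJ")}

assert entails(premise, hyp_yes)        # "The guy saw a squirrel"
assert not entails(premise, hyp_no)     # "The room saw a squirrel"
```

The wrong hypothesis fails because its subject arc (saw → room) never appears in the premise derivation.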
SLIDE 48

Syntactically-oriented Textual Entailment

(Figure: derivation trees for "the guy who left the room saw a squirrel" and the hypothesis "the guy saw a squirrel"; the hypothesis derivation is contained in the original.)

SLIDE 49

Syntactically-oriented Textual Entailment

(Figure: derivation trees for "the guy who left the room saw a squirrel" and the hypothesis "the guy left the room"; the hypothesis derivation is contained in the original.)

SLIDE 50

Syntactically-oriented Textual Entailment

System                        %A    %P    %R    F1
[Rimell and Clark, 2010]      72.4  79.6  62.8  70.2
[Ng et al., 2010]             70.4  68.3  80.1  73.7
[Lien, 2014]                  70.7  88.6  50.0  63.9
Transition-based TAG Parsing  72.4  85.4  56.4  68.0
Graph-based Method            78.1  86.3  68.6  76.4

Table: PETE test results. Precision (P), recall (R), and F1 are calculated for "entails."

SLIDE 51

Supertag-based Semantic Role Labeling

Who did what to whom?

Peter (WHO) hit Mary (WHOM) with a ball (WHAT) yesterday (WHEN)

SLIDE 52

Supertag-based Semantic Role Labeling

Syntactic parsing and SRL are related. Can we use supertags instead? (Work in progress)

SLIDE 53

Supertag-based Semantic Role Labeling

Non-ensemble System             P     R     F1
[FitzGerald et al., 2015]       –     –     87.3
[Roth and Lapata, 2016]         90.0  85.5  87.7
[Marcheggiani et al., 2017]     88.7  86.8  87.7
[Marcheggiani and Titov, 2017]  89.1  86.8  88.0
Supertag-based                  89.0  88.2  88.6

Table: Non-ensemble system results on the CoNLL-2009 in-domain test set for English.


SLIDE 55

Future Work

Further improvement of parsing:
Add a priori syntactic knowledge from supertags?
Relax conditional independence in our graph-based parser
More applications:
Semantic parsing

SLIDE 56

Acknowledgement

Thank you! Robert Frank, Dan Friedman, Tom McCoy, William Merrill, Alexis Nasr, Dragomir Radev, Owen Rambow, and Pauli Xu Computational Linguistics at Yale (CLAY) lab

SLIDE 57

References

Bangalore, S., Boullier, P., Nasr, A., Rambow, O., and Sagot, B. (2009). MICA: A Probabilistic Dependency Parser Based on Tree Insertion Grammars. In NAACL HLT 2009 (Short Papers).

Chen, D. and Manning, C. (2014). A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 740–750, Doha, Qatar. Association for Computational Linguistics.

FitzGerald, N., Täckström, O., Ganchev, K., and Das, D. (2015). Semantic role labeling with neural network factors. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 960–970, Lisbon, Portugal. Association for Computational Linguistics.

Friedman, D., Kasai, J., McCoy, R. T., Frank, R., Davis, F., and Rambow, O. (2017). Linguistically rich vector representations of supertags for TAG parsing. In Proceedings of the 13th International Workshop on Tree Adjoining Grammars and Related Formalisms, pages 122–131, Umeå, Sweden. Association for Computational Linguistics.

Kasai, J., Frank, R., McCoy, R. T., Rambow, O., and Nasr, A. (2017). TAG Parsing with Neural Networks and Vector Representations of Supertags. In Proceedings of EMNLP.

Lien, E. (2014). Using minimal recursion semantics for entailment recognition. In Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 76–84, Gothenburg, Sweden.

Marcheggiani, D., Frolov, A., and Titov, I. (2017). A simple and accurate syntax-agnostic neural model for dependency-based semantic role labeling. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 411–420, Vancouver, Canada. Association for Computational Linguistics.

Marcheggiani, D. and Titov, I. (2017). Encoding sentences with graph convolutional networks for semantic role labeling. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1507–1516, Copenhagen, Denmark. Association for Computational Linguistics.

McDonald, R., Pereira, F., Ribarov, K., and Hajic, J. (2005). Non-projective dependency parsing using spanning tree algorithms. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 523–530, Vancouver, British Columbia, Canada. Association for Computational Linguistics.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality. In Burges, C. J. C., Bottou, L., Welling, M., Ghahramani, Z., and Weinberger, K. Q., editors, Advances in Neural Information Processing Systems 26, pages 3111–3119. Curran Associates, Inc.

Ng, D., Constable, J. W., Honnibal, M., and Curran, J. R. (2010). SCHWA: PETE using CCG dependencies with the C&C parser. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 313–316.

Nivre, J., Hall, J., and Nilsson, J. (2006). MaltParser: A data-driven parser-generator for dependency parsing. In LREC.

Rimell, L. and Clark, S. (2010). Cambridge: Parser evaluation using textual entailment by grammatical relation comparison. In Proceedings of the 5th International Workshop on Semantic Evaluation, pages 268–271.

Roth, M. and Lapata, M. (2016). Neural semantic role labeling with dependency path embeddings. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1192–1202, Berlin, Germany. Association for Computational Linguistics.

Xu, P., Frank, R., Kasai, J., and Rambow, O. (2017). TAG parser evaluation using textual entailments. In Proceedings of the 13th International Workshop on Tree Adjoining Grammars and Related Formalisms, pages 132–141, Umeå, Sweden. Association for Computational Linguistics.