Tree-Adjoining Grammar Parsing and Vector Representations of Supertags
Jungo Kasai, Yale University
December 14, 2017
Outline
1. Background and Motivations
2. Supertagging Models
3. Parsing Models
4. Vector Representations of Supertags
5. Ongoing TAG Parsing Work
6. Applications of TAG
7. Future Work
Syntactic Parsing
[Parse tree of "John really likes Mary": S dominates NP (John) and VP; the VP contains AdvP (really) and V (likes) with object NP (Mary)]
Why do we need parsing? Does John love Mary, or does Mary love John? Understanding a sentence depends on its structure.
Context Free Grammars
[Parse tree of "John really likes Mary"]
S → NP VP
VP → AdvP VP
VP → V NP
AdvP → really
NP → Mary
NP → they
NP → John
V → like
V → likes
These production rules generate sentences.
Context Free Grammars
[Parse tree of "John really likes Mary", with the same production rules]
Fundamental problem: constraints are distributed over separate rules. How do we choose between V → like and V → likes?
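The problem can be made concrete with a toy random generator over the rules above (a hypothetical sketch; the function and variable names are ours, not from the slides). Because each rule fires independently, nothing blocks agreement violations such as "they likes Mary":

```python
import random

# Toy CFG from the slide; symbols absent from GRAMMAR are terminals.
GRAMMAR = {
    "S":    [["NP", "VP"]],
    "VP":   [["AdvP", "VP"], ["V", "NP"]],
    "AdvP": [["really"]],
    "NP":   [["Mary"], ["they"], ["John"]],
    "V":    [["like"], ["likes"]],
}

def generate(symbol="S", depth=0, max_depth=6):
    """Expand a nonterminal by picking a random production."""
    if symbol not in GRAMMAR:              # terminal: emit the word itself
        return [symbol]
    rules = GRAMMAR[symbol]
    if depth >= max_depth:                 # cap the recursion of VP -> AdvP VP
        rules = [r for r in rules if symbol not in r] or rules
    rule = random.choice(rules)
    return [w for s in rule for w in generate(s, depth + 1, max_depth)]

# Each rule is chosen independently, so "they" can co-occur with "likes".
print(" ".join(generate()))
```

This is exactly the localization failure TAG addresses: the subject NP and the verb's inflection are chosen by separate, independent rules.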
Tree-Adjoining Grammar
Tree-Adjoining Grammar (TAG) localizes grammatical constraints:
A finite set of lexicalized elementary trees.
A finite set of operations (substitution and adjunction) combines elementary trees.
[Elementary trees: transitive "likes", intransitive "sleep", NP "John", and the VP auxiliary tree for "really"]
Tree-Adjoining Grammar
Substitution
Tree-Adjoining Grammar
Adjunction
Tree-Adjoining Grammar
Adjunction allows for unbounded recursion while still enforcing agreement. John smartly occasionally really only likes Mary...
Derivation Tree
The derivation tree records the operations and forms a dependency tree: each token has exactly one parent.
[Derivation tree for "John really likes Mary": ROOT → likes; likes → John (Subst0), really (ADJ), Mary (Subst1)]
Two Steps in TAG Parsing
Now the reverse process:
Supertagging: assign an elementary tree (supertag) to each token, similar to POS tagging.
Parsing: predict the operations on the elementary trees.
[Two candidate supertags for "left": an intransitive clause tree and a subject-relative auxiliary tree]
Supertagging is a bottleneck
Supertagger     Parser        Stag Acc  UAS    LAS
Gold            Chart (MICA)  100.00    97.60  97.30
Maxent (MICA)   Chart (MICA)  88.52     87.60  85.80
Supertagging is "almost parsing": there are about 5,000 supertags in the grammar, and about half of them occur only once in the training data (PTB WSJ Sections 1-22).
BiLSTM Supertagging
Figure: BiLSTM Supertagger Architecture.
Supertagging is still a bottleneck
Supertagger     Parser        Stag Acc  UAS    LAS
Gold            Chart (MICA)  100.00    97.60  97.30
Maxent (MICA)   Chart (MICA)  88.52     87.60  85.80
BiLSTM          Chart (MICA)  89.32     90.05  88.32
We can compensate for supertagging errors by exploiting structural similarities across elementary trees. Such similarities are not utilized by the chart parser, so we use two alternative families of parsing algorithms.
Parsing Models
Prior work: unlexicalized chart parser (MICA) [Bangalore et al., 2009].
Unlexicalized transition-based parser [Kasai et al., 2017, Friedman et al., 2017].
Graph-based parser (work in progress).
Transition-based Parsing
Arc-Eager System (MALT) [Nivre et al., 2006]
[Target derivation tree for "John really likes Mary": ROOT → likes; likes → John (Subst0), really (ADJ), Mary (Subst1)]
Transition-based TAG Parsing
How do we learn? Represent the configuration by the top k elements of the stack and buffer, {s_i, b_i}_{i=1}^{k} [Chen and Manning, 2014]. Represent s_i (b_i) by its TAG elementary tree and the substitution operations already performed into s_i, and encode the elementary trees and substitution operations with dense vectors.
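A minimal sketch of the arc-eager transition system on the running example. The action sequence below is a hand-written oracle for illustration; the actual parser predicts each action from the dense configuration features described above:

```python
def arc_eager(words, actions):
    """Run a fixed action sequence through the arc-eager system
    and collect the resulting (head, dependent, label) arcs."""
    stack, buffer, arcs = ["ROOT"], list(words), []
    for action, label in actions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT":    # arc from buffer front to stack top; pop stack
            arcs.append((buffer[0], stack.pop(), label))
        elif action == "RIGHT":   # arc from stack top to buffer front; shift
            arcs.append((stack[-1], buffer[0], label))
            stack.append(buffer.pop(0))
        elif action == "REDUCE":
            stack.pop()
    return arcs

# "John really likes Mary": likes heads everything, as in the derivation tree.
gold = arc_eager(
    ["John", "really", "likes", "Mary"],
    [("SHIFT", None), ("SHIFT", None),
     ("LEFT", "ADJ"), ("LEFT", "Subst0"),   # likes <- really, likes <- John
     ("RIGHT", "ROOT"),                     # ROOT -> likes
     ("RIGHT", "Subst1")],                  # likes -> Mary
)
print(gold)
```

Each configuration (stack tops, buffer fronts) is what the neural model sees when scoring the next action.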
NN Transition-based Parsing Model
Figure: Transition-based Parser Neural Network Architecture.
Example
"John really likes Mary"

Stack       Buffer  Relations                                   Action
ROOT likes  Mary    {(ROOT, likes, ROOT), (likes, John, 0), …}  RIGHT:1
Parsing Results
                  Gold Stags      Predicted Stags
Parsing Model     UAS     LAS     UAS     LAS
MICA Chart        97.60   97.30   90.05   88.32
Transition-based  97.67   97.45   90.23   88.77

Table: Results on Section 00. Beam size 16.
Predicted supertags are from our BiLSTM supertagger.
Embeddings for Elementary Trees
Induced Embeddings
The input is a one-hot vector for each supertag, and randomly initialized embedding weights are trained. The embeddings thus have no a priori knowledge of the syntactic properties of the elementary trees.
PCA Plots of Vector Representations
Figure: Declarative/subject-relative alignment (atomic embeddings). Ex: "the man sneezed" vs. "the man who sneezed".
PCA Plots of Vector Representations
Figure: Transitive/intransitive alignment (atomic embeddings). Ex: "the man who devoured the pizza" vs. "the man who sneezed".
Analogy tests
Semantic analogies have been used to test word embeddings (e.g. [Mikolov et al., 2013]):
king : man :: queen : woman ⇒ vec(king) − vec(man) + vec(woman) ≈ vec(queen)
Analogy tests
We use syntactic analogies to test supertag embeddings:
trans. : intrans. :: subj.rel.trans. : subj.rel.intrans. ⇒ vec(trans.) − vec(intrans.) + vec(subj.rel.intrans.) ≈ vec(subj.rel.trans.)
[The four supertags in the analogy: transitive, intransitive, subject-relative transitive, and subject-relative intransitive elementary trees]
Formulation of Tests
Syntactic transformations used to construct analogies: subject relativization, object relativization, subject wh-movement, object wh-movement, transitivization, passivization with a by-phrase, passivization without a by-phrase, infinitivization, and dative shift.
Analogy Test Results
n     # equations  % correct  Avg. position
300   246          50.40      7.98
4724  57220        4.62       289.48

Table: Analogy task results. (n restricts which supertags are considered: for a given n, only equations whose supertags all fall among the n most common supertags are included.)
% correct: the percentage of equations for which the left-hand side's closest cosine neighbor is the right-hand side.
Avg. position: the rank of the correct right-hand side in the list of supertag embeddings ordered by cosine distance from the left-hand side.
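The rank metric can be sketched as a cosine-neighbor test. The embeddings below are tiny hypothetical 2-d vectors chosen so the transitivity offset is shared, not the learned supertag embeddings:

```python
from math import sqrt

def cos(u, v):
    """Cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def analogy_rank(emb, a, b, c, d):
    """Rank of supertag d among answers to 'a - b + c', by cosine similarity.
    Rank 1 means the analogy equation is counted as correct; the three
    query terms are excluded from the candidate list, as in word2vec."""
    query = [x - y + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    ranked = sorted((t for t in emb if t not in (a, b, c)),
                    key=lambda t: cos(query, emb[t]), reverse=True)
    return ranked.index(d) + 1

# Hypothetical embeddings: the transitivity offset [1, 0] is shared across
# the declarative and subject-relative supertag pairs.
emb = {
    "trans":            [1.0, 0.0],
    "intrans":          [0.0, 0.0],
    "subj_rel_trans":   [1.0, 1.0],
    "subj_rel_intrans": [0.0, 1.0],
    "obj_rel_trans":    [2.0, 1.0],   # distractor
}
print(analogy_rank(emb, "trans", "intrans", "subj_rel_intrans", "subj_rel_trans"))  # -> 1
```

Averaging `rank == 1` over all equations gives "% correct"; averaging the rank itself gives "Avg. position".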
Graph-based Parsing
[McDonald et al., 2005]: score n² directed edges (n potential parents for each of the n tokens in a sentence) using features, then find the maximum spanning tree (greedy + cycle fix).
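The decoding step can be sketched as follows. This is a simplification of Chu-Liu/Edmonds that literally does "greedy + cycle fix": pick each token's best head, then break any cycle with the cheapest re-attachment; it ignores the single-root constraint and the full contraction step, and the score matrix is hypothetical rather than learned:

```python
def greedy_mst(scores):
    """'Greedy + cycle fix' MST decoding (simplified Chu-Liu/Edmonds).
    scores[h][d] is the score of the arc h -> d; token 0 is ROOT.
    Returns head[d] for every token d (head[0] is unused)."""
    n = len(scores)
    head = [0] * n
    for d in range(1, n):  # greedy step: best-scoring head per token
        head[d] = max((h for h in range(n) if h != d), key=lambda h: scores[h][d])

    def find_cycle():
        for start in range(1, n):
            seen, node = [], start
            while node != 0 and node not in seen:
                seen.append(node)
                node = head[node]
            if node != 0:  # the walk revisited a node: tail of `seen` is a cycle
                return seen[seen.index(node):]
        return None

    cycle = find_cycle()
    while cycle:
        # Cycle fix: re-attach one cycle node to a head outside the cycle,
        # choosing the swap that loses the least score.
        loss, d, h = min((scores[head[d]][d] - scores[h][d], d, h)
                         for d in cycle for h in range(n)
                         if h != d and h not in cycle)
        head[d] = h
        cycle = find_cycle()
    return head

# Hypothetical 4-node instance (ROOT + 3 tokens): the greedy step creates
# the cycle 1 <-> 2, which the fix re-attaches to ROOT.
S = [[0, 5, 4, 8],
     [0, 0, 10, 2],
     [0, 10, 0, 2],
     [0, 1, 1, 0]]
print(greedy_mst(S))  # -> [0, 0, 1, 0]
```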
Graph-based TAG Parsing
Comparison between transition-based and graph-based parsing
Rich feature representations with parse history vs. global training/inference:
Transition-based parsers have parse history that naturally relates supertags differing only by a single operation (e.g. transitive/intransitive), but they suffer from error propagation.
Graph-based parsers assign scores independently.
Our graph-based parser uses BiLSTM feature representations (still no parse history).
New Results
Parser                          UAS    LAS
MICA Chart                      86.66  84.90
Transition-based Parsing        90.97  89.68
Joint Graph Parsing (POS+Stag)  93.26  91.89

Table: Parsing results on the test set.
Syntactically-oriented Textual Entailment
[Xu et al., 2017]. E.g.:
"The guy who left the room saw a squirrel"
⇒ "The guy left the room"
⇒ "The guy saw a squirrel"
⇏ "The room saw a squirrel"
⇏ "The guy saw an animal" (not purely syntactic)
Method: parse the original sentence and the hypothesis; transform the parses using properties of supertags; if the original sentence's parse subsumes the hypothesis's, answer YES.
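The subsumption step can be sketched as an arc-containment test over derivation-tree arcs. The arcs below are hand-built for the running example (the supertag-based transformations are omitted, and the labels follow the Subst/ADJ notation used earlier):

```python
def entails(premise_arcs, hypothesis_arcs):
    """Crude subsumption test: the premise entails the hypothesis if every
    labeled derivation-tree arc of the hypothesis appears in the premise."""
    return set(hypothesis_arcs) <= set(premise_arcs)

# (head, dependent, operation) arcs for "the guy who left the room saw a squirrel".
premise = {("saw", "guy", "Subst0"), ("saw", "squirrel", "Subst1"),
           ("guy", "the", "ADJ"), ("squirrel", "a", "ADJ"),
           ("guy", "left", "ADJ"),            # relative clause adjoins to "guy"
           ("left", "room", "Subst1"), ("room", "the", "ADJ")}

hyp_yes = {("saw", "guy", "Subst0"), ("saw", "squirrel", "Subst1"),
           ("guy", "the", "ADJ"), ("squirrel", "a", "ADJ")}
hyp_no = {("saw", "room", "Subst0")}          # "The room saw a squirrel"

print(entails(premise, hyp_yes), entails(premise, hyp_no))  # -> True False
```

"The room saw a squirrel" fails because its subject arc never occurs in the premise parse, even though all of its words do.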
Syntactically-oriented Textual Entailment
[Derivation trees: the parse of "the guy who left the room saw a squirrel" subsumes the parse of "the guy saw a squirrel"]
Syntactically-oriented Textual Entailment
[Derivation trees: the parse of "the guy who left the room saw a squirrel" subsumes the parse of "the guy left the room"]
Syntactically-oriented Textual Entailment
System                        %A    %P    %R    F1
[Rimell and Clark, 2010]      72.4  79.6  62.8  70.2
[Ng et al., 2010]             70.4  68.3  80.1  73.7
[Lien, 2014]                  70.7  88.6  50.0  63.9
Transition-based TAG Parsing  72.4  85.4  56.4  68.0
Graph-based Method            78.1  86.3  68.6  76.4

Table: PETE test results. Precision (P), recall (R), and F1 are calculated for "entails."
Supertag-based Semantic Role Labeling
Who did what to whom?
"Peter [WHO] hit Mary [WHOM] with a ball [WHAT] yesterday [WHEN]"
Supertag-based Semantic Role Labeling
Syntactic parsing and SRL are closely related. Can supertags be used instead of full parses? (Work in progress.)
Supertag-based Semantic Role Labeling
Non-ensemble System             P     R     F1
[FitzGerald et al., 2015]       –     –     87.3
[Roth and Lapata, 2016]         90.0  85.5  87.7
[Marcheggiani et al., 2017]     88.7  86.8  87.7
[Marcheggiani and Titov, 2017]  89.1  86.8  88.0
Supertag-based                  89.0  88.2  88.6
Table: Non-ensemble system results on the CoNLL-2009 in-domain test set for English.
Future Work
Further improvement of parsing: add a priori syntactic knowledge from supertags? Relax the conditional independence assumption in our graph-based parser.
More applications: semantic parsing.
Acknowledgement
Thank you!
Robert Frank, Dan Friedman, Tom McCoy, William Merrill, Alexis Nasr, Dragomir Radev, Owen Rambow, and Pauli Xu
Computational Linguistics at Yale (CLAY) lab
Bangalore, S., Boullier, P., Nasr, A., Rambow, O., and Sagot, B. (2009). MICA: A probabilistic dependency parser based on Tree Insertion Grammars. In NAACL HLT 2009 (Short Papers).

Chen, D. and Manning, C. (2014). A fast and accurate dependency parser using neural networks. In Proceedings of EMNLP 2014, pages 740–750, Doha, Qatar. Association for Computational Linguistics.

FitzGerald, N., Täckström, O., Ganchev, K., and Das, D. (2015). Semantic role labeling with neural network factors. In Proceedings of EMNLP 2015, pages 960–970, Lisbon, Portugal. Association for Computational Linguistics.

Friedman, D., Kasai, J., McCoy, R. T., Frank, R., Davis, F., and Rambow, O. (2017). Linguistically rich vector representations of supertags for TAG parsing. In Proceedings of the 13th International Workshop on Tree Adjoining Grammars and Related Formalisms, pages 122–131, Umeå, Sweden. Association for Computational Linguistics.

Kasai, J., Frank, R., McCoy, R. T., Rambow, O., and Nasr, A. (2017). TAG parsing with neural networks and vector representations of supertags. In Proceedings of EMNLP.

Lien, E. (2014). Using minimal recursion semantics for entailment recognition. In Proceedings of the Student Research Workshop at EACL 2014, Gothenburg, Sweden.

Marcheggiani, D., Frolov, A., and Titov, I. (2017). A simple and accurate syntax-agnostic neural model for dependency-based semantic role labeling. In Proceedings of CoNLL 2017, pages 411–420, Vancouver, Canada. Association for Computational Linguistics.

Marcheggiani, D. and Titov, I. (2017). Encoding sentences with graph convolutional networks for semantic role labeling. In Proceedings of EMNLP 2017, pages 1507–1516, Copenhagen, Denmark. Association for Computational Linguistics.

McDonald, R., Pereira, F., Ribarov, K., and Hajic, J. (2005). Non-projective dependency parsing using spanning tree algorithms. In Proceedings of HLT/EMNLP 2005, pages 523–530, Vancouver, British Columbia, Canada. Association for Computational Linguistics.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, pages 3111–3119. Curran Associates, Inc.

Ng, D., Constable, J. W., Honnibal, M., and Curran, J. R. (2010). SCHWA: PETE using CCG dependencies with the C&C parser. In Proceedings of the 5th International Workshop on Semantic Evaluation.

Nivre, J., Hall, J., and Nilsson, J. (2006). MaltParser: A data-driven parser-generator for dependency parsing. In LREC.

Rimell, L. and Clark, S. (2010). Cambridge: Parser evaluation using textual entailment by grammatical relation comparison.