SLIDE 1

Straight to the Tree: Constituency Parsing with Neural Syntactic Distance

Yikang Shen*, Zhouhan Lin*, Athul Paul Jacob, Alessandro Sordoni, Aaron Courville, Yoshua Bengio
University of Montreal, Microsoft Research, University of Waterloo

SLIDE 2

Overview

  • Motivation
  • Syntactic Distance based Parsing Framework
  • Model
  • Experimental Results
SLIDE 3

Overview

  • Motivation
  • Syntactic Distance based Parsing Framework
  • Model
  • Experimental Results
SLIDE 4

ICLR 2018: Neural Language Modeling by Jointly Learning Syntax and Lexicon

Syntactic distance + structured self-attention yields a single model that is both:
  • an LSTM language model (61 ppl), and
  • an unsupervised constituency parser (68 UF1).

Supervised Constituency Parsing with Syntactic Distance?

[Shen et al. 2018]

SLIDE 5

Chart Neural Parsers
  1. High computational cost: the complexity of CYK is O(n^3).
  2. Complicated loss function.

Transition-Based Neural Parsers
  1. Greedy decoding: may produce an incomplete tree (the shift and reduce steps may not match).
  2. Exposure bias: the model is never exposed to its own mistakes during training.

[Stern et al., 2017; Cross and Huang, 2016]

SLIDE 6

Overview

  • Motivation
  • Syntactic Distance based Parsing Framework
  • Model
  • Experimental Results
SLIDE 7

Intuitions

Only the order of the splits (or combinations) matters for reconstructing the tree. Can we model that order directly?

SLIDE 8

Syntactic distance

For the split points, the syntactic distances should share the same ordering as the heights of the corresponding nodes (in the slide's figure, split points S1, S2 correspond to nodes N1, N2).

SLIDE 9

Convert to binary tree

[Stern et al., 2017]

SLIDE 10

Tree to Distance

The height of each non-terminal node is the maximum height of its children, plus 1.
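To make this concrete, here is a minimal Python sketch of the tree-to-distance conversion, assuming a toy binary-tree encoding (a leaf is a string, an internal node a pair); the encoding and function name are illustrative, not the paper's implementation:

```python
def tree_to_distance(tree):
    """Return (height, distances): the subtree's height and one syntactic
    distance per split point, in left-to-right order."""
    if isinstance(tree, str):              # leaf: height 0, no split points
        return 0, []
    left_h, left_d = tree_to_distance(tree[0])
    right_h, right_d = tree_to_distance(tree[1])
    height = max(left_h, right_h) + 1      # max height of children, plus 1
    # the split between the two children is assigned this node's height
    return height, left_d + [height] + right_d

# Example with a small binarized tree for "she enjoys playing tennis .":
tree = ("she", (("enjoys", ("playing", "tennis")), "."))
print(tree_to_distance(tree))              # -> (4, [4, 2, 1, 3])
```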

SLIDE 11

Tree to Distance

[Figure: the example tree annotated with internal-node labels S, VP, the collapsed unary label S-VP, and ∅; leaf labels ∅, NP, ∅, ∅, NP, ∅]

SLIDE 12

Distance to Tree

The split point for each bracket is the one with the maximum syntactic distance.
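The inverse conversion is just as small; a sketch mirroring the hypothetical tree_to_distance above, recursively splitting each span at its largest distance:

```python
def distance_to_tree(words, distances):
    """Rebuild the binary tree by recursively splitting each span at the
    split point with the maximum syntactic distance."""
    if len(words) == 1:
        return words[0]
    i = max(range(len(distances)), key=distances.__getitem__)
    left = distance_to_tree(words[:i + 1], distances[:i])
    right = distance_to_tree(words[i + 1:], distances[i + 1:])
    return (left, right)

# Round-trips the earlier example:
print(distance_to_tree(["she", "enjoys", "playing", "tennis", "."],
                       [4, 2, 1, 3]))
# -> ('she', (('enjoys', ('playing', 'tennis')), '.'))
```

Note that only the relative order of the distances matters here, not their values; this is why a ranking loss over the distances (rather than a regression loss) is enough for training.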

SLIDE 13

Distance to Tree

SLIDE 14

Overview

  • Motivation
  • Syntactic Distance based Parsing Framework
  • Model
  • Experimental Results
SLIDE 15

Framework for inferring the distances and labels

  • Distances
  • Labels for leaf nodes
  • Labels for non-leaf nodes

SLIDE 16

Inferring the distances

Distances

SLIDE 17

Inferring the distances
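The slide's architecture figure is not reproduced in this transcript. Below is a hedged PyTorch sketch in the spirit of the model: a bi-LSTM over the words, then a width-2 convolution over adjacent positions so that each of the n-1 split points receives a scalar distance. The dimensions are illustrative, and the paper's actual network differs in details (e.g., its exact stack of layers and its use of tag embeddings):

```python
import torch
import torch.nn as nn

class DistanceModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.word_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                                 bidirectional=True)
        # kernel size 2: one feature vector per split point between words
        self.split_conv = nn.Conv1d(2 * hidden_dim, hidden_dim, kernel_size=2)
        self.split_lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True,
                                  bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, 1)   # scalar distance per split

    def forward(self, word_ids):
        x = self.embed(word_ids)                   # (B, n, emb)
        h, _ = self.word_lstm(x)                   # (B, n, 2*hidden)
        s = self.split_conv(h.transpose(1, 2))     # (B, hidden, n-1)
        s = torch.relu(s).transpose(1, 2)          # (B, n-1, hidden)
        s, _ = self.split_lstm(s)                  # (B, n-1, 2*hidden)
        return self.out(s).squeeze(-1)             # (B, n-1) distances
```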

SLIDE 18

Pairwise learning-to-rank loss for distances

A variant of the hinge loss, applied pairwise to the predicted distances.

SLIDE 19

Pairwise learning-to-rank loss for distances

The loss is a pairwise hinge over the split points, comparing the predicted distances $\hat{d}$ against the gold distances $d$:

$$\mathcal{L}^{rank} = \sum_{i,j}\begin{cases}\max(0,\ 1 - (\hat{d}_i - \hat{d}_j)) & \text{while } d_i > d_j\\[2pt]\max(0,\ 1 - (\hat{d}_j - \hat{d}_i)) & \text{while } d_i < d_j\end{cases}$$
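A minimal vectorized PyTorch sketch of this loss (the function is illustrative, not the released code):

```python
import torch

def rank_loss(d_hat, d_gold):
    """Pairwise hinge ranking loss over the (n-1,) split-point distances:
    every pair with strictly ordered gold distances contributes
    max(0, 1 - sign(d_i - d_j) * (d_hat_i - d_hat_j))."""
    pred_diff = d_hat.unsqueeze(1) - d_hat.unsqueeze(0)   # [i, j] = d_hat_i - d_hat_j
    gold_sign = torch.sign(d_gold.unsqueeze(1) - d_gold.unsqueeze(0))
    hinge = torch.relu(1.0 - gold_sign * pred_diff)
    # count each unordered pair once and skip gold ties
    mask = torch.triu(torch.ones_like(hinge), diagonal=1) * (gold_sign != 0)
    return (hinge * mask).sum()
```

Because the loss only constrains pairwise orderings, the model may place the predicted distances anywhere on the real line as long as their ranking matches the gold tree.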

SLIDE 20

Framework for inferring the distances and labels

  • Distances
  • Labels for leaf nodes
  • Labels for non-leaf nodes

SLIDE 21

Framework for inferring the distances and labels

  • Labels for leaf nodes
  • Labels for non-leaf nodes

SLIDE 22

Inferring the Labels

SLIDE 23

Inferring the Labels

SLIDE 24

Inferring the Labels
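The figures for these slides are not reproduced in this transcript. In the framework, two label sequences are needed: one label per word for the leaf nodes and one label per split point for the internal nodes (both sets include the empty label ∅). A hedged sketch of such label heads as plain softmax classifiers, with illustrative names and dimensions:

```python
import torch.nn as nn

class LabelHeads(nn.Module):
    """Illustrative classifiers on top of the same hidden features used
    for the distances: one head per word (leaf labels) and one head per
    split point (internal-node labels)."""
    def __init__(self, hidden_dim, n_leaf_labels, n_internal_labels):
        super().__init__()
        self.leaf_head = nn.Linear(hidden_dim, n_leaf_labels)
        self.internal_head = nn.Linear(hidden_dim, n_internal_labels)

    def forward(self, word_feats, split_feats):
        # word_feats: (B, n, hidden); split_feats: (B, n-1, hidden)
        return self.leaf_head(word_feats), self.internal_head(split_feats)
```

The label heads can be trained with ordinary cross-entropy alongside the ranking loss on the distances.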

SLIDE 25

Putting it together

SLIDE 26

Putting it together
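Tying the hypothetical sketches above together, inference is one forward pass followed by the greedy top-down decode, with no search and no sequential transitions; all names below come from the earlier sketches:

```python
import torch

def parse(words, model, vocab):
    """Predict split-point distances for a sentence, then decode greedily."""
    ids = torch.tensor([[vocab[w] for w in words]])
    with torch.no_grad():
        d_hat = model(ids)[0]              # (n-1,) predicted distances
    return distance_to_tree(words, d_hat.tolist())
```

This avoids both the O(n^3) chart of CYK decoding and the step-by-step shift-reduce decisions of transition parsers.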

SLIDE 27

Overview

  • Motivation
  • Syntactic Distance based Parsing Framework
  • Model
  • Experimental Results
SLIDE 28

Experiments: Penn Treebank

SLIDE 29

Experiments: Chinese Treebank

SLIDE 30

Experiments: Detailed statistics in PTB and CTB

SLIDE 31

Experiments: Ablation Test

SLIDE 32

Experiments: Parsing Speed

SLIDE 33

Conclusions and Highlights

  • A novel constituency parsing scheme: predicting the tree structure from a set of real-valued scalars (syntactic distances).
  • Completely free from compounding errors.
  • Strong performance compared to previous models.
  • Significantly more efficient than previous models.
  • Easy deployment: the architecture of the model is no more than a stack of standard recurrent and convolutional layers.
SLIDE 34

One more thing... Why does it work now?

  • Rank losses have been well studied in the learning-to-rank literature since 2005 (Burges et al., 2005).
  • Models good at learning these syntactic distances were not widely known until the rediscovery of LSTMs in 2013 (Graves, 2013).
  • Efficient regularization methods for LSTMs did not mature until 2017 (Merity et al., 2017).

SLIDE 35

Thank you!

Questions?

Yikang Shen, Zhouhan Lin
MILA, Université de Montréal
{yikang.shn, lin.zhouhan}@gmail.com

Paper: Code: