Linear Time Constituency Parsing with RNNs and Dynamic Programming

SLIDE 1

Linear Time Constituency Parsing with RNNs and Dynamic Programming

Juneki Hong 1, Liang Huang 1,2

1 Oregon State University   2 Baidu Research Silicon Valley AI Lab

SLIDE 2

Span Parsing is SOTA in Constituency Parsing

  • Cross + Huang 2016 introduced Span Parsing
  • But with greedy decoding.
  • Stern et al. 2017 had Span Parsing with exact search and global training
  • But it was too slow: O(n³)
  • Can we get the best of both worlds?
  • Something that is both fast and accurate?

(figure: accuracy vs. speed plot. Cross + Huang 2016 is fast but less accurate; Stern et al. 2017 is accurate but slow; Our Work is both. Kitaev + Klein 2018 and Joshi et al. 2018, new at ACL 2018, are also span parsing.)

SLIDE 3

Both Fast and Accurate!

Model | F1 (PTB test)
Baseline Chart Parser (Stern et al. 2017a) | 91.79
Our Linear Time Parser | 91.97

(figure: parsing speed compared against chart parsing)

SLIDE 4

In this talk, we will discuss:

  • Linear time constituency parsing using dynamic programming
  • Going slower in order to go faster: O(n³) → O(n⁴) → O(n)
  • Cube pruning to speed up incremental parsing with dynamic programming
  • From O(nb²) to O(nb log b)
  • An improved loss function for loss-augmented decoding
  • 2nd highest accuracy among single systems trained on PTB only

O(2ⁿ) → O(n³) → O(n⁴) → O(nb²) → O(nb log b)
SLIDE 5

Span Parsing

  • Span differences are taken from an encoder (in our case: a bi-LSTM)
  • A span is scored and labeled by a feed-forward network.
  • The score of a tree is the sum of all the labeled span scores:

    s_tree(t) = Σ_(i,j,X)∈t s(i, j, X)

  • Span (i, j) is represented by the state differences (f_j − f_i, b_i − b_j) of the forward and backward LSTMs, and s(i, j, X) scores label X for that span.

(figure: bi-LSTM over “⟨s⟩ You should eat ice cream ⟨/s⟩”, forward states f0…f5, backward states b0…b5)

Cross + Huang 2016 Stern et al. 2017 Wang + Chang 2016
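Concretely, the scoring scheme above can be sketched in a few lines. This is a toy sketch, not the paper's actual network: the arrays f and b stand in for bi-LSTM outputs, the linear scorer W replaces the feed-forward labeler, and all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for bi-LSTM outputs over a 5-word sentence:
# f[i] and b[i] are the forward and backward states at fencepost i.
n, d = 5, 8
f = rng.standard_normal((n + 1, d))
b = rng.standard_normal((n + 1, d))

def span_feature(i, j):
    # Span (i, j) is represented by forward/backward state differences.
    return np.concatenate([f[j] - f[i], b[i] - b[j]])

# A linear scorer standing in for the feed-forward labeling network.
W = rng.standard_normal(2 * d)

def span_score(i, j):
    return float(W @ span_feature(i, j))

def tree_score(spans):
    # Score of a tree: sum of its labeled span scores.
    return sum(span_score(i, j) for (i, j, _label) in spans)

tree = [(1, 3, "NP"), (3, 5, "PP"), (0, 5, "S-VP")]
print(tree_score(tree))
```

In the real model the label also enters the scorer; here the label is carried along only to show the tree-level sum.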

SLIDE 6

Incremental Span Parsing Example

(figure: “Eat ice cream after lunch”, tagged VB NN NN IN NN, with word boundaries 0–5)

Action | Label | Stack
(no actions yet)

Cross + Huang 2016

SLIDE 7

Incremental Span Parsing Example

(figure: spans over “Eat ice cream after lunch”; partial tree after the actions below)

Action | Label | Stack
1 Shift | ø | (0,1)

Cross + Huang 2016

SLIDE 8

Incremental Span Parsing Example

(figure: spans over “Eat ice cream after lunch”; partial tree after the actions below)

Action | Label | Stack
1 Shift | ø | (0,1)
2 Shift | ø | (0,1) (1,2)

Cross + Huang 2016

SLIDE 9

Incremental Span Parsing Example

(figure: spans over “Eat ice cream after lunch”; partial tree after the actions below)

Action | Label | Stack
1 Shift | ø | (0,1)
2 Shift | ø | (0,1) (1,2)
3 Shift | ø | (0,1) (1,2) (2,3)

Cross + Huang 2016

SLIDE 10

Incremental Span Parsing Example

(figure: spans over “Eat ice cream after lunch”; partial tree after the actions below)

Action | Label | Stack
1 Shift | ø | (0,1)
2 Shift | ø | (0,1) (1,2)
3 Shift | ø | (0,1) (1,2) (2,3)
4 Reduce | NP | (0,1) (1,3)

Cross + Huang 2016

SLIDE 11

Incremental Span Parsing Example

(figure: spans over “Eat ice cream after lunch”; partial tree after the actions below)

Action | Label | Stack
1 Shift | ø | (0,1)
2 Shift | ø | (0,1) (1,2)
3 Shift | ø | (0,1) (1,2) (2,3)
4 Reduce | NP | (0,1) (1,3)
5 Reduce | ø | (0,3)

Cross + Huang 2016

SLIDE 12

Incremental Span Parsing Example

(figure: spans over “Eat ice cream after lunch”; partial tree after the actions below)

Action | Label | Stack
1 Shift | ø | (0,1)
2 Shift | ø | (0,1) (1,2)
3 Shift | ø | (0,1) (1,2) (2,3)
4 Reduce | NP | (0,1) (1,3)
5 Reduce | ø | (0,3)
6 Shift | ø | (0,3) (3,4)

Cross + Huang 2016

SLIDE 13

Incremental Span Parsing Example

(figure: spans over “Eat ice cream after lunch”; partial tree after the actions below)

Action | Label | Stack
1 Shift | ø | (0,1)
2 Shift | ø | (0,1) (1,2)
3 Shift | ø | (0,1) (1,2) (2,3)
4 Reduce | NP | (0,1) (1,3)
5 Reduce | ø | (0,3)
6 Shift | ø | (0,3) (3,4)
7 Shift | NP | (0,3) (3,4) (4,5)

Cross + Huang 2016

SLIDE 14

Incremental Span Parsing Example

(figure: spans over “Eat ice cream after lunch”; partial tree after the actions below)

Action | Label | Stack
1 Shift | ø | (0,1)
2 Shift | ø | (0,1) (1,2)
3 Shift | ø | (0,1) (1,2) (2,3)
4 Reduce | NP | (0,1) (1,3)
5 Reduce | ø | (0,3)
6 Shift | ø | (0,3) (3,4)
7 Shift | NP | (0,3) (3,4) (4,5)
8 Reduce | PP | (0,3) (3,5)

Cross + Huang 2016

SLIDE 15

Incremental Span Parsing Example

(figure: the completed gold parse: (S (VP (VB Eat) (NP (NN ice) (NN cream)) (PP (IN after) (NP (NN lunch))))))

Action | Label | Stack
1 Shift | ø | (0,1)
2 Shift | ø | (0,1) (1,2)
3 Shift | ø | (0,1) (1,2) (2,3)
4 Reduce | NP | (0,1) (1,3)
5 Reduce | ø | (0,3)
6 Shift | ø | (0,3) (3,4)
7 Shift | NP | (0,3) (3,4) (4,5)
8 Reduce | PP | (0,3) (3,5)
9 Reduce | S-VP | (0,5)

Cross + Huang 2016
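The action column above can be replayed mechanically. A minimal sketch (function names ours; labels omitted, since only the span arithmetic matters here):

```python
def run_actions(actions):
    # Replay shift/reduce actions, tracking the stack of spans.
    # Shift pushes the next single-word span (j, j+1); reduce pops the
    # top two spans (k, i), (i, j) and pushes the combined span (k, j).
    stack, j = [], 0
    for action in actions:
        if action == "shift":
            stack.append((j, j + 1))
            j += 1
        else:
            _, right = stack.pop()   # top span (i, j)
            k, _ = stack.pop()       # span below it (k, i)
            stack.append((k, right))
    return stack

# Gold action sequence from the table above
# (labels would be ø ø ø NP ø ø NP PP S-VP).
gold = ["shift", "shift", "shift", "reduce", "reduce",
        "shift", "shift", "reduce", "reduce"]
print(run_actions(gold))  # [(0, 5)]
```

Each intermediate stack matches the Stack column of the corresponding slide.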

SLIDE 16

How Many Possible Parsing Paths?

  • 2 actions per state.
  • O(2ⁿ) possible parsing paths.
SLIDE 17

Equivalent Stacks?

  • Observe that all stacks that end with (i, j) will be treated the same!
  • …Until (i, j) is popped off.
  • So we can treat these as “temporarily equivalent”, and merge them:

[(0, 2), (2, 7), (7, 9)] and [(0, 3), (3, 7), (7, 9)] become […, (7, 9)]

Graph-Structured Stack (Tomita 1988; Huang + Sagae 2010)

SLIDE 18

Equivalent Stacks?

  • Observe that all stacks that end with (i, j) will be treated the same!
  • …Until (i, j) is popped off.
  • This is our new stack representation: the merged state […, (7, 9)] keeps left pointers back to its possible predecessors […, (2, 7)] and […, (3, 7)], which in turn point back to […, (0, 2)] and […, (0, 3)].

Graph-Structured Stack (Tomita 1988; Huang + Sagae 2010)

SLIDE 19

Equivalent Stacks?

  • Observe that all stacks that end with (i, j) will be treated the same!
  • …Until (i, j) is popped off.
  • A reduce follows a left pointer: […, (k, i)] + […, (i, j)] → […, (k, j)]. For example, […, (7, 9)] reduces with […, (2, 7)] or […, (3, 7)] into […, (2, 9)] or […, (3, 9)]. Reduce actions: O(n³).

Graph-Structured Stack (Tomita 1988; Huang + Sagae 2010)

SLIDE 20

Dynamic Programming: Merging Stacks

  • Temporarily merging stacks makes our state space polynomial: from O(2ⁿ) down to O(n³).
  • And our parsing state is represented by its top span (i, j).

SLIDE 21

Becoming Action Synchronous

  • Shift-reduce parsers are traditionally action synchronous.
  • This makes beam search straightforward.
  • We will also do the same,
  • But we will show that this slows down our DP (before applying beam search): from O(n³) to O(n⁴).

SLIDE 22

Action Synchronous Parsing Example

(figure: DP lattice of span states, one column per action step)

Gold: Shift (0,1)

SLIDE 23

Action Synchronous Parsing Example

(figure: DP lattice with left pointers; the gold parse path is highlighted)

Gold: Shift (0,1) Shift (1,2)

SLIDE 24

Action Synchronous Parsing Example

(figure: DP lattice with left pointers; the gold parse path is highlighted)

Gold: Shift (0,1) Shift (1,2) Shift (2,3)

SLIDE 25

Action Synchronous Parsing Example

(figure: DP lattice with left pointers; the gold parse path is highlighted)

Gold: Shift (0,1) Shift (1,2) Shift (2,3) Reduce (1,3)

SLIDE 26

Action Synchronous Parsing Example

(figure: DP lattice with left pointers; the gold parse path is highlighted)

Gold: Shift (0,1) Shift (1,2) Shift (2,3) Reduce (1,3) Reduce (0,3)

SLIDE 27

Action Synchronous Parsing Example

(figure: DP lattice with left pointers; the gold parse path is highlighted)

Gold: Shift (0,1) Shift (1,2) Shift (2,3) Reduce (1,3) Reduce (0,3) Shift (3,4)

SLIDE 28

Action Synchronous Parsing Example

(figure: full DP lattice over all 9 action steps)

Gold: Shift (0,1) Shift (1,2) Shift (2,3) Reduce (1,3) Reduce (0,3) Shift (3,4) Shift (4,5) Reduce (3,5) Reduce (0,5)

SLIDE 29

Runtime Analysis: O(n⁴)

(figure: the full DP lattice over all action steps)

Huang + Sagae 2010

SLIDE 30

Runtime Analysis: O(n⁴)

#steps: 2n − 1 = O(n)

(figure: DP lattice)

Huang + Sagae 2010

SLIDE 31

Runtime Analysis: O(n⁴)

#steps: 2n − 1 = O(n)
#states per step, each a span (i, j): O(n²)

(figure: DP lattice)

Huang + Sagae 2010

SLIDE 32

Runtime Analysis: O(n⁴)

#steps: 2n − 1 = O(n)
#states per step, each a span (i, j): O(n²)
Total states: O(n³)

(figure: DP lattice)

Huang + Sagae 2010

SLIDE 33

Runtime Analysis: O(n⁴)

#steps: 2n − 1 = O(n)
#states per step, each a span (i, j): O(n²)
Total states: O(n³)
#left pointers per state: O(n), giving O(n⁴) overall

Check out the paper for our new theorem (thanks to Dezhong Deng!): a state […, (i, j)] at step l takes its left pointers […, (k, i)] from step l′ = l − 2(j − i) + 1, and reduces into […, (k, j)] at step l + 1.

Huang + Sagae 2010

SLIDE 34

Going slower to go faster

  • Our action-synchronous algorithm has a slower runtime than CKY!
  • However, it also becomes straightforward to prune using beam search.
  • So we can achieve a linear runtime in the end.

(figure: the full O(n⁴) DP lattice next to its beam-pruned version)
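A minimal sketch of the resulting action-synchronous beam search, with two simplifications that are ours, not the paper's: span scores come from a placeholder function, and reduce always rejoins with the left context at position 0 instead of following left pointers.

```python
import heapq

def beam_parse(n, b, score):
    # Action-synchronous beam search over merged span states.
    # `score(step, span)` stands in for the model's span scores.
    beam = {(0, 0): 0.0}  # initial state: empty span before word 0
    for step in range(1, 2 * n):
        cands = {}
        for (i, j), s in beam.items():
            nexts = []
            if j < n:
                nexts.append((j, j + 1))  # shift
            if i > 0:
                nexts.append((0, j))      # simplified reduce
            for sp in nexts:
                v = s + score(step, sp)
                if v > cands.get(sp, float("-inf")):
                    cands[sp] = v
        # Keep only the top b states per step: O(n) steps of bounded work.
        beam = dict(heapq.nlargest(b, cands.items(), key=lambda kv: kv[1]))
    return beam

final = beam_parse(3, b=4, score=lambda step, span: 0.0)
print(final)  # the full-sentence span (0, 3) survives in the beam
```

With the beam width b fixed, the total work no longer depends on n beyond the linear number of steps.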

SLIDE 35

Now our runtime is O(n).

(figure: the beam-pruned DP lattice)

SLIDE 36

But this O(n) is hiding a constant.

(figure: beam-pruned lattice)

SLIDE 37

But this O(n) is hiding a constant.

  • O(b) left pointers per state
  • b states per action step
  • O(nb²) runtime

SLIDE 38

Cube Pruning

  • We can apply cube pruning to bring this down to O(nb log b).

Chiang 2007; Huang + Chiang 2007

SLIDE 39

Cube Pruning

  • We can apply cube pruning to make it O(nb log b)
  • By pushing all states and their left pointers into a heap

Chiang 2007; Huang + Chiang 2007

SLIDE 40

Cube Pruning

  • By pushing all states and their left pointers into a heap
  • And popping the top b unique subsequent states
  • We can apply cube pruning to make it O(nb log b)

Chiang 2007; Huang + Chiang 2007

SLIDE 41

Cube Pruning

  • By pushing all states and their left pointers into a heap
  • And popping the top b unique subsequent states
  • This is the first time cube pruning has been applied to incremental parsing
  • We can apply cube pruning to make it O(nb log b)

Chiang 2007; Huang + Chiang 2007
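The heap-based enumeration can be sketched as classic cube pruning over one candidate grid (toy scores; with real, non-monotonic neural scores the result is approximate):

```python
import heapq

def cube_pruning(states, pointers, combine, b):
    # Lazily pop the b best (state, left-pointer) combinations from a heap
    # instead of scoring all |states| x |pointers| pairs (Chiang 2007).
    # Both lists are sorted by descending score; `combine` is additive
    # here, a toy stand-in for scoring the resulting span state.
    heap = [(-combine(states[0], pointers[0]), 0, 0)]
    seen = {(0, 0)}
    out = []
    while heap and len(out) < b:
        neg, i, j = heapq.heappop(heap)
        out.append((states[i], pointers[j], -neg))
        for i2, j2 in ((i + 1, j), (i, j + 1)):  # push grid neighbors
            if i2 < len(states) and j2 < len(pointers) and (i2, j2) not in seen:
                seen.add((i2, j2))
                heapq.heappush(heap, (-combine(states[i2], pointers[j2]), i2, j2))
    return out

best = cube_pruning([9, 7, 4], [5, 2, 1], lambda s, p: s + p, b=3)
print([score for _, _, score in best])  # [14, 12, 11]
```

Popping b items from a heap of at most O(b) live cells costs O(b log b) per step, versus O(b²) for scoring every pair.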

SLIDE 42

Runtime on PTB and Discourse Treebank

(figure: empirical parsing time vs. sentence length, compared against chart parsing)

SLIDE 43

Training

  • Structured SVM approach (Taskar et al. 2003; Stern et al. 2017):
  • Goal: score the gold tree t* higher than all other trees by a margin:

    ∀t: s(t*) − s(t) ≥ Δ(t, t*)

  • Loss-augmented decoding:
  • During training, return the most-violated tree (i.e., the one with the highest augmented score):

    t̂ = argmax_t [s(t) + Δ(t, t*)]

  • Minimize: s(t̂) + Δ(t̂, t*) − s(t*)
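Written as code over a toy, explicitly enumerated candidate set (a real parser searches the tree space instead of enumerating it), the objective looks like:

```python
def structured_hinge(score, delta, gold, candidates):
    # Loss-augmented decoding: find the most violated candidate, i.e. the
    # one maximizing score(t) + delta(t), then compute the margin
    # violation score(t_hat) + delta(t_hat) - score(gold) to minimize.
    t_hat = max(candidates, key=lambda t: score(t) + delta(t))
    return max(0.0, score(t_hat) + delta(t_hat) - score(gold))

# Toy trees identified by name, with made-up scores and losses.
scores = {"gold": 5.0, "near_miss": 4.5, "bad": 1.0}
deltas = {"gold": 0.0, "near_miss": 2.0, "bad": 6.0}
loss = structured_hinge(scores.get, deltas.get, "gold", list(scores))
print(loss)  # 2.0: "bad" is most violated (1.0 + 6.0 vs. gold's 5.0)
```

Note that the most violated tree need not be the highest-scoring one; the loss term Δ deliberately biases the argmax toward wrong trees that are scored too close to gold.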
SLIDE 44

Loss Function

  • Counts the incorrectly labeled spans in the tree (Stern et al. 2017):

    Δ(t, t*) = Σ_(i,j,X)∈t 1[X ≠ t*(i, j)]

  • Happens to be decomposable, so it can even be used to compare partial trees.

SLIDE 45

Novel Cross-Span Loss

  • We observe that the null label ø is used in two different ways:
  • To facilitate ternary and n-ary branching trees.
  • As a default label for incorrect spans that violate other gold spans.

(figure: two spans with gold label t*(i, j) = ø: one a legitimate intermediate span, one crossing a gold bracket)

SLIDE 46

Novel Cross-Span Loss

  • We modify the loss to account for incorrect spans in the tree. Starting point (Stern et al. 2017):

    Δ(t, t*) = Σ_(i,j,X)∈t 1[X ≠ t*(i, j)]

SLIDE 47

Novel Cross-Span Loss

  • We modify the loss to account for incorrect spans in the tree:

    Δ(t, t*) = Σ_(i,j,X)∈t 1[X ≠ t*(i, j) ∨ cross(i, j, t*)]

  • cross(i, j, t*) indicates whether (i, j) crosses a span in the gold tree.
  • Still decomposable over spans, so it can be used to compare partial trees.
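A small sketch of the modified loss (function and variable names are ours): a span crosses a gold span when the two overlap but neither contains the other.

```python
def cross(i, j, gold_spans):
    # (i, j) crosses a gold span (k, m) when they overlap but neither
    # contains the other.
    return any(k < i < m < j or i < k < j < m for (k, m) in gold_spans)

def cross_span_loss(tree, gold_label, gold_spans):
    # Modified decomposable loss: a span counts as an error if its label
    # is wrong OR it crosses a gold bracket.
    return sum(1 for (i, j, X) in tree
               if gold_label.get((i, j), "ø") != X or cross(i, j, gold_spans))

# Gold brackets from the running "Eat ice cream after lunch" example.
gold_spans = [(0, 5), (0, 1), (1, 3), (3, 5)]
gold_label = {(1, 3): "NP", (3, 5): "PP", (0, 5): "S-VP"}
# (2, 4) crosses the gold spans (1, 3) and (3, 5), so even with the
# "correct" default label ø it is now penalized:
print(cross(2, 4, gold_spans), cross_span_loss([(2, 4, "ø")], gold_label, gold_spans))
```

Under the original loss the span (2, 4) with label ø would have cost nothing; the cross term is what distinguishes it from a harmless intermediate ø span.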

SLIDE 48

Max-Violation Updates

  • Take the largest augmented-loss violation across all time steps.
  • This is the max-violation point that we use to train (Huang et al. 2012).

(figure: best-in-beam and worst-in-beam scores over time; once the correct sequence falls off the beam, a standard "full" update is invalid; candidate update points: early, max-violation (biggest violation), latest (last valid update))
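The max-violation choice can be sketched as follows, with toy per-step score lists standing in for the gold-prefix and best-in-beam augmented scores collected during beam search:

```python
def max_violation_step(gold_scores, best_scores):
    # Pick the time step with the largest violation: the augmented score
    # of the best hypothesis in the beam minus the gold prefix score
    # (Huang et al. 2012).
    violations = [best - gold for gold, best in zip(gold_scores, best_scores)]
    step = max(range(len(violations)), key=violations.__getitem__)
    return step, violations[step]

# Gold-prefix scores vs. best-in-beam augmented scores over 4 steps.
step, v = max_violation_step([1.0, 2.0, 3.0, 4.0], [1.5, 3.5, 3.2, 4.1])
print(step, v)  # update at the step with the biggest violation
```

Updating at this step, rather than at the end of the sequence, keeps the update valid even after the gold sequence has fallen off the beam.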

SLIDE 49

Comparison with Baseline Chart Parser

Model | Note | F1 (PTB test)
Stern et al. (2017a) | Baseline Chart Parser | 91.79
Stern et al. (2017a) | + our cross-span loss | 91.81
Our Work | Beam 15 | 91.84
Our Work | Beam 20 | 91.97

SLIDE 50

Comparison to Other Parsers

PTB only, Single Model, End-to-End:
Model | Note | F1
Durrett + Klein 2015 | | 91.1
Cross + Huang 2016 | Original Span Parser | 91.3
Liu + Zhang 2016 | | 91.7
Dyer et al. 2016 | Discriminative | 91.7
Stern et al. 2017a | Baseline Chart Parser | 91.79
Stern et al. 2017c | Separate Decoding | 92.56
Our Work | Beam 20 | 91.97

Reranking, Ensemble, Extra Data:
Model | Note | F1
Vinyals et al. 2015 | Ensemble | 90.5
Dyer et al. 2016 | Generative Reranking | 93.3
Choe + Charniak 2016 | Reranking | 93.8
Fried et al. 2017 | Ensemble Reranking | 94.25

SLIDE 51

Conclusions

  • Linear-time, span-based constituency parsing with dynamic programming
  • Cube pruning to speed up incremental parsing with dynamic programming
  • A cross-span loss extension for improving loss-augmented decoding
  • Result: faster and more accurate than cubic-time chart parsing
  • 2nd highest accuracy for single-model, end-to-end systems trained on PTB only
  • Stern et al. 2017c is more accurate, but uses separate decoding and is much slower
  • After this ACL, that ranking is definitely no longer true (e.g. Joshi et al. 2018, Kitaev + Klein 2018)
  • But both are span-based parsers and can be linearized in the same way!

O(2ⁿ) → O(n³) → O(n⁴) → O(nb²) → O(nb log b)
SLIDE 52

Thank you! Questions?

SLIDE 53

Acknowledgements

  • Dezhong Deng, for his theorem on predecessor states
  • And for his mathematical proofreading of the training sections
  • Mitchell Stern, for releasing his code and for his suggestions