

  1. Linear Time Constituency Parsing with RNNs and Dynamic Programming
     Juneki Hong¹, Liang Huang¹·²
     ¹Oregon State University  ²Baidu Research Silicon Valley AI Lab

  2. Span Parsing is SOTA in Constituency Parsing
     • Cross + Huang 2016 introduced span parsing, but with greedy decoding.
     • Stern et al. 2017 did span parsing with exact search and global training, but it was too slow: O(n^3).
     • Can we get the best of both worlds — something that is both fast and accurate?
     [Figure: speed-vs-accuracy plot placing Cross + Huang 2016, Stern et al. 2017, Joshi et al. 2018, Kitaev + Klein 2018 (new at ACL 2018, also span parsing), and our work.]

  3. Both Fast and Accurate!
     Baseline chart parser (Stern et al. 2017a): 91.79
     Our linear-time parser: 91.97

  4. In this talk, we will discuss:
     • Linear-time constituency parsing using dynamic programming.
     • Going slower in order to go faster: O(2^n) → O(n^3) → O(n^4) → O(n).
     • Cube pruning to speed up incremental parsing with dynamic programming: from O(nb^2) to O(nb log b).
     • An improved loss function for loss-augmented decoding.
     • 2nd-highest accuracy among single systems trained on PTB only.
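The O(nb^2) → O(nb log b) speedup in the outline comes from cube pruning: rather than scoring all b × b combinations of beam items at each step, pop only the best combinations from a heap. A minimal generic sketch (not the paper's parser; the function name and inputs are illustrative) of the underlying idea — taking the top-b sums of two descending-sorted score lists lazily:

```python
import heapq

def top_b_sums(a, c, b_size):
    """a, c sorted descending; return the b_size largest values of a[i] + c[j].

    Cube-pruning-style best-first enumeration: start at (0, 0), and each time a
    cell is popped, push only its right and down neighbors, so roughly b_size
    heap operations replace the full b x b enumeration.
    """
    heap = [(-(a[0] + c[0]), 0, 0)]
    seen, out = {(0, 0)}, []
    while heap and len(out) < b_size:
        neg, i, j = heapq.heappop(heap)
        out.append(-neg)
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(c) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(a[ni] + c[nj]), ni, nj))
    return out

assert top_b_sums([9, 5, 1], [7, 6, 0], 3) == [16, 15, 12]
```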

  5. Span Parsing
     • Span features are differences of encoder states (in our case, a bi-LSTM): a span (i, j) is represented by (f_j − f_i, b_i − b_j).
     • A span is scored and labeled by a feed-forward network: s(i, j, X).
     • The score of a tree is the sum of all its labeled span scores:
       s_tree(t) = Σ_{(i, j, X) ∈ t} s(i, j, X)
     [Figure: fenceposts 0–5 over "⟨s⟩ You should eat ice cream ⟨/s⟩", with forward states f_0 … f_5 and backward states b_0 … b_5.]
     Cross + Huang 2016; Stern et al. 2017; Wang + Chang 2016
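The span-scoring scheme above can be sketched numerically. This is a minimal illustration with random stand-ins for the bi-LSTM states and a single linear layer in place of the feed-forward network (all shapes and names here are assumptions, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_labels = 5, 8, 4              # sentence length, hidden size, label count

# Stand-ins for bi-LSTM outputs at the n+1 fenceposts 0..n.
f = rng.standard_normal((n + 1, d))   # forward states
b = rng.standard_normal((n + 1, d))   # backward states

# Stand-in scorer: one linear layer over the span feature (f_j - f_i, b_i - b_j).
W = rng.standard_normal((n_labels, 2 * d))

def span_scores(i, j):
    """Score every label X for span (i, j) from the boundary-state differences."""
    feat = np.concatenate([f[j] - f[i], b[i] - b[j]])
    return W @ feat                   # one score per label

def tree_score(tree):
    """s_tree(t) = sum of s(i, j, X) over the labeled spans (i, j, X) in t."""
    return sum(span_scores(i, j)[X] for (i, j, X) in tree)

t = [(0, 1, 0), (1, 3, 2), (0, 5, 1)]  # toy tree as (i, j, label-id) triples
print(tree_score(t))
```

Note that the span feature is computed purely from the four boundary states, which is what lets every candidate span be scored from one encoder pass.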

  6. Incremental Span Parsing Example
     Sentence: 0 Eat/VB 1 ice/NN 2 cream/NN 3 after/IN 4 lunch/NN 5
     Cross + Huang 2016

  7. Incremental Span Parsing Example
     Step  Action  Label  Stack
     1     Shift   ø      (0, 1)
     [Figure: target tree, with the spans built so far highlighted.]
     Cross + Huang 2016

  8. Incremental Span Parsing Example
     Step  Action  Label  Stack
     1     Shift   ø      (0, 1)
     2     Shift   ø      (0, 1) (1, 2)
     [Figure: target tree, with the spans built so far highlighted.]
     Cross + Huang 2016

  9. Incremental Span Parsing Example
     Step  Action  Label  Stack
     1     Shift   ø      (0, 1)
     2     Shift   ø      (0, 1) (1, 2)
     3     Shift   ø      (0, 1) (1, 2) (2, 3)
     [Figure: target tree, with the spans built so far highlighted.]
     Cross + Huang 2016

  10. Incremental Span Parsing Example
      Step  Action  Label  Stack
      1     Shift   ø      (0, 1)
      2     Shift   ø      (0, 1) (1, 2)
      3     Shift   ø      (0, 1) (1, 2) (2, 3)
      4     Reduce  NP     (0, 1) (1, 3)
      [Figure: target tree, with the spans built so far highlighted.]
      Cross + Huang 2016

  11. Incremental Span Parsing Example
      Step  Action  Label  Stack
      1     Shift   ø      (0, 1)
      2     Shift   ø      (0, 1) (1, 2)
      3     Shift   ø      (0, 1) (1, 2) (2, 3)
      4     Reduce  NP     (0, 1) (1, 3)
      5     Reduce  ø      (0, 3)
      [Figure: target tree, with the spans built so far highlighted.]
      Cross + Huang 2016

  12. Incremental Span Parsing Example
      Step  Action  Label  Stack
      1     Shift   ø      (0, 1)
      2     Shift   ø      (0, 1) (1, 2)
      3     Shift   ø      (0, 1) (1, 2) (2, 3)
      4     Reduce  NP     (0, 1) (1, 3)
      5     Reduce  ø      (0, 3)
      6     Shift   ø      (0, 3) (3, 4)
      [Figure: target tree, with the spans built so far highlighted.]
      Cross + Huang 2016

  13. Incremental Span Parsing Example
      Step  Action  Label  Stack
      1     Shift   ø      (0, 1)
      2     Shift   ø      (0, 1) (1, 2)
      3     Shift   ø      (0, 1) (1, 2) (2, 3)
      4     Reduce  NP     (0, 1) (1, 3)
      5     Reduce  ø      (0, 3)
      6     Shift   ø      (0, 3) (3, 4)
      7     Shift   NP     (0, 3) (3, 4) (4, 5)
      [Figure: target tree, with the spans built so far highlighted.]
      Cross + Huang 2016

  14. Incremental Span Parsing Example
      Step  Action  Label  Stack
      1     Shift   ø      (0, 1)
      2     Shift   ø      (0, 1) (1, 2)
      3     Shift   ø      (0, 1) (1, 2) (2, 3)
      4     Reduce  NP     (0, 1) (1, 3)
      5     Reduce  ø      (0, 3)
      6     Shift   ø      (0, 3) (3, 4)
      7     Shift   NP     (0, 3) (3, 4) (4, 5)
      8     Reduce  PP     (0, 3) (3, 5)
      [Figure: target tree, with the spans built so far highlighted.]
      Cross + Huang 2016

  15. Incremental Span Parsing Example
      Step  Action  Label  Stack
      1     Shift   ø      (0, 1)
      2     Shift   ø      (0, 1) (1, 2)
      3     Shift   ø      (0, 1) (1, 2) (2, 3)
      4     Reduce  NP     (0, 1) (1, 3)
      5     Reduce  ø      (0, 3)
      6     Shift   ø      (0, 3) (3, 4)
      7     Shift   NP     (0, 3) (3, 4) (4, 5)
      8     Reduce  PP     (0, 3) (3, 5)
      9     Reduce  S-VP   (0, 5)
      [Figure: complete tree (S-VP (VB Eat) (NP (NN ice) (NN cream)) (PP (IN after) (NP (NN lunch)))).]
      Cross + Huang 2016
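The nine-step derivation above can be replayed mechanically. This is a minimal sketch of a shift-reduce executor (the function and variable names are illustrative, not the authors' code): shift pushes the next one-word span, reduce merges the top two adjacent spans, and any non-ø label is recorded as a bracket over the resulting span.

```python
def run(actions, n):
    """Replay (action, label) steps over an n-word sentence; return brackets."""
    stack, brackets, next_word = [], [], 0
    for action, label in actions:
        if action == "shift":
            stack.append((next_word, next_word + 1))
            next_word += 1
        else:                         # reduce: merge the top two spans
            (i2, j) = stack.pop()
            (k, i) = stack.pop()
            assert i == i2            # spans on the stack are always adjacent
            stack.append((k, j))
        if label is not None:         # ø actions add no bracket
            i, j = stack[-1]
            brackets.append((i, j, label))
    assert next_word == n and stack == [(0, n)]
    return brackets

# The slide's derivation for "Eat ice cream after lunch" (n = 5):
steps = [("shift", None), ("shift", None), ("shift", None),
         ("reduce", "NP"), ("reduce", None), ("shift", None),
         ("shift", "NP"), ("reduce", "PP"), ("reduce", "S-VP")]
print(run(steps, 5))
# → [(1, 3, 'NP'), (4, 5, 'NP'), (3, 5, 'PP'), (0, 5, 'S-VP')]
```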

  16. How Many Possible Parsing Paths?
      • 2 actions per state.
      • O(2^n) possible paths.

  17. Equivalent Stacks?
      • Observe that all stacks that end with (i, j) will be treated the same…
      • …until (i, j) is popped off.
      • Example: [(0, 2), (2, 7), (7, 9)] and [(0, 3), (3, 7), (7, 9)] both behave as […, (7, 9)].
      • So we can treat these as "temporarily equivalent", and merge them.
      Graph-Structured Stack (Tomita 1988; Huang + Sagae 2010)

  18. Equivalent Stacks?
      • Observe that all stacks that end with (i, j) will be treated the same…
      • …until (i, j) is popped off.
      • Merged stacks keep left pointers: […, (7, 9)] points left to […, (2, 7)] and […, (3, 7)], which in turn point to […, (0, 2)] and […, (0, 3)].
      • This is our new stack representation.
      Graph-Structured Stack (Tomita 1988; Huang + Sagae 2010)

  19. Equivalent Stacks?
      • Observe that all stacks that end with (i, j) will be treated the same…
      • …until (i, j) is popped off.
      • A reduce follows the left pointers: reducing […, (7, 9)] with left pointers (2, 7) and (3, 7) yields […, (2, 9)] and […, (3, 9)].
      • In general, reducing […, (k, i)] with […, (i, j)] gives […, (k, j)]: O(n^3) reduce actions.
      Graph-Structured Stack (Tomita 1988; Huang + Sagae 2010)
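The left-pointer reduce on slide 19 can be sketched directly. In this minimal illustration (the data layout and names are assumptions, not the authors' implementation), a merged state is identified by its top span, and its left pointers record which spans can sit beneath it:

```python
# left[(i, j)] = the set of spans (k, i) that can appear under (i, j),
# using the slide's example stacks [..., (2,7), (7,9)] and [..., (3,7), (7,9)].
left = {(7, 9): {(2, 7), (3, 7)},
        (2, 7): {(0, 2)},
        (3, 7): {(0, 3)}}

def reduce_state(span):
    """Reduce top span (i, j): follow each left pointer (k, i) to get (k, j)."""
    i, j = span
    return {(k, j) for (k, _) in left[span]}

print(reduce_state((7, 9)))
# → {(2, 9), (3, 9)}
```

One reduce on one merged state thus covers every concrete stack that was folded into it, which is where the O(n^3) bound over (k, i, j) triples comes from.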

  20. Dynamic Programming: Merging Stacks
      • Temporarily merging stacks will make our state space polynomial: O(2^n) → O(n^3).
      • And our parsing state is represented by its top span (i, j).
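To make the polynomial claim concrete: once stacks are merged, a state is identified by its top span (i, j) alone, so there are only O(n^2) states, and enumerating the reduce triples (k, i, j) gives the O(n^3) work bound. A small counting sketch (illustrative only):

```python
def merged_state_counts(n):
    """Count merged DP states (spans) and reduce combinations for length n."""
    # States are spans (i, j) with 0 <= i < j <= n: O(n^2) of them.
    states = [(i, j) for i in range(n) for j in range(i + 1, n + 1)]
    # A reduce combines (k, i) with (i, j) into (k, j), i.e. triples
    # 0 <= k < i < j <= n: O(n^3) of them.
    reduces = [(k, i, j)
               for k in range(n)
               for i in range(k + 1, n)
               for j in range(i + 1, n + 1)]
    return len(states), len(reduces)

states, reduces = merged_state_counts(10)
print(states, reduces)
# → 55 165   (C(11, 2) spans and C(11, 3) reduce triples, versus 2^10 = 1024 paths)
```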

  21. Becoming Action-Synchronous
      • Shift-reduce parsers are traditionally action-synchronous.
      • This makes beam search straightforward, so we will do the same.
      • But we will show that this slows down our DP (before applying beam search): O(2^n) → O(n^4).
