Sequence-to-sequence Models for Cache Transition Systems



  1. Sequence-to-sequence Models for Cache Transition Systems § Xiaochang Peng¹, Linfeng Song¹, Daniel Gildea¹ and Giorgio Satta²

  2. AMR § “John wants to go” [Figure: AMR graph with nodes want-01, go-01 and boy; edge labels ARG0 (twice) and ARG1.]

  3. AMR § “After its competitor invented the front loading washing machine, the CEO of the American IM company believed that each of its employees had the ability for innovation, and formulated strategic countermeasures for innovation in the industry.” [Figure: AMR graph of this sentence, with concepts such as believe-01, formulate-01, capable-41, invent-01, innovate-01, have-org-role-91, employ-01 and compete-01.]

  4. Transition-based AMR parsing § There has been previous work on transition-based graph parsing (Sagae and Tsujii; Damonte et al.; Zhou et al.; Ribeyre et al.; Wang et al.). § Our work introduces a new data structure, the “cache”, for generating graphs of a given treewidth.

  5. Introduction to treewidth [Figure: three example graphs. A tree: treewidth 1. A second graph: treewidth 2. Complete graph of N nodes: treewidth N-1.]

  6. Introduction to treewidth [Figure: the AMR graph from slide 3, with the labels “small tree width”, “large tree width” and “~2.8 on average”.]

  7. Tree decomposition [Figure: a graph over the vertices A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, next to its tree decomposition with bags ALB, LBR, BRD, RDM, DMF, MFO, FOG, KAL, IKA, DMP, MPH, PHC, HCS, CSQ, SQN, OGE, GEJ.]
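The treewidth and tree decomposition slides can be made concrete with a small check. The following Python sketch is not from the talk. It tests the three standard conditions of a tree decomposition and reports its width (largest bag size minus one). The function names, the 4-cycle example graph and the bag ids are illustrative assumptions, and the sketch assumes that `tree_edges` already forms a tree.

```python
def is_tree_decomposition(graph_edges, bags, tree_edges):
    """Check the three standard tree-decomposition conditions.

    bags maps a (hypothetical) bag id to a set of graph vertices;
    tree_edges lists pairs of bag ids and is assumed to form a tree.
    """
    vertices = {v for edge in graph_edges for v in edge}
    # 1. Every vertex appears in at least one bag.
    if any(all(v not in bag for bag in bags.values()) for v in vertices):
        return False
    # 2. Both endpoints of every graph edge share at least one bag.
    for u, v in graph_edges:
        if not any(u in bag and v in bag for bag in bags.values()):
            return False
    # 3. The bags containing any given vertex form a connected subtree.
    neighbours = {b: set() for b in bags}
    for a, b in tree_edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    for v in vertices:
        holding = {b for b, bag in bags.items() if v in bag}
        start = next(iter(holding))
        seen, frontier = {start}, [start]
        while frontier:
            current = frontier.pop()
            for nxt in neighbours[current] & holding:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        if seen != holding:
            return False
    return True


def width(bags):
    """Width of a decomposition: size of the largest bag minus one."""
    return max(len(bag) for bag in bags.values()) - 1


# Illustrative graph (not the one on the slide): the 4-cycle A-B-C-D-A.
cycle = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
good_bags = {0: {"A", "B", "C"}, 1: {"A", "C", "D"}}
bad_bags = {0: {"A", "B"}, 1: {"C", "D"}}  # edges B-C and D-A are in no bag

print(is_tree_decomposition(cycle, good_bags, [(0, 1)]), width(good_bags))
print(is_tree_decomposition(cycle, bad_bags, [(0, 1)]))
```

Every bag listed on slide 7 holds three vertices, which corresponds to a width of 2 under this definition.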

  8. Cache transition system § Configuration c = (σ, η, β, E) § Stack σ: place for temporarily storing concepts § Cache η: working zone for making edges, with a fixed size corresponding to the treewidth § Buffer β: unprocessed concepts § E: set of already-built edges

  9. Cache transition system § Actions § SHIFT, PUSH(i): shift one concept from the buffer to the right-most position of the cache, then move one concept (index i) from the cache to the stack. [Figure: before: stack empty, cache $ $ $, buffer PER want-01 go-01. After SHIFT, PUSH(1): stack ($,1), cache $ $ PER, buffer want-01 go-01.]

  10. Cache transition system § Actions § POP: pop the top item from the stack and put it back into the cache, then drop the right-most item from the cache. [Figure: before: stack ($,1), cache $ $ PER, buffer want-01 go-01. After POP: stack empty, cache $ $ $, buffer want-01 go-01.]

  11. Cache transition system § Actions § Arc(i, d, l): make an arc (with direction d and label l) between the right-most node and node i. Arc(i,-,-) means there is no edge between them. [Figure: before: stack ($,1), ($,1), cache $ PER want-01, buffer go-01. After Arc(1,-,-), Arc(2,L,ARG0): same stack, cache and buffer, with an ARG0 arc between want-01 and PER.]

  12. Example of cache transition § Action taken: Initialization § Stack: empty. Cache: $ $ $. Buffer: PER want-01 go-01.

  13. Example of cache transition § Action taken: SHIFT, PUSH(1) § Stack: (1, $). Cache: $ $ PER. Buffer: want-01 go-01. § Hypothesis: PER

  14. Example of cache transition § Action taken: Arc(1, -, -), Arc(2, -, -) § Stack: (1, $). Cache: $ $ PER. Buffer: want-01 go-01. § Hypothesis: PER

  15. Example of cache transition § Action taken: SHIFT, PUSH(1) § Stack: (1, $) (1, $). Cache: $ PER want-01. Buffer: go-01. § Hypothesis: PER want-01

  16. Example of cache transition § Action taken: Arc(1, -, -), Arc(2, L, ARG0) § Stack: (1, $) (1, $). Cache: $ PER want-01, with an ARG0 arc between want-01 and PER. Buffer: go-01. § Hypothesis: PER want-01, connected by ARG0

  17. Example of cache transition § Action taken: SHIFT, PUSH(1) § Stack: (1, $) (1, $) (1, $). Cache: PER want-01 go-01. Buffer: empty. § Hypothesis: PER want-01 go-01, with the ARG0 arc built so far

  18. Example of cache transition § Action taken: Arc(1, L, ARG0), Arc(2, R, ARG1) § Stack: (1, $) (1, $) (1, $). Cache: PER want-01 go-01, with an ARG0 arc between go-01 and PER and an ARG1 arc between go-01 and want-01. Buffer: empty. § Hypothesis: PER want-01 go-01, with arcs ARG0, ARG0 and ARG1

  19. Example of cache transition § Action taken: POP, POP, POP § Stack: empty. Cache: $ $ $. Buffer: empty. § Hypothesis: PER want-01 go-01, with arcs ARG0, ARG0 and ARG1
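The walkthrough above can be replayed in code. The following Python sketch is an illustration, not the authors' implementation. It assumes that PUSH(i) uses a 1-based index into the cache as it stands before the new concept is appended, that POP reinserts the stored item at its recorded index, and that direction L means the arc leaves the right-most cache node while R means it enters that node. These readings are inferred from the slides, and the class and method names are hypothetical.

```python
class CacheTransitionSystem:
    """Minimal sketch of the configuration (stack, cache, buffer, edges)."""

    def __init__(self, concepts, cache_size):
        self.stack = []                    # (index, vertex) pairs
        self.cache = ["$"] * cache_size    # "$" is the placeholder symbol
        self.buffer = list(concepts)       # unprocessed concepts
        self.edges = set()                 # (head, label, dependent) triples

    def shift_push(self, i):
        """SHIFT, PUSH(i): move cache item i to the stack, then append
        the next buffer concept at the right-most cache position."""
        vertex = self.cache.pop(i - 1)     # assumption: i is 1-based
        self.stack.append((i, vertex))
        self.cache.append(self.buffer.pop(0))

    def pop(self):
        """POP: return the top stack item to its recorded cache position
        and drop the right-most cache item."""
        i, vertex = self.stack.pop()
        self.cache.insert(i - 1, vertex)
        self.cache.pop()

    def arc(self, i, direction=None, label=None):
        """Arc(i, d, l): connect the right-most cache node and node i.
        A direction of None stands for Arc(i, -, -), i.e. no edge."""
        if direction is None:
            return
        rightmost, other = self.cache[-1], self.cache[i - 1]
        if direction == "L":               # assumption: rightmost is the head
            self.edges.add((rightmost, label, other))
        elif direction == "R":             # assumption: node i is the head
            self.edges.add((other, label, rightmost))
        else:
            raise ValueError(f"unknown direction: {direction!r}")


# Replay of slides 12 to 19 with a cache of size 3.
parser = CacheTransitionSystem(["PER", "want-01", "go-01"], cache_size=3)
parser.shift_push(1); parser.arc(1); parser.arc(2)
parser.shift_push(1); parser.arc(1); parser.arc(2, "L", "ARG0")
parser.shift_push(1); parser.arc(1, "L", "ARG0"); parser.arc(2, "R", "ARG1")
for _ in range(3):
    parser.pop()

print(sorted(parser.edges))
print(parser.stack, parser.cache, parser.buffer)
```

Under these assumptions the replay ends with an empty stack and buffer, a cache holding only placeholders, and the three edges of the “John wants to go” hypothesis.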

  20. Sequence to sequence models for cache transition system § Concepts are generated from input sentences by another classifier in the preprocessing step. § Separate encoders are adopted for input sentences and sequences of concepts, respectively. § One decoder for generating transition actions.

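  21. Seq2seq (soft-attention+features) [Figure: the input sequence “John wants to go” and the concept sequence “Per want-01 go-01” are encoded separately; the decoder emits actions such as SHIFT, PushIndex(1), ..., SHIFT.]

  22. Seq2seq (hard-attention+features) [Figure: the input sequence “John wants to go” and the concept sequence “Per want-01 go-01” are encoded separately; the decoder emits actions such as SHIFT, PushIndex(1), ..., NOARC, ARC L-ARG0.]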

  23. Experiments § Dataset: LDC2015E86 § 16,833 (train) / 1,368 (dev) / 1,371 (test) § Evaluation: Smatch (Cai and Knight, 2013)

  24. AMR coverage with different cache sizes [Figure: bar chart with cache sizes 0 to >=8 on the x-axis and counts from 0 to 6000 on the y-axis; coverage annotations of 91%, 97% and 99%.]

  25. Development results

      Impact of various components
      Model        P     R     F
      Soft         0.55  0.51  0.53
      Soft+feats   0.69  0.63  0.66
      Hard+feats   0.70  0.64  0.67

      Impact of cache size
      Cache size   P     R     F
      4            0.69  0.63  0.66
      5            0.70  0.64  0.67
      6            0.69  0.64  0.66
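The F column in these results is consistent with the usual harmonic mean of precision and recall. The short Python sketch below only redoes that final arithmetic on the development rows; it does not perform the graph matching that Smatch itself carries out, and the function name is hypothetical. Because the reported P and R are already rounded, recomputed values can differ from a published F in the last digit.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R) pairs copied from the development results.
dev_rows = {
    "Soft": (0.55, 0.51),
    "Soft+feats": (0.69, 0.63),
    "Hard+feats": (0.70, 0.64),
}

for model, (p, r) in dev_rows.items():
    print(f"{model}: F = {f_score(p, r):.2f}")
```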

  26. Main results

      Model                               P     R     F
      Buys and Blunsom (2017)             --    --    0.60
      Konstas et al. (2017)               0.60  0.65  0.62
      Ballesteros and Al-Onaizan (2017)   --    --    0.64
      Damonte et al. (2016)               --    --    0.64
      Wang et al. (2015a)                 0.70  0.63  0.66
      Flanigan et al. (2016)              0.70  0.65  0.67
      Wang and Xue (2017)                 0.72  0.65  0.68
      Lyu and Titov (2018)                --    --    0.74
      Soft+feats                          0.68  0.63  0.65
      Hard+feats                          0.69  0.64  0.66

  27. Accuracy on reentrancies

      Model                   P     R     F
      Peng et al. (2018)      0.44  0.28  0.34
      Damonte et al. (2017)   --    --    0.41
      JAMR                    0.47  0.38  0.42
      Hard+feats (ours)       0.58  0.34  0.43

  28. Reentrancy example § Sentence: “I have no desire to live in any city.” [Figure: three output graphs (JAMR output, Peng et al. (2018) output, our hard attention output) over the nodes i, -, desire-01, live-01, any and city, with edge labels ARG0, ARG1, polarity, mod and location.]
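A reentrancy is a node with more than one incoming edge. As a rough illustration, the Python sketch below finds such nodes in a list of (source, label, target) triples. The triples encode the “John wants to go” graph from slide 2 as read from its edge labels, which is an assumption about the figure, and the function name is hypothetical.

```python
from collections import Counter


def reentrant_nodes(triples):
    """Return the nodes that are the target of more than one edge."""
    incoming = Counter(target for _source, _label, target in triples)
    return sorted(node for node, count in incoming.items() if count > 1)


# Assumed reading of the slide-2 graph for "John wants to go".
john_wants_to_go = [
    ("want-01", "ARG0", "boy"),
    ("want-01", "ARG1", "go-01"),
    ("go-01", "ARG0", "boy"),
]

print(reentrant_nodes(john_wants_to_go))  # prints ['boy']
```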

  29. Conclusion § The cache transition system is based on a mathematically sound formalism for parsing to graphs. § The cache transition process can be modeled well by sequence-to-sequence models. § Features from transition states. § Monotonic hard attention.

  30. Thank you for listening! Questions?
