Sequence-to-sequence Models for Cache Transition Systems - PowerPoint PPT Presentation



SLIDE 1

Sequence-to-sequence Models for Cache Transition Systems

Xiaochang Peng1, Linfeng Song1, Daniel Gildea1 and Giorgio Satta2
1 University of Rochester   2 University of Padua

SLIDE 2

AMR

§ “John wants to go”

(Figure: AMR graph with edges want-01 --ARG0--> boy, want-01 --ARG1--> go-01, go-01 --ARG0--> boy.)

SLIDE 3

AMR

(Figure: a large AMR graph for the sentence below, with nodes such as believe-01, formulate-01, invent-01, compete-01, capable-41, and have-org-role-91, and reentrant variables p1 and p2.)

After its competitor invented the front loading washing machine, the CEO of the American IM company believed that each of its employees had the ability for innovation, and formulated strategic countermeasures for innovation in the industry.

SLIDE 4

Transition-based AMR parsing

§ There has been previous work on transition-based graph parsing (Sagae and Tsujii; Damonte et al.; Zhou et al.; Ribeyre et al.; Wang et al.). § Our work introduces a new data structure, the cache, for generating graphs of bounded treewidth.

SLIDE 5

Introduction to treewidth

(Figure: an example graph on vertices A through S with treewidth 2.)

A complete graph of N nodes has treewidth N-1; a tree has treewidth 1.

SLIDE 6

Introduction to treewidth

AMR graphs tend to have small treewidth, about 2.8 on average, although some graphs are much larger.

(Figure: the small AMR graph for "John wants to go" contrasted with the large AMR graph from the previous slide.)
SLIDE 7

Tree decomposition

(Figure: the graph on vertices A through S from slide 5 and its tree decomposition; each bag holds up to three vertices, e.g. {A, L, B}, {L, B, R}, {B, R, D}, witnessing treewidth 2.)
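The three conditions a tree decomposition must satisfy can be checked mechanically: every vertex appears in some bag, every graph edge is contained in some bag, and the bags holding any given vertex form a connected subtree (the running-intersection property). A minimal sketch (the function and example names are my own, not from the talk), using bags of size at most 3 to witness treewidth at most 2:

```python
def is_valid_decomposition(graph_edges, bags, tree_edges):
    nodes = {v for e in graph_edges for v in e}
    # 1. Every vertex appears in some bag.
    if not all(any(v in bag for bag in bags) for v in nodes):
        return False
    # 2. Every graph edge is contained in some bag.
    if not all(any(u in bag and w in bag for bag in bags) for u, w in graph_edges):
        return False
    # 3. Running intersection: the bags containing each vertex
    #    form a connected subtree of the decomposition tree.
    for v in nodes:
        holding = [i for i, bag in enumerate(bags) if v in bag]
        seen, frontier = {holding[0]}, [holding[0]]
        while frontier:                      # BFS restricted to bags holding v
            i = frontier.pop()
            for a, b in tree_edges:
                for x, y in ((a, b), (b, a)):
                    if x == i and y in holding and y not in seen:
                        seen.add(y)
                        frontier.append(y)
        if seen != set(holding):
            return False
    return True

# Triangle plus a pendant vertex: treewidth 2.
graph = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
bags = [{"A", "B", "C"}, {"C", "D"}]
tree = [(0, 1)]
print(is_valid_decomposition(graph, bags, tree))   # True
width = max(len(b) for b in bags) - 1              # treewidth bound: 2
```

The treewidth of a graph is the minimum, over all valid decompositions, of the largest bag size minus one.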

SLIDE 8

Cache transition system

§ Configuration c = (σ, η, β, E)

§ Stack σ: place for temporarily storing concepts § Cache η: working zone for making edges; fixed size, corresponding to the treewidth § Buffer β: unprocessed concepts § E: set of already-built edges
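The configuration (stack, cache, buffer, edge set) can be sketched as a small data structure. This is my own illustration, not the authors' code; "$" marks an empty cache cell, as in the slides:

```python
from dataclasses import dataclass, field

@dataclass
class Configuration:
    cache_size: int = 3
    stack: list = field(default_factory=list)    # (concept, cache_position) pairs
    cache: list = field(default_factory=list)    # fixed-size working zone
    buffer: list = field(default_factory=list)   # unprocessed concepts
    edges: set = field(default_factory=set)      # (head, label, dependent) triples

    def __post_init__(self):
        if not self.cache:                       # start with an empty cache
            self.cache = ["$"] * self.cache_size

c = Configuration(buffer=["PER", "want-01", "go-01"])
print(c.cache)    # ['$', '$', '$']
```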

SLIDE 9

Cache transition system

§ Actions

§ SHIFT PUSH(i): push the concept at cache position i onto the stack (recording the position i), then shift the next concept from the buffer into the right-most cache position.

    stack: []          cache: [$, $, $]      buffer: [PER, want-01, go-01]
        -- SHIFT PUSH(1) -->
    stack: [($, 1)]    cache: [$, $, PER]    buffer: [want-01, go-01]

SLIDE 10

Cache transition system

§ Actions

§ POP: drop the right-most concept from the cache (all of its edges have been built), then pop the top (concept, position) pair from the stack and restore the concept to its recorded cache position.

    stack: [($, 1)]    cache: [$, $, PER]    buffer: [want-01, go-01]
        -- POP -->
    stack: []          cache: [$, $, $]      buffer: [want-01, go-01]

SLIDE 11

Cache transition system

§ Actions

§ Arc(i, l, d): build an arc with direction d and label l between the right-most cache node and the node at cache position i. Arc(i, -, -) represents no edge between them.

    stack: [($, 1), ($, 1)]    cache: [$, PER, want-01]    buffer: [go-01]
        -- Arc(1, -, -), Arc(2, L, ARG0) -->
    edge built: want-01 --ARG0--> PER

SLIDE 12

Example of cache transition

Action taken: Initialization

    stack: []    cache: [$, $, $]    buffer: [PER, want-01, go-01]

SLIDE 13

Example of cache transition

Action taken: SHIFT, PUSH(1)

    stack: [($, 1)]    cache: [$, $, PER]    buffer: [want-01, go-01]

Hypothesis: PER

SLIDE 14

Example of cache transition

Action taken: Arc(1, -, -), Arc(2, -, -)

    stack: [($, 1)]    cache: [$, $, PER]    buffer: [want-01, go-01]

Hypothesis: PER (no edges built)

SLIDE 15

Example of cache transition

Action taken: SHIFT, PUSH(1)

    stack: [($, 1), ($, 1)]    cache: [$, PER, want-01]    buffer: [go-01]

Hypothesis: PER, want-01

SLIDE 16

Example of cache transition

Action taken: Arc(1, -, -), Arc(2, L, ARG0)

    stack: [($, 1), ($, 1)]    cache: [$, PER, want-01]    buffer: [go-01]

Hypothesis: want-01 --ARG0--> PER

SLIDE 17

Example of cache transition

Action taken: SHIFT, PUSH(1)

    stack: [($, 1), ($, 1), ($, 1)]    cache: [PER, want-01, go-01]    buffer: []

Hypothesis: want-01 --ARG0--> PER

SLIDE 18

Example of cache transition

Action taken: Arc(1, L, ARG0), Arc(2, R, ARG1)

    stack: [($, 1), ($, 1), ($, 1)]    cache: [PER, want-01, go-01]    buffer: []

Hypothesis: want-01 --ARG0--> PER, want-01 --ARG1--> go-01, go-01 --ARG0--> PER

SLIDE 19

Example of cache transition

Action taken: POP, POP, POP

    stack: []    cache: [$, $, $]    buffer: []

Final graph: want-01 --ARG0--> PER, want-01 --ARG1--> go-01, go-01 --ARG0--> PER
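The whole derivation for "John wants to go" can be replayed mechanically. A runnable sketch (class and method names are mine, not the authors' code), with 1-based cache positions as in the slides and "$" marking an empty cell:

```python
class CacheTransitionParser:
    def __init__(self, concepts, cache_size=3):
        self.stack = []                     # (concept, cache_position) pairs
        self.cache = ["$"] * cache_size     # "$" marks an empty cell
        self.buffer = list(concepts)
        self.edges = set()                  # (head, label, dependent) triples

    def shift_push(self, i):
        # Push cache position i (1-based) onto the stack, then shift
        # the next buffer concept into the right-most cache position.
        self.stack.append((self.cache.pop(i - 1), i))
        self.cache.append(self.buffer.pop(0))

    def arc(self, i, direction, label):
        # Build an edge between the right-most cache node and cache
        # position i; "L" means the right-most node is the head.
        right, other = self.cache[-1], self.cache[i - 1]
        if direction == "L":
            self.edges.add((right, label, other))
        elif direction == "R":
            self.edges.add((other, label, right))
        # direction "-" builds nothing (Arc(i, -, -))

    def pop(self):
        # Drop the finished right-most cache concept, then restore the
        # stack top to its recorded cache position.
        self.cache.pop()
        concept, i = self.stack.pop()
        self.cache.insert(i - 1, concept)

p = CacheTransitionParser(["PER", "want-01", "go-01"])
p.shift_push(1)                          # cache: ['$', '$', 'PER']
p.arc(1, "-", "-"); p.arc(2, "-", "-")
p.shift_push(1)                          # cache: ['$', 'PER', 'want-01']
p.arc(1, "-", "-"); p.arc(2, "L", "ARG0")
p.shift_push(1)                          # cache: ['PER', 'want-01', 'go-01']
p.arc(1, "L", "ARG0"); p.arc(2, "R", "ARG1")
p.pop(); p.pop(); p.pop()                # cache back to ['$', '$', '$']
print(sorted(p.edges))
```

Running this reproduces the final hypothesis above: three edges, including the reentrant ARG0 edges into PER.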

SLIDE 20

Sequence to sequence models for cache transition system

§ Concepts are generated from the input sentence by a separate classifier in a preprocessing step. § Two separate encoders are used, one for the input sentence and one for the concept sequence. § A single decoder generates the transition actions.

SLIDE 21

Seq2seq (soft-attention+features)

(Figure: soft-attention model. One encoder reads the input sequence "John wants to go", another reads the concept sequence "PER want-01 go-01"; the decoder soft-attends over both and emits actions such as SHIFT, PushIndex(1), SHIFT, ...)

SLIDE 22

Seq2seq (hard-attention+features)

(Figure: hard-attention model over the same input and concept sequences; the decoder attends to a single position that advances monotonically while emitting actions such as SHIFT, PushIndex(1), NOARC, ARC L-ARG0, ...)
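The idea behind monotonic hard attention can be sketched in a few lines: the decoder attends to exactly one concept at a time, and the attention pointer only ever moves right, advancing when a SHIFT consumes a concept. This is my own simplified illustration of the alignment, not the authors' implementation:

```python
def hard_attention_alignment(actions, concepts):
    pointer = 0
    aligned = []
    for action in actions:
        # Attend to the current concept (clamped at the last one).
        aligned.append((action, concepts[min(pointer, len(concepts) - 1)]))
        if action == "SHIFT":
            pointer += 1          # monotonic: the pointer never moves left
    return aligned

actions = ["SHIFT", "PushIndex(1)", "NOARC", "SHIFT", "PushIndex(1)",
           "NOARC", "ARC L-ARG0"]
concepts = ["PER", "want-01", "go-01"]
for act, att in hard_attention_alignment(actions, concepts):
    print(act, "->", att)
```

Because the pointer is deterministic given the action sequence, the model never has to learn where to attend, which is what makes the hard-attention variant attractive here.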

SLIDE 23

Experiments

§ Dataset: LDC2015E86

§ 16,833(train)/1,368(dev)/1,371(test)

§ Evaluation: Smatch (Cai and Knight, 2013)
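Smatch scores a predicted AMR against the gold AMR as precision, recall, and F1 over matched triples. A deliberately simplified sketch (assuming a fixed variable alignment; real Smatch hill-climbs over alignments between the two graphs' variables):

```python
def triple_prf(predicted, gold):
    # Precision/recall/F1 over (head, label, dependent) triples.
    matched = len(predicted & gold)
    p = matched / len(predicted) if predicted else 0.0
    r = matched / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

gold = {("want-01", "ARG0", "PER"), ("want-01", "ARG1", "go-01"),
        ("go-01", "ARG0", "PER")}
pred = {("want-01", "ARG0", "PER"), ("want-01", "ARG1", "go-01")}
print(triple_prf(pred, gold))    # P = 1.0, R = 2/3, F = 0.8
```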

SLIDE 24

AMR Coverage with different cache sizes

(Figure: histogram of AMR graphs by the minimum cache size required, from 1 to >=8; successive small cache sizes cover 91%, 97%, and 99% of the graphs.)

SLIDE 25

Development results

Impact of various components:

    Model        P     R     F
    Soft         0.55  0.51  0.53
    Soft+feats   0.69  0.63  0.66
    Hard+feats   0.70  0.64  0.67

Impact of cache size:

    Cache size   P     R     F
    4            0.69  0.63  0.66
    5            0.70  0.64  0.67
    6            0.69  0.64  0.66

SLIDE 26

Main results

    Model                               P     R     F
    Buys and Blunsom (2017)             -     -     0.60
    Konstas et al. (2017)               0.60  0.65  0.62
    Ballesteros and Al-Onaizan (2017)   -     -     0.64
    Damonte et al. (2016)               -     -     0.64
    Wang et al. (2015a)                 0.70  0.63  0.66
    Flanigan et al. (2016)              0.70  0.65  0.67
    Wang and Xue (2017)                 0.72  0.65  0.68
    Lyu and Titov (2018)                -     -     0.74
    Soft+feats (ours)                   0.68  0.63  0.65
    Hard+feats (ours)                   0.69  0.64  0.66

SLIDE 27

Accuracy on reentrancies

    Model                    P     R     F
    Peng et al. (2018)       0.44  0.28  0.34
    Damonte et al. (2017)    -     -     0.41
    JAMR                     0.47  0.38  0.42
    Hard+feats (ours)        0.58  0.34  0.43

SLIDE 28

Reentrancy example

Sentence: "I have no desire to live in any city."

(Figure: three output graphs over the concepts i, desire-01, live-01, city, and any, with polarity, ARG1, location, and mod edges. Our hard-attention output recovers both reentrant ARG0 edges, from desire-01 and from live-01 back to i; the JAMR and Peng et al. (2018) outputs each miss at least one of these ARG0 edges.)

SLIDE 29

Conclusion

§ The cache transition system is based on a mathematically sound formalism for parsing sentences into graphs. § The cache transition process can be modeled well by sequence-to-sequence models, using:

§ Features from transition states. § Monotonic hard attention.

SLIDE 30

Thank you for listening! Questions?