Neural AMR: Sequence-to-Sequence Models for Parsing and Generation



slide-1
SLIDE 1

Neural AMR: Sequence-to-Sequence Models for Parsing and Generation

Ioannis Konstas

joint work with Srinivasan Iyer, Mark Yatskar, Yejin Choi, Luke Zettlemoyer

slide-2
SLIDE 2

[Diagram: Generate from AMR: an attention-based Encoder-Decoder maps an AMR graph to text.]

slide-5
SLIDE 5

[Diagram: two attention-based Encoder-Decoders: Generate from AMR (graph -> text) and Parse to AMR (text -> graph), trained with Paired Training to state-of-the-art (SOTA) results.]

slide-6
SLIDE 6

Abstract Meaning Representation

(Banarescu et al., 2013)

[AMR graph: know -ARG0-> I, -ARG1-> planet; planet -ARG1-of-> inhabit; inhabit -ARG0-> man; man -mod-> lazy]

I have known a planet that was inhabited by a lazy man.

  • Rooted Directed Acyclic Graph
  • Nodes: concepts (nouns, verbs, named entities, etc)
  • Edges: Semantic Role Labels
slide-10
SLIDE 10

Abstract Meaning Representation

(Banarescu et al., 2013)

Input: AMR Graph

[AMR graph: know -ARG0-> I, -ARG1-> planet; planet -ARG1-of-> inhabit; inhabit -ARG0-> man; man -mod-> lazy]

Generate from AMR; possible outputs: "I have known a planet that was inhabited by a lazy man." / "I knew a planet that was inhabited by a lazy man." / "I know a planet. It is inhabited by a lazy man."

  • Rooted Directed Acyclic Graph
  • Nodes: concepts (nouns, verbs, named entities, etc.)
  • Edges: Semantic Role Labels
slide-11
SLIDE 11

Abstract Meaning Representation

(Banarescu et al., 2013)

Input: Text: "I have known a planet that was inhabited by a lazy man."

Parse to AMR:

[AMR graph: know -ARG0-> I, -ARG1-> planet; planet -ARG1-of-> inhabit; inhabit -ARG0-> man; man -mod-> lazy]

  • Rooted Directed Acyclic Graph
  • Nodes: concepts (nouns, verbs, named entities, etc.)
  • Edges: Semantic Role Labels
slide-12
SLIDE 12

Applications

  • Text Summarization (Liu et al., 2015)
slide-16
SLIDE 16

Applications

  • Text Summarization (Liu et al., 2015): parse each input sentence to an AMR graph, combine them into a summary AMR graph, then generate the summary text.

slide-20
SLIDE 20

Applications

  • Text Summarization (Liu et al., 2015): parse each input sentence to an AMR graph, combine them into a summary AMR graph, then generate the summary text.
  • Machine Translation (Jones et al., 2012): parse the source sentence to an AMR graph, apply a graph-to-graph transformation, then generate the translation.

Source: その うそ は 子供たち が つい た (sono uso-wa kodomo-tachi-ga tsui-ta; that lie-TOP child-and-others-NOM breathe-out-PAST)
Target: The children told that lie

[AMR graphs: target side tell -ARG0-> child, -ARG1-> lie (that); source side tsuku, kodomo-tachi, sono with the corresponding ARG0/ARG1 edges]

slide-22
SLIDE 22

Existing Approaches

Generate from AMR
  • MT-based: Flanigan et al. 2016; Pourdamaghani and Knight 2016; Song et al. 2016
  • Grammar-based: Lampouras and Vlachos 2017; Mille et al. 2017

Parse to AMR
  • Alignment-based: Flanigan et al. 2014, 2017 (JAMR)
  • Grammar-based: Wang et al. 2016 (CAMR); Pust et al. 2015; Artzi et al. 2015; Damonte et al. 2017; Goodman et al. 2016; Puzikov et al. 2016; Brandt et al. 2017; Nguyen et al. 2017
  • Neural-based: Barzdins and Gosko 2016; Peng et al. 2017; van Noord and Bos 2017; Buys and Blunsom 2017

slide-23
SLIDE 23

Overview

  • Sequence-to-sequence architecture
  • End-to-end model without intermediate representations
  • Linearization of AMR graph to string
  • Pre-processing
  • Paired Training
  • Scalable data augmentation algorithm
slide-27
SLIDE 27

Sequence-to-sequence model

Attention Encoder-Decoder

Input (linearized AMR): know ARG0 I ARG1 ( planet ARG1-of inhabit …

Encoder states: h1(s) … h5(s), each the concatenation of forward and backward RNN states

Decoder prefix: <s> I know the planet …

Candidate output words per step: {I, The, A, …} {know, knew, planet, …} {a, planet, man, …} {inhabit, inhabited, was, …}

ŵ = argmax_w Π_i p(wi | w<i, h(s))
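The decoding objective above can be made concrete with a toy model. The vocabulary and the first-order probability table below are invented stand-ins for the softmax the real decoder computes from the encoder states h(s); they only illustrate the argmax over the product of per-step conditionals:

```python
import itertools

# Toy stand-in for the decoder distribution p(w_i | w_<i, h^(s)):
# a first-order table conditioned on the previous token only.
VOCAB = ["I", "know", "the", "planet", "</s>"]

TABLE = {
    "<s>":    [0.70, 0.10, 0.10, 0.05, 0.05],
    "I":      [0.05, 0.80, 0.05, 0.05, 0.05],
    "know":   [0.05, 0.05, 0.60, 0.20, 0.10],
    "the":    [0.05, 0.05, 0.05, 0.80, 0.05],
    "planet": [0.02, 0.02, 0.02, 0.02, 0.92],
    "</s>":   [0.00, 0.00, 0.00, 0.00, 1.00],
}

def sequence_prob(words):
    """Product over i of p(w_i | w_{i-1}), the chain inside the argmax."""
    p, prev = 1.0, "<s>"
    for w in words:
        p *= TABLE[prev][VOCAB.index(w)]
        prev = w
    return p

# Exact argmax over all length-5 sequences that end with </s>.
best = max(
    (s for s in itertools.product(VOCAB, repeat=5) if s[-1] == "</s>"),
    key=sequence_prob,
)
print(" ".join(best))  # I know the planet </s>
```

In practice the search space is too large to enumerate, which is why the decoder uses beam search (see the Decoding slides).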

slide-29
SLIDE 29

Linearization

Graph —> Depth First Search (Human-authored annotation)

hold :ARG0 (person :ARG0-of (have-role :ARG1 United_States :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity 2002 1) :location New_York

hold person meet group ARG0 ARG1 person expert ARG1-of have-role country “United States”

  • fficial

date-entity city “New York” 2002 1 time location name ARG1 name ARG2-of ARG0-of ARG2 year month ARG0

US officials held an expert group meeting in January 2002 in New York .
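The depth-first linearization above can be sketched as a small recursive function. The dict-based graph encoding and the pruned example graph are illustrative assumptions, not the format used in the talk:

```python
# Minimal sketch of depth-first linearization of an AMR-like graph.
# Nodes with outgoing edges are parenthesized subtrees; leaves are emitted inline.
def linearize(node, graph):
    """Return a DFS linearization: concept, then ':role (child ...)' per edge."""
    parts = [node]
    for role, child in graph.get(node, []):
        if child in graph:                      # child has outgoing edges
            parts.append(f":{role} ( {linearize(child, graph)} )")
        else:                                   # leaf concept
            parts.append(f":{role} {child}")
    return " ".join(parts)

# A pruned version of the talk's example graph (edge order follows the annotation).
graph = {
    "hold": [("ARG0", "person"), ("ARG1", "meet"), ("location", "New_York")],
    "person": [("ARG0-of", "have-role")],
    "have-role": [("ARG1", "United_States"), ("ARG2", "official")],
    "meet": [("ARG1-of", "expert")],
}
print(linearize("hold", graph))
```

The edge order is taken from the human-authored annotation, so the same graph always yields the same string.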

slide-38
SLIDE 38

Pre-processing

Linearization -> Anonymization

hold :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity year_0 month_0) :location loc_1

Anonymized spans: country "United States" -> loc_0; city "New York" -> loc_1; 2002 -> year_0; 1 -> month_0

US officials held an expert group meeting in January 2002 in New York .
-> loc_0 officials held an expert group meeting in month_0 year_0 in loc_1 .
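The anonymization step might be sketched as follows. The hand-supplied entity-to-placeholder list is an assumption for illustration; the real pipeline derives these spans from the graph and alignments, and restores them after generation:

```python
# Sketch of anonymization: replace sparse named entities and dates with typed
# placeholders, keeping a map so the surface forms can be restored afterwards.
def anonymize(sentence, entities):
    """entities: list of (surface_form, placeholder) pairs."""
    mapping = {}
    for surface, placeholder in entities:
        if surface in sentence:
            sentence = sentence.replace(surface, placeholder)
            mapping[placeholder] = surface
    return sentence, mapping

def deanonymize(sentence, mapping):
    """Restore the original surface forms after generation."""
    for placeholder, surface in mapping.items():
        sentence = sentence.replace(placeholder, surface)
    return sentence

sent = "US officials held an expert group meeting in January 2002 in New York ."
entities = [("US", "loc_0"), ("New York", "loc_1"),
            ("January", "month_0"), ("2002", "year_0")]
anon, mapping = anonymize(sent, entities)
print(anon)
print(deanonymize(anon, mapping))
```

This shrinks the open vocabulary the model must learn: every date and location collapses onto a handful of placeholder types.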

slide-39
SLIDE 39

Experimental Setup

AMR LDC2015E86 (SemEval-2016 Task 8)

  • Hand-annotated AMR graphs: newswire, forums
  • ~16k training / 1k development / 1k test pairs

Train

  • Optimize cross-entropy loss

Evaluation

  • BLEU n-gram precision (Generation)

(Papineni et al., 2002)

  • SMATCH score (Parsing)

(Cai and Knight, 2013)

slide-40
SLIDE 40

Experiments

  • Vanilla experiment
  • Limited Language Model Capacity
  • Paired Training
  • Data augmentation algorithm
slide-44
SLIDE 44

First Attempt (Generation)

BLEU scores: TreeToStr 23.0, TSP 22.4, PBMT 26.9, NeuralAMR 22.0

TreeToStr: Flanigan et al., NAACL 2016; TSP: Song et al., EMNLP 2016; PBMT: Pourdamaghani and Knight, INLG 2016

All competing systems use a Language Model trained on a very large corpus. We will emulate this via data augmentation (Sennrich et al., ACL 2016).

slide-49
SLIDE 49

What went wrong?

hold :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity year_0 month_0) :location loc_1

Reference: US officials held an expert group meeting in January 2002 in New York .
Prediction: United States officials held held a meeting in January 2002 .

Errors: Repetition, Coverage

a) Sparsity [bar chart: token counts for Total, OOV@1 (44.26%), OOV@5 (74.85%)]
b) Avg sent length: 20 words
c) Limited Language Modeling capacity

slide-52
SLIDE 52

Data Augmentation

Original Dataset: ~16k graph-sentence pairs
Gigaword: ~183M sentences (text *only*, no graphs)

Sample sentences with vocabulary overlap.

[Bar chart: OOV@1 and OOV@5 rates (%) for Original, Giga-200k, Giga-2M, Giga-20M]
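Sampling by vocabulary overlap could look roughly like this. The overlap measure, threshold, toy vocabulary, and example sentences are all illustrative assumptions:

```python
# Sketch: keep an external (e.g. Gigaword) sentence only if most of its
# word types already appear in the original training vocabulary, so the
# augmented data does not flood the model with unseen tokens.
def overlap(sentence, vocab):
    """Fraction of the sentence's tokens that are in-vocabulary."""
    words = sentence.lower().split()
    return sum(w in vocab for w in words) / len(words)

def sample_by_overlap(sentences, vocab, threshold=0.9):
    return [s for s in sentences if overlap(s, vocab) >= threshold]

train_vocab = {"us", "officials", "held", "a", "meeting", "in", "new", "york"}
giga = [
    "officials held a meeting in new york",       # full overlap: kept
    "quarterly earnings beat analyst forecasts",  # no overlap: dropped
]
print(sample_by_overlap(giga, train_vocab))
```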

slide-56
SLIDE 56

Data Augmentation

[Diagram: Parse to AMR (text -> graph) labels the sampled sentences; Generate from AMR (graph -> text) is then re-trained on the resulting pairs.]

slide-57
SLIDE 57

Semi-supervised Learning

  • Self-training
  • McClosky et al. 2006
  • Co-training
  • Yarowsky 1995, Blum and Mitchell 1998, Sarkar 2001
  • Søgaard and Rishøj, 2010
slide-65
SLIDE 65

Paired Training

Train AMR Parser P on Original Dataset (graph, text) pairs

for i = 0 … N:
    Si = sample k * 10^i sentences from Gigaword
    Parse Si sentences with P
    Re-train AMR Parser P on Si (self-train Parser)

Train Generator G on SN (graph, text) pairs
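The loop above, written out as pseudocode-style Python. The callables `train`, `fine_tune`, `parse`, and `sample_gigaword` are hypothetical stand-ins for the real training, fine-tuning, parsing, and sampling routines:

```python
# Paired Training sketch: self-train the parser on growing samples of
# unlabeled text, then train the generator on the final parsed corpus.
def paired_training(original_pairs, k, N, train, fine_tune, parse, sample_gigaword):
    parser = train(original_pairs)                  # train parser P on gold data
    corpus = None
    for i in range(N + 1):
        sentences = sample_gigaword(k * 10 ** i)    # S_i = k * 10^i sentences
        corpus = [(parse(parser, s), s) for s in sentences]
        parser = train(corpus)                      # re-train P on S_i
        parser = fine_tune(parser, original_pairs)  # fine-tune on gold data
    generator = train(corpus)                       # train generator G on S_N
    generator = fine_tune(generator, original_pairs)
    return parser, generator
```

Each round multiplies the sample size by 10, so the parser sees progressively larger silver-standard corpora while the gold data anchors it via fine-tuning.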

slide-71
SLIDE 71

Training AMR Parser

Train P on Original Dataset
Sample S1 = 200k sentences from Gigaword
Parse S1 with P
Train P on S1 = 200k
Fine-tune P on Original Dataset

Fine-tune: init parameters from previous step and train on Original Dataset

slide-73
SLIDE 73

Training AMR Parser

Sample S2 = 2M sentences from Gigaword
Parse S2 with P
Train P on S2 = 2M
Fine-tune P on Original Dataset

Fine-tune: init parameters from previous step and train on Original Dataset

slide-76
SLIDE 76

Training AMR Parser

Sample S3 = 20M sentences from Gigaword
Parse S3 with P
Train P on S3 = 20M
Fine-tune P on Original Dataset

Fine-tune: init parameters from previous step and train on Original Dataset

slide-77
SLIDE 77

Training AMR Generator

Sample S4 = 20M sentences from Gigaword
Parse S4 with P
Train G on S4 = 20M
Fine-tune G on Original Dataset

Fine-tune: init parameters from previous step and train on Original Dataset

slide-84
SLIDE 84

Final Results (Generation)

BLEU scores: TreeToStr 23.0, TSP 22.4, PBMT 26.9, NeuralAMR 22.0, NeuralAMR-200k 27.4, NeuralAMR-2M 32.3, NeuralAMR-20M 33.8

TreeToStr: Flanigan et al., NAACL 2016; TSP: Song et al., EMNLP 2016; PBMT: Pourdamaghani and Knight, INLG 2016

slide-88
SLIDE 88

Final Results (Parsing)

SMATCH scores: SBMT 67.1, CharLSTM+CAMR 67.3, Seq2Seq 52.0, NeuralAMR-20M 62.1

SBMT: Pust et al., 2015; CharLSTM+CAMR: van Noord and Bos, 2017; Seq2Seq: Peng et al., 2017

slide-93
SLIDE 93

How did we do? (Generation)

hold :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity year_0 month_0) :location loc_1

Reference: US officials held an expert group meeting in January 2002 in New York .
Prediction: In January 2002 United States officials held a meeting of the group experts in New York .

Reference: The report stated British government must help to stabilize weak states and push for international regulations that would stop terrorists using freely available information to create and unleash new forms of biological warfare such as a modified version of the influenza virus.
Prediction: The report stated that the Britain government must help stabilize the weak states and push international regulations to stop the use of freely available information to create a form of new biological warfare such as the modified version of the influenza .

Errors: Disfluency, Coverage

slide-94
SLIDE 94

Summary

  • Sequence-to-sequence models for Parsing and Generation
  • Paired Training: scalable data augmentation algorithm
  • Achieve state-of-the-art performance on generating from AMR
  • Best-performing Neural AMR Parser
  • Demo, Code and Pre-trained Models: http://ikonstas.net
slide-95
SLIDE 95

Thank You

[AMR: thank-01 -ARG1-> you]

slide-96
SLIDE 96

Bonus Slides

slide-103
SLIDE 103

Encoding

Linearize -> RNN encoding

hold :ARG0 (person :ARG0-of (have-role :ARG1 United_States :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity 2002 1) :location New_York

Input tokens: hold ARG0 ( person ARG0-of …
Hidden states: h1(s) … h5(s), each concatenating the forward and backward RNN states

  • Token embeddings
  • Recurrent Neural Network (RNN)
  • Bi-directional RNN
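A minimal sketch of the bidirectional encoding above. The vanilla RNN cell, toy dimensions, and random (untrained) weights are illustrative assumptions; the real encoder is trained end-to-end:

```python
import numpy as np

# Embed each linearized AMR token, run a vanilla RNN left-to-right and
# right-to-left, and concatenate the two hidden states per position.
rng = np.random.default_rng(0)
tokens = ["hold", "ARG0", "(", "person", "ARG0-of"]
vocab = {w: i for i, w in enumerate(tokens)}
d_emb, d_hid = 4, 3

E = rng.normal(size=(len(vocab), d_emb))   # token embedding table
Wx = rng.normal(size=(d_hid, d_emb))       # input-to-hidden weights
Wh = rng.normal(size=(d_hid, d_hid))       # hidden-to-hidden weights

def run_rnn(seq):
    """Vanilla RNN: h_t = tanh(Wx x_t + Wh h_{t-1})."""
    h, states = np.zeros(d_hid), []
    for x in seq:
        h = np.tanh(Wx @ x + Wh @ h)
        states.append(h)
    return states

embedded = [E[vocab[w]] for w in tokens]
fwd = run_rnn(embedded)                    # left-to-right pass
bwd = run_rnn(embedded[::-1])[::-1]        # right-to-left pass, re-aligned
h_s = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]  # h_i(s) = [fwd; bwd]
print(len(h_s), h_s[0].shape)
```

Each h_i(s) thus summarizes the linearization both before and after position i, which the attention mechanism later reads.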
slide-109
SLIDE 109

Decoding

RNN Encoding -> RNN Decoding (Beam search)

  • init decoder from h(s); hidden states h1, h2, h3, …, hk
  • softmax gives p(wi | w<i, h(s)) at each step

Beam hypotheses:
  Step 1 (candidates: Holding, Held, US, …): w11: Holding, w12: Helds, w13: Hold, w14: US
  Step 2 (candidates: a, the, meeting, …): w21: Hold a, w22: Hold the, w23: Held a, w24: Held the
  Step 3 candidates: US, person, expert, …
  …
  Step k (candidates: meeting, meetings, meet, …): wk1: The US officials held, wk2: US officials held a, wk3: US officials hold the, wk4: US officials will hold a
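Beam search over toy per-step distributions can be sketched as below. The probability table is an invented stand-in for the decoder softmax, and the beam size is a toy choice:

```python
import math

# Beam search: keep the beam_size highest-scoring partial hypotheses,
# extend each with every candidate next word, re-rank, and prune.
def step_probs(prev):
    """Toy next-token distributions, a stand-in for p(w_i | w_<i, h^(s))."""
    table = {
        "<s>":  {"Hold": 0.4, "Held": 0.35, "US": 0.25},
        "Hold": {"a": 0.5, "the": 0.5},
        "Held": {"a": 0.6, "the": 0.4},
        "US":   {"a": 0.5, "the": 0.5},
        "a":    {"meeting": 1.0},
        "the":  {"meeting": 1.0},
    }
    return table[prev]

def beam_search(steps, beam_size=2):
    beams = [([], 0.0)]                      # (tokens, log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            prev = tokens[-1] if tokens else "<s>"
            for w, p in step_probs(prev).items():
                candidates.append((tokens + [w], score + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams

for tokens, score in beam_search(steps=3):
    print(" ".join(tokens), round(math.exp(score), 3))
```

Note the classic beam-search behavior: "Hold" wins step 1 locally, but "Held a" overtakes it once the second step's probabilities are folded in.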
slide-113
SLIDE 113

Attention

[Attention alignment between the linearized AMR "hold ARG0 ( person role US official ) ARG1 ( meet expert group )" and the output "US officials held an expert group meeting in January 2002"]

At decoder step 3 (after w2: held; candidates a, the, meeting, …), with encoder states h1(s) … h5(s) over "hold ARG0 ( person ARG0-of" and context vector c3:

ci = Σ_j aij hj(s)
ai = softmax_j f(hj(s), hi)
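The two equations above, computed with NumPy. Here the score function f is a plain dot product (one common choice; the talk leaves f abstract), and the encoder and decoder states are random toy values:

```python
import numpy as np

# Attention step: a_i = softmax_j f(h_j^(s), h_i), then c_i = sum_j a_ij h_j^(s).
rng = np.random.default_rng(1)
H_s = rng.normal(size=(5, 4))        # encoder states h_1^(s) .. h_5^(s)
h_i = rng.normal(size=4)             # current decoder state

scores = H_s @ h_i                   # f(h_j^(s), h_i) as a dot product
a_i = np.exp(scores - scores.max())  # subtract max for numerical stability
a_i /= a_i.sum()                     # softmax over source positions
c_i = a_i @ H_s                      # context vector: weighted sum of states

print(a_i.round(3), c_i.shape)
```

The weights a_i form a distribution over source positions, which is exactly what the alignment heatmap on this slide visualizes.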