Neural AMR: Sequence-to-Sequence Models for Parsing and Generation

Neural AMR: Sequence-to-Sequence Models for Parsing and Generation. Ioannis Konstas, joint work with Srinivasan Iyer, Mark Yatskar, Yejin Choi, and Luke Zettlemoyer. [Figure: two encoder-decoder models with attention: Parse to AMR (text → graph) and Generate from AMR (graph → text).]


  1–2. Linearization: Graph → Depth-First Search (human-authored annotation order). [Figure: AMR graph for the sentence, rooted at hold with edges ARG0, ARG1, time, and location.]
  Linearized form: hold :ARG0 (person :ARG0-of (have-role :ARG1 United_States :ARG2 official)) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group)) :time (date-entity 2002 1) :location New_York
  Sentence: US officials held an expert group meeting in January 2002 in New York.
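Conceptually, the linearization is just a depth-first traversal of the graph in the annotation's order. Below is a minimal illustrative sketch in Python, not the authors' released code; the Node structure is an assumption made for the example.

```python
# Minimal sketch of linearization: depth-first traversal of the AMR
# graph in the human-authored annotation order. The Node class is an
# illustrative stand-in, not the paper's released code.
from dataclasses import dataclass, field

@dataclass
class Node:
    concept: str                                  # e.g. "hold", "person"
    edges: list = field(default_factory=list)     # ordered (relation, Node) pairs

def linearize(node):
    """Return the depth-first linearization as a flat token list."""
    tokens = [node.concept]
    for rel, child in node.edges:
        sub = linearize(child)
        # parenthesize only subtrees, matching the format on the slide
        sub = ["("] + sub + [")"] if child.edges else sub
        tokens += ([rel] if rel else []) + sub
    return tokens

officials = Node("person", [(":ARG0-of", Node("have-role",
               [(":ARG1", Node("United_States")), (":ARG2", Node("official"))]))])
meeting = Node("meet", [(":ARG0", Node("person",
               [(":ARG1-of", Node("expert")), (":ARG2-of", Node("group"))]))])
root = Node("hold", [(":ARG0", officials), (":ARG1", meeting),
                     (":time", Node("date-entity",
                                    [("", Node("2002")), ("", Node("1"))])),
                     (":location", Node("New_York"))])
print(" ".join(linearize(root)))
# hold :ARG0 ( person :ARG0-of ( have-role :ARG1 United_States :ARG2 official ) ) ...
```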

  3–7. Pre-processing: Linearization → Anonymization. Named entities and dates are replaced with typed, indexed placeholder tokens: United_States → loc_0, New_York → loc_1, 2002 → year_0, 1 → month_0.
  Anonymized linearization: hold :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official)) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group)) :time (date-entity year_0 month_0) :location loc_1
  The target sentence is anonymized the same way: loc_0 officials held an expert group meeting in month_0 year_0 in loc_1. (A sketch of this step follows below.)
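A minimal sketch of the anonymization step, assuming a hand-built substitution table for this one example; the actual pipeline derives placeholders from the graph structure (name spans, date-entity fields) and restores the original values in the model's output afterwards.

```python
# Illustrative anonymization: map named-entity and date tokens in the
# linearized AMR (and the aligned sentence) to indexed placeholders.
# The table below is hand-built for the example; the real system reads
# these values off the graph and de-anonymizes at output time.
def anonymize(tokens, table):
    """Replace surface values with placeholder tokens."""
    return [table.get(tok, tok) for tok in tokens]

table = {"United_States": "loc_0", "New_York": "loc_1",
         "2002": "year_0", "1": "month_0"}
linearized = ("hold :ARG0 ( person :ARG0-of ( have-role :ARG1 United_States "
              ":ARG2 official ) ) :ARG1 ( meet :ARG0 ( person :ARG1-of expert "
              ":ARG2-of group ) ) :time ( date-entity 2002 1 ) :location New_York")
print(" ".join(anonymize(linearized.split(), table)))
sentence = "US officials held an expert group meeting in January 2002 in New York ."
# the sentence side is anonymized with the aligned spans, analogously
```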

  8. Experimental Setup. Data: AMR LDC2015E86 (SemEval-2016 Task 8) ‣ hand-annotated AMR graphs over newswire and forum text ‣ ~16k training / 1k development / 1k test pairs. Training ‣ optimize the cross-entropy loss. Evaluation ‣ BLEU n-gram precision for Generation (Papineni et al., 2002) ‣ SMATCH score for Parsing (Cai and Knight, 2013).
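For reference, corpus-level BLEU can be computed with NLTK as below. This is just one standard implementation of the metric; the slide does not specify the authors' exact scoring script.

```python
# Corpus-level BLEU with NLTK, using the running example from the talk.
# One standard implementation of the metric; the authors' scoring
# script may differ (e.g. Moses' multi-bleu.perl).
from nltk.translate.bleu_score import corpus_bleu

references = [[["US", "officials", "held", "an", "expert", "group", "meeting",
                "in", "January", "2002", "in", "New", "York", "."]]]
hypotheses = [["United", "States", "officials", "held", "a", "meeting",
               "in", "January", "2002", "."]]
print(corpus_bleu(references, hypotheses))  # n-gram precision + brevity penalty
```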

  9. Experiments ‣ Vanilla experiment ‣ Limited Language Model Capacity ‣ Paired Training ‣ Data augmentation algorithm

  10–13. First Attempt (Generation). BLEU:
  ‣ TreeToStr (Flanigan et al., NAACL 2016): 23.0
  ‣ TSP (Song et al., EMNLP 2016): 22.4
  ‣ PBMT (Pourdamaghani and Knight, INLG 2016): 26.9
  ‣ NeuralAMR (ours, vanilla): 22.0
  Note: all of the prior systems use a language model trained on a very large corpus. We will emulate this via data augmentation (Sennrich et al., ACL 2016).

  14–18. What went wrong?
  Reference: US officials held an expert group meeting in January 2002 in New York.
  Prediction: United States officials held held a meeting in January 2002.
  (Input: hold :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official)) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group)) :time (date-entity year_0 month_0) :location loc_1)
  Problems: ‣ repetition ("held held") ‣ coverage (the expert group and New York are dropped) ‣ a) sparsity: many token types are rare [figure: token counts, Total vs. OOV@1 vs. OOV@5; a counting sketch follows below] ‣ b) average sentence length: 20 words ‣ c) limited language-modeling capacity.
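Reading OOV@k as "token types seen at most k times in the training data" (an assumption about the chart's definition), the sparsity statistic can be counted like this:

```python
# Sketch of the sparsity statistic under the assumed OOV@k definition:
# count token types whose training frequency is at most k.
from collections import Counter

def oov_at_k(tokenized_sentences, k):
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    return sum(1 for c in counts.values() if c <= k)

corpus = [["US", "officials", "held", "a", "meeting"],
          ["officials", "met", "in", "New", "York"]]
print(oov_at_k(corpus, 1))  # types seen at most once
```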

  19–21. Data Augmentation. Original dataset: ~16k graph-sentence pairs. Gigaword: ~183M sentences, text *only* (no graphs). Sample sentences whose vocabulary overlaps the original dataset. [Figure: % OOV@1 and OOV@5 for Original vs. Giga-200k vs. Giga-2M vs. Giga-20M; OOV rates fall as more Gigaword text is sampled.] (A sampling sketch follows below.)
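A sketch of the sampling criterion named on the slide: select Gigaword sentences whose vocabulary overlaps the original training vocabulary. The overlap score and threshold here are assumptions, not the paper's exact recipe.

```python
# Illustrative sampling of unlabeled sentences by vocabulary overlap
# with the original training set. The scoring function and threshold
# are assumptions for the sketch.
def vocab_overlap(sentence_tokens, train_vocab):
    toks = set(sentence_tokens)
    return len(toks & train_vocab) / len(toks)

def sample_overlapping(sentences, train_vocab, k, threshold=0.9):
    """Take the first k tokenized sentences above the overlap threshold."""
    sample = []
    for sent in sentences:
        if vocab_overlap(sent, train_vocab) >= threshold:
            sample.append(sent)
            if len(sample) == k:
                break
    return sample

train_vocab = {"officials", "held", "a", "meeting", "in"}
giga = [["officials", "held", "a", "meeting"], ["totally", "unseen", "words"]]
print(sample_overlapping(giga, train_vocab, k=1))
```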

  22–25. Data Augmentation. Two mirrored attention-based encoder-decoder models: Parse to AMR (text → graph) and Generate from AMR (graph → text). The parser labels external sentences with graphs, and the resulting pairs are used to re-train both models. [Figure: the two seq2seq models with a re-train loop feeding the parser's output back as training data.]

  26. Semi-supervised Learning ‣ Self-training: McClosky et al., 2006 ‣ Co-training: Yarowsky, 1995; Blum and Mitchell, 1998; Sarkar, 2001; Søgaard and Rishøj, 2010

  27–34. Paired Training (input: (graph, sentence) pairs)
  ‣ Train AMR parser P on the original dataset.
  ‣ for i = 0 … N (self-train the parser):
    - S_i = sample k·10^i sentences from Gigaword
    - parse the S_i sentences with P
    - re-train P on S_i
  ‣ Train generator G on S_N. (A code sketch of the loop follows below.)
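A compact sketch of the loop, with stub helpers standing in for real seq2seq training, parsing, and Gigaword sampling; only the control flow mirrors the slides, and the fine-tuning calls anticipate the detail spelled out on the next slides.

```python
# Sketch of Paired Training. The helpers are illustrative stubs, not a
# real implementation; the loop structure follows the slides.
def train(pairs):             # stub: fit a seq2seq model on (graph, sentence) pairs
    return {"data_seen": len(pairs)}

def fine_tune(model, pairs):  # stub: re-init from `model`, then train on gold pairs
    return model

def parse(model, sentence):   # stub: predict a linearized AMR graph
    return "( parsed " + sentence + " )"

def sample_gigaword(n):       # stub: n sentences with high vocabulary overlap
    return ["sentence %d" % i for i in range(n)]

def paired_training(gold_pairs, k=2, N=3):
    parser = train(gold_pairs)                        # supervised start on gold data
    sample = []
    for i in range(N + 1):                            # self-train the parser
        sample = sample_gigaword(k * 10 ** i)         # S_i: k * 10^i sentences
        silver = [(parse(parser, s), s) for s in sample]
        parser = train(silver)                        # re-train P on S_i
        parser = fine_tune(parser, gold_pairs)        # then fine-tune on gold
    silver = [(parse(parser, s), s) for s in sample]  # S_N parsed by the final P
    generator = train(silver)                         # train G on S_N
    return parser, fine_tune(generator, gold_pairs)

paired_training([("( hold ... )", "US officials held ...")])
```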

  35–45. Training the AMR Parser. Fine-tune = initialize parameters from the previous step, then train on the original dataset.
  ‣ Round 1: train P on the original dataset → sample S_1 = 200k sentences from Gigaword → parse S_1 with P → train P on the 200k silver pairs → fine-tune P on the original dataset.
  ‣ Round 2: sample S_2 = 2M sentences → parse S_2 with P → train P on S_2 → fine-tune P on the original dataset.
  ‣ Round 3: sample S_3 = 20M sentences → parse S_3 with P → train P on S_3 → fine-tune P on the original dataset.
  (A fine-tuning sketch follows below.)
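A minimal runnable sketch of the fine-tuning step, written in PyTorch purely for illustration (the slides do not name a framework); the tiny model and toy batch are stand-ins for the real parser and the ~16k gold pairs.

```python
# Fine-tuning sketch: load the checkpoint from the previous round, then
# continue training on the original (gold) dataset. TinySeq2Seq and the
# random batch are illustrative placeholders only.
import torch
import torch.nn as nn

class TinySeq2Seq(nn.Module):                 # stand-in for the real model
    def __init__(self, vocab=100, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, hidden)
        self.rnn = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)
    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

model = TinySeq2Seq()
torch.save(model.state_dict(), "round_i.pt")       # pretend: previous round's checkpoint
model.load_state_dict(torch.load("round_i.pt"))    # init parameters from previous step
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
x = torch.randint(0, 100, (4, 7))                  # toy batch of gold inputs
y = torch.randint(0, 100, (4, 7))                  # toy batch of gold targets
for _ in range(3):                                 # continue training on gold data
    opt.zero_grad()
    loss = loss_fn(model(x).reshape(-1, 100), y.reshape(-1))
    loss.backward()
    opt.step()
torch.save(model.state_dict(), "round_i_finetuned.pt")
```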

  46–48. Training the AMR Generator. Sample S_4 = 20M sentences from Gigaword → parse S_4 with the final parser P → train generator G on the 20M silver pairs → fine-tune G on the original dataset.

  49–55. Final Results (Generation). BLEU:
  ‣ TreeToStr (Flanigan et al., NAACL 2016): 23.0
  ‣ TSP (Song et al., EMNLP 2016): 22.4
  ‣ PBMT (Pourdamaghani and Knight, INLG 2016): 26.9
  ‣ NeuralAMR (vanilla): 22.0
  ‣ NeuralAMR-200k: 27.4
  ‣ NeuralAMR-2M: 32.3
  ‣ NeuralAMR-20M: 33.8

  56–60. Final Results (Parsing). SMATCH:
  ‣ SBMT (Pust et al., 2015): 67.1
  ‣ CharLSTM+CAMR (van Noord and Bos, 2017): 67.3
  ‣ Seq2Seq (Peng et al., 2017): 52.0
  ‣ NeuralAMR-20M: 62.1

  61–62. How did we do? (Generation)
  Reference: US officials held an expert group meeting in January 2002 in New York.
  Prediction: In January 2002 United States officials held a meeting of the group experts in New York.
  A longer example:
  Reference: The report stated British government must help to stabilize weak states and push for international regulations that would stop terrorists using freely available information to create and unleash new forms of biological warfare such as a modified version of the influenza virus.
  Prediction: The report stated that the Britain government must help stabilize the weak states and push international regulations to stop the use of freely available information to create a form of new biological warfare such as the modified version of the influenza.
  Remaining errors: disfluency, coverage.

  63–64. Summary ‣ Sequence-to-sequence models for Parsing and Generation ‣ Paired Training: a scalable data-augmentation algorithm ‣ State-of-the-art performance on generating from AMR ‣ Best-performing neural AMR parser ‣ Demo, code, and pre-trained models: http://ikonstas.net
  Thank you! (thank-01 :ARG1 you)

  65. Bonus Slides

  66–69. Encoding: Linearize → RNN encoding. The linearized AMR (hold :ARG0 (person :ARG0-of (have-role :ARG1 United_States :ARG2 official)) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group)) :time (date-entity 2002 1) :location New_York) is treated as a plain token sequence: hold, :ARG0, (, person, :ARG0-of, …. Each token receives an embedding, and a recurrent neural network (RNN) over the embeddings produces hidden states h_1(s), h_2(s), …, which the decoder attends over. (A sketch follows below.)
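A toy sketch of the encoder in PyTorch, for illustration only; the bidirectional LSTM and the layer sizes are assumptions about the architecture's details.

```python
# Illustrative encoder over the linearized AMR tokens: embed each token,
# run an RNN, and keep the per-position hidden states h_1(s) ... h_T(s)
# for the decoder's attention. Framework and sizes are assumptions.
import torch
import torch.nn as nn

tokens = "hold :ARG0 ( person :ARG0-of".split()          # first 5 tokens of the slide
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = torch.tensor([[vocab[t] for t in tokens]])         # shape (1, 5)

emb = nn.Embedding(len(vocab), 64)                       # token embeddings
rnn = nn.LSTM(64, 128, batch_first=True, bidirectional=True)
hidden_states, _ = rnn(emb(ids))                         # h_1(s) ... h_5(s)
print(hidden_states.shape)                               # torch.Size([1, 5, 256])
```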
