Building Adaptable and Scalable Natural Language Generation Systems - PowerPoint PPT Presentation

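Sequence to sequence model
Components: Encoder, Attention, Decoder (input -> output)
Encoder input (linearized graph): know ARG0 I ARG1 ( planet ARG1-of inhabit
Decoder output, starting from <s>: I know the planet of
$\hat{w} = \operatorname{argmax}_{w} \prod_i p(w_i \mid w_{<i}, h^{(s)})$
[Figure: encoder-decoder diagram; candidate words shown at the decoder steps include a, I, know, inhabit, planet, The, knew, inhabited, man, A, planet, was, …]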

Linearization Graph -> Depth First Search
Sentence: US officials held an expert group meeting in January 2002 in New York .
Linearized graph:
hold
  :ARG0 (person :ARG0-of (have-role :ARG1 United_States :ARG2 official) )
  :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) )
  :time (date-entity 2002 1)
  :location New_York
[Figure: MR graph rooted at hold, with edges ARG0, ARG1, time and location leading to the person, meet, date-entity and city subgraphs; the name nodes are "United States" and "New York".]

Encoding Linearize -> RNN encoding
- Token embeddings
- Recurrent Neural Network (RNN)
- Bi-directional RNN
Linearized input: hold :ARG0 (person :ARG0-of (have-role :ARG1 United_States :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity 2002 1) :location New_York
[Figure: the first tokens hold, ARG0, (, person, ARG0-of are embedded and encoded as hidden states $h_1^{(s)}, h_2^{(s)}, h_3^{(s)}, h_4^{(s)}, h_5^{(s)}$; with the bi-directional RNN, each position carries both a forward and a backward state.]
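As a rough illustration of the bi-directional encoding step, the sketch below runs a plain tanh RNN in both directions and concatenates the two states per token. The cell type, the sizes and the random weights are assumptions; the slides do not specify them, and a real system would learn the parameters.

```python
import math
import random


def rnn_pass(inputs, w_in, w_rec):
    """Run a plain tanh RNN over `inputs`, returning one hidden state per token."""
    hidden_size = len(w_rec)
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        h = [
            math.tanh(
                sum(w_in[r][c] * x[c] for c in range(len(x)))
                + sum(w_rec[r][c] * h[c] for c in range(hidden_size))
            )
            for r in range(hidden_size)
        ]
        states.append(h)
    return states


def random_matrix(rows, cols, rng):
    """Untrained stand-in weights, for illustration only."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def bi_rnn_encode(tokens, embeddings, fwd, bwd):
    """Concatenate forward and backward states so each position sees both contexts."""
    inputs = [embeddings[t] for t in tokens]
    forward = rnn_pass(inputs, *fwd)
    backward = rnn_pass(inputs[::-1], *bwd)[::-1]
    return [f + b for f, b in zip(forward, backward)]


rng = random.Random(0)
EMB, HID = 4, 3  # hypothetical sizes
tokens = ["hold", "ARG0", "(", "person", "ARG0-of"]
embeddings = {t: [rng.uniform(-1, 1) for _ in range(EMB)] for t in tokens}
fwd = (random_matrix(HID, EMB, rng), random_matrix(HID, HID, rng))
bwd = (random_matrix(HID, EMB, rng), random_matrix(HID, HID, rng))
states = bi_rnn_encode(tokens, embeddings, fwd, bwd)
print(len(states), len(states[0]))
```

Each of the five states has twice the hidden size, since it joins the forward state for that position with the backward one.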

Decoding RNN Encoding -> RNN Decoding (Beam search)
- init $h^{(s)}$: encoder states $h_1 \dots h_N^{(s)}$, empty prefix ∅
- softmax: $p(w_i \mid w_{<i}, h^{(s)})$ at each decoder state $h_1, h_2, h_3, \dots, h_k$
Beam after step 1: w_11: Holding, w_12: Held, w_13: Hold, w_14: US
Beam after step 2: w_21: Hold a, w_22: Hold the, w_23: Held a, w_24: Held the
Beam after step k: w_k1: The US officials held, w_k2: US officials held a, w_k3: US officials hold the, w_k4: US officials will hold a
[Figure: softmax candidates shown along the way include Holding, Held, US, …; a, the, meeting, …; US, person, expert, …; meeting, meetings, meet, …]
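The beam search on these slides can be sketched as follows. This is a minimal version under stated assumptions: `step_fn` is a hypothetical stand-in for the decoder softmax, and the toy probability table is invented purely so the example runs; it is not taken from the talk.

```python
import math


def beam_search(step_fn, beam_size, max_len, eos="</s>"):
    """Keep the `beam_size` highest-scoring prefixes at every step."""
    beams = [([], 0.0)]  # (prefix, summed log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:
                candidates.append((prefix, score))  # finished hypotheses carry over
                continue
            for word, prob in step_fn(prefix).items():
                candidates.append((prefix + [word], score + math.log(prob)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(p and p[-1] == eos for p, _ in beams):
            break
    return beams


# Invented toy distribution p(w_i | w_<i); a real decoder would compute this with a softmax.
TOY_TABLE = {
    (): {"US": 0.5, "Held": 0.3, "Holding": 0.2},
    ("US",): {"officials": 0.9, "</s>": 0.1},
    ("US", "officials"): {"held": 0.7, "hold": 0.3},
}


def toy_step(prefix):
    return TOY_TABLE.get(tuple(prefix), {"</s>": 1.0})


beams = beam_search(toy_step, beam_size=2, max_len=5)
best_prefix, best_score = beams[0]
print(best_prefix)
```

Scores are summed log-probabilities, so longer hypotheses are not length-normalized here; whether the original system normalizes by length is not stated in the slides.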

Attention
Decoder state $h_3$, after emitting w_2: held; candidate next words: a, the, meeting, …
Encoder states: $h_1^{(s)}$ hold, $h_2^{(s)}$ ARG0, $h_3^{(s)}$ (, $h_4^{(s)}$ person, $h_5^{(s)}$ ARG0-of
Context vector: $c_3$
$a_i = \operatorname{softmax}\big(f(h^{(s)}, h_i)\big)$
$c_i = \sum_j a_{ij} h_j^{(s)}$
[Figure: attention alignment between the input tokens "hold ARG0 ( person role US official ) ARG1 ( meet expert group )" and the output words "US officials held an expert group meeting in January 2002".]
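To make the two attention formulas concrete, here is a small sketch that computes the weights $a_i$ and the context vector $c_i$ for one decoder state. The slides leave the scoring function $f$ unspecified; a dot product is assumed here only to keep the example short.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(encoder_states, decoder_state):
    """Return attention weights a_i and context c_i = sum_j a_ij * h_j^(s)."""
    # Assumption: f is a dot product between each encoder state and the decoder state.
    scores = [sum(a * b for a, b in zip(h, decoder_state)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states)) for d in range(dim)]
    return weights, context


encoder_states = [[1.0, 0.0], [0.0, 1.0]]  # toy h_1^(s), h_2^(s)
decoder_state = [10.0, 0.0]                # toy h_i, strongly aligned with the first state
weights, context = attend(encoder_states, decoder_state)
print(weights, context)
```

With these toy vectors, almost all of the weight falls on the first encoder state, so the context vector is close to that state.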

Pre-processing Linearization -> Anonymization
Anonymized graph:
hold
  :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official) )
  :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) )
  :time (date-entity year_0 month_0)
  :location loc_1
Replaced subgraphs: country name "United States" -> loc_0; 2002 -> year_0; 1 -> month_0; city name "New York" -> loc_1
Original sentence: US officials held an expert group meeting in January 2002 in New York .
Anonymized sentence: loc_0 officials held an expert group meeting in month_0 year_0 in loc_1 .
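The sentence side of the anonymization can be sketched as a placeholder substitution that is undone after generation. The mapping below is written by hand for this one example and is an assumption; how the original system aligns graph fragments with surface strings is not described on the slides.

```python
import re

# Hand-written alignment for the slide's example (hypothetical; a real system must derive it).
PLACEHOLDERS = {
    "New York": "loc_1",
    "US": "loc_0",
    "January": "month_0",
    "2002": "year_0",
}


def anonymize(sentence, mapping):
    """Replace each aligned surface string with its placeholder token."""
    for surface, placeholder in mapping.items():
        sentence = re.sub(r"\b" + re.escape(surface) + r"\b", placeholder, sentence)
    return sentence


def deanonymize(sentence, mapping):
    """Restore the original surface strings in generated text."""
    for surface, placeholder in mapping.items():
        sentence = re.sub(r"\b" + re.escape(placeholder) + r"\b", surface, sentence)
    return sentence


SENTENCE = "US officials held an expert group meeting in January 2002 in New York ."
ANONYMIZED = anonymize(SENTENCE, PLACEHOLDERS)
print(ANONYMIZED)
```

Replacing rare names and numbers with a small set of placeholder tokens reduces the vocabulary the model has to learn.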

Experimental Setup
AMR LDC2015E86 (SemEval-2016 Task 8)
‣ Hand annotated MR graphs: newswire, forums
‣ ~16k training / 1k development / 1k test pairs
Train
‣ Optimize cross-entropy loss
Evaluation
‣ BLEU n-gram precision (Papineni et al., ACL 2002)
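For readers unfamiliar with the metric, the clipped (modified) n-gram precision at the core of BLEU can be sketched as below. This covers a single sentence and a single reference only; full BLEU also combines several n-gram orders with a geometric mean and applies a brevity penalty. The two sentences are the reference and prediction that appear later in the deck.

```python
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(prediction, reference, n):
    """Clipped n-gram precision: each predicted n-gram counts at most as often as in the reference."""
    pred_counts = Counter(ngrams(prediction.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    matched = sum(min(count, ref_counts[gram]) for gram, count in pred_counts.items())
    total = sum(pred_counts.values())
    return matched / total if total else 0.0


REFERENCE = "US officials held an expert group meeting in January 2002 in New York ."
PREDICTION = "United States officials held held a meeting in January 2002 ."

p1 = modified_precision(PREDICTION, REFERENCE, 1)
p2 = modified_precision(PREDICTION, REFERENCE, 2)
print(p1, p2)
```

Clipping is what stops the repeated "held" in the prediction from being rewarded twice.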

First Attempt
System      BLEU
TreeToStr   23
TSP         22.4
PBMT        26.9
NNLG        22
TreeToStr : Flanigan et al, NAACL 2016
TSP : Song et al, EMNLP 2016
PBMT : Pourdamghani and Knight, INLG 2016
All systems use a Language Model trained on a very large corpus.
We will emulate via data augmentation. (Sennrich et al., ACL 2016)

What went wrong?
Input: hold :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity year_0 month_0) :location loc_1
Reference: US officials held an expert group meeting in January 2002 in New York .
Prediction: United States officials held held a meeting in January 2002 .
‣ Repetition
‣ Coverage
a) Sparsity
b) Avg sent length: 20 words
c) Limited Language Modeling capacity
[Figure: bar chart of token counts (y-axis: Tokens, 0 to 18000) for Total, OOV@1 and OOV@5, annotated with 74.85% and 44.26%.]

Data Augmentation
Original Dataset: ~16k graph-sentence pairs
Gigaword: ~183M sentences *only*
Sample sentences with vocabulary overlap
[Figure: bar chart of OOV@1 and OOV@5 (y-axis: %, 0 to 80) for Original, Giga-200k and Giga-2M.]
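One way to read "sample sentences with vocabulary overlap" is sketched below: keep only external sentences whose tokens are sufficiently covered by the training vocabulary, then sample from those. The `min_overlap` threshold and the whitespace tokenization are assumptions; the slides do not give the exact selection criterion.

```python
import random


def vocabulary(sentences):
    """Set of whitespace tokens seen in the training sentences."""
    return {token for sentence in sentences for token in sentence.split()}


def overlap(sentence, vocab):
    """Fraction of the sentence's tokens that are in the vocabulary."""
    tokens = sentence.split()
    return sum(t in vocab for t in tokens) / len(tokens) if tokens else 0.0


def sample_with_overlap(corpus, vocab, n, min_overlap=1.0, seed=0):
    """Sample up to n sentences whose vocabulary overlap reaches min_overlap (hypothetical threshold)."""
    eligible = [s for s in corpus if overlap(s, vocab) >= min_overlap]
    return random.Random(seed).sample(eligible, min(n, len(eligible)))


train_sentences = ["US officials held a meeting ."]
external_corpus = [
    "officials held a meeting .",
    "the senate passed a bill .",
    "US officials held a meeting .",
]
vocab = vocabulary(train_sentences)
picked = sample_with_overlap(external_corpus, vocab, n=5)
print(picked)
```

Lowering `min_overlap` admits sentences with some unseen words, trading vocabulary coverage against a closer match to the original data.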

Data Augmentation
Generate from MR: graph -> Encoder, Attention, Decoder -> text
Parse to MR: text -> Encoder, Attention, Decoder -> graph
Re-train
In general: Parse to Input (text -> input), Generate from Input (input -> text)

Paired Training
Train MR Parser P on Original Dataset
for i = 0 … N
    S_i = Sample k·10^i sentences from Gigaword
    Parse S_i sentences with P
    Re-train MR Parser P on S_i    (Self-train Parser)
Train Generator G on S_N
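The paired training procedure above can be sketched as a short loop. `StubModel` and `parse_fn` are hypothetical placeholders that only record what they were trained on; they stand in for the real parser and generator, which the slides do not specify at this level.

```python
import random


class StubModel:
    """Placeholder model that records the size of each training set it sees."""

    def __init__(self, name):
        self.name = name
        self.history = []

    def train(self, pairs):
        self.history.append(len(pairs))


def paired_training(original_pairs, corpus, parse_fn, k, n_rounds, seed=0):
    rng = random.Random(seed)
    parser = StubModel("P")
    parser.train(original_pairs)                 # Train MR Parser P on Original Dataset
    for i in range(n_rounds + 1):                # for i = 0 ... N
        size = min(k * 10 ** i, len(corpus))     # S_i has k * 10^i sentences
        sample = rng.sample(corpus, size)
        silver = [(s, parse_fn(parser, s)) for s in sample]  # Parse S_i with P
        parser.train(silver)                     # Re-train P on S_i (self-training)
    generator = StubModel("G")
    generator.train([(graph, sent) for sent, graph in silver])  # Train G on S_N
    return parser, generator


original = [("a sentence", "(a graph)")] * 16
corpus = [f"sentence {j}" for j in range(200)]
parser, generator = paired_training(
    original, corpus, parse_fn=lambda model, s: f"({s})", k=2, n_rounds=2
)
print(parser.history, generator.history)
```

The generator is trained on (graph, sentence) pairs, the reverse direction of the parser's (sentence, graph) pairs, using the graphs produced in the final round.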

Training MR Parser
Train P on Original Dataset
Sample S_1 = 200k sentences from Gigaword -> Parse S_1 with P -> Train P on S_1 = 200k -> Fine-tune P on Original Dataset
Sample S_2 = 2M sentences from Gigaword -> Parse S_2 with P -> Train P on S_2 = 2M -> Fine-tune P on Original Dataset
Training MR Generator
Sample S_3 = 2M sentences from Gigaword -> Parse S_3 with P -> Train G on S_3 = 2M -> Fine-tune G on Original Dataset
Fine-tune : init parameters from previous step and train on Original Dataset

Final Results
System      BLEU
TreeToStr   23
TSP         22.4
PBMT        26.9
NNLG        22
[Figure: BLEU bar chart (y-axis 0 to 35) comparing TreeToStr, TSP, PBMT, NNLG, NNLG-200k, NNLG-2M and NNLG-20M.]
TreeToStr : Flanigan et al, NAACL 2016
TSP : Song et al, EMNLP 2016
PBMT : Pourdamghani and Knight, INLG 2016

Building Adaptable and Scalable Natural Language Generation Systems Yannis Konstas Natural Language Generation is everywhere (Machine Translation)
