Sequence-to-Sequence Natural Language Generation (Ondřej Dušek)

Introduction / Problems We Solve: Entrainment in Dialogues and NLG (slide 5/20)
Ondřej Dušek, Sequence-to-Sequence NLG
• speakers are influenced by previous utterances
• adapting (entraining) to each other
• reusing lexicon and syntax
• entrainment is natural, subconscious
• entrainment helps conversation success
• natural source of variation
• typical NLG only takes the input DA (dialogue act) into account
• no way of adapting to user’s way of speaking
• no output variance (must be fabricated, e.g., by sampling)
• entrainment in NLG limited to rule-based systems so far
• our system is trainable and entrains/adapts

Introduction / Our Solution: Our NLG system (slide 6/20)
• based on sequence-to-sequence neural network models
✓ trainable from unaligned pairs of input DAs + sentences
✓ context-aware: adapts to previous user utterance
✓ two operating modes:
  a) generating sentences token-by-token (joint 1-step NLG)
  b) generating deep syntax trees in bracketed notation (sentence planner stage of traditional NLG pipeline)
• we can compare both approaches in a single architecture
✓ learns to produce meaningful outputs from very little training data

Basic Sequence-to-Sequence NLG / System Architecture: Our Seq2seq Generator architecture (slide 7/20)
• Sequence-to-sequence models with attention
• Encoder LSTM RNN: encode DA into hidden states
• Decoder LSTM RNN: generate output tokens
• attention model: weighing encoder hidden states
• basic greedy generation
+ beam search, n-best list outputs
+ reranker (→ Reranker slide)
[Figure: encoder LSTMs read the DA tokens "inform name X-name inform eattype restaurant"; decoder LSTMs with attention (att) read "<GO> X is a restaurant ." and emit "X is a restaurant . <STOP>"]
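The difference between basic greedy generation and beam search with an n-best list can be sketched as below. This is only an illustration under stated assumptions, not the system's implementation: `TOY_MODEL`, `next_token_probs`, and all probabilities are invented stand-ins for the attention-based LSTM decoder.

```python
import math

STOP = "<STOP>"

# Hypothetical toy "decoder": next-token probabilities given the tokens
# generated so far. A real decoder would be an LSTM with attention.
TOY_MODEL = {
    (): {"X": 1.0},
    ("X",): {"is": 0.6, "serves": 0.4},
    ("X", "is"): {"a": 0.55, "open": 0.45},
    ("X", "is", "a"): {"restaurant": 0.6, "bar": 0.4},
    ("X", "serves"): {"food": 1.0},
}


def next_token_probs(prefix):
    """Return the toy distribution; unknown prefixes end the sentence."""
    return TOY_MODEL.get(tuple(prefix), {STOP: 1.0})


def beam_search(beam_size, max_len=10):
    """Return up to `beam_size` finished hypotheses as (log_prob, tokens)."""
    beams = [(0.0, ())]
    finished = []
    for _ in range(max_len):
        candidates = []
        for logp, tokens in beams:
            for token, prob in next_token_probs(tokens).items():
                hyp = (logp + math.log(prob), tokens + (token,))
                # Finished hypotheses leave the beam and go to the n-best list
                (finished if token == STOP else candidates).append(hyp)
        if not candidates:
            break
        candidates.sort(key=lambda h: h[0], reverse=True)
        beams = candidates[:beam_size]
    finished.sort(key=lambda h: h[0], reverse=True)
    return finished[:beam_size]


greedy = beam_search(beam_size=1)   # beam size 1 is greedy decoding
n_best = beam_search(beam_size=3)   # n-best list for a later reranker
print(" ".join(greedy[0][1]))
print(" ".join(n_best[0][1]))
```

In this toy setup, greedy decoding commits to "is" and ends with "X is a restaurant", while the wider beam finds a more probable hypothesis that started with a less likely token. The n-best list is what a reranker would then reorder.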

Basic Sequence-to-Sequence NLG / System Architecture: Reranker (slide 8/20)
• generator may not cover the input DA perfectly
• missing / superfluous information
• we would like to penalize such cases
• check whether output conforms to the input DA + rerank
• NN with LSTM encoder + sigmoid classification layer
• 1-hot DA representation
• penalty = Hamming distance from input DA (on 1-hot vectors)
[Figure: the output "X is a restaurant ." is read by the LSTM encoder and classified (σ) into a binary vector over the DA items inform, name=X-name, eattype=bar, eattype=restaurant, area=citycentre, area=riverside; compared with the vector of the input DA inform(name=X-name, eattype=bar, area=citycentre), three positions differ (✗), so penalty=3]
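The Hamming-distance penalty from the reranker slide can be reproduced with a few lines. This is a sketch under assumptions: the item ordering in `DA_ITEMS`, the lookup table standing in for the LSTM + sigmoid classifier, the second candidate sentence, the log-probabilities, and the `weight` parameter are all hypothetical.

```python
# Hypothetical fixed ordering of DA items for the binary (1-hot style) vectors
DA_ITEMS = [
    "inform", "name=X-name", "eattype=bar",
    "eattype=restaurant", "area=citycentre", "area=riverside",
]


def to_binary_vector(items):
    """Mark which DA items are present."""
    present = set(items)
    return [1 if item in present else 0 for item in DA_ITEMS]


def hamming_penalty(input_vec, predicted_vec):
    """Number of positions where the two binary vectors differ."""
    return sum(a != b for a, b in zip(input_vec, predicted_vec))


def rerank(n_best, input_vec, classify, weight=1.0):
    """Sort (log_prob, sentence) pairs by log_prob minus weighted penalty."""
    def score(hyp):
        log_prob, sentence = hyp
        return log_prob - weight * hamming_penalty(input_vec, classify(sentence))
    return sorted(n_best, key=score, reverse=True)


# Input DA from the slide: inform(name=X-name, eattype=bar, area=citycentre)
input_vec = to_binary_vector(
    ["inform", "name=X-name", "eattype=bar", "area=citycentre"])

# Stand-in for the trained classifier: a lookup table of assumed outputs
FAKE_CLASSIFIER = {
    "X is a restaurant .": to_binary_vector(
        ["inform", "name=X-name", "eattype=restaurant"]),
    "X is a bar in the city centre .": to_binary_vector(
        ["inform", "name=X-name", "eattype=bar", "area=citycentre"]),
}

penalty = hamming_penalty(input_vec, FAKE_CLASSIFIER["X is a restaurant ."])
reranked = rerank(
    [(-1.0, "X is a restaurant ."), (-2.0, "X is a bar in the city centre .")],
    input_vec,
    FAKE_CLASSIFIER.get,
)
print(penalty)         # 3, as in the slide's example
print(reranked[0][1])
```

The output that covers the input DA exactly moves to the top even though its (made-up) generator score is lower.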

Basic Sequence-to-Sequence NLG / Experiments on the BAGEL Set: Experiments (slide 9/20)
• BAGEL dataset: 202 DAs / 404 sentences, restaurant information
• much less data than previous seq2seq methods
• partially delexicalized (names, phone numbers → “X”)
• manual alignment provided, but we do not use it
• 10-fold cross-validation
• automatic metrics: BLEU, NIST
• manual evaluation: semantic errors on 20% data (missing/irrelevant/repeated)
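Partial delexicalization, as mentioned for the BAGEL data, can be illustrated as follows. This is a minimal sketch, not the dataset's preprocessing code: the function name, the choice of `delex_slots`, and the venue name "Green Man" are hypothetical.

```python
def delexicalize(sentence, slots, delex_slots=("name", "phone")):
    """Replace the values of selected slots with the placeholder "X".

    Other slot values (e.g. food type or area) stay lexicalized, which is
    what makes the delexicalization partial.
    """
    for slot, value in slots.items():
        if slot in delex_slots:
            sentence = sentence.replace(value, "X")
    return sentence


# Hypothetical lexicalized training pair
slots = {"name": "Green Man", "food": "French", "area": "riverside"}
text = "Green Man is a French restaurant on the riverside."
delexicalized = delexicalize(text, slots)
print(delexicalized)
```

Only the name is replaced; "French" and "riverside" remain in the sentence and must be generated by the model itself.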

Basic Sequence-to-Sequence NLG / Experiments on the BAGEL Set: Results (slide 10/20)
Setup                                      BLEU    NIST    ERR
prev.
  Mairesse et al. (2010), with alignments  ∼67     -       0
  Dušek & Jurčíček (2015)                  59.89   5.231   30
our two-step
  Greedy with trees                        55.29   5.144   20
  + Beam search (beam size 100)            58.59   5.293   28
  + Reranker (beam size 5)                 60.77   5.487   24
             (beam size 10)                60.93   5.510   25
             (beam size 100)               60.44   5.514   19
our joint
  Greedy into strings                      52.54   5.052   37
  + Beam search (beam size 100)            55.84   5.228   32
  + Reranker (beam size 5)                 61.18   5.507   27
             (beam size 10)                62.40   5.614   21
             (beam size 100)               62.76   5.669   19

Basic Sequence-to-Sequence NLG / Experiments on the BAGEL Set: Sample Outputs (slide 11/20)
Input DA: inform(name=X-name, type=placetoeat, eattype=restaurant, area=riverside, food=French)
Reference: X is a French restaurant on the riverside.
Greedy with trees: X is a restaurant providing french and continental and by the river.
+ Beam search: X is a restaurant that serves french takeaway. [riverside]
+ Reranker: X is a french restaurant in the riverside area.
Greedy into strings: X is a restaurant in the riverside that serves italian food. [French]
+ Beam search: X is a restaurant in the riverside that serves italian food. [French]
+ Reranker: X is a restaurant in the riverside area that serves french food.

Entrainment-enabled NLG / Introduction: Adding Entrainment to Trainable NLG (slide 12/20)
• Aim: condition generation on preceding context
• Problem: data sparsity
• Solution: Limit context to just preceding user utterance
  • likely to have strongest entrainment impact
• Need for context-aware training data: we collected a new set
  • input DA
  • natural language sentence(s)
  • preceding user utterance (NEW)
Example:
  preceding user utterance: I’m headed to Rector Street
  input DA: inform(from_stop="Fulton Street", vehicle=bus, direction="Rector Street", departure_time=9:13pm, line=M21)
  natural language sentence: Go by the 9:13pm bus on the M21 line from Fulton Street directly to Rector Street
  natural language sentence (CONTEXT-AWARE): Heading to Rector Street from Fulton Street, take a bus line M21 at 9:13pm.

Collecting the set: Collecting the set (via CrowdFlower) (slide 13/20)
1. Get natural user utterances in calls to a live dialogue system
  • record calls to live Alex SDS, task descriptions use varying synonyms
  • manual transcription + reparsing using Alex SLU
2. Generate possible response DAs for the user utterances
  • using simple rule-based bigram policy
3. Collect natural language paraphrases for the response DAs
  • interface designed to support entrainment
    • context at hand
    • minimal slot description
    • short instructions
  • checks: contents + spelling, automatic + manual
  • ca. 20% overhead (repeated job submission)
Example task descriptions (using varying synonyms):
  You want a connection – your departure stop is Marble Hill, and you want to go to Roosevelt Island. Ask how long the journey will take. Ask about a schedule afterwards. Then modify your query: Ask for a ride at six o’clock in the evening. Ask for a connection by bus. Do as if you changed your mind: Say that your destination stop is City Hall.
  You are searching for transit options leaving from Houston Street with the destination of Marble Hill. When you are offered a schedule, ask about the time of arrival at your destination. Then ask for a connection after that. Modify your query: Request information about an alternative at six p.m. and state that you prefer to go by bus.
  Tell the system that you want to travel from Park Place to Inwood. When you are offered a trip, ask about the time needed. Then ask for another alternative. Change your search: Ask about a ride at 6 o’clock p.m. and tell the system that you would rather use the bus.

System Architecture: Context in our Seq2seq Generator (1) (slide 14/20)
• Two direct context-aware extensions:
  a) preceding user utterance prepended to the DA and fed into the encoder
  b) separate context encoder, hidden states concatenated
[Figure: in (a), one encoder reads "is there a later option" followed by the DA tokens "iconfirm alternative next"; in (b), a separate LSTM encoder reads "is there a later option" and its hidden states are combined (+) with those of the DA encoder; in both, the decoder with attention reads "<GO> You want a later option ." and emits "You want a later option . <STOP>"]


Context in our Seq2seq Generator (2)
• One (more) reranker: n-gram match
   • promoting outputs that have a word or phrase overlap with the context utterance
Example: context "is there a later time", input DA inform_no_match(alternative=next); candidate outputs with their scores:
   -2.914  No route found later , sorry .
   -3.544  The next connection is not found .
   -3.690  I'm sorry , I can not find a later ride .
   -3.836  I can not find the next one sorry .
   -4.003  I'm sorry , a later connection was not found .
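A minimal sketch of how such an n-gram match reranker could work is given below. The slide does not specify the exact scoring, so everything here is an assumption for illustration: the function names, the match measure (share of the candidate's unigrams and bigrams that also occur in the context, averaged over n), the `weight` parameter, and the candidate scores in the usage example are all hypothetical.

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_match(candidate: str, context: str, max_n: int = 2) -> float:
    """Average share of the candidate's n-grams (n = 1..max_n) found in the context."""
    cand = candidate.lower().split()
    ctx = context.lower().split()
    ratios = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        if not cand_ngrams:  # candidate shorter than n tokens
            continue
        ctx_ngrams = set(ngrams(ctx, n))
        hits = sum(1 for gram in cand_ngrams if gram in ctx_ngrams)
        ratios.append(hits / len(cand_ngrams))
    return sum(ratios) / len(ratios) if ratios else 0.0


def rerank(candidates, context: str, weight: float = 1.0):
    """Sort (text, score) pairs by score plus a weighted context-overlap bonus."""
    return sorted(
        candidates,
        key=lambda cand: cand[1] + weight * ngram_match(cand[0], context),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical generator scores, chosen only to show the effect of the bonus.
    beam = [("Next connection .", -1.0), ("You want a later option .", -1.5)]
    for text, score in rerank(beam, "is there a later option", weight=2.0):
        print(score, text)
```

With `weight=0.0` the original order is kept; a larger weight lets a lower-scored candidate that reuses the user's wording overtake a higher-scored one that does not.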


Experiments
• Dataset: public transport information
   • 5.5k paraphrases for 1.8k DA-context combinations
   • delexicalized
• Automatic evaluation results:

   Setup                              BLEU    NIST
   Baseline (context not used)        66.41   7.037
   n-gram match reranker              68.68   7.577
   Prepending context                 63.87   6.456
      + n-gram match reranker         69.26   7.772
   Context encoder                    63.08   6.818
      + n-gram match reranker         69.17   7.596

• Human pairwise preference ranking (crowdsourced)
   • baseline vs. prepending context + n-gram match reranker
   • context-aware preferred in 52.5% of cases (significant)
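The "delexicalized" note in the dataset description means that concrete slot values are replaced by placeholders before training and filled back in after generation. The sketch below illustrates the idea only; the `X-<slot>` placeholder style and the function names are assumptions, and the slides do not say which slots are delexicalized in the actual dataset.

```python
def delexicalize(text: str, slots: dict[str, str]) -> str:
    """Replace each slot value in the text with an 'X-<slot>' placeholder."""
    # Longest values first, so a short value cannot break up a longer one.
    for slot, value in sorted(slots.items(), key=lambda kv: -len(kv[1])):
        text = text.replace(value, f"X-{slot}")
    return text


def lexicalize(text: str, slots: dict[str, str]) -> str:
    """Fill the placeholders back in with the concrete slot values."""
    # Longest slot names first, so 'X-from' would not clobber 'X-from_stop'.
    for slot, value in sorted(slots.items(), key=lambda kv: -len(kv[0])):
        text = text.replace(f"X-{slot}", value)
    return text


if __name__ == "__main__":
    slots = {
        "departure_time": "8:01am",
        "line": "M15",
        "from_stop": "Bowling Green",
        "direction": "Cathedral Parkway",
    }
    sentence = "At 8:01am by bus line M15 from Bowling Green to Cathedral Parkway."
    template = delexicalize(sentence, slots)
    print(template)
    print(lexicalize(template, slots))
```

Training on such templates lets one example cover every stop name or departure time, which matters for a dataset of a few thousand paraphrases.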

Output Examples
Context: is there a later option
Input DA: iconfirm(alternative=next)
   Baseline: Next connection.
   n-gram match reranker: You want a later connection.
   Prepending context + n-gram match reranker: You want a later connection.
   Context encoder + n-gram match reranker: You want a later option.

Context: i need to find a bus connection
Input DA: inform_no_match(vehicle=bus)
   Baseline: No bus found, sorry.
   n-gram match reranker: I did not find a bus route.
   Prepending context + n-gram match reranker: I'm sorry, I cannot find a bus connection.
   Context encoder + n-gram match reranker: I'm sorry, I cannot find a bus connection.


Output Examples
Context: i rather take the bus
Input DA: inform(vehicle=bus, departure_time=8:01am, direction=Cathedral Parkway, from_stop=Bowling Green, line=M15)
   Baseline: At 8:01am by bus line M15 from Bowling Green to Cathedral Parkway.
   n-gram match reranker: At 8:01am by bus line M15 from Bowling Green to Cathedral Parkway.
   Prepending context + n-gram match reranker: You can take the M15 bus from Bowling Green to Cathedral Parkway at 8:01am.
   Context encoder + n-gram match reranker: At 8:01am by bus line M15 from Bowling Green to Cathedral Parkway.

Conclusion
Our System…
✓ works with unaligned data
   • better than our previous work on the BAGEL set
✓ produces valid outputs even with limited training data
✓ allows comparing 2-step & joint NLG
   • generates sentences / trees
✓ is the first trainable system capable of entrainment
   • entrainment better than baseline
Future Ideas
• Lexicalized generation
• Longer context + better n-gram matching
• Integrate into an end-to-end SDS


Thank you for your attention
Download it!
• Code: bit.ly/tgen_nlg
• Dataset: bit.ly/nlgdata
Contact me
Ondřej Dušek, o.dusek@hw.ac.uk, EM 1.56

Two-Step and Joint NLG Setups
• NLG pipeline traditionally divided into:
   1. sentence planning: decide on the overall sentence structure
   2. surface realization: decide on specific word forms, linearize
• some NLG systems join this into a single step
• two-step setup simplifies structure generation by abstracting away from surface grammar
• joint setup avoids error accumulation over a pipeline
• we can do both in one system
[Figure, two-step setup: MR inform(name=X-name, type=placetoeat, eattype=restaurant, area=riverside, food=Italian) -> sentence planning -> sentence plan (t-tree, zone=en: be v:fin; X-name n:subj; restaurant n:obj; Italian adj:attr; river n:near+X) -> surface realization -> surface text "X is an Italian restaurant near the river." In the joint setup, the same MR is mapped directly to the surface text.]



System Workflow
• main generator based on sequence-to-sequence NNs
   • Encoder + Decoder + Attention + Beam search + Reranker
• input: tokenized DAs
• output:
   • joint mode: sentences
   • 2-step mode: deep syntax trees, in bracketed format
• 2-step mode: deep syntax trees post-processed by a surface realizer
Bracketed tree example:
   ( <root> <root> ( ( X-name n:subj ) be v:fin ( ( Italian adj:attr ) restaurant n:obj ( river n:near+X ) ) ) )
[Figure: MR inform(name=X-name, type=placetoeat, eattype=restaurant, area=riverside, food=Italian) -> our seq2seq generator -> either the sentence plan (t-tree, zone=en: be v:fin; X-name n:subj; restaurant n:obj; Italian adj:attr; river n:near+X) followed by surface realization, or directly the surface text "X is an Italian restaurant near the river."]
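To make the bracketed tree format more concrete, the following sketch parses such a string back into a tree. It is an illustration under assumptions, not the system's own code: the `Node` class and function names are hypothetical, and the reading of the format (each bracket holds one lemma and one formeme, with child brackets before the head as left children and after it as right children) is inferred from the single example on the slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    lemma: str = ""
    formeme: str = ""
    left: list[Node] = field(default_factory=list)   # children before the head
    right: list[Node] = field(default_factory=list)  # children after the head


def _parse_node(tokens: list[str], pos: int) -> tuple[Node, int]:
    if pos >= len(tokens) or tokens[pos] != "(":
        raise ValueError(f"expected '(' at token {pos}")
    pos += 1
    node, seen_head = Node(), False
    while pos < len(tokens) and tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = _parse_node(tokens, pos)
            (node.right if seen_head else node.left).append(child)
        else:
            if seen_head or pos + 1 >= len(tokens):
                raise ValueError(f"unexpected token {tokens[pos]!r} at {pos}")
            node.lemma, node.formeme = tokens[pos], tokens[pos + 1]
            seen_head = True
            pos += 2
    if pos >= len(tokens):
        raise ValueError("missing ')'")
    return node, pos + 1  # skip the closing bracket


def parse_tree(text: str) -> Node:
    """Parse a whitespace-tokenized bracketed tree into nested Node objects."""
    tokens = text.split()
    node, pos = _parse_node(tokens, 0)
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return node


def lemmas_in_order(node: Node) -> list[str]:
    """Lemmas in left-to-right order: left children, head, right children."""
    out = [l for child in node.left for l in lemmas_in_order(child)]
    out.append(node.lemma)
    out += [l for child in node.right for l in lemmas_in_order(child)]
    return out


if __name__ == "__main__":
    tree = parse_tree(
        "( <root> <root> ( ( X-name n:subj ) be v:fin "
        "( ( Italian adj:attr ) restaurant n:obj ( river n:near+X ) ) ) )"
    )
    print(lemmas_in_order(tree))
```

Because the tree is written as a flat token sequence, the same decoder can emit either sentences or trees; in 2-step mode a surface realizer then turns the lemmas and formemes into inflected words.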


Sample Outputs
Input DA: inform(name=X-name, type=placetoeat, eattype=restaurant, area=citycentre, near=X-near, food="Chinese takeaway", food=Japanese)
Reference: X is a Chinese takeaway and Japanese restaurant in the city centre near X.
Greedy with trees: X is a restaurant offering chinese takeaway in the centre of town near X. [Japanese]
   + Beam search: X is a restaurant and japanese food and chinese takeaway.
   + Reranker: X is a restaurant serving japanese food in the centre of the city that offers chinese takeaway.
Greedy into strings: X is a restaurant offering italian and indian takeaway in the city centre area near X. [Japanese, Chinese]
   + Beam search: X is a restaurant that serves fusion chinese takeaway in the riverside area near X. [Japanese, citycentre]
   + Reranker: X is a japanese restaurant in the city centre near X providing chinese food. [takeaway]


Sequence-to-Sequence Natural Language Generation
Ondřej Dušek, work done with Filip Jurčíček at Charles University in Prague
November 15, 2016, Interaction Lab meeting
