Text Generation with Exemplar-based Adaptive Decoding Hao Peng, Ankur Parikh, Manaal Faruqui, Bhuwan Dhingra, Dipanjan Das @NAACL June 4, 2019

Outline ❖ Background and Overview ❖ Adaptive Decoding ❖ Experiments

Conditioned Text Generation • Source x: A Portuguese train derailed in Oporto on Wednesday, killing three people. • Target y: Portuguese train derailed, killing three.

Conditioned Text Generation • Source x: A Portuguese train derailed in Oporto on Wednesday, killing three people. • Encoder: $\mathrm{Enc}(x)$ • Attention + Copy ($\alpha$) • Decoder: $\mathrm{Dec}(y \mid \mathrm{Enc}(x))$ • Target y: Portuguese train derailed, killing three.

Exemplar-informed Generation • Source x: A Portuguese train derailed in Oporto on Wednesday, killing three people • Retrieved exemplar z (a training target): Two die in a Britain train collision • Goal y: Three die in a Portuguese train derailment (Guu et al., 2017; Cao et al., 2018)

Exemplar-informed Generation • Source x (what to say): A Portuguese train derailed in Oporto on Wednesday, killing three people • Retrieved exemplar z, a training target (how to say it): Two die in a Britain train collision • Motivation: better performance (Cao et al., 2018; Zhang et al., 2018); diversity and interpretability (Guu et al., 2017; Wiseman et al., 2018) • Goal y: Three die in a Portuguese train derailment (Guu et al., 2017; Cao et al., 2018)

Exemplar-informed Generation • Source x (what to say): A Portuguese train derailed in Oporto on Wednesday, killing three people • Retrieved exemplar z, a training target (how to say it): Two die in a Britain train collision • Training pairs: $(x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4), \dots$ • Goal y: Three die in a Portuguese train derailment (Guu et al., 2017; Cao et al., 2018)

Exemplar-informed Generation • Source x (what to say): A Portuguese train derailed in Oporto on Wednesday, killing three people • Retrieved exemplar z, a training target (how to say it): Two die in a Britain train collision • Training pairs: $(x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4)$; the exemplar is taken from a pair similar to the source • Goal y: Three die in a Portuguese train derailment (Guu et al., 2017; Cao et al., 2018)

Status Quo • Concatenate source and exemplar: $[x; z]$ • Source x: A Portuguese train derailed in Oporto on Wednesday, killing three people. • Exemplar z: Two die in a Britain train collision. • Encoder: $\mathrm{Enc}([x; z])$ • Goal y: Three die in a Portuguese train derailment (Guu et al., 2017; Cao et al., 2018)

Status Quo • Concatenate source and exemplar: $[x; z]$ • Source x: A Portuguese train derailed in Oporto on Wednesday, killing three people. • Exemplar z: Two die in a Britain train collision. • Encoder: $\mathrm{Enc}([x; z])$ • Attention + copy ($\alpha$) • Decoder: $\mathrm{Dec}(y \mid \mathrm{Enc}([x; z]))$ • Goal y: Three die in a Portuguese train derailment (Guu et al., 2017; Cao et al., 2018)

Status Quo • Concatenate source and exemplar: $[x; z]$ • Source x: A Portuguese train derailed in Oporto on Wednesday, killing three people. • Exemplar z: Two die in a Britain train collision. • Encoder (what to say): $\mathrm{Enc}([x; z])$ • Attention + copy ($\alpha$) • Decoder (how to say it): $\mathrm{Dec}(y \mid \mathrm{Enc}([x; z]))$ • Goal y: Three die in a Portuguese train derailment (Guu et al., 2017; Cao et al., 2018)

Status Quo • Concatenate source and exemplar: $[x; z]$ • Source x: A Portuguese train derailed in Oporto on Wednesday, killing three people. • Exemplar z: Two die in a Britain train collision. • Encoder: $\mathrm{Enc}([x; z])$ • Attention + copy ($\alpha$) • Decoder: $\mathrm{Dec}(y \mid \mathrm{Enc}([x; z]))$ • Output: Britain train derailed, killing two. (Guu et al., 2017; Cao et al., 2018)

Overview Motivation: • Encoder: what to say • Decoder: how to say it • Model: $P(y \mid \text{source } x, \text{exemplar } z)$ Method: Adaptive Decoding • Exemplar-specific decoder $\mathrm{AdaDec}_z$

Overview Motivation: • Encoder: what to say • Decoder: how to say it • Model: $P(y \mid \text{source } x, \text{exemplar } z)$ Method: Adaptive Decoding • Exemplar-specific decoder $\mathrm{AdaDec}_z$ • Drop-in replacement in seq2seq

Overview Motivation: • Encoder: what to say • Decoder: how to say it • Model: $P(y \mid \text{source } x, \text{exemplar } z)$ Method: Adaptive Decoding • Exemplar-specific decoder $\mathrm{AdaDec}_z$ • Drop-in replacement in seq2seq Experiments: • Summarization • Data2text generation

Adaptive Decoder Goal: • Customized decoder $\mathrm{AdaDec}_z$ for each exemplar.

Adaptive Decoder Goal: • Customized decoder $\mathrm{AdaDec}_z$ for each exemplar. Key Points: • Exemplar-informed interpolation of backbones. [Diagram: a set of backbones is interpolated, guided by exemplar z, into $\mathrm{AdaDec}_z$.]

Adaptive Decoder Goal: • Customized decoder $\mathrm{AdaDec}_z$ for each exemplar. Key Points: • Exemplar-informed interpolation of backbones. • Low-rank constraints by construction. [Diagram: a set of backbones is interpolated, guided by exemplar z, into $\mathrm{AdaDec}_z$.]

Adaptive Decoder • Backbones: $(W_1, c_1), (W_2, c_2), (W_3, c_3)$

Adaptive Decoder • Backbones: $(W_1, c_1), (W_2, c_2), (W_3, c_3)$ • $\mathrm{AdaDec}_z$: $W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$

Adaptive Decoder • Backbones: $(W_1, c_1), (W_2, c_2), (W_3, c_3)$ • Input: exemplar z • $\mathrm{AdaDec}_z$: $W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$

Adaptive Decoder • Backbones: $(W_1, c_1), (W_2, c_2), (W_3, c_3)$ • $p = \mathrm{RNN}(z)$ • $[\sigma_1, \sigma_2, \sigma_3]^\top = [p^\top c_1, p^\top c_2, p^\top c_3]^\top$ • $W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$
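
The interpolation on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the exemplar encoding p is supplied directly as a stand-in for RNN(z), the helper names (`mixing_coefficients`, `interpolate`) are hypothetical, and the toy numbers are made up.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mixing_coefficients(p, cs):
    """sigma_i = p^T c_i for each vector c_i paired with a backbone."""
    return [dot(p, c) for c in cs]


def interpolate(sigmas, backbones):
    """W = sum_i sigma_i * W_i, computed entry by entry."""
    rows, cols = len(backbones[0]), len(backbones[0][0])
    return [
        [sum(s * W[r][c] for s, W in zip(sigmas, backbones)) for c in range(cols)]
        for r in range(rows)
    ]


# Toy values (assumptions for illustration only).
p = [1.0, 0.0]  # stands in for p = RNN(z)
cs = [[0.5, 1.0], [0.25, 0.0], [0.0, 2.0]]  # c_1, c_2, c_3
backbones = [
    [[1.0, 0.0], [0.0, 1.0]],  # W_1
    [[0.0, 2.0], [2.0, 0.0]],  # W_2
    [[1.0, 1.0], [1.0, 1.0]],  # W_3
]

sigmas = mixing_coefficients(p, cs)
W = interpolate(sigmas, backbones)
```

Because the coefficients depend only on the exemplar, W can be built once per exemplar and then reused at every decoding step.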

Low-rank Constraints • $W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$

Low-rank Constraints • $W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$ • Too many params!

Low-rank Constraints • $W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$ with $W_i = u_i v_i^\top$ • $W = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \sigma_3 u_3 v_3^\top$

Low-rank Constraints • $W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$ with $W_i = u_i v_i^\top$ • $W = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \sigma_3 u_3 v_3^\top$ • Each term $u_i v_i^\top$ has rank 1; the sum has rank $\le 3$

Low-rank Constraints • $W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$ with $W_i = u_i v_i^\top$ • $W = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \sigma_3 u_3 v_3^\top + \cdots$ • Each term $u_i v_i^\top$ has rank 1; with $m = d$ terms the sum has rank $\le d$

Low-rank Constraints • $W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$ with $W_i = u_i v_i^\top$ • $W = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \sigma_3 u_3 v_3^\top + \cdots$ • Each term $u_i v_i^\top$ has rank 1; with $m = d$ terms the sum has rank $\le d$ • $O(d^3) \to O(d^2)$
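
A small sketch of what the rank-1 construction buys, assuming square d-by-d backbones and reading the $O(d^3) \to O(d^2)$ on the slide as a parameter count. The function names are hypothetical and the numbers are toy values; the counts ignore the vectors $c_i$.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def adaptive_matvec(sigmas, us, vs, x):
    """Compute y = W x for W = sum_i sigma_i * u_i v_i^T without building W."""
    y = [0.0] * len(us[0])
    for s, u, v in zip(sigmas, us, vs):
        scale = s * dot(v, x)  # sigma_i * (v_i^T x) is a scalar
        for r in range(len(y)):
            y[r] += scale * u[r]
    return y


def backbone_param_counts(d, m):
    """Parameters in m full d-by-d backbones vs. m rank-1 backbones (u_i, v_i)."""
    full = m * d * d       # with m = d this is d^3
    rank_one = m * 2 * d   # with m = d this is 2 * d^2
    return full, rank_one


# Toy values (assumptions for illustration only).
us = [[1.0, 0.0], [0.0, 1.0]]
vs = [[1.0, 1.0], [1.0, -1.0]]
sigmas = [2.0, 0.5]
y = adaptive_matvec(sigmas, us, vs, [3.0, 1.0])
counts = backbone_param_counts(d=4, m=4)
```

With m = d backbones, storing full matrices costs d^3 parameters, while storing only the u_i and v_i vectors costs 2d^2.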

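Walkthrough • Retrieve exemplar z (a training target) for source x

Walkthrough • Retrieve exemplar z (a training target) for source x • $p = \mathrm{RNN}(z)$ • $[\sigma_1, \sigma_2, \sigma_3]^\top = [p^\top c_1, p^\top c_2, p^\top c_3]^\top$

Walkthrough • Retrieve exemplar z (a training target) for source x • $p = \mathrm{RNN}(z)$ • $[\sigma_1, \sigma_2, \sigma_3]^\top = [p^\top c_1, p^\top c_2, p^\top c_3]^\top$ • Interpolate the backbones with weights $\sigma_1, \sigma_2, \sigma_3$

Walkthrough • Retrieve exemplar z (a training target) for source x • $p = \mathrm{RNN}(z)$ • $[\sigma_1, \sigma_2, \sigma_3]^\top = [p^\top c_1, p^\top c_2, p^\top c_3]^\top$ • Interpolate the backbones with weights $\sigma_1, \sigma_2, \sigma_3$ to obtain $\mathrm{AdaDec}_z$ • Encode the source with Enc, attend ($\alpha$), and decode with $\mathrm{AdaDec}_z$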

Experiments: Summarization Datasets: • Gigaword. Rush et al., 2015 • New York Times (NYT). Durrett et al., 2016 Implementation: • TF-IDF + cosine similarity for exemplar retrieval. • LSTM encoder/decoder. • Comparable implementation and tuning.
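
The retrieval step (TF-IDF + cosine similarity) might look roughly like the sketch below. It assumes whitespace tokenization, a smoothed IDF, and that the exemplar is the target of the training pair whose source is most similar to the query; none of these details are given on the slide, and the function names and toy training pairs are made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors (dicts) for docs, plus the IDF table."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (an assumption): log((1 + n) / (1 + df)) + 1.
    idf = {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}
    vectors = [
        {tok: tf * idf[tok] for tok, tf in Counter(toks).items()}
        for toks in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    num = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return num / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_exemplar(source, train_pairs):
    """Return the target of the training pair whose source best matches `source`."""
    vectors, idf = tfidf_vectors([x for x, _ in train_pairs])
    counts = Counter(source.lower().split())
    # Tokens unseen in the training sources are dropped from the query.
    query = {tok: tf * idf[tok] for tok, tf in counts.items() if tok in idf}
    best = max(range(len(train_pairs)), key=lambda i: cosine(query, vectors[i]))
    return train_pairs[best][1]


# Toy training pairs (made up for illustration).
train_pairs = [
    ("two trains collided in britain on monday killing two people",
     "two die in a britain train collision"),
    ("stocks rose sharply on wall street on friday",
     "stocks rally on wall street"),
]
exemplar = retrieve_exemplar(
    "a portuguese train derailed in oporto on wednesday killing three people",
    train_pairs,
)
```

On these toy pairs the train-accident source shares more weighted terms with the query than the stock-market one, so its target is returned as the exemplar.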

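Rouge scores on Gigaword test set [Bar chart: ROUGE-1, ROUGE-2, and ROUGE-L. System labels: Seq2seq; AttExp, Cao; AdaDec, this work; Cao, Full (Cao et al., 2018). Annotations: Enc. & Att. Exemplar; Adaptive Decoding; Rerank. Bar values in extraction order, not matched to systems: 37.3, 37.0, 36.0, 35.0, 34.7, 34.5, 33.2, 32.4 (ROUGE-1 and ROUGE-L); 19.0, 18.5, 17.1, 16.6 (ROUGE-2).]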

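Rouge scores on NYT test set [Bar chart: ROUGE-1 and ROUGE-2. System labels: Seq2seq; Paulus (Paulus et al., 2018); AttExp; AdaDec. Annotations: Enc. & Att. Exemplar; Adaptive Decoding. Bar values in extraction order, not matched to systems: 43.2, 42.9, 42.5, 41.9 (ROUGE-1); 26.4, 26.0, 25.7, 25.1 (ROUGE-2).]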
