Text Generation with Exemplar-based Adaptive Decoding
Hao Peng, Ankur Parikh, Manaal Faruqui, Bhuwan Dhingra, Dipanjan Das
NAACL, June 4, 2019
Outline
❖ Background and Overview
❖ Adaptive Decoding
❖ Experiments
Conditioned Text Generation
Source x: A Portuguese train derailed in Oporto on Wednesday, killing three people.
Target y: Portuguese train derailed, killing three.
Attention + Copy
Encoder: $\mathrm{Enc}(x)$
Decoder: $\mathrm{Dec}(y \mid \mathrm{Enc}(x))$, with attention weights $\alpha$ over the encoder states
Exemplar-informed Generation (Guu et al., 2017; Cao et al., 2018)
Source x: A Portuguese train derailed in Oporto on Wednesday, killing three people.
Exemplar z (retrieved training target): Two die in a Britain train collision.
Goal y: Three die in a Portuguese train derailment.
Motivation
- Better performance (Cao et al., 2018; Zhang et al., 2018)
- Diversity and interpretability (Guu et al., 2017; Wiseman et al., 2018)
Exemplar-informed Generation (Guu et al., 2017; Cao et al., 2018)
Training pairs: (x1, y1), (x2, y2), (x3, y3), (x4, y4)
Retrieve: the target of a training pair similar to the source becomes the exemplar z.
- Source x (what to say): A Portuguese train derailed in Oporto on Wednesday, killing three people.
- Exemplar z (how to say it): Two die in a Britain train collision.
- Goal y: Three die in a Portuguese train derailment.
Status Quo (Guu et al., 2017; Cao et al., 2018)
Concatenate the source x (what to say) and the exemplar z (how to say it) into [x; z] and encode them together:
[x; z]: A Portuguese train derailed in Oporto on Wednesday, killing three people. Two die in a Britain train collision.
Encoder: $\mathrm{Enc}([x; z])$
Decoder (attention + copy, weights $\alpha$): $\mathrm{Dec}(y \mid \mathrm{Enc}([x; z]))$
Goal y: Three die in a Portuguese train derailment.
Output: Britain train derailed, killing two.
The output takes its content ("Britain", "two") from the exemplar instead of the source.
Overview
Motivation
- Encoder: what to say
- Decoder: how to say it
Method: Adaptive Decoding
- Exemplar-specific decoder
- Drop-in replacement in seq2seq
Experiments
- Summarization
- Data2text generation
Model: $P(y \mid x, z)$, with source $x$ and exemplar $z$, is produced by $\mathrm{AdaDec}_z$ attending ($\alpha$) over the encoded $x$.
Adaptive Decoder
Goal
- Customized decoder for each exemplar.
Key Points
- Exemplar-informed interpolation of backbones.
- Low-rank constraints by construction.
Diagram: backbones → interpolation (informed by $z$) → $\mathrm{AdaDec}_z$
Adaptive Decoder
Backbones: $(c_1, W_1), (c_2, W_2), (c_3, W_3)$
$p = \mathrm{RNN}(z)$
$[\sigma_1, \sigma_2, \sigma_3] = [p^\top c_1, p^\top c_2, p^\top c_3]$
$W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$
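The interpolation on this slide can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the exemplar encoder is a toy tanh RNN standing in for RNN(z), the exemplar embeddings are random, and the sizes `d`, `m` and all function names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 4, 3                      # hidden size and number of backbones (assumed)

# Backbones: each pairs a key vector c_i with a full weight matrix W_i.
C = rng.normal(size=(m, d))      # rows are c_1..c_m
Ws = rng.normal(size=(m, d, d))  # W_1..W_m


def encode_exemplar(z_embeddings, W_in, W_rec):
    """Toy tanh RNN standing in for p = RNN(z); returns the last hidden state."""
    h = np.zeros(W_rec.shape[0])
    for e in z_embeddings:
        h = np.tanh(W_in @ e + W_rec @ h)
    return h


def interpolate(p, C, Ws):
    """sigma_i = p^T c_i, then W = sum_i sigma_i * W_i."""
    sigma = C @ p                        # shape (m,)
    W = np.tensordot(sigma, Ws, axes=1)  # contracts over the backbone axis -> (d, d)
    return sigma, W


z = rng.normal(size=(5, d))              # five exemplar token embeddings (random stand-ins)
W_in = 0.5 * rng.normal(size=(d, d))
W_rec = 0.5 * rng.normal(size=(d, d))

p = encode_exemplar(z, W_in, W_rec)
sigma, W = interpolate(p, C, Ws)
```

Because the coefficients depend only on the exemplar, a different z yields a different W while the backbones stay shared.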
Low-rank Constraints
$W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$
Too many params!
Replace each $W_i$ with a rank-1 matrix $u_i v_i^\top$:
$W = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \sigma_3 u_3 v_3^\top + \cdots$
- Each term $u_i v_i^\top$: rank = 1
- Three terms: rank ≤ 3
- $m = d$ terms: rank ≤ $d$
- Parameters: $O(d^3) \to O(d^2)$
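The rank and parameter-count claims can be checked numerically. The sketch below uses random vectors and a small `d` chosen only for illustration; the matrix names `U` and `V` (stacking the $u_i$ and $v_i$ as columns) are assumptions of this example, and the parameter counts ignore the key vectors $c_i$.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6
m = d                          # number of rank-1 backbones, m = d as on the slide

U = rng.normal(size=(d, m))    # columns are u_1..u_m
V = rng.normal(size=(d, m))    # columns are v_1..v_m
sigma = rng.normal(size=m)     # interpolation coefficients

# W = sum_i sigma_i * u_i v_i^T, written compactly as U diag(sigma) V^T.
W = (U * sigma) @ V.T

# The same matrix built term by term from rank-1 outer products.
W_terms = sum(sigma[i] * np.outer(U[:, i], V[:, i]) for i in range(m))

rank_one = np.linalg.matrix_rank(np.outer(U[:, 0], V[:, 0]))
rank_three = np.linalg.matrix_rank((U[:, :3] * sigma[:3]) @ V[:, :3].T)

full_params = m * d * d        # m full d x d backbones: O(d^3) when m = d
low_rank_params = m * 2 * d    # m pairs (u_i, v_i):     O(d^2) when m = d
```

With `d = 6` the full backbones need 216 parameters and the rank-1 backbones need 72, which is the cubic-to-quadratic reduction stated on the slide.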
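Walkthrough
- Retrieve: for the source x, retrieve a training target as the exemplar z.
- Encode the exemplar: $p = \mathrm{RNN}(z)$, $[\sigma_1, \sigma_2, \sigma_3] = [p^\top c_1, p^\top c_2, p^\top c_3]$.
- Build $\mathrm{AdaDec}_z$ as the $\sigma$-weighted sum of the backbones.
- Decode with $\mathrm{AdaDec}_z$, attending ($\alpha$) over the encoder states Enc(x).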
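One consequence of the rank-1 backbones is that the exemplar-specific matrix never has to be materialized when it is applied to a vector. The sketch below shows this for a single simplified recurrent update. It is an assumption-laden illustration: the slides use an LSTM decoder and do not say where exactly the adaptive matrix enters it, so the plain tanh update, the helper `adaptive_matvec`, and the random inputs are all hypothetical.

```python
import numpy as np


def adaptive_matvec(U, V, sigma, h):
    """Compute (U diag(sigma) V^T) h without forming the d x d matrix."""
    return U @ (sigma * (V.T @ h))


rng = np.random.default_rng(1)
d, m = 5, 5
U = rng.normal(size=(d, m))    # columns u_1..u_m
V = rng.normal(size=(d, m))    # columns v_1..v_m
C = rng.normal(size=(m, d))    # keys c_1..c_m
p = rng.normal(size=d)         # stand-in for p = RNN(z)
sigma = C @ p                  # sigma_i = p^T c_i
h_prev = rng.normal(size=d)    # previous decoder state (random stand-in)

# One simplified recurrent update that uses the exemplar-specific matrix.
h_next = np.tanh(adaptive_matvec(U, V, sigma, h_prev))
```

Each call costs two thin matrix-vector products, the same order of work as a dense d x d product when m = d, but without building a new matrix for every exemplar.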
Experiments: Summarization
Datasets:
- Gigaword. Rush et al., 2015
- New York Times (NYT). Durrett et al., 2016
Implementation:
- TF-IDF + cosine similarity for exemplar retrieval.
- LSTM encoder/decoder.
- Comparable implementation and tuning.
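Exemplar retrieval with TF-IDF and cosine similarity can be sketched with the standard library alone. The slide does not say which text is vectorized or how IDF is smoothed, so this example assumes retrieval over training sources and a smoothed IDF; the training sources below are invented for illustration, and `retrieve_exemplar` is a hypothetical helper name.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (list of sparse TF-IDF dicts, idf table) for tokenized documents."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumption of this sketch).
    idf = {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}
    vecs = [{tok: tf * idf[tok] for tok, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_exemplar(source, train_pairs):
    """Return the target of the training pair whose source is most similar."""
    train_sources = [x.lower().split() for x, _ in train_pairs]
    vecs, idf = tfidf_vectors(train_sources)
    # Tokens unseen in training get weight 0 and do not affect the match.
    counts = Counter(source.lower().split())
    query = {tok: tf * idf.get(tok, 0.0) for tok, tf in counts.items()}
    best = max(range(len(train_pairs)), key=lambda i: cosine(query, vecs[i]))
    return train_pairs[best][1]


# Invented training sources paired with short targets (illustration only).
train_pairs = [
    ("Two people were killed when a train collided in Britain on Monday",
     "Two die in a Britain train collision"),
    ("Stocks rallied in Tokyo after the central bank cut rates",
     "Tokyo stocks rally on rate cut"),
]
source = "A Portuguese train derailed in Oporto on Wednesday, killing three people."
exemplar = retrieve_exemplar(source, train_pairs)
```

Here the train-related pair wins because it shares more weighted tokens with the source, so its target is returned as the exemplar.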
Rouge scores on Gigaword test set

System                                   ROUGE-1   ROUGE-2   ROUGE-L
Seq2seq                                  35.0      16.6      32.4
AttExp, Cao (Enc. & Att. Exemplar)       36.0      17.1      33.2
AdaDec, this work (Adaptive Decoding)    37.3      18.5      34.7
Cao, Full (Rerank)                       37.0      19.0      34.5

Cao: Cao et al., 2018
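For readers unfamiliar with the metric, ROUGE-N measures n-gram overlap between a system output and a reference. The sketch below is a simplified version on lowercased whitespace tokens; it is not the official ROUGE toolkit, it skips stemming and other preprocessing, and the slide does not state which ROUGE variant the reported scores use. The function name `rouge_n` is an assumption of this example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N: clipped n-gram overlap as (precision, recall, F1)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())   # counts clipped to the reference
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return precision, recall, 2 * precision * recall / (precision + recall)


reference = "Three die in a Portuguese train derailment"
output = "Three killed in a Portuguese train derailment"
p1, r1, f1 = rouge_n(output, reference, n=1)
p2, r2, f2 = rouge_n(output, reference, n=2)
```

On this pair, six of seven unigrams and four of six bigrams match, so the bigram score is noticeably lower than the unigram score even though only one word differs.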
Rouge scores on NYT test set

System                           ROUGE-1   ROUGE-2
Seq2seq                          41.9      25.1
Paulus (Paulus et al., 2018)     42.9      26.0
AttExp (Enc. & Att. Exemplar)    42.5      25.7
AdaDec (Adaptive Decoding)       43.2      26.4
Experiments: Data2text
Dataset:
- WikiBio. Lebret et al., 2016
Implementation:
- TF-IDF + cosine similarity for retrieval.
- LSTM encoder/decoder. See et al., 2017
- Comparable implementation and tuning.
Input: Wikipedia infobox (figure not preserved)
Output: Jacques-Louis David (30 August 1748 – 29 December 1825) was a French painter in the Neoclassical style.
WikiBio test performance

System                                   ROUGE-4   BLEU
Seq2seq                                  39.3      42.5
Wiseman (Wiseman et al., 2018)           38.6      34.8
AttExp (Enc. & Att. Exemplar)            40.0      43.1
AdaDec, this work (Adaptive Decoding)    40.6      43.6
Conclusion
Problem: $P(y \mid x, z)$
Method: $\mathrm{AdaDec}_z(y \mid \mathrm{Enc}(x))$
Results
Source x: A Portuguese train derailed in Oporto on Wednesday, killing three people.
Target y: Portuguese train derailed, killing three.
- Exemplar: Two die in a Britain train collision
  Output: Three killed in a Portuguese train derailment
- Exemplar: Two people were killed in a Britain train collision
  Output: Three people were killed in a Portuguese train derailment
- Exemplar: A train collision in Canada killed two people
  Output: A Portugueses train derails in northern Mexico, killed three
Thank You!
shorturl.at/kzAGY