Text Generation with Exemplar-based Adaptive Decoding

Hao Peng, Ankur Parikh, Manaal Faruqui, Bhuwan Dhingra, Dipanjan Das @NAACL June 4, 2019


SLIDE 1

Text Generation with Exemplar-based Adaptive Decoding

Hao Peng, Ankur Parikh, Manaal Faruqui, Bhuwan Dhingra, Dipanjan Das @NAACL June 4, 2019

SLIDE 2

❖ Background and Overview ❖ Adaptive Decoding ❖ Experiments

Outline

SLIDE 3

Conditioned Text Generation

Source x: A Portuguese train derailed in Oporto on Wednesday, killing three people.

Target y: Portuguese train derailed, killing three.

SLIDE 4

Conditioned Text Generation

Source x: A Portuguese train derailed in Oporto on Wednesday, killing three people.

Target y: Portuguese train derailed, killing three.

Attention + Copy

[Figure: encoder-decoder with attention weights $\alpha$]

Encoder: $\mathrm{Enc}(x)$

Decoder: $\mathrm{Dec}(y \mid \mathrm{Enc}(x))$

SLIDE 5

Exemplar-informed Generation (Guu et al., 2017; Cao et al., 2018)

Source x: A Portuguese train derailed in Oporto on Wednesday, killing three people

Exemplar z (retrieved training target): Two die in a Britain train collision

Goal y: Three die in a Portuguese train derailment

SLIDE 6

Exemplar-informed Generation (Guu et al., 2017; Cao et al., 2018)

Source x (what to say): A Portuguese train derailed in Oporto on Wednesday, killing three people

Exemplar z (how to say it; retrieved training target): Two die in a Britain train collision

Goal y: Three die in a Portuguese train derailment

Motivation

  • Better performance (Cao et al., 2018; Zhang et al., 2018)
  • Diversity and interpretability (Guu et al., 2017; Wiseman et al., 2018)

SLIDE 7

Exemplar-informed Generation (Guu et al., 2017; Cao et al., 2018)

Source x (what to say): A Portuguese train derailed in Oporto on Wednesday, killing three people

Training pairs: (x1, y1), (x2, y2), (x3, y3), (x4, y4), ...

Exemplar z (how to say it; retrieved training target): Two die in a Britain train collision

Goal y: Three die in a Portuguese train derailment

SLIDE 8

Exemplar-informed Generation (Guu et al., 2017; Cao et al., 2018)

Source x (what to say): A Portuguese train derailed in Oporto on Wednesday, killing three people

Training pairs: (x1, y1), (x2, y2), (x3, y3), (x4, y4)

Similar: x is matched to a similar training pair, whose training target is retrieved as the exemplar.

Exemplar z (how to say it; retrieved training target): Two die in a Britain train collision

Goal y: Three die in a Portuguese train derailment

SLIDE 9

Exemplar-informed Generation (Guu et al., 2017; Cao et al., 2018)

Source x (what to say): A Portuguese train derailed in Oporto on Wednesday, killing three people

Training pairs: (x1, y1), (x2, y2), (x3, y3), (x4, y4)

Similar: x is matched to a similar training pair, whose training target is retrieved as the exemplar.

Exemplar z (how to say it; retrieved training target): Two die in a Britain train collision

Goal y: Three die in a Portuguese train derailment

SLIDE 10

Status Quo (Guu et al., 2017; Cao et al., 2018)

Source x: A Portuguese train derailed in Oporto on Wednesday, killing three people.

Exemplar z: Two die in a Britain train collision.

Concatenated input: [x; z]

Encoder: $\mathrm{Enc}([x; z])$

Goal y: Three die in a Portuguese train derailment

SLIDE 11

Status Quo (Guu et al., 2017; Cao et al., 2018)

Source x: A Portuguese train derailed in Oporto on Wednesday, killing three people.

Exemplar z: Two die in a Britain train collision.

Concatenated input: [x; z]

Attention + copy (attention weights $\alpha$)

Encoder: $\mathrm{Enc}([x; z])$

Decoder: $\mathrm{Dec}(y \mid \mathrm{Enc}([x; z]))$

Goal y: Three die in a Portuguese train derailment

SLIDE 12

Status Quo (Guu et al., 2017; Cao et al., 2018)

Source x (what to say): A Portuguese train derailed in Oporto on Wednesday, killing three people.

Exemplar z (how to say it): Two die in a Britain train collision.

Concatenated input: [x; z]

Attention + copy (attention weights $\alpha$)

Encoder: $\mathrm{Enc}([x; z])$

Decoder: $\mathrm{Dec}(y \mid \mathrm{Enc}([x; z]))$

Goal y: Three die in a Portuguese train derailment

SLIDE 13

Status Quo (Guu et al., 2017; Cao et al., 2018)

Source x: A Portuguese train derailed in Oporto on Wednesday, killing three people.

Exemplar z: Two die in a Britain train collision.

Concatenated input: [x; z]

Attention + copy (attention weights $\alpha$)

Encoder: $\mathrm{Enc}([x; z])$

Decoder: $\mathrm{Dec}(y \mid \mathrm{Enc}([x; z]))$

Output: Britain train derailed, killing two.

SLIDE 14

Overview

Motivation

  • Encoder: what to say
  • Decoder: how to say it

Method: Adaptive Decoding

  • Exemplar-specific decoder

P(y | x, z) with source x, exemplar z, and exemplar-specific decoder $\mathrm{AdaDec}_z$

SLIDE 15

Overview

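Motivation

  • Encoder: what to say
  • Decoder: how to say it

Method: Adaptive Decoding

  • Exemplar-specific decoder
  • Drop-in replacement in seq2seq

P(y | x, z) with source x, exemplar z, and exemplar-specific decoder $\mathrm{AdaDec}_z$

[Figure: seq2seq model over source x with attention weights $\alpha$ and $\mathrm{AdaDec}_z$ as the decoder]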

SLIDE 16

Overview

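Motivation

  • Encoder: what to say
  • Decoder: how to say it

Method: Adaptive Decoding

  • Exemplar-specific decoder
  • Drop-in replacement in seq2seq

Experiments

  • Summarization
  • Data2text generation

P(y | x, z) with source x, exemplar z, and exemplar-specific decoder $\mathrm{AdaDec}_z$

[Figure: seq2seq model over source x with attention weights $\alpha$ and $\mathrm{AdaDec}_z$ as the decoder]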

SLIDE 17

❖ Background and Overview ❖ Adaptive Decoding ❖ Experiments

Outline

SLIDE 18

Adaptive Decoder

Goal

  • Customized decoder for each exemplar.

$\mathrm{AdaDec}_z$

SLIDE 19

Adaptive Decoder

Goal

  • Customized decoder for each exemplar.

Key Points

  • Exemplar-informed interpolation of backbones.

[Figure: exemplar z drives an interpolation over a set of backbones, yielding $\mathrm{AdaDec}_z$]

SLIDE 20

Adaptive Decoder

Goal

  • Customized decoder for each exemplar.

Key Points

  • Exemplar-informed interpolation of backbones.
  • Low-rank constraints by construction.

[Figure: exemplar z drives an interpolation over a set of backbones, yielding $\mathrm{AdaDec}_z$]

SLIDE 21

Adaptive Decoder

$(c_1, W_1)$, $(c_2, W_2)$, $(c_3, W_3)$

SLIDE 22

Adaptive Decoder

$(c_1, W_1)$, $(c_2, W_2)$, $(c_3, W_3)$

$\mathrm{AdaDec}_z$: $W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$

SLIDE 23

Adaptive Decoder

$(c_1, W_1)$, $(c_2, W_2)$, $(c_3, W_3)$

[Figure: exemplar $z$ sets the coefficients $\sigma_1, \sigma_2, \sigma_3$]

$\mathrm{AdaDec}_z$: $W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$

SLIDE 24

Adaptive Decoder

$(c_1, W_1)$, $(c_2, W_2)$, $(c_3, W_3)$

$p = \mathrm{RNN}(z)$

$[\sigma_1, \sigma_2, \sigma_3]^\top = [p^\top c_1, p^\top c_2, p^\top c_3]^\top$

$W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$
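
The coefficient and interpolation equations on this slide can be traced numerically. What follows is a minimal NumPy sketch, not the authors' implementation: the sizes `d` and `m`, the random values, and the random vector standing in for p = RNN(z) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 4, 3                           # hidden size and number of backbones (illustrative)

p = rng.standard_normal(d)            # stand-in for p = RNN(z), the exemplar encoding
C = rng.standard_normal((m, d))       # row i is the vector c_i paired with backbone W_i
Ws = rng.standard_normal((m, d, d))   # backbone matrices W_1, ..., W_m

sigma = C @ p                         # sigma_i = p^T c_i
W = np.tensordot(sigma, Ws, axes=1)   # W = sum_i sigma_i * W_i, shape (d, d)

print(sigma.shape, W.shape)
```

Changing `p` (that is, changing the exemplar) changes `sigma` and therefore `W`, while the backbones `Ws` and vectors `C` stay shared across exemplars.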

SLIDE 26

Low-rank Constraints

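$W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$

[Figure: $W$ drawn as the sum of three full matrices weighted by $\sigma_1$, $\sigma_2$, $\sigma_3$]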

SLIDE 27

Low-rank Constraints

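$W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$

[Figure: $W$ drawn as the sum of three full matrices weighted by $\sigma_1$, $\sigma_2$, $\sigma_3$]

Too many params!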

SLIDE 28

Low-rank Constraints

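$W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$

$W = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \sigma_3 u_3 v_3^\top$

[Figure: each matrix in the sum replaced by an outer product of two vectors]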

SLIDE 29

Low-rank Constraints

$W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3$

$W = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \sigma_3 u_3 v_3^\top$

Each term $u_i v_i^\top$: Rank = 1. Their sum $W$: Rank ≤ 3.
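
Stacking the vectors into matrices makes the rank bound explicit. The matrices $U$ and $V$ are notation introduced here for illustration and do not appear on the slides; $m$ is the number of rank-1 terms (3 on this slide).

```latex
\[
W \;=\; \sum_{i=1}^{m} \sigma_i\, u_i v_i^\top
  \;=\; U \,\mathrm{diag}(\sigma)\, V^\top,
\qquad
U = [u_1, \dots, u_m], \quad V = [v_1, \dots, v_m],
\]
\[
\mathrm{rank}(W) \;\le\; m .
\]
```

Only $\sigma$ depends on the exemplar in this form; $U$ and $V$ are shared.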

SLIDE 30

Low-rank Constraints

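$W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3 + \cdots$

$W = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \sigma_3 u_3 v_3^\top + \cdots$

With $m = d$ terms: each term $u_i v_i^\top$ has Rank = 1, and their sum $W$ has Rank ≤ $d$.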

SLIDE 31

Low-rank Constraints

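$W = \sigma_1 W_1 + \sigma_2 W_2 + \sigma_3 W_3 + \cdots$

$W = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \sigma_3 u_3 v_3^\top + \cdots$

With $m = d$ terms: each term $u_i v_i^\top$ has Rank = 1, and their sum $W$ has Rank ≤ $d$.

Parameters: $O(d^3) \to O(d^2)$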
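
The saving can be checked with a small NumPy sketch. It is an illustration under stated assumptions rather than the authors' code: the matrices `U` and `V` (whose columns play the role of the vectors u_i and v_i), the input vector `h`, and the random values are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6
m = d                                   # the slides set m = d

U = rng.standard_normal((d, m))         # column i is u_i
V = rng.standard_normal((d, m))         # column i is v_i
sigma = rng.standard_normal(m)          # stand-in for the exemplar-dependent coefficients
h = rng.standard_normal(d)              # an arbitrary input vector

# Explicit construction: W = sum_i sigma_i * u_i v_i^T
W = (U * sigma) @ V.T
y_dense = W @ h

# Same product without ever forming W
y_factored = U @ (sigma * (V.T @ h))

# Parameter counts for m backbones
full_params = m * d * d                 # m full d x d matrices: d^3 when m = d
low_rank_params = m * 2 * d             # m pairs (u_i, v_i):    2 d^2 when m = d

print(np.allclose(y_dense, y_factored), full_params, low_rank_params)
```

The factored product also avoids materializing a new d x d matrix for every exemplar, which is one reason to prefer it in practice.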

SLIDE 32

Walkthrough

Source x

Exemplar z

Retrieve Training target

SLIDE 33

Walkthrough

Source x

Exemplar z

Retrieve Training target

$p = \mathrm{RNN}(z)$

$[\sigma_1, \sigma_2, \sigma_3]^\top = [p^\top c_1, p^\top c_2, p^\top c_3]^\top$

SLIDE 34

Walkthrough

Source x

Exemplar z

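Retrieve Training target

$p = \mathrm{RNN}(z)$

$[\sigma_1, \sigma_2, \sigma_3]^\top = [p^\top c_1, p^\top c_2, p^\top c_3]^\top$

[Figure: backbones summed with weights $\sigma_1$, $\sigma_2$, $\sigma_3$]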

SLIDE 35

Walkthrough

Source x

Exemplar z

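Retrieve Training target

$p = \mathrm{RNN}(z)$

$[\sigma_1, \sigma_2, \sigma_3]^\top = [p^\top c_1, p^\top c_2, p^\top c_3]^\top$

[Figure: encoder Enc with attention weights $\alpha$ feeding $\mathrm{AdaDec}_z$, whose weights are the backbones summed with weights $\sigma_1$, $\sigma_2$, $\sigma_3$]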
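
The walkthrough can be strung together in a few lines. This is a rough sketch under loud assumptions: a plain tanh RNN stands in for the exemplar encoder (the experiments mention LSTMs), all parameters are random, only one recurrent matrix is adapted, and the names `encode_exemplar` and `adaptive_step` are hypothetical helpers, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
m = d                                            # number of rank-1 backbones

def encode_exemplar(token_ids, E, A, B):
    """Run a plain tanh RNN over the exemplar tokens; the final state is p."""
    h = np.zeros(A.shape[0])
    for t in token_ids:
        h = np.tanh(A @ h + B @ E[t])
    return h

# Exemplar z from the slides, lower-cased and split on whitespace
words = "two die in a britain train collision".split()
vocab = {w: i for i, w in enumerate(words)}
z = [vocab[w] for w in words]

E = rng.standard_normal((len(vocab), d)) * 0.1   # token embeddings (random stand-ins)
A = rng.standard_normal((d, d)) * 0.1            # RNN recurrent weights
B = rng.standard_normal((d, d)) * 0.1            # RNN input weights
C = rng.standard_normal((m, d))                  # row i is c_i
U = rng.standard_normal((d, m))                  # column i is u_i
V = rng.standard_normal((d, m))                  # column i is v_i

p = encode_exemplar(z, E, A, B)                  # p = RNN(z)
sigma = C @ p                                    # sigma_i = p^T c_i

def adaptive_step(h_prev):
    """One simplified decoder update using W = sum_i sigma_i u_i v_i^T."""
    return np.tanh(U @ (sigma * (V.T @ h_prev)))

h1 = adaptive_step(np.ones(d))
print(p.shape, sigma.shape, h1.shape)
```

A real decoder step would also consume the previous token and the attention context over Enc(x); those are left out here to keep the exemplar-dependent part visible.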

SLIDE 36

❖ Background and Overview ❖ Adaptive Decoding ❖ Experiments

Outline

SLIDE 37

Experiments: Summarization

Datasets:

  • Gigaword. Rush et al., 2015
  • New York Times (NYT). Durrett et al., 2016

Implementation:

  • TF-IDF + cosine similarity for exemplar retrieval.
  • LSTM encoder/decoder.
  • Comparable implementation and tuning.
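
TF-IDF retrieval with cosine similarity can be sketched in pure Python as below. This is an illustrative assumption about how such retrieval might look, not the setup used in the experiments: the smoothed IDF formula, the whitespace tokenizer, the choice to compare the input against training sources, the helper names, and the toy training pairs (other than the exemplar sentence from the slides) are all hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors (dicts) for docs, plus the IDF table."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({t: tf[t] * idf[t] for t in tf})
    return vecs, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_exemplar(source, train_pairs):
    """Return the target of the training pair whose source best matches `source`."""
    vecs, idf = tfidf_vectors([x for x, _ in train_pairs])
    q_tf = Counter(source.lower().split())
    q = {t: c * idf[t] for t, c in q_tf.items() if t in idf}
    scores = [cosine(q, v) for v in vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return train_pairs[best][1]

# Toy training pairs (sources are made up for this sketch)
train_pairs = [
    ("Two trains collided in Britain on Monday, killing two people.",
     "Two die in a Britain train collision"),
    ("Stocks rose sharply on Wall Street on Friday.",
     "Stocks rise on Wall Street"),
    ("The prime minister visited Tokyo for trade talks.",
     "Prime minister visits Tokyo"),
]
source = "A Portuguese train derailed in Oporto on Wednesday, killing three people."
exemplar = retrieve_exemplar(source, train_pairs)
print(exemplar)
```

With these toy pairs the train-collision pair shares the most weighted terms with the input, so its target is the one returned as the exemplar.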
SLIDE 39

ROUGE scores on Gigaword test set

System               ROUGE-1   ROUGE-2   ROUGE-L
Seq2seq              35.0      16.6      32.4
AttExp, Cao          36.0      17.1      33.2
AdaDec, this work    37.3      18.5      34.7
Cao, Full            37.0      19.0      34.5

Legend: Enc. & Att. Exemplar; Adaptive Decoding; Rerank

Cao et al., 2018
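
For readers unfamiliar with the metric, here is a simplified sketch of ROUGE-N as clipped n-gram overlap. It is an assumption-laden illustration, not the official ROUGE toolkit: there is no stemming or punctuation handling, the function names are hypothetical, and the example sentences are lower-cased, punctuation-stripped versions of sentences from earlier slides.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, F1) from clipped n-gram overlap."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())        # clipped matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1

candidate = "britain train derailed killing two"
reference = "portuguese train derailed killing three"
r1 = rouge_n(candidate, reference, n=1)
r2 = rouge_n(candidate, reference, n=2)
print(r1, r2)
```

The scores on the slides are on a 0-100 scale, so a value of 0.6 here would correspond to 60.0 there.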

SLIDE 42

ROUGE scores on NYT test set

System     ROUGE-1   ROUGE-2
Seq2seq    41.9      25.1
Paulus     42.9      26.0
AttExp     42.5      25.7
AdaDec     43.2      26.4

Legend: Enc. & Att. Exemplar; Adaptive Decoding

Paulus et al., 2018

SLIDE 43

Experiments: Data2text

Dataset:

  • WikiBio. Lebret et al., 2016

Implementation:

  • TF-IDF + cosine similarity for retrieval.
  • LSTM encoder/decoder. See et al., 2017
  • Comparable implementation and tuning.

Input

Jacques-Louis David (30 August 1748 – 29 December 1825) was a French painter in the Neoclassical style.

Output

SLIDE 44

Experiments: Data2text

Dataset:

  • WikiBio. Lebret et al., 2016

Implementation:

  • TF-IDF + cosine similarity for exemplar retrieval.
  • LSTM encoder/decoder. See et al., 2017
  • Comparable implementation and tuning.
SLIDE 45

WikiBio test performance (ROUGE/BLEU)

System               ROUGE-4   BLEU
Seq2seq              39.3      42.5
Wiseman              38.6      34.8
AttExp               40.0      43.1
AdaDec, this work    40.6      43.6

Legend: Enc. & Att. Exemplar; Adaptive Decoding

Wiseman et al., 2018

SLIDE 46

Conclusion

Problem

P(y | x, z)

SLIDE 47

Conclusion

Problem: P(y | x, z)

Method: $\mathrm{AdaDec}_z(y \mid \mathrm{Enc}(x))$

SLIDE 48

Conclusion

Problem: P(y | x, z)

Method: $\mathrm{AdaDec}_z(y \mid \mathrm{Enc}(x))$

Results

SLIDE 49

Source: A Portuguese train derailed in Oporto on Wednesday, killing three people

Portuguese train derailed, killing three.

Exemplars | Outputs
Two die in a Britain train collision | Three killed in a Portuguese train derailment
Two people were killed in a Britain train collision | Three people were killed in a Portuguese train derailment
A train collision in Canada killed two people | A Portugueses train derails in northern Mexico, killed three

Thank You!

shorturl.at/kzAGY