Encoder Decoder Models


SLIDE 1

CS11-731 MT and Seq2Seq models

Encoder Decoder Models

Antonios Anastasopoulos

Site https://phontron.com/class/mtandseq2seq2019/

(Slides by: Antonis Anastasopoulos and Graham Neubig)

SLIDE 2

Language Models

  • Language models are generative models of text

s ~ P(x)

Text Credit: Max Deutsch (https://medium.com/deep-writing/)

“The Malfoys!” said Hermione. Harry was watching him. He looked like Madame Maxime. When she strode up the wrong staircase to visit himself.
 “I’m afraid I’ve definitely been suspended from power, no chance — indeed?” said Snape. He put his head back behind them and read groups as they crossed a corner and fluttered down onto their ink lamp, and picked up his spoon. The doorbell rang. It was a lot cleaner down in London.

SLIDE 3

Conditioned Language Models

  • Not just generate text; generate text according to some specification:

    Input X          Output Y (Text)     Task
    English          Japanese            Translation
    Structured Data  NL Description      NL Generation
    Document         Short Description   Summarization
    Utterance        Response            Response Generation
    Image            Text                Image Captioning
    Speech           Transcript          Speech Recognition

SLIDE 4

Formulation and Modeling

SLIDE 5

Calculating the Probability of a Sentence

P(X) = \prod_{i=1}^{I} P(x_i | x_1, \ldots, x_{i-1})

(next word x_i, given its context x_1, …, x_{i-1})
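
To make the chain rule concrete, here is a tiny worked example in Python; the per-word conditionals are invented numbers for illustration, not outputs of any real model.

    import math

    # Made-up conditionals P(x_i | x_1, ..., x_{i-1}) for
    # X = "<s> I hate this movie </s>" (illustration only).
    step_probs = [
        0.20,  # P("I"     | <s>)
        0.05,  # P("hate"  | <s> I)
        0.30,  # P("this"  | <s> I hate)
        0.10,  # P("movie" | <s> I hate this)
        0.60,  # P("</s>"  | <s> I hate this movie)
    ]

    p_x = math.prod(step_probs)                      # the product over i = 1..I
    log_p_x = sum(math.log(p) for p in step_probs)   # log-space avoids underflow
    print(f"P(X) = {p_x:.1e}, log P(X) = {log_p_x:.2f}")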

SLIDE 6

Conditional Language Models

P(Y | X) = \prod_{j=1}^{J} P(y_j | X, y_1, \ldots, y_{j-1})

Added context!
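
Every generation method later in the deck needs only one primitive from the model: the next-word distribution given the source X and the target prefix. A minimal sketch of that interface in Python (the name next_token_probs is my own placeholder, not from the slides):

    from typing import Dict, List

    def next_token_probs(X: List[str], prefix: List[str]) -> Dict[str, float]:
        """Return P(y_j | X, y_1, ..., y_{j-1}) as {word: probability}.

        Placeholder for a trained conditional language model; the sampling,
        greedy, and beam-search sketches later in the deck call this.
        """
        raise NotImplementedError("plug in a trained encoder-decoder here")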

SLIDE 7

(One Type of) Language Model

(Mikolov et al. 2011)

[Figure: a recurrent language model. Starting from <s>, an LSTM reads the previous word of "I hate this movie" at each step and predicts the next one: "I", "hate", "this", "movie", </s>.]
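
A minimal sketch of the recurrent language model pictured above, assuming PyTorch; the layer sizes are arbitrary placeholders.

    import torch.nn as nn

    class LSTMLanguageModel(nn.Module):
        def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.proj = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens):            # tokens: (batch, seq_len) word ids
            hidden, _ = self.lstm(self.embed(tokens))
            return self.proj(hidden)          # next-word logits at every position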

SLIDE 8

(One Type of) Conditional Language Model

(Sutskever et al. 2014)

[Figure: an encoder-decoder. An encoder LSTM reads the source sentence, its final state initializes a decoder LSTM, and the decoder generates the target one word at a time by argmax until it emits </s>; the example translates between "kono eiga ga kirai" and "I hate this movie".]

SLIDE 9

How to Pass Hidden State?

  • Initialize the decoder with the encoder's final state (Sutskever et al. 2014)
  • Transform the encoder state first (encoder and decoder can have different dimensions)
  • Input the encoder state at every decoder time step (Kalchbrenner & Blunsom 2013)

All three options are sketched in the code below.
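
A sketch of the three options, assuming PyTorch and single-layer LSTMs; all dimensions, and the tanh in the transform, are illustrative choices rather than anything prescribed by the slides.

    import torch
    import torch.nn as nn

    enc = nn.LSTM(256, 512, batch_first=True)
    dec = nn.LSTM(256, 512, batch_first=True)

    src_embeds = torch.randn(1, 5, 256)   # (batch, src_len, embed_dim)
    trg_embeds = torch.randn(1, 7, 256)   # (batch, trg_len, embed_dim)

    _, (h, c) = enc(src_embeds)           # final encoder state

    # 1) Initialize the decoder with the encoder state (Sutskever et al. 2014)
    out, _ = dec(trg_embeds, (h, c))

    # 2) Transform first (use Linear(enc_dim, dec_dim) if dimensions differ;
    #    c is passed through unchanged here for brevity)
    transform = nn.Linear(512, 512)
    out, _ = dec(trg_embeds, (torch.tanh(transform(h)), c))

    # 3) Feed the encoder state as extra input at every time step
    #    (Kalchbrenner & Blunsom 2013): concatenate it to each target embedding
    dec_wide = nn.LSTM(256 + 512, 512, batch_first=True)
    ctx = h[-1].unsqueeze(1).expand(-1, trg_embeds.size(1), -1)
    out, _ = dec_wide(torch.cat([trg_embeds, ctx], dim=-1))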

SLIDE 10

Methods of Generation

SLIDE 11

The Generation Problem

  • We have a model of P(Y|X); how do we use it to generate a sentence?

  • Two methods:
    • Sampling: Try to generate a random sentence according to the probability distribution.
    • Argmax: Try to generate the sentence with the highest probability.

SLIDE 12

Ancestral Sampling

  • Randomly generate words one-by-one:

    while y_{j-1} != "</s>":
        y_j ~ P(y_j | X, y_1, …, y_{j-1})

  • An exact method for sampling from P(X); no further work needed.
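
A runnable version of the sampling loop, written against the hypothetical next_token_probs interface sketched earlier; max_len is a safety cap I added so a misbehaving model cannot loop forever.

    import random

    def sample(X, max_len=100):
        y = ["<s>"]
        while y[-1] != "</s>" and len(y) < max_len:
            dist = next_token_probs(X, y)              # P(y_j | X, y_1..j-1)
            words, probs = zip(*dist.items())
            y.append(random.choices(words, weights=probs)[0])  # draw one word
        return y[1:]                                   # drop the <s> marker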

SLIDE 13

Greedy Search

  • One by one, pick the single highest-probability word:

    while y_{j-1} != "</s>":
        y_j = argmax P(y_j | X, y_1, …, y_{j-1})

  • Not exact; real problems:
    • Will often generate the “easy” words first
    • Will prefer multiple common words to one rare word
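
The same loop with argmax in place of sampling, again using the hypothetical next_token_probs interface:

    def greedy(X, max_len=100):
        y = ["<s>"]
        while y[-1] != "</s>" and len(y) < max_len:
            dist = next_token_probs(X, y)
            y.append(max(dist, key=dist.get))   # argmax over the vocabulary
        return y[1:]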

SLIDE 14

Beam Search

  • Instead of picking one high-probability word, maintain several paths
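
A sketch of beam search over the same hypothetical interface; the beam width k, the log-probability scoring, and carrying finished hypotheses along unchanged are the standard choices.

    import math

    def beam_search(X, k=5, max_len=100):
        beams = [(["<s>"], 0.0)]                     # (prefix, log-probability)
        for _ in range(max_len):
            candidates = []
            for prefix, score in beams:
                if prefix[-1] == "</s>":             # finished: keep as-is
                    candidates.append((prefix, score))
                    continue
                for word, p in next_token_probs(X, prefix).items():
                    candidates.append((prefix + [word], score + math.log(p)))
            beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:k]
            if all(prefix[-1] == "</s>" for prefix, _ in beams):
                break
        return beams[0][0][1:]                       # best hypothesis, minus <s>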

SLIDE 15

Sentence Embedding Methods

SLIDE 16

Sentence Embeddings from Larger Context: Skip-thought Vectors

(Kiros et al. 2015)

  • Unsupervised training: predict surrounding sentences on large-scale data (using an encoder-decoder)
  • Use the resulting representation as the sentence representation

SLIDE 17

Sentence Embeddings from an Autoencoder

(Dai and Le 2015)

  • Unsupervised training: predict the same sentence (i.e., reconstruct the input)

SLIDE 18

Sentence Embeddings from a Language Model

(Dai and Le 2015)

  • Unsupervised training: predict the next word
SLIDE 19

Sentence Embeddings from Larger LMs: ELMo

(Peters et al. 2018)

  • Bi-directional language models
  • Use a linear combination of three layers as the final representation
  • Finetune the weights of the linear combination on the downstream task
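
A minimal sketch of that ELMo-style linear combination, assuming PyTorch; the softmax-normalized scalars s and the global scale gamma are the weights that get finetuned on the downstream task.

    import torch
    import torch.nn as nn

    class ScalarMix(nn.Module):
        def __init__(self, num_layers=3):
            super().__init__()
            self.s = nn.Parameter(torch.zeros(num_layers))  # one weight per layer
            self.gamma = nn.Parameter(torch.ones(1))        # global scaling factor

        def forward(self, layers):  # layers: (num_layers, batch, seq_len, dim)
            w = torch.softmax(self.s, dim=0)                # normalized weights
            return self.gamma * (w.view(-1, 1, 1, 1) * layers).sum(dim=0)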

SLIDE 20

Sentence Embeddings from Larger LMs, Using Both Sides: BERT

(Devlin et al. 2018)