SLIDE 1

Sequence to Sequence Models for Machine Translation

CMSC 723 / LING 723 / INST 725 Marine Carpuat

Slides & figure credits: Graham Neubig

SLIDE 2

Machine Translation

  • Translation system
  • Input: source sentence F
  • Output: target sentence E
  • Can be viewed as a function
  • Statistical machine translation systems
  • 3 problems
  • Modeling
  • how to define P(E|F)?
  • Training/Learning
  • how to estimate parameters from

parallel corpora?

  • Search
  • How to solve argmax efficiently?
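
As a compact summary of the three problems above (the notation here is assumed, not taken from the slide), everything revolves around one conditional model:

    % Modeling: define a conditional distribution over target sentences
    P(E \mid F; \theta)

    % Training: estimate \theta by maximum likelihood on a parallel corpus of pairs (F^{(i)}, E^{(i)})
    \theta^{*} = \operatorname*{argmax}_{\theta} \sum_{i} \log P(E^{(i)} \mid F^{(i)}; \theta)

    % Search: find the highest-scoring translation under the trained model
    \hat{E} = \operatorname*{argmax}_{E} P(E \mid F; \theta^{*})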
SLIDE 3

Introduction to Neural Machine Translation

  • Neural language models review
  • Sequence to sequence models for MT
  • Encoder-Decoder
  • Sampling and search (greedy vs beam search)
  • Practical tricks
  • Sequence to sequence models for other NLP tasks
SLIDE 4

A feedforward neural 3-gram model
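The slide's figure is not reproduced here; as a rough stand-in, this minimal numpy sketch shows the idea: embed the two previous words, concatenate, pass through one hidden layer, and softmax over the vocabulary. All sizes and names (V, D, H, trigram_prob) are illustrative assumptions, not from the slides.

    import numpy as np

    rng = np.random.default_rng(0)
    V, D, H = 100, 16, 32                    # vocab, embedding, hidden sizes (illustrative)
    E = rng.normal(0, 0.1, (V, D))           # word embedding table
    W1 = rng.normal(0, 0.1, (2 * D, H))      # concatenated context -> hidden
    b1 = np.zeros(H)
    W2 = rng.normal(0, 0.1, (H, V))          # hidden -> vocabulary scores
    b2 = np.zeros(V)

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def trigram_prob(w_prev2, w_prev1):
        """P(w_t | w_{t-2}, w_{t-1}): the context window is fixed at two words."""
        x = np.concatenate([E[w_prev2], E[w_prev1]])   # embed and concatenate the context
        h = np.tanh(x @ W1 + b1)
        return softmax(h @ W2 + b2)

    p = trigram_prob(3, 7)    # a distribution over all V words; p.sum() == 1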

SLIDE 5

A recurrent language model

SLIDE 6

A recurrent language model
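Again as an illustrative stand-in for the figure, a minimal numpy sketch of a recurrent language model: unlike the fixed 3-gram window, the hidden state is updated from the previous word and the previous state, so it can in principle carry unbounded history. Names and sizes are assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    V, D, H = 100, 16, 32                  # illustrative sizes
    E = rng.normal(0, 0.1, (V, D))         # embeddings
    W_xh = rng.normal(0, 0.1, (D, H))      # input -> hidden
    W_hh = rng.normal(0, 0.1, (H, H))      # hidden -> hidden (the recurrence)
    b_h = np.zeros(H)
    W_hy = rng.normal(0, 0.1, (H, V))      # hidden -> vocabulary scores
    b_y = np.zeros(V)

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def sentence_log_prob(words):
        """Sum of log P(w_t | w_1..w_{t-1}) for a list of word ids."""
        h = np.zeros(H)
        prev = 0                           # assumed id 0 = <s>
        total = 0.0
        for w in words:
            h = np.tanh(E[prev] @ W_xh + h @ W_hh + b_h)   # state carries the history
            total += np.log(softmax(h @ W_hy + b_y)[w])
            prev = w
        return total

    print(sentence_log_prob([5, 42, 7]))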

SLIDE 7

Examples of RNN variants

  • LSTMs
  • Aim to address the vanishing/exploding gradient problem
  • Stacked RNNs
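
A minimal numpy sketch of one LSTM step (gate names follow the usual convention; sizes are illustrative). The additive cell update is what helps gradients flow across many time steps; a stacked RNN simply feeds each layer's h as the next layer's x.

    import numpy as np

    rng = np.random.default_rng(0)
    D, H = 16, 32                            # input and hidden sizes (illustrative)
    W = {g: rng.normal(0, 0.1, (D + H, H)) for g in "ifoc"}   # one matrix per gate
    b = {g: np.zeros(H) for g in "ifoc"}

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x, h, c):
        """One LSTM step: gates decide what the cell c keeps, adds, and exposes."""
        z = np.concatenate([x, h])
        i = sigmoid(z @ W["i"] + b["i"])         # input gate
        f = sigmoid(z @ W["f"] + b["f"])         # forget gate
        o = sigmoid(z @ W["o"] + b["o"])         # output gate
        c_new = f * c + i * np.tanh(z @ W["c"] + b["c"])   # additive update: gradients
        h_new = o * np.tanh(c_new)                         # flow through the + unchanged
        return h_new, c_new

    h, c = np.zeros(H), np.zeros(H)
    h, c = lstm_step(rng.normal(size=D), h, c)
    # Stacked RNNs: layer 2 uses layer 1's h as its input x, and so on.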
SLIDE 8

Training in practice: online

SLIDE 9

Training in practice: batch

SLIDE 10

Training in practice: minibatch

  • Compromise between online and batch
  • Computational advantages
  • Can leverage vector processing instructions in modern hardware
  • By processing multiple examples simultaneously
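
A minimal sketch of the three training regimes in one hypothetical helper: batch_size = 1 recovers online training, batch_size = len(data) recovers batch training, and anything in between is a minibatch that can be processed in one vectorized pass.

    import numpy as np

    rng = np.random.default_rng(0)

    def minibatches(data, batch_size):
        """Shuffle once per pass, then yield consecutive slices of examples."""
        order = rng.permutation(len(data))
        for start in range(0, len(data), batch_size):
            yield [data[i] for i in order[start:start + batch_size]]

    data = list(range(1000))            # stand-in for training examples
    for batch in minibatches(data, batch_size=32):
        pass  # compute the loss over the whole batch at once, then update once
    # batch_size=1 -> online training; batch_size=len(data) -> batch training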
SLIDE 11

Problem with minibatches: in language modeling, examples don’t have the same length

  • 3 tricks
  • Padding
  • Add </s> symbols to make all sentences in the batch the same length
  • Masking
  • Multiply the loss calculated over padded symbols by zero
  • Plus: sort sentences by length, so sentences in a batch need less padding
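
A minimal numpy sketch of padding and masking (the PAD id and helper names are assumptions): pad every sentence in the batch to the longest one, and zero out the loss at padded positions via a 0/1 mask.

    import numpy as np

    PAD = 0   # assumed id of the padding symbol

    def pad_batch(sentences):
        """Pad variable-length id sequences to the batch maximum and build a mask."""
        T = max(len(s) for s in sentences)
        batch = np.full((len(sentences), T), PAD, dtype=int)
        mask = np.zeros((len(sentences), T))
        for i, s in enumerate(sentences):
            batch[i, :len(s)] = s
            mask[i, :len(s)] = 1.0        # 1 on real tokens, 0 on padding
        return batch, mask

    batch, mask = pad_batch([[4, 9, 2], [7, 2]])
    per_token_loss = np.ones_like(mask)                 # placeholder per-position losses
    loss = (per_token_loss * mask).sum() / mask.sum()   # padding contributes nothing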
SLIDE 12

Introduction to Neural Machine Translation

  • Neural language models review
  • Sequence to sequence models for MT
  • Encoder-Decoder
  • Sampling and search (greedy vs beam search)
  • Training tricks
  • Sequence to sequence models for other NLP tasks
SLIDE 13

Encoder-decoder model

SLIDE 14

Encoder-decoder model
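
As a stand-in for the figure, a minimal numpy sketch of the encoder-decoder idea: an encoder RNN reads the source sentence F into a final hidden state, and a decoder RNN starts from that state and predicts target words one at a time. All names and sizes are illustrative, and sharing one embedding table E across both languages is a simplification.

    import numpy as np

    rng = np.random.default_rng(0)
    V, D, H = 100, 16, 32                      # vocab, embedding, hidden sizes
    E = rng.normal(0, 0.1, (V, D))             # embedding table (shared for brevity)
    enc_Wx = rng.normal(0, 0.1, (D, H))        # encoder parameters
    enc_Wh = rng.normal(0, 0.1, (H, H))
    dec_Wx = rng.normal(0, 0.1, (D, H))        # decoder parameters
    dec_Wh = rng.normal(0, 0.1, (H, H))
    W_out = rng.normal(0, 0.1, (H, V))         # hidden -> target vocabulary scores

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def encode(src):
        """Run the encoder RNN over the source; the last state summarizes F."""
        h = np.zeros(H)
        for w in src:
            h = np.tanh(E[w] @ enc_Wx + h @ enc_Wh)
        return h

    def decode_step(prev_word, h):
        """One decoder step: consume the previous target word, predict the next."""
        h = np.tanh(E[prev_word] @ dec_Wx + h @ dec_Wh)
        return h, softmax(h @ W_out)

    h = encode([5, 12, 3])     # decoder is initialized with the encoder's final state
    h, p = decode_step(0, h)   # assumed id 0 = <s>; p is P(e_1 | <s>, F)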

SLIDE 15

Generating Output

  • We have a model P(E|F); how can we generate translations?
  • 2 methods
  • Sampling: generate a random sentence according to probability distribution
  • Argmax: generate sentence with highest probability
SLIDE 16

Ancestral Sampling

  • Randomly generate words one by one
  • Until the end-of-sentence symbol is generated
  • Done!
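
A minimal sketch of ancestral sampling (the decode_step signature matches the encoder-decoder sketch above; the BOS/EOS ids and the toy model are assumptions): at each step, draw the next word from the model's full distribution and stop at the end-of-sentence symbol.

    import numpy as np

    rng = np.random.default_rng(0)
    BOS, EOS = 0, 1   # assumed ids for <s> and </s>

    def ancestral_sample(decode_step, h, max_len=50):
        """Draw words one by one from P(e_t | e_<t, F) until </s> comes out."""
        out, prev = [], BOS
        for _ in range(max_len):
            h, p = decode_step(prev, h)
            prev = int(rng.choice(len(p), p=p))   # sample from the full distribution
            if prev == EOS:
                break
            out.append(prev)
        return out

    # Toy stand-in model: uniform over a 10-word vocabulary.
    def toy_decode_step(prev_word, h):
        return h, np.full(10, 0.1)

    print(ancestral_sample(toy_decode_step, h=None))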
SLIDE 17

Greedy search

  • One by one, pick the single highest-probability word

  • Problems
  • Often generates easy words first
  • Often prefers multiple common words to rare words
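
Greedy search is the same loop with argmax in place of sampling; a minimal sketch under the same assumed decode_step interface. Each decision is locally optimal, but the resulting sentence need not be the global argmax, which is what the problems above describe.

    import numpy as np

    BOS, EOS = 0, 1   # assumed ids for <s> and </s>

    def greedy_search(decode_step, h, max_len=50):
        """Pick the single most probable word at each step instead of sampling."""
        out, prev = [], BOS
        for _ in range(max_len):
            h, p = decode_step(prev, h)
            prev = int(np.argmax(p))   # locally best choice; no lookahead
            if prev == EOS:
                break
            out.append(prev)
        return out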

SLIDE 18

Greedy Search

Example

SLIDE 19

Beam Search

Example with beam size b = 2: we keep the b top-scoring hypotheses at each time step.
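
A minimal sketch of beam search under the same assumed decode_step interface: keep the b highest-scoring partial hypotheses (by total log-probability), expand each by its b best next words, prune back to b, and stop once every kept hypothesis has produced </s>. With b = 1 this reduces to greedy search.

    import numpy as np

    BOS, EOS = 0, 1   # assumed ids for <s> and </s>

    def beam_search(decode_step, h0, b=2, max_len=50):
        """Keep the b best partial hypotheses by total log-probability."""
        # Each hypothesis: (score, words, decoder state, finished?)
        beam = [(0.0, [BOS], h0, False)]
        for _ in range(max_len):
            candidates = []
            for score, words, h, done in beam:
                if done:                       # finished hypotheses stay as they are
                    candidates.append((score, words, h, True))
                    continue
                h2, p = decode_step(words[-1], h)
                for w in np.argsort(p)[-b:]:   # only the b best next words matter
                    w = int(w)
                    candidates.append((score + np.log(p[w]), words + [w], h2, w == EOS))
            beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:b]
            if all(done for _, _, _, done in beam):
                break
        return beam[0][1][1:]   # best hypothesis, without the <s> marker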

SLIDE 20

Introduction to Neural Machine Translation

  • Neural language models review
  • Sequence to sequence models for MT
  • Encoder-Decoder
  • Sampling and search (greedy vs beam search)
  • Practical tricks
  • Sequence to sequence models for other NLP tasks