

SLIDE 1

CS11-731 
 Machine Translation and 
 Sequence-to-Sequence Models

Semisupervised and Unsupervised Methods

Antonis Anastasopoulos

Site https://phontron.com/class/mtandseq2seq2019/

SLIDE 2

Supervised Learning

We are provided the ground truth

SLIDE 3

Unsupervised Learning

No ground-truth labels:
 the task is to uncover latent structure

SLIDE 4

Semi-supervised Learning

A happy medium: use both annotated and unannotated data

By Techerin - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=19514958

SLIDE 5

Incorporating Monolingual Data

SLIDE 6

On Using Monolingual Corpora in Neural Machine Translation (Gulcehre et al. 2015)

English-French parallel data -> train the NMT system MTef.
French monolingual data -> train a language model LMf.
Combine the two!
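Gulcehre et al. combine the two models either at decoding time (shallow fusion: interpolate the two models' scores) or by wiring the LM state into the decoder (deep fusion). A minimal numpy sketch of shallow-fusion scoring; the function name, weight, and toy distributions are illustrative, not the paper's exact setup:

```python
import numpy as np

def shallow_fusion_scores(mt_log_probs, lm_log_probs, beta=0.3):
    """Combine NMT and LM next-token scores at each beam-search step.

    mt_log_probs, lm_log_probs: arrays of shape (vocab_size,) holding
    log p_MT(y_t | x, y_<t) and log p_LM(y_t | y_<t).
    beta: LM interpolation weight (a tunable hyperparameter).
    """
    return mt_log_probs + beta * lm_log_probs

# Toy distributions over a 4-word vocabulary:
mt = np.log(np.array([0.5, 0.2, 0.2, 0.1]))
lm = np.log(np.array([0.1, 0.6, 0.2, 0.1]))
print(shallow_fusion_scores(mt, lm).argmax())  # best-scoring token id
```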

SLIDE 7

On Using Monolingual Corpora in Neural Machine Translation (Gulcehre et al. 2015)

SLIDE 8

Back-translation (Sennrich et al. 2016)

Data: English-French parallel data plus monolingual French data.

1. Train a French->English model MTfe on the parallel data.
2. Back-translate the monolingual French data into synthetic English with MTfe.
3. Train the English->French model on the parallel data plus the back-translated pairs.
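A minimal sketch of this data flow, where train_model and translate_corpus are hypothetical stand-ins for a real NMT toolkit:

```python
def back_translate_pipeline(parallel_ef, mono_f, train_model, translate_corpus):
    """Back-translation (Sennrich et al. 2016), schematically.

    parallel_ef: list of (english, french) sentence pairs.
    mono_f: list of monolingual French sentences.
    train_model(srcs, tgts) -> model and translate_corpus(model, srcs)
    -> translations are hypothetical stand-ins for an NMT toolkit.
    """
    en, fr = zip(*parallel_ef)
    # 1. Train the reverse model (French -> English) on the parallel data.
    mt_fe = train_model(list(fr), list(en))
    # 2. Back-translate monolingual French into synthetic English sources.
    synthetic_en = translate_corpus(mt_fe, mono_f)
    # 3. Train the forward model on real + synthetic pairs.
    all_en = list(en) + list(synthetic_en)
    all_fr = list(fr) + list(mono_f)
    return train_model(all_en, all_fr)
```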

SLIDE 9

Dual Learning 
 (He et al. 2016)

Data: English-French parallel data plus monolingual data in both languages.

Assume initial models MTef, MTfe and language models LMe, LMf, and play the following game:

• Translate a monolingual English sample with MTef; get a fluency reward from LMf.
• Translate the result back with MTfe; get a reward for reconstructing the original sentence.
• Symmetrically for French: translate with MTfe, score with LMe, translate back with MTef.
• Update the translation models from the rewards.
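A minimal sketch of one English->French round of this game, with sample_translation, lm_score, and log_prob as hypothetical stand-ins for toolkit functions (the actual method updates MTef and MTfe by policy gradient on this reward):

```python
def dual_learning_reward(e, mt_ef, mt_fe, lm_f,
                         sample_translation, lm_score, log_prob,
                         alpha=0.5):
    """One English -> French round of the dual-learning game.

    Hypothetical helpers:
      sample_translation(model, sent) -> sampled translation
      lm_score(lm, sent)              -> log p_LM(sent)
      log_prob(model, src, tgt)       -> log p(tgt | src)
    """
    f_sampled = sample_translation(mt_ef, e)        # translate the sample
    r_fluency = lm_score(lm_f, f_sampled)           # fluency reward from LMf
    r_reconstruct = log_prob(mt_fe, f_sampled, e)   # can we get e back?
    # Total reward interpolates fluency and reconstruction.
    return alpha * r_fluency + (1 - alpha) * r_reconstruct
```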

SLIDE 10

Semi-Supervised Learning for MT
 (Cheng et al. 2016)

Data: English-French parallel data plus monolingual data in both languages.

Round-trip translation for supervision:

• Translate e to f’ with MTef.
• Translate f’ back to e’ with MTfe.
• Compute a loss between e and e’.
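Seen as an autoencoder over monolingual English, the objective is just a reconstruction loss; a minimal sketch with hypothetical translate and nll_loss helpers:

```python
def round_trip_loss(e, mt_ef, mt_fe, translate, nll_loss):
    """Reconstruction loss for a monolingual English sentence e.

    Hypothetical helpers:
      translate(model, src)     -> hypothesis translation
      nll_loss(model, src, tgt) -> negative log-likelihood of tgt given src
    """
    f_prime = translate(mt_ef, e)       # e -> f'
    # Loss: how badly does MTfe reconstruct e from f'?
    return nll_loss(mt_fe, f_prime, e)
```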

SLIDE 11

Another idea: use monolingual data to pretrain model components

Data: English-French parallel data plus monolingual data in both languages.

Use the monolingual data to train language models LMe and LMf, and use them to initialize the encoder and the decoder.
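A minimal PyTorch sketch of the idea, using toy LSTM language models whose weights seed the seq2seq encoder and decoder (shapes and modules are illustrative, not the paper's exact architecture):

```python
import torch.nn as nn

DIM = 512

# Toy "pretrained" language models for the two languages (in reality
# these would be trained on the monolingual corpora first).
lm_e = nn.LSTM(DIM, DIM, batch_first=True)   # stands in for LMe
lm_f = nn.LSTM(DIM, DIM, batch_first=True)   # stands in for LMf

# Seq2seq encoder/decoder with matching shapes.
encoder = nn.LSTM(DIM, DIM, batch_first=True)
decoder = nn.LSTM(DIM, DIM, batch_first=True)

# Initialize from the LM weights (the "shaded regions" on the slides);
# embeddings and output softmax would be copied the same way.
encoder.load_state_dict(lm_e.state_dict())
decoder.load_state_dict(lm_f.state_dict())
```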

SLIDE 12

Another idea: use monolingual data to pretrain model components

From "Unsupervised Pretraining for Sequence to Sequence Learning", Ramachadran et al. 2017.

Shaded regions are pre-trained

SLIDE 13

Another idea: use monolingual data to pretrain model components

From "Unsupervised Pretraining for Sequence to Sequence Learning", Ramachadran et al. 2017.

Shaded regions are pre-trained

SLIDE 14

Another idea: use monolingual data to pretrain model components

From "Unsupervised Pretraining for Sequence to Sequence Learning", Ramachadran et al. 2017.

SLIDE 15

Another idea: use monolingual data to pretrain model components

From "MASS: Masked Sequence to Sequence Pre-training for Language Generation", Song et al. 2019.

SLIDE 16

Another idea: use monolingual data to pretrain model components

From "MASS: Masked Sequence to Sequence Pre-training for Language Generation", Song et al. 2019.

SLIDE 17

Another idea: use monolingual data to pretrain model components

From "MASS: Masked Sequence to Sequence Pre-training for Language Generation", Song et al. 2019.

SLIDE 18

Pre-trained Word Embeddings in NMT

SLIDE 19

Modern neural embeddings
 (Mikolov et al. 2013)

• Skip-gram model: predict a word’s context
• CBOW model: predict a word from its context
• Others: GloVe, fastText, etc.
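A minimal sketch with gensim; the corpus is a toy stand-in, and sg=1 selects skip-gram while sg=0 selects CBOW:

```python
from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences.
corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "rug"]]

# sg=1 -> skip-gram (predict context from word);
# sg=0 -> CBOW (predict word from context).
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1)
print(model.wv["cat"].shape)                 # (50,)
print(model.wv.most_similar("cat", topn=2))  # nearest neighbors
```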

SLIDE 20

Pre-trained embeddings

From "A Bag of Useful Tricks for Practical Neural Machine Translation: 
 Embedding Layer Initialization and Large Batch Size", Neishi et al. 2017.
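Neishi et al. initialize the NMT embedding layer with pretrained vectors before training as usual. A minimal PyTorch sketch, where pretrained is a stand-in matrix whose row i holds the vector for vocabulary id i:

```python
import torch
import torch.nn as nn

VOCAB, DIM = 10000, 300

# Stand-in for a pretrained word2vec-style matrix aligned to the vocab.
pretrained = torch.randn(VOCAB, DIM)

embedding = nn.Embedding(VOCAB, DIM)
with torch.no_grad():
    embedding.weight.copy_(pretrained)  # initialize, then fine-tune as usual
```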

SLIDE 21

Pre-trained embeddings:
 when are they useful?

From "When and Why are Pre-trained Word Embeddings Useful for Neural Machine Translation?", Qi et al. 2017.

SLIDE 22

Bilingual Lexicon Induction

SLIDE 23

What is Bilingual Lexicon Induction?

From "Learning Bilingual Lexicons from Monolingual Corpora", Haghighi et al. 2008.

SLIDE 24

What is Bilingual Lexicon Induction?

From "Learning Bilingual Lexicons from Monolingual Corpora", Haghighi et al. 2008.

SLIDE 25

Bilingual Skip-gram model:
 Using translations and alignments

From "Bilingual Word Representations with Monolingual Quality in Mind", Luong et al. 2015.

SLIDE 26

Mapping two monolingual embedding spaces

From "Earth Mover’s Distance Minimization for Unsupervised Bilingual Lexicon Induction", Zhang et al. 2015.

The mapping between the two spaces can combine rotation, scaling, and translation.

SLIDE 27

Finding the best mapping

From "Word Translation Without Parallel Data", Conneau et al. 2018.

The orthogonality assumption is important! What if we don’t have a seed lexicon?
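With a seed lexicon, the best orthogonal mapping has a closed-form solution (orthogonal Procrustes): W = U V^T, where U S V^T is the SVD of X^T Y. A minimal numpy sketch on toy data:

```python
import numpy as np

def procrustes(X, Y):
    """Best orthogonal W minimizing ||X W - Y||_F.

    X: (n, d) source-language vectors for the seed dictionary pairs.
    Y: (n, d) corresponding target-language vectors.
    Closed form: W = U V^T where U S V^T = SVD(X^T Y).
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy check: recover a random orthogonal map from noisy paired vectors.
rng = np.random.default_rng(0)
W_true, _ = np.linalg.qr(rng.normal(size=(50, 50)))
X = rng.normal(size=(200, 50))
Y = X @ W_true + 0.01 * rng.normal(size=(200, 50))
W = procrustes(X, Y)
print(np.allclose(W, W_true, atol=0.1))  # approximately recovered
```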

SLIDE 28

Unsupervised Mapping + Refinement

From "Word Translation Without Parallel Data", Conneau et al. 2018.

SLIDE 29

Issues with mapping methods

From "On the Limitations of Unsupervised Bilingual Dictionary Induction", Søgaard et al. 2018.

SLIDE 30

Unsupervised Translation

SLIDE 31

… at the core of it all: decipherment


Weaver (1955): This is really English, encrypted in some strange symbols

From "Deciphering Foreign Language", Ravi and Knight 2011.

SLIDE 32

Unsupervised MT
 (Lample et al. and Artetxe et al. 2018)

Data: monolingual English and French only. Models: MTef, MTfe, LMe, LMf.

1. Train embeddings + unsupervised BLI.
2. BLI -> word translations.
3. Train MTfe and MTef systems.
4. Meanwhile, use unsupervised objectives (denoising LM).
5. Iterate.
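The denoising objective trains each language's encoder-decoder to reconstruct a sentence from a corrupted version. A minimal sketch of the corruption Lample et al. 2018 use, word dropout plus a local shuffle that moves no word more than k positions (parameter values illustrative):

```python
import random

def add_noise(tokens, p_drop=0.1, k=3):
    """Corrupt a sentence for the denoising objective.

    Drops each word with probability p_drop, then applies a light
    shuffle in which no word moves more than k positions.
    """
    kept = [t for t in tokens if random.random() > p_drop] or tokens[:1]
    # Local shuffle: sort by position + uniform noise in [0, k].
    keys = [i + random.uniform(0, k) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept))]

noisy = add_noise("the cat sat on the mat".split())
# Training pair: (noisy, original); the model learns to denoise.
```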
SLIDE 33

Unsupervised MT
 (Lample et al. 2018)

From "Unsupervised MT Using Monolingual Corpora Only", Lample et al 2018.

Also add an adversarial loss on the intermediate representations, pushing the encoder outputs for the two languages to be indistinguishable.

SLIDE 34

Unsupervised MT
 (Lample et al. 2018)

From "Unsupervised MT Using Monolingual Corpora Only", Lample et al 2018.