CS11-731 Machine Translation and Sequence-to-Sequence Models
Semisupervised and Unsupervised Methods
Antonis Anastasopoulos
Site https://phontron.com/class/mtandseq2seq2019/
Supervised learning: we are provided the ground truth.
Unsupervised learning: no ground-truth labels; the task is to uncover latent structure.
Semisupervised learning: a happy medium; use both annotated and unannotated data.
Setting: English-French parallel data, plus monolingual French data.
Language model fusion: train an NMT model MTef on the parallel data, train a language model LMf on the monolingual French data, and combine the two.
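A common way to combine them is shallow fusion: at each decoding step, add the language model's log-probability to the translation model's score. A minimal sketch, assuming hypothetical `mt_log_probs` / `lm_log_probs` arrays produced by MTef and LMf over the target vocabulary:

```python
import numpy as np

def shallow_fusion_step(mt_log_probs: np.ndarray,
                        lm_log_probs: np.ndarray,
                        beta: float = 0.3) -> int:
    """Pick the next target word by combining MTef and LMf scores.

    mt_log_probs: log P_MT(y_t | x, y_<t) over the target vocabulary
    lm_log_probs: log P_LM(y_t | y_<t)    over the same vocabulary
    beta:         interpolation weight of the language model
    """
    combined = mt_log_probs + beta * lm_log_probs
    return int(np.argmax(combined))
```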
Setting: English-French parallel data, plus monolingual French data.
Back-translation: train a French->English model MTfe on the parallel data, use it to back-translate the monolingual French data into synthetic English, and then train the English->French system on the parallel data plus the resulting synthetic pairs.
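A rough sketch of the data-augmentation loop; `train` and `translate` are hypothetical hooks standing in for an NMT toolkit's training and decoding calls:

```python
def back_translate_and_train(parallel_en_fr, mono_fr, train, translate):
    """parallel_en_fr: list of (english, french) sentence pairs
       mono_fr:        list of monolingual French sentences
       train/translate: hypothetical hooks into an NMT toolkit"""
    # 1. Train the reverse (French->English) model MTfe on the parallel data.
    mt_fe = train([(fr, en) for en, fr in parallel_en_fr])
    # 2. Back-translate the monolingual French data into synthetic English.
    synthetic_en = translate(mt_fe, mono_fr)
    # 3. Train the final English->French model on real + synthetic pairs.
    augmented = parallel_en_fr + list(zip(synthetic_en, mono_fr))
    mt_ef = train(augmented)
    return mt_ef
```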
Setting: English-French parallel data, plus monolingual data in both English and French.
Dual learning: assume we have MTef, MTfe, LMe, and LMf, and play the following game:
- Translate a monolingual English sample with MTef; get a reward from LMf.
- Translate a monolingual French sample with MTfe; get a reward from LMe.
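One round of the game might look as follows; `translate` and `log_prob` are hypothetical hooks, and in practice the rewards would drive e.g. a policy-gradient update of the two translation models:

```python
def dual_learning_round(sample_en, sample_fr, mt_ef, mt_fe, lm_e, lm_f,
                        translate, log_prob):
    """One round of the dual-learning game (hypothetical interfaces).

    translate(model, sentence) -> translated sentence
    log_prob(lm, sentence)     -> language-model log-probability
    """
    # Translate a monolingual English sample and score its fluency with LMf.
    fr_hyp = translate(mt_ef, sample_en)
    reward_ef = log_prob(lm_f, fr_hyp)

    # Translate a monolingual French sample and score its fluency with LMe.
    en_hyp = translate(mt_fe, sample_fr)
    reward_fe = log_prob(lm_e, en_hyp)

    # These rewards can then be used to update MTef and MTfe.
    return reward_ef, reward_fe
```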
Setting: English-French parallel data, plus monolingual data in both English and French.
Round-trip translation for supervision:
- Translate e to f’ with MTef.
- Translate f’ back to e’ with MTfe.
- Compute a loss between e and e’.
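A minimal sketch of the round-trip (reconstruction) loss, assuming hypothetical model interfaces `translate` and `logits`:

```python
import torch.nn.functional as F

def round_trip_loss(e_ids, mt_ef, mt_fe):
    """Round-trip loss sketch with hypothetical model hooks.

    e_ids:  LongTensor of token ids for an English sentence e
    mt_ef:  model with .translate(ids) -> translated token ids (decoding step)
    mt_fe:  model with .logits(src, tgt) -> per-step vocabulary logits
    """
    # e -> f' with MTef (decoding, treated as fixed here).
    f_prime = mt_ef.translate(e_ids)
    # f' -> e' with MTfe: score the original e as the reconstruction target.
    logits = mt_fe.logits(src=f_prime, tgt=e_ids)   # shape (len(e), vocab)
    loss = F.cross_entropy(logits, e_ids)           # compare e' with e
    return loss
```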
Setting: English-French parallel data, plus monolingual data in both English and French.
Pre-training: train language models LMe and LMf on the monolingual data, and use them to pre-train the encoder and the decoder.
From "Unsupervised Pretraining for Sequence to Sequence Learning", Ramachadran et al. 2017.
Shaded regions are pre-trained
From "Unsupervised Pretraining for Sequence to Sequence Learning", Ramachadran et al. 2017.
Shaded regions are pre-trained
From "Unsupervised Pretraining for Sequence to Sequence Learning", Ramachadran et al. 2017.
From "MASS: Masked Sequence to Sequence Pre-training for Language Generation", Song et al. 2019.
From "MASS: Masked Sequence to Sequence Pre-training for Language Generation", Song et al. 2019.
From "MASS: Masked Sequence to Sequence Pre-training for Language Generation", Song et al. 2019.
Skip-gram model: predict a word’s context. CBOW model: predict a word from its context. Others: GloVe, fastText, etc.
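For instance, such embeddings can be trained on monolingual text with gensim (assuming gensim >= 4; the toy corpus below is a placeholder):

```python
from gensim.models import Word2Vec

# Toy corpus: in practice, one tokenized sentence per line of monolingual text.
sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"]]

# sg=1 -> skip-gram (predict context from word); sg=0 -> CBOW (predict word from context).
skipgram = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)
cbow     = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0)

print(skipgram.wv["cat"].shape)        # (100,)
print(cbow.wv.most_similar("cat"))     # nearest neighbours in embedding space
```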
From "A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size", Neishi et al. 2017.
From "When and Why are Pre-trained Word Embeddings Useful for Neural Machine Translation?", Qi et al. 2017.
From "Learning Bilingual Lexicons from Monolingual Corpora", Haghighi et al. 2008.
From "Learning Bilingual Lexicons from Monolingual Corpora", Haghighi et al. 2008.
From "Bilingual Word Representations with Monolingual Quality in Mind", Luong et al. 2015.
From "Earth Mover’s Distance Minimization for Unsupervised Bilingual Lexicon Induction", Zhang et al. 2015.
Mapping between the two embedding spaces: rotation, scaling, translation.
From "Word Translation Without Parallel Data", Conneau et al. 2018.
The orthogonality assumption is important! What if we don’t have a seed lexicon?
From "Word Translation Without Parallel Data", Conneau et al. 2018.
From "On the Limitations of Unsupervised Bilingual Dictionary Induction", Søgaard et al. 2018.
Weaver (1955): This is really English, encrypted in some strange symbols
From "Deciphering Foreign Language", Ravi and Knight 2011.
Unsupervised MT: monolingual English and French data only, no parallel data. Learn MTef and MTfe (together with language models LMe and LMf) from the monolingual data alone, using round-trip translations (French -> English -> French, English -> French -> English) for supervision.
From "Unsupervised MT Using Monolingual Corpora Only", Lample et al 2018.
Also add an adversarial loss for the intermediate representations:
From "Unsupervised MT Using Monolingual Corpora Only", Lample et al 2018.