Low Resource Machine Translation - Marc’Aurelio Ranzato, Facebook AI




slide-1
SLIDE 1

Low Resource Machine Translation

Marc’Aurelio Ranzato

Facebook AI Research - NYC ranzato@fb.com

Stanford - CS224N, 10 March 2020

slide-2
SLIDE 2

Machine Translation

Train NMT: English-French training data → NMT System

Ingredients:

  • seq2seq with attention
  • SGD

Test NMT: “life is beautiful” → NMT System → “la vie est belle”

Ingredient:

  • beam search
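
The test-time ingredient above, beam search, can be sketched in a few lines. This is a minimal illustration and not the decoder of any particular NMT toolkit: `beam_search`, `toy_model` and the probability table `TOY` are hypothetical names and values chosen for the example, and the sketch uses no length normalization.

```python
import math

def beam_search(step_logprobs, beam_size, max_len, eos="</s>"):
    """Keep the `beam_size` best partial hypotheses at every step.

    `step_logprobs(prefix)` is an assumed interface: it returns a dict
    {token: log-probability} for the next token given the prefix.
    """
    beams = [((), 0.0)]          # (prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            for tok, lp in step_logprobs(prefix).items():
                candidates.append((prefix + (tok,), score + lp))
        # Highest-scoring extensions first; ties keep insertion order.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == eos:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    finished.extend(beams)       # hypotheses cut off at max_len
    return max(finished, key=lambda c: c[1])

# Hypothetical next-token probabilities (made up for illustration).
TOY = {
    (): {"le": 0.6, "la": 0.4},
    ("le",): {"vie": 0.5, "mer": 0.5},
    ("la",): {"vie": 0.95, "mer": 0.05},
}

def toy_model(prefix):
    if len(prefix) >= 2:
        return {"</s>": 0.0}     # end the sentence with probability 1
    return {tok: math.log(p) for tok, p in TOY[prefix].items()}

greedy = beam_search(toy_model, beam_size=1, max_len=3)
wider = beam_search(toy_model, beam_size=2, max_len=3)
```

With `beam_size=1` the search is greedy and commits to the locally best first token; with `beam_size=2` it keeps the second-best first token alive long enough to find a higher-probability sentence.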
slide-3
SLIDE 3

Some Stats

  • 6000+ languages in the world
  • 80% of the world population does not speak English
  • Less than 5% of the people in the world are native English speakers.

slide-4
SLIDE 4

The Long Tail of Languages

The top 10 languages are spoken by less than 50% of the people. The remaining ~6500 are spoken by the rest! More than 2000 languages are spoken by fewer than 1000 people each.

Source: https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/

slide-5
SLIDE 5

[Figure: (X to English)]

Source: https://ai.googleblog.com/2019/10/exploring-massively-multilingual.html

slide-6
SLIDE 6

Machine Translation in Practice

[Figure: English-Nepali training data; Nepali: 25M people.]

slide-7
SLIDE 7

Machine Translation in Practice

[Figure: English-Nepali training data; Nepali: 25M people.]

Parallel training data (a collection of sentences with corresponding translations) is small!

slide-8
SLIDE 8

Machine Translation in Practice

[Figure: English-Nepali training data.]

Let’s represent data with rectangles. The color indicates the language.

slide-9
SLIDE 9

Machine Translation in Practice

[Figure: English and Nepali data laid out by domain (Bible, Parliamentary).]

Let’s represent (human) translations with empty rectangles:

  • sentences originating in English, with corresponding Nepali translations
  • sentences originating in Nepali, with corresponding English translations

Observations:

  • Some parallel data originates in the source, some in the target language.
  • Source and target domains may not match.

slide-10
SLIDE 10

Machine Translation in Practice

[Figure: English and Nepali data by domain (News, Bible, Parliamentary), with monolingual (“mono”) blocks and the TEST set.]

  • Test data might be in another domain.
  • There might exist source-side in-domain monolingual data.

slide-11
SLIDE 11

Machine Translation in Practice

[Figure: English, Nepali and Hindi data by domain (News, Bible, Parliamentary, Books), with monolingual (“mono”) blocks and the TEST set.]

  • There might be parallel and monolingual data with a high resource language close to the low resource language of interest. This data may belong to a different domain.

slide-12
SLIDE 12

[Figure: data by language (English, Nepali, Hindi, Sinhala, Bengali, Spanish, Tamil, Gujarati, …) and by domain, with the TEST set.]

The Mondrian-like learning setting!

slide-13
SLIDE 13

Low Resource Machine Translation

Loose definition: A language pair can be considered low resource when the number of parallel sentences is on the order of 10,000 or fewer.

Note: modern NMT systems have several hundred million parameters nowadays!

Challenges:

  • data
      • sourcing data to train on
      • evaluation datasets
  • modeling
      • unclear learning paradigm
      • domain adaptation
      • generalization
slide-14
SLIDE 14

Why Is Low Resource MT Interesting?

  • It is about learning with less labeled data.
  • It is about modeling structured outputs and compositional learning.
  • It is a real problem to solve.

slide-15
SLIDE 15

Outline

[Figure: life of a researcher: MODEL, DATA, ANALYSIS.]

  • “The FLoRes evaluation for low resource MT:…” Guzmán, Chen et al., EMNLP 2019
  • “Phrase-based & Neural Unsup MT” Lample et al., EMNLP 2018
  • “FBAI WAT’19 My-En translation task submission” Chen et al., WAT@EMNLP 2019
  • “Investigating Multilingual NMT Representations at Scale” Kudugunta et al., EMNLP 2019
  • “Multilingual Denoising Pre-training for NMT” Liu et al., arXiv:2001.08210, 2020
  • “Analyzing uncertainty in NMT” Ott et al., ICML 2018
  • “On the evaluation of MT systems trained with back-translation” Edunov et al., ACL 2020
  • “The source-target domain mismatch problem in MT” Shen et al., arXiv:1909.13151, 2019

slide-16
SLIDE 16

A Big “Small-Data” Challenge

http://opus.nlpl.eu/

slide-17
SLIDE 17

Case Study: En-Ne

[Figure: English and Nepali data by domain. Domain labels: Wikipedia; Bible, JW300, etc.; GNOME, Ubuntu, etc.; Common Crawl. Blocks are marked “mono” (monolingual) and TEST.]

In-domain data: no parallel, little monolingual.
Out-of-domain: little parallel, quite a bit of monolingual.
No translation originating from Nepali.

slide-18
SLIDE 18

A Case Study: En-Ne

  • Parallel training data: versions of the Bible and the Ubuntu handbook (<1M sentences).
  • Nepali monolingual data: Wikipedia (90K), Common Crawl (a few million).
  • English monolingual data: almost unlimited.
  • Test data: ???

slide-19
SLIDE 19

FLoRes Evaluation Benchmark

  • Validation, test and hidden test sets, each with 3000 sentences in English-Nepali and English-Sinhala.
  • Sentences taken from Wikipedia documents.

Data Collection Process:

  • Very expensive and slow.
  • Very hard to produce high-quality translations:
      • automatic checks (language model filtering, transliteration filtering, length filtering, language id filtering, etc.),
      • human assessment.

Guzmán, Chen et al. “The FLoRes evaluation datasets for low resource MT…” EMNLP 2019
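
Two of the automatic checks listed above, length filtering and language id filtering, can be illustrated with a small sketch. This is not the FLoRes pipeline: the function names, the thresholds `max_ratio` and `min_script`, and the use of a Devanagari character count as a crude stand-in for a real language identifier are all assumptions made for the example.

```python
def length_ratio_ok(src, tgt, max_ratio=2.5):
    """Length filtering: reject pairs whose word counts differ too much."""
    n_src, n_tgt = len(src.split()), len(tgt.split())
    if n_src == 0 or n_tgt == 0:
        return False
    return max(n_src, n_tgt) / min(n_src, n_tgt) <= max_ratio

def devanagari_fraction(text):
    """Share of alphabetic characters that fall in the Devanagari block.

    A crude proxy for language id on the Nepali side (assumption: a real
    system would use a trained language identifier instead).
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    in_block = sum("\u0900" <= ch <= "\u097f" for ch in letters)
    return in_block / len(letters)

def keep_pair(en, ne, max_ratio=2.5, min_script=0.8):
    """Keep an English-Nepali pair only if it passes both checks."""
    return length_ratio_ok(en, ne, max_ratio) and devanagari_fraction(ne) >= min_script

def filter_pairs(pairs):
    return [(en, ne) for en, ne in pairs if keep_pair(en, ne)]

candidates = [
    ("life is beautiful", "जीवन सुन्दर छ"),          # plausible translation
    ("life is beautiful", "life is beautiful"),     # source copied, wrong script
    ("hello", "यो एक धेरै लामो वाक्य हो जुन मिल्दैन"),   # lengths do not match
]
clean = filter_pairs(candidates)
```

Checks of this kind only catch gross errors, such as untranslated copies or badly misaligned pairs, which is why the process above also relies on human assessment.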

slide-20
SLIDE 20

Examples

[Figure: Si-En and En-Si examples, each showing the original and its translation.]

Wikipedia originating in Si has different topics than Wikipedia originating in En.

Guzmán, Chen et al. “The FLoRes evaluation datasets for low resource MT…” EMNLP 2019

slide-21
SLIDE 21

Examples

[Figure: Ne-En and En-Ne examples.]

Guzmán, Chen et al. “The FLoRes evaluation datasets for low resource MT…” EMNLP 2019

slide-22
SLIDE 22
  • Useful to evaluate truly low resource language pairs.
  • Used in the WMT 2019 and WMT 2020 shared filtering tasks.
  • Several publications.
  • Sustained effort, more to come…

Data & baseline models: https://github.com/facebookresearch/flores

slide-23
SLIDE 23

What Did We Learn?

  • Data is often as important as, or more important than, designing a model.
  • Collecting data is not trivial.
  • Look at the data!!

slide-24
SLIDE 24

Outline

[Figure: life of a researcher: MODEL, DATA, ANALYSIS.]

  • “The FLoRes evaluation for low resource MT:…” Guzmán, Chen et al., EMNLP 2019
  • “Phrase-based & Neural Unsup MT” Lample et al., EMNLP 2018
  • “FBAI WAT’19 My-En translation task submission” Chen et al., WAT@EMNLP 2019
  • “Massively Multilingual NMT” Aharoni et al., ACL 2019
  • “Multilingual Denoising Pre-training for NMT” Liu et al., arXiv:2001.08210, 2020
  • “Analyzing uncertainty in NMT” Ott et al., ICML 2018
  • “On the evaluation of MT systems trained with back-translation” Edunov et al., ACL 2020
  • “The source-target domain mismatch problem in MT” Shen et al., arXiv:1909.13151, 2019

slide-25
SLIDE 25

[Figure: data by language (English, Nepali, Hindi, Sinhala, Bengali, Spanish, Tamil, Gujarati, …) and by domain, with the TEST set.]

slide-26
SLIDE 26

ML Perspective: Supervised Learning

Learning Framework: Supervised Learning.

Training Dataset:

$\mathcal{D} = \{(x, y)_i\}_{i=1,\dots,N}$

Per-sample loss (cross-entropy):

$\mathcal{L}(\theta) = -\log p(y \mid x)$

[Figure: NMT system (Encoder, Decoder; usual attention-based transformer). The input sentence $x$ (en) is mapped to a prediction, which is compared through the Cross-Entropy Loss with the human reference $y$ (ne) written by a human translator. DATA: English-Nepali parallel data, TRAIN // TEST.]

If N is small, how can we further regularize the model?

  • dropout [1]
  • label smoothing [2]

[1] Srivastava et al. “Dropout: a simple way to prevent neural networks from overfitting” JMLR 2014
[2] Szegedy et al. “Rethinking the inception architecture for computer vision” CVPR 2016
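
Label smoothing, one of the two regularizers named on this slide, changes only the target distribution of the per-sample cross-entropy loss. The sketch below works on a single token position with plain Python lists. The names `log_softmax` and `label_smoothed_nll` and the default `eps=0.1` are illustrative assumptions, not values taken from the talk.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def label_smoothed_nll(logits, target, eps=0.1):
    """Cross-entropy against a smoothed target distribution.

    The target puts weight (1 - eps) on the reference token and spreads
    eps uniformly over the whole vocabulary; eps = 0 gives the usual
    -log p(y|x) for this token.
    """
    logp = log_softmax(logits)
    nll = -logp[target]                      # standard cross-entropy term
    uniform = -sum(logp) / len(logp)         # cross-entropy vs. uniform
    return (1.0 - eps) * nll + eps * uniform

confident = [10.0, 0.0, 0.0]                 # model is very sure of token 0
plain_loss = label_smoothed_nll(confident, target=0, eps=0.0)
smoothed_loss = label_smoothed_nll(confident, target=0, eps=0.1)
```

For a very confident, correct prediction the smoothed loss stays clearly above the plain loss, which is how the extra term discourages over-confident output distributions when N is small.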

slide-27
SLIDE 27

ML Perspective: Semi-Supervised Learning

Adding source-side monolingual data. Idea: model p(x).

Learning Framework: DAE

Training Dataset:

$\mathcal{D} = \{(x, y)_i\}_{i=1,\dots,N}$

$\mathcal{M}^s = \{x^s_j\}_{j=1,\dots,M_s}$

Either pre-train or add a DAE loss to the supervised cross-entropy term.

Noise: word drop, swap, etc.

$\mathcal{L}^{DAE}(\theta) = -\log p(x \mid x + n)$

E.g.: “The cat the on sat mat.” “The cat sat on the.”

[Figure: NMT system (Encoder, Decoder), en to en. A monolingual sentence $x^s \sim \mathcal{M}^s$ plus noise $n$ is the input; the prediction is compared through the Cross-Entropy Loss with the input sentence. DATA: English monolingual data next to the English-Nepali parallel data, TRAIN // TEST. Figure labels: no noise; noise; only good region.]

Vincent et al. “Stacked denoising auto-encoders:…” JMLR 2010
Liu et al. “Multilingual denoising pretraining for NMT” arXiv:2001.08210 2020
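
The noise $n$ used by the DAE loss (word drop, swap) could be implemented along the following lines. This is a sketch under assumptions: `add_noise`, `p_drop` and `window` are hypothetical names and defaults, and the swap is done as a local shuffle (sort by position plus a random offset), a common choice in the denoising literature rather than something specified on the slide.

```python
import random

def add_noise(tokens, p_drop=0.1, window=3, rng=None):
    """Corrupt a token list with word drop and a local word shuffle."""
    rng = rng or random.Random()
    # Word drop: remove each token independently with probability p_drop.
    kept = [t for t in tokens if rng.random() >= p_drop]
    if not kept and tokens:
        kept = [rng.choice(tokens)]          # never return an empty input
    # Local shuffle: sort by position plus a random offset in [0, window],
    # so a token moves fewer than `window` positions.
    keys = [i + rng.uniform(0, window) for i in range(len(kept))]
    order = sorted(zip(keys, kept), key=lambda pair: pair[0])
    return [tok for _, tok in order]

rng = random.Random(0)
clean = "the cat sat on the mat".split()
noisy = add_noise(clean, p_drop=0.1, window=3, rng=rng)
# A DAE training pair is (noisy, clean): the model reads the corrupted
# sentence and is trained to reconstruct the original one.
pair = (noisy, clean)
```

The model never sees the clean sentence as input, so it cannot solve the task by copying; it has to rely on what it has learned about well-formed source sentences.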

slide-28
SLIDE 28

ML Perspective: Semi-Supervised Learning

Adding source-side monolingual data. An alternative approach to DAE.

Learning Framework: Self-Training (ST).

Training Dataset:

$\mathcal{D} = \{(x, y)_i\}_{i=1,\dots,N}$

$\mathcal{M}^s = \{x^s_j\}_{j=1,\dots,M_s}$

ALGORITHM

  • train model $p(y \mid x)$ on $\mathcal{D}$
  • repeat
      • decode $x^s \sim \mathcal{M}^s$ to $\bar{y}$ and create additional dataset $\mathcal{A}^s = \{(x^s_j, \bar{y}_j)\}_{j=1,\dots,M_s}$
      • retrain model on: $\mathcal{D} \cup \mathcal{A}^s$

Key elements: decoding and training noise.

$\mathcal{L}^{ST}(\theta) = -\log p(\bar{y} \mid x + n)$

$\mathcal{L}(\theta) = \mathcal{L}^{sup}(\theta) + \lambda \mathcal{L}^{ST}(\theta)$

[Figure: NMT system (Encoder, Decoder), en to ne. A monolingual sentence $x^s \sim \mathcal{M}^s$ is first decoded by the Encoder-Decoder into the decoded output $\bar{y}$. The same sentence plus noise $n$ is then the input; the prediction is compared through the Cross-Entropy Loss with $\bar{y}$. DATA: English monolingual data next to the English-Nepali parallel data, TRAIN // TEST.]

He et al. “Revisiting self-training for neural sequence generation” ICLR 2020
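
The self-training algorithm on this slide can be written as a short loop. The sketch below is illustrative only: `self_training` takes the training and decoding routines as arguments, and the `train` and `decode` functions are a toy word-by-word "model" invented here so that the loop can run end to end. A real system would plug in an NMT trainer, beam-search decoding and a noise function.

```python
from collections import Counter, defaultdict

def self_training(train, decode, parallel, mono_src, add_noise=lambda x: x, rounds=2):
    """Self-training: pseudo-label source-side monolingual data."""
    model = train(parallel)                          # train p(y|x) on D
    for _ in range(rounds):                          # repeat
        # Decode each clean x^s to y_bar; pair y_bar with the noised input.
        pseudo = [(add_noise(x), decode(model, x)) for x in mono_src]
        model = train(parallel + pseudo)             # retrain on D ∪ A^s
    return model

# Toy stand-ins (assumptions): a position-aligned word dictionary.
def train(pairs):
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        for s, t in zip(src, tgt):
            counts[s][t] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

def decode(model, src):
    return [model.get(s, s) for s in src]            # copy unknown words

parallel = [(["a", "b"], ["A", "B"])]
mono_src = [["b", "a"], ["a", "c"]]
model = self_training(train, decode, parallel, mono_src)
```

In this toy run the unknown word "c" is copied into the pseudo-target and then learned as its own translation, a small reminder that the targets in self-training are model outputs and may be wrong.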
slide-29
SLIDE 29

ML Perspective: Semi-Supervised Learning

Adding target-side monolingual data.

Learning Framework: Back-Translation (BT).

Two benefits:

a) Decoder learns a good language model.
b) Better generalization via data augmentation.

Note: unlike ST, the target is correct but the input is not.

Training Dataset:

$\mathcal{D} = \{(x, y)_i\}_{i=1,\dots,N}$

$\mathcal{M}^t = \{y^t_k\}_{k=1,\dots,M_t}$

ALGORITHM

  • train model $p(y \mid x)$ and $p(x \mid y)$ on $\mathcal{D}$
  • decode $y^t \sim \mathcal{M}^t$ to $\bar{x}$ with $p(x \mid y)$, create additional dataset $\mathcal{A}^t$
  • retrain model $p(y \mid x)$ on: $\mathcal{D} \cup \mathcal{A}^t$

[Figure: a backward NMT system $p(x \mid y)$ (Encoder, Decoder), ne to en, decodes a monolingual target sentence $y^t \sim \mathcal{M}^t$ into $\bar{x}$. The forward NMT system (Encoder, Decoder), en to ne, takes $\bar{x}$ as input; its prediction is compared through the Cross-Entropy Loss with the input sentence $y^t$. DATA: Nepali monolingual data next to the English-Nepali parallel data, TRAIN // TEST.]

p(y|x)

<latexit sha1_base64="dBHzK3A8/hcNmG8nE+hfv1ikGd4=">AECnichZNb9NAEIa3Nh8lfKVw5DIipSIEMUFCS6VGgSB4qCaNpKdWKtN5tkE3/Ju65suXvmwl/hwgGEuPILuPFvWNtJSdJUrGRpNPO8O+MvHbgMC5arT9bmn7t+o2b27dKt+/cvXe/vPgiPtRSGiP+I4fntiYU4d5tCeYcOhJEFLs2g49tmevs/rxGQ05871DkQS07+Kx0aMYKFS1o4GVdPFYkKwk3Yk7IGZQi1uJHWLgSmtlO0ZjWaz8V6W/nEHcsALMh5wa5pz04I7sPgaKQoyGQhrlpOzBSmWyY4cpKZr+3GKo1gujESNszUjXVkKasl5XC9VXMwOXNhxdjSne0Ln7XcaANMG4dpIq1p/WrPagsmiQJYuUbVC+kS+E7WTDGhAtdVk6dgOv4YlDc4h8zdEraYTHbabzZq4kwDT8C7QvfxcKNsbmTuH2x+IKZTZ+km1hdXTxv9GFVa60mq38wOXAmAcVND9dq/zbHPokcqkniIM5PzVageinOBSMOFSWzIjTAJMZHtNTFXrYpbyf5r+yhKrKDGHkh+rzBOTZUWKXc4T1ZkZpKv17LkptpJEYv+ynzgkhQjxSNRpEDwofsXcCQhZQIJ1EBJiFTXoFMcIiJUK+npJZgrI98OTjabRrPmrsfnlf2X83XsY0eoceohgz0Au2jt6iLeohon7Qv2jftu/5Z/6r/0H8WqLY1zxEK0f/9RdRQUZC</latexit>

$\mathcal{L}^{BT}(\theta) = -\log p(y \mid \bar{x})$

DATA: $A_t = \{(\bar{x}_k, y^t_k)\}_{k=1,\dots,M_t}$

$\mathcal{L}(\theta) = \mathcal{L}^{sup}(\theta) + \lambda \mathcal{L}^{BT}(\theta)$

Sennrich et al. “Improving NMT models with monolingual data” ACL 2016
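The back-translation recipe on this slide can be sketched in a few lines. This is a minimal illustration, not the implementation from the talk: `back_translate`, `mean_nll`, `total_loss`, `toy_backward` and `toy_prob` are hypothetical names, the "models" are toy stand-ins, and averaging the loss over sentences is a choice made for this sketch.

```python
import math
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence x, target sentence y)


def back_translate(mono_target: List[str], backward: Callable[[str], str]) -> List[Pair]:
    """Build A_t: pair each monolingual target sentence y with a synthetic source x_bar."""
    return [(backward(y), y) for y in mono_target]


def mean_nll(pairs: List[Pair], prob: Callable[[str, str], float]) -> float:
    """Mean of -log p(y|x) over the pairs (averaging is an assumption of this sketch)."""
    return -sum(math.log(prob(x, y)) for x, y in pairs) / len(pairs)


def total_loss(parallel: List[Pair], synthetic: List[Pair],
               prob: Callable[[str, str], float], lam: float = 1.0) -> float:
    """L(theta) = L_sup(theta) + lambda * L_BT(theta)."""
    return mean_nll(parallel, prob) + lam * mean_nll(synthetic, prob)


# Hypothetical stand-ins so the sketch runs without a real NMT system.
def toy_backward(y: str) -> str:
    return y[::-1]  # "translates" by reversing the string


def toy_prob(x: str, y: str) -> float:
    return 0.5  # constant p(y|x), for illustration only


A_t = back_translate(["la vie", "est belle"], toy_backward)
loss = total_loss([("life", "la vie")], A_t, toy_prob, lam=0.5)
```

In a real system `toy_backward` would be beam search or sampling from the backward model, and `toy_prob` would be the forward model's likelihood; the weight `lam` plays the role of $\lambda$.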
slide-30
SLIDE 30

ALGORITHM

  • train models $p(y \mid x)$ and $p(x \mid y)$ on $D$
  • repeat
      • decode $y^t \sim M_t$ to $\bar{x}$ with $p(x \mid y)$, create additional dataset $A_t$
      • decode $x^s \sim M_s$ to $\bar{y}$ with $p(y \mid x)$, create additional dataset $A_s$
      • retrain both $p(y \mid x)$ and $p(x \mid y)$ on: $D \cup A_t \cup A_s$

ML Perspective: Semi-Supervised Learning

[Figure: English and Nepali data: TEST set, parallel TRAIN set (//), and monolingual (mono) data on both sides, with $y^t \sim M_t$ decoded into $\bar{x}$ by $p(x \mid y)$ to train $p(y \mid x)$.]

Training Dataset: adding both source- and target-side monolingual data.

$D = \{(x, y)_i\}_{i=1,\dots,N}$

$M_t = \{y^t_k\}_{k=1,\dots,M_t}$

$M_s = \{x^s_j\}_{j=1,\dots,M_s}$

Learning Framework: Iterative ST & BT.

$A_t = \{(\bar{x}_k, y^t_k)\}_{k=1,\dots,M_t}$, with $y^t \sim M_t$ decoded into $\bar{x}$ by $p(x \mid y)$

$A_s = \{(x^s_j, \bar{y}_j)\}_{j=1,\dots,M_s}$, with $x^s \sim M_s$ decoded into $\bar{y}$ by $p(y \mid x)$

Retraining data: $D \cup A_t \cup A_s$

$\mathcal{L}^{total}(\theta) = -\log p(y \mid x) - \lambda_1 \log p(y^t \mid \bar{x}^t) - \lambda_2 \log p(\bar{y}^s \mid x^s)$

[Figure: English and Nepali data (TRAIN //, mono) in two phases. Phase 1: generate synthetic data from the monolingual text, decoded with $p(x \mid y)$ and $p(y \mid x)$. Phase 2: train models.]

Shen et al. “The source-target domain mismatch problem in MT” arXiv:1909.13151 2019
Chen et al. “FBAI WAT’19 Myanmar-English translation task submission” WAT@EMNLP 2019
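The iterative ST & BT loop on this slide can be sketched as follows. This is only an illustration under stated assumptions: `iterative_st_bt` and `toy_train` are hypothetical names, the "trainer" is a lookup table rather than an NMT system, and training the backward model on the same pairs with source and target swapped is one reading of "retrain both".

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]          # (source x, target y)
Model = Callable[[str], str]    # maps an input sentence to an output sentence


def flip(pairs: List[Pair]) -> List[Pair]:
    """Swap source and target so the same data can train the backward model."""
    return [(y, x) for x, y in pairs]


def iterative_st_bt(D: List[Pair], M_s: List[str], M_t: List[str],
                    train: Callable[[List[Pair]], Model],
                    rounds: int = 2) -> Tuple[Model, Model]:
    fwd = train(D)          # p(y|x)
    bwd = train(flip(D))    # p(x|y)
    for _ in range(rounds):
        A_t = [(bwd(y), y) for y in M_t]   # back-translation: synthetic sources
        A_s = [(x, fwd(x)) for x in M_s]   # self-training: synthetic targets
        data = D + A_t + A_s               # D ∪ A_t ∪ A_s
        fwd = train(data)
        bwd = train(flip(data))
    return fwd, bwd


def toy_train(pairs: List[Pair]) -> Model:
    """Hypothetical trainer: memorizes the pairs, returns "<unk>" otherwise."""
    table = dict(pairs)
    return lambda s: table.get(s, "<unk>")


fwd, bwd = iterative_st_bt(D=[("life", "vie")], M_s=["life"], M_t=["vie"],
                           train=toy_train)
```

With a real trainer, each round should improve the synthetic datasets because they are regenerated by the retrained models.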
slide-31
SLIDE 31

ML Perspective: Multi-Task/Multi-Modal

Training Dataset: adding parallel data in other languages.

Learning Framework: Multilingual Training

[Figure: DATA for English, Nepali and Hindi (TEST and parallel TRAIN // sets). An encoder-decoder NMT system (Src → Tgt) takes the input sentence with target language ID, $(x, \mathrm{LID})$; its prediction is trained with a cross-entropy loss against the human reference $y$ written by a human translator.]

$\mathcal{L}(\theta) = -\sum_{s,t} \mathbb{E}_{(x,y) \sim D_{s,t}} [\log p(y \mid x; t)]$

  • Share the same encoder and the same decoder across all the language pairs.
  • Prepend a target-language identifier to the source sentence to inform the decoder of the desired language.
  • Concatenate all the datasets together.
  • Train using the standard cross-entropy loss.

TRAIN //: $D_{en,ne} \cup D_{en,hi} \cup D_{hi,en} \cup D_{ne,hi}$

Johnson et al. “Google’s multilingual NMT system…” TACL 2017
Aharoni et al. “Massively multilingual NMT” NAACL 2019
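The data side of the multilingual recipe above (target-language identifier plus concatenation of all datasets) can be sketched as below. The names `tag_source` and `build_multilingual_corpus` and the `<2xx>` tag format are assumptions made for this illustration; the slide only says that an identifier is prepended.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def tag_source(src: str, tgt_lang: str) -> str:
    """Prepend a target-language token so the shared decoder knows what to produce."""
    return f"<2{tgt_lang}> {src}"


def build_multilingual_corpus(datasets: Dict[Tuple[str, str], List[Pair]]) -> List[Pair]:
    """Concatenate all D_{s,t}, tagging each source with its target language."""
    corpus: List[Pair] = []
    for (_src_lang, tgt_lang), pairs in datasets.items():
        corpus.extend((tag_source(x, tgt_lang), y) for x, y in pairs)
    return corpus


# Toy data with transliterated targets, for illustration only.
corpus = build_multilingual_corpus({
    ("en", "ne"): [("hello", "namaste")],
    ("en", "hi"): [("hello", "namaste")],
    ("hi", "en"): [("dhanyavaad", "thank you")],
})
```

A single shared encoder-decoder is then trained on `corpus` with the usual cross-entropy loss; nothing else in the model needs to know how many language pairs are present.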

slide-32
SLIDE 32

Conclusion so far…

  • Assuming no domain effect, there are lots of training paradigms depending on the available data.
  • It is hard to predict what works best.
  • In general, DAE pretraining, (iterative) BT and multi-lingual training perform strongly on low resource languages.
  • All these methods can be combined together, but it requires some level of craftsmanship…
  • Final touch: ensembling, fine-tuning, distillation, etc.

[Figure axes: amount of data, domain, language pair]

slide-33
SLIDE 33

Open Challenges

  • Diversity of domains and varying translation quality.
  • Wildly varying dataset sizes.
  • Diversity of language pairs.
  • Training large models to handle large datasets, as soon as we put together lots of language pairs and monolingual data.


slide-34
SLIDE 34

Case Study #1: Unsupervised MT

[Figure: DATA for English and French: a TEST set and monolingual (mono) data on each side.]

$M_t = \{y^t_k\}_{k=1,\dots,M_t}$

$M_s = \{x^s_j\}_{j=1,\dots,M_s}$

[Figure: an encoder-decoder model $p(x \mid y)$ (fr→en) decodes $y^t \sim M_t$ into $\bar{x}$.]

Lample et al. “Phrase-based and neural unsupervised MT” EMNLP 2018
Artetxe et al. “An effective approach to unsupervised MT” ACL 2019

slide-35
SLIDE 35

Case Study #1: Unsupervised MT

35

[Figure: data rectangles for English and French, labelled TEST, mono, mono.]

DATA

$M^t = \{y^t_k\}_{k=1,\dots,M^t}$

$M^s = \{x^s_j\}_{j=1,\dots,M^s}$

[Figure: back-translation training loop. A sentence $y^t \sim M^t$ (fr) is fed to the backward NMT system $p(x|y)$, which produces $\bar{x}$ (en). The NMT system $p(y|x)$ (Encoder, Decoder) takes $\bar{x}$ as input sentence, and its prediction (fr) is compared to $y^t$ with a Cross-Entropy Loss.]

…and vice versa starting from English. This is an example of auto-encoding or cycle consistency. Problem: lack of constraint on $\bar{x}$.

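The back-translation loop on this slide can be sketched in a few lines of Python. This is a minimal sketch under loud assumptions, not the authors' implementation: `ToyModel`, `translate`, `train_step` and `back_translation_round` are hypothetical names, and the "model" is a word-for-word lookup table standing in for a real seq2seq system trained with cross-entropy.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    """Hypothetical stand-in for an NMT system: a word-for-word lookup table."""
    table: dict = field(default_factory=dict)

    def translate(self, tokens):
        # Unknown words are copied through unchanged.
        return [self.table.get(tok, tok) for tok in tokens]

    def train_step(self, src_tokens, tgt_tokens):
        # "Training" here only memorises position-aligned word pairs;
        # a real system would take a gradient step on the cross-entropy loss.
        for src, tgt in zip(src_tokens, tgt_tokens):
            self.table.setdefault(src, tgt)


def back_translation_round(forward, backward, mono_src, mono_tgt):
    """One round of back-translation in both directions."""
    # y ~ M^t  ->  x_bar = backward(y)  ->  train forward on (x_bar, y).
    for y in mono_tgt:
        x_bar = backward.translate(y)
        forward.train_step(x_bar, y)
    # ...and vice versa, starting from the source side.
    for x in mono_src:
        y_bar = forward.translate(x)
        backward.train_step(y_bar, x)


forward = ToyModel({"life": "vie"})    # en -> fr, seeded with one entry
backward = ToyModel({"vie": "life"})   # fr -> en, seeded with one entry
mono_src = [["life", "is", "beautiful"]]
mono_tgt = [["la", "vie", "est", "belle"]]
for _ in range(2):                     # "iterative": repeat the round
    back_translation_round(forward, backward, mono_src, mono_tgt)
```

The toy also shows the problem named on the slide: nothing constrains $\bar{x}$ to be fluent English, so the forward table ends up memorising pairs such as `"la" -> "la"` from an untranslated back-translation.
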
slide-36
SLIDE 36

Case Study #1: Unsupervised MT

[Figure: back-translation training loop, repeated from slide 35.]

Problem: lack of modularity.

The decoder may behave differently when fed representations from the French encoder vs. the English encoder.

[Figure: denoising auto-encoding (DAE). A sentence $x^s \sim M^s$ (en) is corrupted with noise $n$ and fed as input sentence to the NMT system (Encoder, Decoder); the prediction (en) is compared to the original sentence with a Cross-Entropy Loss.]

DAE makes sure the decoder outputs fluently in the desired language.
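The slide only says that noise $n$ is added to the input. One common choice of noise, assumed here for illustration, is word dropout followed by a local shuffle. The function names `add_noise` and `dae_example`, and the defaults `p_drop=0.1` and `max_shift=3`, are hypothetical.

```python
import random


def add_noise(tokens, p_drop=0.1, max_shift=3, rng=None):
    """Corrupt a sentence by dropping words and shuffling them locally."""
    if rng is None:
        rng = random.Random()
    # Word dropout: remove each token with probability p_drop,
    # falling back to the first token so a non-empty input stays non-empty.
    kept = [tok for tok in tokens if rng.random() >= p_drop] or tokens[:1]
    # Local shuffle: sort by (position + random offset in [0, max_shift]),
    # so a token can only move a few positions from where it started.
    keys = [i + rng.uniform(0, max_shift) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]


def dae_example(sentence, rng=None):
    """Build one (noisy input, clean target) training pair for the DAE."""
    return add_noise(sentence, rng=rng), list(sentence)
```

Training the encoder-decoder to map the noisy input back to the clean sentence with the same cross-entropy loss is what pushes the decoder toward fluent output in that language.
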

Lample et al. “Phrase-based and neural unsupervised MT” EMNLP 2018
Artetxe et al. “An effective approach to unsupervised MT” ACL 2019

slide-37
SLIDE 37

Case Study #1: Unsupervised MT

37

[Figure: back-translation and DAE training loops, repeated from slide 36.]

[Figure: a single NMT system (Encoder, Decoder) translating Src to Tgt. The input sentence is paired with the target language ID, $(x, \mathrm{LID})$, and the Decoder outputs the prediction.]

  • As in multilingual NMT, share encoder and decoder parameters. The encoder is encouraged to produce shared representations (particularly if pre-trained).
slide-38
SLIDE 38

Case Study #1: Unsupervised MT

38

[Figure: back-translation loop, DAE, and shared encoder/decoder with target language ID, repeated from slide 37.]

Iterative BT + DAE + Multi-Lingual

slide-39
SLIDE 39

The same ideas can be applied to phrase-based statistical MT (PBSMT) systems. NMT and PBSMT can be combined for even better results.

WMT’14 En-Fr

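Lample et al. “Phrase-based and neural unsupervised MT” EMNLP 2018

Since unsupMT was trained on about 10M monolingual sentences, each parallel sentence is worth about 100 monolingual sentences (for this dataset and language pair); by that ratio, the comparison point is a supervised system trained on roughly 100K parallel sentences.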

slide-40
SLIDE 40

Case Study #2: FLoRes Ne-En

40

              In-domain (Wikipedia)   Out-of-domain
Parallel      None                    500K sentences (Bible, GNOME/Ubuntu, OpenSubtitle, …) *Hindi: 1.5M
Monolingual   100K sentences          ~5M sentences (CommonCrawl) *Hindi: 45M

slide-41
SLIDE 41

Results on FLoRes: Ne-En

41

slide-42
SLIDE 42

42

Results on FLoRes: Ne-En

slide-43
SLIDE 43

43

Results on FLoRes: Ne-En

slide-44
SLIDE 44

44

Results on FLoRes: Ne-En

slide-45
SLIDE 45

Case Study #3: English-Burmese

slide-46
SLIDE 46

Workshop on Asian Translation 2019: English-Burmese

              In-domain (News)           Out-of-domain
Parallel      20K sentences              200K sentences
Monolingual   ~79M sentences (En only)   ~23M sentences (My only)

“FBAI WAT’19 My-En translation task submission” Chen et al., WAT@EMNLP 2019

slide-47
SLIDE 47

Results: Iterative ST+BT

[Figure: two bar charts of BLEU, one for My -> En and one for En -> My, each comparing Parallel, Iter. 1, Iter. 2 and Iter. 3.]

“FBAI WAT’19 My-En translation task submission” Chen et al., WAT@EMNLP 2019

slide-48
SLIDE 48

Results: BT vs ST vs BT+ST

[Figure: bar chart of BLEU for My -> En at iter. 2, comparing iter. 1, iter. 2 BT, iter. 2 ST and iter. 2 BT+ST.]

“FBAI WAT’19 My-En translation task submission” Chen et al., WAT@EMNLP 2019

slide-49
SLIDE 49

Final Results of 2019 Competition

[Figure: two bar charts of BLEU. En -> My: Ours, NICT, NICT-NMT, UCSMNLP. My -> En: Ours, NICT-NMT, NICT, UCSYNLP.]

+8 BLEU compared to second best

“FBAI WAT’19 My-En translation task submission” Chen et al., WAT@EMNLP 2019

slide-50
SLIDE 50

50

Demo (Burmese —> English)

1

23.3 26.32 27.74 slide credit to Peng-Jen Chen

slide-51
SLIDE 51

Conclusion so far…

  • Iterative back-translation and multi-lingual training work remarkably well.
  • By feeding more data (BT, ST, pre-training, multi-lingual training) we can afford to train bigger models. Bigger models trained on more data generalize better.
  • Low-resource MT requires big compute! Remember that about 100 monolingual sentences give the same training signal as a single pair of parallel sentences.

51

slide-52
SLIDE 52

Outline

52

MODEL DATA ANALYSIS

life of a researcher

“The FLoRes evaluation for low resource MT:…” Guzmán, Chen et al. EMNLP 2019
“Phrase-based & Neural Unsup MT” Lample et al. EMNLP 2018
“FBAI WAT’19 My-En translation task submission” Chen et al., WAT@EMNLP 2019
“Massively Multilingual NMT” Aharoni et al., ACL 2019
“Multilingual Denoising Pre-training for NMT” Liu et al., arXiv 2001.08210 2020
“Analyzing uncertainty in NMT” Ott et al. ICML 2018
“On the evaluation of MT systems trained with back-translation” Edunov et al. ACL 2020
“The source-target domain mismatch problem in MT” Shen et al. arXiv 1909.13151 2019

slide-53
SLIDE 53

Simulating Low-Resource MT

Simulating low-resource MT with a high resource language: using EuroParl data with 20K parallel sentences and 100K monolingual target sentences.

Only parallel data: 30.4 BLEU
Parallel data + BT: 33.8 BLEU

+3.4 BLEU!

https://www.statmt.org/europarl/

EuroParl Fr—>En
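The simulation setup can be sketched as a subsampling step. This is a sketch under assumptions: `simulate_low_resource` is a hypothetical helper, the corpus is a list of `(source, target)` pairs, and taking the monolingual sentences from pairs that are disjoint from the small parallel set is a choice made here, not something the slide states.

```python
import random


def simulate_low_resource(parallel, n_parallel, n_mono, seed=0):
    """Split a large parallel corpus into a small parallel set and a
    disjoint target-side monolingual set."""
    if n_parallel + n_mono > len(parallel):
        raise ValueError("corpus too small for the requested split")
    rng = random.Random(seed)
    picked = rng.sample(range(len(parallel)), n_parallel + n_mono)
    small_parallel = [parallel[i] for i in picked[:n_parallel]]
    # Keep only the target side; dropping the source side makes these
    # sentences behave like monolingual data.
    mono_target = [parallel[i][1] for i in picked[n_parallel:]]
    return small_parallel, mono_target


# Toy corpus with the same 1:5 ratio as 20K parallel / 100K monolingual.
corpus = [(f"src {i}", f"tgt {i}") for i in range(12)]
small, mono = simulate_low_resource(corpus, n_parallel=2, n_mono=10)
```

With the real data the call would use `n_parallel=20_000` and `n_mono=100_000`; the monolingual set is then what gets back-translated.
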

slide-54
SLIDE 54

A Worrisome Finding

BT sometimes yields very mild improvements.

The FLORES evaluation datasets… Guzmán, Chen et al. EMNLP 2019

Example

Only parallel data: 15.2 BLEU
Parallel data + BT: 15.3 BLEU
FB public posts En -> My

+0.1 BLEU!

Why is BT not working as well?

slide-55
SLIDE 55
  • football
  • baseball
  • basketball
  • soccer
  • cricket
  • rowing

Sports

55

For the same topic: Different distribution over words!

slide-56
SLIDE 56
  • hamburger
  • clam chowder
  • apple pie
  • kimchi
  • bulgogi
  • bibimbap

Food

56

For the same topic: Different distribution over words!

slide-57
SLIDE 57

topic distribution

57

Domains differ in the topic distribution

  • politics
  • sports
  • religion
  • environment
slide-58
SLIDE 58

Examples from FLoRes

[Figure: example sentence pairs from FLoRes for Si-En and En-Si, each shown as original and translation.]

Guzmán, Chen et al. “The FLoRes evaluation datasets for low resource MT…” EMNLP 2019

Wikipedia in Sinhala has different topic distribution.

slide-59
SLIDE 59

Source-Target Domain Mismatch (STDM)

  • Def.: Content produced in blogs, social networks, news outlets, etc. varies with the geographic location.
  • STDM is even more pronounced in low resource MT, where source & target geographic locations are typically farther apart and cultures have more distinct traits.
  • STDM manifests itself as a different distribution of topics and, for the same topic, a different distribution over words.
  • STDM is hard to measure because of the unknown effect of translationese.
  • STDM makes the MT problem even harder: source-originating data and target-originating data are not comparable in general. BT won’t work as well. UnsupMT won’t work as well.
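The "different distribution over words" point can be made concrete with a small sketch that compares unigram distributions using the Jensen-Shannon divergence. This is an illustration only: the metric and the helper names `unigram_dist` and `js_divergence` are assumptions, and the comparison needs both corpora in the same language, which is exactly where translationese confounds the measurement.

```python
import math
from collections import Counter


def unigram_dist(sentences):
    """Relative frequency of each token in a list of tokenised sentences."""
    counts = Counter(tok for sent in sentences for tok in sent)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions,
    1 for distributions with no words in common."""
    def kl(a, m):
        return sum(pa * math.log2(pa / m[tok]) for tok, pa in a.items() if pa > 0)

    vocab = set(p) | set(q)
    m = {tok: 0.5 * (p.get(tok, 0.0) + q.get(tok, 0.0)) for tok in vocab}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Same topic (sports), different words, as in the two lists on slide 55.
domain_a = unigram_dist([["football", "baseball", "basketball"]])
domain_b = unigram_dist([["soccer", "cricket", "rowing"]])
same_topic_gap = js_divergence(domain_a, domain_b)
```

For these two toy lists the vocabularies do not overlap at all, so the divergence sits at its maximum; real corpora would fall somewhere in between.
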

Johnstone “Language and place” Cambridge Univ. Press 2010
Leech et al. “Computer corpora: What do they tell us about culture?” Journal Computers in English Linguistics 1992

slide-60
SLIDE 60

Questions

  • Is it true that BT is less effective when there is STDM?
  • What baselines shall we consider when there is STDM?
  • What are general best practices when there is STDM?
  • How to study STDM in a controlled setting?

60

slide-61
SLIDE 61

Controlled Setting

Source Domain: EuroParl. Target Domain: OpenSubtitles.

Source Language: Fr. Target Language: En.

[Figure: data rectangles labelled mono and mono, with sizes 10K sentences, <1M sentences and <1M sentences; two blocks are marked as human translations.]

slide-62
SLIDE 62

Controlled Setting

Source Domain EuroParl

mono mono

Source Language: Fr Target Language: En

Target Domain: α EuroParl + (1 − α) OpenSubtitles

α = 0

Target Domain OpenSubtitles
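
The controlled setting can be sketched in a few lines of Python. This is only an illustration under stated assumptions: α is taken to be the fraction of target originating data drawn from EuroParl (consistent with α = 0 giving OpenSubtitles and α = 1 giving EuroParl on these slides), and the function name `mix_target_domain` and the toy corpora are hypothetical, not part of the original experiments.

```python
import random


def mix_target_domain(europarl, opensubtitles, alpha, n, seed=0):
    """Sample n target-originating sentences for a given alpha.

    Assumption: a fraction alpha comes from EuroParl (same domain as the
    source side) and the remaining 1 - alpha from OpenSubtitles.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    rng = random.Random(seed)
    n_in = round(alpha * n)   # in-domain sentences (EuroParl)
    n_out = n - n_in          # out-of-domain sentences (OpenSubtitles)
    if n_in > len(europarl) or n_out > len(opensubtitles):
        raise ValueError("not enough sentences to sample from")
    mixed = rng.sample(europarl, n_in) + rng.sample(opensubtitles, n_out)
    rng.shuffle(mixed)
    return mixed


# Hypothetical toy corpora standing in for the real monolingual data.
europarl = [f"europarl sentence {i}" for i in range(20)]
opensubtitles = [f"opensubtitles sentence {i}" for i in range(20)]

out_of_domain = mix_target_domain(europarl, opensubtitles, alpha=0.0, n=8)
in_domain = mix_target_domain(europarl, opensubtitles, alpha=1.0, n=8)
```

Sweeping `alpha` between 0 and 1 moves the target side from fully out-of-domain to fully in-domain while the source side stays fixed, which is what lets the mismatch be studied in isolation.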

slide-63
SLIDE 63

Controlled Setting

Source Domain EuroParl Target Domain

mono mono

Source Language: Fr Target Language: En

Target Domain: α EuroParl + (1 − α) OpenSubtitles

intermediate value of α

slide-64
SLIDE 64

Controlled Setting

Source Domain & Target Domain EuroParl

mono mono

Source Language: Fr Target Language: En

Target Domain: α EuroParl + (1 − α) OpenSubtitles

α = 1

slide-65
SLIDE 65
  • Back-Translation: target mono data
  • Self-Training: source mono data (in-domain mono data)

Q.: Is it better to have clean targets but out-of-domain data, or noisy targets but in-domain data?
Q.: What’s the effect of the amount of parallel/monolingual data?
Q.: What’s the effect of the quality of the forward model when training with ST?


slide-66
SLIDE 66

Varying Domain of Target Originating Data

[Figure: α varies from 0 (target originating data is out-of-domain) to 1 (target originating data is in-domain); the target domain is α EuroParl + (1 − α) OpenSubtitles]

slide-67
SLIDE 67

Baseline Approaches

  • Bitext only: learn p(y|x) on Source → Target bitext
  • Back-Translation: 1) learn p(x|y); 2) apply p(x|y) to target mono; 3) learn p(y|x)
  • Self-Training: 1) learn p(y|x); 2) apply p(y|x) to source mono to get the model translation; 3) re-learn p(y|x) on source mono and its model translation

slide-68
SLIDE 68

Baseline Approaches: Only In-Domain Data

  • Bitext only
  • Back-Translation
  • Self-Training

slide-69
SLIDE 69

In-Domain Only VS. Mixed Domain
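
The three baselines being compared (bitext only, back-translation, self-training) can be sketched as follows. This is a minimal illustration, not the system used in the talk: `train` and `translate` are hypothetical toy stand-ins for NMT training and decoding (a position-aligned word dictionary), and retraining on the bitext plus the synthetic pairs in step 3 is an assumption.

```python
from collections import Counter, defaultdict


def train(pairs):
    """Toy stand-in for NMT training: count position-aligned word pairs."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        for s, t in zip(src.split(), tgt.split()):
            counts[s][t] += 1
    # Keep the most frequent translation of each source word.
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def translate(model, sentence):
    """Toy stand-in for decoding: word-by-word lookup, unknown words copied."""
    return " ".join(model.get(w, w) for w in sentence.split())


def bitext_only(bitext):
    return train(bitext)  # learn p(y|x) from parallel data only


def back_translation(bitext, target_mono):
    backward = train([(y, x) for x, y in bitext])                   # 1) learn p(x|y)
    synthetic = [(translate(backward, y), y) for y in target_mono]  # 2) apply p(x|y)
    return train(bitext + synthetic), synthetic                     # 3) learn p(y|x)


def self_training(bitext, source_mono):
    forward = train(bitext)                                         # 1) learn p(y|x)
    synthetic = [(x, translate(forward, x)) for x in source_mono]   # 2) apply p(y|x)
    return train(bitext + synthetic), synthetic                     # 3) re-learn p(y|x)


bitext = [("le chat dort", "the cat sleeps"), ("le chien mange", "the dog eats")]
bt_model, bt_pairs = back_translation(bitext, ["the dog sleeps"])
st_model, st_pairs = self_training(bitext, ["le chat mange"])
```

In the back-translation pairs the target side is human-written and the source side is model output; in the self-training pairs it is the other way round. That asymmetry is the clean-targets versus in-domain-sources trade-off raised in the questions earlier.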


slide-70
SLIDE 70

Varying Amount of Monolingual Data (α = 0)

slide-71
SLIDE 71

Conclusion

  • STDM is particularly significant in low resource language pairs.
  • A controlled setting helps to study STDM in isolation.
  • ST is more robust than BT to STDM. We have already seen that combining ST & BT worked best in En-My.
  • In practice, the influence of STDM depends on several factors, such as the amount of parallel and monolingual data, the domains, etc. In particular, if domains are not too distinct, STDM may even help with regularization!

slide-72
SLIDE 72

What I did not talk about: Filtering

  • Data: extract a clean version of Common Crawl.
  • Learn a joint embedding space.
  • Use approximate nearest neighbor methods to find the closest matching sentence in embedding space.
  • Translation quality on several languages is even higher than using actual existing bilingual + mono data!

Schwenk et al. “CCMatrix: Mining Billions of High-Quality Parallel Sentences on the Web” arXiv:1911.04944 2019

BT pre-training multilingual filtering
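
The mining step (embed sentences from both languages in a joint space, then pair each source sentence with its closest target sentence) can be sketched as below. This is a simplified illustration under stated assumptions: the embeddings are hand-written toy vectors rather than the output of a real multilingual encoder, the search is exhaustive rather than approximate, and the plain cosine threshold is a stand-in, not the scoring used in CCMatrix. The names `cosine` and `mine_pairs` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_pairs(src, tgt, threshold=0.9):
    """src, tgt: lists of (sentence, embedding) in a shared embedding space.

    For each source sentence, keep its nearest target sentence if the
    similarity reaches the threshold. Exhaustive search is used here for
    clarity; at web scale an approximate nearest neighbor index is needed.
    """
    mined = []
    for s_text, s_vec in src:
        best_text, best_sim = None, -1.0
        for t_text, t_vec in tgt:
            sim = cosine(s_vec, t_vec)
            if sim > best_sim:
                best_text, best_sim = t_text, sim
        if best_text is not None and best_sim >= threshold:
            mined.append((s_text, best_text, best_sim))
    return mined


# Toy embeddings (assumed): parallel sentences lie close together.
src = [("la vie est belle", [0.9, 0.1, 0.0]), ("bonjour", [0.0, 1.0, 0.0])]
tgt = [("life is beautiful", [1.0, 0.1, 0.0]), ("the cat sleeps", [0.0, 0.0, 1.0])]
mined = mine_pairs(src, tgt)
```

Source sentences with no sufficiently close target are simply dropped, which is what makes this a filtering method: only confident matches end up in the mined parallel corpus.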

slide-73
SLIDE 73

Final Remarks

  • Low resource MT is a good use case for several long-standing ML problems: aligning domains, learning with less supervision, leveraging compositionality, etc.
  • The importance and difficulty of data collection should not be underestimated.
  • A healthy cycle of research: data, modeling, analysis.
  • Low resource MT key idea: use as many auxiliary tasks and data as possible.
  • Low resource MT requires lots of data and compute.
  • Lots of open challenges.
  • Specific to low resource: dealing with all sorts of domain mismatch, learning from little data, quality of evaluation sets…
  • General to text generation: better use of context, common sense, striking a good trade-off between accuracy and speed, controllability, safety, biases…

slide-74
SLIDE 74

Questions? Вопросы? ¿Preguntas? Domande?

Guillaume Lample, Ludovic Denoyer, Alexis Conneau, Hervé Jegou, Myle Ott, Michael Auli, Sergey Edunov, Peng-Jen Chen, Matt Le, Jiajun Shen, Juan-Miguel Pino, Jiatao Gu, Philipp Koehn, Paco Guzmán, Vishrav Chaudhary, Xian Li, Junxian He, Naman Goyal