Low Resource Machine Translation
Marc’Aurelio Ranzato
Facebook AI Research - NYC ranzato@fb.com
Stanford - CS224N, 10 March 2020
Machine Translation
[Figure: English-French training data is used to train an NMT system. At test time, the NMT system translates "life is beautiful" into "la vie est belle".]
Ingredients: seq2seq with attention
does not speak English
the world are native English speakers.
Some Stats
Source: https://www.statista.com/statistics/266808/the-most-spoken-languages-worldwide/
The top 10 languages are spoken by less than 50% of the people. The remaining ~6,500 languages are spoken by the rest! More than 2,000 languages are spoken by fewer than 1,000 people.
Source: https://ai.googleblog.com/2019/10/exploring-massively-multilingual.html
(X to English)
English-Nepali training data (Nepali: 25M people)
Parallel training data (a collection of sentences with their corresponding translations) is small!
Let’s represent data with rectangles. The color indicates the language.
[Figure: rectangles for English and Nepali data, arranged by domain (Bible, Parliamentary).]
Let’s represent (human) translations with empty rectangles.
[Figure: filled rectangles are sentences originating in English or in Nepali; empty rectangles are the corresponding Nepali or English translations.]
[Figure: English and Nepali data by domain (News, Bible, Parliamentary), with monolingual (mono) data and the TEST set.]
resource language of interest. This data may belong to a different domain.
[Figure: English, Nepali, and Hindi data by domain (Books, News, Bible, Parliamentary), with monolingual (mono) data and the TEST set.]
[Figure: data by domain for English, Nepali, Hindi, Sinhala, Bengali, Spanish, Tamil, and Gujarati, with the TEST set.]
the Mondrian-like learning setting!
Loose definition: a language pair can be considered low resource when the number of parallel sentences is on the order of 10,000 or fewer.
Note: modern NMT systems have several hundred million parameters!
Challenges:
compositional learning.
life of a researcher
“The FLoRes evaluation for low resource MT:…” Guzmán, Chen et al., EMNLP 2019
“Phrase-based & Neural Unsup MT” Lample et al., EMNLP 2018
“FBAI WAT’19 My-En translation task submission” Chen et al., WAT@EMNLP 2019
“Investigating Multilingual NMT Representations at Scale” Kudugunta et al., EMNLP 2019
“Multilingual Denoising Pre-training for NMT” Liu et al., arXiv:2001.08210, 2020
“Analyzing uncertainty in NMT” Ott et al., ICML 2018
“On the evaluation of MT systems trained with back-translation” Edunov et al., ACL 2020
“The source-target domain mismatch problem in MT” Shen et al., arXiv:1909.13151, 2019
http://opus.nlpl.eu/
[Figure: English-Nepali data by domain. Domains: Wikipedia; Bible, JW300, etc.; GNOME, Ubuntu, etc.; Common Crawl. Labels: mono, TEST.]
In-domain data: no parallel, little monolingual.
Out-of-domain data: little parallel, quite a bit of monolingual.
No translations originating from Nepali.
sentences).
English-Nepali and English-Sinhala.
language id filtering, etc),
Data Collection Process:
Guzmán, Chen et al. “The FLoRes evaluation datasets for low resource MT…” EMNLP 2019
Si-En En-Si
translation
Wikipedia originating in Si has different topics than Wikipedia originating in En.
Ne-En En-Ne
Data & baseline models: https://github.com/facebookresearch/flores
life of a researcher
“The FLoRes evaluation for low resource MT:…” Guzmán, Chen et al., EMNLP 2019
“Phrase-based & Neural Unsup MT” Lample et al., EMNLP 2018
“FBAI WAT’19 My-En translation task submission” Chen et al., WAT@EMNLP 2019
“Massively Multilingual NMT” Aharoni et al., ACL 2019
“Multilingual Denoising Pre-training for NMT” Liu et al., arXiv:2001.08210, 2020
“Analyzing uncertainty in NMT” Ott et al., ICML 2018
“On the evaluation of MT systems trained with back-translation” Edunov et al., ACL 2020
“The source-target domain mismatch problem in MT” Shen et al., arXiv:1909.13151, 2019
[Figure: data by domain for English, Nepali, Hindi, Sinhala, Bengali, Spanish, Tamil, and Gujarati, with the TEST set.]
Learning Framework: Supervised Learning.
Training dataset: $\mathcal{D} = \{(x, y)_i\}_{i=1,\dots,N}$
Per-sample loss: $\mathcal{L}(\theta) = -\log p(y \mid x)$
[Figure: the NMT system (encoder-decoder, the usual attention-based transformer) reads the input sentence $x$ (en) and outputs a prediction (ne). A cross-entropy loss compares the prediction with the human reference $y$ produced by a human translator. Data diagram: English, Nepali; TRAIN, TEST.]
If N is small, how can we further regularize the model?
[1] Srivastava et al. “Dropout: a simple way to prevent neural networks from overfitting” JMLR 2014
[2] Szegedy et al. “Rethinking the inception architecture for computer vision” CVPR 2016
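The per-sample loss can be made concrete with a small sketch. It assumes that p(y|x) factorizes over target tokens, and that the regularizer associated with the Szegedy et al. reference is label smoothing. The function names and the toy probabilities are hypothetical and not taken from the slides.

```python
import math

def sequence_nll(token_probs):
    """Per-sample loss -log p(y|x), with p(y|x) factorized over target tokens.

    token_probs[t] is the (hypothetical) model probability of the t-th
    reference token given the source sentence and the previous tokens.
    """
    return -sum(math.log(p) for p in token_probs)

def label_smoothed_nll(dist, target, eps=0.1):
    """Cross-entropy of one predicted distribution against a smoothed target.

    The smoothed target puts (1 - eps) on the reference token and spreads
    eps uniformly over the whole vocabulary (label smoothing).
    """
    vocab_size = len(dist)
    loss = 0.0
    for k, p in enumerate(dist):
        q = (1.0 - eps) * (1.0 if k == target else 0.0) + eps / vocab_size
        loss -= q * math.log(p)
    return loss

# Toy usage: a 3-token reference, then one step over a 4-word vocabulary.
print(sequence_nll([0.9, 0.8, 0.95]))
print(label_smoothed_nll([0.7, 0.1, 0.1, 0.1], target=0))
```

With eps=0 the smoothed loss reduces to the plain negative log-likelihood of the reference token. A positive eps penalizes over-confident predictions, which is one way to regularize when N is small.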
Adding source-side monolingual data. Idea: model p(x).
Learning Framework: DAE (denoising auto-encoding)
Training dataset: $\mathcal{D} = \{(x, y)_i\}_{i=1,\dots,N}$, $\mathcal{M}^s = \{x^s_j\}_{j=1,\dots,M_s}$
Either pre-train or add a DAE loss to the supervised cross-entropy term.
Noise: word drop, swap, etc.
$\mathcal{L}_{DAE}(\theta) = -\log p(x \mid x + n)$
[Figure: the NMT system (encoder-decoder) reads a monolingual English sentence $x^s \sim \mathcal{M}^s$ corrupted with noise $n$ and predicts the original English sentence; training uses a cross-entropy loss. Data diagram: English, Nepali; TRAIN, TEST; mono.]
E.g.: “The cat the on sat mat.” “The cat sat on the.”
[Figure labels: no noise; noise only; good region.]
Vincent et al. “Stacked denoising auto-encoders:…” JMLR 2010
Liu et al. “Multilingual denoising pretraining for NMT” arXiv:2001.08210 2020
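The noise used for denoising auto-encoding (word drop and swap) might look like the following minimal sketch. The function name `add_noise`, the parameters `drop_prob` and `max_shift`, and their default values are assumptions made for illustration. The slides do not specify the exact noise model.

```python
import random

def add_noise(tokens, drop_prob=0.1, max_shift=3, rng=None):
    """Corrupt a tokenized sentence with word drop and local word swaps.

    Assumed noise model (not specified in the slides):
      * each token is dropped independently with probability drop_prob;
      * the remaining tokens are locally shuffled by sorting them on
        (position + random offset in [0, max_shift]).
    """
    # Fixed seed by default so the sketch is reproducible.
    rng = rng or random.Random(0)
    if not tokens:
        return []
    # Word drop.
    kept = [tok for tok in tokens if rng.random() >= drop_prob]
    if not kept:  # never return an empty sentence
        kept = [rng.choice(tokens)]
    # Local swap: words can only trade places with nearby words.
    keys = [i + rng.uniform(0, max_shift) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]

# Toy usage: the model would be trained to reconstruct `clean` from `noisy`.
clean = "the cat sat on the mat".split()
noisy = add_noise(clean, rng=random.Random(1))
print(noisy, "->", clean)
```

Setting `drop_prob=0` and `max_shift=0` returns the sentence unchanged, which corresponds to the "no noise" end of the spectrum. Very aggressive settings leave little of the input for the model to condition on.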
Adding source-side monolingual data. An alternative approach to DAE.
Learning Framework: Self-Training (ST).
Training dataset: $\mathcal{D} = \{(x, y)_i\}_{i=1,\dots,N}$, $\mathcal{M}^s = \{x^s_j\}_{j=1,\dots,M_s}$
ALGORITHM
1) Train $p(y \mid x)$ on dataset $\mathcal{D}$.
2) Decode $x^s \sim \mathcal{M}^s$ with $p(y \mid x)$ to obtain $\bar{y}$, giving $\mathcal{A}^s = \{(x^s_j, \bar{y}_j)\}_{j=1,\dots,M_s}$.
3) Train $p(y \mid x)$ on $\mathcal{D} \cup \mathcal{A}^s$.
Key elements: decoding and training noise.
$\mathcal{L}_{ST}(\theta) = -\log p(\bar{y} \mid x + n)$
[Figure: the NMT system (encoder-decoder) first decodes a monolingual English sentence $x^s \sim \mathcal{M}^s$ into the decoded output $\bar{y}$ (ne); it is then trained with a cross-entropy loss to predict $\bar{y}$ from the same input sentence corrupted with noise $n$. Data diagram: English, Nepali; TRAIN, TEST; mono.]
He et al. “Revisiting self-training for neural sequence generation” ICLR 2020
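One round of the self-training data construction can be sketched as follows. The model and the noise function are passed in as plain callables. `self_training_data`, `toy_translate`, and `toy_noise` are hypothetical names, and the toy stand-ins only mimic a real NMT system.

```python
def self_training_data(parallel, mono_source, translate, add_noise):
    """Build the training set D ∪ A^s for one round of self-training.

    parallel:    list of (x, y) pairs, the human-translated data D
    mono_source: list of source sentences, the monolingual data M^s
    translate:   current forward model, maps x to a decoded output y_bar
    add_noise:   input corruption applied at training time (x + n)
    """
    # Decode each clean monolingual source sentence with the current model.
    pseudo_parallel = [(x, translate(x)) for x in mono_source]
    # Noise goes on the input side only; the decoded target is kept as is.
    noisy = [(add_noise(x), y_bar) for x, y_bar in pseudo_parallel]
    return list(parallel) + noisy

# Toy stand-ins: "translation" upper-cases words, "noise" drops the last word.
def toy_translate(tokens):
    return [t.upper() for t in tokens]

def toy_noise(tokens):
    return tokens[:-1] if len(tokens) > 1 else tokens

D = [(["a", "b"], ["A", "B"])]
M_s = [["c", "d", "e"]]
for pair in self_training_data(D, M_s, toy_translate, toy_noise):
    print(pair)
```

In practice the loop would be repeated: retrain on the combined set, decode the monolingual data again with the improved model, and so on.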
$\mathcal{L}(\theta) = \mathcal{L}_{sup}(\theta) + \lambda \mathcal{L}_{ST}(\theta)$
Adding target-side monolingual data.
Two benefits:
a) Decoder learns a good language model.
b) Better generalization via data augmentation.
Unlike ST, the target is correct but the input is not.
Learning Framework: Back-Translation (BT).
Training dataset: $\mathcal{D} = \{(x, y)_i\}_{i=1,\dots,N}$, $\mathcal{M}^t = \{y^t_k\}_{k=1,\dots,M_t}$
ALGORITHM
1) Train the backward model $p(x \mid y)$ on dataset $\mathcal{D}$.
2) Decode $y^t \sim \mathcal{M}^t$ with $p(x \mid y)$ to obtain $\bar{x}$, giving the additional dataset $\mathcal{A}^t = \{(\bar{x}_k, y^t_k)\}_{k=1,\dots,M_t}$.
3) Train $p(y \mid x)$ on $\mathcal{D} \cup \mathcal{A}^t$.
$\mathcal{L}_{BT}(\theta) = -\log p(y \mid \bar{x})$
$\mathcal{L}(\theta) = \mathcal{L}_{sup}(\theta) + \lambda \mathcal{L}_{BT}(\theta)$
[Figure: the backward NMT system $p(x \mid y)$ (encoder-decoder) reads a monolingual Nepali sentence $y^t \sim \mathcal{M}^t$ and outputs an English back-translation $\bar{x}$; the forward NMT system $p(y \mid x)$ then reads $\bar{x}$ as its input sentence, and its prediction is trained against $y^t$ with a cross-entropy loss. Data diagram: English, Nepali; TRAIN, TEST; mono.]
Sennrich et al. “Improving NMT models with monolingual data” ACL 2016
[Diagram: English and Nepali columns with TRAIN parallel data $\mathcal{D}$, TEST data, and target-side mono data. Sentences $y^t \sim \mathcal{M}^t$ are decoded with p(x|y) into $\bar{x}$; the resulting pairs form the additional dataset used, together with $\mathcal{D}$, to train p(y|x).]
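The back-translation recipe can be sketched in a few lines. This is a minimal illustration, not the implementation from Sennrich et al.: the function names (`back_translate`, `mean_nll`, `bt_objective`), the averaging over each dataset, and the toy stand-ins for the two models are all assumptions made for the example.

```python
import math
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence x, target sentence y)


def back_translate(mono_tgt: List[str], backward: Callable[[str], str]) -> List[Pair]:
    """Build A^t = {(x_bar_k, y^t_k)}: decode each target sentence with p(x|y)."""
    return [(backward(y), y) for y in mono_tgt]


def mean_nll(pairs: List[Pair], prob: Callable[[str, str], float]) -> float:
    """Average of -log p(y|x) over a dataset; prob(x, y) stands in for the model."""
    return -sum(math.log(prob(x, y)) for x, y in pairs) / len(pairs)


def bt_objective(parallel: List[Pair], synthetic: List[Pair],
                 prob: Callable[[str, str], float], lam: float) -> float:
    """L = L_sup + lambda * L_BT (dataset averaging is an assumption of this sketch)."""
    return mean_nll(parallel, prob) + lam * mean_nll(synthetic, prob)


# Toy usage: an upper-casing "backward model" and a constant-probability "forward model".
parallel = [("src-1", "tgt-1")]
synthetic = back_translate(["tgt-2", "tgt-3"], backward=str.upper)
loss = bt_objective(parallel, synthetic, prob=lambda x, y: 0.5, lam=0.5)
```

In a real system `backward` would be a trained target-to-source NMT model and `prob` the forward model's sentence probability; the weight `lam` controls how much the synthetic pairs count relative to the human-translated ones.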
Adding both source- and target-side monolingual data.
Training Dataset
$\mathcal{D} = \{(x, y)_i\}_{i=1,\dots,N}$
$\mathcal{M}^t = \{y^t_k\}_{k=1,\dots,M_t}$
$\mathcal{M}^s = \{x^s_j\}_{j=1,\dots,M_s}$
Learning Framework: Iterative self-training (ST) & back-translation (BT).
Self-training: $x^s \sim \mathcal{M}^s$ is decoded with p(y|x) to give $\bar{y}$.
$\mathcal{A}^s = \{(x^s_j, \bar{y}_j)\}_{j=1,\dots,M_s}$
p(x|y)
$\mathcal{D} \cup \mathcal{A}^t \cup \mathcal{A}^s$
$\mathcal{L}_{total}(\theta) = -\log p(y \mid x) - \lambda_1 \log p(y^t \mid \bar{x}^t) - \lambda_2 \log p(\bar{y}^s \mid x^s)$
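As a rough illustration of the three-term objective, each training pair can carry a weight that depends on where it came from: 1 for human-translated pairs, λ1 for back-translated pairs, λ2 for self-trained pairs. The helper names (`weighted_examples`, `total_loss`) and the constant-probability toy model are hypothetical, chosen only for this sketch.

```python
import math
from typing import Callable, List, Tuple

Pair = Tuple[str, str]              # (source x, target y)
Weighted = Tuple[str, str, float]   # (source x, target y, loss weight)


def weighted_examples(d: List[Pair], a_t: List[Pair], a_s: List[Pair],
                      lam1: float, lam2: float) -> List[Weighted]:
    """Concatenate D, A^t and A^s, tagging every pair with its loss weight."""
    return ([(x, y, 1.0) for x, y in d]
            + [(x, y, lam1) for x, y in a_t]
            + [(x, y, lam2) for x, y in a_s])


def total_loss(examples: List[Weighted], prob: Callable[[str, str], float]) -> float:
    """Sum of weight * (-log p(y|x)); prob(x, y) stands in for the forward model."""
    return sum(-w * math.log(prob(x, y)) for x, y, w in examples)


# Toy usage with placeholder sentences and a model that always returns 0.25.
examples = weighted_examples(
    d=[("x1", "y1")],
    a_t=[("x_bar", "y_t")],
    a_s=[("x_s1", "y_bar1"), ("x_s2", "y_bar2")],
    lam1=0.5,
    lam2=0.25,
)
loss = total_loss(examples, prob=lambda x, y: 0.25)
```

With these toy numbers the weights add up to 1 + 0.5 + 2 × 0.25 = 2, so the loss equals 2 · log 4.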
[Diagram: English and Nepali columns with TRAIN parallel data and mono data on each side. Phase 1 (generate): the monolingual data is decoded with p(x|y) and p(y|x). Phase 2 (train models): the models are trained on the enlarged data.]
DATA
$\mathcal{A}^t = \{(\bar{x}_k, y^t_k)\}_{k=1,\dots,M_t}$
Shen et al. “The source-target domain mismatch problem in MT” arXiv:1909.13151 2019
Chen et al. “FBAI WAT’19 Myanmar-English translation task submission” WAT@EMNLP 2019
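The two-phase loop (generate, then train) might look like the sketch below. It is an illustration under stated assumptions rather than the procedure from the cited papers: `lookup_trainer` is a toy stand-in for NMT training, and retraining the backward model on the reversed union of the data is an assumption of this example.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]                  # (source x, target y)
Translator = Callable[[str], str]
Trainer = Callable[[List[Pair]], Translator]


def lookup_trainer(pairs: List[Pair]) -> Translator:
    """Toy 'training': memorise the pairs, copy any unseen input unchanged."""
    table = dict(pairs)
    return lambda sentence: table.get(sentence, sentence)


def iterative_st_bt(parallel: List[Pair], mono_src: List[str], mono_tgt: List[str],
                    train: Trainer, rounds: int) -> Tuple[Translator, Translator]:
    """Alternate between generating synthetic pairs and retraining both models."""
    reverse = lambda data: [(y, x) for x, y in data]
    fwd = train(parallel)            # p(y|x), initially trained on D only
    bwd = train(reverse(parallel))   # p(x|y), initially trained on D only
    for _ in range(rounds):
        # Phase 1 (generate): back-translate target mono, self-train on source mono.
        a_t = [(bwd(y), y) for y in mono_tgt]
        a_s = [(x, fwd(x)) for x in mono_src]
        # Phase 2 (train models): retrain on D together with A^t and A^s.
        data = parallel + a_t + a_s
        fwd = train(data)
        bwd = train(reverse(data))
    return fwd, bwd


# Toy usage with placeholder sentences.
fwd, bwd = iterative_st_bt(
    parallel=[("en-1", "ne-1")],
    mono_src=["en-1", "en-2"],
    mono_tgt=["ne-1", "ne-3"],
    train=lookup_trainer,
    rounds=2,
)
```

Swapping `lookup_trainer` for a real training routine keeps the control flow unchanged; only the cost of each round differs.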
TEST TRAIN //
Adding parallel data in other languages.
Training Dataset
Learning Framework: Multilingual Training
DATA
[Diagram: English, Nepali, and Hindi data with TRAIN parallel sets. The NMT system (Encoder, Decoder) reads the input sentence with target language ID, (x, LID), on the Src side and outputs a prediction; a human translator provides the human reference y on the Tgt side; the Cross-Entropy Loss compares the prediction with the reference.]
$\mathcal{L}(\theta) = -\sum_{s,t} \mathbb{E}_{(x,y) \sim \mathcal{D}_{s,t}} [\log p(y \mid x; t)]$
Share the same encoder and the same decoder across all language pairs.
Prepend a target language identifier to the source sentence to inform the decoder of the desired language.
Concatenate all the datasets together.
Train using standard cross-entropy loss.
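The data preparation described above can be sketched as follows. The tag format `<2xx>` and the function names are assumptions for illustration; any reserved token that identifies the target language would serve the same purpose.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]        # (source sentence, target sentence)
LangPair = Tuple[str, str]    # (source language code, target language code)


def tag_source(src: str, tgt_lang: str) -> str:
    """Prepend a target-language identifier (hypothetical '<2xx>' format)."""
    return f"<2{tgt_lang}> {src}"


def build_multilingual_corpus(datasets: Dict[LangPair, List[Pair]]) -> List[Pair]:
    """Concatenate all language pairs into one corpus with tagged sources."""
    corpus: List[Pair] = []
    for (_src_lang, tgt_lang), pairs in datasets.items():
        for x, y in pairs:
            corpus.append((tag_source(x, tgt_lang), y))
    return corpus


# Toy usage with placeholder sentences for three directions.
corpus = build_multilingual_corpus({
    ("en", "ne"): [("en sentence", "ne sentence")],
    ("en", "hi"): [("en sentence", "hi sentence")],
    ("hi", "en"): [("hi sentence", "en sentence")],
})
```

Because the same English source appears with two different tags, a single shared encoder-decoder can be asked for Nepali or Hindi output simply by changing the first token.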
TRAIN //
$\mathcal{D}_{en,ne} \cup \mathcal{D}_{en,hi} \cup \mathcal{D}_{hi,en} \cup \mathcal{D}_{ne,hi}$
Johnson et al. “Google’s multilingual NMT system…” TACL 2017
Aharoni et al. “Massively multilingual NMT” NAACL 2019
depending on the available data.
training perform strongly on low resource languages.
some level of craftsmanship…
amount of data domain language pair
put together lots of language pairs and monolingual data.
English French
TEST mono mono
DATA
$\mathcal{M}^t = \{y^t_k\}_{k=1,\dots,M_t}$
$\mathcal{M}^s = \{x^s_j\}_{j=1,\dots,M_s}$
[Diagram: $y^t \sim \mathcal{M}^t$ (fr) → Encoder, Decoder of p(x|y) → $\bar{x}$ (en)]
Lample et al. “Phrase-based and neural unsupervised MT” EMNLP 2018
Artetxe et al. “An effective approach to unsupervised MT” ACL 2019
English French
TEST mono mono
DATA
$\mathcal{M}^t = \{y^t_k\}_{k=1,\dots,M_t}$
$\mathcal{M}^s = \{x^s_j\}_{j=1,\dots,M_s}$
[Diagram: $y^t \sim \mathcal{M}^t$ (fr) → backward NMT system p(x|y) (Encoder, Decoder) → $\bar{x}$ (en) → NMT system p(y|x) (Encoder, Decoder) → prediction (fr); the Cross-Entropy Loss compares the prediction with the input sentence.]
…and vice versa starting from English.
This is an example of auto-encoding or cycle consistency.
Problem: lack of constraints on $\bar{x}$.
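A small sketch makes the problem concrete: the round-trip loss only checks that the original sentence comes back, so nothing forces the intermediate translation to be in the other language. Exact-match error replaces cross-entropy here for simplicity, and the function names and the copying "models" are assumptions of this example.

```python
from typing import Callable, List, Tuple

Translator = Callable[[str], str]


def round_trip(y: str, backward: Translator, forward: Translator) -> Tuple[str, str]:
    """Translate y to x_bar with p(x|y), then back to y_hat with p(y|x)."""
    x_bar = backward(y)
    y_hat = forward(x_bar)
    return x_bar, y_hat


def reconstruction_error(mono_tgt: List[str], backward: Translator,
                         forward: Translator) -> float:
    """Fraction of sentences that are not reproduced exactly after the round trip."""
    misses = sum(1 for y in mono_tgt if round_trip(y, backward, forward)[1] != y)
    return misses / len(mono_tgt)


def copy_model(sentence: str) -> str:
    """Degenerate 'translator' that returns its input unchanged."""
    return sentence


# Two copying models reconstruct perfectly, yet x_bar is still French, not English.
french = ["la vie est belle"]
x_bar, y_hat = round_trip(french[0], copy_model, copy_model)
error = reconstruction_error(french, copy_model, copy_model)
```

The degenerate pair of copying models reaches zero reconstruction error without translating anything, which is why the cycle objective alone is not enough.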
DATA: English (mono), French (mono); TEST.
$M^t = \{y^t_k\}_{k=1,\dots,M_t}$, $M^s = \{x^s_j\}_{j=1,\dots,M_s}$
[Figure: back-translation term. y^t ∼ M^t (fr) goes through the backward NMT system p(x|y) to give x̄ (en); the NMT system p(y|x) (Encoder, Decoder) maps x̄ to a prediction (fr); Cross-Entropy Loss between the prediction and the input sentence.]
Problem: lack of modularity.
The decoder may behave differently when fed with representations from the French encoder vs. the English encoder.
[Figure: + denoising auto-encoding (DAE) term. x^s ∼ M^s (en) plus noise n goes through the Encoder and Decoder of the NMT system to give a prediction (en); Cross-Entropy Loss between the prediction and the input sentence.]
DAE makes sure the decoder outputs fluent text in the desired language.
Lample et al. “Phrase-based and neural unsupervised MT” EMNLP 2018
Artetxe et al. “An effective approach to unsupervised MT” ACL 2019
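The DAE term needs a noise function applied to the input sentence. A minimal sketch follows, assuming the common recipe of word dropout followed by a local shuffle; the function name `add_noise` and the default values of `p_drop` and `k` are illustrative assumptions, not values taken from the slides.

```python
import random


def add_noise(tokens, p_drop=0.1, k=3, rng=None):
    """Corrupt a token list with word dropout, then a local shuffle."""
    rng = rng or random.Random()
    # Word dropout: drop each token independently with probability p_drop.
    kept = [t for t in tokens if rng.random() >= p_drop]
    if not kept and tokens:
        # Never return an empty sentence: keep one random token.
        kept = [rng.choice(tokens)]
    # Local shuffle: add uniform noise in [0, k + 1) to each position and
    # sort by the noisy position. No token moves more than k positions.
    keys = [i + rng.random() * (k + 1) for i in range(len(kept))]
    order = sorted(range(len(kept)), key=keys.__getitem__)
    return [kept[i] for i in order]


# The model is then trained to reconstruct `clean` from `noisy`.
clean = "life is beautiful and short".split()
noisy = add_noise(clean, rng=random.Random(0))
```

The training pair is (noisy, clean): the encoder reads the corrupted sentence and the decoder is scored with cross-entropy against the original one.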
[Figure: NMT system with one Encoder (Src) and one Decoder (Tgt); input (x, LID): input sentence with target language ID; output: prediction.]
Like in multilingual NMT, share encoder and decoder
shared representations (particularly if pre-trained).
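The (x, LID) input can be illustrated with a small sketch: the target language ID is attached to the input sentence so that one shared encoder and decoder can serve every direction. The tag format `<2fr>` and the helper name `with_target_lid` are hypothetical choices for illustration; the slides do not say how the ID is encoded.

```python
def with_target_lid(tokens, tgt_lang):
    """Return (x, LID) as one sequence: a target-language tag, then the sentence."""
    return [f"<2{tgt_lang}>"] + list(tokens)


# One shared model sees all of these as ordinary (input, output) pairs.
examples = [
    # Translation en -> fr.
    (with_target_lid(["life", "is", "beautiful"], "fr"),
     ["la", "vie", "est", "belle"]),
    # Translation fr -> en.
    (with_target_lid(["la", "vie", "est", "belle"], "en"),
     ["life", "is", "beautiful"]),
    # Denoising auto-encoding in English: noisy input, clean output.
    (with_target_lid(["is", "life", "beautiful"], "en"),
     ["life", "is", "beautiful"]),
]
```

Because the tag is the only thing that selects the output language, the same parameters are reused across translation directions and the DAE term.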
The same ideas can be applied to phrase-based statistical MT systems (PBSMT). NMT and PBSMT can be combined for even better results.
WMT’14 En-Fr
Lample et al. “Phrase-based and neural unsupervised MT” EMNLP 2018
Since unsupervised MT was trained on about 10M sentences, each parallel sentence is worth about 100 monolingual sentences (for this dataset and language pair).
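The exchange rate quoted above follows from a simple ratio. As a worked sketch, let N_par be the number of parallel sentences a supervised system needs to match the unsupervised one; N_par is not stated in the text and is only implied by the two numbers given.

```latex
\frac{N_{\text{mono}}}{N_{\text{par}}} \approx 100, \quad N_{\text{mono}} \approx 10^{7}
\;\;\Longrightarrow\;\;
N_{\text{par}} \approx \frac{10^{7}}{100} = 10^{5}
```

That is roughly 100K parallel sentences, and the figure applies to this dataset and language pair only.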
              In-domain (Wikipedia)    Out-of-domain
Parallel      None                     500K sentences (Bible, GNOME/Ubuntu, OpenSubtitles, …) *Hindi: 1.5M
Monolingual   100K sentences           ~5M sentences (CommonCrawl) *Hindi: 45M
              In-domain (News)            Out-of-domain
Parallel      20K sentences               200K sentences
Monolingual   ~79M sentences (En only)    ~23M sentences (My only)
“FBAI WAT’19 My-En translation task submission” Chen et al., WAT@EMNLP 2019
[Figure: BLEU bar charts for My → En and En → My; bar label: Parallel.]
[Figure: BLEU for My → En, iter. 2.]
[Figure: BLEU by system. En → My: Ours, NICT, NICT-NMT, UCSMNLP. My → En: Ours, NICT-NMT, NICT, UCSYNLP.]
+8 BLEU compared to second best
[Figure: 23.3, 26.32, 27.74]
Slide credit: Peng-Jen Chen
can afford training bigger models. Bigger models trained on more data generalize better.
monolingual sentences give the same training signal as a single pair of parallel sentences.
life of a researcher
“The FLoRes evaluation for low resource MT:…” Guzmán, Chen et al., EMNLP 2019
“Phrase-based & Neural Unsup MT” Lample et al., EMNLP 2018
“FBAI WAT’19 My-En translation task submission” Chen et al., WAT@EMNLP 2019
“Massively Multilingual NMT” Aharoni et al., ACL 2019
“Multilingual Denoising Pre-training for NMT” Liu et al., arXiv 2001.08210, 2020
“Analyzing uncertainty in NMT” Ott et al., ICML 2018
“On the evaluation of MT systems trained with back-translation” Edunov et al., ACL 2020
“The source-target domain mismatch problem in MT” Shen et al., arXiv 1909.13151, 2019
Simulating low-resource MT with a high-resource language: using EuroParl data with 20K parallel sentences and 100K monolingual target sentences.
EuroParl Fr → En (https://www.statmt.org/europarl/)
parallel data: 30.4 BLEU
+ BT: 33.8 BLEU
+3.4 BLEU!
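The "+ BT" row adds back-translated monolingual target sentences to the parallel data. A minimal sketch of how such a training set could be assembled is shown below; `build_bt_training_set` and the toy word-for-word backward model are hypothetical stand-ins for a trained p(x|y), not the setup used for the numbers above.

```python
def build_bt_training_set(parallel, target_mono, backward_translate):
    """Combine real parallel pairs with back-translated synthetic pairs.

    parallel: list of (source, target) sentence pairs.
    target_mono: list of target-language sentences.
    backward_translate: callable target -> source, standing in for p(x|y).
    """
    # The synthetic source side comes from the model; the target side is real text.
    synthetic = [(backward_translate(y), y) for y in target_mono]
    return list(parallel) + synthetic


# Toy usage: a word-for-word lookup plays the role of the backward model.
EN2FR = {"life": "vie", "is": "est", "beautiful": "belle"}


def toy_backward(sentence):
    return " ".join(EN2FR.get(word, word) for word in sentence.split())


parallel = [("la vie est belle", "life is beautiful")]
train_set = build_bt_training_set(parallel, ["life is life"], toy_backward)
```

The forward model is then trained on `train_set`, so the decoder always sees human-written target text even when the source side is noisy.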
BT sometimes yields very mild improvements.
The FLORES evaluation datasets… Guzmán, Chen et al. EMNLP 2019
Example: FB public posts, En → My
parallel data: 15.2 BLEU
+ BT: 15.3 BLEU
+0.1 BLEU!
Sports, Food
For the same topic: different distribution over words!
Topic distribution
Domains differ in the topic distribution.
[Figure: Si-En and En-Si translation.]
Guzmán, Chen et al. “The FLoRes evaluation datasets for low resource MT…” EMNLP 2019
Wikipedia in Sinhala has a different topic distribution.
geographic location.
geographic locations are typically farther apart and cultures have more distinct traits.
topic different distribution over words.
won’t work as well.
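One rough way to quantify "different distribution over words" is to compare the unigram distributions of two corpora. The sketch below uses the Jensen-Shannon divergence as an illustrative measure; this is an assumption for the sake of example, not a metric taken from the slides or the cited papers, and whitespace tokenization is a simplification.

```python
import math
from collections import Counter


def unigram_dist(sentences):
    """Relative word frequencies over whitespace tokens (lowercased)."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 if identical, 1 if disjoint."""
    vocab = set(p) | set(q)
    # Mixture distribution m = (p + q) / 2 over the joint vocabulary.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl(a):
        return sum(a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


sports = unigram_dist(["the team won the match", "the match was close"])
food = unigram_dist(["the soup was hot", "the rice was good"])
mismatch = js_divergence(sports, food)
```

A value near 0 means the two corpora use words in similar proportions; values closer to 1 indicate a larger mismatch.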
Language and place. Johnstone, Cambridge Univ. Press 2010
Leech et al. “Computer corpora: What do they tell us about culture?” Journal Computers in English Linguistics 1992
Controlled setting. Source Language: Fr, Target Language: En. Source Domain: EuroParl. Target Domain: OpenSubtitles.
Sizes shown in the figure: 10K sentences, <1M sentences, <1M sentences; parallel blocks are human translations, the rest is mono data.
[Figure: the Target Domain is varied from OpenSubtitles, through EuroParl + OpenSubtitles (intermediate value of α), to EuroParl (Source Domain & Target Domain: EuroParl).]
[Figure legend: target mono data, source mono data, in-domain mono data.]
Q.: Is it better to have clean targets but out-of-domain data, or noisy targets but in-domain data?
Q.: What’s the effect of the amount of parallel/monolingual data?
Q.: What’s the effect of the quality of the forward model when training with ST?
Varying Domain of Target Originating Data
[Figure: horizontal axis runs from “Target originating data is out-of-domain” to “Target originating data is in-domain” (EuroParl + OpenSubtitles).]
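Varying the domain of the target-originating data can be sketched as sampling from a mixture of the two corpora. The code below assumes that α is the fraction of sentences drawn from the in-domain corpus (EuroParl) and that the rest comes from the out-of-domain corpus (OpenSubtitles). The slides only mention an "intermediate value of α", so both the direction of α and the sampling scheme are assumptions, and `mix_target_originating` is a hypothetical helper.

```python
import random


def mix_target_originating(in_domain, out_domain, alpha, n, seed=0):
    """Sample n sentences: round(alpha * n) in-domain, the rest out-of-domain."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    rng = random.Random(seed)
    n_in = round(alpha * n)
    # Sampling is without replacement, so each corpus must be large enough.
    sample = rng.sample(in_domain, n_in) + rng.sample(out_domain, n - n_in)
    rng.shuffle(sample)
    return sample


europarl = [f"europarl sentence {i}" for i in range(100)]
subtitles = [f"subtitle sentence {i}" for i in range(100)]
mixed = mix_target_originating(europarl, subtitles, alpha=0.5, n=20)
```

Under this assumption, α = 0 gives fully out-of-domain target-originating data and α = 1 gives fully in-domain data.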
learn:
1) 2) 3)
learn: apply Source Target
1)
learn:
2)
source mono
apply
model translation
3)
re-learn:
source mono
model translation
p(x|y)
<latexit sha1_base64="CyQwSgLN0ehm4cfS61mfQujnag8=">ADvnicdZJda9swFIYVex9d9pVul7sRC4GEZcHuBhuMQrplsIt1ZKxpC1FiZEVJlMgfWHKxcfUnx272bybyZqkqcBwOd5j14dHzfkTEjL+lsxzHv3Hzw8eFR9/OTps+e1wxfnIogjQgck4EF06WJBOfPpQDLJ6WUYUey5nF64y95/eKRoIF/plMQzry8MxnU0aw1CnsPKngTws5wTzrKfgMUQZbCbtOUwiJSTsWO73em0f6jqDXeqxqIk7FwFgW3KLlTR+yQsiTsXSWBblck3KT7Klxhjw3SDIcJ2ptJG5f7Rjpa1HYTK+TVrWhb4dIMA9uOdtoevLfaLNw2obIxVGWKmfRutu0HgMicQi32uh6Kd0Av6smknMqcUtf8hYiHsyg9gavYe5uA1s/TfVOvu7VJLkGvoH+HbpfZ3tlK0s34rKT/nm1utWxigNvB/YqIPV6Tu132gSkNijviQcCzG0rVCOMhxJRjhVRQLGmKyxDM61KGPSpGWbF+CjZ0ZgKnQaQ/X8Iiu6nIsCdE6rmazN8mdmt5cl9tGMvpx1HG/DCW1CflRdOYQxnAfJfhEWUSJ7qAJOIa+QzHGEidQbX9VDsHefDs4P+rY7zpHP9/Xu59X4zgAr8Br0AQ2+AC64BvogwEgxicDGwtjaXbNqemZQYkalZXmJdg6ZvIPxHwo5g=</latexit>p(y|x)
<latexit sha1_base64="dBHzK3A8/hcNmG8nE+hfv1ikGd4=">AECnichZNb9NAEIa3Nh8lfKVw5DIipSIEMUFCS6VGgSB4qCaNpKdWKtN5tkE3/Ju65suXvmwl/hwgGEuPILuPFvWNtJSdJUrGRpNPO8O+MvHbgMC5arT9bmn7t+o2b27dKt+/cvXe/vPgiPtRSGiP+I4fntiYU4d5tCeYcOhJEFLs2g49tmevs/rxGQ05871DkQS07+Kx0aMYKFS1o4GVdPFYkKwk3Yk7IGZQi1uJHWLgSmtlO0ZjWaz8V6W/nEHcsALMh5wa5pz04I7sPgaKQoyGQhrlpOzBSmWyY4cpKZr+3GKo1gujESNszUjXVkKasl5XC9VXMwOXNhxdjSne0Ln7XcaANMG4dpIq1p/WrPagsmiQJYuUbVC+kS+E7WTDGhAtdVk6dgOv4YlDc4h8zdEraYTHbabzZq4kwDT8C7QvfxcKNsbmTuH2x+IKZTZ+km1hdXTxv9GFVa60mq38wOXAmAcVND9dq/zbHPokcqkniIM5PzVageinOBSMOFSWzIjTAJMZHtNTFXrYpbyf5r+yhKrKDGHkh+rzBOTZUWKXc4T1ZkZpKv17LkptpJEYv+ynzgkhQjxSNRpEDwofsXcCQhZQIJ1EBJiFTXoFMcIiJUK+npJZgrI98OTjabRrPmrsfnlf2X83XsY0eoceohgz0Au2jt6iLeohon7Qv2jftu/5Z/6r/0H8WqLY1zxEK0f/9RdRQUZC</latexit>p(y|x)
<latexit sha1_base64="dBHzK3A8/hcNmG8nE+hfv1ikGd4=">AECnichZNb9NAEIa3Nh8lfKVw5DIipSIEMUFCS6VGgSB4qCaNpKdWKtN5tkE3/Ju65suXvmwl/hwgGEuPILuPFvWNtJSdJUrGRpNPO8O+MvHbgMC5arT9bmn7t+o2b27dKt+/cvXe/vPgiPtRSGiP+I4fntiYU4d5tCeYcOhJEFLs2g49tmevs/rxGQ05871DkQS07+Kx0aMYKFS1o4GVdPFYkKwk3Yk7IGZQi1uJHWLgSmtlO0ZjWaz8V6W/nEHcsALMh5wa5pz04I7sPgaKQoyGQhrlpOzBSmWyY4cpKZr+3GKo1gujESNszUjXVkKasl5XC9VXMwOXNhxdjSne0Ln7XcaANMG4dpIq1p/WrPagsmiQJYuUbVC+kS+E7WTDGhAtdVk6dgOv4YlDc4h8zdEraYTHbabzZq4kwDT8C7QvfxcKNsbmTuH2x+IKZTZ+km1hdXTxv9GFVa60mq38wOXAmAcVND9dq/zbHPokcqkniIM5PzVageinOBSMOFSWzIjTAJMZHtNTFXrYpbyf5r+yhKrKDGHkh+rzBOTZUWKXc4T1ZkZpKv17LkptpJEYv+ynzgkhQjxSNRpEDwofsXcCQhZQIJ1EBJiFTXoFMcIiJUK+npJZgrI98OTjabRrPmrsfnlf2X83XsY0eoceohgz0Au2jt6iLeohon7Qv2jftu/5Z/6r/0H8WqLY1zxEK0f/9RdRQUZC</latexit>p(y|x)
<latexit sha1_base64="dBHzK3A8/hcNmG8nE+hfv1ikGd4=">AECnichZNb9NAEIa3Nh8lfKVw5DIipSIEMUFCS6VGgSB4qCaNpKdWKtN5tkE3/Ju65suXvmwl/hwgGEuPILuPFvWNtJSdJUrGRpNPO8O+MvHbgMC5arT9bmn7t+o2b27dKt+/cvXe/vPgiPtRSGiP+I4fntiYU4d5tCeYcOhJEFLs2g49tmevs/rxGQ05871DkQS07+Kx0aMYKFS1o4GVdPFYkKwk3Yk7IGZQi1uJHWLgSmtlO0ZjWaz8V6W/nEHcsALMh5wa5pz04I7sPgaKQoyGQhrlpOzBSmWyY4cpKZr+3GKo1gujESNszUjXVkKasl5XC9VXMwOXNhxdjSne0Ln7XcaANMG4dpIq1p/WrPagsmiQJYuUbVC+kS+E7WTDGhAtdVk6dgOv4YlDc4h8zdEraYTHbabzZq4kwDT8C7QvfxcKNsbmTuH2x+IKZTZ+km1hdXTxv9GFVa60mq38wOXAmAcVND9dq/zbHPokcqkniIM5PzVageinOBSMOFSWzIjTAJMZHtNTFXrYpbyf5r+yhKrKDGHkh+rzBOTZUWKXc4T1ZkZpKv17LkptpJEYv+ynzgkhQjxSNRpEDwofsXcCQhZQIJ1EBJiFTXoFMcIiJUK+npJZgrI98OTjabRrPmrsfnlf2X83XsY0eoceohgz0Au2jt6iLeohon7Qv2jftu/5Z/6r/0H8WqLY1zxEK0f/9RdRQUZC</latexit>p(x|y)
<latexit sha1_base64="CyQwSgLN0ehm4cfS61mfQujnag8=">ADvnicdZJda9swFIYVex9d9pVul7sRC4GEZcHuBhuMQrplsIt1ZKxpC1FiZEVJlMgfWHKxcfUnx272bybyZqkqcBwOd5j14dHzfkTEjL+lsxzHv3Hzw8eFR9/OTps+e1wxfnIogjQgck4EF06WJBOfPpQDLJ6WUYUey5nF64y95/eKRoIF/plMQzry8MxnU0aw1CnsPKngTws5wTzrKfgMUQZbCbtOUwiJSTsWO73em0f6jqDXeqxqIk7FwFgW3KLlTR+yQsiTsXSWBblck3KT7Klxhjw3SDIcJ2ptJG5f7Rjpa1HYTK+TVrWhb4dIMA9uOdtoevLfaLNw2obIxVGWKmfRutu0HgMicQi32uh6Kd0Av6smknMqcUtf8hYiHsyg9gavYe5uA1s/TfVOvu7VJLkGvoH+HbpfZ3tlK0s34rKT/nm1utWxigNvB/YqIPV6Tu132gSkNijviQcCzG0rVCOMhxJRjhVRQLGmKyxDM61KGPSpGWbF+CjZ0ZgKnQaQ/X8Iiu6nIsCdE6rmazN8mdmt5cl9tGMvpx1HG/DCW1CflRdOYQxnAfJfhEWUSJ7qAJOIa+QzHGEidQbX9VDsHefDs4P+rY7zpHP9/Xu59X4zgAr8Br0AQ2+AC64BvogwEgxicDGwtjaXbNqemZQYkalZXmJdg6ZvIPxHwo5g=</latexit>p(y|x)
<latexit sha1_base64="dBHzK3A8/hcNmG8nE+hfv1ikGd4=">AECnichZNb9NAEIa3Nh8lfKVw5DIipSIEMUFCS6VGgSB4qCaNpKdWKtN5tkE3/Ju65suXvmwl/hwgGEuPILuPFvWNtJSdJUrGRpNPO8O+MvHbgMC5arT9bmn7t+o2b27dKt+/cvXe/vPgiPtRSGiP+I4fntiYU4d5tCeYcOhJEFLs2g49tmevs/rxGQ05871DkQS07+Kx0aMYKFS1o4GVdPFYkKwk3Yk7IGZQi1uJHWLgSmtlO0ZjWaz8V6W/nEHcsALMh5wa5pz04I7sPgaKQoyGQhrlpOzBSmWyY4cpKZr+3GKo1gujESNszUjXVkKasl5XC9VXMwOXNhxdjSne0Ln7XcaANMG4dpIq1p/WrPagsmiQJYuUbVC+kS+E7WTDGhAtdVk6dgOv4YlDc4h8zdEraYTHbabzZq4kwDT8C7QvfxcKNsbmTuH2x+IKZTZ+km1hdXTxv9GFVa60mq38wOXAmAcVND9dq/zbHPokcqkniIM5PzVageinOBSMOFSWzIjTAJMZHtNTFXrYpbyf5r+yhKrKDGHkh+rzBOTZUWKXc4T1ZkZpKv17LkptpJEYv+ynzgkhQjxSNRpEDwofsXcCQhZQIJ1EBJiFTXoFMcIiJUK+npJZgrI98OTjabRrPmrsfnlf2X83XsY0eoceohgz0Au2jt6iLeohon7Qv2jftu/5Z/6r/0H8WqLY1zxEK0f/9RdRQUZC</latexit>learn:
1) 2) 3)
learn: apply Source Target
1)
learn:
2)
source mono
apply
model translation
3)
re-learn:
source mono
model translation
p(x|y)
[Figure: diagram with models labeled p(y|x) and p(x|y)]
69
Combining ST & BT worked best in En-My (English-Burmese).
as the amount of parallel and monolingual data, the domains, etc. In particular, if the domains are not too distinct, STDM may even help regularize!
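A minimal sketch of how ST and BT data could be combined, assuming ST means self-training (real source sentences paired with model-generated targets) and BT means back-translation (model-generated sources paired with real target sentences). The function name `build_augmented_corpus` and the `forward` / `backward` callables are hypothetical stand-ins for trained p(y|x) and p(x|y) models; they are not taken from the slides.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def build_augmented_corpus(
    parallel: List[Pair],
    mono_src: List[str],
    mono_tgt: List[str],
    forward: Callable[[str], str],   # hypothetical source -> target model, p(y|x)
    backward: Callable[[str], str],  # hypothetical target -> source model, p(x|y)
) -> List[Pair]:
    """Return parallel data plus ST and BT pseudo-parallel pairs."""
    # Self-training: real source sentence, machine-generated target.
    st_pairs = [(x, forward(x)) for x in mono_src]
    # Back-translation: machine-generated source, real target sentence.
    bt_pairs = [(backward(y), y) for y in mono_tgt]
    return parallel + st_pairs + bt_pairs


# Toy usage: upper/lower-casing stands in for real translation models.
corpus = build_augmented_corpus(
    parallel=[("a b", "A B")],
    mono_src=["c d"],
    mono_tgt=["E F"],
    forward=str.upper,
    backward=str.lower,
)
for src, tgt in corpus:
    print(src, "->", tgt)
```

In practice the forward model would be retrained on the augmented corpus, and the process may be repeated; how the three data sources are weighted or filtered is left open here.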
71
matching sentence in embedding space.
using actual existing bilingual + mono data!
Schwenk et al. “CCMatrix: mining billions of high-quality parallel sentences on the WEB” arXiv:1911.04944 2019
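A rough sketch of what matching sentences in embedding space can look like. The slides do not spell out the scoring rule; the ratio-margin criterion over k nearest neighbours used below is one common choice for this kind of mining and is an assumption here. Toy 2-D vectors stand in for real multilingual sentence embeddings, and `mine_pairs`, `k`, and `threshold` are illustrative names and values.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def topk_mean(values: List[float], k: int) -> float:
    """Mean of the k largest values (fewer if the list is shorter)."""
    k_eff = min(k, len(values))
    return sum(sorted(values, reverse=True)[:k_eff]) / k_eff


def mine_pairs(
    src_vecs: List[Vector],
    tgt_vecs: List[Vector],
    k: int = 2,
    threshold: float = 1.05,  # illustrative value, not from the slides
) -> List[Tuple[int, int, float]]:
    """For each source vector, keep its best target if the margin score is high enough.

    Assumes the similarities are positive, so the normalizer is non-zero.
    """
    sim = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]
    # Average similarity of each sentence to its k nearest neighbours on the other side.
    src_nn = [topk_mean(row, k) for row in sim]
    tgt_nn = [topk_mean([row[j] for row in sim], k) for j in range(len(tgt_vecs))]

    mined = []
    for i, row in enumerate(sim):
        # Ratio margin: similarity divided by the average neighbourhood similarity.
        scores = [row[j] / ((src_nn[i] + tgt_nn[j]) / 2) for j in range(len(row))]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            mined.append((i, best, scores[best]))
    return mined


# Toy usage: two "source" sentences, three "target" candidates (the last is a distractor).
src = [[1.0, 0.0], [0.0, 1.0]]
tgt = [[0.0, 1.0], [1.0, 0.1], [0.7, 0.7]]
for i, j, score in mine_pairs(src, tgt):
    print(f"source {i} <-> target {j} (margin score {score:.2f})")
```

The margin normalization penalizes candidates that are close to many sentences at once, which a plain cosine threshold would not do. At web scale the exhaustive similarity matrix would be replaced by an approximate nearest-neighbour index.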
BT, pre-training, multilingual, filtering
aligning domains, learning with less supervision, leveraging compositionality, etc.
sets…
speed, controllability, safety, biases…
73
Guillaume Lample, Ludovic Denoyer, Alexis Conneau, Hervé Jégou, Myle Ott, Michael Auli, Sergey Edunov, Peng-Jen Chen, Matt Le, Jiajun Shen, Juan-Miguel Pino, Jiatao Gu, Philipp Koehn, Paco Guzmán, Vishrav Chaudhary, Xian Li, Junxian He, Naman Goyal