Target Conditioned Sampling:
Optimizing Data Selection for Multilingual NMT
Xinyi Wang, Graham Neubig
Language Technologies Institute Carnegie Mellon University
Multilingual NMT
spa: Una mañana que nunca olvidaré .
ita: Una mattina che non dimenticherò mai .
por: Uma manhã que nunca vou esquecer .
jpn: その日の朝のことは決して忘れることはないでしょう
A morning that I will never forget .
glg: A mañá que eu nunca vou
LRL (Zoph et al. 2016)
Hu 2018, Gu et al. 2018)
less heuristic way?
$Q(X, Y) \approx P_s(X, Y)$
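By the chain rule, the sampling distribution splits into a marginal over target sentences and a conditional over source sentences, so each factor can be approximated on its own. The two approximations written here are the ones named on the later slides, $Q(Y)$ and $Q(X \mid y)$:

```latex
\begin{aligned}
Q(X, Y) &= Q(Y)\, Q(X \mid Y), \\
Q(Y) &\approx P_s(Y), \qquad Q(X = x \mid y) \approx P_s(X = x \mid y), \\
\Rightarrow\; Q(X, Y) &\approx P_s(Y)\, P_s(X \mid Y) = P_s(X, Y).
\end{aligned}
```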
[Figure: source languages S, S1, ..., Sn, all paired with target language T]
union of targets
A morning that I will never forget. When I was 11, I usually stay with ....
A morning that I will never forget.
Q(Y)
spa: Una mañana ...
ita: Una mattina ...
por: Uma manhã ...
jpn: その日の朝 ...
Sampled Data
por: Uma manhã ...
A morning that I will never forget.
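$Q(Y) \approx P_s(Y)$: distribution over target sentences $y$.
$Q(X \mid y) \approx P_s(X = x \mid y)$: $\mathrm{sim}(x, s)$ measures how likely $x$ is to be in language $s$; normalize it over all multilingual sources $x_i$ for a given target $y$.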
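A minimal Python sketch of the two factors, under assumptions the slides do not state: Q(Y) is taken to be uniform over the union of distinct target sentences, and "normalize" is implemented as a softmax with a hypothetical temperature `tau`. The helper names `group_by_target`, `q_y`, and `q_x_given_y`, and the similarity scores in the usage lines, are illustrative only.

```python
import math
from collections import defaultdict


def group_by_target(corpus):
    """Map each distinct target sentence y to the (lang, x) sources paired with it."""
    groups = defaultdict(list)
    for lang, x, y in corpus:
        groups[y].append((lang, x))
    return dict(groups)


def q_y(groups):
    """Assumed Q(Y): uniform over the union of target sentences."""
    n = len(groups)
    return {y: 1.0 / n for y in groups}


def q_x_given_y(sources, sim, tau=1.0):
    """Q(X | y): softmax of sim(x, s) over all sources x_i paired with one target y.

    `sim` maps (lang, x) to a score; `tau` is a hypothetical temperature.
    """
    scores = [sim(lang, x) / tau for lang, x in sources]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Usage with made-up language-level scores (illustrative, not measured values).
corpus = [
    ("spa", "Una mañana que nunca olvidaré .", "A morning that I will never forget ."),
    ("por", "Uma manhã que nunca vou esquecer .", "A morning that I will never forget ."),
    ("ita", "Una mattina che non dimenticherò mai .", "A morning that I will never forget ."),
]
lang_sim = {"spa": 0.6, "por": 0.9, "ita": 0.4}
groups = group_by_target(corpus)
probs = q_x_given_y(groups["A morning that I will never forget ."],
                    lambda lang, x: lang_sim[lang])
```

A softmax is used here instead of dividing by the plain sum so that every candidate keeps a positive probability even when `sim` returns negative values such as LM log-probabilities.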
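Vocab Overlap, Language Level: character n-gram overlap between S and each language
Vocab Overlap, Sentence Level: character n-gram overlap between S and each sentence
Language Model, Language Level: use an LM trained on S to score the document of each language
Language Model, Sentence Level: use an LM trained on S to score each sentence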
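A minimal sketch of the two sentence-level similarity measures. Everything concrete here is a hypothetical stand-in rather than the paper's configuration: trigrams (`n=3`), an overlap score defined as the fraction of the candidate's character n-grams that also occur in S, and an add-one-smoothed character bigram LM scored by average log-probability per character. The names `vocab_overlap` and `CharBigramLM` are illustrative. Language-level scores could be obtained by applying the same functions to the concatenated document of each language.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Counter of character n-grams, padding the text with one space on each side."""
    padded = f" {text} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def vocab_overlap(lrl_text, candidate, n=3):
    """Fraction of the candidate's n-gram tokens whose type also appears in S."""
    lrl = char_ngrams(lrl_text, n)
    cand = char_ngrams(candidate, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    shared = sum(c for g, c in cand.items() if g in lrl)
    return shared / total


class CharBigramLM:
    """Add-one-smoothed character bigram LM trained on the LRL text S."""

    def __init__(self, text):
        seq = "^" + text  # "^" marks the sentence start
        self.bigrams = Counter(zip(seq, seq[1:]))
        self.histories = Counter(seq[:-1])  # how often each char is a history
        self.vocab_size = len(set(seq)) + 1  # +1 reserves mass for unseen chars

    def avg_log_prob(self, sentence):
        """Average log P(char | previous char) over the sentence."""
        seq = "^" + sentence
        pairs = list(zip(seq, seq[1:]))
        if not pairs:
            return 0.0
        total = 0.0
        for prev, cur in pairs:
            num = self.bigrams[(prev, cur)] + 1
            den = self.histories[prev] + self.vocab_size
            total += math.log(num / den)
        return total / len(pairs)


# Usage: score two candidate sources against the (truncated) glg example as S.
S = "A mañá que eu nunca vou"
por = "Uma manhã que nunca vou esquecer ."
jpn = "その日の朝のことは決して忘れることはないでしょう"
lm = CharBigramLM(S)
overlap_scores = (vocab_overlap(S, por), vocab_overlap(S, jpn))
lm_scores = (lm.avg_log_prob(por), lm.avg_log_prob(jpn))
```

In this toy setting both measures rank the Portuguese sentence above the Japanese one, which shares no characters with S.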
, then sample based
, fixed during training
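Sample a target $y \sim Q(Y)$, then a pair $(x_i, y)$ with $x_i \sim Q(X \mid y)$; alternatively, choose $x' = \arg\max_x Q(x \mid y)$ deterministically.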
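A sketch of drawing one epoch of training pairs, assuming Q(X | y) has already been computed for every target and stays fixed during training. Q(Y) is again assumed uniform, and the function name `select_epoch` and its parameters are hypothetical. The `deterministic` flag switches between sampling $x \sim Q(X \mid y)$ and taking $x' = \arg\max_x Q(x \mid y)$.

```python
import random


def select_epoch(groups, q_cond, n_samples, deterministic=False, rng=None):
    """Draw n_samples (x, y) training pairs for one epoch.

    groups: dict mapping each target y to its list of source sentences x_i.
    q_cond: dict mapping each target y to probabilities Q(x_i | y),
            aligned with groups[y] and precomputed before training.
    """
    rng = rng or random.Random(0)
    targets = list(groups)
    batch = []
    for _ in range(n_samples):
        y = rng.choice(targets)  # y ~ Q(Y), assumed uniform here
        sources, probs = groups[y], q_cond[y]
        if deterministic:
            # x' = argmax_x Q(x | y)
            best = max(range(len(probs)), key=probs.__getitem__)
            x = sources[best]
        else:
            # x ~ Q(X | y)
            x = rng.choices(sources, weights=probs, k=1)[0]
        batch.append((x, y))
    return batch


# Usage with illustrative probabilities for a single target.
y0 = "A morning that I will never forget ."
groups = {y0: ["Uma manhã ...", "Una mañana ...", "Una mattina ..."]}
q_cond = {y0: [0.5, 0.3, 0.2]}
det = select_epoch(groups, q_cond, 4, deterministic=True)
sto = select_epoch(groups, q_cond, 4, rng=random.Random(1))
```

Because the conditional is fixed, the deterministic variant returns the same source for a given target every epoch, while the sampled variant can expose the model to different sources paired with the same target across epochs.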
Galician (glg), Slovak (slk)
2018)
data by copying them to the source (Currey et al. 2017)
[Figure: relative difference from Bi on aze, bel, glg, slk; systems: All, copied, TCS-S]
[Figure: results on aze, bel, glg, slk; systems: TCS-D, TCS-S]
[Figure: relative difference from Bi on aze, bel, glg, slk; similarity measures: LM, Vocab]
[Figure: relative difference from Bi on aze, bel, glg, slk; levels: Sent, Lang]
selection
similarity
https://github.com/cindyxinyiwang/TCS
[Figure: results on aze, bel, glg, slk; systems: back-translate, TCS-S]
$P_s(X \mid y)$
[Figure: results on aze, bel, glg, slk; systems: All, copied, TCS-S]