SwitchOut: An Efficient Data Augmentation for Neural Machine Translation
Xinyi Wang∗, Hieu Pham∗, Zihang Dai, Graham Neubig
November 2, 2018
∗: equal contribution
Data Augmentation
◮ Neural models are data hungry, while
◮ Discrete vocabulary
◮ NMT sensitive to arbitrary noise
◮ Diversity: larger support with all valid data pairs (x, y)
  ⋆ Entropy H
◮ Smoothness: probabilities of similar data pairs are similar
  ⋆ q maximizes similarity measure rx(x,
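The formulas on the slide above are truncated, so the following is not a reconstruction of them. It is a minimal sketch of a general fact that connects the two desiderata, assuming they are combined into an objective of the form E_q[s] + τ·H(q): over a finite set of candidates, that objective is maximized by q ∝ exp(s / τ). The names `softmax`, `objective`, `scores` and `tau` are illustrative, not taken from the slides.

```python
import math
import random


def softmax(scores, tau):
    """Return q with q_i proportional to exp(scores[i] / tau)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / tau) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def objective(q, scores, tau):
    """Expected similarity under q plus tau times the entropy of q."""
    expected = sum(p * s for p, s in zip(q, scores))
    entropy = -sum(p * math.log(p) for p in q if p > 0)
    return expected + tau * entropy


# Hypothetical similarity scores for four candidates (higher = more similar).
scores = [0.0, -1.0, -2.0, -3.0]
tau = 1.0
q_star = softmax(scores, tau)

# Compare against a randomly drawn distribution over the same candidates.
rng = random.Random(0)
raw = [rng.random() + 1e-9 for _ in scores]
q_rand = [r / sum(raw) for r in raw]
print(objective(q_star, scores, tau) >= objective(q_rand, scores, tau))
```

A larger `tau` weights the entropy term more heavily and spreads probability over more candidates (diversity); a smaller `tau` concentrates it on the candidates most similar to the original pair (smoothness).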
◮ Dictionary: jointly on x and y, but deterministic and not diverse
◮ Word dropout: only x side with null token
◮ RAML: only y side
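To make the word-dropout baseline above concrete, here is a minimal sketch of replacing source-side words with a null token. The function name `word_dropout`, the token string `"<null>"` and the per-word drop probability are assumptions for illustration; the slides do not specify them.

```python
import random


def word_dropout(tokens, drop_prob, rng, null_token="<null>"):
    """Replace each token independently with null_token with probability drop_prob."""
    return [null_token if rng.random() < drop_prob else t for t in tokens]


rng = random.Random(0)
source = ["the", "cat", "sat", "on", "the", "mat"]
print(word_dropout(source, 0.3, rng))
```

Only the source sentence is modified here; the target sentence is left untouched, which is the limitation the bullet points out.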
◮ Negative Hamming Distance, following RAML
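As a small illustration of the similarity measure named above, the negative Hamming distance between two equal-length token sequences is minus the number of positions at which they differ. The function name `neg_hamming` is hypothetical.

```python
def neg_hamming(a, b):
    """Negative count of positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return -sum(x != y for x, y in zip(a, b))


original = ["the", "cat", "sat"]
corrupted = ["the", "dog", "sat"]
print(neg_hamming(original, corrupted))  # one position differs
```

Identical sequences score 0, the maximum, and each additional replaced word lowers the score by 1.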
1 How many words to corrupt?
2 What is the corrupted sentence?
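The two questions above can be sketched as a two-step sampler. This is a sketch under stated assumptions, not the authors' implementation: it assumes the number of corrupted words n is drawn with probability proportional to exp(-n / τ), that the corrupted positions are chosen uniformly, and that each replacement is drawn uniformly from the rest of the vocabulary. The names `sample_num_corrupt`, `corrupt_sentence`, `vocab` and `tau` are hypothetical.

```python
import math
import random


def sample_num_corrupt(length, tau, rng):
    """Step 1: draw n in 0..length with weight exp(-n / tau) (assumed form)."""
    weights = [math.exp(-n / tau) for n in range(length + 1)]
    return rng.choices(range(length + 1), weights=weights, k=1)[0]


def corrupt_sentence(tokens, vocab, tau, rng):
    """Step 2: replace n uniformly chosen positions with other vocabulary words."""
    n = sample_num_corrupt(len(tokens), tau, rng)
    positions = rng.sample(range(len(tokens)), n)
    out = list(tokens)
    for i in positions:
        # Exclude the current word so the position really changes;
        # this needs a vocabulary with at least two distinct words.
        candidates = [w for w in vocab if w != out[i]]
        out[i] = rng.choice(candidates)
    return out


rng = random.Random(0)
vocab = ["the", "cat", "dog", "sat", "ran", "mat"]
sentence = ["the", "cat", "sat"]
print(corrupt_sentence(sentence, vocab, tau=1.0, rng=rng))
```

Because every selected position receives a different word, the output is at Hamming distance exactly n from the input. To augment a sentence pair, the same routine would be called once on the source with the source vocabulary and once on the target with the target vocabulary.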
◮ en-vi: IWSLT 2015
◮ de-en: IWSLT 2016
◮ en-de: WMT 2015
◮ Transformer model
◮ Word-based, standard preprocessing