NAIST’s Machine Translation Systems
for IWSLT 2020 Conversational Speech Translation Task
Ryo Fukuda (1), Katsuhito Sudoh (1), and Satoshi Nakamura (1, 2)
(1) Nara Institute of Science and Technology; (2) AIP Center, RIKEN, Japan
We transferred the style of out-of-domain data to be like in-domain data, and then performed domain adaptation.
In-domain data: disfluent Spanish to fluent English
Out-of-domain data: fluent Spanish to fluent English
Style Transfer
Out-of-domain data (fluent Spanish, fluent English) is converted into pseudo in-domain data (disfluent Spanish, fluent English).
[Figure: system overview]
(1) Style Transfer model (Fluent-to-Disfluent): trained with out-of-domain fluent data and in-domain disfluent data (out-of-domain monolingual and in-domain monolingual); converts out-of-domain parallel data into pseudo in-domain parallel data.
(2) NMT model (Spanish-to-English): trained on pseudo in-domain parallel data and in-domain parallel data (in-domain source, in-domain target).
Style Transfer model (Fluent-to-Disfluent), example:
Fluent Spanish: He estado durmiendo casi once horas
Disfluent Spanish: Ya, ya, so, duerme casi once horas
English (same for both): He’s been sleeping for almost 11 hours
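The construction of pseudo in-domain data can be sketched as follows. This is a minimal illustration and not the authors' implementation: `toy_fluent_to_disfluent` is a hypothetical rule-based stand-in for the trained fluent-to-disfluent style transfer model, and `make_pseudo_in_domain` is an assumed helper name. What it shows is that only the Spanish source side is rewritten, while the fluent English target is kept as it is.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (Spanish source, English target)


def toy_fluent_to_disfluent(sentence: str) -> str:
    """Hypothetical stand-in for the style transfer model: prepend a filler."""
    return "eh " + sentence[:1].lower() + sentence[1:]


def make_pseudo_in_domain(
    pairs: List[Pair], transfer: Callable[[str], str]
) -> List[Pair]:
    """Rewrite only the source side; the English target is left untouched."""
    return [(transfer(src), tgt) for src, tgt in pairs]


out_of_domain = [
    ("He estado durmiendo casi once horas",
     "He's been sleeping for almost 11 hours"),
]
pseudo_in_domain = make_pseudo_in_domain(out_of_domain, toy_fluent_to_disfluent)
print(pseudo_in_domain[0][0])
```

Passing the transfer step as a callable keeps the data construction independent of how the style transfer model itself is implemented.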
Pre-trained model (Spanish-to-English): trained on pseudo in-domain parallel data
Fine-tuned model (Spanish-to-English): fine-tuned on in-domain parallel data
In-domain data: Fisher
Out-of-domain data: UNCorpus
Data statistics (# sentences)

Corpus                      Train      Dev    Test
Fisher (in-domain)          138,720    3,977  3,641
UNCorpus (out-of-domain)    1,000,000  4,000  4,000
We measured the similarity between domains by the perplexity of a 3-gram language model.
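As a rough illustration of this kind of measurement, the sketch below trains a tiny 3-gram language model and computes perplexity on two toy sentences. The slides do not say which toolkit or smoothing method was used, so add-one smoothing, the `TrigramLM` class, and the toy sentences are all assumptions made for this example.

```python
import math
from collections import Counter
from typing import List, Tuple


def trigrams(tokens: List[str]) -> List[Tuple[str, ...]]:
    """Pad with two start symbols and one end symbol, then slide a window."""
    padded = ["<s>", "<s>"] + tokens + ["</s>"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


class TrigramLM:
    """Toy 3-gram model with add-one smoothing (an assumption, see above)."""

    def __init__(self, sentences: List[List[str]]):
        self.tri = Counter()
        self.bi = Counter()
        self.vocab = {"</s>", "<unk>"}
        for sent in sentences:
            self.vocab.update(sent)
            for g in trigrams(sent):
                self.tri[g] += 1
                self.bi[g[:2]] += 1

    def prob(self, g: Tuple[str, ...]) -> float:
        return (self.tri[g] + 1) / (self.bi[g[:2]] + len(self.vocab))

    def perplexity(self, sentences: List[List[str]]) -> float:
        log_prob, n = 0.0, 0
        for sent in sentences:
            # Map out-of-vocabulary tokens to <unk> before scoring.
            sent = [w if w in self.vocab else "<unk>" for w in sent]
            for g in trigrams(sent):
                log_prob += math.log(self.prob(g))
                n += 1
        return math.exp(-log_prob / n)


in_domain = [["eh", "sí", "claro"], ["eh", "no", "sé"], ["sí", "claro"]]
lm = TrigramLM(in_domain)
ppl_in = lm.perplexity([["eh", "sí", "claro"]])
ppl_out = lm.perplexity([["conducta", "y", "disciplina"]])
print(ppl_in < ppl_out)
```

A lower perplexity means the text looks more like the data the model was trained on, which is why it can serve as a proxy for domain similarity.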
Training data          Perplexity  Unknown words
Fisher                 72.46
UNCorpus               589.81      5,173,539
Fisher-like UNCorpus   474.47      4,217,819

Examples (UNCorpus -> Fisher-like UNCorpus):
d conducta y disciplina
  -> eh conducta y disciplina
c lista amplia de verificación para la autoevaluación
  -> mhm lista amplia de verificación para la la tele
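The slides do not define how unknown words were counted. One plausible reading, assumed here, is the number of tokens in a corpus that do not appear in the in-domain vocabulary. The sketch below counts them for a toy vocabulary; `count_unknown_tokens` and the vocabulary contents are hypothetical.

```python
from typing import Iterable, List, Set


def count_unknown_tokens(corpus: Iterable[List[str]], vocab: Set[str]) -> int:
    """Count tokens (not types) that are absent from the given vocabulary."""
    return sum(1 for sent in corpus for tok in sent if tok not in vocab)


# Toy in-domain vocabulary, made up for illustration only.
fisher_vocab = {"eh", "mhm", "y", "la", "para"}

uncorpus = [["d", "conducta", "y", "disciplina"]]
fisher_like = [["eh", "conducta", "y", "disciplina"]]

unk_before = count_unknown_tokens(uncorpus, fisher_vocab)
unk_after = count_unknown_tokens(fisher_like, fisher_vocab)
print(unk_before, unk_after)
```

Under this reading, style transfer lowers the count when it replaces out-of-domain tokens with tokens that are common in the in-domain data.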
BLEU scores of trained NMT models for Disfluent Spanish to Fluent English

System                               Fisher/test
Single Training
  Fisher                             14.8
  UNCorpus                           7.8
  Fisher-like UNCorpus               6.7
Fine-tuning
  UNCorpus + Fisher                  18.3
  Fisher-like UNCorpus + Fisher      18.5
Fine-tuning with UNCorpus + Fisher improved BLEU by 3.5 points over training on Fisher alone (14.8 to 18.3).
Using Fisher-like UNCorpus instead of UNCorpus gave a further gain of 0.2 points (18.3 to 18.5).
about 3 points lower BLEU
“Fisher (disfluent)” did not use Fisher’s fluent references but instead used disfluent references
System                                       Fisher/test
Fisher (fluent)                              14.8
UNCorpus + Fisher (fluent)                   18.3
Fisher-like UNCorpus + Fisher (fluent)       18.5
Fisher (disfluent)                           11.6
UNCorpus + Fisher (disfluent)                15.2
Fisher-like UNCorpus + Fisher (disfluent)    15.6
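The gap between the fluent and disfluent settings can be read off the table directly. The short sketch below computes it per system; the scores are copied from the table, while the dictionary layout and variable names are only for illustration.

```python
# BLEU on Fisher/test, copied from the table above.
fluent = {
    "Fisher": 14.8,
    "UNCorpus + Fisher": 18.3,
    "Fisher-like UNCorpus + Fisher": 18.5,
}
disfluent = {
    "Fisher": 11.6,
    "UNCorpus + Fisher": 15.2,
    "Fisher-like UNCorpus + Fisher": 15.6,
}

# Difference per system, rounded to one decimal like the reported scores.
gaps = {name: round(fluent[name] - disfluent[name], 1) for name in fluent}
for name, gap in gaps.items():
    print(f"{name}: {gap}")
```

For all three systems the difference is close to 3 BLEU points.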
can control the trade-off
fluent reference sentences did not mitigate this problem