
NAIST's Machine Translation Systems for IWSLT 2020 Conversational Speech Translation Task



  1. NAIST's Machine Translation Systems for IWSLT 2020 Conversational Speech Translation Task
     Ryo Fukuda 1, Katsuhito Sudoh 1, and Satoshi Nakamura 1,2
     1 Nara Institute of Science and Technology  2 AIP Center, RIKEN, Japan

  2. Brief Overview
     Challenge track: Conversational Speech Translation, a translation task from disfluent Spanish to fluent English
     • Includes speech-to-text and text-to-text translation subtasks
     Motivation: tackle two problems in text-to-text NMT
     1. Low-resource translation
     2. Noisy input sentences (fillers, hesitations, self-corrections, ASR errors, ...)
     Proposal: domain adaptation using style transfer
     • Transfer the style of out-of-domain data to be like the in-domain data, then perform domain adaptation

  3. Outline
     1. Introduction
     2. System Description
     3. Experiments
     4. Discussion
     5. Summary

  4. Motivation
     The "style" of the task data (in-domain): disfluent Spanish → fluent English
     • Ideally, we would augment the data with a large corpus of the same style
     Large corpus available (out-of-domain): fluent Spanish → fluent English
     • The effect of training with such data is limited

  5. Motivation
     Style transfer model: fluent to disfluent
       Out-of-domain data: fluent Spanish → fluent English
       → Style Transfer →
       Pseudo in-domain data: disfluent Spanish → fluent English
     • Increases the similarity between out-of-domain and in-domain data
     → Enables effective domain-adaptive training

  6. Outline
     1. Introduction
     2. System Description
     3. Experiments
     4. Discussion
     5. Summary

  7. Overview
     Generate pseudo in-domain data and adapt the NMT model to it:
     (1) Style Transfer model (fluent-to-disfluent): trained with out-of-domain and in-domain monolingual data; converts the out-of-domain parallel data into pseudo in-domain parallel data
     (2) NMT model (Spanish-to-English): trained on the pseudo in-domain parallel data, then adapted with the in-domain parallel data

  8. (1) Style Transfer Model
     Transfers fluent input sentences of the out-of-domain parallel data into a disfluent style (a data-construction sketch follows below):
       Fluent:    "He estado durmiendo casi once horas" / "He's been sleeping for almost 11 hours"
       → Style Transfer (fluent-to-disfluent) →
       Disfluent: "Ya, ya, so, duerme casi once horas" / "He's been sleeping for almost 11 hours"
     Style transfer model:
     • Based on Unsupervised NMT (Artetxe et al., 2018; Lample et al., 2018), trained with out-of-domain fluent data and in-domain disfluent data
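The slides show no code for this step, so the following is only a minimal Python sketch, under the assumption stated in the slide, of how a trained fluent-to-disfluent model could be applied to build the pseudo in-domain corpus: the Spanish (source) side of UNCorpus is rewritten, while the fluent English (target) side is kept unchanged. The function and variable names are hypothetical, not the authors' code.

```python
# Hypothetical sketch (not the authors' implementation): apply a trained
# fluent-to-disfluent style transfer model to the Spanish side of the
# out-of-domain corpus; the fluent English side is kept as-is.
def build_pseudo_in_domain(style_transfer, out_of_domain_pairs):
    """style_transfer: callable mapping fluent Spanish -> disfluent-style Spanish.
    out_of_domain_pairs: iterable of (fluent_es, fluent_en) sentence pairs."""
    for es, en in out_of_domain_pairs:
        yield style_transfer(es), en  # (disfluent-style es, fluent en)
```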

  9. (2) NMT Model
     Apply fine-tuning:
     • A conventional domain adaptation method for MT
     • Greatly improves the accuracy of low-resource, domain-specific translation (Dakwale and Monz, 2017)
     Learning steps for fine-tuning (see the sketch below):
     1. Pre-training on the pseudo in-domain parallel data (Spanish-to-English) → pre-trained model
     2. Fine-tuning on the in-domain parallel data (Spanish-to-English) → fine-tuned model
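As a rough illustration of the two-stage schedule above, here is a hypothetical PyTorch-style sketch. The `nmt` encoder-decoder model, the batch iterables, the epoch counts, and the learning rates are all placeholder assumptions, not the paper's configuration; the key point is that stage 2 continues training the same weights on the in-domain data.

```python
import torch
from torch import nn, optim

def run_training(model, batches, epochs, lr):
    """Generic seq2seq training loop over (src, tgt) token-id batches (sketch)."""
    opt = optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss(ignore_index=0)  # assumes pad id 0
    for _ in range(epochs):
        for src, tgt in batches:
            logits = model(src, tgt[:, :-1])  # teacher forcing
            loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                           tgt[:, 1:].reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()

# 1. Pre-train on the pseudo in-domain (Fisher-like UNCorpus) parallel data
run_training(nmt, pseudo_in_domain_batches, epochs=10, lr=5e-4)
# 2. Fine-tune the same weights on the true in-domain (Fisher) parallel data,
#    typically with a smaller learning rate
run_training(nmt, fisher_batches, epochs=5, lr=1e-4)
```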

  10. Outline
      1. Introduction
      2. System Description
      3. Experiments
      4. Discussion
      5. Summary

  11. Datasets
      • LDC Fisher Spanish speech with English translations (Fisher): parallel in-domain data; disfluent Spanish to (fluent/disfluent) English
      • United Nations Parallel Corpus (UNCorpus): parallel out-of-domain data; fluent Spanish to fluent English
      Data statistics (# sentences):
        Fisher (in-domain):       Train 138,720    Dev 3,977   Test 3,641
        UNCorpus (out-of-domain): Train 1,000,000  Dev 4,000   Test 4,000

  12. (1) Spanish Style Transfer
      Data: Fisher (disfluent) and UNCorpus (fluent) Spanish data
      Model: Unsupervised NMT (UNMT) based on the Transformer
      Evaluation: estimate the similarity between domains by measuring the perplexity under a 3-gram language model (see the sketch below)
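As a concrete illustration of this evaluation, here is a small sketch using the KenLM Python bindings, assuming a 3-gram model has already been trained on the in-domain (Fisher) text (e.g. with KenLM's lmplz) and saved to the hypothetical path fisher.3gram.arpa.

```python
# Sketch: score a corpus with a 3-gram LM trained on in-domain text.
# Lower perplexity = closer to the in-domain (Fisher) style.
import kenlm

lm = kenlm.Model("fisher.3gram.arpa")  # hypothetical model path

def mean_perplexity(sentences):
    """Average per-sentence perplexity under the in-domain 3-gram LM."""
    return sum(lm.perplexity(s) for s in sentences) / len(sentences)

# e.g. compare mean_perplexity(uncorpus_sents) against
# mean_perplexity(fisher_like_uncorpus_sents)
```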

  13. (1) Spanish Style Transfer: Results
      • Style transfer reduced the perplexity and the number of unknown words:
        Training data          Perplexity   Unknown words
        Fisher                 72.46        0
        UNCorpus               589.81       5,173,539
        Fisher-like UNCorpus   474.47       4,217,819
      Examples of pseudo in-domain data (Fisher-like UNCorpus):
        UNCorpus: "d conducta y disciplina"  →  Fisher-like: "eh conducta y disciplina"
        UNCorpus: "c lista amplia de verificación para la autoevaluación"  →  Fisher-like: "mhm lista amplia de verificación para la la la tele"
      • Deletes paragraph symbols (e.g. the stray "d"/"c" prefixes above)
      • Inserts disfluencies (fillers, repetitions, missing words, ASR errors, ...)

  14. (2) NMT with Domain Adaptation
      Data:
      • In-domain: 130K bilingual pairs from Fisher
      • Out-of-domain: 1M pairs from UNCorpus or Fisher-like UNCorpus
      Model: Transformer (almost following the transformer_base settings)
      Evaluation: BLEU scores calculated with sacreBLEU (see the sketch below)
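For reference, corpus-level BLEU can be computed from Python with sacreBLEU roughly as follows. This is a minimal sketch; the strings are made-up examples, not the paper's data.

```python
# Sketch: corpus-level BLEU with sacreBLEU's Python API.
import sacrebleu

hyps = ["he's been sleeping for almost eleven hours"]   # system outputs
refs = [["he has been sleeping for almost 11 hours"]]   # one reference stream
bleu = sacrebleu.corpus_bleu(hyps, refs)
print(f"BLEU = {bleu.score:.1f}")
```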

  15. (2) NMT with Domain Adaptation: Results (1/2), Effect of Style Transfer
      BLEU scores of the trained NMT models for disfluent Spanish to fluent English (Fisher/test):
        Single training: Fisher                         14.8
        Single training: UNCorpus                        7.8
        Single training: Fisher-like UNCorpus            6.7  (-1.1 vs. UNCorpus)
        Fine-tuning:     UNCorpus + Fisher              18.3  (+3.5 vs. Fisher)
        Fine-tuning:     Fisher-like UNCorpus + Fisher  18.5  (+0.2 vs. UNCorpus + Fisher)
      • Domain adaptation training outperformed the single-training baseline
      • Slightly improved further by using the pseudo in-domain data

  19. (2) NMT with Domain Adaptation: Results (2/2), Fluent vs. Disfluent References
      "Fisher (disfluent)" systems did not use Fisher's fluent references but instead its disfluent references (Fisher/test BLEU):
        Fisher (fluent)                            14.8
        UNCorpus + Fisher (fluent)                 18.3
        Fisher-like UNCorpus + Fisher (fluent)     18.5
        Fisher (disfluent)                         11.6  (-3.2)
        UNCorpus + Fisher (disfluent)              15.2  (-3.1)
        Fisher-like UNCorpus + Fisher (disfluent)  15.6  (-2.9)
      • Models trained with Fisher's original disfluent references scored about 3 BLEU points lower

  20. Outline
      1. Introduction
      2. System Description
      3. Experiments
      4. Discussion
      5. Summary

  21. Effect of Style Transfer
      The use of pseudo in-domain data improved accuracy, but:
      • There was no significant improvement
      • It was worse in the pre-training phase
      An example of a style-transferred sentence:
        "nueva york 1 a 12 de junio de 2015" (original)
        "nueva york oh a mi eh de de de de" (generated)
      • Some sentences lost their original meaning
      • The style transfer constraints may be too strong
      → This problem may be mitigated by a model that can control the trade-off between style transfer and content preservation

  22. Fluent vs. Disfluent References
      The model trained using Fisher's original disfluent references had a BLEU score about 3 points lower than the model trained using the fluent references:
      → Removing the disfluency from the reference sentences improved BLEU by about three points for all the learning strategies we tried
      • The use of large out-of-domain data with fluent reference sentences did not mitigate this problem
      The style of the reference sentences has an impact on translation accuracy

  23. Summary
      Translation accuracy was improved:
      • By domain adaptation (+3.7 BLEU)
      • By style transfer of the out-of-domain data (+0.4 BLEU)
      • The effect was limited due to degradation of the parallel data quality
      Future work: pursue a style transfer method that does not reduce the quality of the parallel data
