SLIDE 1

NAIST’s Machine Translation Systems

for IWSLT 2020 Conversational Speech Translation Task

Ryo Fukuda1, Katsuhito Sudoh1, and Satoshi Nakamura1,2

1Nara Institute of Science and Technology 2AIP Center, RIKEN, Japan

SLIDE 2

Brief Overview

Challenge track: Conversational Speech Translation

Translation task from disfluent Spanish to fluent English

  • Includes speech-to-text and text-to-text translation subtasks

Motivation: Tackle two problems in text-to-text NMT

  • 1. Low-resource translation
  • 2. Noisy input sentences
    • fillers, hesitations, self-corrections, ASR errors, …

Proposal: Domain adaptation using style transfer

  • transfer the style of out-of-domain data to be like the in-domain data, then perform domain adaptation


SLIDE 3

Outline

  • 1. Introduction
  • 2. System Description
  • 3. Experiments
  • 4. Discussion
  • 5. Summary


SLIDE 4

Motivation

  • The "style" of the task data (in-domain):

    → Ideally, augment the data using a large corpus of the same style

  • Large corpus available (out-of-domain):

    → The effect of training with it is limited


[Figure: in-domain data pairs disfluent Spanish with fluent English; out-of-domain data pairs fluent Spanish with fluent English.]

SLIDE 5

Motivation

Style transfer model: fluent to disfluent

  • increases the similarity between out-of-domain and in-domain data
    → enables effective domain-adaptive training


[Figure: the Style Transfer model converts out-of-domain data (fluent Spanish / fluent English) into pseudo in-domain data (disfluent Spanish / fluent English).]

SLIDE 6

Outline

  • 1. Introduction
  • 2. System Description
  • 3. Experiments
  • 4. Discussion
  • 5. Summary


SLIDE 7

Overview

Generate pseudo in-domain data and use it for NMT domain adaptation

[Figure: system pipeline — (1) a Fluent-to-Disfluent Style Transfer model, trained on out-of-domain and in-domain monolingual data, converts the out-of-domain parallel data into pseudo in-domain parallel data; (2) a Spanish-to-English NMT model is trained on the pseudo in-domain and in-domain parallel data.]

SLIDE 8

(1) Style Transfer model

Transfer the fluent input sentences of the out-of-domain parallel data into a disfluent style.

Style Transfer model:

  • based on Unsupervised NMT (Artetxe et al., 2018; Lample et al., 2018), trained with out-of-domain fluent data and in-domain disfluent data


[Figure: example — the fluent Spanish input "He estado durmiendo casi once horas" is transferred into the disfluent "Ya, ya, so, duerme casi once horas"; the English side, "He's been sleeping for almost 11 hours", is unchanged.]
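
A minimal sketch of how the trained style transfer model could be applied to build the pseudo in-domain corpus is given below: the Spanish side of the out-of-domain parallel data is rewritten into a disfluent style, while the fluent English side is kept as-is. The file paths and the transfer_to_disfluent wrapper are hypothetical stand-ins; the slides only state that the transfer model is a UNMT system.

    # Hypothetical wrapper around the trained fluent-to-disfluent UNMT model
    # (stand-in only; the decoding backend is not specified on the slides).
    def transfer_to_disfluent(spanish_sentence: str) -> str:
        raise NotImplementedError("plug in the trained style transfer model here")

    # Build pseudo in-domain parallel data: rewrite the Spanish (source) side,
    # keep the English (target) side unchanged.
    def build_pseudo_in_domain(src_path, tgt_path, out_src_path, out_tgt_path):
        with open(src_path, encoding="utf-8") as src, \
             open(tgt_path, encoding="utf-8") as tgt, \
             open(out_src_path, "w", encoding="utf-8") as out_src, \
             open(out_tgt_path, "w", encoding="utf-8") as out_tgt:
            for es_line, en_line in zip(src, tgt):
                out_src.write(transfer_to_disfluent(es_line.strip()) + "\n")
                out_tgt.write(en_line)  # fluent English reference reused as-is

    # e.g. build_pseudo_in_domain("uncorpus.es", "uncorpus.en",
    #                             "fisher_like_uncorpus.es", "fisher_like_uncorpus.en")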

SLIDE 9

(2) NMT model

Apply fine-tuning

  • a conventional domain adaptation method for MT
  • greatly improves the accuracy of low-resource, domain-specific translation (Dakwale and Monz, 2017)

Learning steps for fine-tuning (see the sketch after the list):

[Figure: a pre-trained Spanish-to-English model is trained on the pseudo in-domain parallel data, then fine-tuned on the in-domain parallel data to obtain the final Spanish-to-English model.]

  • 1. Pre-training steps
  • 2. Fine-tuning steps
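
Below is a minimal sketch of this two-step schedule, using a tiny PyTorch Transformer as a stand-in for the actual Spanish-to-English model. Only the pre-train-then-fine-tune structure comes from the slides (which specify a transformer_base-like model); the model size, dummy batches, and learning rates are illustrative assumptions.

    import torch
    import torch.nn as nn

    # Toy stand-in for the Spanish-to-English NMT model (the real system uses
    # transformer_base-like settings; sizes here are kept tiny on purpose).
    class TinyNMT(nn.Module):
        def __init__(self, vocab_size=1000, d_model=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            self.transformer = nn.Transformer(d_model=d_model, nhead=4,
                                              num_encoder_layers=2,
                                              num_decoder_layers=2,
                                              batch_first=True)
            self.proj = nn.Linear(d_model, vocab_size)

        def forward(self, src, tgt):
            h = self.transformer(self.embed(src), self.embed(tgt))
            return self.proj(h)  # (batch, tgt_len, vocab)

    def run_phase(model, batches, lr, num_epochs):
        """One training phase: plain cross-entropy over target tokens."""
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        model.train()
        for _ in range(num_epochs):
            for src, tgt_in, tgt_out in batches:
                optimizer.zero_grad()
                logits = model(src, tgt_in)
                loss = loss_fn(logits.reshape(-1, logits.size(-1)), tgt_out.reshape(-1))
                loss.backward()
                optimizer.step()

    # Dummy batches standing in for the real (pseudo) in-domain parallel data.
    def dummy_batches(n=4):
        return [(torch.randint(0, 1000, (8, 10)),
                 torch.randint(0, 1000, (8, 10)),
                 torch.randint(0, 1000, (8, 10))) for _ in range(n)]

    model = TinyNMT()
    # 1. Pre-training on the large pseudo in-domain corpus (Fisher-like UNCorpus).
    run_phase(model, dummy_batches(), lr=5e-4, num_epochs=1)
    # 2. Fine-tuning the same parameters on the small in-domain corpus (Fisher),
    #    typically with a lower learning rate (an assumption; not stated on the slides).
    run_phase(model, dummy_batches(), lr=1e-4, num_epochs=1)
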
SLIDE 10

Outline

  • 1. Introduction
  • 2. System Description
  • 3. Experiments
  • 4. Discussion
  • 5. Summary


SLIDE 11

Datasets

  • LDC Fisher Spanish speech with English translations (Fisher)

  • parallel in-domain data
  • disfluent Spanish to (fluent/disfluent) English
  • United Nations Parallel Corpus (UNCorpus)

  • parallel out-of-domain data
  • fluent Spanish to fluent English


Data statistics (# sentences):

  Fisher (in-domain):        Train 138,720    Dev 3,977    Test 3,641
  UNCorpus (out-of-domain):  Train 1,000,000  Dev 4,000    Test 4,000

SLIDE 12

(1) Spanish Style Transfer

Data: Fisher (disfluent) and UNCorpus (fluent) Spanish data

Model: Unsupervised NMT (UNMT) based on the Transformer

Evaluation:

  • Estimate the similarity between domains by measuring the perplexity of a 3-gram language model
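
As an illustration of this check, corpus-level perplexity under an in-domain 3-gram LM could be computed roughly as below. The slides do not say which LM toolkit was used; this sketch assumes a KenLM ARPA model trained on the Fisher Spanish text, and the file names are hypothetical.

    import math
    import kenlm  # assumes KenLM Python bindings and a pre-trained 3-gram ARPA model

    def corpus_perplexity(model_path: str, text_path: str) -> float:
        """Corpus-level perplexity of a text file under a KenLM n-gram model."""
        model = kenlm.Model(model_path)
        total_log10, total_tokens = 0.0, 0
        with open(text_path, encoding="utf-8") as f:
            for line in f:
                sent = line.strip()
                if not sent:
                    continue
                total_log10 += model.score(sent)       # log10 P(sentence), incl. </s>
                total_tokens += len(sent.split()) + 1  # +1 for the </s> token
        return math.pow(10.0, -total_log10 / total_tokens)

    # Lower perplexity under the Fisher-trained LM = closer to the in-domain style.
    print(corpus_perplexity("fisher.es.3gram.arpa", "uncorpus.es"))
    print(corpus_perplexity("fisher.es.3gram.arpa", "fisher_like_uncorpus.es"))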


[Figure: system pipeline as on Slide 7; this slide covers step (1), the Fluent-to-Disfluent Style Transfer model.]

SLIDE 13

(1) Spanish Style Transfer

Results

  • style transfer reduced the perplexity and the number of unknown words

Examples of pseudo in-domain data (Fisher-like UNCorpus):

  • Deletes paragraph symbols
  • Inserts "disfluencies" (fillers, repetitions, missing words, ASR errors, …)

Training data           Perplexity   Unknown words
  Fisher                   72.46       –
  UNCorpus                589.81       5,173,539
  Fisher-like UNCorpus    474.47       4,217,819

Examples (UNCorpus → Fisher-like UNCorpus):

  conducta y disciplina → eh eh conducta y disciplina
  lista amplia de verificación para la autoevaluación → mh mhm lista amplia de verificación para la la la te tele le

SLIDE 14

(2) NMT with Domain Adaptation

Data

  • in-domain: 130K bilingual pairs of Fisher
  • out-of-domain: 1M bilingual pairs of UNCorpus or Fisher-like UNCorpus

Model: Transformer (mostly following the transformer_base settings)

Evaluation: BLEU scores computed with sacreBLEU
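
For reference, the corpus-level BLEU computation implied here might look like the minimal sketch below (only the use of sacreBLEU comes from the slide; the file names are hypothetical).

    import sacrebleu

    # One detokenized sentence per line: system outputs and reference translations
    # for the Fisher test set (hypothetical file names).
    with open("fisher.test.hyp.en", encoding="utf-8") as f:
        hypotheses = [line.strip() for line in f]
    with open("fisher.test.ref.en", encoding="utf-8") as f:
        references = [line.strip() for line in f]

    bleu = sacrebleu.corpus_bleu(hypotheses, [references])
    print(f"BLEU = {bleu.score:.1f}")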


[Figure: system pipeline as on Slide 7; this slide covers step (2), the Spanish-to-English NMT model.]

SLIDE 15

(2) NMT with Domain Adaptation

Results (1/2) – Effect of Style Transfer

  • Domain adaptation training outperformed the baseline
  • slightly improved by using the pseudo in-domain data


BLEU scores of trained NMT models for Disfluent Spanish to Fluent English

System                                   Fisher/test (BLEU)
  Single training
    Fisher                                14.8
    UNCorpus                               7.8
    Fisher-like UNCorpus                   6.7
  Fine-tuning
    UNCorpus + Fisher                     18.3
    Fisher-like UNCorpus + Fisher         18.5

SLIDE 16

(Same content as Slide 15, highlighting that fine-tuning on UNCorpus + Fisher (18.3) improves over the Fisher-only baseline (14.8) by +3.5 BLEU.)

SLIDE 17

(Same content as Slide 15, highlighting that fine-tuning on Fisher-like UNCorpus + Fisher (18.5) improves over UNCorpus + Fisher (18.3) by +0.2 BLEU.)

SLIDE 18

(Same content as Slide 15, highlighting a 1.1 BLEU gap: single training on Fisher-like UNCorpus (6.7) scores 1.1 points below single training on UNCorpus (7.8).)
SLIDE 19

(2) NMT with Domain Adaptation

Results (2/2) – Fluent vs Disfluent references

  • models trained with Fisher's original disfluent references had BLEU scores about 3 points lower


"Fisher (disfluent)" systems did not use Fisher's fluent references but instead used the disfluent references

System                                          Fisher/test (BLEU)
  Fisher (fluent)                                 14.8
  UNCorpus + Fisher (fluent)                      18.3
  Fisher-like UNCorpus + Fisher (fluent)          18.5
  Fisher (disfluent)                              11.6   (−3.2)
  UNCorpus + Fisher (disfluent)                   15.2   (−3.1)
  Fisher-like UNCorpus + Fisher (disfluent)       15.6   (−2.9)
SLIDE 20

Outline

  • 1. Introduction
  • 2. System Description
  • 3. Experiments
  • 4. Discussion
  • 5. Summary


SLIDE 21

Effect of Style Transfer

The use of pseudo in-domain data improved accuracy, but

  • there was no significant improvement
  • results were worse in the pre-training phase

An example of a style-transferred sentence:

  nueva york 1 a 12 de junio de 2015 (original)
  nueva york oh a mi eh de de de de (generated)

  • some sentences lost their original meaning
  • the style transfer constraints may be too strong

→ This problem may be mitigated by a model that can control the trade-off between style transfer and content preservation


SLIDE 22

Fluent vs Disfluent References

The models trained using Fisher's original disfluent data had BLEU scores about 3 points lower than the models trained using fluent data.

→ Removing the disfluencies from the reference sentences improves BLEU by about three points for all the learning strategies we tried

  • the use of large out-of-domain data with fluent reference sentences did not mitigate this problem


The style of the sentence has an impact on translation accuracy

SLIDE 23

Summary

Translation accuracy was improved

  • by domain adaptation (+3.7)
  • by style transfer of the out-of-domain data (+0.4)
  • the effect was limited due to degradation of parallel data quality

Future work

  • pursue a style transfer method that does not reduce the quality of the parallel data
