Translator: Production, Research, and Shared Task (PowerPoint Presentation)





Translator


Production, Research, Shared task


Dataset                   newstest2016   newstest2017
WMT18                     33.9           29.0
Random                    16.2           14.1
LangID+Random             26.6           23.3
LangID+Adeq               35.1           30.2
Ablation: no LangID       15.4           12.7
Ablation: no AbsDiff      33.8           29.3
Ablation: no CE-Weight    31.7           27.4
LangID+Adeq+Dom           36.0           31.0


System                     2016   2017   2018
WMT18-Microsoft            38.6   31.3   46.5
WMT18-FAIR                 -      32.7   44.9
WMT19-baseline             37.7   30.3   46.5
+ data-filtering           38.3   31.1   46.6
+ noisy back-translation   38.9   32.8   46.3
+ fine-tuning              40.6   33.6   48.9
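The "+ noisy back-translation" row refers to perturbing synthetic source sentences produced by a reverse model, as in Edunov et al. (2018): word dropout, filler-token replacement, and light word shuffling. A minimal sketch of such a noise function, with illustrative rates:

```python
import random

# Hedged sketch of back-translation noising. The rates and the filler token
# are illustrative choices, not the exact WMT19 configuration.

def add_noise(tokens, rng, p_drop=0.1, p_blank=0.1, shuffle_dist=3):
    out = []
    for tok in tokens:
        r = rng.random()
        if r < p_drop:
            continue                # word dropout
        if r < p_drop + p_blank:
            out.append("<BLANK>")   # replace with a filler token
        else:
            out.append(tok)
    # light shuffle: each token moves at most `shuffle_dist` positions
    keys = [i + rng.uniform(0, shuffle_dist) for i in range(len(out))]
    return [tok for _, tok in sorted(zip(keys, out))]

rng = random.Random(0)
sent = "the police chief visited the university".split()
print(add_noise(sent, rng))
```

With all rates set to zero the function is the identity, which makes it easy to sweep noise strength against clean back-translation.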


System                     en     de     both
WMT18-Microsoft            41.1   35.5   39.1
WMT18-FAIR                 -      -      -
WMT19-baseline             41.8   32.5   38.2
+ data-filtering           41.7   34.0   39.0
+ noisy back-translation   38.9   40.4   39.7
+ fine-tuning              42.2   39.2   41.2


BPE                                   SentencePiece
Poli@@ zei|ch@@ ef                    ▁Polizei chef
ver@@ hän@@ g|nis@@ vollen            ▁ver h äng nis vollen
Universit@@ ä@@ t|s @-@ Mitarbeiter   ▁Universität s - Mitarbeiter
Schie@@ ß|en                          ▁Schieß en
be|su@@ cht|en                        ▁besucht en
auf@@ gere@@ g|t                      ▁a uf|gereg|t
Be@@ urlau@@ b|ung                    ▁Be urlaub ung
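The segmentations above contrast the two conventions: BPE marks non-final subwords with "@@", while SentencePiece prefixes word starts with "▁". A toy BPE encoder that greedily applies a merge table in priority order; the merge list here is a hand-picked assumption for illustration, not the real WMT vocabulary:

```python
# Toy BPE encoder: repeatedly merge the adjacent symbol pair with the best
# (lowest) rank in the merge table, then mark non-final subwords with "@@".

def bpe_encode(word, merges):
    symbols = list(word)
    while True:
        best = None
        for i in range(len(symbols) - 1):
            pair = (symbols[i], symbols[i + 1])
            if pair in merges and (best is None or merges[pair] < merges[best[1]]):
                best = (i, pair)
        if best is None:
            break  # no applicable merge left
        i, pair = best
        symbols[i:i + 2] = ["".join(pair)]
    return [s + "@@" for s in symbols[:-1]] + [symbols[-1]]

# Illustrative merge table: rank = priority (lower merges first).
merges = {("c", "h"): 0, ("ch", "e"): 1, ("che", "f"): 2,
          ("P", "o"): 3, ("l", "i"): 4, ("Po", "li"): 5}
print(bpe_encode("Polizeichef", merges))
# → ['Poli@@', 'z@@', 'e@@', 'i@@', 'chef']
```

With a full learned merge table the same loop produces segmentations like the "Poli@@ zei ... chef" example in the table.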

  • Microsoft Translator at WMT 2019: Towards Large-Scale Document-Level Neural Machine Translation https://arxiv.org/abs/1907.06170

  • Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention https://arxiv.org/abs/1908.11365
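The depth-scaled initialization idea in the second paper shrinks the initialization of deeper layers (roughly by 1/sqrt of the layer index) so that residual-branch variance stays controlled as the Transformer grows deeper. A minimal sketch; the Xavier-style uniform bound below is an illustrative choice, not the paper's exact formula:

```python
import math
import random

# Hedged sketch of depth-scaled initialization: deeper layers get smaller
# initial weights, scaled by 1/sqrt(layer_index).

def ds_init_matrix(rows, cols, layer_index, rng):
    # Xavier-style uniform bound, shrunk by 1/sqrt(layer depth).
    bound = math.sqrt(6.0 / (rows + cols)) / math.sqrt(layer_index)
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]

def max_abs(m):
    return max(abs(x) for row in m for x in row)

rng = random.Random(0)
shallow = ds_init_matrix(512, 512, layer_index=1, rng=rng)
deep = ds_init_matrix(512, 512, layer_index=24, rng=rng)
print(max_abs(shallow), max_abs(deep))  # the deeper layer starts closer to zero
```

The effect is that a 24th layer is initialized with weights roughly 5x smaller than the first layer, which helps very deep Transformers train without divergence.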


  • On The Evaluation of Machine Translation Systems Trained With Back-Translation https://arxiv.org/abs/1908.05204


Model        Parameters   Layers   Dim
BERT/GPT-2   117M         12       768/4096
BERT/GPT-2   345M         24       1024/4096
GPT-2        762M         36       1280/4096
GPT-2        1542M        48       1600/4096

Model                       Parameters       Layers      Dim
Nematus RNN                 25M (95MB)       1/1 (2/2)   512/1024
Transformer (Base)          30M (117MB)      6/6         512/2048
Transformer (Big)           209M (790MB)     6/6         1024/4096
Transformer (Bigger)        386M (1,471MB)   12/12       1024/4096
Transformer (Even Bigger)   570M             18/18       1024/4096
Transformer (Biggest)       750M             24/24       1024/4096
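The parameter counts above can be roughly reproduced from the layer/dimension columns by counting only the big weight matrices (attention projections, feed-forward layers, shared embeddings) and ignoring biases and layer norms. The 32k joint vocabulary below is an assumption:

```python
# Back-of-the-envelope Transformer parameter count from the table's
# Layers and Dim columns. Biases and layer-norm parameters are ignored.

def transformer_params(enc_layers, dec_layers, d_model, d_ffn, vocab=32000):
    attn = 4 * d_model * d_model          # Q, K, V, output projections
    ffn = 2 * d_model * d_ffn             # up- and down-projection
    enc = enc_layers * (attn + ffn)       # self-attention block per encoder layer
    dec = dec_layers * (2 * attn + ffn)   # self-attention + cross-attention
    emb = vocab * d_model                 # shared source/target/output embedding
    return enc + dec + emb

big = transformer_params(6, 6, 1024, 4096)
print(f"{big / 1e6:.0f}M")  # → 209M, matching the "Transformer (Big)" row
```

The estimate lands almost exactly on the table's 209M for Transformer (Big), confirming that the count is dominated by the attention and feed-forward matrices.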


blogs.msdn.com/translator
twitter.com/MSTranslator
facebook.com/BingTranslator
Microsoft.com/Translator