SLIDE 1

STACL: Simultaneous Translation with Integrated Anticipation & Controllable Latency

Liang Huang

Principal Scientist, Baidu Research

Assistant Professor (on-leave), Oregon State University

Joint work between Baidu Research (Sunnyvale) and Baidu NLP (Beijing)

SLIDE 2

Breakthrough in Simultaneous Translation

Baidu World Conference, November 2017: full-sentence (non-simultaneous) translation
Baidu World Conference, November 2018: STACL simultaneous translation, latency ~3 secs

SLIDE 6

Background: Consecutive vs. Simultaneous

consecutive interpretation: multiplicative latency (x2)
simultaneous interpretation: additive latency (+3 secs)

simultaneous interpretation is extremely difficult:

  • only ~3,000 qualified simultaneous interpreters world-wide
  • each interpreter can only sustain for at most 10-30 minutes
  • the best interpreters can only cover ~60% of the source material

SLIDE 7

Tradeoff between Latency and Quality

[Latency-quality tradeoff chart. Plotted points: full-sentence machine translation (high latency, high quality), word-by-word translation (low latency, low quality), consecutive interpreters (~1 sentence behind), simultaneous interpreters (~3 seconds behind). Our goal: low latency with high quality.]

SLIDE 8

Industrial Work in Simultaneous Translation

  • almost all existing “real-time” translation systems use conventional full-sentence translation techniques, causing at least one-sentence delay
  • some systems repeatedly retranslate, but constantly changing translations are annoying to the user and can’t be used for speech-to-speech translation

Baidu, Nov. 2017 (~12 seconds delay); Sogou, Oct. 2018 (~12 seconds delay)

SLIDE 11

Academic Work in Simultaneous Translation

  • prediction of German verb (Grissom et al., 2014)
  • reinforcement learning (Grissom et al., 2014; Gu et al., 2017)
  • learning Read/Write sequences on top of a pretrained NMT model
  • “encourages” latency requirements, but can’t enforce them at test time
  • complicated, and slow to train

Grissom et al., 2014

SLIDE 15

Challenge: Word Order Difference

  • e.g. translate from SOV language (Japanese, German) to SVO (English)
  • German is underlyingly SOV, and Chinese is a mix of SVO and SOV
  • human simultaneous interpreters routinely “anticipate” (e.g., predicting German verb)

Grissom et al., 2014

President Bush meets with Russian President Putin in Moscow

non-anticipative: President Bush (…… waiting ……) meets with Russian …
anticipative: President Bush meets with Russian President Putin in Moscow

SLIDE 16

Our Solution: Prefix-to-Prefix

[Diagram: in seq-to-seq, the decoder waits for the whole source sentence (words 1…5) before emitting any target word; in prefix-to-prefix (wait-k), it starts emitting after only the first k source words.]

  • seq-to-seq is only suitable for conventional full-sentence MT
  • we propose prefix-to-prefix, tailored to simultaneous MT
  • special case: wait-k policy: the translation is always k words behind the source sentence (sketched in code below)
  • training in this way enables anticipation

Example (Chinese source with pinyin and gloss):

布什 (Bùshí, Bush) 总统 (zǒngtǒng, President) 在 (zài, in) 莫斯科 (Mòsīkē, Moscow) 与 (yǔ, with) 俄罗斯 (Éluósī, Russian) 总统 (zǒngtǒng, President) 普京 (Pǔjīng, Putin) 会晤 (huìwù, meet)

wait-k output: President Bush meets with Russian President Putin in Moscow

Note that “meets” is emitted long before the sentence-final source verb 会晤 arrives: the model anticipates it.
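To make the wait-k policy concrete, here is a minimal decoding loop in Python. It is only a sketch: `model.predict_next` is a hypothetical single-step decoder call (any NMT model scoring the next target word given a source prefix and the target words emitted so far), not an API from the paper’s codebase.

```python
def waitk_decode(model, src_stream, k):
    """Wait-k decoding sketch: READ the first k source words, then
    alternate one READ with one WRITE; once the source is exhausted,
    flush the rest of the translation."""
    src, tgt = [], []
    for word in src_stream:                 # source words arrive one at a time
        src.append(word)                    # READ
        if len(src) < k:                    # still waiting for the first k words
            continue
        nxt = model.predict_next(src, tgt)  # WRITE (anticipating if needed)
        if nxt == "</s>":
            return tgt
        tgt.append(nxt)
    while True:                             # tail: source fully read
        nxt = model.predict_next(src, tgt)
        if nxt == "</s>":
            return tgt
        tgt.append(nxt)
```

With k = 2 on the example above, the third WRITE happens after only four source words, which is exactly where the model must anticipate the sentence-final verb 会晤 and output “meets” (this matches the t = 3, g(3) = 4 illustration on the next slide).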
SLIDE 25

More General Prefix-to-Prefix

  • prefix-to-prefix (given a source prefix):

    p(y_t | x_1 … x_{g(t)}, y_1 … y_{t-1})

    g(·) is a monotonic non-decreasing function; g(t) is the number of source words used to predict y_t

  • seq-to-seq (given the full source sentence):

    p(y_t | x_1 … x_n, y_1 … y_{t-1})

Example: source 布什 总统 在 莫斯科 与 普京 会晤 (gloss: Bush Pres. at Moscow with Putin meet), target “President Bush meets with Putin in Moscow”. At t = 3 (predicting “meets”), g(3) = 4: four source words have been read.
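Under this notation, wait-k is just one particular choice of g: read k words up front, then one more per emitted target word, capped at the source length. Training then simply maximizes log p(y_t | x_1 … x_{g(t)}, y_1 … y_{t-1}) over these restricted prefixes, which is what forces the model to learn to anticipate. A one-line sketch:

```python
def g_waitk(t, k, src_len):
    """Number of source words visible when predicting target word t
    under the wait-k policy: g(t) = min(k + t - 1, |x|)."""
    return min(k + t - 1, src_len)

# with k = 2 this reproduces the slide's example: g(3) = 4
assert g_waitk(3, 2, 7) == 4
```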

SLIDE 29

Demo 1 (Research)

This is just our research demo. Our production system is better (shorter ASR latency).

source: 江泽民 对 法国 总统 的 来华 访问 表示 感谢 。
pinyin: jiāng zémín duì fǎguó zǒngtǒng de láihuá fǎngwèn biǎoshì gǎnxiè
gloss: jiang zemin to French President ’s to-China visit express gratitude
output: jiang zemin expressed his appreciation for the visit by french president .

SLIDE 30

Demo 2 (Latency-Accuracy Tradeoff)

SLIDE 32

Demo 3 (Deployment)

This is a live recording from the Baidu World Conference on Nov 1, 2018.

SLIDE 34

German => English Example

German source: doch während man sich im kongress nicht auf ein vorgehen einigen kann , warten mehrere bundesstaaten nicht länger .

English translation (simultaneous, wait-3; training not yet converged): but , while congress does not agree on a course of action , several states no longer wait .

English translation (full-sentence beam search): but , while congressional action can not be agreed , several states are no longer waiting .

SLIDE 35

Refinements: Wait-k with Catchup

  • the English translation is often ~1.25x the length of the Chinese input
  • under a more or less “synchronized” policy like wait-k, the English translation will lag behind more and more severely
  • catchup: decode two English words in 1 out of 4 steps (sketched below)
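In terms of g(t), catchup just subtracts the extra target words already emitted. A sketch under the assumption that the decoder skips one READ every 1/c steps (c = 0.25 gives exactly “two English words in 1 out of 4 steps”); `g_catchup` is an illustrative helper, not code from the paper:

```python
import math

def g_catchup(t, k, src_len, c=0.25):
    """wait-k with catchup: every 1/c-th target word is emitted without
    reading a new source word, so a translation ~1.25x longer than its
    source stops falling further and further behind."""
    return min(k + t - 1 - math.floor(c * t), src_len)

# with k = 2: g(3) == g(4) == 4, i.e. target words 3 and 4 come out in
# the same step -- two English words in 1 out of 4 steps
assert g_catchup(3, 2, 100) == g_catchup(4, 2, 100) == 4
```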

SLIDE 36

New Latency Metric: Average Lagging

  • previous latency metrics: CW (consecutive wait) and AP (average proportion)
  • they’re good metrics but do not directly measure the level of “lagging behind”
  • our metric, Average Lagging (AL), measures on average how many (source) words the translation lags behind; ideally, AL(wait-k with catchup) ≈ k (see the sketch below)
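A minimal computation of the metric, assuming the formulation in the STACL paper: average, over the steps until the full source has been read, of how far g(t) is ahead of an ideal translator that stays perfectly in sync (position (t-1)/r, where r is the target/source length ratio):

```python
def average_lagging(g, src_len, tgt_len):
    """Average Lagging: mean number of source words the translation lags
    behind an ideal, perfectly synchronized simultaneous translator.
    g[t-1] = number of source words read before emitting target word t."""
    r = tgt_len / src_len           # target-to-source length ratio
    total, tau = 0.0, 0
    for t, g_t in enumerate(g, start=1):
        total += g_t - (t - 1) / r  # lag at step t
        tau = t
        if g_t >= src_len:          # first step that has read the full source
            break
    return total / tau

# sanity check: wait-2 on a 5-word pair (r = 1) gives AL = 2 = k
assert average_lagging([2, 3, 4, 5, 5], 5, 5) == 2.0
```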

SLIDE 37

Experiments: German<=>English

  • trained on 4.5M sentence pairs (WMT 15); compared with Gu et al., 2017
SLIDE 40

Experiments: Chinese<=>English

  • trained on 2M sentence pairs; evaluated on NIST 06 / 08; 1-ref and 4-ref BLEU
SLIDE 41

Chinese=>English Examples From Recent News

SLIDE 44

Media Reports

This is another new development, after the release of Baidu Deep Speech 2 in 2016, that has made foreign technology media this excited. — QbitAI (量子位)

SLIDE 45

Conclusions

  • first simultaneous translation system with seamlessly integrated anticipation
  • human simultaneous interpreters also anticipate all the time
  • some previous works predict source-language verbs
  • we don’t have a separate “anticipation” step, and only predict target-side words
  • first simultaneous translation system with arbitrary controllable latency
  • some previous works use reinforcement learning with latency as part of the reward, but can’t impose a hard constraint on latency at test time
  • very easy to train and scalable: minor changes to any neural MT codebase

SLIDE 48

非常 感谢 您 来 听 我 的 演讲

Thank you very much for listening to my speech

SLIDE 49

Side Project: Translation with Noisy Input from ASR

  • neural MT is fragile, and automatic speech recognition output is noisy
  • Hairong Liu’s work (on arXiv): Robust Neural MT using phonetic information

Example: ASR may confuse the near-homophones 有 (yǒu, “have”) and 又 (yòu, “again”).