Simultaneous Translation:
Recent Advances and Remaining Challenges
Liang Huang
Baidu Research (USA) and Oregon State University
Consecutive vs. Simultaneous Interpretation
consecutive interpretation: multiplicative latency (×2)
simultaneous interpretation: additive latency (+3 secs)
Simultaneous interpretation is extremely difficult.
Professional simultaneous interpreters are scarce worldwide (AIIC): each interpreter can sustain for at most 15-20 minutes, and even the best interpreters cover only ~60% of the source material.
from United Nations Proceedings Speech Corpus (LDC2014S08, Chay et al, 2014)
[Figure: the quality-latency spectrum, from low latency / low quality to high latency / high quality: word-by-word translation → simultaneous interpretation (~3 seconds) → consecutive interpretation (1 sentence) → full-sentence machine translation / written translation]
Full-sentence seq-to-seq MT is already very good, but simultaneous translation needs fundamentally new ideas!

Previous work in simultaneous translation
Pipeline: source speech stream → streaming speech recognition → source text stream → simultaneous text-to-text translation → target text stream → incremental text-to-speech → target speech stream (e.g., "… President Bush …" flowing through each stage).
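The cascaded pipeline above can be sketched as chained generators. This is a toy illustration, not Baidu's system: the three stage functions are placeholder names of ours, and the MT stage uses a simple wait-k buffer instead of a real model.

```python
def streaming_asr(audio_chunks):
    """Toy stand-in: emit one recognized source word per audio chunk."""
    for chunk in audio_chunks:
        yield chunk  # a real streaming ASR would decode audio here

def simultaneous_mt(source_words, k=2):
    """Toy wait-k text-to-text stage: stay k words behind the source."""
    src, emitted = [], 0
    for word in source_words:
        src.append(word)
        if len(src) >= emitted + k:       # enough context to commit a word
            yield f"T({src[emitted]})"    # placeholder "translation"
            emitted += 1
    while emitted < len(src):             # source finished: flush the tail
        yield f"T({src[emitted]})"
        emitted += 1

def incremental_tts(target_words):
    """Toy stand-in: synthesize each target word as soon as it arrives."""
    for word in target_words:
        yield f"audio[{word}]"

stream = incremental_tts(simultaneous_mt(streaming_asr(["President", "Bush", "meets"]), k=2))
print(list(stream))
```

Because every stage is a generator, each target word is produced as soon as enough source has arrived, rather than after the full sentence.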
Baidu World Conference, Nov. 2017: full-sentence translation (latency: 10+ secs)
Baidu World Conference, Nov. 2018: low-latency simultaneous translation (latency: ~3 secs)

The request — Ken Church: "I really need low-latency simultaneous translation!"
Team: Zhongjun He, Hao Xiong, Haifeng Wang, Mingbo Ma, Kaibo Liu, Renjie Zheng
Grissom et al. (2014): President Bush meets with Russian President Putin in Moscow
non-anticipative: President Bush (…… waiting ……) meets with Russian …
anticipative: President Bush meets with Russian President Putin in Moscow
Waiting is annoying to the users and can't be used for speech-to-speech translation.
seq-to-seq (full-sentence): wait for the whole source sentence x_1 … x_n, then translate:
  p(y_i | x_1 … x_n, y_1 … y_{i-1})
prefix-to-prefix (wait-k): wait k source words, then translate concurrently, always k words behind the source:
  p(y_i | x_1 … x_{i+k-1}, y_1 … y_{i-1})
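Concretely, the only difference between the two conditionals is how much of the source is visible: wait-k exposes x_1 … x_{i+k-1}. A toy word-level sketch of the schedule (function names are ours, not from the actual implementation):

```python
def waitk_prefix_lengths(n, k):
    """Source words visible when emitting y_i: min(i + k - 1, n)
    (for illustration, assume the target also has n words)."""
    return [min(i + k - 1, n) for i in range(1, n + 1)]

def waitk_actions(n, k, m=None):
    """The same schedule as an interleaved READ/WRITE action sequence
    over n source words and m target words."""
    m = n if m is None else m
    actions, read, written = [], 0, 0
    while written < m:
        if read < min(written + k, n):
            actions.append("READ")    # consume one more source word
            read += 1
        else:
            actions.append("WRITE")   # commit one target word
            written += 1
    return actions

print(waitk_prefix_lengths(6, 2))  # the first target word sees 2 source words, etc.
print(waitk_actions(4, 2))         # read 2 words up front, then alternate, then flush
```

After the initial k-word wait, the policy alternates READ/WRITE until the source ends, then flushes the remaining target words.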
Example (wait-2):
source: 布什 (Bùshí, Bush) 总统 (zǒngtǒng, President) 在 (zài, in) 莫斯科 (Mòsīkē, Moscow) 与 (yǔ, with) 俄罗斯 (Éluósī, Russian) 总统 (zǒngtǒng, President) 普京 (Pǔjīng, Putin) 会晤 (huìwù, meet)
target: President Bush meets with Russian President Putin in Moscow
The target always stays 2 words behind the source.
General framework: p(y_t | x_1 … x_{g(t)}, y_1 … y_{t-1}), where g(t) is the number of source words used to predict y_t.
Full-sentence translation is the special case g(t) = n: p(y_t | x_1 … x_n, y_1 … y_{t-1}).
Example: source 布什 总统 在 莫斯科 与 普京 会晤 (Bush / President / in / Moscow / with / Putin / meet), target "President Bush meets with Putin in Moscow"; the third target word "meets" is predicted from only the first four source words, i.e., t=3, g(3)=4.
This general framework can be used for other tasks such as incremental parsing and incremental text-to-speech.
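Under this view, each policy is just a choice of the function g(t); a minimal sketch (function names are ours):

```python
def g_full_sentence(t, n):
    """Conventional seq-to-seq: every y_t conditions on all n source words."""
    return n

def g_wait_k(t, n, k):
    """wait-k: y_t conditions on the first t + k - 1 source words (capped at n)."""
    return min(t + k - 1, n)

assert g_full_sentence(3, 7) == 7
assert g_wait_k(3, 7, k=2) == 4   # matches the example above: t=3, g(3)=4
```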
This is just our research demo; our production system is better (shorter ASR latency).
Example: 江 泽民 对 法国 总统 的 来华 访问 表示 感谢 。
(jiāng zémín duì fǎguó zǒngtǒng de láihuá fǎngwèn biǎoshì gǎnxiè)
gloss: Jiang Zemin / to / French president / 's / to-China visit / express / gratitude
translation: jiang zemin expressed his appreciation for the visit by french president .
This is a live recording from the Baidu World Conference on Nov. 1, 2018.
German source: doch während man sich im kongress nicht auf ein vorgehen einigen kann, warten mehrere bundesstaaten nicht länger.
gloss: but / while / one / self / in congress / not / on / an / action / agree / can, / wait / several / states / not / longer
English (simultaneous, wait-3): but, while congress does not agree on a course of action, several states no longer wait.
English (full-sentence baseline): but, while congressional action can not be agreed, several states are no longer waiting.
[Figure: per-word latency, e.g., latency of "Bolivia" (+) vs. latency of "position" (−)]
布什 总统 在 莫斯科 与 普京 会晤 → Pres. Bush meets with Putin in Moscow, shown as interleaved READ/WRITE actions (RL)
RL baseline: our adaptation of Gu et al. (2017), trained with CW = 2, 5, 8.
Taxonomy of simultaneous translation approaches:

                          fixed-latency policies                  adaptive policies
full-sentence MT model    Dalvi et al. (2018); test-time          Grissom et al. (2014); Cho & Esipova (2016);
                          wait-k (Ma et al., 2018)                Satija & Pineau (2016); Gu et al. (2017);
                                                                  Alinejad et al. (2018); …
simultaneous MT model     wait-k (Ma et al., 2018)                Arivazhagan et al. (ACL 2019);
(our invention)                                                   Zheng et al. (ACL 2019)
Input: 我 (wǒ, I) 尚 (shàng, yet) 未 (wèi, not) 得到 (dédào, receive) 有关 (yǒuguān, relevant) 部门 (bùmén, department) 的 (de, 's) 回应 (huíyìng, response)

wait-1 (AL=1.4): I have not relevant documents from relevant departments received
wait-4 (AL=4.0): I have not received response from relevant departments
adaptive (AL=1.8): I have not received response from relevant departments

wait-1 is too aggressive and garbles the output; wait-4 is fluent but slow; the adaptive policy matches wait-4's quality at close to wait-1's latency.
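The AL numbers above are Average Lagging, the latency metric proposed alongside wait-k. A sketch of its definition, where g[t-1] is the number of source words read before writing target word t:

```python
def average_lagging(g, src_len, tgt_len):
    """AL = (1/tau) * sum_{t=1}^{tau} (g(t) - (t-1)/r), with r = |y|/|x| and
    tau = the first target step whose source prefix covers the whole source."""
    r = tgt_len / src_len
    tau = next(t for t in range(1, len(g) + 1) if g[t - 1] == src_len)
    return sum(g[t - 1] - (t - 1) / r for t in range(1, tau + 1)) / tau

# wait-1 on a 4-word source with a 4-word target reads one word per write:
assert average_lagging([1, 2, 3, 4], 4, 4) == 1.0
# wait-2 under the same lengths lags by two words throughout:
assert average_lagging([2, 3, 4, 4], 4, 4) == 2.0
```

Intuitively, AL measures how many source words the system lags behind an ideal fully-simultaneous translator; for wait-k with equal source/target lengths, AL = k.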
At each step the policy chooses one of two actions: READ (consume the next source word) OR WRITE (emit the next target word). Both wait-k and the RL policy are sequences of such actions.
Example prefix: 布什 (Bùshí, Bush) 总统 (zǒngtǒng, President) 在 (zài, in) 莫斯科 (Mòsīkē, Moscow) 与 (yǔ, with) … → President Bush meets …
[Figure: instead of a separate policy network, fold the policy into the NMT model by extending the target vocabulary ("the", "learn", "good", …) with a special READ token (R): predicting any ordinary word is a WRITE action, and predicting R is a READ action that requests the next source word.]
Training the adaptive policy with imitation learning.
Source: 我 (wǒ) 尚 (shàng) 未 (wèi) 得到 (dédào) 有关 (yǒuguān) 部门 (bùmén) 的 (de) 回应 (huíyìng)
Target: I have not received responses from relevant departments
For more details, come to my short talk tomorrow.
[Figure: the adaptive policy's READ/WRITE path for source 我 尚 未 得到 有关 部门 的 回应 and target "I have not received responses from relevant departments", plotted against the fixed wait-1 … wait-5 diagonals: when p(y_i | …) is high enough the model WRITEs, otherwise it READs; smaller k is more aggressive, larger k more conservative.]
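One simple adaptive policy consistent with this picture is a confidence threshold: WRITE when the model's next-word probability is high enough, otherwise READ. A toy sketch — the model interface `next_word_prob` and `toy_model` are hypothetical stand-ins of ours, not the actual trained system:

```python
def adaptive_policy(source_words, next_word_prob, rho=0.8, max_target=50):
    """WRITE the model's best next word when its probability >= rho;
    otherwise READ one more source word (forced WRITE once source ends)."""
    read, target = 0, []
    while len(target) < max_target:
        word, p = next_word_prob(source_words[:read], target)
        if p >= rho or read == len(source_words):
            if word == "</s>":
                break
            target.append(word)            # confident (or out of source): WRITE
        else:
            read += 1                      # uncertain: READ
    return target

def toy_model(src_prefix, tgt_prefix):
    """Hypothetical stand-in for an NMT model: confident only when it
    lags at least two words behind the source (wait-2-like behavior)."""
    i = len(tgt_prefix)
    if i < len(src_prefix) - 1:
        return ("T(" + src_prefix[i] + ")", 0.95)
    if i < len(src_prefix):
        return ("T(" + src_prefix[i] + ")", 0.4)
    return ("</s>", 0.4)

print(adaptive_policy(["wo", "shang", "wei"], toy_model))
```

Raising rho makes the policy more conservative (more READs, higher latency); lowering it makes the policy more aggressive.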
Back to the full pipeline: source speech stream → streaming speech recognition → source text stream → simultaneous text-to-text translation → target text stream → incremental text-to-speech → target speech stream (e.g., "… President Bush …").
ASR noise example: the near-homophones 有 (yǒu, "have") and 又 (yòu, "again") are easily confused.
Baidu AI Create, July 2019
and good quality (like human written translation)
关于 克林顿主义 ， 没有 准确 的 定义
guānyú kèlíndùnzhǔyì, méiyǒu zhǔnquè de dìngyì
gloss: about / Clintonism / no / accurate / definition → "There is no accurate definition of Clintonism." / "About Clintonism, there is no accurate definition."

习近平 于 2012 年 在 北京 当选
Xí Jìnpíng yú 2012 nián zài Běijīng dāngxuǎn
gloss: Xi Jinping / in / 2012 / year / in / Beijing / elected → "Xi Jinping was elected in Beijing in 2012."

Mandatory reordering: (Chinese) PP VP ⇒ (English) VP PP; (Chinese) PP S ⇒ (English) PP S or S PP.
The reference translation is ideal here ⇒ see also He et al. (2015).
Code (will be) available at https://nlp.baidu.com/paddlenlp, using the https://github.com/PaddlePaddle framework (it supports both static & dynamic graphs); the code for robust decoding with ASR noise is already available.
Two posters after the coffee break (10:30), Session 4A (#4 & #6); short talk tomorrow, Session 8D (17:13, CAVANIGLIA).