

SLIDE 1

http://www.naist.jp/

Infinite possibilities, the cutting edge is here -Outgrow your limits-

Toward Automatic Speech Interpretation

Nara Institute of Science and Technology Data Science Center, and Graduate School of Science and Technology

Satoshi Nakamura

with Katsuhito Sudo, Graham Neubig, Sakriani Sakti, Hiroki Tanaka, Katsuki Chosa, Do Quoc Truong

2019/06/15 CLI9 Keynote Satoshi Nakamura, NAIST


SLIDE 2

Speech-to-Speech Translation System

(Diagram) Multilingual speech recognition → spoken language translation → multilingual speech synthesis. Example: Japanese 「私は学校に行く」 (Watashi wa gakko ni iku) → English "I go to school".

SLIDE 3

Speech Translation and Text Translation

Speech Translation
- Translation of spoken language
- Must handle speech recognition errors
- Translates source-language speech into target-language speech (or text)
- Needs short latency for real-time human communication

Translation of Spoken Language
- The object is real-time communication and understanding
- Para-linguistic and non-linguistic information is necessary
- Context-dependent and non-syntactic utterances
- No punctuation, no upper/lower case


SLIDE 4

Technical Background around 2000

Corpus-based Approach
- Statistical modeling with large-scale training data

Machine Translation
- Rule-based: linguists created translation rules
- Corpus-based:
  • Example-based: automatic extraction of translation rules [M. Nagao 1984, etc.]
  • Statistical MT: extract rules statistically based on the noisy channel model [P. F. Brown et al., 1993]


SLIDE 5

Contents

1. History of Automatic Speech Translation Research
2. Automatic Speech Interpretation Technologies
3. Current Project and Data Collection
4. Summary and Future Works


SLIDE 6

Speech Translation Projects

Japan
- ATR Speech-to-Speech Translation (1986-2008)
- NICT Speech-to-Speech Translation (2008-2011, 2014-2020)

EU
- Verbmobil (1993-2000)
- Nespole (2001-2003)
- TC-STAR (2004-2006)
- EU-Bridge (2012-2014)

US
- DARPA TransTac, Communicator (2006-2010)
- DARPA GALE (2006-2010)
- DARPA BOLT (2011-2015)

International
- C-STAR Consortium (1991-2003)
- IWSLT (2004-)
- A-STAR Consortium (2006-2008)
- U-STAR Consortium (2009-)


SLIDE 7

History of Speech Translation Research in Japan

(Timeline diagram, 1986-2011)
- From 1986 (ATR): fundamentals with read speech: syntactically correct, clear utterances, limited domain (e.g., "conference registration"); hand-made, rule-based technology
- From 1992 (ATR): daily conversation: standard expressions, unclear utterances, limited domain (e.g., "hotel reservation")
- From 1999 (ATR): wider and real domains ("international travel"): realistic expressions, noisy speech, J-E and J-C speech translation; corpus-based technology (large-scale corpora + machine learning)
- C-STAR: multilateral translation for 7 world languages; IWSLT: evaluation campaign of S2S technologies
- From 2008 (NICT, A-STAR): more languages for translation: multilateral translation for 8 Asian languages, network-based S2ST; 2010: 21-language multilateral text translation
- 2011: VoiceTra; NAIST

SLIDE 8

Mechanism of Speech Translation System

(Diagram) Example: Japanese 「私は学校へ行く」 (Watashi wa gakko e iku) → English "I go to school".
- Multilingual speech recognition: convert the phoneme sequence "w a t a sh i w a g a k k o ..." into a word sequence using a lexicon and grammar, trained on large-scale Japanese speech corpora.
- Spoken language translation: convert the Japanese word sequence into English words with a dictionary (「私は」 ⇒ "I", 「学校に」 ⇒ "to school", 「行く」 ⇒ "go"), then reorder "I to school go" into "I go to school" according to English grammar, using large-scale Japanese-English parallel corpora and text corpora.
- Multilingual speech synthesis: select appropriate waveforms for the English text from large-scale English speech corpora.

SLIDE 9

Phrase Based Machine Translation

Divide the sentence into small phrases and translate


Today | I will give | a lecture on | machine translation | .
今日は、 | を行います | の講義 | 機械翻訳 | 。

今日は、機械翻訳の講義を行います。
(kyo wa kikaihonyaku no kogi wo okonaimasu)

 Score translations with a translation model (TM), a reordering model (RM), and a language model (LM)
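As a toy illustration of this model-combination scoring, here is a minimal sketch in Python. All phrase pairs, probabilities, and weights below are invented for the example; a real decoder estimates them from corpora, adds the reordering model, and searches over segmentations rather than scoring one fixed hypothesis.

```python
import math

# Hypothetical toy model tables (real TM/LM probabilities are estimated
# from large parallel and monolingual corpora).
TM = {("today", "今日は、"): 0.7, ("i will give", "を行います"): 0.5,
      ("a lecture on", "の講義"): 0.6, ("machine translation", "機械翻訳"): 0.8,
      (".", "。"): 0.9}
LM = {("<s>", "今日は、"): 0.6, ("今日は、", "機械翻訳"): 0.4,
      ("機械翻訳", "の講義"): 0.5, ("の講義", "を行います"): 0.5,
      ("を行います", "。"): 0.7}

def score(phrase_pairs, w_tm=1.0, w_lm=1.0):
    """Weighted sum of log TM and phrase-bigram LM probabilities
    (the reordering model is omitted in this sketch)."""
    total, prev = 0.0, "<s>"
    for src, tgt in phrase_pairs:
        total += w_tm * math.log(TM.get((src, tgt), 1e-6))  # adequacy
        total += w_lm * math.log(LM.get((prev, tgt), 1e-6))  # fluency
        prev = tgt
    return total

good = [("today", "今日は、"), ("machine translation", "機械翻訳"),
        ("a lecture on", "の講義"), ("i will give", "を行います"), (".", "。")]
bad = [("today", "機械翻訳")]
print(score(good) > score(bad))  # the well-formed hypothesis scores higher
```

The decoder's job is then to find the segmentation and ordering that maximize this score.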

9

SLIDE 10

Translation Model Creation

 Perform automatic alignment of parallel text
 Extract phrases from the aligned text for translation


the hotel front desk ↔ ホテル (hoteru) の (no) 受付 (uketsuke)

Extracted phrases:
- ホテル の (hoteru no) → hotel / the hotel
- 受付 (uketsuke) → front desk
- ホテルの受付 → hotel front desk / the hotel front desk
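The consistency criterion behind this extraction can be sketched as follows. The example sentence and alignment follow the slide; note that this minimal version does not attach unaligned words such as "the", so it recovers "hotel front desk" but not the "the hotel ..." variants.

```python
def extract_phrases(n_src, n_tgt, align, max_len=4):
    """Enumerate phrase pairs consistent with the word alignment:
    no alignment link may cross the phrase-pair box."""
    pairs = set()
    for i1 in range(n_src):
        for i2 in range(i1, min(i1 + max_len, n_src)):
            # Target positions linked to the source span [i1, i2].
            ts = [t for s, t in align if i1 <= s <= i2]
            if not ts:
                continue
            j1, j2 = min(ts), max(ts)
            # Consistency: no link from outside the source span into [j1, j2].
            if any(j1 <= t <= j2 and not (i1 <= s <= i2) for s, t in align):
                continue
            pairs.add(((i1, i2), (j1, j2)))
    return pairs

# ホテル(0) の(1) 受付(2)  ↔  the(0) hotel(1) front(2) desk(3)
align = {(0, 1), (2, 2), (2, 3)}
phrases = extract_phrases(3, 4, align)
```

Here `phrases` contains span pairs such as ((2, 2), (2, 3)), i.e. 受付 → "front desk", and ((0, 2), (1, 3)), i.e. ホテルの受付 → "hotel front desk".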

SLIDE 11

Statistical MT


  • Components: translation model, reordering model, language model

(Diagram) Parameter estimation from a source-target parallel text corpus yields the translation model (phrase substitution) and the reordering model; parameter estimation from a target-language text corpus yields the language model (grammatical correctness). Decoding combines all three to turn input text in the source language into translated text in the target language.

SLIDE 12

Parallel Corpus

Japanese: “mado wo aketemo iidesuka”


English:
1. may i open the window
2. ok if i open the window
3. can i open the window
4. could we crack the window
5. is it okay if i open the window
6. would you mind if i opened the window
7. is it okay to open the window
8. do you mind if i open the window
9. would it be all right to open the window
10. i'd like to open the window

(Diagram) Parallel across Japanese, English, Chinese, Korean, and new languages.

SLIDE 13

ATR BTEC Corpus
(Spoken Language Communication Research Laboratories)

Domain distribution (share of corpus, number of sub-topics in parentheses):
- Basic: 12.2% (7): greet someone, ask a question, state one's purpose
- Trouble: 12.1% (20): luggage, emergency, medicine, assistance
- Shopping: 10.0% (13): buy something, gather information, price, wrapping
- Move: 8.4% (8): transportation, buy a ticket, rental car, trouble
- Stay: 8.2% (11): make/change a reservation, check-in, trouble
- Sightseeing: 7.7% (11)
- Restaurant: 7.3% (11)
- Communication: 6.4% (6)
- Airport: 5.5% (14)
- Business: 5.3% (26)
- Contact: 4.0% (6)
- Airplane: 3.6% (11)
- Homestay: 2.3% (11)
- Study Overseas: 1.6% (14)
- Drink: 1.3% (4)
- Exchange: 1.2% (5)
- Snack: 1.2% (4)
- Beauty: 0.8% (5)
- Go Home: 0.6% (4)
- Research: 0.1% (12)

SLIDE 14

Mechanism of Speech Translation System

(Slide 8 repeated.)

SLIDE 15

Speech and Language Corpus for ASR


Language | Acoustic model | Language model
Japanese | 4,200 speakers (271 hrs) | 852k sentences
English | 532 speakers (202 hrs): US, BRT, AUS | 710k sentences
Chinese | 536 speakers (249 hrs): Beijing, Shanghai, Canton, Taiwan | 510k sentences


SLIDE 16

Speech to Speech Translation

- "VoiceTra", network-based speech translation, released in July 2010
- 21 language pairs for text I/O; 6 language pairs for speech I/O
- 800k downloads and 4M accesses worldwide as of March 2011

Languages: Japanese, English, Mandarin, Taiwanese Mandarin, German, French, Dutch, Danish, Italian, Spanish, Portuguese, Brazilian Portuguese, Russian, Arabic, Hindi, Indonesian, Malay, Thai, Tagalog, Vietnamese, Korean

※ Languages shown in red on the slide can be input/output by voice.
※ There is no text input support for Hindi or Vietnamese.

VoiceTra (screenshots: top screen, voice input screen, translation result screen)

"Shabette Hon'yaku" 「しゃべって翻訳」 (NTT Docomo, Japanese-English), launched in November 2007: the first network-based S2ST service.

SLIDE 17

Performance Improvements

(Chart) Subjective evaluation (% of utterances rated A/B/C; A good, B fair, C acceptable, D nonsense, NIL no output) and word error rate for J-E and J-C, plotted against the number of utterances used for adaptation. Quality improves from the initial nationwide models, to models with added named entities and expressions, to models updated with real user data.

SLIDE 18

Basic Travel Expression Corpus: Parallel Sentences

(Diagram) BTEC parallel sentences across Japanese, English, Chinese, Korean, and new languages.

SLIDE 19

Standardization Image

(Diagram) Server A (e.g., Japan) and Server B (e.g., Thailand) exchange data (ASR results, MT results, etc.) via HTTP in an XML format. Each server hosts its own processing modules, user interface, parallel corpus, speech data, and lexicon; the corpus format, lexicon, and user interface are the targets of standardization.

SLIDE 20

Standardization at ITU-T SG16

 Activity started on standardization of network-based S2ST at ITU-T SG16
 Session period: October 2009 to March 2010
 NICT is the editor for S2ST standardization at ITU-T SG16, WP2 Q21/22
 Not only language conversion but also potentially added modules such as sign language are taken into account: S2ST -> modality conversion

Document | Title | Scope
F.745 | Functional Requirements for Network-based S2ST | definition of network-based S2ST; functions and service requirements of network-based S2ST
H.625 | Architectural Requirements for Network-based S2ST | requirements of the S2ST architecture; definition of interfaces for network-based S2ST


SLIDE 21

Research Topics at NAIST

(Diagram) Research topics: speech translation and machine translation, multilingual speech recognition, emotion and environment recognition, spoken dialog systems (example exchange: "Which lab do you recommend?" / "Nakamura-lab is best!"), multimodal concept learning, knowledge acquisition and QA systems, brain measurement, persona modeling, affective computing, natural language processing, deep neural networks, and big data analytics (NAIST Data Science Center). These fundamental technologies are integrated into augmented human-communication systems.

SLIDE 22

Recent Progress of ASR after 2000

Traditional Technologies
- Template matching, dynamic programming [Sakoe 71]
- Hidden Markov models, n-gram models [Mercer 83, etc.]
- Neural networks: TDNN [Waibel 89], LSTM [Hochreiter 97]
- Weighted finite-state transducers [Mohri 2006]
- Big training data; data collection through trial services

Deep Learning (Hinton visited MSR)
- DNN-HMM [Hinton 2012]: estimate state posterior probabilities with a DNN
- Connectionist Temporal Classification [Graves 2013]: predict a phoneme label every frame
- Listen, Attend and Spell [Chan 2016]: CTC + attention, end-to-end modeling


SLIDE 23

Recent Speech Synthesis

Traditional Technologies
- Formant-based synthesis, waveform concatenation
- Statistical speech synthesis (HTS): speech synthesis by HMM
  - Tokuda et al., "Speech parameter generation algorithms for HMM-based speech synthesis", ICASSP 2000

Deep Learning
- WaveNet: waveform convolution
  - van den Oord et al., "WaveNet: A Generative Model for Raw Audio", arXiv:1609.03499, 2016
- Tacotron: end-to-end speech synthesis from character input; waveform generation by Griffin-Lim
  - Wang et al., "Tacotron: Towards End-to-End Speech Synthesis", arXiv:1703.10135, 2017
- Tacotron 2: Tacotron + WaveNet


SLIDE 24

Recent MT progress

Traditional Technologies
- Rule-based MT: linguists generate translation rules
- Corpus-based MT:
  - Example-based: automatic rule extraction from a corpus [M. Nagao 84, Sato et al. 89, Sumita et al. 91]
  - Statistical MT: statistical modeling of translation; model parameters extracted from a corpus, translation based on the noisy channel model [P. F. Brown et al. 93]
  - Phrase-based SMT
  - Tree-to-string: statistical MT based on tree structure

Deep Learning
- Neural machine translation [2014]: encoder and decoder combined via LSTM
- Attentional NMT [2015]: attention added between encoder and decoder
- Self-attention NMT [2017]: self-attention with multiple heads; the Transformer


SLIDE 25

Contents

1. History of Automatic Speech Translation Research
2. Automatic Speech Interpretation Technologies
3. Speech Translation with Para-linguistic Information
4. Current Project and Data Collection
5. Summary and Future Works


SLIDE 26

Communication with Translation

(Diagram) Source-language input (text, speech, video, gesture) passes through ASR (speech ⇒ text) or image recognition (image ⇒ text), then MT conversion with dialog control, then TTS (text ⇒ speech) or image synthesis, producing target-language output (text, speech, video, gesture). The process should be real-time, incremental, and end-to-end. Besides linguistic information, paralinguistic information (emotion, style, personality, prosody, gesture) and discourse context (domain knowledge, ontology) must be carried across. Example: speech "to o kyo e i ku" → MT result /I/go/to/Tokyo/ → TTS result "ai go tu tokyo". Two requirements for communication: (1) simultaneity, incrementality, low latency; (2) para/non-linguistic information.

SLIDE 27

Human Interpreting [A.Mizuno 2016]


E-J interpretation example:
(1) The relief workers (2) say (3) they don't have (4) enough food, water, shelter, and medical supplies (5) to deal with (6) the gigantic wave of refugees (7) who are ransacking the countryside (8) in search of the basics (9) to stay alive.

Written-translation style (chunk order 1-9-8-7-6-5-4-3-2; the interpreter must hold more than 3 chunks in memory):
(1) 救援担当者は (9) 生きるための (8) 食料を求めて (7) 村を荒らし回っている (6) 大量の難民達の (5) 世話をするための (4) 十分な食料や水,宿泊施設,医療品が (3) 無いと (2) 言っています.

Simultaneous style (fewer than 3 chunks held in memory at a time):
(1) 救援担当者達の (2) 話では (4) 食料,水,宿泊施設,医薬品が, (3) 足りず (6) 大量の難民達の (5) 世話が出来ないとのことです. (7) 難民達は今村々を荒らし回って, (9) 生きるための (8) 食料を求めているのです.

SLIDE 28

Problem: Delay (Ear-Voice Span)


(Diagram) ASR: 「こんにちは、駅はどこですか?」 (konnichiwa, eki wa doko desu ka) → MT: "Hello, where is the station?" → TTS. Translation waits for the end of the utterance, causing delay (ear-voice span).

SLIDE 29

Simultaneous Incremental Speech Interpretation

(Diagram) The input is processed chunk by chunk:
- ASR 「こんにちは、」 (konnichiwa) → MT "Hello," → TTS
- ASR 「駅は」 (eki wa) → MT "the station" → TTS
- ASR 「どこですか?」 (doko desu ka) → MT "where is it?" → TTS

Delay: reduced. But this is not easy!

SLIDE 30

Can We Do the Same in Automatic Speech Interpretation?

Four problems:
 Segmentation: when do we start interpretation?
 Prediction: can we predict things that haven't been said?
 Rewording: can we reword sentences to be conducive to simultaneous interpretation?
 Evaluation: how do we decide which results are better?

SLIDE 31

Re-ordering

Crucial for translation accuracy:

Normal phrase-based translation: こんにちは 駅 は どこ ですか → "Hello, where is the station"
Translation with early timing: こんにちは 駅 は どこ ですか → "Hello, the station where is it"

SLIDE 32

Lexicalized Reordering Model

 Probabilistically models reordering for increased translation accuracy
 Given the current phrase and the next phrase, the orientation is classified:
  - Monotone: 背 の 高い 男 → the tall man
  - Swap: 太郎 を 訪問 した → visited Taro
  - Discontinuous right / discontinuous left: 私 は 太郎 を 訪問した → I visited Taro; 背 の 高い 男 を 訪問 した → visited the tall man
 "monotone" + "discontinuous right" = "right probability"
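A minimal sketch of how such a "right probability" could be computed from orientation counts. The counts below are invented for illustration; real models collect them per phrase pair from word-aligned training data.

```python
from collections import Counter

def orientation(prev_span, cur_span):
    """Orientation of the current phrase's target span relative to the
    previous phrase's target span (spans are inclusive index pairs)."""
    (p1, p2), (c1, c2) = prev_span, cur_span
    if c1 == p2 + 1:
        return "monotone"        # continues immediately to the right
    if c2 == p1 - 1:
        return "swap"            # placed immediately to the left
    return "disc-right" if c1 > p2 else "disc-left"

def right_probability(counts):
    """'monotone' + 'discontinuous right' mass: how likely the next
    phrase continues rightward."""
    return (counts["monotone"] + counts["disc-right"]) / sum(counts.values())

# Hypothetical orientation counts observed for one phrase:
counts = Counter({"monotone": 70, "swap": 10, "disc-right": 20, "disc-left": 0})
print(right_probability(counts))  # 0.9
```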

SLIDE 33

Adjusting Timing with Reordering Probabilities, 2012

First, temporarily choose strings according to method one. Next, if that phrase's right probability exceeds a threshold, actually translate the words in the cache.


Example (threshold = 0.8), input "hello where is the station":
1. "hello" phrase exists → wait
2. "hello where" phrase missing → choose "hello" → right probability 0.9 > 0.8 → translate "hello"
3. "where is" phrase exists → wait
4. "where is the" phrase missing → choose "where is" → right probability 0.6 < 0.8 → do not translate yet
5. "the station": utterance ends → translate "where is the station"

Threshold 1.0 = traditional; 0.0 = method one.

Fujita et al., 2013
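The trace above can be turned into a small sketch. The phrase table and right probabilities below are hypothetical, chosen to reproduce the slide's example; Fujita et al.'s system works over a full SMT phrase table rather than a toy dictionary.

```python
# Hypothetical phrase table: phrase -> (translation, right probability)
PT = {"hello": ("こんにちは、", 0.9),
      "where is": ("どこですか", 0.6),
      "where is the station": ("駅はどこですか", 0.9)}

def incremental_translate(tokens, threshold=0.8):
    out, cache = [], []
    for tok in tokens:
        if " ".join(cache + [tok]) in PT:
            cache.append(tok)            # a longer phrase exists: keep waiting
            continue
        phrase = " ".join(cache)
        if phrase in PT and PT[phrase][1] > threshold:
            out.append(PT[phrase][0])    # confident the next phrase goes right
            cache = [tok]
        else:
            cache.append(tok)            # low right probability: wait for more
    if cache and " ".join(cache) in PT:
        out.append(PT[" ".join(cache)][0])  # utterance ended: flush the cache
    return out

print(incremental_translate("hello where is the station".split()))
```

With threshold 0.8 this emits 「こんにちは、」 as soon as "where" arrives, then holds "where is" until the utterance ends, matching the trace above.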

SLIDE 34

Comparison Across Settings

(Chart) Accuracy (BLEU) vs. delay (seconds) for en-ja, ja-en, ja-en (11+ words), and fr-en, on news and travel data, sweeping the threshold t from 0.0 to 1.0. Delay decreases in all settings; the delay/accuracy tradeoff is better for long sentences and for similar language pairs.

SLIDE 35

Experiments (IWSLT2013)

Contents: TED Talks (English ⇒ Japanese)
- Translation (captions) vs. interpretation
- Human interpreters: three professionals with different skill levels

Skill rank | Years of interpreter experience
S | 15 years
A | 4 years
B | 1 year

SLIDE 36

SS2S vs. Human Interpreter Results on TED Talks

(Chart) RIBES vs. delay (seconds) on TED Talks, comparing the system (LM+Tu, with by-phrase and by-sentence output) against human interpreters of A rank (4 years' experience) and B rank (1 year). The system performs roughly on par with a B-rank interpreter with 1 year of experience.

SLIDE 37

Translation Timing Control by Syntactic Prediction, 2015

Syntactic Prediction
- Incremental bottom-up parsing
- Feature extraction and syntactic prediction

Wait to emit MT output when specific labels appear
- Control MT output timing according to the predicted reordering

Oda, Yusuke et al., "Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents", Proc. of ACL-IJCNLP 2015.

Example of incremental parsing and syntactic prediction: "in the next 18 minutes i'm going to take [NP]" → (waiting) → "i'm going to take you on a journey". MT results: 「18分である[NP]を行っています」 → 「皆さんを旅にお連れします」

SLIDE 38

Sample 1, 2015

Conventional automatic speech interpretation, with delay to wait for the end of speech (HirofumiSeo-trad.mp4)

SLIDE 39

Sample 2, 2015

Actual interpreter (HirofumiSeo-interpreter.mp4)

SLIDE 40

Sample 3, 2015

Proposed automatic speech interpretation (HirofumiSeo-simul.mp4)

SLIDE 41

Statistical Translation Frameworks

Symbolic models:
- Phrase-based MT [Koehn+ 03]: "he has a cold" ↔ 「彼 は 風邪 を 引いている」, segmented into phrase pairs ("he" ↔ 彼 は, "has" ↔ 引いている, "a cold" ↔ 風邪 を)
- Tree-to-string MT [Liu+ 06]: translation guided by the source-side parse tree (S → NP VP; PRP VBZ DET NN)

Continuous-space (neural) models:
- Encoder-decoder [Sutskever+ 14]: read "he has a cold", then generate 「彼 は 風邪 を 引いて いる」 token by token
- Attentional model [Bahdanau+ 15]: each output token attends over the encoder states g1,...,g4 with weights a1,...,a4 to compute P(ei|F, e1,...,ei-1)

SLIDE 42

Encoder-decoder Model

- Memorize the input sentence with an LSTM recurrent neural network (encoder)
- Generate the output sentence with an LSTM recurrent neural network (decoder)

(Diagram) これ (kore) は (wa) 機械 (kikai) 翻訳 (honnyaku) です (desu) → vector representation → "This is a machine translation". The encoder memorizes the input sentence; the decoder generates the translated sentence while looking back at that memory.
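The memorize/generate structure can be sketched with plain NumPy. The weights here are random, so the "translation" is meaningless; a trained model learns them from parallel text. The point is only the two-phase shape: fold the source into one vector, then generate from it. (A real model would use LSTM cells rather than this simple tanh recurrence.)

```python
import numpy as np

rng = np.random.default_rng(0)
V_SRC, V_TGT, H = 6, 6, 8            # toy vocabulary sizes, hidden size

# Randomly initialized parameters (training would fit these to parallel text).
E_src = rng.normal(size=(V_SRC, H))  # source embeddings
E_tgt = rng.normal(size=(V_TGT, H))  # target embeddings
W_enc = 0.1 * rng.normal(size=(H, H))
W_dec = 0.1 * rng.normal(size=(H, H))
W_out = rng.normal(size=(H, V_TGT))  # hidden state -> target vocab logits

def encode(src_ids):
    """Fold the whole source sentence into a single hidden vector."""
    h = np.zeros(H)
    for i in src_ids:
        h = np.tanh(E_src[i] + W_enc @ h)
    return h

def decode(h, max_len=5):
    """Generate target ids one by one, starting from the encoder's state."""
    out, y = [], 0                    # token 0 plays the role of <s>
    for _ in range(max_len):
        h = np.tanh(E_tgt[y] + W_dec @ h)
        y = int(np.argmax(h @ W_out))
        out.append(y)
    return out

hyp = decode(encode([1, 2, 3, 4, 5]))
```

Squeezing an arbitrarily long sentence into one fixed vector is the bottleneck that the attention mechanism on the next slide addresses.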

SLIDE 43

Attention Mechanism

Better memorization of the sentence, plus a looking-back mechanism
- The decoder uses a weighted sum of encoder states, with weights given by the attention

(Diagram) これ (kore) は (wa) 機械 (kikai) 翻訳 (honnyaku) です (desu) → vector representations → "This is a machine translation", with attention linking each output word back to input positions.
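The weighted sum itself is only a few lines. Dot-product scoring is shown here as one simple choice; Bahdanau-style attention uses a small feed-forward network for the scores instead. The encoder states and decoder query below are toy values for illustration.

```python
import numpy as np

def attend(query, enc_states):
    """Score each encoder state against the decoder state, softmax the
    scores into weights, and return the weighted sum (the context vector)."""
    scores = enc_states @ query              # one score per source position
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights, weights @ enc_states

# One toy encoder state per source word (e.g., これ/は/機械翻訳/です):
enc_states = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
query = np.array([2.0, 0.0])                 # current decoder state
weights, context = attend(query, enc_states)
```

Source positions whose states align with the decoder state receive higher weights, so the decoder "looks back" at the relevant input words at each output step.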

SLIDE 44

Results

(Neubig et al., WAT 2015)

(Charts) BLEU and RIBES for en-ja, ja-en, zh-ja, and ja-zh, base system vs. neural reranking, with gains between +1.4 and +2.8 points: this confirms what we knew, that neural reranking helps automatic evaluation. Human evaluation scores (+12.5, +23.7, +10.0, +4.2) show what we didn't know: it also helps manual evaluation.

SLIDE 45

Wait-k Algorithm

(Diagram) Source: ブッシュ (Bush) 大統領 (daitoryo) は (wa) プーチン (puchin) と (to) 会談 (kaidan) する (suru); target: "President Bush meets with Putin".
- Conventional method: wait until the source sentence ends, then translate (long delay).
- Proposed method: wait k tokens, then emit one target token per source token read; latency is controllable, and the model must predict content (e.g., the verb "meets") before it is heard.

Mingbo Ma et al., "STACL: Simultaneous Translation with Integrated Anticipation and Controllable Latency", arXiv:1810.08398, 2018.
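A schedule-level sketch of wait-k decoding. The `oracle` stand-in below simply replays a fixed reference so the read/write schedule is visible; in STACL a trained NMT model produces each target token from the source prefix, anticipating content such as the verb before the source verb arrives.

```python
def wait_k(src_tokens, translate_step, k=2):
    """Read k source tokens, then alternate read-one / write-one;
    once the source ends, let the model finish the sentence."""
    out = []
    for t in range(len(src_tokens)):
        if t >= k - 1:                       # k tokens read: write one token
            tok = translate_step(src_tokens[:t + 1], out)
            if tok is not None:
                out.append(tok)
    while (tok := translate_step(src_tokens, out)) is not None:
        out.append(tok)                      # tail of the translation
    return out

# Hypothetical stand-in for the NMT model: replays a fixed reference.
REF = ["President", "Bush", "meets", "with", "Putin"]
def oracle(src_prefix, out_so_far):
    return REF[len(out_so_far)] if len(out_so_far) < len(REF) else None

src = "ブッシュ 大統領 は プーチン と 会談 する".split()
hyp = wait_k(src, oracle, k=2)
```

With k=2, "President" is emitted after only two source tokens have been read; larger k trades latency for more source context.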

SLIDE 46

Contents

1. History of Automatic Speech Translation Research
2. Automatic Speech Interpretation Technologies
3. Current Project and Data Collection
4. Summary and Future Works


SLIDE 47

JSPS Next Generation Speech Interpretation Research Project

Objectives
- Incremental automatic speech interpretation algorithm
- Corpus collection
- Evaluation measures

Duration: 2017-2021 (5 years)

Members:
- Leader: Satoshi Nakamura (NAIST)
- Acoustic signal processing: Hiroshi Saruwatari (U. Tokyo)
- Speech recognition: Sakriani Sakti (NAIST), Tatsuya Kawahara (Kyoto U.)
- Machine translation: Katsuhito Sudo, Yuji Matsumoto (NAIST)
- Speech synthesis: Tomoki Toda (Nagoya U.), Shinnosuke Takamichi (U. Tokyo), Sakriani Sakti (NAIST)
- Audio-visual translation: Shigeo Morishima (Waseda U.)
- Cognitive load measurement: Hiroki Tanaka (NAIST)
- Corpus collection: Katsuhito Sudo, Manami Matsuda (NAIST)


SLIDE 48

Project Overview

(Diagram)
- Task 1: incremental speech interpretation algorithm: noise reduction (noise, reverberation), incremental ASR, incremental MT, incremental TTS, caption generation
- Task 2: paralinguistic speech translation: extraction of paralinguistics, paralinguistic MT, paralinguistic TTS
- Task 3: video MT: face modeling, speaking-face MT, speaking-face conversion
- Task 4: real-time cognitive load measurement by human sensing: 2x 32-ch EEG, gaze, heart rate
- Task 5: corpus collection and prototyping: collect 400 hours of Japanese and English speech interpretation data; build a prototype of the incremental speech interpretation system

SLIDE 49

NAIST Interpreter Corpus

2012-2016
- Source speech: MP4 (TED), MP3 (CNN), PCM
- Interpreter speech: 24-bit 48 kHz PCM
- Skill: S (10+ years), A (3+ years), B
- Some data includes speech from multiple interpreters

Direction | Domain | Source #files | Source #hours | Interpreter #files | Interpreter #hours
E->J | TED | 74 | 15.2 | 58 | 12.3
E->J | CNN | 13 | 0.731 | 7 | 0.389
E->J | total | 87 | 15.9 | 65 | 12.7
J->E | TED | 60 | 11.9 | 60 | 11.9
J->E | CSJ | 31 | 5.51 | 31 | 5.51
J->E | NHK | 10 | 0.304 | 10 | 0.304
J->E | total | 101 | 17.7 | 101 | 17.7

SLIDE 50

NAIST Interpreter Corpus 2018

As of 2018
- Source speech: MP4 (TED, TEDx), PCM (CSJ)
- Interpreter speech: 16-bit 16 kHz PCM
- Skill: S (10+ years), A (3+ years), B
- Training set: 100 hours in total, by rank-A interpreters
- Test set: 24 hours in total, by one interpreter from each rank

Direction | Domain | Source #files | Source #hours | Interpreter #files | Interpreter #hours
E->J | TED | 302 | 66.8 | 302 | 66.8
E->J | TED (test) | 16 | 4 | 16 | 4
E->J | total | 318 | 70.8 | 318 | 70.8
J->E | CSJ | 146 | 33 | 146 | 33
J->E | TEDx (test) | 19 | 4 | 19 | 4
J->E | total | 165 | 37 | 165 | 37

SLIDE 51

Book (Japanese version)


SLIDE 52

Contents

1. History of Automatic Speech Translation Research
2. Automatic Speech Interpretation Technologies
3. Current Project and Data Collection
4. Summary and Future Works


SLIDE 53

Summary

Remarkable progress
- Statistical machine translation
- Deep neural networks
- Progress in speech translation

Automatic speech interpretation
- Data collection
- Algorithms for both automatic speech interpretation and interpreter support systems

Further research
- Para-linguistics / multi-modality
- Context / situation dependency
- Common sense and domain knowledge
- Semantics and discourse analysis
- Towards better communication


SLIDE 54

SLIDE 55

Communication with Translation

(Slide 26 repeated.)

SLIDE 56

Research Focus Up to Now

Emphasis Speech Translation

- Translates speech while preserving emphasis information

(Diagram) English "It is hot today" → ASR → MT → TTS → Japanese 「今日は熱いです」, with ES (source emphasis information) and ET (target emphasis information) alongside the text pipeline.

(1) Emphasis estimation (ES): estimate emphasis information given the speech and its corresponding word sequence.
(2) Emphasis translation (ET): translate the estimated emphasis information into the other language.

SLIDE 57

Speech Translation Samples

English-Japanese emphasis translation (audio samples through the ASR → MT → TTS pipeline): for each of English and Japanese, natural speech, a baseline, ET (CRF), ET (CRF) + pause, and ET (LSTM) variants.