Simultaneous Speech Translation


Graham Neubig, Nara Institute of Science and Technology (NAIST), 5/18/2015
Joint work with: Satoshi Nakamura, Tomoki Toda, Sakriani Sakti, Tomoki Fujita, Hiroaki Shimizu, Yusuke Oda, Takashi Mieno


Background


Speech Translation Systems

  • Translate speech from the source language to the target

[Pipeline: ASR → MT → TTS. Example: こんにちは、駅はどこですか? → "Hello, where is the station?"]


Problem: Delay

  • Wait for the whole utterance to end before translating

[Pipeline: same ASR → MT → TTS, but translation of こんにちは、駅はどこですか? cannot start until the utterance finishes, causing delay]


Solution: Divide into Smaller Chunks

  • Choose an appropriate timing to start translating

[Figure: the input is divided into chunks こんにちは、 / 駅は / どこですか?, each passed through MT and TTS as it arrives, producing "Hello," / "the station" / "where is it?": delay reduced]


Four Problems

  • Segmentation: When do we start translating?
  • Prediction: Can we predict things that haven't been said yet?
  • Data: Can we learn something from actual simultaneous interpreters?
  • Evaluation: How do we decide which results are better?


1) Sentence Segmentation for Simultaneous Speech Translation


Previous Work: Incremental Dependency Parsing/Manual Rules [Ryu+ 04]

  • Utilize knowledge of English/Japanese to derive rules

[Example: for "I went to the park with your brother", a rule fires once the first prepositional phrase completes, so the two parts are translated separately: 私は公園に行きました, then あなたの弟と]

  • − Requires a bilingual linguist to design rules
  • − Requires an accurate incremental dependency parser

Previous Work: Division on Pauses [Fugen+ 08, Bangalore+ 12]

  • Simply divide on short pauses in the utterance

[Example: the ASR output "hello <pause> where is the station" is split at the pause]

  • − Cannot capture the relationship between the languages
  • − Results change greatly with speech speed and disfluencies


Previous Work: Division on Predicted Commas [Sridhar+ 13]

  • Guess where commas would appear in the text

[Example: a classifier runs after each word: "hello" → no comma, wait; "hello where" → no comma, wait; ... "hello where is the" → comma! → translate]

  • + Simple, and surprisingly effective
  • − No parameter to adjust the granularity
  • − Can't capture features of the target language
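As a rough illustration, a minimal sketch of this strategy, assuming a hypothetical trained classifier predict_comma(prefix) and an MT hook translate(chunk) (neither name is from the slides):

```python
def comma_segmenter(words, predict_comma, translate):
    """Translate the buffered words whenever a comma is predicted.

    words: iterable of words arriving incrementally from ASR.
    predict_comma(prefix) -> bool: hypothetical trained comma classifier.
    translate(chunk): hypothetical hook that sends one chunk to MT.
    """
    buffer = []
    for word in words:
        buffer.append(word)
        if predict_comma(buffer):        # classifier predicts a comma here
            translate(" ".join(buffer))  # so this chunk can be translated now
            buffer = []
    if buffer:                           # flush the rest at utterance end
        translate(" ".join(buffer))
```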


Considering Reordering Probabilities in Sentence Segmentation [Fujita et al., Interspeech 2013]


Phrase-Based Machine Translation

  • Divide the sentence into small phrases and translate each

[Example: "Today | I will give | a lecture on | machine translation | ." aligns to 今日は、| を行います | の講義 | 機械翻訳 | 。, reordered into 今日は、機械翻訳の講義を行います。]

  • Score translations with a translation model (TM), reordering model (RM), and language model (LM)
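For background, the three models are typically combined in a log-linear score that the decoder maximizes (a generic formula from standard phrase-based SMT, not one given on the slides):

```latex
\mathrm{score}(e, f) = \lambda_{TM} \log P_{TM}(f \mid e)
                     + \lambda_{RM} \log P_{RM}(o \mid e, f)
                     + \lambda_{LM} \log P_{LM}(e)
```

where e is the candidate translation, f the source sentence, and o the sequence of phrase orientations scored by the reordering model.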

Translation Model Creation

  • Perform automatic word alignment of the bitext
  • From the aligned text, extract phrases for translation

[Example: aligning "the hotel front desk" with ホテルの受付 yields the phrase pairs:]
ホテル の → hotel
ホテル の → the hotel
受付 → front desk
ホテルの受付 → hotel front desk
ホテルの受付 → the hotel front desk
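A simplified sketch of the standard consistent-phrase-pair extraction underlying this step (real extractors such as Moses' also expand spans over unaligned boundary words, which is omitted here):

```python
def extract_phrases(src_len, tgt_len, alignment, max_len=5):
    """Enumerate phrase pairs consistent with a word alignment.

    alignment: set of (i, j) pairs linking source word i to target word j.
    A span pair is consistent if no alignment link crosses its boundary.
    Returns a list of ((s1, s2), (t1, t2)) inclusive span pairs.
    """
    phrases = []
    for s1 in range(src_len):
        for s2 in range(s1, min(s1 + max_len, src_len)):
            tgt = [j for (i, j) in alignment if s1 <= i <= s2]
            if not tgt:
                continue
            t1, t2 = min(tgt), max(tgt)
            # reject if any target word inside [t1, t2] links outside [s1, s2]
            if all(s1 <= i <= s2 for (i, j) in alignment if t1 <= j <= t2):
                phrases.append(((s1, s2), (t1, t2)))
    return phrases

# Hypothetical toy alignment for ホテル の 受付 <-> the hotel front desk:
# (ホテル-hotel, 受付-front, 受付-desk)
print(extract_phrases(3, 4, {(0, 1), (2, 2), (2, 3)}))
```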


Lexicalized Reordering Model

  • Probabilistically models reordering for increased translation accuracy
  • Given the current phrase and the next phrase, the orientation is classified; "monotone" + "discontinuous right" together give the "right probability"

[Examples of orientations between adjacent phrases:
Monotone: 背 の 高い 男 / the tall man
Swap: 太郎 を 訪問 した / visited Taro
Discontinuous right / left: 私 は 太郎 を 訪問した / I visited Taro; 背 の 高い 男 を 訪問 した / visited the tall man]
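A sketch of how orientations can be read off target-side spans (simplified; the helper name is illustrative, not from the slides):

```python
def orientation(prev_span, cur_span):
    """Classify the orientation of the current phrase's target span
    relative to the previous phrase's target span (inclusive indices)."""
    (p1, p2), (c1, c2) = prev_span, cur_span
    if c1 == p2 + 1:
        return "monotone"             # continues directly to the right
    if c2 == p1 - 1:
        return "swap"                 # placed directly to the left
    return "discontinuous-right" if c1 > p2 else "discontinuous-left"

# The segmentation method below uses the "right probability":
# P(right) = P(monotone) + P(discontinuous-right),
# i.e. the probability that the next phrase is placed to the right.
print(orientation((0, 1), (2, 3)))    # monotone
```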


Method One: Choosing Translation Timing with Phrases

  • Input words one at a time from ASR
  • While words exist in phrase table, don't translate yet

Input string: hello where is the station

Phrase table:
hello → こんにちは
where → どこ
where is → どこですか
the → その
the station → 駅

Walkthrough:
"hello": phrase exists → wait
"hello where": phrase missing → translate "hello"
"where is": phrase exists → wait
"where is the": phrase missing → translate "where is"
"the station": utterance ends → translate "the station"
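A minimal sketch of method one's longest-match loop, assuming phrase_table is simply a set of source-phrase strings and translate() is a hook into the MT system (both hypothetical names):

```python
def phrase_timing_segmenter(words, phrase_table, translate):
    """Method one: keep buffering while the buffer still matches a phrase
    in the table; when extending it would leave the table, translate the
    buffered phrase and start over with the current word."""
    buffer = []
    for word in words:                    # words arrive one at a time from ASR
        if not buffer or " ".join(buffer + [word]) in phrase_table:
            buffer.append(word)           # phrase exists: wait
        else:
            translate(" ".join(buffer))   # phrase missing: translate the cache
            buffer = [word]
    if buffer:
        translate(" ".join(buffer))       # utterance ends: translate the rest

# Reproduces the walkthrough above:
table = {"hello", "where", "where is", "the", "the station"}
phrase_timing_segmenter("hello where is the station".split(), table, print)
# -> hello / where is / the station
```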


Problem with Method One

  • Has the potential to degrade translation accuracy:

Normal phrase-based translation: こんにちは 駅 は どこ ですか → "Hello, where is the station"
Translation with early timing (chunk by chunk): こんにちは / 駅 は / どこ ですか → "Hello, the station where is it"


Method Two: Adjusting Timing with Reordering Probabilities

  • First, tentatively choose phrase strings according to method one
  • Next, if that phrase's right probability exceeds a threshold, actually translate the words in the cache
  • Threshold 1.0 = traditional full-sentence translation, 0.0 = method one

Example (threshold = 0.8): hello where is the station

"hello": phrase exists → wait
"hello where": phrase missing → choose "hello"; right probability 0.9 > 0.8 → translate "hello"
"where is": phrase exists → wait
"where is the": phrase missing → choose "where is"; right probability 0.6 < 0.8 → do not translate yet
"the station": utterance ends → translate "where is the station"
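A sketch of method two on top of the same loop; right_prob() stands in for a lookup of the phrase's right probability in the lexicalized reordering model (hypothetical helper names again):

```python
def reordering_timing_segmenter(words, phrase_table, right_prob, translate,
                                threshold=0.8):
    """Method two: tentatively choose phrases as in method one, but only
    translate the cached words once the chosen phrase's right probability
    exceeds the threshold (1.0 ~ full-sentence MT, 0.0 ~ method one)."""
    buffer, cache = [], []
    for word in words:
        if not buffer or " ".join(buffer + [word]) in phrase_table:
            buffer.append(word)
        else:
            cache.extend(buffer)                   # tentatively chosen phrase
            if right_prob(" ".join(buffer)) > threshold:
                translate(" ".join(cache))         # unlikely to reorder back
                cache = []
            buffer = [word]
    if cache or buffer:
        translate(" ".join(cache + buffer))        # flush at utterance end

probs = {"hello": 0.9, "where is": 0.6}
reordering_timing_segmenter("hello where is the station".split(),
                            {"hello", "where", "where is", "the",
                             "the station"},
                            lambda p: probs.get(p, 0.0), print)
# -> hello / where is the station   (matches the example above)
```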


Experimental Setup

  • Four types of experiments:
    • Japanese-English BTEC travel conversation (ja-en)
    • Japanese-English BTEC with 11+ words (ja-en 11+)
    • English-Japanese BTEC travel conversation (en-ja)
    • French-English WMT news (fr-en)
  • Evaluation measures:
    • Accuracy: 14-reference BLEU for BTEC, 1-reference BLEU for news; manually graded acceptability
    • Delay (seconds)

Accuracy: Comparison with Pause-based Segmentation

[Graph: accuracy (BLEU, roughly 20-50) vs. delay (roughly 1-4.5 seconds) for the proposed and pause-based methods]

  • In faster settings (shorter delay), the proposed method is best
  • In slower settings (longer delay), the pause-based method is best

Manual Evaluation

  • A decrease in manual evaluation as well, but less obvious than when evaluated by BLEU

System Demo


Optimizing Segmentation Strategies [Oda et al., ACL 2014]


Motivation

  • All previous segmentation strategies were based on heuristics
  • They don't directly take into account the effect on translation accuracy

What if we could directly optimize sentence segmentation for translation accuracy?


S* Search Method 1: Greedy Search

  • Choose the next segmentation point to maximize accuracy → use the chosen segments as training data for a classifier

[Figure: each candidate boundary in "I ate lunch but she left" is scored (ω = 0.7, 0.5, 0.8, 0.6, 0.6), the best-scoring boundary is fixed, and the remaining boundaries are re-scored]
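A sketch of the greedy loop, where score() is a hypothetical function that segments the input at the given boundaries, translates, and returns an accuracy such as BLEU (in the paper this is done over a whole corpus; one sentence is shown for brevity):

```python
def greedy_segmentation(words, score, n_boundaries):
    """Greedily add the segmentation boundary that maximizes accuracy.

    score(boundaries) -> float: hypothetical evaluation of translating
    the sentence segmented at `boundaries` (e.g. BLEU vs. a reference).
    """
    boundaries = set()
    for _ in range(n_boundaries):
        candidates = [b for b in range(1, len(words)) if b not in boundaries]
        if not candidates:
            break
        best = max(candidates, key=lambda b: score(boundaries | {b}))
        boundaries.add(best)              # fix the best point, then repeat
    return sorted(boundaries)
```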


S* Search Method 2: Grouping by Features

  • Because MT/evaluation is complicated, there is the potential to overfit
  • Solution: group boundaries by their features

[Example: boundaries are grouped by the surrounding POS bigram, e.g. Pronoun+Verb, Noun+Conjunction, Determiner+Noun:
I/PRN ate/VBD lunch/NN but/CC she/PRN left/VBD
I/PRN ate/VBD an/DET apple/NN and/CC an/DET orange/NN]

  • The search can be performed using dynamic programming
  • The features make the model trivial; no learning is needed (see the sketch below)
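A sketch of the grouped variant: candidate boundaries are bucketed by the POS bigram around them and whole groups are selected at once (the paper performs the search with dynamic programming; greedy group selection is shown here for brevity):

```python
from collections import defaultdict

def grouped_segmentation(pos_tags, score, n_groups):
    """Select whole feature groups of boundaries instead of single points,
    which reduces overfitting to the MT/evaluation pipeline.

    pos_tags: one POS tag per word; score(boundaries) -> float as before.
    """
    groups = defaultdict(set)
    for b in range(1, len(pos_tags)):
        groups[(pos_tags[b - 1], pos_tags[b])].add(b)   # e.g. ("PRN", "VBD")
    chosen = set()
    for _ in range(min(n_groups, len(groups))):
        feat = max(groups, key=lambda f: score(chosen | groups[f]))
        chosen |= groups.pop(feat)        # take every boundary in the group
    return sorted(chosen)
```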


Results on TED Talks

→ 2-3 times faster with no loss in BLEU


Problems of Overfitting


2) Predicting Future Information for Simultaneous Speech Translation


Why Prediction?

[Example: for 私 は 店 に 行った ([I] [to the store] [went]), "I went to the store" can only be produced once the final verb is heard. In the longer 私 は 店 に 友達と 時々 本 を 買い に 行った ([I] [to the store] [with a friend] [sometimes] [to buy books] [went]), predicting the verb lets the system output "I went to the store" early and continue with "with a friend to buy books sometimes"]


Predicting Sentence-final Verbs [Grissom et al., EMNLP 14]

  • Train a classifier to predict the sentence-final verb
  • Use reinforcement learning to decide whether to "wait", "predict", or "commit"


Predicting Unseen Syntactic Constituents for Syntax-based Translation [Oda et al., ACL 2015]


Problems with Phrase-based Translation

[Example: 友達 と ご飯 を 食べた should be translated "I ate rice with my friend" (友達 と = with my friend), but a phrase-based system may render と as "and", producing "I ate rice and my friend"]


Syntactic Parsing

[Figure: the parse tree of 友達 と ご飯 を 食べた groups 友達 と and ご飯 を into PPs under the VP headed by 食べた, so the structure (friend + rice-OBJ + ate) disambiguates と as "with"]


Tree-to-String Translation [Liu+ 06]

[Figure: the parse tree of 友達 と ご飯 を 食べ た (with spans PP0-1, PP2-3, VP4-5, VP0-5) is matched against tree-to-string rules such as "x1 x0", translating the pieces as "my friend", "rice", "ate" and reordering them into "ate rice with my friend"]


Problems with Syntactic Simultaneous Translation

  • Example: "in the next 18 minutes I 'm going to take ..."
  • In the most usual case, a verb phrase (VP) occurs after "I"
  • But here the unit is cut off, so the VP is missing
  • Translation units may not be syntactically complete by themselves


Motivation

  • Predict and consider unseen syntactic constituents for syntactically incomplete translation units
  • Use these to improve translation

[Example: for the translation unit "In the next 18 minutes I", predict the additional syntactic constituent VP; without it the unit translates as the garbled 今から 18 分私, with it as 今から 18 分で、私は VP]


Segmentation-based Simultaneous Translation

  • A simultaneous (speech) translation system based on segmentation

[Pipeline: ASR retrieves words from speech → segmentation groups the words into translation units → each translation unit is translated separately → output. Example: "this is | a pen" → これです / ペン]


Two Proposed Methods

  • Proposed method 1: filling in unseen constituents
  • Proposed method 2: waiting to translate based on unseen constituents
    • Translation results are output if specific conditions are satisfied
    • Otherwise, the unit is concatenated with the next translation unit

[Example for input "this is": method 1 predicts the missing NP, parses "this is NP", and translates これは NP です; method 2 waits, concatenates with "a pen", and translates "this is a pen" → これはペンです]


Predicting Unseen Constituents

1. Forcibly parse the translation unit (e.g. "in the next 18 min. I", parsed as PP + NP)
2. Extract features (e.g. Word:R1=I, POS:R1=NN, Word:R1-2=I,min., POS:R1-2=NN,NNS, ROOT=PP, ROOT-L=IN, ROOT-R=NP, ...)
3. Run multi-class classification over possible constituents (e.g. VP: 0.65, NP: 0.28, nil: 0.04, ...)
4. Add the most probable constituent (here VP) to the tail of the tree
5. Recursively continue until 'nil' is generated

A sketch of this loop follows below.
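Here parse(), extract_features(), and classify() are hypothetical stand-ins for the forced parser, the feature extractor, and the multi-class classifier:

```python
def predict_unseen_constituents(unit, parse, extract_features, classify,
                                max_extra=5):
    """Predict the syntactic constituents that should follow an incomplete
    translation unit, recursing until 'nil' is generated."""
    tree = parse(unit)                         # 1. forcibly parse the unit
    predicted = []
    for _ in range(max_extra):
        feats = extract_features(tree, predicted)  # 2. word/POS/root features
        probs = classify(feats)                # 3. distribution over labels
        label = max(probs, key=probs.get)      #    e.g. {"VP": 0.65, ...}
        if label == "nil":                     # 5. stop once 'nil' wins
            break
        predicted.append(label)                # 4. append, e.g. "VP"
    return predicted                           # constituents to add at the tail
```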


Generating Training Data

  • Decompose parse trees in the Penn Treebank to generate training data

[Example: from the parse of "This is a pen", (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN pen)))), truncated units yield labeled examples such as:
"This is" → [NP] [nil]
"is a" → [NN] [nil]
"is a pen" → [nil]]
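A sketch of the decomposition using nltk's Tree (the paper's unit extraction is more involved; this version only lists, for each sentence prefix, the labels of the completely unseen constituents, outermost first):

```python
from nltk import Tree

def unseen_labels(tree, k):
    """Labels of constituents lying entirely after the first k leaves,
    outermost first, terminated by 'nil'."""
    labels = []
    def walk(node, start):
        if isinstance(node, str):              # a leaf consumes one position
            return start + 1
        end = start + len(node.leaves())
        if start >= k:                         # entirely unseen: record label
            labels.append(node.label())
            return end
        for child in node:                     # partially seen: descend
            start = walk(child, start)
        return end
    walk(tree, 0)
    return labels + ["nil"]

t = Tree.fromstring("(S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN pen))))")
for k in range(1, 4):
    print(" ".join(t.leaves()[:k]), "->", unseen_labels(t, k))
# This -> ['VP', 'nil'];  This is -> ['NP', 'nil'];  This is a -> ['NN', 'nil']
```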


Tree-to-String SMT

  • Tree-to-string (T2S) SMT uses parse trees of the source language to translate
  • T2S is generally better than phrase-based SMT (PBMT) for syntactically distant language pairs, e.g. English→Japanese

[Pipeline: "This is a pen" → parse → source tree → MT → これ は ペン です]


Tree-to-String SMT with Predicted Constituents

  • In addition to the properties above, T2S can use the filled-in constituents explicitly

[Pipeline: "this is NP" → parse, with the predicted NP as a constituent of the tree → MT → これ は NP です]


Problems with Reordering

Input sentence after tag prediction: in the next 18 minutes i 'm going to take [NP] you on a journey

Translation result: 18 分 で あ る [NP] を 行 っ て い ま す 旅 の 途中 で あ る の か

After translating, the rightmost syntactic constituent is sometimes placed in a non-rightmost position in the translation result. ⇒ Reordering has occurred, and the translation may fail.


Solutions to Reordering

Predicted tags: in the next 18 minutes i 'm going to take [NP] → (wait) → i 'm going to take you on a journey

Translation result: 18 分 で あ る [NP] を 行 っ て い ま す → 皆さん を 旅 に お連れ します

When the rightmost constituent would be reordered into a non-rightmost position, cancel the output and wait for the next translation unit. ⇒ Correct result!


Experiments

  • Setup:

Domain: TED Talks [WIT3]
Languages: English → Japanese
Tokenization: Stanford Tokenizer (English), KyTea (Japanese)
Parsing: Ckylark [Oda+ 2015]
MT decoders: Moses (PBMT), Travatar (T2S)
Evaluation: BLEU, RIBES
Segmentation: n-words, GreedySeg [Oda+ 2014]

  • Compared methods:

Baselines:
PBMT: phrase-based MT (Moses)
T2S: tree-to-string MT (Travatar) without constituent prediction
Proposed:
T2S-tag: T2S (Travatar) with constituent prediction
T2S-wait: T2S (Travatar) with constituent prediction & waiting


Results (BLEU) (1)

[Graph: BLEU (0.07-0.15) vs. mean segment length in words (2-18) for PBMT, T2S, T2S-tag, and T2S-wait]


Results (BLEU) (2)

  • With few segmentations (right side of the graph), tree-to-string is higher
  • With many segmentations (left side of the graph), PBMT is higher: syntactic deficiencies occur due to segmentation

[Graph: same axes as above (BLEU vs. mean #words); the PBMT and T2S accuracies reverse as segments get shorter]


Results (BLEU) (3)

  • Filling in syntactic constituents (T2S-tag, T2S-wait) keeps translation accuracy even under settings with many segmentations: the effect of the additional constituents

[Graph: same axes (BLEU vs. mean #words); T2S-tag and T2S-wait keep their accuracy even with many segmentations]


3) Using Information from Simultaneous Interpreters [Shimizu et al., IWSLT 2014]


Simultaneous Interpretation Techniques

Simultaneous interpreters have a variety of techniques to reduce delay!

  • The salami technique [Jones 02]: split long sentences into shorter ones
    [Example: "last year I went to Japan" → 去年 ... 日本に行った, delivered as two short pieces]
  • Word choice: can reduce reordering in structurally different languages
    [Example: "A because B" is B だから A in translation (reordered), but an interpreter can keep the order: A なぜならば B]


Simultaneous Interpretation Data

  • Recorded data: TED Talks (English-Japanese)
    • Advantage: can compare with translations (subtitles)
  • Simultaneous interpreters: 3 professionals with varying years of experience, ranked S, A, and B
    • Advantage: can analyze differences between skill levels

Experience / Rank:
15 years: S rank
4 years: A rank
1 year: B rank


Using Simultaneous Interpretation Data

  • Approach: incorporate simultaneous interpretation data into training the MT system

[Diagram: traditionally [Paulik+ 09, Sridhar+ 13], the translation system is trained on translated data; in the proposed approach it is trained on interpreted data, producing interpretation-like results]


Incorporating Interpretation Data

Apply simultaneous interpretation data to three processes in MT training:

  • Tuning (Tu): tune the parameters of the translation system to match the interpretation data
  • Language model (LM), linear interpolation: match the style of simultaneous interpreters
  • Translation model (TM), fill-up [Bisazza+ 11]: like the LM, adapt to match the interpretation data
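For reference, LM linear interpolation takes the standard form (generic background, not a formula from the slides): the interpreter-style LM and the baseline LM are mixed with a weight λ tuned on held-out data,

```latex
P(w \mid h) = \lambda \, P_{\mathrm{interp}}(w \mid h)
            + (1 - \lambda) \, P_{\mathrm{base}}(w \mid h),
\qquad 0 \le \lambda \le 1
```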


Experimental Evaluation

[Graph: improvement in accuracy, measured at the phrase and sentence level]


4) Evaluating Simultaneous Speech Translation Systems [Mieno et al., 2015]


Problems with Evaluation

  • Difficult to make clear which of two systems is better

[Example, for the source もっと 手頃 な ホテル は あり ませ ん か:
Split finely: "more / reasonable / is there a hotel ?" (low delay, low accuracy)
Don't split: "do you have a more reasonable hotel ?" (high accuracy, high delay)]


Goal of Evaluation

  • Based on speed and accuracy, determine which system is better

[Figure: outputs with different accuracy/delay combinations must be placed on a single high-to-low scale]


How to Create an Evaluation Function? (Based on Data)

[Pipeline: movie data with various delays and accuracies → human evaluation → training data with delay/accuracy features → machine learning → evaluation function]


Data Format

  • Rank-based evaluation
    • Perform comparative evaluation of which output is "better"
    • Allows for consideration of both speed and accuracy

[Example: for one input video, evaluators rank the outputs of systems A-E, e.g. A: 4, B: 1, C: 3, D: 2, E: 5]


Learning an Evaluation Function

  • Define a linear function that takes a displayed video as input and returns a score: a weight vector over features useful in evaluation (i.e., delay and accuracy)
  • This function can be learned from the ranked data using learning to rank
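A minimal sketch of pairwise learning to rank with a perceptron-style update (illustrative only; the paper's exact learner and feature set may differ):

```python
import numpy as np

def learn_ranker(pairs, dim=2, epochs=100, lr=0.1):
    """Learn weights w so that w . x_better > w . x_worse for ranked pairs.

    pairs: list of (x_better, x_worse) feature vectors, e.g.
           x = [accuracy, delay] measured on one subtitled video.
    """
    w = np.zeros(dim)
    for _ in range(epochs):
        for better, worse in pairs:
            better, worse = np.asarray(better), np.asarray(worse)
            if w @ better <= w @ worse:        # ranked wrongly: nudge w
                w += lr * (better - worse)
    return w

# Hypothetical judgment: a video with BLEU 0.35 and 2 s delay was ranked
# above one with BLEU 0.32 and 9 s delay.
print(learn_ranker([([0.35, 2.0], [0.32, 9.0])]))
# -> positive weight on accuracy, negative weight on delay
```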


Experimental Setup

  • Target video: TED Talks, English → Japanese (① realtime translation is important; ② often used in MT evaluation)
  • Gathered data:

Video: 20 clips, 20-30 seconds each
Delay: 7 settings (0, 1, 2, 3, 5, 7, 10 seconds)
Subjects: 15 Japanese speakers
Method: ranking (1-3)

  • Translation data (5 varieties): translator, interpreter 1 (S rank), interpreter 2 (A rank), syntax-based MT, phrase-based MT


Learned Evaluation Functions

[Graphs: 5-level acceptability vs. delay (seconds), for speech output and subtitle output]

Learned weights:
Subtitle output: accuracy 1.40, delay −0.059
Speech output: accuracy 1.99, delay −0.018

Conclusion


Conclusion and Challenges

  • Segmentation: reordering-based and optimization-based methods
    • Next: joint consideration of lexical/acoustic features?
  • Prediction: unseen syntactic constituents
    • Next: rewording and correcting prediction mistakes?
  • Interpreter data: adapted translation models
    • Next: learning summarization, or other strategies?
  • Evaluation: made the accuracy/delay relationship clear
    • Next: more expressive models, features?