 
Simultaneous Speech Translation
Graham Neubig
Nara Institute of Science and Technology (NAIST)
10/16/2015
Joint work with: Satoshi Nakamura, Tomoki Toda, Sakriani Sakti, Tomoki Fujita, Hiroaki Shimizu, Yusuke Oda, Takashi Mieno, Quoc Truong Do

Background

Speech Translation
Source: Microsoft Research http://research.microsoft.com/en-us/news/features/translator-052714.aspx
Source: NICT http://www.nict.go.jp/press/2010/06/29-1.html
Source: Karlsruhe Institute of Technology http://isl.anthropomatik.kit.edu/english/1520.php

Traditional Speech Translation
Divide at sentence boundaries
ASR → こんにちは、駅はどこですか? → MT → Hello, where is the station? → TTS

Problem: Delay (Ear-Voice Span)
ASR → こんにちは、駅はどこですか? → MT → Hello, where is the station? → TTS
[Figure: the delay is marked on the sentence-level pipeline]

Speech Translation Example

Simultaneous Speech Translation
Delay: Reduced
ASR → こんにちは、駅はどこですか? → MT → Hello, the station where is it? → TTS
(MT and TTS each appear three times in the diagram, once per segment)
But, this is not easy!

Professional Simultaneous Interpretation
Photo Credit:
https://www.flickr.com/photos/joi/2027679714
https://www.flickr.com/photos/european_parliament/4268490015

Simultaneous Interpretation Data [Shimizu+ LREC14]
● Recorded data
  - About 10 hours of TED Talks (English-Japanese, Japanese-English)
● Simultaneous interpreters
  - 3 pros with varying years of experience
  - Ranked S, A, and B

Experience   Rank
15 years     S rank
4 years      A rank
1 year       B rank

Freely available for research purposes: http://ahclab.naist.jp/resource/stc/

Simultaneous Interpreter Example

So How do Simultaneous Interpreters Do It?
Source: 今ご覧いただいたこの映像は今から五年前、日本で世間を賑わせていた裁判員制度が始まる一年前、大学四年生だった私が模擬裁判用の資料として作った物です
Translation: Five years ago, as a college senior, I created the video that you just saw as a reference material for a mock trial, one year before the much-talked-about jury system commenced in Japan.
Interpretation: You just saw this video clip. Five years ago, at that time in Japan, the ordinary people's justice system, jury system, was very much talked about in Japan, and I created this video as a reference material for that.
Annotations on the interpretation: Predict NP, Segmentation, Prediction, Rewording, Summarization

Can We Do the Same in Speech Translation Systems?
Four problems in this talk:
● Segmentation: When do we start translating?
● Prediction: Can we predict things that haven't been said?
● Rewording: Can we reword sentences to be conducive to simultaneous translation?
● Evaluation: How do we decide which results are better?

Segmentation

Heuristic Segmentation Strategies
Division on pauses [Fugen+ 07, Bangalore+ 12]
Division on predicted commas [Sridhar+ 13]
  Example: hello where is the station (each position labeled "comma" or "no comma")
Division based on reordering probabilities [Fujita+ 13]
  hello → probability of reordering 0.1
  where → probability of reordering 0.8

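The reordering-probability heuristic can be pictured as a simple threshold rule. The sketch below is an illustration only, not the method of [Fujita+ 13]: it assumes a boundary is placed after any word whose reordering probability falls below a threshold. The function name `segment_by_reordering`, the threshold of 0.5, and the probabilities for "is", "the", and "station" are made up; only the values for "hello" (0.1) and "where" (0.8) come from the slide.

```python
def segment_by_reordering(words, reorder_prob, threshold=0.5):
    """Split after each word whose reordering probability is below threshold.

    Assumption: a low probability means the word is unlikely to be reordered
    with what follows, so it is safe to start translating right after it.
    """
    segments, current = [], []
    for word in words:
        current.append(word)
        if reorder_prob[word] < threshold:
            segments.append(" ".join(current))
            current = []
    if current:  # flush whatever is left at the end of the input
        segments.append(" ".join(current))
    return segments


# "hello" and "where" use the slide's values; the rest are hypothetical.
probs = {"hello": 0.1, "where": 0.8, "is": 0.7, "the": 0.9, "station": 0.6}
segments = segment_by_reordering("hello where is the station".split(), probs)
print(segments)
```

With these numbers the only boundary falls after "hello", so the greeting can be translated while the rest of the question is still being spoken.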
Optimizing Segmentation Strategies for Simultaneous Speech Translation [Oda+ ACL14]
● All previous segmentation strategies were based on heuristics
● They do not directly take into account the effect on translation accuracy
What if we could directly optimize sentence segmentation for translation accuracy?

Training/Testing Framework
Training: training corpus (src/trg pairs) → find the segmentation S* that maximizes MT accuracy → train the segmentation model on S*
Testing: test corpus (src) → segment with the model → segmented test → translate → translated test (trg)

S* Search Method 1: Greedy Search
[Figure: candidate segment boundaries are tried in "I ate lunch but she left" (full-sentence translation: 私は昼食を食べたが彼女は帰った). Each candidate segmentation is translated and scored; the scores shown range from 0.2 to 1.0. The best boundary is kept and the search repeats on the remaining candidates.]
Train an SVM classifier to recover the boundaries ("/") at test time

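The greedy search can be sketched for a single sentence as follows. This is a simplified illustration rather than the implementation from [Oda+ ACL14]: the names `greedy_boundaries`, `split_at`, and `toy_score` are hypothetical, and `toy_score` uses made-up per-boundary costs as a stand-in for translating each segment with an MT system and scoring the result against a reference.

```python
def greedy_boundaries(num_words, num_boundaries, score):
    """Add boundaries one at a time, always keeping the best-scoring one.

    A boundary b means "split before word index b".
    """
    chosen = set()
    for _ in range(num_boundaries):
        candidates = [b for b in range(1, num_words) if b not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda b: score(chosen | {b}))
        chosen.add(best)
    return sorted(chosen)


def split_at(words, boundaries):
    """Cut the word list at the given boundary positions."""
    cuts = [0] + list(boundaries) + [len(words)]
    return [" ".join(words[i:j]) for i, j in zip(cuts, cuts[1:])]


# Hypothetical accuracy loss caused by each boundary (not from the slides).
TOY_COST = {1: 0.3, 2: 0.6, 3: 0.1, 4: 0.4, 5: 0.8}


def toy_score(boundaries):
    """Stand-in for 'translate the segments and measure MT accuracy'."""
    return 1.0 - sum(TOY_COST[b] for b in boundaries)


words = "I ate lunch but she left".split()
boundaries = greedy_boundaries(len(words), 2, toy_score)
segments = split_at(words, boundaries)
print(boundaries, segments)
```

With the toy costs, the first boundary chosen is the one before "but", which is the cheapest split in this made-up table. A real scorer would not be additive, which is why the search has to re-score every candidate after each choice.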
S* Search Method 2: Grouping by Features
● Because MT/Evaluation is complicated, there is the potential to overfit
● Solution: group boundaries by features

I    ate  lunch  but  she  left
PRN  VBD  NN     CC   PRN  VBD

I    ate  an   apple  and  an   orange
PRN  VBD  DET  NN     CC   DET  NN

Example boundary groups: Pronoun + Verb, Noun + Conjunction, Determiner + Noun

Search can be performed using dynamic programming
Features for the model are trivial, so no learning is needed

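Once a set of feature groups has been selected, applying it at test time is a lookup, which is one way to read "no learning is needed". The sketch below covers only this application step, not the dynamic-programming search that selects the groups. It assumes each group is a pair of adjacent POS tags; the function name `segment_by_pos_pairs` and the choice of Noun + Conjunction as the selected group are illustrative.

```python
def segment_by_pos_pairs(tagged, boundary_pairs):
    """Split before a word when (previous tag, current tag) is a selected pair."""
    segments, current = [], []
    for i, (word, tag) in enumerate(tagged):
        if current and (tagged[i - 1][1], tag) in boundary_pairs:
            segments.append(" ".join(current))
            current = []
        current.append(word)
    if current:
        segments.append(" ".join(current))
    return segments


# The two tagged sentences from the slide.
sent1 = list(zip("I ate lunch but she left".split(),
                 "PRN VBD NN CC PRN VBD".split()))
sent2 = list(zip("I ate an apple and an orange".split(),
                 "PRN VBD DET NN CC DET NN".split()))

selected = {("NN", "CC")}  # assumed outcome of the search: Noun + Conjunction
result1 = segment_by_pos_pairs(sent1, selected)
result2 = segment_by_pos_pairs(sent2, selected)
print(result1)
print(result2)
```

Because the decision depends only on the tag pair, a group chosen on one sentence applies to every sentence with the same pattern, which is what limits overfitting to individual boundaries.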
Results on TED Talks
→ 2-3 times faster with no loss in BLEU

Simultaneous Translation Demo
● Greedy+Grouping at 10 words

Future Contributions to Segmentation?
● Speech: Optimized models using acoustic features?
● Parsing: Incorporation with incremental parsing? e.g. [Ryu+ 06]
● Machine Learning: Smarter models: neural networks?
● Algorithms: Integration with incremental decoding? e.g. [Sankaran+ 10]

Prediction

What Kind of Prediction do Simultaneous Interpreters Do? [Wilss 78, Chernov+ 04]
● Lexical prediction
  Source: サイエンスを正しく楽しく、これを合い言葉にサイエンスCGクリエーターとして活動しています。
  Gloss: science factual fun this keyword as science CG creator as working
  Interpretation: then what I wanted to do is to promote fun and factual science, that's my keyword. I'm a …
● Structural prediction
  Source: 今ご覧頂いた映像
  Gloss: now you saw video
  Interpretation: you just saw a video clip

Predicting Sentence-final Verbs [Grissom et al., EMNLP14]
● Method for translating from verb-final languages (e.g. German)
● Train a classifier to predict the sentence-final verb
● Use reinforcement learning to decide whether to "wait", "predict", or "commit"

Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents [Oda+ ACL15]
● Predict unseen syntax constituents
● Translate from correct tree
[Figure: the partial input "in the next 18 minutes I" is parsed as-is (left) and with a predicted (VP) constituent appended (right)]
Without prediction: in the next 18 minutes I → 今から18分私
With prediction: in the next 18 minutes I (VP) → 今から18分で私は(VP)

Why is Syntax Necessary?
● Tree-to-string (T2S) MT framework
● Obtains state-of-the-art results on syntactically distant language pairs (cf. phrase-based translation; PBMT)
● Possible to use additional syntactic constituents explicitly
[Figure: the input "This is NP", with NP as a placeholder constituent, is parsed and then translated by MT to "これは NP です"]
● Additional heuristic to wait for more input based on when translation requires reordering

Making Training Data for Syntax Prediction
● Decompose gold trees in the treebank
1. Select any leaf span in the tree
2. Find the path between leftmost/rightmost leaves
3. Delete the outside subtree
4. Replace inside subtrees with topmost phrase label
5. Finally we obtain: nil is a NN nil (left syntax, leaf span, right syntax)
[Figure: gold parse tree of "This is a pen"]

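One possible reading of the five steps above is sketched below. It is not the authors' code: the nested-tuple tree encoding, the function names, the bracketing of "This is a pen", and the choice to return plain label lists (without the terminating nil) are all assumptions made for illustration. The idea is to descend to the lowest node covering the selected span, which discards everything outside it, and then replace each subtree left or right of the span with its top label.

```python
def is_leaf(node):
    """A leaf is a (tag, word) pair; an inner node is (label, child, ...)."""
    return len(node) == 2 and isinstance(node[1], str)


def num_leaves(node):
    return 1 if is_leaf(node) else sum(num_leaves(c) for c in node[1:])


def collect(node, offset, start, end, left, words, right):
    """Walk left to right; replace subtrees outside the span by their label."""
    size = num_leaves(node)
    if offset + size <= start:
        left.append(node[0])       # entirely left of the span
    elif offset >= end:
        right.append(node[0])      # entirely right of the span
    elif is_leaf(node):
        words.append(node[1])      # a word inside the span
    else:
        pos = offset
        for child in node[1:]:
            collect(child, pos, start, end, left, words, right)
            pos += num_leaves(child)


def decompose(tree, start, end):
    """Return (left syntax, leaf span, right syntax) for leaves [start, end)."""
    node, offset = tree, 0
    while not is_leaf(node):       # descend to the lowest node covering the span
        pos, covering = offset, None
        for child in node[1:]:
            size = num_leaves(child)
            if pos <= start and end <= pos + size:
                covering = (child, pos)
                break
            pos += size
        if covering is None:
            break
        node, offset = covering
    left, words, right = [], [], []
    collect(node, offset, start, end, left, words, right)
    return left, words, right


# Assumed bracketing of the slide's example sentence.
tree = ("S", ("NP", ("DT", "This")),
             ("VP", ("VBZ", "is"), ("NP", ("DT", "a"), ("NN", "pen"))))
example = decompose(tree, 1, 3)    # leaf span "is a"
print(example)
```

For the span "is a" this yields an empty left syntax and a right syntax of NN, which lines up with the slide's "nil is a NN nil" if nil is read as the end-of-sequence marker.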
Syntax Prediction Process
1. Parse the input as-is
2. Extract features
   Word:R1=I
   POS:R1=NN
   Word:R1-2=I,minutes
   POS:R1-2=NN,NNS
   ROOT=PP
   ROOT-L=IN
   ROOT-R=NP
   ...
3. Predict the next tag (linear SVM)
   VP ... 0.65
   NP ... 0.28
   nil ... 0.04
   ...
4. Append to sequence
5. Repeat until nil
[Figure: parse of the input translation unit "in the next 18 minutes I", followed by the predicted sequence VP nil]

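The predict, append, and repeat loop in steps 3 to 5 can be sketched as below. The linear SVM and the feature extraction are replaced by a hypothetical stub, `toy_scores`, which returns the slide's scores (VP 0.65, NP 0.28, nil 0.04) on the first step and made-up scores afterwards. The name `predict_right_syntax` and the `max_tags` safety limit are also assumptions.

```python
def predict_right_syntax(words, score_tags, max_tags=5):
    """Append the best-scoring tag until the classifier outputs nil."""
    predicted = []
    while len(predicted) < max_tags:   # guard against a classifier that never stops
        scores = score_tags(words, predicted)
        tag = max(scores, key=scores.get)
        if tag == "nil":
            break
        predicted.append(tag)
    return predicted


def toy_scores(words, predicted):
    """Stub classifier; a real system would parse, extract features, and score."""
    if not predicted:
        return {"VP": 0.65, "NP": 0.28, "nil": 0.04}   # scores shown on the slide
    return {"nil": 0.90, "VP": 0.05, "NP": 0.05}       # made up for illustration


unit = "in the next 18 minutes I".split()
tags = predict_right_syntax(unit, toy_scores)
print(unit, tags)
```

Passing the tags predicted so far back into the scorer reflects step 4: each new prediction is conditioned on the sequence built up to that point.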