Simultaneous German-English Lecture Translation

Muntsin Kolss, Matthias Wölfel, Florian Kraft, Jan Niehues, Matthias Paulik, Alex Waibel

IWSLT 2008, October 21, 2008


SLIDE 1

Simultaneous German-English Lecture Translation

Muntsin Kolss, Matthias Wölfel, Florian Kraft, Jan Niehues, Matthias Paulik, Alex Waibel IWSLT 2008, October 21, 2008

SLIDE 2

Simultaneous Lecture Translation: Challenges (for German-English)

  • Unlimited Domain:

    - Wide variety of topics
    - Lectures often go deeply into detail: specialized vocabulary and expressions

  • Spoken Language:

    - Most lecturers are not professionally trained speakers
    - Conversational speech, more informal than prepared speeches
    - Long monologues, often not easily separable into utterances with sentence boundaries

  • Strict real-time and latency requirements
  • German-English specific:

    - English words embedded in German, especially technical terms
    - German compounds
    - Long-distance word reordering
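The German-compound challenge listed above is commonly attacked by splitting compounds into known parts before translation. A minimal sketch of that idea follows; the vocabulary, the greedy strategy, and the handling of the linking "s" (Fugen-s) are illustrative assumptions, not the system's actual splitter:

```python
def split_compound(word, vocab, min_len=3):
    """Greedily split a German compound into known vocabulary parts.

    Tries split points left to right, allowing a linking "s" (Fugen-s)
    at the end of a part; returns the parts, or [word] if no split into
    known parts exists. Illustrative sketch only.
    """
    if word in vocab:
        return [word]
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        # Try the surface head and the head with a trailing linking "s" removed.
        for h in (head, head.rstrip("s")):
            if h in vocab and len(h) >= min_len:
                rest = split_compound(tail, vocab, min_len)
                if rest != [tail] or tail in vocab:
                    return [h] + rest
    return [word]

vocab = {"vorlesung", "übersetzung"}
print(split_compound("vorlesungsübersetzung", vocab))
# → ['vorlesung', 'übersetzung']
```

A real splitter would score alternative splits (e.g. by part frequency) instead of taking the first greedy match.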

SLIDE 3

System Overview

SLIDE 4

English Words in German Lectures

| Language               | German | English | Both  | Unknown |
| Total Words            | 4195   | 110     | 1397  | 887     |
| Deletions              | 52     | 1       | 44    | 0       |
| Insertions             | 58     | 9       | 37    | 2       |
| Substitutions: German  | 258    | 37      | 91    | 113     |
| Substitutions: English | 7      | 6       | 8     | 7       |
| Substitutions: Both    | 68     | 10      | 33    | 56      |
| Substitutions: Unknown | 5      | 3       | 2     | 4       |
| WER                    | 10.7%  | 60.0%   | 15.4% | 20.5%   |
| Total Error            | 448    | 66      | 215   | 182     |

SLIDE 5

English Words in German Lectures

  • Two Approaches:

    - Use two phoneme sets in parallel, one each for German and English (parallel)
    - Map the English pronunciation dictionary to German phonemes (mapping)

| WER      | All   | German | English | Both  | Unknown |
| Baseline | 13.8% | 10.7%  | 60.0%   | 15.4% | 20.5%   |
| Mapping  | 12.7% | 11.1%  | 34.6%   | 13.8% | 16.1%   |
| Parallel | 13.4% | 11.4%  | 26.4%   | 14.7% | 18.9%   |
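The mapping approach can be pictured as rewriting each entry of the English pronunciation dictionary using only German phonemes, so a single German acoustic model covers the embedded English words. A toy sketch; the phoneme symbols and the mapping table are invented for illustration:

```python
# Illustrative "mapping" approach: replace English phonemes that have no
# German counterpart with their closest German phoneme, leaving shared
# phonemes unchanged. Inventory and mappings below are assumptions.
EN_TO_DE = {
    "TH": "S",   # English /θ/ has no German counterpart
    "DH": "Z",   # /ð/ approximated by a voiced German fricative
    "W":  "V",   # /w/ approximated by /v/
    "AE": "E",   # /æ/ approximated by an open vowel
}

def map_pronunciation(english_phonemes):
    """Map each English phoneme to its closest German phoneme
    (identity for phonemes assumed to be shared by both inventories)."""
    return [EN_TO_DE.get(p, p) for p in english_phonemes]

# Hypothetical phonemization of "weather":
print(map_pronunciation(["W", "EH", "DH", "ER"]))  # → ['V', 'EH', 'Z', 'ER']
```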

SLIDE 6

Machine Translation: Adaptation to Lectures

  • Training data: German-English EPPS, News Commentary, Travel Expression Corpus
  • 100K corpus of German lectures held at Universität Karlsruhe, transcribed and translated into English

| System                            | Dev   | Test  |
| Baseline                          | 31.54 | 27.18 |
| Language Model (LM) Adaptation    | 33.11 | 29.17 |
| Translation Model (TM) Adaptation | 33.09 | 30.46 |
| LM and TM adaptation              | 34.00 | 30.94 |
| + Rule-based word reordering      | 34.59 | 31.38 |
| + Discriminative Word Alignment   | 35.24 | 31.40 |

SLIDE 7

Automatic Simultaneous Translation: Input Segmentation

  • Text Translation:

    source sentence → MT Decoder → target sentence

  • Speech Translation (turn-based, "push-to-talk" dialog systems):

    source utterance → MT Decoder → target utterance

  • Simultaneous Translation:

    continuous ASR input → Segmentation → MT Decoder → target segment

SLIDE 8

Low latency translation is easy?

[Figure: BLEU [%] as a function of fixed segment length, from 1 to 10K words]
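The fixed-segment-length baseline in this plot simply cuts the recognizer output every N words. A minimal sketch of that chunking (the function name is ours; the sample sentence is the one quoted on the later lattice slides):

```python
def fixed_length_segments(token_stream, segment_length):
    """Cut an unbounded token stream into fixed-length chunks, each of
    which would be translated independently (the naive baseline): short
    segments destroy context, long segments add latency."""
    buf = []
    for token in token_stream:
        buf.append(token)
        if len(buf) == segment_length:
            yield buf
            buf = []
    if buf:                      # flush the final partial segment
        yield buf

words = "and the inspiration for the exact motivation of the stimuli".split()
print(list(fixed_length_segments(words, 4)))
# → [['and', 'the', 'inspiration', 'for'], ['the', 'exact', 'motivation', 'of'], ['the', 'stimuli']]
```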

SLIDE 9

Disadvantages of Input Segmentation

  • Choosing meaningful segment boundaries is difficult and error-prone
  • No recovery from segmentation errors; input segmentation makes hard decisions
  • Phrases which would match across the segment boundaries can no longer be used
  • No word reordering across segment boundaries is possible
  • Language model context is lost across the segment boundaries
  • If the language model is trained on sentence-segmented data, there will often be a mismatch for the begin-of-sentence and end-of-sentence LM events

SLIDE 10

Phrase-based SMT decoder

“I have heard traditional values referred to” “he escuchado relacionarlo con valores tradicionales”

I have heard traditional values referred to
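The slide's example can be approximated with a minimal monotone phrase-based decoder: cover the source left to right with the longest matching phrase-table entry. The toy phrase table below, including its phrase segmentation, is our illustrative assumption; real decoders also score hypotheses with translation and language models and permit reordering:

```python
# Toy monotone phrase-based decoding over a hypothetical phrase table.
PHRASE_TABLE = {
    ("i", "have", "heard"): "he escuchado",
    ("traditional", "values"): "valores tradicionales",
    ("referred", "to"): "relacionarlo con",
}

def translate(source_words, table, max_phrase_len=3):
    output, i = [], 0
    while i < len(source_words):
        # Prefer the longest phrase starting at position i.
        for n in range(min(max_phrase_len, len(source_words) - i), 0, -1):
            phrase = tuple(source_words[i:i + n])
            if phrase in table:
                output.append(table[phrase])
                i += n
                break
        else:
            output.append(source_words[i])   # pass unknown words through
            i += 1
    return " ".join(output)

print(translate("i have heard traditional values referred to".split(), PHRASE_TABLE))
# → he escuchado valores tradicionales relacionarlo con
```

Note that the slide's actual Spanish output, "he escuchado relacionarlo con valores tradicionales", reorders the last two phrases; producing it requires the reordering that this monotone sketch omits.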

SLIDE 11

Stream Decoding: Continuous Translation Lattice

“ and the inspiration for the exact motivation of the stimuli was derived from experiments in which we use these networks for geometrical figures and we ask subjects to describe ...”

  • No input segmentation: process "infinite" input stream from speech recognizer, extending/truncating the translation lattice

in F

SLIDE 12

Stream Decoding: Continuous Translation Lattice

“ and the inspiration for the exact motivation of the stimuli was derived from experiments in which we use these networks for geometrical figures and we ask subjects to describe ...”

  • No input segmentation: process "infinite" input stream from speech recognizer, extending/truncating the translation lattice

in which F

SLIDE 13

Stream Decoding: Continuous Translation Lattice

“ and the inspiration for the exact motivation of the stimuli was derived from experiments in which we use these networks for geometrical figures and we ask subjects to describe ...”

  • No input segmentation: process "infinite" input stream from speech recognizer, extending/truncating the translation lattice

in which we F

SLIDE 14

Stream Decoding: Continuous Translation Lattice

“ and the inspiration for the exact motivation of the stimuli was derived from experiments in which we use these networks for geometrical figures and we ask subjects to describe ...”

  • No input segmentation: process "infinite" input stream from speech recognizer, extending/truncating the translation lattice

in which we use F

SLIDE 15

Stream Decoding: Continuous Translation Lattice

“ and the inspiration for the exact motivation of the stimuli was derived from experiments in which we use these networks for geometrical figures and we ask subjects to describe ...”

  • No input segmentation: process "infinite" input stream from speech recognizer, extending/truncating the translation lattice

in which we use these networks for F

SLIDE 16

Stream Decoding: Continuous Translation Lattice

“ and the inspiration for the exact motivation of the stimuli was derived from experiments in which we use these networks for geometrical figures and we ask subjects to describe ...”

  • No input segmentation: process "infinite" input stream from speech recognizer, extending/truncating the translation lattice

use these networks for F
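The extend/truncate behavior animated across these slides can be sketched as a sliding source-side window. This is a simplification we introduce for illustration: the real system extends and truncates a translation lattice, not just the word buffer shown here.

```python
class StreamWindow:
    """Sliding source-side window for stream decoding (simplified sketch)."""

    def __init__(self):
        self.window = []

    def extend(self, word):
        """A new word from the speech recognizer extends the window."""
        self.window.append(word)

    def truncate(self, n_translated):
        """Drop the prefix whose translation has been committed to output."""
        committed, self.window = self.window[:n_translated], self.window[n_translated:]
        return committed

w = StreamWindow()
for word in "in which we use these networks for".split():
    w.extend(word)
print(w.truncate(3))  # → ['in', 'which', 'we']
print(w.window)       # → ['use', 'these', 'networks', 'for']
```

The two prints mirror the transition between the last two slides: once the translation of "in which we" is committed, the window shrinks to "use these networks for".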

SLIDE 17

Stream Decoding: Asynchronous Input and Output

  • Each incoming source word from the recognizer triggers a new search through the current translation lattice
  • Output of the resulting best hypothesis is partially or completely delayed, until either a time-out occurs or new input arrives, which leads to lattice expansion and a new search
  • This creates a sliding window during which translation output lags the incoming source stream

use these networks for F
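The trigger-and-time-out behavior described above can be sketched as a loop over a word queue. The queue, the stubbed `search`, and the time-out value are our assumptions for illustration; the real search expands and rescores the translation lattice.

```python
import queue

def stream_decode(asr_queue, search, flush, timeout=1.0):
    """Each incoming source word triggers a new search; if no word arrives
    within `timeout` seconds, the delayed best hypothesis is flushed."""
    best = None
    while True:
        try:
            word = asr_queue.get(timeout=timeout)
        except queue.Empty:
            if best is not None:
                flush(best)              # time-out: emit pending output
                best = None
            continue
        if word is None:                 # end-of-stream sentinel
            break
        best = search(word)              # lattice expansion + new search
    if best is not None:
        flush(best)                      # emit whatever is still pending

# Demo with a stubbed search that just tags the latest word.
q, emitted = queue.Queue(), []
for w in ["use", "these", "networks"]:
    q.put(w)
q.put(None)
stream_decode(q, search=lambda w: f"best-through-{w}", flush=emitted.append,
              timeout=0.01)
print(emitted)  # → ['best-through-networks']
```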

SLIDE 18

Stream Decoding: Output Segmentation

  • Decide which part of the current best translation hypothesis to output, if any at all:

    - Minimum Latency Lmin: the translation covering the last Lmin untranslated source words received from the speech recognizer at any point is never output (except for time-outs)
    - Maximum Latency Lmax: when the latency reaches Lmax source words, translation output covering the source words exceeding this value is forced

[Diagram: Lmin and Lmax windows over the source stream "in which we use these networks for"]

SLIDE 19

Stream Decoding: Output Segmentation

  • Backtrace the hypothesis until Lmin source words have been passed
  • If the hypothesis reached contains reordering gaps, continue backtracing until a state with no open reorderings is found
  • If no such state can be found, perform a new restricted search that only expands hypotheses which have no open reorderings at the node where the maximum latency would be exceeded

[Diagram: Lmin and Lmax windows over the source stream]
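The backtrace rule above can be sketched over a simplified hypothesis chain. The state representation, field names, and example numbers are illustrative assumptions; a real decoder would walk back-pointers in the lattice.

```python
def choose_output_state(hypothesis_chain, total_source_words, l_min):
    """Backtrace the chain (newest state first): skip states inside the
    Lmin holdback window, then return the first state with no open
    reorderings. Returns None if none exists, in which case the decoder
    would fall back to the restricted search described on the slide.

    hypothesis_chain: list of states, oldest first, each a dict with
    'covered' (source words covered so far) and 'open_reorderings' (bool).
    """
    for state in reversed(hypothesis_chain):
        if state["covered"] > total_source_words - l_min:
            continue                  # still within the Lmin holdback
        if not state["open_reorderings"]:
            return state              # safe: no reordering gap spans it
    return None

chain = [
    {"covered": 2, "open_reorderings": False},
    {"covered": 4, "open_reorderings": True},
    {"covered": 5, "open_reorderings": False},
    {"covered": 7, "open_reorderings": True},
]
print(choose_output_state(chain, total_source_words=8, l_min=2)["covered"])  # → 5
```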

SLIDE 20

Stream Decoding Performance under Latency Constraint

[Figure: BLEU [%] as a function of segment length (1 to 10K words); curves: Fixed Segment Length, Keeping LM State, Acoustic Features, Stream Decoding]

Lmin and Lmax chosen to optimize translation quality

SLIDE 21

Choosing optimal parameter values for Lmin and Lmax

[Figure: BLEU [%] as a function of minimum latency (1 to 9) and maximum latency (1 to 10) in source words]

SLIDE 22

Summary

  • Current system for simultaneous translation of German lectures to English combines state-of-the-art ASR and SMT components
  • ASR system modified to handle German compounds, and English terms and expressions embedded in German lectures
  • SMT system uses additional compound splitting and model adaptation to the topic and style of lectures
  • Experiments with Stream Decoding to reduce latencies of the overall system
  • Generated translation output provides a good idea of what the German lecturer said
  • Major challenge for the future is better addressing long-range word reordering requirements between German and English