Open Source Toolkit for Speech to Text Translation

Thomas Zenkel, Matthias Sperber, Jan Niehues, Markus Müller, Ngoc-Quan Pham, Sebastian Stüker, Alex Waibel. Institute for Anthropomatics, KIT, University of the State of Baden-Wuerttemberg


SLIDE 1

KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association

Institute for Anthropomatics

www.kit.edu

Open Source Toolkit for Speech to Text Translation

Thomas Zenkel, Matthias Sperber, Jan Niehues, Markus Müller, Ngoc-Quan Pham, Sebastian Stüker, Alex Waibel

SLIDE 2

Institute for Anthropomatics 2 16.08.18 Jan Niehues - S2T Translation

Motivation

  • Speech translation is an interesting challenge
      • Neural models
      • End-to-end models
  • Provide a baseline
      • Cascade of several models
  • Easy to extend
      • Develop models for individual parts
  • Easy to use
      • Download pretrained models
SLIDE 3

Cascade Spoken Language Translation

  • Serial combination of several models
  • ASR
      • Audio → Text
  • Segmentation
      • Add case information
      • Add punctuation information
  • Machine translation
      • Source language → target language
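
The serial combination above can be sketched as plain function composition. The three stage functions below are hypothetical toy stand-ins (simple lookups, not the toolkit's models or API); only the order of the stages is taken from the slide.

```python
from typing import Any, Callable, List

Stage = Callable[[Any], Any]

def cascade(stages: List[Stage]) -> Stage:
    """Chain stages so that each one consumes the previous stage's output."""
    def run(x: Any) -> Any:
        for stage in stages:
            x = stage(x)
        return x
    return run

# Hypothetical toy stages standing in for the real models.
def toy_asr(audio_id: str) -> str:
    # Audio -> lowercase text without punctuation
    return {"utt1": "how are you"}[audio_id]

def toy_segmentation(text: str) -> str:
    # Add case and punctuation information
    return text[0].upper() + text[1:] + "?"

def toy_mt(text: str) -> str:
    # Source language -> target language (English -> German in this toy case)
    return {"How are you?": "Wie geht es dir?"}[text]

translate_speech = cascade([toy_asr, toy_segmentation, toy_mt])
print(translate_speech("utt1"))
```

Because each stage only sees the previous stage's output, any single stage can be swapped for a different model without touching the others.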

SLIDE 4

CTC-based ASR

  • Input:
      • 40-dimensional Mel-filterbank features
  • Output:
      • Byte-pair units (inventory of 300 or 10,000)
  • Model:
      • 4-layer Bi-LSTM
      • Softmax layer
      • Trained using CTC loss function
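
A model trained with the CTC loss emits one distribution per frame over the output units plus a blank symbol. The slide does not say how the output is decoded; the sketch below assumes simple greedy (best-path) decoding, with the blank at index 0 as a further assumption.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol

def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Pick the best unit per frame, collapse repeats, then drop blanks."""
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    decoded: List[int] = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank:
            decoded.append(idx)
        prev = idx
    return decoded

# Toy softmax outputs over {blank, unit 1, unit 2} for five frames.
scores = [
    [0.10, 0.80, 0.10],  # unit 1
    [0.10, 0.70, 0.20],  # unit 1 again (repeat, collapsed)
    [0.90, 0.05, 0.05],  # blank
    [0.20, 0.70, 0.10],  # unit 1 (kept: separated from the first by a blank)
    [0.10, 0.10, 0.80],  # unit 2
]
print(ctc_greedy_decode(scores))
```

The blank is what lets the model emit the same unit twice in a row: without it, the two occurrences of unit 1 would collapse into one.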
SLIDE 5

Encoder-Decoder Based ASR

  • XNMT-based implementation
  • Input:
      • 40-dimensional Mel-filterbank features
  • Encoder:
      • 4-layer bidirectional pyramidal encoder
  • Decoder:
      • One-layer bidirectional decoder
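
A pyramidal encoder shortens the frame sequence as it goes up the layers, which keeps attention over long audio tractable. The sketch below shows only the length-reduction step and assumes the common scheme of concatenating neighbouring frames to halve the length; the slide does not state the reduction factor, and the recurrent layers themselves are omitted.

```python
from typing import List

def pyramid_reduce(frames: List[List[float]]) -> List[List[float]]:
    """Concatenate each pair of neighbouring frames, halving the sequence length."""
    if len(frames) % 2 == 1:
        # Pad odd-length input with a zero frame so every frame has a partner.
        frames = frames + [[0.0] * len(frames[0])]
    return [frames[i] + frames[i + 1] for i in range(0, len(frames), 2)]

# Eight toy frames of 40 features each (matching the 40-dimensional input).
frames = [[float(t)] * 40 for t in range(8)]
once = pyramid_reduce(frames)
twice = pyramid_reduce(once)
print(len(frames), len(once), len(twice))  # sequence lengths
print(len(once[0]), len(twice[0]))         # feature sizes after concatenation
```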
SLIDE 6

Segmentation and Punctuation

  • Monolingual machine translation system
      • Add punctuation and case
  • Example:
      • Input:
          • i felt wor@@ se why i wro@@ te a who@@ le book
      • Output:
          • U L L. U? U L L L L
          • I felt worse. Why? I wrote a whole book
  • Preprocessing:
      • Randomly split training data and remove punctuation information
  • OpenNMT-based model
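
The example above can be reproduced by applying the label sequence to the input. This is a minimal sketch of that post-processing step, not the toolkit's code; it assumes one label per whole word (the example has nine labels for nine words after merging the `@@` units), with `U`/`L` marking upper or lower case of the first letter and any remaining characters being punctuation to append.

```python
from typing import List

def merge_bpe(text: str) -> List[str]:
    """Undo byte-pair splitting: 'wor@@ se' -> 'worse'."""
    return text.replace("@@ ", "").split()

def apply_labels(bpe_text: str, labels: str) -> str:
    """Restore case and punctuation from per-word labels such as 'U', 'L.' or 'U?'."""
    words = merge_bpe(bpe_text)
    tags = labels.split()
    if len(words) != len(tags):
        raise ValueError("need exactly one label per word")
    out = []
    for word, tag in zip(words, tags):
        case, punct = tag[0], tag[1:]
        if case == "U":
            word = word[0].upper() + word[1:]
        out.append(word + punct)
    return " ".join(out)

source = "i felt wor@@ se why i wro@@ te a who@@ le book"
labels = "U L L. U? U L L L L"
print(apply_labels(source, labels))  # I felt worse. Why? I wrote a whole book
```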
SLIDE 7

Machine Translation

  • OpenNMT-based model
      • RNN-based encoder and decoder
  • Preprocessing:
      • Tokenizer
      • Byte-pair encoding
  • Mid-size model:
      • Pre-training on all data
      • Adaptation to in-domain data using continued training
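
To illustrate the byte-pair encoding step, here is a simplified sketch of applying an already learned merge list to one word. The merge list is a made-up toy, and the sketch leaves out details of real BPE tools such as end-of-word handling; the `@@` continuation marker follows the notation of the segmentation example on the earlier slide.

```python
from typing import List, Tuple

def apply_bpe(word: str, merges: List[Tuple[str, str]]) -> List[str]:
    """Split a word into characters, then apply merges in priority order."""
    rank = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: rank.get(p, len(rank)))
        if best not in rank:
            break  # no learned merge applies any more
        merged: List[str] = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    # Mark every unit except the last as "continues in the next unit".
    return [s + "@@" for s in symbols[:-1]] + symbols[-1:]

# Hypothetical merge list, earliest (highest-priority) merge first.
toy_merges = [("w", "o"), ("wo", "r"), ("s", "e")]
print(" ".join(apply_bpe("worse", toy_merges)))  # wor@@ se
```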

SLIDE 8

Data

  • Scripts to download and preprocess default data
  • Audio:
      • TED-LIUM corpus
  • Text:
      • Small model:
          • WIT corpus
      • Mid-size model:
          • EPPS corpus
          • WIT corpus
SLIDE 9

Results

  • Evaluation tool provided that calculates four metrics
      • BLEU, TER, CharacTER, BEER
  • Automatic re-segmentation

Model      dev2010  tst2010  tst2013  tst2014
Attention    13.42    13.57    12.04    11.88
CTC 300      12.33    11.88    12.47    11.49
CTC 10K      13.04    13.44    13.41    12.58
Rover        13.98    14.08    13.73    13.23
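
TER is one of the four metrics listed. As a rough illustration of its core, the sketch below computes a word-level edit distance and normalizes it by the reference length. Real TER additionally allows block shifts, which this sketch omits, so it is a shift-free approximation and not the evaluation tool's implementation; the function names are made up for this example.

```python
from typing import List

def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Minimum number of word insertions, deletions and substitutions."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i]
        for j, r in enumerate(ref, start=1):
            cur.append(min(prev[j] + 1,               # delete h
                           cur[j - 1] + 1,            # insert r
                           prev[j - 1] + (h != r)))   # match or substitute
        prev = cur
    return prev[-1]

def shift_free_ter(hyp: str, ref: str) -> float:
    """Edits per reference word; lower is better."""
    ref_words = ref.split()
    return word_edit_distance(hyp.split(), ref_words) / len(ref_words)

# One missing word against a five-word reference.
print(shift_free_ter("i wrote whole book", "i wrote a whole book"))
```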

SLIDE 10

Conclusion

  • Combination of several toolkits to build a full speech translation toolkit
  • Easy usage:
      • Dockerized
  • Applications:
      • Apply pre-trained models
      • Train models using provided data (IWSLT)
      • Train models on own data
  • Link:
      • https://github.com/isl-mt/SLT.KIT