Open Source Toolkit for Speech to Text Translation
Thomas Zenkel, Matthias Sperber, Jan Niehues, Markus Müller, Ngoc-Quan Pham, Sebastian Stüker, Alex Waibel
Institute for Anthropomatics
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association
www.kit.edu

Motivation
• Speech translation is an interesting challenge
  • Neural models
  • End-to-End models
• Provide a baseline
  • Cascade of several models
  • Easy to extend
    • Develop models for individual parts
  • Easy to use
    • Download pretrained models

16.08.18 Jan Niehues - S2T Translation Institute for Anthropomatics

Cascade Spoken Language Translation
• Serial combination of several models
  • ASR
    • Audio → Text
  • Segmentation
    • Add case information
    • Add punctuation information
  • Machine translation
    • Source language → target language
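The serial combination above can be sketched as plain function composition. This is a minimal illustration, not the toolkit's code: `cascade`, `toy_asr`, `toy_segmenter` and `toy_mt` are hypothetical stand-ins, and the toy stages return hard-coded values where the real components would run trained models.

```python
from functools import reduce
from typing import Any, Callable, Sequence


def cascade(stages: Sequence[Callable[[Any], Any]]) -> Callable[[Any], Any]:
    """Compose stages so that each one consumes the previous stage's output."""
    def run(value: Any) -> Any:
        return reduce(lambda current, stage: stage(current), stages, value)
    return run


# Hypothetical stand-ins for the three components of the cascade.
def toy_asr(audio: Sequence[float]) -> str:
    # A real ASR model would map audio features to lowercase, unpunctuated text.
    return "hello world"


def toy_segmenter(text: str) -> str:
    # A real model would predict case and punctuation; here: capitalize, add a period.
    return text[0].upper() + text[1:] + "."


def toy_mt(text: str) -> str:
    # A real MT model would translate; here: a two-entry lookup table.
    lexicon = {"Hello": "Hallo", "world.": "Welt."}
    return " ".join(lexicon.get(token, token) for token in text.split())


pipeline = cascade([toy_asr, toy_segmenter, toy_mt])
print(pipeline([0.0, 0.1, -0.1]))
```

Because each stage only depends on the previous stage's output type, any single component can be swapped without touching the others, which is the extensibility argument made for the cascade.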

CTC-based ASR
• Input:
  • 40-dimensional Mel-filterbank features
• Output:
  • Byte-pair units (300 or 10000)
• Model:
  • 4-layer Bi-LSTM
  • Softmax layer
  • Trained using CTC loss function
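A model trained with the CTC loss emits one label distribution per frame, including a special blank label, so its output has to be collapsed into a unit sequence. The slides do not say which decoding strategy the toolkit uses; the sketch below shows greedy best-path decoding, the simplest option, and assumes that index 0 is the blank.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumption: the blank label has index 0


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Pick the best label per frame, merge repeats, then drop blanks."""
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    collapsed = [label for label, _ in groupby(best_path)]
    return [label for label in collapsed if label != blank]


# Seven frames over a toy vocabulary of {blank, unit 1, unit 2}.
frames = [
    [0.1, 0.8, 0.1],    # unit 1
    [0.2, 0.7, 0.1],    # unit 1 (repeat, merged)
    [0.9, 0.05, 0.05],  # blank (separates the two occurrences of unit 1)
    [0.1, 0.6, 0.3],    # unit 1
    [0.1, 0.2, 0.7],    # unit 2
    [0.0, 0.3, 0.7],    # unit 2 (repeat, merged)
    [0.8, 0.1, 0.1],    # blank
]
decoded = ctc_greedy_decode(frames)
print(decoded)
```

In the setting of the slide, the decoded indices would refer to byte-pair units, which are then joined into words.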

Encoder-Decoder Based ASR
• XNMT-based implementation
• Input:
  • 40-dimensional Mel-filterbank features
• Encoder:
  • 4-layer bidirectional pyramidal encoder
• Decoder:
  • One-layer bidirectional decoder
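A pyramidal encoder shortens the frame sequence between layers so that the attention mechanism has fewer positions to cover. The slides do not state how the reduction is done in this system; the sketch below shows one common variant, concatenating each pair of adjacent frames, with zero-padding for odd lengths as an additional assumption. `pyramid_reduce` is a hypothetical helper name.

```python
from typing import List


def pyramid_reduce(frames: List[List[float]]) -> List[List[float]]:
    """Concatenate adjacent frame pairs: half the length, twice the dimension."""
    if len(frames) % 2 == 1:
        # Assumption: pad an odd-length sequence with one all-zero frame.
        frames = frames + [[0.0] * len(frames[0])]
    return [frames[i] + frames[i + 1] for i in range(0, len(frames), 2)]


# Five frames of dimension 2 become three frames of dimension 4.
sequence = [[float(t), float(t) + 0.5] for t in range(5)]
reduced = pyramid_reduce(sequence)
print(len(sequence), "->", len(reduced), "frames of dimension", len(reduced[0]))
```

In a real encoder the recurrent layers sit between such reduction steps; only the length bookkeeping is shown here.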

Segmentation and Punctuation
• Monolingual machine translation system
  • Add punctuation and case
• Example:
  • Input: i felt wor@@ se why i wro@@ te a who@@ le book
  • Output: U L L. U? U L L L L
    • I felt worse. Why? I wrote a whole book
• Preprocessing:
  • Randomly split training data and remove punctuation information
• OpenNMT-based model
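The example above can be reproduced with a few lines of post-processing. This sketch infers the label scheme from the slide's example only: `U` is read as "capitalize the first letter", `L` as "leave lowercase", and any trailing characters as punctuation to append; one label per word after the `@@` byte-pair pieces are merged. The helper names are hypothetical.

```python
from typing import List


def merge_bpe(tokens: List[str], marker: str = "@@") -> List[str]:
    """Join byte-pair pieces: a token ending in the marker continues in the next one."""
    words, prefix = [], ""
    for token in tokens:
        if token.endswith(marker):
            prefix += token[:-len(marker)]
        else:
            words.append(prefix + token)
            prefix = ""
    if prefix:  # dangling piece at the end of the input
        words.append(prefix)
    return words


def apply_labels(words: List[str], labels: List[str]) -> str:
    """Apply one case/punctuation label (e.g. 'U', 'L.', 'U?') to each word."""
    if len(words) != len(labels):
        raise ValueError("need exactly one label per word")
    restored = []
    for word, label in zip(words, labels):
        case, punctuation = label[0], label[1:]
        if case == "U":
            word = word[:1].upper() + word[1:]
        restored.append(word + punctuation)
    return " ".join(restored)


asr_output = "i felt wor@@ se why i wro@@ te a who@@ le book".split()
predicted = "U L L. U? U L L L L".split()
print(apply_labels(merge_bpe(asr_output), predicted))
```

Predicting compact labels rather than the full surface text keeps the output vocabulary of the monolingual translation system small.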

Machine Translation
• OpenNMT-based model
  • RNN-based Encoder and Decoder
• Preprocessing:
  • Tokenizer
  • Byte-pair encoding
• Mid-size model:
  • Pre-training on all data
  • Adaptation to in-domain data using continued training

Data
• Scripts to download and preprocess default data
• Audio:
  • TED LIUM corpus
• Text:
  • Small model:
    • WIT corpus
  • Midsize model:
    • EPPS corpus
    • WIT corpus

Results
• Evaluation tool to calculate 4 metrics provided
  • BLEU, TER, CharacTER, BEER
• Automatic re-segmentation

Model       dev2010   tst2010   tst2013   tst2014
Attention   13.42     13.57     12.04     11.88
CTC 300     12.33     11.88     12.47     11.49
CTC 10K     13.04     13.44     13.41     12.58
Rover       13.98     14.08     13.73     13.23
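To read the table more easily, the margin of the Rover row over the best single system on each test set can be computed directly from the reported numbers. The slide does not name the metric behind the scores or describe how the Rover system is built, so the sketch below treats the values as opaque scores where higher is better (an assumption); `rover_gain` is a hypothetical helper.

```python
from typing import Dict

# Scores copied from the results table.
scores: Dict[str, Dict[str, float]] = {
    "Attention": {"dev2010": 13.42, "tst2010": 13.57, "tst2013": 12.04, "tst2014": 11.88},
    "CTC 300":   {"dev2010": 12.33, "tst2010": 11.88, "tst2013": 12.47, "tst2014": 11.49},
    "CTC 10K":   {"dev2010": 13.04, "tst2010": 13.44, "tst2013": 13.41, "tst2014": 12.58},
    "Rover":     {"dev2010": 13.98, "tst2010": 14.08, "tst2013": 13.73, "tst2014": 13.23},
}


def rover_gain(table: Dict[str, Dict[str, float]],
               combined: str = "Rover") -> Dict[str, float]:
    """Difference between the combined row and the best other row, per test set."""
    gains = {}
    for test_set, combined_score in table[combined].items():
        best_single = max(row[test_set]
                          for name, row in table.items() if name != combined)
        gains[test_set] = round(combined_score - best_single, 2)
    return gains


gains = rover_gain(scores)
for test_set, gain in gains.items():
    print(f"{test_set}: +{gain:.2f}")
```

The Rover row is ahead of every single system on all four sets, while the best single system differs between sets (Attention on dev2010 and tst2010, CTC 10K on tst2013 and tst2014).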

Conclusion
• Combination of several toolkits to build full speech translation toolkit
• Easy usage:
  • Dockerized
• Applications
  • Apply pre-trained models
  • Train models using provided data (IWSLT)
  • Train models on own data
• Link:
  • https://github.com/isl-mt/SLT.KIT
