Technology for Video Translation Susanne Weber Language Technology - - PowerPoint PPT Presentation

SLIDE 1

The BBC’s ‘Virtual Voice-over tool’ ALTO: Technology for Video Translation

Susanne Weber

Language Technology Producer, BBC News Labs

SLIDE 2

In this presentation:

  • Overview of the ALTO pilot project
  • Machine Translation and Computer Assisted Translation
  • Text to Speech synthesis
  • Users’ experience with this technology
  • Conclusions
SLIDE 3

A production tool for translating news videos. A collaboration between:

  • News Labs
  • World Service
  • Global News
SLIDE 4

Go to http://www.bbc.com/japanese/video_and_audio/today_in_video and http://www.bbc.com/russian/video_and_audio/today_in_video

SLIDE 5

We experimented with two types of news videos:

  • Short clips without an original narrator track
  • News packages containing several voices
SLIDE 6

How do we currently translate videos?

SLIDE 7

Typical Workflow for Video Translation

  • Translate Script
  • Record Voice-over tracks
  • Edit Audio
  • Align Audio & Video
  • Balance Audio Tracks
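
Of these steps, aligning the voice-over with the video is the most mechanical. Below is a minimal sketch of the arithmetic involved. It assumes each translated segment has a time slot taken from the original video and a known audio duration. `Segment`, `fit_segments` and the 1.2× speed-up limit are hypothetical illustrations, not ALTO's actual logic.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical voice-over segment: its slot in the video and its audio length."""
    start: float         # slot start in the video, seconds
    end: float           # slot end in the video, seconds
    tts_duration: float  # length of the synthesized or recorded audio, seconds


def fit_segments(segments, max_speedup=1.2):
    """Return (start, playback_rate, fits) for each segment.

    playback_rate > 1.0 means the audio must be sped up to fit its slot.
    fits is False when even max_speedup (a hypothetical limit) is not enough.
    """
    plan = []
    for seg in segments:
        slot = seg.end - seg.start
        if slot <= 0:
            raise ValueError(f"empty slot at {seg.start}s")
        # Never slow audio down: a segment shorter than its slot plays at 1.0.
        rate = max(1.0, seg.tts_duration / slot)
        plan.append((seg.start, round(rate, 3), rate <= max_speedup))
    return plan
```

Segments that cannot fit even at the maximum rate come back with `fits=False`. For those, an editor would shorten the translation rather than speed the voice up further.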

SLIDE 8
SLIDE 9

Off-the-shelf products

SLIDE 10

Computer-Assisted Translation

SLIDE 11

Computer-Assisted Translation

How good is it?

SLIDE 12

To put things into perspective…

  • ca. 7,000 languages in the world
  • Google Translate lists just over 100 languages
  • Most TTS providers have fewer than 30 languages
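
Taking those round figures at face value, Google Translate covers only about 1.4% of the world's languages. A typical TTS provider covers under 0.5%:

```latex
\frac{100}{7000} \approx 1.4\,\% \quad \text{(machine translation)},
\qquad
\frac{30}{7000} \approx 0.43\,\% \quad \text{(TTS, per provider, at most)}
```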
SLIDE 13

Machine Translation – Computer Assisted Translation

High-Resourced vs. Low-Resourced Languages

  • MT quality depends on the language pair and on the source text

Our editors’ feedback:

  • CAT is still faster than translating from scratch
  • CAT is useful for proofreading
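
One way to track the editors' impression that CAT is "still faster" is to measure how much of the MT output they change, per language pair. Below is a minimal sketch, assuming both the raw MT output and the editor's final script are kept. It computes a word-level edit rate in the spirit of HTER (human-targeted translation edit rate), but without TER's block shifts. The function names are hypothetical.

```python
def word_edit_distance(hyp: str, ref: str) -> int:
    """Minimum word insertions, deletions and substitutions turning hyp into ref."""
    h, r = hyp.split(), ref.split()
    # prev[j] = distance between the first i-1 words of h and the first j words of r
    prev = list(range(len(r) + 1))
    for i in range(1, len(h) + 1):
        cur = [i] + [0] * len(r)
        for j in range(1, len(r) + 1):
            cost = 0 if h[i - 1] == r[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete h[i-1]
                         cur[j - 1] + 1,      # insert r[j-1]
                         prev[j - 1] + cost)  # keep or substitute
        prev = cur
    return prev[-1]


def post_edit_rate(mt_output: str, post_edited: str) -> float:
    """Edits per word of the post-edited text (0.0 = MT output kept unchanged)."""
    ref_len = len(post_edited.split())
    return word_edit_distance(mt_output, post_edited) / max(ref_len, 1)
```

For example, correcting "monday" to "Monday" in a five-word sentence gives a rate of 0.2. Averaging the rate per language pair would show where MT helps most.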

SLIDE 14
SLIDE 15
SLIDE 16
  • It is difficult to get good-quality voices. Why is that?
  • Currently, we depend on a small number of companies
  • Why do some of them sound so natural while others don’t?
  • Why can’t we have them in all languages?
SLIDE 17

There are two common methods for voice synthesis:

  • Unit Selection
  • Statistical Parametric

SLIDE 18

Creating synthetic voices: Unit Selection

  • Scripts (phonemes etc.) to generate utterance data: “blah … blah…”
  • Record voice
  • Pronunciation lexicon and word labels
  • Utterance files
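
Recording scripts for unit selection voices are usually chosen so that a limited number of sentences covers as many sound combinations as possible. Below is a minimal sketch of one common approach, greedy selection for diphone (phone-pair) coverage. It assumes each candidate sentence has already been converted to phones with the pronunciation lexicon. The function names and the toy letter-based "phones" are hypothetical.

```python
def diphones(phones):
    """Adjacent phone pairs, e.g. ['h', 'e', 'l'] -> {('h', 'e'), ('e', 'l')}."""
    return {(a, b) for a, b in zip(phones, phones[1:])}


def select_script(candidates, max_sentences):
    """Greedily pick sentences that add the most not-yet-covered diphones.

    candidates: dict mapping sentence text -> list of phones.
    Returns the chosen sentences in selection order.
    """
    covered = set()
    chosen = []
    remaining = dict(candidates)
    while remaining and len(chosen) < max_sentences:
        # Sentence contributing the largest number of new diphones
        # (ties go to the earliest candidate).
        best = max(remaining, key=lambda s: len(diphones(remaining[s]) - covered))
        gain = diphones(remaining[best]) - covered
        if not gain:
            break  # nothing new left to cover
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen
```

Greedy selection is not guaranteed to find the smallest covering script. It is simple, though, and each pick is easy to explain to the voice talent and the producers.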

SLIDE 19

Text-To-Speech Synthesis: Unit Selection

  • Input text
  • NLP: produce a linguistic specification (pronunciation lexicon; prosody, stress, duration)
  • Select phonemes from the utterance files
  • Concatenate waveforms (overlap / crossfade)
  • Output (spoken text)
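
The "select" and "concatenate" steps are where unit selection does its work. Below is a minimal sketch, not ALTO's or any vendor's implementation. A Viterbi search picks one recorded unit per position by minimising two costs summed together. The target cost measures how well a unit matches the linguistic specification, and the join cost measures how smoothly neighbouring units connect. A linear crossfade then joins the waveforms. Representing units as (phone, pitch) tuples, the cost functions and all names are hypothetical.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Viterbi search for the candidate sequence with the lowest total cost.

    targets:    one target specification per position
    candidates: candidates[i] lists the recorded units usable at position i
                (e.g. only units of the right phoneme)
    Returns one chosen unit per position.
    """
    # best[i][k] = (cost of cheapest path ending in candidates[i][k], backpointer)
    best = [[(target_cost(targets[0], u), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            # Cheapest way to reach u from any unit at position i - 1.
            k, cost = min(
                ((j, best[i - 1][j][0] + join_cost(p, u))
                 for j, p in enumerate(candidates[i - 1])),
                key=lambda jc: jc[1],
            )
            row.append((cost + target_cost(targets[i], u), k))
        best.append(row)
    # Start from the cheapest final unit and follow the backpointers.
    k = min(range(len(best[-1])), key=lambda j: best[-1][j][0])
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][k])
        k = best[i][k][1]
    return path[::-1]


def crossfade_concat(a, b, overlap):
    """Join two waveforms (lists of samples) with a linear crossfade."""
    overlap = max(0, min(overlap, len(a), len(b)))
    fade = []
    for n in range(overlap):
        w = (n + 1) / (overlap + 1)  # weight of b rises from near 0 to near 1
        fade.append(a[len(a) - overlap + n] * (1 - w) + b[n] * w)
    return a[:len(a) - overlap] + fade + b[overlap:]


# Toy units: (phone, pitch in Hz). Costs penalise pitch mismatch and pitch jumps.
units = select_units(
    targets=[100, 100],
    candidates=[[("k", 100), ("k", 110)], [("a", 140), ("a", 112)]],
    target_cost=lambda t, u: abs(t - u[1]),
    join_cost=lambda p, u: 2 * abs(p[1] - u[1]),
)
print(units)  # [('k', 110), ('a', 112)]
```

In the toy example, picking the best-matching unit at each position on its own would choose ("k", 100). The search instead takes ("k", 110) because it joins far more smoothly with ("a", 112). Trading match quality against smooth joins in this way is what keeps concatenated speech from sounding choppy.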

SLIDE 20

Unit Selection – Audio Examples

Japanese:

SLIDE 21
Unit Selection – User Feedback

  • It sounds surprisingly natural
  • …but what is “natural”? There is no objective measurement of “naturalness”; it is subjective
  • …and are accents “natural”? Scottish? Welsh? If they sound human-like, they count as “natural”
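
Because naturalness has no objective measure, TTS evaluations commonly average subjective judgements instead. Listeners rate each sample on a 1 to 5 scale, and the mean opinion score (MOS) is reported. Below is a minimal sketch, assuming a plain list of ratings. The slides do not say the team ran such a test, and the function name is hypothetical.

```python
import statistics


def mean_opinion_score(ratings):
    """Mean of 1-5 listener ratings with an approximate 95% confidence half-width."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mean = statistics.mean(ratings)
    # Normal approximation (1.96); a t-distribution is more accurate for few ratings.
    half_width = 1.96 * statistics.stdev(ratings) / len(ratings) ** 0.5
    return mean, half_width
```

Reporting the interval alongside the mean makes it clear when two voices, or two accents, cannot really be told apart by the listeners.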

SLIDE 22

Unit Selection – Limitations

  • TTS voices are emotionally neutral
  • This is good for ‘regular’ news
  • It is unsuitable for emotionally charged content, e.g. when voicing over victims of bomb attacks
  • In Unit Selection, we have no control over the voices’ emotional expression

SLIDE 23

Unit Selection – Phonetic performance control / Limitations

Pros / cons

Spelling → Audio (English, UK)

  • Angela Merkel → Ang ella Markel
  • Vladimir Putin → Vladimeer Pootin
  • Francois Hollande → Francois O’Lond
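
Respellings like these can be kept in a house lexicon and applied automatically before the text reaches the voice. Below is a minimal sketch, assuming the TTS engine accepts SSML: the W3C SSML `<sub alias="...">` element tells the engine to speak the alias instead of the written text. `RESPELLINGS` and `to_ssml` are hypothetical, and the slides do not say how ALTO stores its respellings.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical house lexicon: written name -> respelling for an English (UK) voice.
RESPELLINGS = {
    "Angela Merkel": "Ang ella Markel",
    "Vladimir Putin": "Vladimeer Pootin",
    "Francois Hollande": "Francois O’Lond",
}


def to_ssml(text, respellings=RESPELLINGS):
    """Return SSML in which every known name is spoken as its respelling."""
    if not respellings:
        return f"<speak>{escape(text)}</speak>"
    # Longest names first, so a full name wins over a shorter entry it contains.
    names = sorted(respellings, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    pieces, last = [], 0
    for match in pattern.finditer(text):
        pieces.append(escape(text[last:match.start()]))  # plain text, XML-escaped
        name = match.group(0)
        pieces.append(f"<sub alias={quoteattr(respellings[name])}>{escape(name)}</sub>")
        last = match.end()
    pieces.append(escape(text[last:]))
    return "<speak>" + "".join(pieces) + "</speak>"
```

The sketch emits a bare `<speak>` root, which many engines accept; strict SSML 1.1 also expects `version` and `xmlns` attributes. Because the written name stays in the markup, the same script can still be used for subtitles.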

SLIDE 24

Training of Models: Statistical Parametric (simplified)

  • Speech Database: Speech Signal plus Text / Words (LABELS)
  • Speech Signal → Excitation Parameter Extraction and Spectral Parameter Extraction
  • Extracted parameters + LABELS → Training of TTS models (Hidden Markov Models)
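
In HMM-based systems, the excitation parameters are typically the fundamental frequency (F0) of each frame. The spectral parameters are something like mel-cepstral coefficients. Below is a minimal sketch of one simple F0 extractor, autocorrelation on a single frame. Production systems use more robust pitch trackers, and the function name, search range and frame size here are assumptions.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate the fundamental frequency (Hz) of one frame by autocorrelation.

    Returns None when no lag in the search range gives a positive
    correlation (e.g. a silent frame).
    """
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalised autocorrelation: shorter lags sum more terms, which
        # favours the true period over its multiples (fewer octave errors).
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_r, best_lag = r, lag
    return None if best_lag is None else sample_rate / best_lag


# Usage: a 40 ms frame of a 200 Hz sine sampled at 16 kHz.
frame = [math.sin(2 * math.pi * 200 * n / 16000) for n in range(640)]
print(estimate_f0(frame, 16000))  # 200.0
```

Run frame by frame over a recording, this yields the F0 track that the HMMs learn to predict from the labels.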

SLIDE 25

Voice Synthesis: Statistical Parametric (simplified)

  • Input text
  • Convert into Label Sequence
  • Construct Utterances by concatenating context-dependent Hidden Markov Models
  • Generate Excitation and Spectral Parameters
  • Synthesized Speech
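
Once the context-dependent HMMs are concatenated, their states must be turned into a sequence of parameter frames that a vocoder can synthesize. Below is a minimal sketch of the simplest version. It assumes each state has been reduced to a mean duration and a mean parameter vector, and it uses a 5 ms frame shift chosen for illustration. Both inputs and the function name are hypothetical.

```python
def generate_trajectory(states, frame_shift_ms=5.0):
    """Expand concatenated HMM states into frame-by-frame parameter vectors.

    states: (duration_ms, mean_vector) per state, in utterance order.
    With no dynamic (delta) features, the most likely output of a Gaussian
    state is its mean, so each mean is simply repeated for the state's
    duration. Typical HMM-based systems add delta features and solve for
    a smooth trajectory instead of this step-wise one.
    """
    frames = []
    for duration_ms, mean in states:
        n_frames = max(1, round(duration_ms / frame_shift_ms))
        for _ in range(n_frames):
            frames.append(list(mean))  # copy, so frames can be edited independently
    return frames
```

The step-wise output makes the role of the models visible: everything the voice says is generated from the statistics stored in its states rather than cut from recordings.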

SLIDE 26

Statistical parametric TTS – the good bits

  • It is flexible, because of its statistical modelling process
  • It allows expressive voices to be generated: the emotional expression of voices can be controlled
  • Voices are easier to build, because it doesn’t need large datasets; this is good for low-resourced languages
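
Emotional expression can be controlled partly because the voice is a set of model parameters rather than a fixed set of recordings, so parameters can be blended. Below is a minimal sketch of one known technique: interpolating the means of a neutral and an expressive model with a single weight. Fuller approaches may also interpolate variances and durations, and this is not necessarily how any vendor or ALTO does it. The dictionary layout and function name are hypothetical.

```python
def interpolate_style(neutral, expressive, weight):
    """Blend the Gaussian means of two voice models with the same states.

    neutral, expressive: dict state_id -> list of mean parameters
    weight: 0.0 = fully neutral, 1.0 = fully expressive
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    if neutral.keys() != expressive.keys():
        raise ValueError("models must share the same states")
    return {
        s: [(1 - weight) * a + weight * b for a, b in zip(neutral[s], expressive[s])]
        for s in neutral
    }
```

A single weight like this is what would let an editor dial a voice from neutral towards sombre for a sensitive story. Unit selection offers no equivalent control.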
SLIDE 27

Statistical parametric TTS – the sound

Audio examples (Japanese): Unit Selection vs. HMM

Please go to this link:

http://www.ai-j.jp/

SLIDE 28

Conclusion and Next Steps:

  • We need language data for low-resourced languages, for MT as well as for TTS
  • We need more languages and voices to be available
  • We need expressive voices (e.g. a hybrid system)
  • Collaborate with research groups and universities
  • We want to tackle Graphics Translation
  • And integrate automated transcription