
SDS Applications - Speech-to-speech translation - Anca Burducea, May 28, 2015



  1. SDS Applications - Speech-to-speech translation - Anca Burducea, May 28, 2015

  2. S2S Translation. Three independent tasks: S_s → T_s → T_t → S_t. S_s = speech source, T_s = text source, T_t = text target, S_t = speech target.

  3. S2S Translation. S_s → T_s = ASR; T_s → T_t = MT; T_t → S_t = TTS. Example: "Wo ist das nächste Hotel?" (source) → "Where is the nearest hotel?" (target). S_s = speech source, T_s = text source, T_t = text target, S_t = speech target.

  4. S2S Translation. S_s → T_s = ASR, evaluated with WER; T_s → T_t = MT, evaluated with BLEU; T_t → S_t = TTS, evaluated with subjective listening tests. Example: "Wo ist das nächste Hotel?" (source) → "Where is the nearest hotel?" (target). S_s = speech source, T_s = text source, T_t = text target, S_t = speech target.
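To make the cascade and its per-stage evaluation concrete, here is a minimal Python sketch. The asr, mt and tts callables are hypothetical placeholders (not any particular toolkit); only the WER computation is implemented, as word-level edit distance divided by reference length.

```python
# Sketch of the ASR -> MT -> TTS cascade and a WER computation.
# asr(), mt() and tts() are hypothetical placeholders for real components.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference length."""
    r, h = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / max(len(r), 1)

def s2s_translate(source_speech, asr, mt, tts):
    """Cascade: speech -> source text (ASR) -> target text (MT) -> target speech (TTS)."""
    t_s = asr(source_speech)   # evaluated with WER
    t_t = mt(t_s)              # evaluated with BLEU
    return tts(t_t)            # evaluated with subjective listening tests

print(wer("wo ist das nächste hotel", "wo ist das nächste hotelbar"))  # 0.2
```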

  5. S2S Translation - Issues ◮ error propagation ◮ context not used by downstream components

  6. Annotations of Speech A lot of contextual annotation is available for speech: ◮ dialog act (DA) tags ◮ semantic annotation ◮ pitch prominence ◮ emphasis ◮ contrast ◮ emotion ◮ speaker segmentation

  7. Sridhar 2013 Enrich S2S translations using contextual information!

  8. Sridhar 2013 Enrich S2S translations using contextual information! ◮ DA tags ◮ prosodic word prominence

  9. Sridhar 2013 Enrich S2S translations using contextual information! ◮ DA tags ◮ prosodic word prominence Purpose: ◮ resolve ambiguities ◮ wir haben noch → "we still have" vs. "we have another", depending on which word is prosodically prominent

  10. Sridhar 2013 Enrich S2S translations using contextual information! ◮ DA tags ◮ prosodic word prominence Purpose: ◮ resolve ambiguities ◮ wir haben noch → "we still have" vs. "we have another", depending on which word is prosodically prominent ◮ enrich target speech with prosody (intonation, emotion) from source speech

  11. Sridhar 2013 S_s = speech source, T_s = text source, T_t = text target, S_t = speech target, L_s = enriched source = text source + context labels, L_t = enriched target = text target + context labels

  12. Sridhar 2013 S_s = speech source, T_s = text source, T_t = text target, S_t = speech target, L_s = enriched source = text source + context labels, L_t = enriched target = text target + context labels

  13. Sridhar 2013 Data ◮ train MaxEnt classifier for ◮ DA tagging: statement, acknowledgment, abandoned, agreement, question, appreciation, other – 82.9% ◮ prosodic prominence: accent, no-accent – 78.5%

  14. Sridhar 2013 Data ◮ train MaxEnt classifier for ◮ DA tagging: statement, acknowledgment, abandoned, agreement, question, appreciation, other – 82.9% ◮ prosodic prominence: accent, no-accent – 78.5% ◮ tested on three parallel corpora: Farsi-English, Japanese-English, Chinese-English
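As a rough illustration of this kind of classification setup (not the authors' actual features, labels or data), a MaxEnt-style tagger can be approximated with multinomial logistic regression over bag-of-words features, e.g. with scikit-learn; the utterances and tags below are invented placeholders.

```python
# Minimal sketch of a MaxEnt-style DA tagger, assuming scikit-learn.
# The training utterances and labels are invented placeholders, not the
# dialog data used in the paper.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

utterances = [
    "okay that sounds good",         # agreement
    "uh-huh",                        # acknowledgment
    "where is the nearest hotel",    # question
    "we still have two rooms left",  # statement
]
da_tags = ["agreement", "acknowledgment", "question", "statement"]

# MaxEnt == multinomial logistic regression, here over word/bigram counts.
tagger = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                       LogisticRegression(max_iter=1000))
tagger.fit(utterances, da_tags)

# Likely "question", since the input shares words with the question example.
print(tagger.predict(["is there a hotel nearby"]))
```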

  15. Sridhar 2013 Improve translation model using source language enrichment: ◮ bag-of-words model ◮ reorder words according to target language model
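A toy sketch of the reordering idea, under the assumption that a target language model scores candidate word orders: translate the source as a bag of words, then keep the permutation the LM scores highest. The bigram scores below are invented stand-ins for a trained LM, and a real system would use an n-gram or neural LM with beam search rather than brute-force enumeration.

```python
# Toy "reorder words according to a target language model" sketch.
from itertools import permutations

def lm_score(words, bigram_logprob):
    """Sum of bigram log-probabilities; bigram_logprob is an invented lookup table."""
    total = 0.0
    for a, b in zip(["<s>"] + list(words), list(words) + ["</s>"]):
        total += bigram_logprob.get((a, b), -10.0)  # penalty for unseen bigrams
    return total

bag = ["hotel", "the", "nearest", "where", "is"]
# Invented bigram scores standing in for a trained target LM.
bigram_logprob = {("<s>", "where"): -1, ("where", "is"): -1, ("is", "the"): -1,
                  ("the", "nearest"): -1, ("nearest", "hotel"): -1,
                  ("hotel", "</s>"): -1}

best = max(permutations(bag), key=lambda p: lm_score(p, bigram_logprob))
print(" ".join(best))  # "where is the nearest hotel"
```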

  16. Sridhar 2013 Improve translation model using source language enrichment: ◮ bag-of-words model ◮ reorder words according to target language model Improve translation model using target language enrichment: ◮ factored model: each word is translated into a (word, pitch accent) pair
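For the factored model, a common convention (e.g. in Moses-style factored MT) is to write each token as word|factor. The sketch below only shows how text might be annotated with an invented pitch-accent factor before being handed to such a system; it is not the paper's actual pipeline.

```python
# Sketch of a factored token representation: (surface word, pitch-accent factor)
# written as word|factor.  The accent values here are invented for illustration.

def to_factored(words, accents):
    """Join each word with its accent factor, e.g. 'hotel|accent'."""
    return " ".join(f"{w}|{a}" for w, a in zip(words, accents))

words = ["where", "is", "the", "nearest", "hotel"]
accents = ["no-accent", "no-accent", "no-accent", "accent", "accent"]
print(to_factored(words, accents))
# where|no-accent is|no-accent the|no-accent nearest|accent hotel|accent
```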

  17. Sridhar 2013 - Results DA tags ◮ question (YN, WH, open), acknowledgement → significant improvement ◮ statement → no significant improvement

  18. Sridhar 2013 - Results DA tags ◮ question (YN, WH, open), acknowledgement → significant improvement ◮ statement → no significant improvement Prosody ◮ improved prosodic accuracy of target speech ◮ lexical selection accuracy not affected (same BLEU)

  19. Sridhar 2013 - Results DA tags ◮ question (YN, WH, open), acknowledgement → significant improvement ◮ statement → no significant improvement Prosody ◮ improved prosodic accuracy of target speech ◮ lexical selection accuracy not affected (same BLEU) Conclusion: "the real benefits of such a scheme would be manifested through human evaluations. We are currently working on conducting subjective evaluations."

  20. VERBMOBIL ◮ German S2S system developed between 1993 and 2000 ◮ "verbal communication with foreign interlocutors in mobile situations" ◮ "Verbmobil is the first speech-only dialog translation system" ◮ bidirectional translations for German, English, Japanese ◮ business-oriented domains: 1. appointment scheduling 2. travel planning 3. remote PC maintenance

  21. VERBMOBIL features ◮ context-sensitive translations, e.g. GER nächste → ENG next (train) or nearest (hotel) ◮ prosody, e.g. "wir haben noch" disambiguated by which word is emphasized ◮ domain knowledge: it knows "things about the topic being discussed" ◮ dialog memory: it knows "things that were communicated earlier" ◮ disfluency handling: 1. filters out simple disfluencies ("ahh", "umm") 2. removes the reparandum
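As an illustration of the first disfluency step only, a minimal filled-pause filter might look like the sketch below; the regular expression is an invented approximation, and removing reparanda (self-corrections) requires real disfluency detection, which is not shown.

```python
# Toy filter for simple filled pauses ("ahh", "umm") using a regex.
# The pattern is an illustrative approximation, not Verbmobil's actual method.
import re

FILLED_PAUSE = re.compile(r"\b(?:a+h+|u+m+|u+h+|er+m*)\b\s*", flags=re.IGNORECASE)

def strip_filled_pauses(utterance: str) -> str:
    """Drop filled pauses, then collapse any leftover whitespace."""
    return re.sub(r"\s+", " ", FILLED_PAUSE.sub("", utterance)).strip()

print(strip_filled_pauses("umm I would like to uhh book a room"))
# "I would like to book a room"
```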

  22. VERBMOBIL - Disambiguation

  23. VERBMOBIL - Control Panel Demo: Link
