Speech Processing 15-492/18-492 Speech Translation


  1. Speech Processing 15-492/18-492 Speech Translation

  2. Speech Translation
     - Three part systems
       - ASR -> Translation -> TTS
     - System configurations
       - One way – phrasal
       - One way – broadcast/lecture
       - 1.5 way – phrasal with limited answers
       - Two way – full two way
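
The three part chain on this slide can be sketched as plain function composition. This is a minimal sketch, not a real system: `make_speech_translator` and the three `toy_*` stages are hypothetical names introduced here, and the toy stages simply treat the audio bytes as text so the example runs without any speech engine.

```python
from typing import Callable

# Hypothetical stage types; real ASR, MT and TTS engines would stand in here.
Recognizer = Callable[[bytes], str]    # audio -> source-language text
Translator = Callable[[str], str]      # source text -> target text
Synthesizer = Callable[[str], bytes]   # target text -> audio


def make_speech_translator(asr: Recognizer, mt: Translator,
                           tts: Synthesizer) -> Callable[[bytes], bytes]:
    """Chain the three parts: ASR -> Translation -> TTS."""
    def translate_speech(audio: bytes) -> bytes:
        source_text = asr(audio)
        target_text = mt(source_text)
        return tts(target_text)
    return translate_speech


# Toy stand-ins (assumptions for illustration only).
def toy_asr(audio: bytes) -> str:
    return audio.decode("utf-8")       # pretend the audio is already text


def toy_mt(text: str) -> str:
    # Unknown input passes through untranslated.
    return {"hello": "hola"}.get(text.lower(), text)


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")        # pretend the bytes are a waveform


one_way = make_speech_translator(toy_asr, toy_mt, toy_tts)
print(one_way(b"hello"))
```

A two way configuration would, under the same assumptions, be two such chains, one per direction.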

  3. Machine Translation Technologies
     - Phrasal
       - Phrase to phrase look up
     - Template:
       - Template fillers, fixed translation
     - Interlingua
       - Translation into meaning representation
     - Statistical Machine Translation
       - From large collections of parallel text
     - Classification-based translation
       - Identify classes and deal directly with them

  4. Choices in Translation
     - Choose any two …
       - High accuracy
       - Large vocabulary
       - Fully automatic
     - Speech vs Text
       - Speech less clear than text
       - Less speech to train from
       - Needs to be real-time (probably)

  5. Simple Translation
     - Phrase to Phrase
       - Greetings
       - Do you need medical attention?
       - Relatively easy to build, but limited use
     - Template translations
       - The next train leaves at TIME from gate GATE from PLACE
     - Limited but still useful
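
Both ideas on this slide fit in a few lines. The sketch below assumes English to Spanish. The Spanish strings, the `PHRASES` and `TEMPLATES` tables and the `translate` function are illustrative choices made here, not taken from the slides. A phrase is looked up whole. A template matches a fixed sentence and copies the TIME and GATE fillers into a fixed translation.

```python
import re
from typing import Optional

# Phrase-to-phrase table (illustrative entries).
PHRASES = {
    "hello": "hola",
    "do you need medical attention?": "¿Necesita atención médica?",
}

# Template: fixed source pattern with slots, fixed target with the same slots.
TEMPLATES = [
    (
        re.compile(
            r"^the next train leaves at (?P<TIME>\S+) from gate (?P<GATE>\S+)$",
            re.IGNORECASE,
        ),
        "El próximo tren sale a las {TIME} de la puerta {GATE}",
    ),
]


def translate(utterance: str) -> Optional[str]:
    """Return a translation, or None when the input is not covered."""
    text = utterance.strip()
    phrase = PHRASES.get(text.lower())
    if phrase is not None:
        return phrase
    for pattern, target in TEMPLATES:
        match = pattern.match(text)
        if match:
            # Fillers are copied over untranslated.
            return target.format(**match.groupdict())
    return None


print(translate("Do you need medical attention?"))
print(translate("The next train leaves at 5:30 from gate 12"))
print(translate("Where is the library?"))
```

The last call returns `None`, which is the "limited use" part: anything outside the table or the templates is simply not translated.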

  6. Interlingua
     - Translate sentences into standard form
     - Generate sentences from standard form
     - PROS:
       - Can do multiple languages easily
       - Can be very accurate
     - CONS
       - Designing universal interlingua is very hard
       - Doesn’t do well when out of domain
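
A toy version of the interlingua idea: one analyzer maps English into a small meaning frame, and each target language only needs a generator from that frame. The frame layout (`act`, `object`), the function names and the Spanish and French strings are all assumptions made for this sketch, not a real interlingua design.

```python
from typing import Optional

Frame = dict  # hypothetical meaning representation, e.g. {"act": "greet"}


def analyze_english(sentence: str) -> Optional[Frame]:
    """Translate a sentence into the standard form, or None if out of domain."""
    text = sentence.lower().strip(" .!?")
    if text in ("hello", "hi"):
        return {"act": "greet"}
    if text.startswith("i need a "):
        return {"act": "request", "object": text[len("i need a "):]}
    return None


# Per-language generation resources (illustrative).
NOUNS = {"es": {"doctor": "un médico"}, "fr": {"doctor": "un médecin"}}
GREETINGS = {"es": "Hola", "fr": "Bonjour"}
REQUESTS = {"es": "Necesito {}", "fr": "J'ai besoin d'{}"}


def generate(frame: Frame, lang: str) -> Optional[str]:
    """Generate a sentence from the standard form."""
    if frame["act"] == "greet":
        return GREETINGS[lang]
    if frame["act"] == "request":
        noun = NOUNS[lang].get(frame["object"])
        return REQUESTS[lang].format(noun) if noun is not None else None
    return None


def translate(sentence: str, lang: str) -> Optional[str]:
    frame = analyze_english(sentence)
    return generate(frame, lang) if frame is not None else None


print(translate("I need a doctor.", "es"))
print(translate("I need a doctor.", "fr"))
print(translate("What time is it?", "es"))
```

Adding a language means adding one generator, which is the "multiple languages easily" point. The out-of-domain question returns `None`, which is the matching con.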

  7. Statistical Machine Translation
     - Build probabilistic models from parallel text
     - Parallel text often available from
       - Bilingual organizations
         - Governments, UN
     - Relatively easy to collect
       - Requires translators rather than MT experts

  8. Learning from Parallel Text

  9. Learning from Parallel Text
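
The two "Learning from Parallel Text" slides carry no text in this transcript. As one possible illustration of building a probabilistic model from sentence-aligned text, the sketch below runs EM in the style of IBM Model 1 on a three-sentence toy corpus. The choice of IBM Model 1, the corpus, and the function names are assumptions made here, not something the slides state, and the NULL word is left out for brevity.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=20):
    """Estimate t[(e, f)] = P(target word e | source word f) with EM."""
    # Uniform (unnormalized) start over co-occurring word pairs; the first
    # E-step only depends on ratios, so normalization is not needed yet.
    t = {(e, f): 1.0 for src, tgt in pairs for e in tgt for f in src}
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (e, f) alignments
        total = defaultdict(float)   # expected count of f being aligned
        for src, tgt in pairs:
            for e in tgt:
                z = sum(t[(e, f)] for f in src)
                for f in src:
                    c = t[(e, f)] / z    # posterior that e aligns to f
                    count[(e, f)] += c
                    total[f] += c
        # M-step: renormalize the expected counts per source word.
        t = {(e, f): c / total[f] for (e, f), c in count.items()}
    return t


def best_translation(t, f):
    """Most probable target word for source word f."""
    candidates = {e: p for (e, f2), p in t.items() if f2 == f}
    return max(candidates, key=candidates.get)


# Toy German-English parallel text (illustrative).
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
model = train_ibm_model1(corpus)
for word in ["das", "haus", "buch", "ein"]:
    print(word, "->", best_translation(model, word))
```

No dictionary is supplied: "das" appears with "the" in two sentence pairs, and that co-occurrence is what pulls the probabilities apart over the iterations.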

  10. Statistical Machine Translation
      - PROS
        - Data collection doesn’t require MT experts
        - Data driven
        - Degrades gracefully when out of domain
      - CONS
        - Needs all language pairs
        - Needs good/lots of data
        - Hard to fix specific errors

  11. SPEECH Translation
      - Speech isn’t text
        - Different style, hard to find lots of examples
      - Speech isn’t fluent
        - False starts, hesitations, ungrammatical
      - ☺ ASR never makes errors ☺
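
Because speech is not fluent, a translation front end often needs some cleanup of the recognized words before MT. The sketch below is a deliberately crude example: it drops a small assumed list of filled pauses and collapses immediate word repetitions. `FILLERS` and `clean_transcript` are hypothetical names, and real false starts need far more than this.

```python
# Assumed list of English filled pauses; not taken from the slides.
FILLERS = {"um", "uh", "er", "erm"}


def clean_transcript(words):
    """Drop filled pauses and immediate repetitions from a word list."""
    cleaned = []
    for word in words:
        lower = word.lower()
        if lower in FILLERS:
            continue                      # hesitation
        if cleaned and cleaned[-1].lower() == lower:
            continue                      # repeated word
        cleaned.append(word)
    return cleaned


print(clean_transcript("I um I want want a ticket".split()))
```

The rule also removes legitimate repetitions such as "that that", so it trades some accuracy for simplicity.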

  12. One Way: Broadcast
      - One speaker
        - Lecturer: can modify language model
      - Multiple speakers
        - May be repeat speakers (News Anchor)
        - May have other noises: music etc
        - (TV programs)
      - Doesn’t need to be real time (maybe)

  13. Two Way: Dialog
      - Users can detect own errors and correct
      - Needs to be real time
      - One user may be much more familiar
      - How do you teach the other user?
      - Typically domain directed

  14. Speech Technology Issues
      - ASR:
        - Disfluencies, dialects, speaking style
        - Unfamiliarity with system
      - TTS:
        - MT output isn’t always fluent
        - TTS says it anyway
        - Can be hard to understand

  15. Speech Technology Issues
      - Spoken not Written Languages
        - Arabic vs Arabic Dialects
        - Mixture of languages
        - Politeness levels
        - Gender in speech
