Speech Processing 15-492/18-492 Speech Translation

Speech Translation
- Three part systems
  - ASR -> Translation -> TTS
- System configurations
  - One way: phrasal
  - One way: broadcast/lecture
  - 1.5 way: phrasal with limited answers
  - Two way: full two way
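The three part arrangement can be sketched as plain function composition. This is only an illustration: `make_pipeline`, `toy_asr`, `toy_translate` and `toy_tts` are hypothetical names, and the toy stages pass text around in place of real audio so the example stays runnable without any speech toolkit.

```python
from typing import Callable

def make_pipeline(
    asr: Callable[[bytes], str],
    translate: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> Callable[[bytes], bytes]:
    """Chain the three stages: source audio -> text -> translated text -> target audio."""
    def run(audio: bytes) -> bytes:
        text = asr(audio)             # speech recognition
        translated = translate(text)  # machine translation
        return tts(translated)        # speech synthesis
    return run

# Toy stand-ins (assumptions, not real components).
def toy_asr(audio: bytes) -> str:
    # Pretend the "audio" bytes are already the transcript.
    return audio.decode("utf-8")

def toy_translate(text: str) -> str:
    # Tiny phrase table; unknown input is passed through unchanged.
    table = {"hello": "hola"}
    return table.get(text.lower(), text)

def toy_tts(text: str) -> bytes:
    # Pretend the synthesized "audio" is just the encoded text.
    return text.encode("utf-8")

one_way = make_pipeline(toy_asr, toy_translate, toy_tts)
```

Under this sketch a two way system would simply hold two such pipelines, one per direction, each with its own recognizer, translator and synthesizer.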

Machine Translation Technologies
- Phrasal
  - Phrase to phrase lookup
- Template
  - Template fillers, fixed translation
- Interlingua
  - Translation into meaning representation
- Statistical Machine Translation
  - From large collections of parallel text
- Classification-based translation
  - Identify classes and deal directly with them

Choices in Translation
- Choose any two …
  - High accuracy
  - Large vocabulary
  - Fully automatic
- Speech vs Text
  - Speech less clear than text
  - Less speech to train from
  - Needs to be real-time (probably)

Simple Translation
- Phrase to Phrase
  - Greetings
  - Do you need medical attention?
  - Relatively easy to build, but limited use
- Template translations
  - The next train leaves at TIME from gate GATE from PLACE
- Limited but still useful
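A minimal sketch of both ideas, phrase lookup and template filling, might look like the following. The phrase table, the regular expression, and the Spanish target strings are illustrative assumptions, and the template is shortened to the TIME and GATE slots for brevity.

```python
import re
from typing import Optional

# Phrase-to-phrase table: fixed source phrase -> fixed translation (illustrative entries).
PHRASES = {
    "hello": "hola",
    "do you need medical attention?": "¿necesita atención médica?",
}

# Template with two slots, TIME and GATE. The slot fillers are copied over
# unchanged; only the fixed words around them are translated.
TEMPLATE_PATTERN = re.compile(
    r"^the next train leaves at (?P<TIME>\S+) from gate (?P<GATE>\S+)$",
    re.IGNORECASE,
)
TEMPLATE_TARGET = "El próximo tren sale a las {TIME} por la puerta {GATE}"

def translate(utterance: str) -> Optional[str]:
    """Try exact phrase lookup first, then the template; None if neither applies."""
    cleaned = utterance.strip()
    key = cleaned.lower()
    if key in PHRASES:
        return PHRASES[key]
    match = TEMPLATE_PATTERN.match(cleaned)
    if match:
        return TEMPLATE_TARGET.format(**match.groupdict())
    return None  # out of coverage: this is the "limited use" part
```

For example, `translate("The next train leaves at 10:30 from gate 4")` fills both slots, while any sentence outside the table and the template returns `None`.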

Interlingua
- Translate sentences into standard form
- Generate sentences from standard form
- PROS:
  - Can do multiple languages easily
  - Can be very accurate
- CONS:
  - Designing universal interlingua is very hard
  - Doesn’t do well when out of domain
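The analyze-then-generate idea can be illustrated with a toy example. Everything here is an assumption made for the sketch: the frame layout (`act`, `destination`), the single English pattern, and the two generators are hypothetical and far smaller than any real interlingua.

```python
import re
from typing import Callable, Dict, Optional

Frame = Dict[str, str]

def analyze_english(sentence: str) -> Optional[Frame]:
    """Map an English sentence to a language-neutral frame, or None if out of domain."""
    match = re.match(
        r"^i want to go to (?P<place>.+?)\.?$", sentence.strip(), re.IGNORECASE
    )
    if match is None:
        return None
    return {"act": "request_travel", "destination": match.group("place")}

# One generator per target language, all reading the same frame.
GENERATORS: Dict[str, Callable[[Frame], str]] = {
    "es": lambda frame: f"Quiero ir a {frame['destination']}",
    "de": lambda frame: f"Ich möchte nach {frame['destination']} fahren",
}

def translate(sentence: str, target: str) -> Optional[str]:
    frame = analyze_english(sentence)
    if frame is None:
        return None  # nothing sensible to generate when analysis fails
    return GENERATORS[target](frame)
```

Adding a language means adding one analyzer or one generator rather than a new system per language pair, which is the main attraction listed above. The `None` branch shows the weakness: input outside the designed domain produces no frame at all.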

Statistical Machine Translation
- Build probabilistic models from parallel text
- Parallel text often available from
  - Bilingual organizations
    - Governments, UN
  - Relatively easy to collect
    - Requires translators rather than MT experts

Learning from Parallel Text
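The slides do not say which model is learned from the parallel text. As one classic illustration, the sketch below uses a simplified IBM Model 1 (no NULL word), trained with expectation-maximization to estimate word translation probabilities from sentence pairs. The function names and the three-sentence toy corpus are assumptions chosen for the example.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """Estimate t[(f, e)] ~ P(source word f | target word e) with EM.

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    """
    src_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(src_vocab))  # uniform starting point
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e being aligned at all
        for fs, es in pairs:
            for f in fs:
                # E-step: share f's alignment across the target words of this pair.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize expected counts into probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

def best_source_word(t, e):
    """Return the source word with the highest probability for target word e."""
    candidates = {f: p for (f, tgt), p in t.items() if tgt == e}
    return max(candidates, key=candidates.get)

# Toy parallel corpus (German source, English target).
PAIRS = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
```

No dictionary is supplied. The word pairings emerge only from which words co-occur across sentence pairs, which is why this approach needs translators to produce data rather than MT experts to write rules.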

Statistical Machine Translation
- PROS
  - Data collection doesn’t require MT experts
  - Data driven
  - Degrades gracefully when out of domain
- CONS
  - Needs all language pairs
  - Needs good/lots of data
  - Hard to fix specific errors

SPEECH Translation
- Speech isn’t text
  - Different style, hard to find lots of examples
- Speech isn’t fluent
  - False starts, hesitations, ungrammatical
- ☺ ASR never makes errors ☺

One Way: Broadcast
- One speaker
  - Lecturer: can modify language model
- Multiple speakers
  - May be repeat speakers (News Anchor)
  - May have other noises: music etc. (TV programs)
- Doesn’t need to be real time (maybe)

Two Way: Dialog
- Users can detect own errors and correct
- Needs to be real time
- One user may be much more familiar with the system
- How do you teach the other user?
- Typically domain directed

Speech Technology Issues
- ASR:
  - Disfluencies, dialects, speaking style
  - Unfamiliarity with system
- TTS:
  - MT output isn’t always fluent
  - TTS says it anyway
  - Can be hard to understand

Speech Technology Issues
- Spoken not Written Languages
  - Arabic vs Arabic Dialects
  - Mixture of languages
  - Politeness levels
  - Gender in speech


