Speech Processing 15-492/18-492 Speech Translation Speech - - PowerPoint PPT Presentation



SLIDE 1

Speech Processing 15-492/18-492

Speech Translation

SLIDE 2

Speech Translation

  • Three-part systems
      • ASR -> Translation -> TTS
  • System configurations
      • One way: phrasal
      • One way: broadcast/lecture
      • 1.5 way: phrasal with limited answers
      • Two way: full two way
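The three-part ASR -> Translation -> TTS chain can be sketched as plain function composition. This is only a minimal illustration: `toy_asr`, `toy_translate`, and `toy_tts` are hypothetical stand-ins (not real recognizers or synthesizers), and the "audio" here is just UTF-8 bytes so the example stays runnable.

```python
from typing import Callable

# Hypothetical stand-ins for the three components. A real system would call
# an actual recognizer, translation engine, and synthesizer here.
def toy_asr(audio: bytes) -> str:
    # Pretend the "audio" is already its own transcript.
    return audio.decode("utf-8")

def toy_translate(text: str) -> str:
    # One-entry phrasebook; unknown input is passed through unchanged.
    phrasebook = {"hello": "hola"}
    return phrasebook.get(text.lower(), text)

def toy_tts(text: str) -> bytes:
    # Pretend synthesis: return the text as bytes instead of a waveform.
    return text.encode("utf-8")

def speech_to_speech(
    audio: bytes,
    asr: Callable[[bytes], str] = toy_asr,
    translate: Callable[[str], str] = toy_translate,
    tts: Callable[[str], bytes] = toy_tts,
) -> bytes:
    source_text = asr(audio)              # speech -> source-language text
    target_text = translate(source_text)  # source text -> target text
    return tts(target_text)               # target text -> speech

print(speech_to_speech(b"Hello"))
```

Because each stage only sees the previous stage's output, an error made by the recognizer is passed straight into translation and then spoken by the synthesizer.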

SLIDE 3

Machine Translation Technologies

  • Phrasal
      • Phrase-to-phrase lookup
  • Template
      • Template fillers, fixed translation
  • Interlingua
      • Translation into a meaning representation
  • Statistical Machine Translation
      • From a large collection of parallel text
  • Classification-based translation
      • Identify classes and deal directly with them

SLIDE 4

Choices in Translation

  • Choose any two …
      • High accuracy
      • Large vocabulary
      • Fully automatic
  • Speech vs Text
      • Speech less clear than text
      • Less speech to train from
      • Needs to be real-time (probably)

SLIDE 5

Simple Translation

  • Phrase to Phrase
      • Greetings
      • Do you need medical attention?
      • Relatively easy to build, but limited use
  • Template translations
      • The next train leaves at TIME from gate GATE for PLACE
      • Limited but still useful
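Both ideas on this slide fit in a few lines. The sketch below is an illustration under assumptions: the Spanish target strings, the `PHRASES` and `TEMPLATES` tables, and the `translate` function are made up for this example, and the slot fillers (time, gate, place) are copied through untranslated.

```python
import re

# Phrase-to-phrase: fixed source phrase -> fixed target phrase.
PHRASES = {
    "hello": "Hola",
    "do you need medical attention?": "¿Necesita atención médica?",
}

# Template: fixed translation with slots that are filled from the input.
TEMPLATES = [
    (
        re.compile(
            r"^the next train leaves at (?P<TIME>\S+) "
            r"from gate (?P<GATE>\S+) for (?P<PLACE>.+)$",
            re.IGNORECASE,
        ),
        "El próximo tren sale a las {TIME} de la puerta {GATE} "
        "con destino a {PLACE}",
    ),
]

def translate(utterance):
    """Return a translation, or None if the input is not covered."""
    text = utterance.strip()
    fixed = PHRASES.get(text.lower())
    if fixed is not None:
        return fixed
    for pattern, target in TEMPLATES:
        match = pattern.match(text)
        if match:
            # Copy the captured slot values into the target-side template.
            return target.format(**match.groupdict())
    return None

print(translate("Do you need medical attention?"))
print(translate("The next train leaves at 10:30 from gate 4 for Pittsburgh"))
print(translate("Where is the nearest pharmacy?"))  # not covered
```

The last call returns `None`, which is the "limited use" trade-off in practice: anything outside the phrase list and templates simply has no translation.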

SLIDE 6

Interlingua

  • Translate sentences into standard form
  • Generate sentences from standard form
  • PROS:
      • Can do multiple languages easily
      • Can be very accurate
  • CONS:
      • Designing universal interlingua is very hard
      • Doesn’t do well when out of domain
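The "multiple languages" advantage can be made concrete by counting components. Assuming every language needs one analyzer (into the standard form) and one generator (out of it), an interlingua needs 2n components for n languages, whereas translating directly between every ordered pair needs n(n-1) systems. The function names below are hypothetical.

```python
def direct_systems(n_languages: int) -> int:
    # One system per ordered (source, target) pair of distinct languages.
    return n_languages * (n_languages - 1)

def interlingua_components(n_languages: int) -> int:
    # One analyzer into the interlingua plus one generator out of it,
    # per language.
    return 2 * n_languages

for n in (2, 3, 6, 10):
    print(n, direct_systems(n), interlingua_components(n))
```

For two or three languages the interlingua is not cheaper by this count; the saving only appears as more languages are added.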

SLIDE 7

Statistical Machine Translation

  • Build probabilistic models from parallel text
  • Parallel text often available from
      • Bilingual organizations
          • Governments, UN
  • Relatively easy to collect
      • Requires translators rather than MT experts

SLIDE 8

Learning from Parallel Text

SLIDE 9

Learning from Parallel Text
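One classic way to learn word translation probabilities from sentence-aligned text is IBM Model 1 trained with EM. The sketch below is a swapped-in textbook technique for illustration, not necessarily the method these slides presented; the three-sentence German/English corpus and the function names are assumptions, and the NULL word used in the full model is omitted for brevity.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """Estimate t[(f, e)] = P(f | e) from (f_sentence, e_sentence) pairs."""
    f_vocab = {f for f_sent, _ in pairs for f in f_sent}
    # Start from a uniform distribution over the f-side vocabulary.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything
        # E-step: share each f word among the e words in its sentence pair.
        for f_sent, e_sent in pairs:
            for f in f_sent:
                norm = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: renormalize expected counts into probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

def best_match(t, e_word):
    """Most probable f word for a given e word among observed pairs."""
    candidates = {f: p for (f, e), p in t.items() if e == e_word}
    return max(candidates, key=candidates.get)

corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
table = train_ibm1(corpus)
for word in ("the", "house", "book", "a"):
    print(word, "->", best_match(table, word))
```

No dictionary is supplied: the pairing of "house" with "Haus" emerges only because "das" is better explained by "the", which co-occurs with it in two sentence pairs.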

SLIDE 10

Statistical Machine Translation

  • PROS
      • Data collection doesn’t require MT experts
      • Data driven
      • Degrades gracefully when out of domain
  • CONS
      • Needs all language pairs
      • Needs good/lots of data
      • Hard to fix specific errors

SLIDE 11

SPEECH Translation

  • Speech isn’t text
      • Different style, hard to find lots of examples
  • Speech isn’t fluent
      • False starts, hesitations, ungrammatical
  • ASR never makes errors ☺
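To make "speech isn't fluent" concrete, here is a deliberately naive cleanup pass that could sit between ASR and translation. It is an assumption-laden sketch: the `FILLERS` list and the function name are invented for this example, and it handles only filled pauses and immediate word repetitions, not genuine false starts.

```python
# Hypothetical list of filled pauses; a real system would need a
# language-specific inventory.
FILLERS = {"uh", "um", "er", "erm"}

def remove_simple_disfluencies(words):
    """Drop filled pauses and immediately repeated words."""
    cleaned = []
    for word in words:
        lowered = word.lower()
        if lowered in FILLERS:
            continue  # hesitation such as "uh" or "um"
        if cleaned and cleaned[-1].lower() == lowered:
            continue  # immediate repetition such as "want want"
        cleaned.append(word)
    return cleaned

print(remove_simple_disfluencies("i uh i want want a ticket".split()))
```

The rule is too blunt for real use: it would also delete legitimate repetitions such as "that that", which is one reason disfluent speech is harder to translate than edited text.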

SLIDE 12

One Way: Broadcast

  • One speaker
      • Lecturer: can modify language model
  • Multiple speakers
      • May be repeat speakers (news anchor)
      • May have other noises: music, etc. (TV programs)
  • Doesn’t need to be real time (maybe)
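One plausible reading of "can modify language model" for a known lecturer is linear interpolation of a general model with one estimated from that lecturer's material. The sketch below assumes unigram models and an arbitrary weight `lam`; both are simplifications chosen for illustration, not details from the slides.

```python
from collections import Counter

def unigram_model(tokens):
    """Maximum-likelihood unigram probabilities from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

def interpolate(general, lecture, lam=0.5):
    """P(w) = lam * P_lecture(w) + (1 - lam) * P_general(w)."""
    vocab = set(general) | set(lecture)
    return {
        w: lam * lecture.get(w, 0.0) + (1.0 - lam) * general.get(w, 0.0)
        for w in vocab
    }

general = unigram_model("the cat sat on the mat".split())
lecture = unigram_model("the phoneme model".split())
adapted = interpolate(general, lecture, lam=0.7)
print(adapted["phoneme"], general.get("phoneme", 0.0))
```

Words from the lecturer's material, such as "phoneme" here, gain probability mass that the general model would not give them, while the adapted model still sums to one.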

SLIDE 13

Two Way: Dialog

  • Users can detect and correct their own errors
  • Needs to be real time
  • One user may be much more familiar with the system
      • How do you teach the other user?
  • Typically domain directed

SLIDE 14

Speech Technology Issues

  • ASR:
      • Disfluencies, dialects, speaking style
      • Unfamiliarity with system
  • TTS:
      • MT output isn’t always fluent
      • TTS says it anyway
      • Can be hard to understand

SLIDE 15

Speech Technology Issues

  • Spoken not Written Languages
      • Arabic vs Arabic Dialects
  • Mixture of languages
  • Politeness levels
  • Gender in speech

SLIDE 16