11-731 Machine Translation: Speech 2 Speech Translation



SLIDE 1

11-731 Machine Translation

Speech 2 Speech Translation

SLIDE 2

Speech Translation

  • Three part systems
      • ASR -> Translation -> TTS
  • System configurations
      • One way – phrasal
      • One way – broadcast/lecture
      • 1.5 way – phrasal with limited answers
      • Two way – full two way
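
The three-part chain above can be sketched as three functions composed in order. This is only an illustration: the type aliases, the toy components, and the one-entry phrase table are placeholders invented here, not part of any system in these slides.

```python
from typing import Callable

# Hypothetical stand-ins for real components; each stage is just a function.
Recognizer = Callable[[bytes], str]    # audio -> source-language text
Translator = Callable[[str], str]      # source text -> target text
Synthesizer = Callable[[str], bytes]   # target text -> audio


def speech_to_speech(audio: bytes, asr: Recognizer, mt: Translator,
                     tts: Synthesizer) -> bytes:
    """Run the three stages in sequence: ASR -> Translation -> TTS."""
    source_text = asr(audio)
    target_text = mt(source_text)
    return tts(target_text)


# Toy components so the sketch runs end to end.
def toy_asr(audio: bytes) -> str:
    # Pretend the audio bytes are already their own transcript.
    return audio.decode("utf-8")


def toy_mt(text: str) -> str:
    # One-entry lookup table; unknown input is passed through unchanged.
    table = {"hello": "hola"}
    return table.get(text.lower(), text)


def toy_tts(text: str) -> bytes:
    # Pretend the synthesized audio is just the encoded text.
    return text.encode("utf-8")
```

Because every stage only sees the previous stage's output, an ASR error flows straight into translation and synthesis, which is why later slides discuss echo back and tighter coupling.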

SLIDE 3

Machine Translation Technologies

  • Phrasal
      • Phrase-to-phrase lookup
  • Template
      • Template fillers, fixed translation
  • Interlingua
      • Translation into a meaning representation
  • Statistical Machine Translation
      • From a large collection of parallel text
  • Classification-based translation
      • Identify classes and deal directly with them

SLIDE 4

Simple Translation

  • Phrase to Phrase
      • Greetings
      • Do you need medical attention?
      • Relatively easy to build, but limited use
  • Template translations
      • The next train leaves at TIME from gate GATE for PLACE
      • Limited but still useful
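
A minimal sketch of template translation, assuming the train announcement above with slots TIME, GATE and PLACE. Reading the last slot as a destination ("for PLACE") is an assumption, and the Spanish target template is made up for illustration. Slot fillers are copied across unchanged; a real system would also need to translate or transliterate them.

```python
import re
from typing import Optional

# Source-side pattern with named slots; slot names follow the slide.
SOURCE_PATTERN = re.compile(
    r"^the next train leaves at (?P<TIME>\S+) "
    r"from gate (?P<GATE>\S+) for (?P<PLACE>.+)$",
    re.IGNORECASE,
)

# Fixed target-side template (Spanish chosen only as an example language).
TARGET_TEMPLATE = (
    "El próximo tren sale a las {TIME} de la puerta {GATE} hacia {PLACE}"
)


def translate_template(sentence: str) -> Optional[str]:
    """Fill the target template, or return None if the input does not match."""
    match = SOURCE_PATTERN.match(sentence.strip())
    if match is None:
        return None
    # Slot values are inserted verbatim into the fixed translation.
    return TARGET_TEMPLATE.format(**match.groupdict())
```

Anything that does not fit the pattern returns `None`, which is the "limited" part: coverage is exactly the set of templates someone wrote.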

SLIDE 5

SPEECH Translation

  • Speech isn’t text
      • Different style, hard to find lots of examples
  • Speech isn’t fluent
      • False starts, hesitations, ungrammatical
  • ASR never makes errors ☺

SLIDE 6

One Way: Broadcast

  • One speaker
      • Lecturer: can modify language model
  • Multiple speakers
      • May be repeat speakers (News Anchor)
      • May have other noises: music etc. (TV programs)
  • Doesn’t need to be real time (maybe)

SLIDE 7

One Way: “Dialogue”

  • VoxTec’s Phraselator
      • One way communication
      • Recognized “fixed” phrases
      • Lookup for translations
      • *Very* fast deployment for new languages
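
The fixed-phrase approach can be sketched as a table lookup. The phrase table, its Spanish translations, and the fuzzy matching step with `difflib` are all assumptions made for this example; the slide only says that fixed phrases are recognized and their translations looked up.

```python
import difflib
from typing import Optional

# Hypothetical phrase table; entries and translations are illustrative only.
PHRASE_TABLE = {
    "do you need medical attention": "¿Necesita atención médica?",
    "good morning": "Buenos días",
    "please show me your identification":
        "Por favor, muéstreme su identificación",
}


def lookup_phrase(recognized: str, cutoff: float = 0.8) -> Optional[str]:
    """Map recognizer output to the closest stored phrase's fixed translation."""
    # Normalize case and strip surrounding punctuation before matching.
    key = recognized.lower().strip(" ?.!")
    matches = difflib.get_close_matches(
        key, list(PHRASE_TABLE), n=1, cutoff=cutoff
    )
    # No sufficiently close phrase means the utterance is out of coverage.
    return PHRASE_TABLE[matches[0]] if matches else None
```

Adding a language under this scheme means translating the table, which fits the "very fast deployment" point: no parallel corpus or model training is involved.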

SLIDE 8

Two Way: Dialog

  • Users can detect their own errors and correct them
  • Needs to be real time
  • One user may be much more familiar with the system
      • How do you teach the other user?
  • Typically domain directed

SLIDE 9

Two way: Dialog

  • CMU System:
      • Janus PDA version
      • CMU SMT
      • Cepstral Synthesis
      • Mobile Tech models
  • Platform:
      • COTS PDA (iPAQ)
      • VoxTec P2
  • Language:
      • Iraqi/English, Thai/English
      • Chinese, Japanese etc.

SLIDE 10

Speech Technology Issues

  • ASR:
      • Disfluencies, dialects, speaking style
      • Unfamiliarity with system
  • TTS:
      • MT output isn’t always fluent
      • TTS says it anyway
      • Can be hard to understand

SLIDE 11

Speech Technology Issues

  • Spoken not Written Languages
      • Arabic vs Arabic Dialects
  • Mixture of languages
  • Politeness levels
  • Gender in speech

SLIDE 12

Transtac: Two Way S2S System

  • DARPA developed for
      • Check points, medical and civil defense
  • Requirements
      • Two way
      • Eyes-free (no screen)
      • Portable
      • Usable by real users

SLIDE 13

Transtac System

  • Laptop secured in Backpack
  • Optional speech control
  • Push-to-Talk Buttons
  • Close-talking Microphone
  • Small powerful Speakers

SLIDE 14

Transtac System Details

  • Two way system
      • 2 ASR systems: English and Iraqi
      • 2 way statistical translation
      • 2 synthesizers
  • Push-to-talk system
      • (Users don’t like “translate everything mode”)
  • Echo back ASR result
      • And then translation

SLIDE 15

Iraqi Language

  • Iraqi Arabic is a dialect
      • Most Iraqis write Modern Standard Arabic
      • Most Iraqis do not write their own dialect
  • No standardized spelling
      • Transtac project invented one
      • But Iraqis may not be used to it
  • Arabic (MSA and dialects)
      • Do not write short vowels in words

SLIDE 16

Data for Training

  • Collected human mediated dialogs
      • Human acts as a machine
      • Passed a microphone back and forth
      • Try to get people not to talk at the same time
  • Large number of collections (over 4 years)
      • 650 thousand sentence pairs
      • Many different speakers
  • Hand transcribed by experts (in Iraqi spelling)
  • Hand translated (source sentences and interpreter’s)

SLIDE 17

Iraqi ASR

  • Acoustic model from Iraqi data
      • Based on MSA phoneset
      • Needs to be small fast models
      • Discriminative Training
      • Speaker specific adaptation
  • Lexicon
      • Based on LDC provided lexicon
      • Multiple pronunciations/typos still a problem
      • Statistically trained LTS (letter-to-sound) rules
  • Language Model
      • Trained on Iraqi input (and translated output)

SLIDE 18

English ASR

  • Acoustic model
      • Originally using other models
      • Then trained from collected data
      • (Mostly military personnel)
  • Lexicon
      • Existing lexicon but needed to add Military speak: MRAP, IED
  • Language model
      • Trained from data provided
      • Trained from “similar” data found on the web
      • Trained from hand created “typical” examples

SLIDE 19

TTS

  • Standard English TTS
      • Appropriate “command” voice
      • Unit selection
      • Added lots of military vocabulary
  • Iraqi TTS
      • Recorded from Iraqi radio announcer
      • Based on example sentences in the domain
      • LDC lexicon and LTS rules (same as ASR)
      • Hand tuned

SLIDE 20

S2S Interface Issues

  • How do you teach people to use the system?
      • “Transtac say instructions”
      • Not really sufficient
  • How can you tell it translated correctly?
      • Give (speech) feedback
          • Backtranslation
          • ASR echo back
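
One way to picture the backtranslation feedback is a round trip: translate, translate back, and compare with what was said. The sketch below is an assumption-laden illustration. The word-overlap measure and the 0.6 threshold are arbitrary choices made here, and `forward` and `backward` are hypothetical translation functions supplied by the caller.

```python
from typing import Callable, Tuple


def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity over the distinct lowercased words of two sentences."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def backtranslation_check(
    source: str,
    forward: Callable[[str], str],
    backward: Callable[[str], str],
    threshold: float = 0.6,
) -> Tuple[str, str, bool]:
    """Translate, translate back, and report whether the round trip stayed close.

    Returns (translation, round_trip_text, looks_ok).
    """
    target = forward(source)
    round_trip = backward(target)
    looks_ok = word_overlap(source, round_trip) >= threshold
    return target, round_trip, looks_ok
```

In a spoken interface the round-trip text would be played back to the speaker rather than scored automatically. Either way, a clean round trip is only weak evidence: two translation errors can cancel out.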

SLIDE 21

S2S Interface Issues

  • How do you translate names?
      • A correct translation/transliteration is hard to understand
  • Mark names in translations
      • “My name is … Abdullah”
      • “He lives on … al-Aqar … street”
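
The name-marking idea can be sketched as inserting a pause marker around name spans before the text goes to TTS. How names are detected is not covered on the slide, so the spans are simply passed in here, and the "…" marker is the slide's notation rather than real TTS break markup.

```python
from typing import List, Tuple

# Pause marker as written on the slide; a real synthesizer needs its own markup.
PAUSE = "…"


def mark_names(tokens: List[str], name_spans: List[Tuple[int, int]]) -> str:
    """Put a pause before each name span, and after it unless it ends the sentence.

    Each span is (start, end) in token indices, with end exclusive.
    """
    starts = {start for start, _ in name_spans}
    ends = {end for _, end in name_spans}
    out = []
    for i, token in enumerate(tokens):
        if i in starts:
            out.append(PAUSE)
        out.append(token)
        # Close the span with a pause, but not at sentence end and not when
        # another name starts immediately (that name adds its own pause).
        if i + 1 in ends and i + 1 < len(tokens) and i + 1 not in starts:
            out.append(PAUSE)
    return " ".join(out)
```

With the slide's two examples as token lists, this reproduces the marked sentences shown above.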

SLIDE 22

S2S Evaluation (Transtac)

  • Offline tests
      • ASR->Text and Text->Text
      • Compare to translation references
      • WER and “BLEU” score
  • Online tests
      • Concept transfer (through defined scenarios)
      • Speed (number of concepts per minute)
      • (English speech masking)
  • Utility tests
      • Does it really work?
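
For the offline ASR test, word error rate is the word-level edit distance between hypothesis and reference (substitutions + deletions + insertions) divided by the number of reference words. A minimal sketch, assuming whitespace tokenization and no text normalization, which real scoring tools usually add:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn the reference prefix into an empty string
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

WER can exceed 1.0 when the hypothesis contains many insertions. BLEU, used for the translation side, is not shown here.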

SLIDE 23

Transtac Participants

  • Developer groups
      • IBM
      • SRI
      • BBN
      • CMU
      • USC
  • Evaluations
      • Twice a year in Iraqi (somewhere in DC)
      • One surprise language (Farsi, Bahasa Malay)
      • Other evaluations with military groups

SLIDE 24

Does it work??

  • Yes, mostly
      • 27 concepts out of 30-ish turns
  • Systems are mostly similar
      • But some better than others
  • Other techniques
      • Belt/holster based PC with handheld speaker
      • Small PC in pouch
      • Chest mounted array microphone

SLIDE 25

S2S ASR Advanced Issues

  • Tight coupling
      • ASR should output N-best
      • Translate all (lattice)
      • Choose best translation
      • (MT as a LM for ASR)
  • Remove disfluencies/hesitations
  • Add more relevant data
      • Automatically convert past tense/third person data to present tense/first+second person …
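
The tight-coupling idea can be sketched over an N-best list (a lattice would need more machinery). The log-linear combination of ASR and MT scores and the `mt_weight` parameter are assumptions made for this example; the slide only says to translate all hypotheses and choose the best translation. The `translate` callable is a hypothetical MT function returning a translation together with its score.

```python
from typing import Callable, List, Tuple


def choose_best_translation(
    nbest: List[Tuple[str, float]],
    translate: Callable[[str], Tuple[str, float]],
    mt_weight: float = 1.0,
) -> Tuple[str, str, float]:
    """Translate every ASR hypothesis and keep the best-scoring pair.

    nbest holds (hypothesis, asr_log_score) pairs; translate returns
    (translation, mt_log_score). Higher scores are better.
    """
    if not nbest:
        raise ValueError("n-best list is empty")
    best = None
    for hypothesis, asr_score in nbest:
        translation, mt_score = translate(hypothesis)
        # Assumed combination: ASR score plus weighted MT score.
        combined = asr_score + mt_weight * mt_score
        if best is None or combined > best[2]:
            best = (hypothesis, translation, combined)
    return best
```

A hypothesis that ASR ranks first can lose to a lower-ranked one if the translation model finds it much easier to translate, which is the sense in which MT acts as a language model for ASR.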

SLIDE 26

S2S TTS Advanced Issues

  • MT output isn’t grammatical
      • TTS doesn’t care and just says it
      • TTS should try to say MT output with more breaks.
  • TTS (unit selection)
      • As a LM on MT output
      • Choose the best translation based on what is said best
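
A rough sketch of using the synthesizer's recorded material to pick among candidate translations. Counting word boundaries that never occur contiguously in the recorded prompts is a crude proxy invented for this example; real unit selection works on sub-word units with acoustic join and target costs. The function names are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def recorded_bigrams(prompts: Iterable[str]) -> Set[Tuple[str, str]]:
    """Collect adjacent word pairs that occur in the recorded prompts."""
    pairs: Set[Tuple[str, str]] = set()
    for prompt in prompts:
        words = prompt.lower().split()
        pairs.update(zip(words, words[1:]))
    return pairs


def join_count(candidate: str, pairs: Set[Tuple[str, str]]) -> int:
    """Count word boundaries in the candidate never recorded contiguously."""
    words = candidate.lower().split()
    return sum((a, b) not in pairs for a, b in zip(words, words[1:]))


def pick_most_sayable(candidates: List[str], prompts: Iterable[str]) -> str:
    """Return the candidate needing the fewest joins (first one wins ties)."""
    pairs = recorded_bigrams(prompts)
    return min(candidates, key=lambda c: join_count(c, pairs))
```

Candidates whose word order matches what the voice talent actually recorded need fewer joins, so they are likely to sound more natural when synthesized.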

SLIDE 27