[PPT] - Speech Processing 15-492/18-492 Speech Translation Case study: PowerPoint Presentation

SLIDE 1

Speech Processing 15-492/18-492

Speech Translation Case study: Transtac Details

SLIDE 2

Transtac: Two-way S2S System

DARPA developed for

Checkpoints, medical, and civil defense

Requirements

Two-way

Eyes-free (no screen)

Portable

Usable by real users

SLIDE 3

Transtac System

Laptop secured in Backpack

Optional speech control

Push-to-Talk Buttons

Close-talking Microphone

Small powerful Speakers

SLIDE 4

Transtac System Details

Two-way system

2 ASR systems: English and Iraqi

2-way statistical translation

2 synthesizers

Push-to-talk system

(Users don’t like “translate everything mode”)

Echo back ASR result

And then translation
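
The turn flow above (push-to-talk, echo back the ASR result, then speak the translation) can be sketched roughly as follows. This is a minimal illustration, not the Transtac implementation: `recognize`, `translate`, and `speak` are hypothetical stand-ins, and the toy versions exist only so the sketch runs.

```python
from typing import Callable, List, Tuple

def run_turn(audio: str,
             recognize: Callable[[str], str],
             translate: Callable[[str], str],
             speak: Callable[[str, str], None],
             src_lang: str,
             tgt_lang: str) -> str:
    """Handle one push-to-talk turn: ASR, echo back, then translation."""
    text = recognize(audio)          # ASR in the speaker's language
    speak(src_lang, text)            # echo back the ASR result first
    translated = translate(text)     # statistical MT in the real system
    speak(tgt_lang, translated)      # then speak the translation
    return translated

# Toy stand-ins (assumptions for illustration only).
spoken: List[Tuple[str, str]] = []

def toy_recognize(audio: str) -> str:
    return audio.lower()             # pretend the "audio" is already text

def toy_translate(text: str) -> str:
    return "<iraqi:" + text + ">"    # placeholder, not a real translation

def toy_speak(lang: str, text: str) -> None:
    spoken.append((lang, text))      # record what would be synthesized

result = run_turn("Open the trunk", toy_recognize, toy_translate,
                  toy_speak, "en", "iraqi")
```

Running the same function with the languages swapped gives the other direction of the two-way system; each direction would use its own recognizer, translator, and synthesizer.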

SLIDE 5

Iraqi Language

Iraqi Arabic is a dialect

Most Iraqis write Modern Standard Arabic

Most Iraqis do not write their own dialect

No standardized spelling

Transtac project invented one

But Iraqis may not be used to it

Arabic (MSA and dialects)

Do not write short vowels in words
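
To make the last point concrete, the sketch below strips the vowel diacritics from a fully vowelled word, leaving the form that is normally written. It assumes only the standard Unicode range U+064B to U+0652 for these marks. The example word is MSA, chosen for illustration, and is not taken from the Transtac data.

```python
# Arabic short-vowel and related marks (harakat), U+064B through U+0652.
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}

def strip_harakat(text: str) -> str:
    """Drop vowel diacritics, leaving the consonantal skeleton as usually written."""
    return "".join(ch for ch in text if ch not in HARAKAT)

# "kataba" (he wrote), fully vowelled: kaf+fatha, ta+fatha, ba+fatha.
vowelled = "\u0643\u064E\u062A\u064E\u0628\u064E"
unvowelled = strip_harakat(vowelled)   # three consonant letters remain
```

Because the written form drops these vowels, an ASR or TTS lexicon has to supply them, which is one reason pronunciation modelling is harder here than for English.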

SLIDE 6

Data for Training

Collected human-mediated dialogs

Human acts as a machine

Passed a microphone back and forth

Try to get people not to talk at the same time

Large number of collections (over 4 years)

650 thousand sentence pairs

Many different speakers

Hand transcribed by experts (in Iraqi spelling)

Hand translated (source sentences and interpreter’s)

SLIDE 7

Iraqi ASR

Acoustic model from Iraqi data

Based on MSA phoneset

Needs to be small fast models

Discriminative Training

Speaker specific adaptation

Lexicon

Based on LDC provided lexicon

Multiple pronunciations/typos still a problem

Statistically trained LTS rules

Language Model

Trained on Iraqi input (and translated output)
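
The lexicon and LTS bullets above suggest a lookup with a fallback: use the lexicon entry when there is one, otherwise predict a pronunciation from rules learned from the lexicon. The sketch below is a deliberately simplified version of that idea. It assumes one phone per letter and ignores context, which real statistically trained LTS rules do not, and the Latin-letter toy lexicon is invented for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List

def train_lts(lexicon: Dict[str, List[str]]) -> Dict[str, str]:
    """Learn the most frequent phone for each letter from 1-to-1 aligned entries."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for word, phones in lexicon.items():
        if len(word) != len(phones):
            continue                     # skip entries that are not 1-to-1
        for letter, phone in zip(word, phones):
            counts[letter][phone] += 1
    return {letter: c.most_common(1)[0][0] for letter, c in counts.items()}

def pronounce(word: str,
              lexicon: Dict[str, List[str]],
              lts: Dict[str, str]) -> List[str]:
    """Use the lexicon when possible, otherwise fall back to the LTS rules."""
    if word in lexicon:
        return lexicon[word]
    return [lts.get(letter, "?") for letter in word]   # "?" marks an unseen letter

# Toy lexicon (an assumption for illustration only).
toy_lexicon = {"bat": ["b", "a", "t"], "tab": ["t", "a", "b"]}
toy_lts = train_lts(toy_lexicon)
```

With this toy data, `pronounce("abt", toy_lexicon, toy_lts)` falls back to the learned rules because "abt" is not in the lexicon.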

SLIDE 8

English ASR

Acoustic model

Originally using other models

Then trained from collected data

(Mostly military personnel)

Lexicon

Existing lexicon but needed to add military speak: MRAP, IED

Language model

Trained from data provided

Trained from “similar” data found on the web

Trained from hand-created “typical” examples

SLIDE 9

TTS

Standard English TTS

Appropriate “command” voice

Unit selection

Added lots of military vocabulary

Iraqi TTS

Recorded from Iraqi radio announcer

Based on example sentences in the domain

LDC lexicon and LTS rules (same as ASR)

Hand tuned

SLIDE 10

S2S Interface Issues

How do you teach people to use the system?

“Transtac say instructions”

Not really sufficient

How can you tell it translated correctly?

Give (speech) feedback.

  Backtranslation

  ASR echo back

SLIDE 11

S2S Interface Issues

How do you translate names?

A correct translation/transliteration is hard to understand

Mark names in translations

“My name is … Abdullah”

“He lives on … al-Aqar … street”
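
One way to read "mark names in translations" is to insert a pause marker around each name before synthesis, as in the two examples above. The sketch below assumes the name positions are already known (for instance from the MT alignment); the function name and the use of "…" as the marker are assumptions for illustration.

```python
from typing import List, Set

PAUSE = "…"  # pause marker, as written on the slide

def mark_names(tokens: List[str], name_positions: Set[int]) -> str:
    """Insert a pause marker before each name, and after it unless it ends the sentence."""
    out: List[str] = []
    for i, tok in enumerate(tokens):
        if i in name_positions:
            out.append(PAUSE)
            out.append(tok)
            if i != len(tokens) - 1:
                out.append(PAUSE)
        else:
            out.append(tok)
    return " ".join(out)

first = mark_names(["My", "name", "is", "Abdullah"], {3})
second = mark_names(["He", "lives", "on", "al-Aqar", "street"], {3})
```

Here `first` and `second` reproduce the two marked sentences from the slide.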

SLIDE 12

S2S Evaluation (Transtac)

Offline tests

ASR->Text and Text->Text

Compare to translation references

WER and “BLEU” score

Online tests

Concept transfer (through defined scenarios)

Speed (number of concepts per minute)

(English speech masking)

Utility tests

Does it really work?
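
For the offline ASR test, word error rate (WER) is the word-level edit distance between the recognizer output and the reference transcript, divided by the reference length. A minimal sketch follows, assuming whitespace tokenization and no text normalization; the example sentences are invented.

```python
from typing import List

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,          # deletion
                          curr[j - 1] + 1,      # insertion
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)

# One substitution ("the" -> "a") and one insertion ("now") over 4 reference words.
wer = word_error_rate("please open the trunk", "please open a trunk now")
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, since the denominator is the reference length only.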

SLIDE 13

Transtac Participants

Developer groups

IBM

SRI

BBN

CMU

USC

Evaluations

Twice a year in Iraqi (somewhere in DC)

One surprise language (Farsi, Bahasa Malay)

Other evaluations with military groups

SLIDE 14

Does it work??

Yes, mostly

27 concepts out of 30-ish turns

Systems are mostly similar

But some better than others

Other techniques

Belt/holster based PC with handheld speaker

Small PC in pouch

Chest mounted array microphone

SLIDE 15

S2S ASR Advanced issues

Tight coupling

ASR should output N-best

Translate all (lattice)

Choose best translation

(MT as a LM for ASR)

Remove disfluencies/hesitations

Add more relevant data

Automatically convert past tense/third person data to present tense/first+second person …
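
The tight-coupling idea above (translate every N-best hypothesis and let the MT score help choose) can be sketched as a simple rescoring loop. This is an assumption-laden illustration rather than the method any Transtac team used: the scores are treated as log-probability-like values where higher is better, `mt_weight` is a hypothetical tuning parameter, and the toy MT table is invented.

```python
from typing import Callable, List, Tuple

def pick_translation(nbest: List[Tuple[str, float]],
                     translate: Callable[[str], Tuple[str, float]],
                     mt_weight: float = 1.0) -> Tuple[str, str]:
    """Translate every ASR hypothesis and keep the pair with the best combined score."""
    best = None
    best_score = float("-inf")
    for hyp, asr_score in nbest:
        translation, mt_score = translate(hyp)
        total = asr_score + mt_weight * mt_score
        if total > best_score:
            best_score = total
            best = (hyp, translation)
    if best is None:
        raise ValueError("nbest list is empty")
    return best

# Toy MT table (an assumption for illustration): hypothesis -> (translation, score).
TOY_MT = {
    "open the trunk": ("<translation A>", -1.0),
    "open the drunk": ("<translation B>", -6.0),
}

def toy_translate(hyp: str) -> Tuple[str, float]:
    return TOY_MT[hyp]

# ASR slightly prefers the wrong hypothesis; the MT score overturns it.
nbest = [("open the drunk", -2.0), ("open the trunk", -2.5)]
chosen = pick_translation(nbest, toy_translate)
```

In this toy case the MT model acts as an extra language model for the recognizer: the hypothesis that translates poorly loses even though the ASR ranked it first.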

SLIDE 16

S2S TTS Advanced Issues

MT output isn’t grammatical

TTS doesn’t care and just says it

TTS should try to say MT output with more breaks.

TTS (unit selection)

As a LM on MT output

Choose the best translation on what is said best
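
A rough sketch of "TTS as a LM on MT output": among several candidate translations, prefer the one the unit-selection voice can say with the fewest joins it has never recorded. The word-pair inventory here is a crude, hypothetical proxy for a real unit-selection join cost, and all names and data are assumptions for illustration.

```python
from typing import List, Set, Tuple

def join_count(words: List[str], recorded_pairs: Set[Tuple[str, str]]) -> int:
    """Count word boundaries that would need a join not present in the recordings."""
    return sum(1 for pair in zip(words, words[1:]) if pair not in recorded_pairs)

def pick_sayable(candidates: List[str], recorded_pairs: Set[Tuple[str, str]]) -> str:
    """Return the candidate with the fewest unseen joins (first one wins ties)."""
    return min(candidates, key=lambda c: join_count(c.split(), recorded_pairs))

# Toy inventory of adjacent word pairs found in the recorded prompts.
recorded = {("show", "me"), ("me", "your"), ("your", "identification")}
candidates = ["show your identification me", "show me your identification"]
best = pick_sayable(candidates, recorded)
```

In practice this score would be combined with the MT score rather than used alone, so that a fluent-sounding but wrong translation is not preferred.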

SLIDE 17