Speech Processing 15-492/18-492 Speech Translation Case study: Transtac Details



SLIDE 1

Speech Processing 15-492/18-492

Speech Translation Case study: Transtac Details

SLIDE 2

Transtac: Two S2S System

  • DARPA developed for
      • Check points, medical and civil defense
  • Requirements
      • Two way
      • Eyes-free (no screen)
      • Portable
      • Usable by real users

SLIDE 3

Transtac System

  • Laptop secured in Backpack
  • Optional speech control
  • Push-to-Talk Buttons
  • Close-talking Microphone
  • Small powerful Speakers

SLIDE 4

Transtac System Details

  • Two way system
      • 2 ASR systems: English and Iraqi
      • 2 way statistical translation
      • 2 synthesizers
  • Push-to-talk system
      • (Users don’t like “translate everything mode”)
  • Echo back ASR result
      • And then translation
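
The turn-taking flow on this slide (push-to-talk, ASR, echo back, translation, synthesis) can be sketched roughly as below. This is only an illustration, not the Transtac implementation: the `Direction` class, `handle_push_to_talk`, and the toy lambda components are hypothetical names standing in for real ASR, MT, and TTS engines.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Direction:
    """One translation direction, e.g. English -> Iraqi (hypothetical stand-ins)."""
    recognize: Callable[[bytes], str]     # ASR: audio -> source-language text
    translate: Callable[[str], str]       # MT: source text -> target text
    speak_source: Callable[[str], None]   # TTS in the speaker's language
    speak_target: Callable[[str], None]   # TTS in the listener's language


def handle_push_to_talk(audio: bytes, direction: Direction) -> str:
    """Process one push-to-talk utterance: ASR, echo back, then translation."""
    text = direction.recognize(audio)
    direction.speak_source(text)          # echo back the ASR result first
    translated = direction.translate(text)
    direction.speak_target(translated)    # and then speak the translation
    return translated


# Toy components so the flow can be run without any speech engine.
spoken: List[str] = []
toy = Direction(
    recognize=lambda audio: audio.decode("utf-8"),
    translate=lambda text: f"<translation of: {text}>",
    speak_source=lambda text: spoken.append("echo: " + text),
    speak_target=lambda text: spoken.append("out: " + text),
)
result = handle_push_to_talk(b"open the trunk", toy)
```

A two-way system would hold two such `Direction` objects, one per language, with each push-to-talk button selecting one of them.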

SLIDE 5

Iraqi Language

  • Iraqi Arabic is a dialect
      • Most Iraqis write Modern Standard Arabic
      • Most Iraqis do not write their own dialect
  • No standardized spelling
      • Transtac project invented one
      • But Iraqis may not be used to it
  • Arabic (MSA and dialects)
      • Do not write short vowels in words

SLIDE 6

Data for Training

  • Collected human mediated dialogs
      • Human acts as a machine
      • Passed a microphone back and forth
      • Try to get people not to talk at the same time
  • Large number of collections (over 4 years)
      • 650 thousand sentence pairs
      • Many different speakers
  • Hand transcribed by experts (in Iraqi spelling)
  • Hand translated (source sentences and interpreter’s)

SLIDE 7

Iraqi ASR

  • Acoustic model from Iraqi data
      • Based on MSA phoneset
      • Needs to be small fast models
      • Discriminative Training
      • Speaker specific adaptation
  • Lexicon
      • Based on LDC provided lexicon
      • Multiple pronunciations/typos still a problem
      • Statistically trained LTS rules
  • Language Model
      • Trained on Iraqi input (and translated output)

SLIDE 8

English ASR

  • Acoustic model
      • Originally using other models
      • Then trained from collected data
      • (Mostly military personnel)
  • Lexicon
      • Existing lexicon but needed to add Military speak: MRAP, IED
  • Language model
      • Trained from data provided
      • Trained from “similar” data found on the web
      • Trained from hand created “typical” examples

SLIDE 9

TTS

  • Standard English TTS
      • Appropriate “command” voice
      • Unit selection
      • Added lots of military vocabulary
  • Iraqi TTS
      • Recorded from Iraqi radio announcer
      • Based on example sentences in the domain
      • LDC lexicon and LTS rules (same as ASR)
      • Hand tuned

SLIDE 10

S2S Interface Issues

  • How do you teach people to use the system
      • “Transtac say instructions”
      • Not really sufficient
  • How can you tell it translated correctly
      • Give (speech) feedback.
          • Backtranslation
          • ASR echo back

SLIDE 11

S2S Interface Issues

  • How do you translate names
      • A correct translation/transliteration is hard to understand
      • Mark names in translations
      • “My name is … Abdullah”
      • “He lives on … al-Aqar … street”
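
One way to picture "mark names in translations" is to insert a marker at each boundary between ordinary words and a name before the text goes to the synthesizer. The sketch below reproduces the two examples above. It assumes the name positions are already known, and treating the "…" as a pause marker is an assumption; `mark_names` is a hypothetical helper, not part of any Transtac system.

```python
from typing import List, Set


def mark_names(words: List[str], name_positions: Set[int], pause: str = "…") -> str:
    """Insert a pause marker wherever the text switches into or out of a name."""
    out: List[str] = []
    for i, word in enumerate(words):
        is_name = i in name_positions
        prev_is_name = (i - 1) in name_positions
        # A boundary exists when this word and the previous one differ in kind.
        if i > 0 and is_name != prev_is_name:
            out.append(pause)
        out.append(word)
    return " ".join(out)


first = mark_names(["My", "name", "is", "Abdullah"], {3})
second = mark_names(["He", "lives", "on", "al-Aqar", "street"], {3})
```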

SLIDE 12

S2S Evaluation (Transtac)

  • Offline tests
      • ASR->Text and Text->Text
      • Compare to translation references
      • WER and “BLEU” score
  • Online tests
      • Concept transfer (through defined scenarios)
      • Speed (number of concepts per minute)
      • (English speech masking)
  • Utility tests
      • Does it really work
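
For the offline tests, WER is the word-level edit distance between the recognizer output and a reference transcript, divided by the number of reference words. A minimal sketch follows, assuming whitespace tokenization and no text normalization; the function name and example sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # (one fewer than the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("please open the trunk", "please open trunk")  # 0.25
```

One dropped word out of four reference words gives a WER of 0.25. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.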

SLIDE 13

Transtac Participants

  • Developer groups
      • IBM
      • SRI
      • BBN
      • CMU
      • USC
  • Evaluations
      • Twice a year in Iraqi (somewhere in DC)
      • One surprise language (Farsi, Bahasa Malay)
      • Other evaluations with military groups

SLIDE 14

Does it work??

  • Yes, mostly
      • 27 concepts out of 30-ish turns
  • Systems are mostly similar
      • But some better than others
  • Other techniques
      • Belt/holster based PC with handheld speaker
      • Small PC in pouch
      • Chest mounted array microphone

SLIDE 15

S2S ASR Advanced issues

  • Tight coupling
      • ASR should output N-best
      • Translate all (lattice)
      • Choose best translation
      • (MT as a LM for ASR)
  • Remove disfluencies/hesitations
  • Add more relevant data
      • Automatically convert past tense/third person data to present tense/first+second person …
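
The tight-coupling idea (ASR outputs an N-best list, every hypothesis is translated, and the best translation decides which hypothesis wins) might look roughly like this. It is a sketch under stated assumptions: scores are log-probability-like numbers where higher is better, they are combined by a simple weighted sum, and `pick_best_translation`, `toy_translate`, and the toy scores are invented for illustration. A real system would work over lattices rather than a short list.

```python
from typing import Callable, List, Tuple


def pick_best_translation(
    nbest: List[Tuple[str, float]],
    translate: Callable[[str], Tuple[str, float]],
    mt_weight: float = 1.0,
) -> Tuple[str, str]:
    """Translate every ASR hypothesis and keep the pair with the best joint score."""
    if not nbest:
        raise ValueError("nbest must contain at least one hypothesis")
    candidates = []
    for hypothesis, asr_score in nbest:
        translation, mt_score = translate(hypothesis)
        # The MT score acts like an extra language model over the ASR output.
        candidates.append((asr_score + mt_weight * mt_score, hypothesis, translation))
    _, hypothesis, translation = max(candidates, key=lambda c: c[0])
    return hypothesis, translation


# Toy MT table: (placeholder translation, made-up MT score).
toy_mt = {
    "where is the clinic": ("<target A>", -1.0),
    "wear is the clinic": ("<target B>", -6.0),
}


def toy_translate(hypothesis: str) -> Tuple[str, float]:
    return toy_mt[hypothesis]


nbest = [("wear is the clinic", -2.0), ("where is the clinic", -2.5)]
coupled = pick_best_translation(nbest, toy_translate)
asr_only = pick_best_translation(nbest, toy_translate, mt_weight=0.0)
```

With the MT score included, the second-ranked ASR hypothesis wins because it translates far better. With `mt_weight=0.0` the choice falls back to the top ASR hypothesis.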

SLIDE 16

S2S TTS Advanced Issues

  • MT output isn’t grammatical
      • TTS doesn’t care and just says it
      • TTS should try to say MT output with more breaks.
  • TTS (unit selection)
      • As a LM on MT output
      • Choose the best translation on what is said best

SLIDE 17