Speech Processing 15-492/18-492 Multilinguality Dealing with *all* - - PowerPoint PPT Presentation



SLIDE 1

Speech Processing 15-492/18-492

Multilinguality

SLIDE 2

Dealing with *all* Languages

  • Over 6000 languages
      • Maybe not all commercially interesting … now
  • Major languages (economic)
      • Cell phone manufacturers list 46 languages
      • But even those not all covered

SLIDE 3

What you need

  • ASR
      • Acoustic model (lots of speakers)
      • Pronunciation lexicon
      • Language model
  • TTS
      • Acoustic model (one speaker)
      • Pronunciation lexicon
      • Text analysis

SLIDE 4

Writing Systems

  • Romanized writing systems
      • Latin-1 (ISO-8859-1)
      • Covers many Western European languages
  • Cyrillic
      • Covers many Eastern European languages
  • Arabic scripts
      • Arabic(s), Farsi, Urdu, etc.
  • Devanagari
      • Covers many Northern Indian languages
  • Chinese Hanzi
      • Covers some Chinese dialects, but in different versions
  • Many other scripts, some non-standard
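
As a rough illustration of telling these scripts apart in running text, the sketch below counts the script prefix of each letter's Unicode character name. The function name `guess_script` is hypothetical, and taking the first word of the character name is a heuristic, not an official script property lookup.

```python
import unicodedata
from collections import Counter


def guess_script(text: str) -> str:
    """Return the most common script prefix among the letters in text.

    Heuristic only: uses the first word of each Unicode character name
    (e.g. 'LATIN', 'CYRILLIC', 'ARABIC', 'DEVANAGARI', 'CJK').
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, spaces, combining marks
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


print(guess_script("hello"))   # LATIN
print(guess_script("привет"))  # CYRILLIC
```

A production system would more likely rely on a proper Unicode script property table, but the prefix heuristic is enough to route text to a script-specific front end in a small experiment.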

SLIDE 5

Writing Systems

  • Letter based
      • Latin, Cyrillic
  • Consonant based
      • Arabic, Hebrew
  • Mora based
      • Half syllable or syllable
      • Indian scripts, Japanese native scripts
  • Syllable based
      • Hangul, Chinese

SLIDE 6

Standards

  • Writing standards
      • Taught at schools, newspapers, computer support
      • Typically standardized spelling
  • May be mostly spoken
      • Occasionally written

SLIDE 7

Language Specific Issues

  • No explicit markings
      • Stress, accent, tones
  • No word boundaries
      • Chinese, Thai
  • No (short) vowels
      • Arabic, Hebrew
  • Rich morphology
      • Many different words in the language
      • Finnish, Turkish, Greenlandic
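
For languages written without word boundaries, a common baseline is greedy longest-match segmentation against a word list. The sketch below is a minimal version of that idea; the function name `max_match`, the tiny lexicon, and the `max_len` default are all assumptions made for illustration.

```python
def max_match(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy left-to-right longest-match segmentation.

    At each position, take the longest lexicon entry (up to max_len
    characters); if nothing matches, emit a single character.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


# Toy lexicon (illustrative, not from the slides).
toy_lexicon = {"北京", "大学", "北京大学", "学生"}
print(max_match("北京大学生", toy_lexicon))  # ['北京大学', '生']
```

The example also shows the weakness of the greedy approach: it commits to the longest match first, so the last character is left stranded even though another split might be preferable.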

SLIDE 8

Genre Specific Issues

  • No capitals, punctuation
  • Unpunctuated
  • Plain vs polite form
  • Speech vs text form
  • Many foreign phrases
      • (technology-directed genres)
  • Many new abbreviations
      • E.g. SMS messages

SLIDE 9

Character Encoding

  • Unicode vs utf8 vs latin
      • Documents mix them
  • Sometimes accents omitted
      • For ease of typing
  • Lots of standards
      • Unicode, EUC, BIG5, TIS42, …
      • Everyone has their own standard
      • Some create their own standards
  • Mixed character sets
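
Two of these problems, mixed encodings and omitted accents, can be sketched with the Python standard library. The helper names `decode_mixed` and `strip_accents` are hypothetical, and trying UTF-8 first with a Latin-1 fallback is only one possible policy; it cannot tell Latin-1 apart from other single-byte encodings.

```python
import unicodedata


def decode_mixed(raw: bytes) -> str:
    """Decode bytes as UTF-8, falling back to Latin-1 if that fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this never raises.
        return raw.decode("latin-1")


def strip_accents(text: str) -> str:
    """Drop combining marks so accented and unaccented spellings match."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(decode_mixed("café".encode("latin-1")))  # café
print(strip_accents("café"))                   # cafe
```

Accent stripping is useful for matching words typed without accents against a lexicon, but it discards distinctions that matter for pronunciation, so it is better used as a lookup key than as a replacement for the original text.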

SLIDE 10

Phoneme Sets

  • Hard to find consensus for new languages
      • Typically lots of different dialects
  • What level of distinction?
      • Some good for speech but not really phonetic
      • /t/ vs /dx/ in “water”
  • Often doesn’t include foreign phones
      • /w/ in German is common for younger people

SLIDE 11

Words

  • May be hard to define
      • No word boundaries
  • Rich morphology
      • Words have many variations of compounds
      • Yomenakatta -> could not read
      • Yomemasendeshita -> could not read (polite)
  • Gender specific speech
      • Boku vs atashi
  • Language mixtures

SLIDE 12

Pronunciation lexicons

  • “Proper” speech vs “actual” speech
  • Hard to generalize
  • Chinese
  • Cross lingual pronunciations
      • “Human” (English/German)
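
One way to picture a lexicon that handles both pronunciation variants and cross-lingual entries is a mapping from (language, word) to a list of phone sequences. The sketch below is an assumption about how such a structure might look, not the format of any particular toolkit; the phone strings are illustrative approximations and the helper names are hypothetical.

```python
from collections import defaultdict

# (language, word) -> list of pronunciations, each a list of phone symbols.
# Phone strings below are illustrative only.
lexicon: dict[tuple[str, str], list[list[str]]] = defaultdict(list)


def add_pron(lang: str, word: str, phones: str) -> None:
    """Register one pronunciation variant for a word in a language."""
    lexicon[(lang, word.lower())].append(phones.split())


def lookup(lang: str, word: str) -> list[list[str]]:
    """Return all known variants, or an empty list if the word is unknown."""
    return lexicon.get((lang, word.lower()), [])


add_pron("en", "water", "w ao t er")   # careful ("proper") form with /t/
add_pron("en", "water", "w ao dx er")  # flapped ("actual") form with /dx/
add_pron("en", "human", "hh y uw m ah n")
add_pron("de", "human", "h u m a n")

print(lookup("en", "water"))  # [['w', 'ao', 't', 'er'], ['w', 'ao', 'dx', 'er']]
```

Keying on the language as well as the spelling keeps the English and German readings of the same written word apart.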

SLIDE 13

“Industry” way

  • Collect at least 100 hours of spoken speech
      • At least 20 different speakers
      • Mixture of gender, age, etc.
      • Through desired channel (phone/desktop)
  • Collect at least 5 hours from one speaker
      • High quality recording studio
      • Data should be targeted to application
  • Build pronunciation lexicon
      • Expert phonologist

SLIDE 14

Industry way

  • Probably 3-6 months
      • Lead developer
      • Local language expert
      • Lots of human transcribers
  • Costs?
      • Many hundreds of thousands

SLIDE 15

Or cheaper (?) …

  • Find existing data
      • Linguistic Data Consortium (UPenn)
      • ELRA (European equivalent)
      • Appen, Australia
  • Find local people who have collected data
  • Found data might be in wrong format
      • Data cleaning is often the most expensive

SLIDE 16

Actual way

  • Often mixture
      • Found data for initial model
      • Collect data with actual/initial application

SLIDE 17

Multilingual Systems

  • Support lots of different languages
      • Press 1 for Spanish
      • Press 2 for Gujarati …
  • Automatically detect language
  • Mixed language
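
For mixed-language text, a very simple first pass is to tag each token by looking it up in per-language word lists. The sketch below assumes tiny hand-made word lists and a hypothetical `tag_languages` helper; real systems would need much larger lexicons and context to resolve words shared between languages.

```python
def tag_languages(tokens, lexicons, default="unk"):
    """Tag each token with the first language whose word list contains it."""
    tags = []
    for tok in tokens:
        key = tok.lower().strip(",.?!")
        tag = next(
            (lang for lang, words in lexicons.items() if key in words),
            default,
        )
        tags.append((tok, tag))
    return tags


# Toy word lists (illustrative only).
toy_lexicons = {
    "en": {"dad", "time"},
    "hi": {"kyu", "hua", "hai"},
}
print(tag_languages("Dad, time kyu hua hai".split(), toy_lexicons))
# [('Dad,', 'en'), ('time', 'en'), ('kyu', 'hi'), ('hua', 'hi'), ('hai', 'hi')]
```

Because lookup order follows the order of the dictionary, a word present in several lists is always assigned to the first language, which is one reason this is only a baseline.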

SLIDE 18

Multilingual (Menu)

  • Speak in your language
      • Eki-mai no tsugi no bus no ha?
      • When is the next bus to the station
  • Need multiple recognizers
      • Run in parallel and take best result
  • Or shared acoustic models
      • Recognizing both languages at once (mix)
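
The "run in parallel and take best result" option can be sketched as below. The recognizers here are stand-in functions with made-up scores, and the `Hypothesis` type and `recognize_best` name are hypothetical; the sketch also assumes a non-empty recognizer list and that scores from different recognizers are directly comparable, which in practice usually requires some normalization.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, NamedTuple


class Hypothesis(NamedTuple):
    language: str
    text: str
    score: float  # higher is better; assumed comparable across recognizers


Recognizer = Callable[[bytes], Hypothesis]


def recognize_best(audio: bytes, recognizers: list[Recognizer]) -> Hypothesis:
    """Run every recognizer on the same audio and keep the top-scoring result."""
    with ThreadPoolExecutor(max_workers=len(recognizers)) as pool:
        results = list(pool.map(lambda rec: rec(audio), recognizers))
    return max(results, key=lambda hyp: hyp.score)


# Stand-in recognizers returning fixed, made-up hypotheses.
def fake_english(audio: bytes) -> Hypothesis:
    return Hypothesis("en", "when is the next bus", -42.0)


def fake_japanese(audio: bytes) -> Hypothesis:
    return Hypothesis("ja", "tsugi no bus", -17.5)


best = recognize_best(b"", [fake_english, fake_japanese])
print(best.language)  # ja
```

The cost of this design grows with the number of supported languages, since every utterance is decoded once per recognizer; that is part of the appeal of the shared acoustic model alternative.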

SLIDE 19

Multilingual (in line)

  • Code switching
      • European, India, bilingual areas
      • Hinglish, Spanglish
  • Borrowed words and phrases
      • Dad, time kyu hua hai
      • One lakh
      • Computer walla
      • numbers
  • Can be inflected
      • Was updated -> up gedaten

SLIDE 20

Lilac

SLIDE 21

SLIDE 22

HW2: TTS

  • Due 3:30pm Monday October 20th
  • Install Festival and Festvox
  • Find 10 errors in each of two different synthesizers
  • Build a voice
      • A Talking Clock
      • A general voice
      • (or both)

SLIDE 23