[PPT] - Speech Processing 15-492/18-492 Multilinguality Dealing with *all* PowerPoint Presentation

SLIDE 1

Speech Processing 15-492/18-492

Multilinguality

SLIDE 2

Dealing with all Languages

Over 6000 Languages

Maybe not all commercially interesting … now

Major languages (economic)

Cell phone manufacturers list 46 languages

But even those not all covered

SLIDE 3

What you need

ASR

Acoustic model (lots of speakers)

Pronunciation Lexicon

Language model

TTS

Acoustic model (one speaker)

Pronunciation Lexicon

Text analysis

SLIDE 4

Writing Systems

Romanized writing systems

Latin-1 (ISO-8859-1)

Covers many Western European languages

Cyrillic

Covers many Eastern European languages

Arabic Scripts

Arabic(s), Farsi, Urdu, etc

Devanagari

Covers many Northern Indian languages

Chinese Hanzi

Covers some Chinese dialects but different versions

Many other scripts, some non-standard

SLIDE 5

Writing Systems

Letter based

Latin, Cyrillic

Consonant based

Arabic, Hebrew

Mora based

Half syllable or syllable

Indian scripts, Japanese native scripts

Syllable based

Hangul, Chinese

SLIDE 6

Standards

Writing standards

Taught at schools, newspapers, computer support

Typically standardized spelling

May be mostly spoken

Occasionally written

SLIDE 7

Language Specific Issues

No explicit markings

Stress, accent, tones

No word boundaries

Chinese, Thai

No (short) vowels

Arabic, Hebrew

Rich morphology

Many different words in the languages

Finnish, Turkish, Greenlandic
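
The "no word boundaries" issue is often approached first with dictionary-based segmentation. The following is a minimal sketch of greedy longest-match segmentation, not a method taken from the slides. The function name `max_match`, the `max_len` parameter, and the four-entry toy lexicon are all assumptions made for illustration; a real system would need a large lexicon and some handling of ambiguity.

```python
def max_match(text, lexicon, max_len=4):
    """Greedy left-to-right longest-match segmentation (illustrative only)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until the lexicon matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:
            # Nothing in the lexicon starts here: emit a single character.
            words.append(text[i])
            i += 1
    return words


# Hypothetical toy lexicon, far too small for real use.
TOY_LEXICON = {"我", "喜欢", "语音", "处理"}

print(max_match("我喜欢语音处理", TOY_LEXICON))
```

Greedy matching is simple but can pick the wrong split when a long lexicon entry overlaps two shorter words, which is one reason word definition stays hard in these languages.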

SLIDE 8

Genre Specific Issues

No capitals or punctuation

Unpunctuated

Plain vs polite form

Speech vs text form

Many foreign phrases

(technology-directed genres)

Many new abbreviations

E.g. SMS messages

SLIDE 9

Character Encoding

Unicode vs utf8 vs latin

Documents mix them

Sometimes accents omitted

For ease of typing

Lots of standards

Unicode, EUC, BIG5, TIS42, …

Everyone has their own standard

Some create their own standards

Mixed character sets
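
Two of the problems above, documents that mix encodings and accents omitted for ease of typing, can be illustrated with the Python standard library. This is a sketch under assumptions: the helper names `decode_best_effort` and `strip_accents` are made up here, and the UTF-8-then-Latin-1 fallback is only one possible heuristic. Latin-1 decoding never raises an error, so text in other encodings (EUC, BIG5, and so on) would come out as silent garbage rather than a failure.

```python
import unicodedata


def decode_best_effort(raw: bytes) -> str:
    """Try UTF-8 first, then fall back to Latin-1 (an assumed heuristic)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail,
        # but the result is only meaningful if the text really was Latin-1.
        return raw.decode("latin-1")


def strip_accents(text: str) -> str:
    """Drop combining marks so accented and unaccented spellings match."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


utf8_bytes = "café".encode("utf-8")
latin1_bytes = "café".encode("latin-1")

# Both byte strings decode to the same text despite different encodings.
print(decode_best_effort(utf8_bytes) == decode_best_effort(latin1_bytes))
print(strip_accents("café"))
```

Stripping accents helps match user-typed text against a lexicon, but it also merges words that differ only by an accent, so it is best treated as a lookup fallback rather than a normalization applied everywhere.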

SLIDE 10

Phoneme Sets

Hard to find consensus for new languages

Typically lots of different dialects

What level of distinction?

Some good for speech but not really phonetic

/t/ vs /dx/ in “water”

Often doesn’t include foreign phones

/w/ in German is common for younger people

SLIDE 11

Words

May be hard to define

No word boundaries

Rich morphology

Words have many variations of compounds

Yomenakatta -> could not read

Yomemasendeshita -> could not read (polite)

Gender specific speech

Boku vs atashi

Language mixtures

SLIDE 12

Pronunciation lexicons

“proper” speech vs “actual” speech

Hard to generalize

Chinese

Cross lingual pronunciations

“Human” (English/German)

SLIDE 13

“Industry” way

Collect at least 100 hours of spoken speech

At least 20 different speakers

Mixture of gender, age, etc

Through desired channel (phone/desktop)

Collect at least 5 hours from one speaker

High quality recording studio

Data should be targeted to application

Build pronunciation lexicon

Expert phonologist

SLIDE 14

Industry way

Probably 3-6 months

Lead developer

Local language expert

Lots of human transcribers

Costs?

Many hundreds of thousands

SLIDE 15

Or cheaper (?) …

Find existing data

Linguistic Data Consortium (UPenn)

ELRA (European equivalent)

Appen, Australia

Find local people who have collected data

Found data might be in wrong format

Data cleaning is often the most expensive

SLIDE 16

Actual way

Often mixture

Found data for initial model

Collect data with actual/initial application

SLIDE 17

Multilingual Systems

Support lots of different languages

Press 1 for Spanish

Press 2 for Gujarati …

Automatically detect language

Mixed language

SLIDE 18

Multilingual (Menu)

Speak in your language

Eki-mai no tsugi no bus no ha?

When is the next bus to the station

Need multiple recognizers

Run in parallel and take best result

Or shared acoustic models

Recognizing both languages at once (mix)
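
The "run in parallel and take best result" option can be sketched as follows. Everything here is hypothetical: the `Recognizer` type, `recognize_best`, and the two stub recognizers with hard-coded scores stand in for real per-language ASR engines. The sketch also assumes that scores from different recognizers are directly comparable, which in practice usually requires some normalization.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

# Assumed interface: a recognizer maps audio bytes to (hypothesis, score).
Recognizer = Callable[[bytes], Tuple[str, float]]


def recognize_best(audio: bytes,
                   recognizers: Dict[str, Recognizer]) -> Tuple[str, str, float]:
    """Run every recognizer on the same audio and keep the top-scoring result."""
    with ThreadPoolExecutor(max_workers=len(recognizers)) as pool:
        futures = {lang: pool.submit(rec, audio)
                   for lang, rec in recognizers.items()}
        results = {lang: fut.result() for lang, fut in futures.items()}
    # Pick the language whose hypothesis has the highest score.
    best_lang = max(results, key=lambda lang: results[lang][1])
    text, score = results[best_lang]
    return best_lang, text, score


# Stub recognizers with made-up outputs, for illustration only.
def fake_english(audio: bytes) -> Tuple[str, float]:
    return ("when is the next bus", 0.41)


def fake_japanese(audio: bytes) -> Tuple[str, float]:
    return ("eki-mai no tsugi no bus", 0.87)


print(recognize_best(b"\x00\x01", {"en": fake_english, "ja": fake_japanese}))
```

The cost of this design grows with the number of languages, which is one motivation for the shared acoustic model alternative listed above.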

SLIDE 19

Multilingual (in line)

Code switching

European, India, Bilingual areas

Hinglish, Spanglish

Borrowed words and phrases

Dad, time kyu hua hai

One lakh

Computer walla

numbers

Can be inflected

Was updated -> up gedaten
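
A first step when handling code-switched text such as the Hinglish examples above is to tag each token with a language. The sketch below does this by plain word-list lookup. The two toy word lists and the `tag_languages` function are assumptions for illustration, and lookup alone cannot handle words shared between languages or inflected borrowings like the last example.

```python
import re

# Hypothetical toy word lists built from the examples on this slide.
TOY_ENGLISH = {"dad", "time", "one", "computer"}
TOY_HINDI = {"kyu", "hua", "hai", "lakh", "walla"}


def tag_languages(utterance):
    """Label each token 'en', 'hi', or 'unk' by word-list lookup."""
    tags = []
    for token in re.findall(r"\w+", utterance.lower()):
        if token in TOY_ENGLISH:
            lang = "en"
        elif token in TOY_HINDI:
            lang = "hi"
        else:
            lang = "unk"
        tags.append((token, lang))
    return tags


print(tag_languages("Dad, time kyu hua hai"))
print(tag_languages("One lakh"))
```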

SLIDE 20

Lilac

SLIDE 21

SLIDE 22

HW2: TTS

Due 3:30pm Monday October 20th

Install Festival and Festvox

Find 10 errors in each of two different synthesizers

Build a voice

A Talking Clock

A general voice

(or both)
