Speech Processing 15-492/18-492: Speech Synthesis Pronunciation


SLIDE 1

Speech Processing 15-492/18-492

Speech Synthesis: Pronunciation, Letter to Sound rules

SLIDE 2

Speech Synthesis

  • Linguistic Analysis
  • Pronunciations
  • Prosody

SLIDE 3

Part of Speech Tagging

  • Find the most likely tag for each word
      • Most words have only one tag (92% correct)
      • Context often defines tag type
      • “The project” vs “To project”
  • Use an HMM Part of Speech tagger
      • But need data to train it (English PennTreeBank)

SLIDE 4

Poor Man’s PoS Tagger

  • Hand list “function” word types
      • (determiners a an the this)
      • (conjunctions and or but)
      • (pp in on to)
      • (content everything else)
  • Better than nothing
  • Easy to do on new languages
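
The hand-listed tagger above can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the word lists are exactly those on the slide, while the names `FUNCTION_WORDS`, `poor_mans_tag`, and `tag_sentence` are made up for this example.

```python
# Hand-listed function words, using the classes and members from the slide.
FUNCTION_WORDS = {
    "determiners": {"a", "an", "the", "this"},
    "conjunctions": {"and", "or", "but"},
    "pp": {"in", "on", "to"},
}


def poor_mans_tag(word):
    """Return the function-word class of `word`, or "content" otherwise."""
    lowered = word.lower()
    for tag, members in FUNCTION_WORDS.items():
        if lowered in members:
            return tag
    return "content"  # everything else


def tag_sentence(words):
    """Tag each word independently; no context is used."""
    return [(word, poor_mans_tag(word)) for word in words]


print(tag_sentence("The project is on the table".split()))
```

Because no context is consulted, “project” gets the same tag in “the project” and “to project”; that limitation is what the HMM tagger addresses.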

SLIDE 5

Pronunciation Lexicon

  • List of words and their pronunciation
      • (“pencil” n (p eh1 n s ih l))
      • (“table” n (t ey1 b ax l))
  • Need the right phoneme set
  • Need other information
      • Part of speech
      • Lexical stress
      • Other information (Tone, Lexical accent …)
      • Syllable boundaries
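
As an illustration of what one lexicon entry carries, the sketch below splits an entry written in the slide's notation into word, part of speech, and phones. The parser (`ENTRY_RE`, `parse_entry`, `stressed_phones`) is hypothetical, and it assumes that a trailing `1` on a phone marks lexical stress, as the two entries above suggest.

```python
import re

# Matches entries of the form ("word" pos (phone phone ...)).
ENTRY_RE = re.compile(r'^\(\s*"([^"]+)"\s+(\S+)\s+\(([^()]*)\)\s*\)$')


def parse_entry(text):
    """Split one entry into (word, part_of_speech, phones)."""
    match = ENTRY_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a lexicon entry: {text!r}")
    word, pos, phone_str = match.groups()
    return word, pos, phone_str.split()


def stressed_phones(phones):
    """Return the phones carrying a stress mark (assumed: trailing '1')."""
    return [phone for phone in phones if phone.endswith("1")]


word, pos, phones = parse_entry('("pencil" n (p eh1 n s ih l))')
print(word, pos, phones, stressed_phones(phones))
```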

SLIDE 6

Homograph Representation

  • Must distinguish different pronunciations
      • (“project” n (p r aa1 jh eh k t))
      • (“project” v (p r ax jh eh1 k t))
      • (“bass” n_music (b ey1 s))
      • (“bass” n_fish (b ae1 s))
  • ASR multiple pronunciations
      • (“route” n (r uw t))
      • (“route(2)” n (r aw t))
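
One way to picture the homograph representation is a table keyed on both the word and its tag, so the tagger's output selects the pronunciation. This is a sketch under that assumption; the entries are copied from the slide, and the names `LEXICON`, `lookup`, and `pronunciations` are illustrative only.

```python
# Pronunciations keyed by (word, tag); entries copied from the slide.
LEXICON = {
    ("project", "n"): ["p", "r", "aa1", "jh", "eh", "k", "t"],
    ("project", "v"): ["p", "r", "ax", "jh", "eh1", "k", "t"],
    ("bass", "n_music"): ["b", "ey1", "s"],
    ("bass", "n_fish"): ["b", "ae1", "s"],
}


def lookup(word, tag):
    """Return the pronunciation for (word, tag), or None if it is not listed."""
    return LEXICON.get((word.lower(), tag))


def pronunciations(word):
    """Return every listed pronunciation of `word`, keyed by tag."""
    return {tag: phones for (w, tag), phones in LEXICON.items() if w == word.lower()}


print(lookup("project", "n"))
print(lookup("project", "v"))
print(pronunciations("bass"))
```

A `None` result is the case the letter-to-sound rules have to cover.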

SLIDE 7

Pronunciation of Unknown Words

  • How do you pronounce new words?
      • 4% of tokens (in news) are new
      • You can’t synthesize them without pronunciations
      • You can’t recognize them without pronunciations
  • Letter-to-Sound rules
  • Grapheme-to-Phoneme rules

SLIDE 8

LTS: Hand written

  • Hand written rules
      • [LeftContext] X [RightContext] -> Y
  • e.g.
      • c [h r] -> k
      • c [h] -> ch
      • c [i] -> s
      • c -> k
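
The four example rules for “c” can be read as an ordered list in which the first matching rule wins. The sketch below encodes only those rules and only the right context they use; `C_RULES` and `apply_rules` are hypothetical names, and a full rule system would also handle left context and consume the letters in the matched context.

```python
# (letter, right_context, output) in priority order, copied from the slide.
C_RULES = [
    ("c", "hr", "k"),
    ("c", "h", "ch"),
    ("c", "i", "s"),
    ("c", "", "k"),  # default when no context matches
]


def apply_rules(word, index, rules=C_RULES):
    """Return the output of the first rule matching the letter at `index`."""
    letter = word[index]
    following = word[index + 1:]
    for rule_letter, right_context, output in rules:
        if letter == rule_letter and following.startswith(right_context):
            return output
    return None  # no rule for this letter in the table


for example in ["chrome", "chip", "city", "cat"]:
    print(example, "->", apply_rules(example, 0))
```

Rule order matters: if `c [h]` were listed before `c [h r]`, the more specific rule would never fire.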

SLIDE 9

LTS: Machine Learning Techniques

  • Need an existing lexicon
      • Pronunciations: words and phones
      • But different number of letters and phones
  • Need an alignment
      • Between letters and phones
      • checked -> ch eh k t

SLIDE 10

LTS: alignment

  Letters:  c   h   e   c   k   e   d
  Phones:   ch  _   eh  k   _   _   t

  • checked -> ch eh k t
  • Some letters go to nothing
  • Some letters go to two phones
      • box -> b aa k-s
      • table -> t ey b ax-l _
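
The alignment gives every letter exactly one label: a phone, `_` for nothing, or a hyphen-joined dual phone such as `k-s`. As a small sketch (the function name `phones_from_alignment` is made up here), flattening those labels recovers the ordinary phone string:

```python
EPSILON = "_"


def phones_from_alignment(aligned):
    """Flatten per-letter labels back into a plain phone sequence."""
    phones = []
    for _letter, label in aligned:
        if label == EPSILON:
            continue  # this letter maps to nothing
        phones.extend(label.split("-"))  # "k-s" is a dual phone
    return phones


checked = list(zip("checked", ["ch", "_", "eh", "k", "_", "_", "t"]))
box = list(zip("box", ["b", "aa", "k-s"]))
print(phones_from_alignment(checked))  # ['ch', 'eh', 'k', 't']
print(phones_from_alignment(box))      # ['b', 'aa', 'k', 's']
```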

SLIDE 11

Find alignment automatically

  • Epsilon scattering
      • Find all possible alignments
      • Estimate p(L,P) on each alignment
      • Find most probable alignment
  • Hand seed
      • Hand specify allowable pairs
      • Estimate p(L,P) on each possible alignment
      • Find most probable alignment
  • Statistical Machine Translation (IBM model 1)
      • Estimate p(L,P) on each possible alignment
      • Find most probable alignment
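
A simplified, single-pass sketch of the epsilon-scattering idea follows. It assumes a word never has more phones than letters (dual phones are not handled), weights all candidate alignments of a word equally when counting letter/phone pairs, and does not iterate the estimate; the lecture's actual procedure may differ on all three points. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import prod

EPSILON = "_"


def all_alignments(letters, phones):
    """Yield every way to pad `phones` with epsilons to the length of `letters`."""
    n, m = len(letters), len(phones)
    for slots in combinations(range(n), m):  # letter positions that keep a phone
        padded = [EPSILON] * n
        for slot, phone in zip(slots, phones):
            padded[slot] = phone
        yield list(zip(letters, padded))


def estimate_pair_probs(lexicon):
    """Estimate p(L, P) from counts over all candidate alignments."""
    counts = Counter()
    for letters, phones in lexicon:
        candidates = list(all_alignments(letters, phones))
        for alignment in candidates:
            for pair in alignment:
                counts[pair] += 1.0 / len(candidates)
    total = sum(counts.values())
    return {pair: count / total for pair, count in counts.items()}


def best_alignment(letters, phones, probs):
    """Return the candidate alignment with the highest product of p(L, P)."""
    return max(
        all_alignments(letters, phones),
        key=lambda alignment: prod(probs.get(pair, 0.0) for pair in alignment),
    )


# "checked" has 7 letters and 4 phones, so there are C(7, 4) = 35 candidates.
print(len(list(all_alignments("checked", ["ch", "eh", "k", "t"]))))
```

The “hand seed” variant would prune `all_alignments` to pairs on an allowed list before counting, which shrinks the candidate set considerably.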

SLIDE 12

Not everything aligns

  • A letter may map to 0, 1, or 2 phones
      • e -> epsilon “moved”
      • x -> k-s, g-z “box” “example”
      • e -> y-uw “askew”
  • Some alignments aren’t sensible
      • dept -> d ih p aa r t m ax n t
      • cmu -> s iy eh m y uw

SLIDE 13

Training LTS models

  • Use CART trees
      • One model for each letter
  • Predict phone (epsilon, phone, dual phone)
      • From the letter with 3 letters of context on each side (and POS)
      • # # # c h e c -> ch
      • # # c h e c k -> _
      • # c h e c k e -> eh
      • c h e c k e d -> k
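
The context windows in these examples can be generated mechanically from an aligned word. The sketch below builds one window per letter, padded with `#`, and pairs it with the aligned label; these pairs are what a per-letter CART model would be trained on. The helper names are assumptions for this example, and the tree training itself is not shown.

```python
def letter_windows(word, context=3, pad="#"):
    """Return one window per letter: `context` letters on each side, padded."""
    padded = [pad] * context + list(word) + [pad] * context
    width = 2 * context + 1
    return [padded[i:i + width] for i in range(len(word))]


def training_examples(word, labels):
    """Pair each letter's window with its aligned phone label."""
    return list(zip(letter_windows(word), labels))


aligned_labels = ["ch", "_", "eh", "k", "_", "_", "t"]
for window, label in training_examples("checked", aligned_labels):
    print(" ".join(window), "->", label)
```

The middle element of each window is the letter being predicted, so grouping examples by `window[3]` gives the training set for that letter's model.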

SLIDE 14

LTS results

  Lexicon     Letter Acc   Word Acc
  OALD        95.80%       75.56%
  CMUDICT     91.99%       57.80%
  BRULEX      99.00%       93.03%
  DE-CELEX    98.79%       89.38%
  Thai        95.60%       68.76%

  • Split lexicon into train/test 90%/10%
      • i.e. every tenth entry is extracted for testing
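
A sketch of the evaluation setup, under two assumptions not stated on the slide: that the held-out entry is the last of each group of ten, and that a word counts as correct only when every letter's label is correct (which would explain why word accuracy is so much lower than letter accuracy). Function names are illustrative.

```python
def split_every_tenth(entries):
    """Hold out every tenth entry for testing; keep the rest for training."""
    train, test = [], []
    for index, entry in enumerate(entries):
        if index % 10 == 9:  # assumption: the 10th, 20th, ... entries
            test.append(entry)
        else:
            train.append(entry)
    return train, test


def accuracies(predicted, reference):
    """Return (letter_accuracy, word_accuracy) for per-letter label sequences."""
    letters = correct_letters = correct_words = 0
    for pred, ref in zip(predicted, reference):
        matches = [p == r for p, r in zip(pred, ref)]
        letters += len(ref)
        correct_letters += sum(matches)
        correct_words += int(all(matches) and len(pred) == len(ref))
    return correct_letters / letters, correct_words / len(reference)


train, test = split_every_tenth(list(range(100)))
print(len(train), len(test))  # 90 10
```

For instance, one wrong label in a five-letter word costs one letter out of five but the whole word.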

SLIDE 15

Example Tree

SLIDE 16

But we need more than phones

            LTP+S     LTPS
  L no S    96.36%    96.27%
  Letter    -         95.80%
  W no S    76.92%    74.69%
  Word      63.68%    74.56%

  • What about lexical stress
      • p r aa1 j eh k t -> p r aa j eh1 k t
  • Two possibilities
      • A separate prediction model
      • Joint model: introduce eh/eh1 (BETTER)
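
In the joint approach the stressed and unstressed vowels are simply different labels (`eh` vs `eh1`), so a prediction can have the right phones but the wrong stress. The sketch below compares two pronunciations with and without stress marks, which is presumably what the “no S” rows measure; that reading, the digit-suffix convention, and the helper names are assumptions.

```python
def strip_stress(phone):
    """Remove a trailing stress digit, e.g. 'eh1' -> 'eh'."""
    return phone.rstrip("0123456789")


def same_phones(predicted, reference, ignore_stress=False):
    """Compare two pronunciations, optionally ignoring stress marks."""
    if ignore_stress:
        predicted = [strip_stress(p) for p in predicted]
        reference = [strip_stress(p) for p in reference]
    return predicted == reference


noun = ["p", "r", "aa1", "j", "eh", "k", "t"]
verb = ["p", "r", "aa", "j", "eh1", "k", "t"]
print(same_phones(noun, verb))                      # False
print(same_phones(noun, verb, ignore_stress=True))  # True
```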

SLIDE 17

Does it really work

                Occurs    %
  Names         1360      76.6
  Unknown       351       19.8
  US Spelling   57        3.2
  Typos         7         0.4

  • 40K words from Time Magazine
  • 1775 (4.6%) not in OALD
  • LTS gets 70% correct (test set was 74%)

SLIDE 18

Dialect Lexicons

  • Need different lexicons for different dialects
      • US, UK, Indian, Australian, European
  • Build dialect independent lexicons
      • Dialect independent vowels (“key-vowels”)
          • The vowel in coffee and conference
          • Maps to aa in US, and o in the UK
      • Post-vocalic r in UK English
          • Car -> k aa
      • Specific words
          • Leisure, route, tortoise, poem
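
A dialect-independent entry can be turned into a dialect-specific one by a small mapping step. The sketch below is illustrative only: the key-vowel symbol `O_key` is invented (the slide does not name it), `VOWELS` is a partial set of unstressed vowel phones, and the rule for dropping post-vocalic r in UK English is simplified to "r after a vowel and not before a vowel".

```python
# Hypothetical key-vowel symbol for the vowel in "coffee" and "conference".
KEY_VOWELS = {"O_key": {"US": "aa", "UK": "o"}}
VOWELS = {"aa", "ae", "ax", "eh", "ih", "iy", "uw", "ey", "aw", "o"}  # partial


def realize(phones, dialect):
    """Map a dialect-independent pronunciation to one dialect."""
    mapped = [KEY_VOWELS.get(p, {}).get(dialect, p) for p in phones]
    if dialect != "UK":
        return mapped
    result = []
    for i, phone in enumerate(mapped):
        after_vowel = i > 0 and mapped[i - 1] in VOWELS
        before_vowel = i + 1 < len(mapped) and mapped[i + 1] in VOWELS
        if phone == "r" and after_vowel and not before_vowel:
            continue  # drop post-vocalic r
        result.append(phone)
    return result


print(realize(["k", "aa", "r"], "US"))  # ['k', 'aa', 'r']
print(realize(["k", "aa", "r"], "UK"))  # ['k', 'aa']
```

Words such as leisure or route do not follow from rules like these and would need per-dialect entries.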

SLIDE 19

Post-lexical Rules

  • Sometimes you need context
  • “the” as dh ax or dh iy
      • The banana and The apple
  • R-insertion in UK English
      • Car door vs car alarm
  • Liaison in French
      • Petit vs Petit ami
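
Post-lexical rules look at the neighbouring word after lexicon lookup. The sketch below covers the “the” rule and r-insertion, assuming the choice depends only on whether the next word starts with a vowel phone. The vowel set and the transcriptions of banana, apple, door, and alarm are illustrative and not taken from the slides.

```python
VOWELS = {"aa", "ae", "ao", "ax", "eh", "ih", "iy", "uw", "ey", "aw", "o"}  # partial


def starts_with_vowel(phones):
    """True if the first phone, ignoring any stress digit, is a vowel."""
    return bool(phones) and phones[0].rstrip("012") in VOWELS


def pronounce_the(next_phones):
    """'the' is dh iy before a vowel and dh ax otherwise."""
    return ["dh", "iy"] if starts_with_vowel(next_phones) else ["dh", "ax"]


def insert_r(word_phones, next_phones, spelled_with_final_r):
    """UK English: restore r when the next word starts with a vowel."""
    if spelled_with_final_r and starts_with_vowel(next_phones):
        return word_phones + ["r"]
    return word_phones


banana = ["b", "ax", "n", "ae1", "n", "ax"]
apple = ["ae1", "p", "ax", "l"]
print(pronounce_the(banana), pronounce_the(apple))
print(insert_r(["k", "aa"], ["d", "ao"], True))             # car door
print(insert_r(["k", "aa"], ["ax", "l", "aa1", "m"], True))  # car alarm
```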

SLIDE 20

Summary

  • Linguistic analysis
      • Part of speech tagging
      • Pronunciation
          • Phones, stress, (syllables)
          • Letter to sound rules
      • Post lexical rules
