Speech Processing 15-492/18-492 Speech Synthesis Pronunciation - PowerPoint PPT Presentation

Speech Processing 15-492/18-492
Speech Synthesis
Pronunciation: Letter-to-Sound Rules

Speech Synthesis
- Linguistic analysis
  - Pronunciations
  - Prosody

Part of Speech Tagging
- Find the most likely tag for each word
  - Most words have only one tag (92% correct)
- Context often determines the tag
  - "The project" vs. "To project"
- Use an HMM part-of-speech tagger
  - But it needs data to train on (for English, the Penn Treebank)
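
To make the HMM idea concrete, here is a minimal Viterbi decoder over a toy model. Everything in it is an assumption for illustration: the four tags and all start, transition, and emission probabilities are hand-set rather than trained on the Penn Treebank. It shows how the left context ("the" vs. "to") changes the tag chosen for "project".

```python
import math

# Toy, hand-set probabilities (hypothetical). A real tagger estimates these
# from a tagged corpus such as the Penn Treebank.
TAGS = ["DT", "TO", "NN", "VB"]
START = {"DT": 0.5, "TO": 0.3, "NN": 0.1, "VB": 0.1}            # P(tag_1)
TRANS = {                                                         # P(tag_i | tag_i-1)
    "DT": {"DT": 0.01, "TO": 0.01, "NN": 0.9, "VB": 0.08},
    "TO": {"DT": 0.2, "TO": 0.01, "NN": 0.09, "VB": 0.7},
    "NN": {"DT": 0.1, "TO": 0.3, "NN": 0.3, "VB": 0.3},
    "VB": {"DT": 0.5, "TO": 0.2, "NN": 0.2, "VB": 0.1},
}
EMIT = {                                                          # P(word | tag)
    "DT": {"the": 0.7},
    "TO": {"to": 0.9},
    "NN": {"project": 0.01},
    "VB": {"project": 0.005},
}
FLOOR = 1e-8  # probability used for unseen (tag, word) pairs

def viterbi(words):
    """Return the most likely tag sequence for words under the toy HMM."""
    def emit(tag, word):
        return math.log(EMIT[tag].get(word, FLOOR))

    # best[t] = (log prob of the best path ending in tag t, that path)
    best = {t: (math.log(START[t]) + emit(t, words[0]), [t]) for t in TAGS}
    for word in words[1:]:
        best = {
            t: max(
                (score + math.log(TRANS[prev][t]) + emit(t, word), path + [t])
                for prev, (score, path) in best.items()
            )
            for t in TAGS
        }
    return max(best.values())[1]
```

With these numbers, `viterbi(["the", "project"])` returns `["DT", "NN"]` and `viterbi(["to", "project"])` returns `["TO", "VB"]`.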

Poor Man's PoS Tagger
- Hand-list the "function" word types
  - (determiners a an the this)
  - (conjunctions and or but)
  - (pp in on to)
  - (content everything else)
- Better than nothing
- Easy to do for new languages
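
A sketch of the poor man's tagger: the word lists are the ones on the slide, while the function name and the case-insensitive lookup are illustrative choices.

```python
# Hand-listed function-word classes from the slide; anything not listed
# is tagged "content".
FUNCTION_WORDS = {
    "determiners": ["a", "an", "the", "this"],
    "conjunctions": ["and", "or", "but"],
    "pp": ["in", "on", "to"],
}
WORD_TO_CLASS = {w: cls for cls, words in FUNCTION_WORDS.items() for w in words}

def poor_mans_tag(words):
    """Tag each word with its function-word class, or 'content' if unlisted."""
    return [WORD_TO_CLASS.get(w.lower(), "content") for w in words]
```

For example, `poor_mans_tag("The cat sat on the mat".split())` gives `determiners, content, content, pp, determiners, content`. Porting to a new language only means writing new lists.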

Pronunciation Lexicon
- A list of words and their pronunciations
  - ("pencil" n (p eh1 n s ih l))
  - ("table" n (t ey1 b ax l))
- Need the right phoneme set
- Need other information
  - Part of speech
  - Lexical stress
  - Other information (tone, lexical accent, ...)
  - Syllable boundaries
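
Entries in this format can be read with a small parser. This sketch assumes only the flat ("word" pos (phones)) shape shown above, with lexical stress written as a trailing digit on the vowel; the regex and helper names are hypothetical, and syllable boundaries or other fields a full lexicon might carry are not handled.

```python
import re

# Matches entries of the form ("word" pos (phone phone ...)).
ENTRY_RE = re.compile(r'\(\s*"([^"]+)"\s+(\S+)\s+\(([^()]*)\)\s*\)')

def parse_entry(text):
    """Parse a lexicon entry like ("pencil" n (p eh1 n s ih l))."""
    m = ENTRY_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a lexicon entry: {text!r}")
    return m.group(1), m.group(2), m.group(3).split()

def split_stress(phones):
    """Separate stress digits: 'eh1' -> ('eh', 1); phones without a digit get None."""
    out = []
    for p in phones:
        if p[-1].isdigit():
            out.append((p[:-1], int(p[-1])))
        else:
            out.append((p, None))
    return out
```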

Homograph Representation
- Must distinguish different pronunciations
  - ("project" n (p r aa1 jh eh k t))
  - ("project" v (p r ax jh eh1 k t))
  - ("bass" n_music (b ey1 s))
  - ("bass" n_fish (b ae1 s))
- ASR: multiple pronunciations
  - ("route" n (r uw t))
  - ("route(2)" n (r aw t))
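
For synthesis a homograph must resolve to one pronunciation, so lookup is keyed on the word plus its POS or sense tag; for ASR, all numbered variants are kept and the recognizer chooses among them. A minimal sketch using the entries above; the function names and the fall-back-to-first-entry policy are assumptions.

```python
import re

# Synthesis lexicon from the slide: (word, pos/sense tag, phones).
SYNTH_LEXICON = [
    ("project", "n", "p r aa1 jh eh k t"),
    ("project", "v", "p r ax jh eh1 k t"),
    ("bass", "n_music", "b ey1 s"),
    ("bass", "n_fish", "b ae1 s"),
]

# ASR lexicon: numbered variants such as "route(2)" are extra pronunciations.
ASR_LEXICON = {"route": "r uw t", "route(2)": "r aw t"}
VARIANT_SUFFIX = re.compile(r"\(\d+\)$")

def synth_lookup(word, tag=None):
    """Return one pronunciation: the entry whose tag matches, else the first
    entry for the word (a hypothetical fallback), else None."""
    entries = [(t, phones) for w, t, phones in SYNTH_LEXICON if w == word]
    for t, phones in entries:
        if t == tag:
            return phones
    return entries[0][1] if entries else None

def asr_pronunciations(word):
    """Return every pronunciation variant listed for word."""
    return [phones for head, phones in ASR_LEXICON.items()
            if VARIANT_SUFFIX.sub("", head) == word]
```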

Pronunciation of Unknown Words
- How do you pronounce new words?
  - 4% of tokens (in news) are new
  - You can't synthesize them without pronunciations
  - You can't recognize them without pronunciations
- Letter-to-sound rules
- Grapheme-to-phoneme rules

LTS: Hand-written rules
- Hand-written rules
  - [LeftContext] X [RightContext] -> Y
  - e.g.
    - c [h r] -> k
    - c [h] -> ch
    - c [i] -> s
    - c -> k
  - Rules are tried in order; the first matching rule applies
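
The rules for "c" can be applied as an ordered list in which the first match wins. This sketch assumes contexts are literal letter strings; real rule sets typically also allow letter classes (e.g. any vowel), which it does not model, and a complete system needs rules for every letter (e.g. to make the "h" after "c" silent). Names are hypothetical.

```python
# Ordered rules for the letter "c", from the slide: (left, letter, right, output).
# An empty context matches anything; more specific rules must come first.
C_RULES = [
    ("", "c", "hr", "k"),
    ("", "c", "h", "ch"),
    ("", "c", "i", "s"),
    ("", "c", "", "k"),
]

def apply_rules(word, i, rules):
    """Return the phone for word[i] from the first rule whose context matches."""
    for left, letter, right, output in rules:
        if (word[i] == letter
                and word[:i].endswith(left)
                and word[i + 1:].startswith(right)):
            return output
    return None  # no rule covers this letter in this context
```

Ordering carries the logic: if the default `c -> k` came first, every "c" would become k.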

LTS: Machine Learning Techniques
- Need an existing lexicon
  - Pronunciations: words and phones
  - But different numbers of letters and phones
- Need an alignment
  - Between letters and phones
  - checked -> ch eh k t

LTS: alignment
- checked -> ch eh k t
    c   h   e   c   k   e   d
    ch  _   eh  k   _   _   t
- Some letters go to nothing
- Some letters go to two phones
  - box -> b aa k-s
  - table -> t ey b ax-l _
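
Once aligned, each letter has exactly one output symbol (a phone, "_" for nothing, or a dual phone such as "k-s"), which is what allows a per-letter classifier to be trained later. A small helper, with hypothetical names, that expands such an alignment back into the phone string so it can be checked against the lexicon:

```python
EPS = "_"  # epsilon: the letter produces no phone

def phones_from_alignment(letters, outputs):
    """Expand a one-output-per-letter alignment into the phone sequence.
    Dual phones are written joined with '-', e.g. 'k-s'."""
    if len(letters) != len(outputs):
        raise ValueError("alignment needs exactly one output per letter")
    phones = []
    for out in outputs:
        if out != EPS:
            phones.extend(out.split("-"))
    return phones
```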

Find alignment automatically
- Epsilon scattering
  - Find all possible alignments
  - Estimate p(L,P) on each alignment
  - Find most probable alignment
- Hand seed
  - Hand-specify allowable pairs
  - Estimate p(L,P) on each possible alignment
  - Find most probable alignment
- Statistical machine translation (IBM Model 1)
  - Estimate p(L,P) on each possible alignment
  - Find most probable alignment
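
The "find most probable alignment" step of the hand-seed approach can be done with dynamic programming over letter and phone positions. In this sketch the allowables table and its scores are invented for illustration (in practice the scores are estimated from the lexicon, e.g. with EM); only the search is shown, not the estimation, epsilon scattering, or IBM Model 1.

```python
import math
from functools import lru_cache

EPS = "_"

# Hypothetical hand-seeded allowables: for each letter, the outputs it may
# produce, with made-up scores standing in for p(L, P).
ALLOWABLES = {
    "b": {"b": 1.0},
    "c": {"k": 0.4, "ch": 0.3, "s": 0.25, EPS: 0.05},
    "d": {"d": 0.6, "t": 0.4},
    "e": {"eh": 0.5, "iy": 0.2, EPS: 0.3},
    "h": {EPS: 0.7, "hh": 0.3},
    "k": {"k": 0.7, EPS: 0.3},
    "o": {"aa": 0.6, "ow": 0.4},
    "x": {"k-s": 0.7, "g-z": 0.3},
}

def best_alignment(letters, phones, table=ALLOWABLES):
    """Most probable alignment as a list of (letter, output) pairs, or None."""
    phones = tuple(phones)

    @lru_cache(maxsize=None)
    def best(i, j):
        # Best (log score, pairs) aligning letters[i:] with phones[j:].
        if i == len(letters):
            return (0.0, ()) if j == len(phones) else None
        options = []
        for out, p in table.get(letters[i], {}).items():
            used = () if out == EPS else tuple(out.split("-"))
            if phones[j:j + len(used)] != used:
                continue  # this output does not match the next phones
            rest = best(i + 1, j + len(used))
            if rest is not None:
                options.append((math.log(p) + rest[0], ((letters[i], out),) + rest[1]))
        return max(options, default=None)

    result = best(0, 0)
    return None if result is None else list(result[1])
```

With these scores, "checked" aligns as ch _ eh k _ _ t, "box" uses the dual phone k-s, and "cmu" gets no alignment because "m" and "u" have no allowables here, which is one way entries that don't align sensibly can be flagged.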

Not everything aligns
- Letters can map to 0, 1, or 2 phones
  - e -> epsilon ("moved")
  - x -> k-s, g-z ("box", "example")
  - e -> y-uw ("askew")
- Some alignments aren't sensible
  - dept -> d ih p aa r t m ax n t
  - cmu -> s iy eh m y uw

Training LTS models
- Use CART trees
  - One model for each letter
- Predict the phone (epsilon, phone, or dual phone)
  - From the letter's context, 3 letters on each side (and POS)
- Training examples for "checked":
    # # # c h e c -> ch
    # # c h e c k -> _
    # c h e c k e -> eh
    c h e c k e d -> k
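
These training examples can be generated mechanically: pad the word with "#", slide a 7-letter window (3 letters either side), and group the examples by centre letter, one group per letter-specific model. A sketch with hypothetical names; the POS feature and the CART learner itself are left out, and any decision-tree implementation could be fitted to each group.

```python
from collections import defaultdict

def letter_windows(word, outputs, context=3, pad="#"):
    """One example per letter: (left context + letter + right context, output)."""
    if len(word) != len(outputs):
        raise ValueError("need one aligned output per letter")
    padded = [pad] * context + list(word) + [pad] * context
    return [(tuple(padded[i:i + 2 * context + 1]), out)
            for i, out in enumerate(outputs)]

def examples_per_letter(aligned_lexicon, context=3):
    """Group examples by their centre letter: one model is trained per letter."""
    groups = defaultdict(list)
    for word, outputs in aligned_lexicon:
        for window, out in letter_windows(word, outputs, context):
            groups[window[context]].append((window, out))
    return groups
```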

LTS results
- Split the lexicon into train/test 90%/10%
  - i.e. every tenth entry is held out for testing

Lexicon     Letter acc.   Word acc.
OALD        95.80%        75.56%
CMUDICT     91.99%        57.80%
BRULEX      99.00%        93.03%
DE-CELEX    98.79%        89.38%
Thai        95.60%        68.76%
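
A sketch of this evaluation: a 90/10 split that holds out every tenth entry, plus letter and word accuracy computed over aligned outputs. Which tenth entry is held out (here the 10th, 20th, ...) is an assumption, as are the function names. Because one wrong letter makes the whole word wrong, word accuracy is always the lower figure.

```python
def split_every_tenth(entries):
    """90/10 split: every tenth entry (here the 10th, 20th, ...) goes to test."""
    train = [e for i, e in enumerate(entries) if i % 10 != 9]
    test = [e for i, e in enumerate(entries) if i % 10 == 9]
    return train, test

def lts_accuracy(results):
    """results: (predicted, reference) aligned output lists, one pair per word.
    Letter accuracy counts correct letters; word accuracy needs every letter right."""
    letters = letters_ok = words_ok = 0
    for predicted, reference in results:
        if len(predicted) != len(reference):
            raise ValueError("outputs must be aligned letter by letter")
        hits = sum(p == r for p, r in zip(predicted, reference))
        letters += len(reference)
        letters_ok += hits
        words_ok += hits == len(reference)
    return letters_ok / letters, words_ok / len(results)
```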

Example Tree

But we need more than phones
- What about lexical stress?
  - p r aa1 jh eh k t -> p r aa jh eh1 k t
- Two possibilities
  - A separate prediction model
  - A joint model: introduce eh/eh1 as distinct outputs (BETTER)

                           LTP+S     LTPS
Letter acc. (no stress)    96.36%    96.27%
Letter acc.                ---       95.80%
Word acc. (no stress)      76.92%    74.69%
Word acc.                  63.68%    74.56%

(LTP+S: separate stress model; LTPS: joint model)
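
In the joint model, "eh" and "eh1" are simply different output classes for the per-letter predictor. Scoring then comes in two flavours, matching the table rows: with stress, and with stress digits stripped ("no stress"). A minimal sketch; the function names are hypothetical.

```python
def strip_stress(phones):
    """Drop stress digits: 'eh1' -> 'eh'."""
    return [p.rstrip("012") for p in phones]

def word_correct(predicted, reference, ignore_stress=False):
    """Word-level match, optionally ignoring stress."""
    if ignore_stress:
        predicted, reference = strip_stress(predicted), strip_stress(reference)
    return predicted == reference
```

For the example above, predicting "p r aa1 jh eh k t" against the reference "p r aa jh eh1 k t" is wrong at the word level but correct once stress is ignored.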

Does it really work?
- 40K words from Time Magazine
  - 1775 (4.6%) not in OALD
  - LTS gets 70% of these correct (vs. 74% on the held-out test set)

Words not in OALD   Occurs   % of 1775
Names               1360     76.6
Unknown             351      19.8
US spelling         57       3.2
Typos               7        0.4
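
A quick arithmetic check that the four categories account for all 1775 words and give the listed percentages:

```python
# Counts from the table of Time Magazine words missing from OALD.
oov_counts = {"Names": 1360, "Unknown": 351, "US Spelling": 57, "Typos": 7}

total = sum(oov_counts.values())
shares = {k: round(100 * v / total, 1) for k, v in oov_counts.items()}
print(total, shares)
```

Names dominate, which is why proper-name pronunciation matters so much for LTS in practice.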

Dialect Lexicons
- Need different lexicons for different dialects
  - US, UK, Indian, Australian, European
- Build dialect-independent lexicons
  - Dialect-independent vowels ("key-vowels")
    - The vowel in coffee and conference: maps to aa in the US and o in the UK
  - Post-vocalic r in UK English
    - car -> k aa
  - Specific words
    - leisure, route, tortoise, poem
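
One way to realize a dialect-independent lexicon: store a key-vowel symbol and the post-vocalic r in the shared entry, then map to a dialect when the pronunciation is used. The symbol "KV_O", the vowel set, and the function name are hypothetical; words like leisure or route that differ idiosyncratically would still need per-dialect entries, which this sketch does not cover.

```python
# Hypothetical key-vowel symbol for the coffee/conference vowel, with its
# realization per dialect (aa in the US, o in the UK).
KEY_VOWELS = {"KV_O": {"us": "aa", "uk": "o"}}
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ax", "ay", "eh", "er", "ey",
          "ih", "iy", "o", "ow", "oy", "uh", "uw"}

def realize(phones, dialect):
    """Map a dialect-independent pronunciation to one dialect ('us' or 'uk')."""
    out = [KEY_VOWELS[p][dialect] if p in KEY_VOWELS else p for p in phones]
    if dialect == "uk":
        # Post-vocalic r: drop an r that follows a vowel and is not
        # followed by a vowel within the word.
        out = [p for i, p in enumerate(out)
               if not (p == "r" and i > 0 and out[i - 1] in VOWELS
                       and (i + 1 == len(out) or out[i + 1] not in VOWELS))]
    return out
```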

Post-lexical Rules
- Sometimes you need context
- "the" as dh ax or dh iy
  - "the banana" vs. "the apple"
- R-insertion in UK English
  - "car door" vs. "car alarm"
- Liaison in French
  - "petit" vs. "petit ami"
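
Post-lexical rules run over the whole phrase after lexical lookup, so they can see the next word. A sketch of the first two rules above, assuming UK-style entries with no final r (e.g. car as k aa) and a hand-picked vowel set; the spelling-based test for where an r may be inserted is a simplification, the names are hypothetical, and French liaison is not modelled.

```python
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ax", "ay", "eh", "er", "ey",
          "ih", "iy", "ow", "oy", "uh", "uw"}

def post_lexical(words):
    """Apply cross-word rules to a phrase given as (spelling, phones) pairs."""
    out = []
    for k, (spelling, phones) in enumerate(words):
        following = words[k + 1][1] if k + 1 < len(words) else []
        vowel_next = bool(following) and following[0] in VOWELS
        phones = list(phones)
        if spelling.lower() == "the":
            # "the" is dh ax, but becomes dh iy before a vowel.
            phones = ["dh", "iy" if vowel_next else "ax"]
        elif (spelling.lower().endswith("r") and vowel_next
              and phones and phones[-1] in VOWELS):
            # UK r-insertion: a word spelled with final r gets an r before a vowel.
            phones.append("r")
        out.append(phones)
    return out
```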

Summary
- Linguistic analysis
  - Part-of-speech tagging
  - Pronunciation
    - Phones, stress, (syllables)
    - Letter-to-sound rules
  - Post-lexical rules
