Speech Processing 15-492/18-492 Speech Synthesis Pronunciation


  1. Speech Processing 15-492/18-492: Speech Synthesis, Pronunciation, Letter to Sound rules

  2. Speech Synthesis
     - Linguistic Analysis
       - Pronunciations
       - Prosody

  3. Part of Speech Tagging
     - Find the most likely tag for each word
       - Most words only have one tag (92% correct)
     - Context often defines tag type
       - “The project” vs “To project”
     - Use HMM Part of Speech tagger
       - But need data to train it (English Penn Treebank)

  4. Poor Man’s PoS Tagger
     - Hand list “function” word types
       - (determiners a an the this)
       - (conjunctions and or but)
       - (pp in on to)
       - (content everything else)
     - Better than nothing
       - Easy to do on new languages
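The hand-listed tagger on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the table name `FUNCTION_WORDS` and both function names are hypothetical, and the word lists are only the ones shown on the slide.

```python
# Hypothetical table built from the word lists on the slide.
FUNCTION_WORDS = {
    "determiner": {"a", "an", "the", "this"},
    "conjunction": {"and", "or", "but"},
    "pp": {"in", "on", "to"},
}


def poor_mans_tag(word):
    """Return the function-word class of `word`, or "content" otherwise."""
    lowered = word.lower()
    for tag, words in FUNCTION_WORDS.items():
        if lowered in words:
            return tag
    # Everything not hand-listed is treated as a content word.
    return "content"


def tag_sentence(sentence):
    """Tag each whitespace-separated token of `sentence`."""
    return [(w, poor_mans_tag(w)) for w in sentence.split()]
```

Porting this to a new language only requires replacing the hand-written lists, which is the appeal the slide points out.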

  5. Pronunciation Lexicon
     - List of words and their pronunciation
       - (“pencil” n (p eh1 n s ih l))
       - (“table” n (t ey1 b ax l))
     - Need the right phoneme set
     - Need other information
       - Part of speech
       - Lexical stress
       - Other information (Tone, Lexical accent …)
       - Syllable boundaries

  6. Homograph Representation
     - Must distinguish different pronunciations
       - (“project” n (p r aa1 jh eh k t))
       - (“project” v (p r ax jh eh1 k t))
       - (“bass” n_music (b ey1 s))
       - (“bass” n_fish (b ae1 s))
     - ASR multiple pronunciations
       - (“route” n (r uw t))
       - (“route(2)” n (r aw t))
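One way to picture the homograph entries is a lexicon keyed by word and tag. The sketch below assumes a plain Python dictionary; `LEXICON`, `lookup`, and `variants` are hypothetical names, and the entries are copied from the slide.

```python
# Entries from the slide, keyed by (word, tag). Phones carry stress digits.
LEXICON = {
    ("project", "n"): ["p", "r", "aa1", "jh", "eh", "k", "t"],
    ("project", "v"): ["p", "r", "ax", "jh", "eh1", "k", "t"],
    ("bass", "n_music"): ["b", "ey1", "s"],
    ("bass", "n_fish"): ["b", "ae1", "s"],
}


def lookup(word, tag):
    """Return the pronunciation for this word and tag, or None if unlisted."""
    return LEXICON.get((word.lower(), tag))


def variants(word):
    """Return every listed pronunciation of `word`, keyed by tag."""
    return {tag: phones for (w, tag), phones in LEXICON.items()
            if w == word.lower()}
```

A synthesizer needs `lookup` with the tag chosen by the PoS tagger, while a recognizer is closer to `variants`, since it must accept any listed pronunciation.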

  7. Pronunciation of Unknown Words
     - How do you pronounce new words?
     - 4% of tokens (in news) are new
     - You can’t synthesize them without pronunciations
     - You can’t recognize them without pronunciations
     - Letter-to-Sound rules
     - Grapheme-to-Phoneme rules

  8. LTS: Hand written
     - Hand written rules
       - [LeftContext] X [RightContext] -> Y
       - e.g.
       - c [h r] -> k
       - c [h] -> ch
       - c [i] -> s
       - c -> k
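The four rules for "c" can be read as an ordered list where the first matching rule wins. The sketch below assumes that reading and encodes each rule as a hypothetical `(left, letter, right, phone)` tuple. It only predicts the phone for one letter and does not handle how the following "h" is consumed.

```python
# Ordered rules from the slide: (left context, letter, right context, phone).
# An empty context matches anything; more specific rules come first.
RULES = [
    ("", "c", "hr", "k"),   # c [h r] -> k
    ("", "c", "h", "ch"),   # c [h]   -> ch
    ("", "c", "i", "s"),    # c [i]   -> s
    ("", "c", "", "k"),     # c       -> k  (default)
]


def apply_rules(word, position):
    """Return the phone for the letter at `position`, or None if no rule fits."""
    letter = word[position]
    left = word[:position]
    right = word[position + 1:]
    for left_ctx, target, right_ctx, phone in RULES:
        if (letter == target and left.endswith(left_ctx)
                and right.startswith(right_ctx)):
            return phone
    return None
```

For example, `apply_rules("christmas", 0)` returns `"k"` and `apply_rules("church", 0)` returns `"ch"`, because the `hr` rule is tried before the `h` rule.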

  9. LTS: Machine Learning Techniques
     - Need an existing lexicon
       - Pronunciations: words and phones
       - But different number of letters and phones
     - Need an alignment
       - Between letters and phones
       - checked -> ch eh k t

  10. LTS: alignment
      - checked -> ch eh k t
          c  h  e  c  k  e  d
          ch _  eh k  _  _  t
      - Some letters go to nothing
      - Some letters go to two phones
        - box -> b aa k-s
        - table -> t ey b ax-l _

  11. Find alignment automatically
      - Epsilon scattering
        - Find all possible alignments
        - Estimate p(L,P) on each alignment
        - Find most probable alignment
      - Hand seed
        - Hand specify allowable pairs
        - Estimate p(L,P) on each possible alignment
        - Find most probable alignment
      - Statistical Machine Translation (IBM model 1)
        - Estimate p(L,P) on each possible alignment
        - Find most probable alignment
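The "find all possible alignments, then pick the most probable" steps can be sketched as follows. This is a simplified illustration under several assumptions: the probability table `prob` is supplied by the caller rather than estimated from the lexicon, unseen pairs get a small hypothetical `floor` value, and a word never has more phones than letters (dual phones such as `k-s` are treated as one symbol).

```python
from itertools import combinations
from math import prod


def all_alignments(letters, phones):
    """Yield every alignment that keeps phone order and pads with "_"."""
    n, m = len(letters), len(phones)
    # Choose which letter positions receive a phone; the rest get epsilon.
    for slots in combinations(range(n), m):
        aligned = ["_"] * n
        for slot, phone in zip(slots, phones):
            aligned[slot] = phone
        yield list(zip(letters, aligned))


def best_alignment(letters, phones, prob, floor=1e-6):
    """Return the alignment with the highest product of pair probabilities."""
    return max(
        all_alignments(letters, phones),
        key=lambda a: prod(prob.get(pair, floor) for pair in a),
    )
```

For "checked" with four phones there are 35 candidate alignments. In a full system the table would be re-estimated from the chosen alignments and the process repeated, which this sketch leaves out.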

  12. Not everything aligns
      - 0, 1, and 2 letter cases
        - e -> epsilon “moved”
        - x -> k-s, g-z “box” “example”
        - e -> y-uw “askew”
      - Some alignments aren’t sensible
        - dept -> d ih p aa r t m ax n t
        - cmu -> s iy eh m y uw

  13. Training LTS models
      - Use CART trees
        - One model for each letter
      - Predict phone (epsilon, phone, dual phone)
        - From letter 3-context (and POS)
      - # # # c h e c -> ch
      - # # c h e c k -> _
      - # c h e c k e -> eh
      - c h e c k e d -> k

  14. LTS results
      - Split lexicon into train/test 90%/10%
        - i.e. every tenth entry is extracted for testing

      Lexicon     Letter Acc   Word Acc
      OALD        95.80%       75.56%
      CMUDICT     91.99%       57.80%
      BRULEX      99.00%       93.03%
      DE-CELEX    98.79%       89.38%
      Thai        95.60%       68.76%
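The split and the two accuracy measures can be sketched as below. Which entry counts as "every tenth" is an assumption here (indices 9, 19, ...), and both function names are hypothetical. Predictions and references are aligned phone lists with one symbol per letter.

```python
def split_every_tenth(entries):
    """Hold out every tenth entry for testing; keep the rest for training."""
    test = entries[9::10]
    train = [e for i, e in enumerate(entries) if i % 10 != 9]
    return train, test


def accuracies(predicted, reference):
    """Return (letter accuracy, word accuracy) over aligned phone lists."""
    letters = correct_letters = correct_words = 0
    for pred, ref in zip(predicted, reference):
        letters += len(ref)
        correct_letters += sum(p == r for p, r in zip(pred, ref))
        # A word counts as correct only if every letter's phone is right.
        correct_words += pred == ref
    return correct_letters / letters, correct_words / len(reference)
```

Because a single wrong phone makes the whole word wrong, word accuracy is always at or below letter accuracy, which matches the gap between the two columns in the table.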

  15. Example Tree

  16. But we need more than phones
      - What about lexical stress
        - p r aa1 j eh k t -> p r aa j eh1 k t
      - Two possibilities
        - A separate prediction model
        - Joint model: introduce eh/eh1 (BETTER)

                  LTP+S     LTPS
      L no S      96.36%    96.27%
      Letter      ---       95.80%
      W no S      76.92%    74.69%
      Word        63.68%    74.56%

  17. Does it really work
      - 40K words from Time Magazine
        - 1775 (4.6%) not in OALD
        - LTS gets 70% correct (test set was 74%)

                      Occurs   %
      Names           1360     76.6
      Unknown         351      19.8
      US Spelling     57       3.2
      Typos           7        0.4

  18. Dialect Lexicons
      - Need different lexicons for different dialects
        - US, UK, Indian, Australia, Europeans
      - Build dialect independent lexicons
        - Dialect independent vowels (“key-vowels”)
          - The vowel in coffee and conference
          - Map to aa in US, and o in the UK
        - Post-vocalic r in UK English
          - Car -> k aa
        - Specific words
          - Leisure, route, tortoise, poem
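A dialect-independent entry can be expanded into a dialect-specific one with a small mapping step. The sketch below is only an illustration: the key-vowel symbol `KV_coffee`, the tiny `VOWELS` set, and the function name are all hypothetical, and dropping post-vocalic r is reduced to "remove r unless a vowel follows within the word".

```python
# Small illustrative vowel set; a real phone set would be larger.
VOWELS = {"aa", "o", "iy", "ax", "ih", "eh"}

# Hypothetical key-vowel symbol for the vowel in "coffee" and "conference".
KEY_VOWELS = {"KV_coffee": {"us": "aa", "uk": "o"}}


def realise(phones, dialect):
    """Map a dialect-independent phone list to one dialect ("us" or "uk")."""
    out = [KEY_VOWELS.get(p, {}).get(dialect, p) for p in phones]
    if dialect == "uk":
        # Drop r when it is word-final or followed by a non-vowel.
        out = [p for i, p in enumerate(out)
               if not (p == "r"
                       and (i + 1 == len(out) or out[i + 1] not in VOWELS))]
    return out
```

With this, `realise(["k", "aa", "r"], "uk")` gives `["k", "aa"]` as on the slide, while the US form keeps the final r. Words such as "leisure" or "route" would still need their own per-dialect entries.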

  19. Post-lexical Rules
      - Sometimes you need context
      - “the” as dh ax or dh iy
        - The banana and The apple
      - R-insertion in UK English
        - Car door vs car alarm
      - Liaison in French
        - Petit vs Petit ami
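The rule for "the" depends on the word that follows, so it has to run after lexical lookup. A minimal sketch, assuming the rule is "dh iy before a vowel, dh ax otherwise": the vowel inventory and function names below are hypothetical, and the pronunciations in the usage note are illustrative rather than taken from the slides.

```python
# Assumed vowel inventory (stress digits are stripped before comparing).
VOWEL_PHONES = {"aa", "ae", "ah", "ao", "aw", "ax", "ay", "eh",
                "er", "ey", "ih", "iy", "ow", "oy", "uh", "uw"}


def strip_stress(phone):
    """Remove a trailing stress digit, e.g. "ae1" -> "ae"."""
    return phone.rstrip("012")


def the_rule(words):
    """Rewrite "the" by context. `words` is a list of (word, phones) pairs."""
    out = []
    for i, (word, phones) in enumerate(words):
        if word.lower() == "the":
            next_phones = words[i + 1][1] if i + 1 < len(words) else []
            before_vowel = (bool(next_phones)
                            and strip_stress(next_phones[0]) in VOWEL_PHONES)
            phones = ["dh", "iy"] if before_vowel else ["dh", "ax"]
        out.append((word, list(phones)))
    return out
```

Given "the apple" with apple as `ae1 p ax l`, the rule yields `dh iy`; given "the banana" with banana as `b ax n ae1 n ax`, it yields `dh ax`. R-insertion and French liaison follow the same pattern of inspecting the next word's first phone.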

  20. Summary
      - Linguistic analysis
        - Part of speech tagging
        - Pronunciation
          - Phones, stress, (syllables)
          - Letter to sound rules
        - Post lexical rules
