
Hidden Markov Model (HMM) based Speech Synthesis using HTS Toolkit



  1. Hidden Markov Model (HMM) based Speech Synthesis using HTS Toolkit. Presenter: Omer Nawaz, Research Officer (III)

  2. Speech Synthesis Overview: Text to be Synthesized → Natural Language Processing (NLP) → Speech Synthesis Engine → Synthesized Speech

  3. Introduction:
     Rule-based, formant synthesis:
     • Hand-crafting each phonetic unit by rules.
     Corpus-based:
     • Concatenative synthesis: high-quality speech can be synthesized using waveform concatenation algorithms. To obtain various voices, a large amount of speech data is necessary.
     • Statistical parametric synthesis: speech parameters are generated from statistical models. Voice quality can easily be changed by transforming HMM parameters.

  4. Approaches at CLE (corpus-based): Unit Selection and HMM-based. Comparison of the two approaches:
     Unit Selection
     • Advantages: high quality at waveform level (specific domain)
     • Disadvantages: large footprint; discontinuous; unstable quality
     HMM-based
     • Advantages: small footprint; smooth; stable quality
     • Disadvantages: vocoder sound (domain-independent)

  5. Synthesis Model: Source-Filter Model
     • Source excitation part: a pulse train or white noise provides the excitation e(n).
     • Vocal tract resonance part: a linear time-invariant system with impulse response h(n).
     • Speech: x(n) = h(n) * e(n), where * denotes convolution.
     • h(n) is defined by the state output vector of the HMM, e.g. mel-cepstrum.
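
The source-filter relation x(n) = h(n) * e(n) can be illustrated with a few lines of plain Python. This is only a minimal sketch: the decaying-exponential impulse response, the pulse period, and the function names are assumptions chosen for illustration. In HTS the filter is derived from mel-cepstral coefficients, which is not reproduced here.

```python
import random


def pulse_train(length, period):
    """Periodic excitation e(n): a unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def white_noise(length, seed=0):
    """Noise excitation e(n): zero-mean Gaussian samples (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def convolve(h, e):
    """Linear convolution x(n) = sum over k of h(k) * e(n - k)."""
    x = [0.0] * (len(h) + len(e) - 1)
    for n, e_val in enumerate(e):
        for k, h_val in enumerate(h):
            x[n + k] += h_val * e_val
    return x


# Hypothetical impulse response standing in for the vocal tract filter h(n).
h = [0.5 ** k for k in range(4)]       # [1.0, 0.5, 0.25, 0.125]
e = pulse_train(length=8, period=4)    # pulses at n = 0 and n = 4
x = convolve(h, e)
print(x)
```

Each pulse in e(n) places a copy of h(n) in the output, which is why the filter shape repeats once per period. Passing `white_noise(8)` instead of the pulse train gives the noise-excited case.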

  6. General Overview (HTS):
     • Training part: Speech Input → Extract Spectrum, F0, Labels → Train Acoustic Models → Stored Models
     • Synthesis part: Text Input → Labels → Parameter Generation → Synthesis Filter → Synthesized Speech

  7. Challenges:
     • Generation of the full-context style labels.
     • Addition of the Stress/Syllable layer.
     • Defining the question set.
     • Optimizing the synthesized quality.

  8. Full-Context Label Style:
     Phoneme sequence: P A K I S T_D A N
     • Tri-phone context-dependent model: P-A-K, T_D-A-N
     • Full-context style context-dependent model: x^P-A+K=I@x_x/A …, S^T_D-A+N=x@x_x/A …
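
The segmental prefix of the full-context labels on this slide follows the pattern `LL^L-C+R=RR` (two preceding phonemes, the current one, two succeeding ones), with `x` filling positions that fall outside the sequence. A minimal sketch of how that prefix could be generated from a phoneme sequence follows. The function name and the `pad` parameter are hypothetical, and the supra-segmental fields after `@` are not handled.

```python
def quinphone_contexts(phonemes, pad="x"):
    """Return one 'LL^L-C+R=RR' string per phoneme, padding the edges with `pad`."""
    padded = [pad, pad] + list(phonemes) + [pad, pad]
    labels = []
    for i in range(2, len(padded) - 2):
        ll, l, c, r, rr = padded[i - 2:i + 3]
        labels.append(f"{ll}^{l}-{c}+{r}={rr}")
    return labels


# Phoneme sequence taken from the slide.
labels = quinphone_contexts(["P", "A", "K", "I", "S", "T_D", "A", "N"])
for label in labels:
    print(label)
```

The second and seventh entries reproduce the two prefixes shown on the slide, `x^P-A+K=I` and `S^T_D-A+N=x`.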

  9. Full-Context Format:
     x^x-SIL+A=L@1_0/A:0_0_0/B:0-0-0@1-0&1-1#1-1$1-1!0-0;0- …
     x^SIL-A+L=I_I@1_1/A:0_0_0/B:0-0-1@1-2&1-9#1-3$1-1!0-2;0- …
     SIL^A-L+I_I=A@1_2/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0-0 …
     A^L-I_I+A=P@2_1/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0- …

  10. Full-Context Format:
     SIL^A-L+I_I=A@1_2/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0-0|I_I/C:1+0+2/D:0_0/E:content+2@1+5&1+4#0+1/F:content_2/G:0_0/H:9=5^1=2|NONE/I:8=6/J:17+11-2
     Segmental context:
     • Current phoneme
     • Previous two phonemes
     • Next two phonemes
     Supra-segmental context:
     • Syllable
     • Stress
     • Word
     • Phrase
     • POS
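
To make the segmental part of the format concrete, the sketch below pulls the five phoneme fields out of a label with a regular expression. The field names (`prev2`, `prev1`, `current`, `next1`, `next2`) are hypothetical names chosen here, and the supra-segmental fields after the first `@` are left unparsed because the slide does not define each of them individually.

```python
import re

# Delimiters follow the label layout shown on the slide: LL^L-C+R=RR@...
SEGMENTAL = re.compile(
    r"^(?P<prev2>[^\^]+)\^(?P<prev1>[^-]+)-(?P<current>[^+]+)"
    r"\+(?P<next1>[^=]+)=(?P<next2>[^@]+)@"
)


def parse_segmental(label):
    """Return the five segmental context fields of a full-context label."""
    match = SEGMENTAL.match(label)
    if match is None:
        raise ValueError(f"not a full-context label: {label!r}")
    return match.groupdict()


context = parse_segmental("SIL^A-L+I_I=A@1_2/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0-0")
print(context)
```

For the example label, the current phoneme is `L`, preceded by `SIL` and `A` and followed by `I_I` and `A`. Phoneme names containing an underscore, such as `I_I` or `T_D`, pass through unchanged because `_` is not one of the delimiters.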

  11. Steps to Generate Full-Context Labels:
     • Extract the Segmental and Word layers from the TextGrid file.
     • Apply stress and syllabification rules.
     • Align syllable boundaries with the Segmental layer.
     • Generate a new TextGrid file with additional layers.
     • Convert to full-context format.
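
The boundary-alignment step can be pictured as mapping each syllable onto the time span of the phonemes it contains. The sketch below is illustrative only: the interval times, the syllabification of the example word, and the function name are all hypothetical and are not the rules or data used at CLE.

```python
def syllable_intervals(phone_tier, syllables):
    """Map syllables (lists of phonemes) onto (start, end, text) intervals.

    phone_tier: list of (start_sec, end_sec, phoneme) in time order.
    syllables:  list of phoneme lists that together cover the whole tier.
    """
    intervals = []
    index = 0
    for syllable in syllables:
        segment = phone_tier[index:index + len(syllable)]
        if [phoneme for _, _, phoneme in segment] != list(syllable):
            raise ValueError(f"syllable {syllable} does not match the phone tier")
        # A syllable starts with its first phoneme and ends with its last one.
        intervals.append((segment[0][0], segment[-1][1], " ".join(syllable)))
        index += len(syllable)
    if index != len(phone_tier):
        raise ValueError("syllables do not cover the whole phone tier")
    return intervals


# Hypothetical segmental layer (times in seconds) and hypothetical syllabification.
phones = [
    (0.00, 0.05, "P"), (0.05, 0.15, "A"), (0.15, 0.22, "K"), (0.22, 0.30, "I"),
    (0.30, 0.38, "S"), (0.38, 0.44, "T_D"), (0.44, 0.55, "A"), (0.55, 0.62, "N"),
]
syllable_tier = syllable_intervals(phones, [["P", "A"], ["K", "I", "S"], ["T_D", "A", "N"]])
print(syllable_tier)
```

Reusing the phoneme boundaries directly keeps the new syllable layer consistent with the Segmental layer, so no syllable boundary can fall inside a phoneme.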

  12. TextGrid Format: [figure]

  13. Steps to Generate Full-Context Labels:
     • Extract the Segmental and Word layers from the TextGrid file.
     • Apply stress and syllabification rules.
     • Align syllable boundaries with the Segmental layer.
     • Generate a new TextGrid file with additional layers.
     • Convert to full-context format.

  14. TextGrid Format with Additional Layers: [figure]

  15. Context Clustering (Question Set) 1/2:
     The number of possible combinations is enormous with these 53 different contexts.
     • With only the segmental context, the number of possible models is 66^5 ≈ 1252 million.
     • If we consider all the contexts, the number is practically infinite.
     Solution:
     • Record data with maximum phoneme coverage at the tri-phone or di-phone level.
     • Apply a context clustering technique to classify and share acoustically similar models.
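
The figure of roughly 1252 million can be checked directly. The sketch assumes that 66 is the size of the phoneme inventory and that 5 is the width of the segmental context (current phoneme plus two on each side). The slide gives the numbers but does not spell out this interpretation.

```python
NUM_PHONEMES = 66    # assumed: phoneme inventory size
CONTEXT_WIDTH = 5    # assumed: previous two + current + next two phonemes

quinphone_models = NUM_PHONEMES ** CONTEXT_WIDTH
triphone_models = NUM_PHONEMES ** 3

print(f"quinphone contexts: {quinphone_models:,}")
print(f"in millions:        {quinphone_models / 1e6:.0f}")
print(f"triphone contexts:  {triphone_models:,}")
```

Under these assumptions the quinphone count is 1,252,332,576, against 287,496 for tri-phones, which is why recording for tri-phone or di-phone coverage and sharing parameters across clustered models is the practical route.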

  16. Context Clustering (Question Set) 2/2:
     Phoneme:
     • {preceding, current, succeeding} phonemes
     Stress/Syllable/Word:
     • # of phonemes in {preceding, current, succeeding} syllable
     • Stress of {preceding, current, succeeding} syllable
     • Position of current syllable in current word
     • # of syllables {from previous, to next} stressed syllable
     • Vowel within current syllable
     • # of syllables in {preceding, current, succeeding} word
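
A phoneme question can be thought of as a set of wildcard patterns that a full-context label either matches or not. The sketch below builds such patterns for each segmental position and tests a label against them. The position codes, the two-vowel set, and the helper names are hypothetical, and the `QS "name" {patterns}` line is written in the style of HTS question files as an assumption rather than a verified excerpt.

```python
from fnmatch import fnmatchcase

# Wildcard templates keyed by position in 'LL^L-C+R=RR@...'.
TEMPLATES = {
    "LL": "{p}^*",
    "L": "*^{p}-*",
    "C": "*-{p}+*",
    "R": "*+{p}=*",
    "RR": "*={p}@*",
}


def phoneme_patterns(position, phonemes):
    """Build one wildcard pattern per phoneme for the given context position."""
    return [TEMPLATES[position].format(p=p) for p in phonemes]


def answers_yes(label, patterns):
    """A question is answered 'yes' if any of its patterns matches the label."""
    return any(fnmatchcase(label, pattern) for pattern in patterns)


def question_line(name, patterns):
    """Format a question in the assumed 'QS "name" {pattern,...}' layout."""
    return 'QS "%s" {%s}' % (name, ",".join(patterns))


vowels = ["A", "I"]                      # hypothetical, incomplete vowel set
label = "x^P-A+K=I@x_x/A"                # label prefix from the slides
current_is_vowel = phoneme_patterns("C", vowels)
next_is_vowel = phoneme_patterns("R", vowels)

print(question_line("C-Vowel", current_is_vowel))
print(answers_yes(label, current_is_vowel))
print(answers_yes(label, next_is_vowel))
```

For this label the current phoneme `A` matches the vowel question, while the next phoneme `K` does not. Decision-tree clustering uses such yes/no answers to decide which contexts share a model.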

  17. Some Synthesized Examples:
     • Seen context: training set
     • Unseen context: different carrier word

  18. Questions?
