A study of hypo- and hyper-articulated synthesized speech

  1. A study of hypo- and hyper-articulated synthesized speech
     Mauro Nicolao
     Speech and Hearing Research Group, Department of Computer Science, The University of Sheffield
     SCALE (Speech Communication with Adaptive Learning) 2nd Winter School, Aachen, February 15, 2011

  2. Outline
     a) The “Speech Synthesis by Analysis” project
     b) Complete project architecture
     c) TTS prototype with control on speech quality (towards H&H)
        a) Weighted MLLR transformation
        b) Global Variance model manipulation
        c) Dynamic- vs static-feature weight control in speech generation
     d) Next steps

  4. Speech Synthesis by Analysis Project
     • Modifications of human speech:
       ‒ increasing voice intensity
       ‒ speech rate adjustments
       ‒ adapting rhythm to the noise
       ‒ signal processing (i.e. Lombard effect)
       ‒ change of word vocabulary
     • Success in communication:
       ‒ to produce intelligible speech
       ‒ to satisfy the listener’s needs
       ‒ to transfer a concept from the talker’s to the listener’s mind
     Lindblom (1990), Lane et al. (2007), Levelt et al. (1999)

  5. Speech Synthesis by Analysis Project
     • Automatic TTS systems ignore environmental effects on speech and any feedback from the listener.
     • Many researchers in different disciplines are investigating models to describe human behaviour.
     • A new way of thinking about automatic speech synthesis.
     Moore (2007), Casserly and Pisoni (2010)

  7. Complete project architecture
     [Figure: architecture diagram with a feedforward path and a feedback path; the feedback is labelled SII]

  9. TTS prototype with control on speech quality
     • Control function:
       ‒ none
     • Synthesis:
       ‒ HTS + SAT synthesis
       ‒ STRAIGHT parameters
       ‒ GV control
     • Control actions:
       ‒ Phoneme substitution
       ‒ MLLR transformation
       ‒ GV Gaussian model manipulation
       ‒ Dynamic feature weight control

  10. TTS prototype with control on speech quality
      [Figure: a line running from hypo-articulated speech (muttered but “friendly”) through HTS-Demo speech to hyper-articulated speech (intelligible but unnatural)]
      • Aim:
        ‒ Manipulate HTS model parameters to shift the speech quality along this line
        ‒ Act on generation parameters
        ‒ Only acoustic model manipulation
      • Strategies:
        ‒ Weighted MLLR transformation
        ‒ Global Variance model manipulation
        ‒ Dynamic- vs static-feature weight control in speech generation

  11. Weighted MLLR transformation
      Idea: hypo-articulation can be obtained by reducing all the normally articulated vowels to a minimally articulated schwa. A CMLLR transformation can be trained to perform this change. Ideally, the “opposite” CMLLR transformation should define a transformation from the standard to the hyper-articulated acoustic space.
      [Figure: transformations labelled T1, T2 and T’1, T’2]
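
The vowel reduction behind this idea can be sketched on a plain phone sequence. This is only an illustration: it assumes ARPAbet-like phone names with `ax` as the schwa (the phone labels ae, l, ax, s appear in the formant plot of this deck), the vowel inventory `VOWELS` is an assumption rather than the set used in the project, and `reduce_vowels` is a hypothetical helper. Real HTS generation labels are full-context labels, so the actual substitution would have to act on the phone fields of those labels.

```python
# Assumption: ARPAbet-like phone names; "ax" is the schwa.
# The vowel inventory is illustrative and not taken from the slides.
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ay", "eh", "er", "ey",
          "ih", "iy", "ow", "oy", "uh", "uw"}
SCHWA = "ax"


def reduce_vowels(phones):
    """Return a copy of the phone sequence with every vowel replaced by schwa."""
    return [SCHWA if phone in VOWELS else phone for phone in phones]


if __name__ == "__main__":
    print(reduce_vowels(["ae", "l", "ax", "s"]))  # ['ax', 'l', 'ax', 's']
```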

  12. Weighted MLLR transformation
      1. Substitute each vowel in the generation label files with a schwa, the least articulated of the vowels.
      2. Generate a small corpus of hypo-articulated speech examples (about 1100 utterances).
         [Audio examples: HTS-Demo speech, Hypo speech]
      3. Train a CMLLR transformation from the standard to the hypo-articulated acoustic model.
      4. Compute the new observation vectors (spectrum, F0 and duration):
         $$\hat{o} = A o + b$$
         where $o$ is the observation vector generated by the standard model and $A$, $b$ are the parameters of the transformation.
      5. This transformation can be weighted by a scalar $\alpha$:
         $$\hat{o} = (\alpha A + (1 - \alpha) I)\, o + (\alpha b + (1 - \alpha)\, 0)$$
         where $I$ is the identity matrix and $0$ is the all-zero term with the same shape as $b$.
      6. Ideally, the “opposite” CMLLR transformation should define a transformation from the standard to the hyper-articulated acoustic space.
         [Audio examples: HTS-Demo speech, Hyper speech]
      7. The inverse transformation has been computed:
         $$o = (\alpha A + (1 - \alpha) I)^{-1}\, \hat{o} - (\alpha A + (1 - \alpha) I)^{-1} (\alpha b + (1 - \alpha)\, 0)$$
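
The weighted transformation and its inverse can be checked numerically with a small sketch. It assumes NumPy, a square matrix `A` and a bias vector `b` with toy values chosen for illustration (not trained CMLLR parameters); the function names are hypothetical.

```python
import numpy as np


def weighted_transform(A, b, alpha):
    """Interpolate between the identity (alpha = 0) and the full transform (alpha = 1)."""
    A_w = alpha * A + (1.0 - alpha) * np.eye(A.shape[0])
    b_w = alpha * b  # the (1 - alpha) * 0 term contributes nothing
    return A_w, b_w


def apply_forward(o, A, b, alpha):
    """o_hat = (alpha*A + (1-alpha)*I) o + alpha*b."""
    A_w, b_w = weighted_transform(A, b, alpha)
    return A_w @ o + b_w


def apply_inverse(o_hat, A, b, alpha):
    """Solve the weighted transform for o; requires the weighted matrix to be invertible."""
    A_w, b_w = weighted_transform(A, b, alpha)
    return np.linalg.solve(A_w, o_hat - b_w)


if __name__ == "__main__":
    # Toy values for illustration only.
    A = np.array([[0.8, 0.1], [0.0, 0.9]])
    b = np.array([0.5, -0.2])
    o = np.array([1.0, 2.0])
    o_hat = apply_forward(o, A, b, alpha=0.5)
    print(np.allclose(apply_inverse(o_hat, A, b, alpha=0.5), o))
```

Using `np.linalg.solve` instead of forming the inverse explicitly is a numerical convenience; it computes the same quantity as the inverse formula in step 7.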

  13. Global Variance model manipulation
      Idea: change the Global Variance (GV) model parameters to either reduce or amplify the range of variation in the generated feature vectors.
      ‒ Generation of the $c$ vectors with the Global Variance term, Toda and Tokuda (2007):
        $$P(c \mid \lambda, \lambda_v) = \sum_{\text{all } Q} P(Wc, Q \mid \lambda)^{\omega}\, P(v(c) \mid \lambda_v)$$
      ‒ Manipulating the GV model means manipulating the range of variance values of the observation vectors
      ‒ Scaling factors are used to control the transformation (none for F0)
      ‒ This allows the variance to be increased, but the mean of the observation vector still leads the feature generation
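
The effect of scaling a GV target can be illustrated with a much simpler stand-in. The sketch below is not the GV-likelihood maximisation of the equation above: it is a plain variance-rescaling step that stretches or shrinks a trajectory around its mean until its utterance-level variance matches a scaled target. It assumes NumPy, a trajectory of shape (frames, dimensions) with non-zero variance in every dimension, and hypothetical function names and toy values.

```python
import numpy as np


def global_variance(c):
    """Utterance-level variance of each feature dimension; c has shape (T, D)."""
    return c.var(axis=0)


def rescale_to_gv(c, target_gv):
    """Rescale the trajectory around its mean so that its variance equals target_gv.

    Stand-in for GV-based generation: the per-dimension mean is left unchanged,
    only the spread of the trajectory is modified.
    """
    mean = c.mean(axis=0)
    scale = np.sqrt(target_gv / global_variance(c))
    return mean + scale * (c - mean)


if __name__ == "__main__":
    # Toy trajectory (3 frames, 2 dimensions), for illustration only.
    c = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 2.0]])
    gv = global_variance(c)               # stand-in for a trained GV mean
    wider = rescale_to_gv(c, 2.0 * gv)    # amplified range of variation
    narrower = rescale_to_gv(c, 0.5 * gv) # reduced range of variation
    print(global_variance(wider) / gv, global_variance(narrower) / gv)
```

The mean of the trajectory is preserved by construction, which mirrors the remark that the mean of the observation vector still leads the generation.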

  14. Dynamic- vs static-feature weight control
      Idea: give more importance to dynamic vs. static features in the speech generation process.
      1. By increasing (decreasing) the window weights in the generation process, the realization with low (high) variation is chosen among the possible realizations of a phoneme:
         $$c = (W^{T} \hat{U}^{-1} W)^{-1} W^{T} \hat{U}^{-1} \hat{\mu}$$
      2. A different weight is used for each dynamic feature. The transformation is defined by the vector $[\alpha_1\ \alpha_2\ \alpha_3]$:
         $$\underbrace{\begin{bmatrix} \vdots \\ c_t \\ \Delta c_t \\ \Delta^2 c_t \\ \vdots \end{bmatrix}}_{o} = \underbrace{\begin{bmatrix} \ddots & \vdots & \vdots & \vdots & \\ \cdots & \alpha_1 0 & \alpha_1 I & \alpha_1 0 & \cdots \\ \cdots & -\alpha_2 I/2 & \alpha_2 0 & \alpha_2 I/2 & \cdots \\ \cdots & \alpha_3 I & -2\alpha_3 I & \alpha_3 I & \cdots \\ & \vdots & \vdots & \vdots & \ddots \end{bmatrix}}_{W} \underbrace{\begin{bmatrix} \vdots \\ c_{t-1} \\ c_t \\ c_{t+1} \\ \vdots \end{bmatrix}}_{c}$$
      3. $\alpha_1$ is usually set to 1 for F0 (pitch shifting)
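
A minimal sketch of the generation formula with weighted windows, for a one-dimensional feature stream. It assumes NumPy, the window coefficients static [0, 1, 0], delta [-1/2, 0, 1/2] and delta-delta [1, -2, 1] read from the matrix above, a diagonal covariance, and windows truncated at the utterance boundaries (the boundary handling is an assumption). The function names are hypothetical.

```python
import numpy as np


def build_window_matrix(T, alphas=(1.0, 1.0, 1.0)):
    """Build the (3T x T) matrix W whose rows compute alpha-weighted static,
    delta and delta-delta features from a static sequence of length T."""
    a1, a2, a3 = alphas
    windows = [
        a1 * np.array([0.0, 1.0, 0.0]),    # static
        a2 * np.array([-0.5, 0.0, 0.5]),   # delta
        a3 * np.array([1.0, -2.0, 1.0]),   # delta-delta
    ]
    W = np.zeros((3 * T, T))
    for t in range(T):
        for k, window in enumerate(windows):
            for offset, weight in zip((-1, 0, 1), window):
                tau = t + offset
                if 0 <= tau < T:  # assumption: truncate windows at the edges
                    W[3 * t + k, tau] = weight
    return W


def generate(mu, var, alphas=(1.0, 1.0, 1.0)):
    """Solve c = (W^T U^-1 W)^-1 W^T U^-1 mu for a diagonal U.

    mu and var have length 3T and hold, frame by frame, the means and
    variances of the static, delta and delta-delta features.
    """
    T = mu.shape[0] // 3
    W = build_window_matrix(T, alphas)
    U_inv = np.diag(1.0 / var)
    return np.linalg.solve(W.T @ U_inv @ W, W.T @ U_inv @ mu)


if __name__ == "__main__":
    # Toy static trajectory; the means are made consistent with it on purpose.
    s = np.array([0.0, 1.0, 4.0, 9.0])
    mu = build_window_matrix(len(s)) @ s
    var = np.ones_like(mu)
    print(generate(mu, var))                         # recovers s
    print(generate(mu, var, alphas=(1.0, 10.0, 10.0)))
```

With unit weights and means that are consistent with a static trajectory, the solver returns that trajectory. Changing `alphas` while keeping `mu` and `var` fixed changes the balance between the static and dynamic constraints, which is the control described in item 1.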

  15. Dynamic- vs static-feature weight control
      [Figure: F1 trajectories over the phones ae, l, ax, s in three panels, formant frequency (Hz, 0 to 1000) against time (s, 0.1 to 0.4631). Panels: α1 = 1, α2 = 0.2, α3 = 0.2; α1 = 1, α2 = 1, α3 = 1; α1 = 1, α2 = 10, α3 = 10]

  16. Audio examples
      [Figure: line from hypo-articulated speech through HTS-Demo speech to hyper-articulated speech]
      ‒ Vowel reduction
      ‒ GV weight
      ‒ Dynamic control
      ‒ Dynamic + reduction
      ‒ Dynamic + reduction in noise
      ‒ GUI

  17. Outline
      a) The “Speech Synthesis by Analysis” project
      b) Complete project architecture
      c) First realizations:
         a) TTS prototype with extended Speech Intelligibility Index (SII) feedback
         b) TTS prototype with control on speech quality (towards H&H)
      d) Next steps

  18. Next steps …
      • Add articulatory constraints
      • Find new parameters to control feature generation
      • Complete the control feedback by:
        ‒ defining an optimization function
        ‒ adding a recognition function
        ‒ real-time reactions
      • Investigate a formant synthesiser as a possible vocoder
      • Add more generalization to the parameter generation process:
        ‒ Multiple phonetizations activated by the same word
        ‒ Bayesian synthesiser (ref. Zen, H.)

  19. Thank you
