
Text-to-Speech Synthesis: Formant Synthesis (Bernd Möbius, Language Science and Technology)



  1. Text-to-Speech Synthesis. Bernd Möbius, Language Science and Technology, Saarland University. Lecture 3, May 28, 2020: Formant Synthesis

  2. Formant synthesis
     ▪ acoustic-parametric synthesis method
     ▪ modeling the acoustic properties of speech sounds
     ▪ based on
       ▪ acoustic theory of speech production [Fant 1960]
       ▪ source-filter model

  3. Source-filter model of speech production

  4. Source-filter model of speech production [figure]

  5. Source-filter model of speech production [figure panels: glottal excitation; vocal tract frequency response; sound spectrum]

  6. Vocal tract as acoustic filter
     ▪ vocal tract geometry, determined by tongue position (and jaw opening and lip protrusion, not shown)

  7. Vocal tract: acoustic tube model [Clark et al., 2007a, p.241]

  8. Idealized simple tube model
     ▪ acoustic signals evolve as longitudinal waves in the vocal tract
     ▪ 2 physical parameters of acoustic waves
       ▪ sound pressure p: change of air pressure evoked by sound at the place of measurement
       ▪ sound velocity v: speed of air particles caused by the sound event (note: this is not the speed of sound c!)
     ▪ perfect reflexion at sound-hard (lossless) walls of the tube
       ▪ v = 0 at place of reflexion
     ▪ (lossy) reflexion at sound-soft transition from vocal tract to free acoustic field (i.e. from lips to air)
       ▪ p = 0 at place of radiation

  9. Sound pressure waves in vocal tract [figure: waves in the tube with the boundary conditions p = 0 and v = 0 marked] [Hess, ms.]
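The two boundary conditions of the idealized tube (v = 0 at the closed end, p = 0 at the open end) restrict which standing waves fit into it. The following is a short derivation sketch, assuming a uniform, lossless tube; the symbols L (tube length) and n (resonance index) are introduced here for illustration and do not appear on the slides.

```latex
\begin{align}
L &= (2n-1)\,\frac{\lambda_n}{4}, \qquad n = 1, 2, 3, \dots \\
\lambda_n &= \frac{4L}{2n-1} \\
f_n &= \frac{c}{\lambda_n} = \frac{(2n-1)\,c}{4L}
\end{align}
```

Only odd multiples of a quarter wavelength satisfy both conditions at once, which is why the resonances of such a tube fall at odd multiples of the lowest one.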

  10. Computing formant frequencies
     ▪ resonance frequencies of the neutral vocal tract are computed as speed of sound divided by wavelength: f_i = c / λ_i
     ▪ frequencies of resonances/formants (with c = 340 m/s and a vocal tract length of 0.17 m):
       F1 = 340 / (4 * 0.17) = 340 / 0.68 = 500 Hz
       F2 = 340 / (4/3 * 0.17) = 3 * 340 / (4 * 0.17) = 1500 Hz
       F3 = 340 / (4/5 * 0.17) = 5 * 340 / (4 * 0.17) = 2500 Hz
     ▪ distribution of formant frequencies in the neutral vocal tract corresponds to the formants of the central vowel 'schwa' [ə]
     ▪ the simple tube model, with constant cross-section, is inadequate for computing formants of other vowels (cf. acoustic theory of vowel articulation [Ungeheuer 1962])
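The arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch, assuming a uniform tube that is closed at one end and open at the other (a quarter-wavelength resonator); the function name `tube_formants` and its parameter names are hypothetical and chosen only for illustration.

```python
def tube_formants(length_m=0.17, speed_of_sound=340.0, count=3):
    """Resonances of a uniform tube closed at one end and open at the other.

    The k-th resonance has wavelength 4 * length / (2k - 1),
    so its frequency is (2k - 1) * c / (4 * length).
    """
    formants = []
    for k in range(1, count + 1):
        wavelength = 4.0 * length_m / (2 * k - 1)
        formants.append(speed_of_sound / wavelength)
    return formants


if __name__ == "__main__":
    for index, freq in enumerate(tube_formants(), start=1):
        print(f"F{index} = {freq:.0f} Hz")
```

With the default values this gives approximately 500, 1500 and 2500 Hz, the schwa-like pattern from the slide. Changing `length_m` shows how a shorter or longer tube shifts all formants together, which is all this simple model can express.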

  11. Tube model with varying cross-section [Clark et al., 2007a, p.246]

  12. Acoustic theory of vowel articulation

  13. Vowels (IPA) [figure; axes: F2, F1]

  14. Vowels (German, [Pompino-Marschall 1995])

  15. Vowels (German, F1/F2/F3 [Möbius 2001a])

  16. Cascade vs. parallel resonators [Allen et al. 1987]

  17. Cascade/parallel resonators and voice source [Allen et al. 1987]
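To make the difference between the two resonator arrangements concrete, here is a minimal sketch in plain Python. It uses the second-order digital resonator difference equation described in [Klatt 1980] and drives it with a crude impulse-train voice source. The sampling rate, bandwidths, branch gains and all function names are assumptions for illustration, not values from the slides; a real parallel synthesizer also handles branch polarity and amplitude control, which is omitted here.

```python
import math


def resonator_coeffs(freq_hz, bw_hz, fs_hz):
    """Coefficients for y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalizes the gain to 1 at 0 Hz
    return a, b, c


def resonate(signal, freq_hz, bw_hz, fs_hz):
    """Filter a list of samples through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for sample in signal:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(signal, formants, fs_hz):
    """Cascade: the output of each resonator feeds the next one."""
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, fs_hz)
    return signal


def parallel(signal, formants, gains, fs_hz):
    """Parallel: each resonator filters the source; weighted outputs are summed."""
    out = [0.0] * len(signal)
    for (freq_hz, bw_hz), gain in zip(formants, gains):
        branch = resonate(signal, freq_hz, bw_hz, fs_hz)
        out = [o + gain * v for o, v in zip(out, branch)]
    return out


def impulse_train(f0_hz, fs_hz, n_samples):
    """Very crude voice source: one unit impulse per pitch period."""
    period = round(fs_hz / f0_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


FS = 10000  # assumed sampling rate in Hz
# Formant frequencies of the neutral tube; bandwidths are illustrative guesses.
SCHWA = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 150.0)]

source = impulse_train(100.0, FS, 1000)
cascade_out = cascade(source, SCHWA, FS)
parallel_out = parallel(source, SCHWA, [1.0, 0.5, 0.25], FS)
```

In the cascade arrangement the relative formant amplitudes follow from the frequencies and bandwidths alone, whereas the parallel arrangement needs an explicit gain per branch. That is the trade-off the two block diagrams illustrate.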

  18. Klatt's formant synthesizer [Klatt 1980]

  19. Klatt parameter values [Allen et al. 1987]

  20. IMSkpe: Klatt parameter editor
     ▪ Klatt parameter editor GUI
     ▪ interactive tool for doing formant synthesis
     ▪ http://sourceforge.net/projects/imskpe/
     ▪ https://github.com/imskpe/imskpe/
     (Andreas Madsack, IMS, Univ. Stuttgart)

  21. Formant synthesis: Summary
     ▪ acoustic-parametric synthesis method
     ▪ modeling the acoustic properties of speech sounds
     ▪ based on
       ▪ acoustic theory of speech production [Fant 1960]
       ▪ source-filter model
     ▪ explicit control of voice source parameters and prosody
     ▪ fair approximation of formant structure of speech sounds
     ▪ extensive knowledge acquisition and rule building phases
     ▪ TTS systems: Klatt-Talk (MITalk, DECtalk), Delta, Infovox

  22. Essential content: formant synthesis
     ▪ architecture and functional principle of a formant synthesizer, here: Klatt synthesizer
     ▪ relationship between a formant synthesizer and the source-filter model of speech production
