Text-to-Speech Synthesis: Formant Synthesis (Bernd Möbius, Language Science and Technology, Saarland University)

SLIDE 1

B Möbius Formant synthesis

Text-to-Speech Synthesis

Bernd Möbius

Language Science and Technology, Saarland University
Lecture 3, May 28, 2020
Formant Synthesis

SLIDE 2

Formant synthesis

▪ acoustic-parametric synthesis method
▪ modeling the acoustic properties of speech sounds
▪ based on
  ▪ acoustic theory of speech production [Fant 1960]
  ▪ source-filter model

SLIDE 3

Source-filter model of speech production

SLIDE 4

Source-filter model of speech production

SLIDE 5

Source-filter model of speech production

Figure panels: glottal excitation; vocal tract frequency response; sound spectrum
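The three panels can be mimicked numerically: the output spectrum is the source spectrum multiplied by the frequency response of the vocal tract filter. The following is a minimal sketch under stated assumptions, not the model used in the lecture. It assumes a harmonic source whose amplitudes fall off at roughly -12 dB per octave, a filter built from standard two-pole digital resonators, and illustrative formant and bandwidth values. All function names, `FORMANTS`, and the 10 kHz sampling rate are hypothetical choices made for this example.

```python
import cmath
import math

# Illustrative (formant frequency, bandwidth) pairs in Hz; the bandwidths are assumed.
FORMANTS = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 150.0)]


def resonator_gain(f_hz, formant_hz, bandwidth_hz, fs_hz):
    """Magnitude response at f_hz of one two-pole digital resonator."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * formant_hz * t)
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to 1
    z_inv = cmath.exp(-2j * math.pi * f_hz * t)
    return abs(a / (1.0 - b * z_inv - c * z_inv * z_inv))


def vocal_tract_gain(f_hz, formants, fs_hz):
    """Filter: product of the resonator responses (resonators in series)."""
    gain = 1.0
    for formant_hz, bandwidth_hz in formants:
        gain *= resonator_gain(f_hz, formant_hz, bandwidth_hz, fs_hz)
    return gain


def source_amplitude(harmonic):
    """Source: assumed -12 dB/octave tilt, i.e. amplitude proportional to 1/k^2."""
    return 1.0 / harmonic ** 2


def output_spectrum(f0_hz, formants, fs_hz, n_harmonics):
    """Sound spectrum: source amplitude times filter gain at each harmonic of f0."""
    spectrum = []
    for k in range(1, n_harmonics + 1):
        f_hz = k * f0_hz
        spectrum.append((f_hz, source_amplitude(k) * vocal_tract_gain(f_hz, formants, fs_hz)))
    return spectrum


if __name__ == "__main__":
    for f_hz, amp in output_spectrum(100.0, FORMANTS, 10000.0, 30):
        print(f"{f_hz:6.0f} Hz  {20.0 * math.log10(amp):7.1f} dB")
```

Printing the result for a fundamental of 100 Hz shows the overall downward tilt contributed by the source, with local peaks near the assumed formant frequencies contributed by the filter.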

SLIDE 6

Vocal tract as acoustic filter

▪ Vocal tract geometry, determined by tongue position (and jaw opening and lip protrusion, not shown)

SLIDE 7

Vocal tract: acoustic tube model

[Clark et al., 2007a, p.241]

SLIDE 8

Idealized simple tube model

▪ acoustic signals propagate as longitudinal waves in the vocal tract
▪ 2 physical parameters of acoustic waves
  ▪ sound pressure p: change of air pressure caused by the sound at the place of measurement
  ▪ sound velocity v (particle velocity): speed of the air particles set in motion by the sound event (note: this is not the speed of sound c!)
▪ perfect reflection at the sound-hard (lossless) walls of the tube
  ▪ v = 0 at the place of reflection
▪ (lossy) reflection at the sound-soft transition from the vocal tract to the free acoustic field (i.e. from lips to air)
  ▪ p = 0 at the place of radiation

SLIDE 9

Sound pressure waves in vocal tract

[Hess, ms.]

Figure labels: p = 0 (open end, at the lips); v = 0 (closed end)
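The two boundary conditions fix which standing waves fit into the tube. The short derivation below is a sketch for a uniform, lossless tube; the coordinate x, the tube length L, the wave number k and the index i are symbols introduced here for illustration, with the closed end assumed at x = 0 and the open end at x = L.

```latex
% Standing pressure wave in a uniform tube of length L,
% closed (v = 0) at x = 0 and open (p = 0) at x = L.
% v is proportional to dp/dx, so v(0) = 0 requires a pressure maximum at x = 0.
\begin{align}
p(x) &\propto \cos(kx), \qquad k = \frac{2\pi}{\lambda} \\
p(L) = 0 &\;\Rightarrow\; kL = (2i-1)\,\frac{\pi}{2}, \qquad i = 1, 2, 3, \dots \\
\lambda_i &= \frac{4L}{2i-1}, \qquad f_i = \frac{c}{\lambda_i} = \frac{(2i-1)\,c}{4L}
\end{align}
```

The tube therefore holds odd multiples of a quarter wavelength, which is why such a tube is often called a quarter-wavelength resonator.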

SLIDE 10

Computing formant frequencies

▪ resonance frequencies of the neutral vocal tract are computed as speed of sound divided by wavelength: f_i = c / λ_i
▪ with c = 340 m/s and a vocal tract length of 0.17 m, the resonant wavelengths are 4, 4/3 and 4/5 times the tube length, giving the resonance/formant frequencies:
  F1 = 340 / (4 * 0.17) = 340 / 0.68 = 500 Hz
  F2 = 340 / (4/3 * 0.17) = 3 * 340 / (4 * 0.17) = 1500 Hz
  F3 = 340 / (4/5 * 0.17) = 5 * 340 / (4 * 0.17) = 2500 Hz
▪ the distribution of formant frequencies in the neutral vocal tract corresponds to the formants of the central vowel 'schwa' [ə]
▪ the simple tube model, with constant cross-section, is inadequate for computing the formants of other vowels (cf. acoustic theory of vowel articulation [Ungeheuer 1962])
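The arithmetic above can be reproduced in a few lines. This is a minimal sketch for the uniform tube only; the function name `tube_resonances` and its parameter names are hypothetical and chosen for this example.

```python
def tube_resonances(length_m=0.17, speed_of_sound=340.0, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    The i-th resonance has wavelength 4 * length / (2i - 1),
    so its frequency is (2i - 1) * c / (4 * length).
    """
    return [(2 * i - 1) * speed_of_sound / (4.0 * length_m) for i in range(1, count + 1)]


if __name__ == "__main__":
    for i, freq in enumerate(tube_resonances(), start=1):
        print(f"F{i} = {freq:.0f} Hz")
    # prints:
    # F1 = 500 Hz
    # F2 = 1500 Hz
    # F3 = 2500 Hz
```

Calling the function with a shorter `length_m` raises all resonances proportionally, which illustrates why formant frequencies depend on vocal tract length.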

SLIDE 11

Tube model with varying cross-section

[Clark et al., 2007a, p.246]

SLIDE 12

Acoustic theory of vowel articulation

SLIDE 13

Vowels (IPA)

Chart axes: F2, F1

SLIDE 14

Vowels (German, [Pompino-Marschall 1995])

SLIDE 15

Vowels (German, F1/F2/F3 [Möbius 2001a])

SLIDE 16

Cascade vs. parallel resonators

[Allen et al. 1987]
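The difference between the two arrangements can be sketched in code: in a cascade, the output of one formant resonator feeds the next, whereas in a parallel arrangement every resonator receives the same input and the weighted outputs are summed. The sketch below uses a standard two-pole digital resonator; it is a simplified illustration, not the circuit shown in [Allen et al. 1987]. The function names, the bandwidth values, the gains and the 10 kHz sampling rate are assumptions made for this example.

```python
import math


def resonator_coeffs(freq_hz, bandwidth_hz, fs_hz):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2]."""
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # unity gain at 0 Hz
    return a, b, c


def resonate(signal, freq_hz, bandwidth_hz, fs_hz):
    """Filter a list of samples through one formant resonator."""
    a, b, c = resonator_coeffs(freq_hz, bandwidth_hz, fs_hz)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(signal, formants, fs_hz):
    """Series connection: each resonator filters the previous output."""
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz, fs_hz)
    return signal


def parallel(signal, formants, gains, fs_hz):
    """Parallel connection: same input to every branch, weighted sum of outputs."""
    branches = [resonate(signal, f, bw, fs_hz) for f, bw in formants]
    return [sum(g * branch[n] for g, branch in zip(gains, branches))
            for n in range(len(signal))]


if __name__ == "__main__":
    fs = 10000.0
    formants = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 150.0)]  # bandwidths assumed
    impulse = [1.0] + [0.0] * 1999
    print(cascade(impulse, formants, fs)[:5])
    print(parallel(impulse, formants, [1.0, 0.5, 0.25], fs)[:5])
```

In the cascade, relative formant amplitudes follow from the frequencies and bandwidths alone, while the parallel form needs an explicit gain per branch, which gives it more freedom at the cost of more parameters to set.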

SLIDE 17

Cascade/parallel resonators and voice source

[Allen et al. 1987]

SLIDE 18

Klatt's formant synthesizer

[Klatt 1980]

SLIDE 19

Klatt parameter values

[Allen et al. 1987]

SLIDE 20

IMSkpe: Klatt parameter editor

▪ Klatt parameter editor GUI
▪ interactive tool for doing formant synthesis
  http://sourceforge.net/projects/imskpe/
  https://github.com/imskpe/imskpe/
  (Andreas Madsack, IMS, Univ. Stuttgart)

SLIDE 21

Formant synthesis: Summary

▪ acoustic-parametric synthesis method
▪ modeling the acoustic properties of speech sounds
▪ based on
  ▪ acoustic theory of speech production [Fant 1960]
  ▪ source-filter model
▪ explicit control of voice source parameters and prosody
▪ fair approximation of formant structure of speech sounds
▪ extensive knowledge acquisition and rule building phases
▪ TTS Systems: Klatt-Talk (MITalk, DECtalk), Delta, Infovox

SLIDE 22

Essential content

Formant synthesis
▪ architecture and functional principle of a formant synthesizer, here: Klatt synthesizer
▪ relationship between a formant synthesizer and the source-filter model of speech production