Text-to-Speech Synthesis, Bernd Möbius

SLIDE 1

B Möbius Formant synthesis

Text-to-Speech Synthesis

Bernd Möbius

Language Science and Technology
Saarland University
Lecture 3, May 28, 2020
Formant Synthesis

SLIDE 2

Formant synthesis

▪ acoustic-parametric synthesis method
▪ modeling the acoustic properties of speech sounds
▪ based on
  ▪ acoustic theory of speech production [Fant 1960]
  ▪ source-filter model

SLIDE 3

Source-filter model of speech production

SLIDE 4

Source-filter model of speech production

SLIDE 5

Source-filter model of speech production

[Figure panels: glottal excitation; vocal tract frequency response; sound spectrum]
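
The source-filter idea can be illustrated with a minimal sketch: an idealized glottal excitation (a pulse train) is passed through a filter that stands in for the vocal tract. Everything concrete here is an assumption made for illustration and is not taken from the slides: the 10 kHz sampling rate, the 100 Hz fundamental frequency, the single resonance at 500 Hz with a 60 Hz bandwidth, and the helper names `impulse_train`, `damped_resonance` and `apply_filter`.

```python
import math

SAMPLE_RATE_HZ = 10_000  # assumed sampling rate for this sketch


def impulse_train(f0_hz, n_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Idealized glottal excitation: one unit pulse per pitch period."""
    period = round(sample_rate_hz / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


def damped_resonance(freq_hz, bandwidth_hz, n_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Impulse response of one resonance: an exponentially damped sinusoid."""
    t = 1.0 / sample_rate_hz
    return [
        math.exp(-math.pi * bandwidth_hz * k * t) * math.sin(2.0 * math.pi * freq_hz * k * t)
        for k in range(n_samples)
    ]


def apply_filter(source, impulse_response):
    """Filter the source by direct convolution, truncated to the source length."""
    out = [0.0] * len(source)
    for n, x in enumerate(source):
        if x == 0.0:
            continue  # only the pulses contribute to the output
        for k, h in enumerate(impulse_response):
            if n + k >= len(out):
                break
            out[n + k] += x * h
    return out


# Source (hypothetical values): 100 Hz pulse train, 400 samples = 40 ms.
source = impulse_train(f0_hz=100.0, n_samples=400)
# Filter (hypothetical values): a single resonance near the first formant of schwa.
response = damped_resonance(freq_hz=500.0, bandwidth_hz=60.0, n_samples=300)
# Output: each glottal pulse excites the resonance; overlapping responses add up.
speech_like = apply_filter(source, response)
```

In the frequency domain the same operation is a multiplication: the harmonic spectrum of the source is shaped by the frequency response of the filter, which gives the sound spectrum.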

SLIDE 6

Vocal tract as acoustic filter

▪ Vocal tract geometry, determined by tongue position (and jaw opening and lip protrusion, not shown)

SLIDE 7

Vocal tract: acoustic tube model

[Clark et al., 2007a, p.241]

SLIDE 8

Idealized simple tube model

▪ acoustic signals evolve as longitudinal waves in vocal tract
▪ 2 physical parameters of acoustic waves
  ▪ sound pressure p: change of air pressure evoked by sound at place of measurement
  ▪ sound velocity v (particle velocity): speed of air particles caused by sound event (note: this is not the speed of sound c!)
▪ perfect reflection at sound-hard (lossless) walls of tube
  ▪ v = 0 at place of reflection
▪ (lossy) reflection at sound-soft transition from vocal tract to free acoustic field (i.e. from lips to air)
  ▪ p = 0 at place of radiation
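
The two boundary conditions determine which standing waves fit into the tube. As a sketch, assume a uniform tube of length L (a symbol introduced here, not used on the slide) that is closed at the glottis (v = 0) and open at the lips (p = 0), with resonances numbered i = 1, 2, 3, ...:

```latex
% Closed end: v = 0 (pressure maximum). Open end: p = 0 (pressure node).
% An odd number of quarter wavelengths must therefore fit into the tube length L.
L = (2i - 1)\,\frac{\lambda_i}{4}
\quad\Longrightarrow\quad
\lambda_i = \frac{4L}{2i - 1},
\qquad
f_i = \frac{c}{\lambda_i} = \frac{(2i - 1)\,c}{4L},
\qquad i = 1, 2, 3, \dots
```

For i = 1, 2, 3 the wavelengths are 4L, 4L/3 and 4L/5.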

SLIDE 9

Sound pressure waves in vocal tract

[Hess, ms.]

[Figure: standing sound pressure waves in the tube, with boundary labels p = 0 and v = 0]

SLIDE 10

Computing formant frequencies

▪ resonance frequencies of neutral vocal tract computed as speed of sound divided by wavelength: f_i = c / λ_i
▪ frequencies of resonances/formants (with c = 340 m/s and a tube length of 0.17 m):
  F1 = 340 / (4 * 0.17) = 340 / 0.68 = 500 Hz
  F2 = 340 / (4/3 * 0.17) = 3 * 340 / (4 * 0.17) = 1500 Hz
  F3 = 340 / (4/5 * 0.17) = 5 * 340 / (4 * 0.17) = 2500 Hz
▪ distribution of formant frequencies in neutral vocal tract corresponds to formants of central vowel 'schwa' [ə]
▪ simple tube model, with constant cross-section, is inadequate for computing formants of other vowels (cf. acoustic theory of vowel articulation [Ungeheuer 1962])
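
The arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function name `neutral_tube_formants` and its parameter names are chosen here for illustration, and the default values are the ones used on the slide.

```python
def neutral_tube_formants(length_m=0.17, speed_of_sound=340.0, count=3):
    """Resonances of a uniform tube, closed at one end and open at the other."""
    # Wavelengths are 4L, 4L/3, 4L/5, ..., so f_i = (2i - 1) * c / (4L).
    return [
        (2 * i - 1) * speed_of_sound / (4.0 * length_m)
        for i in range(1, count + 1)
    ]


formants = neutral_tube_formants()
print([round(f) for f in formants])  # [500, 1500, 2500]
```

A shorter tube shifts all resonances upward by the same factor; for example, `neutral_tube_formants(length_m=0.145)` gives proportionally higher values.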

SLIDE 11

Tube model with varying cross-section

[Clark et al., 2007a, p.246]

SLIDE 12

Acoustic theory of vowel articulation

SLIDE 13

Vowels (IPA)

[Figure: IPA vowel chart; axes F2 and F1]

SLIDE 14

Vowels (German, [Pompino-Marschall 1995])

SLIDE 15

Vowels (German, F1/F2/F3 [Möbius 2001a])

SLIDE 16

Cascade vs. parallel resonators

[Allen et al. 1987]
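
The difference between the two arrangements can be sketched in code. Each formant is modeled as a second-order digital resonator of the kind used in Klatt-style synthesizers, y[n] = A x[n] + B y[n-1] + C y[n-2]. In a cascade the output of one resonator feeds the next; in a parallel bank every resonator receives the same input and the branch outputs are summed with individual gains. The 10 kHz sampling rate, the bandwidths and the gain values below are assumptions for illustration only, and the function names are hypothetical; the formant frequencies are the schwa values from the tube model.

```python
import math

SAMPLE_RATE_HZ = 10_000  # assumed sampling rate
# (frequency, bandwidth) pairs in Hz; the bandwidths are illustrative assumptions.
FORMANTS = [(500.0, 60.0), (1500.0, 90.0), (2500.0, 150.0)]


def resonate(signal, freq_hz, bandwidth_hz, sample_rate_hz=SAMPLE_RATE_HZ):
    """Second-order digital resonator with unity gain at 0 Hz."""
    t = 1.0 / sample_rate_hz
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
    b = 2.0 * math.exp(-math.pi * bandwidth_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0  # previous two output samples
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(signal, formants, sample_rate_hz=SAMPLE_RATE_HZ):
    """Resonators in series: relative formant amplitudes follow from F and BW alone."""
    for freq_hz, bandwidth_hz in formants:
        signal = resonate(signal, freq_hz, bandwidth_hz, sample_rate_hz)
    return signal


def parallel(signal, formants, gains, sample_rate_hz=SAMPLE_RATE_HZ):
    """Resonators side by side: each branch needs its own amplitude control."""
    branches = [resonate(signal, f, bw, sample_rate_hz) for f, bw in formants]
    return [
        sum(g * branch[n] for g, branch in zip(gains, branches))
        for n in range(len(signal))
    ]


# Excite both structures with a single unit pulse (200 samples = 20 ms).
pulse = [1.0] + [0.0] * 199
cascade_out = cascade(pulse, FORMANTS)
parallel_out = parallel(pulse, FORMANTS, gains=[1.0, 0.5, 0.25])  # hypothetical gains
```

The design trade-off visible here is that the cascade needs no per-formant amplitude parameters, while the parallel bank exposes them explicitly.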

SLIDE 17

Cascade/parallel resonators and voice source

[Allen et al. 1987]

SLIDE 18

Klatt's formant synthesizer

[Klatt 1980]

SLIDE 19

Klatt parameter values

[Allen et al. 1987]

SLIDE 20

IMSkpe: Klatt parameter editor

▪ Klatt parameter editor GUI
▪ interactive tool for doing formant synthesis
  http://sourceforge.net/projects/imskpe/
  https://github.com/imskpe/imskpe/
  (Andreas Madsack, IMS, Univ. Stuttgart)

SLIDE 21

Formant synthesis: Summary

▪ acoustic-parametric synthesis method
▪ modeling the acoustic properties of speech sounds
▪ based on
  ▪ acoustic theory of speech production [Fant 1960]
  ▪ source-filter model
▪ explicit control of voice source parameters and prosody
▪ fair approximation of formant structure of speech sounds
▪ extensive knowledge acquisition and rule building phases
▪ TTS Systems: Klatt-Talk (MITalk, DECtalk), Delta, Infovox

SLIDE 22

Essential content

Formant synthesis
▪ architecture and functional principle of a formant synthesizer, here: Klatt synthesizer
▪ relationship between a formant synthesizer and the source-filter model of speech production
