Mauro Nicolao
Speech and Hearing Research Group - Department of Computer Science The University of Sheffield
SCALE - Speech Communication with Adaptive Learning 2nd Winter School, Aachen, February 15, 2011
A study of hypo- and hyper-articulated synthesized speech
a) The “Speech Synthesis by Analysis” project
b) Complete project architecture
c) TTS prototype with control on speech quality (towards H&H)
   a) Weighted MLLR transformation
   b) Global Variance model manipulation
   c) Dynamic- vs static-feature weight control in speech generation
d) Next steps
Aachen, February 15, 2011 A study of hypo- and hyper-articulated synthesised speech
‒ to produce intelligible speech
‒ to satisfy the listener’s needs
‒ to transfer a concept from the talker’s mind to the listener’s mind

‒ increasing voice intensity
‒ speech rate adjustments
‒ noise rhythm adaptation
‒ signal processing (i.e. Lombard effect)
‒ change of word vocabulary
Lindblom (1990), Lane et al. (2007), Levelt et al. (1999)
behaviour
Moore (2007), Casserly and Pisoni (2010)
[Diagram labels: FEEDFORWARD, FEEDBACK, SII]
‒ none
‒ Phoneme substitution
‒ MLLR transformation
‒ GV Gaussian model manipulation
‒ Dynamic feature weight control

‒ HTS + SAT synthesis
‒ STRAIGHT parameters
‒ GV control
‒ Manipulate HTS model parameters to shift the speech quality along this line
‒ Act on generation parameters
‒ Only acoustic model manipulation

‒ Weighted MLLR transformation
‒ Global Variance model manipulation
‒ Dynamic- vs static-feature weight control in speech generation
[Figure: a line running from hyper-articulated speech (intelligible but unnatural) to hypo-articulated speech (muttered but “friendly”), with HTS-Demo speech marked on it]
Idea: hypo-articulation can be obtained by reducing all normally articulated vowels to the minimally articulated schwa. A CMLLR transformation can be trained to perform this change. Ideally, the “opposite” CMLLR transformation would then map the standard acoustic space to the hyper-articulated one.
[Diagram labels: T1, T2, T′1, T′2]
schwa vowel, because this is the least articulated of the vowels.
examples (about 1100 utterances)
acoustic model.
a transformation from the standard to the hyper-articulated acoustic space.
[Equations: HTS-Demo speech → hyper speech and HTS-Demo speech → hypo speech, each expressed as a transformation of the standard model.]
A, b: parameters of the transformation
I: identity matrix
0: all-zero matrix
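The role of the weight can be illustrated with a small sketch. It assumes, which the slides do not state, that the weight w interpolates linearly between the identity transform (I, 0) and the trained transform (A, b), so that a mean vector is mapped to ((1 − w) I + w A) μ + w b. Under that reading w = 0 leaves the standard model untouched, w = 1 applies the trained vowel-reduction transform, and a negative w moves away from it, which is one possible interpretation of the “opposite” transformation. The function name and the example values are hypothetical.

```python
def weighted_transform(mean, A, b, weight):
    """Map a mean vector with ((1 - w) I + w A) mean + w b.

    Assumption (not from the slides): the weight interpolates linearly
    between the identity transform (I, 0) and the trained one (A, b).
    """
    dim = len(mean)
    out = []
    for i in range(dim):
        acc = 0.0
        for j in range(dim):
            identity = 1.0 if i == j else 0.0
            acc += ((1.0 - weight) * identity + weight * A[i][j]) * mean[j]
        out.append(acc + weight * b[i])
    return out


# Hypothetical 2-D example: A shrinks the mean, b shifts it.
A = [[0.5, 0.0], [0.0, 0.5]]
b = [1.0, -1.0]
standard_mean = [2.0, 4.0]

unchanged = weighted_transform(standard_mean, A, b, 0.0)   # identity
reduced = weighted_transform(standard_mean, A, b, 1.0)     # trained transform
mirrored = weighted_transform(standard_mean, A, b, -1.0)   # extrapolated away
```

With this parameterisation a single scalar moves a model mean continuously along the direction defined by the trained transform, which is why the identity matrix I and the all-zero matrix 0 appear as the neutral end point.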
‒ Generation of c vectors with a Global Variance term
‒ Manipulating the GV model means manipulating the range of variance values

‒ Scaling factors are used to control the transformation (none for F0)
‒ This allows the variance to be increased, but the mean of the observation vector still drives the feature generation
Toda and Tokuda (2007)
Idea: change the Global Variance model parameters to either reduce or amplify the range of variation in the generated feature vectors.

$$P(c \mid \lambda, \lambda_\nu) = P(Wc, Q \mid \lambda)^{\omega} \, P(\nu(c) \mid \lambda_\nu)$$
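What reducing or amplifying the range of variation does to a single feature track can be shown with a simplified stand-in. The sketch below is not the GV-based generation of Toda and Tokuda (2007), which optimises the likelihood above; it only rescales an already generated trajectory around its mean so that its variance is multiplied by a chosen factor. The function name and the sample values are hypothetical.

```python
from statistics import fmean, pvariance


def scale_global_variance(track, factor):
    """Rescale a feature track so its variance is multiplied by `factor`.

    Simplified illustration only: deviations from the utterance-level mean
    are stretched (factor > 1) or compressed (factor < 1); the mean itself
    is left unchanged.
    """
    mean = fmean(track)
    gain = factor ** 0.5  # variance scales with the square of the gain
    return [mean + gain * (x - mean) for x in track]


# Hypothetical track of one spectral coefficient over four frames.
track = [1.0, 2.0, 3.0, 4.0]
flattened = scale_global_variance(track, 0.25)  # narrower range
expanded = scale_global_variance(track, 4.0)    # wider range
print(pvariance(track), pvariance(flattened), pvariance(expanded))
```

A factor below 1 pulls every frame towards the mean and a factor above 1 pushes frames away from it, while the mean keeps anchoring the trajectory.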
possible realizations of a phoneme, the one with low (high) variation is chosen
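Idea: give more importance to dynamic vs. static features in the speech generation process.

$$c = \left(W^{\top} \hat{U}^{-1} W\right)^{-1} W^{\top} \hat{U}^{-1} \hat{\mu}$$

$$
\begin{bmatrix} \vdots \\ c_t \\ \Delta c_t \\ \Delta^2 c_t \\ \vdots \end{bmatrix}
=
\underbrace{\begin{bmatrix}
 & \vdots & \vdots & \vdots & \\
\cdots & \alpha_1 0 & \alpha_1 I & \alpha_1 0 & \cdots \\
\cdots & -\alpha_2 I/2 & \alpha_2 0 & \alpha_2 I/2 & \cdots \\
\cdots & \alpha_3 I & -2\alpha_3 I & \alpha_3 I & \cdots \\
 & \vdots & \vdots & \vdots &
\end{bmatrix}}_{W}
\underbrace{\begin{bmatrix} \vdots \\ c_{t-1} \\ c_t \\ c_{t+1} \\ \vdots \end{bmatrix}}_{c}
$$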
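The generation formula with weighted windows can be sketched for a one-dimensional feature stream. This is a minimal illustration under stated assumptions, not the HTS implementation: the covariance is taken as diagonal, windows are truncated at the first and last frame, and the delta row uses the usual central difference (c_{t+1} − c_{t−1})/2. All function names are hypothetical.

```python
def build_window_matrix(num_frames, alphas=(1.0, 1.0, 1.0)):
    """Stack alpha-weighted static, delta and delta-delta rows per frame."""
    a1, a2, a3 = alphas
    windows = [
        (0.0, a1, 0.0),           # static:      c_t
        (-a2 / 2, 0.0, a2 / 2),   # delta:       (c_{t+1} - c_{t-1}) / 2
        (a3, -2 * a3, a3),        # delta-delta: c_{t-1} - 2 c_t + c_{t+1}
    ]
    rows = []
    for t in range(num_frames):
        for win in windows:
            row = [0.0] * num_frames
            for offset, coeff in zip((-1, 0, 1), win):
                if 0 <= t + offset < num_frames:  # truncate at the edges
                    row[t + offset] = coeff
            rows.append(row)
    return rows


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def generate(mu, var, alphas=(1.0, 1.0, 1.0)):
    """Return c = (W^T U^-1 W)^-1 W^T U^-1 mu for a diagonal U.

    `mu` and `var` hold, frame by frame, the mean and variance of the
    static, delta and delta-delta components (three values per frame).
    """
    num_frames = len(mu) // 3
    w = build_window_matrix(num_frames, alphas)
    prec = [1.0 / v for v in var]
    n_obs = len(w)
    lhs = [[sum(w[i][r] * prec[i] * w[i][c] for i in range(n_obs))
            for c in range(num_frames)] for r in range(num_frames)]
    rhs = [sum(w[i][r] * prec[i] * mu[i] for i in range(n_obs))
           for r in range(num_frames)]
    return solve(lhs, rhs)
```

Calling `generate(mu, var, alphas=(1.0, 0.2, 0.2))` or `generate(mu, var, alphas=(1.0, 10.0, 10.0))` changes how strongly the delta and delta-delta rows enter the normal equations relative to the static rows, which is the control the α factors provide.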
[Figure: F1 formant frequency (Hz) against time (s) over the phone sequence “ae l ax s”, in three panels: α1=1, α2=0.2, α3=0.2; α1=1, α2=10, α3=10; α1=1, α2=1, α3=1]
[Figure: line from hyper-articulated speech to hypo-articulated speech with HTS-Demo speech marked on it; labelled conditions: vowel reduction, GV weight, dynamic control, dynamic + reduction, dynamic + reduction in noise]
GUI
a) The “Speech Synthesis by Analysis” project
b) Complete project architecture
c) First realizations:
   a) TTS prototype with extended Speech Intelligibility Index (SII) feedback
   b) TTS prototype with control on speech quality (towards H&H)
d) Next steps
‒ defining an optimization function
‒ adding a recognition function
‒ real-time reactions

‒ Multiple phonetizations activated by the same word
‒ Bayesian synthesiser (ref. Zen, H.)