A study of hypo- and hyper-articulated synthesized speech

SLIDE 1

Mauro Nicolao

Speech and Hearing Research Group - Department of Computer Science The University of Sheffield

SCALE - Speech Communication with Adaptive Learning 2nd Winter School, Aachen, February 15, 2011

A study of hypo- and hyper-articulated synthesized speech

SLIDE 2

Outline

a) The “Speech Synthesis by Analysis” project
b) Complete project architecture
c) TTS prototype with control on speech quality (towards H&H, i.e. hyper- and hypo-articulation)
    a) Weighted MLLR transformation
    b) Global Variance model manipulation
    c) Dynamic- vs static-feature weight control in speech generation
d) Next steps

Aachen, February 15, 2011 A study of hypo- and hyper-articulated synthesised speech

SLIDE 3

Outline

a) The “Speech Synthesis by Analysis” project
b) Complete project architecture
c) TTS prototype with control on speech quality (towards H&H)
    a) Weighted MLLR transformation
    b) Global Variance model manipulation
    c) Dynamic- vs static-feature weight control in speech generation
d) Next steps

SLIDE 4

Speech Synthesis by Analysis Project

  • Success in communication:

‒ to produce intelligible speech
‒ to satisfy the listener’s needs
‒ to transfer a concept from the talker’s mind to the listener’s

  • Modifications of human speech:

‒ increased voice intensity
‒ speech-rate adjustments
‒ adaptation of rhythm to the noise
‒ signal processing (e.g. the Lombard effect)
‒ changes in vocabulary

Lindblom (1990), Lane et al. (2007), Levelt et al. (1999)

SLIDE 5

Speech Synthesis by Analysis Project

  • Automatic TTS systems ignore the effects of the environment on speech, as well as any feedback from the listener.
  • Researchers in many disciplines are investigating models that describe this human behaviour.
  • A new way of thinking about automatic speech synthesis

Moore (2007), Casserly and Pisoni (2010)

SLIDE 6

Outline

a) The “Speech Synthesis by Analysis” project
b) Complete project architecture
c) TTS prototype with control on speech quality (towards H&H)
    a) Weighted MLLR transformation
    b) Global Variance model manipulation
    c) Dynamic- vs static-feature weight control in speech generation
d) Next steps

SLIDE 7

Complete project architecture

[Diagram: complete architecture with FEEDFORWARD and FEEDBACK paths and an SII (Speech Intelligibility Index) block]

SLIDE 8

Outline

a) The “Speech Synthesis by Analysis” project
b) Complete project architecture
c) TTS prototype with control on speech quality (towards H&H)
    a) Weighted MLLR transformation
    b) Global Variance model manipulation
    c) Dynamic- vs static-feature weight control in speech generation
d) Next steps

SLIDE 9

TTS prototype with control on speech quality

  • Control function:

‒ none

  • Control actions:

‒ Phoneme substitution
‒ MLLR transformation
‒ GV Gaussian model manipulation
‒ Dynamic-feature weight control

  • Synthesis:

‒ HTS + SAT synthesis
‒ STRAIGHT parameters
‒ GV control

SLIDE 10

TTS prototype with control on speech quality

  • Aim:

‒ Manipulate HTS model parameters to shift the speech quality along the line from hyper- to hypo-articulated speech
‒ Act on the generation parameters
‒ Manipulate the acoustic model only

  • Strategies:

‒ Weighted MLLR transformation
‒ Global Variance model manipulation
‒ Dynamic- vs static-feature weight control in speech generation

[Diagram: a line from hyper-articulated speech (intelligible but unnatural) to hypo-articulated speech (muttered but “friendly”), with HTS-Demo speech in between]

SLIDE 11

Weighted MLLR transformation

Idea: hypo-articulation can be obtained by reducing all normally articulated vowels to the minimally articulated schwa. A CMLLR (constrained MLLR) transformation can be trained to perform this change. Ideally, the “opposite” CMLLR transformation should map the standard acoustic space to the hyper-articulated one.

[Diagram: transformations T1, T2, T’1, T’2]
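The first step of this idea (replacing every vowel with a schwa before synthesis) can be sketched on a plain phone sequence. This is a minimal sketch, not the author's tool: it assumes monophone labels written in lowercase ARPAbet-style symbols with `ax` as the schwa (as in the “ae l ax s” example of the F1 slide), and the names `VOWELS`, `SCHWA` and `reduce_vowels` are hypothetical. Real HTS full-context labels also encode neighbouring phones and other context, which would have to be rewritten consistently; the sketch ignores that.

```python
# Hypothetical vowel inventory (lowercase ARPAbet-style symbols);
# adapt it to the phone set of the actual voice.
VOWELS = {
    "aa", "ae", "ah", "ao", "aw", "ax", "ay", "eh",
    "er", "ey", "ih", "iy", "ow", "oy", "uh", "uw",
}
SCHWA = "ax"  # assumed schwa symbol


def reduce_vowels(phones):
    """Return a copy of `phones` with every vowel replaced by the schwa.

    Consonants are passed through unchanged.
    """
    return [SCHWA if p in VOWELS else p for p in phones]


print(reduce_vowels(["ae", "l", "ax", "s"]))  # ['ax', 'l', 'ax', 's']
```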

SLIDE 12

Weighted MLLR transformation

  • 1. Substitute every vowel in the generation label files with a schwa, since the schwa is the least articulated of the vowels.
  • 2. Generate a small corpus of hypo-articulated speech examples (about 1100 utterances).
  • 3. Train a CMLLR transformation from the standard to the hypo-articulated acoustic model.
  • 4. Obtain new observation vectors (spectrum, F0 and duration).
  • 5. Weight the transformation with a scalar α.
  • 6. Ideally, the “opposite” CMLLR transformation should map the standard acoustic space to the hyper-articulated one.
  • 7. Compute the weighted inverse transformation.

HTS-Demo speech → Hypo speech:

$\hat{o} = A o + b$

$\hat{o}_\alpha = (\alpha A + (1 - \alpha) I)\, o + (\alpha b + (1 - \alpha)\mathbf{0})$

HTS-Demo speech → Hyper speech:

$\check{o}_\alpha = (\alpha A + (1 - \alpha) I)^{-1} o - (\alpha A + (1 - \alpha) I)^{-1} (\alpha b + (1 - \alpha)\mathbf{0})$

where $o$ is the observation vector generated by the standard model, $A$ and $b$ are the parameters of the transformation, $I$ is the identity matrix and $\mathbf{0}$ is the all-zero vector.
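The weighted transformation and its inverse can be sketched in NumPy for a single global transform applied frame by frame. This is an illustrative sketch under assumptions: `A` and `b` below are made-up values rather than a trained CMLLR transform, the function names are hypothetical, and the weighted matrix αA + (1 − α)I must be invertible for the hyper direction to exist. The hypo direction computes (αA + (1 − α)I)o + αb (the (1 − α)·0 bias term vanishes), and the hyper direction computes (αA + (1 − α)I)^(-1)(o − αb).

```python
import numpy as np


def weighted_transform(A, b, alpha):
    """Interpolate the transform (A, b) with the identity transform:
    A_alpha = alpha*A + (1 - alpha)*I, b_alpha = alpha*b + (1 - alpha)*0."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    A_alpha = alpha * A + (1.0 - alpha) * np.eye(A.shape[0])
    b_alpha = alpha * b  # the (1 - alpha) * 0 term vanishes
    return A_alpha, b_alpha


def to_hypo(o, A, b, alpha):
    """Weighted forward transform: A_alpha @ o + b_alpha."""
    A_alpha, b_alpha = weighted_transform(A, b, alpha)
    return A_alpha @ o + b_alpha


def to_hyper(o, A, b, alpha):
    """Weighted inverse transform: A_alpha^-1 o - A_alpha^-1 b_alpha,
    computed with a linear solve instead of an explicit matrix inverse."""
    A_alpha, b_alpha = weighted_transform(A, b, alpha)
    return np.linalg.solve(A_alpha, o - b_alpha)


# Toy 2-D example with made-up (not trained) transform parameters.
A = np.array([[0.8, 0.1], [0.0, 0.7]])
b = np.array([0.2, -0.1])
o = np.array([1.0, 2.0])  # stand-in for a standard-model observation vector

hypo = to_hypo(o, A, b, 0.5)
hyper = to_hyper(o, A, b, 0.5)
```

With α = 0 both directions return the standard observation unchanged, and applying `to_hypo` to the output of `to_hyper` recovers the standard observation for any α at which the weighted matrix is invertible.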

SLIDE 13

Global Variance model manipulation

‒ Generation of the c vectors with a Global Variance term:

$P(c \mid \lambda, \lambda_v) = \sum_{\text{all } Q} P(Wc, Q \mid \lambda)^{\omega} \, P(v(c) \mid \lambda_v)$

where $v(c)$ is the global variance of the generated static features $c$, $\lambda_v$ is the GV model and $\omega$ weights the HMM term against the GV term.

‒ Manipulating the GV model amounts to manipulating the range of variance values of the observation vectors
‒ Scaling factors are used to control the transformation (none for F0)
‒ This allows the variance to increase, while the mean of the observation vector still leads the feature generation

Toda and Tokuda (2007)

Idea: change the Global Variance model parameters either to reduce or to amplify the range of variation in the generated feature vectors.
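The effect of scaling the GV model can be approximated with a simple variance-scaling post-filter, sketched below in NumPy. Note that this swaps in a simpler technique: Toda and Tokuda's method maximises the combined HMM and GV likelihood, whereas this sketch only rescales an already generated trajectory around its mean so that its variance matches a scaled GV mean. The function name, the `scale` parameter and the random stand-in data are assumptions for illustration; following the slide, it would be applied to the spectral stream and not to F0.

```python
import numpy as np


def scale_global_variance(traj, gv_mean, scale):
    """Variance-scaling approximation of GV-based generation.

    traj:    generated static features, shape (T, D)
    gv_mean: mean of the GV model per dimension, shape (D,)
    scale:   scalar or (D,) factor applied to gv_mean
             (> 1 widens the range of variation, < 1 narrows it)

    Each dimension is rescaled around its utterance mean so that its
    variance equals scale * gv_mean; the mean itself is unchanged.
    """
    traj = np.asarray(traj, dtype=float)
    mu = traj.mean(axis=0)
    var = traj.var(axis=0)
    target = np.asarray(scale, dtype=float) * np.asarray(gv_mean, dtype=float)
    # Leave flat (zero-variance) dimensions untouched (ratio stays 1).
    ratio = np.divide(target, var, out=np.ones_like(var), where=var > 0)
    return mu + np.sqrt(ratio) * (traj - mu)


rng = np.random.default_rng(0)
mcep = 0.5 * rng.standard_normal((200, 3))  # stand-in for spectral features
gv_mean = mcep.var(axis=0)                  # stand-in for a trained GV mean

wider = scale_global_variance(mcep, gv_mean, 1.5)     # wider range of variation
narrower = scale_global_variance(mcep, gv_mean, 0.5)  # narrower range
```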

SLIDE 14

Dynamic- vs static-feature weight control

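  • 1. By increasing (decreasing) the window weights in the generation process, the realization of a phoneme with lower (higher) variation is chosen among the possible ones.
  • 2. A different weight is used for each feature type (static, Δ, Δ²); the transformation is defined by the vector [α1 α2 α3].
  • 3. α1 is usually set to 1 for F0, since changing it shifts the pitch.

Idea: give more importance to the dynamic features relative to the static ones in the speech generation process:

$c = (W^\top \hat{U}^{-1} W)^{-1} W^\top \hat{U}^{-1} \hat{\mu}$

where $\hat{\mu}$ and $\hat{U}$ are the mean vector and covariance matrix of the observation sequence for the selected state sequence, and the weighted window matrix $W$ is defined by

$$
\begin{bmatrix} \vdots \\ c_t \\ \Delta c_t \\ \Delta^2 c_t \\ \vdots \end{bmatrix}
=
\begin{bmatrix}
 & \vdots & \vdots & \vdots & \\
\cdots & \mathbf{0} & \alpha_1 I & \mathbf{0} & \cdots \\
\cdots & -\alpha_2 I/2 & \mathbf{0} & \alpha_2 I/2 & \cdots \\
\cdots & \alpha_3 I & -2\alpha_3 I & \alpha_3 I & \cdots \\
 & \vdots & \vdots & \vdots &
\end{bmatrix}
\begin{bmatrix} \vdots \\ c_{t-1} \\ c_t \\ c_{t+1} \\ \vdots \end{bmatrix},
\qquad o = W c
$$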
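The weighted-window generation can be sketched in NumPy for one feature dimension with a diagonal covariance. This is a minimal sketch under stated assumptions: the function names and the toy step-shaped means are hypothetical, frames outside the utterance are simply dropped from the windows, and the unweighted case α = (1, 1, 1) corresponds to the usual delta windows (−0.5, 0, 0.5) and (1, −2, 1). Raising α2 and α3 should yield a smoother trajectory (less variation), while a static weight α1 = 2 with no dynamic constraints halves the trajectory, which illustrates why α1 is kept at 1 for F0.

```python
import numpy as np


def build_window_matrix(T, alphas):
    """Weighted window matrix W of shape (3T, T) for one feature dimension.

    Rows 3t, 3t+1, 3t+2 compute alpha1*c_t, alpha2*(c_{t+1} - c_{t-1})/2
    and alpha3*(c_{t-1} - 2*c_t + c_{t+1}); neighbours outside the
    utterance are dropped from the windows.
    """
    a1, a2, a3 = alphas
    W = np.zeros((3 * T, T))
    for t in range(T):
        W[3 * t, t] = a1                                   # static
        for k, w in ((-1, -0.5 * a2), (1, 0.5 * a2)):      # delta
            if 0 <= t + k < T:
                W[3 * t + 1, t + k] = w
        for k, w in ((-1, a3), (0, -2.0 * a3), (1, a3)):   # delta-delta
            if 0 <= t + k < T:
                W[3 * t + 2, t + k] = w
    return W


def mlpg(mu, var, alphas):
    """Solve c = (W^T U^-1 W)^-1 W^T U^-1 mu for a diagonal covariance U.

    mu, var: length-3T arrays ordered like the rows of W
    (static, delta, delta-delta for frame 0, then frame 1, ...).
    """
    mu = np.asarray(mu, dtype=float)
    T = mu.size // 3
    W = build_window_matrix(T, alphas)
    u_inv = 1.0 / np.asarray(var, dtype=float)
    Wt_Uinv = W.T * u_inv  # equals W.T @ diag(u_inv)
    return np.linalg.solve(Wt_Uinv @ W, Wt_Uinv @ mu)


# Toy phone transition: static means step from 0 to 1, dynamic means are 0.
T = 20
mu = np.zeros(3 * T)
mu[3 * np.arange(T // 2, T)] = 1.0
var = np.ones(3 * T)

c_sharp = mlpg(mu, var, (1.0, 0.2, 0.2))     # weak dynamic constraints
c_smooth = mlpg(mu, var, (1.0, 10.0, 10.0))  # strong dynamic constraints
c_scaled = mlpg(mu, var, (2.0, 0.0, 0.0))    # static weight only, alpha1 = 2
```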

SLIDE 15

Dynamic- vs static-feature weight control

[Figure: F1 trajectories (formant frequency in Hz vs. time in s) for the phone sequence “ae l ax s”, generated with three weight settings: α1=1, α2=0.2, α3=0.2; α1=1, α2=10, α3=10; α1=1, α2=1, α3=1]

SLIDE 16

Audio examples

[Audio examples arranged between hyper-articulated speech, HTS-Demo speech and hypo-articulated speech: vowel reduction; GV weight; dynamic control; dynamic + reduction; dynamic + reduction in noise]

GUI

SLIDE 17

Outline

a) The “Speech Synthesis by Analysis” project
b) Complete project architecture
c) First realizations:
    a) TTS prototype with extended Speech Intelligibility Index (SII) feedback
    b) TTS prototype with control on speech quality (towards H&H)
d) Next steps

SLIDE 18

Next steps …

  • Add articulatory constraints
  • Find new parameters to control feature generation
  • Complete the control feedback by:

‒ defining an optimization function
‒ adding a recognition function
‒ reacting in real time

  • Investigate a formant synthesiser as a possible vocoder
  • Add more generalization to the parameter generation process:

‒ multiple phonetizations activated by the same word
‒ Bayesian synthesiser (ref. Zen, H.)

SLIDE 19

Thank you
