Speech Processing 15-492/18-492 Speech Synthesis Building Voices


SLIDE 1

Speech Processing 15-492/18-492

Speech Synthesis Building Voices

SLIDE 2

Building a Voice

  • Designing the Prompts
  • Recording the Prompts
  • Labeling the Utterances
  • Finding parameters (F0, MCEP)
  • Building the synthesis voice
  • Tuning and Testing

SLIDE 3

Software Requirements

  • Festival Speech Synthesizer
      • Free software, language-independent synthesizer
      • Multiplatform: Windows, Linux, OSX
      • Used for research and commercial synthesis
  • Festvox
      • Voice building tools
      • Scripts, instructions, example databases
      • Used for over 40 different languages

SLIDE 4

Festival Speech Synthesis

  • After Installation
      • festival --tts stuff.txt
      • festival
      • festival> (SayText "hello world")

SLIDE 5

Building Synthetic Voices

  • http://festvox.org/bsv
  • Look at section on “Telling the Time”

SLIDE 6

Automatic Labeling

SLIDE 7

Automatic Labeling (bad)

SLIDE 8

Parameterization

  • Extract pitch marks from data
      • Find voiced/unvoiced regions
      • Add “fake” pitch marks during unvoiced regions
  • Extract MFCC pitch synchronously
      • Instead of a fixed frame advance (e.g. 5 ms)
      • Extract it at each pitch mark
      • Try to capture the spectrum at the pitch period
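
The pitch-mark steps above can be sketched in a few lines of Python. This is only an illustration, not the Festvox or Edinburgh Speech Tools implementation: the function names, the 20 ms gap threshold and the 10 ms fake-mark spacing are all assumptions made for the example. A gap between marks that is longer than the threshold is treated as unvoiced and filled with evenly spaced fake marks. Each analysis window is then centred on a pitch mark instead of advancing by a fixed step.

```python
def add_fake_pitchmarks(pitchmarks_ms, end_ms, max_gap_ms=20.0, step_ms=10.0):
    """Fill long gaps (assumed unvoiced) with evenly spaced "fake" marks.

    pitchmarks_ms: sorted pitch-mark times in milliseconds.
    end_ms: utterance duration in milliseconds.
    max_gap_ms, step_ms: illustrative values, not Festvox defaults.
    """
    filled = []
    prev = 0.0
    # Append the utterance end as a sentinel so the final gap is filled too.
    for t in list(pitchmarks_ms) + [end_ms]:
        if t - prev > max_gap_ms:
            fake = prev + step_ms
            # Keep each fake mark at least one step before the next real mark.
            while t - fake >= step_ms:
                filled.append(fake)
                fake += step_ms
        filled.append(t)
        prev = t
    filled.pop()  # drop the end-of-utterance sentinel
    return filled


def pitch_synchronous_windows(marks_ms, end_ms):
    """Return (start, centre, stop) analysis windows, one per pitch mark.

    Each window runs from the previous mark to the next one, so its length
    follows the local pitch period rather than a fixed frame advance.
    """
    windows = []
    for i, centre in enumerate(marks_ms):
        start = marks_ms[i - 1] if i > 0 else 0.0
        stop = marks_ms[i + 1] if i + 1 < len(marks_ms) else end_ms
        windows.append((start, centre, stop))
    return windows


marks = add_fake_pitchmarks([100, 105, 110, 160, 165], end_ms=200)
windows = pitch_synchronous_windows(marks, end_ms=200)
```

The spectral parameters would then be computed once per window, so voiced regions get one frame per pitch period and unvoiced regions get frames at the fake-mark spacing.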

SLIDE 9

Pitchmarks

SLIDE 10

Building an LDOM (limited domain) synthesizer

  • Build cluster tree on each unit type
      • Not just on phones
      • Tag phones with the word they come from
      • d_limited and d_domain are treated as different
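
A minimal sketch of the word-tagging idea, assuming each utterance is available as a list of (word, phones) pairs. The function names and the phone symbols are hypothetical and chosen for illustration; this is not the Festvox labeling code. Each phone is renamed phone_word, and the occurrences are grouped by that name so that a separate cluster tree could be built per unit type.

```python
from collections import defaultdict


def ldom_unit_types(words):
    """Name every phone as phone_word, e.g. "d" in "limited" -> "d_limited".

    words: list of (word, [phones]) pairs in utterance order.
    """
    return [f"{phone}_{word.lower()}" for word, phones in words for phone in phones]


def group_by_unit_type(words):
    """Map each unit type to the positions where it occurs in the utterance.

    One cluster tree would then be built per key of the returned dict.
    """
    groups = defaultdict(list)
    for index, unit_type in enumerate(ldom_unit_types(words)):
        groups[unit_type].append(index)
    return dict(groups)


# Illustrative phone strings; a real voice would take them from its lexicon.
utterance = [
    ("limited", ["l", "ih", "m", "ih", "t", "ih", "d"]),
    ("domain", ["d", "ow", "m", "ey", "n"]),
]
groups = group_by_unit_type(utterance)
```

Here the final "d" of "limited" and the initial "d" of "domain" fall under different keys, so they are never pooled into the same cluster tree.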

SLIDE 11

Tuning and Testing

  • Test it on some real data
  • Ensure number/symbol expansions are correct
      • Prompts should probably be word expanded
      • Flight US187 -> flight u s one eight seven
  • Remove bad prompts
      • Or fix labels
  • Remember to keep access to the speaker
      • If you have to update the system, you need the same speaker available
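
The word expansion of prompts can be sketched as below. This is a hypothetical helper, not Festival's own text normalization. It assumes that all-capital codes and digit strings are read character by character, as in the flight-number example, and it silently drops punctuation.

```python
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def expand_prompt(text):
    """Expand a prompt into plain lowercase words.

    Ordinary words are lowercased. Tokens that are all capitals or contain
    digits are spelled out one character at a time (an assumption that suits
    flight numbers, not every domain).
    """
    words = []
    for token in text.split():
        if token.isalpha() and not token.isupper():
            words.append(token.lower())
            continue
        for ch in token:
            if ch.isdigit():
                words.append(DIGIT_WORDS[ch])
            elif ch.isalpha():
                words.append(ch.lower())
            # Any other symbol is dropped in this sketch.
    return " ".join(words)


print(expand_prompt("Flight US187"))  # -> flight u s one eight seven
```

Recording prompts in this expanded form keeps the labels and the spoken words aligned, and running the same expansion over real test text is a quick way to spot numbers or symbols the voice cannot handle.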

SLIDE 12

Summary

  • Building a voice
      • Database design, recording, labeling
      • Parameter extraction and model building
  • Limited domain synthesis
