Speech Processing 15-492/18-492 Speech Synthesis Building Voices

Apr 09, 2024

SLIDE 1

Speech Processing 15-492/18-492

Speech Synthesis Building Voices

SLIDE 2

Building a Voice

Designing the Prompts

Recording the Prompts

Labeling the Utterances

Finding parameters (F0, MCEP)

Building the synthesis voice

Tuning and Testing

SLIDE 3

Software Requirements

Festival Speech Synthesizer

Free software, language-independent synthesizer

Multiplatform: Windows, Linux, OSX

Used for research and commercial synthesis

Festvox

Voice building tools

Scripts, instructions, example databases

Used for over 40 different languages

SLIDE 4

Festival Speech Synthesis

After Installation

festival --tts stuff.txt

festival

festival> (SayText "hello world")
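
The interactive SayText call above can also be driven from a script. The following is a minimal Python sketch, not part of the original slides: the helper names saytext_command and speak are hypothetical, and it assumes a festival binary on the PATH that accepts commands on standard input via --pipe.

```python
import shutil
import subprocess


def saytext_command(text: str) -> str:
    """Build the Festival Scheme expression that speaks `text`."""
    # Escape backslashes and double quotes so the Scheme string stays valid.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'(SayText "{escaped}")'


def speak(text: str) -> bool:
    """Send a SayText command to festival; return False if it is not installed."""
    if shutil.which("festival") is None:
        return False
    # Assumption: festival reads Scheme commands from stdin in --pipe mode.
    subprocess.run(
        ["festival", "--pipe"],
        input=saytext_command(text),
        text=True,
        check=True,
    )
    return True


if __name__ == "__main__":
    print(saytext_command("hello world"))
```

Building the command separately from running it keeps the quoting logic testable on machines where Festival or an audio device is not available.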

SLIDE 5

Building Synthetic Voices

http://festvox.org/bsv

Look at section on “Telling the Time”

SLIDE 6

Automatic Labeling

SLIDE 7

Automatic Labeling (bad)

SLIDE 8

Parameterization

Extract pitch marks from data

Find voiced/unvoiced regions

Add “fake” pitch marks during unvoiced regions

Extract MFCC pitch synchronously

Instead of a fixed frame advance (e.g. 5ms)

Extract it at each pitch mark

Try to capture the spectrum at the pitch period
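
The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Festvox implementation: the function names, the 20 ms gap threshold, the 10 ms spacing of fake marks, and the choice of an analysis window running from the previous mark to the next one are all hypothetical. Times are in milliseconds, and the actual cepstral analysis of each window is left out.

```python
def fill_unvoiced(voiced_marks_ms, max_gap_ms=20, fake_step_ms=10):
    """Insert evenly spaced "fake" pitch marks into long gaps between marks.

    A gap longer than max_gap_ms is treated as an unvoiced region
    (an assumed heuristic, not taken from the slides).
    """
    filled = []
    for i, mark in enumerate(voiced_marks_ms):
        if i > 0:
            previous = voiced_marks_ms[i - 1]
            if mark - previous > max_gap_ms:
                t = previous + fake_step_ms
                # Stop early so a fake mark never lands right next to a real one.
                while mark - t >= fake_step_ms / 2:
                    filled.append(t)
                    t += fake_step_ms
        filled.append(mark)
    return filled


def pitch_synchronous_windows(marks_ms):
    """Return one (start, end) analysis window per pitch mark.

    Each window spans from the previous mark to the next one, so its length
    follows the local pitch period instead of a fixed frame advance.
    """
    windows = []
    for i, centre in enumerate(marks_ms):
        start = marks_ms[i - 1] if i > 0 else centre
        end = marks_ms[i + 1] if i + 1 < len(marks_ms) else centre
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    marks = fill_unvoiced([5, 10, 15, 55, 60])
    print(marks)
    print(pitch_synchronous_windows(marks))
```

With a fixed 5 ms advance the frame positions ignore where the glottal pulses fall; here every frame is anchored to a pitch mark, real or fake.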

SLIDE 9

Pitchmarks

SLIDE 10

Building a LDOM synthesizer

Build cluster tree on each unit type

Not just on phones

Tag phones with word they come from

d_limited and d_domain are treated as different
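
The word-tagged unit types above can be illustrated with a short Python sketch. The function name group_by_unit_type and the phone transcriptions are assumptions made for this example. It only shows how tagging each phone with its word yields separate pools of candidate units, one per unit type, which is what each cluster tree would then be built on.

```python
from collections import defaultdict


def group_by_unit_type(words_with_phones):
    """Group phone instances by unit type, where a type is phone_word.

    Returns a dict mapping each unit type (e.g. "d_limited") to the list of
    positions in the phone sequence where that type occurs.
    """
    groups = defaultdict(list)
    position = 0
    for word, phones in words_with_phones:
        for phone in phones:
            groups[f"{phone}_{word.lower()}"].append(position)
            position += 1
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical transcriptions, for illustration only.
    utterance = [
        ("limited", ["l", "ih", "m", "ih", "t", "ih", "d"]),
        ("domain", ["d", "ow", "m", "ey", "n"]),
    ]
    for unit_type, positions in group_by_unit_type(utterance).items():
        print(unit_type, positions)
```

Both words contain the phone d, but the two instances end up under different keys, so they never compete in the same cluster tree.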

SLIDE 11

Tuning and Testing

Test it on some real data

Ensure number/symbol expansions are correct

Prompts should probably be word expanded

Flight US187 -> flight u s one eight seven

Remove bad prompts

Or fix labels

Remember to keep access to the speaker

If you have to update the system, you need the same speaker available
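
The word expansion of prompts mentioned above (Flight US187 -> flight u s one eight seven) can be sketched in Python. This is a toy normalizer, not Festival's token-to-word rules: the function name expand_prompt is hypothetical, and reading every digit separately and spelling out the letters of any token that contains a digit are assumptions that fit the flight-number example but not numbers in general.

```python
DIGIT_NAMES = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def expand_prompt(text: str) -> str:
    """Expand a prompt into the lowercase words the speaker should read."""
    words = []
    for token in text.split():
        if any(ch in DIGIT_NAMES for ch in token):
            # Token mixes letters and digits (e.g. "US187"): spell it out.
            for ch in token:
                if ch in DIGIT_NAMES:
                    words.append(DIGIT_NAMES[ch])
                elif ch.isalpha():
                    words.append(ch.lower())
        else:
            words.append(token.lower())
    return " ".join(words)


if __name__ == "__main__":
    print(expand_prompt("Flight US187"))
```

Recording from expanded prompts means the words in the label files match what the speaker actually said, which makes wrong expansions easier to spot during testing.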

SLIDE 12

Summary

Building a voice

Database design, recording, labeling

Parameter extraction and model building

Limited domain synthesis

SLIDE 13