Speech Processing 15-492/18-492 Speech Synthesis Signal Processing - - PowerPoint PPT Presentation



SLIDE 1

Speech Processing 15-492/18-492

Speech Synthesis Signal Processing

SLIDE 2

Signal Manipulation

  • Signal Parameterization
  • Joining
  • LPC
  • PSOLA: pitch and duration modification
  • Statistical Parameterization
  • MELCEP/MLSA
  • LSF, STRAIGHT, HNM, HSM

SLIDE 3

TTS Signal Processing

  • Join together pieces of speech
  • Prosodic modification
      – Pitch (F0)
      – Duration
      – Power
  • Change spectral properties
      – Stress/unstress
      – Spectral tilt
      – Speaking style

SLIDE 4

Joining

  • Just put them together
  • Gets clicks at join points
  • Join them at zero crossings
  • Window them and overlap them
  • WSOLA
  • Join them at pitch periods
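
Two of the joining strategies listed on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the course's implementation: the function names `nearest_zero_crossing` and `crossfade_join` are hypothetical, the signals are plain lists of floats, and the overlap uses a simple linear fade rather than any particular window from the lecture.

```python
def nearest_zero_crossing(x, start):
    """Return the first index at or after `start` where the signal rises
    through zero (previous sample negative, current sample non-negative).
    Returns len(x) if there is no such point."""
    for i in range(max(start, 1), len(x)):
        if x[i - 1] < 0.0 <= x[i]:
            return i
    return len(x)


def crossfade_join(a, b, overlap):
    """Join two units by overlapping the tail of `a` with the head of `b`.
    `a` fades out linearly while `b` fades in, so the weights always sum to 1."""
    overlap = min(overlap, len(a), len(b))
    out = list(a[:len(a) - overlap])
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)          # fade-in weight for b
        out.append((1.0 - w) * a[len(a) - overlap + i] + w * b[i])
    out.extend(b[overlap:])
    return out
```

Cutting both units at rising zero crossings before concatenating avoids a step in amplitude at the join, while the crossfade spreads any remaining mismatch over `overlap` samples.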

SLIDE 5

Prosodic Modification

  • Modify pitch and duration independently
  • Changing sample rate changes both
      – “chipmunk” style speech
  • Duration
      – Duplicate/delete parts of the signal
  • Pitch
      – “resample” to change pitch
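
The point that changing the sample rate changes pitch and duration together can be checked with a small experiment. The sketch below is an assumption-laden illustration: `resample_linear` and `count_rising_crossings` are hypothetical helpers, the test signal is a synthetic 100 Hz sine rather than speech, and linear interpolation stands in for a proper resampling filter.

```python
import math


def resample_linear(x, factor):
    """Read through `x` `factor` times faster, using linear interpolation."""
    n_out = int((len(x) - 1) / factor) + 1
    out = []
    for k in range(n_out):
        pos = k * factor
        i = int(pos)
        frac = pos - i
        nxt = x[i + 1] if i + 1 < len(x) else x[i]
        out.append((1.0 - frac) * x[i] + frac * nxt)
    return out


def count_rising_crossings(x):
    """Count upward zero crossings, a rough count of pitch cycles for a sine."""
    return sum(1 for i in range(1, len(x)) if x[i - 1] < 0.0 <= x[i])


sr = 16000
tone = [math.sin(2 * math.pi * 100 * n / sr) for n in range(sr)]  # 1 s at 100 Hz
fast = resample_linear(tone, 2.0)

# Same number of cycles, but in half as many samples: played back at the
# same rate, the result is half as long and an octave higher.
print(len(tone), count_rising_crossings(tone))
print(len(fast), count_rising_crossings(fast))
```

Because the cycle count is unchanged while the sample count halves, the resampled tone is both shorter and higher, which is the "chipmunk" effect; modifying the two independently needs a method that works on pitch periods.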

SLIDE 6

Speech and Short Term Signals

SLIDE 7

Duration Modification

SLIDE 8

Pitch Modification

SLIDE 9

Modify pitch and duration

  • Find ideal pitch periods and duration
  • Find closest actual periods from units
  • End with
      – Pitch period (short term signals)
      – Distances between them
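
The two "find" steps on this slide can be sketched as follows. This is a minimal illustration, assuming a constant target F0 and pitch marks expressed as times in seconds; the names `target_pitch_marks` and `closest_source_periods` and the `time_scale` parameter are hypothetical, not taken from the lecture.

```python
def target_pitch_marks(f0_hz, duration_s):
    """Ideal pitch-mark times (seconds) for a constant target F0."""
    period = 1.0 / f0_hz
    marks = []
    k = 0
    while k * period < duration_s:
        marks.append(k * period)
        k += 1
    return marks


def closest_source_periods(source_marks, target_marks, time_scale):
    """For each target mark, return the index of the nearest source mark.

    time_scale is target duration divided by source duration, so a target
    time is mapped back onto the source time axis before the search."""
    chosen = []
    for t in target_marks:
        t_src = t / time_scale
        best = min(range(len(source_marks)),
                   key=lambda i: abs(source_marks[i] - t_src))
        chosen.append(best)
    return chosen
```

For example, with source marks every 7 ms and a 100 Hz target of the same duration, the mapping skips some source periods; with `time_scale` above 1 it repeats them. The result is a list of short-term signals to use plus the new distances between them.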

SLIDE 10

Signal Reconstruction

  • TD-PSOLA™
      – Time domain pitch synchronous overlap and add
      – Patented by France Telecom
      – Expired 2004
  • Very efficient:
      – No FFT (or inverse FFT)
      – Can scale pitch (Hz) by 2.0 (or 0.5)
  • The reason no one publishes algorithms
  • The (partial) reason unit selection typically doesn’t do pitch/duration modification
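
A heavily simplified time-domain overlap-add reconstruction in the spirit of the slide above might look like the sketch below. It is not the patented TD-PSOLA algorithm as published: the function names are hypothetical, pitch marks are assumed to be given as sample indices, and each short-term signal is simply a Hann-windowed stretch of two local periods centred on its mark.

```python
import math


def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def psola_overlap_add(x, source_marks, chosen, target_marks):
    """Re-centre windowed source periods on new pitch marks and sum them.

    x            : input samples
    source_marks : sample indices of pitch marks in x
    chosen       : for each target mark, an index into source_marks
    target_marks : sample indices where each chosen period is placed
    """
    def period_at(j):
        # Local period: distance to the next mark, or to the previous one
        # for the final mark.
        if j + 1 < len(source_marks):
            return source_marks[j + 1] - source_marks[j]
        return source_marks[j] - source_marks[j - 1]

    half = max(period_at(j) for j in chosen)
    out = [0.0] * (target_marks[-1] + half + 1)
    for j, t in zip(chosen, target_marks):
        p = period_at(j)
        win = hann(2 * p + 1)
        for k in range(-p, p + 1):
            src = source_marks[j] + k
            dst = t + k
            if 0 <= src < len(x) and 0 <= dst < len(out):
                out[dst] += win[k + p] * x[src]
    return out
```

With unchanged marks the windows sum to one between the first and last mark, so the input is reproduced there. Placing the same periods closer together raises the pitch, and repeating or dropping entries in `chosen` changes the duration, all with additions and multiplications only and no FFT.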

SLIDE 11

LPC: Linear predictive coding

  • Linear predictive coding
      – Predict next sample point from previous
      – Weighted sum of previous points
      – Filter of order p
      – Residual excited LPC
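
To make "weighted sum of previous points" concrete, the sketch below estimates the weights of an order-p predictor with the autocorrelation method and the Levinson-Durbin recursion, then computes the residual that a residual-excited system would keep. The choice of Levinson-Durbin and the function names are assumptions for illustration; the slide does not say which estimation method the course uses.

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum over n of x[n] * x[n - lag], for lag = 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def lpc(x, order):
    """Levinson-Durbin recursion.

    Returns (weights, error) where weights[k-1] multiplies x[n-k] in the
    prediction x_hat[n] = sum_k weights[k-1] * x[n-k], and error is the
    remaining prediction error energy. Assumes x is not all zeros."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


def residual(x, weights):
    """Prediction error e[n] = x[n] - sum_k weights[k-1] * x[n-k]."""
    out = []
    for n in range(len(x)):
        pred = sum(w * x[n - k]
                   for k, w in enumerate(weights, start=1) if n - k >= 0)
        out.append(x[n] - pred)
    return out
```

As a sanity check, a decaying signal such as `[0.9 ** n for n in range(200)]` yields an order-1 weight very close to 0.9 and a residual that is essentially a single pulse at the start.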

SLIDE 12

LPC

  • Works well but can be buzzy
  • Can be very compact
  • Can be pitch synchronous
  • Excited
      – Pulse
      – Triangular pulse
      – Multi-pulse
      – Full residual
  • Used in standard speech coding
      – LPC10: 2.4 kbps
      – CELP: codebook excited LPC

SLIDE 13

Other Parametric Representations

  • Typically split spectral and residual
  • MBROLA:
      – Multi-band overlap and add
  • HNM/HSM:
      – Harmonic plus (noise/stochastic) modeling
  • STRAIGHT
  • MELCEP/MLSA
      – Often used in HMM synthesis
  • Sinusoidal (HARMONIC)
  • Wavelet
  • LSF/LPC

SLIDE 14

Choosing the right unit type

  • Diphones
      – Phone-phone
      – Joins at stable portions, not transitions
  • Half phone (AT&T Natural Voices)
  • Hybrid systems (Hadifix – Bonn systems)
  • Other selection systems:
      – Syllable, phone, HMM state
      – Even frame level

SLIDE 15

Acoustically Derived Units

  • E.g. Bacchiani 99 or Rita Singh CMU
  • From some waveforms
  • Find N most diverse unit types
  • Varied in length
  • Still need to map letters to units

SLIDE 16

Acoustic Phonetic Clustering

  • Parameterize database
      – Melcep plus power
  • K-means
      – Euclidean distance measure
      – 100 clusters
  • Label DB with best cluster
  • Build clunits synthesizer
  • Can’t predict APC cluster directly
  • Use held out data for testing
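
The clustering and labelling steps above can be sketched with a plain k-means loop using Euclidean distance. This is an illustration only: the function names are hypothetical, each frame is assumed to be a list of floats (for example mel-cepstral coefficients plus power), and the initial centroids are simply the first k frames, which is an arbitrary choice made to keep the sketch deterministic.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance; it gives the same nearest centroid as the
    true distance and avoids a square root."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(vec, centroids):
    """Index of the centroid closest to `vec`."""
    return min(range(len(centroids)),
               key=lambda c: squared_euclidean(vec, centroids[c]))


def kmeans(frames, k, iterations=20):
    """Plain k-means. Returns (centroids, labels), where labels[i] is the
    best cluster for frames[i], i.e. the label written back to the database."""
    centroids = [list(f) for f in frames[:k]]
    for _ in range(iterations):
        labels = [nearest(f, centroids) for f in frames]
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:                      # leave empty clusters unchanged
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    labels = [nearest(f, centroids) for f in frames]
    return centroids, labels
```

In the setting on the slide, `k` would be 100 and the resulting labels would serve as the unit types for the clunits synthesizer.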

SLIDE 17

Acoustic Phonetic Clustering

SLIDE 18

Grapheme Based Synthesis

  • Synthesis without a phoneme set
  • Use the letters as phonemes
      – (“alan” nil (a l a n))
      – (“black” nil (b l a c k))
  • Spanish (easier?)
      – 419 utterances
      – HMM training to label databases
      – Simple pronunciation rules
      – Polici’a -> p o l i c i’ a
      – Cuatro -> c u a t r o

SLIDE 19

Spanish Grapheme Synthesis
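
The letters-as-phonemes lexicon entries from the grapheme-based synthesis slides can be generated mechanically. The sketch below only reproduces the entry text in the form shown on the slides; `grapheme_entry` is a hypothetical helper, and keeping an accent mark attached to the preceding vowel is an assumption based on the Polici’a example, not a documented rule.

```python
def grapheme_entry(word):
    """Format a lexicon entry that uses the word's letters as its phones,
    e.g. ("black" nil (b l a c k))."""
    letters = []
    for ch in word.lower():
        if ch in ("'", "\u2019") and letters:
            letters[-1] += ch        # keep the accent mark with its vowel
        elif ch.isalpha():
            letters.append(ch)
    return '("%s" nil (%s))' % (word.lower(), " ".join(letters))


for w in ["alan", "black", "cuatro"]:
    print(grapheme_entry(w))
```

Because no pronunciation knowledge is involved, the same function works for any word list, which is what makes the approach attractive for languages without an existing phoneme set.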

SLIDE 20

English Grapheme Synthesis

  • Use letters as phones
      – 26 “phonemes”
      – (“alan” n (a l a n))
      – (“black” n (b l a c k))
  • Build HMM acoustic models for labeling
  • For English
      – “This is a pen”
      – “We went to the church at Christmas”
  • Festival intro
      – “do eight meat”
  • Requires method to fix errors
      – Letter to letter mapping

SLIDE 21

Signal Processing for TTS

  • Pitch and duration modification
  • LPC
  • Finding the right unit type
  • Grapheme-based Synthesis

SLIDE 22

SLIDE 23

HW1: TTS

  • Due 3:30pm Friday October 2nd
  • Install Festival and Festvox
  • Find 10 errors in each of two different synthesizers
  • Build a voice
      – A Talking Clock
      – A general voice
      – (or both)

SLIDE 24