Speech Processing 15-492/18-492



SLIDE 1

Speech Processing 15-492/18-492

Speech Synthesis Prosody

SLIDE 2

Speech Synthesis

  • Linguistic Analysis
  • Pronunciations
  • Prosody

SLIDE 3

Prosody

  • How the phonemes will be said
  • Four aspects of prosody
      • Phrasing: where the breaks will be
      • Intonation: pitch accents and F0 generation
      • Duration: how long the phonemes will be
      • Power: energy in the signal
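As a small illustration of the "power" aspect, the sketch below computes frame-level RMS energy in decibels from raw samples. The function name, the frame length, and the dB floor are assumptions made for this example; they do not come from the slides.

```python
import math

def frame_energy_db(samples, frame_len=4):
    """Return RMS energy in dB for each non-overlapping frame of samples.

    frame_len is a hypothetical value chosen for the toy example;
    real systems typically use frames of tens of milliseconds.
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        # Floor the RMS so that silent frames do not produce log(0).
        energies.append(20.0 * math.log10(max(rms, 1e-10)))
    return energies

loud_then_quiet = [1.0, -1.0, 1.0, -1.0, 0.1, -0.1, 0.1, -0.1]
print(frame_energy_db(loud_then_quiet))
```

The second frame has one tenth of the amplitude of the first, so its energy is about 20 dB lower.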

SLIDE 4

Phrase Breaks

  • Need to take a breath
  • Need to chunk relevant parts together
      • Sub-sentential
      • Supra-word
  • First approximation
      • At punctuation (comma, semicolon, etc.)
      • Too few breaks
  • Second approximation
      • At each (or some) of the content-word/function-word boundaries
      • Too many breaks

SLIDE 5

Phrasing

  • Punctuation
      • Next week, some inmates released early from the Hampton County jail in Springfield, will be wearing a wristband that hooks up to a special jack on their home phones.
  • Content/function words
      • Next week || some inmates released early || from the Hampton County jail || in Springfield || will be wearing || a wristband || that hooks || up to a special jack || on their home phones.
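The two approximations can be sketched in a few lines of Python. This is only an illustration: the FUNCTION_WORDS list and the helper names are assumptions made for this example, and a real system would more likely use a part-of-speech tagger than a hand-written word list.

```python
import re

# Hypothetical, deliberately tiny function-word list for this example only.
FUNCTION_WORDS = {"a", "an", "the", "in", "on", "from", "to", "up", "with",
                  "that", "will", "be", "their", "some", "of", "at", "and"}
PUNCTUATION = set(",;:.!?")

def tokenize(text):
    """Split text into words and single punctuation marks."""
    return re.findall(r"[A-Za-z']+|[,;:.!?]", text)

def _split(text, break_before_function_words):
    phrases, current = [], []
    prev_was_content = False
    for tok in tokenize(text):
        if tok in PUNCTUATION:
            # Punctuation always closes the current phrase.
            if current:
                phrases.append(" ".join(current))
                current = []
            prev_was_content = False
            continue
        is_function = tok.lower() in FUNCTION_WORDS
        # Second approximation: break where a function word follows a content word.
        if break_before_function_words and is_function and prev_was_content:
            phrases.append(" ".join(current))
            current = []
        current.append(tok)
        prev_was_content = not is_function
    if current:
        phrases.append(" ".join(current))
    return phrases

def breaks_at_punctuation(text):
    return _split(text, break_before_function_words=False)

def breaks_at_function_words(text):
    return _split(text, break_before_function_words=True)

sentence = ("Next week, some inmates released early from the Hampton County "
            "jail in Springfield, will be wearing a wristband that hooks up "
            "to a special jack on their home phones.")
print(" || ".join(breaks_at_punctuation(sentence)))
print(" || ".join(breaks_at_function_words(sentence)))
```

On the example sentence the first function yields three long phrases and the second yields nine short ones, which is the "too few" versus "too many" contrast.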

SLIDE 6

Phrasing

  • Bachenko and Fitzpatrick 90
      • Rule driven with punctuation, POS and syntax
      • Balanced phrasing
          • (the boy saw) (the girl in the park)
          • (the boy in the park) (saw the girl)
  • Hirschberg and Prieto 94
      • CART trees (similar features)
  • Ostendorf and Veilleux 94
      • Hierarchical statistical model
      • Multilevel breaks

SLIDE 7

Phrasing (Black and Taylor 97)

  • Balance length of phrases
  • Predict probability of break with CART (use POS)
  • Use n-gram of B/NB to keep balance
  • Trained on BBC Radio 4 (NPR-like)
  • 31,707 words, 6,346 breaks
  • 91% correct with 6-gram
  • Still makes errors, especially around "I"
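One way to picture how a per-juncture break probability and an n-gram over B/NB labels work together is a Viterbi search over label sequences. The sketch below is a simplification under stated assumptions: it uses a bigram instead of a 6-gram, the probabilities are invented for the example, and the scoring is a plain product of classifier and transition probabilities, which may differ from the exact formulation in Black and Taylor 97.

```python
import math

LABELS = ("B", "NB")

def viterbi_breaks(p_break, trans):
    """Pick the best B/NB sequence for a list of word junctures.

    p_break[i] is the (hypothetical) classifier probability of a break
    at juncture i; trans[(prev, cur)] is a bigram probability over labels.
    """
    def emit(i, label):
        p = p_break[i] if label == "B" else 1.0 - p_break[i]
        return math.log(max(p, 1e-12))

    # best[label] = (log score, path) of the best sequence ending in label.
    best = {lab: (emit(0, lab), [lab]) for lab in LABELS}
    for i in range(1, len(p_break)):
        new_best = {}
        for cur in LABELS:
            score, path = max(
                ((best[prev][0] + math.log(trans[(prev, cur)]), best[prev][1])
                 for prev in LABELS),
                key=lambda sp: sp[0],
            )
            new_best[cur] = (score + emit(i, cur), path + [cur])
        best = new_best
    return max(best.values(), key=lambda sp: sp[0])[1]

# Made-up numbers: two adjacent junctures both look like breaks to the classifier,
# but the label bigram makes "B followed by B" very unlikely.
p_break = [0.1, 0.6, 0.8, 0.1]
trans = {("B", "B"): 0.05, ("B", "NB"): 0.95,
         ("NB", "B"): 0.35, ("NB", "NB"): 0.65}
print(viterbi_breaks(p_break, trans))
```

Thresholding each juncture at 0.5 independently would place two breaks in a row; the label model keeps only the stronger one, which is the balancing effect the slide refers to.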

SLIDE 8

Phrasing

  • What is correct?
      • Lots of answers are correct.
      • But some are definitely bad.
  • Ostendorf and Veilleux 94
      • Multiple people read the same paragraphs
      • If your method matches any single person's version, it is correct.
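The evaluation criterion can be written down directly. This is a minimal sketch assuming each phrasing is represented as a list of B/NB labels, one per word juncture; the function names and the representation are assumptions for this example.

```python
def is_acceptable(predicted, reader_versions):
    """True if the predicted break sequence equals any one reader's version."""
    return any(list(predicted) == list(version) for version in reader_versions)

def acceptability_rate(predictions, references):
    """Fraction of sentences whose prediction matches at least one reader."""
    hits = sum(is_acceptable(p, refs) for p, refs in zip(predictions, references))
    return hits / len(predictions)

# Two readers phrased the same five-juncture sentence differently.
readers = [["NB", "B", "NB", "NB", "B"],
           ["NB", "NB", "B", "NB", "B"]]
print(is_acceptable(["NB", "NB", "B", "NB", "B"], readers))   # matches reader 2
print(is_acceptable(["B", "B", "B", "B", "B"], readers))      # matches nobody
```

Note that the whole sequence must match a single reader; mixing breaks from different readers does not count under this criterion.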

SLIDE 9

Intonation

  • The fundamental tune
  • Accents (highlighting important parts)
  • F0 generation (the tune itself)

SLIDE 10

Intonation Contour

SLIDE 11

Intonation Information

  • Large pitch range (female)
  • Authoritative, since it goes down at the end
      • News reader
  • Emphasis for Finance H*
  • Final has a rise: more information to come
  • Female American newsreader from WBUR
      • (Boston University Public Radio)

SLIDE 12

Intonation Examples

  • Fixed durations, flat F0.
  • Declining F0
  • "hat" accents on stressed syllables
  • accents and end tones
  • statistically trained

SLIDE 13

Intonational Phonology

  • Accents and Boundaries
  • Where are the important changes in F0?
  • Accents on syllables
  • Identifies "important" words
      • It will be RAINY today in Boston
      • It will be rainy TODAY in Boston
      • It will BE rainy today IN Boston (strange)

SLIDE 14

Where do the accents go?

  • On important words
  • First approximation
      • On stressed syllables in content words
      • It WILL be RAINY TODAY in BOSTON
      • About 80% correct on news reader speech
  • CART training on more features
      • Content, proper nouns, POS, position in text
      • (not semantic information)
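The first approximation is easy to sketch at the word level. The code below upper-cases every word that is not in a small function-word list; the list is hypothetical and leaves out "will" only because the slide's own example accents it. Placing the accent on the stressed syllable would additionally need a pronunciation lexicon, which is not modelled here.

```python
# Hypothetical function-word list for this example only.
FUNCTION_WORDS = {"it", "be", "in", "the", "a", "an", "of", "to", "on",
                  "at", "and", "is"}

def mark_accents(sentence):
    """Upper-case every content word to show where an accent would go."""
    marked = []
    for word in sentence.split():
        bare = word.lower().strip(".,;:!?")
        marked.append(word if bare in FUNCTION_WORDS else word.upper())
    return " ".join(marked)

print(mark_accents("It will be rainy today in Boston"))
```

This reproduces the accent pattern shown in the example above; a trained CART model would replace the word-list test with a decision over richer features.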

SLIDE 15

ToBI

  • Tones and Break Indices
  • A labeling for intonation (English)
  • Different accent types
      • H*, !H*, L*, L+H*
  • Different boundary types
      • L-L%, L-H%, H-H%

SLIDE 16

ToBI examples

SLIDE 17

F0 Generation

  • Contour from accents (and durations)
  • Piece together shapes of different accents
  • Generated
      • By rule
      • Trained from data
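A by-rule generator can be sketched as a declining baseline with a rise-fall bump added on each accented syllable. All numbers here (start and end pitch, accent height, frames per syllable) and the triangular accent shape are assumptions chosen for illustration, not values from the course.

```python
def f0_contour(accented, start_hz=130.0, end_hz=100.0, accent_hz=20.0,
               frames_per_syl=5):
    """Return one F0 value per frame for a sequence of syllables.

    accented is a list of booleans, one per syllable. Every syllable gets
    the same number of frames, so durations are fixed in this sketch.
    """
    total = len(accented) * frames_per_syl
    centre = (frames_per_syl - 1) / 2.0
    contour = []
    for i, has_accent in enumerate(accented):
        for j in range(frames_per_syl):
            # Linear declination from start_hz to end_hz across the utterance.
            t = (i * frames_per_syl + j) / max(total - 1, 1)
            f0 = start_hz + (end_hz - start_hz) * t
            if has_accent:
                # Triangular bump peaking in the middle of the syllable.
                f0 += accent_hz * (1.0 - abs(j - centre) / (centre + 1.0))
            contour.append(f0)
    return contour

# Three syllables, accent on the middle one.
print([round(v, 1) for v in f0_contour([False, True, False])])
```

Swapping the triangular bump for other shapes per accent type is one way to "piece together shapes of different accents"; a trained model would predict the target values instead of using fixed constants.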

SLIDE 18

Using real contours

  • From a database of different contours
      • Select most appropriate one
  • Record lots of different intonation examples
      • He DID then KNOW what HAD occurred
      • TARZAN and JANE raised THEIR heads
  • Label them and select the contours when you want emphasis

SLIDE 19

Emphasis Synthesis

  • This is a short example
  • THIS is a short example
  • This IS a short example
  • This is A short example
  • This is a SHORT example
  • This is a short EXAMPLE

SLIDE 20

Duration Prediction

  • Each phone needs a duration
      • Make it 80 ms
  • Vowels are typically longer than consonants
  • Emphasis/accent/stress lengthens them
  • Initial and final phones are longer
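These observations translate into a very small rule-based sketch: start from the fixed 80 ms and multiply by a factor for each condition that applies. The vowel list and all three multipliers are invented for this example; they are not Klatt's published values.

```python
# ARPAbet-style vowel symbols; the set is an assumption for this example.
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ay", "eh", "er", "ey",
          "ih", "iy", "ow", "oy", "uh", "uw"}

def phone_duration_ms(phone, stressed=False, phrase_edge=False, base_ms=80.0):
    """Return a rule-based duration for one phone, in milliseconds."""
    duration = base_ms
    if phone in VOWELS:
        duration *= 1.25      # hypothetical: vowels longer than consonants
    if stressed:
        duration *= 1.2       # hypothetical: stress/accent lengthens
    if phrase_edge:
        duration *= 1.3       # hypothetical: phrase-initial/final lengthening
    return duration

print(phone_duration_ms("t"))                                      # plain consonant
print(phone_duration_ms("iy"))                                     # plain vowel
print(phone_duration_ms("iy", stressed=True, phrase_edge=True))    # all factors
```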

SLIDE 21

Prediction Models

  • By rule
      • Klatt rules
  • By training (using Klatt features)
      • CART / linear regression
  • Easy to get reasonable durations
  • Hard to get very good durations
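To make the trained alternative concrete, here is a minimal linear-regression sketch in plain Python. It fits durations by least squares using the normal equations. The three binary features (is vowel, is stressed, is phrase-final) are a hypothetical stand-in for the Klatt feature set, and the training data is a toy set constructed for the example, not measured speech.

```python
def fit_linear(X, y):
    """Least-squares fit; returns [bias, w1, w2, ...].

    Solves (X^T X) w = X^T y by Gauss-Jordan elimination with pivoting.
    """
    rows = [[1.0] + list(x) for x in X]            # prepend a bias column
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]

def predict(w, x):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

# Toy data: (is_vowel, is_stressed, is_phrase_final) -> duration in ms.
X = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1), (1, 1, 1)]
y = [60.0, 90.0, 80.0, 110.0, 100.0, 150.0]
w = fit_linear(X, y)
print([round(v, 3) for v in w])
print(round(predict(w, (1, 0, 1)), 3))   # unseen combination
```

A CART model would instead split on these features and store a mean duration at each leaf, which can capture interactions that a purely additive model misses.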

SLIDE 22

Fast and Slow Speech

  • Speaking fast: not uniformly shorter durations
      • Have fewer prosodic breaks
      • Reduce syllables
      • Make consonants shorter
      • Make vowels a little shorter
  • Speaking slow: not uniformly longer durations
      • Add more prosodic breaks
      • Small increases in vowel duration (?)
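The non-uniform scaling can be sketched as separate factors for vowels and consonants. The specific factors below are assumptions chosen only to follow the direction of the bullets (consonants shrink fully when speeding up, vowels shrink less; only vowels stretch slightly when slowing down). Changing the number of prosodic breaks and reducing syllables are not modelled.

```python
def adjust_durations(phones, rate):
    """Scale phone durations for a speaking rate.

    phones is a list of (name, duration_ms, is_vowel) tuples.
    rate > 1.0 means faster speech, rate < 1.0 means slower speech.
    """
    adjusted = []
    for name, duration, is_vowel in phones:
        if rate > 1.0:
            # Hypothetical: vowels compress only half as much as consonants.
            factor = 1.0 / (1.0 + 0.5 * (rate - 1.0)) if is_vowel else 1.0 / rate
        elif rate < 1.0:
            # Hypothetical: small vowel lengthening, consonants unchanged.
            factor = 1.0 + 0.25 * (1.0 / rate - 1.0) if is_vowel else 1.0
        else:
            factor = 1.0
        adjusted.append((name, duration * factor, is_vowel))
    return adjusted

word = [("k", 80.0, False), ("ae", 100.0, True), ("t", 80.0, False)]
print(adjust_durations(word, 2.0))   # fast
print(adjust_durations(word, 0.5))   # slow
```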

SLIDE 23

Summary

  • Prosody
      • Phrasing
      • Intonation
          • Accents + F0 generation
      • Duration
      • Power

SLIDE 24