Hidden Markov Model (HMM) based Speech Synthesis using HTS Toolkit



slide-1
SLIDE 1

Hidden Markov Model (HMM) based Speech Synthesis using HTS Toolkit.

Presenter: Omer Nawaz, Research Officer (III)

slide-2
SLIDE 2

Speech Synthesis Overview:

Text to be Synthesized -> Natural Language Processing (NLP) -> Speech Synthesis Engine -> Synthesized Speech

slide-3
SLIDE 3

Introduction:

Rule-based, formant synthesis

  • Hand-crafting each phonetic unit by rules.

Corpus-based: Concatenative synthesis

  • High quality speech can be synthesized using waveform concatenation algorithms.
  • To obtain various voices, a large amount of speech data is necessary.

Statistical parametric synthesis

  • Generate speech parameters from statistical models.
  • Voice quality can easily be changed by transforming HMM parameters.

slide-4
SLIDE 4

Approaches at CLE:

Corpus-based:

  • Unit Selection
  • HMM based

Comparison of two Approaches:

Unit Selection

Advantages:

  • High quality at waveform level (specific domain)

Disadvantages:

  • Large footprints
  • Discontinuous
  • Unstable quality

HMM based

Advantages:

  • Small footprint
  • Smooth
  • Stable quality

Disadvantages:

  • Vocoder sound (domain-independent)

slide-5
SLIDE 5

Synthesis Model:

Source Filter Model:

Excitation e(n) (pulse train or white noise) -> Linear time-invariant system h(n) -> Speech x(n)

x(n) = h(n) * e(n), where * denotes convolution

  • Source excitation part: e(n)
  • Vocal tract resonance part: h(n)

h(n) is defined by the state output vector of the HMM, e.g. mel-cepstrum.
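The source-filter relation x(n) = h(n) * e(n) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the impulse response `h` below is a toy decaying exponential, not a filter derived from mel-cepstral coefficients, and the function names `synthesize`, `pulse_train` and `white_noise` are hypothetical helpers rather than part of HTS.

```python
import numpy as np


def pulse_train(n_samples, period):
    """Voiced excitation e(n): a unit pulse every `period` samples."""
    e = np.zeros(n_samples)
    e[::period] = 1.0
    return e


def white_noise(n_samples, seed=0):
    """Unvoiced excitation e(n): Gaussian white noise."""
    return np.random.default_rng(seed).standard_normal(n_samples)


def synthesize(excitation, impulse_response):
    """Speech x(n) = h(n) * e(n), truncated to the excitation length."""
    return np.convolve(excitation, impulse_response)[: len(excitation)]


# Assumption: a toy impulse response standing in for the vocal tract filter.
h = 0.9 ** np.arange(32)

voiced = synthesize(pulse_train(160, period=80), h)
unvoiced = synthesize(white_noise(160), h)
```

In an actual HMM-based system the filter would change from frame to frame according to the generated spectral parameters; the fixed `h` here only shows the convolution step.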

slide-6
SLIDE 6

General Overview (HTS):

Training Part:

Speech Input -> Extract Spectrum, F0, labels -> Train Acoustic Models -> Stored Models

Synthesis Part:

Text Input -> Labels -> Parameter Generation (using Stored Models) -> Synthesis Filter -> Synthesized Speech

slide-7
SLIDE 7

Challenges:

  • Generation of the full-context style labels.
  • Addition of Stress/Syllable Layer.
  • Defining the Question Set.
  • Optimizing the Synthesized Quality.

slide-8
SLIDE 8

Full-Context Label Style:

Phoneme sequence:

P A K I S T_D A N

Tri-phone context dependent model:

P-A-K
T_D-A-N

Full-context style context dependent model:

x^P-A+K=I@x_x/A …
S^T_D-A+N=x@x_x/A …
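The segmental part of the full-context labels in this slide (two preceding phonemes, the current phoneme, two succeeding phonemes) can be built from a phoneme sequence as sketched below. The function name `quinphone_labels` is hypothetical, and the use of `x` as padding for missing neighbours is inferred from the slide's examples; the remaining fields after `@` are not generated here.

```python
def quinphone_labels(phonemes, pad="x"):
    """Return 'pp^p-c+n=nn' strings, one per phoneme in the sequence."""
    padded = [pad, pad] + list(phonemes) + [pad, pad]
    labels = []
    for i in range(2, len(padded) - 2):
        prev2, prev, cur, nxt, nxt2 = padded[i - 2 : i + 3]
        labels.append(f"{prev2}^{prev}-{cur}+{nxt}={nxt2}")
    return labels


labels = quinphone_labels("P A K I S T_D A N".split())
```

With this sequence, the entry for the first A is `x^P-A+K=I` and the entry for the second A is `S^T_D-A+N=x`, which agrees with the two examples shown on the slide.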

slide-9
SLIDE 9

Full-Context Format:

x^x-SIL+A=L@1_0/A:0_0_0/B:0-0-0@1-0&1-1#1-1$1-1!0-0;0- …
x^SIL-A+L=I_I@1_1/A:0_0_0/B:0-0-1@1-2&1-9#1-3$1-1!0-2;0- …
SIL^A-L+I_I=A@1_2/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0-0 …
A^L-I_I+A=P@2_1/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0- …

۔۔۔ ا

slide-10
SLIDE 10

Full-Context Format:

SIL^A-L+I_I=A@1_2/A:0_0_1/B:0-0-2@2-1&2-8#1-3$1-1!0-1;0-0|I_I/C:1+0+2/D:0_0/E:content+2@1+5&1+4#0+1/F:content_2/G:0_0/H:9=5^1=2|NONE/I:8=6/J:17+11-2

Segmental Context:

  • Current Phoneme
  • Previous two Phonemes
  • Next two Phonemes

Supra-Segmental Context:

  • Syllable
  • Stress
  • Word
  • Phrase
  • POS
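To make the segmental context concrete, the sketch below pulls the five phoneme fields out of the start of a full-context label. It assumes the delimiter order `^`, `-`, `+`, `=`, `@` seen in the example label, and that phoneme names may contain underscores (as in `I_I`) but none of those delimiters. The names `QUINPHONE` and `parse_segmental` are hypothetical.

```python
import re

# Assumed layout of the segmental prefix: prev2^prev-cur+next=next2@...
QUINPHONE = re.compile(
    r"^(?P<prev2>[^\^]+)\^(?P<prev>[^-]+)-(?P<cur>[^+]+)"
    r"\+(?P<next>[^=]+)=(?P<next2>[^@]+)@"
)


def parse_segmental(label):
    """Return the two previous, current, and two next phonemes of a label."""
    match = QUINPHONE.match(label)
    if match is None:
        raise ValueError(f"not a full-context label: {label!r}")
    return match.groupdict()


context = parse_segmental("SIL^A-L+I_I=A@1_2/A:0_0_1/B:0-0-2@2-1")
```

For the label above, the current phoneme is `L`, the previous two are `SIL` and `A`, and the next two are `I_I` and `A`. The supra-segmental fields (`/A:` onward) are left unparsed in this sketch.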

slide-11
SLIDE 11

Steps to Generate Full-Context Labels:

  • TextGrid File
  • Extract Segmental & Word Layer
  • Apply Stress & Syllabification Rules
  • Align Syllable Boundaries with Segmental Layer
  • Generate new TextGrid File with Additional Layers
  • Convert to Full-Context format

slide-12
SLIDE 12

TextGrid Format:


slide-13
SLIDE 13

Steps to Generate Full-Context Labels:

  • TextGrid File
  • Extract Segmental & Word Layer
  • Apply Stress & Syllabification Rules
  • Align Syllable Boundaries with Segmental Layer
  • Generate new TextGrid File with Additional Layers
  • Convert to Full-Context format

slide-14
SLIDE 14

TextGrid Format with Additional Layers:

slide-15
SLIDE 15

Context Clustering (Question Set) 1/2:

  • The number of possible combinations is enormous with these 53 different contexts.
  • With only the segmental context, the possible models are: 66^5 ≈ 1252 million.
  • If we consider all the contexts, the number will be practically infinite.

Solution:

  • Record data having maximum phoneme coverage at tri-phone or di-phone level.
  • Apply context clustering technique to classify and share acoustically similar models.
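The model count on this slide can be checked with a short calculation. It assumes the figure is 66 raised to the power 5, that is, an inventory of 66 phoneme symbols in each of the five segmental positions (two previous, current, two next); the slide itself only gives the result of about 1252 million.

```python
# Assumption: 66 phoneme symbols, 5 positions in the segmental context.
n_phonemes = 66
context_width = 5

n_models = n_phonemes ** context_width
print(f"{n_models:,} possible models")  # 1,252,332,576 possible models
```

No realistic corpus can contain every one of these combinations, which is why the slide turns to clustering and parameter sharing.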

slide-16
SLIDE 16

Context Clustering (Question Set) 2/2:

Phoneme

  • {preceding, current, succeeding} phonemes

Stress/Syllable/Word

  • # of phonemes at {preceding, current, succeeding} syllable
  • Stress of {preceding, current, succeeding} syllable
  • Position of current syllable in current word
  • # of syllables {from previous, to next} stressed syllable
  • Vowel within current syllable
  • # of syllables in {preceding, current, succeeding} word
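Each entry in a question set acts as a yes/no test on a context label, and clustering uses such tests to split the pool of models into groups that can share parameters. The sketch below shows one such split using wildcard patterns, similar in spirit to the patterns found in question files. The vowel inventory, the question itself, and the names `VOWEL_PATTERNS`, `answers_yes` and `split_by_question` are all hypothetical, and no acoustic similarity measure is computed here.

```python
from fnmatch import fnmatchcase

# Hypothetical question: "is the current phoneme a vowel?"
# Each pattern matches labels whose current phoneme sits between '-' and '+'.
VOWEL_PATTERNS = ["*-A+*", "*-I+*", "*-I_I+*"]


def answers_yes(label, patterns):
    """True if the label matches any wildcard pattern of the question."""
    return any(fnmatchcase(label, pattern) for pattern in patterns)


def split_by_question(labels, patterns):
    """Partition labels into (yes, no) groups for one question."""
    yes = [label for label in labels if answers_yes(label, patterns)]
    no = [label for label in labels if not answers_yes(label, patterns)]
    return yes, no


labels = ["x^P-A+K=I@x_x", "P^A-K+I=S@x_x", "S^T_D-A+N=x@x_x"]
yes, no = split_by_question(labels, VOWEL_PATTERNS)
```

A full decision-tree clustering would evaluate many such questions at each node and keep the split that best separates the acoustic statistics; this sketch covers only the partitioning step.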

slide-17
SLIDE 17

Some Synthesized Examples:

  • Training Set:
  • Seen Context:
  • Un-seen Context:
  • Different Carrier Word:

slide-18
SLIDE 18

Questions?