Residual-based Excitation with Continuous F0 Modeling in HMM-based Speech Synthesis
Tamás Gábor Csapó (1), Géza Németh (1), Milos Cernak (2)
csapot@tmit.bme.hu
(1) Budapest University of Technology and Economics
(2) Idiap Research Institute
SLSP 2015, Budapest, Nov 24, 2015

Outline
1. HMM-based speech synthesis
  - Excitation models
  - Effect of creaky voice
2. Proposed residual-based excitation model
  - Analysis
  - Training
  - Synthesis
3. Evaluation
  - Listening test
4. Summary and conclusions
Tamás Gábor Csapó, Géza Németh, Milos Cernak: Residual-based Excitation with Continuous F0 in HMM-TTS

1. HMM-based speech synthesis

HMM-based speech synthesis
- State-of-the-art Text-To-Speech (TTS) synthesis technique [Zen et al., 2009]
- Statistical
  - Generative models with maximum likelihood criterion
  - Hidden Markov models (HMM)
- Parametric
  - Excitation and spectral modeling
  - Speech signal is encoded to parameters
  - Parameters suitable for statistical modeling
  - Parameters are decoded to speech

Excitation models in HMM-TTS
- Goal: model human speech production
- Source-filter separation [Fant, 1960]
- Excitation model types [Hu et al., 2013]
  - Impulse-noise
  - Mixed excitation
  - Glottal source
  - Harmonic plus noise
  - Sinusoidal
  - Residual-based

Effect of creaky voice
- Creaky voice
  - Irregular vibration of vocal folds
  - Abrupt changes in F0 (fundamental frequency, pitch) and/or amplitudes
  - Perceived as rough voice
  - Up to 15% of vowels of natural speech
- Effect of creaky voice on HMM-TTS
  - Can cause problems for standard speech analysis methods (e.g. F0 tracking and spectral analysis)
  - Voiced / unvoiced error is learned during training
  - Audible distortions in synthesized sentences

Creaky voice sample
[Figure: a) regions of creaky voice (amplitude vs. time in s); b) standard F0 tracking (frequency in Hz vs. time in s). Sample sentence: 'Eggshell is not good to eat.']

2. Proposed residual-based excitation model

Block diagram of analysis
[Figure: block diagram of the analysis step]

Analysis: PCA-based residual
- Inverse filtered residual
- Pitch synchronous framing
- Earlier excitation models:
  - Store frames in a codebook
  - Select frames from codebook during synthesis
- Proposed model:
  - Window and resample frames to fixed length
  - Apply Principal Component Analysis (PCA)
  - Use first PCA component later
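The "window, resample, PCA" steps of the proposed model can be illustrated with a minimal sketch. It is not the authors' implementation: the Hann window, the linear interpolation, the power-iteration PCA, and all function names are assumptions chosen to keep the example dependency-free.

```python
import math
import random

def hann_window(frame):
    """Apply a Hann window to one pitch-synchronous residual frame."""
    n = len(frame)
    if n < 2:
        return list(frame)
    return [x * 0.5 * (1.0 - math.cos(2.0 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]

def resample_linear(frame, length):
    """Linearly interpolate a frame to `length` samples (length >= 2)."""
    if len(frame) == 1:
        return [float(frame[0])] * length
    out = []
    for i in range(length):
        pos = i * (len(frame) - 1) / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(frame) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * frame[lo] + frac * frame[hi])
    return out

def first_principal_component(frames, iterations=100, seed=0):
    """First principal component of equal-length frames via power iteration.

    Assumption: power iteration stands in for a full PCA, since only the
    first component is needed. The sign of the result is arbitrary.
    """
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(dim)]
    centered = [[f[j] - mean[j] for j in range(dim)] for f in frames]
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        # w = X^T (X v), which is proportional to (covariance matrix) * v
        proj = [sum(c[j] * v[j] for j in range(dim)) for c in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all frames identical: no variance to explain
        v = [x / norm for x in w]
    return v

def pca_residual(frames, length=64):
    """Window and resample variable-length frames, then take the first component."""
    fixed = [resample_linear(hann_window(f), length) for f in frames]
    return first_principal_component(fixed)
```

Frames of different pitch periods become comparable only after resampling to a common length, which is why that step precedes the PCA in this sketch.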

Analysis: PCA-based residual
[Figure: a) PCA residual for EN-M-AWB; b) PCA residual for EN-F-SLT. Axes: normalized amplitude vs. time (samples).]

Analysis: continuous F0 modeling
- Traditional F0 trackers
  - F0 is discontinuous, jumps occur at voiced-unvoiced transitions
  - HMMs can model continuous functions efficiently
  - Multi-Space Distribution (MSD) necessary for traditional F0 [Tokuda et al., 2002]
- Simple continuous pitch tracker 'F0cont' [Garner et al., 2013]
  - Standard autocorrelation
  - No voiced/unvoiced decision
  - Kalman smoothing-based interpolation
  - Interpolates F0 in regions of creaky voice
  - No need for MSD during training
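The idea behind a continuous tracker (autocorrelation without a voicing decision, followed by Kalman-style smoothing) can be sketched as follows. This is an illustration under stated assumptions, not the 'F0cont' algorithm of [Garner et al., 2013]: the search range, the random-walk state model, the mapping from autocorrelation peak to measurement variance, and the function names are all hypothetical, and only a forward Kalman filter is shown rather than a full smoother.

```python
import math

def autocorr_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Return (f0_hz, peak) from the normalized autocorrelation peak.

    No voiced/unvoiced decision is made: every frame yields an F0 value,
    and `peak` (0..1) indicates how reliable that value is.
    """
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0, 0.0
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(frame) - 1)
    best_lag, best_val = lag_min, -1.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag)) / energy
        if r > best_val:
            best_lag, best_val = lag, r
    return fs / best_lag, max(best_val, 0.0)

def kalman_track(f0s, peaks, process_var=25.0, base_meas_var=4.0):
    """Scalar random-walk Kalman filter over per-frame F0 measurements.

    Assumption: frames with a weak autocorrelation peak get a large
    measurement variance, so the track coasts through unreliable regions
    (e.g. unvoiced or creaky frames) instead of jumping.
    """
    est, var = f0s[0], base_meas_var
    out = []
    for z, peak in zip(f0s, peaks):
        var += process_var                          # predict step
        meas_var = base_meas_var / max(peak, 1e-3) ** 2
        gain = var / (var + meas_var)               # update step
        est += gain * (z - est)
        var *= 1.0 - gain
        out.append(est)
    return out
```

Because every frame receives an F0 value, the resulting track has no gaps at voiced-unvoiced transitions, which is what removes the need for MSD during training.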

Analysis: Maximum Voiced Frequency
- Divide spectrum into two frequency bands
  - Lower frequency band: voiced
  - Higher frequency band: unvoiced
- Earlier excitation models:
  - Boundary between frequency bands fixed (at 6 kHz)
- Proposed excitation model:
  - Boundary between frequency bands varying
  - Maximum Voiced Frequency (MVF) [Drugman and Stylianou, 2014]
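The two-band split can be pictured as a pair of complementary masks over the spectrum, with the boundary placed at the MVF of the current frame. The sketch below only illustrates that split. It does not estimate the MVF (the slides cite [Drugman and Stylianou, 2014] for that), and the hard 0/1 boundary, the bin layout, and the function name are assumptions.

```python
def band_masks(mvf_hz, fs, n_bins):
    """Return (voiced_mask, unvoiced_mask) over n_bins bins spanning 0..fs/2.

    Bins at or below the MVF belong to the voiced band, the rest to the
    unvoiced band. Assumption: a hard boundary with no transition region.
    """
    nyquist = fs / 2.0
    voiced, unvoiced = [], []
    for k in range(n_bins):
        freq = k * nyquist / (n_bins - 1)
        is_voiced = freq <= mvf_hz
        voiced.append(1.0 if is_voiced else 0.0)
        unvoiced.append(0.0 if is_voiced else 1.0)
    return voiced, unvoiced
```

With a fixed boundary the masks would be identical for every frame. A varying MVF lets them change frame by frame.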

Training with proposed model
- Parameters calculated for each 25 ms frame
  - MGC: Mel-Generalized Cepstrum
  - F0cont: continuous pitch track
  - MVF: Maximum Voiced Frequency
- Decision tree-based context clustering and context-dependent labeling [Zen et al., 2007]
- Independent decision trees for all the parameters and duration using a maximum likelihood criterion

Block diagram of synthesis
[Figure: block diagram of the synthesis step]

Synthesis features
- PCA residual overlap-added according to F0cont
- Voiced and unvoiced excitation component added together according to MVF
- MVF models voicing
  - for unvoiced sounds, the MVF is low (around 1 kHz)
  - for voiced sounds, the MVF is high (above 4 kHz)
  - for mixed excitation sounds, the MVF is in between (e.g. for voiced fricatives, MVF is around 2-3 kHz)
- Spectral filtering according to MGC
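The first synthesis step, overlap-adding the PCA residual at a spacing given by F0cont, can be sketched as below. This is a simplified illustration, not the authors' synthesizer: the pulse is used at a fixed length (no per-period resampling), one F0 value is assumed per frame, and the function and parameter names are hypothetical. The unvoiced component, the MVF-based mixing, and the MGC filtering are left out.

```python
def overlap_add_excitation(pulse, f0_track, fs, frame_shift):
    """Overlap-add `pulse` at pitch marks spaced by the local pitch period.

    pulse:       samples of the residual pulse (stand-in for the PCA residual)
    f0_track:    one strictly positive F0 value (Hz) per frame; a continuous
                 track has no zero-valued unvoiced frames
    fs:          sampling rate in Hz
    frame_shift: frame shift in samples
    Returns (excitation, pitch_marks).
    """
    if any(f0 <= 0.0 for f0 in f0_track):
        raise ValueError("continuous F0 track must be strictly positive")
    n_samples = len(f0_track) * frame_shift
    out = [0.0] * n_samples
    marks = []
    pos = 0.0
    while pos < n_samples:
        start = int(pos)
        marks.append(start)
        for k, value in enumerate(pulse):
            if start + k < n_samples:
                out[start + k] += value   # overlapping pulses simply sum
        f0 = f0_track[min(start // frame_shift, len(f0_track) - 1)]
        pos += fs / f0                    # advance by one pitch period
    return out, marks
```

For example, a 100 Hz track at a 1 kHz sampling rate places a pulse every 10 samples. A rising F0 track moves the pitch marks closer together.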

3. Evaluation

Data
- Two English speakers from CMU-ARCTIC database [Kominek and Black, 2003]
  - EN-M-AWB (Scottish English, male)
  - EN-F-SLT (American English, female)
  - Both produced irregular phonation frequently, mostly at the end of sentences
- 16 kHz sampling
- 1132 sentences from each speaker, single speaker training
- Text processing using the Festival TTS front-end (e.g. phonetic transcription, labeling, etc.)
