SLIDE 1

KDE-HMMs

New, Nonparametric Acoustic Models for Speech Synthesis
Gustav Eje Henter
Joint work with W. Bastiaan Kleijn and Arne Leijon at KTH
CSTR internal presentation, Monday 20 January 2014

Gustav Eje Henter (CSTR) KDE-HMMs for Speech Synthesis 2014-01-20 1

SLIDE 2

Take-Home Message

Current acoustic models in parametric speech synthesis are not a good fit. We present a new acoustic model for speech that:

1. Converges asymptotically on the true data-generating process
2. Can be interpreted as probabilistic hybrid speech synthesis
3. Models nonlinear time series better

The advantages come thanks to nonparametric speech synthesis

SLIDE 3

Outline

1. Introduction
2. Kernel density estimation
3. KDE Markov models
  • Experiments
4. KDE-HMMs
  • Parameter estimation
  • Experiments
5. Summary and outlook

SLIDE 5

Standard Sequence Models

Markovian paradigm

  • Finite-length memory
  • Examples:
  • Discrete Markov chain p_{X_t | X_{t−1}}(x_t | x_{t−1})
  • Linear autoregressive (AR) models:

X_t = μ + ∑_{l=1}^{p} α_l (X_{t−l} − μ) + E_t

[Graphical model: observation chain X_{t−1} → X_t → X_{t+1}]
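The linear AR model above is straightforward to simulate. Here is a minimal sketch (the function name and example parameters are illustrative, not from the talk):

```python
import numpy as np

def sample_ar(mu, alphas, sigma, T, rng=None):
    """Draw T samples from the linear AR(p) process
    X_t = mu + sum_{l=1}^p alphas[l-1] * (X_{t-l} - mu) + E_t,
    with IID Gaussian innovations E_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(0) if rng is None else rng
    alphas = np.asarray(alphas, float)
    p = len(alphas)
    x = np.full(T + p, mu, dtype=float)   # initialise the p lags at the process mean
    for t in range(p, T + p):
        lags = x[t - p:t][::-1] - mu      # x_{t-1}, ..., x_{t-p}
        x[t] = mu + alphas @ lags + sigma * rng.normal()
    return x[p:]

x = sample_ar(mu=0.0, alphas=[0.9], sigma=1.0, T=500)
```

For AR(1), |α₁| < 1 gives a stationary process; larger coefficients give slower mean reversion.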

SLIDE 6

Standard Sequence Models

Hidden-state paradigm

  • Unbounded memory
  • Admits a control signal
  • Examples:
  • Hidden Markov model (discrete state Q_t)
  • Kalman filter (continuous state)

[Graphical model: observations X_{t−1}, X_t, X_{t+1} emitted from hidden states Q_{t−1}, Q_t, Q_{t+1}]

SLIDE 7

Standard HMM Acoustic Model

Standard models for parametric speech synthesis are HMMs or HSMMs

  • States Q_t represent (sub)phone, context, and prosodic information
  • Observables X_t ∈ ℝ^D are vocoder parameters
  • State-conditional output distributions f_{X_t | Q_t}(x_t | q_t) are Gaussian
  • Dynamic features (∆s and ∆∆s) tie adjacent observations together
  • Autoregressive HMMs (AR-HMMs) are less mathematically objectionable

[Graphical model: observations X_{t−1}, X_t, X_{t+1} with hidden states Q_{t−1}, Q_t, Q_{t+1}]

SLIDE 8

Problems

Even using ground-truth durations, generated features are poor

  • Sampled output is warbly (Shannon, Zen, & Byrne, 2011)
  • Most probable output sequence (ML parameter generation, MLPG) sounds muffled and buzzy

Note: Unit selection does not have these problems

SLIDE 9

Problem Analysis

What is wrong with our parametric models?

  • The model is inadequate
  • State-conditional outputs are overly simplistic: essentially just linear AR processes
  • Results on full-covariance models from Shannon, Zen, & Byrne (2011) suggest that trajectory time dependence is not well modelled
  • Nonlinear AR models are a closer match
  • Product-of-experts models increase held-out data likelihood substantially, but not synthesis quality (Shannon, 2012)

SLIDE 10

New Idea

What to do?

  • No one knows what the “true” distribution f of speech is
  • It is not obvious how to improve current models
  • This calls for a generally applicable technique!
  • Proposal: Kernel Conditional Density Estimation + Markov processes
  • Can describe any Markov model
  • Then add hidden state to control process output

SLIDE 12

Kernel Density Estimation

Kernel Density Estimation (KDE) is a nonparametric density estimation technique

  • Training data D = {y_1, …, y_N} in ℝ^D sampled from a reference density f_X
  • Test points {x_1, …, x_T}
  • KDE can be seen as a smoothing or blurring (convolution) of the empirical density function

ḟ_X(x | D) = (1/N) ∑_{n=1}^{N} δ(x − y_n)

with a nonnegative kernel function k(r)

  • Intuition: KDE is squinting while looking at the data points

SLIDE 13

Kernel Density Estimation

  • The estimated PDF can be written

f̂_X(x | D, h) = (1/N) ∑_{n=1}^{N} (1/h^D) k((x − y_n)/h)

where h is a bandwidth parameter controlling the degree of smoothing

  • We require ∫ k(r) dr = 1 and ∫ r k(r) dr = 0
  • Probabilistic interpretation:
  • Mixture distribution with k(r)-shaped zero-mean components
  • One component centred on each training-data point
  • We use Gaussian kernels throughout
  • Bandwidth h matters more than kernel shape k(r)
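In code, the estimator is only a few lines. A minimal 1D sketch with a Gaussian kernel (the function name `kde1d` is my own, not from the talk):

```python
import numpy as np

def kde1d(x, data, h):
    """1D Gaussian KDE: f_hat(x) = (1/N) * sum_n (1/h) * k((x - y_n)/h),
    i.e. the average of one N(y_n, h^2) bump per training point."""
    x = np.atleast_1d(np.asarray(x, float))[:, None]   # (M, 1) query points
    data = np.asarray(data, float)[None, :]            # (1, N) training points
    z = (x - data) / h                                 # standardised distances
    return np.exp(-0.5 * z**2).sum(axis=1) / (data.size * h * np.sqrt(2.0 * np.pi))

# density estimate from 500 standard-normal samples
rng = np.random.default_rng(0)
samples = rng.normal(size=500)
xs = np.linspace(-5.0, 5.0, 1001)
f_hat = kde1d(xs, samples, h=0.3)
```

The same estimator in D dimensions replaces 1/h by 1/h^D and the scalar distance by a vector norm.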

SLIDE 14

Example Data

Running example: Santa Fe chaotic FIR laser series (1D, N = 1000 plotted). [Plot: laser intensity x_t vs. time index t]

SLIDE 15

Example Data

Running example: Santa Fe chaotic FIR laser series (detail). [Plot: laser intensity x_t vs. time index t]

SLIDE 16

Example Data

Scatter plot of consecutive values {(x_t, x_{t+1})}_t reveals attractor structure. [Plot: subsequent value x_{t+1} vs. current value x_t]

SLIDE 17

Example KDE

Gaussian blur of the points = 2D KDE (bandwidth h optimised for log-probability). [Plot: subsequent value x_{t+1} vs. current value x_t]

SLIDE 18

Example KDE

Scatter plot superimposed on the 2D KDE fit. [Plot: subsequent value x_{t+1} vs. current value x_t]

SLIDE 19

KDE Properties

Strengths:

  • Asymptotically consistent: lim_{N→∞} f̂_X = f_X under appropriate bandwidth selection (h → 0, Nh → ∞), regardless of f_X
  • Built from data points (nonparametric)
  • Single free parameter

Weaknesses:

  • Data demanding
  • Computationally demanding
  • Substantial speedups are possible (e.g., Holmes, Gray, & Isbell, 2007)

SLIDE 21

Handling Time Dependence

So far we have said nothing about time dependence

  • Key idea: a joint KDE PDF f_{X^t_{t−p}}(x^t_{t−p}) for sequence segments

x^t_{t−p} = (x^⊺_{t−p}, …, x^⊺_{t−1}, x^⊺_t)^⊺

induces a conditional distribution f_{X_t | X^{t−1}_{t−p}}(x_t | x^{t−1}_{t−p}) (Hyndman, Bashtannyk, & Grunwald, 1996)

  • These next-step distributions are sufficient to define a p-order Markov process
  • KDE Markov model (KDE-MM)
  • Nonlinear and nonparametric
  • Many independent proposals, e.g., Rajarshi (1990)
SLIDE 22

Graphical Illustration

A conditional distribution is a cut through the KDE. [Plot: subsequent value x_t vs. given value x_{t−1}]

SLIDE 23

Graphical Illustration

Resulting normalised next-step PDF f_{X_t | X_{t−1}}(x | x_{t−1} = 100). [Plot: conditional PDF f_{X_t | X_{t−1}}(x_t | 100) vs. subsequent value x_t]

SLIDE 24

KCDE Definition

Kernel Conditional Density Estimation (KCDE) is a normalisation of the KDE, with resulting PDF

f̂_{X_t | X^{t−1}_{t−p}}(x_t | x^{t−1}_{t−p}, D) = (1/h^D) · (∑_n ∏_{l=0}^{p} k((x_{t−l} − y_{n−l})/h)) / (∑_n ∏_{l=1}^{p} k((x_{t−l} − y_{n−l})/h)),

assuming the kernel factorises as k(r) = ∏_{l=0}^{p} k(r_l)
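A direct, unoptimised sketch of this estimator for a scalar series, with a Gaussian kernel (the helper names `gauss` and `kcde` are mine; cost is O(N) per evaluation):

```python
import numpy as np

def gauss(r, h):
    """Gaussian kernel (1/h) k(r/h), with k the standard normal PDF."""
    return np.exp(-0.5 * (r / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

def kcde(x_t, context, y, p, h):
    """Estimate f(x_t | x_{t-1}, ..., x_{t-p}) from a scalar training series y
    as the ratio of two product-kernel sums over the data.
    `context` holds the p most recent values (x_{t-p}, ..., x_{t-1})."""
    y = np.asarray(y, float)
    n = np.arange(p, len(y))                   # anchors n with all p lags available
    ctx = np.ones(len(n))
    for l in range(1, p + 1):                  # context match: prod_{l=1}^p k(...)
        ctx = ctx * gauss(context[-l] - y[n - l], h)
    num = (ctx * gauss(x_t - y[n], h)).sum()   # adds the l = 0 (output) factor
    return num / ctx.sum()

y = np.sin(0.3 * np.arange(300))               # toy deterministic series
f_val = kcde(x_t=y[100], context=y[99:100], y=y, p=1, h=0.1)
```

Because the extra output kernel in the numerator carries the remaining 1/h factor, the ratio is a properly normalised density in x_t.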

SLIDE 25

KDE-MM Remarks

  • KDE-MM converges on the true process as N → ∞
  • Subject to some technical criteria
  • Ergodicity, stationarity, appropriate bandwidth selection
  • Maximum likelihood estimation for h is inappropriate
  • Training set likelihood is degenerate as h → 0
  • One component centered on each data point

SLIDE 26

Degeneracy Illustrated

As h → 0, the kernels become spikes at the points in D; no generalisation. [Plot: subsequent value x_{t+1} vs. current value x_t]

SLIDE 27

Degeneracy Circumvented

Maximising the pseudo-likelihood (a kind of cross-validation)

f̃_X(y^T_1 | D, h) = ∏_n (1/h^D) · (∑_{n′≠n} ∏_{l=0}^{p} k((y_{n−l} − y_{n′−l})/h)) / (∑_{n′≠n} ∏_{l=1}^{p} k((y_{n−l} − y_{n′−l})/h))

prevents data points from “explaining themselves”
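A minimal leave-one-out sketch of this criterion for a scalar series (the toy sine series, the bandwidth grid, and the function names are illustrative, not from the talk):

```python
import numpy as np

def gauss(r, h):
    """Gaussian kernel (1/h) k(r/h)."""
    return np.exp(-0.5 * (r / h) ** 2) / (h * np.sqrt(2.0 * np.pi))

def log_pseudo_likelihood(y, p, h):
    """Leave-one-out log pseudo-likelihood of a scalar series y under a
    p-order KDE Markov model with bandwidth h: each point's conditional
    density is estimated from all *other* points, so h -> 0 is penalised."""
    y = np.asarray(y, float)
    idx = np.arange(p, len(y))
    total = 0.0
    for n in idx:
        others = idx[idx != n]            # exclude the point itself (n' != n)
        ctx = np.ones(len(others))
        for l in range(1, p + 1):
            ctx = ctx * gauss(y[n - l] - y[others - l], h)
        num = (ctx * gauss(y[n] - y[others], h)).sum()
        total += np.log(num / ctx.sum())
    return total

# pick h by grid search on a toy noisy sine series
y = np.sin(0.3 * np.arange(200)) + 0.1 * np.random.default_rng(0).normal(size=200)
hs = [0.02, 0.05, 0.1, 0.2, 0.5]
best_h = max(hs, key=lambda h: log_pseudo_likelihood(y, 1, h))
```

Without the n′ ≠ n exclusion the criterion would diverge as h → 0, exactly as the previous slide illustrates.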

SLIDE 28

A Mixture Model

Rewrite the KDE-MM PDF as

f_{X_t | X^{t−1}_{t−p}}(x_t | x^{t−1}_{t−p}) = ∑_n (∏_{l=1}^{p} k((x_{t−l} − y_{n−l})/h)) / (∑_{n′} ∏_{l=1}^{p} k((x_{t−l} − y_{n′−l})/h)) · (1/h^D) k((x_t − y_n)/h)
  = ∑_n w_n(x^{t−1}_{t−p}) · (1/h^D) k((x_t − y_n)/h)

  • This is a mixture distribution with context-dependent weights
SLIDE 29

KDE-MM Output

KDE-MM data generation algorithm:

1. Given x^{t−1}_{t−p}, select a mixture component z_t ≤ N according to

p_{Z_t | X^{t−1}_{t−p}}(z_t | x^{t−1}_{t−p}) = w_{z_t}(x^{t−1}_{t−p}) = (∏_{l=1}^{p} k((x_{t−l} − y_{z_t−l})/h)) / (∑_n ∏_{l=1}^{p} k((x_{t−l} − y_{n−l})/h))

2. Set x_t = y_{z_t} + η_t, where η_t is kernel-shaped IID noise
3. Increment t and start over

[Graphical model: observations X_{t−1}, X_t, X_{t+1} with component indices Z_{t−1}, Z_t, Z_{t+1}]
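The three steps above can be sketched directly for a scalar series with a Gaussian kernel (the function name is mine; seeding the context from the training data is my own choice for the example):

```python
import numpy as np

def sample_kdemm(y, p, h, T, rng=None):
    """Generate T new samples from a p-order KDE Markov model fitted to the
    scalar series y: pick a training frame whose context matches the recent
    output, then emit it plus Gaussian kernel noise."""
    rng = np.random.default_rng(0) if rng is None else rng
    y = np.asarray(y, float)
    n = np.arange(p, len(y))
    start = rng.integers(p, len(y))          # seed the context from the data
    x = list(y[start - p:start])
    for _ in range(T):
        ctx = np.ones(len(n))
        for l in range(1, p + 1):            # unnormalised context kernels
            ctx = ctx * np.exp(-0.5 * ((x[-l] - y[n - l]) / h) ** 2)
        w = ctx / ctx.sum()                  # context-dependent mixture weights
        z = rng.choice(n, p=w)               # step 1: select component z_t
        x.append(y[z] + h * rng.normal())    # step 2: x_t = y_{z_t} + kernel noise
    return np.array(x[p:])                   # step 3 is the loop itself

y = np.sin(0.3 * np.arange(300))
out = sample_kdemm(y, p=1, h=0.05, T=100)
```

This makes the unit-selection analogy on the next slide concrete: generation concatenates well-matching data frames plus a little noise.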

SLIDE 30

Connection to Unit Selection

  • Data-driven output generation
  • Concatenate well-matching data frames (plus some noise)
  • Follow single trajectories in isolated regions
  • May switch to another trajectory where the context is ambiguous
  • The bandwidth h controls context sensitivity
  • Reminiscent of unit selection synthesis
  • h → 0 approaches unit selection, but fully probabilistic!
  • Also similar to the time-series bootstrap from statistics

SLIDE 32

Evaluation

p-order KDE-MMs vs. linear AR models on held-out laser data (N = 3000). [Plot: test-set per-sample log-probability vs. Markov model order p, for KDE-MM and AR model]

SLIDE 33

Reference Data

Excerpt from the original laser data series. [Plot: laser intensity x_t vs. time index t]

SLIDE 34

Sample Output

Sample from the best linear AR model (order p = 10). [Plot: output value x_t vs. time index t]

SLIDE 35

Sample Output

Sample from the best KDE-MM (p = 6). [Plot: output value x_t vs. time index t]

SLIDE 37

KDE in Synthesis

To use KDE/KCDE in synthesis, we need a hidden state to control the output

  • Novel proposal: KDE-HMM, a nonlinear autoregressive HMM
  • States follow a Markov chain p_{Q_t | Q_{t−1}}(q_t | q_{t−1})
  • State-conditional next-step distribution f_{X_t | Q_t, X^{t−1}_{t−p}}(x_t | q_t, x^{t−1}_{t−p}) switches between KDE-MMs

[Graphical model: observations X_{t−1}, X_t, X_{t+1} with component indices Z_{t−1}, Z_t, Z_{t+1} and hidden states Q_{t−1}, Q_t, Q_{t+1}]

SLIDE 38

KDE-HMM Details

  • Data points n are assigned to states using weights w_{qn}
  • w_{qn} ≥ 0, with ∑_{n=1}^{N} w_{qn} = 1 for normalisation
  • It is compelling to relax parts of the model
  • State- and lag-dependent bandwidths h_{ql}
  • Assuming a scalar series, the resulting PDF is

f_{X_t | Q_t, X^{t−1}_{t−p}}(x_t | q, x^{t−1}_{t−p}) = (∑_n κ_{qn}(x^{t−1}_{t−p} | h_q) · (1/h_{q0}) k((x_t − y_n)/h_{q0})) / (∑_n κ_{qn}(x^{t−1}_{t−p} | h_q))

κ_{qn}(x^{t−1}_{t−p} | h_q) = w_{qn} ∏_{l=1}^{p} k((x_{t−l} − y_{n−l})/h_{ql})
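Concretely, the state-conditional density for a scalar series can be evaluated as below. This is a sketch under the scalar assumption stated on the slide; the argument names are mine, with `h_q[0]` playing the role of the output bandwidth h_{q0} and `h_q[l]` the lag bandwidths h_{ql}:

```python
import numpy as np

def kdehmm_cond_density(x_t, context, y, w_q, h_q, p):
    """State-conditional next-step PDF of a scalar KDE-HMM: a mixture of
    Gaussian kernels centred on the training points y_n, with weights
    kappa_qn = w_q[n] * prod_{l=1}^p k((x_{t-l} - y_{n-l}) / h_q[l]),
    renormalised over n.  The kappa factors are left unnormalised because
    their constants cancel in the ratio."""
    y = np.asarray(y, float)
    n = np.arange(p, len(y))
    kappa = np.asarray(w_q, float)[n].copy()
    for l in range(1, p + 1):
        kappa *= np.exp(-0.5 * ((context[-l] - y[n - l]) / h_q[l]) ** 2)
    out = np.exp(-0.5 * ((x_t - y[n]) / h_q[0]) ** 2) / (h_q[0] * np.sqrt(2.0 * np.pi))
    return float((kappa * out).sum() / kappa.sum())

y = np.sin(0.3 * np.arange(300))
w_q = np.full(len(y), 1.0 / len(y))    # uniform point-to-state weights, for illustration
f_val = kdehmm_cond_density(y[100], y[99:100], y, w_q, h_q=[0.1, 0.1], p=1)
```

With uniform weights and equal bandwidths this reduces to the plain KDE-MM next-step density; the state q enters only through w_q and h_q.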

SLIDE 39

KDE-HMM Properties

Advantages:

  • Flexible short-range correlation modelling
  • Hidden state allows output control
  • Context-dependent bandwidths

Disadvantages:

  • Data requirements
  • Computational cost

SLIDE 40

Context-Dependent Bandwidths

A single bandwidth is too coarse in the centre, because of the sparse edges. [Plot: subsequent value x_{t+1} vs. current value x_t]

SLIDE 41

Context-Dependent Bandwidths

Data points coloured according to estimated instantaneous phase. [Plot: subsequent value x_{t+1} vs. current value x_t]

SLIDE 43

Parameter Estimation

Standard techniques apply to derive expectation-maximisation (EM) update equations for the bandwidths and weights

  • Auxiliary function

Q(θ′ | θ) = … + ∑_{q,t} ∑_{n≠t} γ_{qt} ϱ^num_{qnt} (ln(1/h′_{q0}) − (1/(2h′²_{q0})) (x_t − y_n)²)
  + ∑_{q,t} ∑_{n≠t} γ_{qt} ϱ^num_{qnt} (ln w′_{qn} − (1/2) ∑_{l=1}^{p} (1/h′²_{ql}) (x_{t−l} − y_{n−l})²)
  − ∑_{q,t} γ_{qt} ln(∑_{n≠t} w′_{qn} exp(−(1/2) ∑_{l=1}^{p} (1/h′²_{ql}) (x_{t−l} − y_{n−l})²))

  • The negative log-sum-exp term due to the conditioning is an issue

SLIDE 44

Handling Log-Sum-Exp

1. Extended Baum-Welch (EBW) heuristic from discriminative training
  • Guaranteed ascent for small step lengths (nonconstructive proof)
2. Minorise-maximisation
  • Optimise a locally tight lower bound Q̃(θ′ | θ) ≤ Q(θ′ | θ)
  • Such bounds can have the same form as the other terms in Q using reverse-Jensen inequalities (Jebara, 2002):

−ln(∑_{n≠t} w_{qn} exp(∑_{l=1}^{p} T_{nl}(x_{t−l}) (1/h′²_{ql}) − K(h′_q)))
  ≥ ∑_{n≠t} ω_{qtn} (∑_{l=1}^{p} U_{tnl}(x_{t−l}) (1/h′²_{ql}) − K(h′_q)) − k_{qt}

  • The modified sufficient statistics U_{tnl} and weights ω_{qtn} depend on the current parameter values h_q

SLIDE 45

Minorise-Maximisation Updates

One obtains a regularisation of the bandwidth update formula:

h²_{ql}(new) = (W_q h²_{ql} + ∑_{t} ∑_{n≠t} γ_{qt} (ϱ^num_{qnt} − ϱ^den_{qnt}) (x_{t−l} − y_{n−l})²) / (W_q + ∑_{t} ∑_{n≠t} γ_{qt} (ϱ^num_{qnt} − ϱ^den_{qnt}))

  • Dependence on the previous estimate h²_{ql} through the local bound
  • A similar formula holds for the updated weights w^{(new)}_{qn}
  • “Brake weights” W_q restrict the update step length
  • Large weights slow convergence

SLIDE 46

Releasing the Brakes

1. Best reverse-Jensen bounds
  • Guaranteed ascent, but impossibly conservative, e.g.,

W_q ≫ 10³ · ∑_{t} ∑_{n≠t} γ_{qt} (ϱ^num_{qnt} − ϱ^den_{qnt})

2. Less conservative weights are possible
  • Use approximations related to EBW heuristics (Afify, 2005)
  • Fix w_{qn}, only update the bandwidths
  • Reduced total weight, e.g.,

W_q ≈ 4 · ∑_{t} ∑_{n≠t} γ_{qt} (ϱ^num_{qnt} − ϱ^den_{qnt})

  • Always increased likelihood in experiments
SLIDE 48

Evaluation

Context-sensitive bandwidth improves on KDE-MMs (N = 3000). [Plot: test-set per-sample log-probability vs. Markov model order p, for KDE-MM, AR model, and KDE-HMM with M = 1]

SLIDE 49

Evaluation

KDE-HMMs yield greater model accuracy than linear AR-HMMs. [Plot: held-out per-sample log-probability vs. number of HMM states M, for Gaussian HMM, AR-HMMs (p = 1, 2, 3), and KDE-HMMs (p = 0, 1, 2, 3)]

SLIDE 50

Reference Data

Excerpt from the original laser data series. [Plot: laser intensity x_t vs. time index t]

SLIDE 51

Sample Output

Sample from the best linear AR-HMM (p = 3, M = 15 states). [Plot: output value x_t vs. time index t]

SLIDE 52

Sample Output

Sample from the best KDE-HMM (p = 3, M = 15). [Plot: output value x_t vs. time index t]

SLIDE 53

Second Dataset

KDE-HMMs are also superior to the other models on ECG data (N = 3000). [Plot: held-out per-sample log-probability vs. number of HMM states M, for Gaussian HMM, AR-HMMs (p = 1, 2, 3), and KDE-HMMs (p = 0, 1, 2, 3)]

SLIDE 54

Reference Data

Excerpt from the ECG data: empirical standard deviation σ_ECG ≈ 109. [Plot: ECG ADC value x_t vs. time index t]

SLIDE 55

Sample Output

Sample from the best linear AR-HMM (p = 3, M = 15): σ_AR ≈ 2490(!). [Plot: output value x_t vs. time index t]

SLIDE 56

Sample Output

Sample from the best KDE-HMM (p = 2, M = 13): σ_KDE ≈ 94.3. [Plot: output value x_t vs. time index t]

SLIDE 58

Successes

1. Theoretically powerful time-series model
  • Nonparametric, asymptotically consistent
2. Parameter update formulas
3. Better modelling of difficult nonlinear series than linear AR-HMMs
4. Compelling for signal synthesis
  • Converges on the true distribution
  • Probabilistic hybrid speech synthesis

SLIDE 59

Future Possibilities

  • Apply to speech
  • Glottal source data
  • Single-utterance synthesis
  • Also train the point-to-state assignments w_{qn} (realignment)
  • Adapt additional EBW heuristics from Woodland & Povey (2002)
  • Reduce the sample complexity from the infeasible O(N²)
  • Approximate kernel evaluations using, e.g., dual trees (Holmes, Gray, & Isbell, 2007)
  • Pseudo-likelihood maximisation is unsuitable
  • KDE methods are more developed for integrated square error
  • Unlike recognition, synthesis prioritises peaks rather than tails

SLIDE 60

The End

Thank you for listening!