A General-Purpose 32 ms Prosodic Vector for Hidden Markov Modeling

Sections: Introduction, FFV Representation, Applicability, Experiments, Conclusion

A General-Purpose 32 ms Prosodic Vector for Hidden Markov Modeling
Kornel Laskowski (1, 2), Mattias Heldner (3) & Jens Edlund (3)
(1) Carnegie Mellon University, Pittsburgh PA, USA
(2) Universität Karlsruhe, Karlsruhe, Germany
(3) KTH Royal Institute of Technology, Stockholm, Sweden
8 September, 2008
K. Laskowski, M. Heldner & J. Edlund, Interspeech 2009, Brighton, UK

Imagine you had ...
a local representation of tone
- estimated from a single ASR-size analysis frame
- which would not require:
    - prior determination of voicing
    - speaker normalization
- with separable codeword clusters for:
    - absence of voicing
    - presence of voicing, constant F0
    - presence of voicing, falling F0, with rate of change
    - presence of voicing, rising F0, with rate of change

Then you could do lots of things cheaply ...
Examples include:
- online prosodic modeling
- improved ASR for tonal languages
- enriched ASR for other languages
- contrastive phone models
- variously accented same-word lexicon entries
- (word-conditioned) prosodic phrasing for free

Instead, currently you need to ...
1. run a pitch tracker, which
    1. computes a local estimate of voicing and of pitch
    2. applies dynamic programming over a long observation time
2. heuristically correct its output, by
    1. pruning outliers, based on long-observation-time trends, and/or
    2. applying a piecewise linear approximation
3. normalize for the speaker, by
    1. determining a long-observation-time speaker norm
    2. applying the normalization to each frame
4. treat unvoiced regions by
    - interpolating inside them, or
    - posting exceptions in downstream modeling/handling
5. compute a first-order log-difference
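As a rough illustration of steps 3 to 5 of this conventional pipeline, here is a minimal Python sketch. It assumes a pitch tracker has already produced an F0 track in Hz with 0.0 marking unvoiced frames (steps 1 and 2 are not shown). The function names, the choice of the geometric mean as the speaker norm, and the use of octaves for the log-difference are assumptions made for this example, not details taken from the slides.

```python
import math

def speaker_normalize(f0):
    """Step 3 (assumed form): divide voiced frames by the speaker's
    geometric-mean F0, computed over the whole track."""
    voiced = [v for v in f0 if v > 0]
    if not voiced:
        raise ValueError("track contains no voiced frames")
    norm = math.exp(sum(math.log(v) for v in voiced) / len(voiced))
    return [v / norm if v > 0 else 0.0 for v in f0]

def interpolate_unvoiced(f0):
    """Step 4 (one of the two options): fill unvoiced frames (0.0) by
    linear interpolation; edges hold the nearest voiced value."""
    idx = [i for i, v in enumerate(f0) if v > 0]
    if not idx:
        raise ValueError("track contains no voiced frames")
    out = list(f0)
    for i in range(idx[0]):
        out[i] = f0[idx[0]]
    for i in range(idx[-1] + 1, len(f0)):
        out[i] = f0[idx[-1]]
    for a, b in zip(idx, idx[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = (1 - t) * f0[a] + t * f0[b]
    return out

def log_difference(f0):
    """Step 5: first-order log-difference, here in octaves per frame."""
    return [math.log2(b / a) for a, b in zip(f0, f0[1:])]

# Hypothetical 7-frame track; 0.0 marks unvoiced frames.
track = [0.0, 100.0, 0.0, 0.0, 200.0, 200.0, 0.0]
delta = log_difference(interpolate_unvoiced(speaker_normalize(track)))
```

Note that the speaker norm needs the whole track before any frame can be normalized, which is the long-observation-time dependency that the slide lists as a cost of this approach.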

What we will present ...
1. Fundamental Frequency Variation (FFV)
2. Applicability of the FFV Representation
    - speaker change prediction
    - speaker classification
    - dialog act classification
3. Several Basic Questions
    - feature transformation
    - feature regularization
    - concatenation with other features
    - runtime improvements
    - acoustic model complexity
4. Summary

Computation, for Each (32 ms) Analysis Frame
1. estimate the FFV spectrum g[ρ]
    - estimate the power spectra F_L and F_R
    - dilate F_R by a factor 2^ρ, ρ > 0
    - dot product with undilated F_L
    - repeat for a continuum of ρ values
2. pass g(ρ) through a filterbank to yield G ∈ R^7
3. decorrelate G
[Diagram labels: time domain, freq domain]
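The first step can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the spectra are passed in as plain lists of magnitudes, dilation uses linear interpolation, the dot product is normalized to a cosine similarity, and negative ρ is handled by dilating F_L instead of F_R. The slide itself only specifies the ρ > 0 case, and all function names are hypothetical.

```python
import math

def dilate(spec, factor):
    """Stretch a spectrum along the frequency axis by `factor` >= 1,
    reading the source at k / factor with linear interpolation."""
    n = len(spec)
    out = []
    for k in range(n):
        pos = k / factor
        i = int(pos)
        frac = pos - i
        if i + 1 < n:
            out.append((1 - frac) * spec[i] + frac * spec[i + 1])
        elif i < n:
            out.append(spec[i])
        else:
            out.append(0.0)
    return out

def cosine(a, b):
    """Normalized dot product (the normalization is an assumption)."""
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / den if den > 0 else 0.0

def ffv_spectrum(f_left, f_right, rhos):
    """g[rho]: similarity of F_L and F_R after dilating one by 2**|rho|."""
    g = []
    for rho in rhos:
        if rho >= 0:
            g.append(cosine(f_left, dilate(f_right, 2.0 ** rho)))
        else:  # assumed mirror case: dilate F_L instead
            g.append(cosine(dilate(f_left, 2.0 ** -rho), f_right))
    return g

def harmonic_spectrum(n_bins, spacing, width=1.5):
    """Synthetic test spectrum: Gaussian peaks at multiples of `spacing`."""
    n_harm = int(n_bins / spacing)
    return [sum(math.exp(-0.5 * ((k - m * spacing) / width) ** 2)
                for m in range(1, n_harm + 1)) for k in range(n_bins)]

# Synthetic frame halves whose harmonic spacings differ by 2**0.25.
rhos = [i / 20 for i in range(-20, 21)]
f_left = harmonic_spectrum(256, 20.0)
f_right = harmonic_spectrum(256, 20.0 / 2 ** 0.25)
g = ffv_spectrum(f_left, f_right, rhos)
best_rho = rhos[g.index(max(g))]
```

In this toy setup the peak of g falls at the ρ for which dilating F_R by 2^ρ aligns its harmonics with those of F_L. No voicing decision or pitch estimate is needed to obtain it, which is the property the introduction asks for.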

Computation, for Each (32 ms) Analysis Frame (continued)
[Figure: plot accompanying the FFV spectrum step; horizontal axis from −2 to +2, vertical axis from 0 to 1]
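The filterbank step that maps g(ρ) to a 7-dimensional vector G can be sketched as a set of weighted sums over ρ. The slides do not give the filter shapes, so the evenly spaced triangular filters below are purely hypothetical, as are the function names and the synthetic input. The decorrelation step is not shown.

```python
import math

def triangular_filterbank(rhos, n_filters=7):
    """Hypothetical filterbank: evenly spaced, overlapping triangles
    covering the rho axis. The real filter shapes may differ."""
    lo, hi = rhos[0], rhos[-1]
    step = (hi - lo) / (n_filters + 1)
    filters = []
    for i in range(1, n_filters + 1):
        center = lo + i * step
        filters.append([max(0.0, 1.0 - abs(r - center) / step) for r in rhos])
    return filters

def apply_filterbank(g, filters):
    """Each output G[i] is the weighted sum of g under filter i."""
    return [sum(w * v for w, v in zip(f, g)) for f in filters]

# Synthetic FFV spectrum: a single bump at rho = 0 on a [-2, +2] grid.
rhos = [i / 20 for i in range(-40, 41)]
g = [math.exp(-0.5 * (r / 0.2) ** 2) for r in rhos]
G = apply_filterbank(g, triangular_filterbank(rhos))
```

With a bump centered at ρ = 0, the middle of the seven outputs is the largest, and the outputs on either side of it are equal up to rounding.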

Comparison with MFCC Computation
MFCC features: AUDIO → PRE-EMPHASIS → POW SPECTRUM ESTIMATION → PERCEPTUAL FILTERBANK (MEL) → DECORRELATE (INV. COS-II) → MODELING
FFV features: AUDIO → PRE-EMPHASIS → FFV SPECTRUM ESTIMATION → FILTERBANK → DECORRELATE (KLT) → MODELING

FFV versus Pitch Tracking, Conceptually
[Diagram: three panels, each pairing a tracking task with the spectrum it operates on]
- Formant Tracking: FFT Spectrum
- Pitch Tracking: Autocorr Spectrum
- FFV Peak Tracking: FFV Spectrum
