
A General-Purpose 32 ms Prosodic Vector for Hidden Markov Modeling



  1. Introduction | FFV Representation | Applicability | Experiments | Conclusion
     A General-Purpose 32 ms Prosodic Vector for Hidden Markov Modeling
     Kornel Laskowski (1, 2), Mattias Heldner (3) & Jens Edlund (3)
     (1) Carnegie Mellon University, Pittsburgh PA, USA
     (2) Universität Karlsruhe, Karlsruhe, Germany
     (3) KTH Royal Institute of Technology, Stockholm, Sweden
     8 September, 2008
     K. Laskowski, M. Heldner & J. Edlund, Interspeech 2009, Brighton, UK, 1/20

  2. Imagine you had ...
     a local representation of tone, estimated from a single ASR-size analysis frame, which would not require:
       - prior determination of voicing
       - speaker normalization
     with separable codeword clusters for:
       - absence of voicing
       - presence of voicing, constant F0
       - presence of voicing, falling F0, with rate of change
       - presence of voicing, rising F0, with rate of change

  3. Then you could do lots of things cheaply ...
     Examples include:
       - online prosodic modeling
       - improved ASR for tonal languages
       - enriched ASR for other languages
       - contrastive phone models
       - variously accented same-word lexicon entries
       - (word-conditioned) prosodic phrasing for free

  4. Instead, currently you need to ...
       1) run a pitch tracker, which
            a) computes a local estimate of voicing and of pitch
            b) applies dynamic programming over a long observation time
       2) heuristically correct its output, by
            a) pruning outliers, based on long-observation-time trends, and/or
            b) applying a piecewise linear approximation
       3) normalize for the speaker, by
            a) determining a long-observation-time speaker norm
            b) applying the normalization to each frame
       4) treat unvoiced regions by interpolating inside them, or posting exceptions in downstream modeling/handling
       5) compute a first-order log-difference
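The speaker normalization, unvoiced-region interpolation, and log-difference steps of this conventional pipeline can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `conventional_f0_features`, the use of the mean log-F0 as the speaker norm, and linear interpolation in the log domain are all assumptions. The pitch tracker output (`f0_hz`, `voiced`) is taken as given.

```python
import numpy as np

def conventional_f0_features(f0_hz, voiced):
    """Post-process a pitch track (hypothetical helper, for illustration only).

    f0_hz  : per-frame F0 estimates in Hz (values in unvoiced frames are ignored)
    voiced : per-frame boolean voicing decisions
    Returns (normalized log-F0, first-order log-difference), one value per frame.
    """
    f0_hz = np.asarray(f0_hz, dtype=float)
    voiced = np.asarray(voiced, dtype=bool)
    if not voiced.any():
        raise ValueError("at least one voiced frame is required")

    log_f0_voiced = np.log(f0_hz[voiced])

    # Speaker norm from the whole (long) observation.
    # Assumption: the norm is the mean log-F0 over voiced frames.
    speaker_norm = log_f0_voiced.mean()

    # Treat unvoiced regions by interpolating inside them.
    # Assumption: linear interpolation in the log domain.
    idx = np.arange(f0_hz.size)
    filled = np.interp(idx, idx[voiced], log_f0_voiced)

    # Apply the normalization to each frame.
    normalized = filled - speaker_norm

    # First-order log-difference (the first frame gets a difference of 0).
    delta = np.diff(normalized, prepend=normalized[0])
    return normalized, delta
```

Note that the speaker norm and the interpolation both need frames far from the current one, which is the long-observation-time dependence the slide objects to.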

  5. What we will present ...
       1) Fundamental Frequency Variation (FFV)
       2) Applicability of the FFV Representation
            - speaker change prediction
            - speaker classification
            - dialog act classification
       3) Several Basic Questions
            - feature transformation
            - feature regularization
            - concatenation with other features
            - runtime improvements
            - acoustic model complexity
       4) Summary

  6. Computation, for Each (32 ms) Analysis Frame
       - estimate the FFV spectrum g[ρ]
            - estimate the power spectra F_L and F_R
            - dilate F_R by a factor 2^ρ, ρ > 0
            - dot product with undilated F_L
            - repeat for a continuum of ρ values
       - pass g(ρ) through a filterbank to yield G ∈ ℝ^7
       - decorrelate G
     [Figure panels: time domain; freq domain]
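A rough sketch of the FFV spectrum estimation step listed on this slide, under several assumptions that the slide does not confirm: the two spectra come from Hann-windowed left and right halves of the frame, dilation is done by linear interpolation along the frequency axis, F_L is dilated instead of F_R when ρ < 0, and the dot product is normalized by the norms of both spectra. The function name `ffv_spectrum` is hypothetical.

```python
import numpy as np

def ffv_spectrum(frame, rhos):
    """Sketch of an FFV-like spectrum g[rho] for one analysis frame.

    frame : 1-D array of audio samples (e.g. 512 samples = 32 ms at 16 kHz)
    rhos  : iterable of dilation exponents, in octaves
    """
    frame = np.asarray(frame, dtype=float)
    n = frame.size
    half = n // 2

    # Assumption: left and right halves, each weighted by a Hann window.
    window = np.hanning(half)
    left = frame[:half] * window
    right = frame[n - half:] * window

    # Power spectra F_L and F_R.
    f_l = np.abs(np.fft.rfft(left)) ** 2
    f_r = np.abs(np.fft.rfft(right)) ** 2
    bins = np.arange(f_l.size, dtype=float)

    g = np.empty(len(rhos))
    for i, rho in enumerate(rhos):
        scale = 2.0 ** abs(rho)
        if rho >= 0:
            # Dilate F_R by 2**rho; compare with undilated F_L.
            dilated = np.interp(bins / scale, bins, f_r)
            fixed = f_l
        else:
            # Assumption: for negative rho the roles are swapped.
            dilated = np.interp(bins / scale, bins, f_l)
            fixed = f_r
        # Assumption: normalized dot product, so that 0 <= g <= 1.
        denom = np.sqrt(np.dot(dilated, dilated) * np.dot(fixed, fixed))
        g[i] = np.dot(dilated, fixed) / denom if denom > 0 else 0.0
    return g
```

For a stationary tone the two half-frame spectra coincide, so this sketch peaks at ρ = 0. Which sign of ρ corresponds to rising pitch depends on the dilation convention and is not asserted here.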


  18. Comparison with MFCC Computation

      | Stage | MFCC features               | FFV features            |
      |-------|-----------------------------|-------------------------|
      | 1     | AUDIO                       | AUDIO                   |
      | 2     | PRE-EMPHASIS                | PRE-EMPHASIS            |
      | 3     | POW SPECTRUM ESTIMATION     | FFV SPECTRUM ESTIMATION |
      | 4     | PERCEPTUAL FILTERBANK (MEL) | FILTERBANK              |
      | 5     | DECORRELATE (INV. COS-II)   | DECORRELATE (KLT)       |
      | 6     | MODELING                    | MODELING                |
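The KLT decorrelation stage named for the FFV features can be illustrated with an eigendecomposition of the feature covariance matrix. This is a generic sketch of a Karhunen-Loève transform, not the authors' implementation. The function names `fit_klt` and `apply_klt` are hypothetical, and estimating the basis from a matrix of training vectors is an assumption.

```python
import numpy as np

def fit_klt(features):
    """Estimate a KLT basis from training vectors (rows = frames, cols = dims)."""
    x = np.asarray(features, dtype=float)
    mean = x.mean(axis=0)
    cov = np.cov(x - mean, rowvar=False)
    # eigh is appropriate because the covariance matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Order basis vectors by decreasing variance.
    order = np.argsort(eigvals)[::-1]
    return mean, eigvecs[:, order]

def apply_klt(features, mean, basis):
    """Project vectors onto the KLT basis; outputs are mutually uncorrelated
    on the data the basis was estimated from."""
    return (np.asarray(features, dtype=float) - mean) @ basis
```

Unlike the fixed cosine basis in the MFCC column, a KLT basis is data-dependent and has to be estimated from training features before it can be applied.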

  19. FFV versus Pitch Tracking, Conceptually
      [Figure: three panels, each pairing a tracking task with a spectrum: Formant Tracking with the FFT Spectrum, Pitch Tracking with the Autocorr Spectrum, and FFV Peak Tracking with the FFV Spectrum.]
