Speech Processing 15-492/18-492: Speech Synthesis Signal Processing

  1. Speech Processing 15-492/18-492 Speech Synthesis Signal Processing

  2. Signal Manipulation
     - Signal Parameterization
       - Joining
       - LPC
       - PSOLA: pitch and duration modification
     - Statistical Parameterization
       - MELCEP/MLSA
       - LSF, STRAIGHT, HNM, HSM

  3. TTS Signal Processing
     - Join together pieces of speech
     - Prosodic modification
       - Pitch (F0)
       - Duration
       - Power
     - Change spectral properties
       - Stress/unstress
       - Spectral tilt
       - Speaking style

  4. Joining
     - Just put them together
       - Gets clicks at join points
     - Join them at zero crossings
     - Window them and overlap them
       - WSOLA
     - Join them at pitch periods
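
The two simplest joining strategies on this slide can be sketched as follows. This is a minimal illustration, not WSOLA itself (WSOLA additionally searches for the best-matching overlap position). The function names `nearest_zero_crossing` and `crossfade_join` and the linear fade shape are assumptions made for the example.

```python
def nearest_zero_crossing(signal, index):
    """Return the sample index nearest to `index` where the sign changes."""
    crossings = [i for i in range(1, len(signal))
                 if (signal[i - 1] < 0.0) != (signal[i] < 0.0)]
    if not crossings:
        return index  # no crossing found: fall back to the requested point
    return min(crossings, key=lambda i: abs(i - index))


def crossfade_join(left, right, overlap):
    """Overlap the last `overlap` samples of `left` with the start of `right`."""
    if not 0 < overlap <= min(len(left), len(right)):
        raise ValueError("overlap must fit inside both units")
    joined = list(left[:len(left) - overlap])
    for k in range(overlap):
        w = (k + 1) / (overlap + 1)  # linear fade-in weight for the right unit
        joined.append((1.0 - w) * left[len(left) - overlap + k] + w * right[k])
    joined.extend(right[overlap:])
    return joined
```

Cutting each unit at `nearest_zero_crossing` avoids the amplitude jump that causes a click; `crossfade_join` instead smooths whatever discontinuity remains across the overlap region.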

  5. Prosodic Modification
     - Modify pitch and duration independently
     - Changing sample rate changes both
       - "chipmunk" style speech
     - Duration
       - Duplicate/delete parts of the signal
     - Pitch
       - "resample" to change pitch
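
To see why changing the sample rate alone changes both properties, the sketch below reads a test tone twice as fast and measures the result. The helpers `resample_linear` and `estimate_f0`, the 8 kHz rate, and the 100 Hz tone are all assumptions chosen for the illustration; the zero-crossing estimate only works on clean periodic signals like this one.

```python
import math


def resample_linear(signal, factor):
    """Read `signal` `factor` times faster, using linear interpolation."""
    n_out = int((len(signal) - 1) / factor) + 1
    out = []
    for n in range(n_out):
        pos = n * factor
        i = int(pos)
        frac = pos - i
        nxt = signal[min(i + 1, len(signal) - 1)]
        out.append((1.0 - frac) * signal[i] + frac * nxt)
    return out


def estimate_f0(signal, sample_rate):
    """Rough frequency estimate: upward zero crossings per second."""
    ups = sum(1 for i in range(1, len(signal))
              if signal[i - 1] < 0.0 <= signal[i])
    return ups * sample_rate / len(signal)


rate = 8000  # assumed playback rate in Hz
tone = [math.sin(2.0 * math.pi * 100.0 * n / rate + 0.1) for n in range(rate)]
fast = resample_linear(tone, 2.0)

print("duration (s):", len(tone) / rate, "->", len(fast) / rate)
print("pitch (Hz):  ", estimate_f0(tone, rate), "->", estimate_f0(fast, rate))
```

Played back at the same rate, the resampled tone is roughly half as long and roughly an octave higher, which is the "chipmunk" effect. Independent control needs a method that treats duration and pitch separately.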

  6. Speech and Short Term Signals

  7. Duration Modification

  8. Pitch Modification

  9. Modify pitch and duration
     - Find ideal pitch periods and duration
     - Find closest actual periods from units
     - End with
       - Pitch period (short term signals)
       - Distances between them
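
One way to picture the planning step above: lay out the ideal pitch marks for the target pitch and duration, then pick the closest actual pitch period from the source unit for each one. The sketch below assumes pitch marks are given as sample positions and uses constant scale factors; `plan_pitch_marks`, `pitch_scale`, and `duration_scale` are hypothetical names for this example.

```python
def plan_pitch_marks(source_marks, pitch_scale, duration_scale):
    """Return (target_marks, chosen).

    target_marks: positions of the new pitch marks, relative to the first one.
    chosen[i]:    index of the source pitch period to place at target_marks[i].
    """
    start = source_marks[0]
    total = (source_marks[-1] - start) * duration_scale
    target_marks, chosen = [], []
    t = 0.0
    while t <= total:
        # Map the target time back onto the source time axis.
        src_time = start + t / duration_scale
        j = min(range(len(source_marks)),
                key=lambda i: abs(source_marks[i] - src_time))
        # Local source period around the chosen mark.
        if j + 1 < len(source_marks):
            period = source_marks[j + 1] - source_marks[j]
        else:
            period = source_marks[j] - source_marks[j - 1]
        target_marks.append(t)
        chosen.append(j)
        t += period / pitch_scale  # new distance between pitch marks
    return target_marks, chosen
```

Doubling `duration_scale` makes each source period get chosen about twice (duplication), while doubling `pitch_scale` halves the distance between marks. The output is exactly what the slide ends with: a list of short-term signals and the distances between them.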

  10. Signal Reconstruction
     - TD-PSOLA™
       - Time domain pitch synchronous overlap and add
       - Patented by France Telecom
         - Expired 2004
       - Very efficient:
         - No FFT (or inverse FFT)
         - Can modify pitch (Hz) by a factor of 2.0 (or 0.5)
       - The reason no one publishes algorithms
       - The (partial) reason unit selection typically doesn't do pitch/duration modification
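
The overlap-and-add reconstruction can be sketched in the time domain as below. This is a simplified illustration rather than the patented algorithm: it assumes a fixed window half-width (a real implementation would typically size each window from the local pitch period), and the names `hann` and `overlap_add` are chosen for this example.

```python
import math


def hann(length):
    """Symmetric Hann window of the given length."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def overlap_add(signal, source_marks, target_marks, chosen, half_width):
    """Copy windowed segments centred on source marks to the target marks."""
    out_len = int(target_marks[-1]) + half_width + 1
    out = [0.0] * out_len
    window = hann(2 * half_width + 1)
    for t_mark, j in zip(target_marks, chosen):
        centre_in = source_marks[j]
        centre_out = int(round(t_mark))
        for k in range(-half_width, half_width + 1):
            src = centre_in + k
            dst = centre_out + k
            if 0 <= src < len(signal) and 0 <= dst < out_len:
                out[dst] += window[k + half_width] * signal[src]
    return out
```

No FFT is involved: the work is one multiply-add per windowed sample. When the target marks equal the source marks and are spaced `half_width` apart, the overlapping Hann windows sum to one and the interior of the signal is reproduced unchanged.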

  11. LPC: Linear predictive coding
     - Predict next sample point from previous
     - Weighted sum of previous points
     - Filter of order p
     - Residual excited LPC
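
As a concrete reading of "weighted sum of previous points", the sketch below estimates the weights of an order-p predictor with the autocorrelation method and the Levinson-Durbin recursion, then computes the residual (the part the predictor cannot explain). The slide does not say which estimation method the course uses, so this choice is an assumption, as are the function names.

```python
def autocorrelation(signal, max_lag):
    """r[lag] = sum over n of signal[n] * signal[n - lag]."""
    return [sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for a[1..p] so that x[n] is approximated by sum(a[k] * x[n-k])."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        reflection = acc / error
        new_a = a[:]
        new_a[i] = reflection
        for k in range(1, i):
            new_a[k] = a[k] - reflection * a[i - k]
        a = new_a
        error *= 1.0 - reflection * reflection
    return a[1:], error


def residual(signal, coeffs):
    """Prediction error: actual sample minus the weighted sum of past samples."""
    out = []
    for n in range(len(signal)):
        pred = sum(a * signal[n - 1 - k]
                   for k, a in enumerate(coeffs) if n - 1 - k >= 0)
        out.append(signal[n] - pred)
    return out
```

In residual-excited LPC the residual is kept and later fed back through the inverse (synthesis) filter, so the coefficients carry the spectral envelope and the residual carries everything else.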

  12. LPC
     - Works well but can be buzzy
     - Can be very compact
     - Can be pitch synchronous
     - Excitation
       - Pulse
       - Triangular pulse
       - Multi-pulse
       - Full residual
     - Used in standard speech coding
       - LPC10: 2.4 kbps
       - CELP: codebook excited LPC
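
The difference between pulse excitation and full-residual excitation can be sketched with a plain all-pole synthesis filter. The coefficient values and the names `pulse_train`, `lpc_residual`, and `lpc_synthesize` are illustrative assumptions, not taken from a particular coder.

```python
def pulse_train(length, period):
    """One unit pulse every `period` samples (simplest voiced excitation)."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def lpc_residual(signal, coeffs):
    """Inverse filter: what is left after removing the predicted part."""
    return [signal[n] - sum(a * signal[n - 1 - k]
                            for k, a in enumerate(coeffs) if n - 1 - k >= 0)
            for n in range(len(signal))]


def lpc_synthesize(excitation, coeffs):
    """All-pole filter: out[n] = excitation[n] + sum(a[k] * out[n-1-k])."""
    out = []
    for n, e in enumerate(excitation):
        pred = sum(a * out[n - 1 - k]
                   for k, a in enumerate(coeffs) if n - 1 - k >= 0)
        out.append(e + pred)
    return out


coeffs = [0.9, -0.2]                 # assumed order-2 predictor
original = [0.3, -0.1, 0.8, 0.2]     # toy signal
exact = lpc_synthesize(lpc_residual(original, coeffs), coeffs)
buzzy = lpc_synthesize(pulse_train(len(original), 2), coeffs)
```

Exciting the filter with the full residual reproduces the original samples, at the cost of storing the residual. A bare pulse train is far more compact but discards that detail, which is one source of the buzzy quality.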

  13. Other Parametric Representations
     - Typically split spectral and residual
     - MBROLA:
       - Multi-band resynthesis overlap and add
     - HNM/HSM:
       - Harmonic plus (noise/stochastic) modeling
     - STRAIGHT
     - MELCEP/MLSA
       - Often used in HMM synthesis
     - Sinusoidal (HARMONIC)
     - Wavelet
     - LSF/LPC

  14. Choosing the right unit type
     - Diphones
       - Phone-phone
       - Joins at stable portions, not transitions
     - Half phone (AT&T Natural Voices)
     - Hybrid systems (Hadifix, Bonn systems)
     - Other selection systems:
       - Syllable, phone, HMM state
       - Even frame level

  15. Acoustically Derived Units
     - E.g. Bacchiani 99 or Rita Singh CMU
     - From some waveforms
       - Find N most diverse unit types
       - Varied in length
     - Still need to map letters to units

  16. Acoustic Phonetic Clustering
     - Parameterize database
       - Melcep plus power
     - K-means
       - Euclidean distance measure
       - 100 clusters
     - Label DB with best cluster
     - Build clunits synthesizer
     - Can't predict APC cluster directly
     - Use held out data for testing
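
The clustering and labelling steps can be sketched with a small K-means using Euclidean distance. This toy version assumes each frame is already a parameter vector (for example mel-cepstral coefficients plus power) and uses evenly spaced frames as initial centroids so the result is deterministic; the slide's setup uses 100 clusters, far more than the toy data in the usage note.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(frames, k, iterations=20):
    """Return (centroids, labels) for the given list of frame vectors."""
    step = len(frames) / k
    centroids = [list(frames[int(i * step)]) for i in range(k)]
    labels = [0] * len(frames)
    for _ in range(iterations):
        # Label every frame with its best (closest) cluster.
        labels = [min(range(k),
                      key=lambda c: squared_distance(f, centroids[c]))
                  for f in frames]
        # Move each centroid to the mean of the frames assigned to it.
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids, labels
```

For example, `kmeans([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]], 2)` separates the two groups of frames. The returned `labels` play the role of the cluster names used to label the database.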

  17. Acoustic Phonetic Clustering

  18. Grapheme Based Synthesis
     - Synthesis without a phoneme set
     - Use the letters as phonemes
       - ("alan" nil (a l a n))
       - ("black" nil (b l a c k))
     - Spanish (easier?)
       - 419 utterances
       - HMM training to label databases
       - Simple pronunciation rules
       - Polici'a -> p o l i c i' a
       - Cuatro -> c u a t r o
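
The letter-to-unit mapping shown in the examples above is simple enough to sketch directly. The helper names `to_graphemes` and `lexicon_entry` are hypothetical; the output format just mirrors the entries on the slide, with an apostrophe after a vowel treated as a stress mark that stays attached to that vowel.

```python
def to_graphemes(word):
    """Split a word into letter units, keeping "'" attached to its vowel."""
    units = []
    for ch in word.lower():
        if ch == "'" and units:
            units[-1] += "'"   # stress mark stays with the preceding letter
        elif ch.isalpha():
            units.append(ch)
    return units


def lexicon_entry(word, pos="nil"):
    """Format a lexicon entry in the style shown on the slide."""
    return '("{}" {} ({}))'.format(word.lower(), pos,
                                   " ".join(to_graphemes(word)))


print(lexicon_entry("alan"))
print(" ".join(to_graphemes("Polici'a")))
print(" ".join(to_graphemes("Cuatro")))
```

Because every word maps to its own letters, no hand-built phoneme set or pronunciation dictionary is needed; the acoustic models have to absorb whatever the spelling does not express.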

  19. Spanish Grapheme Synthesis

  20. English Grapheme Synthesis
     - Use letters as phones
       - 26 "phonemes"
       - ("alan" n (a l a n))
       - ("black" n (b l a c k))
     - Build HMM acoustic models for labeling
     - For English
       - "This is a pen"
       - "We went to the church at Christmas"
       - Festival intro
       - "do eight meat"
     - Requires method to fix errors
       - Letter to letter mapping

  21. Signal Processing for TTS
     - Pitch and duration modification
     - LPC
     - Finding the right unit type
     - Grapheme-based Synthesis

  22. HW1: TTS
     - Due 3:30pm Friday October 2nd
     - Install Festival and Festvox
     - Find 10 errors in each of two different synthesizers
     - Build a voice
       - A Talking Clock
       - A general voice
       - (or both)
