


  1. Automatic Speech Recognition (CS753)
 Lecture 23: Speech Synthesis (Part I)
 Instructor: Preethi Jyothi
 Oct 30, 2017


  2. Text-to-Speech Systems
 Storied history
 • Von Kempelen's speaking machine (1791)
   • Bellows simulated the lungs
   • Rubber mouth and nose; nostrils had to be covered with two fingers for non-nasals
 • Homer Dudley's VODER (1939)
   • First device to synthesize speech sounds via electrical means
 • Gunnar Fant's OVE formant synthesizer (1960s)
   • Formant synthesizer for vowels
 • Computer-aided speech synthesis (1970s)
   • Concatenative (unit selection)
   • Parametric (HMM-based and NN-based)
 All images from http://www2.ling.su.se/staff/hartmut/kemplne.htm

  3. Speech synthesis or TTS systems
 • Goal of a TTS system: produce a natural-sounding, high-quality speech waveform for a given word sequence
 • TTS systems are typically divided into two parts:
   A. Linguistic specification
   B. Waveform generation

  4. Current TTS systems
 • Constructed using a large amount of speech data
 • Referred to as corpus-based TTS systems
 • Two prominent instances of corpus-based TTS:
   1. Unit selection and concatenation
   2. Statistical parametric speech synthesis

  5. Unit selection synthesis

  6. Unit selection synthesis
 • Synthesize new sentences by selecting sub-word units from a database of speech
 • Optimal size of units? Diphones? Half-phones?
 [Figure: unit selection over all segments, annotated with target cost and concatenation cost. Image from Zen et al., "Statistical Parametric Speech Synthesis", SPECOM 2009]
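The slide leaves the unit size open. As a minimal sketch, assuming phones arrive as a list of string labels padded with silence (`sil`), the helpers below show how one phone string maps onto diphone units (one per adjacent phone pair, spanning the middle of one phone to the middle of the next) and onto half-phone units (two per phone). The function names and label formats (`h-ax`, `h_L`) are hypothetical, chosen only for illustration.

```python
def to_diphones(phones: list[str]) -> list[str]:
    """Diphone labels for each adjacent phone pair.

    A diphone runs from the middle of one phone to the middle of the next,
    so n phones yield n - 1 diphones.
    """
    return [f"{a}-{b}" for a, b in zip(phones, phones[1:])]


def to_half_phones(phones: list[str]) -> list[str]:
    """Split every phone into a left half and a right half (2n units)."""
    units: list[str] = []
    for p in phones:
        units.extend([f"{p}_L", f"{p}_R"])
    return units


if __name__ == "__main__":
    # Hypothetical phone transcription of "hello", padded with silence.
    phones = ["sil", "h", "ax", "l", "ow", "sil"]
    print(to_diphones(phones))     # ['sil-h', 'h-ax', 'ax-l', 'l-ow', 'ow-sil']
    print(to_half_phones(phones))
```

Smaller units give the search more joins to choose from but also more places where a bad join can occur, which is the trade-off the slide's question points at.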

  7. Unit selection synthesis
 • Target cost between a candidate, $u_i$, and a target unit, $t_i$:
 $$C^{(t)}(t_i, u_i) = \sum_{j=1}^{p} w_j^{(t)} C_j^{(t)}(t_i, u_i)$$
 • Concatenation cost between candidate units:
 $$C^{(c)}(u_{i-1}, u_i) = \sum_{k=1}^{q} w_k^{(c)} C_k^{(c)}(u_{i-1}, u_i)$$
 • Find the string of units that minimises the overall cost:
 $$\hat{u}_{1:n} = \arg\min_{u_{1:n}} \{ C(t_{1:n}, u_{1:n}) \}$$
 $$C(t_{1:n}, u_{1:n}) = \sum_{i=1}^{n} C^{(t)}(t_i, u_i) + \sum_{i=2}^{n} C^{(c)}(u_{i-1}, u_i)$$
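The slide states the objective but not how to search for the minimising unit string. Because the total cost is a sum of per-position target costs and adjacent-pair concatenation costs, a Viterbi-style dynamic programme finds the exact minimum without enumerating every unit string. The Python below is a minimal sketch under stated assumptions: `select_units` and `weighted_cost` are hypothetical names, the weighted sub-costs are assumed to be wrapped as two callables, and the toy units are single pitch values in Hz rather than real acoustic features.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")  # a target unit specification
U = TypeVar("U")  # a candidate unit from the speech database


def weighted_cost(weights: Sequence[float], subcosts: Sequence[Callable], a, b) -> float:
    """Weighted sum of sub-costs: sum_j w_j * C_j(a, b)."""
    return sum(w * c(a, b) for w, c in zip(weights, subcosts))


def select_units(
    targets: Sequence[T],
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[T, U], float],
    concat_cost: Callable[[U, U], float],
) -> tuple[list[U], float]:
    """Minimise sum_i C_t(t_i, u_i) + sum_{i>=2} C_c(u_{i-1}, u_i) exactly.

    candidates[i] lists the database units that may realise targets[i];
    both sequences must be non-empty and of equal length.
    """
    # best[i][k]: cheapest cost of any unit string ending in candidates[i][k].
    best = [[target_cost(targets[0], u) for u in candidates[0]]]
    back: list[list[int]] = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        row, ptr = [], []
        for u in candidates[i]:
            # Cost of reaching u from each predecessor: path cost + join cost.
            arrive = [best[i - 1][k] + concat_cost(prev, u)
                      for k, prev in enumerate(candidates[i - 1])]
            k_best = min(range(len(arrive)), key=arrive.__getitem__)
            row.append(arrive[k_best] + target_cost(targets[i], u))
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)

    # Trace back from the cheapest unit at the final position.
    k = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][k]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][k])
        k = back[i][k]
    return path[::-1], total


if __name__ == "__main__":
    # Toy example: each unit is described by a single pitch value (Hz).
    def target_cost(t: float, u: float) -> float:
        return weighted_cost([1.0], [lambda a, b: abs(a - b)], t, u)

    def concat_cost(prev: float, u: float) -> float:
        return weighted_cost([0.5], [lambda a, b: abs(a - b)], prev, u)

    targets = [100.0, 120.0, 110.0]
    candidates = [[98.0, 105.0], [118.0, 130.0], [111.0, 125.0]]
    print(select_units(targets, candidates, target_cost, concat_cost))
    # ([105.0, 118.0, 111.0], 18.0)
```

Here the first unit chosen (105 Hz) is not the one with the lowest target cost (98 Hz): its cheaper join into the 118 Hz unit outweighs the difference, which is why the search has to be global rather than greedy. The work grows as O(nK²) for n targets and K candidates per target.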

  8. Unit selection synthesis
 • Target cost is pre-calculated using a clustering method
 [Figure: unit selection over clustered segments, annotated with target cost and concatenation cost]
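The slide does not name the clustering method. The sketch below assumes one common scheme: units of the same phone are grouped offline by context, each unit's target cost is pre-computed as its distance from the cluster centre, and synthesis only has to look up the target's cluster. A dictionary keyed by a hypothetical `context_key` stands in for a learned clustering (for example, a decision tree over phonetic and prosodic questions), and a single scalar `feature` stands in for the acoustic parameters; all names are illustrative.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical database entry: (unit_id, phone, context_key, feature).
# context_key stands in for the cluster a real clustering method would
# assign; feature stands in for the unit's acoustic parameters.
Unit = tuple[str, str, str, float]
Clusters = dict[tuple[str, str], list[tuple[str, float]]]


def build_clusters(units: list[Unit]) -> Clusters:
    """Offline step: group units into clusters and pre-compute each unit's
    target cost as its distance from the cluster's mean feature."""
    groups = defaultdict(list)
    for unit_id, phone, context_key, feature in units:
        groups[(phone, context_key)].append((unit_id, feature))

    clusters: Clusters = {}
    for key, members in groups.items():
        centre = fmean(feature for _, feature in members)
        costs = [(unit_id, abs(feature - centre)) for unit_id, feature in members]
        # Cheapest (most typical) units first; ties keep database order.
        clusters[key] = sorted(costs, key=lambda item: item[1])
    return clusters


def candidates_for(clusters: Clusters, phone: str, context_key: str) -> list[tuple[str, float]]:
    """Synthesis step: look up the target's cluster and reuse its
    pre-computed target costs instead of scoring every unit in the database."""
    return clusters.get((phone, context_key), [])


if __name__ == "__main__":
    units = [
        ("a1", "aa", "stressed", 120.0),
        ("a2", "aa", "stressed", 130.0),
        ("a3", "aa", "stressed", 125.0),
        ("a4", "aa", "unstressed", 100.0),
    ]
    clusters = build_clusters(units)
    print(candidates_for(clusters, "aa", "stressed"))
    # [('a3', 0.0), ('a1', 5.0), ('a2', 5.0)]
```

In a full system, the candidates returned for each target, with their pre-computed costs, would then go through the same search that adds the concatenation costs.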
