Speech Processing 15-492/18-492 Speech Recognition Intro Acoustic - PowerPoint PPT Presentation


  1. Speech Processing 15-492/18-492 Speech Recognition Intro Acoustic modelling HMMs

  2. Speech Recognition
     • From acoustics to text
     • Acoustic modeling
       – Recognizing all forms of all phonemes
     • Language modeling
       – Expectation of what might be said
     • We need both to do recognition

  3. Acoustics are not enough
     • Last Saturday in Hawaii, numerous Waipouli vacationers were shocked to find their beach cordoned off for a UC Berkeley Drama enactment of "Personal office space". The play features exclusively topless men and women in an everyday office environment. Richard Carlson, one of the annoyed tourists and a regular swimmer at Waipouli beach, complained that they really knew how to wreck a nice beach with the nudist play. Many of the tourists appeared ruffled by the content and fled the scene to avoid compromising photos.
     • In yesterday's press release, AT&T unveiled SpeechKit, its new speech recognition toolkit. According to Michael Armstrong, the COO of the company, the most innovative feature of the system is its revolutionary three-dimensional interface, which opens a new universe of possibilities for the speech recognition community. During the official software release, Jonathan Blues, a senior researcher at AT&T Labs, explained how to recognize speech with the new display, and how the toolkit has already played a crucial role in his research.

  4. Acoustics are not enough
     • Last Saturday in Hawaii, numerous Waipouli vacationers were shocked to find their beach cordoned off for a UC Berkeley Drama enactment of "Personal office space". The play features exclusively topless men and women in an everyday office environment. Richard Carlson, one of the annoyed tourists and a regular swimmer at Waipouli beach, complained that they really knew how to wreck a nice beach with this nudist play. Many of the tourists appeared ruffled by the content and fled the scene to avoid compromising photos.
     • In yesterday's press release, AT&T unveiled SpeechKit, its new speech recognition toolkit. According to Michael Armstrong, the COO of the company, the most innovative feature of the system is its revolutionary three-dimensional interface, which opens a new universe of possibilities for the speech recognition community. During the official software release, Jonathan Blues, a senior researcher at AT&T Labs, explained how to recognize speech with this new display, and how the toolkit has already played a crucial role in his research.

  5. Split the task
     • Build Acoustic models
       – Probability of phones given acoustics
     • Build Language models
       – Probability of word string

  6. Acoustic models
     • Represent all ways to say each phoneme
       – Like “templates” for each phoneme
       – Averages over multiple examples
       – Different phonetic contexts (“sow” vs “see” etc.)
       – Different people speaking
       – Different acoustic environment
       – Different channels (assume channel is similar)

  7. Better Acoustic Models
     • DTW Template
       – Could be averages over multiple examples
       – Need to be time normalized (linear interpolate or try to match)
       – Matching probabilistically: what is the probability that the example matches? Test each frame
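
The template matching on this slide can be sketched in a few lines. This is a minimal illustration and not the course's implementation: the function name `dtw_distance`, the use of scalar features per frame, and the absolute difference as the frame distance are all assumptions made for the example.

```python
def dtw_distance(template, example):
    """Minimum cumulative frame distance when time-aligning two sequences.

    Assumption: each frame is a single number and the frame distance is the
    absolute difference; real systems compare feature vectors.
    """
    n, m = len(template), len(example)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning template[:i] with example[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_dist = abs(template[i - 1] - example[j - 1])
            # Allow a frame to be stretched, compressed, or matched one-to-one.
            cost[i][j] = frame_dist + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# A time-stretched copy of the template aligns with zero cost.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

The warping is what provides the time normalization: the example may repeat or skip frames relative to the template without being penalized for the difference in length alone.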

  8. Hidden Markov Models
     • Markov Process
       – Future can be predicted from the past
     • Hidden Markov Models:
       – When the state is unknown
       – A probability is given for each state

  9. Hidden Markov Model

  10. Key Requirements

  11. Find Probability of Observation
     • Given observation O and model M
       – Efficiently find P(O|M)
     • Called decoding
       – Find the sum of all path probabilities
       – Each path probability is the product of each transition in the state sequence
     • Use dynamic programming (generalized DTW)
       – Also used in Chart Parsers, Theorem Provers
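
The sum over all paths can be computed with the forward recursion, sketched below for a discrete HMM. The function name, the list-based model layout, and the toy two-state model with three output symbols are assumptions for illustration only. Note that in this sketch each path's probability also includes the output probability of each observed symbol, as is usual for HMMs.

```python
def forward_probability(obs, start, trans, emit):
    """Return P(O|M), summing over all state paths by dynamic programming.

    obs   : non-empty list of observation symbol indices
    start : start[s]     = probability of starting in state s
    trans : trans[p][s]  = probability of moving from state p to state s
    emit  : emit[s][o]   = probability of state s producing symbol o
    """
    n_states = len(start)
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][o]
            for s in range(n_states)
        ]
    return sum(alpha)


# Hypothetical toy model: 2 hidden states, 3 observation symbols.
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

p = forward_probability([0, 1, 2], START, TRANS, EMIT)  # approximately 0.03628
```

The cost grows linearly with the number of frames, whereas enumerating every state sequence explicitly would grow exponentially.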

  12. Finding the Best Path
     • What is the most probable state sequence
     • Use Viterbi algorithm
       – Maximize best sequence
       – At each point hold a list of possible states
       – Hold a back-pointer to the best previous state
       – Accumulate values along the path
     • Because we are looking for BEST
       – Can ignore other back-pointers
       – (When looking for N-best, need a more complex structure)
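
A minimal sketch of the Viterbi search described on this slide follows. The function name and the toy model values are assumptions for illustration. The only change from summing over paths is that each step keeps the single best predecessor and records a back-pointer to it.

```python
def viterbi(obs, start, trans, emit):
    """Return (best_state_path, probability_of_that_path) for a discrete HMM."""
    n_states = len(start)
    # delta[s] = probability of the best path so far that ends in state s
    delta = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    backpointers = []  # one list of best previous states per time step
    for o in obs[1:]:
        new_delta, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: delta[p] * trans[p][s])
            pointers.append(best_prev)
            new_delta.append(delta[best_prev] * trans[best_prev][s] * emit[s][o])
        delta = new_delta
        backpointers.append(pointers)
    # Start from the best final state and follow back-pointers to the start.
    state = max(range(n_states), key=lambda s: delta[s])
    best_prob = delta[state]
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, best_prob


# Hypothetical toy model: 2 hidden states, 3 observation symbols.
START = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]

best_path, best_prob = viterbi([0, 1, 2], START, TRANS, EMIT)
# best_path is [0, 0, 1]; best_prob is approximately 0.01512
```

Keeping one back-pointer per state is enough for the single best path; an N-best search would need to retain several candidates per state.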

  13. Parameter Estimation
     • Called training
     • Use Maximum Likelihood Estimation
       – Baum-Welch (forward/backward algorithm)
       – Special case of EM (Expectation Maximization)
     • Run observation and find current probs (forward)
     • Modify probabilities to make observations best path (backward)
     • Repeat until convergence
     • Not globally optimal
       – May find local maximum
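
One re-estimation step can be sketched as follows for a single observation sequence and a discrete HMM. This is an unscaled textbook-style illustration, not the course's code: all function names and the toy model are assumptions, and it would underflow on long sequences without scaling or log arithmetic. In the standard formulation both the forward and backward passes feed the expected counts used to update the parameters.

```python
def forward(obs, start, trans, emit):
    n = len(start)
    alpha = [[start[s] * emit[s][obs[0]] for s in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[p] * trans[p][s] for p in range(n)) * emit[s][o]
                      for s in range(n)])
    return alpha


def backward(obs, trans, emit):
    n = len(trans)
    beta = [[1.0] * n]  # beta at the final frame
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(trans[s][q] * emit[q][o] * nxt[q] for q in range(n))
                        for s in range(n)])
    return beta


def baum_welch_step(obs, start, trans, emit):
    """One EM update; also returns P(O|M) under the parameters passed in."""
    n, n_sym, T = len(start), len(emit[0]), len(obs)
    alpha, beta = forward(obs, start, trans, emit), backward(obs, trans, emit)
    likelihood = sum(alpha[-1])
    # gamma[t][s]: probability of being in state s at time t, given O
    gamma = [[alpha[t][s] * beta[t][s] / likelihood for s in range(n)]
             for t in range(T)]
    # xi[t][p][s]: probability of the transition p -> s at time t, given O
    xi = [[[alpha[t][p] * trans[p][s] * emit[s][obs[t + 1]] * beta[t + 1][s]
            / likelihood for s in range(n)] for p in range(n)]
          for t in range(T - 1)]
    new_start = gamma[0][:]
    new_trans = [[sum(xi[t][p][s] for t in range(T - 1))
                  / sum(gamma[t][p] for t in range(T - 1))
                  for s in range(n)] for p in range(n)]
    new_emit = [[sum(gamma[t][s] for t in range(T) if obs[t] == k)
                 / sum(gamma[t][s] for t in range(T))
                 for k in range(n_sym)] for s in range(n)]
    return new_start, new_trans, new_emit, likelihood


# Hypothetical toy model and observation sequence.
start, trans = [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2, 0, 0, 1, 2, 2]

likelihoods = []
for _ in range(5):  # "repeat until convergence", here a fixed 5 iterations
    start, trans, emit, lik = baum_welch_step(obs, start, trans, emit)
    likelihoods.append(lik)
```

Each update cannot lower the likelihood of the training data, which is why the procedure converges, but only to a local maximum that depends on the initial parameters.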

  14. HMM recognition
     • A bunch of HMMs
       – One for each phone type
     • Each observation (e.g. 10ms frame)
       – Probability distribution of possible phone type
     • Thus can find most probable sequence
       – Use Viterbi to find best path

  15. But that’s not enough
     • But not all phones are equi-probable
     • Find the word sequence that maximizes
     • Using Bayes’ Law
     • Combine models
       – Use HMMs to provide
       – Use language model to provide
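
The formulas on this slide did not survive in the transcript. For orientation, the usual way this combination is written is shown below; the symbols W (a candidate word sequence) and O (the acoustic observations) are assumptions chosen here and may differ from the notation on the original slide.

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        = \arg\max_{W} P(O \mid W)\, P(W)
```

In this formulation the acoustic model (the HMMs) supplies P(O | W) and the language model supplies P(W). The denominator P(O) is the same for every candidate W, so it can be dropped from the maximization.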
