Speech Processing 15-492/18-492 Speech Recognition Intro Acoustic modelling HMMs

Speech Recognition
• From acoustics to text
• Acoustic modeling
– Recognizing all forms of all phonemes
• Language modeling
– Expectation of what might be said
• We need both to do recognition

Acoustics are not enough
• Last Saturday in Hawaii, numerous Waipouli vacationers were shocked to find their beach cordoned off for a UC Berkeley Drama enactment of "Personal office space". The play features exclusively topless men and women in an everyday office environment. Richard Carlson, one of the annoyed tourists and a regular swimmer at Waipouli beach, complained that they really knew how to wreck a nice beach with the nudist play. Many of the tourists appeared ruffled by the content and fled the scene to avoid compromising photos.
• In yesterday's press release, AT&T unveiled SpeechKit, its new speech recognition toolkit. According to Michael Armstrong, the COO of the company, the most innovative feature of the system is its revolutionary three-dimensional interface, which opens a new universe of possibilities for the speech recognition community. During the official software release, Jonathan Blues, a senior researcher at AT&T Labs, explained how to recognize speech with the new display, and how the toolkit has already played a crucial role in his research.

Acoustics are not enough
• Last Saturday in Hawaii, numerous Waipouli vacationers were shocked to find their beach cordoned off for a UC Berkeley Drama enactment of "Personal office space". The play features exclusively topless men and women in an everyday office environment. Richard Carlson, one of the annoyed tourists and a regular swimmer at Waipouli beach, complained that they really knew how to wreck a nice beach with this nudist play. Many of the tourists appeared ruffled by the content and fled the scene to avoid compromising photos.
• In yesterday's press release, AT&T unveiled SpeechKit, its new speech recognition toolkit. According to Michael Armstrong, the COO of the company, the most innovative feature of the system is its revolutionary three-dimensional interface, which opens a new universe of possibilities for the speech recognition community. During the official software release, Jonathan Blues, a senior researcher at AT&T Labs, explained how to recognize speech with this new display, and how the toolkit has already played a crucial role in his research.

Split the task
• Build Acoustic models
– Probability of phones given acoustics
• Build Language models
– Probability of word string

Acoustic models
• Represent all ways to say each phoneme
– Like “templates” for each phoneme
– Averages over multiple examples
– Different phonetic contexts: “sow” vs “see” etc
– Different people speaking
– Different acoustic environment
– Different channels (assume channel is similar)

Better Acoustic Models
• DTW Template
– Could be averages over multiple examples
– Need to be time normalized: linear interpolate or try to match
– Matching probabilistically: what is the probability that example matches; test each frame
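The "try to match" option for time normalization can be sketched as dynamic time warping. This is a minimal sketch, not the course's implementation: the function name `dtw_distance` is hypothetical, and frames are assumed to be single numbers compared by absolute difference, whereas real frames would be feature vectors with a vector distance.

```python
def dtw_distance(template, example):
    """Return the minimum cumulative frame distance aligning two sequences."""
    n, m = len(template), len(example)
    inf = float("inf")
    # cost[i][j] = best cumulative distance aligning template[:i] with example[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumption: scalar frames, absolute difference as local distance.
            d = abs(template[i - 1] - example[j - 1])
            # A frame may be stretched, compressed, or matched one-to-one.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

An example that is the same pattern spoken more slowly, such as `dtw_distance([1, 2, 3], [1, 2, 2, 3])`, aligns with zero cost because the repeated frame is absorbed by the warping.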

Hidden Markov Models
• Markov Process
– Future can be predicted from the past
• Hidden Markov Models:
– When the state is unknown
– A probability is given for each state

Hidden Markov Model

Key Requirements

Find Probability of Observation
• Given observation O and model M
– Efficiently find P(O|M)
– Called decoding
• Find sum of all path probabilities
– Each path probability is the product of each transition in the state sequence
• Use dynamic programming (generalized DTW)
– Also used in Chart Parsers, Theorem Provers
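The sum over all paths can be sketched with the forward recursion. This is a minimal sketch under assumptions: it uses the standard discrete HMM formulation, in which each path probability also includes the emission probability of each observation, and the names `forward_probability`, `INIT`, `TRANS` and `EMIT` and the toy numbers are hypothetical.

```python
# Hypothetical two-state toy model with two observation symbols (0 and 1).
INIT = [0.6, 0.4]                    # initial state probabilities
TRANS = [[0.7, 0.3], [0.4, 0.6]]     # TRANS[p][s] = P(next state s | state p)
EMIT = [[0.9, 0.1], [0.2, 0.8]]      # EMIT[s][o] = P(symbol o | state s)


def forward_probability(obs, init, trans, emit):
    """Return P(O|M), summing over all state sequences by dynamic programming."""
    n_states = len(init)
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][o]
            for s in range(n_states)
        ]
    return sum(alpha)


print(forward_probability([0, 1], INIT, TRANS, EMIT))
```

The cost grows linearly with the number of frames, whereas enumerating every state sequence explicitly grows exponentially.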

Finding the Best Path
• What is the most probable state sequence
• Use Viterbi algorithm
– Maximize best sequence
– At each point hold a list of possible states
– Hold back-pointer to best previous state
– Accumulate values along path
• Because we are looking for BEST
– Can ignore other back-pointers
– (When looking for N-best need more complex structure)
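The back-pointer bookkeeping described above can be sketched as follows. This is a minimal sketch assuming a discrete HMM with emission probabilities; the names `viterbi`, `INIT`, `TRANS` and `EMIT` and the toy numbers are hypothetical, and a practical recognizer would work with log probabilities to avoid underflow.

```python
# Hypothetical two-state toy model with two observation symbols (0 and 1).
INIT = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]     # TRANS[p][s] = P(next state s | state p)
EMIT = [[0.9, 0.1], [0.2, 0.8]]      # EMIT[s][o] = P(symbol o | state s)


def viterbi(obs, init, trans, emit):
    """Return (best_path, probability) for the most probable state sequence."""
    n_states = len(init)
    # best[s] = probability of the best path so far that ends in state s
    best = [init[s] * emit[s][obs[0]] for s in range(n_states)]
    backpointers = []  # one list of best previous states per later time step
    for o in obs[1:]:
        step_best, step_back = [], []
        for s in range(n_states):
            # Keep only the single best predecessor; other paths are dropped.
            prev = max(range(n_states), key=lambda p: best[p] * trans[p][s])
            step_back.append(prev)
            step_best.append(best[prev] * trans[prev][s] * emit[s][o])
        best = step_best
        backpointers.append(step_back)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: best[s])
    prob = best[state]
    path = [state]
    for step_back in reversed(backpointers):
        state = step_back[state]
        path.append(state)
    path.reverse()
    return path, prob


print(viterbi([0, 1], INIT, TRANS, EMIT))
```

The only difference from summing over all paths is that `max` replaces `sum` at each step, which is why one back-pointer per state is enough.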

Parameter Estimation
• Called training
• Use Maximum Likelihood Estimation
– Baum-Welch (forward/backward algorithm)
– Special case of EM (Expectation Maximization)
– Run observation and find current probabilities (forward)
– Modify probabilities to make observations best path (backward)
– Repeat until convergence
• Not globally optimal
– May find local maximum
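One re-estimation pass can be sketched as below. This is a sketch under assumptions rather than the slide's own procedure: it follows the standard textbook Baum-Welch update for a discrete HMM trained on a single observation sequence, and the names `forward`, `backward`, `baum_welch_step`, `INIT`, `TRANS` and `EMIT` and the toy numbers are hypothetical.

```python
# Hypothetical two-state starting model with two observation symbols (0 and 1).
INIT = [0.6, 0.4]
TRANS = [[0.7, 0.3], [0.4, 0.6]]
EMIT = [[0.9, 0.1], [0.2, 0.8]]


def forward(obs, init, trans, emit):
    """alpha[t][s] = P(obs[0..t], state s at time t)."""
    n = len(init)
    alpha = [[init[s] * emit[s][obs[0]] for s in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[p] * trans[p][s] for p in range(n)) * emit[s][o]
                      for s in range(n)])
    return alpha


def backward(obs, trans, emit):
    """beta[t][s] = P(obs[t+1..] | state s at time t)."""
    n = len(trans)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(trans[s][q] * emit[q][o] * nxt[q] for q in range(n))
                        for s in range(n)])
    return beta


def baum_welch_step(obs, init, trans, emit):
    """Return re-estimated (init, trans, emit) and the likelihood before the update."""
    n, n_symbols, T = len(init), len(emit[0]), len(obs)
    alpha, beta = forward(obs, init, trans, emit), backward(obs, trans, emit)
    likelihood = sum(alpha[-1])
    # gamma[t][s]: posterior probability of being in state s at time t
    gamma = [[alpha[t][s] * beta[t][s] / likelihood for s in range(n)]
             for t in range(T)]
    # xi[t][s][q]: posterior probability of taking transition s -> q at time t
    xi = [[[alpha[t][s] * trans[s][q] * emit[q][obs[t + 1]] * beta[t + 1][q] / likelihood
            for q in range(n)] for s in range(n)] for t in range(T - 1)]
    new_init = gamma[0][:]
    new_trans = [[sum(xi[t][s][q] for t in range(T - 1)) /
                  sum(gamma[t][s] for t in range(T - 1))
                  for q in range(n)] for s in range(n)]
    new_emit = [[sum(gamma[t][s] for t in range(T) if obs[t] == k) /
                 sum(gamma[t][s] for t in range(T))
                 for k in range(n_symbols)] for s in range(n)]
    return new_init, new_trans, new_emit, likelihood
```

Calling `baum_welch_step` repeatedly, feeding each result back in, should never lower the likelihood, but where it settles depends on the starting model, which is the local-maximum caveat noted above.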

HMM recognition
• A bunch of HMMs
– One for each phone type
• Each observation (e.g. 10ms frame)
– Probability distribution of possible phone type
• Thus can find most probable sequence
– Use Viterbi to find best path

But that’s not enough
• But not all phones are equi-probable
• Find the word sequence W that maximizes P(W|O)
• Using Bayes’ Law: P(W|O) = P(O|W) P(W) / P(O)
• Combine models
– Use HMMs to provide P(O|W)
– Use language model to provide P(W)
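How the two models combine can be sketched as follows, assuming the standard formulation in which the recognizer picks the word sequence W that maximizes P(O|W) P(W); P(O) is the same for every hypothesis and can be dropped. The function name `best_hypothesis` is hypothetical, and all scores are invented purely for illustration.

```python
import math


def best_hypothesis(acoustic_logprob, lm_logprob):
    """Pick the word sequence W maximizing log P(O|W) + log P(W)."""
    return max(acoustic_logprob, key=lambda w: acoustic_logprob[w] + lm_logprob[w])


# Hypothetical scores, invented for illustration only.
acoustic = {
    "recognize speech": math.log(0.40),      # log P(O|W) from the HMMs
    "wreck a nice beach": math.log(0.45),
}
lm = {
    "recognize speech": math.log(0.010),     # log P(W) from the language model
    "wreck a nice beach": math.log(0.0001),
}

print(best_hypothesis(acoustic, lm))
```

With these made-up numbers the acoustic score alone slightly prefers "wreck a nice beach", and the language model prior tips the decision to "recognize speech".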
