Hidden Markov Models in Speech Recognition

  1. HIDDEN MARKOV MODELS IN SPEECH RECOGNITION
     Wayne Ward, Carnegie Mellon University, Pittsburgh, PA

  2. Acknowledgements
     Much of this talk is derived from the paper "An Introduction to Hidden Markov Models" by Rabiner and Juang, and from the talk "Hidden Markov Models: Continuous Speech Recognition" by Kai-Fu Lee.

  3. Topics
     • Markov Models and Hidden Markov Models
     • HMMs applied to speech recognition
     • Training
     • Decoding

  4. Speech Recognition
     Analog Speech -> Front End -> Discrete Observations O1 O2 ... OT -> Match / Search -> Word Sequence W1 W2 ... WT

  5. ML Continuous Speech Recognition
     Goal: Given acoustic data A = a1, a2, ..., ak, find the word sequence W = w1, w2, ..., wn such that P(W | A) is maximized.
     Bayes rule:  P(W | A) = P(A | W) • P(W) / P(A)
     where P(A | W) is the acoustic model (HMMs), P(W) is the language model, and P(A) is a constant for a complete sentence.
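
As a small illustration of the decision rule above (not from the slides): since P(A) is the same for every hypothesis W, the recognizer only needs the W that maximizes P(A | W) • P(W). A minimal Python sketch with made-up log-probabilities for two hypothetical word sequences:

    # Hypothetical word sequences with made-up acoustic and language model
    # log-probabilities; only their relative values matter here.
    hypotheses = {
        "what's the willamette's location":  {"log_p_a_given_w": -120.0, "log_p_w": -4.0},
        "what's the willamette's longitude": {"log_p_a_given_w": -118.0, "log_p_w": -7.5},
    }

    # Bayes rule: argmax_W P(W | A) = argmax_W P(A | W) * P(W), since P(A) is constant.
    best = max(hypotheses, key=lambda w: hypotheses[w]["log_p_a_given_w"] + hypotheses[w]["log_p_w"])
    print(best)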

  6. Markov Models
     Elements:
     • States: S = {S0, S1, ..., SN}
     • Transition probabilities: P(q_t = S_i | q_t-1 = S_j)
     (Figure: two states A and B with transitions P(A | A), P(B | A), P(A | B), P(B | B).)
     Markov assumption: the transition probability depends only on the current state:
     P(q_t = S_i | q_t-1 = S_j, q_t-2 = S_k, ...) = P(q_t = S_i | q_t-1 = S_j) = a_ji
     a_ji >= 0 for all j, i        sum over i = 0..N of a_ji = 1 for all j
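
A minimal sketch of a Markov chain with the two-state topology above. The transition values are illustrative (the slide's figure gives none); the code checks the constraints a_ji >= 0 and sum_i a_ji = 1, then samples a state sequence in which each step depends only on the current state:

    import numpy as np

    states = ["A", "B"]
    A = np.array([[0.7, 0.3],    # P(A | A), P(B | A)
                  [0.4, 0.6]])   # P(A | B), P(B | B)

    # Stochastic constraints from the slide: a_ji >= 0 and each row sums to 1.
    assert np.all(A >= 0) and np.allclose(A.sum(axis=1), 1.0)

    # Sample a state sequence; the next state depends only on the current state.
    rng = np.random.default_rng(0)
    q = [0]
    for _ in range(10):
        q.append(rng.choice(len(states), p=A[q[-1]]))
    print([states[i] for i in q])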

  7. Single Fair Coin
     Two states, each with a self-transition of 0.5 and a transition of 0.5 to the other state.
     State 1: P(H) = 1.0, P(T) = 0.0        State 2: P(H) = 0.0, P(T) = 1.0
     Outcome head corresponds to state 1, tail to state 2.
     The observation sequence uniquely defines the state sequence.

  8. Hidden Markov Models
     Elements:
     • States: S = {S0, S1, ..., SN}
     • Transition probabilities: P(q_t = S_i | q_t-1 = S_j) = a_ji
     • Output probability distributions: P(y_t = O_k | q_t = S_j) = b_j(k)  (at state j for symbol k)
     (Figure: two states A and B with transitions P(A | A), P(B | A), P(A | B), P(B | B); each state carries its own observation probability table P(O_1 | •) ... P(O_M | •).)
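
A minimal sketch of the HMM elements on this slide: a transition matrix a_ji plus a per-state output distribution b_j(k). The numbers are illustrative, not from the slides; generating a sequence shows that only the symbols are visible while the state path stays hidden:

    import numpy as np

    symbols = ["O1", "O2"]
    A = np.array([[0.7, 0.3],
                  [0.4, 0.6]])     # A[j, i] = P(q_t = i | q_t-1 = j)
    B = np.array([[0.9, 0.1],
                  [0.2, 0.8]])     # B[j, k] = P(y_t = symbols[k] | q_t = j)

    rng = np.random.default_rng(1)
    state, obs = 0, []
    for _ in range(8):
        state = rng.choice(2, p=A[state])
        obs.append(symbols[rng.choice(2, p=B[state])])
    print(obs)   # the observation sequence; the state sequence that produced it is hidden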

  9. Discrete Observation HMM
     (Figure: chain of states, each with its own discrete output distribution.)
     State 1: P(R) = 0.31, P(B) = 0.50, P(Y) = 0.19
     State 2: P(R) = 0.50, P(B) = 0.25, P(Y) = 0.25
     State 3: P(R) = 0.38, P(B) = 0.12, P(Y) = 0.50
     Observation sequence: R B Y Y ... R (not unique to a state sequence)

  10. HMMs In Speech Recognition
      Represent speech as a sequence of observations.
      Use an HMM to model some unit of speech (phone, word).
      Concatenate units into larger units.
      (Figure: a phone model for /ih/, and a word model built by concatenating the phone models /d/ /ih/ /d/.)

  11. HMM Problems And Solutions
      Evaluation:
      • Problem - Compute the probability of an observation sequence given a model
      • Solution - Forward Algorithm and Viterbi Algorithm
      Decoding:
      • Problem - Find the state sequence which maximizes the probability of the observation sequence
      • Solution - Viterbi Algorithm
      Training:
      • Problem - Adjust model parameters to maximize the probability of observed sequences
      • Solution - Forward-Backward Algorithm

  12. Evaluation
      Probability of observation sequence O = O1 O2 ... OT given HMM model λ is:
      P(O | λ) = Σ over all Q of P(O, Q | λ)
               = Σ over all Q of a_q0q1 b_q1(O1) a_q1q2 b_q2(O2) ... a_qT-1qT b_qT(OT)
      where Q = q0 q1 ... qT is a state sequence.
      Not practical since the number of paths is O(N^T)
      N = number of states in the model, T = number of observations in the sequence
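
A direct (deliberately impractical) evaluation of this sum for the toy two-state model used in the trellis slides below. Following those slides, only paths that end in the final state are counted. With N = 2 and T = 3 there are already 2^3 paths to enumerate, and the count grows as N^T:

    from itertools import product

    # Toy model from the trellis slides: state 0 is initial, state 1 is final.
    A = [[0.6, 0.4], [0.0, 1.0]]                      # A[i][j] = transition probability i -> j
    B = [{"A": 0.8, "B": 0.2}, {"A": 0.3, "B": 0.7}]  # B[j][symbol] = output probability
    O = ["A", "A", "B"]

    total = 0.0
    for path in product(range(2), repeat=len(O)):     # all N^T paths q_1 ... q_T (q_0 = 0)
        if path[-1] != 1:                             # count only paths ending in the final state
            continue
        p, prev = 1.0, 0
        for t, q in enumerate(path):
            p *= A[prev][q] * B[q][O[t]]
            prev = q
        total += p
    print(total)                                      # P(O | model), roughly 0.13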

  13. The Forward Algorithm
      α_t(j) = P(O1 O2 ... Ot, q_t = S_j | λ)
      Compute α recursively:
      α_0(j) = 1 if j is the start state, 0 otherwise
      α_t(j) = [ Σ over i = 0..N of α_t-1(i) a_ij ] b_j(O_t),   t > 0
      P(O | λ) = α_T(S_N)
      Computation is O(N^2 T)
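
A minimal Python sketch of the forward recursion, assuming states are indexed 0..N-1 with state 0 as the start state and the observations given as symbol indices. The loop over states, with an N-term sum inside, repeated for each of the T observations, gives the O(N^2 T) cost stated on the slide:

    import numpy as np

    def forward(A, B, O, start=0):
        """alpha[t, j] = P(O_1 ... O_t, q_t = j | model); O is a list of symbol indices."""
        N, T = A.shape[0], len(O)
        alpha = np.zeros((T + 1, N))
        alpha[0, start] = 1.0                               # alpha_0(j): 1 only for the start state
        for t in range(1, T + 1):
            for j in range(N):
                alpha[t, j] = (alpha[t - 1] @ A[:, j]) * B[j, O[t - 1]]
        return alpha                                        # P(O | model) = alpha[T, end state]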

  14. Forward Trellis
      Model: state 1 (initial) and state 2 (final); transitions 1->1 = 0.6, 1->2 = 0.4, 2->2 = 1.0;
      output probabilities: state 1: A 0.8, B 0.2; state 2: A 0.3, B 0.7. Observations: A A B.
      Forward trellis values α_t(state):
                 t=0    t=1    t=2    t=3
      state 1:   1.0    0.48   0.23   0.03
      state 2:   0.0    0.12   0.09   0.13
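
Running the forward sketch above on this model reproduces the trellis. A self-contained version (slide's state 1 = index 0, state 2 = index 1, symbols A = 0 and B = 1):

    import numpy as np

    A = np.array([[0.6, 0.4],      # state 1: stay with 0.6, move to state 2 with 0.4
                  [0.0, 1.0]])     # state 2: final, self-loop 1.0
    B = np.array([[0.8, 0.2],      # state 1 emits A with 0.8, B with 0.2
                  [0.3, 0.7]])     # state 2 emits A with 0.3, B with 0.7
    O = [0, 0, 1]                  # observation sequence A, A, B

    alpha = np.zeros((len(O) + 1, 2))
    alpha[0, 0] = 1.0
    for t in range(1, len(O) + 1):
        for j in range(2):
            alpha[t, j] = (alpha[t - 1] @ A[:, j]) * B[j, O[t - 1]]
    print(np.round(alpha, 2))      # rows t=0..3: [1.0 0.0], [0.48 0.12], [0.23 0.09], [0.03 0.13]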

  15. The Backward Algorithm
      β_t(i) = P(O_t+1 O_t+2 ... O_T | q_t = S_i, λ)
      Compute β recursively:
      β_T(i) = 1 if i is the end state, 0 otherwise
      β_t(i) = Σ over j = 0..N of a_ij b_j(O_t+1) β_t+1(j),   t < T
      P(O | λ) = β_0(S_0)
      Computation is O(N^2 T)
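
A minimal sketch of the backward recursion on the same toy model (slide's state 1 = index 0, state 2 = index 1, the end state). β_0 at the start state comes out to about 0.13, matching the forward pass, and the intermediate values match the backward trellis on the next slide:

    import numpy as np

    A = np.array([[0.6, 0.4], [0.0, 1.0]])
    B = np.array([[0.8, 0.2], [0.3, 0.7]])
    O = [0, 0, 1]                          # A, A, B
    T, N, end = len(O), 2, 1

    beta = np.zeros((T + 1, N))
    beta[T, end] = 1.0                     # beta_T(i): 1 only for the end state
    for t in range(T - 1, -1, -1):
        for i in range(N):
            beta[t, i] = np.sum(A[i] * B[:, O[t]] * beta[t + 1])   # O[t] is O_t+1 in the slide's 1-based numbering
    print(np.round(beta, 2))               # rows t=0..3: [0.13 0.06], [0.22 0.21], [0.28 0.7], [0.0 1.0]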

  16. Backward Trellis
      Same model and observations (A A B) as the forward trellis.
      Backward trellis values β_t(state):
                 t=0    t=1    t=2    t=3
      state 1:   0.13   0.22   0.28   0.0
      state 2:   0.06   0.21   0.7    1.0

  17. The Viterbi Algorithm
      For decoding: find the state sequence Q which maximizes P(O, Q | λ).
      Similar to the Forward Algorithm except MAX instead of SUM:
      VP_t(i) = MAX over q0, ..., q_t-1 of P(O1 O2 ... Ot, q_t = i | λ)
      Recursive computation:
      VP_t(j) = MAX over i = 0..N of VP_t-1(i) a_ij b_j(O_t),   t > 0
      P(O, Q | λ) = VP_T(S_N)
      Save each maximum for the backtrace at the end.
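
A minimal sketch of the Viterbi recursion on the same toy model: identical to the forward pass except that the sum over predecessor states becomes a max, with each argmax saved for the backtrace. The final score rounds to the 0.06 shown in the Viterbi trellis on the next slide, and the recovered path stays in state 1 until the last frame:

    import numpy as np

    A = np.array([[0.6, 0.4], [0.0, 1.0]])
    B = np.array([[0.8, 0.2], [0.3, 0.7]])
    O = [0, 0, 1]                              # A, A, B
    T, N, start, end = len(O), 2, 0, 1

    vp = np.zeros((T + 1, N))
    back = np.zeros((T + 1, N), dtype=int)
    vp[0, start] = 1.0
    for t in range(1, T + 1):
        for j in range(N):
            scores = vp[t - 1] * A[:, j] * B[j, O[t - 1]]
            back[t, j] = np.argmax(scores)     # save each maximum for the backtrace
            vp[t, j] = scores[back[t, j]]

    # Backtrace from the end state to recover the most likely state sequence.
    path = [end]
    for t in range(T, 0, -1):
        path.append(back[t, path[-1]])
    print(vp[T, end], path[::-1])              # about 0.065, states [0, 0, 0, 1]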

  18. Viterbi Trellis
      Same model and observations (A A B) as the forward trellis.
      Viterbi trellis values VP_t(state):
                 t=0    t=1    t=2    t=3
      state 1:   1.0    0.48   0.23   0.03
      state 2:   0.0    0.12   0.06   0.06

  19. Training HMM Parameters
      Train the parameters of the HMM:
      • Tune λ to maximize P(O | λ)
      • No efficient algorithm for the global optimum
      • An efficient iterative algorithm finds a local optimum
      Baum-Welch (Forward-Backward) re-estimation:
      • Compute probabilities using the current model λ
      • Refine λ -> λ' based on the computed values
      • Use α and β from the Forward-Backward algorithm

  20. Forward-Backward Algorithm
      ξ_t(i, j) = probability of transiting from S_i to S_j at time t, given O
                = P(q_t = S_i, q_t+1 = S_j | O, λ)
                = α_t(i) a_ij b_j(O_t+1) β_t+1(j) / P(O | λ)
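
A minimal sketch of ξ written as a function of the α and β arrays from the earlier forward and backward sketches, with p_O = α_T at the end state. The indexing assumption is that O[t] in the 0-based Python list is O_t+1 in the slide's numbering:

    import numpy as np

    def xi_matrix(alpha, beta, A, B, O, p_O):
        """xi[t, i, j] = P(q_t = S_i, q_t+1 = S_j | O, model)."""
        T, N = len(O), A.shape[0]
        xi = np.zeros((T, N, N))
        for t in range(T):                     # transition taken between times t and t+1
            xi[t] = alpha[t][:, None] * A * B[:, O[t]][None, :] * beta[t + 1][None, :] / p_O
        return xi                              # each xi[t] sums to 1: some transition is taken at every step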

  21. Baum-Welch Reestimation
      a_ij = expected number of transitions from S_i to S_j / expected number of transitions from S_i
           = [ Σ over t = 0..T-1 of ξ_t(i, j) ]  /  [ Σ over t = 0..T-1, Σ over j = 0..N of ξ_t(i, j) ]
      b_j(k) = expected number of times in state j with symbol k / expected number of times in state j
             = [ Σ over t with O_t+1 = k, Σ over i = 0..N of ξ_t(i, j) ]  /  [ Σ over t = 0..T-1, Σ over i = 0..N of ξ_t(i, j) ]
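
A minimal sketch of the two re-estimation formulas, assuming xi is the (T, N, N) array from the previous sketch and O is the 0-based list of observed symbol indices:

    import numpy as np

    def reestimate(xi, O, num_symbols):
        # a_ij: expected transitions from i to j divided by expected transitions out of i
        A_new = xi.sum(axis=0) / xi.sum(axis=(0, 2))[:, None]
        # gamma[t, j] = sum_i xi_t(i, j): probability of being in state j when O_t+1 is emitted
        gamma = xi.sum(axis=1)
        # b_j(k): expected times in state j with symbol k divided by expected times in state j
        B_new = np.zeros((xi.shape[1], num_symbols))
        for t, k in enumerate(O):
            B_new[:, k] += gamma[t]
        B_new /= gamma.sum(axis=0)[:, None]
        return A_new, B_new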

  22. Convergence of FB Algorithm
      1. Initialize λ = (A, B)
      2. Compute α, β, and ξ
      3. Estimate λ' = (A', B') from ξ
      4. Replace λ with λ'
      5. If not converged, go to 2
      It can be shown that P(O | λ') > P(O | λ) unless λ' = λ.
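
Putting the pieces together, a self-contained sketch of this loop on the toy two-state model from the trellis slides. Each pass computes α, β and ξ with the current model, re-estimates (A, B), and the printed P(O | λ) should never decrease:

    import numpy as np

    A = np.array([[0.6, 0.4], [0.0, 1.0]])
    B = np.array([[0.8, 0.2], [0.3, 0.7]])
    O = [0, 0, 1]                              # observation indices for A, A, B
    T, N, start, end = len(O), 2, 0, 1

    for iteration in range(5):
        alpha = np.zeros((T + 1, N)); alpha[0, start] = 1.0
        beta = np.zeros((T + 1, N)); beta[T, end] = 1.0
        for t in range(1, T + 1):
            for j in range(N):
                alpha[t, j] = (alpha[t - 1] @ A[:, j]) * B[j, O[t - 1]]
        for t in range(T - 1, -1, -1):
            for i in range(N):
                beta[t, i] = np.sum(A[i] * B[:, O[t]] * beta[t + 1])
        p_O = alpha[T, end]
        print(iteration, p_O)                  # should be non-decreasing across iterations

        xi = np.array([alpha[t][:, None] * A * B[:, O[t]][None, :] * beta[t + 1][None, :]
                       for t in range(T)]) / p_O
        gamma = xi.sum(axis=1)                 # gamma[t, j] = P(q_t+1 = S_j | O, model)
        A = xi.sum(axis=0) / xi.sum(axis=(0, 2))[:, None]
        B_new = np.zeros_like(B)
        for t, k in enumerate(O):
            B_new[:, k] += gamma[t]
        B = B_new / gamma.sum(axis=0)[:, None]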

  23. HMMs In Speech Recognition
      Represent speech as a sequence of symbols.
      Use an HMM to model some unit of speech (phone, word).
      Output probabilities - probability of observing a symbol in a state
      Transition probabilities - probability of staying in or skipping a state
      (Figure: phone model.)

  24. Training HMMs for Continuous Speech
      • Use only the orthographic transcription of the sentence
      • No need for segmented/labelled data
      • Concatenate phone models to give a word model
      • Concatenate word models to give a sentence model
      • Train the entire sentence model on the entire spoken sentence

  25. Forward-Backward Training for Continuous Speech
      (Figure: sentence model for "SHOW ALL ALERTS" built by concatenating the phone models SH OW AA L AX L ER TS.)
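
A minimal sketch of the concatenation described on the previous slide, using this slide's SHOW ALL ALERTS example. The lexicon entries and the 3-states-per-phone topology are illustrative assumptions, not given on the slides; a real system would plug in trained phone HMMs:

    # Pronunciations assumed for illustration only.
    lexicon = {"SHOW": ["SH", "OW"], "ALL": ["AA", "L"], "ALERTS": ["AX", "L", "ER", "TS"]}
    transcription = ["SHOW", "ALL", "ALERTS"]   # the orthographic transcription is all that is needed

    STATES_PER_PHONE = 3                        # e.g. a 3-state left-to-right phone model

    # Chain phone models into word models and word models into one sentence model.
    sentence_states = []
    for word in transcription:
        for phone in lexicon[word]:
            for s in range(STATES_PER_PHONE):
                sentence_states.append(f"{word}/{phone}/{s}")
    print(len(sentence_states))                 # one long left-to-right state sequence (24 states here)
    print(sentence_states[:6])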

  26. Recognition Search
      (Figure: search network built from phone models, e.g. /w/ -> /ah/ -> /ts/ for "what's" and /th/ -> /ax/ for "the", connecting words such as willamette's, sterett's, kirk's, location, longitude, latitude, display.)
