Speech Processing 15-492/18-492 Speech Recognition Template - PowerPoint PPT Presentation


  1. Speech Processing 15-492/18-492 Speech Recognition Template matching

  2. Speech Recognition by Templates • A little history … • Matching Templates • DTW (Dynamic Time Warping) • Beyond template matching

  3. Radio Rex (1922) • Toys always lead technology … • Call “Rex” and he comes out of his kennel • (Crystalradio.com and Rhys Jones)

  4. Toy ASR “Tricks” • Radio Rex – Recognizes vowel formants in “EH” • Voice activated toy train – Multilingual stop/go: hashire/tomate • Toy “pets” don’t need perfect ASR

  5. Template Matching • Record templates from user – Store in library • Record ASR example – Compare against each library template • Select closest example • For example … – On a voice dialing system

  6. Voice Dialing System • Library – Mom – Dad – Bob – Mario’s Pizza – Let’s Go Bus Information System

  7. Matching in Time Domain • Duration – Will discriminate some examples – But Mom, Bob and Dad will be confused • What about spectral properties?

  8. Matching in Frequency Domain [figure: “Mom” and “Bob” shown in the frequency domain]

  9. Different deliveries • We change durations – Two utterances are never the same • When it fails we change our delivery – Become more articulate – “clearer”

  10. Dynamic Time Warping [figure: Template aligned against Sample Speech]

  11. DTW algorithm [figure: grid with Template squares i-1, i on one axis and Sample squares j-1, j on the other] • For each square – Dist(template[i],sample[j]) + smallest_of(Dist(template[i-1],sample[j]), Dist(template[i],sample[j-1]), Dist(template[i-1],sample[j-1])) • Remember which choice you took (count path)
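
The per-square rule above can be sketched in Python. This is a minimal illustration, not the course's implementation: it reads the Dist terms for the three neighboring squares as the accumulated scores already stored in those squares (the usual DTW recurrence), and it assumes each frame is a single number compared with a hypothetical `dist` helper. Real systems would compare spectral feature vectors instead.

```python
def dist(a, b):
    """Hypothetical frame distance: absolute difference of two scalar frames."""
    return abs(a - b)


def dtw(template, sample):
    """Return (total cost, alignment path) for two non-empty sequences."""
    n, m = len(template), len(sample)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]   # accumulated score per square
    back = [[None] * m for _ in range(n)]  # which neighbor each square came from
    for i in range(n):
        for j in range(m):
            d = dist(template[i], sample[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            # Neighbors allowed by the slide: (i-1, j), (i, j-1), (i-1, j-1).
            candidates = []
            if i > 0:
                candidates.append((cost[i - 1][j], (i - 1, j)))
            if j > 0:
                candidates.append((cost[i][j - 1], (i, j - 1)))
            if i > 0 and j > 0:
                candidates.append((cost[i - 1][j - 1], (i - 1, j - 1)))
            best_cost, best_prev = min(candidates)
            cost[i][j] = d + best_cost
            back[i][j] = best_prev  # remember which choice was taken
    # Walk the remembered choices back from the final square.
    path = []
    cell = (n - 1, m - 1)
    while cell is not None:
        path.append(cell)
        cell = back[cell[0]][cell[1]]
    path.reverse()
    return cost[n - 1][m - 1], path
```

For example, `dtw([1, 2, 3], [1, 2, 2, 3])` aligns the stretched middle frame at no cost, which is the behavior that lets DTW absorb differences in duration.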

  12. Multiple Templates • Compare against each • Find closest • Need to normalize scores – (divide by length of matches)
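
One way to read "divide by length of matches" is to divide the accumulated DTW score by the number of squares on the chosen path, so that long words are not penalized merely for being long. The sketch below assumes scalar frames and an absolute-difference distance; the function name `dtw_normalized` is illustrative only.

```python
def dtw_normalized(template, sample):
    """DTW score divided by the number of squares on the best path."""
    n, m = len(template), len(sample)
    inf = float("inf")
    # Each square stores (accumulated cost, path length so far).
    grid = [[(inf, 0)] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(template[i] - sample[j])
            if i == 0 and j == 0:
                grid[i][j] = (d, 1)
                continue
            prev = []
            if i > 0:
                prev.append(grid[i - 1][j])
            if j > 0:
                prev.append(grid[i][j - 1])
            if i > 0 and j > 0:
                prev.append(grid[i - 1][j - 1])
            # Lowest cost wins; ties go to the shorter path.
            best_cost, best_len = min(prev)
            grid[i][j] = (best_cost + d, best_len + 1)
    total, length = grid[n - 1][m - 1]
    return total / length
```

With this normalization, a two-frame and a six-frame word that are each off by one unit per frame receive the same score, whereas the raw totals would be 2 and 6.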

  13. Matching Templates [figure: Sample compared against a Template Library of Word0, Word1, Word2, …]
      BestScore = infinity
      For Word in Templates
        Score = dtw(Template[Word], Sample);
        if (Score < BestScore) { BestScore = Score; BestWord = Word; }
      DoAction(Action[BestWord])
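
The matching loop might look like this in Python. Everything here is a sketch under assumptions: the toy `library` contents, the scalar frames, and the normalization by `len(template) + len(sample)` (a simple stand-in for path length) are illustrative choices, not taken from the slides.

```python
def dtw_score(template, sample):
    """Length-normalized DTW cost using scalar frames (illustrative only)."""
    inf = float("inf")
    n, m = len(template), len(sample)
    # Padded grid: acc[0][0] is the empty alignment, everything else starts at infinity.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(template[i - 1] - sample[j - 1])
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


def recognize(library, sample):
    """Return (best word, best score) over every template in the library."""
    best_word, best_score = None, float("inf")
    for word, template in library.items():
        score = dtw_score(template, sample)
        if score < best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Hypothetical one-dimensional "templates" for a voice dialer.
library = {"mom": [1, 5, 5, 1], "dad": [3, 7, 7, 3], "bob": [2, 9, 9, 2]}
word, score = recognize(library, [1, 5, 5, 5, 1])
```

Here `word` comes out as `"mom"` because the sample is a time-stretched copy of that template.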

  14. DTW issues • What happens with no-matches – Need to deal with none of the above • What happens with more templates – Harder to choose between – Once variance greater than differences • Choose templates that are very different
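
A common way to handle "none of the above" is to reject the best match when its score is too high, or when it is not clearly better than the runner-up. The slides do not specify a method, so the sketch below is only one possibility; `reject_above` and `min_margin` are hypothetical tuning parameters, and `scores` is assumed to map each word to its normalized DTW score.

```python
def pick_word(scores, reject_above, min_margin=0.0):
    """Return the best-scoring word, or None for 'none of the above'."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    best_word, best_score = ranked[0]
    if best_score > reject_above:
        return None  # even the closest template is too far away
    if len(ranked) > 1 and ranked[1][1] - best_score < min_margin:
        return None  # top two templates are too close to choose between
    return best_word
```

The margin check reflects the point that adding templates makes the choice harder: once two templates score nearly the same, declining to answer may be safer than guessing.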

  15. DTW/Template Applications • Voice dialer • Simple command and control • Speaker ID

  16. Speaker ID [figure: Sample compared against a Template Library of Speaker0, Speaker1, Speaker2, …]
      BestScore = infinity
      For Speaker in Templates
        Score = dtw(Template[Speaker], Sample);
        if (Score < BestScore) { BestScore = Score; BestSpeaker = Speaker; }

  17. DTW • Advantages – Works well for small number of templates (<20) – Language independent – Speaker specific – Easy to train (end user controls it) • Disadvantages – Limited number of templates – Speaker specific – Need actual training examples

  18. More reliable matching • Distance metric – Euclidean • But some distances are bigger than others – Silence is pretty similar – Fricatives give much larger distances • A longer fricative might give a larger score • A longer vowel might give a smaller score

  19. More reliable matching • Having multiple template examples – Individual matches or – Average them together • DTW align all of the examples • Collect statistics as a Gaussian – Mean and standard deviation for each coeff
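
Collecting the statistics can be sketched as follows, assuming the examples have already been DTW-aligned so that every example has the same number of frames and each frame is a list of coefficients. The function name and the data layout `aligned_examples[example][frame][coeff]` are illustrative assumptions.

```python
import math


def gaussian_template(aligned_examples):
    """Per-frame, per-coefficient mean and standard deviation."""
    n_examples = len(aligned_examples)
    n_frames = len(aligned_examples[0])
    n_coeffs = len(aligned_examples[0][0])
    means, stds = [], []
    for t in range(n_frames):
        frame_means, frame_stds = [], []
        for k in range(n_coeffs):
            values = [example[t][k] for example in aligned_examples]
            mu = sum(values) / n_examples
            # Population variance; a real system would likely floor small values.
            var = sum((v - mu) ** 2 for v in values) / n_examples
            frame_means.append(mu)
            frame_stds.append(math.sqrt(var))
        means.append(frame_means)
        stds.append(frame_stds)
    return means, stds
```

For two one-frame examples `[[1.0, 10.0]]` and `[[3.0, 10.0]]`, the first coefficient gets mean 2.0 and standard deviation 1.0, while the second gets mean 10.0 and standard deviation 0.0.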

  20. More reliable distances • Instead of Euclidean distance – Doesn’t care about the standard deviation • Use Mahalanobis distance – Cares about means and standard deviation
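
The difference between the two metrics can be illustrated with a small sketch. It assumes independent coefficients, so the Mahalanobis distance reduces to scaling each coefficient by its own standard deviation (a diagonal covariance); the general form uses a full covariance matrix. The `floor` parameter is a hypothetical guard against dividing by a zero standard deviation.

```python
import math


def euclidean(frame, mean):
    """Plain Euclidean distance between a frame and a template mean."""
    return math.sqrt(sum((x - m) ** 2 for x, m in zip(frame, mean)))


def mahalanobis_diag(frame, mean, std, floor=1e-6):
    """Mahalanobis distance under a diagonal-covariance assumption."""
    return math.sqrt(
        sum(((x - m) / max(s, floor)) ** 2 for x, m, s in zip(frame, mean, std))
    )


mean = [0.0, 0.0]
std = [1.0, 10.0]    # the second coefficient varies a lot between examples
frame_a = [2.0, 0.0]  # off by 2 in the stable coefficient
frame_b = [0.0, 2.0]  # off by 2 in the noisy coefficient
```

Euclidean distance treats `frame_a` and `frame_b` as equally far from the mean, whereas the Mahalanobis variant scores `frame_b` as much closer because a deviation of 2 is small relative to that coefficient's spread.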

  21. Extending Template matching • String word templates together – Need to find word segmentation [figure: Word0 Word1 Word2 …] • But there are many words …

  22. Extending template model • String phoneme templates together – A template model for each phoneme [figure: Phoneme Templates Phone0, Phone1, Phone2, … matched against Sample “k ae t”]

  23. Summary • Speech Recognition by Templates – Good for simple small vocabulary tasks • Dynamic Time Warping (DTW) – Can match different durational examples • Averaging over multiple models • Distance metrics – Euclidean vs Mahalanobis
