
How to Wreck a Nice Beach: Theory and Practice (Paul Hsu, CSAIL)



  1. How to Wreck a Nice Beach: Theory and Practice
     Paul Hsu, CSAIL Spoken Language Systems
     March 6, 2007

  2. Speech Recognition Today
     - Dictation
       - Transcribe spoken words to text
       - Support punctuation and correction
       - Dragon NaturallySpeaking (2004)
     - Interactive Voice Response
       - System-initiated dialog
       - Saturday Night Live Mock (2005)
     6.Insight - How to Wreck a Nice Beach

  3. Theory

  4. Speech Recognition Overview
     [Diagram: Speech Signal, Representation, Search, Recognized Words, with Acoustic Models, Lexical Models, and Language Models; a lattice of phone labels (a, r, z, m, -) over time steps t0 to t8.]

  5. Speech Signal Processing
     [Figure: Speech Spectrum, labeled "9 Davis Square, Somerville"; Energy (dB) vs. Frequency (Hz), 0 to 8000 Hz.]
     [Figure: MFCC Features (C0 - C12) vs. Frame (1 sec = 100 frames).]
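
The frame axis on the MFCC plot (1 sec = 100 frames) implies a 10 ms step between frames. The framing step alone might be sketched as below. The 16 kHz sampling rate is an assumption (it is consistent with the spectrum's 0 to 8000 Hz range), the 25 ms window is a common choice that the slide does not state, and `frame_signal` is a hypothetical helper name. The MFCC computation itself is not shown.

```python
def frame_signal(samples, sample_rate=16000, step_ms=10, window_ms=25):
    """Split a sequence of samples into overlapping fixed-length frames.

    step_ms=10 corresponds to the slide's 100 frames per second.
    sample_rate and window_ms are assumptions, not values from the slide.
    """
    step = sample_rate * step_ms // 1000      # samples between frame starts
    window = sample_rate * window_ms // 1000  # samples per frame
    frames = []
    start = 0
    # Keep only frames that fit entirely inside the signal.
    while start + window <= len(samples):
        frames.append(samples[start:start + window])
        start += step
    return frames


one_second = [0.0] * 16000   # one second of silence at the assumed 16 kHz
frames = frame_signal(one_second)
print(len(frames), len(frames[0]))
```

Because only whole windows are kept, a one-second signal yields slightly fewer than 100 frames here; padding the end of the signal is one way to recover the rest.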

  6. Acoustic Modeling
     Techniques
     - Pattern match
     - Dim reduction
     Challenges
     - Lots of overlap
     - Data annotation
     - Speaker / Accent
     - Noise

  7. Lexical Modeling
     Techniques
     - Dictionary
     - Pron generation
     Challenges
     - Missing words
       - 6.Insight
     - Pron variation
       - Nice
       - Stata
     Dictionary excerpt (word, pronunciation; "..." marks elided entries):
       a           (ax | ey)
       ...
       beach       b iy ch
       ...
       nice        n (iy | ay) s
       ...
       recognize   r eh k ax gd n ay z
       ...
       speech      s p- iy ch
       ...
       stata       s t- (ey | aa) tf ax
       ...
       tomato      t ax m (ey | aa) tf ow
       ...
       wreck       r eh kd
       ...
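
The dictionary entries on this slide write pronunciation variants as `(x | y)` groups, for example `n (iy | ay) s` for "nice". One entry could be expanded into its individual phone sequences roughly as follows. `expand_pronunciation` is a hypothetical helper, and it handles only the flat, non-nested groups that appear on the slide.

```python
import itertools
import re


def expand_pronunciation(pattern):
    """Expand a pattern such as 'n (iy | ay) s' into all phone sequences.

    Handles only flat, non-nested '(x | y)' groups, as in the slide's examples.
    """
    # Split into parenthesized groups and the plain text between them.
    parts = re.split(r"(\([^()]*\))", pattern)
    choices = []
    for part in parts:
        part = part.strip()
        if not part:
            continue
        if part.startswith("("):
            # A group contributes one alternative per '|'-separated option.
            choices.append([alt.split() for alt in part[1:-1].split("|")])
        else:
            # Plain phones contribute a single fixed option.
            choices.append([part.split()])
    variants = []
    for combo in itertools.product(*choices):
        phones = [p for piece in combo for p in piece]
        variants.append(" ".join(phones))
    return variants


print(expand_pronunciation("n (iy | ay) s"))
print(expand_pronunciation("t ax m (ey | aa) tf ow"))
```

An entry with no groups, such as `b iy ch`, comes back as a single variant.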

  8. Language Modeling
     Purpose
     - Constrain word order
     - Assign probability
       1. recognize speech
       2. wreck a nice beach
     Techniques
     - Context-free grammar
     - N-gram
     Challenges
     - Data sparsity
     - Domain adaptation
     [Figure: N-gram probability excerpt; "..." marks elided entries]
       Unigrams: 0.036 the, 0.026 a, 0.018 of, ..., 0.007 good, 0.005 day, ..., 0.003 morning, ...
       Bigrams:  ..., 0.011 a good, 0.003 a morning, ..., 0.086 good morning, 0.026 good day, ..., 0.149 of a, 0.057 of day, ...
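
To illustrate how an N-gram model assigns a probability to a word sequence, here is a minimal bigram scorer built from the probabilities visible on the slide. Two assumptions: the two-word entries are read as conditional probabilities P(second word | first word), which seems likely because 0.086 for "good morning" exceeds the 0.007 for "good" alone, and no smoothing is applied. `log_prob` is a hypothetical helper name.

```python
import math

# Probabilities as read from the slide's N-gram figure (an excerpt, not a full model).
UNIGRAM = {"the": 0.036, "a": 0.026, "of": 0.018,
           "good": 0.007, "day": 0.005, "morning": 0.003}
# Assumption: two-word entries are conditional, P(second word | first word).
BIGRAM = {("a", "good"): 0.011, ("a", "morning"): 0.003,
          ("good", "morning"): 0.086, ("good", "day"): 0.026,
          ("of", "a"): 0.149, ("of", "day"): 0.057}


def log_prob(words):
    """Return log P(words) under the bigram excerpt, or -inf if unseen."""
    if not words or words[0] not in UNIGRAM:
        return float("-inf")
    total = math.log(UNIGRAM[words[0]])
    for prev, cur in zip(words, words[1:]):
        p = BIGRAM.get((prev, cur))
        if p is None:
            return float("-inf")  # no smoothing: unseen pairs get zero probability
        total += math.log(p)
    return total


for phrase in ("a good morning", "a good day", "good a morning"):
    print(phrase, log_prob(phrase.split()))
```

Any word pair missing from the table drives the whole sequence to zero probability, which is the data sparsity challenge listed on the slide; practical systems smooth or back off to lower-order estimates instead.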

  9. Search
     Techniques
     - Dyn programming
     - A* search
     - Pruning
     Challenges
     - Huge search space
     [Figure: lattice of Lexical Nodes (a, r, z, m, -) over Time steps t0 to t8, with a backtrace through the path "- m a r z -".]
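
The dynamic programming and backtrace steps named on this slide can be sketched as a small Viterbi search over a lattice of states and time frames. This is a toy illustration, not the recognizer's actual search: the three states, four frames, and all scores below are invented for the example, and A* search and pruning are not shown.

```python
import math


def viterbi(states, start_lp, trans_lp, emit_lp):
    """Find the best state path through a lattice by dynamic programming.

    start_lp[s]      log score of starting in state s
    trans_lp[r][s]   log score of moving from state r to state s
    emit_lp[t][s]    log score of state s at frame t
    """
    n_frames = len(emit_lp)
    best = [{s: start_lp[s] + emit_lp[0][s] for s in states}]
    back = [{}]
    for t in range(1, n_frames):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor for state s at frame t.
            prev = max(states, key=lambda r: best[t - 1][r] + trans_lp[r][s])
            best[t][s] = best[t - 1][prev] + trans_lp[prev][s] + emit_lp[t][s]
            back[t][s] = prev
    # Backtrace from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


def logs(d):
    """Convert a dict of probabilities to log scores."""
    return {k: math.log(v) for k, v in d.items()}


# Toy lattice: all numbers are made up for illustration.
STATES = ["-", "m", "a"]
START = logs({"-": 0.8, "m": 0.1, "a": 0.1})
TRANS = {"-": logs({"-": 0.5, "m": 0.5, "a": 1e-9}),
         "m": logs({"-": 1e-9, "m": 0.5, "a": 0.5}),
         "a": logs({"-": 1e-9, "m": 1e-9, "a": 0.999})}
EMIT = [logs({"-": 0.9, "m": 0.05, "a": 0.05}),
        logs({"-": 0.1, "m": 0.8, "a": 0.1}),
        logs({"-": 0.1, "m": 0.8, "a": 0.1}),
        logs({"-": 0.1, "m": 0.1, "a": 0.8})]

path, score = viterbi(STATES, START, TRANS, EMIT)
print(path, score)
```

The table holds one score per state per frame, so the cost grows with the number of states times the number of frames; with a full vocabulary that product is the huge search space listed under Challenges, which is why pruning matters in practice.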

  10. Practice

  11. Command & Control
      Microsoft Windows Vista Speech Recognition
      Features
      - Control PC apps
      - Dictate documents
      - Accessibility
      Challenges
      - Constrained cmds
      - User training

  12. Interactive Dialog Systems
      SLS City Browser
      http://web.sls.csail.mit.edu/city/
      Features
      - Restaurants, POI
      - Free-form dialog
      - Query refinement
      - Multimodal control
      Challenges
      - Labor intensive
      - Data collection

  13. Audio Indexing & Search
      SLS Lecture Browser
      http://web.sls.csail.mit.edu/lectures/
      Features
      - Keyword search
      - Topic segmentation
      - Lecture transcript
      - A/V navigation
      Challenges
      - Disfluencies
      - Jargon

  14. Mobile Speech Recognition
      SLS Pocket SUMMIT Speech Recognizer
      Features
      - Small-footprint
      - Low CPU/memory
      Challenges
      - Noise robustness
      - Limited grammar

  15. Challenges

  16. Noise Robustness
      Microphone quality
      - Close-Talking Headset
      - Bluetooth Headset
      - Mounted GPS
      Environmental/Background noise
      - Music
      - Babble
      - Heating Vent

  17. Adaptation
      Speaker Adaptation
      - Gender
      - Accent
      Domain Adaptation
      - GPS navigation
      - Lecture transcription

  18. Application Diversity
      Labor Intensive
      - Few Applications
      - Weather
      - Flight Reservation
      - Restaurants
      Can the system automatically generate spoken dialogue systems via user feedback?

  19. Resources

  20. Related Courses
      [Diagram: courses arranged around the speech recognition overview (Speech Signal, Representation, Search, Recognized Words; Acoustic Models, Lexical Models, Language Models).]
      - Machine Learning: 6.825, 6.867
      - Linguistics: 24.901
      - Natural Lang Proc: 6.864
      - Signal Processing: 6.003, 6.011, 6.341
      - Algorithms: 6.034, 6.046, 6.851
      - Acoustic Phonetics: 6.541, 6.543, 6.551, 6.552
      - Speech Recognition: 6.345

  21. Research Groups @ MIT
      CSAIL Spoken Language Systems Group
      - PIs: James Glass, Stephanie Seneff, Victor Zue
      - Research: Speech recognition and dialog systems
      - http://groups.csail.mit.edu/sls/
      RLE Speech Communications Group
      - PIs: Kenneth Stevens, Stefanie Shattuck-Hufnagel
      - Research: Speech production and perception
      - http://www.rle.mit.edu/speech/

  22. External Opportunities
      Companies & Research Labs (alphabetical order)
      - AT&T
      - BBN
      - Google
      - IBM
      - Microsoft
      - Nuance
      - SRI
      - VoiceSignal Technology
      - Yahoo
      - …

  23. Conclusion
      To wreck a nice beach, you need:
      - Shovel
      - Bulldozer
      - …
      Questions?
      Paul Hsu
      bohsu@mit.edu
      32-G442
