Speech Processing 15-492/18-492: Current Topics and Future Challenges


  1. Speech Processing 15-492/18-492
     Speech Processing Current Topics and Future Challenges: Commercial and Research

  2. Current and Future
     - What are the hot topics in speech?
       - What currently works
       - What could work soon (5-10 years)
     - What are the industry hot topics?
     - What are the research challenges?

  3. Spoken Dialog: Now
     - Industry:
       - Location based querying
         - Google: 411, smartphone
         - Microsoft Live Search: smartphone
         - Yahoo (Vlingo)
       - BlackBerry, iPhone
         - (Owners have money)
       - How do you make money out of this …

  4. Spoken Dialog: Now
     - Research:
       - Error recovery
       - Adaptive systems
       - Rapid deployment
       - Learning dialog structure from data

  5. ASR: Now
     - Industry:
       - Moving from grammar based to N-gram based
       - Broadcast news transcription for IR
       - Robust speech recognition:
         - In car, outside, in noisy office
       - LM adaptation from other sources
         - Using click through and search queries
       - Pronunciation variants (“wrong” ones too)
       - Medical transcription

  6. ASR: Now
     - Research:
       - Discriminative training
         - Acoustic parameter projections to discriminate between the correct answers and competitors
       - Robust recognition
         - Far field microphones
         - Blind source separation
       - Out of vocabulary words
       - Unsupervised training

  7. TTS: Now
     - Industry:
       - Building custom voices (and your voice)
       - Multilingual on small devices
         - E.g. for GPS navigation over Europe
       - Easy methods to build new languages

  8. TTS: Now
     - Research:
       - Improving statistical synthesis
       - Rapid support in new languages
       - Emotional speech synthesis
       - Automatic building of voices from data
         - Without any human intervention
       - Synthesis beyond the sentence
         - Synthesis with more text analysis

  9. Speech to Speech Translation
     - Industry:
       - One way systems, domain limited systems
       - Simple targeted cell phone systems
     - Research:
       - Two way systems, large domains
       - One way lecture/broadcast news

  10. VC and SID: Now
      - Voice conversion:
        - Cross lingual voice conversion
        - Emotion/style conversion
        - Conversion without training data
      - Speaker ID:
        - Accuracy on large data sets (> 1000 speakers)
        - Cross channel/language ID
        - More information in ID (prosody, vocab)

  11. CALL: Now
      - Industry:
        - Pronunciation training
        - Scenario practicing
      - Research:
        - Game based tools
        - Measuring educational contribution

  12. Speech Processing Future
      - Hard challenges (PhD topics and beyond)
      - All on the research side
        - But maybe in research labs

  13. Speech Reco without Speech
      - Using other modalities:
        - Lip movement, muscle movement
        - Silent speech
          - No generated audio
          - Just think about the words
        - Gesture recognition

  14. Conversational Systems
      - Participant in a meeting:
        - True conversational speech
          - Appropriate non-word speech generation
          - Know when to speak, when to laugh, when to listen
          - Appropriate timing conversation
          - Able to interrupt when having something to say
        - Have something to say

  15. Summaries and Discussions
      - Describe a paper/movie/event:
        - Appropriate summary
        - Allow questions
        - Know when to use style/emotion
        - Not just speech<->text
          - Understand more of the text content

  16. Final Notes
      - Don’t forget to fill in the Faculty Course Evaluation
      - Final homework due:
        - 3:30pm Monday 8th
      - Final exam:
        - 1pm-4pm Tuesday 16th, WeH 6423
