
Speech Processing 15-492/18-492: Using Speech with Computers

Speech Processing 15-492/18-492: Using Speech with Computers. Alan W Black, August 2008. Overview: Practical and Theory (understand concepts, implement solutions).


  1. Speech Processing 15-492/18-492
     - Using Speech with Computers
     - Alan W Black
     - August 2008

  2. Overview
     - Practical and Theory:
       - Understand concepts, Implement Solutions
     - Speech Recognition
       - Speech to text
     - Speech Synthesis
       - Text to Speech
     - Spoken Dialog Systems
       - Interaction with machines

  3. Course Schedule
     - MWF 3:30-4:20
     - DH 1117
     - Lecturer: Alan W Black (awb@cs.cmu.edu)
     - TA: David Huggins (dhuggins@cs.cmu.edu)
     - http://www.speech.cs.cmu.edu/15-492/

  4. Course Details
     - Three lectures a week
     - 4 Homeworks
       - Speech Recognition
       - Speech Synthesis
       - Spoken Dialog System
       - Other
     - Final Exam

  5. Homeworks
     - (Mostly) Practical
       - Build something that talks/can be spoken to
       - Software and speech data will be provided
         - Will run on Windows/Linux or OSX
         - Access to Linux servers if required
     - Written description of what you did

  6. Schedule Details
     - Week 1 (Aug 15th)
       - Applications, Human and Computer Speech Processing
     - Week 2-4 (Sep 3rd) Speech Recognition
       - Signal representation, acoustic modeling
       - Language modeling, applications
       - Tuning, evaluation, expectations

  7. Course Details
     - Week 5-7 (22nd Sep) Speech Synthesis
       - Text processing, prosody, waveform synthesis
       - Building voices, evaluations, voice conversion
     - Week 8 (13th Oct) Multilinguality
       - Supporting new languages efficiently
     - Week 9-11 (20th Oct) Dialog Systems
       - VoiceXML, Mixed initiative, barge-in
       - Design, installation and tuning.

  8. Course Details
     - Week 12 (10th Nov)
       - Speech to Speech translation
       - Language support, tight integration
     - Week 13 (17th Nov)
       - Evaluation and expectations
     - Week 14 (24th)
       - Speaker ID, Silent Speech, Conversion
       - What still needs to be done.
     - Week 15 (1st Dec)
       - Exam

  9. Why Speech
     - Most natural way to communicate
       - (For Humans)
     - Not ideal for everything
       - Graphics and text can be better (sometimes)
     - Doesn’t compress well
     - Hard to search

  10. Compression
      - Alice in Wonderland
        - Text
          - 150K uncompressed
          - 43K compressed
        - Speech (2hrs 20mins)
          - 270M uncompressed
          - 600K compressed (mp3, 24KBS)
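
As a rough check on the speech figures in the Compression slide, the arithmetic can be sketched as follows. The recording format is an assumption: the slide does not state it, and 16 kHz, 16-bit mono PCM is used here only because it reproduces the roughly 270M uncompressed figure. "24KBS" is read as 24 kbit/s, which is also an assumption. Under that reading a constant-bitrate file comes out nearer 25M than the 600K shown on the slide, so the compressed figure may need checking against the original deck.

```python
# Back-of-the-envelope storage estimate for a 2 h 20 min recording.
# Assumptions (not stated on the slide): 16 kHz, 16-bit, mono PCM,
# and "24KBS" interpreted as a constant 24 kbit/s MP3 bitrate.

DURATION_S = (2 * 60 + 20) * 60   # 2 hrs 20 mins in seconds
SAMPLE_RATE_HZ = 16_000           # assumed sampling rate
BYTES_PER_SAMPLE = 2              # assumed 16-bit samples
MP3_BITRATE_BPS = 24_000          # assumed reading of "24KBS"


def pcm_bytes(duration_s: int, sample_rate_hz: int, bytes_per_sample: int) -> int:
    """Size of uncompressed mono PCM audio in bytes."""
    return duration_s * sample_rate_hz * bytes_per_sample


def constant_bitrate_bytes(duration_s: int, bits_per_second: int) -> int:
    """Size of a constant-bitrate stream in bytes."""
    return duration_s * bits_per_second // 8


uncompressed = pcm_bytes(DURATION_S, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE)
compressed = constant_bitrate_bytes(DURATION_S, MP3_BITRATE_BPS)

print(f"Uncompressed PCM: {uncompressed / 1e6:.1f} MB")
print(f"24 kbit/s stream: {compressed / 1e6:.1f} MB")
print(f"Audio compression ratio: {uncompressed / compressed:.1f}x")
```

Either way, the point of the slide holds: even compressed speech is orders of magnitude larger than the 150K of text it carries.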

  11. Searching
      - Find all NPR broadcasts mentioning Obama
        - Listen to them all
      - From lecture recordings
        - Find all occurrences of “this will be in the exam”
      - So listen to it faster …
        - Normal 2x speed
        - 2x 4x 8x

  12. Eyes/Hands Free
      - Interaction when driving
        - Look at screen to see next turnoff
        - “In 200 yards turn right onto Murray Ave.”
      - Blind users/Assistive technology
        - Text isn’t very useful
      - Alerts
        - “Will self-destruct in 10 seconds” vs blinking light
      - Telephone dialog systems

  13. Speech Applications
      - Command and Control
      - Information Agents
      - Speech to Speech Translation
      - Speech summarization
        - Lecture or Meeting summarization
      - Transcription/Dictation
      - Speaker Identification
        - emotion/dialect/language
      - Language Learning

  14. “Hot” Commercial Applications
      - Location-based services:
        - Yahoo GO
        - Google Maps
        - Microsoft Live Search
      - All phone/pda based
        - Use speech-in
        - Directions speech-out

  15. Other Speech uses
      - Spoken Dialog Systems
        - Let’s Go Public 412 268 3526 evenings 412 442 2000
        - Pittsburgh bus timetables by phone
      - Assistive Technologies
        - Screen readers
        - Augmentative and assistive communication devices
      - On-line Personalization
        - Blogcasts (your voice, or appropriate voice)
        - Game character customization
      - Talking Heads
        - CMU’s roboceptionist
      - Singing Synthesis
        - XML interface for song specification
