
Speech Technology for Mobile Phones Part I: ASR and TTS on the Mobile Phone

Rajesh M. Hegde (rhegde@iitk.ac.in), Associate Professor, Dept. of EE, Indian Institute of Technology Kanpur


  1. Speech Technology for Mobile Phones Part I: ASR and TTS on the Mobile Phone. Rajesh M. Hegde (rhegde@iitk.ac.in), Associate Professor, Dept. of EE, Indian Institute of Technology Kanpur. Several pictures used in this presentation have been collected from various sources available on the web and have been acknowledged in the slides.

  2. Topics Covered
     • How can speech technology be used for developing applications on a mobile phone?
     • What are automatic speech recognition (ASR) and text-to-speech synthesis (TTS)?
     • What are the challenges in implementing ASR and TTS systems on a mobile phone?
     • What are the potential applications of speech technology in delivering personalized services on a mobile phone?

  3. Broad Objectives of Speech Technology for Machines: Speech to Text (ASR) and Text to Speech (TTS). Source: Reynolds et al., Apple developer page

  4. Speech Recognition for Mobile Phones
     • Speech recognition converts a speech signal, acquired by a mobile phone, into a sequence of words.
     • The recognition output can be used in command and control, email, search, and communication.
     • This output can also be used in dialog management and natural language understanding.
     • What you can do with it: dictation, call routing, directory assistance, travel planning, and logistics.

  5. Overview of Automatic Speech Recognition (ASR) Technology. Open source tools: HTK and CMU Sphinx. Source: Google Image Search

  6. Popular Commercial Applications: Siri, Google Voice. Source: Google, Apple

  7. Client and Server Based Speech Recognition on the Mobile Phone. [Figure panels: server-based speech recognition on the mobile phone; speech recognition at the mobile phone client.] Source: Pearce et al., ETSI

  8. Speech Recognition in a Little Bit of Detail. Source: MIT OC, Reynolds et al.

  9. Speech Recognition on Mobile Phones. Source: Rose et al.

  10. ASR Issues on Mobile Phones
     • Memory Crunching
     • Computational Complexity
     • Power Requirement
     • Floating Point Support
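
The floating-point item in the list above is often handled by doing the arithmetic in fixed point. The following is a minimal sketch of that idea, assuming the Q15 format (15 fractional bits); the slides do not say which format or method is used, and the function names here are hypothetical.

```python
# Assumption: Q15 fixed point (values in [-1, 1) stored as 16-bit-range integers).
Q = 15
SCALE = 1 << Q


def to_q15(x: float) -> int:
    """Quantize a float to Q15, saturating at the representable limits."""
    v = int(round(x * SCALE))
    return max(-SCALE, min(SCALE - 1, v))


def from_q15(v: int) -> float:
    """Convert a Q15 integer back to a float (for inspection only)."""
    return v / SCALE


def q15_dot(a, b) -> int:
    """Integer-only dot product of two Q15 vectors; the result is in Q15."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y      # each product carries 30 fractional bits
    return acc >> Q       # rescale the accumulator back to 15 fractional bits


# Hypothetical weights and feature values, chosen only for illustration.
weights = [to_q15(w) for w in (0.5, -0.25, 0.125)]
features = [to_q15(f) for f in (0.5, 0.5, 0.5)]
result = from_q15(q15_dot(weights, features))
print(result)  # 0.1875, the same as the floating-point dot product
```

Only the conversion helpers touch floating point; the inner loop uses integer multiply, add, and shift, which is the part that would run on a processor without a floating-point unit.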

  11. ASR Issues on Mobile Phones: Search Complexity. [Figure: the phoneme sequence DH EH R [word] K AA R, with the annotation “Their Car” = P(“DH”).] Source: Slides from Krishna et al., U Michigan

  12. ASR Issues on Mobile Phones: Search Complexity. [Figure: search over the phoneme sequence DH EH R [word] K AA R, starting at DH.] Source: Slides from Krishna et al., U Michigan

  13. ASR Issues on Mobile Phones: Search Complexity. [Figure: search over the phoneme sequence DH EH R [word] K AA R, starting at DH.] Source: Slides from Krishna et al., U Michigan

  14. ASR Issues on Mobile Phones: Search Complexity. [Figure: search network for DH EH R [word] K AA R with competing first-word hypotheses “Their”, “The”, and “Ear”.] Source: Slides from Krishna et al., U Michigan

  15. ASR Issues on Mobile Phones: Search Complexity. [Figure: search network for DH EH R [word] K AA R with competing first-word hypotheses “Their”, “The”, and “Ear”.] Source: Slides from Krishna et al., U Michigan

  16. ASR Issues on Mobile Phones: Search Complexity. [Figure: search network extended to the second word, with first-word hypotheses “Their”, “The”, and “Ear” followed by second-word hypotheses “Car”, “Cap”, and “Cat”.] Source: Slides from Krishna et al., U Michigan

  17. ASR Issues on Mobile Phones: Search Complexity. [Figure: the search network for DH EH R [word] K AA R grows as more competing phoneme branches are added.] Source: Slides from Krishna et al., U Michigan

  18. ASR Issues on Mobile Phones: Search Complexity. [Figure: the search network for DH EH R [word] K AA R grows further, with more competing phoneme branches.] Source: Slides from Krishna et al., U Michigan

  19. ASR Issues on Mobile Phones: Search Complexity. [Figure: the search network for DH EH R [word] K AA R with many competing phoneme branches at each position.] Source: Slides from Krishna et al., U Michigan
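
The growth shown across the search-complexity slides can be put in numbers. As a rough sketch, assuming an inventory of 40 phonemes (an assumption, not a figure from the slides) and no pruning or grammar constraints, the number of candidate phoneme sequences grows exponentially with utterance length:

```python
# Assumption: 40 phonemes in the inventory; the slides do not give a number.
NUM_PHONEMES = 40


def unconstrained_paths(length: int, branching: int = NUM_PHONEMES) -> int:
    """Number of distinct phoneme sequences of the given length."""
    return branching ** length


# "Their Car" as the phoneme sequence used in the slides.
utterance = ["DH", "EH", "R", "K", "AA", "R"]

for n in range(1, len(utterance) + 1):
    print(n, unconstrained_paths(n))
# The last line printed is: 6 4096000000
```

Even for this six-phoneme utterance the unconstrained space exceeds four billion sequences, which is why an exhaustive search is not practical on a phone.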

  20. Search: Computing Requirements on the Mobile Phone
     1. Search
       • Roughly 50% of the total speech recognition time is spent on search
       • Even more for large-vocabulary recognition
       • Considerably less for small-vocabulary tasks
     2. Solutions
       • Network optimization
       • Efficient search techniques
       • Pruning methods: i) look-ahead based strategy; ii) pruning threshold dependent on the grammar
       • Multi-pass methods: a fast first pass produces a short list of candidates or a lattice, followed by second-pass rescoring with larger acoustic and language models
     Source: Rose et al.
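
One common form of pruning is beam search, which drops every hypothesis whose score falls too far below the current best. The sketch below is a toy illustration under strong assumptions: each frame is scored independently with made-up probabilities, and there is no HMM, lexicon, or language model. It is not the look-ahead or grammar-dependent method named on the slide, and `beam_search` is a hypothetical name.

```python
import math


def beam_search(frame_scores, beam_width):
    """Expand hypotheses frame by frame, pruning with a score threshold.

    frame_scores: list of dicts mapping a phoneme to its log probability.
    beam_width:   hypotheses scoring more than this below the best are dropped.
    Returns the surviving (sequence, log_score) pairs, best first.
    """
    hyps = [((), 0.0)]
    for scores in frame_scores:
        expanded = [
            (seq + (sym,), total + lp)
            for seq, total in hyps
            for sym, lp in scores.items()
        ]
        best = max(total for _, total in expanded)
        hyps = [(s, t) for s, t in expanded if t >= best - beam_width]
    return sorted(hyps, key=lambda h: h[1], reverse=True)


# Made-up per-frame probabilities, for illustration only.
frames = [
    {"DH": math.log(0.7), "TH": math.log(0.2), "D": math.log(0.1)},
    {"EH": math.log(0.6), "AX": math.log(0.3), "IY": math.log(0.1)},
    {"R": math.log(0.8), "L": math.log(0.2)},
]

pruned = beam_search(frames, beam_width=1.0)
full = beam_search(frames, beam_width=math.inf)
print(pruned[0][0])            # ('DH', 'EH', 'R')
print(len(pruned), len(full))  # 2 18
```

With a beam of 1.0 only 2 of the 18 possible sequences survive, and the best one is unchanged. A narrower beam saves more work but risks discarding the correct hypothesis early.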

  21. Exploiting Task Constraints on the Mobile Phone: Form Filling Example (Rose et al.)
     • Recognize first and last names independently
     • Switch between precompiled grammars
     • Generate dynamic grammars
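
The form-filling idea above can be sketched as follows. This is an illustrative toy, not the method from Rose et al.: the directory entries, the candidate scores, and the names `PRECOMPILED`, `dynamic_last_name_grammar`, and `recognize` are all hypothetical, and a grammar is reduced to a plain set of allowed words.

```python
# Hypothetical directory of (first name, last name) entries.
DIRECTORY = [("Asha", "Rao"), ("Asha", "Mehta"), ("Ravi", "Rao")]

# Precompiled grammars: one fixed word set per form field.
PRECOMPILED = {
    "first_name": frozenset(first for first, _ in DIRECTORY),
    "last_name": frozenset(last for _, last in DIRECTORY),
}


def dynamic_last_name_grammar(first_name):
    """Build a grammar at run time: only last names seen with this first name."""
    return frozenset(last for first, last in DIRECTORY if first == first_name)


def recognize(candidates, grammar):
    """Return the best-scoring candidate that the active grammar allows.

    candidates: dict mapping a word to a recognizer score (higher is better).
    """
    allowed = {w: s for w, s in candidates.items() if w in grammar}
    return max(allowed, key=allowed.get) if allowed else None


# Field 1: the first name is recognized on its own with a precompiled grammar.
first = recognize({"Usha": 0.7, "Asha": 0.6}, PRECOMPILED["first_name"])

# Field 2: switch to a dynamic grammar conditioned on the first name.
last = recognize({"Row": 0.9, "Rao": 0.5, "Mehta": 0.4},
                 dynamic_last_name_grammar(first))
print(first, last)  # Asha Rao
```

Restricting each field to a small grammar removes out-of-vocabulary candidates such as "Usha" and "Row" even when they score higher acoustically, and it keeps the search space small.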

  22. What is Text to Speech Synthesis (TTS)?
     • The process of converting a given text in a specific language to human-like speech
     • Software or hardware based methods
     • Software based methods are preferred
     • Involves text analysis, automatic phonetization, and dictionary or rule based synthesis
     • Types: concatenative, unit selection, diphone based, formant based, articulatory, and HMM based synthesis
     • What you can do with it: e-learning, screen readers, audio books, ATM banking, call centers, interactive kiosks
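
The front-end steps listed above (text analysis, then phonetization by dictionary with a rule-based fallback) can be sketched in a few lines. This is a toy under stated assumptions: the two-word lexicon and the one-letter-to-one-phoneme rules are hypothetical stand-ins, far simpler than what a real system such as Festival uses, and no audio is produced.

```python
import re

# Hypothetical pronunciation dictionary (phoneme symbols as in the slides).
LEXICON = {
    "their": ["DH", "EH", "R"],
    "car": ["K", "AA", "R"],
}

# Hypothetical letter-to-sound rules, used only when a word is not in LEXICON.
LETTER_RULES = {"a": "AE", "c": "K", "p": "P", "r": "R", "t": "T"}


def normalize(text):
    """Text analysis, reduced here to lowercasing and splitting into words."""
    return re.findall(r"[a-z]+", text.lower())


def phonetize(word):
    """Dictionary lookup first; fall back to per-letter rules otherwise."""
    if word in LEXICON:
        return LEXICON[word]
    # Letters without a rule are skipped in this toy fallback.
    return [LETTER_RULES[ch] for ch in word if ch in LETTER_RULES]


def to_phonemes(text):
    """Return (word, phoneme list) pairs for the whole input text."""
    return [(word, phonetize(word)) for word in normalize(text)]


print(to_phonemes("Their car, cat!"))
# [('their', ['DH', 'EH', 'R']), ('car', ['K', 'AA', 'R']), ('cat', ['K', 'AE', 'T'])]
```

The resulting phoneme sequence is what a synthesis back end (concatenative, formant based, HMM based, and so on) would turn into a waveform.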

  23. Overview of Text to Speech Synthesis (TTS) Technology. Open source tool: Festival speech synthesis system from CSTR. Source: Google image search, Wikipedia

  24. Cell Phone Based Applications Using Speech Inputs. [Figure: architecture diagram with the components PSTN, Web/Database Server, Asterisk Server, Network, and SMS Gateway.]

  25. Cell Phone Based Agriculture Information Systems: Crop Advisory, Weather Advisory. Source: Digital Mandi for the Indian Kisan

  26. Questions? rhegde@iitk.ac.in URL: http://202.3.77.107/mips/
