Speech Technology for Mobile Phones Part I: ASR and TTS on the Mobile Phone (PowerPoint Presentation)



SLIDE 1

Speech Technology for Mobile Phones

Part I: ASR and TTS on the Mobile Phone

Rajesh M. Hegde

rhegde@iitk.ac.in

Associate Professor

Dept. of EE

Indian Institute of Technology Kanpur

Several pictures used in this presentation have been collected from various sources available on the web and have been acknowledged in the slides.

SLIDE 2

Topics Covered

  • How can speech technology be used to develop applications on a mobile phone?
  • What are automatic speech recognition (ASR) and text-to-speech synthesis (TTS)?
  • What are the challenges in implementing ASR and TTS systems on a mobile phone?
  • What are the potential applications of speech technology in delivering personalized services on a mobile phone?

SLIDE 3

Broad Objectives of Speech Technology for Machines

Speech to Text (ASR) and Text to Speech (TTS)

Source: Reynolds et al., Apple developer page

SLIDE 4

Speech Recognition for Mobile Phones

  • Speech recognition converts a speech signal, acquired by a mobile phone, into a sequence of words.
  • The recognition output can be used for command and control, email, search, and communication.
  • The output can also feed dialog management and natural language understanding.
  • What you can do with it: dictation, call routing, directory assistance, travel planning, and logistics.

SLIDE 5

Overview of the Automatic Speech Recognition (ASR) Technology

Open Source Tools: HTK and CMU Sphinx

Source: Google Image Search

SLIDE 6

Popular Commercial Applications: Siri, Google Voice

Source: Google, Apple

SLIDE 7

Client and Server Based Speech Recognition on the Mobile Phone

Source: Pearce et al., ETSI

Figure panels: speech recognition at the client (on the mobile phone); server-based speech recognition for the mobile phone
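In the client-server arrangement, the phone can run only the signal-processing front end and send a compact feature stream to a server that performs the search. The ETSI front end referenced on this slide transmits compressed cepstral features. The sketch below is a deliberately simplified stand-in and makes these assumptions:

- it frames 8 kHz audio into 25 ms windows with a 10 ms hop;
- it computes a single log energy per frame instead of a full cepstral vector;
- it quantizes each value to one byte.

The parameter values, the quantizer range, and all function names are hypothetical.

```python
import math
import struct

# Assumed front-end parameters (illustrative, not taken from the ETSI standard).
SAMPLE_RATE = 8000                 # narrowband telephone audio, samples per second
FRAME_LEN = 200                    # 25 ms analysis window at 8 kHz
FRAME_SHIFT = 80                   # 10 ms hop between frames
LOG_E_MIN, LOG_E_MAX = 0.0, 25.0   # assumed range of natural-log frame energy
STEP = (LOG_E_MAX - LOG_E_MIN) / 255


def frame_log_energies(samples):
    """Client side: natural-log energy of each overlapping frame
    (a one-number-per-frame stand-in for a cepstral feature vector)."""
    energies = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_SHIFT):
        frame = samples[start:start + FRAME_LEN]
        energy = sum(x * x for x in frame)
        energies.append(math.log(energy + 1e-10))  # small floor avoids log(0)
    return energies


def quantize(values):
    """Client side: clip each value to the assumed range and pack it into one byte."""
    codes = [round((min(max(v, LOG_E_MIN), LOG_E_MAX) - LOG_E_MIN) / STEP) for v in values]
    return struct.pack(f"{len(codes)}B", *codes)


def dequantize(payload):
    """Server side: map received bytes back to approximate log energies."""
    return [LOG_E_MIN + code * STEP for code in payload]


# One second of a 440 Hz test tone standing in for microphone input.
signal = [int(1000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)) for n in range(SAMPLE_RATE)]
payload = quantize(frame_log_energies(signal))
print(len(payload), "bytes of features vs", 2 * len(signal), "bytes of 16-bit PCM")
```

For one second of audio this produces 98 bytes, against 16,000 bytes for the same second sent as raw 16-bit PCM. This shows why sending features instead of audio saves bandwidth on the wireless link.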

SLIDE 8

Speech Recognition in a Little Bit of Detail

Source: MIT OC, Reynolds et al.
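Diagrams at this level of detail usually summarize the standard statistical formulation of ASR. Write O for the sequence of acoustic feature vectors and W for a candidate word sequence; this notation is introduced here and is not taken from the slide. The recognizer then picks the word sequence with the highest posterior probability:

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
        = \arg\max_{W} P(O \mid W)\, P(W)
```

P(O) is dropped in the last step because it does not depend on W. The acoustic model supplies P(O | W) and the language model supplies P(W). The arg max over all word sequences is the search problem whose cost the later search-complexity slides examine.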

SLIDE 9

Speech Recognition on Mobile Phones

Source: Rose et al.

SLIDE 10

ASR Issues on Mobile Phones

  • Limited memory
  • Computational complexity
  • Power consumption
  • Limited floating-point support
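Where a phone's processor lacks a hardware floating-point unit, recognizers are commonly ported to fixed-point arithmetic. The sketch below assumes a Q15 format (signed 16-bit values with 15 fractional bits). It shows the rescaling and rounding that a multiply-accumulate needs, for instance in a dot product between a feature vector and model weights. Python integers stand in for the 16- and 32-bit registers of an embedded target, and the helper names are hypothetical.

```python
Q = 15                          # fractional bits in Q15
ONE = 1 << Q                    # scale factor: 1.0 maps to 32768 (just out of range)
Q15_MAX, Q15_MIN = 32767, -32768
HALF_LSB = 1 << (Q - 1)         # added before shifting to round to nearest


def saturate(value):
    """Clamp to the signed 16-bit range instead of wrapping around."""
    return max(Q15_MIN, min(Q15_MAX, value))


def to_q15(x):
    """Convert a float in [-1, 1) to Q15, saturating out-of-range inputs."""
    return saturate(round(x * ONE))


def from_q15(q):
    """Convert Q15 back to float (for checking results only)."""
    return q / ONE


def q15_mul(a, b):
    """Q15 x Q15 gives a Q30 product; round and shift back to Q15."""
    return saturate((a * b + HALF_LSB) >> Q)


def q15_dot(xs, ws):
    """Dot product with a wide Q30 accumulator, rounded to Q15 only once at the end."""
    acc = 0
    for x, w in zip(xs, ws):
        acc += x * w            # each term is Q30; no per-term rounding
    return saturate((acc + HALF_LSB) >> Q)


features = [to_q15(v) for v in (0.5, 0.25, -0.125)]
weights = [to_q15(0.5)] * 3
print(from_q15(q15_dot(features, weights)))   # prints 0.3125 (= 0.25 + 0.125 - 0.0625)
```

Keeping the accumulator at full Q30 precision and rounding once at the end limits rounding error. On a real 32-bit target the accumulator also needs headroom (or saturation) for long vectors, which Python's unbounded integers hide.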
SLIDE 11

ASR Issues on Mobile Phones: Search Complexity

“Their Car” = DH EH R [word] K AA R

Figure: the search starts from the first phone, DH, scored with probability P(“DH”)

Source: Slides from Krishna et al., U Michigan

SLIDE 12

ASR Issues on Mobile Phones: Search Complexity

DH EH R [word] K AA R

Figure: the search network expanding from phone DH

Source: Slides from Krishna et al., U Michigan

SLIDE 13

ASR Issues on Mobile Phones: Search Complexity

DH EH R [word] K AA R

Figure: the search network continues to expand from phone DH

Source: Slides from Krishna et al., U Michigan

SLIDE 14

ASR Issues on Mobile Phones: Search Complexity

DH EH R [word] K AA R

Figure: the network grows into word hypotheses “The”, “Ear”, and “Their” (phones DH, EH, R, AH, AX, IH, IY)

Source: Slides from Krishna et al., U Michigan

SLIDE 15

ASR Issues on Mobile Phones: Search Complexity

DH EH R [word] K AA R

Figure: competing word paths for “The”, “Ear”, and “Their” through phones DH, EH, R, AH, AX, IH, IY

Source: Slides from Krishna et al., U Michigan

SLIDE 16

ASR Issues on Mobile Phones: Search Complexity

DH EH R [word] K AA R

Figure: after the word boundary the network branches again. It adds phones K, AA, R, P, AE, T and word hypotheses “Car”, “Cat”, “Cap” alongside “Their”, “The”, “Ear”

Source: Slides from Krishna et al., U Michigan

SLIDE 17

ASR Issues on Mobile Phones: Search Complexity

DH EH R [word] K AA R

Figure: further phones (F, S, N, L, OY and others) join the network as competing hypotheses

Source: Slides from Krishna et al., U Michigan

SLIDE 18

ASR Issues on Mobile Phones: Search Complexity

DH EH R [word] K AA R

Figure: still more competing phones (TH, SH, T, OW, G and others) are added to the network

Source: Slides from Krishna et al., U Michigan

SLIDE 19

ASR Issues on Mobile Phones: Search Complexity

DH EH R [word] K AA R

Figure: the network becomes dense, with many more phones (CH, ER, Z, JH, V, ZH and others) competing at each step

Source: Slides from Krishna et al., U Michigan
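One standard way to contain the growth shown in these slides is to organize the pronunciation lexicon as a prefix tree (lexical tree). Words that share initial phones, such as “The” and “Their”, then share search states. The sketch below builds such a tree for the words that appear on the slides. The pronunciations are assumed ARPAbet-style spellings chosen for illustration, not entries from a specific dictionary.

```python
# Assumed ARPAbet-style pronunciations for the words on the slides (illustrative only).
LEXICON = {
    "the":   ["DH", "AH"],
    "their": ["DH", "EH", "R"],
    "ear":   ["IH", "R"],
    "car":   ["K", "AA", "R"],
    "cat":   ["K", "AE", "T"],
    "cap":   ["K", "AE", "P"],
}


class Node:
    """One phone state in the lexical prefix tree."""

    def __init__(self):
        self.children = {}   # next phone -> Node
        self.words = []      # words whose pronunciation ends here


def build_tree(lexicon):
    """Insert every pronunciation so that shared initial phones share nodes."""
    root = Node()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, Node())
        node.words.append(word)
    return root


def count_nodes(node):
    """Number of phone nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


def words_with_prefix(root, phones):
    """Words still reachable after the search has consumed `phones`."""
    node = root
    for phone in phones:
        if phone not in node.children:
            return []
        node = node.children[phone]
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        found.extend(current.words)
        stack.extend(current.children.values())
    return sorted(found)


tree = build_tree(LEXICON)
flat = sum(len(p) for p in LEXICON.values())
print(count_nodes(tree), "tree nodes vs", flat, "phones in a flat word list")
print(words_with_prefix(tree, ["DH", "EH"]))
```

Even on six words the tree needs 12 phone states instead of 16. With a large vocabulary the sharing near the root is much greater, which is where most search hypotheses live.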

SLIDE 20

SEARCH – Computing Requirements on the Mobile Phone

  • Search
    • Search accounts for roughly 50% of the total speech recognition time
    • Even more for large-vocabulary recognition
    • Considerably less for small-vocabulary tasks
  • Solutions
    • Network optimization
    • Efficient search techniques
    • Pruning methods: (i) look-ahead based strategies, (ii) a pruning threshold that depends on the grammar
    • Multi-pass methods: a fast first pass produces a short list of candidates or a lattice, and a second pass rescores it with larger acoustic and language models

Source: Rose et al.
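The pruning and multi-pass ideas above can be illustrated with a toy search. The first pass works as follows:

- It extends word-sequence hypotheses one word slot at a time, using a cheap language model.
- It discards any hypothesis whose log score falls more than a fixed beam below the current best. This is a simple global beam; look-ahead and grammar-dependent thresholds are not modeled.
- It returns a short N-best list.

A second pass then rescores only that list with a larger language model. The fixed word segmentation, the acoustic scores, the beam width, and both language models are made-up values for illustration.

```python
import heapq

# Hypothetical acoustic log-likelihoods per word slot (a fixed segmentation is assumed).
ACOUSTIC = [
    {"the": -1.0, "their": -1.25, "ear": -3.5},
    {"car": -1.0, "cat": -1.5, "cap": -1.75},
]

# Hypothetical larger bigram LM used only in the second pass (log probabilities).
LARGE_BIGRAM = {
    ("<s>", "their"): -0.5, ("<s>", "the"): -1.0,
    ("their", "car"): -0.5, ("the", "car"): -2.0, ("the", "cat"): -1.5,
}


def small_lm(prev, word):
    """Cheap first-pass LM: the same log probability for every word."""
    return -1.0


def large_lm(prev, word):
    """Second-pass LM: bigram lookup with a fixed penalty for unseen pairs."""
    return LARGE_BIGRAM.get((prev, word), -5.0)


def first_pass(acoustic, lm, beam, nbest):
    """Extend hypotheses slot by slot, pruning anything more than `beam` below the best."""
    hyps = [((), 0.0)]                      # (word tuple, cumulative log score)
    for slot in acoustic:
        extended = []
        for words, score in hyps:
            prev = words[-1] if words else "<s>"
            for word, ac in slot.items():
                extended.append((words + (word,), score + ac + lm(prev, word)))
        best = max(score for _, score in extended)
        hyps = [h for h in extended if h[1] >= best - beam]   # beam pruning
    return heapq.nlargest(nbest, hyps, key=lambda h: h[1])


def rescore(candidates, acoustic, lm):
    """Second pass: recompute each short-listed hypothesis with another LM."""
    rescored = []
    for words, _ in candidates:
        score, prev = 0.0, "<s>"
        for slot, word in zip(acoustic, words):
            score += slot[word] + lm(prev, word)
            prev = word
        rescored.append((words, score))
    return sorted(rescored, key=lambda h: h[1], reverse=True)


shortlist = first_pass(ACOUSTIC, small_lm, beam=1.0, nbest=3)
final = rescore(shortlist, ACOUSTIC, large_lm)
print(shortlist[0][0], "->", final[0][0])   # prints ('the', 'car') -> ('their', 'car')
```

The beam removes "ear" after the first slot. The larger model is then evaluated on three hypotheses instead of every path, which is the point of the multi-pass design.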

SLIDE 21

Exploiting Task Constraints on the Mobile Phone

Form Filling Example (Rose et al.)

  • Recognize first and last names independently
  • Switch between precompiled grammars
  • Generate dynamic grammars
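Generating a grammar at run time can be as simple as writing out the current list of allowed words in a format the recognizer accepts. The sketch below emits one JSGF grammar per form field, a format CMU Sphinx can load. First and last names can therefore be recognized independently, and the application can switch grammars as the user moves between fields. The field names, the example name lists, and the conservative token check are hypothetical choices for illustration.

```python
import re

TOKEN = re.compile(r"[a-z]+")   # conservative: plain lowercase words only


def jsgf_alternatives(words):
    """Normalize a word list and join it as JSGF alternatives ("a | b | c")."""
    tokens = set()
    for word in words:
        token = word.strip().lower()
        if not TOKEN.fullmatch(token):
            raise ValueError(f"unsupported token for this sketch: {word!r}")
        tokens.add(token)
    if not tokens:
        raise ValueError("a grammar rule needs at least one alternative")
    return " | ".join(sorted(tokens))


def build_field_grammar(field, words):
    """One small grammar per form field, so each field is recognized independently."""
    return "\n".join([
        "#JSGF V1.0;",
        f"grammar {field};",
        f"public <{field}> = {jsgf_alternatives(words)};",
    ]) + "\n"


def build_form_grammars(fields):
    """Regenerate every field grammar, e.g. after a contact list changes."""
    return {field: build_field_grammar(field, words) for field, words in fields.items()}


grammars = build_form_grammars({
    "first_name": ["Mary", "John", "Anil"],
    "last_name": ["Smith", "Kumar"],
})
print(grammars["first_name"])
```

The recognizer would then load only the grammar for the field currently being filled, which keeps the search space small. How a particular engine loads or switches grammars depends on its own API and is not shown here.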
SLIDE 22

What is Text to Speech Synthesis (TTS)

  • The process of converting text in a given language into human-like speech
  • Software- or hardware-based methods
  • Software-based methods are preferred
  • Involves text analysis, automatic phonetization, and dictionary- or rule-based synthesis
  • Types: concatenative, unit selection, diphone-based, formant-based, articulatory, and HMM-based synthesis
  • What you can do with it: e-learning, screen readers, audio books, ATM banking, call centers, interactive kiosks
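The text analysis and automatic phonetization steps listed above can combine the two approaches the slide names. The sketch works in three stages:

- Normalize the text; here only digits are spelled out.
- Look each word up in a pronunciation dictionary.
- Fall back to letter-to-sound rules for out-of-vocabulary words.

The dictionary entries and the letter-to-sound rules below are hypothetical and far smaller and cruder than those of any real system. They only show the control flow.

```python
import re

# Hypothetical pronunciation dictionary (ARPAbet-style phones).
DICTIONARY = {
    "weather": ["W", "EH", "DH", "ER"],
    "today": ["T", "AH", "D", "EY"],
    "two": ["T", "UW"],
}

# Hypothetical letter-to-sound rules; two-letter graphemes are tried before single letters.
LTS_RULES = {
    "sh": ["SH"], "ch": ["CH"], "th": ["TH"], "ee": ["IY"], "oo": ["UW"],
    "a": ["AE"], "b": ["B"], "c": ["K"], "d": ["D"], "e": ["EH"], "f": ["F"],
    "g": ["G"], "h": ["HH"], "i": ["IH"], "j": ["JH"], "k": ["K"], "l": ["L"],
    "m": ["M"], "n": ["N"], "o": ["AA"], "p": ["P"], "q": ["K"], "r": ["R"],
    "s": ["S"], "t": ["T"], "u": ["AH"], "v": ["V"], "w": ["W"], "x": ["K", "S"],
    "y": ["Y"], "z": ["Z"],
}

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def normalize(text):
    """Text analysis: lowercase, tokenize, and spell out digit strings digit by digit."""
    words = []
    for token in re.findall(r"[a-z]+|[0-9]+", text.lower()):
        if token.isdigit():
            words.extend(DIGIT_WORDS[int(d)] for d in token)
        else:
            words.append(token)
    return words


def letter_to_sound(word):
    """Rule-based fallback: greedy longest match over the grapheme rules."""
    phones, i = [], 0
    while i < len(word):
        for size in (2, 1):
            chunk = word[i:i + size]
            if len(chunk) == size and chunk in LTS_RULES:
                phones.extend(LTS_RULES[chunk])
                i += size
                break
        else:
            i += 1   # no rule covers this character: skip it
    return phones


def phonetize(text):
    """Dictionary lookup first, letter-to-sound rules for out-of-vocabulary words."""
    return [(w, DICTIONARY[w] if w in DICTIONARY else letter_to_sound(w))
            for w in normalize(text)]


print(phonetize("Weather today: 2 sheep"))
```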

SLIDE 23

Overview of Text to Speech Synthesis (TTS) Technology

Open Source Tool: Festival speech synthesis system from CSTR

Source: Google Image Search, Wikipedia
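At the back end of a TTS pipeline, a diphone-based concatenative synthesizer joins prerecorded units end to end. The minimal sketch below makes three assumptions:

- Units are already available as lists of samples at a common rate.
- Silence is written as "pau".
- A short linear crossfade at each joint is enough to soften discontinuities.

Real synthesizers typically also adjust duration and pitch and may choose among several candidate units. None of that is shown, and this is not a description of how Festival is implemented. The unit names, the toy inventory, and the crossfade length are hypothetical.

```python
def diphones(phones):
    """Diphone unit names for a phone sequence, padded with silence ('pau')."""
    padded = ["pau"] + list(phones) + ["pau"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def crossfade_concat(units, overlap):
    """Join sample lists, linearly crossfading up to `overlap` samples at each joint."""
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for k in range(n):
            w = (k + 1) / (n + 1)                 # weight rises toward the new unit
            out[start + k] = (1 - w) * out[start + k] + w * unit[k]
        out.extend(unit[n:])
    return out


def synthesize(phones, inventory, overlap=2):
    """Look up each diphone in a (hypothetical) recorded inventory and join them."""
    return crossfade_concat([inventory[name] for name in diphones(phones)], overlap)


# Toy inventory: constant-valued "recordings" so the crossfade is easy to see.
inventory = {"pau-DH": [0.0] * 4, "DH-EH": [1.0] * 4, "EH-R": [1.0] * 4, "R-pau": [0.0] * 4}
print(diphones(["DH", "EH", "R"]))
print(synthesize(["DH", "EH", "R"], inventory))
```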

SLIDE 24

Cell Phone Based Applications Using Speech Inputs

Figure: system architecture linking the PSTN network, an SMS gateway, a web/database server, and an Asterisk server

SLIDE 25

Cell Phone Based Agriculture Information Systems

Figure panels: crop advisory, weather advisory

Source: Digital Mandi for the Indian Kisan

SLIDE 26

Questions

rhegde@iitk.ac.in

URL: http://202.3.77.107/mips/