
EECS E6870: Speech Recognition



  1. EECS E6870: Speech Recognition
     Michael Picheny, Stanley F. Chen, Bhuvana Ramabhadran
     IBM T.J. Watson Research Center
     Yorktown Heights, NY, USA
     {picheny,stanchen,bhuvana}@us.ibm.com
     8 September 2009

     What Is Speech Recognition?
     ■ converting speech to text
       ● automatic speech recognition (ASR), speech-to-text (STT)
     ■ what it's not
       ● speaker recognition: recognizing who is speaking
       ● natural language understanding: understanding what is being said
       ● speech synthesis: converting text to speech (TTS)

     Why Is Speech Recognition Important?
     ■ speech is potentially the fastest way people can communicate with machines
       ● natural; requires no specialized training
       ● can be used in parallel with other modalities
     ■ remote speech access is ubiquitous
       ● not everyone has Internet; everyone has a phone
     ■ archiving/indexing/compressing/understanding human speech
       ● e.g., transcription: legal, medical, TV
       ● e.g., transaction: flight information, name dialing
       ● e.g., embedded: navigation from the car

     Why Is Speech Recognition Important?
     Ways that people communicate

     modality   method                     rate (words/min)
     sound      speech                     150–200
     sight      sign language; gestures    100–150
     touch      typing; mousing            60
     taste      covering self in food      < 1
     smell      not showering              < 1

  2. This Course
     ■ cover fundamentals of ASR in depth (weeks 1–9)
     ■ survey state-of-the-art techniques (weeks 10–13)
     ■ force you, the student, to implement key algorithms in C++
       ● C++ is the international language of ASR

     Speech Recognition Is Multidisciplinary
     ■ too much knowledge to fit in one brain
       ● signal processing, machine learning
       ● linguistics
       ● computational linguistics, natural language processing
       ● pattern recognition, artificial intelligence, cognitive science
     ■ three lecturers (no TA?)
       ● Michael Picheny
       ● Stanley F. Chen
       ● Bhuvana Ramabhadran
     ■ from IBM T.J. Watson Research Center, Yorktown Heights, NY
       ● hotbed of speech recognition research

     Meets Here and Now
     ■ 1300 Mudd; 4:10-6:40pm Tuesday
       ● 5 minute break at 5:25pm
     ■ hardcopy of slides distributed at each lecture
       ● 4 per page

     Assignments
     ■ four programming assignments (80% of grade)
       ● implement key algorithms for ASR in C++ (best supported)
       ● some short written questions
       ● optional exercises for those with excessive leisure time
       ● check, check-plus, check-minus grading
     ■ final reading project (undecided; 20% of grade)
       ● choose paper(s) about topic not covered in depth in course; give 15-minute presentation summarizing paper(s)
       ● programming project
     ■ weekly readings
       ● journal/conference articles; book chapters

  3. Course Outline

     week  topic                                      assigned  due
     1     Introduction
     2     Signal processing; DTW                     lab 1
     3     Gaussian mixture models; HMMs
     4     Hidden Markov Models                       lab 2     lab 1
     5     Language modeling
     6     Pronunciation modeling, Decision Trees     lab 3     lab 2
     7     LVCSR and finite-state transducers
     8     Search                                     lab 4     lab 3
     9     Robustness; Adaptation
     10    Advanced language modeling                 project   lab 4
     11    Discriminative training, ROVER
     12    Spoken Document Retrieval, S2S
     13    Project presentations                                project

     Programming Assignments
     ■ C++ (g++ compiler) on x86 PCs running Linux
       ● knowledge of C++ and Unix helpful
     ■ extensive code infrastructure in C++ with SWIG to make it accessible from Java and Python (provided by IBM)
       ● you, the student, only have to write the "fun" parts
       ● by end of course, you will have written key parts of a basic large vocabulary continuous speech recognition system
     ■ get account on ILAB computer cluster
       ● complete the survey
     ■ labs due Wednesday at 6pm

     Readings
     ■ PDF versions of readings will be available on the web site
     ■ recommended text (bookstore):
       ● Speech Synthesis and Recognition, Holmes, 2nd edition (paperback, 256 pp., 2001, ISBN 0748408576) [Holmes]
     ■ reference texts (library, online, bookstore, EE?):
       ● Fundamentals of Speech Recognition, Rabiner, Juang (paperback, 496 pp., 1993, ISBN 0130151572) [R+J]
       ● Speech and Language Processing, Jurafsky, Martin (2nd edition, hardcover, 1024 pp., 2008, ISBN 0131873210) [J+M]
       ● Statistical Methods for Speech Recognition, Jelinek (hardcover, 305 pp., 1998, ISBN 0262100665) [Jelinek]
       ● Spoken Language Processing, Huang, Acero, Hon (paperback, 1008 pp., 2001, ISBN 0130226165) [HAH]

     How To Contact Us
     ■ in e-mail, prefix subject line with "EECS E6870:" !!!
     ■ Michael Picheny: picheny@us.ibm.com
     ■ Stanley F. Chen: stanchen@watson.ibm.com
     ■ Bhuvana Ramabhadran: bhuvana@us.ibm.com
       ● phone: 914-945-2593, 914-945-2976
     ■ office hours: right after class; or before class by appointment
     ■ Courseworks
       ● for posting questions about labs

  4. Web Site
     http://www.ee.columbia.edu/~stanchen/fall09/e6870/
     ■ syllabus
     ■ slides from lectures (PDF)
       ● online by 8pm the night before each lecture
     ■ lab assignments (PDF)
     ■ reading assignments (PDF)
       ● online by lecture they are assigned
       ● password-protected (not working right now)
       ● username: speech, password: pythonrules

     Help Us Help You
     ■ feedback questionnaire after each lecture (2 questions)
       ● feedback welcome any time
     ■ EEs may find CS parts challenging, and vice versa
     ■ you, the student, are partially responsible for quality of course
     ■ together, we can get through this
     ■ let's go!

     Outline For Rest of Today
     1. a brief history of speech recognition
     2. speech recognition as pattern classification
        ■ why is speech recognition hard?
     3. speech production and perception
     4. introduction to signal processing

     A Quick Historical Tour
     1. the early years: 1920–1960s
        ■ ad hoc methods
     2. the birth of modern ASR: 1970–1980s
        ■ maturation of statistical methods; basic HMM/GMM framework developed
     3. the golden years: 1990s–now
        ■ more processing power, data
        ■ variations on a theme; tuning
        ■ demand from downstream technologies (search, translation)

  5. The Start of it All
     Radio Rex (1920s)
     ■ speaker-independent single-word recognizer ("Rex")
       ● triggered if sufficient energy at 500Hz detected (from "e" in "Rex")

     The Early Years: 1920–1960s
     Ad hoc methods
     ■ simple signal processing/feature extraction
       ● detect energy at various frequency bands; or find dominant frequencies
     ■ many ideas central to modern ASR introduced, but not used all together
       ● e.g., statistical training; language modeling
     ■ small vocabulary
       ● digits; yes/no; vowels
     ■ not tested with many speakers (usually < 10)
     ■ error rates < 10%

     The Turning Point
     Whither Speech Recognition? John Pierce, Bell Labs, 1969

       Speech recognition has glamour. Funds have been available. Results have been less glamorous . . .

       . . . General-purpose speech recognition seems far away. Special-purpose speech recognition is severely limited. It would seem appropriate for people to ask themselves why they are working in the field and what they can expect to accomplish . . .

       . . . These considerations lead us to believe that a general phonetic typewriter is simply impossible unless the typewriter has an intelligence and a knowledge of language comparable to those of a native speaker of English . . .

     The Turning Point
     ■ killed ASR research at Bell Labs for many years
     ■ partially served as impetus for first (D)ARPA program (1971–1976) funding ASR research
       ● goal: integrate speech knowledge, linguistics, and AI to make a breakthrough in ASR
       ● large vocabulary: 1000 words; artificial syntax
       ● < 60 × "real time"
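
     The Radio Rex trigger described above (fire when there is sufficient energy near 500Hz) can be illustrated with a small Python sketch. This is a modern digital illustration of the idea, not a description of how the toy was built. The function names, the 8 kHz sample rate, the normalisation, and the 0.1 threshold are all assumptions chosen for the example; the energy at a single frequency is measured with the Goertzel algorithm.

```python
import math


def goertzel_power(samples, sample_rate, target_hz):
    """Squared magnitude of the DFT bin nearest target_hz (Goertzel algorithm)."""
    n = len(samples)
    if n == 0:
        return 0.0
    k = round(n * target_hz / sample_rate)  # index of the nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def band_energy(samples, sample_rate, target_hz):
    """Normalised so a unit-amplitude sine exactly on the bin gives about 1.0."""
    n = len(samples)
    if n == 0:
        return 0.0
    return 4.0 * goertzel_power(samples, sample_rate, target_hz) / (n * n)


def rex_triggered(samples, sample_rate, target_hz=500.0, threshold=0.1):
    """Hypothetical trigger: fire when energy near target_hz exceeds threshold."""
    return band_energy(samples, sample_rate, target_hz) > threshold


def make_tone(freq_hz, sample_rate, n_samples, amplitude=1.0):
    """Synthetic sine wave, used only to exercise the detector."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


if __name__ == "__main__":
    rate = 8000  # assumed sample rate in Hz
    print(rex_triggered(make_tone(500.0, rate, 800), rate))   # tone at the target
    print(rex_triggered(make_tone(2000.0, rate, 800), rate))  # tone far from it
```

     A detector like this responds to any sound with enough energy near 500Hz, whoever produces it and whatever word it belongs to, which is one way to read "speaker-independent" and "ad hoc" on the slides above.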
