VOICE INTERFACE TO AN ON-LINE DICTIONARY by Mary E. Weber

DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

VOICE INTERFACE TO AN ON-LINE DICTIONARY

by Mary E. Weber
weber@isip.msstate.edu

EE 4012 Senior Design Project
April 18th, 1996
Mississippi State University

ABSTRACT

In the era of natural language recognition machines, access to electronic equipment through a speech interface will go a long way toward making state-of-the-art technology available to a larger class of users. A typical application useful to a significant group of people (e.g., students) is an on-line dictionary that can be accessed using voice commands. Currently, no such dictionaries exist for UNIX-based computer systems. Some personal computers offer this feature to a limited extent, but these are constrained by the amount of memory required for a large-vocabulary recognition system. In this project, we design an interface that uses public-domain speech-recognition software to recognize the specified words and accesses a dictionary that is available on-line. The resulting system will be publicly available from the ISIP home page.

May 15, 1998   EE 4012 SENIOR DESIGN PRESENTATION   PAGE 2 OF 14   DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

WHY VOICE INTERFACE TO A DICTIONARY?

MORE NATURAL TO SPEAK THAN TO PROGRAM

Database query requires complicated programming languages
Interface by speaking is natural
Definition is found easier
Writing process speeds up
Test bed for other database queries
● Library Resources
● Telephone Directory
● Television Listings

STATE OF THE ART

SPEECH RECOGNITION / UNDERSTANDING
When a system comprehends what is spoken

Challenges:
● Word spacing
● Coarticulation / Context
● Dialect
● Speaking rate / style

Performance:
[Figure: Results - Speaker Independent. Word Error Rate (%) on a logarithmic axis (1, 10, 100) versus year (1988 to 1996), with curves labeled 1,000 Words, 5,000 Words, 20,000 Words, Unlimited Vocabulary, Read Speech, Broadcast News, and Conversational Speech]
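The performance figure is reported as word error rate. As a minimal sketch, assuming the common definition (substitutions, deletions, and insertions from a word-level edit-distance alignment, divided by the number of reference words), it can be computed as follows. The function name and the example sentences are illustrative and do not come from the project.

```python
def word_error_rate(reference, hypothesis):
    """Return the word error rate in percent for two whitespace-separated strings."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("look up the word", "look up a word"))
```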

WHY ABBOT?

The competition - Cambridge HTK System, a generic HMM recognizer

ABBOT                                   VS.   Cambridge HTK
Hybrid connectionist / hidden HMM             Gaussian HMM
Context-independent                           Context-dependent
Recurrent Network System                      Tied-State System
Cost - Free                                   Cost - $100,000

ACOUSTIC MODELING IN ABBOT

Phonetic Context-Independent Recurrent Neural Network

[Figure: recurrent network with input u(t) and output y(t-4); the next state x(t+1) passes through a time delay to become the current state x(t)]

Input: acoustic vector u(t), current state x(t)
Output: output vector y(t-4), next state vector x(t+1)

$$ y_i(t) \cong \Pr\left( q_i(t) \mid u_1^{t+4} \right) $$
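The input/output relationship on this slide can be sketched as a single recurrent step that maps the pair (u(t), x(t)) to an output vector and a next state. This is only an illustration under stated assumptions: the layer sizes, the softmax output, the sigmoid state update, and the random untrained weights are all choices made here, not details taken from ABBOT.

```python
import math
import random


def make_weights(rows, cols, rng):
    """Random placeholder weights; a real system would train these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(weights, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_step(u, x, w_out, w_state):
    """One time step: combine input u(t) and state x(t), emit output and x(t+1)."""
    z = list(u) + list(x) + [1.0]  # input, state, and a bias term
    y = softmax(matvec(w_out, z))  # phone probability estimates
    x_next = [1.0 / (1.0 + math.exp(-a)) for a in matvec(w_state, z)]
    return y, x_next


def run(frames, n_state, n_phones, seed=0):
    """Feed a list of acoustic vectors through the network, one frame at a time."""
    rng = random.Random(seed)
    width = len(frames[0]) + n_state + 1
    w_out = make_weights(n_phones, width, rng)
    w_state = make_weights(n_state, width, rng)
    x = [0.0] * n_state
    outputs = []
    for u in frames:
        y, x = rnn_step(u, x, w_out, w_state)
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    outs = run([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], n_state=4, n_phones=5)
    print(len(outs), round(sum(outs[0]), 6))
```

In the slide's notation, the vector produced when frame u(t) arrives would be read as y(t-4), so that each estimate has seen four frames of future context.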

LANGUAGE MODELING IN ABBOT

[Figure: parallel word models labeled HMM of V1, HMM of V2, ..., HMM of VN]

● Phone Set - 79 phone symbols, vowels have 3 levels of stress
● Connectionist component - trained phone classifier
● Models - context & gender independent

Sentence - Markov Process - Words
Words - Markov Process - Phones
Phones - Markov Process - States
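The three-level hierarchy (sentence to words, words to phones, phones to states) can be illustrated by expanding a word sequence into a flat state sequence. This sketch shows only the nesting of the levels and carries no transition probabilities. The lexicon entries, the phone symbols with stress digits, and the number of states per phone are hypothetical placeholders, not values from ABBOT.

```python
# Hypothetical pronunciation lexicon: word -> phone sequence.
LEXICON = {
    "look": ["l", "uh1", "k"],
    "up": ["ah1", "p"],
}

# Assumed number of states per phone model, chosen only for illustration.
STATES_PER_PHONE = 2


def expand_sentence(words, lexicon=LEXICON, states_per_phone=STATES_PER_PHONE):
    """Expand a word sequence into (word, phone, state index) triples."""
    states = []
    for word in words:
        for phone in lexicon[word]:
            for index in range(states_per_phone):
                states.append((word, phone, index))
    return states


if __name__ == "__main__":
    for state in expand_sentence(["look", "up"]):
        print(state)
```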

THE ABBOT DEMO

● Record at 16 kHz
● Determine the endpoints
● Convert to ASCII
● Normalize audio-gain
● Prints best guess at the word while recognition continues
● The recognized word comes at the end
● Strip the recognized word from the end of the process
● Look up the word in the on-line dictionary

[Figure: recurrent network with input u(t), output y(t-4), and state x(t) obtained from x(t+1) through a time delay]
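Two of the front-end steps listed above, endpoint detection and gain normalization, can be sketched in a few lines. This is a simplified illustration and not the ISIP endpoint detector: it assumes samples are floats in the range -1 to 1, uses a fixed short-time energy threshold, and scales to a fixed peak. The frame length of 160 samples corresponds to 10 ms at 16 kHz; the threshold and peak values are arbitrary.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_endpoints(samples, frame_len=160, threshold=0.01):
    """Return (start, end) sample indices of the active region, or None."""
    energies = frame_energies(samples, frame_len)
    active = [i for i, energy in enumerate(energies) if energy > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len


def normalize_gain(samples, peak=0.9):
    """Scale the samples so that the largest magnitude equals `peak`."""
    largest = max((abs(s) for s in samples), default=0.0)
    if largest == 0.0:
        return list(samples)
    return [s * peak / largest for s in samples]


if __name__ == "__main__":
    # Synthetic signal: silence, a constant-amplitude burst, then silence.
    signal = [0.0] * 320 + [0.5] * 480 + [0.0] * 320
    start, end = find_endpoints(signal)
    word = normalize_gain(signal[start:end])
    print(start, end, max(word))
```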

ARCHITECTURE OF CURRENT SYSTEM

[Figure: block diagram. Spoken Utterance -> ISIP Endpoint Detection -> ABBOT Recognition (with Word List) -> Word Isolator -> The Word -> Netscape Dictionary -> definition, shown as "Word - (n) the thing you looked up"]
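The word isolator and dictionary stages of the architecture can be sketched as two small functions. The recognizer output format is an assumption made here: the sketch only relies on the final recognized word being the last whitespace-separated token of the text. The toy dictionary stands in for the on-line dictionary and its entry is invented for the example.

```python
def isolate_word(recognizer_output):
    """Return the last token of the recognizer output, lowercased, or None."""
    tokens = recognizer_output.split()
    if not tokens:
        return None
    return tokens[-1].strip(".,;:!?").lower()


def look_up(word, dictionary):
    """Return the definition of `word`, or None if it is not listed."""
    return dictionary.get(word)


if __name__ == "__main__":
    # Hypothetical output: running guesses followed by the final word.
    output = "guess: LOOK\nguess: BOOK\nBOOK"
    toy_dictionary = {"book": "(n) a set of written or printed pages"}
    word = isolate_word(output)
    print(word, "-", look_up(word, toy_dictionary))
```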

WEBSTER DICTIONARY - WEB BASED SOURCE

The Web Dictionary
Limited release of access
Service for a fee
Current version is first attempt
CD Rom limited interface control

Systems of makeup
● lexicon
● grammar
● semantic
● phonology
160,000 entries
Pronunciations

INTERFACING TO THE WEBSTER DICTIONARY

The Ultimate Interface
Natural Language Interface to the Dictionary

The Netscape Version
Point-and-click interface
Type the word
Hit return
Retrieve definition

LANGUAGE MODELING ISSUES IN THE INTERFACE (GUI DESIGN)

Obstacles Encountered
Every word in dictionary recognized
ABBOT is a CSR (continuous speech recognizer)
Dictionary takes only word roots
Data transported between machines
Word transported to Netscape Dictionary

Practical Solutions
Triphone based recognizer
Language model changed to ISR (isolated-word speech recognition)
Portion recognizes prefixes / suffixes
Recognizer available locally
Dictionary available locally
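The obstacle that the dictionary takes only word roots can be illustrated on the text side with simple affix stripping. This is not the recognizer-side solution the slide describes; it is a sketch of the underlying idea, assuming a small hypothetical list of prefixes, suffixes, and root words. It does not handle spelling changes such as doubled consonants.

```python
# Hypothetical affix lists, chosen only for the example.
PREFIXES = ("un", "re")
SUFFIXES = ("ing", "ed", "s")


def find_root(word, roots, prefixes=PREFIXES, suffixes=SUFFIXES):
    """Return the first candidate root found in `roots`, or None."""
    word = word.lower()
    # Candidate stems: the word itself, then the word with a prefix removed.
    stems = [word] + [word[len(p):] for p in prefixes if word.startswith(p)]
    candidates = []
    for stem in stems:
        candidates.append(stem)
        # Also try each stem with one suffix removed.
        candidates.extend(stem[:-len(s)] for s in suffixes if stem.endswith(s))
    for candidate in candidates:
        if candidate in roots:
            return candidate
    return None


if __name__ == "__main__":
    roots = {"walk", "read", "do"}
    for spoken in ("walking", "rereads", "undo", "zebra"):
        print(spoken, "->", find_root(spoken, roots))
```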

BUILDING THIS DESIGN IN HARDWARE

Hand-Held Computer with DSP Chip and A/D Converter
- Smaller than a credit card
- Plenty of memory for large recognition vocabulary

SUMMARY

Designing a Voice Interface Dictionary
An endpointed, spoken word
A compatible speech recognizer
An accessible, on-line dictionary
A way to make all three work together

Future Enhancements:
More adaptable recognizer (ISIP recognizer)
Local dictionary access
Cut down on real-time errors

ACKNOWLEDGEMENTS

A special thanks to the following people for their help with this project.

Dr. Joseph Picone
Sean Lauderdale
Rick Duncan
Arvind Ganapathiraju
Neeraj Deshmukh
Daniel Williams
and Dr. Anthony J. Robinson of CMU
