VOICE INTERFACE TO AN ON-LINE DICTIONARY by Mary E. Weber



SLIDE 1

DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

by

Mary E. Weber

weber@isip.msstate.edu
EE 4012 Senior Design Project
April 18th, 1996
Mississippi State University

ABSTRACT

In the era of natural language recognition machines, access to electronic equipment through a speech interface will go a long way toward making state-of-the-art technology available to a larger class of users. A typical application useful to a significant group of people (e.g., students) is an on-line dictionary that can be accessed using voice commands. Currently, no such dictionaries exist for UNIX-based computer systems. Some personal computers offer this feature to a limited extent, but these are constrained by the amount of memory required for a large-vocabulary recognition system. In this project, we design an interface that uses public-domain speech-recognition software to recognize the specified words and access a dictionary that is available on-line. The resulting system will be publicly available from the ISIP home page.

VOICE INTERFACE TO AN ON-LINE DICTIONARY

SLIDE 2

DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING May 15, 1998 EE 4012 SENIOR DESIGN PRESENTATION PAGE 2 OF 14

WHY VOICE INTERFACE TO A DICTIONARY?

MORE NATURAL TO SPEAK THAN TO PROGRAM

  • Database queries require complicated programming languages
  • Interface by speaking is natural
  • Definitions are found more easily
  • Writing process speeds up

Test bed for other database queries:

  • Library Resources
  • Telephone Directory
  • Television Listings
SLIDE 3

STATE OF THE ART

SPEECH RECOGNITION / UNDERSTANDING
When a system comprehends what is spoken

Challenges:

  • Word spacing
  • Coarticulation / Context
  • Dialect
  • Speaking rate / style

Performance:

[Figure: Word Error Rate (%), on a scale from 1 to 100, plotted by year from 1988 to 1996. Curve labels: 1,000 Words, 5,000 Words, 20,000 Words, Unlimited Vocabulary, Read Speech, Broadcast News, Conversational Speech. Results are speaker independent.]
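The slide does not define the metric on its vertical axis. As a minimal sketch, assuming the conventional definition of word error rate (substitutions, deletions, and insertions divided by the number of reference words), it could be computed as follows. The function name and the example sentences are illustrative and do not come from the presentation.

```python
def word_error_rate(reference, hypothesis):
    """Return the word error rate in percent.

    Assumes the conventional definition:
    (substitutions + deletions + insertions) / number of reference words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("look up the word", "look up a word"))
```

Because insertions count as errors, this rate can exceed 100% when the hypothesis is much longer than the reference.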
SLIDE 4

WHY ABBOT?

The competition - Cambridge HTK System, a generic HMM recognizer

ABBOT (Recurrent Network System)

  • Hybrid connectionist-HMM
  • Context-independent
  • Cost - Free

VS.

Cambridge HTK (Tied-State System)

  • Gaussian HMM
  • Context-dependent
  • Cost - $100,000

SLIDE 5

ACOUSTIC MODELING IN ABBOT

Phonetic Context-Independent Recurrent Neural Network

Input:

  • acoustic vector u(t)
  • current state x(t)

Output:

  • output vector y(t-4)
  • next state vector x(t+1)

$y_i(t) \simeq \Pr\left(q_i(t) \mid u_1^{t+4}\right)$

[Figure: recurrent network with inputs u(t) and x(t), outputs y(t-4) and x(t+1), and a time delay on the state path]
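One time step of a recurrent network of this shape can be sketched as follows. This is an illustration under stated assumptions, not ABBOT's implementation: the layer sizes, the random weights, the sigmoid state units, and the softmax output are all hypothetical choices. The output produced at step t is read as the phone probabilities for frame t-4, which is how the four-frame delay on the slide is interpreted here.

```python
import math
import random


def softmax(values):
    """Normalize scores into probabilities that sum to one."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rnn_step(u_t, x_t, w_state, w_out):
    """Map (u(t), x(t)) to (y, x(t+1)).

    y is interpreted as the phone probabilities for frame t-4.
    """
    combined = [1.0] + list(u_t) + list(x_t)  # bias, acoustic vector, state
    x_next = [1.0 / (1.0 + math.exp(-a)) for a in matvec(w_state, combined)]
    y = softmax(matvec(w_out, combined))
    return y, x_next


# Hypothetical sizes and random weights, for illustration only.
random.seed(0)
n_in, n_state, n_phones = 3, 4, 5
size = 1 + n_in + n_state
w_state = [[random.uniform(-1, 1) for _ in range(size)] for _ in range(n_state)]
w_out = [[random.uniform(-1, 1) for _ in range(size)] for _ in range(n_phones)]

x = [0.0] * n_state
for u in ([0.1, 0.2, 0.3], [0.0, -0.1, 0.4]):
    y, x = rnn_step(u, x, w_state, w_out)  # x(t+1) becomes the next x(t)
```

Feeding the returned state back in as the next `x_t` plays the role of the time delay in the diagram.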

SLIDE 6

LANGUAGE MODELING IN ABBOT

  • Phone Set - 79 phone symbols, vowels have 3 levels of stress
  • Connectionist component - trained phone classifier
  • Models - context & gender independent

Sentence - Markov Process - Words
Words - Markov Process - Phones
Phones - Markov Process - States

[Figure: HMM of V1, HMM of V2, ..., HMM of VN]
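The three-level hierarchy (sentence to words, words to phones, phones to states) can be illustrated by expanding a word sequence step by step. This is only a sketch: the lexicon entries, the phone symbols, and the choice of three states per phone are hypothetical and are not taken from ABBOT's 79-symbol phone set.

```python
# Hypothetical pronunciation lexicon, for illustration only.
LEXICON = {
    "look": ["l", "uh", "k"],
    "up": ["ah", "p"],
}

# Assumed number of HMM states per phone (not stated on the slide).
STATES_PER_PHONE = 3


def sentence_to_phones(words):
    """Expand a word sequence into its phone sequence via the lexicon."""
    return [phone for word in words for phone in LEXICON[word]]


def phones_to_states(phones):
    """Expand each phone into a left-to-right chain of named states."""
    return [
        f"{phone}_{k}"
        for phone in phones
        for k in range(STATES_PER_PHONE)
    ]


phones = sentence_to_phones(["look", "up"])
states = phones_to_states(phones)
print(phones)
print(len(states), "states")
```

Each level is a sequence over the units of the level below, which is what lets one Markov process be nested inside another.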

SLIDE 7

THE ABBOT DEMO

  • Record at 16 kHz
  • Determine the endpoints
  • Convert to ASCII
  • Normalize audio gain
  • Prints best guess to word and recognition continues
  • The recognized word comes at the end
  • Strip the recognized word from the end of the process
  • Look up the word in the on-line dictionary

[Figure: recurrent network with u(t), x(t), y(t-4), x(t+1), and time delay]
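The slide names endpoint detection and gain normalization but does not say how the demo performs them. A minimal sketch, assuming a simple frame-energy threshold and peak normalization, is shown below. The function names, the 160-sample frame (10 ms at 16 kHz), and the threshold are assumptions made for illustration.

```python
def normalize_gain(samples, target_peak=0.9):
    """Scale samples so the largest magnitude equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    scale = target_peak / peak
    return [s * scale for s in samples]


def find_endpoints(samples, frame_len=160, threshold=0.01):
    """Return (start, end) sample indices spanning the frames whose
    mean energy exceeds threshold, or None if no frame does."""
    active = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy > threshold:
            active.append(start)
    if not active:
        return None
    return active[0], min(active[-1] + frame_len, len(samples))


# Synthetic utterance: 20 ms silence, 20 ms signal, 20 ms silence at 16 kHz.
signal = [0.0] * 320 + [0.5] * 320 + [0.0] * 320
endpoints = find_endpoints(signal)
trimmed = normalize_gain(signal[endpoints[0]:endpoints[1]])
print(endpoints, max(trimmed))
```

A fixed threshold is fragile under background noise. It is used here only to keep the example short.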

SLIDE 8

ARCHITECTURE OF CURRENT SYSTEM

[Figure: block diagram of the current system, with blocks labeled ISIP and EE. Components shown: Spoken Utterance, Endpoint Detection, ABBOT Recognition, Word List, Isolator, The Word, Netscape Dictionary, and the returned entry "Word - (n) the thing you looked up".]
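The isolator and dictionary stages can be sketched as two small functions. The recognizer output format assumed here (intermediate guesses on earlier lines, the recognized word as the last token) is a hypothetical reading of the demo, and the in-memory dictionary stands in for the Netscape-based dictionary. Its single entry reuses the sample text from the diagram.

```python
# Hypothetical local stand-in for the on-line dictionary.
DICTIONARY = {
    "word": "(n) the thing you looked up",
}


def isolate_word(recognizer_output):
    """Return the last token of the last non-empty line, lowercased,
    or None when the output contains no text."""
    lines = [line for line in recognizer_output.splitlines() if line.strip()]
    if not lines:
        return None
    return lines[-1].split()[-1].lower()


def look_up(word, dictionary=DICTIONARY):
    """Return the entry for word, or a short message if there is none."""
    if word is None:
        return "no word recognized"
    return dictionary.get(word, f"{word}: no entry found")


# Assumed output: best guesses printed first, final word at the end.
output = "WORLD\nWORD\n"
print(look_up(isolate_word(output)))
```

Keeping the isolator separate from the lookup means either stage can be replaced without changing the other, for example when the recognizer or the dictionary moves to a local machine.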

SLIDE 9

WEBSTER DICTIONARY - WEB BASED SOURCE

Dictionary

  • 160,000 entries
  • Pronunciations
  • Systems of makeup: lexicon, grammar, semantic, phonology

The Web

  • Limited release
  • Service for a fee
  • Current version is first attempt
  • CD-ROM
  • Limited interface control of access

SLIDE 10

INTERFACING TO THE WEBSTER DICTIONARY

The Netscape Version

  • Point-and-click interface
  • Type the word
  • Hit return
  • Retrieve definition

The Ultimate Interface

  • Natural Language Interface to the Dictionary

SLIDE 11

LANGUAGE MODELING ISSUES IN THE INTERFACE (GUI DESIGN)

  • Every word in dictionary recognized
  • Triphone-based recognizer
  • ABBOT is a CSR (continuous speech recognizer)
  • Language model changed to ISR (isolated speech recognition)
  • Dictionary takes only word roots
  • Portion recognizes prefixes / suffixes

Obstacles Encountered

  • Data transported between machines
  • Word transported to Netscape Dictionary

Practical Solutions

  • Recognizer available locally
  • Dictionary available locally
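Because the dictionary takes only word roots, a recognized word with a prefix or suffix has to be reduced to its root before lookup. The presentation does not describe how this is done. The sketch below assumes simple string stripping against small hypothetical affix lists and a hypothetical set of roots.

```python
# Hypothetical affix lists and root set, for illustration only.
PREFIXES = ("un", "re")
SUFFIXES = ("ing", "ed", "s")
ROOTS = {"record", "speak", "normalize"}


def find_root(word, roots=ROOTS):
    """Return a root found in roots for word, or None if no simple
    prefix/suffix split produces one."""
    word = word.lower()
    if word in roots:
        return word
    # Candidate stems: the word itself, then the word minus each prefix.
    stems = [word] + [word[len(p):] for p in PREFIXES if word.startswith(p)]
    for stem in stems:
        if stem in roots:
            return stem
        for suffix in SUFFIXES:
            if stem.endswith(suffix) and stem[:-len(suffix)] in roots:
                return stem[:-len(suffix)]
    return None


for spoken in ("recording", "speaks", "unrecorded", "dictionary"):
    print(spoken, "->", find_root(spoken))
```

Plain stripping misses spelling changes at the boundary (for example, "normalized" does not reduce to "normalize" here), so a real system would need rules for those cases.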

SLIDE 12

BUILDING THIS DESIGN IN HARDWARE

Hand-Held Computer with DSP Chip and A/D Converter

  • Smaller than a credit card
  • Plenty of memory for large recognition vocabulary

SLIDE 13

SUMMARY

Designing a Voice Interface Dictionary

  • A compatible speech recognizer
  • An accessible, on-line dictionary
  • An endpointed, spoken word
  • A way to make all three work together

Future Enhancements:

  • More adaptable recognizer (ISIP recognizer)
  • Local dictionary access
  • Cut down on real-time errors

SLIDE 14


ACKNOWLEDGEMENTS

A special thanks to the following people for their help with this project:

  • Dr. Joseph Picone
  • Rick Duncan
  • Neeraj Deshmukh
  • Sean Lauderdale
  • Arvind Ganapathiraju
  • Dr. Anthony J. Robinson of CMU
  • Daniel Williams