SLIDE 1

Audient: An Acoustic Search Engine

By Ted Leath
Supervisor: Prof. Paul Mc Kevitt
School of Computing and Intelligent Systems
Faculty of Engineering
University of Ulster, Magee

SLIDE 2

Food for Thought

SLIDE 3

Existing SDR (Spoken Document Retrieval) Systems

  • Involve the production of intermediate text for the purposes of indexing, searching and retrieval
  • Require a high level of semantic processing for word recognition
  • Have a limited vocabulary
  • Have a high word recognition error rate
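Recognition accuracy in such systems is conventionally reported as word error rate (WER): (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-edit-distance alignment of the recogniser output against a reference transcript of N words. A minimal Python sketch of that calculation, for illustration only; `word_error_rate` is a hypothetical helper, not code from any of the systems discussed:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            mismatch = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
                dist[i - 1][j - 1] + mismatch,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# One substitution ("the" -> "a") in a six-word reference.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because insertions are counted against the reference length, WER can exceed 1.0 for a very poor hypothesis.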

SLIDE 4

Things can be done differently!

SLIDE 5

Non-word Representations of Speech

  • Could be features of the audio signal
  • Could be phonemes

SLIDE 6

Phonemic and Phonogrammic Streams

Phonogrammic streams are orthographic representations of phonemic streams. This abstraction is ancient and is partially inherent in the English alphabet.

[Figure: Egyptian hieroglyphs with semantic and phonetic value. Ref. http://www.omniglot.com/writing/egyptian.htm]
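A minimal sketch of deriving a phonogrammic stream from a recognised phonemic stream, assuming the recogniser emits ARPAbet-style phoneme labels with start and end times. The `PHONOGRAM_TABLE` spellings, the `sil`/`sp` silence labels and the function `to_phonogrammic` are hypothetical placeholders, not the standards-based representation the project proposes:

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    label: str    # phoneme (input) or phonogram (output) label
    start: float  # start time in seconds
    end: float    # end time in seconds


# Hypothetical ARPAbet-style phoneme -> phonogram spellings (illustration only).
PHONOGRAM_TABLE = {
    "k": "k", "ae": "a", "t": "t", "s": "s", "iy": "ee",
    "sh": "sh", "ch": "ch", "th": "th", "ow": "oa", "n": "n",
}
SILENCE_LABELS = {"sil", "sp"}  # assumed labels for silence and short pauses


def to_phonogrammic(phonemes: List[Segment]) -> List[Segment]:
    """Map each timed phoneme to its phonogram, dropping silences but keeping timings."""
    stream = []
    for seg in phonemes:
        if seg.label in SILENCE_LABELS:
            continue
        spelling = PHONOGRAM_TABLE.get(seg.label)
        if spelling is None:
            raise KeyError(f"no phonogram defined for phoneme {seg.label!r}")
        stream.append(Segment(spelling, seg.start, seg.end))
    return stream


cats = [Segment("sil", 0.00, 0.20), Segment("k", 0.20, 0.29),
        Segment("ae", 0.29, 0.41), Segment("t", 0.41, 0.47), Segment("s", 0.47, 0.60)]
print(" ".join(seg.label for seg in to_phonogrammic(cats)))
```

Keeping the timing next to each phonogram is what would later let a match be traced back to a position in the audio.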

SLIDE 7

Project Goals

  • Create a unique alternative to existing word-based LVCSR (large vocabulary continuous speech recognition) speech retrieval systems, along with potential tools for future cognitive and philosophical investigation
  • Develop a speech-centric model that uses standards-based phonogrammic streams as its primary internal data representation
  • Allow both text and nonlexical phonemic audio queries of varying length
  • Test against audio corpora used in the evaluation of other Information Retrieval (IR) systems
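One way to support phonemic queries of varying length against error-prone recogniser output is approximate substring matching: a semi-global edit-distance alignment in which the whole query must be matched but the matched slice may start and end anywhere in the stream. The sketch below illustrates that general technique under the assumption of unit edit costs; `best_match` is a hypothetical name, not part of the proposed system:

```python
from typing import Sequence, Tuple


def best_match(query: Sequence[str], stream: Sequence[str]) -> Tuple[int, int, int]:
    """Return (cost, start, end) for the slice stream[start:end] closest to query.

    Cost is the unit-cost edit distance between the query and that slice;
    the slice may begin and end anywhere in the stream (semi-global alignment).
    """
    m, n = len(query), len(stream)
    if m == 0:
        raise ValueError("query must contain at least one phoneme")

    # Row 0: a match may begin at any stream position j at zero cost.
    cost = [0] * (n + 1)
    start = list(range(n + 1))

    for i in range(1, m + 1):
        new_cost = [i] + [0] * n  # query[:i] aligned against an empty slice
        new_start = [0] * (n + 1)
        for j in range(1, n + 1):
            mismatch = 0 if query[i - 1] == stream[j - 1] else 1
            # (cost, slice start) candidates, in order of preference on ties.
            candidates = [
                (cost[j - 1] + mismatch, start[j - 1]),   # match or substitution
                (cost[j] + 1, start[j]),                  # query phoneme missing from stream
                (new_cost[j - 1] + 1, new_start[j - 1]),  # extra phoneme in stream
            ]
            best = candidates[0]
            for candidate in candidates[1:]:
                if candidate[0] < best[0]:
                    best = candidate
            new_cost[j], new_start[j] = best
        cost, start = new_cost, new_start

    end = min(range(n + 1), key=lambda j: cost[j])  # earliest end with the lowest cost
    return cost[end], start[end], end


recognised = "s ih t k ae t ah n".split()
print(best_match(["k", "ae", "t"], recognised))                  # (0, 3, 6): exact match
print(best_match(["k", "ae", "t"], ["m", "k", "eh", "t", "s"]))  # (1, 1, 4): one substitution
```

Unit costs are the simplest choice; a practical system would more likely weight substitutions by how often the recogniser confuses each phoneme pair.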

SLIDE 8

Previous Research/Systems

  • TREC (Text REtrieval Conference)
    – The Informedia projects at Carnegie Mellon University
    – The Video Mail Retrieval and Multimedia Document Retrieval projects at Cambridge University
    – The SCAN system at AT&T Research
    – The THISL project at Sheffield University
  • SpeechBot and NPR Online – Public Internet Search Sites
  • The National Gallery of the Spoken Word
  • BBN Rough ‘n’ Ready
  • Fast-Talk

SLIDE 9

SDR System Comparison Chart

SLIDE 10

Audient System Architecture

[Figure: Audient system architecture. Labelled components: Audio Input (split into Speech and Non-speech, with Non-speech Processing); Phonemic Stream Abstraction/Construction (phonetic and temporal abstraction); Query construction from Speech Queries and Text Queries; Indexing; Database; Query response. Labelled data: Phonogrammic and Temporal Information, Indexed Data, Formatted Query.]
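The architecture stores phonogrammic streams together with their location and temporal information in an indexed database. A minimal in-memory sketch, assuming an inverted index keyed on phoneme n-grams (trigrams by default) whose postings record the document, the position in the stream and the start time; the class `PhoneNgramIndex` and its methods are hypothetical, and this version finds exact matches only:

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Posting = Tuple[str, int, float]  # (document id, phoneme position, start time in seconds)


class PhoneNgramIndex:
    """Inverted index from phoneme n-grams to the places they occur."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.postings: Dict[Tuple[str, ...], List[Posting]] = defaultdict(list)
        self.streams: Dict[str, List[str]] = {}

    def add(self, doc_id: str, phones: Sequence[str], starts: Sequence[float]) -> None:
        """Index one document's phoneme stream with per-phoneme start times."""
        if len(phones) != len(starts):
            raise ValueError("every phoneme needs a start time")
        self.streams[doc_id] = list(phones)
        for pos in range(len(phones) - self.n + 1):
            gram = tuple(phones[pos:pos + self.n])
            self.postings[gram].append((doc_id, pos, starts[pos]))

    def lookup(self, query: Sequence[str]) -> List[Posting]:
        """Return every exact occurrence of a query at least n phonemes long."""
        if len(query) < self.n:
            raise ValueError(f"query must contain at least {self.n} phonemes")
        hits = []
        # The first n-gram yields candidate positions; the full query is then verified.
        for doc_id, pos, start in self.postings.get(tuple(query[:self.n]), []):
            if self.streams[doc_id][pos:pos + len(query)] == list(query):
                hits.append((doc_id, pos, start))
        return hits


index = PhoneNgramIndex(n=3)
index.add("news1", "dh ah k ae t s ae t".split(), [0.0, 0.12, 0.2, 0.31, 0.4, 0.52, 0.6, 0.7])
index.add("clip2", "k ae t".split(), [1.0, 1.1, 1.2])
print(index.lookup(["k", "ae", "t"]))  # [('news1', 2, 0.2), ('clip2', 0, 1.0)]
```

The n-gram length is a trade-off in this sketch: shorter keys return more candidates to verify, while longer keys make short queries unsearchable.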

SLIDE 11

Core Modules

[Figure: Core modules diagram. Modules: Queries and Table Input; Phonemic Recognition and Abstraction; Phonogrammic Streams, Location, Temporal Information and Indexing; Audio Stream Replay; Stream to Speech; Text to Stream; Create Translation Table (producing the Text Translation Table). Labelled data flows include text and speech queries, digitised audio streams with location, phonogrammic streams with location and temporal information, phonogrammic match requests and answers, text-converted phonogrammic streams, synthetic speech, and the text and phonogrammic table components.]
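The Text to Stream module converts a text query into a phonogrammic stream. A minimal sketch, assuming a pronunciation-lexicon lookup with ARPAbet-style entries (stress marks omitted); `LEXICON` is a tiny illustrative sample and `text_to_stream` a hypothetical name. Out-of-vocabulary words are reported rather than guessed:

```python
from typing import Dict, List, Tuple

# Tiny illustrative lexicon with ARPAbet-style pronunciations (stress marks omitted).
LEXICON: Dict[str, List[str]] = {
    "acoustic": ["ah", "k", "uw", "s", "t", "ih", "k"],
    "search": ["s", "er", "ch"],
    "engine": ["eh", "n", "jh", "ah", "n"],
}


def text_to_stream(text: str, lexicon: Dict[str, List[str]] = LEXICON) -> Tuple[List[str], List[str]]:
    """Return (phoneme stream for known words, list of out-of-vocabulary words)."""
    stream: List[str] = []
    oov: List[str] = []
    for token in text.lower().split():
        word = token.strip(".,;:!?\"'")  # drop surrounding punctuation
        if not word:
            continue
        phones = lexicon.get(word)
        if phones is None:
            oov.append(word)
        else:
            stream.extend(phones)
    return stream, oov


print(text_to_stream("Acoustic search engines!"))
```

Here "engines" is reported as out of vocabulary; such words could be passed to a letter-to-sound component, for example the rules Festival provides for English, instead of being dropped.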

SLIDE 12

Proposed Tools

  • The Hidden Markov Model Toolkit (HTK)
  • Linux and C++
  • Festival
  • VoiceXML and the SGML Family
  • The Apache Web Server
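With HTK as the proposed recogniser toolkit, phoneme recognition output would likely arrive as label files. A minimal parser sketch, assuming simple label lines of the form `start end name` with times in HTK's 100 ns units; optional score fields are ignored and Master Label File headers are not handled. `parse_htk_labels` is a hypothetical helper, not part of HTK:

```python
from typing import List, NamedTuple

HTK_UNITS_PER_SECOND = 10_000_000  # HTK label times are in units of 100 ns


class Label(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    name: str     # phoneme or other label


def parse_htk_labels(text: str) -> List[Label]:
    """Parse simple 'start end name [score ...]' label lines, converting times to seconds."""
    labels = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected 'start end name', got {line!r}")
        start, end = int(fields[0]), int(fields[1])
        labels.append(Label(start / HTK_UNITS_PER_SECOND, end / HTK_UNITS_PER_SECOND, fields[2]))
    return labels


sample = """0 2000000 sil
2000000 2900000 k
2900000 4100000 ae
"""
for label in parse_htk_labels(sample):
    print(label)
```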

SLIDE 13

Project Schedule

ID | Task Name | Start (dd/mm/yyyy) | End (dd/mm/yyyy) | Duration (working days)
---|---|---|---|---
1 | Literature Survey | 01/08/2002 | 01/08/2003 | 262d
2 | Write up literature review | 20/06/2003 | 19/02/2004 | 175d
3 | Selection, installation and integration of tools | 17/06/2003 | 18/12/2003 | 133d
4 | Construct Phonemic Recognition and Abstraction Module | 18/12/2003 | 18/03/2004 | 66d
5 | Construct Stream to Speech module | 18/03/2004 | 17/06/2004 | 66d
6 | Test and refine modules | 17/06/2004 | 16/07/2004 | 22d
7 | Construct Text to Stream module | 16/07/2004 | 18/10/2004 | 67d
8 | Test and refine modules | 18/10/2004 | 17/11/2004 | 23d
9 | Construct Queries and Table Input module | 17/11/2004 | 15/02/2005 | 65d
10 | Construct Create Translation Table module | 15/02/2005 | 18/05/2005 | 67d
11 | Construct Audio Stream Replay module | 18/05/2005 | 18/08/2005 | 67d
12 | Integrate and test core modules | 19/07/2004 | 16/12/2005 | 370d
13 | Test core modules against other IR systems using corpora and optimise | 18/08/2005 | 17/03/2006 | 152d
14 | Populate index and demonstrate | 17/03/2006 | 22/06/2006 | 70d
15 | Incorporate search engine elements | 22/06/2006 | 25/10/2006 | 90d
16 | Finish thesis | 14/06/2006 | 29/05/2007 | 250d

[Gantt chart: task bars by quarter, Q4 2002 to Q1 2007]
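Each duration in the schedule equals the inclusive count of Monday-to-Friday days between the task's start and end dates, with no allowance for holidays. A short Python check of that reading, with the data copied from the schedule; `working_days` and `parse_uk_date` are hypothetical helpers:

```python
from datetime import date, timedelta


def parse_uk_date(text: str) -> date:
    """Parse a dd/mm/yyyy date as used in the schedule."""
    day, month, year = (int(part) for part in text.split("/"))
    return date(year, month, day)


def working_days(start: str, end: str) -> int:
    """Count Monday-to-Friday days from start to end inclusive (no holidays)."""
    day, last = parse_uk_date(start), parse_uk_date(end)
    count = 0
    while day <= last:
        if day.weekday() < 5:  # Monday == 0 ... Friday == 4
            count += 1
        day += timedelta(days=1)
    return count


# (task ID, start, end, duration in days) as given in the schedule
SCHEDULE = [
    (1, "01/08/2002", "01/08/2003", 262), (2, "20/06/2003", "19/02/2004", 175),
    (3, "17/06/2003", "18/12/2003", 133), (4, "18/12/2003", "18/03/2004", 66),
    (5, "18/03/2004", "17/06/2004", 66), (6, "17/06/2004", "16/07/2004", 22),
    (7, "16/07/2004", "18/10/2004", 67), (8, "18/10/2004", "17/11/2004", 23),
    (9, "17/11/2004", "15/02/2005", 65), (10, "15/02/2005", "18/05/2005", 67),
    (11, "18/05/2005", "18/08/2005", 67), (12, "19/07/2004", "16/12/2005", 370),
    (13, "18/08/2005", "17/03/2006", 152), (14, "17/03/2006", "22/06/2006", 70),
    (15, "22/06/2006", "25/10/2006", 90), (16, "14/06/2006", "29/05/2007", 250),
]

for task_id, start, end, days in SCHEDULE:
    computed = working_days(start, end)
    status = "ok" if computed == days else f"mismatch (computed {computed})"
    print(f"task {task_id:2d}: {days:3d}d {status}")
```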

SLIDE 14

Conclusion

  • Create a unique alternative to existing word-based LVCSR speech retrieval systems, along with potential tools for future cognitive and philosophical investigation
  • Develop a speech-centric model that uses standards-based phonogrammic streams as its primary internal data representation
  • Allow both text and nonlexical phonemic audio queries of varying length
  • Test against audio corpora used in the evaluation of other Information Retrieval (IR) systems

SLIDE 15

Applications

  • Searching, indexing and retrieval of Internet audio and video files
  • Searching, indexing and retrieval of broadcast media

  • Services for the blind
  • Library services
  • Surveillance and intelligence gathering
  • Voice mail
  • Audio mining
  • Trend analysis (topic detection and tracking)
