Audient: An Acoustic Search Engine

Mar 17, 2024


SLIDE 1

Audient: An Acoustic Search Engine

By Ted Leath
Supervisor: Prof. Paul Mc Kevitt
School of Computing and Intelligent Systems
Faculty of Engineering
University of Ulster, Magee


SLIDE 2

Food for Thought

SLIDE 3

Existing SDR Systems

Involve the production of intermediate text for the purposes of indexing, searching and retrieval
Require a high level of semantic processing for word recognition
Have a limited vocabulary
Have a high word recognition error rate
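
The word recognition error rate mentioned above is commonly reported as the word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recogniser output into the reference transcript, divided by the reference length. The following is a minimal sketch of that calculation, not part of the Audient design; the function name and the example sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences only: one deleted word and one substituted word
# out of four reference words.
print(word_error_rate("search the audio archive", "search audio archives"))
```

Because insertions are counted, WER can exceed 1.0 when the recogniser outputs many more words than were spoken.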


SLIDE 4

Things can be done differently!

SLIDE 5

Non-word Representations of Speech

Could be features of the audio signal
Could be phonemes


SLIDE 6

Phonemic and Phonogrammic Streams

Phonogrammic streams are orthographical representations of phonemic streams. This abstraction is ancient, and partially inherent in the English alphabet.

Egyptian hieroglyphs with semantic and phonetic value. Ref. http://www.omniglot.com/writing/egyptian.htm


SLIDE 7

Project Goals

Create a unique alternative to existing word-based Large Vocabulary Continuous Speech Recognition (LVCSR) speech retrieval systems, along with potential tools for future cognitive and philosophical investigation
Develop a speech-centric model which uses standards-based phonogrammic streams as primary internal data representation
Allow both text and non-lexical phonemic audio queries of varying length
Test against audio corpora used in the evaluation of other Information Retrieval (IR) systems


SLIDE 8

Previous Research/Systems

TREC
– The Informedia projects at Carnegie Mellon University
– The Video Mail Retrieval and Multimedia Document Retrieval projects at Cambridge University
– The SCAN system at AT&T Research
– The THISL project at Sheffield University
SpeechBot and NPR Online – Public Internet Search Sites
The National Gallery of the Spoken Word
BBN Rough ‘n’ Ready
FastTalk


SLIDE 9

SDR System Comparison Chart

SLIDE 10

Audient System Architecture

[Figure: Audient system architecture diagram. Recoverable labels: Audio Input (Speech, Non-speech); Non-speech Processing; Phonemic Stream Abstraction/Construction; Phonetic and temporal abstraction; Query construction (Speech Queries, Text Queries); Formatted Query; Indexing; Indexed Data; Phonogrammic and Temporal Information; Database; Query response]
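
The architecture stores phonogrammic and temporal information in an index and answers formatted queries against it. One possible way to sketch that indexing and query path is an inverted index over phoneme n-grams. Everything here is an assumption made for illustration: the class and method names, the ARPAbet-style symbols, the timings and the file name are hypothetical, and the slides do not specify the matching technique. This sketch does exact matching only.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Phone(NamedTuple):
    symbol: str   # phonogrammic symbol (ARPAbet-style label, assumed notation)
    start: float  # start time in seconds within the audio stream


class PhonogrammicIndex:
    """Inverted index from phoneme n-grams to (location, position) postings."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.postings: dict[tuple[str, ...], list[tuple[str, int]]] = defaultdict(list)
        self.streams: dict[str, list[Phone]] = {}

    def add_stream(self, location: str, phones: list[Phone]) -> None:
        """Index every n-gram of one phonogrammic stream under its location."""
        self.streams[location] = phones
        symbols = [p.symbol for p in phones]
        for i in range(len(symbols) - self.n + 1):
            self.postings[tuple(symbols[i:i + self.n])].append((location, i))

    def search(self, query: list[str]) -> list[tuple[str, float]]:
        """Return (location, start time) for each exact occurrence of the query."""
        if len(query) < self.n:
            raise ValueError(f"query needs at least {self.n} symbols")
        hits = []
        # Use the first n-gram to find candidate positions, then verify the rest.
        for location, i in self.postings.get(tuple(query[:self.n]), []):
            phones = self.streams[location]
            if [p.symbol for p in phones[i:i + len(query)]] == query:
                hits.append((location, phones[i].start))
        return hits


# Hypothetical stream for the words "search engine"; timings are invented.
index = PhonogrammicIndex(n=3)
index.add_stream("clip-01.wav", [
    Phone("S", 0.00), Phone("ER", 0.10), Phone("CH", 0.25), Phone("EH", 0.40),
    Phone("N", 0.50), Phone("JH", 0.58), Phone("AH", 0.66), Phone("N", 0.72),
])
print(index.search(["EH", "N", "JH", "AH", "N"]))
```

The returned location and start time correspond to what an audio replay step would need. A practical system would probably require approximate matching to tolerate phoneme recognition errors, which this sketch leaves out.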

SLIDE 11

Core Modules

[Figure: core modules diagram. Recoverable labels follow.]

Modules: Queries and Table Input; Phonemic Recognition and Abstraction; Phonogrammic Streams, Location, Temporal Information and Indexing; Audio Stream Replay; Stream to Speech; Text to Stream; Create Translation Table; Text Translation Table

Data flows: Text Query; Speech Query; Digitised Audio Stream; Digitised Audio Stream and Location; Phonogrammic Stream; Phonogrammic Stream, Location and Temporal Information; Location and Temporal Reference; Phonogrammic Match Request; Phonogrammic Match Answer; Phonogrammic Query Result; Text; Converted Phonogrammic Stream; Synthetic Speech; Text for Translation; Phonogrammic Translation; Text Translation Information; Text Table Component; Phonogrammatic Table Component
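
The Text to Stream module converts a text query into a phonogrammic stream by consulting the text translation table. A minimal sketch of that lookup is given below, assuming a simple word-to-symbols dictionary. The table contents, the ARPAbet-style symbols and the function name are hypothetical; the slides do not describe the table format.

```python
# Hypothetical text translation table: word -> phonogrammic symbols.
# Entries use ARPAbet-style labels and are illustrative only.
TRANSLATION_TABLE = {
    "search": ["S", "ER", "CH"],
    "engine": ["EH", "N", "JH", "AH", "N"],
}


def text_to_stream(text: str, table: dict = TRANSLATION_TABLE) -> list:
    """Convert a text query into a flat phonogrammic stream via table lookup."""
    stream = []
    for raw_word in text.lower().split():
        word = raw_word.strip(".,;:!?\"'")  # drop surrounding punctuation
        if not word:
            continue
        if word not in table:
            # A fuller system might fall back to letter-to-sound rules here.
            raise KeyError(f"no translation table entry for {word!r}")
        stream.extend(table[word])
    return stream


print(text_to_stream("Search engine"))
```

The resulting symbol list is the kind of converted phonogrammic stream that could then be submitted as a phonogrammic match request.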

SLIDE 12

Proposed Tools

The Hidden Markov Model Toolkit (HTK)
Linux and C++
Festival
VoiceXML and the SGML Family
The Apache Web Server


SLIDE 13

Project Schedule

ID | Task Name | Start | End | Duration
1 | Literature Survey | 01/08/2002 | 01/08/2003 | 262d
2 | Write up literature review | 20/06/2003 | 19/02/2004 | 175d
3 | Selection, installation and integration of tools | 17/06/2003 | 18/12/2003 | 133d
4 | Construct Phonemic Recognition and Abstraction module | 18/12/2003 | 18/03/2004 | 66d
5 | Construct Stream to Speech module | 18/03/2004 | 17/06/2004 | 66d
6 | Test and refine modules | 17/06/2004 | 16/07/2004 | 22d
7 | Construct Text to Stream module | 16/07/2004 | 18/10/2004 | 67d
8 | Test and refine modules | 18/10/2004 | 17/11/2004 | 23d
9 | Construct Queries and Table Input module | 17/11/2004 | 15/02/2005 | 65d
10 | Construct Create Translation Table module | 15/02/2005 | 18/05/2005 | 67d
11 | Construct Audio Stream Replay module | 18/05/2005 | 18/08/2005 | 67d
12 | Integrate and test core modules | 19/07/2004 | 16/12/2005 | 370d
13 | Test core modules against other IR systems using corpora and optimise | 18/08/2005 | 17/03/2006 | 152d
14 | Populate index and demonstrate | 17/03/2006 | 22/06/2006 | 70d
15 | Incorporate search engine elements | 22/06/2006 | 25/10/2006 | 90d
16 | Finish thesis | 14/06/2006 | 29/05/2007 | 250d

[Figure: Gantt chart of the tasks above, by quarter from 2002 to 2007]

SLIDE 14

Conclusion

Create a unique alternative to existing word-based LVCSR speech retrieval systems, along with potential tools for future cognitive and philosophical investigation
Develop a speech-centric model which uses standards-based phonogrammic streams as primary internal data representation
Allow both text and non-lexical phonemic audio queries of varying length
Test against audio corpora used in the evaluation of other Information Retrieval (IR) systems


SLIDE 15

Applications

Searching, indexing and retrieval of Internet audio and video files
Searching, indexing and retrieval of broadcast media

Services for the blind
Library services
Surveillance and intelligence gathering
Voice mail
Audio mining
Trend analysis (topic detection and tracking)
