Voice Based Information Retrieval System How far is it from text - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Voice Based Information Retrieval System

How far is it from text based retrieval system?

PRAJNA BHANDARY CMSC 676

slide-2
SLIDE 2

MOTIVATION

  • Ever-increasing Internet bandwidth, ever-decreasing storage costs and the fast development of multimedia technologies have paved the way for more and more multimedia network content.

  • The main motivation for many researchers in this area is to help visually challenged individuals obtain information through a device equipped with speech recognition.

slide-3
SLIDE 3

INTRODUCTION

A voice-based retrieval system covers three different tasks:

  • Using text queries to retrieve spoken documents
    ○ Referred to as Spoken Document Retrieval
    ○ Queries have been found to need to be long for retrieval to be more efficient

  • Using spoken queries to retrieve text documents
    ○ Referred to as Voice Search
    ○ The information to be retrieved is usually an existing text database, such as those in directory assistance applications. It may contain lexical variations, but is primarily free of recognition uncertainty.

  • Using spoken queries to retrieve spoken documents
    ○ In this case speech recognition uncertainty exists on both sides, in the queries and in the documents, so this is naturally a more difficult task.

slide-4
SLIDE 4

COMPARISON

Resources
  • Text-based: rich resources, with huge quantities of text documents available over the Internet; the quantity continues to increase exponentially due to convenient access.
  • Voice-based: spoken/multimedia content is the new trend; it can be realized even sooner given mature technologies.

Accuracy
  • Text-based: retrieval accuracy is acceptable to users, and retrieved documents are properly ranked and filtered.
  • Voice-based: problems with speech recognition errors, especially for spontaneous speech under adverse environments.

User-System Interaction
  • Text-based: retrieved documents are easily summarised on-screen and thus easily scanned and selected by the user; the user may easily select suggested query terms for the next retrieval iteration in an interactive process.
  • Voice-based: spoken/multimedia documents are not easily summarised on-screen and are thus difficult to scan and select; efficient user-system interaction is lacking.

slide-5
SLIDE 5

RETRIEVAL ACCURACY

  • Lattice-based approaches
    ○ Position Specific Posterior Lattices (PSPL)
    ○ Confusion Networks (CN)
    ○ Time-based Merging for Indexing (TMI)
    ○ Time-anchored Lattice Expansion (TALE)

  • Position Specific Posterior Lattices (PSPL)
    ○ Locate a word in a segment according to the position (or sequence ordering) of the word in a path, as a tuple (W, d, pos, prob).

  • Confusion Networks (CN)
    ○ Cluster several words in a segment according to similar time spans and word pronunciation.
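The PSPL tuple (W, d, pos, prob) can be illustrated with a minimal sketch. It assumes the lattice has already been expanded into an explicit list of paths with posterior probabilities; that path list, the function name `build_pspl_postings` and the sample words are all hypothetical, and a real system would compute the same quantities directly on the lattice rather than enumerate paths.

```python
from collections import defaultdict


def build_pspl_postings(segment_id, weighted_paths):
    """Return {word: [(segment_id, pos, prob), ...]} for one spoken segment.

    weighted_paths is a list of (word_sequence, path_posterior) pairs whose
    posteriors sum to 1. Enumerating paths explicitly is an assumption made
    for illustration only.
    """
    posterior = defaultdict(float)
    for words, path_prob in weighted_paths:
        for pos, word in enumerate(words):
            # The posterior of a word at a position is the total probability
            # of all paths that place that word at that position.
            posterior[(word, pos)] += path_prob

    postings = defaultdict(list)
    for (word, pos), prob in sorted(posterior.items()):
        postings[word].append((segment_id, pos, prob))
    return dict(postings)


# Hypothetical recognition hypotheses for one segment "d1".
paths = [
    (["voice", "search"], 0.6),
    (["voice", "church"], 0.3),
    (["boys", "search"], 0.1),
]
postings = build_pspl_postings("d1", paths)
for word, entries in postings.items():
    print(word, entries)
```

Each posting keeps the soft evidence for a word at a position, so a query term can match even when it is absent from the single best transcript.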

slide-6
SLIDE 6

RETRIEVAL ACCURACY (Cont’d)

Relevance ranking assigns relevance scores between the segments and a query Q, which is a sequence of words {W_j, j = 1, 2, ..., Q}. First, the expected tapered count of each N-gram {W_i ... W_{i+N−1}} of the query in a spoken segment d, S(d, W_i ... W_{i+N−1}), is calculated. The results are then aggregated to produce a score S_N-gram(d, Q) for each order N. Here L is the lattice obtained from d, and k is the cluster number in the PSPL or CN structure.

The different proximity types, one for each N-gram order allowed by the query length Q, are finally combined by a weighted sum to give the final relevance score S(d, Q).
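The three scores described here can be written out as follows. This is a reconstruction from the definitions in the text, assuming the usual PSPL-style scoring; the posterior notation P(W, k | L) (probability of word W in cluster k of lattice L) and the weight symbol w_N are assumptions, not taken from the slide.

```latex
% Expected tapered count of one query N-gram in segment d
S(d, W_i \dots W_{i+N-1}) =
  \log\Bigl[\, 1 + \sum_{k} \prod_{l=0}^{N-1} P(W_{i+l},\, k+l \mid L) \Bigr]

% Aggregate over all N-grams of order N in the query
S_{N\text{-gram}}(d, Q) = \sum_{i=1}^{Q-N+1} S(d, W_i \dots W_{i+N-1})

% Weighted sum over all N-gram orders allowed by the query length Q
S(d, Q) = \sum_{N=1}^{Q} w_N \, S_{N\text{-gram}}(d, Q)
```

The logarithm tapers the count, so many weak matches of the same N-gram do not dominate the score.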

slide-7
SLIDE 7
USER-SYSTEM INTERACTION

  • Multi-modal dialogue
    ○ For a query given by the user, the retrieval system produces a topic hierarchy, constructed from the retrieved spoken documents, to be shown on the screen.

  • Semantic analysis of spoken documents

slide-8
SLIDE 8
USER-SYSTEM INTERACTION (Cont’d)

  • Automatic Generation of Summaries and Titles for Spoken Documents
  • Query-based Local Semantic Structuring of Spoken Documents
  • Semantic Structuring of Spoken Documents
  • Interactive Retrieval in a Dialogue Loop
  • Key Term Extraction from Spoken Documents
    ○ Based on latent topic significance

slide-9
SLIDE 9

PROPOSED MODEL

[Flowchart: Voice; Voice to text; Keyword; Pattern matching; BoW (Bag of Words); decision "Match with DB?" with yes/no branches; Voice-based reply]

This is a three-step process:
  1. Speech to text
  2. Pattern matching
  3. Text to speech

slide-10
SLIDE 10
VOICE TO TEXT

  • Fuzzy logic can be used to match speech in different accents; e.g. the word “Vector” has several different pronunciations.
  • Thus a single word can be represented by a fuzzy set.
  • Since a per-word fuzzy set is too specific to fit into a generic model of speech recognition, a more general model fuzzifies phonemes instead.
  • This model is applied to spoken sentences. One fuzzy set is based on accent, the second on speed of pronunciation and the third on emphasis.
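The idea of representing one word as a fuzzy set over its pronunciations can be sketched as below. The phoneme transcriptions of “Vector” are hypothetical, and the membership function (one minus the normalised edit distance to a reference pronunciation) is an assumption chosen for illustration, not the method of the cited work.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def membership(pronunciation, reference):
    """Degree in [0, 1] to which a pronunciation belongs to the word's fuzzy set."""
    longest = max(len(pronunciation), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(pronunciation, reference) / longest


# Hypothetical phoneme sequences for accent variants of "Vector".
REFERENCE = ["V", "EH", "K", "T", "ER"]
VARIANTS = {
    "reference": ["V", "EH", "K", "T", "ER"],
    "non_rhotic": ["V", "EH", "K", "T", "AH"],
    "w_for_v": ["W", "EH", "K", "T", "AH"],
}

vector_fuzzy_set = {name: membership(p, REFERENCE) for name, p in VARIANTS.items()}
print(vector_fuzzy_set)
```

A recogniser could accept any pronunciation whose membership exceeds a chosen threshold, instead of requiring an exact phoneme match.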

slide-11
SLIDE 11

BAG-of-WORDS

  • A bag-of-words is a representation of text that describes the occurrence of words within a document. It involves two things:
    ○ A vocabulary of known words.
    ○ A measure of the presence of known words.

  • The steps followed:
    ■ Collect data
    ■ Create vocabulary
    ■ Create document vectors
    ■ Manage vocabulary
    ■ Score words
    ■ Word hashing
    ■ TF-IDF
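A minimal sketch of the bag-of-words steps (vocabulary, document vector, TF-IDF scoring) is given below. The whitespace tokenizer, the sample documents and the particular TF-IDF variant (term frequency times log(N/df)) are assumptions for illustration; other weightings are equally common.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_vocabulary(docs):
    """Vocabulary of known words, in a fixed sorted order."""
    return sorted({word for doc in docs for word in tokenize(doc)})


def count_vector(doc, vocab):
    """Document vector of raw word counts over the vocabulary."""
    counts = Counter(tokenize(doc))
    return [counts[word] for word in vocab]


def tfidf_vector(doc, docs, vocab):
    """Document vector of TF-IDF scores: (count / doc length) * log(N / df)."""
    tokens = tokenize(doc)
    counts = Counter(tokens)
    vector = []
    for word in vocab:
        tf = counts[word] / len(tokens) if tokens else 0.0
        df = sum(1 for other in docs if word in tokenize(other))
        idf = math.log(len(docs) / df) if df else 0.0
        vector.append(tf * idf)
    return vector


docs = ["the cat sat", "the dog sat down", "the cat ran"]
vocab = build_vocabulary(docs)
print(vocab)
print(count_vector(docs[0], vocab))
print(tfidf_vector(docs[1], docs, vocab))
```

Words that occur in every document, such as "the" here, receive a TF-IDF score of zero, which is why this scoring is preferred over raw counts for retrieval.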

slide-12
SLIDE 12
PATTERN MATCHING

  • The Boyer-Moore (BM) algorithm can be used. It positions the pattern over the leftmost characters in the text and attempts to match it from right to left. If no mismatch occurs, the pattern has been found.
  • Otherwise, the algorithm computes a shift: the amount by which the pattern is moved to the right before a new matching attempt is undertaken.
  • The shift is computed using two heuristics:
    ○ Match heuristic: the shifted pattern must
      i. match all characters previously matched, and
      ii. bring a different character to the position in the text that caused the mismatch.
    ○ Occurrence heuristic: $d[x] = \min\{\, s \mid s = m \ \text{or}\ (0 \le s < m \ \text{and}\ \mathit{pattern}[m - s] = x) \,\}$, where m is the length of the pattern.
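The right-to-left scan and the occurrence-based shift of Boyer-Moore can be sketched as follows. This is a simplified variant that uses only the occurrence (bad-character) heuristic and omits the match heuristic; the function names are hypothetical, and indices are 0-based here.

```python
def last_occurrence_table(pattern):
    """Map each character to the index of its rightmost occurrence in the pattern."""
    return {ch: i for i, ch in enumerate(pattern)}


def boyer_moore_search(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    last = last_occurrence_table(pattern)
    s = 0  # current alignment of the pattern against the text
    while s <= n - m:
        j = m - 1
        # Compare from right to left.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            return s  # no mismatch: pattern found at alignment s
        # Align the mismatching text character with its rightmost occurrence
        # in the pattern, always moving at least one position to the right.
        s += max(1, j - last.get(text[s + j], -1))
    return -1


print(boyer_moore_search("here is a simple example", "example"))
print(boyer_moore_search("voice based retrieval", "text"))
```

When the mismatching text character does not occur in the pattern at all, the pattern jumps past it entirely, which is what lets the algorithm skip large parts of the text.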

slide-13
SLIDE 13
TEXT TO VOICE

  • After the text is obtained, it must be analysed and then transformed into a phonetic description.

  • NLP module:
    ○ Digital Signal Processing (DSP) module: transforms the symbolic information it receives into audible speech.
    ○ Text analysis: first the text is segmented into tokens. Token-to-word conversion then creates the orthographic form of each token; for example, "Mr" becomes "mister" and numbers such as 2 become "two".
    ○ Application of pronunciation rules: after text analysis is completed, pronunciation rules can be applied. A letter may be silent (h in caught) or correspond to several phonemes (m in maximum).
      ■ Dictionary-based solution: a dictionary can be used in which all possible forms of words are stored.
      ■ Rule-based solution: rules are generated from the phonological knowledge of dictionaries. Only words with some exception in pronunciation are included.
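The token-to-word conversion step of text analysis can be sketched as below. The abbreviation table, the 0 to 99 number range and the decision to drop punctuation are assumptions made to keep the example short; a production text-to-speech front end would handle far more cases.

```python
import re

# Hypothetical abbreviation table; a real system would use a much larger one.
ABBREVIATIONS = {"mr": "mister", "dr": "doctor"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..99."""
    if not 0 <= n <= 99:
        raise ValueError("only 0..99 is supported in this sketch")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize(text):
    """Segment text into tokens and convert each token to its spoken word form."""
    words = []
    # Tokens are runs of letters or runs of digits; punctuation is dropped.
    for token in re.findall(r"[A-Za-z]+|\d+", text):
        if token.isdigit() and int(token) <= 99:
            words.append(number_to_words(int(token)))
        elif token.lower() in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token.lower()])
        else:
            words.append(token)  # left unchanged, including numbers above 99
    return " ".join(words)


print(normalize("Mr. Smith has 2 cats"))
```

The output of this stage is what the pronunciation rules or pronunciation dictionary would then convert into phonemes.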

slide-14
SLIDE 14

CONCLUSION & FUTURE SCOPE

It can be concluded that this approach is efficient in terms of reduced computational complexity and reduced time.

  • Research is under way to make the whole process telephonic.
  • Limitations of Bag-of-Words
    ○ Vocabulary
    ○ Sparsity
    ○ Meaning