 
Effect of Pronunciations on OOV Queries in Spoken Term Detection
D. Can (1), E. Cooper (2), A. Sethy (3), C. White (4), B. Ramabhadran (3), M. Saraçlar (1)
Introduction Methods Experiments Summary

Outline
  1. Introduction
       Spoken Term Detection Task
       Motivation
  2. Methods
       WFST-based Spoken Term Detection
       Query Forming and Expansion for Phonetic Search
  3. Experiments
       Experimental Setup
       Results

Can, Cooper, Sethy, White, Ramabhadran, Saraçlar: Effect of Pronunciations on OOV Queries in STD

Anatomy of a Spoken Term Detection (STD) System
  [Figure: block diagram of an STD system. Components: Speech Database, ASR, Preprocess, Index, User, Query, Search Engine, and a decision "larger than τ?" with outcomes yes → Retrieve and no → Dispose. Successive builds of the slide highlight three stages: INDEXING, SEARCH, RETRIEVAL.]

Challenges of the Spoken Term Detection Task
  Aim: open-vocabulary search
    Reference: "Taipei night view"
  Challenge: unreliable transcriptions
    ASR output: "tie bay light view"
  1. High error rate of one-best transcripts
       Alternative transcriptions: [tie bay [light 0.6, night 0.4] view]
       → Efficient Indexing and Search of Alternatives
  2. Out-Of-Vocabulary (OOV) queries
       Phonetic search: /t ay b ey n ay t v iy w/
       → OOV Pronunciation Modeling
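
Both remedies on this slide can be illustrated with a small sketch. It is only an illustration under stated assumptions: the alternatives are stored as a confusion network (one dictionary of word posteriors per slot), the helper names `phrase_posterior` and `to_phones` are hypothetical, and the toy lexicon simply splits the slide's phone string across the query words.

```python
# Confusion network from the slide: one dict of word -> posterior per slot.
confusion_network = [
    {"tie": 1.0},
    {"bay": 1.0},
    {"light": 0.6, "night": 0.4},
    {"view": 1.0},
]


def phrase_posterior(network, phrase):
    """Sum, over all start slots, the product of the posteriors of the phrase words."""
    words = phrase.lower().split()
    total = 0.0
    for start in range(len(network) - len(words) + 1):
        score = 1.0
        for offset, word in enumerate(words):
            score *= network[start + offset].get(word, 0.0)
        total += score
    return total


# Toy pronunciation lexicon (assumption): the slide's phone string split per word.
toy_lexicon = {
    "taipei": ["t", "ay", "b", "ey"],
    "night": ["n", "ay", "t"],
    "view": ["v", "iy", "w"],
}


def to_phones(query, lexicon):
    """Concatenate the pronunciations of the query words into one phone sequence."""
    phones = []
    for word in query.lower().split():
        phones.extend(lexicon[word])
    return phones


# The one-best transcript misses "night view"; the alternatives still give it mass.
print(phrase_posterior(confusion_network, "night view"))         # 0.4
# "Taipei" is not in the word-level output at all, so word search finds nothing.
print(phrase_posterior(confusion_network, "Taipei night view"))  # 0.0
# A phonetic query can still be formed once a pronunciation is available.
print(" ".join(to_phones("Taipei night view", toy_lexicon)))     # t ay b ey n ay t v iy w
```

The word-level search recovers "night view" only because the alternative "night" was kept; the OOV word "Taipei" can only be found through its phone sequence, which is why the pronunciation used for the query matters.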

Index for Spoken Utterance Retrieval [Allauzen et al., 2004]
  Database:
    Utterance 1: "a a"
    Utterance 2: "[b .6, a .4] a"
  Index:
    [Figure: weighted finite-state transducer built from the database. Input arcs: a:ε/1, b:ε/1. Output arcs (utterance ID / weight): ε:1/2, ε:2/1.4, ε:1/1, ε:2/.4, ε:2/.6.]
  Query:
    [Figure: query automaton.]
  Results:
    [Figure: result transducer with arcs a:ε/1, ε:1/2, ε:2/1.4.]
    (Utterance ID, Expected Count): (1, 2), (2, 1.4)
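
The expected counts on this slide can be reproduced with a minimal sketch. It is not the WFST construction of Allauzen et al.: it uses a plain dictionary in place of the transducer, assumes each utterance is a sequence of independent slots (confusion-network style), and the names `expand` and `build_index` are hypothetical.

```python
from collections import defaultdict

# Database from the slide: each utterance is a list of slots,
# each slot mapping a symbol to its probability.
database = {
    1: [{"a": 1.0}, {"a": 1.0}],
    2: [{"b": 0.6, "a": 0.4}, {"a": 1.0}],
}


def expand(slots):
    """Return (symbol tuple, probability) for every path through the slots."""
    paths = [((), 1.0)]
    for slot in slots:
        paths = [(seq + (sym,), p * q) for seq, p in paths for sym, q in slot.items()]
    return paths


def build_index(db):
    """Map each factor (contiguous symbol sequence) to {utterance ID: expected count}."""
    index = defaultdict(lambda: defaultdict(float))
    for utt_id, slots in db.items():
        n = len(slots)
        for start in range(n):
            for end in range(start + 1, n + 1):
                # Every path through this span is one occurrence of a factor.
                for factor, prob in expand(slots[start:end]):
                    index[factor][utt_id] += prob
    return index


index = build_index(database)

# Query "a": one (utterance ID, expected count) pair per utterance.
for utt_id, count in sorted(index[("a",)].items()):
    print(utt_id, round(count, 2))
# 1 2.0
# 2 1.4
```

The other factors come out as "a a" → (1, 1), (2, 0.4), "b" → (2, 0.6) and "b a" → (2, 0.6), which agrees with the weights on the output arcs of the index figure.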

2-pass Retrieval for STD [Parlak and Saraclar, 2008]
  Procedure
    For each query:
      1. Obtain (utterance ID, expected count) pairs (1st pass)
      2. For each utterance with expected count > τ:
           Align the query with the utterance → time interval (2nd pass)
      3. Return (utterance ID, time interval, expected count) triplet
  Problems
    - 2nd pass takes time → slow
    - Multiple occurrences of a query in the same utterance contribute to the same expected count.
        Ideal for Spoken Utterance Retrieval
        Not so for Spoken Term Detection
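
The procedure above can be sketched as follows. This is an illustration only: the index is a plain dictionary, `toy_align` is a hypothetical stand-in for the alignment step, and the segment times are invented toy values.

```python
def two_pass_retrieval(query, index, align, tau):
    """Return (utterance ID, time interval, expected count) triplets for one query."""
    results = []
    # 1st pass: look up (utterance ID, expected count) pairs in the index.
    for utt_id, count in index.get(query, {}).items():
        if count > tau:
            # 2nd pass: align the query with the utterance to get a time interval.
            interval = align(query, utt_id)
            results.append((utt_id, interval, count))
    return results


# Toy index (hypothetical): factor -> {utterance ID: expected count}.
toy_index = {("a",): {1: 2.0, 2: 1.4}, ("b",): {2: 0.6}}

# Toy best-path segmentations (hypothetical times, in seconds).
toy_segments = {
    1: [("a", 0.0, 0.3), ("a", 0.3, 0.6)],
    2: [("b", 0.0, 0.4), ("a", 0.4, 0.7)],
}


def toy_align(query, utt_id):
    """Stand-in for alignment: time span of the first match on the best path."""
    segs = toy_segments[utt_id]
    n = len(query)
    for i in range(len(segs) - n + 1):
        if tuple(s[0] for s in segs[i:i + n]) == tuple(query):
            return (segs[i][1], segs[i + n - 1][2])
    return None


print(two_pass_retrieval(("a",), toy_index, toy_align, tau=1.0))
# [(1, (0.0, 0.3), 2.0), (2, (0.4, 0.7), 1.4)]
```

Utterance 1 contains the query twice, yet it yields a single triplet carrying the combined expected count of 2.0. That is acceptable when the goal is to rank utterances, but it does not locate each individual occurrence, which is what Spoken Term Detection requires.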