SLIDE 1

Effect of Pronunciations on OOV Queries in Spoken Term Detection

  • D. Can1
  • E. Cooper2
  • A. Sethy3
  • C. White4
  • B. Ramabhadran3
  • M. Saraçlar1

SLIDE 2

Introduction Methods Experiments Summary

Outline

1. Introduction: Spoken Term Detection Task; Motivation
2. Methods: WFST-based Spoken Term Detection; Query Forming and Expansion for Phonetic Search
3. Experiments: Experimental Setup; Results

Can, Cooper, Sethy, White, Ramabhadran, Saraçlar Effect of Pronunciations on OOV Queries in STD

SLIDE 4

Anatomy of a Spoken Term Detection (STD) System

[Diagram: INDEXING: Speech Database → ASR → Index. SEARCH: User Query → Preprocess → Search Engine (over the Index). RETRIEVAL: score larger than τ? yes → Retrieve, no → Dispose.]
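The three stages can be mimicked with a toy, dictionary-based pipeline. This is a minimal sketch under stated assumptions, not the WFST system described in these slides: the ASR output is assumed to be a list of (word, score) pairs per utterance, and `build_index`, `search`, and `retrieve` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Hit = Tuple[int, float]  # (utterance ID, score)


def build_index(asr_output: Dict[int, List[Tuple[str, float]]]) -> Dict[str, List[Hit]]:
    """INDEXING: map each recognized word to the utterances it occurs in."""
    index: Dict[str, List[Hit]] = defaultdict(list)
    for utt_id, words in asr_output.items():
        for word, score in words:
            index[word].append((utt_id, score))
    return dict(index)


def search(index: Dict[str, List[Hit]], query: str) -> List[Hit]:
    """SEARCH: preprocess the query (here only trimming and lowercasing) and look it up."""
    return index.get(query.strip().lower(), [])


def retrieve(hits: List[Hit], tau: float) -> List[Hit]:
    """RETRIEVAL: keep hits whose score is larger than tau; dispose of the rest."""
    return [hit for hit in hits if hit[1] > tau]


if __name__ == "__main__":
    asr = {1: [("taipei", 0.9), ("view", 0.8)], 2: [("taipei", 0.3)]}  # hypothetical ASR output
    index = build_index(asr)
    print(retrieve(search(index, " Taipei "), tau=0.5))  # [(1, 0.9)]
```

The threshold τ is the only knob separating "Retrieve" from "Dispose" here; everything upstream of it only produces scored candidates.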

SLIDE 9

Challenges of the Spoken Term Detection Task

Aim: open vocabulary search
  Reference: "Taipei night view"
Challenge: unreliable transcriptions
  ASR output: "tie bay light view"

1. High error rate of one-best transcripts
   Alternative transcriptions: [tie bay [light 0.6, night 0.4] view]
2. Out-Of-Vocabulary (OOV) queries
   Phonetic search: /t ay b ey n ay t v iy w/
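Why alternatives help can be seen on the confusion network from this example. The sketch below assumes a simple sausage representation (one dict of word posteriors per slot, independent slots, no empty entries); `one_best` and `expected_count` are hypothetical helpers, not part of the system described here. The one-best string never contains "night", while the alternatives still give "night view" an expected count of 0.4.

```python
from typing import Dict, List, Sequence

# A confusion network ("sausage"): one dict of word -> posterior per slot.
ConfusionNetwork = List[Dict[str, float]]


def one_best(cn: ConfusionNetwork) -> List[str]:
    """Pick the highest-posterior word in every slot."""
    return [max(slot, key=slot.get) for slot in cn]


def expected_count(cn: ConfusionNetwork, phrase: Sequence[str]) -> float:
    """Expected count of a phrase: sum over start slots of the product of the
    slot posteriors (assumes independent slots with no empty entries)."""
    n = len(phrase)
    total = 0.0
    for start in range(len(cn) - n + 1):
        prob = 1.0
        for offset, word in enumerate(phrase):
            prob *= cn[start + offset].get(word, 0.0)
        total += prob
    return total


if __name__ == "__main__":
    cn = [{"tie": 1.0}, {"bay": 1.0}, {"light": 0.6, "night": 0.4}, {"view": 1.0}]
    print(one_best(cn))                           # ['tie', 'bay', 'light', 'view']
    print(expected_count(cn, ["night", "view"]))  # 0.4
```

Note that neither representation helps with "Taipei" itself: it is not in the vocabulary, which is why the second challenge calls for phonetic search.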

SLIDE 14

Challenges of the Spoken Term Detection Task

1. High error rate of one-best transcripts → Efficient Indexing and Search of Alternatives
2. Out-Of-Vocabulary queries → OOV Pronunciation Modeling

SLIDE 16

Index for Spoken Utterance Retrieval [Allauzen et al., 2004]

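Database (utterance lattices, arcs as word/probability):
  1. "a a": a/1 a/1
  2. "[b .6, a .4] a": (b/.6 or a/.4) a/1

Query: a/1

Index (arcs as input:output/weight; ε is the empty label, output labels are utterance IDs, weights are expected counts):
  a:ε/1, b:ε/1, a:ε/1, ε:1/2, ε:2/1.4, ε:2/.6, a:ε/1, ε:1/1, ε:2/.4, ε:2/.6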

SLIDE 19

Index for Spoken Utterance Retrieval [Allauzen et al., 2004]

Query: a/1

Results (matching index path): a:ε/1 → ε:1/2, ε:2/1.4

(Utterance ID, Expected Count):
  1. (1, 2)
  2. (2, 1.4)
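The expected counts stored in this index can be reproduced without WFSTs by enumerating the weighted paths of each utterance. This is a plain-Python stand-in, not the factor-transducer construction of [Allauzen et al., 2004]: `build_sur_index` and `search_sur` are hypothetical names, and enumerating paths is only feasible for tiny lattices such as the two above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Words = Tuple[str, ...]
# Each utterance is given as a list of (word sequence, path probability).
Database = Dict[int, List[Tuple[Words, float]]]


def build_sur_index(db: Database) -> Dict[Words, Dict[int, float]]:
    """Map every factor (contiguous word sequence) to {utterance ID: expected count}."""
    index: Dict[Words, Dict[int, float]] = defaultdict(lambda: defaultdict(float))
    for utt_id, paths in db.items():
        for words, prob in paths:
            for i in range(len(words)):
                for j in range(i + 1, len(words) + 1):
                    # every occurrence on a path adds that path's probability
                    index[words[i:j]][utt_id] += prob
    return {factor: dict(counts) for factor, counts in index.items()}


def search_sur(index: Dict[Words, Dict[int, float]], query: Words) -> Dict[int, float]:
    return index.get(query, {})


if __name__ == "__main__":
    db = {1: [(("a", "a"), 1.0)], 2: [(("b", "a"), 0.6), (("a", "a"), 0.4)]}
    index = build_sur_index(db)
    # expected counts: 2 for utterance 1, about 1.4 for utterance 2
    print(search_sur(index, ("a",)))
```

Because counts are summed per utterance, the two occurrences of "a" in utterance 1 collapse into a single score of 2, which is fine for utterance retrieval but carries no timing information.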

SLIDE 20

2-pass Retrieval for STD [Parlak and Saraclar, 2008]

Procedure, for each query:
  1. Obtain (utterance ID, expected count) pairs (1st pass).
  2. For each utterance with expected count > τ:
     • Align the query with the utterance → time interval (2nd pass).
     • Return the (utterance ID, time interval, expected count) triplet.

Problems:
  • The 2nd pass takes time → slow.
  • Multiple occurrences of a query in the same utterance contribute to the same expected count.

Ideal for spoken utterance retrieval, not so for spoken term detection.
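A sketch of the 2-pass control flow makes both problems visible. The aligner is a hypothetical callback standing in for the costly second pass, and `two_pass_retrieve` is an illustrative name, not the implementation of [Parlak and Saraclar, 2008].

```python
from typing import Callable, Dict, List, Tuple

Interval = Tuple[float, float]
Triplet = Tuple[int, Interval, float]  # (utterance ID, time interval, expected count)


def two_pass_retrieve(
    query: str,
    utt_counts: Dict[int, float],           # 1st pass output: utterance ID -> expected count
    align: Callable[[str, int], Interval],  # 2nd pass: hypothetical (expensive) aligner
    tau: float,
) -> List[Triplet]:
    results: List[Triplet] = []
    for utt_id, count in utt_counts.items():
        if count > tau:
            # One alignment per retained utterance yields a single interval, so
            # repeated occurrences share one utterance-level score.
            results.append((utt_id, align(query, utt_id), count))
    return results


if __name__ == "__main__":
    fake_alignments = {1: (0.0, 1.0), 2: (1.0, 2.0)}  # stand-in for real alignment output
    hits = two_pass_retrieve("a", {1: 2.0, 2: 1.4}, lambda q, u: fake_alignments[u], tau=1.5)
    print(hits)  # [(1, (0.0, 1.0), 2.0)]
```

Utterance 1 contains "a" twice, yet only one triplet comes back, and its score (2.0) is an expected count rather than a per-occurrence probability.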

SLIDE 21

Index for Spoken Term Detection

Database (arcs as word:start-end/posterior):
  1. "a a": a:0.1-1/1, a:1-1.8/.6, a:1-1.9/.4
  2. "[b .6, a .4] a": b:0.2-1/.6, a:0.1-1/.4, a:1-1.9/1

Index (ε is the empty label; output labels are utterance IDs):
  a:0-1/1, a:1-2/1, b:0-1/1, a:1-2/1, ε:1/1, ε:2/.4, ε:2/1, ε:1/1, ε:2/.6, a:1-2/1, ε:1/1, ε:2/.4, ε:2/.6

SLIDE 22

Index for Spoken Term Detection

Clustering: overlapping intervals are clustered.

Database after clustering:
  1. "a a": a:0.1-1/1, a:1-1.9/.6, a:1-1.9/.4
  2. "[b .6, a .4] a": b:0.2-1/.6, a:0.1-1/.4, a:1-1.9/1

SLIDE 23

Index for Spoken Term Detection

Quantization: time intervals are quantized.

Database after quantization:
  1. "a a": a:0-1/1, a:1-2/.6, a:1-2/.4
  2. "[b .6, a .4] a": b:0-1/.6, a:0-1/.4, a:1-2/1
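The slides do not spell out the exact clustering and quantization rules, so the following is a sketch under stated assumptions: hits of the same word in one utterance are merged when their intervals strictly overlap (the cluster spans their union and the posteriors of these alternative arcs are summed), and boundaries are then rounded to the nearest multiple of a hypothetical `step` (1.0 reproduces the toy example).

```python
import math
from typing import List, Tuple

Hit = Tuple[float, float, float]  # (start time, end time, posterior) of one word


def cluster(hits: List[Hit]) -> List[Hit]:
    """Merge strictly overlapping intervals into one entry spanning their union,
    summing the posteriors of the merged alternative arcs."""
    clusters: List[Hit] = []
    for start, end, post in sorted(hits):
        if clusters and start < clusters[-1][1]:
            c_start, c_end, c_post = clusters[-1]
            clusters[-1] = (c_start, max(c_end, end), c_post + post)
        else:
            clusters.append((start, end, post))
    return clusters


def quantize(hits: List[Hit], step: float = 1.0) -> List[Hit]:
    """Snap interval boundaries to the nearest multiple of `step`."""

    def snap(t: float) -> float:
        return math.floor(t / step + 0.5) * step

    return [(snap(s), snap(e), p) for s, e, p in hits]


if __name__ == "__main__":
    # Occurrences of "a" in utterance 1 before clustering
    utt1_a = [(0.1, 1.0, 1.0), (1.0, 1.8, 0.6), (1.0, 1.9, 0.4)]
    # two entries, 0-1 and 1-2, each with posterior 1
    print(quantize(cluster(utt1_a)))
```

Intervals that merely touch (0.1-1 and 1-1.8) stay separate, so the two occurrences of "a" remain distinct index entries.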

SLIDE 26

Index for Spoken Term Detection

Query: a/1

Results (matching index paths): a:0-1/1, a:1-2/1, ε:1/1, ε:2/.4, ε:2/1, ε:1/1

(Utterance ID, Time Interval, Posterior Probability):
  1. (1, 0-1, 1)
  2. (1, 1-2, 1)
  3. (2, 0-1, .4)
  4. (2, 1-2, 1)

SLIDE 27

1-pass Retrieval for STD

Procedure, for each query:
  1. Obtain (utterance ID, time interval, posterior probability) triplets.
  2. Return the triplets with posterior probability > τ.

Highlights:
  • No 2nd pass → fast
  • No multiple-occurrence problem
  • Every distinct interval leads to another index entry → overlapping intervals are clustered
  • Time interval mismatches reduce the number of common paths → larger index → time intervals are quantized
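Because every index entry already carries its own time interval, retrieval reduces to one lookup plus a threshold. In this sketch a plain dict pre-filled with the result entries of the toy example stands in for the WFST index, and dict lookup stands in for composing the query with the index; `one_pass_retrieve` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]
Triplet = Tuple[int, Interval, float]  # (utterance ID, time interval, posterior)


def one_pass_retrieve(index: Dict[str, List[Triplet]], query: str, tau: float) -> List[Triplet]:
    """Single lookup: each occurrence is scored separately, so no alignment
    pass is needed and no occurrences are merged."""
    return [t for t in index.get(query, []) if t[2] > tau]


if __name__ == "__main__":
    index = {
        "a": [(1, (0.0, 1.0), 1.0), (1, (1.0, 2.0), 1.0), (2, (0.0, 1.0), 0.4), (2, (1.0, 2.0), 1.0)],
        "b": [(2, (0.0, 1.0), 0.6)],
    }
    print(one_pass_retrieve(index, "a", tau=0.5))
    # [(1, (0.0, 1.0), 1.0), (1, (1.0, 2.0), 1.0), (2, (1.0, 2.0), 1.0)]
```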

SLIDE 28

Evaluation on the stddev06 data-set

(dryrun06 query-set, word-level lattices, word-level indexes)

[Figure: combined DET curve, 1-pass vs. 2-pass retrieval (miss probability vs. false alarm probability, in %)]
  • 1-pass retrieval: MTWV = 0.791, search time = 1.33 s
  • 2-pass retrieval: MTWV = 0.792, search time = 535.59 s

MTWV: Maximum Term Weighted Value with global thresholding.

Much faster search: 1-pass retrieval takes 1.33 s versus 535.59 s for 2-pass retrieval, at essentially the same MTWV.
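MTWV picks the single global decision threshold that maximizes the term-weighted value. The sketch below assumes the NIST 2006 STD definitions as we understand them: TWV = 1 − mean over terms of [P_miss + β·P_FA], with P_FA = N_FA / (T − N_true) and T the speech duration in seconds; β = 1000 follows the value quoted on the later DET slide. Function names and the toy detections are hypothetical.

```python
from typing import Dict, List, Tuple

Detections = Dict[str, List[Tuple[float, bool]]]  # term -> [(score, is_correct)]


def twv(dets: Detections, n_true: Dict[str, int], speech_sec: float,
        threshold: float, beta: float = 1000.0) -> float:
    """Term-weighted value at one global threshold; terms with no true
    occurrences are skipped (at least one term must remain)."""
    losses: List[float] = []
    for term, n in n_true.items():
        if n == 0:
            continue
        kept = [ok for score, ok in dets.get(term, []) if score >= threshold]
        n_correct = sum(kept)
        n_fa = len(kept) - n_correct
        p_miss = 1.0 - n_correct / n
        p_fa = n_fa / (speech_sec - n)
        losses.append(p_miss + beta * p_fa)
    return 1.0 - sum(losses) / len(losses)


def mtwv(dets: Detections, n_true: Dict[str, int], speech_sec: float,
         beta: float = 1000.0) -> float:
    """Maximum TWV over candidate global thresholds: every observed score,
    plus +inf (reject everything, TWV = 0)."""
    candidates = sorted({s for hits in dets.values() for s, _ in hits}) + [float("inf")]
    return max(twv(dets, n_true, speech_sec, t, beta) for t in candidates)


if __name__ == "__main__":
    dets = {"x": [(0.9, True), (0.6, True), (0.3, False)], "y": [(0.8, False), (0.7, True)]}
    print(mtwv(dets, {"x": 2, "y": 1}, speech_sec=10000.0))  # about 0.95, reached at threshold 0.6
```

With β = 1000, a single false alarm on a term costs about 0.1 of TWV in this toy setting, which is why a threshold that admits one extra false alarm can still win if it recovers a missed hit.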

SLIDE 32

Query Forming for Phonetic Search

Motivation: to search for OOV queries

Preparation:
  • Convert word/subword lattices to phonetic lattices
  • Build a phonetic index

How to search for OOVs?
  • The orthographic form (text) is available, but we need the phonetic form (pronunciation)
  • Use a letter-to-sound (L2S) system to obtain likely pronunciations
  • Use multiple pronunciations to search for OOV queries
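The idea of phonetic search can be illustrated on a one-best word string instead of a lattice. This is a sketch under stated assumptions: the toy lexicon is hypothetical, and contiguous phone matching stands in for composing the query with a phonetic index. The OOV "Taipei" is never recognized as a word, yet its pronunciation /t ay b ey/ is found in the phone string of "tie bay light".

```python
from typing import Dict, List, Sequence


def to_phones(words: Sequence[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Replace each recognized word by its lexicon pronunciation."""
    phones: List[str] = []
    for word in words:
        phones.extend(lexicon[word])
    return phones


def find_phone_sequence(phones: List[str], query: List[str]) -> List[int]:
    """Start positions where the query pronunciation occurs contiguously."""
    n = len(query)
    return [i for i in range(len(phones) - n + 1) if phones[i:i + n] == query]


if __name__ == "__main__":
    lexicon = {"tie": ["t", "ay"], "bay": ["b", "ey"], "light": ["l", "ay", "t"]}  # toy lexicon
    phones = to_phones(["tie", "bay", "light"], lexicon)
    print(find_phone_sequence(phones, "t ay b ey".split()))  # [0]
```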

SLIDE 33

L2S Pronunciations

L2S system:
  • n-gram model over (letter, phone) pairs
  • Scores have a wide dynamic range due to the conditional independence assumption
  • Using the L2S scores as they are is therefore pointless

Unweighted L2S pronunciations:
  1. Obtain weighted pronunciations from the L2S transducer.
  2. Pick the n-best alternatives and remove their weights.
  3. Search: compose the unweighted automaton representing the alternatives with the phonetic index.
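The unweighted scheme can be sketched with the L2S scores for TAIPEI quoted on the weighted-pronunciation slide. As assumptions: utterances are hypothetical phone strings rather than a phonetic index, whole-phone substring matching replaces composition, and `unweighted_nbest` and `search_alternatives` are illustrative names. Every kept alternative counts equally, whatever its L2S score was.

```python
from typing import Dict, List, Set, Tuple


def unweighted_nbest(prons: List[Tuple[str, float]], n: int) -> List[str]:
    """Keep the n highest-scoring L2S pronunciations and drop their scores."""
    ranked = sorted(prons, key=lambda pw: pw[1], reverse=True)
    return [pron for pron, _ in ranked[:n]]


def search_alternatives(utterances: Dict[int, str], alternatives: List[str]) -> Set[int]:
    """IDs of utterances whose phone string contains any alternative as a
    whole-phone substring; all alternatives count equally (no weights)."""
    hits: Set[int] = set()
    for utt_id, phones in utterances.items():
        padded = f" {phones} "
        if any(f" {alt} " in padded for alt in alternatives):
            hits.add(utt_id)
    return hits


if __name__ == "__main__":
    l2s_output = [("t ay b ey", 0.5), ("t ay p ey", 0.05), ("d ay b ey", 0.005)]
    alternatives = unweighted_nbest(l2s_output, n=2)
    phone_db = {1: "t ay b ey l ay t", 2: "t ay p ey n ay t", 3: "d ay b ey"}
    print(alternatives)                                 # ['t ay b ey', 't ay p ey']
    print(search_alternatives(phone_db, alternatives))  # {1, 2}
```

Increasing n recovers more true hits but every extra alternative also matches with full confidence, which is the false-alarm risk the weighted scheme addresses.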

SLIDE 38

Weighted L2S Pronunciations (Query: Taipei)

1. Obtain weighted pronunciations from the L2S transducer: [/t ay b ey/ .5, /t ay p ey/ .05, /d ay b ey/ .005, ...]
2. Pick the n-best alternatives (n = 2) to prevent false alarms: [/t ay b ey/ .5, /t ay p ey/ .05]
3. Scale the weights with the query length (query length = 6, so take the 6th root): [/t ay b ey/ .5^(1/6) ≈ .9, /t ay p ey/ .05^(1/6) ≈ .6]
4. Normalize the scaled weights to obtain posterior scores: [/t ay b ey/ .6, /t ay p ey/ .4]
5. Search: compose the weighted automaton representing the alternatives with the phonetic index.
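Steps 2 to 4 amount to a few lines of arithmetic. In this sketch `weighted_l2s` is a hypothetical name, and counting the query length in letters is an assumption that happens to match the value 6 used for TAIPEI.

```python
from typing import List, Tuple


def weighted_l2s(prons: List[Tuple[str, float]], n: int, query_length: int) -> List[Tuple[str, float]]:
    """n-best L2S pronunciations with length-scaled, normalized scores."""
    best = sorted(prons, key=lambda pw: pw[1], reverse=True)[:n]
    # The query_length-th root compresses the wide dynamic range of the L2S scores.
    scaled = [(pron, w ** (1.0 / query_length)) for pron, w in best]
    total = sum(w for _, w in scaled)
    return [(pron, w / total) for pron, w in scaled]


if __name__ == "__main__":
    taipei = [("t ay b ey", 0.5), ("t ay p ey", 0.05), ("d ay b ey", 0.005)]
    # Assumption: query length counted in letters (len("taipei") == 6).
    for pron, post in weighted_l2s(taipei, n=2, query_length=len("taipei")):
        print(pron, round(post, 2))
```

This prints posteriors of about 0.59 and 0.41, which round to the .6 and .4 shown above. Without the root, normalizing .5 and .05 would give roughly .91 and .09, almost discarding the second pronunciation.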

SLIDE 40

Query set: all OOVs!
  • Minimum of 5 acoustic instances and 4 phones per OOV
  • Common English words were filtered out
  • 1290 OOVs selected from English broadcast news (BN), e.g. NATALIE, PUTIN, QAEDA, HOLLOWAY

LVCSR system:
  • Acoustic data: 400 hours of HUB4 data
      Train set: 300 hours (utterances without OOVs)
      Test set: 100 hours (utterances with OOVs) ← large
  • Language data: 400M words from various text sources
  • IBM Attila speech recognition toolkit
      SAT, VTLN, fMLLR, no discriminative training
      WER on the RT04 test set: 19.4%

SLIDE 43

Experiment I - Reference Lexicon (Reflex) Pronunciations

Phone Indexes, Query Set: 1290 OOVs

Actual Term Weighted Value (subwords obtained by pruning a phone n-gram model):

Data                      P(FA)    P(Miss)  ATWV
Word 1-best               .00001   .770     .215
Word Consensus Nets       .00002   .687     .294
Word Lattices             .00002   .657     .322
Fragment 1-best           .00001   .680     .306
Fragment Consensus Nets   .00003   .584     .390
Fragment Lattices         .00003   .485     .484
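As a sanity check on the table, and assuming the P(FA) and P(Miss) columns are averages over query terms with β = 1000 (the value given on the DET slide), ATWV should equal 1 − (P(Miss) + β·P(FA)) because the term average is linear. P(FA) is printed with one significant digit, so β·P(FA) is only known to about ±0.005. The helper name below is hypothetical.

```python
from typing import List, Tuple

# (Data, P(FA), P(Miss), reported ATWV), copied from the table above
ROWS: List[Tuple[str, float, float, float]] = [
    ("Word 1-best", 0.00001, 0.770, 0.215),
    ("Word Consensus Nets", 0.00002, 0.687, 0.294),
    ("Word Lattices", 0.00002, 0.657, 0.322),
    ("Fragment 1-best", 0.00001, 0.680, 0.306),
    ("Fragment Consensus Nets", 0.00003, 0.584, 0.390),
    ("Fragment Lattices", 0.00003, 0.485, 0.484),
]
BETA = 1000.0


def atwv_from_rates(p_fa: float, p_miss: float, beta: float = BETA) -> float:
    """Term-weighted value from term-averaged miss and false-alarm rates."""
    return 1.0 - (p_miss + beta * p_fa)


if __name__ == "__main__":
    for name, p_fa, p_miss, reported in ROWS:
        print(f"{name:24s} recomputed={atwv_from_rates(p_fa, p_miss):.3f} reported={reported:.3f}")
```

Every row agrees to within 0.005, consistent with the rounding of P(FA).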

SLIDE 45

ATWV vs N-best L2S Pronunciations

Phone Indexes, Query Set: 1290 OOVs

[Plot: ATWV vs. N (number of L2S pronunciations, 1 to 10). Curves: Fragment Lattices + Weighted L2S Pronunciations, Fragment Lattices + Unweighted L2S Pronunciations, Word Lattices + Weighted L2S Pronunciations, Word Lattices + Unweighted L2S Pronunciations. Reference levels: Fragment Lattices + Reflex (ATWV 0.484), Word Lattices + Reflex (ATWV 0.322).]

SLIDE 46

Combined DET Plot for Weighted L2S Pronunciations

Phone Indexes, Query Set: 1290 OOVs

[Figure: combined DET plot for weighted letter-to-sound 1-5 best pronunciations on fragment lattices (miss probability vs. false alarm probability, in %)]

N-best   MTWV    ATWV
1-best   0.334   0.372
2-best   0.354   0.422
3-best   0.352   0.440
4-best   0.339   0.447
5-best   0.316   0.451

MTWV (Maximum Term Weighted Value with global thresholding) peaks at 2-best.

ATWV (Actual Term Weighted Value) uses term-specific thresholding (β = 1000).

SLIDE 49

Summary

  • WFST-based indexing provides a fast, mathematically sound retrieval solution for the STD task.
  • Phone indexes generated from sub-word (fragment) lattices represent OOVs better than those generated from word lattices.
  • Using multiple pronunciations from the L2S system improves performance, particularly when they are properly weighted.

SLIDE 50

Remarks

  • WS’08 Summer Workshop: Multilingual STD: Finding and Testing New Pronunciations. Technical report available @ http://www.clsp.jhu.edu/workshops/ws08/groups/mstdftnp/
  • STD tools implemented using the OpenFst library will soon be available @ http://www.openfst.org/
  • "Web derived Pronunciations for Spoken Term Detection" coming this July @ SIGIR’09, Boston.

SLIDE 51

References I

Allauzen, C., Mohri, M., and Saraclar, M. (2004). General indexation of weighted automata: application to spoken utterance retrieval. In Proc. HLT-NAACL.

Parlak, S. and Saraclar, M. (2008). Spoken term detection for Turkish Broadcast News. In Proc. ICASSP.
