SLIDE 1

Discriminative Keyword Spotting

Joseph Keshet, The Hebrew University
David Grangier, IDIAP Research Institute
Samy Bengio, Google Inc.

SLIDE 2

Outline

  • Problem Definition
  • Keyword Spotting with HMMs
  • Discriminative Keyword Spotting
    – derivation
    – analysis
    – feature functions
  • Experimental Results
SLIDE 3

Problem Definition

[Figure: speech waveform of the utterance "he's bought it" with its phoneme alignment (h iy z, bcl b ao t, ix tcl)]

Goal: find a keyword in a speech signal

SLIDE 5

Problem Definition

Notation:
  • keyword: k (e.g., "bought")
  • keyword phoneme sequence: p̄ (e.g., bcl b ao t)
  • alignment sequence: s̄ = (s_1, s_2, s_3, s_4, e_4), the start time of each phoneme and the end time of the last one
  • acoustic feature vectors: x̄ = (x_1, x_2, x_3, ..., x_T)

SLIDE 6

Problem Definition

[Diagram: the keyword spotter f(x̄, p̄) receives a speech signal x̄ and a keyword given as a phoneme sequence, e.g. p̄ = /b ao t/, and outputs a detection decision (yes/no) together with a predicted alignment s̄′]

SLIDE 7

Fat is Good

The performance of a keyword spotting system is measured by a Receiver Operating Characteristic (ROC) curve.

true positive rate = (detected utterances with the keyword) / (total utterances with the keyword)
false positive rate = (detected utterances without the keyword) / (total utterances without the keyword)

SLIDE 8

Fat is Good

[Figure: ROC curve plotting true positive rate against false positive rate; A denotes the area under the curve. A perfect spotter attains A = 1, so the "fatter" the curve, the better]

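The rates above turn into an ROC curve, and its area A, by sweeping a detection threshold over the spotter's scores. A minimal sketch (function names and the trapezoidal integration are illustrative, not from the talk):

```python
def roc_points(scores_pos, scores_neg, thresholds):
    """(false positive rate, true positive rate) at each detection threshold.

    scores_pos: spotter scores f(x, p) on utterances containing the keyword;
    scores_neg: scores on utterances that do not contain it."""
    points = []
    for th in thresholds:
        tpr = sum(s >= th for s in scores_pos) / len(scores_pos)
        fpr = sum(s >= th for s in scores_neg) / len(scores_neg)
        points.append((fpr, tpr))
    return sorted(points)

def auc(points):
    """Area A under the ROC curve via the trapezoidal rule."""
    area = 0.0
    for (f0, t0), (f1, t1) in zip(points, points[1:]):
        area += (f1 - f0) * (t0 + t1) / 2
    return area
```

With perfectly separated scores the curve passes through (0, 1) and the area is A = 1, the ideal case shown on the slide.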

SLIDE 12

HMM-based Keyword Spotting

SLIDE 13

HMM-based Keyword Spotting: Whole-Word Modeling

[Diagram: an HMM for the whole word "bought" in parallel with a garbage model; the acoustic features x̄ are extracted every 10 ms and q̄ denotes the HMM state sequence]

[Rahim et al., 1997; Rohlicek et al., 1989]

SLIDE 16

HMM-based Keyword Spotting: Phoneme-Based

[Diagram: the keyword model for "bought" is a concatenation of phoneme HMMs (b ao t), with garbage models absorbing the surrounding phonemes of the utterance (h iy ... t ih); inputs are the phoneme sequence p̄ and the features x̄ (one frame every 10 ms), with state sequence q̄]

[Bourlard et al., 1994; Manos & Zue, 1997; Rohlicek et al., 1993]

SLIDE 17

HMM-based Keyword Spotting: Large-Vocabulary-Based

  • Linguistic constraints on the garbage model
  • Does a human listener need to have a large vocabulary in order to recognize one word?

[Cardillo et al., 2002; Rose & Paul, 1990; Szoke et al., 2005; Weintraub, 1995]

SLIDE 18

HMM Approaches to Keyword Spotting

  • Do not specifically address the goal of maximizing the area under the ROC curve for the keyword spotting task

SLIDE 19

Discriminative Approach

SLIDE 20

Learning Paradigm

Discriminative learning from examples:

S = {(p̄_1, x̄_1⁺, x̄_1⁻, s̄_1), ..., (p̄_m, x̄_m⁺, x̄_m⁻, s̄_m)}

where, for each training example:
  • p̄_j is the keyword, given as a phoneme sequence
  • x̄_j⁺ is an utterance in which the keyword is uttered
  • x̄_j⁻ is an utterance in which the keyword is not uttered
  • s̄_j is the alignment of the keyword and the utterance in which it is uttered

SLIDE 26

Learning Paradigm

Discriminative learning from examples: the training set S is fed to a learning algorithm that picks a keyword spotter f(x̄, p̄) out of Fw, the class of all keyword spotting functions of the form

f(x̄, p̄) = max_s̄ w · φ(x̄, p̄, s̄),   w ∈ Rⁿ
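For intuition, the decoding rule f(x̄, p̄) = max_s̄ w · φ(x̄, p̄, s̄) can be sketched as an exhaustive search over alignments. All names here are hypothetical, and a real spotter would use dynamic programming over the same search space:

```python
from itertools import combinations

def spot(w, phi, x, p, T):
    """f(x, p) = max over alignments s of w . phi(x, p, s).

    An alignment s is a strictly increasing sequence of len(p) + 1 frame
    boundaries in [0, T]; here we simply enumerate all of them."""
    best_score, best_s = float("-inf"), None
    for s in combinations(range(T + 1), len(p) + 1):
        score = sum(wj * fj for wj, fj in zip(w, phi(x, p, s)))
        if score > best_score:
            best_score, best_s = score, s
    return best_score, best_s
```

The detection decision compares best_score against a threshold, and best_s plays the role of the predicted alignment s̄′.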

SLIDE 29

Feature Functions

We define 7 feature functions of the form:

φ_j : (x̄, p̄, s̄) → R

Each maps the sequence of acoustic features x̄, the keyword phoneme sequence p̄, and a suggested alignment s̄ to a confidence in the keyword and the suggested alignment.

SLIDE 30

Feature Functions I

Cumulative spectral change around the boundaries:

φ_j(x̄, p̄, s̄) = Σ_{i=2}^{|p̄|−1} d(x_{s_i−j}, x_{s_i+j}),   j ∈ {1, 2, 3, 4}

[Diagram: for each boundary s_i, the distance d is taken between the frames j positions before and after it (−j + s_i and j + s_i)]
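A sketch of this feature, under the assumption that d is the Euclidean distance between acoustic frames (the talk does not pin down d; the names are illustrative):

```python
import numpy as np

def phi_spectral(x, s, j):
    """Cumulative spectral change around the phoneme boundaries:
    the sum over inner boundaries s_i of d(x[s_i - j], x[s_i + j]),
    where x is a (T, D) array of acoustic feature vectors and s holds
    the boundary frame indices of the alignment."""
    return sum(np.linalg.norm(x[si - j] - x[si + j]) for si in s[1:-1])
```

A large value means the spectrum changes sharply at the suggested boundaries, which is evidence that the alignment is correct.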

SLIDE 32

Feature Functions II

Cumulative confidence in the phoneme sequence:

φ_5(x̄, p̄, s̄) = Σ_{i=1}^{|p̄|} Σ_{t=s_i}^{s_{i+1}−1} g(x_t, p_i)

We build a static frame-based phoneme classifier g : X × Y → R, where g(x_t, p_i) is the confidence that phoneme p_i was uttered at frame x_t [Dekel, Keshet & Singer, 2004].

[Diagram: the frames between the boundaries s_{i−1}, s_i, s_{i+1}, labeled with the phonemes p_{i−1} = t and p_i = eh]
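φ_5 is then a double sum over the phonemes and the frames of their segments; a minimal sketch with a stand-in frame classifier g:

```python
def phi5(g, x, p, s):
    """Cumulative confidence in the phoneme sequence: for each phoneme p[i],
    sum the frame-level confidence g(x[t], p[i]) over its segment
    [s[i], s[i+1]) of the alignment."""
    return sum(g(x[t], p[i])
               for i in range(len(p))
               for t in range(s[i], s[i + 1]))
```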

SLIDE 35

Feature Functions III

Phoneme duration model:

φ_6(x̄, p̄, s̄) = Σ_{i=1}^{|p̄|} log N(s_{i+1} − s_i; μ̂_{p_i}, σ̂_{p_i})

where μ̂_{p_i} is the average length of phoneme p_i and σ̂_{p_i} is the standard deviation of the length of phoneme p_i, estimated on training data.
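Assuming N is a univariate Gaussian density over durations (a natural reading of the slide), φ_6 can be sketched as:

```python
import math

def phi6(p, s, mu, sigma):
    """Phoneme duration model: sum of Gaussian log-densities of the observed
    durations s[i+1] - s[i], with per-phoneme statistics mu[ph], sigma[ph]
    estimated from training data."""
    total = 0.0
    for i, ph in enumerate(p):
        dur = s[i + 1] - s[i]
        total += (-0.5 * math.log(2 * math.pi * sigma[ph] ** 2)
                  - (dur - mu[ph]) ** 2 / (2 * sigma[ph] ** 2))
    return total
```

Alignments whose segment durations are typical for their phonemes score higher.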

SLIDE 38

Feature Functions IV

Speaking-rate modeling ("dynamics"):

φ_7(x̄, p̄, s̄) = − Σ_{i=2}^{|p̄|−1} ((s_{i+1} − s_i)/μ̂_{p_i} − (s_i − s_{i−1})/μ̂_{p_{i−1}})²

[Figure: spectrogram at different rates of articulation (after Pickett, 1980)]
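φ_7 compares the normalized durations of adjacent phonemes, so a constant speaking rate contributes nothing and rate changes are penalized quadratically; a sketch:

```python
def phi7(p, s, mu):
    """Speaking-rate feature: negative squared difference between the
    durations of consecutive phonemes, each normalized by that phoneme's
    average duration mu[ph] (so the feature is 0 at a constant rate)."""
    total = 0.0
    for i in range(1, len(p) - 1):  # the slide's i = 2, ..., |p| - 1
        rate_cur = (s[i + 1] - s[i]) / mu[p[i]]
        rate_prev = (s[i] - s[i - 1]) / mu[p[i - 1]]
        total -= (rate_cur - rate_prev) ** 2
    return total
```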

SLIDE 39

Learning Paradigm

Discriminative learning from examples: given the training set S = {(p̄_1, x̄_1⁺, x̄_1⁻, s̄_1), ..., (p̄_m, x̄_m⁺, x̄_m⁻, s̄_m)}, learn a keyword spotter

f(x̄, p̄) = max_s̄ w · φ(x̄, p̄, s̄),   w ∈ Rⁿ

SLIDE 41

Large-Margin Model

[Diagram: in the feature space, the vector w separates the positive utterance with the correct alignment, φ(x̄⁺, p̄, s̄), from the negative utterance with its best alignment, φ(x̄⁻, p̄, s̄′), and with the other alignments, φ(x̄⁻, p̄, s̄″), with a large margin]

SLIDE 48

Large-Margin and Noise

[Diagram: the same separation of φ(x̄⁺, p̄, s̄) from φ(x̄⁻, p̄, s̄′) and φ(x̄⁻, p̄, s̄″), now in the presence of noise; the large margin leaves room for noisy feature vectors]

SLIDE 50

Large-Margin Derivation

[Diagram: the margin d between φ(x̄⁺, p̄, s̄) and the projections of φ(x̄⁻, p̄, s̄′), φ(x̄⁻, p̄, s̄″) onto w]

d = (w · φ(x̄⁺, p̄, s̄) − w · φ(x̄⁻, p̄, s̄′)) / ‖w‖

w · φ(x̄⁺, p̄, s̄) − w · φ(x̄⁻, p̄, s̄′) ≥ 1   ∀s̄′


SLIDE 54

Learning Paradigm

Maximize the margin:

max_w d
s.t. w · φ(x̄_j⁺, p̄_j, s̄_j) − w · φ(x̄_j⁻, p̄_j, s̄′) ≥ 1   ∀j, ∀s̄′

SLIDE 55

Learning Paradigm

Equivalently:

min_w ½‖w‖²
s.t. w · φ(x̄_j⁺, p̄_j, s̄_j) − w · φ(x̄_j⁻, p̄_j, s̄′) ≥ 1   ∀j, ∀s̄′

Note: there is an exponential number of constraints (one for every alignment s̄′ of every example).


SLIDE 58

Iterative Algorithm

Given a training set S = {(p̄_j, x̄_j⁺, x̄_j⁻, s̄_j)}, find

w = arg min_w ½‖w‖²
s.t. w · φ(x̄_j⁺, p̄_j, s̄_j) − w · φ(x̄_j⁻, p̄_j, s̄′) ≥ 1   ∀j, ∀s̄′

(still an exponential number of constraints)

SLIDE 61

Iterative Algorithm

Process one example (p̄_j, x̄_j⁺, x̄_j⁻, s̄_j) at a time; denote the current suggestion by w_{j−1} and solve

w_j = arg min_w ½‖w − w_{j−1}‖²
s.t. w · φ(x̄_j⁺, p̄_j, s̄_j) − w · φ(x̄_j⁻, p̄_j, s̄′) ≥ 1   ∀s̄′

(still an exponential number of constraints per example)

SLIDE 64

Iterative Algorithm

Approximation: replace the exponentially many constraints with a single (most violated) constraint. Define:

s̄′ = arg max_s̄ w_{j−1} · φ(x̄_j⁻, p̄_j, s̄)

and solve

w_j = arg min_w ½‖w − w_{j−1}‖²
s.t. w · φ(x̄_j⁺, p̄_j, s̄_j) − w · φ(x̄_j⁻, p̄_j, s̄′) ≥ 1

whose closed-form solution is

w_j = w_{j−1} + [1 − w_{j−1} · Δφ] / ‖Δφ‖² · Δφ,   Δφ = φ(x̄_j⁺, p̄_j, s̄_j) − φ(x̄_j⁻, p̄_j, s̄′)

SLIDE 66

Iterative Algorithm

Input: training set S = {(p̄_j, x̄_j⁺, x̄_j⁻, s̄_j)}
Initialize: w_0 = 0
For each example (p̄_j, x̄_j⁺, x̄_j⁻, s̄_j):
  Predict: s̄′ = arg max_s̄ w_{j−1} · φ(x̄_j⁻, p̄_j, s̄)
  Set: Δφ = φ(x̄_j⁺, p̄_j, s̄_j) − φ(x̄_j⁻, p̄_j, s̄′)
  If w_{j−1} · Δφ ≤ 1, update: w_j = w_{j−1} + [1 − w_{j−1} · Δφ] / ‖Δφ‖² · Δφ
Output: the w_j which attains the lowest cost on a validation set
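The loop above is a Passive-Aggressive-style update and can be sketched directly (names are illustrative; phi_neg is assumed to come from the arg max over negative alignments):

```python
import numpy as np

def pa_update(w, phi_pos, phi_neg):
    """One step of the iterative algorithm: if the margin constraint
    w . (phi_pos - phi_neg) >= 1 is active, move w the minimal distance
    needed to satisfy it."""
    dphi = phi_pos - phi_neg
    margin = float(w @ dphi)
    if margin <= 1:
        w = w + (1 - margin) / float(dphi @ dphi) * dphi
    return w

def train(examples, n):
    """examples: (phi_pos, phi_neg) feature-vector pairs. Returns the last
    iterate; the talk instead keeps the iterate with the lowest validation
    cost."""
    w = np.zeros(n)
    for phi_pos, phi_neg in examples:
        w = pa_update(w, phi_pos, phi_neg)
    return w
```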

SLIDE 67

Formal Properties

  • Convex optimization problem - single minimum
  • Worst-case analysis: the area under the ROC curve during the training phase is high:

    1 − Ã ≤ (1/m)‖w⋆‖² + (2C/m) Σ_{i=1}^{m} ℓ_i(w⋆)

  • The expected area under the ROC curve on unseen examples is high, with high probability:

    1 − A ≤ (1/m) Σ_{i=1}^{m} ℓ_i(w⋆) + ‖w⋆‖²/m + O(√(ln(m/δ)) / √(m_val))

SLIDE 68

Experimental Results

SLIDE 69

Training Setup

  • TIMIT corpus
  • Phoneme representation:
    – 39 phonemes (Lee & Hon, 1989)
  • Acoustic representation:
    – MFCC+Δ+ΔΔ (ETSI standard)
  • TIMIT training set:
    – 500 utterances for training the feature functions
    – 3116 utterances used for the training set
    – 80 utterances used for validation (40 keywords)

SLIDE 70

Results on TIMIT

[Figure: ROC curves (true positive rate vs. false positive rate) of the discriminative model and the HMM]

Area under the ROC curve: 0.99 (discriminative) vs. 0.96 (HMM)

Test set: 80 new keywords and, for each, 20 positive and 20 negative utterances

SLIDE 71

Results on WSJ

[Figure: ROC curves (true positive rate vs. false positive rate) of the discriminative model and the HMM]

Area under the ROC curve: 0.94 (discriminative) vs. 0.88 (HMM)

Model trained on TIMIT; same 80 new keywords and, for each, 20 positive and 20 negative utterances from the si_tr_s part of WSJ

SLIDE 72

Practicalities & Algorithms

  • The quadratic programming
    – Algorithm for solving the quadratic program with an exponential number of constraints [Keshet, Grangier & Bengio, 2006]
  • Training the feature function classifiers
    – Hierarchical phoneme classifier [Dekel, Keshet & Singer, 2004]
  • Non-separable case
    – Common technique in training soft SVM [Cristianini & Shawe-Taylor, 2000; Vapnik, 1998]

SLIDE 73

Thanks!