Speech Processing 15-492/18-492 Speech Recognition Template - - PowerPoint PPT Presentation

SLIDE 1

Speech Processing 15-492/18-492

Speech Recognition Template matching

SLIDE 2

Speech Recognition by Templates

  • A little history …
  • Matching Templates
  • DTW (Dynamic Time Warping)
  • Beyond template matching

SLIDE 3

Radio Rex (1922)

  • Toys always lead technology …
  • Call “Rex” and he comes out of his kennel
  • (Crystalradio.com and Rhys Jones)
SLIDE 4

Toy ASR “Tricks”

  • Radio Rex
    – Recognizes the vowel formants in “EH”
  • Voice-activated toy train
    – Multilingual stop/go (e.g. Japanese “hashire”/“tomate” for go/stop)
  • Toy “pets” don’t need perfect ASR

SLIDE 5

Template Matching

  • Record templates from user
  • Store in library
  • Record ASR example
  • Compare against each library template
  • Select closest example
  • For example …
    – On a voice dialing system
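The steps above can be sketched in Python. This is a minimal illustration, not the course's implementation. `TemplateLibrary`, `enroll`, `recognize` and `duration_distance` are hypothetical names. An utterance is assumed to be a list of feature frames, and the distance function is pluggable so that a better comparison can be dropped in. The placeholder `duration_distance` compares only lengths in frames, which is deliberately crude.

```python
from typing import Callable, Dict, List, Optional, Sequence

Frame = Sequence[float]      # one feature vector (e.g. spectral coefficients)
Utterance = List[Frame]      # a recording as a sequence of frames


class TemplateLibrary:
    """Hypothetical helper: stores one recorded template per label."""

    def __init__(self, distance: Callable[[Utterance, Utterance], float]):
        self.distance = distance                 # how two recordings are compared
        self.templates: Dict[str, Utterance] = {}

    def enroll(self, label: str, recording: Utterance) -> None:
        # Record a template from the user and store it in the library
        self.templates[label] = recording

    def recognize(self, sample: Utterance) -> Optional[str]:
        # Compare the new example against every template and keep the closest
        best_label, best_score = None, float("inf")
        for label, template in self.templates.items():
            score = self.distance(template, sample)
            if score < best_score:
                best_label, best_score = label, score
        return best_label                        # None if the library is empty


def duration_distance(a: Utterance, b: Utterance) -> float:
    # Placeholder metric: difference in number of frames (duration only)
    return abs(len(a) - len(b))
```

With `duration_distance`, a 35-frame sample is matched to a 30-frame "Mom" template rather than to a 120-frame one. Any other metric with the same signature can be passed in instead.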

SLIDE 6

Voice Dialing System

  • Library
    – Mom
    – Dad
    – Bob
    – Mario’s Pizza
    – Let’s Go Bus Information System

SLIDE 7

Matching in Time Domain

  • Duration
    – Will discriminate between some examples
    – But Mom, Bob and Dad will be confused
  • What about spectral properties?

SLIDE 8

Matching in Frequency Domain

[Figure: “Mom” and “Bob” compared in the frequency domain]
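To compare recordings in the frequency domain, each recording is usually turned into a sequence of short-time spectra. The sketch below is an assumption about how that might be done, not what the slide used. The function name, the 64-sample frame, the 32-sample hop, the Hamming window and the naive DFT are all illustrative choices. Real systems would use an FFT and typically mel-scaled or cepstral features.

```python
import cmath
import math
from typing import List


def log_spectrum_frames(signal: List[float], frame_len: int = 64, hop: int = 32) -> List[List[float]]:
    """Split a signal into overlapping frames and return one log-magnitude spectrum per frame.

    Sketch only: naive O(n^2) DFT, Hamming window, no mel filtering or cepstra.
    """
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):                     # non-negative frequency bins only
            coeff = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len))
            spectrum.append(math.log(abs(coeff) + 1e-10))       # small floor avoids log(0)
        frames.append(spectrum)
    return frames
```

Two recordings become two lists of spectral frames. Their durations can still differ, which is the problem the remaining slides address.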

SLIDE 9

Different deliveries

  • We change durations
  • Two utterances are never the same
  • When recognition fails, we change our delivery
    – Become more articulate (“clearer”)

SLIDE 10

Dynamic Time Warping

[Figure: Template and Sample Speech aligned by Dynamic Time Warping]

SLIDE 11

DTW algorithm

  • For each square (i, j):
    Score[i][j] = Dist(template[i], sample[j]) + smallest_of(Score[i-1][j], Score[i][j-1], Score[i-1][j-1])
    where Score[i][j] is the accumulated distance stored in square (i, j)
  • Remember which choice you took (count path)

[Figure: DTW grid with the Template on one axis and the Sample on the other, showing square (i, j) and its neighbours at i-1 and j-1]
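The recurrence above can be written out as a small Python function. This sketch makes several assumptions. Frames are equal-length feature vectors compared with Euclidean distance. Only the three moves shown are allowed (down, across, diagonal), with no slope constraints or step weights. The names `dtw` and `euclidean` are illustrative. `back` records which choice was taken in each square so that the path can be recovered.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(template: List[Frame], sample: List[Frame]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the accumulated DTW score and the aligned (template, sample) index path."""
    n, m = len(template), len(sample)
    # score[i][j]: best accumulated distance aligning template[:i+1] with sample[:j+1]
    score = [[math.inf] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]       # which neighbour each square came from
    for i in range(n):
        for j in range(m):
            d = euclidean(template[i], sample[j])
            if i == 0 and j == 0:
                score[i][j] = d
                continue
            candidates = []
            if i > 0:
                candidates.append((score[i - 1][j], (i - 1, j)))
            if j > 0:
                candidates.append((score[i][j - 1], (i, j - 1)))
            if i > 0 and j > 0:
                candidates.append((score[i - 1][j - 1], (i - 1, j - 1)))
            best, prev = min(candidates)        # smallest_of the three neighbours
            score[i][j] = d + best
            back[i][j] = prev
    # Trace back from the final square to recover the warping path
    path = [(n - 1, m - 1)]
    while back[path[-1][0]][path[-1][1]] is not None:
        path.append(back[path[-1][0]][path[-1][1]])
    path.reverse()
    return score[n - 1][m - 1], path
```

For example, `dtw([[0], [1], [2]], [[0], [0], [1], [1], [2]])` returns a score of 0. The slower sample can be warped onto the template exactly, and the path repeats template frames 0 and 1 to absorb the extra sample frames.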

SLIDE 12

Multiple Templates

  • Compare against each
  • Find closest
  • Need to normalize scores
    – (divide by length of matches)
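One way to combine the multiple-template comparison with score normalization is sketched below. It assumes that "length of matches" means the number of matched frame pairs on the best DTW path; other normalizations are possible. `normalized_dtw` and `closest_template` are hypothetical names. Ties between equally cheap predecessors are broken toward the shorter path.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Frame = Sequence[float]
Utterance = List[Frame]


def normalized_dtw(template: Utterance, sample: Utterance) -> float:
    """Accumulated DTW distance divided by the number of matched frame pairs on the best path."""
    n, m = len(template), len(sample)
    # cell[i][j] = (accumulated distance, number of matches on that path)
    cell: List[List[Tuple[float, int]]] = [[(math.inf, 0)] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(template[i], sample[j])          # Euclidean frame distance
            if i == 0 and j == 0:
                cell[i][j] = (d, 1)
                continue
            neighbours = [cell[a][b] for a, b in ((i - 1, j), (i, j - 1), (i - 1, j - 1))
                          if a >= 0 and b >= 0]
            best_cost, best_len = min(neighbours)          # cheapest predecessor (ties: shorter path)
            cell[i][j] = (best_cost + d, best_len + 1)
    total, length = cell[n - 1][m - 1]
    return total / length


def closest_template(templates: Dict[str, Utterance], sample: Utterance) -> Tuple[Optional[str], float]:
    """Compare the sample against each template and return (best label, its normalized score)."""
    best_label, best_score = None, math.inf
    for label, template in templates.items():
        score = normalized_dtw(template, sample)
        if score < best_score:
            best_label, best_score = label, score
    return best_label, best_score
```

Without the division, a long template accumulates more distance simply because its path has more squares. That bias would favour short templates regardless of how well they actually match.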

SLIDE 13

Matching Templates

[Figure: a Sample compared against a Template Library containing Word0, Word1, Word2 …]

BestScore = infinity
For Word in Templates
    Score = dtw(Template[Word], Sample);
    if (Score < BestScore) {
        BestScore = Score;
        BestWord = Word;
    }
DoAction(Action[BestWord])

SLIDE 14

DTW issues

  • What happens with no matches?
    – Need to deal with “none of the above”
  • What happens with more templates?
    – Harder to choose between them
    – Especially once the variance is greater than the differences
    – Choose templates that are very different
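A minimal sketch of handling "none of the above" follows. It assumes each template has already been given a normalized DTW score, where lower is better. `choose_or_reject`, `reject_above` and `min_margin` are hypothetical, and both thresholds would have to be tuned on real recordings. The margin test is an added assumption aimed at the "harder to choose between" case: if the two best templates score almost the same, the sketch refuses to guess.

```python
from typing import Dict, Optional


def choose_or_reject(scores: Dict[str, float], reject_above: float,
                     min_margin: float = 0.0) -> Optional[str]:
    """Return the best-scoring label, or None for 'none of the above'."""
    if not scores:
        return None
    ranked = sorted(scores.items(), key=lambda kv: kv[1])   # best (lowest) score first
    best_label, best_score = ranked[0]
    if best_score > reject_above:
        return None          # nothing is close enough: "none of the above"
    if len(ranked) > 1 and ranked[1][1] - best_score < min_margin:
        return None          # two templates are too close to call
    return best_label
```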

SLIDE 15

DTW/Template Applications

  • Voice dialer
  • Simple command and control
  • Speaker ID

SLIDE 16

Speaker ID

[Figure: a Sample compared against a Template Library containing Speaker0, Speaker1, Speaker2 …]

BestScore = infinity
For Speaker in Templates
    Score = dtw(Template[Speaker], Sample);
    if (Score < BestScore) {
        BestScore = Score;
        BestSpeaker = Speaker;
    }

SLIDE 17

DTW

  • Advantages
    – Works well for a small number of templates (<20)
    – Language independent
    – Speaker specific
    – Easy to train (end user controls it)
  • Disadvantages
    – Limited number of templates
    – Speaker specific
    – Needs actual training examples

SLIDE 18

More reliable matching

  • Distance metric
    – Euclidean
  • But some distances are bigger than others
    – Silence frames are pretty similar to each other
    – Fricative distances are much larger
  • A longer fricative might give a large score
  • A longer vowel might give a smaller score
SLIDE 19

More reliable matching

  • Having multiple template examples
    – Match against each one individually, or
    – Average them together
  • DTW-align all of the examples
  • Collect statistics as a Gaussian
    – Mean and standard deviation for each coefficient
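A sketch of the averaging approach follows, under these assumptions. The first example serves as the reference timeline. Every example, including the reference itself, is DTW-aligned to it. The frames mapped onto each reference frame are then summarized by a per-coefficient mean and (population) standard deviation. `dtw_path` and `gaussian_template` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def dtw_path(ref: List[Frame], other: List[Frame]) -> List[Tuple[int, int]]:
    """Best DTW alignment path between two frame sequences, as (ref index, other index) pairs."""
    n, m = len(ref), len(other)
    cost = [[math.inf] * m for _ in range(n)]
    back = [[None] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(ref[i], other[j])
            if i == 0 and j == 0:
                cost[i][j] = d
                continue
            prev = min((cost[a][b], (a, b))
                       for a, b in ((i - 1, j), (i, j - 1), (i - 1, j - 1))
                       if a >= 0 and b >= 0)
            cost[i][j] = d + prev[0]
            back[i][j] = prev[1]
    path, cell = [], (n - 1, m - 1)
    while cell is not None:                      # follow back-pointers to (0, 0)
        path.append(cell)
        cell = back[cell[0]][cell[1]]
    return path[::-1]


def gaussian_template(examples: List[List[Frame]]) -> Tuple[List[List[float]], List[List[float]]]:
    """DTW-align every example to the first one; return per-frame, per-coefficient mean and std."""
    ref = examples[0]
    aligned: List[List[Frame]] = [[] for _ in ref]   # frames mapped onto each reference frame
    for ex in examples:
        for i, j in dtw_path(ref, ex):
            aligned[i].append(ex[j])
    means, stds = [], []
    for frames in aligned:
        dims = range(len(frames[0]))
        mu = [sum(f[k] for f in frames) / len(frames) for k in dims]
        var = [sum((f[k] - mu[k]) ** 2 for f in frames) / len(frames) for k in dims]
        means.append(mu)
        stds.append([math.sqrt(v) for v in var])
    return means, stds
```

Using a single reference keeps the sketch short. The resulting averaged template still has the reference example's duration.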

SLIDE 20

More reliable distances

  • Instead of Euclidean distance
    – Doesn’t take the standard deviation into account
  • Use Mahalanobis distance
    – Takes both the means and the standard deviations into account
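The following is a minimal comparison of the two distances for a single frame. It assumes a diagonal covariance, so each coefficient is scaled by its own standard deviation, as in per-coefficient Gaussian statistics. The function names and the `floor` guard against a zero standard deviation are assumptions.

```python
import math
from typing import Sequence


def euclidean(x: Sequence[float], mean: Sequence[float]) -> float:
    # Treats every coefficient the same, regardless of how much it normally varies
    return math.sqrt(sum((a - m) ** 2 for a, m in zip(x, mean)))


def mahalanobis_diag(x: Sequence[float], mean: Sequence[float], std: Sequence[float],
                     floor: float = 1e-3) -> float:
    # Each coefficient's deviation is divided by that coefficient's standard deviation.
    # Assumes a diagonal covariance (independent coefficients); `floor` avoids division by zero.
    return math.sqrt(sum(((a - m) / max(s, floor)) ** 2 for a, m, s in zip(x, mean, std)))
```

Take mean `[0, 0]` and std `[1, 10]`. Euclidean distance ranks `[0, 5]` (distance 5) as farther than `[3, 0]` (distance 3). Mahalanobis reverses the ranking (0.5 versus 3), because a deviation of 5 is ordinary for a coefficient that normally varies by 10.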

SLIDE 21

Extending Template matching

  • String word templates together
  • Need to find word segmentation
  • But there are many words …

[Figure: word templates Word0, Word1, Word2 …]

SLIDE 22

Extending template model

  • String phoneme templates together
  • A template model for each phoneme

[Figure: a Sample matched against Phoneme Templates Phone0, Phone1, Phone2 …, e.g. “k ae t”]
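Stringing phoneme templates together can be sketched as plain concatenation of frame sequences. This assumes a pronunciation such as k ae t is given as a list of phoneme labels. `word_template` is a hypothetical name. The sketch ignores coarticulation at the joins, which a real system would have to handle.

```python
from typing import Dict, List, Sequence

Frame = Sequence[float]


def word_template(pronunciation: List[str], phone_templates: Dict[str, List[Frame]]) -> List[Frame]:
    """Build a word-level template by stringing phoneme templates together in order."""
    frames: List[Frame] = []
    for phone in pronunciation:
        frames.extend(phone_templates[phone])   # KeyError if a phoneme has no template yet
    return frames
```

The resulting frame sequence could then be compared with a sample by DTW, just like a recorded word template. A new word would need only a pronunciation, not a new recording.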

SLIDE 23

Summary

  • Speech Recognition by Templates
    – Good for simple, small-vocabulary tasks
  • Dynamic Time Warping (DTW)
    – Can match examples of different durations
  • Averaging over multiple models
  • Distance metrics
    – Euclidean vs Mahalanobis
