Missing data speech recognition in reverberant acoustic conditions

SLIDE 1

Missing data speech recognition in reverberant acoustic conditions
Kalle, Guy and Jon

ONE SIX FOUR... !

  • Mr. M. D. Dummy
SLIDE 2

Content
  1. Introduction to reverberation
  2. Speech modulation frequencies
  3. Reverberation masking model
  4. Test conditions
  5. Results
  6. Discussion
SLIDE 3

  • 1. Introduction to reverberation
  • Fig. 1. Image expansion. Direct sound.
SLIDE 4

  • 1. Introduction to reverberation
  • Fig. 2. Image expansion. 3rd order reflections.
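
A minimal sketch of how image positions up to a given reflection order could be generated, assuming "image expansion" here means the image-source method for a rectangular room. The function names, the 2-D room size and the source position are illustrative assumptions, not values from the slides.

```python
from itertools import product

def axis_images(x, length, max_order):
    """Images of coordinate x along one axis with walls at 0 and `length`.

    Returns (coordinate, number_of_reflections) pairs.
    """
    images = []
    for n in range(-max_order, max_order + 1):
        candidates = (
            (2 * n * length + x, abs(2 * n)),      # even number of reflections
            (2 * n * length - x, abs(2 * n - 1)),  # odd number of reflections
        )
        for coord, reflections in candidates:
            if reflections <= max_order:
                images.append((coord, reflections))
    return images

def image_sources(source, room, max_order):
    """All image sources whose total reflection order is <= max_order."""
    per_axis = [axis_images(x, length, max_order)
                for x, length in zip(source, room)]
    result = []
    for combo in product(*per_axis):
        order = sum(reflections for _, reflections in combo)
        if order <= max_order:
            result.append((tuple(coord for coord, _ in combo), order))
    return result

# Hypothetical 2-D example: 15 m x 13 m floor plan, source strictly inside.
images = image_sources(source=(5.0, 4.0), room=(15.0, 13.0), max_order=3)
for position, order in sorted(images, key=lambda item: item[1])[:5]:
    print(order, position)
```

Order 0 is the direct sound; each further order adds a ring of mirrored sources, which is why the plotted area grows with the reflection order.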
SLIDE 5

  • 1. Introduction to reverberation
  • Fig. 3. Room impulse responses. Left: linear amplitude. Right: log-amplitude.
SLIDE 6

  • 2. Speech modulation frequencies
  • Fig. 4. Ratemap for utterance 5527.
  • Fig. 5. FFT of the ratemap of Fig. 4.
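
A minimal sketch of a modulation spectrum computed from a ratemap, assuming the FFT is taken along the time axis of each frequency channel. The 100 Hz frame rate, the 32 channels and the synthetic 4 Hz modulation are illustrative assumptions; the slides do not give these values.

```python
import numpy as np

def modulation_spectrum(ratemap, frame_rate):
    """FFT along time for each channel of a (channels, frames) ratemap.

    Returns modulation frequencies in Hz and magnitudes in dB.
    """
    n_frames = ratemap.shape[1]
    # Remove each channel's mean so the DC term does not dominate.
    centred = ratemap - ratemap.mean(axis=1, keepdims=True)
    window = np.hanning(n_frames)
    spectrum = np.abs(np.fft.rfft(centred * window, axis=1))
    freqs = np.fft.rfftfreq(n_frames, d=1.0 / frame_rate)
    return freqs, 20.0 * np.log10(spectrum + 1e-12)

# Synthetic stand-in for a ratemap: every channel modulated at 4 Hz.
frame_rate = 100.0                      # assumed: one frame every 10 ms
t = np.arange(200) / frame_rate
ratemap = np.tile(1.0 + 0.5 * np.sin(2 * np.pi * 4.0 * t), (32, 1))

freqs, mag_db = modulation_spectrum(ratemap, frame_rate)
peak_hz = freqs[np.argmax(mag_db.mean(axis=0))]
print("strongest modulation frequency (Hz):", peak_hz)
```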

SLIDE 7

  • 3. Reverberation masking model
  • Plot: magnitude (dB) against frequency (Hz).
  • Fig. 6. Diagram of the model. Labels in the diagram, in extraction order: auditory filterbank, leaky integrator, missing data speech recogniser, cube root, g, rate map, threshold, downsampling by, masking filter, compression, mask.
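
A minimal sketch of the last stages named in the diagram (rate map, masking filter, threshold, mask). The filter used here, a difference of a short and a long moving average along time, is a stand-in chosen for illustration and is not the masking filter from the slides; the kernel lengths and the threshold are also assumptions.

```python
import numpy as np

def bandpass_kernel(short_len, long_len):
    """Short moving average minus long moving average (zero DC gain).

    Both lengths are assumed odd so the kernel is symmetric.
    """
    kernel = -np.full(long_len, 1.0 / long_len)
    start = (long_len - short_len) // 2
    kernel[start:start + short_len] += 1.0 / short_len
    return kernel

def filter_channel(channel, kernel):
    """Filter one channel along time, padding edges to avoid edge artefacts."""
    pad = len(kernel) // 2
    padded = np.pad(channel, pad, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def estimate_mask(ratemap, kernel, threshold):
    """Mark a time-frequency cell reliable where the filtered value exceeds threshold."""
    filtered = np.stack([filter_channel(ch, kernel) for ch in ratemap])
    return filtered > threshold

# Toy ratemap: channel 0 is flat, channel 1 has a short burst.
ratemap = np.full((2, 80), 0.2)
ratemap[0, :] = 1.0
ratemap[1, 40:45] = 1.0

mask = estimate_mask(ratemap, bandpass_kernel(5, 21), threshold=0.05)
print(mask.sum(axis=1))  # number of reliable frames per channel
```

In this toy case only the frames around the burst are kept, which mirrors the idea that strongly modulated regions are treated as reliable and slowly varying energy is masked.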
SLIDE 8

  • 3. Reverberation masking model
  • Fig. 7. Rate maps. Top: clean. Bottom: reverberated.
  • Fig. 8. Masks. Top: a priori. Bottom: reverberation masking.
SLIDE 9
  • 4. Test conditions

Room 1: 15 x 13 x 6.5 m³, T60 = 0.7 s, D/R = 0 and -10 dB
Room 2: 25 x 20 x 6 m³, T60 = 1.7 s, D/R = 0 and -10 dB
Room 3: 55 x 35 x 14 m³, T60 = 2.7 s, D/R = 0 and -10 dB
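
A minimal sketch of what the two parameters mean for an impulse response: T60 is the time for the response to decay by 60 dB, and D/R is the energy ratio between the direct sound and the reverberant remainder. The impulse responses here are synthetic (a unit direct impulse plus exponentially decaying noise), which is an assumption made for illustration and not the room simulation used for the slides. The 16 kHz sample rate and the T20-style line fit are likewise assumptions.

```python
import numpy as np

def synth_rir(t60, dr_db, fs=16000, seed=0):
    """Unit direct impulse followed by a noise tail decaying 60 dB in t60 seconds."""
    rng = np.random.default_rng(seed)
    n = int(1.5 * t60 * fs)
    t = np.arange(n) / fs
    tail = rng.standard_normal(n) * 10.0 ** (-3.0 * t / t60)
    tail[0] = 0.0
    tail_energy = 10.0 ** (-dr_db / 10.0)   # direct energy is 1
    tail *= np.sqrt(tail_energy / np.sum(tail ** 2))
    tail[0] = 1.0                           # insert the direct sound
    return tail

def direct_to_reverberant_db(rir, n_direct=1):
    """Energy of the first n_direct samples over the energy of the rest, in dB."""
    direct = np.sum(rir[:n_direct] ** 2)
    reverberant = np.sum(rir[n_direct:] ** 2)
    return 10.0 * np.log10(direct / reverberant)

def t60_from_decay(rir, fs=16000):
    """Estimate T60 from the backward-integrated energy decay curve."""
    edc = np.cumsum(rir[::-1] ** 2)[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0])
    idx = np.where((edc_db <= -5.0) & (edc_db >= -25.0))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)   # dB per second
    return -60.0 / slope

for t60 in (0.7, 1.7, 2.7):
    for dr_db in (0.0, -10.0):
        rir = synth_rir(t60, dr_db)
        print(t60, dr_db,
              round(direct_to_reverberant_db(rir), 2),
              round(t60_from_decay(rir), 2))
```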

SLIDE 10
  • 5. Results

                                 T60 = 0.7 s           T60 = 1.7 s           T60 = 2.7 s
Recognition technique   Clean    D/R=0 dB  D/R=-10 dB  D/R=0 dB  D/R=-10 dB  D/R=0 dB  D/R=-10 dB
Unity mask              98.26    62.48     46.39       42.82     29.24       34.90     24.28
MFCC                    99.65    60.40     47.08       47.35     34.46       40.73     28.55

Reverb. masking (six values; the source does not show which column is empty): 90.33 85.03 82.25 71.11 66.05 45.52
a priori mask (six values; the source does not show which column is empty): 95.82 93.73 92.42 89.12 90.60 87.99
SLIDE 11
  • 6. Discussion
  • Advantage over robust feature techniques:
      + Mask estimation can be changed on the fly when conditions change, so no retraining is required when the rule is changed
      + Better performance ???
  • Currently all thresholds are hand-tuned; an adaptive system is in development
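
A minimal sketch of why the mask rule can change without retraining: the acoustic model stays fixed and the mask only selects which feature components enter the likelihood. It assumes diagonal-covariance Gaussian state models scored by plain marginalisation over unreliable components; the slides do not say which missing-data scoring the recogniser uses, and the function name and toy values are hypothetical.

```python
import numpy as np

def masked_log_likelihood(frame, mask, mean, var):
    """Gaussian log-likelihood using only the components marked reliable."""
    reliable = np.asarray(mask, dtype=bool)
    diff = frame[reliable] - mean[reliable]
    return -0.5 * np.sum(np.log(2.0 * np.pi * var[reliable])
                         + diff ** 2 / var[reliable])

# Toy state model and a frame whose third channel is corrupted.
mean = np.zeros(4)
var = np.ones(4)
frame = np.array([0.0, 0.0, 5.0, 0.0])

unity = masked_log_likelihood(frame, np.ones(4), mean, var)
masked = masked_log_likelihood(frame, np.array([1, 1, 0, 1]), mean, var)
print(unity, masked)
```

The same `mean` and `var` are used for both masks; only the mask argument differs, so swapping the mask estimation rule leaves the trained parameters untouched.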