
Missing data speech recognition in reverberant acoustic conditions



  1. Missing data speech recognition in reverberant acoustic conditions. Kalle, Guy and Jon. ONE SIX FOUR... ! Mr. M. D. Dummy

  2. Content: 1. Introduction to reverberation; 2. Speech modulation frequencies; 3. Reverberation masking model; 4. Test conditions; 5. Results; 6. Discussion

  3. 1. Introduction to reverberation. Fig. 1: Image expansion. Direct sound.

  4. 1. Introduction to reverberation. Fig. 2: Image expansion. 3rd order reflections.
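
The image expansion in Figs. 1 and 2 can be illustrated with a short sketch of the image-source construction for a rectangular floor plan. This is a minimal illustration, not the authors' simulation: the function name `image_sources_2d`, the room size, and the source position are all hypothetical, and only the source positions are computed (no wall absorption or delays).

```python
def image_sources_2d(source, room, max_order):
    """List (x, y, order) for all image sources with at most max_order wall reflections."""
    def axis_images(pos, length):
        # Cell k of the mirrored tiling holds the source, flipped when k is odd;
        # |k| is the number of reflections along this axis.
        return [(k * length + (pos if k % 2 == 0 else length - pos), abs(k))
                for k in range(-max_order, max_order + 1)]

    images = []
    for x, order_x in axis_images(source[0], room[0]):
        for y, order_y in axis_images(source[1], room[1]):
            if order_x + order_y <= max_order:
                images.append((x, y, order_x + order_y))
    return images


room = (15.0, 13.0)    # illustrative floor plan in metres (assumption)
source = (4.0, 5.0)    # hypothetical source position
images = image_sources_2d(source, room, max_order=3)
```

With `max_order=0` only the direct sound remains; raising the order adds rings of mirrored sources around the room.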

  5. 1. Introduction to reverberation. Fig. 3: Room impulse responses. Left: linear amplitude. Right: log-amplitude.

  6. 2. Speech modulation frequencies. Fig. 4: Ratemap for utterance 5527. Fig. 5: FFT of the ratemap of Fig. 3.
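
Taking an FFT of a ratemap along time gives the modulation spectrum of each channel envelope. The sketch below shows this for a single channel using a plain DFT. The frame rate of 100 frames per second and the synthetic 4 Hz envelope are assumptions for illustration; the slides do not state the frame rate or the analysis parameters.

```python
import cmath
import math


def modulation_spectrum(envelope, frame_rate_hz):
    """Return (frequencies, magnitudes) of the mean-removed envelope via a plain DFT."""
    n = len(envelope)
    mean = sum(envelope) / n
    centred = [v - mean for v in envelope]
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(centred))
        freqs.append(k * frame_rate_hz / n)
        mags.append(abs(acc))
    return freqs, mags


# Synthetic envelope (assumption): 2 s at 100 frames/s, modulated at 4 Hz.
frame_rate_hz = 100.0
envelope = [1.0 + 0.5 * math.sin(2 * math.pi * 4.0 * t / frame_rate_hz)
            for t in range(200)]
freqs, mags = modulation_spectrum(envelope, frame_rate_hz)
peak_hz = freqs[mags.index(max(mags))]
```

Applying the same function to every ratemap channel yields a channel-by-modulation-frequency picture of the kind Fig. 5 describes.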

  7. 3. Reverberation masking model. Fig. 6: Diagram of the model. Block labels: auditory filterbank, leaky integrator, rate map, cube root compression, downsampling, masking filter, threshold, mask, missing data speech recogniser. Inset plot: filter magnitude (dB) against frequency (Hz).

  8. 3. Reverberation masking model. Fig. 7: Rate maps. Top: clean. Bottom: reverberated. Fig. 8: Masks. Top: a priori. Bottom: reverberation masking.
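
The slides name the blocks of the model but not the filter or the thresholds, so the following is only a rough sketch of how a filter-plus-threshold mask could be built from a rate map. Everything specific here is an assumption: the decision rule (the cube-root-compressed envelope must exceed its leaky-integrated version by a threshold), the names `leaky_integrate` and `estimate_mask`, and the values of `alpha` and `threshold`.

```python
def leaky_integrate(channel, alpha):
    """First-order leaky integrator: y[t] = alpha * y[t-1] + (1 - alpha) * x[t]."""
    out, state = [], channel[0]
    for x in channel:
        state = alpha * state + (1.0 - alpha) * x
        out.append(state)
    return out


def estimate_mask(rate_map, alpha=0.5, threshold=0.1):
    """Mark a cell reliable (1) when the compressed envelope exceeds its
    smoothed version by more than threshold; otherwise unreliable (0)."""
    mask = []
    for channel in rate_map:
        # Cube root compression, as named in the model diagram.
        compressed = [max(v, 0.0) ** (1.0 / 3.0) for v in channel]
        smoothed = leaky_integrate(compressed, alpha)
        mask.append([1 if c - s > threshold else 0
                     for c, s in zip(compressed, smoothed)])
    return mask


# Toy single-channel rate map (assumption): an onset followed by a decaying tail.
toy_rate_map = [[0.0, 0.0, 8.0, 8.0, 4.0, 2.0, 1.0, 0.5]]
toy_mask = estimate_mask(toy_rate_map)
```

In this toy case the onset frames are kept and the decaying tail is marked unreliable, which is the behaviour one would want for reverberant tails. Because the rule sits outside the recogniser, `alpha` and `threshold` can be changed without retraining.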

  9. 4. Test conditions
     Room 1: 15 x 13 x 6.5 m³, T60 = 0.7 s, D/R = 0 and -10 dB
     Room 2: 25 x 20 x 6 m³, T60 = 1.7 s, D/R = 0 and -10 dB
     Room 3: 55 x 35 x 14 m³, T60 = 2.7 s, D/R = 0 and -10 dB
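
The D/R figures are direct-to-reverberant energy ratios. As a minimal sketch, the ratio can be computed from a room impulse response by splitting it at the end of the direct sound. The function name, the split index `direct_end`, and the toy impulse responses are assumptions for illustration; the slides do not say how the direct part was delimited.

```python
import math


def direct_to_reverberant_db(impulse_response, direct_end):
    """D/R in dB: energy before index direct_end over energy from direct_end onwards."""
    direct = sum(h * h for h in impulse_response[:direct_end])
    reverberant = sum(h * h for h in impulse_response[direct_end:])
    return 10.0 * math.log10(direct / reverberant)


# Toy impulse responses (assumption): one direct tap followed by four reflections.
near_ir = [1.0, 0.5, 0.5, 0.5, 0.5]            # direct energy equals tail energy
far_ir = [math.sqrt(0.1)] + near_ir[1:]        # direct energy 10 dB lower

drr_near = direct_to_reverberant_db(near_ir, direct_end=1)
drr_far = direct_to_reverberant_db(far_ir, direct_end=1)
```

The two toy cases correspond to the 0 dB and -10 dB conditions listed above.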

  10. 5. Results

      Recognition technique | Clean | T60=0.7 s, D/R=0 dB | T60=0.7 s, D/R=-10 dB | T60=1.7 s, D/R=0 dB | T60=1.7 s, D/R=-10 dB | T60=2.7 s, D/R=0 dB | T60=2.7 s, D/R=-10 dB
      Unity mask            | 98.26 | 62.48 | 46.39 | 42.82 | 29.24 | 34.90 | 24.28
      MFCC                  | 99.65 | 60.40 | 47.08 | 47.35 | 34.46 | 40.73 | 28.55
      Reverb. masking       |       | 90.33 | 85.03 | 82.25 | 71.11 | 66.05 | 45.52
      A priori mask         |       | 95.82 | 93.73 | 92.42 | 89.12 | 90.60 | 87.99

  11. 6. Discussion
      • Advantage over robust feature techniques:
        + Mask estimation can be changed on the fly when conditions change, so no retraining is required when the rule is changed
        + Better performance ???
      • Currently all thresholds are hand tuned; an adaptive system is under development
