Microphone Array Processing for Distant Speech Recognition

  1. Microphone Array Processing for Distant Speech Recognition: From close-talking microphones to far-field sensors
     Hannes Pessentheiner, Signal Processing and Speech Communication Laboratory (SPSC)
     Advanced Signal Processing 2

  2. Distant Speech-Interaction in Robust Home Applications
     ◮ people who would like to have assistance in everyday life
     ◮ physically handicapped people
     Problem: they want to live independently
     - lack of a sphere of privacy
     - dependence on other people
     Solution: ambient assisted living (AAL)
     - operated by device
     - operated by voice command
     [Figure: AAL scenario of a handicapped woman.]
     What is a main challenge in voice command?

  3. Distant Speech Recognition (DSR)
     ◮ most natural human-computer interface
       - interaction through speech
       - no use of body- or head-mounted microphones
     [Figure: block diagram of a simple DSR system.]
     How does speech capturing work?

  4. Speech Capturing
     ◮ free field / diffuse field
     ◮ spherical / planar wave propagation
     [Figure: propagation of a spherical (left) and a plane (right) wave.]
     What to do with multi-channel data?

  5. Speech Enhancement: Source localization & tracking
     ◮ estimate the speaker’s position / direction for each instant of time
     ◮ compute the trajectory of instantaneous position estimates
     [Figure: sound capturing (left), speaker localization (center), and speaker tracking (right).]
     How to employ directional information?

  6. Speech Enhancement: Beamforming
     How to improve the pre-enhanced signal?

  7. Speech Enhancement: Postfiltering
     Beamformer-related Postfilter (among others)
     ◮ Generalized Sidelobe Canceller (GSC): Filter&Sum beamformer $\mathbf{w}_q^H$ with a parallel filter
     Autonomous Postfilter (among others)
     ◮ single-channel source separation filter
     ◮ echo & noise attenuation filter
     ◮ spectral subtraction
     What about errors in DSR systems?

  8. Errors in DSR
     Major Errors
     ◮ front-end errors
     ◮ corrupted training material for single-channel source separation or the speech recognizer
     Minor Errors (among others)
     ◮ distorted features
     ◮ numerical accuracy (single/double precision)
     [Figure: representative front-end errors.]
     How to measure the performance of a DSR system?

  9. Metrics
     Word Error Rate (WER):
     $\mathrm{WER} = \dfrac{S + D + I}{S + D + C}$
     where
     I ... number of insertions
     S ... number of substitutions
     D ... number of deletions
     C ... number of correct words
     Word Accuracy Rate (WACC):
     $\mathrm{WACC} = 1 - \mathrm{WER}$
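
The counts S, D, and I come from a minimum-edit-distance alignment of the recognized word sequence against the reference transcript. The following is a minimal sketch of that computation using only the Python standard library; the function names `align_counts` and `wer` and the example sentences are illustrative assumptions, not part of the slides.

```python
def align_counts(ref, hyp):
    """Return (S, D, I, C) from a minimum-edit-distance alignment of two word lists."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i                      # delete all reference words
    for j in range(1, m + 1):
        cost[0][j] = j                      # insert all hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace one optimal path and count the operation types.
    S = D = I = C = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                C += 1
            else:
                S += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            D += 1
            i -= 1
        else:
            I += 1
            j -= 1
    return S, D, I, C


def wer(ref, hyp):
    """WER = (S + D + I) / (S + D + C); the denominator is the reference length."""
    S, D, I, C = align_counts(ref, hyp)
    return (S + D + I) / (S + D + C)


ref = "turn on the kitchen light".split()
hyp = "turn the kitchen lights now".split()
print(wer(ref, hyp), 1.0 - wer(ref, hyp))   # WER and WACC
```

Because S + D + C always equals the number of reference words while I is unbounded, WER can exceed 1 and WACC can become negative when the recognizer inserts many words. An empty reference is not handled in this sketch.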

  10. Metrics cont’d
      Real-valued Kurtosis (peakedness):
      $K(X) = E\{|X|^4\} - \beta \left( E\{|X|^2\} \right)^2$
      where
      X ... random variable (RV)
      E ... expectation operator
      β ... positive constant
      K > 0 : super-Gaussian probability density function (PDF)
      K = 0 : Gaussian PDF
      K < 0 : sub-Gaussian PDF
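
As a rough numerical illustration of the three cases, the sketch below estimates the kurtosis from samples by replacing the expectations with sample means. It assumes β = 3, the value for which a zero-mean real-valued Gaussian gives K = 0; the Laplace and uniform distributions are stand-ins for a super-Gaussian and a sub-Gaussian PDF and are not taken from the slides.

```python
import random


def kurtosis(samples, beta=3.0):
    """Sample estimate of K(X) = E{|X|^4} - beta * (E{|X|^2})^2."""
    n = len(samples)
    m2 = sum(abs(x) ** 2 for x in samples) / n
    m4 = sum(abs(x) ** 4 for x in samples) / n
    return m4 - beta * m2 ** 2


rng = random.Random(0)
n = 100_000

# All three sample sets are zero-mean with unit variance.
gauss = [rng.gauss(0.0, 1.0) for _ in range(n)]
# The difference of two unit exponentials is Laplace with variance 2.
laplace = [(rng.expovariate(1.0) - rng.expovariate(1.0)) / 2 ** 0.5 for _ in range(n)]
uniform = [rng.uniform(-3 ** 0.5, 3 ** 0.5) for _ in range(n)]

k_gauss = kurtosis(gauss)       # close to 0
k_laplace = kurtosis(laplace)   # clearly positive (super-Gaussian)
k_uniform = kurtosis(uniform)   # clearly negative (sub-Gaussian)
print(k_gauss, k_laplace, k_uniform)
```

For unit variance the theoretical values are 0, 3, and -1.2; the estimates fluctuate around them, and the heavy-tailed Laplace case fluctuates the most.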

  11. Metrics cont’d
      Negentropy (Gaussian-distance):
      $N(X) = H(X_{\mathrm{Gaussian}}) - H(X)$
      with differential entropy
      $H(X) = -\int p_X(x) \log p_X(x)\, dx = -E\{\log p_X(x)\}$
      N = 0 : Gaussian PDF
      N > 0 : non-Gaussian PDF
      Why consider kurtosis & negentropy?
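
For a few textbook distributions the differential entropy has a closed form, so the negentropy can be evaluated directly. The sketch below does this under the usual convention that X_Gaussian is a Gaussian RV with the same variance as X; the choice of Laplace and uniform distributions and all function names are assumptions made for illustration.

```python
import math


def gaussian_entropy(var):
    """H = 0.5 * log(2 * pi * e * var), in nats."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)


def laplace_entropy(var):
    """H = 1 + log(2b), where the Laplace scale b satisfies var = 2 * b^2."""
    b = math.sqrt(var / 2.0)
    return 1.0 + math.log(2.0 * b)


def uniform_entropy(var):
    """H = log(width), where the interval width satisfies var = width^2 / 12."""
    return math.log(math.sqrt(12.0 * var))


def negentropy(entropy_fn, var=1.0):
    """N(X) = H(X_Gaussian) - H(X) for equal variance."""
    return gaussian_entropy(var) - entropy_fn(var)


n_gauss = negentropy(gaussian_entropy)
n_laplace = negentropy(laplace_entropy)
n_uniform = negentropy(uniform_entropy)
print(n_gauss, n_laplace, n_uniform)
```

Both non-Gaussian cases give N > 0. Unlike kurtosis, negentropy carries no sign information, so it does not by itself tell a super-Gaussian PDF from a sub-Gaussian one.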

  12. Distribution of Speech Samples
      [Figure: histograms of the real parts of sub-band frequency components (f = 800 Hz) of (a) clean speech, (b) noise-corrupted speech, and (c) reverberated speech snapshots.]
      ◮ the PDF of a sum of independent RVs approaches a Gaussian in the limit
        - a mix of speech, reverb, & noise exhibits a Gaussian PDF
        - clean speech exhibits a super-Gaussian PDF
        - use N and K to restore super-Gaussianity
      How to restore super-Gaussianity?
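
The first bullet can be checked with a small simulation: summing more independent super-Gaussian sources pushes the kurtosis of the mixture toward the Gaussian value of zero. This is only a sketch; the unit-variance Laplace sources are a stand-in for speech sub-band samples, β = 3 is assumed, and the function names are hypothetical.

```python
import random


def kurtosis(samples, beta=3.0):
    """Sample estimate of E{X^4} - beta * (E{X^2})^2 for real-valued data."""
    n = len(samples)
    m2 = sum(x * x for x in samples) / n
    m4 = sum(x ** 4 for x in samples) / n
    return m4 - beta * m2 ** 2


def laplace(rng):
    """One unit-variance Laplace sample (scaled difference of two exponentials)."""
    return (rng.expovariate(1.0) - rng.expovariate(1.0)) / 2 ** 0.5


def mixture_kurtosis(n_sources, n_samples=50_000, seed=1):
    """Kurtosis of a unit-variance sum of n_sources independent Laplace sources."""
    rng = random.Random(seed)
    scale = n_sources ** 0.5            # keeps the mixture at unit variance
    mix = [sum(laplace(rng) for _ in range(n_sources)) / scale
           for _ in range(n_samples)]
    return kurtosis(mix)


k_single = mixture_kurtosis(1)   # one source: strongly super-Gaussian
k_mixed = mixture_kurtosis(8)    # eight sources: much closer to Gaussian
print(k_single, k_mixed)
```

In theory the kurtosis of this normalized sum is 3 divided by the number of sources, which is why a beamformer that maximizes kurtosis tends to undo the mixing.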

  13. Conventional Beamforming
      Assumptions
      ◮ signal $s(\omega)$ exhibits plane wave characteristics
      ◮ microphone $i$ captures the noise-corrupted signal $x_i(\omega)$
      $\mathbf{x}(\omega) = s(\omega)\, \mathbf{d}(\omega, \mathbf{k}) + \mathbf{v}(\omega)$
      where
      ω ... angular frequency
      k ... wavenumber vector
      d ... array manifold / sound capture model vector
      x ... snapshot vector

  14. Conventional Beamforming cont’d
      Beamforming
      ◮ linear spatio-temporal filter
      ◮ compensate $\mathbf{d}(\omega, \mathbf{k})$ for the steering direction
      $y(\omega) = \mathbf{w}^H(\omega)\, \mathbf{x}(\omega)$ with $\mathbf{w}^H(\omega)\, \mathbf{d}(\omega, \mathbf{k}) = 1$

  15. Speech Enhancement: Beamforming cont’d
      [Figure: 3D directivity patterns of two different beamformers.]

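  16. Speech Enhancement: Beamforming cont’d
      Delay&Sum (DS):
      $\mathbf{w}(\omega) = \dfrac{\mathbf{d}(\omega, \mathbf{k})}{N}$
      Minimum Variance Distortionless Response (MVDR):
      $\arg\min_{\mathbf{w}(\omega)} \mathbf{w}^H(\omega)\, \mathbf{R}_{NN}(\omega)\, \mathbf{w}(\omega)$ subject to the distortionless constraint $\mathbf{w}^H(\omega)\, \mathbf{d}(\omega, \mathbf{k}) = 1$,
      with the solution
      $\mathbf{w}^H(\omega) = \dfrac{\mathbf{d}^H(\omega, \mathbf{k})\, \mathbf{R}_{NN}^{-1}(\omega)}{\mathbf{d}^H(\omega, \mathbf{k})\, \mathbf{R}_{NN}^{-1}(\omega)\, \mathbf{d}(\omega, \mathbf{k})}$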
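
The two weight formulas can be compared numerically. The sketch below uses NumPy and makes several assumptions that are not in the slides: a uniform linear array of four microphones with 5 cm spacing, a frequency of 1 kHz, a plane-wave manifold with phase delays proportional to cos(θ), and a noise covariance built from one interferer plus white sensor noise. All function names are hypothetical.

```python
import numpy as np


def steering_vector(omega, theta, n_mics, spacing, c=343.0):
    """Plane-wave array manifold d for a uniform linear array (assumed geometry)."""
    delays = np.arange(n_mics) * spacing * np.cos(theta) / c
    return np.exp(-1j * omega * delays)


def ds_weights(d):
    """Delay&Sum: w = d / N."""
    return d / d.size


def mvdr_weights(d, R):
    """MVDR: w = R^{-1} d / (d^H R^{-1} d)."""
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)


def noise_power(w, R):
    """Output noise power w^H R w."""
    return float(np.real(np.vdot(w, R @ w)))


omega = 2 * np.pi * 1000.0
n_mics, spacing = 4, 0.05
d = steering_vector(omega, np.deg2rad(90), n_mics, spacing)      # look direction
d_int = steering_vector(omega, np.deg2rad(40), n_mics, spacing)  # interferer

# Noise covariance: strong directional interferer plus weak white noise.
R = 10.0 * np.outer(d_int, d_int.conj()) + 0.1 * np.eye(n_mics)

w_ds = ds_weights(d)
w_mvdr = mvdr_weights(d, R)

print(abs(np.vdot(w_ds, d)), abs(np.vdot(w_mvdr, d)))    # both distortionless
print(noise_power(w_ds, R), noise_power(w_mvdr, R))      # MVDR passes less noise
```

Both beamformers satisfy the distortionless constraint; MVDR additionally adapts to the noise covariance, so its output noise power cannot exceed that of DS for the same look direction.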

  17. Speech Enhancement: Beamforming cont’d
      MVDR with Diagonal Loading
      ◮ consider the quadratic constraint: $0 < |\mathbf{w}|^2 < \gamma$
      ◮ the optimization replaces $\mathbf{R}_{NN}^{-1}(\omega)$ by $\left( \mathbf{R}_{NN}(\omega) + \sigma^2 \mathbf{I} \right)^{-1}$
      where
      σ ... loading level
      I ... identity matrix
      Super-directive MVDR
      ◮ replace $\mathbf{R}_{NN}$ by $\Gamma_{m,n} = \mathrm{sinc}\!\left( \dfrac{\omega \cdot l_{m,n}}{c} \right)$
      where
      l_{m,n} ... distance between microphones m and n
      c ... sound velocity
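
Both variants only change the matrix that enters the MVDR solution, which the following NumPy sketch makes explicit. It assumes a four-microphone uniform linear array with 5 cm spacing steered to endfire at 1 kHz, and it reads sinc as the unnormalized sin(x)/x; the geometry, the loading values, and the function names are illustrative assumptions.

```python
import numpy as np


def diffuse_coherence(omega, positions, c=343.0):
    """Gamma[m, n] = sin(x) / x with x = omega * l_mn / c (diffuse noise field)."""
    dist = np.abs(positions[:, None] - positions[None, :])
    # np.sinc is the normalized sinc sin(pi t) / (pi t), hence the division by pi.
    return np.sinc(omega * dist / (c * np.pi))


def loaded_mvdr_weights(d, R, sigma2):
    """MVDR weights with R replaced by R + sigma2 * I (diagonal loading)."""
    R_loaded = R + sigma2 * np.eye(R.shape[0])
    Rinv_d = np.linalg.solve(R_loaded, d)
    return Rinv_d / (d.conj() @ Rinv_d)


c = 343.0
omega = 2 * np.pi * 1000.0
positions = np.arange(4) * 0.05                  # microphone positions in metres
d = np.exp(-1j * omega * positions / c)          # endfire steering (theta = 0)

Gamma = diffuse_coherence(omega, positions)      # super-directive design matrix
w_light = loaded_mvdr_weights(d, Gamma, 1e-3)    # weak loading
w_heavy = loaded_mvdr_weights(d, Gamma, 1e-1)    # strong loading

norm_light = float(np.real(np.vdot(w_light, w_light)))
norm_heavy = float(np.real(np.vdot(w_heavy, w_heavy)))
print(norm_light, norm_heavy)
```

Stronger loading shrinks the squared weight norm, which is the quantity bounded by γ in the quadratic constraint. In the limit of very large loading the weights approach the Delay&Sum solution.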

  18. Speech Enhancement: Beamforming cont’d
      [Figure: one-dimensional directivity patterns of DS and MVDR.]

  19. Speech Enhancement: Beamforming cont’d
      3D Convex-optimized (CVX)
      ◮ more constraints & optimization in 3 spatial dimensions
      $\arg\min_{\mathbf{w}(\omega)} \left\| \mathbf{G}(\omega) \cdot [\mathbf{w}(\omega) \otimes \mathbf{I}] - \hat{\mathbf{D}} \right\|_F$
      subject to
      $\mathbf{w}^H(\omega)\, \mathbf{d}(\omega) = 1$ (distortionless response),
      $\mathbf{w}^H(\omega)\, \mathbf{V}(\omega) = \mathbf{0}$ (null steering),
      $\dfrac{|\mathbf{w}^T(\omega)\, \mathbf{d}(\omega)|^2}{\mathbf{w}^H(\omega)\, \mathbf{w}(\omega)} \geq \gamma$ (white noise gain)
      where
      G ... 3D capturing response matrix
      D̂ ... 3D desired response matrix
      V ... 3D null steering matrix

  20. Speech Enhancement: Beamforming cont’d
      [Figure: two-dimensional CVX directivity pattern with synthesized null and frequency-invariance.]

  21. Speech Enhancement: Beamforming cont’d
      Generalized Sidelobe Canceller (GSC)
      ◮ combine beamformer and postfilter
      [Figure: block diagram of a GSC beamformer.]
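
The block diagram is not part of this transcript, so the sketch below follows the textbook GSC structure rather than the slide itself: a fixed (quiescent) beamformer w_q in the upper path, and in the lower path a blocking matrix B with B^H d = 0 followed by an active filter w_a, giving w = w_q - B w_a. The symbols B and w_a, the random test covariance, and all function names are assumptions. With the optimal w_a, this structure reproduces the MVDR weights.

```python
import numpy as np


def mvdr_weights(d, R):
    """Reference solution: w = R^{-1} d / (d^H R^{-1} d)."""
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)


def blocking_matrix(d):
    """Orthonormal basis B of the null space of d^H, so that B^H d = 0."""
    _, _, vh = np.linalg.svd(d.conj()[None, :])   # full_matrices=True by default
    return vh[1:].conj().T                        # shape (N, N - 1)


def gsc_weights(d, R):
    """GSC: quiescent beamformer minus the blocked, adaptively filtered path."""
    w_q = d / np.vdot(d, d).real                  # fixed distortionless beamformer
    B = blocking_matrix(d)
    # Active weights minimizing the output power (w_q - B w_a)^H R (w_q - B w_a).
    w_a = np.linalg.solve(B.conj().T @ R @ B, B.conj().T @ R @ w_q)
    return w_q - B @ w_a


rng = np.random.default_rng(0)
n_mics = 4
d = np.exp(-1j * 0.7 * np.arange(n_mics))         # arbitrary unit-modulus manifold
A = rng.standard_normal((n_mics, n_mics)) + 1j * rng.standard_normal((n_mics, n_mics))
R = A @ A.conj().T + 0.1 * np.eye(n_mics)         # Hermitian positive-definite covariance

w_gsc = gsc_weights(d, R)
w_mvdr = mvdr_weights(d, R)
print(np.allclose(w_gsc, w_mvdr))
```

The point of the decomposition is that the constraint is handled once by w_q and B, leaving an unconstrained problem for w_a that can be solved adaptively.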

  22. Speech Enhancement: Beamforming cont’d
      Subspace Maximum Kurtosis (MK) / Negentropy (MN)
      ◮ based on GSC, but with a subspace filter matrix U that
        - reduces dimensionality and
        - decomposes the signal into spatially correlated and ambient components
      ◮ use kurtosis or negentropy to detect (sub-/super-)Gaussianity
      [Figure: block diagram of an MK/MN beamformer.]

  23. Speech Enhancement: Beamforming cont’d
      [Figure: modes (eigenvalues) of spatially correlated and ambient components.]

  24. Speech Enhancement: Beamforming cont’d
      Super-directive MVDR based on Spherical Harmonics
      ◮ based on a 3D microphone array
      ◮ frequency-invariant directivity pattern
      ◮ directivity stability around the spherical array
      ◮ redefine the array manifold / sound capture model vector d
      ◮ all beamforming techniques can be applied
      [Figure: Eigenmike, a spherical microphone array.]
