Enhanced Robot Audition Based on Microphone Array Source Separation


Apr 19, 2023

Enhanced Robot Audition Based on Microphone Array Source Separation with Post-Filter. Jean-Marc Valin, Jean Rouat, François Michaud, Department of Electrical Engineering and Computer Engineering, Université de Sherbrooke, Québec, Canada

SLIDE 1

Enhanced Robot Audition Based on Microphone Array Source Separation with Post-Filter

Jean-Marc Valin, Jean Rouat, François Michaud Department of Electrical Engineering and Computer Engineering Université de Sherbrooke, Québec, Canada Jean-Marc.Valin@USherbrooke.ca

SLIDE 2

Motivations

The context: mobile robot and cocktail party effect
The problem: separating sound sources
The solution: microphone array with both linear and non-linear processing

[Figure: processing chain. The sources S_m(k,l) are picked up by the microphones as X_n(k,l); geometric source separation produces the outputs Y_m(k,l); the post-filter then yields the separated sources Ŝ_m(k,l).]

SLIDE 3

Approach

Frequency-domain processing
Geometric Source Separation (GSS)

Minimize leakage under constraints
Adapted for real-time processing

Post-filter

Cancels remaining interference
Based on the Ephraim and Malah estimator
Handles both stationary and non-stationary noise/interference

SLIDE 4

Geometric Source Separation

Frequency domain: Constrained optimization

Minimize the correlation between the outputs, subject to a geometric constraint
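The equations on this slide did not survive extraction. Assuming the deck follows the standard GSS formulation of Parra and Alvino, with x(k,l) the vector of microphone spectra X_n(k,l), y(k,l) the vector of outputs Y_m(k,l), W(k) the demixing matrix, and A(k) the propagation matrix built from the localized source directions (these symbol names are assumptions), the two cost terms would read roughly as follows. Treat this as a reconstruction, not the slide's exact notation.

```latex
\begin{aligned}
\mathbf{y}(k,l) &= \mathbf{W}(k)\,\mathbf{x}(k,l), \qquad
\mathbf{R}_{yy}(k) = E\!\left[\mathbf{y}(k,l)\,\mathbf{y}^H(k,l)\right] \\
J_1(\mathbf{W}) &= \left\| \mathbf{R}_{yy}(k) - \operatorname{diag}\!\left[\mathbf{R}_{yy}(k)\right] \right\|^2 \\
J_2(\mathbf{W}) &= \left\| \operatorname{diag}\!\left[\mathbf{W}(k)\,\mathbf{A}(k) - \mathbf{I}\right] \right\|^2
\end{aligned}
```

J_1 penalizes the off-diagonal output correlation (leakage between outputs), while J_2 asks each output to pass its own source, as seen through A(k), with unit gain.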

Modifications to the original GSS algorithm

Instantaneous computation of correlations
Stochastic-gradient descent
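The two modifications can be sketched for a single frequency bin as below: the expectations are replaced by instantaneous outer products (R_xx ≈ x xᴴ, R_yy ≈ y yᴴ), and W takes one gradient step per frame on α·J1 + J2, where J1 is the off-diagonal output correlation and J2 = ||diag(W A − I)||². The gradient expressions follow the usual GSS derivation; the normalization α = ||R_xx||⁻², the step size `mu`, and the function name `gss_update` are illustrative assumptions, not values taken from the slides.

```python
import numpy as np


def gss_update(W, x, A, mu=0.01, eps=1e-12):
    """One stochastic-gradient GSS update for a single frequency bin k.

    Hypothetical helper (name and defaults are illustrative).
    W : (M, N) complex demixing matrix W(k)
    x : (N,) complex microphone spectra X_n(k, l) for the current frame
    A : (N, M) complex propagation matrix from source localization
    Returns the updated W and the outputs y = W x (i.e. Y_m(k, l)).
    """
    M = W.shape[0]
    y = W @ x                                 # separated outputs Y_m(k, l)
    # Instantaneous estimates replace the expectations E[.]
    Rxx = np.outer(x, x.conj())
    Ryy = np.outer(y, y.conj())
    E = Ryy - np.diag(np.diag(Ryy))           # off-diagonal part = leakage
    # Gradients taken as 2 x the derivative with respect to conj(W)
    grad_j1 = 4.0 * E @ W @ Rxx
    C = W @ A - np.eye(M)
    grad_j2 = 2.0 * np.diag(np.diag(C)) @ A.conj().T
    # Assumed normalization of the decorrelation term
    alpha = 1.0 / (np.linalg.norm(Rxx) ** 2 + eps)
    W_new = W - mu * (alpha * grad_j1 + grad_j2)
    return W_new, y
```

Because only one frame is used per step, each update is cheap enough for real-time operation; the price is noisier gradients, which a small `mu` averages out over successive frames.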

SLIDE 5

Post-Filter Overview

Noise estimate as the sum of two components (stationary + transient)
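Written out, with λ_m(k,l) denoting the noise variance estimated for separated source m in frequency bin k and frame l (notation assumed, using the same indices as the block diagram), the post-filter's noise model is:

```latex
\lambda_m(k,l) = \lambda_m^{\mathrm{stat.}}(k,l) + \lambda_m^{\mathrm{leak}}(k,l)
```

The stationary term is tracked by the background noise estimator, and the transient term by the interference (leakage) estimator.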

SLIDE 6

Background Noise Estimation

Minima-Controlled Recursive Averaging (MCRA, Cohen)

Noise estimate is adapted during quiet periods
Applied for each source of interest

Initial estimate provided directly from the microphones
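A simplified MCRA tracker can be sketched as follows, assuming per-frame power spectra |Y(k,l)|² as input. It keeps the main ingredients of Cohen's method (recursive smoothing, windowed minimum tracking, a smoothed speech-presence probability, and presence-controlled averaging) but omits frequency smoothing. All parameter values and the class name `MCRA` are illustrative assumptions; the tracker simply initializes from the first frame it receives.

```python
import numpy as np


class MCRA:
    """Simplified Minima-Controlled Recursive Averaging noise tracker.

    Hypothetical sketch: no frequency smoothing; parameters are assumed.
    """

    def __init__(self, alpha_s=0.8, alpha_p=0.2, alpha_d=0.95,
                 delta=5.0, window=125):
        self.alpha_s, self.alpha_p, self.alpha_d = alpha_s, alpha_p, alpha_d
        self.delta, self.window = delta, window
        self.noise = None

    def update(self, power):
        """power: |Y(k, l)|^2 for one frame, shape (K,). Returns noise estimate."""
        power = np.asarray(power, dtype=float)
        if self.noise is None:
            # Initialize every statistic from the first frame
            self.S = power.copy()
            self.S_min = power.copy()
            self.S_tmp = power.copy()
            self.p = np.zeros_like(power)
            self.noise = power.copy()
            self.frame = 1
            return self.noise.copy()
        # Recursive smoothing of the power spectrum
        self.S = self.alpha_s * self.S + (1 - self.alpha_s) * power
        # Track the minimum of S over a window of `window` frames
        self.S_min = np.minimum(self.S_min, self.S)
        self.S_tmp = np.minimum(self.S_tmp, self.S)
        self.frame += 1
        if self.frame % self.window == 0:
            self.S_min = np.minimum(self.S_tmp, self.S)
            self.S_tmp = self.S.copy()
        # Speech is deemed present when S is well above its minimum
        present = (self.S / np.maximum(self.S_min, 1e-12)) > self.delta
        self.p = self.alpha_p * self.p + (1 - self.alpha_p) * present
        # Adapt the noise estimate mostly when speech is absent (quiet periods)
        alpha_tilde = self.alpha_d + (1 - self.alpha_d) * self.p
        self.noise = alpha_tilde * self.noise + (1 - alpha_tilde) * power
        return self.noise.copy()
```

During a loud burst the presence probability quickly approaches 1, so `alpha_tilde` approaches 1 and the noise estimate barely moves; in quiet frames it tracks the input power with time constant set by `alpha_d`.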

SLIDE 7

Interference Estimation

Source separation leaks, due to:

Incomplete adaptation
Inaccuracy in localization
Reverberation
Imperfect microphones

Estimation from other separated sources
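Estimating the leakage from the other separated sources can be sketched as below, assuming that each output's smoothed power spectrum Z_m is a recursive average of |Y_m(k,l)|², and that a fixed fraction η of every other source's power leaks into output m. The smoothing constant, the value η = −10 dB, and the function name `leakage_estimate` are illustrative assumptions.

```python
import numpy as np


def leakage_estimate(Z_prev, Y, alpha_s=0.7, eta_db=-10.0):
    """Estimate the interference leaking into each separated output.

    Hypothetical helper (name and defaults are illustrative).
    Z_prev : (M, K) smoothed power spectra of the M separated sources
    Y      : (M, K) complex GSS outputs Y_m(k, l) for the current frame
    Returns the updated smoothed spectra Z and the leakage estimate,
    where source m receives eta * sum over i != m of Z_i.
    """
    Z = alpha_s * Z_prev + (1.0 - alpha_s) * np.abs(Y) ** 2
    eta = 10.0 ** (eta_db / 10.0)           # leakage factor, dB -> linear
    total = Z.sum(axis=0, keepdims=True)    # sum over all sources
    leak = eta * (total - Z)                # exclude source m itself
    return Z, leak
```

Subtracting Z from the column total is a vectorized way to sum over all sources except m.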

SLIDE 8

Suppression Rule

Ephraim & Malah spectral estimator
Gain is modified to take into account the probability of source presence (Cohen)
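A sketch of such a suppression rule, assuming the log-spectral amplitude (LSA) variant of the Ephraim and Malah estimator and Cohen's OM-LSA way of weighting it by the presence probability p: G = G_H1^p · G_min^(1−p). The a priori SNR ξ and a posteriori SNR γ are taken as given (their estimation is not shown), and the floor `g_min` and all function names are illustrative assumptions. The exponential integral E1 is computed with a power series for x ≤ 1 and a continued fraction otherwise, to keep the block dependency-free.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp1(x, eps=1e-12, max_iter=200):
    """Exponential integral E1(x) for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    if x > 1.0:
        # Modified Lentz evaluation of the continued fraction
        b = x + 1.0
        c = 1.0 / 1e-30
        d = 1.0 / b
        h = d
        for i in range(1, max_iter):
            an = -float(i * i)
            b += 2.0
            d = 1.0 / (an * d + b)
            c = b + an / c
            delta = c * d
            h *= delta
            if abs(delta - 1.0) < eps:
                break
        return h * math.exp(-x)
    # Power series: E1(x) = -gamma - ln x - sum_k (-x)^k / (k * k!)
    total = -math.log(x) - EULER_GAMMA
    term = 1.0
    for k in range(1, max_iter):
        term *= -x / k
        delta = -term / k
        total += delta
        if abs(delta) < abs(total) * eps:
            break
    return total


def lsa_gain(xi, gamma):
    """LSA gain G_H1 for a priori SNR xi and a posteriori SNR gamma (linear)."""
    v = max(xi * gamma / (1.0 + xi), 1e-10)
    return xi / (1.0 + xi) * math.exp(0.5 * exp1(v))


def presence_weighted_gain(xi, gamma, p, g_min=0.1):
    """Gain weighted by presence probability p: G_H1**p * g_min**(1 - p)."""
    return lsa_gain(xi, gamma) ** p * g_min ** (1.0 - p)
```

When the source is almost surely absent (p near 0) the gain falls to the floor `g_min` instead of zero, which limits musical noise; when it is almost surely present the plain LSA gain is applied. The post-filter output would then be Ŝ_m(k,l) = G · Y_m(k,l).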

SLIDE 9

Experimental Setup

Array of 8 inexpensive microphones on a Pioneer2 robot
Automatic localization
Noisy conditions
350 ms reverberation time

SLIDE 10

Results (Signal-to-Noise Ratio)

Three voices recorded separately, so the clean signal is available
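Having each voice's clean recording is what makes an SNR measurement possible: everything in the separated output that differs from the clean reference counts as noise. A minimal sketch, assuming the estimate is already time-aligned with the reference and using a single global SNR (the deck may use a different variant, such as segmental SNR); the function name `snr_db` is illustrative.

```python
import math


def snr_db(clean, estimate):
    """Global SNR of an estimate against a known clean reference, in dB."""
    signal = sum(s * s for s in clean)
    noise = sum((s - e) ** 2 for s, e in zip(clean, estimate))
    if noise == 0.0:
        return math.inf                       # perfect reconstruction
    return 10.0 * math.log10(signal / noise)
```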

SLIDE 11

Results (spectrograms)

[Figure: spectrograms of the input, the GSS output, the post-filter output, and the reference signal]

SLIDE 12

Results (recognition with post-filter)

Japanese isolated word recognition (SIG2 robot)

3 simultaneous sources
200-word vocabulary
Sources 90 degrees apart

14% reduction in error rate

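[Figure: bar chart of recognition rates for the right, left, and center sources, comparing mixed input, GSS only, and GSS+pf; values shown: 66%, 15%, 41%, 71%, 21%, 53%]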
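The slide does not say whether "14% reduction in error rate" is relative or in absolute percentage points. Under the relative reading, the figure is computed from recognition accuracies as sketched below; the function name and the 50% to 57% accuracies in the usage example are hypothetical and do not come from the slide.

```python
def relative_error_reduction(acc_before, acc_after):
    """Relative reduction in word error rate when recognition accuracy
    goes from acc_before to acc_after (both given as fractions)."""
    err_before = 1.0 - acc_before
    err_after = 1.0 - acc_after
    return (err_before - err_after) / err_before
```

For example, hypothetical accuracies of 50% and 57% give error rates of 50% and 43%, i.e. a 14% relative reduction even though the absolute gain is 7 points.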

SLIDE 13

Conclusion

Geometric Source Separation

Real-time minimization of leakage

Source separation post-filter

Interference estimated using other sources

Future work

Robustness to reverberation
Better integration with speech recognition

Using the post-filter to estimate ASR feature reliability

[Audio samples: original, processed]

SLIDE 14


Questions?