# Multi-Microphone Speech Dereverberation using Expectation-Maximization and Kalman Smoothing

1. Multi-Microphone Speech Dereverberation using Expectation-Maximization and Kalman Smoothing

   Boaz Schwartz¹, Sharon Gannot¹, Emanuël A.P. Habets² (¹Faculty of Engineering, Bar-Ilan University, Israel; ²International Audio Laboratories Erlangen, Germany)

   EUSIPCO 2013, Marrakesh, Morocco, September 10th

   B. Schwartz, S. Gannot, E. Habets KEMD 1 / 26

4. Introduction: Outline

   - Statistical Model: speech and noise model; acoustical system model
   - EM-Kalman Algorithm: maximum-likelihood problem; EM algorithm approach; Kalman smoother in the E-step
   - Experiments: algorithm initialization; spectral profile; results

5. Introduction: Problem Statement. [Figure: the source signal reaches microphones $1, \ldots, J$ through the acoustic paths $h_1(k), \ldots, h_J(k)$; microphone $j$ records $z_j(t,k)$, which also contains additive noise $v_j(t,k)$.]

11. Statistical Model: Convolutive Transfer Function (CTF) Model

    Time-domain system:
    $$z_j[n] = x[n] * h_j[n] + v_j[n], \qquad j = 1, \ldots, J$$

    CTF system (frame index $t$, frequency index $k$):
    $$z_j(t,k) \approx \sum_{l=0}^{L-1} h_{j,l}(k)\, x(t-l,k) + v_j(t,k)$$

    STFT-domain system:
    $$z_j(t,k) = \sum_{k'} \sum_{l=0}^{L-1} h_{j,l}(k,k')\, x(t-l,k') + v_j(t,k)$$

    The STFT-domain system couples each frequency bin $k$ to other bins $k'$ through cross-band filters; the CTF system keeps only the band-to-band terms ($k' = k$), so each bin is filtered independently along the frame axis.

    [Figure: time-frequency planes of $x(t,k)$ and $z(t,k)$, related through $h(k)$.]
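
A minimal NumPy sketch of the band-to-band CTF model, assuming the STFT of the source is available as an array `x[t, k]` and the CTF coefficients as `h[j, l, k]`. The function name `ctf_convolve` and these array layouts are illustrative choices, not taken from the presentation. Each frequency bin is filtered independently along the frame axis, and frames before the first one are treated as zero.

```python
import numpy as np

def ctf_convolve(x, h, v=None):
    """Band-to-band CTF model: z_j(t, k) = sum_l h_{j,l}(k) x(t - l, k) + v_j(t, k).

    x : (T, K) complex STFT of the source signal
    h : (J, L, K) complex CTF coefficients h_{j,l}(k)
    v : optional (J, T, K) additive noise
    Returns z with shape (J, T, K).
    """
    J, L, K = h.shape
    T = x.shape[0]
    z = np.zeros((J, T, K), dtype=complex)
    for l in range(min(L, T)):
        # Frames t >= l receive h_{j,l}(k) x(t - l, k); x is zero before frame 0.
        z[:, l:, :] += h[:, l, None, :] * x[None, : T - l, :]
    if v is not None:
        z += v
    return z
```

Dropping the cross-band terms is what turns the model into one short convolution per frequency bin, which keeps the per-bin state dimension equal to the CTF length $L$.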

12. Statistical Model: Vector Representation and Signal Models

    CTF convolution in vector form:
    $$z_j(t,k) \approx \mathbf{h}_j^T(k)\,\mathbf{x}_t(k) + v_j(t,k),$$
    where
    $$\mathbf{x}_t(k) = [x(t,k), \ldots, x(t-L+1,k)]^T, \qquad \mathbf{h}_j^T(k) = [h_{j,0}(k), \ldots, h_{j,L-1}(k)].$$

    Speech model: $x(t,k) \sim \mathcal{N}_c\left(0, \sigma_x^2(t,k)\right)$.

    Noise model: $v_j(t,k) \sim \mathcal{N}_c\left(0, \sigma_{v_j}^2(k)\right)$.
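
The vector form and the Gaussian models can be checked numerically for a single frequency bin. The sketch below draws a speech sequence with a time-varying variance, stacks it into the vectors $\mathbf{x}_t$, and forms $z_j(t) = \mathbf{h}_j^T\mathbf{x}_t + v_j(t)$. It uses the usual convention that $\mathcal{N}_c(0,\sigma^2)$ has real and imaginary parts of variance $\sigma^2/2$ each; the helper names `sample_cn` and `stack_frames`, the sizes, and all variances are hypothetical.

```python
import numpy as np

def sample_cn(var, rng):
    """Draw circular complex Gaussian samples N_c(0, var), elementwise."""
    var = np.asarray(var, dtype=float)
    # real and imaginary parts each carry half of the variance
    return np.sqrt(var / 2) * (rng.standard_normal(var.shape)
                               + 1j * rng.standard_normal(var.shape))

def stack_frames(x, L):
    """Row t of the result is x_t = [x(t), x(t-1), ..., x(t-L+1)] (zeros before frame 0)."""
    T = len(x)
    X = np.zeros((T, L), dtype=complex)
    for l in range(min(L, T)):
        X[l:, l] = x[: T - l]
    return X

rng = np.random.default_rng(0)
T, L, J = 2000, 4, 2                        # hypothetical sizes for one frequency bin
sigma_x2 = 1.0 + rng.random(T)              # hypothetical time-varying speech variance
sigma_v2 = np.array([0.01, 0.02])           # hypothetical per-microphone noise variance
x = sample_cn(sigma_x2, rng)                # x(t) ~ N_c(0, sigma_x^2(t))
h = sample_cn(np.full((J, L), 0.25), rng)   # hypothetical CTFs, row j is h_j
X = stack_frames(x, L)
v = sample_cn(np.broadcast_to(sigma_v2, (T, J)), rng)
z = X @ h.T + v                             # z[t, j] = h_j^T x_t + v_j(t)
```

The product `X @ h[j]` equals the first `T` samples of `np.convolve(x, h[j])`, which is the vector form restating the CTF convolution.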

13. Statistical Model: Model Parameters

16. Algorithm Derivation (Parameter Estimation Problem): Maximum-Likelihood and EM

    Maximum likelihood: find the speech variances, CTF coefficients, and noise variances that maximize the likelihood of the microphone signals $z_j(t,k)$.

    EM algorithm: define a latent data set that, if it were available, would facilitate the parameter estimation. The algorithm alternates between estimating the latent data (its conditional statistics) given the current parameters and re-estimating the parameters given those statistics.

17. Algorithm Derivation (Parameter Estimation Problem): Maximum-Likelihood Problem Statement

    Set of measurements: $\mathcal{Z} = \{z_j(t,k)\}$, with $1 \le j \le J$, $1 \le t \le T$, $1 \le k \le K$.

    Parameters: $\Theta \equiv \left\{\sigma_x^2(t,k),\ \mathbf{h}_j(k),\ \sigma_{v_j}^2(k)\right\}$.

    Maximum likelihood:
    $$\hat{\Theta} = \arg\max_{\Theta} \left\{ f(\mathcal{Z}; \Theta) \right\}$$
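
For intuition about $f(\mathcal{Z};\Theta)$, the likelihood of one frequency bin can be evaluated by brute force for very short signals. Assuming, as is usual for such models, that $x$ and $v$ are independent across frames and bins, the bins decouple, and within a bin the stacked observations are zero-mean complex Gaussian with covariance $\mathbf{A}\,\mathrm{diag}(\sigma_x^2)\,\mathbf{A}^\dagger + \mathrm{diag}(\sigma_{v_j}^2)$, where $\mathbf{A}$ (a name introduced here) maps the source frames to the microphone frames. The sketch also assumes $x(t) = 0$ before the first frame; the function name is hypothetical, and its $O((TJ)^3)$ cost makes it a check rather than a practical estimator.

```python
import numpy as np

def log_likelihood(z, h, sigma_x2, sigma_v2):
    """Brute-force log f(Z; Theta) for one frequency bin of the CTF model (sketch).

    z: (T, J) observations, h: (J, L) CTFs, sigma_x2: (T,), sigma_v2: (J,).
    """
    T, J = z.shape
    L = h.shape[1]
    # A[t, j, s] is the weight of source frame s in z_j(t): h_{j, t-s} for 0 <= t-s < L
    A = np.zeros((T, J, T), dtype=complex)
    for t in range(T):
        for l in range(min(L, t + 1)):
            A[t, :, t - l] = h[:, l]
    A = A.reshape(T * J, T)
    # Covariance of the stacked observations (row index t*J + j)
    C = A @ np.diag(sigma_x2) @ A.conj().T + np.diag(np.tile(sigma_v2, T))
    zz = z.reshape(T * J)
    _, logdet = np.linalg.slogdet(C)
    quad = np.real(zz.conj() @ np.linalg.solve(C, zz))
    # log of the circular complex Gaussian density N_c(0, C)
    return -T * J * np.log(np.pi) - logdet - quad
```

The full log-likelihood is then the sum of this quantity over the frequency bins $k$.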

18. Algorithm Derivation (Parameter Estimation Problem): Latent Data and EM

    Latent data set: $\mathcal{X} = \{x(t,k)\}$, with $1 \le t \le T$, $1 \le k \le K$.

    E-step (after $\ell$ iterations):
    $$Q\left(\Theta \mid \hat{\Theta}^{(\ell)}\right) \equiv E\left\{\log f(\mathcal{Z}, \mathcal{X}; \Theta) \,\middle|\, \mathcal{Z}; \hat{\Theta}^{(\ell)}\right\}$$

    M-step:
    $$\hat{\Theta}^{(\ell+1)} = \arg\max_{\Theta} Q\left(\Theta \mid \hat{\Theta}^{(\ell)}\right)$$

19. Algorithm Derivation (Parameter Estimation Problem): Algorithm Outline. E-step: Kalman smoother. M-step: parameter estimation.
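
The alternation between the two steps can be written as a generic loop. In this sketch the function `run_em` and its callable arguments are hypothetical: `e_step` stands for the Kalman smoother of the E-step, `m_step` for the closed-form parameter updates, and the loop runs a fixed number of iterations (the experiments report outputs after the 1st and 10th iterations).

```python
from typing import Any, Callable, List, Tuple

def run_em(theta0: Any,
           e_step: Callable[[Any], Any],
           m_step: Callable[[Any], Any],
           n_iter: int = 10) -> Tuple[Any, List[Any]]:
    """Alternate E- and M-steps for a fixed number of iterations (sketch)."""
    theta = theta0
    history = [theta0]
    for _ in range(n_iter):
        stats = e_step(theta)   # E-step: statistics of the latent data given theta
        theta = m_step(stats)   # M-step: parameters maximizing Q(. | theta)
        history.append(theta)
    return theta, history
```

Keeping the parameter history makes it easy to compare, for example, the output after the first and the tenth iteration.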

20. Algorithm Derivation (E-step): Kalman Smoother, Forward Formulas ($t = 1, \ldots, T$)
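
As a hedged illustration of the forward pass, the sketch below runs a standard Kalman filter for one frequency bin under a state-space form that follows from the vector model: the state is $\mathbf{x}_t = [x(t), \ldots, x(t-L+1)]^T$, it evolves by shifting in a new sample with variance $\sigma_x^2(t)$, and the $J$ observations are $z_j(t) = \mathbf{h}_j^T\mathbf{x}_t + v_j(t)$. The function name and array layouts are illustrative, and the presentation's exact formulation may differ.

```python
import numpy as np

def kalman_forward(z, h, sigma_x2, sigma_v2):
    """Forward Kalman filter for one frequency bin of the CTF model (sketch).

    z        : (T, J) complex observations z_j(t)
    h        : (J, L) complex CTF coefficients, row j is h_j
    sigma_x2 : (T,) speech variances sigma_x^2(t)
    sigma_v2 : (J,) noise variances sigma_{v_j}^2
    Returns predicted and filtered means (T, L) and covariances (T, L, L).
    """
    T, J = z.shape
    L = h.shape[1]
    Phi = np.eye(L, k=-1)                  # shift: x_t = Phi x_{t-1} + e_1 x(t)
    R = np.diag(sigma_v2).astype(complex)  # observation-noise covariance
    s_pred = np.zeros((T, L), dtype=complex)
    P_pred = np.zeros((T, L, L), dtype=complex)
    s_filt = np.zeros((T, L), dtype=complex)
    P_filt = np.zeros((T, L, L), dtype=complex)
    s = np.zeros(L, dtype=complex)         # assume x(t) = 0 before the first frame
    P = np.zeros((L, L), dtype=complex)
    for t in range(T):
        # Prediction: shift the state and inject the new speech sample
        s = Phi @ s
        P = Phi @ P @ Phi.T
        P[0, 0] += sigma_x2[t]
        s_pred[t], P_pred[t] = s, P
        # Update with the J microphone observations of frame t
        S = h @ P @ h.conj().T + R
        K = P @ h.conj().T @ np.linalg.inv(S)
        s = s + K @ (z[t] - h @ s)
        P = (np.eye(L) - K @ h) @ P
        s_filt[t], P_filt[t] = s, P
    return s_pred, P_pred, s_filt, P_filt
```

Starting from a zero state with zero covariance encodes the assumption $x(t) = 0$ before the first frame; with that start, the predicted covariance is singular during the first $L-1$ frames.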

21. Algorithm Derivation (E-step): Kalman Smoother, Backward Formulas ($t = T, \ldots, 1$)
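
A backward pass can be sketched with the Rauch-Tung-Striebel (RTS) smoother, one standard way to implement fixed-interval Kalman smoothing; whether the presentation uses exactly this form is an assumption. The inputs are the predicted and filtered means and covariances of a forward Kalman pass (layouts `(T, L)` and `(T, L, L)`), and the state transition is the same shift matrix that pushes a new speech sample into $\mathbf{x}_t$. The function name is hypothetical.

```python
import numpy as np

def rts_backward(s_pred, P_pred, s_filt, P_filt):
    """Rauch-Tung-Striebel backward pass for the shift-register state (sketch).

    Inputs are the forward-pass predicted/filtered means (T, L) and covariances (T, L, L).
    Returns the smoothed means xhat_{t|T} and error covariances P_{t|T}.
    """
    T, L = s_filt.shape
    Phi = np.eye(L, k=-1)              # shift matrix of the state transition
    s_smooth = s_filt.copy()           # at the last frame, smoothed = filtered
    P_smooth = P_filt.copy()
    for t in range(T - 2, -1, -1):
        # Smoother gain; pinv because P_pred[t + 1] can be singular in early frames
        A = P_filt[t] @ Phi.T @ np.linalg.pinv(P_pred[t + 1])
        s_smooth[t] = s_filt[t] + A @ (s_smooth[t + 1] - s_pred[t + 1])
        P_smooth[t] = P_filt[t] + A @ (P_smooth[t + 1] - P_pred[t + 1]) @ A.conj().T
    return s_smooth, P_smooth
```

The pseudo-inverse handles the exact zero rows and columns that the predicted covariance has while the shift register is still filling up; for jointly Gaussian variables the conditioning formula remains valid with the Moore-Penrose pseudo-inverse.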

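22. Algorithm Derivation (E-step): Obtaining the Data Required for the M-Step

    The smoother output provides what the M-step needs: the signal estimate (first-order statistics)
    $$E\{\mathbf{x}_t \mid \mathcal{Z}; \Theta\} = \hat{\mathbf{x}}_{t|T},$$
    the second-order statistics $E\{\mathbf{x}_t\mathbf{x}_t^\dagger \mid \mathcal{Z}; \Theta\}$, and the estimation-error covariance
    $$\mathbf{P}_{t|T} \equiv E\left\{\left(\hat{\mathbf{x}}_{t|T} - \mathbf{x}_t\right)\left(\hat{\mathbf{x}}_{t|T} - \mathbf{x}_t\right)^\dagger \,\middle|\, \mathcal{Z}; \Theta\right\}, \qquad t = 1, \ldots, T.$$

    The second-order statistics are given by
    $$E\left\{\mathbf{x}_t\mathbf{x}_t^\dagger \mid \mathcal{Z}; \Theta\right\} = \hat{\mathbf{x}}_{t|T}\,\hat{\mathbf{x}}_{t|T}^\dagger + \mathbf{P}_{t|T}.$$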
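
The last identity is easy to apply to smoother output and to check numerically. The sketch below (function name hypothetical) forms $\hat{\mathbf{x}}_{t|T}\hat{\mathbf{x}}_{t|T}^\dagger + \mathbf{P}_{t|T}$ for all frames at once, then compares $\mathbf{m}\mathbf{m}^\dagger + \mathbf{P}$ with a Monte Carlo average of $\mathbf{x}\mathbf{x}^\dagger$ for an arbitrary example mean $\mathbf{m}$ and covariance $\mathbf{P}$.

```python
import numpy as np

def second_order_stats(s_smooth, P_smooth):
    """E{x_t x_t^dagger | Z} = xhat_{t|T} xhat_{t|T}^dagger + P_{t|T}, for every frame t.

    s_smooth : (T, L) smoothed means, P_smooth : (T, L, L) smoothed error covariances.
    """
    return np.einsum("ti,tj->tij", s_smooth, s_smooth.conj()) + P_smooth

# Monte Carlo check with an arbitrary mean m and covariance P
rng = np.random.default_rng(1)
m = np.array([1.0 + 1.0j, -0.5])
P = np.array([[1.0, 0.3], [0.3, 0.5]])
N = 200_000
w = (rng.standard_normal((N, 2)) + 1j * rng.standard_normal((N, 2))) / np.sqrt(2)
samples = m + w @ np.linalg.cholesky(P).T      # rows are draws of x ~ N_c(m, P)
empirical = samples.T @ samples.conj() / N     # sample average of x x^dagger
analytic = second_order_stats(m[None, :], P[None, :, :])[0]
```

The two matrices agree up to Monte Carlo error, which is the reason the smoother must return $\mathbf{P}_{t|T}$ and not only $\hat{\mathbf{x}}_{t|T}$: using the outer product of the mean alone would underestimate the second-order statistics.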

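23. Algorithm Derivation (M-step): Parameter Estimation (per frequency bin; the index $k$ is omitted)

    Speech variance (trivial variance estimation):
    $$\hat{\sigma}_x^{2\,(\ell+1)}(t) = E\left\{|x(t)|^2\right\}$$

    Reverberation system (least-squares system identification):
    $$\left(\hat{\mathbf{h}}_j^{(\ell+1)}\right)^{*} = \left(\sum_{t=1}^{T} E\left\{\mathbf{x}_t\mathbf{x}_t^\dagger\right\}\right)^{-1} \sum_{t=1}^{T} E\{\mathbf{x}_t\}\, z_j^*(t)$$

    Noise variance (residual noise calculation):
    $$\hat{\sigma}_{v_j}^{2\,(\ell+1)} = \frac{1}{T}\sum_{t=1}^{T} E\left\{\left|z_j(t) - \left(\hat{\mathbf{h}}_j^{(\ell+1)}\right)^T \mathbf{x}_t\right|^2\right\}$$

    All expectations are conditioned on $\mathcal{Z}$ and $\hat{\Theta}^{(\ell)}$: the unavailable quantities $|x(t)|^2$, $\mathbf{x}_t$, and $\mathbf{x}_t\mathbf{x}_t^\dagger$ of the naive estimators are replaced by the statistics that the Kalman smoother provides. The conjugate on the left-hand side of the CTF update follows from writing the model as $z_j(t) = \mathbf{h}_j^T\mathbf{x}_t + v_j(t)$.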
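
The three updates translate almost line for line into array code for one frequency bin. In this sketch the function name `m_step` and the array layouts are hypothetical; the expectations are computed from the smoothed means $\hat{\mathbf{x}}_{t|T}$ and covariances $\mathbf{P}_{t|T}$, and the residual term uses the expansion $E\{|z - \mathbf{h}^T\mathbf{x}|^2\} = |z|^2 - 2\,\mathrm{Re}\{z^*\mathbf{h}^T\hat{\mathbf{x}}\} + \mathbf{h}^T E\{\mathbf{x}\mathbf{x}^\dagger\}\mathbf{h}^*$.

```python
import numpy as np

def m_step(z, s_smooth, P_smooth):
    """One M-step for a single frequency bin (sketch).

    z        : (T, J) observations z_j(t)
    s_smooth : (T, L) smoothed means xhat_{t|T}
    P_smooth : (T, L, L) smoothed error covariances P_{t|T}
    Returns sigma_x2 (T,), h (J, L) with row j = h_j, and sigma_v2 (J,).
    """
    # Second-order statistics E{x_t x_t^dagger | Z}
    Rxx = np.einsum("ti,tj->tij", s_smooth, s_smooth.conj()) + P_smooth
    # Speech variance: E{|x(t)|^2 | Z} is the (0, 0) entry
    sigma_x2 = Rxx[:, 0, 0].real
    # CTFs: the normal equations give h_j^* because the model is z_j = h_j^T x_t + v_j
    R_sum = Rxx.sum(axis=0)                      # sum_t E{x_t x_t^dagger}
    r_sum = s_smooth.T @ z.conj()                # column j: sum_t E{x_t} z_j^*(t)
    h = np.linalg.solve(R_sum, r_sum).conj().T   # (J, L)
    # Noise variance: (1/T) sum_t E{|z_j(t) - h_j^T x_t|^2 | Z}
    pred = s_smooth @ h.T                        # h_j^T xhat_t, shape (T, J)
    quad = np.einsum("jl,tlm,jm->tj", h, Rxx, h.conj()).real   # h_j^T R_t h_j^*
    resid = np.abs(z) ** 2 - 2 * np.real(z.conj() * pred) + quad
    sigma_v2 = resid.mean(axis=0)
    return sigma_x2, h, sigma_v2
```

With noiseless data and exactly known states ($\mathbf{P}_{t|T} = \mathbf{0}$), the update recovers the true CTF and a zero noise variance, which is a convenient sanity check.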

24. Experiments: Sonograms. [Figure: spectrograms (0–8 kHz, 0–5 s) and waveforms of (a) the clean signal, (b) the reverberant signal, (c) the output after the 1st iteration, and (d) the output after the 10th iteration.]

27. Experiments: Practical Considerations, $\mathbf{H}$ Initialization

    Alternatives:

    - First arrivals: the first arrivals are known in advance, and $\mathbf{H}$ is initialized with their STFT-domain representation. [Figure: room impulse response over 4–20 ms with the first arrivals marked.]
    - Single peak: with no a priori knowledge, $\mathbf{H}$ is set to time-aligned peaks of equal amplitude at all microphones (in the STFT domain). [Figure: single-peak initialization for mics 1–4 over 4–20 ms.]

    Test results: no significant difference between the two alternatives, in either subjective or objective evaluation.
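
The single-peak alternative is simple to express in code. The sketch assumes the array layout `h[j, l, k]` and that a time-aligned peak maps entirely to the first CTF tap in every bin; in practice the STFT of a time-domain peak can spread over more than one frame, depending on its position relative to the analysis window. The function name `init_ctf_single_peak` is hypothetical.

```python
import numpy as np

def init_ctf_single_peak(J, L, K, amplitude=1.0):
    """'No a priori knowledge' initialization: the same peak at tap 0 for all mics and bins."""
    h = np.zeros((J, L, K), dtype=complex)
    h[:, 0, :] = amplitude      # time-aligned, equal-amplitude peaks
    return h
```

Because every microphone starts with the same response, any inter-microphone differences in the final estimate must come from the EM iterations themselves.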

28. Experiments: Practical Considerations, $\sigma_x^2$ Initialization

    Alternatives:

    1. Spectral enhancement (pre-processing): use an SE method for dereverberation [Habets et al. 2009] as a pre-processing step, $\sigma_x^2 \leftarrow \hat{\sigma}_x^{2\,(\mathrm{se})}$.
    2. Noisy-reverberant values: use the absolute-squared value of the noisy-reverberant signal, $\sigma_x^2 \leftarrow |z|^2$.

    Test results: with pre-processing, better dereverberation; without pre-processing, less dereverberation but a more natural-sounding output.
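
The second alternative (no pre-processing) can be sketched as follows. Averaging $|z_j(t,k)|^2$ over the microphones and flooring the result to avoid zero variances are assumptions made here; the slide only states $\sigma_x^2 \leftarrow |z|^2$. The first alternative would instead take the variance estimate produced by the spectral-enhancement pre-processor. The function name is hypothetical.

```python
import numpy as np

def init_speech_variance(z, floor=1e-10):
    """sigma_x^2(t, k) <- |z(t, k)|^2, averaged over microphones (assumption).

    z : (J, T, K) complex microphone STFTs; returns a (T, K) array.
    """
    power = np.mean(np.abs(z) ** 2, axis=0)
    return np.maximum(power, floor)   # keep the variances strictly positive
```

Since $|z|^2$ still contains the reverberant tail, this initialization starts the iterations from the reverberant spectral profile, which is consistent with the milder dereverberation reported without pre-processing.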

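29. Experiments: Gain Ambiguity

    Signal gain or system gain? In the model
    $$z_j(t,k) = \mathbf{h}_j^T(k)\,\mathbf{x}_t(k) + v_j(t,k),$$
    scaling $\mathbf{x}_t(k)$ by any nonzero constant $c(k)$ and every $\mathbf{h}_j(k)$ by $1/c(k)$ leaves the observations unchanged, so the gain of each frequency bin is not identifiable and the result can have an arbitrary (random) spectral profile.

    Suggested cure: preserve the input power profile,
    $$\sum_{t=1}^{T} \sigma_x^2(t,k) = \sum_{t=1}^{T} |z(t,k)|^2.$$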
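
One way to impose this constraint is to rescale each bin after an M-step. The sketch below is an illustration under stated assumptions, not necessarily the authors' procedure: it takes $z(t,k)$ from a single reference microphone, scales $\sigma_x^2(t,k)$ by the per-bin factor $g(k)$ that matches the two sums, and divides the CTFs by $\sqrt{g(k)}$ so that $\mathbf{h}_j^T(k)\mathbf{x}_t(k)$, and hence the modeled observations, do not change. The function name `fix_gain` is hypothetical.

```python
import numpy as np

def fix_gain(sigma_x2, h, z_ref):
    """Rescale each bin so that sum_t sigma_x^2(t, k) = sum_t |z_ref(t, k)|^2 (sketch).

    sigma_x2 : (T, K) speech variances, h : (J, L, K) CTFs,
    z_ref    : (T, K) reference-microphone STFT (assumption).
    """
    g = np.sum(np.abs(z_ref) ** 2, axis=0) / np.sum(sigma_x2, axis=0)   # (K,)
    # x scales by sqrt(g), so h scales by 1/sqrt(g): h_j^T x_t is unchanged
    return sigma_x2 * g, h / np.sqrt(g)
```

Applying the same compensating factor to both the signal and the system removes the ambiguity without altering the fit to the data.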
