Multi-Microphone Speech Dereverberation using - PowerPoint PPT Presentation

Multi-Microphone Speech Dereverberation using Expectation-Maximization and Kalman Smoothing
Boaz Schwartz (1), Sharon Gannot (1), Emanuël A.P. Habets (2)
(1) Faculty of Engineering, Bar-Ilan University, Israel
(2) International Audio Laboratories Erlangen, Germany
EUSIPCO 2013, Marrakesh, Morocco, September 10th
B. Schwartz, S. Gannot, E. Habets, KEMD

Introduction: Outline
Statistical Model
- Speech and noise model
- Acoustical system model
EM-Kalman Algorithm
- Maximum likelihood problem
- EM algorithm approach
- Kalman smoother in the E-step
Experiments
- Algorithm initialization
- Spectral profile
- Results

Introduction: Problem Statement
[Figure: a speech source x(t,k) reaches microphones j = 1, ..., J through the acoustic transfer functions h_1(k), ..., h_J(k); each microphone signal z_j(t,k) also contains additive noise v_j(t,k).]

Statistical Model: Convolutive Transfer Function (CTF) Model
Time-domain system: $z_j[n] = x[n] * h_j[n] + v_j[n]; \quad j = 1, \ldots, J$
CTF system: $z_j(t,k) \approx \sum_{l=0}^{L-1} h_{j,l}(k)\, x(t-l,k) + v_j(t,k)$
STFT-domain system: $z_j(t,k) = \sum_{k'} \sum_{l} h_{j,l}(k,k')\, x(t-l,k') + v_j(t,k)$
The CTF system is the STFT-domain system with the cross-band terms ($k' \neq k$) neglected, so each frequency bin is filtered independently along time.
[Figure: time-frequency grids of x(t,k) and z(t,k), linked by the filter h(k).]
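To make the CTF approximation concrete, here is a minimal NumPy sketch that filters an STFT-domain source with band-to-band CTFs: one independent convolution along the time axis per frequency bin. The helper name `ctf_filter` and the array layouts are illustrative assumptions, and the noise term is left out.

```python
import numpy as np

def ctf_filter(X, Hj):
    """Hypothetical helper: z_j(t,k) = sum_l h_{j,l}(k) x(t-l,k), noise omitted.

    X:  (T, K) complex STFT of the source, frames along axis 0.
    Hj: (K, L) CTF taps of microphone j, one row per frequency bin.
    Frames before t = 0 are treated as zero.
    """
    T, K = X.shape
    L = Hj.shape[1]
    Z = np.zeros((T, K), dtype=complex)
    for l in range(min(L, T)):
        # tap l multiplies the source delayed by l frames, separately in every bin
        Z[l:, :] += Hj[:, l][None, :] * X[:T - l, :]
    return Z
```

Because no bin interacts with another, the cost is linear in the number of bins, which is what makes the per-bin processing in the following slides possible.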

Statistical Model: Vector Representation and Signals Model
Vector representation. CTF convolution: $z_j(t,k) \approx \mathbf{h}_j^T(k)\,\mathbf{x}_t(k) + v_j(t,k)$
where $\mathbf{x}_t(k) = [x(t,k), \ldots, x(t-L+1,k)]^T$ and $\mathbf{h}_j^T(k) = [h_{j,0}(k), \ldots, h_{j,L-1}(k)]$.
Speech model: $x(t,k) \sim \mathcal{N}_c\big(0, \sigma_x^2(t,k)\big)$
Noise model: $v_j(t,k) \sim \mathcal{N}_c\big(0, \sigma_{v_j}^2(k)\big)$
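A minimal sketch of the vector representation and signal model for a single frequency bin (index k dropped): `stack_frames` builds the vectors x_t, and `sample_bin` draws circular complex Gaussian speech and noise and forms z_j(t) = h_j^T x_t + v_j(t). All names, and the assumption that x(t) = 0 before the first frame, are hypothetical choices for illustration.

```python
import numpy as np

def stack_frames(x, L):
    """Row t is x_t = [x(t), x(t-1), ..., x(t-L+1)]; samples before t = 0 are taken as zero."""
    T = x.shape[0]
    X = np.zeros((T, L), dtype=complex)
    for l in range(min(L, T)):
        X[l:, l] = x[:T - l]
    return X

def complex_normal(rng, var, size):
    """Circular complex Gaussian N_c(0, var): real and imaginary parts each have variance var/2."""
    return np.sqrt(var / 2) * (rng.standard_normal(size) + 1j * rng.standard_normal(size))

def sample_bin(T, H, sigma_x2, sigma_v2, rng):
    """Draw x(t) ~ N_c(0, sigma_x2[t]) and z_j(t) = h_j^T x_t + v_j(t) for one bin.

    H: (J, L), row j holds h_j^T = [h_{j,0}, ..., h_{j,L-1}]; sigma_v2: (J,).
    """
    J, L = H.shape
    x = complex_normal(rng, np.asarray(sigma_x2, dtype=float), T)
    V = complex_normal(rng, np.asarray(sigma_v2, dtype=float)[None, :], (T, J))
    # plain transpose, not conjugate transpose: the model uses h_j^T x_t
    Z = stack_frames(x, L) @ H.T + V
    return x, Z
```

With zero noise variance, each column of Z equals the CTF convolution of x with h_j, which is a quick consistency check of the vector form against the sum over l.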

Statistical Model: Model Parameters

Algorithm Derivation, Parameter Estimation Problem: Maximum-Likelihood and EM
Maximum likelihood: find the speech variance, the CTF coefficients and the noise variance that maximize the likelihood function of the microphone signals z_j(t,k).
EM algorithm: define a latent data set that, if it were available, would facilitate the parameter estimation. The algorithm iterates between evaluating the latent data and estimating the parameters.

Algorithm Derivation, Parameter Estimation Problem: Maximum-Likelihood Problem Statement
Set of measurements: $\mathcal{Z} = \{z_j(t,k)\}$, $1 \le j \le J$, $1 \le t \le T$, $1 \le k \le K$
Parameters: $\Theta \equiv \{\sigma_x^2(t,k),\ \mathbf{h}_j(k),\ \sigma_{v_j}^2(k)\}$
Maximum likelihood: $\hat{\Theta} = \arg\max_{\Theta} f(\mathcal{Z};\Theta)$

Algorithm Derivation, Parameter Estimation Problem: Latent Data and EM
Latent data set: $\mathcal{X} = \{x(t,k)\}$, $1 \le t \le T$, $1 \le k \le K$
EM algorithm.
E-step (after $\ell$ iterations): $Q\big(\Theta \mid \hat{\Theta}^{(\ell)}\big) \equiv E\big\{\log f(\mathcal{Z},\mathcal{X};\Theta) \mid \mathcal{Z};\hat{\Theta}^{(\ell)}\big\}$
M-step: $\hat{\Theta}^{(\ell+1)} = \arg\max_{\Theta} Q\big(\Theta \mid \hat{\Theta}^{(\ell)}\big)$

Algorithm Derivation, Parameter Estimation Problem: Algorithm Outline
E-step: Kalman smoother. M-step: parameter estimation. The two steps alternate.
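The alternation in the outline can be written as a small driver. This sketch takes the E-step and M-step as callables (hypothetical parameters `e_step` and `m_step`, standing in for the Kalman smoother and the parameter updates) and runs a fixed number of iterations; the fixed count is an assumption, and a likelihood-based stopping rule would be an equally valid choice.

```python
from typing import Any, Callable

def run_em(Z: Any, params: Any,
           e_step: Callable[[Any, Any], Any],
           m_step: Callable[[Any, Any], Any],
           n_iter: int = 10) -> Any:
    """Alternate E-steps and M-steps for a fixed number of iterations."""
    for _ in range(n_iter):
        stats = e_step(Z, params)   # E-step: posterior statistics of the latent speech
        params = m_step(Z, stats)   # M-step: new estimates of sigma_x^2, h_j, sigma_vj^2
    return params
```

Keeping the two steps as separate callables mirrors the slide's split and lets each step be tested in isolation.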

Algorithm Derivation, E-step: Kalman Smoother, Forward Formulas ($t = 1, \ldots, T$)
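The forward formulas are not reproduced in this text. As a hedged sketch, assume the common shift-register state-space form for one bin: the state x_t = Φ x_{t-1} + [x(t), 0, ..., 0]^T, with Φ the L × L down-shift matrix, and the observation z_t = H x_t + v_t, where row j of H is h_j^T and the noise covariance is diag(σ²_{v_1}, ..., σ²_{v_J}). `kalman_forward` is a hypothetical name, and the zero initial state (silence before t = 1) is an assumption.

```python
import numpy as np

def kalman_forward(Z, H, sigma_x2, sigma_v2):
    """Forward Kalman pass for one frequency bin (shift-register state assumed).

    Z: (T, J) observations; H: (J, L) CTFs, row j = h_j^T;
    sigma_x2: (T,) speech variances; sigma_v2: (J,) noise variances.
    Returns filtered means x_{t|t} (T, L), filtered covariances P_{t|t} (T, L, L)
    and predicted covariances P_{t|t-1} (T, L, L).
    """
    T, J = Z.shape
    L = H.shape[1]
    Phi = np.eye(L, k=-1)                  # shifts [x(t-1), ..., x(t-L)] down by one slot
    R = np.diag(sigma_v2).astype(complex)  # noise covariance across microphones
    x = np.zeros(L, dtype=complex)         # x_{0|0}: silence before the first frame
    P = np.zeros((L, L), dtype=complex)    # P_{0|0}
    x_filt = np.zeros((T, L), dtype=complex)
    P_filt = np.zeros((T, L, L), dtype=complex)
    P_pred = np.zeros((T, L, L), dtype=complex)
    for t in range(T):
        # Predict: shift the buffer and inject the new speech sample with variance sigma_x2[t]
        x_p = Phi @ x
        P_p = Phi @ P @ Phi.T
        P_p[0, 0] += sigma_x2[t]
        # Update with the J microphone observations
        S = H @ P_p @ H.conj().T + R
        K = P_p @ H.conj().T @ np.linalg.inv(S)
        x = x_p + K @ (Z[t] - H @ x_p)
        P = (np.eye(L) - K @ H) @ P_p
        x_filt[t], P_filt[t], P_pred[t] = x, P, P_p
    return x_filt, P_filt, P_pred
```

With L = 1 and J = 1 the pass reduces to the scalar Wiener estimate σ²_x h* z / (|h|² σ²_x + σ²_v), which gives an easy sanity check.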

Algorithm Derivation, E-step: Kalman Smoother, Backward Formulas ($t = T, \ldots, 1$)
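A matching Rauch-Tung-Striebel backward pass, again a sketch under a linear state-space assumption rather than the authors' exact formulas. It consumes the filtered means x_{t|t}, filtered covariances P_{t|t} and predicted covariances P_{t|t-1} of a forward pass, plus the transition matrix Φ. The function name is hypothetical, and the pseudo-inverse is an assumed safeguard: with a zero initial state, P_{t+1|t} can be rank-deficient during the first frames.

```python
import numpy as np

def rts_backward(x_filt, P_filt, P_pred, Phi):
    """Rauch-Tung-Striebel smoother (backward pass, t = T-1, ..., 0).

    x_filt: (T, L) x_{t|t}; P_filt: (T, L, L) P_{t|t};
    P_pred: (T, L, L) P_{t|t-1}; Phi: (L, L) state transition.
    Returns smoothed means x_{t|T} and covariances P_{t|T}.
    """
    T = x_filt.shape[0]
    x_s = x_filt.astype(complex)   # copies; the last frame is already smoothed
    P_s = P_filt.astype(complex)
    for t in range(T - 2, -1, -1):
        # smoother gain G_t = P_{t|t} Phi^H P_{t+1|t}^+
        G = P_filt[t] @ Phi.conj().T @ np.linalg.pinv(P_pred[t + 1])
        x_s[t] = x_filt[t] + G @ (x_s[t + 1] - Phi @ x_filt[t])
        P_s[t] = P_filt[t] + G @ (P_s[t + 1] - P_pred[t + 1]) @ G.conj().T
    return x_s, P_s
```

Under the shift-register model Φ is the down-shift matrix; any other Φ works the same way, which is what the scalar test below exploits.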

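Algorithm Derivation, E-step: Obtaining Data Required for the M-Step
The smoother output: the signal estimate $\hat{\mathbf{x}}_{t|T} = E\{\mathbf{x}_t \mid \mathcal{Z};\Theta\}$ and the estimation-error covariance
$\mathbf{P}_{t|T} \equiv E\big\{(\hat{\mathbf{x}}_{t|T} - \mathbf{x}_t)(\hat{\mathbf{x}}_{t|T} - \mathbf{x}_t)^\dagger \mid \mathcal{Z};\Theta\big\}; \quad t = 1, \ldots, T$
For the M-step we need the first- and second-order statistics $E\{\mathbf{x}_t \mid \mathcal{Z};\Theta\}$ and $E\{\mathbf{x}_t\mathbf{x}_t^\dagger \mid \mathcal{Z};\Theta\}$. The second-order statistics are given by:
$E\{\mathbf{x}_t\mathbf{x}_t^\dagger \mid \mathcal{Z};\Theta\} = \hat{\mathbf{x}}_{t|T}\hat{\mathbf{x}}_{t|T}^\dagger + \mathbf{P}_{t|T}$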
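Combining the smoother outputs into the second-order statistics takes one line. This sketch (hypothetical name `second_order_stats`) forms x̂_{t|T} x̂_{t|T}^† + P_{t|T} for all frames at once with `np.einsum`.

```python
import numpy as np

def second_order_stats(x_s, P_s):
    """E{x_t x_t^H | Z} = x_{t|T} x_{t|T}^H + P_{t|T} for every frame.

    x_s: (T, L) smoothed means; P_s: (T, L, L) smoothed error covariances.
    Returns a (T, L, L) array; entry [t, 0, 0] is E{|x(t)|^2 | Z}.
    """
    # outer product of each smoothed mean with its own conjugate, plus the error covariance
    return np.einsum('ti,tj->tij', x_s, x_s.conj()) + P_s
```

Dropping the P_{t|T} term would reduce this to the plug-in estimate x̂ x̂^†, which understates the power of the latent speech.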

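Algorithm Derivation, M-step: Parameter Estimation (frequency index $k$ omitted)
Speech variance (trivial variance estimation):
$\hat{\sigma}_x^{2\,(\ell+1)}(t) = E\big\{|x(t)|^2 \mid \mathcal{Z};\hat{\Theta}^{(\ell)}\big\}$
Reverberation system (least-squares system identification for $z_j(t) \approx \mathbf{h}_j^T\mathbf{x}_t$):
$\hat{\mathbf{h}}_j^{(\ell+1)} = \Big(\sum_{t=1}^{T} E\big\{\mathbf{x}_t\mathbf{x}_t^\dagger \mid \mathcal{Z};\hat{\Theta}^{(\ell)}\big\}^*\Big)^{-1} \sum_{t=1}^{T} E\big\{\mathbf{x}_t \mid \mathcal{Z};\hat{\Theta}^{(\ell)}\big\}^*\, z_j(t)$
Noise variance (residual noise calculation):
$\hat{\sigma}_{v_j}^{2\,(\ell+1)} = \frac{1}{T}\sum_{t=1}^{T} E\Big\{\big|z_j(t) - \hat{\mathbf{h}}_j^{(\ell+1)T}\mathbf{x}_t\big|^2 \mid \mathcal{Z};\hat{\Theta}^{(\ell)}\Big\}$
In each update, the quantities that would be computed from the unknown $\mathbf{x}_t$ are replaced by their posterior expectations from the Kalman smoother.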
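A sketch of the three updates for one frequency bin, taking smoothed means and covariances as inputs. The function name and array shapes are assumptions. The least-squares step solves the normal equations for the model z_j(t) = h_j^T x_t, and the noise update adds the term h_j^T P_{t|T} h_j^*, which is the part of E{|z_j(t) - h_j^T x_t|²} due to the smoother's estimation error.

```python
import numpy as np

def m_step(Z, x_s, P_s):
    """One M-step for a single bin.

    Z: (T, J) observations; x_s: (T, L) smoothed means; P_s: (T, L, L) smoothed covariances.
    Returns sigma_x2 (T,), H (J, L) with row j = h_j^T, and sigma_v2 (J,).
    """
    # Second-order statistics E{x_t x_t^H | Z}
    R = np.einsum('ti,tj->tij', x_s, x_s.conj()) + P_s
    # Speech variance: E{|x(t)|^2 | Z} is the first diagonal entry
    sigma_x2 = R[:, 0, 0].real
    # Normal equations: (sum_t conj(R_t)) h_j = sum_t conj(x_t) z_j(t)
    A = R.sum(axis=0).conj()
    B = x_s.conj().T @ Z                   # (L, J), column j = sum_t conj(x_t) z_j(t)
    H = np.linalg.solve(A, B).T            # (J, L)
    # Noise variance: |z_j - h_j^T x_hat|^2 plus the estimation-error term h_j^T P_t h_j^*
    resid = Z - x_s @ H.T                  # (T, J)
    spread = np.einsum('jl,tlm,jm->tj', H, P_s, H.conj()).real
    sigma_v2 = (np.abs(resid) ** 2 + spread).mean(axis=0)
    return sigma_x2, H, sigma_v2
```

When the smoother is certain (P_{t|T} = 0) and the data are noise free, the update recovers the true CTFs exactly and returns zero noise variance.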

Experiments: Sonograms
[Figure: spectrograms (Frequency [kHz], 0 to 8, vs. Time [s], 0 to 5) with the corresponding waveforms for (a) the clean signal, (b) the reverberant signal, (c) the output after the 1st iteration, and (d) the output after the 10th iteration.]

Experiments: Practical Considerations, H Initialization Alternatives
First arrivals: the first arrivals of the room impulse response are known in advance (use their STFT-domain representation).
[Figure: room impulse response, amplitude vs. Time (ms), 4 to 20 ms, with the first arrivals marked.]
No a priori knowledge: a single peak; H is set to time-aligned peaks of equal amplitude (STFT domain).
[Figure: single-peak initial responses for Mics 1, 2, 3 and 4, amplitude vs. Time (ms), 4 to 20 ms.]
Test results: no significant difference between the two alternatives, in either subjective or objective evaluation.
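The "no a priori knowledge" option can be sketched as follows, assuming the single time-aligned peak maps to the first CTF tap with the same amplitude in every bin and at every microphone. The function name, the (K, J, L) layout and the unit amplitude are illustrative assumptions.

```python
import numpy as np

def init_ctf_single_peak(K, J, L, amplitude=1.0):
    """Hypothetical initializer: h_j(k) = [amplitude, 0, ..., 0] for all bins k and mics j."""
    H = np.zeros((K, J, L), dtype=complex)
    H[:, :, 0] = amplitude   # time-aligned, equal-amplitude peaks across microphones
    return H
```

Placing the peak at tap 0 treats the direct path as arriving within the current frame at every microphone, which is the simplest reading of "time-aligned".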

Experiments: Practical Considerations, $\sigma_x^2$ Initialization Alternatives
1. Spectral enhancement pre-processing: use the SE method for dereverberation [Habets et al. 2009] as a pre-processing step: $\sigma_x^2 \leftarrow \hat{\sigma}_x^{2\,(\mathrm{SE})}$.
2. Noisy-reverberant values: use the noisy-reverberant absolute-squared value: $\sigma_x^2 \leftarrow |z|^2$.
Test results: with pre-processing, better dereverberation; without pre-processing, less dereverberation but a more natural output.
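Alternative 2 takes one line. This sketch assumes a microphone-major array layout (J, T, K), takes |z|² from a reference microphone `ref`, and applies a small variance floor so that no frame starts with zero variance; all three choices are assumptions not stated on the slide. The spectral-enhancement alternative depends on the cited method and is not sketched.

```python
import numpy as np

def init_speech_variance(Z, ref=0, floor=1e-10):
    """Hypothetical initializer: sigma_x^2(t,k) <- |z_ref(t,k)|^2, floored.

    Z: (J, T, K) complex STFTs of the J microphones. Returns a (T, K) array.
    """
    return np.maximum(np.abs(Z[ref]) ** 2, floor)
```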

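Experiments: Gain Ambiguity
Signal gain or system gain? In $z_j(t,k) = \mathbf{h}_j^T(k)\,\mathbf{x}_t(k) + v_j(t,k)$, scaling $\mathbf{x}_t(k)$ by a nonzero factor $c(k)$ and $\mathbf{h}_j(k)$ by $1/c(k)$ leaves the observations unchanged, so the likelihood cannot resolve the gain and the estimated speech can end up with an arbitrary spectral profile.
Suggested cure: preserve the input power profile:
$\sum_{t=1}^{T} \sigma_x^2(t,k) = \sum_{t=1}^{T} |z(t,k)|^2$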
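One way to impose this constraint, sketched under stated assumptions: after an M-step, rescale σ²_x(t,k) in each bin by c(k) = Σ_t |z(t,k)|² / Σ_t σ²_x(t,k). Optionally dividing h_j(k) by √c(k) keeps the distribution of h_j^T x_t, and hence the likelihood, unchanged. The reference signal `z_ref`, the function name and the array shapes are hypothetical.

```python
import numpy as np

def normalize_power_profile(sigma_x2, z_ref, H=None):
    """Enforce sum_t sigma_x^2(t,k) = sum_t |z_ref(t,k)|^2 in every bin k.

    sigma_x2, z_ref: (T, K); H: optional (K, J, L) CTFs, rescaled to compensate.
    Assumes sum_t sigma_x2(t,k) > 0 for every k.
    """
    c = (np.abs(z_ref) ** 2).sum(axis=0) / sigma_x2.sum(axis=0)   # per-bin gain, shape (K,)
    sigma_x2 = sigma_x2 * c[None, :]
    if H is not None:
        # speech amplitude grows by sqrt(c), so the system shrinks by the same factor
        H = H / np.sqrt(c)[:, None, None]
    return sigma_x2, H
```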
