
SLIDE 1

Multi-Microphone Speech Dereverberation using Expectation-Maximization and Kalman Smoothing

Boaz Schwartz¹, Sharon Gannot¹, Emanuël A.P. Habets²

¹Faculty of Engineering, Bar-Ilan University, Israel; ²International Audio Laboratories Erlangen, Germany

EUSIPCO 2013, Marrakesh, Morocco, September 10th

  • B. Schwartz, S. Gannot, E. Habets

KEMD 1 / 26

SLIDE 4

Introduction Outline

Outline

Statistical Model
  • Speech and noise model
  • Acoustical system model

EM-Kalman Algorithm
  • Maximum likelihood problem
  • EM algorithm approach
  • Kalman smoother in the E-step

Experiments
  • Algorithm initialization
  • Spectral profile
  • Results

SLIDE 8

Introduction Problem Statement

Problem Statement

 

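[Figure: J-microphone acquisition model. The speech signal passes through the acoustic transfer functions $h_1(k), h_2(k), \ldots, h_J(k)$; additive noise $v_1(t,k), v_2(t,k), \ldots, v_J(t,k)$ yields the microphone signals $z_1(t,k), z_2(t,k), \ldots, z_J(t,k)$.]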

SLIDE 11

Statistical Model

Convolutive Transfer Function (CTF) Model

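Time-domain system:
$$z_j[n] = x[n] * h_j[n] + v_j[n], \qquad j = 1, \ldots, J$$

CTF system:
$$z_j(t,k) \approx \sum_{l \in \mathcal{L}} h_{j,l}(k)\, x(t-l,k) + v_j(t,k)$$

STFT-domain system:
$$z_j(t,k) = \sum_{k' \in \mathcal{K}'} \sum_{l \in \mathcal{L}} h_{j,l}(k,k')\, x(t-l,k') + v_j(t,k)$$

The CTF system keeps only the band-to-band term ($k' = k$) of the exact STFT-domain system.

[Figure: time-frequency ($t$, $k$) illustration of $x(t,k)$ filtered by $h(k)$ to give $z(t,k)$, for the CTF system and for the full STFT-domain system.]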

SLIDE 12

Statistical Model

Vector Representation and Signals Model

Vector Representation

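CTF convolution:
$$z_j(t,k) \approx \mathbf{h}_j^\dagger(k)\,\mathbf{x}_t(k) + v_j(t,k)$$
where
$$\mathbf{x}_t(k) = \left[x(t,k), \ldots, x(t-L+1,k)\right]^T, \qquad \mathbf{h}_j^\dagger(k) = \left[h_{j,0}(k), \ldots, h_{j,L-1}(k)\right]$$

Speech Model
$$x(t,k) \sim \mathcal{N}_c\left(0, \sigma_x^2(t,k)\right)$$

Noise Model
$$v_j(t,k) \sim \mathcal{N}_c\left(0, \sigma_{v_j}^2(k)\right)$$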
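A minimal NumPy sketch of this generative model for a single frequency bin (index $k$ dropped) may make the notation concrete. It draws $x(t) \sim \mathcal{N}_c(0, \sigma_x^2(t))$ and $v_j(t) \sim \mathcal{N}_c(0, \sigma_{v_j}^2)$ and forms $z_j(t) = \sum_l h_{j,l}\, x(t-l) + v_j(t)$. The function name `simulate_ctf_bin`, storing $\mathbf{h}_j$ as column `j` of an $L \times J$ array `H` (so that `H.conj().T` holds the CTF coefficients row-wise), and zero speech before the first frame are illustrative assumptions, not details from the original work.

```python
import numpy as np

def simulate_ctf_bin(sigma_x2, H, sigma_v2, rng):
    """Draw one frequency bin of the CTF model (illustrative sketch).

    sigma_x2 : (T,) speech variances sigma_x^2(t)
    H        : (L, J) column j is h_j; H^dagger holds h_{j,l} row-wise (assumed layout)
    sigma_v2 : (J,) noise variances sigma_{v_j}^2
    Returns the clean coefficients x (T,) and microphone signals Z (T, J).
    """
    T = sigma_x2.shape[0]
    L, J = H.shape

    def crandn(*shape):
        # circularly-symmetric complex Gaussian samples with unit variance
        return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    x = np.sqrt(sigma_x2) * crandn(T)        # x(t) ~ N_c(0, sigma_x^2(t))
    v = np.sqrt(sigma_v2) * crandn(T, J)     # v_j(t) ~ N_c(0, sigma_{v_j}^2)

    Z = np.empty((T, J), dtype=complex)
    for t in range(T):
        # state vector x_t = [x(t), ..., x(t-L+1)], assumed zero before t = 0
        xt = np.array([x[t - l] if t - l >= 0 else 0.0 for l in range(L)])
        Z[t] = H.conj().T @ xt + v[t]        # z_j(t) = h_j^dagger x_t + v_j(t)
    return x, Z
```

Setting `sigma_v2` to zeros gives the noiseless CTF output, which is convenient for checking an estimator.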

SLIDE 13

Statistical Model

Model Parameters

SLIDE 16

Algorithm Derivation Parameter Estimation Problem

Maximum-Likelihood and EM

Maximum-Likelihood

Find the speech variance, the CTF coefficients, and the noise variance that maximize the likelihood function.

EM Algorithm

Define a latent data set that, if available, would facilitate the parameter estimation. The algorithm iterates between evaluating the latent data (E-step) and estimating the parameters (M-step).

SLIDE 17

Algorithm Derivation Parameter Estimation Problem

Maximum-Likelihood Problem Statement

Set of Measurements

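$$z_j(t,k), \qquad 1 \le j \le J,\quad 1 \le t \le T,\quad 1 \le k \le K$$

Parameters
$$\Theta \equiv \left\{\sigma_x^2(t,k),\ \mathbf{h}_j(k),\ \sigma_{v_j}^2(k)\right\}$$

Maximum Likelihood
$$\operatorname*{argmax}_{\Theta} \left\{ f(\mathbf{Z}; \Theta) \right\}$$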

SLIDE 18

Algorithm Derivation Parameter Estimation Problem

Latent Data and EM

Latent Data Set

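$$x(t,k), \qquad 1 \le t \le T,\quad 1 \le k \le K$$

EM Algorithm

E-step (after $\ell$ iterations):
$$Q\left(\Theta \mid \Theta^{(\ell)}\right) \equiv E\left\{\log f(\mathbf{Z}, \mathbf{X}; \Theta) \mid \mathbf{Z}; \Theta^{(\ell)}\right\}$$

M-step:
$$\Theta^{(\ell+1)} = \operatorname*{argmax}_{\Theta}\, Q\left(\Theta \mid \Theta^{(\ell)}\right)$$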
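Writing out the complete-data log-likelihood shows why the E-step only needs first- and second-order statistics of $\mathbf{x}_t$. This is a sketch under assumptions the slides do not state explicitly: speech and noise are mutually independent and independent across $t$, $k$, and $j$; the observation is $z_j(t,k) = \mathbf{h}_j^\dagger(k)\mathbf{x}_t(k) + v_j(t,k)$; and edge terms from frames before $t = 1$ are ignored. Using the complex Gaussian density $\mathcal{N}_c(0,\sigma^2) = \frac{1}{\pi\sigma^2}e^{-|x|^2/\sigma^2}$:

$$\log f(\mathbf{Z}, \mathbf{X}; \Theta) = -\sum_{t,k}\left[\log\left(\pi\sigma_x^2(t,k)\right) + \frac{|x(t,k)|^2}{\sigma_x^2(t,k)}\right] - \sum_{j,t,k}\left[\log\left(\pi\sigma_{v_j}^2(k)\right) + \frac{\left|z_j(t,k) - \mathbf{h}_j^\dagger(k)\,\mathbf{x}_t(k)\right|^2}{\sigma_{v_j}^2(k)}\right]$$

Taking the expectation given $\mathbf{Z}$ and $\Theta^{(\ell)}$ replaces $|x(t,k)|^2$ by $E\{|x(t,k)|^2 \mid \mathbf{Z}; \Theta^{(\ell)}\}$ and, with the index $k$ dropped,

$$E\left\{\left|z_j(t) - \mathbf{h}_j^\dagger\mathbf{x}_t\right|^2 \,\middle|\, \mathbf{Z}; \Theta^{(\ell)}\right\} = |z_j(t)|^2 - 2\,\mathrm{Re}\left\{z_j^*(t)\,\mathbf{h}_j^\dagger\, E\{\mathbf{x}_t \mid \mathbf{Z}; \Theta^{(\ell)}\}\right\} + \mathbf{h}_j^\dagger\, E\{\mathbf{x}_t\mathbf{x}_t^\dagger \mid \mathbf{Z}; \Theta^{(\ell)}\}\,\mathbf{h}_j$$

So $Q$ depends on the data only through $E\{\mathbf{x}_t \mid \mathbf{Z}\}$ and $E\{\mathbf{x}_t\mathbf{x}_t^\dagger \mid \mathbf{Z}\}$, and $E\{|x(t,k)|^2 \mid \mathbf{Z}\}$ is the first diagonal entry of the latter. These are exactly the quantities a Kalman smoother provides.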

SLIDE 19

Algorithm Derivation Parameter Estimation Problem

Algorithm Outline

  • E-step: Kalman smoother
  • M-step: parameter estimation

SLIDE 20

Algorithm Derivation E-step

Kalman Smoother: Forward (t = 1, . . . , T)

Formulas: see the "Filter Formulas" backup slide.

SLIDE 21

Algorithm Derivation E-step

Kalman Smoother: Backward (t = T, . . . , 1)

Formulas: see the "Smoother Formulas" backup slide.

SLIDE 22

Algorithm Derivation E-step

Obtaining Data Required for the M-Step

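The Smoother Output

Signal estimate:
$$E\{\mathbf{x}_t \mid \mathbf{Z}; \Theta\} = \hat{\mathbf{x}}_{t|T}$$
and the estimation-error covariance:
$$\mathbf{P}_{t|T} \equiv E\left\{\left(\hat{\mathbf{x}}_{t|T} - \mathbf{x}_t\right)\left(\hat{\mathbf{x}}_{t|T} - \mathbf{x}_t\right)^\dagger\right\}$$

For the M-step we need the first- and second-order statistics:
$$E\{\mathbf{x}_t \mid \mathbf{Z}; \Theta\}, \qquad E\{\mathbf{x}_t\mathbf{x}_t^\dagger \mid \mathbf{Z}; \Theta\}, \qquad t = 1, \ldots, T$$

The second-order statistics are given by:
$$E\{\mathbf{x}_t\mathbf{x}_t^\dagger \mid \mathbf{Z}; \Theta\} = \hat{\mathbf{x}}_{t|T}\,\hat{\mathbf{x}}_{t|T}^\dagger + \mathbf{P}_{t|T}$$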

SLIDE 23

Algorithm Derivation M-step

M-Step: Parameter Estimation

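Per frequency bin (index $k$ omitted), with $\widehat{(\cdot)} \equiv E\{\cdot \mid \mathbf{Z}; \Theta^{(\ell)}\}$ obtained from the Kalman smoother:

Speech Variance

Trivial variance estimation:
$$\sigma_x^{2\,(\ell+1)}(t) = \widehat{|x(t)|^2}$$

Reverberation System

Least-squares system identification:
$$\mathbf{h}_j^{(\ell+1)} = \left(\sum_{t=1}^{T} \widehat{\mathbf{x}_t\mathbf{x}_t^\dagger}\right)^{-1} \sum_{t=1}^{T} \hat{\mathbf{x}}_{t|T}\, z_j^*(t)$$

Noise Variance

Residual noise calculation:
$$\sigma_{v_j}^{2\,(\ell+1)} = \frac{1}{T}\sum_{t=1}^{T} \widehat{\left|z_j(t) - \mathbf{h}_j^{(\ell+1)\dagger}\,\mathbf{x}_t\right|^2}$$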
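The three updates can be sketched in NumPy for one frequency bin, taking the smoother outputs $\hat{\mathbf{x}}_{t|T}$ and $\mathbf{P}_{t|T}$ as inputs. The function name `m_step`, the array layouts, and solving the linear system with `np.linalg.solve` instead of forming an explicit inverse are assumptions of this sketch. The noise update expands the expected squared residual as $|z_j(t)|^2 - 2\,\mathrm{Re}\{z_j^*(t)\,\mathbf{h}_j^\dagger\hat{\mathbf{x}}_{t|T}\} + \mathbf{h}_j^\dagger\,\widehat{\mathbf{x}_t\mathbf{x}_t^\dagger}\,\mathbf{h}_j$.

```python
import numpy as np

def m_step(Z, x_smooth, P_smooth):
    """One M-step for a single frequency bin (illustrative sketch).

    Z        : (T, J) microphone coefficients z_j(t)
    x_smooth : (T, L) smoothed means x_{t|T}
    P_smooth : (T, L, L) smoothed error covariances P_{t|T}
    Returns sigma_x2 (T,), H (L, J) with column j = h_j, sigma_v2 (J,).
    """
    # Second-order statistics E{x_t x_t^dagger | Z} = x_{t|T} x_{t|T}^dagger + P_{t|T}
    Rxx = np.einsum("ti,tj->tij", x_smooth, x_smooth.conj()) + P_smooth

    # Speech variance: E{|x(t)|^2 | Z} is the (0, 0) entry of each Rxx[t]
    sigma_x2 = Rxx[:, 0, 0].real

    # CTF vectors: h_j = (sum_t E{x_t x_t^dagger})^{-1} sum_t x_{t|T} z_j^*(t)
    Rsum = Rxx.sum(axis=0)                 # (L, L)
    cross = x_smooth.T @ Z.conj()          # (L, J): column j = sum_t x_{t|T} z_j^*(t)
    H = np.linalg.solve(Rsum, cross)

    # Noise variance: time average of E{|z_j(t) - h_j^dagger x_t|^2 | Z}
    zhat = x_smooth @ H.conj()             # (T, J): h_j^dagger x_{t|T}
    quad = np.einsum("lj,tlm,mj->tj", H.conj(), Rxx, H).real   # h_j^dagger Rxx[t] h_j
    resid = np.abs(Z) ** 2 - 2 * np.real(Z.conj() * zhat) + quad
    sigma_v2 = resid.mean(axis=0)
    return sigma_x2, H, sigma_v2
```

With zero smoother covariance and noiseless data generated as $z_j(t) = \mathbf{h}_j^\dagger\mathbf{x}_t$, this update recovers $\mathbf{h}_j$ exactly and returns a (numerically) zero noise variance, which is a quick sanity check.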
SLIDE 24

Experiments

Sonograms

[Figure: sonograms (frequency [kHz] vs. time [s], color scale in dB), each with the time-domain waveform (amplitude) above: (a) clean signal, (b) reverberant signal, (c) 1st iteration, (d) 10th iteration.]

SLIDE 27

Experiments

Practical Considerations - H Initialization Alternatives

First arrivals known in advance: use their STFT-domain representation.

[Figure: room impulse response (amplitude vs. time in ms) with the first arrivals marked.]

No a priori knowledge: H is set to time-aligned peaks of equal amplitude (in the STFT domain).

[Figure: single-peak initialization for Mics 1, 2, 3, 4 (amplitude vs. time in ms).]

Test results

No significant difference between the two alternatives, either subjective or objective.

SLIDE 28

Experiments

Practical Considerations - $\sigma_x^2$ Initialization Alternatives

1. Spectral Enhancement Pre-processing

Using a spectral enhancement (SE) method for dereverberation [Habets et al. 2009] as a pre-processing step: $\sigma_x^2 \leftarrow \sigma_x^{2\,(\mathrm{se})}$

2. Noisy Reverberant Values

Using the absolute-squared value of the noisy reverberant signal: $\sigma_x^2 \leftarrow |z|^2$

Test Results

With pre-processing: stronger dereverberation. Without pre-processing: less dereverberation, but a more natural-sounding output.
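The two initialization choices that need no prior knowledge can be sketched for one frequency bin: every $\mathbf{h}_j$ starts as a single unit peak at tap 0, and $\sigma_x^2(t) \leftarrow |z(t)|^2$. Reading "time-aligned, equal-amplitude peaks" as a unit first tap, taking microphone `ref_mic` as the source of $|z|^2$, and starting the noise variance at a fraction `noise_floor` of each microphone's mean power are all assumptions not stated on the slides; `init_params` is a hypothetical helper.

```python
import numpy as np

def init_params(Z, L, ref_mic=0, noise_floor=1e-3):
    """Initial parameters for one frequency bin (hypothetical helper, sketch).

    Z : (T, J) microphone coefficients; L : CTF length.
    Returns sigma_x2 (T,), H (L, J) with column j = h_j, sigma_v2 (J,).
    """
    J = Z.shape[1]
    # H: a single unit peak at tap 0 for every microphone (time-aligned,
    # equal amplitude), so that h_j^dagger x_t = x(t) initially
    H = np.zeros((L, J), dtype=complex)
    H[0, :] = 1.0
    # sigma_x^2(t) <- |z(t)|^2 at the reference microphone
    sigma_x2 = np.abs(Z[:, ref_mic]) ** 2
    # Noise variance: small fraction of each microphone's mean power (assumption)
    sigma_v2 = noise_floor * np.mean(np.abs(Z) ** 2, axis=0)
    return sigma_x2, H, sigma_v2
```

Swapping `sigma_x2` for the output of a spectral-enhancement pre-processor would correspond to the first alternative.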
SLIDE 29

Experiments

Gain Ambiguity

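Signal gain or system gain?
$$z_j(t,k) = \mathbf{h}_j^\dagger(k)\,\mathbf{x}_t(k) + v_j(t,k)$$
Scaling $\mathbf{x}_t(k)$ by $c(k) > 0$ and $\mathbf{h}_j(k)$ by $1/c(k)$ leaves $z_j(t,k)$ unchanged, so the likelihood cannot resolve a per-frequency gain. The result can be an arbitrary spectral profile!

Suggested cure

Preserve the input power profile:
$$\sum_{t=1}^{T} \sigma_x^2(t,k) = \sum_{t=1}^{T} |z(t,k)|^2$$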
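One way to impose this constraint after an iteration is sketched below: rescale $\sigma_x^2(\cdot,k)$ by $c(k) = \sum_t |z(t,k)|^2 / \sum_t \sigma_x^2(t,k)$ and divide $\mathbf{h}_j(k)$ by $\sqrt{c(k)}$, so that $\mathbf{h}_j^\dagger(k)\mathbf{x}_t(k)$ is unchanged (the amplitude of $x$ scales by $\sqrt{c(k)}$). The compensating rescale of $\mathbf{h}_j$, the choice of reference microphone for $z$, and the name `normalize_gain` are assumptions of this sketch, not details given on the slide.

```python
import numpy as np

def normalize_gain(sigma_x2, H, Z_ref):
    """Fix the per-bin gain ambiguity (illustrative sketch).

    sigma_x2 : (T, K) speech variance estimates sigma_x^2(t, k)
    H        : (K, L, J) CTF vectors; H[k][:, j] is h_j(k)
    Z_ref    : (T, K) STFT of the microphone whose power profile is kept (assumed)
    Assumes sum_t sigma_x2[t, k] > 0 for every bin k.
    """
    # c(k) = sum_t |z(t,k)|^2 / sum_t sigma_x^2(t,k)
    c = np.sum(np.abs(Z_ref) ** 2, axis=0) / np.sum(sigma_x2, axis=0)   # (K,)
    # Speech variance now matches the input power profile per bin
    sigma_x2_n = sigma_x2 * c[np.newaxis, :]
    # x scales by sqrt(c), so h_j scales by 1/sqrt(c) to keep h_j^dagger x_t fixed
    H_n = H / np.sqrt(c)[:, np.newaxis, np.newaxis]
    return sigma_x2_n, H_n
```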

SLIDE 30

Experiments

Thank you!

Kalman-EM for Dereverberation (KEMD)

SLIDE 31

Algorithm Derivation E-step

Filter Formulas

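Forward recursion

for $t = 1$ to $T$ do
  Predict:
    $\hat{\mathbf{x}}_{t|t-1} = \Phi\,\hat{\mathbf{x}}_{t-1|t-1}$
    $\mathbf{P}_{t|t-1} = \Phi\,\mathbf{P}_{t-1|t-1}\,\Phi^T + \mathbf{Q}_t$
  Update:
    $\mathbf{K}_t = \mathbf{P}_{t|t-1}\mathbf{H}\left(\mathbf{H}^\dagger\mathbf{P}_{t|t-1}\mathbf{H} + \mathbf{R}\right)^{-1}$
    $\mathbf{e}_t = \mathbf{z}_t - \mathbf{H}^\dagger\hat{\mathbf{x}}_{t|t-1}$
    $\hat{\mathbf{x}}_{t|t} = \hat{\mathbf{x}}_{t|t-1} + \mathbf{K}_t\,\mathbf{e}_t$
    $\mathbf{P}_{t|t} = \left(\mathbf{I} - \mathbf{K}_t\mathbf{H}^\dagger\right)\mathbf{P}_{t|t-1}$
end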
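The forward recursion can be sketched in NumPy for one frequency bin. The slides do not define $\Phi$, $\mathbf{Q}_t$, or $\mathbf{R}$; this sketch assumes the usual choices for the state $\mathbf{x}_t = [x(t), \ldots, x(t-L+1)]^T$: $\Phi$ is the $L \times L$ shift matrix, $\mathbf{Q}_t = \mathrm{diag}(\sigma_x^2(t), 0, \ldots, 0)$, $\mathbf{R} = \mathrm{diag}(\sigma_{v_1}^2, \ldots, \sigma_{v_J}^2)$, and the recursion starts from $\hat{\mathbf{x}}_{0|0} = \mathbf{0}$, $\mathbf{P}_{0|0} = \mathbf{0}$. The name `kalman_forward` is hypothetical; the predicted covariances are returned because the backward recursion needs them.

```python
import numpy as np

def kalman_forward(Z, H, sigma_x2, sigma_v2):
    """Forward (filtering) recursion for one frequency bin (sketch).

    Assumed model: x_t = Phi x_{t-1} + w_t, w_t ~ N_c(0, Q_t), Q_t = diag(sigma_x2[t], 0, ..., 0)
                   z_t = H^dagger x_t + v_t, v_t ~ N_c(0, R),   R = diag(sigma_v2)
    Z: (T, J); H: (L, J); sigma_x2: (T,); sigma_v2: (J,).
    Returns filtered means (T, L), filtered covs (T, L, L), predicted covs (T, L, L).
    """
    T, J = Z.shape
    L = H.shape[0]
    Phi = np.eye(L, k=-1)                 # moves x(t-1-l) into slot l+1; slot 0 comes from Q_t
    R = np.diag(sigma_v2).astype(complex)
    Hd = H.conj().T                       # H^dagger, shape (J, L)
    x = np.zeros(L, dtype=complex)        # x_{0|0}
    P = np.zeros((L, L), dtype=complex)   # P_{0|0}
    x_filt = np.zeros((T, L), dtype=complex)
    P_filt = np.zeros((T, L, L), dtype=complex)
    P_pred = np.zeros((T, L, L), dtype=complex)
    for t in range(T):
        # Predict
        x_p = Phi @ x
        P_p = Phi @ P @ Phi.T
        P_p[0, 0] += sigma_x2[t]          # + Q_t
        # Update
        S = Hd @ P_p @ H + R              # innovation covariance (J, J)
        K = np.linalg.solve(S.T, (P_p @ H).T).T   # K_t = P_p H S^{-1}
        e = Z[t] - Hd @ x_p               # innovation e_t
        x = x_p + K @ e
        P = (np.eye(L) - K @ Hd) @ P_p
        x_filt[t], P_filt[t], P_pred[t] = x, P, P_p
    return x_filt, P_filt, P_pred
```

For $L = J = 1$ and $\mathbf{H} = 1$ the shift matrix is zero, and each step reduces to the scalar Wiener estimate $\frac{\sigma_x^2}{\sigma_x^2 + \sigma_v^2}\,z(t)$.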

SLIDE 32

Algorithm Derivation E-step

Smoother Formulas

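Backward recursion

for $t = T$ down to $2$ do
    $\mathbf{S}_{t-1} = \mathbf{P}_{t-1|t-1}\,\Phi^T\,\mathbf{P}_{t|t-1}^{-1}$
    $\mathbf{e}_{t|T} = \hat{\mathbf{x}}_{t|T} - \Phi\,\hat{\mathbf{x}}_{t-1|t-1}$
    $\hat{\mathbf{x}}_{t-1|T} = \hat{\mathbf{x}}_{t-1|t-1} + \mathbf{S}_{t-1}\,\mathbf{e}_{t|T}$
    $\mathbf{P}_{t-1|T} = \mathbf{P}_{t-1|t-1} + \mathbf{S}_{t-1}\left(\mathbf{P}_{t|T} - \mathbf{P}_{t|t-1}\right)\mathbf{S}_{t-1}^\dagger$
end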
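A matching NumPy sketch of the backward recursion takes the forward-pass outputs as inputs. Two choices here are assumptions rather than details from the slides: `np.linalg.pinv` replaces $\mathbf{P}_{t|t-1}^{-1}$, because with a shift matrix $\Phi$ and $\mathbf{P}_{0|0} = \mathbf{0}$ the early predicted covariances are rank-deficient; and the covariance update uses the conjugate transpose $\mathbf{S}_{t-1}^\dagger$, which equals $\mathbf{S}_{t-1}^T$ when everything is real. The name `rts_backward` is hypothetical.

```python
import numpy as np

def rts_backward(x_filt, P_filt, P_pred, Phi):
    """Backward (smoothing) recursion for one frequency bin (sketch).

    x_filt: (T, L) filtered means x_{t|t}; P_filt: (T, L, L) filtered covs P_{t|t};
    P_pred: (T, L, L) predicted covs P_{t|t-1}; Phi: (L, L) real transition matrix.
    Returns smoothed means x_{t|T} (T, L) and covariances P_{t|T} (T, L, L).
    """
    T = x_filt.shape[0]
    x_s = x_filt.copy()                    # x_{T|T} = x_{T|T} from the forward pass
    P_s = P_filt.copy()
    for t in range(T - 1, 0, -1):          # t = T, ..., 2 in 1-based indexing
        # S_{t-1} = P_{t-1|t-1} Phi^T P_{t|t-1}^{-1} (pseudo-inverse for rank-deficient P)
        S = P_filt[t - 1] @ Phi.T @ np.linalg.pinv(P_pred[t])
        e = x_s[t] - Phi @ x_filt[t - 1]   # e_{t|T}
        x_s[t - 1] = x_filt[t - 1] + S @ e
        P_s[t - 1] = P_filt[t - 1] + S @ (P_s[t] - P_pred[t]) @ S.conj().T
    return x_s, P_s
```

As a check, a scalar random walk ($\Phi = 1$, unit process and observation noise, $z = [1, 3]$) has filtered values $\hat{x}_{1|1} = 0.5$, $\hat{x}_{2|2} = 2$ and predicted variance $P_{2|1} = 1.5$; the smoother returns $\hat{x}_{1|2} = 1$ with variance $0.4$, which matches direct Gaussian conditioning on both observations.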
