A Priori SNR Estimation Using Weibull Mixture Model (12. ITG Fachtagung Sprachkommunikation)

SLIDE 1

A Priori SNR Estimation Using Weibull Mixture Model

  • 12. ITG Fachtagung Sprachkommunikation

Aleksej Chinaev, Jens Heitkaemper, Reinhold Haeb-Umbach, Department of Communications Engineering, Paderborn University

  • 7 October 2016

Computer Science, Electrical Engineering and Mathematics

Communications Engineering

  • Prof. Dr.-Ing. Reinhold Häb-Umbach

SLIDE 2

Table of contents

1. Problem formulation and motivation
2. A priori SNR estimation based on Weibull mixture model
3. Experimental evaluation
4. Conclusions and outlook

A Priori SNR Estimation Using Weibull Mixture Model

  • A. Chinaev, J. Heitkaemper, R. Haeb-Umbach

1 / 10

SLIDE 3

Problem formulation and motivation

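Single-channel clean speech $s(t)$ contaminated by additive noise $n(t)$: $y(t) = s(t) + n(t)$

Short-time Fourier transform (STFT): $Y(k,\ell) = S(k,\ell) + N(k,\ell)$, with frequency bin $k$ and frame index $\ell$

[Figure: enhancement system. $|Y(k,\ell)|^2$ feeds a noise PSD tracker, which delivers the noise power spectral density (PSD) estimate $\hat\lambda_N(k,\ell)$; the a priori SNR estimator computes $\hat\xi(k,\ell)$; the gain function $G(k,\ell)$ is applied to $Y(k,\ell)$ to give $\hat S(k,\ell)$; the ISTFT yields $\hat s(t)$]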

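A priori SNR, a key component of the enhancement system:
$$\xi(k,\ell) = \frac{\lambda_S(k,\ell)}{\lambda_N(k,\ell)}$$
with the clean speech PSD $\lambda_S(k,\ell) = E\left[|S(k,\ell)|^2\right]$ and the noise PSD $\lambda_N(k,\ell) = E\left[|N(k,\ell)|^2\right]$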
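A minimal per-bin sketch of this chain, assuming the Wiener gain $G(k,\ell) = \hat\xi(k,\ell) / (1 + \hat\xi(k,\ell))$, which is the gain function named for the experiments. The function names are hypothetical, and known PSDs stand in for the outputs of the noise PSD tracker and the a priori SNR estimator.

```python
def a_priori_snr(lambda_s: float, lambda_n: float) -> float:
    """xi(k, l) = lambda_S(k, l) / lambda_N(k, l) for one time-frequency bin."""
    return lambda_s / lambda_n


def wiener_gain(xi: float) -> float:
    """Wiener gain G = xi / (1 + xi): close to 1 at high SNR, close to 0 at low SNR."""
    return xi / (1.0 + xi)


def enhance_bin(y: complex, xi_hat: float) -> complex:
    """Clean speech estimate S_hat(k, l) = G(k, l) * Y(k, l); the noisy phase is kept."""
    return wiener_gain(xi_hat) * y


# Equal speech and noise PSDs give xi = 1 (0 dB), hence G = 0.5.
xi = a_priori_snr(lambda_s=2.0, lambda_n=2.0)
s_hat = enhance_bin(2.0 + 2.0j, xi)
print(xi, s_hat)  # 1.0 (1+1j)
```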

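Motivated by generalized spectral subtraction (GSS): denoise $|Y(k,\ell)|^\alpha$ for $\alpha \in \mathbb{R}_{>0}$, not restricted to the magnitude ($\alpha = 1$) or power ($\alpha = 2$) domain, under the assumption $|Y(k,\ell)|^\alpha = |S(k,\ell)|^\alpha + |N(k,\ell)|^\alpha$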

SLIDE 5

Normalized α-order magnitude (NAOM) domain

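[Figure: a priori SNR estimator with inputs $|Y(k,\ell)|^2$ and $\hat\lambda_N(k,\ell)$ and four steps: estimate $P_{S_\alpha}(k)$ and go into the NAOM domain ($Y_\alpha(k,\ell)$, $\lambda_{N_\alpha}(k,\ell)$); estimate the parameters $\lambda_m(k,\ell)$, $\pi_m(k,\ell)$ of the WMM $p_{S_\alpha}(s)$; estimate the clean speech NAOMs $\hat S_\alpha(k,\ell)$; calculate the a priori SNR $\hat\xi(k,\ell)$]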

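Normalize $|Y(k,\ell)|^\alpha$ by the square root of the averaged power $P_{S_\alpha}(k)$ of $|S(k,\ell)|^\alpha$:
$$Y_\alpha(k,\ell) = \frac{|Y(k,\ell)|^\alpha}{\sqrt{P_{S_\alpha}(k)}} = S_\alpha(k,\ell) + N_\alpha(k,\ell), \qquad P_{S_\alpha}(k) = \frac{1}{L}\sum_{\ell=1}^{L} |S(k,\ell)|^{2\alpha}$$

Statistical models become independent of the speaker loudness; the clean speech NAOMs have normalized energy $E\left[S_\alpha^2(k)\right] = 1$

$S_\alpha(k,\ell)$ and $N_\alpha(k,\ell)$ are realizations of the random variables $S_\alpha(k)$ and $N_\alpha(k)$; the task is to estimate $S_\alpha(k,\ell)$ from $Y_\alpha(k,\ell)$ given models for $S_\alpha(k)$ and $N_\alpha(k)$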
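The normalization can be sketched for a single frequency bin. The slide defines $P_{S_\alpha}(k)$ from the clean speech, so the sketch uses toy clean magnitudes as an oracle; how $P_{S_\alpha}(k)$ is obtained from noisy data is not shown here. Function names and values are hypothetical.

```python
import math


def naom_power(s_mag, alpha):
    """P_{S_alpha}(k) = (1/L) * sum over l of |S(k, l)|^(2*alpha), for one frequency bin."""
    return sum(abs(s) ** (2.0 * alpha) for s in s_mag) / len(s_mag)


def to_naom(mag, p_s_alpha, alpha):
    """Map |X(k, l)|^alpha into the NAOM domain: divide by sqrt(P_{S_alpha}(k))."""
    root = math.sqrt(p_s_alpha)
    return [abs(x) ** alpha / root for x in mag]


alpha = 0.7
s_mag = [0.1, 2.0, 5.0, 0.3]  # toy |S(k, l)| of one bin over L = 4 frames
p = naom_power(s_mag, alpha)
s_naom = to_naom(s_mag, p, alpha)
energy = sum(v * v for v in s_naom) / len(s_naom)  # 1.0 up to rounding: E[S_alpha^2] = 1

# A ten times louder talker yields the same NAOMs (loudness independence).
louder = [10.0 * s for s in s_mag]
louder_naoms = to_naom(louder, naom_power(louder, alpha), alpha)
```

Numerator and denominator both scale with the loudness raised to the power α, so the NAOMs of the louder talker match the original ones.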

SLIDE 6

Modeling of noise NAOM coefficients Nα(k, ℓ)

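Complex Gaussian noise $N(k,\ell) \sim \mathcal{N}_c(n; 0, \lambda_N(k,\ell))$ makes the noise NAOMs $N_\alpha(k,\ell)$ Weibull distributed:
$$p_{N_\alpha(k,\ell)}(n) = \mathrm{Weib}\left(n; \lambda_{N_\alpha}(k,\ell), \alpha\right)$$
with shape parameter $\alpha \in \mathbb{R}_{>0}$ and scale parameter
$$\lambda_{N_\alpha}(k,\ell) = \frac{\lambda_N(k,\ell)}{\sqrt[\alpha]{P_{S_\alpha}(k)}} \in \mathbb{R}_{>0}$$

[Figure: Weibull PDF $\mathrm{Weib}(n; 1, \alpha)$ for $\lambda = 1$ and different $\alpha$]

Model $N_\alpha(k)$ with the Weibull PDF $p_{N_\alpha(k)}(n) = \mathrm{Weib}\left(n; \lambda_{N_\alpha}(k), \alpha\right)$, where $\lambda_{N_\alpha}(k) = \frac{1}{L}\sum_{\ell=1}^{L} \lambda_{N_\alpha}(k,\ell)$

[Figure: NAOM coefficients of a white noise signal and the estimated $p_{N_\alpha(k)}(n)$; histogram of the noise NAOMs and Weibull PDF for $\alpha = 0.7$]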
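The slides do not spell out the Weibull PDF. The sketch assumes the parameterization $\mathrm{Weib}(n; \lambda, \alpha) = \frac{2}{\alpha\lambda} n^{2/\alpha - 1} e^{-n^{2/\alpha}/\lambda}$, under which $N \sim \mathcal{N}_c(0, \lambda_N)$ yields the scale $\lambda_N / \sqrt[\alpha]{P_{S_\alpha}(k)}$ (α = 2 gives an exponential PDF, α = 1 a Rayleigh PDF). It draws complex Gaussian noise, maps it into the NAOM domain, and compares the sample mean with the mean $\lambda^{\alpha/2}\,\Gamma(1 + \alpha/2)$ implied by that PDF. All names and numbers are illustrative.

```python
import math
import random


def weib_pdf(n, lam, a):
    """Assumed parameterization: Weib(n; lam, a) = 2/(a*lam) * n^(2/a - 1) * exp(-n^(2/a) / lam)."""
    if n <= 0.0:
        return 0.0
    return 2.0 / (a * lam) * n ** (2.0 / a - 1.0) * math.exp(-(n ** (2.0 / a)) / lam)


def noise_naom_scale(lambda_n, p_s_alpha, alpha):
    """lambda_{N_alpha}(k, l) = lambda_N(k, l) / P_{S_alpha}(k)^(1/alpha)."""
    return lambda_n / p_s_alpha ** (1.0 / alpha)


rng = random.Random(0)
alpha, lambda_n, p_s_alpha = 0.7, 2.0, 3.0  # toy values for one bin
sigma = math.sqrt(lambda_n / 2.0)  # std of real and imaginary part of N ~ CN(0, lambda_N)
naoms = []
for _ in range(100_000):
    n = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    naoms.append(abs(n) ** alpha / math.sqrt(p_s_alpha))  # noise NAOM |N|^alpha / sqrt(P)

lam = noise_naom_scale(lambda_n, p_s_alpha, alpha)
predicted_mean = lam ** (alpha / 2.0) * math.gamma(1.0 + alpha / 2.0)
empirical_mean = sum(naoms) / len(naoms)

# The assumed PDF should integrate to one (midpoint rule on (0, 5)).
h = 1e-3
area = h * sum(weib_pdf((i + 0.5) * h, lam, alpha) for i in range(5000))
```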

SLIDE 7

Modeling of NAOM coefficients of clean speech Sα(k, ℓ)

Clean speech $S(k,\ell) \sim \mathcal{N}_c(s; 0, \lambda_S(k,\ell))$; a bimodal Weibull mixture model (WMM) models $S_\alpha(k)$:
$$p_{S_\alpha(k)}(s) = \sum_{m=1}^{2} \pi_m(k) \cdot \mathrm{Weib}\left(s; \lambda_m(k), \beta\right)$$
  • $m = 1$: silence; $m = 2$: activity
  • $\pi_m(k) \in [0, 1]$: weights; $\lambda_m(k)$: scale parameters; $\beta$: shape parameter
  • $\beta \neq \alpha$: additional degree of freedom in the model

[Figure: clean speech NAOMs and estimated WMM ($\alpha = 0.7$, $\beta = 2.5$); histogram, bimodal WMM, and its $m = 1$ and $m = 2$ components]
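A sketch of the bimodal WMM, assuming the Weibull parameterization $\mathrm{Weib}(s; \lambda, \beta) = \frac{2}{\beta\lambda} s^{2/\beta - 1} e^{-s^{2/\beta}/\lambda}$, which the slides do not state. Under it, $s^{2/\beta}$ is exponentially distributed with mean $\lambda_m$ within component m, which the sampler uses. The toy weights and scales are made up and are not the estimated model from the figure.

```python
import math
import random


def weib_pdf(s, lam, shape):
    """Assumed parameterization: 2/(shape*lam) * s^(2/shape - 1) * exp(-s^(2/shape) / lam)."""
    if s <= 0.0:
        return 0.0
    return 2.0 / (shape * lam) * s ** (2.0 / shape - 1.0) * math.exp(-(s ** (2.0 / shape)) / lam)


def wmm_pdf(s, weights, scales, beta):
    """p_{S_alpha(k)}(s) = sum_m pi_m(k) * Weib(s; lambda_m(k), beta); m = 1 silence, m = 2 activity."""
    return sum(w * weib_pdf(s, lam, beta) for w, lam in zip(weights, scales))


def wmm_sample(weights, scales, beta, rng):
    """Pick a component, then s = (lambda_m * E)^(beta/2) with E ~ Exp(1)."""
    lam = scales[0] if rng.random() < weights[0] else scales[1]
    return (lam * rng.expovariate(1.0)) ** (beta / 2.0)


def wmm_mean(weights, scales, beta):
    """Analytic mean: sum_m pi_m * lambda_m^(beta/2) * Gamma(1 + beta/2)."""
    return sum(w * lam ** (beta / 2.0) for w, lam in zip(weights, scales)) * math.gamma(1.0 + beta / 2.0)


weights, scales, beta = (0.4, 0.6), (0.05, 0.8), 2.5  # toy WMM
rng = random.Random(0)
draws = [wmm_sample(weights, scales, beta, rng) for _ in range(200_000)]
empirical_mean = sum(draws) / len(draws)
```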

SLIDE 8

Estimation of WMM parameters and clean speech NAOMs

Set $\lambda_1(k)$ according to the lower bound $\xi_{\min}$ usually used in a priori SNR estimation [Cappe 94]

Expectation maximization (EM) algorithm to estimate $\lambda_2(k)$ and $\pi_m(k)$

After EM, the weights $\pi_m(k)$ are corrected with the constraint $E\left[S_\alpha^2(k)\right] = 1$
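The EM step can be sketched for one frequency bin with $\beta$ and $\lambda_1(k)$ held fixed; the mapping from $\xi_{\min}$ to $\lambda_1(k)$ is not given, so $\lambda_1$ is simply an input. Assuming the parameterization $\mathrm{Weib}(s; \lambda, \beta) = \frac{2}{\beta\lambda} s^{2/\beta - 1} e^{-s^{2/\beta}/\lambda}$, the factor shared by both component PDFs cancels in the responsibilities and the M-step for $\lambda_2$ has a closed form. The weight correction solves $\sum_m \pi_m \lambda_m^\beta\,\Gamma(1+\beta) = 1$ for $\pi_2$, which is one possible way to enforce the constraint, not necessarily the authors'. Function names and data are hypothetical.

```python
import math
import random


def em_wmm(samples, lam1, lam2, pi2, beta, n_iter=60):
    """EM for a two-component WMM with fixed shape beta and fixed silence scale lam1.

    Only lambda_2 and the weights are re-estimated. Within component m, u = s^(2/beta)
    is exponential with mean lambda_m, so the factor 2/beta * s^(2/beta - 1) shared by
    both Weibull PDFs cancels in the responsibilities.
    """
    u = [s ** (2.0 / beta) for s in samples]
    for _ in range(n_iter):
        # E-step: responsibility of the activity component (m = 2) for every sample
        r2 = []
        for ui in u:
            p1 = (1.0 - pi2) / lam1 * math.exp(-ui / lam1)
            p2 = pi2 / lam2 * math.exp(-ui / lam2)
            total = p1 + p2
            r2.append(p2 / total if total > 0.0 else 1.0)
        # M-step: closed-form updates of pi_2 and lambda_2
        sum_r2 = sum(r2)
        pi2 = sum_r2 / len(u)
        lam2 = sum(r * ui for r, ui in zip(r2, u)) / sum_r2
    return lam2, pi2


def correct_weights(lam1, lam2, beta):
    """Choose pi_2 so that E[S_alpha^2] = (pi_1 lam1^beta + pi_2 lam2^beta) * Gamma(1 + beta) = 1."""
    target = 1.0 / math.gamma(1.0 + beta)
    pi2 = (target - lam1 ** beta) / (lam2 ** beta - lam1 ** beta)
    pi2 = min(max(pi2, 0.0), 1.0)  # keep a valid weight if the constraint cannot be met exactly
    return 1.0 - pi2, pi2


# Toy data drawn from a known WMM (lambda_1 = 0.05, lambda_2 = 0.8, pi_2 = 0.6, beta = 2.5).
rng = random.Random(1)
beta, lam1 = 2.5, 0.05
data = []
for _ in range(3000):
    lam = 0.8 if rng.random() < 0.6 else lam1
    data.append((lam * rng.expovariate(1.0)) ** (beta / 2.0))

lam2_hat, pi2_hat = em_wmm(data, lam1=lam1, lam2=0.3, pi2=0.5, beta=beta)
pi1_c, pi2_c = correct_weights(lam1, lam2_hat, beta)
```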

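Maximum a posteriori (MAP) estimation of the clean speech NAOMs:
$$\hat S_\alpha^{\mathrm{MAP}}(k,\ell) = \arg\max_{s}\; p_{S_\alpha(k) \mid Y_\alpha(k,\ell)}(s \mid y)$$

$Y_\alpha(k,\ell)$ is a realization of the random variable $Y_\alpha(k) = S_\alpha(k) + N_\alpha(k)$

Approximate, computationally efficient solution for $\beta \neq \alpha \neq 1$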
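The MAP estimate can be illustrated by brute force: under the additivity assumption and with independent $S_\alpha$ and $N_\alpha$, the posterior is proportional to $p_{S_\alpha}(s)\,p_{N_\alpha}(y - s)$ on $0 < s < y$, and a grid search finds its maximum. This is not the approximate, computationally efficient solution mentioned on the slide, whose details are not given here. The Weibull PDF is assumed to be $\frac{2}{a\lambda} x^{2/a - 1} e^{-x^{2/a}/\lambda}$; names and values are toy choices.

```python
import math


def weib_pdf(x, lam, shape):
    """Assumed parameterization: 2/(shape*lam) * x^(2/shape - 1) * exp(-x^(2/shape) / lam)."""
    if x <= 0.0:
        return 0.0
    return 2.0 / (shape * lam) * x ** (2.0 / shape - 1.0) * math.exp(-(x ** (2.0 / shape)) / lam)


def wmm_pdf(s, weights, scales, beta):
    """Two-component Weibull mixture with common shape beta."""
    return sum(w * weib_pdf(s, lam, beta) for w, lam in zip(weights, scales))


def map_clean_naom(y, weights, scales, beta, lam_noise, alpha, n_grid=2000):
    """Grid-search MAP estimate of S_alpha given Y_alpha = y, assuming Y = S + N, S and N independent.

    The posterior p(s | y) is proportional to p_S(s) * p_N(y - s) for 0 < s < y.
    """
    best_s, best_p = 0.0, -1.0
    for i in range(n_grid):
        s = (i + 0.5) * y / n_grid  # midpoints avoid the endpoints s = 0 and s = y
        p = wmm_pdf(s, weights, scales, beta) * weib_pdf(y - s, lam_noise, alpha)
        if p > best_p:
            best_s, best_p = s, p
    return best_s


weights, scales, beta, alpha = (0.4, 0.6), (0.05, 0.8), 2.5, 0.7  # toy model
y = 1.0
s_hat = map_clean_naom(y, weights, scales, beta, lam_noise=0.3, alpha=alpha)
s_hat_quiet = map_clean_naom(y, weights, scales, beta, lam_noise=1e-6, alpha=alpha)
```

With almost no noise the estimate lies just below $y$. For $\beta > 2$ the WMM density is unbounded at $s = 0$, so with strong noise the grid maximum can sit at the smallest grid point, which is one reason a practical estimator needs a more careful solution.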

SLIDE 9

Calculation of a priori SNR and causal implementation

Go back into the power spectral density domain by calculating
$$\hat\xi(k,\ell) = \max\left( \frac{\left(\hat S_\alpha(k,\ell) \cdot \sqrt{P_{S_\alpha}(k)}\right)^{2/\alpha}}{\hat\lambda_N(k,\ell)},\ \xi_{\min} \right)$$
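The back-transformation inverts the NAOM mapping: $(\hat S_\alpha \sqrt{P_{S_\alpha}})^{2/\alpha}$ estimates $|S(k,\ell)|^2$. A sketch with $\xi_{\min} = -18$ dB taken from the experimental setup; the function and variable names are hypothetical.

```python
import math


def a_priori_snr_from_naom(s_naom, p_s_alpha, lambda_n_hat, alpha, xi_min_db=-18.0):
    """xi_hat = max((S_alpha_hat * sqrt(P_{S_alpha}))^(2/alpha) / lambda_N_hat, xi_min)."""
    xi_min = 10.0 ** (xi_min_db / 10.0)
    clean_power = (s_naom * math.sqrt(p_s_alpha)) ** (2.0 / alpha)  # estimate of |S(k, l)|^2
    return max(clean_power / lambda_n_hat, xi_min)


alpha, p_s_alpha, lambda_n_hat = 0.7, 3.0, 2.0  # toy values
clean_power = 4.0  # toy |S(k, l)|^2
s_naom = clean_power ** (alpha / 2.0) / math.sqrt(p_s_alpha)  # its NAOM |S|^alpha / sqrt(P)
xi = a_priori_snr_from_naom(s_naom, p_s_alpha, lambda_n_hat, alpha)  # 4 / 2 = 2 up to rounding
xi_floor = a_priori_snr_from_naom(0.0, p_s_alpha, lambda_n_hat, alpha)  # clipped to xi_min
```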

Causal implementation of WMM-based a priori SNR estimators

  • Calculate $P_{S_\alpha}(k)$ and $\lambda_{N_\alpha}(k)$ in a causal way
  • Causal EM for $\lambda_2(k)$ and $\pi_2(k)$ with one EM iteration per time frame
  • Note: the parameters $\alpha$ and $\beta$ have to be set appropriately, which calls for an optimization
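The slides state only that the causal EM performs one iteration per time frame. One possible realization, sketched here as an assumption, is an incremental EM: each new frame adds its responsibility and its responsibility-weighted $s^{2/\beta}$ to running sums, from which $\pi_2$ and $\lambda_2$ are recomputed. A few pseudo-frames seeded with the initial values keep the first updates stable. The class name, the pseudo-frame count, and the Weibull parameterization $\frac{2}{\beta\lambda} s^{2/\beta - 1} e^{-s^{2/\beta}/\lambda}$ are all assumptions.

```python
import math
import random


class CausalWmmEm:
    """Hypothetical causal EM for one frequency bin: one E/M update per incoming frame.

    Running sums hold the responsibilities of past frames, computed with the parameters
    that were current at the time (incremental EM); lambda_1 and beta stay fixed.
    """

    def __init__(self, lam1, lam2_init, pi2_init, beta, prior_frames=10):
        self.lam1, self.lam2, self.pi2, self.beta = lam1, lam2_init, pi2_init, beta
        # Pseudo-frames seeded with the initial values stabilize the first updates.
        self.count = float(prior_frames)
        self.sum_r2 = pi2_init * prior_frames
        self.sum_r2_u = pi2_init * lam2_init * prior_frames

    def update(self, s):
        """Process the NAOM s of the current frame and return (lambda_2, pi_2)."""
        u = s ** (2.0 / self.beta)
        # E-step for the new frame only; the Weibull factor shared by both components cancels
        p1 = (1.0 - self.pi2) / self.lam1 * math.exp(-u / self.lam1)
        p2 = self.pi2 / self.lam2 * math.exp(-u / self.lam2)
        total = p1 + p2
        r2 = p2 / total if total > 0.0 else 1.0
        # M-step from the updated running sums
        self.count += 1.0
        self.sum_r2 += r2
        self.sum_r2_u += r2 * u
        self.pi2 = self.sum_r2 / self.count
        self.lam2 = self.sum_r2_u / self.sum_r2
        return self.lam2, self.pi2


rng = random.Random(2)
em = CausalWmmEm(lam1=0.05, lam2_init=0.5, pi2_init=0.5, beta=2.5)
for _ in range(20_000):  # toy stream from a WMM with lambda_2 = 0.8, pi_2 = 0.6
    lam = 0.8 if rng.random() < 0.6 else 0.05
    lam2, pi2 = em.update((lam * rng.expovariate(1.0)) ** 1.25)
```

Accumulating rather than recomputing past responsibilities keeps the per-frame cost constant, at the price of early frames being weighted with outdated parameters.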

SLIDE 11

Experimental evaluation

Data and setup

  • Clean speech: Wall Street Journal (WSJ) database, 16 kHz, male and female speakers
  • 7 different noise types from the Noisex92 database: white, pink, f16, hfchannel, factory-1, factory-2, babble
  • Input global SNR from −5 dB up to 25 dB in 5 dB steps
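Scaling the noise to a prescribed input global SNR is a standard step in such setups. The sketch assumes "global SNR" means the energy ratio over the whole signal, with no speech activity weighting, which the slides do not specify; white Gaussian noise stands in for both the WSJ speech and the Noisex92 noise, and the function names are hypothetical.

```python
import math
import random


def mix_at_global_snr(speech, noise, snr_db):
    """Scale the noise so that 10*log10(E_s / E_n) equals snr_db over the whole signal, then add it."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as the speech")
    noise = noise[: len(speech)]
    e_s = sum(x * x for x in speech)
    e_n = sum(x * x for x in noise)
    gain = math.sqrt(e_s / (e_n * 10.0 ** (snr_db / 10.0)))
    scaled_noise = [gain * x for x in noise]
    noisy = [s + n for s, n in zip(speech, scaled_noise)]
    return noisy, scaled_noise


def global_snr_db(speech, noise):
    """Energy ratio of speech and noise over the whole signal, in dB."""
    return 10.0 * math.log10(sum(x * x for x in speech) / sum(x * x for x in noise))


rng = random.Random(3)
speech = [rng.gauss(0.0, 1.0) for _ in range(16_000)]  # stand-in for 1 s of speech at 16 kHz
noise = [rng.gauss(0.0, 0.3) for _ in range(16_000)]
measured = {}
for snr_db in range(-5, 30, 5):  # -5, 0, ..., 25 dB as in the setup
    noisy, scaled_noise = mix_at_global_snr(speech, noise, snr_db)
    measured[snr_db] = global_snr_db(speech, scaled_noise)
```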

Spectral speech enhancement framework

  • Noise PSD tracking using the minimum statistics approach [Martin 01]
  • A priori SNR estimation with $\xi_{\min} = -18$ dB [Cappe 94]
  • Proposed WMM-based approach with a Wiener filter
  • Reference approach: decision-directed (DD) [Ephraim 84]
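The decision-directed reference [Ephraim 84] combines the previous frame's clean speech estimate with the current a posteriori SNR $\gamma = |Y|^2 / \hat\lambda_N$. A per-bin sketch with the Wiener gain and $\xi_{\min} = -18$ dB; the smoothing constant $a = 0.98$ is a common choice in the literature, assumed here because the slides do not give it, and the function name is hypothetical.

```python
def dd_wiener(noisy_power, noise_psd, a=0.98, xi_min_db=-18.0):
    """Decision-directed a priori SNR with Wiener gain for one frequency bin over time.

    noisy_power[l] = |Y(k, l)|^2, noise_psd[l] = lambda_N_hat(k, l).
    Returns per-frame lists of xi_hat and gains.
    """
    xi_min = 10.0 ** (xi_min_db / 10.0)
    prev_clean_power, prev_noise = 0.0, noise_psd[0]  # no clean estimate before the first frame
    xis, gains = [], []
    for y2, lam_n in zip(noisy_power, noise_psd):
        gamma = y2 / lam_n  # a posteriori SNR
        xi = a * prev_clean_power / prev_noise + (1.0 - a) * max(gamma - 1.0, 0.0)
        xi = max(xi, xi_min)
        g = xi / (1.0 + xi)  # Wiener gain
        xis.append(xi)
        gains.append(g)
        prev_clean_power, prev_noise = g * g * y2, lam_n  # |S_hat|^2 = G^2 |Y|^2
    return xis, gains


xis_noise, _ = dd_wiener([1.0] * 20, [1.0] * 20)  # noise only: |Y|^2 equals the noise PSD
_, gains_speech = dd_wiener([101.0] * 20, [1.0] * 20)  # stationary a posteriori SNR of about 20 dB
```

In noise-only frames the estimate stays at the floor $\xi_{\min}$, which limits the attenuation; at high SNR the gain approaches one within a few frames.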

SLIDE 12

Optimization of α and β

Speech quality maximization in terms of the wide-band mean opinion score, listening quality objective (MOS-LQO), with
$$\Delta\text{MOS-LQO} = \max\left(\text{MOS-LQO}_{\mathrm{WMM}} - \text{MOS-LQO}_{\mathrm{DD}},\ 0\right)$$

Averaging over genders, noise types and input global SNR values gives $(\alpha_{\mathrm{opt}}, \beta_{\mathrm{opt}}) = (0.64, 2.7)$

[Figure: $\Delta$MOS-LQO as a function of $\alpha$ and $\beta$]
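The optimization is a search over $(\alpha, \beta)$. A hypothetical grid-search sketch: the slide does not say whether the clipping at zero happens before or after averaging, so the sketch averages first. `toy_evaluate` is a made-up stand-in for running the enhancement and computing MOS-LQO, and its optimum has nothing to do with the reported $(0.64, 2.7)$.

```python
def delta_mos_lqo(mos_wmm, mos_dd):
    """max(mean MOS-LQO of WMM - mean MOS-LQO of DD, 0) over all test conditions."""
    diff = sum(mos_wmm) / len(mos_wmm) - sum(mos_dd) / len(mos_dd)
    return max(diff, 0.0)


def grid_search(alphas, betas, evaluate):
    """Exhaustive search for the (alpha, beta) pair with the largest Delta MOS-LQO.

    evaluate(alpha, beta) returns two lists of MOS-LQO scores (WMM and DD) over the
    same conditions (genders, noise types, input global SNRs).
    """
    best, best_val = None, float("-inf")
    for a in alphas:
        for b in betas:
            val = delta_mos_lqo(*evaluate(a, b))
            if val > best_val:
                best, best_val = (a, b), val
    return best, best_val


def toy_evaluate(alpha, beta):
    """Hypothetical stand-in for enhancing and scoring the test set; made-up numbers."""
    wmm = [2.0 - (alpha - 0.5) ** 2 - 0.1 * (beta - 2.0) ** 2, 2.0 - (alpha - 0.5) ** 2]
    dd = [1.9, 1.9]
    return wmm, dd


best, best_val = grid_search([0.4, 0.5, 0.6], [1.5, 2.0, 2.5], toy_evaluate)
```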

SLIDE 13

Final experimental results

Clean speech: WSJ database signals other than those used for the optimization

Estimation error: Itakura-Saito distance (ISD); estimator's variance: logarithmic error variance (LEV); for both, smaller is better

Resulting ISD, LEV and MOS-LQO values, averaged over noise types, per input SNR:

| Measure | Method | −5 dB | 0 dB | 5 dB | 10 dB | 15 dB | 20 dB | 25 dB | AVG |
|---|---|---|---|---|---|---|---|---|---|
| ISD | DD | 48.8 | 44.0 | 39.6 | 34.9 | 30.2 | 24.5 | 19.1 | 34.4 |
| ISD | WMM | 42.6 | 38.1 | 34.1 | 30.4 | 27.3 | 23.0 | 18.9 | 30.6 |
| LEV | DD | 53.1 | 49.0 | 46.4 | 45.1 | 45.5 | 47.4 | 50.5 | 48.1 |
| LEV | WMM | 45.6 | 43.9 | 42.6 | 41.1 | 39.0 | 37.0 | 35.9 | 40.7 |
| MOS-LQO | DD | 1.11 | 1.30 | 1.63 | 2.09 | 2.57 | 3.00 | 3.39 | 2.16 |
| MOS-LQO | WMM | 1.18 | 1.46 | 1.77 | 2.13 | 2.62 | 3.16 | 3.61 | 2.28 |
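The slides do not define the two error measures. A sketch under common definitions, both assumptions: the Itakura-Saito distance averaged over bins, $r - \ln r - 1$ with $r = \xi/\hat\xi$, and the LEV as the variance of the error $10\log_{10}(\hat\xi/\xi)$ in dB. The scaling and averaging behind the table values are unknown, so the sketch does not reproduce them; the function names are hypothetical.

```python
import math
import statistics


def itakura_saito_distance(xi_true, xi_est):
    """Mean of r - ln(r) - 1 with r = xi_true / xi_est; zero only for a perfect estimate."""
    return statistics.fmean([t / e - math.log(t / e) - 1.0 for t, e in zip(xi_true, xi_est)])


def log_error_variance(xi_true, xi_est):
    """Assumed LEV: variance of the log error 10*log10(xi_est / xi_true), in dB^2."""
    return statistics.pvariance([10.0 * math.log10(e / t) for t, e in zip(xi_true, xi_est)])


xi_true = [0.1, 1.0, 10.0, 100.0]  # toy true a priori SNRs
biased = [2.0 * x for x in xi_true]  # constant overestimate by a factor of 2 (about 3 dB)
isd_biased = itakura_saito_distance(xi_true, biased)  # 1/2 + ln 2 - 1, about 0.193
lev_biased = log_error_variance(xi_true, biased)  # 0.0: a constant offset has no spread
```

A constant overestimate is penalized by the ISD but leaves the LEV at zero, illustrating that the two measures capture bias and spread separately.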

SLIDE 14

Conclusions and outlook

Conclusions

  • Novel causal a priori SNR estimator based on a bimodal Weibull mixture model of the normalized α-order spectral magnitudes (NAOMs)
  • Optimization of the proposed approach by maximizing speech quality
  • Power exponent αopt = 0.64, smaller than 1 (the spectral magnitude domain)
  • Shape factor βopt = 2.7: a heavier-tailed Weibull distribution

Compared to the widespread decision-directed approach:

  • Reduced error and variance of the WMM-based a priori SNR estimates
  • Improved speech quality of the enhanced signals
  • Higher computational effort

Outlook

  • Reduction of the computational effort through fixed, speaker-independent models
  • Development of model-based spectral enhancement using a generalized (arbitrary) power exponent, in the spirit of generalized spectral subtraction

SLIDE 15

Thank you for your attention! Questions?

Paderborn University, Department of Communications Engineering, Web: nt.upb.de

SLIDE 16

Resulting WMM parameters and audio samples

[Figure: resulting WMM parameters over frequency bins $k$. Panels: $\log(\lambda)$ with $\lambda_1^{\mathrm{mean}}(k)$ and $\lambda_2^{\mathrm{mean}}(k)$; $\pi$ with $\pi_1^{\mathrm{mean}}(k)$ and $\pi_2^{\mathrm{mean}}(k)$]

Example speech samples (audio): noisy, DD, WMM
