A Priori SNR Estimation Using Weibull Mixture Model

A Priori SNR Estimation Using Weibull Mixture Model. 12. ITG Fachtagung Sprachkommunikation (12th ITG Conference on Speech Communication). Aleksej Chinaev, Jens Heitkaemper, Reinhold Haeb-Umbach, Department of Communications Engineering, Paderborn University, 7 October 2016.


  1. A Priori SNR Estimation Using Weibull Mixture Model
     12. ITG Fachtagung Sprachkommunikation (12th ITG Conference on Speech Communication)
     Aleksej Chinaev, Jens Heitkaemper, Reinhold Haeb-Umbach
     Department of Communications Engineering, Paderborn University
     7 October 2016
     Computer Science, Electrical Engineering and Mathematics; Communications Engineering; Prof. Dr.-Ing. Reinhold Häb-Umbach

  2. Table of contents
     1 Problem formulation and motivation
     2 A priori SNR estimation based on Weibull mixture model
     3 Experimental evaluation
     4 Conclusions and outlook

  3. Problem formulation and motivation
     Single-channel clean speech $s(t)$ is contaminated by additive noise $n(t)$. With frequency bin $k$ and frame index $\ell$, the STFT correspondence is
     $$y(t) = s(t) + n(t) \quad \circ\!\!-\!\!\bullet \quad Y(k,\ell) = S(k,\ell) + N(k,\ell)$$
     [Block diagram: $Y(k,\ell)$ → $|\cdot|^2$ → noise PSD tracker → $\hat{\lambda}_N(k,\ell)$ (noise power spectral density, PSD) → a priori SNR estimator → $\hat{\xi}(k,\ell)$ → gain function → $G(k,\ell)$, applied to $Y(k,\ell)$ to obtain $\hat{S}(k,\ell)$ → ISTFT → $\hat{s}(t)$]
     The a priori SNR is a key component of the enhancement system:
     $$\xi(k,\ell) = \frac{\lambda_S(k,\ell)}{\lambda_N(k,\ell)}$$
     with the clean speech PSD $\lambda_S(k,\ell) = E\left[|S(k,\ell)|^2\right]$ and the noise PSD $\lambda_N(k,\ell) = E\left[|N(k,\ell)|^2\right]$.
     The approach is motivated by generalized spectral subtraction (GSS): denoising of $|Y(k,\ell)|^\alpha$ for $\alpha \in \mathbb{R}_{>0}$, not restricted to $\alpha = 1$ or $\alpha = 2$, under the assumption
     $$|Y(k,\ell)|^\alpha = |S(k,\ell)|^\alpha + |N(k,\ell)|^\alpha$$
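     The quantities on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the Wiener gain is used because the evaluation slide mentions a Wiener filter, and replacing $|N|^\alpha$ by $\lambda_N^{\alpha/2}$ in the GSS step is a simplification made here, not something the slides specify.

```python
def a_priori_snr(speech_psd, noise_psd):
    """xi = lambda_S / lambda_N (linear scale)."""
    return speech_psd / noise_psd


def wiener_gain(xi):
    """Wiener gain for an a priori SNR xi given on a linear scale."""
    return xi / (1.0 + xi)


def gss_speech_power(y_mag, noise_psd, alpha, floor=0.0):
    """Estimate |S|^2 by generalized spectral subtraction.

    Uses the slide's assumption |Y|^alpha = |S|^alpha + |N|^alpha.
    Assumption of this sketch: |N|^alpha is replaced by noise_psd ** (alpha / 2).
    """
    s_alpha = max(y_mag ** alpha - noise_psd ** (alpha / 2.0), floor)
    return s_alpha ** (2.0 / alpha)
```

     For $\alpha = 2$ the GSS step reduces to power spectral subtraction, and for $\alpha = 1$ to magnitude subtraction.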

  5. Normalized α-order magnitude (NAOM) domain
     [Block diagram of the a priori SNR estimator: inputs $|Y(k,\ell)|^2$ and $\hat{\lambda}_N(k,\ell)$ → "Estimate $P_{S_\alpha}(k)$ and go into NAOM domain" → $Y_\alpha(k,\ell)$, $\lambda_{N_\alpha}(k,\ell)$ → "Estimate parameters of WMM $p_{S_\alpha}(s)$" → $\lambda_m(k,\ell)$, $\pi_m(k,\ell)$ → "Estimate clean speech NAOMs" → $\hat{S}_\alpha(k,\ell)$ → "Calculate a priori SNR" → $\hat{\xi}(k,\ell)$]
     Normalize $|Y(k,\ell)|^\alpha$ to the root of the averaged power $P_{S_\alpha}(k)$ of $|S(k,\ell)|^\alpha$:
     $$Y_\alpha(k,\ell) = \frac{|Y(k,\ell)|^\alpha}{\sqrt{P_{S_\alpha}(k)}} = S_\alpha(k,\ell) + N_\alpha(k,\ell) \quad \text{with} \quad P_{S_\alpha}(k) = \frac{1}{L} \sum_{\ell=1}^{L} |S(k,\ell)|^{2\alpha}$$
     Statistical models are independent of speaker loudness.
     Normalized energy of clean speech NAOMs: $E[S_\alpha^2(k)] = 1$.
     $S_\alpha(k,\ell)$ and $N_\alpha(k,\ell)$ are realizations of the random variables $S_\alpha(k)$ and $N_\alpha(k)$.
     Estimate $S_\alpha(k,\ell)$ from $Y_\alpha(k,\ell)$ given models for $S_\alpha(k)$ and $N_\alpha(k)$.
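     The normalization into the NAOM domain can be written out as a short sketch. The helper names are hypothetical, and the averaged power is computed here from known clean magnitudes purely for illustration; in practice it has to be estimated.

```python
import math


def averaged_power(clean_mags, alpha):
    """P_{S_alpha}(k): mean of |S(k, l)|^(2 * alpha) over the frames l."""
    return sum(m ** (2.0 * alpha) for m in clean_mags) / len(clean_mags)


def to_naom(mags, alpha, p_s_alpha):
    """Map STFT magnitudes of one frequency bin into the NAOM domain."""
    root = math.sqrt(p_s_alpha)
    return [m ** alpha / root for m in mags]
```

     Applied to the clean magnitudes themselves, the resulting NAOMs have a mean square of one, and scaling all magnitudes by a constant leaves the NAOMs unchanged, which is the loudness independence stated on the slide.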

  6. Modeling of noise NAOM coefficients $N_\alpha(k,\ell)$
     $N(k,\ell) \sim \mathcal{N}_c(n; 0, \lambda_N(k,\ell))$
     $N_\alpha(k,\ell)$ is Weibull distributed:
     $$p_{N_\alpha(k,\ell)}(n) = \mathrm{Weib}(n; \lambda_{N_\alpha}(k,\ell), \alpha)$$
     Shape parameter: $\alpha \in \mathbb{R}_{>0}$
     Scale parameter: $\lambda_{N_\alpha}(k,\ell) = \dfrac{\lambda_N(k,\ell)}{\sqrt[\alpha]{P_{S_\alpha}(k)}} \in \mathbb{R}_{>0}$
     [Figure: Weibull PDF $\mathrm{Weib}(n; 1, \alpha)$ for $\lambda = 1$ and $\alpha \in \{0.5, 1, 1.5, 2\}$; horizontal axis $n$]
     Model $N_\alpha(k)$ with the Weibull PDF
     $$p_{N_\alpha(k)}(n) = \mathrm{Weib}(n; \lambda_{N_\alpha}(k), \alpha) \quad \text{with} \quad \lambda_{N_\alpha}(k) = \frac{1}{L} \sum_{\ell=1}^{L} \lambda_{N_\alpha}(k,\ell)$$
     [Figure: histogram of the NAOM coefficients of a white noise signal and the estimated Weibull PDF $p_{N_\alpha(k)}(n)$ for $\alpha = 0.7$; axes $n$ and $p_{N_\alpha}(n)$]
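     The Weibull behaviour of the noise NAOMs follows directly from the complex Gaussian assumption, and a small simulation can check it. The sketch below does not rely on the slides' exact definition of $\mathrm{Weib}(\cdot)$, which is not spelled out in the text. It uses the survival function $P(N_\alpha > x) = \exp\left(-x^{2/\alpha} / \lambda_{N_\alpha}\right)$ with $\lambda_{N_\alpha} = \lambda_N / P_{S_\alpha}^{1/\alpha}$, derived here from $|N|^2$ being exponentially distributed with mean $\lambda_N$. Function names are hypothetical.

```python
import math
import random


def noise_naom_survival(x, lambda_n, p_s_alpha, alpha):
    """P(N_alpha > x) for circular complex Gaussian noise with PSD lambda_n."""
    scale = lambda_n / p_s_alpha ** (1.0 / alpha)
    return math.exp(-x ** (2.0 / alpha) / scale)


def sample_noise_naom(rng, lambda_n, p_s_alpha, alpha):
    """Draw one noise NAOM: |N|^alpha / sqrt(P_{S_alpha})."""
    sigma = math.sqrt(lambda_n / 2.0)  # per real/imaginary part
    re, im = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
    return math.hypot(re, im) ** alpha / math.sqrt(p_s_alpha)
```

     Comparing the empirical fraction of samples above a threshold with `noise_naom_survival` gives a quick consistency check of the model.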

  7. Modeling of NAOM coefficients of clean speech $S_\alpha(k,\ell)$
     $S(k,\ell) \sim \mathcal{N}_c(n; 0, \lambda_S(k,\ell))$
     A bimodal Weibull mixture model (WMM) is used to model $S_\alpha(k)$:
     $$p_{S_\alpha(k)}(s) = \sum_{m=1}^{2} \pi_m(k) \cdot \mathrm{Weib}(s; \lambda_m(k), \beta)$$
     $m = 1$: silence
     $m = 2$: activity
     $\pi_m(k) \in [0, 1]$: weights
     $\lambda_m(k)$: scale parameters
     $\beta$: shape parameter
     $\beta \neq \alpha$: additional degree of freedom in the model
     [Figure: histogram of clean speech NAOMs and estimated WMM ($\alpha = 0.7$; $\beta = 2.5$), showing the bimodal WMM and its $m = 1$ and $m = 2$ components; axes $s$ and $p_{S_\alpha}(s)$]
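     A minimal sketch of the two-component mixture density follows. The slides do not state how $\mathrm{Weib}(s; \lambda, \beta)$ is parameterized, so the code assumes $\mathrm{Weib}(s; \lambda, \beta) = \frac{2}{\beta \lambda} s^{2/\beta - 1} \exp\left(-s^{2/\beta} / \lambda\right)$, which is the form that a complex Gaussian coefficient raised to the power $\beta$ would produce. All function names are hypothetical.

```python
import math


def weib_pdf(s, lam, beta):
    """Assumed Weibull PDF: survival function exp(-s^(2/beta) / lam)."""
    power = 2.0 / beta
    return power / lam * s ** (power - 1.0) * math.exp(-s ** power / lam)


def weib_cdf(s, lam, beta):
    return 1.0 - math.exp(-s ** (2.0 / beta) / lam)


def wmm_pdf(s, weights, scales, beta):
    """Mixture density sum_m pi_m * Weib(s; lambda_m, beta)."""
    return sum(w * weib_pdf(s, lam, beta) for w, lam in zip(weights, scales))


def wmm_cdf(s, weights, scales, beta):
    return sum(w * weib_cdf(s, lam, beta) for w, lam in zip(weights, scales))


def wmm_second_moment(weights, scales, beta):
    """E[S^2] = sum_m pi_m * lambda_m^beta * Gamma(1 + beta) under the assumed PDF."""
    g = math.gamma(1.0 + beta)
    return sum(w * lam ** beta * g for w, lam in zip(weights, scales))
```

     The second-moment helper is convenient for checking the constraint $E[S_\alpha^2(k)] = 1$ for a given set of weights and scales.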

  8. Estimation of WMM parameters and clean speech NAOMs
     [Block diagram of the a priori SNR estimator, as on slide 5, shown first with the block "Estimate parameters of WMM $p_{S_\alpha}(s)$" and then with the block "Estimate clean speech NAOMs"]
     Estimation of the WMM parameters:
     Set $\lambda_1(k)$ according to the $\xi_{\min}$ usually used in a priori SNR estimation [Cappe 94].
     Use the Expectation Maximization (EM) algorithm to estimate $\lambda_2(k)$ and $\pi_m(k)$.
     After EM, the weights $\pi_m(k)$ are corrected with the constraint $E[S_\alpha^2(k)] = 1$.
     Estimation of the clean speech NAOMs by maximum a posteriori (MAP) estimation:
     $$\hat{S}_\alpha^{\mathrm{MAP}}(k,\ell) = \underset{s}{\operatorname{argmax}}\; p_{S_\alpha(k) \mid Y_\alpha(k,\ell)}(s \mid y)$$
     $Y_\alpha(k,\ell)$ is a realisation of the random variable $Y_\alpha(k) = S_\alpha(k) + N_\alpha(k)$.
     Approximate, computationally efficient solution for $\beta = \alpha = 1$.
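     The EM step can be illustrated for a two-component mixture with a fixed first scale and a fixed shape. This is a sketch under assumptions, not the authors' implementation: it assumes the Weibull survival function $\exp\left(-s^{2/\beta} / \lambda\right)$, so that $z = s^{2/\beta}$ is a mixture of exponentials, it fits clean NAOMs directly, and it omits the subsequent weight correction, whose exact form the slides do not give. The name `em_wmm` and the initialization are hypothetical.

```python
import math


def em_wmm(naoms, lam1, beta, n_iter=50):
    """EM for a two-component Weibull mixture with fixed lam1 and shape beta.

    Returns (pi2, lam2); the weight of the first component is 1 - pi2.
    """
    # Under the assumed parameterization, z = s^(2/beta) is a mixture of
    # exponential distributions with means lam1 and lam2.
    z = [s ** (2.0 / beta) for s in naoms]
    pi2 = 0.5
    lam2 = max(sum(z) / len(z), 2.0 * lam1)  # start above the silence scale
    for _ in range(n_iter):
        # E-step: posterior probability of the activity component (m = 2).
        resp = []
        for v in z:
            p1 = (1.0 - pi2) / lam1 * math.exp(-v / lam1)
            p2 = pi2 / lam2 * math.exp(-v / lam2)
            resp.append(p2 / (p1 + p2))
        # M-step: update the weight and the scale of the activity component.
        total = sum(resp)
        pi2 = total / len(z)
        lam2 = sum(r * v for r, v in zip(resp, z)) / total
    return pi2, lam2
```

     Because the shape is shared by both components, the Jacobian of the transform $z = s^{2/\beta}$ cancels in the responsibilities, which keeps both EM steps in closed form.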

  9. Calculation of a priori SNR and causal implementation
     [Block diagram of the a priori SNR estimator, as on slide 5, with the block "Calculate a priori SNR"]
     Go back into the domain of the power spectral density by calculating
     $$\hat{\xi}(k,\ell) = \max\left( \frac{\left( \hat{S}_\alpha(k,\ell) \cdot \sqrt{P_{S_\alpha}(k)} \right)^{2/\alpha}}{\lambda_N(k,\ell)},\; \xi_{\min} \right)$$
     Causal implementation of WMM-based a priori SNR estimators:
     Calculate $P_{S_\alpha}(k)$ and $\lambda_{N_\alpha}(k)$ in a causal way.
     Causal EM for $\lambda_2(k)$ and $\pi_2(k)$ with one EM iteration per time frame.
     Note that the parameters $\alpha$ and $\beta$ have to be set appropriately (→ optimization).
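     The mapping from an estimated clean speech NAOM back to the a priori SNR can be sketched as follows. The function name is hypothetical; the default floor of $-18$ dB is the $\xi_{\min}$ value quoted in the experimental setup.

```python
import math


def a_priori_snr_from_naom(s_alpha_hat, p_s_alpha, lambda_n, alpha, xi_min_db=-18.0):
    """Undo the NAOM normalization and return the floored a priori SNR (linear)."""
    xi_min = 10.0 ** (xi_min_db / 10.0)
    # (S_alpha * sqrt(P))^(2/alpha) recovers an estimate of |S|^2.
    speech_power = (s_alpha_hat * math.sqrt(p_s_alpha)) ** (2.0 / alpha)
    return max(speech_power / lambda_n, xi_min)
```

     Feeding in the exact NAOM of a coefficient with $|S| = 2$ and $\lambda_N = 1$ returns $\xi = 4$ up to rounding, and an estimate of zero returns the floor $\xi_{\min}$.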

  11. Experimental evaluation
      Data and setup:
      Clean speech: Wall Street Journal database, 16 kHz (male and female)
      7 different noise types of the Noisex92 database: white, pink, f16, hfchannel, factory-1, factory-2, babble
      Input global SNR from $-5$ dB up to 25 dB in 5 dB steps
      Spectral speech enhancement framework:
      Noise PSD tracking using the Minimum Statistics approach [Martin 01]
      A priori SNR estimation with $\xi_{\min} = -18$ dB [Cappe 94]
      Proposed WMM-based approach with Wiener filter
      Reference approach: Decision Directed [Ephraim 84]
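      Mixing speech and noise at a prescribed input SNR can be sketched as below. The slides do not define "global SNR" in detail, so the sketch assumes it is the ratio of the average powers of the whole speech and noise signals; the function names are hypothetical.

```python
import math

# Input SNR grid from the setup: -5 dB to 25 dB in 5 dB steps.
SNR_GRID_DB = list(range(-5, 30, 5))


def mean_power(signal):
    return sum(x * x for x in signal) / len(signal)


def scale_noise_to_snr(speech, noise, snr_db):
    """Scale the noise so that 10*log10(P_speech / P_noise) equals snr_db."""
    target_ratio = 10.0 ** (snr_db / 10.0)
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * target_ratio))
    return [gain * x for x in noise]


def mix(speech, noise, snr_db):
    """Return the noisy signal y(t) = s(t) + n(t) at the requested global SNR."""
    scaled = scale_noise_to_snr(speech, noise, snr_db)
    return [s + n for s, n in zip(speech, scaled)]
```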

  12. Optimization of $\alpha$ and $\beta$
      Speech quality is maximized in terms of the wide-band mean opinion score listening quality objective (MOS-LQO), with
      $$\Delta\text{MOS-LQO} = \max\left( \text{MOS-LQO}_{\mathrm{WMM}} - \text{MOS-LQO}_{\mathrm{DD}},\; 0 \right)$$
      Averaging over genders, noise types and input global SNR values gives $(\alpha_{\mathrm{opt}}, \beta_{\mathrm{opt}}) = (0.64, 2.7)$.
      [Figure: $\Delta$MOS-LQO as a function of $\alpha$ and $\beta$]
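      The parameter search can be sketched as a grid search over $(\alpha, \beta)$ pairs. The slides do not say whether the clipping at zero is applied before or after averaging; the sketch clips per condition and then averages, which is an assumption. The data structure and function names are hypothetical, and no scores from the study are reproduced.

```python
def delta_mos_lqo(mos_wmm, mos_dd):
    """Clipped MOS-LQO gain of the WMM estimator over the DD reference."""
    return max(mos_wmm - mos_dd, 0.0)


def select_alpha_beta(scores):
    """Pick the (alpha, beta) pair with the largest average clipped gain.

    scores maps (alpha, beta) to a list of (mos_wmm, mos_dd) pairs,
    one pair per test condition (gender, noise type, input SNR).
    """
    def average_gain(pairs):
        return sum(delta_mos_lqo(w, d) for w, d in pairs) / len(pairs)

    return max(scores, key=lambda ab: average_gain(scores[ab]))
```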

  13. Final experimental results
      Clean speech: WSJ database signals other than those used for optimization.
      Estimation error is measured by the Itakura-Saito distance (ISD) and the estimator's variance by the logarithmic error variance (LEV): the smaller the better.
      Resulting ISD, LEV and MOS-LQO values averaged over noise types:

      Metric    Estimator   SNR, dB:  -5      0       5       10      15      20      25      AVG
      ISD       DD                    48.8    44.0    39.6    34.9    30.2    24.5    19.1    34.4
      ISD       WMM                   42.6    38.1    34.1    30.4    27.3    23.0    18.9    30.6
      LEV       DD                    53.1    49.0    46.4    45.1    45.5    47.4    50.5    48.1
      LEV       WMM                   45.6    43.9    42.6    41.1    39.0    37.0    35.9    40.7
      MOS-LQO   DD                    1.11    1.30    1.63    2.09    2.57    3.00    3.39    2.16
      MOS-LQO   WMM                   1.18    1.46    1.77    2.13    2.62    3.16    3.61    2.28
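      As a consistency check, the AVG column can be recomputed from the per-SNR values reported on this slide. The snippet below is only that arithmetic; the dictionary layout and the helper name are hypothetical.

```python
# Per-SNR values (-5 dB ... 25 dB) as reported on the slide.
RESULTS = {
    ("ISD", "DD"): [48.8, 44.0, 39.6, 34.9, 30.2, 24.5, 19.1],
    ("ISD", "WMM"): [42.6, 38.1, 34.1, 30.4, 27.3, 23.0, 18.9],
    ("LEV", "DD"): [53.1, 49.0, 46.4, 45.1, 45.5, 47.4, 50.5],
    ("LEV", "WMM"): [45.6, 43.9, 42.6, 41.1, 39.0, 37.0, 35.9],
    ("MOS-LQO", "DD"): [1.11, 1.30, 1.63, 2.09, 2.57, 3.00, 3.39],
    ("MOS-LQO", "WMM"): [1.18, 1.46, 1.77, 2.13, 2.62, 3.16, 3.61],
}


def row_average(values, digits):
    """Mean over the seven input SNR values, rounded like the table."""
    return round(sum(values) / len(values), digits)
```

      The rounded means reproduce the AVG entries of the table: WMM lowers the average ISD and LEV and raises the average MOS-LQO relative to DD.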
