
Real-Time Capable Robust Noise Reduction

Maximilian Luz. Speech Signal Processing and Speech Enhancement, Summer Semester 2019, IMS.


  1. Real-Time Capable Robust Noise Reduction
     Maximilian Luz. Speech Signal Processing and Speech Enhancement, Summer Semester 2019, IMS.

  2. Goals
     • Single microphone.
     • Real-time capable.
     • Adaptive to changes in noise/signal.
     • Processing in frequency-domain.
     • Unsupervised.

  3. Outline
     • Basics: Short-Time Fourier Transform; (Weighted) Overlap and Add
     • Methods: Spectral Subtraction; MMSE and log-MMSE
     • Robustification
     • Demonstration

  4. Basics

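  5. Short-Time Fourier Transform
     The signal $x(t)$ is segmented into overlapping segments $x_1(t), x_2(t), x_3(t), \dots$ of segment length $N$ with overlap $M$. Each segment is multiplied by a window function $h(n)$ and transformed by the DFT, giving the magnitude spectra $|X_k(f)|$.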
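
The segmentation, windowing and DFT steps of the STFT can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the function name `stft`, the Hann window, and the default segment length and overlap are assumptions chosen for the example.

```python
import numpy as np

def stft(x, seg_len=512, overlap=256):
    """Split x into overlapping windowed segments and take the DFT of each."""
    hop = seg_len - overlap                    # step between segment starts (N - M)
    window = np.hanning(seg_len)               # window function h(n); Hann is an assumed choice
    n_frames = 1 + (len(x) - seg_len) // hop   # only complete segments are kept
    frames = np.stack([x[i * hop:i * hop + seg_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)         # shape: (n_frames, seg_len // 2 + 1)

# Usage: a 1 kHz sine sampled at 16 kHz (hypothetical test signal)
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)
X = stft(x)
magnitude = np.abs(X)                          # |X_k(f)| for every segment k
```

With these settings the frequency resolution is `fs / seg_len`, so the sine falls onto a single DFT bin.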

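  6. (Weighted) Overlap and Add
     Each spectrum $X_k(f)$ is transformed back by the IDFT and weighted with the window function $h(n)$, giving the segments $\hat{x}_1(t), \hat{x}_2(t), \hat{x}_3(t), \dots$. The segments are shifted by the hop size $R$ and summed to form $\hat{x}(t)$.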
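
A sketch of the analysis and weighted overlap-and-add round trip, under stated assumptions: a square-root periodic Hann window is used for both analysis and synthesis, and the hop size is half the segment length, so the product of the two windows sums to one across overlapping segments. The function name `wola_roundtrip` and the `modify` hook are hypothetical names introduced for this example.

```python
import numpy as np

def wola_roundtrip(x, seg_len=512, hop=256, modify=lambda X: X):
    """Analyse, optionally modify, and resynthesise x by weighted overlap-add."""
    n = np.arange(seg_len)
    # Square-root periodic Hann (assumed choice): window**2 is a Hann window,
    # and shifted Hann windows sum to 1 at 50 % overlap.
    window = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / seg_len))
    out = np.zeros(len(x))
    for start in range(0, len(x) - seg_len + 1, hop):
        X = np.fft.rfft(x[start:start + seg_len] * window)   # window + DFT
        X = modify(X)                                         # spectral processing goes here
        segment = np.fft.irfft(X, n=seg_len) * window         # IDFT + weighting
        out[start:start + seg_len] += segment                 # sum at hop size R
    return out

# Usage: without modification the interior of the signal is reconstructed
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = wola_roundtrip(x)
```

Only the first and last half segment are not covered by two windows, so they come back attenuated; in a streaming setting this shows up as a short fade-in.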

  7. Methods

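  8. Spectral Subtraction
     Signal model: $y(t) = x(t) + d(t)$ (noisy signal = signal + noise).
     Processing chain: STFT → $|\cdot|^p$ → subtract the noise estimate → $|\cdot|^{1/p}$ → recombine with the noisy phase $\arg Y[k]$ → ISTFT:
     $|\hat{X}[k]|^p = |Y[k]|^p - |\hat{D}[k]|^p$, $\hat{X}[k] = |\hat{X}[k]| \, e^{j \arg Y[k]}$
     Open Questions:
     • How to estimate noise?
     • How to handle negative magnitude values after subtraction?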
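
One common way to address both open questions can be sketched as follows. It rests on assumptions that are not taken from the slides: the noise magnitude is estimated by averaging the first few frames (assumed to be speech-free), and negative results are clamped to a small fraction of the noisy spectrum (a spectral floor). The names `spectral_subtraction`, `noise_mag` and `floor` are hypothetical.

```python
import numpy as np

def spectral_subtraction(Y, noise_mag, p=2.0, floor=0.01):
    """Y: complex noisy spectra (frames x bins); noise_mag: estimated |D[k]| per bin."""
    noisy_pow = np.abs(Y) ** p
    diff = noisy_pow - noise_mag ** p
    # Negative values after subtraction are replaced by a spectral floor
    # (assumed strategy; half-wave rectification would use 0 instead).
    diff = np.maximum(diff, floor * noisy_pow)
    magnitude = diff ** (1.0 / p)
    return magnitude * np.exp(1j * np.angle(Y))   # reuse the noisy phase arg Y[k]

# Usage on synthetic spectra (random data stands in for a real STFT)
rng = np.random.default_rng(0)
Y = rng.standard_normal((20, 129)) + 1j * rng.standard_normal((20, 129))
noise_mag = np.abs(Y[:5]).mean(axis=0)            # assumes the first 5 frames are noise only
X_hat = spectral_subtraction(Y, noise_mag)
```

A fixed initial noise estimate like this cannot follow changing noise, which is the limitation the adaptive estimator later in the deck targets.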

  9. Spectral Subtraction: Results
     [Figure: spectrograms of the original signal and the spectral-subtraction output; axes: Time [s], Frequency [kHz].]
     Issues:
     • Too much subtraction leads to speech distortion.
     • Residual (musical) noise.

  10. Gain Function
     Signal model: $y(t) = x(t) + d(t)$, with $X[k] = A[k] \, e^{j\alpha[k]}$ and $Y[k] = R[k] \, e^{j\vartheta[k]}$.
     Processing chain: STFT → multiply by $G[k]$ → ISTFT.
     Usually: $G(\xi[k], \gamma[k]) \in \mathbb{R}$, where
     • $\xi[k] := \lambda_x[k] / \lambda_d[k]$ (a priori SNR),
     • $\gamma[k] := R[k]^2 / \lambda_d[k]$ (a posteriori SNR),
     • $\lambda_x[k]$ is the spectral signal power and $\lambda_d[k]$ the spectral noise power.
     These quantities need to be estimated!
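
The two SNR definitions and the application of a real-valued gain can be illustrated with a small worked example. The Wiener gain $\xi / (1 + \xi)$ is used here only as one simple instance of a gain function; it is swapped in for illustration and is not the estimator derived on the following slides. The powers are assumed to be known, which is not the case in practice.

```python
import numpy as np

def a_posteriori_snr(Y, lambda_d):
    """gamma[k] = R[k]^2 / lambda_d[k], with R[k] = |Y[k]|."""
    return np.abs(Y) ** 2 / lambda_d

def wiener_gain(xi):
    """Example gain depending only on the a priori SNR (stand-in for G)."""
    return xi / (1.0 + xi)

# Worked example for three frequency bins with assumed, known powers
lambda_x = np.array([4.0, 1.0, 0.0])        # spectral signal power
lambda_d = np.array([1.0, 1.0, 1.0])        # spectral noise power
Y = np.array([2.0 + 1.0j, 1.0 - 1.0j, 0.5j])

xi = lambda_x / lambda_d                    # a priori SNR
gamma = a_posteriori_snr(Y, lambda_d)       # a posteriori SNR
G = wiener_gain(xi)                         # real-valued gain per bin
X_hat = G * Y                               # gain scales the magnitude, phase is kept
```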

  11. Minimum Mean-Square Error Spectral Amplitude Estimator (MMSE)
     Processing chain: STFT → noise estimation ($\hat{\lambda}_d[k]$) → $\gamma[k]$ → $\xi$ estimation ($\hat{\xi}[k]$) → $G_{\mathrm{MMSE}}$ → multiply $Y[k]$ by $G[k]$ → ISTFT.
     Idea: Minimize $E\{(A[k] - \hat{A}[k])^2\}$.
     Solution (assumes Gaussian distribution): $\hat{A}[k] = E\{A[k] \mid Y[k]\} = G_{\mathrm{MMSE}}(\xi[k], \gamma[k]) \cdot R[k]$.
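
For reference, the closed-form gain given by Ephraim and Malah (1984) can be written in the symbols used here. The auxiliary quantity $v[k]$ does not appear on the slide and is introduced only for compactness; $I_0$ and $I_1$ denote the modified Bessel functions of the first kind of order zero and one.

```latex
\begin{align*}
v[k] &= \frac{\xi[k]}{1 + \xi[k]}\,\gamma[k], \\
G_{\mathrm{MMSE}}\bigl(\xi[k], \gamma[k]\bigr)
  &= \frac{\sqrt{\pi}}{2}\,\frac{\sqrt{v[k]}}{\gamma[k]}\,
     \exp\!\Bigl(-\frac{v[k]}{2}\Bigr)
     \Bigl[\bigl(1 + v[k]\bigr)\,I_0\!\Bigl(\frac{v[k]}{2}\Bigr)
           + v[k]\,I_1\!\Bigl(\frac{v[k]}{2}\Bigr)\Bigr]
\end{align*}
```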

  12. Minimum Mean-Square Error Log-Spectral Amplitude Estimator (log-MMSE)
     Processing chain: STFT → noise estimation ($\hat{\lambda}_d[k]$) → $\gamma[k]$ → $\xi$ estimation ($\hat{\xi}[k]$) → $G_{\text{log-MMSE}}$ → multiply $Y[k]$ by $G[k]$ → ISTFT.
     Idea: Minimize $E\{(\log A[k] - \log \hat{A}[k])^2\}$.
     Solution (assumes Gaussian distribution): $\hat{A}[k] = \exp E\{\ln A[k] \mid Y[k]\} = G_{\text{log-MMSE}}(\xi[k], \gamma[k]) \cdot R[k]$.
     Notes:
     • MMSE with different penalization.
     • Better measure for speech [Gray et al. 1980].
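
For reference, the closed-form log-MMSE gain given by Ephraim and Malah (1985) in the same symbols. The auxiliary quantity $v[k]$ is not on the slide and is introduced only for compactness; the integral is the exponential integral $E_1(v[k])$.

```latex
\begin{align*}
v[k] &= \frac{\xi[k]}{1 + \xi[k]}\,\gamma[k], \\
G_{\text{log-MMSE}}\bigl(\xi[k], \gamma[k]\bigr)
  &= \frac{\xi[k]}{1 + \xi[k]}\,
     \exp\!\Bigl(\frac{1}{2}\int_{v[k]}^{\infty} \frac{e^{-t}}{t}\,\mathrm{d}t\Bigr)
\end{align*}
```

For large $v[k]$ the integral tends to zero, so the gain approaches the Wiener-type factor $\xi[k] / (1 + \xi[k])$.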

  13. MMSE and log-MMSE: Results
     [Figure: spectrograms in four panels: Original, Spectral Subtraction, MMSE, log-MMSE; axes: Time [s], Frequency [kHz].]

  14. Incorporating Signal Presence Uncertainty (OM-LSA) [Cohen and Berdugo 2001]
     Idea: Two hypotheses
     • $H_0[k]$: $Y[k] = D[k]$
     • $H_1[k]$: $Y[k] = X[k] + D[k]$
     with the conditional speech presence probability $p[k] := P(H_1[k] \mid Y[k])$.
     Solution: $G(\xi[k], \gamma[k]) = G_{H_1}(\xi[k], \gamma[k])^{p[k]} \cdot G_{\min}^{1 - p[k]}$
     Estimate $p[k]$ via a Gaussian model and the a priori speech absence probability $q[k] := P(H_0[k])$.
     Processing blocks: a priori speech absence probability estimation ($\hat{q}[k]$) → conditional speech presence probability estimation ($\hat{p}[k]$, from $\xi[k]$ and $\gamma[k]$) → gain $G[k]$.
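
The combination of the two hypotheses can be sketched as follows. The expression for the speech presence probability is the Gaussian-model form from Cohen and Berdugo (2001); the function names, the clipping of `q`, the value of `g_min`, and the use of a Wiener gain as a stand-in for $G_{H_1}$ are assumptions made for this illustration.

```python
import numpy as np

def speech_presence_probability(xi, gamma, q):
    """p[k] = P(H1 | Y) under a Gaussian model, given the absence prior q[k]."""
    v = gamma * xi / (1.0 + xi)
    q = np.clip(q, 0.0, 1.0 - 1e-6)          # avoid division by zero for q -> 1
    return 1.0 / (1.0 + q / (1.0 - q) * (1.0 + xi) * np.exp(-v))

def om_lsa_gain(g_h1, p, g_min=0.1):
    """Geometric interpolation between the speech gain and the gain floor."""
    return g_h1 ** p * g_min ** (1.0 - p)

# Usage: one bin with likely speech, one bin with likely noise only
xi = np.array([10.0, 0.1])                   # a priori SNR
gamma = np.array([12.0, 0.5])                # a posteriori SNR
q = np.array([0.2, 0.8])                     # a priori speech absence probability
g_h1 = xi / (1.0 + xi)                       # stand-in for G_H1 (Wiener gain)

p = speech_presence_probability(xi, gamma, q)
G = om_lsa_gain(g_h1, p)
```

Bins where speech is unlikely are pulled towards `g_min` instead of being set to zero, which keeps a constant noise floor.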

  15. Estimating the a priori Speech Absence Probability $q[k]$ [Cohen and Berdugo 2001]
     Processing chain: $\xi$ → exp. avg. over time → avg. over freq. locally ($P_{\mathrm{local}}$), avg. over freq. globally ($P_{\mathrm{global}}$), and a per-frame term ($P_{\mathrm{frame}}$).
     $\hat{q}[k] = 1 - P_{\mathrm{local}}[k] \cdot P_{\mathrm{global}}[k] \cdot P_{\mathrm{frame}}$
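
A simplified sketch of this estimator, under several assumptions: the smoothed SNR is mapped to a likelihood on a logarithmic scale between two thresholds, the local and global frequency averages use Hann-shaped kernels, and the per-frame term is reduced to the same mapping applied to the frame mean, which is simpler than the rule in the cited paper. All function names, thresholds, window widths and the cap `q_max` are illustrative values, not taken from the slides.

```python
import numpy as np

def smooth_time(zeta_prev, xi_prev, beta=0.7):
    """Exponential average over time of the a priori SNR."""
    return beta * zeta_prev + (1.0 - beta) * xi_prev

def smooth_freq(zeta, half_width):
    """Average over 2 * half_width + 1 neighbouring bins (edges are zero-padded)."""
    kernel = np.hanning(2 * half_width + 3)[1:-1]   # drop the zero end points
    kernel = kernel / kernel.sum()
    return np.convolve(zeta, kernel, mode="same")

def likelihood(zeta, zeta_min, zeta_max):
    """0 below zeta_min, 1 above zeta_max, log-linear in between."""
    z = np.clip(zeta, zeta_min, zeta_max)
    return np.log(z / zeta_min) / np.log(zeta_max / zeta_min)

def speech_absence_prior(zeta, zeta_min=0.1, zeta_max=0.3162,
                         w_local=1, w_global=15, q_max=0.95):
    """q[k] = 1 - P_local[k] * P_global[k] * P_frame, capped at q_max."""
    p_local = likelihood(smooth_freq(zeta, w_local), zeta_min, zeta_max)
    p_global = likelihood(smooth_freq(zeta, w_global), zeta_min, zeta_max)
    p_frame = likelihood(zeta.mean(), zeta_min, zeta_max)   # simplified frame term
    return np.minimum(1.0 - p_local * p_global * p_frame, q_max)

# Usage: a frame with high SNR everywhere and a frame with very low SNR
zeta_speech = smooth_time(np.full(257, 10.0), np.full(257, 10.0))
zeta_noise = smooth_time(np.full(257, 0.01), np.full(257, 0.01))
q_speech = speech_absence_prior(zeta_speech)
q_noise = speech_absence_prior(zeta_noise)
```

The input needs more bins than the widest kernel (31 taps here), otherwise `np.convolve` with `mode="same"` returns an array of the kernel's length.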

  16. Estimating the a priori SNR $\xi[k]$ [Cohen and Berdugo 2001; Ephraim and Malah 1984]
     Maximum Likelihood:
     $\bar{\gamma}[k, n] = \alpha \, \bar{\gamma}[k, n-1] + (1 - \alpha) \, \gamma[k, n] / \beta$
     $\hat{\xi}[k, n] = \max(\bar{\gamma}[k, n] - 1, 0)$, with $0 \le \alpha \le 1$, $\beta \ge 1$
     Decision-Directed:
     $\hat{\xi}[k, n] = \alpha \, G_{H_1}^2(\hat{\xi}[k, n-1], \gamma[k, n-1]) \cdot \gamma[k, n-1] + (1 - \alpha) \max(\gamma[k, n] - 1, 0)$
     Decision-directed approach usually has less musical noise.
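
The decision-directed recursion can be sketched as a loop over frames. Assumptions in this illustration: the first frame is initialised with the maximum-likelihood term because no previous gain exists, a small lower bound `xi_min` is applied for numerical safety (not part of the formula above), and a Wiener gain stands in for the gain function. The names `decision_directed`, `gain_fn` and `xi_min` are hypothetical.

```python
import numpy as np

def decision_directed(gamma, gain_fn, alpha=0.98, xi_min=1e-3):
    """gamma: a posteriori SNR, float array of shape (frames, bins)."""
    xi_hat = np.empty_like(gamma)
    # No history for the first frame: fall back to max(gamma - 1, .)
    xi_hat[0] = np.maximum(gamma[0] - 1.0, xi_min)
    for n in range(1, len(gamma)):
        g_prev = gain_fn(xi_hat[n - 1], gamma[n - 1])          # gain of the previous frame
        xi_hat[n] = (alpha * g_prev ** 2 * gamma[n - 1]
                     + (1.0 - alpha) * np.maximum(gamma[n] - 1.0, 0.0))
        xi_hat[n] = np.maximum(xi_hat[n], xi_min)              # assumed lower bound
    return xi_hat

def wiener(xi, gamma):
    """Stand-in gain function; gamma is unused by this particular gain."""
    return xi / (1.0 + xi)

# Usage: constant a posteriori SNR above and below 1
gamma_high = np.full((50, 4), 11.0)
gamma_low = np.full((50, 4), 0.5)
xi_high = decision_directed(gamma_high, wiener)
xi_low = decision_directed(gamma_low, wiener)
```

With `alpha` close to 1 the estimate follows the previous frame's enhanced spectrum rather than the instantaneous $\gamma[k, n] - 1$, which smooths out isolated spectral peaks.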

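  17. Adaptive and Robust Noise Estimation ($\lambda_d[k]$) [Cohen and Berdugo 2001]
     Minima controlled recursive averaging (MCRA):
     Processing chain: $|Y|^2$ → avg. over freq. ($S_f$) → exp. avg. over time ($S$) → localized minimum over $L$ frames ($S_{\min}$) → decision ratio ($S_r$) → speech indicator ($I$) → exp. avg. over time ($p_d$).
     Speech indicator: $S_r > \delta \Rightarrow I = 1$, $S_r \le \delta \Rightarrow I = 0$, where $S_r$ is the ratio of the smoothed spectrum to $S_{\min}$.
     $\hat{\lambda}_d[k, n+1] = \tilde{\alpha}_d[k, n] \cdot \hat{\lambda}_d[k, n] + (1 - \tilde{\alpha}_d[k, n]) \cdot |Y[k, n]|^2$
     $\tilde{\alpha}_d[k, n] = \alpha_d + (1 - \alpha_d) \, p_d[k, n]$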
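
A compact sketch of minima controlled recursive averaging, under stated assumptions: a three-tap frequency smoothing kernel, the noise estimate initialised from the first frame, the minimum search restarted every `L` frames using a temporary minimum, and illustrative parameter values. The function name `mcra` and all defaults are hypothetical, and the code is a simplified reading of the scheme rather than the presenter's implementation.

```python
import numpy as np

def mcra(power, alpha_d=0.95, alpha_s=0.8, alpha_p=0.2, delta=5.0, L=100):
    """power: |Y[k, n]|^2 as float array (frames x bins). Returns the noise estimate per frame."""
    n_frames, n_bins = power.shape
    kernel = np.array([0.25, 0.5, 0.25])       # frequency smoothing window (assumed)
    lam = power[0].copy()                      # initial noise estimate (assumed)
    S = S_min = S_tmp = None
    p = np.zeros(n_bins)
    out = np.empty_like(power)
    for n in range(n_frames):
        S_f = np.convolve(power[n], kernel, mode="same")              # avg. over frequency
        S = S_f if S is None else alpha_s * S + (1 - alpha_s) * S_f   # exp. avg. over time
        if n % L == 0:                         # restart the localized minimum search
            S_min = S.copy() if S_min is None else np.minimum(S_tmp, S)
            S_tmp = S.copy()
        else:
            S_min = np.minimum(S_min, S)
            S_tmp = np.minimum(S_tmp, S)
        ratio = S / np.maximum(S_min, 1e-12)   # decision ratio
        I = (ratio > delta).astype(float)      # speech indicator
        p = alpha_p * p + (1 - alpha_p) * I    # exp. avg. over time
        alpha_tilde = alpha_d + (1 - alpha_d) * p
        out[n] = lam                           # estimate available for frame n
        lam = alpha_tilde * lam + (1 - alpha_tilde) * power[n]   # estimate for frame n + 1
    return out

# Usage: stationary noise with a short, strong burst in one bin
power = np.ones((200, 65))
power[50:60, 10] = 100.0                       # hypothetical speech burst
noise_est = mcra(power)
```

While the indicator flags speech, the smoothing factor approaches 1 and the noise estimate is nearly frozen, so the burst barely leaks into it.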

  18. Results
     [Figure: spectrograms in four panels: Original, MMSE, log-MMSE, OM-LSA with MCRA; axes: Time [s], Frequency [kHz].]

  19. Demonstration

  20. References
     • Chen, Jingdong et al. (July 2006). “New insights into the noise reduction Wiener filter”. In: IEEE Transactions on Audio, Speech and Language Processing 14.4, pp. 1218–1234.
     • Cohen, Israel and Baruch Berdugo (Nov. 2001). “Speech enhancement for non-stationary noise environments”. In: Signal Processing 81.11, pp. 2403–2418.
     • Ephraim, Y. and D. Malah (Dec. 1984). “Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator”. In: IEEE Transactions on Acoustics, Speech, and Signal Processing 32.6, pp. 1109–1121.
     • Ephraim, Y. and D. Malah (Apr. 1985). “Speech enhancement using a minimum mean-square error log-spectral amplitude estimator”. In: IEEE Transactions on Acoustics, Speech, and Signal Processing 33.2, pp. 443–445.
     • Gray, R. et al. (Aug. 1980). “Distortion measures for speech processing”. In: IEEE Transactions on Acoustics, Speech, and Signal Processing 28.4, pp. 367–376.
     • Loizou, Philipos C. (Feb. 2013). Speech Enhancement. CRC Press.
