Real-Time Capable Robust Noise Reduction

Maximilian Luz, Speech Signal Processing and Speech Enhancement, Summer Semester 2019, IMS


SLIDE 1

Real-Time Capable Robust Noise Reduction

Speech Signal Processing and Speech Enhancement

Maximilian Luz Summer Semester 2019

IMS

SLIDE 2

Goals

  • Single microphone.
  • Real-time capable.
  • Adaptive to changes in noise/signal.
  • Processing in the frequency domain.
  • Unsupervised.

SLIDE 3

Outline

  • Basics
      • Short-Time Fourier Transform
      • (Weighted) Overlap and Add
  • Methods
      • Spectral Subtraction
      • MMSE and log-MMSE
      • Robustification
  • Demonstration

SLIDE 4

Basics

SLIDE 5

Short-Time Fourier Transform

  • M: overlap
  • N: segment length
  • h(n): window function

Figure: segmentation of x(t) into overlapping, windowed segments x1(t), x2(t), x3(t), followed by a DFT of each segment, giving |Xk(f)|.
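The segmentation and DFT steps can be sketched in a few lines of NumPy. This is only a minimal illustration, not the implementation behind the slides: the function name `stft`, the default sizes, and the choice of a periodic Hann window for h(n) are assumptions, and the input is assumed to hold at least one full segment.

```python
import numpy as np

def stft(x, n_seg=512, overlap=384):
    """Split x into overlapping segments, window them, and take the DFT.

    n_seg (N) and overlap (M) follow the slide's symbols; the default
    values and the Hann window are illustrative assumptions.
    """
    hop = n_seg - overlap                          # step between segment starts
    n = np.arange(n_seg)
    h = 0.5 - 0.5 * np.cos(2 * np.pi * n / n_seg)  # periodic Hann window h(n)
    n_frames = 1 + (len(x) - n_seg) // hop         # trailing samples are dropped
    segments = np.stack([x[i * hop:i * hop + n_seg] for i in range(n_frames)])
    # One row per segment, one column per non-negative frequency bin.
    return np.fft.rfft(segments * h, axis=1)
```

Calling `np.abs(stft(x))` gives the magnitudes |Xk(f)| that are plotted as a spectrogram.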

SLIDE 6

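(Weighted) Overlap and Add

  • R: hop size
  • h(n): window function

Figure: an IDFT of each spectrum (shown as |Xk(f)|) gives the segments x̂1(t), x̂2(t), x̂3(t), which are weighted with the window function and summed to give x̂(t).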
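A possible round trip through analysis and weighted overlap-add is sketched below. The helper names, the square-root Hann window used for both analysis and synthesis, and the division by the accumulated squared window are assumptions made for this example; the slides only fix the IDFT, weighting, and sum steps and the hop size R.

```python
import numpy as np

def sqrt_hann(n_seg):
    # Periodic square-root Hann window (assumed choice for h(n)).
    n = np.arange(n_seg)
    return np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / n_seg))

def analyze(x, n_seg, hop):
    # Windowed segments -> DFT, one row per segment.
    w = sqrt_hann(n_seg)
    n_frames = 1 + (len(x) - n_seg) // hop
    return np.stack([np.fft.rfft(w * x[i * hop:i * hop + n_seg])
                     for i in range(n_frames)])

def weighted_overlap_add(spec, n_seg, hop):
    # IDFT -> weighting with h(n) -> sum at hop size R.
    w = sqrt_hann(n_seg)
    out = np.zeros((spec.shape[0] - 1) * hop + n_seg)
    norm = np.zeros_like(out)
    for i, frame in enumerate(spec):
        seg = np.fft.irfft(frame, n=n_seg)
        out[i * hop:i * hop + n_seg] += w * seg
        norm[i * hop:i * hop + n_seg] += w ** 2
    # Dividing by the summed squared window compensates the double weighting,
    # also at the signal edges where fewer segments overlap.
    return out / np.maximum(norm, 1e-12)
```

With an unmodified spectrum this reproduces the input everywhere except at the very first sample, where the window is exactly zero.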

SLIDE 7

Methods

SLIDE 8

Spectral Subtraction

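$y(t) = x(t) + d(t)$, where $y(t)$ is the noisy signal, $x(t)$ the signal, and $d(t)$ the noise.

Block diagram: STFT → $|\cdot|^p$ → subtract the noise estimate $|\hat{D}[k]|^p$ → $|\cdot|^{1/p}$ → recombine with the phase $\arg Y[k]$ → ISTFT, i.e.

$$\hat{X}[k] = \left(|Y[k]|^p - |\hat{D}[k]|^p\right)^{1/p} e^{j \arg Y[k]}$$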

Open Questions:

  • How to estimate noise?
  • How to handle negative magnitude values after subtraction?
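One simple way to answer both open questions is sketched here, purely as an illustration: the noise is estimated by averaging the first few frames (assumed to contain no speech), and negative differences are replaced by a small fraction of the noisy spectrum (a spectral floor). Neither choice is taken from the slides, and the function and parameter names are hypothetical.

```python
import numpy as np

def spectral_subtraction(Y, n_noise_frames=5, p=2.0, floor=0.01):
    """Y: complex STFT, shape (n_frames, n_bins). Returns the enhanced STFT."""
    mag_p = np.abs(Y) ** p
    # Assumption: the leading frames are noise only, so their mean is |D[k]|^p.
    noise_p = mag_p[:n_noise_frames].mean(axis=0)
    diff = mag_p - noise_p
    # Assumption: clamp to a spectral floor instead of allowing negative values.
    diff = np.maximum(diff, floor * mag_p)
    mag = diff ** (1.0 / p)
    # Reuse the noisy phase arg Y[k].
    return mag * np.exp(1j * np.angle(Y))
```

A larger `floor` leaves more residual noise but masks isolated spectral peaks; a smaller one subtracts more aggressively.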

SLIDE 9

Spectral Subtraction: Results

Figure: spectrograms (Time [s] vs. Frequency [kHz]) with panels "Original" and "Spectral Subtraction".

Issues

  • Residual (musical) noise.
  • Too much subtraction leads to speech distortion.

SLIDE 10

Gain Function

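$y(t) = x(t) + d(t)$, where $X[k] = A[k] e^{j\alpha[k]}$ and $Y[k] = R[k] e^{j\vartheta[k]}$.

Block diagram: STFT → multiply $Y[k]$ by the gain $G[k]$ → ISTFT.

Usually: $G(\xi[k], \gamma[k]) \in \mathbb{R}$, with

$$\xi[k] := \frac{\lambda_x[k]}{\lambda_d[k]} \quad \text{(a priori SNR)}, \qquad \gamma[k] := \frac{R[k]^2}{\lambda_d[k]} \quad \text{(a posteriori SNR)}$$

where $\lambda_x[k]$ is the spectral signal power and $\lambda_d[k]$ the spectral noise power. Both SNRs need to be estimated!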
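To make the gain-function structure concrete, the sketch below computes both SNRs from known powers and applies a real-valued gain to the noisy spectrum. The Wiener gain ξ/(1 + ξ) is used only as a simple stand-in for G; the slides do not prescribe it here, and all function names are hypothetical.

```python
import numpy as np

def snr_pair(R, lambda_x, lambda_d):
    xi = lambda_x / lambda_d       # a priori SNR
    gamma = R ** 2 / lambda_d      # a posteriori SNR
    return xi, gamma

def wiener_gain(xi, gamma):
    # Stand-in gain (assumption); it happens to ignore gamma.
    return xi / (1.0 + xi)

def apply_gain(Y, gain):
    # A real gain scales the magnitude R[k] and keeps the noisy phase.
    return gain * Y
```

Any other gain with the same `(xi, gamma)` signature can be swapped in without touching the rest of the chain.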

SLIDE 11

Minimum Mean-Square Error Spectral Amplitude Estimator (MMSE)

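Idea: Minimize $E\left[\left(A[k] - \hat{A}[k]\right)^2\right]$

Solution (assumes Gaussian distribution):

$$\hat{A}[k] = E\left[A[k] \mid Y[k]\right] = G_{\mathrm{MMSE}}(\xi[k], \gamma[k]) \cdot R[k]$$

Block diagram: STFT → $Y[k]$; the noise estimation provides $\lambda_d[k]$ and with it $\gamma[k]$; the $\xi$ estimation provides $\hat{\xi}[k]$; the gain $G[k] = G_{\mathrm{MMSE}}(\hat{\xi}[k], \gamma[k])$ multiplies $Y[k]$ → ISTFT.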
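For reference, the closed-form gain from Ephraim and Malah 1984 can be written in the notation of this slide as follows. The auxiliary symbol $v[k]$ does not appear on the slide, and $I_0$, $I_1$ denote the modified Bessel functions of the first kind of order zero and one.

```latex
\begin{align}
v[k] &= \frac{\xi[k]}{1 + \xi[k]}\,\gamma[k], \\
G_{\mathrm{MMSE}}(\xi[k], \gamma[k]) &= \frac{\sqrt{\pi}}{2}\,\frac{\sqrt{v[k]}}{\gamma[k]}\,
  \exp\!\left(-\frac{v[k]}{2}\right)
  \left[(1 + v[k])\,I_0\!\left(\frac{v[k]}{2}\right) + v[k]\,I_1\!\left(\frac{v[k]}{2}\right)\right]
\end{align}
```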

SLIDE 12

Minimum Mean-Square Error Log-Spectral Amplitude Estimator (log-MMSE)

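Idea: Minimize $E\left[\left(\log A[k] - \log \hat{A}[k]\right)^2\right]$

Solution (assumes Gaussian distribution):

$$\hat{A}[k] = \exp\left(E\left[\ln A[k] \mid Y[k]\right]\right) = G_{\text{log-MMSE}}(\xi[k], \gamma[k]) \cdot R[k]$$

Notes:

  • MMSE with different penalization.
  • Better measure for speech [Gray et al. 1980].

Block diagram: STFT → $Y[k]$; the noise estimation provides $\lambda_d[k]$ and with it $\gamma[k]$; the $\xi$ estimation provides $\hat{\xi}[k]$; the gain $G[k] = G_{\text{log-MMSE}}(\hat{\xi}[k], \gamma[k])$ multiplies $Y[k]$ → ISTFT.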
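For reference, the closed-form log-spectral amplitude gain from Ephraim and Malah 1985 can be written in the notation of this slide as follows. The auxiliary symbol $v[k]$ is not on the slide; the integral is the exponential integral $E_1(v[k])$.

```latex
\begin{align}
v[k] &= \frac{\xi[k]}{1 + \xi[k]}\,\gamma[k], \\
G_{\text{log-MMSE}}(\xi[k], \gamma[k]) &= \frac{\xi[k]}{1 + \xi[k]}\,
  \exp\!\left(\frac{1}{2}\int_{v[k]}^{\infty} \frac{e^{-t}}{t}\,\mathrm{d}t\right)
\end{align}
```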

SLIDE 13

MMSE and log-MMSE: Results

Figure: spectrograms (Time [s] vs. Frequency [kHz]) with panels "Original", "Spectral Subtraction", "MMSE", and "log-MMSE".

SLIDE 14

Incorporating Signal Presence Uncertainty (OM-LSA)

[Cohen and Berdugo 2001]

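Idea: Two hypotheses

$$H_0[k]: Y[k] = D[k], \qquad H_1[k]: Y[k] = X[k] + D[k]$$

$$p[k] := P\left(H_1[k] \mid Y[k]\right)$$

Solution:

$$G(\xi[k], \gamma[k]) = G_{H_1}(\xi[k], \gamma[k])^{p[k]} \cdot G_{\min}^{1 - p[k]}$$

Estimate $p[k]$ via Gaussian model and $q[k] := P\left(H_0[k]\right)$.

Block diagram: from $\xi[k]$ and $\gamma[k]$, the a priori speech absence probability estimation yields $\hat{q}[k]$ and the conditional speech presence probability estimation yields $\hat{p}[k]$; $G_{H_1}$ and $\hat{p}[k]$ are then combined into $G[k]$.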
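The combination of the two hypotheses can be sketched as below. The expression for the conditional speech presence probability is the Gaussian-model result from Cohen and Berdugo 2001 as best recalled here; the function names, the clipping of q, and the default value of `g_min` are assumptions for this example.

```python
import numpy as np

def speech_presence_prob(xi, gamma, q):
    """Conditional speech presence probability p[k] under the Gaussian model."""
    v = xi * gamma / (1.0 + xi)
    q = np.clip(q, 0.0, 1.0 - 1e-6)    # keep q / (1 - q) finite (assumption)
    return 1.0 / (1.0 + q / (1.0 - q) * (1.0 + xi) * np.exp(-v))

def om_lsa_gain(g_h1, p, g_min=0.1):
    # Geometric weighting of the speech-present gain and the gain floor.
    return g_h1 ** p * g_min ** (1.0 - p)
```

For p close to 1 the result approaches the speech-present gain, for p close to 0 it approaches `g_min`, which limits how far noise-only bins are attenuated.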

SLIDE 15

Estimating the a priori Speech Absence Probability q[k]

[Cohen and Berdugo 2001]

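Block diagram: $\xi$ → exp. avg. over time, followed by

  • avg. over freq. globally → $P_{\text{global}}$
  • avg. over freq. locally → $P_{\text{local}}$

together with a per-frame term $P_{\text{frame}}$.

$$\hat{q}[k] = 1 - P_{\text{local}}[k] \cdot P_{\text{global}}[k] \cdot P_{\text{frame}}[k]$$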

SLIDE 16

Estimating the a priori SNR ξ[k]

[Cohen and Berdugo 2001; Ephraim and Malah 1984]

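Maximum Likelihood:

$$\bar{\gamma}[k,n] = \alpha\,\bar{\gamma}[k,n-1] + (1 - \alpha)\,\frac{\gamma[k,n]}{\beta}, \qquad 0 \le \alpha \le 1,\ \beta \ge 1$$

$$\hat{\xi}[k,n] = \max\left(\bar{\gamma}[k,n] - 1,\ 0\right)$$

Decision-Directed:

$$\hat{\xi}[k,n] = \alpha\,G_{H_1}^2\left(\hat{\xi}[k,n-1], \gamma[k,n-1]\right) \cdot \gamma[k,n-1] + (1 - \alpha)\,\max\left(\gamma[k,n] - 1,\ 0\right)$$

  • Decision-directed approach usually has less musical noise.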
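Both estimators are short recursions over the frame index n, sketched below under a few assumptions: the initial values for the first frame, the default values of α and β, and the Wiener gain used as a stand-in for G_H1 are illustrative choices, not taken from the slides.

```python
import numpy as np

def wiener_gain(xi, gamma):
    # Stand-in for G_H1 (assumption); it ignores gamma.
    return xi / (1.0 + xi)

def maximum_likelihood(gamma, alpha=0.7, beta=2.0):
    """gamma: a posteriori SNR, shape (n_frames, n_bins)."""
    gamma_bar = np.empty_like(gamma, dtype=float)
    gamma_bar[0] = gamma[0] / beta                      # initialisation (assumption)
    for n in range(1, len(gamma)):
        gamma_bar[n] = alpha * gamma_bar[n - 1] + (1.0 - alpha) * gamma[n] / beta
    return np.maximum(gamma_bar - 1.0, 0.0)

def decision_directed(gamma, alpha=0.98, gain=wiener_gain):
    """gamma: a posteriori SNR, shape (n_frames, n_bins)."""
    xi_hat = np.empty_like(gamma, dtype=float)
    xi_hat[0] = np.maximum(gamma[0] - 1.0, 0.0)         # initialisation (assumption)
    for n in range(1, len(gamma)):
        g_prev = gain(xi_hat[n - 1], gamma[n - 1])
        xi_hat[n] = (alpha * g_prev ** 2 * gamma[n - 1]
                     + (1.0 - alpha) * np.maximum(gamma[n] - 1.0, 0.0))
    return xi_hat
```

With α close to 1 the decision-directed estimate follows the previous frame's amplitude estimate and reacts slowly to isolated peaks in γ, which is the usual explanation for its lower level of musical noise.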

SLIDE 17

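Adaptive and Robust Noise Estimation ($\lambda_d[k]$)

[Cohen and Berdugo 2001]

Minima controlled recursive averaging (MCRA):

Block diagram (signals $Y$, $S_f$, $S$, $S_{\min}$, $S_r$, $I$, $p_d$) with the stages:

  • avg. over freq.
  • exp. avg. over time
  • localized minimum (min over $L$)
  • decision ratio
  • speech indicator: $S_f / S_{\min} > \delta \Rightarrow 1$, $S_f / S_{\min} \le \delta \Rightarrow 0$
  • exp. avg. over time

$$\hat{\lambda}_d[k,n+1] = \tilde{\alpha}_d[k,n] \cdot \hat{\lambda}_d[k,n] + \left(1 - \tilde{\alpha}_d[k,n]\right) \cdot \left|Y[k,n]\right|^2$$

$$\tilde{\alpha}_d[k,n] = \alpha_d + (1 - \alpha_d)\,p_d[k,n]$$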
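A strongly simplified sketch of the MCRA recursion is given below. It follows the stages named on the slide, including the ratio S_f / S_min as written there, but several details are assumptions made for this example: the three-tap frequency smoothing kernel, the brute-force minimum over the last L frames, the initialisation with the first frame, and all default parameter values.

```python
import numpy as np

def mcra_noise_estimate(power, alpha_s=0.8, alpha_p=0.2, alpha_d=0.95,
                        delta=5.0, L=50):
    """power: |Y[k,n]|^2 with shape (n_frames, n_bins).

    Returns the noise power estimate available at each frame (same shape).
    """
    n_frames, n_bins = power.shape
    kernel = np.array([0.25, 0.5, 0.25])          # frequency smoothing (assumption)
    S = np.zeros(n_bins)
    p_d = np.zeros(n_bins)
    lam = power[0].astype(float)                  # initial estimate (assumption)
    history = []
    out = np.empty_like(power, dtype=float)
    for n in range(n_frames):
        out[n] = lam
        S_f = np.convolve(power[n], kernel, mode="same")            # avg. over freq.
        S = S_f if n == 0 else alpha_s * S + (1.0 - alpha_s) * S_f  # exp. avg. over time
        history = (history + [S])[-L:]
        S_min = np.min(history, axis=0)           # localized minimum over L frames
        ratio = S_f / np.maximum(S_min, 1e-12)    # decision ratio
        I = (ratio > delta).astype(float)         # speech indicator
        p_d = alpha_p * p_d + (1.0 - alpha_p) * I # exp. avg. over time
        alpha_tilde = alpha_d + (1.0 - alpha_d) * p_d
        lam = alpha_tilde * lam + (1.0 - alpha_tilde) * power[n]
    return out
```

When speech is likely present, the time-varying smoothing factor approaches 1 and the noise estimate is nearly frozen; in noise-only regions it falls back to α_d and the estimate tracks the input power.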

SLIDE 18

Results

Figure: spectrograms (Time [s] vs. Frequency [kHz]) with panels "Original", "MMSE", "log-MMSE", and "OM-LSA with MCRA".

SLIDE 19

Demonstration

SLIDE 20

References

Chen, Jingdong et al. (July 2006). “New insights into the noise reduction Wiener filter”. In: IEEE Transactions on Audio, Speech and Language Processing 14.4, pp. 1218–1234.

Cohen, Israel and Baruch Berdugo (Nov. 2001). “Speech enhancement for non-stationary noise environments”. In: Signal Processing 81.11, pp. 2403–2418.

Ephraim, Y. and D. Malah (Dec. 1984). “Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator”. In: IEEE Transactions on Acoustics, Speech, and Signal Processing 32.6, pp. 1109–1121.

Ephraim, Y. and D. Malah (Apr. 1985). “Speech enhancement using a minimum mean-square error log-spectral amplitude estimator”. In: IEEE Transactions on Acoustics, Speech, and Signal Processing 33.2, pp. 443–445.

Gray, R. et al. (Aug. 1980). “Distortion measures for speech processing”. In: IEEE Transactions on Acoustics, Speech, and Signal Processing 28.4, pp. 367–376.

Loizou, Philipos C. (Feb. 2013). Speech Enhancement. CRC Press.