Real-Time Capable Robust Noise Reduction
Maximilian Luz
Speech Signal Processing and Speech Enhancement, Summer Semester 2019, IMS
SLIDE 1
SLIDE 2
Goals
- Single microphone.
- Real-time capable.
- Adaptive to changes in noise/signal.
- Processing in frequency-domain.
- Unsupervised.
SLIDE 3
Outline
- Basics
  - Short-Time Fourier Transform
  - (Weighted) Overlap and Add
- Methods
  - Spectral Subtraction
  - MMSE and log-MMSE
  - Robustification
- Demonstration
SLIDE 4
Basics
SLIDE 5
Short-Time Fourier Transform
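[Figure: STFT analysis. The signal x(t) is split into overlapping segments x_1(t), x_2(t), x_3(t), … of segment length N with overlap M. Each segment is weighted with the window function h(n) and transformed by the DFT, giving the magnitude spectra |X_k(f)|.]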
SLIDE 6
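(Weighted) Overlap and Add
[Figure: WOLA synthesis. Each short-time spectrum X_k(f) is transformed back by the IDFT. The resulting segments x̂_1(t), x̂_2(t), x̂_3(t), … are weighted with the window function h(n), shifted by the hop size R, and summed to give x̂(t).]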
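The analysis and synthesis steps can be sketched in NumPy as follows. This is a minimal illustration, not the presenter's implementation: the function names `stft` and `istft`, the square-root Hann window, and the defaults `n_fft=512` and `hop=128` are all assumptions chosen for the example.

```python
import numpy as np

def _window(n_fft):
    # Assumed window: periodic square-root Hann, applied at analysis and synthesis.
    return np.sqrt(np.hanning(n_fft + 1)[:-1])

def stft(x, n_fft=512, hop=128):
    """Segment x into overlapping frames, window them, and apply the DFT."""
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] for i in range(n_frames)])
    return np.fft.rfft(frames * _window(n_fft), axis=1)

def istft(X, n_fft=512, hop=128):
    """Weighted overlap-add: IDFT, window weighting, and summation."""
    window = _window(n_fft)
    frames = np.fft.irfft(X, n=n_fft, axis=1) * window
    length = n_fft + hop * (len(X) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    # Divide by the summed squared window so unmodified spectra reconstruct x.
    return out / np.maximum(norm, 1e-12)
```

With a hop of a quarter of the segment length, the summed squared window is constant away from the signal edges, so an unmodified spectrum is reconstructed exactly there.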
SLIDE 7
Methods
SLIDE 8
Spectral Subtraction
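$$y(t) = x(t) + d(t)$$

with noisy signal y(t), signal x(t), and noise d(t).

[Block diagram: STFT → Y[k] → |·|^p → subtraction of the noise estimate |D̂[k]|^p → |·|^{1/p} → recombination with the phase arg Y[k] → ISTFT.]

$$|\hat{X}[k]| = \left( |Y[k]|^p - |\hat{D}[k]|^p \right)^{1/p}, \qquad \hat{X}[k] = |\hat{X}[k]| \, e^{j \arg Y[k]}$$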
Open Questions:
- How to estimate noise?
- How to handle negative magnitude values after subtraction?
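One possible answer to the second question is sketched below. The function name `spectral_subtraction`, the parameter `floor`, and the choice of clamping to a small fraction of the noisy magnitude are assumptions for illustration. The slides do not prescribe a specific rule.

```python
import numpy as np

def spectral_subtraction(Y, noise_mag, p=2.0, floor=0.01):
    """Subtract a noise magnitude estimate from the noisy spectrum Y.

    Y:         complex STFT, shape (frames, bins)
    noise_mag: estimated noise magnitude per bin, shape (bins,)
    """
    noisy_mag = np.abs(Y)
    diff = noisy_mag ** p - noise_mag ** p
    # Assumption: negative (and very small) values are clamped to a spectral floor.
    diff = np.maximum(diff, (floor * noisy_mag) ** p)
    # The noisy phase arg Y[k] is reused for the estimate.
    return diff ** (1.0 / p) * np.exp(1j * np.angle(Y))
```

A simple (hypothetical) noise estimate for this sketch would be the average of `np.abs(Y) ** p` over a few leading frames assumed to contain no speech, raised to `1 / p`.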
SLIDE 9
Spectral Subtraction: Results
[Figure: spectrograms (time [s] vs. frequency [kHz]) of the original signal and of the spectral subtraction result.]
Issues
- Residual (musical) noise.
- Too much subtraction leads to speech distortion.
SLIDE 10
Gain Function
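$$y(t) = x(t) + d(t), \qquad X[k] = A[k] \, e^{j\alpha[k]}, \qquad Y[k] = R[k] \, e^{j\vartheta[k]}$$

[Block diagram: STFT → Y[k] → multiplication by the gain G[k] → ISTFT.]

Usually the gain is real-valued, $G(\xi[k], \gamma[k]) \in \mathbb{R}$, with

$$\xi[k] := \frac{\lambda_x[k]}{\lambda_d[k]} \quad \text{(a priori SNR)}, \qquad \gamma[k] := \frac{R[k]^2}{\lambda_d[k]} \quad \text{(a posteriori SNR)}$$

where $\lambda_x[k]$ is the spectral signal power and $\lambda_d[k]$ the spectral noise power. Both need to be estimated!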
SLIDE 11
Minimum Mean-Square Error Spectral Amplitude Estimator (MMSE)
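Idea: Minimize $E\{ (A[k] - \hat{A}[k])^2 \}$.

Solution (assumes Gaussian distribution):

$$\hat{A}[k] = E\{ A[k] \mid Y[k] \} = G_{\text{MMSE}}(\xi[k], \gamma[k]) \cdot R[k]$$

[Block diagram: STFT → Y[k] → multiplication by G[k] → ISTFT. The noise estimation provides λ_d[k], from which γ[k] and the estimate ξ̂[k] are obtained. Both feed the MMSE gain G.]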
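The slide leaves the gain $G_{\text{MMSE}}$ unspecified. For reference, the closed form commonly given for the estimator of Ephraim and Malah (1984) is reproduced below. The auxiliary symbol $v[k]$ and the modified Bessel functions $I_0$, $I_1$ do not appear on the slide and are introduced here only for this formula.

```latex
G_{\text{MMSE}}(\xi[k], \gamma[k])
  = \frac{\sqrt{\pi}}{2} \, \frac{\sqrt{v[k]}}{\gamma[k]} \,
    \exp\!\left( -\frac{v[k]}{2} \right)
    \left[ (1 + v[k]) \, I_0\!\left( \frac{v[k]}{2} \right)
         + v[k] \, I_1\!\left( \frac{v[k]}{2} \right) \right],
\qquad
v[k] := \frac{\xi[k]}{1 + \xi[k]} \, \gamma[k]
```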
SLIDE 12
Minimum Mean-Square Error Log-Spectral Amplitude Estimator (log-MMSE)
Idea: Minimize $E\{ (\log A[k] - \log \hat{A}[k])^2 \}$.

Solution (assumes Gaussian distribution):

$$\hat{A}[k] = \exp\left( E\{ \ln A[k] \mid Y[k] \} \right) = G_{\text{log-MMSE}}(\xi[k], \gamma[k]) \cdot R[k]$$

Notes:
- MMSE with different penalization.
- Better measure for speech [Gray et al. 1980].

[Block diagram: STFT → Y[k] → multiplication by G[k] → ISTFT. The noise estimation provides λ_d[k], from which γ[k] and the estimate ξ̂[k] are obtained. Both feed the log-MMSE gain G.]
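The slide does not spell out $G_{\text{log-MMSE}}$. A minimal sketch of the commonly cited closed form, $G = \frac{\xi}{1+\xi} \exp\left(\tfrac{1}{2} E_1(v)\right)$ with $v = \frac{\xi}{1+\xi}\gamma$ and the exponential integral $E_1$, is shown below. The function name `log_mmse_gain` and the small lower bound on `v` are assumptions for the example, and SciPy is assumed to be available for `exp1`.

```python
import numpy as np
from scipy.special import exp1  # exponential integral E1

def log_mmse_gain(xi, gamma):
    """Log-spectral amplitude gain for a priori SNR xi and a posteriori SNR gamma."""
    xi = np.asarray(xi, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    v = xi / (1.0 + xi) * gamma
    v = np.maximum(v, 1e-10)  # assumption: guard against E1(0) = infinity
    return xi / (1.0 + xi) * np.exp(0.5 * exp1(v))
```

For high SNR the exponential integral vanishes and the gain approaches the factor ξ/(1+ξ), so strongly voiced bins pass almost unchanged.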
SLIDE 13
MMSE and log-MMSE: Results
[Figure: spectrograms (time [s] vs. frequency [kHz]) of the original signal and of the spectral subtraction, MMSE, and log-MMSE results.]
SLIDE 14
Incorporating Signal Presence Uncertainty (OM-LSA)
[Cohen and Berdugo 2001]
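Idea: Two hypotheses

$$H_0[k]: \; Y[k] = D[k], \qquad H_1[k]: \; Y[k] = X[k] + D[k]$$

with the conditional speech presence probability $p[k] := P(H_1[k] \mid Y[k])$.

Solution:

$$G(\xi[k], \gamma[k]) = G_{H_1}(\xi[k], \gamma[k])^{p[k]} \cdot G_{\min}^{1 - p[k]}$$

Estimate $p[k]$ via a Gaussian model and the a priori speech absence probability $q[k] := P(H_0[k])$.

[Block diagram: ξ[k] and γ[k] feed the a priori speech absence probability estimation (q̂[k]), the conditional speech presence probability estimation (p̂[k]), and the gain G_{H_1}. These are combined into the final gain G[k].]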
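The combination of the two gains can be sketched as follows. The expression used for p[k] is the one usually quoted from Cohen and Berdugo (2001) for the Gaussian model. The function names, the default `g_min=0.1`, and the clipping of `q` are assumptions for this example.

```python
import numpy as np

def speech_presence_probability(xi, gamma, q):
    """Conditional speech presence probability p under a Gaussian model."""
    q = np.clip(q, 0.0, 1.0 - 1e-6)  # assumption: avoid division by zero at q = 1
    v = xi / (1.0 + xi) * gamma
    return 1.0 / (1.0 + q / (1.0 - q) * (1.0 + xi) * np.exp(-v))

def om_lsa_gain(g_h1, p, g_min=0.1):
    """Geometric weighting of the speech-present gain and the minimum gain."""
    return g_h1 ** p * g_min ** (1.0 - p)
```

With p close to 1 the speech-present gain `g_h1` is used almost unchanged, and with p close to 0 the bin is attenuated to `g_min` rather than to zero.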
SLIDE 15
Estimating the a priori Speech Absence Probability q[k]
[Cohen and Berdugo 2001]
[Block diagram: ξ → exponential averaging over time → averaging over frequency, locally and globally → P_local, P_global, P_frame.]

$$\hat{q}[k] = 1 - P_{\text{local}}[k] \cdot P_{\text{global}}[k] \cdot P_{\text{frame}}[k]$$
SLIDE 16
Estimating the a priori SNR ξ[k]
[Cohen and Berdugo 2001; Ephraim and Malah 1984]
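Maximum Likelihood:

$$\bar{\gamma}[k,n] = \alpha \, \bar{\gamma}[k,n-1] + (1 - \alpha) \, \frac{\gamma[k,n]}{\beta}, \qquad 0 \le \alpha \le 1, \quad \beta \ge 1$$

$$\hat{\xi}[k,n] = \max\{ \bar{\gamma}[k,n] - 1, \, 0 \}$$

Decision-Directed:

$$\hat{\xi}[k,n] = \alpha \, G_{H_1}^2\!\left( \hat{\xi}[k,n-1], \gamma[k,n-1] \right) \cdot \gamma[k,n-1] + (1 - \alpha) \max\{ \gamma[k,n] - 1, \, 0 \}$$

The decision-directed approach usually has less musical noise.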
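A minimal sketch of the decision-directed recursion is given below. The function name `decision_directed_snr`, the default `alpha=0.98`, and the initialisation of the first frame with the maximum-likelihood term are assumptions. The `wiener_gain` helper is only a stand-in for $G_{H_1}$ so that the example is self-contained. Any gain function of (ξ, γ) can be passed instead.

```python
import numpy as np

def decision_directed_snr(gamma, gain_fn, alpha=0.98):
    """Estimate the a priori SNR from the a posteriori SNR gamma (frames x bins)."""
    gamma = np.asarray(gamma, dtype=float)
    xi = np.empty_like(gamma)
    for n in range(gamma.shape[0]):
        ml_term = np.maximum(gamma[n] - 1.0, 0.0)
        if n == 0:
            xi[n] = ml_term  # assumption: no previous frame, use the ML term only
        else:
            g_prev = gain_fn(xi[n - 1], gamma[n - 1])
            xi[n] = alpha * g_prev ** 2 * gamma[n - 1] + (1.0 - alpha) * ml_term
    return xi

def wiener_gain(xi, gamma):
    # Stand-in gain for illustration only; it ignores gamma.
    return xi / (1.0 + xi)
```

Because α is close to 1, the estimate follows the previous frame's enhanced spectrum and reacts only slowly to isolated peaks in γ, which is one intuition for the reduced musical noise.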
SLIDE 17
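Adaptive and Robust Noise Estimation ($\lambda_d[k]$)
[Cohen and Berdugo 2001]
Minima controlled recursive averaging (MCRA):

[Block diagram: Y → averaging over frequency (S_f) → exponential averaging over time (S) → localized minimum over a window of length L (S_min) → decision ratio (S_r) → speech indicator (I = 1 if the ratio exceeds δ, I = 0 otherwise) → exponential averaging over time → p_d.]

$$\hat{\lambda}_d[k,n+1] = \tilde{\alpha}_d[k,n] \cdot \hat{\lambda}_d[k,n] + \left( 1 - \tilde{\alpha}_d[k,n] \right) \cdot |Y[k,n]|^2$$

$$\tilde{\alpha}_d[k,n] = \alpha_d + (1 - \alpha_d) \, p_d[k,n]$$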
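The MCRA chain can be sketched as follows. This is a simplified illustration under several assumptions: the function name, all default parameter values, the three-tap frequency smoothing kernel, the initialisation from the first frame, the ratio S/S_min as decision ratio, and the brute-force sliding minimum over the last `L` frames are choices made for the example, not details taken from the slides.

```python
import numpy as np

def mcra_noise_estimate(power, alpha_d=0.95, alpha_s=0.8, alpha_p=0.2,
                        delta=5.0, L=100):
    """Track the noise power per bin from power = |Y[k,n]|^2, shape (frames, bins)."""
    n_frames, n_bins = power.shape
    kernel = np.array([0.25, 0.5, 0.25])  # assumed frequency smoothing window
    lam = power[0].copy()                 # assumed: first frame contains only noise
    S = power[0].copy()
    p = np.zeros(n_bins)
    history = []
    out = np.empty_like(power)
    for n in range(n_frames):
        # Average over frequency, then exponentially over time.
        Sf = np.convolve(np.pad(power[n], 1, mode="edge"), kernel, mode="valid")
        S = alpha_s * S + (1.0 - alpha_s) * Sf
        # Localized minimum over the last L frames.
        history = (history + [S.copy()])[-L:]
        S_min = np.min(history, axis=0)
        # Decision ratio and speech indicator, smoothed into p_d.
        indicator = (S / np.maximum(S_min, 1e-12)) > delta
        p = alpha_p * p + (1.0 - alpha_p) * indicator
        # Time-varying smoothing factor: update slows down where speech is likely.
        alpha_tilde = alpha_d + (1.0 - alpha_d) * p
        out[n] = lam
        lam = alpha_tilde * lam + (1.0 - alpha_tilde) * power[n]
    return out
```

In stationary noise the estimate stays at the noise level, while a short high-energy burst raises p_d and largely freezes the update for the affected bins.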
SLIDE 18
Results
[Figure: spectrograms (time [s] vs. frequency [kHz]) of the original signal and of the MMSE, log-MMSE, and OM-LSA with MCRA results.]
SLIDE 19
Demonstration
SLIDE 20
References
Chen, Jingdong et al. (July 2006). “New insights into the noise reduction Wiener filter”. In: IEEE Transactions on Audio, Speech and Language Processing 14.4, pp. 1218–1234.
Cohen, Israel and Baruch Berdugo (Nov. 2001). “Speech enhancement for non-stationary noise environments”. In: Signal Processing 81.11, pp. 2403–2418.
Ephraim, Y. and D. Malah (Dec. 1984). “Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator”. In: IEEE Transactions on Acoustics, Speech, and Signal Processing 32.6, pp. 1109–1121.