Audio Declipping Using Sparse Multiscale Representations Boris - - PowerPoint PPT Presentation
Audio Declipping Using Sparse Multiscale Representations
Boris Mailhé
Queen Mary University of London, Centre for Digital Music
Boris.Mailhe@eecs.qmul.ac.uk
November 4, 2012
SMALL project: Adler, Emiya, Jafari et al. Later works:
◮ Audio clipping
◮ Declipping with sparse representations
◮ Multiscale undersampling for heavily clipped signals
◮ Experimental results
◮ Conclusion
Audio clipping
◮ Audio recording devices (microphones, amplifiers, ADCs,...)
have a maximum input level.
◮ If the input to any component exceeds this level, the
output is clipped.
◮ Typical situations:
  ◮ the sound is louder than expected,
  ◮ the source is closer to the microphone than expected,
  ◮ the recording chain was not properly set up,
  ◮ .wav encoding,...
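As an illustration of the saturation described above, here is a minimal sketch of hard clipping in NumPy. The test tone, the sampling rate and the clipping level `theta` are hypothetical values chosen for the example, not taken from the slides.

```python
import numpy as np

def hard_clip(s, theta):
    """Saturate every sample of s to the range [-theta, theta]."""
    return np.clip(s, -theta, theta)

# Hypothetical test signal: a 440 Hz sine sampled at 8 kHz, peak amplitude 1.
fs = 8000
t = np.arange(512) / fs
s = np.sin(2 * np.pi * 440 * t)

theta = 0.6                      # assumed clipping level
y = hard_clip(s, theta)
clipped = np.abs(y) >= theta     # samples that hit the ceiling
print(f"{clipped.mean():.0%} of the samples are clipped")
```

Samples outside the `clipped` mask are left untouched, which is what lets them serve as reliable observations later on.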
Effect in time domain
[Figure: two time-domain waveform plots; axes: samples (x), amplitude (y).]
Audio inpainting [Adler et al.]
◮ Decompose the 8 kHz signal into 50% overlapping frames of
length 512, restore each frame, then reconstruct the signal.
◮ A clean audio frame $s$ of length $N$ is sparse on a Gabor
dictionary $D$: $s \approx Dx$ with $\|x\|_0 \ll N$.
◮ The clipped samples are considered missing. The observed
signal $y$ is split into a reliable part and a clipped part:
$$y_r = M_r y = M_r s \approx M_r D x, \qquad y_c = M_c y = \operatorname{sign}(M_c s)\,\theta$$
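◮ Find a sparse representation $\tilde{x}$ of $y_r = s_r$ over $\Phi = M_r D$, then reconstruct the clipped part:
$$\tilde{x} = \operatorname*{argmin}_x \|x\|_0 \quad \text{s.t.} \quad \|y_r - M_r D x\|_2 \le \epsilon, \qquad \tilde{s}_c = M_c D \tilde{x}$$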
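The observation model above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: an orthonormal DCT dictionary stands in for the Gabor dictionary, the frame length is 64 rather than 512, and the sparse frame and the clipping level are hypothetical. The masks $M_r$ and $M_c$ are represented as boolean index arrays.

```python
import numpy as np

def dct_dictionary(N):
    """Orthonormal DCT-II atoms as columns (a stand-in for the Gabor dictionary)."""
    n = np.arange(N)[:, None]          # time index (rows)
    k = np.arange(N)[None, :]          # frequency index (columns)
    D = np.sqrt(2.0 / N) * np.cos(np.pi * (n + 0.5) * k / N)
    D[:, 0] /= np.sqrt(2.0)            # rescale the constant atom to unit norm
    return D

N = 64
D = dct_dictionary(N)

# Hypothetical clean frame: exactly 3-sparse on D.
x = np.zeros(N)
x[[2, 5, 9]] = [1.0, -0.7, 0.4]
s = D @ x

theta = 0.6 * np.max(np.abs(s))        # assumed clipping level
y = np.clip(s, -theta, theta)

clipped = np.abs(y) >= theta           # support of M_c
reliable = ~clipped                    # support of M_r

y_r = y[reliable]                      # y_r = M_r y
Phi = D[reliable, :]                   # Phi = M_r D
y_c = y[clipped]                       # y_c = M_c y = sign(M_c s) * theta
```

Selecting rows with a boolean mask avoids building the $M_r$ and $M_c$ matrices explicitly; once an estimate $\tilde{x}$ is available, the clipped part would be read off as `(D @ x_tilde)[clipped]`.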
Orthogonal Matching Pursuit (OMP)
◮ Select the atoms one by one.
◮ Initially, $\Phi_0 = \emptyset$, $r_0 = y_r$.
◮ Atom selection:
$$\Phi_{i+1} = \Phi_i \cup \{\operatorname*{argmax}_{\varphi \in \Phi} |\langle \varphi, r_i \rangle|\}$$
◮ Coefficient and residual update:
$$x_{i+1} = \operatorname*{argmin}_x \|y_r - \Phi_{i+1} x\|_2^2$$
$$r_{i+1} = y_r - \Phi_{i+1} x_{i+1}$$
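The selection and update steps above can be sketched as a short NumPy routine. This is a minimal illustration, not the authors' implementation: the function name `omp`, the stopping rule on `eps` and `max_atoms`, and the division of the correlations by the column norms (added here because masking rows changes the atom norms) are assumptions. The demo uses a hypothetical orthonormal dictionary so that exact recovery is easy to check.

```python
import numpy as np

def omp(Phi, y_r, eps, max_atoms=None):
    """Greedy OMP: add atoms of Phi until ||y_r - Phi x||_2 <= eps."""
    n_atoms = Phi.shape[1]
    max_atoms = max_atoms or min(Phi.shape)
    norms = np.linalg.norm(Phi, axis=0)
    norms[norms == 0] = 1.0                   # guard against fully masked atoms
    support = []
    x = np.zeros(n_atoms)
    r = y_r.copy()                            # r_0 = y_r
    while np.linalg.norm(r) > eps and len(support) < max_atoms:
        corr = np.abs(Phi.T @ r) / norms      # normalised correlations (assumption)
        if support:
            corr[support] = 0.0               # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        # Least-squares coefficients on the selected atoms, then new residual.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y_r, rcond=None)
        r = y_r - Phi[:, support] @ coef
        x = np.zeros(n_atoms)
        x[support] = coef
    return x, support

# Hypothetical demo: orthonormal dictionary, 3-sparse signal.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((32, 32)))
x_true = np.zeros(32)
x_true[[2, 5, 9]] = [1.0, -0.7, 0.4]
x_hat, support = omp(Q, Q @ x_true, eps=1e-8)
```

In the declipping setting, `Phi` would be the row-masked dictionary $M_r D$ and `y_r` the reliable samples of a frame.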
Additional declipping constraints
◮ Not all information is lost on the clipped samples:
  ◮ the signs are known,
  ◮ the magnitude of each clipped sample is at least the clipping level θ.
◮ Modified OMP solver:
◮ select the most correlated atom, as in OMP
$$\Phi_{i+1} = \Phi_i \cup \{\operatorname*{argmax}_{\varphi \in \Phi} |\langle \varphi, r_i \rangle|\}$$
◮ add the constraints when updating the coefficients
$$x_{i+1} = \operatorname*{argmin}_x \|y_r - \Phi_{i+1} x\|_2^2$$
s.t. $y[n] = \theta \Rightarrow (D_{i+1} x)[n] \ge \theta$ and $y[n] = -\theta \Rightarrow (D_{i+1} x)[n] \le -\theta$
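The constrained coefficient update is a least-squares problem with linear inequality constraints. The slides do not name a solver; the sketch below uses SciPy's general-purpose SLSQP method as one possible choice. The function name `constrained_update`, the two sinusoidal atoms and the clipping level are hypothetical. Only one atom is placed in `D_sel`, to mimic an early iteration where the model is still incomplete and the constraints can matter.

```python
import numpy as np
from scipy.optimize import minimize

def constrained_update(D_sel, y, theta, reliable):
    """Least squares on reliable samples, subject to the clipping constraints."""
    Phi_sel = D_sel[reliable]                 # selected atoms, reliable rows
    y_r = y[reliable]
    # Both constraints in one form: sign(y[n]) * (D_sel x)[n] >= theta.
    A = np.sign(y[~reliable])[:, None] * D_sel[~reliable]

    def cost(x):
        res = Phi_sel @ x - y_r
        return res @ res

    def grad(x):
        return 2.0 * Phi_sel.T @ (Phi_sel @ x - y_r)

    cons = [{"type": "ineq", "fun": lambda x: A @ x - theta, "jac": lambda x: A}]
    x0, *_ = np.linalg.lstsq(Phi_sel, y_r, rcond=None)   # unconstrained start
    out = minimize(cost, x0, jac=grad, constraints=cons, method="SLSQP")
    return out.x

# Hypothetical frame made of two sinusoidal atoms, clipped at theta.
N = 64
n = np.arange(N)
d1 = np.sin(2 * np.pi * 2 * n / N)
d2 = np.sin(2 * np.pi * 5 * n / N)
s = d1 + 0.3 * d2
theta = 0.7
y = np.clip(s, -theta, theta)
reliable = np.abs(y) < theta

D_sel = d1[:, None]                           # only the first atom selected so far
x_con = constrained_update(D_sel, y, theta, reliable)
s_hat = D_sel @ x_con                         # estimate on the whole frame
```

Writing both constraints as a single signed inequality keeps the solver interface small; a dedicated quadratic-programming solver would be a natural alternative.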
Declipping intervals
◮ The most challenging inpainting tasks are the ones where long
intervals are missing.
◮ In clipped audio, the missing samples are always clustered.
◮ Idea: decimate the signal by a factor of 2 to reduce the length of
the clipped intervals: $y \to (y[2n])_{n \le N/2}$
◮ First declip the decimated signal.
◮ Same window length (512):
  ◮ longer in time,
  ◮ undersampled in frequency.
◮ Then declip the full-rate signal, where only the odd-indexed clipped
samples remain to be estimated.
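The effect of decimation on the clipped intervals can be checked with a small sketch. The low-frequency tone and the clipping level are hypothetical, and `longest_run` is a helper introduced only for this illustration.

```python
import numpy as np

def longest_run(mask):
    """Length of the longest run of consecutive True values in a boolean mask."""
    best = run = 0
    for flag in mask:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best

# Hypothetical heavily clipped signal: a 100 Hz tone sampled at 8 kHz.
fs = 8000
t = np.arange(1024) / fs
s = np.sin(2 * np.pi * 100 * t)
theta = 0.3                                   # assumed clipping level
y = np.clip(s, -theta, theta)
clipped = np.abs(y) >= theta

# First scale: keep the even-indexed samples, y -> (y[2n]).
y_even = y[::2]
clipped_even = clipped[::2]

# Second scale: the even samples are already restored, so only the
# odd-indexed clipped samples remain to be estimated.
remaining = np.count_nonzero(clipped[1::2])

print(longest_run(clipped), longest_run(clipped_even), remaining)
```

For a smooth signal like this one, each clipped interval in the decimated signal contains about half as many samples as the original interval.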
On simulated clippings
[Figure: average SNR (dB) versus clipping level; legend: Janssen, dual-constraint OMP, msC-OMP; panels: (a) Music, (b) Speech.]
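The slides report results as average SNR in dB without stating the formula. A common definition, assumed here, is the ratio of the energy of the clean signal to the energy of the reconstruction error. The sketch below uses a hypothetical reference signal and estimate.

```python
import numpy as np

def snr_db(reference, estimate):
    """SNR in dB: 10*log10(||reference||^2 / ||reference - estimate||^2)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    noise = reference - estimate
    return 10.0 * np.log10(np.sum(reference**2) / np.sum(noise**2))

# Hypothetical check: an estimate with a 10% amplitude error.
s = np.sin(2 * np.pi * np.arange(100) / 25)
s_hat = 0.9 * s
print(snr_db(s, s_hat))
```

Whether the authors evaluate the SNR over whole signals or only over the clipped samples is not stated in the slides; the function applies to either choice once the arrays are restricted accordingly.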
On recorded clipped data
[Figure: average SNR (dB); legend: Janssen, dual-constraint OMP, msC-OMP; panel (c) Music, followed by a second panel whose title is not recoverable.]