 
Audio Declipping Using Sparse Multiscale Representations
Boris Mailhé
Queen Mary University of London, Center for Digital Music
Boris.Mailhe@eecs.qmul.ac.uk
November 4, 2012

◮ SMALL project: Adler, Emiya, Jafari et al.
◮ Later works: Jafari, Clifford et al.

◮ Audio clipping
◮ Declipping with sparse representations
◮ Multiscale undersampling for heavily clipped signals
◮ Experimental results
◮ Conclusion

Audio clipping
◮ Audio recording devices (microphones, amplifiers, ADCs, ...) have a maximum input level.
◮ If the input signal to any component exceeds this level, the output is clipped.
◮ Typical situations:
    ◮ the sound is louder than expected,
    ◮ the source is closer to the microphone than expected,
    ◮ the recording chain was not properly set up,
    ◮ .wav encoding, ...

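Hard clipping as described above can be simulated in a few lines. This is a minimal sketch, assuming a symmetric clipping level; the function name `hard_clip`, the sine test signal and the level 0.6 are illustrative choices, not taken from the slides.

```python
import numpy as np

def hard_clip(s, theta):
    """Saturate every sample of s to the range [-theta, theta]."""
    return np.clip(s, -theta, theta)

# Hypothetical test signal: a sine wave with peak amplitude close to 1.
n = np.arange(100)
s = np.sin(2 * np.pi * n / 25)

theta = 0.6                      # assumed clipping level
y = hard_clip(s, theta)

# Samples that reached the clipping level are the ones to restore later.
clipped = np.abs(y) >= theta
print("clipped samples:", int(clipped.sum()), "of", y.size)
```

Samples below the clipping level pass through unchanged, so only the flagged samples carry an error.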
Effect in time domain
[Figure: two time-domain plots of amplitude versus samples, one over 100 samples and one over about 4 × 10^4 samples.]

Audio inpainting [Adler et al.]
◮ Decompose the 8 kHz signal into 50% overlapping frames of length 512, restore each frame, then reconstruct.
◮ A clean audio frame $s$ of length $N$ is sparse on a Gabor dictionary $D$:
    $s \approx Dx, \quad \|x\|_0 \ll N$
◮ The clipped samples are considered missing. The observed signal $y$ is split into a reliable part and a clipped part:
    $y_r = M_r y = M_r s \approx M_r D x$
    $y_c = M_c y = \operatorname{sign}(M_c s)\,\theta$
◮ Find a sparse representation $\tilde{x}$ of $y_r = s_r$ over $\Phi = M_r D$, then reconstruct the clipped part:
    $\tilde{x} = \operatorname{argmin}_x \|x\|_0 \quad \text{s.t.} \quad \|y_r - M_r D x\|_2 \le \epsilon$
    $\tilde{s}_c = M_c D \tilde{x}$

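The split into reliable and clipped parts amounts to selecting rows. The sketch below is illustrative only: it swaps the Gabor dictionary for an overcomplete cosine dictionary to stay short, and the names `split_reliable_clipped` and `cosine_dictionary`, the frame length 64 and the level 0.5 are all assumptions.

```python
import numpy as np

def split_reliable_clipped(y, theta):
    """Boolean masks playing the role of M_r (reliable) and M_c (clipped)."""
    clipped = np.abs(y) >= theta
    return ~clipped, clipped

def cosine_dictionary(N, K):
    """N x K overcomplete cosine dictionary with unit-norm columns.
    Stand-in for the Gabor dictionary D, chosen only for brevity."""
    n = np.arange(N)[:, None]
    k = np.arange(K)[None, :]
    D = np.cos(np.pi * (n + 0.5) * k / K)
    return D / np.linalg.norm(D, axis=0)

N, K, theta = 64, 128, 0.5
D = cosine_dictionary(N, K)

# Hypothetical clean frame that is exactly 3-sparse on D, scaled to peak 1.
x = np.zeros(K)
x[[3, 17, 40]] = [1.0, -0.8, 0.5]
s = D @ x
s = s / np.max(np.abs(s))

y = np.clip(s, -theta, theta)            # observed clipped frame
reliable, clipped = split_reliable_clipped(y, theta)

y_r = y[reliable]                        # y_r = M_r y
Phi = D[reliable, :]                     # Phi = M_r D
y_c = y[clipped]                         # y_c = sign(M_c s) * theta
```

Boolean indexing avoids building the selection matrices explicitly; `D[clipped, :]` gives $M_c D$ for the final reconstruction step in the same way.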
Orthogonal Matching Pursuit (OMP)
◮ Select the atoms one by one.
◮ Initially, $\Phi_0 = \emptyset$, $r_0 = y_r$.
◮ Atom selection:
    $\Phi_{i+1} = \Phi_i \cup \{\operatorname{argmax}_{\varphi \in \Phi} |\langle \varphi, r_i \rangle|\}$
◮ Coefficient and residual update:
    $x_{i+1} = \operatorname{argmin}_x \|y_r - \Phi_{i+1} x\|_2^2$
    $r_{i+1} = y_r - \Phi_{i+1} x_{i+1}$

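The OMP iteration above can be sketched with NumPy as follows. The function name `omp`, the stopping parameters `max_atoms` and `eps`, and the division of the correlations by the column norms (the masked atoms of $\Phi = M_r D$ are no longer unit-norm) are assumptions of this sketch rather than details given on the slide.

```python
import numpy as np

def omp(Phi, y_r, max_atoms, eps):
    """Greedy sparse approximation of y_r over the columns of Phi."""
    norms = np.maximum(np.linalg.norm(Phi, axis=0), 1e-12)
    support = []                       # indices of the selected atoms
    coeffs = np.zeros(0)
    residual = y_r.copy()              # r_0 = y_r
    while len(support) < max_atoms and np.linalg.norm(residual) > eps:
        # Atom selection: largest normalized correlation with the residual.
        corr = np.abs(Phi.T @ residual) / norms
        if support:
            corr[support] = 0.0        # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        # Coefficient update: least squares on the selected atoms.
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], y_r, rcond=None)
        # Residual update.
        residual = y_r - Phi[:, support] @ coeffs
    x = np.zeros(Phi.shape[1])
    x[support] = coeffs
    return x

# Hypothetical usage on a random dictionary and a 3-sparse signal.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((32, 64))
x0 = np.zeros(64)
x0[[5, 20, 41]] = [1.5, -2.0, 0.7]
y_r = Phi @ x0
x_hat = omp(Phi, y_r, max_atoms=32, eps=1e-8)
```

Re-solving the least-squares problem over the whole support at each step is what distinguishes OMP from plain matching pursuit, which only updates the newest coefficient.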
Additional declipping constraints
◮ Not all information is lost on the clipped samples:
    ◮ the signs are known,
    ◮ the magnitudes of the clipped values are at least the clipping level $\theta$.
◮ Modified OMP solver:
    ◮ select the highest correlated atom as in OMP:
        $\Phi_{i+1} = \Phi_i \cup \{\operatorname{argmax}_{\varphi \in \Phi} |\langle \varphi, r_i \rangle|\}$
    ◮ add the constraints when updating the coefficients:
        $x_{i+1} = \operatorname{argmin}_x \|y_r - \Phi_{i+1} x\|_2^2$
        s.t. $y[n] = \theta \Rightarrow (D_{i+1} x)[n] \ge \theta$ and $y[n] = -\theta \Rightarrow (D_{i+1} x)[n] \le -\theta$

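The two constraints can be stated as a simple consistency check on a candidate reconstruction. This sketch only tests feasibility; it does not solve the constrained least-squares update, which is a quadratic program and needs a dedicated solver. The function name `satisfies_clipping_constraints` and the tolerance `tol` are hypothetical.

```python
import numpy as np

def satisfies_clipping_constraints(s_hat, y, theta, tol=1e-9):
    """True if s_hat is at or beyond the clipping level wherever y is clipped."""
    pos = y >= theta       # samples clipped at +theta
    neg = y <= -theta      # samples clipped at -theta
    ok_pos = np.all(s_hat[pos] >= theta - tol)
    ok_neg = np.all(s_hat[neg] <= -theta + tol)
    return bool(ok_pos and ok_neg)

theta = 0.5
y = np.array([0.2, 0.5, -0.5, 0.1])            # observed, clipped at +/- 0.5
good = np.array([0.2, 0.8, -0.7, 0.1])         # consistent with the clipping
bad = np.array([0.2, 0.3, -0.7, 0.1])          # second sample falls below theta

print(satisfies_clipping_constraints(good, y, theta))
print(satisfies_clipping_constraints(bad, y, theta))
```

An unconstrained OMP solution that fails this check is exactly the case the modified coefficient update is meant to rule out.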
Declipping intervals
◮ The most challenging inpainting tasks are the ones where long intervals are missing.
◮ In clipped audio, the missing samples are always clustered.
◮ Idea: decimate the signal by a factor of 2 to reduce the length of the clipped intervals:
    $y \to (y[2n])_{n \le N/2}$
◮ First declip the decimated signal.
◮ Same window length (512):
    ◮ longer in time,
    ◮ undersampled in frequency.
◮ Then declip the whole signal, with only the odd clipped samples left to estimate.

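The effect of decimation on the length of the clipped intervals can be illustrated numerically. This is a minimal sketch on a hypothetical slow sine wave; the helper `clipped_run_lengths` and the clipping level 0.3 are illustrative assumptions.

```python
import numpy as np

def clipped_run_lengths(clipped):
    """Lengths of the runs of consecutive True values in a boolean mask."""
    padded = np.concatenate(([False], clipped, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return ends - starts

# Hypothetical heavily clipped signal: a slow sine clipped at 0.3.
n = np.arange(400)
s = np.sin(2 * np.pi * n / 100)
theta = 0.3
y = np.clip(s, -theta, theta)

full_runs = clipped_run_lengths(np.abs(y) >= theta)
even_runs = clipped_run_lengths(np.abs(y[::2]) >= theta)   # y[2n]
print("longest clipped run, full signal:", int(full_runs.max()))
print("longest clipped run, decimated:  ", int(even_runs.max()))

# Second pass: once the even samples have been declipped they count as
# known, so only the odd clipped samples remain to be estimated.
still_missing = np.abs(y) >= theta
still_missing[::2] = False
```

In the second pass every remaining missing sample has known neighbours on both sides, so the longest gap to fill is a single sample.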
On simulated clippings
[Figure: average SNR (dB) versus clipping level (0.2 to 1) for Janssen, dual-constraint OMP and msC-OMP. Panels: (a) Music, (b) Speech.]

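The plots report an average SNR in dB. The slides do not give the exact formula or say which samples it is computed on, so the sketch below assumes the usual definition, the ratio of the energy of the clean signal to the energy of the reconstruction error; the function name `snr_db` is hypothetical.

```python
import numpy as np

def snr_db(s, s_hat):
    """SNR in dB between a clean signal s and its estimate s_hat.
    Assumed definition: 10 * log10(||s||^2 / ||s - s_hat||^2)."""
    return 10 * np.log10(np.sum(s ** 2) / np.sum((s - s_hat) ** 2))

# Hypothetical example: an estimate with a 10% amplitude error.
s = np.ones(4)
s_hat = 0.9 * s
print(snr_db(s, s_hat))
```

To score only the restored samples, the same function can be applied to `s[clipped]` and `s_hat[clipped]` for a boolean mask `clipped` of the clipped positions.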
On recorded clipped data
[Figure: average SNR (dB) for Janssen, dual-constraint OMP and msC-OMP. Panels: (c) Music, (d) Speech.]

Conclusion
Contributions:
◮ Audio clipping can be modelled as an inpainting problem with additional constraints.
◮ Sparse representations can solve that problem.
◮ Downsampling the signal improves the quality on real recorded signals.
Future work:
◮ investigate the influence of the parameters: window length, frequency sampling, overlap, ...
◮ extend the method to soft clipping,
◮ refine the evaluation on recorded data.