Sparse Audio Models For Inverse Audio Problems
Rémi Gribonval
INRIA Rennes - Bretagne Atlantique, France
remi.gribonval@inria.fr
November 5th 2012
Outline
✓ Inverse problems in audio processing
  ✦ audio inpainting
  ✦ source localization
✓ Learning dictionaries ...
Contributors
✓ A. Adler, N. Bertin, V. Emiya,
✓ S. Nam
✓ F. Bach, R. Jenatton, K. Schnass
echange.inria.fr small-project.eu
Audio inpainting
with A. Adler, V. Emiya, M. Elad, M. Jafari, M. Plumbley
Image Inpainting
[Figure: observed image vs. inpainted image]
Audio Inpainting?
[Figure: spectrogram and degraded waveforms illustrating each case]
✓ Clicks
✓ Limited bandwidth
✓ Holes (packet loss)
✓ Clipping
Declipping as a linear inverse problem
y_reliable = M_reliable · x : the mask M_reliable selects the reliable (unclipped) samples of the original signal x
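A minimal sketch of this observation model in Python (ours, not code from the talk); the names clip and reliable_mask are illustrative, and the boolean mask plays the role of M_reliable:

```python
import numpy as np

def clip(x, level):
    """Hard-clip a signal at +/- level."""
    return np.clip(x, -level, level)

def reliable_mask(y, level, tol=1e-6):
    """Boolean mask of samples assumed reliable (strictly below the clipping level)."""
    return np.abs(y) < level - tol

rng = np.random.default_rng(0)
x = rng.standard_normal(64)           # original signal (unknown in practice)
y = clip(x, level=0.8)                # observed, clipped signal
mask = reliable_mask(y, level=0.8)    # M_reliable as a boolean row selector
y_reliable = y[mask]                  # the linear observations: equals x[mask]
assert np.allclose(y_reliable, x[mask])
```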
Inverse problems & signal models
[Diagram: signal domain mapped to the observation domain]
Need for a model = prior knowledge
Sparse audio models
[Figure: sparse time-frequency representations; black = zero]
Mathematical expression
(ex: time-frequency atoms, wavelets)
Dictionary of atoms (Mallat & Zhang 93): for x ∈ R^d,

x ≈ Σ_k z_k d_k = Dz,   ‖z‖₀ = Σ_k |z_k|⁰ = card{k : z_k ≠ 0}
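As a toy illustration (ours; an orthonormal DCT basis stands in for the time-frequency atoms mentioned above), synthesizing x = Dz from an s-sparse coefficient vector z:

```python
import numpy as np
from scipy.fft import idct

d, s = 256, 5
D = idct(np.eye(d), norm="ortho")      # columns = DCT atoms d_k (stand-in dictionary)
rng = np.random.default_rng(1)
z = np.zeros(d)
support = rng.choice(d, size=s, replace=False)
z[support] = rng.standard_normal(s)    # s nonzero entries: ||z||_0 = s
x = D @ z                              # sparse synthesis model x = Dz
print(np.count_nonzero(z))             # -> 5
```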
CoSparse models and inverse problems
[Diagram: cosparse model linking the signal domain and the observation domain]
Audio Declipping
✓ sparsity in a time-frequency dictionary: x = Dz
✓ find sparse coefficients ẑ such that y = MDẑ
  ✦ (Orthonormal) Matching Pursuit (Mallat & Zhang 93)
✓ + ensure compatibility with the clipping constraint
  ✦ Convex optimization
✓ estimate x̂ = Dẑ
[Figure: waveform (amplitude vs. time) overlaying the clipped, declipped, and original signals]
IEEE Trans. Audio, Speech and Language Proc., 2012
see also talk by B. Mailhé
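A hedged sketch of the greedy variant (ours, not the authors' released code): Orthogonal Matching Pursuit fitted on the reliable samples only, i.e. on y = MDz, then resynthesis of the whole frame; the clipping-compatibility constraint (declipped samples should exceed the clipping level in magnitude) is omitted for brevity, and omp_declip is an illustrative name.

```python
import numpy as np
from scipy.fft import idct

def omp_declip(y, mask, D, n_atoms=10):
    Dr = D[mask]                                   # M D: atoms restricted to reliable samples
    residual = y[mask].astype(float).copy()
    support, z_s = [], None
    for _ in range(n_atoms):
        k = int(np.argmax(np.abs(Dr.T @ residual)))    # most correlated atom
        if k not in support:
            support.append(k)
        # least-squares fit on the current support (the "orthogonal" step)
        z_s, *_ = np.linalg.lstsq(Dr[:, support], y[mask], rcond=None)
        residual = y[mask] - Dr[:, support] @ z_s
    z = np.zeros(D.shape[1])
    z[support] = z_s
    return D @ z                                   # declipped estimate x_hat = D z_hat

# Tiny demo on a synthetic sparse signal clipped at +/- 0.6.
d = 128
D = idct(np.eye(d), norm="ortho")                  # DCT dictionary (stand-in)
rng = np.random.default_rng(0)
z0 = np.zeros(d)
z0[rng.choice(d, 5, replace=False)] = rng.standard_normal(5)
x = D @ z0
y = np.clip(x, -0.6, 0.6)
mask = np.abs(y) < 0.6 - 1e-9                      # reliable (unclipped) samples
x_hat = omp_declip(y, mask, D, n_atoms=5)
print(np.linalg.norm(x - x_hat) / np.linalg.norm(x))   # small relative error
```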
Source localization
with S. Nam
Localization with «few» microphones
✓ localize emitting sources
✓ reconstruct emitted signals
✓ extrapolate the acoustic field

y = Mx, with y ∈ R^m the time series recorded at the sensors and x ∈ R^N the (discretized) spatio-temporal acoustic field
Physics-driven design of model
The acoustic pressure field p(r, t) obeys the wave equation with boundary conditions:

(Δp − (1/c²) ∂²p/∂t²)(r, t) = s(r, t),   r ∈ D̊ (interior)
(∂p/∂n)(r, t) = 0,   r ∈ ∂D (boundary)

Discretization of sources & boundaries yields the analysis model Ωx = z.
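A minimal sketch of this physics-driven construction, under simplifying assumptions (1D space plus time, unit grid steps, boundary rows ignored); wave_operator_1d is our illustrative name, not code from the talk:

```python
import numpy as np
import scipy.sparse as sp

def wave_operator_1d(n_space, n_time, c=1.0, dr=1.0, dt=1.0):
    """Stack finite differences into a sparse analysis operator Omega,
    so that (Omega x) ~ (Laplacian x - (1/c^2) x_tt) on the space-time grid."""
    lap = sp.diags([1, -2, 1], [-1, 0, 1], shape=(n_space, n_space)) / dr**2
    dtt = sp.diags([1, -2, 1], [-1, 0, 1], shape=(n_time, n_time)) / dt**2
    I_t = sp.identity(n_time)
    I_r = sp.identity(n_space)
    # Kronecker structure: spatial Laplacian per time step, d^2/dt^2 per grid point
    return sp.kron(I_t, lap) - sp.kron(dtt, I_r) / c**2

Omega = wave_operator_1d(n_space=77, n_time=64)
x = np.zeros(77 * 64)        # discretized space-time pressure field
z = Omega @ x                # source field: sparse when few sources emit
print(Omega.shape)           # (4928, 4928)
```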
Group sparse source model
[Figure: space-time grid of coefficients z_{r,t}, grouped across time t for each source location r]
Group sparse regularization
Given y = Mx, solve:

x̂ = argmin_x (1/2)‖y − Mx‖₂² + λ‖Ωx‖₁,₂

✦ The mixed ℓ₁,₂ norm promotes group sparsity, cf. Kowalski & Torresani 2009, Eldar & Mishali 2009, Baraniuk & al 2010, Jenatton & al 2011
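For concreteness, a minimal sketch (ours, not from the talk) of the mixed ℓ₁,₂ norm and its proximal operator, the usual building block of solvers for such group-sparse regularized problems; arranging the coefficients as one row per location r is an assumption matching the model above:

```python
import numpy as np

def l12_norm(Z):
    """Mixed l1,2 norm; Z has shape (n_locations, n_times), groups = rows."""
    return np.sum(np.linalg.norm(Z, axis=1))

def prox_l12(Z, tau):
    """Group soft-thresholding: shrink each row toward zero by tau."""
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    scale = np.maximum(1 - tau / np.maximum(norms, 1e-12), 0.0)
    return scale * Z          # rows with norm <= tau are zeroed entirely

rng = np.random.default_rng(0)
Z = rng.standard_normal((77, 64)) * (rng.random((77, 1)) < 0.05)  # few active rows
print(l12_norm(Z))
print(np.count_nonzero(np.linalg.norm(prox_l12(Z, 0.5), axis=1)))  # surviving groups
```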
Sparse Field Reconstruction
✓ 2D+t vibrating plate, 77×77 grid
✓ 2 sources, random locations
✓ 6 microphones, random locations
✓ known complex boundaries
✓ ground truth generated with a naive discretization
[Figure: ground truth vs. sparse reconstruction of the field]
Localizing the source next door
[Figure: microphones in a room, the signal measured at a microphone, and the localized source]
Reasons of success?
What if the shape is unknown?
CoSparse models and inverse problems
[Diagram: signal and observation domains, contrasting «Perception» and «Knowledge»]
Dictionary learning
with K. Schnass, F. Bach, R. Jenatton
A quest for the perfect sparse model
✓ Training image database → patch extraction → training patches
✓ Model: xₙ = Dzₙ, 1 ≤ n ≤ N, with unknown sparse coefficients zₙ and unknown dictionary D
✓ Sparse learning yields an estimate D̂:
  ✦ edge-like atoms [Olshausen & Field 96, Aharon et al 06, Mairal et al 09, ...]
  ✦ shifts of edge-like motifs [Blumensath 05, Jost et al 05, ...]
Dictionary Learning = Sparse Matrix Factorization

[x₁ x₂ ... x_N] ≈ D [z₁ z₂ ... z_N],   xₙ ≈ Dzₙ ∈ R^d

X (d × N) ≈ D (d × K) · Z (K × N), with s-sparse columns zₙ
(s-sparse = at most s nonzero entries)
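A hedged sketch of this factorization solved by alternating minimization, in the spirit of MOD-type algorithms rather than any specific method from the talk; hard_threshold_columns and learn_dictionary are our illustrative names:

```python
import numpy as np

def hard_threshold_columns(Z, s):
    """Keep the s largest-magnitude entries of each column of Z."""
    out = np.zeros_like(Z)
    idx = np.argsort(np.abs(Z), axis=0)[-s:]       # top-s rows per column
    cols = np.arange(Z.shape[1])
    out[idx, cols] = Z[idx, cols]
    return out

def learn_dictionary(X, K, s, n_iter=50, seed=0):
    d, N = X.shape
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((d, K))
    D /= np.linalg.norm(D, axis=0)                 # unit-norm columns
    for _ in range(n_iter):
        Z = hard_threshold_columns(D.T @ X, s)     # crude sparse coding (stand-in for OMP)
        D = X @ np.linalg.pinv(Z)                  # least-squares dictionary update
        D /= np.linalg.norm(D, axis=0) + 1e-12     # back to the unit-norm constraint
    return D, Z

# Synthetic test: data generated from a ground-truth dictionary D0.
rng = np.random.default_rng(1)
D0 = rng.standard_normal((16, 32)); D0 /= np.linalg.norm(D0, axis=0)
Z0 = np.zeros((32, 1000))
for n in range(1000):
    Z0[rng.choice(32, 3, replace=False), n] = rng.standard_normal(3)
X = D0 @ Z0
D_hat, _ = learn_dictionary(X, K=32, s=3)
print(D_hat.shape)   # (16, 32)
```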
Many approaches
✦ [see e.g. book by Comon & Jutten 2011]
✦ [Bach et al. 2008; Bradley and Bagnell 2009]
✦ [Krause and Cevher 2010]
✦ [Zhou et al. 2009]
✦ [Olshausen and Field 1997; Pearlmutter & Zibulevsky 2001; Aharon et al. 2006; Lee et al. 2007; Mairal et al. 2010; ... and many other authors]
Learning = constrained minimization
✓ Constraint = dictionary with unit columns
min_{D ∈ 𝒟} F_X(D),   𝒟 = {D = [d₁, ..., d_K] : ‖d_k‖₂ = 1 for all k}
Empirical findings
Numerical example (2D)
[Figure: N = 1000 Bernoulli-Gaussian training samples X = D₀Z₀ in the plane]
✓ 2D dictionaries parametrized by the angles θ₀, θ₁ of their atoms: D_{θ₀,θ₁}
✓ Cost function: F_X(D_{θ₀,θ₁}) = ‖D_{θ₀,θ₁}⁻¹ X‖₁
✓ Symmetry = permutation ambiguity
Empirical observations:
a) Global minima match the angles of the original basis
b) There is no other local minimum
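The experiment is easy to reproduce in spirit; a minimal sketch (ours, with assumed ground-truth angles 0.3 and 1.9) scanning F_X over the two atom angles:

```python
import numpy as np

def atoms(theta0, theta1):
    """2x2 dictionary of unit atoms at the given angles."""
    return np.array([[np.cos(theta0), np.cos(theta1)],
                     [np.sin(theta0), np.sin(theta1)]])

rng = np.random.default_rng(0)
N, p = 1000, 0.5                           # Bernoulli activation probability p
D0 = atoms(0.3, 1.9)                       # assumed ground-truth basis angles
Z0 = rng.standard_normal((2, N)) * (rng.random((2, N)) < p)
X = D0 @ Z0                                # Bernoulli-Gaussian training samples

thetas = np.linspace(0, np.pi, 181)
F = np.array([[np.abs(np.linalg.solve(atoms(t0, t1), X)).sum()
               if abs(np.sin(t1 - t0)) > 1e-3 else np.inf   # skip singular D
               for t1 in thetas] for t0 in thetas])
i, j = np.unravel_index(np.argmin(F), F.shape)
print(thetas[i], thetas[j])   # ~ (0.3, 1.9) up to permutation and grid resolution
```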
Sparsity vs coherence (2D)
[Figure: empirical probability of success as a function of the Bernoulli parameter p (sparse → weakly sparse) and the coherence µ = |cos(θ₁ − θ₀)| (incoherent → coherent); regimes: ground truth = local min, ground truth = global min, no spurious local min]
Rule of thumb: perfect recovery if
a) Incoherence: µ < 1 − p
b) Enough training samples (N large enough)
Empirical findings
✓ Global minima often match the ground truth
✓ Often, there is no spurious local minimum
How does success depend on:
✓ sparsity of Z?
✓ incoherence of D?
✓ noise level?
✓ presence / nature of outliers?
✓ sample complexity (number of training samples)?
Theoretical guarantees
✦ Generalization bounds on the excess risk F_X(D̂) − min_D E_X F_X(D) [Maurer and Pontil 2010; Vainsencher et al. 2010; Mehta and Gray 2012]
✦ Identification of the dictionary: bounds on ‖D̂ − D₀‖_F [Independent Component Analysis, e.g. book by Comon & Jutten 2011]
✓ Array processing perspective
  ✦ Dictionary ~ directions of arrival
  ✦ Identification ~ source localization
✓ Neural coding perspective
  ✦ Dictionaries ~ receptive fields
Theoretical guarantees: overview

                  [G. & Schnass 2010]   [Geng & al 2011]   [Jenatton, Bach & G.]
signal model      no                    yes                yes
                  yes                   no                 yes
noise             no                    no                 yes
cost function     variants of min_D F_X(D), e.g. min_{D,Z} ‖Z‖₁ s.t. DZ = X
Learning Guarantees vs Empirical Findings
[Figure, left: relative error vs. number N of training signals for a Hadamard-Dirac dictionary in dimension d (d = 8, 16, 32; random vs. oracle init.). Right: relative error vs. noise level for a Hadamard dictionary in dimension d, with the predicted slope]
To conclude ...
CoSparse models and inverse problems
[Diagram: signal and observation domains, «Perception» vs «Knowledge»]
Synthesis vs Analysis
✓ Synthesis: dictionary of atoms
  ✦ «Lego» model: building blocks
  ✦ Low-dimension = few atoms: x = Dz = Σᵢ zᵢ dᵢ, with ‖z‖₀ ≪ dimension
  ✦ Ex: man-made codes in communications
✓ Analysis: operator Ω
  ✦ «Carving out» model: constraints
  ✦ Low-dimension = many constraints: ⟨ωᵢ, x⟩ = 0 for many rows ωᵢ of Ω, i.e. ‖Ωx‖₀ ≪ dimension
  ✦ Ex: coupling with the laws of physics, (Δx − (1/c²) ∂²x/∂t²)|_D̊ = 0
Misleadingly similar models; in fact, fundamentally different! Concept of cosparsity (Nam & al 2011)
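A small sketch contrasting the two counts on a toy example (ours; a finite-difference operator stands in for Ω): synthesis sparsity counts the atoms used (‖z‖₀), while cosparsity counts the rows of Ω that x satisfies.

```python
import numpy as np

d = 100
x = np.concatenate([np.full(40, 1.0), np.full(60, -0.5)])  # piecewise constant, one jump

Omega = np.diff(np.eye(d), axis=0)        # finite-difference analysis operator (99 rows)
z = Omega @ x                             # analysis coefficients
cosparsity = np.sum(np.isclose(z, 0))     # rows with <omega_i, x> = 0
print(np.count_nonzero(z), cosparsity)    # -> 1 nonzero, 98 satisfied constraints
```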
Time scales in «knowledge building»
✓ Harmonic analysis ~ evolution of species
✓ Dictionary learning ~ individual experience
see also talk by M. Yaghoobi