Sparse dictionary learning in the presence of noise & outliers Rémi Gribonval INRIA Rennes - Bretagne Atlantique, France
remi.gribonval@inria.fr
Overview
2
✓ Context: sparse signal processing
✓ Dictionary learning
✓ Statistical guarantees
Sparse signal processing
Sparse Signal / Image Processing
4
+ Compression, Source Localization, Separation, Compressed Sensing ...
Typical Sparse Models
5
(Figure: ANALYSIS vs. SYNTHESIS sparse models; legends mark zero entries in black or white.)
Mathematical expression
(e.g. time-frequency atoms, wavelets)
6
Dictionary of atoms (Mallat & Zhang 93)
$$x \in \mathbb{R}^d, \qquad x \approx \sum_k z_k d_k = Dz, \qquad \|z\|_0 = \sum_k |z_k|^0 = \operatorname{card}\{k : z_k \neq 0\}$$
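As a concrete illustration of the synthesis model above, here is a minimal NumPy sketch that builds a random unit-norm dictionary and an s-sparse coefficient vector; the dimensions and the Gaussian atoms are illustrative choices, not taken from the slides.

```python
import numpy as np

d, K, s = 64, 256, 5                       # signal dimension, number of atoms, sparsity
rng = np.random.default_rng(0)

D = rng.standard_normal((d, K))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms d_k (columns of D)

z = np.zeros(K)
support = rng.choice(K, size=s, replace=False)
z[support] = rng.standard_normal(s)        # s nonzero coefficients

x = D @ z                                  # synthesis: x = sum_k z_k d_k = D z
print(np.count_nonzero(z))                 # ||z||_0 = s
```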
CoSparse models and inverse problems
7
Observation Domain
Acoustic Imaging
✓ direct optical measures ✓ sequential ✓ 2000 measures
✓ indirect acoustic measures ✓ 120 microphones at a time ✓ 120 x 16 = 1920 measures ✓ Tikhonov regularization
8
echange.inria.fr
Compressive Nearfield Acoustic Holography
9
echange.inria.fr
Dictionary learning
with K. Schnass, F. Bach, R. Jenatton
small-project.eu
Sparse Atomic Decompositions
11
(Figure: signal/image ≈ (overcomplete) dictionary of atoms × sparse representation coefficients.)
✓ bottleneck = large-scale algorithms
✓ bottleneck = dictionary/operator design/learning
Data Deluge + Jungle
12
✓ Signals, images
✓ Hyperspectral: satellite imaging
✓ Spherical geometry: cosmology, HRTF (3D audio)
✓ Graph data: social networks, brain connectivity
✓ Vector-valued: diffusion tensor
Unknown sparse coefficients Unknown dictionary
A quest for the perfect sparse model
sparse learning
13
Training database
Patch extraction → training patches
✓ learned atoms = edge-like atoms [Olshausen & Field 96, Aharon et al 06, Mairal et al 09, ...]
✓ or = shifts of edge-like motifs [Blumensath 05, Jost et al 05, ...]
$x_n = D z_n, \quad 1 \le n \le N \quad \longrightarrow \quad \hat{D}$
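A minimal sketch of the patch-extraction step, assuming scikit-learn is available and using an arbitrary grayscale image; the patch size, number of patches, and mean-removal preprocessing are illustrative assumptions, not the slides' settings.

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d

# Hypothetical grayscale image; real experiments would use a natural-image database.
image = np.random.rand(256, 256)

patches = extract_patches_2d(image, patch_size=(8, 8),
                             max_patches=10_000, random_state=0)
X = patches.reshape(len(patches), -1).T   # d x N training matrix (columns = patches)
X = X - X.mean(axis=0)                    # remove each patch's mean (common preprocessing)
print(X.shape)                            # (64, 10000): d = 64, N = 10000
```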
Dictionary Learning = Sparse Matrix Factorization
14
$$X = [x_1\; x_2\; \cdots\; x_N] \;\approx\; D\,[z_1\; z_2\; \cdots\; z_N] = DZ$$
$X$: $d \times N$; $D$: $d \times K$; $Z$: $K \times N$ with $s$-sparse columns.
Many approaches
15
✦ [see e.g. book by Comon & Jutten 2011]
✦ [Bach et al., 2008; Bradley and Bagnell, 2009]
✦ [Krause and Cevher, 2010]
✦ [Zhou et al., 2009]
✦ [Olshausen and Field, 1997; Pearlmutter & Zibulevsky 2001; Aharon et al. 2006; Lee et al., 2007; Mairal et al., 2010 (... and many other authors)]
Sparse coding objective function
16
$$f_{x_n}(D) = \min_{z_n} \frac{1}{2}\|x_n - D z_n\|_2^2 + \lambda \|z_n\|_1$$
$$F_X(D) = \frac{1}{N} \sum_{n=1}^{N} f_{x_n}(D) \;\propto\; \min_{Z} \frac{1}{2}\|X - DZ\|_F^2 + \lambda \|Z\|_1$$
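For concreteness, a small sketch of how the per-sample cost $f_{x_n}(D)$ can be evaluated with a plain ISTA loop; this is only an illustrative solver (the slides rely on the SPAMS library for the actual online algorithm), and the iteration count and λ are placeholders.

```python
import numpy as np

def soft_threshold(v, t):
    """Entrywise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(D, x, lam, n_iter=500):
    """Approximately minimize 1/2 ||x - D z||_2^2 + lam * ||z||_1 with ISTA.
    Illustrative solver only; not the algorithm used on the slides."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the smooth part
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)             # gradient of the quadratic term
        z = soft_threshold(z - grad / L, lam / L)
    return z

def f_x(D, x, lam):
    """Per-sample sparse coding cost f_x(D) from the slide."""
    z = sparse_code(D, x, lam)
    return 0.5 * np.sum((x - D @ z) ** 2) + lam * np.sum(np.abs(z))
```

Soft-thresholding is the proximal operator of the ℓ1 penalty, which is why the iteration alternates a gradient step on the quadratic term with entrywise shrinkage.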
Learning = constrained minimization
✓ Online learning with the SPAMS library (Mairal et al.) ✓ Constraint = dictionary with unit-norm columns
17
$$\min_{D \in \mathcal{D}} F_X(D), \qquad \mathcal{D} = \{D = [d_1, \ldots, d_K] : \|d_k\|_2 = 1 \ \forall k\}$$
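A rough batch sketch of this constrained minimization by alternating sparse coding and dictionary updates, with columns projected back onto the unit sphere after each update; this is not the online SPAMS algorithm used in the slides, just a readable stand-in with arbitrary iteration counts.

```python
import numpy as np

def learn_dictionary(X, K, lam, n_outer=30, n_inner=100, seed=0):
    """Batch alternating minimization for  min over unit-norm D and Z of
    1/2 ||X - D Z||_F^2 + lam * ||Z||_1.  Rough illustrative stand-in for the
    online SPAMS solver mentioned on the slide; iteration counts are arbitrary."""
    d, N = X.shape
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((d, K))
    D /= np.linalg.norm(D, axis=0)                 # start from a unit-norm dictionary
    Z = np.zeros((K, N))
    for _ in range(n_outer):
        # Sparse coding step (ISTA on all columns of Z at once).
        L = np.linalg.norm(D, 2) ** 2
        for _ in range(n_inner):
            W = Z - D.T @ (D @ Z - X) / L
            Z = np.sign(W) * np.maximum(np.abs(W) - lam / L, 0.0)
        # Dictionary step: least-squares update, then project columns back onto
        # the unit sphere (a common heuristic, not an exact joint minimization).
        D = X @ Z.T @ np.linalg.pinv(Z @ Z.T)
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    return D, Z
```

Normalizing the columns after the least-squares step is a heuristic projection onto the constraint set; online solvers (SPAMS, or scikit-learn's MiniBatchDictionaryLearning) scale far better when N is large.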
Empirical findings
19
Numerical example (2D)
Empirical observations: a) global minima match the angles of the original basis; b) there is no other local minimum.
$X = D_0 Z_0$; dictionaries $D_{\theta_0,\theta_1}$ parameterized by angles $\theta_0, \theta_1$; criteria $\|D_{\theta_0,\theta_1}^{-1} X\|_1$ and $F_X(D)$.
Symmetry = permutation ambiguity
(Legend: ground truth = local min; ground truth = global min; no spurious local min.)
Sparsity vs coherence (2D)
20
(Figure: N = 1000 Bernoulli-Gaussian training samples, panels ranging from sparse to weakly sparse.)
(Phase diagram: coherence $\mu = |\cos(\theta_1 - \theta_0)|$ on one axis, from incoherent to coherent; sparsity parameter $p$ (Bernoulli-Gaussian model) on the other; color = empirical probability of success.)
Rule of thumb: perfect recovery if a) incoherence, b) enough training samples ($N$ large enough):
$$\mu < 1 - p$$
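The coherence on the horizontal axis of the phase diagram can be computed directly; the sketch below does it for the two-atom 2-D dictionary $D_{\theta_0,\theta_1}$, with illustrative angle values, and checks that the generic mutual-coherence formula $\max_{k \neq l} |\langle d_k, d_l\rangle|$ reduces to $|\cos(\theta_1 - \theta_0)|$ here.

```python
import numpy as np

def mutual_coherence(D):
    """Largest absolute inner product between distinct unit-norm atoms of D."""
    Dn = D / np.linalg.norm(D, axis=0)
    G = np.abs(Dn.T @ Dn)
    np.fill_diagonal(G, 0.0)
    return G.max()

# Two-atom 2-D dictionary parameterized by angles (values chosen for illustration).
theta0, theta1 = 0.1, 1.2
D = np.column_stack(([np.cos(theta0), np.sin(theta0)],
                     [np.cos(theta1), np.sin(theta1)]))

print(mutual_coherence(D))            # equals ...
print(abs(np.cos(theta1 - theta0)))   # ... |cos(theta1 - theta0)|
```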
Empirical findings
✓ Global minima often match ground truth ✓ Often, there is no spurious local minimum
✓ sparsity of Z ? ✓ incoherence of D ? ✓ noise level ? ✓ presence / nature of outliers ? ✓ sample complexity (number of training samples) ?
21
Theoretical guarantees
Theoretical guarantees
✦ [Maurer and Pontil, 2010; Vainsencher et al., 2010; Mehta and Gray, 2012]
✦ [Independent Component Analysis, e.g. book by Comon & Jutten 2011]
23
$\|\hat{D} - D_0\|_F$
✓ Array processing perspective
✦ Dictionary ~ directions of arrival
✦ Identification ~ source localization
✓ Neural coding perspective:
✦ Dictionaries ~ receptive fields
$F_X(\hat{D}) - \min_D \mathbb{E}_X F_X(D)$
Theoretical guarantees: overview
24
[G. & Schnass 2010] / [Geng & al 2011] / [Jenatton, Bach & G.]
signal model: no / yes / yes
yes / no / yes
noise: no / no / yes
cost function: $\min_D F_X(D)$ vs. $\min_{D,Z} \|Z\|_1 \ \text{s.t.}\ DZ = X$
Sparse Signal Model
25
$\mathbb{P}(|z_i| < \underline{z}) = 0$ (nonzero coefficients bounded away from zero in magnitude)
$$x = \sum_{i \in J} z_i d_i + \varepsilon = D_J z_J + \varepsilon, \qquad J \subset [1, K], \ |J| = s$$
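A short sampler for this signal model; the noise level, the lower bound on nonzero magnitudes, and the coefficient distribution below are illustrative assumptions (the slides only require a sub-Gaussian coefficient model with magnitudes bounded away from zero).

```python
import numpy as np

def sample_signal(D, s, z_min=0.1, noise_std=0.05, rng=None):
    """Draw x = D_J z_J + eps: random support J of size s, nonzero coefficients
    bounded away from zero in magnitude (|z_i| >= z_min), Gaussian noise.
    Numerical values are illustrative assumptions, not taken from the slides."""
    rng = np.random.default_rng(rng)
    d, K = D.shape
    J = rng.choice(K, size=s, replace=False)
    signs = rng.choice([-1.0, 1.0], size=s)
    magnitudes = z_min + np.abs(rng.standard_normal(s))   # sub-Gaussian, >= z_min
    z = np.zeros(K)
    z[J] = signs * magnitudes
    eps = noise_std * rng.standard_normal(d)
    return D @ z + eps, z, J
```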
Local stability & robustness
26
✓ Assumptions:
✦ incoherent dictionary: $s\,\mu(D_0) \ll 1$
✦ $s$-sparse sub-Gaussian coefficient model (no outliers)
✓ Conclusion:
✦ with high probability there exists a local minimum $\hat{D}$ of $F_X(D)$ such that
$$\|\hat{D} - D_0\|_F \;\le\; C \sqrt{\frac{s\, d\, K^3 \cdot \log N}{N}}$$
✓ technical assumption: bounded coefficient model
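Read as a sample-complexity statement, the bound above rearranges as follows (with the constant $C$ absorbed and the log factor kept loose):
$$\|\hat{D} - D_0\|_F \le r \quad \text{whenever} \quad \frac{N}{\log N} \;\gtrsim\; \frac{s\, d\, K^3}{r^2},$$
i.e., up to logarithmic factors, on the order of $s\, d\, K^3 / r^2$ training samples suffice to reach a target radius $r$.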
Learning Guarantees vs Empirical Findings
27
(Figures: relative error vs. number N of training signals for a Hadamard-Dirac dictionary in dimension d, and relative error vs. noise level for a Hadamard dictionary in dimension d; curves for d = 8, 16, 32 with random or oracle initialization; predicted slope indicated. Dictionary sizes d × d and d × 2d.)
Flavor of the proof
✓ Minimum exactly at ground truth ✓ one-sided directional derivatives
✓ Minimum close to ground truth ✓ Zero at ground truth ✓ Lower bound at radius r
Characterizing local minima (1)
29
(Figure: $F_X(D) - F_X(D_0)$ as a function of $D$ around the ground truth $D_0$.)
✦ adaptation from [Fuchs, 2005; Zhao and Yu, 2006; Wainwright, 2009]
✓ Approximate cost function
Controlling the cost function
30
Recall $F_X(D) = \frac{1}{N}\sum_{n=1}^{N} f_{x_n}(D)$ with
$$f_{x_n}(D) = \min_{z_n} \frac{1}{2}\|x_n - D z_n\|_2^2 + \lambda \|z_n\|_1.$$
For $x = D_0 z_0 + \varepsilon$, consider the sign-conditioned surrogate $\phi_x(D \mid \operatorname{sign}(z_0))$ of $f_x(D)$; summing over samples gives $\Phi_X(D) \approx F_X(D)$.
✓ Need uniform lower bound on the sphere
✓ Lower bound expectation for a given D ✓ Control Lipschitz constant (with high probability) ✓ Conclude with epsilon-net argument
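To make the sign-conditioned surrogate $\phi_x(D \mid \operatorname{sign}(z_0))$ more concrete: in the primal-dual witness style of the works cited above (Fuchs 2005; Wainwright 2009), fixing the support $J = \operatorname{supp}(z_0)$ and the sign vector $s_J = \operatorname{sign}(z_{0,J})$ turns the ℓ1 problem into a smooth quadratic one with a closed-form minimizer. The exact definition used on the slides is not shown, so the following is one standard instance of such a surrogate, assuming $D_J$ has full column rank:
$$\tilde{z}_J = (D_J^\top D_J)^{-1}\bigl(D_J^\top x - \lambda\, s_J\bigr), \qquad \tilde{z}_{J^c} = 0,$$
$$\phi_x(D \mid \operatorname{sign}(z_0)) = \tfrac{1}{2}\,\|x - D_J \tilde{z}_J\|_2^2 + \lambda\, s_J^\top \tilde{z}_J.$$
Because $\tilde{z}$ is available in closed form, differences such as $\Phi_X(D) - \Phi_X(D_0)$ can be expanded explicitly, which is why such surrogates are convenient for the uniform lower bounds discussed above.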
Controlling the approximate cost function
31
For $\|D - D_0\|_F = r$, control $\Phi_X(D) - \Phi_X(D_0)$:
✓ lower-bound on approximate cost function ✓ lower-bound on cost function
Putting the pieces together
32
Admissible outlier energy:
$$\frac{1}{N} \sum_{n \in \text{outliers}} \|x_n\|_2^2 \;\le\; c$$
(Figure: $\Phi_X(D) - \Phi_X(D_0)$ and $F_X(D) - F_X(D_0)$ around the ground truth $D_0$.)
From local to global guarantees ?
33
$$\min_{D \in \mathcal{D}} F_X(D)$$
(Phase-diagram legend: ground truth = local min; ground truth = global min; no spurious local min.)
To conclude ...
Summary
✓ Dictionary learning is widely used in image processing and machine learning
✓ from heuristics ...
✓ ... to statistics
✦ local stability and robustness guarantees
✦ http://hal.inria.fr/hal-00737152 [Jenatton, G. & Bach, Local stability and robustness of sparse dictionary learning in the presence of noise, Oct 2012]
35
What’s next ?
✓ global guarantees? (empirically: yes) ✓ sharp sample complexity ✓ guarantees from cost functions to algorithms
✓ synthesis / analysis flavor (e.g. TV-like) ✓ structured models (shift-invariance, etc.) ✓ structured sparsity (e.g. trees, graphs)
36
37