Sparse dictionary learning in the presence of noise & outliers - PowerPoint PPT Presentation


SLIDE 1

Sparse dictionary learning in the presence of noise & outliers
Rémi Gribonval
INRIA Rennes - Bretagne Atlantique, France

remi.gribonval@inria.fr

SLIDE 2

Overview

  • Context: sparse signal processing
  • Dictionary learning
  • Statistical guarantees
  • Flavor of the proof
  • Conclusion
SLIDE 3

Sparse signal processing

SLIDE 4

Sparse Signal / Image Processing

+ Compression, Source Localization, Separation, Compressed Sensing ...

SLIDE 5
Typical Sparse Models

  • Audio : time-frequency representations (MP3)
  • Images : wavelet transform (JPEG2000)

[Figure: analysis and synthesis sparse models (legend: black = zero)]

SLIDE 6

Mathematical expression

  • Signal / image = high dimensional vector
  • Model = linear combination of basis vectors

(ex: time-frequency atoms, wavelets)

  • Sparsity = small L0 (quasi)-norm

Dictionary of atoms (Mallat & Zhang 93)

x ∈ ℝ^d,    x ≈ ∑_k z_k d_k = Dz

‖z‖₀ = ∑_k |z_k|⁰ = card{k : z_k ≠ 0}
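As a minimal numerical sketch of the synthesis model and the ℓ0 quasi-norm above (toy dictionary and coefficients, made up for illustration):

```python
import numpy as np

# Toy overcomplete dictionary: d = 4, K = 8 atoms with unit l2 norm
rng = np.random.default_rng(0)
D = rng.standard_normal((4, 8))
D /= np.linalg.norm(D, axis=0)     # normalize each atom d_k

# s-sparse coefficient vector z (only 2 non-zero entries)
z = np.zeros(8)
z[[1, 5]] = [1.5, -0.7]

x = D @ z                          # synthesis model: x = D z
l0 = np.count_nonzero(z)           # ||z||_0 = card{k : z_k != 0}
print(x.shape, l0)                 # (4,), 2
```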

SLIDE 7

CoSparse models and inverse problems

Observation Domain

SLIDE 8

Acoustic Imaging

  • Ground truth: laser vibrometry

✓ direct optical measures ✓ sequential ✓ 2000 measures

  • Nearfield Acoustic Holography

✓ indirect acoustic measures ✓ 120 microphones at a time ✓ 120 x 16 = 1920 measures ✓ Tikhonov regularization


echange.inria.fr

SLIDE 9

Compressive Nearfield Acoustic Holography

  • One shot with 120 microphones
  • Sparse regularization


echange.inria.fr

SLIDE 10

Dictionary learning

with K. Schnass, F. Bach, R. Jenatton

small-project.eu

SLIDE 11

Sparse Atomic Decompositions

x ≈ Dz

signal / image  ≈  (overcomplete) dictionary of atoms  ×  sparse representation coefficients

SLIDE 12
  • Sparsity: historically for signals & images

✓ bottleneck = large-scale algorithms

  • New “exotic” or composite data

✓ bottleneck = dictionary/operator design/learning

Data Deluge + Jungle

[Figure: the data jungle: signals; images; hyperspectral / satellite imaging; spherical geometry (cosmology, HRTF for 3D audio); graph data (social networks, brain connectivity); vector-valued data (diffusion tensors)]

SLIDE 13

A quest for the perfect sparse model

  • Unknown dictionary, unknown sparse coefficients → sparse learning
  • Training database → patch extraction → training patches
  • Learned atoms = edge-like atoms [Olshausen & Field 96, Aharon et al 06, Mairal et al 09, ...]
  • Learned atoms = shifts of edge-like motifs [Blumensath 05, Jost et al 05, ...]

x_n = D z_n,   1 ≤ n ≤ N    →    D̂
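A minimal sketch of the patch-extraction step with scikit-learn; the random image below merely stands in for a real training database, and the 8×8 patch size is an arbitrary choice:

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d

# A random grayscale image stands in for a real training database
rng = np.random.default_rng(0)
image = rng.random((64, 64))

# Extract 8x8 training patches and stack them as columns x_n of X (d x N)
patches = extract_patches_2d(image, patch_size=(8, 8), max_patches=500, random_state=0)
X = patches.reshape(patches.shape[0], -1).T          # shape (64, 500)
X -= X.mean(axis=0, keepdims=True)                   # remove the mean of each patch
print(X.shape)
```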

SLIDE 14

Dictionary Learning = Sparse Matrix Factorization

X ≈ DZ

[x₁ x₂ · · · x_N]  ≈  D · [z₁ z₂ · · · z_N]

X: d × N,    D: d × K,    Z: K × N with s-sparse columns

SLIDE 15

Many approaches

  • Independent component analysis

[see e.g. book by Comon & Jutten 2011]

  • Convex

[Bach et al., 2008; Bradley and Bagnell, 2009]

  • Submodular

[Krause and Cevher, 2010]

  • Bayesian

[Zhou et al., 2009]

  • Non-convex matrix-factorization

[Olshausen and Field, 1997; Pearlmutter & Zibulevsky, 2001; Aharon et al., 2006; Lee et al., 2007; Mairal et al., 2010; ... and many other authors]

SLIDE 16

Sparse coding objective function


  • Given one training sample: Basis Pursuit / LASSO
  • Given N training samples

f_{x_n}(D) = min_{z_n} (1/2) ‖x_n − D z_n‖₂² + λ ‖z_n‖₁

F_X(D) = (1/N) ∑_{n=1}^{N} f_{x_n}(D)   ∝   min_Z (1/2) ‖X − DZ‖_F² + λ ‖Z‖₁
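For concreteness, a small sketch of this per-sample LASSO / Basis Pursuit Denoising step, solved here with a plain ISTA (proximal gradient) loop; the toy dictionary, sample and λ are made up, and ISTA is just one of many possible solvers:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(x, D, lam, n_iter=500):
    """Minimize 0.5 * ||x - D z||_2^2 + lam * ||z||_1 by proximal gradient."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the smooth part
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = soft_threshold(z + D.T @ (x - D @ z) / L, lam / L)
    return z

# Toy example: sparse-code one training sample x_n against a unit-norm dictionary
rng = np.random.default_rng(0)
D = rng.standard_normal((16, 32))
D /= np.linalg.norm(D, axis=0)
x = D[:, [3, 10]] @ np.array([1.0, -2.0]) + 0.01 * rng.standard_normal(16)
z_hat = ista(x, D, lam=0.1)
print(np.nonzero(z_hat)[0])                  # support should concentrate on atoms 3 and 10
```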

SLIDE 17

Learning = constrained minimization

✓ Online learning with SPAMS library (Mairal & al) ✓ Constraint = dictionary with unit columns

D̂ = arg min_{D ∈ 𝒟} F_X(D)

𝒟 = {D = [d₁, . . . , d_K] : ‖d_k‖₂ = 1}
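The talk uses online learning with the SPAMS library; purely as an illustration, here is a comparable sketch with scikit-learn's MiniBatchDictionaryLearning (toy data, K and λ are made up, scikit-learn stores signals as rows rather than columns, and it constrains atoms to at most unit norm):

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Toy training matrix X with signals as columns (d x N), as in the slides
rng = np.random.default_rng(0)
X_cols = rng.standard_normal((64, 2000))

learner = MiniBatchDictionaryLearning(
    n_components=128,                  # K atoms
    alpha=0.1,                         # l1 penalty (lambda)
    transform_algorithm="lasso_lars",
    random_state=0,
)
codes = learner.fit_transform(X_cols.T)        # sparse codes, shape (N, K)
D_hat = learner.components_.T                  # learned dictionary, shape (d, K)
print(D_hat.shape, np.linalg.norm(D_hat, axis=0)[:4])
```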

SLIDE 18

Empirical findings

SLIDE 19

Numerical example (2D)

[Figure: scatter plot of N = 1000 Bernoulli-Gaussian training samples in 2D]

Empirical observations:
  a) Global minima match the angles of the original basis
  b) There is no other local minimum

X = D₀ Z₀,    D_{θ₀,θ₁} = basis parametrized by the angles (θ₀, θ₁)

‖D_{θ₀,θ₁}⁻¹ X‖₁,    F_X(D)

Symmetry = permutation ambiguity
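A sketch that mimics this 2D experiment: draw Bernoulli-Gaussian samples from a ground-truth basis and scan candidate bases under the ℓ1 criterion shown above; the angles, p and the grid resolution are arbitrary illustrative choices:

```python
import numpy as np

# Ground-truth 2D basis D0 with unit-norm columns at angles t0, t1
rng = np.random.default_rng(0)
t0, t1 = 0.3, 1.9
D0 = np.array([[np.cos(t0), np.cos(t1)], [np.sin(t0), np.sin(t1)]])

# N Bernoulli-Gaussian coefficients: each entry non-zero with probability p
N, p = 1000, 0.5
Z0 = (rng.random((2, N)) < p) * rng.standard_normal((2, N))
X = D0 @ Z0                                        # noiseless training samples

# Brute-force scan of candidate bases D_{a0,a1}, evaluating ||D^{-1} X||_1
angles = np.linspace(0.0, np.pi, 90, endpoint=False)
best, best_angles = np.inf, None
for a0 in angles:
    for a1 in angles:
        if abs(a0 - a1) < 1e-6:
            continue                               # skip singular candidates
        D = np.array([[np.cos(a0), np.cos(a1)], [np.sin(a0), np.sin(a1)]])
        val = np.abs(np.linalg.solve(D, X)).sum()  # l1 criterion
        if val < best:
            best, best_angles = val, (a0, a1)

print(best_angles)   # expected: close to (t0, t1), up to permutation and sign flips
```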

SLIDE 20

Sparsity vs coherence (2D)

[Figure: empirical probability of success (color scale 0.5 to 1) as a function of sparsity, from sparse to weakly sparse (parameter p), and of coherence µ = |cos(θ₁ − θ₀)|, from incoherent to coherent; regions: ground truth = local min, ground truth = global min, no spurious local min]

Rule of thumb: perfect recovery if
  a) Incoherence:  µ < 1 − p
  b) Enough training samples (N large enough)
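A quick check of the coherence formula used here (the function name and the example angles are illustrative):

```python
import numpy as np

def coherence(D):
    """Mutual coherence mu(D) = max_{k != l} |<d_k, d_l>| for unit-norm atoms."""
    G = np.abs(D.T @ D)
    np.fill_diagonal(G, 0.0)
    return G.max()

# For a 2D basis with atoms at angles t0 and t1, mu = |cos(t1 - t0)|
t0, t1 = 0.3, 1.9
D = np.array([[np.cos(t0), np.cos(t1)], [np.sin(t0), np.sin(t1)]])
print(coherence(D), abs(np.cos(t1 - t0)))   # the two values coincide
```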

SLIDE 21

Empirical findings

  • Stable & robust dictionary identification

✓ Global minima often match ground truth ✓ Often, there is no spurious local minimum

  • Role of parameters ?

✓ sparsity of Z ? ✓ incoherence of D ? ✓ noise level ? ✓ presence / nature of outliers ? ✓ sample complexity (number of training samples) ?


SLIDE 22

Theoretical guarantees

SLIDE 23

Theoretical guarantees

  • Excess risk analysis (~Machine Learning)

[Maurer and Pontil, 2010; Vainsencher et al., 2010; Mehta and Gray, 2012]

  • Identifiability analysis (~Signal Processing)

[Independent Component Analysis, e.g. book Comon & Jutten 2011]

Identifiability criterion: ‖D̂ − D₀‖_F

Array processing perspective

Dictionary ~ directions of arrival

Identification ~ source localization

Neural coding perspective:

Dictionaries ~ receptive fields

Excess risk: F_X(D̂) − min_D E_X F_X(D)

SLIDE 24

Theoretical guarantees: overview

  • [G. & Schnass 2010]: overcomplete (d < K): no; outliers: yes; noise: no; cost function: min_{D,Z} ‖Z‖₁ s.t. DZ = X
  • [Geng & al 2011]: overcomplete (d < K): yes; outliers: no; noise: no; cost function: min_{D,Z} ‖Z‖₁ s.t. DZ = X
  • [Jenatton, Bach & G.]: overcomplete (d < K): yes; outliers: yes; noise: yes; cost function: min_D F_X(D)
SLIDE 25

Sparse Signal Model

  • Random support
  • Sub-Gaussian iid coefficients, bounded below
  • Sub-Gaussian additive noise

P(|z_i| < z) = 0 for some fixed z > 0   (non-zero coefficients bounded below)

x = ∑_{i∈J} z_i d_i + ε = D_J z_J + ε,    J ⊂ [1, K],  |J| = s
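A sketch of a generator for this signal model (random support J, i.i.d. coefficients bounded below in magnitude, additive Gaussian noise); the specific coefficient distribution and the constant z_min are illustrative choices rather than the paper's exact model:

```python
import numpy as np

def draw_sample(D0, s, noise_std, z_min=0.5, rng=None):
    """Draw x = D_J z_J + eps: random support |J| = s, i.i.d. coefficients with
    |z_i| >= z_min (so P(|z_i| < z_min) = 0), Gaussian additive noise."""
    rng = rng or np.random.default_rng()
    d, K = D0.shape
    J = rng.choice(K, size=s, replace=False)          # random support
    z_J = rng.standard_normal(s)
    z_J += np.sign(z_J) * z_min                       # push magnitudes above z_min
    eps = noise_std * rng.standard_normal(d)          # additive (sub-)Gaussian noise
    return D0[:, J] @ z_J + eps, J, z_J

rng = np.random.default_rng(0)
D0 = rng.standard_normal((16, 32))
D0 /= np.linalg.norm(D0, axis=0)
x, J, z_J = draw_sample(D0, s=3, noise_std=0.05, rng=rng)
print(J, np.abs(z_J).min() >= 0.5)
```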

SLIDE 26

Local stability & robustness

  • Theorem 1: local stability [Jenatton, Bach & G. 2012]

✓ Assumptions:
  • Overcomplete incoherent dictionary D₀:  s · µ(D₀) ≪ 1
  • s-sparse sub-Gaussian coefficient model (no outlier)

✓ Conclusion: with high probability there exists a local minimum D̂ of F_X(D) such that

‖D̂ − D₀‖_F ≤ C √( s d K³ · log N / N )

  • Theorem 2: robustness to noise

✓ technical assumption: bounded coefficient model

  • Theorem 3: robustness to outliers

SLIDE 27

Learning Guarantees vs Empirical Findings

  • Robustness to noise
  • Sample complexity

[Figures: relative error vs. number N of training signals (Hadamard-Dirac dictionary, d × 2d) and relative error vs. noise level (Hadamard dictionary, d × d), for d = 8, 16, 32 with random and oracle initializations; the predicted slope is overlaid]

SLIDE 28

Flavor of the proof

SLIDE 29
  • Noiseless setting

✓ Minimum exactly at ground truth ✓ one-sided directional derivatives

  • Noisy setting

✓ Minimum close to ground truth ✓ Zero at ground truth ✓ Lower bound at radius r

Characterizing local minima (1)

[Figure: F_X(D) − F_X(D₀) as a function of D near the ground truth D₀; zero at D₀, lower bound at radius r]

SLIDE 30
  • Problem: sum of complicated functions!
  • Solution: simplified expression if sparse recovery

adaptation from [Fuchs, 2005; Zhao and Yu, 2006; Wainwright, 2009]

✓ Approximate cost function

Controlling the cost function

F_X(D) = (1/N) ∑_n f_{x_n}(D),    f_{x_n}(D) = min_{z_n} (1/2) ‖x_n − D z_n‖₂² + λ ‖z_n‖₁

If sparse recovery holds for x = D₀ z₀ + ε:    f_x(D) = φ_x(D | sign(z₀))

Approximate cost function:    Φ_X(D) ≈ F_X(D)
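To make this concrete, here is a small sketch of the closed-form Lasso candidate obtained by fixing the support and sign pattern of z₀ (in the spirit of φ_x(D | sign(z₀)) and the Fuchs-style analysis cited above), together with a check of the Lasso optimality conditions; the toy dictionary, support and λ are made up, and the exact definition in the paper may differ in details:

```python
import numpy as np

def phi_x(D, x, J, signs, lam):
    """Lasso objective value at the candidate obtained with support J and fixed
    signs: z_J = (D_J^T D_J)^{-1} (D_J^T x - lam * signs)."""
    DJ = D[:, J]
    zJ = np.linalg.solve(DJ.T @ DJ, DJ.T @ x - lam * signs)
    z = np.zeros(D.shape[1])
    z[J] = zJ
    return 0.5 * np.sum((x - D @ z) ** 2) + lam * np.abs(z).sum(), z

rng = np.random.default_rng(0)
D = rng.standard_normal((16, 32))
D /= np.linalg.norm(D, axis=0)
J = np.array([3, 10])
z0 = np.zeros(32)
z0[J] = [1.0, -2.0]
x = D @ z0 + 0.01 * rng.standard_normal(16)
lam = 0.05

value, z_hat = phi_x(D, x, J, np.sign(z0[J]), lam)

# When sparse recovery holds, z_hat satisfies the Lasso optimality conditions,
# so this closed form equals the true per-sample cost f_x(D).
resid = x - D @ z_hat
on_support = np.allclose(D[:, J].T @ resid, lam * np.sign(z0[J]))
off_support = np.all(np.abs(np.delete(D.T @ resid, J)) <= lam + 1e-12)
print(value, on_support, off_support)
```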

SLIDE 31
  • Problem:

✓ Need uniform lower bound on the sphere

  • of the random function
  • Solution:

✓ Lower bound expectation for a given D ✓ Control Lipschitz constant (with high probability) ✓ Conclude with epsilon-net argument

Controlling the approximate cost function

‖D − D₀‖_F = r

Φ_X(D) − Φ_X(D₀)

SLIDE 32
  • With high probability:

✓ lower-bound on approximate cost function ✓ lower-bound on cost function

  • Outliers: «no model» but total energy bounded

Putting the pieces together

[Figure: Φ_X(D) − Φ_X(D₀) and F_X(D) − F_X(D₀) around the ground truth D₀, lower-bounded at radius r]

Admissible energy of outliers:    (1/N) ∑_{n ∈ outliers} ‖x_n‖₂² ≤ c

SLIDE 33

From local to global guarantees ?

D̂ = arg min_{D ∈ 𝒟} F_X(D)

[Figure: empirical probability of success (color scale 0.5 to 1); regions: ground truth = local min, ground truth = global min, no spurious local min]

SLIDE 34

To conclude ...

SLIDE 35

Summary

  • Sparse Dictionary Learning

✓ widely used in image processing and machine learning ✓ from heuristics ...

  • Online algorithms, empirically successful

✓ ... to statistics

local stability and robustness guarantees

http://hal.inria.fr/hal-00737152 [Jenatton, G. & Bach, Local stability and robustness of sparse dictionary learning in the presence of noise, Oct 2012]


SLIDE 36

What’s next ?

  • Immediate challenges

✓ global guarantees ? empirically yes ✓ sharp sample complexity ✓ guarantees from cost functions to algorithms

  • Sparse learning beyond dictionaries

✓ synthesis / analysis flavor (e.g. TV-like) ✓ structured models (shift-invariance, etc. ) ✓ structured sparsity (e.g. trees, graphs)

  • More examples = less work to learn ?


SLIDE 37

THANKS !