
Learning low-dimensional models with penalized matrix factorization - PowerPoint PPT Presentation

Learning low-dimensional models with penalized matrix factorization. Rémi Gribonval, PANAMA Project-team, INRIA Rennes - Bretagne Atlantique, France. remi.gribonval@inria.fr. Overview. Context: inverse problems, sparsity & ...


1. Learning = Constrained Minimization
• $\hat{D} = \arg\min_{D \in \mathcal{D}} F_X(D)$, where $F_X(D) = \min_Z \frac{1}{2}\|X - DZ\|_F^2 + \Phi(Z)$
• Without the constraint set $\mathcal{D}$: degenerate solution $D \to \infty$, $Z \to 0$
• Typical constraint = unit-norm columns: $\mathcal{D} = \{D = [d_1, \dots, d_K] : \|d_k\|_2 = 1 \ \forall k\}$

2. A versatile matrix factorization framework
• Sparse coding (typically d < K): penalty = ℓ1 norm; constraint = unit-norm dictionary
• K-means clustering: penalty = indicator function of the canonical basis vectors; constraint = none
• NMF (non-negative matrix factorization) (d > K): penalty = indicator function of non-negative coefficients; constraint = unit-norm, non-negative dictionary
• PCA (typically d > K): penalty = none; constraint = dictionary with orthonormal columns

3. Algorithms for penalized matrix factorization

4. Principle: Alternate Optimization
• Global objective: $\min_{D,Z} \frac{1}{2}\|X - DZ\|_F^2 + \Phi(Z)$
• Alternate two steps:
✓ Update coefficients given the current dictionary D: $\min_{z_i} \frac{1}{2}\|x_i - D z_i\|_2^2 + \phi(z_i)$
✓ Update the dictionary given the current coefficients Z: $\min_D \frac{1}{2}\|X - DZ\|_F^2$
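To make the alternation concrete, here is a minimal Python skeleton of the outer loop (an illustrative sketch, not the implementation used in the talk). The two step functions, `sparse_code` and `update_dictionary`, are hypothetical helpers sketched after the next two slides, and `lam` is an assumed ℓ1 penalty weight.

```python
import numpy as np

def dictionary_learning(X, K, sparse_code, update_dictionary,
                        n_iter=50, lam=0.1, seed=0):
    """Alternate minimization of 0.5*||X - D Z||_F^2 + lam*||Z||_1
    over D (unit-norm columns) and Z.  The two step functions are
    passed in as callables (sketched after the next two slides)."""
    rng = np.random.default_rng(seed)
    d, N = X.shape
    D = rng.standard_normal((d, K))
    D /= np.linalg.norm(D, axis=0)          # start on the constraint set
    Z = np.zeros((K, N))
    for _ in range(n_iter):
        Z = sparse_code(X, D, lam)          # coefficient update, D fixed
        D = update_dictionary(X, Z)         # dictionary update, Z fixed
    return D, Z
```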

5. Coefficient Update = Sparse Coding
• Objective: $\min_{z_i} \frac{1}{2}\|x_i - D z_i\|_2^2 + \phi(z_i)$
• Two strategies:
✓ Batch: for all training samples i at each iteration
✓ Online: for one (randomly selected) training sample i
• Implementation: any sparse coding algorithm
✓ ℓ1 minimization, Orthogonal Matching Pursuit, ...
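For the ℓ1 penalty $\phi(z) = \lambda\|z\|_1$, a simple batch coefficient update is ISTA (iterative soft-thresholding); this is one possible sparse coding routine among those listed above, sketched here under the assumption that all columns are coded at once.

```python
import numpy as np

def sparse_code(X, D, lam, n_iter=100):
    """ISTA for each column x_i: min_z 0.5*||x_i - D z||_2^2 + lam*||z||_1.

    Batch version: all columns of X are updated at every call."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    Z = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ Z - X)             # gradient of the quadratic term
        W = Z - grad / L                     # gradient step
        Z = np.sign(W) * np.maximum(np.abs(W) - lam / L, 0.0)  # soft-thresholding
    return Z
```

An online variant would apply the same update to a single randomly selected column per call instead of the whole batch.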

6. Dictionary Update
• Objective: $\min_D \frac{1}{2}\|X - DZ\|_F^2$
• Main approaches:
✓ Method of Optimal Directions (MOD) [Engan et al., 1999]: $\hat{D} = \arg\min_D \|X - DZ\|_F^2 = X \cdot \mathrm{pinv}(Z)$
✓ K-SVD [Aharon et al., 2006]: atom-by-atom update via SVD (rank-one PCA), with the corresponding coefficients jointly updated
✓ Online ℓ1: stochastic gradient [Engan et al., 2007; Mairal et al., 2010]
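A minimal sketch of the MOD step, with a re-projection of the columns onto the unit-norm constraint (a common heuristic; the K-SVD and online stochastic-gradient updates are not shown). The commented lines indicate how the three sketches combine on hypothetical data.

```python
import numpy as np

def update_dictionary(X, Z, eps=1e-10):
    """MOD step: D = X * pinv(Z), then re-project onto unit-norm columns."""
    D = X @ np.linalg.pinv(Z)
    norms = np.maximum(np.linalg.norm(D, axis=0), eps)   # guard against empty atoms
    return D / norms

# Putting the three sketches together on hypothetical data:
# X = np.random.default_rng(0).standard_normal((64, 1000))
# D, Z = dictionary_learning(X, K=128, sparse_code=sparse_code,
#                            update_dictionary=update_dictionary, lam=0.1)
```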

7. ... but also
• Related «learning» matrix factorizations:
✓ Non-negativity (NMF): multiplicative updates [Lee & Seung 1999]
✓ Known rows up to gains (blind calibration): $D = \mathrm{diag}(g)\, D_0$; convex formulation [G. & al 2012, Bilen & al 2013]
✓ Known rows up to permutation (cable chaos): $D = \Pi D_0$; branch & bound [Emiya & al 2014]
• (Approximate) Message Passing [e.g. Krzakala & al 2013]

8. Analytic vs Learned Dictionaries: Learning Fast Transforms (Ph.D. of Luc Le Magoarou)

9-11. Analytic vs Learned Dictionaries

Dictionary                        | Adaptation to training data | Computational complexity
Analytic (Fourier, wavelets, ...) | No                          | Low
Learned                           | Yes                         | High

Best of both worlds?

12-14. Sparse-KSVD
• Principle: constrained dictionary learning
✓ choose a reference (fast) dictionary $D_0$ (a strong prior!)
✓ learn with the constraint $D = D_0 S$, where $S$ is sparse
• Resulting double-sparse factorization problem, with two unknown factors: $X \approx D_0 S Z$
• [R. Rubinstein, M. Zibulevsky & M. Elad, "Double Sparsity: Learning Sparse Dictionaries for Sparse Signal Approximation," IEEE TSP, vol. 58, no. 3, pp. 1553-1564, 2010.]
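A small sketch of the double-sparse data model $X \approx D_0 S Z$ (not the KSVDS algorithm itself): $D_0$ is taken here as an orthonormal DCT, $S$ makes each learned atom a combination of a few $D_0$ atoms, and $Z$ holds s-sparse codes. The sizes n, K, N, s and the sparsity levels are placeholder choices.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix, used here as the reference fast dictionary D0."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C.T                          # columns = DCT atoms

rng = np.random.default_rng(0)
n, K, N, s = 64, 64, 500, 5             # hypothetical sizes
D0 = dct_matrix(n)                      # reference fast dictionary (strong prior)

# Column-sparse S: each learned atom combines a few D0 atoms
S = np.zeros((n, K))
for k in range(K):
    idx = rng.choice(n, size=3, replace=False)
    S[idx, k] = rng.standard_normal(3)
S /= np.linalg.norm(D0 @ S, axis=0)     # unit-norm learned atoms D = D0 S

# Sparse codes Z and (noiseless) data X ~ D0 S Z: two unknown sparse factors
Z = np.zeros((K, N))
for i in range(N):
    idx = rng.choice(K, size=s, replace=False)
    Z[idx, i] = rng.standard_normal(s)
X = D0 @ S @ Z
```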

15. Speed = Factorizable Structure
• Fourier: FFT, butterfly algorithm
• Wavelets: FWT, tree of filter banks
• Hadamard: Fast Hadamard Transform

16. Learning Fast Transforms = Chasing Butterflies
• Class of dictionaries of the form $D = \prod_{j=1}^{M} S_j$, with sparse factors $S_j$
✓ covers standard fast transforms
✓ more flexible, better adaptation to training data
✓ benefits:
✦ Speed: inverse problems and more
✦ Storage: compression
✦ Statistical significance / sample complexity: denoising
• Learning:
✓ non-convex optimization algorithm: PALM, with guaranteed convergence to a stationary point
✓ hierarchical strategy
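An illustrative sketch of why the multi-factor structure $D = S_1 \cdots S_M$ is fast: applying D through its sparse factors costs on the order of the total number of nonzeros, instead of $n^2$ for a dense matrix. The sizes and per-row sparsity are assumptions; PALM and the hierarchical learning strategy are not shown.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n, M = 256, 4                          # hypothetical: 256 x 256 dictionary, 4 sparse factors

# A dictionary constrained to be a product of M sparse factors, D = S_1 ... S_M
factors = [sp.random(n, n, density=4 / n, format="csr",
                     random_state=rng, data_rvs=rng.standard_normal)
           for _ in range(M)]

def apply_factored(factors, x):
    """Compute D @ x by chaining the sparse factors (rightmost factor acts first)."""
    for S in reversed(factors):
        x = S @ x
    return x

x = rng.standard_normal(n)
y = apply_factored(factors, x)
dense_cost = n * n                              # multiplications for a dense n x n product
factored_cost = sum(S.nnz for S in factors)     # roughly M * 4n here, much smaller
print(dense_cost, factored_cost)
```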

17. Example 1: Reverse-Engineering the Fast Hadamard Transform
• Hadamard dictionary, reference factorization: $n^2$ (dense) vs $2n\log_2 n$ (factorized)
• Learned factorization: different from the reference, but just as fast: $n^2$ vs $2n\log_2 n$
• Tested up to n = 1024
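As a concrete reference point for this example, the sketch below builds the classical butterfly factorization of the Hadamard matrix (the "reference factorization" of the slide, not the learned one) and checks the $n^2$ vs $2n\log_2 n$ count; the test size n = 32 is arbitrary.

```python
import numpy as np
from scipy.linalg import hadamard

def hadamard_butterfly_factors(m):
    """Reference factorization of the n x n Hadamard matrix (n = 2**m)
    into m butterfly factors, each with only 2n nonzero entries."""
    H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
    return [np.kron(np.kron(np.eye(2 ** k), H2), np.eye(2 ** (m - 1 - k)))
            for k in range(m)]

m = 5
n = 2 ** m                                      # tested here with n = 32
factors = hadamard_butterfly_factors(m)
assert np.allclose(np.linalg.multi_dot(factors), hadamard(n))

dense_cost = n * n                              # cost of a dense multiply
fast_cost = sum(np.count_nonzero(F) for F in factors)   # = 2 * n * log2(n)
print(dense_cost, fast_cost)
```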

18-19. Example 2: Image Denoising with a Learned Fast Transform
• Patch-based dictionary learning (n = 8×8 pixels)
• Comparison using SMALLbox (small-project.eu)
• Learned dictionaries: complexity $O(n^2)$ (dense) vs $O(n\log_2 n)$ (learned fast transform)

20-21. Comparison with Sparse K-SVD (KSVDS)
• Constraint $D = D_0 S$, where the base dictionary $D_0$ is very close to the DCT

22. Statistical guarantees

23-28. Theoretical Guarantees?
• Given N training samples in X: $\hat{D}_N \in \arg\min_D F_X(D)$
• Applications: compression, denoising, calibration, source localization, neural coding, inverse problems, ...
• Two viewpoints:
✓ Excess risk analysis (~ Machine Learning):
✦ no «ground truth dictionary»
✦ goal = performance generalization: $\mathbb{E}\, F_X(\hat{D}_N) \le \min_D \mathbb{E}\, F_X(D) + \eta_N$
✦ question: «how many training samples?»
✦ [Maurer and Pontil, 2010; Vainsencher & al., 2010; Mehta and Gray, 2012; G. & al 2013]
✓ Identifiability analysis (~ Signal Processing):
✦ ground truth model: $x = D_0 z + \varepsilon$
✦ goal = dictionary estimation: control $\|\hat{D}_N - D_0\|_F$
✦ question: «what recovery conditions?»
✦ [Independent Component Analysis, e.g. the book by Comon & Jutten 2011]

29-33. Theorem: Excess Risk Control
• Assume:
✓ X obtained from N i.i.d. draws, bounded: $P(\|x\|_2 \le 1) = 1$
✓ Penalty function $\phi(z)$:
✦ non-negative, with its minimum at zero
✦ lower semi-continuous
✦ coercive
✓ Constraint set $\mathcal{D}$ with (upper box-counting) dimension h
✦ typically h = dK (d = signal dimension, K = number of atoms)
• Then: with probability at least $1 - 2e^{-x}$ on X,
$\mathbb{E}\, F_X(\hat{D}_N) \le \min_D \mathbb{E}\, F_X(D) + \eta_N, \qquad \eta_N \le C\,\sqrt{\frac{(h + x)\log N}{N}}$
• [G. & al, Sample Complexity of Dictionary Learning and Other Matrix Factorizations, 2013, arXiv/HAL]
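To get a feel for the rate in the theorem, the following sketch evaluates $\sqrt{(h+x)\log N / N}$ for a hypothetical patch-based setting (d = 64, K = 256, so h = dK); the constant C is left unspecified by the statement as quoted, so only the scaling is shown.

```python
import numpy as np

d, K = 64, 256                 # hypothetical: 8x8 patches, 256 atoms
h = d * K                      # box-counting dimension of the constraint set
x = 5.0                        # confidence parameter: failure prob. <= 2*exp(-x)

for N in [10**4, 10**5, 10**6, 10**7]:
    rate = np.sqrt((h + x) * np.log(N) / N)   # eta_N <= C * rate (C unspecified)
    print(f"N = {N:>8d}   sqrt((h+x) log N / N) = {rate:.3f}")
```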

34. A word about the proof
• Classical approach based on three ingredients:
✓ concentration of $F_X(D)$ around its mean $\mathbb{E}_x f_x(D)$
✓ Lipschitz behaviour of $D \mapsto F_X(D)$ (main technical contribution, under assumptions on the penalty)
✓ union bound using covering numbers
• High-dimensional scaling $d \to \infty$:
✓ dimension-dependent bound: $O\!\left(\sqrt{\tfrac{dK \log N}{N}}\right)$
✓ with Rademacher complexities & Slepian's lemma, known dimension-independent bounds can be recovered
✓ e.g., for PCA: $O\!\left(\sqrt{\tfrac{K^2}{N}}\right)$

35. Versatility of the Sample Complexity Results
• General penalty functions:
✦ ℓ1 norm / mixed norms / ℓp quasi-norms
✦ ... but also non-coercive penalties (with an additional RIP on the constraint set): s-sparse constraint, non-negativity
• General constraint sets:
✦ unit norm / sparse / shift-invariant / tensor product / tight frame ...
✦ «complexity» captured by the box-counting dimension
• «Distribution free»:
✦ bounded samples $P(\|x\|_2 \le 1) = 1$
✦ ... but also sub-Gaussian: $P(\|x\|_2 \ge At) \le \exp(-t)$ for $t \ge 1$
• Selected covered examples: PCA / NMF / K-means / sparse PCA

36. Identifiability analysis? Empirical findings

37-41. Numerical Example (2D)
• Data: $X = D_0 Z_0$, N = 1000 Bernoulli-Gaussian training samples
• The dictionary $D_{\theta_0, \theta_1}$ has two unit-norm atoms in the plane, parameterized by their angles $\theta_0, \theta_1$; the cost $F_X(D_{\theta_0, \theta_1})$ is evaluated over all angle pairs
• [Figure: scatter plot of the 2D training samples and level sets of $F_X(D_{\theta_0, \theta_1})$]
• Symmetry = permutation ambiguity
• Empirical observations:
a) global minima match the angles of the original basis
b) there is no other local minimum
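A sketch of how such a 2D training set can be generated; the activation probability p of the Bernoulli-Gaussian coefficients and the ground-truth angles are assumed values, not taken from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 1000, 0.5                     # p = probability that a coefficient is active (assumed)
theta0, theta1 = 0.0, np.pi / 3      # hypothetical ground-truth atom angles
D0 = np.array([[np.cos(theta0), np.cos(theta1)],
               [np.sin(theta0), np.sin(theta1)]])   # two unit-norm atoms in 2D

# Bernoulli-Gaussian coefficients: each entry is active with probability p
Z0 = rng.standard_normal((2, N)) * (rng.random((2, N)) < p)
X = D0 @ Z0                          # noiseless training samples X = D0 Z0

mu = abs(np.cos(theta1 - theta0))    # coherence of the two atoms (used on the next slide)
print("coherence mu =", mu)
```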

42-44. Sparsity vs Coherence (2D)
• The experiment is repeated while varying the sparsity level p (from sparse to weakly sparse) and the coherence $\mu = |\cos(\theta_1 - \theta_0)|$ (from incoherent, $\mu = 0$, to coherent, $\mu = 1$)
• [Figure: empirical probability of success, for «ground truth = local min» and «ground truth = global min»; a large region with no spurious local minimum]
• Rule of thumb: perfect recovery if
a) incoherence: $\mu < 1 - p$
b) enough training samples (N large enough)

45. Empirical Findings
• Stable & robust dictionary identification:
✓ global minima often match the ground truth
✓ often, there is no spurious local minimum
• Role of the parameters?
✓ sparsity level?
✓ incoherence of D?
✓ noise level?
✓ presence / nature of outliers?
✓ sample complexity (number of training samples)?

46-48. Identifiability Analysis: Overview

Signal model         | [G. & Schnass 2010] | [Geng & al 2011] | [Jenatton, Bach & G.]
overcomplete (d < K) | no                  | yes              | yes
outliers             | yes                 | no               | yes
noise                | no                  | no               | yes

• Cost functions considered: $\min_{D,Z} \|Z\|_1$ s.t. $DZ = X$ (noiseless case), and $\min_D F_X(D)$ with $\phi(z) = \lambda\|z\|_1$
• See also: [Spielman & al 2012; Agarwal & al 2013/2014; Arora & al 2013/2014; Schnass 2013; Schnass 2014]

49-51. Theoretical Guarantees? (recap)
• Back to the second question: given the ground-truth model $x = D_0 z + \varepsilon$, the goal is dictionary estimation, i.e. controlling $\|\hat{D}_N - D_0\|_F$, and the question is: what recovery conditions? ✓
• Identifiability analysis (~ Signal Processing) [Independent Component Analysis, e.g. the book by Comon & Jutten 2011]

52-53. «Ground Truth» = Sparse Signal Model
$x = \sum_{i \in J} z_i d_i + \varepsilon = D_J z_J + \varepsilon$
• Random support $J \subset [1, K]$, $\#J = s$
• Coefficient vector bounded above and bounded away from zero: $P(\|z_J\|_2 > M_z) = 0$ and $P(\min_{j \in J} |z_j| < \underline{z}) = 0$
• Bounded & white noise: $P(\|\varepsilon\|_2 > M_\varepsilon) = 0$ (+ second-moment assumptions)
• NB: z is not required to have i.i.d. entries
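A minimal generator for this signal model; the numerical bounds `z_min`, `M_z`, `M_eps` and the unit-norm dictionary in the commented example are placeholder choices, since the model only requires the stated boundedness properties.

```python
import numpy as np

def draw_sample(D0, s, z_min=0.1, M_z=1.0, M_eps=0.05, rng=None):
    """Draw x = D_J z_J + eps: random support J with #J = s, coefficients with
    |z_j| >= z_min and ||z_J||_2 <= M_z (requires z_min <= M_z / sqrt(s)),
    and bounded noise ||eps||_2 <= M_eps."""
    rng = np.random.default_rng(rng)
    d, K = D0.shape
    J = rng.choice(K, size=s, replace=False)                 # random support J, #J = s
    mags = rng.uniform(z_min, M_z / np.sqrt(s), size=s)      # bounded above and below
    z_J = mags * rng.choice([-1.0, 1.0], size=s)
    eps = rng.standard_normal(d)
    eps *= min(1.0, M_eps / np.linalg.norm(eps))             # enforce ||eps||_2 <= M_eps
    return D0[:, J] @ z_J + eps, J, z_J

# Example with a random unit-norm dictionary (hypothetical sizes):
# rng = np.random.default_rng(0)
# D0 = rng.standard_normal((16, 32)); D0 /= np.linalg.norm(D0, axis=0)
# x, J, z_J = draw_sample(D0, s=3, rng=0)
```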

54-59. Theorem: Robust Local Identifiability [Jenatton, Bach & G. 2012]
• Assume:
✦ dictionary with small coherence: $\mu(D_0) = \max_{i \ne j} |\langle d_i, d_j \rangle| \in [0, 1]$
✦ s-sparse coefficient model (no outlier, no noise), with $\mu(D_0)\, |||D_0|||_2 \lesssim \frac{1}{s}$
• Then: consider $F_X(D) = \min_Z \frac{1}{2}\|X - DZ\|_F^2 + \lambda \|Z\|_{1,1}$
✓ for any small enough $\lambda$, with high probability on X, there is a local minimum $\hat{D}$ of $F_X(D)$ such that $\|\hat{D} - D_0\|_F \le O(\lambda\, s\, \mu\, |||D_0|||_2)$
• + stability to noise
• + finite sample results
• + robustness to outliers
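The two quantities appearing in the assumptions and in the bound, the coherence $\mu(D_0)$ and the operator norm $|||D_0|||_2$, are easy to compute; here is a small check on a random unit-norm dictionary with hypothetical sizes.

```python
import numpy as np

def coherence(D):
    """mu(D) = max_{i != j} |<d_i, d_j>| for a dictionary with unit-norm columns."""
    G = np.abs(D.T @ D)
    np.fill_diagonal(G, 0.0)
    return G.max()

def operator_norm(D):
    """|||D|||_2 = largest singular value, as used in the theorem's bound."""
    return np.linalg.norm(D, 2)

rng = np.random.default_rng(0)
D0 = rng.standard_normal((32, 48))
D0 /= np.linalg.norm(D0, axis=0)        # unit-norm atoms
print("mu(D0) =", coherence(D0), "  |||D0|||_2 =", operator_norm(D0))
```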
