Learning low-dimensional models with penalized matrix factorization
Rémi Gribonval - PANAMA Project-team, INRIA Rennes - Bretagne Atlantique, France
remi.gribonval@inria.fr
Overview
✓ Context: inverse problems, sparsity & low-dimensional models
✓ Dictionary learning as penalized matrix factorization: algorithms and learned fast transforms
✓ Statistical guarantees: sample complexity and identifiability
Main Credits
small-project.eu
Inverse problems
Inverse Problems in Image Processing
+ Compression, Source Localization, Separation, Compressed Sensing ...
Inverse Problems in Acoustics
✓ localize emitting sources ✓ reconstruct emitted signals ✓ extrapolate acoustic field
$y = Mx$, with $y \in \mathbb{R}^m$ the time series recorded at the sensors and $x \in \mathbb{R}^N$ the (discretized) spatio-temporal acoustic field
Inverse Problems & Signal Models
Observation Domain
Need for a model = prior knowledge
Sparse signal models
Typical Sparse Models
[Figure: typical sparse models, ANALYSIS and SYNTHESIS views; zero coefficients shown in black or white depending on the panel]
Mathematical Expression
(ex: time-frequency atoms, wavelets)
Dictionary of atoms (Mallat & Zhang 93): $x \in \mathbb{R}^d$, $x \approx \sum_k z_k d_k = Dz$, with sparsity measured by $\|z\|_0 = \sum_k |z_k|^0 = \mathrm{card}\{k : z_k \neq 0\}$
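As a minimal numerical illustration of this synthesis model (the Gaussian random dictionary and the dimensions below are arbitrary choices for the example, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, s = 64, 128, 5                      # signal dimension, number of atoms, sparsity

D = rng.standard_normal((d, K))
D /= np.linalg.norm(D, axis=0)            # unit-norm atoms d_k

z = np.zeros(K)
support = rng.choice(K, size=s, replace=False)
z[support] = rng.standard_normal(s)       # s-sparse coefficient vector

x = D @ z                                 # synthesis: x = D z
print("||z||_0 =", np.count_nonzero(z))   # card{k : z_k != 0} = 5
```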
CoSparse Models and Inverse Problems
Observation Domain
Algorithmic Principles
$$\hat z = \arg\min_z \tfrac{1}{2}\|y - MDz\|_2^2 + \lambda\|z\|_p^p, \qquad \hat x = D\hat z$$
with the iteration
$$\hat z^{i+1/2} \leftarrow \hat z^i + \mu D^T M^T (y - MD\hat z^i), \qquad \hat z^{i+1} \leftarrow \mathrm{Threshold}_p(\hat z^{i+1/2}, \lambda)$$
✓ gradient descent to improve data fidelity ✓ thresholding to promote (structured) sparsity
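A minimal NumPy sketch of this iteration for p = 1, in which case the threshold is soft thresholding and the scheme is the classical ISTA; the step-size rule and function names below are my own choices, and M, D are assumed to be available as dense arrays:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (promotes sparsity)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(y, M, D, lam, n_iter=200):
    """Solve min_z 0.5 * ||y - M D z||_2^2 + lam * ||z||_1 by gradient + thresholding steps."""
    A = M @ D
    mu = 1.0 / np.linalg.norm(A, 2) ** 2        # step size from the spectral norm of MD
    z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z_half = z + mu * A.T @ (y - A @ z)     # gradient step: improve data fidelity
        z = soft_threshold(z_half, mu * lam)    # thresholding step: promote sparsity
    return D @ z, z                             # reconstruction x_hat = D z_hat
```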
Example: «Audio Inpainting»
[Figure: spectrogram (time vs frequency) and clipped waveform; degradations include clicks, limited bandwidth, holes (packet loss), and clipping]
http://people.rennes.inria.fr/Srdan.Kitic/?page_id=40
CoSparse models and inverse problems
Observation Domain
Dictionary learning for sparse modeling
Sparse Signal Model
[Figure: signal/image $x$ ≈ (overcomplete) dictionary of atoms $D$ (wavelets ...) × sparse representation coefficients $z$]
From Analytic to Learned Dictionaries
✓ Analytic dictionaries (Fourier, wavelets ...) exist for signals and images, but many other data types call for adapted atoms: hyperspectral / satellite imaging, spherical geometry (cosmology, HRTF for 3D audio), graph data (social networks, brain connectivity), vector-valued data (diffusion tensors)
✓ Data-driven (learned) dictionaries
A Quest for the Perfect Sparse Model
✓ Training database, patch extraction, training patches: $x_n = Dz_n$, $1 \le n \le N$, with unknown dictionary $D$ and unknown sparse coefficients $z_n$
✓ Sparse learning yields a dictionary $\hat D$ whose atoms are edge-like [Olshausen & Field 96, Aharon et al 06, Mairal et al 09, ...] or shifts of edge-like motifs [Blumensath 05, Jost et al 05, ...]
Dictionary learning as sparse matrix factorization
Dictionary Learning = Sparse Matrix Factorization
✓ $x_n \approx Dz_n \in \mathbb{R}^d$: stacking the training samples, $X \approx DZ$ with $X = [x_1, x_2, \ldots, x_N]$ and $Z = [z_1, z_2, \ldots, z_N]$
✓ $X$ is $d \times N$, $D$ is $d \times K$, $Z$ is $K \times N$ with s-sparse columns (at most s nonzero entries per column)
✓ sounds familiar? similar to ICA! $X = AS$
Many Approaches
✦ [see e.g. book by Comon & Jutten 2011]
✦ [Bach et al., 2008; Bradley and Bagnell, 2009]
✦ [Krause and Cevher, 2010]
✦ [Zhou et al., 2009]
✦ [Olshausen and Field, 1997; Pearlmutter & Zibulevsky 2001; Aharon et al. 2006; Lee et al., 2007; Mairal et al., 2010 (... and many other authors)]
Nonconvex optimization for dictionary learning
Sparse Coding Objective Function
✓ sparse regression: $f_{x_n}(D) = \min_{z_n} \tfrac{1}{2}\|x_n - Dz_n\|_2^2 + \phi(z_n)$
✦ LASSO/Basis Pursuit: $\phi(z) = \lambda\|z\|_1$
✦ Ideal s-sparse approximation: $\phi(z) = \chi_s(z) = \begin{cases} 0, & \|z\|_0 \le s \\ +\infty, & \text{otherwise} \end{cases}$
Sparse Coding Objective Function
✓ sparse regression over the whole training set:
$$F_X(D) = \frac{1}{N}\sum_{n=1}^{N} f_{x_n}(D), \qquad f_{x_n}(D) = \min_{z_n} \tfrac{1}{2}\|x_n - Dz_n\|_2^2 + \phi(z_n)$$
$$\propto \min_Z \tfrac{1}{2}\|X - DZ\|_F^2 + \Phi(Z)$$
Learning = Constrained Minimization
$$\min_{D \in \mathcal{D}} F_X(D) \;\propto\; \min_{D \in \mathcal{D},\,Z} \tfrac{1}{2}\|X - DZ\|_F^2 + \Phi(Z)$$
✓ without a constraint, the penalty can be circumvented by the scale ambiguity $D \to \infty$, $Z \to 0$
✓ constraint set: $\mathcal{D} = \{D = [d_1, \ldots, d_K] : \forall k,\ \|d_k\|_2 = 1\}$
A versatile matrix factorization framework: different choices of penalty $\Phi$ and constraint set $\mathcal{D}$ recover classical models
✦ penalty: L1 norm; constraint: unit-norm dictionary (sparse dictionary learning)
✦ penalty: indicator function of canonical basis vectors; constraint: none (K-means)
✦ penalty: indicator function of non-negative coefficients; constraint: unit-norm non-negative dictionary (NMF)
✦ penalty: none; constraint: dictionary with orthonormal columns (PCA)
Algorithms for penalized matrix factorization
Principle: Alternate Optimization
$$\min_{D,Z} \tfrac{1}{2}\|X - DZ\|_F^2 + \Phi(Z)$$
✓ Update coefficients given current dictionary $D$: $\min_{z_i} \tfrac{1}{2}\|x_i - Dz_i\|_2^2 + \phi(z_i)$
✓ Update dictionary given current coefficients $Z$: $\min_D \tfrac{1}{2}\|X - DZ\|_F^2$
(a NumPy sketch combining both steps follows the Dictionary Update slide below)
Coefficient Update = Sparse Coding
$$\min_{z_i} \tfrac{1}{2}\|x_i - Dz_i\|_2^2 + \phi(z_i)$$
✓ Batch: for all training samples i at each iteration ✓ Online: for one (randomly selected) training sample i
✓ L1 minimization, (Orthogonal) Matching Pursuit, ...
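For the ideal s-sparse penalty, a greedy alternative is Orthogonal Matching Pursuit; a minimal sketch, assuming unit-norm atoms (function and variable names are mine, not from the slides):

```python
import numpy as np

def omp(x, D, s):
    """Orthogonal Matching Pursuit: greedy s-sparse coding of x in the dictionary D."""
    K = D.shape[1]
    residual = x.copy()
    support = []
    z = np.zeros(K)
    for _ in range(s):
        k = int(np.argmax(np.abs(D.T @ residual)))   # atom most correlated with residual
        if k not in support:
            support.append(k)
        # jointly re-fit the selected coefficients by least squares on the support
        z_support, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        z = np.zeros(K)
        z[support] = z_support
        residual = x - D @ z
    return z
```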
Dictionary Update
$$\min_D \tfrac{1}{2}\|X - DZ\|_F^2$$
✓ Method of Optimal Directions (MOD) [Engan et al., 1999]: $\hat D = X \cdot \mathrm{pinv}(Z) = \arg\min_D \|X - DZ\|_F^2$
✓ K-SVD: atom-wise update via PCA/SVD of the residual; coefficients are jointly updated [Aharon et al. 2006]
✓ Online L1: stochastic gradient [Engan & al 2007, Mairal et al., 2010]
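A sketch of the whole alternation announced above, combining a batch ISTA coefficient update with the MOD dictionary update and a renormalization onto unit-norm atoms; the penalty weight, iteration counts and initialization are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def dictionary_learning_mod(X, K, lam=0.1, n_outer=30, n_inner=50, seed=0):
    """Alternate optimization sketch: ISTA sparse coding + MOD dictionary update."""
    rng = np.random.default_rng(seed)
    d, N = X.shape
    D = rng.standard_normal((d, K))
    D /= np.linalg.norm(D, axis=0)                     # unit-norm atoms
    Z = np.zeros((K, N))
    for _ in range(n_outer):
        # --- coefficient update (batch sparse coding by ISTA) ---
        mu = 1.0 / np.linalg.norm(D, 2) ** 2
        for _ in range(n_inner):
            Z_half = Z + mu * D.T @ (X - D @ Z)
            Z = np.sign(Z_half) * np.maximum(np.abs(Z_half) - mu * lam, 0.0)
        # --- dictionary update (MOD): D = X * pinv(Z) ---
        D = X @ np.linalg.pinv(Z)
        norms = np.maximum(np.linalg.norm(D, axis=0), 1e-12)
        D /= norms                                     # project back onto unit-norm columns
        Z *= norms[:, None]                            # rescale rows so that D @ Z is unchanged
    return D, Z
```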
... but also
✓ Non-negativity (NMF): multiplicative update [Lee & Seung 1999]
✓ Known rows up to gains (blind calibration), $D = \mathrm{diag}(g)D_0$: convex formulation [G & al 2012, Bilen & al 2013]
✓ Known rows up to permutation (cable chaos), $D = \Pi D_0$: branch & bound [Emiya & al, 2014]
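For the non-negative case, a sketch of the Lee & Seung multiplicative updates for the Frobenius-norm objective (the random initialization and the small eps safeguard are illustrative choices; X is assumed entrywise non-negative):

```python
import numpy as np

def nmf_multiplicative(X, K, n_iter=200, eps=1e-12, seed=0):
    """Lee & Seung multiplicative updates for min_{D, Z >= 0} ||X - D Z||_F^2."""
    rng = np.random.default_rng(seed)
    d, N = X.shape
    D = rng.random((d, K)) + eps
    Z = rng.random((K, N)) + eps
    for _ in range(n_iter):
        Z *= (D.T @ X) / (D.T @ D @ Z + eps)   # coefficient update (keeps Z >= 0)
        D *= (X @ Z.T) / (D @ Z @ Z.T + eps)   # dictionary update (keeps D >= 0)
    return D, Z
```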
Analytic vs Learned Dictionaries: Learning Fast Transforms
Ph.D. of Luc Le Magoarou
Analytic vs Learned Dictionaries
✓ Analytic (Fourier, wavelets, ...): no adaptation to training data, low computational complexity
✓ Learned: adapted to training data, high computational complexity
Best of both worlds?
Sparse-KSVD
✓ choose a reference (fast) dictionary $D_0$
✓ learn with the constraint $D = D_0 S$ where $S$ is sparse: $X \approx D_0 S Z$, two unknown factors, a strong prior!
[Rubinstein, Zibulevsky & Elad, "Double Sparsity: Learning Sparse Dictionaries for Sparse Signal Approximation," IEEE TSP, vol. 58, no. 3, pp. 1553–1564, 2010]
Speed = Factorizable Structure
Learning Fast Transforms = Chasing Butterflies
$$D = \prod_{j=1}^{M} S_j, \quad \text{with sparse factors } S_j$$
✓ covers standard fast transforms ✓ more flexible, better adaptation to training data
✓ benefits:
✦ Speed: inverse problems and more
✦ Storage: compression
✦ Statistical significance / sample complexity: denoising
✓ Nonconvex optimization algorithm: PALM
✦ guaranteed convergence to a stationary point
✓ Hierarchical strategy
Example 1: Reverse-Engineering the Fast Hadamard Transform
✓ complexity $O(n^2)$ (dense) vs $2n\log_2 n$ (factorized) ✓ tested up to n = 1024
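A sketch of what such a factorizable structure looks like in the Hadamard case: the Sylvester/Kronecker construction writes the $2^m \times 2^m$ Hadamard matrix as a product of $m = \log_2 n$ sparse «butterfly» factors with $2n$ nonzeros each, so applying the transform costs about $2n\log_2 n$ operations instead of $n^2$. This only illustrates the target structure, not the learning algorithm of the cited work:

```python
import numpy as np

H2 = np.array([[1.0, 1.0], [1.0, -1.0]])

def hadamard_butterfly_factors(m):
    """Factor the 2^m x 2^m Hadamard matrix into m sparse 'butterfly' factors."""
    return [np.kron(np.kron(np.eye(2 ** j), H2), np.eye(2 ** (m - 1 - j)))
            for j in range(m)]

m = 3                                               # n = 8
factors = hadamard_butterfly_factors(m)
H = np.linalg.multi_dot(factors)                    # product of the sparse factors

H_ref = np.array([[1.0]])                           # Sylvester construction: m-fold Kronecker power of H2
for _ in range(m):
    H_ref = np.kron(H_ref, H2)

print(np.allclose(H, H_ref))                        # True
print([int(np.count_nonzero(S)) for S in factors])  # [16, 16, 16], i.e. 2n nonzeros per factor
```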
Example 2: Image Denoising with Learned Fast Transform
✓ small-project.eu
✓ complexity $O(n \log_2 n)$ instead of $O(n^2)$
Comparison with Sparse KSVD (KSVDS)
✓ structure $D = D_0 S$, with $D_0$ very close to the DCT
Statistical guarantees
Theoretical Guarantees ?
$$\hat D_N \in \arg\min_D F_X(D)$$
✓ Source localization, neural coding ...: a «ground truth» dictionary exists, and the goal is dictionary estimation (~Signal Processing): control $\|\hat D_N - D_0\|_F$ under the model $x = D_0 z + \varepsilon$
✦ [Independent Component Analysis, e.g. book Comon & Jutten 2011]
✓ Compression, denoising, calibration, inverse problems ...: no «ground truth dictionary», and the goal is performance generalization (~Machine Learning): $\mathbb{E}F_X(\hat D_N) \le \min_D \mathbb{E}F_X(D) + \eta_N$
✦ [Maurer and Pontil, 2010; Vainsencher & al., 2010; Mehta and Gray, 2012; G. & al 2013]
Theorem: Excess Risk Control
✓ X obtained from N draws, i.i.d., bounded: $P(\|x\|_2 \le 1) = 1$
✓ Penalty function $\phi(z)$:
✦ non-negative and minimum at zero
✦ lower semi-continuous
✦ coercive
✓ Constraint set $\mathcal{D}$: (upper box-counting) dimension $h$
✦ typically $h = dK$, with $d$ = signal dimension, $K$ = number of atoms
✓ Then, with probability at least $1 - 2e^{-x}$,
$$\mathbb{E}F_X(\hat D_N) \le \min_D \mathbb{E}F_X(D) + \eta_N, \qquad \eta_N \le C\sqrt{\frac{(h + x)\log N}{N}}$$
[G. & al, Sample Complexity of Dictionary Learning and Other Matrix Factorizations, 2013, arXiv/HAL]
A word about the proof
✓ Concentration of $F_X(D)$ around its mean $\mathbb{E}_x f_x(D)$
✓ Lipschitz behaviour of $D \mapsto F_X(D)$
➡ main technical contribution, under assumptions on the penalty
✓ Union bound using covering numbers of $\mathcal{D}$
✓ Dimension-dependent bound: $O\!\left(\sqrt{\tfrac{dK\log N}{N}}\right)$
✓ With Rademacher complexities & Slepian's lemma, can recover known dimension-independent bounds as $d \to \infty$
✓ E.g., for PCA: $O\!\left(\sqrt{\tfrac{K^2}{N}}\right)$
Versatility of the Sample Complexity Results
✦ penalties: l1 norm / mixed norms / lp quasi-norms
✦ ... but also non-coercive penalties (with an additional RIP assumption on the constraint set)
✦ constraint sets: unit norm / sparse / shift-invariant / tensor product / tight frame ...
✦ «complexity» captured by the (upper box-counting) dimension
✦ bounded samples: $P(\|x\|_2 \le 1) = 1$
✦ ... but also sub-Gaussian: $P(\|x\|_2 \ge At) \le \exp(-t)$ for $t \ge 1$
✦ covers PCA / NMF / K-Means / sparse PCA
Identifiability analysis ? Empirical findings
Numerical Example (2D)
✓ training data $X = D_0 Z_0$; candidate bases $D_{\theta_0,\theta_1}$ parametrized by two angles $(\theta_0, \theta_1)$
✓ criterion $F_X(D)$, here $\|D_{\theta_0,\theta_1}^{-1} X\|_1$
✓ symmetry = permutation ambiguity
Empirical observations: a) global minima match the angles of the original basis; b) there is no other local minimum
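A sketch reproducing this 2D experiment, assuming Bernoulli-Gaussian coefficients (as in the figure of the next slide) and a basis parametrized by two angles; grid resolution and parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 1000, 0.5                                  # training samples, Bernoulli parameter

def basis(theta0, theta1):                        # unit-norm 2D basis D_{theta0, theta1}
    return np.array([[np.cos(theta0), np.cos(theta1)],
                     [np.sin(theta0), np.sin(theta1)]])

theta0_true, theta1_true = 0.3, 1.8
Z0 = (rng.random((2, N)) < p) * rng.standard_normal((2, N))   # Bernoulli-Gaussian
X = basis(theta0_true, theta1_true) @ Z0                      # X = D0 Z0

# scan the criterion ||D^{-1} X||_1 over a grid of angle pairs
angles = np.linspace(0.0, np.pi, 181)
crit = np.array([[np.abs(np.linalg.inv(basis(t0, t1)) @ X).sum()
                  if abs(np.sin(t1 - t0)) > 1e-6 else np.inf
                  for t1 in angles] for t0 in angles])
i, j = np.unravel_index(np.argmin(crit), crit.shape)
print(angles[i], angles[j])   # close to the true angles, up to permutation and sign
```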
Sparsity vs Coherence (2D)
[Figure: empirical probability of success over N = 1000 Bernoulli-Gaussian training samples, as a function of sparsity (sparse vs weakly sparse, parameter $p \le 1$) and coherence $\mu = |\cos(\theta_1 - \theta_0)|$ (incoherent vs coherent); regions where the ground truth is a local min, a global min, and where there is no spurious local min]
Rule of thumb: perfect recovery if a) incoherence $\mu < 1 - p$ and b) enough training samples (N large enough)
Empirical Findings
✓ Global minima often match the ground truth ✓ Often, there is no spurious local minimum
Open questions: ✓ sparsity level ? ✓ incoherence of D ? ✓ noise level ? ✓ presence / nature of outliers ? ✓ sample complexity (number of training samples) ?
Identifiability Analysis: Overview
✓ [G. & Schnass 2010], [Geng & al 2011], [Jenatton, Bach & G.] analyze different signal models and cost functions; only the latter handles noise
✓ cost functions range from the constrained formulation $\min_{D,Z}\|Z\|_1$ s.t. $DZ = X$ to the penalized $\min_D F_X(D)$ with $\phi(z) = \lambda\|z\|_1$
See also: [Spielman & al 2012, Agarwal & al 2013/2014, Arora & al 2013/2014, Schnass 2013, Schnass 2014]
Theoretical Guarantees ? (recap: from generalization bounds to dictionary estimation)
✓ Ground truth ✓ Goal = dictionary estimation (~Signal Processing): control $\|\hat D_N - D_0\|_F$ under the model $x = D_0 z + \varepsilon$
«Ground Truth» = Sparse Signal Model
✓ support $J \subset [1, K]$ with $|J| = s$, and $x = \sum_{i \in J} z_i d_i + \varepsilon = D_J z_J + \varepsilon$
✓ $P(\min_{i \in J} |z_i| < \underline{z}) = 0$, $P(\|z_J\|_2 > M_z) = 0$, $P(\|\varepsilon\|_2 > M_\varepsilon) = 0$ (+ second moment assumptions)
NB: z not required to have i.i.d. entries
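A sketch of a generator for this signal model; the uniform amplitude draw and the parameter names are illustrative choices (the model itself does not require i.i.d. entries), and it assumes $\underline{z} \le M_z/\sqrt{s}$ so that both amplitude constraints can hold simultaneously:

```python
import numpy as np

def draw_sparse_signal(D0, s, z_min=0.2, M_z=3.0, M_eps=0.01, rng=None):
    """Draw x = D_J z_J + eps with |J| = s, min_i |z_i| >= z_min,
    ||z_J||_2 <= M_z and ||eps||_2 <= M_eps (parameter names are illustrative)."""
    if rng is None:
        rng = np.random.default_rng()
    d, K = D0.shape
    J = rng.choice(K, size=s, replace=False)          # random support of size s
    amp_max = M_z / np.sqrt(s)                        # guarantees ||z_J||_2 <= M_z
    z_J = rng.uniform(z_min, amp_max, size=s) * rng.choice([-1.0, 1.0], size=s)
    eps = rng.standard_normal(d)
    eps *= (M_eps * rng.random()) / max(np.linalg.norm(eps), 1e-12)   # ||eps||_2 <= M_eps
    return D0[:, J] @ z_J + eps, J, z_J
```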
Theorem: Robust Local Identifiability
✦ dictionary with small coherence $\mu(D_0) = \max_{i \neq j} |\langle d_i, d_j\rangle| \in [0, 1]$, with sparsity level $s \lesssim \frac{1}{\mu(D_0)\,\|D_0\|_2}$ ($\|D_0\|_2$: operator norm)
✦ s-sparse coefficient model (no outlier, no noise)
✓ for any small enough $\lambda$, with high probability on $X$, there is a local minimum $\hat D$ of
$$F_X(D) = \min_Z \tfrac{1}{2}\|X - DZ\|_F^2 + \lambda\|Z\|_{1,1}$$
such that $\|\hat D - D_0\|_F \le O(\lambda s \mu \|D_0\|_2)$
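A minimal helper computing the coherence that appears in the sparsity condition, illustrated on a Hadamard-Dirac dictionary (the same family as in Example 2 below); the choice of dictionary here is mine, for illustration only:

```python
import numpy as np

def coherence(D):
    """mu(D) = max_{i != j} |<d_i, d_j>| after normalizing the columns."""
    D = D / np.linalg.norm(D, axis=0)
    G = np.abs(D.T @ D)
    np.fill_diagonal(G, 0.0)
    return float(G.max())

# Example: Hadamard-Dirac dictionary [I | H/sqrt(d)] in dimension d = 16
d = 16
H = np.array([[1.0]])
for _ in range(4):
    H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
D0 = np.hstack([np.eye(d), H / np.sqrt(d)])
print(coherence(D0))   # 1/sqrt(d) = 0.25
```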
Example 1: Orthonormal Dictionary
✓ $\mu(D_0) = 0$
✓ for $M_\varepsilon < \lambda \le \underline{z}/4$, with high probability on $X$, there is a local minimum $\hat D$ such that $\|\hat D - D_0\|_F \le O(\lambda s \mu \|D_0\|_2) = 0$
✓ exact recovery, with sample complexity $N = \Omega(d^4)$ for $D \in \mathbb{R}^{d \times d}$
Example 2: Guarantees vs Observations
[Figure: relative error vs number N of training signals for a d x 2d Hadamard-Dirac dictionary, and relative error vs noise level for a d x d orthonormal Hadamard dictionary, for d = 8, 16, 32 with random and oracle initializations; the predicted slope is indicated]
Flavor of the proof
✓ Minimum exactly at ground truth: via one-sided directional derivatives
✓ Minimum close to ground truth: show $F_X(D) - F_X(D_0)$ is zero at the ground truth and lower-bounded at radius r
Characterizing Local Minima (1)
[Figure: $F_X(D) - F_X(D_0)$ in a neighbourhood of the ground truth $D_0$]
✓ stable to dictionary perturbations and noise
✦ adaptation from [Fuchs, 2005; Zhao and Yu, 2006; Wainwright, 2009]
✦ uses a «guess» of the minimizer
Leveraging Sparse Recovery Results
$$f_{x_n}(D) = \min_{z_n} \tfrac{1}{2}\|x_n - Dz_n\|_2^2 + \lambda\|z_n\|_1$$
✓ under the model $x = D_0 z_0 + \varepsilon$, the «guess» is $f_x(D) = \phi_x(D \mid \mathrm{sign}(z_0))$, built from the closed-form candidate $\hat z = D_J^{+}x - \lambda (D_J^{\top}D_J)^{-1}\mathrm{sign}(z_0)$
Step 1: Asymptotic Regime
63
D0 Efx(D) − Efx(D0)
expectation
✓ more explicit form
D
Step 1: Asymptotic Regime
63
D0 Efx(D) − Efx(D0) Eφx(D|sign(z0)) − Eφx(D0|sign(z0))
expectation
✓ more explicit form ✓ lower bound
✦
where
D
Step 1: Asymptotic Regime
63
D0 Efx(D) − Efx(D0) Eφx(D|sign(z0)) − Eφx(D0|sign(z0)) akD D0kF (kD D0kF r0) r0 = O(λsµk |D0k |2)
expectation asymptotic bound
✓ more explicit form ✓ lower bound
✦
where
✓ there is a local minimum within radius r0
D
Step 1: Asymptotic Regime
63
D0 Efx(D) − Efx(D0) Eφx(D|sign(z0)) − Eφx(D0|sign(z0)) akD D0kF (kD D0kF r0) r0 = O(λsµk |D0k |2) ˆ D
expectation asymptotic bound
✓ local min with if
✓ local minimum if [noiseless case]
D
Step 2: Finite Sample Analysis
✓ bound the deviation of the empirical average from its expectation [Rademacher averages & Slepian's lemma]:
$$\sup_{D} |F_X(D) - \mathbb{E}f_x(D)| \;\le\; \eta_N = O\!\left(\sqrt{\tfrac{\log N}{N}}\right)$$
✓ combined with the lower bound on $F_X(D) - F_X(D_0)$, this gives a local minimum with $\|\hat D - D_0\|_F < r$ once $N = \Omega(dK^3 r^{-2})$, i.e. $N = \Omega(dK^3)$ for a fixed radius, with $D \in \mathbb{R}^{d \times K}$
Outliers ?
✓ inliers follow the sparse model $x = \sum_{i \in J} z_i d_i + \varepsilon = D_J z_J + \varepsilon$; the training set is $X = [X_{\mathrm{in}}, X_{\mathrm{out}}]$
✓ regimes considered: no noise / no outliers, no outliers, many small outliers, few large outliers
Step 3: Robustness to Outliers
✓ bound the inliers' empirical average of $F_X(D) - F_X(D_0)$ around the ground truth $D_0$
✓ if the outliers' admissible «energy» satisfies $\sum_{n \in \mathrm{outliers}} \|x_n\|^2 \le C(r)\,N_{\mathrm{inliers}}$, then w.h.p. there is a local minimum $\hat D$ with $\|\hat D - D_0\|_F < r$, for $r > r_0$
From Local to Global Guarantees ?
$$\min_{D \in \mathcal{D}} F_X(D)$$
[Figure: empirical probability that the ground truth is a local min / a global min, and that there is no spurious local min]
Related results: [Spielman & al 2012, Agarwal & al 2013/2014, Arora & al 2013/2014]
Recent results
[Table comparing recent theoretical results along the criteria: reference, overcompleteness, noise, outliers, global min / algorithm, polynomial algorithm, exactness (no noise, no outlier, finite n), sample complexity, admissible sparsity for exact recovery, coefficient model (main characteristics); first entry: Georgiev et al. [2005], $D \in \mathbb{R}^{m \times p}$, sparsity $k = m - 1$, combinatorial approach]
➡ POLYNOMIAL ALGORITHMS
To conclude ...
Summary
✓ dictionary learning: widely used in image processing and machine learning
✓ batch / online algorithms (K-SVD & al)
✓ sample complexity (also NMF, PCA, sparse PCA ...) [G. & al, Sample Complexity of Dictionary Learning and Other Matrix Factorizations, arXiv/HAL, December 2013]
✓ local stability and robustness guarantees
What’s next ?
✓ From local to global guarantees? Empirically yes ... on simple synthetic data
✓ Guarantees from cost functions to algorithms ?
✦ Dictionaries with intrinsically fast implementations [Le Magoarou & G., learning fast transforms, http://hal.inria.fr/hal-01010577, June 2014]
✦ Compressive learning with randomized generalized moments
✓ analysis sparsity, classification, clustering ...