Proximal Identification and Applications
1. Proximal Identification and Applications
Jérôme Malick, CNRS, Lab. J. Kuntzmann, Grenoble (France)
Workshop Optimization for Machine Learning, Luminy, March 2020
Talk based on material from joint work with G. Peyré, J. Fadili, G. Garrigos, F. Iutzeler, and D. Grishchenko

2-5. Example of stability

$$\min_{x \in \mathbb{R}^d} \ \tfrac{1}{2}\|Ax - y\|^2 + \lambda\|x\|_1 \qquad \text{(LASSO)}$$

Stability: the support of optimal solutions is stable under small perturbations.

[Figure: level sets of the LASSO objective on an instance with d = 2, animated over several slides; the optimal solution keeps the same support as the data is perturbed.]

More generally: [Lewis '02], sensitivity analysis of partly smooth functions (recall Clarice's talk this morning).
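This stability can be checked numerically. Below is a minimal sketch (not from the talk) using plain proximal gradient (ISTA); the 2×2 matrix A, the data y, and the level λ = 1 are made-up illustrative values:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t*||.||_1 (componentwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, n_iter=3000):
    # Proximal gradient (ISTA) for min_x 0.5*||Ax - y||^2 + lam*||x||_1.
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of x -> A^T(Ax - y)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - A.T @ (A @ x - y) / L, lam / L)
    return x

A = np.array([[3.0, 0.0], [0.0, 1.0]])   # toy instance with d = 2
y = np.array([3.0, 0.0])
lam = 1.0

x_star = lasso_ista(A, y, lam)                           # solution for the data y
x_pert = lasso_ista(A, y + np.array([0.05, 0.05]), lam)  # slightly perturbed data
support = lambda x: set(np.nonzero(np.abs(x) > 1e-8)[0])
print(support(x_star), support(x_pert))  # same support despite the perturbation
```

On this instance both solutions have support {0}: the perturbation moves the solution slightly but leaves it on the same manifold.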

6-7. Example of identification

$$\min_{x \in \mathbb{R}^d} \ \tfrac{1}{2}\|Ax - y\|^2 + \lambda\|x\|_1 \qquad \text{(LASSO)}$$

Identification: (proximal-gradient) algorithms produce iterates that eventually have the same support as the optimal solution.

[Figure: runs of two algorithms, Proximal Gradient and Accelerated Proximal Gradient, on the same instance with d = 2; both iterate sequences end up on the manifold containing x⋆.]

Well studied, see e.g. [Bertsekas '76], [Wright '96], [Lewis, Drusvyatskiy '13]...
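A minimal sketch of this identification behavior (not from the talk), again with plain proximal gradient on a small made-up instance: tracking the support of the iterates shows it freezes at supp(x⋆) after finitely many iterations.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t*||.||_1 (componentwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# min_x 0.5*||Ax - y||^2 + lam*||x||_1 on a small hypothetical instance.
A = np.array([[1.0, 0.3], [0.2, 1.0], [0.5, 0.1]])
y = np.array([2.0, 0.3, 1.0])
lam = 0.8
L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the smooth part

x = np.array([5.0, 5.0])                 # start far from the solution
supports = []
for _ in range(200):
    x = soft_threshold(x - A.T @ (A @ x - y) / L, lam / L)
    supports.append(tuple(np.nonzero(np.abs(x) > 1e-12)[0]))

# After finitely many iterations the support stops changing:
print(supports[-1])
```

Note that the zeros are exact: once an iterate is close enough to x⋆, the soft-thresholding step maps the inactive coordinates to 0 exactly, which is the identification phenomenon.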

8-9. Outline
1. General stability of regularized problems
2. Enlarged identification of proximal algorithms
3. Application: communication-efficient federated learning
4. Application: model consistency for regularized least-squares

10-11. General stability of regularized problems
Stability / sensitivity analysis

Parameterized composite optimization problem (smooth + nonsmooth):

$$\min_{x \in \mathbb{R}^d} \ F(x, p) + R(x)$$

Typically, the nonsmooth R traps solutions in low-dimensional manifolds.

Stability: optimal solutions lie on the same manifold, x⋆(p) ∈ M for p near p0. Studied in e.g. [Hare, Lewis '10], [Vaiter et al. '15], [Liang et al. '16]...

Example 1: R = ‖·‖₁, then supp(x⋆(p)) = supp(x⋆(p0)) for p near p0.
Example 2: R = ι_{B∞} (indicator function of the ℓ∞ ball), e.g. projection onto the ℓ∞ ball.

Many examples in machine learning...
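Example 2 can be illustrated with a few lines of code (an illustrative sketch, not from the talk): projecting onto the ℓ∞ ball is componentwise clipping, and nearby points project onto the same face (stratum) of the ball.

```python
import numpy as np

def proj_linf_ball(p):
    # Euclidean projection onto the unit l-infinity ball: componentwise clipping.
    return np.clip(p, -1.0, 1.0)

def stratum(x, tol=1e-12):
    # Encode the face of the l-infinity ball containing x: +1/-1 for each
    # saturated coordinate (|x_i| = 1), 0 for each strictly interior one.
    return tuple(int(np.sign(xi)) if abs(abs(xi) - 1.0) < tol else 0 for xi in x)

p0 = np.array([2.0, 0.3])                            # hypothetical parameter
x0 = proj_linf_ball(p0)                              # lands on the face {1} x (-1, 1)
x1 = proj_linf_ball(p0 + np.array([0.05, -0.05]))    # small perturbation of p0
print(stratum(x0), stratum(x1))                      # same stratum for both
```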

12-14. Structure of nonsmooth regularizers

Many of the regularizers used in machine learning or image processing have a strong primal-dual structure ("mirror-stratifiable", [Fadili, M., Peyré '18])... that can be exploited to get (enlarged) stability/identification results.

Examples (with the associated unit ball and the low-dimensional manifold M_x containing x):
- R = ‖·‖₁ (and ‖·‖∞ or other polyhedral gauges): M_x = {z : supp(z) = supp(x)}
- R(X) = Σᵢ |σᵢ(X)| = ‖σ(X)‖₁, the nuclear norm (aka trace norm): M_X = {Z : rank(Z) = rank(X)}
- R(x) = Σ_{b∈B} ‖x_b‖₂, the group-ℓ1 norm (e.g. R(x) = ‖x_{1,2}‖ + |x₃|): M_x = {0} × {0} × ℝ
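The low-dimensional structure of each example is visible in its proximal operator. The sketch below (illustrative values, not from the talk) implements the three proximal operators and shows the structure they enforce: sparsity, low rank, and group sparsity.

```python
import numpy as np

def prox_l1(v, t):
    # prox of t*||.||_1: soft-thresholding, yields a sparse output.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_nuclear(V, t):
    # prox of t*||.||_* (nuclear norm): soft-threshold the singular values,
    # which yields a low-rank output.
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

def prox_group_l1(v, groups, t):
    # prox of t * sum_b ||v_b||_2: block soft-thresholding, zeroes whole groups.
    out = np.zeros_like(v)
    for b in groups:
        nrm = np.linalg.norm(v[b])
        if nrm > t:
            out[b] = (1.0 - t / nrm) * v[b]
    return out

x = prox_l1(np.array([1.5, -0.2, 0.7]), 0.5)
print(np.nonzero(x)[0])                   # small coordinates are zeroed

X = prox_nuclear(np.diag([3.0, 1.0, 0.2]), 0.5)
print(np.linalg.matrix_rank(X))           # rank drops from 3 to 2

g = prox_group_l1(np.array([0.1, 0.2, 2.0]), [[0, 1], [2]], 0.5)
print(g)                                  # the first group is zeroed entirely
```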

15-16. Recall on stratifications

A stratification of a set D ⊂ ℝ^d is a (finite) partition M = {Mᵢ}ᵢ∈I with D = ⋃ᵢ∈I Mᵢ, whose so-called "strata" (e.g. smooth/affine manifolds) fit nicely:

$$M \cap \mathrm{cl}(M') \neq \emptyset \ \Longrightarrow\ M \subset \mathrm{cl}(M')$$

This relation induces a (partial) ordering M ≤ M'.

Example: B∞, the unit ℓ∞-ball in ℝ², admits a stratification with 9 (affine) strata: 4 vertices, 4 open edges, and the interior. With M₁ a vertex, M₂ and M₃ adjacent edges, and M₄ the interior: M₁ ≤ M₂ ≤ M₄ and M₁ ≤ M₃ ≤ M₄.

Other examples: "tame" sets (recall Edouard's talk).
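The 9-strata count for B∞ in ℝ² can be verified programmatically (a small sketch, not from the talk): each stratum corresponds to a sign pattern of saturated coordinates, and sampling the ball recovers all of them.

```python
import itertools
import numpy as np

def stratum(x, tol=1e-12):
    # A point of the unit l-infinity ball is classified by which coordinates
    # are saturated: +1 / -1 per saturated coordinate, 0 if strictly inside.
    return tuple(int(np.sign(xi)) if abs(abs(xi) - 1.0) < tol else 0 for xi in x)

# Sample the ball on a grid and collect the distinct strata.
grid = np.linspace(-1.0, 1.0, 21)
strata = {stratum(np.array([a, b])) for a, b in itertools.product(grid, grid)}
print(len(strata))   # 9 strata: 4 vertices, 4 edges, 1 interior
```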

17. Mirror-stratifiable regularizations

A (primal) stratification M = {Mᵢ}ᵢ∈I and a (dual) stratification M* = {M*ᵢ}ᵢ∈I in one-to-one decreasing correspondence through the transfer operator

$$J_R(S) = \bigcup_{x \in S} \mathrm{ri}\big(\partial R(x)\big)$$

Simple example: R = ι_{B∞} and R* = ‖·‖₁, with

$$J_R(M_i) = \bigcup_{x \in M_i} \mathrm{ri}\,\partial R(x) = \bigcup_{x \in M_i} \mathrm{ri}\, N_{B_\infty}(x) = M_i^* \qquad \text{and} \qquad J_{R^*}(M_i^*) = \bigcup_{u \in M_i^*} \mathrm{ri}\,\partial \|u\|_1 = M_i$$
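To make the transfer operator concrete (a standard computation, not spelled out on the slide): for the ℓ1 norm the subdifferential, and hence its relative interior, is explicit:

```latex
\partial \|u\|_1 = \bigl\{ v : v_i = \operatorname{sign}(u_i) \text{ for } i \in \operatorname{supp}(u),\ |v_i| \le 1 \text{ otherwise} \bigr\},
\qquad
\operatorname{ri}\,\partial \|u\|_1 = \bigl\{ v : v_i = \operatorname{sign}(u_i) \text{ for } i \in \operatorname{supp}(u),\ |v_i| < 1 \text{ otherwise} \bigr\}.
```

Taking the union over a stratum of the ℓ1 ball (fixed support and sign pattern) yields exactly the relative interior of a face of B∞, which realizes the one-to-one decreasing correspondence between the two stratifications.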

18. Enlarged stability result

Theorem (Fadili, M., Peyré '18). For the composite optimization problem (smooth + nonsmooth)

$$\min_{x \in \mathbb{R}^d} \ F(x, p) + R(x)$$

satisfying mild assumptions (unique minimizer x⋆(p0) at p0 and objective uniformly level-bounded in x), if R is mirror-stratifiable, then for p near p0:

$$M_{x^\star(p_0)} \ \leq\ M_{x^\star(p)} \ \leq\ J_{R^*}\big(M^*_{u^\star(p_0)}\big)$$

If R = ‖·‖₁, then supp(x⋆(p0)) ⊆ supp(x⋆(p)) ⊆ {i : |u⋆(p0)ᵢ| = 1}.
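The ℓ1 specialization can be checked numerically. The sketch below (hypothetical instance, not from the talk) solves a small LASSO problem, forms the dual certificate u⋆ = Aᵀ(y − Ax⋆)/λ, and verifies the inclusion supp(x⋆) ⊆ {i : |u⋆ᵢ| = 1}.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t*||.||_1 (componentwise soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Solve min_x 0.5*||Ax - y||^2 + lam*||x||_1 by proximal gradient.
A = np.array([[1.0, 0.3], [0.2, 1.0], [0.5, 0.1]])   # made-up instance
y = np.array([2.0, 0.3, 1.0])
lam = 0.8
L = np.linalg.norm(A, 2) ** 2
x = np.zeros(2)
for _ in range(2000):
    x = soft_threshold(x - A.T @ (A @ x - y) / L, lam / L)

u = A.T @ (y - A @ x) / lam              # dual certificate, |u_i| <= 1
supp = set(np.nonzero(np.abs(x) > 1e-10)[0])
saturated = set(np.nonzero(np.abs(np.abs(u) - 1.0) < 1e-6)[0])
print(supp, saturated)                   # supp(x*) is contained in the saturated set
```

By optimality, u_i = sign(x⋆_i) exactly on the support, so the support is always inside the saturated set; the theorem says the supports of nearby solutions stay sandwiched between these two index sets.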
