Proximal Identification and Applications J er ome MALICK CNRS, - PowerPoint PPT Presentation

Proximal Identification and Applications J´ erˆ ome MALICK CNRS, Lab. J. Kuntzmann, Grenoble (France) Workshop Optimization for Machine Learning – Luminy – March 2020 talk based on materiel from joint work with G. Peyr´ e J. Fadili G. Garrigos F. Iutzeler D. Grishchenko

Example of stability 1 2 � A x − y � 2 + λ � x � 1 min (LASSO) x ∈ R d Stability: the support of optimal solutions is stable under small perturbations Illustration (on an instance with d = 2 ) 4 20 2 4 1 6 0 1 0 . 5 0 . 5 2 2 4 6 1 0 . 5 1 2 4 6 0 2 4 10 4 6 6 1 0 − 2 2 0 1 0 30 − 2 − 1 0 1 2 3 4 1

Example of stability 1 2 � A x − y � 2 + λ � x � 1 min (LASSO) x ∈ R d Stability: the support of optimal solutions is stable under small perturbations Illustration (on an instance with d = 2 ) 4 20 4 4 2 1 2 4 1 6 0 6 0 . 5 10 1 1 0 2 . 5 0 . 5 4 2 2 2 0 1 0 . 5 . 5 4 2 6 1 0 . 5 1 2 6 1 4 4 2 2 6 0 2 0 10 6 4 10 4 4 4 6 1 0 6 6 1 0 − 2 − 2 10 2 0 30 2 0 1 0 30 − 2 − 1 0 1 2 3 4 − 2 − 1 0 1 2 3 4 1

Example of stability 1 2 � A x − y � 2 + λ � x � 1 min (LASSO) x ∈ R d Stability: the support of optimal solutions is stable under small perturbations Illustration (on an instance with d = 2 ) 4 20 4 6 4 2 2 1 4 1 6 0 1 0 . 5 1 0 . 5 0 . 5 2 6 4 2 2 5 0 . 1 0 2 4 6 1 1 0 . 5 0 . 5 1 0 2 2 2 4 6 4 10 6 0 2 0 4 10 4 6 20 4 6 10 6 30 1 0 − 2 − 2 20 2 0 1 0 40 30 0 3 − 2 − 1 0 1 2 3 4 − 2 − 1 0 1 2 3 4 1

Example of stability 1 2 � A x − y � 2 + λ � x � 1 min (LASSO) x ∈ R d Stability: the support of optimal solutions is stable under small perturbations Illustration (on an instance with d = 2 ) 4 20 4 6 4 2 2 1 4 1 6 0 1 0 . 5 1 0 . 5 0 . 5 2 6 4 2 2 5 0 . 1 0 2 4 6 1 1 0 . 5 0 . 5 1 0 2 2 2 4 6 4 10 6 0 2 0 4 10 4 6 20 4 6 10 6 30 1 0 − 2 − 2 20 2 0 1 0 40 30 0 3 − 2 − 1 0 1 2 3 4 − 2 − 1 0 1 2 3 4 More generally: [Lewis ’02] sensitivity analysis of partly-smooth functions (remind Clarice’s talk, this morning) 1

Example of identification 1 2 � A x − y � 2 + λ � x � 1 min (LASSO) x ∈ R d Identification: (proximal-gradient) algorithms produce iterates... ...that eventually have the same support as the optimal solution 9 . 1 2 1 Proximal Gradient 1 . 3 . 4 4 Accelerated Proximal Gradient 10 . 2 1 . 5 2 . 3 1 . 1 4 . 5 5 . 7 6 . 8 8 x ⋆ 1 1 . 1 3 . 4 0 . 5 2 . 3 0 2 . 3 5 . 7 4 . 5 3 . 4 5 − 0 . 5 . 7 3 . 4 6 . 8 4 . 5 8 − 1 − 1 0 1 2 3 4 Runs of two proximal-gradient algos (same instance with d = 2 ) 2

Example of identification 1 2 � A x − y � 2 + λ � x � 1 min (LASSO) x ∈ R d Identification: (proximal-gradient) algorithms produce iterates... ...that eventually have the same support as the optimal solution 9 . 1 2 1 Proximal Gradient 1 . 3 . 4 4 Accelerated Proximal Gradient 10 . 2 1 . 5 2 . 3 1 . 1 4 . 5 5 . 7 6 . 8 8 x ⋆ 1 1 . 1 3 . 4 0 . 5 2 . 3 0 2 . 3 5 . 7 4 . 5 3 . 4 5 − 0 . 5 . 7 3 . 4 6 . 8 4 . 5 8 − 1 − 1 0 1 2 3 4 Runs of two proximal-gradient algos (same instance with d = 2 ) Well-studied, see e.g. [Bertsekas ’76] , [Wright ’96] , [Lewis Drusvyatskiy ’13] ... 2

Outline General stability of regularized problems 1 Enlarged identification of proximal algorithms 2 Application: communication-efficient federated learning 3 Application: model consistency for regularized least-squares 4

General stability of regularized problems Stability or sensitivity analysis Parameterized composite optimization problem (smooth + nonsmooth) min F ( x , p ) + R ( x ) , x ∈ R d Typically nonsmooth R traps solutions in low-dimensional manifolds x ⋆ ( p ) ∈ M Stability: Optimal solutions lie on a manifold: for p ∼ p 0 Studied in e.g. [Hare Lewis ’10] [Vaiter et al ’15] [Liang et al ’16] ... � x ⋆ ( p ) � � x ⋆ ( p 0 ) � Example 1: R = � · � 1 , supp = supp 3

General stability of regularized problems Stability or sensitivity analysis Parameterized composite optimization problem (smooth + nonsmooth) min F ( x , p ) + R ( x ) , x ∈ R d Typically nonsmooth R traps solutions in low-dimensional manifolds x ⋆ ( p ) ∈ M Stability: Optimal solutions lie on a manifold: for p ∼ p 0 Studied in e.g. [Hare Lewis ’10] [Vaiter et al ’15] [Liang et al ’16] ... � x ⋆ ( p ) � � x ⋆ ( p 0 ) � Example 1: R = � · � 1 , supp = supp p p 0 Example 2: R = ι B ∞ (indicator function) projection onto the ℓ ∞ ball Many examples in machine learning... 3

General stability of regularized problems Structure of nonsmooth regularizers Many of the regularizers used in machine learning or image processing have a strong primal-dual structure (“mirror-stratifiable” [Fadili, M., Peyr´ e ’18] ) ...that can be exploit to get (enlarged) stability/identification results Examples: (associated unit ball and low-dimensional manifold where x belongs) R = � · � 1 ( and � · � ∞ or other polyedral gauges) x M x M x = { z : supp ( z )= supp ( x ) } 4

General stability of regularized problems Structure of nonsmooth regularizers Many of the regularizers used in machine learning or image processing have a strong primal-dual structure (“mirror-stratifiable” [Fadili, M., Peyr´ e ’18] ) ...that can be exploit to get (enlarged) stability/identification results Examples: (associated unit ball and low-dimensional manifold where x belongs) R = � · � 1 ( and � · � ∞ or other polyedral gauges) R ( X ) = � i | σ i ( X ) | = � σ ( X ) � 1 nuclear norm (aka trace-norm) x M x x M x M x = { z : supp ( z )= supp ( x ) } M x = { z : rank ( z )= rank ( x ) } 4

General stability of regularized problems Structure of nonsmooth regularizers Many of the regularizers used in machine learning or image processing have a strong primal-dual structure (“mirror-stratifiable” [Fadili, M., Peyr´ e ’18] ) ...that can be exploit to get (enlarged) stability/identification results Examples: (associated unit ball and low-dimensional manifold where x belongs) R = � · � 1 ( and � · � ∞ or other polyedral gauges) R ( X ) = � i | σ i ( X ) | = � σ ( X ) � 1 nuclear norm (aka trace-norm) R ( x ) = � b ∈B � x b � 2 group- ℓ 1 ( e.g. R ( x ) = � x 1 , 2 � + | x 3 | ) M x x x M x x M x M x = { z : supp ( z )= supp ( x ) } M x = { z : rank ( z )= rank ( x ) } M x = { 0 } × { 0 } × R 4

General stability of regularized problems Recall on stratifications A stratification of a set D ⊂ R d is a (finite) partition M = { M i } i ∈ I � D = M i i ∈ I with so-called “strata” (e.g. smooth/affine manifolds) which fit nicely: M ∩ cl ( M ′ ) � = ∅ M ⊂ cl ( M ′ ) = ⇒ Example: B ∞ the unit ℓ ∞ -ball in R 2 M 2 M 1 a stratification with 9 (affine) strata M 3 M 4 Other examples: “tame” sets, remind Edouard’s talk 5

General stability of regularized problems Recall on stratifications A stratification of a set D ⊂ R d is a (finite) partition M = { M i } i ∈ I � D = M i i ∈ I with so-called “strata” (e.g. smooth/affine manifolds) which fit nicely: M ∩ cl ( M ′ ) � = ∅ M ⊂ cl ( M ′ ) = ⇒ This relation induces a (partial) ordering M � M ′ Example: B ∞ the unit ℓ ∞ -ball in R 2 M 2 M 1 a stratification with 9 (affine) strata M 3 M 4 M 1 � M 2 � M 4 M 1 � M 3 � M 4 Other examples: “tame” sets, remind Edouard’s talk 5

General stability of regularized problems Mirror-stratifiable regularizations (primal) stratification M = { M i } i ∈ I and (dual) stratification M ∗ = { M ∗ i } i ∈ I in one-to-one decreasing correspondence � through the transfert operator J R ( S ) = ri ( ∂ R ( x )) x ∈ S R ∗ = � · � 1 Simple example: R = ι B ∞ J R M ∗ 2 M 2 M 1 M ∗ 1 M 3 M ∗ M 4 3 M ∗ 4 J R ∗ � � ri ∂ R ( x ) = ri N B ∞ ( x ) = M ∗ ri ∂ R ∗ ( x ) = J R ∗ ( M ∗ J R ( M i ) = M i = ri ∂ � x � 1 = i ) i x ∈ Mi x ∈ M ∗ i 6

General stability of regularized problems Enlarged stability result Theorem ( Fadili, M., Peyr´ e ’18 ) For the composite optimization problem (smooth + nonsmooth) min F ( x , p ) + R ( x ) , x ∈ R d satisfying mild assumptions (unique minimizer x ⋆ ( p 0 ) at p 0 and objective uniformly level-bounded in x ) , if R is mirror-stratifiable, then for p ∼ p 0 , M x ⋆ ( p 0 ) � M x ⋆ ( p ) � J R ∗ ( M ∗ u ⋆ ( p 0 ) ) If R = � · � 1 , then supp ( x ⋆ ( p 0 )) ⊆ supp ( x ⋆ ( p )) ⊆ { i : | u ⋆ ( p 0 ) i | = 1 } 7

Proximal Identification and Applications J er ome MALICK CNRS, - PowerPoint PPT Presentation

Proximal Identification and Applications J er ome MALICK CNRS, Lab. J. Kuntzmann, Grenoble (France) Workshop Optimization for Machine Learning Luminy March 2020 talk based on materiel from joint work with G. Peyr e J. Fadili

Convergence of perturbed Proximal Gradient algorithms Gersende Fort Institut de Math ematiques

Asymmetric Proximal Point Algorithms with Moving Proximal Centers Deren Han

On the Equivalence of Inexact Proximal ALM and ADMM for a Class of Convex Composite Programming

Nonnegative Tensor Factorization using a proximal algorithm: application to 3D fluorescence

Zone of Proximal Development Vygotsky called the range of developmentally appropriate expecta7ons

PT Considerations for the Nonoperatively Treated Proximal Humerus Fractures John Cavanaugh PT

Risk Factors In Proximal Humerus Fractures: Males Vs. Females Jerjes, W / Callear, J /

Proximal point algorithm in Hadamard spaces Miroslav Bacak T el ecom ParisTech

Stochastic Perturbations of Proximal-Gradient methods for nonsmooth convex optimization: the

Deep Unfolded Proximal Interior Point Algorithm for Image Restoration C. Bertocchi 1 , E.

Lecture: Fast Proximal Gradient Methods http://bicmr.pku.edu.cn/~wenzw/opt-2018-fall.html

Proximal Method with Contractions for Smooth Convex Optimization Nikita Doikov Yurii Nesterov

Deep Unfolding of a Proximal Interior Point Method for Image Restoration M.-C. Corbineau 1 in

Efficient Meta Learning via Minibatch Proximal Update Pan Zhou Joint work with Xiao-Tong Yuan,

RISK IDENTIFICATION Everything your competitor knows about Risk Identification on Software

The Proximal Workspace Architecture: A Latency- focused Approach to Supporting Context-Aware

Efficient Bayesian computation by proximal Markov chain Monte Carlo: when Langevin meets Moreau

iPiano: Inertial Proximal Algorithm for Non-Convex Optimization David Stutz June 2, 2016 David

Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey

Meta-Learning of Structured Representation by Proximal Mapping Mao Li, Yingyi Ma, Xinhua

Complexity of a quadratic penalty accelerated inexact proximal point method W. Kong 1 J.G. Melo 2

Using the DMM to unde r stand and r e spond to De ve lopme ntal T r auma in Child Pr ote c

Quantifying Dataflow Analysis with Gradients in LLVM Gabriel Ryan 1 , Abhishek Shah 1 , Dongdong

On Corson and Valdivia compact spaces* Reynaldo Rojas Hern andez Centro de Ciencias Matem