

slide-1
SLIDE 1

An adaptive backtracking strategy for non-smooth composite optimisation problems

Luca Calatroni

Centre de Mathématiques Appliquées (CMAP), École Polytechnique, Palaiseau

joint work with: A. Chambolle.

CMIPI 2018 Workshop

University of Insubria, DISAT, Como (IT), July 16-18, 2018

slide-2
SLIDE 2

Table of contents

  • 1. Introduction
  • 2. GFISTA with backtracking
  • 3. Accelerated convergence rates
  • 4. Imaging applications
  • 5. Conclusions & outlook

1

slide-3
SLIDE 3

Introduction

SLIDE 5

Gradient based methods: a review

$(X, \|\cdot\|)$ Hilbert space. Given $f : X \to \mathbb{R}$ convex, l.s.c., with $x^* \in \arg\min f$, we want to solve:

$$\min_{x \in X} f(x)$$

If $f$ is differentiable with $L_f$-Lipschitz gradient, explicit gradient descent reads:

Algorithm 1: Gradient descent with fixed step.
Input: $0 < \tau \le 2/L_f$, $x^0 \in X$.
for $k \ge 0$ do
    $x^{k+1} = x^k - \tau\,\nabla f(x^k)$
end for

Quite restrictive smoothness assumption!
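For illustration, a minimal NumPy sketch of Algorithm 1; the quadratic example, the helper name `gradient_descent` and the parameters are illustrative, not from the slides:

```python
import numpy as np

def gradient_descent(grad_f, x0, tau, n_iter=100):
    """Fixed-step gradient descent: x_{k+1} = x_k - tau * grad_f(x_k)."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iter):
        x = x - tau * grad_f(x)
    return x

# Example: f(x) = 0.5*||A x - b||^2 with gradient A^T (A x - b) and
# Lipschitz constant L_f = ||A^T A||_2; here tau = 1/L_f <= 2/L_f.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
L_f = np.linalg.norm(A.T @ A, 2)
x_approx = gradient_descent(lambda x: A.T @ (A @ x - b), np.zeros(2), tau=1.0 / L_f)
```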

2

slide-6
SLIDE 6

Gradient based methods: a review

$(X, \|\cdot\|)$ Hilbert space. Given $f : X \to \mathbb{R}$ convex, l.s.c., with $x^* \in \arg\min f$, we want to solve:

$$\min_{x \in X} f(x)$$

No further assumptions on $\nabla f$: use implicit gradient descent.

Algorithm 2: Implicit (proximal) gradient descent with fixed step.
Input: $\tau > 0$, $x^0 \in X$.
for $k \ge 0$ do
    $x^{k+1} = \mathrm{prox}_{\tau f}(x^k)\;(= x^k - \tau\,\nabla f(x^{k+1}))$
end for

Note: the iteration can be rewritten as $x^{k+1} = x^k - \tau\,\nabla f_\tau(x^k)$, with
$$f_\tau(x^k) := \min_{x \in X}\; f(x) + \frac{\|x - x^k\|^2}{2\tau},$$
the Moreau-Yosida regularisation of $f$, which has a $1/\tau$-Lipschitz gradient ⇒ explicit gradient descent on $f_\tau$. Same theory applies!

References: Brézis-Lions ('73, '78), Güler ('91), ...

SLIDE 8

Convergence rates

Theorem (O(1/k) rate). Let $x^0 \in X$ and $\tau \le 2/L_f$. Then the sequence $(x^k)$ of iterates of gradient descent converges to $x^*$ and satisfies:
$$f(x^k) - f(x^*) \le \frac{1}{2\tau k}\,\|x^* - x^0\|^2.$$

Assume now that $f$ is $\mu_f$-strongly convex, $\mu_f > 0$:
$$f(y) \ge f(x) + \langle \nabla f(x), y - x\rangle + \frac{\mu_f}{2}\,\|y - x\|^2 \quad \text{for all } x, y \in X.$$

Theorem (Linear rate for strongly convex objectives). Let $f$ be $\mu_f$-strongly convex, $x^0 \in X$ and $\tau \le 2/(L_f + \mu_f)$. Then the sequence $(x^k)$ of iterates of gradient descent satisfies:
$$f(x^k) - f(x^*) + \frac{1}{2\tau}\,\|x^k - x^*\|^2 \le \frac{\omega^k}{2\tau}\,\|x^* - x^0\|^2, \qquad \omega = \frac{1 - \mu_f/L_f}{1 + \mu_f/L_f} < 1.$$

References: Bertsekas '15, Nesterov '04.

3

slide-9
SLIDE 9

Lower bounds1

Theorem (Lower bounds)

Let $x^0 \in \mathbb{R}^n$, $L_f > 0$ and $k < n$. Then, for any first-order method there exists a convex $C^1$ function $f$ with $L_f$-Lipschitz gradient such that:

  • 1. convex case:
$$f(x^k) - f(x^*) \ge \frac{L_f}{8(k+1)^2}\,\|x^* - x^0\|^2.$$
  • 2. strongly convex case:
$$f(x^k) - f(x^*) \ge \frac{\mu_f}{2}\left(\frac{\sqrt{q} - 1}{\sqrt{q} + 1}\right)^{2k}\|x^* - x^0\|^2, \qquad q = L_f/\mu_f \ge 1.$$

Remark: if $k \ge n$ we could use conjugate gradient! However, for imaging $n \gg 1$!

Usually k < n: can we improve convergence speed?

1Nesterov, ’04

4

SLIDE 11

Nesterov acceleration for gradient descent2

To make it faster, build an extrapolated sequence (inertia).

Algorithm 4: Nesterov accelerated gradient descent with fixed step.
Input: $0 < \tau \le 1/L_f$, $x^0 = x^{-1} = y^0 \in X$, $t_0 = 0$.
for $k \ge 0$ do
    $t_{k+1} = \dfrac{1 + \sqrt{1 + 4 t_k^2}}{2}$
    $y^k = x^k + \dfrac{t_k - 1}{t_{k+1}}\,(x^k - x^{k-1})$
    $x^{k+1} = y^k - \tau\,\nabla f(y^k)$
end for

Theorem (Acceleration). Let $\tau \le 1/L_f$ and $(x^k)$ be the sequence generated by the accelerated gradient descent algorithm. Then:
$$f(x^k) - f(x^*) \le \frac{2}{\tau(k+1)^2}\,\|x^0 - x^*\|^2.$$
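A minimal sketch of Algorithm 4 in the same spirit; function and parameter names are illustrative:

```python
import numpy as np

def nesterov_gradient_descent(grad_f, x0, tau, n_iter=100):
    """Accelerated gradient descent with the t_k extrapolation rule of Algorithm 4 (sketch)."""
    x_prev = np.asarray(x0, dtype=float).copy()
    x = x_prev.copy()
    t = 0.0
    for _ in range(n_iter):
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t**2)) / 2.0
        y = x + (t - 1.0) / t_next * (x - x_prev)   # extrapolation (inertia)
        x_prev, x = x, y - tau * grad_f(y)          # gradient step at the extrapolated point
        t = t_next
    return x
```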

2Nesterov, ’83, ’04, G¨

uler ’92

5

SLIDE 13

Standard problem in imaging: composite structure

Variational regularisation of ill-posed inverse problems

Compute a reconstructed version of a given degraded image $f$ by solving:
$$\min_{u \in X}\; \{F(u) := R(u) + \lambda D(u, f)\}, \qquad \lambda > 0,$$
with non-smooth regularisation $R$ and smooth data fidelity $D$.

Examples in inverse problems/imaging:

  • $R(u) = \mathrm{TV}, \mathrm{ICTV}, \mathrm{TGV}, \ell_1$ (Rudin, Osher, Fatemi '92; Chambolle, Lions '97; Bredies '10);
  • $D(u, f) = \|u - f\|_2^2$ (Gaussian: Rudin, Osher, Fatemi '92), $D(u, f) = \|u - f\|_{1,\gamma}$ (Laplace/impulse: Nikolova '04), $D(u, f) = \mathrm{KL}_\gamma(u, f)$ (Poisson: Burger, Sawatzky, Brune, Müller '09), ...

6

SLIDE 16

Composite optimisation

We want to solve:
$$\min_{x \in X}\; \{F(x) := f(x) + g(x)\}$$

  • $f$ is smooth: differentiable, convex, with Lipschitz gradient
    $$\|\nabla f(y) - \nabla f(x)\| \le L_f\,\|y - x\| \quad \text{for any } x, y \in X;$$
  • $g$ is convex, l.s.c., non-smooth, with easy proximal map.

Composite optimisation problem ⇒ Forward-Backward splitting³:

  • forward gradient descent step in $f$;
  • backward implicit gradient descent step in $g$.

Basic algorithm: take $x^0 \in X$, fix $\tau > 0$ and for $k \ge 0$ do:
$$x^{k+1} = \mathrm{prox}_{\tau g}\!\left(x^k - \tau\,\nabla f(x^k)\right) =: T_\tau x^k.$$

Rate of convergence: O(1/k).
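A minimal sketch of the basic forward-backward iteration $T_\tau$, assuming a user-supplied `prox_g(v, tau)` that evaluates $\mathrm{prox}_{\tau g}(v)$; the soft-thresholding example is illustrative:

```python
import numpy as np

def forward_backward(grad_f, prox_g, x0, tau, n_iter=200):
    """Forward-backward splitting: x_{k+1} = prox_{tau*g}(x_k - tau*grad_f(x_k)) = T_tau x_k."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iter):
        x = prox_g(x - tau * grad_f(x), tau)   # forward step on f, backward step on g
    return x

# Example (LASSO-type): f(x) = 0.5*||A x - b||^2, g(x) = lam*||x||_1;
# prox_{tau*g} is soft-thresholding with threshold tau*lam.
soft_threshold = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
# x_hat = forward_backward(lambda x: A.T @ (A @ x - b),
#                          lambda v, tau: soft_threshold(v, tau * lam),
#                          x0=np.zeros(n), tau=1.0 / L_f)
```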

3Combettes, Wajs '05, Nesterov '13, ...

7

SLIDE 18

Accelerated forward-backward, FISTA: previous work

In Nesterov ’04 and Beck, Teboulle ’09, accelerated O(1/k2) convergence of is achieved by extrapolation (as above). Further properties:

  • convergence of iterates (Chambolle, Dossal ’15);
  • monotone variants (Beck, Teboulle ’09, Tseng ’08, Tao, Boley, Zhang ’15)
  • acceleration for inexact evaluation of operators (Villa, Salzo, Baldassarre,

Verri ’13, Bonettini, Prato, Rebegoldi, ’18) Questions

  • 1. Can we say more when f and/or g are strongly convex? Linear

convergence?

  • 2. Can we let the gradient step (proximal parameter) vary along the

iterations AND preserving acceleration?

8

slide-19
SLIDE 19

A strongly convex variant of FISTA (GFISTA)

Let $\mu_f, \mu_g \ge 0$ and $\mu = \mu_f + \mu_g$. For $\tau > 0$ define:
$$q := \frac{\tau\mu}{1 + \tau\mu_g} \in [0, 1).$$

Algorithm 5: GFISTA⁴ (no backtracking)
Input: $0 < \tau \le 1/L_f$, $x^0 = x^{-1} \in X$ and $t_0 \in \mathbb{R}$ s.t. $0 \le t_0 \le 1/\sqrt{q}$.
for $k \ge 0$ do
    $y^k = x^k + \beta_k\,(x^k - x^{k-1})$
    $x^{k+1} = T_\tau y^k = \mathrm{prox}_{\tau g}(y^k - \tau\,\nabla f(y^k))$
    $t_{k+1} = \dfrac{1 - q t_k^2 + \sqrt{(1 - q t_k^2)^2 + 4 t_k^2}}{2}$
    $\beta_k = \dfrac{t_k - 1}{t_{k+1}}\cdot\dfrac{1 + \tau\mu_g - t_{k+1}\tau\mu}{1 - \tau\mu_f}$
end for

Remark: $\mu = q = 0 \;\Rightarrow\;$ standard FISTA.
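A sketch of Algorithm 5, assuming `prox_g(v, tau)` evaluates $\mathrm{prox}_{\tau g}(v)$; $t_{k+1}$ and $\beta_k$ are computed before the extrapolation so that the formulas above can be applied directly:

```python
import numpy as np

def gfista(grad_f, prox_g, x0, tau, mu_f=0.0, mu_g=0.0, t0=0.0, n_iter=200):
    """Sketch of GFISTA without backtracking (Algorithm 5)."""
    mu = mu_f + mu_g
    q = tau * mu / (1.0 + tau * mu_g)              # inverse condition number, in [0, 1)
    x_prev = np.asarray(x0, dtype=float).copy()
    x = x_prev.copy()
    t = t0
    for _ in range(n_iter):
        t_next = (1.0 - q * t**2 + np.sqrt((1.0 - q * t**2)**2 + 4.0 * t**2)) / 2.0
        beta = (t - 1.0) / t_next * (1.0 + tau * mu_g - t_next * tau * mu) / (1.0 - tau * mu_f)
        y = x + beta * (x - x_prev)                # extrapolation
        x_prev, x = x, prox_g(y - tau * grad_f(y), tau)
        t = t_next
    return x
```

With `mu_f = mu_g = 0` this reduces to standard FISTA, as noted in the remark above.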

4Chambolle, Pock ’16

9

SLIDE 21

GFISTA: acceleration results

Theorem [Chambolle, Pock ’16] Let τ ≤ 1/Lf and 0 ≤ t0√q ≤ 1. Then, the sequence (xk) of iterates of GFISTA satisfies F(xk) − F(x∗) ≤ rk(q)

  • t2

0(F(x0) − F(x∗)) + 1 + τµg

2 x − x∗2

  • ,

where x∗ is a minimiser of F and: rk(q) = min

  • 4

(k + 1)2 , (1 + √q)(1 − √q)k, (1 − √q)k t2

  • .

Note: for µ = q = 0, t0 = 0 this is the standard FISTA convergence result.

Question: What if an estimate of Lf is not available? Backtracking!

10

SLIDE 24

Backtracking idea

For plain 2D gradient descent: [figures: step size too small vs. too big τ; Armijo line-search]

Armijo rule: choose $\tau_k = 1/2^i$, where $i \in \mathbb{N}$ is the smallest integer s.t.
$$f(x^{k+1}) - f(x^k) \le \beta\,\tau_k\,\nabla f(x^k)^\top(x^{k+1} - x^k), \qquad 0 < \beta < 1.$$

FISTA + backtracking:

  • Armijo-type (Beck, Teboulle, ’09): τk+1 ≤ τk for every k.
  • Full backtracking (Scheinberg, Goldfarb, Bai, ’14): larger steps in “flat” areas!
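For concreteness, a sketch of one gradient step with the Armijo rule exactly as stated above (including the $\tau_k$ factor on the right-hand side); the helper name and parameters are illustrative:

```python
import numpy as np

def armijo_gradient_step(f, grad_f, x, beta=0.5, i_max=50):
    """One gradient step with tau_k = 1/2**i, i the smallest integer passing the decrease test."""
    g = grad_f(x)
    fx = f(x)
    for i in range(i_max):
        tau = 1.0 / 2**i
        x_new = x - tau * g
        if f(x_new) - fx <= beta * tau * np.vdot(g, x_new - x).real:
            return x_new, tau
    return x, 0.0   # no admissible step found within i_max trials
```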

11

slide-25
SLIDE 25

GFISTA with backtracking

SLIDE 27

Backtracking strategy and Bregman distance

General idea: check if for every $x \in X$:
$$F(x^{k+1}) + (1 + \tau\mu_g)\frac{\|x - x^{k+1}\|^2}{2\tau} + \frac{\|x^{k+1} - y^k\|^2}{2\tau} - D_f(x^{k+1}, y^k) \;\le\; F(x) + (1 - \tau\mu_f)\frac{\|x - y^k\|^2}{2\tau},$$
where $D_f(x^{k+1}, y^k) := f(x^{k+1}) - f(y^k) - \langle \nabla f(y^k), x^{k+1} - y^k\rangle$ is the Bregman distance of $f$ between $x^{k+1} = T_\tau y^k$ and $y^k$.

Constant steps: such a condition is verified as long as
$$\tau \le \frac{\|x^{k+1} - y^k\|^2}{2\,D_f(x^{k+1}, y^k)} \sim \frac{1}{L_k}, \qquad (*)$$
which is always true if $\tau \le 1/L_f$ with a known estimate $L_f$. However, one can alternatively check (*) along the iterations. This corresponds to computing a local Lipschitz Constant Estimate (LCE).
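A sketch of how condition (*) and the local Lipschitz constant estimate $L_k$ can be checked numerically; names are illustrative:

```python
import numpy as np

def check_condition(f, grad_f, x_new, y, tau):
    """Check condition (*) and return (holds, local Lipschitz constant estimate L_k)."""
    bregman = f(x_new) - f(y) - np.vdot(grad_f(y), x_new - y).real   # D_f(x_new, y)
    sq_dist = np.vdot(x_new - y, x_new - y).real                     # ||x_new - y||^2
    if bregman <= 0.0:
        return True, 0.0            # any step-size is admissible at this point
    L_k = 2.0 * bregman / sq_dist   # local Lipschitz constant estimate (LCE)
    return tau <= 1.0 / L_k, L_k
```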

12

slide-28
SLIDE 28

GFISTA with backtracking: Algorithm

For any $k \ge 0$ we let $\tau = \tau_k$ and define:
$$\tau'_k = \frac{\tau_k}{1 - \tau_k\mu_f}, \qquad q_k = \frac{\mu\tau_k}{1 + \tau_k\mu_g}.$$

Update rule for extrapolation: for any $k \ge 0$ set
$$t_{k+1} = \frac{1 - \frac{q_{k+1}}{1 - q_{k+1}}\frac{\tau'_k}{\tau'_{k+1}}\,t_k^2 + \sqrt{\left(\frac{q_{k+1}}{1 - q_{k+1}}\frac{\tau'_k}{\tau'_{k+1}}\,t_k^2 - 1\right)^2 + \frac{4\,\frac{\tau'_k}{\tau'_{k+1}}\,t_k^2}{1 - q_{k+1}}}}{2}.$$
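The update rule as a small helper function (a sketch; argument names are illustrative):

```python
import numpy as np

def t_update(t_k, tau_prime_k, tau_prime_next, q_next):
    """Extrapolation update for t_{k+1}; arguments are t_k, tau'_k, tau'_{k+1} and q_{k+1}."""
    r = tau_prime_k / tau_prime_next               # ratio tau'_k / tau'_{k+1}
    a = q_next / (1.0 - q_next) * r * t_k**2       # term (q_{k+1}/(1-q_{k+1})) (tau'_k/tau'_{k+1}) t_k^2
    return (1.0 - a + np.sqrt((a - 1.0)**2 + 4.0 * r * t_k**2 / (1.0 - q_next))) / 2.0

# Sanity check: with q_next = 0 and constant steps (r = 1) this reduces to the FISTA rule
# t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2.
```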

13

slide-29
SLIDE 29

GFISTA with backtracking: Algorithm

Algorithm 6: GFISTA with backtracking

Input: $\mu_f, \mu_g$, $\tau_0 > 0$, $q_0$, $\rho \in (0, 1)$, $x^0 = x^{-1} \in X$ and $t_0 \in \mathbb{R}$ s.t. $0 \le t_0 \le 1/\sqrt{q_0}$.
for $k \ge 0$ do
    $y^k = x^k + \beta_k\,(x^k - x^{k-1})$; set $i_{bt} = 0$;
    if too close to LCE then
        while backtracking condition (*) is not verified & $i_{bt} \le i_{\max}$ do
            keep/reduce step-size: $\tau_{k+1} = \rho^{i_{bt}}\,\tau_k$;
            compute $x^{k+1} = T_{\tau_{k+1}} y^k = \mathrm{prox}_{\tau_{k+1} g}(y^k - \tau_{k+1}\,\nabla f(y^k))$   (1)
            $i_{bt} = i_{bt} + 1$;
        end while
    else if far enough from LCE then
        increase step-size: $\tau_{k+1} = \tau_k/\rho$;
        compute $x^{k+1}$ using (1);
    end if
    Update $q_{k+1}$, $\tau'_{k+1}$, $t_{k+1}$.
    Set $\beta_{k+1} = \dfrac{1 - q_{k+1} t_{k+1}}{1 - q_{k+1}}\cdot\dfrac{t_k - 1}{t_{k+1}}$.
end for

Too close/too far: how tight is (*)? Reduce costs due to (1).
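A simplified sketch of Algorithm 6: instead of the explicit "too close / too far from the LCE" test, this version always tries the larger step $\tau_k/\rho$ and then shrinks by $\rho$ until (*) holds; this simplification and all names are mine, not the authors':

```python
import numpy as np

def gfista_backtracking(f, grad_f, prox_g, x0, tau0, mu_f=0.0, mu_g=0.0,
                        rho=0.9, t0=1.0, n_iter=200, i_max=50):
    """Simplified sketch of GFISTA with adaptive backtracking (Algorithm 6)."""
    mu = mu_f + mu_g
    x_prev = np.asarray(x0, dtype=float).copy()
    x = x_prev.copy()
    tau, t, beta = tau0, t0, 0.0
    for _ in range(n_iter):
        y = x + beta * (x - x_prev)
        f_y, grad_y = f(y), grad_f(y)

        # try a larger step, then shrink by rho until the descent condition (*) holds
        tau_new = tau / rho
        for _ in range(i_max):
            x_new = prox_g(y - tau_new * grad_y, tau_new)
            bregman = f(x_new) - f_y - np.vdot(grad_y, x_new - y).real   # D_f(x_new, y)
            if 2.0 * tau_new * bregman <= np.vdot(x_new - y, x_new - y).real:
                break                                                    # (*) is verified
            tau_new *= rho

        # update q_{k+1}, tau'_{k+1}, t_{k+1} and beta_{k+1}
        tau_p, tau_p_new = tau / (1.0 - tau * mu_f), tau_new / (1.0 - tau_new * mu_f)
        q_new = mu * tau_new / (1.0 + tau_new * mu_g)
        a = q_new / (1.0 - q_new) * (tau_p / tau_p_new) * t**2
        t_new = (1.0 - a + np.sqrt((a - 1.0)**2
                                   + 4.0 * (tau_p / tau_p_new) * t**2 / (1.0 - q_new))) / 2.0
        beta = (1.0 - q_new * t_new) / (1.0 - q_new) * (t - 1.0) / t_new

        x_prev, x, tau, t = x, x_new, tau_new, t_new
    return x
```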

14

SLIDE 32

Analogies/differences with FISTA-type algorithms: update rule

$$t_{k+1} = \frac{1 - \frac{q_{k+1}}{1 - q_{k+1}}\frac{\tau'_k}{\tau'_{k+1}}\,t_k^2 + \sqrt{\left(\frac{q_{k+1}}{1 - q_{k+1}}\frac{\tau'_k}{\tau'_{k+1}}\,t_k^2 - 1\right)^2 + \frac{4\,\frac{\tau'_k}{\tau'_{k+1}}\,t_k^2}{1 - q_{k+1}}}}{2}$$

No backtracking, convex case: if $\mu = q_k = 0$ and $\tau_k = \tau_{k+1}$ for any $k \ge 0$, this is the FISTA update rule.

FISTA with adaptive backtracking: if $\mu = q_k = 0$ for any $k \ge 0$, the rule reduces to:
$$t_{k+1} = \frac{1 + \sqrt{1 + 4\,\frac{\tau_k}{\tau_{k+1}}\,t_k^2}}{2},$$
which is the same as the one proposed by Scheinberg et al. '14 for fast adaptive backtracking.

15

slide-33
SLIDE 33

Accelerated convergence rates

SLIDE 35

Convergence rates: worst-case analysis

Define:
$$L_w := \max\left\{\frac{L_f}{\rho},\, \rho L_0\right\}, \qquad q_w := \frac{\mu}{L_w + \mu_g},$$
with $q_w$ being the worst-case inverse condition number.

Theorem. Let $x^0 \in X$, $\tau_0 > 0$ and let $(x^k)$ be the sequence produced by the GFISTA algorithm with backtracking. If $t_0 \ge 0$ and $\sqrt{q_0}\,t_0 \le 1$, we have:
$$F(x^k) - F(x^*) \le r_k\,(L_w - \mu_f)\left[ \frac{\tau_0\,t_0^2}{1 - \mu_f\tau_0}\,\big(F(x^0) - F(x^*)\big) + \frac{1}{2}\,\|x^0 - x^*\|^2 \right],$$
where the decay rate is defined as:
$$r_k := \min\left\{ \frac{4}{(k+1)^2},\; (1 - \sqrt{q_w})^{k-1},\; \frac{(1 - \sqrt{q_w})^k}{t_0^2} \right\}.$$

Disclaimer: compare the recent work by Florea, Vorobyov (preprint, '17), where the same result is obtained via a generalised estimate sequence argument.

16

slide-36
SLIDE 36

Monotone variants (M-GFISTA)

In order to make the objective values non-increasing⁵, we can simply set:
$$y^k = x^k + \beta_k\left[ (x^k - x^{k-1}) + \frac{t_k}{t_k - 1}\left(T_{\tau_k}(y^{k-1}) - x^k\right) \right].$$
This suggests an easy rule to select $x^{k+1}$ at any iteration:
$$x^{k+1} = \begin{cases} T_{\tau_{k+1}}(y^k) & \text{if } F(T_{\tau_{k+1}}(y^k)) \le F(x^k), \\ x^k & \text{otherwise.} \end{cases}$$
Same computations, same convergence rates.
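The selection rule as a small helper (a sketch; `F` and `T_tau_next` are assumed callables evaluating the objective and $T_{\tau_{k+1}}$):

```python
def monotone_step(F, T_tau_next, x_k, y_k):
    """Monotone (M-GFISTA) selection rule: keep the previous iterate if the candidate increases F."""
    candidate = T_tau_next(y_k)                      # candidate x^{k+1} = T_{tau_{k+1}}(y^k)
    return candidate if F(candidate) <= F(x_k) else x_k
```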

5Beck, Teboulle ’09, Tseng ’08, Tao, Boley, Zhang ’16

17

slide-37
SLIDE 37

Imaging applications

SLIDE 39

Huber-TV Gaussian denoising

Given a noisy image $u^0 \in \mathbb{R}^{m\times n}$ corrupted by Gaussian noise $\mathcal{N}(0, \sigma^2)$, use the TV ROF⁶ model:
$$\min_u\; \lambda\|Du\|_{2,1} + \frac{1}{2}\|u - u^0\|_2^2, \qquad \|Du\|_{2,1} = \sum_{i,j=1}^{m,n}\sqrt{(Du)_{i,j,1}^2 + (Du)_{i,j,2}^2},$$
where $Du$ is the finite-difference-discretised gradient and $\lambda > 0$.

Strongly convex variant: for $\varepsilon \ll 1$, $C^1$ Huber regularisation
$$h_\varepsilon(t) := \begin{cases} \dfrac{t^2}{2\varepsilon} & \text{for } |t| \le \varepsilon, \\[4pt] |t| - \dfrac{\varepsilon}{2} & \text{for } |t| > \varepsilon. \end{cases}$$
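The Huber function as a NumPy one-liner (a minimal sketch):

```python
import numpy as np

def huber(t, eps):
    """C^1 Huber function h_eps: quadratic near zero, linear with slope 1 away from it."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= eps, t**2 / (2.0 * eps), np.abs(t) - eps / 2.0)
```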

6Rudin, Osher, Fatemi, ’92

18

slide-40
SLIDE 40

Huber-TV Gaussian denoising

Given a noisy image $u^0 \in \mathbb{R}^{m\times n}$ corrupted by Gaussian noise $\mathcal{N}(0, \sigma^2)$, use the Huberised TV ROF⁶ model:
$$\min_u\; \lambda H_\varepsilon(u) + \frac{1}{2}\|u - u^0\|_2^2, \qquad H_\varepsilon(u) := \sum_{i,j=1}^{m,n} h_\varepsilon\!\left(\sqrt{(Du)_{i,j,1}^2 + (Du)_{i,j,2}^2}\right),$$
where $Du$ is the finite-difference-discretised gradient, $\lambda > 0$ and $h_\varepsilon$ is the Huber function defined above.

6Rudin, Osher, Fatemi, ’92

18

SLIDE 43

Huber-TV Gaussian denoising: dual formulation

The Huber-TV dual problem reads:
$$\min_p\; \underbrace{\frac{1}{2}\|D^*p - u^0\|_2^2}_{\text{"}f\text{"}} + \underbrace{\frac{\varepsilon}{2\lambda}\|p\|_2^2 + \delta_{\{\|\cdot\|_{2,\infty}\le\lambda\}}(p)}_{\text{"}g\text{"}},$$
where $D^*$ is the discretised negative finite-difference divergence and:
$$\delta_{\{\|\cdot\|_{2,\infty}\le\lambda\}}(p) = \begin{cases} 0 & \text{if } |p_{i,j}|_2 \le \lambda \text{ for any } i, j, \\ +\infty & \text{otherwise.} \end{cases}$$

Note:

  • $\nabla f(p) = D(D^*p - u^0) \;\Rightarrow\; L_f \le 8$;
  • $\mathrm{prox}_{\tau g}$ is easy to compute and $\mu_g = \mu = \varepsilon/\lambda$.

Use monotone GFISTA with backtracking. . .
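One way to evaluate $\mathrm{prox}_{\tau g}$ for this dual $g$ is a pointwise shrinkage followed by a projection onto the $\lambda$-ball. A sketch, assuming the dual variable is stored as an array of shape `(m, n, 2)` (this storage convention is an assumption, not from the slides):

```python
import numpy as np

def prox_dual_g(p, tau, eps, lam):
    """Sketch of prox_{tau*g} for g(p) = eps/(2*lam)*||p||_2^2 + indicator{|p_ij|_2 <= lam}."""
    p = p / (1.0 + tau * eps / lam)                        # shrinkage from the quadratic term
    norms = np.sqrt(np.sum(p**2, axis=-1, keepdims=True))  # pointwise 2-norms |p_ij|_2
    return p / np.maximum(1.0, norms / lam)                # pixel-wise projection onto the lam-ball
```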

19

slide-44
SLIDE 44

Huber-TV Gaussian denoising: results

Parameters: $u^0 \in \mathbb{R}^{256\times256}$, $\sigma^2 = 0.005$, $\varepsilon = 0.01$, $\lambda = 0.1$; hence $\mu = \varepsilon/\lambda = 0.1$.

[Figures: (a) ground truth, (b) noisy data $u^0$, (c) reference solution $u^*$]

20

slide-45
SLIDE 45

Huber-TV Gaussian denoising: results

[Figure 1, underestimating $L_0 = 5$: (a) convergence rates, (b) Lipschitz constant estimate. GFISTA parameters: $\rho = 0.9$, $t_0 = 1$, $p^0 = Du^0$.]

20

SLIDE 47

Huber-TV Gaussian denoising: results

[Figure 1, overestimating $L_0 = 20$: (a) convergence rates, (b) Lipschitz constant estimate. GFISTA parameters: $\rho = 0.9$, $t_0 = 1$, $p^0 = Du^0$.]

Remark: $O(1/k^2)$ convergence of naive FISTA ($\mu = 0$).

20

SLIDE 49

Strongly convex TV-Poisson denoising: primal formulation

Poisson noise is typical in astronomy and microscopy imaging... For $\varepsilon \ll 1$, consider the $\varepsilon$-strongly convex TV-Poisson denoising model:
$$\min_u\; \underbrace{\lambda\|Du\|_{2,1} + \frac{\varepsilon}{2}\|u\|_2^2}_{\text{"}g\text{"}} + \underbrace{\widetilde{\mathrm{KL}}(u^0, u)}_{\text{"}f\text{"}},$$
where $\widetilde{\mathrm{KL}}(u^0, u)$ is a differentiable version of the Kullback-Leibler function⁷:
$$\widetilde{\mathrm{KL}}(u^0, u) := \sum_{i,j=1}^{m,n} \begin{cases} u_{i,j} + b_{i,j} - u^0_{i,j} + u^0_{i,j}\log\dfrac{u^0_{i,j}}{u_{i,j} + b_{i,j}} & \text{if } u_{i,j} \ge 0, \\[6pt] \dfrac{u^0_{i,j}}{2 b_{i,j}^2}\,u_{i,j}^2 + \left(1 - \dfrac{u^0_{i,j}}{b_{i,j}}\right) u_{i,j} + b_{i,j} - u^0_{i,j} + u^0_{i,j}\log\dfrac{u^0_{i,j}}{b_{i,j}} & \text{else,} \end{cases}$$
and $b \in \mathbb{R}^{m\times n}$ is the background image. We can crudely estimate:
$$L_f = \max_{i,j} \frac{u^0_{i,j}}{b_{i,j}^2}, \qquad \text{for } u^0, b > 0.$$
Moreover, $\mathrm{prox}_{\tau g}$ can be computed by solving a TV ROF model.
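A sketch of the gradient of $\widetilde{\mathrm{KL}}$ and of the crude Lipschitz estimate above; array names are illustrative, and both branches are evaluated elementwise by `np.where`:

```python
import numpy as np

def kl_smooth_grad(u, u0, b):
    """Gradient of the smoothed KL term used as the smooth part "f" (sketch)."""
    return np.where(u >= 0.0,
                    1.0 - u0 / (u + b),              # derivative of the KL branch (u >= 0)
                    u0 / b**2 * u + 1.0 - u0 / b)    # derivative of the quadratic extension (u < 0)

def lipschitz_estimate(u0, b):
    """Crude global estimate from the slide: L_f = max_{i,j} u0_{i,j} / b_{i,j}^2 (u0, b > 0)."""
    return np.max(u0 / b**2)
```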

7Chambolle, Ehrhardt, Richtárik, Schönlieb '17

21

slide-50
SLIDE 50

Strongly convex TV-Poisson denoising: results

Parameters: $u^0 \in \mathbb{R}^{256\times256}$, $\varepsilon = \mu = 0.15$, $\lambda = 0.2$, $b$ constant, $L_f \le 45$.

[Figures: (a) ground truth, (b) noisy data $u^0$, (c) reference solution $u^*$]

22

slide-51
SLIDE 51

Strongly convex TV-Poisson denoising: results

[Figure 2, overestimating $L_0 = 60$: (a) convergence rates, (b) Lipschitz constant estimate. GFISTA parameters: $\rho = 0.8$, $t_0 = 1$, initialisation at the noisy image $u^0$. Relative objective: $(F(u^k) - F(u^*))/(F(u^0) - F(u^*))$.]

slide-52
SLIDE 52

Strongly convex TV-Poisson denoising: results

[Figure 2: monotone decay of the objective with/without backtracking.]

22

slide-53
SLIDE 53

Conclusions & outlook

SLIDE 55

Conclusions & outlook

Take-home messages:

  • If $\mu_f, \mu_g > 0$, linear convergence can be shown for GFISTA;
  • adaptive backtracking provides a local estimate $L_k$ along the iterations;
  • GFISTA with backtracking can be easily implemented and used for imaging applications!

Outlook:

  • Estimate of $\mu_f$ and $\mu_g$? Restarting! (O'Donoghue, Candès '12);
  • milder (non-Lipschitz) differentiability assumptions (Salzo '17)?

23

slide-56
SLIDE 56

Main references

  • L. Calatroni, A. Chambolle, Backtracking strategies for accelerated descent methods with smooth composite objectives, arXiv:1709.09004, 2017.
  • K. Scheinberg, D. Goldfarb, X. Bai, Fast first-order methods for composite convex optimization with backtracking, Foundations of Computational Mathematics 14(3), 2014.
  • M. I. Florea, S. Vorobyov, A generalized accelerated composite gradient method: uniting Nesterov's fast gradient method and FISTA, arXiv:1705.10266, 2017.
  • A. Chambolle, T. Pock, An introduction to continuous optimization for imaging, Acta Numerica 25, 2016.

24

slide-57
SLIDE 57

Thanks for your attention! Questions? luca.calatroni@polytechnique.edu

24