An adaptive backtracking strategy for non-smooth composite optimisation problems


  1. An adaptive backtracking strategy for non-smooth composite optimisation problems. Luca Calatroni, Centre de Mathématiques Appliquées (CMAP), École Polytechnique, Palaiseau. Joint work with: A. Chambolle. CMIPI 2018 Workshop, University of Insubria, DISAT, July 16-18 2018, Como, IT

  2. Table of contents: 1. Introduction 2. GFISTA with backtracking 3. Accelerated convergence rates 4. Imaging applications 5. Conclusions & outlook

  3. Introduction

  4. Gradient based methods: a review. $(\mathcal{X}, \|\cdot\|)$ Hilbert space. Given $f : \mathcal{X} \to \mathbb{R}$ convex, l.s.c., with $x^* \in \arg\min f$, we want to solve: $\min_{x \in \mathcal{X}} f(x)$.

  5. Gradient based methods: a review. $(\mathcal{X}, \|\cdot\|)$ Hilbert space. Given $f : \mathcal{X} \to \mathbb{R}$ convex, l.s.c., with $x^* \in \arg\min f$, we want to solve: $\min_{x \in \mathcal{X}} f(x)$.
  If $f$ is differentiable with $L_f$-Lipschitz gradient, explicit gradient descent reads:
  Algorithm 1 (Gradient descent with fixed step). Input: $0 < \tau \leq 2/L_f$, $x_0 \in \mathcal{X}$. For $k \geq 0$ do: $x_{k+1} = x_k - \tau \nabla f(x_k)$.
  Quite restrictive smoothness assumption!
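  As an illustration of Algorithm 1 (not part of the slides), here is a minimal NumPy sketch, assuming only a callable gradient and a known Lipschitz constant; the quadratic test problem at the bottom is a hypothetical example.

```python
import numpy as np

def gradient_descent(grad_f, x0, L_f, n_iter=100):
    # Explicit gradient descent with a fixed step 0 < tau <= 2/L_f (Algorithm 1).
    tau = 1.0 / L_f                      # any admissible fixed step would do
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = x - tau * grad_f(x)          # x_{k+1} = x_k - tau * grad f(x_k)
    return x

# Toy instance (our own, for illustration): f(x) = 0.5*||A x - b||^2,
# grad f(x) = A^T (A x - b), L_f = ||A||_2^2.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
x_hat = gradient_descent(lambda x: A.T @ (A @ x - b), np.zeros(2),
                         L_f=np.linalg.norm(A, 2) ** 2)
```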

  6. Gradient based methods: a review. $(\mathcal{X}, \|\cdot\|)$ Hilbert space. Given $f : \mathcal{X} \to \mathbb{R}$ convex, l.s.c., with $x^* \in \arg\min f$, we want to solve: $\min_{x \in \mathcal{X}} f(x)$.
  No further assumptions on $\nabla f$: use implicit gradient descent.
  Algorithm 2 (Implicit (proximal) gradient descent with fixed step). Input: $\tau > 0$, $x_0 \in \mathcal{X}$. For $k \geq 0$ do: $x_{k+1} = \mathrm{prox}_{\tau f}(x_k)$ $(= x_k - \tau \nabla f(x_{k+1}))$.
  Note: the iteration can be rewritten as $x_{k+1} = x_k - \tau \nabla f_\tau(x_k)$, with $f_\tau(x_k) := \min_{x \in \mathcal{X}} f(x) + \frac{\|x - x_k\|^2}{2\tau}$, the Moreau-Yosida regularisation of $f$, whose gradient is $1/\tau$-Lipschitz $\Rightarrow$ explicit gradient descent on $f_\tau$. Same theory applies!
  References: Brezis-Lions ('73, '78), Güler ('91), ...
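  A minimal sketch of Algorithm 2 (our own illustration, not from the slides): the iteration only needs the proximal map of $\tau f$; purely for illustration we assume $f = \|\cdot\|_1$, whose prox is soft-thresholding.

```python
import numpy as np

def soft_threshold(x, t):
    # Closed-form prox of t*||.||_1 (componentwise shrinkage), used only as a toy example.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proximal_point(prox_tau_f, x0, n_iter=50):
    # Implicit (proximal) gradient descent: x_{k+1} = prox_{tau f}(x_k) (Algorithm 2).
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = prox_tau_f(x)
    return x

# Hypothetical choice f = ||.||_1; the iterates shrink towards the minimiser 0.
tau = 0.1
x_hat = proximal_point(lambda x: soft_threshold(x, tau), np.array([3.0, -2.0, 0.5]))
```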

  7. Convergence rates. Theorem ($O(1/k)$ rate): let $x_0 \in \mathcal{X}$ and $\tau \leq 2/L_f$. Then, the sequence $(x_k)$ of iterates of gradient descent converges to $x^*$ and satisfies: $f(x_k) - f(x^*) \leq \frac{1}{2\tau k}\|x^* - x_0\|^2$.

  8. Convergence rates. Theorem ($O(1/k)$ rate): let $x_0 \in \mathcal{X}$ and $\tau \leq 2/L_f$. Then, the sequence $(x_k)$ of iterates of gradient descent converges to $x^*$ and satisfies: $f(x_k) - f(x^*) \leq \frac{1}{2\tau k}\|x^* - x_0\|^2$.
  Assume: $f$ is $\mu_f$-strongly convex, $\mu_f > 0$: $f(y) \geq f(x) + \langle \nabla f(x), y - x \rangle + \frac{\mu_f}{2}\|y - x\|^2$, for all $x, y \in \mathcal{X}$.
  Theorem (Linear rate for strongly convex objectives): let $f$ be $\mu_f$-strongly convex, $x_0 \in \mathcal{X}$ and $\tau \leq 2/(L_f + \mu_f)$. Then, the sequence $(x_k)$ of iterates of gradient descent satisfies: $f(x_k) - f(x^*) + \frac{1}{2\tau}\|x_k - x^*\|^2 \leq \omega^k \frac{1}{2\tau}\|x^* - x_0\|^2$, with $\omega = \frac{1 - \mu_f/L_f}{1 + \mu_f/L_f} < 1$.
  References: Bertsekas '15, Nesterov '04

  9. Lower bounds (Nesterov '04). Theorem (Lower bounds): let $x_0 \in \mathbb{R}^n$, $L_f > 0$ and $k < n$. Then, for any first-order method there exists a convex $C^1$ function $f$ with $L_f$-Lipschitz gradient such that:
  1. convex case: $f(x_k) - f(x^*) \geq \frac{L_f}{8(k+1)^2}\|x^* - x_0\|^2$;
  2. strongly convex case: $f(x_k) - f(x^*) \geq \frac{\mu_f}{2}\left(\frac{\sqrt{q}-1}{\sqrt{q}+1}\right)^{2k}\|x^* - x_0\|^2$, where $q = L_f/\mu_f \geq 1$.
  Remark: if $k \geq n$ we could use conjugate gradient! However, for imaging $n \gg 1$! Usually $k < n$: can we improve convergence speed?

  10. Nesterov acceleration for gradient descent (Nesterov '83, '04, Güler '92). To make it faster, build an extrapolated sequence (inertia).
  Algorithm 3 (Nesterov accelerated gradient descent with fixed step). Input: $0 < \tau \leq 1/L_f$, $x_0 = x_{-1} = y_0 \in \mathcal{X}$, $t_0 = 0$. For $k \geq 0$ do:
  $t_{k+1} = \frac{1 + \sqrt{1 + 4 t_k^2}}{2}$
  $y_k = x_k + \frac{t_k - 1}{t_{k+1}}(x_k - x_{k-1})$
  $x_{k+1} = y_k - \tau \nabla f(y_k)$

  11. Nesterov acceleration for gradient descent (Nesterov '83, '04, Güler '92).
  Algorithm 4 (Nesterov accelerated gradient descent with fixed step). Input: $0 < \tau \leq 1/L_f$, $x_0 = x_{-1} = y_0 \in \mathcal{X}$, $t_0 = 0$. For $k \geq 0$ do:
  $t_{k+1} = \frac{1 + \sqrt{1 + 4 t_k^2}}{2}$
  $y_k = x_k + \frac{t_k - 1}{t_{k+1}}(x_k - x_{k-1})$
  $x_{k+1} = y_k - \tau \nabla f(y_k)$
  Theorem (Acceleration): let $\tau \leq 1/L_f$ and $(x_k)$ be the sequence generated by the accelerated gradient descent algorithm. Then: $f(x_k) - f(x^*) \leq \frac{2}{\tau (k+1)^2}\|x_0 - x^*\|^2$.
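  A minimal NumPy sketch of Algorithms 3/4 (our own illustration, not from the slides), assuming a callable gradient and known $L_f$; with $t_0 = 0$ and $x_{-1} = x_0$ the first iteration reduces to a plain gradient step.

```python
import numpy as np

def nesterov_gradient_descent(grad_f, x0, L_f, n_iter=200):
    # Accelerated gradient descent: extrapolate with inertia, then take a gradient step.
    tau = 1.0 / L_f                       # 0 < tau <= 1/L_f
    x = np.asarray(x0, dtype=float)
    x_prev = x.copy()                     # x_{-1} = x_0
    t = 0.0                               # t_0 = 0
    for _ in range(n_iter):
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)   # y_k: inertial extrapolation
        x_prev, x = x, y - tau * grad_f(y)            # x_{k+1} = y_k - tau * grad f(y_k)
        t = t_next
    return x
```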

  12. Standard problem in imaging: composite structure. Variational regularisation of ill-posed inverse problems: compute a reconstructed version of a given degraded image $f$ by solving $\min_{u \in \mathcal{X}} \{F(u) := R(u) + \lambda D(u, f)\}$, $\lambda > 0$, with non-smooth regularisation and smooth data fidelity.

  13. Standard problem in imaging: composite structure. Variational regularisation of ill-posed inverse problems: compute a reconstructed version of a given degraded image $f$ by solving $\min_{u \in \mathcal{X}} \{F(u) := R(u) + \lambda D(u, f)\}$, $\lambda > 0$, with non-smooth regularisation and smooth data fidelity.
  Examples in inverse problems/imaging:
  • $R(u) = \mathrm{TV}, \mathrm{ICTV}, \mathrm{TGV}, \ell_1$ (Rudin, Osher, Fatemi '92, Chambolle-Lions '97, Bredies '10)
  • $D(u, f) = \|u - f\|_2^2$ (Gaussian noise, Rudin, Osher, Fatemi '92), $D(u, f) = \|u - f\|_{1,\gamma}$ (Laplace/impulse noise, Nikolova '04), $D(u, f) = \mathrm{KL}_\gamma(u, f)$ (Poisson noise, Burger, Sawatzky, Brune, Müller '09), ...

  14. Composite optimisation. We want to solve: $\min_{x \in \mathcal{X}} \{F(x) := f(x) + g(x)\}$
  • $f$ is smooth: differentiable, convex, with Lipschitz gradient: $\|\nabla f(y) - \nabla f(x)\| \leq L_f \|y - x\|$ for any $x, y \in \mathcal{X}$;
  • $g$ is convex, l.s.c., non-smooth, with an easy proximal map.
  References: Combettes, Wajs '05, Nesterov '13, ...

  15. Composite optimisation. We want to solve: $\min_{x \in \mathcal{X}} \{F(x) := f(x) + g(x)\}$
  • $f$ is smooth: differentiable, convex, with Lipschitz gradient: $\|\nabla f(y) - \nabla f(x)\| \leq L_f \|y - x\|$ for any $x, y \in \mathcal{X}$;
  • $g$ is convex, l.s.c., non-smooth, with an easy proximal map.
  Composite optimisation problem: Forward-Backward splitting (Combettes, Wajs '05, Nesterov '13, ...):
  - forward gradient descent step in $f$;
  - backward implicit gradient descent step in $g$.
  Basic algorithm: take $x_0 \in \mathcal{X}$, fix $\tau > 0$ and for $k \geq 0$ do: $x_{k+1} = \mathrm{prox}_{\tau g}(x_k - \tau \nabla f(x_k)) =: T_\tau x_k$.

  16. Composite optimisation. We want to solve: $\min_{x \in \mathcal{X}} \{F(x) := f(x) + g(x)\}$
  • $f$ is smooth: differentiable, convex, with Lipschitz gradient: $\|\nabla f(y) - \nabla f(x)\| \leq L_f \|y - x\|$ for any $x, y \in \mathcal{X}$;
  • $g$ is convex, l.s.c., non-smooth, with an easy proximal map.
  Composite optimisation problem: Forward-Backward splitting (Combettes, Wajs '05, Nesterov '13, ...):
  - forward gradient descent step in $f$;
  - backward implicit gradient descent step in $g$.
  Basic algorithm: take $x_0 \in \mathcal{X}$, fix $\tau > 0$ and for $k \geq 0$ do: $x_{k+1} = \mathrm{prox}_{\tau g}(x_k - \tau \nabla f(x_k)) =: T_\tau x_k$.
  Rate of convergence: $O(1/k)$.
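  A minimal NumPy sketch of the basic forward-backward iteration (our own illustration, not from the slides); the lasso-type instance at the bottom, with $g = \lambda\|\cdot\|_1$ and soft-thresholding as its prox, is a hypothetical example.

```python
import numpy as np

def forward_backward(grad_f, prox_g, x0, L_f, n_iter=300):
    # Basic forward-backward splitting: x_{k+1} = prox_{tau g}(x_k - tau * grad f(x_k)).
    tau = 1.0 / L_f
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = prox_g(x - tau * grad_f(x), tau)   # forward step in f, backward step in g
    return x

# Toy lasso-type instance (our own example): f(x) = 0.5*||A x - b||^2, g = lam*||.||_1,
# whose prox is soft-thresholding by tau*lam.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50)); b = rng.standard_normal(20); lam = 0.1
x_hat = forward_backward(
    grad_f=lambda x: A.T @ (A @ x - b),
    prox_g=lambda z, tau: np.sign(z) * np.maximum(np.abs(z) - tau * lam, 0.0),
    x0=np.zeros(50),
    L_f=np.linalg.norm(A, 2) ** 2,
)
```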

  17. Accelerated forward-backward, FISTA: previous work. In Nesterov '04 and Beck, Teboulle '09, accelerated $O(1/k^2)$ convergence is achieved by extrapolation (as above). Further properties:
  - convergence of iterates (Chambolle, Dossal '15);
  - monotone variants (Beck, Teboulle '09, Tseng '08, Tao, Boley, Zhang '15);
  - acceleration for inexact evaluation of operators (Villa, Salzo, Baldassarre, Verri '13, Bonettini, Prato, Rebegoldi '18).

  18. Accelerated forward-backward, FISTA: previous work. In Nesterov '04 and Beck, Teboulle '09, accelerated $O(1/k^2)$ convergence is achieved by extrapolation (as above). Further properties:
  - convergence of iterates (Chambolle, Dossal '15);
  - monotone variants (Beck, Teboulle '09, Tseng '08, Tao, Boley, Zhang '15);
  - acceleration for inexact evaluation of operators (Villa, Salzo, Baldassarre, Verri '13, Bonettini, Prato, Rebegoldi '18).
  Questions:
  1. Can we say more when $f$ and/or $g$ are strongly convex? Linear convergence?
  2. Can we let the gradient step (proximal parameter) vary along the iterations AND preserve acceleration?

  19. A strongly convex variant of FISTA (GFISTA). Let $\mu_f, \mu_g \geq 0$ and set $\mu = \mu_f + \mu_g$. For $\tau > 0$ define $q := \frac{\tau\mu}{1 + \tau\mu_g} \in [0, 1)$.
  Algorithm 5 (GFISTA, Chambolle, Pock '16; no backtracking). Input: $0 < \tau \leq 1/L_f$, $x_0 = x_{-1} \in \mathcal{X}$ and $t_0 \in \mathbb{R}$ s.t. $0 \leq t_0 \leq 1/\sqrt{q}$. For $k \geq 0$ do:
  $y_k = x_k + \beta_k (x_k - x_{k-1})$
  $x_{k+1} = T_\tau y_k = \mathrm{prox}_{\tau g}(y_k - \tau \nabla f(y_k))$
  $t_{k+1} = \frac{1 - q t_k^2 + \sqrt{(1 - q t_k^2)^2 + 4 t_k^2}}{2}$
  $\beta_k = \frac{t_k - 1}{t_{k+1}} \cdot \frac{1 + \tau\mu_g - t_{k+1}\tau\mu}{1 - \tau\mu_f}$
  Remark: $\mu = q = 0 \Rightarrow$ standard FISTA.
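  A minimal NumPy sketch of Algorithm 5 without backtracking (our own illustration, not from the slides). Since $\beta_k$ depends on $t_{k+1}$, both are computed before the extrapolation step of each iteration; the gradient and prox arguments are assumed user-supplied callables.

```python
import numpy as np

def gfista(grad_f, prox_g, x0, L_f, mu_f=0.0, mu_g=0.0, t0=0.0, n_iter=300):
    # GFISTA without backtracking: strongly convex variant of FISTA.
    tau = 1.0 / L_f
    mu = mu_f + mu_g
    q = tau * mu / (1.0 + tau * mu_g)     # q in [0, 1)
    x = np.asarray(x0, dtype=float)
    x_prev = x.copy()                     # x_{-1} = x_0
    t = t0                                # requires 0 <= t0 <= 1/sqrt(q) when q > 0
    for _ in range(n_iter):
        # beta_k needs t_{k+1}, so update both before extrapolating.
        t_next = (1.0 - q * t * t + np.sqrt((1.0 - q * t * t) ** 2 + 4.0 * t * t)) / 2.0
        beta = ((t - 1.0) / t_next) * (1.0 + tau * mu_g - t_next * tau * mu) / (1.0 - tau * mu_f)
        y = x + beta * (x - x_prev)
        x_prev, x = x, prox_g(y - tau * grad_f(y), tau)   # x_{k+1} = T_tau y_k
        t = t_next
    return x
```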

  20. GFISTA: acceleration results. Theorem [Chambolle, Pock '16]: let $\tau \leq 1/L_f$ and $0 \leq t_0\sqrt{q} \leq 1$. Then, the sequence $(x_k)$ of iterates of GFISTA satisfies
  $F(x_k) - F(x^*) \leq r_k(q)\left(t_0^2 (F(x_0) - F(x^*)) + \frac{1 + \tau\mu_g}{2\tau}\|x_0 - x^*\|^2\right)$,
  where $x^*$ is a minimiser of $F$ and $r_k(q) = \min\left\{\frac{4}{(k+1)^2}, (1 + \sqrt{q})(1 - \sqrt{q})^k, \frac{(1 - \sqrt{q})^k}{t_0^2}\right\}$.
  Note: for $\mu = q = 0$, $t_0 = 0$ this is the standard FISTA convergence result.
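  As a small numerical illustration (ours, not from the slides), the rate factor $r_k(q)$ can be evaluated to see where the $O(1/k^2)$ regime hands over to the linear one.

```python
import numpy as np

def rate_factor(k, q, t0):
    # r_k(q) from the theorem above: the smallest of the available decay regimes.
    candidates = [4.0 / (k + 1) ** 2]
    if q > 0.0:
        candidates.append((1.0 + np.sqrt(q)) * (1.0 - np.sqrt(q)) ** k)
        if t0 > 0.0:
            candidates.append((1.0 - np.sqrt(q)) ** k / t0 ** 2)
    return min(candidates)

# For q = 0 only the O(1/k^2) term is active; for q > 0 the linear terms eventually win.
print(rate_factor(100, 0.0, 0.0), rate_factor(100, 0.01, 1.0))
```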
