Inexact variable metric proximal gradient methods with line-search for convex and nonconvex optimization
Silvia Bonettini, Dipartimento di Scienze Fisiche, Informatiche e Matematiche
Optimization Algorithms and Software for Inverse problemS (OASIS)


1. Inexact variable metric proximal gradient methods with line-search for convex and nonconvex optimization

Silvia Bonettini
Dipartimento di Scienze Fisiche, Informatiche e Matematiche, Università di Modena e Reggio Emilia
Optimization Algorithms and Software for Inverse problemS (OASIS), www.oasis.unimore.it

Computational Methods for Inverse Problems in Imaging, Como, 16-18 July 2018

2. Collaborators and main references

Joint works with:
- Marco Prato, Università di Modena e Reggio Emilia
- Federica Porta, Simone Rebegoldi, Valeria Ruggiero, Università di Ferrara
- Ignace Loris, Université Libre de Bruxelles

Main references:
- S. B., I. Loris, F. Porta, M. Prato 2016, Variable metric inexact line–search based methods for nonsmooth optimization, SIAM J. Optim., 26 (2), 891-921
- S. B., F. Porta, V. Ruggiero 2016, A variable metric forward–backward method with extrapolation, SIAM J. Sci. Comput., 38 (4), A2558-A2584
- S. B., I. Loris, F. Porta, M. Prato, S. Rebegoldi 2017, On the convergence of a line–search based proximal-gradient method for nonconvex optimization, Inverse Probl., 33 (5), 055005
- S. B., S. Rebegoldi, V. Ruggiero 2018, Inertial variable metric techniques for the inexact forward-backward algorithm, submitted

3. A general nonsmooth problem

Several optimization problems arising from the Bayesian approach to inverse problems have the following structure:

$$\min_{x \in \mathbb{R}^n} f(x) \equiv f_0(x) + f_1(x),$$

where:
- f_0(x) is continuously differentiable, possibly nonconvex, usually expressing some kind of data discrepancy;
- f_1(x) is convex, possibly nondifferentiable, usually expressing regularization.

Goal: develop a numerical optimization algorithm producing a good approximation of the solution of the minimization problem in a few, cheap iterations.

4. The class of proximal gradient methods

Proximal gradient methods, aka forward-backward methods, exploit the smoothness of f_0 and the convexity of f_1 in the problem

$$\min_{x \in \mathbb{R}^n} f(x) \equiv f_0(x) + f_1(x).$$

Definition (Proximal gradient method). Any first order method based on the following two operations:
- Explicit Forward/Gradient step: computation of the gradient ∇f_0(x).
- Implicit Backward/Proximal step: computation of the proximity (or resolvent) operator

$$\mathrm{prox}_{f_1}(z) = \arg\min_{x \in \mathbb{R}^n} f_1(x) + \frac{1}{2}\|x - z\|^2.$$

Example: if Ω ⊂ ℝ^n is a closed convex set, we can define the indicator function

$$\iota_\Omega(x) = \begin{cases} 0 & \text{if } x \in \Omega \\ +\infty & \text{otherwise} \end{cases}
\quad\Rightarrow\quad \mathrm{prox}_{\iota_\Omega}(z) = \Pi_\Omega(z)$$

(the orthogonal projection onto Ω).

NB: gradient projection methods are special instances of proximal gradient methods.
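
To make the backward step concrete, here is a minimal sketch, not taken from the talk, of two proximity operators that are available in closed form: the orthogonal projection onto a box (the prox of the indicator function of Ω = [lo, hi]^n) and componentwise soft-thresholding (the prox of t‖·‖_1). The NumPy implementation and the function names are illustrative assumptions.

```python
import numpy as np

def prox_indicator_box(z, lo, hi):
    """Prox of the indicator of the box [lo, hi]^n: the orthogonal projection."""
    return np.clip(z, lo, hi)

def prox_l1(z, t):
    """Prox of t*||.||_1: componentwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
```

For more structured regularizers, such as the total variation term in the motivating problem below, no such closed form exists, which is what motivates the inexact computation discussed later.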

5. A basic forward-backward scheme

$$
\begin{aligned}
z^{(k)} &= x^{(k)} - \alpha_k \nabla f_0(x^{(k)}) && \leftarrow \text{Forward step}\\
y^{(k)} &= \mathrm{prox}_{\alpha_k f_1}(z^{(k)}) && \leftarrow \text{Backward step}\\
d^{(k)} &= y^{(k)} - x^{(k)} &&\\
x^{(k+1)} &= x^{(k)} + \lambda_k d^{(k)} &&
\end{aligned}
$$

NB: In the standard convergence analysis, the steplength parameters α_k, λ_k ∈ ℝ_{>0} are related to the Lipschitz constant L of ∇f_0(x) [Combettes-Wajs 2006], [Combettes, Wu, 2014], requiring that α_k and/or λ_k ≤ C/L.

A motivating problem: nonnegative image restoration from Poisson data,

$$\min_{x \in \mathbb{R}^n} \underbrace{\mathrm{KL}(Hx, g)}_{f_0(x)} + \underbrace{\rho\,\|\nabla x\| + \iota_{\mathbb{R}^n_{\geq 0}}(x)}_{f_1(x)},
\qquad \text{where } \mathrm{KL}(t, g) = \sum_{i=1}^{n} g_i \log\frac{g_i}{t_i} + t_i - g_i.$$

For this problem:
- either ∇f_0 is not Lipschitz or L is very large;
- prox_{f_1} is not available in closed form.
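
As a reference point, the scheme above can be written in a few lines. The following is a minimal sketch with constant steplengths, assuming the gradient of f_0 and the proximity operator of f_1 are available as callables; all names are illustrative and this is not the VMILA code.

```python
def forward_backward(x0, grad_f0, prox_f1, alpha, lam=1.0, n_iter=100):
    """Basic forward-backward scheme with constant steplengths alpha and lam."""
    x = x0.copy()
    for _ in range(n_iter):
        z = x - alpha * grad_f0(x)   # forward (gradient) step
        y = prox_f1(z, alpha)        # backward (proximal) step: prox_{alpha f1}(z)
        d = y - x                    # direction d^(k) = y^(k) - x^(k)
        x = x + lam * d              # relaxed update x^(k+1) = x^(k) + lam * d^(k)
    return x

# Illustrative use with a least-squares data term and an l1 penalty of weight rho,
# reusing prox_l1 from the previous sketch:
#   x_hat = forward_backward(x0, grad_f0=lambda x: H.T @ (H @ x - g),
#                            prox_f1=lambda z, a: prox_l1(z, a * rho), alpha=1e-2)
```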

6. A line–search approach

We propose to compute λ_k with a line–search approach, starting from 1 and backtracking until a sufficient decrease of the objective function is obtained.

Generalized Armijo rule [Tseng, Yun, 2009], [Porta, Loris, 2015], [B. et al., 2016]:

$$f(x^{(k)} + \lambda_k d^{(k)}) \leq f(x^{(k)}) + \beta \lambda_k h^{(k)}(y^{(k)}),$$

where β ∈ (0, 1) and

$$h^{(k)}(y) = \nabla f_0(x^{(k)})^T (y - x^{(k)}) + \frac{1}{2\alpha_k}\|y - x^{(k)}\|^2 + f_1(y) - f_1(x^{(k)}).$$

NB1: We have y^{(k)} = prox_{α_k f_1}(x^{(k)} − α_k ∇f_0(x^{(k)})) = arg min_{y ∈ ℝ^n} h^{(k)}(y). Since h^{(k)}(y^{(k)}) < 0, we obtain a monotone decrease of the objective function.

NB2: For f_1 ≡ 0, dropping the quadratic term we obtain the standard Armijo rule for smooth optimization.

Pros:
- No need of any Lipschitz assumption.
- Adaptive selection of λ_k (no user-provided parameter).
- No assumptions on α_k: it just needs to be bounded above and away from zero.

Cons:
- Needs the evaluation of the function f at each backtracking loop (usually 1-2 per outer iteration).
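
For concreteness, a minimal sketch of the backtracking loop implementing this rule, assuming y^(k) (or its inexact version) and the value h^(k)(y^(k)) < 0 have already been computed; β, the reduction factor and the function names are illustrative choices.

```python
def armijo_linesearch(f, x, y, h_y, beta=1e-4, delta=0.5, max_backtracks=20):
    """Backtrack lambda from 1 until f(x + lam*d) <= f(x) + beta*lam*h_y,
    where d = y - x and h_y = h^(k)(y) < 0 guarantees a decrease of f."""
    d = y - x
    fx = f(x)
    lam = 1.0
    for _ in range(max_backtracks):
        if f(x + lam * d) <= fx + beta * lam * h_y:
            break                 # sufficient decrease reached
        lam *= delta              # reduce the steplength and try again
    return lam
```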

7. Inexact computation of the proximity operator (1)

Basic idea: compute an approximation ỹ^{(k)} of y^{(k)} by applying an iterative optimization method to the minimization problem defining the proximity operator,

$$\tilde{y}^{(k)} \simeq y^{(k)} = \arg\min_{y \in \mathbb{R}^n} h^{(k)}(y),$$

with an increasing accuracy as k increases. This results in a two-loop algorithm, and the question now is: how to stop the inner iterations so as to preserve the convergence of the iterates {x^{(k)}} to a solution?

We need to define a criterion to measure the accuracy of the approximate proximity operator computation. Crucial properties of this criterion:
- it has to preserve the convergence properties of the whole scheme;
- it must be based on computable quantities.

Borrowing the ideas in [Salzo, Villa, 2012], [Villa et al., 2013], replace 0 ∈ ∂h^{(k)}(y^{(k)}) with 0 ∈ ∂_{ε_k} h^{(k)}(ỹ^{(k)}).

8. Inexact computation of the proximity operator (2)

A well defined primal-dual procedure. Assume that f_1(x) = g(Ax), with A ∈ ℝ^{m×n} (easy generalization to f_1(x) = Σ_{i=1}^p g_i(A_i x)). The dual problem of the proximity operator computation is

$$\min_{x \in \mathbb{R}^n} h^{(k)}(x) = \max_{v \in \mathbb{R}^m} \Psi^{(k)}(v) \equiv -\frac{1}{2\alpha_k}\|\alpha_k A^T v - z^{(k)}\|^2 - g^*(v) + C_k,$$

where g^* is the Fenchel convex conjugate of g. If v^{(k)} = arg max Ψ^{(k)}(v), then y^{(k)} = z^{(k)} − α_k A^T v^{(k)}.

Compute ỹ^{(k)} as follows:
- apply a maximization method to the dual problem, generating the dual sequence {v^{(k,ℓ)}}_{ℓ∈ℕ} converging to v^{(k)};
- compute the corresponding primal sequence {ỹ^{(k,ℓ)}}_{ℓ∈ℕ} with the formula ỹ^{(k,ℓ)} = z^{(k)} − α_k A^T v^{(k,ℓ)};
- stop the inner iterations when h^{(k)}(ỹ^{(k,ℓ)}) − Ψ^{(k)}(v^{(k,ℓ)}) ≤ ε_k, where

$$\varepsilon_k = \frac{C}{k^q},\; q > 1 \quad \text{(prefixed sequence choice)} \qquad \text{or} \qquad \varepsilon_k = \eta\, h^{(k)}(\tilde{y}^{(k,\ell)}),\; \eta \in (0, 1] \quad \text{(adaptive choice).}$$
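
To illustrate the procedure, here is a minimal sketch of the inner loop for the particular choice f_1(y) = ρ‖Ay‖_1, so that g = ρ‖·‖_1, g* is the indicator of the ball {‖v‖_∞ ≤ ρ}, and the dual problem can be maximized by projected gradient ascent. Since the additive constant C_k appears in both h^(k) and Ψ^(k), it cancels in the gap, so the code only needs the primal and dual values of the prox subproblem. The inner solver, the names and the fixed tolerance eps_k passed in are illustrative assumptions, not the exact procedure of the papers.

```python
import numpy as np

def inexact_prox(z, alpha, rho, A, eps_k, max_inner=500):
    """Approximate y = prox_{alpha*f1}(z) for f1(y) = rho*||A y||_1 by maximizing
    the dual with projected gradient ascent; stop when the primal-dual gap <= eps_k."""
    step = 1.0 / (alpha * np.linalg.norm(A, 2) ** 2)   # 1 / Lipschitz constant of the dual gradient
    v = np.zeros(A.shape[0])                           # feasible dual start, so g*(v) = 0
    y = z.copy()
    for _ in range(max_inner):
        y = z - alpha * A.T @ v                        # primal point associated with v
        v = np.clip(v + step * (A @ y), -rho, rho)     # ascent step + projection onto {||v||_inf <= rho}
        y = z - alpha * A.T @ v
        # primal and dual values of the prox subproblem (the constants C_k cancel in the gap)
        primal = 0.5 / alpha * np.dot(y - z, y - z) + rho * np.sum(np.abs(A @ y))
        dual = (-0.5 / alpha * np.sum((alpha * A.T @ v - z) ** 2)
                + 0.5 / alpha * np.sum(z ** 2))
        if primal - dual <= eps_k:                     # h^(k)(y~) - Psi^(k)(v) <= eps_k
            break
    return y
```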

9. Introducing Scaling

Add a new parameter, an s.p.d. (symmetric positive definite) scaling matrix D_k, which determines a different metric at each iterate: replace ‖x‖ with ‖x‖_{D_k} = \sqrt{x^T D_k x}.

Variable Metric Inexact Line–Search Algorithm (VMILA)

$$
\begin{aligned}
z^{(k)} &= x^{(k)} - \alpha_k D_k^{-1} \nabla f_0(x^{(k)}) && \leftarrow \text{Scaled Forward step}\\
\tilde{y}^{(k)} &\approx y^{(k)} \equiv \mathrm{prox}^{D_k}_{\alpha_k f_1}(z^{(k)}) && \leftarrow \text{Scaled Inexact Backward step}\\
d^{(k)} &= \tilde{y}^{(k)} - x^{(k)} &&\\
x^{(k+1)} &= x^{(k)} + \lambda_k d^{(k)} && \leftarrow \text{Armijo-like line–search}
\end{aligned}
$$
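
Putting the pieces together, one VMILA iteration with a diagonal metric D_k (stored as a vector d_k of its diagonal entries) might look like the sketch below. Here inexact_scaled_prox is a hypothetical helper returning both the approximate scaled proximal point and the value h^(k)(ỹ^(k)) needed by the line-search, and armijo_linesearch is the sketch given earlier; none of these names come from the papers, and the actual rules used for choosing α_k and D_k are not reproduced.

```python
def vmila_step(x, grad_f0, f, inexact_scaled_prox, alpha_k, d_k, eps_k):
    """One VMILA-style iteration with a diagonal scaling D_k = diag(d_k):
    scaled forward step, inexact scaled backward step, Armijo-like line-search."""
    z = x - alpha_k * grad_f0(x) / d_k                # scaled forward step: D_k^{-1} acts entrywise
    # hypothetical helper: approximate prox^{D_k}_{alpha_k f1}(z) and the value h^(k)(y_tilde)
    y_tilde, h_y = inexact_scaled_prox(z, alpha_k, d_k, eps_k)
    lam = armijo_linesearch(f, x, y_tilde, h_y)       # backtracking on the relaxation parameter
    return x + lam * (y_tilde - x)
```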

10. Summary of convergence results about VMILA

VMILA: λ_k chosen by line–search + inexact computation of the proximal point with increasing accuracy + α_k bounded.

Convex case:
- Assumption: D_k → I as k → ∞, at rate C/k^p with p > 1.
- Convergence of {x^{(k)}} to a minimizer (without Lipschitz assumptions on ∇f_0(x)).
- Convergence rate f(x^{(k)}) − f* = O(1/k) (proof with Lipschitz assumptions on ∇f_0(x)).

Nonconvex case:
- Assumption: D_k has bounded eigenvalues.
- Every accumulation point of {x^{(k)}} is a stationary point.
- If f satisfies the Kurdyka–Łojasiewicz property and ∇f_0 is locally Lipschitz, then {x^{(k)}} converges to a stationary point (with exact proximal point computation).

Block-coordinate version of VMILA proposed in [B., Prato, Rebegoldi, 2018, to appear].

NB: α_k and D_k are required only to be bounded ⇒ use them to implement some acceleration strategy.
