
Splitting Envelopes: Accelerated Second-order Proximal Methods - Presentation Slides



  1. Splitting Envelopes: Accelerated Second-order Proximal Methods. Panos Patrinos (joint work with Lorenzo Stella and Alberto Bemporad), September 8, 2014.

  2. Outline
  - forward-backward envelope (FBE)
  - forward-backward Newton method (FBN)
  - dual FBE and Augmented Lagrangian
  - alternating minimization Newton method (AMNM)
  - Douglas-Rachford envelope (DRE)
  - accelerated Douglas-Rachford splitting (ADRS)
  based on:
  1. P. Patrinos and A. Bemporad. Proximal Newton methods for convex composite optimization. In Proc. 52nd IEEE Conference on Decision and Control (CDC), pages 2358-2363, Florence, Italy, 2013.
  2. P. Patrinos, L. Stella, and A. Bemporad. Forward-backward truncated Newton methods for convex composite optimization. Submitted, arXiv:1402.6655, 2014.
  3. P. Patrinos, L. Stella, and A. Bemporad. Douglas-Rachford splitting: complexity estimates and accelerated variants. In Proc. 53rd IEEE Conference on Decision and Control (CDC), Los Angeles, CA, arXiv:1407.6723, 2014.
  4. L. Stella, P. Patrinos, and A. Bemporad. Alternating minimization Newton method for separable convex optimization. Submitted, 2014.
  fixed-point implementation for MPC:
  - A. Guiggiani, P. Patrinos, and A. Bemporad. Fixed-point implementation of a proximal Newton method for embedded model predictive control. In 19th IFAC World Congress, South Africa, 2014.

  3. Convex composite optimization
  minimize F(x) = f(x) + g(x)
  - f : ℝⁿ → ℝ convex, twice continuously differentiable with ‖∇f(x) − ∇f(y)‖ ≤ L_f ‖x − y‖ for all x, y ∈ ℝⁿ
  - g : ℝⁿ → ℝ ∪ {+∞} convex, nonsmooth, with inexpensive proximal mapping
      prox_γg(x) = argmin_{z ∈ ℝⁿ} { g(z) + (1/2γ) ‖z − x‖² }
  - many problem classes: QPs, cone programs, sparse least-squares, rank minimization, total variation minimization, ...
  - applications: control, system identification, signal processing, image analysis, machine learning, ...

  4. Proximal mappings
  prox_γg(x) = argmin_{z ∈ ℝⁿ} { g(z) + (1/2γ) ‖z − x‖² },   γ > 0
  - resolvent of the maximal monotone operator ∂g: prox_γg(x) = (I + γ∂g)⁻¹(x)
  - single-valued and (firmly) nonexpansive
  - explicitly computable for many functions (see Parikh, Boyd '14; Combettes, Pesquet '10)
  - reduces to a projection when g is the indicator δ_C of a convex set C: prox_γδ_C(x) = Π_C(x)
  - z = prox_γg(x) is an implicit subgradient step: z = x − γv with v ∈ ∂g(z), since 0 ∈ ∂g(z) + γ⁻¹(z − x)
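To make the "inexpensive proximal mapping" point concrete, here is a minimal Python/NumPy sketch (an illustration, not from the slides) of two prox operators that recur in the talk: soft-thresholding for the ℓ1 norm and projection for the indicator of a box.

import numpy as np

def prox_l1(x, gamma):
    # prox of gamma*||.||_1: componentwise soft-thresholding
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def prox_box(x, lo, hi):
    # prox of the indicator of the box [lo, hi]: Euclidean projection onto the box
    return np.clip(x, lo, hi)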

  5. Proximal Minimization Algorithm
  minimize g(x),   g : ℝⁿ → ℝ ∪ {+∞} closed proper convex
  given x⁰ ∈ ℝⁿ, repeat  x^{k+1} = prox_γg(x^k),   γ > 0
  - fixed-point iteration for the optimality condition: 0 ∈ ∂g(x⋆) ⟺ x⋆ ∈ (I + γ∂g)(x⋆) ⟺ x⋆ = prox_γg(x⋆)
  - special case of the proximal point algorithm (Martinet '70, Rockafellar '76)
  - converges under very general conditions
  - mostly a conceptual algorithm

  6. Moreau envelope
  Moreau envelope of a closed proper convex g : ℝⁿ → ℝ ∪ {+∞}:
      g^γ(x) = inf_{z ∈ ℝⁿ} { g(z) + (1/2γ) ‖z − x‖² },   γ > 0
  - g^γ is real-valued, convex, differentiable with (1/γ)-Lipschitz gradient
      ∇g^γ(x) = (1/γ)(x − prox_γg(x))
  - minimizing the nonsmooth g is equivalent to minimizing the smooth g^γ
  - the proximal minimization algorithm is the gradient method for g^γ:
      x^{k+1} = x^k − γ ∇g^γ(x^k)
  - can use any method of unconstrained smooth minimization for g^γ
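As a small numerical sanity check (Python/NumPy, not part of the slides), the snippet below evaluates the Moreau envelope of g = ‖·‖₁ through its prox, uses the gradient formula above, and verifies that one gradient step of length γ on g^γ coincides with one proximal minimization step.

import numpy as np

def prox_l1(x, gamma):
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def moreau_env_l1(x, gamma):
    # g^gamma(x) = g(z) + (1/2 gamma)||z - x||^2 at z = prox_{gamma g}(x), with g = ||.||_1
    z = prox_l1(x, gamma)
    val = np.abs(z).sum() + np.dot(z - x, z - x) / (2 * gamma)
    grad = (x - z) / gamma            # gradient formula from the slide
    return val, grad

x, gamma = np.array([1.5, -0.3, 0.0]), 0.5
_, grad = moreau_env_l1(x, gamma)
# one gradient step on the envelope equals one proximal minimization step
assert np.allclose(x - gamma * grad, prox_l1(x, gamma))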

  7. Forward-Backward Splitting (FBS)
  minimize F(x) = f(x) + g(x)
  - optimality condition: x⋆ ∈ ℝⁿ is optimal if and only if x⋆ = prox_γg(x⋆ − γ∇f(x⋆)),   γ > 0
  - forward-backward splitting (aka proximal gradient):
      x^{k+1} = prox_γg(x^k − γ∇f(x^k)),   γ ∈ (0, 2/L_f)
  - FBS is a fixed-point iteration
  - g = 0: gradient method; g = δ_C: gradient projection; f = 0: proximal minimization
  - accelerated versions (Nesterov)
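A minimal Python/NumPy sketch of the FBS iteration (illustrative, not from the slides), instantiated for f(x) = ½‖Ax − b‖² and g = λ‖·‖₁, i.e. a lasso problem:

import numpy as np

def fbs_lasso(A, b, lam, iters=500):
    # forward-backward splitting for 0.5*||Ax - b||^2 + lam*||x||_1
    Lf = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of grad f
    gamma = 1.0 / Lf                          # any gamma in (0, 2/Lf) works
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        y = x - gamma * (A.T @ (A @ x - b))   # forward (gradient) step
        x = np.sign(y) * np.maximum(np.abs(y) - gamma * lam, 0.0)  # backward (prox) step
    return x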

  8. Forward-Backward Envelope
  Start from the optimality condition  x − prox_γg(x − γ∇f(x)) = 0.
  - use prox_γg(y) = y − γ∇g^γ(y) with y = x − γ∇f(x) to rewrite it as
      γ∇f(x) + γ∇g^γ(x − γ∇f(x)) = 0
  - multiply by γ⁻¹(I − γ∇²f(x)) (positive definite for γ ∈ (0, 1/L_f)); the left-hand side is the gradient of the Forward-Backward Envelope (FBE)
      F^FB_γ(x) = f(x) − (γ/2) ‖∇f(x)‖² + g^γ(x − γ∇f(x))
  - alternative expression for the FBE (linearize f around x):
      F^FB_γ(x) = inf_{z ∈ ℝⁿ} { f(x) + ⟨∇f(x), z − x⟩ + g(z) + (1/2γ) ‖z − x‖² }
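The following Python/NumPy snippet (illustrative, not from the slides) evaluates the FBE for the lasso instance used earlier and checks numerically that the two expressions on this slide agree:

import numpy as np

def prox_l1(u, gamma):
    return np.sign(u) * np.maximum(np.abs(u) - gamma, 0.0)

def fbe_lasso(x, A, b, lam, gamma):
    # f(x) = 0.5*||Ax - b||^2,  g = lam*||.||_1
    r = A @ x - b
    f, grad = 0.5 * (r @ r), A.T @ r
    y = x - gamma * grad                                  # forward step
    z = prox_l1(y, gamma * lam)                           # backward step
    g_env = lam * np.abs(z).sum() + (z - y) @ (z - y) / (2 * gamma)   # g^gamma(y)
    fbe = f - 0.5 * gamma * (grad @ grad) + g_env         # first expression
    # alternative expression: partial linearization of f around x, minimized at z
    fbe_alt = f + grad @ (z - x) + lam * np.abs(z).sum() + (z - x) @ (z - x) / (2 * gamma)
    assert np.isclose(fbe, fbe_alt)
    return fbe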

  9. Properties of FBE
  - stationary points of F^FB_γ = minimizers of F
  - reformulates the original nonsmooth problem into a smooth one:
      minimize_{x ∈ ℝⁿ} F^FB_γ(x)  is equivalent to  minimize_{x ∈ ℝⁿ} F(x)
  - F^FB_γ is real-valued and continuously differentiable, with
      ∇F^FB_γ(x) = γ⁻¹(I − γ∇²f(x))(x − prox_γg(x − γ∇f(x)))
  - FBS is a variable-metric gradient method for the FBE:
      x^{k+1} = x^k − γ D_k⁻¹ ∇F^FB_γ(x^k),   D_k = I − γ∇²f(x^k)
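A small numerical check of the last point (Python/NumPy, not from the slides): on a random lasso instance, one FBS step equals a variable-metric gradient step on the FBE with metric D_k = I − γ∇²f(x^k).

import numpy as np

def prox_l1(u, gamma):
    return np.sign(u) * np.maximum(np.abs(u) - gamma, 0.0)

rng = np.random.default_rng(0)
A, b, lam = rng.standard_normal((20, 10)), rng.standard_normal(20), 0.1
x = rng.standard_normal(10)
gamma = 0.5 / np.linalg.norm(A, 2) ** 2       # gamma in (0, 1/Lf)
H = A.T @ A                                   # Hessian of f
grad_f = A.T @ (A @ x - b)
z = prox_l1(x - gamma * grad_f, gamma * lam)  # FBS step
D = np.eye(10) - gamma * H
grad_fbe = D @ (x - z) / gamma                # gradient of the FBE (formula above)
assert np.allclose(z, x - gamma * np.linalg.solve(D, grad_fbe))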

  10. Forward-Backward Newton Method (FBN)
  Input: x⁰ ∈ ℝⁿ, γ ∈ (0, 1/L_f), σ ∈ (0, 1/2)
  for k = 0, 1, 2, ... do
    Newton direction: choose H_k ∈ ∂̂²F^FB_γ(x^k); compute d^k by solving (approximately)
        H_k d = −∇F^FB_γ(x^k)
    Line search: compute the stepsize τ_k by backtracking until
        F^FB_γ(x^k + τ_k d^k) ≤ F^FB_γ(x^k) + σ τ_k ⟨∇F^FB_γ(x^k), d^k⟩
    Update: x^{k+1} = x^k + τ_k d^k
  end
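A structural Python sketch of the loop above (not the authors' code). The callables FBE, grad_FBE, and hess_FBE are hypothetical problem-specific routines evaluating F^FB_γ, ∇F^FB_γ, and an element of the approximate generalized Hessian.

import numpy as np

def fbn(x0, FBE, grad_FBE, hess_FBE, sigma=0.1, iters=50):
    # forward-backward Newton: Newton direction on the FBE plus backtracking line search
    x = x0.copy()
    for _ in range(iters):
        g = grad_FBE(x)
        H = hess_FBE(x)                       # H_k in the approximate generalized Hessian
        d = np.linalg.solve(H, -g)            # Newton direction (could also be solved inexactly, cf. FBN-CG)
        tau, f0 = 1.0, FBE(x)
        while FBE(x + tau * d) > f0 + sigma * tau * (g @ d):   # backtracking (sufficient decrease)
            tau *= 0.5
        x = x + tau * d
    return x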

  11. Linear Newton approximation
  The FBE is C¹ but not C². The Newton system is  H d = −∇F^FB_γ(x), where
      ∇F^FB_γ(x) = γ⁻¹(I − γ∇²f(x))(x − prox_γg(x − γ∇f(x)))
  and ∂̂²F_γ(x) is an approximate generalized Hessian containing the matrices
      H = γ⁻¹(I − γ∇²f(x))(I − P(I − γ∇²f(x))),   P ∈ ∂_C(prox_γg)(x − γ∇f(x)),
  with ∂_C denoting Clarke's generalized Jacobian.
  - preserves all favorable properties of the Hessian for C² functions
  - "Gauss-Newton" generalized Hessian: 3rd-order terms are omitted

  12. Generalized Jacobians of proximal mappings
  - ∂_C prox_γg(x) is the following set of matrices (Clarke, 1983): the convex hull of limits of (ordinary) Jacobians over every sequence converging to x that consists of points where prox_γg is differentiable
  - prox_γg(x) simple to compute ⟹ some P ∈ ∂_C(prox_γg)(x) is available for free
  - g (block) separable ⟹ P ∈ ∂_C(prox_γg)(x) is (block) diagonal
  Example: ℓ1 norm (more examples in Patrinos, Stella, Bemporad (2014)). For g(x) = ‖x‖₁,
      prox_γg(x)_i = x_i + γ  if x_i ≤ −γ,   0  if −γ ≤ x_i ≤ γ,   x_i − γ  if x_i ≥ γ,
  and the P ∈ ∂_C(prox_γg)(x) are diagonal matrices with
      P_ii = 1  if |x_i| > γ,   P_ii ∈ [0, 1]  if |x_i| = γ,   P_ii = 0  if |x_i| < γ.
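For this ℓ1 example, a valid element of Clarke's generalized Jacobian of the prox is a diagonal 0/1 matrix obtained essentially for free while computing the prox. A Python/NumPy sketch (not from the slides; on the boundary |x_i| = γ it picks P_ii = 0, one of the allowed values):

import numpy as np

def prox_l1_with_jacobian(x, gamma):
    # soft-thresholding plus one element P of Clarke's generalized Jacobian (diagonal)
    z = np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)
    P_diag = (np.abs(x) > gamma).astype(float)   # 1 where |x_i| > gamma, 0 elsewhere
    return z, np.diag(P_diag)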

  13. Convergence of FBN
  - every limit point of {x^k} belongs to argmin_{x ∈ ℝⁿ} F(x)
  - all H ∈ ∂̂²F_γ(x⋆) nonsingular ⟹ asymptotically Q-quadratic rate
  Extension: FBN II
  - apply an FB step after each Newton step
  - same asymptotic rate, plus global complexity estimates:
    - non-strongly convex f: sublinear rate for F(x^k) − F(x⋆)
    - strongly convex f: linear rate for F(x^k) − F(x⋆) and ‖x^k − x⋆‖²

  14. FBN-CG for large problems
  Run conjugate gradient (CG) on the regularized Newton system until the residual satisfies
      ‖(H_k + δ_k I) d^k + ∇F^FB_γ(x^k)‖ ≤ η_k ‖∇F^FB_γ(x^k)‖
  with η_k = O(‖∇F^FB_γ(x^k)‖) and δ_k = O(‖∇F^FB_γ(x^k)‖).
  Properties:
  - no need to form ∇²f(x) and H_k, only matrix-vector products
  - same convergence properties
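A minimal matrix-free conjugate gradient routine in the spirit of this slide (Python/NumPy, illustrative, not the authors' implementation): matvec is any function returning H_k v, b would be −∇F^FB_γ(x^k), and tol would be set to η_k‖∇F^FB_γ(x^k)‖.

import numpy as np

def cg_matvec(matvec, b, tol, delta=0.0, maxiter=100):
    # conjugate gradient on (H + delta*I) d = b using only matrix-vector products
    d = np.zeros_like(b)
    r = b.copy()                      # residual for the zero initial guess
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter):
        Hp = matvec(p) + delta * p
        alpha = rs / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol:    # stop once the residual norm meets the tolerance
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d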

  15. Box-constrained convex programs
  minimize f(x) subject to ℓ ≤ x ≤ u
  The Newton direction solves
      minimize_d  (1/2)⟨d, ∇²f(x^k) d⟩ + ⟨∇f(x^k), d⟩
      subject to  d_i = ℓ_i − x^k_i for i ∈ β₁,   d_i = u_i − x^k_i for i ∈ β₂,
  where
      β₁ = { i | x^k_i − γ∇_i f(x^k) ≤ ℓ_i }   (estimate of x⋆_i = ℓ_i)
      β₂ = { i | x^k_i − γ∇_i f(x^k) ≥ u_i }   (estimate of x⋆_i = u_i).
  With β = β₁ ∪ β₂ and δ = [n] \ β, the Newton system reduces to
      Q_δδ d_δ = −(∇_δ f(x^k) + ∇_δβ f(x^k) d_β),   Q = ∇²f(x^k).
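A Python/NumPy sketch of the reduced Newton system for the box-constrained case (illustrative, not from the slides), assuming for simplicity a quadratic f with constant Hessian Q:

import numpy as np

def box_newton_direction(x, grad, Q, lo, hi, gamma):
    # active-set estimates beta1, beta2 from the slide, then the reduced Newton system
    beta1 = x - gamma * grad <= lo           # estimated components at the lower bound
    beta2 = x - gamma * grad >= hi           # estimated components at the upper bound
    beta = beta1 | beta2
    delta = ~beta                            # estimated free components
    d = np.empty_like(x)
    d[beta1] = lo[beta1] - x[beta1]
    d[beta2] = hi[beta2] - x[beta2]
    rhs = -(grad[delta] + Q[np.ix_(delta, beta)] @ d[beta])
    d[delta] = np.linalg.solve(Q[np.ix_(delta, delta)], rhs)
    return d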

  16. Example
  minimize (1/2)⟨x, Qx⟩ + ⟨q, x⟩ subject to ℓ ≤ x ≤ u, with n = 1000.
  [Two plots of F(x^ν) − F⋆ versus time (sec), for cond(Q) = 10⁴ and cond(Q) = 10⁸, comparing FBN, FBN II, PNM, PGNM, FGM, and AFBS.]
  GUROBI: 4.87 sec, CPLEX: 3.73 sec (cond(Q) = 10⁴); GUROBI: 5.96 sec, CPLEX: 4.83 sec (cond(Q) = 10⁸).
  FBN: much less sensitive to bad conditioning.

  17. Sparse least-squares
  minimize (1/2)‖Ax − b‖² + λ‖x‖₁
  - the Newton system becomes
      d_β = −x_β
      A_{·δ}ᵀ A_{·δ} d_δ = −[A_{·δ}ᵀ(A_{·δ} x_δ − b) + λ sign(x_δ − γ∇_δ f(x))]
  - δ = { i | |x_i − γ∇_i f(x)| > λγ } is an estimate of the nonzero components of x⋆ (β is its complement)
  - close to the solution, δ is small
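A corresponding Python/NumPy sketch (illustrative, not from the slides) that forms the support estimate δ and solves the reduced Newton system; for large problems the dense solve would be replaced by the matrix-free CG of slide 14.

import numpy as np

def sparse_ls_newton_direction(x, A, b, lam, gamma):
    # Newton direction for 0.5*||Ax - b||^2 + lam*||x||_1 using the support estimate delta
    grad = A.T @ (A @ x - b)
    u = x - gamma * grad
    delta = np.abs(u) > lam * gamma          # estimated nonzero components of x*
    beta = ~delta
    d = np.empty_like(x)
    d[beta] = -x[beta]                       # drive estimated zeros to zero
    Ad = A[:, delta]
    rhs = -(Ad.T @ (Ad @ x[delta] - b) + lam * np.sign(u[delta]))
    d[delta] = np.linalg.solve(Ad.T @ Ad, rhs)
    return d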
