Splitting Envelopes: Accelerated Second-order Proximal Methods

Panos Patrinos (joint work with Lorenzo Stella, Alberto Bemporad)
September 8, 2014
Outline

- forward-backward envelope (FBE) and forward-backward Newton method (FBN)
- dual FBE and Augmented Lagrangian: alternating minimization Newton method (AMNM)
- Douglas-Rachford envelope (DRE) and accelerated Douglas-Rachford splitting (ADRS)

based on
1. P. Patrinos and A. Bemporad. Proximal Newton methods for convex composite optimization. In Proc. 52nd IEEE Conference on Decision and Control (CDC), pages 2358-2363, Florence, Italy, 2013.
2. P. Patrinos, L. Stella, and A. Bemporad. Forward-backward truncated Newton methods for convex composite optimization. Submitted, arXiv:1402.6655, 2014.
3. P. Patrinos, L. Stella, and A. Bemporad. Douglas-Rachford splitting: complexity estimates and accelerated variants. In Proc. 53rd IEEE Conference on Decision and Control (CDC), Los Angeles, CA, arXiv:1407.6723, 2014.
4. L. Stella, P. Patrinos, and A. Bemporad. Alternating minimization Newton method for separable convex optimization, 2014 (submitted).

fixed point implementation for MPC
5. A. Guiggiani, P. Patrinos, and A. Bemporad. Fixed-point implementation of a proximal Newton method for embedded model predictive control. In Proc. 19th IFAC World Congress, South Africa, 2014.
Convex composite optimization

minimize  F(x) = f(x) + g(x)

- f : R^n → R convex, twice continuously differentiable with ‖∇f(x) − ∇f(y)‖ ≤ L_f ‖x − y‖ for all x, y ∈ R^n
- g : R^n → R convex, nonsmooth, with inexpensive proximal mapping

    prox_{γg}(x) = argmin_{z ∈ R^n} { g(z) + (1/(2γ)) ‖z − x‖² }

- many problem classes: QPs, cone programs, sparse least-squares, rank minimization, total variation minimization, ...
- applications: control, system identification, signal processing, image analysis, machine learning, ...
Proximal mappings

    prox_{γg}(x) = argmin_{z ∈ R^n} { g(z) + (1/(2γ)) ‖z − x‖² },   γ > 0

- resolvent of the maximal monotone operator ∂g: prox_{γg}(x) = (I + γ∂g)^{-1}(x)
- single-valued and (firmly) nonexpansive
- explicitly computable for many functions (see Parikh, Boyd '14; Combettes, Pesquet '10)
- reduces to a projection when g is the indicator of a convex set C: prox_{γδ_C}(x) = Π_C(x)
- z = prox_{γg}(x) is an implicit subgradient step: 0 ∈ ∂g(z) + γ^{-1}(z − x), i.e. z = x − γv with v ∈ ∂g(z)
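As a concrete illustration, here is a minimal NumPy sketch (function names are ours, not from the talk) of two closed-form proximal mappings that reappear later: soft-thresholding for the ℓ₁ norm and projection onto a box.

```python
import numpy as np

def prox_l1(x, gamma):
    """prox of gamma*||.||_1: componentwise soft-thresholding at level gamma."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def prox_box(x, lo, hi):
    """prox of the indicator of the box [lo, hi]: Euclidean projection (independent of gamma)."""
    return np.clip(x, lo, hi)
```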
Proximal Minimization Algorithm

minimize  g(x),   g : R^n → R closed proper convex

given x^0 ∈ R^n, repeat
    x^{k+1} = prox_{γg}(x^k),   γ > 0

- fixed point iteration for the optimality condition
    0 ∈ ∂g(x⋆)  ⇔  x⋆ ∈ (I + γ∂g)(x⋆)  ⇔  x⋆ = prox_{γg}(x⋆)
- special case of the proximal point algorithm (Martinet '70, Rockafellar '76)
- converges under very general conditions
- mostly a conceptual algorithm
Moreau envelope

Moreau envelope of a closed proper convex g : R^n → R:

    g^γ(x) = inf_{z ∈ R^n} { g(z) + (1/(2γ)) ‖z − x‖² },   γ > 0

- g^γ is real-valued, convex, differentiable with 1/γ-Lipschitz gradient
    ∇g^γ(x) = (1/γ)(x − prox_{γg}(x))
- minimizing nonsmooth g is equivalent to minimizing smooth g^γ
- proximal minimization algorithm = gradient method for g^γ:
    x^{k+1} = x^k − γ ∇g^γ(x^k)
- can use any method of unconstrained smooth minimization on g^γ
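To make the identity concrete, a small sketch (helper signatures are ours, not from the slides) that evaluates the Moreau envelope and its gradient from the prox; one proximal-minimization step is then exactly one gradient step on g^γ.

```python
import numpy as np

def moreau_envelope(g, prox_g, x, gamma):
    """Value and gradient of the Moreau envelope g^gamma at x.
    g(z) returns the function value, prox_g(x, gamma) the proximal point."""
    z = prox_g(x, gamma)
    value = g(z) + np.sum((z - x) ** 2) / (2.0 * gamma)
    grad = (x - z) / gamma            # = (1/gamma) * (x - prox_{gamma g}(x))
    return value, grad

# one proximal-minimization step = one gradient step on the envelope:
#   x - gamma * grad  ==  prox_g(x, gamma)
```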
Forward-Backward Splitting (FBS)

minimize  F(x) = f(x) + g(x)

- optimality condition: x⋆ ∈ R^n is optimal if and only if
    x⋆ = prox_{γg}(x⋆ − γ∇f(x⋆)),   γ > 0
- forward-backward splitting (aka proximal gradient):
    x^{k+1} = prox_{γg}(x^k − γ∇f(x^k)),   γ ∈ (0, 2/L_f)
- FBS is a fixed point iteration
- special cases: g = 0 gives the gradient method, g = δ_C gradient projection, f = 0 proximal minimization
- accelerated versions (Nesterov)
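A minimal sketch of the FBS recursion (helper signatures are ours):

```python
def fbs(grad_f, prox_g, x0, gamma, iters=1000):
    """Forward-backward splitting: x+ = prox_{gamma g}(x - gamma * grad f(x)),
    with gamma in (0, 2/L_f)."""
    x = x0
    for _ in range(iters):
        x = prox_g(x - gamma * grad_f(x), gamma)   # forward (gradient) step, then backward (prox) step
    return x
```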
Forward-Backward Envelope

start from the optimality condition
    x − prox_{γg}(x − γ∇f(x)) = 0
use prox_{γg}(y) = y − γ∇g^γ(y) with y = x − γ∇f(x):
    γ∇f(x) + γ∇g^γ(x − γ∇f(x)) = 0
multiply by γ^{-1}(I − γ∇²f(x)) (positive definite for γ ∈ (0, 1/L_f)); the left-hand side is the gradient of the Forward-Backward Envelope (FBE)

    F^FB_γ(x) = f(x) − (γ/2)‖∇f(x)‖² + g^γ(x − γ∇f(x))

alternative expression for the FBE (linearize f around x):

    F^FB_γ(x) = inf_{z ∈ R^n} { f(x) + ⟨∇f(x), z − x⟩ + g(z) + (1/(2γ))‖z − x‖² }
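The FBE can be evaluated at the cost of a single forward-backward step (one gradient, one prox); a sketch with assumed helper names:

```python
import numpy as np

def fbe(f, grad_f, g, prox_g, x, gamma):
    """Forward-backward envelope
       F_gamma(x) = f(x) - (gamma/2)*||grad f(x)||^2 + g^gamma(x - gamma*grad f(x))."""
    v = grad_f(x)
    y = x - gamma * v                                    # forward step
    z = prox_g(y, gamma)                                 # backward step
    g_env = g(z) + np.sum((z - y) ** 2) / (2.0 * gamma)  # Moreau envelope of g at y
    return f(x) - 0.5 * gamma * np.sum(v ** 2) + g_env
```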
Properties of FBE

- stationary points of F^FB_γ = minimizers of F
- reformulates the original nonsmooth problem into a smooth one:
    minimize_{x ∈ R^n} F^FB_γ(x)   is equivalent to   minimize_{x ∈ R^n} F(x)
- F^FB_γ is real-valued, continuously differentiable, with
    ∇F^FB_γ(x) = γ^{-1}(I − γ∇²f(x))(x − prox_{γg}(x − γ∇f(x)))
- FBS is a variable metric gradient method for the FBE:
    x^{k+1} = x^k − γ D_k^{-1} ∇F^FB_γ(x^k)
Forward-Backward Newton Method (FBN)

Input: x^0 ∈ R^n, γ ∈ (0, 1/L_f), σ ∈ (0, 1/2)
for k = 0, 1, 2, . . . do
    Newton direction: choose H_k ∈ ∂̂²F^FB_γ(x^k) and compute d^k by solving (approximately)
        H_k d = −∇F^FB_γ(x^k)
    Line search: compute the stepsize τ_k by backtracking until
        F^FB_γ(x^k + τ_k d^k) ≤ F^FB_γ(x^k) + σ τ_k ⟨∇F^FB_γ(x^k), d^k⟩
    Update: x^{k+1} = x^k + τ_k d^k
end
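A compact sketch of the loop above (exact dense solve for illustration only; fbe_val, fbe_grad and gen_hessian are assumed helpers built from the formulas on the surrounding slides):

```python
import numpy as np

def fbn(fbe_val, fbe_grad, gen_hessian, x0, gamma, sigma=0.1, iters=50, tol=1e-10):
    """Forward-backward Newton sketch: Newton direction on the FBE plus backtracking line search.
    gen_hessian(x) is assumed to return some H in the approximate generalized Hessian of the FBE."""
    x = x0
    for _ in range(iters):
        grad = fbe_grad(x)
        if np.linalg.norm(grad) <= tol:
            break
        H = gen_hessian(x)
        d = np.linalg.solve(H, -grad)          # here: exact solve; in practice truncated CG
        tau, slope = 1.0, grad @ d
        while tau > 1e-12 and fbe_val(x + tau * d) > fbe_val(x) + sigma * tau * slope:
            tau *= 0.5                         # backtracking (Armijo) on the envelope
        x = x + tau * d
    return x
```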
Linear Newton approximation

the FBE is C¹ but not C²; solve H d = −∇F^FB_γ(x), where
    ∇F^FB_γ(x) = γ^{-1}(I − γ∇²f(x))(x − prox_{γg}(x − γ∇f(x)))
and ∂̂²F_γ(x) is an approximate generalized Hessian:
    H = γ^{-1}(I − γ∇²f(x))(I − P(I − γ∇²f(x))) ∈ ∂̂²F_γ(x),
where P ∈ ∂_C(prox_{γg})(x − γ∇f(x)) is an element of Clarke's generalized Jacobian

- preserves all favorable properties of the Hessian of C² functions
- "Gauss-Newton" generalized Hessian: third-order terms are omitted
Generalized Jacobians of proximal mappings

∂_C prox_{γg}(x) is the following set of matrices (Clarke, 1983): the convex hull of limits of (ordinary) Jacobians along every sequence converging to x that consists of points where prox_{γg} is differentiable

- prox_{γg}(x) simple to compute ⇒ P ∈ ∂_C(prox_{γg})(x) for free
- g (block) separable ⇒ P ∈ ∂_C(prox_{γg})(x) (block) diagonal

example: ℓ₁ norm (more examples in Patrinos, Stella, Bemporad (2014))

g(x) = ‖x‖₁:
    prox_{γg}(x)_i = x_i + γ  if x_i ≤ −γ,    0  if −γ ≤ x_i ≤ γ,    x_i − γ  if x_i ≥ γ
P ∈ ∂_C(prox_{γg})(x) are diagonal matrices with
    P_ii = 1  if |x_i| > γ,    P_ii ∈ [0, 1]  if |x_i| = γ,    P_ii = 0  if |x_i| < γ
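For the ℓ₁ example, one element of the generalized Jacobian is simply a 0/1 diagonal matrix; a sketch (choosing P_ii = 0 in the boundary case |x_i| = γ):

```python
import numpy as np

def prox_l1_jac_diag(x, gamma):
    """Diagonal of one element P of Clarke's generalized Jacobian of prox_{gamma*||.||_1}:
    P_ii = 1 where |x_i| > gamma, P_ii = 0 where |x_i| < gamma
    (any value in [0, 1] is admissible when |x_i| = gamma; we pick 0)."""
    return (np.abs(x) > gamma).astype(float)
```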
Convergence of FBN

- every limit point of {x^k} belongs to argmin_{x ∈ R^n} F(x)
- all H ∈ ∂̂²F_γ(x⋆) nonsingular ⇒ Q-quadratic asymptotic rate

extension: FBN II, which applies an FB step after each Newton step
- same asymptotic rate + global complexity estimates
    - non-strongly convex f: sublinear rate for F(x^k) − F(x⋆)
    - strongly convex f: linear rate for F(x^k) − F(x⋆) and ‖x^k − x⋆‖²
FBN–CG

for large problems: run conjugate gradient (CG) on the regularized Newton system until the residual satisfies
    ‖(H_k + δ_k I)d^k + ∇F^FB_γ(x^k)‖ ≤ η_k ‖∇F^FB_γ(x^k)‖,
with η_k = O(‖∇F^FB_γ(x^k)‖) and δ_k = O(‖∇F^FB_γ(x^k)‖)

properties
- no need to form ∇²f(x) and H_k explicitly; only matrix-vector products
- same convergence properties
Box-constrained convex programs

minimize  f(x)   subject to  ℓ ≤ x ≤ u

the Newton direction solves
    minimize  (1/2)⟨d, ∇²f(x^k)d⟩ + ⟨∇f(x^k), d⟩
    subject to  d_i = ℓ_i − x^k_i, i ∈ β₁,    d_i = u_i − x^k_i, i ∈ β₂
where
    β₁ = {i | x^k_i − γ∇_i f(x^k) ≤ ℓ_i}   (estimate of x⋆_i = ℓ_i)
    β₂ = {i | x^k_i − γ∇_i f(x^k) ≥ u_i}   (estimate of x⋆_i = u_i)
the Newton system becomes (with Q = ∇²f(x^k), β = β₁ ∪ β₂, δ = [n] \ β)
    Q_{δδ} d_δ = −(∇_δ f(x^k) + Q_{δβ} d_β)
Example

minimize  (1/2)⟨x, Qx⟩ + ⟨q, x⟩   subject to  ℓ ≤ x ≤ u,   n = 1000

[figure: F(x^ν) − F⋆ versus time [sec] for PNM, PGNM, FGM, FBN, FBN II, AFBS; left panel cond(Q) = 10^4, right panel cond(Q) = 10^8]

- cond(Q) = 10^4: GUROBI 4.87 sec, CPLEX 3.73 sec
- cond(Q) = 10^8: GUROBI 5.96 sec, CPLEX 4.83 sec

FBN: much less sensitive to bad conditioning
Sparse least-squares

minimize  (1/2)‖Ax − b‖² + λ‖x‖₁

the Newton system becomes
    d_β = −x_β
    A_{·δ}ᵀ A_{·δ} d_δ = −[A_{·δ}ᵀ(A_{·δ} x_δ − b) + λ sign(x_δ − γ∇_δ f(x))]
where δ is an estimate of the nonzero components of x⋆:
    δ = {i | |x_i − γ∇_i f(x)| > λγ}
close to the solution, δ is small
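A sketch of how the reduced Newton system above could be assembled for this problem; the index-set logic follows the slide, while the dense solve is only for illustration:

```python
import numpy as np

def lasso_newton_direction(A, b, x, lam, gamma):
    """Sketch of the FBN direction for 1/2||Ax-b||^2 + lam*||x||_1.
    delta estimates the support of x*, beta its complement."""
    grad = A.T @ (A @ x - b)
    step = x - gamma * grad
    delta = np.abs(step) > lam * gamma          # estimated nonzero components of x*
    beta = ~delta
    d = np.zeros_like(x, dtype=float)
    d[beta] = -x[beta]                          # d_beta = -x_beta
    Ad = A[:, delta]
    rhs = -(Ad.T @ (Ad @ x[delta] - b) + lam * np.sign(step[delta]))
    d[delta] = np.linalg.solve(Ad.T @ Ad, rhs)  # reduced system on the estimated support
    return d
```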
FBN methods are robust

- 548 datasets taken from http://wwwopt.mathematik.tu-darmstadt.de/spear/
- compared against AFBS, YALL1 (ADMM-based), SpaRSA (Barzilai-Borwein), l1-ls (interior point)
- performance plot: a point (x, y) indicates that the algorithm is at most x times slower on a fraction y of the problems

[figure: performance profile (frequency versus performance ratio) for FBN-CG I, FBN-CG II, Accelerated FBS, YALL1, SpaRSA, l1-ls]
Sparse Logistic Regression

minimize_{x, y}  Σ_{i=1}^m log(1 + exp(−b_i(a_iᵀ x + y))) + λ‖x‖₁,   λ > 0

[figure: F^k − F⋆ versus time (s) for AFBS and FBN]
Augmented Lagrangians and Moreau envelopes

minimize  f(x)   subject to  Ax = b,
with f : R^n → R convex (can be nonsmooth) and A ∈ R^{p×n}

Augmented Lagrangian
    L_γ(x, y) = f(x) + ⟨y, Ax − b⟩ + (γ/2)‖Ax − b‖²

Augmented Lagrangian method (ALM, or method of multipliers), Hestenes (1969), Powell (1969):
    x^k = argmin_{x ∈ R^n} L_γ(x, y^k)
    y^{k+1} = y^k + γ(Ax^k − b)

ALM = proximal minimization for the dual = gradient method for the Moreau envelope of the dual (Rockafellar 1973, 1976)
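A minimal sketch of ALM, assuming a user-supplied routine for the inner minimization of the augmented Lagrangian (the routine name is ours):

```python
def alm(argmin_x_auglag, A, b, y0, gamma, iters=100):
    """Augmented Lagrangian method (method of multipliers).
    argmin_x_auglag(y, gamma) is assumed to return argmin_x L_gamma(x, y)."""
    y = y0
    x = None
    for _ in range(iters):
        x = argmin_x_auglag(y, gamma)   # primal step (inner minimization)
        y = y + gamma * (A @ x - b)     # multiplier update = gradient step on the dual envelope
    return x, y
```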
Separable convex problems

minimize  f(x) + g(z)   subject to  Ax + Bz = b,   A ∈ R^{p×n}, B ∈ R^{p×m}

- f, g convex (can be nonsmooth); f strongly convex with convexity parameter μ_f
- f, g are "nice", e.g. separable; coupling is introduced through the constraints
- ALM is not amenable to decomposition: the x and z updates are coupled

Alternating Minimization Method (AMM): FBS applied to the dual
    x^{k+1} = argmin_{x ∈ R^n} L_0(x, z^k, y^k)
    z^{k+1} = argmin_{z ∈ R^m} L_γ(x^{k+1}, z, y^k)
    y^{k+1} = y^k + γ(Ax^{k+1} + Bz^{k+1} − b)
with γ ∈ (0, 2μ_f/‖A‖²), where the Augmented Lagrangian is
    L_γ(x, z, y) = f(x) + g(z) + ⟨y, Ax + Bz − b⟩ + (γ/2)‖Ax + Bz − b‖²
Dual FBE

the FBE of the dual problem is the augmented Lagrangian:
    F^FB_γ(y) = L_γ(x(y), z(y), y)
where x(y), z(y) are the AMM updates
    x(y) = argmin_{x ∈ R^n} { f(x) + ⟨y, Ax⟩ }
    z(y) = argmin_{z ∈ R^m} { g(z) + ⟨y, Bz⟩ + (γ/2)‖Ax(y) + Bz − b‖² }
Connection between AMM and FBE

the dual problem is equivalent to
    maximize_{y ∈ R^p}  F^FB_γ(y) = L_γ(x(y), z(y), y),   γ ∈ (0, μ_f/‖A‖²)

f ∈ C²(R^n) ⇒ F^FB_γ ∈ C¹(R^p), with
    ∇F^FB_γ(y) = D(y)(Ax(y) + Bz(y) − b),   D(y) = I − γA(∇²f(x(y)))^{-1}Aᵀ

AMM is a variable metric gradient method on the dual FBE:
    y^{k+1} = y^k + γ D(y^k)^{-1} ∇F^FB_γ(y^k)

AMNM: FBN applied to the dual
Strictly convex QPs

minimize  (1/2)⟨x, Qx⟩ + ⟨q, x⟩   subject to  ℓ ≤ Ax ≤ u,   A ∈ R^{2000×1000},  cond(Q) = 10^4

[figure: primal infeasibility and duality gap versus time [sec] for PNM, PGNM, FGM, AMNM, AMNM II, FAMM]

GUROBI: 6.7 s, CPLEX: 60.8 s, Fast AMM: 41 s, AMNM: 8.1 s, AMNM II: 11.3 s
Projection onto Convex Sets

minimize_x  (1/2)‖x − p‖² + Σ_{i=1}^m δ_{C_i}(z_i)
subject to  x = z_i,  i = 1, . . . , m

δ_C is the indicator of C; x⋆ is the projection of p onto C₁ ∩ . . . ∩ C_m
compared methods: Dykstra, Fast AMM, AMNM

random problem with m = 100 random hyperplanes in R^120
[figure: ‖x^k − x⋆‖ versus time (s) for AMM, Fast AMM, Dykstra, ADMM, AMNM]
Distributed MPC

minimize  Σ_{i=1}^M [ Σ_{t=1}^{N−1} ( ‖ξ_i(t)‖²_{Q_i} + ‖u_i(t)‖²_{R_i} ) + ‖ξ_i(N)‖²_{P_i} ]
subject to  ξ_i(0) = ξ^0_i,                                            i ∈ N_[1,M]
            ξ_i(t+1) = Σ_{j ∈ N_i} Φ_{ij}ξ_j(t) + Γ_{ij}u_j(t),        t ∈ N_[0,N−1], i ∈ N_[1,M]
            (ξ_i(t), u_i(t)) ∈ Y_i,                                    t ∈ N_[0,N−1], i ∈ N_[1,M]
            ξ_i(N) ∈ Z_i,                                              i ∈ N_[1,M]

- solve an optimal control problem over a network of agents
- local constraint sets Y_i, Z_i are simple; coupling enters through the dynamics
- complicated local and coupled constraints can be handled as well
DMPC simulations

- M = 100 subsystems, 23600 variables, 20600 constraints
[figure: primal residual versus time (s) for Fast AMM and AMNM]

communication rounds (in thousands), average over 20 instances (8 states, 3 inputs, N = 10):

    M     FAMM (local)   AMNM (local)   AMNM (global)
    5         29.5            2.1            2.1
    10        47.0            2.2            2.1
    20        65.7            2.4            2.4
    50       104.8            3.3            3.3
    100      139.2            3.9            3.8
    200      159.3            4.4            4.3
Douglas-Rachford Splitting

minimize  F(x) = f(x) + g(x)

optimality condition: x⋆ is optimal if and only if x⋆ = prox_{γf}(x̃), where x̃ solves
    prox_{γg}(2 prox_{γf}(x) − x) − prox_{γf}(x) = 0

DRS:
    y^k = prox_{γf}(x^k)
    z^k = prox_{γg}(2y^k − x^k)
    x^{k+1} = x^k + λ_k(z^k − y^k)
with γ > 0 and λ_k ∈ [0, 2] such that Σ_{k ∈ N} λ_k(2 − λ_k) = +∞

DRS is a relaxed fixed point iteration
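A sketch of the DRS recursion with a fixed relaxation parameter (helper signatures are ours):

```python
def drs(prox_f, prox_g, x0, gamma, lam=1.0, iters=1000):
    """Douglas-Rachford splitting: relaxed fixed-point iteration on x;
    a minimizer is recovered as prox_f of the fixed point."""
    x = x0
    for _ in range(iters):
        y = prox_f(x, gamma)
        z = prox_g(2 * y - x, gamma)
        x = x + lam * (z - y)
    return prox_f(x, gamma)
```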
ADMM

minimize  f(x) + g(z)   subject to  Ax + Bz = b,   A ∈ R^{p×n}, B ∈ R^{p×m},   f, g convex (can be nonsmooth)

Alternating Direction Method of Multipliers:
    x^{k+1} = argmin_{x ∈ R^n} L_γ(x, z^k, y^k)
    z^{k+1} = argmin_{z ∈ R^m} L_γ(x^{k+1}, z, y^k)
    y^{k+1} = y^k + γ(Ax^{k+1} + Bz^{k+1} − b)

ADMM = DRS applied to the dual (Eckstein, Bertsekas, 1992)
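An ADMM sketch in the same style, with the two partial minimizations supplied by the caller (names are ours):

```python
def admm(argmin_x, argmin_z, A, B, b, z0, y0, gamma, iters=200):
    """ADMM sketch. argmin_x(z, y) and argmin_z(x, y) are assumed to minimize
    the augmented Lagrangian L_gamma over x and over z, respectively."""
    z, y = z0, y0
    x = None
    for _ in range(iters):
        x = argmin_x(z, y)                       # x-update
        z = argmin_z(x, y)                       # z-update
        y = y + gamma * (A @ x + B @ z - b)      # dual update
    return x, z, y
```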
Douglas-Rachford Envelope

assume f convex, C², with ‖∇f(x) − ∇f(y)‖ ≤ L_f‖x − y‖ for all x, y ∈ R^n ⇒ the Moreau envelope f^γ is C²

optimality condition:
    prox_{γf}(x) − prox_{γg}(2 prox_{γf}(x) − x) = 0
use ∇h^γ(x) = γ^{-1}(x − prox_{γh}(x)):
    ∇f^γ(x) + ∇g^γ(x − 2γ∇f^γ(x)) = 0
multiply by (I − 2γ∇²f^γ(x)), γ ∈ (0, 1/L_f), and "integrate" to obtain the Douglas-Rachford Envelope (DRE)

    F^DR_γ(x) = f^γ(x) − γ‖∇f^γ(x)‖² + g^γ(x − 2γ∇f^γ(x))
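Like the FBE, the DRE can be evaluated cheaply, here from the Moreau envelope of f and the prox of g; a sketch with assumed helper names:

```python
import numpy as np

def dre(f_env, grad_f_env, g, prox_g, x, gamma):
    """Douglas-Rachford envelope
       F_gamma(x) = f^gamma(x) - gamma*||grad f^gamma(x)||^2 + g^gamma(x - 2*gamma*grad f^gamma(x)).
    f_env / grad_f_env evaluate the Moreau envelope of f and its gradient."""
    v = grad_f_env(x)
    y = x - 2.0 * gamma * v
    z = prox_g(y, gamma)
    g_env = g(z) + np.sum((z - y) ** 2) / (2.0 * gamma)  # Moreau envelope of g at y
    return f_env(x) - gamma * np.sum(v ** 2) + g_env
```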
Properties of DRE

for γ ∈ (0, 1/L_f): minimizing the DRE = minimizing the nonsmooth F
    inf F(x) = inf F^DR_γ(x),    argmin F(x) = prox_{γf}(argmin F^DR_γ(x))

- f ∈ C²(R^n) ⇒ the DRE is C¹ on R^n
- f quadratic ⇒ the DRE is convex with (1/γ)-Lipschitz gradient
Connection between DRE and FBE

partial linearization of F around x:
    ℓ_F(z; x) = f(x) + ⟨∇f(x), z − x⟩ + g(z)

FBE:
    F^FB_γ(x) = min_{z ∈ R^n} { ℓ_F(z; x) + (1/(2γ))‖z − x‖² }

DRE (use ∇f^γ(x) = γ^{-1}(x − prox_{γf}(x)) = ∇f(prox_{γf}(x))):
    F^DR_γ(x) = min_{z ∈ R^n} { ℓ_F(z; prox_{γf}(x)) + (1/(2γ))‖z − prox_{γf}(x)‖² }

so the DRE equals the FBE evaluated at prox_{γf}(x):
    F^DR_γ(x) = F^FB_γ(prox_{γf}(x))
Connection between DRS and FBS

FBS (with relaxation):
    x^{k+1} = x^k + λ_k ( prox_{γg}(x^k − γ∇f(x^k)) − x^k )

DRS:
    y^k = prox_{γf}(x^k)
    x^{k+1} = x^k + λ_k ( prox_{γg}(2y^k − x^k) − y^k )

for f ∈ C¹(R^n): y^k = prox_{γf}(x^k) = x^k − γ∇f(prox_{γf}(x^k)), so DRS becomes
    y^k = prox_{γf}(x^k)
    x^{k+1} = x^k + λ_k ( prox_{γg}(y^k − γ∇f(y^k)) − y^k )

DRS iteration = FBS iteration at the "shifted" point y^k = prox_{γf}(x^k)
DRS as a variable metric method

DRS:
    x^{k+1} = x^k + λ_k ( prox_{γg}(2 prox_{γf}(x^k) − x^k) − prox_{γf}(x^k) )

gradient of the DRE:
    ∇F^DR_γ(x) = γ^{-1}(I − 2γ∇²f^γ(x)) ( prox_{γf}(x) − prox_{γg}(2 prox_{γf}(x) − x) )

DRS is a variable metric method applied to the DRE:
    x^{k+1} = x^k − λ_k D_k ∇F^DR_γ(x^k),   where D_k = γ(I − 2γ∇²f^γ(x^k))^{-1}

- the relaxation parameter λ_k of DRS plays the role of the stepsize of the gradient method
- can use backtracking for selecting λ_k
DRS – complexity estimates

assume f quadratic ⇒ the DRE is convex for γ ∈ (0, 1/L_f)
- DRS is a preconditioned gradient method under the change of variables x = Sw, S = D^{1/2}
- convergence rate of DRS with λ_k = λ = (1 − γL_f)/(1 + γL_f):
    F(z^{k+1}) − F⋆ ≤ (1/(2γλk)) ‖x^0 − x̃‖²
- optimal prox-parameter γ for the DRE: γ⋆ = (√2 − 1)/L_f
- linear convergence rate if F is strongly convex
Accelerated DRS

the DRE is convex with (1/γ)-Lipschitz gradient for f quadratic and γ ∈ (0, 1/L_f)

Nesterov's FGM applied to the (preconditioned) DRE, with x^0 = x^{−1} ∈ R^n:
    u^k = x^k + β_k(x^k − x^{k−1})
    y^k = prox_{γf}(u^k)
    z^k = prox_{γg}(2y^k − u^k)
    x^{k+1} = u^k + λ(z^k − y^k),   λ = (1 − γL_f)/(1 + γL_f)
can choose β_k = (k − 1)/(k + 2)

convergence rate is O(1/k²):
    F(z^k) − F⋆ ≤ (4/(γλ(k + 2)²)) ‖x^0 − x̃‖²
linear convergence for f or g strongly convex
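A sketch of the accelerated scheme above, valid under the slide's assumptions (f quadratic, γ ∈ (0, 1/L_f)); helper signatures are ours:

```python
def accelerated_drs(prox_f, prox_g, x0, gamma, L_f, iters=1000):
    """Accelerated DRS sketch: Nesterov extrapolation wrapped around the DRS step,
    with lambda = (1 - gamma*L_f)/(1 + gamma*L_f)."""
    lam = (1.0 - gamma * L_f) / (1.0 + gamma * L_f)
    x_prev = x = x0                      # x^0 = x^{-1}
    z = None
    for k in range(iters):
        beta = (k - 1.0) / (k + 2.0)     # extrapolation weight (inactive at k = 0 since x_prev == x)
        u = x + beta * (x - x_prev)
        y = prox_f(u, gamma)
        z = prox_g(2 * y - u, gamma)
        x_prev, x = x, u + lam * (z - y)
    return z
```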
Sparse least-squares

minimize  (1/2)‖Ax − b‖² + λ‖x‖₁

[figure: relative suboptimality versus iterations; left: DRS with γ ∈ {0.2/L_f, γ⋆, 0.6/L_f, 0.8/L_f}; right: DRS versus Fast DRS]
Take home message I

- proximal minimization = gradient method on the Moreau envelope (70's)
- FBS = (variable metric) gradient method on the FBE (this talk)
- DRS = (variable metric) gradient method on the DRE (this talk)
Take home message II

- ALM = proximal minimization for the dual; the Moreau envelope of the dual is min_{x ∈ R^n} L_γ(x, y) (Rockafellar, 1973)
- AMM = FBS for the dual; the FBE of the dual is L_γ(x(y), z(y), y), where
    x(y) = argmin_{x ∈ R^n} L_0(x, z, y)
    z(y) = argmin_{z ∈ R^m} L_γ(x(y), z, y)

to conclude
- interpretation of operator splitting algorithms as gradient methods
- splitting envelopes can lead to new exciting algorithms
- examples: FBN, AMNM and ADRS