SLIDE 1
Anderson Accelerated Douglas-Rachford Splitting

Anqi Fu, Junzi Zhang, Stephen Boyd
EE & ICME Departments, Stanford University
March 10, 2020
SLIDE 2
SLIDE 3
Outline
Problem Overview
Douglas-Rachford Splitting
Anderson Acceleration
Numerical Experiments
Conclusion
SLIDE 4
Prox-Affine Problem
Prox-affine convex optimization problem:

$$\begin{array}{ll} \text{minimize} & \sum_{i=1}^{N} f_i(x_i) \\ \text{subject to} & \sum_{i=1}^{N} A_i x_i = b \end{array}$$

with variables $x_i \in \mathbf{R}^{n_i}$ for $i = 1, \ldots, N$
◮ $A_i \in \mathbf{R}^{m \times n_i}$ and $b \in \mathbf{R}^m$ are given data
◮ $f_i : \mathbf{R}^{n_i} \to \mathbf{R} \cup \{+\infty\}$ are closed, convex, and proper
◮ Each $f_i$ can only be accessed via its proximal operator

$$\mathbf{prox}_{tf_i}(v_i) = \operatorname*{argmin}_{x_i} \left( f_i(x_i) + \frac{1}{2t} \|x_i - v_i\|_2^2 \right),$$

where $t > 0$ is a parameter
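For instance, for $f(x) = \|x\|_1$ the proximal operator is elementwise soft-thresholding; a minimal NumPy sketch:

```python
import numpy as np

def prox_norm1(v, t):
    # Soft-thresholding: argmin_x ||x||_1 + (1/(2t)) * ||x - v||_2^2
    return np.maximum(v - t, 0) - np.maximum(-v - t, 0)
```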
SLIDE 5
Why This Formulation?
◮ Encompasses many classes of convex problems (conic programs, consensus optimization)
◮ Block separable form is ideal for distributed optimization
◮ Proximal operator can be provided as a “black box”, enabling privacy-preserving implementation
SLIDE 6
Previous Work
◮ Alternating direction method of multipliers (ADMM)
◮ Douglas-Rachford splitting (DRS)
◮ Augmented Lagrangian method (ALM)
SLIDE 7
Previous Work
◮ Alternating direction method of multipliers (ADMM)
◮ Douglas-Rachford splitting (DRS)
◮ Augmented Lagrangian method (ALM)

These methods are typically slow to converge, prompting research into acceleration techniques:
◮ Adaptive penalty parameters
◮ Momentum methods
◮ Quasi-Newton methods with line search
SLIDE 8
Our Method
◮ A2DR: Anderson acceleration (AA) applied to DRS
◮ DRS is a non-expansive fixed-point (NEFP) method that fits the prox-affine framework
◮ AA is fast and efficient, and can be applied to NEFP iterations, but is unstable without modification
◮ We introduce a type-II AA variant that converges globally in non-smooth, potentially pathological settings
SLIDE 9
Main Advantages
◮ A2DR produces primal and dual solutions, or a certificate of infeasibility/unboundedness
◮ Consistently converges faster, with no parameter tuning
◮ Memory efficient ⇒ little extra cost per iteration
◮ Scales to large problems and is easily parallelized
◮ Python implementation: https://github.com/cvxgrp/a2dr
SLIDE 10
Outline
Problem Overview
Douglas-Rachford Splitting
Anderson Acceleration
Numerical Experiments
Conclusion
SLIDE 11
DRS Algorithm
◮ Define $A = [A_1 \; \cdots \; A_N]$ and $x = (x_1, \ldots, x_N)$
◮ Rewrite the problem using a set indicator $I_S$:

$$\text{minimize} \quad \sum_{i=1}^{N} f_i(x_i) + I_{\{Ax = b\}}(x)$$

◮ DRS iterates for $k = 1, 2, \ldots$:

$$\begin{aligned} x_i^{k+1/2} &= \mathbf{prox}_{tf_i}(v_i^k), \quad i = 1, \ldots, N \\ v^{k+1/2} &= 2x^{k+1/2} - v^k \\ x^{k+1} &= \Pi_{\{Av = b\}}(v^{k+1/2}) \\ v^{k+1} &= v^k + x^{k+1} - x^{k+1/2} \end{aligned}$$

where $\Pi_S(v)$ is the Euclidean projection of $v$ onto $S$
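A minimal sketch of these iterations for a single block ($N = 1$), assuming $A$ has full row rank so the projection onto $\{v : Av = b\}$ has the closed form $v - A^T(AA^T)^{-1}(Av - b)$; an illustration, not the library implementation:

```python
import numpy as np

def drs(prox_f, A, b, v0, t=1.0, iters=100):
    # One-block DRS: prox step, reflection, projection onto {v : Av = b},
    # then the averaging update. Assumes A has full row rank.
    v = v0
    AAt = A @ A.T
    for _ in range(iters):
        x_half = prox_f(v, t)                                     # x^{k+1/2}
        v_half = 2 * x_half - v                                   # v^{k+1/2}
        x = v_half - A.T @ np.linalg.solve(AAt, A @ v_half - b)   # x^{k+1}
        v = v + x - x_half                                        # v^{k+1}
    return x_half, v
```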
SLIDE 12
Convergence of DRS
◮ The DRS iterations can be viewed as a fixed-point iteration $v^{k+1} = F(v^k)$, where $F$ is firmly non-expansive
◮ $v^k$ converges to a fixed point of $F$ (if one exists)
◮ $x^k$ and $x^{k+1/2}$ converge to a solution of our problem
SLIDE 13
Convergence of DRS
◮ The DRS iterations can be viewed as a fixed-point iteration $v^{k+1} = F(v^k)$, where $F$ is firmly non-expansive
◮ $v^k$ converges to a fixed point of $F$ (if one exists)
◮ $x^k$ and $x^{k+1/2}$ converge to a solution of our problem

In practice, this convergence is often slow...
SLIDE 14
Outline
Problem Overview
Douglas-Rachford Splitting
Anderson Acceleration
Numerical Experiments
Conclusion
SLIDE 15
Type-II AA
◮ Quasi-Newton method for accelerating fixed-point iterations
◮ Extrapolates the next iterate using the $M + 1$ most recent iterates:

$$v^{k+1} = \sum_{j=0}^{M} \alpha_j^k F(v^{k-M+j})$$

◮ Let $G(v) = v - F(v)$; then $\alpha^k \in \mathbf{R}^{M+1}$ is the solution to

$$\begin{array}{ll} \text{minimize} & \left\| \sum_{j=0}^{M} \alpha_j^k G(v^{k-M+j}) \right\|_2^2 \\ \text{subject to} & \sum_{j=0}^{M} \alpha_j^k = 1 \end{array}$$

◮ Typically only $M \approx 10$ is needed for good performance
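The constrained least-squares problem for $\alpha^k$ has only $M + 1$ variables and can be solved through its KKT system; a sketch, where the columns of G hold $G(v^{k-M}), \ldots, G(v^k)$:

```python
import numpy as np

def aa_weights(G):
    # Solve: minimize ||G @ alpha||_2^2 subject to sum(alpha) == 1
    # via the KKT system [2 G^T G, 1; 1^T, 0] [alpha; nu] = [0; 1].
    m = G.shape[1]
    K = np.block([[2 * G.T @ G, np.ones((m, 1))],
                  [np.ones((1, m)), np.zeros((1, 1))]])
    rhs = np.append(np.zeros(m), 1.0)
    return np.linalg.lstsq(K, rhs, rcond=None)[0][:m]
```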
SLIDE 16
Adaptive Regularization
◮ Type-II AA is unstable (Scieur, d’Aspremont, Bach 2016) and can provably diverge (Mai, Johansson 2019)
◮ Add an adaptive regularization term to an unconstrained reformulation
SLIDE 17
Adaptive Regularization
◮ Type-II AA is unstable (Scieur, d’Aspremont, Bach 2016) and can provably diverge (Mai, Johansson 2019)
◮ Add an adaptive regularization term to an unconstrained reformulation
◮ Change variables to $\gamma^k \in \mathbf{R}^M$:

$$\alpha_0^k = \gamma_0^k, \qquad \alpha_i^k = \gamma_i^k - \gamma_{i-1}^k, \quad i = 1, \ldots, M-1, \qquad \alpha_M^k = 1 - \gamma_{M-1}^k$$

◮ The unconstrained AA problem is

$$\text{minimize} \quad \|g^k - Y_k \gamma^k\|_2^2,$$

where we define $g^k = G(v^k)$, $y^k = g^{k+1} - g^k$, $Y_k = [y^{k-M} \; \cdots \; y^{k-1}]$
SLIDE 18
Adaptive Regularization
◮ Type-II AA is unstable (Scieur, d’Aspremont, Bach 2016) and can provably diverge (Mai, Johansson 2019)
◮ Add an adaptive regularization term to an unconstrained reformulation
◮ Change variables to $\gamma^k \in \mathbf{R}^M$:

$$\alpha_0^k = \gamma_0^k, \qquad \alpha_i^k = \gamma_i^k - \gamma_{i-1}^k, \quad i = 1, \ldots, M-1, \qquad \alpha_M^k = 1 - \gamma_{M-1}^k$$

◮ The stabilized AA problem is

$$\text{minimize} \quad \|g^k - Y_k \gamma^k\|_2^2 + \eta \left( \|S_k\|_F^2 + \|Y_k\|_F^2 \right) \|\gamma^k\|_2^2,$$

where $\eta \geq 0$ is a parameter and $g^k = G(v^k)$, $y^k = g^{k+1} - g^k$, $Y_k = [y^{k-M} \; \cdots \; y^{k-1}]$, $s^k = v^{k+1} - v^k$, $S_k = [s^{k-M} \; \cdots \; s^{k-1}]$
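The stabilized problem is a ridge-regularized least squares, solvable in closed form via the normal equations; a sketch that also maps $\gamma^k$ back to $\alpha^k$ using the change of variables above:

```python
import numpy as np

def aa_weights_reg(g, Y, S, eta=1e-8):
    # Minimize ||g - Y @ gamma||^2 + eta * (||S||_F^2 + ||Y||_F^2) * ||gamma||^2.
    lam = eta * (np.linalg.norm(S, 'fro')**2 + np.linalg.norm(Y, 'fro')**2)
    M = Y.shape[1]
    gamma = np.linalg.solve(Y.T @ Y + lam * np.eye(M), Y.T @ g)
    # Map back: alpha_0 = gamma_0, alpha_i = gamma_i - gamma_{i-1},
    # alpha_M = 1 - gamma_{M-1}.
    return np.concatenate([[gamma[0]], np.diff(gamma), [1 - gamma[-1]]])
```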
SLIDE 19
A2DR
◮ Parameters: $M$ = max memory, $R$ = safeguarding parameter
◮ A2DR iterates for $k = 1, 2, \ldots$:

1. $v_{\mathrm{DRS}}^{k+1} = F(v^k)$, $\quad g^k = v^k - v_{\mathrm{DRS}}^{k+1}$
2. Compute $\alpha^k$ by solving the stabilized AA problem
3. $v_{\mathrm{AA}}^{k+1} = \sum_{j=0}^{M} \alpha_j^k v_{\mathrm{DRS}}^{k-M+j+1}$
4. Safeguard check: if $\|G(v^k)\|_2$ is small enough, set $v^{k+i} = v_{\mathrm{AA}}^{k+i}$ for $i = 1, \ldots, R$; otherwise set $v^{k+1} = v_{\mathrm{DRS}}^{k+1}$

◮ The safeguard ensures convergence in the infeasible/unbounded case
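Putting the pieces together, a simplified sketch of the loop, reusing aa_weights_reg from the previous slide. The threshold eps_safe / (k + 1) is only a stand-in for the paper's safeguard schedule, and the $R$-iteration acceptance window is omitted:

```python
import numpy as np

def a2dr_sketch(F, v0, M=10, eta=1e-8, iters=1000, eps_safe=1e6):
    v, V, V_drs, G = v0, [v0], [], []
    for k in range(iters):
        v_drs = F(v)                          # step 1: one DRS pass
        g = v - v_drs
        V_drs, G = (V_drs + [v_drs])[-(M + 1):], (G + [g])[-(M + 1):]
        if len(G) >= 2 and np.linalg.norm(g) <= eps_safe / (k + 1):
            Y = np.diff(np.column_stack(G), axis=1)            # Y_k
            S = np.diff(np.column_stack(V[-len(G):]), axis=1)  # S_k
            alpha = aa_weights_reg(g, Y, S, eta)               # step 2
            v = np.column_stack(V_drs) @ alpha                 # step 3: AA candidate
        else:
            v = v_drs                         # step 4: safeguarded fallback to DRS
        V = (V + [v])[-(M + 1):]
    return v
```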
SLIDE 20
Stopping Criterion of A2DR
◮ Stop and output $x^{k+1/2}$ when $\|r^k\|_2 \leq \epsilon_{\mathrm{tol}}$, where

$$r_{\mathrm{prim}}^k = Ax^{k+1/2} - b, \qquad r_{\mathrm{dual}}^k = \frac{1}{t}(v^k - x^{k+1/2}) + A^T \lambda^k, \qquad r^k = (r_{\mathrm{prim}}^k, r_{\mathrm{dual}}^k)$$

◮ The dual variable is the minimizer of the dual residual norm:

$$\lambda^k = \operatorname*{argmin}_{\lambda} \left\| \frac{1}{t}(v^k - x^{k+1/2}) + A^T \lambda \right\|_2$$

◮ Note that this is a simple least-squares problem
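A sketch of the residual computation, with dense $A$ for illustration:

```python
import numpy as np

def residual_norm(A, b, v, x_half, t):
    # lambda^k minimizes ||(1/t)(v - x_half) + A^T lambda||_2: an ordinary
    # least-squares problem in lambda, solved here with lstsq.
    r = (v - x_half) / t
    lam = np.linalg.lstsq(A.T, -r, rcond=None)[0]
    r_prim = A @ x_half - b
    r_dual = r + A.T @ lam
    return np.linalg.norm(np.concatenate([r_prim, r_dual]))
```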
SLIDE 21
Convergence of A2DR

Theorem (Solvable Case). If the problem is feasible and bounded, then

$$\liminf_{k \to \infty} \|r^k\|_2 = 0$$

and the AA candidates are adopted infinitely often. Furthermore, if $F$ has a fixed point,

$$\lim_{k \to \infty} v^k = v^\star \quad \text{and} \quad \lim_{k \to \infty} x^{k+1/2} = x^\star,$$

where $v^\star$ is a fixed point of $F$ and $x^\star$ is a solution to the problem.
SLIDE 22
Convergence of A2DR

Theorem (Pathological Case). If the problem is pathological (infeasible or unbounded), then

$$\lim_{k \to \infty} (v^k - v^{k+1}) = \delta v \neq 0.$$

Furthermore, if $\lim_{k \to \infty} Ax^{k+1/2} = b$, the problem is unbounded and $\|\delta v\|_2 = t \, \mathbf{dist}(\mathbf{dom}\, f^*, \mathcal{R}(A^T))$. Otherwise, it is infeasible and $\|\delta v\|_2 \geq \mathbf{dist}(\mathbf{dom}\, f, \{x : Ax = b\})$. Here $f(x) = \sum_{i=1}^{N} f_i(x_i)$.
SLIDE 23
Preconditioning
◮ Convergence is greatly improved by rescaling the problem
◮ Replace the original $A$, $b$, $f_i$ with $\hat{A} = DAE$, $\hat{b} = Db$, $\hat{f}_i(\hat{x}_i) = f_i(e_i \hat{x}_i)$
◮ $D$ and $E$ are diagonal positive; $e_i > 0$ corresponds to the $i$th block diagonal entry of $E$
◮ $D$ and $E$ are chosen by equilibrating $A$ (see the paper for details)
◮ The proximal operator of $\hat{f}_i$ can be evaluated using the proximal operator of $f_i$:

$$\mathbf{prox}_{t\hat{f}_i}(\hat{v}_i) = \frac{1}{e_i} \mathbf{prox}_{(e_i^2 t) f_i}(e_i \hat{v}_i)$$
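This identity is a one-line wrapper around the original prox; a minimal sketch:

```python
def scale_prox(prox_f, e_i):
    # prox of f_hat(x) = f(e_i * x), via the identity above:
    # prox_{t f_hat}(v) = (1 / e_i) * prox_{(e_i^2 t) f}(e_i * v)
    return lambda v, t: prox_f(e_i * v, e_i**2 * t) / e_i
```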
SLIDE 24
Outline
Problem Overview
Douglas-Rachford Splitting
Anderson Acceleration
Numerical Experiments
Conclusion
SLIDE 25
Python Solver Interface
result = a2dr(prox_list, A_list, b)

Input arguments:
◮ prox_list is a list of proximal function handles, e.g., $f_i(x_i) = x_i$ ⇒ prox_list[i] = lambda v, t: v - t
◮ A_list is a list of the matrices $A_i$; b is the vector $b$

Output dictionary keys:
◮ num_iters is the total number of iterations $K$
◮ x_vals is a list of the final values $x_i^K$
◮ primal and dual are vectors containing the residuals $r_{\mathrm{prim}}^k$ and $r_{\mathrm{dual}}^k$ for $k = 1, \ldots, K$
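A tiny end-to-end call in this interface, following the import path from the repository's README; the toy problem and prox handles are illustrative:

```python
import numpy as np
from scipy import sparse
from a2dr import a2dr

# Toy problem: minimize ||x_1||_1 + I_{x >= 0}(x_2) subject to x_1 - x_2 = 0.
n = 10
prox_list = [
    lambda v, t: np.maximum(v - t, 0) - np.maximum(-v - t, 0),  # prox of ||.||_1
    lambda v, t: np.maximum(v, 0),                              # projection onto R^n_+
]
A_list = [sparse.eye(n), -sparse.eye(n)]
b = np.zeros(n)

result = a2dr(prox_list, A_list, b)
print(result["num_iters"], result["x_vals"][0])
```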
SLIDE 26
Proximal Library
We provide an extensive proximal library in a2dr.proximal:

| $f(x)$ | $\mathbf{prox}_{tf}(v)$ | Function handle |
| --- | --- | --- |
| $x$ | $v - t$ | prox_identity |
| $\|x\|_1$ | $(v - t)_+ - (-v - t)_+$ | prox_norm1 |
| $\|x\|_2$ | $(1 - t/\|v\|_2)_+ \, v$ | prox_norm2 |
| $\|x\|_\infty$ | bisection | prox_norm_inf |
| $e^x$ | $v - W(te^v)$ | prox_exp |
| $-\log(x)$ | $\left(v + \sqrt{v^2 + 4t}\right)/2$ | prox_neg_log |
| $\sum_i \log(1 + e^{x_i})$ | Newton-CG | prox_logistic |
| $\|Fx - g\|_2^2$ | LSQR | prox_sum_squares_affine |
| $I_{\mathbf{R}^n_+}(x)$ | $\max(v, 0)$ | prox_nonneg_constr |

...and much more! See the documentation for the full list.
SLIDE 27
Nonnegative Least Squares (NNLS)
$$\begin{array}{ll} \text{minimize} & \|Fz - g\|_2^2 \\ \text{subject to} & z \geq 0 \end{array}$$

with respect to $z \in \mathbf{R}^q$
◮ Problem data: $F \in \mathbf{R}^{p \times q}$ and $g \in \mathbf{R}^p$
◮ Can be written in standard form with

$$f_1(x_1) = \|Fx_1 - g\|_2^2, \qquad f_2(x_2) = I_{\mathbf{R}^q_+}(x_2), \qquad A_1 = I, \quad A_2 = -I, \quad b = 0$$

◮ We evaluate the proximal operator of $f_1$ using LSQR
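The prox of $f_1$ is itself a least-squares problem, since $\|Fx - g\|_2^2 + \frac{1}{2t}\|x - v\|_2^2$ can be written as a single stacked system; a sketch using SciPy's LSQR (the helper name is ours):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

def make_prox_sum_squares(F, g):
    # prox_{tf}(v) for f(x) = ||F x - g||_2^2, via the stacked least-squares
    # problem: minimize ||[F; I/sqrt(2t)] x - [g; v/sqrt(2t)]||_2^2.
    F = sparse.csr_matrix(F)
    q = F.shape[1]
    def prox(v, t):
        A_stack = sparse.vstack([F, sparse.eye(q) / np.sqrt(2 * t)])
        b_stack = np.concatenate([g, v / np.sqrt(2 * t)])
        return lsqr(A_stack, b_stack)[0]
    return prox
```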
SLIDE 28
NNLS: Convergence of $\|r^k\|_2$

$p = 10^4$, $q = 8000$, $F$ has 0.1% nonzeros

[Figure: residual norm $\|r^k\|_2$ vs. iteration $k$ for DRS and A2DR]
SLIDE 29
NNLS: Regularization Effect on $\|r^k\|_2$

$p = 300$, $q = 500$, $F$ has 0.1% nonzeros

[Figure: residual norm $\|r^k\|_2$ vs. iteration $k$ with no regularization, constant regularization, and adaptive regularization]
SLIDE 30
Sparse Inverse Covariance Estimation
◮ Samples $z_1, \ldots, z_p$ drawn IID from $N(0, \Sigma)$
◮ Know the covariance $\Sigma \in \mathbf{S}^q_+$ has a sparse inverse $S = \Sigma^{-1}$
◮ One way to estimate $S$ is by solving the penalized log-likelihood problem

$$\text{minimize} \quad -\log\det(S) + \mathbf{tr}(SQ) + \alpha \sum_{i,j} |S_{ij}|,$$

where $Q$ is the sample covariance and $\alpha \geq 0$ is a parameter
◮ Note that $\log\det(S) = -\infty$ when $S \not\succ 0$
SLIDE 31
Sparse Inverse Covariance Estimation
◮ The problem can be written in standard form with

$$f_1(S_1) = -\log\det(S_1) + \mathbf{tr}(S_1 Q), \qquad f_2(S_2) = \alpha \sum_{i,j} |(S_2)_{ij}|, \qquad A_1 = I, \quad A_2 = -I, \quad b = 0$$

◮ Both proximal operators have closed-form solutions
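For instance, the prox of $f_1$ follows from the optimality condition $-S^{-1} + Q + \frac{1}{t}(S - V) = 0$: eigendecompose $V - tQ$ and solve a scalar quadratic per eigenvalue. A sketch (helper name ours):

```python
import numpy as np

def make_prox_neg_log_det(Q):
    # prox_{tf}(V) for f(S) = -log det(S) + tr(SQ): with V - tQ = U diag(d) U^T,
    # each eigenvalue of the minimizer solves s^2 - d*s - t = 0, so
    # s = (d + sqrt(d^2 + 4t)) / 2 > 0, guaranteeing positive definiteness.
    def prox(V, t):
        d, U = np.linalg.eigh((V + V.T) / 2 - t * Q)   # symmetrized argument
        s = (d + np.sqrt(d**2 + 4 * t)) / 2
        return (U * s) @ U.T                           # U @ diag(s) @ U^T
    return prox
```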
SLIDE 32
Covariance Estimation: Convergence of $\|r^k\|_2$

$p = 1000$, $q = 100$, $S$ has 10% nonzeros

[Figure: residual norm $\|r^k\|_2$ vs. iteration $k$ for DRS and A2DR]
SLIDE 33
Multi-Task Logistic Regression
$$\text{minimize} \quad \phi(W\theta, Y) + \alpha \sum_{l=1}^{L} \|\theta_l\|_2 + \beta \|\theta\|_*$$

with respect to $\theta = [\theta_1 \; \cdots \; \theta_L] \in \mathbf{R}^{s \times L}$
◮ Problem data: $W \in \mathbf{R}^{p \times s}$ and $Y = [y_1 \; \cdots \; y_L] \in \mathbf{R}^{p \times L}$
◮ Regularization parameters: $\alpha \geq 0$, $\beta \geq 0$
◮ Logistic loss function:

$$\phi(Z, Y) = \sum_{l=1}^{L} \sum_{i=1}^{p} \log\left(1 + \exp(-Y_{il} Z_{il})\right)$$
SLIDE 34
Multi-Task Logistic Regression
◮ Rewrite the problem in standard form with

$$f_1(Z) = \phi(Z, Y), \qquad f_2(\theta) = \alpha \sum_{l=1}^{L} \|\theta_l\|_2, \qquad f_3(\tilde{\theta}) = \beta \|\tilde{\theta}\|_*,$$

$$A = \begin{bmatrix} I & -W & 0 \\ 0 & I & -I \end{bmatrix}, \qquad x = \begin{pmatrix} Z \\ \theta \\ \tilde{\theta} \end{pmatrix}, \qquad b = 0$$

◮ We evaluate the proximal operator of $f_1$ using a Newton-CG method; the rest have closed-form solutions
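For instance, the prox of $f_3(\tilde{\theta}) = \beta\|\tilde{\theta}\|_*$ is singular value soft-thresholding; a minimal sketch (helper name ours):

```python
import numpy as np

def make_prox_nuclear(beta):
    # prox_{tf}(V) for f(theta) = beta * ||theta||_*: soft-threshold the
    # singular values of V by beta * t.
    def prox(V, t):
        U, s, Vt = np.linalg.svd(V, full_matrices=False)
        return (U * np.maximum(s - beta * t, 0)) @ Vt
    return prox
```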
SLIDE 35
Multi-Task Logistic: Convergence of $\|r^k\|_2$

$p = 300$, $s = 500$, $L = 10$, $\alpha = \beta = 0.1$

[Figure: residual norm $\|r^k\|_2$ vs. iteration $k$ for DRS and A2DR]
SLIDE 36
Solver Comparisons
◮ Compare with the generic convex solvers OSQP (Stellato et al. 2017) and SCS (O’Donoghue et al. 2016)
◮ NNLS ($p = 10^4$, $q = 8000$): OSQP and SCS took 349 and 327 seconds, respectively; A2DR took 55 seconds
◮ Covariance estimation:
  ◮ $q = 1200$: SCS took 11 hours to achieve $\epsilon = 10^{-1}$; A2DR took 1 hour to achieve $\epsilon = 10^{-3}$
  ◮ $q = 2000$: SCS failed with an out-of-memory error; A2DR took 2.6 hours to achieve $\epsilon = 10^{-3}$
SLIDE 37
Outline
Problem Overview
Douglas-Rachford Splitting
Anderson Acceleration
Numerical Experiments
Conclusion
SLIDE 38
Conclusion
◮ A2DR is a fast, robust algorithm for solving generic convex optimization problems in prox-affine form
◮ Can be easily scaled up and parallelized
◮ Open-source Python library a2dr: https://github.com/cvxgrp/a2dr

Reference: A. Fu*, J. Zhang*, and S. Boyd (2019). “Anderson Accelerated Douglas-Rachford Splitting.” arXiv:1908.11482. (*Equal contribution.)
SLIDE 39