Anderson Accelerated Douglas-Rachford Splitting, by Anqi Fu and Junzi Zhang - PowerPoint PPT Presentation

  1. Anderson Accelerated Douglas-Rachford Splitting. Anqi Fu, Junzi Zhang, Stephen Boyd. EE & ICME Departments, Stanford University. March 10, 2020.

  2. Problem Overview, Douglas-Rachford Splitting, Anderson Acceleration, Numerical Experiments, Conclusion.

  3. Outline: Problem Overview, Douglas-Rachford Splitting, Anderson Acceleration, Numerical Experiments, Conclusion. (Section: Problem Overview)

  4. Prox-Affine Problem. Prox-affine convex optimization problem:

         minimize    Σ_{i=1}^N f_i(x_i)
         subject to  Σ_{i=1}^N A_i x_i = b

     with variables x_i ∈ R^{n_i} for i = 1, ..., N
     ◮ A_i ∈ R^{m × n_i} and b ∈ R^m are given data
     ◮ f_i : R^{n_i} → R ∪ {+∞} are closed, convex, and proper
     ◮ Each f_i can only be accessed via its proximal operator

         prox_{t f_i}(v_i) = argmin_{x_i} { f_i(x_i) + (1/(2t)) ||x_i − v_i||_2^2 },

       where t > 0 is a parameter
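
For concreteness, here is a minimal Python sketch (not from the talk) of a proximal operator with a closed form: soft-thresholding, the prox of the l1 norm that also appears in the proximal library table later in the deck.

```python
import numpy as np

def prox_l1(v, t):
    # prox_{t f}(v) for f(x) = ||x||_1, i.e.
    #   argmin_x ||x||_1 + (1/(2t)) ||x - v||_2^2,
    # which is elementwise soft-thresholding.
    return np.maximum(v - t, 0) - np.maximum(-v - t, 0)

print(prox_l1(np.array([1.5, -0.2, 0.7]), t=0.5))  # [1.  0.  0.2]
```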

  5. Why This Formulation?
     ◮ Encompasses many classes of convex problems (conic programs, consensus optimization)
     ◮ Block-separable form is ideal for distributed optimization
     ◮ Proximal operators can be provided as a “black box”, enabling privacy-preserving implementations

  6. Previous Work
     ◮ Alternating direction method of multipliers (ADMM)
     ◮ Douglas-Rachford splitting (DRS)
     ◮ Augmented Lagrangian method (ALM)

  7. Previous Work
     ◮ Alternating direction method of multipliers (ADMM)
     ◮ Douglas-Rachford splitting (DRS)
     ◮ Augmented Lagrangian method (ALM)
     These are typically slow to converge, prompting research into acceleration techniques:
     ◮ Adaptive penalty parameters
     ◮ Momentum methods
     ◮ Quasi-Newton methods with line search

  8. Our Method
     ◮ A2DR: Anderson acceleration (AA) applied to DRS
     ◮ DRS is a non-expansive fixed-point (NEFP) method that fits the prox-affine framework
     ◮ AA is fast, efficient, and can be applied to NEFP iterations, but it is unstable without modification
     ◮ We introduce a type-II AA variant that converges globally in non-smooth, potentially pathological settings

  9. Main Advantages
     ◮ A2DR produces primal and dual solutions, or a certificate of infeasibility/unboundedness
     ◮ Consistently converges faster with no parameter tuning
     ◮ Memory efficient ⇒ little extra cost per iteration
     ◮ Scales to large problems and is easily parallelized
     ◮ Python implementation: https://github.com/cvxgrp/a2dr

  10. Outline: Problem Overview, Douglas-Rachford Splitting, Anderson Acceleration, Numerical Experiments, Conclusion. (Section: Douglas-Rachford Splitting)

  11. DRS Algorithm
      ◮ Define A = [A_1 ... A_N] and x = (x_1, ..., x_N)
      ◮ Rewrite the problem using the set indicator I_S:

            minimize  Σ_{i=1}^N f_i(x_i) + I_{Ax = b}(x)

      ◮ DRS iterates for k = 1, 2, ...:

            x_i^{k+1/2} = prox_{t f_i}(v_i^k),   i = 1, ..., N
            v^{k+1/2}   = 2 x^{k+1/2} − v^k
            x^{k+1}     = Π_{Av = b}(v^{k+1/2})
            v^{k+1}     = v^k + x^{k+1} − x^{k+1/2}

        Π_S(v) is the Euclidean projection of v onto S
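
A rough Python sketch of one DRS iteration v^{k+1} = F(v^k) as written above. It assumes the blocks of v are given as index slices and computes the affine projection from the normal equations; the a2dr package uses its own, more careful implementation.

```python
import numpy as np

def drs_step(v, prox_list, A, b, t, blocks):
    # x^{k+1/2}: blockwise proximal step on each slice of v.
    x_half = np.concatenate([prox_list[i](v[s], t) for i, s in enumerate(blocks)])
    # Reflection, projection onto {x : A x = b}, and the v-update.
    v_half = 2 * x_half - v
    z = np.linalg.lstsq(A @ A.T, A @ v_half - b, rcond=None)[0]
    x_next = v_half - A.T @ z          # Euclidean projection of v_half
    return v + x_next - x_half         # v^{k+1}
```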

  12. Convergence of DRS
      ◮ The DRS iteration can be viewed as a fixed-point mapping v^{k+1} = F(v^k), where F is firmly non-expansive
      ◮ v^k converges to a fixed point of F (if one exists)
      ◮ x^k and x^{k+1/2} converge to a solution of our problem

  13. Convergence of DRS
      ◮ The DRS iteration can be viewed as a fixed-point mapping v^{k+1} = F(v^k), where F is firmly non-expansive
      ◮ v^k converges to a fixed point of F (if one exists)
      ◮ x^k and x^{k+1/2} converge to a solution of our problem
      In practice, this convergence is often slow...

  14. Outline: Problem Overview, Douglas-Rachford Splitting, Anderson Acceleration, Numerical Experiments, Conclusion. (Section: Anderson Acceleration)

  15. Type-II AA
      ◮ Quasi-Newton method for accelerating fixed-point iterations
      ◮ Extrapolates the next iterate from the M + 1 most recent iterates:

            v^{k+1} = Σ_{j=0}^M α_j^k F(v^{k−M+j})

      ◮ Let G(v) = v − F(v); then α^k ∈ R^{M+1} is the solution to

            minimize    || Σ_{j=0}^M α_j^k G(v^{k−M+j}) ||_2^2
            subject to  Σ_{j=0}^M α_j^k = 1

      ◮ Typically only M ≈ 10 is needed for good performance
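
The constrained least-squares problem for α^k can be solved directly from its KKT system. A small sketch, unregularized, so the linear system may be singular or ill-conditioned; that instability is exactly what the next slides address.

```python
import numpy as np

def aa_weights(G_cols):
    # Columns of G_cols are the residuals G(v^{k-M+j}), j = 0..M.
    # Solve: minimize ||G_cols @ alpha||_2^2  subject to  sum(alpha) = 1.
    m = G_cols.shape[1]
    K = np.zeros((m + 1, m + 1))
    K[:m, :m] = 2 * G_cols.T @ G_cols   # Hessian block of the quadratic
    K[:m, m] = 1.0                      # constraint column
    K[m, :m] = 1.0                      # constraint row
    rhs = np.zeros(m + 1)
    rhs[m] = 1.0
    return np.linalg.solve(K, rhs)[:m]  # drop the Lagrange multiplier
```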

  16. Adaptive Regularization
      ◮ Type-II AA is unstable (Scieur, d’Aspremont, Bach 2016) and can provably diverge (Mai, Johansson 2019)
      ◮ Add an adaptive regularization term to the unconstrained formulation

  17. Adaptive Regularization
      ◮ Type-II AA is unstable (Scieur, d’Aspremont, Bach 2016) and can provably diverge (Mai, Johansson 2019)
      ◮ Add an adaptive regularization term to the unconstrained formulation
      ◮ Change variables to γ^k ∈ R^M:

            α_0^k = γ_0^k,   α_i^k = γ_i^k − γ_{i−1}^k  for i = 1, ..., M − 1,   α_M^k = 1 − γ_{M−1}^k

      ◮ The unconstrained AA problem is

            minimize  || g^k − Y^k γ^k ||_2^2,

        where we define g^k = G(v^k), y^k = g^{k+1} − g^k, Y^k = [ y^{k−M} ... y^{k−1} ]

  18. Adaptive Regularization
      ◮ Type-II AA is unstable (Scieur, d’Aspremont, Bach 2016) and can provably diverge (Mai, Johansson 2019)
      ◮ Add an adaptive regularization term to the unconstrained formulation
      ◮ Change variables to γ^k ∈ R^M:

            α_0^k = γ_0^k,   α_i^k = γ_i^k − γ_{i−1}^k  for i = 1, ..., M − 1,   α_M^k = 1 − γ_{M−1}^k

      ◮ The stabilized AA problem is

            minimize  || g^k − Y^k γ^k ||_2^2 + η ( ||S^k||_F^2 + ||Y^k||_F^2 ) ||γ^k||_2^2,

        where η ≥ 0 is a parameter and

            g^k = G(v^k),   y^k = g^{k+1} − g^k,   Y^k = [ y^{k−M} ... y^{k−1} ],
            s^k = v^{k+1} − v^k,   S^k = [ s^{k−M} ... s^{k−1} ]
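
A sketch of the stabilized subproblem and the change of variables, assuming a direct normal-equations solve (the library may factor the problem differently):

```python
import numpy as np

def aa_weights_stabilized(g, Y, S, eta):
    # minimize ||g - Y gamma||_2^2 + eta (||S||_F^2 + ||Y||_F^2) ||gamma||_2^2
    M = Y.shape[1]
    reg = eta * (np.linalg.norm(S, 'fro')**2 + np.linalg.norm(Y, 'fro')**2)
    gamma = np.linalg.solve(Y.T @ Y + reg * np.eye(M), Y.T @ g)
    # Recover alpha from gamma via the change of variables above.
    alpha = np.empty(M + 1)
    alpha[0] = gamma[0]
    alpha[1:M] = gamma[1:M] - gamma[:M - 1]
    alpha[M] = 1.0 - gamma[M - 1]
    return alpha
```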

  19. A2DR
      ◮ Parameters: M = max memory, R = safeguarding parameter
      ◮ A2DR iterates for k = 1, 2, ...:
        1. v_DRS^{k+1} = F(v^k),   g^k = v^k − v_DRS^{k+1}
        2. Compute α^k by solving the stabilized AA problem
        3. v_AA^{k+1} = Σ_{j=0}^M α_j^k v_DRS^{k−M+j+1}
        4. Safeguard check: if ||G(v^k)||_2 is small enough, set v^{k+i} = v_AA^{k+i} for i = 1, ..., R;
           otherwise set v^{k+1} = v_DRS^{k+1}
      ◮ The safeguard ensures convergence in the infeasible/unbounded case
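
The safeguard in step 4 can be organized as in the schematic below. The acceptance test `accept` is a placeholder for the "small enough" criterion; the precise rule is given in the paper and is not reproduced here.

```python
import numpy as np

def safeguarded_update(v, F, v_aa, aa_window, R, accept):
    # Returns the next iterate and the remaining length of the accepted AA window.
    v_drs = F(v)                      # plain DRS candidate v_DRS^{k+1}
    res = np.linalg.norm(v - v_drs)   # ||G(v^k)||_2
    if aa_window > 0:
        return v_aa, aa_window - 1    # keep adopting AA candidates
    if accept(res):
        return v_aa, R - 1            # accept AA for the next R iterations
    return v_drs, 0                   # otherwise fall back to the safe DRS step
```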

  20. Stopping Criterion of A2DR
      ◮ Stop and output x^{k+1/2} when ||r^k||_2 ≤ ε_tol, where

            r_prim^k = A x^{k+1/2} − b
            r_dual^k = (1/t)(v^k − x^{k+1/2}) + A^T λ^k
            r^k = (r_prim^k, r_dual^k)

      ◮ The dual variable is the minimizer of the dual residual norm:

            λ^k = argmin_λ || (1/t)(v^k − x^{k+1/2}) + A^T λ ||_2^2

      ◮ Note that this is a simple least-squares problem
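
Computing λ^k and the residuals is a plain least-squares calculation; a sketch using NumPy (variable names are mine, not the library's):

```python
import numpy as np

def residuals(A, b, x_half, v, t):
    w = (v - x_half) / t
    # lambda^k = argmin_lam ||w + A^T lam||_2, a least-squares problem in lam.
    lam = np.linalg.lstsq(A.T, -w, rcond=None)[0]
    r_prim = A @ x_half - b
    r_dual = w + A.T @ lam
    return r_prim, r_dual, lam
```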

  21. Convergence of A2DR
      Theorem (Solvable Case). If the problem is feasible and bounded,

            lim inf_{k→∞} ||r^k||_2 = 0

      and the AA candidates are adopted infinitely often. Furthermore, if F has a fixed point,

            lim_{k→∞} v^k = v⋆   and   lim_{k→∞} x^{k+1/2} = x⋆,

      where v⋆ is a fixed point of F and x⋆ is a solution to the problem.

  22. Convergence of A2DR
      Theorem (Pathological Case). If the problem is pathological (infeasible/unbounded),

            lim_{k→∞} (v^k − v^{k+1}) = δv ≠ 0.

      Furthermore, if lim_{k→∞} A x^{k+1/2} = b, the problem is unbounded and

            ||δv||_2 = t · dist(dom f*, R(A^T)).

      Otherwise, it is infeasible and

            ||δv||_2 ≥ dist(dom f, {x : Ax = b}).

      Here f(x) = Σ_{i=1}^N f_i(x_i).

  23. Preconditioning
      ◮ Convergence is greatly improved by rescaling the problem
      ◮ Replace the original A, b, f_i with  Â = DAE,   b̂ = Db,   f̂_i(x̂_i) = f_i(e_i x̂_i)
      ◮ D and E are diagonal and positive; e_i > 0 corresponds to the i-th block diagonal entry of E
      ◮ D and E are chosen by equilibrating A (see paper for details)
      ◮ The proximal operator of f̂_i can be evaluated using the proximal operator of f_i:

            prox_{t f̂_i}(v̂_i) = (1/e_i) prox_{(e_i^2 t) f_i}(e_i v̂_i)
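
The prox identity for the rescaled functions is easy to implement as a wrapper; a one-line sketch (hypothetical helper, not part of the a2dr API):

```python
def scaled_prox(prox_f, e_i):
    # Handle for prox_{t f_hat}, where f_hat(x_hat) = f(e_i * x_hat), via
    #   prox_{t f_hat}(v_hat) = (1/e_i) * prox_{(e_i^2 t) f}(e_i * v_hat)
    return lambda v_hat, t: prox_f(e_i * v_hat, (e_i ** 2) * t) / e_i
```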

  24. Outline: Problem Overview, Douglas-Rachford Splitting, Anderson Acceleration, Numerical Experiments, Conclusion. (Section: Numerical Experiments)

  25. Python Solver Interface

      result = a2dr(prox_list, A_list, b)

      Input arguments:
      ◮ prox_list is a list of proximal function handles, e.g., f_i(x_i) = x_i  ⇒  prox_list[i] = lambda v, t: v - t
      ◮ A_list is a list of the matrices A_i; b is the vector b
      Output dictionary keys:
      ◮ num_iters is the total number of iterations K
      ◮ x_vals is a list of the final values x_i^K
      ◮ primal and dual are vectors containing r_prim^k and r_dual^k for k = 1, ..., K

  26. Proximal Library. We provide an extensive proximal library in a2dr.proximal:

      f(x)                     prox_tf(v)                       Function Handle
      ----------------------   ------------------------------   -------------------------
      x                        v − t                            prox_identity
      ||x||_1                  (v − t)_+ − (−v − t)_+           prox_norm1
      ||x||_2                  (1 − t/||v||_2)_+ v              prox_norm2
      ||x||_∞                  Bisection                        prox_norm_inf
      e^x                      v − W(t e^v)                     prox_exp
      −log(x)                  (v + sqrt(v^2 + 4t))/2           prox_neg_log
      Σ_i log(1 + e^{x_i})     Newton-CG                        prox_logistic
      ||Fx − g||_2^2           LSQR                             prox_sum_squares_affine
      I_{R_+^n}(x)             max(v, 0)                        prox_nonneg_constr

      ...and much more! See the documentation for the full list.
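
The closed forms in the table are easy to sanity-check numerically; for example, the prox of −log(x) (an illustrative check, not from the slides):

```python
import numpy as np

v, t = 0.3, 0.7
x_closed = (v + np.sqrt(v**2 + 4*t)) / 2          # closed form from the table
xs = np.linspace(1e-3, 5, 200001)                  # brute-force minimization on a grid
x_grid = xs[np.argmin(-np.log(xs) + (xs - v)**2 / (2*t))]
print(x_closed, x_grid)  # both should be approximately 1.0
```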

  27. Nonnegative Least Squares (NNLS)

            minimize    ||Fz − g||_2^2
            subject to  z ≥ 0

      with respect to z ∈ R^q
      ◮ Problem data: F ∈ R^{p×q} and g ∈ R^p
      ◮ Can be written in standard form with

            f_1(x_1) = ||F x_1 − g||_2^2,   f_2(x_2) = I_{R_+^q}(x_2),
            A_1 = I,   A_2 = −I,   b = 0

      ◮ We evaluate the proximal operator of f_1 using LSQR
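
Putting this together with the interface from slide 25, a sketch of solving NNLS with a2dr. The import paths and the keyword arguments F and g of prox_sum_squares_affine follow the package's documentation as I understand it; check the repository for the exact signatures.

```python
import numpy as np
from scipy import sparse
from a2dr import a2dr
from a2dr.proximal import prox_sum_squares_affine, prox_nonneg_constr

np.random.seed(1)
p, q = 300, 100                       # arbitrary problem sizes for illustration
F = np.random.randn(p, q)
g = np.random.randn(p)

# f1(x1) = ||F x1 - g||_2^2, f2(x2) = indicator of x2 >= 0,
# coupled by A1 x1 + A2 x2 = 0 with A1 = I, A2 = -I, b = 0.
prox_list = [lambda v, t: prox_sum_squares_affine(v, t, F=F, g=g),
             prox_nonneg_constr]
A_list = [sparse.eye(q), -sparse.eye(q)]
b = np.zeros(q)

result = a2dr(prox_list, A_list, b)
z = result['x_vals'][-1]              # nonnegative block = NNLS solution estimate
print(result['num_iters'], z.min())
```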
