

  1. Accelerated primal-dual methods for linearly constrained convex problems. Yangyang Xu, SIAM Conference on Optimization, May 24, 2017.

  2. Accelerated proximal gradient. For the convex composite problem
     $$\min_x \; F(x) := f(x) + g(x)$$
     • $f$: convex and Lipschitz differentiable
     • $g$: closed convex (possibly nondifferentiable) and simple
     Proximal gradient:
     $$x^{k+1} = \arg\min_x \; \langle \nabla f(x^k), x \rangle + \tfrac{L_f}{2}\|x - x^k\|^2 + g(x)$$
     • convergence rate: $F(x^k) - F(x^*) = O(1/k)$
     Accelerated proximal gradient [Beck-Teboulle '09, Nesterov '14]:
     $$x^{k+1} = \arg\min_x \; \langle \nabla f(\hat{x}^k), x \rangle + \tfrac{L_f}{2}\|x - \hat{x}^k\|^2 + g(x)$$
     • $\hat{x}^k$: extrapolated point
     • convergence rate (with smart extrapolation): $F(x^k) - F(x^*) = O(1/k^2)$
     This talk: ways to accelerate primal-dual methods.
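The accelerated iteration above can be sketched in NumPy on a small lasso instance, $f(x) = \tfrac12\|Ax-b\|^2$ and $g(x) = \lambda\|x\|_1$, using the standard FISTA extrapolation weights. The problem data and the regularization weight are illustrative choices, not data from the talk.

```python
import numpy as np

# FISTA-style accelerated proximal gradient on a tiny lasso problem:
# minimize F(x) = 0.5*||A x - b||^2 + lam*||x||_1.
# A, b, lam are illustrative (randomly generated), not from the talk.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lam = 0.1
Lf = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad f

def grad_f(x):
    return A.T @ (A @ x - b)

def prox_g(v, t):                       # prox of t*lam*||.||_1 = soft-thresholding
    return np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)

def F(x):
    return 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum()

x = x_hat = np.zeros(5)
t = 1.0
for k in range(200):
    x_new = prox_g(x_hat - grad_f(x_hat) / Lf, 1.0 / Lf)   # prox-gradient step
    t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2               # momentum weight
    x_hat = x_new + (t - 1) / t_new * (x_new - x)          # extrapolated point
    x, t = x_new, t_new
```

Dropping the extrapolation (setting `x_hat = x_new`) recovers the plain proximal gradient method with its slower $O(1/k)$ rate.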

  3. Part I: accelerated linearized augmented Lagrangian

  4. Affinely constrained composite convex problems
     $$\min_x \; F(x) = f(x) + g(x), \quad \text{subject to } Ax = b \qquad \text{(LCP)}$$
     • $f$: convex and Lipschitz differentiable
     • $g$: closed convex and simple
     Examples:
     • nonnegative quadratic programming: $f = \tfrac12 x^\top Q x + c^\top x$, $g = \iota_{\mathbb{R}^n_+}$
     • TV image denoising: $\min \left\{ \tfrac12 \|X - B\|_F^2 + \lambda \|Y\|_1, \; \text{s.t. } D(X) = Y \right\}$

  5. Augmented Lagrangian method (ALM). At iteration $k$,
     $$x^{k+1} \leftarrow \arg\min_x \; f(x) + g(x) - \langle \lambda^k, Ax \rangle + \tfrac{\beta}{2}\|Ax - b\|^2,$$
     $$\lambda^{k+1} \leftarrow \lambda^k - \gamma (Ax^{k+1} - b)$$
     • augmented dual gradient ascent with stepsize $\gamma$
     • $\beta$: penalty parameter; the dual gradient has Lipschitz constant $1/\beta$
     • $0 < \gamma < 2\beta$: convergence guaranteed
     • also popular for (nonlinear, nonconvex) constrained problems
     However, the $x$-subproblem is as difficult as the original problem.
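The two ALM updates can be sketched on a small equality-constrained QP with $g = 0$, where the $x$-subproblem has a closed-form solution (a linear solve). All problem data and the values of `beta` and `gamma` are illustrative.

```python
import numpy as np

# Classical ALM on minimize 0.5*x'Qx + c'x  s.t.  Ax = b  (g = 0).
# The x-subproblem is solved exactly: grad of the augmented Lagrangian is
# Qx + c - A'lam + beta*A'(Ax - b) = 0  =>  (Q + beta*A'A) x = A'lam - c + beta*A'b.
rng = np.random.default_rng(1)
n, m = 6, 3
G = rng.standard_normal((n, n))
Q = G @ G.T + np.eye(n)                 # SPD Hessian
c = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

beta, gamma = 10.0, 10.0                # dual stepsize: 0 < gamma < 2*beta
lam = np.zeros(m)
H = Q + beta * A.T @ A                  # Hessian of the augmented subproblem
for k in range(2000):
    x = np.linalg.solve(H, A.T @ lam - c + beta * A.T @ b)   # x-subproblem
    lam = lam - gamma * (A @ x - b)                          # dual update
```

At convergence, $Ax = b$ and the KKT stationarity condition $Qx + c - A^\top\lambda = 0$ hold.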

  6. Linearized augmented Lagrangian method
     • Linearize the smooth term $f$:
     $$x^{k+1} \leftarrow \arg\min_x \; \langle \nabla f(x^k), x \rangle + \tfrac{\eta}{2}\|x - x^k\|^2 + g(x) - \langle \lambda^k, Ax \rangle + \tfrac{\beta}{2}\|Ax - b\|^2.$$
     • Linearize both $f$ and $\|Ax - b\|^2$:
     $$x^{k+1} \leftarrow \arg\min_x \; \langle \nabla f(x^k), x \rangle + g(x) - \langle \lambda^k, Ax \rangle + \langle \beta A^\top r^k, x \rangle + \tfrac{\eta}{2}\|x - x^k\|^2,$$
     where $r^k = Ax^k - b$ is the residual.
     Easier updates and a nice convergence rate of $O(1/k)$.
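With both $f$ and the augmented term linearized, the $x$-update reduces to a prox step on $g$. Below is a sketch on the nonnegative-QP example of slide 4, where the prox of $g = \iota_{\mathbb{R}^n_+}$ is a projection onto the nonnegative orthant; the data, the feasible point used to build $b$, and the parameter values are illustrative.

```python
import numpy as np

# Fully linearized ALM on a nonnegative QP:
# f(x) = 0.5*x'Qx + c'x,  g = indicator of x >= 0,  s.t.  Ax = b.
# x-update: project the gradient-type step onto the nonnegative orthant.
rng = np.random.default_rng(2)
n, m = 6, 2
G = rng.standard_normal((n, n))
Q = G @ G.T + np.eye(n)
c = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = A @ rng.random(n)                   # b built so a nonnegative solution exists

beta = 5.0
gamma = beta
# proximal weight covering both linearized pieces (illustrative choice):
eta = np.linalg.norm(Q, 2) + beta * np.linalg.norm(A, 2) ** 2
x, lam = np.zeros(n), np.zeros(m)
for k in range(5000):
    r = A @ x - b                                        # residual r^k
    grad = Q @ x + c - A.T @ lam + beta * A.T @ r        # linearized model gradient
    x = np.maximum(x - grad / eta, 0.0)                  # prox of indicator = projection
    lam = lam - gamma * (A @ x - b)                      # dual update
```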

  7. Accelerated linearized augmented Lagrangian method. At iteration $k$,
     $$\hat{x}^k \leftarrow (1 - \alpha_k)\bar{x}^k + \alpha_k x^k,$$
     $$x^{k+1} \leftarrow \arg\min_x \; \langle \nabla f(\hat{x}^k) - A^\top \lambda^k, x \rangle + g(x) + \tfrac{\beta_k}{2}\|Ax - b\|^2 + \tfrac{\eta_k}{2}\|x - x^k\|^2,$$
     $$\bar{x}^{k+1} \leftarrow (1 - \alpha_k)\bar{x}^k + \alpha_k x^{k+1},$$
     $$\lambda^{k+1} \leftarrow \lambda^k - \gamma_k (Ax^{k+1} - b).$$
     • Inspired by [Lan '12] on accelerated stochastic approximation
     • reduces to linearized ALM if $\alpha_k = 1, \beta_k = \beta, \eta_k = \eta, \gamma_k = \gamma, \; \forall k$
     • convergence rate: $O(1/k)$ if $\eta \ge L_f$ and $0 < \gamma < 2\beta$
     • adaptive parameters give $O(1/k^2)$ (next slides)
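A sketch of these four updates on an equality-constrained QP with $g = 0$, so the $x$-subproblem is again a linear solve. The adaptive parameter schedule $\alpha_k = 2/(k+1)$, $\gamma_k = k\gamma$, $\beta_k = \gamma_k$, $\eta_k = \eta \ge 2L_f$ follows the theorem on the next slide; the problem data and the value of $\gamma$ are illustrative.

```python
import numpy as np

# Accelerated linearized ALM on minimize 0.5*x'Qx + c'x  s.t.  Ax = b  (g = 0).
rng = np.random.default_rng(3)
n, m = 8, 3
G = rng.standard_normal((n, n))
Q = G @ G.T + np.eye(n)
c = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

Lf = np.linalg.norm(Q, 2)               # Lipschitz constant of grad f
gamma = 1.0
eta = 2 * Lf
x = x_bar = np.zeros(n)
lam = np.zeros(m)                       # lambda^1 = 0, as in the analysis
for k in range(1, 500):
    alpha = 2.0 / (k + 1)
    gamma_k = k * gamma
    beta_k = gamma_k
    x_hat = (1 - alpha) * x_bar + alpha * x          # extrapolated point
    grad = Q @ x_hat + c                             # grad f at x_hat
    # x-subproblem: (beta_k*A'A + eta*I) x = eta*x^k - grad + A'lam + beta_k*A'b
    x = np.linalg.solve(beta_k * A.T @ A + eta * np.eye(n),
                        eta * x - grad + A.T @ lam + beta_k * A.T @ b)
    x_bar = (1 - alpha) * x_bar + alpha * x          # averaged (output) iterate
    lam = lam - gamma_k * (A @ x - b)                # growing dual stepsize
```

The averaged iterate `x_bar` is the one covered by the $O(1/k^2)$ guarantee.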

  8. Better numerical performance
     [Figure: objective error $|F - F^*|$ (left) and feasibility violation (right) versus iteration number, on log scale over 1000 iterations, comparing nonaccelerated and accelerated ALM.]
     • Tested on quadratic programming (subproblems solved exactly)
     • Parameters set according to the theorem (see next slide)
     • Accelerated ALM performs significantly better

  9. Guaranteed fast convergence
     Assumptions:
     • There is a pair of primal-dual solutions $(x^*, \lambda^*)$.
     • $\nabla f$ is Lipschitz continuous: $\|\nabla f(x) - \nabla f(y)\| \le L_f \|x - y\|$
     Convergence rate of order $O(1/k^2)$:
     • Set the parameters to
     $$\forall k: \; \alpha_k = \frac{2}{k+1}, \quad \gamma_k = k\gamma, \quad \beta_k \ge \frac{\gamma_k}{2}, \quad \eta_k = \eta,$$
     where $\gamma > 0$ and $\eta \ge 2L_f$. Then
     $$|F(\bar{x}^{k+1}) - F(x^*)| \le \frac{1}{k(k+1)} \left( \eta \|x^1 - x^*\|^2 + \frac{4\|\lambda^*\|^2}{\gamma} \right),$$
     $$\|A\bar{x}^{k+1} - b\| \le \frac{1}{k(k+1)\max(1, \|\lambda^*\|)} \left( \eta \|x^1 - x^*\|^2 + \frac{4\|\lambda^*\|^2}{\gamma} \right).$$

  10. Sketch of proof
     Let $\Phi(\bar{x}, x, \lambda) = F(\bar{x}) - F(x) - \langle \lambda, A\bar{x} - b \rangle$.
     1. Fundamental inequality (for any $\lambda$):
     $$\Phi(\bar{x}^{k+1}, x^*, \lambda) - (1 - \alpha_k)\Phi(\bar{x}^k, x^*, \lambda)$$
     $$\le -\tfrac{\alpha_k \eta_k}{2}\left( \|x^{k+1} - x^*\|^2 - \|x^k - x^*\|^2 + \|x^{k+1} - x^k\|^2 \right) + \tfrac{\alpha_k^2 L_f}{2}\|x^{k+1} - x^k\|^2$$
     $$+ \tfrac{\alpha_k}{2\gamma_k}\left( \|\lambda^k - \lambda\|^2 - \|\lambda^{k+1} - \lambda\|^2 + \|\lambda^{k+1} - \lambda^k\|^2 \right) - \tfrac{\alpha_k \beta_k}{2\gamma_k^2}\|\lambda^{k+1} - \lambda^k\|^2.$$
     2. Take $\alpha_k = \tfrac{2}{k+1}, \; \gamma_k = k\gamma, \; \beta_k \ge \tfrac{\gamma_k}{2}, \; \eta_k = \eta$ and multiply the inequality by $k(k+1)$:
     $$k(k+1)\Phi(\bar{x}^{k+1}, x^*, \lambda) - k(k-1)\Phi(\bar{x}^k, x^*, \lambda) \le \eta\left( \|x^k - x^*\|^2 - \|x^{k+1} - x^*\|^2 \right) + \tfrac{1}{\gamma}\left( \|\lambda^k - \lambda\|^2 - \|\lambda^{k+1} - \lambda\|^2 \right).$$
     3. Set $\lambda^1 = 0$ and sum the above inequality over $k$:
     $$\Phi(\bar{x}^{k+1}, x^*, \lambda) \le \frac{1}{k(k+1)}\left( \eta\|x^1 - x^*\|^2 + \tfrac{1}{\gamma}\|\lambda\|^2 \right)$$
     4. Take $\lambda = \max(1 + \|\lambda^*\|, \; 2\|\lambda^*\|) \, \frac{A\bar{x}^{k+1} - b}{\|A\bar{x}^{k+1} - b\|}$ and use the optimality condition
     $$\Phi(\bar{x}^{k+1}, x^*, \lambda^*) \ge 0 \; \Rightarrow \; F(\bar{x}^{k+1}) - F(x^*) \ge -\|\lambda^*\| \cdot \|A\bar{x}^{k+1} - b\|.$$

  11. Literature
     • [He-Yuan '10]: accelerated ALM to $O(1/k^2)$ for smooth problems
     • [Kang et al. '13]: accelerated ALM to $O(1/k^2)$ for nonsmooth problems
     • [Huang-Ma-Goldfarb '13]: accelerated linearized ALM (with linearization of the augmented term) to $O(1/k^2)$ for strongly convex problems

  12. Part II: accelerated linearized ADMM

  13. Two-block structured problems. The variable is partitioned into two blocks, the smooth part involves only one block, and the nonsmooth part is separable:
     $$\min_{y,z} \; h(y) + f(z) + g(z), \quad \text{subject to } By + Cz = b \qquad \text{(LCP-2)}$$
     • $f$: convex and Lipschitz differentiable
     • $g$ and $h$: closed convex and simple
     Example:
     • Total-variation regularized regression: $\min_{y,z} \; \lambda \|y\|_1 + f(z), \; \text{s.t. } Dz = y$

  14. Alternating direction method of multipliers (ADMM). At iteration $k$,
     $$y^{k+1} \leftarrow \arg\min_y \; h(y) - \langle \lambda^k, By \rangle + \tfrac{\beta}{2}\|By + Cz^k - b\|^2,$$
     $$z^{k+1} \leftarrow \arg\min_z \; f(z) + g(z) - \langle \lambda^k, Cz \rangle + \tfrac{\beta}{2}\|By^{k+1} + Cz - b\|^2,$$
     $$\lambda^{k+1} \leftarrow \lambda^k - \gamma (By^{k+1} + Cz^{k+1} - b)$$
     • $0 < \gamma < \frac{1+\sqrt{5}}{2}\beta$: convergence guaranteed [Glowinski-Marrocco '75]
     • updating $y$ and $z$ alternately is easier than updating them jointly
     • but the $z$-subproblem can still be difficult
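The three ADMM steps can be sketched on a 1-D total-variation denoising instance of the slide-13 form, $\min \lambda\|y\|_1 + \tfrac12\|z - a\|^2$ s.t. $y = Dz$ (so $B = I$, $C = -D$, $b = 0$), where both subproblems are exactly solvable. The signal, noise level, and parameter values are illustrative.

```python
import numpy as np

# ADMM for 1-D TV denoising: minimize lam_reg*||y||_1 + 0.5*||z - a||^2
# subject to y = D z, with D the forward-difference matrix.
np.random.seed(0)
n = 50
a = np.concatenate([np.zeros(25), np.ones(25)]) + 0.1 * np.random.randn(n)
D = np.diff(np.eye(n), axis=0)              # (n-1) x n forward differences
lam_reg, beta, gamma = 0.5, 1.0, 1.0        # gamma < (1+sqrt(5))/2 * beta

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

z = np.zeros(n)
lam = np.zeros(n - 1)
H = np.eye(n) + beta * D.T @ D              # z-subproblem Hessian (precomputed)
for k in range(1000):
    y = soft(D @ z + lam / beta, lam_reg / beta)            # y-step: soft-threshold
    z = np.linalg.solve(H, a - D.T @ lam + beta * D.T @ y)  # z-step: linear solve
    lam = lam - gamma * (y - D @ z)                         # dual update
```

The recovered `z` is approximately piecewise constant: near 0 on the first half and near 1 on the second.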

  15. Accelerated linearized ADMM. At iteration $k$,
     $$y^{k+1} \leftarrow \arg\min_y \; h(y) - \langle \lambda^k, By \rangle + \tfrac{\beta_k}{2}\|By + Cz^k - b\|^2,$$
     $$z^{k+1} \leftarrow \arg\min_z \; \langle \nabla f(z^k) - C^\top \lambda^k + \beta_k C^\top r^{k+1/2}, z \rangle + g(z) + \tfrac{\eta_k}{2}\|z - z^k\|^2,$$
     $$\lambda^{k+1} \leftarrow \lambda^k - \gamma_k (By^{k+1} + Cz^{k+1} - b),$$
     where $r^{k+1/2} = By^{k+1} + Cz^k - b$.
     • reduces to linearized ADMM if $\beta_k = \beta, \eta_k = \eta, \gamma_k = \gamma, \; \forall k$
     • convergence rate: $O(1/k)$ if $0 < \gamma \le \beta$ and $\eta \ge L_f + \beta \|C\|^2$
     • $O(1/k^2)$ with adaptive parameters and strong convexity on $z$ (next two slides)
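A sketch of these updates on the same TV-denoising instance, where $f(z) = \tfrac12\|z - a\|^2$ is strongly convex with $\mu_f = L_f = 1$, using the adaptive schedule of the next slide, $\beta_k = \gamma_k = (k+1)\gamma$ and $\eta_k = (k+1)\eta + L_f$. The data and the choices of $\gamma$ and $\eta$ are illustrative.

```python
import numpy as np

# Accelerated linearized ADMM for 1-D TV denoising:
# minimize lam_reg*||y||_1 + 0.5*||z - a||^2  s.t.  y = D z
# (B = I, C = -D, b = 0); the z-step linearizes both f and the augmented term.
np.random.seed(1)
n = 40
a = np.concatenate([np.zeros(20), np.ones(20)]) + 0.05 * np.random.randn(n)
D = np.diff(np.eye(n), axis=0)
lam_reg = 0.3
mu_f = Lf = 1.0
gamma, eta = 0.05, 0.5                      # eta <= mu_f / 2, gamma < eta

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

z = np.zeros(n)
lam = np.zeros(n - 1)
for k in range(1, 1000):
    beta_k = gamma_k = (k + 1) * gamma      # growing penalty and dual stepsize
    eta_k = (k + 1) * eta + Lf              # growing proximal weight
    y = soft(D @ z + lam / beta_k, lam_reg / beta_k)     # exact y-step
    r_half = y - D @ z                                   # r^{k+1/2}
    grad = (z - a) + D.T @ lam - beta_k * D.T @ r_half   # C = -D, so C'v = -D'v
    z = z - grad / eta_k                                 # linearized z-step
    lam = lam - gamma_k * (y - D @ z)                    # dual update
```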

  16. Accelerated convergence speed
     Assumptions:
     • Existence of a primal-dual solution $(y^*, z^*, \lambda^*)$
     • $\nabla f$ Lipschitz continuous: $\|\nabla f(\hat{z}) - \nabla f(\tilde{z})\| \le L_f \|\hat{z} - \tilde{z}\|$
     • $f$ strongly convex with modulus $\mu_f$ (not required for $y$)
     Convergence rate of order $O(1/k^2)$:
     • Set the parameters as follows (with $\gamma > 0$ and $\gamma < \eta \le \mu_f/2$):
     $$\forall k: \; \beta_k = \gamma_k = (k+1)\gamma, \quad \eta_k = (k+1)\eta + L_f.$$
     Then
     $$\max\left\{ |F(\bar{y}^k, \bar{z}^k) - F^*|, \; \|B\bar{y}^k + C\bar{z}^k - b\|, \; \|z^k - z^*\|^2 \right\} \le O(1/k^2),$$
     where $F(y,z) = h(y) + f(z) + g(z)$ and $F^* = F(y^*, z^*)$.

  17. Sketch of proof
     1. Fundamental inequality from the optimality conditions of each iterate:
     $$F(y^{k+1}, z^{k+1}) - F(y,z) - \langle \lambda, By^{k+1} + Cz^{k+1} - b \rangle$$
     $$\le -\left\langle \tfrac{1}{\gamma_k}(\lambda^k - \lambda^{k+1}), \; \lambda - \lambda^k + \tfrac{\beta_k}{\gamma_k}(\lambda^k - \lambda^{k+1}) - \beta_k C(z^{k+1} - z^k) \right\rangle$$
     $$+ \tfrac{L_f}{2}\|z^{k+1} - z^k\|^2 - \tfrac{\mu_f}{2}\|z^k - z\|^2 - \eta_k \langle z^{k+1} - z, \; z^{k+1} - z^k \rangle.$$
     2. Plug in the parameters and bound the cross terms:
     $$F(y^{k+1}, z^{k+1}) - F(y^*, z^*) - \langle \lambda, By^{k+1} + Cz^{k+1} - b \rangle + \tfrac12\left( \eta(k+1)\|z^{k+1} - z^*\|^2 + L_f\|z^{k+1} - z^*\|^2 \right) + \tfrac{1}{2\gamma(k+1)}\|\lambda - \lambda^{k+1}\|^2$$
     $$\le \tfrac12\left( \eta(k+1)\|z^k - z^*\|^2 + (L_f - \mu_f)\|z^k - z^*\|^2 \right) + \tfrac{1}{2\gamma(k+1)}\|\lambda - \lambda^k\|^2.$$
     3. Multiply by $k + k_0$ (here $k_0 \sim \tfrac{2L_f}{\mu_f}$) and sum the inequality over $k$:
     $$F(\bar{y}^{k+1}, \bar{z}^{k+1}) - F(y^*, z^*) - \langle \lambda, B\bar{y}^{k+1} + C\bar{z}^{k+1} - b \rangle \le \frac{\phi(y^*, z^*, \lambda)}{k^2}$$
     4. Take a special $\lambda$ and use the KKT conditions.

  18. Literature
     • [Ouyang et al. '15]: $O(L_f/k^2 + C_0/k)$ with only weak convexity
     • [Goldstein et al. '14]: $O(1/k^2)$ with strong convexity on both $y$ and $z$
     • [Chambolle-Pock '11, Chambolle-Pock '16, Dang-Lan '14, Bredies-Sun '16]: accelerated first-order methods for bilinear saddle-point problems
     Open question: what are the weakest conditions under which $O(1/k^2)$ is attainable?

  19. Numerical experiments (more results in the paper)
