

  1. Accelerated primal-dual methods for linearly constrained convex problems. Yangyang Xu, SIAM Conference on Optimization, May 24, 2017.

  2. Accelerated proximal gradient. For the convex composite problem
     $$\min_x \; F(x) := f(x) + g(x)$$
     • $f$: convex and Lipschitz differentiable
     • $g$: closed convex (possibly nondifferentiable) and simple
     Proximal gradient:
     $$x^{k+1} = \arg\min_x \; \langle \nabla f(x^k), x \rangle + \tfrac{L_f}{2}\|x - x^k\|^2 + g(x)$$
     • convergence rate: $F(x^k) - F(x^*) = O(1/k)$
     Accelerated proximal gradient [Beck-Teboulle '09, Nesterov '14]:
     $$x^{k+1} = \arg\min_x \; \langle \nabla f(\hat{x}^k), x \rangle + \tfrac{L_f}{2}\|x - \hat{x}^k\|^2 + g(x)$$
     • $\hat{x}^k$: extrapolated point
     • convergence rate (with smart extrapolation): $F(x^k) - F(x^*) = O(1/k^2)$
     This talk: ways to accelerate primal-dual methods.
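The accelerated iteration above can be sketched in NumPy on a small lasso instance, $f(x) = \tfrac12\|Ax-b\|^2$ and $g(x) = \lambda\|x\|_1$, using the standard FISTA extrapolation weights. The problem data and the regularization weight are illustrative choices, not data from the talk.

```python
import numpy as np

# FISTA-style accelerated proximal gradient on a tiny lasso problem:
# minimize F(x) = 0.5*||A x - b||^2 + lam*||x||_1.
# A, b, lam are illustrative (randomly generated), not from the talk.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
lam = 0.1
Lf = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad f

def grad_f(x):
    return A.T @ (A @ x - b)

def prox_g(v, t):                       # prox of t*lam*||.||_1 = soft-thresholding
    return np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)

def F(x):
    return 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum()

x = x_hat = np.zeros(5)
t = 1.0
for k in range(200):
    x_new = prox_g(x_hat - grad_f(x_hat) / Lf, 1.0 / Lf)   # prox-gradient step
    t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2               # momentum weight
    x_hat = x_new + (t - 1) / t_new * (x_new - x)          # extrapolated point
    x, t = x_new, t_new
```

Dropping the extrapolation (setting `x_hat = x_new`) recovers the plain proximal gradient method with its slower $O(1/k)$ rate.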

  3. Part I: accelerated linearized augmented Lagrangian

  4. Affinely constrained composite convex problems
     $$\min_x \; F(x) = f(x) + g(x), \quad \text{subject to } Ax = b \qquad \text{(LCP)}$$
     • $f$: convex and Lipschitz differentiable
     • $g$: closed convex and simple
     Examples:
     • nonnegative quadratic programming: $f = \tfrac12 x^\top Q x + c^\top x$, $g = \iota_{\mathbb{R}^n_+}$
     • TV image denoising: $\min \left\{ \tfrac12 \|X - B\|_F^2 + \lambda \|Y\|_1, \; \text{s.t. } D(X) = Y \right\}$

  5. Augmented Lagrangian method (ALM). At iteration $k$,
     $$x^{k+1} \leftarrow \arg\min_x \; f(x) + g(x) - \langle \lambda^k, Ax \rangle + \tfrac{\beta}{2}\|Ax - b\|^2,$$
     $$\lambda^{k+1} \leftarrow \lambda^k - \gamma (Ax^{k+1} - b)$$
     • augmented dual gradient ascent with stepsize $\gamma$
     • $\beta$: penalty parameter; the dual gradient has Lipschitz constant $1/\beta$
     • $0 < \gamma < 2\beta$: convergence guaranteed
     • also popular for (nonlinear, nonconvex) constrained problems
     However, the $x$-subproblem is as difficult as the original problem.
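The two ALM updates can be sketched on a small equality-constrained QP with $g = 0$, where the $x$-subproblem has a closed-form solution (a linear solve). All problem data and the values of `beta` and `gamma` are illustrative.

```python
import numpy as np

# Classical ALM on minimize 0.5*x'Qx + c'x  s.t.  Ax = b  (g = 0).
# The x-subproblem is solved exactly: grad of the augmented Lagrangian is
# Qx + c - A'lam + beta*A'(Ax - b) = 0  =>  (Q + beta*A'A) x = A'lam - c + beta*A'b.
rng = np.random.default_rng(1)
n, m = 6, 3
G = rng.standard_normal((n, n))
Q = G @ G.T + np.eye(n)                 # SPD Hessian
c = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

beta, gamma = 10.0, 10.0                # dual stepsize: 0 < gamma < 2*beta
lam = np.zeros(m)
H = Q + beta * A.T @ A                  # Hessian of the augmented subproblem
for k in range(2000):
    x = np.linalg.solve(H, A.T @ lam - c + beta * A.T @ b)   # x-subproblem
    lam = lam - gamma * (A @ x - b)                          # dual update
```

At convergence, $Ax = b$ and the KKT stationarity condition $Qx + c - A^\top\lambda = 0$ hold.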

  6. Linearized augmented Lagrangian method
     • Linearize the smooth term $f$:
     $$x^{k+1} \leftarrow \arg\min_x \; \langle \nabla f(x^k), x \rangle + \tfrac{\eta}{2}\|x - x^k\|^2 + g(x) - \langle \lambda^k, Ax \rangle + \tfrac{\beta}{2}\|Ax - b\|^2.$$
     • Linearize both $f$ and $\|Ax - b\|^2$:
     $$x^{k+1} \leftarrow \arg\min_x \; \langle \nabla f(x^k), x \rangle + g(x) - \langle \lambda^k, Ax \rangle + \langle \beta A^\top r^k, x \rangle + \tfrac{\eta}{2}\|x - x^k\|^2,$$
     where $r^k = Ax^k - b$ is the residual.
     Easier updates and a nice convergence rate of $O(1/k)$.
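With both $f$ and the augmented term linearized, the $x$-update reduces to a prox step on $g$. Below is a sketch on the nonnegative-QP example of slide 4, where the prox of $g = \iota_{\mathbb{R}^n_+}$ is a projection onto the nonnegative orthant; the data, the feasible point used to build $b$, and the parameter values are illustrative.

```python
import numpy as np

# Fully linearized ALM on a nonnegative QP:
# f(x) = 0.5*x'Qx + c'x,  g = indicator of x >= 0,  s.t.  Ax = b.
# x-update: project the gradient-type step onto the nonnegative orthant.
rng = np.random.default_rng(2)
n, m = 6, 2
G = rng.standard_normal((n, n))
Q = G @ G.T + np.eye(n)
c = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = A @ rng.random(n)                   # b built so a nonnegative solution exists

beta = 5.0
gamma = beta
# proximal weight covering both linearized pieces (illustrative choice):
eta = np.linalg.norm(Q, 2) + beta * np.linalg.norm(A, 2) ** 2
x, lam = np.zeros(n), np.zeros(m)
for k in range(5000):
    r = A @ x - b                                        # residual r^k
    grad = Q @ x + c - A.T @ lam + beta * A.T @ r        # linearized model gradient
    x = np.maximum(x - grad / eta, 0.0)                  # prox of indicator = projection
    lam = lam - gamma * (A @ x - b)                      # dual update
```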

  7. Accelerated linearized augmented Lagrangian method. At iteration $k$,
     $$\hat{x}^k \leftarrow (1 - \alpha_k)\bar{x}^k + \alpha_k x^k,$$
     $$x^{k+1} \leftarrow \arg\min_x \; \langle \nabla f(\hat{x}^k) - A^\top \lambda^k, x \rangle + g(x) + \tfrac{\beta_k}{2}\|Ax - b\|^2 + \tfrac{\eta_k}{2}\|x - x^k\|^2,$$
     $$\bar{x}^{k+1} \leftarrow (1 - \alpha_k)\bar{x}^k + \alpha_k x^{k+1},$$
     $$\lambda^{k+1} \leftarrow \lambda^k - \gamma_k (Ax^{k+1} - b).$$
     • Inspired by [Lan '12] on accelerated stochastic approximation
     • reduces to linearized ALM if $\alpha_k = 1, \beta_k = \beta, \eta_k = \eta, \gamma_k = \gamma, \; \forall k$
     • convergence rate: $O(1/k)$ if $\eta \ge L_f$ and $0 < \gamma < 2\beta$
     • adaptive parameters give $O(1/k^2)$ (next slides)
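A sketch of these four updates on an equality-constrained QP with $g = 0$, so the $x$-subproblem is again a linear solve. The adaptive parameter schedule $\alpha_k = 2/(k+1)$, $\gamma_k = k\gamma$, $\beta_k = \gamma_k$, $\eta_k = \eta \ge 2L_f$ follows the theorem on the next slide; the problem data and the value of $\gamma$ are illustrative.

```python
import numpy as np

# Accelerated linearized ALM on minimize 0.5*x'Qx + c'x  s.t.  Ax = b  (g = 0).
rng = np.random.default_rng(3)
n, m = 8, 3
G = rng.standard_normal((n, n))
Q = G @ G.T + np.eye(n)
c = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

Lf = np.linalg.norm(Q, 2)               # Lipschitz constant of grad f
gamma = 1.0
eta = 2 * Lf
x = x_bar = np.zeros(n)
lam = np.zeros(m)                       # lambda^1 = 0, as in the analysis
for k in range(1, 500):
    alpha = 2.0 / (k + 1)
    gamma_k = k * gamma
    beta_k = gamma_k
    x_hat = (1 - alpha) * x_bar + alpha * x          # extrapolated point
    grad = Q @ x_hat + c                             # grad f at x_hat
    # x-subproblem: (beta_k*A'A + eta*I) x = eta*x^k - grad + A'lam + beta_k*A'b
    x = np.linalg.solve(beta_k * A.T @ A + eta * np.eye(n),
                        eta * x - grad + A.T @ lam + beta_k * A.T @ b)
    x_bar = (1 - alpha) * x_bar + alpha * x          # averaged (output) iterate
    lam = lam - gamma_k * (A @ x - b)                # growing dual stepsize
```

The averaged iterate `x_bar` is the one covered by the $O(1/k^2)$ guarantee.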

  8. Better numerical performance
     [Figure: objective error $|F - F^*|$ (left) and feasibility violation (right) versus iteration number, on log scale over 1000 iterations, comparing nonaccelerated and accelerated ALM.]
     • Tested on quadratic programming (subproblems solved exactly)
     • Parameters set according to the theorem (see next slide)
     • Accelerated ALM performs significantly better

  9. Guaranteed fast convergence
     Assumptions:
     • There is a pair of primal-dual solutions $(x^*, \lambda^*)$.
     • $\nabla f$ is Lipschitz continuous: $\|\nabla f(x) - \nabla f(y)\| \le L_f \|x - y\|$
     Convergence rate of order $O(1/k^2)$:
     • Set the parameters to
     $$\forall k: \; \alpha_k = \frac{2}{k+1}, \quad \gamma_k = k\gamma, \quad \beta_k \ge \frac{\gamma_k}{2}, \quad \eta_k = \eta,$$
     where $\gamma > 0$ and $\eta \ge 2L_f$. Then
     $$|F(\bar{x}^{k+1}) - F(x^*)| \le \frac{1}{k(k+1)} \left( \eta \|x^1 - x^*\|^2 + \frac{4\|\lambda^*\|^2}{\gamma} \right),$$
     $$\|A\bar{x}^{k+1} - b\| \le \frac{1}{k(k+1)\max(1, \|\lambda^*\|)} \left( \eta \|x^1 - x^*\|^2 + \frac{4\|\lambda^*\|^2}{\gamma} \right).$$

  10. Sketch of proof
     Let $\Phi(\bar{x}, x, \lambda) = F(\bar{x}) - F(x) - \langle \lambda, A\bar{x} - b \rangle$.
     1. Fundamental inequality (for any $\lambda$):
     $$\Phi(\bar{x}^{k+1}, x^*, \lambda) - (1 - \alpha_k)\Phi(\bar{x}^k, x^*, \lambda)$$
     $$\le -\tfrac{\alpha_k \eta_k}{2}\left( \|x^{k+1} - x^*\|^2 - \|x^k - x^*\|^2 + \|x^{k+1} - x^k\|^2 \right) + \tfrac{\alpha_k^2 L_f}{2}\|x^{k+1} - x^k\|^2$$
     $$+ \tfrac{\alpha_k}{2\gamma_k}\left( \|\lambda^k - \lambda\|^2 - \|\lambda^{k+1} - \lambda\|^2 + \|\lambda^{k+1} - \lambda^k\|^2 \right) - \tfrac{\alpha_k \beta_k}{2\gamma_k^2}\|\lambda^{k+1} - \lambda^k\|^2.$$
     2. Take $\alpha_k = \tfrac{2}{k+1}, \; \gamma_k = k\gamma, \; \beta_k \ge \tfrac{\gamma_k}{2}, \; \eta_k = \eta$ and multiply the inequality by $k(k+1)$:
     $$k(k+1)\Phi(\bar{x}^{k+1}, x^*, \lambda) - k(k-1)\Phi(\bar{x}^k, x^*, \lambda) \le \eta\left( \|x^k - x^*\|^2 - \|x^{k+1} - x^*\|^2 \right) + \tfrac{1}{\gamma}\left( \|\lambda^k - \lambda\|^2 - \|\lambda^{k+1} - \lambda\|^2 \right).$$
     3. Set $\lambda^1 = 0$ and sum the above inequality over $k$:
     $$\Phi(\bar{x}^{k+1}, x^*, \lambda) \le \frac{1}{k(k+1)}\left( \eta\|x^1 - x^*\|^2 + \tfrac{1}{\gamma}\|\lambda\|^2 \right)$$
     4. Take $\lambda = \max(1 + \|\lambda^*\|, \; 2\|\lambda^*\|) \, \frac{A\bar{x}^{k+1} - b}{\|A\bar{x}^{k+1} - b\|}$ and use the optimality condition
     $$\Phi(\bar{x}^{k+1}, x^*, \lambda^*) \ge 0 \; \Rightarrow \; F(\bar{x}^{k+1}) - F(x^*) \ge -\|\lambda^*\| \cdot \|A\bar{x}^{k+1} - b\|.$$

  11. Literature
     • [He-Yuan '10]: accelerated ALM to $O(1/k^2)$ for smooth problems
     • [Kang et al. '13]: accelerated ALM to $O(1/k^2)$ for nonsmooth problems
     • [Huang-Ma-Goldfarb '13]: accelerated linearized ALM (with linearization of the augmented term) to $O(1/k^2)$ for strongly convex problems

  12. Part II: accelerated linearized ADMM

  13. Two-block structured problems. The variable is partitioned into two blocks, the smooth part involves only one block, and the nonsmooth part is separable:
     $$\min_{y,z} \; h(y) + f(z) + g(z), \quad \text{subject to } By + Cz = b \qquad \text{(LCP-2)}$$
     • $f$: convex and Lipschitz differentiable
     • $g$ and $h$: closed convex and simple
     Example:
     • Total-variation regularized regression: $\min_{y,z} \; \lambda \|y\|_1 + f(z), \; \text{s.t. } Dz = y$

  14. Alternating direction method of multipliers (ADMM). At iteration $k$,
     $$y^{k+1} \leftarrow \arg\min_y \; h(y) - \langle \lambda^k, By \rangle + \tfrac{\beta}{2}\|By + Cz^k - b\|^2,$$
     $$z^{k+1} \leftarrow \arg\min_z \; f(z) + g(z) - \langle \lambda^k, Cz \rangle + \tfrac{\beta}{2}\|By^{k+1} + Cz - b\|^2,$$
     $$\lambda^{k+1} \leftarrow \lambda^k - \gamma (By^{k+1} + Cz^{k+1} - b)$$
     • $0 < \gamma < \frac{1+\sqrt{5}}{2}\beta$: convergence guaranteed [Glowinski-Marrocco '75]
     • updating $y$ and $z$ alternately is easier than updating them jointly
     • but the $z$-subproblem can still be difficult
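The three ADMM steps can be sketched on a 1-D total-variation denoising instance of the slide-13 form, $\min \lambda\|y\|_1 + \tfrac12\|z - a\|^2$ s.t. $y = Dz$ (so $B = I$, $C = -D$, $b = 0$), where both subproblems are exactly solvable. The signal, noise level, and parameter values are illustrative.

```python
import numpy as np

# ADMM for 1-D TV denoising: minimize lam_reg*||y||_1 + 0.5*||z - a||^2
# subject to y = D z, with D the forward-difference matrix.
np.random.seed(0)
n = 50
a = np.concatenate([np.zeros(25), np.ones(25)]) + 0.1 * np.random.randn(n)
D = np.diff(np.eye(n), axis=0)              # (n-1) x n forward differences
lam_reg, beta, gamma = 0.5, 1.0, 1.0        # gamma < (1+sqrt(5))/2 * beta

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

z = np.zeros(n)
lam = np.zeros(n - 1)
H = np.eye(n) + beta * D.T @ D              # z-subproblem Hessian (precomputed)
for k in range(1000):
    y = soft(D @ z + lam / beta, lam_reg / beta)            # y-step: soft-threshold
    z = np.linalg.solve(H, a - D.T @ lam + beta * D.T @ y)  # z-step: linear solve
    lam = lam - gamma * (y - D @ z)                         # dual update
```

The recovered `z` is approximately piecewise constant: near 0 on the first half and near 1 on the second.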

  15. Accelerated linearized ADMM. At iteration $k$,
     $$y^{k+1} \leftarrow \arg\min_y \; h(y) - \langle \lambda^k, By \rangle + \tfrac{\beta_k}{2}\|By + Cz^k - b\|^2,$$
     $$z^{k+1} \leftarrow \arg\min_z \; \langle \nabla f(z^k) - C^\top \lambda^k + \beta_k C^\top r^{k+1/2}, z \rangle + g(z) + \tfrac{\eta_k}{2}\|z - z^k\|^2,$$
     $$\lambda^{k+1} \leftarrow \lambda^k - \gamma_k (By^{k+1} + Cz^{k+1} - b),$$
     where $r^{k+1/2} = By^{k+1} + Cz^k - b$.
     • reduces to linearized ADMM if $\beta_k = \beta, \eta_k = \eta, \gamma_k = \gamma, \; \forall k$
     • convergence rate: $O(1/k)$ if $0 < \gamma \le \beta$ and $\eta \ge L_f + \beta \|C\|^2$
     • $O(1/k^2)$ with adaptive parameters and strong convexity on $z$ (next two slides)
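A sketch of these updates on the same TV-denoising instance, where $f(z) = \tfrac12\|z - a\|^2$ is strongly convex with $\mu_f = L_f = 1$, using the adaptive schedule of the next slide, $\beta_k = \gamma_k = (k+1)\gamma$ and $\eta_k = (k+1)\eta + L_f$. The data and the choices of $\gamma$ and $\eta$ are illustrative.

```python
import numpy as np

# Accelerated linearized ADMM for 1-D TV denoising:
# minimize lam_reg*||y||_1 + 0.5*||z - a||^2  s.t.  y = D z
# (B = I, C = -D, b = 0); the z-step linearizes both f and the augmented term.
np.random.seed(1)
n = 40
a = np.concatenate([np.zeros(20), np.ones(20)]) + 0.05 * np.random.randn(n)
D = np.diff(np.eye(n), axis=0)
lam_reg = 0.3
mu_f = Lf = 1.0
gamma, eta = 0.05, 0.5                      # eta <= mu_f / 2, gamma < eta

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

z = np.zeros(n)
lam = np.zeros(n - 1)
for k in range(1, 1000):
    beta_k = gamma_k = (k + 1) * gamma      # growing penalty and dual stepsize
    eta_k = (k + 1) * eta + Lf              # growing proximal weight
    y = soft(D @ z + lam / beta_k, lam_reg / beta_k)     # exact y-step
    r_half = y - D @ z                                   # r^{k+1/2}
    grad = (z - a) + D.T @ lam - beta_k * D.T @ r_half   # C = -D, so C'v = -D'v
    z = z - grad / eta_k                                 # linearized z-step
    lam = lam - gamma_k * (y - D @ z)                    # dual update
```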

  16. Accelerated convergence speed
     Assumptions:
     • Existence of a primal-dual solution $(y^*, z^*, \lambda^*)$
     • $\nabla f$ Lipschitz continuous: $\|\nabla f(\hat{z}) - \nabla f(\tilde{z})\| \le L_f \|\hat{z} - \tilde{z}\|$
     • $f$ strongly convex with modulus $\mu_f$ (not required for $y$)
     Convergence rate of order $O(1/k^2)$:
     • Set the parameters as follows (with $\gamma > 0$ and $\gamma < \eta \le \mu_f/2$):
     $$\forall k: \; \beta_k = \gamma_k = (k+1)\gamma, \quad \eta_k = (k+1)\eta + L_f.$$
     Then
     $$\max\left\{ |F(\bar{y}^k, \bar{z}^k) - F^*|, \; \|B\bar{y}^k + C\bar{z}^k - b\|, \; \|z^k - z^*\|^2 \right\} \le O(1/k^2),$$
     where $F(y,z) = h(y) + f(z) + g(z)$ and $F^* = F(y^*, z^*)$.

  17. Sketch of proof
     1. Fundamental inequality from the optimality conditions of each iterate:
     $$F(y^{k+1}, z^{k+1}) - F(y,z) - \langle \lambda, By^{k+1} + Cz^{k+1} - b \rangle$$
     $$\le -\left\langle \tfrac{1}{\gamma_k}(\lambda^k - \lambda^{k+1}), \; \lambda - \lambda^k + \tfrac{\beta_k}{\gamma_k}(\lambda^k - \lambda^{k+1}) - \beta_k C(z^{k+1} - z^k) \right\rangle$$
     $$+ \tfrac{L_f}{2}\|z^{k+1} - z^k\|^2 - \tfrac{\mu_f}{2}\|z^k - z\|^2 - \eta_k \langle z^{k+1} - z, \; z^{k+1} - z^k \rangle.$$
     2. Plug in the parameters and bound the cross terms:
     $$F(y^{k+1}, z^{k+1}) - F(y^*, z^*) - \langle \lambda, By^{k+1} + Cz^{k+1} - b \rangle + \tfrac12\left( \eta(k+1)\|z^{k+1} - z^*\|^2 + L_f\|z^{k+1} - z^*\|^2 \right) + \tfrac{1}{2\gamma(k+1)}\|\lambda - \lambda^{k+1}\|^2$$
     $$\le \tfrac12\left( \eta(k+1)\|z^k - z^*\|^2 + (L_f - \mu_f)\|z^k - z^*\|^2 \right) + \tfrac{1}{2\gamma(k+1)}\|\lambda - \lambda^k\|^2.$$
     3. Multiply by $k + k_0$ (here $k_0 \sim \tfrac{2L_f}{\mu_f}$) and sum the inequality over $k$:
     $$F(\bar{y}^{k+1}, \bar{z}^{k+1}) - F(y^*, z^*) - \langle \lambda, B\bar{y}^{k+1} + C\bar{z}^{k+1} - b \rangle \le \frac{\phi(y^*, z^*, \lambda)}{k^2}$$
     4. Take a special $\lambda$ and use the KKT conditions.

  18. Literature
     • [Ouyang et al. '15]: $O(L_f/k^2 + C_0/k)$ with only weak convexity
     • [Goldstein et al. '14]: $O(1/k^2)$ with strong convexity on both $y$ and $z$
     • [Chambolle-Pock '11, Chambolle-Pock '16, Dang-Lan '14, Bredies-Sun '16]: accelerated first-order methods for bilinear saddle-point problems
     Open question: what are the weakest conditions under which $O(1/k^2)$ is attainable?

  19. Numerical experiments (more results in the paper)
