SLIDE 1

Linear Convergence of Randomized Primal-Dual Coordinate Method for Large-scale Linear Constrained Convex Programming

Daoli Zhu and Lei Zhao, Shanghai Jiao Tong University. ICML 2020, July 16, 2020.

SLIDE 2

Outline

1 Research Problem 2 Preliminaries 3 Convergence and Convergence Rate Analysis of RPDC 4 Linear Convergence of RPDC under Global Strong Metric Subregularity 5 Numerical Analysis 6 Conclusions


SLIDE 3

1. Research Problem

Linear Constrained Convex Programming (LCCP):

(P): min F(u) = G(u) + J(u)  s.t. Au − b = 0, u ∈ U.   (1.1)

Assumption 1
(H1) J is a convex, lower semi-continuous function (not necessarily differentiable) such that dom J ∩ U ≠ ∅.
(H2) G is convex and differentiable, and its derivative is Lipschitz with constant B_G.
(H3) There exists at least one saddle point for the Lagrangian of (P).

Decomposition for the partially structured problem: space decomposition of U:

U = U1 × U2 × ··· × UN,  Ui ⊂ R^{ni},  Σ_{i=1}^{N} ni = n,

J(u) = Σ_{i=1}^{N} Ji(ui), and A = (A1, A2, ..., AN) ∈ R^{m×n} is the corresponding partition of A, where Ai is an m × ni matrix.
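The column-block decomposition of A can be checked numerically; a minimal sketch in which the dimensions, block sizes, and random data are made up for illustration:

```python
import numpy as np

# Partition A column-wise into N blocks A = (A_1, ..., A_N), with A_i of
# size m x n_i, and verify that Au = sum_i A_i u_i.
rng = np.random.default_rng(0)
m, block_sizes = 3, [2, 3, 1]          # n_1 = 2, n_2 = 3, n_3 = 1, so n = 6
n = sum(block_sizes)
A = rng.standard_normal((m, n))
u = rng.standard_normal(n)

# Column offsets of each block inside A.
offsets = np.cumsum([0] + block_sizes)
A_blocks = [A[:, offsets[i]:offsets[i + 1]] for i in range(len(block_sizes))]
u_blocks = [u[offsets[i]:offsets[i + 1]] for i in range(len(block_sizes))]

# Au decomposes as the sum of per-block products A_i u_i.
Au_sum = sum(Ai @ ui for Ai, ui in zip(A_blocks, u_blocks))
assert np.allclose(A @ u, Au_sum)
```

This block structure is what lets RPDC touch only one A_i per iteration.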


SLIDE 4

1.1 Motivation

Support vector machine (SVM) problem:

(SVM): min_{u ∈ [0,c]^n} (1/2)u⊤Qu − 1n⊤u  s.t. y⊤u = 0,

where Q ∈ R^{n×n} is symmetric and positive-definite, c > 0, and y ∈ {−1, 1}^n.

Machine learning portfolio (MLP) problem:

(MLP): min_{u ∈ R^n} (1/2)u⊤Σu + λ‖u‖1  s.t. µ⊤u = ρ, 1n⊤u = 1,

where Σ ∈ R^{n×n} is the estimated covariance matrix of asset returns, µ ∈ R^n is the expectation of asset returns, and ρ is a predefined prospective growth rate.
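Both model problems are plain quadratic programs; a small sketch of the SVM data, objective, and feasibility check, where the tiny Q, y, c values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, c = 4, 1.0
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)            # symmetric positive-definite
y = np.array([1.0, -1.0, 1.0, -1.0])

def svm_objective(u):
    # F(u) = (1/2) u^T Q u - 1_n^T u
    return 0.5 * u @ Q @ u - u.sum()

def svm_feasible(u, tol=1e-10):
    # u in [0, c]^n and y^T u = 0
    return bool(np.all(u >= -tol) and np.all(u <= c + tol) and abs(y @ u) <= tol)

u = np.array([0.5, 0.5, 0.0, 0.0])     # y^T u = 0.5 - 0.5 = 0
assert svm_feasible(u)
```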


SLIDE 5

1.1 Motivation

In the big data era, datasets are very large and often distributed across different locations. It is often impractical to assume that optimization algorithms can traverse an entire dataset once in each iteration, because doing so is either time consuming or unreliable.

Coordinate-type methods can make progress using distributed information and thus provide much flexibility for implementation in distributed environments. Therefore, we adopt randomized coordinate methods for the constrained optimization problem, with emphasis on convergence and rate-of-convergence properties.


SLIDE 6

1.2 Related works: augmented Lagrangian decomposition method

The augmented Lagrangian of (P) is

Lγ(u, p) = F(u) + ⟨p, Au − b⟩ + (γ/2)‖Au − b‖².

Augmented Lagrangian method (ALM) (Hestenes, 1969; Powell, 1969):

u^{k+1} = arg min_{u∈U} Lγ(u, p^k);
p^{k+1} = p^k + γ(Au^{k+1} − b).

ALM does not preserve separability.

Augmented Lagrangian decomposition method (I): Alternating Direction Method of Multipliers (ADMM) (Fortin & Glowinski, 1983):

u1^{k+1} = arg min_{u1∈U1} Lγ(u1, u2^k, u3^k, ..., u_{N−1}^k, uN^k, p^k);
u2^{k+1} = arg min_{u2∈U2} Lγ(u1^{k+1}, u2, u3^k, ..., u_{N−1}^k, uN^k, p^k);
...
uN^{k+1} = arg min_{uN∈UN} Lγ(u1^{k+1}, u2^{k+1}, ..., u_{N−1}^{k+1}, uN, p^k);
p^{k+1} = p^k + γ(Au^{k+1} − b).

ADMM is a Gauss-Seidel method for ALM.
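The ALM loop can be illustrated on a toy instance where the inner minimization has a closed form; the problem min (1/2)‖u‖² s.t. a⊤u = b and all data below are assumptions for illustration, not from the paper:

```python
import numpy as np

# ALM on: min (1/2)||u||^2  s.t.  a^T u = b, with U = R^n.
a = np.array([1.0, 2.0, -1.0])
b = 2.0
gamma, p = 10.0, 0.0

for _ in range(100):
    # Inner step u^{k+1} = argmin_u L_gamma(u, p^k); the optimality condition
    # (I + gamma a a^T) u = (gamma b - p) a is solved via Sherman-Morrison.
    u = (gamma * b - p) / (1.0 + gamma * (a @ a)) * a
    # Dual update: p^{k+1} = p^k + gamma (A u^{k+1} - b).
    p = p + gamma * (a @ u - b)

u_star = (b / (a @ a)) * a             # projection of 0 onto {u : a^T u = b}
assert np.allclose(u, u_star, atol=1e-8)
```

The dual iterates contract with factor 1/(1 + γ‖a‖²), so a larger γ converges faster here, at the cost of a harder inner problem in general.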


SLIDE 7

1.2 Related works: augmented Lagrangian decomposition method

Augmented Lagrangian decomposition method (II): Auxiliary Problem Principle of Augmented Lagrangian (APP-AL) (Cohen & Zhu, 1983):

u^{k+1} = arg min_{u∈U} ⟨∇G(u^k), u⟩ + J(u) + ⟨p^k + γ(Au^k − b), Au⟩ + (1/ǫ)D(u, u^k);
p^{k+1} = p^k + γ(Au^{k+1} − b).

APP-AL linearizes the smooth term in the primal problem of ALM and adds a regularization term, where D(u, v) = K(u) − K(v) − ⟨∇K(v), u − v⟩ is a Bregman-like function.

Randomized Primal-Dual Coordinate method (RPDC) (this paper):

Choose i(k) from {1, ..., N} with equal probability;
u^{k+1} = arg min_{u∈U} ⟨∇_{i(k)}G(u^k), u_{i(k)}⟩ + J_{i(k)}(u_{i(k)}) + ⟨p^k + γ(Au^k − b), A_{i(k)}u_{i(k)}⟩ + (1/ǫ)D(u, u^k);
p^{k+1} = p^k + ρ(Au^{k+1} − b).

RPDC randomly updates one block of variables in the primal subproblem of APP-AL.
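A minimal sketch of the RPDC iteration under simplifying assumptions: K = (1/2)‖·‖², J = 0, U = R^n, and a made-up equality-constrained quadratic with A = 1n⊤ (one row); the closed-form block update below is specific to these assumptions, not the general method:

```python
import numpy as np

# Toy problem: min (1/2)||u - c0||^2  s.t.  1_n^T u = b.
# With K = (1/2)||.||^2 and J = 0, the RPDC block update reduces to
#   u_i^{k+1} = u_i^k - eps * (grad_i G(u^k) + A_i^T q^k),
#   q^k = p^k + gamma * (A u^k - b).
rng = np.random.default_rng(0)
n = 4
c0 = np.array([2.0, -1.0, 0.5, 3.0])
b, gamma = 1.0, 1.0
# Assumption 2 here: eps < 1/(B_G + gamma*lmax(A^T A)) = 1/(1 + n) = 0.2,
# rho < 2*gamma/(2N - 1) = 2/7, with N = n single-coordinate blocks.
eps, rho = 0.15, 0.2
u, p = np.zeros(n), 0.0

for _ in range(50000):
    i = rng.integers(n)                       # choose i(k) uniformly
    q = p + gamma * (u.sum() - b)             # q^k = p^k + gamma (A u^k - b)
    u[i] = u[i] - eps * ((u[i] - c0[i]) + q)  # linearized block update
    p = p + rho * (u.sum() - b)               # dual update with step rho

u_star = c0 + (b - c0.sum()) / n              # analytic solution
assert np.linalg.norm(u - u_star) < 1e-2
```

Each iteration touches one coordinate of u plus the scalar multiplier, which is the cheap per-step cost that motivates the method.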


SLIDE 8

1.2 Related works: comparison between RPDC and Randomized Coordinate Descent algorithm (RCD) by Necoara & Patrascu, 2014

Randomized Primal-Dual Coordinate method (RPDC) (this paper):

Choose i(k) from {1, ..., N} with equal probability;
u^{k+1} = arg min_{u∈U} ⟨∇_{i(k)}G(u^k), u_{i(k)}⟩ + J_{i(k)}(u_{i(k)}) + ⟨p^k + γ(Au^k − b), A_{i(k)}u_{i(k)}⟩ + (1/ǫ)D(u, u^k);
p^{k+1} = p^k + ρ(Au^{k+1} − b).

Necoara & Patrascu, 2014 consider problem (P) with A ∈ R^{1×n}, b = 0, and U = R^n:

(P'): min_{u∈R^n} G(u) + J(u)  s.t. a⊤u = 0,

where a = (a1, ..., an)⊤ ∈ R^n. The randomized coordinate descent algorithm (RCD) of Necoara & Patrascu, 2014 for (P') is

Choose i(k) and j(k) from {1, ..., n} with equal probability;
u^{k+1} = arg min_{a_{i(k)}u_{i(k)} + a_{j(k)}u_{j(k)} = 0} ⟨∇_{i(k)}G(u^k), u_{i(k)}⟩ + ⟨∇_{j(k)}G(u^k), u_{j(k)}⟩ + J_{i(k)}(u_{i(k)}) + J_{j(k)}(u_{j(k)}) + (1/(2ǫ))‖u − u^k‖².

RPDC can deal with more complex problems than RCD.


SLIDE 9

1.2 Related works: similar schemes

Paper                  | Problem | Algorithm       | Theoretical Results
Xu & Zhang, 2018       | (P)     | similar to RPDC | F strongly convex: O(1/t²) rate
Gao, Xu & Zhang, 2019  | (P)     | similar to RPDC | F convex: O(1/t) rate
This paper             | (P)     | RPDC            | F convex: (i) almost sure convergence; (ii) O(1/t) rate; under global strong metric subregularity: (iii) linear convergence


SLIDE 10

1.3 Contribution

We propose the randomized primal-dual coordinate (RPDC) method based on the first-order primal-dual method of Cohen & Zhu, 1984 and Zhao & Zhu, 2019.
(i) We show that the sequence generated by RPDC converges to an optimal solution with probability 1.
(ii) We show that RPDC has an expected O(1/t) rate for general LCCP.
(iii) We establish the expected linear convergence of RPDC under global strong metric subregularity.
(iv) We show that the SVM and MLP problems satisfy global strong metric subregularity under reasonable conditions.


SLIDE 11

2. Preliminaries

Lagrangian of (P): L(u, p) = F(u) + ⟨p, Au − b⟩.

Saddle point inequality: ∀u ∈ U, p ∈ R^m: L(u*, p) ≤ L(u*, p*) ≤ L(u, p*).   (2.2)

Karush-Kuhn-Tucker (KKT) system of (P): let w = (u, p) and let U* × P* be the set of saddle points. For all w ∈ U* × P*,

0 ∈ H(w) = [ ∂_u L(u, p) + N_U(u) ; −∇_p L(u, p) ] = [ ∇G(u) + ∂J(u) + A⊤p + N_U(u) ; b − Au ],

where N_U(u) = {ξ : ⟨ξ, ζ − u⟩ ≤ 0, ∀ζ ∈ U} is the normal cone to U at u.


SLIDE 12

3. Convergence and Convergence Rate Analysis of RPDC: RPDC Algorithm

Algorithm 1: Randomized Primal-Dual Coordinate method (RPDC)
for k = 1 to t
  Choose i(k) from {1, ..., N} with equal probability;
  u^{k+1} = arg min_{u∈U} ⟨∇_{i(k)}G(u^k), u_{i(k)}⟩ + J_{i(k)}(u_{i(k)}) + ⟨q^k, A_{i(k)}u_{i(k)}⟩ + (1/ǫ)D(u, u^k);
  p^{k+1} = p^k + ρ(Au^{k+1} − b).
end for

where q^k = p^k + γ(Au^k − b) and D(u, v) = K(u) − K(v) − ⟨∇K(v), u − v⟩ is a Bregman-like function with K strongly convex and gradient Lipschitz.

Assumption 2
(i) K is strongly convex with parameter β and gradient Lipschitz continuous with parameter B.
(ii) The parameters ǫ and ρ satisfy 0 < ǫ < β/[B_G + γλmax(A⊤A)] and 0 < ρ < 2γ/(2N − 1).
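The step-size conditions of Assumption 2 are directly computable from problem data; a sketch in which B_G, β, γ, A, and N are made-up example values:

```python
import numpy as np

B_G, beta = 2.0, 1.0                   # Lipschitz constant of grad G; K = (1/2)||.||^2
gamma, N = 1.0, 3
A = np.array([[1.0, 2.0, 0.5],
              [0.0, 1.0, -1.0]])

lam_max = np.linalg.eigvalsh(A.T @ A).max()
eps_bound = beta / (B_G + gamma * lam_max)   # require 0 < eps < eps_bound
rho_bound = 2.0 * gamma / (2 * N - 1)        # require 0 < rho < rho_bound

eps, rho = 0.9 * eps_bound, 0.9 * rho_bound  # conservative choices inside the ranges
assert 0 < eps < eps_bound and 0 < rho < rho_bound
```

Note that the admissible primal step shrinks as γλmax(A⊤A) grows, and the admissible dual step shrinks as the number of blocks N grows.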

SLIDE 13

3. Convergence and Convergence Rate Analysis of RPDC: Preparation

Filtration: F_k := {i(0), i(1), ..., i(k)}, F_k ⊂ F_{k+1}. The conditional expectation with respect to F_k is E_{F_{k+1}} = E(·|F_k); the conditional expectation over the i(k) term for given i(0), i(1), ..., i(k − 1) is E_{i(k)}.

Reference point: E_{i(k)}u^{k+1} = (1/N)T_u(w^k) + (1 − 1/N)u^k, where T is the APP-AL operator:

T_u(w^k) = arg min_{u∈U} ⟨∇G(u^k), u⟩ + J(u) + ⟨q^k, Au⟩ + (1/ǫ)D(u, u^k);
T_p(w^k) = p^k + γ(AT_u(w^k) − b),

with w^k = (u^k, p^k) and T(w^k) = (T_u(w^k), T_p(w^k)).
SLIDE 14

3. Convergence and Convergence Rate Analysis of RPDC: Preparation

For any w, w′ ∈ U × R^m, we construct the function

Λ(w, w′) = (ǫ(N − 1)/N)[L(u, p) − L(u*, p*)] + D(u′, u) + (ǫ/(2Nρ))‖p − p′‖² + (ǫ(N − 2)γ/(2N))‖Au − b‖².

Let w′ = w*:

Λ(w, w*) = (ǫ(N − 1)/N)[L(u, p) − L(u*, p*)]   (Lagrangian residual)
         + D(u*, u) + (ǫ/(2Nρ))‖p − p*‖²       (primal and dual residuals)
         + (ǫ(N − 2)γ/(2N))‖Au − b‖²           (feasibility residual).

Lemma 1 (Boundedness of Λ(w, w*) and Λ(w, w′)) There exist d1 > 0, d2 > 0 and d3 > 0 such that
(i) Lower bound of Λ(w, w*): Λ(w, w*) ≥ d1‖w − w*‖²;
(ii) Upper bound of Λ(w, w*): Λ(w, w*) ≤ d2‖w − w*‖² + (ǫ(N − 1)/N)[L(u, p*) − L(u*, p*)];
(iii) Lower bound of Λ(w, w′): Λ(w, w′) ≥ −d3‖p − p*‖².


SLIDE 15

3. Convergence and Convergence Rate Analysis of RPDC: Preparation

Two intermediate estimates feed into Lemma 2. From the RPDC scheme, Assumption 2, and the reference point E_{i(k)}u^{k+1} = (1/N)T_u(w^k) + (1 − 1/N)u^k, the dual update bounds (ǫ/N)E_{i(k)}[L(u^{k+1}, p) − L(u^{k+1}, q^k)] in terms of ‖p − p^k‖² − E_{i(k)}‖p − p^{k+1}‖², E_{i(k)}‖u^k − u^{k+1}‖², ‖Au^k − b‖², and E_{i(k)}‖Au^{k+1} − b‖². From Assumptions 1-2 and the RPDC scheme, the primal update bounds (ǫ/N)E_{i(k)}[L(u^{k+1}, q^k) − L(u, q^k)] in terms of D(u, u^k) − E_{i(k)}D(u, u^{k+1}), (ǫ(N − 1)/N)E_{i(k)}[L(u^k, p^k) − L(u^{k+1}, p^{k+1})], and the same residual terms, with coefficient (β − ǫ[B_G + ((N − 1)/N)γλmax(A⊤A)])/2 on ‖u^k − u^{k+1}‖². Combining the two estimates yields:

Lemma 2 (Estimation on the variance of Λ(w^k, w)) There exists d4 > 0 such that

Λ(w^k, w) − E_{i(k)}Λ(w^{k+1}, w) ≥ (ǫ/N)E_{i(k)}[L(u^{k+1}, p) − L(u, q^k)] + d4‖w^k − T(w^k)‖².


SLIDE 16

3. Convergence and Convergence Rate Analysis of RPDC: Convergence Analysis

Proof ingredients: Robbins-Siegmund's lemma (Robbins & Siegmund, 1971); taking w = w* in Lemma 2, Λ(w^k, w*) − E_{i(k)}Λ(w^{k+1}, w*) ≥ d4‖w^k − T(w^k)‖²; and, by Lemma 1, Λ(w, w*) ≥ d1‖w − w*‖² ≥ 0.

Theorem 1 (Almost sure convergence)
(i) Σ_{k=0}^{+∞} ‖w^k − T(w^k)‖² < +∞ a.s.;
(ii) The sequence {w^k} generated by RPDC is almost surely bounded;
(iii) Every cluster point of {w^k} is almost surely a saddle point of the Lagrangian of (P).


SLIDE 17

3. Convergence and Convergence Rate Analysis of RPDC: Convergence Rate Analysis

Define the ergodic averages ū^t = (Σ_{k=0}^{t} u^{k+1})/(t + 1) and p̄^t = (Σ_{k=0}^{t} q^k)/(t + 1), and, from Lemma 1, the nonnegative merit function h(w, w′) = Λ(w, w′) + (d3/d1)Λ(w, w*) ≥ 0. From Lemma 2, E_{F_t}[Λ(w^k, w) − Λ(w^{k+1}, w)] ≥ (ǫ/N)E_{F_t}[L(u^{k+1}, p) − L(u, q^k)] and E_{F_t}[Λ(w^k, w*) − Λ(w^{k+1}, w*)] ≥ 0, hence

E_{F_t}[h(w^k, w) − h(w^{k+1}, w)] ≥ (ǫ/N)E_{F_t}[L(u^{k+1}, p) − L(u, q^k)].

Theorem 2 (O(1/t) convergence rate)
(i) Global estimate of expected bifunction values: E_{F_t}[L(ū^t, p) − L(u, p̄^t)] ≤ N h(w^0, w)/(ǫ(t + 1)) for all u ∈ U, p ∈ R^m, where (u, p) could possibly be random;
(ii) Expected feasibility: E_{F_t}‖Aū^t − b‖ ≤ O(1/t);
(iii) Expected suboptimality: −O(1/t) ≤ E_{F_t}[F(ū^t) − F(u*)] ≤ O(1/t).
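The ergodic averages ū^t and p̄^t in Theorem 2 can be maintained incrementally rather than by storing all iterates; a sketch with made-up stand-in iterates:

```python
import numpy as np

rng = np.random.default_rng(0)
iterates = rng.standard_normal((10, 3))  # stand-ins for u^1, ..., u^10

u_bar = np.zeros(3)
for t, u_next in enumerate(iterates):
    # u_bar^t = u_bar^{t-1} + (u^{t+1} - u_bar^{t-1}) / (t + 1): running mean.
    u_bar += (u_next - u_bar) / (t + 1)

assert np.allclose(u_bar, iterates.mean(axis=0))
```

The same O(1) per-step update applies to p̄^t, so reporting the ergodic rates costs no extra memory.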


SLIDE 18

4. Linear Convergence of RPDC under Global Strong Metric Subregularity

Define φ(w, w*) = Λ(w, w*) + (ǫ/N)[L(u, p*) − L(u*, p*)]. Lemma 3 follows from the bounds on Λ(w, w*) in Lemma 1 and the descent estimate of Lemma 2.

Lemma 3 (Boundedness of φ(w, w*) and descent inequality of φ(w^k, w*))
(i) Lower bound of φ(w, w*): φ(w, w*) ≥ d1‖w − w*‖².
(ii) Upper bound of φ(w, w*): φ(w, w*) ≤ d2‖w − w*‖² + ǫ[L(u, p*) − L(u*, p*)].
(iii) Descent inequality of φ(w^k, w*): φ(w^k, w*) − E_{i(k)}φ(w^{k+1}, w*) ≥ d4‖w^k − T(w^k)‖² + (ǫ/N)[L(u^k, p*) − L(u*, p*)].

SLIDE 19

4. Linear Convergence of RPDC under Global Strong Metric Subregularity

Definition (Global strong metric subregularity (GS-MS)) Let H be a set-valued mapping between real spaces X and Y. H is globally strongly metrically subregular at x̄ for ȳ, where ȳ ∈ H(x̄), if there exists a positive number c such that dist(x, x̄) ≤ c dist(ȳ, H(x)) for all x ∈ X.

Suppose H(w) is globally strongly metrically subregular at w* for 0. From the APP-AL scheme, v(T(w^k)) ∈ H(T(w^k)) and ‖v(T(w^k))‖² ≤ δ‖w^k − T(w^k)‖², where

v(T(w^k)) = [ ∇G(T_u(w^k)) − ∇G(u^k) + A⊤(T_p(w^k) − q^k) + (1/ǫ)(∇K(u^k) − ∇K(T_u(w^k))) ; (1/γ)(p^k − T_p(w^k)) ] ∈ H(T(w^k)).

Hence ‖T(w^k) − w*‖ ≤ c dist(0, H(T(w^k))), and since ‖w^k − w*‖ ≤ ‖T(w^k) − w*‖ + ‖w^k − T(w^k)‖, we obtain

‖w^k − w*‖ ≤ (c√δ + 1)‖w^k − T(w^k)‖.


SLIDE 20

4. Linear Convergence of RPDC under Global Strong Metric Subregularity

Combining the GS-MS bound ‖w^k − w*‖ ≤ (c√δ + 1)‖w^k − T(w^k)‖ with the upper bound and the descent inequality of Lemma 3 gives

φ(w^k, w*) − E_{i(k)}φ(w^{k+1}, w*) ≥ δ′{d2‖w^k − w*‖² + ǫ[L(u^k, p*) − L(u*, p*)]},

with δ′ = min{ d4 / max{d2(c√δ + 1)², d4 + 1}, 1/(N + 1) } < 1.

Theorem 3 (Global strong metric subregularity of H(w) implies linear convergence of RPDC) For a given saddle point w*, if H(w) is globally strongly metrically subregular at w* for 0, then there exists α = 1 − δ′ ∈ (0, 1) such that

E_{F_{k+1}}φ(w^{k+1}, w*) ≤ α^{k+1}φ(w^0, w*), ∀k.


SLIDE 21

4. Linear Convergence of RPDC under Global Strong Metric Subregularity

From Theorem 3, E_{F_k}φ(w^k, w*) ≤ α^k φ(w^0, w*) for all k, and from Lemma 3, φ(w, w*) ≥ d1‖w − w*‖².

Corollary 1 (R-linear convergence of {E_{F_k}‖w^k − w*‖}) E_{F_k}‖w^k − w*‖ ≤ M̂(√α)^k, with M̂ = √(φ(w^0, w*)/d1). Hence the sequence converges to the desired saddle point w* at an R-linear rate in expectation:

lim sup_{k→∞} (E_{F_k}‖w^k − w*‖)^{1/k} ≤ √α.
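An empirical counterpart of the R-linear rate: fit the slope of log‖w^k − w*‖ against k; the geometric error sequence below is made up for illustration:

```python
import numpy as np

# Errors e_k = ||w^k - w*|| from a hypothetical linearly convergent run;
# here a clean geometric sequence with per-step rate 0.9.
errors = 0.9 ** np.arange(50)
k = np.arange(len(errors))

# log e_k is (approximately) affine in k; the fitted slope gives the rate.
slope, _ = np.polyfit(k, np.log(errors), 1)
rate = np.exp(slope)
assert abs(rate - 0.9) < 1e-6
```

On real runs the fit is noisy, so the recovered rate is an estimate of √α rather than an exact value.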


SLIDE 22

5. Numerical Analysis: SVM

(SVM): min_{u ∈ [0,c]^n} (1/2)u⊤Qu − 1n⊤u  s.t. y⊤u = 0.

KKT mapping of (SVM):

H(w) = [ Qu − 1n + py + N_{[0,c]^n}(u) ; y⊤u ].

Proposition Assume there exists at least one component u*_i of the optimal solution u* that satisfies 0 < u*_i < c. Then the KKT mapping for SVM is globally strongly metrically subregular.

Proof chain: piecewise linearity of H(w) gives global metric subregularity of H(w) (Zheng & Ng, 2014); Q positive-definite gives uniqueness of u*; the existence of a component with 0 < u*_i < c gives uniqueness of p*; together these yield global strong metric subregularity of H(w).
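The proposition's hypothesis is easy to test on a candidate solution; a sketch in which the helper name and tolerance are choices made here for illustration, not from the paper:

```python
import numpy as np

def has_interior_component(u_star, c, tol=1e-8):
    # True if some component of u* lies strictly inside (0, c),
    # up to a numerical tolerance.
    return bool(np.any((u_star > tol) & (u_star < c - tol)))

c = 1.0
assert has_interior_component(np.array([0.0, 0.3, 1.0]), c)
assert not has_interior_component(np.array([0.0, 1.0, 1.0]), c)
```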

SLIDE 23

5. Numerical Analysis: SVM

Figure: number of blocks, ‖w^k − w*‖ value (c), suboptimality (d), and feasibility (e) with respect to iteration count.

SLIDE 24

5. Numerical Analysis: SVM

Figure: Comparison among RPDC with N = 2, APP-AL, and RCD (Necoara & Patrascu, 2014) on (a) the heart scale dataset and (b) the ionosphere scale dataset.

SLIDE 25

5. Numerical Analysis: MLP

(MLP): min_{u ∈ R^n} (1/2)u⊤Σu + λ‖u‖1  s.t. µ⊤u = ρ, 1n⊤u = 1.

The KKT mapping of (MLP):

H(w) = [ Σu + λ∂‖u‖1 + p1·1n + p2·µ ; µ⊤u − ρ ; 1n⊤u − 1 ].

Proposition Assume there exist at least two components u*_i and u*_j of the optimal solution u* that satisfy u*_i ≠ 0, u*_j ≠ 0, and µi ≠ µj. Then the KKT mapping for MLP is globally strongly metrically subregular.

Proof chain: piecewise linearity of H(w) gives global metric subregularity of H(w) (Zheng & Ng, 2014); Σ positive-definite gives uniqueness of u*; the existence of u*_i ≠ 0 and u*_j ≠ 0 with µi ≠ µj gives uniqueness of p*; together these yield global strong metric subregularity of H(w).

SLIDE 26

5. Numerical Analysis: MLP

Figure: number of blocks, ‖w^k − w*‖ value (a), suboptimality (b), and feasibility (c) with respect to iteration count.

SLIDE 27

6. Conclusions

This paper proposed a randomized coordinate extension of the first-order primal-dual method of Cohen & Zhu, 1984 and Zhao & Zhu, 2019 to solve LCCP.
(i) We established almost sure convergence and an expected O(1/t) convergence rate for the general convex case.
(ii) Under the global strong metric subregularity condition, we established the expected linear convergence of RPDC.
(iii) The SVM and MLP problems satisfy global strong metric subregularity under reasonable conditions.
We also discussed the implementation details of RPDC and presented numerical experiments on SVM and MLP problems to verify the linear convergence. Future work will consider RPDC for nonlinearly constrained nonconvex and nonsmooth optimization.

SLIDE 28

For More Details and Results

Contact me by e-mail: l.zhao@sjtu.edu.cn
Download slides: https://drive.google.com/file/d/1SFt0tjV5yUx_r1fIGX0Rsa5FOgvff3Ft/view?usp=sharing

THANK YOU FOR YOUR ATTENTION!