The Proximal Primal-Dual Approach for Nonconvex Linearly Constrained Problems (PowerPoint PPT Presentation)

SLIDE 1

The Proximal Primal-Dual Approach for Nonconvex Linearly Constrained Problems

Presenter: Mingyi Hong Joint work with Davood Hajinezhad University of Minnesota ECE Department DIMACS Workshop on Distributed Opt., Information Process., and Learning August, 2017

Mingyi Hong (University of Minnesota) 0 / 56

SLIDE 2

Agenda

We consider the following problem:

  min f(x) + h(x)   s.t. Ax = b, x ∈ X   (P)

where f(x): R^N → R is a smooth non-convex function; h(x): R^N → R is a non-smooth non-convex regularizer; X is a compact convex set; and {x | Ax = b} ∩ X ≠ ∅.
SLIDE 3

The Plan

1. Design an efficient decomposition scheme decoupling the variables
2. Analyze convergence and the rate of convergence
3. Discuss convergence to first-/second-order stationary solutions
4. Explore different variants of the algorithms; obtain useful insights
5. Evaluate practical performance
SLIDE 4

App 1: Distributed optimization

Consider a network consisting of N agents, who collectively optimize

  min_{y∈X} f(y) := ∑_{i=1}^N fi(y) + hi(y),

where fi(y), hi(y): X → R are the cost/regularizer local to agent i. Each fi, hi is known only to agent i (e.g., through local measurements). y is assumed to be scalar for ease of presentation. The agents are connected by a network defined by an undirected graph G = {V, E}, with |V| = N vertices and |E| = E edges.
SLIDE 5

App 1: Distributed optimization

Introduce local variables {xi} and reformulate as the consensus problem

  min_{{xi}} ∑_{i=1}^N fi(xi) + hi(xi)   s.t. Ax = 0 (consensus constraint)

where A ∈ R^{E×N} is the edge-node incidence matrix and x := [x1, · · · , xN]^T. If edge e ∈ E connects vertices i and j with i > j, then A_{ev} = 1 if v = i, A_{ev} = −1 if v = j, and A_{ev} = 0 otherwise.
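As a quick sanity check of the incidence-matrix construction above, the sketch below builds A for a small graph and verifies that Ax = 0 exactly when all local copies agree; the 3-node path graph is an illustrative choice, not from the slides' experiments.

```python
# Sketch: build the edge-node incidence matrix A for an undirected graph and
# check that Ax = 0 exactly when all local copies x_i agree. The 3-node path
# graph used here is an illustrative choice.

def incidence_matrix(n_nodes, edges):
    """Row per edge: +1 at the larger endpoint, -1 at the smaller (i > j convention)."""
    A = [[0] * n_nodes for _ in edges]
    for e, (i, j) in enumerate(edges):
        A[e][max(i, j)] = 1
        A[e][min(i, j)] = -1
    return A

def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

A = incidence_matrix(3, [(0, 1), (1, 2)])   # path 1 <-> 2 <-> 3 (0-indexed)

print(matvec(A, [5.0, 5.0, 5.0]))  # [0.0, 0.0]  -- consensus gives Ax = 0
print(matvec(A, [1.0, 2.0, 3.0]))  # [1.0, 1.0]  -- disagreement shows up edge by edge
```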

SLIDE 6

App 2: Partial consensus

“Strict consensus” may not be practical, and is often not required [Koppel et al 16]:
1. Due to noise in local communication
2. The variables to be estimated have spatial variability
3. ...
SLIDE 7

App 2: Partial consensus

Relax the consensus requirement:

  min_{{xi}} ∑_{i=1}^N fi(xi) + hi(xi)   s.t. ‖xi − xj‖² ≤ bij, ∀ (i, j) ∈ E.

Introduce “link variables” {zij = xi − xj}; an equivalent reformulation is

  min_{{xi}} ∑_{i=1}^N fi(xi) + hi(xi)   s.t. Ax − z = 0, z ∈ Z.
SLIDE 8

App 2: Partial consensus

The local cost functions can be non-convex in a number of situations:
1. The use of non-convex regularizers, e.g., SCAD/MCP [Fan-Li 01, Zhang 10]
2. Non-convex quadratic functions, e.g., high-dimensional regression with missing data [Loh-Wainwright 12], sparse PCA
3. Sigmoid loss functions (approximating the 0-1 loss) [Shalev-Shwartz et al 11]
4. Loss functions for training neural nets [Allen-Zhu-Hazan 16]
SLIDE 9

App 3: Non-convex subspace estimation

Let Σ ∈ R^{p×p} be an unknown covariance matrix with eigen-decomposition

  Σ = ∑_{i=1}^p λi ui uiᵀ,

where λ1 ≥ · · · ≥ λp are the eigenvalues and u1, · · · , up the eigenvectors. The k-dimensional principal subspace of Σ is represented by the projection matrix

  Π∗ = ∑_{i=1}^k ui uiᵀ = UUᵀ.

Principal subspace estimation. Given i.i.d. samples {x1, · · · , xn}, estimate Π∗ based on the sample covariance matrix Σ̂.
SLIDE 10

App 3: Non-convex subspace estimation

Problem formulation [Gu et al 14]:

  Π̂ = arg min_Π −⟨Σ̂, Π⟩ + Pα(Π)   s.t. 0 ⪯ Π ⪯ I, Tr(Π) = k (the Fantope set),

where Pα(Π) is a non-convex regularizer (such as MCP/SCAD).

Estimation result [Gu et al 14]. Under certain conditions on α, every first-order stationary solution is “good” with high probability:

  ‖Π̂ − Π∗‖_F ≤ s1 √(s/n) + s2 √(log(p)/n),

where s = |supp(diag(Π∗))| is the subspace sparsity [Vu et al 13].
SLIDE 11

App 3: Non-convex subspace estimation

Question. How to find a first-order stationary solution? One needs to deal with both the Fantope constraint and the non-convex regularizer Pα(Π). A heuristic approach proposed in [Gu et al 14]:
1. Introduce the linear constraint X = Π
2. Impose the non-convex regularizer on X and the Fantope constraint on Π:

  Π̂ = arg min_{Π,X} −⟨Σ̂, Π⟩ + Pα(X)   s.t. 0 ⪯ Π ⪯ I, Tr(Π) = k (the Fantope set), Π − X = 0

3. Same formulation as (P), but only a heuristic algorithm, without any guarantee
SLIDE 12

The literature

SLIDE 13

Literature

The Augmented Lagrangian (AL) method [Hestenes 69, Powell 69] is a classical algorithm for solving nonlinear non-convex constrained problems
Many existing packages (e.g., LANCELOT)
Recent developments in [Curtis et al 16] [Friedlander 05], and many more
For convex problems + linear constraints, [Lan-Monteiro 15] [Liu et al 16] analyzed the iteration complexity of the AL method
Requires a double loop
In the non-convex setting, difficult to handle non-smooth regularizers
Difficult to implement in a distributed manner
SLIDE 14

Literature

Recent works consider AL-type methods for linearly constrained problems. For nonconvex problems + linear constraints, [Artina-Fornasier-Solombrino 13]:
1. Approximate the augmented Lagrangian using a proximal point (to make it convex)
2. Solve the linearly constrained convex approximation with increasing accuracy
AL-based methods for smooth non-convex objectives + linearly coupled constraints [Houska-Frasch-Diehl 16]:
1. AL-based Alternating Direction Inexact Newton (ALADIN)
2. Combines SQP and AL; global line search, Hessian computation, etc.
Still requires a double loop; no global rate analysis.
SLIDE 15

Literature

Dual decomposition [Bertsekas 99]:
1. Gradient/subgradient methods applied to the dual
2. Convex separable objective + convex coupling constraints
3. Many applications, e.g., in wireless communications [Palomar-Chiang 06]
Arrow-Hurwicz-Uzawa primal-dual algorithm [Arrow-Hurwicz-Uzawa 58]:
1. Applied to study saddle point problems [Gol'shtein 74][Nedić-Ozdaglar 07]
2. Primal-dual hybrid gradient [Zhu-Chan 08]
3. ...
These do not work for non-convex problems (it is difficult to use the dual structure).
SLIDE 16

Literature

ADMM is popular for solving linearly constrained problems. Some theoretical results exist for applying ADMM to non-convex problems:
1. [Hong-Luo-Razaviyayn 14]: non-convex consensus and sharing problems
2. [Li-Pong 14], [Wang-Yin-Zeng 15], [Melo-Monteiro 17]: more relaxed conditions, or faster rates
3. [Pang-Tao 17]: non-convex DC programs with sharp stationary solutions
Block-wise structure, but requires a special block; does not apply to problem (P).
SLIDE 17

The plan of the talk

First consider the simpler problem (smooth objective, no regularizer h or set constraint X):

  min_{x∈R^N} f(x)   s.t. Ax = b   (Q)

Algorithm, analysis and discussion; first-/second-order stationarity. Then generalize; applications and numerical results.
SLIDE 18

The proposed algorithms

SLIDE 19

The proposed algorithm

We draw elements from AL and Uzawa methods. The augmented Lagrangian for problem (Q) is

  Lβ(x, µ) = f(x) + ⟨µ, Ax − b⟩ + (β/2)‖Ax − b‖²,

where µ ∈ R^M is the dual variable and β > 0 is the penalty parameter. The algorithm takes one primal gradient-type step plus one dual gradient-type step.
SLIDE 20

The proposed algorithm

Let B ∈ R^{M×N} be an arbitrary matrix to be chosen later. The proposed algorithm is given below.

Algorithm 1. The Proximal Primal-Dual Algorithm (Prox-PDA)
At iteration 0, initialize µ^0 and x^0 ∈ R^N. At each iteration r + 1, update the variables by

  x^{r+1} = arg min_{x∈R^N} ⟨∇f(x^r), x − x^r⟩ + ⟨µ^r, Ax − b⟩ + (β/2)‖Ax − b‖² + (β/2)‖x − x^r‖²_{BᵀB};  (1a)
  µ^{r+1} = µ^r + β(Ax^{r+1} − b).  (1b)
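A minimal runnable sketch of iterations (1a)-(1b) on a toy consensus problem follows. The quadratic objective, the 3-node path graph, β = 30, and the iteration count are illustrative assumptions, not from the talk; with B = |A| we get AᵀA + BᵀB = L− + L+ = 2D, so the strongly convex subproblem (1a) reduces to a diagonal solve.

```python
# Minimal runnable sketch of Prox-PDA, steps (1a)-(1b), on a toy consensus problem:
# f(x) = sum_i 0.5*(x_i - c_i)^2 over the 3-node path 1-2-3, constraint Ax = 0.
# The data c, beta = 30 and the iteration count are illustrative choices.

c = [1.0, 2.0, 3.0]                        # local targets; consensus optimum is mean(c) = 2
A = [[-1, 1, 0], [0, -1, 1]]               # edge-node incidence matrix
Lplus = [[1, 1, 0], [1, 2, 1], [0, 1, 1]]  # B^T B = signless Laplacian L+
deg = [1, 2, 1]                            # node degrees
beta = 30.0

def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def matTvec(M, v):  # M^T v
    return [sum(M[e][i] * v[e] for e in range(len(M))) for i in range(len(M[0]))]

x = [1.0, 0.0, -1.0]
mu = [0.0, 0.0]
for _ in range(5000):
    grad = [xi - ci for xi, ci in zip(x, c)]                 # ∇f(x^r)
    rhs = [beta * lp - g - atm                               # β L+ x^r − ∇f(x^r) − A^T µ^r
           for lp, g, atm in zip(matvec(Lplus, x), grad, matTvec(A, mu))]
    x = [v / (2.0 * beta * d) for v, d in zip(rhs, deg)]     # step (1a): x^{r+1} = (2βD)^{-1} rhs
    mu = [m + beta * ax for m, ax in zip(mu, matvec(A, x))]  # step (1b)

print(x)  # ≈ [2.0, 2.0, 2.0]: consensus at the mean, with Ax ≈ 0
```

At the fixed point, stationarity gives Aᵀµ∗ = −∇f(x∗), matching the dual-tracking identity used later in the analysis.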

SLIDE 21

Comments

The primal iteration includes the proximal term (β/2)‖x − x^r‖²_{BᵀB}. Choose B appropriately to ensure the following key properties:
1. The primal problem is strongly convex, hence easily solvable;
2. The primal problem is decomposable over different variable blocks.
SLIDE 22

Comments

We illustrate this point using the distributed optimization problem. A network of 3 users: 1 ↔ 2 ↔ 3. Define the signed graph Laplacian as L− = AᵀA ∈ R^{N×N}: its (i, i)th diagonal entry is the degree of node i, and its (i, j)th off-diagonal entry is −1 if e = (i, j) ∈ E and 0 otherwise. For this network,

  L− = [[1, −1, 0], [−1, 2, −1], [0, −1, 1]],   L+ = [[1, 1, 0], [1, 2, 1], [0, 1, 1]].

Define the signless incidence matrix B := |A|. With this choice of B, BᵀB = L+ ∈ R^{N×N}, the signless graph Laplacian.
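The two Laplacians above can be checked directly; the short sketch below forms AᵀA and |A|ᵀ|A| for the 3-user path and compares them with the displayed matrices.

```python
# Sketch: for the 3-user path 1-2-3, check that A^T A equals the signed Laplacian
# L- and |A|^T |A| equals the signless Laplacian L+ displayed above.

def gram(M):  # returns M^T M for M stored as a list of rows
    n = len(M[0])
    return [[sum(row[i] * row[j] for row in M) for j in range(n)] for i in range(n)]

A = [[-1, 1, 0], [0, -1, 1]]               # edge-node incidence matrix
B = [[abs(a) for a in row] for row in A]   # signless incidence B = |A|

L_minus, L_plus = gram(A), gram(B)
print(L_minus)  # [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
print(L_plus)   # [[1, 1, 0], [1, 2, 1], [0, 1, 1]]
```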

SLIDE 28

Comments

Then the x-update step becomes

  x^{r+1} = arg min_x ∑_{i=1}^N ⟨∇fi(x_i^r), xi⟩ + ⟨µ^r, Ax − b⟩ + (β/2) xᵀL−x + (β/2)(x − x^r)ᵀL+(x − x^r)   [last term: proximal]
          = arg min_x ∑_{i=1}^N ⟨∇fi(x_i^r), xi⟩ + ⟨µ^r, Ax − b⟩ + (β/2) xᵀ(L− + L+)x − βxᵀL+x^r
          = arg min_x ∑_{i=1}^N ⟨∇fi(x_i^r), xi⟩ + ⟨µ^r, Ax − b⟩ − βxᵀL+x^r   [linear in x]   + βxᵀDx,

where D = diag[d1, · · · , dN] ∈ R^{N×N} is the degree matrix (using L− + L+ = 2D). The problem is separable over the nodes, and strongly convex.
SLIDE 29

The analysis steps

SLIDE 30

Main assumptions

A1. f(x) is differentiable and has Lipschitz continuous gradient:

  ‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖, ∀ x, y ∈ R^N.

Further assume that AᵀA + BᵀB ⪰ I_N.

A2. There exists a constant δ > 0 such that

  there exists f_min > −∞ with f(x) + (δ/2)‖Ax − b‖² ≥ f_min, ∀ x ∈ R^N.

A3. The constraint Ax = b is feasible over x ∈ R^N.
SLIDE 31

Functions satisfying the assumptions

The sigmoid function: sigmoid(x) = 1/(1 + e^{−x}) ∈ (0, 1), so it is bounded.
The arctan function: arctan(x) ∈ (−π/2, π/2), so [A2] holds; arctan′(x) = 1/(x² + 1) ∈ (0, 1] is bounded, which implies [A1].
The tanh function: tanh(x) ∈ (−1, 1) and tanh′(x) = 1 − tanh(x)² ∈ (0, 1].
The logistic function is related to tanh by 2·logistic(x) = 2e^x/(e^x + 1) = 1 + tanh(x/2).
The quadratic function xᵀQx, with Q symmetric but not necessarily positive semidefinite, provided xᵀQx is strongly convex on the null space of AᵀA.
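The identities above are easy to verify numerically; a small illustrative check (the sample points are arbitrary):

```python
# Quick numerical check (illustrative) of the identities on this slide:
# 2*logistic(x) = 1 + tanh(x/2), tanh'(x) = 1 - tanh(x)^2, and logistic(x) ∈ (0, 1).
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

for x in [-3.0, -0.5, 0.0, 1.2, 4.0]:
    assert abs(2 * logistic(x) - (1 + math.tanh(x / 2))) < 1e-12
    h = 1e-6                                   # finite-difference check of tanh'
    fd = (math.tanh(x + h) - math.tanh(x - h)) / (2 * h)
    assert abs(fd - (1 - math.tanh(x) ** 2)) < 1e-6
    assert 0.0 < logistic(x) < 1.0             # bounded, as needed for [A2]-type bounds

print("identities verified")
```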

SLIDE 32

Optimality Conditions

The first- and second-order necessary conditions for a local minimum are

  ∇f(x∗) + Aᵀµ∗ = 0,  Ax∗ = b,  (2a)
  ⟨y, ∇²f(x∗)y⟩ ≥ 0, ∀ y ∈ {y | Ay = 0}.  (2b)

The second-order necessary condition is equivalent to ∇²f(x∗) being positive semi-definite on the null space of A. A sufficient condition for a strict/strong local minimizer is

  ∇f(x∗) + Aᵀµ∗ = 0,  Ax∗ = b,  ⟨y, ∇²f(x∗)y⟩ > 0, ∀ y ≠ 0 with Ay = 0.  (3)
SLIDE 33

Optimality Conditions

Define a strict saddle point as a solution (x∗, µ∗) such that

  ∇f(x∗) + Aᵀµ∗ = 0, Ax∗ = b, and there exist y with Ay = 0 and σ > 0 such that ⟨y, ∇²f(x∗)y⟩ ≤ −σ‖y‖²,  (4)

i.e., ∇²f(x∗) has a strictly negative “eigenvalue” on the null space of A. Issues related to strict saddles have been raised recently in the ML community; see the recent works [Ge et al 15] [Sun-Qu-Wright 15]. GD-type algorithms that avoid them have been developed, but mostly in the unconstrained, smooth setting [Lee et al 16] [Jin et al 17].

Question. Does Prox-PDA converge to strict saddles, or to 2nd-order stationary solutions?
SLIDE 34

The Analysis: Step 1

Our first step bounds the descent of the augmented Lagrangian.

Observation. The dual variable satisfies

  Aᵀµ^{r+1} = −∇f(x^r) − βBᵀB(x^{r+1} − x^r),

so the change of the dual can be bounded by the change of the primal.
SLIDE 35

The Analysis: Step 1

Let σmin(AᵀA) denote the smallest non-zero eigenvalue of AᵀA.

Lemma. Suppose Assumptions [A1] and [A3] are satisfied. Then

  Lβ(x^{r+1}, µ^{r+1}) − Lβ(x^r, µ^r) ≤ −( (β − L)/2 − 2L²/(βσmin(AᵀA)) )‖x^{r+1} − x^r‖²
    + ( 2β‖BᵀB‖/σmin(AᵀA) )‖(x^{r+1} − x^r) − (x^r − x^{r−1})‖²_{BᵀB}.
SLIDE 36

Comments

The RHS cannot be made negative: the AL alone does not descend. We need a new object that decreases on the order of β‖(x^{r+1} − x^r) − (x^r − x^{r−1})‖²_{BᵀB}. The change of the sum of the constraint violation ‖Ax^{r+1} − b‖² and the proximal term ‖x^{r+1} − x^r‖²_{BᵀB} contains the desired term.
SLIDE 37

The Analysis: Step 2

Lemma. Suppose Assumption [A1] is satisfied. Then

  (β/2)( ‖Ax^{r+1} − b‖² + ‖x^{r+1} − x^r‖²_{BᵀB} )
    ≤ (β/2)( ‖Ax^r − b‖² + ‖x^r − x^{r−1}‖²_{BᵀB} ) + L‖x^{r+1} − x^r‖²
      − (β/2)( ‖(x^r − x^{r−1}) − (x^{r+1} − x^r)‖²_{BᵀB} + ‖A(x^{r+1} − x^r)‖² ).

Observation. The new object, (β/2)( ‖Ax^{r+1} − b‖² + ‖x^{r+1} − x^r‖²_{BᵀB} ), increases with ‖x^{r+1} − x^r‖² and decreases with ‖(x^r − x^{r−1}) − (x^{r+1} − x^r)‖²_{BᵀB}. The change of the AL behaves in the opposite manner. Good news: a conic combination of the two decreases at every iteration.

SLIDE 40

Step 3: Constructing the potential function

Define the potential function for Algorithm 1 as

  P_{c,β}(x^{r+1}, x^r, µ^{r+1}) = Lβ(x^{r+1}, µ^{r+1}) + (cβ/2)( ‖Ax^{r+1} − b‖² + ‖x^{r+1} − x^r‖²_{BᵀB} ),

where c > 0 is a constant to be determined later.

Lemma. Suppose the assumptions made in Lemma 2 are satisfied. Then

  P_{c,β}(x^{r+1}, x^r, µ^{r+1}) ≤ P_{c,β}(x^r, x^{r−1}, µ^r)
    − ( (β − L)/2 − 2L²/(βσmin(AᵀA)) − cL )‖x^{r+1} − x^r‖²
    − ( cβ/2 − 2β‖BᵀB‖/σmin(AᵀA) )‖(x^{r+1} − x^r) − (x^r − x^{r−1})‖²_{BᵀB}.
SLIDE 41

The choice of parameters

As long as c and β are chosen appropriately, the function P_{c,β} decreases at each iteration of Prox-PDA. The following choices of parameters are sufficient for ensuring descent:

  c ≥ max{ δ/L, 4‖BᵀB‖/σmin(AᵀA) },  (5)

  β > (L/2)( (2c + 1) + √( (2c + 1)² + 16/σmin(AᵀA) ) ).  (6)

Condition (6) makes the coefficient of ‖x^{r+1} − x^r‖² in the previous lemma positive.
SLIDE 42

Step 4: main result

Now we are ready to present the main result. Define the ‘stationarity gap’ of problem (P) as

  Q(x^{r+1}, µ^r) := ‖∇_x Lβ(x^{r+1}, µ^r)‖²  [primal gap]  + ‖Ax^{r+1} − b‖²  [dual gap].

Q(x^{r+1}, µ^r) → 0 implies that any limit point (x∗, µ∗) is a first-order solution of (P):

  0 = ∇f(x∗) + Aᵀµ∗,  Ax∗ = b.
SLIDE 43

The main result

Claim (H. 16). Suppose Assumption A is satisfied, and the conditions on β and c in (5) and (6) hold. Then:

1. (Eventual Feasibility) The constraint is satisfied in the limit, i.e.,

  lim_{r→∞} ‖µ^{r+1} − µ^r‖ = 0,  lim_{r→∞} Ax^r = b,  and  lim_{r→∞} ‖x^{r+1} − x^r‖ = 0.

2. (Convergence to KKT) Every limit point of {x^r, µ^r} is a KKT point of problem (P). Further, Q(x^{r+1}, µ^r) → 0.

3. (Sublinear Convergence Rate) For any given ϕ > 0, define T as the first iteration at which the optimality gap falls below ϕ, i.e.,

  T := arg min_r { Q(x^{r+1}, µ^r) ≤ ϕ }.

Then there exists a constant ν > 0 such that ϕ ≤ ν/(T − 1).
SLIDE 44

Extension: Increasing the proximal parameter

The previous algorithm requires explicitly computing the bound on β, which needs global information. Alternatives?

Algorithm 2. Prox-PDA with Increasing Proximal parameter (Prox-PDA-IP)
At iteration 0, initialize µ^0 and x^0 ∈ R^N. At each iteration r + 1, update the variables by

  x^{r+1} = arg min_{x∈R^N} ⟨∇f(x^r), x − x^r⟩ + ⟨µ^r, Ax − b⟩ + (β^{r+1}/2)‖Ax − b‖² + (β^{r+1}/2)‖x − x^r‖²_{BᵀB};
  µ^{r+1} = µ^r + β^{r+1}(Ax^{r+1} − b).
SLIDE 45

Extension: Increasing the proximal parameter

The primal step is similar to classic GD with diminishing primal stepsize 1/β^r [Bertsekas-Tsitsiklis 96]. The sequence β^r should satisfy

  1/β^r → 0,  ∑_{r=1}^∞ 1/β^r = ∞.

The proof requires constructing a new potential function:

  L_{β^{r+1}}(x^{r+1}, µ^{r+1}) + (cβ^{r+1}β^r/2)‖Ax^{r+1} − b‖² + (cβ^{r+1}β^r/2)‖x^r − x^{r+1}‖²_{BᵀB}.

Convergence is similar to parts (1)-(2) of Claim 1; the rate (for a randomized version) is E[Q(x^T, µ^T)] ∈ O(T^{−1/3}).
SLIDE 46

Second order stationary solutions?

So far we have focused on convergence (and rates) to first-order solutions. Can Prox-PDA get stuck at strict saddle points? We can show that with probability one this will not happen.

Claim (H.-Razaviyayn-Lee 17). Under the same assumptions as in the previous claim, and further supposing that (x^0, µ^0) is initialized randomly, with probability one the iterates {(x^{r+1}, µ^{r+1})} generated by the Prox-PDA algorithm converge to a second-order stationary solution satisfying (2b).
SLIDE 47

Proof steps

First represent the iterates using a linear system:

  [x^{r+1}; x^r] = [[2I − (1/β)H − 2AᵀA − (1/β)∆^r,  −I + (1/β)H + AᵀA + (1/β)∆^{r−1}], [I, 0]] · [x^r; x^{r−1}] + [Aᵀb + (1/β)(∆^r − ∆^{r−1})x∗; 0],

where H := ∇²f(x∗), d^{r+1} := x^{r+1} − x∗, and

  ∆^{r+1} := ∫₀¹ ( ∇²f(x∗ + t·d^{r+1}) − H ) dt.

Then show that the above mapping is a diffeomorphism, and apply the Stable Manifold Theorem to argue that a strict saddle point is not stable [Shub 87].
SLIDE 48

Generalize to (P)?

SLIDE 49

Generalize to (P)?

Can we generalize Prox-PDA to the following problem?

  min f(x) + h(x)   s.t. Ax = b, x ∈ X   (P)

With the following assumptions:
B1. h(x) = g0(x) + h0(x) is a non-convex regularizer, where g0 is smooth non-convex and h0 is nonsmooth convex (such as the MCP/SCAD regularizer)
B2. X is a closed, compact, convex set
SLIDE 50

An example

Consider the following problem (adapted from [Wang-Yin-Zeng 16]):

  min x² − y²   s.t. x = y, x ∈ [−1, 1], y ∈ [−2, 0]

Any point with x = y ∈ [−1, 0] is optimal. Apply Prox-PDA (with x^0 = 1, y^0 = µ^0 = 0, β = 5).

[Figure: x- and y-iterates, and primal/dual gaps (log scale), over 20 iterations]
SLIDE 51

Generalization to (P)?

What went wrong? One can no longer establish the relationship Aᵀµ^{r+1} = −∇f(x^r) − βBᵀB(x^{r+1} − x^r), so the change of the dual can no longer be bounded by the change of the primal. How to proceed?

SLIDE 54

Adding perturbation

The key idea is to perturb the primal-dual iteration. Perturb the dual update:

  µ^{r+1} = µ^r + ρ^{r+1}( Ax^{r+1} − b − γ^{r+1}µ^r ).

Perturb the primal by multiplying the term ⟨µ^r, Ax − b⟩ by (1 − ρ^{r+1}γ^{r+1}). Gradually reduce the size of the perturbation constant γ. Note: perturbing dual ascent-type methods has been considered for convex problems [Koshal-Nedić-Shanbhag 11], but without perturbing the primal.
SLIDE 55

The Perturbed Prox-PDA

Algorithm 3. The Perturbed Prox-PDA (P-Prox-PDA)
At iteration 0, initialize µ^0 and x^0 ∈ R^N. At each iteration r + 1, update the variables by

  x^{r+1} = arg min_{x∈X} ⟨∇f(x^r), x − x^r⟩ + (1 − ρ^{r+1}γ^{r+1})⟨µ^r, Ax − b⟩ + h(x) + (ρ^{r+1}/2)‖Ax − b‖² + (β^{r+1}/2)‖x − x^r‖²_{BᵀB};  (7a)
  µ^{r+1} = µ^r + ρ^{r+1}( Ax^{r+1} − b − γ^{r+1}µ^r ).  (7b)

Intuition. Adding the dual perturbation contributes the descent term −ρ^{r+1}γ^{r+1}‖µ^{r+1} − µ^r‖².
SLIDE 56

Conditions on the sequences

We need the following conditions on the penalty parameter:

  1/ρ^r → 0,  ∑_{r=1}^∞ 1/ρ^r = ∞,  ∑_{r=1}^∞ 1/(ρ^r)² < ∞.

We need the following condition on the perturbation:

  ρ^{r+1}γ^{r+1} = τ ∈ (0, 1) for some constant τ.

This implies that the perturbation on the “dual gradient” goes to zero.
SLIDE 57

Outline of convergence result for P-Prox-PDA

Suppose Assumptions A and B are satisfied, and the conditions on {ρ^r, β^r} and {γ^r} given above hold. Then

  lim_{r→∞} ‖µ^{r+1} − µ^r‖ = 0,  lim_{r→∞} Ax^r = b,  and  lim_{r→∞} ‖x^{r+1} − x^r‖ = 0.

Every limit point of {x^r, µ^r} is a first-order stationary point of (P) [Hong-Hajinezhad 17]. A randomized version of the algorithm converges with rate E[Q(x^T, µ^T)] ∈ O(T^{−1/3}).
SLIDE 58

Remarks

In our perturbation scheme, increasing penalty parameters and proximal terms are used together with a decreasing dual gradient perturbation.

Question. Will the algorithm work if all parameters are kept constant?

Yes: it converges to an ǫ-stationary solution. In particular, for fixed (ρ, β) we need to choose ργ = O(1) and γ = O(ǫ).

Definition (ǫ-stationary solution). A solution (x∗, λ∗) is called an ǫ-stationary solution if

  ‖Ax∗ − b‖² ≤ ǫ  and  ⟨∇f(x∗) + Aᵀλ∗ + ξ∗, x∗ − x⟩ ≤ 0, ∀ x ∈ X,  (8)

where ξ∗ ∈ ∂h(x∗).
SLIDE 59

Applications

SLIDE 60

A toy example

Apply the perturbed version of Prox-PDA to the example

  min x² − y²   s.t. x = y, x ∈ [−1, 1], y ∈ [−2, 0]

with ρ^r = r, γ^r = 0.001/ρ^r, β = 5.

[Figure: x- and y-iterates, and primal/dual gaps (log scale), over 20 iterations]
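The toy run above can be reproduced with a small sketch of Algorithm 3. The slide's parameters (ρ^r = r, γ^r = 0.001/ρ^r, so τ = ρ^rγ^r = 0.001, and β = 5) are used; the choice B = |A| (making the proximal term (β/2)((x − x^r) + (y − y^r))²) and the short inner projected-gradient loop used to approximately solve the strongly convex subproblem (7a) are implementation assumptions, not part of the slides.

```python
# Hedged sketch of P-Prox-PDA (Algorithm 3) on the toy problem
# min x^2 - y^2 s.t. x = y, x in [-1,1], y in [-2,0].
# Subproblem (7a) is solved approximately by inner projected gradient steps.

def clip(v, lo, hi):
    return max(lo, min(hi, v))

beta, tau = 5.0, 0.001
x, y, mu = 1.0, 0.0, 0.0

for r in range(1, 201):
    rho = float(r)                         # rho^r = r, gamma^r = tau / rho
    gx, gy = 2.0 * x, -2.0 * y             # gradient of f at (x^r, y^r)
    xr, yr = x, y
    step = 1.0 / (2.0 * max(rho, beta))    # 1 / Lipschitz constant of the subproblem
    u, v = x, y
    for _ in range(600):                   # inner projected-gradient loop for (7a)
        du = gx + (1 - tau) * mu + rho * (u - v) + beta * ((u - xr) + (v - yr))
        dv = gy - (1 - tau) * mu - rho * (u - v) + beta * ((u - xr) + (v - yr))
        u = clip(u - step * du, -1.0, 1.0)
        v = clip(v - step * dv, -2.0, 0.0)
    x, y = u, v
    mu = mu + rho * ((x - y) - (tau / rho) * mu)   # perturbed dual step (7b)

print(x, y, abs(x - y))   # x ≈ y, both near the optimal set [-1, 0]
```

Unlike the unperturbed run on the earlier slide, the constraint violation |x − y| is driven to (near) zero here.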

SLIDE 61

Application to distributed non-convex optimization

Apply Prox-PDA-type methods to distributed non-convex optimization:

  min_{{xi}} ∑_{i=1}^N fi(xi)   s.t. Ax = 0.

Here A is the incidence matrix and B = |A|. This provides explicit update rules for each distributed node [H. 16].
SLIDE 62

Application to distributed non-convex optimization

The system update rule is

  x^{r+1} = x^r − (1/(2β)) D^{−1}( ∇f(x^r) − ∇f(x^{r−1}) ) + Wx^r − (1/2)(I + W)x^{r−1},

where the weight matrix W := (1/2) D^{−1}(L+ − L−) is row-stochastic.

Each agent i updates by

  x_i^{r+1} = x_i^r − (1/(2βd_i))( ∇fi(x_i^r) − ∇fi(x_i^{r−1}) ) + ∑_{j∈N(i)} x_j^r / d_i − (1/2)( ∑_{j∈N(i)} x_j^{r−1} / d_i + x_i^{r−1} ).

Completely decoupled; each new update is based only on the most recent two iterates.
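A runnable sketch of this per-agent recursion follows, on a 3-node path graph with convex quadratics fi(v) = ½(v − ci)² chosen so the answer is checkable (the consensus optimum is mean(c)); the data c, β = 5, the warm-start step, and the iteration count are illustrative assumptions, not from the talk.

```python
# Sketch of the per-agent recursion above on the path 1-2-3 with
# f_i(v) = 0.5*(v - c_i)^2, so the consensus optimum is mean(c) = 2.

c = [1.0, 2.0, 3.0]
nbrs = [[1], [0, 2], [1]]       # neighbor lists
deg = [1, 2, 1]                 # node degrees d_i
beta = 5.0
n = len(c)

def grad(i, v):                 # ∇f_i(v) = v - c_i
    return v - c[i]

x_prev = [1.0, 0.0, -1.0]
# Warm start: one plain Prox-PDA step with mu^0 = 0,
# i.e. x^1 = (2*beta*D)^{-1} (beta*L+ x^0 - ∇f(x^0)), using (L+ x)_i = d_i x_i + sum_j x_j.
x_cur = [(beta * (deg[i] * x_prev[i] + sum(x_prev[j] for j in nbrs[i]))
          - grad(i, x_prev[i])) / (2.0 * beta * deg[i]) for i in range(n)]

for _ in range(5000):
    x_next = [0.0] * n
    for i in range(n):
        avg_cur = sum(x_cur[j] for j in nbrs[i]) / deg[i]     # sum_j x_j^r / d_i
        avg_prev = sum(x_prev[j] for j in nbrs[i]) / deg[i]   # sum_j x_j^{r-1} / d_i
        x_next[i] = (x_cur[i]
                     - (grad(i, x_cur[i]) - grad(i, x_prev[i])) / (2.0 * beta * deg[i])
                     + avg_cur
                     - 0.5 * (avg_prev + x_prev[i]))
    x_prev, x_cur = x_cur, x_next

print(x_cur)  # ≈ [2.0, 2.0, 2.0]: all agents agree on mean(c)
```

Each agent touches only its own gradient and its neighbors' two most recent iterates, which is the decoupling the slide emphasizes.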

SLIDE 63

Application to distributed non-convex optimization

Interestingly, this iteration has the same form as EXTRA [Shi et al 14], developed for the convex consensus problem. The same observation was also made in [Mokhtari-Ribeiro 16] (in the convex case). By appealing to our analysis, EXTRA works for the non-convex distributed optimization problem as well (with appropriate β): it converges (at a sublinear rate) to both 1st- and 2nd-order stationary solutions, with different proof techniques. Other variants of Prox-PDA can also be specialized to this case.
SLIDE 64

Numerical result for distributed non-convex optimization

We consider a distributed non-negative PCA problem

  min_z ∑_{i=1}^N −zᵀDᵢᵀDᵢz + h(z)   s.t. ‖z‖₂² ≤ 1, z ≥ 0,

where h(z) is the MCP regularizer. Divide the agents randomly into three sets S1, S2, S3, and consider the following reformulation:

  min_x ∑_{i=1}^N −xᵢᵀDᵢᵀDᵢxᵢ + (1/|S1|) ∑_{i∈S1} h(xᵢ)
  s.t. ‖xᵢ‖₂² ≤ 1 for i ∈ S2,  xᵢ ≥ 0 for i ∈ S3,  Ax = 0 (the consensus constraint).
SLIDE 65

Numerical result for distributed non-convex optimization

Compare with the DSG algorithm proposed in [Nedić-Ozdaglar-Parrilo 10]. DSG is designed for convex problems with per-agent local constraints. We generate the network according to [Yildiz-Scaglione 08].
SLIDE 66

Numerical result for distributed non-convex optimization

Compare average performance over 100 random network generations. Both algorithms stop at 2000 iterations.

Table: Comparison of perturbed Prox-PDA and DSG

  N  | Stat-Gap: P-Prox-PDA | Stat-Gap: DSG | Cons-Vio: P-Prox-PDA | Cons-Vio: DSG
  5  | 2.1e−19              | 0.1           | 1.4e−18              | 4.5e−5
  10 | 1.4e−19              | 0.48          | 1.1e−18              | 4.5e−5
  20 | 6.7e−18              | 0.05          | 2.7e−16              | 1.7e−4
  40 | 2.19e−13             | 0.02          | 3.1e−15              | 6.9e−4
SLIDE 67

Application to sparse subspace estimation

We consider the following sparse subspace estimation problem (with MCP regularizer) [Gu et al 14]:

  Π̂ = arg min_{Π,Y} −⟨Σ̂, Π⟩ + Pα(Y)   s.t. 0 ⪯ Π ⪯ I, Tr(Π) = k (the Fantope set), Π − Y = 0,

where Pα is chosen to be the MCP. For P-Prox-PDA, choose X := [Y; Π],

  AᵀA = [[I, −I], [−I, I]],   BᵀB = [[I, I], [I, I]],

and ρ^r = r, γ^r = 10^{−3}/r.
SLIDE 68

Application to sparse subspace estimation

Experiment setup following [Gu et al 14]¹:
1. Construct Σ by eigen-decomposition
2. First data set: s = 5, k = 1, λ1 = 100; λk = 1 for all k ≠ 1
3. u1 has 5 non-zero entries, each with magnitude 1/√5
4. Second data set: s = 10, k = 5; λk = 100 for k = 1, · · · , 4, and λ5 = 10
5. Eigenvectors are generated by orthonormalizing 10-sparse Gaussian vectors
6. SCAD regularizer, b = 3

¹We would like to thank Q. Gu and Z. Wang for providing the code.
SLIDE 69

Application to sparse subspace estimation

We show one realization of P-Prox-PDA and the algorithm in [Gu et al 14], for the scenario n = 80, p = 128, k = 1, s = 5.

[Figure: stationarity gap and constraint violation versus iteration number, for P-Prox-PDA (proposed) and [Gu et al 14] with ρ = 5 and ρ = 2]
SLIDE 70

Application to sparse subspace estimation

Compare the recovery error

SLIDE 71

Application to sparse subspace estimation

Compare the averaged performance of different algorithms. Generate 100 true covariance matrices Σ; for each Σ, generate 100 samples.

Table: Subspace Estimation Error ‖Π̂ − Π∗‖

  Parameters                       | PPD         | [Gu et al 14]
  n = 80, p = 128, k = 1, s = 5    | 0.031 ± 0.01 | 0.033 ± 0.01
  n = 150, p = 200, k = 1, s = 5   | 0.022 ± 0.07 | 0.025 ± 0.08
  n = 80, p = 128, k = 1, s = 10   | 0.047 ± 0.01 | 0.063 ± 0.01
  n = 80, p = 128, k = 5, s = 10   | 0.24 ± 0.05  | 0.31 ± 0.02
  n = 70, p = 128, k = 5, s = 10   | 0.23 ± 0.03  | 0.33 ± 0.03
  n = 128, p = 128, k = 5, s = 10  | 0.17 ± 0.02  | 0.25 ± 0.02
SLIDE 72

Application to sparse subspace estimation

Compare the support recovery performance, using the True Positive Rate (TPR) and False Positive Rate (FPR).

Table: Support Recovery Results

  Parameters                       | TPR: PPD | TPR: [Gu 14] | FPR: PPD    | FPR: [Gu 14]
  n = 80, p = 128, k = 1, s = 5    | 1 ± 0    | 1 ± 0        | 0 ± 0       | 0 ± 0
  n = 150, p = 200, k = 1, s = 5   | 1 ± 0    | 1 ± 0        | 0 ± 0       | 0 ± 0
  n = 80, p = 128, k = 1, s = 10   | 1 ± 0    | 1 ± 0        | 0 ± 0       | 0 ± 0
  n = 80, p = 128, k = 5, s = 10   | 1 ± 0    | 1 ± 0        | 0.53 ± 0.03 | 0.56 ± 0.04
  n = 70, p = 128, k = 5, s = 10   | 1 ± 0    | 1 ± 0        | 0.57 ± 0.01 | 0.59 ± 0.02
  n = 128, p = 128, k = 5, s = 10  | 1 ± 0    | 1 ± 0        | 0.53 ± 0.05 | 0.54 ± 0.01
SLIDE 73

Conclusion

In this work we considered solving the following non-convex problem:

  min f(x) + h(x)   s.t. Ax = b, x ∈ X   (P)

A number of primal-dual based algorithms were developed. For smooth problems: convergence to first- and second-order stationary solutions, with global rates. For nonsmooth problems: a primal-dual perturbation scheme. A compact representation for the distributed consensus problem.
SLIDE 74

Future Works

What about 2nd-order stationarity for non-smooth, constrained problems? Preliminary results are reported in [Chang-H.-Pang 17], using (single-sided) second-order directional derivatives for the characterization. The resulting conditions are much more complicated than in the unconstrained or linearly constrained case, and checking them could be NP-hard. Efficient algorithms?
Stochasticity: what if the objective/gradient is only known through a noisy first-/zeroth-order oracle?
More applications: Mumford-Shah regularization for image processing (e.g., inpainting) [Möllenhoff et al 14]; topic modeling [Fu et al 16]; etc.
SLIDE 75

Thank You!

SLIDE 76

The randomized algorithm

Let B ∈ R^{M×N} be an arbitrary matrix to be chosen later.

Algorithm (randomized Prox-PDA). At iteration 0, initialize µ^0 and x^0 ∈ R^N, and fix T. For r = 1, · · · , T:

  x^{r+1} = arg min_{x∈R^N} ⟨∇f(x^r), x − x^r⟩ + ⟨µ^r, Ax − b⟩ + (β/2)‖Ax − b‖² + (β/2)‖x − x^r‖²_{BᵀB};  (9a)
  µ^{r+1} = µ^r + β(Ax^{r+1} − b).  (9b)

Output (x^t, µ^t), where t is drawn uniformly at random from {1, 2, · · · , T}.