On the O(1/k) Convergence of Asynchronous Distributed Alternating Direction Method of Multipliers (ADMM)


  1. On the O(1/k) Convergence of Asynchronous Distributed Alternating Direction Method of Multipliers (ADMM)
     Ermin Wei, Asu Ozdaglar
     Laboratory for Information and Decision Systems, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology
     Big Data Workshop, Simons Institute, Berkeley, CA, October 2013

  2. Introduction: Motivation
     Many networks are large-scale and comprise agents with local information and heterogeneous preferences. This has motivated much interest in developing distributed schemes for control and optimization of multi-agent networked systems.
     Examples: routing and congestion control in wireline and wireless networks; parameter estimation in sensor networks; multi-agent cooperative control and coordination; smart grid systems.

  3. Introduction: Distributed Multi-agent Optimization
     Many of these problems can be represented within the general formulation: a set of agents (nodes) {1, ..., N} connected through a network, whose goal is to cooperatively solve
         \min_{x \in \mathbb{R}^n} \sum_{i=1}^N f_i(x),
     where each f_i : \mathbb{R}^n \to \mathbb{R} is a convex (possibly nonsmooth) function known only to agent i.
     [Figure: a network in which each agent i holds a private objective f_i(x_1, ..., x_n).]
     Since such systems often lack a centralized processing unit, algorithms for this problem should involve each agent performing computations locally and communicating the results according to the underlying network.

  4. Introduction: Machine Learning Example
     A network of 3 sensors, supervised passive learning. Data is collected at the different sensors: temperature t and electricity demand d.
     System goal: learn a degree-3 polynomial electricity demand model
         d(t) = x_3 t^3 + x_2 t^2 + x_1 t + x_0.
     System objective:
         \min_x \sum_{i=1}^3 \| A_i' x - d_i \|_2^2,
     where A_i = [1, t_i, t_i^2, t_i^3]' at input data t_i.
     [Figure: least-squares fit of a degree-3 polynomial to electricity demand vs. temperature data.]
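A minimal sketch of this example in Python/NumPy, fitting the degree-3 polynomial by centralized least squares; the temperature and demand values below are made-up placeholders, not the data from the slide's plot:

    import numpy as np

    t = np.array([15.0, 40.0, 65.0, 90.0])   # temperatures t_i (hypothetical)
    d = np.array([28.0, 18.0, 14.0, 22.0])   # electricity demands d_i (hypothetical)

    # Row i is A_i' = [1, t_i, t_i^2, t_i^3].
    A = np.vander(t, N=4, increasing=True)

    # Solve min_x sum_i ||A_i' x - d_i||_2^2 in one shot: the centralized
    # baseline that the distributed methods on later slides aim to match.
    x, *_ = np.linalg.lstsq(A, d, rcond=None)
    print("fitted coefficients [x0, x1, x2, x3]:", x)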

  5. Introduction: Machine Learning General Set-up
     A network of agents i = 1, ..., N. Each agent i has access to local feature vectors A_i and output b_i.
     System objective: train a weight vector x to
         \min_x \sum_{i=1}^N L(A_i' x - b_i) + p(x),
     for some loss function L (on the prediction error) and penalty function p (on the complexity of the model).
     Example: Least-Absolute Shrinkage and Selection Operator (LASSO):
         \min_x \sum_{i=1}^N \| A_i' x - b_i \|_2^2 + \lambda \| x \|_1.
     Other examples arise in ML estimation, low-rank matrix completion, and image recovery [Schizas, Ribeiro, Giannakis 08], [Recht, Fazel, Parrilo 10], [Steidl, Teuber 10].
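As a concrete reading of the notation, a small sketch that evaluates the LASSO objective with the data held per agent; the function name and the shape convention for A_i (features by samples) are assumptions for illustration:

    import numpy as np

    def lasso_objective(x, A_list, b_list, lam):
        # sum_i ||A_i' x - b_i||_2^2 + lam * ||x||_1, one (A_i, b_i) pair per agent
        loss = sum(np.sum((A.T @ x - b) ** 2) for A, b in zip(A_list, b_list))
        return loss + lam * np.sum(np.abs(x))

Each agent can evaluate its own loss term locally, but every term depends on the shared weight vector x, which is why the later slides introduce local copies x_i tied together by consensus constraints.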

  6. Introduction: Existing Distributed Algorithms
     Given an undirected connected graph G = {V, E} with M nodes, we reformulate the problem as
         \min_x \sum_{i=1}^M f_i(x_i)
         s.t. x_i = x_j  for (i, j) \in E.
     [Figure: a 5-node example graph; each edge (i, j) carries the constraint x_i = x_j.]
     Distributed gradient/subgradient methods for solving these problems: each agent maintains a local estimate and updates it by taking a (sub)gradient step and averaging with its neighbors' estimates.
     Best known convergence rate: O(1/\sqrt{k}) [Nedic, Ozdaglar 08], [Lobel, Ozdaglar 09], [Duchi, Agarwal, Wainwright 12].
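A minimal sketch of one round of such a method; the doubly stochastic mixing matrix W (encoding neighbor averaging on the graph) and the per-agent gradient oracles are hypothetical inputs:

    import numpy as np

    def subgradient_round(X, W, grads, step):
        # X[i] is agent i's current estimate (one row per agent).
        # Each agent averages with its neighbors via W, then takes a local
        # (sub)gradient step; step sizes ~ 1/sqrt(k) give the O(1/sqrt(k)) rate.
        mixed = W @ X
        G = np.stack([grads[i](X[i]) for i in range(X.shape[0])])
        return mixed - step * G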

  7. Distributed ADMM Algorithms: Faster ADMM-based Distributed Algorithms
     Classical Augmented Lagrangian/Method of Multipliers and Alternating Direction Method of Multipliers (ADMM) methods are fast and parallel [Glowinski, Marrocco 75], [Eckstein, Bertsekas 92], [Boyd et al. 10].
     Known convergence rates for synchronous ADMM-type algorithms:
       [He, Yuan 11]: general convex, O(1/k).
       [Goldfarb et al. 10]: Lipschitz gradient, O(1/k^2).
       [Deng, Yin 12]: Lipschitz gradient and strong convexity, linear rate.
       [Hong, Luo 12]: strong convexity, linear rate.
     The highly decentralized nature of the problem calls for an asynchronous algorithm, yet almost all known distributed algorithms are synchronous. (Exceptions: [Ram, Nedic, Veeravalli 09], [Iutzeler, Bianchi, Ciblat, Hachem 13], without any rate results.)
     In this talk, we present an asynchronous ADMM-type algorithm for general convex problems and show that it converges at the best known rate of O(1/k) [Wei, Ozdaglar 13].

  8. Distributed ADMM Algorithms: Standard ADMM
     Standard ADMM solves a separable problem, where the decision variable decomposes into two (linearly coupled) variables:
         \min_{x, y} f(x) + g(y)                                  (1)
         s.t. Ax + By = c.
     Consider the augmented Lagrangian function:
         L_\beta(x, y, p) = f(x) + g(y) - p'(Ax + By - c) + (\beta/2) \| Ax + By - c \|_2^2.
     ADMM is an approximate version of the classical Augmented Lagrangian method:
       Primal variables: approximately minimize the augmented Lagrangian through a single pass of coordinate descent (in a Gauss-Seidel manner).
       Dual variable: updated through gradient ascent.
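As a quick check of the notation, a sketch that evaluates L_beta for given callables f, g and problem data A, B, c; all names here are placeholders, not part of the talk:

    import numpy as np

    def aug_lagrangian(f, g, A, B, c, beta, x, y, p):
        r = A @ x + B @ y - c                  # constraint residual Ax + By - c
        return f(x) + g(y) - p @ r + 0.5 * beta * (r @ r)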

  9. Distributed ADMM Algorithms: Standard ADMM
     More specifically, the updates are as follows:
         x^{k+1} = \argmin_x L_\beta(x, y^k, p^k),
         y^{k+1} = \argmin_y L_\beta(x^{k+1}, y, p^k),
         p^{k+1} = p^k - \beta (A x^{k+1} + B y^{k+1} - c).
     Each minimization involves (quadratic perturbations of) the functions f and g separately. In many applications these minimizations are easy: quadratic minimization, or l_1 minimization, which arises in Huber fitting, basis pursuit, LASSO, and total variation denoising [Boyd et al. 10].
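A minimal sketch instantiating these three updates on LASSO in consensus form, f(x) = ||Mx - b||_2^2 and g(y) = lam * ||y||_1 with constraint x - y = 0 (so A = I, B = -I, c = 0); M, b, lam, beta are hypothetical inputs, and the closed forms below follow from the sign convention of L_beta above:

    import numpy as np

    def admm_lasso(M, b, lam, beta, iters=100):
        n = M.shape[1]
        x = y = p = np.zeros(n)
        Q = np.linalg.inv(2 * M.T @ M + beta * np.eye(n))  # factor once, reuse
        for _ in range(iters):
            # x-update: argmin_x L_beta(x, y, p) is a quadratic minimization
            x = Q @ (2 * M.T @ b + p + beta * y)
            # y-update: argmin_y L_beta(x, y, p) is an l1 proximal step
            v = x - p / beta
            y = np.sign(v) * np.maximum(np.abs(v) - lam / beta, 0.0)
            # dual update: gradient ascent on p
            p = p - beta * (x - y)
        return y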

  10. Distributed ADMM Algorithms: ADMM for the Multi-agent Optimization Problem
      The multi-agent optimization problem can be reformulated in the ADMM framework. Consider a set of agents V = {1, ..., N} connected through an undirected connected graph G = {V, E}. We introduce a local copy x_i for each agent and impose x_i = x_j for all (i, j) \in E:
          \min_x \sum_{i=1}^N f_i(x_i)
          s.t. x_i = x_j  for (i, j) \in E.
      [Figure: the 5-node example graph with local objectives f_i(x_i) and edge constraints x_i = x_j.]

  11. Distributed ADMM Algorithms: Special Case Study: 2-agent Optimization Problem
      The multi-agent optimization problem with two agents is a special case of problem (1):
          \min_{x_1, x_2} f_1(x_1) + f_2(x_2)
          s.t. x_1 = x_2.
      ADMM applied to this problem yields (agent 1 updates first):
          x_1^{k+1} = \argmin_{x_1} f_1(x_1) + f_2(x_2^k) - (p_{12}^k)'(x_1 - x_2^k) + (\beta/2) \| x_1 - x_2^k \|_2^2.

  12. (cont.) Dropping the terms that do not depend on x_1, the update simplifies to
          x_1^{k+1} = \argmin_{x_1} f_1(x_1) - (p_{12}^k)' x_1 + (\beta/2) \| x_1 - x_2^k \|_2^2.

  13. (cont.) Agent 2 then updates using the fresh iterate x_1^{k+1}:
          x_2^{k+1} = \argmin_{x_2} f_1(x_1^{k+1}) + f_2(x_2) - (p_{12}^k)'(x_1^{k+1} - x_2) + (\beta/2) \| x_1^{k+1} - x_2 \|_2^2.

  14. (cont.) Again dropping the terms that do not depend on x_2:
          x_2^{k+1} = \argmin_{x_2} f_2(x_2) + (p_{12}^k)' x_2 + (\beta/2) \| x_1^{k+1} - x_2 \|_2^2.

  15. (cont.) Finally, the dual variable on the edge is updated:
          p_{12}^{k+1} = p_{12}^k - \beta (x_1^{k+1} - x_2^{k+1}).
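A minimal sketch of these updates with toy scalar quadratics f_1(x) = (x - a_1)^2 and f_2(x) = (x - a_2)^2, chosen here (not in the talk) so that both argmins have closed forms:

    def two_agent_admm(a1, a2, beta, iters=50):
        x1 = x2 = p12 = 0.0
        for _ in range(iters):
            # x1-update: argmin_x (x - a1)^2 - p12*x + (beta/2)(x - x2)^2
            x1 = (2 * a1 + p12 + beta * x2) / (2 + beta)
            # x2-update: argmin_x (x - a2)^2 + p12*x + (beta/2)(x1 - x)^2
            x2 = (2 * a2 - p12 + beta * x1) / (2 + beta)
            # dual update on the single edge
            p12 = p12 - beta * (x1 - x2)
        return x1, x2

    # Both iterates should approach the consensus minimizer (a1 + a2) / 2:
    print(two_agent_admm(a1=0.0, a2=4.0, beta=1.0))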

  16. Asynchronous ADMM: Multi-agent Asynchronous ADMM - Problem Formulation
          \min_x \sum_{i=1}^N f_i(x_i)
          s.t. x_i = x_j  for (i, j) \in E.
      Reformulate to decouple x_i and x_j by introducing the auxiliary variable z [Bertsekas, Tsitsiklis 89], which allows us to update the x_i simultaneously and potentially improves performance. Each constraint x_i - x_j = 0 for edge e = (i, j) becomes
          x_i = z_{ei},   -x_j = z_{ej},   z_{ei} + z_{ej} = 0.
      [Figure: the 5-node example graph with edge constraints x_i = x_j.]

  17. Asynchronous ADMM: Multi-agent Asynchronous ADMM - Algorithm
          \min_{x, z} \sum_{i=1}^N f_i(x_i)
          s.t. x_i = z_{ei}, -x_j = z_{ej}  for (i, j) \in E,
               x_i \in X_i,  i = 1, ..., N,   z \in Z.
      Set Z = {z | z_{ei} + z_{ej} = 0 for all e = (i, j)}. Write the constraints compactly as Dx = z, and let E(i) denote the set of edges incident to node i.
      We associate an independent Poisson local clock with each edge. At iteration k, if the clock corresponding to edge (i, j) ticks:
        The constraint x_i = z_{ei}, -x_j = z_{ej} (subject to z_{ei} + z_{ej} = 0) is active.
        Agents i and j are active.
        The dual variables p_{ei} and p_{ej} associated with edge (i, j) are active.
      [Figure: the example graph; the active edge's endpoints update their iterates (x_1^k and x_3^{k+1} shown).]
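A loose simulation sketch of the edge-activation pattern, again with toy quadratics f_i(x) = (x - a_i)^2; picking one edge uniformly at random per iteration stands in for independent Poisson clocks with equal rates, and the per-edge update mirrors the 2-agent closed forms above rather than the exact p_{ei}, p_{ej}, z bookkeeping analyzed in [Wei, Ozdaglar 13]:

    import random

    def async_admm(a, edges, beta, iters=2000):
        x = [0.0] * len(a)
        p = {e: 0.0 for e in edges}           # one dual variable per edge
        for _ in range(iters):
            e = random.choice(edges)           # this edge's clock ticks
            i, j = e
            xi, xj = x[i], x[j]
            # only agents i and j (and the duals on edge e) are active
            x[i] = (2 * a[i] + p[e] + beta * xj) / (2 + beta)
            x[j] = (2 * a[j] - p[e] + beta * xi) / (2 + beta)
            p[e] = p[e] - beta * (x[i] - x[j])
        return x

    # Example: a 3-node path graph; estimates should drift toward mean(a) = 2.0.
    print(async_admm(a=[1.0, 2.0, 3.0], edges=[(0, 1), (1, 2)], beta=1.0))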
