SLIDE 1

Alternating Direction Method of Multipliers

Prof S. Boyd

HYCON 2, Trento, 23/6/11

source: Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers (Boyd, Parikh, Chu, Peleato, Eckstein)

SLIDE 2

Goals

robust methods for arbitrary-scale optimization
– machine learning/statistics with huge data-sets
– dynamic optimization on large-scale network

decentralized optimization

– devices/processors/agents coordinate to solve large problem, by passing relatively small messages

SLIDE 3

Outline

Dual decomposition
Method of multipliers
Alternating direction method of multipliers
Common patterns
Examples
Consensus and exchange
Conclusions

SLIDE 4

Dual problem

convex equality constrained optimization problem

minimize f(x) subject to Ax = b

Lagrangian: L(x, y) = f(x) + y^T(Ax − b)

dual function: g(y) = inf_x L(x, y)

dual problem: maximize g(y)

recover x* = argmin_x L(x, y*)

SLIDE 5

Dual ascent

gradient method for dual problem: y^{k+1} := y^k + α^k ∇g(y^k)

∇g(y^k) = Ax̃ − b, where x̃ = argmin_x L(x, y^k)

dual ascent method is

x^{k+1} := argmin_x L(x, y^k)          // x-minimization
y^{k+1} := y^k + α^k(Ax^{k+1} − b)     // dual update

works, with lots of strong assumptions
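for concreteness, a minimal NumPy sketch of dual ascent, with an assumed toy objective f(x) = (1/2)‖x‖_2^2 so the x-minimization is closed-form (x = −A^T y); the fixed step size α is an untuned guess:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 30))   # toy problem data (assumed)
b = rng.standard_normal(10)

y = np.zeros(10)
alpha = 0.01                        # fixed step size (assumed; must be small enough)
for k in range(1000):
    x = -A.T @ y                    # x-minimization: argmin_x L(x, y) for f = (1/2)||x||^2
    y = y + alpha * (A @ x - b)     # dual update (gradient ascent on g)

print(np.linalg.norm(A @ x - b))    # primal residual, should be near zero
```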

SLIDE 6

Dual decomposition

suppose f is separable:

f(x) = f_1(x_1) + · · · + f_N(x_N),   x = (x_1, . . . , x_N)

then L is separable in x: L(x, y) = L_1(x_1, y) + · · · + L_N(x_N, y) − y^T b,

L_i(x_i, y) = f_i(x_i) + y^T A_i x_i

x-minimization in dual ascent splits into N separate minimizations

x_i^{k+1} := argmin_{x_i} L_i(x_i, y^k)

which can be carried out in parallel

SLIDE 7

Dual decomposition

dual decomposition (Everett, Dantzig, Wolfe, Benders 1960–65)

x_i^{k+1} := argmin_{x_i} L_i(x_i, y^k),   i = 1, . . . , N
y^{k+1} := y^k + α^k(∑_{i=1}^N A_i x_i^{k+1} − b)

scatter y^k; update x_i in parallel; gather A_i x_i^{k+1}

solve a large problem
– by iteratively solving subproblems (in parallel)
– dual variable update provides coordination

works, with lots of assumptions; often slow
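the same assumed toy objective, split into N blocks with f_i(x_i) = (1/2)‖x_i‖_2^2, gives a minimal sketch of dual decomposition (the list comprehension stands in for a parallel scatter/gather):

```python
import numpy as np

rng = np.random.default_rng(1)
m, N = 10, 5
A = [rng.standard_normal((m, 6)) for _ in range(N)]   # column blocks A_i (assumed sizes)
b = rng.standard_normal(m)

y = np.zeros(m)
alpha = 0.01
for k in range(2000):
    xs = [-Ai.T @ y for Ai in A]                   # N independent x_i-updates (parallelizable)
    r = sum(Ai @ xi for Ai, xi in zip(A, xs)) - b  # gather A_i x_i and form residual
    y = y + alpha * r                              # dual update coordinates the blocks

print(np.linalg.norm(r))                           # residual -> 0
```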

SLIDE 8

Outline

Dual decomposition
Method of multipliers
Alternating direction method of multipliers
Common patterns
Examples
Consensus and exchange
Conclusions

SLIDE 9

Method of multipliers

a method to robustify dual ascent

use augmented Lagrangian (Hestenes, Powell 1969), ρ > 0

L_ρ(x, y) = f(x) + y^T(Ax − b) + (ρ/2)‖Ax − b‖_2^2

method of multipliers (Hestenes, Powell; analysis in Bertsekas 1982)

x^{k+1} := argmin_x L_ρ(x, y^k)
y^{k+1} := y^k + ρ(Ax^{k+1} − b)

(note specific dual update step length ρ)

SLIDE 10

Method of multipliers dual update step

optimality conditions (for differentiable f):

Ax* − b = 0,   ∇f(x*) + A^T y* = 0   (primal and dual feasibility)

since x^{k+1} minimizes L_ρ(x, y^k),

0 = ∇_x L_ρ(x^{k+1}, y^k) = ∇f(x^{k+1}) + A^T y^k + ρA^T(Ax^{k+1} − b) = ∇f(x^{k+1}) + A^T y^{k+1}

dual update y^{k+1} = y^k + ρ(Ax^{k+1} − b) makes (x^{k+1}, y^{k+1}) dual feasible

primal feasibility achieved in limit: Ax^{k+1} − b → 0

SLIDE 11

Method of multipliers

(compared to dual decomposition)

good news: converges under much more relaxed conditions

(f can be nondifferentiable, take on value +∞, . . . )

bad news: quadratic penalty destroys splitting of the x-update, so can’t do decomposition

SLIDE 12

Outline

Dual decomposition
Method of multipliers
Alternating direction method of multipliers
Common patterns
Examples
Consensus and exchange
Conclusions

SLIDE 13

Alternating direction method of multipliers

a method
– with good robustness of method of multipliers
– which can support decomposition

“robust dual decomposition” or “decomposable method of multipliers”

proposed by Gabay, Mercier, Glowinski, Marrocco in 1976

SLIDE 14

Alternating direction method of multipliers

ADMM problem form (with f, g convex)

minimize f(x) + g(z) subject to Ax + Bz = c

– two sets of variables, with separable objective

L_ρ(x, z, y) = f(x) + g(z) + y^T(Ax + Bz − c) + (ρ/2)‖Ax + Bz − c‖_2^2

ADMM:

x^{k+1} := argmin_x L_ρ(x, z^k, y^k)            // x-minimization
z^{k+1} := argmin_z L_ρ(x^{k+1}, z, y^k)        // z-minimization
y^{k+1} := y^k + ρ(Ax^{k+1} + Bz^{k+1} − c)     // dual update

SLIDE 15

Alternating direction method of multipliers

if we minimized over x and z jointly, reduces to method of multipliers

instead, we do one pass of a Gauss-Seidel method

we get splitting since we minimize over x with z fixed, and vice versa

SLIDE 16

ADMM and optimality conditions

optimality conditions (for differentiable case):
– primal feasibility: Ax + Bz − c = 0
– dual feasibility: ∇f(x) + A^T y = 0,   ∇g(z) + B^T y = 0

since z^{k+1} minimizes L_ρ(x^{k+1}, z, y^k) we have

0 = ∇g(z^{k+1}) + B^T y^k + ρB^T(Ax^{k+1} + Bz^{k+1} − c) = ∇g(z^{k+1}) + B^T y^{k+1}

so with ADMM dual variable update, (x^{k+1}, z^{k+1}, y^{k+1}) satisfies the second dual feasibility condition

primal and first dual feasibility are achieved as k → ∞

SLIDE 17

ADMM with scaled dual variables

combine linear and quadratic terms in augmented Lagrangian

L_ρ(x, z, y) = f(x) + g(z) + y^T(Ax + Bz − c) + (ρ/2)‖Ax + Bz − c‖_2^2
             = f(x) + g(z) + (ρ/2)‖Ax + Bz − c + u‖_2^2 + const.

with u^k = (1/ρ)y^k

ADMM (scaled dual form):

x^{k+1} := argmin_x ( f(x) + (ρ/2)‖Ax + Bz^k − c + u^k‖_2^2 )
z^{k+1} := argmin_z ( g(z) + (ρ/2)‖Ax^{k+1} + Bz − c + u^k‖_2^2 )
u^{k+1} := u^k + (Ax^{k+1} + Bz^{k+1} − c)
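the scaled form suggests a compact generic skeleton; this sketch assumes the caller supplies prox_f(v) = argmin_x f(x) + (ρ/2)‖Ax − v‖_2^2 (and prox_g analogously, with ρ baked into the closures), an interface chosen here purely for illustration:

```python
import numpy as np

def admm_scaled(prox_f, prox_g, A, B, c, iters=100):
    """Scaled-form ADMM skeleton (a sketch, not a prescribed API).
    prox_f(v) must return argmin_x f(x) + (rho/2)||Ax - v||^2,
    and prox_g(v) must return argmin_z g(z) + (rho/2)||Bz - v||^2."""
    x = np.zeros(A.shape[1])
    z = np.zeros(B.shape[1])
    u = np.zeros(c.shape[0])
    for k in range(iters):
        x = prox_f(c - B @ z - u)        # x-update: ||Ax - (c - Bz - u)||^2
        z = prox_g(c - A @ x - u)        # z-update: ||Bz - (c - Ax - u)||^2
        u = u + A @ x + B @ z - c        # scaled dual update
    return x, z
```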

SLIDE 18

Convergence

assume (very little!)
– f, g convex, closed, proper
– L_0 has a saddle point

then ADMM converges:
– iterates approach feasibility: Ax^k + Bz^k − c → 0
– objective approaches optimal value: f(x^k) + g(z^k) → p*

SLIDE 19

Related algorithms

operator splitting methods (Douglas, Peaceman, Rachford, Lions, Mercier, . . . 1950s, 1979)

proximal point algorithm (Rockafellar 1976)

Dykstra’s alternating projections algorithm (1983)

Spingarn’s method of partial inverses (1985)

Rockafellar-Wets progressive hedging (1991)

proximal methods (Rockafellar, many others, 1976–present)

Bregman iterative methods (2008–present)

most of these are special cases of the proximal point algorithm

SLIDE 20

Outline

Dual decomposition
Method of multipliers
Alternating direction method of multipliers
Common patterns
Examples
Consensus and exchange
Conclusions

SLIDE 21

Common patterns

x-update step requires minimizing f(x) + (ρ/2)‖Ax − v‖_2^2

(with v = −Bz^k + c − u^k, which is constant during x-update)

similar for z-update

several special cases come up often

can simplify update by exploiting structure in these cases

SLIDE 22

Decomposition

suppose f is block-separable,

f(x) = f_1(x_1) + · · · + f_N(x_N),   x = (x_1, . . . , x_N)

and A is conformably block separable: A^T A is block diagonal

then x-update splits into N parallel updates of x_i

SLIDE 23

Proximal operator

consider x-update when A = I

x^+ = argmin_x ( f(x) + (ρ/2)‖x − v‖_2^2 ) = prox_{f,ρ}(v)

some special cases:

f = I_C (indicator fct. of set C):   x^+ := Π_C(v)   (projection onto C)

f = λ‖·‖_1 (ℓ1 norm):   x_i^+ := S_{λ/ρ}(v_i)   (soft thresholding)

(S_a(v) = (v − a)_+ − (−v − a)_+)
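soft thresholding, for example, is one line in NumPy (a sketch of S_a only):

```python
import numpy as np

def soft_threshold(v, a):
    # S_a(v) = (v - a)_+ - (-v - a)_+, applied elementwise
    return np.maximum(v - a, 0) - np.maximum(-v - a, 0)

print(soft_threshold(np.array([-2.0, -0.3, 0.5, 3.0]), 1.0))  # [-1. 0. 0. 2.]
```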

SLIDE 24

Quadratic objective

f(x) = (1/2)x^T Px + q^T x + r

x^+ := (P + ρA^T A)^{−1}(ρA^T v − q)

use matrix inversion lemma when computationally advantageous:

(P + ρA^T A)^{−1} = P^{−1} − ρP^{−1}A^T(I + ρAP^{−1}A^T)^{−1}AP^{−1}

(direct method) cache factorization of P + ρA^T A (or I + ρAP^{−1}A^T)

(iterative method) warm start, early stopping, reducing tolerances
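a sketch of the direct method with a cached Cholesky factorization via SciPy; P = I and the problem sizes are assumed stand-ins (chosen so P + ρA^T A is positive definite):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(3)
n, m, rho = 50, 200, 1.0
P, q = np.eye(n), rng.standard_normal(n)   # toy quadratic objective (assumed)
A = rng.standard_normal((m, n))

F = cho_factor(P + rho * A.T @ A)          # factor once, before the ADMM loop

def x_update(v):
    # x+ := (P + rho A^T A)^{-1} (rho A^T v - q), reusing the cached factorization
    return cho_solve(F, rho * A.T @ v - q)

x = x_update(rng.standard_normal(m))       # each subsequent solve is cheap
```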

SLIDE 25

Smooth objective

f smooth: can use standard methods for smooth minimization
– gradient, Newton, or quasi-Newton
– preconditioned CG, limited-memory BFGS (scale to very large problems)

can exploit
– warm start
– early stopping, with tolerances decreasing as ADMM proceeds

SLIDE 26

Outline

Dual decomposition
Method of multipliers
Alternating direction method of multipliers
Common patterns
Examples
Consensus and exchange
Conclusions

SLIDE 27

Constrained convex optimization

consider ADMM for generic problem

minimize f(x) subject to x ∈ C

ADMM form: take g to be indicator of C

minimize f(x) + g(z) subject to x − z = 0

algorithm:

x^{k+1} := argmin_x ( f(x) + (ρ/2)‖x − z^k + u^k‖_2^2 )
z^{k+1} := Π_C(x^{k+1} + u^k)
u^{k+1} := u^k + x^{k+1} − z^{k+1}
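a sketch for one assumed instance, f(x) = (1/2)‖x − a‖_2^2 with C the nonnegative orthant, where the prox of f is an averaging step, Π_C is an elementwise max, and the known solution max(a, 0) checks the result:

```python
import numpy as np

rng = np.random.default_rng(4)
a = rng.standard_normal(20)   # data for the assumed objective f(x) = (1/2)||x - a||^2
rho = 1.0

x = z = u = np.zeros(20)
for k in range(50):
    x = (a + rho * (z - u)) / (1 + rho)   # prox of f
    z = np.maximum(x + u, 0)              # projection onto C = {z >= 0}
    u = u + x - z                         # scaled dual update

print(np.linalg.norm(z - np.maximum(a, 0)))  # distance to the known solution
```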

SLIDE 28

Lasso

lasso problem:

minimize (1/2)‖Ax − b‖_2^2 + λ‖x‖_1

ADMM form:

minimize (1/2)‖Ax − b‖_2^2 + λ‖z‖_1
subject to x − z = 0

ADMM:

x^{k+1} := (A^T A + ρI)^{−1}(A^T b + ρz^k − y^k)
z^{k+1} := S_{λ/ρ}(x^{k+1} + y^k/ρ)
y^{k+1} := y^k + ρ(x^{k+1} − z^{k+1})
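a direct NumPy transcription of these updates (the test problem, λ, and ρ are made up for illustration; the factorization of A^T A + ρI is cached up front):

```python
import numpy as np

def lasso_admm(A, b, lam, rho=1.0, iters=200):
    n = A.shape[1]
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))   # factor once
    Atb = A.T @ b
    x = z = y = np.zeros(n)
    for k in range(iters):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * z - y))
        v = x + y / rho
        z = np.maximum(v - lam/rho, 0) - np.maximum(-v - lam/rho, 0)  # S_{lam/rho}
        y = y + rho * (x - z)
    return z

rng = np.random.default_rng(5)
A = rng.standard_normal((60, 200))                  # assumed test instance
x_true = np.zeros(200); x_true[:5] = 5 * rng.standard_normal(5)
b = A @ x_true + 0.01 * rng.standard_normal(60)
print(np.flatnonzero(lasso_admm(A, b, lam=1.0)))    # recovered support (should roughly match x_true)
```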

SLIDE 29

Lasso example

example with dense A ∈ R^{1500×5000}

(1500 measurements; 5000 regressors)

computation times:

factorization (same as ridge regression)    1.3s
subsequent ADMM iterations                  0.03s
lasso solve (about 50 ADMM iterations)      2.9s
full regularization path (30 λ’s)           4.4s

not bad for a very short Matlab script

SLIDE 30

Sparse inverse covariance selection

S: empirical covariance of samples from N(0, Σ), with Σ^{−1} sparse

(i.e., Gaussian Markov random field)

estimate Σ^{−1} via ℓ1 regularized maximum likelihood

minimize Tr(SX) − log det X + λ‖X‖_1

methods: COVSEL (Banerjee et al 2008), graphical lasso (FHT 2008)

SLIDE 31

Sparse inverse covariance selection via ADMM

ADMM form:

minimize Tr(SX) − log det X + λ‖Z‖_1
subject to X − Z = 0

ADMM:

X^{k+1} := argmin_X ( Tr(SX) − log det X + (ρ/2)‖X − Z^k + U^k‖_F^2 )
Z^{k+1} := S_{λ/ρ}(X^{k+1} + U^k)
U^{k+1} := U^k + (X^{k+1} − Z^{k+1})

SLIDE 32

Analytical solution for X-update

compute eigendecomposition ρ(Z^k − U^k) − S = QΛQ^T

form diagonal matrix X̃ with X̃_ii = ( λ_i + (λ_i^2 + 4ρ)^{1/2} ) / (2ρ)

let X^{k+1} := QX̃Q^T

cost of X-update is an eigendecomposition
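as a function (a sketch; the random symmetric instance only verifies that ρX − X^{−1} = ρ(Z − U) − S holds):

```python
import numpy as np

def covsel_x_update(S, Z, U, rho):
    # solve rho*X - X^{-1} = rho*(Z - U) - S via one eigendecomposition
    lam, Q = np.linalg.eigh(rho * (Z - U) - S)
    x_diag = (lam + np.sqrt(lam**2 + 4 * rho)) / (2 * rho)  # positive root per eigenvalue
    return (Q * x_diag) @ Q.T                               # Q diag(x) Q^T

rng = np.random.default_rng(8)
S = rng.standard_normal((5, 5)); S = S @ S.T                # assumed test instance
X = covsel_x_update(S, np.eye(5), np.zeros((5, 5)), rho=1.0)
print(np.allclose(X - np.linalg.inv(X), np.eye(5) - S))     # optimality check
```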

SLIDE 33

Sparse inverse covariance selection example

Σ^{−1} is 1000 × 1000 with 10^4 nonzeros

– graphical lasso (Fortran): 20 seconds – 3 minutes
– ADMM (Matlab): 3 – 10 minutes
(times depend on choice of λ)

very rough experiment, but with no special tuning, ADMM is in ballpark of recent specialized methods

(for comparison, COVSEL takes 25+ min when Σ^{−1} is a 400 × 400 tridiagonal matrix)

SLIDE 34

Outline

Dual decomposition
Method of multipliers
Alternating direction method of multipliers
Common patterns
Examples
Consensus and exchange
Conclusions

SLIDE 35

Consensus optimization

want to solve problem with N objective terms

minimize ∑_{i=1}^N f_i(x)

– e.g., f_i is the loss function for ith block of training data

ADMM form:

minimize ∑_{i=1}^N f_i(x_i)
subject to x_i − z = 0

– x_i are local variables
– z is the global variable
– x_i − z = 0 are consistency or consensus constraints
– can add regularization using a g(z) term

SLIDE 36

Consensus optimization via ADMM

L_ρ(x, z, y) = ∑_{i=1}^N ( f_i(x_i) + y_i^T(x_i − z) + (ρ/2)‖x_i − z‖_2^2 )

ADMM:

x_i^{k+1} := argmin_{x_i} ( f_i(x_i) + y_i^{kT}(x_i − z^k) + (ρ/2)‖x_i − z^k‖_2^2 )
z^{k+1} := (1/N) ∑_{i=1}^N ( x_i^{k+1} + (1/ρ)y_i^k )
y_i^{k+1} := y_i^k + ρ(x_i^{k+1} − z^{k+1})

with regularization, averaging in z update is followed by prox_{g,ρ}

SLIDE 37

Consensus optimization via ADMM

using ∑_{i=1}^N y_i^k = 0, algorithm simplifies to

x_i^{k+1} := argmin_{x_i} ( f_i(x_i) + y_i^{kT}(x_i − x̄^k) + (ρ/2)‖x_i − x̄^k‖_2^2 )
y_i^{k+1} := y_i^k + ρ(x_i^{k+1} − x̄^{k+1})

where x̄^k = (1/N) ∑_{i=1}^N x_i^k

in each iteration
– gather x_i^k and average to get x̄^k
– scatter the average x̄^k to processors
– update y_i^k locally (in each processor, in parallel)
– update x_i locally
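a sketch of this simplified iteration for assumed local objectives f_i(x) = (1/2)‖x − a_i‖_2^2, whose consensus solution is the average of the a_i; each row of x plays the role of one processor's local variable:

```python
import numpy as np

rng = np.random.default_rng(6)
N, n, rho = 10, 4, 1.0
a = rng.standard_normal((N, n))   # data defining f_i(x) = (1/2)||x - a_i||^2 (assumed)

x = np.zeros((N, n)); y = np.zeros((N, n)); xbar = np.zeros(n)
for k in range(100):
    x = (a - y + rho * xbar) / (1 + rho)   # local x_i-updates (parallelizable)
    xbar = x.mean(axis=0)                  # gather and average
    y = y + rho * (x - xbar)               # local dual updates

print(np.linalg.norm(xbar - a.mean(axis=0)))  # consensus -> average of the a_i
```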

SLIDE 38

Statistical interpretation

f_i is negative log-likelihood for parameter x given ith data block

x_i^{k+1} is MAP estimate under prior N(x̄^k + (1/ρ)y_i^k, ρI)

prior mean is previous iteration’s consensus shifted by ‘price’ of processor i disagreeing with previous consensus

processors only need to support a Gaussian MAP method
– type or number of data in each block not relevant
– consensus protocol yields global maximum-likelihood estimate

SLIDE 39

Consensus classification

data (examples) (a_i, b_i), i = 1, . . . , N, with a_i ∈ R^n, b_i ∈ {−1, +1}

linear classifier sign(a^T w + v), with weight w, offset v

margin for ith example is b_i(a_i^T w + v); want margin to be positive

loss for ith example is l(b_i(a_i^T w + v))
– l is loss function (hinge, logistic, probit, exponential, . . . )

choose w, v to minimize (1/N) ∑_{i=1}^N l(b_i(a_i^T w + v)) + r(w)
– r(w) is regularization term (ℓ2, ℓ1, . . . )

split data and use ADMM consensus to solve

SLIDE 40

Consensus SVM example

hinge loss l(u) = (1 − u)_+ with ℓ2 regularization

baby problem with n = 2, N = 400 to illustrate

examples split into 20 groups, in worst possible way: each group contains only positive or negative examples

SLIDE 41

Iteration 1

[figure: data points and consensus classifier at this iteration]

SLIDE 42

Iteration 5

[figure: data points and consensus classifier at this iteration]

SLIDE 43

Iteration 40

[figure: data points and consensus classifier at this iteration]

SLIDE 44

Distributed lasso example

example with dense A ∈ R^{400000×8000} (roughly 30 GB of data)
– distributed solver written in C using MPI and GSL
– no optimization or tuned libraries (like ATLAS, MKL)
– split into 80 subsystems across 10 (8-core) machines on Amazon EC2

computation times:

loading data                              30s
factorization                             5m
subsequent ADMM iterations                0.5–2s
lasso solve (about 15 ADMM iterations)    5–6m

SLIDE 45

Exchange problem

minimize ∑_{i=1}^N f_i(x_i)
subject to ∑_{i=1}^N x_i = 0

another canonical problem, like consensus

in fact, it’s the dual of consensus

can interpret as N agents exchanging n goods to minimize a total cost

(x_i)_j ≥ 0 means agent i receives (x_i)_j of good j from exchange

(x_i)_j < 0 means agent i contributes |(x_i)_j| of good j to exchange

constraint ∑_{i=1}^N x_i = 0 is equilibrium or market clearing constraint

optimal dual variable y* is a set of valid prices for the goods

suggests real or virtual cash payment (y*)^T x_i by agent i

SLIDE 46

Exchange ADMM

solve as a generic constrained convex problem with constraint set

C = {x ∈ R^{nN} | x_1 + x_2 + · · · + x_N = 0}

scaled form:

x_i^{k+1} := argmin_{x_i} ( f_i(x_i) + (ρ/2)‖x_i − x_i^k + x̄^k + u^k‖_2^2 )
u^{k+1} := u^k + x̄^{k+1}

unscaled form:

x_i^{k+1} := argmin_{x_i} ( f_i(x_i) + y^{kT} x_i + (ρ/2)‖x_i − (x_i^k − x̄^k)‖_2^2 )
y^{k+1} := y^k + ρ x̄^{k+1}
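a sketch of the scaled form with assumed quadratic agent costs f_i(x_i) = (1/2)‖x_i − a_i‖_2^2, so each local prox is closed-form; the clearing residual x̄ is driven toward zero:

```python
import numpy as np

rng = np.random.default_rng(7)
N, n, rho = 5, 3, 1.0
a = rng.standard_normal((N, n))   # data for f_i(x_i) = (1/2)||x_i - a_i||^2 (assumed)

x = np.zeros((N, n)); u = np.zeros(n)
for k in range(100):
    v = x - x.mean(axis=0) - u    # per-agent prox targets x_i^k - xbar^k - u^k
    x = (a + rho * v) / (1 + rho) # local x_i-updates (parallelizable)
    u = u + x.mean(axis=0)        # price update, scaled form

print(np.linalg.norm(x.mean(axis=0)))  # market clearing residual -> 0
```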

SLIDE 47

Interpretation as tâtonnement process

tâtonnement process: iteratively update prices to clear market

work towards equilibrium by increasing/decreasing prices of goods based on excess demand/supply

dual decomposition is the simplest tâtonnement algorithm

ADMM adds proximal regularization
– incorporates agents’ prior commitment to help clear market
– convergence far more robust than dual decomposition

SLIDE 48

Distributed dynamic energy management

N devices exchange power in time periods t = 1, . . . , T

x_i ∈ R^T is power flow profile for device i

f_i(x_i) is cost of profile x_i (and encodes constraints)

x_1 + · · · + x_N = 0 is energy balance (in each time period)

dynamic energy management problem is exchange problem

exchange ADMM gives distributed method for dynamic energy management

each device optimizes its own profile, with quadratic regularization for coordination

residual (energy imbalance) is driven to zero

SLIDE 49

Generators

[figure: generator cost curves and power profiles over t]

3 example generators

left: generator costs/limits; right: ramp constraints

can add cost for power changes

SLIDE 50

Fixed loads

[figure: two fixed load profiles over t]

2 example fixed loads

cost is +∞ for not supplying load; zero otherwise

SLIDE 51

Shiftable load

[figure: shiftable load profile over t]

total energy consumed over an interval must exceed given minimum level

limits on energy consumed in each period

cost is +∞ for violating constraints; zero otherwise

SLIDE 52

Battery energy storage system

[figure: battery charge and charge/discharge profiles over t]

energy store with maximum capacity, charge/discharge limits

black: battery charge, red: charge/discharge profile

cost is +∞ for violating constraints; zero otherwise

SLIDE 53

Electric vehicle charging system

[figure: EV charging profiles over t]

black: desired charge profile; blue: charge profile

shortfall cost for not meeting desired charge

SLIDE 54

HVAC

[figure: HVAC temperature and cooling energy profiles over t]

thermal load (e.g., room, refrigerator) with temperature limits

magenta: ambient temperature; blue: load temperature

red: cooling energy profile

cost is +∞ for violating constraints; zero otherwise

SLIDE 55

External tie

[figure: external grid prices over t]

buy/sell energy from/to external grid at price p_ext(t) ± γ(t)

solid: p_ext(t); dashed: p_ext(t) ± γ(t)

SLIDE 56

Smart grid example

10 devices (already described above):

3 generators
2 fixed loads
1 shiftable load
1 EV charging system
1 battery
1 HVAC system
1 external tie

SLIDE 57

Convergence

iteration: k = 1

[figure: left: generator power profiles; right: energy imbalance]

left: solid: optimal generator profile, dashed: profile at kth iteration
right: residual vector x̄^k

SLIDE 58

Convergence

iteration: k = 3

[figure: left: generator power profiles; right: energy imbalance]

left: solid: optimal generator profile, dashed: profile at kth iteration
right: residual vector x̄^k

SLIDE 59

Convergence

iteration: k = 5

[figure: left: generator power profiles; right: energy imbalance]

left: solid: optimal generator profile, dashed: profile at kth iteration
right: residual vector x̄^k

SLIDE 60

Convergence

iteration: k = 10

[figure: left: generator power profiles; right: energy imbalance]

left: solid: optimal generator profile, dashed: profile at kth iteration
right: residual vector x̄^k

SLIDE 61

Convergence

iteration: k = 15

[figure: left: generator power profiles; right: energy imbalance]

left: solid: optimal generator profile, dashed: profile at kth iteration
right: residual vector x̄^k

SLIDE 62

Convergence

iteration: k = 20

[figure: left: generator power profiles; right: energy imbalance]

left: solid: optimal generator profile, dashed: profile at kth iteration
right: residual vector x̄^k

SLIDE 63

Convergence

iteration: k = 25

[figure: left: generator power profiles; right: energy imbalance]

left: solid: optimal generator profile, dashed: profile at kth iteration
right: residual vector x̄^k

SLIDE 64

Convergence

iteration: k = 30

[figure: left: generator power profiles; right: energy imbalance]

left: solid: optimal generator profile, dashed: profile at kth iteration
right: residual vector x̄^k

SLIDE 65

Convergence

iteration: k = 35

[figure: left: generator power profiles; right: energy imbalance]

left: solid: optimal generator profile, dashed: profile at kth iteration
right: residual vector x̄^k

SLIDE 66

Convergence

iteration: k = 40

[figure: left: generator power profiles; right: energy imbalance]

left: solid: optimal generator profile, dashed: profile at kth iteration
right: residual vector x̄^k

SLIDE 67

Convergence

iteration: k = 45

[figure: left: generator power profiles; right: energy imbalance]

left: solid: optimal generator profile, dashed: profile at kth iteration
right: residual vector x̄^k

SLIDE 68

Convergence

iteration: k = 50

[figure: left: generator power profiles; right: energy imbalance]

left: solid: optimal generator profile, dashed: profile at kth iteration
right: residual vector x̄^k

SLIDE 69

Outline

Dual decomposition
Method of multipliers
Alternating direction method of multipliers
Common patterns
Examples
Consensus and exchange
Conclusions

SLIDE 70

Summary and conclusions

ADMM
– is the same as, or closely related to, many methods with other names
– has been around since the 1970s
– gives simple single-processor algorithms that can be competitive with state-of-the-art
– can be used to coordinate many processors, each solving a substantial problem, to solve a very large problem
