SLIDE 1

Flexible ADMM for Block-Structured Convex and Nonconvex Optimization

Zhi-Quan (Tom) Luo

Joint work with Mingyi Hong, Tsung-Hui Chang, Xiangfeng Wang, Meisam Razaviyayn, and Shiqian Ma, University of Minnesota

September, 2014

SLIDE 2

Problem

◮ We consider the following block-structured problem

    minimize   f(x) := g(x_1, x_2, · · · , x_K) + \sum_{k=1}^{K} h_k(x_k)
    subject to Ex := E_1 x_1 + E_2 x_2 + · · · + E_K x_K = q,
               x_k ∈ X_k,  k = 1, 2, ..., K.        (1.1)

◮ x := (x_1^T, ..., x_K^T)^T ∈ ℜ^n is a partition of the optimization variable x; X = X_1 × · · · × X_K is the feasible set for x
◮ g(·): smooth, possibly nonconvex; coupling all variables
◮ h_k(·): convex, possibly nonsmooth
◮ E := (E_1, E_2, ..., E_K) ∈ ℜ^{m×n} is a partition of E

SLIDE 3

Applications

Lots of emerging applications

◮ Compressive Sensing: estimate a sparse vector x by solving the following (K = 2) problem [Candes 08]:

    minimize  \|z\|^2 + λ \|x\|_1
    subject to  Ex + z = q,

  where E is a (fat) observation matrix and q ≈ Ex is a noisy observation vector
◮ If we require x ≥ 0, then we obtain a three-block (K = 3) convex separable optimization problem

SLIDE 4

Applications (cont.)

◮ Stable Robust PCA: given a noise-corrupted observation matrix M ∈ ℜ^{m×n}, separate out a low-rank matrix L and a sparse matrix S [Zhou 10]:

    minimize  \|L\|_* + ρ \|S\|_1 + λ \|Z\|_F^2
    subject to  L + S + Z = M

◮ \|·\|_*: the matrix nuclear norm
◮ \|·\|_1 and \|·\|_F denote the ℓ1 norm and the Frobenius norm of a matrix
◮ Z denotes the noise matrix
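
For concreteness, here is a minimal sketch of how a direct three-block ADMM pass on this model is commonly written, with the usual closed-form per-block proximal steps (singular-value thresholding for L, soft-thresholding for S, a rescaling for Z). The penalty β, the weights, the synthetic data, and the iteration count are illustrative assumptions; whether the direct multi-block ADMM converges is precisely the question taken up later in this talk.

```python
import numpy as np

def svt(A, tau):
    """Singular-value thresholding: prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(A, tau):
    """Entrywise soft-thresholding: prox of tau * l1 norm."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

rng = np.random.default_rng(0)
m, n = 40, 50
M = (rng.standard_normal((m, 5)) @ rng.standard_normal((5, n))           # low-rank part
     + 10.0 * (rng.random((m, n)) < 0.05) * rng.standard_normal((m, n))  # sparse spikes
     + 0.01 * rng.standard_normal((m, n)))                               # dense noise

rho_s = 1.0 / np.sqrt(max(m, n))   # weight on ||S||_1 (a common heuristic, not tuned)
lam = 10.0                         # weight on ||Z||_F^2
beta = 1.0                         # ADMM penalty parameter (illustrative)

L = np.zeros((m, n)); S = np.zeros((m, n)); Z = np.zeros((m, n)); Y = np.zeros((m, n))
for _ in range(200):
    # Gauss-Seidel block updates of the augmented Lagrangian
    # ||L||_* + rho_s*||S||_1 + lam*||Z||_F^2 + <Y, M-L-S-Z> + (beta/2)*||M-L-S-Z||_F^2.
    L = svt(M - S - Z + Y / beta, 1.0 / beta)
    S = soft(M - L - Z + Y / beta, rho_s / beta)
    Z = (beta / (2.0 * lam + beta)) * (M - L - S + Y / beta)
    Y = Y + beta * (M - L - S - Z)

print("residual ||M - L - S - Z||_F =", np.linalg.norm(M - L - S - Z))
print("rank(L) =", np.linalg.matrix_rank(L),
      " nnz fraction of S =", float((np.abs(S) > 1e-6).mean()))
```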

SLIDE 5

Applications: The BP Problem

◮ Consider the basis pursuit (BP) problem [Chen et al 98]

    \min_x  \|x\|_1   s.t.  Ex = q,  x ∈ X.

◮ Partition x by x = [x_1^T, · · · , x_K^T]^T where x_k ∈ ℜ^{n_k}
◮ Partition E accordingly
◮ The BP problem becomes a K-block problem:

    \min_x  \sum_{k=1}^{K} \|x_k\|_1   s.t.  \sum_{k=1}^{K} E_k x_k = q,  x_k ∈ X_k, ∀ k.

SLIDE 6

Applications: Wireless Networking

◮ Consider a network with K secondary users (SUs), L primary users (PUs), and a secondary BS (SBS)
◮ s_k: user k's transmit power; r_k: the channel between user k and the SBS; P_k: SU k's total power budget
◮ g_{kℓ}: the channel between the kth SU and the ℓth PU

Figure: Illustration of the CR network.

SLIDE 7

Applications: Wireless Networking

◮ Objective: maximize the SUs' throughput, subject to limited interference to the PUs:

    \max_{\{s_k\}}  \log\Big(1 + \sum_{k=1}^{K} |r_k|^2 s_k\Big)
    s.t.  0 ≤ s_k ≤ P_k,  \sum_{k=1}^{K} |g_{kℓ}|^2 s_k ≤ I_ℓ,  ∀ ℓ, k

◮ Again in the form of (1.1)
◮ Similar formulations hold for systems with multiple channels and multiple transmit/receive antennas

SLIDE 8

Application: DR in Smart Grid Systems

◮ The utility company bids for electricity from the power market
◮ Total cost:
   - Bidding cost in a wholesale day-ahead market
   - Bidding cost in the real-time market
◮ The demand response (DR) problem [Alizadeh et al 12]:
   - The utility has control over the power consumption of users' appliances (e.g., controlling the charging rate of electric vehicles)
   - Objective: minimize the total cost

SLIDE 9

Application: DR in Smart Grid Systems

◮ K customers, L periods
◮ {p_ℓ}_{ℓ=1}^{L}: the bids in a day-ahead market for the L periods
◮ x_k ∈ ℜ^{n_k}: control variables for the appliances of customer k
◮ Objective: minimize the bidding cost + power imbalance cost, by optimizing the bids and controlling the appliances [Chang et al 12]:

    \min_{\{x_k\}, p, z}  C_p(z) + C_s\Big(z + p − \sum_{k=1}^{K} Ψ_k x_k\Big) + C_d(p)
    s.t.  \sum_{k=1}^{K} Ψ_k x_k − p − z ≤ 0,  z ≥ 0,  p ≥ 0,  x_k ∈ X_k, ∀ k.

SLIDE 10

Challenges

◮ For huge-scale (BIG data) applications, efficient algorithms are needed
◮ Many existing first-order algorithms do not apply
◮ The block coordinate descent (BCD) algorithm cannot deal with linear coupling constraints [Bertsekas 99]
◮ The block successive upper-bound minimization (BSUM) method cannot be applied either [Razaviyayn-Hong-Luo 13]
◮ The alternating direction method of multipliers (ADMM) only works for convex problems with 2 blocks of variables and a separable objective [Boyd et al 11][Chen et al 13]

◮ General purpose algorithms can be very slow

SLIDE 11

Agenda

◮ The ADMM for multi-block structured convex optimization
   - The main steps of the algorithm
   - Rate of convergence analysis
◮ The BSUM-M for multi-block structured convex optimization
   - The main steps of the algorithm
   - Convergence analysis
◮ The flexible ADMM for structured nonconvex optimization
   - The main steps of the algorithm
   - Convergence analysis

◮ Conclusions

SLIDE 13

The ADMM Algorithm

◮ The augmented Lagrangian function for problem (1.1) is

    L(x; y) = f(x) + ⟨y, q − Ex⟩ + (ρ/2) \|q − Ex\|^2,        (1.2)

  where ρ ≥ 0 is a constant
◮ For a fixed y, the primal problem is given by

    d(y) = \min_{x}  f(x) + ⟨y, q − Ex⟩ + (ρ/2) \|q − Ex\|^2        (1.3)

◮ The dual problem is

    d^* = \max_{y}  d(y);        (1.4)

  d^* equals the optimal value of (1.1) under mild conditions

SLIDE 14

The ADMM Algorithm

Alternating Direction Method of Multipliers (ADMM). At each iteration r ≥ 1, first update the primal variable blocks in a Gauss-Seidel fashion, and then update the dual multiplier:

    x_k^{r+1} = \arg\min_{x_k ∈ X_k}  L(x_1^{r+1}, ..., x_{k−1}^{r+1}, x_k, x_{k+1}^r, ..., x_K^r; y^r),  ∀ k

    y^{r+1} = y^r + α (q − E x^{r+1}) = y^r + α \Big(q − \sum_{k=1}^{K} E_k x_k^{r+1}\Big),

where α > 0 is the step size for the dual update.
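
A minimal runnable sketch of this iteration on a toy instance, so each step above has a concrete counterpart. The quadratic objective f(x) = \sum_k \|x_k\|^2 / 2 (strongly convex, unlike the general setting analyzed in this talk), the unconstrained blocks X_k = ℜ^{n_k}, and the parameter values ρ, α are illustrative assumptions chosen only so that every block subproblem has a closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
K, m = 3, 6
nk = [2, 3, 4]                                  # block sizes (hypothetical)
E = [rng.standard_normal((m, n)) for n in nk]   # E = (E_1, ..., E_K)
q = rng.standard_normal(m)

rho, alpha = 1.0, 0.1                           # penalty and dual stepsize (illustrative)
x = [np.zeros(n) for n in nk]
y = np.zeros(m)

def residual(x):
    """q - Ex = q - sum_k E_k x_k."""
    return q - sum(Ek @ xk for Ek, xk in zip(E, x))

for r in range(500):
    # Gauss-Seidel sweep: block k minimizes L(.; y) with the other blocks fixed.
    # With f_k(x_k) = 0.5*||x_k||^2 the minimizer solves
    #   (I + rho*E_k^T E_k) x_k = E_k^T y + rho*E_k^T (q - sum_{j != k} E_j x_j).
    for k in range(K):
        rest = sum(E[j] @ x[j] for j in range(K) if j != k)
        rhs = E[k].T @ y + rho * E[k].T @ (q - rest)
        x[k] = np.linalg.solve(np.eye(nk[k]) + rho * E[k].T @ E[k], rhs)

    # Dual update with stepsize alpha (kept well below rho).
    y = y + alpha * residual(x)

# Expected to be small here, since f is strongly convex and alpha is modest.
print("feasibility gap ||q - Ex|| =", np.linalg.norm(residual(x)))
```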

◮ Inexact primal minimization ⇒ q − E x^{r+1} is no longer the dual gradient!
◮ The dual ascent property d(y^{r+1}) ≥ d(y^r) is lost
◮ Consider α = 0, or α ≈ 0...

SLIDE 15

The ADMM Algorithm (cont.)

◮ The Alternating Direction Method of Multipliers (ADMM) optimizes the augmented Lagrangian function one block of variables at a time [Boyd 11, Bertsekas 10]
◮ Recently found lots of applications in large-scale structured optimization; see [Boyd 11] for a survey
◮ Highly efficient, especially when the per-block subproblems are easy to solve (with closed-form solutions)
◮ Used widely (wildly?), even for nonconvex problems, with no guarantee of convergence

SLIDE 16

Known Convergence Results and Challenges

◮ K = 1: reduces to the conventional dual ascent algorithm [Bertsekas 10]; the convergence and rate of convergence have been analyzed in [Luo 93, Tseng 87]
◮ K = 2: a special case of the Douglas-Rachford splitting method; its convergence is studied in [Douglas 56, Eckstein 89]
◮ K = 2: the rate of convergence has recently been studied in [Deng 12]; the analysis is based on strong convexity and a contraction argument; iteration complexity has been studied in [He 12]

SLIDE 17

Main Challenges: How about K ≥ 3?

◮ Oddly, when K ≥ 3, there is little convergence analysis
◮ Recently [Chen et al 13] discovered a counterexample showing that three-block ADMM is not necessarily convergent
◮ When f(·) is strongly convex and α is small enough, the algorithm converges [Han-Yuan 13]
◮ A relaxed condition has been given recently in [Lin-Ma-Zhang 14], but it still requires K − 1 blocks to be strongly convex
◮ What about the case when the fk(·)'s are convex but not strongly convex? Nonsmooth?
◮ Besides convergence, can we characterize how fast the algorithm converges?

SLIDE 18

Agenda

◮ The ADMM for multi-block structured convex optimization
   - The main steps of the algorithm
   - Rate of convergence analysis
◮ The BSUM-M for multi-block structured convex optimization
   - The main steps of the algorithm
   - Convergence analysis
◮ The flexible ADMM for structured nonconvex optimization
   - The main steps of the algorithm
   - Convergence analysis

◮ Conclusions

SLIDE 19

Our Main Result [Hong-Luo 12]

Suppose some regularity conditions hold. If the stepsize α is sufficiently small, then

◮ the sequence of iterates {(x^r, y^r)} generated by the ADMM algorithm converges linearly to an optimal primal-dual solution of (1.1);
◮ the sequence of feasibility violations {E x^r − q} converges linearly.

◮ No strong convexity is assumed
◮ Linear convergence here means that a certain measure of the optimality gap shrinks by a constant factor after each ADMM iteration
◮ This result applies to any finite K > 0

SLIDE 20

Main Assumptions

The following are the main assumptions regarding f:

(a) The global minimum of (1.1) is attained, and so is the dual optimal value
(b) The smooth part g is further decomposable as

    g(x_1, · · · , x_K) = \sum_{k=1}^{K} g_k(A_k x_k),

    where each g_k is convex and the A_k's are some given matrices (not necessarily of full column rank)
(c) Each g_k is strictly convex and continuously differentiable, with a uniformly Lipschitz continuous gradient:

    \|A_k^T ∇g_k(A_k x_k) − A_k^T ∇g_k(A_k x'_k)\| ≤ L \|x_k − x'_k\|,  ∀ x_k, x'_k ∈ X_k

SLIDE 21

Main Assumptions (cont.)

(d) Each h_k satisfies one of the following conditions:
    (1) The epigraph of h_k(x_k) is a polyhedral set.
    (2) h_k(x_k) = λ_k \|x_k\|_1 + \sum_{J} w_J \|x_{k,J}\|_2, where x_k = (· · · , x_{k,J}, · · · ) is a partition of x_k with J being the partition index.
    (3) h_k(x_k) is the sum of functions of the two types above.
(e) Each submatrix E_k has full column rank.
(f) The feasible sets X_k are compact polyhedral sets.

SLIDE 22

Preliminary: Measures of Optimality (cont.)

◮ Let X(y^r) denote the set of optimal solutions of

    d(y^r) = \min_{x} L(x; y^r),

  and let \bar{x}^r = \arg\min_{\bar{x} ∈ X(y^r)} \|\bar{x} − x^r\|.
◮ Let us define

    dist(x^r, X(y^r)) = \min_{\bar{x} ∈ X(y^r)} \|\bar{x} − x^r\|,   and   dist(y^r, Y^*) = \min_{\bar{y} ∈ Y^*} \|\bar{y} − y^r\|.

SLIDE 23

The Key Idea

◮ Define the dual optimality gap as

    ∆_d^r = d^* − d(y^r) ≥ 0.

◮ Define the primal optimality gap as

    ∆_p^r = L(x^{r+1}; y^r) − d(y^r) ≥ 0.

◮ If ∆_d^r + ∆_p^r = 0, then an optimal solution is obtained
◮ The Key Step: show that the combined dual and primal gap ∆_d^r + ∆_p^r decreases linearly at each iteration
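
Spelling out what "decreases linearly" means here: a hedged restatement of the key step, consistent with the linear-convergence claim on the result slide (the exact constant and the regularity conditions it requires are those of [Hong-Luo 12] and are not reproduced here).

```latex
% Schematic form of the key estimate: for a sufficiently small dual stepsize \alpha,
% there exists a constant c \in (0, 1), depending on the problem data and on \alpha, with
\Delta_d^{r+1} + \Delta_p^{r+1} \;\le\; c\left(\Delta_d^{r} + \Delta_p^{r}\right),
\qquad r = 1, 2, \ldots
```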

SLIDE 24

Illustration of the Gaps (iteration r)

Figure: Illustration of the reduction of the combined gap.

SLIDE 25

Illustration of the Gaps (iteration r + 1)

Figure: Illustration of the reduction of the combined gap.

SLIDE 26

Illustration of the Gaps (iteration r + 2)

Figure: Illustration of the reduction of the combined gap.

SLIDE 27

Agenda

◮ The ADMM for multi-block structured convex optimization
   - The main steps of the algorithm
   - Rate of convergence analysis
◮ The BSUM-M for multi-block structured convex optimization
   - The main steps of the algorithm
   - Convergence analysis
◮ The flexible ADMM for structured nonconvex optimization
   - The main steps of the algorithm
   - Convergence analysis

◮ Conclusions

SLIDE 28

The BSUM-M Algorithm: Motivation and Main Ideas

◮ Questions
   - Can we do inexact primal updates (i.e., proximal updates)?
   - How should the dual stepsize α be chosen?
   - Can we consider more flexible block selection rules?
◮ To address these questions, we introduce the Block Successive Upper-bound Minimization method of Multipliers (BSUM-M)
◮ Main idea: primal update
   - Pick the primal variables either sequentially or randomly
   - Optimize an approximate version of L(x; y)
◮ Main idea: dual update
   - Inexact dual ascent + proper stepsize control

SLIDE 29

The BSUM-M Algorithm: Details

◮ At iteration r + 1, a block variable x_k is updated by solving

    \min_{x_k ∈ X_k}  u_k(x_k; x_1^{r+1}, · · · , x_{k−1}^{r+1}, x_k^r, · · · , x_K^r) + ⟨y^{r+1}, q − E_k x_k⟩ + h_k(x_k)

◮ u_k(· ; x_1^{r+1}, · · · , x_{k−1}^{r+1}, x_k^r, · · · , x_K^r) is an upper bound of g(x) + (ρ/2)\|q − Ex\|^2 at the current iterate (x_1^{r+1}, · · · , x_{k−1}^{r+1}, x_k^r, · · · , x_K^r)
◮ The proximal gradient step and the proximal point step are special cases (one such surrogate is sketched below)
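
As a concrete, hedged illustration of an admissible surrogate, consider the proximal-gradient-type choice below: it linearizes only the smooth part g around the current point and keeps the quadratic penalty exact, which is one way to realize the "proximal gradient step" special case mentioned above (the surrogates used in [Hong et al 13] may differ in details).

```latex
% One candidate upper bound u_k(v_k; x), written at a generic point x.
% It assumes the block gradient \nabla_{x_k} g is L_k-Lipschitz, so that the
% quadratic term majorizes g in the k-th block:
u_k(v_k; x) \;=\; g(x)
  \;+\; \big\langle \nabla_{x_k} g(x),\, v_k - x_k \big\rangle
  \;+\; \tfrac{L_k}{2}\,\| v_k - x_k \|^2
  \;+\; \tfrac{\rho}{2}\,\| E_k v_k - q + E_{-k} x_{-k} \|^2 .
```

With this choice, u_k is quadratic in v_k, hence strongly convex with a Lipschitz continuous gradient, matching conditions (d)-(e) of Assumption B stated later in the convergence analysis.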

SLIDE 30

The BSUM-M Algorithm: G-S Update Rule

The BSUM-M Algorithm. At each iteration r ≥ 1:

    y^{r+1} = y^r + α^r (q − E x^r) = y^r + α^r \Big(q − \sum_{k=1}^{K} E_k x_k^r\Big),

    x_k^{r+1} = \arg\min_{x_k ∈ X_k}  u_k(x_k; w_k^{r+1}) − ⟨y^{r+1}, E_k x_k⟩ + h_k(x_k),  ∀ k,

where α^r > 0 is the dual stepsize.

◮ To simplify notation, we have defined

    w_k^{r+1} := (x_1^{r+1}, · · · , x_{k−1}^{r+1}, x_k^r, x_{k+1}^r, · · · , x_K^r).

SLIDE 31

The BSUM-M Algorithm: Randomized Update Rule

◮ Select a vector {p_k > 0}_{k=0}^{K} such that \sum_{k=0}^{K} p_k = 1
◮ Each iteration "t" only updates a single, randomly selected primal or dual variable

The Randomized BSUM-M Algorithm. At iteration t ≥ 1, pick k ∈ {0, · · · , K} with probability p_k, and

If k = 0:
    y^{t+1} = y^t + α^t (q − E x^t),   x_k^{t+1} = x_k^t,  k = 1, · · · , K.
Else if k ∈ {1, · · · , K}:
    x_k^{t+1} = \arg\min_{x_k ∈ X_k}  u_k(x_k; x^t) − ⟨y^t, E_k x_k⟩ + h_k(x_k),   x_j^{t+1} = x_j^t, ∀ j ≠ k,
    y^{t+1} = y^t.
End

SLIDE 32

Key Features

◮ Primal update similar to (randomized) BCD [Nesterov 12] [Richtárik-Takáč 12] [Saha-Tewari 13], but can deal with linear coupling constraints
◮ Primal-dual update similar to ADMM, but can deal with multiple coupled blocks
◮ Uses approximate upper-bound functions – closed-form subproblems
◮ Flexibility in the update schedule – deterministic + randomized
◮ Key Questions
   - How to select the approximate upper-bound function?
   - How to select the primal/dual stepsizes (ρ, α)?
   - Guaranteed convergence?

SLIDE 33

Convergence Analysis: Assumptions

◮ Assumption A (on the problem)
   (a) Problem (1.1) is convex and feasible
   (b) g(x) = ℓ(Ax) + ⟨x, b⟩; ℓ(·) is smooth and strictly convex; A is not necessarily of full column rank
   (c) Nonsmooth functions h_k: h_k(x_k) = λ_k \|x_k\|_1 + \sum_{J} w_J \|x_{k,J}\|_2, where x_k = (· · · , x_{k,J}, · · · ) is a partition of x_k; λ_k ≥ 0 and w_J ≥ 0 are some constants.
   (d) The feasible sets {X_k} are compact polyhedral sets, given by X_k := {x_k | C_k x_k ≤ c_k}.

SLIDE 34

Convergence Analysis: Assumptions

◮ Assumption B (on u_k)
   (a) u_k(v_k; x) ≥ g(v_k, x_{−k}) + (ρ/2) \|E_k v_k − q + E_{−k} x_{−k}\|^2,  ∀ v_k ∈ X_k, ∀ x, k   (upper bound)
   (b) u_k(x_k; x) = g(x) + (ρ/2) \|Ex − q\|^2,  ∀ x, k   (locally tight)
   (c) ∇u_k(x_k; x) = ∇_k ( g(x) + (ρ/2) \|Ex − q\|^2 ),  ∀ x, k
   (d) For any given x, u_k(v_k; x) is strongly convex in v_k
   (e) For any given x, u_k(v_k; x) has a Lipschitz continuous gradient

Figure: Illustration of the upper-bound.

SLIDE 35

The Convergence Result [Hong et al 13]

Suppose Assumptions A-B hold, and the dual stepsize α^r satisfies

    \sum_{r=1}^{∞} α^r = ∞,   \lim_{r→∞} α^r = 0.

Then we have the following:

◮ For the BSUM-M, we have \lim_{r→∞} \|E x^r − q\| = 0, and every limit point of {x^r, y^r} is a primal and dual optimal solution.
◮ For the RBSUM-M, we have \lim_{t→∞} \|E x^t − q\| = 0 w.p.1. Further, every limit point of {x^t, y^t} is a primal and dual optimal solution w.p.1.

SLIDE 36

Numerical Result: Counterexample for multi-block ADMM

◮ Recently, [Chen-He-Ye-Yuan 13] showed (through an example) that applying ADMM to a multi-block problem can diverge
◮ We show that applying (R)BSUM-M to the same problem converges
◮ Main message: dual stepsize control is crucial
◮ Consider the following linear system of equations (unique solution x_1 = x_2 = x_3 = 0):

    E_1 x_1 + E_2 x_2 + E_3 x_3 = 0,   with   [E_1, E_2, E_3] = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 2 \\ 1 & 2 & 2 \end{bmatrix}.
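
A minimal numerical sketch of the BSUM-M behavior reported on the next slide: each x_k is a scalar, the surrogate u_k is taken to be the exact block minimization of the augmented Lagrangian (allowed by Assumption B), and the dual stepsize follows the diminishing schedule α^r = 1/r. The penalty ρ, the schedule, and the iteration count are illustrative choices, not the ones used for the figures.

```python
import numpy as np

# Counterexample data of [Chen-He-Ye-Yuan 13]: find x with E_1 x_1 + E_2 x_2 + E_3 x_3 = 0.
E = np.array([[1., 1., 1.],
              [1., 1., 2.],
              [1., 2., 2.]])
q = np.zeros(3)

rho = 1.0                          # penalty parameter (illustrative)
rng = np.random.default_rng(0)
x = rng.standard_normal(3)         # x[k] is the k-th (scalar) block
y = np.zeros(3)                    # dual variable

for r in range(1, 5001):
    alpha_r = 1.0 / r              # diminishing dual stepsize: sum diverges, alpha_r -> 0

    # Dual update first, as in the G-S BSUM-M rule: y <- y + alpha_r (q - E x).
    y = y + alpha_r * (q - E @ x)

    # Gauss-Seidel primal sweep; with f = 0 the exact block minimizer of
    # L(x; y) = <y, q - Ex> + (rho/2)||q - Ex||^2 over x_k has a closed form.
    for k in range(3):
        rest = E @ x - E[:, k] * x[k]          # contribution of the other blocks
        ek = E[:, k]
        x[k] = ek @ (y / rho + q - rest) / (ek @ ek)

# Both quantities should shrink toward zero as the iterations proceed.
print("constraint violation ||E x|| =", np.linalg.norm(E @ x - q))
print("x =", x)
```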

SLIDE 37

Counterexample for multi-block ADMM (cont.)

Figure: Iterates (x1, x2, x3, and ‖x1 + x2 + x3‖) vs. iteration r, generated by the BSUM-M algorithm. Each curve is averaged over 1000 runs (with random starting points).

Figure: Iterates (x1, x2, x3, and ‖x1 + x2 + x3‖) vs. iteration t, generated by the RBSUM-M algorithm. Each curve is averaged over 1000 runs (with random starting points).

SLIDE 38

Agenda

◮ The ADMM for multi-block structured convex optimization
   - The main steps of the algorithm
   - Rate of convergence analysis
◮ The BSUM-M for multi-block structured convex optimization
   - The main steps of the algorithm
   - Convergence analysis
◮ The flexible ADMM for structured nonconvex optimization
   - The main steps of the algorithm
   - Convergence analysis

◮ Conclusions

SLIDE 39

ADMM for nonconvex problem?

◮ ADMM is known to work for separable convex problems
◮ But ADMM is also known to work well for nonconvex problems, at least empirically:
   - Nonnegative matrix factorization [Zhang 10] [Sun-Fevotte 14]
   - Phase retrieval [Wen et al 12]
   - Distributed matrix factorization [Ling-Xu-Yin-Wen 12]
   - Polynomial optimization [Jiang-Ma-Zhang 13]
   - Asset allocation [Wen et al 13]
   - Zero-variance discriminant analysis [Ames-Hong 14]
   - ...
◮ Although ADMM works very well empirically, theoretically little is known
◮ To show convergence, most existing analyses assume favorable properties of the iterates generated by the algorithm...

SLIDE 40

Convergence analysis of ADMM for nonconvex problems

◮ It is indeed possible to show that ADMM globally converges for nonconvex problems [Hong-Luo 14]:
   - For a family of nonconvex consensus problems
   - For a family of nonconvex, multi-block sharing problems
◮ Key ingredients:
   - Consider the vanilla ADMM
   - Keep the primal and dual stepsizes identical (α = ρ)
   - Choose ρ large enough to make each subproblem strongly convex
   - Use the augmented Lagrangian as the potential function
◮ Our analysis extends to flexible block selection rules:
   - Gauss-Seidel block selection rule
   - Randomized block selection rule
   - Essentially cyclic block selection rule

SLIDE 41

The Consensus Problem

◮ Consider the following nonconvex problem

    \min  f(x) := \sum_{k=1}^{K} g_k(x) + h(x)   s.t.  x ∈ X        (3.5)

◮ g_k: smooth, possibly nonconvex functions
◮ h: a convex nonsmooth regularization term
◮ This is the global consensus problem discussed heavily in [Section 7, Boyd et al 11], but there only the convex case is considered

SLIDE 42

The Consensus Problem (cont.)

◮ In some applications, each g_k is handled by a single agent
◮ This motivates the following consensus formulation:

    \min  \sum_{k=1}^{K} g_k(x_k) + h(x)   s.t.  x_k = x, ∀ k = 1, · · · , K,  x ∈ X.        (3.6)

◮ The augmented Lagrangian is given by

    L({x_k}, x; y) = \sum_{k=1}^{K} g_k(x_k) + h(x) + \sum_{k=1}^{K} ⟨y_k, x_k − x⟩ + \sum_{k=1}^{K} (ρ_k/2) \|x_k − x\|^2.

SLIDE 43

The ADMM for the Consensus Problem

Algorithm 1. ADMM for the Consensus Problem. At each iteration t + 1, compute:

    x^{t+1} = \arg\min_{x ∈ X}  L({x_k^t}, x; y^t).        (3.7)

Each node k computes x_k by solving:

    x_k^{t+1} = \arg\min_{x_k}  g_k(x_k) + ⟨y_k^t, x_k − x^{t+1}⟩ + (ρ_k/2) \|x_k − x^{t+1}\|^2.        (3.8)

Update the dual variables:

    y_k^{t+1} = y_k^t + ρ_k (x_k^{t+1} − x^{t+1}).        (3.9)
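
A minimal runnable sketch of Algorithm 1 on a toy nonconvex instance. The local costs g_k (indefinite quadratics), h ≡ 0, X = ℜ^n, and the penalty choices are illustrative assumptions, picked so that (3.7)-(3.8) have closed forms and the strong-convexity requirement of Assumption C (next slide) can be enforced directly.

```python
import numpy as np

rng = np.random.default_rng(1)
K, n = 4, 5

# Toy local costs g_k(x) = 0.5 x^T A_k x + b_k^T x with A_k symmetric, possibly
# indefinite (hypothetical data). A small shift makes sum_k A_k positive definite,
# so f is lower bounded (Assumption C3) while individual g_k may stay nonconvex.
A = [0.5 * (M + M.T) for M in (rng.standard_normal((n, n)) for _ in range(K))]
shift = max(0.0, -np.linalg.eigvalsh(sum(A)).min()) + 1.0
A = [Ak + (shift / K) * np.eye(n) for Ak in A]
b = [rng.standard_normal(n) for _ in range(K)]

# Penalties per Assumption C: rho_k comfortably larger than L_k = ||A_k||_2 so the
# x_k-subproblem (Hessian A_k + rho_k I) is strongly convex (heuristic margin).
rho = [3.0 * np.linalg.norm(Ak, 2) + 1.0 for Ak in A]

xk = [rng.standard_normal(n) for _ in range(K)]   # local copies
yk = [np.zeros(n) for _ in range(K)]              # dual variables
x = np.zeros(n)                                   # consensus variable

for t in range(500):
    # (3.7) consensus update: argmin_x sum_k <y_k, x_k - x> + (rho_k/2)||x_k - x||^2.
    x = sum(rho[k] * xk[k] + yk[k] for k in range(K)) / sum(rho)

    # (3.8) local updates: (A_k + rho_k I) x_k = rho_k x - y_k - b_k.
    for k in range(K):
        xk[k] = np.linalg.solve(A[k] + rho[k] * np.eye(n), rho[k] * x - yk[k] - b[k])

    # (3.9) dual updates.
    for k in range(K):
        yk[k] = yk[k] + rho[k] * (xk[k] - x)

# The consensus gap ||x_k - x|| is expected to shrink toward zero, per the result that follows.
print("max consensus gap:", max(np.linalg.norm(xk[k] - x) for k in range(K)))
```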

SLIDE 44

Main Assumptions

Assumption C

C1. Each ∇g_k is Lipschitz continuous with constant L_k; h is convex (possibly nonsmooth).
C2. For all k, the stepsize ρ_k is chosen large enough such that:
   ◮ the x_k subproblem is strongly convex with modulus γ_k(ρ_k);
   ◮ ρ_k > max{ 2L_k^2 / γ_k(ρ_k), L_k }.
C3. f(x) is lower bounded for all x ∈ X.

SLIDE 45

Convergence Analysis [Hong-Luo 14]

Suppose Assumption C is satisfied. Then \lim_{t→∞} \|x_k^{t+1} − x^{t+1}\| = 0. Further, we have the following:

◮ Any limit point of the sequence generated by the ADMM is a stationary solution of problem (3.6).
◮ If X is a compact set, then the sequence converges to the set of stationary solutions of problem (3.6).

◮ Primal feasibility is always satisfied in the limit
◮ No assumptions are made on the iterates

SLIDE 46

The Sharing Problem

◮ Consider the following problem

    \min  f(x_1, · · · , x_K) := \sum_{k=1}^{K} g_k(x_k) + ℓ\Big(\sum_{k=1}^{K} A_k x_k\Big)   s.t.  x_k ∈ X_k, k = 1, · · · , K.        (3.10)

◮ ℓ: smooth, nonconvex
◮ g_k: either smooth nonconvex, or convex (possibly nonsmooth)
◮ Similar to the well-known sharing problem discussed in [Section 7.3, Boyd et al 11], but allows a nonconvex objective
46 / 57

slide-47
SLIDE 47

Introduction The ADMM Algorithm The Main Result

Reformulation

◮ This problem can be equivalently formulated into

min

K

  • k=1

gk(xk) + ℓ (x) s.t.

K

  • k=1

Akxk = x, xk ∈ Xk, k = 1, · · · , K. (3.11)

◮ A K-block, nonconvex reformulation ◮ Even if gk’s and ℓ are convex, not clear whether ADMM

converges
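
For reference, the augmented Lagrangian that the ADMM operates on for this reformulation can be written as below; this is a hedged reconstruction following the generic definition (1.2) applied to the linear constraint of (3.11), so the sign convention on the multiplier term may differ from the one used in [Hong-Luo 14].

```latex
L(\{x_k\}, x; y) \;=\; \sum_{k=1}^{K} g_k(x_k) \;+\; \ell(x)
  \;+\; \Big\langle y,\; \sum_{k=1}^{K} A_k x_k - x \Big\rangle
  \;+\; \frac{\rho}{2}\,\Big\| \sum_{k=1}^{K} A_k x_k - x \Big\|^2 .
```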

SLIDE 48

Main Assumptions

Assumption D

D1. ∇ℓ(x) is Lipschitz continuous with constant L; each A_k has full column rank, with λ_min(A_k^T A_k) > 0.
D2. The stepsize ρ is chosen large enough such that:
   (1) each x_k and x subproblem is strongly convex, with moduli {γ_k(ρ)}_{k=1}^{K} and γ(ρ), respectively;
   (2) ρ > max{ 2L^2 / γ(ρ), L }.
D3. f(x_1, · · · , x_K) is lower bounded for all x_k ∈ X_k and all k.
D4. Each g_k is either nonconvex and Lipschitz continuous with constant L_k, or convex (possibly nonsmooth).

SLIDE 49

Convergence Analysis [Hong-Luo 14]

Suppose Assumption D is satisfied. Then \lim_{t→∞} \|\sum_{k=1}^{K} A_k x_k^{t+1} − x^{t+1}\| = 0. Further, we have the following:

◮ Every limit point generated by the ADMM is a stationary solution of problem (3.11).
◮ If X_k is a compact set for all k, then the ADMM converges to the set of stationary solutions of problem (3.11).

◮ Primal feasibility is always satisfied in the limit
◮ No assumptions are made on the iterates

SLIDE 50

Remarks

◮ For the sharing problem, if all objectives are convex, our result shows that multi-block ADMM converges with ρ ≥ √2 L
◮ A similar analysis applies to the 2-block reformulation of the sharing problem
◮ The analysis can be extended to include proximal block updates
◮ The analysis can be generalized to flexible block update rules – not all x_k's need to be updated at the same time

SLIDE 51

Conclusions and Future Works

◮ We have shown the convergence and the rate of convergence of multi-block ADMM without strong convexity
   - The key is to use the combined primal-dual gap as the potential function
◮ We introduce a new algorithm called BSUM-M that can solve multi-block linearly constrained convex problems
   - The key is to use a diminishing dual stepsize
◮ We show that ADMM converges for two families of nonconvex, possibly multi-block problems
   - The key is to use the augmented Lagrangian as the potential function

SLIDE 52

Conclusions and Future Works (cont.)

◮ Iteration complexity analysis for multi-block and/or nonconvex ADMM?
◮ Can we generalize the analysis for nonconvex ADMM to a wider range of problems?

◮ Nonlinearly constrained problems?

SLIDE 53

Thank You!

SLIDE 54

Reference

1. [Ames-Hong 14] Ames, B. and Hong, M.: Alternating directions method of multipliers for ℓ1-penalized zero variance discriminant analysis and principal component analysis. Preprint.
2. [Bertsekas 99] Bertsekas, D.P.: Nonlinear Programming. Athena Scientific.
3. [Boyd et al 11] Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J.: Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning.
4. [Candes 09] Candes, E. and Plan, Y.: Ann. Statist.
5. [Chen-He-Ye-Yuan 13] Chen, C., He, B., Yuan, X., and Ye, Y.: The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent. 2013.
6. [Douglas 56] Douglas, J. and Rachford, H.H.: On the numerical solution of the heat conduction problem in 2 and 3 space variables. Trans. of the American Math. Soc.

SLIDE 55

Reference

7. [Deng 12] Deng, W. and Yin, W.: On the global and linear convergence of the generalized alternating direction method of multipliers. Rice CAAM tech report.
8. [Eckstein 89] Eckstein, J.: Splitting methods for monotone operators with applications to parallel optimization. Ph.D. Thesis, Operations Research Center, MIT.
9. [Nesterov 12] Nesterov, Y.: Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, vol. 22, no. 2, 2012.
10. [Han-Yuan 12] Han, D. and Yuan, X.: A note on the alternating direction method of multipliers. J. Optim. Theory Appl.
11. [He-Yuan 12] He, B.S. and Yuan, X.M.: On the O(1/n) convergence rate of the Douglas-Rachford alternating direction method. SIAM J. Numer. Anal.
12. [Hong-Luo 12] Hong, M. and Luo, Z.-Q.: On the linear convergence of the ADMM algorithm. Manuscript.
13. [Hong-Luo 14] Hong, M. and Luo, Z.-Q.: Convergence analysis of alternating direction method of multipliers for a family of nonconvex problems. Manuscript.

SLIDE 56

Reference

14. [Hong et al 13] Hong, M. et al.: A block successive upper-bound minimization method of multipliers for linearly constrained convex optimization. Manuscript.
15. [Jiang-Ma-Zhang 13] Jiang, B., Ma, S., and Zhang, S.: Alternating direction method of multipliers for real and complex polynomial optimization models. Manuscript.
16. [Lin-Ma-Zhang 14] Lin, T., Ma, S., and Zhang, S.: On the convergence rate of multi-block ADMM. Manuscript, 2014.
17. [Ling-Xu-Yin-Wen 12] Ling, Q. et al.: Decentralized low-rank matrix completion. ICASSP, 2012.
18. [Luo 93] Luo, Z.-Q. and Tseng, P.: On the convergence rate of dual ascent methods for strictly convex minimization. Math. of Oper. Res.
19. [Razaviyayn-Hong-Luo 13] Razaviyayn, M., Hong, M., and Luo, Z.-Q.: A unified convergence analysis of block successive minimization methods for nonsmooth optimization. SIAM J. Opt., 2013.
20. [Richtárik-Takáč 12] Richtárik, P. and Takáč, M.: Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Mathematical Programming, 2012.

SLIDE 57

Reference

21. [Saha-Tewari 13] Saha, A. and Tewari, A.: On the nonasymptotic convergence of cyclic coordinate descent method. SIAM Journal on Optimization, vol. 23, no. 1, 2013.
22. [Tseng 87] Tseng, P. and Bertsekas, D.P.: Relaxation methods for problems with strictly convex separable costs and linear constraints. Math. Prog.
23. [Wang 13] Wang, X., Hong, M., Ma, S., and Luo, Z.-Q.: Solving multiple-block separable convex minimization problems using two-block alternating direction method of multipliers. Manuscript.
24. [Wen et al 12] Wen, Z. et al.: Alternating direction methods for classical and ptychographic phase retrieval. Inverse Problems, 2012.
25. [Yang 11] Yang, J. and Zhang, Y.: Alternating direction algorithms for ℓ1-problems in compressive sensing. SIAM J. on Scientific Comp.
26. [Zhou 10] Zhou, Z., Li, X., Wright, J., Candes, E.J., and Ma, Y.: Stable principal component pursuit. Proceedings of IEEE ISIT.
