Distributed nonsmooth composite optimization via the proximal augmented Lagrangian
Neil K. Dhingra
neilkdh.com · joint work with Sei Zhen Khong and Mihailo Jovanović
LCCC Focus Period on Large-Scale and Distributed Optimization June 9, 2017
minimize_x  f(x) + g(Tx)
◮ f – possibly nonconvex; continuously differentiable
◮ g – convex; often non-differentiable
◮ T – promotes structure in alternate coordinates
◮ g(x) admits an easily computable proximal operator; g(Tx) does not
◮ Proximal operator
  prox_{µg}(v) := argmin_z  g(z) + (1/2µ) ‖z − v‖²
◮ Moreau envelope
  M_{µg}(v) := min_z  g(z) + (1/2µ) ‖z − v‖²
◮ Soft-thresholding – proximal operator for the ℓ1 norm
  prox_{µ‖·‖₁}(v)_i = sign(v_i) max(|v_i| − µ, 0)
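As a concrete sketch (the function names `prox_l1` and `moreau_l1` are illustrative, not from the talk), the soft-thresholding operator and the corresponding Moreau envelope in NumPy:

```python
import numpy as np

def prox_l1(v, mu):
    """Proximal operator of mu * ||.||_1: elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - mu, 0.0)

def moreau_l1(v, mu):
    """Moreau envelope M_{mu g}(v) for g = ||.||_1, evaluated via the prox point."""
    z = prox_l1(v, mu)
    return np.sum(np.abs(z)) + np.sum((v - z) ** 2) / (2 * mu)

v = np.array([2.0, -0.5, 0.1])
z = prox_l1(v, 1.0)      # entries with |v_i| <= mu are set to zero
env = moreau_l1(v, 1.0)  # smooth lower approximation of ||v||_1
```

The quantity `(v - z) / mu` is the gradient of the Moreau envelope, which is what makes the envelope useful as a smooth surrogate for g.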
minimize_{x,z}  f(x) + g(z)   subject to  Tx − z = 0
◮ Decouples f and g
◮ Can use methods for constrained optimization
Augmented Lagrangian:
  Lµ(x, z; y) = f(x) + g(z) + yᵀ(Tx − z) + (1/2µ) ‖Tx − z‖²
Method of multipliers: minimize Lµ jointly over (x, z), then update y
◮ Gradient ascent on a strengthened dual problem
◮ Requires joint minimization over x and z
◮ Well-studied: convergence to local minimum, adaptive µ update, …
ADMM:
  x^{k+1} = argmin_x  Lµ(x, z^k; y^k)
  z^{k+1} = argmin_z  Lµ(x^{k+1}, z; y^k)
  y^{k+1} = y^k + (1/µ)(Tx^{k+1} − z^{k+1})
◮ Convenient for distributed implementation
◮ Convergence speed influenced by µ
◮ Challenge: convergence for nonconvex f
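To make the ADMM updates concrete, here is a minimal NumPy sketch for the special case T = I with f(x) = ½‖Ax − b‖² and g = γ‖·‖₁ (a lasso problem); the function name and parameter choices are illustrative, not from the talk:

```python
import numpy as np

def admm_lasso(A, b, gam, mu=1.0, iters=300):
    """ADMM for 0.5*||Ax - b||^2 + gam*||z||_1  s.t.  x - z = 0  (T = I)."""
    n = A.shape[1]
    x, z, y = np.zeros(n), np.zeros(n), np.zeros(n)
    M = A.T @ A + np.eye(n) / mu    # x-step normal equations; factor once in practice
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(M, Atb - y + z / mu)                 # x-minimization
        v = x + mu * y
        z = np.sign(v) * np.maximum(np.abs(v) - mu * gam, 0.0)   # z-step: soft-threshold
        y = y + (x - z) / mu                                     # dual update
    return z

A = np.eye(2)
b = np.array([3.0, 0.1])
x_hat = admm_lasso(A, b, gam=1.0)   # approaches soft-threshold of b: about [2, 0]
```

Note that f and g are handled in separate steps, which is exactly the decoupling the splitting buys.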
◮ Minimize the augmented Lagrangian analytically over z:
  z*µ(x, y) = prox_{µg}(Tx + µy)
◮ Proximal augmented Lagrangian:
  Lµ(x; y) := Lµ(x, z*µ(x, y); y) = f(x) + M_{µg}(Tx + µy) − (µ/2) ‖y‖²
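The completion identity behind the proximal augmented Lagrangian, Lµ(x; y) = f(x) + M_{µg}(Tx + µy) − (µ/2)‖y‖², can be checked numerically; this is an illustrative sketch with g = ‖·‖₁ and hypothetical helper names:

```python
import numpy as np

def prox_l1(v, mu):
    return np.sign(v) * np.maximum(np.abs(v) - mu, 0.0)

def aug_lagrangian(x, z, y, mu, f, T):
    """L_mu(x, z; y) = f(x) + ||z||_1 + y'(Tx - z) + (1/2mu)||Tx - z||^2."""
    r = T @ x - z
    return f(x) + np.sum(np.abs(z)) + y @ r + r @ r / (2 * mu)

def prox_aug_lagrangian(x, y, mu, f, T):
    """L_mu(x; y) = f(x) + M_{mu g}(Tx + mu y) - (mu/2)||y||^2, with g = ||.||_1."""
    v = T @ x + mu * y
    z = prox_l1(v, mu)
    M = np.sum(np.abs(z)) + np.sum((v - z) ** 2) / (2 * mu)
    return f(x) + M - (mu / 2) * np.sum(y ** 2)
```

Evaluating both at z* = prox_{µg}(Tx + µy) gives the same number, and any other z gives a larger augmented-Lagrangian value, confirming that z* is the minimizer.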
Method of multipliers with the proximal augmented Lagrangian:
  x^{k+1} = argmin_x  Lµ(x; y^k)
  y^{k+1} = y^k + (1/µ) ∇y Lµ(x^{k+1}; y^k)
◮ Nonconvex f: convergence to a local minimum
◮ x-minimization step: a differentiable problem
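One way to sketch this method-of-multipliers iteration in NumPy, for T = I, f(x) = ½‖x − b‖², g = γ‖·‖₁; the inner gradient-descent loop, step sizes, and iteration counts are illustrative guesses:

```python
import numpy as np

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def mm_prox_al(b, gam, mu=1.0, outer=30, inner=200, step=0.2):
    """Method of multipliers on the proximal augmented Lagrangian (T = I)."""
    x = np.zeros_like(b)
    y = np.zeros_like(b)
    for _ in range(outer):
        # inner loop: gradient descent on the differentiable function L_mu(. ; y)
        for _ in range(inner):
            v = x + mu * y
            grad = (x - b) + (v - soft(v, mu * gam)) / mu   # grad f + Moreau term
            x = x - step * grad
        # dual ascent: y <- y + (1/mu) grad_y L_mu = y + (1/mu)(x - z*)
        y = y + (x - soft(x + mu * y, mu * gam)) / mu
    return x

b = np.array([3.0, 0.1])
x_hat = mm_prox_al(b, gam=1.0)   # approaches [2, 0]
```

The key point is that the x-step is an unconstrained smooth minimization, since the Moreau envelope is continuously differentiable.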
Primal-dual gradient flow:
  ẋ = −∇x Lµ(x; y)
  ẏ = +∇y Lµ(x; y)
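These flow equations can be simulated with a forward-Euler discretization; a minimal sketch for T = I, f(x) = ½‖x − b‖², g = γ‖·‖₁ (step size and horizon are illustrative guesses):

```python
import numpy as np

def primal_dual_flow(b, gam, mu=0.5, step=0.05, iters=4000):
    """Forward Euler on xdot = -grad_x L_mu, ydot = +grad_y L_mu
    for T = I, f(x) = 0.5*||x - b||^2, g = gam*||.||_1."""
    x = np.zeros_like(b)
    y = np.zeros_like(b)
    for _ in range(iters):
        v = x + mu * y
        z = np.sign(v) * np.maximum(np.abs(v) - mu * gam, 0.0)  # prox_{mu g}(v)
        grad_x = (x - b) + (v - z) / mu    # grad f + grad of Moreau envelope
        grad_y = x - z                     # grad_y L_mu
        x = x - step * grad_x
        y = y + step * grad_y
    return x

b = np.array([3.0, 0.1])
x_hat = primal_dual_flow(b, gam=1.0)   # approaches [2, 0]
```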
◮ Convenient for distributed implementation
◮ Convex f – asymptotic convergence
◮ Strongly convex f with Lipschitz continuous gradient – linear convergence
Method of multipliers, step by step:
  minimize_x Lµ(x; y^0) → x^1
  y^1 = y^0 + (1/µ) ∇y Lµ(x^1; y^0),  minimize_x Lµ(x; y^1) → x^2
  y^2 = y^1 + (1/µ) ∇y Lµ(x^2; y^1),  minimize_x Lµ(x; y^2) → …
◮ Gradient of the Moreau envelope:
  ∇M_{µg}(v) = (1/µ)(v − prox_{µg}(v))
◮ Distributed implementation if g is separable and T respects the network structure
◮ Each node x_i updates using locally available information
Example:  (1/2) ‖Ax − b‖₂² + …
Distributed optimization:
  minimize_{x1,x2,...}  Σi fi(xi)   subject to  x1 = x2 = · · ·
◮ T chosen so that TᵀT is the Laplacian of a connected network (e.g., T the incidence matrix)
◮ Let ȳ := Tᵀy; with L = TᵀT, the primal-dual dynamics become
  ẋ = −∇f(x) − (1/µ) Lx − ȳ
  ˙ȳ = Lx
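These consensus dynamics can be simulated with forward Euler; the following sketch uses illustrative quadratic local objectives fi(xi) = ½(xi − ai)² and a 3-node path graph (names and parameter values are my own, not from the talk):

```python
import numpy as np

def consensus_flow(a, L, mu=1.0, step=0.02, iters=8000):
    """Primal-dual flow for min sum_i 0.5*(x_i - a_i)^2 subject to consensus.
    Each node only needs its own data and neighbor states (through L)."""
    x = np.zeros_like(a)
    ybar = np.zeros_like(a)     # ybar = T' y lives in the node space
    for _ in range(iters):
        grad_x = (x - a) + (L @ x) / mu + ybar
        grad_y = L @ x
        x = x - step * grad_x
        ybar = ybar + step * grad_y
    return x

# path graph on 3 nodes (Laplacian), local minimizers a_i
L = np.array([[1.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 1.0]])
a = np.array([1.0, 2.0, 6.0])
x_hat = consensus_flow(a, L)    # every node approaches mean(a) = 3
```

Since L is the graph Laplacian, every term in grad_x and grad_y involves only a node's own state and its neighbors', which is what makes the implementation distributed.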
◮ Discrete-time primal-dual: forward Euler discretization of the gradient flow
◮ EXTRA by Shi, Ling, Wu, Yin ‘15
  x^{k+1} = (I + W) x^k − W̃ x^{k−1} − α (∇f(x^k) − ∇f(x^{k−1}))
◮ Correspondence: W = I − (α/µ) L, W̃ = (1/2)(I + W), dual stepsize αy = α/(2µ); an accumulated sequence of iterates recovers the dual variable ȳ^k
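The EXTRA recursion can be sketched on a toy consensus problem; the mixing matrix W, step size α, and quadratic local objectives below are illustrative choices (not from the talk) satisfying EXTRA's standard assumptions:

```python
import numpy as np

def extra(a, W, alpha=0.1, iters=3000):
    """EXTRA: x^{k+1} = (I+W) x^k - Wt x^{k-1} - alpha*(grad(x^k) - grad(x^{k-1}))."""
    n = len(a)
    I = np.eye(n)
    Wt = 0.5 * (I + W)                       # W-tilde = (I + W)/2
    grad = lambda x: x - a                   # gradients of f_i(x_i) = 0.5*(x_i - a_i)^2
    x_prev = np.zeros_like(a)
    x = W @ x_prev - alpha * grad(x_prev)    # initialization step
    for _ in range(iters):
        x, x_prev = (I + W) @ x - Wt @ x_prev - alpha * (grad(x) - grad(x_prev)), x
    return x

# doubly stochastic mixing matrix for a 3-node path graph
W = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])
a = np.array([1.0, 2.0, 6.0])
x_hat = extra(a, W)   # all nodes approach the global minimizer mean(a) = 3
```

Unlike plain distributed gradient descent with a fixed step, EXTRA's correction term drives the iterates to the exact consensus optimum.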
◮ Introduce a Lyapunov function in the error variables x̃, ỹ:
  V = (1/2) x̃ᵀx̃ + (1/2) ỹᵀỹ
◮ Show V̇ ≤ 0 along solutions
◮ Convex f → asymptotic convergence
◮ Write the dynamics as a linear system G in feedback with a static nonlinearity; G contains the term (1/µ) TᵀT
◮ f(x) − (mf/2) ‖x‖² is convex (f is mf-strongly convex)
◮ ∇f is Lf-Lipschitz continuous
◮ Linear convergence
◮ Monitor targets and stay near neighbors
Backup: explicit convergence-rate estimates in terms of the eigenvalues λi of TᵀT and the constants mf, Lf, µ (e.g., terms of the form Lf + µ mf).
◮ Distributed information exchange over edges zij
◮ Want nodes to compute the average: ψi(t) → (1/n) Σj ψj(0)
◮ If Lp is balanced, nodes approach the average
◮ If Lp + Lc is balanced, nodes approach the average
◮ F(z) = Lc is the graph Laplacian of the added edges z
◮ For each node ψi, in-degree equals out-degree: Σj zij = Σj zji
◮ Linear constraint on the added edges: Ez = 0
◮ z = Tx parametrizes balanced graphs: Ez = E(Tx) = 0
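A small numerical check (illustrative, not from the slides) that a balanced directed Laplacian preserves the average and drives every node to it under ψ̇ = −Lψ:

```python
import numpy as np

# Laplacian of a balanced directed 3-cycle: each node's in-degree equals its out-degree
L = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0]])
assert np.allclose(np.ones(3) @ L, 0.0)   # balance: 1'L = 0, so 1'psi is invariant

psi = np.array([0.0, 3.0, 6.0])
h = 0.01
for _ in range(5000):
    psi = psi - h * (L @ psi)   # forward-Euler simulation of psi_dot = -L psi
# psi approaches the initial average (0 + 3 + 6)/3 = 3 at every node
```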
◮ w1ᵀ L1 = 0,  w1 = (1/√3) 𝟙
◮ w2ᵀ L2 = 0,  w2 = (1/√5) 𝟙
◮ The weighted average wᵀψ doesn’t ‘move’, i.e., wᵀψ̇ = −wᵀL ψ = 0
minimize over x:
◮ H2 norm of deviations from the average, plus control effort
◮ Nonconvex problem
◮ Balanced Lc
◮ Minimize the number of edges