SLIDE 1

Accelerated primal-dual methods for linearly constrained convex problems

Yangyang Xu
SIAM Conference on Optimization, May 24, 2017

SLIDE 2

Accelerated proximal gradient

For the convex composite problem

    minimize_x F(x) := f(x) + g(x)

  • f: convex and Lipschitz differentiable
  • g: closed convex (possibly nondifferentiable) and simple

Proximal gradient:

    x^{k+1} = argmin_x ⟨∇f(x^k), x⟩ + (L_f/2)‖x − x^k‖² + g(x)

  • convergence rate: F(x^k) − F(x^*) = O(1/k)

Accelerated proximal gradient [Beck-Teboulle '09, Nesterov '14]:

    x^{k+1} = argmin_x ⟨∇f(x̂^k), x⟩ + (L_f/2)‖x − x̂^k‖² + g(x)

  • x̂^k: extrapolated point
  • convergence rate (with smart extrapolation): F(x^k) − F(x^*) = O(1/k²); a sketch of both methods follows below
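To make the two updates concrete, here is a minimal Python sketch (my illustration, not from the talk) on a toy lasso instance, assuming f(x) = ½‖Ax − y‖² and g = µ‖·‖₁, with the standard FISTA-style extrapolation weight (k − 1)/(k + 2):

```python
import numpy as np

# Minimal sketch: proximal gradient vs. its accelerated variant on a toy
# lasso instance, f(x) = 0.5*||Ax - y||^2, g = mu*||.||_1. A, y, mu are
# hypothetical problem data, not from the talk.

def prox_l1(v, t):
    """prox of t*||.||_1: soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(A, y, mu, iters=500, accelerate=False):
    Lf = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of grad f
    x = x_prev = np.zeros(A.shape[1])
    for k in range(1, iters + 1):
        # extrapolated point x_hat^k; with accelerate=False this is just x^k
        w = (k - 1.0) / (k + 2.0) if accelerate else 0.0
        x_hat = x + w * (x - x_prev)
        grad = A.T @ (A @ x_hat - y)    # grad f at the (extrapolated) point
        x_prev, x = x, prox_l1(x_hat - grad / Lf, mu / Lf)
    return x

# Usage (hypothetical data): x_acc = proximal_gradient(A, y, mu=0.1, accelerate=True)
```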

This talk: ways to accelerate primal-dual methods

SLIDE 3

Part I: accelerated linearized augmented Lagrangian

SLIDE 4

Affinely constrained composite convex problems

    minimize_x F(x) = f(x) + g(x), subject to Ax = b    (LCP)

  • f: convex and Lipschitz differentiable
  • g: closed convex and simple

Examples

  • nonnegative quadratic programming: f(x) = ½ x^⊤Qx + c^⊤x, g = ι_{ℝ^n_+}
  • TV image denoising: min { ½‖X − B‖²_F + λ‖Y‖₁, s.t. D(X) = Y }
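For concreteness, a small sketch (my addition) of how the first example instantiates the pieces of (LCP); Q and c are hypothetical problem data:

```python
import numpy as np

# Nonnegative QP as an instance of (LCP): f(x) = 0.5 x^T Q x + c^T x,
# g = indicator of the nonnegative orthant (its prox is the projection).
# Q, c are hypothetical problem data.

def f(x, Q, c):
    return 0.5 * x @ Q @ x + c @ x

def grad_f(x, Q, c):
    return Q @ x + c

def prox_g(v, t=1.0):
    # prox of an indicator function is the projection, independent of t
    return np.maximum(v, 0.0)
```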

SLIDE 5

Augmented Lagrangian method (ALM)

At iteration k,

    x^{k+1} ← argmin_x f(x) + g(x) − ⟨λ^k, Ax⟩ + (β/2)‖Ax − b‖²,
    λ^{k+1} ← λ^k − γ(Ax^{k+1} − b)

  • augmented dual gradient ascent with stepsize γ
  • β: penalty parameter; the dual gradient has Lipschitz constant 1/β
  • 0 < γ < 2β: convergence guaranteed
  • also popular for (nonlinear, nonconvex) constrained problems

The x-subproblem is as difficult as the original problem; the sketch below uses a special case where it has a closed form.
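As a minimal sketch (my illustration), take the nonnegative-QP example with g dropped (g ≡ 0), so that the x-subproblem becomes an exact linear solve; Q, c, A, b are hypothetical data:

```python
import numpy as np

# Sketch of classical ALM for  min 0.5 x^T Q x + c^T x  s.t.  Ax = b
# (g omitted so the x-subproblem has a closed form; in general it is as
# hard as the original problem). Q, c, A, b are hypothetical data.

def alm_qp(Q, c, A, b, beta=1.0, gamma=1.0, iters=200):
    m, n = A.shape
    x, lam = np.zeros(n), np.zeros(m)
    H = Q + beta * A.T @ A              # Hessian of the augmented Lagrangian
    for _ in range(iters):
        # x-subproblem: solve  (Q + beta A^T A) x = A^T lam + beta A^T b - c
        x = np.linalg.solve(H, A.T @ lam + beta * (A.T @ b) - c)
        lam = lam - gamma * (A @ x - b) # dual update, stepsize gamma < 2*beta
    return x, lam
```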

SLIDE 6

Linearized augmented Lagrangian method

  • Linearize the smooth term f:

    x^{k+1} ← argmin_x ⟨∇f(x^k), x⟩ + (η/2)‖x − x^k‖² + g(x) − ⟨λ^k, Ax⟩ + (β/2)‖Ax − b‖².

  • Linearize both f and ‖Ax − b‖²:

    x^{k+1} ← argmin_x ⟨∇f(x^k), x⟩ + g(x) − ⟨λ^k, Ax⟩ + β⟨A^⊤r^k, x⟩ + (η/2)‖x − x^k‖²,

    where r^k = Ax^k − b is the residual.

Easier updates, with an O(1/k) convergence rate; the fully linearized version is sketched below.
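A minimal sketch of the fully linearized variant (my illustration), with g taken to be the indicator of the nonnegative orthant so the x-update is a projected gradient-type step; grad_f and the data are assumptions:

```python
import numpy as np

# Sketch of the fully linearized ALM (both f and ||Ax - b||^2 linearized).
# With g = indicator of the nonnegative orthant, the x-update is
#   x^{k+1} = prox_{g/eta}(x^k - w^k/eta) = max(x^k - w^k/eta, 0),
# where w^k = grad f(x^k) - A^T lam^k + beta A^T r^k. Data are hypothetical.

def linearized_alm(grad_f, A, b, eta, beta=1.0, gamma=1.0, iters=500):
    m, n = A.shape
    x, lam = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        r = A @ x - b                                  # residual r^k
        w = grad_f(x) - A.T @ lam + beta * (A.T @ r)   # linearized gradient
        x = np.maximum(x - w / eta, 0.0)               # prox (projection) step
        lam = lam - gamma * (A @ x - b)                # multiplier update
    return x, lam
```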

SLIDE 7

Accelerated linearized augmented Lagrangian method

At iteration k,

    x̂^k ← (1 − α_k) x̄^k + α_k x^k,
    x^{k+1} ← argmin_x ⟨∇f(x̂^k) − A^⊤λ^k, x⟩ + g(x) + (β_k/2)‖Ax − b‖² + (η_k/2)‖x − x^k‖²,
    x̄^{k+1} ← (1 − α_k) x̄^k + α_k x^{k+1},
    λ^{k+1} ← λ^k − γ_k(Ax^{k+1} − b).

  • inspired by [Lan '12] on accelerated stochastic approximation
  • reduces to linearized ALM if α_k = 1, β_k = β, η_k = η, γ_k = γ for all k
  • convergence rate: O(1/k) if η ≥ L_f and 0 < γ < 2β
  • adaptive parameters give O(1/k²) (next slides); a sketch follows below
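A minimal sketch of the accelerated scheme (my illustration), specialized to g ≡ 0 so the x-subproblem is an exact linear solve, and using the adaptive parameter schedule from the convergence theorem two slides ahead (with β_k = γ_k, one valid choice satisfying β_k ≥ γ_k/2):

```python
import numpy as np

# Sketch of accelerated linearized ALM with the adaptive schedule
# alpha_k = 2/(k+1), gamma_k = k*gamma, beta_k = gamma_k, eta_k = eta/k
# (eta >= 2*Lf). Specialized to g = 0; grad_f, A, b are hypothetical data.

def accelerated_linearized_alm(grad_f, A, b, eta, gamma, iters=500):
    m, n = A.shape
    x = x_bar = np.zeros(n)
    lam = np.zeros(m)
    AtA, Atb, I = A.T @ A, A.T @ b, np.eye(n)
    for k in range(1, iters + 1):
        alpha_k, gamma_k, eta_k = 2.0 / (k + 1), k * gamma, eta / k
        beta_k = gamma_k
        x_hat = (1 - alpha_k) * x_bar + alpha_k * x   # extrapolated point
        # x-subproblem (g = 0): stationarity gives the linear system
        # (beta_k A^T A + eta_k I) x = eta_k x^k + A^T lam - grad f(x_hat) + beta_k A^T b
        rhs = eta_k * x + A.T @ lam - grad_f(x_hat) + beta_k * Atb
        x = np.linalg.solve(beta_k * AtA + eta_k * I, rhs)
        x_bar = (1 - alpha_k) * x_bar + alpha_k * x   # averaged iterate
        lam = lam - gamma_k * (A @ x - b)             # multiplier update
    return x_bar, lam
```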

SLIDE 8

Better numerical performance

[Figure: two semilog plots versus iteration number (up to 1000). Left: objective error, |objective minus optimal value|; right: feasibility violation. Legend: Nonaccelerated ALM, Accelerated ALM.]

  • Tested on quadratic programming (subproblems solved exactly)
  • Parameters set according to theorem (see next slide)
  • Accelerated ALM significantly better

SLIDE 9

Guaranteed fast convergence

Assumptions:

  • There exists a primal-dual solution pair (x^*, λ^*).
  • ∇f is Lipschitz continuous: ‖∇f(x) − ∇f(y)‖ ≤ L_f‖x − y‖.

Convergence rate of order O(1/k²):

  • Set the parameters to

    ∀k: α_k = 2/(k+1), γ_k = kγ, β_k ≥ γ_k/2, η_k = η/k,

    where γ > 0 and η ≥ 2L_f. Then

    |F(x̄^{k+1}) − F(x^*)| ≤ (1/(k(k+1))) (η‖x^1 − x^*‖² + 4‖λ^*‖²/γ),

    ‖Ax̄^{k+1} − b‖ ≤ (1/(k(k+1) max(1, ‖λ^*‖))) (η‖x^1 − x^*‖² + 4‖λ^*‖²/γ).

SLIDE 10

Sketch of proof

Let Φ(x̄, x, λ) = F(x̄) − F(x) − ⟨λ, Ax̄ − b⟩.

1. Fundamental inequality (for any λ):

    Φ(x̄^{k+1}, x^*, λ) − (1 − α_k)Φ(x̄^k, x^*, λ)
      ≤ −(α_k η_k/2) (‖x^{k+1} − x^*‖² − ‖x^k − x^*‖² + ‖x^{k+1} − x^k‖²)
        + (α_k² L_f/2) ‖x^{k+1} − x^k‖²
        + (α_k/(2γ_k)) (‖λ^k − λ‖² − ‖λ^{k+1} − λ‖² + ‖λ^{k+1} − λ^k‖²)
        − (α_k β_k/γ_k²) ‖λ^{k+1} − λ^k‖².

2. Take α_k = 2/(k+1), γ_k = kγ, β_k ≥ γ_k/2, η_k = η/k, and multiply the inequality by k(k+1):

    k(k+1)Φ(x̄^{k+1}, x^*, λ) − k(k−1)Φ(x̄^k, x^*, λ)
      ≤ −η (‖x^{k+1} − x^*‖² − ‖x^k − x^*‖²) + (1/γ)(‖λ^k − λ‖² − ‖λ^{k+1} − λ‖²).

3. Set λ^1 = 0 and sum the inequality over k (the left-hand side telescopes; see the verification below):

    Φ(x̄^{k+1}, x^*, λ) ≤ (1/(k(k+1))) (η‖x^1 − x^*‖² + ‖λ‖²/γ).

4. Take λ = max(1 + ‖λ^*‖, 2‖λ^*‖) (Ax̄^{k+1} − b)/‖Ax̄^{k+1} − b‖ and use the optimality condition

    Φ(x̄^{k+1}, x^*, λ^*) ≥ 0 ⇒ F(x̄^{k+1}) − F(x^*) ≥ −‖λ^*‖ ‖Ax̄^{k+1} − b‖.
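The summation in step 3 telescopes exactly; a short verification (my addition, standard algebra), writing Φ^j for Φ(x̄^j, x^*, λ):

```latex
% Step 3 telescoping, with \Phi^j := \Phi(\bar{x}^j, x^*, \lambda):
\begin{align*}
\sum_{j=1}^{k}\big[j(j+1)\Phi^{j+1}-j(j-1)\Phi^{j}\big]
  &= k(k+1)\Phi^{k+1},\\
\sum_{j=1}^{k}\Big[\eta\big(\|x^{j}-x^*\|^2-\|x^{j+1}-x^*\|^2\big)
  +\tfrac{1}{\gamma}\big(\|\lambda^{j}-\lambda\|^2-\|\lambda^{j+1}-\lambda\|^2\big)\Big]
  &\le \eta\|x^{1}-x^*\|^2+\tfrac{1}{\gamma}\|\lambda\|^2.
\end{align*}
% First line: the positive term produced at index j cancels the negative
% term at index j+1, and the j = 1 negative term vanishes since 1*0 = 0.
% Second line: both sums telescope; take \lambda^1 = 0 and drop the
% nonpositive remainders. Combining the two lines gives
%   \Phi(\bar{x}^{k+1}, x^*, \lambda)
%     \le \tfrac{1}{k(k+1)}\big(\eta\|x^1-x^*\|^2+\tfrac{1}{\gamma}\|\lambda\|^2\big).
```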

SLIDE 11

Literature

  • [He-Yuan '10]: accelerated ALM to O(1/k²) for smooth problems
  • [Kang et al. '13]: accelerated ALM to O(1/k²) for nonsmooth problems
  • [Huang-Ma-Goldfarb '13]: accelerated linearized ALM (with linearization of the augmented term) to O(1/k²) for strongly convex problems

SLIDE 12

Part II: accelerated linearized ADMM

SLIDE 13

Two-block structured problems

The variable is partitioned into two blocks, the smooth part involves only one block, and the nonsmooth part is separable:

    minimize_{y,z} h(y) + f(z) + g(z), subject to By + Cz = b    (LCP-2)

  • f: convex and Lipschitz differentiable
  • g and h: closed convex and simple

Examples:

  • Total-variation regularized regression:

    min_{y,z} λ‖y‖₁ + f(z), s.t. Dz = y

SLIDE 14

Alternating direction method of multipliers (ADMM)

At iteration k,

    y^{k+1} ← argmin_y h(y) − ⟨λ^k, By⟩ + (β/2)‖By + Cz^k − b‖²,
    z^{k+1} ← argmin_z f(z) + g(z) − ⟨λ^k, Cz⟩ + (β/2)‖By^{k+1} + Cz − b‖²,
    λ^{k+1} ← λ^k − γ(By^{k+1} + Cz^{k+1} − b)

  • 0 < γ < ((1+√5)/2)β: convergence guaranteed [Glowinski-Marrocco '75]
  • updating y and z alternately is easier than updating them jointly (see the structural sketch below)
  • but the z-subproblem can still be difficult
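A structural sketch of the loop (my illustration); the two subproblem solvers are passed in as oracles, since their form is problem-dependent:

```python
import numpy as np

# Structural sketch of ADMM for (LCP-2). solve_y and solve_z are assumed
# oracles returning the exact minimizers of the two subproblems above;
# B, C, b are hypothetical problem data.

def admm(solve_y, solve_z, B, C, b, beta=1.0, gamma=1.0, iters=300):
    y, z = np.zeros(B.shape[1]), np.zeros(C.shape[1])
    lam = np.zeros(B.shape[0])
    for _ in range(iters):
        y = solve_y(z, lam, beta)                 # minimize over y, z fixed
        z = solve_z(y, lam, beta)                 # minimize over z, new y fixed
        lam = lam - gamma * (B @ y + C @ z - b)   # multiplier update
    return y, z, lam
```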

SLIDE 15

Accelerated linearized ADMM

At iteration k,

    y^{k+1} ← argmin_y h(y) − ⟨λ^k, By⟩ + (β_k/2)‖By + Cz^k − b‖²,
    z^{k+1} ← argmin_z ⟨∇f(z^k) − C^⊤λ^k + β_k C^⊤ r^{k+1/2}, z⟩ + g(z) + (η_k/2)‖z − z^k‖²,
    λ^{k+1} ← λ^k − γ_k(By^{k+1} + Cz^{k+1} − b),

where r^{k+1/2} = By^{k+1} + Cz^k − b.

  • reduces to linearized ADMM if β_k = β, η_k = η, γ_k = γ for all k
  • convergence rate: O(1/k) if 0 < γ ≤ β and η ≥ L_f + β‖C‖²
  • O(1/k²) with adaptive parameters and strong convexity in z (next two slides); the z-update is sketched below
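A minimal sketch of the linearized z-update (my illustration), assuming g = µ‖·‖₁ so that the prox is soft-thresholding:

```python
import numpy as np

# One linearized z-update: the step reduces to
#   z^{k+1} = prox_{g/eta_k}(z^k - w^k/eta_k),
# with w^k = grad f(z^k) - C^T lam^k + beta_k C^T r^{k+1/2}. Here
# g = mu*||.||_1 is an assumed choice; grad_f and the data are hypothetical.

def z_update(z, y_new, lam, grad_f, B, C, b, beta_k, eta_k, mu):
    r_half = B @ y_new + C @ z - b                       # r^{k+1/2}
    w = grad_f(z) - C.T @ lam + beta_k * (C.T @ r_half)  # linearized gradient
    v = z - w / eta_k                                    # gradient-type step
    return np.sign(v) * np.maximum(np.abs(v) - mu / eta_k, 0.0)  # prox of g
```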

SLIDE 16

Accelerated convergence speed

Assumptions:

  • There exists a primal-dual solution (y^*, z^*, λ^*).
  • ∇f is Lipschitz continuous: ‖∇f(ẑ) − ∇f(z̃)‖ ≤ L_f‖ẑ − z̃‖.
  • f is strongly convex with modulus µ_f (no strong convexity required in the y-block).

Convergence rate of order O(1/k²):

  • Set the parameters as follows (with γ > 0 and γ < η ≤ µ_f/2):

    ∀k: β_k = γ_k = (k+1)γ, η_k = (k+1)η + L_f.

    Then

    max( ‖z^k − z^*‖², |F(ȳ^k, z̄^k) − F^*|, ‖Bȳ^k + Cz̄^k − b‖ ) ≤ O(1/k²),

    where F(y, z) = h(y) + f(z) + g(z) and F^* = F(y^*, z^*).

SLIDE 17

Sketch of proof

1. Fundamental inequality from the optimality conditions of each iterate:

    F(y^{k+1}, z^{k+1}) − F(y, z) − ⟨λ, By^{k+1} + Cz^{k+1} − b⟩
      ≤ −(1/γ_k) ⟨λ^k − λ^{k+1}, λ − λ^k + (β_k/γ_k)(λ^k − λ^{k+1}) − β_k C(z^{k+1} − z^k)⟩
        + (L_f/2)‖z^{k+1} − z^k‖² − (µ_f/2)‖z^k − z‖² − η_k ⟨z^{k+1} − z, z^{k+1} − z^k⟩.

2. Plug in the parameters and bound the cross terms:

    F(y^{k+1}, z^{k+1}) − F(y^*, z^*) − ⟨λ, By^{k+1} + Cz^{k+1} − b⟩
      + (1/2)(η(k+1)‖z^{k+1} − z^*‖² + L_f‖z^{k+1} − z^*‖²) + (1/(2γ(k+1)))‖λ − λ^{k+1}‖²
      ≤ (1/2)(η(k+1)‖z^k − z^*‖² + (L_f − µ_f)‖z^k − z^*‖²) + (1/(2γ(k+1)))‖λ − λ^k‖².

3. Multiply by k + k_0 (here k_0 ≈ 2L_f/µ_f) and sum the inequality over k:

    F(ȳ^{k+1}, z̄^{k+1}) − F(y^*, z^*) − ⟨λ, Bȳ^{k+1} + Cz̄^{k+1} − b⟩ ≤ φ(y^*, z^*, λ)/k².

4. Take a special λ and use the KKT conditions.

SLIDE 18

Literature

  • [Ouyang et al. '15]: O(L_f/k² + C_0/k) with only weak convexity
  • [Goldstein et al. '14]: O(1/k²) with strong convexity on both y and z
  • [Chambolle-Pock '11, Chambolle-Pock '16, Dang-Lan '14, Bredies-Sun '16]: accelerated first-order methods for bilinear saddle-point problems

Open question: what are the weakest conditions under which O(1/k²) is attainable?

SLIDE 19

Numerical experiments

(More results in paper)

SLIDE 20

Accelerated (linearized) ADMM

Tested problem: total-variation regularized image denoising

    minimize_{X,Y} ½‖X − B‖²_F + µ‖Y‖₁, subject to D(X) = Y    (TVDN)

  • B: the observed noisy Cameraman image; D: the finite difference operator (sketched below)
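A small sketch (my illustration) of the (TVDN) building blocks: a forward-difference operator for D and the Y-subproblem, which reduces to soft-thresholding. Shapes, boundary handling, and the multiplier sign convention are assumptions:

```python
import numpy as np

# Building blocks for (TVDN). D stacks horizontal and vertical forward
# differences (replicated boundary, so the last difference is zero).

def D(X):
    dx = np.diff(X, axis=1, append=X[:, -1:])   # horizontal differences
    dy = np.diff(X, axis=0, append=X[-1:, :])   # vertical differences
    return np.stack([dx, dy])

def y_update(DX, Lam, mu, beta):
    """argmin_Y  mu*||Y||_1 + <Lam, Y> + (beta/2)*||Y - DX||_F^2,
    i.e. soft-thresholding of DX - Lam/beta at level mu/beta."""
    V = DX - Lam / beta
    return np.sign(V) * np.maximum(np.abs(V) - mu / beta, 0.0)
```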

Compared methods:

  • original ADMM
  • accelerated ADMM
  • linearized ADMM
  • accelerated linearized ADMM
  • accelerated Chambolle-Pock

SLIDE 21

Performance of compared methods

[Figure: two semilog plots of |objective minus optimal value|, versus iteration number (left, up to 500) and versus running time in seconds (right, up to 50). Legend: Accelerated ADMM, Accelerated Linearized ADMM, Nonaccelerated ADMM, Nonaccelerated Linearized ADMM, Chambolle-Pock.]

  • Accelerated (linearized) ADMM is significantly better than its nonaccelerated counterpart
  • (Accelerated) ADMM is faster than (accelerated) linearized ADMM in iteration count, but the latter takes less running time

SLIDE 22

Conclusions

  • accelerated linearized ALM: O(1/k) improved to O(1/k²) under mere convexity
  • accelerated (linearized) ADMM: O(1/k) improved to O(1/k²) under strong convexity in one block variable
  • numerical experiments confirm the acceleration

SLIDE 23

References

  1. Y. Xu. Accelerated first-order primal-dual proximal methods for linearly constrained composite convex programming. SIAM Journal on Optimization, 2017.
  2. T. Goldstein, B. O'Donoghue, S. Setzer, and R. Baraniuk. Fast alternating direction optimization methods. SIAM Journal on Imaging Sciences, 2014.
  3. B. He and X. Yuan. On the acceleration of augmented Lagrangian method for linearly constrained optimization. Optimization Online, 2010.
  4. B. Huang, S. Ma, and D. Goldfarb. Accelerated linearized Bregman method. Journal of Scientific Computing, 2013.
  5. M. Kang, S. Yun, H. Woo, and M. Kang. Accelerated Bregman method for linearly constrained ℓ1-ℓ2 minimization. Journal of Scientific Computing, 2013.
