


SLIDE 1

Limited memory Kelley’s Method Converges for Composite Convex and Submodular Objectives

Madeleine Udell Operations Research and Information Engineering Cornell University Song Zhou (Cornell), Swati Gupta (Georgia Tech) NeurIPS, December 2018

1 / 11

SLIDE 2

Problem to solve

minimize g(x) + f (x)

◮ g : Rn → R strongly convex
◮ f : Rn → R the Lovász extension of a submodular function F
  ◮ piecewise linear
  ◮ convex envelope of F
  ◮ generically, exponentially many linear pieces

L-KM solves composite convex + submodular problems whose natural size is exponential with linear memory.

2 / 11

SLIDE 3

Submodular optimization background

◮ Ground set V = {1, …, n}.
◮ F : 2V → R is submodular if for all A, B ⊆ V,

F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B)

◮ the base polytope of F is

B(F) = {w ∈ Rn : w(V) = F(V), w(A) ≤ F(A) ∀ A ⊆ V}

◮ the Lovász extension of F is the homogeneous piecewise-linear convex function

f(x) = max_{w ∈ B(F)} w⊤x

◮ linear optimization over B(F) is easy
◮ ⇒ evaluating f(x) and ∂f(x) is easy

3 / 11
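Since linear optimization over B(F) reduces to Edmonds' greedy algorithm, f(x) and a subgradient can be computed with n evaluations of F. A minimal sketch (the function name `lovasz_extension` and the example choice F(A) = √|A| are illustrative, not from the slides):

```python
import math

def lovasz_extension(F, x):
    # Edmonds' greedy algorithm: sort the coordinates of x in decreasing
    # order; the marginal gains of F along that order give the vertex w
    # of B(F) maximizing w^T x, so f(x) = w^T x and w ∈ ∂f(x).
    n = len(x)
    w, S, prev = [0.0] * n, set(), F(set())
    for j in sorted(range(n), key=lambda j: -x[j]):
        S.add(j)
        cur = F(S)
        w[j], prev = cur - prev, cur
    return sum(wj * xj for wj, xj in zip(w, x)), w

# example: F(A) = sqrt(|A|) is submodular (diminishing returns in |A|)
F = lambda A: math.sqrt(len(A))
val, w = lovasz_extension(F, [0.5, 2.0, 1.0])
```

By telescoping, the returned w sums to F(V), so w ∈ B(F); evaluating f costs only n calls to F even though B(F) may have exponentially many vertices.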

SLIDE 4

Original Simplicial Method (OSM) [Bach 2013]

Intuition:

◮ approximate f with a piecewise-linear function whose values and (sub)gradients match f at all previous iterates
◮ minimize the approximation to determine the next iterate

Advantages:

◮ finite convergence [Bach 2013]

Drawbacks:

◮ Memory: memory |V(i)| = i grows with iteration counter i
◮ Computation: subproblem size grows with memory
◮ Convergence rate: no known rate of convergence [Bach 2013]

4 / 11

SLIDE 5

Limited Memory Kelley’s Method (L-KM)

Algorithm 1 L-KM (to minimize g(x) + f(x))

initialize V = ∅ (affinely independent). repeat:

  1. define f̂(x) = max_{w ∈ V} w⊤x
  2. solve subproblem x̂ ← argmin_x g(x) + f̂(x)
  3. compute v ∈ ∂f(x̂) = argmax_{w ∈ B(F)} x̂⊤w
  4. V ← {w ∈ V : w⊤x̂ = f̂(x̂)} ∪ {v}

Unlike OSM, L-KM drops subgradients w ∈ V that are not tight at the current iterate.

5 / 11
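The loop above can be sketched end to end for the special case g(x) = ½‖x − z‖², where (by Lagrangian duality) the subproblem in step 2 has the closed form x̂ = z − proj_conv(V)(z). This is an illustrative worked example: the helper names `greedy_vertex`, `project_to_hull`, `lkm` and the choice of F are mine, not from the paper.

```python
import math

def greedy_vertex(F, x):
    # Edmonds' greedy algorithm: the vertex w of B(F) maximizing w^T x.
    n = len(x)
    w, S, prev = [0.0] * n, set(), F(set())
    for j in sorted(range(n), key=lambda j: -x[j]):
        S.add(j)
        cur = F(S)
        w[j], prev = cur - prev, cur
    return tuple(w)

def project_to_hull(V, z, iters=5000, tol=1e-12):
    # Projects z onto conv(V) by Frank-Wolfe with exact line search.
    y = list(V[0])
    for _ in range(iters):
        g = [yi - zi for yi, zi in zip(y, z)]            # grad of (1/2)||y - z||^2
        s = min(V, key=lambda w: sum(gi * wi for gi, wi in zip(g, w)))
        d = [si - yi for si, yi in zip(s, y)]
        gap = -sum(gi * di for gi, di in zip(g, d))      # Frank-Wolfe gap
        if gap <= tol:
            break
        step = min(1.0, gap / sum(di * di for di in d))
        y = [yi + step * di for yi, di in zip(y, d)]
    return y

def lkm(F, z, iters=100, tol=1e-9):
    # L-KM for: minimize (1/2)||x - z||^2 + f(x).  For this g, the
    # subproblem argmin_x g(x) + max_{w in V} w^T x has the solution
    # xhat = z - proj_{conv(V)}(z).
    V = [greedy_vertex(F, z)]
    xhat = list(z)
    for _ in range(iters):
        xhat = [zi - yi for zi, yi in zip(z, project_to_hull(V, z))]
        dot = lambda w: sum(wi * xi for wi, xi in zip(w, xhat))
        v = greedy_vertex(F, xhat)                       # v in ∂f(xhat)
        fhat = max(dot(w) for w in V)                    # model value at xhat
        if dot(v) - fhat <= tol:                         # f(xhat) = fhat: optimal
            break
        V = [w for w in V if fhat - dot(w) <= tol] + [v] # drop non-tight pieces
    return xhat

F = lambda A: math.sqrt(len(A))   # submodular example
x = lkm(F, [1.0, 0.9])
```

On this tiny instance the memory never exceeds two vertices, and the method stops once the model value f̂(x̂) matches the true value f(x̂).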

SLIDE 6

L-KM: example

[Figure: animation frames (slides 6–9) showing successive iterates x(1), x(2), x(3) with points z(0), z(1), z(2), the objective g + f, and the piecewise-linear models g + f̂(1), g + f̂(2), g + f̂(3).]

6 / 11

SLIDE 10

Properties of L-KM

◮ Limited memory: In L-KM, for all i ≥ 0, the vectors in V(i) are affinely independent. Moreover, |V(i)| ≤ n + 1.
◮ Finite convergence: When g is strongly convex, L-KM converges finitely.
◮ Linear convergence: When g is smooth and strongly convex, the duality gap of L-KM and OSM converges linearly to 0.

7 / 11

SLIDE 11

Limited-memory Fully Corrective Frank-Wolfe (L-FCFW)

Algorithm 2 L-FCFW (to minimize −g∗(−y) over y ∈ B(F))

initialize V = ∅ (affinely independent). repeat:

  1. solve subproblem: minimize −g∗(−y) subject to y ∈ conv(V), and compute a convex decomposition of the solution ŷ = Σ_{w ∈ V} λ_w w with λ_w ≥ 0 and Σ_{w ∈ V} λ_w = 1
  2. compute gradient x̂ = ∇(−g∗(−ŷ))
  3. solve linear optimization v = argmax_{w ∈ B(F)} x̂⊤w
  4. V ← {w ∈ V : λ_w > 0} ∪ {v}

8 / 11
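As a sketch under the same quadratic assumption g(x) = ½‖x − z‖², the dual problem over B(F) reduces to minimizing ½‖y − z‖², and step 1 amounts to projecting z onto conv(V). The helper names, the example F, and the numerical threshold on λ_w are illustrative assumptions, not from the paper.

```python
import math

def greedy_vertex(F, x):
    # step 3's linear optimization over B(F), via Edmonds' greedy algorithm
    n = len(x)
    w, S, prev = [0.0] * n, set(), F(set())
    for j in sorted(range(n), key=lambda j: -x[j]):
        S.add(j)
        cur = F(S)
        w[j], prev = cur - prev, cur
    return tuple(w)

def corrective_step(V, z, iters=5000, tol=1e-12):
    # step 1 for this g: minimize (1/2)||y - z||^2 over conv(V) by
    # Frank-Wolfe with exact line search, tracking convex weights lam.
    lam = {w: 0.0 for w in V}
    lam[V[0]] = 1.0
    y = list(V[0])
    for _ in range(iters):
        g = [yi - zi for yi, zi in zip(y, z)]
        s = min(V, key=lambda w: sum(gi * wi for gi, wi in zip(g, w)))
        d = [si - yi for si, yi in zip(s, y)]
        gap = -sum(gi * di for gi, di in zip(g, d))
        if gap <= tol:
            break
        step = min(1.0, gap / sum(di * di for di in d))
        y = [yi + step * di for yi, di in zip(y, d)]
        for w in lam:
            lam[w] *= 1.0 - step
        lam[s] += step
    return y, lam

def lfcfw(F, z, iters=100, tol=1e-9):
    # L-FCFW for the dual problem: minimize (1/2)||y - z||^2 over B(F).
    V = [greedy_vertex(F, z)]
    y = list(V[0])
    for _ in range(iters):
        y, lam = corrective_step(V, z)
        xhat = [zi - yi for zi, yi in zip(z, y)]   # step 2: paired primal point
        v = greedy_vertex(F, xhat)                 # step 3
        if sum(xi * (vi - yi) for xi, vi, yi in zip(xhat, v, y)) <= tol:
            break                                  # duality gap closed
        V = [w for w in V if lam[w] > 1e-10]       # step 4: keep active vertices
        if v not in V:
            V.append(v)
    return y

F = lambda A: math.sqrt(len(A))
y = lfcfw(F, [1.0, 0.9])
```

For this g, z − ŷ recovers the L-KM iterate x̂ at optimality, consistent with the duality stated on slide 22.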

SLIDE 12

Fully corrective Frank-Wolfe

[Figure: animation frames showing iterates w(0), w(1), w(2), negative gradients −∇g(w(0)), −∇g(w(1)), −∇g(w(2)), and vertices v1, …, v5, for the problem: minimize −g∗(−w) subject to w ∈ B(F).]

9 / 11


SLIDE 22

Properties of L-FCFW

◮ Limited memory: By Carathéodory's theorem, we can choose ≤ n + 1 active vertices to represent the current iterate.
◮ Linear convergence [Lacoste-Julien and Jaggi, 2015]: When g is smooth and strongly convex, the duality gap of L-FCFW converges linearly to 0.
◮ Duality: Two algorithms are dual if their iterates solve dual subproblems. If g is smooth and strongly convex and
  ◮ B(i) = {w ∈ V(i−1) : λ_w > 0}, then L-FCFW is dual to L-KM;
  ◮ B(i) = V(i−1), then L-FCFW is dual to OSM.

10 / 11

SLIDE 23

Summary

L-KM solves composite convex + submodular problems whose natural size is exponential with linear memory.

◮ S. Zhou, S. Gupta, and M. Udell. Limited Memory Kelley's Method Converges for Composite Convex and Submodular Objectives. NeurIPS 2018.

◮ 5–7pm Room 210 Poster #16

11 / 11