Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization


Martin Jaggi, Ecole Polytechnique. ICML Spotlight Presentation, 2013/06/19. Constrained convex optimization: $\min_{x \in D} f(x)$ with $D \subset \mathbb{R}^d$.

SLIDE 1

Revisiting Frank-Wolfe

Projection-Free Sparse Convex Optimization

Martin Jaggi, Ecole Polytechnique

ICML Spotlight Presentation, 2013/06/19

SLIDE 2

Constrained Convex Optimization

[Figure: the domain $D \subset \mathbb{R}^d$]

SLIDE 3

Constrained Convex Optimization

$\min_{x \in D} f(x)$

[Figure: $f(x)$ over the domain $D \subset \mathbb{R}^d$]

SLIDE 4

An iterative algorithm

[Figure: $f(x)$ over the domain $D \subset \mathbb{R}^d$]

SLIDE 5

[Figure: $f(x)$ over the domain $D \subset \mathbb{R}^d$]

SLIDE 6

[Figure: $f(x)$ over the domain $D \subset \mathbb{R}^d$]

SLIDE 7

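The Linearized Problem

$\min_{s' \in D} \; f(x) + \langle s' - x, \nabla f(x) \rangle$

[Figure: $f(x)$ over the domain $D \subset \mathbb{R}^d$, with the current iterate $x$]

Algorithm 1 Frank-Wolfe
  Let $x^{(0)} \in D$
  for $k = 0 \dots K$ do
    Compute $s := \arg\min_{s' \in D} \langle s', \nabla f(x^{(k)}) \rangle$
    Let $\gamma := \frac{2}{k+2}$
    Update $x^{(k+1)} := (1 - \gamma) x^{(k)} + \gamma s$
  end for

Convergence: $O(1/k)$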
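The loop of Algorithm 1 can be sketched in a few lines of plain Python. This is a minimal illustration, not the author's implementation: the domain is assumed to be the probability simplex (where the linear subproblem reduces to picking the smallest gradient coordinate), and the quadratic objective, the vector `b`, and all function names are hypothetical choices made for this example.

```python
def frank_wolfe_simplex(grad, dim, num_steps):
    """Run Frank-Wolfe over the probability simplex with step size 2/(k+2)."""
    x = [0.0] * dim
    x[0] = 1.0  # x^(0): any point of D works; a vertex is an arbitrary choice
    for k in range(num_steps):
        g = grad(x)
        # Linear subproblem: <s, g> over the simplex is minimized at the
        # vertex e_i whose gradient coordinate g[i] is smallest.
        i = min(range(dim), key=lambda j: g[j])
        gamma = 2.0 / (k + 2)
        # x^(k+1) := (1 - gamma) x^(k) + gamma s, with s = e_i
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x


# Hypothetical test problem: f(x) = 0.5 * ||x - b||^2, minimized at x = b.
b = [0.1, 0.6, 0.3]


def f(x):
    return 0.5 * sum((xj - bj) ** 2 for xj, bj in zip(x, b))


def grad_f(x):
    return [xj - bj for xj, bj in zip(x, b)]


x_final = frank_wolfe_simplex(grad_f, dim=3, num_steps=200)
print(x_final, f(x_final))
```

No projection is ever computed: each update is a convex combination of the current iterate and one vertex, so the iterate stays inside the domain by construction.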

SLIDE 8


[Image: scan of the first page of the original paper]

AN ALGORITHM FOR QUADRATIC PROGRAMMING

Marguerite Frank and Philip Wolfe

Princeton University

A finite iteration method for calculating the solution of quadratic programming problems is described. Extensions to more general non-linear problems are suggested.

INTRODUCTION (only partially legible in the scan): [...] problem of maximizing a concave quadratic function whose variables are [...] constraints has been the subject of several recent studies, from [...] theoretical [...] (see Bibliography). [...]

1956

SLIDE 9

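Two kinds of first-order methods

|                | Frank-Wolfe                             | Gradient Descent     |
|----------------|-----------------------------------------|----------------------|
| Iteration cost | (approx.) solve linearized problem on D | projection back to D |
| Iterates       | sparse ✓ (in terms of used vertices)    | dense ✗              |

[Figure: $f(x)$ over the domain $D \subset \mathbb{R}^d$, with the iterate $x$ and the gradient $\nabla f(x)$]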
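The sparsity contrast can be made concrete with a small experiment. The sketch below is an illustration under assumptions not taken from the slides: the domain is the probability simplex in 50 dimensions, the objective is a hypothetical quadratic with a dense optimum `b`, "gradient descent" is taken to mean projected gradient descent with a sort-based Euclidean projection, and all names are made up for this example.

```python
def grad(x, b):
    """Gradient of the hypothetical objective f(x) = 0.5 * ||x - b||^2."""
    return [xj - bj for xj, bj in zip(x, b)]


def frank_wolfe_step(x, b, k):
    """One Frank-Wolfe step on the simplex: move toward a single vertex."""
    g = grad(x, b)
    i = min(range(len(x)), key=lambda j: g[j])
    gamma = 2.0 / (k + 2)
    x = [(1.0 - gamma) * xj for xj in x]
    x[i] += gamma
    return x


def project_simplex(v):
    """Euclidean projection onto the probability simplex (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:
            theta = t  # keep the threshold of the last index that passes
    return [max(vj - theta, 0.0) for vj in v]


def projected_gradient_step(x, b, step=0.5):
    """One gradient step followed by a projection back to the simplex."""
    g = grad(x, b)
    return project_simplex([xj - step * gj for xj, gj in zip(x, g)])


def count_nonzeros(x):
    return sum(1 for xj in x if xj != 0.0)


dim, num_steps = 50, 10
b = [1.0 / dim] * dim  # dense optimum inside the simplex
x_fw = [1.0] + [0.0] * (dim - 1)
x_pg = list(x_fw)
for k in range(num_steps):
    x_fw = frank_wolfe_step(x_fw, b, k)
    x_pg = projected_gradient_step(x_pg, b)

nnz_fw = count_nonzeros(x_fw)
nnz_pg = count_nonzeros(x_pg)
print(nnz_fw, nnz_pg)
```

Each Frank-Wolfe step adds at most one vertex to the iterate, so after 10 steps it has at most 10 nonzero coordinates, while the projected gradient iterate touches every coordinate from the first step on.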

SLIDE 10

Contributions

Stronger Convergence Results
Affine Invariance
Optimality in Terms of Sparsity

Primal Rate: $f(x^{(k)}) - f(x^*) \le \frac{2 C_f}{k+2} (1 + \delta)$

Primal-Dual: $g(x^{(\hat{k})}) \le \frac{7 C_f}{k+2}$

Holds for all algorithm variants

Primal-dual analysis with certificates for approximation quality

Approximate subproblems and inexact gradients (and domains)

[Figure: $f(x)$ over the domain $D \subset \mathbb{R}^d$, with the iterate $x$ and the gap $g(x)$]
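A certificate of this kind is cheap to compute alongside the iteration. The sketch below assumes that $g(x)$ denotes the duality gap $g(x) := \max_{s \in D} \langle x - s, \nabla f(x) \rangle$ (the slide does not spell out the definition); for convex $f$ this quantity upper-bounds the primal error $f(x) - f(x^*)$. The simplex domain, the quadratic objective, and all names are hypothetical choices for illustration.

```python
def duality_gap_simplex(x, g):
    """g(x) = max over s in the simplex of <x - s, grad f(x)>.

    The maximum is attained at a vertex, so it equals <x, g> - min_i g[i].
    """
    return sum(xj * gj for xj, gj in zip(x, g)) - min(g)


# Hypothetical test problem: f(x) = 0.5 * ||x - b||^2, optimum x* = b, f(x*) = 0.
b = [0.1, 0.6, 0.3]


def f(x):
    return 0.5 * sum((xj - bj) ** 2 for xj, bj in zip(x, b))


def grad_f(x):
    return [xj - bj for xj, bj in zip(x, b)]


x = [1.0, 0.0, 0.0]
history = []  # (primal error, duality gap) at each visited iterate
for k in range(200):
    g = grad_f(x)
    history.append((f(x), duality_gap_simplex(x, g)))
    # The vertex that solves the linear subproblem also attains the gap,
    # so the certificate costs no extra oracle call.
    i = min(range(len(x)), key=lambda j: g[j])
    gamma = 2.0 / (k + 2)
    x = [(1.0 - gamma) * xj for xj in x]
    x[i] += gamma

best_gap = min(gap for _, gap in history)
print(best_gap)
```

Because the gap needs only the gradient and the subproblem solution, it can serve as a stopping criterion without knowing $f(x^*)$.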

SLIDE 11

Some Atomic Domains Suitable for Frank-Wolfe

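| X | Optimization Domain: Atoms $\mathcal{A}$ | Optimization Domain: $D = \mathrm{conv}(\mathcal{A})$ | One Frank-Wolfe Iteration: $\sup_{s \in D} \langle s, y \rangle$ | One Frank-Wolfe Iteration: Complexity |
|---|---|---|---|---|
| $\mathbb{R}^n$ | Sparse Vectors | $\|.\|_1$-ball | $\|y\|_\infty$ | $O(n)$ |
| $\mathbb{R}^n$ | Sign-Vectors | $\|.\|_\infty$-ball | $\|y\|_1$ | $O(n)$ |
| $\mathbb{R}^n$ | $\ell_p$-Sphere | $\|.\|_p$-ball | $\|y\|_q$ | $O(n)$ |
| $\mathbb{R}^n$ | Sparse Non-neg. Vectors | Simplex $\Delta_n$ | $\max_i \{y_i\}$ | $O(n)$ |
| $\mathbb{R}^n$ | Latent Group Sparse Vec. | $\|.\|_{\mathcal{G}}$-ball | $\max_{g \in \mathcal{G}} \|y_{(g)}\|_g^*$ | $\sum_{g \in \mathcal{G}} |g|$ |
| $\mathbb{R}^{m \times n}$ | Matrix Trace Norm | $\|.\|_{tr}$-ball | $\|y\|_{op} = \sigma_1(y)$ | $\tilde{O}(N_f / \sqrt{\varepsilon'})$ (Lanczos) |
| $\mathbb{R}^{m \times n}$ | Matrix Operator Norm | $\|.\|_{op}$-ball | $\|y\|_{tr} = \|(\sigma_i(y))\|_1$ | SVD |
| $\mathbb{R}^{m \times n}$ | Schatten Matrix Norms | $\|(\sigma_i(.))\|_p$-ball | $\|(\sigma_i(y))\|_q$ | SVD |
| $\mathbb{R}^{m \times n}$ | Matrix Max-Norm | $\|.\|_{\max}$-ball | | $\tilde{O}(N_f (n+m)^{1.5} / \varepsilon'^{2.5})$ |
| $\mathbb{R}^{n \times n}$ | Permutation Matrices | Birkhoff polytope | | $O(n^3)$ |
| $\mathbb{R}^{n \times n}$ | Rotation Matrices | | | SVD (Procrustes prob.) |
| $\mathbb{S}^{n \times n}$ | Rank-1 PSD matrices of unit trace | $\{x \succeq 0, \mathrm{Tr}(x) = 1\}$ | $\lambda_{\max}(y)$ | $\tilde{O}(N_f / \sqrt{\varepsilon'})$ (Lanczos) |
| $\mathbb{S}^{n \times n}$ | PSD matrices of bounded diagonal | $\{x \succeq 0, x_{ii} \le 1\}$ | | $\tilde{O}(N_f n^{1.5} / \varepsilon'^{2.5})$ |

Table 1: Some examples of atomic domains suitable for optimization using the Frank-Wolfe algorithm. Here SVD refers to the complexity of computing a singular value decomposition, which is $O(\min\{mn^2, m^2 n\})$. $N_f$ is the number of non-zero entries in the gradient of the objective function $f$, and $\varepsilon' = \frac{2 \delta C_f}{k+2}$ is the required accuracy for the linear subproblems. For any $p \in [1, \infty]$, the conjugate value $q$ is meant to satisfy $\frac{1}{p} + \frac{1}{q} = 1$, allowing $q = \infty$ for $p = 1$ and vice versa.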

Applications

D := conv(A)
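For the vector domains in Table 1, the maximizer of $\langle s, y \rangle$ over $D = \mathrm{conv}(\mathcal{A})$ can be read off coordinate-wise in $O(n)$. The following sketch shows three such oracles; the function names and the sample vector `y` are illustrative assumptions, and a Frank-Wolfe step would call them with `y` set to the negative gradient.

```python
def argsup_l1_ball(y):
    """Sparse-vector atoms: a signed unit vector at the largest |y_i|."""
    i = max(range(len(y)), key=lambda j: abs(y[j]))
    s = [0.0] * len(y)
    s[i] = 1.0 if y[i] >= 0 else -1.0
    return s


def argsup_linf_ball(y):
    """Sign-vector atoms: match the sign of every coordinate of y."""
    return [1.0 if yj >= 0 else -1.0 for yj in y]


def argsup_simplex(y):
    """Non-negative sparse atoms: the unit vector at the largest y_i."""
    i = max(range(len(y)), key=lambda j: y[j])
    s = [0.0] * len(y)
    s[i] = 1.0
    return s


def inner(a, c):
    return sum(aj * cj for aj, cj in zip(a, c))


y = [0.5, -2.0, 1.5]
sup_l1 = inner(argsup_l1_ball(y), y)      # equals the infinity-norm of y
sup_linf = inner(argsup_linf_ball(y), y)  # equals the 1-norm of y
sup_simplex = inner(argsup_simplex(y), y) # equals max_i y_i
print(sup_l1, sup_linf, sup_simplex)
```

In each case the attained value matches the dual-norm expression in the fourth column of the table.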

SLIDE 12

Factorized Matrix Domains

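$D := \mathrm{conv}\left(\left\{ u v^T \;\middle|\; u \in \mathcal{A}_{\text{left}},\ v \in \mathcal{A}_{\text{right}} \right\}\right)$

$\mathcal{A}_{\text{left}} \subset \mathbb{R}^n, \quad \mathcal{A}_{\text{right}} \subset \mathbb{R}^m$

Example (trace norm):

$D := \mathrm{conv}\left(\left\{ u v^T \;\middle|\; u \in \mathbb{R}^n, \|u\|_2 = 1,\ v \in \mathbb{R}^m, \|v\|_2 = 1 \right\}\right)$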
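For the trace-norm example, maximizing $\langle s, y \rangle$ over $D$ amounts to finding the top singular vector pair of $y$: the best atom is $u_1 v_1^T$ and the attained value is $\sigma_1(y)$. The sketch below uses plain power iteration as a simple stand-in for the Lanczos method named in Table 1; the function name, the fixed iteration count, the deterministic starting vector, and the sample matrix are all assumptions made for illustration.

```python
import math


def top_singular_pair(y, num_iters=200):
    """Approximate (sigma_1, u_1, v_1) of y by power iteration on y^T y."""
    rows, cols = len(y), len(y[0])
    # Deterministic start; assumed not to be orthogonal to v_1.
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(num_iters):
        u = [sum(y[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # y v
        w = [sum(y[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # y^T u
        norm_w = math.sqrt(sum(wj * wj for wj in w))
        v = [wj / norm_w for wj in w]
    u = [sum(y[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    sigma = math.sqrt(sum(ui * ui for ui in u))
    u = [ui / sigma for ui in u]
    return sigma, u, v


# Hypothetical 3 x 2 matrix with singular values 3 and 1.
y = [[3.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]

sigma, u, v = top_singular_pair(y)
atom = [[ui * vj for vj in v] for ui in u]  # rank-1 atom u v^T with unit factors
value = sum(y[i][j] * atom[i][j] for i in range(3) for j in range(2))
print(sigma, value)
```

Only matrix-vector products with `y` are needed, which is why the cost of such an oracle scales with the number of non-zero entries of the gradient rather than with a full SVD.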