Revisiting Frank-Wolfe
Martin Jaggi Ecole Polytechnique ICML Spotlight Presentation, 2013 / 06 / 19
Projection-Free Sparse Convex Optimization
Constrained Convex Optimization
$\min_{x \in D} f(x)$, where $D \subset \mathbb{R}^d$
[Figure: plot of f(x) against x]
An iterative algorithm
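The Linearized Problem
$\min_{s' \in D} \; f(x) + \langle s' - x, \nabla f(x) \rangle$
[Figure: f(x) over the domain $D \subset \mathbb{R}^d$, with the point x and the minimizer s of the linearized problem]

Algorithm 1 Frank-Wolfe
  Let $x^{(0)} \in D$
  for $k = 0 \dots K$ do
    Compute $s := \arg\min_{s' \in D} \langle s', \nabla f(x^{(k)}) \rangle$
    Let $\gamma := \frac{2}{k+2}$
    Update $x^{(k+1)} := (1 - \gamma)\, x^{(k)} + \gamma s$
  end for
Convergence rate: $O(1/k)$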
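Algorithm 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the speaker's implementation: the function names `frank_wolfe` and `lmo_simplex`, the choice of the unit simplex as the domain D, and the quadratic test objective are all assumptions made for the example.

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, num_iters):
    """Frank-Wolfe with the step size gamma = 2 / (k + 2) from Algorithm 1."""
    x = np.asarray(x0, dtype=float)
    for k in range(num_iters):
        s = lmo(grad(x))                 # s := argmin_{s' in D} <s', grad f(x)>
        gamma = 2.0 / (k + 2.0)
        x = (1.0 - gamma) * x + gamma * s
    return x

def lmo_simplex(g):
    """Vertex of the unit simplex minimizing <s, g>: the coordinate with the smallest g_i."""
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

# Hypothetical test problem: f(x) = 0.5 * ||x - b||^2 with b inside the simplex.
b = np.array([0.1, 0.2, 0.3, 0.4])
f = lambda x: 0.5 * np.sum((x - b) ** 2)
grad_f = lambda x: x - b

x0 = np.array([1.0, 0.0, 0.0, 0.0])     # a vertex of the simplex, so x0 is in D
x_fw = frank_wolfe(grad_f, lmo_simplex, x0, num_iters=200)
```

Every iterate is a convex combination of points in D, so no projection step is needed to stay feasible.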
AN ALGORITHM FOR QUADRATIC PROGRAMMING
Marguerite Frank and Philip Wolfe
Princeton University
A finite iteration method for calculating the solution of quadratic programming problems is described. Extensions to more general non-linear problems are suggested.
INTRODUCTION ... problem of maximizing a concave quadratic function whose variables are ... constraints has been the subject of several recent studies, from ... theoretical ... (see Bibliography). Our aim here has been to ... programming problem which should be particularly ... called PI, is set forth ... Lagrange multipliers ... the solutions ... quadratic programming ...
1956
| | Frank-Wolfe | Gradient Descent |
|---|---|---|
| Iteration cost | (approx.) solve linearized problem on D | projection back to D |
| Iterates | sparse ✓ (in terms of used vertices) | dense ✗ |
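The sparsity claim can be checked numerically. The sketch below assumes an $\ell_1$-ball domain and a synthetic least-squares objective (both chosen only for illustration; `lmo_l1_ball` is a hypothetical helper name). Each Frank-Wolfe step adds at most one vertex, so after K steps from $x^{(0)} = 0$ the iterate has at most K non-zero coordinates, whereas a gradient step generally changes every coordinate.

```python
import numpy as np

def lmo_l1_ball(g, radius=1.0):
    """Vertex of the l1-ball minimizing <s, g>: a signed, scaled coordinate vector."""
    i = np.argmax(np.abs(g))
    s = np.zeros_like(g)
    s[i] = -radius * np.sign(g[i])
    return s

rng = np.random.default_rng(0)           # synthetic data, for illustration only
A = rng.standard_normal((50, 200))
y = rng.standard_normal(50)
grad_f = lambda x: A.T @ (A @ x - y)     # gradient of 0.5 * ||Ax - y||^2

x = np.zeros(200)                        # x^(0) = 0 lies in the l1-ball
num_iters = 10
for k in range(num_iters):
    s = lmo_l1_ball(grad_f(x))
    gamma = 2.0 / (k + 2.0)
    x = (1.0 - gamma) * x + gamma * s

nnz = np.count_nonzero(x)                # at most one new vertex per iteration
l1_norm = float(np.sum(np.abs(x)))       # iterate stays inside the unit l1-ball
```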
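[Figure: f(x) over the domain $D \subset \mathbb{R}^d$, with the duality gap g(x) at the point x]
Primal Rate
$f(x^{(k)}) - f(x^*) \le \frac{2 C_f}{k+2}\,(1 + \delta)$
Primal-Dual
$g(x^{(\hat{k})}) \le \frac{7 C_f}{k+2}$
Holds for all algorithm variants
primal-dual analysis
with certificates for approximation quality
and inexact gradients (and domains)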
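The certificate can be illustrated numerically. The slide only names g(x); the sketch below assumes the usual Frank-Wolfe duality gap $g(x) = \max_{s \in D} \langle x - s, \nabla f(x) \rangle$, which by convexity upper-bounds the primal error $f(x) - f(x^*)$ and comes for free from the linearized subproblem. The simplex domain, the quadratic objective, and the helper names are assumptions for the example.

```python
import numpy as np

def lmo_simplex(g):
    """Vertex of the unit simplex minimizing <s, g>."""
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

def duality_gap(x, grad, lmo):
    """g(x) = max_{s in D} <x - s, grad f(x)>, attained at the LMO solution."""
    return float(np.dot(x - lmo(grad), grad))

b = np.array([0.1, 0.2, 0.3, 0.4])       # optimum is x* = b, so f(x*) = 0
f = lambda x: 0.5 * np.sum((x - b) ** 2)
grad_f = lambda x: x - b

x = np.array([1.0, 0.0, 0.0, 0.0])
history = []                              # (primal error, duality gap) per iterate
for k in range(100):
    grad = grad_f(x)
    history.append((f(x), duality_gap(x, grad, lmo_simplex)))
    s = lmo_simplex(grad)
    gamma = 2.0 / (k + 2.0)
    x = (1.0 - gamma) * x + gamma * s
```

Because the optimum is known here, each recorded gap can be compared directly against the true primal error; in practice only the gap is available, which is what makes it useful as a stopping criterion.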
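Some Atomic Domains Suitable for Frank-Wolfe

| X | Atoms A | D = conv(A) | $\sup_{s \in D} \langle s, y \rangle$ | Complexity |
|---|---|---|---|---|
| $\mathbb{R}^n$ | Sparse Vectors | $\lVert . \rVert_1$-ball | $\lVert y \rVert_\infty$ | $O(n)$ |
| $\mathbb{R}^n$ | Sign-Vectors | $\lVert . \rVert_\infty$-ball | $\lVert y \rVert_1$ | $O(n)$ |
| $\mathbb{R}^n$ | $\ell_p$-Sphere | $\lVert . \rVert_p$-ball | $\lVert y \rVert_q$ | $O(n)$ |
| $\mathbb{R}^n$ | Sparse Non-neg. Vectors | Simplex $\Delta_n$ | $\max_i \{y_i\}$ | $O(n)$ |
| $\mathbb{R}^n$ | Latent Group Sparse Vec. | $\lVert . \rVert_{\mathcal{G}}$-ball | $\max_{g \in \mathcal{G}} \lVert y_{(g)} \rVert_g^*$ | $\sum_{g \in \mathcal{G}} \lvert g \rvert$ |
| $\mathbb{R}^{m \times n}$ | Matrix Trace Norm | $\lVert . \rVert_{tr}$-ball | $\lVert y \rVert_{op} = \sigma_1(y)$ | $\tilde{O}(N_f / \sqrt{\varepsilon'})$ (Lanczos) |
| $\mathbb{R}^{m \times n}$ | Matrix Operator Norm | $\lVert . \rVert_{op}$-ball | $\lVert y \rVert_{tr} = \lVert (\sigma_i(y)) \rVert_1$ | SVD |
| $\mathbb{R}^{m \times n}$ | Schatten Matrix Norms | $\lVert (\sigma_i(.)) \rVert_p$-ball | $\lVert (\sigma_i(y)) \rVert_q$ | SVD |
| $\mathbb{R}^{m \times n}$ | Matrix Max-Norm | $\lVert . \rVert_{\max}$-ball | | $\tilde{O}(N_f (n+m)^{1.5} / \varepsilon'^{2.5})$ |
| $\mathbb{R}^{n \times n}$ | Permutation Matrices | Birkhoff polytope | | $O(n^3)$ |
| $\mathbb{R}^{n \times n}$ | Rotation Matrices | | | SVD (Procrustes prob.) |
| $\mathbb{S}^{n \times n}$ | Rank-1 PSD matrices | $\{x \succeq 0,\ \mathrm{Tr}(x) = 1\}$ | $\lambda_{\max}(y)$ | $\tilde{O}(N_f / \sqrt{\varepsilon'})$ (Lanczos) |
| $\mathbb{S}^{n \times n}$ | PSD matrices | $\{x \succeq 0,\ x_{ii} \le 1\}$ | | $\tilde{O}(N_f\, n^{1.5} / \varepsilon'^{2.5})$ |

Table 1: Some examples of atomic domains suitable for optimization using the Frank-Wolfe algorithm. Here SVD refers to the complexity of computing a singular value decomposition, which is $O(\min\{mn^2, m^2 n\})$. $N_f$ is the number of non-zero entries in the gradient of the objective function f, and $\varepsilon' = \frac{2 \delta C_f}{k+2}$ is the required accuracy for the linear subproblems. For any $p \in [1, \infty]$, the conjugate value q is meant to satisfy $\frac{1}{p} + \frac{1}{q} = 1$, allowing $q = \infty$ for $p = 1$ and vice versa.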
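Three of the O(n) rows in Table 1 can be verified directly. The sketch below uses hypothetical helper names and an arbitrary test vector y; each function returns a maximizer of $\langle s, y \rangle$ over the domain in a single pass over y, and the attained value matches the closed form in the table. Inside Frank-Wolfe one would call these with $y = -\nabla f(x)$, since Algorithm 1 minimizes rather than maximizes.

```python
import numpy as np

def argmax_l1_ball(y):
    """Maximizer of <s, y> over the unit l1-ball: a signed coordinate vector."""
    i = np.argmax(np.abs(y))
    s = np.zeros_like(y)
    s[i] = np.sign(y[i])
    return s

def argmax_linf_ball(y):
    """Maximizer of <s, y> over the unit l_inf-ball: the sign vector of y."""
    return np.sign(y)

def argmax_simplex(y):
    """Maximizer of <s, y> over the unit simplex: the vertex at the largest y_i."""
    s = np.zeros_like(y)
    s[np.argmax(y)] = 1.0
    return s

y = np.array([0.5, -2.0, 1.5, -0.25])               # arbitrary test vector
val_l1 = float(np.dot(argmax_l1_ball(y), y))        # equals ||y||_inf
val_linf = float(np.dot(argmax_linf_ball(y), y))    # equals ||y||_1
val_simplex = float(np.dot(argmax_simplex(y), y))   # equals max_i y_i
```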
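$D := \mathrm{conv}(A)$
$D := \mathrm{conv}\left(\left\{ u v^T \mid u \in A_{left},\ v \in A_{right} \right\}\right)$, where $A_{left} \subset \mathbb{R}^n$ and $A_{right} \subset \mathbb{R}^m$
Example (trace norm):
$D := \mathrm{conv}\left(\left\{ u v^T \mid u \in \mathbb{R}^n,\ \lVert u \rVert_2 = 1,\ v \in \mathbb{R}^m,\ \lVert v \rVert_2 = 1 \right\}\right)$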
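For the trace-norm example, the linear subproblem is solved by the outer product of the top singular vector pair. The sketch below is an illustration under assumptions: it uses a full `numpy.linalg.svd` for simplicity rather than the cheaper Lanczos iteration mentioned in Table 1, the helper name `argmax_trace_norm_ball` is hypothetical, and Y is random test data.

```python
import numpy as np

def argmax_trace_norm_ball(Y):
    """Maximizer of <S, Y> over the unit trace-norm ball: the atom u_1 v_1^T."""
    U, sigma, Vt = np.linalg.svd(Y, full_matrices=False)
    return np.outer(U[:, 0], Vt[0, :])   # top left and right singular vectors

rng = np.random.default_rng(0)           # random test matrix, for illustration only
Y = rng.standard_normal((5, 3))
S = argmax_trace_norm_ball(Y)
value = float(np.sum(S * Y))             # <S, Y>, equals sigma_1(Y) = ||Y||_op
```

The returned atom has rank one, so a Frank-Wolfe iterate after k steps has rank at most k, which is the matrix analogue of the vertex sparsity seen for vector domains.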