Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization
Martin Jaggi, École Polytechnique
Smile in Paris Seminar, 2013/01/24 [ Paper ]

Constrained Convex Optimization

min_{x ∈ D} f(x),   D ⊂ R^d
[Slide: facsimile of the original paper, Frank & Wolfe (1956), "An Algorithm for Quadratic Programming":]

"1. INTRODUCTION. The problem of maximizing a concave quadratic function whose variables are subject to linear inequality constraints has been the subject of several recent studies, from both the computational side and the theoretical (see Bibliography). Our aim here has been to develop a method for solving this non-linear programming problem which should be particularly well adapted to high-speed machine computation. The quadratic programming problem as such, called PI, is set forth in Section 2. We find in Section 3 that with the aid of generalized Lagrange multipliers the solutions of PI can be exhibited in a simple way as parts of the solutions of a new quadratic programming problem..."
The Linearized Problem

min_{s' ∈ D} f(x) + ⟨s' − x, ∇f(x)⟩,   D ⊂ R^d
Algorithm 1 Frank-Wolfe
  Let x^(0) ∈ D
  for k = 0 … K do
    Compute s := argmin_{s' ∈ D} ⟨s', ∇f(x^(k))⟩
    Let γ := 2/(k+2)
    Update x^(k+1) := (1 − γ) x^(k) + γ s
  end for
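As a concrete illustration, a minimal Python sketch of Algorithm 1. The names `grad_f` and `lmo` are placeholders of mine: `lmo` stands for whatever routine solves the linearized problem over D in your setting.

```python
import numpy as np

def frank_wolfe(x0, grad_f, lmo, K):
    """Algorithm 1: Frank-Wolfe with the pre-defined step size 2/(k+2).

    x0     : feasible starting point x^(0) in D
    grad_f : callable returning the gradient of f at x
    lmo    : linear minimization oracle, lmo(g) = argmin_{s in D} <s, g>
    """
    x = np.asarray(x0, dtype=float)
    for k in range(K):
        s = lmo(grad_f(x))               # solve the linearized problem on D
        gamma = 2.0 / (k + 2)            # step size from Algorithm 1
        x = (1 - gamma) * x + gamma * s  # convex combination: stays in D
    return x
```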
Frank-Wolfe vs. Gradient Descent

                     Cost per step                     Sparse solutions
  Frank-Wolfe        solve linearized problem on D     ✓ (in terms of used vertices)
  Gradient Descent   projection back onto D            ✗
Algorithm Variants

- Approximate Subproblems [Dunn et al. 1978]
- Away-Steps [Guélat et al. 1986]

Line-Search
Algorithm 2 Frank-Wolfe with Line-Search
  for k = 0 … K do
    Compute s := argmin_{s' ∈ D} ⟨s', ∇f(x^(k))⟩
    Optimize γ by line-search: γ := argmin_{γ' ∈ [0,1]} f((1 − γ') x^(k) + γ' s)
    Update x^(k+1) := (1 − γ) x^(k) + γ s
  end for
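The line-search step of Algorithm 2 can be sketched with SciPy's bounded scalar minimizer; the function name is hypothetical and `f` is the objective:

```python
from scipy.optimize import minimize_scalar

def fw_line_search(f, x, s):
    """gamma := argmin over [0, 1] of f((1 - gamma) x + gamma s)."""
    res = minimize_scalar(lambda g: f((1 - g) * x + g * s),
                          bounds=(0.0, 1.0), method="bounded")
    return res.x
```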
Fully Corrective
Algorithm 3 Fully Corrective Frank-Wolfe
  for k = 0 … K do
    Compute s^(k+1) := argmin_{s' ∈ D} ⟨s', ∇f(x^(k))⟩
    Update x^(k+1) := argmin_{x ∈ conv(s^(0), …, s^(k+1))} f(x)
  end for
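The correction step of Algorithm 3 re-optimizes f over the convex hull of all atoms found so far. One way to sketch it, assuming f is cheap to evaluate: parametrize x = Σ_i λ_i s^(i) with λ on the simplex and hand the constrained problem to SciPy (names are mine):

```python
import numpy as np
from scipy.optimize import minimize

def fully_corrective_step(atoms, f):
    """x^(k+1) := argmin of f over conv(s^(0), ..., s^(k+1)).

    atoms : list of the atoms s^(0), ..., s^(k+1) found so far
    """
    S = np.stack(atoms)                   # one atom per row
    m = len(atoms)
    res = minimize(
        lambda lam: f(lam @ S),           # x = sum_i lam_i * s^(i)
        x0=np.full(m, 1.0 / m),           # start from the barycenter
        bounds=[(0.0, 1.0)] * m,          # lam_i >= 0 (and <= 1)
        constraints={"type": "eq", "fun": lambda lam: lam.sum() - 1.0},
        method="SLSQP",
    )
    return res.x @ S
```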
What's New?

- Primal-Dual Analysis
- Approximate Subproblems
- Affine Invariance
- Optimality in Terms of Sparsity
- More Applications
Convergence Analysis

Primal Convergence: the algorithms obtain f(x^(k)) − f(x*) ≤ O(1/k) after k steps.

Primal-Dual Convergence: [Clarkson 2008, Jaggi 2013]
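Made precise in LaTeX, the primal rate as stated in [Jaggi 2013], with C_f the curvature constant of f over D (reproduced from memory of the paper, so the constant is worth checking against the source):

```latex
f\bigl(x^{(k)}\bigr) - f(x^*) \;\le\; \frac{2\,C_f}{k+2},
\quad\text{where}\quad
C_f := \sup_{\substack{x,\,s \in D,\ \gamma \in [0,1],\\ y = x + \gamma (s - x)}}
\tfrac{2}{\gamma^2}\Bigl( f(y) - f(x) - \langle\, y - x,\ \nabla f(x) \,\rangle \Bigr).
```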
Affine Invariance

[Figure: the Frank-Wolfe step at x ∈ D ⊂ R^d, with gradient ∇f(x) and atom s, shown for two affinely related parametrizations of the domain]

Optimization over Atomic Sets
min_{x ∈ D} f(x),   D := conv(A) for a set of atoms A   [Chandrasekaran et al. 2012]

Fact: any linear function attains its minimum over D at an atom s ∈ A.
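Written out, the fact is the identity that makes the FW step tractable on atomic domains:

```latex
\min_{x \,\in\, \mathrm{conv}(\mathcal{A})} \langle x, g \rangle
\;=\;
\min_{s \,\in\, \mathcal{A}} \langle s, g \rangle
\qquad \text{for every } g \in \mathbb{R}^d .
```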
Sparse Approximation
ℓ1-ball: D := conv({±e_i | i ∈ [n]}),   min_{‖x‖₁ ≤ 1} f(x)

Greedy algorithms in signal processing: for f(x) = ‖Dx − y‖²₂ (D here the dictionary), Frank-Wolfe is equivalent to (Orthogonal) Matching Pursuit.

Lower bound: Ω(1/k), i.e. a trade-off between sparsity and accuracy.
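For the ℓ1-ball the linear subproblem has a closed form: pick the coordinate with the largest gradient magnitude, which is exactly the matching-pursuit atom selection. A minimal sketch (function name is mine):

```python
import numpy as np

def lmo_l1_ball(g, radius=1.0):
    """argmin_{||s||_1 <= radius} <s, g>: a signed, scaled unit vector."""
    i = np.argmax(np.abs(g))         # coordinate with largest |g_i|
    s = np.zeros_like(g)
    s[i] = -radius * np.sign(g[i])   # move against the gradient's sign
    return s
```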
Low Rank Approximation

D := conv({uvᵀ | u ∈ Rⁿ, ‖u‖₂ = 1, v ∈ Rᵐ, ‖v‖₂ = 1})

FW-step: approximate top singular vector pair of −∇f(X) (see the sketch below)

Lower bound: Ω(1/k) (in terms of rank)
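On this domain the linear subproblem is a leading singular vector computation: argmin_{‖S‖_* ≤ 1} ⟨S, G⟩ = −u₁v₁ᵀ for the top singular pair (u₁, v₁) of G. A sketch via power iteration; per the approximate-subproblems variant, a rough approximation suffices:

```python
import numpy as np

def lmo_nuclear_ball(G, n_iter=50, seed=None):
    """Approximate argmin_{||S||_* <= 1} <S, G> = -u v^T,
    with (u, v) the leading singular pair of G, by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(G.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        u = G @ v                 # power iteration on G G^T / G^T G
        u /= np.linalg.norm(u)
        v = G.T @ u
        v /= np.linalg.norm(v)
    return -np.outer(u, v)
```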
ℓp-Norm Ball Problems

min_{‖x‖_p ≤ 1} f(x)

Projection: no closed form known in general. FW-step: linear time (see the sketch below).
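The linear-time claim can be made concrete: by Hölder's inequality, the minimizer over the ℓp-ball has a closed form in terms of the conjugate exponent q, where 1/p + 1/q = 1. A sketch for 1 < p < ∞ (function name is mine):

```python
import numpy as np

def lmo_lp_ball(g, p):
    """argmin_{||s||_p <= 1} <s, g> in closed form, for 1 < p < inf.

    With 1/p + 1/q = 1 the minimizer is
      s_i = -sign(g_i) |g_i|^(q-1) / ||g||_q^(q-1),
    attaining <s, g> = -||g||_q (tightness in Hoelder's inequality).
    """
    q = p / (p - 1.0)
    scale = np.linalg.norm(g, ord=q) ** (q - 1.0)
    return -np.sign(g) * np.abs(g) ** (q - 1.0) / scale
```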
[Table: further atomic domains and the cost of their linear subproblems, including PSD matrices of unit trace and PSD matrices of bounded diagonal]
“Factorized Matrix Norms”

D := conv({uvᵀ | u ∈ A_left, v ∈ A_right})
Extensions
- Faster Convergence for Strongly Convex f
- Penalized
- Block-Wise
- Submodular Minimization
Open Research Questions
- Faster Convergence for Strongly Convex f
- Penalized
- Non-Smooth f
- More Connections with Sparse Recovery?
- Find More Applications!
Thanks
Supplementary Slides
- [Jaggi, PhD Thesis]
- PSD matrices of bounded trace
Matrix Factorizations
The Netflix challenge: 17k movies, 500k customers, 100M observed entries (≈ 1% of the matrix), as in recommender systems.

[Figure: Movies × Customers matrix Y ≈ UVᵀ, a rank-k factorization with columns u^(1), …, u^(k) and v^(1), …, v^(k)]

min_{‖X‖_* ≤ t} ‖(X − Y)_Ω‖²_F,   Ω the set of observed entries
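Putting the pieces together, one FW iteration for this trace-norm-constrained completion problem, reusing the `lmo_nuclear_ball` sketch from above; the gradient of the squared error is supported on the observed entries Ω, so in practice the oracle runs on a sparse matrix:

```python
import numpy as np

def fw_completion_step(X, Y, mask, t, k):
    """One Frank-Wolfe step for min_{||X||_* <= t} ||(X - Y)_Omega||_F^2.

    mask : boolean array marking the observed entries Omega
    t    : trace-norm radius, k : iteration counter
    """
    G = np.where(mask, X - Y, 0.0)   # gradient up to a factor 2, zero off Omega
    S = t * lmo_nuclear_ball(G)      # rank-1 atom of the radius-t ball
    gamma = 2.0 / (k + 2)
    return (1 - gamma) * X + gamma * S   # rank grows by at most 1 per step
```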