Sparse Convex Optimization Methods for Machine Learning
PhD Defense Talk, 2011/10/04
Martin Jaggi
Examiner: Emo Welzl
Co-Examiners: Bernd Gärtner, Elad Hazan, Joachim Giesen, Joachim Buhmann
The Problem

min_{x∈D} f(x)

where f is a convex function and D ⊂ Rⁿ is a compact convex domain.
The Linearized Problem

min_{y∈D} f(x) + ⟨y − x, d_x⟩
Algorithm 1: Greedy on a Compact Convex Set
  Pick an arbitrary starting point x(0) ∈ D
  for k = 0 … ∞ do
    Let d_x ∈ ∂f(x(k)) be a subgradient to f at x(k)
    Compute s := approx. argmin_{y∈D} ⟨y, d_x⟩
    Let α := 2/(k+2)
    Update x(k+1) := x(k) + α(s − x(k))
  end for
Theorem: The algorithm obtains accuracy O(1/k) after k steps.
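As a rough illustration, here is a minimal NumPy sketch of this greedy scheme, assuming a user-supplied (sub)gradient grad and a linear minimization oracle lmo for the domain D (both names are mine, not from the talk):

import numpy as np

def frank_wolfe(grad, lmo, x0, num_steps=1000):
    # Greedy on a compact convex set: each step solves the linearized
    # problem via the oracle and takes a convex-combination step.
    x = x0
    for k in range(num_steps):
        d_x = grad(x)                # (sub)gradient of f at x
        s = lmo(d_x)                 # s := argmin_{y in D} <y, d_x>
        alpha = 2.0 / (k + 2)        # the fixed step size from the slide
        x = x + alpha * (s - x)      # stays in D: convex combination
    return x

# Toy usage: minimize f(x) = ||x - c||^2 over the unit simplex, where the
# oracle simply returns the vertex e_i with the smallest gradient entry.
c = np.array([0.2, 0.5, 0.3])        # lies in the simplex
x_star = frank_wolfe(lambda x: 2 * (x - c),
                     lambda d: np.eye(len(d))[np.argmin(d)],
                     np.ones(3) / 3)
print(x_star)                        # approaches c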
Comparison: Our Method vs. Gradient Descent

                               Our Method                    Gradient Descent
  Cost per step                linearized problem on D       projection back to D
  Convergence                  1/k                           1/k
  Sparse / low-rank solutions  ✓ (depending on the domain)   ✗
History & Related Work

                       Domain                         Known Stepsize  Approx. Subproblem  Primal-Dual Guarantee
  Frank & Wolfe 1956   linear inequality constraints  ✗               ✗                   ✗
  Dunn 1978, 1980      general bounded convex domain  ✗               ✓                   ✗
  Zhang 2003           convex hulls                   ✗               ✓                   ✗
  Clarkson 2008, 2010  unit simplex                   ✓               ✗                   ✓
  Hazan 2008           semidefinite matrices          ✓               ✓                   ✓
  This thesis          general bounded convex domain  ✓               ✓                   ✓
The Unit Simplex

min_{x∈∆n} f(x)
Sparsity as a function of the approximation quality ("coresets")
for k = 0 … ∞ do
  Let d_x ∈ ∂f(x(k)) be a subgradient to f at x(k)
  Compute i := argmin_i (d_x)_i
  Let α := 2/(k+2)
  Update x(k+1) := x(k) + α(e_i − x(k))
end for
[ Clarkson SODA '08 ]
Here D := ∆n = conv({e_i | i ∈ [n]}).

Corollary: The algorithm gives an ε-approximate solution of sparsity O(1/ε). This is tight: sparsity Ω(1/ε) is necessary in the worst case.
Application: Support Vector Machines (ℓ2-loss)

min_{x∈∆n} xᵀ(K + t·𝟙)x

where K is the kernel matrix and 𝟙 the identity.
Application: Mean-Variance Portfolio Optimization

min_{x∈∆n} xᵀCx − t·bᵀx
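To make the simplex variant concrete, here is a hedged NumPy sketch on a toy quadratic of this form (the matrix and the value of t are made up for illustration). Starting from a vertex, the iterate after k steps has at most k+1 non-zero coordinates:

import numpy as np

rng = np.random.default_rng(0)
n, t = 1000, 1.0
A = rng.standard_normal((n, 40))
K = A @ A.T                          # toy PSD "kernel" matrix
Q = K + t * np.eye(n)                # the matrix K + t*identity from the slide

x = np.zeros(n); x[0] = 1.0          # start at a vertex of the simplex
for k in range(50):
    d = 2 * Q @ x                    # gradient of x^T Q x
    i = int(np.argmin(d))            # best simplex vertex e_i
    alpha = 2.0 / (k + 2)
    x *= (1 - alpha)
    x[i] += alpha                    # x := x + alpha * (e_i - x)

print(x.sum(), np.count_nonzero(x)) # sum 1; at most 51 non-zeros after 50 steps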
The ℓ1-Ball

min_{‖x‖₁ ≤ 1} f(x)

Again, sparsity Ω(1/ε) is necessary in the worst case.
Sparsity as a function of the approximation quality ("coresets")
Corollary: The algorithm gives an ε-approximate solution of sparsity O(1/ε).
for k = 0 … ∞ do
  Let d_x ∈ ∂f(x(k)) be a subgradient to f at x(k)
  Compute i := argmax_i |(d_x)_i|, and let s := e_i · sign((−d_x)_i)
  Let α := 2/(k+2)
  Update x(k+1) := x(k) + α(s − x(k))
end for

Here D := conv({±e_i | i ∈ [n]}) is the unit ℓ1-ball.
Application: Sparse Recovery (ℓ1-regularized least squares)

min_{‖x‖₁ ≤ t} ‖Ax − b‖₂²
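A hedged NumPy sketch of this recovery problem, with made-up dimensions and a planted sparse signal (all values illustrative); the vertices of the scaled ball are ±t·e_i:

import numpy as np

rng = np.random.default_rng(1)
n, m, t = 500, 100, 4.0
A = rng.standard_normal((m, n))
x_true = np.zeros(n); x_true[:5] = [1.0, -0.8, 0.6, -0.5, 0.4]  # planted signal
b = A @ x_true

x = np.zeros(n); x[0] = t            # start at a vertex t*e_0 of the l1-ball
for k in range(200):
    d = 2 * A.T @ (A @ x - b)        # gradient of ||Ax - b||_2^2
    i = int(np.argmax(np.abs(d)))    # coordinate with largest |gradient|
    s = np.zeros(n)
    s[i] = -t * np.sign(d[i])        # vertex of the ball minimizing <y, d>
    alpha = 2.0 / (k + 2)
    x = x + alpha * (s - x)

print(np.count_nonzero(np.abs(x) > 1e-4))  # only a few coordinates are active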
The Spectahedron

S := {X ∈ Sym^{n×n} | X ⪰ 0, Tr(X) = 1}
[ Hazan LATIN '08 ]
Corollary: The algorithm gives an ε-approximate solution of rank O(1/ε).
for k = 0 … ∞ do
  Let D_X ∈ ∂f(X(k)) be a subgradient to f at X(k)
  Let α := 2/(k+2)
  Compute v := v(k) = ApproxEV(D_X, α·C_f)
  Update X(k+1) := X(k) + α(vvᵀ − X(k))
end for

Here ApproxEV(D_X, α·C_f) returns an approximate eigenvector for the smallest eigenvalue of D_X, up to additive error α·C_f.
Here D := conv({vvᵀ | v ∈ Rⁿ, ‖v‖₂ = 1}) is the spectahedron. Rank Ω(1/ε) is necessary in the worst case.
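A hedged NumPy sketch of this rank-1-update scheme on a toy objective f(X) = ‖X − M‖_F² (the objective and sizes are made up; an exact eigendecomposition stands in for ApproxEV, which in practice would be a few Lanczos or power iterations):

import numpy as np

rng = np.random.default_rng(2)
n = 50
M = rng.standard_normal((n, n)); M = (M + M.T) / 2   # toy symmetric target
grad = lambda X: 2 * (X - M)                          # gradient of ||X - M||_F^2

v0 = np.zeros(n); v0[0] = 1.0
X = np.outer(v0, v0)                 # rank-1 start on the spectahedron
for k in range(100):
    D = grad(X)
    w, V = np.linalg.eigh(D)         # stand-in for ApproxEV(D, alpha*C_f)
    v = V[:, 0]                      # eigenvector of the smallest eigenvalue
    alpha = 2.0 / (k + 2)
    X = X + alpha * (np.outer(v, v) - X)   # rank grows by at most 1 per step

print(np.trace(X))                   # stays equal to 1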
Application: Low-Rank Matrix Recovery

min_{‖X‖* ≤ t} f(X)
[Figure: a partially observed ratings matrix Y]
Matrix completion for recommender systems. The Netflix challenge: 17k movies, 500k customers, 100M observed entries (≈ 1% of the matrix). Goal: approximate the movies × customers matrix as Y ≈ UVᵀ.
[Figure: rank-k factorization Y ≈ UVᵀ = Σ_{i=1..k} u(i) v(i)ᵀ]
min_{U,V} Σ_{(i,j)∈Ω} (Y_ij − (UVᵀ)_ij)²   s.t. ‖U‖²_Fro + ‖V‖²_Fro = t
Define the symmetric lifting

X := ( UUᵀ  UVᵀ )
     ( VUᵀ  VVᵀ )

Then the trace-constrained semidefinite problem

min_{X ⪰ 0} f(X)   s.t. Tr(X) = t
is equivalent to the factorized formulation above [ J, Sulovský ICML 2010 ].
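A small NumPy check of this equivalence (dimensions made up): the lifted X is positive semidefinite by construction, its top-right block is UVᵀ, and its trace equals ‖U‖²_Fro + ‖V‖²_Fro, which is what the constraint Tr(X) = t encodes:

import numpy as np

rng = np.random.default_rng(3)
m, n, k = 6, 4, 2
U = rng.standard_normal((m, k))      # movie factors
V = rng.standard_normal((n, k))      # customer factors

W = np.vstack([U, V])
X = W @ W.T                          # = [[UU^T, UV^T], [VU^T, VV^T]]

print(np.allclose(X[:m, m:], U @ V.T))                       # block is UV^T
print(np.isclose(np.trace(X), (U**2).sum() + (V**2).sum()))  # trace identity
print(np.linalg.eigvalsh(X).min() > -1e-9)                   # X is PSD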
The Problem: min_{x∈D} f(x), with D ⊂ Rⁿ.

The Dual: ω(x) := min_{y∈D} f(x) + ⟨y − x, d_x⟩

Weak Duality: ω(x) ≤ f(x*) ≤ f(x′) for all x, x′ ∈ D, so the duality gap gap(x) := f(x) − ω(x) certifies the approximation quality of any iterate.
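As an illustration, a tiny sketch of this certificate for the unit-simplex case, where the inner minimization is solved by a single vertex (the function name is mine):

import numpy as np

def duality_gap_simplex(x, d_x):
    # gap(x) = f(x) - omega(x) = max_{y in D} <x - y, d_x>;
    # on the unit simplex the maximum is attained at a vertex e_i
    # with minimal gradient entry (d_x)_i.
    return float(x @ d_x - d_x.min())

# Usage: a certificate-based stopping criterion for the greedy algorithm,
#   if duality_gap_simplex(x, grad(x)) <= eps: break
x = np.array([0.5, 0.5, 0.0])
d_x = np.array([1.0, 1.0, 2.0])
print(duality_gap_simplex(x, d_x))   # 0.0: x minimizes the linearization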
The Parameterized Problem

min_{x∈D} f_t(x)

"Continuity in the parameter": maintain g_t(x) ≤ ε/2 ("better than necessary"). The same x then satisfies g_{t′}(x) ≤ ε ("still good enough") as long as g_{t′}(x) − g_t(x) ≤ ε/2, which is guaranteed whenever |t′ − t| ≤ ε · P_f.

[Figure: the optimal value f_t(x*_t) as a function of the parameter t, with the primal value f_{t′}(x) and dual value ω_{t′}(x) at a nearby parameter t′]
An ε-approximate solution hence stays valid on a whole parameter interval. Theorem: O(1/ε) many such intervals cover the entire solution path.
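A hedged sketch of the resulting path-tracking scheme (the helper names solve_to_gap and gap_at are mine, and the fixed scan step dt stands in for the interval length derived from P_f):

def track_path(solve_to_gap, gap_at, t_start, t_end, eps, dt=1e-3):
    # Solve once per interval to gap <= eps/2 ("better than necessary"),
    # then reuse the SAME solution while it stays eps-approximate.
    intervals, t, left = [], t_start, t_start
    x = solve_to_gap(t_start, eps / 2)
    while t < t_end:
        t += dt
        if gap_at(x, t) > eps:           # no longer "good enough"
            intervals.append((left, t))
            x = solve_to_gap(t, eps / 2) # re-solve at the new parameter
            left = t
    intervals.append((left, t_end))
    return intervals                     # Theorem: O(1/eps) many intervals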
Example paths: the SVM problem min_{x∈∆n} xᵀ(K + t·𝟙)x and the mean-variance portfolio problem min_{x∈∆n} xᵀCx − t·bᵀx, both parameterized by t.
[Figure: test accuracy along the regularization path in t, for the ionosphere and breast-cancer datasets]
Co-authors: Bernd Gärtner, Joachim Giesen, Soeren Laue, Marek Sulovský
3D visualization: Robert Carnecky