First-order methods
(Optml++ Meeting 2)
Suvrit Sra
Massachusetts Institute of Technology
OPTML++, Fall 2015

Outline
Lect 1: Recap on convexity
Lect 1: Recap on duality, optimality
First-order optimization algorithms
Proximal
Suvrit Sra (MIT) Optimization for ML and beyond: OPTML++ 2 / 23
[Figure: iterates $x_k, x_{k+1}, \ldots$ and a point $x^*$ with $\nabla f(x^*) = 0$.]
1. Start with some guess $x_0$;
2. For each $k = 0, 1, \ldots$
   $x_{k+1} \leftarrow x_k + \alpha_k d_k$
   Check when to stop (e.g., if $\nabla f(x_{k+1}) = 0$)
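The iteration above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the function name `descent`, the tolerance `tol`, the iteration cap `max_iter`, and the choice $d_k = -\nabla f(x_k)$ are all assumptions made for the example.

```python
import math

def descent(grad, x0, stepsize, tol=1e-8, max_iter=1000):
    """Iterate x_{k+1} = x_k + alpha_k * d_k with d_k = -grad f(x_k)."""
    x = list(x0)
    for k in range(max_iter):
        g = grad(x)
        # Exact equality grad f(x) = 0 is rarely reached in floating point,
        # so stop once the gradient norm drops below a tolerance instead.
        if math.sqrt(sum(gi * gi for gi in g)) <= tol:
            break
        alpha = stepsize(k)
        d = [-gi for gi in g]  # steepest-descent direction (one possible d_k)
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x

# Example: f(x) = 0.5 * ||x - c||^2 has gradient x - c and minimizer c.
c = [1.0, -2.0]
x_min = descent(lambda x: [xi - ci for xi, ci in zip(x, c)],
                x0=[0.0, 0.0], stepsize=lambda k: 0.5)
```

Passing the stepsize as a function of $k$ keeps the loop independent of the stepsize rule, so constant and diminishing rules can be swapped without touching the iteration.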
[Equation partly lost in extraction: an entry indexed $ii$, approximated ($\approx$) by a term involving $(\partial x_i)^2$.]
$\alpha \ge 0$
$0 \le \alpha \le s$
$\sum_k \alpha_k = \infty$.
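The condition $\sum_k \alpha_k = \infty$ is usually paired with $\alpha_k \to 0$ (a diminishing stepsize). That pairing is an assumption here, since only the summation condition survives on the slide. A minimal sketch contrasting a sequence that satisfies the condition with one that does not (both sequences are illustrative choices, not taken from the source):

```python
import math

def harmonic(k):
    """alpha_k = 1 / (k + 1): tends to 0, and the partial sums diverge."""
    return 1.0 / (k + 1)

def squared(k):
    """alpha_k = 1 / (k + 1)^2: tends to 0, but the sum converges to pi^2 / 6."""
    return 1.0 / (k + 1) ** 2

def partial_sum(stepsize, n):
    """Sum of the first n stepsizes alpha_0, ..., alpha_{n-1}."""
    return sum(stepsize(k) for k in range(n))

# The harmonic sums keep growing; the squared sums stay below pi^2 / 6.
for n in (10, 1_000, 100_000):
    print(n, partial_sum(harmonic, n), partial_sum(squared, n), math.pi ** 2 / 6)
```

If the stepsizes are summable, the iterates can travel a total distance of at most $\sum_k \alpha_k \|d_k\|$, so with bounded directions they may stop short of a minimizer.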
$\tfrac{1}{2}\|Ax - b\|^2$ + $x \ge 0$
Machine learning, Statistics, Image Processing, Computer Vision, Medical Imaging, Astronomy, Physics, Bioinformatics, Remote Sensing, Engineering, Inverse problems, Finance
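Reading the problem above as nonnegative least squares (minimize $\tfrac{1}{2}\|Ax - b\|^2$ subject to $x \ge 0$, which is an assumption about the garbled formula), a basic projected gradient method can be sketched as follows. The function name, the fixed stepsize `alpha`, and the small test problem are hypothetical. This is plain projected gradient with a constant step, not a Barzilai-Borwein (BB) variant.

```python
def nnls_projected_gradient(A, b, alpha, iters=500):
    """Projected gradient for min 0.5 * ||A x - b||^2 subject to x >= 0.

    A is a list of rows and b a list. alpha is a fixed stepsize, assumed
    small enough (e.g. at most 1 / L, with L the largest eigenvalue of A^T A).
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Residual r = A x - b and gradient g = A^T r.
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step, then projection onto the nonnegative orthant.
        x = [max(0.0, x[j] - alpha * g[j]) for j in range(n)]
    return x

# Small example: A^T A has eigenvalues 1 and 3, so alpha = 0.3 < 1/3.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, -1.0, 0.5]
x_nnls = nnls_projected_gradient(A, b, alpha=0.3)
```

In this example the unconstrained least-squares solution has a negative second coordinate, so the constraint is active and the projection clamps that coordinate to zero.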
[Figure: objective function value (log scale) versus running time (seconds). Curve: Naive BB+Projxn.]
[Figure: objective function value (log scale) versus running time (seconds). Curves: Naive BB+Projxn; Naive BB + Linesearch.]
[Figure: objective function value (log scale) versus running time (seconds). Curves: Naive BB+Projxn; Naive BB + Linesearch; SBB.]
[Equations partly lost in extraction: a class with subscript $L$ and a term $\tfrac{L}{2}\|y - x\|_2^2$.]
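The fragments above contain a term of the form $\tfrac{L}{2}\|y - x\|_2^2$. Such a term appears in the standard quadratic bound for differentiable functions whose gradient is Lipschitz continuous with constant $L$. Assuming that is the statement intended here, it reads:

```latex
\[
  \|\nabla f(x) - \nabla f(y)\|_2 \le L \|x - y\|_2 \quad \forall x, y
  \qquad\Longrightarrow\qquad
  \bigl| f(y) - f(x) - \langle \nabla f(x),\, y - x \rangle \bigr|
  \le \frac{L}{2} \|y - x\|_2^2 .
\]
% Consequence: one gradient step with stepsize 1/L decreases f.
\[
  y = x - \tfrac{1}{L} \nabla f(x)
  \quad\Longrightarrow\quad
  f(y) \le f(x) - \frac{1}{2L} \|\nabla f(x)\|_2^2 .
\]
```

The second line follows by substituting $y - x = -\tfrac{1}{L}\nabla f(x)$ into the upper bound.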
[Equation partly lost in extraction: a class with subscript $L,\mu$ and a term $\tfrac{\cdot}{2}\|x - y\|_2^2$.]
[Statement partly lost in extraction:] ... $L,\mu$, $0 < \alpha < 2/(L + \mu)$, then the gradient ... $\|\cdot\|^2 \le \ldots\, \|\cdot\|^2$,
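The stepsize range $0 < \alpha < 2/(L + \mu)$ can be checked numerically on a simple strongly convex quadratic. The sketch below assumes the bound in question is the standard per-step contraction $\|x_{k+1} - x^*\|_2^2 \le \bigl(1 - \tfrac{2\alpha\mu L}{\mu + L}\bigr)\|x_k - x^*\|_2^2$. The slide's own inequality did not survive extraction, so that form, the function name, and the test problem are assumptions.

```python
def gradient_method(lams, x0, alpha, iters):
    """Gradient method on f(x) = 0.5 * sum(lam_i * x_i^2), whose minimizer is 0.

    Returns the squared distances ||x_k - x*||^2 for k = 0, ..., iters.
    """
    x = list(x0)
    sq_dists = [sum(xi * xi for xi in x)]
    for _ in range(iters):
        # The gradient of f at x is (lam_i * x_i)_i.
        x = [xi - alpha * lam * xi for xi, lam in zip(x, lams)]
        sq_dists.append(sum(xi * xi for xi in x))
    return sq_dists

mu, L = 1.0, 10.0              # smallest and largest curvature of f
lams = [mu, 4.0, L]
alpha = 1.5 / (L + mu)         # inside the range 0 < alpha < 2 / (L + mu)
rate = 1.0 - 2.0 * alpha * mu * L / (mu + L)
sq_dists = gradient_method(lams, x0=[1.0, 1.0, 1.0], alpha=alpha, iters=50)

# Each step should contract the squared distance by at least `rate`.
contracts = all(sq_dists[k + 1] <= rate * sq_dists[k] + 1e-15
                for k in range(len(sq_dists) - 1))
```

For this quadratic each coordinate shrinks by the factor $(1 - \alpha\lambda_i)^2$, and the check confirms that every such factor stays below the assumed rate when $\mu \le \lambda_i \le L$.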
[Statements partly lost in extraction:] ... $2(n - 1)$, there is a smooth $f$, s.t. ...
... $L,\mu$ ($\mu > 0$, $\kappa > 1$) ...