Lecture: Fast Proximal Gradient Methods
http://bicmr.pku.edu.cn/~wenzw/opt-2018-fall.html
Acknowledgement: these slides are based on Prof. Lieven Vandenberghe’s lecture notes
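Main goal: show that
$$
f(y_k) - f(x^*) \le (1-\gamma_k)\,\bigl(f(y_{k-1}) - f(x^*)\bigr) + B_k .
$$
We have: $f \in C^{1,1}_L(X)$; convexity; the optimality condition of the subproblem.

Smoothness, the update $y_k = (1-\gamma_k)\,y_{k-1} + \gamma_k x_k$ (so that $y_k - z_k = \gamma_k (x_k - x_{k-1})$), and convexity give
$$
\begin{aligned}
f(y_k) &\le f(z_k) + \langle \nabla f(z_k),\, y_k - z_k \rangle + \frac{L}{2}\,\|y_k - z_k\|^2 \\
&= (1-\gamma_k)\bigl[f(z_k) + \langle \nabla f(z_k),\, y_{k-1} - z_k \rangle\bigr] + \gamma_k\bigl[f(z_k) + \langle \nabla f(z_k),\, x_k - z_k \rangle\bigr] + \frac{L\gamma_k^2}{2}\,\|x_k - x_{k-1}\|^2 \\
&\le (1-\gamma_k)\, f(y_{k-1}) + \gamma_k\bigl[f(z_k) + \langle \nabla f(z_k),\, x_k - z_k \rangle\bigr] + \frac{L\gamma_k^2}{2}\,\|x_k - x_{k-1}\|^2 .
\end{aligned}
$$
Since
$$
x_k = \operatorname*{argmin}_{x \in X} \Bigl\{ \langle \nabla f(z_k),\, x \rangle + \frac{\beta_k}{2}\,\|x - x_{k-1}\|_2^2 \Bigr\},
$$
the optimality condition reads
$$
\begin{aligned}
&\langle \nabla f(z_k) + \beta_k (x_k - x_{k-1}),\; x_k - x \rangle \le 0, \quad \forall\, x \in X \\
\Rightarrow\ & -\langle x_{k-1} - x_k,\; x_k - x \rangle \le \frac{1}{\beta_k}\,\langle \nabla f(z_k),\; x - x_k \rangle .
\end{aligned}
$$
Hence
$$
\begin{aligned}
\frac{1}{2}\,\|x_k - x_{k-1}\|^2 &= \frac{1}{2}\,\|x_{k-1} - x\|^2 - \langle x_{k-1} - x_k,\; x_k - x \rangle - \frac{1}{2}\,\|x_k - x\|^2 \\
&\le \frac{1}{2}\,\|x_{k-1} - x\|^2 + \frac{1}{\beta_k}\,\langle \nabla f(z_k),\; x - x_k \rangle - \frac{1}{2}\,\|x_k - x\|^2 .
\end{aligned}
$$
Note: $L\gamma_k \le \beta_k$.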
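The one-step argument above can be checked numerically. The following is a minimal sketch, not the lecture's own code: it assumes a separable quadratic objective over a box (so the subproblem reduces to a clipped gradient step) and the parameter choice $\gamma_k = 2/(k+1)$, $\beta_k = 2L/k$, which satisfies $L\gamma_k \le \beta_k$. All function names and test data are hypothetical. It takes $B_k = \frac{\beta_k\gamma_k}{2}(\|x_{k-1}-u\|^2 - \|x_k-u\|^2)$ for a reference point $u \in X$ and returns the slack of the resulting inequality, which should be nonnegative up to rounding.

```python
def f(x, a, c):
    # Separable convex quadratic: 0.5 * sum_i a_i (x_i - c_i)^2, with a_i > 0.
    return 0.5 * sum(ai * (xi - ci) ** 2 for ai, xi, ci in zip(a, x, c))

def grad(x, a, c):
    return [ai * (xi - ci) for ai, xi, ci in zip(a, x, c)]

def sqdist(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def one_step_slack(x_prev, y_prev, k, u, a, c, lo, hi):
    """Perform one iteration and return (x_k, y_k, rhs - lhs) for the point u in X."""
    L = max(a)                              # Lipschitz constant of the gradient
    gamma, beta = 2.0 / (k + 1), 2.0 * L / k  # assumed choice; L*gamma <= beta
    z = [(1 - gamma) * yp + gamma * xp for yp, xp in zip(y_prev, x_prev)]
    g = grad(z, a, c)
    # argmin over the box of <g, x> + beta/2 ||x - x_prev||^2 is a clipped step.
    x = [min(hi, max(lo, xp - gi / beta)) for xp, gi in zip(x_prev, g)]
    y = [(1 - gamma) * yp + gamma * xi for yp, xi in zip(y_prev, x)]
    lhs = f(y, a, c) - f(u, a, c)
    rhs = (1 - gamma) * (f(y_prev, a, c) - f(u, a, c)) \
        + 0.5 * beta * gamma * (sqdist(x_prev, u) - sqdist(x, u))
    return x, y, rhs - lhs

def run_checks(iters, u, a=(1.0, 4.0, 9.0), c=(2.0, -0.5, 0.3), lo=-1.0, hi=1.0):
    x = y = [0.0, 0.0, 0.0]
    slacks = []
    for k in range(1, iters + 1):
        x, y, s = one_step_slack(x, y, k, u, a, c, lo, hi)
        slacks.append(s)
    return slacks

print(min(run_checks(30, [1.0, -0.5, 0.3])))
```

The reference point `u` need not be the minimizer; any point of the box should give a nonnegative slack, since the derivation only uses $u \in X$.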
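Main inequality: for every $x \in X$,
$$
f(y_k) - f(x) \le (1-\gamma_k)\,\bigl[f(y_{k-1}) - f(x)\bigr] + \frac{\beta_k\gamma_k}{2}\,\bigl(\|x_{k-1} - x\|^2 - \|x_k - x\|^2\bigr).
$$
Main estimation:
$$
f(y_k) - f(x) \le \frac{\Gamma_k (1-\gamma_1)}{\Gamma_1}\,\bigl(f(y_0) - f(x)\bigr) + \frac{\Gamma_k}{2}\,\underbrace{\sum_{i=1}^{k} \frac{\beta_i\gamma_i}{\Gamma_i}\,\bigl(\|x_{i-1} - x\|^2 - \|x_i - x\|^2\bigr)}_{(*)} .
$$
Rearranging the sum,
$$
(*) = \frac{\beta_1\gamma_1}{\Gamma_1}\,\|x_0 - x\|^2 + \sum_{i=2}^{k} \Bigl(\frac{\beta_i\gamma_i}{\Gamma_i} - \frac{\beta_{i-1}\gamma_{i-1}}{\Gamma_{i-1}}\Bigr)\,\|x_{i-1} - x\|^2 - \frac{\beta_k\gamma_k}{\Gamma_k}\,\|x_k - x\|^2 ,
$$
and, when the coefficient differences are nonnegative,
$$
(*) \le \frac{\beta_1\gamma_1}{\Gamma_1}\,\|x_0 - x\|^2 + \sum_{i=2}^{k} \Bigl(\frac{\beta_i\gamma_i}{\Gamma_i} - \frac{\beta_{i-1}\gamma_{i-1}}{\Gamma_{i-1}}\Bigr)\, D_X^2
$$
(here $D_X = \sup_{x,y \in X} \|x - y\|$).

Observation (the final bounds below drop the first term of the main estimation and the factor $1/\Gamma_1$, which is valid when $\gamma_1 = 1$ and $\Gamma_1 = 1$):

If $\dfrac{\beta_k\gamma_k}{\Gamma_k} \ge \dfrac{\beta_{k-1}\gamma_{k-1}}{\Gamma_{k-1}}$ for all $k$, then
$$
(*) \le \frac{\beta_k\gamma_k}{\Gamma_k}\, D_X^2 \quad\Rightarrow\quad f(y_k) - f(x) \le \frac{\beta_k\gamma_k}{2}\, D_X^2 .
$$
If $\dfrac{\beta_k\gamma_k}{\Gamma_k} \le \dfrac{\beta_{k-1}\gamma_{k-1}}{\Gamma_{k-1}}$ for all $k$, then
$$
(*) \le \frac{\beta_1\gamma_1}{\Gamma_1}\,\|x_0 - x\|^2 \quad\Rightarrow\quad f(y_k) - f(x) \le \Gamma_k\,\frac{\beta_1\gamma_1}{2}\,\|x_0 - x\|^2 .
$$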
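To see what the final bound looks like for a concrete parameter choice, here is a hedged sketch. It assumes $\gamma_k = 2/(k+1)$ and $\beta_k = 2L/k$, together with the recursion $\Gamma_1 = 1$, $\Gamma_k = (1-\gamma_k)\Gamma_{k-1}$, which gives $\Gamma_k = 2/(k(k+1))$; none of these are stated in the text above. Under these assumptions $\beta_k\gamma_k/\Gamma_k = 2L$ is constant, so the second case applies and the bound becomes $f(y_k) - f(x^*) \le 2L\|x_0 - x^*\|^2/(k(k+1))$. The test problem (a separable quadratic over a box) and all names are hypothetical.

```python
def f(x, a, c):
    # Separable convex quadratic: 0.5 * sum_i a_i (x_i - c_i)^2, with a_i > 0.
    return 0.5 * sum(ai * (xi - ci) ** 2 for ai, xi, ci in zip(a, x, c))

def grad(x, a, c):
    return [ai * (xi - ci) for ai, xi, ci in zip(a, x, c)]

def run_agd(a, c, lo, hi, x0, iters):
    """Run the three-sequence scheme; return a list of (k, gap, bound) tuples."""
    L = max(a)                                          # Lipschitz constant of grad f
    x_star = [min(hi, max(lo, ci)) for ci in c]         # minimizer over the box
    f_star = f(x_star, a, c)
    r2 = sum((p - q) ** 2 for p, q in zip(x0, x_star))  # ||x0 - x*||^2
    x, y = list(x0), list(x0)
    history = []
    for k in range(1, iters + 1):
        gamma, beta = 2.0 / (k + 1), 2.0 * L / k        # assumed parameter choice
        z = [(1 - gamma) * yi + gamma * xi for yi, xi in zip(y, x)]
        g = grad(z, a, c)
        # Subproblem over the box: clipped gradient step from x_{k-1}.
        x = [min(hi, max(lo, xi - gi / beta)) for xi, gi in zip(x, g)]
        y = [(1 - gamma) * yi + gamma * xi for yi, xi in zip(y, x)]
        gap = f(y, a, c) - f_star
        bound = 2.0 * L * r2 / (k * (k + 1))            # Gamma_k * beta_1 * gamma_1 / 2 * r2
        history.append((k, gap, bound))
    return history

history = run_agd([1.0, 4.0, 9.0], [2.0, -0.5, 0.3], -1.0, 1.0, [0.0, 0.0, 0.0], 50)
for k, gap, bound in history[::10]:
    print(k, gap, bound)
```

The bound decays like $1/k^2$, in contrast to the $1/k$ rate of the unaccelerated projected gradient method; the observed gap is typically well below it on a problem this small.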