Solving composite optimization problems, with applications to phase retrieval

John Duchi (based on joint work with Feng Ruan)

SLIDE 2
Outline
◮ Composite optimization problems
◮ Methods for composite optimization
◮ Application: robust phase retrieval
◮ Experimental evaluation
◮ Large scale composite optimization?
SLIDE 3
What I hope to accomplish today
◮ Investigate problem structures that are not quite convex but still amenable to elegant solution approaches
◮ Show how we can leverage stochastic structure to turn hard non-convex problems into "easy" ones [Keshavan, Montanari, Oh 10; Loh & Wainwright 12]
◮ Consider large scale versions of these problems
SLIDE 4
Composite optimization problems
The problem:
$$\mathop{\rm minimize}_x \quad f(x) := h(c(x))$$
where $h : \mathbb{R}^m \to \mathbb{R}$ is convex and $c : \mathbb{R}^n \to \mathbb{R}^m$ is smooth
SLIDE 5
Motivation: the exact penalty

$$\mathop{\rm minimize}_x \quad f(x) \quad \text{subject to} \quad x \in X$$
is equivalent (for all large enough $\lambda$) to
$$\mathop{\rm minimize}_x \quad f(x) + \lambda \, \mathrm{dist}(x, X)$$

SLIDE 8
Motivation: the exact penalty

$$\mathop{\rm minimize}_x \quad f(x) \quad \text{subject to} \quad c(x) = 0$$
is equivalent (for all large enough $\lambda$) to
$$\mathop{\rm minimize}_x \quad f(x) + \underbrace{\lambda \|c(x)\|}_{= h(c(x))}$$
where $h(z) = \lambda \|z\|$
[Fletcher & Watson 80, 82; Burke 85]

SLIDE 10
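A quick one-dimensional check of exactness (an illustrative example of mine, not from the talk): take $f(x) = x$ with constraint $c(x) = x = 0$, so the constrained minimizer is $x = 0$.

```latex
% Penalized problem: \min_x x + \lambda |x|.
% At x = 0 the subdifferential is 1 + \lambda [-1, 1], which contains 0
% whenever \lambda \ge 1: the penalty is exact once \lambda exceeds the
% magnitude of the optimal Lagrange multiplier (here 1).
\min_x \; x + \lambda |x| \quad \text{has minimizer } x^\star = 0
\quad \text{for all } \lambda > 1.
```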
Motivation: nonlinear measurements and modeling

◮ Have true signal $x_\star \in \mathbb{R}^n$ and measurement vectors $a_i \in \mathbb{R}^n$
◮ Observe nonlinear measurements
$$b_i = \phi(\langle a_i, x_\star \rangle) + \xi_i, \quad i = 1, \ldots, m$$
for $\phi(\cdot)$ a nonlinear but smooth function. An objective:
$$f(x) = \frac{1}{m} \sum_{i=1}^m \big( \phi(\langle a_i, x \rangle) - b_i \big)^2$$
This is nonlinear least squares [Nocedal & Wright 06; Plan & Vershynin 15; Oymak & Soltanolkotabi 16]

SLIDE 13
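A minimal numpy sketch of this objective; the choice phi = tanh is a placeholder nonlinearity of mine, not from the talk.

```python
import numpy as np

def nls_objective(A, b, x, phi=np.tanh):
    """Nonlinear least squares: f(x) = (1/m) * sum_i (phi(<a_i, x>) - b_i)^2.

    A has rows a_i; phi = tanh is only an illustrative smooth nonlinearity.
    """
    residuals = phi(A @ x) - b   # elementwise phi of the linear measurements
    return np.mean(residuals ** 2)
```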
(Robust) Phase retrieval
[Candès, Li, Soltanolkotabi 15]

Observations (usually) $b_i = \langle a_i, x_\star \rangle^2$ yield objective
$$f(x) = \frac{1}{m} \sum_{i=1}^m \left| \langle a_i, x \rangle^2 - b_i \right|$$

SLIDE 15
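This objective is a one-liner in numpy; a sketch:

```python
import numpy as np

def phase_retrieval_objective(A, b, x):
    """Robust phase retrieval: f(x) = (1/m) * sum_i |<a_i, x>^2 - b_i|."""
    return np.mean(np.abs((A @ x) ** 2 - b))
```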
Optimization methods

How do we solve optimization problems?
1. Build a "good" but simple local model of f
2. Minimize the model (perhaps regularizing)

Gradient descent: Taylor (first-order) model
$$f(y) \approx f_x(y) := f(x) + \nabla f(x)^T (y - x)$$

Newton's method: Taylor (second-order) model
$$f(y) \approx f_x(y) := f(x) + \nabla f(x)^T (y - x) + \tfrac{1}{2} (y - x)^T \nabla^2 f(x) (y - x)$$

SLIDE 18
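To connect step 2's regularization to the familiar algorithm, here is the standard computation (spelled out for completeness) showing that minimizing the regularized first-order model is exactly a gradient step:

```latex
% Minimize the first-order model plus a proximal term:
x_{k+1} = \operatorname*{argmin}_y \Big\{ f(x_k) + \nabla f(x_k)^T (y - x_k)
        + \tfrac{1}{2\alpha} \| y - x_k \|_2^2 \Big\}
        = x_k - \alpha \nabla f(x_k),
% since setting the y-gradient to zero gives
% \nabla f(x_k) + (y - x_k)/\alpha = 0.
```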
Modeling composite problems

Now we make a convex model of $f(x) = h(c(x))$ by linearizing $c$:
$$f_x(y) := h\Big( \underbrace{c(x) + \nabla c(x)^T (y - x)}_{= c(y) + O(\|x - y\|_2^2)} \Big)$$
[Burke 85; Drusvyatskiy, Ioffe, Lewis 16]

Example: $f(x) = |x^2 - 1|$, with $h(z) = |z|$ and $c(x) = x^2 - 1$

SLIDE 27
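Writing the model out for this example makes the construction concrete (a one-line computation):

```latex
% Linearize c(y) = y^2 - 1 at the point x inside h(z) = |z|:
f_x(y) = \left| x^2 - 1 + 2x (y - x) \right|,
% a convex, piecewise-linear function of y, even though f is non-convex.
```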
The prox-linear method [Burke, Drusvyatskiy et al.]

Iteratively (1) form a regularized convex model and (2) minimize it:
$$x_{k+1} = \operatorname*{argmin}_{x \in X} \Big\{ f_{x_k}(x) + \frac{1}{2\alpha} \|x - x_k\|_2^2 \Big\} = \operatorname*{argmin}_{x \in X} \Big\{ h\big( c(x_k) + \nabla c(x_k)^T (x - x_k) \big) + \frac{1}{2\alpha} \|x - x_k\|_2^2 \Big\}$$

On the running example the iterates converge quadratically:
$$|x_k - x_\star| = .3, \quad .024, \quad 3 \cdot 10^{-4}, \quad 4 \cdot 10^{-8} \quad \text{over successive iterations}$$

SLIDE 33
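For the one-dimensional example the prox-linear subproblem has a closed form, so the whole method fits in a few lines. A sketch (the closed-form step and the step size alpha = 1 are my choices, so the printed errors will not exactly match the slide's sequence):

```python
import numpy as np

def prox_linear_1d(x0, alpha=1.0, iters=6):
    """Prox-linear iteration for f(x) = |x^2 - 1| with c(x) = x^2 - 1.

    Each step minimizes |c(xk) + c'(xk) d| + d^2 / (2 alpha) over d, whose
    minimizer is d = -min(alpha, |c| / c'^2) * sign(c) * c'.
    """
    x = x0
    for _ in range(iters):
        c, g = x ** 2 - 1.0, 2.0 * x      # c(xk) and its derivative
        if g == 0.0:                      # x = 0 is a degenerate point
            break
        x = x - min(alpha, abs(c) / g ** 2) * np.sign(c) * g
        print(abs(x - 1.0))               # distance to the root x_star = 1
    return x

prox_linear_1d(1.3)                       # start with |x0 - x_star| = 0.3
```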
Robust phase retrieval problems
A nice application for these composite methods
SLIDE 34
Robust phase retrieval problems

Data model: true signal $x_\star \in \mathbb{R}^n$; for $p_{\rm fail} < \frac{1}{2}$, observe
$$b_i = \langle a_i, x_\star \rangle^2 + \xi_i \quad \text{where} \quad \xi_i = \begin{cases} 0 & \text{w.p.} \ge 1 - p_{\rm fail} \\ \text{arbitrary} & \text{otherwise} \end{cases}$$

Goal: solve
$$\mathop{\rm minimize}_x \quad f(x) = \frac{1}{m} \sum_{i=1}^m \left| \langle a_i, x \rangle^2 - b_i \right|$$

Composite problem: $f(x) = \frac{1}{m} \| \phi(Ax) - b \|_1 = h(c(x))$, where $\phi(\cdot)$ is the elementwise square, $h(z) = \frac{1}{m} \|z\|_1$, and $c(x) = \phi(Ax) - b$

SLIDE 37
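In this composite form, each prox-linear subproblem is an explicit convex program. A sketch of one step using cvxpy (the solver choice and function name are mine):

```python
import cvxpy as cp
import numpy as np

def prox_linear_step(A, b, xk, alpha):
    """One prox-linear step for f(x) = (1/m) * || (Ax)^2 - b ||_1.

    With c_i(x) = <a_i, x>^2 - b_i and grad c_i(x) = 2 <a_i, x> a_i, this
    minimizes h(c(xk) + grad c(xk)^T (x - xk)) + ||x - xk||^2 / (2 alpha).
    """
    m, n = A.shape
    Axk = A @ xk
    x = cp.Variable(n)
    linearized = Axk ** 2 - b + 2 * cp.multiply(Axk, A @ (x - xk))
    objective = cp.sum(cp.abs(linearized)) / m \
        + cp.sum_squares(x - xk) / (2 * alpha)
    cp.Problem(cp.Minimize(objective)).solve()
    return x.value
```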
A convergence theorem

Three key ingredients:
(1) Stability: $f(x) - f(x_\star) \ge \lambda \cdot \|x - x_\star\|_2 \, \|x + x_\star\|_2$
(2) Close models: $|f_x(y) - f(y)| \le \frac{1}{m} \|A^T A\|_{\rm op} \, \|x - y\|_2^2$
(3) A good initialization

◮ Measurement matrix $A = [a_1 \, \cdots \, a_m]^T \in \mathbb{R}^{m \times n}$, so $\frac{1}{m} A^T A = \frac{1}{m} \sum_{i=1}^m a_i a_i^T$
◮ Convex model $f_x$ of $f$ at $x$ defined by $f_x(y) = h(c(x) + \nabla c(x)^T (y - x))$, i.e.
$$f_x(y) = \frac{1}{m} \sum_{i=1}^m \left| \langle a_i, x \rangle^2 + 2 \langle a_i, x \rangle \langle a_i, y - x \rangle - b_i \right|$$

Theorem (D. & Ruan 17)
Define $\mathrm{dist}(x, x_\star) = \min\{ \|x - x_\star\|_2, \|x + x_\star\|_2 \}$. Let $x_k$ be generated by the prox-linear method and $L = \frac{1}{m} \|A^T A\|_{\rm op}$. Then
$$\mathrm{dist}(x_k, x_\star) \le \Big( \frac{2L}{\lambda} \, \mathrm{dist}(x_0, x_\star) \Big)^{2^k}.$$

SLIDE 41
Unpacking the convergence theorem

Recall: $\mathrm{dist}(x_k, x_\star) \le \big( \frac{2L}{\lambda} \, \mathrm{dist}(x_0, x_\star) \big)^{2^k}$ with $L = \frac{1}{m} \|A^T A\|_{\rm op}$.

◮ Quadratic convergence: for all intents and purposes, 6 iterations
◮ Requires solving explicit convex optimization problems (quadratic programs) with no tuning parameters

SLIDE 42
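To see why "6 iterations" is essentially all of them, suppose the initialization achieves a contraction factor of 1/2 (an illustrative value of mine):

```latex
% With (2L/\lambda) dist(x_0, x_\star) = 1/2, after k iterations
% dist(x_k, x_\star) \le (1/2)^{2^k}; at k = 6 this is
\left( \tfrac{1}{2} \right)^{2^6} = 2^{-64} \approx 5.4 \times 10^{-20},
% already far below double floating-point precision.
```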
Ingredients in convergence: stability

1. Stability (cf. Eldar and Mendelson 14):
$$f(x) - f(x_\star) \ge \lambda \cdot \|x - x_\star\|_2 \, \|x + x_\star\|_2$$

What is necessary?

Proposition (D. & Ruan 17)
Assume the uniformity condition: for all $u, v \in \mathbb{R}^n$ and $a \sim P$,
$$P\big( |u^T a a^T v| \ge \epsilon_0 \|u\|_2 \|v\|_2 \big) \ge c > 0.$$
Then $f$ is $\frac{1}{2}\epsilon_0$-stable with probability at least $1 - e^{-cm}$.

(Gaussians satisfy this)

SLIDE 45
Ingredients in convergence: stability
Growth condition (stability): $\langle a_i, x \rangle^2 - \langle a_i, x_\star \rangle^2 = \langle a_i, x - x_\star \rangle \, \langle a_i, x + x_\star \rangle$, so under random $a_i$ with uniform enough support (and noiseless measurements),
$$f(x) = \frac{1}{m} \sum_{i=1}^m \left| (x - x_\star)^T a_i a_i^T (x + x_\star) \right| \gtrsim \|x - x_\star\|_2 \, \|x + x_\star\|_2$$
SLIDE 46
Ingredients in convergence

2. Approximation: need $\frac{1}{m} \|A^T A\|_{\rm op} = O(1)$

What is necessary?

Proposition (Vershynin 11)
If the measurement vectors $a_i$ are sub-Gaussian, then
$$\frac{1}{m} \|A^T A\|_{\rm op} \le O(1) \cdot \Big( 1 + \sqrt{\tfrac{n}{m}} + t \Big) \quad \text{w.p.} \ge 1 - e^{-mt^2}.$$

Heavy-tailed data gets $\frac{1}{m} \|A^T A\|_{\rm op} = O(1)$ with reasonable probability for $m$ a bit larger

SLIDE 48
Ingredients in convergence: spectral initialization

Insight [Wang, Giannakis, Eldar 16]: most vectors $a_i \in \mathbb{R}^n$ are nearly orthogonal to $x_\star$, so
$$X_{\rm init} := \sum_{i \,:\, b_i \le \mathrm{median}(b)} a_i a_i^T \quad \text{satisfies} \quad X_{\rm init} \approx E[a_i a_i^T] - c \, d_\star d_\star^T$$
where $d_\star = x_\star / \|x_\star\|_2$

SLIDE 51
Ingredients in convergence: spectral initialization

3. Initialization: we need $\mathrm{dist}(x_0, x_\star) \le \frac{1}{2} \|x_\star\|_2$

Estimate the direction $\hat{d} \approx x_\star / \|x_\star\|_2$ and radius $\hat{r}$ by
$$X_{\rm init} := \sum_{i \,:\, b_i \le \mathrm{median}(b)} a_i a_i^T, \qquad \hat{d} = \operatorname*{argmin}_{d \in \mathbb{S}^{n-1}} d^T X_{\rm init} d, \qquad \hat{r} := \Big( \frac{1}{m} \sum_{i=1}^m b_i \Big)^{1/2} \approx \|x_\star\|_2$$

Proposition (D. & Ruan 17)
Under appropriate orthogonality conditions, $x_0 = \hat{r} \hat{d}$ satisfies
$$\mathrm{dist}(x_0, x_\star) \lesssim \sqrt{\tfrac{n}{m}} + t$$
with probability at least $1 - e^{-mt^2}$

SLIDE 54
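A numpy sketch of this initialization (variable names mine; the sign of the estimated direction is ambiguous, which is why dist is defined up to ±):

```python
import numpy as np

def spectral_init(A, b):
    """Spectral initialization: direction from the measurements with small
    b_i (whose a_i are nearly orthogonal to x_star), radius from the mean of b.
    """
    small = b <= np.median(b)                 # indices i with b_i below median
    X_init = A[small].T @ A[small]            # sum of a_i a_i^T over those i
    eigvals, eigvecs = np.linalg.eigh(X_init)
    d_hat = eigvecs[:, 0]                     # eigenvector, smallest eigenvalue
    r_hat = np.sqrt(max(np.mean(b), 0.0))     # radius estimate ~ ||x_star||_2
    return r_hat * d_hat
```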
Take-home result

◮ Stability: measurements $a_i$ are uniform enough in direction
◮ Closeness: $a_i$ are sub-Gaussian or normalized
◮ Sufficient conditions for initialization: for $v \in \mathbb{S}^{n-1}$,
$$E\big[ a_i a_i^T \mid \langle a_i, v \rangle^2 \le \|v\|_2^2 \big] = I_n - c \, v v^T + E$$
where $c > 0$ and $E$ is a small error
◮ Measurement failure probability $p_{\rm fail} \le \frac{1}{4}$

Theorem (D. & Ruan 17)
If these conditions hold and $m/n \gtrsim 1$, then the spectral initialization succeeds and the iterates $x_k$ of the prox-linear algorithm satisfy
$$\mathrm{dist}(x_k, x_\star) \le \big( O(1) \cdot \mathrm{dist}(x_0, x_\star) \big)^{2^k}$$

SLIDE 55
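Putting the pieces together, a sketch of the full pipeline; it reuses the spectral_init and prox_linear_step sketches above, and the six iterations reflect the quadratic-convergence remark:

```python
def robust_phase_retrieval(A, b, alpha=1.0, iters=6):
    """End-to-end sketch: spectral initialization, then prox-linear steps.

    spectral_init and prox_linear_step are the earlier sketches.
    """
    x = spectral_init(A, b)
    for _ in range(iters):
        x = prox_linear_step(A, b, x, alpha)
    return x
```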
Experiments
1. Random (Gaussian) measurements
2. Adversarially chosen outliers
3. Real images
SLIDE 56
Experiment 1: random Gaussian measurements
◮ Data generation: dimension $n = 3000$, $a_i \overset{\rm iid}{\sim} N(0, I_n)$ and $b_i = \langle a_i, x_\star \rangle^2$
◮ Compare to Wang, Giannakis, Eldar's Truncated Amplitude Flow (best performing non-convex approach)
◮ Look at success probability against $m/n$ (note that $m \ge 2n - 1$ is necessary for injectivity)
SLIDE 57
Experiment 1: random Gaussian measurements
[Figure: P(success) vs. m/n over m/n ∈ [1.80, 2.20], comparing Prox (prox-linear) and TAF]
SLIDE 58
Experiment 1: random Gaussian measurements
[Figure: fraction of one-sided success vs. m/n over m/n ∈ [1.80, 2.20], comparing Prox and TAF]
SLIDE 59
Experiment 2: corrupted measurements
◮ Data generation: dimension $n = 200$, $a_i \overset{\rm iid}{\sim} N(0, I_n)$ and
$$b_i = \begin{cases} \text{adversarially chosen} & \text{w.p. } p_{\rm fail} \\ \langle a_i, x_\star \rangle^2 & \text{otherwise} \end{cases}$$
(the corruption is chosen to most confuse our initialization method)
◮ Compare to Zhang, Chi, Liang's Median-Truncated Wirtinger Flow (designed specially for standard Gaussian measurements)
◮ Look at success probability against $m/n$ (note that $m \ge 2n - 1$ is necessary for injectivity)
SLIDE 60
Experiment 2: corrupted measurements
[Figure: success probability vs. p_fail ∈ [0, 0.3], for m/n ∈ {1.8, 2.0, 2.5, 3.0, 4.0, 6.0, 8.0}]
SLIDE 61
Experiment 3: digit recovery
◮ Data generation: handwritten 16×16 grayscale digits, sensing matrix
$$A = \begin{bmatrix} H_n S_1 \\ H_n S_2 \\ H_n S_3 \end{bmatrix} \in \mathbb{R}^{3n \times n}$$
where $n = 256$, the $S_l$ are diagonal random sign matrices, and $H_n$ is the Hadamard transform matrix (construction sketched after this slide)
◮ Observe $b = (A x_\star)^2 + \xi$ where
$$\xi_i = \begin{cases} 0 & \text{w.p. } 1 - p_{\rm fail} \\ \text{Cauchy} & \text{otherwise} \end{cases}$$
◮ Other non-convex approaches are designed for Gaussian data; it is unclear how to parameterize them here
SLIDE 62
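A sketch of this sensing matrix construction (function name mine):

```python
import numpy as np
from scipy.linalg import hadamard

def digit_sensing_matrix(n=256, copies=3, seed=0):
    """Build A = [H_n S_1; H_n S_2; H_n S_3] in R^{3n x n}: Hadamard transform
    times diagonal random sign matrices S_l (n must be a power of 2).
    """
    rng = np.random.default_rng(seed)
    H = hadamard(n)
    # H @ diag(s) scales column j of H by s_j, so broadcasting does the job
    blocks = [H * rng.choice([-1.0, 1.0], size=n) for _ in range(copies)]
    return np.vstack(blocks)
```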
Experiment 3: digit recovery
Left: true image. Middle: spectral initialization. Right: solution.
SLIDE 63
Experiment 3: digit recovery
[Figure: success probability (0.3 to 1.0) and matrix multiply count (2,000 to 18,000) vs. p_fail ∈ [0, 0.2]: performance of the composite optimization scheme versus failure probability]
SLIDE 64
Experiment 4: real images

Signal size $n = 2^{22}$, measurements $m = 3 \cdot 2^{24}$

SLIDE 66
Composite optimization at scale

Question: what if we have composite problems with a really big sample?

◮ Typical stochastic optimization setup:
$$f(x) = E[F(x; S)] \quad \text{where} \quad F(x; S) = h(c(x; S); S)$$
◮ Example: large scale (robust) nonlinear regression
$$f(x) = \frac{1}{m} \sum_{i=1}^m \left| \phi(\langle a_i, x \rangle) - b_i \right|$$

SLIDE 69
A stochastic composite method

◮ Define the (random) convex approximation
$$F_x(y; s) = h\Big( \underbrace{c(x; s) + \nabla c(x; s)^T (y - x)}_{\approx c(y; s)} \,;\, s \Big)$$
◮ Then iterate for $k = 1, 2, \ldots$:
$$S_k \overset{\rm iid}{\sim} P, \qquad x_{k+1} = \operatorname*{argmin}_{x \in X} \Big\{ F_{x_k}(x; S_k) + \frac{1}{2\alpha_k} \|x - x_k\|_2^2 \Big\}$$
SLIDE 72
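When $h = |\cdot|$ and one sample is drawn per step, the update has a closed form: minimizing $|c + g^T d| + \|d\|_2^2 / (2\alpha_k)$ over $d = x' - x$ gives $d = -\min(\alpha_k, |c| / \|g\|_2^2) \, \mathrm{sign}(c) \, g$. A sketch of one such step (the derivation and the phi = tanh placeholder are mine):

```python
import numpy as np

def stochastic_prox_linear_step(x, a, b, alpha, phi=np.tanh, dphi=None):
    """One stochastic prox-linear step for F(x; (a, b)) = |phi(<a, x>) - b|."""
    if dphi is None:
        dphi = lambda t: 1.0 - np.tanh(t) ** 2  # derivative of placeholder phi
    t = a @ x
    c = phi(t) - b                # residual c(x; s)
    g = dphi(t) * a               # gradient of c(.; s) at x
    gn2 = g @ g
    if gn2 == 0.0:
        return x
    return x - min(alpha, abs(c) / gn2) * np.sign(c) * g
```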
Understanding convergence behavior
Ordinary differential equations (gradient flow): $\dot{x} = -\nabla f(x)$, i.e. $\frac{d}{dt} x(t) = -\nabla f(x(t))$
SLIDE 73
Understanding convergence behavior
Ordinary differential inclusions (subgradient flow): $\dot{x} \in -\partial f(x)$, i.e. $\frac{d}{dt} x(t) \in -\partial f(x(t))$
SLIDE 74
The differential inclusion
For the stochastic function
$$f(x) := E[F(x; S)] = E[h(c(x; S); S)] = \int h(c(x; s); s) \, dP(s)$$
the generalized subgradient (for non-convex, non-smooth $f$) is [D. & Ruan 17]
$$\partial f(x) = \int \nabla c(x; s) \, \partial h(c(x; s); s) \, dP(s)$$

Theorem (D. & Ruan 17)
For the stochastic composite problem, the subdifferential inclusion $\dot{x} \in -\partial f(x)$ has a unique trajectory for all time, and
$$f(x(t)) - f(x(0)) \le -\int_0^t \left\| \partial f(x(\tau)) \right\|_2^2 \, d\tau.$$
It also has limit points, and they are stationary.
SLIDE 75
The limiting differential inclusion

Recall our iteration
$$x_{k+1} = \operatorname*{argmin}_x \Big\{ F_{x_k}(x; S_k) + \frac{1}{2\alpha_k} \|x - x_k\|_2^2 \Big\}.$$
Optimality conditions: using $F_x(y; s) = h(c(x; s) + \nabla c(x; s)^T (y - x); s)$,
$$0 \in \nabla c(x_k; s) \, \partial h\Big( \underbrace{c(x_k; s) + \nabla c(x_k; s)^T (x_{k+1} - x_k)}_{= c(x_k; s) \pm O(\|x_k - x_{k+1}\|_2)} \,;\, s \Big) + \frac{1}{\alpha_k} [x_{k+1} - x_k]$$
i.e.
$$\frac{1}{\alpha_k} [x_{k+1} - x_k] \in -\nabla c(x_k; s) \, \partial h(c(x_k; s); s) + \text{subgradient mess} + \text{Noise} = -\partial f(x_k) + \text{subgradient mess} + \text{Noise}$$

SLIDE 79
Graphical example
Iterate
$$x_{k+1} = \operatorname*{argmin}_x \Big\{ F_{x_k}(x; S_k) + \frac{1}{2\alpha_k} \|x - x_k\|_2^2 \Big\}$$
SLIDE 80
A convergence guarantee
Consider the stochastic composite optimization problem
$$\mathop{\rm minimize}_{x \in X} \quad f(x) := E[F(x; S)] \quad \text{where} \quad F(x; s) = h(c(x; s); s).$$
Use the iteration
$$x_{k+1} = \operatorname*{argmin}_{x \in X} \Big\{ F_{x_k}(x; S_k) + \frac{1}{2\alpha_k} \|x - x_k\|_2^2 \Big\}.$$

Theorem (D. & Ruan 17)
Assume $X$ is compact and $\sum_{k=1}^\infty \alpha_k = \infty$, $\sum_{k=1}^\infty \alpha_k^2 < \infty$. Then the sequence $\{x_k\}$ satisfies:
(1) $f(x_k)$ converges
(2) All cluster points of $x_k$ are stationary
SLIDE 81
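A sketch of the full stochastic loop with a step-size schedule meeting the theorem's conditions; alpha_k = alpha0 / k^{3/4} is one valid choice of mine, since its sum diverges while its squared sum converges. It reuses stochastic_prox_linear_step from above, with phi(t) = t^2 for phase retrieval:

```python
import numpy as np

def stochastic_prox_linear(A, b, x0, alpha0=1.0, epochs=50, seed=0):
    """Stochastic prox-linear method with alpha_k = alpha0 / k**0.75."""
    rng = np.random.default_rng(seed)
    sq, dsq = lambda t: t * t, lambda t: 2.0 * t  # phi and phi' for <a,x>^2
    x, k = x0.copy(), 0
    for _ in range(epochs):
        for i in rng.permutation(A.shape[0]):     # draw samples S_k
            k += 1
            x = stochastic_prox_linear_step(x, A[i], b[i],
                                            alpha0 / k ** 0.75,
                                            phi=sq, dphi=dsq)
    return x
```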
Experiment: noiseless phase retrieval
[Figure: error (10^{-8} to 10^{-2}, log scale) vs. iteration count (50 to 200), comparing prox, sprox, and sgd]
SLIDE 82
Conclusions

1. Broadly interesting structures for non-convex problems that are still approximable
2. Statistical modeling allows solution of non-trivial, non-smooth, non-convex problems
3. Large scale efficient methods still important

SLIDE 83
References

◮ Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval. arXiv:1705.02356