

SLIDE 1

Introductory Course on Non-smooth Optimisation

Lecture 09 - Non-convex optimisation Jingwei Liang

Department of Applied Mathematics and Theoretical Physics

SLIDE 2

Table of contents

1. Examples
2. Non-convex optimisation
3. Convex relaxation
4. Łojasiewicz inequality
5. Kurdyka-Łojasiewicz inequality

SLIDE 3

Compressed sensing

Forward observation: b = A˚x, where ˚x ∈ R^n is sparse and A : R^n → R^m with m ≪ n.

Compressed sensing:

min_{x ∈ R^n} ||x||_0   s.t.   Ax = b.

NB: this is an NP-hard problem.
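The combinatorial nature of the ℓ0 problem can be made concrete with a small sketch (not from the slides, all sizes and names illustrative): the only general way to solve min ||x||_0 s.t. Ax = b exactly is to search over supports of increasing size, solving a least-squares problem on each one.

```python
import itertools
import numpy as np

# Toy illustration of why the l0 problem is combinatorial: the number of
# candidate supports grows as C(n,1) + C(n,2) + ... in the worst case.
rng = np.random.default_rng(0)
m, n = 5, 8
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[1, 6]] = [2.0, -1.0]          # 2-sparse ground truth
b = A @ x_true

def sparsest_solution(A, b, tol=1e-8):
    """Brute-force search over supports of increasing cardinality."""
    n = A.shape[1]
    for k in range(1, n + 1):
        for support in itertools.combinations(range(n), k):
            coef, res, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
            if np.linalg.norm(A[:, support] @ coef - b) < tol:
                x = np.zeros(n)
                x[list(support)] = coef
                return x
    return None

x_hat = sparsest_solution(A, b)
print(np.count_nonzero(x_hat))
```

Even at this toy scale the search visits up to 2^n supports; this is the exponential blow-up that motivates the convex relaxations discussed later.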

Jingwei Liang, DAMTP Introduction to Non-smooth Optimisation March 13, 2019

SLIDE 4

Image processing

Two-phase segmentation: given an image I, which consists of foreground and background, segment the foreground. Ideally, I = f·1_C + b·1_{Ω\C}.

Mumford–Shah model:

E(u, C) = ∫_Ω (u − I)^2 dx + λ ∫_{Ω\C} ||∇u||^2 dx + α|C|,

where |C| = peri(C), the perimeter of C.

SLIDE 5

Principal component pursuit

Forward mixture model: w = ˚x + ˚y + ǫ, where ˚x ∈ R^{m×n} is κ-sparse, ˚y ∈ R^{m×n} is σ-low-rank and ǫ is noise.

Non-convex PCP:

min_{x, y ∈ R^{m×n}} (1/2)||x + y − w||^2   s.t.   ||x||_0 ≤ κ and rank(y) ≤ σ.

SLIDE 6

Neural networks

Each layer of a NN is convex:
- Linear operation, e.g. convolution.
- Non-linear activation function, e.g. the rectifier max{x, 0}.

However, the composition of convex functions is not necessarily convex... Neural networks are universal function approximators, hence they need to approximate non-convex functions, and non-convex functions cannot be approximated by convex functions.
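The slide's claim can be checked in two lines of code: a two-"layer" stack of the convex rectifier is already non-convex (a small self-contained check, not from the slides).

```python
# relu is convex and affine maps are convex, yet stacking them is not:
# f(x) = relu(1 - relu(x)) violates the midpoint convexity inequality.
def relu(t):
    return max(t, 0.0)

def f(x):
    return relu(1.0 - relu(x))          # layer 2 composed with layer 1

# Convexity would require f((a+b)/2) <= (f(a)+f(b))/2 for all a, b.
a, b = -1.0, 0.5
midpoint_value = f((a + b) / 2)         # f(-0.25) = 1.0
chord_value = (f(a) + f(b)) / 2         # (1.0 + 0.5)/2 = 0.75
print(midpoint_value > chord_value)     # → True: convexity is violated
```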

SLIDE 7

Outline

1. Examples
2. Non-convex optimisation
3. Convex relaxation
4. Łojasiewicz inequality
5. Kurdyka-Łojasiewicz inequality

SLIDE 8

Non-convex optimisation

Non-convex problem: any problem that is neither convex nor concave is non-convex...

SLIDE 9

Challenges

- Potentially many local minima.
- Saddle points.
- Very flat regions.
- Widely varying curvature.
- NP-hardness.

SLIDE 10

Outline

1. Examples
2. Non-convex optimisation
3. Convex relaxation
4. Łojasiewicz inequality
5. Kurdyka-Łojasiewicz inequality

SLIDE 11

Convex relaxation

Non-convex optimisation problem:

min_x E(x).

Convex optimisation problem:

min_x F(x).

What if Argmin(F) ⊆ Argmin(E)? This is subtle and case-dependent: somehow, finding such an F is almost equivalent to solving E.

SLIDE 12

Convex relaxation

Loose relaxation vs. ideal relaxation: in practice, it is easier to obtain Argmin(E) ⊆ Argmin(F).
- A loose relaxation will work if the two global minima are close enough.
- An ideal relaxation will fail if Argmin(F) is too large.

SLIDE 13

Convolution

For certain problems, non-convexity can be treated as noise... [Figure: the original function and its smoothed version after convolution.] A symmetric boundary condition is used for the convolution, and the problem is almost convex after convolution.
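The smoothing idea above can be sketched numerically: convolve a non-convex 1-D function (convex bowl plus oscillation) with a Gaussian kernel under a symmetric boundary condition, and count how many local minima survive. All constants below are ad hoc choices, not from the slides.

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 601)                 # grid step 0.01
f = x**2 + 0.5 * np.sin(20 * x)                 # convex bowl + "noise"

t = np.arange(-60, 61) * 0.01                   # kernel on the same grid
kernel = np.exp(-t**2 / (2 * 0.2**2))           # Gaussian, sigma = 0.2
kernel /= kernel.sum()

f_padded = np.pad(f, 60, mode="symmetric")      # symmetric boundary condition
f_smooth = np.convolve(f_padded, kernel, mode="valid")

def local_minima(g):
    """Count strict interior local minima of a sampled function."""
    return int(np.sum((g[1:-1] < g[:-2]) & (g[1:-1] < g[2:])))

print(local_minima(f), local_minima(f_smooth))  # many minima vs. nearly one
```

The Gaussian attenuates the oscillation by roughly exp(−ω²σ²/2), so the high-frequency "noise" is essentially removed while the underlying bowl survives.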

SLIDE 14

Outline

1. Examples
2. Non-convex optimisation
3. Convex relaxation
4. Łojasiewicz inequality
5. Kurdyka-Łojasiewicz inequality

SLIDE 15

Smooth problem

Let F ∈ C^1_L, i.e. F is differentiable with L-Lipschitz gradient.

Gradient descent: xk+1 = xk − γ∇F(xk).

Descent property:

F(xk) − F(xk+1) ≥ γ(1 − γL/2) ||∇F(xk)||^2.

Let γ ∈ ]0, 2/L[. Summing over the iterations,

γ(1 − γL/2) Σ_{i=0}^{k} ||∇F(xi)||^2 ≤ F(x0) − F(xk+1) ≤ F(x0) − F(x⋆).

Since F(x⋆) > −∞, the right-hand side is a finite constant; letting k → +∞ on the left-hand side,

lim_{k→+∞} ||∇F(xk)||^2 = 0.

NB: in the smooth case, a critical point is guaranteed. For non-smooth problems...
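The argument above can be watched in action on a toy smooth non-convex function (my example, not the slides'): F(x) = (x² − 1)² is a double well with critical points {−1, 0, 1}, and gradient descent with a small enough step drives ||∇F(xk)|| to zero.

```python
# Gradient descent on the double well F(x) = (x^2 - 1)^2, started at x0 = 2.
def F(x):
    return (x**2 - 1.0)**2

def gradF(x):
    return 4.0 * x * (x**2 - 1.0)

x = 2.0
gamma = 0.02                    # small enough for descent along this trajectory
values = [F(x)]
for _ in range(500):
    x = x - gamma * gradF(x)
    values.append(F(x))

# F(xk) decreases monotonically and the gradient vanishes at the limit x = 1.
print(abs(gradF(x)) < 1e-8, abs(x - 1.0) < 1e-8)  # → True True
```

Note that only a critical point is guaranteed: started at x0 = −2 the same scheme would converge to −1, and started exactly at 0 it would stay at the local maximum.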

SLIDE 16

Semi-algebraic sets and functions

Semi-algebraic set: a semi-algebraic subset of R^n is a finite union of sets of the form

{ x ∈ R^n : fi(x) = 0, gj(x) ≤ 0, i ∈ I, j ∈ J },

where I, J are finite and fi, gj : R^n → R are real polynomial functions. Semi-algebraic sets are stable under finite intersection, union and complementation.

Semi-algebraic function: a function or a mapping is semi-algebraic if its graph is a semi-algebraic set. The same definition applies to extended real-valued functions and to multivalued mappings.

SLIDE 17

Properties

Tarski–Seidenberg:
- The image of a semi-algebraic set under a linear projection is semi-algebraic.
- The closure of a semi-algebraic set A is semi-algebraic.

Examples:
- The graph of the derivative of a semi-algebraic function is semi-algebraic.
- Let A be a semi-algebraic subset of R^n and f : R^n → R^p semi-algebraic. Then f(A) is semi-algebraic.
- g(x) = max{F(x, y) : y ∈ S} is semi-algebraic if F and S are semi-algebraic.

Other examples:

min_x (1/2)||Ax − b||^2 + µ||x||_p, with p rational,

min_X (1/2)||AX − B||^2 + µ rank(X).

SLIDE 18

Subdifferential

Convex subdifferential: for R ∈ Γ0(R^n),

∂R(x) = { g : R(x′) ≥ R(x) + ⟨g, x′ − x⟩, ∀x′ ∈ R^n }.

Fréchet subdifferential: given x ∈ dom(R), the Fréchet subdifferential ˆ∂R(x) of R at x is the set of vectors v such that

lim inf_{x′→x, x′≠x} (1/||x − x′||) ( R(x′) − R(x) − ⟨v, x′ − x⟩ ) ≥ 0.

If x ∉ dom(R), then ˆ∂R(x) = ∅.

Limiting subdifferential: the limiting subdifferential (or simply subdifferential) of R at x, written ∂R(x), reads

∂R(x) def= { v ∈ R^n : ∃ xk → x, R(xk) → R(x), vk ∈ ˆ∂R(xk) → v }.

ˆ∂R(x) is convex and ∂R(x) is closed.

SLIDE 19

Critical points

Minimal norm subgradient:

||∂R(x)||_− = min{ ||v|| : v ∈ ∂R(x) }.

Critical points:
- Fermat's rule: if x is a minimiser of R, then 0 ∈ ∂R(x). Conversely, when 0 ∈ ∂R(x), the point x is called a critical point.
- When R is convex, any minimiser is a global minimiser.
- When R is non-convex, a critical point can be a local minimum, a local maximum or a saddle point.

SLIDE 20

Sharpness

A function R : R^n → R ∪ {+∞} is called sharp on the slice

[a < R < b] def= { x ∈ R^n : a < R(x) < b }

if there exists α > 0 such that ||∂R(x)||_− ≥ α for all x ∈ [a < R < b]. Example: norms, e.g. R(x) = ||x||.

SLIDE 21

Łojasiewicz inequality

Łojasiewicz inequality: let R : R^n → R ∪ {+∞} be proper, lower semi-continuous, and moreover continuous along its domain. Then R is said to have the Łojasiewicz property if, for any critical point ¯x, there exist C, ǫ > 0 and θ ∈ [0, 1[ such that

|R(x) − R(¯x)|^θ ≤ C||v||,   ∀x ∈ B_{¯x}(ǫ), v ∈ ∂R(x).

By convention, 0^0 = 0.

Property: suppose that R has the Łojasiewicz property. If S is a connected subset of the set of critical points of R, that is 0 ∈ ∂R(x) for all x ∈ S, then R is constant on S. If in addition S is a compact set, then there exist C, ǫ > 0 and θ ∈ [0, 1[ such that, for any ¯x ∈ S,

|R(x) − R(¯x)|^θ ≤ C||v||,   ∀x ∈ R^n with dist(x, S) ≤ ǫ, ∀v ∈ ∂R(x).
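The inequality is easy to check numerically on one-dimensional polynomials (my examples, not the slides'): at the critical point ¯x = 0, R(x) = x² satisfies it with θ = 1/2 and C = 1/2 since |x²|^{1/2} = |x| = (1/2)|2x|, while for R(x) = x⁴ the exponent degrades to θ = 3/4.

```python
import numpy as np

# Numeric sanity check of |R(x) - R(xbar)|^theta <= C |R'(x)| at xbar = 0.
xs = np.linspace(-0.5, 0.5, 1001)
xs = xs[xs != 0]                      # exclude the critical point itself

# R(x) = x^2: theta = 1/2 works, since |x^2|^(1/2) = |x| = (1/2)*|2x|.
assert np.all(np.abs(xs**2)**0.5 <= 0.5 * np.abs(2 * xs) + 1e-12)

# R(x) = x^4: the exponent degrades to theta = 3/4, |x^4|^(3/4) = (1/4)*|4x^3|.
assert np.all(np.abs(xs**4)**0.75 <= 0.25 * np.abs(4 * xs**3) + 1e-12)

# theta = 1/2 fails for x^4 near 0: |x^4|^(1/2) = x^2, while |R'(x)| = 4|x|^3
# tends to zero strictly faster, so no constant C can work.
x = 1e-3
assert (x**4)**0.5 > 100 * abs(4 * x**3)
```

The flatter the function is around its critical point, the closer θ must be to 1, which is exactly what drives the convergence rates later in the lecture.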

SLIDE 22

Non-convex PPA

Proximal point algorithm: let R : R^n → R ∪ {+∞} be proper and lower semi-continuous. From an arbitrary x0 ∈ R^n, iterate

xk+1 ∈ argmin_x γR(x) + (1/2)||x − xk||^2.

Assumptions:
- R is bounded from below, that is inf_{x∈R^n} R(x) > −∞. This implies that argmin_x γR(x) + (1/2)||x − xk||^2 is non-empty and compact.
- The restriction of R to its domain is a continuous function.
- R has the Łojasiewicz property.
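A minimal sketch of the non-convex PPA (my toy instance, not from the slides): on the double well R(x) = (x² − 1)², the prox subproblem has no convenient closed form, so it is minimised by brute force over a fine 1-D grid. The grid trick is only viable in one dimension, but it makes the decrease property visible.

```python
import numpy as np

def R(x):
    return (x**2 - 1.0)**2

grid = np.linspace(-2.0, 3.0, 50001)   # grid step 1e-4
gamma = 0.1

x = 2.0
values = [R(x)]
for _ in range(100):
    # prox step: minimise gamma*R(.) + 0.5*||. - x||^2 over the grid
    x = grid[np.argmin(gamma * R(grid) + 0.5 * (grid - x)**2)]
    values.append(R(x))

# {R(xk)} is non-increasing by construction (the previous iterate is always
# a feasible candidate), and the iterates approach the critical point x = 1.
print(abs(x - 1.0) < 1e-2)  # → True
```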

SLIDE 23

Property

Let {xk}_{k∈N} be the sequence generated by the non-convex PPA and ω(xk) the set of its limit points. Then:
- The sequence {R(xk)}_{k∈N} is decreasing.
- Σ_k ||xk − xk+1||^2 < +∞.
- If R satisfies assumption 2, then ω(xk) ⊂ crit(R). If moreover {xk}_{k∈N} is bounded, ω(xk) is a non-empty compact set and dist(xk, ω(xk)) → 0.
- If R satisfies assumption 2, then R is finite and constant on ω(xk).

NB: boundedness can be guaranteed if R is coercive.

SLIDE 24

Convergence of PPA

Suppose the sequence {xk}_{k∈N} generated by the non-convex PPA is bounded. Then

Σ_k ||xk − xk+1|| < +∞,

and the whole sequence converges to some critical point ¯x ∈ crit(R).

From the definition of xk+1:

R(xk+1) + (1/(2γ))||xk − xk+1||^2 ≤ R(xk).

Consider g(s) = s^{1−θ}, s > 0, with g′(s) = (1 − θ)s^{−θ}. Since g is concave,

g(R(xk)) − g(R(xk+1)) ≥ (1 − θ) R(xk)^{−θ} ( R(xk) − R(xk+1) ) ≥ (1 − θ) R(xk)^{−θ} (1/(2γ))||xk − xk+1||^2.

WLOG, assume R(¯x) = 0 for ¯x ∈ ω(xk). Let vk ∈ ∂R(xk); then for all k large enough,

0 < R(xk)^θ ≤ C||vk|| = (C/γ)||xk − xk−1||.

Hence there exists M > 0 such that

||xk − xk+1||^2 / ||xk − xk−1|| ≤ M ( R(xk)^{1−θ} − R(xk+1)^{1−θ} ).

SLIDE 25

Convergence of PPA

Suppose the sequence {xk}_{k∈N} generated by the non-convex PPA is bounded. Then Σ_k ||xk − xk+1|| < +∞, and the whole sequence converges to some critical point ¯x ∈ crit(R).

Take r ∈ ]0, 1[. If ||xk − xk+1|| ≥ r||xk − xk−1||, then

||xk − xk+1|| ≤ (M/r) ( R(xk)^{1−θ} − R(xk+1)^{1−θ} ).

Therefore, for all k large enough,

||xk − xk+1|| ≤ r||xk − xk−1|| + (M/r) ( R(xk)^{1−θ} − R(xk+1)^{1−θ} ).

There exists some K > 0 such that, for k ≥ K,

Σ_{i=K}^{k} ||xi − xi+1|| ≤ (r/(1 − r))||xK − xK−1|| + (M/(r(1 − r))) ( R(xK)^{1−θ} − R(xk+1)^{1−θ} ).

R is bounded from below; take k → +∞...

SLIDE 26

Rate of convergence

Suppose the non-convex PPA converges, and denote by θ the Łojasiewicz exponent at x∞. The following statements hold:
- If θ = 0, then {xk}_{k∈N} converges in a finite number of steps.
- If θ ∈ ]0, 1/2], then there exists η ∈ ]0, 1[ such that ||xk − x∞|| = O(η^k).
- If θ ∈ ]1/2, 1[, then ||xk − x∞|| = O(k^{−(1−θ)/(2θ−1)}).

SLIDE 27

Outline

1. Examples
2. Non-convex optimisation
3. Convex relaxation
4. Łojasiewicz inequality
5. Kurdyka-Łojasiewicz inequality

SLIDE 28

Kurdyka-Łojasiewicz inequality (KL)

Let R : R^n → R ∪ {+∞} be proper l.s.c. For a, b such that −∞ < a < b < +∞, define

[a < R < b] def= { x ∈ R^n : a < R(x) < b }.

Kurdyka-Łojasiewicz inequality: R is said to have the KL property at ¯x ∈ dom(R) if there exist η ∈ ]0, +∞], a neighbourhood U of ¯x and a continuous concave function ϕ : [0, η[ → R_+ such that
- ϕ(0) = 0;
- ϕ is C^1 on ]0, η[;
- for all s ∈ ]0, η[, ϕ′(s) > 0;
- for all x ∈ U ∩ [R(¯x) < R < R(¯x) + η], the KL inequality holds:

ϕ′( R(x) − R(¯x) ) dist( 0, ∂R(x) ) ≥ 1.

Proper l.s.c. functions are KL at non-critical points. Proper l.s.c. functions which satisfy KL at each point of dom(∂R) are called KL functions. Typical KL functions are the class of semi-algebraic functions.

SLIDE 29

Kurdyka-Łojasiewicz functions

Compare ||∇R(x)|| ≥ 0 with ||∇(ϕ ◦ R)(x)|| ≥ 1: when R(¯x) = 0, the KL condition becomes ||∂(ϕ ◦ R)(x)||_− ≥ 1. ϕ is called a desingularising function for R, i.e. R is sharp up to reparameterisation by ϕ.

SLIDE 30

Abstract descent methods

Let Φ be proper and lower semi-continuous. Suppose a sequence {xk}_{k∈N} is generated such that the following conditions are satisfied, where c, d > 0 are constants.

A.1 Sufficient decrease condition: for each k ∈ N,

Φ(xk+1) + c||xk+1 − xk||^2 ≤ Φ(xk).

A.2 Relative error condition: for each k ∈ N, there exists gk+1 ∈ ∂Φ(xk+1) such that

||gk+1|| ≤ d||xk+1 − xk||.

A.3 Continuity condition: there exist a subsequence {xkj}_{j∈N} and ¯x such that

xkj → ¯x,   Φ(xkj) → Φ(¯x).
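For intuition, gradient descent on a C^1_L function is a standard instance of this framework: A.1 holds with c = (1/γ)(1 − γL/2) by the descent lemma, and A.2 holds with gk+1 = ∇Φ(xk+1) and d = L + 1/γ, since ||∇Φ(xk+1)|| ≤ ||∇Φ(xk+1) − ∇Φ(xk)|| + (1/γ)||xk+1 − xk||. A quick numerical check on a toy function of my choosing:

```python
import numpy as np

# Toy check of A.1 and A.2 for gradient descent on
# Phi(x) = x^2 + 0.3*sin(5x), whose gradient is L-Lipschitz with L = 9.5.
def Phi(x):
    return x**2 + 0.3 * np.sin(5 * x)

def grad(x):
    return 2 * x + 1.5 * np.cos(5 * x)

L, gamma = 9.5, 0.05
c = (1 / gamma) * (1 - gamma * L / 2)   # A.1 constant from the descent lemma
d = L + 1 / gamma                        # A.2 constant via the triangle inequality

x = 3.0
for _ in range(100):
    x_new = x - gamma * grad(x)
    step = abs(x_new - x)
    assert Phi(x_new) + c * step**2 <= Phi(x) + 1e-10    # A.1 holds
    assert abs(grad(x_new)) <= d * step + 1e-10          # A.2 holds
    x = x_new
```

The Forward–Backward and PALM schemes later in the lecture are verified against A.1-A.3 in exactly this way, with the prox step replacing the plain gradient step.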

SLIDE 31

Convergence

Let Φ : R^n → R ∪ {+∞} be proper, l.s.c. and KL at some ¯x ∈ R^n. Let U, η and ϕ be as in the KL property, and let δ, ρ > 0 be such that B_{¯x}(δ) ⊂ U with ρ ∈ ]0, δ[. Consider a sequence {xk}_{k∈N} which satisfies (A.1)-(A.2). Suppose moreover that

Φ(¯x) < Φ(x0) < Φ(¯x) + η,

||x0 − ¯x|| + 2 sqrt( (Φ(x0) − Φ(¯x))/c ) + (d/c) ϕ( Φ(x0) − Φ(¯x) ) < ρ,

and that, for all k ∈ N, xk ∈ B_{¯x}(ρ) ⇒ xk+1 ∈ B_{¯x}(δ) with Φ(xk+1) ≥ Φ(¯x). Then the sequence {xk}_{k∈N} satisfies

∀k ∈ N, xk ∈ B_{¯x}(δ),   Σ_k ||xk − xk+1|| < +∞,   Φ(xk) → Φ(¯x),

and converges to a point x⋆ ∈ B_{¯x}(δ) such that Φ(x⋆) ≤ Φ(¯x). If moreover (A.3) holds, then x⋆ is a critical point and Φ(x⋆) = Φ(¯x).

SLIDE 32

Convergence

Condition (A.1) implies that {Φ(xk)}_{k∈N} is non-increasing, and for all k ∈ N,

||xk+1 − xk|| ≤ sqrt( (Φ(xk) − Φ(xk+1))/c ).

Condition (A.2) and the KL inequality give

ϕ′( Φ(xk) − Φ(¯x) ) ≥ 1/||gk|| ≥ 1/( d||xk − xk−1|| ).

Since ϕ is concave,

ϕ( Φ(xk) − Φ(¯x) ) − ϕ( Φ(xk+1) − Φ(¯x) ) ≥ ϕ′( Φ(xk) − Φ(¯x) ) ( Φ(xk) − Φ(xk+1) ) ≥ ϕ′( Φ(xk) − Φ(¯x) ) c||xk − xk+1||^2.

Combining the above two yields

||xk − xk+1||^2 / ||xk − xk−1|| ≤ (d/c) ( ϕ( Φ(xk) − Φ(¯x) ) − ϕ( Φ(xk+1) − Φ(¯x) ) ).

Applying the inequality 2 sqrt(xy) ≤ x + y,

2||xk − xk+1|| ≤ ||xk − xk−1|| + (d/c) ( ϕ( Φ(xk) − Φ(¯x) ) − ϕ( Φ(xk+1) − Φ(¯x) ) ).

SLIDE 33

Convergence

Continuing with (A.1),

||x1 − x0|| ≤ sqrt( (Φ(x0) − Φ(x1))/c ) ≤ sqrt( (Φ(x0) − Φ(¯x))/c ).

Then

||x1 − ¯x|| ≤ ||x1 − x0|| + ||x0 − ¯x|| ≤ ||x0 − ¯x|| + sqrt( (Φ(x0) − Φ(¯x))/c ) ≤ ρ.

By induction, we can show that, for all k ∈ N, xk ∈ B_{¯x}(ρ) and

Σ_{i=1}^{k} ||xi+1 − xi|| + ||xk+1 − xk|| ≤ ||x1 − x0|| + (d/c) ( ϕ( Φ(x1) − Φ(¯x) ) − ϕ( Φ(xk+1) − Φ(¯x) ) ).

The above directly implies

Σ_k ||xk − xk+1|| ≤ ||x1 − x0|| + (d/c) ϕ( Φ(x1) − Φ(¯x) ) < +∞.

Hence, there exists x⋆ ∈ ω(xk) with xk → x⋆, gk → 0 and Φ(xk) → v ≥ Φ(¯x). The KL inequality ϕ′( v − Φ(¯x) ) ||gk|| ≥ 1 indicates v = Φ(¯x). Lower semi-continuity yields Φ(x⋆) ≤ Φ(¯x).

SLIDE 34

Forward–Backward splitting

Consider minimising

min_{x∈R^n} { Φ(x) def= R(x) + F(x) },

where R : R^n → R ∪ {+∞} is proper l.s.c. and bounded from below, and F : R^n → R is finite-valued, differentiable and ∇F is L-Lipschitz.

Forward–Backward splitting: let γ ∈ ]0, 1/L[,

xk+1 ∈ prox_{γR}( xk − γ∇F(xk) ).

Sufficient decrease:

Φ(xk+1) + ((1 − γL)/(2γ)) ||xk − xk+1||^2 ≤ Φ(xk).

Relative error: with

gk+1 def= (1/γ)(xk − xk+1) − ∇F(xk) + ∇F(xk+1) ∈ ∂Φ(xk+1),

we have ||gk+1|| ≤ (1/γ + L) ||xk − xk+1||.

Continuity: the sequence {xk}_{k∈N} is bounded.
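The scheme above can be sketched on a genuinely non-convex instance: F(x) = (1/2)||Ax − b||² with R = λ||x||_0, for which prox_{γR} is hard thresholding (keep z_i iff z_i² > 2γλ). Problem sizes and constants below are my ad hoc choices; this illustrates the iteration and its monotone decrease, not a tuned solver.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 40, 100
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, 5, replace=False)] = rng.standard_normal(5) * 3
b = A @ x_true

lam = 0.01
L = np.linalg.norm(A, 2)**2            # Lipschitz constant of grad F
gamma = 0.5 / L                        # gamma in ]0, 1/L[

def objective(x):
    return 0.5 * np.sum((A @ x - b)**2) + lam * np.count_nonzero(x)

x = np.zeros(n)
obj = [objective(x)]
for _ in range(300):
    z = x - gamma * A.T @ (A @ x - b)              # forward (gradient) step
    x = np.where(z**2 > 2 * gamma * lam, z, 0.0)   # backward (prox) step
    obj.append(objective(x))

# gamma < 1/L gives the sufficient-decrease inequality, hence monotone Phi.
assert all(obj[i + 1] <= obj[i] + 1e-9 for i in range(len(obj) - 1))
```

Since the prox of the ℓ0 penalty is an exact componentwise minimiser, the sufficient decrease condition holds despite the non-convexity of R.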

SLIDE 35

A coupled minimisation problem

Consider minimising

min_{x∈R^n, y∈R^m} { E(x, y) def= R(x) + F(x, y) + J(y) },

where R : R^n → R ∪ {+∞} and J : R^m → R ∪ {+∞} are proper l.s.c. and bounded from below, and F : R^n × R^m → R is finite-valued, differentiable and ∇F is L-Lipschitz.

Subdifferential:

∂E(x, y) = ( ∂R(x) + ∇_x F(x, y) ) × ( ∂J(y) + ∇_y F(x, y) ) = ∂_x E(x, y) × ∂_y E(x, y).

Separate Lipschitz continuity for F: ∇_x F is L_x-Lipschitz and ∇_y F is L_y-Lipschitz.

SLIDE 36

Proximal alternating minimisation (PAM)

PAM is an alternating minimisation algorithm.

PAM: let γx, γy ∈ ]0, 1/L[,

xk+1 ∈ argmin_{x∈R^n} E(x, yk) + (1/(2γx))||x − xk||^2,
yk+1 ∈ argmin_{y∈R^m} E(xk+1, y) + (1/(2γy))||y − yk||^2.

- PAM is an instance of PPA; for convergence, let Φ(x, y) = E(x, y).
- In general there is no closed-form solution:

xk+1 ∈ argmin_{x∈R^n} E(x, yk) + (1/(2γx))||x − xk||^2 = argmin_{x∈R^n} R(x) + F(x, yk) + (1/(2γx))||x − xk||^2.

SLIDE 37

Proximal alternating linearised minimisation (PALM)

PALM is linearised PAM, based on

F(x, yk) ≤ F(xk, yk) + ⟨∇_x F(xk, yk), x − xk⟩ + (1/(2γx))||x − xk||^2.

PALM: let γx, γy ∈ ]0, 1/L[,

xk+1 ∈ prox_{γx R}( xk − γx ∇_x F(xk, yk) ),
yk+1 ∈ prox_{γy J}( yk − γy ∇_y F(xk+1, yk) ).

PALM is an instance of Forward–Backward; for convergence, let Φ(x, y) = E(x, y).
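A minimal PALM sketch (my toy instance, not from the slides): rank-1 non-negative factorisation E(x, y) = (1/2)||xyᵀ − W||² + ι_{x≥0}(x) + ι_{y≥0}(y), so both prox maps are projections onto the non-negative orthant, and the block Lipschitz constants are L_x = ||y||² and L_y = ||x||². Sizes and the step rule are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 15
W = np.abs(rng.standard_normal((m, n)))   # data matrix to factorise

def E(x, y):
    return 0.5 * np.sum((np.outer(x, y) - W)**2)

x = np.abs(rng.standard_normal(m))
y = np.abs(rng.standard_normal(n))
vals = [E(x, y)]
for _ in range(200):
    Lx = y @ y + 1e-12                    # Lipschitz constant of grad_x F(., y)
    x = np.maximum(x - (np.outer(x, y) - W) @ y / (1.1 * Lx), 0.0)
    Ly = x @ x + 1e-12                    # Lipschitz constant of grad_y F(x, .)
    y = np.maximum(y - (np.outer(x, y) - W).T @ x / (1.1 * Ly), 0.0)
    vals.append(E(x, y))

# Each block update is a Forward-Backward step with step < 1/L, so E decreases.
assert all(vals[i + 1] <= vals[i] + 1e-9 for i in range(len(vals) - 1))
```

Note the y-step uses the freshly updated xk+1, exactly as in the PALM iteration above; the per-block step sizes adapt to the current block Lipschitz constants.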

SLIDE 38

Remarks

- These methods converge to a global minimiser if started close enough.
- Inertial acceleration can be applied to all of them.
- Step-size vs. inertial parameter.
- Step-size and critical points.
- Stochastic optimisation methods can escape saddle points or find a global minimiser...

SLIDE 39

Reference

- H. Attouch and J. Bolte. "On the convergence of the proximal algorithm for nonsmooth functions involving analytic features". Mathematical Programming 116.1-2 (2009): 5-16.
- H. Attouch, J. Bolte, and B. Svaiter. "Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods". Mathematical Programming 137.1-2 (2013): 91-129.
- H. Attouch, et al. "Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality". Mathematics of Operations Research 35.2 (2010): 438-457.
- J. Bolte, S. Sabach, and M. Teboulle. "Proximal alternating linearized minimization for nonconvex and nonsmooth problems". Mathematical Programming 146.1-2 (2014): 459-494.
- J. Liang, J. Fadili, and G. Peyré. "A multi-step inertial forward-backward splitting method for non-convex optimization". Advances in Neural Information Processing Systems. 2016.