


iPiano: Inertial Proximal Algorithm for Non-Convex Optimization

David Stutz

June 2, 2016


Table of Contents

1. Problem
2. Related Work
3. Algorithm
4. Convergence
5. Implementation
6. Applications
7. Conclusion

Problem

Problem. Minimize the composite function

  min_{x ∈ R^d} h(x) = min_{x ∈ R^d} (f(x) + g(x))    (1)

where
– f : R^d → R is in C^1 with L-Lipschitz continuous gradient;
– g : dom(g) ⊂ R^d → R_∞ is proper, closed, convex and lower semicontinuous;
– and h is coercive and bounded from below by

  −∞ < h_min := inf_{x ∈ R^d} h(x).

Ochs et al. [OCBP14] combine forward-backward splitting with an inertial force/momentum term to solve Equation (1) iteratively.



Related Work

Gradient descent for h ∈ C^1:

  x^{(n+1)} = x^{(n)} − α_n ∇h(x^{(n)}).

Gradient descent with an inertial force/momentum term:

  x^{(n+1)} = x^{(n)} − α_n ∇h(x^{(n)}) + β_n (x^{(n)} − x^{(n−1)}).

Proximal point method for h proper, closed, convex:

  x^{(n+1)} = prox_{α_n h}(x^{(n)}).

Forward-backward splitting for h = f + g with f ∈ C^1 and f, g proper, closed, convex:

  x^{(n+1)} = prox_{α_n g}(x^{(n)} − α_n ∇f(x^{(n)})).



Algorithm – Iterates and Backtracking

Ochs et al. [OCBP14] combine forward-backward splitting with an inertial force/momentum term:

  x^{(n+1)} = prox_{α_n g}(x^{(n)} − α_n ∇f(x^{(n)}) + β_n (x^{(n)} − x^{(n−1)}))    (2)

with step size parameters (α_n)_{n∈N} and momentum parameters (β_n)_{n∈N}. Backtracking is used to estimate the local Lipschitz constant L_n such that

  f(x^{(n+1)}) ≤ f(x^{(n)}) + ∇f(x^{(n)})^T (x^{(n+1)} − x^{(n)}) + (L_n/2) ‖x^{(n+1)} − x^{(n)}‖₂².    (3)


Algorithm – iPiano

Algorithm iPiano.
1: choose c_1, c_2 > 0 close to zero, L_{−1} > 0, η > 1, x^{(0)}
2: x^{(−1)} := x^{(0)}
3: for n = 0, 1, . . . do
4:   L_n := L_{n−1}/η
5:   repeat
6:     L_n := η L_n
7:     repeat
8:       choose α_n ≥ c_1, β_n ≥ 0
9:     until δ_n := 1/α_n − L_n/2 − β_n/(2α_n) ≥ γ_n := 1/α_n − L_n/2 − β_n/α_n ≥ c_2
10:    x^{(n+1)} = prox_{α_n g}(x^{(n)} − α_n ∇f(x^{(n)}) + β_n (x^{(n)} − x^{(n−1)}))
11:  until (3) is satisfied for x^{(n+1)}
12: end for
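A compact Python sketch of lines 1-12, under simplifying assumptions: β_n is held fixed and α_n is chosen near its upper bound 2(1 − β_n)/(L_n + 2c_2) instead of being searched; f, grad_f and prox_g are supplied by the caller.

```python
import numpy as np

def ipiano(x0, f, grad_f, prox_g, n_iter=200,
           c1=1e-8, c2=1e-8, L_init=1.0, eta=1.2, beta=0.5):
    """Sketch of iPiano with Lipschitz backtracking (fixed beta)."""
    x_prev = x0.copy()                   # x^(-1) := x^(0)
    x = x0.copy()
    L = L_init
    for n in range(n_iter):
        L /= eta                         # L_n := L_{n-1} / eta, re-multiplied below
        while True:
            L *= eta
            # alpha_n <= 2 (1 - beta) / (L_n + 2 c2) guarantees
            # delta_n >= gamma_n >= c2; keep a small safety margin
            alpha = max(c1, 0.99 * 2.0 * (1.0 - beta) / (L + 2.0 * c2))
            x_next = prox_g(x - alpha * grad_f(x) + beta * (x - x_prev), alpha)
            d = x_next - x
            # backtracking test (3): descent lemma with constant L_n
            if f(x_next) <= f(x) + grad_f(x) @ d + 0.5 * L * (d @ d):
                break
        x_prev, x = x, x_next
    return x
```

On a smooth convex test function with g = 0 (identity prox) the iterates contract to the minimizer, which is a useful sanity check before applying the method to non-convex f.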

Algorithm – Monotonically Decreasing δ_n

Lemma. For each n ∈ N, given L_n > 0, there exist α_n < 2(1 − β_n)/L_n and 0 ≤ β_n < 1 as in iPiano such that c_2 ≤ γ_n ≤ δ_n and (δ_n)_{n∈N} is monotonically decreasing.

Proof Sketch. With b_n := (δ_{n−1} + L_n/2)/(c_2 + L_n/2):

  γ_n ≥ c_2 ⇔ α_n ≤ (1 − β_n)/(c_2 + L_n/2) < 2(1 − β_n)/L_n,
  δ_{n−1} ≥ δ_n ⇔ α_n ≥ (1 − β_n/2)/(δ_{n−1} + L_n/2);

the two bounds are compatible, i.e. (1 − β_n)/(c_2 + L_n/2) ≥ α_n ≥ (1 − β_n/2)/(δ_{n−1} + L_n/2) admits a solution, exactly when

  β_n ≤ (b_n − 1)/(b_n − 1/2).
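The bounds in the proof sketch can be checked numerically; the constants below are illustrative assumptions, not values from the slides.

```python
# Numerical check of the lemma: pick beta_n below its bound and alpha_n in the
# admissible interval, then verify c2 <= gamma_n <= delta_n <= delta_{n-1}.
c2 = 1e-4
L_n = 2.0
delta_prev = 5.0            # delta_{n-1}

b_n = (delta_prev + L_n / 2) / (c2 + L_n / 2)
beta_n = 0.9 * (b_n - 1) / (b_n - 0.5)           # strictly below the bound

lo = (1 - beta_n / 2) / (delta_prev + L_n / 2)   # ensures delta_n <= delta_{n-1}
hi = (1 - beta_n) / (c2 + L_n / 2)               # ensures gamma_n >= c2
alpha_n = 0.5 * (lo + hi)

delta_n = 1 / alpha_n - L_n / 2 - beta_n / (2 * alpha_n)
gamma_n = 1 / alpha_n - L_n / 2 - beta_n / alpha_n
```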



Convergence – Overview

The convergence analysis is based on three requirements regarding

  H_{δ_{n+1}}(x^{(n+1)}, x^{(n)}) := h(x^{(n+1)}) + δ_{n+1} ‖x^{(n+1)} − x^{(n)}‖₂² = h(x^{(n+1)}) + δ_{n+1} Δ²_{n+1}

and the sequence

  (z^{(n+1)})_{n∈N} := (x^{(n+1)}, x^{(n)})_{n∈N} ⊂ R^{2d}

generated by iPiano. Furthermore, H_{δ_n} is required to satisfy the Kurdyka-Lojasiewicz property [Loj93, Kur98] at a critical point z̃ of H_{δ_n}.


Convergence – Requirements

Definition. Given a, b > 0, H : R^{2d} → R_∞ and a sequence (z^{(n)})_{n∈N} ⊂ R^{2d} satisfy:
(H1) if for each n ∈ N it holds that
  H(z^{(n+1)}) + a Δ²_n ≤ H(z^{(n)});
(H2) if for each n ∈ N there exists w^{(n+1)} ∈ ∂H(z^{(n+1)}) with
  ‖w^{(n+1)}‖₂ ≤ (b/2)(Δ_n + Δ_{n+1});
(H3) if there exists a subsequence (z^{(n_j)})_{j∈N} with z^{(n_j)} → z̃ = (x̃, x̃) and H(z^{(n_j)}) → H(z̃) for j → ∞.


Convergence – Requirements, Condition (H1)

Lemma. H_{δ_n} and (z^{(n)})_{n∈N} as generated by iPiano satisfy Condition (H1); in particular, for each n ∈ N it holds that

  H_{δ_{n+1}}(z^{(n+1)}) + γ_n Δ²_n ≤ H_{δ_n}(z^{(n)}).

Proof Sketch. The iteration (Equation (2)) implies

  w := (x^{(n)} − x^{(n+1)})/α_n − ∇f(x^{(n)}) + (β_n/α_n)(x^{(n)} − x^{(n−1)}) ∈ ∂g(x^{(n+1)}).


Proof Sketch (cont'd). With w ∈ ∂g(x^{(n+1)}), using the convexity of g,

  g(x^{(n+1)}) ≤ g(x^{(n)}) − w^T (x^{(n)} − x^{(n+1)}),

and the L_n-Lipschitz continuity of ∇f,

  f(x^{(n+1)}) ≤ f(x^{(n)}) + ∇f(x^{(n)})^T (x^{(n+1)} − x^{(n)}) + (L_n/2) ‖x^{(n)} − x^{(n+1)}‖₂²,

it can be shown that

  h(x^{(n+1)}) ≤ h(x^{(n)}) − δ_n Δ²_{n+1} + δ_n Δ²_n − γ_n Δ²_n,

which implies the claim as δ_n is monotonically decreasing.

Convergence – Requirements, Condition (H2)

Lemma. H_{δ_n} and (z^{(n)})_{n∈N} as generated by iPiano satisfy Condition (H2), i.e. for each n ∈ N there exists w^{(n+1)} ∈ ∂H_{δ_{n+1}}(z^{(n+1)}) such that

  ‖w^{(n+1)}‖₂ ≤ (7/c_1)(Δ_n + Δ_{n+1}).

Proof Sketch. For w^{(n+1)} ∈ ∂H_{δ_{n+1}}(z^{(n+1)}) it is w^{(n+1)} = (w_1^{(n+1)}, w_2^{(n+1)}) with

  w_1^{(n+1)} ∈ ∂g(x^{(n+1)}) + ∇f(x^{(n+1)}) + 2δ_n (x^{(n+1)} − x^{(n)}),
  w_2^{(n+1)} = −2δ_n (x^{(n+1)} − x^{(n)}),

and

  ‖w^{(n+1)}‖₂ ≤ ... ≤ (1/α_n + 4δ_n + L_n) Δ_{n+1} + (β_n/α_n) Δ_n ≤ (7/c_1)(Δ_{n+1} + Δ_n).


Convergence – Requirements, Condition (H3)

Lemma. H_{δ_n} and (z^{(n)})_{n∈N} as generated by iPiano satisfy Condition (H3), i.e. there exists a subsequence (z^{(n_j)})_{j∈N} with z^{(n_j)} → z̃ = (x̃, x̃) and H_{δ_{n_j}}(z^{(n_j)}) → H_δ(z̃) for j → ∞.

Proof Sketch.
Claim 1: by summing Condition (H1) and deducing Σ_{n=0}^∞ Δ²_n < ∞, it can be shown that lim_{n→∞} Δ_n = 0.
Claim 2: from the coercivity of h and the Bolzano-Weierstrass theorem, the existence of a convergent subsequence (x^{(n_j)})_{j∈N} follows. Then:

  lim_{j→∞} H_{δ_{n_j+1}}(x^{(n_j+1)}, x^{(n_j)}) = H_δ(x̃, x̃) = h(x̃).


Convergence – Kurdyka-Lojasiewicz Property

The Kurdyka-Lojasiewicz property relates the behavior of the subdifferential ∂H to the function values.

Definition (informal). For a point z̃ ∈ dom(∂H), H is said to satisfy the Kurdyka-Lojasiewicz property if there exists a concave φ ∈ C^1 with φ(0) = 0 and φ′ > 0 such that

  φ′(H(z) − H(z̃)) · inf_{ẑ ∈ ∂H(z)} ‖ẑ‖₂ ≥ 1

for all z in an appropriate neighborhood of z̃.

Intuitively, the inequality controls the difference in function values by the subdifferential.
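A classical special case may make the definition concrete: for real-analytic H and a critical point z̃ (a standard fact from Lojasiewicz's work, not stated on the slides), the property holds with a power-type desingularizing function.

```latex
% Lojasiewicz's gradient inequality as a special case of the KL property:
% for real-analytic H one may take
\varphi(s) = c\, s^{1-\theta}, \qquad \theta \in [1/2, 1),\ c > 0,
% so that \varphi'(s) = c\,(1-\theta)\, s^{-\theta}, and the KL inequality
% \varphi'\bigl(H(z) - H(\tilde z)\bigr)\, \|\nabla H(z)\|_2 \ge 1 becomes
|H(z) - H(\tilde z)|^{\theta} \le c\,(1-\theta)\, \|\nabla H(z)\|_2 .
```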


Convergence – Convergence Theorem

Theorem. Let H be proper and lower semicontinuous, satisfying the Kurdyka-Lojasiewicz property at z̃ = (x̃, x̃) specified by Condition (H3), and let (z^{(n)})_{n∈N} satisfy Conditions (H1)–(H3). Then (x^{(n)})_{n∈N} converges to x̃ such that z̃ is a critical point of H.

It can further be shown that the convergence rate is O(1/√n) for the residual

  r(x) := x − prox_g(x − ∇f(x))

in the L₂ norm.
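The residual r is cheap to evaluate and doubles as a stopping criterion in practice. A minimal sketch with an illustrative choice of f and g (a quadratic data term and a nonnegativity constraint; both are assumptions for this example):

```python
import numpy as np

# r(x) = x - prox_g(x - grad f(x)); r(x) = 0 iff x is a critical point of f + g.
y = np.array([1.0, -2.0, 3.0])
grad_f = lambda x: x - y                  # f(x) = 0.5 ||x - y||^2
prox_g = lambda x: np.maximum(x, 0.0)     # g = indicator of {x >= 0}

def residual(x):
    return x - prox_g(x - grad_f(x))

x_star = np.maximum(y, 0.0)               # minimizer of f + g in this example
```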


Convergence – Convergence Theorem (cont'd)

Proof Sketch. The proof is based on the following claim:

  Σ_{i=1}^{n} Δ_i ≤ (1/2)(Δ_0 − Δ_n) + (b/a) [φ(H(z^{(1)}) − H(z̃)) − φ(H(z^{(n+1)}) − H(z̃))],

which is shown by induction. Then, Σ_{n=0}^∞ Δ_n < ∞ follows and x^{(n)} → x̃. Using the Kurdyka-Lojasiewicz property it can be shown that H(z^{(n)}) → H(z̃). With Condition (H2) it also follows that z̃ is a critical point of H.


Implementation – Initialization

Remember the derived bounds for α_0 and β_0:

  α_0 < 2(1 − β_0)/L_0;  β_0 ≤ (b_0 − 1)/(b_0 − 1/2)  with  b_0 := (δ_{−1} + L_0/2)/(c_2 + L_0/2).

Guessing an appropriate β_0 is easier than guessing δ_{−1}, so fix β_0 and estimate L_0 using

  ‖∇f(x^{(0)}) − ∇f(x̂)‖₂ / ‖x^{(0)} − x̂‖₂ ≤ L_0

for x̂ = prox_g(x^{(0)} − ∇f(x^{(0)})).
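The L_0 estimate translates directly into code; the quadratic f and g = 0 below are illustrative assumptions for this sketch.

```python
import numpy as np

# Estimate L_0 from one forward-backward trial point x_hat.
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
Q = M.T @ M                       # f(x) = 0.5 x^T Q x, true constant = ||Q||_2
grad_f = lambda x: Q @ x
prox_g = lambda x: x              # g = 0 for this sketch

x0 = rng.standard_normal(4)
x_hat = prox_g(x0 - grad_f(x0))
L0 = np.linalg.norm(grad_f(x0) - grad_f(x_hat)) / np.linalg.norm(x0 - x_hat)
```

By construction L0 is a lower bound on the true Lipschitz constant, so the backtracking loop may still have to increase it.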


Implementation – Initialization (cont'd)

In practice, fix K ≫ 100 and compute

  α^{(k)} := a_0 − k (a_0 − c_1)/K  with  a_0 := 2(1 − β_0)/(L_0 + 2c_2)  and  k = 1, . . . , K,

until α^{(k)} satisfies

  δ_0 := 1/α^{(k)} − L_0/2 − β_0/(2α^{(k)}) ≥ γ_0 := 1/α^{(k)} − L_0/2 − β_0/α^{(k)} ≥ c_2.


Implementation – Finding α_n and β_n

Given L_{n−1} and η > 1, find the smallest l ∈ N such that

  L_n := η^l L_{n−1}    (4)

satisfies

  f(x^{(n+1)}) ≤ f(x^{(n)}) + ∇f(x^{(n)})^T (x^{(n+1)} − x^{(n)}) + (L_n/2) ‖x^{(n+1)} − x^{(n)}‖₂².

Alternatively, instead of L_{n−1}, use

  ‖∇f(x^{(n−1)}) − ∇f(x̂)‖₂ / ‖x^{(n−1)} − x̂‖₂ ≤ L_n

with x̂ = prox_g(x^{(n−1)} − ∇f(x^{(n−1)})) as starting point for Equation (4).


Implementation – Finding α_n and β_n (cont'd)

Similar to the initialization, fix J, K ≫ 100 and compute

  β_n^{(j)} := (b_n − 1)/(b_n − 1/2) − (j/J) (b_n − 1)/(b_n − 1/2)  with  b_n := (δ_{n−1} + L_n/2)/(c_2 + L_n/2)  and  j = 0, . . . , J,

  α_n^{(k)} := a_n − k (a_n − c_1)/K  with  a_n := 2(1 − β_n)/(L_n + 2c_2)  and  k = 1, . . . , K,

until β_n^{(j)} and α_n^{(k)} satisfy

  δ_n := 1/α_n^{(k)} − L_n/2 − β_n^{(j)}/(2α_n^{(k)}) ≥ γ_n := 1/α_n^{(k)} − L_n/2 − β_n^{(j)}/α_n^{(k)} ≥ c_2

and

  δ_n ≤ δ_{n−1}.
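The double scan over β_n^{(j)} and α_n^{(k)} can be written directly; the constants (c_1, c_2, L_n, δ_{n−1}, J, K) below are illustrative assumptions.

```python
# Grid search: scan beta_n down from its bound and alpha_n down from a_n
# until the delta/gamma conditions and delta_n <= delta_{n-1} hold.
c1, c2 = 1e-8, 1e-8
L_n = 1.5
delta_prev = 10.0            # delta_{n-1}
J = K = 200

b_n = (delta_prev + L_n / 2) / (c2 + L_n / 2)
beta_bound = (b_n - 1) / (b_n - 0.5)

found = None
for j in range(J + 1):
    beta = beta_bound - (j / J) * beta_bound      # beta_n^(j)
    a_n = 2 * (1 - beta) / (L_n + 2 * c2)
    for k in range(1, K + 1):
        alpha = a_n - k * (a_n - c1) / K          # alpha_n^(k)
        delta = 1 / alpha - L_n / 2 - beta / (2 * alpha)
        gamma = 1 / alpha - L_n / 2 - beta / alpha
        if delta >= gamma >= c2 and delta <= delta_prev:
            found = (alpha, beta)
            break
    if found:
        break
```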



Denoising – Model

Given a noisy image u^{(0)} : Ω = [0, 1]² → [0, 1], minimize

  h(u; u^{(0)}, λ) = ∫_Ω ρ_1(u(x) − u^{(0)}(x)) dx + λ ∫_Ω ρ_2(‖∇u(x)‖₂) dx

with

  ρ_{1,abs}(x) = |x|,  ρ_{1,sqr}(x) = x²,  ρ_2(x) = log(1 + x²/σ²).

ρ_{1,sqr} and ρ_2 are differentiable; the proximal mapping of ρ_{1,abs}(x − x^{(0)}) is the shifted soft thresholding

  prox_{α ρ_{1,abs}}(x) = x^{(0)} + max(0, |x − x^{(0)}| − α) · sign(x − x^{(0)}).
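The proximal map of the shifted absolute value is a one-line soft-thresholding; the sketch below applies it componentwise (function and variable names are this sketch's own).

```python
import numpy as np

def prox_abs(x, x0, alpha):
    # prox of alpha * |. - x0|: soft thresholding shifted to the data value x0
    d = x - x0
    return x0 + np.sign(d) * np.maximum(np.abs(d) - alpha, 0.0)
```

Points within alpha of the data value x0 are mapped onto it; everything else is pulled toward x0 by alpha.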


Denoising – Results

Figure: Signal denoising experiment; input signal shown on the left with the perturbed/noisy signal on its right. Results using ρ_{1,abs} and ρ_{1,sqr} with λ ∈ {0.2, 0.4, 0.6, 0.8} are shown.

Denoising – Convergence

Figure: Convergence of iPiano. Shown is the value of the objective function h(x^{(n)}) for each iterate x^{(n)}, n ≥ 0, as well as the corresponding parameters α_n, β_n and L_n. Furthermore, Δ_n := ‖x^{(n)} − x^{(n−1)}‖₂ is shown.

Denoising – Results (cont'd)

Figure: Image denoising experiment; noisy image in the top row, ρ_{1,abs} in the middle row and ρ_{1,sqr} in the bottom row.

Binary Segmentation – Model

Binary segmentation based on an approximation of the Mumford-Shah model [MS89, She05]; u : [0, 1]² → [−1, 1]:

  h_ε(u; c_+, c_−, u^{(0)}, λ) = ∫ (9ε ‖∇u(x)‖₂² + (1 − u(x)²)²/(64ε)) dx
    + λ ∫ ((1 + u(x))/2)² (u^{(0)}(x) − c_+)² dx + λ ∫ ((1 − u(x))/2)² (u^{(0)}(x) − c_−)² dx.

(It can be shown that, for ε → 0,

  ∫ (9ε ‖∇u(x)‖₂² + (1 − u(x)²)²/(64ε)) dx

approximates |u|_BV.)


Binary Segmentation – Results (cont'd)

Figure: Segmentation results for thresholds τ = −0.2, 0.0, 0.2 and using g_sqr; the foreground segment S_f is depicted in white.


Conclusion

We discussed the minimization of composite functions of the form

  min_{x ∈ R^d} h(x) = min_{x ∈ R^d} (f(x) + g(x)).

Ochs et al. [OCBP14] proposed the iPiano algorithm to solve this problem under the following requirements:
– g proper, closed, convex and lower semicontinuous;
– f ∈ C^1 with L-Lipschitz continuous ∇f;
– h coercive and bounded below;
– and H_{δ_n}(x^{(n)}, x^{(n−1)}) = h(x^{(n)}) + δ_n Δ²_n satisfying the Kurdyka-Lojasiewicz property [Loj93, Kur98] at a critical point.

The algorithm can be implemented efficiently in C++ and used to solve image processing tasks.


Appendix – Kurdyka-Lojasiewicz Property

Definition. H has the Kurdyka-Lojasiewicz property at a point z̃ ∈ dom(∂H) if there exist η ∈ (0, ∞], a neighborhood U of z̃, and a continuous concave function φ : [0, η) → R_+ such that
– φ ∈ C^1((0, η)), φ(0) = 0, and for all s ∈ (0, η), φ′(s) > 0;
– and for all z ∈ U ∩ {z ∈ R^{2d} | H(z̃) < H(z) < H(z̃) + η} the Kurdyka-Lojasiewicz inequality holds:

  φ′(H(z) − H(z̃)) · inf_{ẑ ∈ ∂H(z)} ‖ẑ‖₂ ≥ 1.

Intuitively, for H ∈ C^1, this means that φ has to be steep around critical points z̃ of H where ∇H is flat.


References

[Kur98] Krzysztof Kurdyka. On gradients of functions definable in o-minimal structures. Annales de l'institut Fourier, 48(3):769–783, 1998.
[Loj93] Stanislas Lojasiewicz. Sur la géométrie semi- et sous-analytique. Annales de l'institut Fourier, 43(5):1575–1595, 1993.
[MS89] David Mumford and Jayant Shah. Optimal approximations by piecewise smooth functions and associated variational problems. Comm. on Pure and Applied Mathematics, 42(5):577–685, 1989.
[OCBP14] Peter Ochs, Yunjin Chen, Thomas Brox, and Thomas Pock. iPiano: Inertial proximal algorithm for nonconvex optimization. SIAM J. Imaging Sciences, 7(2):1388–1419, 2014.
[She05] Jianhong Shen. Gamma-convergence approximation to piecewise constant Mumford-Shah segmentation. In Advanced Concepts for Intelligent Vision Systems, International Conference on, volume 3708 of Lecture Notes in Computer Science, pages 499–506, Antwerpen, Belgium, September 2005. Springer.