Iterative regularization via dual diagonal descent · Silvia Villa · PowerPoint PPT Presentation


SLIDE 1

Iterative regularization via dual diagonal descent

Silvia Villa

Department of Mathematics University of Genoa

IHP, Paris, April 1st, 2019

S. Villa (Unige) · Dual iterative regularization · 1 / 44

SLIDE 2

Outline

Introduction and motivation

SLIDE 3

Outline

Introduction and motivation

Part I: Quadratic data fit. Joint work with: S. Matet, L. Rosasco, B. C. Vũ

Part II: General data fit. Joint work with: L. Calatroni, G. Garrigos, L. Rosasco

SLIDE 4

Introduction and motivation

Inverse problems

H and G Hilbert spaces, A : H → G linear and bounded.

Goal: Let y ∈ G; approximate the solution of Ax = y, assuming that a solution exists.

SLIDE 5

Introduction and motivation

Inverse problems

H and G Hilbert spaces, A : H → G linear and bounded.

Goal: Let y ∈ G; approximate the solution of Ax = y, assuming that a solution exists.

Selection principle: given R : H → R ∪ {+∞} strongly convex, select x†, the solution of

min R(x)   s.t.   Ax = y

SLIDE 6

Introduction and motivation

Noisy data

We do not know y ∈ G. We have access only to ŷ ∈ G such that

‖ŷ − y‖ ≤ δ,   δ > 0.

Goal: find a stable approximation of x† using only ŷ.

SLIDE 7

Introduction and motivation

Noisy data

We do not know y ∈ G. We have access only to ŷ ∈ G such that

‖ŷ − y‖ ≤ δ,   δ > 0.

Goal: find a stable approximation of x† using only ŷ.

The constraint Ax = y can be replaced by

x ∈ argmin over x′ of D(Ax′, ŷ)

for a data fit function D.

SLIDE 8

Introduction and motivation

Regularization

Images: original image x, blurred datum y = Ax, noisy datum ŷ, reconstruction x̂†.

SLIDE 9

Introduction and motivation

Regularization

Images: original image x, blurred datum y = Ax, noisy datum ŷ, reconstruction x̂†.

Regularization is needed.

SLIDE 10

Introduction and motivation

Tikhonov regularization

minimize R(x) over x ∈ argmin D(A·, ŷ)   →   minimize (1/λ) D(Ax, ŷ) + R(x) over x ∈ H

SLIDE 11

Introduction and motivation

Tikhonov regularization

minimize R(x) over x ∈ argmin D(A·, ŷ)   →   minimize (1/λ) D(Ax, ŷ) + R(x) over x ∈ H

How to choose λ?

SLIDE 12

Introduction and motivation

Tikhonov regularization

minimize R(x) over x ∈ argmin D(A·, ŷ)   →   minimize (1/λ) D(Ax, ŷ) + R(x) over x ∈ H

How to choose λ?

Theorem. Let D(Ax, y) = ‖Ax − y‖² and let xλ be the solution of the regularized problem. Suppose that there exists q ∈ G such that A*q ∈ ∂R(x†). Then

‖xλ − x†‖ ≤ C ( δ/√λ + √δ + √λ ).

Choosing λδ ∼ δ, we derive ‖xλδ − x†‖ ≤ C √δ.

[Burger-Osher, Convergence rates of convex variational regularization, 2004]
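The λ trade-off in the theorem is easy to observe numerically. Below is a minimal numpy sketch (the toy operator, its spectrum, and the noise level are illustrative assumptions, not data from the talk) that solves the quadratic Tikhonov problem in its standard normal-equations form xλ = (AᵀA + λI)⁻¹Aᵀŷ over a grid of λ and locates the best one; the error is large for both very small λ (noise blow-up) and very large λ (over-smoothing):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
# Synthetic ill-posed problem: operator with fast-decaying spectrum.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.logspace(0, -3, n)) @ V.T
x_true = V[:, :4].sum(axis=1)                 # smooth exact solution
y = A @ x_true

delta = 1e-3
e = rng.standard_normal(n)
y_noisy = y + delta * e / np.linalg.norm(e)   # noisy datum with ‖ŷ − y‖ = δ

# Tikhonov in normal-equations form: x_λ = (AᵀA + λI)⁻¹ Aᵀ ŷ.
lams = np.logspace(-8, 0, 17)
errors = [np.linalg.norm(np.linalg.solve(A.T @ A + lam * np.eye(n),
                                         A.T @ y_noisy) - x_true)
          for lam in lams]
best = int(np.argmin(errors))
# The minimizing λ sits strictly inside the grid: the bias/variance trade-off.
print(lams[best], errors[best])
```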

SLIDE 13

Introduction and motivation

Tikhonov regularization

What about computations?

SLIDE 14

Introduction and motivation

Tikhonov regularization

What about computations?

Tikhonov regularization in practice:
  • choose an interval [λmin, λmax]
  • optimization: approximately solve the regularized problem for λ ∈ [λmin, λmax] up to a certain accuracy ε
  • parameter selection: select the best λ according to a validation criterion

SLIDE 15

Introduction and motivation

Tikhonov regularization

What about computations?

Tikhonov regularization in practice:
  • choose an interval [λmin, λmax]
  • optimization: approximately solve the regularized problem for λ ∈ [λmin, λmax] up to a certain accuracy ε
  • parameter selection: select the best λ according to a validation criterion

Another point of view: integrate REGULARIZATION and OPTIMIZATION.

SLIDE 16

Introduction and motivation

Iterative regularization

A (new) old idea

Solve:   min R(x)   s.t.   Ax = y.

SLIDE 17

Introduction and motivation

Iterative regularization

A (new) old idea

Solve:   min R(x)   s.t.   Ax = ŷ

and early stop the iterations.

SLIDE 18

Introduction and motivation

Iterative regularization

A (new) old idea

Solve:   min R(x)   s.t.   Ax = ŷ

and early stop the iterations.

An old idea in inverse problems for R = ‖·‖²/2: Landweber iteration [Engl-Hanke-Neubauer, Regularization of Inverse Problems]. Recently revisited: [Osher-Burger-Yin-Cai-Resmerita-He, ... ∼2000s]
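For R = ‖·‖²/2 the early-stopping phenomenon is easy to reproduce. A minimal numpy sketch (the toy operator and noise level are illustrative assumptions): run the Landweber iteration xt+1 = xt − γAᵀ(Axt − ŷ) on noisy data and track the error to the exact solution; it first decreases, then increases again (semiconvergence), so the best iterate comes before the last one:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.logspace(0, -4, n)) @ V.T   # ill-conditioned operator
x_true = V[:, :5].sum(axis=1)                  # exact solution, low-frequency content
y = A @ x_true
y_noisy = y + 1e-3 * rng.standard_normal(n)    # noisy datum ŷ

gamma = 1.0 / np.linalg.norm(A, 2) ** 2        # step size γ < 2/‖A‖²
x = np.zeros(n)
errors = []
for _ in range(2000):
    x = x - gamma * A.T @ (A @ x - y_noisy)    # Landweber step on noisy data
    errors.append(np.linalg.norm(x - x_true))

best_t = int(np.argmin(errors))
# Semiconvergence: the error decreases at first, then noise amplification
# takes over, so the best iterate occurs well before the last iteration.
print(best_t, errors[best_t], errors[-1])
```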

SLIDE 19

Introduction and motivation

Iterative regularization: idea of the proof

1. Choose a convergent algorithm to solve

   min R(x)   s.t.   Ax = y.

   Call the iterates (xt)t∈N.

2. Apply the same algorithm to

   min R(x)   s.t.   Ax = ŷ.

   Call the iterates (x̂t)t∈N.

3. Then

   ‖x̂t − x†‖ ≤ ‖x̂t − xt‖ (stability) + ‖xt − x†‖ (optimization)
SLIDE 20

Introduction and motivation

Iterative regularization: idea of the proof

Recall that

‖x̂t − x†‖ ≤ ‖x̂t − xt‖ (stability) + ‖xt − x†‖ (optimization)

xt → xt+1 → . . .

SLIDE 21

Introduction and motivation

Iterative regularization: idea of the proof

Recall that

‖x̂t − x†‖ ≤ ‖x̂t − xt‖ (stability) + ‖xt − x†‖ (optimization)

xt → xt+1 → . . . → x†

SLIDE 22

Introduction and motivation

Iterative regularization: idea of the proof

Recall that

‖x̂t − x†‖ ≤ ‖x̂t − xt‖ (stability) + ‖xt − x†‖ (optimization)

xt → xt+1 → . . . → x†
x̂t ց

SLIDE 23

Introduction and motivation

Iterative regularization: idea of the proof

Recall that

‖x̂t − x†‖ ≤ ‖x̂t − xt‖ (stability) + ‖xt − x†‖ (optimization)

xt → xt+1 → . . . → x†
x̂t ց x̂t+1 ց . . . ց a solution of the noisy problem (if it exists)

SLIDE 24

Introduction and motivation

Iterative regularization at work

Recall that

‖x̂t − x†‖ ≤ ‖x̂t − xt‖ (stability) + ‖xt − x†‖ (optimization)

Images: original image, iterate x̂t, noisy datum ŷ (the deck repeats this view as t increases).

SLIDE 63

Quadratic data fit Derivation of the algorithm and convergence results

Dual problem

min R(x) s.t. Ax = y   ←→   min over x ∈ H:  R(x) + ι{y}(Ax),

where ι{y}(x) = 0 if x = y and ι{y}(x) = +∞ otherwise.

SLIDE 64

Quadratic data fit Derivation of the algorithm and convergence results

Dual problem

min R(x) s.t. Ax = y   ←→   min over x ∈ H:  R(x) + ι{y}(Ax),

where ι{y}(x) = 0 if x = y and ι{y}(x) = +∞ otherwise.

The dual problem is

min over u ∈ G:  d(u),    d(u) = R*(−A*u) + ⟨y, u⟩.

R strongly convex ⇒ the dual is smooth.

SLIDE 65

Quadratic data fit Derivation of the algorithm and convergence results

Dual gradient descent

We can apply gradient descent to the dual problem:

xt = ∇R*(−A*ut)
ut+1 = ut + γ(Axt − y)

SLIDE 66

Quadratic data fit Derivation of the algorithm and convergence results

Dual gradient descent

We can apply gradient descent to the dual problem:

xt = ∇R*(−A*ut)
ut+1 = ut + γ(Axt − y)

A.k.a. linearized Bregman iteration [Yin-Osher-Burger, several papers; Bachmayr-Burger, 2005]

SLIDE 67

Quadratic data fit Derivation of the algorithm and convergence results

Dual gradient descent

We can apply gradient descent to the dual problem:

xt = ∇R*(−A*ut)
ut+1 = ut + γ(Axt − y)

A.k.a. linearized Bregman iteration [Yin-Osher-Burger, several papers; Bachmayr-Burger, 2005]

When R = ‖·‖²/2, it becomes the Landweber algorithm:

xt+1 = (I − γA*A)xt + γA*y

SLIDE 68

Quadratic data fit Derivation of the algorithm and convergence results

Dual gradient descent

We can apply gradient descent to the dual problem:

xt = ∇R*(−A*ut)
ut+1 = ut + γ(Axt − y)

A.k.a. linearized Bregman iteration [Yin-Osher-Burger, several papers; Bachmayr-Burger, 2005]

When R = ‖·‖²/2, it becomes the Landweber algorithm:

xt+1 = (I − γA*A)xt + γA*y

Gradient method applied to (1/2)‖Ax − y‖².
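As a concrete instance of the dual iteration, here is a minimal numpy sketch for the illustrative strongly convex choice R(x) = ‖x‖₁ + (σ/2)‖x‖², for which ∇R*(z) = soft(z, 1)/σ with soft(·) the soft-thresholding map; this is the linearized Bregman iteration for a sparse recovery toy problem (operator, sparsity level, and schedule are assumptions of this sketch):

```python
import numpy as np

def soft(z, tau):
    # Soft-thresholding: the proximal operator of tau * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

rng = np.random.default_rng(1)
m, n = 30, 60
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, 5, replace=False)] = rng.standard_normal(5)
y = A @ x_true                        # exact (noise-free) datum

sigma = 1.0                           # strong-convexity constant of R
gamma = sigma / np.linalg.norm(A, 2) ** 2
u = np.zeros(m)
res = []
for t in range(5000):
    x = soft(-A.T @ u, 1.0) / sigma   # x_t = grad R*(-A^T u_t)
    res.append(np.linalg.norm(A @ x - y))
    u = u + gamma * (A @ x - y)       # dual gradient step
print(res[0], res[-1])                # residual shrinks toward feasibility
```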

SLIDE 69

Quadratic data fit Derivation of the algorithm and convergence results

Accelerated dual gradient descent

We can apply an accelerated gradient descent to the dual problem:

xt = ∇R*(−A*ut)
wt = ut + αt(ut − ut−1)
zt = ∇R*(−A*wt)
ut+1 = wt + γ(Azt − y)

with αt = (t − 1)/(t + α), α ≥ 2.

SLIDE 70

Quadratic data fit Derivation of the algorithm and convergence results

Accelerated dual gradient descent

We can apply an accelerated gradient descent to the dual problem:

xt = ∇R*(−A*ut)
wt = ut + αt(ut − ut−1)
zt = ∇R*(−A*wt)
ut+1 = wt + γ(Azt − y)

with αt = (t − 1)/(t + α), α ≥ 2.

When R = ‖·‖²/2, it becomes an accelerated Landweber algorithm:

zt = xt + αt(xt − xt−1)
xt+1 = zt − γA*(Azt − y)

SLIDE 71

Quadratic data fit Derivation of the algorithm and convergence results

Accelerated dual gradient descent

We can apply an accelerated gradient descent to the dual problem:

xt = ∇R*(−A*ut)
wt = ut + αt(ut − ut−1)
zt = ∇R*(−A*wt)
ut+1 = wt + γ(Azt − y)

with αt = (t − 1)/(t + α), α ≥ 2.

When R = ‖·‖²/2, it becomes an accelerated Landweber algorithm:

zt = xt + αt(xt − xt−1)
xt+1 = zt − γA*(Azt − y)

Accelerated gradient applied to (1/2)‖Ax − y‖².

SLIDE 72

Quadratic data fit Derivation of the algorithm and convergence results

A technical condition

1. Existence of a solution of the dual problem (for the exact y): needed for convergence rates.

2. From convergence on the dual to convergence on the primal.

Qualification (source) condition (only for the exact datum): there exists q ∈ G such that A*q ∈ ∂R(x†).

SLIDE 73

Quadratic data fit Derivation of the algorithm and convergence results

A technical condition

1. Existence of a solution of the dual problem (for the exact y): needed for convergence rates.

2. From convergence on the dual to convergence on the primal.

Qualification (source) condition (only for the exact datum): there exists q ∈ G such that A*q ∈ ∂R(x†).

The same condition is needed for establishing rates for Tikhonov regularization.

SLIDE 74

Quadratic data fit Derivation of the algorithm and convergence results

Dual gradient descent is a regularization method

Theorem (Matet-Rosasco-V.-Vũ, 2017). Assume that there exists q ∈ G such that A*q ∈ ∂R(x†), and let u† be a solution of the dual problem. For every δ > 0 there exists tδ ∼ δ⁻¹ such that

‖x̂tδ − x†‖ ≲ √δ.
SLIDE 75

Quadratic data fit Derivation of the algorithm and convergence results

Accelerated dual gradient descent is a regularization method

Theorem (Matet-Rosasco-V.-Vũ, 2017). Assume that there exists q ∈ G such that A*q ∈ ∂R(x†), and let u† be a solution of the dual problem. For every δ > 0 there exists tδ ∼ δ^(−1/2) such that

‖x̂tδ − x†‖ ≲ √δ.

Based on the results of [Aujol-Dossal, 2016]

SLIDE 76

Quadratic data fit Derivation of the algorithm and convergence results

Accelerated dual gradient descent is a regularization method

Theorem (Matet-Rosasco-V.-Vũ, 2017). Assume that there exists q ∈ G such that A*q ∈ ∂R(x†), and let u† be a solution of the dual problem. For every δ > 0 there exists tδ ∼ δ^(−1/2) such that

‖x̂tδ − x†‖ ≲ √δ.

Based on the results of [Aujol-Dossal, 2016]. For R = ‖·‖²/2 see also [A. Neubauer, 2016].

SLIDE 77

Quadratic data fit Derivation of the algorithm and convergence results

Accelerated dual gradient descent is a regularization method

Theorem (Matet-Rosasco-V.-Vũ, 2017). Assume that there exists q ∈ G such that A*q ∈ ∂R(x†), and let u† be a solution of the dual problem. For every δ > 0 there exists tδ ∼ δ^(−1/2) such that

‖x̂tδ − x†‖ ≲ √δ.

Based on the results of [Aujol-Dossal, 2016]. For R = ‖·‖²/2 see also [A. Neubauer, 2016].

What is the difference?

SLIDE 78

Quadratic data fit Derivation of the algorithm and convergence results

Accelerated dual gradient descent is a regularization method

Theorem (Matet-Rosasco-V.-Vũ, 2017). Assume that there exists q ∈ G such that A*q ∈ ∂R(x†), and let u† be a solution of the dual problem. For every δ > 0 there exists tδ ∼ δ^(−1/2) such that

‖x̂tδ − x†‖ ≲ √δ.

Based on the results of [Aujol-Dossal, 2016]. For R = ‖·‖²/2 see also [A. Neubauer, 2016].

What is the difference?

Gradient descent: tδ ∼ δ⁻¹. Accelerated gradient descent: tδ ∼ δ^(−1/2).

SLIDE 79

General data fit Dual descent algorithm

General data fit

If D(Ax, y) ≠ ‖Ax − y‖², the previous approach does not work.

Tikhonov regularization: the original hierarchical problem is replaced by

minimize (1/λ) D(Ax, ŷ) + R(x),

for a suitable λ > 0, and an algorithm is chosen to compute xt+1 = Algo(xt, λ).

SLIDE 80

General data fit Dual descent algorithm

General data fit

If D(Ax, y) ≠ ‖Ax − y‖², the previous approach does not work.

Tikhonov regularization: the original hierarchical problem is replaced by

minimize (1/λ) D(Ax, ŷ) + R(x),

for a suitable λ > 0, and an algorithm is chosen to compute xt+1 = Algo(xt, λ).

A diagonal approach [Lemaire, '80s-'90s]: xt+1 = Algo(xt, λt), with λt → 0.

SLIDE 81

General data fit Dual descent algorithm

A picture

The previous approach allows us to describe:
  • a diagonal strategy
  • a warm restart strategy

SLIDE 82

General data fit Dual descent algorithm

A dual approach

Diagonal forward-backward: [Attouch, Cabot, Czarnecki, Peypouquet ...]

SLIDE 83

General data fit Dual descent algorithm

A dual approach

Diagonal forward-backward: [Attouch, Cabot, Czarnecki, Peypouquet, ...]
  • not well suited if D is not smooth
  • requires knowing the conditioning of D(A·, y) (which might not exist)

SLIDE 84

General data fit Dual descent algorithm

A dual approach

Diagonal forward-backward: [Attouch, Cabot, Czarnecki, Peypouquet, ...]
  • not well suited if D is not smooth
  • requires knowing the conditioning of D(A·, y) (which might not exist)

min R(x) s.t. D(Ax, y) = 0        →        min (1/λ) D(Ax, y) + R(x)
          ↑                                              ↓
min over u ∈ G:  ⟨u, y⟩ + R*(−A*u)  [= d(u)]   ←   min (1/λ) D*(λu, y) + R*(−A*u)  [= dλ(u)]

SLIDE 85

General data fit Dual descent algorithm

Dual diagonal descent algorithm (3D)

If R = F + (σR/2)‖·‖² is strongly convex:

dλ(u) = R*(−A*u)  [smooth]  +  (1/λ) D*(λu, y)  [nonsmooth]

We can use the forward-backward splitting algorithm on the dual: u0 ∈ G, λt → 0, τ = σR/‖A‖².

zt+1 = ut + τ A ∇R*(−A*ut)
ut+1 = prox_{τ λt⁻¹ D*(λt·, y)}(zt+1)

SLIDE 86

General data fit Dual descent algorithm

Dual diagonal descent algorithm (3D)

If R = F + (σR/2)‖·‖² is strongly convex:

dλ(u) = R*(−A*u)  [smooth]  +  (1/λ) D*(λu, y)  [nonsmooth]

We can use the forward-backward splitting algorithm on the dual: u0 ∈ G, λt → 0, τ = σR/‖A‖².

zt+1 = ut + τ A ∇R*(−A*ut)
ut+1 = zt+1 − τ prox_{(τ λt)⁻¹ D(·, y)}(τ⁻¹ zt+1)
SLIDE 87

General data fit Dual descent algorithm

Dual diagonal descent algorithm (3D)

If R = F + (σR/2)‖·‖² is strongly convex:

dλ(u) = R*(−A*u)  [smooth]  +  (1/λ) D*(λu, y)  [nonsmooth]

We can use the forward-backward splitting algorithm on the dual: u0 ∈ G, λt → 0, τ = σR/‖A‖².

xt = ∇R*(−A*ut) = prox_{σR⁻¹ F}(−A*ut)
zt+1 = ut + τ A xt
ut+1 = zt+1 − τ prox_{(τ λt)⁻¹ D(·, y)}(τ⁻¹ zt+1)
SLIDE 88

General data fit Dual descent algorithm

Convergence of diagonal dual descent algorithm

AD1) D : G × G → [0, +∞] and D(u, y) = 0 ⇔ u = y.

SLIDE 89

General data fit Dual descent algorithm

Convergence of diagonal dual descent algorithm

AD1) D : G × G → [0, +∞] and D(u, y) = 0 ⇔ u = y.
AD2) Let p ∈ [1, +∞]: D(·, y) is p-well conditioned.

SLIDE 90

General data fit Dual descent algorithm

Convergence of diagonal dual descent algorithm

AD1) D : G × G → [0, +∞] and D(u, y) = 0 ⇔ u = y.
AD2) Let p ∈ [1, +∞]: D(·, y) is p-well conditioned.
AR) There exists a solution x̄ such that Ax̄ = y and x̄ ∈ dom R.

SLIDE 91

General data fit Dual descent algorithm

Convergence of diagonal dual descent algorithm

AD1) D : G × G → [0, +∞] and D(u, y) = 0 ⇔ u = y.
AD2) Let p ∈ [1, +∞]: D(·, y) is p-well conditioned.
AR) There exists a solution x̄ such that Ax̄ = y and x̄ ∈ dom R.

Theorem [Garrigos-Rosasco-V., 2017]. Suppose that (λt^(1/(p−1)))t ∈ ℓ¹(N). Let x† be the solution of (P), and assume that there exists q ∈ G such that A*q ∈ ∂R(x†). Then

‖xt − x†‖ = o(t^(−1/2)).

SLIDE 92

General data fit Dual descent algorithm

Stability

‖x̂t − x†‖ ≤ ‖x̂t − xt‖ (stability) + ‖xt − x†‖ (optimization)
SLIDE 93

General data fit Dual descent algorithm

Stability

‖x̂t − x†‖ ≤ ‖x̂t − xt‖ (stability) + ‖xt − x†‖ (optimization)

Stability Theorem [Garrigos-Rosasco-V., 2017]. Assume that the source/qualification condition holds. Let ŷ ∈ G with ‖ŷ − y‖ ≤ δ, and let (x̂t, ût) be the sequence generated by the (3D) algorithm with y = ŷ and û0 = u0. Suppose that (λt^(1/(p−1)))t ∈ ℓ¹(N). Then

‖xt − x̂t‖ ≤ C δ t.

For simplicity, here D(u, y) = L(u − y); but this is not needed.

SLIDE 94

General data fit Dual descent algorithm

Stability with respect to errors = iterative regularization results

Theorem (Early stopping) [Garrigos-Rosasco-V., 2017]. Assume that the source/qualification condition holds. Let ŷ ∈ G with ‖ŷ − y‖ ≤ δ, and let (x̂t, ût) be the sequence generated by the (3D) algorithm with y = ŷ and û0 = u0. Suppose that (λt^(1/(p−1)))t ∈ ℓ¹(N). Then there exists an early stopping rule t(δ) ∼ δ^(−2/3) which verifies

‖x̂t(δ) − x†‖ = O(δ^(1/3))  as δ → 0.

SLIDE 95

General data fit Accelerated dual descent algorithm

Accelerated dual diagonal descent algorithm (A3D)

If R = F + (σR/2)‖·‖² is strongly convex:

dλ(u) = R*(−A*u)  [smooth]  +  (1/λ) D*(λu, y)  [nonsmooth]

We can use the accelerated forward-backward splitting algorithm on the dual: u0 ∈ G, λt → 0, τ = σR/‖A‖².

xt = prox_{σR⁻¹ F}(−A*ut)
wt = ut + αt(ut − ut−1)
st = prox_{σR⁻¹ F}(−A*wt)
zt+1 = wt + τ A st
ut+1 = zt+1 − τ prox_{(τ λt)⁻¹ D(·, y)}(τ⁻¹ zt+1)
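The accelerated loop can be sketched the same way, with the same illustrative assumptions as before (D(u, y) = ‖u − y‖₁, R(x) = ‖x‖₁ + (σ/2)‖x‖², synthetic data) and an assumed faster-decaying schedule λt = 1/t³ so that the summability condition on (t λt^(1/(p−1))) is plausible:

```python
import numpy as np

def soft(z, tau):
    # Soft-thresholding: the proximal operator of tau * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

rng = np.random.default_rng(2)
m, n = 40, 80
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, 6, replace=False)] = rng.standard_normal(6)
y = A @ x_true                                     # exact datum

sigma, alpha = 1.0, 3.0
tau = sigma / np.linalg.norm(A, 2) ** 2
u = u_prev = np.zeros(m)
res = []
for t in range(1, 3001):
    lam = 1.0 / t ** 3                             # faster-decaying schedule
    x = soft(-A.T @ u, 1.0) / sigma                # primal iterate x_t
    res.append(np.linalg.norm(A @ x - y, 1))
    w = u + (t - 1) / (t + alpha) * (u - u_prev)   # extrapolated dual point w_t
    s = soft(-A.T @ w, 1.0) / sigma                # s_t = grad R*(-A^T w_t)
    z = w + tau * A @ s                            # forward step at w_t
    u_prev = u
    u = z - tau * (y + soft(z / tau - y, 1.0 / (tau * lam)))  # backward step
print(res[0], res[-1])
```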
SLIDE 96

General data fit Accelerated dual descent algorithm

(A3D) is a regularization method

Theorem (Early stopping) [Calatroni-Garrigos-Rosasco-V., 2019]. Assume that the source/qualification condition holds. Let ŷ ∈ G with ‖ŷ − y‖ ≤ δ, and let (x̂t, ût) be the sequence generated by the (A3D) algorithm with y = ŷ and û0 = u0. Suppose that (t λt^(1/(p−1)))t ∈ ℓ¹(N). Then there exists an early stopping rule t(δ) ∼ δ^(−1/2) which verifies

‖x̂t(δ) − x†‖ = O(δ^(1/2))  as δ → 0.

SLIDE 97

General data fit Accelerated dual descent algorithm

(A3D) is a regularization method

Theorem (Early stopping) [Calatroni-Garrigos-Rosasco-V., 2019]. Assume that the source/qualification condition holds. Let ŷ ∈ G with ‖ŷ − y‖ ≤ δ, and let (x̂t, ût) be the sequence generated by the (A3D) algorithm with y = ŷ and û0 = u0. Suppose that (t λt^(1/(p−1)))t ∈ ℓ¹(N). Then there exists an early stopping rule t(δ) ∼ δ^(−1/2) which verifies

‖x̂t(δ) − x†‖ = O(δ^(1/2))  as δ → 0.

For simplicity, here D(u, y) = L(u − y); but this is not needed.

SLIDE 98

General data fit Experimental results

Setting

Deblurring and denoising (salt and pepper, Gaussian, Gaussian + salt and pepper, Poisson) of 512 × 512 images.

Comparison between the two versions, diagonal and warm restart:
  • diagonal: one parameter = (λt) = number of iterations
  • warm restart: two parameters: (λt); accuracy

SLIDE 99

General data fit Experimental results

Diagonal works as well as warm restart (i.e. Tikhonov)

[Figure: Euclidean distance from the true image vs. iterations. Dotted lines: diagonal with 10^3 and 10^4 iterations; dashed lines: warm restart with 30 values of λ and accuracies 10^−3, 10^−4, 10^−5.]


slide-100
SLIDE 100

General data fit Experimental results

Diagonal works better than (?) warm restart (i.e. Tikhonov)

[Figure: total number of iterations as a function of (λt). Dotted lines: diagonal; dashed lines: warm restart with 30 values of λ and accuracies 10^−3, 10^−4, 10^−5.]


slide-101
SLIDE 101

General data fit Experimental results

Parameter selection

  • using the true image
  • using SURE (computed with the ideas in Deledalle-Vaiter-Fadili-Peyré 2014)
  • budget of 10^3 iterations for both diagonal and warm restart


slide-102
SLIDE 102

General data fit Experimental results

Results

Blurring + salt and pepper 35%. D(u, y) = ‖u − y‖₁, R(x) = ‖Wx‖₁ + ‖x‖² or ‖x‖TV + ‖x‖²

[Figures: noisy image; reconstructions with diagonal and warm restart using the true image; reconstructions with diagonal and warm restart using SURE.]


slide-103
SLIDE 103

General data fit Experimental results

Comparison between 3D and A3D

Blurring + salt and pepper 35%. D(u, y) = ‖u − y‖₁, R(x) = ‖Wx‖₁ + ‖x‖²

[Figure: Euclidean distance from the true image vs. iteration number, for 3D and A3D.]

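The gap between the two variants can be reproduced on a toy problem. The sketch below is a stand-in, not the experiment above: noiseless data, λt ≡ 0, quadratic R, diagonal A; "plain" corresponds to the 3D-style gradient step and "accelerated" to the same step with Nesterov-type inertia as in A3D.

```python
import numpy as np

# Plain vs inertial (accelerated) gradient step for min 0.5*||x||^2 s.t. Ax = y,
# on an ill-conditioned diagonal operator, with a fixed iteration budget.
n = 20
A = np.diag(1.0 / np.arange(1, n + 1))   # singular values 1, 1/2, ..., 1/20
x_dag = np.ones(n)
y = A @ x_dag
tau = 1.0                                 # = 1/||A||^2, since ||A|| = 1 here

T = 500
x_plain = np.zeros(n)
x_acc = x_acc_prev = np.zeros(n)
for t in range(T):
    # plain gradient step (Landweber)
    x_plain = x_plain - tau * A.T @ (A @ x_plain - y)
    # same step applied at a Nesterov-type extrapolated point
    v = x_acc + (t / (t + 3)) * (x_acc - x_acc_prev)
    x_acc_prev = x_acc
    x_acc = v - tau * A.T @ (A @ v - y)

err_plain = np.linalg.norm(x_plain - x_dag)
err_acc = np.linalg.norm(x_acc - x_dag)
```

After the same budget T, the inertial iterate is closer to the solution: the slow spectral components converge at rate O(1/t²) in objective instead of O(1/t).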

slide-104
SLIDE 104

General data fit Experimental results

3D vs. A3D

Iteration 350

[Images: original image; 3D and A3D reconstructions.]


slide-105
SLIDE 105

General data fit Experimental results

3D vs. A3D

Iteration 400

[Images: original image; 3D and A3D reconstructions.]


slide-106
SLIDE 106

General data fit Experimental results

3D vs. A3D

Iteration 430

[Images: original image; 3D and A3D reconstructions.]


slide-107
SLIDE 107

General data fit Experimental results

3D vs. A3D

Iteration 500

[Images: original image; 3D and A3D reconstructions.]


slide-108
SLIDE 108

General data fit Experimental results

3D vs. A3D

Iteration 680

[Images: original image; 3D and A3D reconstructions.]


slide-109
SLIDE 109

General data fit Experimental results

3D vs. A3D

Iteration 750

[Images: original image; 3D and A3D reconstructions.]


slide-110
SLIDE 110

General data fit Experimental results

3D vs. A3D

Iteration 900

[Images: original image; 3D and A3D reconstructions.]


slide-111
SLIDE 111

Conclusions

Concluding remarks and future perspectives

Concluding remarks

  • use the number of iterations as the regularization parameter
  • iterative regularization as an alternative to Tikhonov regularization
  • optimization perspective: stability with respect to errors as a way to prove regularization results


slide-112
SLIDE 112

Conclusions

Concluding remarks and future perspectives

Concluding remarks

  • use the number of iterations as the regularization parameter
  • iterative regularization as an alternative to Tikhonov regularization
  • optimization perspective: stability with respect to errors as a way to prove regularization results

Future perspectives

  • remove strong convexity
  • better use of conditioning
  • learning setting (Â → A)?


slide-113
SLIDE 113

Conclusions

References

  • S. Matet, L. Rosasco, B. C. Vũ, Don't relax: early stopping for convex regularization, arXiv 2017.
  • G. Garrigos, L. Rosasco, and S. Villa, Iterative regularization via dual diagonal descent, JMIV 2018.
  • L. Calatroni, G. Garrigos, L. Rosasco, and S. Villa, Accelerated iterative regularization via dual diagonal descent, manuscript 2019.


slide-114
SLIDE 114

Conclusions

The end

Thank you for your attention
