
slide-1
SLIDE 1

Unbiased Risk Estimation as Parameter Choice Rule for Filter-based Regularization Methods

Frank Werner¹

Statistical Inverse Problems in Biophysics Group, Max Planck Institute for Biophysical Chemistry, Göttingen, and Felix Bernstein Institute for Mathematical Statistics in the Biosciences, University of Göttingen

Chemnitz Symposium on Inverse Problems 2017 (on Tour in Rio)

¹joint work with Housen Li

Frank Werner, MPIbpC Göttingen. Unbiased Risk Estimation, October 30, 2017. Slide 1 / 34

slide-2
SLIDE 2

Outline

1 Introduction
2 A posteriori parameter choice methods
3 Error analysis
4 Simulations
5 Conclusion

slide-3
SLIDE 3

Introduction

slide-4
SLIDE 4

Introduction

Statistical inverse problems

Setting: X, Y Hilbert spaces, T : X → Y bounded, linear

Task: Recover the unknown f ∈ X from noisy measurements Y = Tf + σξ

Noise: ξ is a standard Gaussian white noise process, σ > 0 the noise level

The model has to be understood in a weak sense:

⟨Y, g⟩ := ⟨Tf, g⟩_Y + σ⟨ξ, g⟩ for all g ∈ Y,

with ⟨ξ, g⟩ ∼ N(0, ‖g‖²_Y) and E[⟨ξ, g₁⟩ ⟨ξ, g₂⟩] = ⟨g₁, g₂⟩_Y.

slide-8
SLIDE 8

Introduction

Statistical inverse problems

Assumptions:
  • T is injective and Hilbert-Schmidt (∑ σ²_k < ∞, σ_k singular values)
  • σ is known exactly

As the problem is ill-posed, regularization is needed. Consider filter-based regularization schemes

f̂_α := q_α(T*T) T*Y,  α > 0.

Aim: an a posteriori choice of α such that the rate of convergence (as σ ↘ 0) is order optimal (no loss of log-factors).

Note: Heuristic parameter choice rules might work here as well, as the Bakushinskiĭ veto does not hold in our setting (Becker ’11).
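In a discretized setting the filter estimator is a one-liner once an SVD of T is available. The following is a minimal sketch, not the talk's code: the diagonal toy operator, the function names, and the choice of the Tikhonov filter q_α(λ) = 1/(λ + α) are all illustrative assumptions.

```python
import numpy as np

def filter_estimate(U, s, Vt, Y, alpha):
    """f_alpha = q_alpha(T*T) T* Y with the Tikhonov filter q_alpha(lam) = 1/(lam + alpha).

    U, s, Vt: SVD of the discretized forward operator T = U @ diag(s) @ Vt.
    In the singular basis this is q_alpha(s_k^2) * s_k * <Y, u_k> * v_k.
    """
    lam = s ** 2                                      # spectrum of T*T
    return Vt.T @ ((s / (lam + alpha)) * (U.T @ Y))

# toy mildly ill-posed example: diagonal operator with sigma_k = 1/k
rng = np.random.default_rng(0)
n = 50
T = np.diag(1.0 / np.arange(1.0, n + 1))
U, s, Vt = np.linalg.svd(T)
f_true = 1.0 / np.arange(1.0, n + 1)
sigma = 1e-3
Y = T @ f_true + sigma * rng.standard_normal(n)

f_hat = filter_estimate(U, s, Vt, Y, alpha=1e-4)
```

The filter replaces the unbounded inversion factor 1/s_k by the bounded factor s_k/(s_k² + α), which is what tames the noise amplification.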

slide-12
SLIDE 12

A posteriori parameter choice methods

slide-13
SLIDE 13

A posteriori parameter choice methods

The discrepancy principle

  • For deterministic data: α_DP = max { α > 0 : ‖T f̂_α − Y‖_Y ≤ τσ }
  • But here: Y ∉ Y! Either pre-smoothing (Z := T*Y ∈ X) ...
  • ... or discretization: Y ∈ Rⁿ, ξ ∼ N_n(0, I_n), and choose
    α_DP = max { α > 0 : ‖T f̂_α − Y‖₂ ≤ τσ√n }

Pros:
  • Easy to implement
  • Works for all q_α
  • Order-optimal convergence rates

Cons:
  • How to choose τ ≥ 1?
  • Only meaningful after discretization
  • Early saturation

Davies & Anderssen ’86, Lukas ’95, Blanchard, Hoffmann & Reiß ’16
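Over a candidate grid, α_DP can be found by scanning from large to small α and testing the discretized discrepancy ‖T f̂_α − Y‖₂ ≤ τσ√n. A hedged sketch for the Tikhonov filter on an assumed toy diagonal operator (names and grid are illustrative):

```python
import numpy as np

def alpha_discrepancy(T, Y, sigma, alphas, tau=1.1):
    """Largest candidate alpha with ||T f_alpha - Y||_2 <= tau * sigma * sqrt(n)."""
    n = len(Y)
    U, s, Vt = np.linalg.svd(T)
    for alpha in sorted(alphas, reverse=True):        # try the largest alpha first
        f_alpha = Vt.T @ ((s / (s ** 2 + alpha)) * (U.T @ Y))
        if np.linalg.norm(T @ f_alpha - Y) <= tau * sigma * np.sqrt(n):
            return alpha
    return min(alphas)                                # discrepancy level never reached

rng = np.random.default_rng(0)
n = 50
T = np.diag(1.0 / np.arange(1.0, n + 1))
f_true = 1.0 / np.arange(1.0, n + 1)
sigma = 1e-3
Y = T @ f_true + sigma * rng.standard_normal(n)
alphas = np.logspace(-10, 0, 40)
a_dp = alpha_discrepancy(T, Y, sigma, alphas)
```

Scanning from large α down implements the "max" in the definition: the first candidate that meets the discrepancy level is the largest one that does.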

slide-17
SLIDE 17

A posteriori parameter choice methods

The quasi-optimality criterion

  • Neubauer ’08 (r_α(λ) = 1 − λq_α(λ)): α_QO = argmin_{α>0} ‖r_α(T*T) f̂_α‖_X
  • But for spectral cut-off, r_α(T*T) f̂_α = 0 for all α > 0
  • Alternative formulation for Tikhonov regularization if candidates α_1 < ... < α_m are given:
    n_QO = argmin_{1≤n≤m−1} ‖f̂_{α_n} − f̂_{α_{n+1}}‖_X,  α_QO := α_{n_QO}.

Pros:
  • Easy to implement, very fast
  • No knowledge of σ necessary
  • Order-optimal convergence rates in mildly ill-posed situations

Cons:
  • Only for special q_α
  • Additional assumptions on noise and/or f necessary
  • Performance unclear in severely ill-posed situations

Bauer & Kindermann ’08, Bauer & Reiß ’08, Bauer & Kindermann ’09
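The grid formulation above translates directly into code: compute the Tikhonov estimates on the candidate grid and minimize the distance between consecutive ones. A minimal sketch on an assumed toy diagonal operator (all names are illustrative):

```python
import numpy as np

def alpha_quasi_optimality(T, Y, alphas):
    """alpha_QO = alpha_n minimizing ||f_{alpha_n} - f_{alpha_{n+1}}||
    over an increasing candidate grid (Tikhonov regularization)."""
    U, s, Vt = np.linalg.svd(T)
    est = [Vt.T @ ((s / (s ** 2 + a)) * (U.T @ Y)) for a in alphas]
    diffs = [np.linalg.norm(est[i] - est[i + 1]) for i in range(len(est) - 1)]
    return alphas[int(np.argmin(diffs))]

rng = np.random.default_rng(0)
n = 50
T = np.diag(1.0 / np.arange(1.0, n + 1))
f_true = 1.0 / np.arange(1.0, n + 1)
Y = T @ f_true + 1e-3 * rng.standard_normal(n)
alphas = np.logspace(-10, 0, 40)        # alpha_1 < ... < alpha_m
a_qo = alpha_quasi_optimality(T, Y, alphas)
```

Note that σ is never used, which is the criterion's main practical appeal.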

slide-21
SLIDE 21

A posteriori parameter choice methods

The Lepskiĭ-type balancing principle

  • For given α, the standard deviation of f̂_α can be bounded by std(α) := σ √(Tr(q_α(T*T)² T*T))
  • If candidates α_1 < ... < α_m are given:
    n_LEP = max { j : ‖f̂_{α_j} − f̂_{α_k}‖_X ≤ 4κ std(α_k) for all 1 ≤ k ≤ j }
    and α_LEP = α_{n_LEP}

Pros:
  • Works for all q_α
  • Robust in practice
  • Convergence rates (mildly / severely ill-posed)

Cons:
  • Computationally expensive
  • κ ≥ 1 depends on the decay of σ_k
  • Loss of a log factor compared to the order-optimal rate

Bauer & Pereverzev ’05, Mathé ’06, Mathé & Pereverzev ’06
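Both std(α) and the balancing condition are computable in the singular basis. A sketch for the Tikhonov filter with κ = 1 on an assumed toy operator; the value of κ and the setup are illustrative choices, not the talk's:

```python
import numpy as np

def alpha_lepskii(T, Y, sigma, alphas, kappa=1.0):
    """Lepskii balancing with std(alpha) = sigma * sqrt(Tr(q_alpha(T*T)^2 T*T)).

    Returns alpha_{n_LEP}, where n_LEP is the largest j with
    ||f_{alpha_j} - f_{alpha_k}|| <= 4*kappa*std(alpha_k) for all k <= j.
    """
    U, s, Vt = np.linalg.svd(T)
    lam = s ** 2
    est, std = [], []
    for a in alphas:                        # increasing grid alpha_1 < ... < alpha_m
        q = 1.0 / (lam + a)                 # Tikhonov filter
        est.append(Vt.T @ (q * s * (U.T @ Y)))
        std.append(sigma * np.sqrt(np.sum(q ** 2 * lam)))
    n_lep = 0
    for j in range(len(alphas)):
        if all(np.linalg.norm(est[j] - est[k]) <= 4 * kappa * std[k]
               for k in range(j + 1)):
            n_lep = j
    return alphas[n_lep]

rng = np.random.default_rng(0)
n = 50
T = np.diag(1.0 / np.arange(1.0, n + 1))
f_true = 1.0 / np.arange(1.0, n + 1)
sigma = 1e-3
Y = T @ f_true + sigma * rng.standard_normal(n)
alphas = np.logspace(-10, 0, 40)
a_lep = alpha_lepskii(T, Y, sigma, alphas)
```

The quadratic number of pairwise comparisons over the grid is exactly the "computationally expensive" drawback listed above.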

slide-23
SLIDE 23

A posteriori parameter choice methods

Unbiased risk estimation

  • Dating back to ideas of Mallows ’73 and Stein ’81, let
    r̂(α, Y) := ‖T f̂_α‖²_Y − 2⟨T f̂_α, Y⟩ + 2σ² Tr(T*T q_α(T*T))
    and choose α_URE = argmin_{α>0} r̂(α, Y)
  • Note that E[r̂(α, Y)] = E[‖T f̂_α − Tf‖²_Y] − c with c independent of α (unbiased risk estimation)

For spectral cut-off and in mildly ill-posed situations, this gives order-optimal rates (Chernousova & Golubev ’14). Besides this, only optimality in the image space is known (Li ’87, Lukas ’93, Kneip ’94). Distributional behavior of α_URE: Lucka et al. ’17.

In general: Pros? Cons? Convergence rates? Order optimality?
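r̂(α, Y) is explicit once an SVD is available, so α_URE can be approximated by grid minimization. A minimal sketch with the Tikhonov filter on an assumed toy operator; note that r̂ may be negative, since it estimates the risk only up to the α-independent constant c:

```python
import numpy as np

def ure(T, Y, sigma, alpha, U, s, Vt):
    """r_hat(alpha, Y) = ||T f_alpha||^2 - 2<T f_alpha, Y> + 2 sigma^2 Tr(T*T q_alpha(T*T))."""
    lam = s ** 2
    q = 1.0 / (lam + alpha)                 # Tikhonov filter
    f_alpha = Vt.T @ (q * s * (U.T @ Y))
    Tf = T @ f_alpha
    return Tf @ Tf - 2.0 * (Tf @ Y) + 2.0 * sigma ** 2 * np.sum(lam * q)

rng = np.random.default_rng(0)
n = 50
T = np.diag(1.0 / np.arange(1.0, n + 1))
U, s, Vt = np.linalg.svd(T)
f_true = 1.0 / np.arange(1.0, n + 1)
sigma = 1e-3
Y = T @ f_true + sigma * rng.standard_normal(n)

alphas = np.logspace(-10, 0, 60)
risks = np.array([ure(T, Y, sigma, a, U, s, Vt) for a in alphas])
alpha_ure = alphas[int(np.argmin(risks))]
```

The trace penalty 2σ² Tr(T*T q_α(T*T)) compensates for the data term being evaluated on the same Y used to build f̂_α, which is what makes the estimator unbiased up to c.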

slide-27
SLIDE 27

Error analysis

slide-28
SLIDE 28

Error analysis: A priori parameter choice

Assumptions

Filter:
  α |q_α(λ)| ≤ C′_q  and  λ |q_α(λ)| ≤ C′′_q.

Source condition:
  W_φ := { f ∈ X : f = φ(T*T)w, ‖w‖_X ≤ C }.
Note: for any f ∈ X there exists φ such that f ∈ W_φ.

Qualification condition:
  The function φ is a qualification of q_α if
  sup_{λ∈[0,‖T*T‖]} φ(λ) |1 − λq_α(λ)| ≤ C_φ φ(α)

slide-29
SLIDE 29

Error analysis: A priori parameter choice

Assumptions

Let Σ(x) := #{ k : σ²_k ≥ x } be the counting function of the singular values of T.

Approximation by a smooth surrogate: there exist S ∈ C², α₁ ∈ (0, ‖T*T‖] and C_S ∈ (0, 2) such that
(1) lim_{α↘0} S(α)/Σ(α) = 1 (approximation)
(2) S′ < 0 (decreasing)
(3) lim_{α↗∞} S(α) = lim_{α↗∞} S′(α) = 0 (behavior above σ²₁)
(4) lim_{α↘0} αS(α) = 0 (Hilbert-Schmidt)
(5) αS′(α) is integrable on (0, α₁]
(6) S′′(α)/(−S′(α)) ≤ C_S/α on (0, α₁]

slide-30
SLIDE 30

Error analysis: A priori parameter choice

A priori convergence rates

Bissantz, Hohage, Munk, Ruymgaart ’07

Let α* solve α* φ(α*)² = σ² S(α*).
(i) If φ is a qualification of q_α, then
  sup_{f ∈ W_φ} E[‖f̂_{α*} − f‖²_X] ≲ φ(α*)² = σ² S(α*)/α*  as σ ↘ 0.
(ii) If λ ↦ √λ φ(λ) is a qualification of the filter q_α, then
  sup_{f ∈ W_φ} E[‖T f̂_{α*} − Tf‖²_Y] ≲ α* φ(α*)² = σ² S(α*)  as σ ↘ 0.

slide-31
SLIDE 31

Error analysis: A priori parameter choice

Mildly ill-posed situation: Example

Assume σ²_k ≍ k^{−a}, W_b := { f ∈ X : ∑_{k=1}^∞ k^b f²_k ≤ 1 } with a > 1, b > 0:

Bissantz, Hohage, Munk, Ruymgaart ’07

Let α* ≍ (σ²)^{a/(a+b+1)}.
  • If φ(λ) = λ^{b/2a} is a qualification of q_α, then
    sup_{f ∈ W_b} E[‖f̂_{α*} − f‖²_X] ≲ (σ²)^{b/(a+b+1)}.
  • If φ(λ) = λ^{b/2a+1/2} is a qualification of q_α, then
    sup_{f ∈ W_b} E[‖T f̂_{α*} − Tf‖²_Y] ≲ (σ²)^{(a+b)/(a+b+1)}.

These rates are order optimal over W_b.

slide-32
SLIDE 32

Error analysis: Unbiased risk estimation as parameter choice

Unbiased risk estimation vs. the oracle

Recall that
  r̂(α, Y) := ‖T f̂_α‖²_Y − 2⟨T f̂_α, Y⟩ + 2σ² Tr(T*T q_α(T*T))
is an unbiased estimator for r(α, f) := E[‖T f̂_α − Tf‖²_Y].

In the following, we will compare
  α_URE = argmin_{α>0} r̂(α, Y)  and  α_o = argmin_{α>0} r(α, f).

slide-33
SLIDE 33

Error analysis: Unbiased risk estimation as parameter choice

Additional assumptions

(a) α ↦ {q_α(σ²_k)}_{k=1}^∞ is strictly monotone and continuous as a map R → ℓ².
(b) As α ↘ 0, αq_α(α) ≥ c_q > 0.
(c) For α > 0, the function λ ↦ λq_α(λ) is non-decreasing.

Satisfied by Tikhonov, spectral cut-off, Landweber, iterated Tikhonov and Showalter regularization, under proper parametrization. E.g. Tikhonov with re-parametrization α ↦ √α (q_α(λ) = 1/(√α + λ)) violates (b).

(d) ψ(λ) := λ φ⁻¹(√λ) is convex.
(e) There exists a constant C_q > c_q⁻² such that
  ∫₁^∞ Ψ′(C_q x) exp(−C x/2) dx < ∞
with Ψ(x) := x (S⁻¹(x))², for some explicitly known C > 0.

(d) can always be satisfied by weakening φ; (e) restricts the decay of the singular values.

slide-37
SLIDE 37

Error analysis: Unbiased risk estimation as parameter choice

Oracle inequality

Li & W. ’16: There are positive constants C_i, i = 1, ..., 6, such that for all f ∈ W_φ it holds

E[‖f̂_{α_URE} − f‖²_X] ≤ C₁ ψ⁻¹(C₂ r(α_o, f) + C₃σ²) + C₄σ² + C₅ (r(α_o, f) + σ √(r(α_o, f))) / S⁻¹(C₆ r(α_o, f)/σ²)

as σ ↘ 0.

Gives a comparison of the strong risk under α_URE with the weak risk under the oracle α_o.

slide-38
SLIDE 38

Error analysis: Unbiased risk estimation as parameter choice

Convergence rates

Li & W. ’16: If also λ ↦ √λ φ(λ) is a qualification of the filter q_α, then for α* solving α* φ(α*)² = σ² S(α*) there are C₁, C₂, C₃ > 0 such that

sup_{f ∈ W_φ} E[‖f̂_{α_URE} − f‖²_X] ≤ C₁ σ² S(α*)/α* + C₂ σ² S(α*) / S⁻¹(C₃ S(α*))  as σ ↘ 0.

If there is C₄ > 0 such that S(C₄x) ≥ C₃S(x), then this equals the a priori rate

sup_{f ∈ W_φ} E[‖f̂_{α_URE} − f‖²_X] ≲ φ(α*)² = σ² S(α*)/α*.

slide-40
SLIDE 40

Error analysis: Unbiased risk estimation as parameter choice

Order optimality in mildly ill-posed situations

Assume σ²_k ≍ k^{−a}, W_b := { f ∈ X : ∑_{k=1}^∞ k^b f²_k ≤ 1 } with a > 1, b > 0:

Oracle inequality: for all f ∈ W_b,
  E[‖f̂_{α_URE} − f‖²_X] ≲ r(α_o, f)^{b/(a+b)} + σ^{−2a} r(α_o, f)^{1+a} + σ^{1−2a} r(α_o, f)^{(1+2a)/2}.

Convergence rate: thus, if λ ↦ λ^{b/2a+1/2} is a qualification of q_α, then
  sup_{f ∈ W_b} E[‖f̂_{α_URE} − f‖²_X] ≲ σ^{2b/(a+b+1)},
which is order-optimal.

slide-42
SLIDE 42

Error analysis: Unbiased risk estimation as parameter choice

Unbiased risk estimation: pros and cons

α_URE = argmin_{α>0} ( ‖T f̂_α‖²_Y − 2⟨T f̂_α, Y⟩ + 2σ² Tr(T*T q_α(T*T)) )

Pros:
  • Works for many q_α
  • Order-optimal convergence rates in mildly ill-posed situations
  • No loss of a log factor
  • No tuning parameter

Cons:
  • Computationally expensive
  • Early saturation
  • Performance in severely ill-posed situations unclear

H. Li and F. Werner (2017). Empirical risk minimization as parameter choice rule for general linear regularization methods. Submitted, arXiv: 1703.07809.

slide-43
SLIDE 43

Simulations

slide-44
SLIDE 44

Simulations: Rates of convergence

A mildly ill-posed situation: antiderivative

Let T : L²([0, 1]) → L²([0, 1]) be given by
  (Tf)(x) = ∫₀¹ min{x(1 − y), y(1 − x)} f(y) dy

As (Tf)″ = −f, the singular values σ_k of T satisfy σ_k ≍ k⁻².

We choose
  f(x) = x if 0 ≤ x ≤ 1/2,  1 − x if 1/2 ≤ x ≤ 1.

Fourier coefficients: f_k = (−1)^{k−1}/(4π³k²), so the optimal rate is O(σ^{3/4−ε}) for any ε > 0.
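The kernel min{x(1 − y), y(1 − x)} is the Green's function of −d²/dx² with Dirichlet boundary conditions, whose eigenvalues are (πk)⁻², so the claim σ_k ≍ k⁻² can be checked numerically. A sketch using a midpoint-rule discretization (the grid size is an arbitrary choice, not from the talk):

```python
import numpy as np

n = 200
x = (np.arange(n) + 0.5) / n                           # midpoint grid on [0, 1]
X, Ygrid = np.meshgrid(x, x, indexing="ij")
K = np.minimum(X * (1 - Ygrid), Ygrid * (1 - X)) / n   # kernel times quadrature weight 1/n
s = np.linalg.svd(K, compute_uv=False)

# Green's function of -d^2/dx^2 with Dirichlet BC: sigma_k = 1 / (pi k)^2
rel_err_1 = abs(s[0] * np.pi ** 2 - 1.0)               # compare s_1 with 1/pi^2
rel_err_10 = abs(s[9] * (10 * np.pi) ** 2 - 1.0)       # compare s_10 with 1/(10 pi)^2
```

The agreement degrades for higher modes, as expected for a fixed-grid quadrature.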

slide-48
SLIDE 48

Simulations: Rates of convergence

A mildly ill-posed situation: Tikhonov regularization

Figure: Empirical MISE (against the reference slope σ^{3/4}) and empirical variance of ‖f̂ − f‖²₂ over 10⁴ repetitions, for α_o, α_DP, α_QO, α_LEP, α_URE.

slide-49
SLIDE 49

Simulations: Rates of convergence

A severely ill-posed situation: satellite gradiometry

Let R > 1 and S ⊂ R² be the unit sphere. Given g = ∂²u/∂r² on RS, find f in
  Δu = 0 in R^d \ B,
  u = f on S,
  |u(x)| = O(‖x‖₂⁻¹) as ‖x‖₂ → ∞.

The corresponding T : L²(S, μ) → L²(RS, μ) has singular values σ_k = |k| (|k| + 1) R^{−|k|−2}.

We choose f(x) = π/2 − |x|, x ∈ [−π, π].

The optimal rate of convergence is O((− log σ)^{−3+ε}) for any ε > 0.

slide-53
SLIDE 53

Simulations: Rates of convergence

A severely ill-posed situation: Tikhonov regularization

Figure: Empirical MISE (against the reference slope (− log σ)⁻³) and empirical variance of ‖f̂ − f‖²₂ over 10⁴ repetitions, for α_o, α_DP, α_QO, α_LEP, α_URE.

slide-54
SLIDE 54

Simulations: Rates of convergence

A severely ill-posed situation: backwards heat equation

Let t̄ > 0. Given g = u(·, t̄), find f in
  ∂u/∂t(x, t) = ∂²u/∂x²(x, t) in (−π, π] × (0, t̄),
  u(x, 0) = f(x) on [−π, π],
  u(−π, t) = u(π, t) for t ∈ (0, t̄].

The corresponding T : L²([−π, π]) → L²([−π, π]) has singular values σ_k = exp(−k² t̄).

We choose f(x) = π/2 − |x|, x ∈ [−π, π].

The optimal rate of convergence is O((− log σ)^{−3/2+ε}) for any ε > 0.

slide-58
SLIDE 58

Simulations: Rates of convergence

A severely ill-posed situation: Tikhonov regularization

Figure: Empirical MISE (against the reference slope (− log σ)^{−3/2}) and empirical variance of ‖f̂ − f‖²₂ over 10⁴ repetitions, for α_o, α_DP, α_QO, α_LEP, α_URE.

slide-59
SLIDE 59

Simulations: Efficiency simulations

Efficiency simulations

Measure the efficiency of a parameter choice rule α* by the fraction
  R* := E[‖f̂_{α_o} − f‖²_X] / E[‖f̂_{α*} − f‖²_X]

Numerical approximations of these as functions of σ with different parameters a, ν > 0 in the following setting:
  • σ_k = exp(−ak)
  • f_k = ±k^{−ν} · (1 + N(0, 0.1²))
  • Y_k = σ_k · f_k + N(0, σ²)
  • k = 1, ..., 300, 10⁴ repetitions
  • Tikhonov regularization
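This sequence-space setting is cheap to reproduce. The sketch below runs a small Monte Carlo (50 repetitions instead of the talk's 10⁴, alternating signs assumed for the ± in f_k, and one fixed pair a, ν) and computes the efficiency R_URE of unbiased risk estimation against the oracle:

```python
import numpy as np

rng = np.random.default_rng(0)
a, nu, sigma, m = 0.3, 1.0, 1e-4, 300
k = np.arange(1.0, m + 1)
sk = np.exp(-a * k)                               # singular values sigma_k = exp(-a k)
alphas = np.logspace(-16, 0, 80)

def tikhonov(Y, alpha):
    return sk / (sk ** 2 + alpha) * Y             # Tikhonov in the diagonal sequence model

err_oracle, err_ure = [], []
for _ in range(50):                               # 50 repetitions (the talk used 10^4)
    f = (-1.0) ** k * k ** (-nu) * (1 + 0.1 * rng.standard_normal(m))
    Y = sk * f + sigma * rng.standard_normal(m)
    losses, ure_vals = [], []
    for al in alphas:
        fa = tikhonov(Y, al)
        g = sk * fa                               # T f_alpha in the singular basis
        losses.append(np.sum((fa - f) ** 2))      # true loss, known only in simulation
        ure_vals.append(g @ g - 2 * (g @ Y)
                        + 2 * sigma ** 2 * np.sum(sk ** 2 / (sk ** 2 + al)))
    err_oracle.append(min(losses))                # oracle picks the loss-minimizing alpha
    err_ure.append(losses[int(np.argmin(ure_vals))])

R_ure = np.mean(err_oracle) / np.mean(err_ure)    # efficiency in (0, 1]
```

By construction R* ≤ 1, and the gap to 1 quantifies how much is lost relative to the (unavailable) oracle choice.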

slide-65
SLIDE 65

Simulations: Efficiency simulations

Efficiency simulations: results

Figure: R_QO, R_DP, R_LEP, and R_URE as functions of σ for a = 0.2, ν = 1 (left) and a = 0.3, ν = 1 (right).

slide-66
SLIDE 66

Simulations: Efficiency simulations

Efficiency simulations: results

Figure: R_QO, R_DP, R_LEP, and R_URE as functions of σ for a = 0.4, ν = 1 (left) and a = 0.6, ν = 1 (right).

slide-67
SLIDE 67

Simulations: Efficiency simulations

Efficiency simulations: results

Figure: R_QO, R_DP, R_LEP, and R_URE as functions of σ for a = 0.3, ν = 3 (left) and a = 0.3, ν = 5 (right).

slide-68
SLIDE 68

Conclusion

slide-69
SLIDE 69

Conclusion

Presented results

  • Analysis of a parameter choice based on unbiased risk estimation:
    • oracle inequality
    • convergence rates
    • order optimality in mildly ill-posed situations
  • Numerical comparison:
    • in this specific setting, quasi-optimality outperforms all other methods
    • unbiased risk estimation has higher variance (by design)
    • simulations suggest order optimality of quasi-optimality also in severely ill-posed situations; not clear for unbiased risk estimation

Thank you for your attention!