SLIDE 1

Inverse Problems and Regularization – An Introduction

Stefan Kindermann, Industrial Mathematics Institute, University of Linz, Austria

Introduction to Regularization

SLIDE 2

What are Inverse Problems?

One possible definition [Engl, Hanke, Neubauer ’96]: Inverse problems are concerned with determining causes for a desired or an observed effect.

Cause (parameter, unknown, solution of the inverse problem, . . .)   ⇒ Direct Problem ⇒   Effect (data, observation, . . .)
Cause   ⇐ Inverse Problem ⇐   Effect

SLIDE 3

Direct and Inverse Problems

The classification as direct or inverse is in most cases based on the well-/ill-posedness of the associated problems:

Cause   ⇒ stable ⇒   Effect   (direct problem)
Cause   ⇐ unstable ⇐   Effect   (inverse problem)

Inverse problems ∼ ill-posed (ill-conditioned) problems

SLIDE 4

What are Inverse Problems?

A central feature of inverse problems is their ill-posedness.

Well-posedness in the sense of Hadamard [Hadamard ’23]:
  • Existence of a solution (for all admissible data)
  • Uniqueness of the solution
  • Continuous dependence of the solution on the data

Well-posedness in the sense of Nashed [Nashed ’87]:
  • A problem is well-posed if the set of data/observations is a closed set (the range of the forward operator is closed).

SLIDE 5

Abstract Inverse Problem

Abstract inverse problem: solve the equation

F(x) = y

for x ∈ X (Banach/Hilbert/... space), given data y ∈ Y (Banach/Hilbert/... space), where F⁻¹ does not exist or is not continuous.

F ... forward operator. We want

“x† = F⁻¹(y)”

x† ... (generalized) solution

SLIDE 6

Abstract Inverse Problem

  • If the forward operator is linear ⇒ linear inverse problem.
  • A linear inverse problem is well-posed in the sense of Nashed if the range of F is closed.

Theorem: A linear operator with finite-dimensional range is always well-posed (in Nashed’s sense).

“Ill-posedness lives in infinite-dimensional spaces”

SLIDE 7

Abstract Inverse Problem

“Ill-posedness lives in infinite-dimensional spaces”

  • Problems with a small number of parameters usually do not need regularization.
  • Discretization acts as regularization/stabilization.
  • Ill-posedness in finite-dimensional spaces ∼ ill-conditioning.
  • Measure of ill-posedness: decay of the singular values of the forward operator.
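To make the last point concrete, here is a small illustrative sketch (not from the slides) that discretizes a simple integration operator and inspects the decay of its singular values; numpy and the chosen toy operator are assumptions for illustration.

```python
import numpy as np

# Hypothetical toy forward operator: cumulative integration on [0, 1],
# (F x)(t) = integral of x(s) ds from 0 to t, discretized as a lower-triangular matrix.
n = 200
h = 1.0 / n
F = h * np.tril(np.ones((n, n)))

# The singular values decay roughly like 1/k: the finer the discretization,
# the worse the conditioning; finite-dimensional ill-conditioning mirrors
# the ill-posedness of the underlying infinite-dimensional problem.
sigma = np.linalg.svd(F, compute_uv=False)
print("largest / smallest singular value:", sigma[0], sigma[-1])
print("condition number:", sigma[0] / sigma[-1])
```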

SLIDE 8

Methodologies in studying Inverse Problems

  • Deterministic inverse problems: regularization, worst-case convergence, infinite-dimensional, no assumptions on the noise.
  • Statistics: estimators, average-case analysis, often finite-dimensional, noise is a random variable with a specific structure.
  • Bayesian inverse problems: posterior distribution, finite-dimensional, analysis of the posterior distribution by estimators, specific assumptions on noise and prior.
  • Control theory: x = control, F(x) = state, convergence of the state not the control, infinite-dimensional, no assumptions.

SLIDE 9

Deterministic Inverse Problems and Regularization

Try to solve F(x) = y when

“x† = F⁻¹(y)”

does not exist.

Notation: x† ... the “true” (unknown) solution (minimal-norm solution).

Even if F⁻¹(y) exists, it might not be computable [Pour-El, Richards ’88].

SLIDE 10

Deterministic Inverse Problems and Regularization

Data noise: usually we do not have the exact data y = F(x†), but only noisy data

yδ = F(x†) + noise

Amount of noise: noise level δ = ‖F(x†) − yδ‖.

SLIDE 11

Deterministic Inverse Problems and Regularization

Method to solve ill-posed problems: regularization. Approximate the inverse F⁻¹ by a family of stable operators Rα:

F(x) = y,   “x† = F⁻¹(y)”   ⇒   xα = Rα(y),   Rα ∼ F⁻¹

Rα ... regularization operators,   α ... regularization parameter

SLIDE 12

Regularization

α small ⇒ Rα is a good approximation of F⁻¹, but unstable.
α large ⇒ Rα is stable, but a bad approximation of F⁻¹.
α controls the trade-off between approximation and stability:

Total error = approximation error + propagated data error

[Figure: total error ‖xα − x†‖ as a function of α; the approximation error grows with α, the propagated data error decays with α, and the total error has a minimum in between.]

How to select α: parameter choice rules

SLIDE 13

Example: Tikhonov Regularization

Tikhonov regularization [Phillips ’62; Tikhonov ’63]: Let F : X → Y be linear between Hilbert spaces. A least-squares solution of F(x) = y is given by the normal equations

F*F x = F*y

Tikhonov regularization: solve the regularized problem

F*F x + αx = F*y,   i.e.   xα = (F*F + αI)⁻¹ F*y
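As an illustration of this formula (a sketch under toy-problem assumptions, not code from the talk), the regularized solution can be computed by solving the shifted normal equations:

```python
import numpy as np

def tikhonov(F, y, alpha):
    """Tikhonov-regularized solution x_alpha = (F^T F + alpha I)^{-1} F^T y."""
    n = F.shape[1]
    return np.linalg.solve(F.T @ F + alpha * np.eye(n), F.T @ y)

# Toy problem (assumed for illustration): discretized integration operator,
# smooth true solution, additive noise; delta = ||F(x_true) - y_delta||.
n = 200
h = 1.0 / n
F = h * np.tril(np.ones((n, n)))
t = np.linspace(h, 1.0, n)
x_true = np.sin(np.pi * t)
rng = np.random.default_rng(0)
noise = 1e-3 * rng.standard_normal(n)
y_delta = F @ x_true + noise
delta = np.linalg.norm(noise)        # noise level

x_alpha = tikhonov(F, y_delta, alpha=1e-4)
print("relative error:", np.linalg.norm(x_alpha - x_true) / np.linalg.norm(x_true))
```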

SLIDE 14

Example: Tikhonov Regularization

Error estimates (under some conditions):

‖xα − x†‖² ≤ δ²/α + C α^ν

(total error ≤ stability term + approximation term)
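Continuing the toy sketch above (reusing F, x_true, y_delta and tikhonov, all of which are illustrative assumptions), the trade-off behind this estimate can be observed numerically: the error is large for very small α (noise amplification) and for very large α (over-smoothing), with a minimum in between.

```python
import numpy as np

# Reuses F, x_true, y_delta and tikhonov() from the Tikhonov sketch above.
for alpha in [1e-9, 1e-7, 1e-5, 1e-3, 1e-1]:
    x_alpha = tikhonov(F, y_delta, alpha)
    print(f"alpha = {alpha:.0e}   error = {np.linalg.norm(x_alpha - x_true):.3e}")
```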

Theory of linear and nonlinear problems in Hilbert spaces:

[Tikhonov, Arsenin ’77; Groetsch ’84; Hofmann ’86; Baumeister ’87; Louis ’89; Kunisch, Engl, Neubauer ’89; Bakushinskii, Goncharskii ’95; Engl, Hanke, Neubauer ’96; Tikhonov, Leonov, Yagola ’98; . . . ]

SLIDE 15

Example: Landweber iteration

Landweber iteration [Landweber ’51]: solve the normal equations by Richardson iteration.

Landweber iteration: xk+1 = xk − F*(F(xk) − y),   k = 0, 1, . . .

The iteration index is the regularization parameter: α = 1/k.
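A minimal self-contained sketch of the (linear) Landweber iteration on the same kind of toy problem (the operator, noise level, step size and printed iteration numbers are illustrative assumptions); tracking the error against the iteration number exhibits the semiconvergence discussed on the next slide.

```python
import numpy as np

# Toy problem (assumed): discretized integration operator with noisy data.
n = 200
h = 1.0 / n
F = h * np.tril(np.ones((n, n)))
t = np.linspace(h, 1.0, n)
x_true = np.sin(np.pi * t)
rng = np.random.default_rng(0)
y_delta = F @ x_true + 1e-2 * rng.standard_normal(n)

# Landweber iteration x_{k+1} = x_k - omega * F^T (F x_k - y_delta);
# the step size omega is chosen so that omega * ||F||^2 <= 1.
omega = 1.0 / np.linalg.norm(F, 2) ** 2
x = np.zeros(n)
for k in range(1, 5001):
    x = x - omega * F.T @ (F @ x - y_delta)
    if k in (10, 100, 1000, 5000):
        # For noisy data the error typically decreases first and eventually
        # starts to grow again as k increases (semiconvergence).
        print(k, np.linalg.norm(x - x_true))
```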

SLIDE 16

Example: Landweber iteration

Error estimates (under some conditions):

‖xk − x†‖² ≤ k δ² + C k^(−ν)

(total error ≤ stability term + approximation term) ⇒ semiconvergence.

Iterative regularization methods: parameter choice = choice of the stopping index k.

Theory: [Landweber ’51; Fridman ’56; Bialy ’59; Strand ’74; Vasilev ’83; Groetsch ’85; Natterer ’86; Hanke, Neubauer, Scherzer ’95; Bakushinskii, Goncharskii ’95; Engl, Hanke, Neubauer ’96; . . . ]

SLIDE 17

Notion of Convergence

Does the regularized solution converge to the true solution as the noise level tends to 0,

xα → x† as δ → 0 ?

(Worst-case) convergence (for a given parameter choice rule):

lim_{δ→0} sup { ‖xα − x†‖ : yδ with ‖yδ − F(x†)‖ ≤ δ } = 0

Convergence in expectation:

E‖xα − x†‖² → 0   as   E‖yδ − F(x†)‖² → 0

SLIDE 18

Theory of Regularization of Inverse Problems

Convergence depends on x†. Question of speed: convergence rates

‖xα − x†‖ ≤ f(α)   or   ‖xα − x†‖ ≤ f(δ)

SLIDE 19

Theoretical Results

[Schock ’85]: Convergence can be arbitrarily slow !

Theorem: For ill-posed problems in the sense of Nashed there cannot be a function f with lim_{δ→0} f(δ) = 0 such that for all x†

‖xα − x†‖ ≤ f(δ)

Uniform bounds on the convergence rates are impossible. Convergence rates are possible if x† lies in some smoothness class.

SLIDE 20

Theoretical Results

Convergence rates require a source condition x† ∈ M. Convergence rates ∼ modulus of continuity of the inverse:

Ω(δ, M) = sup { ‖x†₁ − x†₂‖ : ‖F(x†₁) − F(x†₂)‖ ≤ δ, x†₁, x†₂ ∈ M }

Theorem [Tikhonov, Arsenin ’77; Morozov ’92; Traub, Wozniakowski ’80]: For an arbitrary regularization map and an arbitrary parameter choice rule (with Rα(0) = 0),

‖xα − x†‖ ≥ Ω(δ, M)   (in the worst case)

SLIDE 21

Theoretical Results

Standard smoothness classes: for linear ill-posed problems in Hilbert spaces we can form

M = X^µ = { x† = (F*F)^µ ω | ω ∈ X }

(Hölder source condition = abstract smoothness condition). Then

Ω(δ, X^µ) = C δ^(2µ/(2µ+1))

is the best convergence rate for Hölder source conditions.

A regularization operator and a parameter choice rule such that

‖xα − x†‖ = C δ^(2µ/(2µ+1))

is called order optimal.

SLIDE 22

Theoretical Results

Special case: x† = F*ω. Such source conditions can be generalized to nonlinear problems, e.g.

x† = F′(x†)* ω,   x† = (F′(x†)* F′(x†))^ν ω

SLIDE 23

Theoretical Results

Many regularization methods have been shown to be order optimal. A significant amount of the theoretical results in regularization theory deals with this issue: convergence of the method together with its parameter choice rule, and optimal-order convergence under a source condition, where the source condition itself does not have to be known.

SLIDE 24

Parameter Choice Rules

How to choose the regularization parameter? Classification:

  • a-priori: α = α(δ)
  • a-posteriori: α = α(δ, yδ)
  • heuristic: α = α(yδ)

SLIDE 25

Bakushinskii veto

Bakushinskii veto [Bakushinskii ’84]: A parameter choice without knowledge of δ cannot yield a convergent regularization in the worst case (for ill-posed problems). Knowledge of δ is needed!

⇒ heuristic parameter choice rules are nonconvergent in the worst case

SLIDE 26

a-priori rules

Example of an a-priori rule: if x† ∈ X^µ, then

α ∼ δ^(2/(2µ+1))

yields the optimal order for Tikhonov regularization.

+ Easy to implement
− Needs information on the source condition
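As a tiny continuation of the Tikhonov sketch above (tikhonov, F, y_delta, delta are reused; the smoothness index µ is an assumed quantity that is usually unknown in practice, which is exactly the drawback noted above):

```python
# Continues the Tikhonov sketch above.
mu = 1.0                                    # assumed (unknown in practice) smoothness index
alpha = delta ** (2.0 / (2.0 * mu + 1.0))   # a-priori rule, up to a constant
x_apriori = tikhonov(F, y_delta, alpha)
```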

SLIDE 27

a-posteriori rules

Example of a-posteriori rules: Morozov’s discrepancy principle [Morozov ’66]. Fix τ > 1; choose the largest α such that the residual is of the order of the noise level:

‖F(xα) − yδ‖ ≤ τδ

This yields an optimal-order method in many situations.

+ Easy to implement
+ No information on source conditions needed
− In some cases not of optimal order

Other a-posteriori choice rules: Gfrerer-Raus rule (improved discrepancy principle) [Raus ’85; Gfrerer ’87], balancing principle [Lepski ’90; Mathé, Pereverzev ’03], . . .
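A sketch of the discrepancy principle on the toy problem above (tikhonov, F, y_delta and the noise level delta are reused; τ and the α-grid are illustrative choices):

```python
import numpy as np

# Reuses tikhonov, F, y_delta, delta from the Tikhonov sketch above.
tau = 1.5
alpha = 1.0                   # start with a large (stable) regularization parameter
while alpha > 1e-12:
    x_alpha = tikhonov(F, y_delta, alpha)
    if np.linalg.norm(F @ x_alpha - y_delta) <= tau * delta:
        break                 # (approximately) the largest alpha with residual <= tau * delta
    alpha /= 2.0              # otherwise decrease alpha and try again
print("discrepancy principle: alpha =", alpha)
```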

SLIDE 28

Heuristic Parameter Choice rules

Example of heuristic rules: quasi-optimality rule [Tikhonov, Glasko ’64]. Choose a sequence of geometrically decaying regularization parameters αn = C qⁿ, q < 1. For each αn compute xαn. Choose α = αn*, where n* is the minimizer of ‖xαn+1 − xαn‖.
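A sketch of the quasi-optimality rule on the same toy problem (tikhonov, F, y_delta reused; C, q and the number of grid points are illustrative choices):

```python
import numpy as np

# Reuses tikhonov, F, y_delta from the Tikhonov sketch above.
# The grid of parameters is kept to a moderate range, as is usual in practice.
C, q, N = 1.0, 0.5, 20
alphas = [C * q**n for n in range(N)]                      # alpha_n = C q^n
xs = [tikhonov(F, y_delta, a) for a in alphas]

# Choose alpha_{n*} where n* minimizes ||x_{alpha_{n+1}} - x_{alpha_n}||.
diffs = [np.linalg.norm(xs[n + 1] - xs[n]) for n in range(N - 1)]
n_star = int(np.argmin(diffs))
print("quasi-optimality choice: alpha =", alphas[n_star])
```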

SLIDE 29

Heuristic Parameter Choice rules

Example of heuristic rules: Hanke-Raus rule [Hanke, Raus ’96]. Choose α as the minimizer of

(1/√α) ‖F(xα) − yδ‖
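And a sketch of the Hanke-Raus rule over the same moderate grid of parameters (again reusing the toy setup above; the grid is an illustrative choice):

```python
import numpy as np

# Reuses tikhonov, F, y_delta and the moderate geometric grid from above.
alphas = [0.5**n for n in range(20)]

def hanke_raus_value(a):
    x_a = tikhonov(F, y_delta, a)
    return np.linalg.norm(F @ x_a - y_delta) / np.sqrt(a)

alpha_hr = min(alphas, key=hanke_raus_value)
print("Hanke-Raus choice: alpha =", alpha_hr)
```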

SLIDE 30

Heuristic Parameter Choice rules: Theory

Heuristic rules cannot converge in the worst case, but convergence holds in the restricted noise case [K., Neubauer ’08; K. ’11]:

‖xα − x†‖ → 0 as δ → 0

if yδ = F(x†) + noise with noise ∈ N. The condition noise ∈ N is an abstract noise condition.

SLIDE 31

Heuristic Parameter Choice rules: Theory

In the linear case, reasonable noise conditions can be stated, and convergence and convergence rates can be shown. Noise condition: “the data noise has to be sufficiently irregular”.

SLIDE 32

Nonlinear Case: Tikhonov Regularization

F(x) = y with F nonlinear. Tikhonov regularization for nonlinear problems
[Tikhonov, Arsenin ’77; Engl, Kunisch, Neubauer ’89; Neubauer ’89; . . . ]:

xα is a (global) minimizer of the Tikhonov functional

J(x) = ‖F(x) − y‖² + αR(x)

where R(x) is a regularization functional.
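A minimal sketch of nonlinear Tikhonov regularization (the nonlinear forward map, the quadratic penalty R(x) = ‖x‖², and the use of scipy.optimize.minimize are all illustrative assumptions; a generic solver only returns a local minimizer in general, which is the computational issue taken up on a later slide):

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative nonlinear forward operator F(x) = A x + tanh(x).
n = 50
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n)) / n
def forward(x):
    return A @ x + np.tanh(x)

x_true = np.sin(np.linspace(0.0, np.pi, n))
y_delta = forward(x_true) + 1e-3 * rng.standard_normal(n)

alpha = 1e-3
def J(x):
    # Tikhonov functional with the simple quadratic penalty R(x) = ||x||^2.
    return np.sum((forward(x) - y_delta) ** 2) + alpha * np.sum(x**2)

# A quasi-Newton solver generally only finds a local minimizer of J.
result = minimize(J, x0=np.zeros(n), method="L-BFGS-B")
x_alpha = result.x
print("relative error:", np.linalg.norm(x_alpha - x_true) / np.linalg.norm(x_true))
```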

SLIDE 33

Nonlinear Case: Tikhonov Regularization

Convergence (rates) theory:
  • Hilbert spaces [Engl, Kunisch, Neubauer ’89; Neubauer ’89]
  • Banach spaces [Kaltenbacher, Hofmann, Pöschl, Scherzer ’08]

Parameter choice rules:
  • a-priori: α = δ^ξ
  • a-posteriori: discrepancy principle

SLIDE 34

Nonlinear Case: Tikhonov Regularization

Examples:
  • Sobolev norm: R(x) = ‖x‖²_{H^s}
  • Total variation: R(x) = ∫ |∇x|
  • L¹ norm: R(x) = ∫ |x|
  • Maximum entropy: R(x) = ∫ x log(x)
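Discrete versions of these functionals for a one-dimensional signal, as a small sketch (the grid, the test signal and the choice s = 1 for the Sobolev norm are illustrative assumptions):

```python
import numpy as np

h = 1.0 / 100
t = np.linspace(h, 1.0, 100)
x = 1.0 + 0.5 * np.sin(2 * np.pi * t)      # a positive test signal

sobolev_h1 = h * np.sum(x**2) + h * np.sum((np.diff(x) / h) ** 2)  # ||x||_{H^1}^2 (s = 1)
total_variation = np.sum(np.abs(np.diff(x)))                       # discrete version of int |x'|
l1_norm = h * np.sum(np.abs(x))                                    # discrete version of int |x|
entropy = h * np.sum(x * np.log(x))                                # discrete version of int x log x

print(sobolev_h1, total_variation, l1_norm, entropy)
```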

SLIDE 35

Nonlinear Case: Tikhonov Regularization

Choice of the regularization functional:

Deterministic theory: the user can choose. It should stabilize the problem, the convergence theory should apply, and R(x) should reflect what we expect from the solution.

Bayesian viewpoint: regularization functional ∼ prior

SLIDE 36

Nonlinear Case: Tikhonov Regularization

Computational issue: the regularized solution is a global minimum of an optimization problem; xα is a (global) minimizer of

J(x) = ‖F(x) − y‖² + αR(x)

SLIDE 37

Iterative Methods

Example: nonlinear Landweber iteration [Hanke, Neubauer, Scherzer ’95]

xk+1 = xk − F′(xk)*(F(xk) − y)

Parameter choice by choosing the stopping index. The convergence rates theory needs a nonlinearity condition

‖F(x) − F(x†) − F′(x†)(x − x†)‖ ≤ C ‖F(x) − F(x†)‖

which restricts the nonlinearity of the problem. Variants of the nonlinearity condition: range invariance [Blaschke/Kaltenbacher ’96], curvature condition [Chavent, Kunisch ’98], variational inequalities [Kaltenbacher, Hofmann, Pöschl, Scherzer ’08].

Faster alternative: Gauss-Newton type iterations [Bakushinskii ’92; Blaschke, Neubauer, Scherzer ’97]. A small sketch of the Landweber step is given below.
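A sketch of the nonlinear Landweber step for the same kind of illustrative forward map as in the earlier nonlinear Tikhonov sketch (F(x) = A x + tanh(x), so F′(x) = A + diag(1/cosh²(x)); the step size and stopping index are ad-hoc choices):

```python
import numpy as np

# Illustrative nonlinear forward map and its derivative (Jacobian).
n = 50
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n)) / n
forward = lambda x: A @ x + np.tanh(x)
jacobian = lambda x: A + np.diag(1.0 / np.cosh(x) ** 2)

x_true = np.sin(np.linspace(0.0, np.pi, n))
y_delta = forward(x_true) + 1e-3 * rng.standard_normal(n)

x = np.zeros(n)
omega = 0.5                     # ad-hoc step size
for k in range(500):            # the stopping index acts as the regularization parameter
    x = x - omega * jacobian(x).T @ (forward(x) - y_delta)
print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```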

SLIDE 38

Summary

Theoretical issues, for a given inverse problem:
  • Understand the ill-posedness (uniqueness/stability): are the data rich enough to characterize the solution uniquely? How unstable is the inverse problem (degree of ill-posedness)?
  • Method of regularization + parameter choice: design efficient regularization methods for the class of problems; convergence, convergence rates (optimal order); interplay between regularization and discretization.

Practical issues:
  • How to compute the global optimum in Tikhonov regularization (efficiently)
  • Improving iterative methods (Newton-type, preconditioning)
  • Which regularization term to choose

SLIDE 39

Dynamic Inverse Problems

The forward operator and the solution x(t) depend on time:

F(x(t′ ≤ t), t) = y(t)

SLIDE 40

Dynamic Inverse Problems

Examples:
  • Volterra integral equation of the first kind: ∫₀ᵗ k(t, s) x(s) ds = y(t)
  • Parameter identification in ODEs: y′(t) = f(t, y(t), x(t))
  • Control theory: z′(t) = A z(t) + B x(t),   y(t) = C z(t) + D x(t)

SLIDE 41

Methods

Example: Tikhonov regularization

∫₀ᵀ ‖F(t, x(·, t)) − y(t)‖² dt + αR(x)

+ Convergence
− Not causal/sequential: the computation of x(t) requires all data (past and future)

SLIDE 42

Methods

Alternative: dynamic programming [K., Leitao ’06]

x′(t) = G(x(t), V(t))

+ Convergence
− Only for linear problems
− Partially causal/sequential: the computation of V(t) requires all data (past and future)

SLIDE 43

Methods

Control-theoretic methods: feedback control

x(t) = K y(t)   or   (x(t), x′(t)) = K y(t)

− Convergence in x (asymptotic convergence)?
+ Fully causal/sequential: the computation of x(t) requires only the data at time t
+ Nonlinear

SLIDE 44

Methods

Control-theoretic methods: Kalman filter

− Restrictive assumptions on the noise
+ Fully causal/sequential

SLIDE 45

Methods

Local regularization [Lamm, Scofield ’01; Lamm ’03]: xα(t) is given by an ODE related to the Volterra equation.

+ Fully causal/sequential
+ Convergence theory
+ Nonlinear
− Quite specific method for Volterra equations

SLIDE 46

Methods

Kügler’s online parameter identification [Kügler ’08]

x′(t) = G(x(t))*(F(x(t)) − y(t))

+ Fully causal/sequential
+ Asymptotic convergence theory (also for the nonlinear case)
− Are the assumptions realistic?
− Assumes x does not depend on time

SLIDE 47

The wanted method

  • Fully causal/sequential method
  • Convergence theory in the ill-posed and nonlinear case
  • No/weak assumptions on the operator
  • No/weak assumptions on the solution
  • No assumptions on the noise
  • Efficient to compute
