SLIDE 1

Inverse Problems and Regularization – An Introduction

Stefan Kindermann, Industrial Mathematics Institute, University of Linz, Austria

Introduction to Regularization

SLIDE 2

What are Inverse Problems?

One possible definition [Engl, Hanke, Neubauer ’96]: Inverse problems are concerned with determining causes for a desired or an observed effect.

Cause (parameter, unknown, solution of the inverse problem, . . .)   ⇒ Direct Problem ⇒   Effect (data, observation, . . .)
Cause   ⇐ Inverse Problem ⇐   Effect

SLIDE 3

Direct and Inverse Problems

The classification as direct or inverse is in most cases based on the well-/ill-posedness of the associated problems:

Cause   ⇒ stable ⇒   Effect   (direct problem)
Cause   ⇐ unstable ⇐   Effect   (inverse problem)

Inverse problems ∼ ill-posed (ill-conditioned) problems

SLIDE 4

What are Inverse Problems?

A central feature of inverse problems is their ill-posedness.

Well-posedness in the sense of Hadamard [Hadamard ’23]:
  • Existence of a solution (for all admissible data)
  • Uniqueness of the solution
  • Continuous dependence of the solution on the data

Well-posedness in the sense of Nashed [Nashed ’87]:
  • A problem is well-posed if the set of data/observations is a closed set (the range of the forward operator is closed).

SLIDE 5

Abstract Inverse Problem

Abstract inverse problem: solve the equation

F(x) = y

for x ∈ X (Banach/Hilbert/... space), given data y ∈ Y (Banach/Hilbert/... space), where F⁻¹ does not exist or is not continuous.

F ... forward operator. We want

“x† = F⁻¹(y)”

x† ... (generalized) solution

SLIDE 6

Abstract Inverse Problem

  • If the forward operator is linear ⇒ linear inverse problem.
  • A linear inverse problem is well-posed in the sense of Nashed if the range of F is closed.

Theorem: A linear operator with finite-dimensional range is always well-posed (in Nashed’s sense).

“Ill-posedness lives in infinite-dimensional spaces”

SLIDE 7

Abstract Inverse Problem

“Ill-posedness lives in infinite-dimensional spaces”

  • Problems with a small number of parameters usually do not need regularization.
  • Discretization acts as regularization/stabilization.
  • Ill-posedness in finite-dimensional spaces ∼ ill-conditioning.
  • Measure of ill-posedness: decay of the singular values of the forward operator.
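To make the last point concrete, here is a small illustrative sketch (not from the slides) that discretizes a simple integration operator and inspects the decay of its singular values; numpy and the chosen toy operator are assumptions for illustration.

```python
import numpy as np

# Hypothetical toy forward operator: cumulative integration on [0, 1],
# (F x)(t) = integral of x(s) ds from 0 to t, discretized as a lower-triangular matrix.
n = 200
h = 1.0 / n
F = h * np.tril(np.ones((n, n)))

# The singular values decay roughly like 1/k: the finer the discretization,
# the worse the conditioning; finite-dimensional ill-conditioning mirrors
# the ill-posedness of the underlying infinite-dimensional problem.
sigma = np.linalg.svd(F, compute_uv=False)
print("largest / smallest singular value:", sigma[0], sigma[-1])
print("condition number:", sigma[0] / sigma[-1])
```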

SLIDE 8

Methodologies in studying Inverse Problems

  • Deterministic inverse problems: regularization, worst-case convergence, infinite-dimensional, no assumptions on the noise.
  • Statistics: estimators, average-case analysis, often finite-dimensional, noise is a random variable with a specific structure.
  • Bayesian inverse problems: posterior distribution, finite-dimensional, analysis of the posterior distribution by estimators, specific assumptions on noise and prior.
  • Control theory: x = control, F(x) = state, convergence of the state not the control, infinite-dimensional, no assumptions.

SLIDE 9

Deterministic Inverse Problems and Regularization

Try to solve F(x) = y when

“x† = F⁻¹(y)”

does not exist.

Notation: x† ... the “true” (unknown) solution (minimal-norm solution).

Even if F⁻¹(y) exists, it might not be computable [Pour-El, Richards ’88].

SLIDE 10

Deterministic Inverse Problems and Regularization

Data noise: usually we do not have the exact data y = F(x†), but only noisy data

yδ = F(x†) + noise

Amount of noise: noise level δ = ‖F(x†) − yδ‖.

SLIDE 11

Deterministic Inverse Problems and Regularization

Method to solve ill-posed problems: regularization. Approximate the inverse F⁻¹ by a family of stable operators Rα:

F(x) = y,   “x† = F⁻¹(y)”   ⇒   xα = Rα(y),   Rα ∼ F⁻¹

Rα ... regularization operators,   α ... regularization parameter

SLIDE 12

Regularization

α small ⇒ Rα is a good approximation of F⁻¹, but unstable.
α large ⇒ Rα is stable, but a bad approximation of F⁻¹.
α controls the trade-off between approximation and stability:

Total error = approximation error + propagated data error

[Figure: total error ‖xα − x†‖ as a function of α; the approximation error grows with α, the propagated data error decays with α, and the total error has a minimum in between.]

How to select α: parameter choice rules

SLIDE 13

Example: Tikhonov Regularization

Tikhonov regularization [Phillips ’62; Tikhonov ’63]: Let F : X → Y be linear between Hilbert spaces. A least-squares solution of F(x) = y is given by the normal equations

F*F x = F*y

Tikhonov regularization: solve the regularized problem

F*F x + αx = F*y,   i.e.   xα = (F*F + αI)⁻¹ F*y
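As an illustration of this formula (a sketch under toy-problem assumptions, not code from the talk), the regularized solution can be computed by solving the shifted normal equations:

```python
import numpy as np

def tikhonov(F, y, alpha):
    """Tikhonov-regularized solution x_alpha = (F^T F + alpha I)^{-1} F^T y."""
    n = F.shape[1]
    return np.linalg.solve(F.T @ F + alpha * np.eye(n), F.T @ y)

# Toy problem (assumed for illustration): discretized integration operator,
# smooth true solution, additive noise; delta = ||F(x_true) - y_delta||.
n = 200
h = 1.0 / n
F = h * np.tril(np.ones((n, n)))
t = np.linspace(h, 1.0, n)
x_true = np.sin(np.pi * t)
rng = np.random.default_rng(0)
noise = 1e-3 * rng.standard_normal(n)
y_delta = F @ x_true + noise
delta = np.linalg.norm(noise)        # noise level

x_alpha = tikhonov(F, y_delta, alpha=1e-4)
print("relative error:", np.linalg.norm(x_alpha - x_true) / np.linalg.norm(x_true))
```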

SLIDE 14

Example: Tikhonov Regularization

Error estimates (under some conditions):

‖xα − x†‖² ≤ δ²/α + C α^ν

(total error ≤ stability term + approximation term)
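Continuing the toy sketch above (reusing F, x_true, y_delta and tikhonov, all of which are illustrative assumptions), the trade-off behind this estimate can be observed numerically: the error is large for very small α (noise amplification) and for very large α (over-smoothing), with a minimum in between.

```python
import numpy as np

# Reuses F, x_true, y_delta and tikhonov() from the Tikhonov sketch above.
for alpha in [1e-9, 1e-7, 1e-5, 1e-3, 1e-1]:
    x_alpha = tikhonov(F, y_delta, alpha)
    print(f"alpha = {alpha:.0e}   error = {np.linalg.norm(x_alpha - x_true):.3e}")
```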

Theory of linear and nonlinear problems in Hilbert spaces:

[Tikhonov, Arsenin ’77; Groetsch ’84; Hofmann ’86; Baumeister ’87; Louis ’89; Kunisch, Engl, Neubauer ’89; Bakushinskii, Goncharskii ’95; Engl, Hanke, Neubauer ’96; Tikhonov, Leonov, Yagola ’98; . . . ]

SLIDE 15

Example: Landweber iteration

Landweber iteration [Landweber ’51]: solve the normal equations by Richardson iteration.

Landweber iteration: xk+1 = xk − F*(F(xk) − y),   k = 0, 1, . . .

The iteration index is the regularization parameter: α = 1/k.
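A minimal self-contained sketch of the (linear) Landweber iteration on the same kind of toy problem (the operator, noise level, step size and printed iteration numbers are illustrative assumptions); tracking the error against the iteration number exhibits the semiconvergence discussed on the next slide.

```python
import numpy as np

# Toy problem (assumed): discretized integration operator with noisy data.
n = 200
h = 1.0 / n
F = h * np.tril(np.ones((n, n)))
t = np.linspace(h, 1.0, n)
x_true = np.sin(np.pi * t)
rng = np.random.default_rng(0)
y_delta = F @ x_true + 1e-2 * rng.standard_normal(n)

# Landweber iteration x_{k+1} = x_k - omega * F^T (F x_k - y_delta);
# the step size omega is chosen so that omega * ||F||^2 <= 1.
omega = 1.0 / np.linalg.norm(F, 2) ** 2
x = np.zeros(n)
for k in range(1, 5001):
    x = x - omega * F.T @ (F @ x - y_delta)
    if k in (10, 100, 1000, 5000):
        # For noisy data the error typically decreases first and eventually
        # starts to grow again as k increases (semiconvergence).
        print(k, np.linalg.norm(x - x_true))
```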

SLIDE 16

Example: Landweber iteration

Error estimates (under some conditions):

‖xk − x†‖² ≤ k δ² + C k^(−ν)

(total error ≤ stability term + approximation term) ⇒ semiconvergence.

Iterative regularization methods: parameter choice = choice of the stopping index k.

Theory: [Landweber ’51; Fridman ’56; Bialy ’59; Strand ’74; Vasilev ’83; Groetsch ’85; Natterer ’86; Hanke, Neubauer, Scherzer ’95; Bakushinskii, Goncharskii ’95; Engl, Hanke, Neubauer ’96; . . . ]

SLIDE 17

Notion of Convergence

Does the regularized solution converge to the true solution as the noise level tends to 0,

xα → x† as δ → 0 ?

(Worst-case) convergence (for a given parameter choice rule):

lim_{δ→0} sup { ‖xα − x†‖ : yδ with ‖yδ − F(x†)‖ ≤ δ } = 0

Convergence in expectation:

E‖xα − x†‖² → 0   as   E‖yδ − F(x†)‖² → 0

SLIDE 18

Theory of Regularization of Inverse Problems

Convergence depends on x†. Question of speed: convergence rates

‖xα − x†‖ ≤ f(α)   or   ‖xα − x†‖ ≤ f(δ)

SLIDE 19

Theoretical Results

[Schock ’85]: Convergence can be arbitrarily slow !

Theorem: For ill-posed problems in the sense of Nashed there cannot be a function f with lim_{δ→0} f(δ) = 0 such that for all x†

‖xα − x†‖ ≤ f(δ)

Uniform bounds on the convergence rates are impossible. Convergence rates are possible if x† lies in some smoothness class.

SLIDE 20

Theoretical Results

Convergence rates require a source condition x† ∈ M. Convergence rates ∼ modulus of continuity of the inverse:

Ω(δ, M) = sup { ‖x†₁ − x†₂‖ : ‖F(x†₁) − F(x†₂)‖ ≤ δ, x†₁, x†₂ ∈ M }

Theorem [Tikhonov, Arsenin ’77; Morozov ’92; Traub, Wozniakowski ’80]: For an arbitrary regularization map and an arbitrary parameter choice rule (with Rα(0) = 0),

‖xα − x†‖ ≥ Ω(δ, M)   (in the worst case)

SLIDE 21

Theoretical Results

Standard smoothness classes: for linear ill-posed problems in Hilbert spaces we can form

M = X^µ = { x† = (F*F)^µ ω | ω ∈ X }

(Hölder source condition = abstract smoothness condition). Then

Ω(δ, X^µ) = C δ^(2µ/(2µ+1))

is the best convergence rate for Hölder source conditions.

A regularization operator and a parameter choice rule such that

‖xα − x†‖ = C δ^(2µ/(2µ+1))

is called order optimal.

SLIDE 22

Theoretical Results

Special case: x† = F*ω. Such source conditions can be generalized to nonlinear problems, e.g.

x† = F′(x†)* ω,   x† = (F′(x†)* F′(x†))^ν ω

SLIDE 23

Theoretical Results

Many regularization methods have been shown to be order optimal. A significant amount of the theoretical results in regularization theory deals with this issue: convergence of the method together with its parameter choice rule, and optimal-order convergence under a source condition, where the source condition itself does not have to be known.

SLIDE 24

Parameter Choice Rules

How to choose the regularization parameter? Classification:

  • a-priori: α = α(δ)
  • a-posteriori: α = α(δ, yδ)
  • heuristic: α = α(yδ)

SLIDE 25

Bakushinskii veto

Bakushinskii veto [Bakushinskii ’84]: A parameter choice without knowledge of δ cannot yield a convergent regularization in the worst case (for ill-posed problems). Knowledge of δ is needed!

⇒ heuristic parameter choice rules are nonconvergent in the worst case

SLIDE 26

a-priori rules

Example of an a-priori rule: if x† ∈ X^µ, then

α ∼ δ^(2/(2µ+1))

yields the optimal order for Tikhonov regularization.

+ Easy to implement
− Needs information on the source condition
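As a tiny continuation of the Tikhonov sketch above (tikhonov, F, y_delta, delta are reused; the smoothness index µ is an assumed quantity that is usually unknown in practice, which is exactly the drawback noted above):

```python
# Continues the Tikhonov sketch above.
mu = 1.0                                    # assumed (unknown in practice) smoothness index
alpha = delta ** (2.0 / (2.0 * mu + 1.0))   # a-priori rule, up to a constant
x_apriori = tikhonov(F, y_delta, alpha)
```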

SLIDE 27

a-posteriori rules

Example of a-posteriori rules: Morozov’s discrepancy principle [Morozov ’66]. Fix τ > 1; choose the largest α such that the residual is of the order of the noise level:

‖F(xα) − yδ‖ ≤ τδ

This yields an optimal-order method in many situations.

+ Easy to implement
+ No information on source conditions needed
− In some cases not of optimal order

Other a-posteriori choice rules: Gfrerer-Raus rule (improved discrepancy principle) [Raus ’85; Gfrerer ’87], balancing principle [Lepski ’90; Mathé, Pereverzev ’03], . . .
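A sketch of the discrepancy principle on the toy problem above (tikhonov, F, y_delta and the noise level delta are reused; τ and the α-grid are illustrative choices):

```python
import numpy as np

# Reuses tikhonov, F, y_delta, delta from the Tikhonov sketch above.
tau = 1.5
alpha = 1.0                   # start with a large (stable) regularization parameter
while alpha > 1e-12:
    x_alpha = tikhonov(F, y_delta, alpha)
    if np.linalg.norm(F @ x_alpha - y_delta) <= tau * delta:
        break                 # (approximately) the largest alpha with residual <= tau * delta
    alpha /= 2.0              # otherwise decrease alpha and try again
print("discrepancy principle: alpha =", alpha)
```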

SLIDE 28

Heuristic Parameter Choice rules

Example of heuristic rules: quasi-optimality rule [Tikhonov, Glasko ’64]. Choose a sequence of geometrically decaying regularization parameters αn = C qⁿ, q < 1. For each αn compute xαn. Choose α = αn*, where n* is the minimizer of ‖xαn+1 − xαn‖.
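A sketch of the quasi-optimality rule on the same toy problem (tikhonov, F, y_delta reused; C, q and the number of grid points are illustrative choices):

```python
import numpy as np

# Reuses tikhonov, F, y_delta from the Tikhonov sketch above.
# The grid of parameters is kept to a moderate range, as is usual in practice.
C, q, N = 1.0, 0.5, 20
alphas = [C * q**n for n in range(N)]                      # alpha_n = C q^n
xs = [tikhonov(F, y_delta, a) for a in alphas]

# Choose alpha_{n*} where n* minimizes ||x_{alpha_{n+1}} - x_{alpha_n}||.
diffs = [np.linalg.norm(xs[n + 1] - xs[n]) for n in range(N - 1)]
n_star = int(np.argmin(diffs))
print("quasi-optimality choice: alpha =", alphas[n_star])
```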

SLIDE 29

Heuristic Parameter Choice rules

Example of heuristic rules: Hanke-Raus rule [Hanke, Raus ’96]. Choose α as the minimizer of

(1/√α) ‖F(xα) − yδ‖
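And a sketch of the Hanke-Raus rule over the same moderate grid of parameters (again reusing the toy setup above; the grid is an illustrative choice):

```python
import numpy as np

# Reuses tikhonov, F, y_delta and the moderate geometric grid from above.
alphas = [0.5**n for n in range(20)]

def hanke_raus_value(a):
    x_a = tikhonov(F, y_delta, a)
    return np.linalg.norm(F @ x_a - y_delta) / np.sqrt(a)

alpha_hr = min(alphas, key=hanke_raus_value)
print("Hanke-Raus choice: alpha =", alpha_hr)
```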

SLIDE 30

Heuristic Parameter Choice rules: Theory

Heuristic rules cannot converge in the worst case, but convergence holds in the restricted noise case [K., Neubauer ’08; K. ’11]:

‖xα − x†‖ → 0 as δ → 0

if yδ = F(x†) + noise with noise ∈ N. The condition noise ∈ N is an abstract noise condition.

SLIDE 31

Heuristic Parameter Choice rules: Theory

In the linear case, reasonable noise conditions can be stated, and convergence and convergence rates can be shown. Noise condition: “the data noise has to be sufficiently irregular”.

SLIDE 32

Nonlinear Case: Tikhonov Regularization

F(x) = y with F nonlinear. Tikhonov regularization for nonlinear problems
[Tikhonov, Arsenin ’77; Engl, Kunisch, Neubauer ’89; Neubauer ’89; . . . ]:

xα is a (global) minimizer of the Tikhonov functional

J(x) = ‖F(x) − y‖² + αR(x)

where R(x) is a regularization functional.
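A minimal sketch of nonlinear Tikhonov regularization (the nonlinear forward map, the quadratic penalty R(x) = ‖x‖², and the use of scipy.optimize.minimize are all illustrative assumptions; a generic solver only returns a local minimizer in general, which is the computational issue taken up on a later slide):

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative nonlinear forward operator F(x) = A x + tanh(x).
n = 50
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n)) / n
def forward(x):
    return A @ x + np.tanh(x)

x_true = np.sin(np.linspace(0.0, np.pi, n))
y_delta = forward(x_true) + 1e-3 * rng.standard_normal(n)

alpha = 1e-3
def J(x):
    # Tikhonov functional with the simple quadratic penalty R(x) = ||x||^2.
    return np.sum((forward(x) - y_delta) ** 2) + alpha * np.sum(x**2)

# A quasi-Newton solver generally only finds a local minimizer of J.
result = minimize(J, x0=np.zeros(n), method="L-BFGS-B")
x_alpha = result.x
print("relative error:", np.linalg.norm(x_alpha - x_true) / np.linalg.norm(x_true))
```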

SLIDE 33

Nonlinear Case: Tikhonov Regularization

Convergence (rates) theory:
  • Hilbert spaces [Engl, Kunisch, Neubauer ’89; Neubauer ’89]
  • Banach spaces [Kaltenbacher, Hofmann, Pöschl, Scherzer ’08]

Parameter choice rules:
  • a-priori: α = δ^ξ
  • a-posteriori: discrepancy principle

SLIDE 34

Nonlinear Case: Tikhonov Regularization

Examples:
  • Sobolev norm: R(x) = ‖x‖²_{H^s}
  • Total variation: R(x) = ∫ |∇x|
  • L¹ norm: R(x) = ∫ |x|
  • Maximum entropy: R(x) = ∫ x log(x)
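Discrete versions of these functionals for a one-dimensional signal, as a small sketch (the grid, the test signal and the choice s = 1 for the Sobolev norm are illustrative assumptions):

```python
import numpy as np

h = 1.0 / 100
t = np.linspace(h, 1.0, 100)
x = 1.0 + 0.5 * np.sin(2 * np.pi * t)      # a positive test signal

sobolev_h1 = h * np.sum(x**2) + h * np.sum((np.diff(x) / h) ** 2)  # ||x||_{H^1}^2 (s = 1)
total_variation = np.sum(np.abs(np.diff(x)))                       # discrete version of int |x'|
l1_norm = h * np.sum(np.abs(x))                                    # discrete version of int |x|
entropy = h * np.sum(x * np.log(x))                                # discrete version of int x log x

print(sobolev_h1, total_variation, l1_norm, entropy)
```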

SLIDE 35

Nonlinear Case: Tikhonov Regularization

Choice of the regularization functional:

Deterministic theory: the user can choose. It should stabilize the problem, the convergence theory should apply, and R(x) should reflect what we expect from the solution.

Bayesian viewpoint: regularization functional ∼ prior

SLIDE 36

Nonlinear Case: Tikhonov Regularization

Computational issue: the regularized solution is a global minimum of an optimization problem; xα is a (global) minimizer of

J(x) = ‖F(x) − y‖² + αR(x)

SLIDE 37

Iterative Methods

Example: nonlinear Landweber iteration [Hanke, Neubauer, Scherzer ’95]

xk+1 = xk − F′(xk)*(F(xk) − y)

Parameter choice by choosing the stopping index. The convergence rates theory needs a nonlinearity condition

‖F(x) − F(x†) − F′(x†)(x − x†)‖ ≤ C ‖F(x) − F(x†)‖

which restricts the nonlinearity of the problem. Variants of the nonlinearity condition: range invariance [Blaschke/Kaltenbacher ’96], curvature condition [Chavent, Kunisch ’98], variational inequalities [Kaltenbacher, Hofmann, Pöschl, Scherzer ’08].

Faster alternative: Gauss-Newton type iterations [Bakushinskii ’92; Blaschke, Neubauer, Scherzer ’97]. A small sketch of the Landweber step is given below.
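A sketch of the nonlinear Landweber step for the same kind of illustrative forward map as in the earlier nonlinear Tikhonov sketch (F(x) = A x + tanh(x), so F′(x) = A + diag(1/cosh²(x)); the step size and stopping index are ad-hoc choices):

```python
import numpy as np

# Illustrative nonlinear forward map and its derivative (Jacobian).
n = 50
rng = np.random.default_rng(1)
A = rng.standard_normal((n, n)) / n
forward = lambda x: A @ x + np.tanh(x)
jacobian = lambda x: A + np.diag(1.0 / np.cosh(x) ** 2)

x_true = np.sin(np.linspace(0.0, np.pi, n))
y_delta = forward(x_true) + 1e-3 * rng.standard_normal(n)

x = np.zeros(n)
omega = 0.5                     # ad-hoc step size
for k in range(500):            # the stopping index acts as the regularization parameter
    x = x - omega * jacobian(x).T @ (forward(x) - y_delta)
print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```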

SLIDE 38

Summary

Theoretical issues, for a given inverse problem:
  • Understand the ill-posedness (uniqueness/stability): are the data rich enough to characterize the solution uniquely? How unstable is the inverse problem (degree of ill-posedness)?
  • Method of regularization + parameter choice: design efficient regularization methods for the class of problems; convergence, convergence rates (optimal order); interplay between regularization and discretization.

Practical issues:
  • How to compute the global optimum in Tikhonov regularization (efficiently)
  • Improving iterative methods (Newton-type, preconditioning)
  • Which regularization term to choose

SLIDE 39

Dynamic Inverse Problems

The forward operator and the solution x(t) depend on time:

F(x(t′ ≤ t), t) = y(t)

SLIDE 40

Dynamic Inverse Problems

Examples:
  • Volterra integral equation of the first kind: ∫₀ᵗ k(t, s) x(s) ds = y(t)
  • Parameter identification in ODEs: y′(t) = f(t, y(t), x(t))
  • Control theory: z′(t) = A z(t) + B x(t),   y(t) = C z(t) + D x(t)

SLIDE 41

Methods

Example: Tikhonov regularization

∫₀ᵀ ‖F(t, x(·, t)) − y(t)‖² dt + αR(x)

+ Convergence
− Not causal/sequential: the computation of x(t) requires all data (past and future)

SLIDE 42

Methods

Alternative: dynamic programming [K., Leitao ’06]

x′(t) = G(x(t), V(t))

+ Convergence
− Only for linear problems
− Partially causal/sequential: the computation of V(t) requires all data (past and future)

SLIDE 43

Methods

Control-theoretic methods: feedback control

x(t) = K y(t)   or   (x(t), x′(t)) = K y(t)

− Convergence in x (asymptotic convergence)?
+ Fully causal/sequential: the computation of x(t) requires only the data at time t
+ Nonlinear

SLIDE 44

Methods

Control-theoretic methods: Kalman filter

− Restrictive assumptions on the noise
+ Fully causal/sequential

SLIDE 45

Methods

Local regularization [Lamm, Scofield ’01; Lamm ’03]: xα(t) is given by an ODE related to the Volterra equation.

+ Fully causal/sequential
+ Convergence theory
+ Nonlinear
− Quite specific method for Volterra equations

SLIDE 46

Methods

Kügler’s online parameter identification [Kügler ’08]

x′(t) = G(x(t))*(F(x(t)) − y(t))

+ Fully causal/sequential
+ Asymptotic convergence theory (also for the nonlinear case)
− Are the assumptions realistic?
− Assumes x does not depend on time

SLIDE 47

The wanted method

  • Fully causal/sequential method
  • Convergence theory in the ill-posed and nonlinear case
  • No/weak assumptions on the operator
  • No/weak assumptions on the solution
  • No assumptions on the noise
  • Efficient to compute
