Limits to Nonlinear Inversion
Klaus Mosegaard, Univ. of Copenhagen
PowerPoint presentation, September 2008


SLIDE 1

Limits to Nonlinear Inversion

Klaus Mosegaard

Univ. of Copenhagen

September 2008

Klaus Mosegaard (Univ. of Copenhagen) Limits to Nonlinear Inversion September 2008 1 / 37

SLIDE 2

Outline (and basic theses to be substantiated)

1. The most difficult task: to find a solution!
2. Once the solutions are found, evaluation of uncertainties is usually relatively easy.
3. If the inversion algorithm has not converged properly to the solution(s), this is the most significant source of uncertainty!
4. The futility of blind inversion: the use of general-purpose algorithms.
5. Inversion algorithms built for the specific problem perform better!

SLIDE 3

The most difficult problem: To find a solution!

SLIDE 4

The logic of Data Analysis

Figure: the model space M.

SLIDE 5

The logic of Data Analysis

Figure: the model space M containing the prior set M(p).

SLIDE 6

The logic of Data Analysis

Figure: the model space M containing the set M(d1) of models consistent with datum d1, and the prior set M(p).

SLIDE 7

The logic of Data Analysis

Figure: the model space M containing the sets M(d1) and M(d2), and the prior set M(p).

SLIDE 8

The logic of Data Analysis

Figure: the sets M(d1), M(d2), and M(p) in the model space M; their intersection is the solution set:

M(s) = M(d1) ∩ M(d2) ∩ M(p)

SLIDE 9

The Bayesian view

Define indicator functions:

Lj(m) = 1 if m ∈ M(dj), 0 otherwise
ρ(m) = 1 if m ∈ M(p), 0 otherwise
σ(m) = 1 if m is a solution, 0 otherwise

Then σ(m) = L1(m) · · · LN(m) ρ(m).

“Softening” the indicator functions to probability densities leaves us with Bayes’ Rule: σ(m) ∝ L(m|d) ρ(m).
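The softening step can be illustrated with a small sketch (all functions and numbers here are hypothetical, not from the talk): hard indicator constraints on a one-dimensional model m are replaced by Gaussian likelihoods, and the unnormalized posterior is their product with the prior.

```python
import numpy as np

def indicator(m, d, g, tol):
    # Hard constraint L_j(m): 1 if the model fits datum d within tol, else 0.
    return np.where(np.abs(g(m) - d) <= tol, 1.0, 0.0)

def softened(m, d, g, sigma):
    # "Softened" indicator: a Gaussian likelihood in the data misfit.
    return np.exp(-(g(m) - d) ** 2 / (2.0 * sigma ** 2))

m = np.linspace(-3.0, 3.0, 601)       # grid over a 1-D model space M
g = lambda x: x ** 2                   # hypothetical forward relation d = g(m)
prior = np.exp(-m ** 2 / 2.0)          # smooth prior density rho(m)

hard = indicator(m, 1.0, g, 0.05) * prior     # sigma(m): the indicator solution set
posterior = softened(m, 1.0, g, 0.1) * prior  # Bayes: L(m|d) * rho(m), unnormalized
```

The hard product is nonzero only on the exact solution set, while the softened posterior spreads probability over nearby models, which is what makes sampling feasible.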

SLIDE 10

The Deterministic view

Models consistent with one datum usually reside in a “narrow neighbourhood” of a manifold with dimension Dim(M) − 1

Figure: the set M(d1) as a narrow neighbourhood of a manifold in the model space M.

SLIDE 11

The Deterministic view

Models consistent with N independent data usually reside in a “narrow neighbourhood” of a manifold with dimension Dim(M) − N

Figure: the sets M(d1) and M(d2) in M, intersecting in the lower-dimensional set M(d1 ∩ d2).

SLIDE 12

The “Curse of Dimensionality”

The volume of the solution space decreases at least exponentially with the number of independent data

Figure: the ratio of the volume of a hypersphere of radius R to the volume of the enclosing hypercube of side 2R, plotted for dimensions 1 to 11; the ratio drops rapidly toward zero.

For reference: the hypersphere volume is π^n R^(2n) / n! in dimension 2n and 2^(n+1) π^n R^(2n+1) / (2n+1)!! in dimension 2n+1, while the hypercube volumes are (2R)^(2n) and (2R)^(2n+1), respectively.
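The claimed decay can be checked directly; a minimal Python sketch using the closed-form n-ball volume (the numbers are chosen here, not taken from the slide):

```python
import math

def sphere_volume(n, R=1.0):
    # Volume of an n-ball of radius R: pi^(n/2) R^n / Gamma(n/2 + 1),
    # which matches both the even- and odd-dimension closed forms.
    return math.pi ** (n / 2) * R ** n / math.gamma(n / 2 + 1)

def ratio(n, R=1.0):
    # Fraction of the hypercube [-R, R]^n occupied by the inscribed ball.
    return sphere_volume(n, R) / (2 * R) ** n

ratios = [ratio(n) for n in range(1, 12)]
```

In dimension 10 the ball fills less than 0.3 % of the cube, so a solution region shaped like a ball becomes exponentially hard to hit by uniform sampling.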

SLIDE 13

Preliminary observations

Let
Dim(M): dimension of the model parameter space
Dim(D): dimension of the data space
Dim(P): number of independent a priori constraints

Observation 1
Given the path to a point in the solution space, the search time along the path is only weakly dependent on Dim(M), Dim(D) and Dim(P).

Observation 2
Given no information about the solution space, the random search time increases at least exponentially with

Dim(M) + Dim(D) + Dim(P) (1)

when Dim(M) ≥ Dim(D).
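Observation 2 can be illustrated with a toy experiment (entirely hypothetical numbers): blind random search in the unit cube [0, 1]^D for a target box of side 0.5 needs on the order of 2^D draws, so the search time grows exponentially with the dimension.

```python
import random

def draws_until_hit(D, side=0.5, rng=random.Random(0)):
    # Draw uniform points in [0, 1]^D until one lands inside the target
    # box [0, side]^D; return the number of draws needed.
    draws = 0
    while True:
        draws += 1
        if all(rng.random() < side for _ in range(D)):
            return draws

# Average blind-search time over 200 independent searches per dimension.
avg = {D: sum(draws_until_hit(D) for _ in range(200)) / 200 for D in (2, 6, 10)}
```

The expected number of draws is side^(−D) = 2^D, i.e. roughly 4, 64 and 1024 for the three dimensions above.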

SLIDE 14

Once the solutions are found, evaluation of uncertainties is usually relatively easy!

SLIDE 15

Search and sampling

SLIDE 19

If the inversion algorithm has not converged properly to the solution(s), this is the most significant source of uncertainty!

SLIDE 20

Incomplete convergence

SLIDE 21

The futility of blind inversion

SLIDE 22

The question

Which one of the following general-purpose algorithms is the most efficient? Simulated Annealing, the Metropolis Algorithm, Random Search, Rejection Sampling, Genetic Algorithms, Taboo Search, the Neighbourhood Algorithm, . . .

SLIDE 23

A different viewpoint:

Double-discrete Analysis of Inverse Problems

SLIDE 24

Double-discrete data analysis

Here, we shall assume that model parameters are doubly discrete:

1. There is a finite number of model parameters (this is the usual assumption in parameter estimation).
2. Model parameters can take only a finite number of parameter values!

Figure: Original image, image with few pixels, and image with few color levels

SLIDE 25

How fine a discretization is needed for an inverse problem?

The misfit function f(m) usually inherits continuity from d = g(m), e.g.,

f(m) = ‖d − g(m)‖² / (2σ²)

Now we can define a grid of points representing small regions ∆m1 ∆m2 . . . of almost constant f(m).
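A minimal numerical sketch of this discretization test (hypothetical one-datum forward model, not from the talk): evaluate f on a fine grid, group it into coarse cells of width ∆m, and check that f is almost constant inside each cell.

```python
import numpy as np

def misfit(m, d, g, sigma):
    # f(m) = ||d - g(m)||^2 / (2 sigma^2), here for a single datum.
    return (d - g(m)) ** 2 / (2.0 * sigma ** 2)

g = lambda x: np.sin(3.0 * x)                         # hypothetical smooth forward relation
m_fine = np.linspace(0.0, 1.0, 1000, endpoint=False)  # fine reference grid
f = misfit(m_fine, 0.5, g, 0.1)

# 100 coarse cells of width dm = 0.01, each covering 10 fine points; the
# peak-to-peak spread inside a cell measures how constant f is there.
cells = f.reshape(100, 10)
max_variation = np.ptp(cells, axis=1).max()
```

If max_variation is small compared with the noise-induced misfit fluctuations, the coarse grid already resolves the misfit landscape.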

SLIDE 26

How fine a discretization of parameters values is needed?

Figure: The Victoria Crater in 256 colors, 16 colors, and 4 colors.

SLIDE 27

Example: Seismic reflection data

∆mi < 2σ²ε/w²,

where σ is the standard deviation of the noise, ε is the desired fractional change in misfit over ∆mi, and w is the seismic wavelet.
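A sketch of how the bound might be applied (the reading of w² as the wavelet energy ‖w‖², the Ricker wavelet, and every number below are assumptions, not from the slide):

```python
import numpy as np

def max_cell_size(sigma, eps, wavelet):
    # Hedged reading of the slide's bound: Delta m_i < 2 sigma^2 eps / ||w||^2,
    # with ||w||^2 taken as the discrete energy of the wavelet samples.
    return 2.0 * sigma ** 2 * eps / np.sum(np.asarray(wavelet) ** 2)

t = np.linspace(-0.05, 0.05, 101)          # 100 ms window, 1 ms sampling
fc = 30.0                                  # hypothetical Ricker centre frequency (Hz)
arg = (np.pi * fc * t) ** 2
ricker = (1.0 - 2.0 * arg) * np.exp(-arg)  # standard Ricker wavelet

dm = max_cell_size(sigma=0.1, eps=0.1, wavelet=ricker)
```

A stronger wavelet (larger ‖w‖²) or smaller noise level tightens the bound, i.e. forces a finer parameter discretization.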

SLIDE 28

The discrete counterpart to “The Curse of Dimensionality”

The inverse problem:

d1 = g1(m1, m2, . . . , mM)
d2 = g2(m1, m2, . . . , mM)
. . .
dK = gK(m1, m2, . . . , mM)

Here, we can freely choose one out of N values for each of the M − K model parameters. This can be done in N^(M−K) ways. After this we have K equations with K unknowns left, and they may have a solution in one, several, or all of the above N^(M−K) cases.

Proposition
The curse of combinatorics. K data reduce the solution space by a factor ≤ N^(−K).
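The proposition can be checked by brute force on a toy double-discrete problem (the forward relations and all numbers are invented for illustration):

```python
from itertools import product

# M = 3 model parameters, each taking one of N = 4 values; K = 2 exact data.
N, M = 4, 3

def g1(m): return (m[0] + m[1]) % N     # hypothetical forward relations
def g2(m): return (m[1] * m[2]) % N

d1, d2 = 1, 2
models = list(product(range(N), repeat=M))    # all N**M candidate models
solutions = [m for m in models if g1(m) == d1 and g2(m) == d2]

# Curse of combinatorics: the surviving fraction of model space is <= N**(-K).
fraction = len(solutions) / len(models)
```

In this toy case the surviving fraction is exactly N^(−2): each of the two data cuts the candidate set by a factor of N.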

SLIDE 29

A Double-discrete Analysis of the Performance of Inversion Algorithms

SLIDE 30

The typical scenario for nonlinear inversion

In the relations di = gi(m), we have no closed-form mathematical expression for gi(m). We only have a programme that is able to evaluate gi(m) for given values of the parameters in m. In short:

We are performing a blind search for the solution.

SLIDE 31

Notation 1

Two finite sets X and Y. The set FX of all fit functions/probability distributions f : X → Y.

SLIDE 32

Notation 2

A sample of size m < |X|: C = {(x1, y1), . . . , (xm, ym)}. The set FX|C of all fit functions/probability distributions defined on X, but with the fixed values given by C.

SLIDE 33

Proposition

The total number of functions intersecting the m samples is

|FX|C| = |Y|^(|X|−m). (2)

This number is independent of the location of the sample points. The probability that an algorithm a sees the values y1, . . . , ym in the first m steps is then

P(y1, . . . , ym | f, m, a) = |Y|^(|X|−m) / |Y|^|X| = |Y|^(−m) (3)

This number is independent of the algorithm.

SLIDE 34

The No-Free-Lunch Theorem adapted to inversion

Theorem

NFL (Wolpert and Macready, 1995). For f ∈ FX and any pair of algorithms a1 and a2,

P(y1, . . . , ym | f, m, a1) = P(y1, . . . , ym | f, m, a2) (4)

where P(· | ·) denotes conditional probability.

Corollary

(NFL for optimization) When all fit functions are equally probable (blind inversion), the distribution of any performance measure Φ(y1, . . . , ym) for inversion is exactly the same for all inversion algorithms.

A simple performance measure for inversion could be Φ(y1, . . . , ym) = max{y1, . . . , ym}, which must be large for good performance.
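For a tiny problem the corollary can be verified exhaustively (the two visiting orders below are arbitrary choices, not algorithms from the talk): with |X| = 4 points and |Y| = 2 fit values, enumerate all |Y|^|X| = 16 fit functions and compare the distribution of Φ = max{y1, y2} for two fixed search orders.

```python
from itertools import product
from collections import Counter

def performance_histogram(order, m=2):
    # Distribution of Phi = max of the first m observed values, taken
    # over all 2**4 fit functions f: {0, 1, 2, 3} -> {0, 1}.
    hist = Counter()
    for f in product([0, 1], repeat=4):
        hist[max(f[x] for x in order[:m])] += 1
    return hist

h1 = performance_histogram([0, 1, 2, 3])   # "algorithm" a1: left-to-right scan
h2 = performance_histogram([3, 1, 0, 2])   # "algorithm" a2: some other fixed order
```

The two histograms agree exactly, whichever orders are chosen, which is precisely the NFL statement in this miniature setting.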

SLIDE 35

Critique of the NFL theorem

Postulate

Our fit functions (misfit functions or probability densities) belong to a narrow family of functions (e.g., smooth functions), and some algorithms work better than others on such families! So the situation is different from the NFL scenario: we have a narrow set of functions (albeit unknown to the algorithm). We can, however, extend the NFL Theorem to the following

Theorem

The average performance over all fit function families is exactly the same for all inversion algorithms.

SLIDE 36

Conclusion

The efficiency of all blind inversion schemes (Simulated Annealing, the Metropolis Algorithm, Genetic Algorithms, Taboo Search, the Neighbourhood Algorithm, . . . ), when averaged over all conceivable inverse problems, is exactly the same.

SLIDE 37

A final corollary

Corollary

Only an algorithm adapted to the specific problem has a chance of performing better than a random search. In fact, the following theorem can be demonstrated:

Theorem

A step length of 2n + 1, where n is the correlation distance of the fit function, optimally reduces the set of possible solutions.
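A rough numerical illustration of why such a step length helps (the moving-average fit function and all numbers are hypothetical, and this only probes decorrelation, not optimality): a fit function with correlation distance n, sampled with step 2n + 1, yields essentially independent values.

```python
import random

n = 5
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(10_000 + 2 * n)]
# A fit function with correlation distance n: moving average over 2n+1 points.
f = [sum(noise[i:i + 2 * n + 1]) / (2 * n + 1) for i in range(10_000)]

def corr(series, lag):
    # Sample autocorrelation of the series at the given lag.
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    cov = sum((a - mean) * (b - mean)
              for a, b in zip(series, series[lag:])) / (len(series) - lag)
    return cov / var

near, far = corr(f, 1), corr(f, 2 * n + 1)   # strongly correlated vs near-independent
```

Adjacent fit values share almost all their underlying noise samples, while values a full step of 2n + 1 apart share none, so each such step delivers genuinely new information about the fit function.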
