SIO 230 Introduction to Geophysical Inverse Theory - Cathy Constable (PowerPoint presentation)


SLIDE 1

SIO 230 Introduction to Geophysical Inverse Theory

Cathy Constable, cconstable@ucsd.edu
IGPP Munk Lab 329, x43183
http://igppweb.ucsd.edu/~cathy/Classes/SIO230

SLIDE 2

Background Reading

  • Sections 1-4 (pages 1-19) of Supplementary Notes
  • Geophysical Inverse Theory, Chapter 1 and Chapters 2.01-2.06

SLIDE 3

Key Concepts

  • Forward Problem
  • Inverse Problem (distinct from parameter estimation)
  • Non-uniqueness
  • Ill-posed problems
  • Simplification
  • Regularization
  • Uncertainty
  • Averaging Scale and Resolution
  • Additional Constraints
  • Probabilistic frameworks

SIO 230: Lecture 2

SLIDE 4

A Magnetic Example

  • Forward problem - predict magnetic anomaly data B from seamount magnetization vector M
  • Inverse problem - find magnetization M from measurements B
  • Magnetic annihilator m has no signal in the data
  • Enter Drastic Simplification - M(s) is uniform - cannot fit the data
  • Regularization: M(s) = U + R(s). What do we know about U? Not much in linear problems like this.
  • Limit ambiguity with additional constraints - e.g. size or direction of M; introduce a norm ||M||

SIO 230: Lecture 2

SLIDE 5

Bathymetry Model

SLIDE 6

Ship Track and Magnetic Anomaly

SLIDE 7

Forward Problem

What forward problem is of interest to you?

SLIDE 8

SLIDE 9

Our Magnetic Example

  • Forward problem - predict magnetic anomaly data B from seamount magnetization vector M
  • Inverse problem - find magnetization M from measurements B
  • Magnetic annihilator m has no signal in the data
  • Enter Drastic Simplification - M(s) is uniform - cannot fit the data
  • Regularization: M(s) = U + R(s). What do we know about U? Not much in linear problems like this.
  • Limit ambiguity with additional constraints - e.g. size or direction of M; introduce a norm ||M||

SIO 230: Lecture 2

SLIDE 10

Vector Spaces and Linear Algebra

  • Definition of a vector space - 9 axioms
  • Zero element
  • Vectors, Matrices, and Functions
  • Linear combinations

SIO 230: Lecture 2

SLIDE 11

Essential Linear Algebra

  • Special Matrices - diagonal, banded, upper triangular, sparse
  • Linear vector space examples, linear transformations

  • Inner and outer products
  • Inverse of a matrix
  • Solving linear equations

SIO 230: Lecture 2

SLIDE 12

Important Square Matrices

  • Symmetric, Skew-symmetric
  • Positive definite, Positive semi-definite
  • Orthogonal
  • Normal
  • Projection
  • Diagonally Dominant

SIO 230: Lecture 2

SLIDE 13

More linear algebra

  • Orthogonal matrices as generalization of rotation and reflection to N-dim space - volume unchanged, and inner product of any 2 vectors is preserved
  • Householder matrix, A = QR factorization
  • Linear independence, basis, dimension
  • Range space, column space and rank of A
  • Full rank if rank(A) = min(m, n)
  • Null space of A: set of x s.t. Ax = 0

SIO 230: Lecture 2

SLIDE 14

Least Squares Problems

  • See pages 20-31 of notes, pages 12-53 of GIT, Chapter 2 of Aster et al.

  • Norms
  • Overdetermined LS problems
  • Normal equations
  • Projection Theorem
  • Condition Number

SIO 230: Lecture 3

SLIDE 15

Projection Theorem for Hilbert Spaces

  • Let Ax = Y.
  • Y is the orthogonal projection of y into the subspace a
  • Py = Y
  • Given a subspace like a, every vector can be written as a unique sum of 2 parts
  • One part is in a, the second is orthogonal to a

SIO 230: Lecture 3
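As a concrete sketch of the projection theorem (the 5x2 matrix A here is a made-up example whose column space stands in for the subspace), the orthogonal projector is P = A(AᵀA)⁻¹Aᵀ:

```python
import numpy as np

# Hypothetical subspace: the column space of a 5x2 matrix A.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])

# Orthogonal projector onto the column space of A.
P = A @ np.linalg.inv(A.T @ A) @ A.T

y = np.array([0.0, 1.0, 0.5, 2.0, 1.0])
Y = P @ y          # the projection: Py = Y
r = y - Y          # the second part, orthogonal to the subspace

# Defining properties: P is idempotent, and the residual is
# orthogonal to every vector in the subspace (every column of A).
assert np.allclose(P @ P, P)
assert np.allclose(A.T @ r, 0.0)
```

The decomposition y = Y + r is exactly the unique two-part sum the theorem asserts.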

SLIDE 16

Constrained Minimization

  • See pages 26-34 of notes, pages 12-53 of GIT, Chapter 3 of Aster et al.

  • Lagrange multipliers
  • Over-determined least squares, m > n - smallest residual norm
  • Under-determined least squares, m < n - smallest model norm

SIO 230: Lecture 4
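A minimal numerical illustration of the two cases (the random systems here are arbitrary stand-ins): the overdetermined solve minimizes the residual norm, while the pseudoinverse solution of the underdetermined system fits the data exactly with the smallest model norm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overdetermined (m > n): least squares picks the x with the
# smallest residual norm ||Ax - b||.
A_over = rng.standard_normal((6, 2))
b_over = rng.standard_normal(6)
x_over, *_ = np.linalg.lstsq(A_over, b_over, rcond=None)

# Underdetermined (m < n): infinitely many exact solutions; the
# pseudoinverse picks the one with the smallest model norm ||x||.
A_under = rng.standard_normal((2, 6))
b_under = rng.standard_normal(2)
x_min = np.linalg.pinv(A_under) @ b_under

assert np.allclose(A_under @ x_min, b_under)  # fits the data exactly
# Any other exact solution (add a null-space vector) is longer:
null_vec = np.linalg.svd(A_under)[2][-1]      # direction with A @ v ~ 0
assert np.linalg.norm(x_min) <= np.linalg.norm(x_min + 0.1 * null_vec)
```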

SLIDE 17

SIO 230: Lecture 4

SLIDE 18

SIO 230: Lecture 4

SLIDE 19

SIO 230: Lecture 5

SLIDE 20

Review of the Near Bottom Magnetic Linear Inverse Problem Notes p. 35-39

SIO 230: Lecture 6

SLIDE 21

The Deconvolution Problem Notes p. 40-42

Noise at high and low wavenumber will be amplified

SIO 230: Lecture 6

SLIDE 22

Representer for the Magnetic problem

SIO 230: Lecture 7

SLIDE 23

SIO 230: Lecture 7

SLIDE 24

Discretizing the Model

SIO 230: Lecture 7

SLIDE 25

Revisiting the Near Bottom Magnetic Linear Inverse Problem

SIO 230: Lecture 8

SLIDE 26

Representers and Minimum 2-norm Model

Must enforce some misfit level to accommodate noise

SIO 230: Lecture 8

SLIDE 27

Accommodating Misfit

For noisy data (read: all data), we need a measure of how well a given model fits. Sum of squares is the venerable way:

\chi^2 = \sum_{i=1}^{M} \frac{1}{\sigma_i^2} \left[ d_i - f(x_i, m) \right]^2

\chi^2 = ||Wd - W\hat{d}||^2, \qquad W = \mathrm{diag}(1/\sigma_1, 1/\sigma_2, \ldots, 1/\sigma_M)

where W is a diagonal matrix of reciprocal data errors. We can remove the dependence on data number and use RMS:

\mathrm{RMS} = \sqrt{\chi^2 / M}

SIO 230: Lecture 8
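The two forms of chi-squared above are the same quantity; a quick sketch with made-up data and errors checks the equivalence and forms the RMS:

```python
import numpy as np

# Hypothetical data d_i with one-sigma errors sigma_i and predictions dhat_i.
d = np.array([1.2, 0.9, 1.5, 2.1])
sigma = np.array([0.1, 0.2, 0.1, 0.3])
dhat = np.array([1.1, 1.0, 1.4, 2.0])

# Elementwise form: chi2 = sum_i (d_i - dhat_i)^2 / sigma_i^2
chi2_sum = np.sum((d - dhat) ** 2 / sigma ** 2)

# Matrix form: chi2 = ||W d - W dhat||^2 with W = diag(1/sigma_i)
W = np.diag(1.0 / sigma)
chi2_mat = np.linalg.norm(W @ d - W @ dhat) ** 2

rms = np.sqrt(chi2_sum / d.size)   # RMS = sqrt(chi2 / M)

assert np.isclose(chi2_sum, chi2_mat)
```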

SLIDE 28

Global EM Response

SIO 230: Lecture 8

SLIDE 29

Travel Time Picks

SIO 230: Lecture 8

SLIDE 30

Properties

SLIDE 31

Estimating the Noise: Choose Model with Specific Misfit

SIO 230: Lecture 8

SLIDE 32

Trading off Model Norm Against Misfit

SIO 230: Lecture 9

Newton’s Method to find the correct Lagrange multiplier

Hazards of the L-curve

SLIDE 33

Resolution and Inference (Chapter 4, GIT)

  • No way to ascribe pointwise uncertainty in linear problems
  • Specify averaging scale - basis for resolution in a solution; can also be used for nonlinear problems
  • Add an additional constraint - e.g. upper bound, sign of derivative
  • Use a Bayesian or statistical approach

SLIDE 34

Resolution?

SIO 230: Lecture 10

SLIDE 35

Review of the Near Bottom Magnetic Linear Inverse Problem Notes p. 35-39

SIO 230: Lecture 6

SLIDE 36

Specify Average Magnetization

SLIDE 37

Back to the Annihilator

SLIDE 38

Ideal Bodies & Maximum Depth Rules

Background Reading: GIT Chapter 4

  • Gravity/magnetics problem is fundamentally non-unique, but can bound important properties like density contrast / depth to anomaly
  • Small sample theory is given in Section 4.05 of GIT
  • Use the uniform norm to find the least upper bound on positive density anomaly
  • No longer Hilbert space because of different norm - Banach space

SLIDE 39

Gravity example

SLIDE 40

Linear Programming Solution

SLIDE 41

Linear & Quadratic Programming

  • Note that the proposed solution of the gravity problem did not address the existence question
  • Section 4.06 of GIT
  • Need Fundamental Theorem of LP and feasible solutions

SLIDE 42

Adding Uncertainty to LP

  • Uniform norm is easy - just add slack variables, but…
  • 1-norm: see GIT p. 263-264
  • 2-norm needs quadratic programming; see p. 75 of notes on NNLS and BVLS
SLIDE 43

A linear programming example: Estimating geomagnetic reversal rates, R(t)

[Figure: cumulative number of reversals as a function of time, and reversal rate function constructed from adaptive Gaussian kernels; symbols are 50-point sliding window estimates]

Data are reversal occurrence times, t_j, j = 1, ..., N

SLIDE 44

What if the Cretaceous Normal Superchron is special?

SLIDE 45

Is reversal rate monotonic over some interval?

Constable (2000), Physics Earth Planet. Inter., 118, 181-193

Confidence interval for the cdf based on sup norm

Bounding monotonic rate functions using LP

SLIDE 46

BVLS

SLIDE 47

NNLS and BVLS
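NNLS and BVLS are bounded least squares problems: NNLS requires every parameter to be non-negative, BVLS allows general lower/upper bounds. A minimal sketch using SciPy (the 10x3 system is an arbitrary stand-in):

```python
import numpy as np
from scipy.optimize import nnls, lsq_linear

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

# NNLS: minimize ||Ax - b|| subject to x >= 0
x_nn, rnorm = nnls(A, b)
assert np.all(x_nn >= 0)

# BVLS: same idea with general lower/upper bounds on each parameter
res = lsq_linear(A, b, bounds=(-1.0, 1.0), method="bvls")
assert np.all(res.x >= -1.0) and np.all(res.x <= 1.0)
```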

SLIDE 48

Mapping the Electric Field

SLIDE 49

Measuring E-fields with an AUV

SLIDE 50

Location and Topography

SLIDE 51

Electric Field

SLIDE 52

Mapping the Electric Field

SLIDE 53

SLIDE 54

SLIDE 55

Electric Field

SLIDE 56

SLIDE 57

Map of Self Potential

SLIDE 58

A Simple Electric Dipole Model

SLIDE 59

Nonlinear Equations

  • Newton’s Method for g(x) = 0 finds the crossing point of the tangent line
  • In n dimensions the derivative becomes the Jacobian matrix, J, and there are n surfaces each with a tangent plane
  • Modified Newton approximates J at each iteration with the same derivative matrix
  • Quasi-Newton updates J with step information from the change in x. BFGS will maintain symmetry in J.
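The first bullet can be sketched in a few lines: each Newton step moves to where the tangent line at the current point crosses zero (g(x) = x^2 - 2 is a made-up example, so the root is sqrt(2)).

```python
import numpy as np

def newton(g, dg, x0, tol=1e-12, max_iter=50):
    """Newton's method for g(x) = 0: each step jumps to the zero
    crossing of the tangent line at the current point."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / dg(x)   # tangent line crosses zero at x - step
        x = x - step
        if abs(step) < tol:
            break
    return x

# Example: root of g(x) = x^2 - 2, i.e. sqrt(2).
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
assert abs(root - np.sqrt(2.0)) < 1e-10
```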

SLIDE 60

To invert non-linear forward problems we often linearize around a starting model:

\hat{d} = f(m_1) = f(m_0 + \Delta m) \approx f(m_0) + J \Delta m

J_{ij} = \frac{\partial f(x_i, m_0)}{\partial m_j}, \qquad \Delta m = m_1 - m_0 = (\delta m_1, \delta m_2, \ldots, \delta m_N)

using a matrix of derivatives and a model perturbation. Our expression for \chi^2,

\chi^2 = ||Wd - W\hat{d}||^2,

is then

\chi^2 \approx ||Wd - Wf(m_0) - WJ\Delta m||^2

SLIDE 61

For a least squares solution we solve in the usual way by differentiating and setting to zero to get a linear system:

\alpha \Delta m = \beta

where

\beta = (WJ)^T W(d - f(m_0)), \qquad \alpha = (WJ)^T WJ

So, given a starting model m_0 we can find an update \Delta m = \alpha^{-1}\beta and iterate until we converge. (This is Gauss-Newton.)

This can only work for small N (it isn’t even defined for N > M). Even for very sparse parameterizations, it rarely works without modification.
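The Gauss-Newton iteration above can be sketched on a made-up one-parameter problem, f(x, m) = exp(-m x) with noise-free synthetic data, so the iteration should recover the generating value:

```python
import numpy as np

# Toy nonlinear forward problem: f(x, m) = exp(-m x), one parameter m.
x = np.linspace(0.0, 2.0, 20)
m_true = 1.5
d = np.exp(-m_true * x)            # noise-free synthetic data
W = np.eye(x.size)                 # unit weights for simplicity

m = np.array([0.5])                # starting model m0
for _ in range(50):
    f = np.exp(-m[0] * x)
    J = (-x * np.exp(-m[0] * x)).reshape(-1, 1)   # df/dm at m0
    WJ = W @ J
    beta = WJ.T @ (W @ (d - f))    # beta  = (WJ)^T W (d - f(m0))
    alpha = WJ.T @ WJ              # alpha = (WJ)^T W J
    dm = np.linalg.solve(alpha, beta)
    m = m + dm                     # iterate: m1 = m0 + dm
    if np.linalg.norm(dm) < 1e-12:
        break

assert abs(m[0] - m_true) < 1e-6
```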

SLIDE 62

You need to start “linearly close” to the solution for this to work. Long, thin valleys in \chi^2 space are common and present problems for Gauss-Newton methods.

[Figure: \chi^2 surface with minimum \chi^2_{min}; start in one place and you will diverge, start in another and you will get to the solution rapidly]

SLIDE 63

Another approach is “steepest descent”, in which you go down the gradient of the contours of \chi^2:

\Delta m = \mu (WJ)^T [W(d - f(m_0))]

These solutions are of this form, but the steps are always orthogonal and the step size is proportional to the slope, so this method stalls near the solution.

SLIDE 64

Steepest Descent example from notes

SLIDE 65

The Marquardt method combines the steepest descent method and the Newton method in one algorithm by modifying the curvature matrix:

\alpha'_{jj} = \alpha_{jj}(1 + \lambda), \qquad \alpha'_{jk} = \alpha_{jk} \text{ for } j \neq k

When \lambda is large, this reduces to steepest descent. When \lambda is small, it reduces to the Newton method. Starting with a large \lambda and then reducing it close to the solution works very well. For problems that are naturally discretely parameterized, Marquardt is hard to beat. For sparse parameterizations of infinite dimensional models, the parameterization (e.g. number of layers chosen) has a big influence on the outcome.
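A minimal Marquardt loop on the same made-up exponential problem, damping the diagonal of the curvature matrix and adapting lambda by accept/reject (the 10x schedule is a common but arbitrary choice):

```python
import numpy as np

# Toy one-parameter problem again: f(x, m) = exp(-m x).
x = np.linspace(0.0, 2.0, 20)
d = np.exp(-1.5 * x)

def residual_and_jacobian(m):
    f = np.exp(-m * x)
    J = (-x * np.exp(-m * x)).reshape(-1, 1)
    return d - f, J

m, lam = 0.1, 10.0                    # poor start; large initial lambda
for _ in range(100):
    r, J = residual_and_jacobian(m)
    alpha = J.T @ J
    beta = J.T @ r
    # Marquardt modification: scale the diagonal terms by (1 + lambda)
    alpha_damped = alpha + lam * np.diag(np.diag(alpha))
    dm = np.linalg.solve(alpha_damped, beta)[0]
    r_new, _ = residual_and_jacobian(m + dm)
    if np.sum(r_new ** 2) < np.sum(r ** 2):
        m, lam = m + dm, lam / 10.0   # accept; trust the Newton step more
    else:
        lam *= 10.0                   # reject; damp harder (more steepest-descent)

assert abs(m - 1.5) < 1e-6
```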

SLIDE 66

Global versus local minima: For nonlinear problems, there are no guarantees that Gauss-Newton will converge, and no guarantees that if it does converge the solution is a global one. The solution might well depend on the starting model.

[Figure: misfit surface with a local minimum and a global minimum (maybe)]

SLIDE 67

If you increase N too much, even with the Marquardt approach the solution goes unstable. If N is big the solutions become unstable, oscillatory, and generally useless (they are probably trying to converge to D+ type solutions). Almost all inversion today incorporates some type of regularization, which minimizes some aspect of the model as well as the fit to data:

U = ||Rm_1||^2 + \mu^{-1} ||Wd - Wf(m_1)||^2

where Rm is some measure of the model and \mu is a trade-off parameter or Lagrange multiplier.

SLIDE 68

In 1D a typical R might be:

U = ||Rm_1||^2 + \mu^{-1} ||Wd - Wf(m_1)||^2

R_1 = \begin{pmatrix}
-1 & 1 & & & \\
 & -1 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & -1 & 1
\end{pmatrix}

which extracts a measure of slope. This stabilizes the inversion, creates a unique solution, and manufactures models with useful properties. This is easily extended to 2D and 3D modelling.

[Figure: model cells m_1 … m_8 with a -1/+1 differencing stencil applied to each adjacent pair]
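The first-difference regularizer is easy to build and use directly; a sketch with an arbitrary underdetermined 5-data, 8-cell system (the sizes and mu value are made up):

```python
import numpy as np

def first_difference(n):
    """(n-1) x n first-difference matrix R1 with rows (..., -1, 1, ...)."""
    R = np.zeros((n - 1, n))
    for i in range(n - 1):
        R[i, i], R[i, i + 1] = -1.0, 1.0
    return R

R = first_difference(8)
m = np.arange(8, dtype=float)      # a linear model: constant slope of 1
assert np.allclose(R @ m, 1.0)     # R extracts the slope between cells

# Regularized normal equations: (mu R^T R + G^T G) m = G^T d.
# The roughness term makes the otherwise underdetermined system solvable.
rng = np.random.default_rng(2)
G = rng.standard_normal((5, 8))    # 5 data, 8 model cells
d = rng.standard_normal(5)
mu = 0.1
m_reg = np.linalg.solve(mu * R.T @ R + G.T @ G, G.T @ d)
assert m_reg.shape == (8,)
```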

SLIDE 69

U = ||Rm_1||^2 + \mu^{-1} ||Wd - Wf(m_1)||^2

When \mu is small, model roughness is ignored and we try to fit the data. When \mu is large, we smooth the model at the expense of data fit. One approach is to choose \mu and minimize U by least squares. There are various sets of machinery to do this (Newton, quasi-Newton, conjugate gradients, etc.). With many of these methods \mu must be chosen by trial and error, increasing the computational burden and introducing some subjectivity.

Picking \mu a priori is simply choosing how rough your model is compared to the data misfit. But we’ve no idea how rough our model should be. However, we ought to have a decent idea of how well our data can be fit.

The Occam approach is to introduce some acceptable fit to the data (\chi^2) and minimize:

U = ||Rm_1||^2 + \mu^{-1} \left[ ||Wd - Wf(m_1)||^2 - \chi^2 \right]

SLIDE 70

U = ||Rm_1||^2 + \mu^{-1} \left[ ||Wd - Wf(m_1)||^2 - \chi^2 \right]

Linearizing:

U = ||Rm_1||^2 + \mu^{-1} \left[ ||Wd - W(f(m_0) + J(m_1 - m_0))||^2 - \chi^2 \right]

After differentiation and setting to zero we get

m_1 = \left[ \mu R^T R + (WJ)^T WJ \right]^{-1} (WJ)^T W (d - f(m_0) + J m_0)

The only thing we need is to find the right value for \mu. (Note we are solving for the next model m_1 directly instead of \Delta m. Bob Parker calls these “leaping” and “creeping” algorithms.)
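The “leaping” character of the update is easy to check numerically: for a linear forward problem f(m) = Gm (so J = G; the 10x6 system, W, R, and mu here are arbitrary stand-ins), the m_1 formula gives the same answer from any starting model m_0.

```python
import numpy as np

rng = np.random.default_rng(3)
G = rng.standard_normal((10, 6))   # linear forward: f(m) = G m, so J = G
d = rng.standard_normal(10)
W = np.eye(10)
R = np.eye(6)                      # simple damping regularizer
mu = 0.5

def occam_update(m0):
    """m1 = [mu R^T R + (WJ)^T WJ]^-1 (WJ)^T W (d - f(m0) + J m0)"""
    WJ = W @ G
    rhs = WJ.T @ W @ (d - G @ m0 + G @ m0)
    return np.linalg.solve(mu * R.T @ R + WJ.T @ WJ, rhs)

# "Leaping": for a linear problem the -f(m0) and +J m0 terms cancel,
# so the update lands on the same m1 regardless of the starting model.
m1_a = occam_update(np.zeros(6))
m1_b = occam_update(rng.standard_normal(6))
assert np.allclose(m1_a, m1_b)
```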

SLIDE 71

Occam finds \mu by carrying out a line search to find the ideal value. Before the target \chi^2 is reached, we minimize \chi^2. After the target \chi^2 is reached, we choose the \mu which gives us exactly that \chi^2.

SLIDE 72

SLIDE 73

Non-linear Gravity Problem

SLIDE 74

How to choose \chi^2?

For zero-mean, Gaussian, independent errors, the sum-square misfit

\chi^2 = ||Wd - W\hat{d}||^2

is chi-squared distributed with M degrees of freedom. The expectation value is just M, which corresponds to an RMS of one, and so this could be a reasonable target misfit. Or, one could look up the 95% (or other) confidence interval for chi-squared with M degrees of freedom.

[Figure: models at RMS = 1 and RMS = 1.36]
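Both facts are easy to verify by simulation: chi-squared with M degrees of freedom is a sum of M squared standard normals, so its mean is M (RMS of one), and its 95% quantile gives a looser target (M = 50 here is an arbitrary choice).

```python
import numpy as np

M = 50                             # number of data (made-up)
rng = np.random.default_rng(4)

# Each standardized residual (d_i - dhat_i)/sigma_i ~ N(0, 1), so
# chi2 is a sum of M squared standard normals. Simulate its distribution:
chi2_samples = np.sum(rng.standard_normal((100_000, M)) ** 2, axis=1)

assert abs(chi2_samples.mean() - M) < 1.0                  # expectation is M
assert abs(np.sqrt(chi2_samples.mean() / M) - 1.0) < 0.05  # RMS near one

q95 = np.quantile(chi2_samples, 0.95)  # a looser 95%-level target misfit
assert q95 > M
```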

SLIDE 75

So, if our errors are well estimated and well behaved, this provides a statistical guideline for choosing \chi^2.

Errors come from

  • statistical processing errors
  • systematic errors such as instrument calibrations, and
  • “geological or geophysical noise” (our inability to parameterize fine details of geology or extraneous physical processes).

Instrument noise should be captured by processing errors, but some error models assume stationarity (i.e. noise statistics don’t vary with time). In practice, we only have a good handle on processing errors - everything else is lumped into a noise floor.

SLIDE 76

Even with well-estimated errors, the choice of misfit can still be somewhat subjective. Joint 2D inversion of marine CSEM (3 frequencies, no phase) and MT (Gemini salt prospect, Gulf of Mexico):

[Figure: inversion results for target misfits 2.4, 2.0, 1.5, 1.3, 1.2, and 1.1]

SLIDE 77

Beware of trade-off (“L”) curves:

[Figure A: roughness measure versus RMS misfit, points labelled 2.4 down to 1.1; the knee of the curve suggests RMS 1.4]

SLIDE 78

Beware of trade-off (“L”) curves: they are not as objective as proponents say...

[Figures A and B: the same roughness-versus-RMS-misfit data plotted with different axis scales; the apparent knee moves from RMS 1.4 to RMS 1.9]

SLIDE 79

Regularization

Solution on the trade-off curve

Choice of Misfit Again

Example of core surface field trade-off curve

SLIDE 80

Bayesian Methods

  • Read Chapter 11 of Aster et al. - many figures are drawn from there
  • Read Appendix B if you need a refresher on Statistics and Maximum Likelihood, especially the sections on Conditional Probability and Bayes Theorem. Or consult the notes from SIO223A (Chapters 4 and 5)
  • Bayesian Methods treat the model as a random variable
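In the simplest Bayesian setting, a linear forward problem with Gaussian prior and Gaussian noise, the posterior for the model is Gaussian and can be written in closed form. A sketch (the G, covariances, and noise level are made-up stand-ins, and this is the standard conjugate-Gaussian result rather than anything specific to the course notes):

```python
import numpy as np

# Linear Gaussian model: d = G m + noise, prior m ~ N(0, Cm), noise ~ N(0, Cd).
rng = np.random.default_rng(5)
G = rng.standard_normal((8, 3))
Cm = np.eye(3)                        # prior model covariance
Cd = 0.1 * np.eye(8)                  # data (noise) covariance
m_true = rng.standard_normal(3)
d = G @ m_true + 0.1 * rng.standard_normal(8)

# Posterior for the model treated as a random variable:
#   C_post = (G^T Cd^-1 G + Cm^-1)^-1,   m_post = C_post G^T Cd^-1 d
Cd_inv, Cm_inv = np.linalg.inv(Cd), np.linalg.inv(Cm)
C_post = np.linalg.inv(G.T @ Cd_inv @ G + Cm_inv)
m_post = C_post @ G.T @ Cd_inv @ d

assert C_post.shape == (3, 3)
assert np.all(np.linalg.eigvalsh(C_post) > 0)   # a proper covariance
```

Unlike the regularized estimates earlier, C_post carries the model uncertainty directly.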

SLIDE 81

Travel Time Picks

SIO 230: Lecture 8

SLIDE 82
  • For a Bayesian approach to the travel time problem read Malinverno & Parker (2006, doi:10.1190/1.2194516), Two ways to quantify uncertainty in geophysical inverse problems.

SLIDE 83

Shaw diffraction intensity problem

Aster et al., 2013

SLIDE 84 - SLIDE 89

[Figures from Aster et al., 2013]
SLIDE 90

Monte Carlo Methods in Inversion

For details see Sambridge & Mosegaard, 2002, Rev Geophysics, doi:10.1029/2000RG000089

  • Direct search method
  • Uses pseudorandom sampling
  • Includes random sampling from highly nonuniform multidimensional distributions

SLIDE 91

Sambridge & Mosegaard, 2002

SLIDE 92

SLIDE 93

Simulated Annealing

  • General-purpose global optimization method - based on the Metropolis sampling algorithm
  • Numerous geophysical applications in optimization problems since the mid 1980s
  • Annealing schedule controls the characteristics of the sampling
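The ingredients named above, Metropolis acceptance plus a cooling schedule, fit in a short sketch (the 1-D multimodal objective, step size, and geometric schedule are all made-up choices):

```python
import numpy as np

def simulated_annealing(energy, x0, steps=20_000, t0=1.0, seed=0):
    """Metropolis sampling with a slowly decreasing temperature."""
    rng = np.random.default_rng(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for k in range(steps):
        t = t0 * 0.999 ** k                # geometric annealing schedule
        x_new = x + 0.5 * rng.standard_normal()
        e_new = energy(x_new)
        # Metropolis rule: always accept downhill, sometimes accept uphill
        if e_new < e or rng.random() < np.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x

# Made-up multimodal 1-D objective with its global minimum near x = 2.
f = lambda x: (x - 2.0) ** 2 + 1.5 * np.sin(5.0 * x)
x_hat = simulated_annealing(f, x0=-5.0)
assert abs(x_hat - 2.0) < 1.0
```

Early on (high temperature) the chain explores across local minima; late (low temperature) it settles into the deepest basin it has found.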
SLIDE 94

Genetic Algorithms

  • Not originally designed as function optimizers
  • Broad class of methods
  • First used by geophysicists as global optimizers in the 1990s; several papers on seismic waveform fitting
  • Search involves control parameters, and tuning for each problem may be non-trivial

SLIDE 95

Other global optimization

  • Evolutionary programming
  • Tabu search
  • Neighborhood algorithm direct search
  • Parallel Tempering
SLIDE 96

Ensemble Inference

  • How to assess tradeoffs, constraints, and resolution in multimodal nonlinear problems
  • Bayesian approach - Tarantola and Valette (1982)
  • Bayesian inference named for Bayes (1763)
SLIDE 97

Linearization or Monte Carlo?

  • Depends on (i) complexity of the data-model relationship, (ii) number of unknowns, (iii) computational resources
  • Some advantages in stability from not relying on model perturbations or matrix inversions
  • Appraisal of the solution is usually more reliable with MC - don’t use linearized derivatives for model covariance and resolution estimates

SLIDE 98

MC issues

  • MC is only applicable to discretized problems
  • A large number of unknowns renders direct search impractical
  • Which MC methods should be used?
SLIDE 99

Generating Random Samples

  • Pseudo-random deviates vs quasi-random sequences

Sambridge & Mosegaard, 2002

SLIDE 100

Trading off exploration and exploitation

Sambridge & Mosegaard, 2002

Smoothly varying, near-quadratic objective functions can converge rapidly with a Newton-type descent method. Highly nonlinear problems with multiple minima in the objective function are better approached with a combination of exploration and exploitation.

SLIDE 101

MC Sampling Methods

  • Uniform Sampling - no concentration of sampling; the “curse of dimensionality” leads to undersampling parameter space
  • Simulated annealing
  • MCMC: Metropolis-Hastings and Gibbs - 2 algorithms for generating samples of multidimensional probability distributions
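A random-walk Metropolis-Hastings sampler is only a few lines; here is a minimal 1-D sketch targeting a standard normal (the target, step size, and burn-in length are arbitrary illustrative choices):

```python
import numpy as np

def metropolis(log_p, x0, n_samples=50_000, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a 1-D target density p(x)."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        x_prop = x + step * rng.standard_normal()
        # Accept with probability min(1, p(x_prop)/p(x)), done in log space
        if np.log(rng.random()) < log_p(x_prop) - log_p(x):
            x = x_prop
        samples[i] = x               # on rejection, the old x is repeated
    return samples

# Target: standard normal (log density up to an additive constant).
s = metropolis(lambda x: -0.5 * x * x, x0=3.0)

# After burn-in, the samples should reproduce the target's moments.
assert abs(s[1000:].mean()) < 0.1
assert abs(s[1000:].std() - 1.0) < 0.1
```

In an inversion the target would be the posterior over model parameters, with log_p built from the data misfit and the prior.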

SLIDE 102

Trans-dimensional Inverse Problems

Sambridge et al, 2006, GJI, doi: 10.1111/j.1365-246X.2006.03155.x

SLIDE 103

SLIDE 104

SLIDE 105

Finally, remember that geophysical inversion for model construction depends on much more than the data alone:

[Diagram: data, data errors and misfit choice, regularization, priors/constraints from geology, and model parameterization all feed into the geophysical inversion algorithm, which produces the model]