SIO 230 Introduction to Geophysical Inverse Theory
Cathy Constable cconstable@ucsd.edu IGPP Munk Lab 329 x43183 http://igppweb.ucsd.edu/~cathy/Classes/SIO230
Background Reading: Sections 1-4 (pages 1-19) of Supplementary Notes
SIO 230: Lecture 2
Forward problem: the magnetic field B from a seamount magnetization vector M.
With real measurements of B we cannot fit the data exactly.
What can we say about U? Not much in linear problems like this.
To bound the size or direction of M, introduce a norm ||M||.
Chapter 2 of Aster et al.
SIO 230: Lecture 3
Projection of y into the subspace spanned by a: every vector can be written as a unique sum of two parts, the first lying in the subspace and the second orthogonal to a.
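As a concrete illustration, here is a minimal numpy sketch of that decomposition (the names y and a follow the slide; everything else is illustrative):

```python
import numpy as np

def project(y, a):
    """Split y into a component along a and a component orthogonal to a."""
    y, a = np.asarray(y, dtype=float), np.asarray(a, dtype=float)
    y_par = (np.dot(y, a) / np.dot(a, a)) * a   # projection of y onto a
    y_perp = y - y_par                          # remainder, orthogonal to a
    return y_par, y_perp

y_par, y_perp = project([3.0, 4.0], [1.0, 0.0])
assert np.isclose(np.dot(y_perp, [1.0, 0.0]), 0.0)   # orthogonality check
```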
Chapter 3 of Aster et al.
SIO 230: Lecture 4
SIO 230: Lecture 5
SIO 230: Lecture 6
Noise at high and low wavenumber will be amplified
SIO 230: Lecture 7
SIO 230: Lecture 8
Must enforce some misfit level to accommodate noise
For noisy data (read: all data), we need a measure of how well a given model fits. Sum of squares is the venerable way:
$$\chi^2 = \sum_{i=1}^{M} \left( \frac{d_i - \hat d_i}{\sigma_i} \right)^2 = \|Wd - W\hat d\|^2, \qquad W = \mathrm{diag}(1/\sigma_1, 1/\sigma_2, \ldots, 1/\sigma_M),$$

where W is a diagonal matrix of reciprocal data errors. We can remove the dependence on the number of data and use RMS:

$$\mathrm{RMS} = \sqrt{\chi^2 / M}.$$
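These two formulas translate directly into numpy; the array names below are illustrative:

```python
import numpy as np

def misfit(d, d_hat, sigma):
    """Return (chi2, RMS) for data d, predictions d_hat, and one-sigma errors sigma."""
    W = np.diag(1.0 / np.asarray(sigma))          # W = diag(1/sigma_1, ..., 1/sigma_M)
    r = W @ (np.asarray(d) - np.asarray(d_hat))   # weighted residuals
    chi2 = float(r @ r)                           # ||Wd - W d_hat||^2
    return chi2, np.sqrt(chi2 / len(d))           # RMS = sqrt(chi2 / M)
```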
Estimating the Noise
Choose Model with Specific Misfit
SIO 230: Lecture 9
Newton's Method to find the correct Lagrange Multiplier
Hazards of the L-curve
SIO 230: Lecture 10
SIO 230: Lecture 6
Background Reading, GIT Chapter 4
[Figure: cumulative number of reversals as a function of time, with the reversal rate function constructed from adaptive Gaussian kernels; symbols are 50-point sliding-window estimates.]
Constable (2000), Physics Earth Planet. Inter., 118, 181-193
Confidence interval for the cdf based on the sup norm
Bounding monotonic rate functions using LP (linear programming)
To invert non-linear forward problems we often linearize around a starting model, using a matrix of derivatives and a model perturbation:

$$\hat d = f(m_1) = f(m_0 + \Delta m) \approx f(m_0) + J\Delta m, \qquad J_{ij} = \frac{\partial f(x_i, m_0)}{\partial m_j}, \qquad \Delta m = m_1 - m_0 = (\delta m_1, \delta m_2, \ldots, \delta m_N).$$

Our expression for $\chi^2 = \|Wd - W\hat d\|^2$ then becomes

$$\chi^2 \approx \|Wd - Wf(m_0) - WJ\Delta m\|^2.$$
For a least squares solution we solve in the usual way, differentiating and setting to zero to get a linear system:

$$\alpha \Delta m = \beta, \qquad \beta = (WJ)^T W(d - f(m_0)), \qquad \alpha = (WJ)^T WJ, \qquad \Delta m = \alpha^{-1}\beta.$$

So, given a starting model $m_0$ we can find an update $\Delta m$ and iterate until we converge. (This is Gauss-Newton.)
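A minimal sketch of a Gauss-Newton loop in numpy, using a finite-difference Jacobian since no analytic J is given here; the function names, step size, and iteration count are illustrative, not a prescribed implementation:

```python
import numpy as np

def jacobian_fd(f, m, eps=1e-6):
    """Finite-difference Jacobian J_ij = d f_i / d m_j about model m."""
    f0 = f(m)
    J = np.empty((f0.size, m.size))
    for j in range(m.size):
        dm = np.zeros_like(m)
        dm[j] = eps
        J[:, j] = (f(m + dm) - f0) / eps
    return J

def gauss_newton(f, d, sigma, m0, n_iter=20):
    """Iterate m -> m + alpha^{-1} beta, beta = (WJ)^T W (d - f(m)), alpha = (WJ)^T WJ."""
    W = np.diag(1.0 / np.asarray(sigma))
    m = np.asarray(m0, dtype=float)
    for _ in range(n_iter):
        WJ = W @ jacobian_fd(f, m)
        beta = WJ.T @ (W @ (d - f(m)))
        alpha = WJ.T @ WJ
        m = m + np.linalg.solve(alpha, beta)
    return m
```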
This can only work for small N (it isn’t even defined for N > M ). Even for very sparse parameterizations, it rarely works without modification.
You need to start "linearly close" to the solution for this to work. Long, thin valleys in χ² space are common and present problems for Gauss-Newton methods.
[Figure: χ² contours with the χ² minimum marked; from one starting point you diverge, from another you get to the solution rapidly.]
Another approach is "steepest descent", in which you go down the gradient of the χ² contours. These solutions are of the form

$$\Delta m = \mu\, (WJ)^T W(d - f(m_0)),$$

but successive steps are orthogonal to one another and the step size is proportional to the slope, so this method stalls near the solution.
The Marquardt method combines the steepest descent method and Newton method in one algorithm by modifying the curvature matrix:

$$\alpha'_{jk} = \alpha_{jk}(1 + \lambda) \text{ for } j = k, \qquad \alpha'_{jk} = \alpha_{jk} \text{ for } j \neq k.$$

When λ is large, this reduces to steepest descent; when λ is small, it reduces to the Newton method. Starting with large λ and then reducing it close to the solution works very well. For problems that are naturally discretely parameterized, Marquardt is hard to beat. For sparse parameterizations of infinite-dimensional models, the parameterization (e.g. the number of layers chosen) has a big influence on the outcome.
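A sketch of a Marquardt loop under the damping rule above. The ×10/÷10 schedule for λ is a common illustrative choice, not prescribed here, and `jac` can be any Jacobian routine (e.g. the finite-difference one in the Gauss-Newton sketch); production codes such as scipy.optimize.least_squares implement refined versions of this idea:

```python
import numpy as np

def marquardt(f, jac, d, sigma, m0, lam=1e3, n_iter=50):
    """Gauss-Newton with Marquardt damping: alpha_jj -> alpha_jj (1 + lambda).
    Large lambda behaves like steepest descent; small lambda like Newton."""
    W = np.diag(1.0 / np.asarray(sigma))

    def chi2(m):
        r = W @ (d - f(m))
        return float(r @ r)

    m = np.asarray(m0, dtype=float)
    for _ in range(n_iter):
        WJ = W @ jac(m)
        beta = WJ.T @ (W @ (d - f(m)))
        alpha = WJ.T @ WJ
        alpha_damped = alpha + lam * np.diag(np.diag(alpha))   # (1 + lambda) on the diagonal
        trial = m + np.linalg.solve(alpha_damped, beta)
        if chi2(trial) < chi2(m):
            m, lam = trial, lam / 10.0   # success: accept step, reduce damping (more Newton-like)
        else:
            lam *= 10.0                  # failure: increase damping (more gradient-like)
    return m
```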
Global versus local minima: for nonlinear problems there are no guarantees that Gauss-Newton will converge, and no guarantees that if it does converge the solution is a global one. The solution might well depend on the starting model. [Figure: objective function with a local minimum and a global minimum (maybe).]
If you increase N too much, even with the Marquardt approach the solutions become unstable, oscillatory, and generally useless (they are probably trying to converge to D+ type solutions). Almost all inversion today incorporates some type of regularization, which minimizes some aspect of the model as well as the fit to data:

$$U = \|Rm_1\|^2 + \mu^{-1}\left[\|Wd - Wf(m_1)\|^2\right]$$

where $Rm$ is some measure of the model and $\mu$ is a trade-off parameter or Lagrange multiplier.
In 1D a typical $R$ might be the first-difference operator

$$R_1 = \begin{pmatrix} -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1 \end{pmatrix}$$

which extracts a measure of slope. This stabilizes the inversion, creates a unique solution, and manufactures models with useful properties. It is easily extended to 2D and 3D modelling.
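For instance, $R_1$ can be built in one line with numpy (illustrative):

```python
import numpy as np

def first_difference(N):
    """(N-1) x N first-difference operator R1: rows of (-1, +1)."""
    return np.eye(N - 1, N, k=1) - np.eye(N - 1, N)

R1 = first_difference(8)
m = np.arange(8.0)    # a model with constant slope
print(R1 @ m)         # -> [1. 1. 1. 1. 1. 1. 1.]  (constant first differences)
```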
When µ is small, model roughness is ignored and we try to fit the data. When µ is large, we smooth the model at the expense of data fit. One approach is to choose µ and minimize U by least squares. There are various sets of machinery to do this (Newton, quasi-Newton, conjugate gradients, etc.). With many of these methods µ must be chosen by trial and error, increasing the computational burden and introducing some subjectivity.
Picking µ a priori is simply choosing how rough your model is compared to the data misfit. But we have no idea how rough our model should be. However, we ought to have a decent idea of how well our data can be fit.
The Occam approach is to introduce some acceptable fit to the data, $\chi^2_*$, and minimize:

$$U = \|Rm_1\|^2 + \mu^{-1}\left[\|Wd - Wf(m_1)\|^2 - \chi^2_*\right].$$
Linearizing as before,

$$U = \|Rm_1\|^2 + \mu^{-1}\left[\|Wd - W(f(m_0) + J(m_1 - m_0))\|^2 - \chi^2_*\right],$$

the only thing we need is to find the right value for µ. (Note we are solving for the next model $m_1$ directly instead of for $\Delta m$; Bob Parker calls these "leaping" and "creeping" algorithms.) After differentiating and setting to zero we get

$$m_1 = \left[\mu R^T R + (WJ)^T WJ\right]^{-1} (WJ)^T W\left(d - f(m_0) + Jm_0\right).$$
Occam finds µ by carrying out a line search for the ideal value. Before $\chi^2_*$ is reached, we minimize $\chi^2$; after $\chi^2_*$ is reached, we choose the µ which gives us exactly $\chi^2_*$.
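A sketch of one Occam iteration under the update formula above. Doing the line search in log µ with scipy, the bracketing interval, and the assumption that the misfit grows monotonically with µ beyond its minimum are all illustrative choices, not the prescribed algorithm:

```python
import numpy as np
from scipy.optimize import minimize_scalar, brentq

def occam_update(mu, W, J, R, d_hat):
    """m1 = [mu R^T R + (WJ)^T WJ]^{-1} (WJ)^T W d_hat, d_hat = d - f(m0) + J m0."""
    WJ = W @ J
    return np.linalg.solve(mu * (R.T @ R) + WJ.T @ WJ, WJ.T @ (W @ d_hat))

def occam_iteration(f, d, W, J, R, m0, chi2_target):
    d_hat = d - f(m0) + J @ m0

    def chi2_of(log_mu):                 # true (nonlinear) misfit of the trial model
        m1 = occam_update(np.exp(log_mu), W, J, R, d_hat)
        r = W @ (d - f(m1))
        return float(r @ r)

    # Phase 1: chi2* not yet attainable -> take the mu that minimises chi2
    best = minimize_scalar(chi2_of, bounds=(-20.0, 20.0), method="bounded")
    if best.fun > chi2_target:
        log_mu = best.x
    else:
        # Phase 2: chi2* attainable -> largest (smoothest) mu with chi2 = chi2*,
        # assuming misfit increases with mu past the minimum
        log_mu = brentq(lambda x: chi2_of(x) - chi2_target, best.x, 20.0)
    return occam_update(np.exp(log_mu), W, J, R, d_hat)
```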
For zero-mean, Gaussian, independent errors, the sum-square misfit

$$\chi^2 = \|Wd - W\hat d\|^2$$

is chi-squared distributed with M degrees of freedom. The expectation value is just M, which corresponds to an RMS of one, and so this could be a reasonable target misfit. Or, one could look up the 95% (or other) confidence interval for chi-squared with M degrees of freedom.
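With scipy this becomes a one-liner; M = 100 below is illustrative:

```python
from scipy.stats import chi2

M = 100                                  # number of data (illustrative)
print(chi2.mean(M))                      # expectation = M, i.e. RMS = 1
print(chi2.ppf(0.95, M))                 # 95% point of chi-squared with M dof
print((chi2.ppf(0.95, M) / M) ** 0.5)    # RMS corresponding to that 95% point
```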
[Figure: solutions fit to RMS = 1 and RMS = 1.36.]
How to choose $\chi^2_*$?
So, if our errors are well estimated and well behaved, this provides a statistical guideline for choosing $\chi^2_*$.
Errors come from instrument noise, data processing, and shortcomings of the forward theory (details of geology or extraneous physical processes). Instrument noise should be captured by processing errors, but some error models assume stationarity (i.e. noise statistics don't vary with time). In practice, we only have a good handle on processing errors; everything else is lumped into a noise floor.
Even with well-estimated errors, the choice of misfit can still be somewhat subjective. Joint 2D inversion of marine CSEM (3 frequencies, no phase) and MT (Gemini salt prospect, Gulf of Mexico):

[Figure: inversion results for target misfits 2.4, 2.0, 1.5, 1.3, 1.2, and 1.1.]
Beware of trade-off ("L") curves: they are not as objective as proponents say...

[Figure: trade-off curves of roughness measure versus RMS misfit for two cases, A and B, with preferred solutions at RMS 1.4 and RMS 1.9 respectively.]
Regularization
Solution on the trade-off curve
Choice of Misfit Again
Example of core surface field trade-off curve
Aster et al., 2013
For details see Sambridge & Mosegaard, 2002, Rev Geophysics, doi:10.1029/2000RG000089
Sampling from highly nonuniform multidimensional distributions
Sambridge & Mosegaard, 2002
Smoothly varying, near-quadratic objective functions can be minimized rapidly with a Newton-type descent method. Highly nonlinear problems with multiple minima in the objective function are better approached with a combination of exploration and exploitation.
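A toy sketch of that combination: random restarts provide the exploration and a Newton-type local solver the exploitation; the multimodal test function and the number of restarts are purely illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def multimodal(m):
    """Toy objective with many local minima (Rastrigin-like; illustrative only)."""
    return float(np.sum(m**2) + 10.0 * np.sum(1.0 - np.cos(2.0 * np.pi * m)))

rng = np.random.default_rng(0)
starts = rng.uniform(-5.0, 5.0, size=(50, 2))     # exploration: scattered start models
fits = [minimize(multimodal, m0, method="BFGS") for m0 in starts]   # exploitation: descent
best = min(fits, key=lambda r: r.fun)
print(best.x, best.fun)   # with enough restarts, near the global minimum at the origin
```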
Sambridge et al., 2006, GJI, doi:10.1111/j.1365-246X.2006.03155.x
Finally, remember that geophysical inversion for model construction depends on much more than the data alone:

[Diagram: data, data errors and misfit choice, regularization, priors/constraints (geology), and model parameterization all feed into the geophysical inversion algorithm, which produces the model.]