

SLIDE 1

Institut de Physique du Globe de Paris & Université Pierre et Marie Curie (Paris VI)

Course on Inverse Problems

Albert Tarantola

Lesson XVII: Least-Squares Involving Functions

SLIDE 2

⇒ mathematica notebook

SLIDE 3

II: Functional Formulation

The a priori information is represented by

mprior = {mprior(z)} ; Cprior = {Cprior(z, z′)} .

The observable parameters are of the kind

oi = ∫ dz Oi(z) m(z) ,

where, in our example, O1(z) is a delta function and O2(z) is a box-car function. This corresponds to a linear relation between the model function m and the observations o :

o = O m .

We have some observations

oobs = {oi^obs} ; Cobs = {Cij^obs} .

SLIDE 4

The solution is provided by the standard least-squares equation, which we just need to interpret. The first equation is

mpost = mprior − Cprior O^t ( O Cprior O^t + Cobs )^{-1} ( O mprior − oobs ) ,

i.e.,

mpost = mprior − P Q r

with

P = Cprior O^t ; S = O Cprior O^t + Cobs ; Q = S^{-1} ; r = O mprior − oobs .

SLIDE 5

This leads to

mpost(z) = mprior(z) − ∑i ∑j Pi(z) Qij rj

with

Pi(z) = ∫ dz′ Cprior(z, z′) Oi(z′)
Sij = ∫ dz ∫ dz′ Oi(z) Cprior(z, z′) Oj(z′) + Cij^obs
Q = S^{-1}
ri = ∫ dz Oi(z) mprior(z) − oi^obs
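The discrete interpretation of these formulas can be sketched numerically. Everything below (the depth grid, the exponential prior covariance, the two observation kernels, the data values and noise level) is invented for illustration; it is not the course's Mathematica notebook, just a numpy transcription of mpost = mprior − P Q r on a grid.

```python
import numpy as np

# Hypothetical depth grid on [0, 1].
nz = 200
z = np.linspace(0.0, 1.0, nz)
dz = z[1] - z[0]

# Prior: zero mean, exponential covariance (illustrative choices).
sigma, ell = 1.0, 0.2
m_prior = np.zeros(nz)
C_prior = sigma**2 * np.exp(-np.abs(z[:, None] - z[None, :]) / ell)

# Two observation kernels, as in the text: O1 a (discretized) delta
# function at z = 0.3, O2 a box-car average over 0.5 <= z <= 0.8.
O = np.zeros((2, nz))
O[0, np.argmin(np.abs(z - 0.3))] = 1.0 / dz      # delta: o1 = m(0.3)
box = (z >= 0.5) & (z <= 0.8)
O[1, box] = 1.0 / (box.sum() * dz)               # box-car average

# o_i = ∫ dz O_i(z) m(z) becomes o = G @ m on the grid, with G = O * dz.
G = O * dz
o_obs = np.array([0.5, -0.2])                    # invented data
C_obs = 0.01 * np.eye(2)                         # invented noise covariance

# mpost = mprior − P Q r with P = Cprior O^t, S = O Cprior O^t + Cobs,
# Q = S^{-1}, r = O mprior − oobs.
P = C_prior @ G.T
S = G @ C_prior @ G.T + C_obs
Q = np.linalg.inv(S)
r = G @ m_prior - o_obs
m_post = m_prior - P @ (Q @ r)
```

With a small Cobs, the posterior model nearly reproduces both observations while staying smooth away from them, as the kernel formulas of this slide predict.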
SLIDE 6

The second equation is

Cpost = Cprior − Cprior O^t ( O Cprior O^t + Cobs )^{-1} O Cprior ,

i.e.,

Cpost = Cprior − P Q P^t

with

P = Cprior O^t ; S = O Cprior O^t + Cobs ; Q = S^{-1} .

SLIDE 7

This leads to

Cpost(z, z′) = Cprior(z, z′) − ∑i ∑j Pi(z) Qij Pj(z′)

with

Pi(z) = ∫ dz′ Cprior(z, z′) Oi(z′)
Sij = ∫ dz ∫ dz′ Oi(z) Cprior(z, z′) Oj(z′) + Cij^obs
Q = S^{-1}
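The posterior covariance formula can be sketched on the same kind of grid. The setup below (grid, prior, kernels, noise) is again invented; the point is only that Cpost = Cprior − P Q P^t reduces the variance everywhere, and most strongly where the data constrain the model.

```python
import numpy as np

nz = 200
z = np.linspace(0.0, 1.0, nz)
dz = z[1] - z[0]
sigma, ell = 1.0, 0.2
C_prior = sigma**2 * np.exp(-np.abs(z[:, None] - z[None, :]) / ell)

# Same two (hypothetical) observation functionals: delta at z = 0.3,
# box-car average over [0.5, 0.8].
O = np.zeros((2, nz))
O[0, np.argmin(np.abs(z - 0.3))] = 1.0 / dz
box = (z >= 0.5) & (z <= 0.8)
O[1, box] = 1.0 / (box.sum() * dz)
G = O * dz
C_obs = 0.01 * np.eye(2)

# Cpost = Cprior − P Q P^t, with P = Cprior O^t and
# Q = (O Cprior O^t + Cobs)^{-1}.
P = C_prior @ G.T
Q = np.linalg.inv(G @ C_prior @ G.T + C_obs)
C_post = C_prior - P @ Q @ P.T
```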

⇒ mathematica notebook

SLIDE 8

Let me add a clarification. We have examined two of the many possible algorithms providing the mean posterior model npost. The first one was the steepest-descent algorithm

nk+1 = nk − µ ( Cprior L^t Cobs^{-1} (L nk − tobs) + (nk − nprior) )

that could advantageously be replaced by a preconditioned steepest-descent algorithm,

nk+1 = nk − P ( Cprior L^t Cobs^{-1} (L nk − tobs) + (nk − nprior) )

where P is a suitably chosen, but arbitrary, positive definite operator. A good choice of P will accelerate convergence (but not change the final result). Let us guess what P could be.

SLIDE 9

The first iteration, when choosing n0 = nprior, gives

n1 = nprior − P Cprior L^t Cobs^{-1} (L nprior − tobs) .

We also saw the Newton algorithm

npost = nprior − Cprior L^t ( L Cprior L^t + Cobs )^{-1} (L nprior − tobs)

that, because the forward relation is here linear, converges in just one step. Equivalent to the last equation is (see lesson XI)

npost = nprior − ( L^t Cobs^{-1} L + Cprior^{-1} )^{-1} L^t Cobs^{-1} (L nprior − tobs) ,

which can also be written as

npost = nprior − Π Cprior L^t Cobs^{-1} (L nprior − tobs) ,

with

Π = ( Cprior ( L^t Cobs^{-1} L + Cprior^{-1} ) )^{-1} .
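The equivalence of the two expressions for npost (the "data-space" Newton form and the "model-space" form from lesson XI) is a matrix identity that can be checked numerically. The dimensions and the random SPD covariances below are invented for the check.

```python
import numpy as np

rng = np.random.default_rng(4)
n_model, n_data = 8, 5
L = rng.normal(size=(n_data, n_model))   # an arbitrary linear forward operator

# Random symmetric positive definite covariances (illustrative).
A = rng.normal(size=(n_model, n_model))
C_prior = A @ A.T + n_model * np.eye(n_model)
B = rng.normal(size=(n_data, n_data))
C_obs = B @ B.T + n_data * np.eye(n_data)

Cp_inv = np.linalg.inv(C_prior)
Co_inv = np.linalg.inv(C_obs)

# Data-space form: Cprior L^t (L Cprior L^t + Cobs)^{-1}
K1 = C_prior @ L.T @ np.linalg.inv(L @ C_prior @ L.T + C_obs)
# Model-space form: (L^t Cobs^{-1} L + Cprior^{-1})^{-1} L^t Cobs^{-1}
K2 = np.linalg.inv(L.T @ Co_inv @ L + Cp_inv) @ L.T @ Co_inv
```

Both gain operators multiply the same residual (L nprior − tobs), so their equality is exactly the equivalence stated on this slide.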

SLIDE 10

The conclusion is that the closer the (arbitrary) preconditioning operator P is to

Π = ( Cprior ( L^t Cobs^{-1} L + Cprior^{-1} ) )^{-1} ,

the faster the preconditioned steepest-descent algorithm will converge (and in only one step if P = Π, but this is usually impossible). Simplest example: let Q = Cprior ( L^t Cobs^{-1} L + Cprior^{-1} ), so we must use P ≈ Q^{-1}. Letting Q(x, x′) be the kernel of Q, one may choose to approximate Q^{-1} by the inverse of its diagonal elements,

P(x, x′) = ( 1 / Q(x, x) ) δ(x − x′) .

Evaluating Q(x, x) grossly corresponds to “counting how many rays pass through point x”.
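The one-step claim for P = Π can be verified directly: starting from n0 = nprior, a single preconditioned step with P = Π lands exactly on the least-squares solution. The operator, covariances, and data below are random, invented quantities used only for the check.

```python
import numpy as np

rng = np.random.default_rng(5)
n_model, n_data = 10, 6
L = rng.normal(size=(n_data, n_model))
A = rng.normal(size=(n_model, n_model))
C_prior = A @ A.T + n_model * np.eye(n_model)
B = rng.normal(size=(n_data, n_data))
C_obs = B @ B.T + n_data * np.eye(n_data)
n_prior = rng.normal(size=n_model)
t_obs = rng.normal(size=n_data)

Co_inv = np.linalg.inv(C_obs)
Cp_inv = np.linalg.inv(C_prior)

# Exact least-squares solution (Newton form).
n_post = n_prior - C_prior @ L.T @ np.linalg.solve(
    L @ C_prior @ L.T + C_obs, L @ n_prior - t_obs)

# Ideal preconditioner Π = ( Cprior (L^t Cobs^{-1} L + Cprior^{-1}) )^{-1}.
Pi = np.linalg.inv(C_prior @ (L.T @ Co_inv @ L + Cp_inv))

# One preconditioned steepest-descent step from n0 = nprior
# (the (n0 − nprior) term of the gradient vanishes at n0 = nprior).
grad = C_prior @ L.T @ Co_inv @ (L @ n_prior - t_obs)
n1 = n_prior - Pi @ grad
```

In practice Π is unavailable, which is exactly why the slide proposes the cheap diagonal approximation P(x, x′) = δ(x − x′) / Q(x, x).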

SLIDE 11

I have already explained that, in the context of least-squares, and given a covariance function C(z, z′), the norm of any function m = {m(z)} is to be defined via

‖m‖² = ( m , m ) = ⟨ C^{-1} m , m ⟩ .

Following a discussion with Prof. Mark Simons, I realized that I have not mentioned that when considering the exponential covariance function

C(z, z′) = σ² exp( −|z − z′| / L ) ,

one obtains the simple and interesting result (Tarantola, 2005, page 311)

‖m‖² = (1/σ²) [ (1/L) ∫ dz m(z)² + L ∫ dz ( dm/dz (z) )² ] .

SLIDE 12

This implies that when solving the basic least-squares problem, i.e., the problem of finding the model mpost that minimizes the misfit function

2 S(m) = ‖ o(m) − oobs ‖²_Cobs + ‖ m − mprior ‖²_Cprior
       = ⟨ Cobs^{-1} (o(m) − oobs) , (o(m) − oobs) ⟩ + ⟨ Cprior^{-1} (m − mprior) , (m − mprior) ⟩ ,

we are imposing that mpost(z) − mprior(z) must be small, but we are also imposing that dmpost/dz (z) − dmprior/dz (z) must be small.

In particular, if mprior(z) is smooth, then mpost(z) must also be smooth. See page 316 of my book for the 3D case.
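The roughness-penalizing character of the exponential-covariance norm can be illustrated numerically: for two models with the same L2 energy, the quadratic form ⟨C^{-1}m, m⟩ is far larger for the oscillatory one, because of the derivative term in the norm. The grid, σ, L, and the two test functions below are invented for the illustration.

```python
import numpy as np

nz = 300
z = np.linspace(0.0, 1.0, nz)
sigma, ell = 1.0, 0.1

# Exponential covariance matrix on the grid and its inverse.
C = sigma**2 * np.exp(-np.abs(z[:, None] - z[None, :]) / ell)
Cinv = np.linalg.inv(C)

# Two models with the same L2 energy: one smooth, one oscillatory.
m_smooth = np.sin(2 * np.pi * z)
m_rough = np.sin(40 * np.pi * z)

def sq_norm(m):
    # ||m||^2 = <C^{-1} m, m>, grid approximation of the functional norm
    return float(m @ Cinv @ m)

n_smooth, n_rough = sq_norm(m_smooth), sq_norm(m_rough)
```

The rough model pays a much higher price under this norm, which is why a least-squares solution with an exponential prior covariance inherits the smoothness of mprior.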

SLIDE 13

In lesson XII, I introduced the definition of the transpose operator: Let L be a linear operator from a linear space A into a linear space B. The transpose of L, denoted L^t, is a linear operator from B∗ into A∗, defined by the condition that for any β ∈ B∗ and for any a ∈ A,

⟨ β , L a ⟩_B = ⟨ L^t β , a ⟩_A .

SLIDE 14

Then, without demonstration, I gave two examples. If the expression b = L a means

bi = ∑_I LiI aI ,

then an expression like α = L^t β means

αI = ∑_i LiI βi .

If the expression b = L a means

b(t) = ∫ dV(x) L(t, x) a(x) ,

then an expression like α = L^t β means

α(x) = ∫ dt L(t, x) β(t) .

Ozgun, like St. Thomas, asks for the demonstrations.

SLIDE 15

Case s = L v ⇔ si = ∑_α Liα vα . There must be two duality products:

⟨ ω , v ⟩ = ∑_α ωα vα ; ⟨ σ , s ⟩ = ∑_i σi si .

The transpose operator, say M = L^t, will operate on some σ to give some ω : ω = M σ ⇔ ωα = ∑_i Mαi σi . What is the relation between Mαi and Liα ? For any v and any σ one must have

⟨ σ , L v ⟩ = ⟨ M σ , v ⟩ ,

i.e.,

∑_i σi (L v)i = ∑_α (M σ)α vα ,

i.e.,

∑_i σi ( ∑_α Liα vα ) = ∑_α ( ∑_i Mαi σi ) vα ,

i.e.,

∑_i ∑_α Liα σi vα = ∑_i ∑_α Mαi σi vα ,

and this implies that Mαi = Liα (the matrix representing M = L^t is the transpose of the matrix representing L).
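The defining duality relation, with the transpose realized as the matrix transpose, can be checked in one line of numpy. The dimensions and random vectors below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
L = rng.normal(size=(4, 6))     # maps v (dimension 6) to s (dimension 4)
v = rng.normal(size=6)
sigma = rng.normal(size=4)

lhs = sigma @ (L @ v)           # <sigma, L v>
rhs = (L.T @ sigma) @ v         # <L^t sigma, v>
```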

SLIDE 16

Note: this demonstration is trivial when using matrix notation. Case s = L v . There must be two duality products:

⟨ ω , v ⟩ = ω^t v ; ⟨ σ , s ⟩ = σ^t s .

The transpose operator, say M, will operate on some σ to give some ω : ω = M σ . What is the relation between M and L ? For any v and any σ one must have

⟨ σ , L v ⟩ = ⟨ M σ , v ⟩ ,

i.e., σ^t (L v) = (M σ)^t v = (σ^t M^t) v , i.e., σ^t L v = σ^t M^t v , and this implies that M^t = L , i.e., that M = L^t .

SLIDE 17

Case s = L v ⇔ si = ∫ dz Li(z) v(z) . There must be two duality products:

⟨ ω , v ⟩ = ∫ dz ω(z) v(z) ; ⟨ σ , s ⟩ = ∑_i σi si .

The transpose operator, say M = L^t, will operate on some σ to give some ω : ω = M σ ⇔ ω(z) = ∑_i Mi(z) σi . What is the relation between Mi(z) and Li(z) ? For any v and any σ one must have

⟨ σ , L v ⟩ = ⟨ M σ , v ⟩ ,

i.e.,

∑_i σi (L v)i = ∫ dz (M σ)(z) v(z) ,

i.e.,

∑_i σi ( ∫ dz Li(z) v(z) ) = ∫ dz ( ∑_i Mi(z) σi ) v(z) ,

i.e.,

∑_i ∫ dz Li(z) σi v(z) = ∑_i ∫ dz Mi(z) σi v(z) ,

⇒ Mi(z) = Li(z) (the two operators L and M = L^t have the same kernels).

SLIDE 18

When L operates, we make a sum (integral) over z ; when L^t operates, we make a (discrete) sum over i .
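This mixed discrete/continuous case can also be verified on a grid, where the integral over z becomes a quadrature sum: the same kernels Li(z) serve both the forward map and its transpose. The kernels and vectors below are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(2)
nz = 100
z = np.linspace(0.0, 1.0, nz)
dz = z[1] - z[0]

# Three arbitrary smooth kernels L_i(z).
Lk = np.stack([np.sin((i + 1) * np.pi * z) for i in range(3)])

v = rng.normal(size=nz)       # a "function" v(z) on the grid
sigma = rng.normal(size=3)    # a discrete dual vector

s = (Lk * v).sum(axis=1) * dz     # s_i = ∫ dz L_i(z) v(z)  (quadrature)
omega = sigma @ Lk                # ω(z) = Σ_i σ_i L_i(z): same kernels!

lhs = sigma @ s                   # <σ, L v> = Σ_i σ_i s_i
rhs = (omega * v).sum() * dz      # <L^t σ, v> = ∫ dz ω(z) v(z)
```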
SLIDE 19

Case s = L v ⇔ s(t) = ∑_α Lα(t) vα . Please do it.

Case s = L v ⇔ s(t) = ∫ dz L(t, z) v(z) . Please do it.

Case s = L v ⇔ si(t, ϕ) = ∑_α ∑_β ∫ dz Liαβ(t, ϕ, z) vαβ(z) .

Answer:

ω = L^t σ ⇔ ωαβ(z) = ∑_i ∫ dt ∫ dϕ Liαβ(t, ϕ, z) σi(t, ϕ) .

SLIDE 20

The transpose of the derivative operator. The derivative operator D maps a space X of functions x = {x(t)} into a space V of functions v = {v(t)} . It is defined as

v = D x ⇔ v(t) = dx/dt (t) .

It is obviously a linear operator. By definition, the transpose operator D^t maps the dual of V (with functions that we may denote ω = {ω(t)} ) into the dual of X (with functions that we may denote χ = {χ(t)} ). We don’t need to be interested in the interpretation of the two spaces X∗ and V∗. I want to prove that, except for some boundary condition (to be found), the derivative operator is antisymmetric, i.e., D^t = −D .

SLIDE 21

In other words, I want to prove the second of these two expressions:

v = D x ⇔ v(t) = dx/dt (t) ; χ = D^t ω ⇔ χ(t) = −dω/dt (t) .

SLIDE 22

The elements of the two spaces X and V are functions of the same variable t . Therefore the duality products in the two spaces are here quite similar:

⟨ χ , x ⟩_X = ∫_{t1}^{t2} dt χ(t) x(t) ; ⟨ ω , v ⟩_V = ∫_{t1}^{t2} dt ω(t) v(t) .

SLIDE 23

By definition of the transpose of a linear operator, for any x and any ω , one must have

⟨ ω , D x ⟩ = ⟨ D^t ω , x ⟩ .

Here, this gives

∫_{t1}^{t2} dt ω(t) (D x)(t) = ∫_{t1}^{t2} dt (D^t ω)(t) x(t) .

To prove that if (D x)(t) = dx/dt (t) , then (D^t ω)(t) = −dω/dt (t) , we must then prove that for any two functions x(t) and ω(t) ,

∫_{t1}^{t2} dt ω(t) dx/dt (t) + ∫_{t1}^{t2} dt dω/dt (t) x(t) = 0 .

SLIDE 24

So, is it true that for any two functions x(t) and ω(t) one has

Ψ = ∫_{t1}^{t2} dt ω(t) dx/dt (t) + ∫_{t1}^{t2} dt dω/dt (t) x(t) = 0 ?

One has

Ψ = ∫_{t1}^{t2} dt ω(t) dx/dt (t) + ∫_{t1}^{t2} dt dω/dt (t) x(t)
  = ∫_{t1}^{t2} dt ( ω(t) dx/dt (t) + dω/dt (t) x(t) )
  = ∫_{t1}^{t2} dt d/dt ( ω(t) x(t) )
  = ω(t2) x(t2) − ω(t1) x(t1) .

So, one only has Ψ = 0 (and, therefore, D^t = −D ) if all the functions in the two spaces X and V∗ satisfy the dual boundary condition

ω(t2) x(t2) = ω(t1) x(t1) .
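The integration-by-parts argument can be checked numerically: for generic functions Ψ equals the boundary term, and it vanishes once the dual boundary conditions hold. The interval, grid, and test functions below are invented; derivatives are supplied analytically and integrals approximated by the trapezoidal rule.

```python
import numpy as np

t1, t2 = 0.0, 2.0
n = 4001
t = np.linspace(t1, t2, n)
dt = t[1] - t[0]

def trap(f):
    # trapezoidal quadrature of the sampled function f over [t1, t2]
    return dt * (f.sum() - 0.5 * (f[0] + f[-1]))

# Generic functions (no boundary conditions): Psi equals the boundary term.
x, dx = np.sin(t), np.cos(t)
w, dw = np.exp(-t), -np.exp(-t)
psi = trap(w * dx) + trap(dw * x)
boundary = w[-1] * x[-1] - w[0] * x[0]

# Dual boundary conditions: x(t1) = 0 and w(t2) = 0 make Psi vanish.
x2, dx2 = t**2, 2.0 * t
w2, dw2 = (t2 - t)**2, -2.0 * (t2 - t)
psi2 = trap(w2 * dx2) + trap(dw2 * x2)
```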

SLIDE 25

Dual boundary condition (recall): ω(t2) x(t2) = ω(t1) x(t1) . Example: If all the functions x(t) satisfy the zero initial condition x(t1) = 0 , then, in order to have D^t = −D , all the functions in V∗ must satisfy the zero final condition ω(t2) = 0 . We shall always work with these dual boundary conditions.

Exercise: Demonstrate that if all the functions on which the second derivative operator D² operates satisfy the initial conditions of rest, x(t1) = 0 and dx/dt (t1) = 0 , and all the functions α(t) on which its transpose (D²)^t operates satisfy the final conditions of rest, α(t2) = 0 and dα/dt (t2) = 0 , then the operator D² is symmetric: (D²)^t = D² .
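A numerical sanity check of the exercise's claim (not a demonstration): with x satisfying the initial conditions of rest and α the final conditions of rest, the two duality products ⟨α, D²x⟩ and ⟨D²α, x⟩ coincide. The interval and the particular polynomial test functions are invented; derivatives are analytic and integrals use the trapezoidal rule.

```python
import numpy as np

t1, t2 = 0.0, 1.0
n = 2001
t = np.linspace(t1, t2, n)
dt = t[1] - t[0]

def trap(f):
    # trapezoidal quadrature on the grid
    return dt * (f.sum() - 0.5 * (f[0] + f[-1]))

# x satisfies the initial conditions of rest: x(t1) = 0, dx/dt(t1) = 0.
x = (t - t1)**3
d2x = 6.0 * (t - t1)

# alpha satisfies the final conditions of rest: alpha(t2) = 0, dalpha/dt(t2) = 0.
alpha = (t2 - t)**3
d2alpha = 6.0 * (t2 - t)

lhs = trap(alpha * d2x)      # <alpha, D^2 x>
rhs = trap(d2alpha * x)      # <D^2 alpha, x>
```

Integrating by parts twice moves D² from x to α, and the stated boundary conditions kill both boundary terms, which is what the equality reflects.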

SLIDE 26

Tangent linear mapping. Fréchet derivative. Let ϕ be a (possibly nonlinear) mapping from the linear space A into the linear space B . The elements (vectors) of the spaces A and B may be functions. The linear mapping Φ0 is the tangent linear mapping to the mapping ϕ at a0 if

ϕ(a0 + δa) = ϕ(a0) + Φ0 δa + . . .

(where the “dots” mean terms that are second order in δa or higher). The Fréchet derivative (at a0 ) of the mapping ϕ is the kernel of Φ0 .

SLIDE 27

Example: Let b = ϕ(a) ⇔ b(t) = ∫ dz Q(t, z) ( a(z) )² . One has

ϕ(a0 + δa)(t) = ∫ dz Q(t, z) ( a0(z) + δa(z) )²
             = ∫ dz Q(t, z) ( a0(z)² + 2 a0(z) δa(z) + . . . )
             = ∫ dz Q(t, z) a0(z)² + ∫ dz 2 a0(z) Q(t, z) δa(z) + . . .
             = ϕ(a0)(t) + (Φ0 δa)(t) + . . .

Therefore, the tangent linear mapping to the mapping ϕ at a0(z) is the linear mapping that to any function δa(z) associates the function

δb(t) = ∫ dz 2 a0(z) Q(t, z) δa(z) ,

the factor 2 a0(z) Q(t, z) under the integral being the kernel Φ0(t, z) .

SLIDE 28

The Fréchet derivative of the mapping ϕ at a0(z) is Φ0(t, z) = 2 a0(z) Q(t, z) .
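The tangent-linear property can be verified numerically: the remainder ϕ(a0 + ε δa) − ϕ(a0) − ε Φ0 δa must shrink like ε². The grids, the particular kernel Q(t, z), and the reference point a0 below are invented; since this ϕ is exactly quadratic, halving ε divides the remainder by exactly four.

```python
import numpy as np

rng = np.random.default_rng(3)
nz, nt = 80, 60
z = np.linspace(0.0, 1.0, nz)
t = np.linspace(0.0, 1.0, nt)
dz = z[1] - z[0]

Qk = np.exp(-np.abs(t[:, None] - z[None, :]))   # an arbitrary smooth kernel Q(t, z)
a0 = np.sin(np.pi * z)                           # reference point a0(z)
da = rng.normal(size=nz)                         # an arbitrary perturbation δa(z)

def phi(a):
    # b(t) = ∫ dz Q(t, z) a(z)^2  (quadrature on the z grid)
    return Qk @ (a**2) * dz

# Fréchet derivative kernel: Φ0(t, z) = 2 a0(z) Q(t, z).
Phi0 = 2.0 * Qk * a0[None, :]

def remainder(eps):
    # norm of ϕ(a0 + ε δa) − ϕ(a0) − ε Φ0 δa
    return np.linalg.norm(phi(a0 + eps * da) - phi(a0) - eps * (Phi0 @ da) * dz)

r1, r2 = remainder(1e-2), remainder(5e-3)
```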