  1. Institut de Physique du Globe de Paris & Université Pierre et Marie Curie (Paris VI). Course on Inverse Problems. Albert Tarantola. Lesson XVII: Least-Squares Involving Functions.

  2. ⇒ Mathematica notebook

  3. II: Functional Formulation. The a priori information is represented by
$$m_\text{prior} = \{\, m_\text{prior}(z) \,\} \;;\quad C_\text{prior} = \{\, C_\text{prior}(z,z') \,\} \ .$$
The observable parameters are of the kind
$$o^i = \int dz \, O^i(z)\, m(z) \ ,$$
where, in our example, $O^1(z)$ is a delta function and $O^2(z)$ is a box-car function. This corresponds to a linear relation between the model function $m$ and the observations $o$: $o = O\,m$. We have some observations
$$o_\text{obs} = \{\, o^i_\text{obs} \,\} \;;\quad C_\text{obs} = \{\, C^{ij}_\text{obs} \,\} \ .$$
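A minimal numerical sketch of this setup (the grid, the prior kernel, and the kernel positions are our own illustrative assumptions, not from the lesson): the model function is discretized on a grid, and the two observation kernels become rows of a matrix, so that $o = O\,m$ becomes a matrix-vector product.

```python
import numpy as np

# Discretize the model function m(z) on a regular grid (sizes are arbitrary).
nz = 201
z = np.linspace(0.0, 10.0, nz)
dz = z[1] - z[0]

# A priori model and covariance; the exponential kernel is an illustrative choice.
sigma, L = 1.0, 2.0
m_prior = np.zeros(nz)
C_prior = sigma**2 * np.exp(-np.abs(z[:, None] - z[None, :]) / L)

# Observation kernels O^i(z), stored so that (O_w @ m) approximates
# o^i = \int dz O^i(z) m(z), with the quadrature weight dz absorbed in O_w.
O = np.zeros((2, nz))
O[0, np.searchsorted(z, 4.0)] = 1.0 / dz     # delta function at z = 4: o^1 = m(4)
box = (z >= 6.0) & (z <= 8.0)
O[1, box] = 1.0 / (8.0 - 6.0)                # box-car on [6, 8]: o^2 = mean of m there
O_w = O * dz                                 # weighted kernels: o = O_w @ m
```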

  4. The solution is provided by the standard least-squares equation, which we just need to interpret. The first equation is
$$m_\text{post} = m_\text{prior} - C_\text{prior} O^t \left( O\, C_\text{prior} O^t + C_\text{obs} \right)^{-1} \left( O\, m_\text{prior} - o_\text{obs} \right) ,$$
i.e.,
$$m_\text{post} = m_\text{prior} - P\, Q\, r \ ,$$
with
$$P = C_\text{prior} O^t \;;\quad S = O\, C_\text{prior} O^t + C_\text{obs} \;;\quad Q = S^{-1} \;;\quad r = O\, m_\text{prior} - o_\text{obs} \ .$$

  5. This leads to
$$m_\text{post}(z) = m_\text{prior}(z) - \sum_i \sum_j P^i(z)\, Q_{ij}\, r^j \ ,$$
where
$$P^i(z) = \int dz'\, C_\text{prior}(z,z')\, O^i(z') \;;\quad S^{ij} = \int dz \int dz'\, O^i(z)\, C_\text{prior}(z,z')\, O^j(z') + C^{ij}_\text{obs} \;;$$
$$Q = S^{-1} \;;\quad r^i = \int dz\, O^i(z)\, m_\text{prior}(z) - o^i_\text{obs} \ .$$
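Continuing the discretization sketch above, the posterior mean can be computed directly from these formulas (the observed values and the observational covariance below are hypothetical):

```python
# Posterior mean, continuing from the discretization sketch above.
C_obs = 0.01 * np.eye(2)                     # observational covariance (assumed values)
o_obs = np.array([1.0, 0.5])                 # hypothetical observed values

P = C_prior @ O_w.T                          # P^i(z)  = \int dz' C_prior(z,z') O^i(z')
S = O_w @ C_prior @ O_w.T + C_obs            # S^{ij}  (double integral + C_obs)
Q = np.linalg.inv(S)                         # Q = S^{-1}
r = O_w @ m_prior - o_obs                    # r^i
m_post = m_prior - P @ (Q @ r)
```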

  6. The second equation is
$$C_\text{post} = C_\text{prior} - C_\text{prior} O^t \left( O\, C_\text{prior} O^t + C_\text{obs} \right)^{-1} O\, C_\text{prior} \ ,$$
i.e.,
$$C_\text{post} = C_\text{prior} - P\, Q\, P^t \ ,$$
with $P = C_\text{prior} O^t$, $S = O\, C_\text{prior} O^t + C_\text{obs}$, $Q = S^{-1}$.

  7. This leads to
$$C_\text{post}(z,z') = C_\text{prior}(z,z') - \sum_i \sum_j P^i(z)\, Q_{ij}\, P^j(z') \ ,$$
where
$$P^i(z) = \int dz'\, C_\text{prior}(z,z')\, O^i(z') \;;\quad S^{ij} = \int dz \int dz'\, O^i(z)\, C_\text{prior}(z,z')\, O^j(z') + C^{ij}_\text{obs} \;;\quad Q = S^{-1} \ .$$
⇒ Mathematica notebook
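In the same discrete sketch, the posterior covariance reuses the $P$ and $Q$ already computed for the posterior mean:

```python
# Posterior covariance C_post = C_prior - P Q P^t, reusing P and Q from above.
C_post = C_prior - P @ Q @ P.T
# The diagonal C_post(z, z) is the posterior variance; it is reduced with respect
# to C_prior(z, z) near the points constrained by the two observations.
post_std = np.sqrt(np.diag(C_post))
```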

  8. Let me bring in a clarification. We have examined two of the many possible algorithms providing the mean posterior model $n_\text{post}$. The first one was the steepest-descent algorithm
$$n_{k+1} = n_k - \mu \left( C_\text{prior} L^t C_\text{obs}^{-1} ( L\, n_k - t_\text{obs} ) + ( n_k - n_\text{prior} ) \right) ,$$
which could advantageously be replaced by a preconditioned steepest-descent algorithm,
$$n_{k+1} = n_k - P \left( C_\text{prior} L^t C_\text{obs}^{-1} ( L\, n_k - t_\text{obs} ) + ( n_k - n_\text{prior} ) \right) ,$$
where $P$ is a suitably chosen, but arbitrary, positive definite operator. A good choice of $P$ will accelerate convergence (but not change the final result). Let us guess what $P$ could be.
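A dense-matrix sketch of this iteration (the function name and arguments are ours; with $P = \mu I$ it reduces to the plain steepest-descent algorithm):

```python
import numpy as np

def preconditioned_steepest_descent(n_prior, C_prior, L_mat, C_obs, t_obs, P, n_iter=50):
    """Iterate n_{k+1} = n_k - P (C_prior L^t C_obs^{-1} (L n_k - t_obs) + (n_k - n_prior)).
    Dense-matrix sketch; with P = mu * identity this is plain steepest descent."""
    C_obs_inv = np.linalg.inv(C_obs)
    n = n_prior.copy()
    for _ in range(n_iter):
        direction = C_prior @ L_mat.T @ (C_obs_inv @ (L_mat @ n - t_obs)) + (n - n_prior)
        n = n - P @ direction
    return n
```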

  9. The first iteration, choosing $n_0 = n_\text{prior}$, gives
$$n_1 = n_\text{prior} - P\, C_\text{prior} L^t C_\text{obs}^{-1} ( L\, n_\text{prior} - t_\text{obs} ) \ .$$
We also saw the Newton algorithm
$$n_\text{post} = n_\text{prior} - C_\text{prior} L^t \left( L\, C_\text{prior} L^t + C_\text{obs} \right)^{-1} ( L\, n_\text{prior} - t_\text{obs} ) \ ,$$
which, because the forward relation is here linear, converges in just one step. Equivalent to the last equation is (see lesson XI)
$$n_\text{post} = n_\text{prior} - \left( L^t C_\text{obs}^{-1} L + C_\text{prior}^{-1} \right)^{-1} L^t C_\text{obs}^{-1} ( L\, n_\text{prior} - t_\text{obs} ) \ ,$$
which can also be written as
$$n_\text{post} = n_\text{prior} - \Pi\, C_\text{prior} L^t C_\text{obs}^{-1} ( L\, n_\text{prior} - t_\text{obs} ) \ ,\quad \text{with}\quad \Pi = \left( C_\text{prior} \left( L^t C_\text{obs}^{-1} L + C_\text{prior}^{-1} \right) \right)^{-1} .$$

  10. The conclusion is that the closer the (arbitrary) preconditioning operator $P$ is to $\Pi = \left( C_\text{prior} \left( L^t C_\text{obs}^{-1} L + C_\text{prior}^{-1} \right) \right)^{-1}$, the faster the preconditioned steepest-descent algorithm will converge (and in only one step if $P = \Pi$, but this is usually impossible). Simplest example: let $Q = C_\text{prior} \left( L^t C_\text{obs}^{-1} L + C_\text{prior}^{-1} \right)$, so we must use $P \approx Q^{-1}$. Letting $Q(x,x')$ be the kernel of $Q$, one may choose to approximate $Q^{-1}$ by the inverse of its diagonal elements,
$$P(x,x') = \frac{1}{Q(x,x)}\, \delta(x - x') \ .$$
Evaluating $Q(x,x)$ grossly corresponds to "counting how many rays pass through point $x$".
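In the discrete sketch, this diagonal approximation reads as follows (matrix names as in the iteration sketch above; explicit inverses are kept only for clarity):

```python
# Diagonal preconditioner P(x, x') = delta(x - x') / Q(x, x) in discrete form:
# Q is the matrix C_prior (L^t C_obs^{-1} L + C_prior^{-1}), and P keeps only
# the inverse of its diagonal.
Q_mat = C_prior @ (L_mat.T @ np.linalg.inv(C_obs) @ L_mat + np.linalg.inv(C_prior))
P_diag = np.diag(1.0 / np.diag(Q_mat))
# P_diag can now be passed to preconditioned_steepest_descent above.
```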

  11. I have already explained that, in the context of least squares, and given a covariance function $C(z,z')$, the norm of any function $m = \{\, m(z) \,\}$ is to be defined via
$$\| m \|^2 = ( m, m ) = \langle\, C^{-1} m, m \,\rangle \ .$$
Following a discussion with Prof. Mark Simons, I realized that I have not mentioned that when considering the exponential covariance function
$$C(z,z') = \sigma^2 \exp\left( - \frac{|z - z'|}{L} \right) ,$$
one obtains the simple and interesting result (Tarantola, 2005, page 311)
$$\| m \|^2 = \frac{1}{2\sigma^2} \left( \frac{1}{L} \int dz\, m(z)^2 + L \int dz \left( \frac{dm}{dz}(z) \right)^{\!2} \right) .$$
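A quick numerical check of this identity (a sketch: grid, $\sigma$, $L$, and the test function are arbitrary choices; the boundary terms are negligible because the test function vanishes at the edges):

```python
import numpy as np

# Check <C^{-1} m, m> against (1/(2 sigma^2)) [ (1/L) int m^2 dz + L int (dm/dz)^2 dz ].
nz = 2001
z = np.linspace(0.0, 40.0, nz)
dz = z[1] - z[0]
sigma, L = 1.3, 2.0
C = sigma**2 * np.exp(-np.abs(z[:, None] - z[None, :]) / L)

m = np.exp(-((z - 20.0) / 3.0)**2)            # smooth test function, ~0 at the edges
lhs = m @ np.linalg.solve(C, m)               # <C^{-1} m, m> (quadrature weights cancel)
dm = np.gradient(m, dz)
rhs = ((m**2).sum() / L + L * (dm**2).sum()) * dz / (2.0 * sigma**2)
print(lhs, rhs)                               # agree up to discretization error
```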

  12. This implies that when solving the basic least-squares problem, i.e., the problem of finding the model $m_\text{post}$ that minimizes the misfit function
$$2\, S(m) = \| o(m) - o_\text{obs} \|^2_{C_\text{obs}} + \| m - m_\text{prior} \|^2_{C_\text{prior}} = \langle\, C_\text{obs}^{-1} ( o(m) - o_\text{obs} ), ( o(m) - o_\text{obs} ) \,\rangle + \langle\, C_\text{prior}^{-1} ( m - m_\text{prior} ), ( m - m_\text{prior} ) \,\rangle \ ,$$
we are imposing that $m_\text{post}(z) - m_\text{prior}(z)$ must be small, but we are also imposing that $\frac{dm_\text{post}}{dz}(z) - \frac{dm_\text{prior}}{dz}(z)$ must be small. In particular, if $m_\text{prior}(z)$ is smooth, then $m_\text{post}(z)$ must also be smooth. See page 316 of my book for the 3D case.
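A small sketch of evaluating this misfit in the discretized setting (all names are ours; the forward map is taken to be the weighted observation matrix from the earlier sketch):

```python
import numpy as np

def two_S(m, O_w, o_obs, C_obs, m_prior, C_prior):
    """Discrete misfit 2S(m) = ||O_w m - o_obs||^2_{C_obs} + ||m - m_prior||^2_{C_prior},
    each squared norm taken with the inverse of the corresponding covariance."""
    r_d = O_w @ m - o_obs
    r_m = m - m_prior
    return r_d @ np.linalg.solve(C_obs, r_d) + r_m @ np.linalg.solve(C_prior, r_m)
```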

  13. In lesson XII, I introduced the definition of the transpose operator: Let $L$ be a linear operator from a linear space $A$ into a linear space $B$. The transpose of $L$, denoted $L^t$, is a linear operator from $B^*$ into $A^*$, defined by the condition that for any $\beta \in B^*$ and for any $a \in A$,
$$\langle\, \beta, L\, a \,\rangle_B = \langle\, L^t \beta, a \,\rangle_A \ .$$

  14. Then, without demonstration, I gave two examples. If the expression $b = L\, a$ means
$$b^i = \sum_I L^{iI} a_I \ ,$$
then an expression like $\alpha = L^t \beta$ means
$$\alpha_I = \sum_i L^{iI} \beta_i \ .$$
If the expression $b = L\, a$ means
$$b(t) = \int dV(x)\, L(t,x)\, a(x) \ ,$$
then an expression like $\alpha = L^t \beta$ means
$$\alpha(x) = \int dt\, L(t,x)\, \beta(t) \ .$$
Ozgun, like St. Thomas, asks for the demonstrations.

  15. Case $s = L\, v \Leftrightarrow s^i = \sum_\alpha L^i_\alpha\, v^\alpha$. There must be two duality products:
$$\langle\, \omega, v \,\rangle = \sum_\alpha \omega_\alpha v^\alpha \;;\quad \langle\, \sigma, s \,\rangle = \sum_i \sigma_i s^i \ .$$
The transpose operator, say $M = L^t$, will operate on some $\sigma$ to give some $\omega$: $\omega = M\sigma \Leftrightarrow \omega_\alpha = \sum_i M_\alpha^i\, \sigma_i$. What is the relation between $M_\alpha^i$ and $L^i_\alpha$? For any $v$ and any $\sigma$ one must have $\langle\, \sigma, L\, v \,\rangle = \langle\, M\sigma, v \,\rangle$, i.e.,
$$\sum_i \sigma_i\, ( L\, v )^i = \sum_\alpha ( M\sigma )_\alpha\, v^\alpha \ ,$$
i.e.,
$$\sum_i \sigma_i \Big( \sum_\alpha L^i_\alpha\, v^\alpha \Big) = \sum_\alpha \Big( \sum_i M_\alpha^i\, \sigma_i \Big)\, v^\alpha \ ,$$
i.e.,
$$\sum_i \sum_\alpha L^i_\alpha\, \sigma_i\, v^\alpha = \sum_\alpha \sum_i M_\alpha^i\, \sigma_i\, v^\alpha \ ,$$
and this implies that $M_\alpha^i = L^i_\alpha$ (the matrix representing $M = L^t$ is the transpose of the matrix representing $L$).
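The same conclusion, checked numerically in a few lines (shapes and the random seed are arbitrary):

```python
import numpy as np

# Check <sigma, L v> = <L^t sigma, v> for a random matrix L.
rng = np.random.default_rng(0)
L_mat = rng.standard_normal((5, 3))           # L maps 3-vectors to 5-vectors
v, sigma = rng.standard_normal(3), rng.standard_normal(5)
assert np.isclose(sigma @ (L_mat @ v), (L_mat.T @ sigma) @ v)
```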

  16. Note: this demonstration is trivial when using matrix notation. Case $s = L\, v$. There must be two duality products:
$$\langle\, \omega, v \,\rangle = \omega^t v \;;\quad \langle\, \sigma, s \,\rangle = \sigma^t s \ .$$
The transpose operator, say $M$, will operate on some $\sigma$ to give some $\omega$: $\omega = M\sigma$. What is the relation between $M$ and $L$? For any $v$ and any $\sigma$ one must have $\langle\, \sigma, L\, v \,\rangle = \langle\, M\sigma, v \,\rangle$, i.e.,
$$\sigma^t ( L\, v ) = ( M\sigma )^t v = ( \sigma^t M^t )\, v \ ,$$
i.e., $\sigma^t L\, v = \sigma^t M^t v$, and this implies that $M^t = L$, i.e., that $M = L^t$.

  17. Case $s = L\, v \Leftrightarrow s^i = \int dz\, L^i(z)\, v(z)$. There must be two duality products:
$$\langle\, \sigma, s \,\rangle = \sum_i \sigma_i s^i \;;\quad \langle\, \omega, v \,\rangle = \int dz\, \omega(z)\, v(z) \ .$$
The transpose operator, say $M = L^t$, will operate on some $\sigma$ to give some $\omega$: $\omega = M\sigma \Leftrightarrow \omega(z) = \sum_i M^i(z)\, \sigma_i$. What is the relation between $M^i(z)$ and $L^i(z)$? For any $v$ and any $\sigma$ one must have $\langle\, \sigma, L\, v \,\rangle = \langle\, M\sigma, v \,\rangle$, i.e.,
$$\sum_i \sigma_i\, ( L\, v )^i = \int dz\, ( M\sigma )(z)\, v(z) \ ,$$
i.e.,
$$\sum_i \sigma_i \Big( \int dz\, L^i(z)\, v(z) \Big) = \int dz \Big( \sum_i M^i(z)\, \sigma_i \Big)\, v(z) \ ,$$
i.e.,
$$\sum_i \int dz\, L^i(z)\, \sigma_i\, v(z) = \sum_i \int dz\, M^i(z)\, \sigma_i\, v(z) \ ,$$
$\Rightarrow M^i(z) = L^i(z)$ (the two operators $L$ and $M = L^t$ have

  18. the same kernels; when $L$ operates, we make a sum (integral) over $z$; when $L^t$ operates, we make a (discrete) sum over $i$).
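A numerical sketch of this mixed discrete-continuous case (kernels, grid, and test vectors are arbitrary choices): the same kernels $L^i(z)$ are used both ways, integrating over $z$ for $L$ and summing over $i$ for $L^t$.

```python
import numpy as np

# Case s^i = \int dz L^i(z) v(z): check <sigma, L v> = <L^t sigma, v> by quadrature.
nz, ni = 400, 3
z = np.linspace(0.0, 1.0, nz)
dz = z[1] - z[0]
Lker = np.stack([np.sin((i + 1) * np.pi * z) for i in range(ni)])  # kernels L^i(z)

v = np.cos(2.0 * np.pi * z)                   # arbitrary test function
sigma = np.array([0.3, -1.2, 0.7])            # arbitrary test covector

s = Lker @ v * dz                             # (L v)^i = \int dz L^i(z) v(z)
omega = Lker.T @ sigma                        # (L^t sigma)(z) = sum_i L^i(z) sigma_i
assert np.isclose(sigma @ s, (omega * v).sum() * dz)
```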

  19. Case $s = L\, v \Leftrightarrow s(t) = \sum_\alpha L_\alpha(t)\, v^\alpha$. Please do it. Case $s = L\, v \Leftrightarrow s(t) = \int dz\, L(t,z)\, v(z)$. Please do it. Case
$$s = L\, v \Leftrightarrow s^i(t,\varphi) = \sum_\alpha \sum_\beta \int dz\, L^i_{\alpha\beta}(t,\varphi,z)\, v^{\alpha\beta}(z) \ .$$
Answer:
$$\omega = L^t \sigma \Leftrightarrow \omega_{\alpha\beta}(z) = \sum_i \int dt \int d\varphi\, L^i_{\alpha\beta}(t,\varphi,z)\, \sigma_i(t,\varphi) \ .$$

  20. The transpose of the derivative operator. The derivative operator $D$ maps a space $X$ of functions $x = \{\, x(t) \,\}$ into a space $V$ of functions $v = \{\, v(t) \,\}$. It is defined as
$$v = D\, x \Leftrightarrow v(t) = \frac{dx}{dt}(t) \ .$$
It is obviously a linear operator. By definition, the transpose operator $D^t$ maps the dual of $V$ (with functions that we may denote $\omega = \{\, \omega(t) \,\}$) into the dual of $X$ (with functions that we may denote $\chi = \{\, \chi(t) \,\}$). We don't need to be interested in the interpretation of the two spaces $X^*$ and $V^*$. I want to prove that, except for some boundary condition (to be found), the derivative operator is antisymmetric, i.e., $D^t = -D$.

  21. In other words, I want to prove the second of these two expressions:
$$v = D\, x \Leftrightarrow v(t) = \frac{dx}{dt}(t) \;;\quad \chi = D^t \omega \Leftrightarrow \chi(t) = - \frac{d\omega}{dt}(t) \ .$$

  22. The elements of the two spaces $X$ and $V$ are functions of the same variable $t$. Therefore the duality products in the two spaces are here quite similar:
$$\langle\, \chi, x \,\rangle_X = \int_{t_1}^{t_2} dt\, \chi(t)\, x(t) \;;\quad \langle\, \omega, v \,\rangle_V = \int_{t_1}^{t_2} dt\, \omega(t)\, v(t) \ .$$

  23. By definition of the transpose of a linear operator, for any $x$ and any $\omega$, one must have $\langle\, \omega, D\, x \,\rangle = \langle\, D^t \omega, x \,\rangle$. Here, this gives
$$\int_{t_1}^{t_2} dt\, \omega(t)\, ( D\, x )(t) = \int_{t_1}^{t_2} dt\, ( D^t \omega )(t)\, x(t) \ .$$
To prove that if $( D\, x )(t) = \frac{dx}{dt}(t)$, then $( D^t \omega )(t) = - \frac{d\omega}{dt}(t)$, we must then prove that for any two functions $x(t)$ and $\omega(t)$,
$$\int_{t_1}^{t_2} dt\, \omega(t)\, \frac{dx}{dt}(t) + \int_{t_1}^{t_2} dt\, \frac{d\omega}{dt}(t)\, x(t) = 0 \ .$$
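A numerical check of this integration-by-parts identity (a sketch: both test functions are chosen to vanish at $t_1$ and $t_2$, so the boundary term $[\omega x]_{t_1}^{t_2}$ drops out):

```python
import numpy as np

# Check int omega x' dt + int omega' x dt = 0 for functions vanishing at the ends.
nt = 4001
t = np.linspace(0.0, 1.0, nt)
dt = t[1] - t[0]
x = np.sin(np.pi * t)**2                      # x(t1) = x(t2) = 0
omega = t**2 * (1.0 - t)**2                   # omega(t1) = omega(t2) = 0

total = (omega * np.gradient(x, dt)).sum() * dt \
      + (np.gradient(omega, dt) * x).sum() * dt
print(total)                                  # ~ 0, up to discretization error
```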
