Statistical Inverse Problems and Instrumental Variables


  1. Statistical Inverse Problems and Instrumental Variables. Thorsten Hohage, Institut für Numerische und Angewandte Mathematik, University of Göttingen. Workshop on Inverse and Partial Information Problems: Methodology and Applications, RICAM, Linz, 27.–31.10.2008.

  2. Collaborators
• Frank Bauer (Linz)
• Laurent Cavalier (Marseille)
• Jean-Pierre Florens (Toulouse)
• Jan Johannes (Heidelberg)
• Enno Mammen (Mannheim)
• Axel Munk (Göttingen)

  3. outline
1. A Newton method for nonlinear statistical inverse problems
2. Oracle inequalities
3. Nonparametric instrumental variables and perturbed operators

  4. statistical inverse problems
Problem: Let $X$, $Y$ be separable Hilbert spaces and $F : D(F) \subset X \to Y$ a Fréchet differentiable, one-to-one operator. Estimate $a^\dagger$ given indirect observations in the form of a random process
$$Y = F(a^\dagger) + \sigma\xi + \delta\zeta.$$
$F^{-1}$ is not continuous!
• $\xi$: normalized stochastic noise, a Hilbert space process satisfying $\mathbb{E}\,\xi = 0$ and $\|\operatorname{Cov}\xi\| \le 1$
• $\sigma \ge 0$: stochastic noise level
• $\zeta \in Y$: normalized deterministic noise, $\|\zeta\| = 1$
• $\delta \ge 0$: deterministic noise level
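To make the setting concrete, here is a minimal numerical sketch of the data model $Y = F(a^\dagger) + \sigma\xi + \delta\zeta$, assuming a discretized autoconvolution operator as the nonlinear forward map $F$. This is a standard ill-posed test problem chosen purely for illustration; the slides do not fix a particular operator.

```python
# A toy instance of the data model Y = F(a^dagger) + sigma*xi + delta*zeta,
# assuming a discretized autoconvolution operator as F (an illustrative
# choice; the slides do not specify a forward operator).
import numpy as np

n = 200
t = np.linspace(0.0, 1.0, n)
h = t[1] - t[0]

def F(a):
    """Discrete autoconvolution (F(a))(t) = int_0^t a(t - s) a(s) ds."""
    return np.convolve(a, a)[:n] * h

a_dagger = np.exp(-20.0 * (t - 0.5) ** 2)     # "true" solution a^dagger
sigma, delta = 1e-3, 1e-4                     # stochastic / deterministic levels

rng = np.random.default_rng(0)
xi = rng.standard_normal(n)                   # E xi = 0, ||Cov xi|| <= 1 (white)
zeta = np.ones(n) / np.sqrt(n)                # deterministic noise, ||zeta|| = 1
Y = F(a_dagger) + sigma * xi + delta * zeta   # observed random process
```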

  5. the algorithm
The Newton equation
$$F'[\hat a_k](\hat a_{k+1} - \hat a_k) = Y - F(\hat a_k), \qquad k = 1, 2, \ldots$$
is regularized in each step by Tikhonov regularization with initial guess $a_0$ and regularization parameter $\alpha_k = \alpha_0 q^k$, $q \in (0,1)$:
$$\hat a_{k+1} := \operatorname*{argmin}_{a \in X} \big\| F'[\hat a_k](a - \hat a_k) + F(\hat a_k) - Y \big\|_Y^2 + \alpha_{k+1} \| a - a_0 \|_X^2.$$
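The following sketch shows how one such regularized Newton step can be implemented, assuming $F$ and its Jacobian are available as dense NumPy callables; the function names (`newton_step`, `irgnm`) are illustrative, not from the slides.

```python
# A minimal sketch of the iteratively regularized Gauss-Newton method above,
# assuming F and its Jacobian F' are given as dense NumPy callables.
import numpy as np

def newton_step(F, Fprime, a_k, a_0, Y, alpha):
    """One Tikhonov-regularized Newton step:
    argmin_a ||F'[a_k](a - a_k) + F(a_k) - Y||^2 + alpha*||a - a_0||^2."""
    T_k = Fprime(a_k)                      # Jacobian F'[a_k] as an (m x n) matrix
    rhs = Y - F(a_k) + T_k @ (a_k - a_0)   # residual, rewritten in h = a - a_0
    # Normal equations for h: (T_k^T T_k + alpha*I) h = T_k^T rhs
    h = np.linalg.solve(T_k.T @ T_k + alpha * np.eye(a_0.size), T_k.T @ rhs)
    return a_0 + h

def irgnm(F, Fprime, a_0, Y, alpha_0=1.0, q=0.5, K=10):
    """Run K Newton steps with the geometric schedule alpha_k = alpha_0 * q^k."""
    a = a_0.copy()
    for k in range(1, K + 1):
        a = newton_step(F, Fprime, a, a_0, Y, alpha_0 * q ** k)
    return a
```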

  6. What does this look like for linear problems?
If $F = T$ is linear, the iteration formula simplifies to
$$\hat a_{k+1} := \operatorname*{argmin}_{a \in X} \| Ta - Y \|_Y^2 + \alpha_{k+1} \| a - a_0 \|_X^2.$$
The iteration steps decouple in the sense that none of the previous iterates appears in the formula for $\hat a_{k+1}$. Bias and variance must be balanced by a proper choice of the stopping index.
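A small sketch of this decoupling, assuming a diagonal (SVD) toy model so that each Tikhonov iterate can be written componentwise; all model parameters below are illustrative.

```python
# In the linear case each iterate depends only on alpha_k, so the whole
# Tikhonov path can be computed independently and the stopping index chosen
# to balance bias against variance (diagonal toy model, illustrative values).
import numpy as np

rng = np.random.default_rng(1)
n = 500
s = np.arange(1, n + 1, dtype=float) ** -1.0       # singular values of T
a_dagger = np.arange(1, n + 1, dtype=float) ** -1.5
sigma = 1e-3
Y = s * a_dagger + sigma * rng.standard_normal(n)  # data in SVD coordinates

alpha_0, q = 1.0, 0.5
errors = []
for k in range(1, 40):
    alpha_k = alpha_0 * q ** k
    a_hat_k = s * Y / (s ** 2 + alpha_k)   # iterate with a_0 = 0; note that no
    errors.append(np.linalg.norm(a_hat_k - a_dagger))   # earlier iterate is used
k_star = 1 + int(np.argmin(errors))        # oracle stopping index for this draw
print(k_star, errors[k_star - 1])
```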

  7. What if $\hat a_k \notin D(F)$ for some $k$?
• Since typically $D(F) \ne X$ and the stochastic noise $\sigma\xi$ can be arbitrarily large, there is a positive probability that $\hat a_k \notin D(F)$ in each Newton step.
• "Emergency stop": If this happens, we stop the Newton iteration and return $a_0$ as estimator of $a^\dagger$.
• We will have to show that the probability that such an emergency stop is necessary tends to 0 rapidly as the stochastic noise level $\sigma \to 0$.

  8. Can we improve on the qualification of Tikhonov regularization?
Replace Tikhonov regularization by $m$-fold iterated Tikhonov regularization:
$$\hat a^{(0)}_{k+1} := a_0,$$
$$\hat a^{(j)}_{k+1} := \operatorname*{argmin}_{a \in X} \big\| F'[\hat a_k](a - \hat a_k) + F(\hat a_k) - Y \big\|_Y^2 + \alpha_{k+1} \big\| a - \hat a^{(j-1)}_{k+1} \big\|_X^2, \qquad j = 1, \ldots, m,$$
$$\hat a_{k+1} := \hat a^{(m)}_{k+1}.$$
Closed formula:
$$\hat a_{k+1} := a_0 + g_{\alpha_{k+1}}\big( F'[\hat a_k]^* F'[\hat a_k] \big)\, F'[\hat a_k]^* \Big( Y - F(\hat a_k) + F'[\hat a_k](\hat a_k - a_0) \Big)$$
with
$$r_\alpha(\lambda) := \Big( \frac{\alpha}{\alpha + \lambda} \Big)^m, \qquad g_\alpha(\lambda) := \frac{1 - r_\alpha(\lambda)}{\lambda}.$$
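A quick sanity check of the closed formula, assuming a scalar toy problem with $T = \sqrt{\lambda}$ and $a_0 = 0$ (all numerical values are arbitrary test inputs): the $m$-fold iteration and the filter-function form agree.

```python
# Check the closed-form filter of m-fold iterated Tikhonov against the
# step-by-step iteration on a scalar problem T = sqrt(lam), a_0 = 0.
import numpy as np

def r_alpha(lam, alpha, m):
    return (alpha / (alpha + lam)) ** m

def g_alpha(lam, alpha, m):
    return (1.0 - r_alpha(lam, alpha, m)) / lam

lam, alpha, m, y = 0.3, 0.1, 4, 1.7      # arbitrary test values
T = np.sqrt(lam)

# m-fold iteration: a_j solves (T^2 + alpha) a_j = T*y + alpha*a_{j-1}
a = 0.0
for _ in range(m):
    a = (T * y + alpha * a) / (lam + alpha)

assert np.isclose(a, g_alpha(lam, alpha, m) * T * y)   # closed formula matches
```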

  9. references
Deterministic convergence analysis:
• B. Kaltenbacher, A. Neubauer, O. Scherzer. Iterative Regularization Methods for Nonlinear Ill-Posed Problems. Radon Series on Computational and Applied Mathematics, de Gruyter, Berlin, 2008.
• A. B. Bakushinsky and M. Y. Kokurin. Iterative Methods for Approximate Solution of Inverse Problems. Springer, Dordrecht, 2004.
• A. B. Bakushinsky. The problem of the convergence of the iteratively regularized Gauss-Newton method. Comput. Math. Math. Phys., 32:1353–1359, 1992.
The following results are from:
• F. Bauer, T. Hohage and A. Munk. Iteratively Regularized Gauss-Newton Method for Nonlinear Inverse Problems with Random Noise. Preprint, under revision for SIAM J. Numer. Anal.

  10. error decomposition
Let $T := F'[a^\dagger]$ and $T_k := F'[\hat a_k]$. The error $E_k = \hat a_k - a^\dagger$ in the $k$-th Newton step can be decomposed into
• an approximation error $E^{\mathrm{app}}_{k+1} := r_{\alpha_{k+1}}(T^*T)\, E_0$,
• a propagated data noise error $E^{\mathrm{noi}}_{k+1} := g_{\alpha_{k+1}}(T_k^* T_k)\, T_k^* (\delta\zeta + \sigma\xi)$,
• and a nonlinearity error
$$E^{\mathrm{nl}}_{k+1} := g_{\alpha_{k+1}}(T_k^* T_k)\, T_k^* \big( F(a^\dagger) - F(\hat a_k) + T_k E_k \big) + \big( r_{\alpha_{k+1}}(T_k^* T_k) - r_{\alpha_{k+1}}(T^* T) \big) E_0,$$
i.e. $E_{k+1} = E^{\mathrm{app}}_{k+1} + E^{\mathrm{noi}}_{k+1} + E^{\mathrm{nl}}_{k+1}$.
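Why these three terms sum to $E_{k+1}$: a short verification, assuming the closed formula of slide 8 and the data model $Y = F(a^\dagger) + \sigma\xi + \delta\zeta$, and using $\hat a_k - a_0 = E_k - E_0$ together with $\lambda\, g_\alpha(\lambda) = 1 - r_\alpha(\lambda)$:

```latex
% Insert Y = F(a^dagger) + delta*zeta + sigma*xi into the closed formula:
\begin{align*}
E_{k+1} &= E_0 + g_{\alpha_{k+1}}(T_k^*T_k)\, T_k^*
           \bigl( F(a^\dagger) - F(\hat a_k) + T_k E_k - T_k E_0
                  + \delta\zeta + \sigma\xi \bigr) \\
        &= \underbrace{\bigl( I - g_{\alpha_{k+1}}(T_k^*T_k)\, T_k^* T_k \bigr) E_0}_{
             =\, r_{\alpha_{k+1}}(T_k^*T_k)\, E_0}
           + E^{\mathrm{noi}}_{k+1}
           + g_{\alpha_{k+1}}(T_k^*T_k)\, T_k^*
             \bigl( F(a^\dagger) - F(\hat a_k) + T_k E_k \bigr).
\end{align*}
% Splitting r_{\alpha_{k+1}}(T_k^*T_k) E_0 = E^{app}_{k+1}
%   + ( r_{\alpha_{k+1}}(T_k^*T_k) - r_{\alpha_{k+1}}(T^*T) ) E_0
% recovers exactly the three terms of the decomposition.
```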

  11. crucial lemma
Lemma. Under certain assumptions discussed below there exists $\gamma_{\mathrm{nl}} > 0$ such that
$$\| E^{\mathrm{nl}}_k \| \le \gamma_{\mathrm{nl}} \big( \| E^{\mathrm{app}}_k \| + \| E^{\mathrm{noi}}_k \| \big), \qquad k = 1, \ldots, K_{\max}.$$

  12. assumptions of the lemma
• source condition: There exists a sufficiently small "source" $w \in Y$ such that $a_0 - a^\dagger = T^* w$.
• $\alpha_0$ sufficiently large such that $\| E_0 \| \le q^{-m} \| E^{\mathrm{app}}_1 \|$.
• Lipschitz condition: For all $a_1, a_2 \in D(F)$,
$$\| F'[a_1] - F'[a_2] \| \le L \| a_1 - a_2 \|.$$
• choice of $K_{\max}$:
$$K_{\max} := \max \big\{ k \in \mathbb{N} : \| E^{\mathrm{noi}}_k \| \le C_{\mathrm{stop}} \sqrt{\alpha_k} \big\}$$

  13. on the proof of the lemma
• The proof uses a straightforward induction argument in $k$.
• The following properties of iterated Tikhonov regularization are used:
  • There exists $\gamma_{\mathrm{app}} > 0$ such that for all $k$
  $$\| E^{\mathrm{app}}_{k+1} \| \le \| E^{\mathrm{app}}_k \| \le \gamma_{\mathrm{app}} \| E^{\mathrm{app}}_{k+1} \|.$$
  This rules out methods with infinite qualification such as Landweber iteration!
  • The propagated data noise is an ordered process in the sense that $\| E^{\mathrm{noi}}_k \| \le \| E^{\mathrm{noi}}_{k+1} \|$ for all $k$.

  14. optimal deterministic rates
Corollary. For deterministic errors ($\sigma = 0$) define the optimal stopping index by
$$K_* := \min \{ K_{\max}, \bar K \}, \qquad \bar K := \operatorname*{argmin}_{k \in \mathbb{N}} \Big( \| E^{\mathrm{app}}_k \| + \frac{\delta}{\sqrt{\alpha_k}} \Big).$$
Then there exist constants $C, \delta_0 > 0$ such that
$$\| \hat a_{K_*} - a^\dagger \| \le C \inf_{k \in \mathbb{N}} \Big( \| E^{\mathrm{app}}_k \| + \frac{\delta}{\sqrt{\alpha_k}} \Big) \qquad \text{for all } \delta \in (0, \delta_0].$$
In particular, under the Hölder source condition $a_0 - a^\dagger = (T^*T)^\mu \tilde w$ with $\mu \in [\frac{1}{2}, m]$ we obtain
$$\| \hat a_{K_*} - a^\dagger \| = O\big( \| \tilde w \|^{\frac{1}{2\mu+1}} \, \delta^{\frac{2\mu}{2\mu+1}} \big).$$
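An empirical check of the Hölder rate, assuming a diagonal (SVD) toy model with one-fold Tikhonov ($m = 1$) and $\sigma = 0$; the spectrum, source element, and the bound $\| E^{\mathrm{app}}_k \| + \delta/\sqrt{\alpha_k}$ mirror the corollary, but all concrete values are illustrative choices.

```python
# Empirical check of the rate O(delta^(2*mu/(2*mu+1))) on a diagonal toy
# model with Hoelder source condition E_0 = (T^*T)^mu * w_tilde, sigma = 0.
import numpy as np

n, mu = 2000, 1.0
lam = np.arange(1, n + 1, dtype=float) ** -2.0    # eigenvalues of T^*T
w = np.arange(1, n + 1, dtype=float) ** -0.6      # source element w_tilde
E0 = lam ** mu * w                                # E_0 = a_0 - a^dagger

best_err = []
deltas = [1e-2, 1e-3, 1e-4, 1e-5]
for delta in deltas:
    bounds = []
    alpha = 1.0
    for _ in range(60):                           # alpha_k = q^k with q = 1/2
        E_app = np.linalg.norm(alpha / (alpha + lam) * E0)  # ||E_k^app||, m = 1
        bounds.append(E_app + delta / np.sqrt(alpha))
        alpha *= 0.5
    best_err.append(min(bounds))

# observed exponent; should approach 2*mu/(2*mu+1) = 2/3 for mu = 1
print(np.log(best_err[0] / best_err[-1]) / np.log(deltas[0] / deltas[-1]))
```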

  15. propagated data noise error
We make the following assumptions on the variance term $V(a, \alpha) := \| g_\alpha(F'[a]^* F'[a])\, F'[a]^* \xi \|^2$:
• There exists a known function $\varphi_{\mathrm{noi}}$ such that
$$\big( \mathbb{E}\, V(a, \alpha) \big)^{1/2} \le \varphi_{\mathrm{noi}}(\alpha) \qquad \text{for all } \alpha \in (0, \alpha_0] \text{ and } a \in D(F).$$
• There are constants $1 < \underline{\gamma}_{\mathrm{noi}} \le \overline{\gamma}_{\mathrm{noi}} < \infty$ such that
$$\underline{\gamma}_{\mathrm{noi}} \le \varphi_{\mathrm{noi}}(\alpha_{k+1}) / \varphi_{\mathrm{noi}}(\alpha_k) \le \overline{\gamma}_{\mathrm{noi}} \qquad \text{for all } k \in \mathbb{N}_0.$$
• (exponential inequality) There exist $\lambda_1, \lambda_2 > 0$ such that for all $a \in D(F)$, $\alpha \in (0, \alpha_0]$, and $\tau \ge 1$:
$$\mathbb{P}\{ V(a, \alpha) \ge \tau\, \mathbb{E}\, V(a, \alpha) \} \le \lambda_1 e^{-\lambda_2 \tau}.$$
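For white noise $\xi$ ($\operatorname{Cov}\xi = I$), the variance term is explicitly computable, $\mathbb{E}\,V(a,\alpha) = \sum_i \lambda_i\, g_\alpha(\lambda_i)^2$ over the eigenvalues $\lambda_i$ of $F'[a]^* F'[a]$, which yields a candidate $\varphi_{\mathrm{noi}}$. A sketch on an illustrative spectrum (white noise is an assumption here, one case covered by the abstract conditions):

```python
# Candidate phi_noi for white noise xi: E V(a, alpha) = sum_i lam_i * g(lam_i)^2.
# The spectrum below is an illustrative choice, not from the slides.
import numpy as np

def phi_noi(alphas, lam, m=1):
    """sqrt(E V(a, alpha)) for white noise and m-fold iterated Tikhonov."""
    out = []
    for alpha in alphas:
        r = (alpha / (alpha + lam)) ** m
        g = (1.0 - r) / lam
        out.append(np.sqrt(np.sum(lam * g ** 2)))
    return np.array(out)

lam = np.arange(1, 1001, dtype=float) ** -2.0   # toy spectrum of F'[a]^* F'[a]
alphas = 0.5 ** np.arange(20)                   # geometric grid alpha_k = q^k
phi = phi_noi(alphas, lam)
ratios = phi[1:] / phi[:-1]   # should stay within [gamma_noi_low, gamma_noi_up] > 1
print(ratios.min(), ratios.max())
```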

  16. optimal rates for known smoothness
Theorem. Assume that $\{ a : \| a - a_0 \| \le 2R \} \subset D(F)$ and define the optimal stopping index
$$\bar K := \operatorname*{argmin}_{k \in \mathbb{N}} \Big( \| E^{\mathrm{app}}_k \| + \frac{\delta}{\sqrt{\alpha_k}} + \sigma \varphi_{\mathrm{noi}}(\alpha_k) \Big).$$
If $\| \hat a_k - a_0 \| \le 2R$ for $k = 1, \ldots, \bar K$, set $K_* := \bar K$; otherwise $K_* := 0$. Then there exist constants $C > 1$ and $\delta_0, \sigma_0 > 0$ such that
$$\big( \mathbb{E} \| \hat a_{K_*} - a^\dagger \|^2 \big)^{1/2} \le C \min_{k \in \mathbb{N}} \Big( \| E^{\mathrm{app}}_k \| + \frac{\delta}{\sqrt{\alpha_k}} + \sigma \varphi_{\mathrm{noi}}(\alpha_k) \Big)$$
for all $\delta \in (0, \delta_0]$ and $\sigma \in (0, \sigma_0]$.
In short: the Newton method achieves the same rate as iterated Tikhonov regularization applied to the linearized problem.

  17. outline
1. A Newton method for nonlinear statistical inverse problems
2. Oracle inequalities
3. Nonparametric instrumental variables and perturbed operators

  18. oracle parameter choice rules
Consider an inverse problem $Y = F(a^\dagger) + \sigma\xi + \delta\zeta$ and a family $\{ R_\alpha : Y \to X \}$ of regularized inverses of $F$. An oracle parameter choice rule $\alpha_{\mathrm{or}}$ for the method $\{ R_\alpha \}$ and the solution $a^\dagger$ is defined by
$$\sup_{\|\zeta\| \le 1} \mathbb{E} \| R_{\alpha_{\mathrm{or}}}(Y) - a^\dagger \|^2 = \inf_\alpha \sup_{\|\zeta\| \le 1} \mathbb{E} \| R_\alpha(Y) - a^\dagger \|^2.$$
An oracle inequality for some given parameter choice rule $\alpha_* = \alpha_*(Y, \sigma, \delta)$ is an estimate of the form
$$\sup_{\|\zeta\| \le 1} \mathbb{E} \| R_{\alpha_*}(Y) - a^\dagger \|^2 \le \chi(\sigma, \delta) \sup_{\|\zeta\| \le 1} \mathbb{E} \| R_{\alpha_{\mathrm{or}}}(Y) - a^\dagger \|^2.$$
In the optimal case $\chi(\sigma, \delta) \to 1$ as $\sigma, \delta \to 0$.
E. Candès. Modern statistical estimation via oracle inequalities. Acta Numerica, 15:257–325, 2006.
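A sketch of the oracle choice on a diagonal toy model, where the risk $\mathbb{E}\| R_\alpha(Y) - a^\dagger \|^2$ of Tikhonov regularization under white noise has a closed form (squared bias plus variance); taking $\delta = 0$ makes the supremum over $\zeta$ trivial. The model and all names are illustrative.

```python
# Oracle parameter choice on a diagonal model with Tikhonov R_alpha and
# white noise (delta = 0): minimize the exact risk over a geometric grid.
import numpy as np

n = 1000
lam = np.arange(1, n + 1, dtype=float) ** -2.0   # eigenvalues of T^*T
a_dagger = np.arange(1, n + 1, dtype=float) ** -1.0
sigma = 1e-3

def risk(alpha):
    """Exact risk of Tikhonov with a_0 = 0: squared bias plus variance."""
    bias2 = np.sum((alpha / (alpha + lam)) ** 2 * a_dagger ** 2)
    var = sigma ** 2 * np.sum(lam / (alpha + lam) ** 2)
    return bias2 + var

alphas = 0.5 ** np.arange(40)
alpha_or = alphas[np.argmin([risk(a) for a in alphas])]  # oracle choice
print(alpha_or, risk(alpha_or))
```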

  19. typical convergence results in deterministic regularization theory
• In deterministic theory, convergence results for parameter choice rules typically contain a comparison with all other reconstruction methods $R : Y \to X$.
• In this case one cannot consider only one $a^\dagger \in X$; otherwise the optimal method would be $R(Y) \equiv a^\dagger$.
• Hence, estimates must be uniform over a smoothness class $S \subset X$, which is typically defined by a source condition. E.g.
$$\sup_{a^\dagger \in S} \sup_{\|\zeta\| \le 1} \| R_{\alpha_*}(F(a^\dagger) + \delta\zeta) - a^\dagger \| \le C \inf_{\tilde R} \sup_{a^\dagger \in S} \sup_{\|\zeta\| \le 1} \| \tilde R(F(a^\dagger) + \delta\zeta) - a^\dagger \|.$$
