Introduction to Bayesian inversion & MCMC



1. Introduction to Bayesian inversion & MCMC
Ville Kolehmainen, Department of Applied Physics, University of Eastern Finland, Kuopio, Finland
CoE Summer School, Helsinki, June 2019

2. Contents
◮ Introduction
◮ Bayesian inversion
◮ Gaussian case
◮ About MAP estimate
◮ MCMC
◮ Numerical examples

3. What are inverse problems?
◮ Consider measurement problems: estimate the quantity of interest x ∈ R^n from a (noisy) measurement of A(x) ∈ R^d, where A is a known mapping.
◮ Inverse problems are characterized as those measurement problems which are ill-posed:
  1. The problem is non-unique (e.g., fewer measurements than unknowns).
  2. The solution is unstable w.r.t. measurement noise and modelling errors (e.g., model reduction, inaccurately known nuisance parameters in the model A(x)).

4. An introductory example
◮ Consider 2D deconvolution (image deblurring): given a noisy and blurred image m ∈ R^n, m = Ax + e, the objective is to reconstruct the original image x ∈ R^n.
◮ The forward model x ↦ Ax implements discrete convolution (here the convolution kernel is a Gaussian blurring kernel with std of 6 pixels).
Figure: Left, original image x. Right, observed image m = Ax + e.
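As a rough sketch of how such data can be simulated (a minimal example: the block-shaped test image and the noise level are choices made here for illustration, not the image or noise level of the slides), the forward model amounts to Gaussian filtering plus additive noise:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)

# Hypothetical 50x50 test scene; the slides use their own image.
n_side = 50
x_true = np.zeros((n_side, n_side))
x_true[15:35, 20:30] = 1.0

# Forward model x -> Ax: Gaussian blurring kernel with std of 6 pixels,
# followed by additive measurement noise e (noise level assumed here).
blurred = gaussian_filter(x_true, sigma=6.0)
m = blurred + 0.01 * rng.standard_normal(blurred.shape)
```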

5. Example (cont.)
◮ The matrix A has a trivial null-space, null(A) = {0} ⇒ the solution is unique (i.e., ∃ (A^T A)^{-1}).
◮ However, the problem is unstable: A has an effectively non-trivial ("numerical") null-space, i.e., ‖Aw‖ = ‖∑_k σ_k ⟨w, v_k⟩ u_k‖ ≈ 0 for certain ‖w‖ = 1.
◮ The figure shows the least squares (LS) solution x_LS = arg min_x ‖m − Ax‖^2 ⇒ x_LS = (A^T A)^{-1} A^T m.
Figure: Left, true image x. Right, LS solution x_LS.
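The instability can be demonstrated with a small numerical sketch. To keep the matrix A explicit and small, this uses an assumed 1D deconvolution analogue of the slides' 2D example (grid size and noise level are choices of the sketch):

```python
import numpy as np

rng = np.random.default_rng(1)

# 1D deconvolution analogue: rows of A are a discrete Gaussian kernel (std 6).
n = 100
t = np.arange(n)
A = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 6.0) ** 2)
A /= A.sum(axis=1, keepdims=True)

x_true = np.zeros(n)
x_true[30:60] = 1.0
m = A @ x_true + 0.01 * rng.standard_normal(n)

# Least-squares solution: formally unique, but A^T A is extremely
# ill-conditioned, so the small noise in m is amplified enormously and the
# reconstruction is typically a wildly oscillating, useless signal.
x_ls = np.linalg.solve(A.T @ A, A.T @ m)
print(np.linalg.cond(A.T @ A), np.linalg.norm(x_ls - x_true))
```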

6. Regularization
◮ The ill-posed problem is replaced by a well-posed approximation, whose solution is "hopefully" close to the true solution.
◮ Typically these are modifications of the associated LS problem min_x ‖m − Ax‖_2^2.
◮ Examples of methods: Tikhonov regularization, truncated iterations.
◮ Consider the LS problem
  x_LS = arg min_x { ‖m − Ax‖_2^2 } ⇒ (A^T A) x_LS = A^T m, with B := A^T A.
  Uniqueness: ∃ B^{-1} if null(A) = {0}. Stability of the solution?

7. Regularization
◮ Example (cont.): in Tikhonov regularization, the LS problem is replaced with
  x_TIK = arg min_x { ‖m − Ax‖_2^2 + α ‖x‖_2^2 } ⇒ (A^T A + α I) x_TIK = A^T m.
  Uniqueness: α > 0 ⇒ null(A^T A + α I) = {0} ⇒ the inverse exists. Stability of the solution is guaranteed by choosing α "large enough".
◮ Regularization imposes (implicit) prior assumptions about the solution. These assumptions are sometimes well hidden.
Figure: Left, true image x. Right, Tikhonov solution x_TIK.
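A corresponding sketch of the Tikhonov solution, reusing the same assumed 1D blur setup as in the least-squares sketch above; the value of α is an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same assumed 1D blur setup as in the least-squares sketch.
n = 100
t = np.arange(n)
A = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 6.0) ** 2)
A /= A.sum(axis=1, keepdims=True)
x_true = np.zeros(n)
x_true[30:60] = 1.0
m = A @ x_true + 0.01 * rng.standard_normal(n)

# Tikhonov solution: (A^T A + alpha I) x_TIK = A^T m.  The shift alpha I makes
# the system matrix invertible and suppresses the noise amplification.
alpha = 1e-2   # assumed value; in practice alpha is tuned to the noise level
x_tik = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ m)
print(np.linalg.norm(x_tik - x_true))   # far smaller error than the LS solution
```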

8. Statistical (Bayesian) inversion
◮ The inverse problem is recast as a problem of Bayesian inference. The key idea is to extract estimates and assess uncertainty about the unknowns based on
  ◮ measured data,
  ◮ a model of the measurement process,
  ◮ a model of a priori information.
◮ Ill-posedness is removed by explicit use of prior models!
◮ Systematic handling of model uncertainties and model reduction (⇒ approximation error theory).
Figure: Left, true image x. Right, Bayesian estimate x_CM.

9. Bayesian inversion
◮ All variables in the model are considered as random variables. The randomness reflects our uncertainty about their actual values.
◮ The degree of uncertainty is coded in the probability distribution models of these variables.
◮ The complete model of the inverse problem is the posterior probability distribution
  π(x | m) = π_pr(x) π(m | x) / π(m),  m = m_observed,  (1)
  where
  ◮ π(m | x) is the likelihood density: the model of the measurement process;
  ◮ π_pr(x) is the prior density: the model for a priori information;
  ◮ π(m) is a normalization constant.
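In computations one typically works with the unnormalized (log-)posterior, since π(m) does not depend on x. A minimal sketch for a linear forward model with Gaussian noise and a zero-mean Gaussian prior (the noise and prior standard deviations are placeholder values):

```python
import numpy as np

def log_posterior(x, m, A, noise_std=0.1, prior_std=1.0):
    """Unnormalized log pi(x|m) = log pi(m|x) + log pi_pr(x) + const."""
    log_likelihood = -0.5 * np.sum((m - A @ x) ** 2) / noise_std ** 2
    log_prior = -0.5 * np.sum(x ** 2) / prior_std ** 2
    return log_likelihood + log_prior   # pi(m) only shifts this by a constant
```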

10. ◮ The posterior density model is a function on an n-dimensional space, π(x | m): R^n → R_+, where n is usually large.
◮ To obtain a "practical solution", the posterior model is summarized by estimates that answer questions such as:
  ◮ "What is the most probable value of x over π(x | m)?"
  ◮ "What is the mean of x over π(x | m)?"
  ◮ "In what interval are the values of x with 90% (posterior) probability?"

11. Point estimates
◮ Maximum a posteriori (MAP) estimate:
  x_MAP = arg max_{x ∈ R^n} π(x | m).
◮ Conditional mean (CM) estimate:
  x_CM = ∫_{R^n} x π(x | m) dx.

12. Maximum a posteriori (MAP) estimate:
  x_MAP = arg max_{x ∈ R^n} π(x | m).
Conditional mean (CM) estimate:
  x_CM = ∫_{R^n} x π(x | m) dx.
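For a toy one-dimensional posterior the two point estimates can be computed directly. The sketch below uses an assumed skewed mixture density purely to show that MAP and CM generally differ; in realistic, high-dimensional problems the integral defining x_CM is evaluated with MCMC:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Assumed toy posterior density on R (a two-component Gaussian mixture).
def posterior(x):
    return 0.7 * norm.pdf(x, 0.0, 0.5) + 0.3 * norm.pdf(x, 2.0, 1.0)

# MAP estimate: maximize pi(x|m), i.e. minimize its negative.
x_map = minimize(lambda x: -posterior(x[0]), x0=[0.0]).x[0]

# CM estimate: integrate x * pi(x|m) on a grid (quadrature replaces MCMC here).
grid = np.linspace(-5.0, 8.0, 4001)
w = posterior(grid)
x_cm = np.sum(grid * w) / np.sum(w)

print(x_map, x_cm)   # the two estimates do not coincide for a skewed posterior
```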

13. Spread estimates
◮ Covariance:
  Γ_{x|m} = ∫_{R^n} (x − x_CM)(x − x_CM)^T π(x | m) dx.
◮ Confidence intervals: given 0 < τ < 100, compute a_k and b_k such that
  ∫_{−∞}^{a_k} π_k(x_k) dx_k = ∫_{b_k}^{∞} π_k(x_k) dx_k = (100 − τ)/200,
  where π_k(x_k) is the marginal density
  π_k(x_k) = ∫_{R^{n−1}} π(x | m) dx_1 ⋯ dx_{k−1} dx_{k+1} ⋯ dx_n.
  The interval I_k(τ) = [a_k, b_k] contains τ% of the mass of the marginal density.
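A sketch of computing I_k(τ) for a single component when (an approximation of) its marginal density is available on a grid; the Gaussian marginal below is an assumed stand-in for whatever π_k(x_k) the posterior actually produces:

```python
import numpy as np
from scipy.stats import norm

tau = 90.0
grid = np.linspace(-4.0, 4.0, 2001)
pdf = norm.pdf(grid, loc=0.35, scale=0.4)   # assumed marginal pi_k(x_k)

# Cumulative mass along the grid, normalized so that it ends at 1.
cdf = np.cumsum(pdf)
cdf /= cdf[-1]

tail = (100.0 - tau) / 200.0                # mass left outside each endpoint
a_k = grid[np.searchsorted(cdf, tail)]
b_k = grid[np.searchsorted(cdf, 1.0 - tail)]
print(a_k, b_k)                             # I_k(90) holds ~90% of the marginal mass
```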

14. Illustration of confidence intervals
◮ x = (x_1, x_2)^T ∈ R^2.
◮ π(x_1 | m) = ∫ π(x | m) dx_2 (middle).
◮ π(x_2 | m) = ∫ π(x | m) dx_1 (right).
◮ Solid vertical lines show the mean value x_CM, dotted lines the 90% confidence limits.
Figure: Left, contours of π(x | m), with x_CM marked by +; E(x_1) = −0.0094962, E(x_2) = 0.3537. Middle, marginal density π(x_1 | m). Right, marginal density π(x_2 | m).

15. Gaussian π(x | m)
◮ Let m = Ax + e, with e ∼ N(e_*, Γ_e) independent of x, and π_pr(x) = N(x_*, Γ_x). Then
  π(x | m) ∝ π(m | x) π_pr(x)
           = exp( −(1/2)(m − Ax − e_*)^T Γ_e^{−1} (m − Ax − e_*) − (1/2)(x − x_*)^T Γ_x^{−1} (x − x_*) ).
◮ π(x | m) is fully determined by the mean and covariance:
  x_CM = Γ_{x|m} ( A^T Γ_e^{−1} (m − e_*) + Γ_x^{−1} x_* ),
  Γ_{x|m} = ( A^T Γ_e^{−1} A + Γ_x^{−1} )^{−1}.
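These formulas translate directly into a few lines of linear algebra. In the sketch below, the model sizes and covariances are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed small linear-Gaussian model m = A x + e.
n, d = 20, 15
A = rng.standard_normal((d, n))
x_star, Gamma_x = np.zeros(n), np.eye(n)           # prior N(x_*, Gamma_x)
e_star, Gamma_e = np.zeros(d), 0.01 * np.eye(d)    # noise N(e_*, Gamma_e)

x_true = rng.standard_normal(n)
m = A @ x_true + rng.multivariate_normal(e_star, Gamma_e)

Ge_inv = np.linalg.inv(Gamma_e)
Gx_inv = np.linalg.inv(Gamma_x)

# Posterior covariance and mean exactly as on the slide.
Gamma_post = np.linalg.inv(A.T @ Ge_inv @ A + Gx_inv)
x_cm = Gamma_post @ (A.T @ Ge_inv @ (m - e_star) + Gx_inv @ x_star)
```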

16. ◮ Let L_e^T L_e = Γ_e^{−1} and L_x^T L_x = Γ_x^{−1}. Then we can write
  π(x | m) ∝ exp{ −(1/2) F(x) },
  where
  F(x) = ‖L_e (m − Ax − e_*)‖_2^2 + ‖L_x (x − x_*)‖_2^2
  and
  x_MAP = arg min_x F(x).
  ⇒ Connection to Tikhonov regularization!
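The same posterior can be written through F(x) and minimized numerically; because the model is linear-Gaussian, the minimizer agrees (up to optimizer tolerance) with the posterior mean computed in the previous sketch. The setup and sizes are again assumed:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Same assumed model as in the previous sketch.
n, d = 20, 15
A = rng.standard_normal((d, n))
x_star, e_star = np.zeros(n), np.zeros(d)
Gamma_x, Gamma_e = np.eye(n), 0.01 * np.eye(d)
x_true = rng.standard_normal(n)
m = A @ x_true + rng.multivariate_normal(e_star, Gamma_e)

# Matrix square roots: L_e^T L_e = Gamma_e^{-1}, L_x^T L_x = Gamma_x^{-1}.
L_e = np.linalg.cholesky(np.linalg.inv(Gamma_e)).T
L_x = np.linalg.cholesky(np.linalg.inv(Gamma_x)).T

def F(x):
    r_e = L_e @ (m - A @ x - e_star)
    r_x = L_x @ (x - x_star)
    return r_e @ r_e + r_x @ r_x

# MAP estimate by minimizing F; for this Gaussian posterior it coincides with x_CM.
x_map = minimize(F, x0=np.zeros(n)).x
```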

17. Example
◮ Consider the original form of Tikhonov regularization:
  x_TIK = arg min_x { ‖m − Ax‖_2^2 + α ‖x‖_2^2 } ⇒ x_TIK = (A^T A + α I)^{−1} A^T m.
◮ From the Bayesian viewpoint, x_TIK corresponds to x_CM under the following assumptions:
  ◮ the measurement model is m = Ax + e, with x and e mutually independent and e ∼ N(0, I);
  ◮ the components of x are a priori mutually independent, zero-mean white noise with variance 1/α.
◮ The original idea of the Tikhonov method was to approximate A^T A with a matrix A^T A + α I that is invertible and produces a stable solution.
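The correspondence can be checked numerically: with Γ_e = I and Γ_x = (1/α) I, the posterior-mean formula of slide 15 reduces to the Tikhonov solution (the sizes and the value of α below are arbitrary choices for the check):

```python
import numpy as np

rng = np.random.default_rng(3)

n, d, alpha = 20, 15, 0.5
A = rng.standard_normal((d, n))
m = rng.standard_normal(d)

# Tikhonov solution.
x_tik = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ m)

# Gaussian posterior mean with e ~ N(0, I) and x ~ N(0, (1/alpha) I),
# i.e. Gamma_e^{-1} = I and Gamma_x^{-1} = alpha I, zero prior/noise means.
Ge_inv = np.eye(d)
Gx_inv = alpha * np.eye(n)
x_cm = np.linalg.solve(A.T @ Ge_inv @ A + Gx_inv, A.T @ Ge_inv @ m)

print(np.allclose(x_tik, x_cm))   # True: the two solutions coincide
```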
