Bayesian Inverse Problems and Uncertainty Quantification
  1. Bayesian Inverse Problems and Uncertainty Quantification. Hanne Kekkonen, Centre for Mathematical Sciences, University of Cambridge. June 4, 2019.

  2. Inverse problems arise naturally from applications.

  3. Inverse problems are ill-posed. We want to recover the unknown u from a noisy measurement m:
  m = Au + noise,
  where A is a forward operator that usually causes loss of information.

  4. Inverse problems are ill-posed. Well-posedness as defined by Jacques Hadamard:
  1. Existence: there exists at least one solution.
  2. Uniqueness: there is at most one solution.
  3. Stability: the solution depends continuously on the data.
  Inverse problems are ill-posed: they break at least one of these conditions.

  5. The naive inversion does not produce stable solutions. We want to approximate u from a measurement m = Au + n, where A : X → Y is linear and n is noise. One approach is to use the least squares method
  $\hat u = \arg\min_{u \in X} \| Au - m \|_Y^2.$
  Problem: multiple minima and sensitive dependence on the data m.
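  As a toy numerical illustration (my addition, not from the slides; the Vandermonde matrix is just a convenient badly conditioned stand-in for A), the sketch below shows how naive inversion amplifies even tiny measurement noise:

```python
# Toy sketch: naive inversion of an ill-conditioned forward operator.
# The Vandermonde matrix is an arbitrary badly conditioned stand-in for A.
import numpy as np

rng = np.random.default_rng(0)
d = 20
A = np.vander(np.linspace(0.1, 1.0, d), d, increasing=True)  # cond(A) is huge
u_true = rng.standard_normal(d)
m = A @ u_true + 1e-8 * rng.standard_normal(d)   # almost noise-free measurement

u_naive = np.linalg.solve(A, m)                  # naive inversion u = A^{-1} m
print(f"cond(A) = {np.linalg.cond(A):.1e}")
print(f"reconstruction error = {np.linalg.norm(u_naive - u_true):.1e}")  # blows up
```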

  6. Tikhonov regularisation is a classical method for solving ill-posed problems. We want to approximate u from a measurement m = Au + n, where A : X → Y is linear and n is noise. The problem is ill-posed, so we add a regularising term and get
  $\hat u = \arg\min_{u \in E \subset X} \left( \| Au - m \|_Y^2 + \alpha \| u \|_E^2 \right).$
  Regularisation gives a stable approximate solution for the inverse problem.
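  A minimal numerical sketch of the same idea (again my own toy setup, with E = X = R^d and an arbitrary illustrative value of α):

```python
# Sketch: Tikhonov regularisation via the normal equations,
#   u_hat = argmin ||A u - m||^2 + alpha ||u||^2 = (A^T A + alpha I)^{-1} A^T m.
import numpy as np

def tikhonov(A, m, alpha):
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(d), A.T @ m)

rng = np.random.default_rng(0)
d = 20
A = np.vander(np.linspace(0.1, 1.0, d), d, increasing=True)  # same toy operator
u_true = rng.standard_normal(d)
m = A @ u_true + 1e-8 * rng.standard_normal(d)

u_reg = tikhonov(A, m, alpha=1e-8)     # alpha is an arbitrary illustrative value
print(np.linalg.norm(u_reg - u_true))  # error stays bounded, unlike naive inversion
```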

  7. Bayes' formula combines data and a priori information. We want to reconstruct the most probable u ∈ R^k in light of:
  Measurement information: $M \mid u \sim P_u$ with Lebesgue density $\rho(m \mid u) = \rho_\varepsilon(m - Au)$.
  A priori information: $U \sim \Pi_{pr}$ with Lebesgue density $\pi_{pr}(u)$.
  Bayes' formula: we can update the prior, given a measurement, to a posterior distribution:
  $\pi(u \mid m) \propto \pi_{pr}(u)\, \rho(m \mid u).$
  The result of Bayesian inversion is the posterior distribution π(u | m).

  8. The result of Bayesian inversion is the posterior distribution, but typically one looks at estimates.
  Maximum a posteriori (MAP) estimate: $\arg\max_{u \in \mathbb{R}^n} \pi(u \mid m)$.
  Conditional mean (CM) estimate: $\int_{\mathbb{R}^n} u\, \pi(u \mid m)\, du$.
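  As a hypothetical one-dimensional illustration (the bimodal density below is invented, not from the slides), both estimates can be read off a gridded unnormalised posterior, and they need not coincide:

```python
# Sketch: MAP (posterior mode) vs CM (posterior mean) on a 1D grid.
import numpy as np

u = np.linspace(-3.0, 3.0, 2001)
du = u[1] - u[0]
# Invented bimodal unnormalised posterior density pi(u | m)
post = np.exp(-0.5 * (u - 1.0) ** 2) + 0.6 * np.exp(-8.0 * (u + 1.0) ** 2)

u_map = u[np.argmax(post)]            # MAP: where the density is highest
density = post / (post.sum() * du)    # normalise to a probability density
u_cm = (u * density).sum() * du       # CM: mean of the posterior

print(f"MAP = {u_map:.2f}, CM = {u_cm:.2f}")  # noticeably different here
```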

  9. Gaussian example. Assume we are interested in the measurement model M = AU + N, where:
  A : X → Y, with X = R^d and Y = R^k.
  N is white Gaussian noise.
  U follows a Gaussian prior.
  The posterior has density
  $\pi^m(u) = \pi(u \mid m) \propto \exp\left( -\tfrac{1}{2} \| m - Au \|_{\mathbb{R}^k}^2 - \tfrac{1}{2} \| u \|_\Sigma^2 \right).$
  We can use the mean of the posterior as a point estimator, but having the whole posterior allows uncertainty quantification.
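  In this linear-Gaussian setting the posterior is available in closed form. A minimal sketch, assuming unit noise covariance and prior covariance Σ (both arbitrary illustrative choices here):

```python
# Sketch: closed-form Gaussian posterior for m = A u + n, n ~ N(0, I), u ~ N(0, Sigma):
#   cov_post = (A^T A + Sigma^{-1})^{-1},   mean_post = cov_post A^T m.
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 20
A = rng.standard_normal((k, d))            # toy forward operator
Sigma = np.eye(d)                          # assumed prior covariance
u_true = rng.standard_normal(d)
m = A @ u_true + rng.standard_normal(k)    # noisy measurement

cov_post = np.linalg.inv(A.T @ A + np.linalg.inv(Sigma))
mean_post = cov_post @ A.T @ m             # posterior mean = MAP here
std_post = np.sqrt(np.diag(cov_post))      # componentwise posterior uncertainty
```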

  10. Why are we interested in uncertainty quantification?

  11. Uncertainty quantification has many applications. Studying the whole posterior distribution instead of just a point estimate offers us more information: uncertainty quantification, confidence and credible sets, using the whole posterior. Applications include weather and climate predictions, geological sensing, and Bayesian search theory.
  Figure: Search for the wreckage of Air France flight AF 447, Stone et al.

  12. What do we mean by uncertainty quantification?
  -I'm going to die?
  -POSSIBLY.
  -Possibly? You turn up when people are possibly going to die?
  -OH, YES. IT'S QUITE THE NEW THING. IT'S BECAUSE OF THE UNCERTAINTY PRINCIPLE.
  -What's that?
  -I'M NOT SURE.
  -That's very helpful.
  -I THINK IT MEANS PEOPLE MAY OR MAY NOT DIE. I HAVE TO SAY IT'S PLAYING HOB WITH MY SCHEDULE, BUT I TRY TO KEEP UP WITH MODERN THOUGHT.
  -Terry Pratchett, The Fifth Elephant

  13. Bayesian credible set. A Bayesian credible set is a region in the posterior distribution that contains a large fraction of the posterior mass.
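  For instance (with a stand-in Gaussian "posterior" below; in practice the draws would come from the actual posterior, e.g. via MCMC), a central 95% credible interval can be read off posterior samples:

```python
# Sketch: a central 95% credible interval from posterior draws.
import numpy as np

rng = np.random.default_rng(0)
draws = rng.normal(loc=0.3, scale=0.5, size=100_000)  # stand-in posterior samples
lo, hi = np.quantile(draws, [0.025, 0.975])
mass = np.mean((draws >= lo) & (draws <= hi))         # ~0.95 of the posterior mass
print(f"C = [{lo:.2f}, {hi:.2f}], posterior mass ~ {mass:.3f}")
```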

  14. Frequentist confidence region.

  15. Consistency of a Bayesian solution. Once we have obtained a Bayesian solution, the natural next step is to consider the consistency of the solution.
  Convergence of a point estimator to the 'true' u†.
  Contraction of the posterior distribution: do we have
  $\Pi( u : d(u, u^\dagger) > \delta_n \mid m ) \to 0$ in $P_{u^\dagger}$-probability,
  for some $\delta_n \to 0$, as the sample size $n \to \infty$?
  Is an optimal contraction rate enough to guarantee that the Bayesian credible sets have correct frequentist coverage?

  16. Credible sets do not necessarily cover the truth well.
  Figure: credible sets under a correctly specified prior vs. a prior misspecified on the boundary. Monard, Nickl & Paternain, The Annals of Statistics, 2019.

  17. Do credible sets quantify frequentist uncertainty? Do we have, for C = C(m),
  $\Pi( u \in C \mid m ) \approx 0.95 \iff P_{u^\dagger}\big( u^\dagger \in C(m^\dagger) \big) \approx 0.95\,?$

  18. Bernstein–von Mises Theorem (BvM). For large sample size n, with $\hat u_{MLE}$ being the maximum likelihood estimator,
  $\Pi( \cdot \mid m ) \approx N\big( \hat u_{MLE}, \tfrac{1}{n} I(u^\dagger)^{-1} \big)$, for $M \sim P_{u^\dagger}$,
  whenever $u^\dagger \in O \subset \mathbb{R}^d$, the prior Π has positive density on O, and the Fisher information $I(u^\dagger)$ is invertible.

  19. BvM guarantees confident credible sets. The contraction rate of the posterior distribution near u† is
  $\Pi\big( u : \| u - u^\dagger \|_{\mathbb{R}^d} \ge \tfrac{L_n}{\sqrt n} \,\big|\, m \big) \to 0$ in $P_{u^\dagger}$-probability as $L_n, n \to \infty$.
  For a fixed d and large n, computing posterior probabilities is roughly the same as computing them from $N\big( \hat u_{MLE}, \tfrac{1}{n} I(u^\dagger)^{-1} \big)$.

  20. BvM guarantees confident credible sets.
  $C_n$ s.t. $\Pi( u \in C_n \mid M ) = 0.95 \implies P_{u^\dagger}\big( u^\dagger \in C_n \big) \to 0.95$
  (Bayesian credible set ⟹ frequentist confidence set), with
  $| C_n |_{\mathbb{R}^d} = O_{P_{u^\dagger}}\big( \tfrac{1}{\sqrt n} \big)$ (optimal diameter).

  21. Asymptotic normality of the Tikhonov regulariser. We return to the Gaussian example, where the posterior is also Gaussian. The posterior mean $\hat u$ equals the MAP estimate, which equals the Tikhonov regulariser
  $\hat u = \arg\min_u \big( \| Au - m \|_{\mathbb{R}^k}^2 + \| u \|_\Sigma^2 \big).$
  Then the following convergence holds under $P_{u^\dagger}$:
  $\sqrt n\, ( \hat u - u^\dagger ) \to Z \sim N( 0, I(u^\dagger)^{-1} )$ as $n \to \infty$.
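  This can be checked empirically in the simplest possible case. The sketch below (my own toy setup: direct scalar observations, A the identity, a standard Gaussian prior) simulates √n(û − u†) across repeated experiments; its mean and standard deviation should be close to 0 and 1, since I(u†) = 1 here:

```python
# Sketch: empirical check of sqrt(n)(u_hat - u_true) ~ N(0, 1) for the
# scalar model M_i = u + N_i with prior N(0, 1), where u_hat is the
# posterior mean (= Tikhonov estimate with unit penalty).
import numpy as np

rng = np.random.default_rng(0)
u_true, n_obs, n_reps = 0.7, 500, 5000
z = np.empty(n_reps)
for i in range(n_reps):
    m = u_true + rng.standard_normal(n_obs)   # n direct noisy observations
    u_hat = m.sum() / (n_obs + 1.0)           # posterior mean under N(0, 1) prior
    z[i] = np.sqrt(n_obs) * (u_hat - u_true)
print(f"mean ~ {z.mean():.2f}, sd ~ {z.std():.2f}")  # approx 0 and 1
```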

  22. Confident credible sets. We can now construct a confidence set from the Tikhonov regulariser. Consider a credible set
  $C_n = \big\{ u \in \mathbb{R}^d : \| u - \hat u \| \le \tfrac{R_n}{\sqrt n} \big\}$, with $R_n$ s.t. $\Pi( C_n \mid m ) = 0.95$.
  Then the frequentist coverage probability of $C_n$ will satisfy
  $R_n \to \Phi^{-1}(0.95)$ in $P_{u^\dagger}$-probability and $P_{u^\dagger}\big( u^\dagger \in C_n \big) \to 0.95$ as $n \to \infty$.
  Here $\Phi^{-1}$ is a continuous inverse of $\Phi = P( Z \le \cdot )$ with $Z \sim N( 0, I(u^\dagger)^{-1} )$.
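  Continuing the same toy scalar model (again my own illustrative setup, not from the slides), the frequentist coverage statement can be verified by simulation:

```python
# Sketch: empirical frequentist coverage of the 95% credible interval
# around the posterior mean for M_i = u + N_i, prior N(0, 1).
import numpy as np

rng = np.random.default_rng(0)
u_true, n_obs, n_reps = 0.7, 200, 2000
covered = 0
for _ in range(n_reps):
    m = u_true + rng.standard_normal(n_obs)
    prec = n_obs + 1.0                        # posterior precision
    mean = m.sum() / prec                     # posterior mean
    sd = np.sqrt(1.0 / prec)                  # posterior standard deviation
    covered += (mean - 1.96 * sd <= u_true <= mean + 1.96 * sd)
print(f"coverage ~ {covered / n_reps:.3f}")   # close to 0.95
```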

  23. The discretisation of m is given by the measurement device, but the discretisation of u can be chosen freely: m ∈ R^k, u ∈ R^n. (Figure: k = 4, n = 48.)

  24. The discretisations are independent. (Figure: k = 8, n = 156.)

  25. The discretisations are independent. (Figure: k = 24, n = 440.)

  26. The measurement is always discrete but the unknown is usually a continuous function: m ∈ R^4, u ∈ L^2.

  27. We often want to use a continuous model for the theory: m = Au + ε.

  28. Nonparametric models. In many applications it is natural to use a statistical regression model
  $M_i = (AU)(x_i) + N_i, \quad i = 1, \ldots, n, \quad N_i \sim N(0, 1),$
  where $x_i \in O$ are measurement points and A is a forward operator. The goal is to infer U from the data $(M_i)$.

  29. For the theory we use a continuous model, which corresponds to the $(x_i)$ growing dense in the domain O. If W is a Gaussian white noise process in the Hilbert space H, then
  $M = AU + \varepsilon W, \quad \varepsilon = \tfrac{1}{\sqrt n} \text{ noise level}, \quad M \sim P_{u^\dagger}.$
  Note that usually $Au \in L^2$ but $W \in H^{-s}$ only with $s > d/2$.
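  A short sketch of simulating data from the discrete regression model above (the integration-type forward operator and the true u below are my own illustrative choices):

```python
# Sketch: data from M_i = (A u)(x_i) + N_i on O = [0, 1], with A a
# smoothing (integration) operator as a stand-in forward map.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = np.linspace(0.0, 1.0, n)           # measurement points x_i
u = np.sin(2 * np.pi * x)              # assumed true unknown u
Au = np.cumsum(u) / n                  # (A u)(x) ~ integral of u from 0 to x
M = Au + rng.standard_normal(n)        # observations with N(0, 1) noise
```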

  30. Gaussian priors are often used for inverse problems in practice: see e.g. Kaipio & Somersalo (2005), Stuart (2010), Dashti & Stuart (2016).

  31. Using the Cameron–Martin theorem we can formally write
  $d\Pi( u \mid m ) \propto e^{\ell(u)}\, d\Pi(u) \propto e^{\ell(u) - \frac{1}{2} \| u \|_{V_\Pi}^2},$
  where $\ell(u) = \tfrac{1}{\varepsilon^2} \langle m, Au \rangle - \tfrac{1}{2\varepsilon^2} \| Au \|^2$, and $V_\Pi$ denotes the Cameron–Martin space of Π.
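  To make the Gaussian prior concrete, the sketch below draws sample paths from a Gaussian prior on a grid; the squared-exponential covariance and its length scale are illustrative assumptions, not choices made in the slides:

```python
# Sketch: samples from a Gaussian prior N(0, K) on a grid, with a
# squared-exponential covariance kernel (an assumed illustrative choice).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
ell = 0.1                                         # assumed length scale
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / ell**2)
K += 1e-6 * np.eye(x.size)                        # jitter for Cholesky stability
L = np.linalg.cholesky(K)                         # K = L L^T
samples = L @ rng.standard_normal((x.size, 5))    # five prior draws u ~ N(0, K)
```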
