

1. The Convergence of the Laplace Approximation and Noise-Level-Robust Monte Carlo Methods for Bayesian Inverse Problems
   Daniel Rudolf, Claudia Schillings, Björn Sprungk, Philipp Wacker
   Institute of Mathematical Stochastics, University of Göttingen
   Workshop "Optimization and Inversion under Uncertainty", RICAM Linz, November 15th, 2019

2. Bayesian Inverse Problems
   Infer an unknown $x \in \mathbb{R}^d$ given noisy observations of the forward map $G : \mathbb{R}^d \to \mathbb{R}^J$:
   $$y = G(x) + \varepsilon, \qquad \varepsilon \sim N(0, n^{-1}\Sigma), \quad n \in \mathbb{N}.$$
   Given a prior measure $\mu_0$ for $x$, here $\mu_0 = N(0, C_0)$, we obtain the posterior
   $$\mu_n(\mathrm{d}x) = \frac{1}{Z_n}\exp(-n\Phi(x))\,\mu_0(\mathrm{d}x), \qquad \Phi(x) = \frac{1}{2}\,|y - G(x)|^2_{\Sigma^{-1}},$$
   where $Z_n := \int_{\mathbb{R}^d} e^{-n\Phi(x)}\,\mathrm{d}\mu_0$.
   Objective: Sample (approximately) from $\mu_n$ and compute
   $$\mathbb{E}_{\mu_n}[f] = \int_{\mathbb{R}^d} f(x)\,\mu_n(\mathrm{d}x), \qquad f \in L^1_{\mu_0}(\mathbb{R}^d).$$
   In this talk we are interested in the case of increasing precision $n \to \infty$.
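To make the setup concrete, the following minimal Python sketch evaluates the unnormalized posterior density $\exp(-n\Phi(x))$ times the Gaussian prior density for a small toy problem. The forward map is borrowed from Example 1 on the examples slide below; the data $y$ and the covariances $\Sigma$, $C_0$ are illustrative assumptions, not values from the talk.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy problem (assumed for illustration): d = 2, J = 2.
def G(x):
    """Forward map G: R^2 -> R^2 (Example 1 from the examples slide)."""
    return np.array([np.exp(0.2 * (x[1] - x[0])), np.sin(x[1] - x[0])])

C0 = np.eye(2)             # prior covariance, mu_0 = N(0, C_0)
Sigma = np.eye(2)          # noise covariance (before scaling by 1/n)
Sigma_inv = np.linalg.inv(Sigma)
y = np.array([1.0, 0.1])   # observed data (made up for illustration)
prior = multivariate_normal(mean=np.zeros(2), cov=C0)

def Phi(x):
    """Potential Phi(x) = 0.5 * |y - G(x)|^2_{Sigma^{-1}}."""
    r = y - G(x)
    return 0.5 * r @ Sigma_inv @ r

def unnormalized_posterior_density(x, n):
    """Lebesgue density of mu_n up to the normalization constant Z_n."""
    return np.exp(-n * Phi(x)) * prior.pdf(x)

# Example: density values at a test point for increasing noise precision n.
x_test = np.array([0.5, -0.2])
for n in [1, 10, 100]:
    print(n, unnormalized_posterior_density(x_test, n))
```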

3.–7. Computational Bayesian Inference
   Computational methods for approximate sampling or integration w.r.t. $\mu$:
   - Markov chain Monte Carlo,
   - importance sampling,
   - sequential Monte Carlo and particle filters,
   - quasi-Monte Carlo and numerical quadrature, ...
   Common computational challenges:
   1. Expensive evaluation of the forward model $G$ → multilevel or surrogate methods
   2. High-dimensional or even infinite-dimensional state space, e.g., function spaces → intense research in recent years for all mentioned methods
   3. Concentrated $\mu_n$ due to informative data, i.e., $n \gg 1$ or $J \gg 1$ → analyzed so far in [Beskos et al., 2018] and [Schillings & Schwab, 2016]

8. Outline
   1. Laplace Approximation
   2. Markov Chain Monte Carlo
   3. Importance Sampling
   4. Quasi-Monte Carlo

9. Next: Laplace Approximation

10.–12. General Approach for Noise-Level-Robust Sampling
   Prior-based sampling or integration will suffer from the increasing difference between $\mu_n$ and $\mu_0$ as $n \to \infty$, i.e.,
   $$\frac{\mathrm{d}\mu_n}{\mathrm{d}\mu_0} \propto e^{-n\Phi}, \qquad \mu_n \to \delta_{\operatorname{argmin}\Phi}, \qquad d_{\mathrm{TV}}(\mu_n, \mu_0) \to 1.$$
   Idea: Base sampling methods on a suitable (simple) reference measure mimicking the (increasing) concentration of $\mu_n$.
   Here, the Laplace approximation of $\mu_n$: $\mathcal{L}\mu_n := N(x_n, C_n)$ with
   $$x_n := \operatorname*{argmin}_{x}\; n\,\Phi(x) + \tfrac{1}{2}\|C_0^{-1/2}x\|^2, \qquad C_n := \left(n\,\nabla^2\Phi(x_n) + C_0^{-1}\right)^{-1}.$$
   A very common approximation in Bayesian statistics and OED ([Long et al., 2013], [Alexanderian et al., 2016], [Chen & Ghattas, 2017], ...).
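A minimal numerical sketch of computing the Laplace approximation: $x_n$ is the minimizer of $n\Phi(x) + \frac{1}{2}\|C_0^{-1/2}x\|^2$ (the MAP estimate) and $C_n$ is obtained by inverting $n\nabla^2\Phi(x_n) + C_0^{-1}$. The derivative-free optimizer and the finite-difference Hessian below are convenience choices for illustration; in practice one would use gradient and Hessian information from the forward model.

```python
import numpy as np
from scipy.optimize import minimize

def hessian_fd(f, x, eps=1e-4):
    """Finite-difference Hessian of a scalar function f at x (illustrative)."""
    d = x.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d); ei[i] = eps
            ej = np.zeros(d); ej[j] = eps
            H[i, j] = (f(x + ei + ej) - f(x + ei) - f(x + ej) + f(x)) / eps**2
    return 0.5 * (H + H.T)   # symmetrize

def laplace_approximation(Phi, C0, n, x0):
    """Return (x_n, C_n) of the Laplace approximation L mu_n = N(x_n, C_n):
    x_n = argmin n*Phi(x) + 0.5*||C0^{-1/2} x||^2,
    C_n = (n * Hess Phi(x_n) + C0^{-1})^{-1}.
    """
    C0_inv = np.linalg.inv(C0)
    objective = lambda x: n * Phi(x) + 0.5 * x @ C0_inv @ x
    res = minimize(objective, x0, method="Nelder-Mead",
                   options={"xatol": 1e-10, "fatol": 1e-12})
    x_n = res.x
    C_n = np.linalg.inv(n * hessian_fd(Phi, x_n) + C0_inv)
    return x_n, C_n

# Usage with the toy Phi and C0 from the previous snippet (assumed available):
# x_n, C_n = laplace_approximation(Phi, C0, n=100, x0=np.zeros(2))
```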

13.–14. Laplace's Method for Asymptotics of Integrals
   [Laplace, 1774], [Wong, 2001]: Considering integrals
   $$J(n) := \int_D f(x)\,\exp(-n\Phi(x))\,\mathrm{d}x, \qquad D \subseteq \mathbb{R}^d,$$
   with sufficiently smooth $f$ and $\Phi$, we have, under suitable conditions, as $n \to \infty$,
   $$J(n) = e^{-n\Phi(x_\star)}\, n^{-d/2}\left(\frac{f(x_\star)}{\sqrt{\det\!\big(\tfrac{1}{2\pi}H_\star\big)}} + O(n^{-1})\right),$$
   where $x_\star := \operatorname*{argmin}_{x \in \mathbb{R}^d}\Phi(x) \in D$ and $H_\star := \nabla^2\Phi(x_\star) > 0$.
   Yields: Given a smooth Lebesgue density of $\mu_0$, then for suitable $f$,
   $$\left|\int_{\mathbb{R}^d} f\,\mathrm{d}\mu_n - \int_{\mathbb{R}^d} f\,\mathrm{d}N\!\big(x_\star, (nH_\star)^{-1}\big)\right| \in O(n^{-1}).$$
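A quick one-dimensional numerical check of the leading-order term in Laplace's method; the particular $f$ and $\Phi$ below are illustrative assumptions, not from the talk.

```python
import numpy as np
from scipy.integrate import quad

# Illustrative 1-d check of Laplace's method.
Phi = lambda x: 0.5 * (x - 1.0) ** 2 + 0.1 * (x - 1.0) ** 4   # unique minimizer x_star = 1
f = lambda x: 1.0 + x ** 2

x_star, H_star = 1.0, 1.0   # H_star = Phi''(x_star)

for n in [10, 100, 1000, 10000]:
    J, _ = quad(lambda x: f(x) * np.exp(-n * Phi(x)), -10.0, 10.0)
    leading = np.exp(-n * Phi(x_star)) * f(x_star) * np.sqrt(2 * np.pi / (n * H_star))
    # J - leading should decay like n^{-1/2} * n^{-1} = n^{-3/2},
    # so the rescaled error below should approach a constant.
    print(n, J, leading, (J - leading) * n ** 1.5)
```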

15.–16. Convergence of the Laplace Approximation
   Theorem ([Schillings, S., Wacker, 2019]). Assume that
   - $\Phi \in C^3(\mathbb{R}^d)$,
   - $x_n$ is unique and $C_n > 0$ for sufficiently large $n > 0$,
   - a unique minimizer $x_\star := \operatorname*{argmin}_{x \in \mathbb{R}^d}\Phi(x)$ exists with $\nabla^2\Phi(x_\star) > 0$ and $\inf_{\|x - x_\star\| > r}\Phi(x) \ge \Phi(x_\star) + \delta_r$ for some $\delta_r > 0$,
   - $\lim_{n \to \infty} x_n = x_\star$.
   Then $d_H(\mu_n, \mathcal{L}\mu_n) \in O(n^{-1/2})$.
   Closely related to the Bernstein–von Mises theorem, but:
   - the covariance of $\mathcal{L}\mu_n$ depends on the given data (BvM: Fisher information),
   - misspecification ("ground truth" not in the prior support) is not important,
   - the density $\mathrm{d}\mu_n / \mathrm{d}\mathcal{L}\mu_n$ also exists in Hilbert spaces (for Gaussian $\mu_0$).

17. Remarks
   The convergence theorem can be extended under suitable assumptions to
   1. any prior $\mu_0$ which is absolutely continuous w.r.t. the Lebesgue measure,
   2. sequences of potentials $\Phi_n$, e.g., $\Phi_n(x) = \frac{1}{2n}\sum_{i=1}^{n}\|y_i - G(x)\|^2$,
   3. the underdetermined case $G : \mathbb{R}^d \to \mathbb{R}^J$, $J < d$, iff $\mu_0$ is Gaussian and $G$ acts only on a linear active subspace $M$ with $\dim(M) \le J$, i.e., $G(x + m) = G(x)$ for all $x$ and all $m \in M^{\perp}$,
   4. approximations $\tilde{x}_n, \tilde{C}_n$ of $x_n, C_n$ such that $\|x_n - \tilde{x}_n\|, \|C_n - \tilde{C}_n\| \in O(n^{-1})$.

18. Examples
   Example 1: $\mu_0 = N(0, I_2)$, $\Phi(x) = \frac{1}{2}\|y - G(x)\|^2$, $G(x) = [\exp(\tfrac{1}{5}(x_2 - x_1)), \sin(x_2 - x_1)]^{\top}$.
   [Figure: contours of the posterior $\mu_n$ and of $\mathcal{L}\mu_n$; the Hellinger distance $d_H(\mu_n, \mathcal{L}\mu_n)$ decays with empirical rate $-0.50$ in $n$.]
   Example 2: $\mu_0 = N(0, I_2)$ and $\Phi(x) = \frac{1}{2}\|0 - G(x)\|^2$ with $G(x) = x_2 - x_1^2$.
   [Figure: contours of the posterior $\mu_n$ and of $\mathcal{L}\mu_n$; here the Hellinger distance stays of order one, empirical rate $0.00$.]
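Hellinger rates of this kind can be estimated by grid quadrature in two dimensions. The sketch below is a rough illustration: it assumes the toy quantities Phi, C0 and the laplace_approximation helper from the earlier snippets are available and estimates $d_H(\mu_n, \mathcal{L}\mu_n)$ for a given $n$.

```python
import numpy as np
from scipy.stats import multivariate_normal

def hellinger_distance(n, grid_half_width=6.0, m=200):
    """Grid-quadrature estimate of d_H(mu_n, L mu_n) for the toy problem.
    For large n the posterior concentrates, so a finer or adapted grid
    would be needed in practice; this is only an illustrative sketch."""
    xs = np.linspace(-grid_half_width, grid_half_width, m)
    X1, X2 = np.meshgrid(xs, xs)
    pts = np.stack([X1.ravel(), X2.ravel()], axis=1)
    dx = (xs[1] - xs[0]) ** 2            # cell area

    prior = multivariate_normal(mean=np.zeros(2), cov=C0)
    post = np.array([np.exp(-n * Phi(p)) for p in pts]) * prior.pdf(pts)
    post /= post.sum() * dx              # normalize mu_n on the grid

    x_n, C_n = laplace_approximation(Phi, C0, n, x0=np.zeros(2))
    lap = multivariate_normal(mean=x_n, cov=C_n).pdf(pts)

    h2 = 0.5 * np.sum((np.sqrt(post) - np.sqrt(lap)) ** 2) * dx
    return np.sqrt(h2)

# for n in [1, 10, 100, 1000]:
#     print(n, hellinger_distance(n))    # expected decay ~ n^{-1/2} in Example 1
```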

19. Next: Markov Chain Monte Carlo

20. Markov Chain Monte Carlo (MCMC)
   Construct a Markov chain $(X_m)_{m \in \mathbb{N}}$ with invariant measure $\mu_n$, i.e., $X_m \sim \mu_n \Rightarrow X_{m+1} \sim \mu_n$.
   Given suitable conditions, we have $X_m \xrightarrow{\;D\;} \mu_n$ as $m \to \infty$, and for $f \in L^2_{\mu_0}(\mathbb{R}^d)$
   $$S_M(f) := \frac{1}{M}\sum_{m=1}^{M} f(X_m) \;\xrightarrow[M \to \infty]{\text{a.s.}}\; \mathbb{E}_{\mu_n}[f].$$
   The autocorrelation of the Markov chain affects the efficiency:
   $$M\,\mathbb{E}\!\left[\big(S_M(f) - \mathbb{E}_{\mu_n}[f]\big)^2\right] \;\xrightarrow[M \to \infty]{}\; \operatorname{Var}_{\mu_n}(f)\underbrace{\left(1 + 2\sum_{m=1}^{\infty}\operatorname{Corr}\big(f(X_1), f(X_{1+m})\big)\right)}_{\text{integrated autocorrelation time (IACT)}}.$$
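The IACT can be estimated from chain output by truncating the empirical autocorrelation sum. The following is a simple sketch with a crude truncation rule (stop at the first non-positive autocorrelation); more careful windowing rules exist.

```python
import numpy as np

def estimated_iact(samples, max_lag=None):
    """Estimate 1 + 2 * sum_m Corr(f(X_1), f(X_{1+m})) from a scalar chain
    f(X_1), ..., f(X_M), truncating at the first non-positive autocorrelation."""
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()
    M = x.size
    var = x @ x / M
    if max_lag is None:
        max_lag = M // 2
    iact = 1.0
    for lag in range(1, max_lag):
        rho = (x[:-lag] @ x[lag:]) / (M * var)   # empirical autocorrelation at this lag
        if rho <= 0.0:
            break
        iact += 2.0 * rho
    return iact

# Usage: given a chain produced by the Metropolis-Hastings sketch below,
# iact = estimated_iact([f(x) for x in chain]); the effective sample size is M / iact.
```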

21. Metropolis-Hastings (MH) Algorithm [Metropolis et al., 1953]
   Given the current state $X_m = x$:
   1. draw a new state according to the proposal kernel $P(x, \cdot)$: $Y_m \sim P(x, \cdot)$,
   2. accept the proposed $y = Y_m$ with acceptance probability $\alpha(x, y)$, i.e., set
   $$X_{m+1} = \begin{cases} y, & \text{with probability } \alpha(x, y), \\ x, & \text{with probability } 1 - \alpha(x, y). \end{cases}$$
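As one concrete way to use the Laplace approximation as a reference measure in MCMC, the sketch below implements an independence Metropolis-Hastings sampler whose proposal kernel is $P(x, \cdot) = \mathcal{L}\mu_n = N(x_n, C_n)$, independent of the current state. This is an illustrative instance of the idea, not necessarily the exact sampler discussed in the talk; it reuses Phi, C0 and laplace_approximation from the earlier snippets.

```python
import numpy as np
from scipy.stats import multivariate_normal

def independence_mh(log_target, proposal, M, rng=None):
    """Independence Metropolis-Hastings: proposals are drawn from a fixed
    distribution `proposal` (here: the Laplace approximation N(x_n, C_n)).

    log_target : unnormalized log-density of mu_n
    proposal   : a frozen scipy.stats multivariate_normal object
    """
    rng = np.random.default_rng() if rng is None else rng
    x = proposal.mean.copy()
    log_pi_x, log_q_x = log_target(x), proposal.logpdf(x)
    chain, accepted = [x], 0
    for _ in range(M - 1):
        y = proposal.rvs(random_state=rng)
        log_pi_y, log_q_y = log_target(y), proposal.logpdf(y)
        # alpha(x, y) = min(1, pi(y) q(x) / (pi(x) q(y)))
        log_alpha = (log_pi_y - log_q_y) - (log_pi_x - log_q_x)
        if np.log(rng.uniform()) < log_alpha:
            x, log_pi_x, log_q_x = y, log_pi_y, log_q_y
            accepted += 1
        chain.append(x)
    return np.array(chain), accepted / (M - 1)

# Usage (assuming Phi, C0, laplace_approximation from earlier snippets):
# n = 100
# x_n, C_n = laplace_approximation(Phi, C0, n, x0=np.zeros(2))
# C0_inv = np.linalg.inv(C0)
# log_target = lambda x: -n * Phi(x) - 0.5 * x @ C0_inv @ x
# chain, acc_rate = independence_mh(log_target, multivariate_normal(x_n, C_n), M=5000)
```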
