  1. Fast methods for Bayesian inverse problems with uncertain PDE forward models
     Ilona Ambartsumyan and Omar Ghattas
     Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin

  2. Example of inverse problem with uncertain forward model
     Infer fault transmissibility in a subsurface flow model:
     • PDE model: multi-phase porous medium flow (expensive to solve)
     • Observations: fluid pressure and well production (typically sparse and noisy)
     • Presence of a random parameter field: background permeability (reconstructed from previous studies)
     Image credits: Berkeley Lab, newscenter.lbl.gov; SPE 10, spe.org/web/csp

  3. Derivation of the algorithm

  4. Inverse problem governed by a random PDE forward problem
     Infer the inversion parameter m ∈ M, given noisy observations d ∈ V of the state variable u ∈ U, governed by the random PDE
        r(u, m, k) = 0
     where
     • the random parameter k ∈ K is normally distributed, k ∼ N(k̄, Γ_k)
     • observations are polluted by Gaussian additive noise, d = B u(k, m) + β, β ∼ N(0, Γ_β)
     • prior information about m is given by the measure μ_pr

  5. Bayesian solution of the inverse problem
     By the (infinite-dimensional) Bayes' theorem,
        dμ_post = (1/Z) π_like(d | m) dμ_pr
     Marginalize out the random parameter k:
        π_like(d | m) ∝ ∫_K π(d | m, k) dμ_k = ∫_K exp( -1/2 (d - B u(m, k))^T Γ_β^{-1} (d - B u(m, k)) ) dμ_k
     For sufficiently complex forward models, this integral is prohibitive to approximate via Monte Carlo sampling (and it has to be done for each sample of m).

  6. Approximation of the likelihood
     Consider the first-order approximation of u with respect to k:
        u_L(m, k) = u(m, k̄) + D_k u(m, k̄)[k - k̄]
     where D_k is the Fréchet derivative of u with respect to k. The likelihood based on u_L becomes
        π^L_like ∝ exp( -1/2 ‖d - B u(m, k̄)‖²_{Γ_ν^{-1}} - 1/2 log det(Γ_ν) ),   where Γ_ν := Γ_β + (B D_k u) Γ_k (B D_k u)^T
     The term (B D_k u) Γ_k (B D_k u)^T is the contribution of model uncertainty to the noise covariance, and results from pushing forward the covariance of k, Γ_k, through the Jacobian of the parameter-to-observable map, B D_k u.
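     A minimal dense-matrix sketch of evaluating this linearized Gaussian likelihood, assuming the Jacobian J = B D_k u(m, k̄), the data, and the covariances are available as NumPy arrays; purely illustrative with hypothetical names, since in practice J is accessed only through matrix-vector products, as the following slides discuss:

        import numpy as np

        def neg_log_likelihood_linearized(d, Bu_bar, J, Gamma_beta, Gamma_k):
            """-log pi^L_like(d | m) up to an m-independent constant,
            with Gamma_nu = Gamma_beta + J Gamma_k J^T and J = B D_k u(m, kbar)."""
            Gamma_nu = Gamma_beta + J @ Gamma_k @ J.T
            resid = d - Bu_bar
            L = np.linalg.cholesky(Gamma_nu)            # Gamma_nu = L L^T
            w = np.linalg.solve(L, resid)               # resid^T Gamma_nu^{-1} resid = ||w||^2
            logdet = 2.0 * np.sum(np.log(np.diag(L)))
            return 0.5 * (w @ w) + 0.5 * logdet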

  7. Low-rank approximation of the sensitivity matrix
     • As m changes across optimization iterations / MCMC sampling steps, we need to evaluate (and later compute derivatives with respect to m of) the term
          Γ_ν = Γ_β + J(m, k̄) Γ_k J(m, k̄)^T,   where J := B D_k u
     • J Γ_k J^T resembles the Gauss-Newton Hessian (of the log-likelihood with respect to k), but turned inside-out (i.e. J J^T operating on data space instead of J^T J operating on parameter space)
     • Explicit construction of J requires as many (linearized) forward PDE solves as min(data dimension, dimension of k), which is prohibitive
     • For many problems, the singular values of B D_k u decay rapidly and a (randomized) low-rank approximation can be made:
          J Γ_k J^T ≈ V_r Λ_r V_r^T
       where the columns of V_r contain the r dominant eigenvectors of J Γ_k J^T and the diagonal elements of Λ_r contain its eigenvalues
     • Use a randomized SVD, which requires O(r) matrix-vector products with random vectors ξ_i, amounting to O(r) linearized adjoint solves η_i = J^T ξ_i and linearized forward solves J Γ_k η_i
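     The randomized low-rank factorization can be sketched in a few lines of NumPy, assuming only a routine apply_A(x) that returns J Γ_k J^T x (each application amounting to one linearized adjoint solve plus one linearized forward solve). This is an illustrative double-pass randomized eigensolver, not the authors' implementation:

        import numpy as np

        def lowrank_eig(apply_A, n, r, oversample=10, seed=0):
            """Approximate the r dominant eigenpairs of a symmetric PSD operator A
            (here A = J Gamma_k J^T) accessed only through matvecs apply_A(x)."""
            rng = np.random.default_rng(seed)
            Omega = rng.standard_normal((n, r + oversample))
            Y = np.column_stack([apply_A(Omega[:, j]) for j in range(Omega.shape[1])])
            Q, _ = np.linalg.qr(Y)                       # orthonormal basis for the range of A
            T = Q.T @ np.column_stack([apply_A(Q[:, j]) for j in range(Q.shape[1])])
            lam, S = np.linalg.eigh(T)                   # small (r + oversample) eigenproblem
            idx = np.argsort(lam)[::-1][:r]
            return lam[idx], Q @ S[:, idx]               # Lambda_r, V_r with A ≈ V_r Lambda_r V_r^T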

  8. Example: Antarctic ice sheet flow inversion
     T. Isaac, N. Petra, G. Stadler, O. Ghattas, JCP 2015

  9. Spectrum of the Hessian for the Antarctic ice flow inverse problem
     O(i^{-3}) eigenvalue decay of the prior-preconditioned data misfit Hessian (5000 of 1.19 million eigenvalues shown), together with eigenvectors 1, 7, 100, 500, and 4000

  10. Eigenproblem-constrained optimization for the MAP point
     With the low-rank approximation, the maximum a posteriori estimate, m_MAP, is found by minimizing the negative log of the posterior distribution:
        m_MAP = argmin_{m ∈ M}  1/2 ‖B u(m, k̄) - d‖²_{(Γ_β + V_r Λ_r V_r^T)^{-1}} + 1/2 log det(Γ_β + V_r Λ_r V_r^T) + 1/2 ‖Γ_pr^{-1/2}(m - m̄)‖²_{L²(Ω)}
     where the state solution u depends on the parameters m, k̄ through
        (r(u, m, k̄), ũ) = 0   ∀ ũ ∈ U
     and the eigenvalues λ_i and eigenvectors v_i depend on u, m, k̄ through
        (J(k̄, m, u) Γ_k J(k̄, m, u)^T v_i, ṽ_i) = λ_i (v_i, ṽ_i)   ∀ ṽ_i ∈ V,  i = 1, ..., r
     • Solving the eigenvalue problem via randomized SVD amounts to solving O(r) pairs of linearized adjoint/forward PDEs
     • Computing the gradient requires differentiating through the eigenvalue problem to obtain eigenvalue and eigenvector sensitivities, necessitating the solution of O(r) adjoint eigenvalue problems
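     To make the "differentiating through the eigenvalue problem" step concrete, the snippet below checks the classical first-order sensitivity of a simple eigenvalue of a symmetric matrix, dλ_i/dt = v_i^T (dA/dt) v_i, against finite differences. The matrix and parameter are made up for illustration, and the eigenvector sensitivities also needed above require additional machinery not shown here:

        import numpy as np

        # Hypothetical symmetric matrix A(t) = B + t*C depending on a scalar parameter t
        B = np.array([[2.0, 0.3, 0.0],
                      [0.3, 1.0, 0.5],
                      [0.0, 0.5, 3.0]])
        C = np.array([[0.1, 0.2, 0.0],
                      [0.2, 0.0, 0.1],
                      [0.0, 0.1, 0.3]])
        A = lambda t: B + t * C                      # so dA/dt = C

        t0 = 0.7
        lam, V = np.linalg.eigh(A(t0))

        # First-order sensitivity of a simple eigenvalue: dlam_i/dt = v_i^T (dA/dt) v_i
        dlam = np.array([V[:, i] @ C @ V[:, i] for i in range(len(lam))])

        # Finite-difference check
        eps = 1e-6
        fd = (np.linalg.eigvalsh(A(t0 + eps)) - np.linalg.eigvalsh(A(t0 - eps))) / (2 * eps)
        print("max abs error:", np.max(np.abs(fd - dlam)))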

  11. Current, simpler implementation based on an MC estimator
     • We have constructed fast algorithms for Hessian-constrained optimization in other contexts (e.g., P. Chen, U. Villa, and O. Ghattas, JCP 2019)
     • But the implementation is somewhat complicated, so initially consider a Monte Carlo estimator of J Γ_k J^T
     • Let e be a standard normal random vector; then
          (B D_k u(m, k̄)) Γ_k (B D_k u(m, k̄))^T = (B D_k u(m, k̄)) Γ_k^{1/2} E[e e^T] Γ_k^{1/2} (B D_k u(m, k̄))^T
                                                 = E[ (B D_k u(m, k̄)) Γ_k^{1/2} e ( (B D_k u(m, k̄)) Γ_k^{1/2} e )^T ]
     • Consider a Monte Carlo estimator with ξ_i ∼ N(0, Γ_k), i = 1, ..., n_s:
          (B D_k u(m, k̄)) Γ_k (B D_k u(m, k̄))^T ≈ (1/n_s) Σ_{i=1}^{n_s} ( B D_k u(m, k̄)[ξ_i] ) ( B D_k u(m, k̄)[ξ_i] )^T
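     A small NumPy check of this estimator with dense stand-in matrices (the dimensions loosely mirror the test case below; in the PDE setting B D_k u(m, k̄)[ξ_i] is never formed by a matrix product but comes from one linearized forward solve per sample):

        import numpy as np

        rng = np.random.default_rng(0)
        q, nk, n_s = 16, 100, 30                    # #observations, dim of k, #samples
        J = rng.standard_normal((q, nk))            # stand-in for the Jacobian B D_k u(m, kbar)
        G = rng.standard_normal((nk, nk))
        Gamma_k = G @ G.T / nk                      # stand-in SPD covariance of k

        exact = J @ Gamma_k @ J.T

        # Monte Carlo estimator: xi_i ~ N(0, Gamma_k), average the outer products of J xi_i
        xi = np.linalg.cholesky(Gamma_k) @ rng.standard_normal((nk, n_s))
        Jxi = J @ xi                                # one linearized forward solve per column
        estimate = (Jxi @ Jxi.T) / n_s

        # relative error decays like 1/sqrt(n_s)
        print(np.linalg.norm(estimate - exact) / np.linalg.norm(exact))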

  12. MC estimator-based MAP point computation
     The maximum a posteriori estimate, m_MAP, is found by minimizing the negative log of the posterior distribution:
        m_MAP = argmin_{m ∈ M}  1/2 ‖B u(m, k̄) - d‖²_{(Γ_β + (1/n_s) Σ_{i=1}^{n_s} B U_i (B U_i)^T)^{-1}} + 1/2 log det(Γ_β + (1/n_s) Σ_{i=1}^{n_s} B U_i (B U_i)^T) + 1/2 ‖Γ_pr^{-1/2}(m - m̄)‖²_{L²(Ω)}
     where the state solution u depends on the parameters m, k̄ through
        (r(u, m, k̄), ũ) = 0   ∀ ũ ∈ U
     and the sensitivity solutions U_i depend on u, m, k̄ through
        ((∂_u r)(k̄, m, u)[U_i], Ũ_i) = -((∂_k r)(k̄, m, u)[ξ_i], Ũ_i)   ∀ Ũ_i ∈ U,  i = 1, ..., n_s
     This is an (n_s + 1)-PDE-constrained optimization problem, but it can be solved efficiently using gradient-based optimization, since n_s is typically small and independent of the parameter dimension. This requires an efficient computation of the gradient...
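     One reason a small n_s helps: the sample-based covariance Γ_ν = Γ_β + (1/n_s) W W^T, with W = [B U_1, ..., B U_{n_s}], can be inverted against vectors and its log-determinant evaluated through an n_s × n_s capacitance matrix (Woodbury identity and matrix determinant lemma). A dense NumPy sketch of the resulting objective evaluation, with hypothetical variable names and Γ_pr^{-1} passed in explicitly for simplicity:

        import numpy as np

        def neg_log_posterior(d, Bu, W, Gamma_beta, m, m_bar, Gamma_pr_inv):
            """Negative log posterior (up to a constant) with Gamma_nu = Gamma_beta + (1/n_s) W W^T;
            all Gamma_nu solves go through a small n_s x n_s factorization."""
            n_s = W.shape[1]
            r = Bu - d
            Gb_r = np.linalg.solve(Gamma_beta, r)
            Gb_W = np.linalg.solve(Gamma_beta, W)
            S = n_s * np.eye(n_s) + W.T @ Gb_W                     # capacitance matrix
            # Woodbury: Gamma_nu^{-1} r = Gamma_beta^{-1} r - Gamma_beta^{-1} W S^{-1} W^T Gamma_beta^{-1} r
            Gnu_inv_r = Gb_r - Gb_W @ np.linalg.solve(S, W.T @ Gb_r)
            misfit = 0.5 * r @ Gnu_inv_r
            # Determinant lemma: log det Gamma_nu = log det Gamma_beta + log det(I + (1/n_s) W^T Gamma_beta^{-1} W)
            logdet = np.linalg.slogdet(Gamma_beta)[1] \
                     + np.linalg.slogdet(np.eye(n_s) + (W.T @ Gb_W) / n_s)[1]
            reg = 0.5 * (m - m_bar) @ Gamma_pr_inv @ (m - m_bar)
            return misfit + 0.5 * logdet + reg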

  13. Gradient computation for the MAP optimization problem
     The (weak form of the) gradient is given by
        (G(m), m̃) = (m - m̄, Γ_pr^{-1} m̃) + ((∂_m r)(k̄, m, u)[m̃], v)
                     + Σ_{i=1}^{n_s} [ ((∂_{um} r)(k̄, m, u)[U_i, m̃], V_i) + ((∂_{km} r)(k̄, m, u)[ξ_i, m̃], V_i) ]   ∀ m̃ ∈ M
     where u and {U_i}_{i=1}^{n_s} solve the "forward" problems
        (r(u, m, k̄), ũ) = 0   ∀ ũ ∈ V
        ((∂_u r)(k̄, m, u)[U_i], Ũ_i) = -((∂_k r)(k̄, m, u)[ξ_i], Ũ_i)   ∀ Ũ_i ∈ V,  i = 1, ..., n_s
     and v and {V_i}_{i=1}^{n_s} solve the "adjoint" problems
        ((∂_u r)(u, m, k̄)[ṽ], v) = -(B u - d, Γ_ν^{-1} B ṽ) - Σ_{i=1}^{n_s} [ ((∂_{uu} r)(k̄, m, u)[U_i, ṽ], V_i) + ((∂_{ku} r)(k̄, m, u)[ξ_i, ṽ], V_i) ]   ∀ ṽ ∈ V
        ((∂_u r)(k̄, m, u)[Ṽ_i], V_i) = -(1/n_s) (B^T Γ_ν^{-1} B U_i, Ṽ_i) + (1/n_s) (B^T Γ_ν^{-1} (B u - d)(B u - d)^T Γ_ν^{-1} B U_i, Ṽ_i)   ∀ Ṽ_i ∈ V,  i = 1, ..., n_s
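     Gradients assembled from expressions like these are error-prone to implement, so a finite-difference check against the objective is standard practice. A generic sketch (objective and gradient are user-supplied callables; all names are hypothetical):

        import numpy as np

        def check_gradient(objective, gradient, m0, n_dirs=3, eps_list=(1e-3, 1e-4, 1e-5)):
            """Compare (objective(m0 + eps*dm) - objective(m0 - eps*dm)) / (2*eps)
            with gradient(m0) . dm for random directions dm; agreement should improve as eps shrinks."""
            g0 = gradient(m0)
            rng = np.random.default_rng(0)
            for _ in range(n_dirs):
                dm = rng.standard_normal(m0.shape)
                gd = g0 @ dm
                for eps in eps_list:
                    fd = (objective(m0 + eps * dm) - objective(m0 - eps * dm)) / (2.0 * eps)
                    rel = abs(fd - gd) / max(abs(gd), 1e-14)
                    print(f"eps={eps:.0e}  fd={fd:+.6e}  grad.dm={gd:+.6e}  rel_err={rel:.2e}")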

  14. Numerical examples

  15. Test case 1: Poisson problem
     The forward problem: find u ∈ V s.t.
        -∇·(e^m ∇u) = k   in Ω,
        u = 0   on Γ_D,
        e^m ∇u · n = 0   on Γ_N.
     • The domain is [0, 1] × [0, 1];
     • Dirichlet BC on Γ_D = {0} × [0, 1];
     • 4225 dofs in the state space, 1089 dofs in the (m, k) parameter space;
     • 16 observations at random locations;
     • n_s = 30 samples;
     • 1% Gaussian noise.
     Figure 1: State solution.  Figure 2: Observations.
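     A minimal legacy-FEniCS (dolfin) sketch of this forward solve, with made-up constant stand-ins for m and k and a single mesh for both state and parameters (a 64 × 64 P1 mesh gives the 4225 state dofs quoted above). This is an illustration, not the authors' code:

        from dolfin import *

        mesh = UnitSquareMesh(64, 64)                 # 65 x 65 P1 nodes = 4225 state dofs
        V = FunctionSpace(mesh, "CG", 1)

        # Hypothetical stand-ins for the log-coefficient m and the random right-hand side k
        m = interpolate(Constant(0.0), V)
        k = interpolate(Constant(1.0), V)

        # Dirichlet BC on Gamma_D = {0} x [0, 1]; homogeneous Neumann elsewhere
        bc = DirichletBC(V, Constant(0.0),
                         lambda x, on_boundary: on_boundary and near(x[0], 0.0))

        u, v = TrialFunction(V), TestFunction(V)
        a = exp(m) * inner(grad(u), grad(v)) * dx     # weak form of -div(e^m grad u)
        L = k * v * dx

        u_sol = Function(V)
        solve(a == L, u_sol, bc)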

  16. Test case 1: Random parameter
     We assume that k ∼ μ_k = N(k̄, Γ_k), where the covariance is given by Γ_k = A^{-2}, with A the elliptic operator such that A k = k̃ satisfies
        γ_k (Θ_k ∇k, ∇v) + δ_k (k, v) = (k̃, v)   ∀ v ∈ H¹(Ω)
     Figure 3: Left: sample from μ_k (used to synthesize data); right: mean of μ_k (used for deterministic inversion).
     The prior on m is given by a distribution with similar structure.
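     A plain FEniCS/NumPy sketch of drawing one sample with this covariance at the discrete level, reading Γ_k = A^{-2} literally for the assembled matrix A (so k = k̄ + A^{-1} w with white noise w has covariance A^{-2}, A being symmetric). Θ_k is taken as the identity, the values of γ_k and δ_k are made up, and a dense solve is used purely for illustration:

        import numpy as np
        from dolfin import *

        mesh = UnitSquareMesh(32, 32)                 # 33 x 33 P1 nodes = 1089 parameter dofs
        Vh = FunctionSpace(mesh, "CG", 1)

        gamma_k, delta_k = 1.0, 8.0                   # hypothetical values (not given on the slide)
        k_trial, v = TrialFunction(Vh), TestFunction(Vh)

        # Matrix of the operator A: gamma_k (Theta_k grad k, grad v) + delta_k (k, v), with Theta_k = I
        A = assemble(gamma_k * inner(grad(k_trial), grad(v)) * dx + delta_k * k_trial * v * dx).array()

        # Sample k ~ N(kbar, A^{-2}): k = kbar + A^{-1} w, w ~ N(0, I)  (kbar = 0 here for brevity)
        w = np.random.default_rng(0).standard_normal(Vh.dim())
        k_sample = Function(Vh)
        k_sample.vector()[:] = np.linalg.solve(A, w)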

  17. Test case 1: Some comments
     • The random parameter-to-state map is linear, so the approximation is exact
     • Small number of observations, not evenly distributed over the domain
     • The variance of the random parameter distribution is relatively high and dominates the observational noise
     The algorithm has been implemented using the FEniCS [1] and hIPPYlib [2] open source libraries.
     [1] https://fenicsproject.org/
     [2] https://hippylib.github.io/

  18. Test case 1: MAP point
     We compare the true field m, the MAP estimate obtained using the deterministic algorithm, m_det, and the MAP point obtained by the proposed algorithm, m_stoch.
     Figure 4: Left: true m; middle: MAP point m_det; right: MAP point m_stoch.

  19. Test case 1: Eigenvalues of the data misfit Hessian (wrt m)
     Figure 5: Generalized eigenvalues of the prior-preconditioned data misfit Hessian (wrt m)
