SLIDE 1

Fast methods for Bayesian inverse problems with uncertain PDE forward models

Ilona Ambartsumyan and Omar Ghattas

Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin

SLIDE 2

Example of inverse problem with uncertain forward model

Infer fault transmissibility in a subsurface flow model:

Image credits: Berkeley Lab, newscenter.lbl.gov; SPE 10, spe.org/web/csp

  • PDE model: multi-phase porous medium flow (expensive to solve)
  • Observations: fluid pressure and well production (typically sparse and noisy)
  • Presence of a random parameter field: background permeability (reconstructed from previous studies)

SLIDE 3

Derivation of the algorithm

SLIDE 4

Inverse problem governed by random PDE forward problem

Infer the inversion parameter $m \in M$, given noisy observations $d \in V$ of the state variable $u \in U$, governed by the random PDE $r(u, m, k) = 0$, where

  • the random parameter $k \in K$ is normally distributed, $k \sim \mathcal{N}(\bar{k}, \Gamma_k)$
  • observations are polluted by additive Gaussian noise, $d = Bu(k, m) + \beta$, $\beta \sim \mathcal{N}(0, \Gamma_\beta)$
  • prior information about $m$ is given by the measure $\mu_{\mathrm{pr}}$

SLIDE 5

Bayesian solution of inverse problem

By the (infinite-dimensional) Bayes' theorem,

$$\frac{d\mu_{\mathrm{post}}}{d\mu_{\mathrm{pr}}} = \frac{1}{Z}\, \pi_{\mathrm{like}}(d \mid m)$$

Marginalize out the random parameter $k$:

$$\pi_{\mathrm{like}}(d \mid m) \propto \int_K \pi(d \mid m, k)\, d\mu_k = \int_K \exp\left( -\tfrac{1}{2} (d - Bu(m,k))^T \Gamma_\beta^{-1} (d - Bu(m,k)) \right) d\mu_k$$

For sufficiently complex forward models, this integral is prohibitive to approximate via Monte Carlo sampling (and it has to be done for each sample of m)
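To make the cost concrete, here is a minimal Python sketch of the naive nested Monte Carlo estimator of this marginal likelihood; the helper names `sample_k` and `solve_pde` are hypothetical stand-ins. Every draw of $k$ triggers a full forward PDE solve, and the whole loop must be repeated for each new sample of $m$.

```python
import numpy as np

def naive_marginal_likelihood(m, sample_k, solve_pde, B, Gamma_beta_inv, d, n_mc=1000):
    """MC estimate of pi_like(d|m) by direct marginalization over k.

    Illustrative only: `sample_k()` draws k ~ N(k_bar, Gamma_k) and
    `solve_pde(m, k)` returns the state u -- each call is one expensive
    forward PDE solve, which is exactly why this estimator is prohibitive.
    """
    acc = 0.0
    for _ in range(n_mc):
        k = sample_k()                       # one draw of the random parameter
        u = solve_pde(m, k)                  # one full forward solve
        r = d - B @ u                        # observation misfit for this draw
        acc += np.exp(-0.5 * r @ (Gamma_beta_inv @ r))
    return acc / n_mc                        # O(n_mc) PDE solves per sample of m
```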

SLIDE 6

Approximation of the likelihood

Consider the first-order approximation of $u$ with respect to $k$:

$$u_L(m, k) = u(m, \bar{k}) + D_k u(m, \bar{k})[k - \bar{k}],$$

where $D_k$ is the Fréchet derivative of $u$ with respect to $k$. The likelihood based on $u_L$ becomes

$$\pi^L_{\mathrm{like}} \propto \exp\left( -\tfrac{1}{2} \|d - Bu(m, \bar{k})\|^2_{\Gamma_\nu^{-1}} - \tfrac{1}{2} \log\det(\Gamma_\nu) \right)$$

  • where $\Gamma_\nu := \Gamma_\beta + (B D_k u)\, \Gamma_k\, (B D_k u)^T$

The term $(B D_k u)\, \Gamma_k\, (B D_k u)^T$ is the contribution of model uncertainty to the noise covariance; it results from pushing forward the covariance $\Gamma_k$ of $k$ through the Jacobian of the parameter-to-observable map, $B D_k u$.
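A small dense-matrix sketch of evaluating this linearized negative log-likelihood; in practice $B D_k u$ is never formed explicitly, and it is assumed precomputed here purely for illustration:

```python
import numpy as np

def linearized_neg_log_likelihood(d, Bu_bar, BJ, Gamma_k, Gamma_beta):
    """Negative log of the linearized likelihood pi^L_like.

    `Bu_bar` is B u(m, k_bar); `BJ` is the Jacobian B D_k u(m, k_bar) of
    the parameter-to-observable map, here given as a dense matrix.
    """
    # push the covariance of k forward through the Jacobian
    Gamma_nu = Gamma_beta + BJ @ Gamma_k @ BJ.T
    r = d - Bu_bar
    _, logdet = np.linalg.slogdet(Gamma_nu)
    return 0.5 * r @ np.linalg.solve(Gamma_nu, r) + 0.5 * logdet
```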

SLIDE 7

Low rank approximation of sensitivity matrix

  • As $m$ changes across optimization iterations/MCMC sampling steps, we need to evaluate (and later differentiate with respect to $m$) the term $\Gamma_\beta + J \Gamma_k J^T$, where $J = J(m, \bar{k}) := B D_k u$
  • $J \Gamma_k J^T$ resembles the Gauss–Newton Hessian (of the log-likelihood with respect to $k$), but turned inside-out (i.e., $JJ^T$ operating on data space instead of $J^TJ$ operating on parameter space)
  • Explicit construction of $J$ requires as many (linearized) forward PDE solves as $\min(\text{data dimension},\ \text{dimension of } k)$, which is prohibitive
  • For many problems, the singular values of $B D_k u$ decay rapidly and a (randomized) low-rank approximation can be made: $J \Gamma_k J^T \approx V_r \Lambda_r V_r^T$, where the columns of $V_r$ contain the $r$ dominant eigenvectors of $J \Gamma_k J^T$ and the diagonal of $\Lambda_r$ contains its eigenvalues
  • Use a randomized SVD, which requires $O(r)$ matrix-vector products with random vectors $\xi_i$, amounting to $O(r)$ linearized adjoint solves $\eta_i = J^T \xi_i$ and linearized forward solves $J \Gamma_k \eta_i$ (sketched below)
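A minimal sketch of such a randomized low-rank eigendecomposition, assuming the operator $H = J \Gamma_k J^T$ is available only through a hypothetical matvec routine `apply_H`; this is the standard Halko–Martinsson–Tropp scheme, not necessarily the exact variant used by the authors:

```python
import numpy as np

def randomized_lowrank(apply_H, n, r, p=10, rng=np.random.default_rng(0)):
    """Randomized eigendecomposition H ~ V_r Lambda_r V_r^T for symmetric PSD H.

    `apply_H(X)` returns H @ X column by column; in the PDE setting each
    column costs one linearized adjoint solve (eta = J^T xi) followed by
    one linearized forward solve (J Gamma_k eta), so the total cost is
    O(r + p) solve pairs per pass (two passes here).
    """
    Omega = rng.standard_normal((n, r + p))   # Gaussian test matrix
    Q, _ = np.linalg.qr(apply_H(Omega))       # orthonormal basis for range(H)
    T = Q.T @ apply_H(Q)                      # small (r+p) x (r+p) projection
    lam, S = np.linalg.eigh(T)
    idx = np.argsort(lam)[::-1][:r]           # keep the r dominant eigenpairs
    return Q @ S[:, idx], lam[idx]            # V_r (n x r), Lambda_r diagonal
```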

SLIDE 8

Example: Antarctic ice sheet flow inversion

  • T. Isaac, N. Petra, G. Stadler, O. Ghattas, JCP 2015

SLIDE 9

Spectrum of Hessian for Antarctic ice flow inverse problem

$O(i^{-3})$ eigenvalue decay of the prior-preconditioned data misfit Hessian (5000 out of 1.19 million shown) and eigenvectors 1, 7, 100, 500, 4000

SLIDE 10

Eigenproblem-constrained optimization for MAP point

With the low-rank approximation, the maximum a posteriori estimate $m_{\mathrm{MAP}}$ is found by minimizing the negative log of the posterior distribution:

$$m_{\mathrm{MAP}} = \arg\min_{m \in M}\; \tfrac{1}{2} \|Bu(m, \bar{k}) - d\|^2_{(\Gamma_\beta + V_r \Lambda_r V_r^T)^{-1}} + \tfrac{1}{2} \|\Gamma_{\mathrm{pr}}^{-1/2} (m - \bar{m})\|^2_{L^2(\Omega)} + \tfrac{1}{2} \log\det(\Gamma_\beta + V_r \Lambda_r V_r^T)$$

  • where the state solution $u$ depends on the parameters $m, \bar{k}$ through $(r(u, m, \bar{k}), \tilde{u}) = 0 \;\forall \tilde{u} \in U$, and the eigenvalues $\lambda_i$ and eigenvectors $v_i$ depend on $u, m, \bar{k}$ through $(J(\bar{k}, m, u)\, \Gamma_k\, J(\bar{k}, m, u)^T v_i, \tilde{v}_i) = \lambda_i (v_i, \tilde{v}_i) \;\forall \tilde{v}_i \in V, \; i = 1, \dots, r$
  • Solving the eigenvalue problem via randomized SVD amounts to solving $O(r)$ pairs of linearized adjoint/forward PDEs
  • Computing the gradient requires differentiating through the eigenvalue problem to obtain eigenvalue and eigenvector sensitivities, necessitating the solution of $O(r)$ adjoint eigenvalue problems (a numerical check of the first-order eigenvalue sensitivity is sketched below)
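As a reminder of what differentiating through the eigenproblem involves, here is a small self-contained check (illustrative, not part of the authors' code) of the first-order sensitivity formula for a simple eigenvalue of a symmetric matrix, $d\lambda_i = v_i^T\, dH\, v_i$:

```python
import numpy as np

# For a symmetric H(t) with simple eigenvalue lambda_i and unit eigenvector
# v_i, first-order perturbation theory gives d lambda_i / dt = v_i^T (dH/dt) v_i.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); H0 = A @ A.T    # symmetric PSD base matrix
D = rng.standard_normal((5, 5)); dH = D + D.T    # symmetric perturbation direction

lam0, V0 = np.linalg.eigh(H0)
analytic = V0[:, -1] @ dH @ V0[:, -1]            # sensitivity of the largest eigenvalue

eps = 1e-6                                       # finite-difference verification
lam1, _ = np.linalg.eigh(H0 + eps * dH)
fd = (lam1[-1] - lam0[-1]) / eps
assert abs(analytic - fd) < 1e-4 * max(1.0, abs(fd))
```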

SLIDE 11

Current simpler implementation based on MC estimator

  • We have constructed fast algorithms for Hessian-constrained optimization in other contexts (e.g., P. Chen, U. Villa, and O. Ghattas, JCP 2019)
  • But the implementation is somewhat complicated, so we initially consider a Monte Carlo estimator of $J \Gamma_k J^T$
  • Let $e$ be a standard normal random variable; then
$$(B D_k u(m, \bar{k}))\, \Gamma_k\, (B D_k u(m, \bar{k}))^T = (B D_k u(m, \bar{k}))\, \Gamma_k^{1/2}\, \mathbb{E}[e e^T]\, \Gamma_k^{1/2}\, (B D_k u(m, \bar{k}))^T = \mathbb{E}\!\left[ \big( B D_k u(m, \bar{k})\, \Gamma_k^{1/2} e \big) \big( B D_k u(m, \bar{k})\, \Gamma_k^{1/2} e \big)^T \right]$$
  • Consider a Monte Carlo estimator with $\xi_i \sim \mathcal{N}(0, \Gamma_k)$, $i = 1, \dots, n_s$:
$$(B D_k u(m, \bar{k}))\, \Gamma_k\, (B D_k u(m, \bar{k}))^T \approx \frac{1}{n_s} \sum_{i=1}^{n_s} \big( B D_k u(m, \bar{k})[\xi_i] \big) \big( B D_k u(m, \bar{k})[\xi_i] \big)^T$$
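A sketch of this estimator, assuming a hypothetical routine `apply_BJ(xi)` that returns $B D_k u(m, \bar{k})[\xi]$ (one linearized forward PDE solve) and `sample_Gamma_k()` that draws $\xi \sim \mathcal{N}(0, \Gamma_k)$:

```python
import numpy as np

def mc_pushforward(apply_BJ, sample_Gamma_k, n_obs, n_s=30):
    """Monte Carlo estimator of (B D_k u) Gamma_k (B D_k u)^T.

    Each of the n_s samples costs one linearized forward (sensitivity)
    solve; n_s is small and independent of the parameter dimension.
    """
    H = np.zeros((n_obs, n_obs))
    for _ in range(n_s):
        xi = sample_Gamma_k()        # xi_i ~ N(0, Gamma_k)
        y = apply_BJ(xi)             # BU_i = B D_k u(m, k_bar)[xi_i]
        H += np.outer(y, y)
    return H / n_s
```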

SLIDE 12

MC estimator-based MAP point computation

The maximum a posteriori estimate $m_{\mathrm{MAP}}$ is found by minimizing the negative log of the posterior distribution:

$$m_{\mathrm{MAP}} = \arg\min_{m \in M}\; \tfrac{1}{2} \Big\|Bu(m, \bar{k}) - d\Big\|^2_{\left(\Gamma_\beta + \frac{1}{n_s}\sum_{i=1}^{n_s} BU_i (BU_i)^T\right)^{-1}} + \tfrac{1}{2} \|\Gamma_{\mathrm{pr}}^{-1/2} (m - \bar{m})\|^2_{L^2(\Omega)} + \tfrac{1}{2} \log\det\Big( \Gamma_\beta + \tfrac{1}{n_s} \textstyle\sum_{i=1}^{n_s} BU_i (BU_i)^T \Big)$$

  • where the state solution $u$ depends on the parameters $m, \bar{k}$ through $(r(u, m, \bar{k}), \tilde{u}) = 0 \;\forall \tilde{u} \in U$, and the sensitivity solutions $U_i$ depend on $u, m, \bar{k}$ through $((\partial_u r)(\bar{k}, m, u)[U_i], \tilde{U}_i) = -((\partial_k r)(\bar{k}, m, u)[\xi_i], \tilde{U}_i) \;\forall \tilde{U}_i \in U, \; i = 1, \dots, n_s$

This is an $(n_s + 1)$-PDE-constrained optimization problem, but it can be solved efficiently using gradient-based optimization, since $n_s$ is typically small and independent of the parameter dimension. This requires an efficient computation of the gradient...
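A dense-matrix sketch of evaluating this objective; `apply_Gpr_invsqrt` is a hypothetical routine applying $\Gamma_{\mathrm{pr}}^{-1/2}$, and a real implementation would reuse the small $\Gamma_\nu$ factorizations rather than re-solving:

```python
import numpy as np

def neg_log_posterior(r, m_minus_mbar, apply_Gpr_invsqrt, BU, Gamma_beta):
    """Negative log posterior of slide 12 with the MC noise covariance.

    `r` is the data misfit B u(m, k_bar) - d and the columns of `BU` are
    the n_s pushed-forward sensitivity solutions BU_i.
    """
    n_s = BU.shape[1]
    Gamma_nu = Gamma_beta + (BU @ BU.T) / n_s      # MC estimate of the noise covariance
    _, logdet = np.linalg.slogdet(Gamma_nu)
    data_misfit = 0.5 * r @ np.linalg.solve(Gamma_nu, r)
    z = apply_Gpr_invsqrt(m_minus_mbar)            # prior misfit Gamma_pr^{-1/2}(m - m_bar)
    prior_misfit = 0.5 * z @ z
    return data_misfit + prior_misfit + 0.5 * logdet
```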

SLIDE 13

Gradient computation for the MAP optimization problem

The (weak form of the) gradient is given by

$$(G(m), \tilde{m}) = (m - \bar{m}, \Gamma_{\mathrm{pr}}^{-1} \tilde{m}) + ((\partial_m r)(\bar{k}, m, u)[\tilde{m}], v) + \sum_{i=1}^{n_s} \Big[ ((\partial_{um} r)(\bar{k}, m, u)[U_i, \tilde{m}], V_i) + ((\partial_{km} r)(\bar{k}, m, u)[\xi_i, \tilde{m}], V_i) \Big] \quad \forall \tilde{m} \in M,$$

where $u$ and $\{U_i\}_{i=1}^{n_s}$ solve the "forward" problems

$$(r(u, m, \bar{k}), \tilde{u}) = 0 \quad \forall \tilde{u} \in V,$$
$$((\partial_u r)(\bar{k}, m, u)[U_i], \tilde{U}_i) = -((\partial_k r)(\bar{k}, m, u)[\xi_i], \tilde{U}_i) \quad \forall \tilde{U}_i \in V, \; i = 1, \dots, n_s,$$

and $v$ and $\{V_i\}_{i=1}^{n_s}$ solve the "adjoint" problems

$$((\partial_u r)(u, m, \bar{k})[\tilde{v}], v) = -\big( Bu - d, \Gamma_\nu^{-1} B \tilde{v} \big) - \sum_{i=1}^{n_s} \Big[ ((\partial_{uu} r)(\bar{k}, m, u)[U_i, \tilde{v}], V_i) + ((\partial_{ku} r)(\bar{k}, m, u)[\xi_i, \tilde{v}], V_i) \Big] \quad \forall \tilde{v} \in V,$$
$$((\partial_{uU_i} r)(\bar{k}, m, u)[U_i, \tilde{V}_i], V_i) + ((\partial_{kU_i} r)(\bar{k}, m, u)[\xi_i, \tilde{V}_i], V_i) = -\tfrac{1}{n_s} (B^T \Gamma_\nu^{-1} B U_i, \tilde{V}_i) + \tfrac{1}{n_s} (B^T \Gamma_\nu^{-1} (Bu - d)(Bu - d)^T \Gamma_\nu^{-1} B U_i, \tilde{V}_i) \quad \forall \tilde{V}_i \in V, \; i = 1, \dots, n_s.$$
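Since each piece above is error-prone to implement, a standard finite-difference check of the assembled gradient is worth sketching (generic; `J` and `G` are hypothetical callables evaluating the objective of slide 12 and the gradient above):

```python
import numpy as np

def gradient_check(J, G, m, eps=1e-6, rng=np.random.default_rng(0)):
    """Compare the adjoint-based gradient against a finite-difference slope.

    J(m): scalar negative log posterior; G(m): assembled gradient vector.
    Agreement of the two directional derivatives validates the forward/
    adjoint PDE systems and the gradient assembly.
    """
    mhat = rng.standard_normal(m.shape)               # random direction m_tilde
    slope_adjoint = G(m) @ mhat                       # (G(m), m_tilde)
    slope_fd = (J(m + eps * mhat) - J(m - eps * mhat)) / (2 * eps)
    rel_err = abs(slope_adjoint - slope_fd) / max(abs(slope_fd), 1e-30)
    print(f"adjoint: {slope_adjoint:.6e}  fd: {slope_fd:.6e}  rel err: {rel_err:.2e}")
```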

SLIDE 14

Numerical examples

SLIDE 15

Test case 1: Poisson problem

The forward problem: find $u \in V$ s.t.

$$-\nabla \cdot (e^m \nabla u) = k \;\text{ in } \Omega, \qquad u = 0 \;\text{ on } \Gamma_D, \qquad e^m \nabla u \cdot n = 0 \;\text{ on } \Gamma_N.$$

  • The domain is $[0, 1] \times [0, 1]$;
  • Dirichlet BC on $\Gamma_D = \{0\} \times [0, 1]$;
  • 4225 dofs in state space, 1089 dofs in $(m, k)$ parameter space;
  • 16 observations with random locations;
  • $n_s = 30$ samples;
  • 1% Gaussian noise.

Figure 1: State solution. Figure 2: Observations.
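A minimal FEniCS sketch of one forward solve of this problem (mesh resolution and the expressions for $m$ and $k$ are illustrative placeholders, not the values used to produce the figures):

```python
import dolfin as dl

# Forward problem of test case 1: -div(e^m grad u) = k in the unit square,
# u = 0 on Gamma_D = {0} x [0,1], homogeneous Neumann elsewhere.
mesh = dl.UnitSquareMesh(32, 32)
V = dl.FunctionSpace(mesh, "CG", 1)

m = dl.interpolate(dl.Expression("sin(pi*x[0])*sin(pi*x[1])", degree=2), V)
k = dl.interpolate(dl.Constant(1.0), V)   # one realization of the random parameter

def gamma_D(x, on_boundary):              # left edge carries the Dirichlet BC
    return on_boundary and dl.near(x[0], 0.0)

bc = dl.DirichletBC(V, dl.Constant(0.0), gamma_D)

u_trial, v = dl.TrialFunction(V), dl.TestFunction(V)
a = dl.exp(m) * dl.inner(dl.grad(u_trial), dl.grad(v)) * dl.dx
L = k * v * dl.dx

u = dl.Function(V)
dl.solve(a == L, u, bc)                   # state solution for this (m, k)
```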

SLIDE 16

Test case 1: Random parameter

We assume that $\mu_k = \mathcal{N}(\bar{k}, \Gamma_k)$, where the covariance is given by $\Gamma_k = A^{-2}$ s.t. $Ak = \tilde{k}$ satisfies

$$\gamma_k (\Theta_k \nabla k, \nabla v) + \delta_k (k, v) = (\tilde{k}, v) \quad \forall v \in H^1(\Omega)$$

Figure 3: Left: sample from $\mu_k$ (used to synthesize data); right: mean of $\mu_k$ (used for deterministic inversion).

The prior is given by a distribution with similar structure.

SLIDE 17

Test case 1: Some comments

  • The random parameter-to-state map is linear, so the approximation is exact
  • Small number of observations, not evenly distributed over the domain
  • Relatively high variance of the random parameter distribution, which dominates the observational noise

The algorithm has been implemented using the FEniCS¹ and hIPPYlib² open-source libraries.

¹https://fenicsproject.org/ ²https://hippylib.github.io/
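A sketch of setting up and sampling the $\mathcal{N}(\bar{k}, \Gamma_k)$ distribution of slide 16 with hIPPYlib's `BiLaplacianPrior`, which realizes a covariance of the form $\Gamma_k = A^{-2}$ with $A$ the elliptic operator above; the `gamma`/`delta` values and mesh here are illustrative assumptions, not the paper's settings:

```python
import dolfin as dl
import hippylib as hp

# Gaussian field k ~ N(k_bar, Gamma_k) with Gamma_k = A^{-2}, where A is the
# elliptic operator gamma_k * (-div(Theta_k grad .)) + delta_k * I of slide 16.
mesh = dl.UnitSquareMesh(32, 32)
Vh = dl.FunctionSpace(mesh, "CG", 1)
prior = hp.BiLaplacianPrior(Vh, gamma=0.1, delta=1.0)   # illustrative gamma_k, delta_k

# draw one sample from N(0, Gamma_k) (add the mean k_bar for a sample of mu_k)
noise = dl.Vector()
prior.init_vector(noise, "noise")
hp.parRandom.normal(1.0, noise)          # white-noise vector
k = dl.Vector()
prior.init_vector(k, 0)
prior.sample(noise, k)                   # applies A^{-1} to the white noise
                                         # (up to mass-matrix factors)
```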

SLIDE 18

Test case 1: MAP point

Figure 4: Left: true $m$; middle: MAP point $m_{\mathrm{det}}$; right: MAP point $m_{\mathrm{stoch}}$.

We compare the true field $m$, the MAP estimate obtained using the deterministic algorithm, $m_{\mathrm{det}}$, and the MAP point obtained by the proposed algorithm, $m_{\mathrm{stoch}}$.

SLIDE 19

Test case 1: Eigenvalues of data misfit Hessian (wrt m)

Figure 5: Generalized eigenvalues of the prior-preconditioned data misfit Hessian (wrt $m$).
SLIDE 20

Test case 2: Advection–diffusion–reaction problem

The forward problem: find $u \in V$ s.t.

$$-\nabla \cdot (\kappa \nabla u) + \nabla u \cdot k + c\, u = m \;\text{ in } \Omega, \qquad u = 0 \;\text{ on } \Gamma_D.$$

  • The domain is $[0, 1] \times [0, 1]$;
  • Dirichlet BC on $\Gamma_D = (\{0\} \times [0, 1]) \cup ([0, 1] \times \{0\})$;
  • 2048 dofs in state space, 1089 dofs in parameter space;
  • 50 observations with random locations;
  • $n_s = 30$ samples;
  • 1% Gaussian noise.

Figure 6: State solution. Figure 7: Observations.

SLIDE 21

Test case 2: Random parameter

We assume that $\mu_k = \mathcal{N}(\bar{k}, \Gamma_k)$, where the covariance is given by $\Gamma_k = A^{-2}$ s.t. $Ak = \tilde{k}$ satisfies

$$\gamma_k (\Theta_k \nabla k, \nabla v) + \delta_k (k, v) = (\tilde{k}, v) \quad \forall v \in H^1(\Omega).$$

Figure 9: Left: sample from $\mu_k$; right: mean of $\mu_k$.

SLIDE 22

Test case 2: State solution

The state solution is sensitive to the random parameter

Figure 10: Left: true $m$; middle: $u(m, \bar{k})$; right: $u(m, k)$.

Diffusion coefficient $\kappa = 10^{-2}$, reaction coefficient $c = 0.4$.

SLIDE 23

Test case 2: Some comments

  • The random parameter-to-state map is not linear, so the approximation is not exact
  • Moderate number of observations, evenly distributed over the domain
  • Large variance in the random parameter distribution, which dominates the noise
  • Sharp interfaces in the inversion parameter field (TV prior used)

SLIDE 24

Test case 2: MAP point

Figure 11: Left: true $m$; middle: MAP point $m_{\mathrm{det}}$; right: MAP point $m_{\mathrm{stoch}}$.

We compare the true field $m$, the MAP estimate obtained using the traditional deterministic algorithm, $m_{\mathrm{det}}$, and the MAP point obtained by the proposed algorithm, $m_{\mathrm{stoch}}$.

SLIDE 25

Conclusions

SLIDE 26

Conclusions

We considered the Bayesian inference of an unknown parameter field $m$ in the presence of uncertainty in another parameter field $k$.

  • We proposed a method based on a linearized approximation of the map from $k$ to $u$ that accounts for both the uncertainty and the sensitivity of the state with respect to $k$
  • Computation of the negative log posterior and its gradient requires a number of PDE solves ($n_s + 1$) that is small and independent of the dimensions of the unknown parameter $m$ and the random parameter $k$
  • Examples illustrate the effect of accounting for model uncertainty in the Bayesian solution

SLIDE 27

Ongoing and future work

  • Approximation of the Jacobian $J$ via randomized truncated SVD (greater accuracy for smaller $n_s$)
  • Requires the addition of an eigenvalue problem with forward/adjoint PDEs
  • Comparison of our approach to full MCMC + inner MC solution
  • Extension of the method to allow for more general distributions $\mu_k$ of the random parameter
  • Use of the approximation as a control variate (sketched below):

$$\pi_{\mathrm{like}}(d \mid m) \propto \mathbb{E}_{\mu_k}[\pi(d \mid m, k)] = \mathbb{E}_{\mu_k}[\pi_L(d \mid m, k)] + \mathbb{E}_{\mu_k}[\pi(d \mid m, k) - \pi_L(d \mid m, k)]$$
$$\approx \exp\left( -\tfrac{1}{2} \|d - Bu(m, \bar{k})\|^2_{\Gamma_\nu^{-1}} - \tfrac{1}{2} \log\det(\Gamma_\nu) \right) + \frac{1}{N} \sum_{i=1}^{N} \exp\left( -\tfrac{1}{2} \|d - Bu(m, k_i)\|^2_{\Gamma_\beta^{-1}} \right) - \frac{1}{N} \sum_{i=1}^{N} \exp\left( -\tfrac{1}{2} \|d - Bu(m, \bar{k}) - B D_k u(m, \bar{k})[k_i - \bar{k}]\|^2_{\Gamma_\beta^{-1}} \right)$$
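A sketch of this control-variate estimator, with hypothetical helpers `solve_pde(m, k)` for the full forward solve and `lin_obs(k)` returning the linearized observable $Bu(m,\bar{k}) + B D_k u(m,\bar{k})[k - \bar{k}]$; normalization constants are dropped, as in the proportionality above:

```python
import numpy as np

def control_variate_likelihood(d, Bu_bar, lin_obs, solve_pde, samples_k,
                               B, Gamma_beta_inv, Gamma_nu, m):
    """Control-variate estimate of pi_like(d|m).

    The linearized likelihood pi^L (known in closed form) is the control
    variate; the correction term averages pi - pi^L over k_i ~ mu_k, so the
    expensive solves only pay for the (small) linearization error.
    """
    def gauss(r, Cinv):
        return np.exp(-0.5 * r @ (Cinv @ r))

    # closed-form term E_{mu_k}[pi^L], with covariance Gamma_nu
    r0 = d - Bu_bar
    _, logdet = np.linalg.slogdet(Gamma_nu)
    closed = np.exp(-0.5 * r0 @ np.linalg.solve(Gamma_nu, r0) - 0.5 * logdet)

    # MC correction: mean of pi(d|m, k_i) - pi^L(d|m, k_i)
    corr = 0.0
    for k in samples_k:                        # k_i ~ mu_k
        u = solve_pde(m, k)                    # full (expensive) forward solve
        corr += gauss(d - B @ u, Gamma_beta_inv) - gauss(d - lin_obs(k), Gamma_beta_inv)
    return closed + corr / len(samples_k)
```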