SLIDE 1
Fast methods for Bayesian inverse problems with uncertain PDE forward models
Ilona Ambartsumyan and Omar Ghattas
Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin
SLIDE 2 Example of inverse problem with uncertain forward model
Infer fault transmissibility in a subsurface flow model:
Image credits: Berkeley Lab (newscenter.lbl.gov); SPE 10 (spe.org/web/csp)
- Forward model: porous medium flow (expensive to solve)
- Observations: fluid pressure and well production (typically sparse and noisy)
- Uncertain parameter field: background permeability (reconstructed from previous studies)
SLIDE 3
Derivation of the algorithm
SLIDE 4 Inverse problem governed by random PDE forward problem
Infer the inversion parameter m ∈ M, given noisy observations d ∈ V of the state variable u ∈ U, governed by the random PDE r(u, m, k) = 0, where
- the random parameter k ∈ K is normally distributed, k ∼ N(k̄, Γk);
- the observations are polluted by additive Gaussian noise, d = Bu(k, m) + β, β ∼ N(0, Γβ);
- prior information about m is given by the measure µpr.
SLIDE 5 Bayesian solution of inverse problem
By (infinite-dimensional) Bayes’ theorem,
  dµpost/dµpr = (1/Z) πlike(d|m)
Marginalize out the random parameter k:
  πlike(d|m) ∝ ∫ π(d|m, k) dµk = ∫ exp( −(1/2) (d − Bu(m, k))ᵀ Γβ⁻¹ (d − Bu(m, k)) ) dµk
For sufficiently complex forward models, this integral is prohibitive to approximate via Monte Carlo sampling (and it has to be done for each sample of m)
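To see concretely why this marginalization is expensive, the sketch below replaces the PDE with a hypothetical linear forward map u(m, k) = Am + Ck (all sizes and matrices invented for illustration), so each "forward solve" is just a matrix-vector product and the Monte Carlo estimate of πlike(d|m) can be checked against the exact Gaussian marginal. With a real PDE, each of the N inner samples would cost a full forward solve, repeated for every candidate m.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear stand-in for the forward model: u(m, k) = A m + C k,
# with observation operator B = I. All sizes and matrices are invented.
n_m, n_k, n_d = 3, 2, 3
A = rng.standard_normal((n_d, n_m))
C = rng.standard_normal((n_d, n_k))
k_bar = np.zeros(n_k)
Gamma_k = 0.25 * np.eye(n_k)
Gamma_beta = 0.5 * np.eye(n_d)

m = rng.standard_normal(n_m)
d = A @ m + C @ rng.multivariate_normal(k_bar, Gamma_k) \
    + rng.multivariate_normal(np.zeros(n_d), Gamma_beta)

def gauss_pdf(x, mean, cov):
    # Normalized multivariate Gaussian density.
    r = x - mean
    quad = r @ np.linalg.solve(cov, r)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** len(x) * np.linalg.det(cov))

# Monte Carlo marginalization over k: one "forward solve" per inner sample,
# and the whole loop must be repeated for every new value of m.
N = 100_000
ks = rng.multivariate_normal(k_bar, Gamma_k, size=N)
R = d - (A @ m + ks @ C.T)                     # residual, one row per k sample
quad = np.einsum('ij,jk,ik->i', R, np.linalg.inv(Gamma_beta), R)
norm = np.sqrt((2 * np.pi) ** n_d * np.linalg.det(Gamma_beta))
pi_mc = np.mean(np.exp(-0.5 * quad) / norm)

# For a linear map the marginal is Gaussian: covariance Gamma_beta + C Gamma_k C^T.
pi_exact = gauss_pdf(d, A @ m + C @ k_bar, Gamma_beta + C @ Gamma_k @ C.T)
```

For a nonlinear PDE there is no such closed form, which is what motivates the linearized approximation on the next slide.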
SLIDE 6 Approximation of the likelihood
Consider the first-order approximation of u wrt k:
  uL(m, k) = u(m, k̄) + Dku(m, k̄)[k − k̄]
where Dk is the Fréchet derivative of u wrt k. The likelihood based on uL becomes:
  πLlike(d|m) ∝ exp( −(1/2) ‖d − Bu(m, k̄)‖²_{Γν⁻¹} − (1/2) log det(Γν) )
- where Γν := Γβ + (BDku) Γk (BDku)ᵀ
The term (BDku)Γk(BDku)T is the contribution of model uncertainty to the noise covariance, and results from pushing forward the covariance of k, Γk, through the Jacobian of the parameter-to-observable map, BDku.
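A quick numerical sanity check of this pushforward, with a hypothetical dense matrix standing in for the Jacobian J = BDku (in the PDE setting each column of J would require a linearized forward solve): the sample covariance of the linearized observables plus noise should match Γν.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dense Jacobian of the parameter-to-observable map at (m, k_bar).
n_k, n_d = 4, 3
J = rng.standard_normal((n_d, n_k))
Gamma_k = np.diag([0.5, 1.0, 0.25, 0.75])
Gamma_beta = 0.1 * np.eye(n_d)

# Model-uncertainty contribution pushed forward through J, plus noise covariance.
Gamma_nu = Gamma_beta + J @ Gamma_k @ J.T

# Empirical check: covariance of J (k - k_bar) + beta over joint draws.
N = 200_000
dk = rng.multivariate_normal(np.zeros(n_k), Gamma_k, size=N)
beta = rng.multivariate_normal(np.zeros(n_d), Gamma_beta, size=N)
emp = np.cov(dk @ J.T + beta, rowvar=False)
```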
SLIDE 7 Low rank approximation of sensitivity matrix
- As m changes across optimization iterations/MCMC sampling steps, we need to evaluate (and later compute derivatives wrt m of) the term Γβ + J(m, k̄) Γk J(m, k̄)ᵀ, where J := BDku
- J Γk Jᵀ resembles the Gauss–Newton Hessian (of the log-likelihood wrt k), but turned inside-out (i.e. JJᵀ operating on data space instead of JᵀJ operating on parameter space)
- Explicit construction of J requires as many (linearized) forward PDE solves as min(data dimension, k dimension), which is prohibitive
- For many problems, singular values of BDku decay rapidly and a (randomized) low rank approximation can be made: J Γk Jᵀ ≈ Vr Λr Vrᵀ, where the columns of Vr contain the r dominant eigenvectors of J Γk Jᵀ and the diagonal elements of Λr contain its eigenvalues
- Use randomized SVD, which requires O(r) matrix–vector products with random vectors ξi, amounting to O(r) linearized adjoint solves ηi = Jᵀξi and linearized forward solves J Γk ηi
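The randomized eigendecomposition can be sketched with a dense stand-in for J Γk Jᵀ (sizes and spectrum invented; in practice the operator is matrix-free, and each application costs one linearized adjoint solve plus one linearized forward solve):

```python
import numpy as np

rng = np.random.default_rng(2)

# Dense stand-in for J Gamma_k J^T with rapidly decaying spectrum
# (hypothetical sizes; only its action via matvec is used below).
n_d, r = 100, 10
Q0, _ = np.linalg.qr(rng.standard_normal((n_d, n_d)))
lam_true = 10.0 ** (-np.arange(n_d) / 5.0)
M = Q0 @ np.diag(lam_true) @ Q0.T

def matvec(x):
    return M @ x                             # black-box action of J Gamma_k J^T

def randomized_eig(matvec, n, r, p=10, q=1):
    # Range finder with O(r + p) random matvecs, plus q power iterations
    # to sharpen the captured subspace.
    Y = np.column_stack([matvec(w) for w in rng.standard_normal((r + p, n))])
    for _ in range(q):
        Qr, _ = np.linalg.qr(Y)
        Y = np.column_stack([matvec(Qr[:, i]) for i in range(r + p)])
    Qr, _ = np.linalg.qr(Y)
    # Rayleigh-Ritz: project to the subspace and solve a small dense eigenproblem.
    T = Qr.T @ np.column_stack([matvec(Qr[:, i]) for i in range(r + p)])
    lam, S = np.linalg.eigh(T)
    order = np.argsort(lam)[::-1][:r]
    return lam[order], Qr @ S[:, order]

lam_r, V_r = randomized_eig(matvec, n_d, r)
```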
SLIDE 8 Example: Antarctic ice sheet flow inversion
- T. Isaac, N. Petra, G. Stadler, O. Ghattas, JCP 2015
SLIDE 9 Spectrum of Hessian for Antarctic ice flow inverse problem
O(i⁻³) eigenvalue decay of the prior-preconditioned data misfit Hessian (5000 out of 1.19 million) and eigenvectors 1, 7, 100, 500, 4000
SLIDE 10 Eigenproblem-constrained optimization for MAP point
With the low-rank approximation, the maximum a posteriori estimate, mMAP, is found by minimizing the negative log of the posterior distribution:
  mMAP = arg min_{m∈M} (1/2) ‖Bu(m, k̄) − d‖²_{(Γβ + Vr Λr Vrᵀ)⁻¹} + (1/2) ‖Γpr^{−1/2}(m − m̄)‖²_{L²(Ω)} + (1/2) log det(Γβ + Vr Λr Vrᵀ)
- where the state solution u depends on the parameters m, k̄ through (r(u, m, k̄), ũ) = 0 ∀ ũ ∈ U, and the eigenvalues λi and eigenvectors vi depend on u, m, k̄ through (J(k̄, m, u) Γk J(k̄, m, u)ᵀ vi, ṽi) = λi (vi, ṽi) ∀ ṽi ∈ V, i = 1, . . . , r
- Solving the eigenvalue problem via randomized SVD amounts to solving
O(r) pairs of linearized adjoint/forward PDEs
- Computing the gradient requires differentiating through the eigenvalue problem to obtain eigenvalue and eigenvector sensitivities, necessitating the solution of O(r) adjoint eigenvalue problems.
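One payoff of the low-rank form is that both terms involving Γβ + Vr Λr Vrᵀ reduce to r × r dense algebra. A minimal numpy sketch (hypothetical sizes, and assuming for simplicity that Γβ = σ²I) checks the Woodbury solve and the matrix determinant lemma against dense factorizations:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical sizes; Gamma_beta = sigma^2 * I is an assumption for simplicity.
n_d, r = 500, 8
sigma2 = 0.1
Vr, _ = np.linalg.qr(rng.standard_normal((n_d, r)))    # orthonormal eigenvectors
Lam = np.diag(10.0 ** (-np.arange(r) / 2.0))           # decaying eigenvalues
res = rng.standard_normal(n_d)                         # residual B u - d

Gamma = sigma2 * np.eye(n_d) + Vr @ Lam @ Vr.T         # dense reference

# Woodbury identity: only an r x r system is solved.
S = np.linalg.inv(Lam) + Vr.T @ Vr / sigma2
inv_res = res / sigma2 - Vr @ np.linalg.solve(S, Vr.T @ res / sigma2) / sigma2

# Matrix determinant lemma: log det via an r x r determinant.
logdet = n_d * np.log(sigma2) + \
    np.linalg.slogdet(np.eye(r) + Lam @ (Vr.T @ Vr) / sigma2)[1]
```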
SLIDE 11 Current simpler implementation based on MC estimator
- Have constructed fast algorithms for Hessian-constrained optimization in other
contexts (e.g., P. Chen, U. Villa, and O. Ghattas, JCP 2019)
- But somewhat complicated implementation, so initially consider a Monte Carlo
estimator of JΓkJT
- Let e be a standard normal random vector; then
  (BDku(m, k̄)) Γk (BDku(m, k̄))ᵀ = (BDku(m, k̄)) Γk^{1/2} E[eeᵀ] Γk^{1/2} (BDku(m, k̄))ᵀ = E[ ((BDku(m, k̄)) Γk^{1/2} e) ((BDku(m, k̄)) Γk^{1/2} e)ᵀ ]
- Consider a Monte Carlo estimator with ξi ∼ N(0, Γk), i = 1, . . . , ns:
  (BDku(m, k̄)) Γk (BDku(m, k̄))ᵀ ≈ (1/ns) Σ_{i=1}^{ns} (BDku(m, k̄)[ξi]) (BDku(m, k̄)[ξi])ᵀ
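The estimator can be exercised on a hypothetical dense Jacobian (sizes invented; in the PDE setting each product BDku(m, k̄)[ξi] = BUi is one linearized sensitivity solve):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical dense Jacobian J = B D_k u(m, k_bar) and diagonal Gamma_k.
n_k, n_d = 50, 6
J = rng.standard_normal((n_d, n_k)) / np.sqrt(n_k)
gk = np.linspace(0.2, 1.0, n_k)              # diagonal of Gamma_k
exact = J @ np.diag(gk) @ J.T

def mc_estimate(ns):
    # xi_i ~ N(0, Gamma_k); each row of BU stands for one sensitivity solve B U_i.
    Xi = rng.standard_normal((ns, n_k)) * np.sqrt(gk)
    BU = Xi @ J.T
    return BU.T @ BU / ns

est_30 = mc_estimate(30)                     # ns = 30, as on the slides
est_big = mc_estimate(30_000)                # convergence check with many samples
err_big = np.linalg.norm(est_big - exact) / np.linalg.norm(exact)
```

The estimator is always symmetric positive semidefinite, and its error decays at the usual O(ns^{-1/2}) Monte Carlo rate.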
SLIDE 12 MC estimator-based MAP point computation
The maximum a posteriori estimate, mMAP, is found by minimizing the negative log of the posterior distribution:
  mMAP = arg min_{m∈M} (1/2) ‖Bu(m, k̄) − d‖²_{(Γβ + (1/ns) Σ_{i=1}^{ns} BUi (BUi)ᵀ)⁻¹} + (1/2) ‖Γpr^{−1/2}(m − m̄)‖²_{L²(Ω)} + (1/2) log det(Γβ + (1/ns) Σ_{i=1}^{ns} BUi (BUi)ᵀ)
- where the state solution u depends on the parameters m, k̄ through (r(u, m, k̄), ũ) = 0 ∀ ũ ∈ U, and the sensitivity solutions Ui depend on u, m, k̄ through ((∂ur)(k̄, m, u)[Ui], Ũi) = −((∂kr)(k̄, m, u)[ξi], Ũi) ∀ Ũi ∈ U, i = 1, . . . , ns
This is an (ns + 1)-PDE-constrained optimization problem, but it can be solved efficiently using gradient-based optimization, since ns is typically small and independent of the parameter dimension. This requires an efficient computation of the gradient...
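For a linear forward map the structure of this problem can be checked end to end: the log-det term is then independent of m (the sensitivities Ui do not depend on m), and the MAP point solves a regularized least-squares problem. A toy sketch with invented sizes (B = I, u = Am + Ck, all matrices hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)

# Linear toy instance of the MAP objective (hypothetical sizes, B = I).
n_m, n_k, n_d, ns = 4, 6, 8, 30
A = rng.standard_normal((n_d, n_m))
C = rng.standard_normal((n_d, n_k))
k_bar = rng.standard_normal(n_k)
m_bar = np.zeros(n_m)
Gamma_pr = np.eye(n_m)
Gamma_beta = 0.05 * np.eye(n_d)
Gamma_k = 0.5 * np.eye(n_k)

m_true = rng.standard_normal(n_m)
d = A @ m_true + C @ rng.multivariate_normal(k_bar, Gamma_k)

# MC estimator of the model-error covariance; here B U_i = C xi_i.
Xi = rng.multivariate_normal(np.zeros(n_k), Gamma_k, size=ns)
BU = Xi @ C.T
Gamma_hat = Gamma_beta + BU.T @ BU / ns

# Normal equations for the (quadratic) MAP objective.
H = A.T @ np.linalg.solve(Gamma_hat, A) + np.linalg.inv(Gamma_pr)
g = A.T @ np.linalg.solve(Gamma_hat, d - C @ k_bar) + np.linalg.solve(Gamma_pr, m_bar)
m_map = np.linalg.solve(H, g)

# The gradient of the objective must vanish at the MAP point.
grad = A.T @ np.linalg.solve(Gamma_hat, A @ m_map + C @ k_bar - d) \
    + np.linalg.solve(Gamma_pr, m_map - m_bar)
```

With a nonlinear PDE the same objective is minimized iteratively, using the adjoint-based gradient derived on the next slide.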
SLIDE 13 Gradient computation for the MAP optimization problem
The (weak form of the) gradient is given by
  (G(m), m̃) = (m − m̄, Γpr⁻¹ m̃) + ((∂mr)(k̄, m, u)[m̃], v) + Σ_{i=1}^{ns} [ ((∂umr)(k̄, m, u)[Ui, m̃], Vi) + ((∂kmr)(k̄, m, u)[ξi, m̃], Vi) ]  ∀ m̃ ∈ M
where u and {Ui}_{i=1}^{ns} solve the “forward” problems
  (r(u, m, k̄), ũ) = 0  ∀ ũ ∈ V
  ((∂ur)(k̄, m, u)[Ui], Ũi) = −((∂kr)(k̄, m, u)[ξi], Ũi)  ∀ Ũi ∈ V, i = 1, . . . , ns
and v and {Vi}_{i=1}^{ns} solve the “adjoint” problems
  ((∂ur)(k̄, m, u)[ṽ], v) = −(Bᵀ Γν⁻¹ (Bu − d), ṽ) − Σ_{i=1}^{ns} [ ((∂uur)(k̄, m, u)[Ui, ṽ], Vi) + ((∂kur)(k̄, m, u)[ξi, ṽ], Vi) ]  ∀ ṽ ∈ V
  ((∂uUir)(k̄, m, u)[Ui, Ṽi], Vi) + ((∂kUir)(k̄, m, u)[ξi, Ṽi], Vi) = −(1/ns)(Bᵀ Γν⁻¹ B Ui, Ṽi) + (1/ns)(Bᵀ Γν⁻¹ (Bu − d)(Bu − d)ᵀ Γν⁻¹ B Ui, Ṽi)  ∀ Ṽi ∈ V, i = 1, . . . , ns
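Whatever the adjoint machinery, a derived gradient like this one is typically validated against finite differences before use (hIPPYlib ships a similar check; below is a generic sketch on a hypothetical smooth objective, not the objective above):

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical smooth objective with a hand-coded gradient to validate.
n = 5
Q = rng.standard_normal((n, n))
Q = Q @ Q.T + n * np.eye(n)                  # SPD quadratic part
b = rng.standard_normal(n)

def f(m):
    return 0.5 * m @ Q @ m - b @ m + np.sum(np.sin(m))

def grad(m):
    return Q @ m - b + np.cos(m)

# Central finite-difference check of the directional derivative (G(m), m_hat).
m0, m_hat = rng.standard_normal(n), rng.standard_normal(n)
eps = 1e-5
fd = (f(m0 + eps * m_hat) - f(m0 - eps * m_hat)) / (2 * eps)
err = abs(fd - grad(m0) @ m_hat) / max(1.0, abs(grad(m0) @ m_hat))
```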
SLIDE 14
Numerical examples
SLIDE 15 Test case 1: Poisson problem
The forward problem: find u ∈ V s.t.
  −∇ · (e^m ∇u) = k  in Ω,
  u = 0  on ΓD,
  e^m ∇u · n = 0  on ΓN.
- The domain is [0, 1] × [0, 1];
- Dirichlet BC on ΓD = {0} × [0, 1];
- 4225 dofs in state space,
1089 dofs in (m, k) parameter space
- 16 observations with random locations;
- ns = 30 samples;
- 1% Gaussian noise.
Figure 1: State solution. Figure 2: Observations.
SLIDE 16
Test case 1: Random parameter
We assume that µk = N(k̄, Γk), where the covariance is given by Γk = A⁻², with A defined so that Ak = k̃ satisfies γk(Θk∇k, ∇v) + δk(k, v) = (k̃, v) ∀v ∈ H¹(Ω)
Figure 3: Left: sample from µk (used to synthesize data); right: mean of µk (used
for deterministic inversion)
The prior is given by a distribution with similar structure
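A 1-D finite-difference caricature of this covariance (grid size and coefficients invented; the slides discretize the same elliptic operator with finite elements): since A is symmetric, samples k = A⁻¹w with white noise w have covariance A⁻².

```python
import numpy as np

rng = np.random.default_rng(8)

# 1-D finite-difference stand-in for Gamma_k = A^{-2}, with
# A = gamma_k * (-Laplacian) + delta_k * I (hypothetical grid and coefficients).
n, N = 100, 20_000
gamma_k, delta_k = 0.01, 1.0
h = 1.0 / (n - 1)
K = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
A = gamma_k * K + delta_k * np.eye(n)        # SPD elliptic operator

# A symmetric => k = A^{-1} w, w ~ N(0, I), has covariance A^{-1} A^{-T} = A^{-2}.
W = rng.standard_normal((n, N))
samples = np.linalg.solve(A, W)              # one elliptic solve per sample
emp = np.cov(samples)                        # empirical covariance of the draws
Gam = np.linalg.solve(A, np.linalg.solve(A, np.eye(n)).T)
rel = np.max(np.abs(emp - Gam)) / np.max(np.abs(Gam))
```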
SLIDE 17 Test case 1: Some comments
- The random parameter-to-state map is linear, so the linearized approximation is exact
- Small number of observations, not evenly distributed over
domain
- Relatively high variance of the random parameter distribution, which dominates the observational noise
The algorithm has been implemented using FEniCS¹ and hIPPYlib²
¹ https://fenicsproject.org/  ² https://hippylib.github.io/
SLIDE 18
Test case 1: MAP point
Figure 4: Left: true m; middle: MAP point mdet; right: MAP point mstoch
We compare the true field m, the MAP estimate obtained using the deterministic algorithm mdet, and the MAP point obtained by the proposed algorithm mstoch.
SLIDE 19
Test case 1: Eigenvalues of data misfit Hessian (wrt m)
Figure 5: Generalized eigenvalues of the prior-preconditioned data misfit Hessian (wrt m)
SLIDE 20 Test case 2: Advection–diffusion–reaction problem
The forward problem: find u ∈ V s.t.
  −∇ · (κ∇u) + ∇u · k + cu = m  in Ω,
  u = 0  on ΓD.
- The domain is [0, 1] × [0, 1];
- Dirichlet BC on ΓD = ({0} × [0, 1]) ∪ ([0, 1] × {0});
- 2048 dofs in state space,
1089 dofs in parameter space
- 50 observations with random locations;
- ns = 30 samples;
- 1% Gaussian noise.
Figure 6: State solution. Figure 7: Observations.
SLIDE 21
Test case 2: Random parameter
We assume that µk = N(k̄, Γk), where the covariance is given by Γk = A⁻², with A defined so that Ak = k̃ satisfies γk(Θk∇k, ∇v) + δk(k, v) = (k̃, v) ∀v ∈ H¹(Ω).
Figure 9: Left: sample from µk; right: mean of µk.
SLIDE 22
Test case 2: State solution
The state solution is sensitive to the random parameter
Figure 10: Left: true m; middle: u(m, k̄); right: u(m, k).
Diffusion coefficient κ = 10−2, reaction coefficient c = 0.4
SLIDE 23 Test case 2: Some comments
- The random parameter-to-state map is not linear, so the linearized approximation is not exact
- Moderate number of observations, evenly distributed over the
domain
- Large variance in the random parameter distribution, which dominates the observational noise
- Sharp interfaces in the inversion parameter field (TV prior
used)
SLIDE 24
Test case 2: MAP point
Figure 11: Left: true m; middle: MAP point mdet; right: MAP point mstoch.
We compare the true field m, the MAP estimate obtained using the deterministic algorithm mdet, and the MAP point obtained by the proposed algorithm mstoch.
SLIDE 25
Conclusions
SLIDE 26 Conclusions
- We considered the Bayesian inference of an unknown parameter field m in the presence of uncertainty in another parameter field k
- We proposed a method based on a linearized approximation of
the map from k to u that accounts for both the uncertainty and the sensitivity of the state wrt k
- Computation of the negative log posterior and its gradient requires a number of PDE solves (ns + 1) that is small and independent of the dimensions of the unknown parameter m and the random parameter k
- Examples illustrate the effect of accounting for model
uncertainty in the Bayesian solution
SLIDE 27 Ongoing and future work
- Approximation of the Jacobian J via randomized truncated SVD (greater accuracy for a given number of solves than the MC estimator)
- Requires addition of eigenvalue problem with forw/adjoint PDEs
- Comparison of our approach to full MCMC + inner MC solution
- Extension of the method to allow for more general distributions
µk of the random parameter
- Use of the approximation as a control variate:
  πlike(d|m) ∝ E_{µk}[π(d|m, k)] = E_{µk}[πL(d|m, k)] + E_{µk}[π(d|m, k) − πL(d|m, k)]
  ≈ exp( −(1/2) ‖d − Bu(m, k̄)‖²_{Γν⁻¹} − (1/2) log det(Γν) )
    + (1/N) Σ_{i=1}^{N} [ exp( −(1/2) ‖d − Bu(m, ki)‖²_{Γβ⁻¹} ) − exp( −(1/2) ‖d − Bu(m, k̄) − BDku(m, k̄)[ki − k̄]‖²_{Γβ⁻¹} ) ]
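A toy numerical sketch of this control variate (sizes, the mild nonlinearity D(k ⊙ k), and all matrices invented): the linearized likelihood πL has a closed-form Gaussian expectation under µk, and the Monte Carlo correction E[π − πL] has much smaller variance than a plain MC estimate of E[π].

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical mildly nonlinear forward map u(m, k) = A m + C k + D (k*k);
# its first-order expansion at k_bar = 0 is exactly u_L(m, k) = A m + C k.
n_m, n_k, n_d = 3, 4, 5
A = rng.standard_normal((n_d, n_m))
C = rng.standard_normal((n_d, n_k))
D = 0.02 * rng.standard_normal((n_d, n_k))   # small nonlinearity
k_bar = np.zeros(n_k)
Gamma_k = 0.3 * np.eye(n_k)
Gamma_beta = 0.5 * np.eye(n_d)
m = rng.standard_normal(n_m)

def u(k):     return A @ m + C @ k + D @ (k * k)
def u_lin(k): return A @ m + C @ k

def gauss_pdf(x, mean, cov):
    r = x - mean
    return np.exp(-0.5 * r @ np.linalg.solve(cov, r)) / \
        np.sqrt((2 * np.pi) ** len(x) * np.linalg.det(cov))

d = u(rng.multivariate_normal(k_bar, Gamma_k)) \
    + rng.multivariate_normal(np.zeros(n_d), Gamma_beta)

ks = rng.multivariate_normal(k_bar, Gamma_k, size=5_000)
pi = np.array([gauss_pdf(d, u(k), Gamma_beta) for k in ks])
piL = np.array([gauss_pdf(d, u_lin(k), Gamma_beta) for k in ks])

# Closed-form E[pi_L] (linear map => Gaussian marginal), plus MC correction.
E_piL = gauss_pdf(d, u_lin(k_bar), Gamma_beta + C @ Gamma_k @ C.T)
plain_mc = pi.mean()
control_variate = E_piL + (pi - piL).mean()
```

The variance of the correction term shrinks as the nonlinearity shrinks, so far fewer forward solves are needed for a given accuracy than with plain Monte Carlo.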