SLIDE 1

Projected Stein variational Newton:

A fast and scalable Bayesian inference method in high dimensions

Peng Chen

Keyi Wu, Joshua Chen, Thomas O'Leary-Roseberry, Omar Ghattas

Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin

RICAM Workshop on Optimization and Inversion under Uncertainty

SLIDE 2

Example: inversion in Antarctic ice sheet flow

uncertain parameter: basal sliding field in the boundary condition
forward model: viscous, shear-thinning, incompressible fluid,
$$-\nabla \cdot \left( \eta(u)(\nabla u + \nabla u^\top) - Ip \right) = \rho g, \qquad \nabla \cdot u = 0$$
data: (InSAR) satellite observation of surface ice flow velocity

T. Isaac, N. Petra, G. Stadler, O. Ghattas, JCP, 2015.

SLIDE 3

Outline

1. Bayesian inversion
2. Stein variational methods
3. Projected Stein variational methods
4. Stein variational reduced basis methods

SLIDE 5

Uncertainty parametrization

Example I: Karhunen–Loève expansion

Karhunen–Loève expansion for $\beta$ with mean $\bar\beta$ and covariance $C$:
$$\beta(x, \theta) = \bar\beta(x) + \sum_{j \ge 1} \sqrt{\lambda_j}\, \psi_j(x)\, \theta_j,$$
where $(\lambda_j, \psi_j)_{j \ge 1}$ are eigenpairs of the covariance $C$, and $\theta = (\theta_j)_{j \ge 1}$ are uncorrelated random variables given by
$$\theta_j = \frac{1}{\sqrt{\lambda_j}} \int_D (\beta - \bar\beta)\, \psi_j(x)\, dx.$$

Example II: dictionary basis representation

We can approximate the random field $\beta$ by
$$\beta(x, \theta) = \sum_{j \ge 1} \psi_j(x)\, \theta_j,$$
where $(\psi_j)_{j \ge 1}$ is a dictionary basis, e.g., a wavelet or finite element basis.
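A minimal numpy sketch of sampling from a truncated KL expansion as in Example I. The grid, the squared-exponential covariance, the zero mean, and the truncation order are illustrative assumptions of this sketch, not the settings used in the talk.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 200)                              # grid on D = (0, 1)
C = np.exp(-0.5 * (x[:, None] - x[None, :])**2 / 0.1**2)    # a covariance matrix (assumed)
lam, psi = np.linalg.eigh(C)                                # eigenpairs (lambda_j, psi_j) of C
lam, psi = lam[::-1], psi[:, ::-1]                          # sort eigenvalues descending

r = 20                                                      # truncation order (assumed)
beta_bar = np.zeros_like(x)                                 # mean field (assumed zero)
theta = np.random.randn(r)                                  # uncorrelated modes theta_j
beta = beta_bar + psi[:, :r] @ (np.sqrt(lam[:r]) * theta)   # truncated KL sample
```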

SLIDE 6

Bayesian inversion

We consider an abstract form of the parameter-to-data model
$$y = \mathcal{O}(\theta) + \xi,$$
with uncertain parameter $\theta \in \Theta \subset \mathbb{R}^d$, parameter-to-observable map $\mathcal{O}$, observation data $y \in \mathbb{R}^n$, and noise $\xi$, e.g., $\xi \sim N(0, \Gamma)$.

Bayes' rule:
$$\underbrace{\pi_y(\theta)}_{\text{posterior}} = \frac{1}{\pi(y)} \underbrace{\pi(y|\theta)}_{\text{likelihood}} \underbrace{\pi_0(\theta)}_{\text{prior}},$$
with the model evidence $\pi(y) = \int_\Theta \pi(y|\theta)\, \pi_0(\theta)\, d\theta$.

[Diagram: the parameter $\theta$ enters the forward model $A(u, v; \theta) = F(v)$; the observational data $y = \mathcal{B}u + \xi$ yield the likelihood $\pi(y|\theta) = \pi_\xi(y - \mathcal{B}u)$ with $\mathcal{O}(\theta) = \mathcal{B}u(\theta)$; Bayesian inversion $\pi_y(\theta) \propto \pi(y|\theta)\, \pi_0(\theta)$ maps the prior to the posterior and to the quantity of interest $s(\theta)$.]

The central tasks: sample from the posterior and compute statistics, e.g.,
$$\mathbb{E}_{\pi_y}[s] = \int_\Theta s(\theta)\, \pi_y(\theta)\, d\theta$$
(a minimal sketch of the log posterior follows below).
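The central quantity all of the following methods need is the posterior density up to the evidence $\pi(y)$. A minimal sketch, assuming the Gaussian noise model above; `O`, `Gamma_inv`, and `log_prior` are hypothetical placeholders for the parameter-to-observable map, the noise precision, and the log prior.

```python
import numpy as np

def log_posterior_unnormalized(theta, y, O, Gamma_inv, log_prior):
    """log pi_y(theta) up to the constant -log pi(y): log-likelihood + log-prior."""
    misfit = y - O(theta)                                 # data misfit y - O(theta)
    log_likelihood = -0.5 * misfit @ Gamma_inv @ misfit   # Gaussian noise assumed
    return log_likelihood + log_prior(theta)
```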

SLIDE 7

Computational challenges

Computational challenges for Bayesian inversion:

complex geometry: the posterior is non-Gaussian, multimodal, and concentrates in a local region
high dimensionality: the parameter lives in high-dimensional spaces; curse of dimensionality, with complexity growing exponentially in the dimension
large-scale computation: the map $\mathcal{O}$ is expensive to evaluate, involving the solution of large-scale partial differential equations

SLIDE 8

Computational methods

Towards better design of MCMC to reduce the number of samples:

1. Langevin and Hamiltonian MCMC (local geometry using gradient, Hessian, etc.) [Stuart et al., 2004, Girolami and Calderhead, 2011, Martin et al., 2012, Bui-Thanh and Girolami, 2014, Lan et al., 2016, Beskos et al., 2017]...

2. dimension reduction MCMC (intrinsic low-dimensionality) [Cui et al., 2014, 2016, Constantine et al., 2016]...

3. randomized/optimized MCMC (optimization for sampling) [Oliver, 2017, Wang et al., 2018, Wang et al., 2019]...

Direct posterior construction and statistical computation:

1. Laplace approximation (Gaussian posterior approximation) [Bui-Thanh et al., 2013, Chen et al., 2017, Schillings et al., 2019]...

2. deterministic quadrature (sparse Smolyak, high-order quasi-MC) [Schillings and Schwab, 2013, Gantner and Schwab, 2016, Chen and Schwab, 2016, Chen et al., 2017]...

3. transport maps (polynomials, radial basis functions, deep neural networks) [El Moselhy and Marzouk, 2012, Spantini et al., 2018, Rezende and Mohamed, 2015, Liu and Wang, 2016, Detommaso et al., 2018, Chen et al., 2019]...

SLIDE 15

Computational methods

Surrogate models to reduce the large-scale computation:

1. polynomial approximation (stochastic spectral, stochastic collocation) [Marzouk et al., 2007, Marzouk and Xiu, 2009, Schwab and Stuart, 2012, Chen et al., 2017, Farcas et al., 2019]...

2. model reduction (POD, greedy reduced basis) [Wang and Zabaras, 2005, Lieberman et al., 2010, Nguyen et al., 2010, Lassila et al., 2013, Cui et al., 2015, Chen and Schwab, 2016, Chen and Ghattas, 2019]...

3. multilevel/multifidelity (MCMC, stochastic collocation) [Dodwell et al., 2015, Teckentrup et al., 2015, Scheichl et al., 2017, Peherstorfer et al., 2018, Farcas et al., 2019]...

Aim for this talk: fast and scalable Bayesian inference in high dimensions by exploiting intrinsic low-dimensionality in both parameter and state spaces, using
a projected transport map in parameter space,
a reduced basis approximation in state space.

SLIDE 20

Outline

1. Bayesian inversion
2. Stein variational methods
3. Projected Stein variational methods
4. Stein variational reduced basis methods

SLIDE 21

Transport map

Find a transport map $T: \mathbb{R}^d \to \mathbb{R}^d$ such that
$$\theta \sim \pi_0 \ (\text{prior}) \;\longrightarrow\; T(\theta) \sim \pi_y \ (\text{posterior}).$$

SLIDE 22

Kullback–Leibler divergence

Definition: Kullback–Leibler (KL) divergence
$$D_{\mathrm{KL}}(\pi_1 | \pi_2) = \int_\Theta \pi_1(\theta) \log \frac{\pi_1(\theta)}{\pi_2(\theta)}\, d\theta.$$

It measures the difference between two probability distributions: $D_{\mathrm{KL}}(\pi_1|\pi_2) \ge 0$, and $D_{\mathrm{KL}}(\pi_1|\pi_2) = 0$ if and only if $\pi_1 = \pi_2$ a.e. It is not symmetric, thus not a distance: in general $D_{\mathrm{KL}}(\pi_1|\pi_2) \ne D_{\mathrm{KL}}(\pi_2|\pi_1)$.

Relation to (Shannon) information theory (see the sketch below):
$$D_{\mathrm{KL}}(\pi_1|\pi_2) = \underbrace{\mathbb{E}_{\theta \sim \pi_1}[-\log \pi_2]}_{\text{cross entropy}} - \underbrace{\mathbb{E}_{\theta \sim \pi_1}[-\log \pi_1]}_{\text{entropy}}.$$
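A minimal Monte Carlo sketch of the definition above, $D_{\mathrm{KL}}(\pi_1|\pi_2) = \mathbb{E}_{\theta \sim \pi_1}[\log \pi_1 - \log \pi_2]$; the two 1D Gaussians in the usage line are an illustrative assumption, chosen because the closed form $D_{\mathrm{KL}}(N(0,1)\,|\,N(1,1)) = 1/2$ makes the estimate easy to check.

```python
import numpy as np
from scipy.stats import norm

def kl_monte_carlo(sample_pi1, log_pi1, log_pi2, N=100_000):
    """Estimate D_KL(pi1 | pi2) = E_{theta ~ pi1}[log pi1 - log pi2] by sampling pi1."""
    thetas = sample_pi1(N)
    return np.mean(log_pi1(thetas) - log_pi2(thetas))

# Usage: two 1D Gaussians; the exact value is 0.5.
est = kl_monte_carlo(lambda N: np.random.randn(N),
                     norm(0, 1).logpdf, norm(1, 1).logpdf)
```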

SLIDE 23

Optimization for transport map

Find a transport map $T: \mathbb{R}^d \to \mathbb{R}^d$ such that $\theta \sim \pi_0 \to T(\theta) \sim \pi_y$, by minimizing the KL divergence
$$\min_{T \in \mathcal{T}} D_{\mathrm{KL}}(T_\sharp \pi_0 | \pi_y) \;\Longleftrightarrow\; \min_{T \in \mathcal{T}} D_{\mathrm{KL}}(\pi_0 | T^\sharp \pi_y).$$

$T_\sharp$ is the pushforward, satisfying $T_\sharp \pi_0(\theta) = \pi_0(T^{-1}(\theta))\, |\det \nabla T^{-1}(\theta)|$;
$T^\sharp$ is the pullback, satisfying $T^\sharp \pi_y(\theta) = \pi_y(T(\theta))\, |\det \nabla T(\theta)|$.
$\mathcal{T}$ is a tensor-product function space $\mathcal{H}^d = \mathcal{H} \otimes \cdots \otimes \mathcal{H}$.

SLIDE 24

Composition of transport maps

Instead of seeking one complex (highly nonlinear) transport map $T$, we look for a composition of a sequence of simple transport maps
$$T = T_L \circ T_{L-1} \circ \cdots \circ T_1 \circ T_0, \quad L \in \mathbb{N},$$
each a perturbation of identity: $T_l(\theta) = I(\theta) + Q_l(\theta)$, with identity map $I(\theta) = \theta$ and perturbation map $Q_l: \mathbb{R}^d \to \mathbb{R}^d$.

SLIDE 25

Optimization of each transport map

At each $l = 0, 1, \dots$, we define
$$\pi_{l+1} := (T_l \circ \cdots \circ T_0)_\sharp \pi_0 \;\Longleftrightarrow\; \pi_{l+1} = (T_l)_\sharp \pi_l.$$
We introduce a cost functional
$$J_l[Q] := D_{\mathrm{KL}}((I + Q)_\sharp \pi_l \,|\, \pi_y). \tag{1}$$
One step of optimization of $J_l[Q]$ w.r.t. $Q$ leads to $T_l = I + \alpha_l Q_l$, with step size $\alpha_l > 0$ (learning rate, line search).

Optimization methods

Gradient descent (steepest descent) [Liu and Wang, 2016]: $Q_l = -DJ_l[0]$.
Newton method: solve the linear system [Detommaso et al., 2018]
$$D^2 J_l[0](V, Q_l) = -DJ_l[0](V), \quad \forall\, V \in \mathcal{T}.$$

SLIDE 26

Optimization of each transport map

Calculus of variations

The first variation $DJ_l[0]$ at $Q = 0$ in direction $V \in \mathcal{T}$:
$$DJ_l[0](V) := -\mathbb{E}_{\pi_l}\left[ (V(\theta))^\top \nabla_\theta \log \pi_y(\theta) + \mathrm{trace}(\nabla_\theta V(\theta)) \right].$$
The second variation $D^2 J_l[0]$ at $Q = 0$ in directions $V, W \in \mathcal{T}$:
$$D^2 J_l[0](V, W) := -\mathbb{E}_{\pi_l}\left[ (V(\theta))^\top \nabla^2_\theta \log \pi_y(\theta)\, W(\theta) - \mathrm{trace}(\nabla_\theta W(\theta)\, \nabla_\theta V(\theta)) \right].$$

Recall Bayes' rule,
$$\underbrace{\pi_y(\theta)}_{\text{posterior}} = \frac{1}{\pi(y)} \underbrace{\pi(y|\theta)}_{\text{likelihood}} \underbrace{\pi_0(\theta)}_{\text{prior}},$$
which leads to
$$\nabla_\theta \log \pi_y(\theta) = \frac{\nabla_\theta \pi_y(\theta)}{\pi_y(\theta)} = \frac{\nabla_\theta (\pi(y|\theta)\, \pi_0(\theta))}{\pi(y|\theta)\, \pi_0(\theta)}.$$
Key observation: the intractable model evidence $\pi(y)$ cancels out.

SLIDE 27

Reproducing Kernel Hilbert Space (RKHS)

$\mathcal{T}$ is a tensor-product function space $\mathcal{H}^d = \mathcal{H} \otimes \cdots \otimes \mathcal{H}$:
tensor-product polynomials [El Moselhy and Marzouk, 2012, Spantini et al., 2018],
radial basis/kernel functions [Liu and Wang, 2016, Detommaso et al., 2018].

Reproducing Kernel Hilbert Space $\mathcal{H}$

There exists a function $k_\theta \in \mathcal{H}$ for every $\theta \in \Theta$ such that
$$v(\theta) = \langle v, k_\theta \rangle \quad \forall v \in \mathcal{H},$$
which implies the existence of $k_{\theta'} \in \mathcal{H}$ for every $\theta' \in \Theta$ such that
$$k_{\theta'}(\theta) = \langle k_{\theta'}, k_\theta \rangle =: k(\theta, \theta') \quad \text{(reproducing kernel)}.$$
Many choices: bilinear, polynomials, Bergman, radial basis functions.

SLIDE 28

N-dimensional approximation of RKHS

Gaussian kernel
$$k(\theta, \theta') = \exp\left( -\frac{1}{2h} (\theta - \theta')^\top M\, (\theta - \theta') \right).$$

To account for the geometry of the posterior [Detommaso et al., 2018],
$$M = \bar{H} := \mathbb{E}_{\pi_l}\left[ -\nabla^2_\theta \log \pi_y(\theta) \right], \quad h = d,$$
vs. $M = I \in \mathbb{R}^{d \times d}$ (see the sketch below).

Finite-dimensional approximation of the RKHS:
$$\mathcal{H}^l_N = \mathrm{span}(k^l_1(\theta), \dots, k^l_N(\theta)) \subset \mathcal{H},$$
where the basis functions are taken as $k^l_n(\theta) = k(\theta, \theta^l_n)$, $n = 1, \dots, N$, and $\theta^l_n \sim \pi_l$ are particles transported from $\theta^0_n \sim \pi_0$ by
$$\theta^l_n = (T_l \circ \cdots \circ T_0)(\theta^0_n), \quad n = 1, \dots, N.$$
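A minimal sketch of the metric-weighted Gaussian kernel above and its gradient, evaluated on all particle pairs; the vectorized layout is an assumption of this sketch, and `M` can be either $I$ or $\bar H$.

```python
import numpy as np

def kernel_and_grad(thetas, M, h):
    """K[m, n] = k(theta_m, theta_n) = exp(-(theta_m - theta_n)^T M (theta_m - theta_n)/(2h));
    gradK[m, n, :] = gradient of k(theta_m, theta_n) w.r.t. theta_m (M assumed symmetric)."""
    diff = thetas[:, None, :] - thetas[None, :, :]                # (N, N, d) pairwise differences
    Mdiff = diff @ M                                              # M (theta_m - theta_n)
    K = np.exp(-0.5 / h * np.einsum('mnd,mnd->mn', diff, Mdiff))  # kernel matrix
    gradK = -(1.0 / h) * Mdiff * K[:, :, None]                    # gradient w.r.t. first argument
    return K, gradK
```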

SLIDE 29

Stein variational gradient descent (SVGD) [Liu and Wang, 2016]

Writing $DJ_l[0](V) = \langle DJ_l[0], V \rangle_{\mathcal{H}^d}$, the reproducing property gives
$$\langle DJ_l[0], V \rangle_{\mathcal{H}^d} = \left\langle -\mathbb{E}_{\pi_l}\left[ \nabla_\theta \log \pi_y(\theta)\, k(\theta, \theta') + \nabla_\theta k(\theta, \theta') \right],\, V(\theta') \right\rangle.$$
For gradient descent, we have (with the notation $k^l_n(\theta) = k(\theta, \theta^l_n)$)
$$Q_l(\theta^l_n) = -DJ_l[0](\theta^l_n) = \mathbb{E}_{\pi_l}\left[ \nabla_\theta \log \pi_y(\theta)\, k^l_n(\theta) + \nabla_\theta k^l_n(\theta) \right].$$
Sample average approximation (SAA), with $\theta^l_m \sim \pi_l$, $m = 1, \dots, N$:
$$Q_l(\theta^l_n) \approx \frac{1}{N} \sum_{m=1}^N \nabla_\theta \log \pi_y(\theta^l_m)\, k^l_n(\theta^l_m) + \nabla_\theta k^l_n(\theta^l_m).$$
Particle updates by the transport map (see the sketch below):
$$\theta^{l+1}_n = T_l(\theta^l_n) := \theta^l_n + \alpha_l\, Q_l(\theta^l_n), \quad n = 1, \dots, N.$$
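A minimal sketch of one SVGD transport step built on the SAA formula above, reusing the kernel routine from the previous sketch; `grad_log_post` is a hypothetical callback returning $\nabla_\theta \log \pi_y$ at all particles.

```python
import numpy as np

def svgd_step(thetas, grad_log_post, K, gradK, alpha):
    """theta_n <- theta_n + alpha * Q(theta_n), with
    Q(theta_n) = (1/N) sum_m [grad log pi_y(theta_m) k_n(theta_m) + grad k_n(theta_m)]."""
    N = thetas.shape[0]
    G = grad_log_post(thetas)                 # (N, d), gradients at all particles
    Q = (K.T @ G + gradK.sum(axis=0)) / N     # row n is Q(theta_n); k_n(theta_m) = K[m, n]
    return thetas + alpha * Q
```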

SLIDE 30

Stein variational Newton (SVN) [Detommaso et al., 2018]

We seek $Q_l \in \mathcal{T}^l_N = (\mathcal{H}^l_N)^d$, where $\mathcal{H}^l_N = \mathrm{span}(k^l_1(\theta), \dots, k^l_N(\theta))$:
$$Q_l(\theta) = \sum_{n=1}^N c^l_n\, k^l_n(\theta),$$
with coefficients $c^l_n \in \mathbb{R}^d$ and $c^l = (c^l_1, \dots, c^l_N) \in \mathbb{R}^{dN}$.

For the Newton system, find $Q_l \in \mathcal{T}^l_N$ such that
$$D^2 J_l[0](V, Q_l) = -DJ_l[0](V), \quad \forall\, V \in \mathcal{T}^l_N,$$
which, by the reproducing property, becomes
$$H c^l = -g^l,$$
with gradient $g^l = (g^l_1, \dots, g^l_N) \in \mathbb{R}^{dN}$ and Hessian $H \in \mathbb{R}^{dN \times dN}$.

SLIDE 31

Stein variational Newton (SVN) [Detommaso et al., 2018]

The gradient $g^l = (g^l_1, \dots, g^l_N) \in \mathbb{R}^{dN}$, with $g^l_m \in \mathbb{R}^d$ given by
$$g^l_m = -\frac{1}{N} \sum_{i=1}^N \nabla_\theta \log \pi_y(\theta^l_i)\, k^l_m(\theta^l_i) + \nabla_\theta k^l_m(\theta^l_i).$$
The Hessian $H \in \mathbb{R}^{dN \times dN}$, with blocks $H_{mn} \in \mathbb{R}^{d \times d}$ given by
$$H_{mn} = \frac{1}{N} \sum_{i=1}^N -\nabla^2_\theta \log \pi_y(\theta^l_i)\, k^l_m(\theta^l_i)\, k^l_n(\theta^l_i) + \nabla_\theta k^l_m(\theta^l_i) \left( \nabla_\theta k^l_n(\theta^l_i) \right)^\top.$$
Decouple the $dN \times dN$ system into $N$ systems of size $d \times d$ (see the sketch below),
$$\bar{H}_m c^l_m = -g^l_m, \quad m = 1, \dots, N,$$
with the diagonal approximation
$$\bar{H}_m = \frac{1}{N} \sum_{i=1}^N -\nabla^2_\theta \log \pi_y(\theta^l_i)\, \left( k^l_m(\theta^l_i) \right)^2 + \nabla_\theta k^l_m(\theta^l_i) \left( \nabla_\theta k^l_m(\theta^l_i) \right)^\top.$$
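A minimal sketch of one SVN step with the block-diagonal approximation above; `hess_neg_log_post` is a hypothetical callback returning $-\nabla^2_\theta \log \pi_y$ at all particles, and the dense per-particle solve is an assumption of this sketch.

```python
import numpy as np

def svn_step_block_diagonal(thetas, grad_log_post, hess_neg_log_post, K, gradK, alpha):
    """Solve H_bar_m c_m = -g_m for each particle, then update theta_n += alpha * Q(theta_n)."""
    N, d = thetas.shape
    G = grad_log_post(thetas)                               # (N, d)
    Hs = hess_neg_log_post(thetas)                          # (N, d, d)
    C = np.zeros((N, d))
    for m in range(N):
        g_m = -(K[:, m, None] * G + gradK[:, m, :]).mean(axis=0)   # gradient g_m
        H_m = (np.einsum('i,ijk->jk', K[:, m]**2, Hs)
               + gradK[:, m, :].T @ gradK[:, m, :]) / N            # H_bar_m
        C[m] = np.linalg.solve(H_m, -g_m)                          # coefficients c_m
    Q = K @ C                                               # Q(theta_n) = sum_m c_m k_m(theta_n)
    return thetas + alpha * Q
```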

SLIDE 32

SVGD vs. SVN, with $M = I$ vs. $M = \bar{H}$

G. Detommaso, T. Cui, Y. Marzouk, A. Spantini, R. Scheichl. A Stein variational Newton method. NeurIPS, 2018.

SLIDE 33

Outline

1. Bayesian inversion
2. Stein variational methods
3. Projected Stein variational methods
4. Stein variational reduced basis methods

SLIDE 34

Computational challenges in high dimensions

Curse of dimensionality: $d \gg 1$.

The number $N$ of basis functions grows rapidly (exponentially) w.r.t. the dimension $d$ to achieve a map representation with the required accuracy.

[Figure: RMSE of variance vs. dimension for SVGD, SVN, and pSVN.]

P. Chen, K. Wu, J. Chen, T. O'Leary-Roseberry, O. Ghattas. Projected Stein variational Newton: A fast and scalable Bayesian inference method in high dimensions. NeurIPS, 2019.

SLIDE 35

Intrinsic low dimensionality

The posterior differs from the prior only in a low-dimensional subspace:
high correlation between different dimensions;
the forward map is smoothing/regularizing;
the parameters are anisotropic, e.g., in a KL expansion.

[Figure: decay of the eigenvalues $\lambda_j$ vs. $j$ for the prior, posterior, and rearranged covariances in three cases, compared with $j^{-2}$ ($\alpha = 2$).]

P. Chen, U. Villa, O. Ghattas. Hessian-based adaptive sparse quadrature for infinite-dimensional Bayesian inverse problems. CMAME, 2017.

SLIDE 36

Projection [Constantine et al., 2014, Cui et al., 2014]

We denote a basis of the subspace of dimension $r \ll d$ as $\Psi = (\psi_1, \dots, \psi_r) \in \mathbb{R}^{d \times r}$. We project $\theta$ to the low-dimensional subspace as
$$\theta_r = \sum_{i=1}^r \psi_i \psi_i^\top \theta = \Psi w.$$
As a result, we consider the projected posterior
$$\pi^r_y(\theta) = \frac{1}{\pi^r(y)}\, \pi(y|\theta_r)\, \pi_0(\theta), \tag{2}$$
where the marginal density $\pi^r(y) = \mathbb{E}_{\pi_0}[\pi(y|\theta_r)]$.

SLIDE 37

Projected Stein variational methods

By the decomposition $\theta = \theta_r + \theta_\perp$, we have
$$\pi^r_y(\theta) = \pi(y|\theta_r)\, \pi_0(\theta_r)\, \pi_0(\theta_\perp | \theta_r).$$
With $\theta_\perp$ frozen, by $\theta_r = \Psi w$, we define
$$p_0(w) := \pi_0(\theta_r), \qquad p_y(w) := \pi^r_y(\theta_r) = \pi(y|\theta_r)\, \pi_0(\theta_r).$$
We seek $T = T_L \circ T_{L-1} \circ \cdots \circ T_1 \circ T_0: \mathbb{R}^r \to \mathbb{R}^r$ such that
$$\min_{T \in \mathcal{T}} D_{\mathrm{KL}}(T_\sharp p_0 | p_y).$$
Apply SVGD/SVN in $\mathbb{R}^r$ for $w$ (pSVGD/pSVN), where
$$\nabla_w \log p_y(w) = \Psi^\top \nabla_\theta \log \pi^r_y(\theta_r), \qquad \nabla^2_w \log p_y(w) = \Psi^\top \nabla^2_\theta \log \pi^r_y(\theta_r)\, \Psi.$$

SLIDE 38

Basis construction

The basis functions $\Psi$ for projection are obtained from the eigenproblem
$$H \psi_i = \lambda_i\, C_0^{-1} \psi_i, \quad i = 1, \dots, r,$$
for the $r$ largest (in $|\cdot|$) eigenvalues, i.e., $|\lambda_1| \ge \cdots \ge |\lambda_r|$; $C_0$ is the prior covariance. With $\eta_y(\theta) = -\log \pi(y|\theta)$:
gradient-based subspace: $H = \mathbb{E}_\pi\left[ \nabla_\theta \eta_y(\theta) (\nabla_\theta \eta_y(\theta))^\top \right]$;
Hessian-based subspace: $H = \mathbb{E}_\pi\left[ \nabla^2_\theta \eta_y(\theta) \right]$.
Choice of the density $\pi$: the density at step $l$, i.e., $\pi_l$ (see the sketch below).

[Figure: decay of $\log_{10}|\lambda_i|$ vs. $i$ for parameter dimensions 289; 1,089; 4,225; 16,641.]
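A minimal scipy sketch of the basis construction: the generalized symmetric eigenproblem $H\psi_i = \lambda_i C_0^{-1}\psi_i$ above, keeping the $r$ modes largest in magnitude. Dense matrices are an assumption of this sketch; at scale this would be done matrix-free.

```python
import numpy as np
from scipy.linalg import eigh

def projection_basis(H, C0_inv, r):
    """Solve H psi = lambda C0^{-1} psi; return the r eigenpairs largest in |lambda|."""
    lam, Psi = eigh(H, C0_inv)                 # generalized eigenproblem, C0_inv SPD
    idx = np.argsort(np.abs(lam))[::-1][:r]    # r dominant modes
    return lam[idx], Psi[:, idx]               # columns are C0^{-1}-orthonormal

# Projected coordinates of a particle, as on Slide 36: w = Psi^T theta
# (after re-orthonormalizing Psi in the Euclidean inner product if needed).
```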

SLIDE 39

Algorithm 1 pSVN in parallel using MPI

1: Input: $N$ prior samples $\theta^0_1, \dots, \theta^0_N$ in each of $K$ cores.
2: Output: posterior samples $\theta^y_1, \dots, \theta^y_N$ in each core.
3: Perform projection to get $\theta_n = \theta^r_n + \theta^\perp_n$ and the samples $w^{l-1}_n$.
4: Perform MPI_Allgather for $w^{l-1}_n$, $n = 1, \dots, N$.
5: repeat
6:   Compute the gradient and Hessian.
7:   Perform MPI_Allgather for the gradient and Hessian.
8:   Compute the kernel and its gradient.
9:   Assemble and solve the Newton system for $c_1, \dots, c_N$.
10:  Perform a line search to get $w^l_1, \dots, w^l_N$.
11:  Perform MPI_Allgather for $w^l_n$, $n = 1, \dots, N$.
12:  Update the samples $\theta^r_n = \Psi w^l_n + \bar\theta$, $n = 1, \dots, N$.
13:  Set $l \leftarrow l + 1$.
14: until a stopping criterion is met.
15: Reconstruct samples $\theta^y_n = \theta^r_n + \theta^\perp_n$, $n = 1, \dots, N$.
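A minimal sketch of the communication pattern in steps 4, 7, and 11 of Algorithm 1, assuming mpi4py; the particle count and subspace dimension are placeholders. Each core exchanges only $r$-dimensional projected coefficients, which is what keeps the communication cost low.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
N, r = 8, 10                                  # local particles, subspace dimension (assumed)
w_local = np.random.randn(N, r)               # this core's projected particles
w_all = np.empty((comm.Get_size() * N, r))    # buffer for all K*N particles
comm.Allgather(w_local, w_all)                # every core now sees every particle
```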

SLIDE 40

Algorithm 2 Adaptive pSVN

1: Input: $N$ prior samples $\theta^0_1, \dots, \theta^0_N$ in each of $K$ cores.
2: Output: posterior samples $\theta^y_1, \dots, \theta^y_N$ in each core.
3: Set level $l_2 = 1$, $\theta^{l_2-1}_n = \theta^0_n$, $n = 1, \dots, N$.
4: repeat
5:   Perform the eigendecomposition and form the bases $\Psi^{l_2}$.
6:   Apply Algorithm 1 (pSVN) to update the samples: $[\theta^{l_2}_1, \dots, \theta^{l_2}_N] = \mathrm{pSVN}([\theta^{l_2-1}_1, \dots, \theta^{l_2-1}_N], K, \Psi^{l_2})$.
7:   Set $l_2 \leftarrow l_2 + 1$.
8: until a stopping criterion is met.

Advantages:
avoids/alleviates the curse of dimensionality;
largely reduces the computational cost when $r \ll d$;
converges faster in the low-dimensional space;
parallel computation with reduced communication.

SLIDE 41

Projection error estimates

Assumption

Assume that the parameter-to-observable map $\mathcal{O}$ satisfies:
1. There exists a constant $C_{\mathcal{O}} > 0$ such that $\mathbb{E}_{\pi_0}[\|\mathcal{O}(\cdot)\|_\Gamma] \le C_{\mathcal{O}}$.
2. For every $b > 0$, there exists a constant $C_b > 0$ such that
$$\|\mathcal{O}(\theta_1) - \mathcal{O}(\theta_2)\|_\Gamma \le C_b \|\theta_1 - \theta_2\|_\Theta, \quad \text{for } \max\{\|\theta_1\|_\Theta, \|\theta_2\|_\Theta\} < b.$$

Theorem

Under Assumption 1, for the Hessian-based projection we have
$$D_{\mathrm{KL}}(\pi_y \,|\, \pi^r_y) \le C\, \|\theta - \theta_r\|_\Theta,$$
with $C$ independent of $r$. For the gradient-based projection, based on a result in [Zahm et al., 2018], we obtain (with $C$ independent of $r$)
$$D_{\mathrm{KL}}(\pi_y \,|\, \pi^r_y) \le C \sum_{i=r+1}^d \lambda_i.$$

SLIDE 42

Numerical results: Accuracy

We first consider a linear parameter-to-observable map $\mathcal{O}(\theta) = A\theta$, $A = \mathcal{B}B$, $B = (L + M)^{-1}$, where $\mathcal{B}$ is the observation operator and $L$ and $M$ are the discrete Laplacian and mass matrices in the PDE model
$$-\triangle u + u = \theta \ \text{ in } (0, 1), \quad u(0) = 0, \ u(1) = 1,$$
with a Gaussian prior $N(0, C_0)$, where $C_0$ is discretized from $(I - 0.1\triangle)^{-1}$ with Laplace operator $\triangle$.

[Figure: decay of the RMSE of the $L^2$ norm of the pointwise variance of the parameter w.r.t. dimension $d = 16, 64, 256, 1024$ with $N = 128$ samples (left), and w.r.t. the number of iterations with $N = 32$ and $512$ samples in parameter dimension $d = 256$ (right), for SVGD, SVN, and pSVN.]

SLIDE 43

Numerical results: Accuracy

We consider a nonlinear Bayesian inverse problem with $\mathcal{O}(\theta) = \mathcal{B}(S(\theta))$, where $u = S(\theta)$ solves
$$-\nabla \cdot (e^\theta \nabla u) = 0 \ \text{ in } (0, 1)^2,$$
with a Gaussian prior $N(0, C_0)$, where $C_0$ is a discretization of $(I - 0.1\triangle)^{-2}$. We test the accuracy against a dimension-independent likelihood-informed (DILI) MCMC method with 10,000 samples as reference.

[Figure: decay of the RMSE of the $L^2$ norm of the mean (left) and pointwise variance (right) of the parameter, with dimension $d = 1089$ and $N = 32$ and $512$ samples, for SVN and pSVN.]

SLIDE 44

Numerical results: Scalability

Same nonlinear Bayesian inverse problem as above: $\mathcal{O}(\theta) = \mathcal{B}(S(\theta))$, where $u = S(\theta)$ solves $-\nabla \cdot (e^\theta \nabla u) = 0$ in $(0, 1)^2$, with a Gaussian prior $N(0, C_0)$, $C_0$ a discretization of $(I - 0.1\triangle)^{-2}$.

[Figure. Left: decay of the averaged norm of the update $w^{l+1} - w^l$ w.r.t. the iteration number $l$, with increasing numbers of samples (8, 32, 128, 512). Right: decay of the wall clock time of the different computational components (total, variation, kernel, solve, sample) w.r.t. increasing number of processor cores, compared with $O(N^{-1})$ scaling.]

SLIDE 45

Summary

Take-away message:
SVN provides good samples for complex posteriors.
pSVN is scalable and addresses the curse of dimensionality.

Ongoing:
Bayesian optimal experimental design, with Keyi Wu.
Triangular map and data assimilation, with Joshua Chen.
Deep learning for transport map, with Tom O'Leary-Roseberry.
Gravitational wave inversion, with Bassel Saleh, Alex Leviyev.
Integration with model reduction, with Zihang Zhang.
Convergence analysis w.r.t. the number of particles and parameter dimensions.
Multilevel parallel implementation w.r.t. particles and PDE solves.

SLIDE 46

Outline

1. Bayesian inversion
2. Stein variational methods
3. Projected Stein variational methods
4. Stein variational reduced basis methods

SLIDE 47

PDE-constrained Bayesian inversion

We have the data model
$$y = \mathcal{B}(u(\theta)) + \xi,$$
where $u$ is the solution of the PDE (in weak form)
$$A(u(\theta), v; \theta) = F(v) \quad \forall v \in V,$$
and $\mathcal{B}: V \to Y$ is a vector of observational functionals. Examples: linear diffusion, elasticity, Stokes flow, acoustics, etc.,
$$-\nabla \cdot (\kappa(\theta) \nabla u) = f \ \text{ in } D,$$
with suitable boundary conditions, which leads to
$$A(u, v; \theta) = \int_D \kappa(x, \theta)\, \nabla u(x, \theta) \cdot \nabla v(x)\, dx, \qquad F(v) = \int_D f(x)\, v(x)\, dx.$$
With Gaussian noise $\xi \sim N(0, \Gamma)$, we define the potential
$$\eta_y(\theta) := \frac{1}{2} (y - \mathcal{B}(u(\theta)))^\top \Gamma^{-1} (y - \mathcal{B}(u(\theta))) \;\Rightarrow\; \pi(y|\theta) = \exp(-\eta_y(\theta)).$$

SLIDE 48

High-fidelity approximation of the potential $\eta_y$

E.g., with finite elements, we consider: find $u_h \in V_h \subset V$ such that
$$A(u_h, v_h; \theta) = F(v_h) \quad \forall v_h \in V_h. \tag{3}$$
Then the data model is $y = \mathcal{B}(u_h(\theta)) + \xi$, and for $\xi \sim N(0, \Gamma)$ the likelihood function is
$$\pi(y|\theta) = \exp(-\eta_y(u_h(\theta))),$$
where the potential $\eta_y(u_h(\theta))$ (nonlinear w.r.t. $u_h$) is
$$\eta_y(u_h(\theta)) = \frac{1}{2} (y - \mathcal{B}(u_h(\theta)))^\top \Gamma^{-1} (y - \mathcal{B}(u_h(\theta))).$$
For SVGD and the projected SVGD, we also need
$$-\nabla_\theta \log \pi_y(\theta) = \nabla_\theta \eta_y(u_h(\theta)) - \frac{\nabla_\theta \pi_0(\theta)}{\pi_0(\theta)}.$$

SLIDE 49

High-fidelity approximation of the gradient $\nabla_\theta \eta_y$

We form a Lagrangian
$$\mathcal{L}(u_h, p_h; \theta) = \eta_y(u_h) + A(u_h, p_h; \theta) - F(p_h),$$
and set $\partial_u \mathcal{L}(w_h) = 0$ to obtain the adjoint $p_h$, i.e., find $p_h \in V_h$ such that
$$A(w_h, p_h; \theta) = -\partial_u \eta_y|_{u_h}(w_h) \quad \forall w_h \in V_h, \tag{4}$$
where $\partial_u \eta_y|_{u_h}(w_h) = -\mathcal{B}(w_h)^\top \Gamma^{-1} (y - \mathcal{B}(u_h))$. Then the gradient is given by (see the sketch below)
$$\nabla_\theta \eta_y(u_h(\theta)) = \partial_\theta \mathcal{L}(u_h, p_h; \theta) = \partial_\theta A(u_h, p_h; \theta).$$
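A minimal discrete sketch of this state-adjoint-gradient loop for an affine operator $A(\theta) = \sum_j \theta_j A_j$ and the Gaussian potential above; `A_parts`, `B`, and `Gamma_inv` are hypothetical dense placeholders for the assembled operators.

```python
import numpy as np

def gradient_via_adjoint(theta, y, A_parts, f, B, Gamma_inv):
    """grad_j eta_y = partial_{theta_j} A(u, p; theta) = p^T A_j u."""
    A = sum(t * Aj for t, Aj in zip(theta, A_parts))       # A(theta) = sum_j theta_j A_j
    u = np.linalg.solve(A, f)                              # state:   A(theta) u = f
    rhs = B.T @ Gamma_inv @ (y - B @ u)                    # right-hand side -partial_u eta_y
    p = np.linalg.solve(A.T, rhs)                          # adjoint:  A(theta)^T p = rhs
    return np.array([p @ (Aj @ u) for Aj in A_parts])      # gradient entries p^T A_j u
```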

SLIDE 50

Model reduction

High-fidelity approximation:
finite element space $V_h$, $\dim(V_h) = N_h$;
given $\theta$, find $u_h \in V_h$ s.t. $A(u_h, v_h; \theta) = F(v_h)$ $\forall v_h \in V_h$;
the algebraic system is $A_h(\theta) u_h = f_h$.

Reduced basis approximation (see the sketch below):
reduced basis space $V_N \subset V_h$, $\dim(V_N) = N$;
given $\theta$, find $u_N \in V_N$ s.t. $A(u_N, v_N; \theta) = F(v_N)$ $\forall v_N \in V_N$;
the algebraic system is $A_N(\theta) u_N = f_N$, where with $V = [\psi_1, \dots, \psi_N]$,
$$u_N = V^\top u_h, \qquad A_N(\theta) = V^\top A_h(\theta) V, \qquad f_N = V^\top f_h.$$
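A minimal sketch of the Galerkin (reduced basis) projection above; `V` holds the (assumed orthonormal) basis vectors as columns.

```python
import numpy as np

def reduced_solve(A_h, f_h, V):
    """Project A_h u = f_h onto span(V), solve the small system, and lift back."""
    A_N = V.T @ A_h @ V             # (N, N) reduced operator
    f_N = V.T @ f_h                 # (N,)  reduced right-hand side
    u_N = np.linalg.solve(A_N, f_N)
    return V @ u_N                  # u_h is approximated by V u_N
```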

SLIDE 51

Model reduction: Building blocks

POD/SVD (see the sketch below):
training samples $\Xi_t = \{\theta_n,\ n = 1, \dots, N_t\}$;
compute snapshots $U = [u_h(\theta_1), \dots, u_h(\theta_{N_t})]$;
perform the SVD $U = V \Sigma W^\top$;
extract the bases $V[:, 1\!:\!N]$ with $N = \mathrm{argmin}_n \{ E_n(\Sigma) \ge 1 - \varepsilon \}$.

Greedy algorithm:
training samples $\Xi_t = \{\theta_n,\ n = 1, \dots, N_t\}$;
initialize $V_N$ for $N = 1$ as $V_N = \mathrm{span}\{u_h(\theta_1)\}$;
pick the next sample such that $\theta_{N+1} = \mathrm{argmax}_{\theta \in \Xi_t} \Delta_N(\theta)$;
update the bases $V_{N+1} = V_N \oplus \mathrm{span}\{u_h(\theta_{N+1})\}$.

Offline-online decomposition:
affine assumption/approximation $A = \sum_{q=1}^Q \theta_q(\theta) A_q$;
offline, compute once $A^q_N = V^\top A^q_h V$;
online, assemble $A_N(\theta) = \sum_{q=1}^Q \theta_q(\theta) A^q_N$ and solve $A_N(\theta) u_N = f_N$.

Goal-oriented a posteriori error estimate $\Delta_N(\theta)$ (dual-weighted residual):
$$\Delta_N(\theta) = A(u_N, p_N; \theta) - F(p_N).$$
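A minimal sketch of the POD/SVD building block above, choosing the smallest $N$ whose retained singular-value energy reaches $1 - \varepsilon$; the energy criterion $E_n(\Sigma) = \sum_{i \le n} \sigma_i^2 / \sum_i \sigma_i^2$ is the standard choice and an assumption of this sketch.

```python
import numpy as np

def pod_basis(U, eps=1e-4):
    """U = [u_h(theta_1), ..., u_h(theta_Nt)]; return the first N left singular vectors."""
    V, s, _ = np.linalg.svd(U, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)            # E_n(Sigma)
    N = int(np.searchsorted(energy, 1.0 - eps)) + 1    # smallest n with E_n >= 1 - eps
    return V[:, :N]
```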

SLIDE 52

Reduced basis approximation of the potential $\eta_y$

RB approximation for the adjoint problem: find $p_N \in W_N$ s.t.
$$A(w_N, p_N; \theta) = -\partial_u \eta_y|_{u_N}(w_N) \quad \forall w_N \in W_N.$$
The goal-oriented a posteriori error estimate is given by
$$\Delta_N(\theta) = A(u_N, p_N; \theta) - F(p_N).$$
RB approximation of the potential: $\eta_{y,N}(\theta) = \eta_y(u_N(\theta))$.
Dual-weighted residual correction: $\eta^\Delta_{y,N}(\theta) = \eta_{y,N}(\theta) + \Delta_N(\theta)$.

SLIDE 53

Reduced basis approximation of the gradient $\nabla_\theta \eta_y$

With the RB state $u_N$ and adjoint $p_N$, the gradient is given by
$$\nabla_\theta \eta_y(u_N(\theta)) = \partial_\theta A(u_N, p_N; \theta).$$
For the modified potential $\eta^\Delta_{y,N}(\theta)$, we form the Lagrangian
$$\mathcal{L}(u_N, p_N, \hat{u}_N, \hat{p}_N; \theta) = \eta^\Delta_{y,N}(\theta) + A(u_N, \hat{u}_N; \theta) - F(\hat{u}_N) + A(\hat{p}_N, p_N; \theta) + \nabla_u \eta_y|_{u_N}(\hat{p}_N),$$
and solve the variational problem: find $\hat{p}_N \in W_N$ s.t.
$$A(\hat{p}_N, w_N; \theta) = F(w_N) - A(u_N, w_N; \theta) \quad \forall w_N \in W_N,$$
and the variational problem: find $\hat{u}_N \in V_N$ s.t.
$$A(v_N, \hat{u}_N; \theta) = -A(v_N, p_N; \theta) - \partial_u \eta_y|_{u_N}(v_N) - \nabla^2_u \eta_y|_{u_N}(\hat{p}_N, v_N) \quad \forall v_N \in V_N,$$
which leads to the gradient
$$\nabla_\theta \eta^\Delta_{y,N}(\theta) = \partial_\theta \mathcal{L}(u_N, p_N, \hat{u}_N, \hat{p}_N; \theta).$$

SLIDE 54

Error estimates for the state and adjoint

Assumption: Well-posedness

The bilinear form $A(\cdot, \cdot; \theta): V \times V \to \mathbb{R}$ and linear form $F(\cdot): V \to \mathbb{R}$ satisfy:

A1. At any $\theta \in \Theta$, there exist a coercivity constant $\alpha(\theta) > 0$ and a continuity constant $\gamma(\theta) > 0$ such that
$$\alpha(\theta) \|w\|^2_V \le A(w, w; \theta) \quad \text{and} \quad A(w, v; \theta) \le \gamma(\theta) \|w\|_V \|v\|_V, \quad \forall w, v \in V.$$
The linear functional $F(\cdot): V \to \mathbb{R}$ is bounded, with norm $\|F(\cdot)\|_{V'} < \infty$.

A2. Moreover, $A(\cdot, \cdot; \theta)$ is continuously differentiable w.r.t. $\theta$ at every $\theta \in \Theta$, and for each $j = 1, \dots, d$ there exists $\rho_j(\theta) < \infty$ such that
$$\partial_{\theta_j} A(w, v; \theta) \le \rho_j(\theta) \|w\|_V \|v\|_V, \quad \forall w, v \in V.$$

SLIDE 55

Error estimates for the state $u$ and adjoint $p$

Let $e^u_r(\theta)$ and $e^p_r(\theta)$ denote the RB state and adjoint errors
$$e^u_r(\theta) := u_h(\theta) - u_N(\theta), \qquad e^p_r(\theta) := p_h(\theta) - p_N(\theta).$$
Let $R^u(u_N, \cdot; \theta)$ denote the residual of the state equation,
$$R^u(u_N, v_h; \theta) = A(u_N, v_h; \theta) - F(v_h) \quad \forall v_h \in V_h,$$
and $R^p(\cdot, p_N; \theta)$ the residual of the adjoint equation,
$$R^p(w_h, p_N; \theta) = A(w_h, p_N; \theta) + \nabla_u \eta_y|_{u_N}(w_h) \quad \forall w_h \in V_h.$$

Lemma: Error estimates for the state $u$ and adjoint $p$

Under the well-posedness assumption, for any $\theta \in \Theta$ there holds
$$\|e^u_r(\theta)\|_V \le \frac{1}{\alpha(\theta)} \|R^u(u_N, \cdot; \theta)\|_{V'}, \qquad \|e^p_r(\theta)\|_V \le \frac{1}{\alpha(\theta)} \|R^p(\cdot, p_N; \theta)\|_{V'} + \frac{C_{\mathcal{O}}}{\alpha(\theta)} \|e^u_r(\theta)\|_V.$$

SLIDE 56

Error estimates for the potential $\eta_y$ and gradient $\nabla_\theta \eta_y$

Lemma: Error estimates for $\eta_{y,N}(\theta)$ and $\eta^\Delta_{y,N}(\theta)$

There exists a constant $C(\theta) > 0$ for each $\theta \in \Theta$, independent of $N$, s.t.
$$|e^\eta_r(\theta)| := |\eta_y(\theta) - \eta_{y,N}(\theta)| \le C(\theta) \|e^u_r(\theta)\|_V.$$
There exists a constant $C_1(\theta) > 0$ for each $\theta \in \Theta$, independent of $N$, s.t.
$$|e^\Delta_r(\theta)| := |\eta_y(\theta) - \eta^\Delta_{y,N}(\theta)| \le C_1(\theta) \|e^u_r(\theta)\|_V \left( \|e^u_r(\theta)\|_V + \|e^p_r(\theta)\|_V \right).$$

Lemma: Error estimates for $\nabla_\theta \eta_{y,N}(\theta)$ and $\nabla_\theta \eta^\Delta_{y,N}(\theta)$

There exist $C_1(\theta), C_2(\theta) > 0$ for each $\theta \in \Theta$, independent of $N$, s.t.
$$\|\nabla_\theta e^\eta_r(\theta)\|_1 \le C_1(\theta) \|\nabla_\theta e^u_r(\theta)\|_{V^d} + C_2(\theta) \|\nabla_\theta u_N(\theta)\|_{V^d} \|e^u_r(\theta)\|_V.$$
There exist $C_1(\theta), C_2(\theta), C_3(\theta), C_4(\theta) > 0$, independent of $N$, such that
$$\|\nabla_\theta e^\Delta_r(\theta)\|_1 \le C_1 \|\nabla_\theta e^u_r(\theta)\|_{V^d} \|e^p_r(\theta)\|_V + C_2 \|\nabla_\theta e^p_r(\theta)\|_{V^d} \|e^u_r(\theta)\|_V + C_3 \|e^u_r(\theta)\|_V \|e^p_r(\theta)\|_V + C_4 \|\nabla_\theta e^u_r(\theta)\|_{V^d} \|e^u_r(\theta)\|_V.$$

SLIDE 57

Error estimates for the posterior $\pi_y$

Theorem: Error estimates for the posterior $\pi_y$

$$D_{\mathrm{KL}}(\pi^h_y | \pi^r_y) \le \mathbb{E}_{\pi^h_y}\left[ |e^\eta_r| \right] + \mathbb{E}_{\pi^h_y}\left[ |\exp(e^\eta_r) - 1| \right],$$
and
$$D_{\mathrm{KL}}(\pi^h_y | \pi^\Delta_y) \le \mathbb{E}_{\pi^h_y}\left[ |e^\Delta_r| \right] + \mathbb{E}_{\pi^h_y}\left[ |\exp(e^\Delta_r) - 1| \right].$$

Corollary: Error estimates for the posterior $\pi_y$

Let $\Theta_1 := \{\theta \in \Theta : e^\eta_r(\theta) < 1\}$. If
$$\mathbb{E}_{\pi^h_y(\Theta \setminus \Theta_1)}\left[ |\exp(e^\eta_r) - 1| \right] < \delta\, \mathbb{E}_{\pi^h_y}\left[ |e^\eta_r| \right]$$
for some constant $\delta > 0$, we have
$$D_{\mathrm{KL}}(\pi^h_y | \pi^r_y) \le (3 + \delta)\, \mathbb{E}_{\pi^h_y}\left[ |e^\eta_r| \right].$$
The same holds for $D_{\mathrm{KL}}(\pi^h_y | \pi^\Delta_y) \le (3 + \delta)\, \mathbb{E}_{\pi^h_y}\left[ |e^\Delta_r| \right]$.

SLIDE 58

Algorithm 3 Adaptive greedy algorithm with Stein samples

1: Input: samples $\theta^0_m \sim \pi_0$, $m = 1, \dots, M$, tolerance $\varepsilon^0_r$, update step $k$.
2: Output: Stein samples $\theta_m$, $m = 1, \dots, M$.
3: Initialization: at $\theta = \theta^0_1$, solve the high-fidelity state and adjoint problems for $u_h$ and $p_h$, set $V_r = \mathrm{span}\{u_h\}$ and $W_r = \mathrm{span}\{p_h\}$, compute the reduced matrices and vectors once.
4: while at step $l = 0, k, 2k, \dots$ of the SVGD algorithm do
5:   Compute the error indicator $\Delta_N(\theta^l_m)$ for $m = 1, \dots, M$.
6:   while $\max_{m=1,\dots,M} |\Delta_N(\theta^l_m)| > \varepsilon^l_r$ do
7:     Choose $\theta = \mathrm{argmax}_{\theta^l_m,\ m=1,\dots,M} |\Delta_N(\theta^l_m)|$.
8:     Solve the high-fidelity problems for $u_h$ and $p_h$ at $\theta$.
9:     Enrich the spaces $V_r = V_r \oplus \mathrm{span}\{u_h\}$, $W_r = W_r \oplus \mathrm{span}\{p_h\}$.
10:    Compute all the reduced matrices and vectors once.
11:    Compute the error indicator $\Delta_N(\theta^l_m)$ for $m = 1, \dots, M$.
12:  end while
13:  Perform the SVGD update with RB approximations.
14:  Update the tolerance $\varepsilon^l_r$ according to the gradient in the SVGD algorithm.
15: end while

SLIDE 59

Numerical example

We consider the diffusion problem (see the sketch below)
$$-\nabla \cdot (a(\theta, x) \nabla u) = f(x), \quad x \in D = (0, 1)^2,$$
where $f = 1$ and the coefficient is
$$a(\theta, x) = 5 + \sum_{1 \le i+j \le 4} \frac{1}{i^2 + j^2}\, \theta_{i,j} \cos(i \pi x_1) \cos(j \pi x_2).$$
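A minimal sketch evaluating the diffusion coefficient above on a grid; the ordering of the 14 parameters $\theta_{i,j}$ in the flat vector, and the uniform draw in the usage line, are assumptions of this sketch.

```python
import numpy as np

def diffusion_coefficient(theta, x1, x2):
    """a(theta, x) = 5 + sum_{1 <= i+j <= 4} theta_ij cos(i pi x1) cos(j pi x2) / (i^2 + j^2)."""
    a = 5.0 * np.ones_like(x1)
    k = 0
    for i in range(5):
        for j in range(5):
            if 1 <= i + j <= 4:                # 14 index pairs (i, j)
                a += theta[k] / (i**2 + j**2) * np.cos(i * np.pi * x1) * np.cos(j * np.pi * x2)
                k += 1
    return a

# Usage on a 64x64 grid (illustrative):
x1, x2 = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64))
a = diffusion_coefficient(np.random.uniform(-1, 1, 14), x1, x2)
```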

SLIDE 60

Numerical results: Comparison

[Figure: (a) samples at steps $l = 0, 9, 99$; (b) marginal posterior. Comparison of the distribution of 128 samples driven by SVGD with high-fidelity approximation (blue) and reduced basis approximation (red).]

SLIDE 61

Numerical results: Accuracy

[Figure: (a) adaptive construction, $\eta_y$; (b) adaptive construction, $\nabla_\theta \eta_y$; (c) fixed construction, $\eta_y$; (d) fixed construction, $\nabla_\theta \eta_y$.]

SLIDE 62

Numerical results: Adaptive greedy algorithm

[Figure: tolerances for the adaptive greedy algorithm (left); number of reduced basis functions for different initial tolerances (right).]

SLIDE 63

Numerical results: Cost

Table: Comparison of high-fidelity (FE) and reduced basis approximations on degrees of freedom (DOF) and CPU time (s), for different initial tolerances and numbers of samples.

                              FE        adaptive RB              fixed RB
initial tolerance ε⁰ᵣ         n/a       1       0.1     0.01     0.00001
M = 64
  DOF (N_h / N_r)             16641     20      31      49       62
  time to build RB            n/a       4.4     7.1     12.2     15.8
  time for evaluation         1.8e3     4.4     4.8     5.8      7.3
  speedup factor              1         203     148     98       62
M = 128
  DOF (N_h / N_r)             16641     19      30      53       87
  time to build RB            n/a       4.5     7.3     14.3     26.3
  time for evaluation         3.5e3     8.3     9.5     11.8     19.2
  speedup factor              1         267     212     137      78

P. Chen, O. Ghattas. Stein variational reduced basis Bayesian inversion, 2019.

SLIDE 64

Summary

Take-away message:
Reduced basis methods reduce the computational cost while preserving physical structure, with certified accuracy.
Leverage goal-oriented adaptive construction of the RB.

Ongoing:
RB for SVN.
Parameter and state reduction by projected SV + RB.
Extension to nonlinear and nonaffine problems.

Thank you for your attention!

SLIDE 65

References I

Beskos, A., Girolami, M., Lan, S., Farrell, P. E., and Stuart, A. M. (2017). Geometric MCMC for infinite-dimensional inverse problems. Journal of Computational Physics, 335:327–351.

Binev, P., Cohen, A., Dahmen, W., DeVore, R., Petrova, G., and Wojtaszczyk, P. (2011). Convergence rates for greedy algorithms in reduced basis methods. SIAM Journal on Mathematical Analysis, 43(3):1457–1472.

Bui-Thanh, T., Ghattas, O., Martin, J., and Stadler, G. (2013). A computational framework for infinite-dimensional Bayesian inverse problems part I: The linearized case, with application to global seismic inversion. SIAM Journal on Scientific Computing, 35(6):A2494–A2523.

SLIDE 66

References II

Bui-Thanh, T. and Girolami, M. (2014). Solving large-scale PDE-constrained Bayesian inverse problems with Riemann manifold Hamiltonian Monte Carlo. Inverse Problems, 30(11):114014.

Chen, P. and Ghattas, O. (2019). Stein variational reduced basis Bayesian inversion. In preparation.

Chen, P. and Schwab, C. (2016). Sparse-grid, reduced-basis Bayesian inversion: Nonaffine-parametric nonlinear equations. Journal of Computational Physics, 316:470–503.

SLIDE 67

References III

Chen, P., Villa, U., and Ghattas, O. (2017). Hessian-based adaptive sparse quadrature for infinite-dimensional Bayesian inverse problems. Computer Methods in Applied Mechanics and Engineering, 327:147–172.

Chen, P., Wu, K., Chen, J., O'Leary-Roseberry, T., and Ghattas, O. (2019). Projected Stein variational Newton: A fast and scalable Bayesian inference method in high dimensions. arXiv preprint arXiv:1901.08659.

Cui, T., Marzouk, Y., and Willcox, K. (2015). Data-driven model reduction for the Bayesian solution of inverse problems. International Journal for Numerical Methods in Engineering, 102(5):966–990.

SLIDE 68

References IV

Detommaso, G., Cui, T., Marzouk, Y., Spantini, A., and Scheichl, R. (2018). A Stein variational Newton method. In Advances in Neural Information Processing Systems, pages 9187–9197.

DeVore, R., Petrova, G., and Wojtaszczyk, P. (2012). Greedy algorithms for reduced bases in Banach spaces. arXiv preprint arXiv:1204.2290.

El Moselhy, T. and Marzouk, Y. (2012). Bayesian inference with optimal maps. Journal of Computational Physics.

SLIDE 69

References V

Farcas, I.-G., Latz, J., Ullmann, E., Neckel, T., and Bungartz, H.-J. (2019). Multilevel adaptive sparse Leja approximations for Bayesian inverse problems. arXiv preprint arXiv:1904.12204.

Gantner, R. N. and Schwab, C. (2016). Computational higher order quasi-Monte Carlo integration. In Monte Carlo and Quasi-Monte Carlo Methods, pages 271–288. Springer.

Girolami, M. and Calderhead, B. (2011). Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(2):123–214.

SLIDE 70

References VI

Lan, S., Bui-Thanh, T., Christie, M., and Girolami, M. (2016). Emulation of higher-order tensors in manifold Monte Carlo methods for Bayesian inverse problems. Journal of Computational Physics, 308:81–101.

Lassila, T., Manzoni, A., Quarteroni, A., and Rozza, G. (2013). A reduced computational and geometrical framework for inverse problems in hemodynamics. International Journal for Numerical Methods in Biomedical Engineering, 29(7):741–776.

Lieberman, C., Willcox, K., and Ghattas, O. (2010). Parameter and state model reduction for large-scale statistical inverse problems. SIAM Journal on Scientific Computing, 32(5):2523–2542.

SLIDE 71

References VII

Liu, Q. and Wang, D. (2016). Stein variational gradient descent: A general purpose Bayesian inference algorithm. In Advances in Neural Information Processing Systems, pages 2378–2386.

Martin, J., Wilcox, L., Burstedde, C., and Ghattas, O. (2012). A stochastic Newton MCMC method for large-scale statistical inverse problems with application to seismic inversion. SIAM Journal on Scientific Computing, 34(3):A1460–A1487.

Marzouk, Y., Najm, H., and Rahn, L. (2007). Stochastic spectral methods for efficient Bayesian solution of inverse problems. Journal of Computational Physics, 224(2):560–586.

SLIDE 72

References VIII

Marzouk, Y. and Xiu, D. (2009). A stochastic collocation approach to Bayesian inference in inverse problems. Communications in Computational Physics, 6(4):826–847.

Nguyen, C., Rozza, G., Huynh, D., and Patera, A. (2010). Reduced basis approximation and a posteriori error estimation for parametrized parabolic PDEs; application to real-time Bayesian parameter estimation. Technical report, John Wiley & Sons.

Oliver, D. S. (2017). Metropolized randomized maximum likelihood for improved sampling from multimodal distributions. SIAM/ASA Journal on Uncertainty Quantification, 5(1):259–277.

SLIDE 73

References IX

Rezende, D. J. and Mohamed, S. (2015). Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770.

Schillings, C. and Schwab, C. (2013). Sparse, adaptive Smolyak quadratures for Bayesian inverse problems. Inverse Problems, 29(6).

Schillings, C., Sprungk, B., and Wacker, P. (2019). On the convergence of the Laplace approximation and noise-level-robustness of Laplace-based Monte Carlo methods for Bayesian inverse problems. arXiv preprint arXiv:1901.03958.

SLIDE 74

References X

Schwab, C. and Stuart, A. (2012). Sparse deterministic approximation of Bayesian inverse problems. Inverse Problems, 28(4):045003.

Spantini, A., Bigoni, D., and Marzouk, Y. (2018). Inference via low-dimensional couplings. The Journal of Machine Learning Research, 19(1):2639–2709.

Stuart, A., Voss, J., and Wiberg, P. (2004). Conditional path sampling of SDEs and the Langevin MCMC method. Communications in Mathematical Sciences, 2(4):685–697.

Wang, J. and Zabaras, N. (2005). Using Bayesian statistics in the estimation of heat source in radiation. International Journal of Heat and Mass Transfer, 48(1):15–29.

SLIDE 75

References XI

Wang, K., Bui-Thanh, T., and Ghattas, O. (2018). A randomized maximum a posteriori method for posterior sampling of high dimensional nonlinear Bayesian inverse problems. SIAM Journal on Scientific Computing, 40(1):A142–A171.

Wang, Z., Cui, T., Bardsley, J., and Marzouk, Y. (2019). Scalable optimization-based sampling on function space. arXiv preprint arXiv:1903.00870.

Thank you for your attention!