SLIDE 1

Transport methods for sampling: low-dimensional structure and preconditioning

Youssef Marzouk joint work with Daniele Bigoni, Matthew Parno, Alessio Spantini, & Olivier Zahm

Department of Aeronautics and Astronautics · Center for Computational Engineering · Statistics and Data Science Center · Massachusetts Institute of Technology · http://uqgroup.mit.edu

Support from AFOSR, DARPA, DOE

12 July 2019


SLIDE 2

Motivation: Bayesian inference in large-scale models

◮ Bayes' rule: observations $y$, parameters $x$

  $\pi_{\mathrm{pos}}(x) := \pi(x \mid y) \propto \pi(y \mid x)\,\pi_{\mathrm{pr}}(x)$

◮ Goal: characterize the posterior distribution (density $\pi_{\mathrm{pos}}$)
◮ This is a challenging task since:
  ◮ $x \in \mathbb{R}^n$ is typically high-dimensional (e.g., a discretized function)
  ◮ $\pi_{\mathrm{pos}}$ is non-Gaussian
  ◮ evaluations of the likelihood (hence $\pi_{\mathrm{pos}}$) may be expensive
  ◮ $\pi_{\mathrm{pos}}$ can be evaluated up to a normalizing constant


SLIDES 3–4

Computational challenges

◮ Extract information from the posterior (means, covariances, event probabilities, predictions) by evaluating posterior expectations:

  $\mathbb{E}_{\pi_{\mathrm{pos}}}[h(x)] = \int h(x)\,\pi_{\mathrm{pos}}(x)\,dx$

◮ Key strategy for making this computationally tractable: efficient and structure-exploiting sampling schemes
◮ This talk: relate these schemes to notions of coupling and transport. . .

SLIDES 5–8

Deterministic couplings of probability measures

[Diagram: the map T pushes the reference η forward to the target π; the inverse map S = T⁻¹ pulls π back to η]

Core idea

◮ Choose a reference distribution η (e.g., standard Gaussian)
◮ Seek a transport map $T : \mathbb{R}^n \to \mathbb{R}^n$ such that $T_\sharp \eta = \pi$
◮ Equivalently, find $S = T^{-1}$ such that $S_\sharp \pi = \eta$
◮ In principle, enables exact (independent, unweighted) sampling!
◮ Satisfying these conditions only approximately can still be useful!
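To make the pushforward condition concrete in one dimension, here is a minimal sketch, not one of the constructions discussed later: with $\eta = N(0,1)$ and a target with CDF $F$, the classical inverse-CDF coupling $T = F^{-1} \circ \Phi$ satisfies $T_\sharp\eta = \pi$, so mapped reference samples are exact target samples. The Gamma target below is an illustrative choice.

```python
import numpy as np
from scipy import stats

eta = stats.norm()            # reference: standard Gaussian
pi = stats.gamma(a=3.0)       # illustrative 1D target

def T(z):
    # inverse-CDF coupling: the one-dimensional Knothe-Rosenblatt map
    return pi.ppf(eta.cdf(z))

rng = np.random.default_rng(0)
z = rng.standard_normal(100_000)   # independent reference samples
x = T(z)                           # exact, unweighted target samples
print(x.mean(), pi.mean())         # both close to 3.0
```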

SLIDES 9–10

Choice of transport map

A useful building block is the Knothe–Rosenblatt rearrangement:

$$T(x) = \begin{bmatrix} T^1(x_1) \\ T^2(x_1, x_2) \\ \vdots \\ T^n(x_1, x_2, \ldots, x_n) \end{bmatrix}$$

◮ Unique triangular and monotone map satisfying $T_\sharp \eta = \pi$ for absolutely continuous η, π on $\mathbb{R}^n$
◮ Jacobian determinant easy to evaluate
◮ Monotonicity is essentially one-dimensional: $\partial_{x_k} T^k > 0$
◮ "Exposes" marginals, enables conditional sampling. . .
◮ Numerical approximations can employ a monotone parameterization guaranteeing $\partial_{x_k} T^k > 0$. For example (see the sketch below):

$$T^k(x_1, \ldots, x_k) = a_k(x_1, \ldots, x_{k-1}) + \int_0^{x_k} \exp\big(b_k(x_1, \ldots, x_{k-1}, w)\big)\, dw$$
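A minimal numerical sketch of this integrated-exponential parameterization: the coefficient functions `a` and `b` below are hypothetical stand-ins for, e.g., polynomial expansions, and the integral is approximated with a simple trapezoidal rule.

```python
import numpy as np

def monotone_component(x_prev, xk, a, b, n_quad=64):
    """Evaluate T^k(x_{1:k}) = a(x_prev) + int_0^{xk} exp(b(x_prev, w)) dw.

    The strictly positive integrand makes the result increasing in xk
    by construction; the integral uses a uniform trapezoidal rule.
    """
    w = np.linspace(0.0, xk, n_quad)        # quadrature nodes on [0, xk]
    integrand = np.exp(b(x_prev, w))        # positive by construction
    dw = w[1] - w[0]
    return a(x_prev) + 0.5 * dw * np.sum(integrand[1:] + integrand[:-1])

# Hypothetical low-order coefficient functions, for illustration only
a = lambda xp: 0.5 * xp[0]
b = lambda xp, w: 0.1 * xp[0] + 0.2 * w
print(monotone_component(np.array([1.0]), 2.0, a, b))
```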

SLIDES 11–12

How to construct triangular maps?

Construction #1: "maps from densities," i.e., a variational characterization of the direct map T [Moselhy & M 2012]:

$$\min_{T \in \mathcal{T}^h_\triangle} D_{\mathrm{KL}}\big( T_\sharp \eta \,\|\, \pi \big) = \min_{T \in \mathcal{T}^h_\triangle} D_{\mathrm{KL}}\big( \eta \,\|\, (T^{-1})_\sharp \pi \big)$$

◮ π is the "target" density on $\mathbb{R}^n$; η is, e.g., $N(0, I_n)$
◮ $\mathcal{T}^h_\triangle$ is a set of monotone lower triangular maps
◮ $\mathcal{T}^{h \to \infty}_\triangle$ contains the Knothe–Rosenblatt rearrangement
◮ Expectation is with respect to the reference measure η
◮ Compute via, e.g., Monte Carlo or sparse quadrature
◮ Uses unnormalized evaluations of π and its gradients
◮ No MCMC or importance sampling
◮ In general non-convex, unless π is log-concave

SLIDES 13–16

Illustrative example

$$\min_{T} \; \mathbb{E}_\eta\Big[ -\log \pi \circ T - \sum_k \log \partial_{x_k} T^k \Big]$$

◮ Parameterized map $T \in \mathcal{T}^h_\triangle \subset \mathcal{T}_\triangle$
◮ Optimize over the coefficients of the parameterization (a minimal sketch follows below)
◮ Use gradient-based optimization
◮ The posterior is in the tail of the reference

[Figure: successive iterations of the optimization, showing the pushforward $T_\sharp \eta$ converging to a banana-shaped target]
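A self-contained sketch of this maps-from-densities optimization for a two-dimensional banana-shaped target. The target, the quadratic triangular-map parameterization, and the sample size are illustrative assumptions, not the ones used on the slides.

```python
import numpy as np
from scipy.optimize import minimize

# Unnormalized 2D "banana" target: x2 concentrates around a quadratic of x1
def log_pi(x):
    x1, x2 = x[:, 0], x[:, 1]
    return -0.5 * x1**2 - 0.5 * (x2 - x1**2)**2 / 0.25

# Triangular map with coefficients c:
#   T^1(z1) = c0 + exp(c1)*z1
#   T^2(z1, z2) = c2 + c3*z1 + c4*z1**2 + exp(c5)*z2
# Exponentiating the diagonal coefficients enforces monotonicity.
def T(c, z):
    t1 = c[0] + np.exp(c[1]) * z[:, 0]
    t2 = c[2] + c[3] * z[:, 0] + c[4] * z[:, 0]**2 + np.exp(c[5]) * z[:, 1]
    return np.column_stack([t1, t2])

def objective(c, z):
    # Monte Carlo estimate of E_eta[-log pi(T(z)) - sum_k log dT^k/dz_k]
    logdet = c[1] + c[5]                   # log of the (constant) diagonal terms
    return np.mean(-log_pi(T(c, z))) - logdet

rng = np.random.default_rng(0)
z = rng.standard_normal((5000, 2))         # reference samples, eta = N(0, I)
res = minimize(objective, np.zeros(6), args=(z,), method="BFGS")
samples = T(res.x, z)                      # approximate target samples
```

Note that this particular parameterization can represent the banana target exactly, so the optimized pushforward matches the target up to Monte Carlo error.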

SLIDES 17–19

Useful features

◮ Move samples; don't just reweigh them
◮ Independent and cheap samples: $x_i \sim \eta \Rightarrow T(x_i)$
◮ Clear convergence criterion, even with an unnormalized target density $\bar\pi$ (a sample estimate is sketched below):

$$D_{\mathrm{KL}}\big( T_\sharp \eta \,\|\, \pi \big) \approx \frac{1}{2}\, \mathrm{Var}_\eta\Big[ \log \frac{\eta}{(T^{-1})_\sharp \bar\pi} \Big]$$

◮ Can either accept bias or reduce it by:
  ◮ Increasing the complexity of the map $T \in \mathcal{T}^h_\triangle$
  ◮ Sampling the pullback $(T^{-1})_\sharp \pi$ using MCMC or importance sampling (more on this later)
◮ Related transport constructions for inference and sampling: Stein variational gradient descent [Liu & Wang 2016; Detommaso et al. 2018], normalizing flows [Rezende & Mohamed 2015], SOS polynomial flow [Jaini et al. 2019], Gibbs flow [Heng et al. 2015], particle flow filter [Reich 2011], implicit sampling [Chorin et al. 2009–2015], etc.
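A minimal sketch of this variance diagnostic, assuming the user supplies callables for the map, its log-Jacobian determinant, the unnormalized target log-density, and the reference log-density:

```python
import numpy as np

def kl_diagnostic(T, log_det_T, log_pi_bar, log_eta, z):
    """Estimate D_KL(T_sharp eta || pi) ~ 0.5 * Var_eta[log eta - log pullback]
    from reference samples z; works with an unnormalized target log_pi_bar,
    since the unknown normalizing constant only shifts the mean.
    """
    # log of the pullback density (T^{-1})_sharp pi_bar evaluated at z:
    #   log pi_bar(T(z)) + log |det grad T(z)|
    log_pullback = log_pi_bar(T(z)) + log_det_T(z)
    return 0.5 * np.var(log_eta(z) - log_pullback)
```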

SLIDES 20–21

How to construct triangular maps?

Construction #2: "maps from samples"

$$\min_{S \in \mathcal{S}^h} D_{\mathrm{KL}}\big( S_\sharp \pi \,\|\, \eta \big) = \min_{S \in \mathcal{S}^h} D_{\mathrm{KL}}\big( \pi \,\|\, (S^{-1})_\sharp \eta \big)$$

◮ Suppose we have Monte Carlo samples $\{x_i\}_{i=1}^M \sim \pi$
◮ For standard Gaussian η, this problem is convex and separable
◮ This is density estimation via transport! (cf. Tabak & Turner 2013)
◮ Equivalent to maximum likelihood estimation of S:

$$\widehat{S} \in \arg\max_{S \in \mathcal{S}^h} \frac{1}{M} \sum_{i=1}^M \log \underbrace{\big[(S^{-1})_\sharp \eta\big]}_{\text{pullback}}(x_i), \qquad \eta = N(0, I_n)$$

◮ Each component $S^k$ of S can be computed separately, via smooth convex optimization (a sketch follows below):

$$\widehat{S}^k \in \arg\min_{S^k \in \mathcal{S}^h_{\triangle,k}} \frac{1}{M} \sum_{i=1}^M \Big[ \frac{1}{2} S^k(x_i)^2 - \log \partial_k S^k(x_i) \Big]$$
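A minimal sketch of fitting one component, under the simplifying assumption of the parameterization $S^k(x) = \sum_j \beta_j \psi_j(x_{1:k-1}) + \alpha x_k$ with constant $\alpha > 0$, for which $\partial_k S^k = \alpha$ and the objective above is smooth and convex in $(\beta, \alpha)$:

```python
import numpy as np
from scipy.optimize import minimize

def fit_component(Psi, xk):
    """Fit S^k(x) = Psi @ beta + alpha * x_k (alpha > 0) by minimizing
    (1/M) sum_i [ 0.5 * S^k(x_i)^2 - log alpha ].

    Psi : (M, p) matrix of features psi_j(x_{1:k-1}) at the samples
    xk  : (M,) values of the k-th variable
    """
    M, p = Psi.shape

    def objective(theta):
        beta, alpha = theta[:p], theta[p]
        Sk = Psi @ beta + alpha * xk
        return 0.5 * np.mean(Sk**2) - np.log(alpha)

    theta0 = np.zeros(p + 1); theta0[p] = 1.0
    bounds = [(None, None)] * p + [(1e-8, None)]   # keep alpha positive
    res = minimize(objective, theta0, method="L-BFGS-B", bounds=bounds)
    return res.x[:p], res.x[p]

# Illustrative usage on stand-in samples
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
beta, alpha = fit_component(np.column_stack([np.ones(500), X[:, 0]]), X[:, 1])
```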

SLIDES 22–23

Low-dimensional structure of transport maps

Underlying challenge: maps in high dimensions

◮ Major bottleneck: representation of the map, e.g., the cardinality of the map basis
◮ How to make the construction/representation of high-dimensional transports tractable?

Main ideas:

1. Exploit Markov structure of the target distribution
   ◮ Leads to sparsity and/or decomposability of transport maps [Spantini, Bigoni, & M JMLR 2018]
2. Exploit certain low-rank structure
   ◮ Near-identity or "lazy" maps [Bigoni et al. arXiv:1906.00031]

SLIDE 24

Markov random fields

◮ Let $Z_1, \ldots, Z_n$ be random variables with joint density π > 0
◮ $G = (V, E)$ encodes conditional independence (an I-map for π):

  $(i, j) \notin E$ iff $Z_i \perp\!\!\!\perp Z_j \mid Z_{V \setminus \{i, j\}}$

◮ Theorem [SBM 2018]: Define G s.t. $(i, j) \notin E$ iff $\partial^2_{x_i x_j} \log \pi = 0$. Then the resulting G is the unique minimal I-map for π.

[Figure: an undirected graph on the variables, illustrating the edge set E]

SLIDE 25

Sparsity of transport maps

◮ Focus on the inverse triangular map S, where $S_\sharp \pi = \eta$
◮ Theorem [SBM 2018]: S (a nonlinear function) inherits the same sparsity pattern as the Cholesky factor of the (properly scaled) incidence matrix of a graphical model for π, provided that $\eta(x) = \prod_i \eta(x_i)$

$$S(x) = \begin{bmatrix} S^1(x_1) \\ S^2(x_1, x_2) \\ S^3(x_1, x_2, x_3) \\ \vdots \\ S^n(x_1, x_2, \ldots, x_n) \end{bmatrix} \;\Longrightarrow\; \text{a sparse map in which each component } S^k \text{ retains only a subset of its arguments}$$

SLIDE 26

How to compute the sparsity pattern

[Figure: a sequence of marginal graphs $G_5, G_4, G_3, G_2$ on five nodes, each obtained from the previous by eliminating the highest-numbered node]

◮ Compute marginal graphs: $G_{i-1}$ is obtained from $G_i$ by removing node i and by turning its neighborhood into a clique (like variable elimination); a sketch of this procedure follows below
◮ Sparsity of the inverse transport: the i-th component of S can depend, at most, on the variables in a neighborhood of node i in $G_i$
◮ Sparsity depends on the ordering of the variables (similar to heuristics for sparse Cholesky factorization)
◮ The resulting pattern is $P_{kj} = \partial_{x_j} S^k$
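A minimal sketch of this elimination procedure on an undirected graph, using plain Python sets; the chain-graph example at the end is illustrative.

```python
def sparsity_pattern(edges, n):
    """For each k, compute the variables that the inverse-map component S^k
    may depend on, via variable elimination on the undirected graph of pi.

    edges : set of frozenset({i, j}) pairs, nodes labeled 1..n
    Returns a dict: k -> set of variable indices (including k itself).
    """
    G = {i: set() for i in range(1, n + 1)}
    for e in edges:
        i, j = tuple(e)
        G[i].add(j); G[j].add(i)

    pattern = {}
    for i in range(n, 0, -1):
        nbrs = {j for j in G[i] if j < i}      # neighborhood of i in G_i
        pattern[i] = nbrs | {i}                # S^i depends on these, at most
        for a in nbrs:                         # eliminate node i: make its
            G[a] |= (nbrs - {a})               # remaining neighbors a clique
            G[a].discard(i)
        del G[i]
    return pattern

# Example: a chain 1-2-3-4-5 gives S^k depending only on (x_{k-1}, x_k)
chain = {frozenset({i, i + 1}) for i in range(1, 5)}
print(sparsity_pattern(chain, 5))
```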

SLIDES 27–28

Decomposable transport maps

◮ Definition: a decomposable transport is a map $T = T_1 \circ \cdots \circ T_k$ that factorizes as the composition of finitely many maps of low effective dimension that are triangular (up to a permutation), e.g.,

$$T(x) = \underbrace{\begin{bmatrix} A_1(x_1, x_2, x_3) \\ B_1(x_2, x_3) \\ C_1(x_3) \\ x_4 \\ x_5 \\ x_6 \end{bmatrix}}_{T_1} \circ \underbrace{\begin{bmatrix} x_1 \\ A_2(x_2, x_3, x_4, x_5) \\ B_2(x_3, x_4, x_5) \\ C_2(x_4, x_5) \\ D_2(x_5) \\ x_6 \end{bmatrix}}_{T_2} \circ \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ A_3(x_4) \\ B_3(x_4, x_5) \\ C_3(x_4, x_5, x_6) \end{bmatrix}}_{T_3}$$

◮ Theorem [SBM 2018]: Decomposable graphical models for π lead to decomposable direct maps T, provided that $\eta(x) = \prod_i \eta(x_i)$
◮ Example graph decomposition $V = (A, S, B)$; the effective dimension of each component map is $|A \cup S|$

[Figure: a six-node graph decomposed into subsets A and B separated by S]

SLIDE 29

Transport maps and graphical models: key message

◮ Inverse maps: enforce sparsity in the approximation space $\mathcal{S}_\triangle$, i.e., in solving $\min_{S \in \mathcal{S}_\triangle} D_{\mathrm{KL}}\big( \pi \,\|\, (S^{-1})_\sharp \eta \big)$
◮ Can also be used for structure learning in non-Gaussian graphical models [Morrison, Baptista, & M NeurIPS 2017]
◮ Direct maps: enforce decomposable structure in the approximation space $\mathcal{T}_\triangle$, i.e., when solving $\min_{T \in \mathcal{T}_\triangle} D_{\mathrm{KL}}\big( T_\sharp \eta \,\|\, \pi \big)$
◮ A general tool for modeling and computation with non-Gaussian Markov random fields

SLIDE 30

Transport maps and state-space models

[Figure: state-space graphical model with hidden states $Z_0, \ldots, Z_N$, observations $Y_0, \ldots, Y_N$, and static parameters Θ]

◮ In many situations, elements of the composition $T = T_1 \circ T_2 \circ \cdots \circ T_k$ can be constructed sequentially
◮ Yields new algorithms for smoothing and joint state-parameter inference in state-space models [SBM 2018; Houssineau, Jasra, Singh 2018]
◮ Recent offshoot: nonlinear ensemble filtering [arXiv:1907.00389]

SLIDE 31

Remainder of the talk: two vignettes

1. Exploiting the low-dimensional structure of transport
   ◮ Sparsity
   ◮ Decomposability
   ◮ Low rank
2. What to do with approximate maps?
   ◮ Preconditioned/adaptive MCMC

SLIDE 32

Vignette #1: low-rank structure

◮ Let $U = [U_r \; U_\perp] \in \mathbb{R}^{n \times n}$ be a unitary matrix, with $U_r \in \mathbb{R}^{n \times r}$. A lazy map $T : \mathbb{R}^n \to \mathbb{R}^n$ takes the form

  $T(z) = U_r\, \tau(z_1, \ldots, z_r) + U_\perp z_\perp$

  for some diffeomorphism $\tau : \mathbb{R}^r \to \mathbb{R}^r$.
◮ A map $T \in \mathcal{T}_r(U)$ departs from the identity only on an r-dimensional subspace
◮ Proposition: For any lazy map $T \in \mathcal{T}_r(U)$, there exists a strictly positive function $f : \mathbb{R}^r \to \mathbb{R}_+$ such that $T_\sharp \eta(x) = f(U_r^\top x)\, \eta(x)$ for all $x \in \mathbb{R}^n$, where $\eta = N(0, I_n)$. Conversely, any density of the form $f(U_r^\top x)\, \eta(x)$ for some $f : \mathbb{R}^r \to \mathbb{R}_+$ admits a lazy-map representation.

SLIDE 33

Low-rank structure

Why would such structure (approximately) appear?

◮ Bayesian inverse problems: the data are only partially informative; the posterior departs from the prior primarily on a low-dimensional subspace
◮ Formalized by the likelihood-informed subspace [Cui et al. 2014]; see also active subspaces [Constantine et al. 2015] and recent refinements/connections [Zahm et al. 2018]

SLIDE 34

Error bound and subspace

How to find a good $U_r$? (A sample-based sketch follows below.)

◮ Define

$$H_\pi := \int \nabla \log \frac{\pi}{\eta} \left( \nabla \log \frac{\pi}{\eta} \right)^{\!\top} d\pi\,.$$

◮ Let $(\lambda_i, u_i)$ be the i-th eigenpair of $H_\pi$ and put $U_r = [u_1 \; u_2 \; \cdots \; u_r]$.
◮ Theorem [Zahm et al. 2018]:

$$D_{\mathrm{KL}}\big( \pi \,\|\, T^\star_\sharp \eta \big) \le \frac{1}{2}\big( \lambda_{r+1} + \cdots + \lambda_d \big),$$

  where $T^\star_\sharp \eta = f^\star(U_r^\top x)\, \eta(x)$ and $f^\star(z_r) = \mathbb{E}_{X \sim \eta}\Big[ \frac{\pi(X)}{\eta(X)} \,\Big|\, U_r^\top X = z_r \Big]$.
◮ Good approximation when the spectrum of $H_\pi$ decays quickly
◮ Uses a ridge approximation of $d\pi/d\eta$ (e.g., the likelihood), with the optimal profile function $f^\star$
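A minimal Monte Carlo sketch of this construction. It assumes (approximate) samples from π and a callable for $\nabla \log(\pi/\eta)$ are available; in practice these might come from the prior or from MCMC iterates.

```python
import numpy as np

def informed_subspace(grad_log_ratio, samples, r):
    """Monte Carlo estimate of H = E_pi[g g^T], g = grad log(pi/eta),
    its dominant eigenvectors U_r, and the KL error bound
    0.5 * (lambda_{r+1} + ... + lambda_n).

    grad_log_ratio : callable returning grad log(pi/eta) at a point
    samples        : (M, n) array of (approximate) samples from pi
    """
    G = np.array([grad_log_ratio(x) for x in samples])   # (M, n) gradients
    H = G.T @ G / len(samples)                           # sample average of g g^T
    lam, U = np.linalg.eigh(H)                           # ascending eigenvalues
    lam, U = lam[::-1], U[:, ::-1]                       # sort descending
    bound = 0.5 * lam[r:].sum()                          # tail-sum KL bound
    return U[:, :r], lam, bound
```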

SLIDE 35

Layers of lazy maps

◮ What if the $(\lambda_i)$ do not decay quickly? What if we are limited to small r?
◮ Answer: layers of lazy maps, via a greedy construction (sketched in code below):
  ◮ Given $(\pi, \eta, r_1)$: compute $H_\pi$ and construct a first lazy map $T_1$
  ◮ Pull back π through $T_1$: $\pi_2 := (T_1^{-1})_\sharp \pi$
  ◮ Given $(\pi_2, \eta, r_2)$: compute $H_{\pi_2}$ and construct a next lazy map $T_2$ . . .
  ◮ Generic iteration: at stage ℓ, build a lazy map to the pullback $\pi_\ell := \big( (T_1 \circ T_2 \circ \cdots \circ T_{\ell-1})^{-1} \big)_\sharp \pi$
  ◮ Stop when $\frac{1}{2} \mathrm{Tr}(H_{\pi_\ell}) < \epsilon$

[Figure: inputs passed through alternating rotations and low-dimensional lazy maps to produce the outputs]
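A minimal sketch of the greedy loop. The interfaces `estimate_H`, `build_lazy_map`, and the map methods `.forward` / `.log_det` are hypothetical placeholders for whatever fitting routine (e.g., the maps-from-densities construction) is used for each layer.

```python
import numpy as np

def greedy_lazy_layers(log_pi, estimate_H, build_lazy_map, r, eps, max_layers=20):
    """Greedy construction of T = T_1 o ... o T_L from rank-r lazy layers.

    estimate_H(log_target)        -> Monte Carlo estimate of H for that target
    build_lazy_map(log_target, U) -> fitted lazy map with .forward(z) and
                                     .log_det(z) (hypothetical API)
    """
    def pullback(log_target, T):
        # log density of the pullback: log pi_l(T(z)) + log|det grad T(z)|
        return lambda z: log_target(T.forward(z)) + T.log_det(z)

    layers, log_target = [], log_pi
    for _ in range(max_layers):
        H = estimate_H(log_target)
        lam, U = np.linalg.eigh(H)            # ascending eigenvalues
        if 0.5 * lam.sum() < eps:             # trace stopping criterion
            break
        Ur = U[:, ::-1][:, :r]                # top-r informed directions
        T = build_lazy_map(log_target, Ur)
        layers.append(T)
        log_target = pullback(log_target, T)  # next stage's target
    return layers
```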

SLIDE 36

Layers of lazy maps

Example: rotated "banana" target distribution, r = 1 maps

[Figure: the target π alongside the pushforwards $T_\sharp \eta$ after each successive rank-1 lazy layer, progressively matching the target]

SLIDE 37

Example: log-Gaussian Cox process

[Figure: the true intensity field $\Lambda^\star$ with observations $y^\star$; realizations of $\Lambda \sim \pi_\Lambda$; and realizations of $\Lambda \mid y^\star$]

SLIDE 38

Example: log-Gaussian Cox process

◮ Parameter dimension n = 4096, 30 observations; fixed ranks r

[Figure: convergence of $\frac{1}{2} \mathrm{Tr}(H_{\pi_\ell})$ over lazy iterations for ranks r = 1, 3, 5, and the spectra of $H_{\pi_\ell}$ across iterations]

SLIDE 39

Example: elliptic PDE Bayesian inverse problem

$$-\nabla \cdot \big( e^{\kappa(x)} \nabla u(x) \big) = 0 \quad \text{for } x \in D := [0, 1]^2,$$
$$u(x) = 0 \ \text{for } x_1 = 0, \qquad u(x) = 1 \ \text{for } x_1 = 1, \qquad \frac{\partial u(x)}{\partial n} = 0 \ \text{for } x_2 \in \{0, 1\}$$

◮ Infer κ(x), discretized with n = 2601 parameters; 81 observations; lazy maps of r ≤ 4 and polynomial degree up to 2

[Figure: the solution u(x) with observation locations; convergence of $\frac{1}{2} \mathrm{Tr}(H_{\pi_\ell})$ and of the variance diagnostic over lazy iterations; posterior realizations of κ(x)]

SLIDES 40–41

Vignette #2: preconditioning

What to do when $T_\sharp \eta \neq \pi$?

◮ Maybe close enough? Can evaluate the variance diagnostic $\mathrm{Var}_\eta\big[ \log\big( \eta / (T^{-1})_\sharp \bar\pi \big) \big]$, bound $\mathrm{Tr}\big( H_{(T^{-1})_\sharp \pi} \big)$, etc.
◮ Enrich T, e.g., add a layer or expand $\mathcal{T}^h_\triangle$ in the given layer
◮ Work on the pullback: treat $(T^{-1})_\sharp \pi$ with an asymptotically exact scheme, e.g., Markov chain Monte Carlo

One possible construction: transport-accelerated MCMC

◮ The transport map "preconditions" the MCMC target; use the MCMC iterates in the maps-from-samples construction
◮ Can be understood in the framework of adaptive MCMC

SLIDE 42

Preconditioning MCMC

◮ An effective MCMC proposal is one adapted to the target
◮ Can we transform proposals or, equivalently, targets for better sampling?

SLIDE 43

Recall the maps-from-samples construction

◮ Suppose we have samples $x_i$ from the target π:

$$\widehat{S} \in \arg\max_{S \in \mathcal{S}^h} \frac{1}{M} \sum_{i=1}^M \log\big[ (S^{-1})_\sharp \eta \big](x_i), \qquad \eta = N(0, I_n)$$

◮ Easy to solve: convex and separable, regardless of the form of π
◮ View $\widehat{S}_\sharp \pi$ as the "preconditioned" target
◮ In the MCMC setting, $\{x_i\}_{i=1}^M$ comprises dependent MCMC samples
◮ $\widehat{S}_\sharp \pi$ may be far from standard Gaussian for small M and/or a crude $\mathcal{S}^h$

SLIDES 44–46

Map-accelerated MCMC

• Ingredient #1: static map
  – Perform MCMC in the reference space, on the "preconditioned" density $S_\sharp \pi$
  – A simple proposal $q_r$ in reference space (e.g., a random walk) corresponds to a more complex, tailored proposal directly on the target distribution
  – Metropolis–Hastings acceptance ratio (one such step is sketched below):

$$\alpha = \frac{\pi\big( S^{-1}(r') \big)\, \big| \nabla S^{-1} \big|_{r'}\; q_r(r \mid r')}{\pi\big( S^{-1}(r) \big)\, \big| \nabla S^{-1} \big|_{r}\; q_r(r' \mid r)}$$

[Figure: the reference density $\tilde p(r)$ and the target π(θ) linked by the map; a proposal $r \to r'$ in reference space induces a proposal $\theta \to \theta'$ on the target]
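A minimal sketch of one map-preconditioned random-walk Metropolis step. The callables `S_inv` and `log_det_S_inv` are assumed to be supplied by the fitted map, and the symmetric proposal makes the $q_r$ terms in the ratio above cancel.

```python
import numpy as np

def map_rwm_step(r, log_pi, S_inv, log_det_S_inv, step, rng):
    """One random-walk Metropolis step in reference space, targeting the
    'preconditioned' target S_sharp(pi).

    log_pi        : unnormalized log target density on theta
    S_inv         : maps a reference point r to theta = S^{-1}(r)
    log_det_S_inv : log |det grad S^{-1}(r)|
    """
    def log_pullback(r):
        return log_pi(S_inv(r)) + log_det_S_inv(r)

    r_prop = r + step * rng.standard_normal(r.shape)   # symmetric proposal
    log_alpha = log_pullback(r_prop) - log_pullback(r)
    if np.log(rng.uniform()) < log_alpha:
        return r_prop, True
    return r, False
```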
SLIDES 47–48

Map-accelerated MCMC

• Ingredient #2: adaptive map
  – Update the map with each MCMC iteration: more samples, a more accurate map S, better proposals
  – Adaptive MCMC [Haario 2001, Andrieu 2006], but with a nonlinear transformation to capture non-Gaussian structure

[Figure: the map $S_k$ and preconditioned target $(S_k)_\sharp \pi$ being refined to $S_{k+1}$ and $(S_{k+1})_\sharp \pi$ as samples accumulate]

SLIDES 49–51

Map-accelerated MCMC

• Ingredient #3: global proposals
  – If the map becomes sufficiently accurate, we would like to avoid random-walk behavior
  – Solution: delayed-rejection MCMC [Mira 2001]
  – First proposal = independent sample from η (global, more efficient); second proposal = random walk (local, more robust); one such two-stage step is sketched below

• The entire scheme is provably ergodic with respect to the exact posterior measure [Parno & M, SIAM JUQ 2018]
  – Requires enforcing some regularity conditions on the maps, to preserve the tail behavior of the transformed target

[Figure: reference-space random-walk and independence proposals, and the corresponding mapped proposals on the target]
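A minimal sketch of this delayed-rejection step on the preconditioned (reference-space) target, under the simplifying assumptions that stage 1 is an independence proposal from η and stage 2 is a symmetric random walk that ignores the stage-1 point; several proposal densities then cancel in Mira's second-stage acceptance ratio. Written with plain exponentials for clarity rather than numerical robustness.

```python
import numpy as np

def dr_step(r, log_pullback, log_eta, step, rng):
    """One delayed-rejection step: global independence proposal first,
    local random walk as the fallback."""
    def alpha1(a, b):
        # stage-1 acceptance for the independence proposal q1(a, b) = eta(b)
        return min(1.0, np.exp(log_pullback(b) - log_pullback(a)
                               + log_eta(a) - log_eta(b)))

    y1 = rng.standard_normal(r.shape)              # stage 1: draw from eta
    if rng.uniform() < alpha1(r, y1):
        return y1
    y2 = r + step * rng.standard_normal(r.shape)   # stage 2: random walk
    # Mira's second-stage ratio; q1 and symmetric q2 terms have canceled
    num = np.exp(log_pullback(y2)) * (1.0 - alpha1(y2, y1))
    den = np.exp(log_pullback(r)) * (1.0 - alpha1(r, y1))
    if den > 0 and rng.uniform() < num / den:
        return y2
    return r
```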

SLIDE 52

Example: biological oxygen demand model

◮ Likelihood model:

$$d = \theta_1\big( 1 - \exp(-\theta_2 x) \big) + \epsilon, \qquad \epsilon \sim N\big( 0,\; 2 \times 10^{-4} \big)$$

◮ 20 noisy observations at x = 5/5, 6/5, . . . , 25/5
◮ Degree-three polynomial map

[Figure: true posterior density]
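A minimal sketch of the corresponding unnormalized log-posterior; a flat prior and the synthetic data values below are assumptions for illustration, since the slide does not specify them.

```python
import numpy as np

def bod_log_posterior(theta, x, d, sigma2=2e-4):
    """Unnormalized log-posterior for d = theta1*(1 - exp(-theta2*x)) + eps,
    assuming a flat prior (an assumption; the slide does not state the prior)."""
    mean = theta[0] * (1.0 - np.exp(-theta[1] * x))
    return -0.5 * np.sum((d - mean) ** 2) / sigma2

# Synthetic data on the observation grid (illustrative values only)
x = np.arange(5, 26) / 5.0
rng = np.random.default_rng(1)
d = 1.0 * (1.0 - np.exp(-0.1 * x)) + np.sqrt(2e-4) * rng.standard_normal(x.size)
print(bod_log_posterior(np.array([1.0, 0.1]), x, d))
```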

SLIDE 53

Transformed distribution

[Figure: the original posterior π, and the pushforward of the posterior through the learned map, $\widehat{S}_\sharp \pi$]

SLIDE 54

Results: ESS per computational effort

ESS per 1,000 model evaluations (θ1):  TM+DR 161,  TM+NUTS 57,  TM+LAN 179,  NUTS 1.4,  DRAM 5.8
ESS per second (θ1):  TM+DRG 1468,  TM+DRL 487,  TM+MIX 1495,  NUTS 57,  DRAM 127

SLIDE 55

Example #2: predator-prey model

◮ Six-parameter ODE population model (coded below):

$$\frac{dP}{dt} = rP\Big( 1 - \frac{P}{K} \Big) - s\, \frac{PQ}{a + P}, \qquad \frac{dQ}{dt} = u\, \frac{PQ}{a + P} - vQ$$

◮ Five noisy observations of both populations
◮ Infer 6 parameters + 2 initial values; uniform priors
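A minimal forward-model sketch of this ODE system; the parameter values, initial state, and time grid below are hypothetical, chosen only to make the example run.

```python
import numpy as np
from scipy.integrate import solve_ivp

def predator_prey(t, y, r, K, s, a, u, v):
    """Right-hand side of the six-parameter predator-prey model."""
    P, Q = y
    dP = r * P * (1 - P / K) - s * P * Q / (a + P)
    dQ = u * P * Q / (a + P) - v * Q
    return [dP, dQ]

# Illustrative parameters (r, K, s, a, u, v) and initial populations
params = (0.6, 100.0, 1.2, 25.0, 0.5, 0.3)
sol = solve_ivp(predator_prey, (0.0, 50.0), [50.0, 5.0], args=params,
                t_eval=np.linspace(0.0, 50.0, 5))   # five observation times
```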

SLIDE 56

Predator-prey model: chains

[Figure: MCMC trace plots of P(0) for DRAM, sMMALA, AMALA, TMDRG, TMDRL, and TMRWM]

SLIDE 57

Example: maple sap dynamics model

◮ Coupled PDE system for ice, water, and gas locations [Ceseri & Stockie 2013]
◮ Measure gas pressure in the vessel
◮ Infer 10 physical model parameters
◮ Very challenging posterior!

[Figure: schematic of the fiber/vessel geometry, with gas, ice, and water regions; image from Ceseri and Stockie, 2013]

SLIDE 58

Maple posterior distribution

[Figure: pairwise posterior marginals of $\theta_1, \ldots, \theta_{10}$, with detailed views of the strongly non-Gaussian pairs $(\theta_1, \theta_7)$ and $(\theta_3, \theta_6)$]

SLIDE 59

Results: ESS per computational effort

ESS per 10,000 model evaluations:  TM+DRG 5.7,  TM+DRL 10,  TM+MIX 2.9,  DRAM 0.6
ESS per 1,000 seconds:  TM+DRG 18,  TM+DRL 26,  TM+MIX 7.1,  DRAM 2.3

SLIDE 60

Comments on MCMC with transport maps

Useful characteristics of the algorithm:
◮ Map construction is easily parallelizable
◮ Requires no gradients from the posterior density

Generalizes many current MCMC techniques:
◮ Adaptive Metropolis: the map enables non-Gaussian proposals and a natural mixing between local and global moves
◮ Manifold MCMC [Girolami & Calderhead 2011]: the map also defines a Riemannian metric

SLIDE 61

Looking to higher dimensions: regularized estimation of S

For simplicity, consider map components $S^k(x) = \sum_j \beta_j \psi_j(x_{1:k-1}) + \alpha_k x_k$ and solve (a sketch follows below):

$$\widehat{S}^k \in \arg\min_{S^k \in \mathcal{S}^h_{\triangle,k}} \frac{1}{N} \sum_{i=1}^N \Big[ \frac{1}{2} S^k(x_i)^2 - \log \partial_k S^k(x_i) \Big] + \lambda_N \|\beta\|_1$$

Assume sub-Gaussian π and basis functions $\psi_j(x)$.

Theorem [BZM]: For polynomial maps of degree m with sparsity s, with high probability,

$$\mathbb{E}_\pi\Big[ D_{\mathrm{KL}}\big( \pi(x_k \mid x_{1:k-1}) \,\big\|\, \widehat{S}^\sharp \eta\,(x_k \mid x_{1:k-1}) \big) \Big] \lesssim \frac{s^2 m \log k}{N}$$

Takeaways:
◮ Accurate estimation is feasible in high dimensions with N ≪ k
◮ From the factorization property of the density, the error in the conditionals ensures $D_{\mathrm{KL}}\big( \pi \,\|\, \widehat{S}^\sharp \eta \big) \lesssim d\, \frac{s^2 m \log d}{N}$
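A minimal sketch of the ℓ1-regularized component fit, assuming the same simple parameterization with a constant diagonal coefficient α. The standard split β = β⁺ − β⁻ makes the penalty smooth over the nonnegative orthant; coordinate descent or proximal methods would be the more usual choice at scale.

```python
import numpy as np
from scipy.optimize import minimize

def fit_sparse_component(Psi, xk, lam):
    """l1-regularized fit of S^k(x) = Psi @ beta + alpha * x_k (alpha > 0)."""
    M, p = Psi.shape

    def objective(theta):
        bp, bm, alpha = theta[:p], theta[p:2*p], theta[2*p]
        Sk = Psi @ (bp - bm) + alpha * xk
        # smooth convex objective + l1 penalty via the positive/negative split
        return 0.5 * np.mean(Sk**2) - np.log(alpha) + lam * np.sum(bp + bm)

    theta0 = np.zeros(2*p + 1); theta0[-1] = 1.0
    bounds = [(0, None)] * (2*p) + [(1e-8, None)]   # nonnegative split, alpha > 0
    res = minimize(objective, theta0, method="L-BFGS-B", bounds=bounds)
    return res.x[:p] - res.x[p:2*p], res.x[2*p]
```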

SLIDE 62

Conclusions

◮ Central idea: characterize complex/intractable distributions by constructing deterministic couplings
◮ Many kinds of low-dimensional structure (non-exhaustive):
  ◮ Sparse maps, decomposable maps
  ◮ Low-rank structure (lazy maps)
◮ Exploiting the pullback distribution:
  ◮ Compositions of approximate maps, constructed greedily
  ◮ Use approximate maps to precondition other sampling or cubature schemes

SLIDES 63–64

Conclusions

Extensions and ongoing work:

◮ Using sparse grids or QMC for map construction
◮ Preconditioning other sampling schemes: computational tradeoffs, impact on target geometry and regularity
◮ Zoo of map parameterizations and their approximation properties
◮ Additional varieties of low-dimensional structure
◮ Sequential variational inference with maps: filtering, smoothing, parameter estimation
◮ Maps from samples:
  ◮ Nonparametric estimation schemes
  ◮ As a tool for ensemble filtering [arXiv:1907.00389] or approximate Bayesian computation. . .

Thanks for your attention!

SLIDE 65

References

◮ A. Spantini, R. Baptista, Y. Marzouk. "Coupling techniques for nonlinear ensemble filtering." arXiv:1907.00389.
◮ D. Bigoni, O. Zahm, A. Spantini, Y. Marzouk. "Greedy inference with layers of lazy maps." arXiv:1906.00031.
◮ O. Zahm, T. Cui, K. Law, A. Spantini, Y. Marzouk. "Certified dimension reduction in nonlinear Bayesian inverse problems." arXiv:1807.03712.
◮ A. Spantini, D. Bigoni, Y. Marzouk. "Inference via low-dimensional couplings." JMLR 19(66): 1–71, 2018.
◮ M. Parno, Y. Marzouk. "Transport map accelerated Markov chain Monte Carlo." SIAM JUQ 6: 645–682, 2018.
◮ G. Detommaso, T. Cui, A. Spantini, Y. Marzouk, R. Scheichl. "A Stein variational Newton method." NeurIPS 2018.
◮ R. Morrison, R. Baptista, Y. Marzouk. "Beyond normality: learning sparse probabilistic graphical models in the non-Gaussian setting." NeurIPS 2017.
◮ Y. Marzouk, T. Moselhy, M. Parno, A. Spantini. "An introduction to sampling via measure transport." Handbook of Uncertainty Quantification, R. Ghanem, D. Higdon, H. Owhadi, eds. Springer, 2016. arXiv:1602.05023.
◮ General Python code at http://transportmaps.mit.edu