

SLIDE 1

MLMC BIP OBIP Coupling PMCMC PMC(ML)MC SMC2 S(ML)MC2 Numerics MIMC Summary

Bayesian parameter estimation using Multilevel and multi-index Monte Carlo

Kody Law, joint with A. Jasra (NUS), K. Kamatani (Osaka), Y. Xu (NUS*), & Y. Zhou (Cubist)

Monash Workshop on Numerical Differential Equations and Applications, Monash University, AU

February 12, 2020

SLIDE 2

Outline

1. Multilevel Monte Carlo sampling
2. Bayesian inference problem
3. Our Bayesian inference problem
4. Approximate coupling
5. Particle Markov chain Monte Carlo
6. Particle Markov chain Multilevel Monte Carlo
7. Sequential Monte Carlo²
8. Sequential Multilevel Monte Carlo²
9. Numerical simulations
10. Multi-index Monte Carlo sampling
11. Summary

SLIDE 3

Orientation

Aim: approximate posterior expectations of the state path and static parameters associated to an S(P)DE which must be finitely approximated.

Solution: apply an approximate coupling strategy so that multilevel and multi-index Monte Carlo (MIMC) methods can be used within particle MCMC [B02, AR08, ADH10] and SMC² [CJP13].

MLMC (d = 1) [H00, G08] and MIMC (d > 1) [HNT15] methods reduce the cost to achieve mean-squared error O(ε²). Recently this methodology has been applied to inference, mostly in cases where the target can be evaluated up to a normalizing constant [HSS13, DKST15, HTL16, BJLTZ17]. Here we can only simulate a non-negative unbiased estimator of the target, up to a normalizing constant; using PMCMC we are able to sample consistently from an approximate coupling of successive targets [JKLZ18.i, JKLZ18.ii], and this is extended to the sequential context via SMC² [JLX19].


SLIDE 5

Example: expectation for SDE [G08]

Estimation of an expectation of the solution of an intractable stochastic differential equation (SDE):

dX = f(X)dt + σ(X)dW ,  X_0 = x_0 .

Aim: estimate E[g(X_T)]. We need to:

(1) Approximate, e.g. by the Euler-Maruyama method with resolution h:

X_{n+1} = X_n + h f(X_n) + √h σ(X_n) ξ_n ,  ξ_n ∼ N(0, 1).

(2) Sample {X^{(i)}_{N_T}}_{i=1}^N, with N_T = T/h.
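The two steps above can be sketched in code. A minimal illustration, not from the slides: Euler-Maruyama paths and a plain Monte Carlo estimate of E[g(X_T)], applied to a geometric Brownian motion for which the exact value exp(0.05) is known. The function names and the test model are illustrative assumptions.

```python
import numpy as np

def euler_maruyama(f, sigma, x0, T, h, rng):
    """One Euler-Maruyama path of dX = f(X)dt + sigma(X)dW on [0, T]:
    X_{n+1} = X_n + h f(X_n) + sqrt(h) sigma(X_n) xi_n, xi_n ~ N(0, 1)."""
    x = x0
    for _ in range(int(round(T / h))):
        x = x + h * f(x) + np.sqrt(h) * sigma(x) * rng.standard_normal()
    return x

def mc_estimate(g, f, sigma, x0, T, h, N, seed=0):
    """Plain (single-level) Monte Carlo estimate of E[g(X_T)] at resolution h."""
    rng = np.random.default_rng(seed)
    return np.mean([g(euler_maruyama(f, sigma, x0, T, h, rng)) for _ in range(N)])

# Geometric Brownian motion dX = 0.05 X dt + 0.2 X dW, X_0 = 1,
# for which the exact value is E[X_1] = exp(0.05).
est = mc_estimate(g=lambda x: x, f=lambda x: 0.05 * x,
                  sigma=lambda x: 0.2 * x, x0=1.0, T=1.0, h=0.01, N=20000)
```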

SLIDE 6

Multilevel Monte Carlo (MLMC)

Aim: approximate η_∞(g) := E_{η_∞}(g) for g : E → R.

Single level estimator:

(1/N) Σ_{i=1}^N g(U_L^{(i)}) ,  U_L^{(i)} ∼ η_L i.i.d.

Cost to achieve MSE = O(ε²) is C = Cost(U_L^{(i)}) × ε^{−2}.

Multilevel estimator*:

Σ_{l=0}^L (1/N_l) Σ_{i=1}^{N_l} { g(U_l^{(i)}) − g(U_{l−1}^{(i)}) } ,

with (U_l, U_{l−1})^{(i)} ∼ η̄_l i.i.d. such that ∫ η̄_l du_{l−1,l} = η_{l,l−1} for l = 0, …, L. (*g(U_{−1}^{(i)}) := 0.)

Cost is C_ML = Σ_{l=0}^L C_l N_l, where C_l is the cost to obtain a sample from η̄_l. Fix the bias by choosing L. Minimizing the cost C_ML({N_l}_{l=0}^L) for fixed variance Σ_{l=0}^L V_l/N_l gives N_l ∝ √(V_l/C_l).

Example: Milstein solution of an SDE for MSE = O(ε²): C = O(ε^{−3}) vs. C_ML = O(ε^{−2}).
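A sketch of the generic multilevel combination and of the allocation rule N_l ∝ √(V_l/C_l). The `sample_pair` callback, standing in for one draw of the increment g(U_l) − g(U_{l−1}), is a hypothetical interface, not from the slides.

```python
import numpy as np

def mlmc(sample_pair, L, N, seed=0):
    """Multilevel estimator: sum over levels l = 0..L of the sample mean of
    N[l] draws of the increment g(U_l) - g(U_{l-1}) (with g(U_{-1}) := 0),
    supplied by the user through sample_pair(l, rng)."""
    rng = np.random.default_rng(seed)
    return sum(np.mean([sample_pair(l, rng) for _ in range(N[l])])
               for l in range(L + 1))

def allocate(V, C, budget):
    """Optimal per-level sample sizes N_l ∝ sqrt(V_l / C_l), scaled so the
    total cost sum_l N_l C_l matches the budget."""
    V, C = np.asarray(V, float), np.asarray(C, float)
    raw = np.sqrt(V / C)
    return np.maximum(1, (budget / np.sum(raw * C)) * raw).astype(int)

# Deterministic toy: increments 2^{-l} telescope to 1 + 1/2 + 1/4 + 1/8 = 1.875.
est = mlmc(lambda l, rng: 2.0 ** (-l), L=3, N=[10, 10, 10, 10])
# With V = [1, 1/4], C = [1, 4] and budget 200, the rule gives N_l = [100, 25].
n_opt = allocate([1.0, 0.25], [1.0, 4.0], 200)
```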

SLIDE 7

Illustration of pairwise coupling

Pairwise coupling of trajectories of an SDE:

X¹_{n+1} = X¹_n + h f(X¹_n) + √h σ(X¹_n) ξ_n ,  ξ_n ∼ N(0, 1),  n = 0, …, N₁ ,
X⁰_{n+1} = X⁰_n + (2h) f(X⁰_n) + √h σ(X⁰_n)(ξ_{2n} + ξ_{2n+1}) ,  n = 0, …, N₁/2 .

[Figure, two panels over t ∈ [0, 1]:
(a) Wiener process, W¹_n = √h Σ_{i=0}^n ξ_i, W⁰_n = W¹_{2n};
(b) Stochastic process driven by the Wiener process.]
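The pairwise coupling above can be sketched as follows: the coarse step of size 2h consumes the two fine-step increments ξ_{2n} + ξ_{2n+1}, so both paths are driven by the same Brownian motion. The geometric Brownian motion test model is an assumption, not from the slides; it confirms that the fine/coarse difference has much smaller variance than the fine path itself, which is what MLMC exploits.

```python
import numpy as np

def coupled_pair(f, sigma, x0, T, h, rng):
    """One coupled (fine, coarse) Euler-Maruyama pair: the fine path takes
    steps of size h with increments xi_n; the coarse path takes steps of
    size 2h driven by the summed increments xi_{2n} + xi_{2n+1} of the same
    Brownian motion."""
    xf = xc = x0
    for _ in range(int(round(T / h)) // 2):
        xi0, xi1 = rng.standard_normal(2)
        xf = xf + h * f(xf) + np.sqrt(h) * sigma(xf) * xi0   # fine step 2n
        xf = xf + h * f(xf) + np.sqrt(h) * sigma(xf) * xi1   # fine step 2n+1
        xc = xc + 2 * h * f(xc) + np.sqrt(h) * sigma(xc) * (xi0 + xi1)  # coarse
    return xf, xc

# Coupled pairs for a geometric Brownian motion (assumed toy model).
rng = np.random.default_rng(1)
pairs = [coupled_pair(lambda x: 0.05 * x, lambda x: 0.2 * x, 1.0, 1.0, 0.01, rng)
         for _ in range(2000)]
fine = np.array([p[0] for p in pairs])
coarse = np.array([p[1] for p in pairs])
```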


SLIDE 9

Bayesian inference is about approximating integrals

Suppose we know how to evaluate γ(x) for x ∈ X. Let

η(dx) = γ(x)dx / ∫_X γ(x)dx ,

let ϕ : X → R, and suppose we want to estimate

η(ϕ) := ∫_X ϕ(x) η(dx) .

X may be quite high-dimensional, e.g. R^d with d = 100 easily, or even 1000, 10000, etc.

SLIDE 10

Monte Carlo

If we could obtain i.i.d. samples x_i ∼ η, then we could use

η(ϕ) ≈ (1/N) Σ_{i=1}^N ϕ(x_i) .

The convergence rate (of the MSE) is O(1/N), independently of d. Unfortunately we cannot get i.i.d. samples.

SLIDE 11

Importance sampling and ratio estimators

Suppose we can get i.i.d. samples x_i ∼ ν, where 0 < G(x) := γ(x)/ν(x) < C. Then we can use the self-normalized importance sampling estimator

η(ϕ) ≈ Σ_{i=1}^N G(x_i)ϕ(x_i) / Σ_{i=1}^N G(x_i) .

The rate will still be O(1/N), but typically with a constant O(e^d), depending on E(G(x) − EG(x))². We may as well use quadrature.
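A minimal sketch of the self-normalized estimator above; the Gaussian target/proposal pair is an illustrative assumption, chosen so the weights G are bounded as required.

```python
import numpy as np

def snis(phi, gamma, nu_sample, nu_pdf, N, seed=0):
    """Self-normalized importance sampling: eta(phi) estimated by
    sum_i G(x_i) phi(x_i) / sum_i G(x_i), with weights G = gamma / nu."""
    rng = np.random.default_rng(seed)
    x = nu_sample(rng, N)
    G = gamma(x) / nu_pdf(x)          # unnormalized weights, bounded here
    return np.sum(G * phi(x)) / np.sum(G)

# Target eta = N(1, 1), known only through gamma(x) = exp(-(x - 1)^2 / 2);
# proposal nu = N(0, 4). The estimator should recover E[X] = 1.
est = snis(phi=lambda x: x,
           gamma=lambda x: np.exp(-0.5 * (x - 1.0) ** 2),
           nu_sample=lambda rng, N: 2.0 * rng.standard_normal(N),
           nu_pdf=lambda x: np.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi)),
           N=50000)
```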

SLIDE 12

Markov chain Monte Carlo

Suppose we can construct a Markov kernel K, that is, an operator with the property K : B(X) → B(X) and K* : P(X) → P(X), where B(X) are the bounded measurable functions and P(X) are the probability measures, such that

(ηK)(dx) = ∫_X η(dx′) K(x′, dx) = η(dx) ,

and for all A ⊂ X and x, x′ ∈ X,

∫_A K(x, dz) ≤ ∫_A K(x′, dz) .

SLIDE 13

Markov chain Monte Carlo

Then we can run the Markov chain to collect samples, x_0 ∈ X and x_i ∼ K(x_{i−1}, ·) = K^i(x_0, ·), and use these for Monte Carlo:

η(ϕ) ≈ (1/N) Σ_{i=N_b+1}^{N_b+N} ϕ(x_i) .

Again Monte Carlo provides rate O(1/N), but now under quite general conditions one may achieve a polynomial constant O(d).

SLIDE 14

Example: Metropolis-Hastings

Let Q denote a Markov kernel on X.

1. Let x_0 ∈ X.
2. Sample x* ∼ Q(x_i, ·).
3. Set x_{i+1} = x* with probability

   min{ 1, [γ(x*) Q(x*, x_i)] / [γ(x_i) Q(x_i, x*)] } ,

   otherwise x_{i+1} = x_i.
4. Set i = i + 1 and return to the start of step 2.
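A sketch of the algorithm above for a symmetric random-walk proposal Q, in which case the Q-ratio cancels from the acceptance probability; the one-dimensional Gaussian target is an assumed toy example, and log densities are used for numerical stability.

```python
import numpy as np

def metropolis_hastings(log_gamma, x0, n_iter, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings targeting eta ∝ gamma. The proposal
    Q(x, .) = N(x, step^2) is symmetric, so the acceptance probability
    min{1, gamma(x*) Q(x*, x) / [gamma(x) Q(x, x*)]} reduces to the gamma
    ratio, compared here on the log scale."""
    rng = np.random.default_rng(seed)
    x, lg = x0, log_gamma(x0)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        x_star = x + step * rng.standard_normal()
        lg_star = log_gamma(x_star)
        if np.log(rng.uniform()) < lg_star - lg:   # accept
            x, lg = x_star, lg_star
        chain[i] = x
    return chain

# Target: standard normal via gamma(x) = exp(-x^2/2); the normalizer is never used.
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_iter=20000)
```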


SLIDE 16

Parameter inference

Estimate the posterior expectation of a function ϕ of the joint path X_{1:T} and parameters θ of an intractable S(P)DE

dX = f_θ(X)dt + σ_θ(X)dW ,  X_0 ∼ µ_θ ,

given noisy partial observations Y_n ∼ g_θ(X_n, ·), n = 1, …, T.

Aim: estimate E[ϕ(θ, X_{0:T}) | y_{1:T}], where y_{1:T} := {y_1, …, y_T}.

The hidden process {X_n} is a Markov chain. Discretize with resolution h and denote the transition kernel F_{θ,h}(x_{p−1}, dx_p); this can be simulated from, but its density cannot be evaluated.

SLIDE 17

Return to ML (SDE, for simplicity)

The joint measure (suppressing the fixed y_p in the notation) is

Π_h(dθ, dx_{0:n}) ∝ Π(dθ) µ_θ(dx_0) Π_{p=1}^n g_θ(x_p, y_p) F_{θ,h}(x_{p−1}, dx_p) .

For +∞ > h_0 > ⋯ > h_L > 0, we would like to compute

E_{Π_{h_L}}[ϕ(θ, X_{0:n})] = Σ_{l=0}^L { E_{Π_{h_l}}[ϕ(θ, X_{0:n})] − E_{Π_{h_{l−1}}}[ϕ(θ, X_{0:n})] } ,

where E_{Π_{h_{−1}}}[·] := 0.

SLIDE 19

Approximate coupling

Consider a single pair E_{Π_h}[ϕ(θ, X_{0:n})] − E_{Π_{h′}}[ϕ(θ, X_{0:n})], h < h′. Let z = (x, x′) and let Q_{θ,h,h′}(z, dz̄) be a coupling of (F_{θ,h}(x, dx̄), F_{θ,h′}(x′, dx̄′)). Let G_{p,θ}(z) = max{g_θ(x, y_p), g_θ(x′, y_p)}. We will sample from the joint coarse/fine filter

Π_{h,h′}(dθ, dz_{0:n}) ∝ Π(dθ) ν_θ(dz_0) Π_{p=1}^n G_{p,θ}(z_p) Q_{θ,h,h′}(z_{p−1}, dz_p) ,

where ν_θ is the initial coupling ν_θ(d(x, x′)) = µ_θ(dx) δ_x(dx′).

SLIDE 20

Change of measure

We have

E_{Π_h}[ϕ(θ, X_{0:n})] − E_{Π_{h′}}[ϕ(θ, X′_{0:n})]
= E_{Π_{h,h′}}[ϕ(θ, X_{0:n}) H_{1,θ}(θ, Z_{0:n})] / E_{Π_{h,h′}}[H_{1,θ}(θ, Z_{0:n})]
− E_{Π_{h,h′}}[ϕ(θ, X′_{0:n}) H_{2,θ}(θ, Z_{0:n})] / E_{Π_{h,h′}}[H_{2,θ}(θ, Z_{0:n})] ,

where

H_{1,θ}(θ, z_{0:n}) = Π_{p=1}^n g_θ(x_p, y_p) / G_{p,θ}(z_p) ,
H_{2,θ}(θ, z_{0:n}) = Π_{p=1}^n g_θ(x′_p, y_p) / G_{p,θ}(z_p) .


SLIDE 22

Sequential Importance Resampling [DDG01]

[Figure 1.1 (Doucet, de Freitas & Gordon), N = 10 particles: the bootstrap filter starts at time t−1 with an unweighted measure approximating p(x_{t−1}|y_{1:t−2}). For each particle the importance weights are computed using the information at time t−1, yielding a weighted measure approximating p(x_{t−1}|y_{1:t−1}). The resampling step then selects only the fittest particles, giving an unweighted measure still approximating p(x_{t−1}|y_{1:t−1}). Finally, the sampling (prediction) step introduces variety, resulting in a measure approximating p(x_t|y_{1:t−1}).]

Resampling uses only the importance weights and indices (both one-dimensional quantities). This enables one to easily carry out sequential inference for very complex models.

An illustrative example: for demonstration purposes, the bootstrap algorithm is applied to the following nonlinear, non-Gaussian model (Gordon et al. 1993, Kitagawa 1996):

X_t = X_{t−1}/2 + 25 X_{t−1}/(1 + X²_{t−1}) + 8 cos(1.2t) + V_t ,
Y_t = X²_t/20 + W_t .

SLIDE 23

Particle filter, for fixed θ

Let M ≥ 1 and θ be fixed, and introduce ancestor indices a^{1:M}_{0:n−1} ∈ {1, …, M}^{Mn}. The bootstrap particle filter [Del04] approximates

Π_{h,h′}(dz_{0:n} | θ) ∝ ν_θ(dz_0) Π_{p=1}^n G_{p,θ}(z_p) Q_{θ,h,h′}(z_{p−1}, dz_p)

by sampling from

P(a^{1:M}_{0:n−1}, dz^{1:M}_{0:n} | θ) = [ Π_{i=1}^M ν_θ(dz^i_0) ] Π_{p=1}^n Π_{i=1}^M [ G_{p−1,θ}(z^{a^i_{p−1}}_{p−1}) / Σ_{j=1}^M G_{p−1,θ}(z^j_{p−1}) ] Q_{θ,h,h′}(z^{a^i_{p−1}}_{p−1}, dz^i_p) ,

where G_{0,θ} := 1, i.e. z^{a^i_{p−1}}_{p−1} is resampled with probability G_{p−1,θ}(z^{a^i_{p−1}}_{p−1}) / Σ_{j=1}^M G_{p−1,θ}(z^j_{p−1}).
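A single-level sketch of the bootstrap filter above. The slides run it on coupled pairs z = (x, x′); here a plain state x is used for brevity, and the linear-Gaussian toy model is an assumption, not from the slides.

```python
import numpy as np

def bootstrap_pf(y, transition, g, x0_sample, M, seed=0):
    """Bootstrap particle filter: propagate M particles through the transition
    kernel, weight by G_p = g(x_p, y_p), resample proportionally to the
    weights, and accumulate (in log form) the unbiased likelihood estimate
    p^M(y_{1:n}) = prod_p (1/M) sum_j G_p(x_p^j)."""
    rng = np.random.default_rng(seed)
    x = x0_sample(rng, M)
    log_like = 0.0
    for yp in y:
        x = transition(rng, x)                        # propagate all particles
        G = g(x, yp)                                  # observation weights
        log_like += np.log(np.mean(G))
        x = x[rng.choice(M, size=M, p=G / G.sum())]   # multinomial resampling
    return log_like

# Toy linear-Gaussian model: X_p = 0.9 X_{p-1} + N(0,1), Y_p = X_p + N(0, 0.25).
rng = np.random.default_rng(42)
T = 20
x_true = np.zeros(T + 1)
for p in range(1, T + 1):
    x_true[p] = 0.9 * x_true[p - 1] + rng.standard_normal()
y_obs = x_true[1:] + 0.5 * rng.standard_normal(T)
ll = bootstrap_pf(
    y_obs,
    transition=lambda rng, x: 0.9 * x + rng.standard_normal(x.shape),
    g=lambda x, yp: np.exp(-0.5 * ((yp - x) / 0.5) ** 2) / (0.5 * np.sqrt(2 * np.pi)),
    x0_sample=lambda rng, M: rng.standard_normal(M),
    M=200)
```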

SLIDE 24

Particle marginal MH (PMMH) [ADH10]

Draw J with probability proportional to G_{n,θ}(z^j_n) for j = 1, …, M. Let ẑ_n = z^J_n, and trace its ancestral lineage: ẑ_{n−1} = z^{a^J_{n−1}}_{n−1}, ẑ_{n−2} = z^{a^{a^J_{n−1}}_{n−2}}_{n−2}, and so on.

Define

p^M_{h,h′}(y_{0:n} | θ) = Π_{p=1}^n [ (1/M) Σ_{j=1}^M G_{p,θ}(z^j_p) ] .

Denote U = (a^{1:M}_{0:n−1}, z^{1:M}_{0:n}, θ).

Run an MH chain on the extended state space {U_i} targeting ∝ P(a^{1:M}_{0:n−1}, z^{1:M}_{0:n} | θ) p^M_{h,h′}(y_{0:n} | θ) Π(dθ). Draw J ∼ P(J | U_i) and construct ẑ^i_{0:n} as above. The target has the property [ADH10]

P(U, J) = P((U, J)\(ẑ_{0:n}, θ) | ẑ_{0:n}, θ) Π_{h,h′}(ẑ_{0:n}, θ) .

SLIDE 25

Particle marginal MH (PMMH) [ADH10]

1

Sample θ0 ∼ π(dθ) and (a1:M

0:n−1, z1:M 0:n ) from particle filter

P(a1:M

0:n−1, dz1:M 0:n |θ0), and store pM h,h′(y0:n|θ0). 2

Select a path z0

0:n: draw zj n with probability proportional to Gn,θ0(zj n), let

  • z0

n = zj n, and trace back its ancestral lineage

  • z0

n−1 = z aj

n−1

n−1 ,

  • z0

n−2 = z a

aj n−1 n−2

n−2 , and so on ; Set i = 1 . 3

Sample θ∗|θi−1 according to R(dθ∗|θi−1) = r(θ∗|θi−1)dθ∗, then sample from particle filter P(a1:M

0:n−1, dz1:M 0:n |θ∗). Select one path

z∗

0:n as above. 4

Set θi = θ∗, zi

0:n =

z∗

0:n with probability:

min

  • 1, pM

h,h′(y0:n|θ∗)

pM

h,h′(y0:n|θi−1)

π(θ∗)r(θi−1|θ∗) π(θi−1)r(θ∗|θi−1)

  • therwise θi = θi−1,

zi

0:n =

zi−1

0:n . 5

Set i = i + 1 and return to the start of 3.
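Steps 1-5 above can be sketched for the marginal θ-chain; the path selection of steps 2-3 is omitted for brevity, and the AR(1) toy model, the flat prior, and all tuning constants are illustrative assumptions, not from the slides.

```python
import numpy as np

def pf_loglike(theta, y, M, rng):
    """Bootstrap-filter estimate of log p(y_{1:T} | theta) for the toy model
    X_p = theta X_{p-1} + N(0, 1), Y_p = X_p + N(0, 0.25)."""
    x = rng.standard_normal(M)
    ll = 0.0
    for yp in y:
        x = theta * x + rng.standard_normal(M)        # propagate
        G = np.exp(-0.5 * ((yp - x) / 0.5) ** 2)      # observation weights
        ll += np.log(np.mean(G))
        x = x[rng.choice(M, size=M, p=G / G.sum())]   # resample
    return ll

def pmmh(y, n_iter, M=100, step=0.1, seed=0):
    """Particle marginal MH on theta: random-walk proposal, accepted with the
    usual MH ratio but with the intractable likelihood replaced by its
    unbiased particle estimate."""
    rng = np.random.default_rng(seed)
    theta = 0.5
    ll = pf_loglike(theta, y, M, rng)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        theta_star = theta + step * rng.standard_normal()
        if abs(theta_star) < 1.0:                     # flat prior on (-1, 1)
            ll_star = pf_loglike(theta_star, y, M, rng)
            if np.log(rng.uniform()) < ll_star - ll:
                theta, ll = theta_star, ll_star
        chain[i] = theta
    return chain

# Data simulated with theta = 0.8; the PMMH chain should concentrate near it.
rng = np.random.default_rng(3)
T = 50
x_true = np.zeros(T + 1)
for p in range(1, T + 1):
    x_true[p] = 0.8 * x_true[p - 1] + rng.standard_normal()
y_obs = x_true[1:] + 0.5 * rng.standard_normal(T)
chain = pmmh(y_obs, n_iter=1000)
```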


SLIDE 27

PMMH increment estimator

The PMMH ratio estimator

[ (1/N) Σ_{i=1}^N ϕ(θ^i, x̂^i_{0:n}) H_{1,θ}(θ^i, ẑ^i_{0:n}) ] / [ (1/N) Σ_{i=1}^N H_{1,θ}(θ^i, ẑ^i_{0:n}) ]
− [ (1/N) Σ_{i=1}^N ϕ(θ^i, x̂′^i_{0:n}) H_{2,θ}(θ^i, ẑ^i_{0:n}) ] / [ (1/N) Σ_{i=1}^N H_{2,θ}(θ^i, ẑ^i_{0:n}) ]

converges as N → ∞ to

E_{Π_{h,h′}}[ϕ(θ, X_{0:n}) H_{1,θ}(θ, Z_{0:n})] / E_{Π_{h,h′}}[H_{1,θ}(θ, Z_{0:n})] − E_{Π_{h,h′}}[ϕ(θ, X′_{0:n}) H_{2,θ}(θ, Z_{0:n})] / E_{Π_{h,h′}}[H_{2,θ}(θ, Z_{0:n})]
= E_{Π_h}[ϕ(θ, X_{0:n})] − E_{Π_{h′}}[ϕ(θ, X′_{0:n})] .

SLIDE 28

Multilevel estimator

E^{N_l}_l(ϕ) = [ (1/N_l) Σ_{i=1}^{N_l} ϕ(θ^i, x̂^i_{0:n}) H_{1,θ}(θ^i, ẑ^i_{0:n}) ] / [ (1/N_l) Σ_{i=1}^{N_l} H_{1,θ}(θ^i, ẑ^i_{0:n}) ]
− [ (1/N_l) Σ_{i=1}^{N_l} ϕ(θ^i, x̂′^i_{0:n}) H_{2,θ}(θ^i, ẑ^i_{0:n}) ] / [ (1/N_l) Σ_{i=1}^{N_l} H_{2,θ}(θ^i, ẑ^i_{0:n}) ]

is a consistent estimator of (with E_0 := E_{Π_{h_0}}[ϕ(θ, X_{0:n})])

E_l(ϕ) := E_{Π_{h_l}}[ϕ(θ, X_{0:n})] − E_{Π_{h_{l−1}}}[ϕ(θ, X_{0:n})] .

⇒ Σ_{l=0}^L E^{N_l}_l(ϕ) is a consistent estimator of E_{Π_{h_L}}[ϕ(θ, X_{0:n})].

SLIDE 29

Multilevel estimator error analysis

Consider

Σ_{l=0}^L Ē^{N_l}_l(ϕ) ,  Ē^{N_l}_l(ϕ) = E^{N_l}_l(ϕ) − E_l(ϕ) .

One must bound

E[ ( Σ_{l=0}^L Ē^{N_l}_l(ϕ) )² ] = Σ_{l=0}^L { E[Ē^{N_l}_l(ϕ)²] + E[Ē^{N_l}_l(ϕ)] Σ_{q≠l} E[Ē^{N_q}_q(ϕ)] } .

SLIDE 30

Assumptions

(A1) ∀ y ∈ T, ∃ C > 0 such that ∀ x ∈ S, θ ∈ Θ: C ≤ g_θ(x, y) ≤ C^{−1}. And ∀ y ∈ T, g_θ(x, y) is globally Lipschitz on S × Θ.

(A2) ∀ 0 ≤ k ≤ n, ∃ β > 0 such that ∀ ϕ ∈ B_b(Θ × S^{k+1}) ∩ Lip(Θ × S^{k+1}) ∃ C > 0:

∫_{Θ×S^{2k+2}} |ϕ(θ, x_{0:k}) − ϕ(θ, x′_{0:k})|² Π(dθ) ν_θ(dz_0) Π_{p=1}^k Q_{θ,h,h′}(z_{p−1}, dz_p) ≤ C(h′)^β .

(A3) ∀ n > 0, ∃ ξ ∈ (0, 1) and ν ∈ P(W) s.t. ∀ w ∈ W, ϕ ∈ B_b(W) ∩ Lip(W), h, h′, the PMMH kernel K satisfies

∫_W ϕ(w′) K(w, dw′) ≥ ξ ∫_W ϕ(w′) ν(dw′) ,

and K is η-reversible, where η is the joint on the extended space.

SLIDE 31

Main result

Theorem (JKLZ18). Assume (A1-3). Then ∀ n > 0, ∃ β > 0 such that ∀ ϕ ∈ B_b(Θ × S^{n+1}) ∩ Lip(Θ × S^{n+1}) ∃ C > 0 such that

E[Ē^{N_l}_l(ϕ)²] ≤ C h_l^β / N_l ,  E[Ē^{N_l}_l(ϕ)] ≤ C h_l^{β/2} / N_l ,

where β is from (A2).

SLIDE 32

(A4) ∃ γ, α, C > 0 such that the cost to simulate E^{N_l}_l is controlled by C(E^{N_l}_l) ≤ C N_l h_l^{−γ}, and the bias is controlled by |E_{Π_{h_L}}(ϕ(θ, X_{0:n})) − E_{Π_0}(ϕ(θ, X_{0:n}))| ≤ C h_L^α.

Corollary. Assume (A1-4). ∀ n > 0 and ϕ ∈ B_b(Θ × S^{n+1}) ∩ Lip(Θ × S^{n+1}) ∃ C > 0 such that ∀ ε > 0 one can choose (L, {N_l}_{l=0}^L) so that

E[ | Σ_{l=0}^L E^{N_l}_l(ϕ) − E_{Π_0}(ϕ(θ, X_{0:n})) |² ] ≤ Cε² ,

with a total cost (per time step)

COST ≤ C ε^{−2}             if β > γ ,
COST ≤ C ε^{−2} |log(ε)|²    if β = γ ,
COST ≤ C ε^{−(2+(γ−β)/α)}    if β < γ .


SLIDE 34

SMC samplers

Let U_n = [u_0, …, u_n] ∈ U_n for n = 0, 1, 2, …. Consider target distributions η_n ∝ κ_n on U_n. Interlace sequential importance resampling (selection) along the sequence with mutation by MCMC kernels. This enables sampling sequentially with complexity O(n²) per sample.¹

Initialize i.i.d. U^i_0 ∼ η_0, i = 1, …, N, and u^i_1 ∼ q(U^i_0, ·). For n = 1, …:

• Resample {U^i_n}_{i=1}^N according to the weights {w^i_n}_{i=1}^N, where w^i_n = G^i_n / Σ_{j=1}^N G^j_n, G_0 = 1, and

  G^i_n = κ_n(U^i_n) / [ κ_{n−1}(U^i_{n−1}) q_n(U^i_{n−1}, u^i_n) ] .

• Draw U^i_{n+1} ∼ M_{n+1}(Û^i_n, ·), where

  M_{n+1}(U_n, dU′_{n+1}) = K_n(U_n, dU′_n) ⊗ q_{n+1}(U′_n, du′_{n+1})

  is an MCMC kernel such that η_n K_n = η_n.

¹ In principle, under suitable assumptions [BCJ14].
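A sketch of the selection/mutation recursion above. The slide's sampler acts on growing spaces U_n with proposal q; as a simplification this sketch keeps a fixed one-dimensional state and uses a geometric bridge of Gaussian targets, so the schedule and the random-walk mutation kernel are illustrative assumptions.

```python
import numpy as np

def smc_sampler(log_kappa, n_targets, N, mh_step=1.0, seed=0):
    """SMC sampler across targets eta_n ∝ kappa_n: weight by kappa_n/kappa_{n-1}
    (selection), resample, then mutate with an eta_n-invariant random-walk
    Metropolis kernel (5 sweeps per target)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(N)              # i.i.d. draws from eta_0 = N(0, 1)
    for n in range(1, n_targets):
        logw = log_kappa(n, x) - log_kappa(n - 1, x)
        w = np.exp(logw - logw.max())
        x = x[rng.choice(N, size=N, p=w / w.sum())]          # selection
        for _ in range(5):                                   # mutation
            x_star = x + mh_step * rng.standard_normal(N)
            accept = np.log(rng.uniform(size=N)) < log_kappa(n, x_star) - log_kappa(n, x)
            x = np.where(accept, x_star, x)
    return x

# Geometric bridge from eta_0 = N(0, 1) to N(2, 1): kappa_n ∝ exp(-(x - 2 t_n)^2 / 2).
ts = np.linspace(0.0, 1.0, 11)
samples = smc_sampler(lambda n, x: -0.5 * (x - 2.0 * ts[n]) ** 2, n_targets=11, N=2000)
```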

SLIDE 35

Sequential Monte Carlo2 (SMC2)

Define U_n = (θ, a^{1:M}_{0:n−1}, z^{1:M}_{0:n}), and recall the PMMH target P_n(U_n, J), which has the joint posterior as its marginal on (θ, ẑ_{0:n}), where ẑ_{0:n} is the ancestral path of the particle with index J ∼ {G_{n,θ}(z^j_n) / Σ_{i=1}^M G_{n,θ}(z^i_n)} at time n.

Consider the PMMH marginal P_n(U_n), which has as its marginal the posterior on θ.

SMC²: run an SMC sampler with N samples {U^k_n}_{k=1}^N targeting P_n(U_n), using PMMH marginal kernels (without sampling J) as the mutations. When we wish to estimate expectations E_{Π_{h,h′}}(f(θ, Z_{0:n})), we extend to P_n(U_n, J) = P_n(J | U_n) P_n(U_n). For each k = 1, …, N:

• Sample J^k ∼ {G_{n,θ^k}(z^{j,k}_n) / Σ_{i=1}^M G_{n,θ^k}(z^{i,k}_n)}.
• Construct the ancestral lineage ẑ^k_{0:n} (as before).
• Estimate (1/N) Σ_{k=1}^N f(θ^k, ẑ^k_{0:n}).


SLIDE 37

SMLMC2

For each l = 0, …, L: run SMC² on P_{l,n}(U_{l,n}); then at a given time n (the additional l index suppressed), for each k = 1, …, N_l:

• Sample J^k ∼ {G_{n,θ^k}(z^{j,k}_n) / Σ_{i=1}^M G_{n,θ^k}(z^{i,k}_n)}.
• Construct the ancestral lineage ẑ^k_{0:n} (as before).
• Estimate

E^{N_l}_l(ϕ) = [ (1/N_l) Σ_{i=1}^{N_l} ϕ(θ^i, x̂^i_{0:n}) H_{1,θ}(θ^i, ẑ^i_{0:n}) ] / [ (1/N_l) Σ_{i=1}^{N_l} H_{1,θ}(θ^i, ẑ^i_{0:n}) ]
− [ (1/N_l) Σ_{i=1}^{N_l} ϕ(θ^i, x̂′^i_{0:n}) H_{2,θ}(θ^i, ẑ^i_{0:n}) ] / [ (1/N_l) Σ_{i=1}^{N_l} H_{2,θ}(θ^i, ẑ^i_{0:n}) ] .

Construct the resulting MLMC estimator Σ_{l=0}^L E^{N_l}_l(ϕ).

We obtained a theorem (for MIMC) under very strong assumptions. Numerical results indicate the method works under weaker assumptions.


SLIDE 39

Ornstein-Uhlenbeck process

dX_t = θ(µ − X_t)dt + σ dW_t ,  X_0 = x_0 ,
Y_k | X_{δk} ∼ N(X_{δk}, τ²) ,  θ ∼ G(1, 1) ,  σ ∼ G(1, 0.5) .

N(m, τ²) denotes the Normal with mean m and variance τ². G(a, b) denotes the Gamma with shape a and scale b. Here x_0 = 0, µ = 0, δ = 0.5, and τ² = 0.2. 100 observations are simulated with θ = 1 and σ = 0.5.
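The data-generating setup above can be reproduced as follows. As an assumption for the sketch, the exact Gaussian OU transition over each inter-observation interval is used rather than a discretization; the function name is illustrative.

```python
import numpy as np

def simulate_ou_data(theta=1.0, sigma=0.5, mu=0.0, x0=0.0, delta=0.5,
                     tau2=0.2, n_obs=100, seed=0):
    """Simulate Y_k | X_{delta k} ~ N(X_{delta k}, tau^2), where X is the OU
    process dX_t = theta (mu - X_t) dt + sigma dW_t, using the exact
    Gaussian transition over each interval of length delta."""
    rng = np.random.default_rng(seed)
    a = np.exp(-theta * delta)                           # mean-reversion factor
    s = sigma * np.sqrt((1.0 - a * a) / (2.0 * theta))   # conditional std
    x, y = x0, np.empty(n_obs)
    for k in range(n_obs):
        x = mu + (x - mu) * a + s * rng.standard_normal()
        y[k] = x + np.sqrt(tau2) * rng.standard_normal()
    return y

# 100 observations with the slide's values theta = 1, sigma = 0.5.
y = simulate_ou_data()
```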

SLIDE 40

Langevin SDE

dX_t = ½ ∇log π(X_t)dt + σ dW_t ,  X_0 = x_0 ,
Y_k | X_k ∼ N(0, τ² exp(X_k)) ,  θ ∼ G(1, 1) ,  σ ∼ G(1, 0.5) .

π(x) denotes the probability density function of a Student's t-distribution with θ degrees of freedom. Here x_0 = 0. 1,000 observations are simulated with θ = 10, σ = 1, and τ² = 1.

SLIDE 41

Figure: Autocorrelation of a typical PMCMC chain, shown for each parameter of the Langevin SDE and the Ornstein-Uhlenbeck process.

SLIDE 42

Figure: Cost vs. MSE for the two parameters of each of the two SDEs (Langevin and Ornstein-Uhlenbeck), comparing the ML-PMCMC and PMCMC algorithms.

SLIDE 43

Model      Parameter   ML-PMCMC   PMCMC
OU         θ           −1.022     −1.463
OU         σ           −1.065     −1.522
Langevin   θ           −1.060     −1.508
Langevin   σ           −1.023     −1.481

Table: Estimated rates of convergence of MSE with respect to cost for the various parameters, fitted to the curves.


SLIDE 45

Multi-index Monte Carlo (MIMC) idea

If the spatio-temporal approximation dimension d > 1, then MIMC is preferable to MLMC [HNT15]. For α ∈ N^d,

∆_i E_α(ϕ_α(u)) = E_α(ϕ_α(u)) − E_{α−e_i}(ϕ_{α−e_i}(u)) ,  ∆ = ∆_d ⋯ ∆_1 ,

E(ϕ(u)) = Σ_{α∈N^d} ∆E_α(ϕ_α(u)) ≈ Σ_{α∈I} ∆E_α(ϕ_α(u)) .
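The mixed-difference operator ∆ = ∆_d ⋯ ∆_1 and its telescoping property can be checked numerically; the table of stand-in "estimates" is an illustrative assumption.

```python
import numpy as np

def mixed_difference(E, alpha):
    """Apply the tensorized difference Delta = Delta_d ... Delta_1 to a table
    of level estimates E[alpha]: Delta E_alpha = sum over subsets s of
    {1, ..., d} of (-1)^{|s|} E_{alpha - e_s}, with E := 0 outside the range."""
    d = len(alpha)
    total = 0.0
    for mask in range(1 << d):                 # enumerate subsets of dimensions
        shifted = [alpha[i] - ((mask >> i) & 1) for i in range(d)]
        if min(shifted) >= 0:
            total += (-1) ** bin(mask).count("1") * E[tuple(shifted)]
    return total

# Telescoping check: summing Delta E_alpha over the full index box recovers
# the highest-resolution entry E[2, 2].
E = np.arange(9.0).reshape(3, 3)               # stand-in "estimates" E_alpha
box_sum = sum(mixed_difference(E, (i, j)) for i in range(3) for j in range(3))
```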

SLIDE 46

MIMC estimator

Construct an empirical estimator of Σ_{α∈I} ∆E_α(ϕ_α(u)).

Key: for each α ∈ I, obtain N_α i.i.d. samples from a suitably coupled target, (U_{α(k_α)}, U_{α(k_α−1)}, …, U_{α(1)})^{(i)} ∼ η̄_α, such that ∫ η̄_α du_{α(1)} ⋯ du_{α(l−1)} du_{α(l+1)} ⋯ du_{α(k_α)} = η_{α(l)} for l = 0, …, L, where k_α ≤ 2^d. Similar to MLMC, one finds the optimal N_α ∝ √(V_α/C_α).

The optimal index set² I consists of superlevel sets of P_α = B_α/√(V_α C_α), where B_α is the bias associated to increment α. The canonical complexity O(ε^{−2}) can be obtained independent of the dimension d for optimal index sets. For tensor-product index sets an additional (dimension-dependent) constraint on the rates is required, or else the cost is dominated by the single highest-resolution sample.

² In the sense that any set with smaller work has larger bias.


SLIDE 48

Summary

A new approximate coupling strategy can be used to apply MLMC to PMCMC for static parameter estimation [JKLZ18.i]. The same strategy can be employed for multi-index MCMC [JKLZ18.ii], and was recently extended to MISMC² [JLX19.s]. Exciting prospects include:

• MIMC versions with appropriate index sets;
• other versions of MIMC ∩ SMC;
• links/comparisons with other methods, in cases where they are applicable;
• other practical enhancements.

PhD students and postdocs wanted, please inquire! AIMS Foundations of Data Science is now accepting papers!

SLIDE 49

References

[G08]: Giles. "Multilevel Monte Carlo path simulation." Op. Res. 56, 607-617 (2008).
[H00]: Heinrich. "Multilevel Monte Carlo methods." LSSC proceedings (2001).
[CGST11]: Cliffe, Giles, Scheichl & Teckentrup. "MLMC and applications to elliptic PDEs." Computing and Visualization in Science 14(1), 3 (2011).
[Del04]: Del Moral. "Feynman-Kac Formulae." Springer: New York (2004).
[B02]: Beaumont. "Estimation of population growth..." Genetics 164(3), 1139-1160 (2003).
[AR08]: Andrieu & Roberts. "The pseudo-marginal approach." Annals of Stat. 37(2), 697-725 (2009).
[ADH10]: Andrieu, Doucet & Holenstein. "Particle MCMC methods." JRSSB 72(3), 269-342 (2010).

SLIDE 50

References

[JKLZ18.i]: Jasra, Kamatani, Law & Zhou. "MLMC for static Bayesian parameter estimation." SISC 40, A887-A902 (2018).
[JKLZ18.ii]: Jasra, Kamatani, Law & Zhou. "A Multi-Index Markov Chain Monte Carlo Method." IJUQ 8(1), 61-73 (2018).
[JLX19.s]: Jasra, Law & Xu. "Multi-index SMC²." Submitted.
[BJLTZ17]: Beskos, Jasra, Law, Tempone & Zhou. "Multilevel Sequential Monte Carlo samplers." SPA 127(5), 1417-1440 (2017).
[HSS13]: Hoang, Schwab & Stuart. Inverse Prob. 29, 085010 (2013).
[DKST15]: Dodwell, Ketelsen, Scheichl & Teckentrup. "A hierarchical MLMCMC algorithm." SIAM/ASA JUQ 3(1), 1075-1108 (2015).

SLIDE 51

References

[CHLNT17]: Chernov, Hoel, Law, Nobile & Tempone. "Multilevel ensemble Kalman filtering for spatio-temporal processes." arXiv:1608.08558 (2017).
[JLS17]: Jasra, Law & Suciu. "Advanced Multilevel Monte Carlo Methods." arXiv:1704.07272 (2017). To appear in ISR (2019).
[DJLZ16]: Del Moral, Jasra, Law & Zhou. "Multilevel Sequential Monte Carlo samplers for normalizing constants." ToMACS 27(3), 20 (2017).
[JKLZ15]: Jasra, Kamatani, Law & Zhou. "Multilevel particle filter." SINUM 55(6), 3068-3096 (2017).
[DDP18]: Deligiannidis, Doucet & Pitt. "The Correlated Pseudo-Marginal Method." JRSSB 80(5), 839-870 (2018).
[HTL16]: Hoel, Law & Tempone. "Multilevel ensemble Kalman filter." SINUM 54(3), 1813-1839 (2016).
[GCR]: Gregory, Cotter & Reich. "Multilevel ensemble transform particle filter."

SLIDE 52

Thank you

SLIDE 53

Warmly welcoming submissions!

FOUNDATIONS of

DATA SCIENCE

Editorial Board:

Niall Adams Feng Bao Adrian Bishop Michael W. Berry Marc Bocquet Fengqing Chao Yunjin Choi Paul Constantine Colin Cotter Tiangang Cui Masoumeh Dashti Patrick-Rubin Delanchy Pierre Del Moral Jack Dongarra Evangelos Evangelou Brittany Fasy Colin Fox Roger Ghanem Omar Ghattas Dimitrios Giannakis Mark Girolami Heather Harrington Nick Heard Jeremy Heng Michael Hintermüller Jeremie Houssineau Maria De Iorio Chris Jones Kengo Kamatani Nikolas Kantas Jessica Cisewski Kehe Michael Kirby Anthony Lee Benedict Leimkuhler Bill Lionheart Po-Ling Loh Jan Mandel Youssef Marzouk Scott McKinley Sayan Mukherjee James Murphy Habib Najm George Ostrouchov Michela Ottobre Houman Owhadi Daniel Paulin Marcelo Pereyra Victor Perez-Abreu Petr Plechac Arvind Ramanathan Sebastian Reich Juan Restrepo Tim Sauer Claudia Schillings Carola Schöenlieb Sumeetpal Singh Konstantinos Spiliopoulos Hans De Sterck Jie Xiong Nicholas Zabaras AMERICAN INSTITUTE OF MATHEMATICAL SCIENCES Ajay Jasra (Editor in Chief) Kody J. H. Law (Editor in Chief) Vasileios Maroulas (Editor in Chief)