Advanced Simulation - Lecture 16. Patrick Rebeschini, March 7th, 2018.



SLIDE 1

Advanced Simulation - Lecture 16

Patrick Rebeschini March 7th, 2018

Patrick Rebeschini Lecture 16 1/ 25

SLIDE 2

Outline

Particle methods for static problems. Evidence estimation. The End!


SLIDE 3

Sequence of posterior distributions

Consider the problem of estimating

∫ ϕ(x) π(x) dx

for a test function ϕ : X → R and target distribution π on X. Introduce intermediate distributions π_0, π_1, . . . , π_T such that: π_0 is easy to sample from, π_t and π_{t+1} are not too different, and π_T = π.


SLIDE 4

Sequence of posterior distributions

Case where π(θ) ∝ p(θ) p(y_{1:T} | θ). Introduce the partial posterior π_t(θ) ∝ p(θ) p(y_{1:t} | θ). We have π_0(θ) = p(θ), the prior distribution. We have π_T(θ) = π(θ) = p(θ | y_{1:T}), the full posterior distribution. When the data are i.i.d., we can expect π_t(θ) to be similar to π_{t+1}(θ), at least when t is large.


SLIDE 5

Sequence of posterior distributions

General target distribution π(x). Introduce a simple parametric distribution q. We can introduce π_t(x) ∝ π(x)^{γ_t} q(x)^{1−γ_t} where 0 = γ_0 < γ_1 < . . . < γ_T = 1. Then π_0 = q and π_T = π. If γ_t is close to γ_{t+1}, then π_t is close to π_{t+1}.

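As a small illustration of the tempered path (not from the slides): the bridge densities are one line of NumPy/SciPy. The names `log_pi_t`, `log_pi`, `log_q` and the particular Gaussian choices are our own, purely for illustration.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 1-d instance: target pi = N(3, 0.5^2), easy distribution q = N(0, 2^2).
def log_pi(x):
    return norm.logpdf(x, loc=3.0, scale=0.5)

def log_q(x):
    return norm.logpdf(x, loc=0.0, scale=2.0)

def log_pi_t(x, gamma):
    """Unnormalised log density of pi_t ∝ pi(x)^gamma * q(x)^(1 - gamma)."""
    return gamma * log_pi(x) + (1.0 - gamma) * log_q(x)

# An evenly spaced temperature ladder 0 = gamma_0 < ... < gamma_T = 1.
gammas = np.linspace(0.0, 1.0, 11)
```

At γ = 0 the bridge is exactly q, and at γ = 1 it is exactly π, matching the endpoints on the slide.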

SLIDE 6

Sequential Importance Sampling: algorithm

At time t = 0: Sample X^i ∼ π_0(·) for i ∈ {1, . . . , N}, and set the weights w_0^i = 1/N for all i ∈ {1, . . . , N}.

At time t ≥ 1: Compute the weights

w_t^i = w_{t−1}^i × ω_t^i = w_{t−1}^i × π_t(X^i) / π_{t−1}(X^i).

At all times, (w_t^i, X^i)_{i=1}^N approximates π_t.

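The algorithm above fits in a few lines. This is a minimal sketch, assuming NumPy; the function name `sis` and the argument convention (a list of unnormalised log densities) are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

def sis(log_pi_seq, sample_pi0, N=1000):
    """Sequential importance sampling: particles are drawn once from pi_0
    and never move; only the weights are updated by pi_t / pi_{t-1}."""
    x = sample_pi0(N)                      # X^i ~ pi_0
    logw = np.full(N, -np.log(N))          # w_0^i = 1/N (log scale for stability)
    for t in range(1, len(log_pi_seq)):
        # incremental weight omega_t^i = pi_t(X^i) / pi_{t-1}(X^i)
        logw += log_pi_seq[t](x) - log_pi_seq[t - 1](x)
    w = np.exp(logw - logw.max())
    return x, w / w.sum()                  # normalised weights
```

For instance, bridging from N(0, 1) towards N(1, 1) through intermediate Gaussians gives a weighted sample whose weighted mean approximates 1.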

SLIDE 7

Sequential Importance Sampling: diagnostics

As already seen for importance sampling, we can compute the effective sample size

ESS_t = ( Σ_{i=1}^N w_t^i )^2 / Σ_{i=1}^N (w_t^i)^2 = 1 / Σ_{i=1}^N (W_t^i)^2,

where W_t^i denotes the normalised weights. ESS_t = N if W_t^i = N^{−1} for all i. If there exists i such that W_t^i ≈ 1, and W_t^j ≈ 0 for all j ≠ i, then ESS_t ≈ 1. As a rule of thumb, the higher the ESS, the better our approximation.

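The ESS formula above is one line of code. A minimal sketch, assuming NumPy; the helper name `ess` is ours.

```python
import numpy as np

def ess(w):
    """Effective sample size ESS = (sum_i w_i)^2 / sum_i w_i^2,
    which equals 1 / sum_i W_i^2 for normalised weights W_i."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()
```

Note the formula is invariant to rescaling the weights, so it can be applied to unnormalised weights directly.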

SLIDE 8

Toy model

Data y_t ∼ N(µ, 1), 1000 points generated with µ⋆ = 2. Prior µ ∼ N(0, 1). Sequence of partial posteriors π_t(µ) ∝ p(µ) Π_{s=1}^t p(y_s | µ). Incremental weights: π_t(µ) / π_{t−1}(µ) ∝ p(y_t | µ). We can look at the evolution of the effective sample size with t.

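The toy model can be run as plain sequential importance sampling to reproduce the ESS decay discussed next. A sketch, assuming NumPy; the variable names (`ess_trace`, etc.) and particle count are our own choices.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.0, size=1000)     # 1000 observations with mu* = 2

N = 1024
mu = rng.normal(0.0, 1.0, size=N)       # particles drawn from the N(0, 1) prior
logw = np.zeros(N)
ess_trace = []
for t in range(len(y)):
    # incremental weight pi_t / pi_{t-1} ∝ p(y_t | mu), a N(mu, 1) likelihood term
    logw += -0.5 * (y[t] - mu) ** 2
    w = np.exp(logw - logw.max())
    ess_trace.append(w.sum() ** 2 / (w ** 2).sum())
```

As the figure on the next slide shows, without resampling the ESS collapses towards 1 as observations accumulate.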

SLIDE 9

Toy model


Figure: Effective sample size against “time”, using sequential importance sampling.


SLIDE 10

Sequential Importance Sampling with Resampling

At time t = 0: Sample X_0^i ∼ π_0(·) for i ∈ {1, . . . , N}, and set the weights w_0^i = 1/N.

At time t ≥ 1:
Resample (w_{t−1}^i, X_{t−1}^i) into (N^{−1}, X̄_{t−1}^i).
Define X_t^i = X̄_{t−1}^i.
Compute the weights w_t^i = ω_t^i = π_t(X_t^i) / π_{t−1}(X_t^i).

Problem: there are fewer and fewer unique values in (X_t^i)_{i=1}^N.

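The resampling step maps weighted particles to equally weighted ones. A minimal multinomial-resampling sketch, assuming NumPy; the function name `resample` is ours (other schemes such as systematic resampling exist).

```python
import numpy as np

rng = np.random.default_rng(2)

def resample(x, w):
    """Multinomial resampling: turn weighted particles (w^i, X^i)
    into equally weighted ones (1/N, Xbar^i)."""
    N = len(x)
    idx = rng.choice(N, size=N, p=w)     # draw ancestors proportionally to w
    return x[idx], np.full(N, 1.0 / N)
```

The output necessarily contains duplicated values, which is exactly the degeneracy problem the slide points out.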

SLIDE 11

Toy model


Figure: Effective sample size against “time”, using sequential importance sampling with resampling.


SLIDE 12

Toy model


Figure: Number of unique values against “time”, using sequential importance sampling with resampling.


SLIDE 13

Move steps

Consider particles (N^{−1}, X̄_t^i)_{i=1}^N, approximating π_t. The ESS is maximal (= N), but multiple values within (X̄_t^i)_{i=1}^N are identical. How can we diversify the particles while still approximating π_t? Apply a Markov kernel K_t to each X̄_t^i:

∀i ∈ {1, . . . , N}  X_t^i ∼ K_t(X̄_t^i, dx).

Can we find K_t such that (N^{−1}, X_t^i) still approximates π_t? e.g.

(1/N) Σ_{i=1}^N ϕ(X_t^i) → ∫ ϕ(x) π_t(x) dx  almost surely as N → ∞.


SLIDE 14

Move steps

Assume that K_t leaves π_t invariant:

∫ π_t(x) K_t(x, y) dx = π_t(y).

If X̄_t^i ∼ π_t, then X_t^i ∼ K_t(X̄_t^i, dy) also follows π_t. Also, if X̄ ∼ q and w(x) = π(x)/q(x), so that E_q[w(X̄) ϕ(X̄)] = E_π[ϕ(X)], then, for X ∼ K(X̄, dx), E_{qK}[w(X̄) ϕ(X)] = E_π[ϕ(X)].


SLIDE 15

Move steps

Result: if (w^i, X̄^i) approximates π, then we can apply a π-invariant kernel to each X̄^i and not change the weights. Draw X^i ∼ K(X̄^i, dx) for all i ∈ {1, . . . , N}. Keep the weights unchanged. (w^i, X^i) still approximates π.

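A standard choice of π-invariant kernel is a few random-walk Metropolis steps applied to each particle. A sketch assuming NumPy; the name `mh_move` and the step-size defaults are ours.

```python
import numpy as np

rng = np.random.default_rng(3)

def mh_move(x, log_target, step=0.5, n_steps=5):
    """Apply n_steps random-walk Metropolis moves independently to each
    particle; the kernel leaves log_target invariant, so the importance
    weights do not change."""
    lp = log_target(x)
    for _ in range(n_steps):
        prop = x + step * rng.standard_normal(x.shape)
        lp_prop = log_target(prop)
        accept = np.log(rng.uniform(size=x.shape)) < lp_prop - lp
        x = np.where(accept, prop, x)
        lp = np.where(accept, lp_prop, lp)
    return x
```

Starting from a fully degenerate cloud (all particles equal), a handful of moves restores particle diversity while preserving the target.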

SLIDE 16

Sequential Monte Carlo Sampler: algorithm

At time t = 0: Sample X_0^i ∼ π_0(·) for i ∈ {1, . . . , N}, and set the weights w_0^i = 1/N.

At time t ≥ 1:
Resample (w_{t−1}^i, X_{t−1}^i) into (N^{−1}, X̄_{t−1}^i).
Draw X_t^i ∼ K_{t−1}(X̄_{t−1}^i, dx).
Compute the weights w_t^i = ω_t^i = π_t(X_t^i) / π_{t−1}(X_t^i).

At all times, (w_t^i, X_t^i)_{i=1}^N approximates π_t.

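Putting the pieces together, the algorithm above can be sketched in NumPy. This is our own simplified sketch: one random-walk MH move per step plays the role of K_{t−1}, and the move/reweight/resample cycle is written so that resampling at the end of step t matches resampling at the start of step t+1 on the slide.

```python
import numpy as np

rng = np.random.default_rng(4)

def smc_sampler(log_pi_seq, sample_pi0, N=2000, step=0.5):
    """SMC sampler sketch: move with a random-walk MH kernel leaving
    pi_{t-1} invariant, reweight by pi_t / pi_{t-1}, then resample."""
    x = sample_pi0(N)
    for t in range(1, len(log_pi_seq)):
        lp_prev, lp_new = log_pi_seq[t - 1], log_pi_seq[t]
        # move: one MH step, pi_{t-1}-invariant, weights stay 1/N
        prop = x + step * rng.standard_normal(N)
        accept = np.log(rng.uniform(size=N)) < lp_prev(prop) - lp_prev(x)
        x = np.where(accept, prop, x)
        # reweight by pi_t / pi_{t-1}, then resample back to equal weights
        logw = lp_new(x) - lp_prev(x)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        x = x[rng.choice(N, size=N, p=w)]
    return x
```

On a Gaussian bridge from N(0, 1) to N(3, 1), the final cloud approximates the endpoint distribution.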

SLIDE 17

Adaptive resampling

Problem: if K_t is a Metropolis–Hastings kernel with invariant distribution π_t, its computational cost is typically linear in t. Thus applying K_t at every step t ∈ {1, . . . , T} yields an overall cost of

Σ_{t=1}^T O(t) = O(T^2).

We can save time by performing the resample-move step only when necessary. Use the effective sample size to decide whether to trigger a resample-move step.


SLIDE 18

Sequential Monte Carlo Sampler: algorithm

At time t = 0: Sample X_0^i ∼ π_0(·) for i ∈ {1, . . . , N}, and set the weights w_0^i = 1/N.

At time t ≥ 1: If ESS < Ñ, then:
Resample (w_{t−1}^i, X_{t−1}^i) into (N^{−1}, X̄_{t−1}^i).
Draw X_t^i ∼ K_{t−1}(X̄_{t−1}^i, dx).

Compute the weights w_t^i = w_{t−1}^i × π_t(X_t^i) / π_{t−1}(X_t^i).

At all times, (w_t^i, X_t^i)_{i=1}^N approximates π_t.

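The adaptive variant above can be sketched as follows, again assuming NumPy and a single random-walk MH move as the kernel; the threshold parameter `tau` (so that Ñ = tau·N) and all names are our own choices.

```python
import numpy as np

rng = np.random.default_rng(5)

def smc_adaptive(log_pi_seq, sample_pi0, N=2000, step=0.5, tau=0.5):
    """SMC sampler with adaptive resampling: trigger a resample-move step
    only when ESS < tau * N; otherwise simply accumulate the weights."""
    x = sample_pi0(N)
    logw = np.full(N, -np.log(N))
    n_resamples = 0
    for t in range(1, len(log_pi_seq)):
        lp_prev, lp_new = log_pi_seq[t - 1], log_pi_seq[t]
        w = np.exp(logw - logw.max())
        w /= w.sum()
        if 1.0 / (w ** 2).sum() < tau * N:             # ESS below threshold
            n_resamples += 1
            x = x[rng.choice(N, size=N, p=w)]          # resample
            prop = x + step * rng.standard_normal(N)   # one pi_{t-1}-invariant MH move
            accept = np.log(rng.uniform(size=N)) < lp_prev(prop) - lp_prev(x)
            x = np.where(accept, prop, x)
            logw = np.full(N, -np.log(N))              # weights reset to 1/N
        logw += lp_new(x) - lp_prev(x)                 # w_t^i = w_{t-1}^i * pi_t / pi_{t-1}
    return x, logw, n_resamples
```

On the same Gaussian bridge as before, the resample-move step fires only occasionally rather than at every step, which is the point of the adaptation.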

SLIDE 19

Toy model


Figure: Effective sample size against “time”, using sequential Monte Carlo sampling. Dashed lines indicate resampling times.


SLIDE 20

Toy model


Figure: Number of unique values against “time”, using sequential Monte Carlo sampling. Dashed lines indicate resampling times.


SLIDE 21

Toy model


Figure: Posterior approximation (in black), and true posterior (in red), after 1000 observations.


SLIDE 22

Evidence estimation

Assume (w_t^i, X_t^i)_{i=1}^N approximates the posterior distribution π_t(θ) ∝ p(θ) p(y_{1:t} | θ) at time t. Then the estimator

Σ_{i=1}^N w_t^i p(y_{t+1} | X_t^i) / Σ_{i=1}^N w_t^i

converges to

∫ p(y_{t+1} | θ) π_t(θ) dθ = ∫ p(y_{t+1} | θ) p(θ) p(y_{1:t} | θ) / p(y_{1:t}) dθ = (1 / p(y_{1:t})) ∫ p(y_{1:t+1} | θ) p(θ) dθ = p(y_{t+1} | y_{1:t}).

This is similar to the likelihood estimator in particle filters.

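The predictive decomposition p(y_{1:T}) = Π_t p(y_{t+1} | y_{1:t}) can be checked on the conjugate toy model, where the evidence is available in closed form. A sketch assuming NumPy; for simplicity the particle set here is plain importance sampling from the prior with no resampling, a deliberate simplification of the full SMC scheme.

```python
import numpy as np

rng = np.random.default_rng(6)

# Conjugate toy model: y_t ~ N(mu, 1) with prior mu ~ N(0, 1), so the
# evidence p(y_{1:t}) is available in closed form for checking.
y = rng.normal(2.0, 1.0, size=50)

N = 5000
mu = rng.normal(0.0, 1.0, size=N)          # particles from the prior pi_0
w = np.full(N, 1.0 / N)
log_evidence = 0.0
for t in range(len(y)):
    # hat p(y_{t+1} | y_{1:t}) = sum_i w_t^i p(y_{t+1} | X_t^i) / sum_i w_t^i
    like = np.exp(-0.5 * (y[t] - mu) ** 2) / np.sqrt(2.0 * np.pi)
    log_evidence += np.log((w * like).sum() / w.sum())
    w = w * like                            # reweight (no resampling in this sketch)
```

Summing the log predictive estimates gives an estimate of log p(y_{1:T}), which can be compared against the exact Gaussian evidence (jointly, y_{1:n} ∼ N(0, I + 11ᵀ)).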

SLIDE 23

Toy model

We compare the SMC estimator with the estimator obtained by importance sampling from the prior distribution:

∀i ∈ {1, . . . , N}  X^i ∼ p(θ),  L_t^i = p(y_{1:t} | X^i),

p̂_N(y_{1:t}) = (1/N) Σ_{i=1}^N L_t^i.

We plot the relative error

r_t^N = |p̂_N(y_{1:t}) − p(y_{1:t})| / |p(y_{1:t})|.


SLIDE 24

Toy model


Figure: Relative error using SMC samplers and importance sampling from the prior; 10 independent experiments.


SLIDE 25

The End

Inversion, Transformation, Composition, Accept-reject, Importance Sampling, Metropolis–Hastings, Gibbs sampling, Adaptive Multiple Importance Sampling, Reversible Jump, Slice Sampling, Sequential Importance Sampling, Particle filter, Pseudo-marginal Metropolis–Hastings, Sequential Monte Carlo Sampler. . .

What about Nested Sampling, Path Sampling, Hamiltonian MCMC, Pinball Sampler, Sequential Quasi-Monte Carlo, Multiple-Try Metropolis–Hastings, Coupling From the Past, Multilevel Splitting, Wang–Landau algorithm, Free Energy MCMC, Stochastic Gradient Langevin Dynamics, Firefly MCMC, Configurational Bias Monte Carlo. . .? Good luck for the exams!
