SLIDE 1

Advanced Simulation - Lecture 13

Patrick Rebeschini February 26th, 2018

SLIDE 2

Outline

- Sequential Importance Sampling.
- Resampling step.
- Sequential Monte Carlo / Particle Filters.

SLIDE 3

Hidden Markov Models

Figure: Graph representation of a general HMM (hidden states $X_0, X_1, \ldots, X_T$, observations $y_1, \ldots, y_T$, parameter $\theta$).

$(X_t)$: initial distribution $\mu_\theta$, transition $f_\theta$. $(Y_t)$ given $(X_t)$: measurement $g_\theta$. Prior on the parameter $\theta \in \Theta$.

Inference in HMMs, Cappé, Moulines, Rydén, 2005.
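The later figures use a Kalman Filter as a reference, so a linear-Gaussian model is a natural running example. A minimal simulator of such a model (the AR(1) state and unit observation map are illustrative assumptions, not necessarily the exact model used on the slides):

```python
import numpy as np

def simulate_lgssm(T, phi=0.9, sigma_v=1.0, sigma_w=1.0, rng=None):
    """Simulate a toy linear-Gaussian HMM:
    X_1 ~ N(0, sigma_v^2),  X_t = phi * X_{t-1} + V_t,  V_t ~ N(0, sigma_v^2),
    Y_t = X_t + W_t,        W_t ~ N(0, sigma_w^2).
    Here mu_theta, f_theta and g_theta are all Gaussian."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(T)
    x[0] = rng.normal(0.0, sigma_v)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.normal(0.0, sigma_v)
    y = x + rng.normal(0.0, sigma_w, size=T)
    return x, y
```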

SLIDE 4

General inference in HMM

Proposition: The posterior $p(x_{1:t} \mid y_{1:t}, \theta)$ satisfies
$$p(x_{1:t} \mid y_{1:t}, \theta) = p(x_{1:t-1} \mid y_{1:t-1}, \theta)\, \frac{f_\theta(x_t \mid x_{t-1})\, g_\theta(y_t \mid x_t)}{p(y_t \mid y_{1:t-1}, \theta)},$$
where
$$p(y_t \mid y_{1:t-1}, \theta) = \int p(x_{1:t-1} \mid y_{1:t-1}, \theta)\, f_\theta(x_t \mid x_{t-1})\, g_\theta(y_t \mid x_t)\, dx_{1:t}.$$

Proposition: The marginal posterior $p(x_t \mid y_{1:t})$ satisfies the following recursion:
$$p(x_t \mid y_{1:t-1}) = \int f(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1},$$
$$p(x_t \mid y_{1:t}) = \frac{g(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})}{p(y_t \mid y_{1:t-1})},$$
where
$$p(y_t \mid y_{1:t-1}) = \int g(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})\, dx_t.$$
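In the linear-Gaussian case this prediction/update recursion can be computed exactly; this is the Kalman Filter used as a reference in the later figures. A minimal sketch for the toy one-dimensional model simulated above (the model itself is an illustrative assumption):

```python
import numpy as np

def kalman_filter(y, phi, sigma_v, sigma_w, mu0, var0):
    """Exact filtering recursion for the toy linear-Gaussian HMM above.
    Returns filtering means/variances and the log-likelihood log p(y_{1:T})."""
    T = len(y)
    means, variances = np.empty(T), np.empty(T)
    loglik = 0.0
    pred_mean, pred_var = mu0, var0                # prediction of X_1
    for t in range(T):
        # update step: condition on y_t
        s = pred_var + sigma_w**2                  # variance of p(y_t | y_{1:t-1})
        loglik += -0.5 * (np.log(2 * np.pi * s) + (y[t] - pred_mean)**2 / s)
        gain = pred_var / s
        filt_mean = pred_mean + gain * (y[t] - pred_mean)
        filt_var = (1 - gain) * pred_var
        means[t], variances[t] = filt_mean, filt_var
        # prediction step: propagate through the transition f
        pred_mean, pred_var = phi * filt_mean, phi**2 * filt_var + sigma_v**2
    return means, variances, loglik
```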

SLIDE 5

General inference in HMM

In general, the filtering problem is thus intractable:
$$\int \varphi(x_t)\, p(x_t \mid y_{1:t}, \theta)\, dx_t = \int \varphi(x_t)\, p(x_{1:t} \mid y_{1:t}, \theta)\, dx_{1:t} = \frac{1}{p(y_{1:t} \mid \theta)} \int \varphi(x_t)\, \mu_\theta(x_1) \prod_{s=2}^{t} f_\theta(x_s \mid x_{s-1}) \prod_{s=1}^{t} g_\theta(y_s \mid x_s)\, dx_{1:t}.$$
It is a $t \times \dim(\mathsf{X})$-dimensional integral.

The likelihood is also intractable:
$$p(y_{1:t} \mid \theta) = \int p(x_{1:t}, y_{1:t} \mid \theta)\, dx_{1:t} = \int \mu_\theta(x_1) \prod_{s=2}^{t} f_\theta(x_s \mid x_{s-1}) \prod_{s=1}^{t} g_\theta(y_s \mid x_s)\, dx_{1:t}.$$
Thus we cannot compute it pointwise, e.g. to perform a Metropolis–Hastings algorithm on the parameters.

SLIDE 6

Sequential Importance Sampling

We now consider the parameter $\theta$ to be fixed. We want to infer $X_{1:t}$ given $y_{1:t}$.

Two ingredients: importance sampling, and “sampling via composition” (or “via conditioning”).

IS: if we have a weighted sample $(w_1^i, X^i)$ approximating $\pi_1$, then $(w_2^i, X^i)$ approximates $\pi_2$ if we define
$$w_2^i = w_1^i \times \frac{\pi_2(X^i)}{\pi_1(X^i)}.$$
In standard IS, $\pi_1$ and $\pi_2$ are defined on the same space.
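A minimal numerical sketch of this reweighting step; the choice $\pi_1 = N(0,1)$ and $\pi_2 = N(1,1)$ is purely illustrative:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Weighted sample (w1, X) targeting pi_1 = N(0, 1); drawn exactly here, so w1 = 1/N.
N = 1000
X = rng.normal(0.0, 1.0, size=N)
w1 = np.full(N, 1.0 / N)

# Reweight the same points so that (w2, X) targets pi_2 = N(1, 1).
w2 = w1 * norm.pdf(X, loc=1.0, scale=1.0) / norm.pdf(X, loc=0.0, scale=1.0)
w2 /= w2.sum()                       # self-normalise

print(np.sum(w2 * X))                # approximates E_{pi_2}[X] = 1
```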

SLIDE 7

Sequential Importance Sampling

Sampling via composition: if $(w^i, X^i)$ approximates $p_X(x)$, and if $Y^i \sim q_{Y \mid X}(y \mid X^i)$, then $(w^i, (X^i, Y^i))$ approximates $p_X(x)\, q_{Y \mid X}(y \mid x)$. The space has been extended. Marginally, $(w^i, Y^i)$ approximates
$$q_Y(y) = \int p_X(x)\, q_{Y \mid X}(y \mid x)\, dx.$$

Sequential Importance Sampling combines both ingredients to iteratively approximate $p(x_{1:t} \mid y_{1:t})$.
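A minimal sketch of the composition step on its own; $p_X = N(0,1)$ and $q_{Y \mid X}(\cdot \mid x) = N(x, 0.5^2)$ are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Weighted sample (w^i, X^i) approximating p_X, here p_X = N(0, 1).
N = 1000
X = rng.normal(0.0, 1.0, size=N)
w = np.full(N, 1.0 / N)

# Extend each point with Y^i ~ q_{Y|X}(. | X^i), here N(X^i, 0.5^2), keeping the weights.
Y = rng.normal(X, 0.5)

# (w^i, (X^i, Y^i)) targets p_X(x) q_{Y|X}(y | x); marginally (w^i, Y^i) targets q_Y.
print(np.sum(w * Y))   # approximates E_{q_Y}[Y] = 0 in this example
```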

SLIDE 8

Sequential Importance Sampling: algorithm

At time t = 1:
- Sample $X_1^i \sim q_1(\cdot)$.
- Compute the weights
$$w_1^i = \frac{\mu(X_1^i)\, g(y_1 \mid X_1^i)}{q_1(X_1^i)}.$$

At time t ≥ 2:
- Sample $X_t^i \sim q_{t \mid t-1}(\cdot \mid X_{t-1}^i)$.
- Compute the weights
$$w_t^i = w_{t-1}^i \times \omega_t^i = w_{t-1}^i \times \frac{f(X_t^i \mid X_{t-1}^i)\, g(y_t \mid X_t^i)}{q_{t \mid t-1}(X_t^i \mid X_{t-1}^i)}.$$
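A minimal sketch of this scheme using the prior (transition) as proposal, so that the incremental weight reduces to $g(y_t \mid X_t^i)$; the toy linear-Gaussian model matches the simulator sketched earlier and is an illustrative assumption:

```python
import numpy as np
from scipy.stats import norm

def sis_prior(y, N, phi, sigma_v, sigma_w, rng=None):
    """Sequential Importance Sampling with the prior as proposal (q_1 = mu, q_{t|t-1} = f).
    Weights accumulate over time; no resampling is performed."""
    rng = np.random.default_rng() if rng is None else rng
    T = len(y)
    X = np.empty((T, N))
    logw = np.zeros(N)
    means, ess = np.empty(T), np.empty(T)
    for t in range(T):
        if t == 0:
            X[0] = rng.normal(0.0, sigma_v, size=N)                   # X_1^i ~ q_1 = mu
        else:
            X[t] = phi * X[t - 1] + rng.normal(0.0, sigma_v, size=N)  # X_t^i ~ q_{t|t-1} = f
        # with the prior proposal, the incremental weight omega_t^i is g(y_t | X_t^i)
        logw += norm.logpdf(y[t], loc=X[t], scale=sigma_w)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means[t] = np.sum(w * X[t])          # estimate of E(x_t | y_{1:t})
        ess[t] = 1.0 / np.sum(w ** 2)        # effective sample size
    return means, ess
```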

SLIDE 9

Sequential Importance Sampling: example

Figure: Estimation of filtering means $E(x_t \mid y_{1:t})$ over time (curves: prior, optimal, Kalman Filter).

SLIDE 10

Sequential Importance Sampling: example

Figure: Estimation of filtering variances $V(x_t \mid y_{1:t})$ over time (curves: prior, optimal, Kalman Filter).

SLIDE 11

Sequential Importance Sampling: example

Figure: Estimation of marginal log-likelihoods $\log p(y_{1:t})$ over time (curves: prior, optimal, Kalman Filter).

SLIDE 12

Sequential Importance Sampling: example

Figure: Effective sample size over time (curves: prior, optimal).

SLIDE 13

Sequential Importance Sampling: example

Figure: Spread of 100 paths drawn from the prior proposal, with Kalman Filter means in blue; darker lines indicate higher weights.

SLIDE 14

Resampling

Idea: at time t, select particles with high weights and remove particles with low weights. Spend the fixed computational budget N on the most promising paths.

Obtain an equally weighted sample $(N^{-1}, \bar X^i)$ from a weighted sample $(w^i, X^i)$.

Resampling acts on empirical probability measures: input
$$\pi^N(x) = \sum_{i=1}^{N} \frac{w^i}{\sum_{j=1}^{N} w^j}\, \delta_{X^i}(x)$$
and output
$$\bar\pi^N(x) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\bar X^i}(x).$$

SLIDE 15

Multinomial resampling

How to draw from an empirical probability distribution
$$\pi^N(x) = \sum_{i=1}^{N} \frac{w^i}{\sum_{j=1}^{N} w^j}\, \delta_{X^i}(x)?$$
Remember how to draw from a mixture model
$$\sum_{i=1}^{K} \omega^i\, p_i(x)?$$
Draw $k$ with probabilities $\omega^1, \ldots, \omega^K$, then draw from $p_k$.
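A minimal sketch of this two-stage draw; the three Gaussian components and their weights are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Mixture of K Gaussians: draw the component index first, then from that component.
weights = np.array([0.2, 0.5, 0.3])                      # omega^1, ..., omega^K
locs, scales = np.array([-2.0, 0.0, 3.0]), np.array([1.0, 0.5, 1.0])

k = rng.choice(len(weights), size=10_000, p=weights)     # component indices
draws = rng.normal(locs[k], scales[k])                   # draws from p_k
```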

SLIDE 16

Multinomial resampling

Draw an “ancestry vector” $A^{1:N} = (A^1, \ldots, A^N) \in \{1, \ldots, N\}^N$ independently from a categorical distribution,
$$A^{1:N} \overset{\text{i.i.d.}}{\sim} \mathrm{Cat}(w^1, \ldots, w^N),$$
in other words, for all $i \in \{1, \ldots, N\}$ and all $k \in \{1, \ldots, N\}$,
$$P(A^i = k) = \frac{w^k}{\sum_{j=1}^{N} w^j}.$$

Define $\bar X^i$ to be $X^{A^i}$ for all $i \in \{1, \ldots, N\}$. $X^{A^i}$ is said to be the “parent” or “ancestor” of $\bar X^i$. Return $\bar X = (\bar X^1, \ldots, \bar X^N)$.
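A minimal sketch of this construction with NumPy; the weights are assumed unnormalised on input:

```python
import numpy as np

def multinomial_resample_ancestors(X, w, rng=None):
    """Multinomial resampling via an ancestry vector A^{1:N}:
    each A^i is drawn from Cat(w^1, ..., w^N), and Xbar^i = X[A^i]."""
    rng = np.random.default_rng() if rng is None else rng
    N = len(w)
    p = np.asarray(w, dtype=float)
    p /= p.sum()                              # normalise the weights
    ancestors = rng.choice(N, size=N, p=p)    # A^1, ..., A^N i.i.d. categorical
    return X[ancestors], ancestors
```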

SLIDE 17

Multinomial resampling

Draw an “offspring vector” $O^{1:N} = (O^1, \ldots, O^N) \in \{0, \ldots, N\}^N$ from a multinomial distribution,
$$O^{1:N} \sim \mathrm{Multinomial}(N;\, w^1, \ldots, w^N),$$
so that for all $i \in \{1, \ldots, N\}$,
$$E[O^i] = N\, \frac{w^i}{\sum_{j=1}^{N} w^j} \quad \text{and} \quad \sum_{i=1}^{N} O^i = N.$$

Each particle $X^i$ is replicated $O^i$ times (possibly zero times) to create the sample $\bar X$:
- $\bar X \leftarrow \{\}$
- For $i = 1, \ldots, N$, for $k = 1, \ldots, O^i$: $\bar X \leftarrow (\bar X, X^i)$
- Return $\bar X = (\bar X^1, \ldots, \bar X^N)$.
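The same resampling step expressed through the offspring counts, again as a minimal sketch:

```python
import numpy as np

def multinomial_resample_offspring(X, w, rng=None):
    """Multinomial resampling via an offspring vector O^{1:N}:
    O ~ Multinomial(N; w), then particle X^i is replicated O^i times."""
    rng = np.random.default_rng() if rng is None else rng
    N = len(w)
    p = np.asarray(w, dtype=float)
    p /= p.sum()
    offspring = rng.multinomial(N, p)             # O^1, ..., O^N, summing to N
    return np.repeat(X, offspring, axis=0), offspring
```

Up to the ordering of the returned particles, the ancestry and offspring constructions produce resampled sets with the same distribution; the offspring form simply groups duplicates together.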

SLIDE 18

Multinomial resampling

Other strategies are possible to perform resampling. Some properties are desirable:
$$E[O^i] = N\, \frac{w^i}{\sum_{j=1}^{N} w^j}, \quad \text{or} \quad P(A^i = k) = \frac{w^k}{\sum_{j=1}^{N} w^j}.$$
This is sometimes called “unbiasedness”, because then
$$\frac{1}{N} \sum_{k=1}^{N} \varphi(\bar X^k) = \frac{1}{N} \sum_{k=1}^{N} O^k\, \varphi(X^k)$$
has expectation (over the resampling step, given the particles)
$$\sum_{k=1}^{N} \frac{w^k}{\sum_{j=1}^{N} w^j}\, \varphi(X^k).$$
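A quick Monte Carlo check of this unbiasedness property for multinomial resampling; the weighted sample and the test function $\varphi(x) = x^2$ are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed weighted sample (w^i, X^i) and a test function phi.
N = 50
X = rng.normal(size=N)
w = rng.random(N)
phi = lambda x: x**2

target = np.sum(w / w.sum() * phi(X))        # sum_k (w^k / sum_j w^j) phi(X^k)

# Average (1/N) sum_k O^k phi(X^k) over many independent resampling draws.
reps = 100_000
O = rng.multinomial(N, w / w.sum(), size=reps)       # offspring vectors
estimates = (O * phi(X)).sum(axis=1) / N
print(target, estimates.mean())              # the two values should be close
```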

SLIDE 19

Sequential MC / Sequential Importance Resampling

At time t = 1:
- Sample $X_1^i \sim q_1(\cdot)$.
- Compute the weights
$$w_1^i = \frac{\mu(X_1^i)\, g(y_1 \mid X_1^i)}{q_1(X_1^i)}.$$

At time t ≥ 2:
- Resample $(w_{t-1}^i, X_{1:t-1}^i)$ into $(N^{-1}, \bar X_{1:t-1}^i)$.
- Sample $X_t^i \sim q_{t \mid t-1}(\cdot \mid \bar X_{t-1}^i)$ and set $X_{1:t}^i := (\bar X_{1:t-1}^i, X_t^i)$.
- Compute the weights
$$w_t^i = \omega_t^i = \frac{f(X_t^i \mid X_{t-1}^i)\, g(y_t \mid X_t^i)}{q_{t \mid t-1}(X_t^i \mid X_{t-1}^i)}.$$
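A minimal sketch of Sequential Importance Resampling with multinomial resampling at every step and the prior as proposal (the bootstrap choice); the toy linear-Gaussian model is the same illustrative assumption as in the earlier sketches:

```python
import numpy as np
from scipy.stats import norm

def particle_filter(y, N, phi, sigma_v, sigma_w, rng=None):
    """Bootstrap-style SIR: resample, propagate through f, reweight by g(y_t | x_t).
    Returns filtering means, ESS and the log-likelihood estimate."""
    rng = np.random.default_rng() if rng is None else rng
    T = len(y)
    means, ess = np.empty(T), np.empty(T)
    loglik = 0.0
    X = rng.normal(0.0, sigma_v, size=N)             # X_1^i ~ q_1 = mu
    for t in range(T):
        if t > 0:
            # resample (multinomial), then propagate through the transition f
            idx = rng.choice(N, size=N, p=w)
            X = phi * X[idx] + rng.normal(0.0, sigma_v, size=N)
        logw = norm.logpdf(y[t], loc=X, scale=sigma_w)        # incremental weights
        loglik += np.log(np.mean(np.exp(logw - logw.max()))) + logw.max()
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means[t] = np.sum(w * X)                     # estimate of E(x_t | y_{1:t})
        ess[t] = 1.0 / np.sum(w ** 2)                # effective sample size
    return means, ess, loglik
```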

SLIDE 20

Sequential Importance Sampling

[Illustration: panels (a)-(c); level sets of the likelihood function $x \mapsto g(x, Y_n)$ around the observation $Y_n$.]

SLIDE 21

Sequential Importance Resampling

[Illustration: panels (a)-(d); level sets of the likelihood function $x \mapsto g(x, Y_n)$ around the observation $Y_n$; annotations indicate offspring counts after resampling (3 particles, 2 particles).]

SLIDE 22

Sequential Monte Carlo: example

Figure: Estimation of filtering means $E(x_t \mid y_{1:t})$ over time (curves: prior, optimal, Kalman Filter).

SLIDE 23

Sequential Monte Carlo: example

Figure: Estimation of filtering variances $V(x_t \mid y_{1:t})$ over time (curves: prior, optimal, Kalman Filter).

SLIDE 24

Sequential Monte Carlo: example

Figure: Estimation of marginal log-likelihoods $\log p(y_{1:t})$ over time (curves: prior, optimal, Kalman Filter).

SLIDE 25

Sequential Monte Carlo: example

Figure: Effective sample size over time (curves: prior, optimal).

SLIDE 26

Sequential Monte Carlo: example

Figure: Support of the approximation of $p(x_t \mid y_{1:t})$, over time.

SLIDE 27

Sequential Monte Carlo: example

Figure: Trajectories $\bar X_{1:T}^i$, for $i \in \{1, \ldots, N\}$ and $N = 100$.
