SLIDE 1

Sequential Monte Carlo

  • Dr. Jarad Niemi

STAT 615 - Iowa State University

October 20, 2017

Jarad Niemi (STAT615@ISU) Sequential Monte Carlo October 20, 2017 1 / 72

SLIDE 2
Overview: Outline

  • 1. State-space models p(y|θ, ψ)p(θ|ψ): definition, terminology, notation
  • 2. State inference p(θ|y, ψ): exact inference, importance sampling, sequential importance sampling, bootstrap filter (resampling), auxiliary particle filter
  • 3. State and parameter inference p(θ, ψ|y): bootstrap filter, kernel density, sufficient statistics
  • 4. Advanced SMC: SMC-MCMC, fixed-parameter SMC for marginal likelihood calculations

SLIDE 3
Overview: Bayesian inference

Definition. Bayes' rule is

  P(A|B) = P(B|A)P(A) / P(B).

In this rule, B represents what we know about the world and A represents what we don't. Suppose p(θt, ψ|y1:t−1) is our current knowledge about the state of the world. We observe datum yt; then

  p(θt, ψ|y1:t) = p(yt|θt, ψ) p(θt, ψ|y1:t−1) / p(yt|y1:t−1),

where y1:t = (y1, y2, . . . , yt).

SLIDE 4
State-space models: Definition

Definition. A state-space model is described by these conditional distributions:
  an observation equation: po(yt|θt, ψ),
  an evolution equation: pe(θt|θt−1, ψ),
  and a prior: p(θ0, ψ),
where
  yt: an observation vector of length m,
  θt: a latent state vector of length p,
  ψ: a fixed parameter vector of length q.

SLIDE 5
State-space models: Graphical representation

[Figure: directed graph with state chain θt−1 → θt → θt+1 (evolution p(θt|θt−1, ψ)) and arrows θt → yt (observation p(yt|θt, ψ)).]

SLIDE 6
State-space models: Interpretation

  Model                    State interpretation
  Local level model        True level
  Linear growth model      True level and slope
  Seasonal factor model    Seasonal effect
  Dynamic regression       Time-varying regression coefficients
  Stochastic volatility    Underlying volatility in the market
  Markov switching model   Influenza epidemic on/off

SLIDE 7
State-space models: Examples

Stochastic volatility

[Figure: simulated stochastic volatility model over t = 1, . . . , 1000; panels show the observation series and the latent volatility series.]

SLIDE 8
State-space models: Examples

Markov switching model

[Figure: simulated Markov switching model over t = 1, . . . , 1000; panels show the observation series and the latent state series.]

SLIDE 9
State-space models: Inference

Definition. The state filtering distribution is the distribution for the state conditional on all observations up to and including time t, i.e.
  p(θt|y1:t, ψ) = p(θt|y1, y2, . . . , yt, ψ).

Definition. The state smoothing distribution is the distribution for the state conditional on all observed data, i.e.
  p(θt|y1:T, ψ) = p(θt|y1, y2, . . . , yT, ψ), where t < T.

Definition. The state forecasting distribution is the distribution for future states conditional on all observed data, i.e.
  p(θT+k|y1:T, ψ) = p(θT+k|y1, y2, . . . , yT, ψ), where k > 0.

SLIDE 10
State-space models: Inference

[Figure: the state-space graph θt−1 → θt → θt+1 with observations yt−1, yt, yt+1, annotated to show which states are targeted by filtering, smoothing, and forecasting.]

SLIDE 11
State-space models: Filtering

Goal: p(θt|y1:t) (filtered distribution). Recursive procedure: assume p(θt−1|y1:t−1) is available.

Prior for θt:
  p(θt|y1:t−1) = ∫ p(θt, θt−1|y1:t−1) dθt−1
               = ∫ p(θt|θt−1, y1:t−1) p(θt−1|y1:t−1) dθt−1
               = ∫ p(θt|θt−1) p(θt−1|y1:t−1) dθt−1

One-step-ahead predictive distribution for yt:
  p(yt|y1:t−1) = ∫ p(yt, θt|y1:t−1) dθt
               = ∫ p(yt|θt, y1:t−1) p(θt|y1:t−1) dθt
               = ∫ p(yt|θt) p(θt|y1:t−1) dθt

Filtered distribution for θt:
  p(θt|y1:t) = p(yt|θt, y1:t−1) p(θt|y1:t−1) / p(yt|y1:t−1)
             = p(yt|θt) p(θt|y1:t−1) / p(yt|y1:t−1)

Start from p(θ0).

SLIDE 12
State-space models: Smoothing

Goal: p(θt|y1:T) for t < T.

Backward transition probability:
  p(θt|θt+1, y1:T) = p(θt|θt+1, y1:t)
                   = p(θt+1|θt, y1:t) p(θt|y1:t) / p(θt+1|y1:t)
                   = p(θt+1|θt) p(θt|y1:t) / p(θt+1|y1:t)

Recursive smoothing distribution p(θt|y1:T), assuming we know p(θt+1|y1:T):
  p(θt|y1:T) = ∫ p(θt, θt+1|y1:T) dθt+1
             = ∫ p(θt+1|y1:T) p(θt|θt+1, y1:T) dθt+1
             = ∫ p(θt+1|y1:T) p(θt+1|θt) p(θt|y1:t) / p(θt+1|y1:t) dθt+1
             = p(θt|y1:t) ∫ [p(θt+1|θt) / p(θt+1|y1:t)] p(θt+1|y1:T) dθt+1

Start from p(θT|y1:T).

SLIDE 13
State-space models: Forecasting

Goal: p(yT+k, θT+k|y1:T).

  p(yT+k, θT+k|y1:T) = p(yT+k|θT+k) p(θT+k|y1:T)

Recursively, given p(θT+(k−1)|y1:T):
  p(θT+k|y1:T) = ∫ p(θT+k, θT+(k−1)|y1:T) dθT+(k−1)
               = ∫ p(θT+k|θT+(k−1), y1:T) p(θT+(k−1)|y1:T) dθT+(k−1)
               = ∫ p(θT+k|θT+(k−1)) p(θT+(k−1)|y1:T) dθT+(k−1)

Start with k = 1.

SLIDE 14
State inference: Outline

(The outline from Slide 2 is repeated here; the talk now enters part 2, state inference p(θ|y, ψ): exact inference, importance sampling, sequential importance sampling, bootstrap filter with resampling, and the auxiliary particle filter.)

SLIDE 15
State inference: Exact inference

Our goal for most of today is to find filtering methods. We assume p(θt−1|y1:t−1) is known and try to obtain p(θt|y1:t) using p(θt|θt−1) and p(yt|θt). Then, starting with p(θ0|y0) = p(θ0), we can find p(θt|y1:t) for all t.

SLIDE 16
State inference: Exact inference

There are two important classes of state-space models for which the filtering update is available analytically:
  Hidden Markov models
  Dynamic linear models

SLIDE 17
State inference: Exact inference

Hidden Markov models

Definition. A hidden Markov model (HMM) is a state-space model with an arbitrary observation equation and an evolution equation that can be represented by a transition probability matrix, i.e. p(θt = j|θt−1 = i) = pij.

Filtering in HMMs. Suppose we have an HMM with p states. Let qi = p(θt−1 = i|y1:t−1). Then
  p(θt = j|y1:t−1) = Σ_{i=1}^{p} qi pij,
  p(θt = j|y1:t) ∝ p(yt|θt = j) p(θt = j|y1:t−1).
To normalize: if pi ∝ ai for i ∈ {1, 2, . . . , p}, then pi = ai / Σ_{i=1}^{p} ai.
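The HMM filtering update above can be sketched in a few lines; this is a minimal illustration (the function name `hmm_filter_step` and the toy numbers are my own, not from the slides):

```python
import numpy as np

def hmm_filter_step(q, P, lik):
    """One HMM filtering update.

    q   : length-p vector, q[i] = p(theta_{t-1} = i | y_{1:t-1})
    P   : p x p transition matrix, P[i, j] = p(theta_t = j | theta_{t-1} = i)
    lik : length-p vector, lik[j] = p(y_t | theta_t = j)
    Returns the filtered probabilities p(theta_t = j | y_{1:t}).
    """
    prior = q @ P            # p(theta_t = j | y_{1:t-1}) = sum_i q_i p_ij
    post = lik * prior       # unnormalized: p(y_t | theta_t = j) * prior
    return post / post.sum() # normalize: a_j / sum_i a_i

# Two-state example: a perfectly informative observation pins down state 0.
q = np.array([0.5, 0.5])
P = np.array([[0.9, 0.1], [0.2, 0.8]])
out = hmm_filter_step(q, P, np.array([1.0, 0.0]))
```

With a flat likelihood the update returns the one-step-ahead prior q @ P, which is a quick sanity check.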

SLIDE 18
State inference: Exact inference

Definition. A dynamic linear model (DLM) is a state-space model where both the observation and evolution equations are linear in the states and have additive Gaussian errors, and the prior is Gaussian, i.e.
  yt = Ftθt + vt,   vt ∼ N(0, Vt)
  θt = Gtθt−1 + wt,  wt ∼ N(0, Wt)
  θ0 ∼ N(m0, C0),
where vt, wt, and θ0 are independent of each other and mutually independent through time.

Closed-form inference: the Kalman filter and the Kalman smoother.

SLIDE 19
State inference: Exact inference

Suppose θt−1|y1:t−1 ∼ N(mt−1, Ct−1). Then:

The one-step-ahead prior distribution for θt is θt|y1:t−1 ∼ N(at, Rt), where
  at = E(θt|y1:t−1) = Gt mt−1,
  Rt = Var(θt|y1:t−1) = Gt Ct−1 Gt′ + Wt.

The one-step-ahead predictive distribution for yt is yt|y1:t−1 ∼ N(ft, Qt), where
  ft = E(yt|y1:t−1) = Ft at,
  Qt = Var(yt|y1:t−1) = Ft Rt Ft′ + Vt.

The filtering distribution of θt is θt|y1:t ∼ N(mt, Ct), where
  mt = E(θt|y1:t) = at + Rt Ft′ Qt^{-1} et,
  Ct = Var(θt|y1:t) = Rt − Rt Ft′ Qt^{-1} Ft Rt,
and et = yt − ft is the forecast error.

SLIDE 20
State inference: The model

Test model:
  yt = θt + vt,         vt ∼ N(0, V) independently
  θt = α + βθt−1 + wt,  wt ∼ N(0, W) independently
  θ0 ∼ N(m0, C0)

SLIDE 21
State inference: The model

Assume θt−1|y1:t−1 ∼ N(mt−1, Ct−1). Then
  θt|y1:t−1 ∼ N(at, Rt),  at = α + β mt−1,  Rt = β² Ct−1 + W,
  yt|y1:t−1 ∼ N(ft, Qt),  ft = at,          Qt = Rt + V,
  θt|y1:t ∼ N(mt, Ct),    Ct = (1/Rt + 1/V)^{-1},  mt = Ct (at/Rt + yt/V).
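The scalar Kalman recursion for this test model fits in one function; this is a minimal sketch (the function name `kalman_step` is my own):

```python
def kalman_step(m, C, y, alpha, beta, V, W):
    """One Kalman update for y_t = theta_t + v_t, theta_t = alpha + beta*theta_{t-1} + w_t.

    (m, C) parameterize theta_{t-1} | y_{1:t-1} ~ N(m, C); returns the filtered
    moments for theta_t along with the one-step-ahead predictive moments for y_t.
    """
    a = alpha + beta * m              # prior mean:      a_t = alpha + beta * m_{t-1}
    R = beta**2 * C + W               # prior variance:  R_t = beta^2 * C_{t-1} + W
    f, Q = a, R + V                   # predictive:      f_t = a_t, Q_t = R_t + V
    C_new = 1.0 / (1.0 / R + 1.0 / V) # filtered var:    C_t = (1/R_t + 1/V)^(-1)
    m_new = C_new * (a / R + y / V)   # filtered mean:   m_t = C_t (a_t/R_t + y_t/V)
    return m_new, C_new, f, Q

# One update with alpha=0, beta=1, V=W=1, prior N(0, 1), observation y=3.
m1, C1, f1, Q1 = kalman_step(0.0, 1.0, 3.0, 0.0, 1.0, 1.0, 1.0)
```

The precision-weighted form of mt is algebraically the same as the usual gain form mt = at + Rt(yt − ft)/Qt.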

SLIDE 22
State inference: The model

Kalman filter updating

[Figure: sequence of panels showing the data, the filtered mean, and 95% intervals as each observation is assimilated.]

SLIDE 23
State inference: The model

Dynamic linear models are a rich class of models:
  Trend
  Seasonal
  Dynamic regression
  ARIMA
  Seemingly unrelated time series equations
  Seemingly unrelated regression models
  Hierarchical DLMs
  Multivariate ARMA models

Petris, Petrone, and Campagnoli (2009) Dynamic Linear Models with R.
West and Harrison (1997) Bayesian Forecasting and Dynamic Models.

SLIDE 24
State inference: Approximate inference

HMMs and DLMs are the main classes of models with closed-form updating of the filtering distributions. Generally, no closed-form expression exists and we must use an approximation.

Numerical approximations:
  Extended Kalman filter
  Bound optimal filter
  Gaussian sum filter
  Quadrature filter

Monte Carlo approximations:
  Markov chain Monte Carlo (MCMC)
  Sequential Monte Carlo (SMC): bootstrap filter, auxiliary particle filter

SLIDE 25
State inference: Monte Carlo sampling

Suppose we want to approximate some density f(θ), e.g. p(θt|y1:t). Draw samples from f(θ).

[Figure: histograms of n = 100, n = 1,000, and n = 10,000 draws overlaid on the target density; the approximation improves as n grows.]

SLIDE 26
State inference: Monte Carlo sampling

Suppose Z ∼ N(0, 1) and we are trying to estimate P(Z > 1) ≈ 0.1586553. If Zi ∼ N(0, 1) independently, then
  P(Z > 1) ≈ (1/J) Σ_{i=1}^{J} I(Zi > 1)
is a standard Monte Carlo approach.

  # Samples   P(Z > 1)
  10          0.10
  100         0.15
  1000        0.158
  10000       0.1636
  100000      0.15846

[Figure: estimate of P(Z > 1) versus log10 of the number of samples, converging to the true value.]
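The standard Monte Carlo estimate above is one line of code; this is a minimal sketch (the seed and sample size are my own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
J = 100_000
z = rng.standard_normal(J)     # Z_i ~ N(0, 1), i = 1, ..., J
est = (z > 1).mean()           # (1/J) * sum_i I(Z_i > 1)
```

With J = 100,000 the estimate should sit within a few Monte Carlo standard errors (about 0.0012 here) of the true value 0.1586553.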

SLIDE 27
State inference: MCMC

Sequential MCMC

Suppose we want to sample from
  p(θt+1|y1:t+1) = ∫ p(θ0:t+1|y1:t+1) dθ0:t.
An MCMC approach says to iterate through draws of full conditionals, e.g.
  θs ∼ p(θs|y1:t, θ−s) = p(θs|ys, θs−1, θs+1),
where θ−s indicates θ0:t with the s component removed and s = 0, 1, 2, . . . , t. Now suppose you just obtained yt+1. You need to redo the analysis, e.g.
  θs ∼ p(θs|ys, θs−1, θs+1) for s = 0, 1, 2, . . . , t, t + 1.

SLIDE 28
State inference: Importance sampling

Importance sampling

Suppose we want to approximate some density f(θ), e.g. p(θt|y1:t), but we cannot simulate from f(θ). Draw θi ∼ g(θ) independently and give each draw the weight
  wi = f(θi) / g(θi).

[Figure: target, proposal, and ratio densities for three pairs: Cauchy target with normal proposal, normal target with Cauchy proposal, and normal target with t15 proposal.]

SLIDE 29
State inference: Importance sampling

Suppose we are trying to estimate E[θ] when θ ∼ t2. We draw samples θi ∼ N(0, 0.5²) and give each sample the weight
  wi = t2(θi) / N(θi; 0, 0.5²).

[Figure: running estimate of E[X] across draws x_i, with point sizes showing the importance weights.]

SLIDE 30
State inference: Importance sampling

Suppose Z ∼ N(0, 1) and we are trying to estimate P(Z > 4.5) ≈ 3.398 × 10^−6. If Zi ∼ N(0, 1), then
  P(Z > 4.5) ≈ (1/J) Σ_{i=1}^{J} I(Zi > 4.5)
is a standard Monte Carlo approach. If J = 100,000, usually the indicator function is all zeros.

SLIDE 31
State inference: Importance sampling

Suppose Z ∼ N(0, 1) and we are trying to estimate P(Z > 4.5) ≈ 3.398 × 10^−6. If θi ∼ Exp(1) + 4.5, then
  P(Z > 4.5) ≈ (1/J) Σ_{i=1}^{J} [N(θi; 0, 1) / Exp(θi − 4.5; 1)] I(θi > 4.5)
is an importance sampling approach.

  # Samples   P(Z > 4.5) [×10^−6]
  10          3.53
  100         2.95
  1000        3.32
  10000       3.41
  100000      3.41

[Figure: estimate of P(Z > 4.5) versus log10 of the number of samples.]
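The shifted-exponential importance sampler above can be sketched directly, writing out the two densities by hand so nothing beyond NumPy is needed (the seed and sample size are my own choices):

```python
import math
import numpy as np

rng = np.random.default_rng(2)
J = 100_000
theta = rng.exponential(1.0, J) + 4.5                        # proposal: Exp(1) shifted to start at 4.5
norm_pdf = np.exp(-theta**2 / 2) / math.sqrt(2 * math.pi)    # N(theta_i; 0, 1)
exp_pdf = np.exp(-(theta - 4.5))                             # Exp(theta_i - 4.5; 1)
w = norm_pdf / exp_pdf                                       # I(theta_i > 4.5) is always 1 here
est = w.mean()                                               # importance sampling estimate of P(Z > 4.5)
```

Every proposal draw lands in the tail, so every draw contributes a nonzero weight, which is exactly why this beats the naive estimator whose indicator is usually all zeros.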

SLIDE 32
State inference: Importance sampling

Importance sampling summary:
  Importance sampling can be vastly superior to Monte Carlo sampling.
  When we are trying to estimate an entire density, we want the tails of our proposal density to be heavier than our target density, and the proposal density to be as close to the target density as possible.

SLIDE 33
State inference: Sequential importance sampling

Suppose we have a general state-space model, p(yt|θt) and p(θt|θt−1), and a current filtered distribution p(θt−1|y1:t−1). Our goal is to approximate p(θt|y1:t). Let
  f(θt) = p(θt|y1:t) = p(yt|θt) p(θt|y1:t−1) / p(yt|y1:t−1) ∝ p(yt|θt) p(θt|y1:t−1),
  g(θt) = p(θt|y1:t−1) = ∫ p(θt|θt−1) p(θt−1|y1:t−1) dθt−1,
so that
  f(θt) / g(θt) ∝ p(yt|θt) p(θt|y1:t−1) / p(θt|y1:t−1) = p(yt|θt).
Thus sample
  θ(i)t−1 ∼ p(θt−1|y1:t−1),  θ(i)t ∼ p(θt|θ(i)t−1),
and weight
  w(i)t ∝ p(yt|θ(i)t),
giving
  p(θt|y1:t) ≈ Σ_{i=1}^{J} w(i)t δθ(i)t.
The pair (w(i)t, θ(i)t) is called a particle.

SLIDE 34
State inference: Sequential importance sampling

Sequential importance sampling procedure:

  • 1. Suppose we have a particle approximation to our density at time t − 1, i.e.
       p(θt−1|y1:t−1) ≈ Σ_{i=1}^{J} w(i)t−1 δθ(i)t−1.
  • 2. For i ∈ {1, 2, . . . , J}:
       a. Sample θ(i)t ∼ p(θt|θ(i)t−1).
       b. Set w(i)t ∝ w(i)t−1 p(yt|θ(i)t).
  • 3. We now have a particle approximation to our density at time t, i.e.
       p(θt|y1:t) ≈ Σ_{i=1}^{J} w(i)t δθ(i)t.
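The three-step SIS update above can be sketched as a single function; this is a minimal illustration (the function name `sis_step` and the generic `evolve`/`loglik` callables are my own abstractions, and weights are handled on the log scale for numerical stability):

```python
import numpy as np

def sis_step(theta, w, evolve, loglik, rng):
    """One sequential-importance-sampling update.

    theta  : array of particles theta_{t-1}^(i)
    w      : normalized weights w_{t-1}^(i)
    evolve : evolve(theta, rng) draws theta_t ~ p(theta_t | theta_{t-1})
    loglik : loglik(theta_t) evaluates log p(y_t | theta_t)
    """
    theta_new = evolve(theta, rng)            # step 2a: propagate each particle
    logw = np.log(w) + loglik(theta_new)      # step 2b: w_t ∝ w_{t-1} p(y_t | theta_t)
    logw -= logw.max()                        # stabilize before exponentiating
    w_new = np.exp(logw)
    return theta_new, w_new / w_new.sum()     # step 3: normalized particle approximation
```

A usage sketch for a local-level model with unit noise variances: `evolve = lambda th, r: th + r.standard_normal(th.size)` and `loglik = lambda th: -0.5 * (y - th)**2` for the current observation y.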

SLIDE 35
State inference: Sequential importance sampling

Sequential importance sampling (SIS)

[Figure: animation frames of an SIS run over t = 1, . . . , 10 showing the data, the true state, and the weighted particles.]

SLIDE 36
State inference: Sequential importance sampling

Sequential importance sampling (SIS) summary:

Positives:
  As the number of particles J increases, the accuracy increases.

Negatives:
  Inference is dominated by a few particles with high weight.
  Many particles are kept that are irrelevant.

Why don't we eliminate particles with low weight in favor of particles with large weight?

SLIDE 37
State inference: Sequential importance sampling with resampling

Let's approximate p(θt−1|y1:t−1) by sampling with replacement proportional to the weights w(i)t−1:

  Before resampling:
    i         1      2     · · ·  J
    w(i)t−1   0.02   0.05  · · ·  0.03
    θ(i)t−1   1.91   0.63  · · ·  −0.12

  After resampling:
    i         1      2     · · ·  J
    w(i)t−1   1/J    1/J   · · ·  1/J
    θ(i)t−1   0.63   0.63  · · ·  −0.12

SLIDE 38
State inference: Sequential importance sampling with resampling

(Gordon, Salmond, and Smith 1993)

Sequential importance sampling with resampling (SIR) procedure:

  • 1. Suppose we have a particle approximation to our density at time t − 1, i.e.
       p(θt−1|y1:t−1) ≈ Σ_{i=1}^{J} w(i)t−1 δθ(i)t−1.
  • 2. For i ∈ {1, 2, . . . , J}:
       a. Sample j ∈ {1, 2, . . . , J} with probability w(j)t−1.
       b. Sample θ(i)t ∼ p(θt|θ(j)t−1).
       c. Set w(i)t ∝ 1 · p(yt|θ(i)t).
  • 3. We now have a particle approximation to our density at time t, i.e.
       p(θt|y1:t) ≈ Σ_{i=1}^{J} w(i)t δθ(i)t.
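The SIR (bootstrap filter) update differs from SIS only in the resampling step before propagation; a minimal sketch, with the same hypothetical `evolve`/`loglik` interface as before and multinomial resampling:

```python
import numpy as np

def sir_step(theta, w, evolve, loglik, rng):
    """One bootstrap-filter (SIR) update: resample, propagate, reweight."""
    J = theta.size
    idx = rng.choice(J, size=J, p=w)       # step 2a: multinomial resample by weight
    theta_new = evolve(theta[idx], rng)    # step 2b: propagate the resampled particles
    logw = loglik(theta_new)               # step 2c: w_t ∝ 1 * p(y_t | theta_t)
    w_new = np.exp(logw - logw.max())      # stabilize, then normalize
    return theta_new, w_new / w_new.sum()
```

Because the resampled particles carry uniform weight 1/J, the previous weights drop out of step 2c, unlike in plain SIS.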

SLIDE 39
State inference: Sequential importance sampling with resampling

Sequential importance sampling with resampling (SIR)

[Figure: SIR run over t = 1, . . . , 10 showing the data, the true state, and the particles.]
slide-40
SLIDE 40

State inference Resampling (Douc, Capp´ e, and Moulines 2005)

Constraints on resampling: Number of resulting particles (J) is fixed Resulting weights are uniform (1/J) Number of repeats is unbiased (E[Nj] = Jw(j)) Schemes that meet these requirements: Multinomial sampling Residual sampling Stratified sampling Systematic sampling

Jarad Niemi (STAT615@ISU) Sequential Monte Carlo October 20, 2017 40 / 72

SLIDE 41
State inference: Resampling

Multinomial sampling
  • 1. Draw U1, . . . , UJ ∼ Unif(0, 1) independently.
  • 2. Invert the cumulative sum of the weights.

[Figure: cumulative sum of weights with uniform draws inverted to select particle indices.]

SLIDE 42
State inference: Resampling

Residual sampling
  • 1. Keep nj = ⌊Jw(j)⌋ repeats of particle j.
  • 2. Update the remaining probability: w(j)′ ∝ Jw(j) − nj.
  • 3. Multinomial sampling of the remaining J − Σ_{j=1}^{J} nj particles with probabilities w(j)′.

[Figure: cumulative sum of weights illustrating the deterministic and residual portions.]

SLIDE 43
State inference: Stratified sampling

Stratified sampling
  • 1. Draw Uj ∼ Unif((j − 1)/J, j/J) independently for j = 1, 2, . . . , J.
  • 2. Invert the cumulative sum of the weights.

[Figure: cumulative sum of weights with one uniform draw in each stratum.]

SLIDE 44
State inference: Systematic sampling

Systematic sampling
  • 1. Draw U1 ∼ Unif(0, 1/J).
  • 2. Set Uj = Uj−1 + 1/J for j = 2, . . . , J.
  • 3. Invert the cumulative sum of the weights.

[Figure: cumulative sum of weights with evenly spaced draws.]

A counterexample shows that this can be worse than stratified and residual sampling.
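Systematic resampling is the cheapest of the four schemes to code, since a single uniform draw fixes all J evenly spaced points; this is a minimal sketch (the function name `systematic_resample` is my own, and `searchsorted` performs the "invert the cumulative sum" step):

```python
import numpy as np

def systematic_resample(w, rng):
    """Systematic resampling: one U1 ~ Unif(0, 1/J), then U_j = U_1 + (j-1)/J.

    w : normalized weights. Returns the J selected particle indices.
    """
    J = w.size
    u = rng.uniform(0.0, 1.0 / J) + np.arange(J) / J  # evenly spaced points in (0, 1)
    return np.searchsorted(np.cumsum(w), u)           # invert the cumulative sum of weights
```

A degenerate weight vector is a quick check: if all weight sits on one particle, every returned index must point at it.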

SLIDE 45
State inference: When to resample

Resampling adds variability.

[Figure: cumulative sum of weights illustrating the extra Monte Carlo variability introduced by resampling.]

SLIDE 46
State inference: When to resample

Methods for determining when to resample:

Effective sample size:
  ESS = ( Σ_{j=1}^{J} (w(j))² )^{−1}

Coefficient of variation:
  CoV = ( (1/J) Σ_{j=1}^{J} (Jw(j) − 1)² )^{1/2}

Entropy:
  Ent = − Σ_{j=1}^{J} w(j) log2 w(j)
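The three weight diagnostics above are each one line; a minimal sketch (function names are my own):

```python
import numpy as np

def ess(w):
    """Effective sample size: (sum_j w_j^2)^(-1); equals J for uniform weights."""
    return 1.0 / np.sum(w**2)

def cov_weights(w):
    """Coefficient of variation: ((1/J) sum_j (J*w_j - 1)^2)^(1/2); 0 for uniform weights."""
    J = w.size
    return np.sqrt(np.mean((J * w - 1.0) ** 2))

def entropy(w):
    """Entropy in bits: -sum_j w_j log2 w_j; maximized (log2 J) for uniform weights."""
    wp = w[w > 0]  # treat 0 * log2(0) as 0
    return -np.sum(wp * np.log2(wp))
```

A common dynamic-resampling rule is to resample only when ESS drops below some fraction of J, e.g. J/2.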

SLIDE 47
State inference: When to resample

Dynamic resampling

[Figure: effective sample size over time (t = 1, . . . , 100) under dynamic resampling.]

SLIDE 48
State inference: Avoiding resampling

Better than resampling would be to avoid the need altogether:
  Resample-move (Gilks and Berzuini 2001)
  Auxiliary particle filter (Pitt and Shephard 1999)

SLIDE 49
State inference: Avoiding resampling

Resample-move procedure:

  • 1. Suppose we have a particle approximation to our density at time t − 1, i.e.
       p(θt−1|y1:t−1) ≈ Σ_{i=1}^{J} (1/J) δθ(i)t−1.
  • 2. For i ∈ {1, 2, . . . , J}:
       a. Sample j ∈ {1, 2, . . . , J} with probability proportional to p(yt|θ(j)t−1).
       b. Sample θ(i)t ∼ p(θt|θ(j)t−1).
  • 3. We now have a particle approximation to our density at time t, i.e.
       p(θt|y1:t) ≈ Σ_{i=1}^{J} (1/J) δθ(i)t.

Evaluating p(yt|θt−1) requires solving the integral
  p(yt|θt−1) = ∫ p(yt|θt) p(θt|θt−1) dθt.

SLIDE 50
State inference: Avoiding resampling

Auxiliary particle filter (APF) procedure:

  • 1. Suppose we have a particle approximation to our density at time t − 1, i.e.
       p(θt−1|y1:t−1) ≈ Σ_{i=1}^{J} w(i)t−1 δθ(i)t−1.
  • 2. For i ∈ {1, 2, . . . , J}:
       a. Sample j ∈ {1, 2, . . . , J} with probability proportional to w(j)t−1 p(yt|μ(j)t).
       b. Sample θ(i)t ∼ p(θt|θ(j)t−1).
       c. Set w(i)t = p(yt|θ(i)t) / p(yt|μ(j)t).
  • 3. We now have a particle approximation to our density at time t, i.e.
       p(θt|y1:t) ≈ Σ_{i=1}^{J} w(i)t δθ(i)t,

where μ(j)t is a point estimate of θ(j)t, usually μ(j)t = E[θt|θ(j)t−1].

SLIDE 51
State inference: Avoiding resampling

Auxiliary Particle Filter (APF)

[Figure: APF run over t = 1, . . . , 10 showing the data, the true state, and the particles.]

SLIDE 52
State inference: Summary

Summary of p(θt|y1:t, ψ):

Avoid particle degeneracy if possible:
  Resample-move
  Auxiliary particle filter

When resampling:
  Use either stratified or residual resampling, and only resample when necessary, e.g. when the effective sample size (ESS) is low.

SLIDE 53
State and parameter inference: Outline

(The outline from Slide 2 is repeated here; the talk now enters part 3, state and parameter inference p(θ, ψ|y): bootstrap filter, kernel density, sufficient statistics.)

SLIDE 54
State and parameter inference

Unknown fixed parameters

What if the fixed parameters are unknown?
  Incorporate them into the state with degenerate evolutions, e.g. ψt = ψt−1.
  Incorporate them into the state with evolutions, e.g. ψt = ψt−1 + εt.
  Use a kernel density approximation to regenerate parameter values.
  Use sufficient statistics to regenerate parameter values.
  Use Markov chain Monte Carlo to regenerate parameter values.

SLIDE 55
State and parameter inference

Unknown fixed parameters

Our goal has changed slightly. We assume p(θt−1, ψ|y1:t−1) is known and try to obtain p(θt, ψ|y1:t) using p(θt|θt−1, ψ) and p(yt|θt, ψ). Then, starting with p(θ0, ψ|y0) = p(θ0, ψ), we can find p(θt, ψ|y1:t) for all t.

SLIDE 56
State and parameter inference: Resampling with fixed parameters

Modify SIR to include fixed parameters:

  • 1. Suppose we have a particle approximation to our density at time t − 1, i.e.
       p(θt−1, ψ|y1:t−1) ≈ Σ_{i=1}^{J} w(i)t−1 δ(θt−1, ψ)(i).
  • 2. For i ∈ {1, 2, . . . , J}:
       a. Sample j ∈ {1, 2, . . . , J} with probability w(j)t−1.
       b. Set ψ(i) = ψ(j).
       c. Sample θ(i)t ∼ p(θt|θ(j)t−1, ψ(i)).
       d. Set w(i)t ∝ p(yt|θ(i)t, ψ(i)).
  • 3. We now have a particle approximation to our density at time t, i.e.
       p(θt, ψ|y1:t) ≈ Σ_{i=1}^{J} w(i)t δ(θt, ψ)(i).

SLIDE 57
State and parameter inference: Resampling with fixed parameters

Test model:
  yt = θt + vt,         vt ∼ N(0, V) independently
  θt = α + βθt−1 + wt,  wt ∼ N(0, W) independently
  θ0 ∼ N(m0, C0)
  ψ = (α, β, V, W)

SLIDE 58
State and parameter inference: Resampling with fixed parameters

SIR with fixed parameters

[Figure: four panels of particle traces over t = 1, . . . , 10 for the fixed parameters α, β, V, and W.]

SLIDE 59
State and parameter inference: Kernel density estimation

Clearly we need to regenerate parameter values. One idea:
  Approximate our particle approximation by a kernel density approximation.
  Draw new parameter values from this kernel density approximation.

  p(ψ|y1:t−1) ≈ Σ_{i=1}^{J} w(i)t−1 δψ(i)
             ≈ Σ_{i=1}^{J} w(i)t−1 N(aψ(i) + (1 − a)ψ̄, h²V),

where h² = 1 − a² = 1 − ((3τ − 1)/(2τ))², and ψ̄ and V are the Monte Carlo estimates of the mean and covariance.
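The shrinkage-kernel regeneration above (Liu and West 2001) can be sketched as follows; this is a minimal illustration, not the slides' implementation — the function name `regenerate_params` is my own, and the resample-then-jitter ordering is one common way to draw from the kernel mixture:

```python
import numpy as np

def regenerate_params(psi, w, tau=0.99, rng=None):
    """Draw J new parameter particles from the shrinkage kernel mixture.

    psi : (J, q) array of parameter particles; w : normalized weights.
    a = (3*tau - 1) / (2*tau) and h^2 = 1 - a^2, so shrinking each particle
    toward the mean before jittering preserves the mixture's mean and variance.
    """
    if rng is None:
        rng = np.random.default_rng()
    a = (3 * tau - 1) / (2 * tau)
    h2 = 1 - a**2
    J, q = psi.shape
    psi_bar = w @ psi                                 # Monte Carlo mean
    centered = psi - psi_bar
    V = (w[:, None] * centered).T @ centered          # Monte Carlo covariance
    idx = rng.choice(J, size=J, p=w)                  # pick mixture components by weight
    loc = a * psi[idx] + (1 - a) * psi_bar            # shrink toward the mean
    return loc + rng.multivariate_normal(np.zeros(q), h2 * V, size=J)
```

Because the kernel locations are shrunk by a and the kernel variance is h²V with h² = 1 − a², the regenerated cloud keeps roughly the same mean and covariance as the original particles rather than inflating over time.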

SLIDE 60
State and parameter inference: Kernel density estimation

Kernel density approximation

[Figure: kernel density approximations for two smoothing settings (0.99 and 0.85) compared to the particle approximation.]

SLIDE 61
State and parameter inference: Sufficient statistics

Another idea: rather than storing draws of parameters, store sufficient statistics for the parameters. Suppose the model admits a sufficient statistic representation, i.e.
  p(θt, st, ψ|y1:t) = p(ψ|st) p(θt, st|y1:t).
Then each particle stores a distribution for the parameters, and the sufficient statistics can be updated deterministically via
  st = S(st−1, θt, θt−1, yt).

SLIDE 62
State and parameter inference: Sufficient statistics

For example, in our model
  yt = θt + vt,         vt ∼ N(0, V)
  θt = α + βθt−1 + wt,  wt ∼ N(0, W),
if
  V|y1:t−1 ∼ IG(aV,t−1, bV,t−1),
  α, β, W|y1:t−1 ∼ N-IG(mt−1, St−1, aW,t−1, bW,t−1),
  st−1 = (aV,t−1, bV,t−1, mt−1, St−1, aW,t−1, bW,t−1),
then
  V|y1:t ∼ IG(aV,t, bV,t),
  α, β, W|y1:t, θt, θt−1 ∼ N-IG(mt, St, aW,t, bW,t),
  st = (aV,t, bV,t, mt, St, aW,t, bW,t).

SLIDE 63
State and parameter inference: Sufficient statistics

[Figure: results for the sufficient-statistics filter; no slide text recovered.]

SLIDE 64
State and parameter inference: Sufficient statistics

[Figure: results for the sufficient-statistics filter; no slide text recovered.]

SLIDE 65
State and parameter inference: MCMC kernels

MCMC within SMC

Final idea for fixed parameter regeneration:
  Stop the SMC algorithm.
  Perform one (or more) iterations of MCMC on each particle.

Pros:
  Does not affect SMC theory.
  Will move fixed parameters around.

Cons:
  Requires the entire data and state history; if the full history is not used, this introduces bias.

SLIDE 66
State and parameter inference: Summary

Summary of p(θt, ψ|y1:t):
  Regenerating fixed parameters is necessary.
  When possible, use sufficient statistics.
  Otherwise, use a kernel density approximation or an MCMC step.

SLIDE 67
Advanced SMC: Outline

(The outline from Slide 2 is repeated here; the talk now enters part 4, advanced SMC: SMC for marginal likelihood calculations, theoretical results for fixed parameters, and SMC to generate Metropolis proposals.)

SLIDE 68
Advanced SMC: Marginal likelihood calculations

In Bayesian data analysis, model comparison and hypothesis testing involve the marginal likelihood:
  p(y) = ∫ p(y|ψ) p(ψ) dψ.
For example, Bayes factors depend on the marginal likelihood for both models:
  BF(0 : 1) = p(y|M0) / p(y|M1) = ∫ p(y|ψ0) p(ψ0|M0) dψ0 / ∫ p(y|ψ1) p(ψ1|M1) dψ1.
The marginal likelihood can be decomposed as
  p(y1:T) = Π_{t=1}^{T} p(yt|y1:t−1).
This can be approximated using SMC methods by
  p(yt|y1:t−1) ≈ (1/J) Σ_{i=1}^{J} w̃(i)t,
where w̃(i)t are the unnormalized weights.
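The per-step approximation above is just the mean of the unnormalized weights; in practice the weights live on the log scale, so a log-sum-exp form avoids underflow. A minimal sketch (the function name is my own):

```python
import numpy as np

def log_marginal_increment(logw_unnorm):
    """log p(y_t | y_{1:t-1}) ≈ log( (1/J) * sum_i wtilde_t^(i) ), computed stably.

    logw_unnorm : array of log unnormalized weights log wtilde_t^(i).
    Summing these increments over t gives log p(y_{1:T}).
    """
    m = logw_unnorm.max()                            # factor out the largest term
    return m + np.log(np.exp(logw_unnorm - m).mean())
```

Accumulating these increments across t inside a bootstrap filter yields a log marginal likelihood estimate for model comparison.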

SLIDE 69
Advanced SMC: Theoretical results

For p(θ1:t|y1:t, ψ), we have
  E[ | ∫ h(θT−L:T) [p̂(θT−L:T|y1:T) − p(θT−L:T|y1:T)] dθT−L:T |^p ]^{1/p} ≤ c(L)/√J.
As long as we are only interested in a fixed time L into the past, we can use a constant number of particles and maintain accuracy.

While for p(θ1:t, ψ|y1:t), we have
  E[ | ∫ h(ψ) [p̂(ψ|y1:T) − p(ψ|y1:T)] dψ |^p ]^{1/p} ≤ c(T)/√J.
So, as time increases, we need more and more particles to maintain accuracy.

SLIDE 70
Advanced SMC: SMC-MCMC

MCMC within SMC

How about using SMC to generate proposal draws for the states within a Metropolis sampling scheme?

SLIDE 71
Summary

If your goal is p(θt|y1:t, ψ), then:
  Analytically integrate anything you can.
  Use a point estimate to reduce particle degeneracy.
  Use stratified or residual resampling.

If your goal is p(θt, ψ|y1:t), then:
  Analytically integrate anything you can.
  Use a kernel density approximation to regenerate particles.
  Use lots of particles.
  Stop every once in a while and run MCMC.

Lots of areas for open research:
  Marginal likelihood calculation
  Theoretical results
  SMC for Metropolis proposals

SLIDE 72
Summary: References

Gordon, N. J., Salmond, D. J., and Smith, A. F. M. (1993), "Novel approach to nonlinear/non-Gaussian Bayesian state estimation," IEEE Proceedings Part F: Communications, Radar and Signal Processing, 140, 107-113.
Pitt, M. K. and Shephard, N. (1999), "Filtering via simulation: auxiliary particle filters," Journal of the American Statistical Association, 94, 590-599.
Liu, J. and West, M. (2001), "Combined parameter and state estimation in simulation-based filtering," in Sequential Monte Carlo Methods in Practice, eds. A. Doucet, J. F. G. De Freitas, and N. J. Gordon, pp. 197-217, Springer-Verlag, New York.
Doucet, A., De Freitas, N., and Gordon, N. (2001), Sequential Monte Carlo Methods in Practice, Springer-Verlag, New York.
Gilks, W. R. and Berzuini, C. (2001), "Following a moving target: Monte Carlo inference for dynamic Bayesian models," Journal of the Royal Statistical Society, Series B (Statistical Methodology), 63, 127-146.
Fearnhead, P. (2002), "Markov chain Monte Carlo, sufficient statistics, and particle filters," Journal of Computational and Graphical Statistics, 11, 848-862.
Storvik, G. (2002), "Particle filters in state space models with the presence of unknown static parameters," IEEE Transactions on Signal Processing, 50, 281-289.
Douc, R., Cappé, O., and Moulines, E. (2005), "Comparison of resampling schemes for particle filtering," in 4th International Symposium on Image and Signal Processing and Analysis (ISPA), Zagreb.

Computation:
  R: KFAS
  R: dlm
  R: pomp
  R: SMC (not updated since 2011-12-11)
  R: smcUtils
  LibBi: libbi.org (RBi: https://github.com/pierrejacob/RBi)