Second-order particle MCMC for Bayesian parameter inference IFAC - - PowerPoint PPT Presentation

second order particle mcmc for bayesian parameter
SMART_READER_LITE
LIVE PREVIEW

Second-order particle MCMC for Bayesian parameter inference IFAC - - PowerPoint PPT Presentation

Second-order particle MCMC for Bayesian parameter inference IFAC World Congress 2014, Cape Town, South Africa, August 28, 2014. Johan Dahlin johan.dahlin@liu.se Division of Automatic Control, Linkping University, Sweden. AUTOMATIC CONTROL


slide-1
SLIDE 1

Second-order particle MCMC for Bayesian parameter inference

IFAC World Congress 2014, Cape Town, South Africa, August 28, 2014.

Johan Dahlin

johan.dahlin@liu.se

Division of Automatic Control, Linköping University, Sweden.

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-2
SLIDE 2

This is collaborative work with

  • Dr. Fredrik Lindsten (University of Cambridge, United Kingdom)
  • Prof. Thomas B. Schön (Uppsala University, Sweden)

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-3
SLIDE 3

Summary

Aim

Efficient Bayesian parameter inference in nonlinear SSMs.

Methods

Markov chain Monte Carlo. Sequential Monte Carlo methods.

Contributions

Efficient estimation of first/second order information. Inclusion of first/second order in the proposal. PMH1 and PMH2.

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-4
SLIDE 4

Example: Earthquakes between 1900 and 2013

1900 1920 1940 1960 1980 2000 2020 5 10 15 20 25 30 35 40 Year Number of major earthquakes Number of earthquakes Density 10 20 30 40 50 0.00 0.02 0.04 0.06 5 10 15 20 −0.2 0.0 0.2 0.4 0.6 0.8 1.0 Lag ACF

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-5
SLIDE 5

Example: A simple model of annual earthquake counts

xt+1|xt ∼ N

  • xt+1; φxt, σ2

, yt|xt ∼ P

  • yt; β exp(xt)
  • .

Task: Estimate π(θ) = p(θ|y1:T ) ∝ pθ(y1:T )p(θ) with θ = {φ, σ}.

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-6
SLIDE 6

Metropolis-Hastings algorithm

Propose Compute

  • acc. prob

Accept or reject?

  • Propose: θ′ ∼ q(θ′|θk−1).
  • Compute acceptance probability:

α(θ′, θk−1) = min

  • 1,

π(θ′) π(θk−1) q(θk−1|θ′) q(θ′|θk−1)

  • Accept or reject? θ′ → θk w.p. α(θ′, θk−1).

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-7
SLIDE 7

Particle Metropolis-Hastings algorithm (cont.)

Propose Compute

  • acc. prob

Accept or reject?

  • Propose: θ′ ∼ q(θ′|θk−1, u′) and u′ ∼ PF(θ′) .
  • Compute

pθ′(y1:T |u′) and the acceptance probability: α(θ′, θk−1) = 1 ∧ p(θ′) p(θk−1)

  • pθ′(y1:T |u′)
  • pθk−1(y1:T |uk−1)

q(θk−1|θ′, u′) q(θ′|θk−1, uk−1).

  • Accept or reject? θ′ → θk and u′ → uk w.p. α(θ′, θk−1).

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-8
SLIDE 8

Particle filtering

Resampling Propagation Weighting

Given the particle system (the random variables u) u =

  • x(i)

1:T , w(i) 1:T

N

i=1,

we can obtain (consistent) estimates of:

  • the likelihood pθ(y1:T ).
  • the first and second order information of π(θ).

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-9
SLIDE 9

Zeroth order proposal (PMH0)

Gaussian random walk θ′′ = θ′ + ǫz, z ∼ N(z; 0, 1). gives the zeroth order (marginal) proposal q(θ′′|θ′, u′) = N

  • θ′′; θ′, ǫ2Id
  • .

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-10
SLIDE 10

Example: Parameter inference in earthquake model

φ σ 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0.0 0.1 0.2 0.3 0.4 0.5

  • Iteration: 2
  • 0.0

0.5 1.0 1.5 2.0 −400 −380 −360 −340 −320 −300 Iteration Log−likelihood

  • φ

Marginal posterior density 0.4 0.5 0.6 0.7 0.8 0.9 1.0 2 4 6 8

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-11
SLIDE 11

Example: Parameter inference in earthquake model

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-12
SLIDE 12

Example: Parameter inference in earthquake model

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-13
SLIDE 13

Example: Parameter inference in earthquake model

φ Density 0.70 0.75 0.80 0.85 0.90 0.95 1.00 2 4 6 8 10 σ Density 0.00 0.05 0.10 0.15 0.20 0.25 0.30 5 10 15 20

φ σ Posterior mean 0.86 0.15 Posterior median 0.86 0.15 Posterior mode 0.90 0.14

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-14
SLIDE 14

Example: State inference in the earthquake model

1900 1920 1940 1960 1980 2000 2020 10 20 30 40 50

Year Number of major earthquakes

1900 1920 1940 1960 1980 2000 2020 −1.0 −0.5 0.0 0.5 1.0

Year Estimated latent state

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-15
SLIDE 15

Fixed-lag particle smoothing: overview

The first and second order information can be estimated using u =

  • x(i)

1:T , w(i) 1:T

N

i=1,

and the fixed-lag particle smoother approximation,

  • pθ(❞xt:t−1|y1:T ) ≈

pθ(❞xt:t−1|y1:κt), κt = min{T, t + ∆}, with no additional computational complexity.

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-16
SLIDE 16

Fixed-lag particle smoothing: motivation

2 4 6 8

  • 60
  • 40
  • 20

20 40 60 time state AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-17
SLIDE 17

First order proposal (PMH1)

Noisy gradient-based ascent update θ′′ = θ′ + ǫ2 2 S(θ′) + ǫz, z ∼ N(z; 0, 1), with the first order information S(θ′)= ∇θ log π(θ)

  • θ=θ′,

gives the first order proposal q(θ′′|θ′, u′) = N

  • θ′′; θ′ + ǫ2

2

  • S(θ′|u′), ǫ2Id
  • .

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-18
SLIDE 18

Second order proposal (PMH2)

Noisy Newton update θ′′ = θ′ + ǫ2 2

  • J (θ′)

−1S(θ′) + ǫ

  • J (θ′)

−1/2z, z ∼ N(z; 0, 1), with the second order information J (θ′)= −∇2

θ log π(θ)

  • θ=θ′,

gives the second order proposal q(θ′′|θ′, u′) = N

  • θ′′; θ′ + ǫ2

2

  • S(θ′|u′)

J (θ′|u′) −1, ǫ2 J (θ′|u′) −1

  • .

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-19
SLIDE 19

Example: Parameter inference in earthquake model

PMH0

φ σ 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0.0 0.1 0.2 0.3 0.4 0.5

  • PMH1

φ σ 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0.0 0.1 0.2 0.3 0.4 0.5

  • PMH2

φ σ 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0.0 0.1 0.2 0.3 0.4 0.5

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-20
SLIDE 20

Example: Parameter inference in earthquake model

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-21
SLIDE 21

Integrated autocorrelation time

10 20 30 40 50 0.0 0.2 0.4 0.6 0.8 1.0 Lag ACF PMH0: φ 10 20 30 40 50 0.0 0.2 0.4 0.6 0.8 1.0 Lag ACF PMH1: φ 10 20 30 40 50 0.0 0.2 0.4 0.6 0.8 1.0 Lag ACF PMH2: φ 10 20 30 40 50 0.0 0.2 0.4 0.6 0.8 1.0 Lag ACF PMH0: σ 10 20 30 40 50 0.0 0.2 0.4 0.6 0.8 1.0 Lag ACF PMH1: σ 10 20 30 40 50 0.0 0.2 0.4 0.6 0.8 1.0 Lag ACF PMH2: σ AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-22
SLIDE 22

Integrated autocorrelation time (cont.)

IACT: the number of iterations between two uncorrelated samples. Acceptance rate max IACT PMH0 0.47 31.82 PMH1 0.38 21.38 PMH2 0.54 14.15

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-23
SLIDE 23

Conclusions

Results

Shorter burn-in phase. Increased efficiency (about 2 times). Simplified tuning. Retained linear computational complexity in N.

Methods

Include u into the proposal. Particle fixed-lag smoothing (almost for free). Laplace approximation / Random walk on a Riemann manifold.

Future work

Non-reversible Markov chains. Adaptive methods. Approximate Bayesian computations.

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-24
SLIDE 24

Thank you for your attention!

Questions, comments and suggestions are most welcome.

Further developments

Extended version available at arXiv.

The paper and code to replicate the results within it are found at: http://work.johandahlin.com/.

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-25
SLIDE 25

Particle Metropolis-Hastings algorithm

The target distribution is given by the parameter proposal π(θ) = pθ(y1:T )p(θ) p(y1:T ) . An unbiased estimator of the likelihood is given by Em

  • pθ(y1:T |u)
  • =
  • pθ(y1:T |u)mθ(u) ❞u = pθ(y1:T ).

An extended target is given by π(θ, u) = pθ(y1:T |u)mθ(u)p(θ) p(y1:T ) = pθ(y1:T |u)mθ(u)π(θ) pθ(y1:T ) .

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-26
SLIDE 26

Particle Metropolis-Hastings algorithm (cont.)

  • π(θ, u) ❞u =

pθ(y1:T |u)mθ(u)π(θ) pθ(y1:T ) ❞u = π(θ) pθ(y1:T )

  • pθ(y1:T |u)mθ(u) ❞u
  • =pθ(y1:T )

, = π(θ). That is, the marginal is the desired target distribution and the Markov chain is kept invariant.

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-27
SLIDE 27

(bootstrap) Particle filtering

Resampling Propagation Weighting

  • Resampling: P(a(i)

t

= j) = w(j)

t−1 and set

x(i)

t−1 = xa(i)

t

t−1.

  • Propagation: x(i)

t

∼ Rθ

  • xt
  • x(i)

t−1

  • = fθ(xt|

x(i)

t−1).

  • Weighting: w(i)

t

= Wθ

  • x(i)

t ,

x(i)

t−1

  • = gθ(yt|xt).

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-28
SLIDE 28

Likelihood estimation using the APF

The likelihood for an SSM can be decomposed by L(θ) = pθ(y1:T ) = pθ(y1)

T

  • t=2

pθ(yt|y1:t−1), where the one-step ahead predictor can be computed by pθ(yt|y1:t−1) =

  • fθ(xt|xt−1)gθ(yt|xt)pθ(xt−1|y1:t−1) ❞xt

=

  • Wθ(xt|xt−1)Rθ(xt|xt−1)pθ(xt−1|y1:t−1) ❞xt.

pθ(yt|y1:t−1) ≈ 1 N

N

  • i=1
  • Wθ(xt|xt−1)δx(i)

t ,

x(i)

t−1 ❞xt = 1

N

N

  • i=1

w(i)

t .

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-29
SLIDE 29

Fixed-lag particle smoothing (cont.)

Assume that pθ(xt|y1:T ) ≈ pθ(xt|y1:κt), κt = min{T, t + ∆}, for some 0 ≤ ∆ ≤ T. It follows that

  • pθ(xt−1:t|y1:T ) =

N

  • i=1
  • w(i)

κt δ x(i)

t−1:t,κt

(❞xt−1:t) which can be used to estimate the gradient and Hessian information about the log-target.

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-30
SLIDE 30

Score estimation using the FL smoother

The score can be estimated using Fisher’s identity given by ∇θ log pθ(y1:T )

  • θ=θ′ =
  • ∇θ log pθ(x1:T , y1:T )pθ′(x1:T |y1:T )❞x1:T

  • ∇θ log pθ(x1:T , y1:T )

pθ′(x1:T |y1:T )❞x1:T We also know that ∇θ log pθ(x1:T , y1:T ) =

T

  • t=1
  • ∇θ log fθ(xt|xt−1) + ∇θ log gθ(yt|xt)
  • η(xt,xt−1)

, which gives ∇θ log pθ(y1:T )

  • θ=θ′ ≈

T

  • t=1

N

  • i=1
  • w(i)

κt η(

x(i)

t−1,κt,

x(i)

t,κt).

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET

slide-31
SLIDE 31

Mixing

Let ϕ(θ) denote a test function, then √ M

  • ϕMH − E[ϕ(θ)]
  • d

− → N(0, σ2

ϕ).

Here, σ2

ϕ depends on the integrated autocorrelation time (IACT)

IACT(θ1:M) = 1 + 2

  • k=1

ρk(θ1:M).

AUTOMATIC CONTROL REGLERTEKNIK LINKÖPINGS UNIVERSITET