Particle Metropolis-Hastings. Johan Dahlin, Department of Information Technology. PowerPoint PPT Presentation.



SLIDE 1

Particle Metropolis-Hastings

Johan Dahlin

work@johandahlin.com

Department of Information Technology, Uppsala University, Sweden.

September 28, 2016

SLIDE 2

This is collaborative work together with

  • Dr. Fredrik Lindsten (Uppsala University, Sweden).
  • Prof. Thomas Schön (Uppsala University, Sweden).
  • Andreas Svensson (Uppsala University, Sweden).

SLIDE 3

What are we going to do?

  • Give a (hopefully) gentle introduction to (P)MCMC.
  • Develop some intuition for PMH and its pros/cons.

Why are we doing this?

  • PMH is a general algorithm for Bayesian inference.
  • Relatively simple to implement and tune.

How will we do this?

  • Employ intuition and analogies with optimisation.
  • Investigate PMH using simulations rather than maths.
  • Illustrate PMH with the water tank benchmark.
  • By asking questions.
SLIDE 4

Mapping a lake

SLIDE 5

State space models

Markov chain [X_{0:T}, Y_{1:T}] with X_t ∈ X = R, Y_t ∈ Y = R, t ∈ N.

[Graphical model: · · · → x_{t−1} → x_t → x_{t+1} → · · ·, with each x_t emitting y_t.]

x_0 ∼ µ_θ(x_0),
x_{t+1} | x_t ∼ f_θ(x_{t+1} | x_t),
y_t | x_t ∼ g_θ(y_t | x_t).

Example: linear Gaussian SSM (θ = [µ, φ, σ_v, σ_e]):

x_{t+1} | x_t ∼ N(x_{t+1}; µ + φ(x_t − µ), σ_v²),
y_t | x_t ∼ N(y_t; x_t, σ_e²).
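The linear Gaussian SSM above can be simulated directly. A minimal Python sketch (not part of the slides; function name and parameter values are illustrative):

```python
import numpy as np

def simulate_lgss(theta, T, seed=0):
    """Simulate the linear Gaussian SSM:
    x_{t+1} | x_t ~ N(mu + phi (x_t - mu), sigma_v^2),
    y_t     | x_t ~ N(x_t, sigma_e^2)."""
    mu, phi, sigma_v, sigma_e = theta
    rng = np.random.default_rng(seed)
    x = np.zeros(T + 1)
    y = np.zeros(T)
    x[0] = mu  # initialise at the stationary mean
    for t in range(T):
        # propagate the latent state, then observe it with noise
        x[t + 1] = mu + phi * (x[t] - mu) + sigma_v * rng.standard_normal()
        y[t] = x[t + 1] + sigma_e * rng.standard_normal()
    return x, y

# illustrative parameter values (mu, phi, sigma_v, sigma_e)
x, y = simulate_lgss(theta=(0.5, 0.9, 0.3, 0.2), T=500)
```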
SLIDE 6

Bayesian parameter inference

[Figure: three panels illustrating posterior densities over θ.]

π(θ) = p(θ | y) ∝ p(y | θ) p(θ),

π[ϕ] = E_π[ϕ(θ)] = ∫ ϕ(θ′) π(θ′) dθ′.
SLIDE 7

Exploring posteriors by Markov chains

SLIDE 8

Markov chains: basic properties

A sequence of random variables {X_k}_{k=0}^K with the property

P[X_k ∈ A | x_{0:k−1}] = P[X_k ∈ A | x_{k−1}] = ∫_A R(x_{k−1}, x_k) dx_k.

We will consider ergodic chains with the properties:

  • Reach any point (irreducible)
  • No cycles (aperiodic)
  • Does not get stuck (recurrent)

SLIDE 9

Markov chains: stationary distribution

θ_k | θ_{k−1} ∼ N(θ_k; µ + φ(θ_{k−1} − µ), σ²).
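The stationarity of this AR(1)-type chain can be checked by simulation: its stationary distribution is N(µ, σ²/(1 − φ²)). A sketch with illustrative values for µ, φ and σ (these numbers are ours, not from the slide):

```python
import numpy as np

# Simulate theta_k | theta_{k-1} ~ N(mu + phi (theta_{k-1} - mu), sigma^2)
# and compare the long-run behaviour to the known stationary distribution
# N(mu, sigma^2 / (1 - phi^2)).
mu, phi, sigma = 1.0, 0.8, 0.5
rng = np.random.default_rng(1)

K = 100_000
theta = np.zeros(K)
for k in range(1, K):
    theta[k] = mu + phi * (theta[k - 1] - mu) + sigma * rng.standard_normal()

burned = theta[1_000:]  # discard burn-in
mean_hat, var_hat = burned.mean(), burned.var()
# theory: mean -> mu = 1.0, variance -> sigma^2/(1 - phi^2) ≈ 0.694
```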

SLIDE 10

Metropolis-Hastings: algorithm

Initialise in θ_0 and then generate samples {θ_k}_{k=1}^K from p(θ | y) by:

(i) Sample a candidate parameter θ′ by θ′ ∼ N(θ′; θ_{k−1}, Σ).

(ii) Accept θ′ by setting θ_k ← θ′ with probability

min{1, p(θ′ | y) / p(θ_{k−1} | y)} = min{1, [p(θ′) / p(θ_{k−1})] · [p(y | θ′) / p(y | θ_{k−1})] · [p(y) / p(y)]},

and otherwise reject θ′ by setting θ_k ← θ_{k−1}.

User choices: K and Σ.
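The two steps above can be sketched as a random-walk Metropolis-Hastings sampler. A minimal Python sketch on an illustrative one-dimensional Gaussian posterior (the target, function names and tuning values are ours, not from the slides); the accept/reject comparison runs on the log scale for numerical stability:

```python
import numpy as np

def metropolis_hastings(log_post, theta0, Sigma, K, seed=0):
    """Random-walk MH: propose theta' ~ N(theta_{k-1}, Sigma) and accept
    with probability min(1, p(theta'|y) / p(theta_{k-1}|y))."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(K + 1)
    theta[0] = theta0
    lp = log_post(theta0)
    accepted = 0
    for k in range(1, K + 1):
        prop = theta[k - 1] + np.sqrt(Sigma) * rng.standard_normal()
        lp_prop = log_post(prop)
        # compare on the log scale to avoid numerical under/overflow
        if np.log(rng.uniform()) < lp_prop - lp:
            theta[k], lp = prop, lp_prop
            accepted += 1
        else:
            theta[k] = theta[k - 1]  # reject: repeat the previous sample
    return theta, accepted / K

# illustrative target: a N(2, 0.5^2) posterior known only up to a constant
log_post = lambda th: -0.5 * ((th - 2.0) / 0.5) ** 2
chain, acc_rate = metropolis_hastings(log_post, theta0=0.0, Sigma=1.0, K=20_000)
```

After discarding a burn-in, the chain's mean and spread match the target.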

SLIDE 11

Metropolis-Hastings: toy example

[Figure: samples in the (q1, q2) plane, the log-posterior at the current state versus iteration, and the marginal posterior density of q2.]

SLIDE 12

Metropolis-Hastings: toy example

[Figure: samples in the (q1, q2) plane, the log-posterior at the current state versus iteration, and the marginal posterior density of q2.]

SLIDE 13

Metropolis-Hastings: toy example

[Figure: samples in the (q1, q2) plane, the log-posterior at the current state versus iteration, and the marginal posterior density of q2.]

SLIDE 14

Metropolis-Hastings: proposal and mixing

[Figure: trace plots of q1 and q2 over 2,500 iterations and their ACFs up to lag 200.]

SLIDE 15

Metropolis-Hastings: proposal and mixing

[Figure: samples in the (q1, q2) plane and the ACFs of q1 and q2 up to lag 200.]

SLIDE 16

Approximating the target by particles

SLIDE 17

Mapping a stormy lake

SLIDE 18

Particle Metropolis-Hastings (PMH)

Initialise in θ_0 and then generate samples {θ_k}_{k=1}^K from p(θ | y) by:

(i) Sample a candidate parameter θ′ by θ′ ∼ N(θ′; θ_{k−1}, Σ).

(ii) Run a particle filter with N particles to estimate p̂_N(θ′ | y).

(iii) Accept θ′ by setting θ_k ← θ′ with probability

min{1, p̂_N(θ′ | y) / p̂_N(θ_{k−1} | y)},

and otherwise reject θ′ by setting θ_k ← θ_{k−1}.

User choices: K, Σ and N.
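The full PMH loop can be sketched by combining a bootstrap particle filter (to estimate the likelihood) with the accept/reject step above. This is a compact sketch, not the slides' implementation: the model, sample sizes, step size and function names are illustrative, the prior on φ is flat on (−1, 1), and all other model parameters are assumed known:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- simulate data from a simple linear Gaussian SSM (mu = 0) ---
phi_true, sigma_v, sigma_e, T = 0.75, 1.0, 0.5, 100
x = np.zeros(T + 1)
y = np.zeros(T)
for t in range(T):
    x[t + 1] = phi_true * x[t] + sigma_v * rng.standard_normal()
    y[t] = x[t + 1] + sigma_e * rng.standard_normal()

def log_lik_pf(phi, y, N=100):
    """Bootstrap particle filter estimate of log p(y_{1:T} | phi)."""
    parts = np.zeros(N)  # all particles started in x_0 = 0
    ll = 0.0
    for t in range(len(y)):
        parts = phi * parts + sigma_v * rng.standard_normal(N)   # propagate
        logw = -0.5 * ((y[t] - parts) / sigma_e) ** 2            # Gaussian weights
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean()) - 0.5 * np.log(2 * np.pi * sigma_e**2)
        parts = parts[rng.choice(N, size=N, p=w / w.sum())]      # resample
    return ll

# --- PMH: random-walk proposal on phi, flat prior on (-1, 1) ---
K, step = 1_000, 0.1
phi = np.zeros(K + 1)
phi[0], ll = 0.5, log_lik_pf(0.5, y)
for k in range(1, K + 1):
    prop = phi[k - 1] + step * rng.standard_normal()
    if abs(prop) < 1.0:                                # prior support
        ll_prop = log_lik_pf(prop, y)
        if np.log(rng.uniform()) < ll_prop - ll:       # ratio of *estimated* likelihoods
            phi[k], ll = prop, ll_prop
            continue
    phi[k] = phi[k - 1]

post_mean = phi[200:].mean()  # posterior mean after burn-in
```

Note that the previously estimated likelihood is stored and reused in the acceptance ratio; this is exactly what makes the chain target the correct posterior despite the noisy estimates.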

SLIDE 19

Water tank example: model

Consider the model

ẋ1(t) = −k1 √(x1(t)) + k4 u(t) + w1(t),
ẋ2(t) = k2 √(x1(t)) − k3 √(x2(t)) + w2(t),
y(t) = x2(t) + e(t),

where w1(t), w2(t) and e(t) are independent Gaussian. The parameters are θ = {k1, k2, k3, k4, . . .} with p(θ) ∝ 1.
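The square-root terms in the slide were partially lost in extraction; assuming the standard cascaded-tanks dynamics with √x outflows, an Euler-discretised simulation might look as follows. All parameter values, noise levels, the input signal and the function name are illustrative guesses, not the benchmark's actual settings:

```python
import numpy as np

def simulate_tanks(k, u, dt=1.0, seed=0, noise=(0.01, 0.01, 0.1)):
    """Euler discretisation of the assumed cascaded water tank model:
    dx1/dt = -k1 sqrt(x1) + k4 u,  dx2/dt = k2 sqrt(x1) - k3 sqrt(x2),
    y = x2 + e."""
    k1, k2, k3, k4 = k
    sv1, sv2, se = noise
    rng = np.random.default_rng(seed)
    T = len(u)
    x1, x2, y = np.zeros(T + 1), np.zeros(T + 1), np.zeros(T)
    x1[0] = x2[0] = 4.0  # illustrative initial levels
    for t in range(T):
        s1 = np.sqrt(max(x1[t], 0.0))  # clip: tank levels cannot go negative
        s2 = np.sqrt(max(x2[t], 0.0))
        x1[t + 1] = x1[t] + dt * (-k1 * s1 + k4 * u[t]) + sv1 * rng.standard_normal()
        x2[t + 1] = x2[t] + dt * (k2 * s1 - k3 * s2) + sv2 * rng.standard_normal()
        y[t] = x2[t + 1] + se * rng.standard_normal()
    return x1, x2, y

# square-wave input roughly resembling the plotted u(t)
u = 4.0 + 2.0 * np.sign(np.sin(2 * np.pi * np.arange(1000) / 200.0))
x1, x2, y = simulate_tanks(k=(0.08, 0.05, 0.05, 0.06), u=u)
```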

SLIDE 20

Water tank example: data

[Figure: input u(t) and observations y(t) over 1,000 time steps.]
SLIDE 21

Water tank example: parameter posteriors

[Figure: marginal posterior densities of k1, k2, k3 and k4.]

SLIDE 22

Water tank example: state predictions

[Figure: input u(t) and observations y(t) with the corresponding state predictions.]
SLIDE 23

Improving the PMH algorithm

SLIDE 24

Water tank example: trace plots

[Figure: trace plots of k1, k2, k3 and k4 over 8,000 iterations.]

SLIDE 25

What are some open questions?

  • Decreasing computational time when T is large.
    Correlating and improving the particle filter.
  • Obtaining better mixing when p = |θ| is large (>5).
    Add gradient and Hessian information into the proposal.
  • Devising better mixing when n_x = |x| is large (>10).
    Improving the particle filter.
  • Decrease the amount of tuning by the user.
    Adaptive algorithms and rules-of-thumb.

SLIDE 26

What did we do?

  • Gave a (hopefully) gentle introduction to (P)MCMC.
  • Developed some intuition for PMH and its pros/cons.

Why did we do this?

  • PMH is a general algorithm for Bayesian inference.
  • Relatively simple to implement and tune.

What are you going to do now?

  • Remember that the PMH algorithm exists.
  • Learn more by reading our tutorial.
  • Try to implement the algorithm yourself.
SLIDE 27

Complete tutorial on PMH is available at arXiv:1511.01707.

SLIDE 28

Thank you for listening

Comments, suggestions and/or questions?

Johan Dahlin

work@johandahlin.com work.johandahlin.com

Remember: the tutorial is available at arXiv:1511.01707

SLIDE 29

Particle filtering [I/II]

An instance of sequential Monte Carlo (SMC) samplers.

  • Estimates E[ϕ(x_t) | y_{1:t}] and p_θ(y_{1:T}).
  • Computational cost of order O(NT) (with N ∼ T).
  • Well-understood statistical properties (unbiasedness, large deviation inequalities, CLTs).

References:

  • A. Doucet and A. Johansen, A tutorial on particle filtering and smoothing. In D. Crisan and B. Rozovsky (editors), The Oxford Handbook of Nonlinear Filtering. Oxford University Press, 2011.
  • O. Cappé, S. J. Godsill and E. Moulines, An overview of existing methods and recent advances in sequential Monte Carlo. In Proceedings of the IEEE 95(5), 2007.

SLIDE 30

Particle filtering [II/II]

By iterating resampling, propagation and weighting:

Resampling: P(a_t^(i) = j) = w_{t−1}^(j), for i, j = 1, . . . , N.
Propagation: x_t^(i) ∼ f_θ(x_t | x_{t−1}^{a_t^(i)}), for i = 1, . . . , N.
Weighting: w_t^(i) = g_θ(y_t | x_t^(i)), for i = 1, . . . , N.

We obtain the particle system {x_{0:T}^(i), w_{0:T}^(i)}_{i=1}^N.
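The resampling/propagation/weighting loop can be written down directly. A hedged sketch (function names and the toy model below are ours, not from the slides): ancestor indices a_t^(i) are drawn according to the previous weights, particles are propagated through f_θ, and new weights come from g_θ:

```python
import numpy as np

def bootstrap_pf(y, propagate, log_g, x0, N=500, seed=0):
    """One pass of the resampling / propagation / weighting loop.
    Returns the particle system {x_t^(i), w_t^(i)} for t = 0, ..., T."""
    rng = np.random.default_rng(seed)
    T = len(y)
    X = np.zeros((T + 1, N))
    W = np.full((T + 1, N), 1.0 / N)
    X[0] = x0
    for t in range(1, T + 1):
        a = rng.choice(N, size=N, p=W[t - 1])   # resampling: P(a^(i) = j) = w_{t-1}^(j)
        X[t] = propagate(X[t - 1, a], rng)      # propagation: x_t^(i) ~ f(x_t | x_{t-1}^{a^(i)})
        logw = log_g(y[t - 1], X[t])            # weighting:   w_t^(i) ∝ g(y_t | x_t^(i))
        w = np.exp(logw - logw.max())           # stabilised exponentiation
        W[t] = w / w.sum()
    return X, W

# toy model (ours): x_t = 0.9 x_{t-1} + v_t, y_t = x_t + e_t, e_t ~ N(0, 0.5^2)
propagate = lambda xp, rng: 0.9 * xp + rng.standard_normal(xp.size)
log_g = lambda yt, xt: -0.5 * ((yt - xt) / 0.5) ** 2

rng = np.random.default_rng(1)
x_true, y = np.zeros(51), np.zeros(50)
for t in range(50):
    x_true[t + 1] = 0.9 * x_true[t] + rng.standard_normal()
    y[t] = x_true[t + 1] + 0.5 * rng.standard_normal()

X, W = bootstrap_pf(y, propagate, log_g, x0=0.0)
xhat = (W * X).sum(axis=1)  # weighted means approximate E[x_t | y_{1:t}]
```

The weighted particle means `xhat` are the state estimates used on the next slide.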

SLIDE 31

Particle filtering: state estimation

[Figure: particle approximations of the filtering density p(x_t | y_{1:t}).]

ϕ̂_t^N ≜ Ê[ϕ(x_t) | y_{1:t}] = Σ_{i=1}^N w_t^(i) ϕ(x_t^(i)),

√N (ϕ_t − ϕ̂_t^N) →_d N(0, σ_t²(ϕ)).
SLIDE 32

Particle filtering: likelihood estimation

[Figure: density of the error in the log-likelihood estimate and its QQ plot against standard Gaussian quantiles.]

ℓ̂(θ) ≜ log p̂_θ(y_{1:T}) = Σ_{t=1}^T log[ Σ_{i=1}^N w_t^(i) ] − T log N,

√N ( ℓ̂(θ) − ℓ(θ) + σ_π² / (2N) ) →_d N(0, σ_π²).
SLIDE 33

Particle Metropolis-Hastings [I/III]

Iterate: propose, compute the acceptance probability, accept or reject.

  • Propose: θ′ ∼ q(θ′ | θ_{k−1}, u_{k−1}) and u′ ∼ PF(θ′).
  • Compute p̂_{θ′}(y_{1:T} | u′) and the acceptance probability:

α(θ′, θ_{k−1}) = 1 ∧ [p(θ′) / p(θ_{k−1})] · [p̂_{θ′}(y_{1:T} | u′) / p̂_{θ_{k−1}}(y_{1:T} | u_{k−1})] · [q(θ_{k−1} | θ′, u′) / q(θ′ | θ_{k−1}, u_{k−1})].

  • Accept or reject? θ′ → θ_k and u′ → u_k w.p. α(θ′, θ_{k−1}).
SLIDE 34

Particle Metropolis-Hastings [II/III]

The target distribution is given by

π(θ) = p_θ(y_{1:T}) p(θ) / p(y_{1:T}).

An unbiased estimator of the likelihood is given by

E_m[ p̂_θ(y_{1:T} | u) ] = ∫ p̂_θ(y_{1:T} | u) m_θ(u) du = p_θ(y_{1:T}).

An extended target is given by

π(θ, u) = p̂_θ(y_{1:T} | u) m_θ(u) p(θ) / p(y_{1:T}) = p̂_θ(y_{1:T} | u) m_θ(u) π(θ) / p_θ(y_{1:T}).

SLIDE 35

Particle Metropolis-Hastings [III/III]

∫ π(θ, u) du = ∫ p̂_θ(y_{1:T} | u) m_θ(u) π(θ) / p_θ(y_{1:T}) du
             = [π(θ) / p_θ(y_{1:T})] ∫ p̂_θ(y_{1:T} | u) m_θ(u) du
             = π(θ),

since ∫ p̂_θ(y_{1:T} | u) m_θ(u) du = p_θ(y_{1:T}). That is, the marginal is the desired target distribution and the Markov chain is kept invariant.