
Outline: Introduction · SMC · A1 - Comparing Utilities · A2 - Mutual Information Utility · A3 - GPU · References

Sequential Monte Carlo Algorithms for Bayesian Sequential Design

Dr Chris Drovandi, Queensland University of Technology, c.drovandi@qut.edu.au. Collaborators: James McGree, Tony Pettitt, Gentry White. Acknowledgements: Australian Research Council Discovery Grant and the organisers of MCMski. January 6, 2014.

Chris Drovandi MCMski 2014, Chamonix, France


Sequential Experimental Design

- Adaptive decisions as new data are collected
- More robust to parameter and model uncertainty
- Natural to use a Bayesian framework: the posterior becomes the new prior
- The next decision is obtained by looking forward to all future decisions (backward induction)
- Simplified by myopic (one-at-a-time) design
- The next design point is d_{t+1} = argmax_d U(d | y_{1:t}, d_{1:t}), where y_{1:t} is the data collected at designs d_{1:t} and U is the utility function


Why SMC for Bayesian sequential design?

- Much more efficient than MCMC: a simple re-weighting step incorporates new information
- Parallel implementation possible (e.g. GPU)
- The increase in efficiency allows comparisons of utility functions
- Convenient estimation of important Bayesian utility functions (e.g. mutual information)
- Decisions in real time?


SMC for One Static Model m

Sample from a sequence of targets (data annealing here):

π_t(θ_m | y_{1:t}, d_{1:t}) = f(y_{1:t} | θ_m, d_{1:t}) π(θ_m) / Z_{m,t},  for t = 1, ..., T,

where y_{1:t} is the (independent) data up to t, d_{1:t} the design points up to t, and θ_m the parameter for model m. f is the likelihood, π the prior, π_t the posterior. The evidence is

Z_{m,t} = f(y_{1:t} | m, d_{1:t}) = ∫_{θ_m} f(y_{1:t} | θ_m, d_{1:t}) π(θ_m) dθ_m.

SMC: generate a weighted sample (particles) for each target in the sequence via three steps:

- Re-weight: update particle weights as data come in (efficient)
- Resample: when the ESS is small
- Mutation: diversify duplicated particles (can be efficient)


SMC for One Static Model m (Algorithm), Chopin (2002)

Have current particles {W_t^i, θ_t^i}_{i=1}^N based on data y_{1:t}.

Re-weight step to include y_{t+1}:

W_{t+1}^i ∝ W_t^i f(y_{t+1} | θ_t^i, d_{t+1}).

Check the effective sample size:

ESS = 1 / Σ_{i=1}^N (W_{t+1}^i)².

- If ESS > E (e.g. E = N/2), go back to the re-weight step for the next observation.
- If ESS < E, do the following:
  - Resample proportional to the weights (duplicates good particles).
  - Mutation: move all particles via an MCMC kernel, say R times (adaptive proposal).
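The re-weight / ESS-check / resample-mutate loop above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; `loglike_next` and `mutate` are hypothetical stand-ins for the model log-likelihood at the new observation and the adaptive MCMC kernel.

```python
import numpy as np

def smc_update(theta, w, loglike_next, ess_threshold, mutate):
    """One SMC data-annealing step (in the spirit of Chopin 2002).

    theta: (N, p) particle array; w: (N,) normalised weights.
    loglike_next: theta -> log f(y_{t+1} | theta, d_{t+1}) per particle.
    mutate: MCMC move applied after resampling (assumed supplied).
    """
    # Re-weight: W_{t+1}^i ∝ W_t^i f(y_{t+1} | theta^i, d_{t+1})
    logw = np.log(w) + loglike_next(theta)
    logw -= logw.max()                 # stabilise before exponentiating
    w = np.exp(logw)
    w /= w.sum()

    # Effective sample size: ESS = 1 / sum_i (W^i)^2
    ess = 1.0 / np.sum(w ** 2)
    if ess < ess_threshold:
        # Resample proportional to weights (duplicates good particles) ...
        idx = np.random.choice(len(w), size=len(w), p=w)
        theta = theta[idx]
        w = np.full(len(w), 1.0 / len(w))
        # ... then diversify the duplicates via an MCMC kernel
        theta = mutate(theta)
    return theta, w
```

Calling `smc_update` once per incoming observation reproduces the loop on the slide; most iterations are cheap re-weights, with the resample-mutate branch triggered only when the ESS drops below E.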


SMC Estimate of Evidence Del Moral et al (2006)

It can be shown that

Z_{t+1}/Z_t = f(y_{t+1} | y_{1:t}, d_{t+1}) = ∫_θ f(y_{t+1} | θ, d_{t+1}) π(θ | y_{1:t}, d_{1:t}) dθ.

Using the SMC particles to approximate the posterior at t gives the estimator

Z_{t+1}/Z_t ≈ Σ_{i=1}^N W_t^i f(y_{t+1} | θ_t^i, d_{t+1}).

Can then obtain an approximation of Z_{t+1} through

Z_{t+1}/Z_0 = (Z_{t+1}/Z_t)(Z_t/Z_{t−1}) ··· (Z_1/Z_0).

This also gives an estimate of the posterior predictive probability of y_{t+1}.
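The ratio Z_{t+1}/Z_t is just a weighted average of the new likelihood over the current particles, and log Z_{t+1} is the telescoping sum of log ratios. A minimal sketch (function names are illustrative):

```python
import numpy as np

def evidence_ratio(w, theta, like_next):
    """Estimate Z_{t+1}/Z_t = sum_i W_t^i f(y_{t+1} | theta_t^i, d_{t+1}).

    w: normalised particle weights at time t; like_next evaluates the
    likelihood of y_{t+1} at each particle (d_{t+1} fixed inside it).
    This is also the estimated posterior predictive probability of y_{t+1}.
    """
    return float(np.sum(w * like_next(theta)))

def log_evidence(log_ratios):
    """log Z_{t+1} via the telescoping product
    Z_{t+1}/Z_0 = (Z_{t+1}/Z_t)(Z_t/Z_{t-1})...(Z_1/Z_0), with Z_0 = 1."""
    return float(np.sum(log_ratios))
```

Accumulating the log ratios rather than the raw products avoids underflow as t grows.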


Advantage 1: Efficiently comparing utilities (Drovandi et al 2013)

Discrete data (binary) example. Need to compute the utility for all possible d (and then for all t = 1, ..., T):

U(d | y_{1:t}, d_{1:t}) = Σ_{z∈{0,1}} f(z | y_{1:t}, d_{1:t}, d) U(d, z | y_{1:t}, d_{1:t}).

The whole process requires the computation (sampling) of very many posterior distributions. SMC (importance sampling) to the rescue: pretend z is the 'next' observation, collected at design d. A simple re-weight incorporates this observation. Use the weighted sample to estimate U(d, z); SMC also provides an estimate of the posterior predictive. There may be different choices for U(d, z) that we want to compare, which requires simulating the design process many times.
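The re-weighting trick for binary outcomes can be sketched as below. All function names are placeholders: `like_z` plays the role of the per-particle likelihood of a candidate outcome, and `utility` would be one of the utility functions compared later.

```python
import numpy as np

def expected_utility_binary(d, w, theta, like_z, utility):
    """U(d | y_{1:t}, d_{1:t}) = sum_{z in {0,1}} f(z | ..., d) U(d, z | ...).

    For each candidate z the particles are simply re-weighted (no new
    posterior sampling): w_z^i ∝ W^i f(z | theta^i, d).
    like_z(z, theta, d): per-particle likelihood of outcome z at design d.
    utility(d, z, theta, w_z): utility evaluated with re-weighted particles.
    """
    total = 0.0
    for z in (0, 1):
        lz = like_z(z, theta, d)
        pred = float(np.sum(w * lz))     # posterior predictive f(z | y, d)
        wz = w * lz / pred               # re-weighted particles given (d, z)
        total += pred * utility(d, z, theta, wz)
    return total
```

Maximising `expected_utility_binary` over the candidate designs d gives the next design point without ever re-running a posterior sampler.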


The Design Problem

Estimating the maximum tolerated dose / minimum effective dose (clinical trials):

E[Y_t] = g^{−1}(η_t),  where η_t = θ_0 + θ_1 (d_t^λ − 1)/λ,

where d_t is the dose assigned to the tth subject and Y_t ~ Binary(E[Y_t]). Uninformative prior:

π(θ_0, θ_1, λ) ∝ N(θ_0; 0, 100) N(θ_1; 0, 100) U(λ; 0, 1) 1(λ(η* − θ_0)/θ_1 + 1 > 0).

The objective is precise estimation of

d* = (λ^T(η* − θ_0^T)/θ_1^T + 1)^{1/λ^T}.

Let θ = (θ_0, θ_1, λ).
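Inverting η* = θ_0 + θ_1 (d^λ − 1)/λ for d gives the target-dose formula; a small sketch follows. The η* value in the check below is illustrative only, chosen to be consistent with the d* = 0.538 quoted later for θ^T = (0, 3, 1).

```python
def target_dose(theta0, theta1, lam, eta_star):
    """d* = (lam * (eta_star - theta0) / theta1 + 1) ** (1 / lam),
    obtained by solving eta* = theta0 + theta1 * (d**lam - 1) / lam for d.
    The prior constraint on the slide ensures the base is positive."""
    base = lam * (eta_star - theta0) / theta1 + 1.0
    if base <= 0:
        raise ValueError("d* is undefined for these parameters")
    return base ** (1.0 / lam)
```

For example, `target_dose(0, 3, 1, -1.386)` returns 0.538, matching the true d* used in the results below.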


The Utility Functions

Possible choices for U(d, z | y_{1:t}, d_{1:t}):

1. Posterior precision of d* (a natural choice, but the posterior of d* can be unstable when little information is available)
2. Kullback-Leibler divergence between the prior and posterior for θ
3. Determinant of the posterior covariance matrix of θ
4. Hybrid utility: utility 3 for the first 10 subjects, utility 1 thereafter


Some Results

Figure: Distributions of the estimated target stimulus d* over 10 to 100 subjects for the true parameter configuration θ^T = (0, 3, 1), producing d* = 0.538, under the four utilities (prec, kld, D-post, hybrid). The solid horizontal line is the true d*. Shown are the 2.5%, 50% and 97.5% quantiles over the 500 runs for each utility function.


Advantage 2: Estimating Difficult Utilities

E.g. mutual information for model discrimination (Drovandi et al 2014). Have a set of K proposed models, M = 1, ..., K. Select the design to maximise the ability to discriminate between the models. Consider the mutual information between the model indicator M and the predicted observation Z for y_{t+1} (Box and Hill 1967):

I(M; Z | y_{1:t}, d) = H(M | y_{1:t}) − H(M | Z; y_{1:t}, d).

Therefore U(d | y_{1:t}) = −H(M | Z; y_{1:t}, d), which is equal to

U(d | y_{1:t}) = Σ_{m=1}^K π(m | y_{1:t}) ∫ f(z | m, y_{1:t}, d) log π(m | y_{1:t}, z, d) dμ(z).


SMC for multiple models

Effectively run an SMC algorithm for each model m = 1, ..., K. Have a set of N particles for each model, {W_{m,t}^i, θ_{m,t}^i}_{i=1}^N. Monitor the ESS for each model m, with resampling and within-model updates when required. Design part: use the data up to t, y_{1:t}, and the particles of all models to compute the next design d_{t+1}.


Estimating the Utility

U(d | y_{1:t}) = Σ_{m=1}^K π(m | y_{1:t}) ∫ f(z | m, y_{1:t}, d) log π(m | y_{1:t}, z, d) dμ(z).

Borth (1975) notes this is a difficult computation. SMC to the rescue: consider a potential observation z at a potential design point d, and pretend this is the observation for y_{t+1}.


Estimating the Utility (cont...)

Estimate the predictive probability using the weights

w_{m,t}^i(d, z) = W_{m,t}^i f(z | θ_{m,t}^i, d),

f̂(z | m, y_{1:t}, d_{1:t}, d) = Σ_{i=1}^N w_{m,t}^i(d, z).

Z_{m,t} denotes the current evidence for model m, which integrates out the posterior of θ at t. Estimate the evidence including (d, z), Z_{m,t}(d, z), using

log Ẑ_{m,t}(d, z) = log Ẑ_{m,t} + log f̂(z | m, y_{1:t}, d_{1:t}, d).

Convert to π̂(m | y_{1:t}, d, z).
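The final "convert" step is a normalisation of the updated evidences. A minimal sketch, assuming a uniform prior over models (not stated on the slide), with the log-sum-exp shift for numerical stability:

```python
import numpy as np

def model_posteriors(log_Z):
    """pi_hat(m | y_{1:t}, d, z) from per-model log evidences
    log Z_hat_{m,t}(d, z), assuming a uniform model prior.
    Shifting by the max before exponentiating avoids overflow."""
    log_Z = np.asarray(log_Z, dtype=float)
    p = np.exp(log_Z - log_Z.max())
    return p / p.sum()
```

A non-uniform model prior would simply add log π(m) to each entry before normalising.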


Estimating the Utility (cont...)

Therefore the estimate of the utility for discrete z is

Û(d | y_{1:t}) = Σ_{m=1}^K π̂(m | y_{1:t}) Σ_{z∈S} f̂(z | m, y_{1:t}, d) log π̂(m | y_{1:t}, z, d).

In the continuous case, approximate the integral via Monte Carlo integration. Draw z_{m,t}^i ~ f(z | m, θ_{m,t}^i, d) for i = 1, ..., N, giving a weighted sample {W_{m,t}^i, θ_{m,t}^i, z_{m,t}^i} from the joint p(z, θ | d, m, y_{1:t}, d_{1:t}). Therefore the estimate of the utility for continuous z is

Û(d | y_{1:t}, d_{1:t}) = Σ_{m=1}^K π̂(m | y_{1:t}, d_{1:t}) Σ_{i=1}^N W_{m,t}^i log π̂(m | y_{1:t}, d_{1:t}, z_{m,t}^i, d).

This could also be used for large counts.
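For discrete z the pieces fit together as sketched below: per-particle re-weighting gives the predictive f̂(z | m, y, d), the evidence update gives π̂(m | y, z, d), and both feed the double sum. This is an illustrative sketch, with `like_z` a hypothetical per-particle likelihood function and a uniform model prior assumed when normalising evidences.

```python
import numpy as np

def mi_utility_discrete(d, particles, like_z, log_Z, support=(0, 1)):
    """Estimate U_hat(d | y_{1:t}) for discrete z.

    particles: list over models of (w_m, theta_m) weighted samples.
    like_z(m, z, theta, d): per-particle likelihood of z under model m.
    log_Z: current log evidences, one per model.
    """
    log_Z = np.asarray(log_Z, dtype=float)
    pi_m = np.exp(log_Z - log_Z.max())
    pi_m /= pi_m.sum()                       # pi_hat(m | y_{1:t})

    U = 0.0
    for z in support:
        # predictive f_hat(z | m, y, d) for each model via re-weighting
        pred = np.array([np.sum(w * like_z(m, z, th, d))
                         for m, (w, th) in enumerate(particles)])
        # updated log evidences and model posteriors given (d, z)
        log_Z_new = log_Z + np.log(pred)
        post = np.exp(log_Z_new - log_Z_new.max())
        post /= post.sum()                   # pi_hat(m | y_{1:t}, z, d)
        U += np.sum(pi_m * pred * np.log(post))
    return U
```

As a sanity check, if all K models predict identically the posterior never moves and the utility collapses to log(1/K) for every design, so no design is preferred.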


Chemical Engineering Example (Masoumi et al 2013)

Consider the chemical reaction A → B. Four competing models for the fraction of A remaining after time t minutes at temperature T:

Model 1: μ_1 = exp(−θ_1 t exp(−θ_2/T))
Model 2: μ_2 = (1 + θ_1 t exp(−θ_2/T))^{−1}
Model 3: μ_3 = (1 + 2θ_1 t exp(−θ_2/T))^{−1/2}
Model 4: μ_4 = (1 + 3θ_1 t exp(−θ_2/T))^{−1/3}

with y | m ~ N(μ_m, σ²). Two design variables, d = (t, T): t ∈ {0, 25, 50, 75, 100, 125, 150} and T ∈ {450, 475, 525, 575, 600}, yielding 35 possible choices.
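The four mean functions differ only in the reaction order, which makes them easy to tabulate together with the 35-point design grid; a small sketch:

```python
import numpy as np
from itertools import product

def reaction_mean(model, t, T, theta1, theta2):
    """Mean fraction of A remaining under the four competing kinetics
    models (first- to fourth-order reactions)."""
    rate = theta1 * t * np.exp(-theta2 / T)
    if model == 1:                       # first order
        return np.exp(-rate)
    if model == 2:                       # second order
        return (1.0 + rate) ** -1.0
    if model == 3:                       # third order
        return (1.0 + 2.0 * rate) ** -0.5
    if model == 4:                       # fourth order
        return (1.0 + 3.0 * rate) ** (-1.0 / 3.0)
    raise ValueError("model must be 1-4")

# The 7 x 5 grid of (t, T) values gives the 35 candidate designs
designs = list(product([0, 25, 50, 75, 100, 125, 150],
                       [450, 475, 525, 575, 600]))
```

Note all four models coincide at t = 0 (μ = 1, nothing has reacted), so that design point carries no discriminatory information.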


Chemical Example Continued

15 independent observations. Four cases: each model is allowed to be true, with parameters (θ_1, θ_2, σ) = (400, 5000, 0.1). Prior distribution:

(θ_1, θ_2)ᵀ ~ N((400, 5000)ᵀ, diag(70, 500)) 1(θ_1 > 0) 1(θ_2 > 0),
σ ~ IG(10, 1).

Comparing 3 different design strategies: random design, mutual information and total separation (Masoumi et al 2013).


Chemical Reaction Results (model 1 true)

Figure: First order reaction (model 1) as true. Panels show the mutual information, total separation and random designs over the 15 observations.


Chemical Reaction Results (model 2 true)

Figure: Second order reaction (model 2) as true. Panels show the random, mutual information and total separation designs over the 15 observations.


Chemical Reaction Results (model 3 true)

Figure: Third order reaction (model 3) as true. Panels show the mutual information, total separation and random designs over the 15 observations.


Chemical Reaction Results (model 4 true)

Figure: Fourth order reaction (model 4) as true. Panels show the mutual information, total separation and random designs over the 15 observations.


Advantage 3: Embarrassingly Parallel

Extend to design in the presence of a random effects model. Pharmacokinetics example: for subject t, the model for samples collected at design d_t = (d_{1t}, d_{2t}) is

y_t ~ MVN(g(β_t, d_t), δ_0 I),
β_t ~ MVN(μ, Ω),

where β_t is the random effect for the tth subject and

g(β_t, d_t) = (100 / exp(β_{2t})) exp(−(exp(β_{1t}) / exp(β_{2t})) d_t).

Priors: μ ~ MVN(0, Σ) with Σ known; Ω ~ InvWish(Ψ, ν) with Ψ and ν known; δ_0 ~ U(a, b), 0 < a ≤ b < ∞. The design objective is to learn about the parameters θ = (μ, Ω, δ_0).

Chris Drovandi MCMski 2014, Chamonix, France

slide-23
SLIDE 23

Introduction SMC A1 - Comparing Utilities A2 - Mutual Information Utility A3 - GPU References

Exact-Approximate SMC

The (marginal) likelihood

f(y_t | θ, d_t) = ∫ f(y_t | β_t, δ_0, d_t) π(β_t | μ, Ω) dβ_t

can be estimated unbiasedly. For each particle θ^{(i)},

f̂(y_t | θ^{(i)}, d_t) = (1/M) Σ_{j=1}^M f(y_t | β_t^{(j)}, δ_0^{(i)}, d_t),   (1)

where β_t^{(j)} ~ MVN(μ^{(i)}, Ω^{(i)}), j = 1, ..., M. SMC with an unbiased estimate of the likelihood → an exact-approximate algorithm! (Duan and Fulop 2013). Caveat: SMC can degenerate more quickly than with the 'exact' likelihood (M needs to be large enough).
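Equation (1) can be sketched directly. Note the average is of likelihoods, not log-likelihoods, or unbiasedness is lost; `cond_like` is a placeholder for the conditional density f(y_t | β_t, δ_0, d_t).

```python
import numpy as np

def marginal_like_hat(y, d, mu_i, Omega_i, delta0_i, cond_like, M, rng):
    """Unbiased estimate of f(y_t | theta^(i), d_t), eq. (1): the plain
    average of the conditional likelihood over M random-effect draws
    beta^(j) ~ MVN(mu^(i), Omega^(i)), sampled via a Cholesky factor."""
    mu_i = np.atleast_1d(np.asarray(mu_i, dtype=float))
    L = np.linalg.cholesky(np.atleast_2d(Omega_i))
    vals = [cond_like(y, mu_i + L @ rng.standard_normal(mu_i.shape[0]),
                      delta0_i, d)
            for _ in range(M)]
    return float(np.mean(vals))
```

Since the estimator is unbiased for each particle, plugging it into the SMC re-weighting step yields the exact-approximate algorithm described above; larger M reduces the variance and slows weight degeneracy.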


Speeding things up on the GPU

- Serial implementation of the SMC design algorithm is too slow here
- Many ways to parallelise the algorithm (over particles, over M, and/or over design choices d)
- Here we chose to calculate the likelihood over particles in parallel on the GPU (required in all aspects of the algorithm)
- Order of magnitude speed improvement over a C implementation of the likelihood
- Work still in progress...


Key References

Box, G. E. P. and Hill, W. J. (1967). Discrimination among mechanistic models. Technometrics, 9(1):57-71.
Chopin, N. (2002). A sequential particle filter method for static models. Biometrika, 89(3):539-551.
Del Moral, P. et al. (2006). Sequential Monte Carlo samplers. JRSS Series B, 68(3):411-436.
Drovandi, C. C. et al. (2013). Sequential Monte Carlo for Bayesian sequentially designed experiments for discrete data. CSDA, 57(1):320-335.
Drovandi, C. C. et al. (2014). A sequential Monte Carlo algorithm to incorporate model uncertainty in Bayesian sequential design. JCGS, to appear.
