Approximate Bayesian Computation using Auxiliary Models

SLIDE 1

Approximate Bayesian Computation using Auxiliary Models

Tony Pettitt Co-authors Chris Drovandi, Malcolm Faddy Queensland University of Technology Brisbane MCQMC February 2012

Tony Pettitt () ABC using Auxiliary Models MCQMC February 2012 1 / 32

SLIDE 2

Outline

1. Motivating Problem
   Application to modelling Macroparasite Immunity
2. Approximate Bayesian Computation
   Introduction to ABC
   Three ABC Algorithms
   Sequential Monte Carlo ABC
3. Macroparasite Population Evolution
   Summary Statistics
   Auxiliary models
4. Results
   Posterior results
   ABC fits to Data
5. Conclusions
   Macroparasite model
   ABC

SLIDE 4

Motivating Problem Application to modelling Macroparasite Immunity

Macroparasite Immunity

Estimate parameters of a Markov process model explaining macroparasite population development with host immunity.
212 hosts (cats), i = 1, ..., 212.
Each cat is injected with l_i juvenile Brugia pahangi larvae (approximately 100 or 200).
At time t_i the host is sacrificed and the number of mature worms is recorded.
The host is assumed to develop an immunity.
Three discrete variables: M(t) matures, L(t) juveniles, I(t) immunity.
Only L(0) and M at sacrifice time are observed for each host.

SLIDE 5

Motivating Problem Application to modelling Macroparasite Immunity

Macroparasite Immunity data, proportion of mature vs sacrifice time

[Figure: proportion of matures (vertical axis, 0.1 to 1) against sacrifice time (horizontal axis, 200 to 1200).]

SLIDE 6

Motivating Problem Application to modelling Macroparasite Immunity

Trivariate Markov Process of Riley et al (2003)

[Diagram: three compartments, M(t) mature parasites, L(t) juvenile parasites and I(t) immunity, linked by the transitions below. Three further nodes in the diagram are labelled "Invisible".]

Transition                            Rate
Maturation                            γL(t)
Gain of immunity                      νL(t)
Loss of immunity                      µ_I I(t)
Death of juveniles due to immunity    βI(t)L(t)
Natural death of juveniles            µ_L L(t)
Natural death of matures              µ_M M(t)

SLIDE 7

Motivating Problem Application to modelling Macroparasite Immunity

The Model and Intractable Likelihood

L, M, I are discrete counts; I is a hypothesised variable.
Deterministic form of the model:
\[ \frac{dL}{dt} = -\mu_L L - \beta I L - \gamma L, \qquad \frac{dM}{dt} = \gamma L - \mu_M M, \qquad \frac{dI}{dt} = \nu L - \mu_I I. \]
µ_M and γ are fixed; ν, µ_L, µ_I and β require estimation.
The likelihood based on the Markov process is intractable.
The process (L, M, I) is simulated using Gillespie's algorithm (Gillespie, 1977).
A common mathematical model: epidemics, chemical kinetics, systems biology, ...
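The simulation step can be sketched as follows. This is a minimal Gillespie simulation of (L, M, I), assuming one event type per term of the deterministic equations and assuming M(0) = I(0) = 0 (the slides only state that L(0) is observed). The function name `simulate_host` and the values of ν, µ_L, µ_I and β in the call are made up for illustration; γ = 0.04 and µ_M = 0.0015 are the fixed values quoted in the results.

```python
import random

def simulate_host(l0, t_end, nu, mu_L, mu_I, beta,
                  gamma=0.04, mu_M=0.0015, rng=None):
    """Gillespie simulation of (L, M, I) for one host up to time t_end."""
    rng = rng or random.Random()
    L, M, I = l0, 0, 0          # assumed initial state: no matures, no immunity
    t = 0.0
    while True:
        # One rate per event type, matching the terms of the deterministic model.
        rates = [
            gamma * L,      # maturation: L -> L - 1, M -> M + 1
            mu_L * L,       # natural death of a juvenile
            beta * I * L,   # death of a juvenile due to immunity
            nu * L,         # gain of immunity
            mu_I * I,       # loss of immunity
            mu_M * M,       # natural death of a mature
        ]
        total = sum(rates)
        if total == 0.0:
            break                       # no event can occur any more
        t += rng.expovariate(total)     # exponential waiting time to the next event
        if t > t_end:
            break                       # host is sacrificed before the next event
        event = rng.choices(range(6), weights=rates)[0]
        if event == 0:
            L, M = L - 1, M + 1
        elif event in (1, 2):
            L -= 1
        elif event == 3:
            I += 1
        elif event == 4:
            I -= 1
        else:
            M -= 1
    return L, M, I

# Illustrative call: 100 larvae, sacrifice at time 400, made-up parameter values.
L_end, M_end, I_end = simulate_host(100, 400.0, nu=0.001, mu_L=0.01,
                                    mu_I=0.5, beta=1.0, rng=random.Random(1))
print(M_end / 100)   # simulated proportion of matures
```

Only the mature count at the sacrifice time would be compared with data, since L and I are not observed at that time.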

SLIDE 9

Approximate Bayesian Computation Introduction to ABC

ABC and the Approximate Posterior I

Bayesian statistics involves inference based on the posterior distribution
\[ p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta). \]
What if the likelihood cannot be evaluated, but it is easy to simulate data x from it?
Applications: genetics, biology, finance, ...
ABC involves a joint posterior distribution for θ and the simulated data x,
\[ p(\theta, x \mid y) \propto g(y \mid x)\, p(x \mid \theta)\, p(\theta), \]
where g(y|x) measures the closeness of the observed data y to the simulated data x (Reeves and Pettitt, 2005).

SLIDE 10

Approximate Bayesian Computation Introduction to ABC

ABC and the Approximate Posterior II

Compare simulated values x and observed data y through summary statistics S(·) = (S_1(·), ..., S_p(·)):
\[ \rho(y, x) = \lVert S(y) - S(x) \rVert. \]
One choice: set g(y|x) = 1(ρ(y, x) ≤ ε).
The choice of ε is a trade-off between accuracy and computational time.

SLIDE 11

Approximate Bayesian Computation Three ABC Algorithms

Rejection Sampling ABC - RS-ABC

Rejection Sampling (RS-ABC) (Beaumont et al, 2002)

Sample θ* ∼ p(θ)
Simulate x ∼ p(·|θ*)
Accept θ* if ρ(y, x) ≤ ε

Repeat the above until we have N values, θ_1, ..., θ_N.
Advantages: simplicity, independent values, parallelizable.
Disadvantage: inefficient.
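The three steps above can be sketched in a few lines. The toy problem here (unknown mean of a unit-variance normal sample, uniform prior, sample mean as the single summary statistic) is a hypothetical stand-in chosen so the block runs quickly; it is not the macroparasite model, and the function name `rejection_abc` is illustrative.

```python
import random
import statistics

def rejection_abc(s_obs, sample_prior, simulate, summary, eps, n_keep, rng):
    """Keep drawing from the prior until n_keep parameter values are accepted."""
    kept = []
    while len(kept) < n_keep:
        theta = sample_prior(rng)               # theta* ~ p(theta)
        x = simulate(theta, rng)                # x ~ p(. | theta*)
        if abs(summary(x) - s_obs) <= eps:      # accept if rho(y, x) <= eps
            kept.append(theta)
    return kept

# Hypothetical toy problem: 50 observations from a normal with unknown mean.
rng = random.Random(0)
y = [rng.gauss(2.0, 1.0) for _ in range(50)]
posterior_sample = rejection_abc(
    s_obs=statistics.fmean(y),
    sample_prior=lambda r: r.uniform(-5.0, 5.0),
    simulate=lambda th, r: [r.gauss(th, 1.0) for _ in range(50)],
    summary=statistics.fmean,
    eps=0.1,
    n_keep=200,
    rng=rng,
)
print(statistics.fmean(posterior_sample))
```

With this prior and tolerance only a few percent of the draws are accepted, which illustrates the inefficiency noted above: shrinking ε improves accuracy but lowers the acceptance rate further.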

SLIDE 12

Approximate Bayesian Computation Three ABC Algorithms

MCMC ABC

Marjoram et al (2003)
Advantages: theoretical understanding, use in SMC.
Disadvantages: dependent samples, Markov chain convergence, sampler can get stuck, inefficient, needs tuning.

SLIDE 13

Approximate Bayesian Computation Sequential Monte Carlo ABC

Sequential Monte Carlo for ABC

Chopin (2002), Del Moral et al (2006), Sisson et al (2007, 2009), Beaumont et al (2009)
Approximate the posterior by a weighted sample of N particles, {θ^i, W^i}, i = 1, ..., N.
Define a sequence of joint targets
\[ p_t(\theta, x \mid y) \propto p(x \mid \theta)\, p(\theta)\, 1(\rho(x, y) \le \epsilon_t), \qquad t = 1, \dots, T, \]
and a sequence of decreasing tolerances ε_1 ≥ ε_2 ≥ ... ≥ ε_T.
At t = 1 draw the particles from the prior, setting ε_1 = ∞.

SLIDE 14

Approximate Bayesian Computation Sequential Monte Carlo ABC

A Fully Adaptive SMC ABC Algorithm I

Drovandi and Pettitt (2011), Del Moral et al (2011)
Reweight the particles; each weight is either zero or proportional to 1:
\[ W_t^i \propto \frac{1(\rho(x_t^i, y) \le \epsilon_t)}{1(\rho(x_t^i, y) \le \epsilon_{t-1})}. \]
Choose ε_t so that M particles have zero weight and N − M have non-zero weight.
Replenish the population by resampling M particles from the N − M 'alive' particles.
Diversify the particles with an MCMC kernel, q_t(·|·), that is stationary for the current target; q_t(·|·) is determined adaptively using the 'alive' set.
Apply the MCMC kernel R_t times so that the overall acceptance is close to 1. Learn R_t adaptively.
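A rough sketch of this drop, resample and move cycle for a scalar parameter is given below. Several details are assumptions made for illustration rather than taken from the slides: the Gaussian random-walk proposal scaled by the spread of the alive set, the rule R_t = ⌈log c / log(1 − acceptance rate)⌉ with c = 0.01, the stopping threshold on the acceptance rate, and the toy normal-mean simulator in the usage lines.

```python
import math
import random
import statistics

def smc_abc(s_obs, sample_prior, prior_pdf, simulate_stat, n=200,
            drop_frac=0.5, c=0.01, stop_accept=0.03, max_iter=50, rng=None):
    """Adaptive SMC ABC sketch for a scalar parameter and a scalar summary."""
    rng = rng or random.Random()
    # t = 1: particles from the prior (eps_1 = infinity), stored as (theta, distance).
    particles = []
    for _ in range(n):
        theta = sample_prior(rng)
        particles.append((theta, abs(simulate_stat(theta, rng) - s_obs)))
    n_drop = int(drop_frac * n)      # M in the text
    repeats = 1                      # R_t, learned from the previous acceptance rate
    eps = math.inf
    for _ in range(max_iter):
        particles.sort(key=lambda p: p[1])
        alive = particles[: n - n_drop]
        eps = alive[-1][1]           # new tolerance: largest distance in the alive set
        # Proposal scale tuned on the alive set (assumed form of q_t).
        scale = 2.0 * statistics.stdev([p[0] for p in alive])
        moved, accepted, tried = [], 0, 0
        for _ in range(n_drop):
            theta, dist = rng.choice(alive)      # resample an alive particle
            for _ in range(repeats):             # MCMC moves targeting p_t
                prop = rng.gauss(theta, scale)
                tried += 1
                # Symmetric proposal: the MH ratio reduces to the prior ratio.
                if rng.random() * prior_pdf(theta) >= prior_pdf(prop):
                    continue
                d_prop = abs(simulate_stat(prop, rng) - s_obs)
                if d_prop <= eps:                # indicator part of the target
                    theta, dist = prop, d_prop
                    accepted += 1
            moved.append((theta, dist))
        particles = alive + moved
        rate = accepted / tried
        if rate < stop_accept:
            break
        # Repeats needed so that a particle moves at least once with probability 1 - c.
        repeats = 1 if rate >= 1.0 else math.ceil(math.log(c) / math.log(1.0 - rate))
    return [p[0] for p in particles], eps

# Hypothetical toy problem: summary is the mean of 20 unit-variance normal draws.
rng = random.Random(3)
s_obs = 1.5   # made-up observed summary
post, eps_final = smc_abc(
    s_obs,
    sample_prior=lambda r: r.uniform(-5.0, 5.0),
    prior_pdf=lambda th: 0.1 if -5.0 <= th <= 5.0 else 0.0,
    simulate_stat=lambda th, r: statistics.fmean([r.gauss(th, 1.0) for _ in range(20)]),
    rng=rng,
)
print(eps_final, statistics.fmean(post))
```

Because the surviving weights are all equal, the sketch stores no weights at all: sorting by distance and keeping the closest N − M particles is the reweighting step.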

SLIDE 16

Macroparasite Population Evolution

Macroparasite Population Evolution

[Figure: proportion of matures (vertical axis, 0.1 to 1) against time (horizontal axis, 200 to 1200).]

SLIDE 17

Macroparasite Population Evolution Summary Statistics

Developing Summary Statistics

Nunes and Balding (2009)

For all sets and subsets of summary statistics, carry out AS-ABC for a fixed acceptance rate.
Compare the ABC approximations in terms of the concentration of the posterior distribution.
Use a non-parametric measure of entropy.

Fearnhead and Prangle (2012)

suitable for iid cases

SLIDE 18

Macroparasite Population Evolution Summary Statistics

Developing Summary Statistics using Indirect Inference

Summary statistics that efficiently summarize data!
Different numbers of juveniles L, different sacrifice times.
An approach based on indirect inference (Gouriéroux and Ronchetti, 1993):

Propose an auxiliary model p_a(y|θ_a) whose parameter θ_a is easily estimated (e.g. easy MLE).
The auxiliary model is flexible enough to provide a good description of the data.
Simulate x_θ from the target intractable likelihood p(·|θ) and find θ̂_a(x_θ).
Estimate θ using the θ̂_a(x_θ) closest to θ̂_a(y).
Alternative to ABC.

Estimates of the parameters of the auxiliary model fitted to the data become the summary statistics in ABC.
Models based on the Beta-Binomial to capture variability.
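The indirect inference recipe can be illustrated with a deliberately simple sketch. Everything specific here is hypothetical: the auxiliary model is a plain Binomial with one pooled proportion (so its MLE is closed form), the "intractable" model is replaced by a toy simulator with survival probability exp(−θt), and θ is chosen from a small grid.

```python
import math
import random

def aux_mle(counts, sizes):
    """MLE of the pooled success probability under a Binomial auxiliary model."""
    return sum(counts) / sum(sizes)

def toy_simulator(theta, sizes, times, rng):
    """Hypothetical stand-in for the intractable model: each of the l
    individuals is counted at time t with probability exp(-theta * t)."""
    out = []
    for l, t in zip(sizes, times):
        p = math.exp(-theta * t)
        out.append(sum(rng.random() < p for _ in range(l)))
    return out

def indirect_inference(y, sizes, times, grid, rng):
    """Return the grid value of theta whose simulated auxiliary estimate
    is closest to the auxiliary estimate from the observed data."""
    target = aux_mle(y, sizes)
    def gap(theta):
        x = toy_simulator(theta, sizes, times, rng)
        return abs(aux_mle(x, sizes) - target)
    return min(grid, key=gap)

# Made-up design: 30 hosts, 100 larvae each, sacrifice times spread over 100..1200.
rng = random.Random(5)
sizes = [100] * 30
times = [100 + 1100 * i / 29 for i in range(30)]
y = toy_simulator(0.002, sizes, times, rng)      # pretend these are the data
grid = [0.0005 * k for k in range(1, 9)]
theta_hat = indirect_inference(y, sizes, times, grid, rng)
print(theta_hat)
```

In the ABC use of the same idea, `aux_mle` plays the role of S(·): the auxiliary estimates from simulated and observed data are compared through a distance rather than minimised over θ.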

SLIDE 19

Macroparasite Population Evolution Summary Statistics

ABC Summary Statistics from auxiliary models

How to choose between different auxiliary models?
Either use data analytical tools for the original data set, e.g. AIC,
or use the Nunes and Balding approach based on ABC approximations for each model / summary statistic choice.
The former is far less computer intensive but does not consider the ABC approximation.
Compare and contrast different Beta-Binomial models.
Models are fitted using MLE, and AIC is used to rank the models.
The models range over about 33 units of AIC.
How are the different fits (AIC) reflected in the ABC posteriors?

SLIDE 20

Macroparasite Population Evolution Auxiliary models

Auxiliary Beta-Binomial model

The data show too much variation for a Binomial model.
A Beta-Binomial model has an extra parameter to capture dispersion:
\[ p(m_i \mid \alpha_i, \beta_i) = \binom{l_i}{m_i} \frac{B(m_i + \alpha_i,\, l_i - m_i + \beta_i)}{B(\alpha_i, \beta_i)}. \]
Use the reparameterisation p_i = α_i/(α_i + β_i) and θ_i = 1/(α_i + β_i).
Relate the proportion p_i to time t_i, and the overdispersion parameter θ_i to the initial larvae count l_i.
Compare various models: the best (AIC = 1897) with others having AIC = 1911 and AIC = 1930.
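The Beta-Binomial probability and the AIC comparison can be sketched as follows. This is an illustration only: the counts are made up, p and θ are held constant across hosts instead of being related to t_i and l_i, and the Beta-Binomial fit is a crude grid search rather than a proper MLE, so the AIC values are approximate.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_choose(l, m):
    return math.lgamma(l + 1) - math.lgamma(m + 1) - math.lgamma(l - m + 1)

def betabinom_logpmf(m, l, p, theta):
    """log p(m | alpha, beta) with alpha = p / theta and beta = (1 - p) / theta."""
    alpha, beta = p / theta, (1.0 - p) / theta
    return log_choose(l, m) + log_beta(m + alpha, l - m + beta) - log_beta(alpha, beta)

def binom_logpmf(m, l, p):
    return log_choose(l, m) + m * math.log(p) + (l - m) * math.log(1.0 - p)

def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik

# Made-up, clearly overdispersed counts out of l = 100 larvae per host.
l = 100
counts = [5, 40, 80, 20, 60, 95, 10, 50]

# Binomial: closed-form MLE of the common proportion.
p_hat = sum(counts) / (l * len(counts))
aic_binom = aic(sum(binom_logpmf(m, l, p_hat) for m in counts), 1)

# Beta-Binomial: crude grid search over (p, theta) as a stand-in for the MLE.
best = max(
    sum(betabinom_logpmf(m, l, p, th) for m in counts)
    for p in [0.05 * k for k in range(1, 20)]
    for th in [0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
)
aic_betabinom = aic(best, 2)
print(aic_binom, aic_betabinom)
```

A lower AIC for the Beta-Binomial on such data reflects the point above: the extra dispersion parameter more than pays for itself when the counts vary far more than a Binomial allows.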

SLIDE 22

Macroparasite Population Evolution Auxiliary models

Summary statistics

For each model, compare the auxiliary estimates for simulated data x, θ_a^x, and observations y, θ̂_a^y, with the Mahalanobis distance
\[ \rho(y, x) = \rho(\hat{\theta}_a^y, \theta_a^x) = (\hat{\theta}_a^y - \theta_a^x)^T S^{-1} (\hat{\theta}_a^y - \theta_a^x), \]
where S is the covariance matrix for the MLE.
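The distance above is the quadratic form d^T S^{-1} d with d the difference of the two estimate vectors. A small self-contained sketch, assuming S is symmetric positive definite and solving S z = d by Gaussian elimination instead of forming the inverse (the function names and the numbers in the usage lines are illustrative):

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        s = m[i][n] - sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = s / m[i][i]
    return z

def mahalanobis(est_y, est_x, cov):
    """Quadratic form (est_y - est_x)^T cov^{-1} (est_y - est_x)."""
    diff = [a - b for a, b in zip(est_y, est_x)]
    z = solve(cov, diff)
    return sum(d * zi for d, zi in zip(diff, z))

# Made-up auxiliary estimates and covariance matrix.
rho = mahalanobis([1.0, 0.5], [0.0, 0.5], [[2.0, 1.0], [1.0, 2.0]])
print(rho)
```

Weighting by S^{-1} puts auxiliary parameters that are estimated on different scales, or that are strongly correlated, on a common footing before comparing with ε.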

SLIDE 24

Results Posterior results

SMC Algorithm Settings and Posterior Results

Algorithm settings:
Take N = 1000 particles and discard half at each iteration; finish when the MCMC kernel acceptance rate reaches 3%.
Repeat the MCMC kernel to get about 99% acceptance at each iteration.
Fixed values: γ = 0.04, µ_M = 0.0015.
Prior choices: ν ∼ U(0, 1), µ_L ∼ U(0, 1), µ_I ∼ U(0, 2), β ∼ U(0, 2).

SLIDE 27

Results Posterior results

Posterior Density for ν

[Figure: posterior density of ν (horizontal axis in units of 10^-3) for the auxiliary models with AIC 1930, AIC 1911 and AIC 1897.]

SLIDE 28

Results Posterior results

Posterior Density for µL

[Figure: posterior density of µ_L (horizontal axis from about −0.01 to 0.06) for the auxiliary models with AIC 1930, AIC 1911 and AIC 1897.]

SLIDE 29

Results Posterior results

Posterior Density Differences

More concentrated ABC posterior for worse (bigger) AIC.
Very different modes.
Nunes-Balding criterion for summary statistics: prefer bigger AIC models.
Real test: the predictions from the different ABC fitted models?

SLIDE 30

Results ABC fits to Data

ABC Fit to data

95 percent prediction interval from the stochastic model using ABC modal estimates: with best AIC auxiliary model on left, worst on right

[Figure, two panels, each plotting mature count against autopsy time: "95% prediction intervals based on the auxiliary Beta-Binomial model" and "95% prediction intervals based on the auxiliary Binomial mixture model".]

ABC model fit based on best AIC auxiliary model captures variability of data

SLIDE 32

Conclusions Macroparasite model

Conclusions: Macroparasite stochastic model

µ_L, ν more associated with variance of process; ABC differences between auxiliary models over range of AIC.
Most concentrated ABC posteriors are not best for model predictions.
The stochastic model using the ABC fit with the best AIC auxiliary model for summary statistics gives a good fit to the data.

SLIDE 33

Conclusions ABC

Conclusions: ABC and summary statistics

Indirect inference through an easy-to-estimate (MLE + AIC) auxiliary model can be used to obtain ABC summary statistics.
Careful data analysis is involved in getting a good auxiliary model for the observed data: statisticians' strength!
The Nunes and Balding approach can be flawed where ABC posteriors have different modes.
The Fearnhead and Prangle approach seems limited to IID cases; otherwise ad hoc choices of summaries are made.
Presented an adaptive self-tuning ABC SMC algorithm.

SLIDE 39

Conclusions ABC

Acknowledgements

Australian Research Council, Chris Drovandi, Malcolm Faddy

SLIDE 40

Conclusions ABC

Foresight

It will be maintained that the end of an era has now been reached, as regards both statistical methods and computational techniques, and an outline of the way in which biometric techniques in genetical demography may be expected to develop will be given. Particular emphasis will be placed on the need to formulate sound methods of "estimation by simulation" on complex models.
Edwards AWF, 1967, Biometrics, 23, 176
