

SLIDE 1

Markov chain Monte Carlo

  • Dr. Jarad Niemi

STAT 544 - Iowa State University

April 2, 2018

Jarad Niemi (STAT544@ISU) Markov chain Monte Carlo April 2, 2018 1 / 31

SLIDE 2

Markov chain Monte Carlo

Markov chain construction

The techniques we have discussed thus far, e.g.

  • Metropolis-Hastings
    • independent Metropolis-Hastings
    • random-walk Metropolis
    • Hamiltonian Monte Carlo
  • Gibbs sampling
  • Slice sampling

form a set of techniques referred to as Markov chain Monte Carlo (MCMC). Today we look at some practical questions involving the use of MCMC:

  • What initial values should I use?
  • How long do I need to run my chain?
  • What can I do with the samples I obtain?


SLIDE 3

Markov chain Monte Carlo

Markov chain Monte Carlo

An MCMC algorithm with transition kernel K(θ(t−1), θ(t)) constructed to sample from p(θ|y) is the following:

  • 1. Sample θ(0) ∼ π(0).
  • 2. For t = 1, . . . , T, apply the kernel K(θ(t−1), θ(t)) to obtain the sequence θ(1), . . . , θ(T).

The questions can then be rephrased as

  • What should I use for π(0)?
  • What should T be?
  • What can I do with θ(1), . . . , θ(T)?


SLIDE 4

Initial values

Initial values

For ergodic Markov chains with stationary distribution p(θ|y), theory states that θ(t) →d θ where θ ∼ p(θ|y), for almost all θ(0) (and for all θ(0) under Harris recurrence). If p(θ(0)|y) << p(θMAP|y), then this convergence can take a long time. For example, let

  θ(t) = 0.99 θ(t−1) + ε_t,   ε_t iid∼ N(0, 1 − 0.99²),

which has stationary distribution p(θ|y) = N(0, 1). If θ(0) ∼ p(θ|y), then θ(t) ∼ p(θ|y) for all t, but if θ(0) is very far from p(θ|y), then θ(t) is approximately distributed according to p(θ|y) only for t very large.

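The AR(1) example above can be simulated directly. Below is a minimal Python sketch (the deck's own code is in R); the 0.99 coefficient and the N(0, 1 − 0.99²) innovations come from the slide, while the function and variable names are mine.

```python
import math
import random

def ar1_chain(theta0, T, rho=0.99, seed=0):
    """Simulate theta_t = rho * theta_{t-1} + eps_t, eps_t ~ N(0, 1 - rho^2).

    The stationary distribution is N(0, 1) regardless of theta0, but a
    starting value far from 0 takes many iterations to be forgotten."""
    rng = random.Random(seed)
    sd = math.sqrt(1 - rho**2)
    theta = [theta0]
    for _ in range(T):
        theta.append(rho * theta[-1] + rng.gauss(0, sd))
    return theta

good = ar1_chain(0.0, 1000)   # theta(0) already in the stationary distribution
bad = ar1_chain(100.0, 1000)  # theta(0) far from p(theta|y): early draws are burn-in
```

Plotting `good` and `bad` against the iteration index reproduces the "good" and "bad burn-in" trace plots on the next slide.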

SLIDE 5

Initial values

[Figure: trace plots over 1,000 iterations for a chain with a good starting value and a chain with a bad starting value, the latter showing a clear burn-in period.]


SLIDE 6

Initial values

How many iterations do I need for burn-in?

Imagine two different chains

[Figure: trace plots of two replicate chains over 1,000 iterations in three panels: "Not mixing", "Not stationary", and "Mixing and stationary".]


SLIDE 7

Initial values Potential scale reduction factor

Gelman-Rubin potential scale reduction factor

  • 1. Start multiple chains with initial values that are well dispersed relative to p(θ|y).
  • 2. For each scalar estimand ψ of interest,
    • calculate the between-chain variance B and within-chain variance W;
    • estimate the marginal posterior variance of the estimand, i.e. Var(ψ|y), by

        V̂ar⁺(ψ|y) = ((t − 1)/t) W + (1/t) B

      where t is the number of iterations;
    • calculate the potential scale reduction factor

        R̂_ψ = sqrt( V̂ar⁺(ψ|y) / W ).

  • 3. If the R̂_ψ are approximately 1, e.g. < 1.1, then there is no evidence of non-convergence.

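The calculation above fits in a few lines. The sketch below is a minimal Python version of the slide's formulas (the deck's examples use R's coda package); the function name is mine, and it computes the non-split R̂ exactly as written on the slide.

```python
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for one scalar estimand.

    `chains` is a list of equal-length lists of draws, one list per chain.
    Implements Var-hat+ = ((t - 1)/t) W + (1/t) B from the slide."""
    t = len(chains[0])
    means = [statistics.mean(c) for c in chains]
    # Between-chain variance: t times the variance of the chain means
    B = t * statistics.variance(means)
    # Within-chain variance: average of the per-chain sample variances
    W = statistics.mean(statistics.variance(c) for c in chains)
    var_hat_plus = (t - 1) / t * W + B / t
    return (var_hat_plus / W) ** 0.5
```

Chains stuck in different regions give a large R̂; chains exploring the same distribution give R̂ near 1, and values below 1.1 show no evidence of non-convergence.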

SLIDE 8

Initial values Potential scale reduction factor

Example potential scale reduction factors

[1] "Not mixing"
Potential scale reduction factors:
     Point est. Upper C.I.
[1,]       7.35       16.2

[1] "Not stationary"
Potential scale reduction factors:
     Point est. Upper C.I.
[1,]       2.62       5.31

[1] "Mixing and stationary"
Potential scale reduction factors:
     Point est. Upper C.I.
[1,]       1.01       1.04

SLIDE 9

Initial values Methods

Methods for finding good initial values

From http://users.stat.umn.edu/~geyer/mcmc/burn.html:

  "Any point you don't mind having in a sample is a good starting point."

Methods for finding good initial values:

  • burn-in: throw away the first X iterations
  • start at the MLE, i.e. argmax_θ p(y|θ)
  • start at the MAP (maximum a posteriori), i.e. argmax_θ p(θ|y)


SLIDE 10

How many iterations?

How many iterations should I run (post ‘convergence’)?

Compute the effective sample size, i.e. the number of independent samples that would be needed to attain the same precision in our estimates.

d = ddply(data.frame(rho=c(0,.9,.99)), .(rho),
          function(x) data.frame(x=rwalk(1000,0,x$rho)))
ddply(d, .(rho), summarize, effective_size = round(coda::effectiveSize(x)))

   rho effective_size
1 0.00           1000
2 0.90             35
3 0.99              6

BDA3 recommends a total of 100-2000 effective samples, but this really depends on what you want to estimate. If you are interested in estimating probabilities of rare events, i.e. tail probabilities, you may need many more samples.

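coda::effectiveSize estimates the effective sample size from the chain's autocorrelation structure. The Python sketch below is a simplified stand-in: it uses ESS = T / (1 + 2 Σ ρ_k), truncating the sum at the first non-positive autocorrelation, whereas coda uses a spectral-density estimate; all names here are mine.

```python
def autocorr(x, k):
    """Lag-k sample autocorrelation of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[i] - mean) * (x[i + k] - mean) for i in range(n - k)) / n
    return cov / var

def effective_size(x, max_lag=None):
    """ESS = T / (1 + 2 * sum of autocorrelations).

    The sum stops at the first non-positive autocorrelation, a common
    practical truncation rule."""
    n = len(x)
    if max_lag is None:
        max_lag = n // 2
    s = 0.0
    for k in range(1, max_lag):
        rho = autocorr(x, k)
        if rho <= 0:
            break
        s += rho
    return n / (1 + 2 * s)
```

For independent draws the ESS is close to the chain length; for a chain with lag-1 correlation 0.9 or 0.99 it is a small fraction of it, as in the table above.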

SLIDE 11

How many iterations?

Autocorrelation function

[Figure: autocorrelation functions (ACF versus lag, out to lag 100) in three panels for rho = 0, rho = 0.9, and rho = 0.99.]


SLIDE 12

What can I do with the samples?

Monte Carlo integration

Consider approximating the integral

  E[h(θ)|y] = ∫_Θ h(θ) p(θ|y) dθ

via its Markov chain Monte Carlo (MCMC) estimate

  ĥ_T = (1/T) Σ_{t=1}^T h(θ(t))

where θ(t) is the tth iteration from the MCMC. Under regularity conditions:

  • SLLN: ĥ_T converges almost surely to E[h(θ)|y].
  • CLT: under stronger regularity conditions, ĥ_T →d N(E[h(θ)|y], σ²/T) where

      σ² = Var[h(θ)|y] (1 + 2 Σ_{k=1}^∞ ρ_k)

    and ρ_k is the kth autocorrelation of the h(θ) values.

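The quantity σ²/T on this slide is what packages such as mcmcse estimate when reporting a Monte Carlo standard error. One common estimator is batch means, sketched below in Python; the batch count and function name are my choices (mcmcse's defaults differ, e.g. batch sizes near sqrt(T)).

```python
import math

def mcse_batch_means(x, batches=30):
    """Monte Carlo standard error of the sample mean via batch means.

    Split the chain into `batches` consecutive batches; the variance of
    the batch means, scaled by the batch size, estimates sigma^2 from
    the CLT on the slide, giving se = sqrt(sigma^2 / T)."""
    b = len(x) // batches                      # batch size
    means = [sum(x[i * b:(i + 1) * b]) / b for i in range(batches)]
    grand = sum(means) / batches
    sigma2 = b * sum((m - grand) ** 2 for m in means) / (batches - 1)
    return math.sqrt(sigma2 / (b * batches))
```

For independent draws this recovers the usual sd/sqrt(T); positive autocorrelation inflates the batch-mean variance and hence the reported standard error.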

SLIDE 13

What can I do with the samples?

Sequential estimates

opar = par(mfrow=c(1,3))
d_ply(d, .(rho), function(x)
  plot(cumsum(x$x)/1:length(x$x), type="l", ylim=c(-1,1),
       ylab="Posterior mean", xlab="Iteration (t)",
       main=paste("rho=", x$rho[1])))
par(opar)

[Figure: cumulative estimate of the posterior mean versus iteration (t) for rho = 0, 0.9, and 0.99, each on the range −1 to 1.]

SLIDE 14

What can I do with the samples?

Treat the MCMC samples as samples from the posterior

Use mcmcse::mcse to estimate the MCMC variance

# Mean
ddply(d, .(rho), function(x) round(as.data.frame(mcmcse::mcse(x$x)),2))

   rho   est   se
1 0.00 -0.08 0.03
2 0.90  0.15 0.14
3 0.99  0.33 0.15

# Quantiles
ddply(d, .(rho), function(x) round(as.data.frame(mcmcse::mcse.q(x$x, .025)),2))

   rho   est   se
1 0.00 -2.06 0.08
2 0.90 -2.01 0.26
3 0.99 -1.53 0.24

ddply(d, .(rho), function(x) round(as.data.frame(mcmcse::mcse.q(x$x, .975)),2))

   rho  est   se
1 0.00 1.92 0.10
2 0.90 1.94 0.17
3 0.99 2.41 0.34

SLIDE 15

What can I do with the samples?

Treat the MCMC samples as samples from the posterior

[Figure: density histograms of the sampled values over the range −3 to 3, in three panels for rho = 0, rho = 0.9, and rho = 0.99.]


SLIDE 16

One really long chain

A wasteful approach

The Gelman approach in practice is the following:

  • 1. Run an initial chain or, in some other way, approximate the posterior.
  • 2. (Randomly) choose initial values for multiple chains, well dispersed relative to this approximation to the posterior.
  • 3. Run the chains until all estimands of interest have potential scale reduction factors less than 1.1.
  • 4. Continue running until you have a total of around 4,000 effective draws.
  • 5. Discard the first half of all the chains.

Assuming this approach correctly diagnoses convergence or lack thereof, it seems computationally wasteful since

  • you had to run an initial chain, but then threw it away, and
  • you threw away half of your later iterations.


SLIDE 17

One really long chain

One really long chain

From http://users.stat.umn.edu/~geyer/mcmc/one.html:

  "If you can't get a good answer with one long run, then you can't get a good answer with many short runs either."

  • 1. Start a chain at a reasonable starting value.
  • 2. Run it for many iterations (and keep running it).

If you really want a convergence diagnostic, you can try Geweke's, which tests for equality of means in the first and last parts of the chain.

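Geweke's diagnostic compares the mean of an early segment of the chain with the mean of a late segment. A naive Python version is below; coda's implementation uses spectral-density estimates of the segment variances with default fractions of the first 10% versus the last 50%, while this sketch uses plain sample variances with the same fractions, and the function name is mine.

```python
import math

def geweke_z(x, first=0.1, last=0.5):
    """Geweke convergence z-score: mean of the first `first` fraction of
    the chain versus the mean of the last `last` fraction.

    Values far outside (-2, 2) suggest the chain has not yet reached its
    stationary distribution."""
    n = len(x)
    a = x[: int(first * n)]
    b = x[int((1 - last) * n):]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    var_a = sum((v - mean_a) ** 2 for v in a) / (len(a) - 1)
    var_b = sum((v - mean_b) ** 2 for v in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))
```

A stationary chain gives a modest z-score; a chain that is still drifting toward the posterior gives a large one.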

SLIDE 18

One really long chain

Geweke diagnostic

# Z-score for test of equality of means
opar = par(mfrow=c(1,3))
d_ply(d, .(rho), function(x)
  geweke.plot(mcmc(x$x), auto=F, main=paste("rho=", x$rho[1])))
par(opar)

[Figure: Geweke Z-scores plotted against the first iteration in each segment, in three panels for rho = 0, rho = 0.9, and rho = 0.99.]

SLIDE 19

Thinning

Thinning

You will hear of people thinning their Markov chain by recording only every nth observation. This has the benefit of reducing the autocorrelation in the retained samples, but it should only be used if memory or hard-drive space is a limiting factor.


SLIDE 20

Thinning

Thinning

sq = seq(10,1000,by=10)
ddply(d, .(rho), summarize,
      full=effectiveSize(x), thinned=effectiveSize(x[sq]))

   rho        full   thinned
1 0.00 1000.000000 103.29644
2 0.90   35.405683  39.37303
3 0.99    6.435595  16.21098

# Calculate standard error
ddply(d, .(rho), function(x) {
  rbind(data.frame(s="full", mcmcse::mcse(x$x)),
        data.frame(s="thinned", mcmcse::mcse(x$x[sq])))
})

   rho       s         est         se
1 0.00    full -0.08223773 0.02789422
2 0.00 thinned -0.16062926 0.08828254
3 0.90    full  0.15262785 0.13563890
4 0.90 thinned  0.19911506 0.16895797
5 0.99    full  0.32514041 0.15349268
6 0.99 thinned  0.32068486 0.26609394

SLIDE 21

Thinning

Alternative use for burn-in

For MCMC algorithms that have tuning parameters, use burn-in (warm-up) to set those tuning parameters. Suppose the target distribution is N(0, 1) and we are performing random-walk Metropolis with a normal proposal. The variance of this proposal is a tuning parameter and we can tune it during burn-in:

  • if a proposal is accepted, our variance is likely too small and we should increase it;
  • if a proposal is rejected, our variance is likely too big and we should decrease it.

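The accept/increase, reject/decrease rule (implemented in R on the next slide) can be sketched in Python as follows; the multiplicative factor of 1.1 matches the deck's R code, while the function name and target density are written out by me.

```python
import math
import random

def rw_metropolis(n, theta0, tune=1.0, autotune=True, seed=0):
    """Random-walk Metropolis for a N(0, 1) target with the slide's
    burn-in tuning rule: grow the proposal sd on accept, shrink it on
    reject. This multiplicative rule tunes toward ~50% acceptance."""
    rng = random.Random(seed)
    log_target = lambda t: -0.5 * t * t      # N(0,1) log density, up to a constant
    theta = [theta0]
    for _ in range(n - 1):
        prop = rng.gauss(theta[-1], tune)
        if math.log(rng.random()) < log_target(prop) - log_target(theta[-1]):
            theta.append(prop)
            if autotune:
                tune *= 1.1
        else:
            theta.append(theta[-1])
            if autotune:
                tune /= 1.1
    return theta, tune

# Tune during burn-in, then freeze the tuning parameter for inference
burn, tuned = rw_metropolis(1000, 0.0)
draws, _ = rw_metropolis(10000, burn[-1], tuned, autotune=False)
```

As on the slides, tuning is turned off after burn-in so that the fixed-kernel MCMC theory applies to the inference draws.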

SLIDE 22

Thinning

Alternative use for burn-in

rw = function(n, theta0, tune=1, autotune=TRUE) {
  theta = rep(theta0, n)
  for (i in 2:n) {
    theta_prop = rnorm(1, theta[i-1], tune)
    logr = dnorm(theta_prop, log=TRUE) - dnorm(theta[i-1], log=TRUE)
    # This tuning tunes to an acceptance rate of 50%
    if (log(runif(1)) < logr) {
      theta[i] = theta_prop
      if (autotune) tune = tune*1.1
    } else {
      theta[i] = theta[i-1]
      if (autotune) tune = tune/1.1
    }
  }
  return(list(theta=theta, tune=tune))
}

# Tune during burn-in
burnin = rw(1000, 0)
burnin$tune
[1] 1.61051

# Turn off tuning after burn-in for theory to hold
inference = rw(10000, burnin$theta[1000], burnin$tune, autotune=FALSE)

SLIDE 23

Thinning

Alternative use for burn-in

hist(inference$theta, 100, prob=T)
curve(dnorm, col="red", add=TRUE, lwd=2)

[Figure: density histogram of inference$theta with the N(0, 1) density curve overlaid in red.]

SLIDE 24

Summary

Summary

Since computing time/power is rarely the limiting factor these days, my suggestion is:

  • 1. Run one long chain and continue running it.
  • 2. Run multiple chains according to the suggestions in BDA:
    • a. Start multiple chains with initial values dispersed relative to the posterior learned from the long chain.
    • b. Monitor the potential scale reduction factor until it is < 1.1 for all quantities of interest.
    • c. Monitor traceplots and cumulative mean plots.
    • d. Discard burn-in (the first half is probably overkill).
    • e. Run until the effective sample size is around 2,000.
  • 3. Use all samples for posterior inference.

If things are not going well,

  • 1. Check for identifiability of the parameters in your model.
  • 2. Construct a better sampler.


SLIDE 25

Stan output

A simple model

Let Yi ind∼ N(µ, σ²) and p(µ, σ) ∝ Ca⁺(σ; 0, 1). In RStan,

model = "
data {
  int<lower=1> n;
  real y[n];
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  sigma ~ cauchy(0,1);
  y ~ normal(mu,sigma);
}
"

SLIDE 26

Stan output

RStan

y = rnorm(10)
m = stan_model(model_code = model)
r = sampling(m, list(n=length(y), y=y))

Warning: There were 1 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
Warning: Examine the pairs() plot to diagnose sampling problems

r
Inference for Stan model: 6c86a547f723283854dd490525d54ee4.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

       mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
mu    -0.30    0.01 0.38 -1.06 -0.55 -0.30 -0.05  0.45  2415    1
sigma  1.15    0.01 0.29  0.74  0.95  1.10  1.29  1.85  1701    1
lp__  -6.89    0.03 1.12 -9.93 -7.29 -6.56 -6.13 -5.83  1528    1

Samples were drawn using NUTS(diag_e) at Tue Apr 11 09:51:46 2017.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

laply(extract(r, c("mu","sigma")), function(x) length(unique(x))/length(x)) # Acceptance rate
[1] 0.85025 0.85025

SLIDE 27

Stan output

RStan plot

[Figure: post-warmup traceplots of mu and sigma (iterations 1000-2000) for chains 1-4.]

SLIDE 28

Stan output

Hierarchical binomial model

Recall the game-wise Andre Dawkins 3-point percentage data set with a hierarchical binomial model:

  Yi ind∼ Bin(ni, θi)
  θi ind∼ Be(α, β)
  p(α, β) ∝ (α + β)^(−5/2)

In RStan,

hierarchical_binomial_model = "
data {
  int<lower=1> N;
  int<lower=0> n[N];
  int<lower=0> y[N];
}
parameters {
  real<lower=0,upper=1> theta[N];
  real<lower=0> alpha;
  real<lower=0> beta;
}
transformed parameters {
  real<lower=0,upper=1> mu;
  real<lower=0> eta;
  eta = alpha+beta;
  mu = alpha/eta;
}
model {
  target += -5*log(alpha+beta)/2;
  theta ~ beta(alpha,beta);
  y ~ binomial(n,theta);
}
"

SLIDE 29

Stan output

Initial run

m = stan_model(model_code = hierarchical_binomial_model)
r = sampling(m,
             data = list(N = nrow(dawkins), n = dawkins$attempts, y = dawkins$made),
             pars = c("alpha","beta","mu","eta"),
             chains = 1)

Warning: There were 29 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
Warning: There were 1 chains where the estimated Bayesian Fraction of Missing Information was low. See http://mc-stan.org/misc/warnings.html#bfmi-low
Warning: Examine the pairs() plot to diagnose sampling problems

SLIDE 30

Stan output

RStan

r
Inference for Stan model: dc6c37e2deab377b892b94e3b0b4a542.
1 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=1000.

        mean se_mean    sd    2.5%     25%    50%    75%  97.5% n_eff Rhat
alpha  11.80    1.69 10.79    1.48    4.12   8.22  15.53  44.55    41 1.00
beta   13.35    1.92 12.23    1.75    4.85   9.18  16.88  49.51    41 1.01
mu      0.47    0.00  0.05    0.37    0.44   0.47   0.50   0.58   365 1.00
eta    25.16    3.61 22.86    3.38    9.11  17.34  32.99  94.55    40 1.01
lp__  -98.61    1.28  9.67 -117.22 -105.78 -98.53 -91.26 -81.23    57 1.00

Samples were drawn using NUTS(diag_e) at Mon Apr 2 16:01:54 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

laply(rstan::extract(r, c("mu","eta")), function(x) length(unique(x))/length(x)) # Acceptance rate
[1] 0.983 0.983

SLIDE 31

Stan output

RStan plot

[Figure: post-warmup traceplots of mu, eta, alpha, and beta (iterations 1000-2000) for the single chain.]