
slide-1
SLIDE 1

Australian Centre of Excellence for Mathematical and Statistical Frontiers in Big Data, Big Models, New Insights

The past, present and future of Monte Carlo methods

(a personal experience) Scott A. Sisson

UNSW Sydney & ACEMS

8th–12th July 2019, MCM2019

1/71

slide-2
SLIDE 2

Talk Outline

* Sisson & Fan (2007), Statistics & Computing.

  • 1. The Past
  • RJMCMC convergence diagnostics∗
  • Adaptive MCMC
  • 2. The Present
  • Approximate Bayesian computing
  • Likelihood-free approximate Gibbs sampling

  • MCMC for approximate data
  • 3. The Future
  • of Monte Carlo methods research

2/71

slide-3
SLIDE 3

RJMCMC convergence diagnostics

Active research in the 1990s–2000s.

Gelman–Rubin (Potential Scale Reduction Factor) diagnostic (1992):

\hat{R} = \left( \frac{\frac{N-1}{N}\hat{W} + \frac{1}{N}\hat{B}}{\hat{W}} \right)^{1/2}

◮ B = between-chain variance.
◮ W = within-chain variance.
◮ \hat{R} → 1 when all chains have converged.

◮ Based on some quantity constantly observed across all chains.
◮ Not clear how to use for RJMCMC:

  • Monitor deviance?
  • Monitor model indicator?
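
One way to compute this diagnostic (a minimal sketch in Python/NumPy; my own illustration, not code from the talk):

```python
# Gelman-Rubin PSRF computed from C chains of length N, each monitoring
# one scalar quantity that is constantly observed across all chains.
import numpy as np

def gelman_rubin(chains):
    """chains: array of shape (C, N) holding one scalar quantity per chain."""
    C, N = chains.shape
    W = chains.var(axis=1, ddof=1).mean()       # within-chain variance
    B = N * chains.mean(axis=1).var(ddof=1)     # between-chain variance
    V = (N - 1) / N * W + B / N                 # pooled variance estimate
    return np.sqrt(V / W)                       # R-hat -> 1 at convergence

rng = np.random.default_rng(0)
print(gelman_rubin(rng.normal(size=(4, 5000))))  # ~1 for chains with a common target
```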

3/71

slide-4
SLIDE 4

RJMCMC convergence diagnostics

After reading some books on point processes (. . . )

◮ Distance X_v from a point v to the nearest event ξ_1, . . . , ξ_n in a space A.
◮ For some models, can represent parameters as point process realisations
  • E.g. mixture of Normals f(x) = \sum_{i=1}^{n} w_i φ(x; µ_i, σ_i)
  • θ = (µ_1, . . . , µ_n, σ_1, . . . , σ_n, w_1, . . . , w_n, n)
  • θ = {ξ_i = (µ_i, σ_i, w_i) : i = 1, . . . , n}
◮ Now X_v is constantly observed across all chains
◮ So could estimate \hat{F}(x_v | v) for a chain, from some location v = (µ_v, σ_v, w_v)

Use for a diagnostic:

◮ Comparing two chains: u_{jk} = \int_V \int_0^\infty \left| \hat{F}^j(x_v | v) − \hat{F}^k(x_v | v) \right|^p \, dx_v \, dv
◮ or \hat{R}_v computed on (x_v^1, . . . , x_v^C) over all chains (etc.)

4/71

slide-5
SLIDE 5

RJMCMC convergence diagnostics

Specific model¹: Corbelled domes (unknown number of changepoints)

log(r_i) = log(α_j) + β_j log(d_i + δ_j) + ε_i    for γ_{j−1} ≤ d_i < γ_j
log(r_i) = log(c) + ε_i                           for γ_J ≤ d_i

with changepoints 0 = γ_0 ≤ . . . ≤ γ_J, ε_i ∼ N(0, σ²), subject to some continuity constraints.

Fan and Brooks (2000):

◮ ξ_j = (α_j, β_j, δ_j, γ_j, σ)
◮ Run 5 RJMCMC samplers for 15m iterations
◮ Did they converge?

¹Fan and Brooks (2000), The Statistician.

5/71

slide-6
SLIDE 6

RJMCMC convergence diagnostics

◮ (a, b) Kolmogorov–Smirnov and χ² tests on model indicators
◮ (c) PSRF / \hat{R} using deviance

6/71

slide-7
SLIDE 7

RJMCMC convergence diagnostics

◮ (a) All u_{j3} traces suggest chain #3 is different (grey lines)
◮ (b) \hat{R}_v shows different performance for 3 specific v ∼ π(ξ|x)
◮ (c) Density of γ_2 under chain 3 is very different to the other chains
◮ Difference identified by the 3 closest v (points)

7/71

slide-8
SLIDE 8

Talk Outline

* Garthwaite, Fan & Sisson (2010/16) Communications in Statistics.

  • 1. The Past
  • RJMCMC convergence diagnostics
  • Adaptive MCMC∗
  • 2. The Present
  • Approximate Bayesian computing
  • Likelihood-free approximate Gibbs sampling

  • MCMC for approximate data
  • 3. The Future
  • of Monte Carlo methods research

8/71

slide-9
SLIDE 9

Adaptive MCMC

Adaptive Random Walk Metropolis-Hastings (ARWMH):

◮ MCMC with proposal distribution q(θ, θ∗) = N(θ, σ²A_t)
◮ where A_t = Cov(θ) is estimated from θ_{t−1}, . . . , θ_1
◮ Satisfies the “diminishing adaptation” condition, so achieves the correct limiting distribution

Q: How to choose σ?
(One) A: Realise a particular acceptance probability.

Which acceptance probability?

◮ 0.234 for multivariate proposal distributions (Roberts et al., 1997)
◮ 0.44 for univariate proposal distributions (Roberts and Rosenthal, 2001)

Appears to work well in general. Typically obtained by manually searching over σ² (laborious).

9/71

slide-10
SLIDE 10

The Robbins-Monro Process

◮ Stochastic search algorithm for a binary response with probability of success p(σ). Can control σ.
◮ The aim is to find the value of σ∗ that gives a specified value of p, say p∗ = p(σ∗).

[Figure: p(σ) as a function of σ, with the target value p∗ = p(σ∗) marked.]

Robbins–Monro process (Robbins & Monro, 1951)

Conduct a sequence of trials.

◮ At each trial σ is set equal to the current estimate of σ∗.
◮ If the trial is a success, σ is increased.
◮ If it is a failure, σ is reduced.
◮ The adjustment size decreases with each trial.
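
A toy sketch of this search (my own illustration; the success-probability curve p(σ) and steplength c are made up for the example):

```python
# Robbins-Monro search for sigma* satisfying p(sigma*) = p*, where each trial
# is a binary success with probability p(sigma) and the step size shrinks as c/t.
import numpy as np

def p(sigma):
    return np.exp(-sigma / 3.0)          # toy monotone "success probability" curve

rng = np.random.default_rng(1)
p_star, sigma, c = 0.44, 1.0, 2.0
for t in range(1, 5001):
    success = rng.random() < p(sigma)    # one binary trial at the current sigma
    if success:
        sigma += c * (1 - p_star) / t    # success: increase sigma
    else:
        sigma -= c * p_star / t          # failure: decrease sigma

print(sigma, -3.0 * np.log(p_star))      # estimate vs. the true root of p(sigma) = p*
```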

10/71

slide-11
SLIDE 11

Incorporation into RWMH algorithms

Within a RWMH algorithm:

Given the current value of σ_t, update σ_{t+1} according to

  σ_{t+1} = σ_t + c(1 − p∗)/t   if θ∗ is accepted
  σ_{t+1} = σ_t − c p∗/t        if θ∗ is rejected,

where θ∗ ∼ N(θ_t, σ²) and c is a chosen steplength constant, which controls algorithm convergence.

◮ Assumes that p(σ) is a monotonic function of σ.
◮ The optimum choice of c is (Hodges and Lehmann, 1955)

  c∗ = −1 / [dp(σ)/dσ]_{σ=σ∗}.

  In general this must be estimated.
◮ Previous attempts to estimate c∗ used 3 additional Markov chains (Andrieu and Robert, 2001).

11/71

slide-12
SLIDE 12

Estimating c∗

◮ Pr(accepting an MH move from θ):

  p(θ, σ) = \int α(θ, θ∗) \, q(θ, θ∗) \, dθ∗

◮ Hence, the overall MH acceptance probability (OAP) is

  p(σ) = \int p(θ, σ) \, π(θ|y) \, dθ.

For Gaussian RWMH (symmetric proposal) we then have

  \frac{dp(σ)}{dσ} = \int \int \min\left\{ \frac{π(θ∗|y)}{π(θ|y)}, 1 \right\} \frac{dq(θ, θ∗; σ)}{dσ} \, π(θ|y) \, dθ∗ \, dθ,

which is difficult to work with in general. However . . .

12/71

slide-13
SLIDE 13

Some results

If q is an m-dimensional MVN(θ_t, σ²A), where A is independent of σ.

Proposition 1

A lower bound on c∗/σ∗ is 1/(mp∗).

Proposition 2

If π(θ|y) has finite variance, then for m = 1: c∗/σ∗ → 1/p∗ as σ → ∞.

Proposition 3

If π(θ|y) is continuous over Θ, and if σ → 0 as p(σ) → 1, then c∗/σ∗ ≈ 1/(1 − p∗) as p∗ → 1.

13/71

slide-14
SLIDE 14

A simple estimate of c∗ (univariate)

Hence for univariate proposals (m = 1), one possibility is

  c∗/σ∗ ≈ 1 / [p∗(1 − p∗)],

which satisfies Propositions 1–3.

In practice:

Have an estimate of c∗ for every estimate of σ∗ in the Robbins-Monro search process. Results for multivariate posteriors too.

14/71

slide-15
SLIDE 15

How good in practice?

[Figure: c∗/σ∗ plotted against p∗ for a range of univariate target distributions.]

◮ Given p∗, numerically solve p(σ) for σ∗ and dp(σ)/dσ for c∗.
◮ Univariate distributions: N(0,1); t_5; Cauchy; logistic; double exponential; Gamma(5,1); Beta(3,7); U(0,1); (1/2)N(0,1) + (1/2)N(5,5).

◮ Note: Useful to slightly overestimate c∗ in practice

15/71

slide-16
SLIDE 16

How good in practice?

[Figure: c∗/σ∗ against p∗ for multivariate targets: (a) m = 2, (b) m = 4, (c) m = 8, (d) m = 20.]

◮ π(θ|y) = \prod_i h(θ_i), where h is each of the previous 9 densities.
◮ Consistent results as before.

16/71

slide-17
SLIDE 17

Algorithm comments

Univariate target distributions with θ∗ ∼ N(θt, σ2):

Set p∗ = 0.44 and update σ_t according to

  σ_{t+1} = σ_t + c(1 − p∗)/t   if θ∗ is accepted
  σ_{t+1} = σ_t − c p∗/t        if θ∗ is rejected,

where c = σ_t / {p∗(1 − p∗)}.

Various other implementational details omitted.
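
A rough end-to-end sketch (my own toy instance on a standard normal target, omitting the paper's safeguards):

```python
# Univariate adaptive RWMH: the proposal s.d. sigma is tuned by Robbins-Monro
# steps towards acceptance probability p* = 0.44, using c = sigma / (p*(1 - p*)).
import numpy as np

def log_target(theta):
    return -0.5 * theta**2                       # N(0, 1) target, up to a constant

rng = np.random.default_rng(0)
p_star, theta, sigma = 0.44, 0.0, 1.0
accepts, n_iter = 0, 20000
for t in range(1, n_iter + 1):
    proposal = rng.normal(theta, sigma)
    c = sigma / (p_star * (1 - p_star))          # plug-in steplength estimate
    if np.log(rng.random()) < log_target(proposal) - log_target(theta):
        theta = proposal
        sigma += c * (1 - p_star) / t            # accepted: increase sigma
        accepts += 1
    else:
        sigma -= c * p_star / t                  # rejected: decrease sigma
    sigma = max(sigma, 1e-3)                     # crude guard for the omitted details

print(sigma, accepts / n_iter)                   # sigma* ~ 2.4, acceptance ~ 0.44
```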

17/71

slide-18
SLIDE 18

Univariate simulation study

Target distribution          Optimal σ∗   σ∗ quantiles (0.05 / median / 0.95)   OAP quantiles (0.05 / median / 0.95)
N(0, 1)                      2.42         2.31 / 2.43 / 2.56                    0.417 / 0.443 / 0.468
t-dist. (5 d.f.)             2.71         2.54 / 2.73 / 2.89                    0.413 / 0.441 / 0.470
Cauchy                       4.39         3.69 / 4.25 / 5.03                    0.389 / 0.443 / 0.501
Logistic                     4.05         3.82 / 4.05 / 4.33                    0.417 / 0.442 / 0.467
Double exponential           2.70         2.52 / 2.70 / 2.93                    0.413 / 0.439 / 0.465
Gamma(5, 1)                  4.98         4.62 / 4.96 / 5.28                    0.414 / 0.443 / 0.467
Beta(3, 7)                   0.335        0.311 / 0.335 / 0.355                 0.417 / 0.440 / 0.466
Uniform                      0.806        0.764 / 0.807 / 0.849                 0.418 / 0.442 / 0.464
(1/2)N(0, 1) + (1/2)N(5, 5)  6.07         5.59 / 6.10 / 6.50                    0.415 / 0.442 / 0.468

[Figure: traces of σ_t and the acceptance rate against iteration for the N(0,1), double exponential and Beta(3,7) targets.]

18/71

slide-19
SLIDE 19

The Past: Summary

  • 1. The Past
  • RJMCMC convergence diagnostics
  • Adaptive MCMC

Active research in the 1990s–2000s. Who worries about convergence in 2020?

◮ Should be more important than ever (bigger, more complex models)
◮ MCMC adaptation is still important (e.g. HMC)
◮ But less focus on going beyond the (1-step) Markov chain
◮ Maybe the problems have been adequately resolved (for now)
◮ Research trends come and go, and this is ok

19/71

slide-20
SLIDE 20

Talk Outline

* Rodrigues, Nott & Sisson (2016) Computational Statistics & Data Analysis

  • 1. The Past
  • RJMCMC convergence diagnostics
  • Adaptive MCMC
  • 2. The Present
  • Approximate Bayesian computing∗
  • Likelihood-free approximate Gibbs sampling

  • MCMC for approximate data
  • 3. The Future
  • of Monte Carlo methods research

20/71

slide-21
SLIDE 21

ABC basics

Aim: Posterior simulation from π(θ|y) ∝ L(y|θ)π(θ)
BUT: The likelihood is computationally intractable.

Simple ABC mechanism:

  • 1. Draw samples (θ1, x1), (θ2, x2), . . . , (θN, xN) from

π(θ, x) = L(x | θ)π(θ)

  • 2. Look at marginal distribution of π(θ | x ≈ y)
  • 3. If x is “close enough” to y then:

π(θ | x ≈ y) ≈ π(θ|y)

Note: “close enough” could be ‖x − y‖ ≤ h for small h.

21/71

slide-22
SLIDE 22

ABC basics

◮ The condition x ≈ y is very unlikely.
◮ Reduce the data to summary statistics sx = S(x).

ABC Algorithm:

  • 1. Draw samples (θi, xi) from π(θ, x) = L(x | θ)π(θ)
  • 2. Compute summary statistics sx = S(x)
  • 3. Look at marginal distribution of π(θ | sx ≈ sy)
  • 4. If sx is “close enough” to sy (e.g. if ‖sx − sy‖ ≤ h) then

π(θ | sx ≈ sy) ≈ π(θ|sy) ≈ π(θ|y)

Note: If sy are sufficient statistics, then π(θ|sy) ≡ π(θ|y); otherwise there is some loss of information.

Note: Computation increases exponentially as accuracy increases (i.e. as h → 0).

22/71

slide-23
SLIDE 23

ABC toy example

Toy example setup:

◮ Suppose the model is x1, . . . , x100 ∼ N(θ, 1)
◮ A sufficient statistic is sx = x̄
◮ Suppose sy = 0 and the prior π(θ) is U(−1, 1)
◮ Then π(θ | y) ∝ N(0, 1/10²) on (−1, 1)

Simulation procedure (rejection sampling):

  • 1. Generate θ ∼ π(θ) from the prior
  • 2. Generate x1, . . . , x100 ∼ N(θ, 1) and compute sx = x̄
  • 3. Examine the distribution of those θ for which x̄ ≈ 0 (i.e. sx ≈ sy)
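
A minimal sketch of this toy rejection sampler (my own illustration; the threshold h and number of draws are arbitrary):

```python
# Toy ABC rejection: x_1..x_100 ~ N(theta, 1), prior theta ~ U(-1, 1),
# summary s_x = mean(x), observed s_y = 0, accept draws with |s_x - s_y| <= h.
import numpy as np

rng = np.random.default_rng(0)
s_y, h, n_draws = 0.0, 0.025, 50_000

theta = rng.uniform(-1, 1, size=n_draws)                   # 1. prior draws
x = rng.normal(theta[:, None], 1.0, size=(n_draws, 100))   # 2. simulate data
s_x = x.mean(axis=1)                                       #    and summarise
accepted = theta[np.abs(s_x - s_y) <= h]                   # 3. keep "close enough" draws

print(len(accepted) / n_draws)                   # acceptance rate (falls as h -> 0)
print(accepted.mean(), accepted.std())           # approx. N(0, 1/10^2), truncated to (-1, 1)
```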

23/71

slide-24
SLIDE 24

ABC toy example

Approximating the true posterior: h = ∞ Acceptance = 100%

[Figure: accepted (parameter, data summary) pairs and the resulting ABC posterior density for the model parameter.]

24/71

slide-25
SLIDE 25

ABC toy example

Approximating the true posterior: h = 4.5 Acceptance = 85%

25/71

slide-26
SLIDE 26

ABC toy example

Approximating the true posterior: h = 3 Acceptance = 55%

26/71

slide-27
SLIDE 27

ABC toy example

Approximating the true posterior: h = 2 Acceptance = 41%

27/71

slide-28
SLIDE 28

ABC toy example

Approximating the true posterior: h = 1.3 Acceptance = 25%

28/71

slide-29
SLIDE 29

ABC toy example

Approximating the true posterior: h = 0.005 Acceptance = 0.09%

29/71

slide-30
SLIDE 30

The computational overheads problem


Clear trade off: Greater accuracy ⇒ Greater computation needed

30/71

slide-31
SLIDE 31

What is the model here?

The actual (augmented) posterior distribution is:

  π(θ, sx | sy) ∝ K_h(sx − sy) L(sx|θ) π(θ)

where e.g. K_h(sx − sy) = 1 if ‖sx − sy‖ ≤ h, and 0 otherwise.

Interpretation:

  π_{ABC}(θ|sy) = \int K_h(sx − sy) L(sx|θ) π(θ) \, dsx = π(θ|sy) + other terms involving h,

based on kernel density estimation arguments. So π_{ABC}(θ|sy) → π(θ|sy) as h → 0.

31/71

slide-32
SLIDE 32

ABC Monte Carlo samplers

How does model augmentation help with Monte Carlo simulation?

Recall: Rejection sampling

Want to sample from f (θ) based on samples from g(θ):

◮ Sample θ ∼ g(θ)
◮ Accept θ as a draw from f(θ) with probability ∝ f(θ)/g(θ)

ABC Rejection sampling

Take f(θ) = π(θ, sx|sy) and g(θ) = L(sx|θ)π(θ). Then

  f(θ)/g(θ) ∝ \frac{K_h(sx − sy) L(sx|θ) π(θ)}{L(sx|θ) π(θ)} = K_h(sx − sy).

◮ The intractable likelihoods cancel out
◮ Their evaluation is completely circumvented
◮ Works similarly for other algorithms (e.g. MCMC, SMC, etc.)

32/71

slide-33
SLIDE 33

Hierarchical Gaussian process prior (Rodrigues, Nott & Sisson, 2016)

We have data {Xij}: i = 1, . . . , g groups and j = 1, . . . , ni observations per group.

A prior on the set of densities fi(x), i = 1, . . . , g:

  fi(x) = L(Zi(x)) b(x|φ) / ci(φ, Zi(x))          (1a)
  Zi(x) ∼ GP(µ(x), k(x, x′|θ1))                   (1b)
  µ(x) ∼ GP(m(x), k(x, x′|θ2))                    (1c)
  (θ1, θ2) ∼ π(·)                                 (1d)

◮ L(z) = 1/(1 + exp(−z)) is the logistic function
◮ b(x|φ) is a parametric base density
◮ GP(µ(x), k(x, x′|θ)) = a Gaussian process with mean function µ(x) and covariance function k(x, x′|θ)
◮ m(x) is a conveniently chosen function
◮ π(·) is a hyperprior for the parameters θ1, θ2
◮ ci(φ, Zi(x)) = \int L(Zi(x)) b(x|φ) dx = normalising constant of fi(x).
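
A small sketch of drawing one random density from (1a)–(1b) on a grid (my own illustration; the zero GP mean, squared-exponential kernel and base density b = U(0, 1) are assumptions made for the example):

```python
# Draw f(x) = L(Z(x)) b(x) / c on a grid, with Z ~ GP(0, squared-exponential
# kernel), logistic link L, base density b = Uniform(0, 1), and c the
# normalising constant computed numerically.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 200)
dx = x[1] - x[0]

def sq_exp_kernel(x, sigma=2.0, alpha=10.0):
    return sigma**2 * np.exp(-0.5 * (alpha * (x[:, None] - x[None, :]))**2)

K = sq_exp_kernel(x) + 1e-8 * np.eye(len(x))       # jitter for numerical stability
Z = rng.multivariate_normal(np.zeros(len(x)), K)   # one GP realisation Z(x)
unnorm = 1.0 / (1.0 + np.exp(-Z)) * 1.0            # L(Z(x)) * b(x|phi), with b = U(0,1)
f = unnorm / (unnorm.sum() * dx)                   # divide by the normalising constant

print((f * dx).sum())                              # integrates to 1
```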

33/71

slide-34
SLIDE 34

ABC for GP density estimation

ABC Algorithm

  • 1. Draw a sample (a set of density functions) from the hierarchical GP prior and generate data from these of the same dimensions as the observed data.
  • 2. Summarise the data using summary statistics: a standard kde for each group, i = 1, . . . , g.
  • 3. Comparison: If the simulated kde’s are close to the observed kde’s, accept the sampled densities from step 1. Otherwise reject.
  • 4. Repeat steps 1–3 to produce ℓ = 1, . . . , m accepted samples.

Also implement functional-regression-adjustment (not discussed).

34/71

slide-35
SLIDE 35

A simulated example

◮ g = 10 groups
◮ Sample sizes of 5, 20, 35, . . . , 140
◮ σ1, σ2 ∼ Gamma(3, 5) and α1, α2 ∼ Gamma(1, 0.1)
◮ Generate 50,000 samples, accept the closest 10% to the observed data
◮ b(x|φ) = U(0, 1)

Focus on groups 1 and 10 with smallest and largest sample sizes.

35/71

slide-36
SLIDE 36

A simulated example

[Figure: (a) true densities and (b) kernel density estimates for Group 1, Group 10 and the other groups.]

◮ The kde for group 10 is reasonable; for group 1 it is poor
◮ The boundary bias problem is evident

36/71

slide-37
SLIDE 37

A simulated example

[Figure: Group 1 posterior density samples and 95% credibility intervals vs. the true density, ABC estimate and kernel estimate.]

◮ Group 1: Samples cluster around the true density
◮ The posterior mean strongly outperforms the initial kde
◮ Less biased around the boundary

37/71

slide-38
SLIDE 38

A simulated example

[Figure: Group 10 posterior density samples and 95% credibility intervals vs. the true density, ABC estimate and kernel estimate.]

◮ Group 10: Small sample variability around the true density
◮ The posterior mean outperforms the initial kde
◮ Less biased around the boundary

38/71

slide-39
SLIDE 39

2-level Hierarchical KDE’s

Add an extra level to the original model:

  f_i(x) = L(Z^1_i(x)) b(x|φ) / c_i(φ, Z^1_i(x))
  Z^1_i(x) ∼ GP(Z^2_{δ(i)}(x), k(x, x′|σ1, α1))   – State level
  Z^2_j(x) ∼ GP(Z^3(x), k(x, x′|σ2, α2))          – Region level
  Z^3(x) ∼ GP(m(x), k(x, x′|σ3, α3))              – National level

for i = 1, . . . , g1 = 27 and j = 1, . . . , g2 = 5, where δ(i) indicates the region to which state i belongs.

39/71

slide-40
SLIDE 40

2-level Hierarchical KDE’s

[Figure: possible density relationships with 27 densities in 5 regions.]

Full details in Rodrigues, Nott & Sisson (2016).

40/71

slide-41
SLIDE 41

Talk Outline

* Rodrigues, Nott & Sisson (2019) https://arxiv.org/abs/1906.04347

  • 1. The Past
  • RJMCMC convergence diagnostics
  • Adaptive MCMC
  • 2. The Present
  • Approximate Bayesian computing
  • Likelihood-free approximate Gibbs sampling∗

  • MCMC for approximate data
  • 3. The Future
  • of Monte Carlo methods research

41/71

slide-42
SLIDE 42

LF approximate Gibbs sampling

Likelihood L(x|θ) is intractable.

Typical LF (ABC) MCMC algorithm:

  • 1. Given current state θt generate proposal θ′ ∼ q(θ|θt)
  • 2. Generate dataset x′ ∼ L(x|θ′)
  • 3. Compute summary statistic s′ = S(x′)
  • 4. Compute the M-H acceptance probability

α(θ_t, θ′) = \min\left\{ 1, \; \frac{K_h(s_{obs}, s′)}{K_h(s_{obs}, s_t)} \, \frac{L(s′|θ′)π(θ′)}{L(s_t|θ_t)π(θ_t)} \, \frac{L(s_t|θ_t) q(θ_t|θ′)}{L(s′|θ′) q(θ′|θ_t)} \right\}
          = \min\left\{ 1, \; \frac{K_h(s_{obs}, s′)}{K_h(s_{obs}, s_t)} \, \frac{π(θ′)}{π(θ_t)} \, \frac{q(θ_t|θ′)}{q(θ′|θ_t)} \right\}.

◮ The acceptance probability is free of intractable likelihoods.
◮ Targets the approximate posterior:

  π_h(θ|s_{obs}) ∝ \int K_h(s_{obs}, s) L(s|θ) π(θ) \, ds
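
A rough sketch of this sampler (my own toy instance: the same N(θ, 1) model, U(−1, 1) prior and mean summary as the earlier toy example, with an indicator kernel K_h):

```python
# Likelihood-free (ABC) MCMC: accept (theta', s') when K_h(s_obs, s') = 1 and the
# prior allows it; with a symmetric proposal, uniform prior and indicator kernel,
# the acceptance probability reduces to the two checks below.
import numpy as np

rng = np.random.default_rng(0)
s_obs, h, prop_sd = 0.0, 0.05, 0.2

def summary(theta):
    return rng.normal(theta, 1.0, size=100).mean()   # simulate a dataset, summarise

theta, s = 0.0, summary(0.0)
while abs(s - s_obs) > h:                            # initialise inside the ABC support
    s = summary(theta)

chain = []
for _ in range(20000):
    theta_prop = rng.normal(theta, prop_sd)          # symmetric proposal q
    s_prop = summary(theta_prop)
    if abs(s_prop - s_obs) <= h and abs(theta_prop) <= 1:
        theta, s = theta_prop, s_prop                # accept; otherwise keep current state
    chain.append(theta)

print(np.mean(chain), np.std(chain))                 # approx. the N(0, 1/10^2) posterior
```

Mixing of such a chain depends on how easily s′ lands close to s_obs, which is exactly the “sticking” issue discussed on the next slide.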

42/71

slide-43
SLIDE 43

LF approximate Gibbs sampling

Two main problems with ABC-MCMC:

  • 1. Need to match sobs and s′ in Kh(sobs, s′)
  • More problematic as |s| increases as then h increases
  • ⇒ low acceptance rates or low accuracy
  • 2. Mixing of chain is ∝ probability of generating s′ close to sobs
  • So mixing rate varies over parameter space

[Figure: trace and histogram of θ from an ABC-MCMC run.]

◮ The chain can “stick” in distributional tails
◮ Giving poor sampler performance
◮ Well known result (Sisson et al. 2007)

Block-update ABC-MCMC (Kousathanas et al., 2016): Perform block MH updates on θ = (θ[1], . . . , θ[b]), using lower-dimensional subsets s[i] ⊆ s.

43/71

slide-44
SLIDE 44

LF approximate Gibbs sampling

Benefits of Blocked ABC-MCMC:

◮ Reduces dimension of comparisons from |s| to |s[i]|

Down sides of Blocked ABC-MCMC:

◮ Doesn’t prevent “sticking”
◮ Some issues in defining a joint distribution by a set of approximations to conditionals

[Figure: trace of τx from a blocked ABC-MCMC run, still exhibiting sticking.]

A Gibbs sampler would solve the sticking problem. However:

◮ Don’t know / can’t evaluate the joint posterior
◮ Similarly for all conditionals
◮ How to implement?

44/71

slide-45
SLIDE 45

LF approximate Gibbs sampling

Basic idea (version #1):

◮ Generate (θ^{(k)}, s^{(k)}) ∼ L(s|θ)b(θ) in a region of high posterior density
  • Only generate these once
◮ Build regression models θ_{[i]} ∼ f(θ_{[−i]}, s) + ε_{[i]}
  • Could be linear, non-linear, non-parametric (NN)
  • Samples weighted according to K_h(s_{obs[i]}, s^{(k)}_{[i]})
◮ At each Gibbs update, sample θ′_{[i]} = \hat{f}(θ_{t[−i]}, s) + ε′_{[i]}
  • If the error distribution is parametric, then ε′_{[i]} ∼ \hat{Dist}(ε_{[i]})
  • If non-parametric, then draw an empirical residual ε′_{[i]} ∼ {θ^{(k)}_{[i]} − \hat{f}(θ_{t[−i]}, s^{(k)})}
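
A loose illustration of version #1 (my own toy sketch: a two-parameter problem, plain linear regressions, empirical residuals, and the kernel weighting omitted for brevity — not the paper's implementation):

```python
# Likelihood-free approximate Gibbs, version #1: pre-simulate (theta, s) pairs once,
# regress theta_[i] on (theta_[-i], s), then Gibbs-update each component as the
# prediction at (theta_t[-i], s_obs) plus an empirical residual.
# Toy model (assumed): s = theta + noise, with s_obs = (0, 0).
import numpy as np

rng = np.random.default_rng(0)
K = 5000
theta_sim = rng.normal(0, 1, size=(K, 2))               # theta^(k) ~ b(theta)
s_sim = theta_sim + rng.normal(0, 0.3, size=(K, 2))     # s^(k) ~ L(s | theta^(k))
s_obs = np.array([0.0, 0.0])

def fit(i):
    X = np.column_stack([np.ones(K), theta_sim[:, 1 - i], s_sim])  # (theta_[-i], s)
    beta, *_ = np.linalg.lstsq(X, theta_sim[:, i], rcond=None)
    resid = theta_sim[:, i] - X @ beta                  # empirical residuals
    return beta, resid

models = [fit(0), fit(1)]                               # built once, before sampling
theta, draws = np.zeros(2), []
for _ in range(5000):
    for i in (0, 1):                                    # one Gibbs sweep
        beta, resid = models[i]
        x = np.concatenate(([1.0], [theta[1 - i]], s_obs))
        theta[i] = x @ beta + rng.choice(resid)         # prediction + residual draw
    draws.append(theta.copy())

print(np.mean(draws, axis=0), np.std(draws, axis=0))
```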

45/71

slide-46
SLIDE 46

LF approximate Gibbs sampling

Basic idea (version #2):

◮ Generate (θ^{(k)}, s^{(k)}) ∼ L(s|θ)b(θ) in a region of high posterior density
  • Only generate these once
◮ At each Gibbs update
  • Build a simple regression model θ_{[i]} ∼ f(θ_{t[−i]}, s) + ε_{[i]}
  • Samples weighted according to K_h([s_{obs[i]}, θ_{t[−i]}], [s^{(k)}_{[i]}, θ^{(k)}_{[−i]}])
  • As before, sample θ′_{[i]} = \hat{f}(θ_{t[−i]}, s) + ε′_{[i]}

Building regressions:

◮ within Gibbs iterations: easier regression modelling, computation proportional to the length of the sampler
◮ before sampling: challenging modelling, computation done once

46/71

slide-47
SLIDE 47

LF approximate Gibbs sampling

  X_{uℓ} ∼ N(µ_u, τ_x^{−1})      µ_u ∼ N(µ, τ_µ^{−1})
  τ_x ∼ Gamma(α_x, ν_x)          τ_µ ∼ Gamma(α_µ, ν_µ)
  µ ∼ N(0, 1),

for ℓ = 1, . . . , L and u = 1, . . . , U.

[Figure: directed graphical model over µ, τ_µ, µ_u, τ_x and X_{uℓ}.]

Standard hierarchical model

◮ Assume the observation model is intractable: X_{uℓ} ∼ N(µ_u, τ_x^{−1})
◮ Estimate approximate conditionals for: µ_u, τ_x
◮ Standard Gibbs updates for µ, τ_µ

Samplers (all non-parametric error distributions):

  • 1. Simple model, global regression
  • 2. Simple model, local regression
  • 3. Flexible model, global regression

Note: Structure of the model:

◮ Helps identify which elements of θ_{[−i]} and s to regress on
◮ Only need to estimate one µ_u, ∀ groups

47/71

slide-48
SLIDE 48

LF approximate Gibbs sampling

[Figure: (c) relative MSE / coverage for τx and µu, (d) posterior for µ1, and (e) posterior for τx, comparing Exact Gibbs, Simple-global, Simple-local, Flexible-global and ABC-PaSS.]

◮ Blocked ABC-MCMC (ABC-PaSS) performs poorly
◮ Simple-global not so good
◮ Simple-local and Flexible-global ok

48/71

slide-49
SLIDE 49

LF approximate Gibbs sampling

[Figure: total computation time (sec, log scale) for each method.]

Method           Synthetic samples      Regression fits        MCMC sampler
                 Time (s)   Number      Time (s)   Number      Time (s)
Exact Gibbs      –          –           –          –           0.42
Simple-global    6.58       10,000      0.02       2           12.75
Simple-local     6.58       10,000      –          20,000∗     95.77
Flexible-global  6.58       10,000      85.85      4           27.51
ABC-PaSS         –          20,000∗     –          –           14.38

∗ – performed within the sampler

Given fit quality: Flexible-global performs the best in this case.

49/71

slide-50
SLIDE 50

LF approximate Gibbs sampling

[Figure: trace of τx under (f) the ABC-PaSS sampler and (g) the Flexible-global approximate Gibbs sampler.]

Good chain mixing compared to MH-based ABC-MCMC

◮ Will be an exact Gibbs sampler if \hat{f}(θ_{[i]}| . . .) → π(θ_{[i]}| . . .); otherwise an approximation
◮ Modelling with potentially inconsistent conditional distributions
◮ A 13,140-parameter intractable, high-dimensional multivariate non-linear state space model is fitted in the paper. A new record for ABC methods?

50/71

slide-51
SLIDE 51

Talk Outline

* X. Fan & Sisson (2019) in preparation.

  • 1. The Past
  • RJMCMC convergence diagnostics
  • Adaptive MCMC
  • 2. The Present
  • Approximate Bayesian computing
  • Likelihood-free approximate Gibbs sampling

  • MCMC for approximate data∗
  • 3. The Future
  • of Monte Carlo methods research

51/71

slide-52
SLIDE 52

MCMC for approximate data

◮ Want to fit the mixture model

  x_i ∼ \sum_{j=1}^{J} w_j N_d(µ_j, Σ_j)

◮ A typical Gibbs sampler is implemented using indicator variables z_i, so that

  x_i | z_i ∼ N_d(µ_{z_i}, Σ_{z_i}),    z_i | {w_k} ∼ Multinomial(1; w_1, . . . , w_J),

  for i = 1, . . . , N.

◮ What if N is big?

  • Higgs dataset: N = 11m, d = 26
  • Very many zi to estimate and mix over
  • Computation high
  • Expect slow mixing

◮ Can we use approximate data to speed this up?

52/71

slide-53
SLIDE 53

Modelling random histograms²

The general approach:

  L(S|θ, φ) ∝ \int_x f(S|x, φ) L(x|θ) \, dx

where

◮ L(x|θ) – the standard, classical data likelihood
◮ f(S|x, φ) – the probability of obtaining histogram S given x

Gist: Fitting the standard classical model L(x|θ), when the data are viewed only through the distribution S as a summary.

◮ L(S|θ) is an approximation of L(X|θ)
◮ As #bins → ∞, L(S|θ) → L(X|θ)
◮ A general method – just focus on random histograms here.

²Beranger, Lin & Sisson (2018). New models for symbolic data analysis. https://arxiv.org/abs/1809.03659

53/71

slide-54
SLIDE 54

Modelling random histograms

[Figure: histogram of n = 1000 data points in 11 bins, with standard GEV and symbolic GEV fits.]

◮ Underlying data X_1, . . . , X_n ∈ R^p ∼ g(x|θ).
◮ Collected into a histogram (random counts) with fixed bins as:

  S = (s_1, . . . , s_B)^⊤ = (#X_i ∈ B_1, . . . , #X_i ∈ B_B)^⊤,   such that \sum_b s_b = n,

  with f(S|x, φ) = 1 if there are s_b observations in bin b for each b = 1, . . . , B, and 0 otherwise.

The likelihood is then (multinomial):

  L(S|θ) ∝ \int_x f(S|x) \prod_{k=1}^{n} g(x_k|θ) \, dx ∝ \prod_{b=1}^{B} \left( \int_{B_b} g(z|θ) \, dz \right)^{s_b}

⇒ generalises the univariate result of McLachlan & Jones (1988).

54/71

slide-55
SLIDE 55

Calculating \int_{B_b} g(z|θ) dz

[Figure: bin boundaries for the 1D and 2D cases.]

1D: The probability of X_i falling in bin (L_{s_1−1}, L_{s_1}] is F(L_{s_1}|θ) − F(L_{s_1−1}|θ).

2D: The probability of X_i falling in bin (L^{(1)}_{s_1−1}, L^{(1)}_{s_1}] × (L^{(2)}_{s_2−1}, L^{(2)}_{s_2}] is

  F(L^{(1)}_{s_1−1}, L^{(2)}_{s_2−1}|θ) − F(L^{(1)}_{s_1−1}, L^{(2)}_{s_2}|θ) − F(L^{(1)}_{s_1}, L^{(2)}_{s_2−1}|θ) + F(L^{(1)}_{s_1}, L^{(2)}_{s_2}|θ).

d-D: Has 2^d components – viable for low d.

55/71

slide-56
SLIDE 56

Example: Fitting a GEV

Suppose x_1, . . . , x_n ∼ GEV(µ, σ, ξ) for large n. Create a histogram of counts s = (s_1, . . . , s_B). The log-likelihood function is then

  ℓ(s|µ, σ, ξ) ∝ \sum_{b=1}^{B} s_b \log [ G(a_{b+1}|µ, σ, ξ) − G(a_b|µ, σ, ξ) ]

[Figure: histogram (n = 1000, 11 bins) with standard GEV and symbolic GEV fits – good fits with moderate bin numbers.]

Computation:

◮ Optimisation of ℓ (very quick)
◮ Creation of the histogram s (slower)
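
A minimal sketch of evaluating this binned log-likelihood (my own illustration; note that scipy's genextreme parameterises the shape as c = −ξ, and the data/bins here are simulated):

```python
# Histogram ("symbolic") GEV log-likelihood:
#   ell(s | mu, sigma, xi) = sum_b s_b * log( G(a_{b+1}) - G(a_b) ).
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(0)
mu, sigma, xi = 0.0, 1.0, 0.1
x = genextreme.rvs(c=-xi, loc=mu, scale=sigma, size=100_000, random_state=rng)

counts, edges = np.histogram(x, bins=30)          # the histogram summary s
edges[0], edges[-1] = -np.inf, np.inf             # let the outer bins cover the tails

def binned_loglik(theta):
    m, s, shape = theta
    cdf = genextreme.cdf(edges, c=-shape, loc=m, scale=s)
    p = np.diff(cdf)                              # bin probabilities G(a_{b+1}) - G(a_b)
    return np.sum(counts * np.log(np.clip(p, 1e-300, None)))

print(binned_loglik([mu, sigma, xi]))             # at the true parameters
print(binned_loglik([mu + 0.5, sigma, xi]))       # lower at a mis-specified mean
```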

56/71

slide-57
SLIDE 57

MCMC for approximate data

◮ Want to fit the mixture model

  x_i ∼ \sum_{j=1}^{J} w_j N_d(µ_j, Σ_j)

◮ A typical Gibbs sampler is implemented using indicator variables z_i, so that

  x_i | z_i ∼ N_d(µ_{z_i}, Σ_{z_i}),    z_i | {w_k} ∼ Multinomial(1; w_1, . . . , w_J),

  for i = 1, . . . , N.

◮ What if N is big?

  • Higgs dataset: N = 11m, d = 26
  • Very many zi to estimate and mix over
  • Computation high
  • Expect slow mixing

◮ Can we use approximate data to speed this up?

57/71

slide-58
SLIDE 58

MCMC for approximate data

If we use histograms:

  log L(s|θ) ∝ \sum_{b=1}^{B} s_b \log \int_{B_b} g(z|θ) \, dz

◮ For all x_i in bin B_b, we have many z_i: Z_b = {z_i : x_i ∈ B_b}, with |Z_b| = big.
◮ There are only J possible values the z_i ∈ Z_b can take, so collapse these to

  (κ_1, . . . , κ_J) ∼ Multinomial(|Z_b|; w_1, . . . , w_J).

◮ A “distribution on indicators in each bin”
◮ Suddenly we only have J × B << n indicators to track
◮ Can update (κ_1, . . . , κ_J) for each bin in one step
◮ The downside is we no longer have a Gibbs sampler – so MH etc.
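
A rough sketch of this collapsed per-bin allocation (my own illustration for a univariate normal mixture: conditional on the current (w, µ, σ), I draw each bin's indicator counts in one multinomial step, with probabilities proportional to w_j times component j's mass on the bin):

```python
# Per-bin allocation counts for a J-component univariate normal mixture fitted to
# histogram counts: instead of N individual indicators, draw (kappa_1,...,kappa_J)
# for each bin, giving only J x B numbers to track.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
w, mu, sd = np.array([0.3, 0.7]), np.array([-2.0, 1.0]), np.array([1.0, 0.5])

# simulated data, reduced to a fixed-bin histogram
z = rng.choice(2, size=100_000, p=w)
x = rng.normal(mu[z], sd[z])
counts, edges = np.histogram(x, bins=np.linspace(-6, 6, 41))   # B = 40 bins

# component-j probability mass on each bin B_b
bin_mass = np.diff(norm.cdf(edges[:, None], loc=mu, scale=sd), axis=0)   # (B, J)
probs = w * bin_mass
probs /= probs.sum(axis=1, keepdims=True)

# one multinomial draw of (kappa_1, ..., kappa_J) per bin
kappa = np.array([rng.multinomial(n_b, p_b) for n_b, p_b in zip(counts, probs)])
print(kappa.sum(axis=0), counts.sum())            # allocations account for every count
```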

58/71

slide-59
SLIDE 59

MCMC for approximate data

[Figure: curve fitting with N = 10^5 and K = 10 or K = 50 components, comparing fits from the original/continuous data, histogram data with h = 1, 0.1, 0.01, 0.001, and interval data.]

◮ N = 10,000, K = 10 components, binwidths h = 1, 0.1, 0.01, 0.001
◮ Appears to perform well
◮ Also shown – mixture fitting for random intervals (not discussed)

59/71

slide-60
SLIDE 60

MCMC for approximate data

◮ n = 100,000, 5 components, 100 sampler iterations
◮ 21 × 21, 61 × 61 and 151 × 151 bins
◮ Slightly blurry for the lowest resolution

60/71

slide-61
SLIDE 61

MCMC for approximate data

[Figure: running time per iteration (log scale) against the number of data points (log scale), for 5 univariate and 5 bivariate Gaussian components: continuous data vs. histogram symbols with increasing numbers of bins.]

◮ Approximate-data computational overheads are constant in N (scaling with B)
◮ Classical data methods are exponential in N
◮ Seems quite promising.

61/71

slide-62
SLIDE 62

MCMC for approximate data

Data representation         Bins    Time (s per iteration)
Continuous data             11M     55.1
Symbolic data (3947×1635)   6.5M    563
Symbolic data (1974×818)    1.6M    117
Symbolic data (988×410)     405K    32
Symbolic data (495×206)     102K    8.64
Symbolic data (248×104)     26K     2.01

◮ Dataset: Higgs, N = 10,999,998, d = 26
◮ K = 15 mixture components (first 2 principal components: 20.3%)
◮ Time is the mean number of seconds per sampler iteration
◮ Good performance for large enough N

62/71

slide-63
SLIDE 63

Talk Outline

* Just for fun!

  • 1. The Past
  • RJMCMC convergence diagnostics
  • Adaptive MCMC
  • 2. The Present
  • Approximate Bayesian computing
  • Likelihood-free approximate Gibbs sampling

  • MCMC for approximate data
  • 3. The Future
  • of Monte Carlo methods research∗

63/71

slide-64
SLIDE 64

Future of Monte Carlo methods

“Spring Bayes” Conference predictions, Brisbane 2006:

My score: [1/3 (some good packages!) + 0 (?) + 1/2] out of 3 = 5/18 ≈ 28%

64/71

slide-65
SLIDE 65

Future of Monte Carlo methods

MCM Conference predictions, Sydney 2019:

  • 1. Quantum computing
  • In 2000: “Remember Bayes/MC methods before

single-core/thread computers?”

  • In 2015: “Remember Bayes/MC methods before

multi-core/thread computers?”

  • In 2030: “Remember Bayes/MC methods before quantum

computers?”

◮ Q computers are getting closer
◮ Q programming already exists
◮ Q Monte Carlo is already a thing
◮ New Q Monte Carlo methods exploiting Q computing strengths?

65/71

slide-66
SLIDE 66

Future of Monte Carlo methods

MCM Conference predictions, Sydney 2019

  • 2. Move away from Dirac mass-based Monte Carlo
  • Using point mass samples to approximate distributions

    \hat{π}(θ|x) ≈ \frac{1}{N} \sum_{i=1}^{N} δ_{θ^{(i)}}(θ),   θ^{(i)} ∼ π(θ|x)

    Using point masses δ_{θ^{(i)}}(θ) is not particularly efficient.

  • Can we move beyond this? E.g. use

    \hat{π}(θ|x) ≈ \frac{1}{N′} \sum_{i=1}^{N′} h(θ|β^{(i)}),   β^{(i)} ∼ q(β|x),

    where N′ << N and h is some specified function.

  • h(θ|β) = δ_{θ^{(i)}}(θ) (with θ^{(i)} = β) is a particular case.

66/71

slide-67
SLIDE 67

Future of Monte Carlo methods

MCM Conference predictions, Sydney 2019

◮ So rather than having

  E_π[g(θ)] = \int g(θ) π(θ|x) dθ ≈ \int g(θ) \left[ \frac{1}{N} \sum_{i=1}^{N} δ_{θ^{(i)}}(θ) \right] dθ = \frac{1}{N} \sum_{i=1}^{N} g(θ^{(i)}),

we would have

  E_π[g(θ)] = \int g(θ) π(θ|x) dθ ≈ \int g(θ) \left[ \frac{1}{N′} \sum_{i=1}^{N′} h(θ|β^{(i)}) \right] dθ = \frac{1}{N′} \sum_{i=1}^{N′} \int g(θ) h(θ|β^{(i)}) dθ = \frac{1}{N′} \sum_{i=1}^{N′} E_{h|β^{(i)}}[g(θ)].

◮ A good choice of h makes the expectations tractable. E.g. h(θ|β) = φ(θ; β_1, β_2²).
◮ If g(θ) = I(θ < −1), then this is averaging 0/1’s versus averaging probabilities.

67/71

slide-68
SLIDE 68

Future of Monte Carlo methods

MCM Conference predictions, Sydney 2019

◮ We know this works in principle because e.g.

  t_ν(θ; µ, σ) = \int φ(θ|µ, β) \, \mathrm{IGamma}\!\left(β \,\middle|\, \frac{ν}{2}, \frac{νσ^2}{2}\right) dβ ≈ \frac{1}{N′} \sum_{i=1}^{N′} φ(θ|µ, β^{(i)}),   β^{(i)} ∼ \mathrm{IGamma}\!\left(β \,\middle|\, \frac{ν}{2}, \frac{νσ^2}{2}\right).

◮ π(θ) = t_ν(µ, σ), h(θ|β) = φ(θ|µ, β), q(β) = IGamma(β; . . . ).
◮ Considerably more efficient.
◮ But how to generalise?
◮ Links to variational methods?
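
A quick numerical check of this example (my own sketch): estimate P(θ < −1) under a t_ν(0, 1) target either from point-mass draws of θ, or by averaging the tractable normal probabilities over draws of β:

```python
# t_nu(mu, sigma) as a scale mixture of normals:
#   theta | beta ~ N(mu, beta),  beta ~ InvGamma(nu/2, nu*sigma^2/2).
# For g(theta) = I(theta < -1), compare averaging 0/1's with averaging the
# exact per-draw probabilities E_{h|beta}[g] = Phi((-1 - mu)/sqrt(beta)).
import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(0)
nu, mu, sigma, N = 5, 0.0, 1.0, 10_000

beta = 1.0 / rng.gamma(nu / 2, scale=2.0 / (nu * sigma**2), size=N)   # InvGamma draws
theta = rng.normal(mu, np.sqrt(beta))                                 # marginally t_nu

point_mass = np.mean(theta < -1)                                      # averaging 0/1's
averaged_probs = np.mean(norm.cdf((-1 - mu) / np.sqrt(beta)))         # averaging probabilities

print(point_mass, averaged_probs, t.cdf(-1, df=nu))   # both approximate the exact value,
                                                      # the second with lower variance
```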

68/71

slide-69
SLIDE 69

Future of Monte Carlo methods

MCM Conference predictions, Sydney 2019

  • 3. Even greater focus on faster approximate methods

What isn’t going away:

  • Increasing dataset size & complexity
  • Crazy/innovative ideas coming out of:
  • Computational biology (ABC, pseudo-marginal, annealing . . . )
  • Machine-learning (NN’s, GANs, variational inference . . . )
  • Physics (everything else!)

So we should be looking to:

  • Be able to handle more data, more complex models
  • Exact inference may be a luxury
  • Approximation may become the norm
  • Provide theory for the quality of these
  • Innovate with other disciplines (& fix their maths)

69/71

slide-70
SLIDE 70

Future of Monte Carlo methods

MCM Conference predictions, Sydney 2019

  • 1. Quantum computing
  • 2. Move away from Dirac mass-based Monte Carlo
  • 3. Even greater focus on faster approximate methods

“Spring Bayes” Conference predictions, Brisbane 2006:

◮ My score: ≈ 28%

⇒ 1 out of the above 3 predictions will come true. BUT WHICH ONE!?!?

70/71

slide-71
SLIDE 71

Australian Centre of Excellence for Mathematical and Statistical Frontiers in Big Data, Big Models, New Insights

Follow @ABC Research on www.twitter.com

◮ Workshop: The Future of Statistics, 31st Oct–1st Sept, UNSW Sydney
◮ Friendly challenge to MCM speakers: What will be the hot research topic at MCM in 2029/31?

THANK YOU

71/71