Markov chain Monte Carlo methods

  1. Markov chain Monte Carlo methods. Youssef Marzouk, Department of Aeronautics and Astronautics, Center for Computational Engineering, Massachusetts Institute of Technology. http://uqgroup.mit.edu, ymarz@mit.edu. 7 July 2015.

  2. Overview of the lecture:
     - Markov chain Monte Carlo (MCMC)
     - Metropolis-Hastings algorithm, transition kernels, ergodicity
     - Mixtures and cycles of kernels
     - Gibbs sampling
     - Gradient-exploiting MCMC, adaptive MCMC, other practicalities
     - Using approximations (e.g., approximate likelihoods) within MCMC

  3. Why Markov chain Monte Carlo (MCMC)? In general, MCMC provides a means of sampling (“simulating”) from an arbitrary distribution. The density π(x) need be known only up to a normalizing constant. Utility in inference and prediction: write both as posterior expectations, E_π f.

  4. Why Markov chain Monte Carlo (MCMC)? (cont.) Then
        E_π f ≈ (1/n) ∑_{i=1}^{n} f(x^{(i)}),
     where the x^{(i)} will be asymptotically distributed according to π, but the x^{(i)} will not be i.i.d. In other words, we must pay a price!

  5. Construction of an MCMC sampler. Define a Markov chain (i.e., discrete time). For real-valued random variables, the chain has a continuous state space (e.g., R^d). Ingredients of the definition:
     - Initial distribution, x_0 ∼ π_0.
     - Transition kernel K(x_n, x_{n+1}):
           P(X_{n+1} ∈ A | X_n = x) = ∫_A K(x, x′) dx′.
       (Analogy: the matrix of transition probabilities for a finite state space; a code sketch of this analogy follows.)
     - Markov property: X_{n+1} depends only on X_n.
     Goal: design the transition kernel K such that the chain converges asymptotically to the target distribution π, independently of the initial distribution (starting point).
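
As a concrete illustration of the finite-state analogy, here is a minimal sketch (plain NumPy; the 3-state transition matrix is an invented example, not from the slides). It computes the stationary distribution of a transition matrix and checks that the chain's empirical visit frequencies converge to it, independently of the starting state.

```python
import numpy as np

# Hypothetical 3-state transition matrix; row i is the distribution of
# X_{n+1} given X_n = i (the finite-state analogue of the kernel K).
P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

# Stationary distribution: left eigenvector of P with eigenvalue 1,
# i.e. pi P = pi, normalized to sum to one.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
pi /= pi.sum()

# Simulate the chain and compare visit frequencies with pi.
rng = np.random.default_rng(0)
n_steps = 200_000
x = 0                               # arbitrary starting state
counts = np.zeros(3)
for _ in range(n_steps):
    x = rng.choice(3, p=P[x])       # draw X_{n+1} ~ K(x, .)
    counts[x] += 1

print("stationary pi:  ", np.round(pi, 4))
print("empirical freq.:", np.round(counts / n_steps, 4))
```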

  6. Construction of an MCMC sampler (cont.) Goal: choose the transition kernel K such that the chain converges asymptotically to the target distribution π, independently of the starting point. Use realizations of X_n, X_{n−1}, ... in a Monte Carlo estimator of posterior expectations (an ergodic average). We would like the chain to converge to the target distribution quickly and to produce samples as close to independent as possible. The price for non-i.i.d. samples: greater variance in MC estimates of posterior expectations.

  7. Metropolis-Hastings algorithm. A simple recipe!
     1. Draw a proposal y from q(y | x_n).
     2. Calculate the acceptance ratio
            α(x_n, y) = min{ 1, [π(y) q(x_n | y)] / [π(x_n) q(y | x_n)] }.
     3. Put x_{n+1} = y with probability α(x_n, y), and x_{n+1} = x_n with probability 1 − α(x_n, y).

  8. Metropolis-Hastings algorithm (cont.) The same recipe as above, plus a very cool demo, thanks to Chi Feng (MIT): http://chifeng.scripts.mit.edu/stuff/mcmc-demo/ (A code sketch of a single step of the recipe follows.)
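
One step of this recipe is only a few lines of code. The sketch below (my own illustration, not the linked demo) performs a single Metropolis-Hastings update with a symmetric Gaussian random-walk proposal; because the proposal is symmetric the q-ratio cancels, and the target is supplied as an unnormalized log density for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(1)

def mh_step(x_n, log_target, proposal_std=1.0):
    """One M-H step. log_target(x) returns log pi(x) up to an additive
    constant; the normalizing constant cancels in the acceptance ratio."""
    y = x_n + proposal_std * rng.standard_normal()          # 1. propose y ~ q(y | x_n)
    log_alpha = min(0.0, log_target(y) - log_target(x_n))   # 2. acceptance ratio (q symmetric)
    if np.log(rng.uniform()) < log_alpha:                   # 3. accept with probability alpha
        return y
    return x_n                                              #    otherwise stay at x_n

# Example: one step targeting a standard normal (unnormalized).
x_next = mh_step(0.5, lambda x: -0.5 * x**2)
```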

  9. Metropolis-Hastings algorithm. Notes on the algorithm:
     - If q(y | x_n) ∝ π(y), then α = 1. Thus we “correct” for sampling from q, rather than from π, via the Metropolis acceptance step.
     - q does not have to be symmetric (a “Hastings” proposal). If the proposal is symmetric, the acceptance probability simplifies to min{1, π(y)/π(x_n)} (the original Metropolis algorithm).
     - π need be evaluated only up to a multiplicative constant.

  10. Metropolis-Hastings algorithm. What is the transition kernel of the Markov chain we have just defined? Hint: it is not q!

  11. Metropolis-Hastings algorithm (cont.) Informally, the transition kernel is
        K(x_n, x_{n+1}) = p(x_{n+1} | accept) P[accept] + p(x_{n+1} | reject) P[reject].

  12. Metropolis-Hastings algorithm (cont.) More precisely, we have
        K(x_n, x_{n+1}) = p(x_{n+1} | x_n) = q(x_{n+1} | x_n) α(x_n, x_{n+1}) + δ_{x_n}(x_{n+1}) r(x_n),
     where r(x_n) ≡ ∫ q(y | x_n) (1 − α(x_n, y)) dy.

  13. Metropolis-Hastings algorithm. Now, some theory. What are the key questions?
     1. Is π a stationary distribution of the chain? (Is the chain π-invariant?) Stationarity: π is such that X_n ∼ π ⇒ X_{n+1} ∼ π.
     2. Does the chain converge to stationarity? In other words, as n → ∞, does L(X_n) converge to π?
     3. Can we use paths of the chain in Monte Carlo estimates?

  14. Metropolis-Hastings algorithm (cont.) A sufficient (but not necessary) condition for (1) is detailed balance (also called ‘reversibility’):
        π(x_n) K(x_n, x_{n+1}) = π(x_{n+1}) K(x_{n+1}, x_n).

  15. Metropolis-Hastings algorithm (cont.) Detailed balance expresses an equilibrium in the flow of the chain. Hence
        ∫ π(x_n) K(x_n, x_{n+1}) dx_n = ∫ π(x_{n+1}) K(x_{n+1}, x_n) dx_n = π(x_{n+1}) ∫ K(x_{n+1}, x_n) dx_n = π(x_{n+1}),
     so π is indeed stationary. As an exercise, verify detailed balance for the M-H kernel defined on the previous slide (a numerical version of this check is sketched below).
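
For a finite state space the exercise can also be done numerically. The sketch below (an invented 4-point target, not from the slides) assembles the full M-H kernel, including the rejection mass r(x) on the diagonal, and verifies both detailed balance and π-invariance.

```python
import numpy as np

# Unnormalized target on a 4-point state space (invented numbers).
w = np.array([1.0, 3.0, 2.0, 0.5])
pi = w / w.sum()
m = len(w)

# Symmetric proposal: pick one of the other states uniformly at random.
q = (np.ones((m, m)) - np.eye(m)) / (m - 1)

# Since q is symmetric, alpha(x, y) = min(1, pi(y) / pi(x)).
alpha = np.minimum(1.0, w[None, :] / w[:, None])

# Off-diagonal kernel entries: K(x, y) = q(y | x) * alpha(x, y);
# the diagonal carries the rejection mass r(x) = 1 - sum_{y != x} K(x, y).
K = q * alpha
np.fill_diagonal(K, 1.0 - K.sum(axis=1))

# Detailed balance: pi(x) K(x, y) == pi(y) K(y, x) for all x, y.
flow = pi[:, None] * K
print("detailed balance:", np.allclose(flow, flow.T))
# pi-invariance follows: pi K == pi.
print("pi-invariance:   ", np.allclose(pi @ K, pi))
```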

  16. Metropolis-Hastings algorithm. Beyond π-invariance, we also need to establish (2) and (3) from the previous slide. This leads to additional technical requirements:
     - π-irreducibility: for every set A with π(A) > 0, there exists n such that K^n(x, A) > 0 for all x. Intuition: the chain visits any measurable subset with nonzero probability in a finite number of steps. This helps you “forget” the initial condition. It is sufficient to have q(y | x) > 0 for every (x, y) ∈ χ × χ.
     - Aperiodicity: “don’t get trapped in cycles.”

  17. Metropolis-Hastings algorithm. When these requirements are satisfied (i.e., the chain is irreducible and aperiodic, with stationary distribution π), we have:
     1. lim_{n→∞} ‖ ∫ K^n(x, ·) μ(dx) − π(·) ‖_TV = 0 for every initial distribution μ. Here K^n is the kernel for n transitions, and ∫ K^n(x, ·) μ(dx) = L(X_n) yields the law of X_n. The total variation distance ‖μ_1 − μ_2‖_TV = sup_A |μ_1(A) − μ_2(A)| is the largest possible difference between the probabilities that the two measures can assign to the same event. (A small numerical illustration for a finite state space follows.)
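
For a finite state space the total variation distance can be evaluated exactly, so the convergence statement above can be seen directly. A small sketch (invented two-state chain, not from the slides), iterating the law of X_n:

```python
import numpy as np

# Two-state transition matrix (invented) and its stationary distribution.
K = np.array([[0.9, 0.1],
              [0.3, 0.7]])
pi = np.array([0.75, 0.25])          # check: pi K = pi

mu = np.array([1.0, 0.0])            # initial distribution: all mass on state 0
for n in range(1, 21):
    mu = mu @ K                      # law of X_n after n transitions
    # For discrete measures, sup_A |mu(A) - pi(A)| = 0.5 * sum |mu - pi|.
    tv = 0.5 * np.abs(mu - pi).sum()
    if n in (1, 2, 5, 10, 20):
        print(f"n = {n:2d}: ||L(X_n) - pi||_TV = {tv:.6f}")
```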

  18. Metropolis-Hastings algorithm (cont.) Under the same requirements we also have:
     2. For h ∈ L¹_π,
            lim_{n→∞} (1/n) ∑_{i=1}^{n} h(x^{(i)}) = E_π[h]   with probability 1.
        This is a strong law of large numbers that allows computation of posterior expectations. Obtaining a central limit theorem, or more generally saying anything about the rate of convergence to stationarity, requires additional conditions (e.g., geometric ergodicity). See [Roberts & Rosenthal 2004] for an excellent survey of MCMC convergence results.

  19. Metropolis-Hastings algorithm. What about the quality of MCMC estimates?

  20. Metropolis-Hastings algorithm (cont.) What is the price one pays for correlated samples? Compare Monte Carlo (i.i.d.) and MCMC estimates of E_π h (and for the latter, assume we have a CLT):
        Var[h̄_n]^{MC} = Var_π[h(X)] / n,
        Var[h̄_n]^{MCMC} = θ Var_π[h(X)] / n,
     where
        θ = 1 + 2 ∑_{s>0} corr(h(X_i), h(X_{i+s}))
     is the integrated autocorrelation. (A rough way to estimate θ from a chain is sketched below.)
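
The factor θ can be estimated from the chain's sample autocorrelations. Below is a rough sketch using a simple fixed-window truncation of the sum (practical implementations use adaptive windowing rules); the AR(1) test chain has a known limiting value θ = (1 + a)/(1 − a) to compare against.

```python
import numpy as np

def integrated_autocorrelation(h_samples, max_lag=None):
    """Estimate theta = 1 + 2 * sum_{s>0} corr(h(X_i), h(X_{i+s})),
    truncating the sum at a fixed maximum lag."""
    h = np.asarray(h_samples, dtype=float)
    h = h - h.mean()
    n = len(h)
    if max_lag is None:
        max_lag = min(n // 10, 1000)
    var = np.dot(h, h) / n
    rho = [np.dot(h[:-s], h[s:]) / (n * var) for s in range(1, max_lag + 1)]
    return 1.0 + 2.0 * np.sum(rho)

# Test on an AR(1) chain with lag-1 correlation a.
rng = np.random.default_rng(2)
a = 0.9
x = np.zeros(100_000)
for i in range(1, len(x)):
    x[i] = a * x[i - 1] + rng.standard_normal()

theta = integrated_autocorrelation(x)
print(f"estimated theta ~ {theta:.1f}   exact (1+a)/(1-a) = {(1 + a) / (1 - a):.1f}")
print(f"effective sample size ~ n / theta = {len(x) / theta:.0f}")
```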

  21. Metropolis-Hastings algorithm. Now try a very simple computational demonstration: MCMC sampling from a univariate distribution. (A minimal example is sketched below.)
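
A minimal version of such a demonstration (my own sketch, not the original demo code): random-walk Metropolis targeting a univariate bimodal density known only up to a constant, with MCMC estimates of two expectations compared against their exact values.

```python
import numpy as np

rng = np.random.default_rng(3)

# Unnormalized log target: equal-weight mixture of N(-2, 1) and N(2, 1).
def log_target(x):
    return np.logaddexp(-0.5 * (x + 2.0) ** 2, -0.5 * (x - 2.0) ** 2)

def random_walk_metropolis(log_target, x0, n_samples, proposal_std):
    x, logp = x0, log_target(x0)
    chain = np.empty(n_samples)
    n_accept = 0
    for i in range(n_samples):
        y = x + proposal_std * rng.standard_normal()    # propose
        logp_y = log_target(y)
        if np.log(rng.uniform()) < logp_y - logp:       # accept w.p. min(1, pi(y)/pi(x))
            x, logp = y, logp_y
            n_accept += 1
        chain[i] = x                                    # on rejection, repeat x
    return chain, n_accept / n_samples

chain, acc = random_walk_metropolis(log_target, x0=0.0, n_samples=50_000, proposal_std=2.5)
burn = 5_000
print(f"acceptance rate ~ {acc:.2f}")
print(f"E[X]   ~ {chain[burn:].mean():+.3f}   (exact: 0)")
print(f"E[X^2] ~ {(chain[burn:] ** 2).mean():.3f}   (exact: 5)")
```

The proposal standard deviation is the main tuning knob: too small and successive samples are highly correlated, too large and most proposals are rejected.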
