DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE · GRAVIS 2018 | BASEL

Understanding MCMC
Marcel Lüthi, University of Basel
Slides based on a presentation by Sandro Schönborn
The big picture

The Metropolis-Hastings algorithm induces a Markov chain which satisfies the detailed balance condition for the target distribution $p(x)$. If the Markov chain is aperiodic and irreducible, it converges to its equilibrium distribution; that equilibrium is $p(x)$, so the chain's samples are samples from $p(x)$.
Understanding Markov Chains
Markov Chain

• Sequence of random variables $(X_i)_{i=1}^{N}$, $X_i \in S$, with joint distribution
  $P(X_1, X_2, \dots, X_N) = P(X_1) \prod_{i=2}^{N} P(X_i \mid X_{i-1})$
  (initial distribution $P(X_1)$ and transition probabilities $P(X_i \mid X_{i-1})$)
• Simplifications (for our analysis):
  • Discrete state space: $S = \{1, 2, \dots, K\}$ (automatically true if we use computers, e.g. 32-bit floats)
  • Homogeneous chain: $P(X_i = l \mid X_{i-1} = k) = T_{lk}$
[Figure: example transition diagram over three states with transition probabilities 1/6, 1/3, 1/2, 1]
Example: Markov Chain

• Simple weather model: dry (D) or rainy (R) hour
• Condition in the next hour, $X_{t+1}$:
  • State space $S = \{D, R\}$
  • Stochastic: $P(X_{t+1} \mid X_t)$
  • Depends only on the current condition $X_t$
• Transition probabilities: $P(D \leftarrow D) = 0.95$, $P(R \leftarrow D) = 0.05$, $P(R \leftarrow R) = 0.8$, $P(D \leftarrow R) = 0.2$
• Draw samples from the chain:
  • Initial: $X_0 = D$
  • Evolution: $P(X_{t+1} \mid X_t)$
  DDDDDDDDRRRRRRRRRRRDDDDDDDDDDD
  DDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
  DDDDDDDDDRDD...
• Long-term behavior:
  • Does it converge? What is the average probability of rain?
  • Dynamics? How quickly will it converge?
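The two-state weather chain above can be simulated in a few lines. This is a sketch, not code from the slides; the state encoding and the seed are our own choices:

```python
import random

# Transition probabilities of the weather chain: P(next | current),
# with states "D" (dry) and "R" (rainy), as on the slide.
T = {
    "D": {"D": 0.95, "R": 0.05},
    "R": {"D": 0.20, "R": 0.80},
}

def simulate(start, steps, seed=0):
    """Draw a trajectory X_0, X_1, ..., X_steps from the chain."""
    rng = random.Random(seed)
    x, path = start, [start]
    for _ in range(steps):
        # Move to the next state according to the current row of T.
        x = "D" if rng.random() < T[x]["D"] else "R"
        path.append(x)
    return "".join(path)

print(simulate("D", 60))   # a 61-character trajectory of D's and R's
```

Running a long trajectory gives an empirical rain frequency near 20%, matching the equilibrium distribution derived on the later slides.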
Discrete Homogeneous Markov Chain

Formally, in terms of linear algebra:
• Distribution (vector):
  $P(X_i):\quad \mathbf{p}_i = \begin{pmatrix} P(X_i = 1) \\ \vdots \\ P(X_i = K) \end{pmatrix}$
• Transition probability (transition matrix):
  $P(X_i \mid X_{i-1}):\quad T = \begin{pmatrix} P(1 \leftarrow 1) & \cdots & P(1 \leftarrow K) \\ \vdots & \ddots & \vdots \\ P(K \leftarrow 1) & \cdots & P(K \leftarrow K) \end{pmatrix}$
  $T_{lk} = P(l \leftarrow k) = P(X_i = l \mid X_{i-1} = k)$
Evolution of the Initial Distribution

• Evolution of $P(X_1) \to P(X_2)$:
  $P(X_2 = l) = \sum_{k \in S} P(l \leftarrow k)\, P(X_1 = k)$, i.e. $\mathbf{p}_2 = T \mathbf{p}_1$
• Evolution over $n$ steps: $\mathbf{p}_{n+1} = T^n \mathbf{p}_1$
• Is there a stable (steady-state) distribution $\mathbf{p}^*$?
  $\mathbf{p}^* = T \mathbf{p}^*$: a stable distribution is an eigenvector of $T$ with eigenvalue $\lambda = 1$.
Steady-State Distribution $\mathbf{p}^*$

• It exists:
  • $T$ is subject to the normalization constraint $\sum_l T_{lk} = 1$, i.e. $(1\; 1\; \cdots\; 1)\, T = (1\; 1\; \cdots\; 1)$: the all-ones vector is a left eigenvector with eigenvalue 1.
  • Hence $T$ has eigenvalue $\lambda = 1$ (left and right eigenvalues are the same).
  • The steady-state distribution is a corresponding right eigenvector: $T \mathbf{p}^* = \mathbf{p}^*$.
• Does an arbitrary initial distribution evolve to $\mathbf{p}^*$?
  • Convergence?
  • Uniqueness?
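As a concrete check, the steady state of the weather chain can be computed as the right eigenvector of $T$ for eigenvalue 1. This NumPy sketch is ours, not part of the slides:

```python
import numpy as np

# Column-stochastic transition matrix of the weather chain
# (column k holds P(. <- k); state order [D, R], columns sum to 1).
T = np.array([[0.95, 0.20],
              [0.05, 0.80]])

# The steady state p* is a right eigenvector of T with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(T)
i = np.argmin(np.abs(eigvals - 1.0))   # locate the eigenvalue lambda = 1
p_star = np.real(eigvecs[:, i])
p_star = p_star / p_star.sum()         # normalize to a probability vector

print(p_star)       # [0.8, 0.2]: 80% dry, 20% rain in the long run
print(T @ p_star)   # T p* = p*
```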
Equilibrium Distribution $\mathbf{p}^*$

• Additional requirement on $T$: $(T^n)_{lk} > 0$ for some $n > N_0$. Such a chain is called irreducible and aperiodic (which implies ergodic):
  • All states are connected using at most $N_0$ steps
  • Return intervals to a given state are irregular
• Perron-Frobenius theorem for positive matrices:
  • PF1: $\lambda_1 = 1$ is a simple eigenvalue with a one-dimensional eigenspace (uniqueness)
  • PF2: $\lambda_1 = 1$ is dominant: $|\lambda_i| < 1$ for all $i \neq 1$ (convergence)
• $\mathbf{p}^*$ is a stable attractor, called the equilibrium distribution: $T \mathbf{p}^* = \mathbf{p}^*$
Convergence

• Time evolution of an arbitrary distribution $\mathbf{p}_0$: $\mathbf{p}_n = T^n \mathbf{p}_0$
• Expand $\mathbf{p}_0$ in the eigenbasis of $T$: $T \mathbf{e}_i = \lambda_i \mathbf{e}_i$, with $|\lambda_i| < \lambda_1 = 1$ for $i \neq 1$ and $|\lambda_k| \geq |\lambda_{k+1}|$:
  $\mathbf{p}_0 = \sum_{i=1}^{K} c_i \mathbf{e}_i$
  $T \mathbf{p}_0 = \sum_{i=1}^{K} c_i \lambda_i \mathbf{e}_i$
  $T^n \mathbf{p}_0 = \sum_{i=1}^{K} c_i \lambda_i^n \mathbf{e}_i = c_1 \mathbf{e}_1 + \lambda_2^n c_2 \mathbf{e}_2 + \lambda_3^n c_3 \mathbf{e}_3 + \cdots$
Convergence (II)

$T^n \mathbf{p}_0 = \sum_{i=1}^{K} c_i \lambda_i^n \mathbf{e}_i = c_1 \mathbf{e}_1 + \lambda_2^n c_2 \mathbf{e}_2 + \lambda_3^n c_3 \mathbf{e}_3 + \cdots \approx \mathbf{p}^* + \lambda_2^n c_2 \mathbf{e}_2 \quad (n \gg 1)$
with $c_1 \mathbf{e}_1 = \mathbf{p}^*$ (normalizations: $\|\mathbf{e}_1\|_1 = 1$, $\sum_j p^*_j = 1$).
• We have convergence: $T^n \mathbf{p}_0 \xrightarrow{\;n \to \infty\;} \mathbf{p}^*$
• Rate of convergence: $\|\mathbf{p}_n - \mathbf{p}^*\| \approx |\lambda_2|^n\, \|c_2 \mathbf{e}_2\|$
Example: Weather Dynamics

Rain forecast for stable versus mixed weather (state order $[D, R]$):

  stable: $T_s = \begin{pmatrix} 0.95 & 0.2 \\ 0.05 & 0.8 \end{pmatrix}$        mixed: $T_m = \begin{pmatrix} 0.85 & 0.6 \\ 0.15 & 0.4 \end{pmatrix}$

• Both have equilibrium $\mathbf{p}^* = (0.8,\; 0.2)^\top$: the long-term average probability of rain is 20%.
• Eigenvalues: stable $1, 0.75$; mixed $1, 0.25$.
• Rainy now, next hours?
  stable: RRRRDDDDDDDDDDDD DDDDDDDDDDDDD...
  mixed: RDDDDDDDDDDDDDDD RDDDRDDDDDDDD...
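A quick numerical check (not on the slide) confirms the role of $\lambda_2$: starting from a rainy state, the distance to equilibrium shrinks by exactly $|\lambda_2|$ per step for these two-state chains, since $\mathbf{p}_0 - \mathbf{p}^*$ is proportional to $\mathbf{e}_2$:

```python
import numpy as np

# Stable and mixed weather chains from the slide (state order [D, R]).
T_stable = np.array([[0.95, 0.20],
                     [0.05, 0.80]])   # eigenvalues 1, 0.75
T_mixed  = np.array([[0.85, 0.60],
                     [0.15, 0.40]])   # eigenvalues 1, 0.25
p_star = np.array([0.8, 0.2])         # shared equilibrium
p0 = np.array([0.0, 1.0])             # "rainy now"

for name, T in [("stable", T_stable), ("mixed", T_mixed)]:
    p, errs = p0, []
    for _ in range(8):
        p = T @ p
        errs.append(np.abs(p - p_star).sum())   # L1 distance to equilibrium
    # Consecutive errors differ by a factor |lambda_2|: 0.75 vs. 0.25.
    print(name, [round(e, 4) for e in errs])
```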
Markov Chain: First Results

• Aperiodic and irreducible chains are ergodic (every state reachable after $> N$ steps, irregular return times):
  • Convergence towards a unique equilibrium distribution $\mathbf{p}^*$
• Equilibrium distribution $\mathbf{p}^*$:
  • Eigenvector of $T$ with eigenvalue $\lambda = 1$: $T \mathbf{p}^* = \mathbf{p}^*$
• Rate of convergence: exponential decay governed by the second-largest eigenvalue, $\propto |\lambda_2|^n$
• This is only useful if we can design a chain with a desired equilibrium distribution!
Detailed Balance

• A special property of some Markov chains:
  A distribution $p$ satisfies detailed balance if the total flow of probability between every pair of states is equal (a local equilibrium):
  $P(l \leftarrow k)\, p_k = P(k \leftarrow l)\, p_l$
• Detailed balance implies that $p$ is the equilibrium distribution:
  $(T\mathbf{p})_l = \sum_k T_{lk}\, p_k = \sum_k T_{kl}\, p_l = p_l$
• Most MCMC methods construct chains which satisfy detailed balance.
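For the weather chain, this local-equilibrium condition can be verified numerically. The following sketch (ours, not from the slides) checks both detailed balance and the stationarity it implies:

```python
import numpy as np

# Weather chain and its equilibrium p* = (0.8, 0.2), state order [D, R].
T = np.array([[0.95, 0.20],
              [0.05, 0.80]])
p = np.array([0.8, 0.2])

# Entry (l, k) of this matrix is T_{lk} p_k: the probability flow k -> l.
balance = T * p

# Detailed balance: equal flow in both directions for every pair (l, k).
assert np.allclose(balance, balance.T)

# Summing the flow over k then gives (T p)_l = p_l: p is stationary.
assert np.allclose(T @ p, p)
print("detailed balance and stationarity hold")
```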
The Metropolis-Hastings Algorithm
MCMC to draw samples from an arbitrary distribution
Idea of the Metropolis-Hastings algorithm

• Design a Markov chain which satisfies the detailed balance condition for the target distribution $P(x)$:
  $T_{MH}(x' \leftarrow x)\, P(x) = T_{MH}(x \leftarrow x')\, P(x')$
• Ergodicity then ensures that the chain converges to this distribution.
Attempt 1: A simple algorithm

• Initialize with a sample $x$
• Generate the next sample from the current sample $x$:
  1. Draw a sample $x'$ from $Q(x' \mid x)$ ("proposal")
  2. Emit the new state $x'$ as sample
• It is a Markov chain
• But we would need to choose $Q$ for every $P$ to satisfy detailed balance:
  $Q(x' \leftarrow x)\, P(x) = Q(x \leftarrow x')\, P(x')$
Attempt 2: A more general solution

• Initialize with a sample $x$
• Generate the next sample from the current sample $x$:
  1. Draw a sample $x'$ from $Q(x' \mid x)$ ("proposal")
  2. With probability $a(x', x)$, emit $x'$ as the new sample
  3. With probability $1 - a(x', x)$, emit $x$ as the new sample
• It is a Markov chain
• The acceptance rule $a$ decouples $Q$ from $P$
• How should we choose $a$?
What is the acceptance function $a$?

$T_{MH}(x' \leftarrow x)\, P(x) = T_{MH}(x \leftarrow x')\, P(x')$
$a(x' \mid x)\, Q(x' \mid x)\, P(x) = a(x \mid x')\, Q(x \mid x')\, P(x')$

Case A: $x' = x$
• Detailed balance is trivially satisfied for every $a(x', x)$

Case B: $x' \neq x$
• We have the following requirement:
  $\dfrac{a(x' \mid x)}{a(x \mid x')} = \dfrac{Q(x \mid x')\, P(x')}{Q(x' \mid x)\, P(x)}$
What is the acceptance function $a$?

Requirement: choose $a(x' \mid x)$ such that
$\dfrac{a(x' \mid x)}{a(x \mid x')} = \dfrac{Q(x \mid x')\, P(x')}{Q(x' \mid x)\, P(x)}$
• $a(x \mid x')$ is a probability: $0 \leq a(x \mid x') \leq 1$
• It is easy to check that
  $a(x' \mid x) = \min\left(1,\; \dfrac{Q(x \mid x')\, P(x')}{Q(x' \mid x)\, P(x)}\right)$
  satisfies this property.
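Putting the pieces together, a random-walk Metropolis-Hastings sampler might look as follows. This is a sketch, not code from the slides; it uses a symmetric Gaussian proposal, for which the $Q$ terms cancel in the acceptance ratio, and samples a standard normal known only up to normalization:

```python
import math
import random

def metropolis_hastings(log_p, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings sketch.

    log_p: log of the (unnormalized) target density P(x).
    With a symmetric proposal Q(x'|x) = Q(x|x'), the acceptance
    probability reduces to a(x'|x) = min(1, P(x') / P(x)).
    """
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        x_prop = x + rng.gauss(0.0, step)            # draw proposal x' ~ Q(.|x)
        log_a = min(0.0, log_p(x_prop) - log_p(x))   # log acceptance probability
        if rng.random() < math.exp(log_a):           # accept with probability a
            x = x_prop
        samples.append(x)                            # emit current state
    return samples

# Example: standard normal target, known only up to normalization.
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=50_000)
burn = samples[5_000:]                               # discard burn-in
mean = sum(burn) / len(burn)
var = sum((s - mean) ** 2 for s in burn) / len(burn)
print(mean, var)                                     # close to 0 and 1
```

Note the rejection step: when the proposal is rejected, the current state is emitted again, which is exactly what keeps the chain's equilibrium equal to $P$.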