DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE · GRAVIS 2018 | BASEL

Understanding MCMC
Marcel Lüthi, University of Basel
Slides based on a presentation by Sandro Schönborn
The big picture

The Metropolis-Hastings algorithm induces a Markov chain which satisfies the detailed balance condition for the target distribution $p(x)$. If the Markov chain is aperiodic and irreducible, it converges to its equilibrium distribution; that equilibrium is $p(x)$, so the chain's samples are samples from $p(x)$.
Understanding Markov Chains
Markov Chain

• Sequence of random variables $(X_i)_{i=1}^{N}$, $X_i \in S$, with joint distribution
  $P(X_1, X_2, \dots, X_N) = P(X_1) \prod_{i=2}^{N} P(X_i \mid X_{i-1})$
  (initial distribution $P(X_1)$ and transition probabilities $P(X_i \mid X_{i-1})$)
• Simplifications (for our analysis):
  • Discrete state space: $S = \{1, 2, \dots, K\}$ (automatically true if we use computers, e.g. 32-bit floats)
  • Homogeneous chain: $P(X_i = l \mid X_{i-1} = k) = T_{lk}$
[Figure: example transition diagram over three states with transition probabilities 1/6, 1/3, 1/2, 1]
Example: Markov Chain

• Simple weather model: dry (D) or rainy (R) hour
• Condition in the next hour, $X_{t+1}$:
  • State space $S = \{D, R\}$
  • Stochastic: $P(X_{t+1} \mid X_t)$
  • Depends only on the current condition $X_t$
• Transition probabilities: $P(D \leftarrow D) = 0.95$, $P(R \leftarrow D) = 0.05$, $P(R \leftarrow R) = 0.8$, $P(D \leftarrow R) = 0.2$
• Draw samples from the chain:
  • Initial: $X_0 = D$
  • Evolution: $P(X_{t+1} \mid X_t)$
  DDDDDDDDRRRRRRRRRRRDDDDDDDDDDD
  DDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
  DDDDDDDDDRDD...
• Long-term behavior:
  • Does it converge? What is the average probability of rain?
  • Dynamics? How quickly will it converge?
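The two-state weather chain above can be simulated in a few lines. This is a sketch, not code from the slides; the state encoding and the seed are our own choices:

```python
import random

# Transition probabilities of the weather chain: P(next | current),
# with states "D" (dry) and "R" (rainy), as on the slide.
T = {
    "D": {"D": 0.95, "R": 0.05},
    "R": {"D": 0.20, "R": 0.80},
}

def simulate(start, steps, seed=0):
    """Draw a trajectory X_0, X_1, ..., X_steps from the chain."""
    rng = random.Random(seed)
    x, path = start, [start]
    for _ in range(steps):
        # Move to the next state according to the current row of T.
        x = "D" if rng.random() < T[x]["D"] else "R"
        path.append(x)
    return "".join(path)

print(simulate("D", 60))   # a 61-character trajectory of D's and R's
```

Running a long trajectory gives an empirical rain frequency near 20%, matching the equilibrium distribution derived on the later slides.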
Discrete Homogeneous Markov Chain

Formally, in terms of linear algebra:
• Distribution (vector):
  $P(X_i):\quad \mathbf{p}_i = \begin{pmatrix} P(X_i = 1) \\ \vdots \\ P(X_i = K) \end{pmatrix}$
• Transition probability (transition matrix):
  $P(X_i \mid X_{i-1}):\quad T = \begin{pmatrix} P(1 \leftarrow 1) & \cdots & P(1 \leftarrow K) \\ \vdots & \ddots & \vdots \\ P(K \leftarrow 1) & \cdots & P(K \leftarrow K) \end{pmatrix}$
  $T_{lk} = P(l \leftarrow k) = P(X_i = l \mid X_{i-1} = k)$
Evolution of the Initial Distribution

• Evolution of $P(X_1) \to P(X_2)$:
  $P(X_2 = l) = \sum_{k \in S} P(l \leftarrow k)\, P(X_1 = k)$, i.e. $\mathbf{p}_2 = T \mathbf{p}_1$
• Evolution over $n$ steps: $\mathbf{p}_{n+1} = T^n \mathbf{p}_1$
• Is there a stable (steady-state) distribution $\mathbf{p}^*$?
  $\mathbf{p}^* = T \mathbf{p}^*$: a stable distribution is an eigenvector of $T$ with eigenvalue $\lambda = 1$.
Steady-State Distribution $\mathbf{p}^*$

• It exists:
  • $T$ is subject to the normalization constraint $\sum_l T_{lk} = 1$, i.e. $(1\; 1\; \cdots\; 1)\, T = (1\; 1\; \cdots\; 1)$: the all-ones vector is a left eigenvector with eigenvalue 1.
  • Hence $T$ has eigenvalue $\lambda = 1$ (left and right eigenvalues are the same).
  • The steady-state distribution is a corresponding right eigenvector: $T \mathbf{p}^* = \mathbf{p}^*$.
• Does an arbitrary initial distribution evolve to $\mathbf{p}^*$?
  • Convergence?
  • Uniqueness?
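As a concrete check, the steady state of the weather chain can be computed as the right eigenvector of $T$ for eigenvalue 1. This NumPy sketch is ours, not part of the slides:

```python
import numpy as np

# Column-stochastic transition matrix of the weather chain
# (column k holds P(. <- k); state order [D, R], columns sum to 1).
T = np.array([[0.95, 0.20],
              [0.05, 0.80]])

# The steady state p* is a right eigenvector of T with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(T)
i = np.argmin(np.abs(eigvals - 1.0))   # locate the eigenvalue lambda = 1
p_star = np.real(eigvecs[:, i])
p_star = p_star / p_star.sum()         # normalize to a probability vector

print(p_star)       # [0.8, 0.2]: 80% dry, 20% rain in the long run
print(T @ p_star)   # T p* = p*
```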
Equilibrium Distribution $\mathbf{p}^*$

• Additional requirement on $T$: $(T^n)_{lk} > 0$ for some $n > N_0$. Such a chain is called irreducible and aperiodic (which implies ergodic):
  • All states are connected using at most $N_0$ steps
  • Return intervals to a given state are irregular
• Perron-Frobenius theorem for positive matrices:
  • PF1: $\lambda_1 = 1$ is a simple eigenvalue with a one-dimensional eigenspace (uniqueness)
  • PF2: $\lambda_1 = 1$ is dominant: $|\lambda_i| < 1$ for all $i \neq 1$ (convergence)
• $\mathbf{p}^*$ is a stable attractor, called the equilibrium distribution: $T \mathbf{p}^* = \mathbf{p}^*$
Convergence

• Time evolution of an arbitrary distribution $\mathbf{p}_0$: $\mathbf{p}_n = T^n \mathbf{p}_0$
• Expand $\mathbf{p}_0$ in the eigenbasis of $T$: $T \mathbf{e}_i = \lambda_i \mathbf{e}_i$, with $|\lambda_i| < \lambda_1 = 1$ for $i \neq 1$ and $|\lambda_k| \geq |\lambda_{k+1}|$:
  $\mathbf{p}_0 = \sum_{i=1}^{K} c_i \mathbf{e}_i$
  $T \mathbf{p}_0 = \sum_{i=1}^{K} c_i \lambda_i \mathbf{e}_i$
  $T^n \mathbf{p}_0 = \sum_{i=1}^{K} c_i \lambda_i^n \mathbf{e}_i = c_1 \mathbf{e}_1 + \lambda_2^n c_2 \mathbf{e}_2 + \lambda_3^n c_3 \mathbf{e}_3 + \cdots$
Convergence (II)

$T^n \mathbf{p}_0 = \sum_{i=1}^{K} c_i \lambda_i^n \mathbf{e}_i = c_1 \mathbf{e}_1 + \lambda_2^n c_2 \mathbf{e}_2 + \lambda_3^n c_3 \mathbf{e}_3 + \cdots \approx \mathbf{p}^* + \lambda_2^n c_2 \mathbf{e}_2 \quad (n \gg 1)$
with $c_1 \mathbf{e}_1 = \mathbf{p}^*$ (normalizations: $\|\mathbf{e}_1\|_1 = 1$, $\sum_j p^*_j = 1$).
• We have convergence: $T^n \mathbf{p}_0 \xrightarrow{\;n \to \infty\;} \mathbf{p}^*$
• Rate of convergence: $\|\mathbf{p}_n - \mathbf{p}^*\| \approx |\lambda_2|^n\, \|c_2 \mathbf{e}_2\|$
Example: Weather Dynamics

Rain forecast for stable versus mixed weather (state order $[D, R]$):

  stable: $T_s = \begin{pmatrix} 0.95 & 0.2 \\ 0.05 & 0.8 \end{pmatrix}$        mixed: $T_m = \begin{pmatrix} 0.85 & 0.6 \\ 0.15 & 0.4 \end{pmatrix}$

• Both have equilibrium $\mathbf{p}^* = (0.8,\; 0.2)^\top$: the long-term average probability of rain is 20%.
• Eigenvalues: stable $1, 0.75$; mixed $1, 0.25$.
• Rainy now, next hours?
  stable: RRRRDDDDDDDDDDDD DDDDDDDDDDDDD...
  mixed: RDDDDDDDDDDDDDDD RDDDRDDDDDDDD...
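A quick numerical check (not on the slide) confirms the role of $\lambda_2$: starting from a rainy state, the distance to equilibrium shrinks by exactly $|\lambda_2|$ per step for these two-state chains, since $\mathbf{p}_0 - \mathbf{p}^*$ is proportional to $\mathbf{e}_2$:

```python
import numpy as np

# Stable and mixed weather chains from the slide (state order [D, R]).
T_stable = np.array([[0.95, 0.20],
                     [0.05, 0.80]])   # eigenvalues 1, 0.75
T_mixed  = np.array([[0.85, 0.60],
                     [0.15, 0.40]])   # eigenvalues 1, 0.25
p_star = np.array([0.8, 0.2])         # shared equilibrium
p0 = np.array([0.0, 1.0])             # "rainy now"

for name, T in [("stable", T_stable), ("mixed", T_mixed)]:
    p, errs = p0, []
    for _ in range(8):
        p = T @ p
        errs.append(np.abs(p - p_star).sum())   # L1 distance to equilibrium
    # Consecutive errors differ by a factor |lambda_2|: 0.75 vs. 0.25.
    print(name, [round(e, 4) for e in errs])
```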
Markov Chain: First Results

• Aperiodic and irreducible chains are ergodic (every state reachable after $> N$ steps, irregular return times):
  • Convergence towards a unique equilibrium distribution $\mathbf{p}^*$
• Equilibrium distribution $\mathbf{p}^*$:
  • Eigenvector of $T$ with eigenvalue $\lambda = 1$: $T \mathbf{p}^* = \mathbf{p}^*$
• Rate of convergence: exponential decay governed by the second-largest eigenvalue, $\propto |\lambda_2|^n$
• This is only useful if we can design a chain with a desired equilibrium distribution!
Detailed Balance

• A special property of some Markov chains:
  A distribution $p$ satisfies detailed balance if the total flow of probability between every pair of states is equal (a local equilibrium):
  $P(l \leftarrow k)\, p_k = P(k \leftarrow l)\, p_l$
• Detailed balance implies that $p$ is the equilibrium distribution:
  $(T\mathbf{p})_l = \sum_k T_{lk}\, p_k = \sum_k T_{kl}\, p_l = p_l$
• Most MCMC methods construct chains which satisfy detailed balance.
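For the weather chain, this local-equilibrium condition can be verified numerically. The following sketch (ours, not from the slides) checks both detailed balance and the stationarity it implies:

```python
import numpy as np

# Weather chain and its equilibrium p* = (0.8, 0.2), state order [D, R].
T = np.array([[0.95, 0.20],
              [0.05, 0.80]])
p = np.array([0.8, 0.2])

# Entry (l, k) of this matrix is T_{lk} p_k: the probability flow k -> l.
balance = T * p

# Detailed balance: equal flow in both directions for every pair (l, k).
assert np.allclose(balance, balance.T)

# Summing the flow over k then gives (T p)_l = p_l: p is stationary.
assert np.allclose(T @ p, p)
print("detailed balance and stationarity hold")
```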
The Metropolis-Hastings Algorithm
MCMC to draw samples from an arbitrary distribution
Idea of the Metropolis-Hastings algorithm

• Design a Markov chain which satisfies the detailed balance condition for the target distribution $P(x)$:
  $T_{MH}(x' \leftarrow x)\, P(x) = T_{MH}(x \leftarrow x')\, P(x')$
• Ergodicity then ensures that the chain converges to this distribution.
Attempt 1: A simple algorithm

• Initialize with a sample $x$
• Generate the next sample from the current sample $x$:
  1. Draw a sample $x'$ from $Q(x' \mid x)$ ("proposal")
  2. Emit the new state $x'$ as sample
• It is a Markov chain
• But we would need to choose $Q$ for every $P$ to satisfy detailed balance:
  $Q(x' \leftarrow x)\, P(x) = Q(x \leftarrow x')\, P(x')$
Attempt 2: A more general solution

• Initialize with a sample $x$
• Generate the next sample from the current sample $x$:
  1. Draw a sample $x'$ from $Q(x' \mid x)$ ("proposal")
  2. With probability $a(x', x)$, emit $x'$ as the new sample
  3. With probability $1 - a(x', x)$, emit $x$ as the new sample
• It is a Markov chain
• The acceptance rule $a$ decouples $Q$ from $P$
• How should we choose $a$?
What is the acceptance function $a$?

$T_{MH}(x' \leftarrow x)\, P(x) = T_{MH}(x \leftarrow x')\, P(x')$
$a(x' \mid x)\, Q(x' \mid x)\, P(x) = a(x \mid x')\, Q(x \mid x')\, P(x')$

Case A: $x' = x$
• Detailed balance is trivially satisfied for every $a(x', x)$

Case B: $x' \neq x$
• We have the following requirement:
  $\dfrac{a(x' \mid x)}{a(x \mid x')} = \dfrac{Q(x \mid x')\, P(x')}{Q(x' \mid x)\, P(x)}$
What is the acceptance function $a$?

Requirement: choose $a(x' \mid x)$ such that
$\dfrac{a(x' \mid x)}{a(x \mid x')} = \dfrac{Q(x \mid x')\, P(x')}{Q(x' \mid x)\, P(x)}$
• $a(x \mid x')$ is a probability: $0 \leq a(x \mid x') \leq 1$
• It is easy to check that
  $a(x' \mid x) = \min\left(1,\; \dfrac{Q(x \mid x')\, P(x')}{Q(x' \mid x)\, P(x)}\right)$
  satisfies this property.
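Putting the pieces together, a random-walk Metropolis-Hastings sampler might look as follows. This is a sketch, not code from the slides; it uses a symmetric Gaussian proposal, for which the $Q$ terms cancel in the acceptance ratio, and samples a standard normal known only up to normalization:

```python
import math
import random

def metropolis_hastings(log_p, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings sketch.

    log_p: log of the (unnormalized) target density P(x).
    With a symmetric proposal Q(x'|x) = Q(x|x'), the acceptance
    probability reduces to a(x'|x) = min(1, P(x') / P(x)).
    """
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        x_prop = x + rng.gauss(0.0, step)            # draw proposal x' ~ Q(.|x)
        log_a = min(0.0, log_p(x_prop) - log_p(x))   # log acceptance probability
        if rng.random() < math.exp(log_a):           # accept with probability a
            x = x_prop
        samples.append(x)                            # emit current state
    return samples

# Example: standard normal target, known only up to normalization.
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=50_000)
burn = samples[5_000:]                               # discard burn-in
mean = sum(burn) / len(burn)
var = sum((s - mean) ** 2 for s in burn) / len(burn)
print(mean, var)                                     # close to 0 and 1
```

Note the rejection step: when the proposal is rejected, the current state is emitted again, which is exactly what keeps the chain's equilibrium equal to $P$.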