SLIDE 1

> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2018 | BASEL

Understanding MCMC

Marcel Lüthi, University of Basel

Slides based on presentation by Sandro Schönborn

SLIDE 2

The big picture

  • The Metropolis-Hastings algorithm induces an aperiodic and irreducible Markov chain which satisfies the detailed balance condition for $p(x)$.
  • If the Markov chain is aperiodic and irreducible, it converges to an equilibrium distribution.
  • The equilibrium distribution is the distribution $p(x)$.
  • The Metropolis-Hastings algorithm therefore samples from $p(x)$.

SLIDE 3

Understanding Markov Chains

SLIDE 4

Markov Chain

  • Sequence of random variables $\{X_i\}_{i=1}^{N}$, $X_i \in S$, with joint distribution

$$P(X_1, X_2, \ldots, X_N) = P(X_1) \prod_{i=2}^{N} P(X_i \mid X_{i-1})$$

Here $S$ is the state space, $P(X_1)$ the initial distribution and $P(X_i \mid X_{i-1})$ the transition probability.

  • Simplifications (for our analysis):
  • Discrete state space: $S = \{1, 2, \ldots, K\}$ (automatically true if we use computers, e.g. 32-bit floats)
  • Homogeneous chain: $P(X_i = l \mid X_{i-1} = m) = T_{lm}$

[Figure: transition diagram of a three-state chain with states 1, 2, 3 and edge probabilities 1/3, 1/2, 1/6, 1, 1]

SLIDE 5

Example: Markov Chain

  • Simple weather model: dry (D) or rainy (R) hour
  • Condition in next hour? $X_{t+1}$
  • State space $S = \{D, R\}$
  • Stochastic: $P(X_{t+1} \mid X_t)$
  • Depends only on current condition $X_t$
  • Draw samples from chain:
  • Initial: $X_0 = D$
  • Evolution: $P(X_{t+1} \mid X_t)$
  • Long-term behavior
  • Does it converge? Average probability of rain?
  • Dynamics? How quickly will it converge?

Sample run: DDDDDDDDRRRRRRRRRRRDDDDDDDDDDD DDDDDDDDDDDDDDDDDDDDDDDDDDDDDD DDDDDDDDDRDD...

[Figure: transition diagram with states D and R and edge probabilities 0.05, 0.95, 0.2, 0.8]
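Drawing samples from this chain can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the assignment of the four probabilities to transitions is taken from the matrix of the "stable" weather model in the Weather Dynamics example, and the function name `simulate`, the seed and the run length are assumptions made for this sketch.

```python
import random

# P(next | current) for the weather chain. The assignment of the numbers to
# transitions is an assumption read off the "stable" matrix W_s.
TRANSITIONS = {
    "D": {"D": 0.95, "R": 0.05},
    "R": {"D": 0.2, "R": 0.8},
}

def simulate(n_steps, start="D", seed=0):
    """Draw one trajectory X_0, ..., X_{n_steps} as a string of D/R."""
    rng = random.Random(seed)  # fixed seed: illustrative choice only
    state = start
    path = [state]
    for _ in range(n_steps):
        probs = TRANSITIONS[state]
        # The next state depends only on the current one (Markov property).
        state = rng.choices(list(probs), weights=list(probs.values()))[0]
        path.append(state)
    return "".join(path)

path = simulate(100_000)
rain_fraction = path.count("R") / len(path)
print(path[:60])
print(f"fraction of rainy hours: {rain_fraction:.3f}")
```

For a long run the fraction of rainy hours should settle near the long-term rain probability of the chain, which is the question the slide raises.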

SLIDE 6

Discrete Homogeneous Markov Chain

Formally linear algebra:

  • Distribution (vector):

$$P(X_i):\quad \mathbf{p}_i = \begin{pmatrix} P(X_i = 1) \\ \vdots \\ P(X_i = K) \end{pmatrix}$$

  • Transition probability (transition matrix):

$$P(X_i \mid X_{i-1}):\quad T = \begin{pmatrix} P(1 \leftarrow 1) & \cdots & P(1 \leftarrow K) \\ \vdots & \ddots & \vdots \\ P(K \leftarrow 1) & \cdots & P(K \leftarrow K) \end{pmatrix}$$

$$T_{lm} = P(l \leftarrow m) = P(X_i = l \mid X_{i-1} = m)$$

SLIDE 7

Evolution of the Initial Distribution

  • Evolution of $P(X_1) \to P(X_2)$:

$$P(X_2 = l) = \sum_{m \in S} P(l \leftarrow m)\, P(X_1 = m), \qquad \mathbf{p}_2 = T \mathbf{p}_1$$

  • Evolution of $n$ steps:

$$\mathbf{p}_{n+1} = T^n \mathbf{p}_1$$

  • Is there a stable distribution $\mathbf{p}^*$? (steady-state)

$$\mathbf{p}^* = T \mathbf{p}^*$$

A stable distribution is an eigenvector of $T$ with eigenvalue $\lambda = 1$
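The evolution $\mathbf{p}_{n+1} = T^n \mathbf{p}_1$ can be tried out numerically by applying $T$ repeatedly. The sketch below assumes the two-state weather chain with states ordered (D, R) and the column convention $T_{lm} = P(l \leftarrow m)$; the helper name `mat_vec` is made up for this illustration.

```python
def mat_vec(T, p):
    """Return the product T p for a matrix stored as a list of rows."""
    return [sum(T_lm * p_m for T_lm, p_m in zip(row, p)) for row in T]

# Column-stochastic transition matrix, T[l][m] = P(l <- m), states (D, R).
T = [[0.95, 0.2],
     [0.05, 0.8]]

p = [1.0, 0.0]  # p_1: dry with certainty
for n in range(1, 51):
    p = mat_vec(T, p)  # p now holds p_{n+1} = T^n p_1
    if n in (1, 5, 20, 50):
        print(n, [round(x, 4) for x in p])

# A (numerically) stable distribution is unchanged by one more step.
p_next = mat_vec(T, p)
print([round(x, 6) for x in p_next])
```

After a few dozen steps the vector stops changing, which is the numerical counterpart of the fixed-point condition $\mathbf{p}^* = T \mathbf{p}^*$.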

SLIDE 8

Steady-State Distribution: $\mathbf{p}^*$

  • It exists:
  • $T$ subject to normalization constraint: left eigenvector to eigenvalue 1

$$\sum_{l} T_{lm} = 1 \iff (1 \;\ldots\; 1)\, T = (1 \;\ldots\; 1)$$

  • $T$ has eigenvalue $\lambda = 1$ (left and right eigenvalues are the same)
  • Steady-state distribution as corresponding right eigenvector

$$T \mathbf{p}^* = \mathbf{p}^*$$

  • Does any arbitrary initial distribution evolve to $\mathbf{p}^*$?
  • Convergence?
  • Uniqueness?

SLIDE 9

Equilibrium Distribution: $\mathbf{p}^*$

  • Additional requirement for $T$: $(T^n)_{lm} > 0$ for $n > N_0$

The chain is called irreducible and aperiodic (implies ergodic)

  • All states are connected using at most $N_0$ steps
  • Return intervals to a certain state are irregular
  • Perron-Frobenius theorem for positive matrices:
  • PF1: $\lambda_1 = 1$ is a simple eigenvalue with 1d eigenspace (uniqueness)
  • PF2: $\lambda_1 = 1$ is dominant, all $|\lambda_i| < 1$, $i \neq 1$ (convergence)
  • $\mathbf{p}^*$ is a stable attractor, called equilibrium distribution

$$T \mathbf{p}^* = \mathbf{p}^*$$
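The requirement $(T^n)_{lm} > 0$ can be checked directly on small chains by multiplying out powers of $T$. The following is a sketch under stated assumptions: the two example matrices and the function names are hypothetical, and the search is cut off at an arbitrary `max_power`, so a result of `None` only means that no positive power was found up to that limit.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def first_positive_power(T, max_power=50):
    """Smallest n <= max_power with every entry of T^n > 0, else None."""
    power = T
    for n in range(1, max_power + 1):
        if all(entry > 0 for row in power for entry in row):
            return n
        power = mat_mul(power, T)
    return None

# Hypothetical 3-state chain: several one-step transitions are impossible,
# yet every state reaches every other state after enough steps.
T_ok = [[0.5, 0.0, 1.0],
        [0.5, 0.0, 0.0],
        [0.0, 1.0, 0.0]]

# Deterministic flip between two states: irreducible but periodic,
# so no power of the matrix is strictly positive.
T_periodic = [[0.0, 1.0],
              [1.0, 0.0]]

print(first_positive_power(T_ok))
print(first_positive_power(T_periodic))
```

The periodic example shows why connectivity alone is not enough: the chain returns to each state only at regular intervals, so the distribution keeps oscillating instead of settling.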

SLIDE 10

Convergence

  • Time evolution of arbitrary distribution $\mathbf{p}_0$

$$\mathbf{p}_n = T^n \mathbf{p}_0$$

  • Expand $\mathbf{p}_0$ in the eigenbasis of $T$:

$$T \mathbf{e}_i = \lambda_i \mathbf{e}_i, \qquad |\lambda_i| < \lambda_1 = 1 \ (i \geq 2), \qquad |\lambda_k| \geq |\lambda_{k+1}|$$

$$\mathbf{p}_0 = \sum_{i=1}^{K} c_i \mathbf{e}_i$$

$$T \mathbf{p}_0 = \sum_{i=1}^{K} c_i \lambda_i \mathbf{e}_i$$

$$T^n \mathbf{p}_0 = \sum_{i=1}^{K} c_i \lambda_i^n \mathbf{e}_i = c_1 \mathbf{e}_1 + \lambda_2^n c_2 \mathbf{e}_2 + \lambda_3^n c_3 \mathbf{e}_3 + \cdots$$

SLIDE 11

Convergence (II)

$$T^n \mathbf{p}_0 = \sum_{i=1}^{K} c_i \lambda_i^n \mathbf{e}_i = c_1 \mathbf{e}_1 + \lambda_2^n c_2 \mathbf{e}_2 + \lambda_3^n c_3 \mathbf{e}_3 + \cdots \approx \mathbf{p}^* + \lambda_2^n c_2 \mathbf{e}_2 \qquad (n \gg 1)$$

  • We have convergence:

$$T^n \mathbf{p}_0 \xrightarrow{\; n \to \infty \;} \mathbf{p}^*$$

  • Rate of convergence (with $\|\mathbf{e}_2\| = 1$):

$$\|\mathbf{p}_n - \mathbf{p}^*\| \approx \|\lambda_2^n c_2 \mathbf{e}_2\| = |\lambda_2|^n\, |c_2|$$

Normalizations: $\|\mathbf{e}_1\| = 1$, $\sum_i p_i^* = 1$, $c_1 \mathbf{e}_1 = \mathbf{p}^*$
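The geometric decay of the error can be observed numerically. The sketch below reuses the two-state weather chain with $\mathbf{p}^* = (0.8, 0.2)$ and $\lambda_2 = 0.75$; the choice of the L1 norm and of the starting distribution are assumptions made for the illustration, since the slide leaves the norm unspecified.

```python
def mat_vec(T, p):
    """Return the product T p for a matrix stored as a list of rows."""
    return [sum(t * x for t, x in zip(row, p)) for row in T]

def l1_distance(p, q):
    """L1 distance between two probability vectors (assumed norm)."""
    return sum(abs(a - b) for a, b in zip(p, q))

T = [[0.95, 0.2],
     [0.05, 0.8]]
p_star = [0.8, 0.2]
lambda_2 = 0.75

p = [0.0, 1.0]  # p_0: rainy with certainty
errors = []
for n in range(1, 21):
    p = mat_vec(T, p)                      # p_n = T^n p_0
    errors.append(l1_distance(p, p_star))  # ||p_n - p*||

# Successive error ratios should approach |lambda_2|.
ratios = [errors[i + 1] / errors[i] for i in range(len(errors) - 1)]
print([round(r, 4) for r in ratios[:5]])
```

In a two-state chain there is only one decaying mode, so the ratio equals $\lambda_2$ from the first step on; with more states the ratio would only approach $|\lambda_2|$ as the faster modes die out.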

SLIDE 12

Example: Weather Dynamics

Rain forecast for stable versus mixed weather:

  • Stable: $W_s = \begin{pmatrix} 0.95 & 0.2 \\ 0.05 & 0.8 \end{pmatrix}$, $\mathbf{p}^* = \begin{pmatrix} 0.8 \\ 0.2 \end{pmatrix}$, eigenvalues 1 and 0.75
  • Mixed: $W_m = \begin{pmatrix} 0.85 & 0.6 \\ 0.15 & 0.4 \end{pmatrix}$, $\mathbf{p}^* = \begin{pmatrix} 0.8 \\ 0.2 \end{pmatrix}$, eigenvalues 1 and 0.25
  • Long-term average probability of rain: 20% in both cases

Rainy now, next hours? Sample runs shown on the slide:

RDDDDDDDDDDDDDDD RDDDRDDDDDDDD...
RRRRDDDDDDDDDDDD DDDDDDDDDDDDD...
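The quoted eigenvalues and equilibrium distributions can be verified by hand. The derivation below is a worked sketch for a general two-state chain; the symbols $r = P(R \leftarrow D)$ and $d = P(D \leftarrow R)$ are introduced here for illustration and do not appear on the slide.

```latex
\begin{align*}
W &= \begin{pmatrix} 1-r & d \\ r & 1-d \end{pmatrix},
  \qquad \operatorname{tr} W = \lambda_1 + \lambda_2 = 2 - r - d \\
\lambda_1 &= 1, \qquad \lambda_2 = 1 - r - d \\
\mathbf{p}^* &= \frac{1}{r+d} \begin{pmatrix} d \\ r \end{pmatrix},
  \qquad \text{since } (1-r)\,d + d\,r = d \text{ and } r\,d + (1-d)\,r = r \\[4pt]
\text{stable } (r = 0.05,\ d = 0.2)&:\quad \lambda_2 = 0.75, \quad
  \mathbf{p}^* = \tfrac{1}{0.25} \begin{pmatrix} 0.2 \\ 0.05 \end{pmatrix}
  = \begin{pmatrix} 0.8 \\ 0.2 \end{pmatrix} \\
\text{mixed } (r = 0.15,\ d = 0.6)&:\quad \lambda_2 = 0.25, \quad
  \mathbf{p}^* = \tfrac{1}{0.75} \begin{pmatrix} 0.6 \\ 0.15 \end{pmatrix}
  = \begin{pmatrix} 0.8 \\ 0.2 \end{pmatrix} \\[4pt]
P(\text{rain in } n \text{ hours} \mid \text{rain now}) &= 0.2 + 0.8\, \lambda_2^{\,n}
\end{align*}
```

Both models share the same long-term rain probability, but the forecast for the mixed weather forgets the current state much faster because its $\lambda_2$ is smaller.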

SLIDE 13

Markov Chain: First Results

  • Aperiodic and irreducible chains are ergodic (every state reachable after more than $N$ steps, irregular return time)
  • Convergence towards a unique equilibrium distribution $\mathbf{p}^*$
  • Equilibrium distribution $\mathbf{p}^*$
  • Eigenvector of $T$ with eigenvalue $\lambda = 1$:

$$T \mathbf{p}^* = \mathbf{p}^*$$

  • Rate of convergence: exponential decay with the second largest eigenvalue, $\propto \lambda_2^n$

Only useful if we can design a chain with the desired equilibrium distribution!

SLIDE 14

Detailed Balance

  • Special property of some Markov chains

A distribution $p$ satisfies detailed balance if the total flow of probability between every pair of states is equal in both directions (we have a local equilibrium):

$$P(l \leftarrow m)\, p(m) = P(m \leftarrow l)\, p(l)$$

  • Detailed balance implies: $p$ is the equilibrium distribution

$$(T \mathbf{p})_l = \sum_m T_{lm}\, p_m = \sum_m T_{ml}\, p_l = p_l$$

The middle step uses detailed balance, the last step the normalization $\sum_m T_{ml} = 1$.

  • Most MCMC methods construct chains which satisfy detailed balance.
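Detailed balance is easy to test numerically for a given pair $(T, \mathbf{p})$. The sketch below checks it for the weather chain and for a hypothetical three-state "lazy cycle" chain, which is included as an assumed counterexample: its uniform distribution is stationary although probability flows around the cycle in one direction only.

```python
def mat_vec(T, p):
    """Return the product T p for a matrix stored as a list of rows."""
    return [sum(t * x for t, x in zip(row, p)) for row in T]

def satisfies_detailed_balance(T, p, tol=1e-12):
    """Check T[l][m] * p[m] == T[m][l] * p[l] for every pair of states."""
    K = len(p)
    return all(abs(T[l][m] * p[m] - T[m][l] * p[l]) <= tol
               for l in range(K) for m in range(l + 1, K))

def is_stationary(T, p, tol=1e-12):
    """Check T p == p up to a small tolerance."""
    return all(abs(a - b) <= tol for a, b in zip(mat_vec(T, p), p))

# Weather chain, T[l][m] = P(l <- m), with p* = (0.8, 0.2).
T_weather = [[0.95, 0.2],
             [0.05, 0.8]]
p_weather = [0.8, 0.2]

# Hypothetical lazy cycle 0 -> 1 -> 2 -> 0 (stay put with probability 0.5).
T_cycle = [[0.5, 0.0, 0.5],
           [0.5, 0.5, 0.0],
           [0.0, 0.5, 0.5]]
p_uniform = [1 / 3, 1 / 3, 1 / 3]

print(satisfies_detailed_balance(T_weather, p_weather), is_stationary(T_weather, p_weather))
print(satisfies_detailed_balance(T_cycle, p_uniform), is_stationary(T_cycle, p_uniform))
```

The second case illustrates that detailed balance is sufficient for stationarity but not necessary, which is why the slide calls it a special property of some chains.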

SLIDE 15

The Metropolis-Hastings Algorithm

MCMC to draw samples from an arbitrary distribution

SLIDE 16

Idea of Metropolis Hastings algorithm

  • Design a Markov chain which satisfies the detailed balance condition

$$T_{MH}(x' \leftarrow x)\, P(x) = T_{MH}(x \leftarrow x')\, P(x')$$

  • Ergodicity ensures that the chain converges to this distribution

SLIDE 17

Attempt 1: A simple algorithm

  • Initialize with sample $\mathbf{x}$
  • Generate next sample, with current sample $\mathbf{x}$

1. Draw a sample $\mathbf{x}'$ from $Q(\mathbf{x}' \mid \mathbf{x})$ ("proposal")
2. Emit current state $\mathbf{x}$ as sample

  • It's a Markov chain
  • Need to choose $Q$ for every $P$ to satisfy detailed balance

$$Q(x' \leftarrow x)\, P(x) = Q(x \leftarrow x')\, P(x')$$

SLIDE 18

Attempt 2: More general solution

  • Initialize with sample $\mathbf{x}$
  • Generate next sample, with current sample $\mathbf{x}$

1. Draw a sample $\mathbf{x}'$ from $Q(\mathbf{x}' \mid \mathbf{x})$ ("proposal")
2. With probability $a(\mathbf{x}' \mid \mathbf{x})$ emit $\mathbf{x}'$ as new sample
3. With probability $1 - a(\mathbf{x}' \mid \mathbf{x})$ emit $\mathbf{x}$ as new sample

  • It's a Markov chain
  • Decouples $Q$ from $P$ through the acceptance rule $a$
  • How to choose $a$?

SLIDE 19

What is the acceptance function a?

$$T_{MH}(x' \leftarrow x)\, P(x) = T_{MH}(x \leftarrow x')\, P(x')$$

$$a(x' \mid x)\, Q(x' \mid x)\, P(x) = a(x \mid x')\, Q(x \mid x')\, P(x')$$

Case A: $x' = x$

  • Detailed balance trivially satisfied for every $a(x' \mid x)$

Case B: $x' \neq x$

  • We have the following requirement

$$\frac{a(x' \mid x)}{a(x \mid x')} = \frac{Q(x \mid x')\, P(x')}{Q(x' \mid x)\, P(x)}$$

SLIDE 20

What is the acceptance function a?

Requirement: Choose $a(x' \mid x)$ such that

$$\frac{a(x' \mid x)}{a(x \mid x')} = \frac{Q(x \mid x')\, P(x')}{Q(x' \mid x)\, P(x)}$$

  • $a(x \mid x')$ is a probability: $a(x \mid x') \leq 1$ and $a(x \mid x') \geq 0$
  • Easy to check that

$$a(x' \mid x) = \min\left\{1, \frac{Q(x \mid x')\, P(x')}{Q(x' \mid x)\, P(x)}\right\}$$

satisfies this property: if the ratio on the right is at most 1, then $a(x' \mid x)$ equals that ratio and $a(x \mid x') = 1$, so the quotient is the ratio itself; the other case is symmetric.

SLIDE 21

The big picture

  • The Metropolis-Hastings algorithm induces an aperiodic and irreducible Markov chain which satisfies the detailed balance condition for $p(x)$.
  • If the Markov chain is aperiodic and irreducible, it converges to an equilibrium distribution.
  • The equilibrium distribution is the distribution $p(x)$.
  • The Metropolis-Hastings algorithm therefore samples from $p(x)$.
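The whole pipeline can be sketched as a short sampler on a discrete state space. Everything concrete here is an assumption made for illustration: the five-state target `WEIGHTS`, the asymmetric random-walk proposal on a ring, the function names and the seed. The acceptance step uses the min rule derived on the acceptance-function slides, including the proposal ratio $Q(x \mid x') / Q(x' \mid x)$.

```python
import random
from collections import Counter

K = 5
WEIGHTS = [1.0, 2.0, 4.0, 2.0, 1.0]  # hypothetical unnormalized target P(x)

def p_unnorm(x):
    """Target P(x) up to a constant factor; the constant cancels in the ratio."""
    return WEIGHTS[x]

def propose(x, rng):
    """Asymmetric random walk on a ring: step +1 w.p. 0.7, step -1 w.p. 0.3."""
    step = 1 if rng.random() < 0.7 else -1
    return (x + step) % K

def q_prob(x_new, x_old):
    """Q(x_new | x_old) for the proposal above."""
    if x_new == (x_old + 1) % K:
        return 0.7
    if x_new == (x_old - 1) % K:
        return 0.3
    return 0.0

def metropolis_hastings(x0, n_samples, seed=0):
    rng = random.Random(seed)  # fixed seed: illustrative choice only
    x = x0
    samples = []
    for _ in range(n_samples):
        x_new = propose(x, rng)
        # Ratio Q(x | x') P(x') / (Q(x' | x) P(x)) from the acceptance rule.
        ratio = (q_prob(x, x_new) * p_unnorm(x_new)) / (q_prob(x_new, x) * p_unnorm(x))
        if rng.random() < min(1.0, ratio):
            x = x_new      # accept: emit x' as new sample
        samples.append(x)  # on rejection the current state is emitted again
    return samples

samples = metropolis_hastings(x0=0, n_samples=200_000)
counts = Counter(samples)
freqs = [counts[x] / len(samples) for x in range(K)]
target = [w / sum(WEIGHTS) for w in WEIGHTS]
print([round(f, 3) for f in freqs])
print(target)
```

Only ratios of $P$ enter the acceptance step, so the target never needs to be normalized; the empirical frequencies should approach the normalized weights as the chain runs longer.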