SLIDE 1

Markov Chains

DS GA 1002 Probability and Statistics for Data Science

http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall17
Carlos Fernandez-Granda

SLIDE 2

Definition · Recurrence · Periodicity · Convergence · Markov-chain Monte Carlo

SLIDE 3

Markov property

The future is conditionally independent from the past given the present: for any $t_1 < \dots < t_{n+1}$, $X(t_{n+1})$ is conditionally independent of $X(t_1), \dots, X(t_{n-1})$ given $X(t_n)$.

If the state space of the random process is discrete,

$$p_{X(t_{n+1}) \mid X(t_1), \dots, X(t_n)}(x_{n+1} \mid x_1, \dots, x_n) = p_{X(t_{n+1}) \mid X(t_n)}(x_{n+1} \mid x_n)$$

If the state space of the random process is continuous,

$$f_{X(t_{n+1}) \mid X(t_1), \dots, X(t_n)}(x_{n+1} \mid x_1, \dots, x_n) = f_{X(t_{n+1}) \mid X(t_n)}(x_{n+1} \mid x_n)$$

SLIDE 4

Directed graphical model

X_1 → X_2 → X_3 → X_4 → · · ·

SLIDE 5

Markov property

I.i.d. sequences satisfy the Markov property. Random walks satisfy the Markov property.

SLIDE 6

Markov chain

A Markov chain is a random process satisfying the Markov property. We consider discrete-time Markov chains with a finite state space, specified by the initial distribution and the transition probabilities:

$$p_{X(0), X(1), \dots, X(n)}(x_0, x_1, \dots, x_n) := p_{X(0)}(x_0) \prod_{j=1}^{n} p_{X(j) \mid X(0), \dots, X(j-1)}(x_j \mid x_0, \dots, x_{j-1}) = p_{X(0)}(x_0) \prod_{j=1}^{n} p_{X(j) \mid X(j-1)}(x_j \mid x_{j-1})$$

SLIDE 7

Time-homogeneous Markov chain

Transition probabilities between states are the same at every time step:

$$(T_X)_{jk} := p_{X(i+1) \mid X(i)}(x_j \mid x_k)$$

The marginal distribution at each time $i$ is represented by a state vector

$$p_X(i) := \begin{bmatrix} p_{X(i)}(x_1) \\ p_{X(i)}(x_2) \\ \vdots \\ p_{X(i)}(x_s) \end{bmatrix}$$

SLIDE 8

Car rental

Aim: Model location of cars. 3 states: Los Angeles, San Francisco, San Jose. New cars are uniformly distributed between the 3 states. After that the transition probabilities are (columns index the current state, rows the next state):

                SF     LA     SJ
San Francisco   0.6    0.1    0.3
Los Angeles     0.2    0.8    0.3
San Jose        0.2    0.1    0.4

SLIDE 9

Car rental

Markov chain with

$$p_X(0) := \begin{bmatrix} 1/3 \\ 1/3 \\ 1/3 \end{bmatrix}, \qquad T_X := \begin{bmatrix} 0.6 & 0.1 & 0.3 \\ 0.2 & 0.8 & 0.3 \\ 0.2 & 0.1 & 0.4 \end{bmatrix}$$

SLIDE 10

Car rental

[State diagram: SF, LA, SJ with the transition probabilities from T_X]

SLIDES 11–13

Car rental

[Simulated trajectories of car locations (SF / LA / SJ) over customers 1–15]

SLIDES 14–18

Car rental

Probability that a car starts in SF and is in SJ after the 2nd customer:

$$p_{X(0), X(2)}(1, 3) = \sum_{i=1}^{3} p_{X(0), X(1), X(2)}(1, i, 3)$$
$$= \sum_{i=1}^{3} p_{X(0)}(1)\, p_{X(1) \mid X(0)}(i \mid 1)\, p_{X(2) \mid X(1)}(3 \mid i)$$
$$= p_{X(0)}(1) \sum_{i=1}^{3} T_{i1} T_{3i} = \frac{0.6 \cdot 0.2 + 0.2 \cdot 0.1 + 0.2 \cdot 0.4}{3} \approx 7.33 \cdot 10^{-2}$$
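The number above is easy to verify numerically (a sketch, reusing the T and p_X(0) defined on slide 9):

```python
# Verify p_{X(0),X(2)}(1,3): start in SF (index 0), end in SJ (index 2)
# after two steps. T[j][k] = P(next = j | current = k).
T = [[0.6, 0.1, 0.3],
     [0.2, 0.8, 0.3],
     [0.2, 0.1, 0.4]]
p0_SF = 1/3  # P(X(0) = SF)

# Marginalize over the intermediate state i: sum of T_{i,SF} * T_{SJ,i}.
prob = p0_SF * sum(T[i][0] * T[2][i] for i in range(3))
```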

SLIDE 19

State vector and transition matrix

For a Markov chain X with transition matrix $T_X$,

$$p_X(i) = T_X\, p_X(i-1)$$

If the Markov chain starts at time 0, then for any $i \geq 0$

$$p_X(i) = T_X^{\,i}\, p_X(0)$$

where $T_X^{\,i}$ denotes multiplying $i$ times by the matrix $T_X$.

SLIDES 20–23

State vector and transition matrix

$$p_X(i) := \begin{bmatrix} p_{X(i)}(x_1) \\ p_{X(i)}(x_2) \\ \vdots \\ p_{X(i)}(x_s) \end{bmatrix} = \begin{bmatrix} \sum_{j=1}^{s} p_{X(i-1)}(x_j)\, p_{X(i) \mid X(i-1)}(x_1 \mid x_j) \\ \sum_{j=1}^{s} p_{X(i-1)}(x_j)\, p_{X(i) \mid X(i-1)}(x_2 \mid x_j) \\ \vdots \\ \sum_{j=1}^{s} p_{X(i-1)}(x_j)\, p_{X(i) \mid X(i-1)}(x_s \mid x_j) \end{bmatrix}$$

$$= \begin{bmatrix} p_{X(i) \mid X(i-1)}(x_1 \mid x_1) & p_{X(i) \mid X(i-1)}(x_1 \mid x_2) & \cdots & p_{X(i) \mid X(i-1)}(x_1 \mid x_s) \\ p_{X(i) \mid X(i-1)}(x_2 \mid x_1) & p_{X(i) \mid X(i-1)}(x_2 \mid x_2) & \cdots & p_{X(i) \mid X(i-1)}(x_2 \mid x_s) \\ \vdots & \vdots & \ddots & \vdots \\ p_{X(i) \mid X(i-1)}(x_s \mid x_1) & p_{X(i) \mid X(i-1)}(x_s \mid x_2) & \cdots & p_{X(i) \mid X(i-1)}(x_s \mid x_s) \end{bmatrix} \begin{bmatrix} p_{X(i-1)}(x_1) \\ p_{X(i-1)}(x_2) \\ \vdots \\ p_{X(i-1)}(x_s) \end{bmatrix} = T_X\, p_X(i-1)$$

SLIDES 24–26

Car rental

Distribution for the 5th customer?

$$p_X(5) = T_X^{\,5}\, p_X(0) = \begin{bmatrix} 0.281 \\ 0.534 \\ 0.185 \end{bmatrix}$$
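This matrix power is easy to reproduce by applying $T_X$ five times (a sketch in plain Python, reusing the chain from slide 9):

```python
# Compute p_X(5) = T^5 p_X(0) by repeated matrix-vector multiplication.
T = [[0.6, 0.1, 0.3],
     [0.2, 0.8, 0.3],
     [0.2, 0.1, 0.4]]

def matvec(M, v):
    return [sum(M[j][k] * v[k] for k in range(len(v))) for j in range(len(M))]

p = [1/3, 1/3, 1/3]
for _ in range(5):
    p = matvec(T, p)
# p is now approximately [0.281, 0.534, 0.185]
```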

SLIDE 27

Definition · Recurrence · Periodicity · Convergence · Markov-chain Monte Carlo

SLIDE 28

Recurrent and transient states

A state s is recurrent if

$$P\left( X(j) = s \text{ for some } j > i \mid X(i) = s \right) = 1$$

A state s is transient if

$$P\left( X(j) \neq s \text{ for all } j > i \mid X(i) = s \right) > 0$$
SLIDE 29

Employment

Aim: Model employment dynamics. 4 states: Student, Intern, Employed, Unemployed. At 18 a person is either a student (prob. 0.9) or an intern (prob. 0.1). After that the transition probabilities are (columns index the current state, rows the next state; blank entries are zero):

             Student  Intern  Employed  Unemployed
Student        0.8     0.5
Intern         0.1     0.5
Employed       0.1             0.9        0.4
Unemployed                     0.1        0.6

SLIDE 30

Employment

Markov chain with

$$p_X(0) := \begin{bmatrix} 0.9 \\ 0.1 \\ 0 \\ 0 \end{bmatrix}, \qquad T_X := \begin{bmatrix} 0.8 & 0.5 & 0 & 0 \\ 0.1 & 0.5 & 0 & 0 \\ 0.1 & 0 & 0.9 & 0.4 \\ 0 & 0 & 0.1 & 0.6 \end{bmatrix}$$

SLIDE 31

Employment

[State diagram: Student, Intern, Employed, Unemployed with the transition probabilities from T_X]

SLIDES 32–34

Employment

[Simulated employment trajectories (Student / Intern / Employed / Unemployed) from age 18 to about 30]

SLIDES 35–40

Employment

State 1 (student) is transient:

$$P\left( X(j) \neq 1 \text{ for all } j > i \mid X(i) = 1 \right) \geq P\left( X(i+1) = 3 \mid X(i) = 1 \right) = 0.1 > 0$$

(once employed, the chain can never return to the student state, so jumping directly to state 3 guarantees no return).

State 3 (employed) is recurrent:

$$P\left( X(j) \neq 3 \text{ for all } j > i \mid X(i) = 3 \right) = P\left( X(j) = 4 \text{ for all } j > i \mid X(i) = 3 \right)$$
$$= \lim_{k \to \infty} P\left( X(i+1) = 4 \mid X(i) = 3 \right) \prod_{j=1}^{k} P\left( X(i+j+1) = 4 \mid X(i+j) = 4 \right)$$
$$= \lim_{k \to \infty} 0.1 \cdot 0.6^{k} = 0$$

so the chain returns to state 3 with probability one.
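A quick numerical check of both claims (a sketch; the zero entries of T_X are inferred from the column sums, which must equal one):

```python
# Employment chain: 0 = Student, 1 = Intern, 2 = Employed, 3 = Unemployed.
T = [[0.8, 0.5, 0.0, 0.0],
     [0.1, 0.5, 0.0, 0.0],
     [0.1, 0.0, 0.9, 0.4],
     [0.0, 0.0, 0.1, 0.6]]
p = [0.9, 0.1, 0.0, 0.0]

def matvec(M, v):
    return [sum(M[j][k] * v[k] for k in range(len(v))) for j in range(len(M))]

# Iterate the chain: mass on the transient states Student/Intern dies out,
# and all probability ends up in the recurrent pair Employed/Unemployed.
for _ in range(300):
    p = matvec(T, p)
# Within the Employed/Unemployed pair the stationary split is 0.8 / 0.2,
# balancing the flows 0.1 * 0.8 (out of Employed) = 0.4 * 0.2 (back in).
```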

SLIDE 41

Irreducible Markov chain

A Markov chain is irreducible if for any states x and y ≠ x there exists m ≥ 0 such that

$$P\left( X(i+m) = y \mid X(i) = x \right) > 0$$

All states in an irreducible Markov chain are recurrent.

SLIDE 42

Definition · Recurrence · Periodicity · Convergence · Markov-chain Monte Carlo

SLIDE 43

Period of a state

The period m of a state x of a Markov chain X is the largest integer such that the chain always takes a multiple of m steps (km, for some positive integer k) to return to x.

SLIDE 44

Period of a state

[Example diagram: three states A, B, C with transition probabilities 1, 0.1, 0.9, 1]

SLIDE 45

Aperiodic chain

A Markov chain X is aperiodic if all states have period equal to one

SLIDE 46

Definition · Recurrence · Periodicity · Convergence · Markov-chain Monte Carlo

SLIDE 47

Convergence in distribution

A Markov chain converges in distribution if the state vector converges to a constant vector:

$$p_\infty := \lim_{i \to \infty} p_X(i) = \lim_{i \to \infty} T_X^{\,i}\, p_X(0)$$

SLIDE 48

Mobile phones

◮ Company releases a new mobile-phone model
◮ At the moment 90% of the phones are in stock, 10% have been sold locally and none have been exported
◮ Each day a phone in stock is sold with probability 0.2 and exported with probability 0.1
◮ Initial state vector and transition matrix (states: in stock, sold, exported; zero entries follow from the column sums):

$$a := \begin{bmatrix} 0.9 \\ 0.1 \\ 0 \end{bmatrix}, \qquad T_X = \begin{bmatrix} 0.7 & 0 & 0 \\ 0.2 & 1 & 0 \\ 0.1 & 0 & 1 \end{bmatrix}$$

SLIDE 49

Mobile phones

[State diagram: In stock (self-loop 0.7) → Sold (0.2) and → Exported (0.1); Sold and Exported are absorbing (prob. 1)]

SLIDES 50–52

Mobile phones

[Evolution of the state vector (In stock / Sold / Exported) over days 1–20]

SLIDE 53

Mobile phones

The company wants to know how many phones are eventually sold locally and how many exported:

$$\lim_{i \to \infty} p_X(i) = \lim_{i \to \infty} T_X^{\,i}\, p_X(0) = \lim_{i \to \infty} T_X^{\,i}\, a$$

SLIDE 54

Mobile phones

The transition matrix $T_X$ has three eigenvectors ($q_1$ and $q_2$ are the indicator vectors of the absorbing states exported and sold):

$$q_1 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \qquad q_2 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \qquad q_3 := \begin{bmatrix} 0.80 \\ -0.53 \\ -0.27 \end{bmatrix}$$

The corresponding eigenvalues are $\lambda_1 := 1$, $\lambda_2 := 1$ and $\lambda_3 := 0.7$. Eigendecomposition of $T_X$:

$$T_X := Q \Lambda Q^{-1}, \qquad Q := \begin{bmatrix} q_1 & q_2 & q_3 \end{bmatrix}, \qquad \Lambda := \begin{bmatrix} \lambda_1 & & \\ & \lambda_2 & \\ & & \lambda_3 \end{bmatrix}$$

SLIDE 55

Mobile phones

We express the initial state vector a in terms of the eigenvectors:

$$Q^{-1} p_X(0) = \begin{bmatrix} 0.3 \\ 0.7 \\ 1.122 \end{bmatrix} \quad\text{so that}\quad a = 0.3\, q_1 + 0.7\, q_2 + 1.122\, q_3$$

SLIDES 56–62

Mobile phones

$$\lim_{i \to \infty} T_X^{\,i}\, a = \lim_{i \to \infty} T_X^{\,i} \left( 0.3\, q_1 + 0.7\, q_2 + 1.122\, q_3 \right)$$
$$= \lim_{i \to \infty} 0.3\, T_X^{\,i} q_1 + 0.7\, T_X^{\,i} q_2 + 1.122\, T_X^{\,i} q_3$$
$$= \lim_{i \to \infty} 0.3\, \lambda_1^{i} q_1 + 0.7\, \lambda_2^{i} q_2 + 1.122\, \lambda_3^{i} q_3$$
$$= \lim_{i \to \infty} 0.3\, q_1 + 0.7\, q_2 + 1.122 \cdot 0.7^{i} q_3 = 0.3\, q_1 + 0.7\, q_2 = \begin{bmatrix} 0 \\ 0.7 \\ 0.3 \end{bmatrix}$$

Eventually 70% of the phones are sold locally and 30% are exported.
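The limit can also be checked by brute force, without the eigendecomposition (a sketch; T_X as on slide 48, with the zero entries filled in):

```python
# Mobile-phone chain: 0 = in stock, 1 = sold, 2 = exported.
T = [[0.7, 0.0, 0.0],
     [0.2, 1.0, 0.0],
     [0.1, 0.0, 1.0]]
a = [0.9, 0.1, 0.0]

def matvec(M, v):
    return [sum(M[j][k] * v[k] for k in range(len(v))) for j in range(len(M))]

p = a[:]
for _ in range(200):
    p = matvec(T, p)
# The in-stock mass decays like 0.7^i, so p converges to (0, 0.7, 0.3).
```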

SLIDE 63

Mobile phones

[Convergence of the state vector (In stock / Sold / Exported) to (0, 0.7, 0.3) over days 1–20]

SLIDE 64

Mobile phones

In general only the components along the unit-eigenvalue eigenvectors survive in the limit:

$$\lim_{i \to \infty} T_X^{\,i}\, p_X(0) = \left( Q^{-1} p_X(0) \right)_1 q_1 + \left( Q^{-1} p_X(0) \right)_2 q_2 = \begin{bmatrix} 0 \\ \left( Q^{-1} p_X(0) \right)_2 \\ \left( Q^{-1} p_X(0) \right)_1 \end{bmatrix}$$

$$b := \begin{bmatrix} 0.6 \\ 0 \\ 0.4 \end{bmatrix}, \quad Q^{-1} b = \begin{bmatrix} 0.6 \\ 0.4 \\ 0.75 \end{bmatrix} \qquad (1)$$

$$c := \begin{bmatrix} 0.4 \\ 0.5 \\ 0.1 \end{bmatrix}, \quad Q^{-1} c = \begin{bmatrix} 0.23 \\ 0.77 \\ 0.50 \end{bmatrix} \qquad (2)$$

SLIDE 65

Initial state vector b

[Evolution of the state vector (In stock / Sold / Exported) over days 1–20]

SLIDE 66

Initial state vector c

[Evolution of the state vector over days 1–20]

SLIDE 67

Stationary distribution

$p_{\text{stat}}$ is a stationary distribution of X if

$$T_X\, p_{\text{stat}} = p_{\text{stat}}$$

i.e. $p_{\text{stat}}$ is an eigenvector with eigenvalue equal to one. If $p_{\text{stat}}$ is the initial state vector, then

$$\lim_{i \to \infty} p_X(i) = p_{\text{stat}}$$

SLIDE 68

Reversibility

Let X(i) be distributed according to a state vector $p \in \mathbb{R}^s$ (s = number of states). X is reversible with respect to p if

$$P\left( X(i) = x_j,\, X(i+1) = x_k \right) = P\left( X(i) = x_k,\, X(i+1) = x_j \right) \quad \text{for all } 1 \leq j, k \leq s$$

This is equivalent to the detailed-balance condition

$$(T_X)_{kj}\, p_j = (T_X)_{jk}\, p_k, \quad \text{for all } 1 \leq j, k \leq s$$

SLIDES 69–73

Reversibility implies stationarity

The detailed-balance condition provides a sufficient condition for stationarity: if X is reversible with respect to p, then p is a stationary distribution of X.

$$\left( T_X\, p \right)_j = \sum_{k=1}^{s} (T_X)_{jk}\, p_k = \sum_{k=1}^{s} (T_X)_{kj}\, p_j = p_j \sum_{k=1}^{s} (T_X)_{kj} = p_j$$

(the last step uses that each column of $T_X$ is a conditional pmf and sums to one).

SLIDE 74

Irreducible chains

Irreducible Markov chains have a single stationary distribution. This follows from the Perron-Frobenius theorem:

◮ The transition matrix of an irreducible Markov chain has a single eigenvector with eigenvalue equal to one
◮ The eigenvector has nonnegative entries
SLIDE 75

Irreducible chains

If X is irreducible and aperiodic, its state vector converges to its stationary distribution $p_{\text{stat}}$ for any initial state vector $p_X(0)$: X converges in distribution to a random variable with pmf given by $p_{\text{stat}}$.

SLIDE 76

Car rental

Aim: Model location of cars. 3 states: Los Angeles, San Francisco, San Jose. New cars are uniformly distributed between the 3 states. After that the transition probabilities are (columns index the current state, rows the next state):

                SF     LA     SJ
San Francisco   0.6    0.1    0.3
Los Angeles     0.2    0.8    0.3
San Jose        0.2    0.1    0.4

SLIDE 77

Car rental

What is the proportion of cars in each city eventually? Does this depend on the initial allocation?

SLIDE 78

Car rental

Markov chain with

$$p_X(0) := \begin{bmatrix} 1/3 \\ 1/3 \\ 1/3 \end{bmatrix}, \qquad T_X := \begin{bmatrix} 0.6 & 0.1 & 0.3 \\ 0.2 & 0.8 & 0.3 \\ 0.2 & 0.1 & 0.4 \end{bmatrix}$$

SLIDE 79

Car rental

[State diagram: SF, LA, SJ with the transition probabilities from T_X]

SLIDE 80

Car rental

The transition matrix has the following eigenvectors

$$q_1 := \begin{bmatrix} 0.273 \\ 0.545 \\ 0.182 \end{bmatrix}, \qquad q_2 := \begin{bmatrix} -0.577 \\ 0.789 \\ -0.211 \end{bmatrix}, \qquad q_3 := \begin{bmatrix} -0.577 \\ -0.211 \\ 0.789 \end{bmatrix}$$

The eigenvalues are $\lambda_1 := 1$, $\lambda_2 := 0.573$ and $\lambda_3 := 0.227$. No matter how the cars are allocated, 27.3% end up in San Francisco, 54.5% in Los Angeles and 18.2% in San Jose.

SLIDES 81–83

Car rental

[Convergence of the state vector (SF / LA / SJ) to the stationary distribution over customers 1–20]

SLIDE 84

Definition · Recurrence · Periodicity · Convergence · Markov-chain Monte Carlo

SLIDE 85

Markov-chain Monte Carlo

Irreducible aperiodic Markov chains converge to a unique stationary distribution. Basic idea: simulate a Markov chain that converges to the target distribution. Very useful in Bayesian statistics. Main challenge: designing the Markov chain so that the stationary distribution is the one we want.

SLIDE 86

Metropolis-Hastings algorithm

Aim: Construct a Markov chain whose stationary distribution is a given pmf

$$p \in \mathbb{R}^s, \qquad p_j := p_X(x_j), \quad 1 \leq j \leq s$$

Idea: Sample from an irreducible Markov chain with transition matrix T on the same state space $\{x_1, \dots, x_s\}$, forcing it to converge to p.

SLIDE 87

Metropolis-Hastings algorithm

Initialize X(0) to an arbitrary value, then for i = 1, 2, 3, …

1. Generate a candidate C from X(i − 1) by using T, i.e.

$$P\left( C = k \mid X(i-1) = j \right) = T_{kj}, \quad 1 \leq j, k \leq s$$

2. Set

$$X(i) := \begin{cases} C & \text{with probability } p_{\text{acc}}\left( X(i-1), C \right) \\ X(i-1) & \text{otherwise} \end{cases}$$

where the acceptance probability is defined as

$$p_{\text{acc}}(j, k) := \min \left\{ \frac{T_{jk}\, p_k}{T_{kj}\, p_j},\ 1 \right\}, \quad 1 \leq j, k \leq s$$
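A minimal sketch of the algorithm for a small finite state space. The nearest-neighbour proposal chain used here is an illustrative choice, not from the slides; it is symmetric, so the transition-matrix ratio cancels and the acceptance probability reduces to min(p_k / p_j, 1):

```python
import random

def metropolis_hastings(p, steps, seed=0):
    """Run a Metropolis-Hastings chain targeting the pmf p on {0, ..., s-1}.

    Proposal: move to a neighbouring state with probability 1/2 each,
    staying put at the two endpoints -- a symmetric T.
    Returns the empirical distribution of the visited states.
    """
    rng = random.Random(seed)
    s = len(p)
    x = 0
    counts = [0] * s
    for _ in range(steps):
        c = x + 1 if rng.random() < 0.5 else x - 1
        c = min(max(c, 0), s - 1)  # endpoint proposals stay put
        if rng.random() < min(p[c] / p[x], 1.0):  # accept with p_acc
            x = c
        counts[x] += 1
    return [n / steps for n in counts]

freq = metropolis_hastings([0.2, 0.5, 0.3], steps=200_000)
```

With enough steps the empirical frequencies approach the target pmf, even though the chain never evaluates a normalizing constant.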
SLIDE 88

Reversibility implies stationarity

Let X(i) be distributed according to a state vector $p \in \mathbb{R}^s$. X is reversible with respect to p if for all $1 \leq j, k \leq s$

$$P\left( X(i) = x_j,\, X(i+1) = x_k \right) = P\left( X(i) = x_k,\, X(i+1) = x_j \right)$$

Equivalent to the detailed-balance condition

$$(T_X)_{kj}\, p_j = (T_X)_{jk}\, p_k, \quad \text{for all } 1 \leq j, k \leq s$$

If X is reversible with respect to p, then p is a stationary distribution of X.

SLIDES 89–93

Reversibility of the Metropolis-Hastings chain

Detailed balance holds trivially if j = k. Assume j ≠ k. Then

$$(T_X)_{kj} := P\left( X(i) = k \mid X(i-1) = j \right) = P\left( X(i) = C,\, C = k \mid X(i-1) = j \right)$$
$$= P\left( X(i) = C \mid C = k,\, X(i-1) = j \right) P\left( C = k \mid X(i-1) = j \right) = p_{\text{acc}}(j, k)\, T_{kj}$$

Similarly,

$$(T_X)_{jk} = p_{\text{acc}}(k, j)\, T_{jk}$$
SLIDES 94–99

Reversibility of the Metropolis-Hastings chain

$$(T_X)_{kj}\, p_j = p_{\text{acc}}(j, k)\, T_{kj}\, p_j = T_{kj}\, p_j \min \left\{ \frac{T_{jk}\, p_k}{T_{kj}\, p_j},\ 1 \right\}$$
$$= \min \left\{ T_{jk}\, p_k,\ T_{kj}\, p_j \right\} = T_{jk}\, p_k \min \left\{ 1,\ \frac{T_{kj}\, p_j}{T_{jk}\, p_k} \right\}$$
$$= p_{\text{acc}}(k, j)\, T_{jk}\, p_k = (T_X)_{jk}\, p_k$$

SLIDE 100

Generating a Poisson random variable

Aim: Generate a Poisson random variable X. We don't need to know the normalizing constant, just that

$$p_X(x) \propto \frac{\lambda^x}{x!}$$

SLIDE 101

Auxiliary Markov chain

$$T_{kj} := \begin{cases} \tfrac{1}{2} & \text{if } j = 0 \text{ and } k = 0 \\ \tfrac{1}{2} & \text{if } k = j + 1 \\ \tfrac{1}{2} & \text{if } k = j - 1 \\ 0 & \text{otherwise} \end{cases}$$
SLIDE 102

Acceptance probability

T is symmetric, so

$$p_{\text{acc}}(j, k) := \min \left\{ \frac{T_{jk}\, p_X(k)}{T_{kj}\, p_X(j)},\ 1 \right\} = \min \left\{ \frac{p_X(k)}{p_X(j)},\ 1 \right\}$$
slide-103
SLIDE 103

Acceptance probability

If j = 0 and k = 0 pacc (j, k) = 1

slide-104
SLIDE 104

Acceptance probability

If k = j + 1 pacc (j, j + 1) = min   

λj+1 (j+1)! λj j!

, 1    = min

  • λ

j + 1, 1

slide-105
SLIDE 105

Acceptance probability

If k = j − 1 pacc (j, j − 1) = min   

λj−1 (j−1)! λj j!

, 1    = min j λ, 1

SLIDE 106

Generating a Poisson random variable

Initialize x_0 := 0. For i = 1, 2, …

◮ Generate a Bernoulli(1/2) sample b and a uniform sample u
◮ If b = 0:
  ◮ If x_{i−1} = 0: x_i := 0
  ◮ If x_{i−1} > 0:
    ◮ If u < x_{i−1}/λ: x_i := x_{i−1} − 1
    ◮ Otherwise: x_i := x_{i−1}
◮ If b = 1:
  ◮ If u < λ/(x_{i−1} + 1): x_i := x_{i−1} + 1
  ◮ Otherwise: x_i := x_{i−1}
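The steps above translate almost line for line into Python (a sketch; `poisson_mh` and its parameter names are illustrative):

```python
import random

def poisson_mh(lam, steps, seed=0):
    """Metropolis-Hastings sampler for Poisson(lam), using the
    random-walk auxiliary chain on {0, 1, 2, ...} from slide 101."""
    rng = random.Random(seed)
    x = 0
    samples = []
    for _ in range(steps):
        b = rng.random() < 0.5   # Bernoulli(1/2): propose down (False) or up (True)
        u = rng.random()         # uniform sample for the accept/reject step
        if not b:
            if x > 0 and u < x / lam:   # p_acc(j, j-1) = min(j/lam, 1)
                x -= 1                  # (x stays at 0 when x == 0)
        else:
            if u < lam / (x + 1):       # p_acc(j, j+1) = min(lam/(j+1), 1)
                x += 1
        samples.append(x)
    return samples

samples = poisson_mh(lam=6, steps=100_000)
burned = samples[1000:]           # discard an initial burn-in stretch
mean = sum(burned) / len(burned)  # should be close to lam = 6
```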

SLIDE 107

Generating a Poisson random variable with λ := 6

[Empirical distribution of the chain after 10^0, 10^1, 10^2, 10^3 iterations, converging to the Poisson(6) pmf]

SLIDE 108

Metropolis-Hastings algorithm

Also works for infinite and continuous state spaces. Only requires knowing the ratio

$$\frac{p_X(x)}{p_X(y)} \quad\text{or}\quad \frac{f_X(x)}{f_X(y)}$$

SLIDES 109–112

Useful in Bayesian statistics

Aim: Sample from $f_{A \mid B}$. We know $f_A$ and $f_{B \mid A}$:

$$f_{A \mid B}(a \mid b) = \frac{f_A(a)\, f_{B \mid A}(b \mid a)}{\int_{u = -\infty}^{\infty} f_A(u)\, f_{B \mid A}(b \mid u)\, du}$$

If we apply Metropolis-Hastings, for any $a_1 \neq a_2$ we only need the ratio

$$\frac{f_{A \mid B}(a_1 \mid b)}{f_{A \mid B}(a_2 \mid b)} = \frac{f_A(a_1)\, f_{B \mid A}(b \mid a_1)}{f_A(a_2)\, f_{B \mid A}(b \mid a_2)}$$

The intractable normalizing integral cancels in the ratio.