Advanced Sampling Algorithms
Mobashir Mohammad, Hirak Sarkar, Parvathy Sudhir, Yamilet Serrano Llerena, Aditya Kulkarni, Tobias Bertelsen, Nirandika Wanigasekara, Malay Singh
+Introduction
Mobashir Mohammad
+Sampling Algorithms
Monte Carlo Algorithms
Give an approximate answer with a certain probability
May terminate with a failure signal
Correct answer not guaranteed; running time bounded
Las Vegas Algorithms
Always give an exact answer
May not terminate in a definite time
Answer guaranteed; running time not bounded
+Monte Carlo Method
What are the chances of winning with a properly shuffled deck?
Hard to compute, because winning or losing depends on a complex procedure of reorganizing the cards.
Insight: why not play just a few hands, and see empirically how many we in fact win?
More generally: we can approximate a probability density function using only a few samples from that density.
+Simplest Monte Carlo Method
Finding the value of π – "Shooting Darts"
π/4 is equal to the area of a circle of diameter 1 (equivalently, the quarter of a circle of radius 1 that lies inside the unit square).
The integral solved with Monte Carlo:
π/4 = ∬_{x² + y² < 1} dx dy  (integrated over the unit square)
Randomly select a large number of points inside the square:
π ≈ 4 · N_inside_circle / N_total
(Exact when N_total → ∞)
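The estimator above is easy to try out; a minimal Python sketch (function and variable names are our own):

```python
import random

def estimate_pi(n_total: int) -> float:
    """Estimate pi by sampling points uniformly in the unit square
    and counting how many fall inside the quarter circle x^2 + y^2 < 1."""
    n_inside = 0
    for _ in range(n_total):
        x, y = random.random(), random.random()
        if x * x + y * y < 1:
            n_inside += 1
    return 4 * n_inside / n_total

print(estimate_pi(1_000_000))  # ~3.14; exact only as n_total -> infinity
```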
+Monte Carlo Principle
Given a very large set X and a distribution p(x) over it, we draw a set of N independent and identically distributed samples x⁽¹⁾, …, x⁽ᴺ⁾.
We can then approximate the distribution using these samples:
p_N(x) = (1/N) Σ_{i=1}^{N} 1(x⁽ⁱ⁾ = x) → p(x) as N → ∞
We can also use these samples to compute expectations, …
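A small sketch of the principle, using an arbitrary stand-in distribution p:

```python
import random
from collections import Counter

# An arbitrary example distribution p(x) over the small set X = {0, 1, 2}.
p = {0: 0.2, 1: 0.5, 2: 0.3}

N = 100_000
samples = random.choices(list(p), weights=list(p.values()), k=N)

# Empirical approximation p_N(x) = (1/N) * #{i : x_i = x} -> p(x).
p_N = {x: c / N for x, c in sorted(Counter(samples).items())}
print(p_N)

# The same samples approximate expectations:
# E[X^2] = 0.2*0 + 0.5*1 + 0.3*4 = 1.7
print(sum(x * x for x in samples) / N)
```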
+Sampling
Problem: exact inference from a probability distribution is hard. E.g., determine the number of people likely to go shopping at the Nike store at Vivo City on a Sunday.
Solution: use random samples to get an approximate inference.
How: use a Markov chain to get random samples!
+Markov Chains
Hirak Sarkar
+Introduction to Markov Chains
Real-world systems contain uncertainty and evolve over time. Stochastic processes and Markov chains model such systems.
A discrete-time stochastic process is a sequence of random variables X_0, X_1, …, typically denoted by {X_n}.
+Components of a Stochastic Process
The state space of a stochastic process is the set of all values that {X_n} can take.
Time: n = 1, 2, …
The set of states is denoted by S: S = {s_1, s_2, …, s_k}, so there are k states. Clearly {X_n} will take one of these k values.
+Markov Property and Markov Chain
Consider the special class of stochastic processes that satisfy the Markov property.
Markov property: the state X_n depends only on X_{n−1}, not on any of the X_j for j < n−1.
Formally,
P(X_n = i_n | X_0 = i_0, …, X_{n−1} = i_{n−1}) = P(X_n = i_n | X_{n−1} = i_{n−1}) = p(i_{n−1}, i_n)
We call such a process a Markov chain. The matrix containing p(i, j) in cell (i, j) is called the transition matrix.
+Transition Matrix
The one-step transition matrix for a Markov chain with states S = {0, 1, 2} is
P =
[ p_00  p_01  p_02 ]
[ p_10  p_11  p_12 ]
[ p_20  p_21  p_22 ]
where p_ij = Pr(X_1 = j | X_0 = i), with p_ij ≥ 0 for all i, j and Σ_j p_ij = 1.
If P is the same at every point in time, the chain is called time-homogeneous.
+Example: Random Walk
[Figure: a three-node graph A–B–C with its adjacency matrix and the corresponding transition matrix; transition probabilities 1 (A→B), 1/2 (B→A), 1/2 (B→C), 1 (C→B).]
+What is a Random Walk?
[Figure: successive frames t = 0, 1, 2, 3 of a walker on the three-node graph A–B–C, moving according to the transition probabilities above. A simulation sketch follows below.]
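A minimal sketch of this walk, with the transition probabilities read off the figure:

```python
import random

# Transitions of the A-B-C walk: from A always to B,
# from B to A or C with probability 1/2 each, from C always to B.
P = {
    "A": [("B", 1.0)],
    "B": [("A", 0.5), ("C", 0.5)],
    "C": [("B", 1.0)],
}

def step(state: str) -> str:
    """Take one random step according to the transition probabilities."""
    targets, probs = zip(*P[state])
    return random.choices(targets, weights=probs)[0]

state = "A"  # t = 0
for t in range(1, 5):
    state = step(state)
    print(f"t={t}: {state}")
```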
+Probability Distributions
x_t(i) = P(X_t = i) = probability that the surfer is at node i at time t
x_{t+1}(i) = Σ_j P(X_{t+1} = i | X_t = j) P(X_t = j) = Σ_j p_ji x_t(j)
In matrix form: x_{t+1} = x_t P = x_{t−1} P P = … = x_0 P^{t+1}
What happens when the surfer keeps walking for a long time?
+Stationary Distribution
When the surfer keeps walking for a long time, the distribution does not change anymore, i.e. x_{t+1} = x_t.
We denote this distribution by π.
+Basic Limit Theorem
P =
[ 0    1    0   ]
[ 1/3  0    2/3 ]
[ 1/3  1/3  1/3 ]
P⁵ =
[ 0.246914  0.407407  0.345679 ]
[ 0.251029  0.362140  0.386831 ]
[ 0.251029  0.366255  0.382716 ]
P¹⁰ =
[ 0.250013  0.374740  0.375248 ]
[ 0.249996  0.375095  0.374909 ]
[ 0.249996  0.375078  0.374926 ]
P²⁰ =
[ 0.2500000002  0.3749999913  0.3750000085 ]
[ 0.2499999999  0.3750000030  0.3749999970 ]
[ 0.2499999999  0.3750000028  0.3749999973 ]
Every row converges to the same stationary distribution (0.25, 0.375, 0.375).
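This convergence is easy to verify numerically; a sketch using NumPy:

```python
import numpy as np

# The chain from this slide.
P = np.array([[0,   1,   0  ],
              [1/3, 0,   2/3],
              [1/3, 1/3, 1/3]])

for n in (5, 10, 20):
    print(f"P^{n}:\n{np.linalg.matrix_power(P, n)}\n")
# Every row approaches the stationary distribution (0.25, 0.375, 0.375).
```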
+Classification of States
State j is reachable from state i if the probability of going from i to j in n > 0 steps is greater than zero (equivalently, if the state transition diagram contains a path from i to j).
A subset S′ of the state space S is closed if p_ij = 0 for every i ∈ S′ and j ∉ S′.
A state i is said to be absorbing if it is a single-element closed set.
A closed set S′ of states is irreducible if every state in S′ is reachable from every other state in S′.
A Markov chain is said to be irreducible if its state space S is irreducible.
+Example
[Figure: two state-transition diagrams. Left: a reducible Markov chain containing an absorbing state and a closed irreducible set. Right: an irreducible Markov chain.]
+Periodic and Aperiodic States
Suppose that the structure of the Markov chain is such that state i is visited only after a number of steps that is an integer multiple of an integer d > 1. Then the state is called periodic with period d.
If no such integer exists (i.e., d = 1), the state is called aperiodic.
[Figure: a small example chain with transition probabilities 1, 0.5, 0.5 in which a state is periodic with period d = 2.]
+Stationary Distribution
P =
[ 0.3  0.7 ]
[ 0.4  0.6 ]
π_0 = (π_0(0), π_0(1)) = (0.5, 0.5)
π_1 = π_0 P = (0.5, 0.5) [ 0.3 0.7; 0.4 0.6 ] = (0.35, 0.65)
+Steady State Analysis
Recall that the probability of finding the MC at state i after the k-th step is given by
π_i(k) = Pr(X_k = i)
An interesting question is what happens in the "long run", i.e.
π_i = lim_{k→∞} π_i(k)
This is referred to as the steady-state, equilibrium, or stationary probability.
Questions:
Do these limits exist?
If they exist, do they converge to a legitimate probability distribution, i.e. Σ_i π_i = 1?
How do we evaluate π_j for all j?
+Steady State Analysis
THEOREM: In a finite irreducible aperiodic Markov chain a unique stationary state probability vector π exists such that π_j > 0 and
lim_{k→∞} π_j(k) = π_j
The steady-state distribution π is determined by solving
π = πP and Σ_i π_i = 1
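In practice π can be computed from the theorem's two conditions by solving a linear system; a sketch (the row-replacement trick is one standard way of imposing the normalization):

```python
import numpy as np

def stationary_distribution(P: np.ndarray) -> np.ndarray:
    """Solve pi = pi P together with sum(pi) = 1 by rewriting the
    system as (P^T - I) pi = 0 and replacing one equation with the
    normalization constraint."""
    n = P.shape[0]
    A = P.T - np.eye(n)
    A[-1, :] = 1.0          # replace last equation by sum(pi) = 1
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

P = np.array([[0, 1, 0], [1/3, 0, 2/3], [1/3, 1/3, 1/3]])
print(stationary_distribution(P))  # [0.25, 0.375, 0.375]
```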
+Sampling from a Markov Chain
Parvathy Sudhir
+Sampling from a Markov Chain
Sampling: create random samples from the desired probability distribution P.
Construct a Markov chain with steady-state distribution π = P.
Run the Markov chain long enough; on convergence, its states are approximately distributed according to π.
Random sampling of the Markov chain then yields random samples of P.
But! It is difficult to determine how long to run the Markov chain.
+Ising Model
Named after physicist Ernst Ising; invented by Wilhelm Lenz.
A mathematical model of ferromagnetism.
Consists of discrete variables representing the magnetic dipole moments of atomic spins (+1 or −1).
+1D Ising Model
Solved by Ernst Ising. N spins x_1, …, x_N arranged in a ring.
P(s) = e^{−β H(s)} / Z
β – inverse temperature; H(s) – energy; s = (x_1, …, x_N) is one possible configuration; Z is the normalizing constant.
Higher probability is placed on states with lower energy.
+2D Ising Model
Spins are arranged in a lattice; each spin interacts with its 4 neighbors.
One of the simplest statistical models to show a phase transition.
+2D Ising Model (2)
G = (V, E) is the lattice graph; x_i ∈ {+1, −1} is the value of the spin at location i; S = {s_1, s_2, …, s_n} is the state space.
H(s) = − Σ_{(x_i, x_j) ∈ E} s(x_i) s(x_j)
Similar neighboring spins subtract energy; opposite spins add energy.
+2D Ising Model (3)
P(s) = e^{−β H(s)} / Z = e^{β Σ_{(x_i, x_j) ∈ E} s(x_i) s(x_j)} / Z
If β = 0, all spin configurations have the same probability; if β > 0, lower-energy configurations are preferred.
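A sketch of the energy and the unnormalized Boltzmann weight for a small grid (free boundary conditions assumed; helper names are our own):

```python
import numpy as np

def energy(s: np.ndarray) -> float:
    """H(s) = - sum over neighboring pairs (i, j) of s_i * s_j
    on a 2D grid with free boundary conditions."""
    return -float(np.sum(s[:-1, :] * s[1:, :]) + np.sum(s[:, :-1] * s[:, 1:]))

def weight(s: np.ndarray, beta: float) -> float:
    """Unnormalized weight e^{-beta H(s)}; the normalizer Z would
    require summing over all 2^(L*L) configurations."""
    return float(np.exp(-beta * energy(s)))

rng = np.random.default_rng(0)
s = rng.choice([-1, 1], size=(4, 4))   # a random 4x4 spin configuration
print(energy(s), weight(s, beta=0.5))
```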
+Exact Inference is Hard
Posterior distribution over s: if y is any observed variable,
P(s | y) = p(s, y) / Σ_s p(s, y)
This is an intractable computation:
the joint probability distribution ranges over 2^N possible combinations of s, and the same blow-up affects the marginal probability distribution at a site and MAP estimation.
+The Big Question
Given π on S = {s_1, s_2, …, s_n}, simulate a random object with distribution π.
+Methods
Generate random samples to estimate a quantity; the samples are generated "Markov-chain style":
Markov Chain Monte Carlo (MCMC)
Propp-Wilson simulation – a Las Vegas variant
Sandwiching – an improvement on Propp-Wilson
+Markov Chain Monte Carlo Method (MCMC)
Yamilet R. Serrano Llerena
+Recall
Given a probability distribution π on S = {s_1, …, s_k}, how do we simulate a random object with distribution π?
+Intuition
Given a probability distribution π on S = {s_1, …, s_k}, how do we simulate a random object with distribution π?
The difficulty: S is typically a very high-dimensional space.
+MCMC
1. Construct an irreducible and aperiodic Markov chain [X_0, X_1, …] whose stationary distribution is π.
2. Run the chain with an arbitrary initial distribution: the Markov chain convergence theorem guarantees that the distribution of the chain at time n converges to π.
3. Hence, if we run the chain for a sufficiently long time n, the distribution of X_n will be very close to π, so X_n can be used as a sample.
+MCMC (2)
Generally, there are two types of MCMC algorithms:
Metropolis-Hastings
Gibbs Sampling
+Metropolis-Hastings Algorithm
Original method: Metropolis, Rosenbluth, Rosenbluth, Teller and Teller (1953).
Generalized by Hastings in 1970. Rediscovered by Tanner and Wong (1987) and Gelfand and Smith (1990).
One way to implement MCMC.
[Photo: Nicholas Metropolis]
+Metropolis-Hastings Algorithm
Basic Idea
GIVEN: a probability distribution π on S = {s_1, …, s_k}
GOAL: approximately sample from π
Start with a proposal distribution Q(x, y):
Q(x, y) specifies the transitions of a Markov chain, playing the role of the transition matrix; x is the current state, y the proposed new state.
By accepting/rejecting proposals, MH simulates a Markov chain whose stationary distribution is π.
+Metropolis-Hastings Algorithm
Algorithm
GIVEN: a probability distribution π on S = {s_1, …, s_k}
GOAL: approximately sample from π
Given the current sample x, draw y from the proposal distribution Q(x, ·), draw U ~ Uniform(0, 1), and update
X_{n+1} = y if U ≤ α(x, y), and X_{n+1} = x otherwise
where the acceptance probability is
α(x, y) = min(1, π(y) Q(y, x) / (π(x) Q(x, y)))
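A generic sketch of this accept/reject loop (the function and its arguments are our own illustration, not a library API):

```python
import random
from collections import Counter

def metropolis_hastings(pi, propose, q, x0, n_steps):
    """Generic Metropolis-Hastings.

    pi(x): target probability, known only up to a constant;
    propose(x): draws a proposal y from Q(x, .);
    q(x, y): proposal probability Q(x, y).
    Returns the chain X_0, ..., X_{n_steps}."""
    chain = [x0]
    x = x0
    for _ in range(n_steps):
        y = propose(x)
        alpha = min(1.0, (pi(y) * q(y, x)) / (pi(x) * q(x, y)))
        if random.random() <= alpha:
            x = y              # accept the proposal
        chain.append(x)        # on rejection the chain stays at x
    return chain

# Toy target on S = {0,...,4} with pi proportional to (1,2,3,4,5),
# using an independent uniform proposal (so q cancels out).
states = range(5)
chain = metropolis_hastings(
    pi=lambda x: x + 1,
    propose=lambda x: random.choice(states),
    q=lambda x, y: 1 / 5,
    x0=0,
    n_steps=100_000,
)
freqs = {s: c / len(chain) for s, c in sorted(Counter(chain).items())}
print(freqs)  # approaches (1,2,3,4,5)/15
```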
+Metropolis-Hastings Algorithm
The Ising Model
Consider m sites around a circle (the 1D model). Each site i can have one of two spins, x_i ∈ {−1, 1}.
The target distribution: π(x) ∝ exp(β Σ_i x_i x_{i+1})
+Metropolis-Hastings Algorithm
The Ising Model
Target distribution: π(x) ∝ exp(β Σ_i x_i x_{i+1})
Proposal distribution:
1. Randomly pick one out of the m spins.
2. Flip its sign.
Acceptance probability (say the i-th spin is flipped):
α = min(1, π(x′)/π(x)) = min(1, exp(−2β x_i (x_{i−1} + x_{i+1})))
A sampler sketch follows below.
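A sketch of this single-spin-flip sampler for the ring, using the simplified acceptance ratio derived above (parameter names are our own):

```python
import math
import random

def ising_ring_metropolis(m: int, beta: float, n_steps: int) -> list:
    """Single-spin-flip Metropolis for m spins on a ring; target
    pi(x) proportional to exp(beta * sum_i x_i * x_{i+1})."""
    x = [random.choice([-1, 1]) for _ in range(m)]
    for _ in range(n_steps):
        i = random.randrange(m)                  # 1. pick a spin uniformly
        neighbors = x[(i - 1) % m] + x[(i + 1) % m]
        ratio = math.exp(-2.0 * beta * x[i] * neighbors)  # pi(x')/pi(x)
        if random.random() < min(1.0, ratio):
            x[i] = -x[i]                         # 2. accept the flip
    return x

print(ising_ring_metropolis(m=50, beta=0.8, n_steps=100_000))
```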
+Metropolis-Hastings Algorithm
The Ising Model
[Image taken from "Monte Carlo investigation of the Ising model" (2006) – Tobin Fricke]
+Disadvantages of MCMC
No matter how large n is taken to be in the MCMC algorithm, there will still be some discrepancy between the distribution of the output and the target distribution π.
In order to make this error small, we need to figure out how large n needs to be.
+Bounds on Convergence of MCMC
Aditya Kulkarni
+Séminaire de Probabilités (Strasbourg, 1983)
If it is difficult to obtain asymptotic bounds on the convergence time of MCMC algorithms for Ising models, use quantitative bounds.
Use characteristics of the Ising-model MCMC algorithm to obtain quantitative bounds.
David Aldous
+What We Need to Ensure
As time goes on, we approach the stationary distribution in a monotonically decreasing fashion: d(t) → 0 as t → ∞.
When we stop, the sample should follow a distribution that is no further from the stationary distribution than some tolerance ε: d(t) ≤ ε.
+Total Variation Distance ‖·‖_TV
Given two distributions p and q over a finite set of states S, the total variation distance ‖·‖_TV is
‖p − q‖_TV = (1/2) ‖p − q‖_1 = (1/2) Σ_{x ∈ S} |p(x) − q(x)|
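A direct translation into code:

```python
def tv_distance(p: dict, q: dict) -> float:
    """Total variation distance: (1/2) * sum over x of |p(x) - q(x)|."""
    states = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in states)

print(tv_distance({"a": 0.5, "b": 0.5}, {"a": 0.8, "b": 0.2}))  # 0.3
```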
+Convergence Time
Let X = (X_0, X_1, …) be a Markov chain on a state space with stationary distribution π.
We define d(t) as the worst total variation distance at time t:
d(t) = max_{x ∈ S} ‖P(X_t | X_0 = x) − π‖_TV
The mixing time τ(ε) is the minimum time t such that d(t) is at most ε:
τ(ε) = min {t : d(t) ≤ ε}
Define τ = τ(1/2e) = min {t : d(t) ≤ 1/2e}
+Quantitative Results
[Figure: d(t) plotted against t, decreasing from 1; the levels ε and 1/2e on the d(t) axis mark the corresponding times τ(ε) and τ on the t axis.]
+Lemma
Consider two random walks started from state i and state j, and define d̄(t) as the worst total variation distance between their respective probability distributions at time t. Then:
a. d̄(t) ≤ 2 d(t)
b. d(t) is decreasing
+Upper Bound on d(t): Proof
From part (a) and the definition τ = τ(1/2e) = min {t : d(t) ≤ 1/2e}, we get d̄(τ) ≤ e⁻¹.
From part (b) we deduce that an upper bound on d̄ at a particular time t_0 gives an upper bound for later times:
d̄(t) ≤ (d̄(t_0))ⁿ for n t_0 ≤ t ≤ (n+1) t_0, hence d̄(t) ≤ (d̄(t_0))^{t/t_0 − 1}
Substituting τ for t_0:
d̄(t) ≤ (d̄(τ))^{t/τ − 1}
d(t) ≤ exp(1 − t/τ), t ≥ 0
+Upper Bound on τ(ε): Proof
Algebraic calculations:
d(t) ≤ exp(1 − t/τ)
log d(t) ≤ 1 − t/τ
t/τ ≤ 1 − log d(t) ≤ 1 + log(1/d(t))
t ≤ τ (1 + log(1/ε))
τ(ε) ≤ τ (1 + log(1/ε))
+Upper Bounds
d(t) ≤ min(1, exp(1 − t/τ)), t ≥ 0
τ(ε) ≤ τ (1 + log(1/ε)), 0 < ε < 1
τ(ε) ≤ 2e τ (1 + log(1/ε)), 0 < ε < 1
+Entropy of the Initial Distribution
Measure of randomness:
ent(μ) = − Σ_{x ∈ S} μ(x) log μ(x)
where x ranges over initial states and μ is the initial probability distribution.
+A Few More Lemmas
1. Let (X_t) be the random walk associated with μ, and let μ_t be the distribution of X_t. Then ent(μ_t) ≤ t · ent(μ).
2. If v is a distribution on S such that ‖v − π‖ ≤ ε, then ent(v) ≥ (1 − ε) log |S|.
+Lower Bound on d(t): Proof
From lemma 2:
ent(v) ≥ (1 − ε) log |S|
ent(v) / log |S| ≥ 1 − ε ≥ 1 − d(t)
d(t) ≥ 1 − ent(v) / log |S|
and, using lemma 1 for the distribution at time t,
d(t) ≥ 1 − t · ent(μ) / log |S|
+Lower Bound on τ(ε): Proof
From lemma 1:
ent(μ_t) ≤ t · ent(μ), so t ≥ ent(μ_t) / ent(μ)
From lemma 2:
ent(μ_t) ≥ (1 − ε) log |S|
Hence
t ≥ (1 − ε) log |S| / ent(μ)
τ(ε) ≥ (1 − ε) log |S| / ent(μ)
+Lower Bounds
d(t) ≥ 1 − t · ent(μ) / log |S|
τ(ε) ≥ (1 − ε) log |S| / ent(μ)
+Propp-Wilson
Tobias Bertelsen
+An Exact Version of MCMC
Problems with MCMC:
A. It has an accuracy error, which depends on the starting state.
B. We must know the number of iterations.
James Propp and David Wilson proposed Coupling From The Past (1996), a.k.a. the Propp-Wilson algorithm.
Idea: solve both problems by running the chain infinitely.
+An Exact Version of MCMC
Theoretical: runs all configurations infinitely; literally takes infinite time; impossible.
Coupling from the past: runs all configurations for a finite time; might take thousands of years; infeasible.
Sandwiching: runs a few configurations for a finite time; takes seconds; practicable.
+Theoretical Exact Sampling
Recall the convergence theorem: we approach the stationary distribution as the number of steps goes to infinity.
Intuitive approach: to sample perfectly, we start a chain and run it for infinity. Start at t = 0, sample at t = ∞. Problem: we never get a sample.
Alternative approach: to sample perfectly, we take a chain that has already been running for an infinite amount of time. Start at t = −∞, sample at t = 0.
+Theoretical Independence of the Starting State
A sample from a Markov chain in MCMC depends solely on:
the starting state X_{−∞}
the sequence of random numbers U
We want to be independent of the starting state: for a given sequence of random numbers U_{−∞}, …, U_{−1}, we want to ensure that the starting state X_{−∞} has no effect on X_0.
+Theoretical Independence of the Starting State
Collisions: for a given U_{−∞}, …, U_{−1}, if two Markov chains are at the same state at some time t′, they will continue on together.
+Coupling From the Past
At some finite past time t = −N, all past chains have already run infinitely long (∞ − N = ∞) and have coupled into one.
We want to continue that coupled chain to t = 0, but we don't know which state it will be in at t = −N.
So run a chain from every state starting at −N instead of −∞.
+Coupling From the Past
1. Let U = U_{−1}, U_{−2}, U_{−3}, … be a sequence of independent uniformly random numbers.
2. For N_k ∈ {1, 2, 4, 8, …}:
   1. Extend U to length N_k, keeping U_{−1}, …, U_{−N_{k−1}} the same.
   2. Start one chain from each state at t = −N_k.
   3. For t from −N_k to zero: simulate the chains using U_t.
   4. If all chains have converged to the same state s_i at t = 0, return s_i.
   5. Else repeat the loop.
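A sketch of this loop for a small chain (the update function φ in the example is the one from the "Using the Same U" slide further below; the skeleton itself follows the five steps above):

```python
import random
from collections import Counter

def coupling_from_the_past(states, phi):
    """Propp-Wilson: return an exact sample from the stationary
    distribution of the chain defined by the update function phi(s, u)."""
    U = []      # U[t] plays the role of U_{-(t+1)}; reused across iterations
    N = 1
    while True:
        while len(U) < N:                 # extend U, keeping old numbers
            U.append(random.random())
        current = {s: s for s in states}  # one chain per state, at t = -N
        for t in range(N - 1, -1, -1):    # step from t = -N up to t = 0
            current = {s: phi(x, U[t]) for s, x in current.items()}
        if len(set(current.values())) == 1:   # all chains coalesced
            return next(iter(current.values()))
        N *= 2                            # double the starting time, retry

# Example update function for k = 5 states: jump to state 1 with
# probability 1/2, otherwise move one step up (the top state stays).
k = 5
phi = lambda s, u: 1 if u < 0.5 else min(s + 1, k)
draws = Counter(coupling_from_the_past(range(1, k + 1), phi)
                for _ in range(10_000))
print(sorted(draws.items()))  # approaches pi(s_i) = 1/2^i, pi(s_k) = 2/2^k
```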
+Questions
Why do we double the lengths?
Worst case: fewer than 4 N_opt steps, where N_opt is the minimal N at which we can achieve convergence. Compare to trying N ∈ {1, 2, 3, 4, 5, …}, which takes O(N_opt²) steps.
Why do we have to use the same random numbers?
Different samples might take longer or shorter to converge. We must evaluate the same sample in each iteration: the sample should depend only on U, not on the different N.
+Using the Same U
We have k states with the following update function:
φ(s_i, U) = s_1 if U < 1/2, s_{i+1} otherwise  (for i < k)
φ(s_k, U) = s_1 if U < 1/2, s_k otherwise
The stationary distribution is π(s_i) = 1/2^i for i < k, and π(s_k) = 2/2^k.
+Using the Same U
Let's assume we generate a new U for each run: U¹, U², U³, … The probability of s_1 should only depend on the last random number: P(s_1) = P(U_{−1} < 1/2) = 1/2. But with fresh numbers:

N | U | P(s_1) | P(s_1) acc.
1 | U¹_{−1} | P(U¹_{−1} < 1/2) | 50%
2 | U²_{−2}, U²_{−1} | P(U²_{−1} < 1/2 ∨ U¹_{−1} < 1/2) | 75%
4 | U³_{−4}, …, U³_{−1} | P(U³_{−1} < 1/2 ∨ U²_{−1} < 1/2 ∨ U¹_{−1} < 1/2) | 81.25%
8 | U⁴_{−8}, …, U⁴_{−1} | P(U⁴_{−1} < 1/2 ∨ U³_{−1} < 1/2 ∨ U²_{−1} < 1/2 ∨ U¹_{−1} < 1/2) | 81.64%
+Using the Same U
The probability of s_1 should only depend on the last random number: P(s_1) = P(U_{−1} < 1/2) = 1/2. Let's instead use the same U for each run: U¹.

N | U | P(s_1) | P(s_1) acc.
1 | U¹_{−1} | P(U¹_{−1} < 1/2) | 50%
2 | U¹_{−2}, U¹_{−1} | P(U¹_{−1} < 1/2) | 50%
4 | U¹_{−4}, …, U¹_{−1} | P(U¹_{−1} < 1/2) | 50%
8 | U¹_{−8}, …, U¹_{−1} | P(U¹_{−1} < 1/2) | 50%
+Problem
In each step we update up to k chains, so the total execution time is O(N_opt · k). But k = 2^{L²} in the Ising model, which is worse than the naïve approach O(k).
+Sandwiching
Nirandika Wanigasekara
+Sandwiching
In the Propp-Wilson algorithm, with many vertices, running k chains will take too long; it is impractical for large k.
Can we still get the same results? Try sandwiching: run only a relatively small number of chains.
+Sandwiching
Idea: find two chains bounding all other chains. If we have two such boundary chains, check whether those two chains converge; then all other chains have converged as well.
+Sandwiching
To come up with the boundary chains we need a way to order the states, s_1 ≤ s_2 ≤ s_3 ≤ …, such that a chain in a higher state never crosses a chain in a lower state.
This results in a Markov chain obeying certain monotonicity properties.
[Figure: two coupled trajectories, one started high and one started low, that never cross.]
+Sandwiching
Let's consider:
a fixed number of states k, with state space S = {1, …, k}
a transition matrix P with P_{1,1} = P_{1,2} = 1/2, P_{k,k} = P_{k,k−1} = 1/2, and P_{i,i−1} = P_{i,i+1} = 1/2 for i = 2, …, k−1
all other entries are 0
+Sandwiching
What is this Markov chain doing? It is the ladder walk on k vertices: at each integer time it takes one step up or one step down the ladder, each with probability 1/2; at the top or the bottom (state k or state 1) it stays where it is with probability 1/2.
The stationary distribution π of this Markov chain is uniform:
π_i = 1/k for i = 1, …, k
+Sandwiching
Propp-Wilson for this Markov chain, with:
a valid update function φ: if U < 1/2 then step down, if U ≥ 1/2 then step up (staying put at the boundary states)
negative starting times N_1, N_2, … = 1, 2, 4, 8, …
number of states k = 5
+Sandwiching
[Figure: a Propp-Wilson run of the k = 5 ladder walk. A code sketch follows below.]
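Because the ladder-walk update function is monotone (next slide), it suffices to track only the bottom and top chains; a sketch, assuming the update rule and starting times given above:

```python
import random
from collections import Counter

def sandwich_cftp(k: int) -> int:
    """Monotone coupling from the past for the ladder walk on {1, ..., k}:
    only the bottom chain (started at 1) and top chain (started at k) run."""
    def phi(s: int, u: float) -> int:
        # Step down if u < 1/2, up otherwise, staying put at the boundary.
        return max(s - 1, 1) if u < 0.5 else min(s + 1, k)

    U = []
    N = 1
    while True:
        while len(U) < N:               # reuse the same random numbers
            U.append(random.random())
        lo, hi = 1, k                   # boundary chains at time t = -N
        for t in range(N - 1, -1, -1):
            lo, hi = phi(lo, U[t]), phi(hi, U[t])
        if lo == hi:                    # boundary chains coalesced, so all
            return lo                   # k chains have coalesced as well
        N *= 2

draws = Counter(sandwich_cftp(5) for _ in range(10_000))
print(sorted(draws.items()))  # frequencies approach the uniform pi_i = 1/5
```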
+Sandwiching
The update function preserves the ordering between states:
for all U ∈ [0, 1] and all i, j ∈ {1, …, k} such that i ≤ j, we have φ(i, U) ≤ φ(j, U).
It is therefore sufficient to run only 2 chains rather than k.
+Sandwiching
Are these conditions always met? No, not always. But there are frequent instances where they are met, and sandwiching is especially useful when k is large. The Ising model is a good example.
+Ising Model
Malay Singh
+2D Ising Model
A grid with sites numbered from 1 to L², where L is the grid size. Each site i can have spin x_i ∈ {−1, +1}; V is the set of sites.
S = {−1, 1}^V defines all possible configurations (states).
The magnetisation m of a state s is
m(s) = Σ_i s(x_i) / L²
The energy of a state s is
H(s) = − Σ_{(x_i, x_j) ∈ E} s(x_i) s(x_j)
+Ordering of States
For two states s, s′ we say s ≤ s′ if s(x) ≤ s′(x) for all x ∈ V.
The maximum (m = 1) and minimum (m = −1) are s_max(x) = +1 for all x ∈ V and s_min(x) = −1 for all x ∈ V.
Hence s_min ≤ s ≤ s_max for all s.
+The Update Function
We use the sequence of random numbers U_n, U_{n−1}, …, U_0. To update the state X_n, we choose a site x in {1, …, L²} uniformly and set
X_{n+1}(x) = +1 if U_{n+1} < exp(2β(k_+(x, s) − k_−(x, s))) / (exp(2β(k_+(x, s) − k_−(x, s))) + 1), and −1 otherwise
where k_+(x, s) and k_−(x, s) denote the numbers of neighbors of x with spin +1 and −1 in the current configuration s.
+Maintaining Ordering
Ordering after an update (from X_n to X_{n+1}): we choose the same site x to update in both chains; the spin of x at X_{n+1} depends on the update function. For configurations s ≤ s′ we want to check that
exp(2β(k_+(x, s) − k_−(x, s))) / (exp(2β(k_+(x, s) − k_−(x, s))) + 1) ≤ exp(2β(k_+(x, s′) − k_−(x, s′))) / (exp(2β(k_+(x, s′) − k_−(x, s′))) + 1)
That is equivalent to checking
k_+(x, s) − k_−(x, s) ≤ k_+(x, s′) − k_−(x, s′)
+Maintaining Ordering
As s ≤ s′ we have k_+(x, s) ≤ k_+(x, s′) and k_−(x, s) ≥ k_−(x, s′). Subtracting the second inequality from the first gives
k_+(x, s) − k_−(x, s) ≤ k_+(x, s′) − k_−(x, s′)
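A sketch of one coupled update step for the two boundary chains of the 2D Ising model, assuming free boundary conditions and the heat-bath rule above (helper names are our own):

```python
import math
import numpy as np

def heat_bath_update(s: np.ndarray, site: tuple, u: float, beta: float) -> None:
    """Set the spin at `site` to +1 with probability
    e^{2 beta (k+ - k-)} / (e^{2 beta (k+ - k-)} + 1), else to -1, in place."""
    L = s.shape[0]
    i, j = site
    # k+ - k- equals the sum of the (free-boundary) neighbor spins.
    nbr = sum(s[a, b] for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
              if 0 <= a < L and 0 <= b < L)
    p_plus = math.exp(2 * beta * nbr) / (math.exp(2 * beta * nbr) + 1)
    s[i, j] = 1 if u < p_plus else -1

# One coupled step for the sandwiching chains: same site, same u for both.
rng = np.random.default_rng(0)
L, beta = 8, 0.4
bottom, top = np.full((L, L), -1), np.full((L, L), +1)
site = tuple(rng.integers(0, L, size=2))
u = rng.random()
heat_bath_update(bottom, site, u, beta)
heat_bath_update(top, site, u, beta)
assert np.all(bottom <= top)   # the update preserves the ordering
```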
+Ising Model, L = 4
[Figures: sampled configurations at T = 3.5, N = 512 and at T = 4.8, N = 128]
+Ising Model, L = 8
[Figure: sampled configuration at T = 5.9, N = 512]
+Ising Model, L = 16
[Figure: sampled configuration at T = 5.3, N = 16 384]
+Summary
Exact sampling: Markov chain Monte Carlo works, but when do we converge? Propp-Wilson with sandwiching to the rescue.
+Questions?