SLIDE 1

Mixing Time Analysis of the Glauber Dynamics for the Curie-Weiss-Potts Model

Precise Asymptotics. Cutoff Phenomenon. Essential Mixing. Evolution.

  • P. Cuff
  • J. Ding
  • E. Lubetzky
  • Y. Peres
  • A. Sly
  • O. Louidor

Microsoft Research

Cornell Summer School in Probability 2009

P. Cuff, J. Ding, E. Lubetzky, Y. Peres, A. Sly, O. Louidor (Microsoft Research), CW-Potts Mixing, Cornell Summer School in Probability 2009

SLIDE 2

Outline

1. Problem Definition
  • The Potts Model
  • Glauber Dynamics
  • Mixing Time
  • The Problem
2. Previous Results
3. New Results
4. Proofs

SLIDE 3

Setup and Terminology

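Let $G = (V, E)$ be a finite graph, $q \in \mathbb N$ the number of colors and $\hat\beta \in \mathbb R$ the inverse temperature. A configuration $\sigma$ is an element of $\Omega = \{1, \dots, q\}^V$. On $\Omega$ define the (Gibbs) measure

$$\mu(\sigma) = \mu_{\hat\beta,G}(\sigma) = \frac{1}{Z_{\hat\beta,G}} \exp\big(\hat\beta\, H_G(\sigma)\big),$$

where

$$H_G(\sigma) = \sum_{(u,v)\in E} \mathbf 1_{\{\sigma(u)=\sigma(v)\}}$$

is the (associated) Hamiltonian, and the partition function $Z_{\hat\beta,G}$ makes $\mu$ a probability measure.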

Names:

  • $q = 2$: the Ising model on $G$.
  • $q > 2$: the Potts model on $G$.
  • $G = C_n$ (the complete graph on $n$ vertices): the Curie-Weiss Potts/Ising or mean-field Potts/Ising model.
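To make the definition concrete, here is a brute-force sketch that enumerates $\Omega = \{1,\dots,q\}^V$ for a tiny graph and normalizes $\exp(\hat\beta H_G(\sigma))$ by the partition function. The function names, the edge-list representation and the 0-based color labels are our own choices. Enumeration costs $q^{|V|}$ operations, which is exactly why sampling methods are needed for realistic graphs.

```python
import itertools
import math

def hamiltonian(sigma, edges):
    """H_G(sigma): number of edges whose two endpoints share a color."""
    return sum(1 for u, v in edges if sigma[u] == sigma[v])

def gibbs_measure(num_vertices, edges, q, beta_hat):
    """Return {sigma: mu(sigma)} for the Potts model on G = (V, E).

    Vertices are 0..num_vertices-1 and colors 0..q-1 (the slides use 1..q).
    Enumerates all q**num_vertices configurations, so only for tiny graphs.
    """
    configs = list(itertools.product(range(q), repeat=num_vertices))
    weights = [math.exp(beta_hat * hamiltonian(s, edges)) for s in configs]
    Z = sum(weights)  # the partition function Z_{beta_hat, G}
    return {s: w / Z for s, w in zip(configs, weights)}

# Example: the complete graph C_3 (a triangle) with q = 3 colors.
triangle = [(0, 1), (0, 2), (1, 2)]
mu = gibbs_measure(3, triangle, q=3, beta_hat=1.0)
```

With $\hat\beta > 0$ the three monochromatic configurations (weight $e^3$) are the most likely and the six configurations using all three colors (weight $1$) the least likely, as the sign in the exponent suggests.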

SLIDE 4

The Question

Question: Given a particular sequence $(\hat\beta_n, G_n)_{n\ge 1}$, describe $\mu_{\hat\beta_n,G_n}$ for large $n$ or in an appropriate limit.

For Curie-Weiss Potts, with $\hat\beta_n = \beta/n$ for some $\beta \in \mathbb R$, much is known (this is the easiest case). For instance...

SLIDE 5

Fractions Vector

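For $\sigma \in \Omega$ define the fractions vector $S(\sigma) \in \mathcal S_q = \{x \in \mathbb R_+^q : \|x\|_1 = 1\}$ by

$$S^k(\sigma) = \frac{1}{|V|} \sum_{v\in V} \mathbf 1_{\{k\}}\big(\sigma(v)\big), \qquad k = 1, \dots, q.$$

Define $\pi_{\hat\beta,G}$ as the distribution of $S(\sigma)$ when $\sigma$ is sampled from $\mu_{\hat\beta,G}$.

Set $\pi_\infty = \lim_{n\to\infty} \pi_{\hat\beta_n,G_n}$.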
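On the complete graph the Hamiltonian depends on $\sigma$ only through the color counts $n_k = n S^k(\sigma)$: every pair of equally colored vertices is an edge, so $H_{C_n}(\sigma) = \sum_k \binom{n_k}{2}$. Hence $\pi_{\beta/n,C_n}$ can be computed exactly for moderate $n$ by summing over count vectors, each weighted by the number of configurations realizing it (a multinomial coefficient). The sketch below does this in log space; the helper names and the choice of test values are our own.

```python
import math

def compositions(n, q):
    """All tuples (n_1, ..., n_q) of non-negative integers summing to n."""
    if q == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, q - 1):
            yield (first,) + rest

def fractions_distribution(n, q, beta):
    """Exact law pi_{beta/n, C_n} of the fractions vector S(sigma).

    Returns {S: probability} with S = (n_1/n, ..., n_q/n).
    """
    log_w = {}
    for counts in compositions(n, q):
        # log of the multinomial coefficient = log #configurations with these counts
        log_mult = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
        energy = sum(c * (c - 1) / 2 for c in counts)  # H_{C_n} = sum_k C(n_k, 2)
        log_w[counts] = log_mult + beta / n * energy
    top = max(log_w.values())  # subtract the maximum to avoid overflow
    weights = {c: math.exp(lw - top) for c, lw in log_w.items()}
    Z = sum(weights.values())
    return {tuple(c_k / n for c_k in c): w / Z for c, w in weights.items()}

def prob_near_uniform(pi, q, radius):
    """pi-probability that every coordinate of S is within radius of 1/q."""
    return sum(p for s, p in pi.items()
               if max(abs(x - 1 / q) for x in s) <= radius)
```

For $q = 3$ and $n = 60$, `fractions_distribution(60, 3, 1.0)` puts most of its mass near $\mathbf 1/q$, while at $\beta = 4$ almost no mass is left there, in line with the phase transition on the next slide.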

SLIDE 6

Phase Transition in Curie-Weiss Potts

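Then there exists $\beta_c = \beta_c(q)$ such that:

$\beta < \beta_c$ (high temperature):

$$\pi_{\beta/n,\,C_n} \Rightarrow \delta_{\mathbf 1/q} \quad \text{as } n\to\infty,$$

where $\mathbf 1/q$ denotes the vector $(1/q, 1/q, \dots, 1/q) \in \mathcal S_q$.

$\beta > \beta_c$ (low temperature):

$$\pi_{\beta/n,\,C_n} \Rightarrow \sum_{k=1}^q \frac1q\, \delta_{T^k \hat s(\beta)} \quad \text{as } n\to\infty,$$

where

$$\hat s(\beta) = \Big(\hat s_1(\beta), \frac{1-\hat s_1(\beta)}{q-1}, \dots, \frac{1-\hat s_1(\beta)}{q-1}\Big) \in \mathcal S_q$$

and $T^k$ interchanges the first and $k$-th components.

$\beta = \beta_c$ (critical temperature):

$$\pi_{\beta/n,\,C_n} \Rightarrow p(\beta)\,\delta_{\mathbf 1/q} + \big(1-p(\beta)\big)\sum_{k=1}^q \frac1q\,\delta_{T^k \hat s(\beta)} \quad \text{as } n\to\infty,$$

where $p(\beta) \in (0,1)$.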

SLIDE 7

Phase Transition - Remarks

$\beta_c(q)$, $\hat s(\beta)$ and $p(\beta)$ are explicitly known. In fact $(\pi_{\beta/n,C_n})_{n\ge 1}$ satisfies a large deviation principle (LDP) on $\mathcal S_q$ with rate function

$$I_\beta(s) = R(s) - \frac{\beta}{2}\|s\|_2^2 - \text{const},$$

where $R(s)$ is the rate function for the fractions vector when $\beta = 0$.

Thus, describing $\pi_\infty = \pi_\infty(\beta)$ amounts to solving the minimization problem of $R(s) - \frac{\beta}{2}\|s\|_2^2$ on $\mathcal S_q$.

If $q = 2$, $\hat s(\beta_c(2)) = \mathbf 1/q$ and the mapping $\beta \mapsto \pi_\infty(\beta)$ is continuous (in the weak topology on measures). Thus this is a second-order phase transition.

If $q > 2$, $\hat s(\beta_c(q)) \ne \mathbf 1/q$, the mapping $\beta \mapsto \pi_\infty(\beta)$ is not continuous, and the phase transition is of first order. This will play a part in the rate of mixing.
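The minimization can be explored numerically. At $\beta = 0$ the colors are i.i.d. uniform, so by Sanov's theorem $R(s) = \sum_k s_k \log(q s_k)$; dropping constants, $I_\beta(s) = \sum_k s_k \log s_k - \frac{\beta}{2}\|s\|_2^2 + \text{const}$. The sketch below restricts to vectors of the form $(x, \frac{1-x}{q-1}, \dots, \frac{1-x}{q-1})$, $x \in [1/q, 1]$, as in the definition of $\hat s(\beta)$, locates the minimizer on a grid, and bisects for the $\beta$ at which the minimizer leaves $1/q$. Grid size, tolerances and names are our own assumptions. For $q = 3$ a hand computation shows the two minima are level at $\beta = 4\log 2 \approx 2.7726$ (ordered minimum at $x = 2/3$), which the bisection should reproduce.

```python
import math

def xlogx(t):
    return t * math.log(t) if t > 0 else 0.0

def rate_on_ray(x, beta, q):
    """I_beta(s), up to a constant, at s = (x, (1-x)/(q-1), ..., (1-x)/(q-1))."""
    y = (1 - x) / (q - 1)
    entropy = xlogx(x) + (q - 1) * xlogx(y)   # sum_k s_k log s_k
    energy = x * x + (q - 1) * y * y          # ||s||_2^2
    return entropy - beta / 2 * energy

def grid(q, num=10_000):
    """Equally spaced points of [1/q, 1], starting exactly at 1/q."""
    return [1 / q + (1 - 1 / q) * i / num for i in range(num + 1)]

def minimizer_on_ray(beta, q, num=10_000):
    """Grid point x in [1/q, 1] minimizing the rate function on the ray."""
    return min(grid(q, num), key=lambda x: rate_on_ray(x, beta, q))

def critical_beta(q, num=10_000, iters=40):
    """Bisect for beta_c (q >= 3): smallest beta where some x beats x = 1/q."""
    if q < 3:
        raise ValueError("this sketch assumes q >= 3 (first-order transition)")
    xs = grid(q, num)

    def ordered_wins(beta):
        base = rate_on_ray(1 / q, beta, q)
        return any(rate_on_ray(x, beta, q) < base - 1e-12 for x in xs)

    lo, hi = 0.0, float(q)   # no jump at beta = 0; already jumped at beta = q
    for _ in range(iters):
        mid = (lo + hi) / 2
        if ordered_wins(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

For $q = 3$ the minimizer jumps from $1/3$ to about $2/3$ as $\beta$ crosses the bisected value, while for $q = 2$ it moves away from $1/2$ continuously as $\beta$ passes $2$, matching the first-order versus second-order distinction above.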

SLIDE 8

Markov Chain Monte Carlo (MCMC)

A way to approximately sample from a probability measure µ on a finite space Ω. Idea: Markov Chain Monte Carlo (MCMC). Construct a Markov chain with state space Ω whose unique (ergodic) stationary distribution is µ. Then start from any configuration and let the chain evolve randomly for a long enough time, until the distribution of the current state is close to µ. This is useful when exact sampling is computationally expensive (e.g., one has to go through all of Ω), but computing the transition probabilities is easy. What is a long enough time? One has to study the rate of convergence to stationarity, i.e. the mixing time (defined later). If Ω = {1, …, q}^V, many dynamics are possible (Glauber, Metropolis, Swendsen-Wang, …); they differ in how fast they mix.

SLIDE 9

Glauber Dynamics

Single-site update dynamics for a measure $\mu$ on $\Omega = \{1, \dots, q\}^V$:

  • Start from any configuration $\sigma_0$.
  • Transition: choose a vertex $u \in V$ uniformly at random and update
$$\sigma_{t+1}(v) = \begin{cases} \sigma_t(v) & \text{if } v \ne u,\\ k & \text{if } v = u, \text{ with probability } \mu\big(\sigma(u) = k \mid \sigma(v) = \sigma_t(v),\ v \ne u\big). \end{cases}$$
  • Repeat.

The conditional probabilities are straightforward to compute if $\mu$ is a Gibbs measure (this is part of the definition).

$(\sigma_t)_t$ is a finite-state, irreducible and aperiodic chain (at least if $\mu$ has the finite energy property), hence it converges to its unique stationary distribution $\mu$. But how fast?
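For the Curie-Weiss case the conditional law in the update step is explicit: with $\hat\beta = \beta/n$, the new color of $u$ is $k$ with probability proportional to $\exp\big(\frac{\beta}{n}\,\#\{v \ne u : \sigma_t(v) = k\}\big)$. Maintaining the color counts makes each step cost $O(q)$. The sketch below is our own illustration; the function names, 0-based colors and the seeded `random.Random` generator are assumptions, not part of the slides.

```python
import math
import random

def glauber_step(sigma, counts, beta, rng):
    """One heat-bath update of the Curie-Weiss Potts chain, in place.

    sigma: list of colors in 0..q-1; counts[k] = #{v : sigma[v] == k}.
    """
    n, q = len(sigma), len(counts)
    u = rng.randrange(n)                 # vertex chosen uniformly at random
    counts[sigma[u]] -= 1                # counts now refer to the vertices v != u
    weights = [math.exp(beta / n * counts[k]) for k in range(q)]
    new = rng.choices(range(q), weights=weights)[0]
    sigma[u] = new
    counts[new] += 1

def run_glauber(n, q, beta, steps, seed=0):
    """Run from the monochromatic configuration; return (sigma, counts)."""
    rng = random.Random(seed)
    sigma = [0] * n
    counts = [n] + [0] * (q - 1)
    for _ in range(steps):
        glauber_step(sigma, counts, beta, rng)
    return sigma, counts

sigma, counts = run_glauber(n=300, q=3, beta=1.0, steps=30_000)
fractions = [c / len(sigma) for c in counts]   # the fractions vector S(sigma_t)
```

At $\beta = 1$ (high temperature for $q = 3$) the fractions vector ends up close to $(1/3, 1/3, 1/3)$ even though the chain starts from the monochromatic configuration.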

SLIDE 10

Mixing Time

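Let $(X_t)_{t\in\mathbb N}$ be a Markov chain with state space $S$, transition kernel $P$ and stationary distribution $\pi$. Set

$$d(t) = \sup_{x_0\in S} \big\|P_{x_0}(X_t \in \cdot) - \pi\big\|_{TV},$$

where $X_0 = x_0$ under $P_{x_0}$. (Reminder: $\|\mu - \nu\|_{TV} = \sup_A |\mu(A) - \nu(A)|$.)

The $\varepsilon$-mixing time of $(X_t)_{t\in\mathbb N}$ is

$$t_M(\varepsilon) = \inf\{t : d(t) < \varepsilon\}.$$

If no $\varepsilon$ is specified, it is customary to use $\varepsilon = 1/4$.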
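For a small chain both $d(t)$ and $t_M(\varepsilon)$ can be computed exactly by taking matrix powers. The sketch below does this with plain lists, using that on a finite space $\|\mu - \nu\|_{TV} = \frac12 \sum_x |\mu(x) - \nu(x)|$; the stationary distribution is passed in rather than computed, and the two-state test chain is our own example.

```python
def tv_distance(mu, nu):
    """||mu - nu||_TV = sup_A |mu(A) - nu(A)| = (1/2) sum_x |mu(x) - nu(x)|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    m = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

def mixing_time(P, pi, eps=0.25, t_max=100_000):
    """t_M(eps) = inf{t : d(t) < eps}, with d(t) = max_x0 ||P^t(x0, .) - pi||_TV."""
    m = len(P)
    Pt = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]  # P^0
    for t in range(t_max + 1):
        d_t = max(tv_distance(row, pi) for row in Pt)   # sup over starting states
        if d_t < eps:
            return t
        Pt = mat_mul(Pt, P)
    raise RuntimeError("d(t) stayed >= eps up to t_max")

# Two-state chain that switches state with probability p.
p = 0.25
P = [[1 - p, p], [p, 1 - p]]
pi = [0.5, 0.5]
```

Here $d(t) = \frac12(1-2p)^t$, so $d(1) = 1/4$ is not below $1/4$ and `mixing_time(P, pi)` returns 2; the strict inequality in the definition matters in such boundary cases.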

SLIDE 11

Mixing Time Asymptotics and the Cut-off Phenomenon

Let $(P_n)_n$ be a particular sequence of Markov chain kernels and denote by $t_M^n(\varepsilon)$ their $\varepsilon$-mixing times.

We would like to know how $t_M^n(\varepsilon)$ grows with $n$:

  • If $t_M^n(\varepsilon)$ grows polynomially, we say that the mixing is rapid.
  • If $t_M^n(\varepsilon)$ grows exponentially, we say that the mixing is slow.

Also, if for some (and hence any) $\varepsilon_0$ and all $\varepsilon$

$$w_n(\varepsilon) := t_M^n(\varepsilon) - t_M^n(1-\varepsilon) = o\big(t_M^n(\varepsilon_0)\big) \quad \text{as } n\to\infty,$$

we say that the sequence of dynamics exhibits a cut-off.

The distance to stationarity drops sharply from 1 to 0 (relative to the mixing time). If $w_n(\varepsilon) = \Theta_\varepsilon(W(n))$, we say that the cut-off window has order $W(n)$.
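A toy chain makes the definition tangible. The lazy random walk on the hypercube $\{0,1\}^n$, which at each step picks a uniformly random coordinate and replaces it by a fair random bit, is a standard example with cut-off: $t_M^n(\varepsilon) = \frac12 n\log n + O_\varepsilon(n)$, so the window has order $n$. It is our illustration, not part of the slides. Started from $0$, the law of the walk is invariant under permuting coordinates, so its distance to the uniform measure equals the distance between the Hamming weight and Binomial$(n, 1/2)$; by symmetry every starting point gives the same distance. This reduces $d(t)$ to a chain on $n+1$ states, which the sketch evolves exactly to get $t_M^n(1/4)$, $t_M^n(3/4)$ and the relative window $w_n(1/4)/t_M^n(1/4)$.

```python
import math

def hypercube_profile(n, eps_values):
    """Exact t_M(eps) for the lazy walk on {0,1}^n via its Hamming weight.

    From weight w the walk moves to w-1 w.p. w/(2n), to w+1 w.p. (n-w)/(2n),
    and stays otherwise.  Stationary law: Binomial(n, 1/2).
    """
    pi = [math.comb(n, k) / 2 ** n for k in range(n + 1)]
    dist = [1.0] + [0.0] * n                      # start from weight 0
    pending = sorted(eps_values, reverse=True)    # larger eps are reached first
    result, t = {}, 0
    while pending:
        d = 0.5 * sum(abs(a - b) for a, b in zip(dist, pi))
        while pending and d < pending[0]:
            result[pending.pop(0)] = t
        new = [0.0] * (n + 1)
        for w, mass in enumerate(dist):
            if mass == 0.0:
                continue
            down, up = w / (2 * n), (n - w) / (2 * n)
            new[w] += mass * (1 - down - up)
            if w > 0:
                new[w - 1] += mass * down
            if w < n:
                new[w + 1] += mass * up
        dist, t = new, t + 1
    return result

def relative_window(n, eps=0.25):
    """w_n(eps) / t_M^n(eps) for the lazy hypercube walk."""
    prof = hypercube_profile(n, [eps, 1 - eps])
    return (prof[eps] - prof[1 - eps]) / prof[eps]
```

The relative window shrinks as $n$ grows (slowly, like $1/\log n$, since the window is of order $n$ and the mixing time of order $n\log n$), which is exactly the $o(t_M^n)$ requirement in the definition.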

SLIDE 12

The Problem

Analyze the mixing time of the Glauber dynamics for the Curie-Weiss Potts model. That is, fix $\beta$ and $q$, consider the sequence of Glauber dynamics for the Potts distribution $\mu_{\beta/n,C_n}$ on the complete graph with $n$ vertices, and analyze the mixing time $t_M^n(\varepsilon)$ as a function of $n$.

SLIDE 13

Previous Results

Complete analysis for the Curie-Weiss Ising case ($q = 2$):

$\beta < \beta_c(2) = 2$ (high temperature):

$$t_M^n(\varepsilon) \sim \frac12\Big(1 - \frac\beta2\Big)^{-1} n\log n, \qquad w_n(\varepsilon) = \Theta_\varepsilon(n).$$

[Aizenman, Holley ’87], [Bubley, Dyer ’97], [Levin, Luczak, Peres ’07].

$\beta > \beta_c$ (low temperature): $t_M^n(\varepsilon)$ is exponential in $n$. [Griffiths, Weng and Langer ’66]

$\beta = \beta_c$ (critical temperature): $t_M^n(\varepsilon) = \Theta_\varepsilon(n^{3/2})$. No cut-off. [Levin, Luczak, Peres ’07], [Ding, Lubetzky, Peres ’08]

SLIDE 14

Previous Results - Evolution

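Still the $q = 2$ case. Now let $\beta = \beta_n$ change with $n$.

$\beta_n = \beta_c - \delta_n$:

  • $\delta_n = \omega(1/\sqrt n)$ ⇒ $t_M^n(\varepsilon) \sim \frac{n}{\delta}\log(\delta^2 n)$, $w_n(\varepsilon) = \Theta_\varepsilon(n/\delta)$.
  • $\delta_n = O(1/\sqrt n)$ ⇒ $t_M^n(\varepsilon) = \Theta_\varepsilon(n^{3/2})$, no cut-off.

$\beta_n = \beta_c + \delta_n$:

  • $\delta_n = \omega(1/\sqrt n)$ but $\delta_n = o(1)$ ⇒ $t_M^n(\varepsilon) = \Theta_\varepsilon\Big(\frac{n}{\delta}\exp\big((\tfrac34 + o(1))\,\delta^2 n\big)\Big)$.
  • $\delta_n = \Omega(1)$ ⇒ $t_M^n(\varepsilon)$ is exponential in $n$.
  • $\delta_n = O(1/\sqrt n)$ ⇒ $t_M^n(\varepsilon) = \Theta_\varepsilon(n^{3/2})$. No cut-off.

[Ding, Lubetzky, Peres ’08]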

SLIDE 15

New Results - The Case q > 2

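There exists a new critical $\beta$, namely $\beta_M(q) < \beta_c(q) < q$, such that:

$\beta < \beta_M$:

$$t_M^n(\varepsilon) \sim \frac12\Big(1 - \frac\beta q\Big)^{-1} n\log n, \qquad w_n(\varepsilon) = \Theta_\varepsilon(n).$$

$\beta > \beta_M$: $t_M^n(\varepsilon)$ is exponential in $n$.

$\beta = \beta_M$: $t_M^n(\varepsilon) = \Theta_\varepsilon(n^{4/3})$. No cut-off.

$\beta_M(q)$ is explicitly known.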

SLIDE 16

New Results - More

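Approaching criticality, $\beta = \beta_n = \beta_M - \delta_n$:

  • $\delta_n = \omega(n^{-2/3})$ ⇒ $t_M^n(\varepsilon) \sim C\,\frac{n}{\sqrt\delta}$, $w_n(\varepsilon) = O_\varepsilon\big(\frac{n}{\delta^{5/2}}\big)$.
  • $\delta_n = O(n^{-2/3})$ ⇒ $t_M^n(\varepsilon) = \Theta_\varepsilon(n^{4/3})$, no cut-off.

Essential mixing, $\beta_M < \beta < \beta_c$:

There exists $\Omega_n \subseteq \Omega$ with $\mu_{\beta/n,C_n}(\Omega_n) \le e^{-Cn}$ such that

$$t_M^{n,\,\Omega\setminus\Omega_n}(\varepsilon) \sim \frac12\Big(1-\frac\beta q\Big)^{-1} n\log n, \qquad w_n^{\Omega\setminus\Omega_n}(\varepsilon) = \Theta_\varepsilon(n),$$

where $t_M^{n,\,\Omega\setminus\Omega_n}(\varepsilon)$ and $w_n^{\Omega\setminus\Omega_n}(\varepsilon)$ are the mixing time and cut-off window when one is not allowed to start the dynamics from $\sigma \in \Omega_n$. We say that the dynamics essentially mixes rapidly.

Low temperature, $\beta \ge \beta_c$: $t_M^n(\varepsilon)$ is exponential in $n$. No essentially rapid mixing.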

SLIDE 17

Intuition

Intuition comes from looking at the rate function $I_\beta(s)$. Because the phase transition is of first order, near but before $\beta_c$ local minima emerge at $T^k\hat s(\beta)$, $k = 1, \dots, q$. These slow down the mixing. This starts to happen exactly at $\beta_M$.

SLIDE 18

Above βM

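Use the bottleneck ratio. For a Markov chain with state space $\mathcal S$, transition kernel $P$ and stationary distribution $\pi$:

$$t_M(1/4) \ge \frac14 \sup_{S\subseteq\mathcal S,\ \pi(S)\le 1/2} \frac{\pi(S)}{\sum_{x\in S,\, y\notin S} \pi(x)P(x,y)} \ge \frac14 \sup_{S\subseteq\mathcal S,\ \pi(S)\le 1/2} \frac{\pi(S)}{\pi(\partial S)},$$

where $\partial S$ is the set of states in $S$ from which the chain can leave $S$ in one step. Hence a local minimum of the rate function immediately implies an exponential mixing time: take $S$ to be the configurations whose fractions vector lies in a small neighbourhood of the local minimum; by the LDP, $\pi(\partial S)/\pi(S)$ is exponentially small in $n$.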
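For a single candidate set the bound is a direct computation. The sketch evaluates the bottleneck ratio $\Phi(S) = \sum_{x\in S,\,y\notin S}\pi(x)P(x,y)/\pi(S)$ and the resulting bound $t_M(1/4) \ge 1/(4\Phi(S))$ on a sticky two-state chain of our choosing: with $S = \{0\}$ one gets $\Phi(S) = p$ and hence the bound $1/(4p)$, while $d(t) = \frac12(1-2p)^t$ gives the exact mixing time for comparison. The function names are hypothetical.

```python
def bottleneck_ratio(P, pi, S):
    """Phi(S) = sum_{x in S, y not in S} pi(x) P(x, y) / pi(S)."""
    S = set(S)
    pi_S = sum(pi[x] for x in S)
    if pi_S > 0.5:
        raise ValueError("the bound requires pi(S) <= 1/2")
    flow = sum(pi[x] * P[x][y] for x in S for y in range(len(P)) if y not in S)
    return flow / pi_S

def bottleneck_lower_bound(P, pi, S):
    """Lower bound t_M(1/4) >= 1 / (4 Phi(S)) for one set S."""
    return 1.0 / (4.0 * bottleneck_ratio(P, pi, S))

# A chain that rarely switches state: the set {0} is a bottleneck.
p = 0.01
P = [[1 - p, p], [p, 1 - p]]
pi = [0.5, 0.5]
bound = bottleneck_lower_bound(P, pi, S=[0])   # 1 / (4p) = 25
```

The exact value here is 35 steps, so the bound of 25 is of the right order; in the Potts application the point is that $1/\Phi(S)$ is exponentially large in $n$.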

SLIDE 19

Below βM - Key Formula 1

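Examine the fractions chain $S_t = S(\sigma_t)$ (it is Markovian).

Key formula 1, a recursion for the expected distance to the equidistributed configuration:

$$\big\|\mathbb E S_{t+1} - \tfrac{\mathbf 1}{q}\big\|_2^2 = \big\|\mathbb E S_t - \tfrac{\mathbf 1}{q}\big\|_2^2 \Big(1 - \frac{2(1-\beta/q)}{n}\Big) + \mathrm{Error}\Big(\big\|\mathbb E S_t - \tfrac{\mathbf 1}{q}\big\|_2^2,\ n\Big).$$

There exists $\eta > 0$ such that if $\|\mathbb E S_0 - \tfrac{\mathbf 1}{q}\|_2^2 < \eta$, this gives a contraction:

$$\big\|\mathbb E S_t - \tfrac{\mathbf 1}{q}\big\|_2^2 = \Big(1 - \frac{2(1-\beta/q)}{n}\Big)^t \big\|S_0 - \tfrac{\mathbf 1}{q}\big\|_2^2 + \mathrm{Error}(n).$$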
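Where does the factor $1 - 2(1-\beta/q)/n$ come from? A heuristic sketch (our own linearization, not taken from the slides, with nonlinear and $O(n^{-2})$ terms lumped into the Error term): a Glauber step on $C_n$ recolors a uniformly chosen vertex, and the new color is $k$ with probability $p_k(S_t) + O(n^{-1})$, where $p_k(s) = e^{\beta s_k}/\sum_j e^{\beta s_j}$. Hence

$$\mathbb E\big[S_{t+1} - S_t \,\big|\, S_t\big] = \frac1n\big(p(S_t) - S_t\big) + O(n^{-2}).$$

Since $\partial p_k/\partial s_j = \beta\,p_k(\delta_{kj} - p_j)$ equals $\frac{\beta}{q}\big(\delta_{kj} - \frac1q\big)$ at the uniform vector, for every $h$ with $\sum_k h_k = 0$

$$p\big(\tfrac{\mathbf 1}{q} + h\big) = \tfrac{\mathbf 1}{q} + \tfrac{\beta}{q}\,h + O(\|h\|_2^2),$$

so

$$\mathbb E\big[S_{t+1} - \tfrac{\mathbf 1}{q} \,\big|\, S_t\big] = \Big(1 - \frac{1-\beta/q}{n}\Big)\big(S_t - \tfrac{\mathbf 1}{q}\big) + O\Big(\frac{\|S_t - \mathbf 1/q\|_2^2}{n} + \frac{1}{n^2}\Big).$$

Taking expectations and squaring the norm of the linear part gives the factor in Key formula 1:

$$\Big(1 - \frac{1-\beta/q}{n}\Big)^2 = 1 - \frac{2(1-\beta/q)}{n} + O(n^{-2}).$$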

SLIDE 20

Below βM - Key Formula 2

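Key formula 2, the conditional drift of one coordinate:

$$\mathbb E\big[S^1_{t+1} - S^1_t \,\big|\, S_t\big] \le \frac1n\left(\frac{e^{\beta S^1_t}}{e^{\beta S^1_t} + (q-1)\,e^{\beta(1-S^1_t)/(q-1)}} - S^1_t\right) + O(n^{-2}).$$

(The inequality comes from Jensen's inequality applied to the other $q-1$ coordinates; it is an equality when they are all equal.) The bracketed term is strictly negative for every $S^1_t \in (1/q, 1]$ if and only if $\beta < \beta_M$.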
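The criterion on this slide gives a direct numerical way to locate $\beta_M$: bisect on $\beta$ for the first value at which the bracketed drift stops being negative somewhere on $(1/q, 1]$. The bracket is increasing in $\beta$ for $x > 1/q$, so bisection is valid. The sketch below is our own; the grid size, the small gap excluded next to $1/q$ (where the drift vanishes) and the restriction $q \ge 3$ are assumptions.

```python
import math

def drift(x, beta, q):
    """Bracket of Key Formula 2: e^{bx} / (e^{bx} + (q-1) e^{b(1-x)/(q-1)}) - x."""
    a = math.exp(beta * x)
    b = (q - 1) * math.exp(beta * (1 - x) / (q - 1))
    return a / (a + b) - x

def beta_M(q, num=5_000, iters=40, gap=1e-3):
    """Bisect for the smallest beta at which the drift is not negative on (1/q, 1].

    Scans x in [1/q + gap, 1]; assumes q >= 3, so that beta_M < q.
    """
    if q < 3:
        raise ValueError("this sketch assumes q >= 3")
    xs = [1 / q + gap + (1 - 1 / q - gap) * i / num for i in range(num + 1)]

    def drift_fails(beta):
        return max(drift(x, beta, q) for x in xs) >= 0

    lo, hi = 0.0, float(q)   # negative drift at beta = 0; not at beta = q
    for _ in range(iters):
        mid = (lo + hi) / 2
        if drift_fails(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

For $q = 3$ this returns a value close to $2.746$, just below $\beta_c(3) = 4\log 2 \approx 2.773$, consistent with $\beta_M < \beta_c < q$.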

SLIDE 21

Below βM - Lower Bound on Mixing Time

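Start from a configuration $\sigma_0$ with $\|S(\sigma_0) - \tfrac{\mathbf 1}{q}\|_2^2 < \eta$. If

$$t_n = \frac12\Big(1-\frac\beta q\Big)^{-1} n\log n - \gamma n,$$

then, from the contraction formula, $\|\mathbb E S_{t_n} - \tfrac{\mathbf 1}{q}\|_2^2 \ge A(\gamma)\, n^{-1}$ for $n$ large, with $A(\gamma) \to \infty$ as $\gamma \to \infty$.

By bounding the variance, this implies that $S_{t_n}$ is far from $\mathbf 1/q$ (on the $n^{-1/2}$ scale) for large $n$, with probability tending to 1 as $\gamma \to \infty$. Since $\pi_{\beta/n,C_n}$ concentrates around $\mathbf 1/q$ on that same scale, it follows that $\|P(S_{t_n} \in \cdot) - \pi_{\beta/n,C_n}\|_{TV}$ tends to 1 as $\gamma \to \infty$.

Finally, $\|P(\sigma_{t_n} \in \cdot) - \mu_{\beta/n,C_n}\|_{TV} \ge \|P(S_{t_n} \in \cdot) - \pi_{\beta/n,C_n}\|_{TV}$, because $S_{t_n} = S(\sigma_{t_n})$ is a function of the configuration and total variation distance cannot increase under a map.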
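The growth of $A(\gamma)$ can be read off from the main term of the contraction formula (the $\mathrm{Error}(n)$ term is ignored in this sketch). Write $c = 2(1-\beta/q)$, so that $t_n = \frac{n\log n}{c} - \gamma n$ and $t_n \cdot \frac{c}{n} = \log n - c\gamma$:

$$\Big(1-\frac{c}{n}\Big)^{t_n} = \exp\Big(t_n \log\Big(1-\frac{c}{n}\Big)\Big) = \exp\Big(-t_n\Big(\frac{c}{n} + O(n^{-2})\Big)\Big) = \exp\Big(-\log n + c\gamma + O\Big(\frac{\log n}{n}\Big)\Big) = \frac{e^{c\gamma}}{n}\big(1+o(1)\big).$$

Hence

$$\big\|\mathbb E S_{t_n} - \tfrac{\mathbf 1}{q}\big\|_2^2 \approx \frac{e^{2(1-\beta/q)\gamma}\,\|S_0 - \tfrac{\mathbf 1}{q}\|_2^2}{n},$$

so $A(\gamma)$ can be taken of order $e^{2(1-\beta/q)\gamma}\,\|S_0 - \tfrac{\mathbf 1}{q}\|_2^2$, which tends to infinity as $\gamma \to \infty$ provided $S_0 \ne \mathbf 1/q$.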

SLIDE 22

Below βM - Upper Bound

Start from any configuration $\sigma_0$. Due to the negative drift (Key Formula 2), after $k_n$ steps $\|\mathbb E S_{k_n} - \tfrac{\mathbf 1}{q}\|_2^2 < \eta$.

Now we can use the contraction (Key Formula 1): if

$$t_n = k_n + \frac12\Big(1-\frac\beta q\Big)^{-1} n\log n,$$

then $\|\mathbb E S_{t_n} - \tfrac{\mathbf 1}{q}\|_2^2 = O(n^{-1})$.

By bounding the variance we get $\|S_{t_n} - \tfrac{\mathbf 1}{q}\|_2 = O(n^{-1/2})$ with probability as close to 1 as needed. Introduce a coupling between this chain and one started from $\pi$ such that the two coincide after an additional $\gamma n$ steps with probability tending to 1 as $\gamma \to \infty$.
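The first phase can be caricatured by following only the expected drift of Key Formula 2 (a deterministic simplification of ours that ignores fluctuations and the $O(n^{-2})$ terms). Starting from the monochromatic configuration, the other $q-1$ fractions stay equal by symmetry, so one coordinate $x_t = S^1_t$ suffices: $x_{t+1} = x_t + \frac1n\big(p(x_t) - x_t\big)$, with $p$ the fraction appearing in Key Formula 2. The sketch counts the steps $k_n$ until $\|S - \mathbf 1/q\|_2^2 < \eta$; the function name, the step cap and the test parameters are assumptions.

```python
import math

def mean_field_burn_in(n, q, beta, eta, x0=1.0, max_factor=1_000):
    """Steps k_n until ||S - 1/q||_2^2 < eta under the expected-drift recursion.

    Tracks x = S^1; by symmetry the other q-1 coordinates equal (1 - x)/(q - 1),
    so ||S - 1/q||_2^2 = (x - 1/q)^2 * q / (q - 1).
    """
    x, k = x0, 0
    while (x - 1 / q) ** 2 * q / (q - 1) >= eta:
        a = math.exp(beta * x)
        b = (q - 1) * math.exp(beta * (1 - x) / (q - 1))
        x += (a / (a + b) - x) / n    # mean one-step change of S^1
        k += 1
        if k > max_factor * n:        # e.g. beta >= beta_M: stuck near another fixed point
            raise RuntimeError("no convergence; is beta below beta_M?")
    return k
```

In this caricature the ratio $k_n/n$ barely changes between $n = 10^3$ and $n = 10^4$ (for instance at $q = 3$, $\beta = 2$, $\eta = 0.01$), i.e. the burn-in takes order $n$ steps, which is negligible next to $\frac12(1-\beta/q)^{-1} n\log n$.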

SLIDE 23

Upper Bound - Continued

Use a coupling-time argument to bound the distance from stationarity. For any Markov chain with stationary distribution $\pi$:

$$d(t) \le \sup_{x_0} P_{x_0,\pi}(\tau_{\mathrm{couple}} > t),$$

where $P_{x_0,\pi}$ is any coupling of two copies of the Markov chain, started from $x_0$ and from $\pi$ respectively, under which the two copies move together once they meet, and $\tau_{\mathrm{couple}}$ is the first time the two processes coincide.
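The inequality can be checked exactly on a small chain. The sketch uses one particular coupling of our choosing: both copies are driven by the same uniform random number through inverse-CDF sampling of their rows (a "grand coupling"), so once they meet they move together. The second copy starts from $\pi$ independently of the first; the code evolves the joint law of the pair, drops mass that has met, and compares $P_{x_0,\pi}(\tau_{\mathrm{couple}} > t)$ with $\|P^t(x_0,\cdot) - \pi\|_{TV}$. The helper names and the test chains are hypothetical.

```python
import bisect
import itertools

def coupled_step(P, x, y):
    """Joint law of (X', Y') when rows P[x] and P[y] are sampled with one shared U.

    Each copy moves to the first state whose cumulative probability exceeds U
    (inverse-CDF sampling); with x == y both copies land on the same state.
    """
    cx = list(itertools.accumulate(P[x]))
    cy = list(itertools.accumulate(P[y]))
    cuts = sorted(set([0.0] + cx + cy))
    law = {}
    for lo, hi in zip(cuts, cuts[1:]):
        u = (lo + hi) / 2                 # every U in [lo, hi) gives the same pair
        pair = (min(bisect.bisect_right(cx, u), len(cx) - 1),
                min(bisect.bisect_right(cy, u), len(cy) - 1))
        law[pair] = law.get(pair, 0.0) + (hi - lo)
    return law

def coupling_tail(P, pi, x0, t_max):
    """[P(tau_couple > t) for t = 0..t_max], X_0 = x0, Y_0 ~ pi independent of X_0."""
    alive = {(x0, y): p for y, p in enumerate(pi) if y != x0 and p > 0}
    tails = []
    for _ in range(t_max + 1):
        tails.append(sum(alive.values()))
        new = {}
        for (x, y), p in alive.items():
            for (a, b), w in coupled_step(P, x, y).items():
                if a != b:                # pairs that have met stay together: drop them
                    new[(a, b)] = new.get((a, b), 0.0) + p * w
        alive = new
    return tails

def distance_profile(P, pi, x0, t_max):
    """[||P^t(x0, .) - pi||_TV for t = 0..t_max]."""
    m = len(P)
    row = [1.0 if j == x0 else 0.0 for j in range(m)]
    out = []
    for _ in range(t_max + 1):
        out.append(0.5 * sum(abs(a - b) for a, b in zip(row, pi)))
        row = [sum(row[i] * P[i][j] for i in range(m)) for j in range(m)]
    return out

# A doubly stochastic 3-state chain, so pi is uniform.
P3 = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]]
pi3 = [1 / 3, 1 / 3, 1 / 3]
```

For the symmetric two-state chain with switching probability $1/4$ this coupling is optimal: the coupling tail and $d(t)$ coincide for every $t$.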
