SLIDE 1

Improved bounds for MCMC sampling colorings of G(n, d/n)

Charis Efthymiou efthymiou@gmail.com

Goethe University, Frankfurt

Joint work with T. Hayes, D. Štefankovič and E. Vigoda

Workshop on Local Algorithms, MIT, Boston, June 2018

SLIDE 6

Sampling Problem

Coloring model µ

For a graph G = (V, E) and an integer k > 0: the uniform distribution over the proper k-colorings of G

Sampling Problem

Input: G = (V, E), k
Output: a k-coloring distributed as in µ(·)
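To make the target distribution µ concrete, the sketch below (not part of the talk) samples from µ exactly by listing every proper k-coloring. It takes exponential time, so it only works for toy graphs, which is one reason to turn to MCMC. The function names are hypothetical.

```python
import itertools
import random

def proper_colorings(n, edges, k):
    """Yield every proper k-coloring of the graph on vertices 0..n-1 as a tuple."""
    for coloring in itertools.product(range(k), repeat=n):
        if all(coloring[u] != coloring[v] for u, v in edges):
            yield coloring

def exact_sample(n, edges, k, rng=random):
    """Return one coloring drawn from mu, the uniform law on proper k-colorings.

    Enumerates all k^n assignments, so this is only feasible for toy graphs.
    """
    colorings = list(proper_colorings(n, edges, k))
    if not colorings:
        raise ValueError("the graph has no proper k-coloring")
    return rng.choice(colorings)

triangle = [(0, 1), (1, 2), (0, 2)]
print(len(list(proper_colorings(3, triangle, 3))))  # 6, i.e. 3!
print(exact_sample(3, triangle, 3, random.Random(0)))
```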

SLIDE 13

Sampling Problem

Input graph is G(n, d/n)

n vertices
edges appear independently with probability d/n, where d is fixed

Efficient algorithms

an efficient exact algorithm is unlikely
focus on efficient approximation algorithms
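A minimal sketch of how an instance of G(n, d/n) can be generated, assuming adjacency lists as the graph representation (the talk does not fix one). The O(n²) double loop follows the definition directly; it is fine for a few thousand vertices but not for large n. The function name is hypothetical.

```python
import random

def sample_gnp(n, d, rng=random):
    """Sample G(n, d/n) as adjacency lists: each of the n(n-1)/2 possible edges
    is included independently with probability p = d/n."""
    p = d / n
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

adj = sample_gnp(2000, 5.0, random.Random(1))
degrees = [len(nbrs) for nbrs in adj]
print(sum(degrees) / len(degrees))  # close to d = 5; the exact mean is (n-1)d/n
print(max(degrees))                 # noticeably larger than d
```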

SLIDE 21

Markov Chain Monte Carlo

Given G and integer k > 0,

set up a Markov chain over the k-colorings of G
the equilibrium distribution is the coloring model
the algorithm simulates the Markov chain
outputs the configuration of the chain after “sufficiently many” transitions
the output should be “close” to µ
it is desirable that the chain “mixes fast”

SLIDE 24

The local algorithms

“Glauber dynamics”

$X_0 = \sigma$
$X_t \to X_{t+1}$:
Choose vertex w uniformly at random from V
Set $X_{t+1}(u) = X_t(u)$ for every vertex $u \neq w$
Set $X_{t+1}(w)$ according to µ conditional on $X_{t+1}(V \setminus w)$.

Block dynamics

. . . instead of single vertices, update small blocks.
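The two update rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: graphs are adjacency lists, colors are 0, ..., k−1, and the block update enumerates all colorings of the block, so blocks must be tiny. The helper names are hypothetical.

```python
import itertools
import random

def available_colors(adj, coloring, w, k):
    """Colors in {0, ..., k-1} used by no neighbour of w."""
    used = {coloring[u] for u in adj[w]}
    return [c for c in range(k) if c not in used]

def glauber_step(adj, coloring, k, rng=random):
    """One Glauber transition X_t -> X_{t+1}, updating `coloring` in place.

    For the uniform law on proper colorings, mu conditioned on the colors of
    V minus w is uniform over the colors that no neighbour of w uses.
    """
    w = rng.randrange(len(adj))
    choices = available_colors(adj, coloring, w, k)
    if choices:  # always true when `coloring` is proper: w's own color is available
        coloring[w] = rng.choice(choices)
    return coloring

def block_update(adj, coloring, block, k, rng=random):
    """Heat-bath block update: recolor every vertex of `block` jointly, drawing
    from mu conditioned on the colors outside the block (brute force)."""
    block = list(block)
    inside = set(block)
    options = []
    for colors in itertools.product(range(k), repeat=len(block)):
        trial = dict(zip(block, colors))
        if all(trial[v] != (trial[u] if u in inside else coloring[u])
               for v in block for u in adj[v]):
            options.append(colors)
    # Non-empty when `coloring` is proper: the current block coloring is an option.
    for v, c in zip(block, rng.choice(options)):
        coloring[v] = c
    return coloring

adj = [[1, 3], [0, 2], [1, 3], [2, 0]]  # a 4-cycle
x = [0, 1, 0, 1]
rng = random.Random(0)
for _ in range(100):
    glauber_step(adj, x, 3, rng)
block_update(adj, x, [0, 1], 3, rng)
print(x)  # still a proper 3-coloring
```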

SLIDE 25

The problem

MCMC sampling colorings of G(n, d/n) with Glauber dynamics

SLIDE 28

Some technicalities

There is a standard way of dealing with . . .

ergodicity
how to get initial configuration

Focus

. . . speed of convergence.

SLIDE 32

How to measure speed . . .

Mixing Time

The number of transitions needed for the chain to come within total variation distance 1/e of µ(·), for the worst-case $X_0$.

Interesting cases

. . . when the mixing time is polynomial in n, we have “rapid mixing”
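For a toy instance the mixing time can be computed exactly from the definition: build the Glauber transition matrix over all proper colorings, evolve the point mass of every starting state, and stop at the first t where the worst-case total variation distance to µ is at most 1/e. This brute-force sketch (hypothetical names, exponential state space) is only meant to make the definition concrete.

```python
import itertools
import math

def glauber_matrix(n, edges, k):
    """Exact transition matrix of the Glauber dynamics on the proper k-colorings."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    states = [s for s in itertools.product(range(k), repeat=n)
              if all(s[u] != s[v] for u, v in edges)]
    index = {s: i for i, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for i, s in enumerate(states):
        for w in range(n):                      # w is chosen with probability 1/n
            used = {s[u] for u in adj[w]}
            avail = [c for c in range(k) if c not in used]
            for c in avail:                     # new color uniform over avail
                nxt = s[:w] + (c,) + s[w + 1:]
                P[i][index[nxt]] += 1.0 / (n * len(avail))
    return states, P

def tv_distance(p, q):
    """Total variation distance between two distributions on the same index set."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def mixing_time(P, eps=1 / math.e, t_max=10_000):
    """Smallest t such that max over X_0 of d_TV(P^t(X_0, .), mu) <= eps.

    mu is uniform here because P is symmetric (the set of available colors at w
    does not depend on the current color of w).
    """
    m = len(P)
    mu = [1.0 / m] * m
    dists = [[1.0 if j == i else 0.0 for j in range(m)] for i in range(m)]
    for t in range(1, t_max + 1):
        dists = [[sum(row[a] * P[a][b] for a in range(m)) for b in range(m)]
                 for row in dists]
        if max(tv_distance(row, mu) for row in dists) <= eps:
            return t
    raise RuntimeError("did not reach eps within t_max steps")

states, P = glauber_matrix(3, [(0, 1), (1, 2)], 3)
print(len(states))      # 12 proper 3-colorings of the path 0-1-2
print(mixing_time(P))
```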

SLIDE 34

Rapid Mixing and Maximum Degree ∆

Maximum Degree Bounds for colorings

Vigoda (1999): $k > \frac{11}{6}\Delta$ for general G
Hayes, Vera, Vigoda (2007): $k = \Omega(\Delta/\log\Delta)$ for planar G
Goldberg, Martin, Paterson (2004): $k \ge (1.763 + \epsilon)\Delta$ for G triangle-free and amenable
Dyer, Frieze, Hayes, Vigoda (2004): $k \ge (1.48 + \epsilon)\Delta$ for G of girth $g \ge 7$
Frieze, Vera (2006): $k \ge (1.763 + \epsilon)\Delta$ for G locally sparse

SLIDE 40

Max degree is too high!

Degrees for typical instances of G(n, d/n)

the maximum degree is $\Theta\left(\frac{\ln n}{\ln\ln n}\right)$

the “vast majority” of the vertices have degree in $(1 \pm \epsilon)d$

Remark

the “natural” bound for k is in terms of the expected degree d

Conjectured Bound

We have rapid mixing when $k \ge (1 + \epsilon)d$.
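The “vast majority” claim can be checked numerically under the standard approximation that the degree of a vertex in G(n, d/n), which is Binomial(n−1, d/n), is close to Poisson(d). The sketch below computes the Poisson mass outside (1 ± ε)d; for fixed ε it shrinks as d grows, in line with the assumption that d is large. The function names are hypothetical.

```python
import math

def poisson_pmf(j, lam):
    """P[Poisson(lam) = j], computed in log-space to avoid overflow for large lam."""
    return math.exp(j * math.log(lam) - lam - math.lgamma(j + 1))

def fraction_outside(d, eps):
    """Approximate fraction of vertices of G(n, d/n) whose degree lies outside
    [(1 - eps) d, (1 + eps) d], using the Poisson(d) limit of the degree law."""
    lo, hi = (1 - eps) * d, (1 + eps) * d
    inside = sum(poisson_pmf(j, d) for j in range(math.ceil(lo), math.floor(hi) + 1))
    return 1.0 - inside

for d in (10, 100, 1000):
    print(d, fraction_outside(d, 0.1))
```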

SLIDE 45

Previous Work

Dyer, Flaxman, Frieze, Vigoda (2005): $k \ge \Theta\left(\frac{\ln\ln n}{\ln\ln\ln n}\right)$

k still depends on n
Mossel, Sly (2008): $k \ge d^c$
Efthymiou (2014): $k \ge (11/2)d$

SLIDE 47

Main Result

Theorem (Rapid Mixing)

For $\epsilon > 0$ and sufficiently large $d > 0$ the following is true: for $k \ge (\alpha + \epsilon)d$ and with probability $1 - o(1)$ over G(n, d/n), the Glauber dynamics exhibits
$$T_{mix} = O\left(n^{2 + \frac{1}{\log d}}\right),$$
where $\alpha = 1.763\ldots$ is the solution to the equation $(1/z)e^{1/z} = 1$.
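The constant α can be computed numerically. The sketch below finds the root z of ze^z = 1 by bisection and returns α = 1/z, which then satisfies (1/α)e^{1/α} = 1 as in the theorem. The function name is hypothetical.

```python
import math

def solve_alpha(tol=1e-12):
    """Return alpha = 1/z, where z in (0, 1) is the root of z * e^z = 1.

    z -> z e^z is increasing on (0, 1), going from 0 to e, so bisection applies.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid * math.exp(mid) < 1:
            lo = mid
        else:
            hi = mid
    return 1 / ((lo + hi) / 2)

alpha = solve_alpha()
print(round(alpha, 4))                    # 1.7632
print((1 / alpha) * math.exp(1 / alpha))  # approximately 1.0
```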

SLIDE 49

The effect of high degrees

Strategy from Dyer et al. (2005)

“Use block dynamics & hide the high degrees inside the blocks”

SLIDE 54

The plan

define an appropriate block partition
show rapid mixing for the block dynamics
deduce rapid mixing for the Glauber dynamics, using comparison

SLIDE 61

Block Construction

Weights [Efthymiou (2014)]

Each vertex u of degree deg(u) is assigned weight
$$W(u) = \begin{cases} (1 + \gamma)^{-1} & \text{if } \deg(u) \le (1 + \epsilon)d,\\ d^c \cdot \deg(u) & \text{otherwise.} \end{cases}$$
Every path L is assigned weight $\prod_{u \in L} W(u)$.

“Break Points”

$\Gamma(v)$ := set of paths of length at most $\frac{\ln n}{d^{2/5}}$ that emanate from v.

For a break-point v, we have
$$\max_{L \in \Gamma(v)} \prod_{u \in L} W(u) \le 1.$$
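A brute-force sketch of the break-point test. The talk does not give concrete values for γ, c or the length bound, so those are hypothetical parameters here; the sketch also assumes that path length counts edges and that the product over L includes v itself.

```python
def vertex_weight(deg, d, eps, gamma, c):
    """W(u) from the slide: (1+gamma)^-1 if deg(u) <= (1+eps)d, else d^c * deg(u)."""
    if deg <= (1 + eps) * d:
        return 1 / (1 + gamma)
    return d ** c * deg

def is_break_point(adj, v, max_len, d, eps, gamma, c):
    """True if every path L starting at v with at most max_len edges satisfies
    prod_{u in L} W(u) <= 1 (the path includes v itself).

    Exhaustive DFS over simple paths: exponential in max_len, illustration only.
    """
    W = [vertex_weight(len(nbrs), d, eps, gamma, c) for nbrs in adj]
    best = 0.0

    def dfs(u, visited, weight, edges_used):
        nonlocal best
        best = max(best, weight)
        if edges_used == max_len:
            return
        for x in adj[u]:
            if x not in visited:
                visited.add(x)
                dfs(x, visited, weight * W[x], edges_used + 1)
                visited.remove(x)

    dfs(v, {v}, W[v], 0)
    return best <= 1

star = [list(range(1, 11))] + [[0] for _ in range(10)]  # centre 0 has degree 10
path = [[1], [0, 2], [1, 3], [2, 4], [3]]
params = dict(d=3, eps=0.1, gamma=0.1, c=1)              # hypothetical values
print(is_break_point(star, 1, 2, **params))  # False: the path 1-0 has weight 30/1.1
print(is_break_point(path, 0, 4, **params))  # True: only low-degree vertices
```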

SLIDE 64

What the blocks look like

Boundary of the block

Consists only of break points.

SLIDE 65

What the blocks look like

Low-degree “buffer”

. . . between boundary vertices and a high-degree vertex

SLIDE 66

What the blocks look like

. . . for the analysis

the effect of high degrees disappears

SLIDE 71

Proving Rapid Mixing

Path Coupling, [Bubley, Dyer 1997]

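Consider $(X_t), (Y_t)$ such that $X_0 \oplus Y_0 = \{w^*\}$. For rapid mixing it suffices to have a coupling such that
$$\mathbb{E}[\mathrm{dist}(X_1, Y_1) \mid X_0, Y_0] \le (1 - \gamma)\,\mathrm{dist}(X_0, Y_0),$$
where $\mathrm{dist}(\sigma, \tau) = \sum_{u \in \sigma \oplus \tau} \beta(u)$.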
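The step from contraction to a mixing-time bound is the standard path-coupling calculation. A minimal sketch, assuming the distance is a path metric whose nonzero values are at least `min_dist` and whose largest value is `diameter` (both hypothetical parameter names); the bound actually proved in the talk is the one stated in the theorem.

```python
import math

def path_coupling_mixing_bound(gamma, diameter, min_dist=1.0, eps=1 / math.e):
    """Upper bound on T_mix(eps) from path coupling, given one-step contraction.

    If E[dist(X_1, Y_1)] <= (1 - gamma) dist(X_0, Y_0) for adjacent pairs, path
    coupling extends it to all pairs, so after t steps
        P[X_t != Y_t] <= E[dist(X_t, Y_t)] / min_dist
                      <= (1 - gamma)^t * diameter / min_dist
                      <= exp(-gamma * t) * diameter / min_dist,
    which is at most eps once t >= ln(diameter / (min_dist * eps)) / gamma.
    """
    return math.ceil(math.log(diameter / (min_dist * eps)) / gamma)

n = 1000
print(path_coupling_mixing_bound(gamma=1 / (2 * n), diameter=n))  # about 2n(ln n + 1)
```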

SLIDE 73

Distance between σ and τ

dist(σ, τ) depends on the block partition B.

SLIDE 76

Distance between σ and τ

A distance that counts the disagreeing edges between the blocks

SLIDE 78

Distance between σ and τ

A new distance metric

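Given G(n, d/n) and a set of blocks B, for any two colorings σ, τ:
$$\mathrm{dist}(\sigma, \tau) = n^2 \sum_{v \in \partial B} \mathbf{1}\{v \in \sigma \oplus \tau\}\,\deg_{out}(v) + \sum_{v \in V \setminus \partial B} \mathbf{1}\{v \in \sigma \oplus \tau\}$$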
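This distance can be computed directly. A minimal sketch, assuming (the slides do not spell this out) that deg_out(v) counts the neighbours of v lying in a different block and that ∂B is the set of vertices with at least one such neighbour. The n² factor makes a single disagreement on the boundary outweigh all interior disagreements together. The function name is hypothetical.

```python
def block_distance(adj, block_of, sigma, tau):
    """dist(sigma, tau) = n^2 * sum_{v in dB} 1{sigma(v) != tau(v)} deg_out(v)
                        + sum_{v not in dB} 1{sigma(v) != tau(v)}.

    Assumptions: deg_out(v) counts neighbours of v in a different block, and
    the boundary dB is the set of vertices with deg_out(v) > 0.
    """
    n = len(adj)
    total = 0
    for v in range(n):
        if sigma[v] == tau[v]:
            continue
        deg_out = sum(1 for u in adj[v] if block_of[u] != block_of[v])
        if deg_out > 0:   # v is a boundary vertex
            total += n * n * deg_out
        else:             # v is an interior vertex
            total += 1
    return total

adj = [[1], [0, 2], [1, 3], [2]]  # path 0-1-2-3
block_of = [0, 0, 1, 1]           # blocks {0, 1} and {2, 3}
sigma = [0, 1, 0, 1]
print(block_distance(adj, block_of, sigma, [0, 2, 0, 1]))  # boundary vertex 1 differs
print(block_distance(adj, block_of, sigma, [2, 1, 0, 1]))  # interior vertex 0 differs
```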

SLIDE 81

The coupling

[Figure: blocks B0, B1, B2, B3, B4]

SLIDE 99

The coupling of X(B) and Y(B)

one vertex at a time
pick a vertex next to a disagreement

disagreement probability
$$\varrho_v = \begin{cases} \dfrac{1}{k - \deg(v)} & \text{if } \deg(v) < k,\\ 1 & \text{otherwise,} \end{cases}$$
the probability of the most likely color
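Where the 1/(k − deg(v)) comes from can be checked on a single vertex. A minimal sketch, assuming X and Y disagree at exactly one neighbour of v and that each chain recolors v uniformly from its available colors: the optimal coupling of the two updates disagrees with probability equal to their total variation distance, which is at most 1/(k − deg(v)). The coupling used in the talk may differ in its details; the function name is hypothetical.

```python
from fractions import Fraction

def disagreement_probability(k, nbr_colors_x, nbr_colors_y):
    """Disagreement probability of an optimal coupling of the two updates at v.

    Each chain recolors v uniformly from the colors its neighbours do not use;
    an optimal coupling disagrees with probability equal to the total variation
    distance between the two uniform laws. Assumes both sets are non-empty.
    """
    ax = set(range(k)) - set(nbr_colors_x)
    ay = set(range(k)) - set(nbr_colors_y)
    tv = Fraction(0)
    for c in ax | ay:
        p = Fraction(1, len(ax)) if c in ax else Fraction(0)
        q = Fraction(1, len(ay)) if c in ay else Fraction(0)
        tv += max(p - q, Fraction(0))
    return tv

# v has deg(v) = 3 and k = 5; X and Y disagree at one neighbour of v.
print(disagreement_probability(5, [0, 1, 2], [0, 1, 3]))  # 1/2 = 1/(k - deg(v))
print(disagreement_probability(5, [0, 0, 1], [0, 0, 2]))  # 1/3 < 1/(k - deg(v))
```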

SLIDE 104

Rapid Mixing for k > 2d

Probability of Propagation
$$\varrho_v = \begin{cases} \dfrac{1}{k - \deg(v)} & \text{if } v \text{ is low degree},\\ 1 & \text{otherwise.} \end{cases}$$

Block partition
Distance metric
Bound for k

Path coupling implies rapid mixing for k > 2d.
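For intuition only, here is the back-of-the-envelope count behind the threshold, assuming every low-degree vertex has degree about d and ignoring the block structure and the high-degree vertices: a disagreement at w reaches each of its roughly d neighbours with probability at most $\varrho_v \approx 1/(k-d)$, so the expected number of new disagreements it creates satisfies

$$d \cdot \frac{1}{k - d} < 1 \iff d < k - d \iff k > 2d.$$

The rigorous argument replaces this count with path coupling over the block partition and the weighted distance metric.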

SLIDE 106

Better bounds with in-degrees

Goldberg, Martin, Paterson (2004)

Probability of Propagation
$$\varrho_v = \begin{cases} \dfrac{1}{k - \deg(v)} & \text{if } v \text{ is low degree},\\ 1 & \text{otherwise,} \end{cases}$$
the probability of the most likely color

SLIDE 107

Better bounds with in-degrees

Goldberg, Martin, Paterson (2004)

Probability of Propagation when $k > \alpha d$
$$\varrho_v = \begin{cases} \dfrac{1 - \epsilon}{\deg_{in}(v)} & \text{if } v \text{ is low degree},\\ 1 & \text{otherwise,} \end{cases}$$
the probability of the most likely color
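A heuristic comparison of the two propagation bounds, assuming every relevant vertex has degree about d and writing k = r·d. Without uniformity a disagreement is copied with probability about 1/(k − d); if instead about k·e^{−d/k} colors are available, the probability is about e^{d/k}/k. The expected spread then drops below 1 for r > 2 and for r > α respectively. This is intuition only, not the proof from the talk; the function names are hypothetical.

```python
import math

def naive_spread(r):
    """d / (k - d) with k = r * d: about d neighbours, each disagreeing w.p. 1/(k - d)."""
    return 1 / (r - 1)

def uniform_spread(r):
    """d * e^{d/k} / k with k = r * d: about d neighbours, each disagreeing
    w.p. 1/(k e^{-d/k}), i.e. one over the assumed number of available colors."""
    return math.exp(1 / r) / r

for r in (1.7, 1.8, 1.9, 2.0, 2.1):
    print(f"k = {r}d: naive {naive_spread(r):.3f}, with uniformity {uniform_spread(r):.3f}")
```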

SLIDE 109

Better bounds with in-degrees

Goldberg, Martin, Paterson (2004)

Probability of Propagation when $k > \alpha d$
$$\varrho_v = \begin{cases} \dfrac{1 - \epsilon}{\deg(v)} & \text{if } v \text{ is low degree},\\ 1 & \text{otherwise,} \end{cases}$$
the probability of the most likely color

Obstacle for the above

... the coloring at the boundary is “worst case”.

SLIDE 110

Better bounds with in-degrees

Goldberg, Martin, Paterson (2004)

Probability of Propagation when $k > \alpha d$
$$\varrho_v = \begin{cases} \dfrac{1 - \epsilon}{\deg(v)} & \text{if } v \text{ is low degree},\\ 1 & \text{otherwise,} \end{cases}$$
the probability of the most likely color

Obstacle for the above

... the neighbors outside use too many different colors!

SLIDE 111

Local Uniformity

Theorem (Local Uniformity)

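With probability $1 - o(1)$ over G(n, d/n) the following is true: for all $\epsilon, C_1, C_2 > 0$, for all $d > d_0$, for $k \ge (\alpha + \epsilon)d$, let $I = [C_1 N, C_2 N]$; for a low-degree $v \in V$,
$$\Pr\left[\exists t \in I \text{ s.t. } |\mathrm{Avail}_v(X_t)| \le \mathbf{1}\{U_t(v)\}\,(1 - \epsilon^2)\,k \exp(-\deg(v)/k)\right] \le \exp\left(-d^{2/3}\right).$$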
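The quantity k·exp(−deg(v)/k) has a simple heuristic origin: if the deg(v) neighbours of v picked their colors independently and uniformly, the expected number of colors they miss would be k(1 − 1/k)^{deg(v)} ≈ k·e^{−deg(v)/k}. The theorem gives a lower bound of this form for the Glauber dynamics; the Monte Carlo sketch below only illustrates the independent model, and its names and parameters are hypothetical.

```python
import math
import random

def mean_available(k, deg, trials, rng):
    """Monte Carlo mean of the number of colors unused by `deg` neighbours that
    each pick one of k colors independently and uniformly (a heuristic model)."""
    total = 0
    for _ in range(trials):
        used = {rng.randrange(k) for _ in range(deg)}
        total += k - len(used)
    return total / trials

k, deg = 60, 30
est = mean_available(k, deg, 20_000, random.Random(0))
print(est)                     # Monte Carlo estimate
print(k * (1 - 1 / k) ** deg)  # exact mean under the model, about 36.24
print(k * math.exp(-deg / k))  # k e^{-deg/k}, about 36.39
```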

SLIDE 113

Rapid Mixing with uniformity

[Figure: graph G with the initial disagreement at w∗ and a ball around w∗ (labelled “log d √d”) marking the disagreement area]

There is a single disagreement at w∗
Run the chains for CN steps (“burn-in”)
The disagreements spread in the graph during burn-in
Typically the disagreements do not escape the ball
Typically the ball has uniformity
$$\mathbb{E}[\mathrm{dist}(X_{CN}, Y_{CN}) \mid X_0, Y_0] \le (1 - \gamma)\,\mathrm{dist}(X_0, Y_0)$$
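The burn-in phase can be simulated on a toy graph. A minimal sketch, assuming both chains update the same vertex at each step and pick the new colors by a maximal coupling of the two uniform distributions over available colors (a standard choice; the coupling in the talk may differ). It tracks the number of disagreements |X_t ⊕ Y_t| starting from a single disagreement at w∗; the graph, k, number of steps and all names are hypothetical.

```python
import random

def available(adj, coloring, w, k):
    """Colors not used by any neighbour of w."""
    used = {coloring[u] for u in adj[w]}
    return [c for c in range(k) if c not in used]

def max_couple(a, b, rng):
    """Sample (x, y) from a maximal coupling of Uniform(a) and Uniform(b),
    so that P[x != y] equals their total variation distance."""
    x = rng.choice(a)
    p_x = 1 / len(a)
    q_x = 1 / len(b) if x in b else 0.0
    if rng.random() < q_x / p_x:  # keep the chains together on x
        return x, x
    # Otherwise draw y from the normalised excess (q - p)^+ of Uniform(b) over Uniform(a).
    weights = [max(1 / len(b) - (1 / len(a) if c in a else 0.0), 0.0) for c in b]
    return x, rng.choices(b, weights=weights)[0]

def burn_in(adj, X, Y, k, steps, rng):
    """Run coupled Glauber chains (same vertex, maximally coupled colors) and
    return |X_t (+) Y_t| for t = 0, ..., steps. X and Y are updated in place."""
    history = [sum(x != y for x, y in zip(X, Y))]
    for _ in range(steps):
        w = rng.randrange(len(adj))  # both chains update the same vertex
        X[w], Y[w] = max_couple(available(adj, X, w, k), available(adj, Y, w, k), rng)
        history.append(sum(x != y for x, y in zip(X, Y)))
    return history

n, k = 20, 5
adj = [[(i - 1) % n, (i + 1) % n] for i in range(n)]  # a 20-cycle
X = [i % 2 for i in range(n)]                         # proper 2-coloring
Y = list(X)
Y[0] = 2                                              # single disagreement at w* = 0
history = burn_in(adj, X, Y, k, 2000, random.Random(0))
print(history[:10], history[-1])
```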

SLIDE 121

Block Update with Uniformity

Probability of Propagation for $k > \alpha d$
$$\varrho_v = \begin{cases} \dfrac{1 - \epsilon}{\deg_{in}(v)} & \text{if } v \text{ is low degree},\\ 1 & \text{otherwise.} \end{cases}$$

SLIDE 122

Block Update with Uniformity

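Probability of Propagation for $k > \alpha d$

For $v \in \mathrm{Ball}(w^*, (\log d)^2)$:
$$\varrho_v = \begin{cases} \dfrac{1 - \epsilon}{\deg(v)} & \text{if } v \text{ is low degree},\\ 1 & \text{otherwise.} \end{cases}$$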

SLIDE 132

Concluding Remarks

Glauber Dynamics for sampling k-colorings of G(n, d/n)
Mixing time $O\left(n^{2 + \frac{1}{\log d}}\right)$ for $k \ge (\alpha + \epsilon)d$
$\alpha = 1.7632\ldots$, where $1/\alpha$ is the solution to $ze^z = 1$
improves on the previous factor 11/2
Block dynamics and comparison
Improvement in the exponent of the mixing time
We argue about the statistical properties of colorings
We get improved bounds for the hard-core model
rapid mixing for $\lambda < 1/d$
previous bound was $\lambda < 1/(2d)$ [Efthymiou (2014)]

SLIDE 133