SLIDE 1
Improved bounds for MCMC sampling colorings of G(n, d/n)
Charis Efthymiou, efthymiou@gmail.com
Goethe University, Frankfurt
Joint work with: T. Hayes, D. Štefankovič and E. Vigoda
Workshop on Local Algorithms, MIT, Boston, June,
SLIDE 2
SLIDE 6
Sampling Problem
Coloring model µ
For a graph G = (V, E) and an integer k > 0: µ is the uniform distribution over the proper k-colorings of G
Sampling Problem
Input: G = (V, E), k
Output: a k-coloring distributed as in µ(·)
SLIDE 13
Sampling Problem
Input graph is G(n,d/n)
- n vertices
- each edge appears independently with probability d/n, where d is fixed
Efficient algorithms
- an efficient exact sampling algorithm is unlikely to exist
- focus on efficient approximation algorithms
SLIDE 21
Markov Chain Monte Carlo
Given G and an integer k > 0,
- set up a Markov chain over the k-colorings of G
- the equilibrium distribution is the coloring model µ
- the algorithm simulates the Markov chain
- it outputs the configuration of the chain after “sufficiently many” transitions
- the output should be “close” to µ
- it is desirable that the chain “mixes fast”
SLIDE 24
The local algorithms
“Glauber dynamics”
- X0 = σ
- Xt → Xt+1
- Choose vertex w uniformly at random from V
- Set Xt+1(u) = Xt(u), for every vertex u ≠ w
- Set Xt+1(w) according to µ conditional on Xt+1(V \ {w})
Block dynamics
. . . instead of single vertices, update small blocks.
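The single-vertex update described on this slide can be sketched in a few lines. This is a minimal illustration, not code from the talk: the adjacency-dict representation, the function names glauber_step and is_proper, and the 4-cycle example are all assumptions made here. For the uniform distribution over proper colorings, sampling Xt+1(w) from µ conditional on the rest amounts to picking uniformly among the colors not used by the neighbors of w.

```python
import random


def glauber_step(graph, coloring, k, rng):
    """One Glauber update on a proper k-coloring (modifies coloring in place)."""
    # Choose the vertex w uniformly at random from V.
    w = rng.choice(sorted(graph))
    # Colors forbidden at w are those used by its neighbors.
    used = {coloring[u] for u in graph[w]}
    # If the current coloring is proper, the current color of w is available,
    # so this list is never empty.
    available = [c for c in range(k) if c not in used]
    coloring[w] = rng.choice(available)
    return coloring


def is_proper(graph, coloring):
    """Check that no edge has both endpoints with the same color."""
    return all(coloring[u] != coloring[v] for u in graph for v in graph[u])


# Hypothetical example: a 4-cycle with k = 4 colors.
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
coloring = {0: 0, 1: 1, 2: 0, 3: 1}
rng = random.Random(0)
for _ in range(1000):
    glauber_step(graph, coloring, k=4, rng=rng)
```

Every step preserves properness, so the chain never leaves the set of proper k-colorings once it starts there.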
SLIDE 25
The problem
MCMC sampling colorings of G(n, d/n) with Glauber dynamics
SLIDE 28
Some technicalities
There is a standard way of dealing with . . .
- ergodicity
- how to get initial configuration
Focus
. . . speed of convergence.
SLIDE 32
How to measure speed . . .
Mixing Time
The number of transitions needed for the chain to get within total variation distance 1/e of µ(·), for the worst-case X0.
Interesting cases
. . . when the mixing time is polynomial in n
. . . we have “rapid mixing”
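Written out, the definition on this slide reads as follows. The notation d_TV and T_mix is the standard one and is an assumption here; the slide states the definition only in words.

```latex
\[
d_{\mathrm{TV}}(\nu, \mu) = \frac{1}{2} \sum_{\sigma} \left| \nu(\sigma) - \mu(\sigma) \right|,
\qquad
T_{\mathrm{mix}} = \min\left\{ t \ge 0 :
  \max_{X_0} d_{\mathrm{TV}}\big( \Pr[X_t \in \cdot \mid X_0],\, \mu \big) \le \frac{1}{e} \right\}.
\]
```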
SLIDE 34
Rapid Mixing and Maximum Degree ∆
Maximum Degree Bounds for colorings
- Vigoda (1999): k > (11/6)∆ for general G
- Hayes, Vera, Vigoda (2007): k = Ω(∆/ log ∆) for planar G
- Goldberg, Martin, Paterson (2004): k ≥ (1.763 + ε)∆ for G triangle-free and amenable
- Dyer, Frieze, Hayes, Vigoda (2004): k ≥ (1.48 + ε)∆ for G of girth g ≥ 7
- Frieze, Vera (2006): k ≥ (1.763 + ε)∆ for G locally sparse
SLIDE 40
Max degree is too high!
Degrees for typical instances of G(n, d/n)
- the maximum degree is $\Theta(\ln n / \ln \ln n)$
- the “vast majority” of the vertices have degree in (1 ± ε)d
Remark
the “natural” bound for k is w.r.t. the expected degree d
Conjectured Bound
We have rapid mixing when k ≥ (1 + ε)d.
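The gap between the typical degree and the maximum degree can be seen on a single sample. The sketch below is an illustration only: the function name sample_gnp_degrees and the values n = 2000, d = 5 are assumptions, and one sample at this size says nothing rigorous about the asymptotic statements on the slide.

```python
import random
from itertools import combinations


def sample_gnp_degrees(n, d, seed=0):
    """Sample G(n, d/n) and return the list of vertex degrees."""
    rng = random.Random(seed)
    p = d / n
    deg = [0] * n
    # Each of the n(n-1)/2 possible edges appears independently with probability p.
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            deg[u] += 1
            deg[v] += 1
    return deg


# Hypothetical parameters chosen for illustration.
n, d = 2000, 5.0
deg = sample_gnp_degrees(n, d)
avg_degree = sum(deg) / n   # concentrates around d
max_degree = max(deg)       # grows slowly with n, while avg_degree does not
print(avg_degree, max_degree)
```

A bound on k stated in terms of max_degree would therefore grow with n, whereas a bound in terms of d stays constant.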
SLIDE 45
Previous Work
- Dyer, Flaxman, Frieze, Vigoda (2005): $k \ge \Theta(\ln \ln n / \ln \ln \ln n)$
- k still depends on n
- Mossel, Sly (2008): $k \ge d^{c}$
- Efthymiou (2014): k ≥ (11/2)d
SLIDE 47
Main Result
Theorem (Rapid Mixing)
For ε > 0 and sufficiently large d > 0 the following is true: for k ≥ (α + ε)d and with probability 1 − o(1) over G(n, d/n), the Glauber dynamics exhibits
$T_{\mathrm{mix}} = O\left(n^{2 + \frac{1}{\log d}}\right)$,
where α = 1.763 . . . is the solution to the equation $(1/z)\, e^{1/z} = 1$.
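The constant α can be checked numerically. Substituting x = 1/z turns the defining equation into x e^x = 1, so α is the reciprocal of the positive root of that equation. The sketch below uses Newton's method; the function name, starting point, and tolerance are choices made here for illustration.

```python
import math


def solve_x_exp_x_equals_one(tol=1e-12, max_iter=100):
    """Newton's method for f(x) = x * exp(x) - 1, started at x = 0.5."""
    x = 0.5
    for _ in range(max_iter):
        f = x * math.exp(x) - 1.0
        fprime = (x + 1.0) * math.exp(x)
        step = f / fprime
        x -= step
        if abs(step) < tol:
            break
    return x


x_star = solve_x_exp_x_equals_one()
alpha = 1.0 / x_star  # matches the value 1.763... quoted in the theorem
print(alpha)
```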
SLIDE 49
The effect of high degrees
Strategy from Dyer et al. (2005)
“Use block dynamics & hide the high degrees inside the blocks”
SLIDE 54
The plan
- define appropriate block partition
- show rapid mixing for the block dynamics
- deduce rapid mixing for the Glauber dynamics
- use comparison
SLIDE 61
Block Construction
Weights [Efthymiou (2014)]
- Each vertex u of degree deg(u) is assigned weight
$W(u) = \begin{cases} (1+\gamma)^{-1} & \text{if } \deg(u) \le (1+\epsilon)d \\ d^{c} \cdot \deg(u) & \text{otherwise} \end{cases}$
- Every path L is assigned weight $\prod_{u \in L} W(u)$
“Break Points”
Γ(v) := set of paths of length at most $\ln n / d^{2/5}$ that emanate from v.
For a break-point v, we have
$\max_{L \in \Gamma(v)} \left\{ \prod_{u \in L} W(u) \right\} \le 1$.
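The break-point condition can be tested by brute force on a small graph. The sketch below makes several assumptions that are not in the slides: paths are simple paths, length counts edges, the path weight is the product of vertex weights including v itself, and the parameter values (d = 2, ε = 0.5, γ = 1, c = 1, and the path lengths) are hypothetical and chosen only so the example is easy to follow.

```python
def vertex_weight(deg_u, d, eps, gamma, c):
    """Weight below 1 for low-degree vertices, large weight for high-degree ones."""
    if deg_u <= (1 + eps) * d:
        return 1.0 / (1.0 + gamma)
    return (d ** c) * deg_u


def is_break_point(graph, v, max_len, d, eps, gamma, c):
    """True if every simple path from v with at most max_len edges has weight <= 1."""
    def weight(u):
        return vertex_weight(len(graph[u]), d, eps, gamma, c)

    def dfs(u, visited, prod, edges_used):
        if prod > 1.0:
            return False          # found a path that is too heavy
        if edges_used == max_len:
            return True           # do not extend past the length limit
        for x in graph[u]:
            if x not in visited:
                visited.add(x)
                ok = dfs(x, visited, prod * weight(x), edges_used + 1)
                visited.remove(x)
                if not ok:
                    return False
        return True

    return dfs(v, {v}, weight(v), 0)


# Hypothetical example: a path 0-1-2-3-4-5 whose last vertex is a hub of degree 5.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5],
         5: [4, 6, 7, 8, 9], 6: [5], 7: [5], 8: [5], 9: [5]}
params = dict(d=2, eps=0.5, gamma=1.0, c=1)
far_from_hub = is_break_point(graph, 0, max_len=5, **params)
next_to_hub = is_break_point(graph, 4, max_len=3, **params)
```

In this example vertex 0 qualifies because the long run of low-degree vertices outweighs the hub, while vertex 4 does not because the hub is adjacent to it.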
SLIDE 64
What do the blocks look like?
Boundary of the block
Consists only of break points.
SLIDE 65
What do the blocks look like?
Low degree “buffer”
. . . between boundary vertices and a high degree vertex
SLIDE 66
What do the blocks look like?
. . . for the analysis
the effect of high degrees disappears
SLIDE 71
Proving Rapid Mixing
Path Coupling, [Bubley, Dyer 1997]
Consider (Xt), (Yt) such that X0 ⊕ Y0 = {w∗}.
For rapid mixing it suffices to have a coupling such that
$E[\mathrm{dist}(X_1, Y_1) \mid X_0, Y_0] \le (1 - \gamma)\, \mathrm{dist}(X_0, Y_0)$,
where $\mathrm{dist}(\sigma, \tau) = \sum_{u \in \sigma \oplus \tau} \beta(u)$.
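A short worked version of why this contraction gives rapid mixing, under assumptions that are not stated on the slide: β(u) ≥ 1 for every u, dist is at most D on any pair of colorings, and the one-step contraction has been extended from pairs differing at a single vertex to all pairs (which is what path coupling provides).

```latex
\begin{align*}
d_{\mathrm{TV}}(X_t, Y_t)
  &\le \Pr[X_t \ne Y_t]
   \le E[\mathrm{dist}(X_t, Y_t)] \\
  &\le (1 - \gamma)^{t}\, \mathrm{dist}(X_0, Y_0)
   \le D\, e^{-\gamma t}.
\end{align*}
% Hence t >= (1 + ln D) / gamma steps suffice for distance at most 1/e.
```

With D polynomial in n and 1/γ polynomial in n, the resulting bound on the mixing time is polynomial in n.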
SLIDE 73
Distance between σ and τ
dist(σ, τ) depends on the block partition B.
SLIDE 76
Distance between σ and τ
A distance that counts the disagreeing edges between the blocks
SLIDE 79
Distance between σ and τ
A new distance metric
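Given G(n, d/n) and a set of blocks B, for any two σ, τ
$\mathrm{dist}(\sigma, \tau) = n^{2} \sum_{v \in \partial B} 1\{v \in \sigma \oplus \tau\}\, \deg_{\mathrm{out}}(v) + \sum_{v \in V \setminus \partial B} 1\{v \in \sigma \oplus \tau\}$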
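The metric can be evaluated directly from its definition. In the sketch below the function name block_distance, the dictionary representation, and the reading of deg_out(v) as the number of neighbors of v outside its own block are assumptions made for illustration.

```python
def block_distance(sigma, tau, boundary, deg_out, n):
    """n^2 * deg_out(v) per disagreeing boundary vertex, plus 1 per other disagreement."""
    total = 0
    for v in sigma:
        if sigma[v] != tau[v]:          # v is in the symmetric difference
            if v in boundary:
                total += n * n * deg_out[v]
            else:
                total += 1
    return total


# Hypothetical example on 4 vertices, with boundary vertices 0 and 1.
sigma = {0: 0, 1: 1, 2: 2, 3: 0}
tau = {0: 1, 1: 1, 2: 0, 3: 0}
boundary = {0, 1}
deg_out = {0: 2, 1: 1}
distance = block_distance(sigma, tau, boundary, deg_out, n=4)
```

Here the colorings disagree at the boundary vertex 0 and the interior vertex 2, so the boundary term dominates the total.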
SLIDE 81
The coupling
[Figure: blocks B0, B1, B2, B3, B4]
SLIDE 99
The coupling of X(B) and Y(B)
- one vertex at a time
- pick a vertex next to a disagreement
- disagreement probability
$\varrho_v = \begin{cases} \dfrac{1}{k - \deg(v)} & \text{if } \deg(v) < k \\ 1 & \text{otherwise} \end{cases}$
- probability of the most likely color
SLIDE 104
Rapid Mixing for k > 2d
Probability of Propagation
$\varrho_v = \begin{cases} \dfrac{1}{k - \deg(v)} & \text{if } v \text{ is low degree} \\ 1 & \text{otherwise} \end{cases}$
Block partition
Distance metric
Bound for k
Path coupling implies rapid mixing for k > 2d.
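A back-of-the-envelope way to see where the threshold 2d comes from. This is a heuristic branching estimate and not the argument of the talk; it assumes a low-degree vertex has degree at most (1 + ε)d and that a disagreement has about d low-degree neighbors to which it can spread.

```latex
\[
\varrho_u \le \frac{1}{k - (1+\epsilon)d}
\quad\Longrightarrow\quad
E[\text{new disagreements per disagreement}]
\approx \frac{d}{k - (1+\epsilon)d} < 1
\iff k > (2 + \epsilon)\, d .
\]
```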
SLIDE 106
Better bounds with in-degrees
Goldberg, Martin, Paterson (2004)
Probability of Propagation
$\varrho_v = \begin{cases} \dfrac{1}{k - \deg(v)} & \text{if } v \text{ is low degree} \\ 1 & \text{otherwise} \end{cases}$
the probability of the most likely color
SLIDE 107
Better bounds with in-degrees
Goldberg, Martin, Paterson (2004)
Probability of Propagation when k > αd
$\varrho_v = \begin{cases} \dfrac{1 - \epsilon}{\deg_{\mathrm{in}}(v)} & \text{if } v \text{ is low degree} \\ 1 & \text{otherwise} \end{cases}$
the probability of the most likely color
SLIDE 110
Better bounds with in-degrees
Goldberg, Martin, Paterson (2004)
Probability of Propagation when k > αd
$\varrho_v = \begin{cases} \dfrac{1 - \epsilon}{\deg(v)} & \text{if } v \text{ is low degree} \\ 1 & \text{otherwise} \end{cases}$
the probability of the most likely color
Obstacle for the above
. . . the coloring at the boundary is “worst case”.
. . . the neighbors outside use too many different colors!
SLIDE 111
Local Uniformity
Theorem (Local Uniformity)
With probability 1 − o(1) over G(n, d/n) the following is true: for all ε, C1, C2 > 0, for all d > d0, for k ≥ (α + ε)d, let I = [C1 N, C2 N]; then for a low degree v ∈ V,
$\Pr\left[ \exists t \in I \text{ s.t. } |\mathrm{Avail}_v(X_t)| \le 1\{U_t(v)\}\, (1 - \epsilon^{2})\, k \exp(-\deg(v)/k) \right] \le \exp\left(-d^{2/3}\right)$.
Rapid Mixing with uniformity
SLIDE 113
Rapid Mixing with uniformity
w∗
G
There is a single disagreement at w∗
SLIDE 114
Rapid Mixing with uniformity
w∗
G
Run the chains for CN steps, “burn-in”
SLIDE 115
Rapid Mixing with uniformity
w∗
G
The disagreements spread in the graph during burn-in
SLIDE 116
Rapid Mixing with uniformity
w∗
G
log d √ d
Typically the disagreements do not escape the ball
SLIDE 117
Rapid Mixing with uniformity
w∗
G
log d √ d disagreement area
Typically the disagreements do not escape the ball
SLIDE 118
Rapid Mixing with uniformity
w∗
G
log d √ d disagreement area
Typically the ball has uniformity.
SLIDE 119
Rapid Mixing with uniformity
w∗
G
log d √ d disagreement area
E [dist(XCN, YCN)| X0, Y0] ≤ (1 − γ)dist(X0, Y0)
SLIDE 121
Block Update with Uniformity
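Probability of Propagation for k > αd
$\varrho_v = \begin{cases} \dfrac{1 - \epsilon}{\deg_{\mathrm{in}}(v)} & \text{if } v \text{ is low degree} \\ 1 & \text{otherwise} \end{cases}$
SLIDE 122
Block Update with Uniformity
Probability of Propagation for k > αd
For $v \in \mathrm{Ball}(w^{*}, (\log d)^{2})$:
$\varrho_v = \begin{cases} \dfrac{1 - \epsilon}{\deg(v)} & \text{if } v \text{ is low degree} \\ 1 & \text{otherwise} \end{cases}$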
SLIDE 132
Concluding Remarks
- Glauber dynamics for sampling k-colorings of G(n, d/n)
- Mixing time $O\left(n^{2 + \frac{1}{\log d}}\right)$ for k ≥ (α + ε)d
- α = 1.7632 . . . and 1/α is the solution to $z e^{z} = 1$
- improved the factor (11/2)
- Block dynamics and comparison
- Improvement on the exponent of the mixing time
- We argue on the statistical properties of colorings
- We get improved bounds for the hard-core model
- rapid mixing for λ < 1/d
- previous bound was λ < 1/(2d) [Efthymiou (2014)]
SLIDE 133