SLIDE 1 Advanced Algorithms (XIII)
Shanghai Jiao Tong University
Chihao Zhang
June 1, 2020
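SLIDE 7 Total Variation Distance
Let $\mu$ and $\nu$ be two distributions on $\Omega$.
Their total variation distance is
$$d_{TV}(\mu, \nu) = \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \nu(x)| = \max_{A \subseteq \Omega} |\mu(A) - \nu(A)|$$
[Figure: $\mu$ and $\nu$ plotted together with a set $A$ marked; the distance is $\frac{1}{2}$ of the $\ell_1$ distance.]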
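The two expressions for the total variation distance can be compared numerically. The following is a minimal sketch, assuming distributions are stored as dictionaries from outcomes to exact fractions; the function names and the two-point example are illustrative choices, not part of the slides.

```python
from fractions import Fraction
from itertools import combinations

def tv_half_l1(mu, nu):
    """d_TV as half the l1 distance between two distributions (dicts)."""
    support = set(mu) | set(nu)
    return sum(abs(mu.get(x, 0) - nu.get(x, 0)) for x in support) / 2

def tv_max_event(mu, nu):
    """d_TV as the maximum of |mu(A) - nu(A)| over all events A (brute force)."""
    support = sorted(set(mu) | set(nu))
    best = Fraction(0)
    for r in range(len(support) + 1):
        for A in combinations(support, r):
            gap = abs(sum(mu.get(x, 0) for x in A) - sum(nu.get(x, 0) for x in A))
            best = max(best, gap)
    return best

# Illustrative two-point distributions (an assumption for this example).
mu = {1: Fraction(1, 2), 2: Fraction(1, 2)}
nu = {1: Fraction(1, 3), 2: Fraction(2, 3)}
print(tv_half_l1(mu, nu), tv_max_event(mu, nu))
```

The brute-force maximum enumerates all $2^{|\Omega|}$ events, so it is only meant for tiny state spaces.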
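SLIDE 12 Coupling
Let $\mu$ and $\nu$ be two distributions on $\Omega$.
A coupling of $\mu$ and $\nu$ is a joint distribution $\omega$ on $\Omega \times \Omega$ such that:
$$\forall x \in \Omega,\ \mu(x) = \sum_{y \in \Omega} \omega(x, y)$$
$$\forall y \in \Omega,\ \nu(y) = \sum_{x \in \Omega} \omega(x, y)$$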
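The simplest coupling is the independent one, $\omega(x, y) = \mu(x)\nu(y)$. The sketch below checks the two marginal conditions for it on an illustrative two-point example; the helper names `product_coupling` and `is_coupling` are hypothetical.

```python
from fractions import Fraction

def product_coupling(mu, nu):
    """Independent coupling: omega(x, y) = mu(x) * nu(y)."""
    return {(x, y): mu[x] * nu[y] for x in mu for y in nu}

def is_coupling(omega, mu, nu):
    """Check both marginal conditions from the definition."""
    rows_ok = all(sum(omega[(x, y)] for y in nu) == mu[x] for x in mu)
    cols_ok = all(sum(omega[(x, y)] for x in mu) == nu[y] for y in nu)
    return rows_ok and cols_ok

# Illustrative distributions (an assumption for this example).
mu = {1: Fraction(1, 2), 2: Fraction(1, 2)}
nu = {1: Fraction(1, 3), 2: Fraction(2, 3)}
omega = product_coupling(mu, nu)
# Probability that the two coordinates differ under this coupling.
mismatch = sum(p for (x, y), p in omega.items() if x != y)
print(is_coupling(omega, mu, nu), mismatch)
```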
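SLIDE 18 Coupling Lemma
Let $\omega$ be a coupling of $\mu$ and $\nu$, so that $(X, Y) \sim \omega \implies X \sim \mu$ and $Y \sim \nu$.
Then
$$\Pr_{(X,Y) \sim \omega}[X \ne Y] \ge d_{TV}(\mu, \nu)$$
Moreover, there exists a coupling $\omega^*$ such that
$$\Pr_{(X,Y) \sim \omega^*}[X \ne Y] = d_{TV}(\mu, \nu)$$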
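One way to see the inequality is the following short derivation, a sketch that uses only the marginal conditions $X \sim \mu$ and $Y \sim \nu$. For any event $A \subseteq \Omega$:

```latex
\begin{align*}
\mu(A) - \nu(A) &= \Pr[X \in A] - \Pr[Y \in A] \\
&\le \Pr[X \in A \wedge Y \notin A] \\
&\le \Pr[X \ne Y].
\end{align*}
```

Taking the maximum over $A$ on the left-hand side gives $d_{TV}(\mu, \nu) \le \Pr[X \ne Y]$.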
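SLIDE 27 Proof of Coupling Lemma
For finite $\Omega$, designing a coupling is equivalent to filling an $\Omega \times \Omega$ matrix so that the marginals are correct
$\Omega = \{1, 2\}$, $\mu = (1/2, 1/2)$, $\nu = (1/3, 2/3)$
$$\begin{array}{c|cc|c} & y = 1 & y = 2 & \mu \\ \hline x = 1 & 1/3 & 1/6 & 1/2 \\ x = 2 & 0 & 1/2 & 1/2 \\ \hline \nu & 1/3 & 2/3 & \end{array}$$
$\omega^*$ is the one maximizing the sum of diagonals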
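The matrix-filling idea can be sketched in code: put $\min(\mu(x), \nu(x))$ on the diagonal, then spread the leftover row and column mass off the diagonal. The proportional spreading rule and the name `optimal_coupling` are assumptions of this sketch, and it assumes both distributions are given on the same finite set of states.

```python
from fractions import Fraction

def optimal_coupling(mu, nu):
    """Build a coupling with Pr[X != Y] = d_TV(mu, nu).

    Diagonal entries get min(mu(x), nu(x)); the leftover mass is spread
    off the diagonal in proportion to the row and column residuals.
    """
    states = sorted(mu)
    omega = {(x, y): Fraction(0) for x in states for y in states}
    for x in states:
        omega[(x, x)] = min(mu[x], nu[x])
    # Residual mass still to be placed in each row / column.
    row_left = {x: mu[x] - omega[(x, x)] for x in states}
    col_left = {y: nu[y] - omega[(y, y)] for y in states}
    d_tv = sum(row_left.values())  # equals sum(col_left.values())
    if d_tv > 0:
        for x in states:
            for y in states:
                # For x == y one of the two residuals is 0, so the
                # diagonal is left unchanged.
                omega[(x, y)] += row_left[x] * col_left[y] / d_tv
    return omega

mu = {1: Fraction(1, 2), 2: Fraction(1, 2)}
nu = {1: Fraction(1, 3), 2: Fraction(2, 3)}
omega = optimal_coupling(mu, nu)
off_diagonal = sum(p for (x, y), p in omega.items() if x != y)
print(omega, off_diagonal)
```

On the example $\mu = (1/2, 1/2)$, $\nu = (1/3, 2/3)$ the off-diagonal mass is $1/6$, which matches $d_{TV}(\mu, \nu)$.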
SLIDE 31 Coupling of Markov Chains
Consider two copies of the chain $P$:
- The initial distributions are $\mu_0$ and $\nu_0$
- $\mu_t^T = \mu_0^T P^t$ and $\nu_t^T = \nu_0^T P^t$
A coupling of the two chains is a joint distribution $\omega$ of $\{\mu_t\}_{t \ge 0}$ and $\{\nu_t\}_{t \ge 0}$ satisfying the following conditions
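SLIDE 38
$\{(X_t, Y_t)\}_{t \ge 0} \sim \omega$ is a pair of processes such that
$$\forall a, b \in \Omega,\ \Pr[X_{t+1} = b \mid X_t = a] = P(a, b)$$
$$\forall a, b \in \Omega,\ \Pr[Y_{t+1} = b \mid Y_t = a] = P(a, b)$$
Marginally, $\{X_t\}$ and $\{Y_t\}$ are both copies of the chain $P$
$$\forall t \ge 0,\ X_t = Y_t \implies X_{t'} = Y_{t'} \text{ for all } t' > t$$
The two chains coalesce once they meet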
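One concrete coupling that satisfies these conditions moves the two copies independently until they meet and lets them share every move afterwards. The sketch below simulates it; the 3-state transition matrix and the function names are illustrative assumptions, not taken from the slides.

```python
import random

def step(P, a, rng):
    """Sample the next state from row a of the transition matrix P."""
    return rng.choices(range(len(P)), weights=P[a])[0]

def coupled_run(P, x0, y0, steps, rng):
    """Run two copies of P independently until they meet, then together."""
    x, y = x0, y0
    path = [(x, y)]
    for _ in range(steps):
        if x == y:
            x = y = step(P, x, rng)  # coalesced: share one move
        else:
            x, y = step(P, x, rng), step(P, y, rng)  # independent moves
        path.append((x, y))
    return path

# Lazy walk on a 3-cycle (an illustrative chain).
P = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]
path = coupled_run(P, 0, 2, 50, random.Random(0))
print(path[:10])
```

Each coordinate of the pair, viewed alone, still moves according to $P$, because in both branches its next state is drawn from the row of $P$ indexed by its current state.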
SLIDE 43 Fundamental Theorem via Coupling
If a finite chain $P$ is irreducible and aperiodic, then it has a unique stationary distribution $\pi$. Moreover, for any initial distribution $\mu$, it holds that
$$\lim_{t \to \infty} \mu^T P^t = \pi^T$$
Consider two chains $\{X_t\}_{t \ge 0}$ and $\{Y_t\}_{t \ge 0}$ with $X_0 \sim \pi$ and $Y_0 \sim \mu_0$ for arbitrary $\mu_0$
$X_t$ and $Y_t$ run independently
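SLIDE 52
irreducible + aperiodic $\implies \exists t,\ \forall x, y,\ P^t(x, y) > 0$
Then for any $z \in \Omega$, there exists some $\theta > 0$ s.t.
$$\Pr[X_t = Y_t] \ge \Pr[X_t = Y_t = z] = \Pr[X_t = z] \cdot \Pr[Y_t = z] = \pi(z) \cdot P^t(Y_0, z) \ge \theta > 0$$
$$\Pr[X_t \ne Y_t] \le 1 - \theta < 1$$
$$\Pr[X_{2t} \ne Y_{2t}] = \Pr[X_{2t} \ne Y_{2t} \wedge X_t = Y_t] + \Pr[X_{2t} \ne Y_{2t} \wedge X_t \ne Y_t] = \Pr[X_{2t} \ne Y_{2t} \mid X_t \ne Y_t] \cdot \Pr[X_t \ne Y_t] \le (1 - \theta)^2$$
(The first term is zero because the chains coalesce once they meet.)
…
$$\Pr[X_{kt} \ne Y_{kt}] \le (1 - \theta)^k$$
$$\lim_{n \to \infty} \Pr[X_n \ne Y_n] = 0$$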
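The first step of the argument, that some power $P^t$ has all entries positive, can be checked directly on a small chain. This is a minimal sketch with exact arithmetic; the 3-state matrix is an illustrative irreducible and aperiodic chain chosen for this example, and the helper names are hypothetical.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def first_positive_power(P, max_t=100):
    """Return (t, P^t) for the smallest t >= 1 with all entries of P^t positive."""
    Pt = P
    for t in range(1, max_t + 1):
        if all(v > 0 for row in Pt for v in row):
            return t, Pt
        Pt = mat_mul(Pt, P)
    raise ValueError("no positive power found up to max_t")

half = Fraction(1, 2)
# Cycles 0 -> 1 -> 2 -> 0 (length 3) and 1 -> 2 -> 1 (length 2): aperiodic.
P = [[0, 1, 0],
     [0, 0, 1],
     [half, half, 0]]
t, Pt = first_positive_power(P)
# Smallest entry of P^t: a lower bound on Pr[Y_t = z] for every start and every z.
min_entry = min(v for row in Pt for v in row)
print(t, min_entry)
```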
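SLIDE 56 Mixing Time
The mixing time $\tau_{mix}(\varepsilon)$ is the first time $t$ such that the total variation distance between $X_t$ and $\pi$ is at most $\varepsilon$, for any initial $X_0$
$$\tau_{mix}(\varepsilon) = \max_{\mu_0} \min\{t \ge 0 : d_{TV}(\mu_0^T P^t, \pi) \le \varepsilon\}$$
$$\tau_{mix} = \tau_{mix}(1/4)$$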
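For a very small chain the mixing time can be computed by brute force. The sketch below assumes the stationary distribution is already known and only tries point-mass starts, which suffices because the distance from a mixture of starts is at most the worst distance from a single start. The two-state chain is an illustrative choice.

```python
from fractions import Fraction

def tv(p, q):
    """Total variation distance between two probability vectors."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2

def step_dist(p, P):
    """One step of the chain: row vector p times matrix P."""
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

def mixing_time(P, pi, eps, max_t=10_000):
    """Smallest t with d_TV(P^t(x, .), pi) <= eps for every start x."""
    n = len(P)
    dists = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for t in range(max_t + 1):
        if all(tv(d, pi) <= eps for d in dists):
            return t
        dists = [step_dist(d, P) for d in dists]
    raise ValueError("not mixed within max_t steps")

# Illustrative symmetric two-state chain with uniform stationary distribution.
P = [[Fraction(3, 4), Fraction(1, 4)],
     [Fraction(1, 4), Fraction(3, 4)]]
pi = [Fraction(1, 2), Fraction(1, 2)]
print(mixing_time(P, pi, Fraction(1, 4)), mixing_time(P, pi, Fraction(1, 16)))
```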
SLIDE 60 Random Walk on Hyper Cube
$x \sim y$ iff $\|x - y\|_1 = 1$
Standing at $x \in \{0,1\}^n$:
- with prob. $\frac{1}{2}$, do nothing
- otherwise, choose $i \in [n]$ u.a.r. and flip $x(i)$
Lazy walk on $G$
SLIDE 65
The chain is equivalent to:
- choose $i \in [n]$ and $b \in \{0,1\}$ u.a.r., and set $x(i) \leftarrow b$
Let $X_t$ and $Y_t$ be two walks
We couple them by choosing the same $i$ and $b$
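This coupling is easy to simulate: both walks receive the same coordinate $i$ and the same bit $b$ at every step, so they agree on every coordinate that has been chosen at least once. A minimal sketch follows; the function name and the starting points are illustrative assumptions.

```python
import random

def coupled_hypercube_walk(x, y, rng):
    """Run the coupling until X = Y; return (steps, final x, final y).

    Each step picks one coordinate i and one bit b uniformly at random
    and applies x[i] <- b, y[i] <- b to both walks.
    """
    x, y = list(x), list(y)
    n = len(x)
    steps = 0
    while x != y:
        i = rng.randrange(n)
        b = rng.randrange(2)
        x[i] = b
        y[i] = b
        steps += 1
    return steps, x, y

# Start from two antipodal corners of the 8-dimensional cube.
steps, x, y = coupled_hypercube_walk([0] * 8, [1] * 8, random.Random(1))
print(steps, x == y)
```

Starting from antipodal corners, the walks meet exactly when every coordinate has been chosen at least once.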
SLIDE 71
What is the probability that $X_t \ne Y_t$?
Coupon Collector! If $t \ge n \log n + cn$, then $\Pr[X_t \ne Y_t] \le e^{-c}$
Coupling lemma implies that
$$\tau_{mix}(\varepsilon) \le n \log n + n \log \varepsilon^{-1}$$
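The coupon-collector bound can be spelled out with a union bound over coordinates. This is a sketch of the standard calculation, assuming $Y_0 \sim \pi$ so that $Y_t \sim \pi$ for all $t$, and writing $\mu_t$ for the distribution of $X_t$:

```latex
\begin{align*}
\Pr[X_t \ne Y_t]
  &\le \Pr[\exists i \in [n] \text{ not chosen in the first } t \text{ steps}] \\
  &\le n \left(1 - \frac{1}{n}\right)^t
   \le n e^{-t/n}
   \le n e^{-\log n - c} = e^{-c}
  && \text{for } t \ge n \log n + cn, \\
d_{TV}(\mu_t, \pi)
  &\le \Pr[X_t \ne Y_t] \le \varepsilon
  && \text{for } c = \log \varepsilon^{-1}.
\end{align*}
```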
SLIDE 74 Another Random Walk
Standing at $x \in \{0,1\}^n$:
- with prob. $\frac{1}{n+1}$, do nothing
- otherwise, choose $i \in [n]$ u.a.r. and flip $x(i)$
Lazy walk on $G$
A coupling argument implies
$$\tau_{mix} \le \frac{1}{2} n \log n + O(n)$$
SLIDE 79
Reversible Chain
Recall that we say a Markov chain $P$ is reversible with respect to $\pi$ if
$$\forall x, y \in \Omega,\ \pi(x) P(x, y) = \pi(y) P(y, x)$$
Then $\pi$ is a stationary distribution of $P$
We showed that spectral decomposition is a powerful tool to analyze reversible chains
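The step from reversibility to stationarity is a one-line calculation, written out here for completeness. For every $y \in \Omega$:

```latex
\begin{align*}
(\pi^T P)(y) &= \sum_{x \in \Omega} \pi(x) P(x, y) \\
             &= \sum_{x \in \Omega} \pi(y) P(y, x) \\
             &= \pi(y) \sum_{x \in \Omega} P(y, x) = \pi(y).
\end{align*}
```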
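SLIDE 86 Relaxation Time
$$\lambda_\star := \max_{1 \le i \le n-1} |\lambda_i|$$
$$\tau_{rel} := \frac{1}{1 - \lambda_\star}$$
For reversible, irreducible, aperiodic chains:
$$(\tau_{rel} - 1) \log\left(\frac{1}{2\varepsilon}\right) \le \tau_{mix}(\varepsilon) \le \tau_{rel} \log\left(\frac{1}{\varepsilon \pi_{min}}\right)$$
$$\pi_{min} := \min_x \pi(x)$$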
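The two bounds can be sanity-checked on a two-state chain, where everything is available in closed form. The sketch below assumes the chain $P = \begin{pmatrix} 1-p & p \\ q & 1-q \end{pmatrix}$ with $p, q > 0$ and $p + q < 2$ (an illustrative example, not from the slides); its only non-trivial eigenvalue is $1 - p - q$, and from a point mass at $x$ the distance to $\pi$ after $t$ steps is $(1 - \pi(x)) |1 - p - q|^t$.

```python
import math

def two_state_bounds(p, q, eps):
    """Return (lower bound, t_mix(eps), upper bound) for the two-state chain
    P = [[1-p, p], [q, 1-q]], assuming p, q > 0 and p + q < 2."""
    lam = 1 - p - q                      # the non-trivial eigenvalue of P
    t_rel = 1 / (1 - abs(lam))           # relaxation time
    pi = (q / (p + q), p / (p + q))      # stationary distribution
    # From a point mass at x, d_TV(P^t(x, .), pi) = (1 - pi(x)) * |lam|^t.
    worst = max(1 - pi[0], 1 - pi[1])
    t_mix = 0
    while worst * abs(lam) ** t_mix > eps:
        t_mix += 1
    lower = (t_rel - 1) * math.log(1 / (2 * eps))
    upper = t_rel * math.log(1 / (eps * min(pi)))
    return lower, t_mix, upper

lower, t_mix, upper = two_state_bounds(0.25, 0.25, 0.25)
print(lower, t_mix, upper)
```

For $p = q = 1/4$ and $\varepsilon = 1/4$ this gives $\tau_{rel} = 2$ and $\tau_{mix}(\varepsilon) = 1$, which sits between $\log 2$ and $2 \log 8$.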