SLIDE 1

Advanced Algorithms (XIII)

Shanghai Jiao Tong University

Chihao Zhang

June 1, 2020

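SLIDE 7

Total Variation Distance

Let $\mu$ and $\nu$ be two distributions on $\Omega$. Their total variation distance is

$$d_{TV}(\mu,\nu) = \frac{1}{2}\sum_{x\in\Omega}\left|\mu(x)-\nu(x)\right| = \max_{A\subseteq\Omega}\left|\mu(A)-\nu(A)\right|$$

[Figure: the distributions $\mu$ and $\nu$, with an event $A$ marked]

That is, $d_{TV}$ is the $\ell_1$ distance scaled by $\frac{1}{2}$.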
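To make the two expressions for $d_{TV}$ concrete, here is a small sketch that computes both: half the $\ell_1$ distance, and a brute-force maximum of $|\mu(A)-\nu(A)|$ over all events $A$. Distributions are assumed to be Python dicts from states to exact `Fraction` probabilities; the function names are ours, and the brute force is only meant for tiny $\Omega$.

```python
from fractions import Fraction
from itertools import chain, combinations

def tv_half_l1(mu, nu):
    """d_TV as half the l1 distance; mu, nu map states to probabilities."""
    support = set(mu) | set(nu)
    return sum(abs(mu.get(x, 0) - nu.get(x, 0)) for x in support) / 2

def tv_max_event(mu, nu):
    """d_TV as max over events A of |mu(A) - nu(A)| (enumerates all 2^|Omega| events)."""
    support = sorted(set(mu) | set(nu))
    events = chain.from_iterable(combinations(support, k) for k in range(len(support) + 1))
    return max(abs(sum(mu.get(x, 0) - nu.get(x, 0) for x in A)) for A in events)

mu = {1: Fraction(1, 2), 2: Fraction(1, 2)}
nu = {1: Fraction(1, 3), 2: Fraction(2, 3)}
print(tv_half_l1(mu, nu), tv_max_event(mu, nu))  # 1/6 1/6
```

The maximizing event is $A=\{x : \mu(x)>\nu(x)\}$, here $A=\{1\}$.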

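SLIDE 12

Coupling

Let $\mu$ and $\nu$ be two distributions on $\Omega$. A coupling of $\mu$ and $\nu$ is a joint distribution $\omega$ on $\Omega\times\Omega$ such that:

$$\forall x\in\Omega,\quad \mu(x) = \sum_{y\in\Omega}\omega(x,y)$$

$$\forall y\in\Omega,\quad \nu(y) = \sum_{x\in\Omega}\omega(x,y)$$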
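The two marginal conditions are easy to check mechanically. The sketch below (hypothetical helper names, exact `Fraction` arithmetic, couplings stored as dicts keyed by pairs $(x,y)$) verifies them and builds the simplest coupling of all, the independent one $\omega(x,y)=\mu(x)\nu(y)$.

```python
from fractions import Fraction

def is_coupling(omega, mu, nu):
    """Check that omega[(x, y)] is nonnegative with first marginal mu and second marginal nu."""
    states = set(mu) | set(nu)
    rows_ok = all(sum(omega.get((x, y), 0) for y in states) == mu.get(x, 0) for x in states)
    cols_ok = all(sum(omega.get((x, y), 0) for x in states) == nu.get(y, 0) for y in states)
    nonneg = all(p >= 0 for p in omega.values())
    return rows_ok and cols_ok and nonneg

def independent_coupling(mu, nu):
    """The product coupling omega(x, y) = mu(x) * nu(y); always a valid coupling."""
    return {(x, y): px * py for x, px in mu.items() for y, py in nu.items()}

mu = {1: Fraction(1, 2), 2: Fraction(1, 2)}
nu = {1: Fraction(1, 3), 2: Fraction(2, 3)}
omega = independent_coupling(mu, nu)
print(is_coupling(omega, mu, nu))  # True
```

Note the order of the marginals matters: `independent_coupling(nu, mu)` is a coupling of $\nu$ and $\mu$, not of $\mu$ and $\nu$.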

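SLIDE 18

Coupling Lemma

Let $\omega$ be a coupling of $\mu$ and $\nu$, and let $(X,Y)\sim\omega$, so that $X\sim\mu$ and $Y\sim\nu$. Then

$$\Pr_{(X,Y)\sim\omega}[X\neq Y] \ge d_{TV}(\mu,\nu).$$

Moreover, there exists a coupling $\omega^*$ such that

$$\Pr_{(X,Y)\sim\omega^*}[X\neq Y] = d_{TV}(\mu,\nu).$$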
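The "moreover" part is constructive. A minimal sketch, assuming finite $\Omega$ and exact `Fraction` inputs: put $\min(\mu(x),\nu(x))$ on the diagonal, and spread the excess mass $(\mu(x)-\nu(x))^+$ over the deficits $(\nu(y)-\mu(y))^+$ off the diagonal, normalized by $d=\sum_x(\mu(x)-\nu(x))^+=d_{TV}(\mu,\nu)$. This is one standard construction (often called a maximal coupling); the function names are ours, not from the slides.

```python
from fractions import Fraction

def optimal_coupling(mu, nu):
    """Build omega* with Pr[X != Y] = d_TV(mu, nu); returns (omega, d_TV)."""
    states = sorted(set(mu) | set(nu))
    m = {x: mu.get(x, 0) for x in states}
    n = {x: nu.get(x, 0) for x in states}
    d = sum(max(m[x] - n[x], 0) for x in states)  # equals d_TV(mu, nu)
    omega = {}
    for x in states:
        for y in states:
            # diagonal: the overlap of mu and nu at x
            p = min(m[x], n[x]) if x == y else 0
            # off-diagonal: excess of mu at x matched with deficit of mu at y
            if d > 0:
                p += max(m[x] - n[x], 0) * max(n[y] - m[y], 0) / d
            omega[(x, y)] = p
    return omega, d

def prob_differ(omega):
    """Pr[X != Y] for (X, Y) ~ omega."""
    return sum(p for (x, y), p in omega.items() if x != y)

mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
nu = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
omega, d = optimal_coupling(mu, nu)
print(d, prob_differ(omega))  # 1/4 1/4
```

Row $x$ sums to $\mu(x)$ because, when $\mu(x)>\nu(x)$, the off-diagonal terms add up to exactly $\mu(x)-\nu(x)$; columns are symmetric.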

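SLIDE 27

Proof of Coupling Lemma

For finite $\Omega$, designing a coupling is equivalent to filling an $\Omega\times\Omega$ matrix (rows indexed by $X$, columns by $Y$) so that the marginals, i.e. the row sums and the column sums, are correct.

Example: $\Omega=\{1,2\}$, $\mu=(1/2,1/2)$, $\nu=(1/3,2/3)$. With row sums $\mu$ and column sums $\nu$:

$$\omega^* = \begin{pmatrix} 1/3 & 1/6 \\ 0 & 1/2 \end{pmatrix}$$

$\omega^*$ is the one maximizing the sum of the diagonal entries, $\sum_x\omega(x,x)=\Pr[X=Y]$. Every coupling satisfies $\omega(x,x)\le\min(\mu(x),\nu(x))$, so $\Pr[X=Y]\le\sum_x\min(\mu(x),\nu(x)) = 1-d_{TV}(\mu,\nu)$. Here the diagonal sums to $5/6$, so $\Pr_{\omega^*}[X\neq Y]=1/6=d_{TV}(\mu,\nu)$.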
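For this $2\times2$ example, the marginal constraints leave exactly one free entry. A small sketch: fix $a=\omega(1,1)$; the other three entries are then forced, all entries are nonnegative exactly when $0\le a\le 1/3$, and $\Pr[X\neq Y]=5/6-2a$. Scanning a grid of rational values of $a$ (an arbitrary choice for illustration) recovers $\omega^*$ and shows that every other coupling, including the independent one, does worse.

```python
from fractions import Fraction as F

mu = (F(1, 2), F(1, 2))  # row marginal (law of X)
nu = (F(1, 3), F(2, 3))  # column marginal (law of Y)

def coupling_from_a(a):
    """The coupling of mu and nu on {1,2} whose (1,1) entry is a; the marginals fix the rest."""
    return [[a, mu[0] - a],
            [nu[0] - a, nu[1] - mu[0] + a]]

def is_valid(w):
    return all(p >= 0 for row in w for p in row)

# Scan a over 0, 1/60, ..., 20/60 = 1/3, the range where every entry is nonnegative.
results = []
for k in range(21):
    a = F(k, 60)
    w = coupling_from_a(a)
    assert is_valid(w)
    results.append((w[0][1] + w[1][0], a))  # Pr[X != Y] is the off-diagonal mass

best_pr, best_a = min(results)
print(best_a, best_pr)  # 1/3 1/6
```

The independent coupling corresponds to $a=\mu(1)\nu(1)=1/6$, giving $\Pr[X\neq Y]=1/2$.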

SLIDE 31

Coupling of Markov Chains

Consider two copies of the chain $P$. The initial distributions are $\mu_0$ and $\nu_0$, and $\mu_t^T=\mu_0^T P^t$ and $\nu_t^T=\nu_0^T P^t$.

A coupling of the two chains is a joint distribution $\omega$ of $\{\mu_t\}_{t\ge0}$ and $\{\nu_t\}_{t\ge0}$ satisfying the following conditions:

SLIDE 38

$\{(X_t,Y_t)\}_{t\ge0}\sim\omega$ is a pair of processes such that:

$$\forall a,b\in\Omega,\quad \Pr[X_{t+1}=b\mid X_t=a]=P(a,b)$$

$$\forall a,b\in\Omega,\quad \Pr[Y_{t+1}=b\mid Y_t=a]=P(a,b)$$

Marginally, $\{X_t\}$ and $\{Y_t\}$ are both the chain $P$.

$$\forall t\ge0,\quad X_t=Y_t\implies X_{t'}=Y_{t'}\ \text{for all}\ t'>t$$

The two chains coalesce once they meet.
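A simulation sketch of one such coupling, assuming the chain is given as a list of rows `P[x]` (a hypothetical representation): the two copies move independently while they differ and make the same move once they have met. Each copy on its own still moves according to $P$, and the coalescence condition holds by construction.

```python
import random

def step(P, x, rng):
    """One move of the chain P (list of rows) from state x."""
    return rng.choices(range(len(P)), weights=P[x])[0]

def coupled_run(P, x0, y0, T, rng):
    """Run two copies of P for T steps: independent moves until they meet, common moves afterwards."""
    xs, ys = [x0], [y0]
    for _ in range(T):
        x, y = xs[-1], ys[-1]
        if x == y:
            nxt = step(P, x, rng)  # coalesced: both copies take the same move
            xs.append(nxt)
            ys.append(nxt)
        else:
            xs.append(step(P, x, rng))  # still apart: independent moves
            ys.append(step(P, y, rng))
    return xs, ys

# Example chain (an assumption): lazy random walk on a triangle.
P = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
T = 200
xs, ys = coupled_run(P, 0, 2, T, random.Random(0))
first_meet = next(t for t in range(T + 1) if xs[t] == ys[t])
print(first_meet)
```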

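SLIDE 43

Fundamental Theorem via Coupling

If a finite chain $P$ is irreducible and aperiodic, then it has a unique stationary distribution $\pi$. Moreover, for any initial distribution $\mu$, it holds that

$$\lim_{t\to\infty}\mu^T P^t=\pi^T.$$

Consider two chains $\{X_t\}_{t\ge0}$ and $\{Y_t\}_{t\ge0}$ with $X_0\sim\pi$ and $Y_0\sim\mu_0$, for arbitrary $\mu_0$.

Use a coupling where $X_t$ and $Y_t$ run independently until they meet (after which they move together, as a coupling of chains requires).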

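SLIDE 52

Irreducible + aperiodic $\implies\exists t,\ \forall x,y,\ P^t(x,y)>0$.

Then for any $z\in\Omega$, there exists some $\theta>0$ (e.g. $\theta:=\min_{x,y}P^t(x,z)P^t(y,z)$, which does not depend on the starting states) s.t.

$$\Pr[X_t=Y_t]\ge\Pr[X_t=Y_t=z]=\Pr[X_t=z]\cdot\Pr[Y_t=z]=\pi(z)\cdot P^t(Y_0,z)\ge\theta>0,$$

so $\Pr[X_t\neq Y_t]\le1-\theta<1$.

The first term below vanishes because the chains coalesce once they meet, and by the Markov property (restarting the argument at time $t$) $\Pr[X_{2t}\neq Y_{2t}\mid X_t\neq Y_t]\le1-\theta$:

$$\Pr[X_{2t}\neq Y_{2t}]=\Pr[X_{2t}\neq Y_{2t}\wedge X_t=Y_t]+\Pr[X_{2t}\neq Y_{2t}\wedge X_t\neq Y_t]=\Pr[X_{2t}\neq Y_{2t}\mid X_t\neq Y_t]\cdot\Pr[X_t\neq Y_t]\le(1-\theta)^2$$

…

$$\Pr[X_{kt}\neq Y_{kt}]\le(1-\theta)^k$$

Since $\Pr[X_n\neq Y_n]$ is non-increasing in $n$ (again by coalescence),

$$\lim_{n\to\infty}\Pr[X_n\neq Y_n]=0.$$

As $X_n\sim\pi$ for every $n$, the coupling lemma gives $d_{TV}(\mu_0^T P^n,\pi)\le\Pr[X_n\neq Y_n]\to0$.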
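The argument can be followed exactly (no sampling) on a small chain by propagating the joint law of $(X_t,Y_t)$ under the "independent until they meet" coupling. This sketch assumes a lazy walk on a triangle as the example chain (symmetric, so $\pi$ is uniform), $X_0\sim\pi$ and $Y_0=0$; it records $d_{TV}(\mu_0^T P^t,\pi)$ next to $\Pr[X_t\neq Y_t]$, and the coupling lemma says the first never exceeds the second.

```python
def tv(p, q):
    """Total variation distance between two distributions given as lists."""
    return sum(abs(a - b) for a, b in zip(p, q)) / 2

def coupled_law_step(P, joint):
    """Advance the law of (X_t, Y_t) one step: independent moves while apart, a common move once met."""
    n = len(P)
    new = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            w = joint[x][y]
            if w == 0.0:
                continue
            for a in range(n):
                if x == y:
                    new[a][a] += w * P[x][a]  # coalesced: move together
                else:
                    for b in range(n):
                        new[a][b] += w * P[x][a] * P[y][b]  # independent moves
    return new

# Example chain (an assumption): lazy walk on a triangle; symmetric, so pi is uniform.
P = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
pi = [1 / 3] * 3
y0 = 0
joint = [[pi[x] if y == y0 else 0.0 for y in range(3)] for x in range(3)]  # X_0 ~ pi, Y_0 = y0

history = []
for t in range(1, 9):
    joint = coupled_law_step(P, joint)
    p_differ = sum(joint[x][y] for x in range(3) for y in range(3) if x != y)
    law_y = [sum(joint[x][y] for x in range(3)) for y in range(3)]  # equals mu_0^T P^t
    history.append((t, tv(law_y, pi), p_differ))
    print(t, round(tv(law_y, pi), 6), round(p_differ, 6))
```

For this particular chain, two walkers at different states meet in the next step with probability $5/16$ regardless of where they are, so $\Pr[X_t\neq Y_t]=\frac{2}{3}(11/16)^t$ decays geometrically, as in the $(1-\theta)^k$ bound.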

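SLIDE 56

Mixing Time

The mixing time $\tau_{mix}(\varepsilon)$ is the first time $t$ such that the total variation distance between $X_t$ and $\pi$ is at most $\varepsilon$, for any initial $X_0$:

$$\tau_{mix}(\varepsilon)=\max_{\mu_0}\min\left\{t\ge0: d_{TV}(\mu_0^T P^t,\pi)\le\varepsilon\right\}$$

$$\tau_{mix}=\tau_{mix}(1/4)$$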
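For a small chain, $\tau_{mix}(\varepsilon)$ can be computed by brute force. This sketch assumes $P$ is a list of rows and $\pi$ is known, and only tries point-mass starts: by convexity of $d_{TV}$, $d_{TV}(\mu_0^T P^t,\pi)\le\max_x d_{TV}(P^t(x,\cdot),\pi)$, and the latter is non-increasing in $t$, so the worst case is a point mass. The `t_max` cutoff is an arbitrary safety limit.

```python
def mat_vec(row, P):
    """Row vector times matrix: row^T P."""
    n = len(P)
    return [sum(row[i] * P[i][j] for i in range(n)) for j in range(n)]

def tv(p, q):
    return sum(abs(a - b) for a, b in zip(p, q)) / 2

def mixing_time(P, pi, eps, t_max=10_000):
    """First t with max_x d_TV(P^t(x, .), pi) <= eps."""
    n = len(P)
    rows = [[1.0 if j == x else 0.0 for j in range(n)] for x in range(n)]  # P^0(x, .)
    for t in range(t_max + 1):
        if max(tv(r, pi) for r in rows) <= eps:
            return t
        rows = [mat_vec(r, P) for r in rows]  # P^{t+1}(x, .)
    raise ValueError("did not mix within t_max steps")

# Example chain (an assumption): lazy walk on a triangle, uniform pi.
P = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25], [0.25, 0.25, 0.5]]
pi = [1 / 3] * 3
print(mixing_time(P, pi, 0.25))  # 1
```

For this chain $\max_x d_{TV}(P^t(x,\cdot),\pi)=\frac{2}{3}4^{-t}$, which already drops below $1/4$ at $t=1$.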

SLIDE 60

Random Walk on Hypercube

$V=\{0,1\}^n$, and $x\sim y$ iff $\|x-y\|_1=1$.

Standing at $x\in\{0,1\}^n$: with prob. $\frac{1}{2}$, do nothing; otherwise, choose $i\in[n]$ u.a.r. and flip $x(i)$.

Lazy walk on $G$

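SLIDE 65

The chain is equivalent to: choose $i\in[n]$ and $b\in\{0,1\}$ u.a.r., and set $x(i)\leftarrow b$. (With prob. $\frac{1}{2}$, $b$ equals the current $x(i)$ and nothing changes.)

Let $X_t$ and $Y_t$ be two walks. We couple them by choosing the same $i$ and $b$.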
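A simulation sketch of this coupling: both walks use the same $i$ and $b$, so coordinate $i$ agrees forever after it is first chosen. Starting from $0^n$ and $1^n$, the walks meet exactly when every coordinate has been chosen, which is the coupon collector's problem with expectation $nH_n$. The seed, $n=10$ and the number of trials are arbitrary choices for illustration.

```python
import random

def coupled_step(x, y, n, rng):
    """One step of the coupling: both walks set the same coordinate i to the same bit b."""
    i = rng.randrange(n)
    b = rng.randrange(2)
    x[i] = b
    y[i] = b

def coalescence_time(x, y, rng):
    """Steps until the two walks agree (x, y are 0/1 lists, modified in place)."""
    n = len(x)
    t = 0
    while x != y:
        coupled_step(x, y, n, rng)
        t += 1
    return t

rng = random.Random(2020)
n, trials = 10, 2000
times = [coalescence_time([0] * n, [1] * n, rng) for _ in range(trials)]
mean = sum(times) / trials
harmonic = sum(1 / k for k in range(1, n + 1))
print(mean, n * harmonic)  # empirical mean vs. coupon-collector expectation n * H_n
```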

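SLIDE 71

What is the probability that $X_t\neq Y_t$?

Coupon Collector! Once every coordinate $i\in[n]$ has been chosen at least once, $X_t=Y_t$. By a union bound over the coordinates, if $t\ge n\log n+cn$, then

$$\Pr[X_t\neq Y_t]\le n\left(1-\frac{1}{n}\right)^t\le ne^{-t/n}\le e^{-c}.$$

Coupling lemma implies that $\tau_{mix}(\varepsilon)\le n\log n+n\log\varepsilon^{-1}$.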
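The union-bound arithmetic can be checked numerically. This sketch (hypothetical helper names, natural logarithms) finds the smallest $t$ with $n(1-1/n)^t\le\varepsilon$, which by the coupling lemma upper-bounds $\tau_{mix}(\varepsilon)$ for this walk, and compares it with $\lceil n\log n+n\log\varepsilon^{-1}\rceil$. Since $(1-1/n)^t\le e^{-t/n}$, the first number can never exceed the second.

```python
import math

EPS = 0.25

def union_bound(n, t):
    """Pr[some coordinate is never chosen in t steps] <= n * (1 - 1/n)^t."""
    return n * (1 - 1 / n) ** t

def coupling_steps_needed(n, eps):
    """Smallest t with n * (1 - 1/n)^t <= eps."""
    t = 0
    while union_bound(n, t) > eps:
        t += 1
    return t

rows = []
for n in (10, 100, 1000):
    t = coupling_steps_needed(n, EPS)
    formula = n * math.log(n) + n * math.log(1 / EPS)
    rows.append((n, t, formula))
    print(n, t, math.ceil(formula))
```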

SLIDE 74

Another Random Walk

Standing at $x\in\{0,1\}^n$: with prob. $\frac{1}{n+1}$, do nothing; otherwise, choose $i\in[n]$ u.a.r. and flip $x(i)$.

Lazy walk on $G$

A coupling argument implies $\tau_{mix}\le\frac{1}{2}n\log n+O(n)$.
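The bound can be compared against an exact computation for moderate $n$. This is not the coupling argument, only a numerical check under two assumptions: by symmetry of the hypercube, starting at $0^n$ is as bad as any other start; and from $0^n$ the law of $X_t$ is uniform within each Hamming-weight class, as is $\pi$, so $d_{TV}$ equals the distance between the weight distribution and $\mathrm{Binomial}(n,1/2)$. The choices $n=50$ and a 400-step horizon are arbitrary.

```python
import math
from math import comb

def weight_step(dist, n):
    """One step of the Hamming weight: stay w.p. 1/(n+1), otherwise flip a uniform coordinate."""
    new = [0.0] * (n + 1)
    for w, p in enumerate(dist):
        if p == 0.0:
            continue
        new[w] += p / (n + 1)                    # lazy step
        if w > 0:
            new[w - 1] += p * w / (n + 1)        # a 1 is flipped to 0
        if w < n:
            new[w + 1] += p * (n - w) / (n + 1)  # a 0 is flipped to 1
    return new

def tv_from_origin(n, t_max):
    """d_TV(P^t(0^n, .), uniform) for t = 0..t_max, computed on the weight chain."""
    binom = [comb(n, w) / 2 ** n for w in range(n + 1)]
    dist = [1.0] + [0.0] * n
    out = []
    for _ in range(t_max + 1):
        out.append(sum(abs(a - b) for a, b in zip(dist, binom)) / 2)
        dist = weight_step(dist, n)
    return out

n = 50
d = tv_from_origin(n, 400)
tau = next(t for t, v in enumerate(d) if v <= 0.25)
print(tau, 0.5 * n * math.log(n))  # exact tau_mix vs. the leading term of the bound
```

Being an upper bound, $\frac{1}{2}n\log n+O(n)$ need not be tight, so the exact value may come out noticeably smaller.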

SLIDE 79

Reversible Chain

Recall that we say a Markov chain $P$ is reversible with respect to $\pi$ if

$$\forall x,y\in\Omega,\quad \pi(x)P(x,y)=\pi(y)P(y,x).$$

Then $\pi$ is a stationary distribution of $P$: summing over $x$ gives $(\pi^T P)(y)=\sum_x\pi(y)P(y,x)=\pi(y)$.

We showed that spectral decomposition is a powerful tool for analyzing reversible chains.
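Both conditions can be checked with exact arithmetic. In this sketch (hypothetical helper names), a two-state chain with $P(0,1)=p$ and $P(1,0)=q$ satisfies detailed balance with $\pi=(q,p)/(p+q)$, and hence is stationary; the values $p=3/10$, $q=1/10$ are arbitrary.

```python
from fractions import Fraction as F

def is_reversible(P, pi):
    """Detailed balance: pi(x) P(x, y) == pi(y) P(y, x) for all x, y."""
    n = len(P)
    return all(pi[x] * P[x][y] == pi[y] * P[y][x] for x in range(n) for y in range(n))

def is_stationary(P, pi):
    """pi^T P == pi^T."""
    n = len(P)
    return all(sum(pi[x] * P[x][y] for x in range(n)) == pi[y] for y in range(n))

p, q = F(3, 10), F(1, 10)
P = [[1 - p, p], [q, 1 - q]]
pi = [q / (p + q), p / (p + q)]
print(is_reversible(P, pi), is_stationary(P, pi))  # True True
```

The converse fails: the deterministic 3-cycle has the uniform distribution as a stationary distribution but is not reversible with respect to it.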

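SLIDE 86

Relaxation Time

Let $1=\lambda_0\ge\lambda_1\ge\cdots\ge\lambda_{n-1}$ be the eigenvalues of $P$, where $n=|\Omega|$.

$$\lambda_\star:=\max_{1\le i\le n-1}|\lambda_i|,\qquad \tau_{rel}:=\frac{1}{1-\lambda_\star}$$

For reversible, irreducible, aperiodic chains:

$$(\tau_{rel}-1)\log\left(\frac{1}{2\varepsilon}\right)\le\tau_{mix}(\varepsilon)\le\tau_{rel}\log\left(\frac{1}{\varepsilon\pi_{min}}\right),\qquad \pi_{min}:=\min_x\pi(x)$$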
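A two-state chain makes both bounds easy to check, since its spectrum is known in closed form: with $P(0,1)=p$ and $P(1,0)=q$, the eigenvalues are $1$ and $1-p-q$, so $\lambda_\star=|1-p-q|$. This sketch (example parameters are arbitrary; every two-state chain is reversible) computes $\tau_{mix}(\varepsilon)$ by brute force over point-mass starts and prints it between the two bounds.

```python
import math

def two_state(p, q):
    """P(0,1) = p, P(1,0) = q, with p, q > 0 and p + q < 2 (irreducible, aperiodic)."""
    P = [[1 - p, p], [q, 1 - q]]
    pi = [q / (p + q), p / (p + q)]
    lam_star = abs(1 - p - q)  # the eigenvalues of P are 1 and 1 - p - q
    return P, pi, lam_star

def mixing_time(P, pi, eps, t_max=100_000):
    """First t with max_x d_TV(P^t(x, .), pi) <= eps."""
    n = len(P)
    rows = [[1.0 if j == x else 0.0 for j in range(n)] for x in range(n)]
    for t in range(t_max + 1):
        if max(sum(abs(r[j] - pi[j]) for j in range(n)) / 2 for r in rows) <= eps:
            return t
        rows = [[sum(r[i] * P[i][j] for i in range(n)) for j in range(n)] for r in rows]
    raise ValueError("did not mix within t_max steps")

P, pi, lam = two_state(0.3, 0.1)
eps = 0.25
t_rel = 1 / (1 - lam)
lower = (t_rel - 1) * math.log(1 / (2 * eps))
upper = t_rel * math.log(1 / (eps * min(pi)))
tau = mixing_time(P, pi, eps)
print(lower, tau, upper)
```

Here $\lambda_\star=0.6$, $\tau_{rel}=2.5$ and $\pi_{min}=1/4$; the exact mixing time sits between the spectral lower and upper bounds.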