Advanced Algorithms (XIII) Shanghai Jiao Tong University Chihao - - PowerPoint PPT Presentation


Advanced Algorithms (XIII)

Shanghai Jiao Tong University

Chihao Zhang

June 1, 2020

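Total Variation Distance

Let $\mu$ and $\nu$ be two distributions on $\Omega$. Their total variation distance is

$$d_{TV}(\mu,\nu) = \frac{1}{2}\sum_{x\in\Omega}\left|\mu(x)-\nu(x)\right| = \max_{A\subseteq\Omega}\left|\mu(A)-\nu(A)\right|$$

[Figure: the distributions $\mu$ and $\nu$, with a set $A$ marked]

  • $\ell_1$ distance scaled by $\frac{1}{2}$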
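As a quick sanity check of the two formulas, the sketch below computes $d_{TV}$ in two ways: as half the $\ell_1$ distance, and as a brute-force maximum over all events $A\subseteq\Omega$. It assumes distributions are Python dicts mapping outcomes to exact `Fraction` probabilities. The helper names `tv_l1` and `tv_max_event` are illustrative. The brute force is exponential in $|\Omega|$, so it is only meant for tiny examples such as the two-point one used later in the lecture.

```python
from fractions import Fraction
from itertools import chain, combinations


def tv_l1(mu, nu):
    """d_TV as half the l1 distance; mu, nu map outcomes to probabilities."""
    support = set(mu) | set(nu)
    return sum(abs(mu.get(x, 0) - nu.get(x, 0)) for x in support) / 2


def tv_max_event(mu, nu):
    """d_TV as max over events A of |mu(A) - nu(A)| (brute force over all subsets)."""
    support = sorted(set(mu) | set(nu))
    events = chain.from_iterable(
        combinations(support, k) for k in range(len(support) + 1)
    )
    return max(abs(sum(mu.get(x, 0) - nu.get(x, 0) for x in A)) for A in events)


# Omega = {1, 2}, mu = (1/2, 1/2), nu = (1/3, 2/3)
mu = {1: Fraction(1, 2), 2: Fraction(1, 2)}
nu = {1: Fraction(1, 3), 2: Fraction(2, 3)}
print(tv_l1(mu, nu), tv_max_event(mu, nu))  # prints: 1/6 1/6
```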

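Coupling

Let $\mu$ and $\nu$ be two distributions on $\Omega$. A coupling of $\mu$ and $\nu$ is a joint distribution $\omega$ on $\Omega\times\Omega$ such that:

$$\forall x\in\Omega,\quad \mu(x) = \sum_{y\in\Omega}\omega(x,y)$$

$$\forall y\in\Omega,\quad \nu(y) = \sum_{x\in\Omega}\omega(x,y)$$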
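For finite $\Omega$ the two marginal conditions can be checked mechanically. The sketch below tests them and builds the product distribution $\omega(x,y)=\mu(x)\nu(y)$, which always satisfies both conditions, so a coupling always exists. It assumes $\omega$ is a dict keyed by pairs $(x,y)$ and that probabilities are exact `Fraction`s. The helper names `is_coupling` and `independent_coupling` are hypothetical.

```python
from fractions import Fraction


def is_coupling(omega, mu, nu):
    """Check that omega (a dict keyed by pairs (x, y)) is a coupling of mu and nu.

    Uses exact equality, so probabilities should be Fractions, not floats.
    """
    states = set(mu) | set(nu)
    if any(x not in states or y not in states for x, y in omega):
        return False
    if any(p < 0 for p in omega.values()):
        return False
    # Row sums must give mu, column sums must give nu.
    rows_ok = all(
        sum(omega.get((x, y), 0) for y in states) == mu.get(x, 0) for x in states
    )
    cols_ok = all(
        sum(omega.get((x, y), 0) for x in states) == nu.get(y, 0) for y in states
    )
    return rows_ok and cols_ok


def independent_coupling(mu, nu):
    """Product distribution omega(x, y) = mu(x) * nu(y); always a valid coupling."""
    return {(x, y): mu[x] * nu[y] for x in mu for y in nu}


mu = {1: Fraction(1, 2), 2: Fraction(1, 2)}
nu = {1: Fraction(1, 3), 2: Fraction(2, 3)}
print(is_coupling(independent_coupling(mu, nu), mu, nu))
```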

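Coupling Lemma

Let $\omega$ be a coupling of $\mu$ and $\nu$, and let $(X,Y)\sim\omega$, so that $X\sim\mu$ and $Y\sim\nu$. Then

$$\Pr_{(X,Y)\sim\omega}[X\neq Y] \ge d_{TV}(\mu,\nu)$$

Moreover, there exists a coupling $\omega^*$ such that

$$\Pr_{(X,Y)\sim\omega^*}[X\neq Y] = d_{TV}(\mu,\nu)$$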
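The first inequality takes one line. The derivation below is one standard argument and is not necessarily the lecture's own proof, which works with the matrix picture. For any event $A\subseteq\Omega$:

```latex
\begin{aligned}
\mu(A) - \nu(A) &= \Pr[X \in A] - \Pr[Y \in A] \\
&\le \Pr[X \in A] - \Pr[X \in A,\ Y \in A] \\
&= \Pr[X \in A,\ Y \notin A] \\
&\le \Pr[X \neq Y].
\end{aligned}
```

Swapping the roles of $\mu$ and $\nu$ bounds $\nu(A)-\mu(A)$ in the same way. Taking the maximum over $A$ then gives $d_{TV}(\mu,\nu) \le \Pr[X\neq Y]$.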

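Proof of Coupling Lemma

For finite $\Omega$, designing a coupling is equivalent to filling in an $\Omega\times\Omega$ matrix so that the marginals (row and column sums) are correct.

Example: $\Omega=\{1,2\}$, $\mu=(1/2,1/2)$, $\nu=(1/3,2/3)$. Rows are indexed by $\mu$ and columns by $\nu$:

                ν(1) = 1/3   ν(2) = 2/3
  μ(1) = 1/2       1/3          1/6
  μ(2) = 1/2        0           1/2

$\omega^*$ is the coupling that maximizes the sum of the diagonal entries. Here the diagonal sums to $1/3 + 1/2 = 5/6$, so $\Pr[X\neq Y] = 1/6 = d_{TV}(\mu,\nu)$.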
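The table generalizes. Put $\min(\mu(x),\nu(x))$ on the diagonal and spread the leftover mass off the diagonal; the result is a coupling with $\Pr[X\neq Y] = d_{TV}(\mu,\nu)$. The sketch below shows one standard construction of such an $\omega^*$, often called a maximal coupling. It is not necessarily the construction the lecture has in mind. `maximal_coupling` is a hypothetical helper, and exact `Fraction` arithmetic is assumed. On the example above it reproduces the matrix entries.

```python
from fractions import Fraction


def maximal_coupling(mu, nu):
    """Return a coupling omega of mu and nu with Pr[X != Y] = d_TV(mu, nu).

    Diagonal: omega(x, x) = min(mu(x), nu(x)).
    Off-diagonal: the leftover mass of mu (where mu > nu) is spread over the
    leftover mass of nu (where nu > mu) proportionally.
    """
    states = sorted(set(mu) | set(nu))
    common = {x: min(mu.get(x, 0), nu.get(x, 0)) for x in states}
    extra_mu = {x: mu.get(x, 0) - common[x] for x in states}
    extra_nu = {y: nu.get(y, 0) - common[y] for y in states}
    d = sum(extra_mu.values())  # this is exactly d_TV(mu, nu)
    omega = {}
    for x in states:
        for y in states:
            p = common[x] if x == y else 0
            if d > 0:
                # extra_mu[x] * extra_nu[x] == 0, so the diagonal gets nothing extra
                p += extra_mu[x] * extra_nu[y] / d
            omega[(x, y)] = p
    return omega


mu = {1: Fraction(1, 2), 2: Fraction(1, 2)}
nu = {1: Fraction(1, 3), 2: Fraction(2, 3)}
omega = maximal_coupling(mu, nu)
for (x, y), p in sorted(omega.items()):
    print(x, y, p)
print("Pr[X != Y] =", sum(p for (x, y), p in omega.items() if x != y))
```

Row $x$ sums to $\min(\mu(x),\nu(x))$ plus the leftover $\mu(x)-\min(\mu(x),\nu(x))$, because the leftover masses of $\nu$ also total $d_{TV}(\mu,\nu)$. Columns behave symmetrically.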

Coupling of Markov Chains

Consider two copies of the chain $P$:

  • The initial distributions are $\mu_0$ and $\nu_0$
  • $\mu_t^T = \mu_0^T P^t$ and $\nu_t^T = \nu_0^T P^t$

A coupling of the two chains is a joint distribution $\omega$ of $\{\mu_t\}_{t\ge 0}$ and $\{\nu_t\}_{t\ge 0}$ that satisfies the following conditions:

$\{(X_t, Y_t)\}_{t\ge 0}\sim\omega$ is a pair of processes such that

$$\forall a,b\in\Omega,\quad \Pr[X_{t+1}=b \mid X_t=a] = P(a,b)$$

$$\forall a,b\in\Omega,\quad \Pr[Y_{t+1}=b \mid Y_t=a] = P(a,b)$$

Marginally, $\{X_t\}$ and $\{Y_t\}$ are both the chain $P$.

$$\forall t\ge 0,\quad X_t = Y_t \implies X_{t'} = Y_{t'} \text{ for all } t' > t$$

Two chains coalesce once they meet.
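One way to simulate a coupling that meets all three conditions: let the two copies move independently until they first meet, and identically afterwards. Each copy still moves according to $P$. The sketch assumes states are `0..n-1` and that `P` is a row-stochastic list of lists. `coupled_step` and `run_coupling` are illustrative names, not from the lecture.

```python
import random


def coupled_step(P, x, y, rng):
    """One step of a coalescing coupling of two copies of the chain P.

    Before the copies meet they move independently; once x == y they make
    the same move, so they stay together forever.
    """
    states = range(len(P))
    x_next = rng.choices(states, weights=P[x])[0]
    if x == y:
        return x_next, x_next  # coalesced: move together
    y_next = rng.choices(states, weights=P[y])[0]
    return x_next, y_next  # not yet met: independent moves


def run_coupling(P, x0, y0, steps, seed=0):
    """Return the path [(X_0, Y_0), ..., (X_steps, Y_steps)] of the coupled chains."""
    rng = random.Random(seed)
    path = [(x0, y0)]
    for _ in range(steps):
        path.append(coupled_step(P, *path[-1], rng))
    return path


# A lazy walk on the path 0 - 1 - 2 (an arbitrary example chain)
P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
path = run_coupling(P, 0, 2, steps=50, seed=1)
print(path[:10])
```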

Fundamental Theorem via Coupling

If a finite chain $P$ is irreducible and aperiodic, then it has a unique stationary distribution $\pi$. Moreover, for any initial distribution $\mu$, it holds that

$$\lim_{t\to\infty}\mu^T P^t = \pi^T$$

Consider two chains $\{X_t\}_{t\ge 0}$ and $\{Y_t\}_{t\ge 0}$:

  • $X_0\sim\pi$ and $Y_0\sim\mu_0$ for an arbitrary $\mu_0$
  • A coupling where $X_t$ and $Y_t$ run independently until they meet, after which they move together

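irreducible + aperiodic $\implies \exists t,\ \forall x,y\in\Omega,\ P^t(x,y) > 0$

Then for any $z\in\Omega$, there exists some $\theta > 0$ such that

$$\Pr[X_t=Y_t] \ge \Pr[X_t=Y_t=z] = \Pr[X_t=z]\cdot\Pr[Y_t=z] = \pi(z)\cdot P^t(Y_0,z) \ge \theta > 0$$

For instance, $\theta := \min_{x,y\in\Omega}P^t(x,z)\,P^t(y,z)$ works, and the same bound holds for every starting pair $(X_0,Y_0)$. Hence

$$\Pr[X_t\neq Y_t] \le 1-\theta < 1$$

$$\begin{aligned}
\Pr[X_{2t}\neq Y_{2t}] &= \Pr[X_{2t}\neq Y_{2t}\wedge X_t=Y_t] + \Pr[X_{2t}\neq Y_{2t}\wedge X_t\neq Y_t] \\
&= \Pr[X_{2t}\neq Y_{2t}\mid X_t\neq Y_t]\cdot\Pr[X_t\neq Y_t] \\
&\le (1-\theta)^2
\end{aligned}$$

The first term vanishes because the chains coalesce once they meet. The conditional probability is at most $1-\theta$ by the Markov property, since $\theta$ does not depend on the starting pair. Iterating gives

$$\Pr[X_{kt}\neq Y_{kt}] \le (1-\theta)^k$$

Since $\Pr[X_n\neq Y_n]$ is non-increasing in $n$,

$$\lim_{n\to\infty}\Pr[X_n\neq Y_n] = 0$$

As $X_n\sim\pi$ for all $n$, the coupling lemma gives $d_{TV}(\mu_0^T P^n, \pi^T) \le \Pr[X_n\neq Y_n] \to 0$.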
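For a small chain, the law of the coupled pair $(X_t,Y_t)$ can be propagated exactly. This makes the argument concrete: the probability of not having met decreases toward $0$, and by the coupling lemma it bounds $d_{TV}(\mu_0^T P^t,\pi^T)$ from above. The three-state chain (a lazy walk on a path, with $\pi=(1/4,1/2,1/4)$), the start $Y_0=0$, and all helper names are assumptions chosen for illustration.

```python
from fractions import Fraction as F


def step_dist(dist, P):
    """Return dist^T P for a distribution given as a list."""
    n = len(P)
    return [sum(dist[a] * P[a][b] for a in range(n)) for b in range(n)]


def step_pair_dist(joint, P):
    """Advance the law of (X_t, Y_t) one step under the independent-until-meet coupling."""
    n = len(P)
    new = {}
    for (x, y), p in joint.items():
        if p == 0:
            continue
        for a in range(n):
            if x == y:
                # coalesced: both copies make the same move
                new[(a, a)] = new.get((a, a), 0) + p * P[x][a]
            else:
                for b in range(n):
                    new[(a, b)] = new.get((a, b), 0) + p * P[x][a] * P[y][b]
    return new


def tv(mu, nu):
    return sum(abs(m - v) for m, v in zip(mu, nu)) / 2


P = [[F(1, 2), F(1, 2), 0], [F(1, 4), F(1, 2), F(1, 4)], [0, F(1, 2), F(1, 2)]]
pi = [F(1, 4), F(1, 2), F(1, 4)]  # stationary for P
mu0 = [F(1), 0, 0]  # Y_0 = state 0

# X_0 ~ pi and Y_0 ~ mu0, drawn independently
joint = {(x, y): pi[x] * mu0[y] for x in range(3) for y in range(3)}
dist = mu0
history = []  # (d_TV(mu0^T P^n, pi), Pr[X_n != Y_n]) for n = 0..9
for n in range(10):
    not_met = sum(p for (x, y), p in joint.items() if x != y)
    history.append((tv(dist, pi), not_met))
    joint = step_pair_dist(joint, P)
    dist = step_dist(dist, P)

for d, q in history:
    print(float(d), float(q))
```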

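Mixing Time

The mixing time $\tau_{mix}(\varepsilon)$ is the first time $t$ such that the total variation distance between the distribution of $X_t$ and $\pi$ is at most $\varepsilon$, for any initial $X_0$:

$$\tau_{mix}(\varepsilon) = \max_{\mu_0}\ \min\left\{t\ge 0 : d_{TV}(\mu_0^T P^t, \pi) \le \varepsilon\right\}$$

$$\tau_{mix} = \tau_{mix}(1/4)$$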
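For a small chain, $\tau_{mix}(\varepsilon)$ can be computed directly from the definition by iterating $\mu_0^T P^t$. Point masses suffice as starting distributions, because $d_{TV}(\mu_0^T P^t,\pi)$ is convex in $\mu_0$. The sketch assumes `P` is a row-stochastic list of lists with known stationary `pi`. `mixing_time` and the cap `t_max` are hypothetical.

```python
def tv(mu, nu):
    return sum(abs(a - b) for a, b in zip(mu, nu)) / 2


def step(dist, P):
    n = len(P)
    return [sum(dist[a] * P[a][b] for a in range(n)) for b in range(n)]


def mixing_time(P, pi, eps, t_max=10_000):
    """tau_mix(eps), taking the worst case over point-mass starting states.

    Point masses suffice: d_TV(mu0^T P^t, pi) is convex in mu0, so for every t
    it is maximized at some point mass.
    """
    n = len(P)
    worst = 0
    for x in range(n):
        dist = [1.0 if i == x else 0.0 for i in range(n)]
        t = 0
        while tv(dist, pi) > eps:
            if t >= t_max:
                raise RuntimeError("chain did not mix within t_max steps")
            dist = step(dist, P)
            t += 1
        worst = max(worst, t)
    return worst


# A lazy two-state chain with uniform stationary distribution (example only)
P = [[0.75, 0.25], [0.25, 0.75]]
print(mixing_time(P, [0.5, 0.5], 0.25), mixing_time(P, [0.5, 0.5], 0.125))
```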

Random Walk on Hypercube

  • $V = \{0,1\}^n$
  • $x\sim y$ iff $\|x-y\|_1 = 1$

Standing at $x\in\{0,1\}^n$:

  • with prob. $\frac{1}{2}$, do nothing
  • otherwise, choose $i\in[n]$ u.a.r. and flip $x(i)$

Lazy walk on $G$
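The lecture next rewrites this walk as "pick $i\in[n]$ and $b\in\{0,1\}$ u.a.r., set $x(i)\leftarrow b$". The sketch below builds both transition matrices for small $n$ with exact fractions and checks that they coincide. States are bit tuples in lexicographic order, and the function names are hypothetical. The matrix is also symmetric, so the uniform distribution is stationary.

```python
from fractions import Fraction
from itertools import product


def lazy_hypercube_matrix(n):
    """Lazy walk: stay w.p. 1/2, otherwise flip a uniformly random coordinate."""
    states = list(product((0, 1), repeat=n))
    index = {s: k for k, s in enumerate(states)}
    P = [[Fraction(0)] * len(states) for _ in states]
    for s in states:
        P[index[s]][index[s]] += Fraction(1, 2)
        for i in range(n):
            t = list(s)
            t[i] ^= 1  # flip coordinate i
            P[index[s]][index[tuple(t)]] += Fraction(1, 2 * n)
    return states, P


def resample_matrix(n):
    """Pick i in [n] and b in {0,1} uniformly at random, then set x(i) <- b."""
    states = list(product((0, 1), repeat=n))
    index = {s: k for k, s in enumerate(states)}
    P = [[Fraction(0)] * len(states) for _ in states]
    for s in states:
        for i in range(n):
            for b in (0, 1):
                t = list(s)
                t[i] = b  # overwrite coordinate i with the random bit
                P[index[s]][index[tuple(t)]] += Fraction(1, 2 * n)
    return states, P


for n in (1, 2, 3):
    print(n, lazy_hypercube_matrix(n)[1] == resample_matrix(n)[1])
```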

The chain is equivalent to:

  • choose $i\in[n]$ and $b\in\{0,1\}$ u.a.r.
  • change $x(i)\leftarrow b$

Let $X_t$ and $Y_t$ be two walks. We couple them by choosing the same $i$ and $b$ in both.
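A simulation of this coupling follows. Both walks use the same random $i$ and $b$, so once coordinate $i$ has been chosen they agree on it forever. Starting from complementary points, they meet exactly when every coordinate has been chosen at least once. That is the coupon-collector time, whose mean is $nH_n$, so the printed average should be close to $72$ for $n=20$. The seed, $n$, and sample size are arbitrary choices, and `coupling_time` is an illustrative name.

```python
import random


def coupling_time(x0, y0, rng, max_steps=10**7):
    """Run the coupled walks (same i and b in both) until X_t == Y_t; return that t."""
    x, y = list(x0), list(y0)
    n = len(x)
    t = 0
    while x != y:
        if t >= max_steps:
            raise RuntimeError("walks did not meet within max_steps")
        i = rng.randrange(n)  # same coordinate for both walks
        b = rng.randrange(2)  # same new bit for both walks
        x[i] = b
        y[i] = b
        t += 1
    return t


n = 20
rng = random.Random(2020)
samples = [coupling_time([0] * n, [1] * n, rng) for _ in range(1000)]
print(sum(samples) / len(samples))
```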

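What is the probability that $X_t\neq Y_t$?

Coupon Collector! Once every coordinate $i\in[n]$ has been chosen at least once, $X_t = Y_t$. A union bound over the $n$ coordinates gives $\Pr[X_t\neq Y_t] \le n(1-1/n)^t \le n e^{-t/n}$. So if $t \ge n\log n + cn$ (natural logarithms), then

$$\Pr[X_t\neq Y_t] \le e^{-c}$$

The coupling lemma implies that

$$\tau_{mix}(\varepsilon) \le n\log n + n\log\varepsilon^{-1}$$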
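The union bound behind this step can be checked numerically. With $t=\lceil n\ln n + cn\rceil$ we have $n(1-1/n)^t \le n e^{-t/n} \le e^{-c}$, and taking $c=\ln(1/\varepsilon)$ gives the mixing-time bound. The sketch below checks the inequality for a few values of $n$ and $c$ and evaluates the bound at $\varepsilon = 1/4$. The helper names are hypothetical.

```python
import math


def union_bound(n, t):
    """n (1 - 1/n)^t: union bound on Pr[some coordinate never chosen in t steps]."""
    return n * (1 - 1 / n) ** t


def mixing_bound(n, eps):
    """The coupling bound n log n + n log(1/eps), with natural logarithms."""
    return n * math.log(n) + n * math.log(1 / eps)


for n in (10, 100, 1000):
    for c in (0.5, 1.0, 2.0):
        t = math.ceil(n * math.log(n) + c * n)
        assert union_bound(n, t) <= math.exp(-c)
    print(n, mixing_bound(n, 0.25))
```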

Another Random Walk

Standing at $x\in\{0,1\}^n$:

  • with prob. $\frac{1}{n+1}$, do nothing
  • otherwise, choose $i\in[n]$ u.a.r. and flip $x(i)$

Lazy walk on $G$

A coupling argument implies $\tau_{mix} \le \frac{1}{2}n\log n + O(n)$

Reversible Chain

Recall that a Markov chain $P$ is reversible with respect to $\pi$ if

$$\forall x,y\in\Omega,\quad \pi(x)P(x,y) = \pi(y)P(y,x)$$

Then $\pi$ is a stationary distribution of $P$: summing over $x$ gives $\sum_{x}\pi(x)P(x,y) = \pi(y)\sum_{x}P(y,x) = \pi(y)$.

We showed that spectral decomposition is a powerful tool for analyzing reversible chains.
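Detailed balance is a finite check, and it is strictly stronger than stationarity. The sketch below tests both conditions with exact fractions on two illustrative chains, which are assumptions and not from the lecture. The first is a lazy walk on a three-vertex path with $\pi=(1/4,1/2,1/4)$, which is reversible. The second is a lazy cycle on three states: its stationary distribution is uniform, but it is not reversible. The helper names are hypothetical.

```python
from fractions import Fraction as F


def is_reversible(P, pi):
    """Detailed balance: pi(x) P(x, y) == pi(y) P(y, x) for all x, y."""
    n = len(P)
    return all(pi[x] * P[x][y] == pi[y] * P[y][x] for x in range(n) for y in range(n))


def is_stationary(P, pi):
    """pi^T P == pi^T."""
    n = len(P)
    return all(sum(pi[x] * P[x][y] for x in range(n)) == pi[y] for y in range(n))


# Lazy walk on the path 0 - 1 - 2: reversible
P = [[F(1, 2), F(1, 2), 0], [F(1, 4), F(1, 2), F(1, 4)], [0, F(1, 2), F(1, 2)]]
pi = [F(1, 4), F(1, 2), F(1, 4)]

# Lazy cycle 0 -> 1 -> 2 -> 0: uniform is stationary, but not reversible
C = [[F(1, 2), F(1, 2), 0], [0, F(1, 2), F(1, 2)], [F(1, 2), 0, F(1, 2)]]
u = [F(1, 3)] * 3

print(is_reversible(P, pi), is_stationary(P, pi))
print(is_reversible(C, u), is_stationary(C, u))
```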

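Relaxation Time

Let $\lambda_0 = 1, \lambda_1, \ldots, \lambda_{n-1}$ be the eigenvalues of $P$, where $n = |\Omega|$.

$$\lambda_\star := \max_{1\le i\le n-1}|\lambda_i|$$

$$\tau_{rel} := \frac{1}{1-\lambda_\star}$$

For reversible, irreducible, aperiodic chains:

$$(\tau_{rel}-1)\log\left(\frac{1}{2\varepsilon}\right) \le \tau_{mix}(\varepsilon) \le \tau_{rel}\log\left(\frac{1}{\varepsilon\,\pi_{min}}\right)$$

where

$$\pi_{min} := \min_{x}\pi(x)$$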
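The two bounds can be checked without an eigenvalue solver on a two-state chain $P=\begin{pmatrix}1-p & p\\ q & 1-q\end{pmatrix}$. This chain is an assumption chosen for illustration. Every two-state chain is reversible, and its eigenvalues are $1$ and $1-p-q$, so $\lambda_\star = |1-p-q|$. The sketch computes $\tau_{mix}(\varepsilon)$ by iterating from each point mass and compares it with both sides of the inequality. The values $p=0.1$, $q=0.2$, $\varepsilon=1/4$ and all helper names are arbitrary.

```python
import math


def two_state_chain(p, q):
    """P = [[1-p, p], [q, 1-q]] with 0 < p, q < 1; eigenvalues are 1 and 1 - p - q."""
    if not (0 < p < 1 and 0 < q < 1):
        raise ValueError("need 0 < p, q < 1 (irreducible and aperiodic)")
    P = [[1 - p, p], [q, 1 - q]]
    pi = [q / (p + q), p / (p + q)]  # stationary distribution
    return P, pi


def tv(mu, nu):
    return sum(abs(a - b) for a, b in zip(mu, nu)) / 2


def step(dist, P):
    return [sum(dist[a] * P[a][b] for a in range(len(P))) for b in range(len(P))]


def mixing_time(P, pi, eps):
    """tau_mix(eps), taking the worst case over point-mass starting states."""
    worst = 0
    for x in range(len(P)):
        dist = [1.0 if i == x else 0.0 for i in range(len(P))]
        t = 0
        while tv(dist, pi) > eps:
            dist = step(dist, P)
            t += 1
        worst = max(worst, t)
    return worst


p, q, eps = 0.1, 0.2, 0.25
P, pi = two_state_chain(p, q)
lam_star = abs(1 - p - q)  # the only non-trivial eigenvalue in absolute value
t_rel = 1 / (1 - lam_star)
lower = (t_rel - 1) * math.log(1 / (2 * eps))
upper = t_rel * math.log(1 / (eps * min(pi)))
t_mix = mixing_time(P, pi, eps)
print(lower, t_mix, upper)
```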