Sampling from distributive lattices – the Markov chain approach

  1. Sampling from distributive lattices – the Markov chain approach
     Graduiertenkolleg MDS, TU Berlin, April 20, 2009
     Stefan Felsner, Technische Universität Berlin, felsner@math.tu-berlin.de

  2. Topics
     • Markov Chain Monte Carlo
     • Coupling and CFTP
     • Distributive Lattices
     • α-Orientations and Heights
     • Block Coupling for Heights

  4. The Sampling Problem
     • $\Omega$ a (large) finite set
     • $\mu : \Omega \to [0, 1]$ a probability distribution
     Problem. Sample from $\Omega$ according to $\mu$, i.e., $\Pr(\text{output} = \omega) = \mu(\omega)$.
     There are many hard instances of the sampling problem.
     Relaxation: approximate sampling, i.e., $\Pr(\text{output} = \omega) = \tilde{\mu}(\omega)$ for some $\tilde{\mu} \approx \mu$.

  5. Applications of Sampling
     • Get our hands on typical examples from $\Omega$.
     • Approximate counting.

  7. Preliminaries on Markov Chains
     $M$ transition matrix
     • size $\Omega \times \Omega$
     • entries $\in [0, 1]$
     • row sums $= 1$ (stochastic)
     Intuition: $M$ specifies a random walk.
     [Figure: transition graph on the states a, b, c with edge probabilities, next to the corresponding 3 × 3 matrix $M$.]

  8. Instance of a Markov Chain
     $(X_0, X_1, X_2, \dots, X_r, \dots)$ an instance of $M$
     • $X_i$ random variable with values in $\Omega$
     • $\Pr(X_{i+1} = x \mid X_i = s) = M(s, x)$
     Proposition. The probability distribution of $X_t$ is $\mu_t$ with $\mu_t = \mu_0 M^t$.
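
The proposition can be checked numerically on a toy chain. The following is a minimal sketch; the 3-state matrix `M` and the helper names are illustrative assumptions, not the example from the slides.

```python
def step(dist, M):
    """One step of the chain: the row vector dist times the matrix M."""
    n = len(M)
    return [sum(dist[s] * M[s][x] for s in range(n)) for x in range(n)]


def distribution_after(mu0, M, t):
    """Return mu_t = mu_0 * M^t by applying `step` t times."""
    dist = list(mu0)
    for _ in range(t):
        dist = step(dist, M)
    return dist


# Hypothetical stochastic matrix (every row sums to 1).
M = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
mu0 = [1.0, 0.0, 0.0]                 # start in state 0 with probability 1
mu2 = distribution_after(mu0, M, 2)   # distribution of X_2
```

Each application of `step` is one multiplication by $M$ from the right, so the loop computes $\mu_0 M^t$ without forming the matrix power.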

  10. Ergodic Markov Chains
      $M$ is ergodic (i.e., irreducible and aperiodic)
      $\implies$ multiplicity of eigenvalue 1 is one
      $\implies$ unique $\pi$ with $\pi = \pi M$.
      Fundamental Theorem. $M$ ergodic $\implies \lim_{t \to \infty} \mu_0 M^t = \pi$.
      $M$ symmetric and ergodic $\implies M^T \mathbb{1}^T = M \mathbb{1}^T = \mathbb{1}^T$, hence $\mathbb{1} M = \mathbb{1}$ $\implies \pi$ is the uniform distribution.
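
As a small sanity check of the last implication, the sketch below uses a hypothetical symmetric, ergodic 3-state matrix (an assumption chosen for illustration). It verifies that the uniform vector is fixed by $M$ and that iterating from a point mass approaches it.

```python
def step(dist, M):
    """One step of the chain: the row vector dist times the matrix M."""
    n = len(M)
    return [sum(dist[s] * M[s][x] for s in range(n)) for x in range(n)]


# Hypothetical symmetric stochastic matrix; the self-loops make it aperiodic.
M = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
]

uniform = [1.0 / 3.0] * 3
fixed = step(uniform, M)      # should equal `uniform`, i.e. pi = pi M

dist = [1.0, 0.0, 0.0]        # mu_0: point mass on state 0
for _ in range(60):
    dist = step(dist, M)      # mu_0 M^t for growing t
```

After the loop, `dist` agrees with the uniform distribution up to floating-point error, which is what the Fundamental Theorem predicts for this chain.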

  11. Example: Linear Extensions
      A Markov chain for linear extensions. $L_t = x_1, x_2, \dots, x_n$ is the state at time $t$.
      • Choose $i \in \{1, 2, \dots, n-1\}$ uniformly.
      • If $x_i$ and $x_{i+1}$ are incomparable, then $L_{t+1} = x_1, x_2, \dots, x_{i-1}, x_{i+1}, x_i, x_{i+2}, \dots, x_n$; otherwise $L_{t+1} = L_t$.
      Proposition. The chain is ergodic and symmetric.
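
The transition rule can be sketched in a few lines. The poset below and the function names are hypothetical; the order is given as a set of pairs `(a, b)` meaning $a < b$, assumed to be transitively closed.

```python
import random


def linear_extension_step(ext, relations, rng):
    """Pick an adjacent pair uniformly and swap it if the two are incomparable."""
    i = rng.randrange(len(ext) - 1)
    a, b = ext[i], ext[i + 1]
    if (a, b) not in relations and (b, a) not in relations:
        return ext[:i] + [b, a] + ext[i + 2:]
    return ext                      # comparable pair: stay where we are


def is_linear_extension(ext, relations):
    """Check that every relation a < b is respected by the ordering `ext`."""
    pos = {x: k for k, x in enumerate(ext)}
    return all(pos[a] < pos[b] for a, b in relations)


# Hypothetical poset on {1, 2, 3, 4}: 1 < 3, 2 < 3, 2 < 4.
relations = {(1, 3), (2, 3), (2, 4)}
rng = random.Random(0)
state = [1, 2, 3, 4]
for _ in range(1000):
    state = linear_extension_step(state, relations, rng)
```

Swapping two adjacent incomparable elements cannot violate any relation, so every state visited is again a linear extension.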

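  13. Measuring Convergence
      Variation distance: $\|\mu - \mu'\|_{VD} := \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \mu'(x)|$
      Lemma. $\|\mu - \mu'\|_{VD} = \max_{A \subset \Omega} (\mu(A) - \mu'(A))$
      [Figure: graphs of $\mu$ and $\mu'$; $A$ is the region where $\mu$ lies above $\mu'$, $B$ the region where $\mu'$ lies above $\mu$.]
      $\sum \mu = \sum \mu' = 1 \implies \text{area}(A) = \text{area}(B)$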
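
For small state spaces the lemma can be verified by brute force. This is an illustrative sketch with made-up distributions `mu` and `nu`; it compares the half-$\ell_1$ formula with the maximum over all events $A$.

```python
from itertools import combinations


def variation_distance(mu, nu):
    """Half the l1-distance between two distributions given as dicts."""
    return 0.5 * sum(abs(mu[x] - nu[x]) for x in mu)


def max_event_gap(mu, nu):
    """Brute-force maximum of mu(A) - nu(A) over all subsets A of the states."""
    states = list(mu)
    best = 0.0                                   # value for the empty set
    for r in range(1, len(states) + 1):
        for A in combinations(states, r):
            best = max(best, sum(mu[x] - nu[x] for x in A))
    return best


# Hypothetical distributions on three states.
mu = {"a": 0.5, "b": 0.3, "c": 0.2}
nu = {"a": 0.2, "b": 0.3, "c": 0.5}
d1 = variation_distance(mu, nu)
d2 = max_event_gap(mu, nu)
```

The maximum is attained by the event $A = \{x : \mu(x) > \mu'(x)\}$, which corresponds to the region $A$ in the figure.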

  14. Mixing Time
      $\mu^t_x = \delta_x M^t$ is the distribution after $t$ steps starting in $x$.
      $\Delta(t) := \max(\|\mu^t_x - \pi\|_{VD} : x \in \Omega)$
      $\tau(\varepsilon) = \min(t : \Delta(t) \le \varepsilon)$
      • $\tau(\varepsilon)$ is the mixing time.
      • $M$ is rapidly mixing $\iff$ $\tau(\varepsilon)$ is a polynomial function of the problem size and $\log(\varepsilon^{-1})$.
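
On a tiny chain, $\tau(\varepsilon)$ can be computed directly from the definition. The sketch below assumes a hypothetical symmetric 3-state matrix with uniform stationary distribution; it is only feasible when $\Omega$ is small enough to enumerate.

```python
def step(dist, M):
    """One step of the chain: the row vector dist times the matrix M."""
    n = len(M)
    return [sum(dist[s] * M[s][x] for s in range(n)) for x in range(n)]


def variation_distance(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def mixing_time(M, pi, eps, t_max=10_000):
    """Smallest t with Delta(t) <= eps, tracking mu_x^t for every start x."""
    n = len(M)
    dists = [[1.0 if y == x else 0.0 for y in range(n)] for x in range(n)]
    for t in range(t_max + 1):
        delta = max(variation_distance(d, pi) for d in dists)   # Delta(t)
        if delta <= eps:
            return t
        dists = [step(d, M) for d in dists]
    raise ValueError("chain did not mix within t_max steps")


# Hypothetical symmetric, ergodic chain; its stationary distribution is uniform.
M = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
]
pi = [1.0 / 3.0] * 3
tau_coarse = mixing_time(M, pi, 0.1)
tau_fine = mixing_time(M, pi, 0.01)
```

Tightening $\varepsilon$ by a factor of 10 adds only a few steps here, in line with the $\log(\varepsilon^{-1})$ dependence in the definition of rapid mixing.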

  15. Mixing Time and Eigenvalues
      • $M$ stochastic $\implies |\lambda| \le 1$ for all eigenvalues $\lambda$.
      • $M$ lazy (i.e., $m_{i,i} \ge 1/2$ for all $i$) $\implies \lambda \ge 0$ for all eigenvalues $\lambda$.
      • $M$ ergodic $\implies$ multiplicity of eigenvalue 1 is one.
      • $M$ symmetric $\implies$ ONB of eigenvectors.
      Proposition. The mixing time, i.e., the rate of convergence to $\pi$, depends on the second largest eigenvalue.

  17. Coupling for Distributions
      $\mu$, $\nu$ distributions on $\Omega$.
      A distribution $\omega$ on $\Omega \times \Omega$ is a coupling of $\mu$ and $\nu$ $\iff$ $\omega$ has $\mu$ and $\nu$ as marginals, i.e.,
      $\sum_y \omega(x, y) = \mu(x)$ for all $x$ and $\sum_x \omega(x, y) = \nu(y)$ for all $y$.
      Coupling Lemma. If $\omega$ is a coupling of $\mu$ and $\nu$ and $(X, Y)$ is chosen from $\omega$, then $\|\mu - \nu\|_{VD} \le \Pr(X \ne Y)$.

  18. Coupling for Distributions
      Lemma. $\|\mu - \nu\|_{VD} \le \Pr(X \ne Y)$.
      Proof. We use $\mu(z) = \sum_y \omega(z, y) \ge \omega(z, z)$ and $\nu(z) = \sum_x \omega(x, z) \ge \omega(z, z)$.
      $$\begin{aligned}
      \Pr(X \ne Y) &= 1 - \Pr(X = Y) \\
      &= \sum_z \mu(z) - \sum_z \omega(z, z) \\
      &\ge \sum_z \mu(z) - \sum_z \min(\mu(z), \nu(z)) \\
      &= \sum_{z : \nu(z) \le \mu(z)} (\mu(z) - \nu(z)) \\
      &= \max_{A \subset \Omega} (\mu(A) - \nu(A)) = \|\mu - \nu\|_{VD}
      \end{aligned}$$
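
The bound is tight: a standard construction, not spelled out on the slides, puts $\min(\mu(z), \nu(z))$ on the diagonal and spreads the remaining mass proportionally. The sketch below builds this coupling for two made-up distributions and checks both the marginals and $\Pr(X \ne Y)$.

```python
def variation_distance(mu, nu):
    return 0.5 * sum(abs(mu[z] - nu[z]) for z in mu)


def maximal_coupling(mu, nu):
    """Coupling omega of mu and nu with Pr(X != Y) equal to the VD distance."""
    states = list(mu)
    d = variation_distance(mu, nu)
    omega = {(x, y): 0.0 for x in states for y in states}
    for z in states:
        omega[(z, z)] = min(mu[z], nu[z])          # agree as often as possible
    if d > 0:
        for x in states:
            for y in states:
                surplus_x = max(mu[x] - nu[x], 0.0)   # mass mu still has to place
                surplus_y = max(nu[y] - mu[y], 0.0)   # mass nu still has to place
                omega[(x, y)] += surplus_x * surplus_y / d
    return omega


# Hypothetical distributions on three states.
mu = {"a": 0.5, "b": 0.3, "c": 0.2}
nu = {"a": 0.2, "b": 0.3, "c": 0.5}
omega = maximal_coupling(mu, nu)
pr_differ = sum(p for (x, y), p in omega.items() if x != y)
```

For any other coupling of the same pair, `pr_differ` can only be larger, which is the content of the Coupling Lemma.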

  19. Coupling for Markov Chains
      A coupling for $M$ is a sequence $(Z_0, Z_1, Z_2, \dots)$ with $Z_i = (X_i, Y_i)$ such that $(X_0, X_1, X_2, \dots)$ and $(Y_0, Y_1, Y_2, \dots)$ are instances of $M$.
      In particular, $\Pr(X_{i+1} = x' \mid Z_i = (x, y)) = \Pr(X_{i+1} = x' \mid X_i = x) = M(x, x')$.

  20. Coupling and Mixing Times
      $Z_i = (X_i, Y_i)$ a coupling for $M$.
      Theorem [Döblin 1938]. If $\Pr(X_T \ne Y_T \mid Z_0 = (x_0, y_0)) < \varepsilon$ for every initial $(x_0, y_0)$ after $T$ steps, then $\tau(\varepsilon) \le T$.
      Proof. Choose $y_0$ from the stationary distribution $\pi$.
      • $Y_t$ is in the stationary distribution $\pi$ for all $t$.
      • $X_t$ is in distribution $\mu^t_{x_0}$.
      $\Pr(X_T \ne Y_T \mid Z_0 = (x_0, y_0)) < \varepsilon$
      $\implies$ (Coupling Lemma) $\max_x \|\mu^T_x - \pi\|_{VD} < \varepsilon$
      $\implies$ (definition of $\tau$) $\tau(\varepsilon) \le T$

  21. Example: Linear Extensions of Width 2 Orders
      [Figure: a width 2 order on the elements 1, ..., 8 and one of its linear extensions drawn as a lattice path.]
      Linear extensions are paths.
      The Markov chain and the coupling:
      • Choose position $k$ and $s \in \{\uparrow, \downarrow\}$.
      • Flip the path at position $k$ in direction $s$ (if possible).

  22. Linear Extensions of Width 2 Orders: the Analysis
      • $\text{dist}(X, Y)$ = area between the paths $\le n^2$
      • $E(\text{dist}(X_{i+1}, Y_{i+1})) \le \text{dist}(X_i, Y_i)$
      The distance is a projection to a random walk on the line
      $\implies$ expected coupling time $O(n^4 \log n)$
      $\implies \tau(\varepsilon) \in O(n^4 \log n \, \log \varepsilon^{-1})$.
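
The coupling can be simulated in the simplest special case, assumed here for illustration: two disjoint chains of lengths `a` and `b` with no relations between them, so that every monotone lattice path in the `a` × `b` grid is a linear extension. Paths are encoded as 0/1 lists (0 = step right, 1 = step up); the encoding and function names are hypothetical.

```python
import random


def flip(path, k, up):
    """Apply the move (k, s): flip the corner at positions k, k+1 if possible."""
    pair = (path[k], path[k + 1])
    if up and pair == (0, 1):          # right-then-up corner: push it up
        return path[:k] + [1, 0] + path[k + 2:]
    if not up and pair == (1, 0):      # up-then-right corner: push it down
        return path[:k] + [0, 1] + path[k + 2:]
    return path                        # move not possible, stay


def coupling_time(a, b, rng):
    """Run both chains with shared (k, s) from the lowest and highest path."""
    low = [0] * a + [1] * b            # all right steps first
    high = [1] * b + [0] * a           # all up steps first
    steps = 0
    while low != high:
        k = rng.randrange(a + b - 1)
        up = rng.random() < 0.5
        low, high = flip(low, k, up), flip(high, k, up)
        steps += 1
    return steps, low


steps, final_path = coupling_time(3, 3, random.Random(0))
```

Because both paths receive the same `(k, s)`, the lower path never crosses the upper one, and the area between them is the distance used in the analysis.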

  24. Coupling From the Past
      $M$ a Markov chain on $\Omega$.
      $\mathcal{F}$ a family of maps $f : \Omega \to \Omega$ such that for random $f \in \mathcal{F}$: $\Pr(f(x) = x') = M(x, x')$.
      Coupling-FTP
          $F \leftarrow \text{id}_\Omega$
          repeat
              choose $f \in \mathcal{F}$ at random
              $F \leftarrow F \circ f$
          until $F$ is a constant map
          return $F(x)$
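
A direct transcription of Coupling-FTP for a small explicit chain might look as follows. This is a sketch under assumptions: the random map is built from one shared uniform number per step (one of many valid choices of the family), and the 3-state matrix is a made-up example.

```python
import random


def random_map(M, rng):
    """Draw f with Pr(f(x) = y) = M[x][y], using one shared uniform number."""
    u = rng.random()
    f = []
    for row in M:
        acc, target = 0.0, len(row) - 1    # default guards against rounding
        for y, p in enumerate(row):
            acc += p
            if u < acc:
                target = y
                break
        f.append(target)
    return f


def coupling_ftp(M, rng):
    """Compose random maps on the inside until the composition is constant."""
    n = len(M)
    F = list(range(n))                     # F = identity on Omega
    while len(set(F)) > 1:                 # until F is a constant map
        f = random_map(M, rng)
        F = [F[f[x]] for x in range(n)]    # F <- F o f (f is applied first)
    return F[0]


# Hypothetical symmetric, ergodic chain; stationary distribution is uniform.
M = [
    [0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 0.5, 0.5],
]
rng = random.Random(0)
sample = coupling_ftp(M, rng)
```

The order of composition matters: each new map is applied before the ones drawn earlier, which corresponds to extending the simulation further into the past. Storing `F` as a full table is only practical when $\Omega$ is small.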

  26. Coupling From the Past
      Theorem. The state returned by Coupling-FTP is exactly (!) in the stationary distribution.

  28. Monotone Coupling From the Past: An Example
      The problem with CFTP is that it needs the functions $f$ on all of $\Omega$.
      Order relation $\le_\Omega$ on $\Omega$ with $\hat{0}$ and $\hat{1}$
      • $x \le_\Omega x' \implies f(x) \le_\Omega f(x')$ for all $f \in \mathcal{F}$
      Example:
      Objects: lattice paths in a grid
      $\mathcal{F} = \{ f_{k,s} : \text{apply position } k \text{ and direction } s \text{ to all paths} \}$
      This family is monotone!

  30. Distributive Lattices
      Fact. $L$ is a finite distributive lattice $\iff$ there is a poset $P$ such that $L$ is isomorphic to the inclusion order on the downsets of $P$.
      [Figure: a poset $P$ on the elements 1, ..., 6 and its lattice of downsets $L_P$.]

  31. Markov Chains on Distributive Lattices
      A natural Markov chain on $L_P$ (lattice walk). Identify the state with a downset $D$.
      • Choose $x \in P$ and choose $s \in \{\uparrow, \downarrow\}$.
      • Depending on $s$, move to $D + x$ or $D - x$ (if possible).
      Fact. The chain is ergodic and symmetric, i.e., $\pi$ is uniform.

  33. Monotone Coupling on Distributive Lattices
      The coupling family $\mathcal{F}$:
      $f_{x,s}$: use element $x$ and direction $s$ for all $D$.
      It is monotone!
      $\implies$ uniform sampling from distributive lattices is easy.
      Q: Is it fast (rapidly mixing)?
      A: In most cases not.
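
With a monotone family it suffices to track the images of the empty downset and of the full poset. The sketch below follows the usual doubling scheme for monotone coupling from the past (an assumption about the implementation, since the slides only state monotonicity); the poset is given by a set of pairs `(a, b)` meaning $a < b$ that must contain at least all cover relations, and all names are hypothetical.

```python
import random


def apply_move(D, x, up, below, above):
    """f_{x,s}: add x (up) or remove x (down) if the result is still a downset."""
    if up:
        if x not in D and below[x] <= D:
            return D | {x}
    elif x in D and not (above[x] & D):
        return D - {x}
    return D


def monotone_cftp(elements, relations, rng):
    """Return a uniformly random downset of the poset (elements, relations)."""
    below = {x: frozenset(a for a, b in relations if b == x) for x in elements}
    above = {x: frozenset(b for a, b in relations if a == x) for x in elements}
    bottom, top = frozenset(), frozenset(elements)
    moves = []          # moves[k] is the move used at time -(k + 1)
    T = 1
    while True:
        while len(moves) < T:                   # extend further into the past
            moves.append((rng.choice(elements), rng.random() < 0.5))
        lo, hi = bottom, top
        for x, up in reversed(moves):           # run from time -T to time 0
            lo = apply_move(lo, x, up, below, above)
            hi = apply_move(hi, x, up, below, above)
        if lo == hi:                            # every start has coalesced
            return lo
        T *= 2


# Hypothetical poset on {1, 2, 3} with the single relation 1 < 3.
elements = [1, 2, 3]
relations = {(1, 3)}
sample = monotone_cftp(elements, relations, random.Random(1))
```

Earlier moves are kept and reused when `T` doubles; redrawing them would bias the output. The running time is governed by the coalescence time of the two extremal chains, which is where slow mixing shows up in practice.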

  34. Slow Mixing • On distributive lattices based on Kleitman-Rothschild posets the mixing time of the lattice walk is exponential. • The mixing time of the lattice walk is exponential for random bipartite graphs with degrees ≥ 6 . (Dyer, Frieze and Jerrum)

  35. Fast Mixing
      • The mixing time of the lattice walk is polynomial for random bipartite graphs with max-degree $\le 4$. (Dyer and Greenhill)
      In several situations where planarity plays a role, rapid mixing has been proven:
      • Monotone paths in the grid.
      • Lozenge tilings of an $a \times b \times c$ hexagon.
      • Domino tilings of a rectangle.

  37. α-Orientations
      Definition. Given $G = (V, E)$ and $\alpha : V \to \mathbb{N}$. An α-orientation of $G$ is an orientation with $\text{outdeg}(v) = \alpha(v)$ for all $v$.
      Example. Two orientations for the same $\alpha$.
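
The definition translates into a short check. In this sketch the graph is a made-up triangle with $\alpha \equiv 1$, which has exactly two α-orientations (the two cyclic ones); arcs are `(tail, head)` pairs and the function name is hypothetical.

```python
from collections import Counter


def is_alpha_orientation(edges, arcs, alpha):
    """True if `arcs` orients every edge once and outdeg(v) = alpha[v] for all v."""
    as_edges = sorted(tuple(sorted(arc)) for arc in arcs)
    if as_edges != sorted(tuple(sorted(e)) for e in edges):
        return False                       # not an orientation of this graph
    outdeg = Counter(tail for tail, _ in arcs)
    return all(outdeg[v] == a for v, a in alpha.items())


# Hypothetical example: a triangle in which every vertex has outdegree 1.
edges = [(1, 2), (2, 3), (1, 3)]
alpha = {1: 1, 2: 1, 3: 1}
clockwise = [(1, 2), (2, 3), (3, 1)]
counterclockwise = [(2, 1), (3, 2), (1, 3)]
acyclic = [(1, 2), (2, 3), (1, 3)]         # outdegrees 2, 1, 0
```

Here `clockwise` and `counterclockwise` both pass the check while `acyclic` fails, matching the remark that one α can admit several orientations.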

  38. Potentials and Lattice Structure
      Definition. An α-potential for $G$ is a mapping $\wp : \text{Faces}(G) \to \mathbb{Z}$ such that $\wp(\text{outer}) = 0$ and
      • $|\wp(C) - \wp(C')| \le 1$, if $C$ and $C'$ share an edge $e$.
      • $\wp(C_{l(e)}) \le \wp(C_{r(e)})$ for all $e$ relative to some fixed α-orientation.
      Lemma. There is a bijection between α-potentials and α-orientations.
