

  1. What can be sampled locally? Yitong Yin (Nanjing University). Joint work with: Weiming Feng, Yuxin Sun.

  2. Local Computation. “What can be computed locally?” [Naor, Stockmeyer, STOC’93, SICOMP’95]. The LOCAL model [Linial ’87]: • Communication is synchronized. • In each round, each node can send messages of unbounded size to all its neighbors. • Local computation is free. • Complexity: the number of rounds to terminate in the worst case. • In t rounds, each node can collect information up to distance t.

  3. Local Computation. The LOCAL model [Linial ’87]: in t rounds, each node can collect information up to distance t. Locally Checkable Labeling (LCL) problems [Naor, Stockmeyer ’93] are CSPs with local constraints on the network G(V, E): • Construct a feasible solution: vertex/edge coloring, Lovász local lemma. • Find a local optimum: MIS, maximal matching. • Approximate a global optimum: maximum matching, minimum vertex cover, minimum dominating set. Q: “What locally definable problems (defined by local constraints) are locally computable (in O(1) or a small number of rounds)?”

  4. “What can be sampled locally?” A CSP with local constraints on the network G(V, E), e.g.: • proper q-coloring; • independent set. Goal: sample a uniform random feasible solution by distributed algorithms (in the LOCAL model). Q: “What locally definable joint distributions are locally sample-able?”

  5. Markov Random Fields (MRF). On the network G(V, E): • Each vertex v ∈ V corresponds to a variable X_v with finite domain [q]. • Each edge e = (u, v) ∈ E imposes a weighted binary constraint A_e : [q]² → R_{≥0}. • Each vertex v ∈ V imposes a weighted unary constraint b_v : [q] → R_{≥0}. • The random vector X ∈ [q]^V follows the Gibbs distribution µ: ∀σ ∈ [q]^V, µ(σ) ∝ ∏_{e=(u,v)∈E} A_e(σ_u, σ_v) · ∏_{v∈V} b_v(σ_v).

  6. Markov Random Fields (MRF). Gibbs distribution µ on the network G(V, E): ∀σ ∈ [q]^V, µ(σ) ∝ ∏_{e=(u,v)∈E} A_e(σ_u, σ_v) · ∏_{v∈V} b_v(σ_v), and X ∈ [q]^V follows µ. • Proper q-coloring: A_e(i, j) = 1 if i ≠ j and 0 if i = j (the all-ones q×q matrix with zero diagonal); b_v ≡ 1. • Independent set: q = 2, A_e = [[1, 1], [1, 0]], b_v = (1, 1). • Local conflict colorings [Fraigniaud, Heinrich, Kosowski, FOCS’16]: arbitrary A_e ∈ {0, 1}^{q×q}, b_v ∈ {0, 1}^q.
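As a sanity check of the Gibbs distribution above, the following sketch (my own illustration, not from the talk; it assumes one shared edge matrix A and vertex weight b for all edges and vertices) computes the unnormalized weight of a configuration and sums it over all configurations of a triangle under the proper-coloring constraint, recovering the number of proper 3-colorings:

```python
import itertools

def gibbs_weight(edges, A, b, sigma):
    # Unnormalized MRF weight: prod_e A(sigma_u, sigma_v) * prod_v b(sigma_v).
    w = 1.0
    for (u, v) in edges:
        w *= A[sigma[u]][sigma[v]]
    for x in sigma:
        w *= b[x]
    return w

# Proper 3-coloring constraint: A is all-ones minus the identity, b is all-ones.
q = 3
A = [[0.0 if i == j else 1.0 for j in range(q)] for i in range(q)]
b = [1.0] * q
triangle = [(0, 1), (1, 2), (0, 2)]

# Partition function = number of proper 3-colorings of a triangle = 3! = 6.
Z = sum(gibbs_weight(triangle, A, b, s)
        for s in itertools.product(range(q), repeat=3))
print(Z)  # 6.0
```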

  7. A Motivation: Distributed Machine Learning. • Data are stored in a distributed system. • Goal: sample from a probabilistic graphical model (e.g. a Markov random field) by distributed algorithms.

  8. Glauber Dynamics. On G(V, E), starting from an arbitrary X₀ ∈ [q]^V, the transition X_t → X_{t+1} is: pick a uniform random vertex v; resample X(v) according to the marginal distribution induced by µ at vertex v, conditioned on X_t(N(v)). For an MRF, the marginal distribution is Pr[X_v = x | X_{N(v)}] = b_v(x) ∏_{u∈N(v)} A_{(u,v)}(X_u, x) / Σ_{y∈[q]} b_v(y) ∏_{u∈N(v)} A_{(u,v)}(X_u, y). Stationary distribution: the Gibbs distribution µ(σ) ∝ ∏_{e=(u,v)∈E} A_e(σ_u, σ_v) · ∏_{v∈V} b_v(σ_v). Mixing time: τ_mix = max_{X₀} min { t : d_TV(X_t, µ) ≤ 1/(2e) }.
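The transition rule above fits in a few lines. This is my illustration (not the authors' code), simplified to one shared pairwise matrix A and one vertex weight vector b, whereas the slides allow per-edge A_e and per-vertex b_v:

```python
import random

def glauber_step(adj, A, b, X, q):
    # One single-site Glauber transition: pick a uniform random vertex and
    # resample its value from the marginal of mu given its neighbors' values.
    v = random.randrange(len(X))
    w = [b[x] for x in range(q)]
    for u in adj[v]:
        for x in range(q):
            w[x] *= A[X[u]][x]
    if sum(w) > 0:  # resample X[v] from the conditional marginal
        X[v] = random.choices(range(q), weights=w)[0]
    return X
```

For proper q-coloring (A with zero diagonal), an update never assigns v a color currently used by a neighbor, so a proper coloring stays proper.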

  9. Mixing of Glauber Dynamics. Influence matrix {ρ_{v,u}}_{v,u∈V}: ρ_{v,u} is the maximum discrepancy (in total variation distance) between the marginal distributions at v caused by any pair σ, τ of boundary conditions that differ only at u. Dobrushin’s condition: ‖ρ‖_∞ = max_{v∈V} Σ_{u∈V} ρ_{v,u} ≤ 1 − ε, i.e. contraction of the one-step optimal coupling in the worst case w.r.t. Hamming distance. Theorem (Dobrushin ’70; Salas, Sokal ’97): Dobrushin’s condition implies τ_mix = O(n log n) for Glauber dynamics. For q-coloring, q ≥ (2+ε)Δ implies Dobrushin’s condition, where Δ = max degree.

  10. Parallelization. Glauber dynamics on G(V, E): starting from an arbitrary X₀ ∈ [q]^V, the transition X_t → X_{t+1} picks a uniform random vertex v and resamples X(v) from the marginal distribution induced by µ at v, conditioned on X_t(N(v)). Parallelizations: • Chromatic scheduler [folklore] [Gonzalez et al., AISTATS’11]: vertices in the same color class are updated in parallel. • “Hogwild!” [Niu, Recht, Ré, Wright, NIPS’11] [De Sa, Olukotun, Ré, ICML’16]: all vertices are updated in parallel, ignoring concurrency issues.

  11. Warm-up: When Luby Meets Glauber. Starting from an arbitrary X₀ ∈ [q]^V on G(V, E), at each step, for each vertex v ∈ V: (Luby step) independently sample a random number β_v ∈ [0, 1]; if β_v is a local maximum within the neighborhood N(v): (Glauber step) resample X(v) according to the marginal distribution induced by µ at vertex v, conditioned on X_t(N(v)). • The Luby step independently samples a random independent set. • The Glauber step updates the independent-set vertices correctly according to their current marginal distributions. • Stationary distribution: the Gibbs distribution µ.
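One round of the chain can be sketched as follows (my illustration, with the same simplifying assumption of one shared A and b). The local maxima of the β values form an independent set, so the parallel updates never read a vertex that is being written:

```python
import random

def luby_glauber_step(adj, A, b, X, q):
    # Luby step: every vertex draws beta_v uniformly at random;
    # the local maxima form a random independent set.
    n = len(X)
    beta = [random.random() for _ in range(n)]
    winners = [v for v in range(n) if all(beta[v] > beta[u] for u in adj[v])]
    # Glauber step: winners resample in parallel from their marginals,
    # all conditioned on the previous configuration X.
    Y = X[:]
    for v in winners:
        w = [b[x] for x in range(q)]
        for u in adj[v]:
            for x in range(q):
                w[x] *= A[X[u]][x]
        if sum(w) > 0:
            Y[v] = random.choices(range(q), weights=w)[0]
    return Y
```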

  12. Mixing of LubyGlauber. Influence matrix {ρ_{v,u}}_{v,u∈V}; Dobrushin’s condition: ‖ρ‖_∞ = max_{v∈V} Σ_{u∈V} ρ_{v,u} ≤ 1 − ε. Theorem (Dobrushin ’70; Salas, Sokal ’97): Dobrushin’s condition implies τ_mix = O(n log n) for Glauber dynamics. Theorem: Dobrushin’s condition implies τ_mix = O(Δ log n) for the LubyGlauber chain.

  13. Dobrushin’s condition ‖ρ‖_∞ = max_{v∈V} Σ_{u∈V} ρ_{v,u} ≤ 1 − ε implies τ_mix = O(Δ log n) for the LubyGlauber chain. Proof (similar to [Hayes ’04] [Dyer, Goldberg, Jerrum ’06]): in the one-step optimal coupling (X_t, Y_t), let p_v(t) = Pr[X_t(v) ≠ Y_t(v)]. Then p(t+1) ≤ M p(t) entrywise, where M = (I − D) + Dρ and D is diagonal with D_{v,v} = Pr[v is picked in the Luby step] = 1/(deg(v) + 1) ≥ 1/(Δ + 1). Since ‖M‖_∞ ≤ 1 − ε · min_v D_{v,v} ≤ 1 − ε/(Δ + 1), we get Pr[X_t ≠ Y_t] ≤ ‖p(t)‖₁ ≤ n ‖p(t)‖_∞ ≤ n ‖M‖_∞^t ‖p(0)‖_∞ ≤ n (1 − ε/(Δ + 1))^t.

  14. Crossing the Chromatic # Barrier. Glauber: O(n log n); LubyGlauber: O(Δ log n); parallel speedup = Θ(n/Δ). (Δ = max degree; χ = chromatic number.) Both chains never update adjacent vertices simultaneously, so it takes ≥ χ steps to update every vertex at least once. Q: “How to update all variables simultaneously and still converge to the correct distribution?”

  15. The LocalMetropolis Chain. Starting from an arbitrary X ∈ [q]^V (current configuration X_u, X_v, X_w; proposals σ_u, σ_v, σ_w), at each step: • each vertex v ∈ V independently proposes a random σ_v ∈ [q] with probability b_v(σ_v) / Σ_{i∈[q]} b_v(i); • each edge e = (u, v) passes its check independently with probability A_e(X_u, σ_v) · A_e(σ_u, X_v) · A_e(σ_u, σ_v) / max_{i,j∈[q]} A_e(i, j)³ (a collective coin flip made between u and v); • each vertex v ∈ V accepts its proposal and updates X_v to σ_v iff all incident edges pass their checks. • [Feng, Sun, Y. ’17]: the LocalMetropolis chain is time-reversible w.r.t. the MRF Gibbs distribution µ.
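A sketch of one step (my illustration, again assuming one shared A and b; the per-edge check uses the acceptance probability stated above):

```python
import random

def local_metropolis_step(edges, X, A, b, q):
    # Every vertex proposes simultaneously; every edge flips one collective coin.
    n = len(X)
    sigma = [random.choices(range(q), weights=b)[0] for _ in range(n)]
    Amax = max(max(row) for row in A)
    ok = [True] * n
    for (u, v) in edges:
        p = (A[X[u]][sigma[v]] * A[sigma[u]][X[v]]
             * A[sigma[u]][sigma[v]]) / Amax ** 3
        if random.random() >= p:  # the edge fails its check
            ok[u] = ok[v] = False
    # A vertex accepts its proposal iff all its incident edges passed.
    return [sigma[v] if ok[v] else X[v] for v in range(n)]
```

For proper coloring, a failed check rejects both endpoints' proposals, and a passed check certifies that the accepted values conflict with neither the old nor the new value at the other endpoint, so a proper coloring stays proper.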

  16. Detailed Balance Equation: ∀X, Y ∈ [q]^V, µ(X) P(X, Y) = µ(Y) P(Y, X). Let σ ∈ [q]^V denote the proposals of all vertices and C ∈ {0, 1}^E indicate whether each edge e ∈ E passes its check. Define Ω_{X→Y} = { (σ, C) | X → Y when the random choice is (σ, C) }, so that P(X, Y) = Σ_{(σ,C)∈Ω_{X→Y}} Pr(σ) Pr(C | σ, X) and P(Y, X) = Σ_{(σ,C)∈Ω_{Y→X}} Pr(σ) Pr(C | σ, Y). A bijection φ_{X,Y} : Ω_{X→Y} → Ω_{Y→X}, (σ, C) ↦ (σ′, C′), is constructed as: C′ = C; σ′_v = X_v if C_e = 1 for all e incident with v, and σ′_v = σ_v otherwise. Then Pr(σ) Pr(C | σ, X) / (Pr(σ′) Pr(C′ | σ′, Y)) = ∏_{v∈V} b_v(Y_v)/b_v(X_v) · ∏_{e=uv∈E} A_e(Y_u, Y_v)/A_e(X_u, X_v) = µ(Y)/µ(X), which gives detailed balance term by term.
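Detailed balance can also be verified by brute force on a tiny instance. The sketch below (my illustration) enumerates every proposal vector σ and every check pattern C to build the exact LocalMetropolis transition matrix on a single edge carrying the hardcore constraint, then checks µ(X)P(X, Y) = µ(Y)P(Y, X) for all pairs:

```python
import itertools

def exact_transition(edges, A, b, q, n):
    # Enumerate all (sigma, C): sigma = proposals, C = edge pass/fail pattern.
    Amax = max(max(row) for row in A)
    bsum = sum(b)
    states = list(itertools.product(range(q), repeat=n))
    P = {X: {Y: 0.0 for Y in states} for X in states}
    for X in states:
        for sigma in states:
            pr_sigma = 1.0
            for s in sigma:
                pr_sigma *= b[s] / bsum
            for C in itertools.product([0, 1], repeat=len(edges)):
                pr_C, ok = 1.0, [True] * n
                for i, (u, v) in enumerate(edges):
                    p = (A[X[u]][sigma[v]] * A[sigma[u]][X[v]]
                         * A[sigma[u]][sigma[v]]) / Amax ** 3
                    pr_C *= p if C[i] else 1 - p
                    if not C[i]:
                        ok[u] = ok[v] = False
                Y = tuple(sigma[v] if ok[v] else X[v] for v in range(n))
                P[X][Y] += pr_sigma * pr_C
    return P

# Hardcore constraint on a single edge, fugacity 1.
A = [[1.0, 1.0], [1.0, 0.0]]
b = [1.0, 1.0]
P = exact_transition([(0, 1)], A, b, 2, 2)
mu = {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}  # unnormalized
assert all(abs(mu[X] * P[X][Y] - mu[Y] * P[Y][X]) < 1e-12
           for X in P for Y in P)
```

The enumeration is exponential in the instance size, so this only serves as a check on toy graphs, not as a sampler.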

  17. The LocalMetropolis Chain (slide 15 repeated).

  18. LocalMetropolis for the Hardcore Model. The hardcore model on G(V, E) with fugacity λ: for every independent set I in G, µ(I) = λ^{|I|} / Σ_{I′: independent set in G} λ^{|I′|}. Starting from an arbitrary X ∈ {0, 1}^V (1 indicating occupied), at each step each vertex v ∈ V: • proposes a random σ_v ∈ {0, 1} independently, with σ_v = 1 with probability λ/(1+λ) and σ_v = 0 with probability 1/(1+λ); • accepts the proposal and updates X_v to σ_v unless for some neighbor u of v: X_u = σ_v = 1, or σ_u = X_v = 1, or σ_u = σ_v = 1. • For λ < 1/Δ: τ_mix = O(log n), even for unbounded Δ.
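The hardcore special case simplifies to a few lines; this sketch is my illustration of the stated rule:

```python
import random

def hardcore_step(adj, X, lam):
    # One LocalMetropolis step for the hardcore model with fugacity lam.
    n = len(X)
    sigma = [1 if random.random() < lam / (1 + lam) else 0 for _ in range(n)]
    Y = X[:]
    for v in range(n):
        # v is blocked iff some neighbor u triggers one of the three conflicts.
        blocked = any(X[u] == sigma[v] == 1 or sigma[u] == X[v] == 1
                      or sigma[u] == sigma[v] == 1 for u in adj[v])
        if not blocked:
            Y[v] = sigma[v]
    return Y
```

If the current configuration is an independent set, the three conflict rules guarantee the next configuration is one as well.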
