

  1. Local Distributed Sampling from Locally-Defined Distributions. Yitong Yin, Nanjing University.

  2. Counting and Sampling. [Jerrum-Valiant-Vazirani '86]: for self-reducible problems, approx. counting is tractable if and only if (approx. or exact) sampling is tractable.

  3. Computational Phase Transition. Sampling an almost-uniform independent set in graphs with maximum degree ∆:
  • [Weitz 2006]: if ∆ ≤ 5, there is a poly-time algorithm.
  • [Sly 2010]: if ∆ ≥ 6, there is no poly-time algorithm unless NP = RP.
  A computational phase transition occurs when ∆ goes from 5 to 6. Does the same picture hold for local computation?

  4. Local Computation. "What can be computed locally?" [Naor, Stockmeyer '93]. The LOCAL model [Linial '87]:
  • Communication is synchronized.
  • In each round, each node can exchange unbounded messages with all neighbors, perform unbounded local computation, and read/write unbounded local memory.
  • Complexity: the number of rounds to terminate in the worst case.
  • In t rounds, each node can collect information up to distance t.
  PLOCAL: t = polylog(n).
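The last point is the defining feature of the model: a t-round LOCAL algorithm computes a function of each node's distance-t ball. A minimal sketch of gathering that ball (the adjacency-dict graph format and the function name are my own illustrative choices, not from the talk):

```python
from collections import deque

def collect_ball(graph, v, t):
    """Return the set of nodes within distance t of v: exactly the
    information v can gather in t rounds of the LOCAL model."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == t:
            continue
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)

# Example: a path 0-1-2-3-4; node 2 sees everything within 2 hops.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(collect_ball(graph, 2, 2))  # {0, 1, 2, 3, 4}
```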

  5. A Motivation: Distributed Machine Learning.
  • Data are stored in a distributed system.
  • We want distributed algorithms for: sampling from a joint distribution (specified by a probabilistic graphical model); and inference in a probabilistic graphical model.

  6. Example: Sampling an Independent Set. Network G(V, E). Let µ be the uniform distribution over independent sets in G, where Y ∈ {0,1}^V indicates an independent set.
  • Each v ∈ V returns a Y_v ∈ {0,1} such that Y = (Y_v)_{v∈V} ∼ µ;
  • or at least d_TV(Y, µ) < 1/poly(n).

  7. Inference (Local Counting). Network G(V, E). Let µ be the uniform distribution over independent sets in G, and let µ_v^σ be the marginal distribution at v conditioned on σ ∈ {0,1}^S:
  ∀ y ∈ {0,1}: µ_v^σ(y) = Pr_{Y∼µ}[Y_v = y | Y_S = σ].
  • Each v ∈ S receives σ_v as input.
  • Each v ∈ V returns a marginal distribution µ̂_v^σ such that d_TV(µ̂_v^σ, µ_v^σ) ≤ 1/poly(n).
  Inference yields counting: if Z is the number of independent sets in G, then
  1/Z = µ(∅) = ∏_{i=1}^n Pr_{Y∼µ}[Y_{v_i} = 0 | ∀ j < i: Y_{v_j} = 0],
  so Z is recovered from a telescoping product of conditional marginals (verified in the sketch below).
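A brute-force check of that telescoping identity on a toy instance (the 4-cycle is my own choice; everything else follows the formula above):

```python
from itertools import product

# Toy instance: a 4-cycle.
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0)]

def is_independent(y):
    return all(not (y[u] and y[v]) for u, v in E)

ind_sets = [y for y in product([0, 1], repeat=len(V)) if is_independent(y)]
Z = len(ind_sets)

# Telescoping product: 1/Z = prod_i Pr[Y_{v_i}=0 | Y_{v_1},...,Y_{v_{i-1}} all 0]
prob = 1.0
for i in range(len(V)):
    cond = [y for y in ind_sets if all(y[j] == 0 for j in range(i))]
    prob *= sum(1 for y in cond if y[i] == 0) / len(cond)

print(Z, prob, 1 / Z)  # prob equals 1/Z, so Z is recovered from marginals
```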

  8. Decay of Correlation. Again µ_v^σ is the marginal distribution at v conditioned on σ ∈ {0,1}^S. Strong spatial mixing (SSM): for every boundary condition B ∈ {0,1}^{r-sphere(v)},
  d_TV(µ_v^σ, µ_v^{σ,B}) ≤ poly(n) · exp(−Ω(r)).
  For the uniform distribution µ over independent sets of G, SSM holds iff ∆ ≤ 5. SSM implies that approx. inference is solvable in O(log n) rounds in the LOCAL model. A numeric illustration of the decay follows.
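A small numeric illustration of decay of correlation (my own construction, not from the talk): on a path, pin the far endpoint to each of its two states and watch the distance between the induced marginals at the near endpoint shrink exponentially in the distance r. Paths have maximum degree 2 ≤ 5, so SSM holds.

```python
from itertools import product

def marginal_at_0(r, boundary):
    """Pr[Y_0 = 1] for uniform independent sets of the path 0-1-...-r,
    conditioned on Y_r = boundary (brute force)."""
    configs = [y for y in product([0, 1], repeat=r + 1)
               if all(not (y[i] and y[i + 1]) for i in range(r))
               and y[r] == boundary]
    return sum(y[0] for y in configs) / len(configs)

for r in range(2, 11):
    # for a binary marginal, d_TV is just the difference of the two Pr[Y_0 = 1]
    dtv = abs(marginal_at_0(r, 0) - marginal_at_0(r, 1))
    print(r, dtv)  # decays exponentially in r
```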

  9. Gibbs Distribution (with pairwise interactions). Network G(V, E):
  • Each vertex corresponds to a variable with finite domain [q].
  • Each edge e = (u, v) ∈ E has a matrix (binary constraint) A_e: [q] × [q] → [0,1].
  • Each vertex v ∈ V has a vector (unary constraint) b_v: [q] → [0,1].
  • Gibbs distribution µ: ∀ σ ∈ [q]^V,
  µ(σ) ∝ ∏_{e=(u,v)∈E} A_e(σ_u, σ_v) · ∏_{v∈V} b_v(σ_v).
  A small evaluator of this weight appears below.
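A direct sketch of the definition (the triangle instance is my own toy example; the weight function is exactly the product above):

```python
from itertools import product
import numpy as np

def gibbs_weight(sigma, edges, A, b):
    """Unnormalized weight: prod_e A_e(sigma_u, sigma_v) * prod_v b_v(sigma_v)."""
    w = 1.0
    for e, (u, v) in enumerate(edges):
        w *= A[e][sigma[u], sigma[v]]
    for v, s in enumerate(sigma):
        w *= b[v][s]
    return w

# Independent sets of a triangle as a pairwise Gibbs distribution (q = 2):
edges = [(0, 1), (1, 2), (0, 2)]
A_ind = np.array([[1, 1], [1, 0]])   # forbid two adjacent 1s
b_one = np.array([1, 1])
A = [A_ind] * len(edges)
b = [b_one] * 3

Z = sum(gibbs_weight(s, edges, A, b) for s in product(range(2), repeat=3))
print(Z)  # 4: the empty set and the three singletons
```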

  10. Gibbs Distribution (with pairwise interactions). Recall µ(σ) ∝ ∏_{e=(u,v)∈E} A_e(σ_u, σ_v) · ∏_{v∈V} b_v(σ_v), with A_e: [q] × [q] → [0,1] and b_v: [q] → [0,1]. Two examples:
  • Independent set: q = 2, A_e = [[1, 1], [1, 0]] (forbidding two adjacent 1s), b_v = [1, 1].
  • Proper q-coloring: A_e is the q × q matrix with 0 on the diagonal and 1 everywhere else, and b_v is the all-ones vector.

  11. Gibbs Distribution. Network G(V, E). More generally, the Gibbs distribution µ is given by
  µ(σ) ∝ ∏_{(f,S)∈F} f(σ_S) for all σ ∈ [q]^V,
  where each (f, S) ∈ F is a local constraint (factor): f: [q]^S → R_{≥0} with S ⊆ V and diam_G(S) = O(1).

  12. Locality of Counting & Sampling. For Gibbs distributions (defined by local factors):
  • Correlation decay (SSM) implies local approx. inference with additive error.
  • Local approx. inference with additive error yields local approx. sampling, at an O(log² n) factor overhead.
  • Local approx. inference with multiplicative error is equivalent to local exact sampling (a distributed Las Vegas sampler).

  13. Locality of Sampling. Correlation decay (SSM) implies local approx. inference, which implies local approx. sampling.
  • Inference: each v can compute an estimate µ̂_v^σ within an O(log n)-ball such that d_TV(µ̂_v^σ, µ_v^σ) ≤ 1/poly(n).
  • Sampling: return a random Y = (Y_v)_{v∈V} whose distribution µ̂ satisfies d_TV(µ̂, µ) ≤ 1/poly(n).
  Sequential O(log n)-local procedure (sketched in code below):
  • scan the vertices of V in an arbitrary order v_1, v_2, …, v_n;
  • for i = 1, 2, …, n: sample Y_{v_i} according to µ̂_{v_i}^{Y_{v_1},…,Y_{v_{i−1}}}.
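A sketch of the sequential procedure on a toy instance. For transparency the conditional marginals are computed exactly by brute force; in the talk they would be the O(log n)-local estimates µ̂. The path instance is my own choice.

```python
import random
from itertools import product

V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3)]

def is_independent(y):
    return all(not (y[u] and y[v]) for u, v in E)

def conditional_marginal(v, assigned):
    """Pr[Y_v = 1 | Y_u = assigned[u] for all assigned u], under uniform mu."""
    configs = [y for y in product([0, 1], repeat=len(V))
               if is_independent(y)
               and all(y[u] == s for u, s in assigned.items())]
    return sum(y[v] for y in configs) / len(configs)

def sequential_sample():
    assigned = {}
    for v in V:                       # arbitrary order v_1, ..., v_n
        p = conditional_marginal(v, assigned)
        assigned[v] = 1 if random.random() < p else 0
    return assigned

print(sequential_sample())            # an exactly uniform independent set
```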

  14. Network Decomposition. A (C, D)-network-decomposition of G:
  • classifies vertices into clusters;
  • assigns each cluster a color in [C];
  • each cluster has diameter ≤ D;
  • clusters are properly colored.
  A (C, D)^r-ND is a (C, D)-ND of the power graph G^r. Given a (C, D)^r-ND with r = O(log n), the sequential r-local procedure above (scan vertices in an arbitrary order v_1, v_2, …, v_n; for i = 1, 2, …, n, sample Y_{v_i} according to µ̂_{v_i}^{Y_{v_1},…,Y_{v_{i−1}}}) can be simulated in O(CDr) rounds in the LOCAL model.

  15. Network Decomposition. With C = O(log n) colors and cluster diameter D = O(log n), an (O(log n), O(log n))^r-ND can be constructed in O(r log² n) rounds w.h.p. [Linial, Saks '93]. [Ghaffari, Kuhn, Maus '17]: an r-local SLOCAL algorithm that, for every ordering π = (v_1, v_2, …, v_n), returns a random vector Y(π), can be simulated by an O(r log² n)-round LOCAL algorithm that w.h.p. returns Y(π) for some ordering π.

  16. Locality of Sampling. Putting it together:
  • SSM gives O(log n)-round local approx. inference with additive error, which gives O(log³ n)-round local approx. sampling.
  • Local approx. inference with multiplicative error remains equivalent to local exact sampling (a distributed Las Vegas sampler).

  17. An LLL-like Framework. Independent random variables X_1, …, X_n with domain Ω; A: a set of bad events. Each A ∈ A is associated with a variable set vbl(A) ⊆ [n] and a function q_A: Ω^{vbl(A)} → [0,1] (the variable framework of the Lovász local lemma).
  Rejection sampling (with conditionally mutually independent filters):
  • X_1, …, X_n are drawn independently;
  • each A ∈ A occurs independently with prob. 1 − q_A(X_{vbl(A)});
  • the sample is accepted if none of the A ∈ A occurs.
  Target distribution D*: X_1, …, X_n conditioned on being accepted.
  Partial rejection sampling [Guo-Jerrum-Liu '17] resamples not all variables. Can we, as in Moser-Tardos, resample only the variables local to the errors? (A plain rejection-sampling sketch follows; the local variant is on the next slide.)
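A minimal sketch of plain rejection sampling with soft filters. The instance (two fair bits, one filter favoring agreement) is my own toy example; the accept/reject logic is exactly the scheme above.

```python
import random

def sample_D_star(q_filters, sample_vars, max_tries=10_000):
    """Draw X ~ D*: independent variables conditioned on no bad event,
    where bad event A occurs with prob. 1 - q_A(X_vbl(A))."""
    for _ in range(max_tries):
        X = sample_vars()
        if all(random.random() < q(X) for q in q_filters):
            return X                  # accepted: no bad event occurred
    raise RuntimeError("no acceptance within max_tries")

sample_vars = lambda: [random.randint(0, 1), random.randint(0, 1)]
q_filters = [lambda X: 1.0 if X[0] == X[1] else 0.25]   # one soft constraint
counts = {}
for _ in range(20_000):
    x = tuple(sample_D_star(q_filters, sample_vars))
    counts[x] = counts.get(x, 0) + 1
print(counts)  # agreeing pairs appear ~4x as often as disagreeing ones
```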

  18. Local Rejection Sampling.
  • Draw independent samples of X = (X_1, …, X_n);
  • each A ∈ A occurs (is violated) independently with Pr[A] = 1 − q_A(X_{vbl(A)});
  • while there is a violated bad event A ∈ A:
    • X_old ← current X;
    • resample all variables in vbl(A) for every violated A;
    • each violated A is violated again with Pr[A] = 1 − q_A(X_{vbl(A)});
    • each non-violated A that shares variables with a violated event becomes violated with
      Pr[A] = 1 − q_A* · q_A(X_{vbl(A)}) / q_A(X_old_{vbl(A)}),
      where q_A* is a worst-case lower bound for q_A: ∀ X_{vbl(A)}: q_A(X_{vbl(A)}) ≥ q_A* > 0 (soft filters).
  Upon termination, (X_1, …, X_n) ∼ D* (the target distribution), by a resampling table argument. Only the variables local to the violated events are resampled, and this works even for dynamic filters. A sketch follows.
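A sketch implementing the resampling rule above, assuming independent fair bits as the variables; the three-bit instance with two overlapping filters is my own toy example, chosen so that the corrected-filter branch for neighbors actually fires.

```python
import random

def local_rejection_sampling(n, events):
    """events: list of (vbl, q, q_star), with vbl a tuple of variable indices,
    q a filter on the restricted assignment, q_star a lower bound on q.
    Returns X ~ D*: independent fair bits conditioned on no bad event."""
    restrict = lambda X, vbl: tuple(X[i] for i in vbl)
    X = [random.randint(0, 1) for _ in range(n)]
    violated = {a for a, (vbl, q, _) in enumerate(events)
                if random.random() > q(restrict(X, vbl))}
    while violated:
        X_old = X[:]
        resampled = {i for a in violated for i in events[a][0]}
        for i in resampled:
            X[i] = random.randint(0, 1)    # resample only local variables
        new_violated = set()
        for a, (vbl, q, q_star) in enumerate(events):
            if a in violated:
                # violated events are re-checked with the plain filter
                if random.random() > q(restrict(X, vbl)):
                    new_violated.add(a)
            elif resampled & set(vbl):
                # neighbors of violated events get the corrected filter
                p = q_star * q(restrict(X, vbl)) / q(restrict(X_old, vbl))
                if random.random() > p:
                    new_violated.add(a)
        violated = new_violated
    return tuple(X)

# Toy instance: 3 fair bits; each edge filter softly prefers agreement.
q_edge = lambda s: 1.0 if s[0] == s[1] else 0.5
events = [((0, 1), q_edge, 0.5), ((1, 2), q_edge, 0.5)]
counts = {}
for _ in range(30_000):
    x = local_rejection_sampling(3, events)
    counts[x] = counts.get(x, 0) + 1
print(sorted(counts.items()))  # frequencies ∝ q(x0,x1) * q(x1,x2), per the slide
```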

  19. Local Ising Sampler. Ising model with edge activity β, 0 < β < 1, and external field λ > 0: b = (λ, 1), with A = [[1, β], [β, 1]] in the ferromagnetic case and A = [[β, 1], [1, β]] in the anti-ferromagnetic case. The sampler:
  • each vertex v ∈ V independently samples a spin σ_v ∈ {0,1} with probability ∝ b;
  • each edge e = (u, v) ∈ E fails independently with prob. 1 − A(σ_u, σ_v);
  • while there is a failed edge:
    • σ_old ← current σ;
    • resample σ_v for all vertices v involved in failed edges;
    • each failed e = (u, v) is revived independently with prob. A(σ_u, σ_v);
    • each non-failed e = (u, v) incident to a failed edge fails independently with prob. 1 − β · A(σ_u, σ_v) / A(σ_u^old, σ_v^old).
  Pros: local and parallel; handles dynamic graphs; an exact sampler; handles soft constraints; certifiable termination. Cons: convergence is hard to analyze; the known regime β > 1 − Θ(1/∆) is not tight. A sketch follows.
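A sketch of the ferromagnetic case, instantiating local rejection sampling with q_e = A(σ_u, σ_v) and q_e* = β. The 5-cycle, the parameter values, and the convention that spin 1 carries field weight λ are my own assumptions.

```python
import random

def local_ising_sampler(V, E, beta, lam):
    """Local rejection sampling for ferromagnetic Ising:
    A(s, t) = 1 if s == t else beta (0 < beta < 1), field b = (lam, 1)."""
    A = lambda s, t: 1.0 if s == t else beta
    draw = lambda: 1 if random.random() < lam / (lam + 1) else 0  # spin ∝ b
    sigma = {v: draw() for v in V}
    failed = {e for e in E if random.random() > A(sigma[e[0]], sigma[e[1]])}
    while failed:
        old = dict(sigma)
        touched = {v for e in failed for v in e}
        for v in touched:
            sigma[v] = draw()                 # resample around failures only
        new_failed = set()
        for (u, v) in E:
            a = A(sigma[u], sigma[v])
            if (u, v) in failed:
                if random.random() > a:       # failed edge: revived w.p. a
                    new_failed.add((u, v))
            elif u in touched or v in touched:
                # non-failed neighbor of a failure: corrected filter
                if random.random() > beta * a / A(old[u], old[v]):
                    new_failed.add((u, v))
        failed = new_failed
    return sigma

# Example: a 5-cycle at high temperature (beta close to 1), unbiased field.
V = range(5)
E = [(i, (i + 1) % 5) for i in range(5)]
print(local_ising_sampler(V, E, beta=0.9, lam=1.0))
```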

  20. Locality of Sampling. For Gibbs distributions (distributions defined by local factors), the picture so far:
  • Correlation decay (SSM) implies local approx. inference with additive error, which yields local approx. sampling.
  • Local approx. inference with multiplicative error is equivalent to local exact sampling (a distributed Las Vegas sampler).

  21. Jerrum-Valiant-Vazirani Sampler. [Jerrum-Valiant-Vazirani '86]: there is an efficient algorithm that samples from a distribution µ̂ and evaluates µ̂(σ) for any given σ ∈ {0,1}^V, with multiplicative error:
  ∀ σ ∈ {0,1}^V: e^{−1/n²} ≤ µ̂(σ)/µ(σ) ≤ e^{1/n²}.
  Self-reduction:
  µ(σ) = ∏_{i=1}^n µ_{v_i}^{σ_1,…,σ_{i−1}}(σ_i) = ∏_{i=1}^n Z(σ_1,…,σ_i) / Z(σ_1,…,σ_{i−1}).
  Let µ̂_{v_i}^{σ_1,…,σ_{i−1}}(σ_i) = Ẑ(σ_1,…,σ_i) / Ẑ(σ_1,…,σ_{i−1}), where e^{−1/2n³} ≤ Ẑ(···)/Z(···) ≤ e^{1/2n³} by approx. counting, so that µ̂_{v_i}^{σ_1,…,σ_{i−1}}(σ_i) = e^{±1/n³} · µ_{v_i}^{σ_1,…,σ_{i−1}}(σ_i). A sketch of the self-reduction follows.
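A sketch of the JVV self-reduction on a toy instance: sample σ coordinate by coordinate, with each conditional marginal computed as a ratio of counts Z(σ_1…σ_i)/Z(σ_1…σ_{i−1}). Here the counts are exact brute-force stand-ins for the approximate counter Ẑ; the 4-cycle is my own choice.

```python
import random
from itertools import product

V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0)]

def Z(prefix):
    """Number of independent sets consistent with the pinned prefix."""
    k = len(prefix)
    return sum(1 for y in product([0, 1], repeat=len(V))
               if all(not (y[u] and y[v]) for u, v in E)
               and all(y[j] == prefix[j] for j in range(k)))

def jvv_sample():
    prefix = []
    for _ in V:
        p1 = Z(prefix + [1]) / Z(prefix)   # Pr[sigma_i = 1 | prefix]
        prefix.append(1 if random.random() < p1 else 0)
    return tuple(prefix)

print(jvv_sample())  # a uniformly random independent set of the 4-cycle
```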
