Sampling & Counting for Big Data 2019

  1. Sampling & Counting for Big Data (August 3, 2019)

  2. Sampling vs Counting
  For all self-reducible problems [Jerrum-Valiant-Vazirani '86], the following are poly-time inter-reducible (e.g. on a Turing machine):
  • approx counting: estimate vol(Ω) (the partition function);
  • approx sampling: generate X = (X_1, X_2, …, X_n) ∼ Ω;
  • approx inference: estimate the marginals Pr[X_i = ⋅ ∣ X_S = σ].

  3. MCMC Sampling
  Markov chain for sampling X = (X_1, X_2, …, X_n) ∼ μ.
  • Gibbs sampling (Glauber dynamics, heat-bath) [Glauber '63] [Geman, Geman '84]:
    pick a random vertex v; resample X_v ∼ μ_v(· | X_{N(v)}).
  • Metropolis-Hastings algorithm [Metropolis et al. '53] [Hastings '70]:
    pick a random vertex v; propose a random value c; set X_v = c w.p. ∝ μ(X')/μ(X).
  • Analysis: coupling methods [Aldous '83] [Jerrum '95] [Bubley, Dyer '97]
    may give an O(n log n) upper bound for the mixing time.
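Below is a minimal Python sketch of these two single-site chains for a generic MRF; the data layout and the names `graph`, `nu`, `phi`, `resample` are illustrative assumptions, not from the talk.

```python
import random

# Assumed (hypothetical) encoding: graph[v] = list of neighbors, q = number of states,
# nu[v][c] = vertex weight, phi[frozenset((u, v))][a][b] = symmetric edge weight in [0, 1].

def resample(X, v, graph, q, nu, phi):
    """Heat-bath update: resample X[v] from its conditional distribution given X on N(v)."""
    weights = []
    for c in range(q):
        w = nu[v][c]
        for u in graph[v]:
            w *= phi[frozenset((u, v))][c][X[u]]
        weights.append(w)
    X[v] = random.choices(range(q), weights=weights)[0]

def gibbs_step(X, graph, q, nu, phi):
    """One step of Gibbs sampling (Glauber dynamics): pick a random vertex and resample it."""
    resample(X, random.choice(list(graph)), graph, q, nu, phi)

def metropolis_hastings_step(X, graph, q, nu, phi):
    """One Metropolis-Hastings step with a uniform proposal; assumes the current X
    has positive weight, so the ratio mu(X')/mu(X) is well defined."""
    v = random.choice(list(graph))
    c = random.randrange(q)                   # propose a random value c for X[v]
    ratio = nu[v][c] / nu[v][X[v]]
    for u in graph[v]:                        # mu(X')/mu(X) only involves edges incident to v
        e = frozenset((u, v))
        ratio *= phi[e][c][X[u]] / phi[e][X[v]][X[u]]
    if random.random() < min(1.0, ratio):     # accept with probability min(1, mu(X')/mu(X))
        X[v] = c
```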

  4. Computational Phase Transition
  Hardcore model: graph G(V, E), max-degree Δ, fugacity λ > 0;
  approx sample an independent set I in G w.p. ∝ λ^|I|.
  Critical threshold: λ_c(Δ) = (Δ−1)^(Δ−1) / (Δ−2)^Δ.
  • [Weitz, STOC '06]: if λ < λ_c, n^{O(log Δ)} time.
  • [Sly, FOCS '10 best paper]: if λ > λ_c, NP-hard even for Δ = O(1).
  • [Efthymiou, Hayes, Štefankovič, Vigoda, Y., FOCS '16]: if λ < λ_c, O(n log n) mixing time,
    provided Δ is large enough and there is no small cycle.
  (Figure: phase diagram of λ vs. max-degree Δ; the region above λ_c is Hard, below is Easy.)
  A phase transition occurs at λ_c.
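For reference, a tiny Python helper for the threshold on this slide; the function name is mine, not from the talk.

```python
def lambda_c(delta: int) -> float:
    """Critical fugacity lambda_c(Delta) = (Delta-1)^(Delta-1) / (Delta-2)^Delta, for Delta >= 3."""
    return (delta - 1) ** (delta - 1) / (delta - 2) ** delta

# Examples: lambda_c(3) = 4.0, lambda_c(4) = 1.6875.
# lambda > lambda_c(Delta) is the "Hard" region of the phase diagram, lambda < lambda_c the "Easy" one.
```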

  5. Big Data?

  6. Sampling and Inference for Big Data • Sampling from a joint distribution (specified by a probabilistic graphical model ). • Inferring according to a probabilistic graphical model. • The data ( probabilistic graphical model ) is BIG.

  7. • Parallel/distributed algorithms for sampling? ✓ PTIME ⟹ polylog(n) rounds
  • For parallel/distributed computing: sampling ≡ approx counting/inference? ✓ PTIME ⟹ polylog(n) rounds
  • Dynamic sampling algorithms? ✓ PTIME ⟹ polylog(n) incremental cost

  8. Local Computation
  "What can be computed locally?" [Naor, Stockmeyer, STOC '93, SICOMP '95]
  The LOCAL model [Linial '87]:
  • Communications are synchronized.
  • In each round: unlimited local computation and communication with neighbors.
  • Complexity: # of rounds to terminate in the worst case.
  • In t rounds: each node can collect information up to distance t.
  PLOCAL: t = polylog(n).

  9. “What can be sampled locally?”
  • Joint distribution defined by local constraints: Markov random field / graphical model.
  • Sample a random solution from the joint distribution with distributed algorithms (in the LOCAL model) on the network G(V, E).
  Q: “What locally definable joint distributions are locally sampleable?”

  10. MCMC Sampling
  Classic MCMC sampling on G(V, E): Markov chain X_t → X_{t+1}:
  pick a uniform random vertex v; update X(v) conditioning on X(N(v));
  O(n log n) time when mixing.
  Parallelization:
  • Chromatic scheduler [folklore] [Gonzalez et al., AISTATS '11]: vertices in the same color class are updated in parallel; O(Δ log n) mixing time (Δ is the max degree). See the sketch below.
  • “Hogwild!” [Niu, Recht, Ré, Wright, NIPS '11] [De Sa, Olukotun, Ré, ICML '16]: all vertices are updated in parallel, ignoring concurrency issues. Wrong distribution!
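A sketch of the chromatic scheduler, reusing the hypothetical `resample` helper and encoding from the earlier MCMC sketch; `color` is an assumed proper vertex coloring.

```python
def chromatic_sweep(X, graph, q, nu, phi, color):
    """One sweep of the chromatic scheduler: same-colored vertices are pairwise
    non-adjacent, so their heat-bath updates do not interact and could run in
    parallel; here each color class is simply processed in a loop."""
    classes = {}
    for v, c in color.items():
        classes.setdefault(c, []).append(v)
    for cls in classes.values():      # one parallel round per color class
        for v in cls:                 # these updates are independent of each other
            resample(X, v, graph, q, nu, phi)
```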

  11. Crossing the Chromatic # Barrier
  Sequential: O(n log n); parallel: O(Δ log n); parallel speedup = Θ(n/Δ).
  (Δ = max-degree, χ = chromatic number.)
  Do not update adjacent vertices simultaneously: it takes ≥ χ steps to update all vertices at least once.
  Q: “How to update all variables simultaneously and still converge to the correct distribution?”

  12. Markov Random Fields (MRF)
  ∀σ ∈ [q]^V: μ(σ) ∝ ∏_{v∈V} ν_v(σ_v) · ∏_{e=(u,v)∈E} φ_e(σ_u, σ_v)
  • Each vertex v ∈ V: a variable X_v over domain [q] with distribution ν_v.
  • Each edge e = (u, v) ∈ E: a symmetric binary constraint φ_e: [q] × [q] → [0,1].
  (Defined on the network G(V, E).)
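The MRF weight above, written out as a short Python function under the same hypothetical encoding as the earlier sketches.

```python
def mrf_weight(sigma, graph, nu, phi):
    """Unnormalized weight of a configuration sigma:
    prod_v nu_v(sigma_v) * prod_{e=(u,v)} phi_e(sigma_u, sigma_v)."""
    w = 1.0
    for v in graph:                              # vertex factors
        w *= nu[v][sigma[v]]
    done = set()
    for u in graph:                              # each edge counted exactly once
        for v in graph[u]:
            e = frozenset((u, v))
            if e not in done:
                done.add(e)
                w *= phi[e][sigma[u]][sigma[v]]
    return w
```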

  13. The Local-Metropolis Algorithm [Feng, Sun, Y., What can be sampled locally? PODC '17]
  Current configuration: (X_u, X_v, X_w, …); proposals: (σ_u, σ_v, σ_w, …).
  Markov chain X_t → X_{t+1}:
  • each vertex v ∈ V independently proposes a random σ_v ∼ ν_v;
  • each edge e = (u, v) passes its check independently with prob. φ_e(X_u, σ_v) · φ_e(σ_u, X_v) · φ_e(σ_u, σ_v);
  • each vertex v ∈ V updates X_v to σ_v if all its edges pass their checks.
  • Local-Metropolis converges to the correct distribution μ.
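A minimal sketch of one synchronous round, following the three steps listed above (same hypothetical encoding as before); in a real LOCAL implementation each vertex and edge would perform its step with only local communication.

```python
import random

def local_metropolis_round(X, graph, q, nu, phi):
    """One round of Local-Metropolis: all vertices propose, all edges check, all vertices update."""
    # every vertex independently proposes sigma_v ~ nu_v
    sigma = {v: random.choices(range(q), weights=nu[v])[0] for v in graph}
    # every edge e = (u, v) passes its check independently with probability
    # phi_e(X_u, sigma_v) * phi_e(sigma_u, X_v) * phi_e(sigma_u, sigma_v)
    passed = {}
    for u in graph:
        for v in graph[u]:
            e = frozenset((u, v))
            if e not in passed:
                p = phi[e][X[u]][sigma[v]] * phi[e][sigma[u]][X[v]] * phi[e][sigma[u]][sigma[v]]
                passed[e] = random.random() < p
    # a vertex adopts its proposal iff all of its incident edges passed their checks
    for v in graph:
        if all(passed[frozenset((u, v))] for u in graph[v]):
            X[v] = sigma[v]
```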

  14. The Local-Metropolis Algorithm [Feng, Sun, Y., What can be sampled locally? PODC '17]
  • each vertex v ∈ V independently proposes a random σ_v ∼ ν_v;
  • each edge e = (u, v) passes its check independently with prob. φ_e(X_u, σ_v) · φ_e(σ_u, X_v) · φ_e(σ_u, σ_v);
  • each vertex v ∈ V updates X_v to σ_v if all its edges pass their checks.
  • Local-Metropolis converges to the correct distribution μ of the MRF μ(σ) ∝ ∏_{v∈V} ν_v(σ_v) · ∏_{e=(u,v)∈E} φ_e(σ_u, σ_v).
  • Under a coupling condition for Metropolis-Hastings:
    Metropolis-Hastings: O(n log n) time; (lazy) Local-Metropolis: O(log n) time.

  15. Lower Bounds [Feng, Sun, Y., What can be sampled locally? PODC '17]
  Approx sampling from any MRF requires Ω(log n) rounds.
  • For sampling, O(log n) is the new criterion of “local”.
  If λ > λ_c(Δ) = (Δ−1)^(Δ−1) / (Δ−2)^Δ, sampling from the hardcore model requires Ω(diam) rounds.
  Strong separation: sampling vs. other local computation tasks.
  • An independent set is trivial to construct locally (e.g. ∅).
  • The lower bound holds not because of the locality of information, but because of the locality of correlation.
  (Figure: the same λ vs. max-degree Δ phase diagram, Hard above λ_c and Easy below.)

  16. • Parallel/distributed algorithms for sampling? ✓ PTIME ⟹ polylog(n) rounds
  • For parallel/distributed computing: sampling ≡ approx counting/inference? ✓ PTIME ⟹ polylog(n) rounds
  • Dynamic sampling algorithms? ✓ PTIME ⟹ polylog(n) incremental cost

  17. Example: Sample Independent Set (hardcore model)
  μ: distribution over independent sets I in G, with μ(I) ∝ λ^|I|.
  • Y ∈ {0,1}^V indicates an independent set.
  • Each v ∈ V returns a Y_v ∈ {0,1}, such that Y = (Y_v)_{v∈V} ∼ μ,
    or: d_TV(Y, μ) < 1/poly(n).
  (On the network G(V, E).)

  18. Inference (Local Counting)
  μ: distribution over independent sets I in G, with μ(I) ∝ λ^|I|.
  μ_v^σ: marginal distribution at v conditioning on σ ∈ {0,1}^S:
    ∀y ∈ {0,1}: μ_v^σ(y) = Pr_{Y∼μ}[Y_v = y | Y_S = σ].
  • Each v ∈ S receives σ_v as input.
  • Each v ∈ V returns a marginal distribution μ̂_v^σ such that d_TV(μ̂_v^σ, μ_v^σ) ≤ 1/poly(n).
  Counting from inference:
    1/Z = μ(∅) = ∏_{i=1}^n Pr_{Y∼μ}[Y_{v_i} = 0 | ∀j < i: Y_{v_j} = 0],
  where Z is the partition function (counting).
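The telescoping identity above turns marginal inference into counting; a short sketch, where `marginal_zero` is a hypothetical oracle for the conditional marginals (e.g. the local inference routine).

```python
def partition_function(vertices, marginal_zero):
    """Counting from inference via self-reducibility:
    1/Z = mu(emptyset) = prod_i Pr[Y_{v_i} = 0 | Y_{v_1} = ... = Y_{v_{i-1}} = 0],
    where marginal_zero(v, pinned) returns Pr[Y_v = 0 | Y_u = 0 for all u in pinned]."""
    prob_empty = 1.0
    pinned = []
    for v in vertices:
        prob_empty *= marginal_zero(v, tuple(pinned))
        pinned.append(v)
    return 1.0 / prob_empty          # Z = 1 / mu(emptyset)
```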

  19. Decay of Correlation
  μ_v^σ: marginal distribution at v conditioning on σ ∈ {0,1}^S.
  Strong spatial mixing (SSM): ∀ boundary condition B ∈ {0,1}^{r-sphere(v)}:
    d_TV(μ_v^σ, μ_v^{σ,B}) ≤ poly(n) · exp(−Ω(r)).
  SSM (which holds iff λ ≤ λ_c when μ is the hardcore model)
  ⟹ approx. inference is solvable in O(log n) rounds in the LOCAL model.
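In the LOCAL model, local computation is free, so SSM lets a vertex estimate its marginal by exact enumeration inside its O(log n)-ball with an arbitrary boundary. A brute-force sketch for the hardcore model (everything outside the ball pinned to 0); the graph encoding and names are assumptions.

```python
from collections import deque
from itertools import product

def ball_marginal_hardcore(G, lam, v, r, sigma):
    """Estimate Pr[Y_v = 1 | Y_S = sigma] by exact enumeration on the radius-r ball
    around v, with vertices outside the ball pinned to 0; under SSM the error is
    poly(n) * exp(-Omega(r)), so r = O(log n) gives 1/poly(n) accuracy."""
    dist, queue = {v: 0}, deque([v])
    while queue:                                  # BFS to collect the radius-r ball
        u = queue.popleft()
        if dist[u] < r:
            for w in G[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
    ball = list(dist)
    free = [u for u in ball if u not in sigma]    # unpinned vertices inside the ball
    w_occ = w_emp = 0.0
    for bits in product((0, 1), repeat=len(free)):
        y = {u: sigma[u] for u in ball if u in sigma}
        y.update(zip(free, bits))
        # keep only independent sets (vertices outside the ball count as unoccupied)
        if any(y[a] and y.get(b, 0) for a in ball for b in G[a]):
            continue
        weight = lam ** sum(y.values())           # lambda^{# occupied vertices in the ball}
        if y[v]:
            w_occ += weight
        else:
            w_emp += weight
    return w_occ / (w_occ + w_emp)
```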

  20. Locality of Counting & Sampling [Feng, Y., PODC '18]
  For all self-reducible graphical models (diagram on the slide):
  • SSM (correlation decay) ⟹ local approx. inference with additive error ⟺ local approx. sampling
    (one direction is easy; the other costs an O(log² n) factor);
  • local approx. inference with multiplicative error ⟺ local exact sampling (a distributed Las Vegas sampler).

  21. Locality of Sampling
  Correlation decay (SSM) ⟹
  Inference: local approx. inference: each v can compute μ̂_v^σ within an O(log n)-ball, s.t. d_TV(μ̂_v^σ, μ_v^σ) ≤ 1/poly(n).
  ⟹ Sampling: local approx. sampling: return a random Y = (Y_v)_{v∈V} whose distribution μ̂ ≈ μ, s.t. d_TV(μ̂, μ) ≤ 1/poly(n).
  Sequential O(log n)-local procedure (see the sketch below):
  • scan vertices in V in an arbitrary order v_1, v_2, …, v_n;
  • for i = 1, 2, …, n: sample Y_{v_i} according to μ̂_{v_i}^{Y_{v_1}, …, Y_{v_{i-1}}}.
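A sketch of this sequential procedure; `local_marginal(v, partial)` is a hypothetical oracle for Pr[Y_v = 1 | Y = partial], e.g. the ball-based estimator from the previous sketch.

```python
import random

def sequential_local_sampler(vertices, local_marginal):
    """Scan vertices in an arbitrary order and sample each Y_{v_i} from (an
    approximation of) its marginal conditioned on the already-sampled prefix."""
    Y = {}
    for v in vertices:                     # arbitrary order v_1, v_2, ..., v_n
        p_one = local_marginal(v, dict(Y))
        Y[v] = 1 if random.random() < p_one else 0
    return Y
```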

  22. Network Decomposition
  (C, D)-network-decomposition of G:
  • classifies vertices into clusters;
  • assigns each cluster a color in [C];
  • each cluster has diameter ≤ D;
  • clusters are properly colored.
  (C, D)_r-ND: a (C, D)-ND of the power graph G^r.
  Given a (C, D)_r-ND with r = O(log n), the sequential r-local procedure:
  • scan vertices in V in an arbitrary order v_1, v_2, …, v_n;
  • for i = 1, 2, …, n: sample Y_{v_i} according to μ̂_{v_i}^{Y_{v_1}, …, Y_{v_{i-1}}};
  can be simulated in O(CDr) rounds in the LOCAL model.

  23. Network Decomposition
  (C, D)-network-decomposition of G:
  • classifies vertices into clusters;
  • assigns each cluster a color in [C];
  • each cluster has diameter ≤ D;
  • clusters are properly colored.
  (C, D)_r-ND: a (C, D)-ND of the power graph G^r.
  An (O(log n), O(log n))_r-ND can be constructed in O(r log² n) rounds w.h.p.
  [Ghaffari, Kuhn, Maus, STOC '17]: an r-local SLOCAL algorithm that, for every ordering π = (v_1, v_2, …, v_n), returns a random vector Y(π), can be simulated by an O(r log² n)-round LOCAL algorithm that w.h.p. returns Y(π) for some ordering π.
