
Improved Parallel Algorithms for Density-Based Network Clustering



  1. Improved Parallel Algorithms for Density-Based Network Clustering. Mohsen Ghaffari (ETH), Silvio Lattanzi (Google), Slobodan Mitrović (MIT)

  6. Why density-based network clustering? A wide range of applications in data mining:
     - Community detection [Leskovec et al. '08; Chen & Saad '12; Gionis & Tsourakakis '15; Mitzenmacher et al. '15]
     - Spam detection [Gibson et al. '05]
     - Computational biology [Altaf-Ul-Amin et al. '06; Fratkin et al. '06; Saha et al. '10]
     - …

     We study:
     1. Densest subgraph
     2. k-core decomposition
     3. Graph orientation

  9. Densest subgraph. Goal: Given a graph G, find a subgraph H such that $|E(H)| / |V(H)|$ is maximized. Example from the figure: $|E(G)| / |V(G)| = 17/13$ for the graph G, while $|E(H)| / |V(H)| = 11/7$ for the subgraph H.
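
The density objective on the slide can be checked by hand on tiny inputs. The following is a minimal sketch, not the method from the talk: it computes $|E(H)| / |V(H)|$ for an induced subgraph and finds a densest subgraph by exhaustive search, which is only feasible for a handful of vertices. The function names and the toy edge list are hypothetical.

```python
from itertools import combinations

def density(vertices, edges):
    """Return |E(H)| / |V(H)| for the subgraph induced by `vertices`."""
    vs = set(vertices)
    if not vs:
        return 0.0
    induced = sum(1 for u, v in edges if u in vs and v in vs)
    return induced / len(vs)

def densest_subgraph_bruteforce(edges):
    """Try every non-empty vertex subset (exponential; tiny graphs only)."""
    nodes = sorted({x for e in edges for x in e})
    best, best_density = (), 0.0
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            d = density(subset, edges)
            if d > best_density:
                best, best_density = subset, d
    return set(best), best_density

# Toy graph (hypothetical): a 4-clique {0, 1, 2, 3} with a path 3-4-5 attached.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
H, d = densest_subgraph_bruteforce(edges)
print(H, d)
```

On this toy graph the whole graph has density 8/6, while the 4-clique alone has density 6/4, so the search returns the clique.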

  13. k-core decomposition. Goal: Given k, find a maximal subgraph of minimum degree at least k (the k-core). The coreness number of a vertex v is the maximum k for which v is part of the k-core. [Figure: example graph with its 1-core and 2-core marked.]

  18. Hierarchical clustering via k-core. [Figure: clusters formed by the 1-core, 2-core, 3-core, and 4-core.]
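
Because the k-core consists of exactly the vertices whose coreness number is at least k, the hierarchy of cores can be read off from the coreness numbers. The sketch below assumes those numbers are already known; the function name and the example values are hypothetical.

```python
def core_hierarchy(coreness):
    """Map each k to the vertex set of the k-core, given coreness numbers."""
    max_k = max(coreness.values(), default=0)
    return {
        k: {v for v, c in coreness.items() if c >= k}
        for k in range(1, max_k + 1)
    }

# Hypothetical coreness numbers for a 7-vertex graph.
coreness = {"a": 1, "b": 2, "c": 2, "d": 3, "e": 3, "f": 3, "g": 3}
levels = core_hierarchy(coreness)
for k, members in levels.items():
    print(k, sorted(members))
```

Each level is contained in the previous one, which is what makes the cores usable as a hierarchical clustering.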

  19. How to compute these clusters?

  25. Traditional vs. modern
      - Traditional: algorithms performed sequentially.
      - Modern: the Massively Parallel Computation (MPC) model, an approach to handling massive data. Examples:
        - MapReduce [DG, '04, '08]
        - Hadoop [W, '12]
        - Pregel [Google, '09]
        - Dryad [IBYBF, '07]
        - Spark [ZCFSS, '10]

  31. Massively Parallel Computation (MPC) round. [Figure: the data is spread over N machines, each labeled S; every machine processes its data locally, and the outputs form the next-round data. Together these steps make up one round.]
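
One round as described on the slide (local processing, then forming the next-round data) can be imitated on a single computer. This is a toy sketch under simplifying assumptions: it ignores the per-machine memory limit, and the routing rule (`key % num_machines`) and all function names are illustrative choices, not part of the model's definition. The example uses one round to gather degree information.

```python
from collections import defaultdict

def mpc_round(machines, local_step, num_machines):
    """Simulate one round: local computation, then routing of messages.

    `machines` is a list of item lists; `local_step` maps one machine's
    items to a list of (destination_key, message) pairs.
    """
    next_round = [[] for _ in range(num_machines)]
    for items in machines:
        for key, message in local_step(items):
            # Route by key; the modulo rule is an illustrative assumption.
            next_round[key % num_machines].append((key, message))
    return next_round

def count_endpoints(edges):
    """Local step: emit one (vertex, 1) message per edge endpoint."""
    return [(v, 1) for edge in edges for v in edge]

# Edges of a small graph split arbitrarily across 2 machines.
machines = [[(0, 1), (0, 2)], [(1, 2), (2, 3)]]
inbox = mpc_round(machines, count_endpoints, num_machines=2)

# A second local step sums the messages per vertex to obtain degrees.
degrees = defaultdict(int)
for items in inbox:
    for vertex, one in items:
        degrees[vertex] += one
print(dict(degrees))
```

After routing, all messages for a given vertex sit on the same machine, so the summation needs no further communication.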

  32. Related work
      1. Densest Subgraph in Streaming and MapReduce. Bahmani, Kumar, Vassilvitskii, VLDB 2012.
      2. Space- and Time-Efficient Algorithm for Maintaining Dense Subgraphs on One-Pass Dynamic Streams. Bhattacharya, Henzinger, Nanongkai, Tsourakakis, STOC 2015.
      3. Efficient Densest Subgraph Computation in Evolving Graphs. Epasto, Lattanzi, Sozio, WWW 2015.
      4. Densest Subgraph in Dynamic Graph Streams. McGregor, Tench, Vorotnikova, Vu, MFCS 2015.
      5. Brief Announcement: Applications of Uniform Sampling: Densest Subgraph and Beyond. Esfandiari, Hajiaghayi, Woodruff, SPAA 2016.
      6. Efficient Primal-Dual Graph Algorithms for MapReduce. Bahmani, Goel, Munagala, Workshop on Algorithms and Models for the Web-Graph 2014.
      7. Parallel and Streaming Algorithms for k-Core Decomposition. Esfandiari, Lattanzi, Mirrokni, ICML 2018.
      8. Streaming Algorithms for k-Core Decomposition. Sarıyüce, Gedik, Jacques-Silva, Wu, Çatalyürek, VLDB 2013.
      9. Distributed k-Core View Materialization and Maintenance for Large Dynamic Graphs. Aksu, Canim, Chang, Korpeoglu, Ulusoy, TKDE 2014.

  34. Our results ($n$ = number of vertices). Poster: Wed, Pacific Ballroom #166.
      - Theorem 1: A $(1+\epsilon)$-approximate k-core decomposition can be obtained in $O(\log\log n)$ MPC rounds with $\tilde{O}(n)$ memory per machine.
      - Theorem 2: A $(2+\epsilon)$-approximate k-core decomposition can be obtained in $\tilde{O}(\sqrt{\log n})$ MPC rounds with $O(n^{\delta})$ memory per machine and total memory of $\tilde{O}(\max\{n^{1+\delta}, m\})$.
      - Theorem 3: A $(1+\epsilon)$-approximate densest subgraph can be obtained in $\tilde{O}(\sqrt{\log n})$ MPC rounds with $O(n^{\delta})$ memory per machine and total memory of $\tilde{O}(\max\{n^{1+\delta}, m\})$.
      - Theorem 4: For a graph of arboricity $\lambda$, a $(2+\epsilon)\lambda$ orientation can be obtained in $\tilde{O}(\sqrt{\log n})$ MPC rounds with $O(n^{\delta})$ memory per machine and total memory of $\tilde{O}(\lambda n)$.

  36. Next: Theorem 1. A $(1+\epsilon)$-approximate k-core decomposition can be obtained in $O(\log\log n)$ MPC rounds with $\tilde{O}(n)$ memory per machine. High-level idea: Simulate the sequential algorithm.

  44. The sequential algorithm
      - Given a threshold k, repeatedly remove all the vertices of degree less than k.
      - The coreness value of a vertex is the largest k for which it is not removed.

      Example with k = 2: once the removals stop, the coreness value of all remaining vertices is >= 2.

      Implementing this approach directly can take too many rounds. Idea: Process only large thresholds.
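
The peeling procedure above can be sketched as follows. This is a plain sequential illustration, assuming the graph is given as a dictionary from each vertex to the set of its neighbors (a representation chosen here, not taken from the slides). It removes low-degree vertices one at a time, which yields the same surviving set as removing them in batches, and it tries every threshold in turn rather than only the large ones.

```python
def k_core(adj, k):
    """Return the vertices left after repeatedly removing degree < k vertices."""
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    stack = [v for v in alive if degree[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # already removed via an earlier stack entry
        alive.remove(v)
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] < k:
                    stack.append(u)
    return alive

def coreness(adj):
    """Coreness of v = largest threshold k for which v is not removed."""
    result = {v: 0 for v in adj}
    k = 1
    while True:
        core = k_core(adj, k)
        if not core:
            return result
        for v in core:
            result[v] = k
        k += 1

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(k_core(adj, 2))
print(coreness(adj))
```

With k = 2 the pendant vertex is removed and the triangle survives, so the triangle vertices have coreness 2 and the pendant vertex has coreness 1.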

  48. Partition vertices and process induced graphs
      - Partition the graph across $n$ machines.
      - Apply the sequential algorithm locally.
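
The two steps on this slide can be illustrated in a few lines. This sketch only shows the mechanics of splitting the vertices, building the induced subgraphs, and peeling each one locally. It is not the algorithm from the talk and carries none of its guarantees: the uniformly random assignment, the number of parts, and passing the same threshold `k` to every part are all assumptions made for illustration, and the slides do not say how local thresholds relate to the global one.

```python
import random

def induced_subgraph(adj, part):
    """Adjacency of the subgraph induced by the vertex set `part`."""
    return {v: adj[v] & part for v in part}

def peel(adj, k):
    """Sequential peeling: repeatedly drop all vertices of degree < k."""
    alive = set(adj)
    changed = True
    while changed:
        low = {v for v in alive if len(adj[v] & alive) < k}
        alive -= low
        changed = bool(low)
    return alive

def partition_and_peel(adj, num_parts, k, seed=0):
    """Randomly partition the vertices, then peel each induced subgraph."""
    rng = random.Random(seed)
    parts = [set() for _ in range(num_parts)]
    for v in adj:
        parts[rng.randrange(num_parts)].add(v)
    return [(part, peel(induced_subgraph(adj, part), k)) for part in parts]

# Complete graph on 8 vertices, split into 2 parts (hypothetical input).
adj = {v: {u for u in range(8) if u != v} for v in range(8)}
for part, survivors in partition_and_peel(adj, num_parts=2, k=1, seed=1):
    print(sorted(part), sorted(survivors))
```

Each part plays the role of one machine: it sees only the edges with both endpoints in its own part and runs the peeling without any communication.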
