
DiCeS: Detecting Communities in Network Streams Over the Cloud



  1. DiCeS: Detecting Communities in Network Streams Over the Cloud. Panagiotis Liakos†, Katia Papakonstantinopoulou‡, Alexandros Ntoulas†, Alex Delis†. †University of Athens, ‡Athens University of Economics and Business. 12th IEEE International Conference on Cloud Computing, Milan, Italy, July 8th–13th, 2019.

  5. Belgian Mobile Phone Network: two large clusters of communities; limited interaction between clusters! Brussels acts as a bridge! Source: "Fast unfolding of community hierarchies in large networks", Blondel et al. [Section: Motivation]

  9. Climate change conversation on Twitter: real-world networks are massive, change rapidly, and exhibit community structure! Source: carbonbrief.org

  10. Motivation. We want to extract the community structure of nodes in a network that changes rapidly. Many useful applications: we can launch accurate and successful advertising campaigns; we can provide more informative and engaging social network feeds; we can gain insights on the evolution of large real-world networks. The size of graph data appears to be ever-increasing: Facebook has more than 2 billion registered users; Google indexes more than 1 trillion unique URLs.

  11. Prior Contribution: CoEuS [LND17], IEEE Big Data 2017. A novel community detection algorithm that operates on a graph stream, using space sublinear in the number of edges. Additionally: a PageRank-like edge quality variation; a novel clustering technique for community size determination. [Section: Our Approach]

  13. CoEuS' context: centralized by design. [Figure: a graph stream of edges arrives at a single point; communities are initialized with seed-sets.]

  14. DiCeS' context. [Figure: the graph stream is spread across four worker nodes.]

  15. Our Contribution. We propose DiCeS, a novel distributed community detection algorithm for network streams. We implement DiCeS as a cloud application that handles streams of real-world networks at impressive rates: using just 8 workers we can handle 50 million edges per hour. We achieve horizontal scalability that is close to linear. We offer significant improvements with regard to accuracy.
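
To put the reported rate in per-second terms, the arithmetic below converts 50 million edges per hour on 8 workers into edges per second and a per-edge time budget. It is a back-of-the-envelope sketch only: the even split of load across workers is an assumption, not something the slides state.

```python
# Figures taken from the slide: 50 million edges per hour with 8 workers.
EDGES_PER_HOUR = 50_000_000
WORKERS = 8
SECONDS_PER_HOUR = 3600

# Aggregate throughput of the whole deployment.
edges_per_second = EDGES_PER_HOUR / SECONDS_PER_HOUR

# Assumption (not from the slides): load is spread evenly across workers.
edges_per_second_per_worker = edges_per_second / WORKERS

# Time one worker can spend on each edge at that rate, in microseconds.
microseconds_per_edge_per_worker = 1e6 / edges_per_second_per_worker

print(f"{edges_per_second:.0f} edges/s overall")
print(f"{edges_per_second_per_worker:.0f} edges/s per worker")
print(f"{microseconds_per_edge_per_worker:.0f} microseconds per edge per worker")
```

Under the even-split assumption this comes to roughly 13,889 edges per second overall, about 1,736 per worker, and a budget of 576 microseconds per edge on each worker.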

  16. Apache Storm: stream processing framework with broad use in production environments. Tuple: fundamental data unit. Spout: source of tuples. Bolt: responsible for transforming streams into the desired result. Grouping: determines how the tuples are exchanged. [Section: Technologies Involved]
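
The four Storm concepts above can be illustrated with a toy, single-process model in plain Python. This is not the Storm API: `edge_spout`, `CountingBolt`, `fields_grouping`, and `run` are hypothetical names, and the choice of a fields-style grouping on the first tuple field is an assumption, since the slides do not say which grouping DiCeS uses.

```python
from collections import Counter


def edge_spout(edges):
    """Spout: the source of tuples. Here every tuple is an edge (u, v)."""
    for edge in edges:
        yield edge


class CountingBolt:
    """Bolt: transforms the stream; this one counts edges per source node."""

    def __init__(self):
        self.seen = Counter()

    def execute(self, tup):
        self.seen[tup[0]] += 1


def fields_grouping(tup, num_tasks):
    """Grouping: decides which bolt task receives a tuple.

    Mimics a fields grouping: tuples with the same first field
    always go to the same task (integer node ids assumed).
    """
    return tup[0] % num_tasks


def run(edges, num_bolts):
    """Wire spout, grouping, and bolts together, like a tiny topology."""
    bolts = [CountingBolt() for _ in range(num_bolts)]
    for tup in edge_spout(edges):
        bolts[fields_grouping(tup, num_bolts)].execute(tup)
    return bolts


if __name__ == "__main__":
    for i, bolt in enumerate(run([(1, 2), (2, 3), (1, 3), (4, 1)], 2)):
        print(i, dict(bolt.seen))
```

In a real deployment the bolts would run as parallel tasks on different workers; the sketch only shows how a grouping partitions one stream among them.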

  17. Redis: in-memory key-value data store. Ultra-fast read/write operations. Complex data types: Strings, Sets, Sorted Sets. Redis Cluster.

  18. Design Principles. Scalability: isolate the processing for every edge; distributed key-value store. Fault Tolerance: all edges must be processed; failing nodes must be restored. Interactivity: updating the target communities; obtaining results on demand.

  19. DiCeS' Spout: community initialization; stream ingestion. [Section: Cloud Components]

  20. DiCeS' Bolts: stream processing; community expansion; community pruning.

  22. Our topology. [Figure: a Spout fed by the network stream and the community seed-sets, several Processing Bolts, a Pruning Bolt, and a distributed key-value store (Redis Cluster).]
      $ storm rebalance topology-name [-n new-num-workers] [-e component=parallelism]*

  23. DiCeS' Bolt. Algorithm 1: DiCeS
      input: a tuple emitted from the spout.
      begin
        if tuple.length == 1 then    // renewed set of communities
          communities ← tuple[0];
        else                         // handling of an edge
          u ← tuple[0]; v ← tuple[1];
          degrees[u] += 1; degrees[v] += 1;
          foreach C ∈ {nc[u] ∪ nc[v]} do
            if u ∈ C then cDegrees[C][v] += cDegrees[C][u] / degrees[u];
            if v ∈ C then cDegrees[C][u] += cDegrees[C][v] / degrees[v];
            if u ∈ C then communities[C].put(v, cDegrees[C][v] / degrees[v]); nc[v].add(C);
            if v ∈ C then communities[C].put(u, cDegrees[C][u] / degrees[u]); nc[u].add(C);
        emit(1);
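
The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal single-process illustration, not the authors' implementation: plain dictionaries stand in for the distributed key-value store, the class name `DicesBoltSketch` is hypothetical, the `emit(1)` acknowledgement is omitted, and two things are assumptions the slide does not specify: seed nodes start with a community degree of 1.0, and a renewal tuple carries a mapping from community id to seed nodes.

```python
from collections import defaultdict


class DicesBoltSketch:
    """In-memory sketch of the per-tuple logic of Algorithm 1."""

    def __init__(self, seed_sets):
        self.degrees = defaultdict(int)  # node -> degree seen so far
        self.renew(seed_sets)

    def renew(self, seed_sets):
        """Handle a renewed set of communities (tuple of length 1)."""
        self.communities = {}  # community -> {node: score}
        self.c_degrees = defaultdict(lambda: defaultdict(float))
        self.nc = defaultdict(set)  # node -> communities it belongs to
        for c, seeds in seed_sets.items():
            self.communities[c] = {}
            for node in seeds:
                # Assumption: seeds start with community degree 1.0.
                self.communities[c][node] = 1.0
                self.c_degrees[c][node] = 1.0
                self.nc[node].add(c)

    def process(self, tup):
        if len(tup) == 1:  # renewed set of communities
            self.renew(tup[0])
            return
        u, v = tup  # handling of an edge
        self.degrees[u] += 1
        self.degrees[v] += 1
        # The union builds a new set, so nc can be updated inside the loop.
        for c in self.nc[u] | self.nc[v]:
            members = self.communities[c]
            cdeg = self.c_degrees[c]
            if u in members:
                cdeg[v] += cdeg[u] / self.degrees[u]
            if v in members:
                cdeg[u] += cdeg[v] / self.degrees[v]
            if u in members:
                members[v] = cdeg[v] / self.degrees[v]
                self.nc[v].add(c)
            if v in members:
                members[u] = cdeg[u] / self.degrees[u]
                self.nc[u].add(c)


if __name__ == "__main__":
    bolt = DicesBoltSketch({"A": {1}})
    for edge in [(1, 2), (2, 3), (4, 5)]:
        bolt.process(edge)
    print(bolt.communities["A"])
```

With the single seed node 1 in community "A", the edges (1, 2) and (2, 3) pull nodes 2 and 3 into the community, while the edge (4, 5) touches no community and leaves it unchanged.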

  24. Dataset. Networks exceeding 1.8 billion links. Accompanying ground-truth communities allow for the evaluation of accuracy. [Section: Experimental Evaluation]

      | Graph       | Type          | Nodes      | Edges         | Av. Degree | Av. Community Size |
      |-------------|---------------|------------|---------------|------------|--------------------|
      | DBLP        | Co-authorship | 317,080    | 1,049,866     | 3.31       | 22.45              |
      | Amazon      | Co-purchasing | 334,863    | 925,872       | 2.76       | 13.49              |
      | Youtube     | Social        | 1,134,890  | 2,987,624     | 2.63       | 14.59              |
      | LiveJournal | Social        | 3,997,962  | 34,681,189    | 8.67       | 27.80              |
      | Orkut       | Social        | 3,072,441  | 117,185,083   | 38.14      | 215.72             |
      | Friendster  | Social        | 65,608,366 | 1,806,067,135 | 27.53      | 46.81              |
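
As a quick consistency check on the dataset figures, the snippet below recomputes the "Av. Degree" column from the node and edge counts. It suggests the column equals edges divided by nodes (rather than twice that ratio); this reading is inferred from the numbers themselves, not stated on the slide, and the variable names are illustrative.

```python
# (graph, nodes, edges, reported "Av. Degree") as listed on the slide.
DATASETS = [
    ("DBLP", 317_080, 1_049_866, 3.31),
    ("Amazon", 334_863, 925_872, 2.76),
    ("Youtube", 1_134_890, 2_987_624, 2.63),
    ("LiveJournal", 3_997_962, 34_681_189, 8.67),
    ("Orkut", 3_072_441, 117_185_083, 38.14),
    ("Friendster", 65_608_366, 1_806_067_135, 27.53),
]

# Edges-per-node ratio for each graph, rounded like the slide's column.
ratios = {name: round(edges / nodes, 2) for name, nodes, edges, _ in DATASETS}

for name, _, _, reported in DATASETS:
    status = "matches" if abs(ratios[name] - reported) < 1e-9 else "differs"
    print(f"{name}: edges/nodes = {ratios[name]:.2f}, reported {reported:.2f} ({status})")
```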

  26. Performance: we can reduce our processing time by adding bolts. [Figure: bar chart of average processing time per edge with 2, 4, and 8 bolts for each network: Amazon, DBLP, Youtube, LiveJournal, Orkut, Friendster.]

  29. Scalability: the maximum allowed pending tuples impacts the performance; DiCeS offers near-linear scaling! [Figure: execution time (in s) against pending tuples (in thousands: 5, 10, 15, 20) and worker nodes (2, 4, 8).]

  31. Fault Tolerance: DiCeS recovers its speed almost immediately. [Figure: processing time (in sec) against total edges processed (in thousands).]

  33. Average Degree & Number of Communities: less impact for DiCeS. [Figure: average processing time per edge for CoEuS and DiCeS (8 bolts) under four settings: Degree:10, Comm:2K; Degree:10, Comm:4K; Degree:20, Comm:2K; Degree:20, Comm:4K.]
