

SLIDE 1

Improved Parallel Algorithms for Density-Based Network Clustering

Mohsen Ghaffari (ETH), Silvio Lattanzi (Google), Slobodan Mitrović (MIT)

SLIDE 6

Why density-based network clustering?

A wide range of applications in data mining:

Community detection
[Leskovec et al. '08; Chen & Saad '12; Gionis & Tsourakakis '15; Mitzenmacher et al. '15]

Spam detection
[Gibson et al. '05]

Computational biology
[Altaf-Ul-Amin et al. '06; Fratkin et al. '06; Saha et al. '10]

We study:
1. Densest subgraph
2. k-core decomposition
3. Graph orientation

SLIDE 9

Densest subgraph

Goal: Given a graph G, find a subgraph H such that |E(H)| / |V(H)| is maximized.

Example: |E(H)| / |V(H)| = 17/13 for one subgraph H, while |E(I)| / |V(I)| = 11/7 for a denser subgraph I.
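For concreteness, the density objective (and a brute-force baseline for tiny graphs) can be sketched in Python. This is illustrative only; the graph below is a hypothetical example, not the one in the figure:

```python
from itertools import combinations

def density(edges, vertices):
    """|E(H)| / |V(H)| for the subgraph induced by `vertices`."""
    induced = [e for e in edges if e[0] in vertices and e[1] in vertices]
    return len(induced) / len(vertices)

def densest_subgraph_bruteforce(edges):
    """Exact densest subgraph by enumerating vertex subsets (tiny graphs only)."""
    nodes = sorted({v for e in edges for v in e})
    best, best_set = 0.0, set()
    for r in range(1, len(nodes) + 1):
        for subset in combinations(nodes, r):
            d = density(edges, set(subset))
            if d > best:
                best, best_set = d, set(subset)
    return best, best_set

# A 4-clique {0,1,2,3} with a pendant vertex 4: the whole graph has density
# 7/5 = 1.4, but the clique alone achieves 6/4 = 1.5.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
best, best_set = densest_subgraph_bruteforce(edges)
```

The brute force is exponential; the point of the talk is precisely that good approximations can be computed in few parallel rounds instead.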

SLIDE 13

k-core decomposition

Goal: Given k, find the maximal subgraph of minimum degree at least k (the k-core).

The coreness number of a vertex v is the maximum k for which v is part of the k-core.
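The definition translates directly into a peeling procedure. A hypothetical, illustrative Python sketch (the adjacency structure and example graph are my own, not from the slides):

```python
def k_core(adj, k):
    """Vertex set of the k-core: repeatedly delete vertices of degree < k.

    `adj` maps each vertex to a set of neighbours.
    """
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if sum(1 for u in adj[v] if u in alive) < k:
                alive.remove(v)
                changed = True
    return alive

# Example: a 4-clique {0,1,2,3} with a tail 3-4-5.
adj = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
    3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4},
}
core3 = k_core(adj, 3)   # only the clique survives the k = 3 peel
core1 = k_core(adj, 1)   # every vertex keeps a neighbour, so all survive
```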

SLIDE 18

Hierarchical clustering via k-core

The k-cores are nested: 1-core ⊇ 2-core ⊇ 3-core ⊇ 4-core.

SLIDE 19

How do we compute these clusters?


SLIDE 25

Traditional: algorithms performed sequentially.

Modern: the Massively Parallel Computation (MPC) model, an approach to handling massive data.

Examples:

  • MapReduce [DG, '04, '08]
  • Hadoop [W, '12]
  • Pregel [Google, '09]
  • Dryad [IBYBF, '07]
  • Spark [ZCFSS, '10]

SLIDE 31

Massively Parallel Computation (MPC) round

The data is split across N machines, each holding at most S of it. In one round, every machine processes its local data, then the machines exchange messages; the received messages form the next round's data.

SLIDE 32

Related work

1. Densest Subgraph in Streaming and MapReduce. Bahmani, Kumar, Vassilvitskii. VLDB 2012.
2. Space- and Time-Efficient Algorithm for Maintaining Dense Subgraphs on One-Pass Dynamic Streams. Bhattacharya, Henzinger, Nanongkai, Tsourakakis. STOC 2015.
3. Efficient Densest Subgraph Computation in Evolving Graphs. Epasto, Lattanzi, Sozio. WWW 2015.
4. Densest Subgraph in Dynamic Graph Streams. McGregor, Tench, Vorotnikova, Vu. MFCS 2015.
5. Brief Announcement: Applications of Uniform Sampling: Densest Subgraph and Beyond. Esfandiari, Hajiaghayi, Woodruff. SPAA 2016.
6. Efficient Primal-Dual Graph Algorithms for MapReduce. Bahmani, Goel, Munagala. Workshop on Algorithms and Models for the Web-Graph (WAW) 2014.
7. Parallel and Streaming Algorithms for k-Core Decomposition. Esfandiari, Lattanzi, Mirrokni. ICML 2018.
8. Streaming Algorithms for k-Core Decomposition. Sarıyüce, Gedik, Jacques-Silva, Wu, Çatalyürek. VLDB 2013.
9. Distributed k-Core View Materialization and Maintenance for Large Dynamic Graphs. Aksu, Canim, Chang, Korpeoglu, Ulusoy. TKDE 2014.

SLIDE 34

Our results (n = number of vertices, m = number of edges)

Theorem 1. A (1 + ε)-approximate k-core decomposition can be obtained in O(log log n) MPC rounds with Õ(n) memory per machine.

Theorem 2. A (2 + ε)-approximate k-core decomposition can be obtained in Õ(√log n) MPC rounds with O(n^δ) memory per machine and total memory Õ(max(n^(1+δ), m)).

Theorem 3. A (1 + ε)-approximate densest subgraph can be obtained in Õ(√log n) MPC rounds with O(n^δ) memory per machine and total memory Õ(max(n^(1+δ), m)).

Theorem 4. For a graph of arboricity λ, a (2 + ε)λ-orientation can be obtained in Õ(√log n) MPC rounds with O(n^δ) memory per machine and total memory Õ(λn).

Poster: Wed, Pacific Ballroom #166

SLIDE 36

Theorem 1. A (1 + ε)-approximate k-core decomposition can be obtained in O(log log n) MPC rounds with Õ(n) memory per machine.

High-level idea: simulate the sequential algorithm.

SLIDE 44

The sequential algorithm

  • Given a threshold k, repeatedly remove all the vertices of degree less than k.
  • The coreness value of a vertex is the largest k for which it is not removed.

Example with k = 2: once the process stabilizes, the coreness value of all remaining vertices is ≥ 2.

Implementing this approach directly can take too many rounds. Idea: process only large thresholds.
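The full sequential procedure can be sketched in Python (an illustrative, hypothetical implementation; the example graph is my own):

```python
def coreness(adj):
    """Coreness of every vertex: for k = 1, 2, ..., peel vertices of degree < k.

    A vertex's coreness is the largest k at which it is still not removed.
    `adj` maps each vertex to a set of neighbours.
    """
    alive = set(adj)
    core = {v: 0 for v in adj}
    k = 1
    while alive:
        # Peel at threshold k until no vertex of degree < k remains.
        changed = True
        while changed:
            changed = False
            for v in list(alive):
                if sum(1 for u in adj[v] if u in alive) < k:
                    alive.remove(v)
                    changed = True
        for v in alive:
            core[v] = k    # v survived threshold k
        k += 1
    return core

# A 4-clique {0,1,2,3} with a tail 3-4-5: the clique has coreness 3,
# the tail vertices unravel already at k = 2, so their coreness is 1.
adj = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3},
    3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4},
}
core = coreness(adj)
```

Each threshold k may need many peeling passes, and k ranges up to the maximum coreness, which is why a direct MPC simulation of this loop takes too many rounds.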

SLIDE 50

Partition vertices and process induced graphs

Partition the graph across √n machines and apply the sequential algorithm locally. The local degree of each vertex v with d_v ≥ √n log n is sharply concentrated around its expectation (Chernoff bound). Run the sequential algorithm locally to find (1 + ε)-approximate k-cores for k ≥ √n log n.
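A quick simulation of why the partition works (hypothetical parameters, my own setup): assigning vertices to √n machines uniformly at random, a vertex of degree ≥ √n log n sees a local degree on each machine that is sharply concentrated around d_v / √n.

```python
import math
import random

random.seed(0)

n = 10_000                      # number of vertices
machines = math.isqrt(n)        # partition across sqrt(n) = 100 machines

# A star centre v adjacent to all other n - 1 vertices, so
# d_v = 9999 >= sqrt(n) * log(n) ~ 921 and concentration should kick in.
neighbours = range(1, n)
assignment = {u: random.randrange(machines) for u in neighbours}

# Local degree of v on machine 0: the neighbours assigned to machine 0.
local_degree = sum(1 for u in neighbours if assignment[u] == 0)
expected = (n - 1) / machines   # about 100; Chernoff says we stay close to this
```

For low-degree vertices (d_v well below √n log n) the same bound gives no useful concentration, which is exactly why only large thresholds can be handled this way.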

SLIDE 57

Detecting all approximate k-cores

Partitioning across √n machines detects the k-cores for k ≥ √n log n. How about k < √n log n? Ignore all the edges between vertices of coreness ≥ √n log n. The number of remaining edges is Õ(n√n). Partition the vertices across n^(1/4) machines. Detect k-cores for k ≥ n^(1/4) log n. Repeat.

n → n^(1/2) → n^(1/4) → … → n^(1/log n): log log n rounds.
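The threshold sequence shrinks doubly exponentially, which is where the log log n round count comes from. A small sketch (hypothetical, my own constants):

```python
import math

def num_phases(n):
    """How many times the threshold n -> n^(1/2) -> n^(1/4) -> ...
    can halve its exponent before reaching a constant."""
    t, phases = n, 0
    while t > 2:
        t = math.isqrt(t)   # the exponent halves each phase
        phases += 1
    return phases

# For n = 10^9 this is about log2(log2(n)) ~ 5 phases.
phases = num_phases(10**9)
```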

SLIDE 58

Experiments: SKC = the algorithm in [Esfandiari et al. 2018]; VKC = the algorithm of Theorem 1.

SLIDE 59

Experiments: SKC = the algorithm in [Esfandiari et al. 2018]; VKC = the algorithm of Theorem 2.

SLIDE 61

Theorem 2. A (2 + ε)-approximate k-core decomposition can be obtained in Õ(√log n) MPC rounds with O(n^δ) memory per machine and total memory Õ(max(n^(1+δ), m)).

SLIDE 68

(2 + ε)-approximate algorithm in O(log n) iterations

  • Given a threshold k, repeatedly remove all the vertices of degree less than (2 + ε)k.
  • The approximate coreness value of a vertex is the largest k for which it is not removed.

Example with k = 1: the figure contrasts the exact 1-coreness with the (3 + ε)-approximate 1-coreness.

SLIDE 69

(2 + ε)-approximate algorithm in O(log n) iterations

The algorithm terminates in O(log n) iterations!

High-level idea: simulate the O(log n) sequential iterations in Õ(√log n) MPC rounds.
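The relaxed peel can be sketched in Python (illustrative only; `peel`, the example graph, and ε = 0.5 are my own choices, not the authors' code). The comments state the sandwich property behind the approximation guarantee: the survivors sit inside the exact k-core, and the ⌈(2 + ε)k⌉-core is never touched by the relaxed peel.

```python
import math

def peel(adj, threshold):
    """Repeatedly remove vertices of degree < threshold; return the survivors."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if sum(1 for u in adj[v] if u in alive) < threshold:
                alive.remove(v)
                changed = True
    return alive

# A 5-clique {0..4} with a pendant vertex 5.
adj = {0: {1, 2, 3, 4}, 1: {0, 2, 3, 4}, 2: {0, 1, 3, 4},
       3: {0, 1, 2, 4}, 4: {0, 1, 2, 3, 5}, 5: {4}}

k, eps = 1, 0.5
survivors = peel(adj, (2 + eps) * k)   # relaxed peel at threshold (2 + eps) * k
# Survivors have min degree >= (2 + eps) * k >= k, so they lie inside the k-core:
exact_k_core = peel(adj, k)
# Members of the ceil((2 + eps) * k)-core keep degree >= the threshold, so they all survive:
inner_core = peel(adj, math.ceil((2 + eps) * k))
```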

SLIDE 76

Simulation of the O(log n)-iteration algorithm

Split the O(log n) iterations into √log n phases, each phase consisting of √log n iterations. Simulate each phase for each vertex by gathering its √log n-hop neighborhood (the figure illustrates the 2-hop neighborhood of a vertex v).

A √log n-hop neighborhood might be too big! E.g., a vertex has degree n.

Idea: sparsify the graph.

SLIDE 81

Sparsification

Given a parameter k, sparsify the graph by keeping each edge with probability Θ(log n / k). The approximate k-core is preserved after the sparsification (Chernoff bound).

Some vertices still might have too large a degree, e.g., a vertex of degree n for k = n^0.1. "Freeze" all the vertices of degree more than (2/δ) log n after the sparsification. The number of frozen vertices is small and affects the round complexity only by a constant.
SLIDE 85

Densest subgraph vs. k-core decomposition

Is the non-empty k-core for the largest k the same as the densest subgraph? No: in the figure, the 2-core has density 11/7 while the 3-core has density only 3/2.
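A toy graph with the same densities makes the point concrete (my own construction, not necessarily the slide's figure): a 4-clique is a 3-core of density 3/2, while a second component of density 11/7 unravels completely under the k = 3 peel.

```python
def peel(adj, k):
    """Vertices of the k-core: repeatedly delete vertices of degree < k."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if sum(1 for u in adj[v] if u in alive) < k:
                alive.remove(v)
                changed = True
    return alive

def density(adj, vertices):
    edges = sum(1 for v in vertices for u in adj[v] if u in vertices) // 2
    return edges / len(vertices)

# Component A: a 4-clique {0,1,2,3}, density 6/4 = 3/2; this is the 3-core.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2}}

# Component B: a 7-cycle a..g plus chords a-c, a-d, a-e, a-f:
# 11 edges on 7 vertices, density 11/7 > 3/2, yet its 3-core is empty
# (peeling at k = 3 removes b and g first, then the rest unravels).
cycle = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
B = {v: set() for v in cycle}
for i, v in enumerate(cycle):
    w = cycle[(i + 1) % 7]
    B[v].add(w); B[w].add(v)
for u in 'cdef':
    B['a'].add(u); B[u].add('a')
adj.update(B)

three_core = peel(adj, 3)    # just the 4-clique
dense_part = set(cycle)      # component B, denser but with small coreness
```

So the k-core with the largest k is a useful dense cluster, but it need not be the densest subgraph, which is why the two problems are treated separately in the talk.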