Prune the Unnecessary: Parallel Pull-Push Louvain Algorithms with Automatic Edge Pruning

Jesmin Jahan Tithi ♥, Andrzej Stasiak *, Sriram Aananthakrishnan *, Fabrizio Petrini ♥
♥ Parallel Computing Labs, Intel; * Data Center Group, Intel

What is community?

What is Community?
• Sets of vertices that have dense intra-connections but sparse inter-connections
• Uncover hidden structures inside a graph in the form of coherent modules of vertices
• Strongly correlated with functional and structural properties
[Figure: example communities in a protein-protein interaction network and the World Wide Web. Image source: Google Image]

What is community detection?

What is Community Detection?
• Algorithms to identify communities in a network
• Applications: network analysis to retrieve information or patterns of the network
[Figures: http://senseable.mit.edu; "Virality Prediction and Community Structure in Social Networks"; Nodus Labs, "Against Putin" Facebook protest group visualization, December 2011, /community_detection/]

How do we measure the quality of the detected communities?

A Measure of Solution Quality
• Modularity: a measure of the interconnectedness of the communities

$$Q = \sum_{c \in C} \left[ \frac{\Sigma_{in}^{c}}{2m} - \frac{\left(\Sigma_{tot}^{c}\right)^{2}}{4m^{2}} \right]$$

where $\Sigma_{in}^{c} = \sum w_{i,j},\ \forall i, j \in c$; $\Sigma_{tot}^{c} = \sum w_{i,j},\ \forall i \in c,\ \forall j \in V$; and $m = \sum w_{i,j}$ over all edges $(i, j)$.

• Max value of Q = 1
• |Q| ∈ (0, 1], and the higher the better
• A community detection algorithm identifies communities in a way that maximizes modularity
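To make the modularity definition concrete, here is a minimal Python sketch. It assumes an undirected weighted graph given as a list of `(u, v, w)` edges with each edge listed once, and it assumes that the sum over `i, j` in a community runs over ordered pairs, so every internal edge contributes twice to the "in" term. The names `modularity`, `edges`, and `community` are illustrative and do not come from the slides.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity of a partition.

    edges: list of (u, v, w) tuples, each undirected edge listed once.
    community: dict mapping vertex -> community id.
    """
    m = sum(w for _, _, w in edges)      # total edge weight
    sigma_in = defaultdict(float)        # internal weight, both directions counted
    sigma_tot = defaultdict(float)       # sum of weighted degrees per community
    for u, v, w in edges:
        cu, cv = community[u], community[v]
        sigma_tot[cu] += w
        sigma_tot[cv] += w
        if cu == cv:
            sigma_in[cu] += 2 * w
    return sum(sigma_in[c] / (2 * m) - (sigma_tot[c] / (2 * m)) ** 2
               for c in sigma_tot)

# Two triangles joined by a single edge, unit weights.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, two_triangles)
```

With one community per triangle, each community has an "in" term of 6 and a "tot" term of 7 with m = 7, so `q` works out to 2 × (6/14 − 1/4) = 5/14. Placing all six vertices in a single community gives 0.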

How do we maximize modularity?

A Recipe of Modularity Optimization
• Modularity: a measure of the interconnectedness of the communities

$$Q = \sum_{c \in C} \left[ \frac{\Sigma_{in}^{c}}{2m} - \frac{\left(\Sigma_{tot}^{c}\right)^{2}}{4m^{2}} \right]$$

• Max value of Q = 1
• Large values of $Q$ correlate with high $\Sigma_{in}^{c}$ and low $\Sigma_{tot}^{c}$
  - Communities that are dense within their structure and weakly coupled among each other
• To get high $\Sigma_{in}^{c}$, the highest possible number of edges should fall in each community
• To decrease $\Sigma_{tot}^{c}$, divide the network into several communities with small total degrees

NP-hardness of Modularity Optimization
• Modularity: a measure of the interconnectedness of the communities

$$Q = \sum_{c \in C} \left[ \frac{\Sigma_{in}^{c}}{2m} - \frac{\left(\Sigma_{tot}^{c}\right)^{2}}{4m^{2}} \right]$$

• Max value of Q = 1
• Challenge: finding communities with optimal modularity is NP-hard

Louvain
• Maximizes modularity following a greedy algorithm
V. D. Blondel, J.-L. Guillaume, R. Lambiotte and E. Lefebvre, "Fast unfolding of communities in large networks," J. Stat. Mech. (2008) P10008, p. 12, 2008

Louvain: Algorithm Steps
• Outer loop: traverse the graph in several passes to incrementally build communities
• Phase 1: modularity optimization (inner loop)
  - [complexity expression garbled in the source]
• Phase 2: community aggregation and graph reconstruction
  - [complexity expression garbled in the source]

A key data structure to decide pull or push

Hash map NCW: ⟨community_id, sum of edge weights⟩
• A hash map with ⟨key = neighboring community, val = sum of edge weights to that community⟩
• Vertex 1 is a neighbor of vertices 2 and 3 (members of community 1) => sum of edge weights = 2
• Vertex 1 is a neighbor of vertex 7 (member of community 7) => sum of edge weights = 1
• NCW for vertex 1 = [ ⟨c=1, w=2⟩, ⟨c=7, w=1⟩ ]
[Figure: example graph with vertices grouped into communities c=1, c=4, and c=7]
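The NCW example for vertex 1 can be reproduced with a small Python sketch. It assumes an adjacency-list representation and unit edge weights, which matches the sums on the slide but is not stated there; `build_ncw`, `adjacency`, and `community` are hypothetical names used only for illustration.

```python
from collections import defaultdict

def build_ncw(vertex, adjacency, community):
    """Return {neighboring community id: sum of edge weights to it}."""
    ncw = defaultdict(float)
    for neighbor, weight in adjacency[vertex]:
        ncw[community[neighbor]] += weight
    return dict(ncw)

# Vertex 1 from the slide: neighbors 2 and 3 are in community 1,
# neighbor 7 is in community 7. Unit weights are an assumption.
adjacency = {1: [(2, 1.0), (3, 1.0), (7, 1.0)]}
community = {2: 1, 3: 1, 7: 7}
ncw_1 = build_ncw(1, adjacency, community)   # {1: 2.0, 7: 1.0}
```

A plain dictionary keyed by community id keeps the later "find the best community" step proportional to the number of distinct neighboring communities rather than the number of neighbors.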

Louvain Pseudocode
• Repeat while there is a change in community membership
• Initialize each vertex in its own community and compute the initial modularity
• Phase 1 (inner loop) starts
• For each vertex, build NCW by pulling community info from its neighbors
• Find the best community to move into by iterating through all entries of NCW
• Move to the best community and update community info
• Once done for all vertices, compute the new modularity and repeat if modularity increased by a threshold
• When modularity stabilizes, create a new graph by merging all vertices in the same community into one
[Figure: example graph with communities c=1, c=4, and c=7 before and after merging]
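The final merging step can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: the graph is an edge list with each undirected edge listed once, community ids are comparable values such as integers, and edges inside a community become a self-loop on the merged vertex. The name `aggregate` is hypothetical.

```python
from collections import defaultdict

def aggregate(edges, community):
    """Collapse each community into one super-vertex.

    edges: list of (u, v, w), each undirected edge listed once.
    community: dict vertex -> community id (ids must be orderable).
    Returns the coarse edge list; intra-community weight becomes a self-loop.
    """
    merged = defaultdict(float)
    for u, v, w in edges:
        cu, cv = community[u], community[v]
        key = (min(cu, cv), max(cu, cv))   # undirected: normalize endpoint order
        merged[key] += w
    return [(a, b, w) for (a, b), w in sorted(merged.items())]

# Two triangles joined by one edge collapse into two super-vertices.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
coarse = aggregate(edges, community)
# coarse == [(0, 0, 3.0), (0, 1, 1.0), (1, 1, 3.0)]
```

The coarse graph is what the next outer-loop pass would operate on, with each super-vertex again starting in its own community.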

We call the standard Louvain algorithm a pull-based Louvain algorithm: to build NCW at each iteration, it pulls the latest info from neighbors.
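A sequential Python sketch of one pull-based Phase 1 is given below to show where the repeated neighborhood scans happen. Several things here are assumptions rather than content from the slides: the gain expression is the standard Louvain move gain up to a constant factor, the loop stops when no vertex moves instead of using a modularity threshold, the graph has at least one edge and no self-loops, and all names (`pull_phase1`, `moves_per_iter`, and so on) are hypothetical.

```python
from collections import defaultdict

def pull_phase1(adjacency, max_iters=100):
    """adjacency: dict vertex -> list of (neighbor, weight), symmetric."""
    community = {v: v for v in adjacency}           # each vertex on its own
    degree = {v: sum(w for _, w in nbrs) for v, nbrs in adjacency.items()}
    sigma_tot = dict(degree)                        # total degree per community
    two_m = sum(degree.values())
    moves_per_iter = []
    for _ in range(max_iters):
        moved = 0
        for v in adjacency:
            # Pull: rebuild NCW from scratch by scanning every neighbor.
            ncw = defaultdict(float)
            for u, w in adjacency[v]:
                ncw[community[u]] += w
            old = community[v]
            sigma_tot[old] -= degree[v]             # take v out of its community
            best = old
            best_gain = ncw.get(old, 0.0) - sigma_tot[old] * degree[v] / two_m
            for c, k_vc in ncw.items():
                gain = k_vc - sigma_tot[c] * degree[v] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            sigma_tot[best] += degree[v]            # put v into the chosen one
            if best != old:
                community[v] = best
                moved += 1
        moves_per_iter.append(moved)
        if moved == 0:                              # simplified stopping rule
            break
    return community, moves_per_iter
```

The returned `moves_per_iter` list corresponds to the "vertices moved" counts per inner-loop iteration; every iteration still rescans all neighbors of all vertices regardless of how few vertices moved.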

Unnecessary work in Louvain

Observations

Number of vertex moves drops significantly after the first few iterations of Phase 1
[Figure: vertices moved vs. inner loop iterations for the JohnsHopkins and pokec graphs, outer loops 0 and 1]
• For a particular outer loop, the number of vertices that change communities drops drastically after the first few inner loop iterations (e.g., 5).
• The number of vertices that change communities in the later inner loop iterations is minimal

Implications
[Figure: vertices moved vs. inner loop iterations for the JohnsHopkins and pokec graphs, outer loops 0 and 1]
• Wasteful to scan all neighbors to compute NCW if there is no change in the neighborhood
• Wasteful to iterate over all vertices in each iteration of Phase 1 if vertices do not move

Pruning Unnecessary Work in Louvain
• Prune vertices that are unlikely to move
• Prune unnecessary neighborhood exploration

Push-based Louvain Algorithm
• A vertex does not pull; rather, neighbors actively push any changes
• The push-based algorithm starts with an initialized NCW, assuming each vertex is in its own community
• During Phase 1, it never recreates NCW
• If there is a change in community membership, update NCW for the vertex itself and push updates to all its neighbors
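The push idea can be illustrated with a short sequential Python sketch. It assumes one NCW dictionary per vertex, a symmetric adjacency list without self-loops, and a small tolerance for dropping emptied entries; `init_ncw`, `push_move`, and `EPS` are hypothetical names. In this simplified form a vertex's own table depends only on its neighbors' communities, so a move changes only the neighbors' tables, whereas the slides also mention updating the entry of the moving vertex itself.

```python
from collections import defaultdict

EPS = 1e-12  # tolerance for treating an accumulated weight as zero (assumption)

def init_ncw(adjacency):
    """Every vertex starts alone, so each neighbor is its own community."""
    ncw = {}
    for v, nbrs in adjacency.items():
        table = defaultdict(float)
        for u, w in nbrs:
            table[u] += w          # community id of u is u itself initially
        ncw[v] = table
    return ncw

def push_move(v, new_c, adjacency, community, ncw):
    """Move v to community new_c and push the change to its neighbors."""
    old_c = community[v]
    if new_c == old_c:
        return
    community[v] = new_c
    for u, w in adjacency[v]:
        table = ncw[u]
        table[old_c] -= w          # u loses this weight toward v's old community
        if table[old_c] <= EPS:
            del table[old_c]
        table[new_c] += w          # and gains it toward v's new community
```

Edges are touched only when a vertex actually moves, so iterations in which few vertices move do correspondingly little neighborhood work.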

Pros and Cons of Pull and Push

Pull – Cons
• Performs redundant memory reads by scanning all vertices and their neighbors to rebuild NCW in each inner loop iteration, even when the vertex's neighborhood has not changed
[Figure: vertices moved vs. inner loop iterations for pokec, outer loops 0 and 1, annotated "Unnecessary neighborhood scan"]

Push – Pros
• Scans through all neighbors of a vertex to update NCW only when that vertex changes its community
[Figure: vertices moved vs. inner loop iterations for pokec, outer loops 0 and 1, annotated "Avoids exploring edges unnecessarily"]
