


SLIDE 1

Understanding Performance Bottleneck to Improve Parallel Efficiency of Louvain Algorithm

Naw Safrin Sattar Shaikh Arifuzzaman

PDSW-DISCS 2019: 4th Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems

New Orleans, LA 70148 USA

Big Data and Scalable Computing Research Group

SLIDE 2

Louvain Method for Community Detection

❑ Detects communities based on modularity optimization
❑ One of the best methods in the literature in terms of
❖ Computation time
❖ Quality of the detected communities
❑ Reveals a hierarchy of communities at different scales
❑ Helps in understanding the global functioning of a network
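As a rough illustration of the quantity Louvain optimizes, the sketch below computes the standard Newman modularity of a fixed partition for an undirected, unweighted graph. It is not taken from the authors' implementation; the function name `modularity`, the edge-list input format, and the toy graph are assumptions made for this example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition (hypothetical helper, for illustration).

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every vertex to its community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # sum of vertex degrees in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    # Q = sum over communities of (internal edge share - expected share)
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, partition), 4))  # 0.3571
```

Louvain repeatedly moves vertices between communities when doing so increases this value, then collapses each community into a single vertex and repeats, which yields the hierarchy of communities mentioned above.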


SLIDE 3

Motivation

❑ Existing scalable shared-memory parallel Louvain
❑ Analyze the performance bottlenecks in a distributed environment
❑ Scope for improvements in a hybrid parallel implementation


SLIDE 4

Speedup of our DPLAL (Distributed Parallel Louvain Algorithm with Load-balancing)

Figure panels: (a) relatively small graphs; (b) large graphs
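For readers unfamiliar with the metrics, the sketch below shows the usual definitions of speedup and parallel efficiency. The function names and the timings are hypothetical and are not measurements from the slides.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_1 / T_p (sequential runtime over parallel runtime)."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial, t_parallel, num_procs):
    """Parallel efficiency E = S / p, where p is the number of processors."""
    return speedup(t_serial, t_parallel) / num_procs


# Hypothetical timings in seconds, for illustration only.
print(speedup(100.0, 12.5))                  # 8.0
print(parallel_efficiency(100.0, 12.5, 16))  # 0.5
```

An efficiency well below 1 at higher processor counts is the kind of symptom that motivates looking for load imbalance and communication overhead.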


SLIDE 5

Load-balancing with graph partitioner METIS

func-1: gathering neighbour info
func-2: exchanging updated community
func-3: exchanging duality resolved community
func-4: gathering updated communities

SLIDE 6

MPI profiling with TAU: Runtime of MPI Functions

▪ 65% and 69% of the processors, respectively, take less than the average time
▪ In the load-imbalanced approach, MPI_Send and MPI_Recv are 430.1x and 392.6x slower, respectively, than in the load-balanced approach

Figure panels: Load-Imbalanced; Load-balanced
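Statistics like the ones on this slide can be derived from per-rank timings exported by a profiler. The sketch below is one way to summarize them, assuming the times are already available as a plain list; the function name `imbalance_summary` and the sample values are hypothetical and are not TAU output.

```python
from statistics import mean


def imbalance_summary(times):
    """Summarize per-rank runtimes of one function (hypothetical helper).

    times: list of runtimes, one entry per MPI rank.
    """
    avg = mean(times)
    below = sum(1 for t in times if t < avg)
    return {
        "mean": avg,
        # Share of ranks that finish faster than the average rank.
        "fraction_below_mean": below / len(times),
        # Slowest rank relative to the average; 1.0 means perfectly balanced.
        "imbalance_factor": max(times) / avg,
    }


# Hypothetical per-rank times in seconds: one straggler among four ranks.
print(imbalance_summary([1.0, 1.0, 1.0, 5.0]))
```

A large share of ranks below the mean together with a high imbalance factor suggests that a few overloaded ranks dominate the runtime while the others wait.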


SLIDE 7

MPI profiling with TAU: MPI Communications

Figure panels: Load-Imbalanced; Load-balanced


SLIDE 8

Future Work

❑ Profiling memory consumption
❖ Branching and cache access patterns
❖ Time stalled waiting for resources (such as memory reads), etc.

❑ Communication time at different phases of the algorithm
❖ Identify whether communication time outweighs computation time
❖ Change the algorithm design accordingly

❑ Different graph-partitioning techniques for improved load-balancing and higher parallel efficiency
❖ Hypergraph partitioning for social networks

❑ Parallel Louvain with data-parallel computations on GPUs


SLIDE 9

Thank you

Contact: nsattar@uno.edu, smarifuz@uno.edu