Understanding Performance Bottleneck to Improve Parallel Efficiency

SLIDE 1

Understanding Performance Bottleneck to Improve Parallel Efficiency of Louvain Algorithm

Naw Safrin Sattar, Shaikh Arifuzzaman

PDSW-DISCS 2019: 4th Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems

New Orleans, LA 70148 USA

Big Data and Scalable Computing Research Group

SLIDE 2

Louvain Method for Community Detection

❑ Detects communities based on modularity optimization
❑ One of the best methods in the literature in terms of
  ❖ Computation time and
  ❖ Quality of the detected communities
❑ Reveals a hierarchy of communities at different scales
❑ Helps in understanding the global functioning of a network
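To make "modularity optimization" concrete, here is a minimal sketch of the standard Newman modularity score that Louvain tries to maximize. It assumes an undirected, unweighted graph given as an edge list. The function name `modularity` and the toy graph are illustrative and do not come from the slides.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both endpoints in one community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    total_degree = defaultdict(int)  # sum of node degrees per community
    for node, k in degree.items():
        total_degree[community[node]] += k
    # Q = sum over communities c of [ L_c / m - (D_c / 2m)^2 ]
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)

# Toy graph (hypothetical): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "a" for n in range(6)}
q_two = modularity(edges, two_groups)  # one community per triangle
q_one = modularity(edges, one_group)   # everything in one community
```

Splitting the toy graph into its two triangles scores higher than lumping all nodes together, which is the kind of improvement each Louvain move looks for.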

SLIDE 3

Motivation

❑ Existing scalable shared memory parallel Louvain
❑ Analyze the performance bottlenecks in distributed environment
❑ Scope of improvements in a hybrid parallel implementation

SLIDE 4

Speedup of our DPLAL (Distributed Parallel Louvain Algorithm with Load-balancing)

[Figure: speedup on (a) relatively small graphs and (b) large graphs]
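For readers unfamiliar with the metrics, speedup and parallel efficiency are conventionally computed as in the sketch below. The runtimes are hypothetical placeholders, not measurements from this work, and the helper names are illustrative.

```python
def speedup(t_serial, t_parallel):
    """Speedup = single-process runtime / parallel runtime."""
    return t_serial / t_parallel

def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Efficiency = speedup / number of processes (1.0 is ideal)."""
    return speedup(t_serial, t_parallel) / n_procs

# Hypothetical runtimes in seconds, keyed by process count.
runtimes = {1: 120.0, 2: 66.0, 4: 40.0, 8: 30.0}
t1 = runtimes[1]
table = {p: (speedup(t1, t), parallel_efficiency(t1, t, p))
         for p, t in runtimes.items()}

for p, (s, e) in table.items():
    print(f"p={p}: speedup={s:.2f}, efficiency={e:.2f}")
```

Efficiency that falls as the process count grows is the usual sign that communication or load imbalance is limiting scalability.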

SLIDE 5

Load-balancing with graph partitioner METIS

func-1: gathering neighbour info
func-2: exchanging updated community
func-3: exchanging duality resolved community
func-4: gathering updated communities
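One common way to quantify load balance is the ratio of the maximum per-process load to the mean load. The sketch below assumes that load is the sum of vertex degrees assigned to a process. It compares a naive contiguous partition with a simple greedy heuristic; the greedy heuristic is only a stand-in for illustration and is not METIS, which also tries to minimize the edge cut. All names and the degree list are hypothetical.

```python
import heapq

def imbalance(loads):
    """Max load divided by mean load; 1.0 means perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))

def block_partition(degrees, n_parts):
    """Assign contiguous, equal-sized vertex ranges to each part."""
    n = len(degrees)
    loads = [0] * n_parts
    for v, d in enumerate(degrees):
        loads[v * n_parts // n] += d
    return loads

def greedy_partition(degrees, n_parts):
    """Give each vertex, heaviest first, to the currently lightest part."""
    heap = [(0, p) for p in range(n_parts)]  # (load, part id)
    for d in sorted(degrees, reverse=True):
        load, p = heapq.heappop(heap)
        heapq.heappush(heap, (load + d, p))
    return [load for load, _ in sorted(heap, key=lambda x: x[1])]

degrees = [50, 40, 30, 20, 4, 3, 2, 1]  # hypothetical skewed degree list
naive = block_partition(degrees, 2)
balanced = greedy_partition(degrees, 2)
print(naive, imbalance(naive))
print(balanced, imbalance(balanced))
```

On skewed degree distributions, typical of social networks, the contiguous split leaves one process with most of the work, which is the situation a graph partitioner is meant to avoid.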

SLIDE 6

MPI profiling with TAU: Runtime of MPI Functions

▪ 65% and 69% of the processors, respectively, take less than the average time
▪ In the load-imbalanced approach, MPI_Send and MPI_Recv are 430.1x and 392.6x slower, respectively, than in the load-balanced approach
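Statistics of this kind can be derived from per-rank timings exported by a profiler. The sketch below shows one possible computation. The timings are hypothetical, and using the slowest rank as the basis for the slowdown ratio is an assumption; the slides do not say which aggregate was used.

```python
def fraction_below_mean(times):
    """Share of ranks whose time is strictly below the mean time."""
    mean = sum(times) / len(times)
    return sum(t < mean for t in times) / len(times)

def slowdown(t_imbalanced, t_balanced):
    """How many times slower the imbalanced run is than the balanced one."""
    return t_imbalanced / t_balanced

# Hypothetical per-rank times (seconds) spent in one MPI function.
imbalanced = [1.0, 1.0, 1.0, 9.0]
balanced = [2.0, 2.0, 2.0, 2.0]

frac = fraction_below_mean(imbalanced)
# Assumption: compare the slowest rank of each run.
ratio = slowdown(max(imbalanced), max(balanced))
print(f"{frac:.0%} of ranks are below the mean; slowdown = {ratio:.1f}x")
```

A large share of ranks finishing below the mean indicates that a few stragglers dominate the runtime while the remaining ranks wait.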

[Figure: runtime of MPI functions, load-imbalanced vs. load-balanced]

SLIDE 7

MPI profiling with TAU: MPI Communications

[Figure: MPI communications, load-imbalanced vs. load-balanced]

SLIDE 8

Future Work

❑Profiling memory consumption

  • branching and cache access patterns,
  • time stalled waiting for resources (such as in memory reads), etc.

❑Communication time at different phases of the algorithm

  • identify whether communication time outweighs computation time
  • change algorithm design accordingly

❑Different graph-partitioning techniques for improved load-balancing and higher parallel efficiency

  • Hypergraph partitioning for social networks

❑Parallel Louvain with data parallel computations in GPUs

SLIDE 9

Thank you

Big Data and Scalable Computing Research Group
Contact: nsattar@uno.edu, smarifuz@uno.edu