Understanding Performance Bottleneck to Improve Parallel Efficiency of Louvain Algorithm


  1. PDSW-DISCS 2019: 4th Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems
     Understanding Performance Bottleneck to Improve Parallel Efficiency of Louvain Algorithm
     Naw Safrin Sattar, Shaikh Arifuzzaman
     Big Data and Scalable Computing Research Group
     New Orleans, LA 70148 USA

  2. Louvain Method for Community Detection
     ❑ Detects communities based on modularity optimization
     ❑ One of the best methods in the literature in terms of
       ❖ Computation time
       ❖ Quality of the detected communities
     ❑ Reveals a hierarchy of communities at different scales
     ❑ Helps in understanding the global functioning of a network
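The slide names modularity as the quantity Louvain optimizes but does not define it. As a minimal sketch, assuming the standard Newman modularity for an undirected, unweighted graph (the function name `modularity` and the toy graph are illustrative and do not come from the presentation), the score of a given community assignment could be computed like this:

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q for an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    (Both argument names are illustrative assumptions.)
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra_edges = defaultdict(int)  # edges with both endpoints in the community
    degree_sum = defaultdict(int)   # sum of node degrees per community
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra_edges[community[u]] += 1
    # Q = sum over communities of (intra-edge fraction - expected fraction)
    return sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "a" for n in range(6)}
print(modularity(toy_edges, two_groups))
print(modularity(toy_edges, one_group))
```

Splitting the toy graph into its two triangles scores higher than lumping every node together, which is the kind of improvement the local-move phase of Louvain searches for.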

  3. Motivation
     ❑ Existing scalable shared memory parallel Louvain
     ❑ Analyze the performance bottlenecks in distributed environment
     ❑ Scope of improvements in a hybrid parallel implementation

  4. Speedup of our DPLAL (Distributed Parallel Louvain Algorithm with Load-balancing)
     [Figure: speedup plots for (a) relatively small graphs and (b) large graphs]
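The speedup plots themselves are not recoverable from this transcript. As a reminder of how such curves are usually derived, here is a small sketch assuming the conventional definitions (speedup is sequential runtime divided by parallel runtime, and parallel efficiency is speedup divided by the processor count). The timings are hypothetical placeholders, not measurements from the presentation.

```python
def speedup(t_sequential, t_parallel):
    """Conventional speedup: sequential runtime over parallel runtime."""
    return t_sequential / t_parallel


def parallel_efficiency(t_sequential, t_parallel, num_procs):
    """Speedup normalized by the number of processors (1.0 is ideal)."""
    return speedup(t_sequential, t_parallel) / num_procs


# Hypothetical runtimes in seconds, keyed by processor count.
hypothetical_runtimes = {1: 100.0, 4: 30.0, 16: 12.5}
t1 = hypothetical_runtimes[1]
for p, tp in sorted(hypothetical_runtimes.items()):
    print(p, speedup(t1, tp), parallel_efficiency(t1, tp, p))
```

Efficiency that falls as processors are added is the usual sign that communication or load imbalance, rather than computation, dominates the runtime.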

  5. Load-balancing with graph partitioner METIS
     [Figure legend]
     func-1: gathering neighbour info
     func-2: exchanging updated community
     func-3: exchanging duality resolved community
     func-4: gathering updated communities
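The slide does not show how a partition is evaluated. A graph partitioner such as METIS aims for parts of similar size while keeping the number of edges cut between parts low. The sketch below only measures those two quantities for a partition that is already given; it does not call METIS, and the names `edge_cut` and `imbalance` are illustrative assumptions.

```python
from collections import Counter


def edge_cut(edges, part):
    """Number of edges whose endpoints lie in different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])


def imbalance(part, num_parts):
    """Largest part size divided by the ideal (mean) part size.

    1.0 means perfectly balanced; larger values mean one part
    (here, one processor) holds more vertices than its fair share.
    """
    sizes = Counter(part.values())
    ideal = len(part) / num_parts
    return max(sizes.values()) / ideal


# Toy graph: two triangles joined by one edge, split across 2 parts.
graph_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
balanced_part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
skewed_part = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1}
print(edge_cut(graph_edges, balanced_part), imbalance(balanced_part, 2))
print(edge_cut(graph_edges, skewed_part), imbalance(skewed_part, 2))
```

In a distributed Louvain run, each cut edge implies that neighbour or community information must cross processor boundaries, which is why both measures matter for the communication steps listed in the legend.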

  6. MPI profiling with TAU: Runtime of MPI Functions
     [Figure: two panels, Load-Imbalanced and Load-balanced]
     ▪ In the load-imbalanced and load-balanced runs, 65% and 69% of the processors, respectively, take less than the average time
     ▪ In the load-imbalanced run, MPI_Send and MPI_Recv are 430.1x and 392.6x slower, respectively, than in the load-balanced approach
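The two statistics on this slide (the share of processors below the average time, and a slowdown factor between the two runs) can be derived from per-processor timings. The sketch below assumes such timings are available as plain lists of seconds. The numbers are hypothetical and are not the TAU measurements from the presentation.

```python
def fraction_below_mean(times):
    """Share of processors whose time is strictly below the mean."""
    mean = sum(times) / len(times)
    return sum(1 for t in times if t < mean) / len(times)


def slowdown(t_imbalanced, t_balanced):
    """How many times slower the imbalanced run is than the balanced one."""
    return t_imbalanced / t_balanced


# Hypothetical per-processor times (seconds) for one MPI function.
hypothetical_times = [1.0, 1.0, 1.0, 9.0]
print(fraction_below_mean(hypothetical_times))

# Hypothetical total times of the same function in the two runs.
print(slowdown(8.0, 2.0))
```

A large share of processors below the mean indicates a skewed distribution: a few slow processors pull the average up while the rest wait for them.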

  7. MPI profiling with TAU: MPI Communications
     [Figure: two panels, Load-Imbalanced and Load-balanced]

  8. Future Work
     ❑ Profiling memory consumption
       o branching and cache access patterns
       o time stalled waiting for resources (such as in-memory reads), etc.
     ❑ Communication time at different phases of the algorithm
       o identify whether communication time outweighs computation time
       o change the algorithm design accordingly
     ❑ Different graph-partitioning techniques for improved load-balancing and higher parallel efficiency
       o hypergraph partitioning for social networks
     ❑ Parallel Louvain with data-parallel computations on GPUs

  9. Thank you
     Contact: nsattar@uno.edu, smarifuz@uno.edu
