
Planar: Parallel Lightweight Architecture-Aware Adaptive Graph Repartitioning - PowerPoint PPT Presentation



  1. Planar: Parallel Lightweight Architecture-Aware Adaptive Graph Repartitioning. Angen Zheng, Alexandros Labrinidis, and Panos K. Chrysanthis, University of Pittsburgh.

  2. Graph Partitioning
     ★ Applications of Graph Partitioning
       ○ Scientific Simulations
       ○ Distributed Graph Computation (Pregel, Hama, Giraph)
       ○ VLSI Design
       ○ Task Scheduling
       ○ Linear Programming

  3. A Balanced Partitioning = Even Load Distribution (figure: an example graph split evenly across partitions N1, N2, N3)

  4. Minimal Edge-Cut = Minimal Data Communication (figure: the same graph repartitioned across N1, N2, N3 so that the edge-cut is minimized)

  5. Minimal Edge-Cut = Minimal Data Comm, but Minimal Data Comm ≠ Minimal Comm Cost
     (Figure 1: Pair-Wise Network Bandwidth, J. Xue, BigData'15; the standard deviations of pair-wise bandwidth are 269.71 Mb/s, 416.82 Mb/s, and 358.34 Mb/s)
     ★ Group neighboring vertices as close as possible
     ★ The partitioner has to be Architecture-Aware

  6. Overview of the State-of-the-Art: Balanced Graph (Re)Partitioning
     ○ Partitioners (static graphs): Offline Methods (High Quality, Poor Scalability); Online Methods (Moderate Quality, High Scalability)
     ○ Repartitioners (dynamic graphs): Offline Methods (High Quality, Poor Scalability); Online Methods (Moderate~High Quality, High Scalability)
     ○ Architecture-Aware methods are highlighted within both families

  7. Roadmap: Introduction, Planar, Evaluation, Conclusions

  8. Planar: Problem Statement. Given G=(V, E) and an initial partitioning P, compute a new partitioning that balances the load, minimizes communication (weighted by the network cost between partitions), and minimizes migration.
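
The slide's formulas were images and did not survive extraction. The block below is a generic formulation of the three objectives, assuming k partitions V_1..V_k, edge weights w(u,v), vertex sizes s(v), an assignment P, and a pairwise network cost c(i,j); it is a sketch of the usual objectives, not necessarily Planar's exact definitions.

```latex
% Balanced load: no partition exceeds the average load by more than epsilon
\forall i \in \{1,\dots,k\}: \quad
  \sum_{v \in V_i} s(v) \;\le\; (1+\epsilon)\,\frac{\sum_{v \in V} s(v)}{k}

% Communication: cut edges weighted by the network cost of the partition pair
\min \; \sum_{(u,v) \in E,\; P(u) \neq P(v)} w(u,v)\; c\bigl(P(u),P(v)\bigr)

% Migration: vertices that change partition, weighted by the cost of the move
\min \; \sum_{v \in V,\; P(v) \neq P_{\mathrm{old}}(v)} s(v)\; c\bigl(P_{\mathrm{old}}(v),P(v)\bigr)
```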

  9. Planar: Overview. Planar runs between computation supersteps S_k, S_k+1, S_k+2, ...
     ★ Phase-1: Logical Vertex Migration (migration planning)
       ○ What vertices to move? Where to move them?
       ○ Phase-1a: Minimizing Comm Cost
       ○ Phase-1b: Ensuring Balanced Partitions
     ★ Phase-2: Physical Vertex Migration (perform the migration plan)
     ★ Phase-3: Convergence Check (is repartitioning still beneficial?)
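
A minimal sketch of the control flow this slide describes. All helper functions are hypothetical placeholders standing in for the three phases; they are not APIs from the paper.

```python
def planar_adaptation_superstep(graph, partitioning, arch):
    """One Planar adaptation superstep, run between computation supersteps.
    The helpers below are hypothetical stand-ins for the phases on this slide."""
    # Phase-1: logical vertex migration (planning only, nothing moves yet)
    plan = minimize_comm_cost(graph, partitioning, arch)      # Phase-1a
    plan = rebalance_partitions(graph, partitioning, plan)    # Phase-1b
    # Phase-2: physical vertex migration
    apply_migration_plan(graph, partitioning, plan)
    # Phase-3: report the improvement so the caller can check convergence
    return improvement_of(plan)
```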

  10. Phase-1a: Minimizing Comm Cost (figure: an example graph partitioned across N1, N2, N3, together with the relative communication cost matrix between the partitions that is used in the gain computations on the following slides)

  11. Phase-1a: Minimizing Comm Cost
     ★ Run Planar on each partition in parallel
       ○ For each boundary vertex of my partition:
         ■ make a migration decision on my own
         ■ probabilistic vertex migration
     (figure: the example graph across N1, N2, N3 with its edge and cost annotations)

  12. Phase-1a: Minimizing Comm Cost (content repeats slide 11)

  13. Phase-1a@N1: Use vertex a as an example
     ★ Run Planar on each partition in parallel; each boundary vertex makes its own probabilistic migration decision
     ★ Staying put: g(a, N1, N1) = 0
     ★ Max Gain: 0; Optimal Dest: N1

  14. Phase-1a@N1: Move vertex a to N2?
     old_comm(a, N1) = 2 * 6 + 1 * 1 = 13
     new_comm(a, N2) = 1 * 6 + 1 * 1 = 7
     mig(a, N1, N2) = 1 * 6 = 6
     g(a, N1, N2) = 13 - 7 - 6 = 0
     Max Gain: 0; Optimal Dest: N1

  15. Phase-1a@N1: Move vertex a to N3?
     old_comm(a, N1) = 2 * 6 + 1 * 1 = 13
     new_comm(a, N3) = 1 * 1 + 2 * 1 = 3
     mig(a, N1, N3) = 1 * 1 = 1
     g(a, N1, N3) = 13 - 3 - 1 = 9
     Max Gain: 9; Optimal Dest: N3
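
A sketch of the per-vertex gain from slides 13-15: g(v, src, dst) = old_comm(v, src) - new_comm(v, dst) - mig(v, src, dst). The cost matrix and the neighborhood of vertex a below are assumptions reverse-engineered so the example reproduces the slides' arithmetic; the real values come from the figure on slide 10.

```python
def comm(v, part, neighbors, placement, cost):
    """Communication cost of v's cut edges if v were placed on `part`."""
    return sum(w * cost[part][placement[u]]
               for u, w in neighbors[v]
               if placement[u] != part)

def gain(v, src, dst, neighbors, placement, cost, size=1):
    """g(v, src, dst) = old_comm - new_comm - migration cost."""
    mig = 0 if src == dst else size * cost[src][dst]
    return (comm(v, src, neighbors, placement, cost)
            - comm(v, dst, neighbors, placement, cost)
            - mig)

# Assumed example data: a has one neighbor on N1, two on N2, one on N3;
# relative costs: N1<->N2 is 6, all other partition pairs cost 1.
cost = {"N1": {"N1": 0, "N2": 6, "N3": 1},
        "N2": {"N1": 6, "N2": 0, "N3": 1},
        "N3": {"N1": 1, "N2": 1, "N3": 0}}
neighbors = {"a": [("b", 1), ("c", 1), ("d", 1), ("e", 1)]}
placement = {"a": "N1", "b": "N1", "c": "N2", "d": "N2", "e": "N3"}

print(gain("a", "N1", "N1", neighbors, placement, cost))  # 0  (slide 13)
print(gain("a", "N1", "N2", neighbors, placement, cost))  # 13 - 7 - 6 = 0  (slide 14)
print(gain("a", "N1", "N3", neighbors, placement, cost))  # 13 - 3 - 1 = 9  (slide 15)
```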

  16. Phase-1a: Probabilistic Vertex Migration (migration planning)

     Partition   Boundary Vtx   Migration Dest   Gain   Max Gain   Probability
     N1          a              N3               9      9          9/9
     N2          b              N3               2      3          2/3
     N2          d              N3               3      3          3/3
     N3          e              N3               0      0          0
     N3          g              N3               0      0          0

     Each vertex migrates with a probability proportional to its gain.
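
A sketch of the probabilistic step, under the assumed reading of the table above: each boundary vertex moves to its best destination with probability gain / max-gain-of-its-partition, which damps conflicting moves made concurrently by partitions planning in parallel.

```python
import random

def plan_migrations(candidates, rng=random.random):
    """candidates: {partition: [(vertex, best_dest, gain), ...]}.
    Returns the (vertex, dest) moves selected for this superstep."""
    moves = []
    for part, cands in candidates.items():
        max_gain = max((g for _, _, g in cands), default=0)
        if max_gain <= 0:
            continue  # nothing on this partition is worth moving
        for v, dest, g in cands:
            # migrate with probability proportional to the gain
            if g > 0 and rng() < g / max_gain:
                moves.append((v, dest))
    return moves

# Example matching the table: a:9 on N1; b:2 and d:3 on N2; e, g:0 on N3
candidates = {"N1": [("a", "N3", 9)],
              "N2": [("b", "N3", 2), ("d", "N3", 3)],
              "N3": [("e", "N3", 0), ("g", "N3", 0)]}
print(plan_migrations(candidates))
```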

  17. Phase-1b: Balancing Partitions. Quota-Based Vertex Migration
     Q1: How much work should each overloaded partition migrate to each underloaded partition? (sketched below)
       ■ Potential gain computation, similar to the Phase-1a vertex gain computation
       ■ Iteratively allocate quota, starting from the partition pair with the largest gain
     Q2: What vertices to migrate?
       ■ Phase-1a vertex migration, but limited by the quota
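
A sketch of one way the quota allocation in Q1 could work. The greedy pairing by largest gain follows the slide; the load/target bookkeeping, function name, and example numbers are assumptions for illustration.

```python
def allocate_quotas(load, target, potential_gain):
    """load/target: {partition: load units}; potential_gain: {(src, dst): gain}.
    Returns {(src, dst): units to migrate from src to dst}."""
    surplus = {p: load[p] - target[p] for p in load if load[p] > target[p]}
    deficit = {p: target[p] - load[p] for p in load if load[p] < target[p]}
    quotas = {}
    # visit (overloaded, underloaded) pairs in order of decreasing potential gain
    for (src, dst), _ in sorted(potential_gain.items(),
                                key=lambda kv: kv[1], reverse=True):
        if surplus.get(src, 0) <= 0 or deficit.get(dst, 0) <= 0:
            continue
        amount = min(surplus[src], deficit[dst])
        quotas[(src, dst)] = amount
        surplus[src] -= amount
        deficit[dst] -= amount
    return quotas

# Example: N1 is overloaded and prefers shedding load to N3 over N2
print(allocate_quotas(load={"N1": 12, "N2": 8, "N3": 4},
                      target={"N1": 8, "N2": 8, "N3": 8},
                      potential_gain={("N1", "N3"): 9, ("N1", "N2"): 0}))
# -> {('N1', 'N3'): 4}
```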

  18. Planar: Physical Vertex Migration (Phase-2 of the overview on slide 9): perform the migration plan produced by Phase-1.

  19. Planar: Convergence Check (Phase-3 of the overview on slide 9): decide whether further repartitioning is still beneficial.

  20. Phase-3: Convergence
     (figure: a repartitioning epoch runs Planar after supersteps S_k, S_k+1, S_k+2, ...; once converged, Planar restarts only when the structure or load changes enough)
     ★ Converge when the improvement achieved per adaptation superstep stays below ε for υ consecutive adaptation supersteps
     ★ ε = 1% and υ = 10 (chosen via sensitivity analysis)
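
A minimal sketch of this convergence rule, assuming `improvement` is the relative improvement (e.g., reduction in estimated comm cost) measured after each adaptation superstep:

```python
class ConvergenceCheck:
    """Stop adapting once improvement stays below epsilon for nu consecutive
    adaptation supersteps (slide values: epsilon = 1%, nu = 10)."""

    def __init__(self, epsilon=0.01, nu=10):
        self.epsilon = epsilon
        self.nu = nu
        self.low_improvement_streak = 0

    def update(self, improvement):
        if improvement < self.epsilon:
            self.low_improvement_streak += 1
        else:
            self.low_improvement_streak = 0
        return self.low_improvement_streak >= self.nu  # True -> converged
```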

  21. Evaluation
     ★ Microbenchmarks
       ○ Convergence Study (parameter selection)
       ○ Partitioning Quality
     ★ Real-World Workloads
       ○ Breadth First Search (BFS)
       ○ Single Source Shortest Path (SSSP)
     ★ Scalability Test
       ○ Scalability vs Graph Size
       ○ Scalability vs # of Partitions
       ○ Scalability vs Graph Size and # of Partitions

  22. Partitioning Quality: Setup
     Dataset: 12 datasets from various areas
     # of Parts: 40 (two 20-core machines)
     Initial Partitioners: HP (Hashing Partitioning), DG (Deterministic Greedy), LDG (Linear Deterministic Greedy)

  23. Partitioning Quality: Datasets

     Dataset           |V|           |E|             Description
     wave              156,317       2,118,662       FEM
     auto              448,695       6,629,222       FEM
     333SP             3,712,815     22,217,266      FEM
     CA-CondMat        108,300       373,756         Collaboration Network
     DBLP              317,080       1,049,866       Collaboration Network
     Email-Enron       36,692        183,831
     as-skitter        1,696,415     22,190,596      Internet Topology
     Amazon            334,863       925,872         Product Network
     USA-roadNet       23,947,347    58,333,344      Road Network
     roadNet-PA        1,090,919     6,167,592       Road Network
     YouTube           3,223,589     24,447,548      Social Network
     Com-LiveJournal   4,036,537     69,362,378      Social Network
     Friendster        124,836,180   3,612,134,270   Social Network

  24. Partitioning Quality: Planar achieved up to 68% improvement

     Improv. over   Max   Avg.
     HP             68%   53%
     DG             46%   24%
     LDG            69%   48%

  25. Evaluation (outline repeated from slide 21; continuing with the Real-World Workloads)

  26. Real-World Workload: Setup

     Cluster Configuration   PittMPICluster (FDR Infiniband)     Gordon (QDR Infiniband)
     # of Nodes              32                                  1024
     Network Topology        Single switch (32 nodes/switch)     4x4x4 3D torus of switches (16 nodes/switch)
     Network Bandwidth       56 Gbps                             8 Gbps

     Node Configuration      PittMPICluster (Intel Haswell)      Gordon (Intel Sandy Bridge)
     # of Sockets            2 (10 cores/socket)                 2 (8 cores/socket)
     L3 Cache                25 MB                               20 MB
     Memory Bandwidth        65 GB/s                             85 GB/s

  27. Planar: Avoiding Resource Contention on the Memory Subsystems of Multicore Machines
     ★ System bottleneck (A. Zheng, EDBT'16): PittMPICluster is memory-bound (λ=1); Gordon is network-bound (λ=0)
     (figure: a spectrum of the degree of contention, from the intra-node memory subsystem to the inter-node network, relating λ to whether intra-node or inter-node communication cost is treated as maximal)
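
One plausible reading of the λ knob, offered only as an assumption (the exact cost model is defined in the EDBT'16 paper, not on this slide), is that it interpolates how heavily intra-node traffic is penalized relative to inter-node traffic:

```python
# Hypothetical sketch: lambda blends intra-node (memory subsystem) and
# inter-node (network) communication volume into one cost. This is an
# assumed reading of the slide, not the paper's exact model.

def comm_cost(intra_node_volume, inter_node_volume, lam):
    """lam=1: memory-bound cluster, intra-node traffic dominates the cost;
    lam=0: network-bound cluster, inter-node traffic dominates."""
    return lam * intra_node_volume + (1 - lam) * inter_node_volume

# PittMPICluster-like setting (lam=1) vs. Gordon-like setting (lam=0)
print(comm_cost(10.0, 4.0, lam=1.0))  # 10.0: only intra-node volume counts
print(comm_cost(10.0, 4.0, lam=0.0))  # 4.0: only inter-node volume counts
```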

  28. Real-World Workload: Baselines
     ○ Drawn from the taxonomy of slide 6: offline and online partitioners (static graphs) and repartitioners (dynamic graphs), plus uniPlanar
     ○ Initial Partitioner: DG

  29. BFS Exec. Time on PittMPICluster (λ=1): Planar achieved up to 9x speedups
     ★ as-skitter: |V| = 1.6M, |E| = 22M
     ★ 60 partitions: three 20-core machines
     (bar chart of per-baseline speedups: 1x, 1.37x, 1.48x, 4.1x, 5.8x, 7.5x, 9x)

  30. BFS Comm. Volume on PittMPICluster (λ=1): Planar had the lowest intra-node comm volume
     ★ as-skitter: |V| = 1.6M, |E| = 22M
     ★ 60 partitions: three 20-core machines

     Reduction vs.   Intra-Socket   Inter-Socket
     DG              51%            38%
     METIS           51%            36%
     PARMETIS        47%            34%
     uniPLANAR       44%            28%
     ARAGON          4.3%           0.8%
     PARAGON         5.2%           2.6%

  31. BFS Exec. Time on Gordon (λ=0): Planar achieved up to 3.2x speedups
     ★ as-skitter: |V| = 1.6M, |E| = 22M
     ★ 48 partitions: three 16-core machines
     (bar chart of per-baseline speedups: 1x, 1.05x, 1.16x, 1.21x, 3.2x)

  32. BFS Comm. Volume on Gordon (λ=0): Planar had the lowest inter-node comm volume
     ★ as-skitter: |V| = 1.6M, |E| = 22M
     ★ 48 partitions: three 16-core machines
     (chart: inter-node comm volume reductions of 51%, 25%, 11%, and 0.1% relative to the baselines)

  33. Conclusions
     ★ PLANAR: an Architecture-Aware Adaptive Graph Repartitioner
       ○ Communication Heterogeneity
       ○ Shared Resource Contention
     ★ Up to 9x speedups on real-world workloads
     ★ Scaled up to a graph with 3.6B edges
     Acknowledgments: Peyman Givi, Patrick Pisciuneri, Mark Silvis
     Funding: NSF OIA-1028162, NSF CBET-1250171
