Data Partitioning Strategies for Graph Workloads on Heterogeneous Clusters
Michael LeBeane, Shuang Song, Reena Panda, Jee Ho Ryoo, Lizy K. John
The University of Texas at Austin
mlebeane@utexas.edu
SC 2015, 11/18/2015
Motivation
▪ Heterogeneity is pervasive in modern data centers [][]
▪ Graph analytics are a pervasive workload in the data center []
– Many frameworks available to efficiently and easily perform graph analytics [][][][]
▪ Most frameworks are not equipped to deal with heterogeneity in the data center
[Figure: six compute nodes, each holding its own data, connected by a network]
Background
▪ All work performed on the PowerGraph [] framework
▪ Three relevant graph partitioning topics:
– Online vs. Offline Partitioning
– Vertex vs. Edge Cut
– Gather/Apply/Scatter
▪ Online vs. Offline Partitioning
[Figure: online vs. offline partitioning, with graph elements labeled 1 or 2]
Background
▪ Vertex vs. Edge Cut
[Figure: (a) Vertex Cut and (b) Edge Cut of a small graph across Machine X and Machine Y, with Master and Ghost vertices marked]
▪ Gather/Apply/Scatter
[Figure: (a) Gather, (b) Apply with F(x), (c) Scatter]
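The Gather/Apply/Scatter model named above can be illustrated with a small single-machine sketch. This is not PowerGraph's API: the function name `pagerank_gas`, the damping value, and the fixed iteration count are assumptions chosen only for the example.

```python
from collections import defaultdict


def pagerank_gas(edges, num_iters=20, damping=0.85):
    """Run PageRank as repeated Gather/Apply/Scatter steps (illustrative only)."""
    out_degree = defaultdict(int)
    in_neighbors = defaultdict(list)
    vertices = set()
    for src, dst in edges:
        out_degree[src] += 1
        in_neighbors[dst].append(src)
        vertices.update((src, dst))

    rank = {v: 1.0 for v in vertices}
    for _ in range(num_iters):
        # Gather: each vertex sums the contributions arriving on its in-edges.
        gathered = {
            v: sum(rank[u] / out_degree[u] for u in in_neighbors[v])
            for v in vertices
        }
        # Apply: each vertex computes its new value from the gathered sum.
        new_rank = {v: (1.0 - damping) + damping * gathered[v] for v in vertices}
        # Scatter: a real engine would signal out-neighbors whose inputs changed;
        # this sketch simply re-runs every vertex in the next step.
        rank = new_rank
    return rank


ranks = pagerank_gas([(0, 1), (1, 2), (2, 0)])
```

On a three-vertex cycle every vertex keeps rank 1.0, which makes the sketch easy to check by hand.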
Workload Skew in Heterogeneous Data Centers
▪ Normal Data Partitioning
[Figure: timeline of Compute and Communication phases on a Fast Node and a Slow Node holding equal data; the Fast Node sits Idle until each Barrier]
▪ Skewed Data Partitioning
[Figure: the same timeline with more data on the Fast Node; both nodes reach each Barrier together, giving a Runtime Improvement]
Heterogeneous Graph Analytics
▪ Local node computation time depends on the data distribution
▪ To properly balance work, we need:
– An estimate of each node’s computational capacity
– Partitioning algorithms that account for skewed computational capacity
[Figure: ingress pipeline (Loading Files from File 1 ... File N, Partitioning Graph, Finalizing Graph, App Execution) over Node 1 ... Node n; a Heterogeneity Aware Partitioner, fed by (1) Computation Capacity and (2) the Graph, is shown alongside the Baseline Partitioner]
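Why the data distribution matters can be seen in a toy model of one barrier-synchronised step. The sketch below is an assumption-laden simplification: it ignores communication, and the function name `superstep_time` and the throughput numbers are hypothetical.

```python
def superstep_time(edges_per_node, edges_per_second):
    """Compute time of one barrier-synchronised step: the slowest node sets the pace."""
    return max(edges / rate for edges, rate in zip(edges_per_node, edges_per_second))


rates = [1.0, 3.0]                            # hypothetical slow and fast node throughputs
uniform = superstep_time([200, 200], rates)   # equal split of 400 edges -> 200.0
skewed = superstep_time([100, 300], rates)    # split proportional to throughput -> 100.0
```

With an equal split the fast node finishes early and idles at the barrier; with a proportional split both nodes finish together and the step takes half as long in this example.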
Heterogeneous Computation Capacity
▪ Computation capacity is complex
▪ Dependent on many factors:
– Hardware of the node
– Nature of the graph
– Nature of the algorithm
– Communication patterns
▪ Can we determine a simple, static estimate?
[Figure: ingress pipeline with Baseline Partitioner, Heterogeneity Aware Partitioner, and Computation Capacity input, repeated]
Skew Factor Calculation
▪ Static estimate of node computational capacity could be based on:
– Threads: Logical compute threads on node (default N – 2, where N is the number of hardware threads)
– Memory: Physical memory assigned to a node
– Profiling: Local throughput of graph subset and algorithm
▪ We will refer to the estimated ratios of computation capacity as the skew factor of the heterogeneous data center
Name        HW Threads  Memory  Network                Thread Skew Factor  Memory Skew Factor
c4.xlarge   4           7.5 GB  100 Mbps to 1.86 Gbps  1                   1
c4.2xlarge  8           15 GB   100 Mbps to 1.86 Gbps  3                   2
c4.4xlarge  16          30 GB   100 Mbps to 1.86 Gbps  7                   4
c4.8xlarge  36          60 GB   up to 8.86 Gbps        17                  8
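The skew factors in the table can be reproduced from the thread and memory columns. The sketch below assumes that the skew factor is each node's capacity divided by the smallest capacity in the cluster, which matches the table; the names `INSTANCES` and `skew_factors` are hypothetical.

```python
# Instance data copied from the table: name -> (hardware threads, memory in GB).
INSTANCES = {
    "c4.xlarge": (4, 7.5),
    "c4.2xlarge": (8, 15.0),
    "c4.4xlarge": (16, 30.0),
    "c4.8xlarge": (36, 60.0),
}


def skew_factors(capacities):
    """Normalise raw capacities so that the smallest node has skew factor 1."""
    smallest = min(capacities.values())
    return {name: cap / smallest for name, cap in capacities.items()}


# Thread-based estimate: compute threads default to N - 2 per node.
thread_skew = skew_factors({name: t - 2 for name, (t, _) in INSTANCES.items()})
# Memory-based estimate: physical memory per node.
memory_skew = skew_factors({name: m for name, (_, m) in INSTANCES.items()})

print(thread_skew)  # c4.xlarge 1.0, c4.2xlarge 3.0, c4.4xlarge 7.0, c4.8xlarge 17.0
print(memory_skew)  # c4.xlarge 1.0, c4.2xlarge 2.0, c4.4xlarge 4.0, c4.8xlarge 8.0
```

For example, c4.8xlarge has 36 - 2 = 34 compute threads against 4 - 2 = 2 on c4.xlarge, giving 34 / 2 = 17.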
Heterogeneous Partitioning Algorithm
▪ Online partitioning algorithms must be modified to support the skew factor
▪ Current online partitioning algorithms are easy to modify
▪ We have modified 5 popular algorithms from multiple sources
[Figure: ingress pipeline with Baseline Partitioner, Heterogeneity Aware Partitioner, and Computation Capacity input, repeated]
Problem Formulation
▪ Computation capacity is statically estimated based on:
– Threads: Logical compute threads on node (default N – 2)
– Memory: Physical memory assigned to a node
– Profiling: Local throughput of graph subset and algorithm
Random Skewed Partitioner
▪ Random assignment of edges to nodes
▪ Original
[Figure: each Edge goes through Random Assignment to one of Node 0, Node 1, ..., Node n]
▪ Skewed
[Figure: the same flow, with the Skew Factor as an additional input to Random Assignment]
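A skewed random partitioner can be sketched as a weighted random choice. The sketch assumes the skew factors are used directly as selection weights, which the slides suggest but do not spell out; the function name `skewed_random_partition` and the use of Python's `random.choices` in place of a hash of the edge are assumptions.

```python
import random
from collections import Counter


def skewed_random_partition(edges, skew, seed=0):
    """Assign each edge to a node with probability proportional to its skew factor."""
    rng = random.Random(seed)
    nodes = list(range(len(skew)))
    return {edge: rng.choices(nodes, weights=skew, k=1)[0] for edge in edges}


edges = [(i, i + 1) for i in range(28000)]
assignment = skewed_random_partition(edges, skew=[1, 3, 7, 17])
counts = Counter(assignment.values())   # edges per node, roughly in ratio 1:3:7:17
```

Setting every skew factor to 1 recovers the original, uniform random assignment.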
Greedy Skewed Partitioner
▪ Greedy decision using current distribution of edges
– Either locally or coordinated
▪ Original
[Figure: each Edge goes through Heuristic Assignment, informed by the current Balance, to one of Node 0, Node 1, ..., Node n]
▪ Skewed
[Figure: the same flow, with the Skew Factor as an additional input]
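One way to picture a skew-aware greedy heuristic is sketched below. It is loosely modeled on a locality-plus-balance score and is not taken from the slides: the scoring formula, the choice to divide each node's load by its skew factor, the `epsilon` parameter, and the name `skewed_greedy_partition` are all assumptions.

```python
def skewed_greedy_partition(edges, skew, epsilon=1.0):
    """Assign each edge to the node with the best locality-plus-balance score."""
    load = [0] * len(skew)
    placed = {}                     # vertex -> set of nodes already holding its edges
    assignment = []
    for u, v in edges:
        # Load relative to capacity: a node with skew 17 may hold 17x the edges.
        norm = [load[i] / skew[i] for i in range(len(skew))]
        lo, hi = min(norm), max(norm)

        def score(i):
            locality = (i in placed.get(u, ())) + (i in placed.get(v, ()))
            balance = (hi - norm[i]) / (epsilon + hi - lo)
            return locality + balance

        node = max(range(len(skew)), key=score)
        assignment.append(node)
        load[node] += 1
        placed.setdefault(u, set()).add(node)
        placed.setdefault(v, set()).add(node)
    return assignment, load


# Edges with no shared vertices: only the balance term matters.
disjoint = [(2 * i, 2 * i + 1) for i in range(2800)]
_, load = skewed_greedy_partition(disjoint, skew=[1, 3, 7, 17])
```

With disjoint edges the loads settle close to 100, 300, 700, and 1700, the 1:3:7:17 target; with shared vertices the locality term pulls edges toward nodes that already hold an endpoint.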
Grid Skewed Partitioner
▪ Greedy decision using current distribution of edges
– Either locally or coordinated
▪ Original
[Figure: each Edge passes through a Grid Hash onto the Grid, followed by Random Selection of one of Node 0, Node 1, ..., Node n]
▪ Skewed
[Figure: the same flow, with the Skew Factor as an additional input]
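The Grid Hash and Random Selection stages in the figure can be sketched as follows, under the usual reading of grid partitioning: nodes form a rows × cols grid, each vertex hashes to one cell, and an edge may only go to nodes in the row or column shared by both endpoints. Applying the skew factor as a weight in the random selection is an assumption, as are the function names.

```python
import random


def grid_constraint(vertex, rows, cols):
    """Nodes in the same grid row or column as the cell the vertex hashes to."""
    r, c = divmod(hash(vertex) % (rows * cols), cols)
    row_nodes = {r * cols + j for j in range(cols)}
    col_nodes = {i * cols + c for i in range(rows)}
    return row_nodes | col_nodes


def skewed_grid_partition(edges, skew, rows, cols, seed=0):
    """Pick each edge's node from the endpoints' shared candidates, weighted by skew."""
    if rows * cols != len(skew):
        raise ValueError("grid must contain exactly one cell per node")
    rng = random.Random(seed)
    assignment = []
    for u, v in edges:
        # The row of one endpoint always crosses the column of the other,
        # so the candidate set is never empty.
        shared = grid_constraint(u, rows, cols) & grid_constraint(v, rows, cols)
        candidates = sorted(shared)
        weights = [skew[i] for i in candidates]
        assignment.append(rng.choices(candidates, weights=weights, k=1)[0])
    return assignment


edges = [(i, 7 * i + 3) for i in range(200)]
nodes = skewed_grid_partition(edges, skew=[1, 3, 7, 17], rows=2, cols=2)
```

Integer vertex IDs are used because Python's `hash` is stable for integers; string IDs would need a deterministic hash function.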
Hybrid Skewed Partitioner
▪ Random assignment of edges/vertices to nodes based on degree
▪ Original
[Figure: two stages over Node 0, Node 1, ..., Node n: Random Assignment (inputs: Edge, Vertex) and Heuristic Assignment for a Vertex with Degree > Threshold]
▪ Skewed
[Figure: the same two stages, each with the Skew Factor as an additional input]
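A degree-based hybrid assignment with skew can be sketched as below. Several details are assumptions rather than facts from the slides: degree is taken to mean in-degree, edges of low-degree vertices follow the destination, edges of high-degree vertices follow the source, and the skew factors are integers used as slot counts in a weighted hash. The names `weighted_owner` and `skewed_hybrid_partition` are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from itertools import accumulate


def weighted_owner(vertex, skew):
    """Map a vertex to a node; node i owns skew[i] of the sum(skew) hash slots."""
    cumulative = list(accumulate(skew))
    slot = hash(vertex) % cumulative[-1]
    return bisect_right(cumulative, slot)


def skewed_hybrid_partition(edges, skew, threshold):
    """Place edges by destination for low-degree vertices, by source otherwise."""
    in_degree = Counter(dst for _, dst in edges)
    return [
        weighted_owner(src if in_degree[dst] > threshold else dst, skew)
        for src, dst in edges
    ]


# Vertex 100 has in-degree 28 (above the threshold); vertex 200 has in-degree 1.
edges = [(s, 100) for s in range(28)] + [(5, 200)]
nodes = skewed_hybrid_partition(edges, skew=[1, 3, 7, 17], threshold=5)
```

The 28 edges of the high-degree vertex spread over the nodes in a 1:3:7:17 ratio, while the single edge into vertex 200 stays with that vertex's owner.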
Ginger Skewed Partitioner
▪ Random assignment of edges/vertices to nodes based on degree
▪ Original
[Figure: two stages over Node 0, Node 1, ..., Node n: Random Assignment (inputs: Edge, Vertex) and Heuristic Assignment for a Vertex with Degree > Threshold, with Balance as an additional input]
▪ Skewed
[Figure: the same two stages, each with the Skew Factor as an additional input]
Experimental Setup
▪ Algorithms
– Graph: PageRank (PR), Connected Components (CC), Triangle Count (TC)
– Matrix: Stochastic Gradient Descent (SGD), Alternating Least Squares (ALS)
▪ Data Sets
Name            Vertices    Edges          Size (Uncompressed)  Type              Algorithms
amazon          403,394     3,384,388      46MB                 Directed Graph    PR,CC,TC
citation        3,774,768   16,518,948     268MB                Directed Graph    PR,CC,TC
netflix         NA          NA             100MB                Sparse Matrix     ALS,SGD
road-map        1,379,917   1,921,660      84MB                 Undirected Graph  PR,CC,TC
social-network  4,847,571   68,993,773     1.1GB                Directed Graph    PR,CC,TC
twitter         41,000,000  1,400,000,000  25GB                 Directed Graph    PR,CC,TC
wiki            2,394,385   5,021,410      64MB                 Directed Graph    PR,CC,TC
Experimental Setup
▪ Data Center
▪ Skew Factor
– Results use the thread-based skew factor
Execution Time
▪ PageRank
[Figure: stacked runtime (s) for the Random, Greedy, Grid, Hybrid, and Ginger partitioners, Baseline vs. Skewed, on social_network, amazon, citation, road_map, and wiki; stacks are Transmit, Receive, Gather, Apply, Scatter]
▪ Connected Components
[Figure: same layout as PageRank, with two runtime (s) axes; part of the data is plotted against the right axis]
▪ Triangle Count
[Figure: same layout as PageRank, with two runtime (s) axes; part of the data is plotted against the right axis]
▪ Stochastic Gradient Descent
▪ Alternating Least Squares
[Figures: stacked runtime (s) for each partitioner on netflix, Baseline vs. Skewed, one chart per algorithm; stacks are TX, RX, G, A, S]
Data Distribution
▪ Ideal distribution 17-7-3-1
[Figure: relative edge distribution (0 to 1) across Node (1), Node (3), Node (7), and Node (17) for SRandom, SGreedy, SGrid, SHybrid, and SGinger on social_network, amazon, citation, road_map, and wiki, next to Non-Skew and Target-Skew (optimal) reference bars]
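Reading the ideal distribution 17-7-3-1 as the thread skew factors of the four nodes (an interpretation; the legend labels Node (1) to Node (17) suggest it), the target share of edges per node follows by normalising. The function name `ideal_edge_shares` is hypothetical.

```python
def ideal_edge_shares(skew):
    """Fraction of all edges each node should hold, proportional to its skew factor."""
    total = sum(skew)
    return [s / total for s in skew]


shares = ideal_edge_shares([17, 7, 3, 1])
print([round(s, 3) for s in shares])  # [0.607, 0.25, 0.107, 0.036]
```

The largest node should therefore hold about 61% of the edges, compared with 25% under a non-skewed four-way split.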
Results
▪ Skewed approach generally decreases network communication
[Figure: replication factor (0 to 4) for the Random, Greedy, Grid, Hybrid, and Ginger partitioners, Baseline vs. Skewed, on social_network, amazon, citation, road_map, and wiki]
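The replication factor plotted here is commonly defined as the average number of nodes on which each vertex has a copy; fewer copies mean less synchronisation traffic. A minimal sketch under that definition follows, with the function name `replication_factor` and the list-based assignment format as assumptions.

```python
from collections import defaultdict


def replication_factor(edges, assignment):
    """Average number of nodes holding a replica of each vertex."""
    replicas = defaultdict(set)
    for (u, v), node in zip(edges, assignment):
        replicas[u].add(node)
        replicas[v].add(node)
    return sum(len(nodes) for nodes in replicas.values()) / len(replicas)


triangle = [(0, 1), (1, 2), (2, 0)]
together = replication_factor(triangle, [0, 0, 0])   # all edges on one node -> 1.0
spread = replication_factor(triangle, [0, 1, 2])     # one edge per node -> 2.0
```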
Results
▪ Data Ingress Time
[Figure: ingress time (s), axis 0 to 40, for each partitioner, Baseline vs. Skewed, on social_network, amazon, citation, road_map, and wiki; one off-scale bar is labeled 58]
Scale-out Results
▪ Extremely large Twitter graph
▪ No benefits after 36 nodes
Configuration Name  c4.2xlarge  c4.4xlarge  c4.8xlarge
Config 1            12          8           4
Config 2            8           8           8
Config 3            4           8           12
Config 4            3           5           16
[Figure: percentage improvement (0 to 35) for Config 1 to Config 4 with the Random, Greedy, Grid, Hybrid, and Ginger partitioners]
[Figure: runtime (s) vs. cluster size (10 to 60) for each baseline partitioner and its skewed counterpart (SRandom, SGreedy, SGrid, SHybrid, SGinger)]
Future Work
▪ Incorporate a better network model
▪ Profile-based partitioning scheme
– How do we sample graph inputs?
Conclusion
▪ Simple, static throughput estimation can greatly improve performance
▪ We modify 5 existing online graph partitioning strategies for heterogeneous environments
▪ Our modified algorithms improve runtime by as much as 64% and on average 32% on Amazon EC2
▪ We show that our strategies also work up to 48 nodes, achieving an 18% performance improvement on scale-out
Thank You!
References
[1] S. Garg, S. Sundaram, and H. D. Patel. Robust heterogeneous data center design: A principled approach. SIGMETRICS Perform. Eval. Rev., 39(3):28–30, Dec. 2011.
[2] B.-G. Chun, G. Iannaccone, G. Iannaccone, R. Katz, G. Lee, and L. Niccolini. An energy case for hybrid datacenters. SIGOPS Oper. Syst. Rev., 4(1):76–80, Mar. 2010.
[1] J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed graph-parallel computation on natural graphs. In OSDI, pages 17–30. USENIX Association, 2012.