GraphReduce: Large-Scale Graph Analytics
on Accelerator-Based HPC Systems
Dipanjan Sengupta Kapil Agarwal Karsten Schwan CERCS - Georgia Tech
Shuaiwen Leon Song Pacific Northwest National Lab
frameworks are orders of magnitude faster
processing doesn't handle datasets that don't fit in memory
Yahoo-web graph with 1.4 billion
vertices requires 6.6 GB memory just to store its vertex values.
Several challenges in large-scale graph processing
How to partition the graph?
How and when to move the partitions between host and GPU?
How to best extract multi-level parallelism in GPUs?
[Figure: three panels (Gather, Apply, Scatter), each showing vertex v with neighbors U1, U2, U3, U4 and labels a, b, c, d]
Existing systems choose either a vertex-centric or an edge-centric GAS
programming model for graph execution.
Different processing phases have different types of parallelism and
memory access characteristics
GraphReduce adopts a hybrid model that combines the
vertex-centric and edge-centric models
Vertex-centric GAS
  vertex_scatter(vertex v)
    send updates over outgoing edges of v
  vertex_gather(vertex v)
    apply updates from inbound edges of v
  while not done
    for all vertices v that need to scatter updates
      vertex_scatter(v)
    for all vertices v that have updates
      vertex_gather(v)

Edge-centric GAS
  edge_scatter(edge e)
    send update over e
  update_gather(update u)
    apply update u to u.destination
  while not done
    for all edges e
      edge_scatter(e)
    for all updates u
      update_gather(u)
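The two loops above can be made concrete with a small host-side sketch. The example below is only an illustration, not GraphReduce code: it assumes a minimum-label propagation task (connected components) and uses the hypothetical function names `vertex_centric_cc` and `edge_centric_cc`. The vertex-centric version iterates over active vertices and their out-edges, while the edge-centric version streams over the whole edge list each round.

```python
from collections import defaultdict


def vertex_centric_cc(num_vertices, edges):
    """Vertex-centric loop: only vertices with pending updates do work."""
    out_edges = defaultdict(list)
    for src, dst in edges:
        out_edges[src].append(dst)

    label = list(range(num_vertices))
    active = set(range(num_vertices))        # vertices that need to scatter
    while active:
        inbox = defaultdict(list)
        for v in active:                     # vertex_scatter(v)
            for dst in out_edges[v]:
                inbox[dst].append(label[v])
        active = set()
        for v, updates in inbox.items():     # vertex_gather(v)
            best = min(updates)
            if best < label[v]:
                label[v] = best
                active.add(v)
    return label


def edge_centric_cc(num_vertices, edges):
    """Edge-centric loop: every round streams over all edges."""
    label = list(range(num_vertices))
    changed = True
    while changed:
        # edge_scatter(e): emit one update per edge
        updates = [(dst, label[src]) for src, dst in edges]
        changed = False
        for dst, value in updates:           # update_gather(u)
            if value < label[dst]:
                label[dst] = value
                changed = True
    return label


# Undirected graph stored as edges in both directions (illustrative data).
example_edges = [(0, 1), (1, 0), (1, 2), (2, 1), (3, 4), (4, 3)]
```

Both functions should agree on the final labels. The vertex-centric form does less work per round when few vertices are active, whereas the edge-centric form has a simpler, sequential access pattern over the edge list.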
Three major components
Partition Engine
Data Movement Engine
Computation Engine
Partition Engine has two responsibilities
Load-balanced shard creation, such that each shard contains an approximately equal number of edges
Ordering the edges in a shard based on their source or destination vertices for efficient data movement and memory access
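The two responsibilities above could look roughly like the following sketch. It is not the GraphReduce partitioner: the function name `build_shards`, the greedy boundary rule, and the choice to give each shard a contiguous vertex interval are all assumptions made for illustration.

```python
from bisect import bisect_right


def build_shards(edges, num_vertices, num_shards, key="dst"):
    """Split edges into shards of roughly equal size, sorted by one endpoint.

    Each shard owns a contiguous interval of vertex ids (an assumption of
    this sketch), so all edges sharing the chosen endpoint land together.
    """
    idx = 1 if key == "dst" else 0

    # Count edges per vertex on the chosen endpoint.
    degree = [0] * num_vertices
    for edge in edges:
        degree[edge[idx]] += 1

    # Greedily close a shard once the running edge count reaches its share.
    target = len(edges) / num_shards
    boundaries = []                 # exclusive upper vertex id of each shard
    running = 0
    for v in range(num_vertices):
        running += degree[v]
        if (len(boundaries) < num_shards - 1
                and running >= target * (len(boundaries) + 1)):
            boundaries.append(v + 1)
    boundaries.append(num_vertices)

    # Assign edges to shards, then order each shard by the chosen endpoint.
    shards = [[] for _ in boundaries]
    for edge in edges:
        shards[bisect_right(boundaries, edge[idx])].append(edge)
    for shard in shards:
        shard.sort(key=lambda e: (e[idx], e[1 - idx]))
    return shards


# Illustrative input: 4 vertices, 6 edges given as (source, destination).
demo_edges = [(2, 3), (0, 1), (3, 1), (0, 2), (1, 3), (2, 1)]
demo_shards = build_shards(demo_edges, num_vertices=4, num_shards=2)
```

Sorting by destination groups the in-edges of a vertex next to each other, which suits a gather over in-edges; passing `key="src"` groups out-edges instead. With skewed degree distributions the greedy rule only approximates equal shard sizes.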
Data Movement Engine has the following responsibilities
Moving shards in and out of limited GPU memory to process large-scale graphs
Efficiently utilizing GPU hardware resources using CUDA streams and Hyper-Q to achieve high performance
Saturating the data transfer bandwidth of the PCI-E bus connecting the host and the GPUs
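The idea of overlapping shard transfers with computation can be sketched on the host side as a double-buffered pipeline. This is an analogy only and does not use the CUDA API: the function `stream_shards` and its `load` and `compute` callbacks are hypothetical stand-ins for an asynchronous host-to-device copy and a GPU kernel, and a single worker thread plays the role of a copy stream.

```python
from concurrent.futures import ThreadPoolExecutor


def stream_shards(shards, load, compute):
    """Process shards in order while prefetching the next shard.

    `load` stands in for a host-to-device copy and `compute` for a kernel
    launch; both are placeholders supplied by the caller.
    """
    results = []
    if not shards:
        return results
    with ThreadPoolExecutor(max_workers=1) as copier:
        pending = copier.submit(load, shards[0])
        for i in range(len(shards)):
            resident = pending.result()        # wait for the current copy
            if i + 1 < len(shards):
                # Start copying the next shard before computing on this one.
                pending = copier.submit(load, shards[i + 1])
            results.append(compute(resident))  # overlaps with the next copy
    return results


# Illustrative use: "loading" copies the list, "computing" sums it.
demo_results = stream_shards([[1, 2], [3], [4, 5, 6]], load=list, compute=sum)
```

Only two shards are resident at any time (the one being processed and the one being prefetched), which mirrors how a limited device memory can still be used to process a graph larger than that memory.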
Four phases of computation
Gather Map: fetches all the updates/messages along the in-edges
Gather Reduce: reduces all the collected updates for each vertex
Apply: applies the update to each vertex
Scatter: distributes the updated states of the vertices along the out-edges
Combination of vertex-centric and edge-centric implementation
Gather Map – edge-centric
Gather Reduce – vertex-centric
Apply – vertex-centric
Scatter – edge-centric
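As a rough illustration of how the four phases fit together, the sketch below runs one PageRank iteration in plain Python. The choice of PageRank, the function name `gas_pagerank_step`, and the `damping` and `tol` parameters are assumptions for this example and are not taken from the talk. The comments mark which loops range over edges and which range over vertices.

```python
def gas_pagerank_step(edges, rank, out_degree, damping=0.85, tol=1e-9):
    """One PageRank iteration split into the four phases.

    Assumes every vertex that appears as an edge source has out_degree >= 1.
    """
    n = len(rank)

    # Gather Map (edge-centric): produce one message per in-edge.
    messages = [(dst, rank[src] / out_degree[src]) for src, dst in edges]

    # Gather Reduce (vertex-centric): combine the messages of each vertex.
    gathered = [0.0] * n
    for dst, value in messages:
        gathered[dst] += value

    # Apply (vertex-centric): compute the new state of each vertex.
    new_rank = [(1.0 - damping) / n + damping * g for g in gathered]

    # Scatter (edge-centric): propagate along out-edges of changed vertices.
    changed = [abs(new - old) > tol for new, old in zip(new_rank, rank)]
    active_edges = [(src, dst) for src, dst in edges if changed[src]]
    return new_rank, active_edges


# Illustrative 3-vertex cycle: 0 -> 1 -> 2 -> 0.
cycle = [(0, 1), (1, 2), (2, 0)]
step_rank, step_active = gas_pagerank_step(cycle, [1.0, 0.0, 0.0], [1, 1, 1])
```

On a GPU, the edge-centric phases would map naturally to one thread per edge and the vertex-centric phases to one thread (or a segmented reduction) per vertex, which is the motivation for mixing the two styles.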
GraphReduce develops a graph processing framework for input datasets that may or may not fit in GPU memory
Adopts a combination of both edge- and vertex-centric implementations of the GAS programming model
Uses CUDA streams and hardware support like Hyper-Q to stream data in and out of the GPU for high performance
Outperforms CPU-based out-of-core graph processing frameworks across a variety of real datasets