Efficient Large-Scale Graph Processing on Hybrid CPU and GPU Systems

SLIDE 1

Efficient Large-Scale Graph Processing on Hybrid CPU and GPU Systems

Abdullah Gharaibeh, Elizeu Santos-Neto, Lauro Costa and Matei Ripeanu

Reviewer: Varun Gandhi (vg292)

Computer Laboratory

SLIDE 2

CPU-GPU Hybrid Systems

  • One of the fastest desktop CPUs: 8 cores
  • GPU: 2048 CUDA cores

SLIDE 3

Conventional Applications

SLIDE 4

New Dimension

Single-node graph computation

SLIDE 5

Real-world graph characteristics

Single-node bottlenecks

  • High memory footprint
  • Heterogeneous degree
  • Cost of partitioning

Key Idea

  • Load balancing across GPU & CPU
  • Algorithm agnostic
  • Different from GraphChi

SLIDE 6

Hybrid Model

  • Two processing units
  • Communication rate: edges per second

  • Majority of edges remain at CPU
  • Random partitioning
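
The model on this slide can be sketched numerically. The following is a simplified illustration, not the exact formulation from the paper: it assumes each processing unit spends time proportional to the edges it holds plus the boundary edges it must communicate, and that the hybrid run finishes when the slower unit finishes. The function and parameter names (`partition_time`, `hybrid_speedup`, `r_cpu`, `r_gpu`, `comm_rate`) are hypothetical.

```python
def partition_time(edges, boundary_edges, proc_rate, comm_rate):
    """Time for one processing unit: communication plus computation.

    Rates are in edges per second (assumed units, matching the slide).
    """
    return boundary_edges / comm_rate + edges / proc_rate


def hybrid_speedup(total_edges, cpu_fraction, boundary_fraction,
                   r_cpu, r_gpu, comm_rate):
    """Predicted speedup of a CPU+GPU run over a CPU-only run.

    cpu_fraction:      share of edges that remain on the CPU
    boundary_fraction: share of edges that cross the CPU/GPU cut
    """
    cpu_only = total_edges / r_cpu
    boundary = boundary_fraction * total_edges
    t_cpu = partition_time(cpu_fraction * total_edges, boundary,
                           r_cpu, comm_rate)
    t_gpu = partition_time((1.0 - cpu_fraction) * total_edges, boundary,
                           r_gpu, comm_rate)
    # The slower processing unit determines the finish time.
    return cpu_only / max(t_cpu, t_gpu)
```

Under these assumptions, the speedup grows as fewer edges remain on the CPU and as fewer edges cross the partition boundary, which is why the share of boundary edges matters for the partitioning strategy.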

SLIDE 7

Simulation Results

Predicted gains based on the simulated model

SLIDE 8

TOTEM

  • Implemented in both C & CUDA
  • Adopts BSP model
  • Computation phase
  • Communication phase
  • Termination
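
The BSP phases listed above can be illustrated with a small level-synchronous BFS. This is a minimal single-process sketch, not TOTEM's C/CUDA implementation: partitions are simulated as loops, and the names `bsp_bfs`, `adj`, and `owner` are assumptions made for the example.

```python
INF = float("inf")


def bsp_bfs(adj, owner, source, num_parts=2):
    """Breadth-first search structured as BSP supersteps.

    adj:   dict mapping each vertex to a list of neighbours
    owner: dict mapping each vertex to its partition id
    Returns (level per vertex, number of supersteps executed).
    """
    level = {v: INF for v in adj}
    level[source] = 0
    inbox = [{} for _ in range(num_parts)]
    step = 0
    while True:
        outbox = [{} for _ in range(num_parts)]
        finished = [True] * num_parts

        # Computation phase: each partition works only on its own vertices.
        for p in range(num_parts):
            # Apply messages received in the previous superstep.
            for v, lvl in inbox[p].items():
                if lvl < level[v]:
                    level[v] = lvl
            # Expand the current frontier.
            for v in adj:
                if owner[v] != p or level[v] != step:
                    continue
                for w in adj[v]:
                    if owner[w] != p:
                        # Remote neighbour: buffer a message for its owner.
                        outbox[owner[w]][w] = step + 1
                        finished[p] = False
                    elif level[w] == INF:
                        level[w] = step + 1
                        finished[p] = False

        # Communication phase: each outbox becomes the peer's inbox.
        inbox = outbox

        # Termination: stop once every partition reports it is finished.
        if all(finished):
            return level, step + 1
        step += 1
```

A partition votes to finish only if it neither updated a local vertex nor buffered a remote message, so the outboxes are guaranteed to be empty when the loop exits.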

SLIDE 9

Trade-off: Graph Representation

  • Compressed Sparse Row (CSR) format
  • Low memory footprint
  • Expensive updates
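
To make the trade-off concrete, here is a minimal sketch of a CSR layout built from an edge list. The helper names `build_csr` and `neighbours` are hypothetical, and the code only illustrates the general format, not TOTEM's data structures.

```python
def build_csr(num_vertices, edges):
    """Build CSR arrays from a list of (source, target) pairs.

    offsets[v]..offsets[v + 1] delimits the slice of `targets`
    that holds the neighbours of vertex v.
    """
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    # Prefix sum turns per-vertex counts into start positions.
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]

    targets = [0] * len(edges)
    cursor = offsets[:-1]  # next free slot for each vertex (a copy)
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def neighbours(offsets, targets, v):
    """Return the neighbours of v as a contiguous slice."""
    return targets[offsets[v]:offsets[v + 1]]
```

The footprint is one offset per vertex plus one entry per edge. Adding an edge, however, means shifting every later entry in `targets` and incrementing every later offset, which is the cost behind "expensive updates".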

SLIDE 10

Trade-off: Communication Overhead

  • Mutable graph structures expensive
  • GPU cannot be leveraged
  • Outbox values copied to Inbox
  • Aggregate at source
  • Transfer based on user-provided callback
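
The outbox/inbox idea can be sketched as follows. This is an illustration under stated assumptions rather than TOTEM's API: `aggregate_at_source` and `transfer` are hypothetical names, a plain dictionary copy stands in for the host-device transfer, and the reduction and update functions play the role of the user-provided callback.

```python
def aggregate_at_source(messages, reduce_fn):
    """Combine messages addressed to the same remote vertex.

    messages:  iterable of (remote_vertex, value) pairs
    reduce_fn: user-supplied reduction, e.g. min for BFS levels
    Returns an outbox with at most one value per remote vertex.
    """
    outbox = {}
    for dst, value in messages:
        if dst in outbox:
            outbox[dst] = reduce_fn(outbox[dst], value)
        else:
            outbox[dst] = value
    return outbox


def transfer(outbox, state, apply_fn):
    """Copy the outbox into the peer's inbox and apply it to its state.

    Returns the number of values that crossed the partition boundary.
    """
    inbox = dict(outbox)  # stands in for the actual memory transfer
    for dst, value in inbox.items():
        state[dst] = apply_fn(state[dst], value)
    return len(inbox)
```

For example, four messages addressed to two remote vertices shrink to two transferred values once they are aggregated at the source, so the transfer volume depends on the number of distinct remote vertices rather than the number of boundary edges.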

SLIDE 11

Graph Partitioning

  • High-degree vertices: GPU
  • Low-degree vertices: CPU
  • Leverages low communication overhead
  • Fails to maintain boundary edge threshold
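
A degree-based split like the one listed above can be sketched as a greedy assignment. The code below is an assumption-laden illustration: `partition_by_degree`, `boundary_fraction`, and the `gpu_edge_budget` parameter (a stand-in for limited GPU memory) are hypothetical, and the assignment rule simply follows the slide by placing the highest-degree vertices on the GPU.

```python
def partition_by_degree(adj, gpu_edge_budget):
    """Assign the highest-degree vertices to the GPU until the budget is used.

    adj: dict mapping each vertex to a list of neighbours
    gpu_edge_budget: maximum number of edges the GPU partition may hold
    """
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    owner = {}
    used = 0
    for v in order:
        degree = len(adj[v])
        if used + degree <= gpu_edge_budget:
            owner[v] = "gpu"
            used += degree
        else:
            owner[v] = "cpu"
    return owner


def boundary_fraction(adj, owner):
    """Share of edges whose endpoints live on different processing units."""
    total = sum(len(ns) for ns in adj.values())
    cut = sum(1 for v, ns in adj.items() for w in ns if owner[v] != owner[w])
    return cut / total if total else 0.0
```

On a small star-like graph, moving the hub to the GPU cuts most of the edges, which illustrates why a purely degree-based split gives no guarantee on the share of boundary edges.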

SLIDE 12

Synthetic Workload

SLIDE 13

Evaluation

SLIDE 14

Conclusions

  • CSR representation not ideal
  • Dependent on GPU memory
  • Kineograph is a possibility
  • New paradigm in graph computing