Out-of-GPU-Memory Graph Processing Amir Hossein Nodehi - PowerPoint PPT Presentation




SLIDE 1

Subway: Minimizing Data Transfer during Out-of-GPU-Memory Graph Processing

Amir Hossein Nodehi Sabet, Zhijia Zhao, Rajiv Gupta
Computer Science and Engineering, UC Riverside

SLIDE 2

Background and Motivation

  • GPUs enable massive parallelism for graph processing
      • CuSha [1]
      • Gunrock [2]
      • Tigr [3]
  • Graphs can be large and tend to grow over time
      • Web graphs
      • Social networks
  • But GPU memory is limited!
  • Out-of-GPU-Memory Graph Processing

[1] Khorasani, Farzad, et al. "CuSha: vertex-centric graph processing on GPUs." HPDC'14
[2] Wang, Yangzihao, et al. "Gunrock: A high-performance graph processing library on the GPU." PPoPP'16
[3] Nodehi Sabet, Amir Hossein, Junqiao Qiu, and Zhijia Zhao. "Tigr: Transforming irregular graphs for GPU-friendly graph processing." ASPLOS'18

SLIDE 3

Partition-based Graph Processing

[Figure: graph partitions in Main Memory and GPU Memory; phases labeled "Transferring" and "Computation"]
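The partition-based scheme on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the graph is in CSR form, "GPU memory" is simulated by a Python list slice, every partition is streamed in every iteration, and the names partition_ranges, bfs_partitioned, and max_edges are hypothetical.

```python
def partition_ranges(offsets, max_edges):
    """Split vertices into contiguous ranges whose edges fit in max_edges."""
    ranges, start = [], 0
    num_vertices = len(offsets) - 1
    for v in range(num_vertices):
        # Close the current range if adding vertex v would exceed the budget.
        if offsets[v + 1] - offsets[start] > max_edges and v > start:
            ranges.append((start, v))
            start = v
    ranges.append((start, num_vertices))
    return ranges


def bfs_partitioned(offsets, edges, source, max_edges):
    """Level-synchronous BFS that streams every partition in every iteration.

    Returns (level, transferred), where transferred counts the edges copied
    to the simulated GPU memory over the whole run.
    """
    num_vertices = len(offsets) - 1
    level = [-1] * num_vertices
    level[source] = 0
    ranges = partition_ranges(offsets, max_edges)
    transferred, depth, changed = 0, 0, True
    while changed:
        changed = False
        for lo, hi in ranges:
            # "Transferring": copy this partition's edges (simulated).
            chunk = edges[offsets[lo]:offsets[hi]]
            transferred += len(chunk)
            # "Computation": expand only vertices on the current frontier.
            for v in range(lo, hi):
                if level[v] != depth:
                    continue
                base = offsets[v] - offsets[lo]
                degree = offsets[v + 1] - offsets[v]
                for u in chunk[base:base + degree]:
                    if level[u] == -1:
                        level[u] = depth + 1
                        changed = True
        depth += 1
    return level, transferred


# Tiny 4-vertex example graph (made up for illustration).
offsets = [0, 2, 3, 5, 6]
edges = [1, 2, 2, 0, 3, 1]
level, transferred = bfs_partitioned(offsets, edges, source=0, max_edges=3)
```

In this toy run all 6 edges are copied in each of the 3 iterations, even though only a few vertices are on the frontier at any time.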

SLIDE 4

A Key Observation

The ratio of active vertices (and edges) is low in most iterations

Algo.   friendster   uk-2007
SSSP    9.1%         5.1%
BFS     4.1%         0.6%
CC      9.8%         3.2%

Average Ratio of Active Edges across Iterations
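To make "ratio of active edges" concrete, here is a minimal sketch that measures it per iteration for BFS on a CSR graph. It assumes an edge is active in an iteration when its source vertex is on the current frontier; the function name active_edge_ratios_bfs and the tiny example graph are hypothetical and are not the data behind the numbers above.

```python
def active_edge_ratios_bfs(offsets, edges, source):
    """Return, per BFS iteration, the fraction of edges whose source is active."""
    num_vertices = len(offsets) - 1
    total_edges = len(edges)
    if total_edges == 0:
        return []
    visited = [False] * num_vertices
    visited[source] = True
    frontier = [source]
    ratios = []
    while frontier:
        # Active edges are the out-edges of the frontier vertices.
        active_edges = sum(offsets[v + 1] - offsets[v] for v in frontier)
        ratios.append(active_edges / total_edges)
        next_frontier = []
        for v in frontier:
            for u in edges[offsets[v]:offsets[v + 1]]:
                if not visited[u]:
                    visited[u] = True
                    next_frontier.append(u)
        frontier = next_frontier
    return ratios


# Tiny 4-vertex example graph (made up for illustration).
offsets = [0, 2, 3, 5, 6]
edges = [1, 2, 2, 0, 3, 1]
ratios = active_edge_ratios_bfs(offsets, edges, source=0)
average_ratio = sum(ratios) / len(ratios)
```

On a graph this small the ratios are high; the slide's point is that on large real-world graphs the average stays in the single digits.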

SLIDE 5

Only Load Active Edges to GPU?

[Figure: only the active edges are copied from Main Memory to GPU Memory]

Too expensive to generate?!

SLIDE 6

Efficient Subgraph Generation

Subway:

  • a concise subgraph representation, called SubCSR
  • a highly parallel algorithm for subgraph generation
  • an efficient GPU-accelerated implementation
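The slide does not spell out the SubCSR layout, so the following is a sequential sketch under an assumption: SubCSR is taken to hold the IDs of the active vertices, an offset array, and the compacted edges of those vertices. The function build_subcsr and the array names are hypothetical. Each of the three steps (compaction, prefix sum, per-vertex copy) has a well-known parallel counterpart, which is why this kind of generation maps well to a GPU.

```python
from itertools import accumulate


def build_subcsr(offsets, edges, active):
    """Compact the out-edges of active vertices from a CSR graph.

    offsets: CSR offset array of length num_vertices + 1
    edges:   CSR edge (destination) array
    active:  list of booleans, one per vertex
    Returns (sub_vertices, sub_offsets, sub_edges).
    """
    # Step 1: collect the IDs of active vertices (stream compaction).
    sub_vertices = [v for v, is_active in enumerate(active) if is_active]
    # Step 2: prefix sum over the active vertices' degrees gives the offsets.
    degrees = [offsets[v + 1] - offsets[v] for v in sub_vertices]
    sub_offsets = [0] + list(accumulate(degrees))
    # Step 3: copy each active vertex's edge range; the copies are
    # independent of each other, so they could run in parallel.
    sub_edges = [0] * sub_offsets[-1]
    for i, v in enumerate(sub_vertices):
        start = sub_offsets[i]
        sub_edges[start:start + degrees[i]] = edges[offsets[v]:offsets[v + 1]]
    return sub_vertices, sub_offsets, sub_edges


# Tiny 4-vertex example graph with vertices 0 and 2 active (made up).
offsets = [0, 2, 3, 5, 6]
edges = [1, 2, 2, 0, 3, 1]
active = [True, False, True, False]
sub_vertices, sub_offsets, sub_edges = build_subcsr(offsets, edges, active)
```

Only sub_vertices, sub_offsets, and sub_edges would then need to be transferred, here 4 edges instead of 6.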

SLIDE 7

SubCSR Generation Cost

[Bar chart: relative cost (y-axis, 0 to 1) of PT (Transfer) vs. Subway-sync (SubCSR + Transfer) for SSSP, BFS, and CC on the FS and UK graphs; annotations on the chart: 3%, 17%]

Costs: Partitioning-based vs. Subway (subgraph generation)

SLIDE 8

Takeaway

Too expensive to dynamically generate subgraphs!

Subway: improves performance by up to 28X!

SLIDE 9

Thank you

Amir Nodehi: anode001@ucr.edu or on Slack
The source code (to be posted soon): https://github.com/AutomataLab/Subway
