GraVF: A Vertex-Centric Graph Processing Framework on FPGA

SLIDE 1

GraVF: A Vertex-Centric Graph Processing Framework on FPGA

Nina Engelhardt

August 31, 2016

SLIDE 2

Graphs and Graph Traversal Algorithms

[Figure: betweenness centrality illustration; image credit in Works Cited]

Vertex-centric Programming Model:
  • Programs are written from the point of view of an individual vertex
  • Each vertex receives messages from its neighbors, calculates, and sends messages to its neighbors
  • A global barrier follows each iteration (superstep)
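The model can be sketched in Python for BFS. This is a minimal illustration under stated assumptions, not GraVF's hardware kernel: the graph is assumed to be an adjacency-list dict, and `bfs_kernel` and `run_bfs` are hypothetical names. Each loop iteration is one superstep, and replacing `inbox` with `outbox` stands in for the global barrier.

```python
from collections import defaultdict

INF = float("inf")  # level of a vertex not yet reached


def bfs_kernel(level, messages, superstep):
    """Vertex kernel (hypothetical): look at received messages, calculate,
    decide whether to send.

    Returns (new_level, send). A vertex is reached in the first superstep
    in which it receives any message, and then notifies its neighbors once.
    """
    if level == INF and messages:
        return superstep, True
    return level, False


def run_bfs(adjacency, source):
    """Synchronous vertex-centric BFS; each while-iteration is one superstep."""
    level = {v: INF for v in adjacency}
    level[source] = 0
    # Superstep 0: the source sends to its neighbors.
    inbox = defaultdict(list)
    for nbr in adjacency[source]:
        inbox[nbr].append(0)
    superstep = 1
    while inbox:  # stop when no messages are in flight
        outbox = defaultdict(list)
        for v, msgs in inbox.items():
            level[v], send = bfs_kernel(level[v], msgs, superstep)
            if send:
                for nbr in adjacency[v]:
                    outbox[nbr].append(level[v])
        # Global barrier: messages sent in this superstep are only seen
        # in the next one.
        inbox = outbox
        superstep += 1
    return level


graph = {0: [1, 2], 1: [3], 2: [3], 3: [0], 4: []}
print(run_bfs(graph, 0))  # {0: 0, 1: 1, 2: 1, 3: 2, 4: inf}
```

The kernel itself is only a few lines; everything else is the runtime, which is what a framework like GraVF is meant to provide.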

SLIDE 3

Synchronous Vertex-centric Programming Model

Advantages:
  • Naturally distributed
  • Only a very short kernel to write

Challenges:
  • The barrier means stragglers can unduly delay the whole computation
  • All messages from one superstep have to be stored until the next
→ Floating barrier [1]: each PE broadcasts a barrier message when it finishes a superstep, allowing supersteps to overlap
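The floating barrier can be modeled from one receiving PE's point of view. In this hypothetical sketch (the `FloatingBarrier` class and its methods are illustrative names, not from the slides), each channel is assumed to deliver messages in FIFO order, so a data message from PE j belongs to the superstep equal to the number of barriers already received from j, and superstep i ends at the receiver once all PEs have sent their barrier for i.

```python
class FloatingBarrier:
    """Receiver-side bookkeeping for a floating barrier (hypothetical model).

    Assumes each of the `num_pes` senders delivers to this receiver in FIFO
    order and broadcasts a barrier marker whenever it finishes a superstep.
    Senders never wait for each other, so supersteps overlap.
    """

    def __init__(self, num_pes):
        # Barriers received from each PE so far, i.e. the superstep
        # that PE's subsequent data messages belong to.
        self.barriers_seen = [0] * num_pes

    def on_message(self, src):
        """Return the superstep a data message from PE `src` belongs to."""
        return self.barriers_seen[src]

    def on_barrier(self, src):
        """Record PE `src`'s barrier; return the superstep this completes
        at the receiver, or None if other PEs are still behind."""
        before = min(self.barriers_seen)
        self.barriers_seen[src] += 1
        after = min(self.barriers_seen)
        return before if after > before else None


fb = FloatingBarrier(num_pes=2)
fb.on_barrier(0)         # PE 0 has finished superstep 0, PE 1 has not
print(fb.on_message(0))  # 1: PE 0's new messages already belong to superstep 1
print(fb.on_barrier(1))  # 0: superstep 0 is now complete at this receiver
```

No PE stalls: a fast PE keeps sending superstep i+1 messages, and the receiver tags them correctly from the barrier count alone.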

SLIDE 4

Architecture modifications

[Figure: grid of PEs; detail of one PE showing the Apply kernel, Scatter kernel, vertex storage, edge storage, and update queue]

  • Split the vertex kernel into 2 phases: Apply and Scatter
  • Everything is pipelined: registers can be moved freely
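Splitting the kernel can be illustrated with BFS. In this hypothetical sketch, `apply_bfs` folds a single message into a vertex's level and flags the vertex if it changed, and `scatter_bfs` turns a flagged vertex into one message per out-edge; a deque stands in for the hardware update queue. Barriers are omitted because taking the minimum level makes the result independent of message order. The names and this single-PE driver are assumptions, not GraVF's API.

```python
from collections import deque

INF = float("inf")  # level of a vertex not yet reached


def apply_bfs(level, msg_level):
    """Apply (hypothetical): fold one incoming message into the vertex state.

    Returns (new_level, activated). An activated vertex is pushed to the
    update queue so that Scatter forwards its new level.
    """
    candidate = msg_level + 1
    if candidate < level:
        return candidate, True
    return level, False


def scatter_bfs(level, out_edges):
    """Scatter (hypothetical): emit one (destination, level) message per out-edge."""
    return [(dst, level) for dst in out_edges]


def run_pe(adjacency, source):
    """Single-PE driver: messages -> Apply -> update queue -> Scatter."""
    level = {v: INF for v in adjacency}
    level[source] = 0
    update_queue = deque([source])
    messages = deque()
    while update_queue or messages:
        if messages:  # Apply handles each message as soon as it arrives
            dst, msg = messages.popleft()
            level[dst], activated = apply_bfs(level[dst], msg)
            if activated:
                update_queue.append(dst)
        else:  # Scatter drains the update queue
            v = update_queue.popleft()
            messages.extend(scatter_bfs(level[v], adjacency[v]))
    return level


graph = {0: [1, 2], 1: [3], 2: [3], 3: [0], 4: []}
print(run_pe(graph, 0))  # {0: 0, 1: 1, 2: 1, 3: 2, 4: inf}
```

Apply touches only vertex state and Scatter only edge storage, which is one reason the two phases map naturally onto separate pipeline stages.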

SLIDE 5

Barrier Synchronization

[Figure: PE 0 with Scatter, update queue, Apply, and an arbiter merging input from PE 0, PE 1, ..., PE n-1; legend: barrier message for superstep i, message for superstep i+1]

  • Apply runs ahead of Scatter: messages are processed immediately
  • The update queue needs only $2|V|$ entries instead of $|V|^2$
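One way to see where the $2|V|$ bound can come from, as a sketch under assumptions the slides do not spell out: Apply folds each message into vertex state as soon as it arrives, so only vertex IDs are queued; a vertex is queued at most once per superstep; and Apply is at most one superstep ahead of Scatter. `UpdateQueue` is a hypothetical Python model, not GraVF's hardware queue. With two supersteps in flight and at most |V| entries each, the queue never holds more than 2|V| entries.

```python
from collections import deque


class UpdateQueue:
    """Apply -> Scatter update queue (hypothetical model, not GraVF's RTL).

    Assumptions: Apply is at most one superstep ahead of Scatter, and a
    vertex is queued at most once per superstep because Apply has already
    folded every message into the vertex state. With two supersteps in
    flight, at most 2 * num_vertices entries are ever stored.
    """

    def __init__(self, num_vertices):
        self.num_vertices = num_vertices
        # One FIFO and one "already queued" bitmap per in-flight
        # superstep, selected by superstep parity.
        self.fifo = [deque(), deque()]
        self.queued = [[False] * num_vertices for _ in range(2)]
        self.scatter_superstep = 0

    def __len__(self):
        return len(self.fifo[0]) + len(self.fifo[1])

    def push(self, vertex, superstep):
        """Apply marks `vertex` as updated in `superstep`."""
        if superstep not in (self.scatter_superstep, self.scatter_superstep + 1):
            raise ValueError("Apply may run at most one superstep ahead of Scatter")
        slot = superstep % 2
        if not self.queued[slot][vertex]:  # duplicates add no entry
            self.queued[slot][vertex] = True
            self.fifo[slot].append(vertex)

    def pop(self):
        """Next vertex Scatter should process in its superstep, or None."""
        slot = self.scatter_superstep % 2
        return self.fifo[slot].popleft() if self.fifo[slot] else None

    def advance(self):
        """Scatter finished its superstep: recycle that parity's bitmap."""
        slot = self.scatter_superstep % 2
        if self.fifo[slot]:
            raise RuntimeError("current superstep has not been fully scattered")
        self.queued[slot] = [False] * self.num_vertices
        self.scatter_superstep += 1
```

Buffering every message instead would require space proportional to the number of messages per superstep, which is bounded by the edge count and thus by $|V|^2$ in the worst case.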

SLIDE 6

Results

[Figure: Weak scaling (left): cycles (×1000) vs. number of PEs (1 to 16), for PR and BFS. Strong scaling (right): edges traversed per cycle vs. number of PEs (1 to 32), for PR and BFS.]

3–3.5 GTEPS (billions of traversed edges per second): comparable to hand-implemented solutions
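The scaling plot reports edges traversed per cycle, while the headline figure is in GTEPS; the two are related through the clock frequency. The slides do not state GraVF's clock, so the 150 MHz below is a purely hypothetical value used to show the arithmetic, and `gteps` and `edges_per_cycle_needed` are illustrative helper names.

```python
def gteps(edges_per_cycle, clock_hz):
    """Convert edges traversed per cycle into GTEPS (1e9 edges per second)."""
    return edges_per_cycle * clock_hz / 1e9


def edges_per_cycle_needed(target_gteps, clock_hz):
    """Edges per cycle required to reach `target_gteps` at `clock_hz`."""
    return target_gteps * 1e9 / clock_hz


# Hypothetical 150 MHz clock, not a figure from the slides.
print(gteps(20, 150e6))  # 3.0
print(edges_per_cycle_needed(3.5, 150e6))
```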

SLIDE 7

Future Work

  • Extend communication to multiple FPGA boards
  • Include data transfers: direct SSD access from FPGA

SLIDE 8

Thank you for listening

SLIDE 9

Works Cited

1 Betweenness Centrality. By Claudio Rocchini, CC BY 2.5. https://commons.wikimedia.org/w/index.php?curid=1988980

[1] Qingbo Wang, Weirong Jiang, Yinglong Xia, and V. Prasanna. A message-passing multi-softcore architecture on FPGA for breadth-first search. In 2010 International Conference on Field-Programmable Technology (FPT), pages 70–77, Dec 2010.
