ArrayFire Graph: Dynamic Graph Library for GPUs, Kumar Aatish



slide-1
SLIDE 1

ArrayFire Graph :

Dynamic Graph Library for GPUs

Kumar Aatish

slide-2
SLIDE 2

ArrayFire

  • Accelerating computation
  • HPC consulting for 10 years
  • Maintain open source ArrayFire Library

Domains

  • Computer Vision
  • Machine Learning
  • Image Processing
  • Computer Graphics

Industries

  • Defense
  • Oil and Gas
  • Finance
  • Media
slide-3
SLIDE 3

Outline

  • Static Graphs
  • Dynamic Graphs
  • AF Graph

    ○ Insertion
    ○ Deletion
    ○ SSSP
    ○ BFS
    ○ Out of Core

slide-4
SLIDE 4

What is a (Static) Graph?

  • G = (V,E). Set of Vertices and Edges
  • Edges = relationship between vertices
  • Metadata

Snapshot of relationships between entities

[Figure: example graph with vertices 1, 2, 3]

slide-5
SLIDE 5

What is a (Static) Graph?

CSR Data Structure

[Figure: CSR layout of the example graph, with an Offsets array indexing into the concatenated Adjacency Lists]
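As a rough illustration of the CSR layout, the sketch below builds the two arrays from an edge list and reads one vertex's neighbors back. The function names and the small example graph are made up for this note and are not taken from the slides.

```python
def build_csr(num_vertices, edges):
    """Build CSR arrays (offsets, adjacency) from a list of (src, dst) pairs."""
    # Count the out-degree of every vertex.
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1

    # offsets[v] .. offsets[v + 1] delimits the adjacency list of vertex v.
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]

    # Scatter destinations into one contiguous array.
    adjacency = [0] * len(edges)
    cursor = offsets[:-1].copy()
    for src, dst in edges:
        adjacency[cursor[src]] = dst
        cursor[src] += 1
    return offsets, adjacency


def neighbors(offsets, adjacency, v):
    """Return the adjacency list of vertex v as a slice of the shared array."""
    return adjacency[offsets[v]:offsets[v + 1]]


# Illustrative graph with 3 vertices (0-based ids).
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
offsets, adjacency = build_csr(3, edges)
```

Every adjacency list lives in one contiguous array, which makes for compact, cache-friendly reads.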

slide-6
SLIDE 6

What is a Dynamic Graph?

  • G = (V,E). Set of Vertices and Edges
  • Edges = relationship between vertices
  • Metadata

Interactions/relationships can change over time:

  • Vertices
  • Edges
  • Edge weights
  • etc.

[Figure: example graph, vertices labeled 1 to 6]

slide-7
SLIDE 7

What is a Dynamic Graph?

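CSR Data Structure

  • Cannot handle insertions/removals in place: the adjacency lists are packed back to back, so changing one list shifts every list after it

[Figure: CSR layout with an Offsets array and concatenated Adjacency Lists]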
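To make the cost concrete, here is a minimal sketch of inserting a single edge into CSR arrays. The helper name and the example arrays are hypothetical. The point is that one insertion rewrites the tail of the adjacency array and every later offset.

```python
def csr_insert_edge(offsets, adjacency, src, dst):
    """Insert edge (src, dst) into CSR arrays; returns new arrays.

    Everything stored after src's adjacency list has to move by one slot,
    and every later offset has to be incremented, so the cost grows with
    the size of the whole graph rather than with the degree of src.
    """
    pos = offsets[src + 1]  # end of src's current adjacency list
    new_adjacency = adjacency[:pos] + [dst] + adjacency[pos:]
    new_offsets = [o + 1 if i > src else o for i, o in enumerate(offsets)]
    return new_offsets, new_adjacency


# Illustrative CSR arrays for a 3-vertex graph (0-based ids).
offsets = [0, 2, 3, 4]
adjacency = [1, 2, 2, 0]
new_offsets, new_adjacency = csr_insert_edge(offsets, adjacency, 1, 0)
```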

slide-8
SLIDE 8

ArrayFire Graph

  • GPU Optimized Dynamic Data Structure

    ○ Fast Insertions, Updates, Deletions
    ○ Tracks weights and temporal metadata
    ○ Optimized for GPU cache locality

  • Minimize CPU Intervention
  • Handles out of core graph operations
  • Performant Graph Analytic Algorithms

    ○ BFS
    ○ SSSP
    ○ PageRank
    ○ and more soon

slide-9
SLIDE 9

ArrayFire Graph

[Figure: vertices u and v, each pointing to its own adjacency list (adjacencyU, adjacencyV)]

  • Conceptually similar to CSR
  • Algorithms therefore easily portable
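One way to picture "conceptually similar to CSR" is a layout where each vertex owns a separate adjacency block with its own start and size, instead of sharing one packed array. The sketch below is an assumption made for illustration, not the AF Graph implementation: the class name, the fixed per-vertex capacity, and the host-side Python lists are all hypothetical. A BFS written against a CSR-style `neighbors` lookup runs on it unchanged, which is the portability point.

```python
from collections import deque


class DynamicGraph:
    """Hypothetical layout: one fixed-capacity adjacency block per vertex."""

    def __init__(self, num_vertices, capacity):
        self.capacity = capacity
        self.pool = [-1] * (num_vertices * capacity)   # shared storage
        self.start = [v * capacity for v in range(num_vertices)]
        self.size = [0] * num_vertices

    def add_edge(self, u, v):
        # Appending touches only u's block; no other list moves.
        if self.size[u] == self.capacity:
            raise ValueError("vertex degree limit reached")
        self.pool[self.start[u] + self.size[u]] = v
        self.size[u] += 1

    def neighbors(self, u):
        begin = self.start[u]
        return self.pool[begin:begin + self.size[u]]


def bfs_levels(graph, source, num_vertices):
    """Plain BFS; only needs a neighbors(u) lookup, as with CSR."""
    level = [-1] * num_vertices
    level[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.neighbors(u):
            if level[v] == -1:
                level[v] = level[u] + 1
                queue.append(v)
    return level


g = DynamicGraph(num_vertices=4, capacity=2)
for u, v in [(0, 1), (0, 2), (1, 3), (2, 3)]:
    g.add_edge(u, v)
levels = bfs_levels(g, 0, 4)
```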
slide-10
SLIDE 10

ArrayFire Graph

  • Memory manager to improve performance
  • Requires user defined upper limit on vertex degree
slide-11
SLIDE 11

ArrayFire Graph

  • Vertices with greater degree than limit?

    ○ Create zero weight edge from VB to empty vertex VE
    ○ Add edges to VE
    ○ User responsible for resolution
    ○ Will be done automatically in future
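The overflow scheme above could be sketched as follows. This is an illustration under assumptions, not the library's code: the class and method names are invented, the last slot of each list is reserved for the zero-weight link, and `logical_neighbors` stands in for the "resolution" step that the slide leaves to the user.

```python
class BoundedGraph:
    """Hypothetical adjacency lists with a hard per-vertex degree limit."""

    def __init__(self, num_vertices, max_degree):
        if max_degree < 2:
            raise ValueError("need room for one edge plus one overflow link")
        self.max_degree = max_degree
        self.adj = [[] for _ in range(num_vertices)]
        self.overflow = {}  # vertex VB -> its overflow vertex VE

    def add_edge(self, u, v, w):
        # Walk to the end of u's overflow chain.
        while u in self.overflow:
            u = self.overflow[u]
        # Keep the last slot free for the zero-weight link to a new VE.
        if len(self.adj[u]) == self.max_degree - 1:
            self.adj.append([])
            ve = len(self.adj) - 1
            self.adj[u].append((ve, 0.0))
            self.overflow[u] = ve
            u = ve
        self.adj[u].append((v, w))

    def logical_neighbors(self, u):
        """Resolve the chain: real edges of u, skipping overflow links."""
        result = []
        while u is not None:
            link = self.overflow.get(u)
            result.extend((v, w) for v, w in self.adj[u] if v != link)
            u = link
        return result


g = BoundedGraph(num_vertices=5, max_degree=3)
for dst in (1, 2, 3, 4):
    g.add_edge(0, dst, float(dst))
```

Because the link has zero weight, a shortest-path computation that walks through VE sees the same distances as it would on the unsplit vertex.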

slide-12
SLIDE 12

Insertion

  • Consolidate Edge Inputs

    ○ Remove duplicates
    ○ Resolve weight/temporal metadata

SRC   DST   W            T First          T Last
U     V     w1           TF1              TL1
U     V     w2           TF2              TL2

Consolidated:

U     V     p(w1, w2)    min(TF1, TF2)    max(TL1, TL2)
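A minimal host-side sketch of this consolidation step follows. The slide does not say what p is, so the example passes addition purely as a placeholder. The function name and tuple layout are also assumptions. On a GPU the same effect is typically obtained with a sort followed by a reduce-by-key, which this dictionary-based version only imitates.

```python
def consolidate(edges, combine):
    """Merge duplicate (src, dst) inputs into one edge each.

    edges:   iterable of (src, dst, weight, t_first, t_last)
    combine: the weight policy p(w1, w2); unspecified in the slides
    """
    merged = {}
    for src, dst, w, t_first, t_last in edges:
        key = (src, dst)
        if key in merged:
            w0, tf0, tl0 = merged[key]
            # Weight via p; earliest first-seen and latest last-seen times.
            merged[key] = (combine(w0, w), min(tf0, t_first), max(tl0, t_last))
        else:
            merged[key] = (w, t_first, t_last)
    return [(s, d, w, tf, tl) for (s, d), (w, tf, tl) in merged.items()]


batch = [
    (0, 1, 2.0, 10, 12),
    (0, 1, 3.0, 8, 11),   # duplicate of edge (0, 1)
    (1, 2, 1.0, 5, 5),
]
result = consolidate(batch, combine=lambda a, b: a + b)  # p = sum, assumed
```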

slide-13
SLIDE 13

Insertion

  • Update

    ○ Metadata
    ○ Adjacency Lists

slide-14
SLIDE 14

Update—Adjacency List

Add edges to active vertices

  • Identify active vertices
  • Expand adjacency list size
  • Copy new edges
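The three steps above can be sketched on the host as follows. The buffer-plus-size representation, the function name, and the doubling growth policy are assumptions made for illustration. The slides do not describe how the memory manager sizes the expanded lists.

```python
def update_adjacency(buffers, sizes, new_edges):
    """Append consolidated new edges to per-vertex adjacency buffers.

    buffers[v] is a preallocated list (unused slots hold None) and
    sizes[v] is the number of slots in use. Returns the active vertices.
    """
    # 1. Identify active vertices: those that receive at least one new edge.
    incoming = {}
    for src, dst in new_edges:
        incoming.setdefault(src, []).append(dst)

    for v, dsts in incoming.items():
        needed = sizes[v] + len(dsts)
        # 2. Expand the adjacency list if the new edges do not fit.
        if needed > len(buffers[v]):
            new_capacity = max(needed, 2 * len(buffers[v]))  # assumed policy
            buffers[v] = buffers[v] + [None] * (new_capacity - len(buffers[v]))
        # 3. Copy the new edges into the free slots.
        buffers[v][sizes[v]:needed] = dsts
        sizes[v] = needed
    return sorted(incoming)


buffers = [[1, None], [None, None], [None, None], [None, None]]
sizes = [1, 0, 0, 0]
active = update_adjacency(buffers, sizes, [(0, 2), (0, 3), (2, 0)])
```

Vertices that receive no new edges are never touched, so the work scales with the batch rather than with the whole graph.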
slide-15
SLIDE 15

Benchmark—Update Adjacency List

Graph               Vertices (M)   Edges (M)
ldoor               0.952          45.5
soc-LiveJournal1    4.8            69
cage15              5.2            94
kron_g500-logn21    2              201
uk-2002             18.5           523

Benchmark graphs taken from [1] and [2]

slide-16
SLIDE 16

Deletion

slide-17
SLIDE 17

SSSP

Benchmarked against nvGraph [3]

  • nvGraph: GPU library for static graphs
  • Hardware: NVIDIA Tesla P100, 16 GB
slide-18
SLIDE 18

Benchmark—SSSP

Graph           Vertices (M)   Edges (M)
eu-2005         0.860          32
coPapersDBLP    0.540          30.5
in-2004         1.4            27.2
ldoor           0.950          45.5
road_central    14.0           33.8
road_usa        23.9           57.7

slide-19
SLIDE 19

Benchmark—SSSP

Graph           Vertices (M)   Edges (M)
eu-2005         0.860          32
coPapersDBLP    0.540          30.5
in-2004         1.4            27.2
ldoor           0.950          45.5
road_central    14.0           33.8
road_usa        23.9           57.7

slide-20
SLIDE 20

BFS

Graph               Vertices (M)   Edges (M)
ldoor               0.950          45.5
soc-LiveJournal1    4.8            69
cage15              5.2            94
kron_g500-logn21    2              201
uk-2002             18.5           523

slide-21
SLIDE 21

Out of Core

  • Insertion
  • Graph Analytics Functions
  • Maximum edge count on device
  • Maintain adjacency list on host memory
slide-22
SLIDE 22

Out of Core

  • Identify active remote vertices
  • Exchange

    ○ Inactive local vertices
    ○ Active remote vertices

  • Work on current batch of local active vertices
  • Repeat
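The loop above might look roughly like this for BFS. It is a host-only simulation under assumptions: `host_adj` plays the role of the adjacency lists kept in host memory, the `device` dictionary stands in for GPU memory capped at `max_device_edges`, and inactive lists are evicted eagerly for simplicity. None of these names come from the slides.

```python
def out_of_core_bfs(host_adj, source, max_device_edges):
    """BFS levels with at most max_device_edges edges resident on the 'device'."""
    if any(len(nbrs) > max_device_edges for nbrs in host_adj):
        raise ValueError("an adjacency list exceeds the device edge budget")

    level = [-1] * len(host_adj)
    level[source] = 0
    device = {}        # vertex -> adjacency list currently "on the device"
    device_edges = 0
    frontier = [source]

    while frontier:
        next_frontier = []
        pending = frontier  # active vertices not yet processed at this level
        while pending:
            active = set(pending)
            # Exchange, part 1: evict inactive local vertices.
            for v in [x for x in device if x not in active]:
                device_edges -= len(device.pop(v))
            # Exchange, part 2: load active remote vertices while they fit.
            for v in pending:
                fits = device_edges + len(host_adj[v]) <= max_device_edges
                if v not in device and fits:
                    device[v] = list(host_adj[v])
                    device_edges += len(host_adj[v])
            # Work on the current batch of local active vertices.
            for u in [v for v in pending if v in device]:
                for v in device[u]:
                    if level[v] == -1:
                        level[v] = level[u] + 1
                        next_frontier.append(v)
            # Repeat with the active vertices that did not fit this round.
            pending = [v for v in pending if v not in device]
        frontier = next_frontier
    return level


host_adj = [[1, 2], [3, 4], [3, 4], [], []]
levels_small = out_of_core_bfs(host_adj, 0, max_device_edges=2)
levels_large = out_of_core_bfs(host_adj, 0, max_device_edges=100)
```

With a budget of 2 edges, vertices 1 and 2 cannot be resident together, so their level is processed in two batches. The result matches the run where everything fits.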
slide-23
SLIDE 23

Benchmark—Out of Core

  • Benchmarked on road_central [1] (V: 14M, E: 33.8M)
  • Insertion and BFS
  • NVIDIA Tesla K40 12 GB; host memory: DDR3 1333 MHz, 8 GB
slide-24
SLIDE 24

Summary

  • Great dynamic graph creation performance
  • Highly performant graph algorithms
  • Can benefit from advances in static graph algorithms
  • Out of Core support

Future Plans

  • Multi-GPU implementation near completion
  • Add support for multiple edge types
  • Add user queries for filtering graph by time stamps, edge types, etc.
  • Avoid temporal or static metadata when user does not require them
slide-25
SLIDE 25

Contact Us

ArrayFire.com
Speaker: Kumar Aatish (kumar@ArrayFire.com)
Sales: Scott Blakeslee (sales@ArrayFire.com)

  • We are exploring use cases for real-world applications
  • Interested in AFGraph for your application?
  • Contact us to tailor it to your needs

slide-26
SLIDE 26

References

[1] D. A. Bader, H. Meyerhenke, P. Sanders, and D. Wagner, Eds., Graph Partitioning and Graph Clustering. 10th DIMACS Implementation Challenge Workshop, ser. Contemporary Mathematics, no. 588, 2013.

[2] Stanford Network Analysis Package, 2012 (accessed April 2012). [Online]. Available: snap.stanford.edu/data/

[3] NVIDIA, “nvGraph,” 2016. developer.nvidia.com/nvgraph