cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs, Oded Green



slide-1
SLIDE 1

cuSTINGER - Supporting Dynamic Graph Algorithms for GPUs

Oded Green & David Bader

slide-2
SLIDE 2

What we will see today

  • The first dynamic graph data structure for the GPU.
      – Scalable in size
      – Supports the same functionality as its CPU counterpart
  • Supports extremely fast update rates.
  • Good performance for static graph algorithms.

Oded Green, HPEC'16

slide-3
SLIDE 3

Big Data problems need Graph Analysis

Communication networks:

  • World-wide connectivity
  • High velocity changes
  • Different types of extracted data:
      – Physical communication network.
      – Person-to-person communication network.

Health-Care networks:

  • Various players.
  • Pattern matching and epidemic monitoring.
  • Problem sizes have doubled in the last 5 years.

Financial networks:

  • Transactions between players.
  • Different transaction types (property graph)

Graphs are a unifying motif for data analytics.

More important still are dynamic and streaming graphs!

slide-4
SLIDE 4

Definitions

  • STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation
  • Dynamic graphs:
      – Graph can change over time.
      – Changes can be to topology, edges, or vertices.
          • For example, new edges between two vertices.
  • Streaming graphs:
      – Graphs changing at high rates.
      – 100s of thousands of updates per second.

slide-5
SLIDE 5

Streaming graph example

  • Only a subset of the entire graph…
  • Dynamic/Streaming:
      – At some time: two vertices become friends (an edge insertion).
      – At another time: the two vertices are no longer friends (an edge deletion).
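The friend/unfriend pattern on this slide can be sketched as a timestamped stream applied to an adjacency-set graph. This is only an illustration, not STINGER code: the vertex names, the timestamps, and the `apply_update` helper are all hypothetical.

```python
from collections import defaultdict

def apply_update(adj, op, u, v):
    """Apply one undirected edge update to an adjacency-set graph."""
    if op == "insert":
        adj[u].add(v)
        adj[v].add(u)
    elif op == "delete":
        adj[u].discard(v)
        adj[v].discard(u)
    else:
        raise ValueError(f"unknown operation: {op}")

# Hypothetical stream of (time, operation, vertex, vertex) tuples.
stream = [
    (1, "insert", "alice", "bob"),   # the two become friends
    (2, "insert", "bob", "carol"),
    (5, "delete", "alice", "bob"),   # they are no longer friends
]

adj = defaultdict(set)
for _, op, u, v in sorted(stream):   # replay updates in time order
    apply_update(adj, op, u, v)

friends_of_bob = sorted(adj["bob"])
```

After the replay only the bob-carol edge remains; the graph at any earlier time can be obtained by stopping the loop at that timestamp.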

slide-6
SLIDE 6

STING Extensible Representation

  • Semi-dense edge list blocks with free space
  • Supports property graphs (vertex & edge type, vertex & edge weights, time-stamps, and more).
  • Maps from application IDs to storage IDs

slide-7
SLIDE 7

STINGER

  • Enable algorithm designers to implement dynamic & streaming graph algorithms with ease.
  • Portable semantics for various platforms
      – Linked list of edge blocks not ideal for the GPU
  • Good performance for all types of graph problems and algorithms - static and dynamic.
  • Assumes globally addressable memory access

slide-8
SLIDE 8

STINGER and cuSTINGER Properties

  • A simple programming model
  • Millions of updates per second to the graph
  • Updates are not bottlenecks for analytics.
  • Hundreds of thousands of updates per second for numerous analytics.
  • Advanced memory manager
      – Transfers data between host and device automatically
      – Reduces initialization time
      – Allows for simple update processes

Main Papers: [Bader et al.; 2007; Tech Report], [Ediger et al.; HPEC; 2012], [McColl et al.; PPAA; 2014]

slide-9
SLIDE 9

Lots of great graph libraries

CPU-based

  • Galois
  • Ligra
  • LLAMA
  • STINGER
      – DISTINGER

GPU-based

  • Gunrock
  • GasCL
  • BelRed
  • BlazeGraph

Most of these target STATIC graphs and use CSR

slide-10
SLIDE 10

Compressed Sparse Row (CSR)

Pros:

  • Uses precise storage requirements
  • Great locality
      – Good for GPUs
  • Handful of arrays
      – Simple to use and manage

Cons:

  • Inflexible.
  • Network growth unsupported
  • Topology changes unsupported
  • Property graphs not supported

[Figure: CSR layout. Per-vertex arrays: Vertex Weight, Offset. Per-edge arrays: Destination, Edge Weight. Legend: Optional Field, Mandatory Field.]
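A minimal sketch of the CSR layout may help make the pros and cons concrete. The `build_csr` and `neighbours` helpers and the tiny edge list are illustrative assumptions, and only the offset and destination arrays are modelled (no weights).

```python
def build_csr(num_vertices, edges):
    """Build CSR offset/destination arrays from a list of (src, dst) pairs."""
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1
    # offset[v] is where v's neighbours start; offset[v + 1] is where they end.
    offset = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offset[v + 1] = offset[v] + degree[v]
    destination = [0] * len(edges)
    cursor = offset[:-1].copy()  # next free slot for each vertex
    for src, dst in edges:
        destination[cursor[src]] = dst
        cursor[src] += 1
    return offset, destination

def neighbours(offset, destination, v):
    """Return the neighbour slice of vertex v."""
    return destination[offset[v]:offset[v + 1]]

edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
offset, destination = build_csr(3, edges)
```

Because every vertex's neighbours are packed back to back, adding one edge to vertex 0 would require shifting all later entries and rewriting every later offset, which is the inflexibility listed under the cons.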

slide-11
SLIDE 11

cuSTINGER – Data Structure

  • Great locality
      – STINGER uses an Array of Structures (AOS)
      – cuSTINGER uses a Structure of Arrays (SOA)
  • Each vertex has its own adjacency list
  • Can compact data similar to CSR.

[Figure: cuSTINGER layout. Per-vertex arrays: Used, Allocated, Pointer. Per-edge array: Destination, with a Used region. Legend: Optional Field, Mandatory Field.]
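The per-vertex "used", "allocated" and "pointer" fields from the figure can be sketched as three parallel arrays. This is an assumption-laden illustration rather than the cuSTINGER implementation: the `VertexLists` class, its `slack` parameter, and the use of Python list references in place of device pointers are all hypothetical.

```python
class VertexLists:
    """Per-vertex adjacency lists described by parallel used/allocated/pointer arrays."""

    def __init__(self, adjacency, slack=2):
        # 'slack' free slots per vertex is an arbitrary illustrative choice.
        self.used = [len(nbrs) for nbrs in adjacency]
        self.allocated = [len(nbrs) + slack for nbrs in adjacency]
        # A Python list reference stands in for a device pointer here.
        self.pointer = [list(nbrs) + [None] * slack for nbrs in adjacency]

    def neighbours(self, v):
        """Return only the used part of v's adjacency list."""
        return self.pointer[v][: self.used[v]]

    def to_csr(self):
        """Compact the used entries into CSR-style offset/destination arrays."""
        offset = [0]
        destination = []
        for v in range(len(self.used)):
            destination.extend(self.neighbours(v))
            offset.append(len(destination))
        return offset, destination

g = VertexLists([[1, 2], [2], [0]])
offset, destination = g.to_csr()
```

The `to_csr` method shows the sense in which the data can be compacted like CSR: dropping the free slots and concatenating the lists yields the usual offset and destination arrays.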

slide-12
SLIDE 12

cuSTINGER – Supports Growth

  • Great locality
  • Supports updates
      – Supports edge insertion and deletion
      – Supports vertex insertion and deletion

[Figure: cuSTINGER layout. Per-vertex arrays: Used, Allocated, Pointer. Per-edge array: Destination, with Allocated and Used regions. Legend: Optional Field, Mandatory Field.]

slide-13
SLIDE 13

cuSTINGER – Allocation modes

  • Great locality
  • Supports updates
      – Supports edge insertion and deletion
      – Supports vertex insertion and deletion
  • Supports multiple allocation modes
      – Runtime configurable

[Figure: cuSTINGER layout. Per-vertex arrays: Used, Allocated, Pointer. Two Destination arrays labelled Option 1 and Option 2, each with Allocated and Used regions. Legend: Optional Field, Mandatory Field.]

slide-14
SLIDE 14

cuSTINGER – Supports Properties

  • Great locality
  • Supports updates
      – Supports edge insertion and deletion
      – Supports vertex insertion and deletion
  • Supports multiple allocation modes
  • Supports STINGER properties

[Figure: cuSTINGER layout. Per-vertex arrays: Used, Allocated, Vertex Weight, Vertex Type, Pointer. Per-edge arrays: Destination, Edge Weight, Edge Type, Time Stamp 1, Time Stamp 2, with Allocated and Used regions. Legend: Optional Field, Mandatory Field.]

slide-15
SLIDE 15

Edge Insertions

  • Given an edge update, e = (src, dest):
      – Check that the edge doesn’t already exist
      – Check for available space
      – Increment “used” and append to end
      – Adjacency list is not sorted
  • Updates are done in batches
      – Better utilization
      – Requires identifying two identical edges in a batch.

[Figure: adjacency list of src. Per-edge arrays: Destination, Edge Weight, Edge Type, Time Stamp 1, Time Stamp 2, with Allocated and Used regions. Legend: Optional Field, Mandatory Field.]
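The insertion checks on this slide can be sketched sequentially as follows. The slide describes a batched GPU operation; this sketch models only the logic (duplicate removal within the batch, existence check, space check, append), and the `insert_batch` function and its array arguments are hypothetical names.

```python
def insert_batch(dest, used, allocated, batch):
    """Insert a batch of (src, dst) edges; return the edges that did not fit."""
    overflow = []
    # Drop identical edges inside the batch while keeping first-seen order.
    for src, dst in dict.fromkeys(batch):
        row = dest[src]
        if dst in row[: used[src]]:      # edge already exists
            continue
        if used[src] == allocated[src]:  # no available space
            overflow.append((src, dst))
            continue
        row[used[src]] = dst             # append to end; list stays unsorted
        used[src] += 1
    return overflow

# Two vertices, each with room for three neighbours.
dest = [[1, None, None], [0, None, None]]
used = [1, 1]
allocated = [3, 3]

overflow = insert_batch(
    dest, used, allocated,
    [(0, 5), (0, 1), (0, 5), (0, 3), (0, 7), (1, 4)],
)
```

Here the repeated `(0, 5)` and the pre-existing `(0, 1)` are skipped, and `(0, 7)` is returned as overflow because vertex 0 has run out of allocated space.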

slide-16
SLIDE 16

Edge Insertions – Out of Memory

  • Given an edge update, e = (src, dest)
  • Adjacency list is full
  • Allocate new list
  • Copy old list into new list
  • Append to end

[Figure: the full adjacency list of src and its larger replacement. Per-edge arrays: Destination, Edge Weight, Edge Type, Time Stamp 1, Time Stamp 2, with Allocated and Used regions. Legend: Optional Field, Mandatory Field.]
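The allocate, copy, append sequence for a full list can be sketched like this. The doubling `growth` factor is an assumption (the slides do not state a growth policy), `insert_edge` is a hypothetical name, and the existence check from the previous slide is omitted for brevity.

```python
def insert_edge(dest, used, allocated, src, dst, growth=2):
    """Append dst to src's list, reallocating the list first if it is full."""
    if used[src] == allocated[src]:
        # Assumed policy: multiply capacity by 'growth' (at least one slot).
        new_capacity = max(1, allocated[src] * growth)
        new_row = [None] * new_capacity                 # allocate new list
        new_row[: used[src]] = dest[src][: used[src]]   # copy old list into new list
        dest[src] = new_row                             # update the vertex's pointer
        allocated[src] = new_capacity
    dest[src][used[src]] = dst                          # append to end
    used[src] += 1

dest = [[1, 2]]
used = [2]
allocated = [2]
insert_edge(dest, used, allocated, 0, 3)
```

After the call vertex 0 owns a four-slot list holding `[1, 2, 3]` plus one free slot, so the next insertion takes the fast path without reallocating.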

slide-17
SLIDE 17

Experiment Setup

  • NVIDIA K40 GPU
      – Kepler micro-architecture
      – 15 SMs, total of 2880 SPs
      – 12GB of RAM
  • Intel i7-4770K
      – Haswell micro-architecture
      – Quad core
      – 8MB L3 cache
      – 32GB of RAM

slide-18
SLIDE 18

Input Graphs

  • DIMACS 10 Graph Implementation Challenge
  • SNAP – Stanford Network Analysis Project

Name        Type            |V|      |E|     Source
123425678   Collaboration   299K     1.95M   DIMACS
> − :       Trace route     1.69M    11.1M   SNAP
:2_21       Random          2M       201M    DIMACS
1 − A>      Citation        3.77M    16.5M   SNAP
1>15        Matrix          5.15M    94M     DIMACS
: − 2002    Webcrawl        18.52M   523M    DIMACS

slide-19
SLIDE 19

Experiment metrics

  • Initialization time
      – Preferably as small as possible
  • Update rate
      – Number of updates per second that cuSTINGER can sustain
  • Static graph support
      – We compare a clustering-coefficient implementation using CSR with a cuSTINGER implementation

slide-20
SLIDE 20

Initialization Time

  • Time correlated with number of vertices


slide-21
SLIDE 21

Update rate – Small Batches

  • Updating a single edge at a time:
      – 15F updates per second
      – Same rate for insertions and deletions
  • For small batches:
      – Up to 1000 edges per batch
      – Millions of updates per second

slide-22
SLIDE 22

Insertion rate – Large Batches

  • Increased chance of a vertex not having enough storage available.
  • Some structures are copied back from device to host
      – Overhead is big for mid-size batches.
      – Overhead “>AA>” for larger batches.

slide-23
SLIDE 23

Deletion rate – Large Batches

  • Performance is consistent for all graphs and unique batches.
  • No memory allocation or de-allocation is required.
      – Unlike for the insertions case.
  • Currently, memory reclamation is not supported.

slide-24
SLIDE 24

Triangle Counting – Static Graph

  • Algorithm taken from [Green et al; I3J;2014]
  • Simply replace CSR accesses with cuSTINGER
  • Execution times are similar

Name        |V|      |E|     Time-CSR (sec.)   Time-cuSTINGER (sec.)   Execution Difference
123425678   299K     1.95M   0.218             0.242                   +10%
> − :       1.69M    11.1M   57.14             59.37                   +3.8%
:2_21       2M       201M    2992              2996                    +0.14%
1 − A>      3.77M    16.5M   0.814             0.830                   +2%
1>15        5.15M    94M     6.544             7.204                   +10%
: − 2002    18.52M   523M    424.9             431.4                   +1.6%
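The idea of swapping CSR accesses for cuSTINGER accesses can be sketched by writing the counting loop against a `neighbours(v)` accessor. This is not the algorithm from the cited paper: it is a plain set-intersection count chosen for brevity, and the accessor functions and the four-vertex graph are hypothetical.

```python
def count_triangles(num_vertices, neighbours):
    """Count triangles in an undirected graph given a neighbours(v) accessor."""
    total = 0
    for u in range(num_vertices):
        nbrs_u = set(neighbours(u))
        for v in nbrs_u:
            if v > u:
                # Require u < v < w so each triangle is counted exactly once.
                total += sum(1 for w in neighbours(v) if w > v and w in nbrs_u)
    return total

# CSR storage of a 4-vertex undirected graph: triangle 0-1-2 plus edge 2-3.
offset = [0, 2, 4, 7, 8]
destination = [1, 2, 0, 2, 0, 1, 3, 2]

def csr_neighbours(v):
    return destination[offset[v]:offset[v + 1]]

# The same graph as per-vertex lists with free space at the end of each list.
lists = [[1, 2, None], [0, 2, None], [0, 1, 3, None], [2, None]]
used = [2, 2, 3, 1]

def list_neighbours(v):
    return lists[v][: used[v]]

csr_count = count_triangles(4, csr_neighbours)
list_count = count_triangles(4, list_neighbours)
```

Only the accessor changes between the two runs; the counting loop is identical, which is the point made by the second bullet.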

slide-25
SLIDE 25

Summary

  • cuSTINGER supports high update rates
  • Memory manager
      – Responsible for allocating and transferring data on/from device
      – Reduces initialization time
      – Programmers can focus on algorithms instead of complex data management
  • Great performance for both dynamic and static graph algorithms

slide-26
SLIDE 26

Acknowledgments

  • Devavret Makkar, Graduate Student (Georgia Tech)


slide-27
SLIDE 27

Acknowledgment of Support


slide-28
SLIDE 28

Thank you

  • Email: ogreen@gatech.edu
  • STINGER:
      – Documentation: http://stingergraph.com/
  • cuSTINGER:
      – Coming soon…

slide-29
SLIDE 29

Backup Slides


slide-30
SLIDE 30

Array of Structures Vs. Structure of Arrays

[Figure: STINGER (AOS) shown beside cuSTINGER (SOA), with a 90° annotation. Per-vertex arrays: Used, Allocated, Vertex Weight, Vertex Type, Pointer.]
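The difference between the two layouts can be sketched with a few vertex fields. Python does not expose memory layout, so this only illustrates the indexing pattern; the `VertexRecord` class and the field values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class VertexRecord:
    used: int
    allocated: int
    weight: float

# Array of Structures: one record per vertex, fields interleaved per vertex.
aos = [VertexRecord(2, 4, 1.0), VertexRecord(1, 4, 0.5), VertexRecord(3, 4, 2.0)]

# Structure of Arrays: one array per field, same-field values stored together.
soa = {
    "used": [r.used for r in aos],
    "allocated": [r.allocated for r in aos],
    "weight": [r.weight for r in aos],
}

# Reading one field for every vertex walks a single array in SoA ...
total_used_soa = sum(soa["used"])
# ... but steps through whole records in AoS.
total_used_aos = sum(r.used for r in aos)
```

On a GPU the SoA form lets neighbouring threads read neighbouring elements of the same array, which is the usual motivation for preferring it there.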