A Sparse Dynamic Graph and Matrix Data Layout, Oded Green (PowerPoint presentation)

SLIDE 1

A Sparse Dynamic Graph and Matrix Data Layout Oded Green

SLIDE 2

Going to talk about 2 things

  • Hornet

– A scalable and dynamic data structure for graph algorithms and linear-algebra-based problems
– Formerly known as cuSTINGER

  • HornetsNest

– A framework for static and dynamic analytics

Oded Green, GTC-DC-17

SLIDE 3

Hornet – Upfront Summary

  • Can support over 250 million updates per second

  • Low overhead in comparison with CSR

– Initialization is also relatively inexpensive: usually less than 3X slower than CSR
– Equal performance to CSR

  • Currently implemented for CUDA

– We are porting Hornet to the CPU

  • Really easy to use

SLIDE 4

Graph Primitives – Upfront summary

  • Great performance for static and dynamic graph algorithms

  • Scalable
  • Simple to use

SLIDE 5

Big Data problems need Graph Analysis

Communication networks:

  • World-wide connectivity
  • High velocity changes
  • Different types of extracted data:

– Physical communication network
– Person-to-person communication network

Health-care networks:

  • Various players
  • Pattern matching and epidemic monitoring
  • Problem sizes have doubled in the last 5 years

Financial networks:

  • Transactions between players
  • Different transaction types (property graph)

Graphs are a unifying motif for data analytics.

SLIDE 6

STINGER

  • STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation

  • Enables algorithm designers to implement dynamic and streaming graph algorithms with ease

  • Portable semantics for various platforms

– Its linked list of edge blocks is not ideal for the GPU

  • Good performance for all types of graph problems and algorithms, static and dynamic

  • Assumes globally addressable memory access

SLIDE 7

STINGER and cuSTINGER Properties

  • A simple programming model
  • Millions of updates per second to the graph
  • Updates are not a bottleneck for analytics
  • Advanced memory manager

– Transfers data between host and device automatically
– Reduces initialization time
– Allows for simple update processes

STINGER papers: [Bader et al.; 2007; Tech Report], [Ediger et al.; HPEC; 2012], [McColl et al.; PPAA; 2014]

cuSTINGER paper: [Green & Bader; HPEC; 2016]: cuSTINGER: Supporting dynamic graph algorithms for GPUs
SLIDE 8

cuSTINGER is now HORNET

SLIDE 9

Definitions

  • Dynamic graphs (matrices)

– Graph can change over time
– Changes can be to topology, edges, or vertices

  • For example, new edges between two vertices

– Changes to edge or vertex weights

  • Streaming graphs:

– Graphs changing at high rates
– Hundreds of thousands of updates per second

SLIDE 10

Dynamic graph example

  • Only a subset of the entire graph…

  • Dynamic:

– At time t:

  • v and w become friends
  • insert_edge(v, w)

– At time t̂:

  • u and v are no longer friends
  • delete_edge(u, v)
  • Additional operations include vertex insertions and deletions

[Figure: example graph on vertices u, v, and w]
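The operations in this example can be made concrete with a minimal CPU sketch. `DynamicGraph`, `insert_edge`, `delete_edge`, and `insert_vertex` are hypothetical names chosen to mirror the slide's notation, not Hornet's API; the friendship graph is assumed undirected, so each edge is stored in both directions, and u, v, w are numbered 0, 1, 2.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical minimal dynamic undirected graph (not Hornet's API):
// each vertex owns a growable list of neighbor ids.
struct DynamicGraph {
    std::vector<std::vector<int>> adj;

    explicit DynamicGraph(int num_vertices) : adj(num_vertices) {}

    bool has_edge(int a, int b) const {
        const auto& n = adj[a];
        return std::find(n.begin(), n.end(), b) != n.end();
    }

    // Undirected insertion: store the edge in both directions, skip duplicates.
    void insert_edge(int a, int b) {
        if (has_edge(a, b)) return;
        adj[a].push_back(b);
        adj[b].push_back(a);
    }

    // Undirected deletion: remove both directions if present.
    void delete_edge(int a, int b) {
        auto erase_one = [](std::vector<int>& n, int x) {
            auto it = std::find(n.begin(), n.end(), x);
            if (it != n.end()) {
                *it = n.back();  // overwrite with the last entry, then shrink
                n.pop_back();
            }
        };
        erase_one(adj[a], b);
        erase_one(adj[b], a);
    }

    // Vertex insertion: append an isolated vertex and return its id.
    int insert_vertex() {
        adj.emplace_back();
        return static_cast<int>(adj.size()) - 1;
    }
};
```

For example, starting from the edge (u, v), `insert_edge(1, 2)` plays the role of time t and `delete_edge(0, 1)` the role of time t̂. Deleting by swapping with the last neighbor keeps removal proportional to the vertex degree, at the cost of not preserving neighbor order.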

SLIDE 11

“Separation of powers”

  • The dynamic graph data structure and the dynamic graph algorithms are in two different repositories

– Easy to integrate with external libraries
– Can also be used with matrices

  • The first part of today’s talk covers the dynamic data structure
SLIDE 12

Part 1 – Data Structure

cuSTINGER Version 2.0

  • Improved initialization time

– 100X faster than Version 1.0

  • New memory manager

– Reduces fragmentation
– Enables memory reclamation
– Offers good memory bounds

  • Scalable data structure

– Can easily grow to 1000X its initial size without needing to be re-initialized

  • Faster updates

SLIDE 13

So what else do we need?

| Name | Pros | Cons |
|---|---|---|
| Dense adjacency matrix | Flexible | Limited utilization for sparse data |
| Linked lists | Flexible | Poor locality; allocation time is costly |
| COO (edge list), unsorted | Has some flexibility; updates are simple | Poor locality; stores both the source and destination |
| CSR | Uses exact amount of memory; good locality | Inflexible |

  • We need a dynamic graph data structure
  • These data structures don’t cut it
SLIDE 14

Compressed Sparse Row (CSR)

Pros:

  • Uses precise storage requirements

  • Great locality

– Good for GPUs

  • Handful of arrays

– Simple to use and manage

Cons:

  • Inflexible.
  • Network growth unsupported
  • Topology changes unsupported
  • Property graphs not supported

[Figure: example CSR layout. A Src/Row Offset array indexes into the Dest./Col. Value array.]
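A minimal sketch of how a CSR is built from an unsorted edge list (COO): count out-degrees, take a prefix sum to get row offsets, then scatter destinations. The names `CSR` and `build_csr` are illustrative, and vertex ids are assumed 0-based. The layout also shows why CSR is inflexible: inserting one edge into row v would require shifting every later entry of `cols` and incrementing every later offset.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Minimal CSR: offsets has num_vertices + 1 entries; the neighbors of
// vertex v are cols[offsets[v]] .. cols[offsets[v + 1] - 1].
struct CSR {
    std::vector<int> offsets;
    std::vector<int> cols;
};

// Builds a CSR from a directed edge list of (src, dest) pairs.
CSR build_csr(int num_vertices, const std::vector<std::pair<int, int>>& edges) {
    CSR g;
    g.offsets.assign(num_vertices + 1, 0);
    for (const auto& e : edges) g.offsets[e.first + 1]++;  // count out-degrees
    for (int v = 0; v < num_vertices; ++v) g.offsets[v + 1] += g.offsets[v];  // prefix sum
    g.cols.resize(edges.size());
    // cursor[v] is the next free slot in row v.
    std::vector<int> cursor(g.offsets.begin(), g.offsets.end() - 1);
    for (const auto& e : edges) g.cols[cursor[e.first]++] = e.second;
    return g;
}
```

For the edge list (0,1), (2,0), (0,2) on 3 vertices, the offsets are {0, 2, 2, 3} and the destinations are {1, 2, 0}.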

SLIDE 15

Hornet – A High Level View

  • Supports updates

– Supports edge insertion/deletion
– Supports vertex insertion/deletion

[Figure: Hornet layout. A USER-INTERFACE vertex array (Vertex Id, Used, Pointer) points to each vertex's Dest./Col. Value block, which includes over-allocated space.]
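The figure's idea (a vertex array holding a used count and a pointer to an edge block with spare room) can be sketched as follows. This is a hypothetical CPU illustration, not Hornet's code: `HornetLike`, `VertexEntry`, the integer `block_id` standing in for a pointer, and the doubling growth policy are all assumptions. Most insertions write into a spare slot; only a full block triggers a copy into a larger one.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical Hornet-style layout (not the library's code): a vertex array
// records how many neighbors each vertex uses and which edge block holds
// them; blocks may have spare (over-allocated) capacity.
struct VertexEntry {
    int used = 0;       // number of neighbors currently stored
    int block_id = -1;  // stands in for the slide's "Pointer" field
};

struct HornetLike {
    std::vector<VertexEntry> vertices;
    std::vector<std::vector<int>> blocks;  // blocks[b].size() is the block's capacity

    explicit HornetLike(int n) : vertices(n) {}

    int capacity(int v) const {
        int b = vertices[v].block_id;
        return b < 0 ? 0 : static_cast<int>(blocks[b].size());
    }

    void insert_edge(int src, int dst) {
        VertexEntry& e = vertices[src];
        if (e.used == capacity(src)) {
            // Block is full: copy into a block twice as large (assumed policy).
            // The old block is simply abandoned here; a real memory manager
            // would return it to a pool for reuse.
            std::vector<int> bigger(e.used == 0 ? 1 : 2 * e.used);
            for (int i = 0; i < e.used; ++i) bigger[i] = blocks[e.block_id][i];
            blocks.push_back(std::move(bigger));
            e.block_id = static_cast<int>(blocks.size()) - 1;
        }
        blocks[e.block_id][e.used++] = dst;  // common case: write into a spare slot
    }

    void delete_edge(int src, int dst) {
        VertexEntry& e = vertices[src];
        for (int i = 0; i < e.used; ++i) {
            if (blocks[e.block_id][i] == dst) {
                blocks[e.block_id][i] = blocks[e.block_id][e.used - 1];  // fill the hole
                --e.used;  // capacity is kept for future insertions
                return;
            }
        }
    }
};
```

After three insertions from vertex 0 the block has capacity 4 with 3 slots used; a deletion only decrements `used`, so the freed slot is available to the next insertion without any reallocation.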

SLIDE 16

Hornet – Property Graph Support

[Figure: Hornet property graph support. The same USER-INTERFACE vertex array (Vertex Id, Used, Pointer) points to edge blocks whose entries carry additional fields: Dest./Col., Weight, Type, Time 1, User 1, User 2, ...]
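One way to attach several properties to each edge is a structure-of-arrays block, where each field (destination, weight, type, timestamp) lives in its own array and index i across all arrays describes the same edge. This is a hypothetical sketch: the slide lists the fields but does not state how Hornet lays them out, and `PropertyEdgeBlock` and `weight_of_type` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical structure-of-arrays edge block: each property has its own
// array, and index i across all arrays describes the same edge.
struct PropertyEdgeBlock {
    std::vector<int> dest;
    std::vector<float> weight;
    std::vector<std::int32_t> type;
    std::vector<std::int64_t> time1;

    void push(int d, float w, std::int32_t t, std::int64_t ts) {
        dest.push_back(d);
        weight.push_back(w);
        type.push_back(t);
        time1.push_back(ts);
    }

    std::size_t size() const { return dest.size(); }
};

// A query that touches only the fields it needs: total weight of edges of type t.
float weight_of_type(const PropertyEdgeBlock& b, std::int32_t t) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < b.size(); ++i)
        if (b.type[i] == t) sum += b.weight[i];
    return sum;
}
```

With separate arrays, a query that reads only `type` and `weight` never loads the timestamps, which tends to suit the coalesced memory access GPUs favor.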

SLIDE 17

Hornet in Detail

[Figure: Hornet in detail. The USER-INTERFACE vertex array (Vertex Id, Used (#Neighbors/nnz), Pointer) with bit status and over-allocated space for vertex insertions; Dest./Col. and Weight edge blocks; a MEMORY MANAGER with blocks of bsize=1, 2, 2, and 4, a Vec-Tree, and over-allocated space for the power-of-two rule.]
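The "power-of-two rule" in the figure suggests that each vertex's edge block is sized to the smallest power of two at or above its degree, so that blocks of equal size can be pooled together. The sketch below assumes exactly that rule and groups vertices into one pool per block size. `block_size_for` and `group_by_block_size` are hypothetical names, and the grouping is an illustration, not Hornet's Vec-Tree memory manager.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Smallest power of two >= degree, with a minimum block size of 1.
// Assumes degree <= 2^31 so the shift cannot overflow.
std::uint32_t block_size_for(std::uint32_t degree) {
    std::uint32_t b = 1;
    while (b < degree) b <<= 1;
    return b;
}

// Groups vertex ids by block size, as a manager with one pool per block
// size might (hypothetical; not Hornet's implementation).
std::map<std::uint32_t, std::vector<int>> group_by_block_size(const std::vector<std::uint32_t>& degrees) {
    std::map<std::uint32_t, std::vector<int>> pools;
    for (int v = 0; v < static_cast<int>(degrees.size()); ++v)
        pools[block_size_for(degrees[v])].push_back(v);
    return pools;
}
```

A vertex with degree 3 gets a block of size 4, leaving one spare slot, so its next insertion needs no reallocation; a degree-4 vertex fills its block exactly and moves to size 8 on the next insertion.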

SLIDE 18

Hornet Performance Analysis

  • Memory Utilization
  • Initialization Overhead
  • Update rate

– Number of sustainable updates per second

SLIDE 19
Experimental Setup

  • Unless noted otherwise, all performance analysis is for the P100

| GPU | μArch | SMs | SPs | Memory (GB) | Memory Type |
|---|---|---|---|---|---|
| K40 | Kepler | 15 | 2880 | 12 | GDDR5 |
| P100 | Pascal | 56 | 3584 | 16 | HBM2 |
SLIDE 20

Input Graphs

  • DIMACS 10 Graph Implementation Challenge
  • SNAP – Stanford Network Analysis Project
  • University of Florida Sparse Matrix Collection

The following is only a subset of these graphs:

| Name | Type | Vertices | Edges* | Source |
|---|---|---|---|---|
| coAuthorsDBLP | Collaboration | 299K | 1.95M | DIMACS |
| as-skitter | Trace route | 1.69M | 11.1M | SNAP |
| kron_21 | Random | 2M | 201M | DIMACS |
| cit-patents | Citation | 3.77M | 16.5M | SNAP |
| cage15 | Matrix | 5.15M | 94M | DIMACS |
| uk-2002 | Webcrawl | 18.52M | 523M | DIMACS |
SLIDE 21

Memory Utilization - Overall

  • Average memory utilization of 70% relative to CSR
  • Better utilization than COO, cuSTINGER, and AIMS

[Figure: space efficiency (0% to 100%) of Hornet, COO, and cuSTINGER]
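Where does space go under a power-of-two layout? A simplified model (assumption: count only edge slots, ignore vertex arrays and allocator metadata) gives efficiency = Σd / Σ 2^⌈log₂ d⌉ over all vertices with degree d > 0. For degrees 3 and 1, CSR stores 4 slots while the padded layout stores 4 + 1 = 5, so efficiency is 4/5 = 80%. The function below is a hypothetical illustration of that arithmetic and is not meant to reproduce the measured numbers in the figure.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified model (assumption): count only edge slots. CSR stores exactly
// `degree` slots per vertex; a power-of-two layout stores the smallest power
// of two >= degree, and nothing for isolated vertices.
double edge_space_efficiency(const std::vector<std::uint32_t>& degrees) {
    std::uint64_t exact = 0, allocated = 0;
    for (std::uint32_t d : degrees) {
        exact += d;
        if (d == 0) continue;
        std::uint64_t b = 1;
        while (b < d) b <<= 1;
        allocated += b;
    }
    return allocated == 0 ? 1.0 : static_cast<double>(exact) / static_cast<double>(allocated);
}
```

Degrees that are already powers of two waste nothing, while a degree just above a power of two (for example 5, padded to 8) is the worst case, which is why efficiency under this model stays above 50%.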

SLIDE 22

Insertion Rates

  • Supports over 250M updates per second
  • Hornet

– 4X to 10X faster than cuSTINGER
– Does not have the performance dip that cuSTINGER has

  • Scalable growth in update rate

[Figure: update rate (edges per second, log scale from 10^3 to 10^9) on in-2004, soc-LiveJournal1, cage15, and kron_g500-logn21; one panel for cuSTINGER and one for Hornet]
SLIDE 23

Part 2: HornetsNest

  • Algorithm framework for the Hornet data structure

– We support CSR as well

  • All algorithms are implemented using a small set of operations

– We show that these operators are efficient for static graph algorithms and can be used for dynamic graph algorithms

  • Uses features from C++11 and C++14

SLIDE 24

Algorithmic Graph Primitives

  • All algorithms are implemented through this API

  • Simple primitives

– ForAllVerticesInG(G, f(v ∈ V), struct)
– ForAllEdgesInG(G, f(src ∈ V, dest ∈ V), struct)
– InsertEdges(G, E_new)
– RemoveEdges(G, E_rem)
– InsertVertices(G, V_new)
– RemoveVertices(G, V_rem)
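To illustrate the pattern behind ForAllVerticesInG and ForAllEdgesInG (a user functor applied to every vertex or edge, with a user-supplied `struct` carrying the algorithm's state), here is a hypothetical sequential stand-in over a CSR-style graph. On the GPU these would be parallel kernels; `for_all_vertices`, `for_all_edges`, and `Graph` are illustrative names, not HornetsNest's actual API.

```cpp
#include <cassert>
#include <vector>

// CSR-style graph used by the sketch.
struct Graph {
    std::vector<int> offsets;  // size |V| + 1
    std::vector<int> cols;     // destinations
    int num_vertices() const { return static_cast<int>(offsets.size()) - 1; }
};

// Calls f(v, data) for every vertex; `data` plays the role of `struct`.
template <typename F, typename Data>
void for_all_vertices(const Graph& g, F f, Data& data) {
    for (int v = 0; v < g.num_vertices(); ++v) f(v, data);
}

// Calls f(src, dest, data) for every stored edge.
template <typename F, typename Data>
void for_all_edges(const Graph& g, F f, Data& data) {
    for (int src = 0; src < g.num_vertices(); ++src)
        for (int i = g.offsets[src]; i < g.offsets[src + 1]; ++i) f(src, g.cols[i], data);
}
```

For example, in-degrees can be computed by zeroing a counter per vertex with `for_all_vertices` and then incrementing `data[dest]` with `for_all_edges`. The algorithm never touches the offsets directly, so it does not care how adjacency is stored.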

SLIDE 25

Algorithmic Graph Primitives

  • ForAllVerticesInArray(G, A, f(v ∈ V), struct)
  • ForAllEdgesInArray(G, A_V, f(src ∈ V, dest ∈ V), struct)

– Array of vertices whose neighbors will all be traversed
– Used in breadth-first search and betweenness centrality

  • ForAllEdgesInArray(G, A_E, f(src ∈ V, dest ∈ V), struct)

– Array of explicit edge pairs
– Great for processing edge batches
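The edge-pair variant applies the functor to each (src, dest) pair in an explicit batch A_E rather than to whole adjacency lists. A hypothetical sequential sketch, assuming sorted adjacency rows, uses it to count how many edges in an incoming batch already exist (a typical step before inserting a batch without duplicates). `for_all_edges_in_array` and `has_edge` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// CSR-style graph whose adjacency rows are sorted (assumption).
struct Graph {
    std::vector<int> offsets;  // size |V| + 1
    std::vector<int> cols;     // sorted within each row
};

// Hypothetical stand-in for ForAllEdgesInArray(G, A_E, f, struct): the
// functor runs once per explicit (src, dest) pair in the batch.
template <typename F, typename Data>
void for_all_edges_in_array(const Graph& g, const std::vector<std::pair<int, int>>& batch, F f, Data& data) {
    for (const auto& e : batch) f(g, e.first, e.second, data);
}

// Membership test by binary search in src's sorted adjacency row.
bool has_edge(const Graph& g, int src, int dst) {
    auto first = g.cols.begin() + g.offsets[src];
    auto last = g.cols.begin() + g.offsets[src + 1];
    return std::binary_search(first, last, dst);
}
```

Because every pair is independent, the batch maps naturally onto one GPU thread per pair; in this sequential version the counter is a plain `int`, whereas a parallel version would need an atomic or a reduction.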

SLIDE 26

Performance Analysis

  • Sparse Matrix-Vector Multiplication
  • Breadth First Search
  • Triangle Counting

SLIDE 27

Sparse Matrix Vector Multiplication

  • Compared against DCSR [King et al; 2016; ISC]

– DCSR requires a customized SpMV

  • Hornet uses the same algorithm code as CSR

[Figure: speedup versus DCSR (log scale, 1 to 100) for CSR and Hornet]
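One way to get "the same algorithm code" for two layouts is to write SpMV once against a small accessor interface (`rows`, `degree`, `col`, `val`) that both layouts implement. In this hypothetical sketch, `CsrMatrix` is plain CSR and `BlockMatrix` is a Hornet-like layout with per-row blocks that may contain unused padding; the accessor names are assumptions, not Hornet's API.

```cpp
#include <cassert>
#include <vector>

// CSR layout.
struct CsrMatrix {
    std::vector<int> offsets, cols;
    std::vector<double> vals;
    int rows() const { return static_cast<int>(offsets.size()) - 1; }
    int degree(int r) const { return offsets[r + 1] - offsets[r]; }
    int col(int r, int i) const { return cols[offsets[r] + i]; }
    double val(int r, int i) const { return vals[offsets[r] + i]; }
};

// Hornet-like layout (hypothetical): each row owns a block whose first
// `used` entries are valid; the rest is over-allocated padding.
struct BlockMatrix {
    struct Row { int used; std::vector<int> cols; std::vector<double> vals; };
    std::vector<Row> rows_;
    int rows() const { return static_cast<int>(rows_.size()); }
    int degree(int r) const { return rows_[r].used; }
    int col(int r, int i) const { return rows_[r].cols[i]; }
    double val(int r, int i) const { return rows_[r].vals[i]; }
};

// One SpMV routine for any layout exposing the accessors: y = A * x.
template <typename Matrix>
std::vector<double> spmv(const Matrix& a, const std::vector<double>& x) {
    std::vector<double> y(a.rows(), 0.0);
    for (int r = 0; r < a.rows(); ++r)
        for (int i = 0; i < a.degree(r); ++i) y[r] += a.val(r, i) * x[a.col(r, i)];
    return y;
}
```

For the 2×2 matrix [[2, 1], [0, 3]] and x = (1, 2), both layouts give y = (4, 6); the padding slots in `BlockMatrix` are never read because the loop stops at `degree(r)`.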

SLIDE 28

Actual BFS Code

  • Hardware agnostic
  • This code actually runs on the GPU
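The code listing on this slide did not survive extraction. As a stand-in, here is a hypothetical sequential sketch of level-synchronous BFS with an explicit frontier, which is the role the vertex array A_V plays in ForAllEdgesInArray. It is not the code shown on the slide and does not run on the GPU; `bfs` and `Graph` are illustrative names.

```cpp
#include <cassert>
#include <vector>

struct Graph {
    std::vector<int> offsets;  // size |V| + 1
    std::vector<int> cols;     // destinations
};

// Level-synchronous BFS: each iteration expands every vertex in the current
// frontier and collects newly reached vertices into the next frontier.
// Unreached vertices keep distance -1.
std::vector<int> bfs(const Graph& g, int source) {
    const int n = static_cast<int>(g.offsets.size()) - 1;
    std::vector<int> dist(n, -1);
    std::vector<int> frontier{source};
    dist[source] = 0;
    for (int level = 0; !frontier.empty(); ++level) {
        std::vector<int> next;
        for (int u : frontier)
            for (int i = g.offsets[u]; i < g.offsets[u + 1]; ++i) {
                int v = g.cols[i];
                if (dist[v] == -1) {  // a GPU version would need an atomic check-and-set here
                    dist[v] = level + 1;
                    next.push_back(v);
                }
            }
        frontier.swap(next);
    }
    return dist;
}
```

With edges 0→1, 0→2, 1→3 and an isolated vertex 4, BFS from vertex 0 yields distances {0, 1, 1, 2, -1}.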

SLIDE 29

Breadth First Search

  • Uses an algorithm similar to the one in Gunrock

– Gunrock has additional optimizations that can make it faster than cuSTINGER
– “Apples to Apples” comparison

[Figure: BFS speedup (log scale) for CSR, Hornet, and Gunrock]
SLIDE 30

Triangle Counting: CSR Vs. Hornet

  • Triangle counting algorithm taken from [Green et al; IA3; 2014]
  • Simply replaces CSR accesses with Hornet accesses
  • Executed on a K40

| Name | Vertices | Edges | Time-CSR (sec.) | Time-cuSTINGER (sec.) | Execution Difference |
|---|---|---|---|---|---|
| coAuthorsDBLP | 299K | 1.95M | 0.218 | 0.242 | +10% |
| as-skitter | 1.69M | 11.1M | 57.14 | 59.37 | +3.8% |
| kron_21 | 2M | 201M | 2992 | 2996 | +0.14% |
| cit-patents | 3.77M | 16.5M | 0.814 | 0.830 | +2% |
| cage15 | 5.15M | 94M | 6.544 | 7.204 | +10% |
| uk-2002 | 18.52M | 523M | 424.9 | 431.4 | +1.6% |
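To illustrate why swapping the data structure leaves the algorithm unchanged, here is a hypothetical sequential sketch of list-intersection triangle counting. It is not the GPU algorithm from [Green et al; IA3; 2014]; it assumes an undirected graph with both edge directions stored and sorted adjacency rows. Only the adjacency accesses (`offsets`/`cols` here) would change for a different layout.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Undirected graph, both directions stored, adjacency rows sorted (assumption).
struct Graph {
    std::vector<int> offsets;  // size |V| + 1
    std::vector<int> cols;
};

// Size of the intersection of two sorted ranges (merge-style walk).
std::int64_t intersect_count(const int* a, const int* a_end, const int* b, const int* b_end) {
    std::int64_t count = 0;
    while (a != a_end && b != b_end) {
        if (*a < *b) ++a;
        else if (*b < *a) ++b;
        else { ++count; ++a; ++b; }
    }
    return count;
}

// Each triangle is found once for each of its 3 edges in each of the
// 2 stored directions, so the raw sum is divided by 6.
std::int64_t count_triangles(const Graph& g) {
    const int n = static_cast<int>(g.offsets.size()) - 1;
    const int* c = g.cols.data();
    std::int64_t total = 0;
    for (int u = 0; u < n; ++u)
        for (int i = g.offsets[u]; i < g.offsets[u + 1]; ++i) {
            int v = c[i];
            total += intersect_count(c + g.offsets[u], c + g.offsets[u + 1],
                                     c + g.offsets[v], c + g.offsets[v + 1]);
        }
    return total / 6;
}
```

A complete graph on 4 vertices contains 4 triangles: each of its 12 stored edges has 2 common neighbors, and 24 / 6 = 4.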

SLIDE 31

Library Overview

Completed and on-going algorithms. Of course, many more algorithms are to come…

| Algorithm | Status | Reference |
|---|---|---|
| Breadth first search | On-going | |
| Triangle counting | | Static: [Green et al; IA3 2014]; Dynamic: [Makkar; HiPC 2017] |
| Connected components | On-going | [McColl; HiPC 2013] |
| Betweenness centrality | On-going | [Green; SocialCom 2012] |
| PageRank | On-going | New algorithm (non-linear-algebra formulation) |
| Katz centrality | | New algorithm (non-linear-algebra formulation) |
| KTruss | | [Green; HPEC 2017], HPEC Graph Challenge Innovation Award |
SLIDE 32

Take away

  • Dynamic data structure for sparse data sets
  • Supports high update rates
  • Simple and high-level programming model

– Utilizes graph primitives

  • Scalable in both data size and performance

SLIDE 33

Hornet Team (Past & Current)

SLIDE 34

Thank you


  • Email: ogreen@gatech.edu
  • Hornet:

– https://github.com/hornet-gt/hornet

  • HornetsNest:

– https://github.com/hornet-gt/hornetsnest

SLIDE 35

Backup slides
