SLIDE 1

Hornet: An Efficient Data Structure for Dynamic Sparse Graphs and Matrices

Oded Green

SLIDE 2

Hornet

  • A scalable and dynamic data structure for
      – Sparse data
      – Graph algorithms
      – Linear algebra based problems
  • Formerly known as cuSTINGER
      – Hornet initialization is hundreds of times faster
      – Hornet updates are 4X-10X faster
      – The Hornet data structure is more robust and scalable than cuSTINGER
  • Essentially a dynamic CSR data structure
  • Easy to use

Oded Green, GTC-18

SLIDE 3

“Separation of powers”

  • The dynamic graph data structure and the dynamic graph algorithms are in two different repositories
      – Easy to integrate with external libraries
      – Can also be used with matrices
  • This talk focuses on the data structure

SLIDE 4

Graph Primitives – Upfront summary

  • Great performance for static and dynamic graph algorithms
  • Scalable
  • Simple to use
  • Will discuss the algorithm framework later today
      – 1:00pm
      – Same room as this talk

SLIDE 5

Hornet – Upfront Summary

  • Can support over 150 million updates per second
  • Can easily scale to graphs with billions of vertices
  • CSR comparison
      – Initialization is relatively inexpensive, usually less than 3X slower than CSR
      – Hornet requires 30% more storage
      – Identical performance
  • COO (edge-list) comparison
      – Hornet requires 20% less storage
      – Hornet has better locality

SLIDE 6

Big Data problems need Graph Analysis

Communication networks:

  • World-wide connectivity
  • High velocity changes
  • Different types of extracted data:
      – Physical communication network.
      – Person-to-person communication network.

Health-Care networks:

  • Various players.
  • Pattern matching and epidemic monitoring.
  • Problem sizes have doubled in the last 5 years.

Financial networks:

  • Transactions between players.
  • Different transaction types (property graph)

SLIDE 7

Hornet Properties

✓ A simple programming model
    ✓ Enables algorithm designers to implement dynamic & streaming graph algorithms with ease.
✓ Can easily grow to 1000X its initial size (no restart needed)
✓ Millions of updates per second to the graph
✓ Updates are not bottlenecks for analytics.
✓ Automated data management
    ✓ Transfers data between host and device automatically
    ✓ Reduces fragmentation
    ✓ Supports memory reclamation

  • Scalable data structure

cuSTINGER paper: [Green & Bader; HPEC, 2016]: cuSTINGER: Supporting dynamic graph algorithms for GPUs

SLIDE 8

Definitions

  • Dynamic graphs
      – Graph can change over time.
      – Changes can be to topology, edges, or vertices.
          • For example, new edges between two vertices.
      – Changes to edge or vertex weights
  • Streaming graphs:
      – Graphs changing at high rates.
      – 100s of thousands of updates per second.
  • Dynamic matrices
      – Adding a perturbation to the matrix

SLIDE 9

Dynamic graph example

  • Only a subset of the entire graph…
  • Dynamic:
      – At time t:
          • v and w become friends.
          • insert_edge(v, w)
      – At time t̂:
          • u and v are no longer friends
          • delete_edge(u, v)
  • Additional operations include vertex insertions & deletions

[Figure: graph fragment with vertices u, v, and w]
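The two updates in this example can be sketched in a few lines of Python. This only illustrates what the operations mean on an undirected friendship graph; the `DynamicGraph` class and its method names are hypothetical and are not Hornet's API.

```python
from collections import defaultdict


class DynamicGraph:
    """Toy undirected graph; hypothetical names, not Hornet's interface."""

    def __init__(self):
        self.adj = defaultdict(set)  # vertex -> set of neighbors

    def insert_edge(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)

    def delete_edge(self, a, b):
        self.adj[a].discard(b)
        self.adj[b].discard(a)

    def has_edge(self, a, b):
        return b in self.adj[a]


g = DynamicGraph()
g.insert_edge("u", "v")   # initial state: u and v are friends
g.insert_edge("v", "w")   # time t: v and w become friends
g.delete_edge("u", "v")   # time t-hat: u and v are no longer friends
```

After these three calls, the only remaining edge is the one between v and w.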

SLIDE 10

Widely used graph data structures

Dense adjacency matrix

  • Pros: supports updates
  • Cons: poor locality; massive storage requirements

Linked lists

  • Pros: flexible
  • Cons: poor locality; limited parallelism; allocation time is costly

COO (edge list), unsorted

  • Pros: has some flexibility; updates are simple; lots of parallelism
  • Cons: poor locality; stores both the source and destination

CSR

  • Pros: uses exact amount of memory; good locality; lots of parallelism
  • Cons: inflexible

These data structures don’t cut it

SLIDE 11

Compressed Sparse Row (CSR)

Pros:

  • Uses exactly the storage required
  • Great locality
      – Good for GPUs
  • Handful of arrays
      – Simple to use and manage

Cons:

  • Inflexible.
  • Network growth unsupported
  • Topology changes unsupported
  • Property graphs not supported

[Figure: CSR example showing the Src/Row Offset array and the Dest./Col. and Value arrays]
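As a rough illustration of why CSR is compact but inflexible, the sketch below builds the two index arrays from an edge list and then inserts a single edge. The helper names (`build_csr`, `neighbors`, `csr_insert_edge`) and the small example graph are made up for this illustration; edge values are left out for brevity.

```python
def build_csr(num_vertices, edges):
    """Build CSR (row offsets, column indices) from (src, dst) pairs."""
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    for v in range(num_vertices):
        offsets[v + 1] += offsets[v]       # prefix sum of the degrees
    cols = [0] * len(edges)
    cursor = offsets[:-1].copy()           # next free slot in each row
    for src, dst in edges:
        cols[cursor[src]] = dst
        cursor[src] += 1
    return offsets, cols


def neighbors(offsets, cols, v):
    return cols[offsets[v]:offsets[v + 1]]


def csr_insert_edge(offsets, cols, src, dst):
    """Insert one edge: later entries shift and later offsets change."""
    cols.insert(offsets[src + 1], dst)     # shifts every later column entry
    for v in range(src + 1, len(offsets)):
        offsets[v] += 1                    # fixes up every later row offset


edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3)]
offsets, cols = build_csr(4, edges)
csr_insert_edge(offsets, cols, 0, 3)
```

A single insertion into row 0 touches the offsets of every later row and moves every later column entry, which is the inflexibility listed under the cons.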

SLIDE 12

Hornet – A High Level View

  • Every vertex points at its own array
  • Many edge arrays (blocks)
  • Block size is determined by the number of neighbors (always a power of 2)
  • Extra space is left at the end of the block

[Figure: the user interface holds a per-vertex table (Vertex Id, Used, Pointer); each pointer refers to an edge block (Dest./Col., Value) with over-allocated space at its end]
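The power-of-two rule can be sketched as follows. This is a minimal Python model, assuming each vertex gets a block whose capacity is the smallest power of two that holds its neighbors, and that a full block is replaced by one of twice the size. `block_size` and `VertexBlock` are hypothetical names, not Hornet's API.

```python
def block_size(num_neighbors):
    """Smallest power of two that is >= num_neighbors (at least 1)."""
    size = 1
    while size < num_neighbors:
        size *= 2
    return size


class VertexBlock:
    """One vertex's edge block: 'used' entries followed by spare slots."""

    def __init__(self, neighbors):
        self.used = len(neighbors)
        self.capacity = block_size(self.used)
        self.slots = list(neighbors) + [None] * (self.capacity - self.used)

    def insert(self, dst):
        if self.used == self.capacity:
            # Block is full: move to a block of twice the size.
            self.capacity *= 2
            self.slots = self.slots + [None] * (self.capacity - self.used)
        self.slots[self.used] = dst
        self.used += 1


blk = VertexBlock([2, 5, 7])   # 3 neighbors -> block of 4, one spare slot
blk.insert(4)                  # fits in the spare slot, no reallocation
blk.insert(1)                  # block is full, so it grows to 8
```

Under this model most insertions land in the spare space, and a vertex only changes blocks when its degree crosses a power of two.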

SLIDE 13

Hornet – Property Graph Support

[Figure: the per-vertex table (Vertex Id, Used, Pointer) in the user interface, with edge blocks that carry the fields Dest./Col., Weight, Type, Time, User 1, User 2, ….]

  • Programmers can add fields per edge
  • Easy to manage for static graph data structures
  • Hornet manages the data movement
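One way to picture per-edge fields is as parallel arrays that share the same edge index, so that an edge's destination, weight, type, and timestamp all sit at the same position. The sketch below is a plain-Python illustration under that assumption; the `EdgeFields` class and the field names are hypothetical and do not reflect Hornet's actual interface.

```python
class EdgeFields:
    """Parallel per-edge arrays for one vertex; all share the same index."""

    def __init__(self, *field_names):
        self.columns = {name: [] for name in ("dst",) + field_names}

    def add_edge(self, dst, **values):
        self.columns["dst"].append(dst)
        for name in self.columns:
            if name != "dst":
                # Every column grows together, so positions stay aligned.
                self.columns[name].append(values.get(name))

    def edge(self, i):
        return {name: col[i] for name, col in self.columns.items()}


edges_of_v = EdgeFields("weight", "type", "time")
edges_of_v.add_edge(2, weight=0.5, type="friend", time=10)
edges_of_v.add_edge(5, weight=1.5, type="follows", time=12)
```

With this layout, every update has to keep all columns in step, which is the bookkeeping a dynamic structure must take over from the programmer.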
SLIDE 14

Hornet in Detail

[Figure: Hornet layout. USER-INTERFACE: a per-vertex table with Vertex Id, Used (#Neighbors/nnz), and Pointer, plus over-allocated space for vertex insertions. MEMORY MANAGER: block arrays BA(0,1), BA(1,1), BA(1,2), and BA(2,1) with bsize=1, 2, 2, and 4, each with a bit status, organized in a Vec-Tree. Edge blocks hold Dest./Col. and Weight, with over-allocated space for the power-of-two rule.]
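The memory manager in the figure can be approximated with a small model: each block array is a fixed pool of equally sized edge blocks with one status bit per block, and a new block array is added when all arrays of that size are full. This is only a reading of the figure, with hypothetical class names, and it uses a simple linear scan where the figure shows a Vec-Tree.

```python
class BlockArray:
    """Fixed pool of equally sized edge blocks with one status bit per block."""

    def __init__(self, block_size, num_blocks):
        self.block_size = block_size
        self.storage = [None] * (block_size * num_blocks)
        self.in_use = [False] * num_blocks      # the "bit status"

    def allocate(self):
        """Return the index of a free block, or None if the array is full."""
        for i, used in enumerate(self.in_use):
            if not used:
                self.in_use[i] = True
                return i
        return None

    def release(self, i):
        self.in_use[i] = False                  # block can be reused later


class MemoryManager:
    """Keeps a list of block arrays per block size (assumed organization)."""

    def __init__(self, blocks_per_array=2):
        self.blocks_per_array = blocks_per_array
        self.arrays = {}                        # block_size -> [BlockArray]

    def allocate(self, block_size):
        pools = self.arrays.setdefault(block_size, [])
        for j, pool in enumerate(pools):
            i = pool.allocate()
            if i is not None:
                return (block_size, j, i)
        pools.append(BlockArray(block_size, self.blocks_per_array))
        return (block_size, len(pools) - 1, pools[-1].allocate())

    def release(self, handle):
        block_size, j, i = handle
        self.arrays[block_size][j].release(i)


mm = MemoryManager(blocks_per_array=2)
a = mm.allocate(2)
b = mm.allocate(2)
c = mm.allocate(2)      # first array of size-2 blocks is full, a second is added
mm.release(a)
d = mm.allocate(2)      # reuses the reclaimed block
```

Each handle is a (block size, array index, block index) triple, so the last allocation returns the same handle as the first one after it was released.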

SLIDE 15

Hornet Performance

  • Memory Utilization

– Independent of the GPU being used

  • Initialization overhead
  • Update rate


SLIDE 16

Hornet Performance Analysis

  • All performance analysis is for the P100
      – 56 SMs
      – 3584 SPs
      – 16GB HBM2 memory

SLIDE 17

Input Graphs

  • DIMACS 10 Graph Implementation Challenge
  • SNAP – Stanford Network Analysis Project
  • Florida Matrix Collection

The following is only a subset of these graphs:

Name            Type            |V|       |E|*     Source
coAuthorsDBLP   Collaboration   299k      1.95M    DIMACS
as-skitter      Trace route     1.69M     11.1M    SNAP
kron_21         Random          2M        201M     DIMACS
cit-patents     Citation        3.77M     16.5M    SNAP
cage15          Matrix          5.15M     94M      DIMACS
uk-2002         Webcrawl        18.52M    523M     DIMACS

SLIDE 18

Memory Utilization - Overall

  • BlockArrays of size 2^16
  • 70% average utilization relative to CSR
  • Better utilization than: COO, cuSTINGER, AIM
      – AIM allocates all GPU memory

[Figure: space efficiency (0% to 100%) for Hornet with BlockArrays of size 2^16, COO, cuSTINGER, and AIM]
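To make the notion of space efficiency concrete, the sketch below counts index words for three layouts and reports each one relative to CSR. It is a back-of-the-envelope model, not the measurement behind the chart: it assumes that efficiency means CSR storage divided by the layout's storage, that the power-of-two layout keeps a (used, pointer) pair per vertex, and it ignores edge values and unused blocks inside block arrays. The degree sequence is made up.

```python
def pow2_ceil(n):
    """Smallest power of two that is >= n (at least 1)."""
    size = 1
    while size < n:
        size *= 2
    return size


def storage_words(degrees):
    """Index words needed by each layout (edge values ignored)."""
    num_vertices = len(degrees)
    num_edges = sum(degrees)
    csr = (num_vertices + 1) + num_edges          # offsets + destinations
    coo = 2 * num_edges                           # source + destination per edge
    # Assumed model: a (used, pointer) pair per vertex plus a power-of-two block.
    pow2_blocks = 2 * num_vertices + sum(pow2_ceil(d) for d in degrees)
    return {"csr": csr, "coo": coo, "pow2_blocks": pow2_blocks}


def efficiency_vs_csr(degrees):
    words = storage_words(degrees)
    return {name: words["csr"] / w for name, w in words.items()}


degrees = [2, 2, 3, 2, 2, 2, 1]   # made-up degree sequence
words = storage_words(degrees)
eff = efficiency_vs_csr(degrees)
```

For this toy input the model gives 22 words for CSR, 28 for COO, and 29 for the power-of-two layout; real graphs with larger degrees amortize the per-vertex overhead differently, so these numbers say nothing about the measured results.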

SLIDE 19

Initialization overhead

  • Time to initialize the data structure in comparison to CSR
  • In most cases 2X-3X slower
      – One-time penalty
  • Much faster than cuSTINGER

[Figure: slowdown versus CSR (log scale, 1 to 1,000) for Hornet and cuSTINGER]

SLIDE 20

Insertion Rates

  • Supports over 150M updates per second
  • Hornet
      – 4X-10X faster than cuSTINGER
      – Does not have a performance dip like cuSTINGER
  • Scalable growth in update rate

[Figure: two panels, cuSTINGER and Hornet, each plotting update rate (edges per second) on a log scale from 1,000 to 1,000,000,000 for in-2004, soc-LiveJournal1, cage15, and kron_g500-logn21; axis ticks from 10^3 to 10^9]

SLIDE 21

Take away

  • Anything you can do with CSR you can also do with Hornet (the reverse is not true)
  • Supports high update rates
  • Scalable in both data size and performance
  • Simple and high-level programming model
      – See you at 1:00pm
  • Also, look for James Fox’s talk on a cool algorithm for finding the maximal K-Truss in a graph
      – Uses dynamic triangle counting and the Hornet’s deletion…

SLIDE 22

Hornet Team (Current & Alumni)


SLIDE 23

Thank you


  • Email: ogreen@gatech.edu
  • Hornet:

– https://github.com/hornet-gt/hornet

  • HornetsNest:

– https://github.com/hornet-gt/hornetsnest

SLIDE 24

Backup slides


SLIDE 25

Memory Utilization - Overall

  • 70% average utilization relative to CSR
  • Better utilization in comparison to: COO, cuSTINGER, AIM

[Figure: space efficiency (0% to 100%) for three Hornet configurations (labeled 2^16, 2^22, and 2^18), COO, cuSTINGER, and AIM]

SLIDE 26

Part 2: HornetsNest

  • Algorithm framework for the Hornet data structure
      – We support CSR as well
  • All algorithms are implemented using a small set of operations
      – We show that these operators are efficient for static graph algorithms and can be used for dynamic graph algorithms
  • Uses features from C++11 and C++14

SLIDE 27

Sparse Matrix Vector Multiplication

  • In comparison to DCSR [King et al; 2016; ISC]
      – DCSR requires customized SpMV
  • Hornet uses the same algorithm code as CSR.

[Figure: speedup versus DCSR (log scale, 1 to 100) for CSR and Hornet]
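The point about sharing algorithm code can be illustrated with an SpMV loop that only asks the storage layer for the entries of a row. In the sketch below the same `spmv` function runs over CSR arrays and over per-row blocks with spare slots. All names and the 3x3 matrix are invented for this illustration; this is sequential Python, not Hornet's or HornetsNest's GPU implementation.

```python
def spmv(num_rows, row_entries, x):
    """y = A @ x, given a function that yields (col, value) pairs for a row."""
    y = [0.0] * num_rows
    for r in range(num_rows):
        for col, val in row_entries(r):
            y[r] += val * x[col]
    return y


# Layout 1: CSR arrays.
offsets = [0, 2, 3, 6]
cols = [0, 2, 1, 0, 1, 2]
vals = [1.0, 2.0, 3.0, 4.0, 6.0, 5.0]


def csr_row(r):
    lo, hi = offsets[r], offsets[r + 1]
    return zip(cols[lo:hi], vals[lo:hi])


# Layout 2: one power-of-two block per row, with a 'used' count and spare slots.
blocks = [
    {"used": 2, "cols": [0, 2], "vals": [1.0, 2.0]},
    {"used": 1, "cols": [1], "vals": [3.0]},
    {"used": 3, "cols": [0, 1, 2, None], "vals": [4.0, 6.0, 5.0, None]},
]


def block_row(r):
    b = blocks[r]
    return zip(b["cols"][:b["used"]], b["vals"][:b["used"]])


x = [1.0, 2.0, 3.0]
y_csr = spmv(3, csr_row, x)
y_blocks = spmv(3, block_row, x)
```

Both calls produce the same result vector, since the loop body never needs to know how the rows are stored.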