Hornet: An Efficient Data Structure for Dynamic Sparse Graphs and Matrices
Oded Green
Hornet
- A scalable and dynamic data structure for
– Sparse data
– Graph algorithms
– Linear-algebra-based problems
- Formerly known as cuSTINGER
– Hornet initialization is hundreds of times faster
– Hornet updates are 4X-10X faster
– The Hornet data structure is more robust and scalable than cuSTINGER.
- Essentially a dynamic CSR data structure
- Easy to use
Oded Green, GTC-18
“Separation of powers”
- The dynamic graph data structure and the dynamic
graph algorithms live in two different repositories
– Easy to integrate with external libraries
– Can also be used with matrices
- This talk focuses on the data structure
Graph Primitives – Upfront summary
- Great performance for static and dynamic
graph algorithms
- Scalable
- Simple to use
- The algorithm framework will be discussed later today
– 1:00pm
– Same room as this talk
Hornet – Upfront Summary
- Can support over 150 million updates per second
- Can easily scale to graphs with billions of vertices
- CSR comparison
– Initialization is relatively inexpensive: usually less than 3X slower
– Hornet requires 30% more storage
– Identical performance
- COO (edge-list) comparison
– Hornet requires 20% less storage
– Hornet has better locality
Big Data problems need Graph Analysis
Communication networks:
- World-wide connectivity
- High velocity changes
- Different types of extracted
data:
– Physical communication network.
– Person-to-person communication network.
Health-care networks:
- Various players.
- Pattern matching and
epidemic monitoring.
- Problem sizes have doubled in the last 5 years.
Financial networks:
- Transactions between
players.
- Different transaction types (property graph)
Hornet Properties
✓ A simple programming model
✓ Enable algorithm designers to implement dynamic & streaming graph algorithms with ease.
✓ Can easily grow to 1000X its initial size (no restart needed)
✓ Millions of updates per second to the graph
✓ Updates are not a bottleneck for analytics
✓ Automated data management
✓ Transfers data between host and device automatically
✓ Reduces fragmentation
✓ Supports memory reclamation
- Scalable data structure
cuSTINGER paper: [Green & Bader; HPEC, 2016]: cuSTINGER: Supporting dynamic graph algorithms for GPUs
Definitions
- Dynamic graphs
– The graph can change over time.
– Changes can be to the topology, edges, or vertices.
- For example, a new edge between two vertices.
– Changes to edge or vertex weights
- Streaming graphs:
– Graphs changing at high rates.
– Hundreds of thousands of updates per second.
- Dynamic matrices
– Adding a perturbation to the matrix
Dynamic graph example
- Only a subset of the entire
graph…
- Dynamic:
– At time t:
- v and w become friends.
- insert_edge(v, w)
– At time t̂:
- u and v are no longer friends.
- delete_edge(u, v)
- Additional operations
include vertex insertions & deletions
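The friendship example can be mirrored with a tiny host-side sketch. It assumes an undirected graph (friendship is symmetric, so each edge is stored in both directions) and maps u, v, w to vertex IDs 0, 1, 2. The `DynamicGraph` type and its methods are hypothetical illustrations of the insert and delete operations, not Hornet's API.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical undirected dynamic graph: one neighbor list per vertex.
// Method names mirror the slide's insert_edge / delete_edge operations,
// but this is not Hornet's actual interface.
struct DynamicGraph {
    std::vector<std::vector<int>> adj;
    explicit DynamicGraph(int n) : adj(n) {}

    bool has_edge(int a, int b) const {
        const auto& nb = adj[a];
        return std::find(nb.begin(), nb.end(), b) != nb.end();
    }

    // Friendship is symmetric, so the edge is stored in both directions.
    void insert_edge(int a, int b) {
        if (has_edge(a, b)) return;  // ignore duplicate insertions
        adj[a].push_back(b);
        adj[b].push_back(a);
    }

    void delete_edge(int a, int b) {
        remove_one(adj[a], b);
        remove_one(adj[b], a);
    }

private:
    // Removes x by overwriting it with the last neighbor: O(degree),
    // neighbor order is not preserved.
    static void remove_one(std::vector<int>& nb, int x) {
        auto it = std::find(nb.begin(), nb.end(), x);
        if (it == nb.end()) return;
        *it = nb.back();
        nb.pop_back();
    }
};
```

Usage: with `DynamicGraph g(3)`, calling `g.insert_edge(1, 2)` models "v and w become friends" and `g.delete_edge(0, 1)` models "u and v are no longer friends".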
Widely used graph data structures
- Dense adjacency matrix
– Pros: supports updates
– Cons: poor locality; massive storage requirements
- Linked lists
– Pros: flexible
– Cons: poor locality; limited parallelism; allocation time is costly
- COO (edge list), unsorted
– Pros: has some flexibility; updates are simple; lots of parallelism
– Cons: poor locality; stores both the source and the destination
- CSR
– Pros: uses the exact amount of memory; good locality; lots of parallelism
– Cons: inflexible
These data structures don’t cut it
Compressed Sparse Row (CSR)
Pros:
- Uses precise storage
requirements
- Great locality
– Good for GPUs
- Handful of arrays
– Simple to use and manage
Cons:
- Inflexible.
- Network growth
unsupported
- Topology changes
unsupported
- Property graphs not
supported
[Figure: CSR example for a 7-vertex graph, showing the Src/Row Offset array and the Dest./Col. and Value arrays]
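To make the CSR trade-off concrete, here is a minimal host-side sketch that builds the row-offset and column arrays from an edge list (a counting pass plus a prefix sum), then inserts one edge. It assumes 0-based vertex IDs and unweighted edges; `Csr`, `build_csr`, and `csr_insert_edge` are hypothetical names. The insertion shows why CSR is inflexible: every later neighbor and every later offset has to move.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical CSR container: row_offset has num_vertices + 1 entries, and the
// neighbors of vertex v are col[row_offset[v]] .. col[row_offset[v + 1] - 1].
struct Csr {
    std::vector<std::size_t> row_offset;
    std::vector<int> col;
};

// Builds CSR from a (src, dst) edge list with a counting pass and a prefix sum.
Csr build_csr(int num_vertices, const std::vector<std::pair<int, int>>& edges) {
    Csr g;
    g.row_offset.assign(static_cast<std::size_t>(num_vertices) + 1, 0);
    for (const auto& e : edges) g.row_offset[e.first + 1]++;  // count out-degrees
    for (int v = 0; v < num_vertices; ++v)                    // prefix sum -> offsets
        g.row_offset[v + 1] += g.row_offset[v];
    g.col.resize(edges.size());
    // next[v] is the next free slot in v's segment while scattering.
    std::vector<std::size_t> next(g.row_offset.begin(), g.row_offset.end() - 1);
    for (const auto& e : edges) g.col[next[e.first]++] = e.second;
    return g;
}

// Inserting a single edge into CSR shifts all later entries and offsets:
// O(|E| + |V|) work per update, which is why CSR suits static graphs.
void csr_insert_edge(Csr& g, int src, int dst) {
    std::size_t pos = g.row_offset[static_cast<std::size_t>(src) + 1];
    g.col.insert(g.col.begin() + static_cast<std::ptrdiff_t>(pos), dst);
    for (std::size_t v = static_cast<std::size_t>(src) + 1; v < g.row_offset.size(); ++v)
        g.row_offset[v]++;
}
```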
Hornet – A High Level View
- Every vertex points at its own array
- Many edge arrays (blocks)
- Block size is determined by the number of neighbors (always a power of 2)
- Extra space is left at the end of each block
[Figure: Hornet user interface. Each vertex (Vertex Id, Used, Pointer) points to its own edge block (Dest./Col., Value), with over-allocated space at the end of each block]
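The per-vertex layout can be approximated on the host as follows. This is a simplified sketch, assuming each vertex owns a block whose capacity is always a power of two plus a "used" count, and that a full block is replaced by one of the next power-of-two size with the neighbors copied over. `VertexBlock`, `HornetLike`, and `next_pow2` are hypothetical names; the real Hornet allocates blocks on the GPU through its memory manager.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest power of two >= n (returns 1 for n == 0).
std::size_t next_pow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Hypothetical per-vertex entry: block.size() is the capacity (a power of two),
// used is the number of valid neighbors stored at the front of the block.
struct VertexBlock {
    std::vector<int> block;
    std::size_t used = 0;
};

struct HornetLike {
    std::vector<VertexBlock> vertices;
    explicit HornetLike(std::size_t n) : vertices(n) {}

    void insert_edge(int src, int dst) {
        VertexBlock& v = vertices[static_cast<std::size_t>(src)];
        if (v.used == v.block.size()) {
            // Block is full: move to a block of the next power-of-two size.
            std::vector<int> bigger(next_pow2(v.used + 1));
            for (std::size_t i = 0; i < v.used; ++i) bigger[i] = v.block[i];
            v.block.swap(bigger);
        }
        // Otherwise the over-allocated tail absorbs the new edge in O(1).
        v.block[v.used++] = dst;
    }
};
```

The design choice this illustrates: because capacity doubles, most insertions land in existing slack, and only a logarithmic number of block moves happen as a vertex's degree grows.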
Hornet – Property Graph Support
[Figure: Hornet user interface with per-edge fields: Dest./Col., Weight, Type, Time, User 1, User 2, ….]
- Programmers can add fields per edge
- Easy to manage for static graph data structures
- Hornet manages the data movement
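One way to picture "fields per edge" is a structure-of-arrays block: one array for destinations and one array per user field (here weight, type, and time, as in the figure). This is a hedged host-side sketch; `EdgeBlockSoA` and `total_weight_of_type` are hypothetical names, and the field types are assumptions. Keeping each field in its own array lets a query that only needs destinations skip the other fields entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical edge block in structure-of-arrays form: destination plus
// user-chosen per-edge fields. Field types are illustrative assumptions.
struct EdgeBlockSoA {
    std::vector<int> dst;
    std::vector<float> weight;
    std::vector<std::uint8_t> type;
    std::vector<std::int64_t> time;

    // Appends one edge; all field arrays stay the same length.
    void push_back(int d, float w, std::uint8_t t, std::int64_t ts) {
        dst.push_back(d);
        weight.push_back(w);
        type.push_back(t);
        time.push_back(ts);
    }
    std::size_t size() const { return dst.size(); }
};

// Example property-graph query: total weight of the edges of one type.
float total_weight_of_type(const EdgeBlockSoA& b, std::uint8_t t) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < b.size(); ++i)
        if (b.type[i] == t) sum += b.weight[i];
    return sum;
}
```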
Hornet in Detail
[Figure: Hornet in detail. User interface: per-vertex arrays (Vertex Id, Used (#Neighbors/nnz), Pointer) with over-allocated space for vertex insertions, and edge blocks (Dest./Col., Weight). Memory manager: blocks labeled BA1,2, BA2,2, BA2,3, BA3,2 (bsize = 1, 2, 2, 4), a bit status per block, a Vec-Tree, and over-allocated space for the power-of-two rule]
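A single size class of the memory manager can be sketched as a contiguous buffer split into equal-size blocks with one status bit per block. This is an assumption-laden host-side illustration: `BlockArray` here is a hypothetical class, and a plain linear scan over the status bits stands in for the Vec-Tree shown in the figure, whose exact role this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical BlockArray: one buffer holding num_blocks blocks of
// block_size slots each, plus one status bit per block (true = in use).
class BlockArray {
public:
    BlockArray(std::size_t block_size, std::size_t num_blocks)
        : block_size_(block_size),
          data_(block_size * num_blocks),
          used_(num_blocks, false) {}

    // Returns the index of a free block, or -1 if every block is in use.
    // The linear scan is a stand-in for a faster search structure.
    long allocate() {
        for (std::size_t i = 0; i < used_.size(); ++i) {
            if (!used_[i]) {
                used_[i] = true;
                return static_cast<long>(i);
            }
        }
        return -1;
    }

    // Marks a block as free so a later allocate() can reuse it.
    void release(long block) { used_[static_cast<std::size_t>(block)] = false; }

    // Pointer to the first slot of a block.
    int* block_ptr(long block) {
        return data_.data() + static_cast<std::size_t>(block) * block_size_;
    }
    std::size_t block_size() const { return block_size_; }

private:
    std::size_t block_size_;
    std::vector<int> data_;
    std::vector<bool> used_;
};
```

Reusing released blocks before touching fresh space is what lets a scheme like this reclaim memory and limit fragmentation.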
Hornet Performance
- Memory Utilization
– Independent of the GPU being used
- Initialization overhead
- Update rate
Hornet Performance Analysis
- All performance analysis is for the P100
– 56 SMs
– 3584 SPs
– 16GB HBM2 memory
Input Graphs
- DIMACS 10 Graph Implementation Challenge
- SNAP – Stanford Network Analysis Project
- Florida Matrix Collection
The following is only a subset of these graphs:
Name            Type           |V|      |E|*   Source
coAuthorsDBLP   Collaboration  299K     1.95M  DIMACS
as-skitter      Trace route    1.69M    11.1M  SNAP
kron_21         Random         2M       201M   DIMACS
cit-Patents     Citation       3.77M    16.5M  SNAP
cage15          Matrix         5.15M    94M    DIMACS
uk-2002         Webcrawl       18.52M   523M   DIMACS
Memory Utilization - Overall
- BlockArrays of size 2^16
- On average, 70% of CSR's space efficiency
- Better utilization than COO, cuSTINGER, and AIM
– AIM allocates all GPU memory
[Figure: Space efficiency (0% to 100%) of Hornet, COO, cuSTINGER, and AIM; Hornet uses BlockArrays of size 2^16]
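Part of the gap to CSR comes from the power-of-two rule. The sketch below computes an edge-slot-only space efficiency, CSR's exact slot count divided by the slots used by power-of-two blocks, for a list of vertex degrees. It is a simplifying assumption: vertex arrays, allocator metadata, and BlockArray granularity are ignored, so its numbers are not the measured figures above. `pow2_ceil` and `edge_space_efficiency` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest power of two >= n; a vertex with no neighbors needs no block.
std::size_t pow2_ceil(std::size_t n) {
    if (n == 0) return 0;
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Edge-slot efficiency of power-of-two blocks relative to CSR:
// CSR stores exactly sum(deg) slots, the block layout sum(pow2_ceil(deg)).
double edge_space_efficiency(const std::vector<std::size_t>& degrees) {
    std::size_t exact = 0, allocated = 0;
    for (std::size_t d : degrees) {
        exact += d;
        allocated += pow2_ceil(d);
    }
    return allocated == 0 ? 1.0
                          : static_cast<double>(exact) / static_cast<double>(allocated);
}
```

For degrees {1, 3, 5, 8} this gives 17 / (1 + 4 + 8 + 8) = 17/21, about 81%. Since every non-empty block is more than half full, this edge-only ratio always stays above 50%.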
Initialization overhead
- Time to initialize data structure in comparison to CSR
- In most cases 2X-3X slower
– One time penalty
- Much faster than cuSTINGER
[Figure: Initialization slowdown versus CSR (log scale, 1 to 1,000) for Hornet and cuSTINGER]
Insertion Rates
- Supports over 150M updates per second
- Hornet
– 4X-10X faster than cuSTINGER
– Does not have the performance dip that cuSTINGER has
- Scalable growth in update rate
[Figure: Update rate (edges per second, log scale from 10^3 to 10^9) for cuSTINGER and Hornet on in-2004, soc-LiveJournal1, cage15, and kron_g500-logn21]
Take away
- Anything you can do with CSR you can also do with
Hornet (the converse is not true)
- Supports high update rates
- Scalable in both data size and in performance
- Simple and high-level programming model
– See you at 1:00pm
- Also, look for James Fox’s talk on a cool algorithm
for finding the maximal K-Truss in a graph
– Uses dynamic triangle counting and the Hornet’s deletion…
Hornet Team (Current & Alumni)
Thank you
- Email: ogreen@gatech.edu
- Hornet:
– https://github.com/hornet-gt/hornet
- HornetsNest:
– https://github.com/hornet-gt/hornetsnest
Backup slides
Memory Utilization - Overall
- On average, 70% of CSR's space efficiency
- Better utilization than COO, cuSTINGER, and AIM
[Figure: Space efficiency (0% to 100%) of Hornet with BlockArrays of size 2^16, 2^22, and 2^18, compared with COO, cuSTINGER, and AIM]
Part 2: HornetsNest
- Algorithm framework for Hornet data
structure
– We support CSR as well
- All algorithms are implemented using a small
set of operations
– We show that these operators are efficient for static graph algorithms and can be used for dynamic graph algorithms
- Uses features from C++11 and C++14
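The idea of writing every algorithm with a small set of operators can be sketched with lambdas. Below, a top-down BFS touches the graph only through two operators, one that visits each vertex of a frontier and one that visits each outgoing edge of a vertex. The operator names `for_all_frontier` and `for_all_edges_of` and the `Graph` type are hypothetical, host-side stand-ins; they are not the HornetsNest operator API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host graph used only to demonstrate the operator style.
struct Graph {
    std::vector<std::vector<int>> adj;
};

// Operator 1: apply op(v) to every vertex in a frontier.
template <typename Op>
void for_all_frontier(const std::vector<int>& frontier, Op op) {
    for (int v : frontier) op(v);
}

// Operator 2: apply op(src, dst) to every outgoing edge of src.
template <typename Op>
void for_all_edges_of(const Graph& g, int src, Op op) {
    for (int dst : g.adj[static_cast<std::size_t>(src)]) op(src, dst);
}

// Top-down BFS written only in terms of the two operators above.
// Returns the BFS level of each vertex, or -1 if unreachable.
std::vector<int> bfs_levels(const Graph& g, int source) {
    std::vector<int> level(g.adj.size(), -1);
    std::vector<int> frontier{source};
    level[static_cast<std::size_t>(source)] = 0;
    for (int depth = 1; !frontier.empty(); ++depth) {
        std::vector<int> next;
        for_all_frontier(frontier, [&](int v) {
            for_all_edges_of(g, v, [&](int, int dst) {
                if (level[static_cast<std::size_t>(dst)] == -1) {
                    level[static_cast<std::size_t>(dst)] = depth;
                    next.push_back(dst);
                }
            });
        });
        frontier.swap(next);
    }
    return level;
}
```

Because the BFS body never reads the graph's storage directly, swapping in a different layout only requires reimplementing the two operators.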
Sparse Matrix Vector Multiplication
- In comparison to DCSR [King et al; 2016; ISC]
– DCSR requires customized SpMV
- Hornet uses identical algorithm code as CSR.
[Figure: Speedup versus DCSR (log scale, 1 to 100) for CSR and Hornet]
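The point that SpMV can share identical algorithm code across CSR and Hornet can be illustrated with a template. The sketch assumes both layouts expose the same small interface (`rows`, `nnz`, `col`, `val`); `CsrMatrix`, `BlockMatrix`, and `spmv` are hypothetical host-side names, and `BlockMatrix` only imitates Hornet's per-row, possibly over-allocated storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CSR layout: row r owns cols/vals[offset[r] .. offset[r + 1] - 1].
struct CsrMatrix {
    std::vector<std::size_t> offset;  // rows + 1 entries
    std::vector<int> cols;
    std::vector<double> vals;
    std::size_t rows() const { return offset.size() - 1; }
    std::size_t nnz(std::size_t r) const { return offset[r + 1] - offset[r]; }
    int col(std::size_t r, std::size_t i) const { return cols[offset[r] + i]; }
    double val(std::size_t r, std::size_t i) const { return vals[offset[r] + i]; }
};

// Hornet-like layout: each row owns its own arrays; only the first
// `used` entries are valid, the rest is over-allocated space.
struct BlockMatrix {
    struct Row {
        std::vector<int> cols;
        std::vector<double> vals;
        std::size_t used = 0;
    };
    std::vector<Row> row_blocks;
    std::size_t rows() const { return row_blocks.size(); }
    std::size_t nnz(std::size_t r) const { return row_blocks[r].used; }
    int col(std::size_t r, std::size_t i) const { return row_blocks[r].cols[i]; }
    double val(std::size_t r, std::size_t i) const { return row_blocks[r].vals[i]; }
};

// y = A * x, written once against the shared interface.
template <typename Matrix>
std::vector<double> spmv(const Matrix& a, const std::vector<double>& x) {
    std::vector<double> y(a.rows(), 0.0);
    for (std::size_t r = 0; r < a.rows(); ++r)
        for (std::size_t i = 0; i < a.nnz(r); ++i)
            y[r] += a.val(r, i) * x[static_cast<std::size_t>(a.col(r, i))];
    return y;
}
```

For A = [[2, 1], [0, 3]] and x = [1, 2], both layouts produce y = [4, 6] through the same `spmv` instantiation pattern, so no layout-specific kernel is needed.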