cuSTINGER: A Sparse Dynamic Graph and Matrix Data Structure
Oded Green
This talk covers two things:
- A scalable and dynamic data structure for graph algorithms and linear-algebra-based problems
- A framework for static and dynamic analytics
NVIDIA GTC cuSTINGER Oded Green, 2017
Upfront Summary of Results
- Can support up to 90 million updates per second
- Low overhead in comparison with CSR
– Initialization is also relatively inexpensive: 20%–200%
– Equal performance
- Great performance for static graph algorithms
- Simple to use
Big Data problems need Graph Analysis
Communication networks:
- Worldwide connectivity
- High velocity changes
- Different types of extracted data:
– Physical communication network.
– Person-to-person communication network.
Healthcare networks:
- Various players.
- Pattern matching and epidemic monitoring.
- Problem sizes have doubled in the last 5 years.
Financial networks:
- Transactions between players.
- Different transaction types (property graph).
Graphs are a unifying motif for data analytics.
STINGER
- STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation
- Enables algorithm designers to implement dynamic and streaming graph algorithms with ease.
- Portable semantics for various platforms
– Linked list of edge blocks is not ideal for the GPU
- Good performance for all types of graph problems and algorithms, both static and dynamic.
- Assumes globally addressable memory access
STINGER and cuSTINGER Properties
✓ A simple programming model
✓ Millions of updates per second to the graph
✓ Updates are not bottlenecks for analytics
✓ Advanced memory manager
✓ Transfers data between host and device automatically
✓ Reduces initialization time
✓ Allows for simple update processes
STINGER papers: [Bader et al.; 2007; Tech Report], [Ediger et al.; HPEC; 2012], [McColl et al.; PPAA; 2014]
cuSTINGER paper: [Green & Bader; HPEC; 2016], "cuSTINGER: Supporting dynamic graph algorithms for GPUs"
Definitions
- Dynamic graphs (matrices)
– Graph can change over time.
– Changes can be to topology, edges, or vertices.
- For example, new edges between two vertices.
– Changes to edge or vertex weights.
- Streaming graphs:
– Graphs changing at high rates.
– Hundreds of thousands of updates per second.
Dynamic graph example
- Only a subset of the entire graph…
- Dynamic:
– At time $t$:
- $v$ and $w$ become friends.
- insert_edge($v$, $w$)
– At time $\hat{t}$:
- $u$ and $v$ are no longer friends.
- delete_edge($u$, $v$)
- Additional operations include vertex insertions and deletions
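The two updates in this example can be sketched on a toy adjacency structure. This is only an illustration in Python: the class `DynamicGraph` and its methods are hypothetical names, not the cuSTINGER API, and the graph is assumed to be undirected.

```python
from collections import defaultdict


class DynamicGraph:
    """Toy undirected dynamic graph (hypothetical, not the cuSTINGER API)."""

    def __init__(self):
        self.adj = defaultdict(set)

    def insert_edge(self, a, b):
        # Store the edge in both directions because the graph is undirected.
        self.adj[a].add(b)
        self.adj[b].add(a)

    def delete_edge(self, a, b):
        # discard() is a no-op if the edge is already absent.
        self.adj[a].discard(b)
        self.adj[b].discard(a)

    def has_edge(self, a, b):
        return b in self.adj[a]


g = DynamicGraph()
g.insert_edge("u", "v")   # friendship that exists before the updates
g.insert_edge("v", "w")   # time t: v and w become friends
g.delete_edge("u", "v")   # later time: u and v are no longer friends
```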
“Separation of powers”
- The dynamic graph data structure and the dynamic graph algorithms are in two different repositories
– Easy to integrate with an external library
– Can also be used with matrices
- The first part of today’s talk is on the dynamic data structure
Part 1 – Data Structure
cuSTINGER Version 2.0
- Improved initialization time
– Hundreds of times faster than Version 1.0
- New memory manager
– Reduces fragmentation
– Enables memory reclamation
– Offers good memory bounds
- Scalable data structure
– Can easily grow to 1000X its initial size without needing to be reinitialized
- Faster updates
Coming soon… (probably late May)
Restrictions of existing static graph representations
Name                        Pros                                         Cons
Adjacency matrix            Flexible                                     Limited utilization for sparse data
Linked lists                Flexible                                     Poor locality; allocation time is costly
COO (edge list), unsorted   Has some flexibility; updates are simple     Poor locality; stores both the source and destination
CSR                         Uses exact amount of memory; good locality   Inflexible
Compressed Sparse Row (CSR)
Pros:
- Uses precise storage requirements
- Great locality
– Good for GPUs
- Handful of arrays
– Simple to use and manage
Cons:
- Inflexible.
- Network growth unsupported
- Topology changes unsupported
- Property graphs not supported
[Figure: CSR example with Src/Row, Offset, Dest./Col., and Value arrays]
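A minimal Python sketch of why CSR is compact but inflexible. The function names `build_csr` and `insert_edge_csr` are made up for this illustration. Building the arrays is simple, but inserting one edge shifts every later destination and changes every later offset.

```python
def build_csr(num_vertices, edges):
    """Build CSR offset and destination arrays from (src, dst) pairs."""
    counts = [0] * num_vertices
    for src, _ in edges:
        counts[src] += 1
    # Prefix sum of the per-vertex edge counts gives the offsets.
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + counts[v]
    dests = [0] * len(edges)
    cursor = offsets[:-1].copy()
    for src, dst in edges:
        dests[cursor[src]] = dst
        cursor[src] += 1
    return offsets, dests


def insert_edge_csr(offsets, dests, src, dst):
    """Insert one edge; the cost is linear in the size of the whole graph."""
    dests.insert(offsets[src + 1], dst)    # shifts all later destinations
    for v in range(src + 1, len(offsets)):
        offsets[v] += 1                    # every later offset changes


offsets, dests = build_csr(3, [(0, 1), (0, 2), (1, 2), (2, 0)])
insert_edge_csr(offsets, dests, 1, 0)
```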
Part 1: cuSTINGER – A High Level View
- Supports updates
– Supports edge insertion and deletion.
– Supports vertex insertion and deletion.
[Figure: user-interface view of cuSTINGER: per-vertex arrays (Vertex Id, Used, BSize, Pointer) pointing to edge blocks (Dest./Col., Value) with over-allocated space]
cuSTINGER – Property Graph Support
[Figure: user-interface view with per-vertex arrays (Vertex Id, Used, BSize, Pointer) and per-edge fields: Dest./Col., Weight, Type, Time 1, User 1, User 2, …]
- These are optional fields
Challenges
- Memory allocations are costly.
- It may seem that this design needs $O(V)$ allocations
– Absolutely not.
– Our first implementation did this… ouch…
Memory Manager
Made up of three parts:
- 1. Vectorized Bit Trees
- 2. BlockArrays
- 3. B+Trees of BlockArrays
- THIS IS AN INTERNAL REPRESENTATION (HIDDEN FROM USERS)
BlockArrays
- Definition: an array made up of equal-size blocks.
- Each block can contain an equal number of edges.
[Figure: a BlockArray with 4 blocks; each block holds 2 edges]
cuSTINGER – BlockArray allocations
[Figure: user-interface arrays (Vertex Id, Used, BSize, Pointer) mapped onto BlockArrays of different block sizes, showing available space and over-allocated space]
- A relatively small number of BlockArrays is needed
– Exact number not known at compile time (or even at runtime given updates)
INTERNAL REPRESENTATION. HIDDEN FROM USERS
Memory Manager
Made up of three parts:
- 1. Vectorized Bit Trees
– Help determine which blocks are empty
– Key component for efficient memory reclamation
- 2. BlockArrays
- 3. B+Trees of BlockArrays
VecTrees
- Each block is either full (0) or empty (1).
[Figure: VecTree representation and its machine-word implementation for (a) a full BlockArray and (b) a partially empty BlockArray, with the next available position marked]
VecTrees Complexity
- Given a BlockArray with $BA$ blocks:
- Storage complexity is $O(BA)$ bits. In practice this is close to $2 \cdot BA$ bits
– Relatively small overhead.
- VecTree updates require $O(\log BA)$ operations
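The bounds above can be illustrated with a plain binary tree of bits. This is a hedged Python sketch, not the library's vectorized implementation: `BitTree` is a hypothetical name, one list entry stands in for one bit, and the number of blocks is assumed to be a power of two. Leaves follow the slide's convention (1 = empty, 0 = full), and each internal node is the OR of its children. The tree uses about twice as many bits as there are blocks, and each operation walks one root-to-leaf path.

```python
class BitTree:
    """Binary tree of bits over the blocks of one BlockArray (illustrative)."""

    def __init__(self, num_blocks):
        # Assumption: the number of blocks is a power of two.
        assert num_blocks > 0 and num_blocks & (num_blocks - 1) == 0
        self.n = num_blocks
        # Index 0 is unused, so 2n - 1 bits are in use; all blocks start empty.
        self.bits = [1] * (2 * num_blocks)

    def _set(self, block, value):
        i = self.n + block
        self.bits[i] = value
        i //= 2
        while i >= 1:  # refresh the ancestors: O(log n) steps
            self.bits[i] = self.bits[2 * i] | self.bits[2 * i + 1]
            i //= 2

    def allocate(self):
        """Mark an empty block as full and return its index (None if all full)."""
        if not self.bits[1]:
            return None
        i = 1
        while i < self.n:  # descend toward the leftmost empty leaf
            i = 2 * i if self.bits[2 * i] else 2 * i + 1
        self._set(i - self.n, 0)
        return i - self.n

    def free(self, block):
        """Reclaim a block by marking it empty again."""
        self._set(block, 1)


tree = BitTree(4)
first_three = [tree.allocate() for _ in range(3)]
tree.free(1)
reused = tree.allocate()
```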
Memory Manager
Made up of three parts:
- 1. Vectorized Bit Trees
- 2. BlockArrays
- 3. B+Trees of BlockArrays
– Responsible for deciding when more memory needs to be allocated
B+Trees of BlockArrays
- Each block size has its own tree.
- The KEY of each B+Tree is the number of empty blocks
[Figure: B+Tree array indexed by the log of the block size (entries 1, 2, ..., 31), with one B+Tree each for BlockArrays with 1, 2, and 4 edges per block; each B+ node refers to a BlockArray and its number of available blocks]
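A rough Python sketch of the bookkeeping this implies. The names `block_class` and `BlockArrayIndex` are hypothetical, a plain dictionary stands in for the per-size B+Tree, and power-of-two block sizes are an assumption suggested by the "log of block size" index. A request is mapped to a size class, served from an existing BlockArray that still has an empty block, and only triggers a new BlockArray when every existing one of that class is full.

```python
def block_class(num_edges):
    """Smallest k such that a block of 2**k edges can hold num_edges edges."""
    if num_edges <= 1:
        return 0
    return (num_edges - 1).bit_length()


class BlockArrayIndex:
    """Tracks the empty-block count of every BlockArray, per size class."""

    def __init__(self, blocks_per_array):
        self.blocks_per_array = blocks_per_array
        self.empty = {}  # size class -> list of empty-block counts

    def take_block(self, num_edges):
        """Return (size class, BlockArray index) for a block that fits."""
        k = block_class(num_edges)
        arrays = self.empty.setdefault(k, [])
        for i, free in enumerate(arrays):
            if free > 0:
                arrays[i] -= 1
                return k, i
        # All existing BlockArrays of this class are full: allocate a new one
        # and hand out its first block.
        arrays.append(self.blocks_per_array - 1)
        return k, len(arrays) - 1


index = BlockArrayIndex(blocks_per_array=2)
placements = [index.take_block(n) for n in (3, 4, 3, 1)]
```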
B+Trees of BlockArrays
- Currently supports adjacency lists with up to $2^{31}$ edges
- Can easily support up to $2^{63}$ edge blocks!!!
[Figure: the same B+Tree array, with B+ nodes for a BlockArray with 1 available block and a BlockArray with 0 available blocks]
B+Trees Properties
- A new BlockArray is allocated when all existing BlockArrays are full.
- Great for memory utilization.
cuSTINGER – Update Process for Edge Insertions
1. Count the number of insertions for each source.
2. Check edge availability for each source. If there are not enough edges:
   1. Memory manager: get a larger block that can store all edges.
   2. Copy existing edges from the old block to the new block.
   3. Memory manager: the old block is reclaimed.
3. Insert new edges (while avoiding duplicates).
[Figure: an update batch (Source, Destination, Value) applied to the per-vertex arrays (Vertex Id, Used, BSize, Pointer)]
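The three steps of the insertion process can be sketched sequentially in Python. This is a simplified stand-in, not the GPU implementation: `apply_batch` is a hypothetical name, Python lists play the role of edge blocks, and doubling the block size is an assumption. Because the count in step 1 includes duplicates, a block may end up larger than strictly needed.

```python
from collections import Counter


def apply_batch(adj, bsize, batch):
    """adj: vertex -> list of destinations; bsize: vertex -> block capacity."""
    # Step 1: count the insertions for each source.
    incoming = Counter(src for src, _ in batch)
    # Step 2: grow the block of any source that cannot hold the new edges.
    for src, extra in incoming.items():
        edges = adj.setdefault(src, [])
        size = bsize.setdefault(src, 1)
        while size < len(edges) + extra:
            size *= 2
        if size != bsize[src]:
            adj[src] = list(edges)   # copy old block into the larger block
            bsize[src] = size        # the old block would be reclaimed here
    # Step 3: insert the new edges while avoiding duplicates.
    for src, dst in batch:
        if dst not in adj[src]:
            adj[src].append(dst)


adj = {1: [2, 5]}
bsize = {1: 2}
apply_batch(adj, bsize, [(1, 4), (1, 5), (3, 7)])
```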
cuSTINGER – Full View (after update)
[Figure: full view after the update, showing the user-interface arrays (Vertex Id, Used, BSize, Pointer), the BlockArrays, the update batch (Source, Destination, Value), and the VecTree bit status; the internal representation is hidden from users]
cuSTINGER – Go Home with this View
[Figure: user-interface view after the update: per-vertex arrays (Vertex Id, Used, BSize, Pointer), edge blocks (Dest./Col., Value), and the update batch (Source, Destination, Value)]
cuSTINGER – Data Structure
- https://github.com/cuStinger/cuStinger
- Build instructions
– git clone --recursive https://github.com/cuStinger/cuStinger.git
– mkdir build && cd build
– cmake ..
– make -j8
Performance Analysis
- Initialization Overhead
- Memory Utilization
- Update rate
– Number of sustainable updates per second
- CSR Vs. cuSTINGER
– SpMV
Experimental Setup
- For the K80, we use only one of its GPUs
- We report results for the K80 unless noted otherwise
GPU    μArch    SMs    SPs      Memory (GB)   Memory Type
K40    Kepler   15     2880     12            GDDR5
K80    Kepler   2x13   2x2496   2x12          GDDR5
P100   Pascal   56     3584     16            HBM2
Input Graphs
- DIMACS 10 Graph Implementation Challenge
- SNAP – Stanford Network Analysis Project
- Florida Matrix Collection
The following is only a subset of these graphs:
Name            Type            |V|      |E|*    Source
coAuthorsDBLP   Collaboration   299k     1.95M   DIMACS
as-skitter      Trace route     1.69M    11.1M   SNAP
kron_21         Random          2M       201M    DIMACS
cit-patents     Citation        3.77M    16.5M   SNAP
cage15          Matrix          5.15M    94M     DIMACS
uk-2002         Webcrawl        18.52M   523M    DIMACS
Memory Utilization – Edges
- $\text{Utilization} = \frac{\text{Used}}{\text{Allocated}}$
- 70% average utilization
[Figure: Edge Utilization; axis: Space Efficiency, 0% to 100%]
Memory Utilization – Blocks
- $\text{Utilization} = \frac{\text{Used}}{\text{Allocated}}$
- 90% average utilization
[Figure: Block Utilization; axis: Space Efficiency, 0% to 100%]
Memory Utilization – Overall
- 70% average utilization
- 30% overhead in comparison to CSR
[Figure: Overall Utilization; axis: Space Efficiency, 0% to 100%]
Initialization Time
- Copying from the CPU to GPU is costly
- Initializing cuSTINGER does not add much overhead
- We still need to optimize this process for P100
- Over 100X faster than cuSTINGER Version 1.0
[Figure: Overhead vs. CSR Memcpy; series: CSR Copy Host to Device, Initialization on Device]
Insertion Rates
- Supports up to 90M updates per second
- Version 2.0:
– Is 4X–10X faster than Version 1.0
– Does not have the performance dip of Version 1.0
- Scalable growth in update rate
[Figure: insertion rates for Version 1.0 and Version 2.0]
SpMV: CSR Vs. cuSTINGER
- Simply replace CSR accesses with cuSTINGER
– A real “apples to apples” comparison
[Figure: SpMV performance in MFLOPS (log scale), CSR vs. cuSTINGER]
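What "replacing CSR accesses" amounts to can be sketched in Python. This is illustrative only: the function names and the padded-block layout are assumptions, not the library's interface. The loop structure is identical in both versions; only the way a row's nonzeros are located changes, from an offset range to a per-row block plus a used count.

```python
def spmv_csr(offsets, cols, vals, x):
    """y = A * x with A stored in CSR form."""
    y = [0.0] * (len(offsets) - 1)
    for row in range(len(y)):
        for k in range(offsets[row], offsets[row + 1]):
            y[row] += vals[k] * x[cols[k]]
    return y


def spmv_blocks(used, blocks, x):
    """y = A * x with one over-allocated block of (col, val) pairs per row.

    Only the first used[row] entries of blocks[row] are valid.
    """
    y = [0.0] * len(used)
    for row in range(len(used)):
        for col, val in blocks[row][:used[row]]:
            y[row] += val * x[col]
    return y


# A = [[0, 2, 0], [1, 0, 3], [0, 0, 4]] and x = [1, 2, 3]
x = [1.0, 2.0, 3.0]
y_csr = spmv_csr([0, 1, 3, 4], [1, 0, 2, 2], [2.0, 1.0, 3.0, 4.0], x)
y_blocks = spmv_blocks(
    [1, 2, 1],
    [[(1, 2.0)], [(0, 1.0), (2, 3.0)], [(2, 4.0), (0, 0.0)]],  # last pair is padding
    x,
)
```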
Part 2: Algorithms for cuSTINGER
- Goal: support static and dynamic graph algorithms
- We already showed that the graph update process is efficient
- All algorithms are implemented using the same set of operations
– We show that these operators are efficient for static graph algorithms
cuSTINGER Programming Model
- “Keep things simple”
– Limit the amount of GPU programming users need to do.
- Uses vertex and edge frontiers
– Similar to Gunrock and Ligra
– Necessary for good utilization
– Still requires good load balancing
– Edge frontiers are created implicitly from vertex frontiers.
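A sequential Python sketch of the frontier idea, using top-down BFS as the example. It does not use the library's operators, and `bfs_top_down` is a hypothetical name. The program keeps only a vertex frontier; the edge frontier is never materialized and appears implicitly when the neighbors of each frontier vertex are visited.

```python
def bfs_top_down(adj, source):
    """Return the BFS distance from source for every reachable vertex."""
    dist = {source: 0}
    frontier = [source]                  # vertex frontier
    while frontier:
        next_frontier = []
        for v in frontier:
            for u in adj.get(v, []):     # implicit edge frontier
                if u not in dist:
                    dist[u] = dist[v] + 1
                    next_frontier.append(u)
        frontier = next_frontier
    return dist


distances = bfs_top_down({0: [1, 2], 1: [3], 2: [3], 3: []}, 0)
```

On a GPU, the two inner loops are where load balancing matters, since frontier vertices can have very different degrees.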
Vertex and Edge Frontiers
- Ligra
– CPU HPC graph framework by Julian Shun
– Two backends: Cilk and OpenMP
– Edge frontiers created implicitly from vertex frontiers
- Typically one phase
- Gunrock
– Highly tuned GPU graph library from Prof. John Owens
– Supports multi-GPU analytics (same shared node)
– Each operation consists of two phases
– Edge frontiers created explicitly by the programmer.
Case Study 1: Label Propagation Connected Components
- Connected component algorithms such as Shiloach-Vishkin
- Initially, every vertex is in its own component
- Vertices move to the component with the smallest ID
– Vertices can swap components multiple times
One Iteration of Connected Components Algorithm
// Label propagation: traverse all edges, $O(E)$
1) For all $v \in V$
2)   For all $u \in adj(v)$
3)     if $CC[u] < CC[v]$
4)       $CC[v] \leftarrow CC[u]$
// Shortcutting: traverse all vertices, $O(V)$
5) For all $v \in V$
6)   $CC[v] \leftarrow CC[CC[v]]$
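A sequential Python sketch of the full algorithm, repeating the iteration above until no label changes. The outer loop and the name `connected_components` are additions for this illustration, and each vertex adopts the smaller label of its neighbor, matching the "smallest ID" rule.

```python
def connected_components(num_vertices, adj):
    """adj[v] lists the neighbors of v in an undirected graph."""
    cc = list(range(num_vertices))       # every vertex starts in its own component
    changed = True
    while changed:
        changed = False
        # Label propagation: traverse all edges.
        for v in range(num_vertices):
            for u in adj[v]:
                if cc[u] < cc[v]:
                    cc[v] = cc[u]
                    changed = True
        # Shortcutting: traverse all vertices.
        for v in range(num_vertices):
            cc[v] = cc[cc[v]]
    return cc


# Two components: {0, 1, 2} and {3, 4}
labels = connected_components(5, [[1], [0, 2], [1], [4], [3]])
```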
Revisiting the algorithm in parallel*
- Notice: no triple angle brackets (<<<>>>), so no explicit CUDA kernel launches in user code
* We have more optimized versions in the library (they require about 4 more lines of code…)
Case Study 2: Katz Centrality
- Given pseudocode for a variant of Katz Centrality
- It took 5-6 hours to port from the CPU to the GPU.
– NVIDIA P100 GPU, initial speedup: 70X
– The CPU version had some additional optimizations. Within an additional two hours, the speedup was 100X
- This was feasible because of the pre-existing load-balanced primitives in the library
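The slides do not show the variant that was ported, so the following Python sketch is only a generic walk-counting formulation of Katz centrality, truncated after a fixed number of iterations. The function name and parameters are hypothetical. Each vertex's score is the sum over k of alpha^k times the number of walks of length k that reach it, computed with the same "visit every vertex, then every neighbor" pattern used in the other examples.

```python
def katz_centrality(adj, alpha, num_iterations):
    """adj[v] lists the neighbors of v; alpha is the attenuation factor."""
    n = len(adj)
    walks = [1] * n          # walks of length 0: one per vertex
    score = [0.0] * n
    weight = 1.0
    for _ in range(num_iterations):
        # Walks of length k ending at v extend walks of length k-1 at its neighbors.
        walks = [sum(walks[u] for u in adj[v]) for v in range(n)]
        weight *= alpha
        for v in range(n):
            score[v] += weight * walks[v]
    return score


# Path graph 0 - 1 - 2
scores = katz_centrality([[1], [0, 2], [1]], alpha=0.5, num_iterations=2)
```

For the untruncated sum to converge, alpha must be smaller than the reciprocal of the largest eigenvalue of the adjacency matrix; the value used here is only for demonstration.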
cuSTINGER Algorithms
- https://github.com/cuStinger/cuStingerAlg
- Build instructions
– git clone --recursive https://github.com/cuStinger/cuStingerAlg.git
– mkdir build && cd build
– cmake ..
– make -j8
- By default, cuSTINGER will also be cloned
– Though you will need to build both repos
cuSTINGERv2 Algorithms
- https://github.com/cuStinger/cuStingerAlg
- Build instructions
– git clone --recursive https://github.com/cuStinger/cuStingerAlg.git
– cd cuStingerAlg/build
– cmake ..
– make -j
Performance Analysis
- Connected Components
- Breadth First Search
- Triangle Counting
– Static
– Dynamic
Connected Components – NVIDIA P100
- Using label propagation
- Within 25% for most cases.
Name            Gunrock (msec.)   cuSTINGER (msec.)   Speedup
coAuthorsDBLP   1.72              2.17                0.79X
as-skitter      3.68              17.4                0.21X
kron_21         86.84             66.4                1.33X
cit-patents     38.85             40.8                0.95X
cage15          46.1              56.1                0.82X
uk-2002         407               489                 0.83X
BFS – Classic Top-Down – NVIDIA P100
- Using an algorithm similar to the one in Gunrock
– Gunrock has additional optimizations that can make it faster than cuSTINGER
Name            Gunrock (msec.)   cuSTINGER (msec.)   Speedup
coAuthorsDBLP   2.74              2.44                1.12X
as-skitter      7.74              10.6                0.73X
kron_21         45.4              25.7                1.76X
cit-patents     16.5              23.3                0.71X
cage15          29.1              43.2                0.67X
uk-2002         39.9              81.6                0.49X
Triangle Counting: CSR Vs. cuSTINGER
- Triangle counting algorithm taken from [Green et al.; IA3; 2014]
- Simply replace CSR accesses with cuSTINGER
- Executed on a K40
Name            |V|      |E|     Time CSR (sec.)   Time cuSTINGER (sec.)   Execution Difference
coAuthorsDBLP   299k     1.95M   0.218             0.242                   +10%
as-skitter      1.69M    11.1M   57.14             59.37                   +3.8%
kron_21         2M       201M    2992              2996                    +0.14%
cit-patents     3.77M    16.5M   0.814             0.830                   +2%
cage15          5.15M    94M     6.544             7.204                   +10%
uk-2002         18.52M   523M    424.9             431.4                   +1.6%
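For context, a generic intersection-based triangle count can be sketched in Python, together with the incremental update that a dynamic version relies on. This is not the cited algorithm, and both function names are hypothetical. The static count intersects the adjacency sets of each edge's endpoints. After an edge insertion, the number of new triangles is the number of common neighbors of the two endpoints.

```python
def count_triangles(adj):
    """adj: vertex -> set of neighbors (undirected). Counts each triangle once."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Only count the third vertex when it is the largest of the three.
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


def insert_edge_and_update(adj, u, v, triangles):
    """Insert edge (u, v) and return the updated triangle count."""
    if v in adj[u]:
        return triangles                  # duplicate edge, nothing changes
    triangles += len(adj[u] & adj[v])     # each common neighbor closes a triangle
    adj[u].add(v)
    adj[v].add(u)
    return triangles


graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
static_count = count_triangles(graph)
dynamic_count = insert_edge_and_update(graph, 1, 3, static_count)
```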
Summary
Library Overview
Completed and ongoing algorithms. Of course, many more algorithms to come…
Algorithm                Static   Dynamic   Reference
Breadth-first search     ✓
Triangle counting        ✓        ✓         Static: [Green et al.; IA3 2014]; Dynamic: new algorithm [Makkar; 2017, submitted]
Connected components     ✓        Ongoing   [McColl; HiPC 2013]
Betweenness centrality   ✓        Ongoing   [Green; SocialCom 2012]
PageRank                 ✓        Ongoing   New algorithm (non-linear-algebra based)
Katz centrality          ✓        ✓         New algorithm (non-linear-algebra based)
Upcoming projects using cuSTINGER
- Extend dynamic triangle counting to Jaccard indices
- Scalable pattern and motif detection on the GPU
Take away
- Dynamic data structure for sparse data sets
- Supports high update rates
- Scalable in both data size and in performance
Collaborators
- Prof. David Bader (Georgia Tech)
- Prof. Jimeng Sun (Georgia Tech)
- Dr. Jason Riedy (Georgia Tech)
- Federico Busato, Visiting PhD student (Universita di Verona)
- James Fox, PhD student (Georgia Tech)
- Euna Kim, PhD student (Georgia Tech)
- Muhammad Osama Sakhi, BSc student (Georgia Tech)
- Alok Tripathy, BSc student (Georgia Tech)
- Manas George, BSc student (Georgia Tech)
- Graduates (GT)
– Devavret Makkar, MSc. (Tower Research)
Thank you
- Email: ogreen@gatech.edu
- Data structure:
– https://github.com/cuStinger/cuStinger
- Algorithms:
– https://github.com/cuStinger/cuStingerAlg
- Version 2.0, coming soon to a GPU near you