SLIDE 1

Fast and Scalable Subgraph Isomorphism using Dynamic Graph Techniques

James Fox

SLIDE 2

Collaborators

  • Oded Green, Research Scientist (GT)
  • Euna Kim, PhD student (GT)
  • Federico Busato, PhD student (Università di Verona)
  • Dr. Nicola Bombieri (Università di Verona)
  • Kartik Lakhotia, PhD student (USC)
  • Shijie Zhou, PhD student (USC)
  • Shreyas Singapura, PhD student (USC)
  • Hanqing Zeng, PhD student (USC)
  • Dr. Rajgopal Kannan (USC)
  • Prof. Viktor Prasanna (USC)
  • Prof. David Bader (GT)

Quickly Finding a Truss in Haystack

SLIDE 3

Outline

  • K-Truss

– Introduction
– Sequential Approaches

  • Our new algorithm

– Dynamic Triangle Counting
– Hornet: data structure for dynamic graphs

  • Performance Analysis

SLIDE 4

K-Truss

  • Definition: for a given $k$, the $k$-truss is a subgraph in which each edge closes at least $k - 2$ triangles, i.e. has a “support” of at least $k - 2$

  • A well-connected subgraph

– “Relaxation of k-clique, stricter than k-core” [Cohen; 2008]
– Computationally efficient to find

  • Maximal k-truss: focus of our work
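
The definition above can be checked directly on a small graph. The following is a minimal sketch, not code from the talk: the names `edge_support` and `is_k_truss` are made up for illustration, and the graph is assumed to be simple and undirected, given as a list of vertex pairs.

```python
from itertools import combinations

def edge_support(edges):
    """Return {edge: number of triangles the edge closes} for an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # The support of (u, v) is the number of common neighbors of u and v.
    return {(u, v): len(adj[u] & adj[v]) for u, v in edges}

def is_k_truss(edges, k):
    """True if every edge closes at least k - 2 triangles."""
    return all(s >= k - 2 for s in edge_support(edges).values())

# A 4-clique: every edge lies in exactly 2 triangles,
# so it is a 4-truss but not a 5-truss.
clique4 = list(combinations(range(4), 2))
```

The 4-clique illustrates the "relaxation of k-clique" remark: a k-clique always satisfies the k-truss condition, but a k-truss need not be a clique.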

SLIDE 5

Example

Figure: example graph on vertices 1 to 7 with per-edge triangle counts, shown with its K=3 truss and its K=4 truss.

SLIDE 6

Over 1000x faster

Graph Challenge Innovation Award (HPEC’17)

Three main factors

  • Algorithmic optimization

– 1. Uses a dynamic graph data structure
– 2. Novel algorithm for dynamically updating triangle counts

  • Parallelization
  • Programming model

– Vertex-centric is more efficient than linear algebra

SLIDE 7

Simple Vertex Centric

$k \leftarrow 3$
while $E \neq \emptyset$
    repeat until no more changes
        for $e = (u, v) \in E$
            if $|\mathrm{adj}(u) \cap \mathrm{adj}(v)| < k - 2$
                delete $e$ from $E$
    $k \leftarrow k + 1$
$k \leftarrow k - 1$
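
As a rough illustration of the loop above, here is a minimal Python sketch. It is an assumption-laden rendering rather than the authors' code: `max_truss_simple` is a hypothetical name, supports are recomputed from scratch on every pass, and the function tracks the largest $k$ for which edges survive explicitly instead of relying on the final decrement.

```python
def max_truss_simple(edges):
    """Return the largest k whose k-truss is non-empty (2 if no edge lies in a triangle)."""
    E = {tuple(sorted(e)) for e in edges}
    k, best = 3, 2
    while E:
        while True:  # repeat until no more changes
            # Rebuild adjacency sets from the surviving edges.
            adj = {}
            for u, v in E:
                adj.setdefault(u, set()).add(v)
                adj.setdefault(v, set()).add(u)
            # Edges whose support is below k - 2 cannot be in the k-truss.
            weak = {(u, v) for u, v in E if len(adj[u] & adj[v]) < k - 2}
            if not weak:
                break
            E -= weak
        if E:
            best = k  # some edges survived peeling at this k
        k += 1
    return best
```

Recomputing every intersection after each round of deletions is what makes this version simple but slow.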

SLIDE 8

Linear Algebra Formulation

  • Given k
  • Bold letters refer to vectors and matrices

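$\mathbf{R} = \mathbf{E}\mathbf{A}$
$\mathbf{x} = \mathrm{find}\big((\mathbf{R} == 2) \cdot \mathbf{1} < k - 2\big)$
while $\mathbf{x}$
    $\mathbf{E}_x = \mathbf{E}(\mathbf{x}, :)$
    $\mathbf{E} = \mathbf{E}(\mathbf{x}_c, :)$
    $\mathbf{R} = \mathbf{E}(\mathbf{x}_c, :)\,\mathbf{A}$
    $\mathbf{R} = \mathbf{R} - \mathbf{E}\big[\mathbf{E}_x^T \mathbf{E}_x - \mathrm{diag}(\mathbf{E}_x^T \mathbf{E}_x)\big]$
    $\mathbf{x} = \mathrm{find}\big((\mathbf{R} == 2) \cdot \mathbf{1} < k - 2\big)$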
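
To see why the product of the incidence and adjacency matrices yields supports, the first two lines of the formulation can be sketched in plain Python. This is an illustrative sketch only: it assumes $\mathbf{E}$ is the unoriented edge-by-vertex incidence matrix and $\mathbf{A}$ the adjacency matrix, uses dense nested lists, and covers just the initial computation of $\mathbf{R}$ and $\mathbf{x}$, not the incremental update in the loop. The helper names are hypothetical.

```python
def incidence_and_adjacency(edges, n):
    """Build the edge-by-vertex incidence matrix E and the adjacency matrix A."""
    E = [[0] * n for _ in edges]
    A = [[0] * n for _ in range(n)]
    for i, (u, v) in enumerate(edges):
        E[i][u] = E[i][v] = 1
        A[u][v] = A[v][u] = 1
    return E, A

def matmul(X, Y):
    """Dense matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def weak_edges(edges, n, k):
    """Indices of edges whose support is below k - 2 (the vector x)."""
    E, A = incidence_and_adjacency(edges, n)
    R = matmul(E, A)
    # R[e][w] == 2 exactly when w is adjacent to both endpoints of edge e,
    # so counting the 2s in a row gives that edge's support.
    support = [sum(1 for r in row if r == 2) for row in R]
    return [i for i, s in enumerate(support) if s < k - 2]
```

For the triangle 0-1-2 with a pendant edge (2, 3), `weak_edges([(0, 1), (1, 2), (0, 2), (2, 3)], 4, 3)` flags only the pendant edge.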

SLIDE 9

New Algorithm for finding Maximal Truss

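for $e = (u, v) \in E$
    $w(e) \leftarrow |\mathrm{adj}(u) \cap \mathrm{adj}(v)|$
$k \leftarrow 3$
while $E \neq \emptyset$
    repeat until no more changes
        $\mathit{list} \leftarrow \emptyset$
        for $e = (u, v) \in E$
            if $|\mathrm{adj}(u) \cap \mathrm{adj}(v)| < k - 2$
                append($\mathit{list}$, $e$)
        $G_{RST} \leftarrow$ CreateGraph($\mathit{list}$)
        removeEdges($G$, $G_{RST}$)
        UpdateTriangleCount($G$, $G_{RST}$)
    $k \leftarrow k + 1$
$k \leftarrow k - 1$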


✓ - parallel (this annotation appears five times on the slide, next to steps of the algorithm)
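
The new algorithm can be sketched end to end in a few dozen lines. What follows is a sequential, set-based illustration under several assumptions, not the parallel Hornet implementation: `max_truss_delta` is a hypothetical name, vertices are assumed to be comparable (e.g. integers), the graph is simple and undirected, and the `a < x` test is one possible way to avoid double-counting.

```python
def max_truss_delta(edges):
    """Maximal k-truss using incremental triangle-count updates (sequential sketch)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Initial triangle count per edge, computed only once.
    w = {frozenset((u, v)): len(adj[u] & adj[v]) for u, v in edges}
    k, best = 3, 2
    while w:
        while True:  # repeat until no more changes
            removed = [e for e, s in w.items() if s < k - 2]
            if not removed:
                break
            # CreateGraph(list): adjacency of the deleted edges only.
            radj = {}
            for e in removed:
                u, v = tuple(e)
                radj.setdefault(u, set()).add(v)
                radj.setdefault(v, set()).add(u)
            # removeEdges(G, G_RST)
            for e in removed:
                u, v = tuple(e)
                adj[u].discard(v)
                adj[v].discard(u)
                del w[e]
            # UpdateTriangleCount(G, G_RST)
            for e in removed:
                u, v = tuple(e)
                # One edge removed: x is still adjacent to both endpoints in G.
                for x in adj[u] & adj[v]:
                    w[frozenset((u, x))] -= 1
                    w[frozenset((v, x))] -= 1
                # Two edges removed: (a, x) survives in G, (b, x) is in G_RST.
                for a, b in ((u, v), (v, u)):
                    for x in adj[a] & radj[b]:
                        if a < x:  # such a triangle is found twice; count it once
                            w[frozenset((a, x))] -= 1
        if w:
            best = k  # some edges survived peeling at this k
        k += 1
    return best
```

Triangles whose three edges are all removed never appear in either intersection, so they need no special handling here.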

SLIDE 10

$G_{RST} \leftarrow$ CreateGraph($\mathit{list}$)

  • We will create a graph from all the deleted edges

  • Adjacencies will be sorted
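
A minimal sketch of this step follows. The function names are hypothetical, and the use of a two-pointer merge is an assumption about why sorted adjacencies are useful; the slides only state that the adjacencies will be sorted.

```python
def create_graph(edge_list):
    """Build an undirected adjacency structure with sorted neighbor lists."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return {u: sorted(set(nbrs)) for u, nbrs in adj.items()}

def intersect_sorted(a, b):
    """Two-pointer intersection of two sorted lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out
```

With sorted lists, an intersection never needs a hash table or a nested scan, which is one plausible reason to pay for the sort up front.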

Figure: the example graph on vertices 1 to 7 split into $G_{RST}$ (the deleted edges) and $G$ (the remaining edges), with per-edge triangle counts.

SLIDE 11

UpdateTriangleCount($G$, $G_{RST}$)

  • Must update counts of non-removed edges
  • Don’t want to re-compute globally

Figure: the example graph after deletion (incorrect triangle counts) next to the same graph with updated triangle counts.

SLIDE 12

Three “types” of triangles affected

  • 1. One edge removed
  • 2. Two edges removed
  • 3. All three edges removed

[Makkar; HiPC’17]

Figure: three triangles on vertices u, v, w, one per case.

SLIDE 13

One edge removed

  • $(u, v)$ deleted
  • By intersecting the list of $u$ with the list of $v$ we can find all common neighbors

– Decrement support by 1

  • For all $e = (u, v) \in G_{RST}$

– Intersect($u$, $G$, $v$, $G$)
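
The one-edge case can be sketched as follows. This is an illustrative sketch with hypothetical names: `adj` is assumed to hold the surviving graph $G$ as neighbor sets (with the deleted edges already taken out), and `support` maps each surviving edge, stored as a `frozenset`, to its current count.

```python
def update_one_edge_removed(adj, support, removed):
    """Decrement supports for triangles that lost exactly one edge."""
    for u, v in removed:
        # Every common neighbor x of u and v in G closed a triangle with (u, v);
        # its two surviving edges each lose one triangle.
        for x in adj.get(u, set()) & adj.get(v, set()):
            support[frozenset((u, x))] -= 1
            support[frozenset((v, x))] -= 1
```

For a 4-clique on vertices 0 to 3 with edge (0, 1) deleted, the four edges touching 0 or 1 drop from 2 to 1, while edge (2, 3) keeps its count of 2.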

SLIDE 14

Two edges removed

  • $(u, v)$ and $(u, w)$ deleted
  • Intersecting the adjacencies like before won’t work.
  • Instead we will intersect adjacencies from the two graphs: $G$ and $G_{RST}$
  • For all $e = (u, v) \in G_{RST}$

– Intersect($u$, $G$, $v$, $G_{RST}$)

  • Can handle double-counting
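
One way to realize the two-graph intersection, including the double-counting issue, is sketched below. The names are hypothetical and the tie-break `a < x` is an assumed strategy, not necessarily the one used in the talk: `adj` holds the surviving graph $G$, `radj` holds $G_{RST}$, both as neighbor sets, and vertices are assumed to be comparable.

```python
def update_two_edges_removed(adj, radj, support, removed):
    """Decrement supports for triangles that lost exactly two edges."""
    for u, v in removed:
        # Try both orientations of the deleted edge.
        for a, b in ((u, v), (v, u)):
            # x is adjacent to a in G and to b in G_RST, so (a, x) survives
            # while (a, b) and (b, x) were both deleted.
            for x in adj.get(a, set()) & radj.get(b, set()):
                # The same triangle is also found from deleted edge (b, x),
                # with the roles of a and x swapped; keep only one of the two.
                if a < x:
                    support[frozenset((a, x))] -= 1
```

In a triangle 0-1-2 where (0, 1) and (1, 2) are deleted, the surviving edge (0, 2) is reached from both deleted edges but decremented only once.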

SLIDE 15

Three edges removed

  • $(u, v)$, $(u, w)$, $(w, v)$ deleted
  • No need to update supports!

SLIDE 16

So what else do we need?


  • We need a dynamic graph data structure
  • These data structures don’t cut it

Names              Dense Adjacency Matrix    Linked lists    COO (Edge list)    CSR/CSC
Good Locality      ❌                         ❌               ❌                  ✓
Flexible Updates   ✓                         ✓               ❌                  ❌

SLIDE 17

Hornet…

  • Supports updates

– Supports edge insertion and deletion.
– Supports vertex insertion and deletion.

  • Good locality

– Edge list contiguous

  • Efficient memory manager

– Memory reclamation
– Hidden from user

  • Framework
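
The combination of contiguous edge lists and cheap updates can be illustrated with a toy structure. This sketch is not Hornet's actual code or API: `BlockAdjacency` and its methods are hypothetical, block capacities are assumed to double when full, and block shrinking and memory reclamation are left out.

```python
class BlockAdjacency:
    """Toy per-vertex edge blocks with power-of-two capacity."""

    def __init__(self):
        self.used = {}   # vertex -> number of valid entries in its block
        self.block = {}  # vertex -> preallocated list; its length is the block size

    def insert_edge(self, u, v):
        used = self.used.get(u, 0)
        blk = self.block.setdefault(u, [None])
        if used == len(blk):  # block is full: move to one twice as large
            blk = blk + [None] * len(blk)
            self.block[u] = blk
        blk[used] = v
        self.used[u] = used + 1

    def delete_edge(self, u, v):
        used, blk = self.used[u], self.block[u]
        i = blk.index(v, 0, used)  # raises ValueError if the edge is absent
        blk[i] = blk[used - 1]     # fill the hole with the last valid entry
        blk[used - 1] = None
        self.used[u] = used - 1

    def neighbors(self, u):
        return self.block.get(u, [])[: self.used.get(u, 0)]
```

After three insertions for one vertex, the block holds 3 used entries in a block of size 4, leaving one over-allocated slot for the next insertion.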

Figure: Hornet layout. A per-vertex table (Vertex Id, Used, BSize, Pointer) points to edge blocks holding Dest./Col. and Value entries. Labels on the figure: USER-INTERFACE, Over-allocated space. In the example, vertices 1 to 7 have Used = 2, 2, 3, 2, 2, 2, 1 and BSize = 2, 2, 4, 2, 2, 2, 1.

SLIDE 18

Experimental Setup - CPU

Intel Dual Processor

  • Intel Xeon E5-2695
  • 16 cores per processor (32 in total)

– 64 threads with Hyperthreading

  • 45MB LLC
  • 1TB of DDR4

SLIDE 19

Experimental Setup - GPU

Single Pascal P100

  • 56 processors (SMs)
  • 64 threads per processor (SPs)
  • 3584 hardware threads
  • 16GB of HBM2

– 720 GB/s bandwidth

SLIDE 20

Input Graphs

  • HPEC Graph Challenge
  • SNAP – Stanford Network Analysis Project

The following is only a subset of these graphs:

Name               Network Type     |V|      |E|*
cit-HepPh          Citation         35k      421k
amazon0601         Co-purchasing    400k     2.4M
roadNet-PA         Road             1M       1.5M
as-skitter         Trace route      1.69M    11.1M
graph500-scale21   Random           2.1M     34M

*largest: |E| = 134M

SLIDE 21

Benchmarks

  • 1. Graph Challenge

– 1. Julia
– 2. Python
– 3. Matlab/Octave

  • 2. Our algorithms

– 1. Iterative - uses static triangle counting
– 2. Delta - uses new algorithm

SLIDE 22

Finding the Maximal Truss

Figure annotations:

  • Time out: 8 hours
  • Usually 200X-500X faster
  • Many times over 2000X faster
  • Sometimes 10,000X faster

SLIDE 23

Execution time per iteration

SLIDE 24

Future Work

  • We still think that we can improve by another 10X…
  • New triangle counting kernel

– Balanced and imbalanced intersections
– Improved warp utilization

SLIDE 25

Summary

  • New algorithm for finding the maximal K-Truss
  • Given a static input we use techniques from dynamic graph algorithms
  • Hundreds to thousands of times faster than the benchmarks
  • We still think that we can improve by another 10X…

SLIDE 26

Thank you


  • Email: jfox43@gatech.edu
SLIDE 27

Backup Slides

SLIDE 28

Wang & Cheng; 2012

  • Modified version of Cohen’s algorithm
  • Sorts the edges based on their support

– In each iteration, edges with a support smaller than $k - 2$ are removed

  • Inherently sequential (due to update process)
  • Yet, significantly faster than Cohen’s algorithm

  • Uses hash maps for intersections

SLIDE 29

Hornet Data Layout

  • A scalable and dynamic data structure for graph algorithms and linear algebra based problems
  • Can support up to 90 million updates per second
  • Low overhead in comparison with CSR

– Initialization is also relatively inexpensive: 20%-200%
– Equal performance

  • Simple to use
  • Implemented for CUDA, yet portable to other architectures

cuSTINGER paper: [Green&Bader; HPEC, 2016]: cuSTINGER: Supporting dynamic graph algorithms for GPUs

SLIDE 30

Hornet – Property Graph Support

Figure: the Hornet layout (per-vertex table with Vertex Id, Used, BSize, Pointer; label USER-INTERFACE), with each edge entry holding Dest./Col. plus the property fields Weight, Type, Time, User 1, User 2, ….

  • These are optional fields
SLIDE 31

Finding the Maximal Truss

  • Note log-log scale
  • Algorithm execution was stopped after 8 hours.
  • Typically Octave was the fastest of the Graph Challenge benchmarks
  • Our new algorithm is usually 500X faster than Octave
  • In many cases, our new algorithm is over 2000X faster than Python
  • Julia is the slowest

SLIDE 32

Finding Trusses of K=4

SLIDE 33

Finding Trusses of K=4

  • Similar performance results to those for the maximal truss
  • Note log-log scale
  • Algorithm execution was stopped after 8 hours.

SLIDE 34

Comparison with [Wang&Cheng; VLDB’12]

                 Amazon    Wiki    As-Skitter    Live-Journal
Wang&Cheng       31        121     281           664
New-iterative    0.43      9.07    57.1          258
Speedup          72        13      5             2.57

  • All times in seconds
  • Finds maximal trusses
  • Code is not open source
  • Wang&Cheng used SNAP graphs.
SLIDE 35

Comparison with Graphulo

                 S10      S11      S12      S13      S14      S15      S16
Graphulo         1.63     3.93     12.1     37.2     110      3290     8770
New-iterative    0.003    0.007    0.016    0.042    0.106    0.352    1.18
Speedup          518      595      741      883      1041     9330     7847

  • All times in seconds
  • Graphulo results taken from Graphulo [Hutchison et al; HPEC’15]

  • Find Trusses of K=4
  • Compare only synthetic graphs (RMAT)

– S10 means a graph with $2^{10}$ vertices.