SLIDE 1

Scalable K-Core Decomposition for Static Graphs Using a Dynamic Graph Data Structure

Alok Tripathy

Alok Tripathy, GTC 2019

SLIDE 3

What I’ll Show

  • Maximal k-core algorithm
    – Up to 4× faster than previous research
    – Up to 58× faster than popular graph libraries
  • k-core edge decomposition algorithm
    – Up to 8× faster than previous research
    – Up to 129× faster than popular graph libraries
    – Uses dynamic graph operations

SLIDE 4

Takeaways

  • Algorithms on static graphs can use dynamic graph operations efficiently with the GPU.
  • Dynamic graph operations can be computed on a GPU efficiently.
    – Check out the Hornet data structure!
    – https://github.com/hornet-gt/hornet

SLIDE 6

Motivation

  • Two types of graphs
    – Static graphs that don’t change
    – Dynamic graphs that change frequently
      • Edge/vertex insertions/deletions
      • e.g. Facebook, road networks
  • Algorithms on static graphs can benefit from dynamic graph operations

SLIDE 9

Dynamic Operations on Static Graphs

  • k-truss problem
    – Subgraph where all edges belong to at least k − 2 triangles
    – Can be extended to maximal k-truss
    – Applications: community detection, anomaly detection

[Figure: example k-truss with k = 4]

SLIDE 10

k-truss Algorithm

E_del = all edges in fewer than k − 2 triangles
while |E_del| > 0
    delete E_del from G
    update triangles in G
    E_del = all edges in fewer than k − 2 triangles
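The k-truss loop on this slide can be sketched in Python as follows. This is a minimal sequential sketch, not the GPU implementation from the talk: the function name `k_truss` and the dictionary-of-sets adjacency are assumptions made for illustration, vertices are assumed to be comparable (for example integers), and triangle counts are recomputed from the current adjacency instead of being updated incrementally.

```python
def k_truss(edges, k):
    """Return the edge set of the k-truss of an undirected simple graph.

    Hypothetical helper for illustration; edges are (u, v) pairs of
    comparable vertex ids, and results are reported with u < v.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def support(u, v):
        # Triangles containing edge (u, v) = common neighbors of u and v.
        return len(adj[u] & adj[v])

    while True:
        # Edges in fewer than k - 2 triangles violate the k-truss definition.
        to_delete = [(u, v) for u in adj for v in adj[u]
                     if u < v and support(u, v) < k - 2]
        if not to_delete:
            break
        # Batch edge deletion: the dynamic-graph operation in the loop.
        for u, v in to_delete:
            adj[u].discard(v)
            adj[v].discard(u)
        # "Update triangles" happens implicitly: support() reads the
        # current adjacency on the next pass.

    return {(u, v) for u in adj for v in adj[u] if u < v}
```

On a 4-clique with one extra pendant edge, the 4-truss keeps the six clique edges and drops the pendant edge, since that edge lies in no triangle.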

SLIDE 12

Widely used graph data structures

  • Dense adjacency matrix
    – Pros: supports updates
    – Cons: poor locality; massive storage requirements
  • Linked lists
    – Pros: flexible
    – Cons: poor locality; limited parallelism; allocation time is costly
  • COO (edge list), unsorted
    – Pros: has some flexibility; updates are simple; lots of parallelism
    – Cons: poor locality; stores both the source and destination
  • CSR
    – Pros: uses exact amount of memory; good locality; lots of parallelism
    – Cons: inflexible

These data structures don’t cut it.

Oded Green, Alok Tripathy, GTC 2019

SLIDE 13

Compressed Sparse Row (CSR)

Pros:

  • Uses precise storage requirements
  • Great locality
    – Good for GPUs
  • Handful of arrays
    – Simple to use and manage

Cons:

  • Inflexible
  • Network growth unsupported
  • Topology changes unsupported
  • Property graphs not supported

[Figure: CSR example with Src/Row, Offset, Dest./Col., and Value arrays]
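To make the CSR trade-off concrete, here is a small sketch that builds the offset and destination arrays from a directed edge list. The helper names (`build_csr`, `neighbors`, `insert_edge`) are hypothetical and chosen only for this illustration. The last function shows why CSR is called inflexible: inserting one edge shifts every later destination entry and every later offset.

```python
def build_csr(num_vertices, edges):
    """Build CSR arrays (offsets, dests) from a directed edge list."""
    degree = [0] * num_vertices
    for src, _ in edges:
        degree[src] += 1

    # offsets[v] .. offsets[v + 1] delimits v's neighbors in dests.
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]

    dests = [0] * len(edges)
    cursor = offsets[:-1].copy()  # next free slot for each vertex
    for src, dst in edges:
        dests[cursor[src]] = dst
        cursor[src] += 1
    return offsets, dests


def neighbors(offsets, dests, v):
    """Neighbors of v are one contiguous slice (good locality)."""
    return dests[offsets[v]:offsets[v + 1]]


def insert_edge(offsets, dests, src, dst):
    """Insert one edge in place; costs O(|V| + |E|) because of the shifts."""
    dests.insert(offsets[src + 1], dst)      # shifts all later destinations
    for v in range(src + 1, len(offsets)):   # bumps all later offsets
        offsets[v] += 1
```

A structure that leaves spare room per vertex avoids these shifts, which is the motivation for the over-allocated blocks described on the next slides.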

SLIDE 14

Hornet – A High Level View

[Figure: a user-interface array of per-vertex entries (Vertex Id, Used, Pointer) whose pointers lead to edge blocks holding Dest and Value fields, with over-allocated space in the blocks]

SLIDE 15

Hornet in Detail

[Figure: Hornet internals. User interface: per-vertex Vertex Id, Used (#Neighbors/nnz), and Pointer entries, with over-allocated space for vertex insertions. Memory manager: arrays labeled BA with bsize = 1, 2, 2, 4, each with a bit status and a Vec-Tree, holding Dest./Col. and Weight fields, with over-allocated space for the power-of-two rule.]

SLIDE 16

Hornet Insertion

SLIDE 17

Hornet Insertion Pseudocode

parallel for (u, v) in batch
    if u’s block is too full
        allocate a new block
        queue.add(u)

parallel for u in queue
    copy adjacency list to new block

parallel for (u, v) in batch
    add (u, v) to u’s block
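The three phases of the insertion pseudocode can be sketched sequentially as below. This is an illustration under stated assumptions, not Hornet's API: `BlockGraph` and `next_pow2` are hypothetical names, block capacities follow the power-of-two rule mentioned on the "Hornet in Detail" slide, and the overflow check uses the total number of incoming edges per source vertex in the batch. In Hornet each loop runs in parallel on the GPU and blocks come from a memory manager, both of which this sketch omits.

```python
def next_pow2(n):
    """Smallest power of two that is >= n (and >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


class BlockGraph:
    """Toy per-vertex block storage with power-of-two capacities."""

    def __init__(self, num_vertices):
        self.used = [0] * num_vertices                        # neighbors stored
        self.blocks = [[None] for _ in range(num_vertices)]   # capacity 1 each

    def insert_batch(self, batch):
        # Phase 1: find vertices whose block cannot hold the new edges.
        incoming = {}
        for u, _ in batch:
            incoming[u] = incoming.get(u, 0) + 1
        queue = [u for u, count in incoming.items()
                 if self.used[u] + count > len(self.blocks[u])]

        # Phase 2: allocate a larger block and copy the adjacency list.
        for u in queue:
            new_block = [None] * next_pow2(self.used[u] + incoming[u])
            new_block[:self.used[u]] = self.blocks[u][:self.used[u]]
            self.blocks[u] = new_block

        # Phase 3: append every (u, v) to u's block, which now has room.
        for u, v in batch:
            self.blocks[u][self.used[u]] = v
            self.used[u] += 1

    def neighbors(self, u):
        return self.blocks[u][:self.used[u]]
```

Separating allocation, copying, and insertion means each phase is a uniform loop over the batch or the queue, which is the shape that maps well to one GPU kernel per phase.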

SLIDE 18

Insertion Rates

  • Supports over 150M updates per second
  • Hornet
    – 4×–10× faster than cuSTINGER
    – Does not have a performance dip like cuSTINGER
  • Scalable growth in update rate

[Figure: two log-scale plots, cuSTINGER and Hornet, of update rate (edges per second, 10^3 to 10^9) for in-2004, soc-LiveJournal1, cage15, and kron_g500-logn21; the horizontal axes also run from 10^3 to 10^9]

SLIDE 21

Motivation

  • Current idea:
    – Dynamic graph operations are only for dynamic graphs, not static graphs.
      • Very expensive
      • Why bother?
  • New idea: Algorithms on static graphs can benefit from dynamic graph operations
    – If we can efficiently parallelize operations

SLIDE 22

What I’ll Show

  • 3 static graph algorithms
    – All 3 leverage NVIDIA P100 GPUs.
      • 2 beat the state-of-the-art
      • 1 does not (does not have good GPU utilization)

SLIDE 24

Algorithms

  • Old maximal k-core algorithm ☹
  • New maximal k-core algorithm
  • k-core edge decomposition

SLIDE 27

Maximal k-core Definitions

  • k-core
    – Maximal subgraph where all vertices have degree at least k
  • Maximal k-core
    – Largest k such that a k-core exists in the graph
  • Applications: visualization, community detection

[Figure: example graph with k = 3]

SLIDE 28

Maximal k-core High-Level

peel = 0
while vertices exist in G
    delete all vertices with degree <= peel
    if there aren’t any
        increment peel

[Figure: example graph at peel = 1, with vertex labels 2 2 5 3 4 4 5 1 1 1]

SLIDE 29

Maximal k-core High-Level

[Figure: the same pseudocode with the example graph at peel = 2, with vertex labels 2 2 5 3 4 4 2]

SLIDE 30

Maximal k-core High-Level

[Figure: the same pseudocode with the example graph at peel = 3, with vertex labels 3 3 3 3]

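The high-level peeling loop from these slides can be sketched in Python as follows. This is a sequential illustration that deletes vertices from a dictionary-of-sets adjacency. The name `maximal_k_core` and the choice to also return the vertex set of the core are assumptions added for the example.

```python
def maximal_k_core(edges):
    """Return (k, vertices) for the largest k whose k-core is non-empty.

    Hypothetical helper for illustration; the input is an undirected
    simple graph given as (u, v) pairs.
    """
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)

    peel = 0
    core = set(adj)
    while adj:
        low = [v for v in adj if len(adj[v]) <= peel]
        if not low:
            # Every survivor has degree > peel, so the survivors form
            # the (peel + 1)-core. Remember them before peeling further.
            peel += 1
            core = set(adj)
            continue
        # Vertex deletion with incident edges: a dynamic-graph operation.
        for v in low:
            for w in adj.pop(v):
                if w in adj:
                    adj[w].discard(v)
    return peel, core
```

For a 4-clique with a two-edge tail attached, the tail is peeled at peel = 1, nothing is removed at peel = 2, and the clique disappears at peel = 3, so the function reports k = 3 with the four clique vertices.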
SLIDE 31

Old Maximal k-core Algorithm

peel = 0
while vertices exist in G
    reset colors
    color all vertices with degree ≤ peel
    if #colored vertices > 0
        delete colored vertices
        delete incident edges
        insert vertices in G J
        insert edges in G J
    else
        increment peel

[Figure: example graph at peel = 1, with vertex labels 2 2 5 3 4 4 5 1 1 1]

SLIDE 32

Old Maximal k-core Code

SLIDE 33

Compared Against

  • ParK
    – Parallel k-core algorithm; IEEE BigData 2014
    – Some parallelism
    – No dynamic graph operations
  • igraph
    – Network analysis toolkit
    – Sequential
    – No dynamic graph operations
  • Both run on Intel Xeon E5-2695; 36 cores, 72 threads

SLIDE 34

Old Maximal k-core Results

  • Our algorithm is sometimes better than igraph.
  • Our algorithm never beats ParK.
  • Why are we so slow?

Name                       |V|      |E|       Our algorithm   ParK     igraph
dblp-author                5.5M     8.6M      2.2×            15×      1×
patentcite                 3.8M     16.5M     1.3×            15×      1×
soc-LiveJournal1           4.8M     42.9M     OOM             11.3×    1×
soc-pokec-relationships    1.6M     22.3M     0.6×            16.6×    1×
trackers                   27.7M    140.6M    OOM             6.8×     1×
wikipedia-link-de          3.2M     65.8M     OOM             5.1×     1×

SLIDE 35

GPU Utilization

SLIDE 36

GPU Utilization / Batch Size

[Figure: update rate (edges per second, log scale from 1,000 to 1,000,000,000) for in-2004, soc-LiveJournal1, cage15, and kron_g500-logn21]

SLIDE 38

Algorithms

  • Old maximal k-core algorithm ☹
  • New maximal k-core algorithm ☺
  • k-core edge decomposition

SLIDE 39

New Maximal k-core Algorithm

  • Flag vertices instead of deleting them.

while not every vertex is flagged
    flag all vertices with degree <= peel
    if there aren’t any
        increment peel
    else
        for each flagged vertex v
            for each neighbor of v
                decrement neighbor’s degree
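A sequential sketch of the flag-based variant is shown below. The graph itself is never modified; only a degree array and a flag array change. The function name `maximal_k_core_flagged`, the integer vertex ids in `range(num_vertices)`, and the assumption of a simple graph without duplicate edges are choices made for this illustration and are not taken from the talk's code.

```python
def maximal_k_core_flagged(num_vertices, edges):
    """Return the largest k for which a k-core exists, without deletions.

    Hypothetical helper for illustration; assumes an undirected simple
    graph on vertices 0 .. num_vertices - 1.
    """
    adj = [[] for _ in range(num_vertices)]
    for u, v in edges:
        if u != v:
            adj[u].append(v)
            adj[v].append(u)

    degree = [len(neigh) for neigh in adj]
    flagged = [False] * num_vertices
    remaining = num_vertices
    peel = 0

    while remaining > 0:
        # Newly flagged vertices in this round.
        batch = [v for v in range(num_vertices)
                 if not flagged[v] and degree[v] <= peel]
        if not batch:
            peel += 1
            continue
        for v in batch:
            flagged[v] = True
        remaining -= len(batch)
        # Instead of deleting edges, lower the degree of each neighbor.
        # Decrementing an already flagged neighbor is harmless.
        for v in batch:
            for w in adj[v]:
                degree[w] -= 1
    return peel
```

Each round touches only the flagged batch and its neighbors, so the work per round is a pair of regular loops instead of a structural update to the graph.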

SLIDE 40

New Maximal k-core Code

SLIDE 41

New Maximal k-core Results

  • Our algorithm always beats igraph.
  • Our algorithm is sometimes better than ParK.
    – At best, 3.9× faster
    – At worst, 4.3× slower
  • Learned that batch size affected performance.

Name                       |V|      |E|       Our algorithm   ParK     igraph
dblp-author                5.5M     8.6M      58×             15×      1×
patentcite                 3.8M     16.5M     26×             15×      1×
soc-LiveJournal1           4.8M     42.9M     7.4×            11.3×    1×
soc-pokec-relationships    1.6M     22.3M     15×             16.6×    1×
trackers                   27.7M    140.6M    1.6×            6.8×     1×

SLIDE 43

Algorithms

  • Old maximal k-core algorithm ☹
  • New maximal k-core algorithm ☺
  • k-core edge decomposition ☺

SLIDE 44

k-core Decomp. Definitions

  • k-core edge decomposition
    – For each edge, what is the largest k-core that edge belongs to?

[Figure: example graph with edge labels 1 2 2 2 2 2 2]

SLIDE 45

k-core Decomp. Algorithm

while vertices exist in G
    find the maximal k-core in G
    mark all edges in k-core with value k
    delete k-core from G
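The decomposition loop can be sketched in Python as below. Two points are assumptions of this sketch and are not stated on the slide: "delete k-core from G" is read as deleting the edges between core vertices (vertices left without edges then drop out), and the maximal k-core is found with a simple sequential peel. The names `peel_max_core` and `edge_decomposition` are hypothetical, and vertex ids are assumed to be comparable.

```python
def peel_max_core(adj):
    """Return (k, vertex set) of the maximal k-core of adj (vertex -> set)."""
    work = {v: set(neigh) for v, neigh in adj.items()}  # peel a copy
    peel, core = 0, set(work)
    while work:
        low = [v for v in work if len(work[v]) <= peel]
        if not low:
            peel += 1
            core = set(work)  # survivors form the peel-core
            continue
        for v in low:
            for w in work.pop(v):
                if w in work:
                    work[w].discard(v)
    return peel, core


def edge_decomposition(edges):
    """Label each undirected edge (u, v), u < v, with a core value k."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)

    label = {}
    while adj:
        k, core = peel_max_core(adj)
        # Mark all edges inside the maximal k-core with value k.
        for u in core:
            for v in adj[u] & core:
                if u < v:
                    label[(u, v)] = k
        # Assumed reading of "delete k-core": remove the core's edges.
        for u in core:
            adj[u] -= core
        # Vertices with no remaining edges leave the graph.
        adj = {v: neigh for v, neigh in adj.items() if neigh}
    return label
```

The edge deletions in the last step are where a dynamic graph structure is exercised on a graph that is static from the user's point of view.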

SLIDE 46

k-core Decomp. Code

SLIDE 47

Compared Against

  • ParK Extension
    – Parallel k-core algorithm; IEEE BigData 2014
    – Some parallelism
    – No dynamic graph operations
    – Vertex flagging
  • igraph Extension
    – Network analysis toolkit
    – Sequential
    – Uses edge deletions
  • Both run on Intel Xeon E5-2695; 36 cores, 72 threads

SLIDE 48

k-core Decomp. Results

  • Our algorithm always beats igraph.
  • Our algorithm always beats ParK (1.2×–7.8×).
    – Usually ~2× faster
  • Our algorithm uses dynamic graph operations
    – And effectively uses the GPU

Name                       |V|      |E|       Our algorithm   ParK     igraph
dblp-author                5.5M     8.6M      129.2×          51.5×    1×
patentcite                 3.8M     16.5M     63.8×           25×      1×
soc-LiveJournal1           4.8M     42.9M     25.9×           3.3×     1×
soc-pokec-relationships    1.6M     22.3M     85.9×           36.3×    1×
trackers                   27.7M    140.6M    4.7×            4.1×     1×

SLIDE 49

k-core Decomp. GPU Utilization

SLIDE 50

  • Decomp. vs. Slow Maximal k-core

SLIDE 51

Conclusion

  • Dynamic graph operations can be computed on a GPU efficiently.
  • Current idea:
    – Dynamic graph operations are only for dynamic graphs, not static graphs
  • New idea: Static graph algorithms can benefit from dynamic graph operations
    – If we can efficiently utilize the system

SLIDE 52

Takeaway

  • Consider dynamic graph operations when you implement graph algorithms
    – Even if the graph doesn’t change over time.

SLIDE 53

Thank you

  • k-core Paper: Proceedings of IEEE BigData 2018
  • k-truss, Hornet Paper: Proceedings of IEEE HPEC 2017/18
  • Code: https://github.com/hornet-gt/hornet

  • Oded Green, Georgia Tech/NVIDIA: ogreen@gatech.edu, @OdedGreen
  • Polo Chau, Georgia Tech: polo@gatech.edu, @PoloChau, cc.gatech.edu/~dchau/
  • Fred Hohman, Georgia Tech: fredhohman@gatech.edu, @fredhohman, fredhohman.com
  • Alok Tripathy, Georgia Tech: atripathy8@gatech.edu, @alokpathy, www.aloktripathy.me

SLIDE 54

Backup slides

Oded Green, HPEC-18

SLIDE 55

Performance

  • Compared against
    – ParK: parallel k-core algorithm; BigData 2014
    – igraph: network analysis toolkit
  • Dynamic graph data structure
    – Hornet, GPU-based
  • Systems used
    – Our algorithms: NVIDIA P100
    – ParK, igraph: Intel Xeon E5-2695; 36 cores, 72 threads
      • igraph is sequential

SLIDE 56

Performance

  • Compared against
    – Wang & Cheng: sequential algorithm for finding k-truss
    – Graphulo: parallel algorithm for finding k-truss
  • Dynamic graph data structure
    – cuSTINGER-Delta, GPU-based
      • Evolved into Hornet
  • Systems used
    – Our algorithm: NVIDIA P100
    – Wang & Cheng: Intel Core2 dual-core 2.80GHz CPU
    – Graphulo: 2 Intel i7 dual-core

SLIDE 57

GPU Utilization / Batch Size

SLIDE 58

HKS (maximal k-core) results

  • ParK: k-core algorithm from IEEE Big Data 2014
  • HKS run on NVIDIA P100 with Hornet data structure.

Name                       |V|      |E|       HKS (sec.)       ParK (sec.)       igraph (sec.)
dblp-author                5.5M     8.6M      0.731 (2.2×)     0.105 (15×)       1.633 (1×)
patentcite                 3.8M     16.5M     2.953 (1.3×)     0.253 (15×)       3.825 (1×)
soc-LiveJournal1           4.8M     42.9M     OOM              0.549 (11.3×)     6.191 (1×)
soc-pokec-relationships    1.6M     22.3M     4.331 (0.6×)     0.155 (16.6×)     2.586 (1×)
trackers                   27.7M    140.6M    OOM              3.052 (6.8×)      20.693 (1×)
wikipedia-link-de          3.2M     65.8M     OOM              0.764 (5.1×)      3.954 (1×)

Alok Tripathy, BigData 2018

SLIDE 59

HDS (k-core decomp) results

  • ParK: k-core algorithm from IEEE Big Data 2014
  • HDS run on NVIDIA P100 with Hornet data structure.

Name                       |V|      |E|       HDS (sec.)        ParK (sec.)         igraph (sec.)
dblp-author                5.5M     8.6M      6.184 (13.3×)     1.595 (51.5×)       82.066 (1×)
patentcite                 3.8M     16.5M     91.481 (3.6×)     13.294 (25×)        331.538 (1×)
soc-LiveJournal1           4.8M     42.9M     OOM               487.112 (3.3×)      1572.985 (1×)
soc-pokec-relationships    1.6M     22.3M     50.049 (4.7×)     6.488 (36.3×)       235.790 (1×)
trackers                   27.7M    140.6M    OOM               1148.638 (4.1×)     4725.317 (1×)
wikipedia-link-de          3.2M     65.8M     OOM               1397.323 (2.1×)     3003.166 (1×)

SLIDE 60

GPU Utilization

SLIDE 61

GPU Utilization / Batch Size

SLIDE 62

Maximal K-Core Algorithm (HKO)

while there are non-flagged vertices
    flag all vertices with degree <= peel
    if there aren’t any
        increment peel
    else
        for each flagged vertex v
            for each neighbor of v
                decrement neighbor’s degree

[Figure: example graph at peel = 3]

SLIDE 63

Maximal K-Core Algorithm (HKO)

SLIDE 64

Maximal K-Core Algorithm 1 (HKS)

SLIDE 65

Maximal K-Core Algorithm 1 (HKS)

SLIDE 66

Maximal K-Core Algorithm 1 (HKS)

SLIDE 67

Maximal K-Core Algorithm 1 (HKS)

SLIDE 68

Maximal K-Core Algorithm 1 (HKS)

SLIDE 69

K-Core Decomp. Algorithm 1 (HDS)

SLIDE 70

HKO (maximal k-core) results

  • ParK: k-core algorithm from IEEE Big Data 2014
  • HKO run on NVIDIA P100 with Hornet data structure.

Name                       |V|      |E|       HKO (sec.)        ParK (sec.)       igraph (sec.)
dblp-author                5.5M     8.6M      0.028 (58×)       0.105 (15×)       1.633 (1×)
patentcite                 3.8M     16.5M     0.147 (26×)       0.253 (15×)       3.825 (1×)
soc-LiveJournal1           4.8M     42.9M     0.838 (7.4×)      0.549 (11.3×)     6.191 (1×)
soc-pokec-relationships    1.6M     22.3M     0.174 (15×)       0.155 (16.6×)     2.586 (1×)
trackers                   27.7M    140.6M    13.160 (1.6×)     3.052 (6.8×)      20.693 (1×)
wikipedia-link-de          3.2M     65.8M     1.987 (2×)        0.764 (5.1×)      3.954 (1×)
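The speedup figures in this table are ratios of the igraph running time to each algorithm's running time. The short sketch below recomputes them from the timings listed on this slide; the names `TIMES` and `speedups_over_igraph` are hypothetical and exist only for this example.

```python
# (HKO, ParK, igraph) running times in seconds, copied from this slide.
TIMES = {
    "dblp-author": (0.028, 0.105, 1.633),
    "patentcite": (0.147, 0.253, 3.825),
    "soc-LiveJournal1": (0.838, 0.549, 6.191),
    "soc-pokec-relationships": (0.174, 0.155, 2.586),
    "trackers": (13.160, 3.052, 20.693),
    "wikipedia-link-de": (1.987, 0.764, 3.954),
}


def speedups_over_igraph(times):
    """Map each dataset to (HKO speedup, ParK speedup), one decimal place."""
    return {
        name: (round(igraph / hko, 1), round(igraph / park, 1))
        for name, (hko, park, igraph) in times.items()
    }
```

For dblp-author the ratio 1.633 / 0.028 comes out at about 58, which matches the "up to 58×" figure quoted at the start of the talk.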

SLIDE 71

HDO (k-core decomp) results

  • ParK: k-core algorithm from IEEE Big Data 2014
  • HDO run on NVIDIA P100 with Hornet data structure.

Name                       |V|      |E|       HDO (sec.)          ParK (sec.)         igraph (sec.)
dblp-author                5.5M     8.6M      0.635 (129.2×)      1.595 (51.5×)       82.066 (1×)
patentcite                 3.8M     16.5M     5.200 (63.8×)       13.294 (25×)        331.538 (1×)
soc-LiveJournal1           4.8M     42.9M     60.755 (25.9×)      487.112 (3.3×)      1572.985 (1×)
soc-pokec-relationships    1.6M     22.3M     2.756 (85.9×)       6.488 (36.3×)       235.790 (1×)
trackers                   27.7M    140.6M    1006.954 (4.7×)     1148.638 (4.1×)     4725.317 (1×)
wikipedia-link-de          3.2M     65.8M     266.923 (11.3×)     1397.323 (2.1×)     3003.166 (1×)