SLIDE 1

The More the Merrier: Efficient Multi-Source Graph Traversal

Manuel Then*, Moritz Kaufmann*, Fernando Chirigati†, Tuan-Anh Hoang-Vu†, Kien Pham†, Huy T. Vo†, Alfons Kemper*, Thomas Neumann*

* Technische Universität München, † New York University

SLIDE 2

2015-09-01 The More the Merrier: Efficient Multi-Source Graph Traversal 2

Outline

  • Motivation
  • Challenges
  • Goals
  • Multi-Source BFS
  • Evaluation
  • Summary

SLIDE 6

Motivation

  • Graph traversal is a vital part of graph analytics
      • BFS, DFS, neighbor traversals, random walks, ...
  • Often multiple BFS traversals are necessary to compute results
      • Closeness centrality, shortest paths, ...
  • Real-world graphs are often small-world networks
      • Social networks, web graphs, communication networks
  • Subject of this talk: efficiently run multiple BFSs on real-world graphs
SLIDE 9

Challenges

  • Random data access is intrinsic to graph traversal algorithms
      • bad cache behavior, frequent CPU stalls
  • Single-bit accesses waste memory bandwidth
      • e.g. for BFS seen bitmaps
  • Independent BFS runs redundantly visit vertices multiple times
SLIDE 15

Challenge - Redundant visits

Example: BFSs in a simple graph

[Figure: two traversals, BFS1 and BFS2; panels: Initial, Iteration 1, Iteration 2]

SLIDE 16

Challenge - Redundant visits (cont.)

[Figure: redundant vertex visits for 512 BFSs on the LDBC 1M social network graph]

  • After a few iterations, many redundant visits in small-world networks
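
To make the redundancy concrete, the sketch below runs independent BFSs from several sources and reports, per iteration, the total number of vertex visits next to the number of distinct vertices visited. The graph, the sources, and the names `bfs_levels` and `count_visits_per_level` are illustrative assumptions, not data or code from the talk.

```python
from collections import defaultdict


def bfs_levels(adj, source):
    """Return {vertex: depth} for a plain single-source BFS."""
    depth = {source: 0}
    frontier = [source]
    while frontier:
        nxt = []
        for v in frontier:
            for n in adj[v]:
                if n not in depth:
                    depth[n] = depth[v] + 1
                    nxt.append(n)
        frontier = nxt
    return depth


def count_visits_per_level(adj, sources):
    """For each BFS depth, return (total visits, distinct vertices visited)."""
    # depth -> vertex -> number of BFSs that reach the vertex at that depth
    visitors = defaultdict(lambda: defaultdict(int))
    for s in sources:
        for v, d in bfs_levels(adj, s).items():
            visitors[d][v] += 1
    return {d: (sum(c.values()), len(c)) for d, c in sorted(visitors.items())}


# Hypothetical 6-vertex graph as adjacency lists (not the graph on the slides).
adj = {
    0: [2, 3],
    1: [2, 3],
    2: [0, 1, 4],
    3: [0, 1, 5],
    4: [2, 5],
    5: [3, 4],
}
stats = count_visits_per_level(adj, sources=[0, 1])
for d, (total, distinct) in stats.items():
    print(f"depth {d}: {total} visits, {distinct} distinct vertices")
```

In this toy graph both BFSs reach vertices 2 and 3 in the same iteration, so half of the visits at depth 1 are redundant when the traversals run independently.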
SLIDE 19

Goals

  • Leverage the knowledge that multiple BFS traversals are run
  • Optimize data access patterns
      • embrace memory accesses instead of trying to hide them
      • CPUs always fetch full cache lines - use all of them
  • Avoid redundant computation and vertex visits
      • touch vertex information as rarely as possible
SLIDE 23

Multi-Source BFS

  • Concurrently run many independent BFS traversals on the same graph
      • 100s of BFSs on a single CPU core
  • Store the state of the concurrent BFSs as 3 bitsets per vertex
  • Represent BFS traversal as SIMD bit operations on these bitsets
  • Fully utilize the cache line-sized memory accesses of modern CPUs
  • Efficiently share traversals whenever possible
      • neighbors traversed only once for all concurrent BFSs

[Figure: the three per-vertex bitsets: visit, seen, next]
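
The bitset formulation above can be sketched as follows. This is a minimal illustration, not the authors' implementation: Python integers stand in for the fixed-width bitsets that a real implementation would process with SIMD registers, bit `i` of each bitset belongs to the BFS started at `sources[i]`, and the names `ms_bfs`, `adj`, and `dist` are assumptions made for this example.

```python
def ms_bfs(adj, sources):
    """Run one BFS per source concurrently; return dist[i] = {vertex: depth}."""
    n = len(adj)
    seen = [0] * n   # bit i set: BFS i has already discovered this vertex
    visit = [0] * n  # bit i set: BFS i explores this vertex in this iteration
    for i, s in enumerate(sources):
        seen[s] |= 1 << i
        visit[s] |= 1 << i

    dist = [{s: 0} for s in sources]
    depth = 0
    while any(visit):
        depth += 1
        nxt = [0] * n  # the "next" bitsets, filled for the following iteration
        for v in range(n):
            if visit[v] == 0:
                continue
            # The neighbor list of v is scanned once for all BFSs active at v.
            for nb in adj[v]:
                d = visit[v] & ~seen[nb]  # BFSs that reach nb for the first time
                if d:
                    nxt[nb] |= d
                    seen[nb] |= d
                    i = 0
                    while d:  # per-BFS work: record the depth
                        if d & 1:
                            dist[i][nb] = depth
                        d >>= 1
                        i += 1
        visit = nxt
    return dist


# Hypothetical 6-vertex graph as adjacency lists.
adj = [[2, 3], [2, 3], [0, 1, 4], [0, 1, 5], [2, 5], [3, 4]]
dist = ms_bfs(adj, sources=[0, 1])
print(dist[0])
print(dist[1])
```

The expression `visit[v] & ~seen[nb]` handles every concurrent BFS with a single bit operation, which is where the sharing comes from: a vertex that several BFSs reach in the same iteration has its neighbors traversed only once.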

SLIDE 30

Multi-Source BFS - Example

[Figure: panels: Initial, Iteration 1, Iteration 2]

SLIDE 31

Multi-Source BFS - Further Improvements

  • Aggregated neighbor processing
      • reduce number of random writes
  • Batching heuristics for maximum sharing
  • Direction-optimizing
  • Prefetching

... see paper
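
As a rough illustration of aggregated neighbor processing, one iteration can be split into two phases: first OR the active BFS bits into each neighbor's next bitset, then filter against seen in a single sequential pass over all vertices. This is a sketch under that reading of the bullet, with Python integers as bitsets; the function name `ms_bfs_anp_step` and its return values are assumptions, and the paper remains the authoritative description.

```python
def ms_bfs_anp_step(adj, visit, seen):
    """One MS-BFS iteration in two phases.

    Returns (next_visit, discovered), where discovered maps each newly
    reached vertex to the bitset of BFSs that reached it. Updates seen in place.
    """
    n = len(adj)
    nxt = [0] * n

    # Phase 1: the only random writes go to nxt; seen is not touched here.
    for v in range(n):
        if visit[v]:
            for nb in adj[v]:
                nxt[nb] |= visit[v]

    # Phase 2: sequential pass that drops BFSs which had already seen v.
    discovered = {}
    for v in range(n):
        if nxt[v]:
            nxt[v] &= ~seen[v]
            seen[v] |= nxt[v]
            if nxt[v]:
                discovered[v] = nxt[v]
    return nxt, discovered


# Hypothetical 6-vertex graph; BFS 0 starts at vertex 0, BFS 1 at vertex 1.
adj = [[2, 3], [2, 3], [0, 1, 4], [0, 1, 5], [2, 5], [3, 4]]
seen = [0b01, 0b10, 0, 0, 0, 0]
visit = list(seen)
visit, first = ms_bfs_anp_step(adj, visit, seen)
visit, second = ms_bfs_anp_step(adj, visit, seen)
print(first, second)
```

Compared with checking seen for every edge, the per-edge work in phase 1 shrinks to one OR into a single array, and the per-BFS computation runs once per discovered vertex rather than once per incoming edge.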

SLIDE 32

Evaluation - The More the Merrier

SLIDE 34

Evaluation

  • Workload: MS-BFS-based closeness centrality
  • Machine: 4x Intel Xeon E7-4870 v2, 1 TB
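
For readers unfamiliar with the workload, the sketch below computes closeness centrality for a set of source vertices with a small multi-source BFS. The slides do not state which closeness definition the evaluation uses; this example assumes the classic form (n - 1) divided by the sum of shortest-path distances, on a connected, unweighted graph. The function name `closeness_ms_bfs` and the test graph are hypothetical.

```python
def closeness_ms_bfs(adj, sources):
    """Closeness of each source, assuming (n - 1) / sum of BFS distances."""
    n = len(adj)
    seen = [0] * n
    visit = [0] * n
    for i, s in enumerate(sources):
        seen[s] |= 1 << i
        visit[s] |= 1 << i

    dist_sum = [0] * len(sources)  # per-BFS sum of distances found so far
    depth = 0
    while any(visit):
        depth += 1
        nxt = [0] * n
        for v in range(n):
            if visit[v]:
                for nb in adj[v]:
                    d = visit[v] & ~seen[nb]  # BFSs discovering nb at this depth
                    if d:
                        nxt[nb] |= d
                        seen[nb] |= d
                        for i in range(len(sources)):
                            if d >> i & 1:
                                dist_sum[i] += depth
        visit = nxt
    return [(n - 1) / total if total else 0.0 for total in dist_sum]


# Hypothetical path graph 0 - 1 - 2 - 3.
path = [[1], [0, 2], [1, 3], [2]]
scores = closeness_ms_bfs(path, sources=[0, 1])
print(scores)
```

The inner vertex 1 is closer to the rest of the path than the end vertex 0, so it receives the higher score.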
SLIDE 35

Summary

  • Making graph traversals aware of each other can lead to a substantial performance increase
  • Multi-Source BFS (MS-BFS) runs multiple independent BFSs ...
      • ... on the same graph ...
      • ... concurrently on a single CPU ...
      • ... and shares their traversals.
  • MS-BFS shows a 10-100x speedup over existing single-source BFSs
SLIDE 36

Backup 1

SLIDE 37

Backup 2