SLIDE 1

Evaluation of Parallel Graph Loading Techniques

Manuel Then, Moritz Kaufmann, Alfons Kemper, Thomas Neumann
Technical University of Munich, Chair of Database Systems

SLIDE 2

3 Manuel Then (TUM) | Evaluation of Parallel Graph Loading Techniques

SLIDE 4

General Graph Loading Pipeline

Goal: Efficiently load a given graph dataset for explorative analytics

Read

  • Parse edges and create relabeling
  • Write edges to worker-local buffer

Sync

  • Find unique vertices
  • Count neighbors

Write

  • Create final graph data structure
  • Apply final relabeling

Analytics

  • The actual analytics work
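The Read, Sync, and Write stages can be sketched in a few lines. This is a minimal single-threaded illustration under stated assumptions, not the authors' implementation: the names `Edge`, `LoadedGraph`, and `load_graph` are hypothetical, the input is assumed to be whitespace-separated decimal pairs, and the relabeling is created during Sync rather than during Read to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Edge = std::pair<uint64_t, uint64_t>;  // (source, target)

struct LoadedGraph {                 // hypothetical result type
    std::vector<uint64_t> labels;    // dense id -> dataset identifier
    std::vector<uint64_t> degree;    // dense id -> number of outgoing edges
    std::vector<Edge> edges;         // edges expressed in dense ids
};

LoadedGraph load_graph(const std::vector<std::string>& chunks) {
    // Read: parse every input chunk into its own buffer. In a parallel
    // loader each chunk and buffer would belong to one worker.
    std::vector<std::vector<Edge>> buffers(chunks.size());
    for (std::size_t w = 0; w < chunks.size(); ++w) {
        std::istringstream in(chunks[w]);
        uint64_t source, target;
        while (in >> source >> target) buffers[w].push_back({source, target});
    }

    // Sync: find the unique vertices and count their neighbors.
    LoadedGraph g;
    std::unordered_map<uint64_t, uint64_t> dense;
    auto dense_id = [&](uint64_t v) -> uint64_t {
        auto it = dense.find(v);
        if (it != dense.end()) return it->second;
        uint64_t id = g.labels.size();
        dense.emplace(v, id);
        g.labels.push_back(v);
        g.degree.push_back(0);
        return id;
    };
    for (const auto& buffer : buffers) {
        for (const Edge& e : buffer) {
            uint64_t s = dense_id(e.first);
            dense_id(e.second);
            ++g.degree[s];
        }
    }

    // Write: create the final structure and apply the relabeling.
    for (const auto& buffer : buffers)
        for (const Edge& e : buffer)
            g.edges.push_back({dense[e.first], dense[e.second]});
    assert(g.labels.size() == g.degree.size());
    return g;
}
```

Calling `load_graph({"10 20\n10 30\n", "30 10\n"})` treats the two strings as the chunks of two workers.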

SLIDE 5

Scenario-specific Graph Loading

Problem: The optimal way of loading the graph depends on various factors:

  • Format of the graph data
  • Source of the data
  • Properties of the input data
  • Target graph data structure
  • Execution machine

The graph loading pipeline must be adapted to the scenario at hand.

SLIDE 12

General Graph Loading Pipeline

Questions to answer for the scenario at hand:

  • Identifier data type? binary, decimal, string?
  • Random access possible?
  • Can input data be read multiple times?
  • Explicit vertex list available?
  • Which data structure to generate?

SLIDE 15

Parsers

Binary reader

  • No parsing necessary => directly copy vertex identifiers
  • Every edge same size => work splitting trivial
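A binary reader can be sketched as follows. The record layout is an assumption made for illustration (two consecutive 64-bit identifiers per edge in host byte order), and `BinaryEdge`, `split_evenly`, and `read_binary_range` are hypothetical names. The sketch shows why work splitting is trivial: chunk boundaries are plain arithmetic on the record count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

struct BinaryEdge {   // assumed record: two 64-bit ids, host byte order
    uint64_t source;
    uint64_t target;
};

// Split `edge_count` fixed-size records into `workers` contiguous ranges.
std::vector<std::pair<std::size_t, std::size_t>>
split_evenly(std::size_t edge_count, std::size_t workers) {
    assert(workers > 0);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    for (std::size_t w = 0; w < workers; ++w)
        ranges.push_back({edge_count * w / workers,
                          edge_count * (w + 1) / workers});
    return ranges;
}

// Copy the records [first, last) out of the raw input; no parsing involved.
std::vector<BinaryEdge> read_binary_range(const unsigned char* data,
                                          std::size_t first, std::size_t last) {
    assert(first <= last);
    std::vector<BinaryEdge> edges(last - first);
    if (!edges.empty())
        std::memcpy(edges.data(), data + first * sizeof(BinaryEdge),
                    edges.size() * sizeof(BinaryEdge));
    return edges;
}
```

Each worker would call `read_binary_range` on its own range, so no worker has to inspect the data to find where its part begins.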

SLIDE 16

Parsers

Library-provided decimal parsing

  • Readily available for many languages
  • We evaluated C++’s stream operator and strtol
  • Varying edge length => work splitting more complex
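Library-based parsing of a text edge list might look like the sketch below. It uses `std::strtoull`, the unsigned 64-bit sibling of the `strtol` named on the slide, so this is a swapped-in variant rather than the evaluated code. The helper names and the "source target" line format are assumptions. Because lines vary in length, a tentative chunk boundary is first moved to the next line start.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<uint64_t, uint64_t>;

// Move a tentative chunk boundary forward to the start of the next line,
// so that no edge is split between two workers.
std::size_t align_to_line_start(const std::string& text, std::size_t pos) {
    if (pos == 0) return 0;
    if (pos >= text.size()) return text.size();
    std::size_t nl = text.find('\n', pos - 1);
    return nl == std::string::npos ? text.size() : nl + 1;
}

// Parse all "source target" lines in text[begin, end) with the C library.
// Both boundaries are expected to be line starts (or the end of the text).
std::vector<Edge> parse_with_strtoull(const std::string& text,
                                      std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= text.size());
    std::vector<Edge> edges;
    const char* p = text.c_str() + begin;
    const char* stop = text.c_str() + end;
    while (p < stop) {
        char* next = nullptr;
        uint64_t source = std::strtoull(p, &next, 10);
        if (next == p) break;   // no digits left
        p = next;
        uint64_t target = std::strtoull(p, &next, 10);
        if (next == p) break;   // malformed line: target missing
        p = next;
        edges.push_back({source, target});
        while (p < stop && (*p == ' ' || *p == '\r' || *p == '\n')) ++p;
    }
    return edges;
}
```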

SLIDE 18

Parsers

Iterative decimal parsing

  • Multiply by ten and add character’s respective digit
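Iterative decimal parsing fits in a few lines. The sketch below is an illustration under assumptions: `parse_decimal`, `ParsedEdge`, and `parse_edge` are hypothetical names, identifiers are unsigned, a single separator character sits between the two identifiers, and overflow is not checked.

```cpp
#include <cassert>
#include <cstdint>

// Parse an unsigned decimal number starting at `p` and advance `p` past it.
inline uint64_t parse_decimal(const char*& p, const char* end) {
    assert(p <= end);
    uint64_t value = 0;
    while (p < end && *p >= '0' && *p <= '9') {
        // Multiply by ten and add the digit the character stands for.
        value = value * 10 + static_cast<uint64_t>(*p - '0');
        ++p;
    }
    return value;
}

struct ParsedEdge {
    uint64_t source;
    uint64_t target;
};

// Parse one "source<sep>target<newline>" edge.
inline ParsedEdge parse_edge(const char*& p, const char* end) {
    ParsedEdge e;
    e.source = parse_decimal(p, end);
    if (p < end) ++p;   // skip the separator
    e.target = parse_decimal(p, end);
    if (p < end) ++p;   // skip the line terminator
    return e;
}
```

Unlike the library routines, this loop handles no signs, bases, locales, or error reporting, which is one plausible reason a custom parser can be faster.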

SLIDE 22

Parsers

Vectorized decimal parsing

  • Leverage wide vector units for identifier parsing
  • Reference: T. Mühlbauer, W. Rödiger, R. Seilbeck, A. Reiser, A. Kemper, and T. Neumann. Instant loading for main memory databases. Proceedings of the VLDB Endowment, 2013.
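The idea of converting several digits at once can be illustrated without vector intrinsics. The sketch below is a swapped-in stand-in: it uses SWAR (SIMD within a register) arithmetic on one 64-bit word, not the wide vector units the slide refers to, and it does not reproduce the cited work. The function name `parse_eight_digits` and the fixed width of exactly eight digits are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Convert exactly eight ASCII digits to their numeric value.
inline uint32_t parse_eight_digits(const char* s) {
    // Assemble the characters into one word, first character in the lowest
    // byte (the same layout as an 8-byte load on a little-endian machine).
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i) {
        assert(s[i] >= '0' && s[i] <= '9');
        v |= static_cast<uint64_t>(static_cast<unsigned char>(s[i])) << (8 * i);
    }
    // ASCII -> digit values, then combine neighbors: 2561 = 10 * 2^8 + 1,
    // so every second byte becomes a two-digit number.
    v = ((v & 0x0F0F0F0F0F0F0F0FULL) * 2561ULL) >> 8;
    // 6553601 = 100 * 2^16 + 1: every second 16-bit lane holds four digits.
    v = ((v & 0x00FF00FF00FF00FFULL) * 6553601ULL) >> 16;
    // 42949672960001 = 10000 * 2^32 + 1: the upper lane holds all eight.
    v = ((v & 0x0000FFFF0000FFFFULL) * 42949672960001ULL) >> 32;
    return static_cast<uint32_t>(v);
}
```

Three multiplications replace eight multiply-and-add steps. Variable-length identifiers would need padding or masking first, which this sketch leaves out.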

SLIDE 25

Parsers

Parser code generation

[Figure: chart with scale marks 2x, 20x, 200x]

SLIDE 27

Data Structures and Identifier Relabeling

Closely related areas

SLIDE 29

Data Structures and Identifier Relabeling

No relabeling (Identity) => Map of Neighbor Lists

  • Directly use dataset identifiers
  • Runtime overhead for neighbor and property accesses
  • Simple and efficient to load

[Figure: vertices labeled 1 and 2; hash-based access]
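A map of neighbor lists keyed by the dataset identifiers could look like this. It is a minimal sketch assuming `std::unordered_map` as the map; `NeighborMap`, `build_neighbor_map`, and `out_degree` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using NeighborMap = std::unordered_map<uint64_t, std::vector<uint64_t>>;

// Build the map directly from dataset identifiers; no relabeling step.
NeighborMap build_neighbor_map(
        const std::vector<std::pair<uint64_t, uint64_t>>& edges) {
    NeighborMap graph;
    for (const auto& e : edges) {
        graph[e.first].push_back(e.second);
        graph[e.second];   // vertices without outgoing edges get an entry too
    }
    // Each edge introduces at most two vertices.
    assert(graph.size() <= 2 * edges.size());
    return graph;
}

// Every neighbor access has to hash the identifier first.
std::size_t out_degree(const NeighborMap& graph, uint64_t vertex) {
    auto it = graph.find(vertex);
    return it == graph.end() ? 0 : it->second.size();
}
```

Loading is a single pass over the edges, while each later access pays for a hash lookup. That is the trade-off the bullets describe.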

SLIDE 31

Data Structures and Identifier Relabeling

Dense relabeling => Compressed Sparse Row (CSR)

  • Dense identifiers [0, |V|-1]
  • Packed, sequential memory layout
  • Allows offset-based data structure access, e.g. for neighbor lists or properties
  • Overhead during loading

[Figure: vertices labeled 1 and 2; hash-based access vs. offset-based access]
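CSR construction from already relabeled edges can be sketched with a counting pass, a prefix sum, and a scatter pass. The names `CsrGraph` and `build_csr` are hypothetical, and 32-bit dense identifiers are an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct CsrGraph {
    std::vector<std::size_t> offsets;   // size |V| + 1
    std::vector<uint32_t> neighbors;    // size |E|, packed by source vertex
};

// Build CSR from edges whose endpoints are dense ids in [0, |V|-1].
CsrGraph build_csr(std::size_t vertex_count,
                   const std::vector<std::pair<uint32_t, uint32_t>>& edges) {
    CsrGraph g;
    g.offsets.assign(vertex_count + 1, 0);
    for (const auto& e : edges) {                      // count neighbors
        assert(e.first < vertex_count && e.second < vertex_count);
        ++g.offsets[e.first + 1];
    }
    for (std::size_t v = 0; v < vertex_count; ++v)     // prefix sum
        g.offsets[v + 1] += g.offsets[v];
    g.neighbors.resize(edges.size());
    std::vector<std::size_t> cursor(g.offsets.begin(), g.offsets.end() - 1);
    for (const auto& e : edges)                        // scatter neighbors
        g.neighbors[cursor[e.first]++] = e.second;
    return g;
}
```

The neighbors of vertex `v` are `neighbors[offsets[v]]` up to, but not including, `neighbors[offsets[v + 1]]`, so access is a plain array offset. The extra counting pass is part of the loading overhead named in the last bullet.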

SLIDE 32

Relabeling Strategies

Mapping

  • Assign dense identifiers while reading the input data
  • Global: All workers use a shared map
  • Local: Each worker creates a local relabeling
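The global variant of the mapping strategy can be sketched as a shared map that hands out the next dense identifier on first sight. The class name `GlobalRelabeling` is hypothetical, and guarding the map with a single mutex is a simplifying assumption. A concurrent hash map would be another option.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <vector>

// One instance is shared by all workers.
class GlobalRelabeling {
public:
    // Return the dense id of `original`; assign the next free id on first use.
    uint32_t dense_id(uint64_t original) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = ids_.find(original);
        if (it != ids_.end()) return it->second;
        uint32_t id = static_cast<uint32_t>(labels_.size());
        ids_.emplace(original, id);
        labels_.push_back(original);
        return id;
    }

    // Translate a dense id back to the dataset identifier.
    uint64_t original(uint32_t dense) {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(dense < labels_.size());
        return labels_[dense];
    }

    std::size_t vertex_count() {
        std::lock_guard<std::mutex> lock(mutex_);
        return labels_.size();
    }

private:
    std::mutex mutex_;
    std::unordered_map<uint64_t, uint32_t> ids_;
    std::vector<uint64_t> labels_;
};
```

A local variant would give every worker its own unsynchronized map of this kind. Its identifiers are only valid within that worker and have to be reconciled before the final structure is written.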

SLIDE 33

Relabeling Strategies

Collection

  • Gather unique identifiers while reading the input
  • Assign dense identifiers at the end
  • Global: Shared identifier set for all workers
  • Local: Use a local set per worker

[Figure: identifier sets combined by union (∪)]
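The local variant of the collection strategy can be sketched as a union of per-worker sets followed by one assignment pass. Assigning dense identifiers in sorted order is a design choice made for this illustration and is not taken from the slides. `finalize_collection` and `dense_id` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Union the per-worker identifier sets. The result is sorted and free of
// duplicates, so the dense id of an identifier is its position in it.
std::vector<uint64_t> finalize_collection(
        const std::vector<std::unordered_set<uint64_t>>& local_sets) {
    std::vector<uint64_t> labels;
    for (const auto& s : local_sets)
        labels.insert(labels.end(), s.begin(), s.end());
    std::sort(labels.begin(), labels.end());
    labels.erase(std::unique(labels.begin(), labels.end()), labels.end());
    return labels;
}

// Look up the dense id of an identifier that was collected earlier.
uint32_t dense_id(const std::vector<uint64_t>& labels, uint64_t original) {
    auto it = std::lower_bound(labels.begin(), labels.end(), original);
    assert(it != labels.end() && *it == original);
    return static_cast<uint32_t>(it - labels.begin());
}
```

While reading, each worker only inserts into its own set, so no synchronization is needed until the union at the end.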

SLIDE 34

Relabeling Strategies

Relabeling is finalized/applied when the graph data structure is written

SLIDE 35

Relabeling Strategies - Measurements

Graph loading times for various relabeling strategies.
No further dataset properties leveraged.

SLIDE 38

Leveraging Dataset Properties

Explicit vertex lists

  • All unique vertices in the dataset are known beforehand
  • No need to find and count vertices => improves loading efficiency

SLIDE 40

Leveraging Dataset Properties

Partitioned edge list

  • Edge list partitioned by source vertex
  • Each source vertex has a responsible worker thread, determined by the input data chunk
  • Significantly reduces worker communication overhead

Example edge lists (source target):

Partitioned    Unpartitioned
1 2            4 3
1 3            1 3
1 4            3 1
2 1            1 4
2 4            2 1
3 1            1 2
3 2            3 2
4 3            2 4
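With a partitioned edge list, each worker can count neighbors for its own source vertices without consulting other workers. The sketch below illustrates this under assumptions: edges of one source are adjacent in the input, and the helper names `align_to_source_start` and `count_neighbors` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Edge = std::pair<uint32_t, uint32_t>;   // (source, target)

// Move a tentative chunk start forward to the first edge of the next source,
// so that every source vertex belongs to exactly one worker.
std::size_t align_to_source_start(const std::vector<Edge>& edges,
                                  std::size_t pos) {
    while (pos > 0 && pos < edges.size() &&
           edges[pos].first == edges[pos - 1].first)
        ++pos;
    return pos;
}

// Count neighbors per source in edges[begin, end). All edges of a source are
// adjacent, so the counts are complete without data from other chunks.
std::vector<std::pair<uint32_t, std::size_t>>
count_neighbors(const std::vector<Edge>& edges,
                std::size_t begin, std::size_t end) {
    assert(begin <= end && end <= edges.size());
    std::vector<std::pair<uint32_t, std::size_t>> degrees;
    for (std::size_t i = begin; i < end; ++i) {
        if (degrees.empty() || degrees.back().first != edges[i].first)
            degrees.push_back({edges[i].first, 0});
        ++degrees.back().second;
    }
    return degrees;
}
```

On the partitioned example list, a split attempted at position 4 moves to position 5. The first worker then owns sources 1 and 2, and the second owns sources 3 and 4.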

SLIDE 41

Leveraging Dataset Properties - Measurements

Graphs

  • LDBC-1000, |V| = 3.6M, |E| = 447M
  • Twitter, |V| = 41.6M, |E| = 1.5B

SLIDE 45

Comparison with Existing Systems

System                  Twitter          LDBC
Oracle PGX              2153s            632s
GraphBIG                out of memory    1682s
Ours non-partitioned    88s              24s
Ours partitioned        34s              7s

Graphs

  • LDBC-1000, |V| = 3.6M, |E| = 447M
  • Twitter, |V| = 41.6M, |E| = 1.5B

Machine:

  • 2x Intel Xeon E5-2660 v2 (2 × 20 @ 2.2GHz)
  • 256GB, Ubuntu 15.10, kernel 4.2.0

SLIDE 46

Influence on Analytics

                     CSR (relabeled)          Neighbors Map (identity)
                     Load + Run  = Total      Load + Run  = Total
PageRank             37s    33s    70s        25s    194s   219s
Triangle Counting    37s    49s    86s        25s    66s    92s

Graphs

  • Twitter, |V| = 41.6M, |E| = 1.5B

Machine:

  • 2x Intel Xeon E5-2660 v2 (2 × 20 @ 2.2GHz)
  • 256GB, Ubuntu 15.10, kernel 4.2.0

SLIDE 47

Summary

The optimal loading pipeline for a graph dataset is highly dependent on the

  • Data format
  • Source of the data
  • Properties of the dataset
  • Algorithm-dependent graph data structure
  • Target machine

Custom iterative identifier parsing always beneficial

Concurrent identifier relabeling mostly beneficial

  • More challenging than identity mapping, but usually worth it

Leveraging properties of the dataset can lead to enormous speedups