Evaluation of Parallel Graph Loading Techniques
Manuel Then, Moritz Kaufmann, Alfons Kemper, Thomas Neumann
Technical University of Munich, Chair of Database Systems
Goal: Efficiently load a given graph dataset for explorative analytics
General Graph Loading Pipeline
Read
- Parse edges and create relabeling
- Write edges to worker-local buffer
Sync
- Find unique vertices
- Count neighbors
Write
- Create final graph data structure
- Apply final relabeling
Analytics
- The actual analytics work
Scenario-specific Graph Loading
Problem: The optimal way of loading the graph depends on various factors:
- Format of the graph data
- Source of the data
- Properties of the input data
- Target graph data structure
- Execution machine
Graph loading pipeline must be adapted to the scenario at hand
Questions that determine the pipeline:
- Identifier data type: binary, decimal, or string?
- Random access possible?
- Can input data be read multiple times?
- Explicit vertex list available?
- Which data structure to generate?
Parsers
Reference: T. Mühlbauer, W. Rödiger, R. Seilbeck, A. Reiser, A. Kemper, and T. Neumann. Instant loading for main memory databases. Proceedings of the VLDB Endowment, 2013.
Binary reader
- No parsing necessary => directly copy vertex identifiers
- Every edge same size => work splitting trivial
Library-provided decimal parsing
- Readily available for many languages
- We evaluated C++’s stream operator and strtol
- Varying edge length => work splitting more complex
Iterative decimal parsing
- Multiply by ten and add character’s respective digit
Vectorized decimal parsing
- Leverage wide vector units for identifier parsing
Parser code generation
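The iterative decimal parsing bullet ("multiply by ten and add the character's respective digit") can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `parse_decimal`, `parse_edge`, and `Edge` are hypothetical, the line layout (two identifiers separated by one space or tab) is an assumption, and overflow and sign handling are omitted. `std::strtoull`, the unsigned 64-bit sibling of `strtol`, is wrapped only as the library-provided counterpart for comparison.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Iterative decimal parsing: multiply the running value by ten and add the
// digit value of each character. Stops at the first non-digit character and
// reports its position through `end`. No overflow or sign handling.
inline std::uint64_t parse_decimal(const char* p, const char** end) {
    std::uint64_t value = 0;
    while (*p >= '0' && *p <= '9') {
        value = value * 10 + static_cast<std::uint64_t>(*p - '0');
        ++p;
    }
    *end = p;
    return value;
}

// Hypothetical edge record with two unsigned 64-bit identifiers.
struct Edge {
    std::uint64_t source;
    std::uint64_t target;
};

// Parses "source<sep>target", assuming exactly one space or tab as separator.
inline Edge parse_edge(const char* line, const char** end) {
    const char* p = line;
    Edge e{};
    e.source = parse_decimal(p, &p);
    assert(*p == ' ' || *p == '\t');  // assumption of this sketch
    ++p;
    e.target = parse_decimal(p, &p);
    *end = p;
    return e;
}

// Library-provided parsing with the same interface, for comparison.
inline std::uint64_t parse_decimal_strtoull(const char* p, const char** end) {
    char* stop = nullptr;
    const std::uint64_t value = std::strtoull(p, &stop, 10);
    *end = stop;
    return value;
}
```

The hand-written loop skips what the library call has to handle in general (leading whitespace, sign, locale, error reporting), which is a plausible reason for the gap reported on the slide.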
[Figure: parser comparison; surviving labels: 2x, 20x, 200x]
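The difference in work splitting between binary and text input can be sketched as below. This is an illustration under assumptions: `split_binary`, `split_text`, and `Range` are hypothetical names, the input is held in memory, and text edges are assumed to be terminated by `'\n'`. With fixed-size binary edges every boundary is computed arithmetically; with text, a tentative boundary may fall inside a line and is moved forward to the next line start.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Half-open byte range [first, second) assigned to one worker.
using Range = std::pair<std::size_t, std::size_t>;

// Binary input: every edge occupies `edge_size` bytes, so boundaries are
// computed arithmetically and always fall between two edges.
std::vector<Range> split_binary(std::size_t file_size, std::size_t edge_size,
                                std::size_t workers) {
    assert(edge_size > 0 && workers > 0 && file_size % edge_size == 0);
    const std::size_t edges = file_size / edge_size;
    std::vector<Range> chunks;
    for (std::size_t w = 0; w < workers; ++w) {
        chunks.emplace_back(edges * w / workers * edge_size,
                            edges * (w + 1) / workers * edge_size);
    }
    return chunks;
}

// Text input: lines vary in length, so each tentative boundary is advanced
// to the start of the next line before it is used.
std::vector<Range> split_text(const std::string& data, std::size_t workers) {
    assert(workers > 0);
    std::vector<std::size_t> cuts{0};
    for (std::size_t w = 1; w < workers; ++w) {
        std::size_t pos = data.size() * w / workers;
        if (pos < cuts.back()) pos = cuts.back();  // keep boundaries ordered
        if (pos > 0 && data[pos - 1] != '\n') {    // inside a line
            const std::size_t nl = data.find('\n', pos);
            pos = (nl == std::string::npos) ? data.size() : nl + 1;
        }
        cuts.push_back(pos);
    }
    cuts.push_back(data.size());
    std::vector<Range> chunks;
    for (std::size_t w = 0; w < workers; ++w) {
        chunks.emplace_back(cuts[w], cuts[w + 1]);
    }
    return chunks;
}
```

For the 18-byte input `"1 2\n10 20\n100 200\n"` and two workers, the tentative boundary at byte 9 lies inside the second line and is moved to byte 10, so the first worker receives two lines and the second worker one.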
Data Structures and Identifier Relabeling
Closely related areas
No relabeling (Identity) => Map of Neighbor Lists
- Directly use dataset identifiers
- Runtime overhead for neighbor and property accesses
- Simple and efficient to load
Dense relabeling => Compressed Sparse Row (CSR)
- Dense identifiers [0, |V|-1]
- Packed, sequential memory layout
- Allows offset-based data structure access, e.g. for neighbor lists or properties
- Overhead during loading
[Figure: the two layouts side by side; labels: Hash-based access (map of neighbor lists), Offset-based access (CSR)]
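The two target data structures can be contrasted with a small sketch. It is an illustration under assumptions, not the evaluated implementation: `NeighborMap`, `Csr`, `build_neighbor_map`, and `build_csr` are hypothetical names, the graph is treated as directed, and `build_csr` expects identifiers that are already dense. The CSR builder shows where the loading overhead comes from: it counts neighbors, turns the counts into offsets with a prefix sum, and only then writes the targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using VertexId = std::uint64_t;
using EdgeList = std::vector<std::pair<VertexId, VertexId>>;

// Identity labeling: neighbor lists keyed by the original identifiers.
using NeighborMap = std::unordered_map<VertexId, std::vector<VertexId>>;

NeighborMap build_neighbor_map(const EdgeList& edges) {
    NeighborMap graph;
    for (const auto& e : edges) {
        graph[e.first].push_back(e.second);  // one hash lookup per edge
        (void)graph[e.second];               // vertices without out-edges exist too
    }
    return graph;
}

// CSR: neighbors of vertex v are targets[offsets[v]] .. targets[offsets[v + 1] - 1].
struct Csr {
    std::vector<std::uint64_t> offsets;  // |V| + 1 entries
    std::vector<VertexId> targets;       // |E| entries
};

// Expects dense identifiers in [0, num_vertices - 1].
Csr build_csr(const EdgeList& edges, std::size_t num_vertices) {
    Csr csr;
    csr.offsets.assign(num_vertices + 1, 0);
    for (const auto& e : edges) {  // count neighbors per source vertex
        assert(e.first < num_vertices && e.second < num_vertices);
        ++csr.offsets[e.first + 1];
    }
    for (std::size_t v = 0; v < num_vertices; ++v) {  // prefix sum
        csr.offsets[v + 1] += csr.offsets[v];
    }
    csr.targets.resize(edges.size());
    // Next free slot per vertex while writing the packed neighbor array.
    std::vector<std::uint64_t> cursor(csr.offsets.begin(), csr.offsets.end() - 1);
    for (const auto& e : edges) {
        csr.targets[cursor[e.first]++] = e.second;
    }
    return csr;
}
```

Reading the neighbors of a vertex costs a hash lookup in the map, while CSR needs two array reads at positions `v` and `v + 1`, which is the offset-based access the slide refers to.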
Relabeling Strategies
Mapping
- Assign dense identifiers while reading the input data
- Global: All workers use a shared map
- Local: Each worker creates a local relabeling
Collection
- Gather unique identifiers while reading the input
- Assign dense identifiers at the end
- Global: Shared identifier set for all workers
- Local: Use a local set per worker
Relabeling is finalized/applied when the graph data structure is written
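The local collection strategy listed above can be sketched as three steps that line up with the Read, Sync, and Write phases. This is a minimal illustration under assumptions: the function names are hypothetical, workers are represented by one call per chunk instead of real threads, and sorting the merged identifiers is a choice of this sketch to make the numbering deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

using VertexId = std::uint64_t;
using Edge = std::pair<VertexId, VertexId>;
using Relabeling = std::unordered_map<VertexId, VertexId>;

// Read phase, one call per worker: gather the unique identifiers of the
// worker's chunk in a worker-local set, so no synchronization is needed.
std::unordered_set<VertexId> collect_local(const std::vector<Edge>& chunk) {
    std::unordered_set<VertexId> ids;
    for (const Edge& e : chunk) {
        ids.insert(e.first);
        ids.insert(e.second);
    }
    return ids;
}

// Sync phase: form the union of all local sets, then assign dense
// identifiers [0, |V|-1] at the end.
Relabeling assign_dense_ids(
        const std::vector<std::unordered_set<VertexId>>& local_sets) {
    std::vector<VertexId> all;
    for (const auto& s : local_sets) all.insert(all.end(), s.begin(), s.end());
    std::sort(all.begin(), all.end());
    all.erase(std::unique(all.begin(), all.end()), all.end());
    Relabeling dense;
    dense.reserve(all.size());
    for (std::size_t i = 0; i < all.size(); ++i) dense.emplace(all[i], i);
    return dense;
}

// Write phase: apply the relabeling to the buffered edges.
void relabel(std::vector<Edge>& edges, const Relabeling& dense) {
    for (Edge& e : edges) {
        e.first = dense.at(e.first);
        e.second = dense.at(e.second);
    }
}
```

The mapping strategies differ in that identifiers are assigned already while reading, which requires either a shared map with concurrent access (global) or a later translation of worker-local identifiers (local).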
[Figure: worker-local results combined by union (∪)]
Relabeling Strategies - Measurements
[Figure: graph loading times for various relabeling strategies; no further dataset properties leveraged]
Leveraging Dataset Properties
Explicit vertex lists
- All unique vertices in the dataset are known beforehand
- No need to find and count vertices => improves loading efficiency
Partitioned edge list
- Edge list partitioned by source vertex
- Each source vertex has a responsible worker thread, determined by the input data chunk
- Significantly reduces worker communication overhead
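For the explicit vertex list case, one way to see why finding and counting vertices becomes unnecessary is the following sketch. It rests on an assumption that the slides do not state: the vertex list is sorted and free of duplicates, so the position in the list can serve directly as the dense identifier. The names `dense_id` and `count_out_degrees` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using VertexId = std::uint64_t;

// Dense identifier = position in the explicit vertex list (assumed sorted
// and duplicate-free in this sketch). The list is read-only, so any number
// of workers can look up identifiers concurrently without synchronization.
std::size_t dense_id(const std::vector<VertexId>& sorted_vertices,
                     VertexId original) {
    const auto it = std::lower_bound(sorted_vertices.begin(),
                                     sorted_vertices.end(), original);
    // Every edge endpoint must appear in the vertex list.
    assert(it != sorted_vertices.end() && *it == original);
    return static_cast<std::size_t>(it - sorted_vertices.begin());
}

// |V| is known up front, so the degree array can be allocated before the
// first edge is read and filled in a single pass over the edges.
std::vector<std::uint64_t> count_out_degrees(
        const std::vector<VertexId>& sorted_vertices,
        const std::vector<std::pair<VertexId, VertexId>>& edges) {
    std::vector<std::uint64_t> degree(sorted_vertices.size(), 0);
    for (const auto& e : edges) {
        ++degree[dense_id(sorted_vertices, e.first)];
    }
    return degree;
}
```

If the list is not sorted, a hash map built once from the list gives the same read-only lookup at the cost of extra memory.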
Example edge lists (source target):
Partitioned:   1 2, 1 3, 1 4, 2 1, 2 4, 3 1, 3 2, 4 3
Unpartitioned: 4 3, 1 3, 3 1, 1 4, 2 1, 1 2, 3 2, 2 4
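The two example edge lists can be used to illustrate what the partitioning buys. The sketch below assumes, as in the example, that all edges of a source vertex are stored contiguously; the names `is_partitioned_by_source` and `degrees_from_runs` are hypothetical. Under that assumption a worker that owns a run of edges sees the complete neighbor list of the source, so neighbor counts need no communication with other workers, provided chunk boundaries do not cut through a run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <utility>
#include <vector>

using VertexId = std::uint64_t;
using Edge = std::pair<VertexId, VertexId>;  // (source, target)

// True if all edges with the same source form one contiguous run.
bool is_partitioned_by_source(const std::vector<Edge>& edges) {
    std::unordered_set<VertexId> finished;  // sources whose run has ended
    for (std::size_t i = 0; i < edges.size(); ++i) {
        if (i > 0 && edges[i].first != edges[i - 1].first) {
            finished.insert(edges[i - 1].first);
        }
        if (finished.count(edges[i].first) != 0) return false;
    }
    return true;
}

// For a partitioned list, each run yields the final neighbor count of its
// source vertex in one local pass, without any shared counters.
std::vector<std::pair<VertexId, std::uint64_t>> degrees_from_runs(
        const std::vector<Edge>& edges) {
    assert(is_partitioned_by_source(edges));
    std::vector<std::pair<VertexId, std::uint64_t>> degrees;
    for (const Edge& e : edges) {
        if (degrees.empty() || degrees.back().first != e.first) {
            degrees.emplace_back(e.first, 0);
        }
        ++degrees.back().second;
    }
    return degrees;
}
```

Applied to the partitioned example, the runs give the counts 3, 2, 2, and 1 for the sources 1 to 4; the unpartitioned list fails the check because source 1 reappears after its first run has ended.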
Leveraging Dataset Properties - Measurements
Graphs
- LDBC-1000, |V| = 3.6M, |E| = 447M
- Twitter, |V| = 41.6M, |E| = 1.5B
Comparison with Existing Systems
System                 Twitter         LDBC
Oracle PGX             2153s           632s
GraphBIG               out of memory   1682s
Ours non-partitioned   88s             24s
Ours partitioned       34s             7s
Graphs
- LDBC-1000, |V| = 3.6M, |E| = 447M
- Twitter, |V| = 41.6M, |E| = 1.5B
Machine:
- 2x Intel Xeon E5-2660 v2 (2 × 20 @ 2.2GHz)
- 256GB, Ubuntu 15.10, kernel 4.2.0
Influence on Analytics
                    CSR (relabeled)          Neighbors Map (identity)
                    Load + Run  = Total      Load + Run  = Total
PageRank            37s    33s    70s        25s    194s   219s
Triangle Counting   37s    49s    86s        25s    66s    92s
Graphs
- Twitter, |V| = 41.6M, |E| = 1.5B
Machine:
- 2x Intel Xeon E5-2660 v2 (2 × 20 @ 2.2GHz)
- 256GB, Ubuntu 15.10, kernel 4.2.0
Optimal loading pipeline for a graph dataset is highly dependent on the
- Data format
- Source of the data
- Properties of the dataset
- Algorithm-dependent graph data structure
- Target machine
Custom iterative identifier parsing always beneficial
Concurrent identifier relabeling mostly beneficial
- More challenging than identity mapping, but usually worth it
Leveraging properties of the dataset can lead to enormous speedups