SLIDE 1

An Analysis of the Feasibility of Graph Compression Techniques for Indexing Regular Path Queries

Frank Tetzel, Hannes Voigt, Marcus Paradies, Wolfgang Lehner

May 19, 2017

SLIDE 2

Regular Path Queries (RPQs)

Matching paths conforming to a regular expression
Only distinct (start, end) vertex pairs in the result set

[Figure: automaton representing (ab)+ with start state s, state q1, and final state f; transitions labeled a, b, a]

[Figure: search in the data graph with vertices v0 to v6 and edges labeled a or b]

Final result set

start   end
v2      v0, v2, v3
v3      v0, v2, v3
v6      v0, v2, v3
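
The matching step can be sketched as a breadth-first search over (vertex, automaton state) pairs. This is a minimal sketch, not the authors' implementation: the transition table is one reading of the (ab)+ automaton, the edge list in GRAPH is hypothetical because the slide's data graph cannot be recovered from the transcript, and evaluate_rpq is an illustrative name.

```python
from collections import deque

# Automaton for (ab)+ using the slide's state names (s = start, f = final).
# The transition table is an assumption based on the figure labels a, b, a.
TRANSITIONS = {("s", "a"): "q1", ("q1", "b"): "f", ("f", "a"): "q1"}
START_STATE, FINAL_STATES = "s", {"f"}

# Hypothetical labeled data graph (NOT the graph from the slide):
# vertex -> list of (edge label, target vertex).
GRAPH = {
    0: [("a", 1)],
    1: [("b", 2)],
    2: [("a", 3)],
    3: [("b", 0)],
    4: [("b", 0)],
}


def evaluate_rpq(graph, transitions, start_state, final_states):
    """Return the distinct (start, end) pairs connected by a matching path."""
    vertices = set(graph) | {t for edges in graph.values() for _, t in edges}
    result = set()
    for source in vertices:
        # Breadth-first search over (vertex, automaton state) pairs.
        seen = {(source, start_state)}
        queue = deque(seen)
        while queue:
            vertex, state = queue.popleft()
            for label, target in graph.get(vertex, []):
                next_state = transitions.get((state, label))
                if next_state is None:
                    continue  # the automaton has no move for this label
                if next_state in final_states:
                    result.add((source, target))
                if (target, next_state) not in seen:
                    seen.add((target, next_state))
                    queue.append((target, next_state))
    return result


pairs = evaluate_rpq(GRAPH, TRANSITIONS, START_STATE, FINAL_STATES)
print(sorted(pairs))
```

On this hypothetical graph the search yields the pairs (0, 0), (0, 2), (2, 0), and (2, 2). Collecting pairs in a set is what keeps only distinct (start, end) pairs, regardless of how many matching paths connect them.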

SLIDE 4

Processing strategies

Baseline
◮ Guided search with automaton on data graph
◮ Adjacency list on column store

MR-Index
◮ Store results of RPQs for future use
◮ Treat vertex pairs as edges of a reachability graph
◮ Use graph compression for reachability graph

[Figure: reachability graph over the vertices v0, v2, v3, and v6]
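
Treating result pairs as edges can be sketched as follows. This is an illustration only, under the assumption that vertex vN maps to the integer N; the function names and the dictionary used as an index keyed by query string are hypothetical and not taken from the talk.

```python
# Final result set of the (ab)+ example as (start, end) vertex pairs.
RESULT_PAIRS = [
    (2, 0), (2, 2), (2, 3),
    (3, 0), (3, 2), (3, 3),
    (6, 0), (6, 2), (6, 3),
]


def build_reachability_graph(pairs):
    """Turn (start, end) pairs into an adjacency list: start -> sorted ends."""
    adjacency = {}
    for start, end in pairs:
        adjacency.setdefault(start, set()).add(end)
    return {start: sorted(ends) for start, ends in adjacency.items()}


def to_matrix(adjacency, num_vertices):
    """Expand the adjacency list into a 0/1 adjacency matrix."""
    return [
        [1 if col in adjacency.get(row, ()) else 0 for col in range(num_vertices)]
        for row in range(num_vertices)
    ]


# Hypothetical index: stored results keyed by the query string.
index = {"(ab)+": build_reachability_graph(RESULT_PAIRS)}
matrix = to_matrix(index["(ab)+"], 7)
print(index["(ab)+"][2])  # ends reachable from v2
```

A later query for the same expression can then read the stored edges instead of searching the data graph again. The matrix form is the input that a graph compression scheme would work on.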

SLIDE 10

K²-tree graph compression

Compact representation of a binary relation, e.g., an adjacency matrix
Hierarchical graph compression

Adjacency matrix (7 vertices, padded to 8 × 8)

    0 1 2 3 4 5 6
0   0 0 0 0 0 0 0 0
1   0 0 0 0 0 0 0 0
2   1 0 1 1 0 0 0 0
3   1 0 1 1 0 0 0 0
4   0 0 0 0 0 0 0 0
5   0 0 0 0 0 0 0 0
6   1 0 1 1 0 0 0 0
    0 0 0 0 0 0 0 0

[Figure: conceptual K²-tree for k = 2]

Bitvector representation

T1 = 1010
T2 = 0011 0011
L  = 1010 1111 1000 1100
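
The bitvectors above can be reproduced with a short level-by-level construction. This is a minimal sketch under stated assumptions: the matrix side is a power of k, each level is kept as a separate bit list, and the lookup counts preceding 1-bits with a linear scan where a practical implementation would use a rank structure. The names build_k2_tree and contains are illustrative.

```python
def build_k2_tree(matrix, k=2):
    """Return one bit list per tree level, top level first, leaves last."""
    levels = []
    # Each block is (row offset, column offset, side length).
    blocks = [(0, 0, len(matrix))]
    while blocks and blocks[0][2] > 1:
        bits, next_blocks = [], []
        for row, col, size in blocks:
            sub = size // k
            for i in range(k):
                for j in range(k):
                    r, c = row + i * sub, col + j * sub
                    has_one = any(
                        matrix[x][y]
                        for x in range(r, r + sub)
                        for y in range(c, c + sub)
                    )
                    bits.append(1 if has_one else 0)
                    if has_one:
                        next_blocks.append((r, c, sub))  # only 1-blocks get children
        levels.append(bits)
        blocks = next_blocks
    return levels


def contains(levels, n, row, col, k=2):
    """Check whether cell (row, col) is set by descending one level at a time."""
    offset, size = 0, n
    for bits in levels:
        size //= k
        pos = offset + (row // size) * k + (col // size)
        if bits[pos] == 0:
            return False
        # Children of this node follow the children of all earlier 1-bits.
        offset = sum(bits[:pos]) * k * k
        row, col = row % size, col % size
    return True


N = 8  # 7 vertices, padded to the next power of k = 2
PAIRS = [(s, e) for s in (2, 3, 6) for e in (0, 2, 3)]
MATRIX = [[0] * N for _ in range(N)]
for s, e in PAIRS:
    MATRIX[s][e] = 1

levels = build_k2_tree(MATRIX)
for name, bits in zip(("T1", "T2", "L"), levels):
    print(name, "".join(map(str, bits)))
```

For the example matrix this prints T1 1010, T2 00110011, and L 1010111110001100, matching the slide. Blocks that contain only zeros are cut off at the level where they first appear, which is where the compression comes from.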

SLIDE 11

Measurements

LDBC social network dataset with scale factor 1 (3 million vertices and 17 million edges)
Generate all possible RPQs with path length up to 3 hops, and recursions of them

[Figure: runtime in ms over selectivity in 10^x for base, ADJ, K2, and K2C]

(ADJ - adjacency list, K2 - K²-tree, K2C - K²-tree with leaf compression)

SLIDE 12

Space consumption

[Figure: space consumption in MB over selectivity for ADJ, K2, and K2C]

Space consumption of queries

SLIDE 13

Measurements

Batch of 300 queries, memory budget of 10 GB

[Figure: time in 10^3 s for base, ADJ, K2, K2C, and best; number of uses for ADJ, K2, and K2C]

Batch processing with sampled query sets

SLIDE 14

Conclusion and Future Work

Graph compression is promising for storing reachability information
K²-trees are not beneficial for all result sets

◮ Too much overhead for tiny result sets
◮ No good compression for huge result sets
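
Both effects can be seen in a toy bit count. This is only a sketch under a simplified cost model that is an assumption here, not the measurement setup of the talk: it counts the bits of a plain K²-tree without rank structures or leaf compression and compares them with an explicit pair list and a raw bitmap. The name k2_tree_bits and the 8 × 8 matrices are illustrative.

```python
def k2_tree_bits(matrix, k=2):
    """Count the bits of a plain K2-tree for a square 0/1 matrix."""

    def has_one(row, col, size):
        return any(
            matrix[x][y]
            for x in range(row, row + size)
            for y in range(col, col + size)
        )

    def visit(row, col, size):
        sub = size // k
        total = k * k  # one bit per child block
        if sub > 1:
            for i in range(k):
                for j in range(k):
                    r, c = row + i * sub, col + j * sub
                    if has_one(r, c, sub):
                        total += visit(r, c, sub)  # only 1-blocks are expanded
        return total

    return visit(0, 0, len(matrix))


N = 8
BITS_PER_ID = 3  # enough to address 8 vertices

tiny = [[0] * N for _ in range(N)]
tiny[6][3] = 1                      # result set with a single pair
full = [[1] * N for _ in range(N)]  # every vertex reaches every vertex

tiny_tree, tiny_pairs = k2_tree_bits(tiny), 1 * 2 * BITS_PER_ID
full_tree, full_bitmap = k2_tree_bits(full), N * N
print(tiny_tree, tiny_pairs, full_tree, full_bitmap)
```

Under this model the single-pair result takes 12 bits as a tree but 6 bits as an explicit pair, and the all-ones matrix takes 84 bits as a tree but 64 bits as a raw bitmap. A plain K²-tree only prunes 0-regions, so dense 1-regions are stored in full at every level.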

Future Work
◮ Experiment with other K²-tree variants to compress uniform 1-regions as well
◮ Improve access time by providing specialized range queries, extracting submatrices
◮ Compare memory consumption and access time with other compact reachability indices like FERRARI

Thank You
