An Analysis of the Feasibility of Graph Compression Techniques for Indexing Regular Path Queries

Frank Tetzel, Hannes Voigt, Marcus Paradies, Wolfgang Lehner. May 19, 2017.


  1. An Analysis of the Feasibility of Graph Compression Techniques for Indexing Regular Path Queries. Frank Tetzel, Hannes Voigt, Marcus Paradies, Wolfgang Lehner. May 19, 2017.

  2. Regular Path Queries (RPQs). Matching paths conforming to a regular expression. Search in the data graph with an automaton representing (ab)+, with states s (start state), q1, and f (final state). Only distinct (start, end) vertex pairs are in the result set. Final result set (start: end): v2: v0, v2, v3; v3: v0, v2, v3; v6: v0, v2, v3. [Figure: data graph with vertices v0 to v6 and edges labeled a and b.]
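
The guided search on slide 2 can be sketched as a breadth-first search over (vertex, automaton state) pairs. This is only an illustration, not the authors' implementation: the function name `evaluate_rpq`, the edge list `EDGES`, and the transition table `AB_PLUS` are assumptions, and the small graph below is hypothetical because the edges of the slide's data graph cannot be recovered from the transcript.

```python
from collections import deque

# Automaton for (ab)+ : s -a-> q1 -b-> f -a-> q1, with f as the only final state.
AB_PLUS = {("s", "a"): "q1", ("q1", "b"): "f", ("f", "a"): "q1"}

# Hypothetical labeled data graph as (source, label, target) triples.
EDGES = [(0, "a", 1), (1, "b", 2), (2, "a", 3), (3, "b", 0)]


def evaluate_rpq(edges, transitions, start_state, final_states):
    """Return the distinct (start, end) pairs connected by a matching path."""
    adjacency = {}
    vertices = set()
    for src, label, dst in edges:
        adjacency.setdefault(src, []).append((label, dst))
        vertices.update((src, dst))

    result = set()
    for start in vertices:
        # Search the product of data graph and automaton from (start, start_state).
        seen = {(start, start_state)}
        queue = deque(seen)
        while queue:
            vertex, state = queue.popleft()
            for label, neighbor in adjacency.get(vertex, []):
                next_state = transitions.get((state, label))
                if next_state is None:
                    continue  # the automaton rejects this edge label here
                if next_state in final_states:
                    result.add((start, neighbor))
                if (neighbor, next_state) not in seen:
                    seen.add((neighbor, next_state))
                    queue.append((neighbor, next_state))
    return result


pairs = evaluate_rpq(EDGES, AB_PLUS, "s", {"f"})
print(sorted(pairs))
```

Tracking visited (vertex, state) pairs keeps the search finite on cyclic graphs, and collecting pairs in a set yields only distinct (start, end) results, as the slide requires.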

  3. Processing strategies. Baseline: guided search with automaton on the data graph; adjacency list on a column store. MR-Index: store results of RPQs for future use; treat vertex pairs as edges of a reachability graph; use graph compression for the reachability graph. [Figure: reachability graph.]
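
The idea of storing RPQ results as a reachability graph can be sketched as follows. The slides do not describe the internals of the MR-Index, so the class `RpqIndex` and the helper `to_reachability_graph` are hypothetical names and an uncompressed stand-in; the pairs used here are the (start, end) pairs as read from the result set on slide 2.

```python
def to_reachability_graph(pairs):
    """Treat (start, end) result pairs as edges and build an adjacency list."""
    adjacency = {}
    for start, end in pairs:
        adjacency.setdefault(start, set()).add(end)
    return {start: sorted(ends) for start, ends in adjacency.items()}


class RpqIndex:
    """Hypothetical index: maps an RPQ string to its stored reachability graph."""

    def __init__(self):
        self._graphs = {}

    def store(self, query, pairs):
        self._graphs[query] = to_reachability_graph(pairs)

    def lookup(self, query):
        """Return the stored (start, end) pairs, or None if the query is not indexed."""
        graph = self._graphs.get(query)
        if graph is None:
            return None
        return {(start, end) for start, ends in graph.items() for end in ends}


# Result pairs for (ab)+ as read from the slides: starts v2, v3, v6; ends v0, v2, v3.
pairs = [(s, e) for s in (2, 3, 6) for e in (0, 2, 3)]
index = RpqIndex()
index.store("(ab)+", pairs)
print(to_reachability_graph(pairs))
```

A later query that repeats or contains an indexed RPQ could read these pairs instead of searching the data graph again; the adjacency list here corresponds to the ADJ variant in the measurements, and a compressed structure could take its place.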

  5. K²-tree graph compression. Hierarchical graph compression. Compact representation of a binary relation, e.g., an adjacency matrix.

     Adjacency matrix (vertices 0 to 6, padded to 8 × 8):

         0 1 2 3 4 5 6
       0 0 0 0 0 0 0 0 0
       1 0 0 0 0 0 0 0 0
       2 1 0 1 1 0 0 0 0
       3 1 0 1 1 0 0 0 0
       4 0 0 0 0 0 0 0 0
       5 0 0 0 0 0 0 0 0
       6 1 0 1 1 0 0 0 0
         0 0 0 0 0 0 0 0

     Conceptual K²-tree for k = 2, bitvector representation:

       T1 = 1010
       T2 = 0011 0011
       L  = 1010 1111 1000 1100
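
The bitvectors on the slide can be reproduced with a short level-by-level construction. This is a minimal sketch under stated assumptions, not the authors' code: `build_k2_tree` and `has_edge` are illustrative names, the levels are kept as plain Python lists, and the rank operation is a linear scan rather than the constant-time rank structure a real K²-tree would use.

```python
def build_k2_tree(pairs, n, k=2):
    """Build the level bitvectors of a K^2-tree for an n x n binary relation."""
    size = 1
    while size < n:  # pad the matrix side to a power of k
        size *= k
    levels = []
    blocks = [(0, 0, list(pairs))]  # (row offset, column offset, cells inside)
    block_size = size
    while block_size > 1:
        child = block_size // k
        bits, next_blocks = [], []
        for row0, col0, cells in blocks:
            buckets = {}
            for r, c in cells:
                key = ((r - row0) // child, (c - col0) // child)
                buckets.setdefault(key, []).append((r, c))
            for i in range(k):  # children in row-major order
                for j in range(k):
                    sub = buckets.get((i, j))
                    bits.append(1 if sub else 0)
                    if sub:  # only non-empty submatrices are subdivided further
                        next_blocks.append((row0 + i * child, col0 + j * child, sub))
        levels.append(bits)
        blocks = next_blocks
        block_size = child
    return levels, size


def has_edge(levels, size, row, col, k=2):
    """Check a single cell by descending one level per step."""
    node = 0  # index of the current block among the non-empty blocks of its level
    block_size = size
    for bits in levels:
        child = block_size // k
        pos = node * k * k + (row // child) * k + (col // child)
        if bits[pos] == 0:
            return False
        node = sum(bits[:pos])  # rank: non-empty blocks before this one
        row, col = row % child, col % child
        block_size = child
    return True


# Relation from the slide: rows 2, 3, 6 have ones in columns 0, 2, 3.
pairs = [(s, e) for s in (2, 3, 6) for e in (0, 2, 3)]
levels, size = build_k2_tree(pairs, 7)
for name, bits in zip(("T1", "T2", "L"), levels):
    print(name, "".join(map(str, bits)))
```

For the slide's relation this yields T1 = 1010, T2 = 00110011, and L = 1010111110001100, matching the bitvector representation shown. Empty quadrants cost a single 0 bit, which is where the compression comes from.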

  11. Measurements. LDBC social network dataset with scale factor 1 (3 million vertices and 17 million edges). Generate all possible RPQs with path length up to 3 hops, and recursions of them. (ADJ: adjacency list, K2: K²-tree, K2C: K²-tree with leaf compression.) [Figure: runtime in ms (0 to 200) versus selectivity in 10^x (x from -11 to -4) for base, ADJ, K2, and K2C.]

  12. Space consumption. Space consumption of queries. [Figure: space consumption in MB versus selectivity for ADJ, K2, and K2C.]

  13. Measurements. Batch processing with sampled query sets. Batch of 300 queries, memory budget of 10 GB. [Figure: time in 10^3 s and number of uses for base, ADJ, K2, K2C, and best.]

  14. Conclusion and Future Work. Graph compression is promising for storing reachability information. K²-trees are not beneficial for all result sets: too much overhead for tiny result sets, and no good compression for huge result sets. Future Work: improve access time by providing specialized range queries, extracting submatrices; experiment with other K²-trees to compress uniform 1-regions as well; compare memory consumption and access time with other compact reachability indices like FERRARI. Thank You.
