
Experimental Study of Context-Free Path Query Evaluation Methods



  1. Experimental Study of Context-Free Path Query Evaluation Methods Jochem Kuijpers Fifth openCypher Implementers Meeting Berlin 2019

  2. Introduction ● MSc student CS & Eng. at TU/e ● Academic internship at Neo4j ● Supervised by: George Fletcher, Tobias Lindaaker, Nikolay Yakovets (TU/e Database Group and Neo4j) ● We implemented and evaluated four methods for computing context-free path query results

  3. Context-Free Grammars ● Example: the language of even-length palindromes over {a, b} = { ε, a a, b b, a a a a, a b b a, b a a b, … } ● A grammar that generates this language: S ⇒ a S a, S ⇒ b S b, S ⇒ ε

  4. Context-Free Grammars ● Example: the language of even-length palindromes over {a, b} = { ε, a a, b b, a a a a, a b b a, b a a b, … } ● A grammar that generates this language: S ⇒ a S a, S ⇒ b S b, S ⇒ ε ● Example derivation of the string a b b a: S ⇒ a S a ⇒ a b S b a ⇒ a b b a
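
The derivation of a b b a can be replayed mechanically. The following is a minimal sketch, assuming words are given as plain Python strings over the letters a and b; the function name `derive` is made up for this illustration and is not part of the presented work.

```python
def derive(word):
    """Return the sentential forms of a derivation of `word` with the rules
    S => a S a, S => b S b, S => epsilon, or None if `word` is not an
    even-length palindrome over {a, b}."""
    forms = ["S"]
    left, right, rest = "", "", word
    while rest:
        # Each step needs matching first and last letters (rule S => x S x).
        if len(rest) < 2 or rest[0] != rest[-1] or rest[0] not in "ab":
            return None
        left += rest[0]
        right = rest[-1] + right
        rest = rest[1:-1]
        forms.append(left + "S" + right)
    # Final step: apply S => epsilon.
    forms.append(left + right)
    return forms


if __name__ == "__main__":
    print(" => ".join(derive("abba")))
```

Words of odd length, such as a b a, are rejected because the middle letter has no partner to match.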


  9. Context-Free Path Query ● A query is a context-free grammar ● Grammar where terminals are edge-labels ● Find paths whose edge labels are accepted by the grammar
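
To make the last bullet concrete, the sketch below reads the edge labels off one path and tests the resulting word with a textbook CYK check. It is only an illustration: `path_word`, `cyk_accepts`, the rule encoding and the toy grammar S ⇒ A B, A ⇒ a, B ⇒ b are assumptions made for this example, and checking one path at a time is not how the evaluated methods work.

```python
def path_word(path):
    """Return the list of edge labels along `path`, a list of (u, label, v)
    edges, after checking that consecutive edges share a vertex."""
    for (_, _, v), (u_next, _, _) in zip(path, path[1:]):
        if v != u_next:
            raise ValueError("edges do not form a path")
    return [label for _, label, _ in path]


def cyk_accepts(word, terminal_rules, binary_rules, start="S"):
    """CYK membership test for a grammar in Chomsky Normal Form.
    terminal_rules maps a terminal to the non-terminals producing it;
    binary_rules is a list of (X, Y, Z) triples for rules X => Y Z."""
    n = len(word)
    if n == 0:
        return False  # this sketch has no epsilon rule
    # table[i][l - 1] holds the non-terminals deriving word[i:i + l].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, symbol in enumerate(word):
        table[i][0] = set(terminal_rules.get(symbol, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for x, y, z in binary_rules:
                    if y in left and z in right:
                        table[i][length - 1].add(x)
    return start in table[0][n - 1]


# Toy grammar (assumed for illustration): S => A B, A => a, B => b.
TERMINAL_RULES = {"a": ["A"], "b": ["B"]}
BINARY_RULES = [("S", "A", "B")]

if __name__ == "__main__":
    path = [(1, "a", 4), (4, "b", 2)]
    print(cyk_accepts(path_word(path), TERMINAL_RULES, BINARY_RULES))
```

A query evaluator has to do better than this, since a cyclic graph has infinitely many paths to test.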

  10. Context-Free Path Query ● Why? ● Increased expressiveness w.r.t. regular expressions (regular path query) ● Use-cases in ○ biological data analysis ○ static code analysis ○ …

  11. Our work ● We implemented four context-free path query evaluation methods ● Used Neo4j components ○ Graph store (vertices and edges) ○ PageCache ● Query evaluation is separately implemented on top of these components ○ (not integrated into Cypher)

  12. The evaluated methods 1. Annotating the context-free grammar Hellings, Jelle. "Path results for context-free grammar queries on graphs." arXiv preprint arXiv:1502.02242 (2015). 2. Matrix multiplication (GPGPU) Azimov, Rustam, and Semyon Grigorev. "Context-free path querying by matrix multiplication." Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA). ACM, 2018. 3. Adapted GLR (Tomita) parser Santos, Fred C., Umberto S. Costa, and Martin A. Musicante. "A Bottom-Up Algorithm for Answering Context-Free Path Queries in Graph Databases." International Conference on Web Engineering. Springer, Cham, 2018. 4. Adapted Earley parser Sevon, Petteri, and Lauri Eronen. "Subgraph queries by context-free grammars." Journal of Integrative Bioinformatics 5.2 (2008): 157-172.

  13. 1. Annotating the grammar ● Grammar in Chomsky Normal Form: S ⇒ A B, A ⇒ a, B ⇒ b ● Annotate the grammar: A[u,v] ⇔ there exists an A-path from u to v

  14. 1. Annotating the grammar ● Grammar in Chomsky Normal Form: S ⇒ A B, A ⇒ a, B ⇒ b ● Annotate the grammar from the edges of the example graph (a-edges 1→4, 2→1, 3→4; b-edges 2→3, 4→2): A[1,4], A[2,1], A[3,4]; B[2,3], B[4,2]

  15. 1. Annotating the grammar ● Grammar in Chomsky Normal Form: S ⇒ A B, A ⇒ a, B ⇒ b ● Annotate the grammar: A[1,4], A[2,1], A[3,4]; B[2,3], B[4,2] ● Combining A[u,w] and B[w,v] via S ⇒ A B gives S[1,2], S[3,2] ⇒ (1,2) and (3,2) are vertex pairs matching the grammar
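
The annotation steps above can be sketched as a worklist fixpoint. This is a naive illustration written for this example, not the implementation evaluated in the talk: it covers only the question of which vertex pairs match, and the function name `annotate` and the rule encoding are assumptions.

```python
from collections import deque


def annotate(edges, terminal_rules, binary_rules):
    """Compute all facts (X, u, v), meaning 'there is an X-path from u to v'.
    edges: iterable of (u, label, v); terminal_rules: {label: [X, ...]};
    binary_rules: list of (X, Y, Z) triples for rules X => Y Z."""
    facts = set()
    queue = deque()

    def add(fact):
        if fact not in facts:
            facts.add(fact)
            queue.append(fact)

    # Base facts: every edge labelled a yields A[u,v] for each rule A => a.
    for u, label, v in edges:
        for nt in terminal_rules.get(label, ()):
            add((nt, u, v))

    # Combine each new fact with all known facts, on either side of a rule.
    while queue:
        nt, u, v = queue.popleft()
        for x, y, z in binary_rules:
            if y == nt:  # Y[u,v] followed by some Z[v,w] gives X[u,w]
                for nt2, u2, v2 in list(facts):
                    if nt2 == z and u2 == v:
                        add((x, u, v2))
            if z == nt:  # some Y[w,u] followed by Z[u,v] gives X[w,v]
                for nt2, u2, v2 in list(facts):
                    if nt2 == y and v2 == u:
                        add((x, u2, v))
    return facts


# Example from the slides: a-edges 1->4, 2->1, 3->4 and b-edges 2->3, 4->2.
EDGES = [(1, "a", 4), (2, "a", 1), (3, "a", 4), (2, "b", 3), (4, "b", 2)]
TERMINAL_RULES = {"a": ["A"], "b": ["B"]}
BINARY_RULES = [("S", "A", "B")]

if __name__ == "__main__":
    facts = annotate(EDGES, TERMINAL_RULES, BINARY_RULES)
    print(sorted((u, v) for nt, u, v in facts if nt == "S"))
```

Scanning all known facts for every new fact keeps the sketch short; an index from vertices to facts would avoid the repeated scans.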

  16. 2. Matrix Multiplication ● Relation matrix representation of the annotated grammar method ● Each grammar non-terminal is stored in the matrix ● The step of combining X ⇒ Y Z is implemented as a “multiplication” ● Can be implemented on GPU
      Matrix before combining (row = start vertex, column = end vertex):
           1  2  3  4
        1  .  .  .  A
        2  A  .  B  .
        3  .  .  .  A
        4  .  B  .  .

  17. 2. Matrix Multiplication ● Relation matrix representation of the annotated grammar method ● Each grammar non-terminal is stored in the matrix ● The step of combining X ⇒ Y Z is implemented as a “multiplication” ● Can be implemented on GPU
      Matrix after combining S ⇒ A B (row = start vertex, column = end vertex):
           1  2  3  4
        1  .  S  .  A
        2  A  .  B  .
        3  .  S  .  A
        4  .  B  .  .
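
The “multiplication” step can be sketched on the CPU with a matrix whose cells are sets of non-terminals. This is a plain-Python illustration under assumed names (`cfpq_matrix` and the rule encoding are made up here); it does not reproduce the GPU implementation from the talk, and it updates the matrix in place until nothing changes.

```python
def cfpq_matrix(n, edges, terminal_rules, binary_rules):
    """Return an n x n matrix T where T[i][j] is the set of non-terminals
    that derive some path from vertex i+1 to vertex j+1.
    edges: iterable of (u, label, v) with vertices numbered 1..n;
    binary_rules: list of (X, Y, Z) triples for rules X => Y Z."""
    T = [[set() for _ in range(n)] for _ in range(n)]
    for u, label, v in edges:
        T[u - 1][v - 1].update(terminal_rules.get(label, ()))

    changed = True
    while changed:
        changed = False
        for i in range(n):
            for k in range(n):
                if not T[i][k]:
                    continue
                for j in range(n):
                    if not T[k][j]:
                        continue
                    # "Multiply" cells (i,k) and (k,j): X => Y Z fires when
                    # Y is in T[i][k] and Z is in T[k][j].
                    for x, y, z in binary_rules:
                        if y in T[i][k] and z in T[k][j] and x not in T[i][j]:
                            T[i][j].add(x)
                            changed = True
    return T


# Same example graph and grammar as in the annotated grammar slides.
EDGES = [(1, "a", 4), (2, "a", 1), (3, "a", 4), (2, "b", 3), (4, "b", 2)]
TERMINAL_RULES = {"a": ["A"], "b": ["B"]}
BINARY_RULES = [("S", "A", "B")]

if __name__ == "__main__":
    for row in cfpq_matrix(4, EDGES, TERMINAL_RULES, BINARY_RULES):
        print(["".join(sorted(cell)) or "." for cell in row])
```

Each pass of the outer loop plays the role of one matrix product followed by a union with the previous matrix, which is the part that maps naturally onto GPU hardware.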

  18. 3. Adapted GLR (Tomita) parser ● GLR is a generalization of LR parsers ● Use context-free grammars to parse input strings ● Whenever the parser has multiple options, the parse state is duplicated and both options are tested separately ● If at least one of these options leads to acceptance, the input is accepted ● Has a data structure that reduces duplicate work

  19. 3. Adapted GLR (Tomita) parser ● Adaptations for graph parsing instead of string parsing ● A separate parse state is initialized for each vertex ● Consumes edges instead of string symbols ● Accepting states reached at a vertex w are traced back to the vertex v where parsing started ○ Emits result (v,w) ● The data structure helps keep duplicate work low ● Under some conditions this algorithm terminates too early ○ It then fails to produce some results

  20. 4. Subgraph Parsing ● Similar to the previous method, this is a string parser (Earley parser) adapted for graph input ● Upon acceptance at vertex v, backtracking is used to find all paths that are accepted at v; these paths are added to a new graph ● Query result is the induced subgraph of accepted paths! ● Termination problem ○ This algorithm depends on a maximum length parameter to stop ○ This makes it unsuitable for matching paths of arbitrary length ○ Further: under some conditions it misses results or returns no results at all

  21. Results ● Grammar 1: S ⇒ A B C, A ⇒ a a, A ⇒ a⁻¹ a⁻¹, B ⇒ b B, B ⇒ b, C ⇒ c C c⁻¹, C ⇒ D, D ⇒ d

  22. Results ● Grammar 2: S ⇒ a X a⁻¹, X ⇒ b X b⁻¹, X ⇒ c X c⁻¹, X ⇒ d

  23. Results ● Highly ambiguous grammar: S ⇒ X, X ⇒ X X, X ⇒ a, X ⇒ b ● Tested on a small (a,b)-labeled graph of just 50 vertices
      Method                   Time (s)   Memory (MB)
      GLR (list)               2,798.6    3.15
      GLR (matrix)             372.0      2.36
      Ann. Gram (relational)   0.7        0.31
      Ann. Gram (arbitrary)    0.7        0.48
      Ann. Gram (shortest)     3.7        1.55
      Ann. Gram (all-path)     2.8        9.09
      Matrix Multiplication    0.1        < 0.01

  24. Conclusions ● CFPQ evaluation is not real-time ○ For a graph of 15,000 vertices, run time typically exceeds 1 hour ● Requires large amounts of memory ○ Grammar 2 at 5,000 vertices required multiple gigabytes of memory for most methods ● Annotating the grammar seems most promising ○ Robust, can handle ambiguous grammars well ○ Many possible query semantics ○ Running time: arbitrary path ≈ all-path

  25. Future work ● Specialized methods for more restrictive grammars could be much faster ● The annotated grammar and the matrix representation could serve as a path index or reachability index respectively ○ Related to path index work being done at Neo4j
