Experimental Study of Context-Free Path Query Evaluation Methods
Jochem Kuijpers
Fifth openCypher Implementers Meeting, Berlin 2019
Introduction
- MSc student CS & Eng. at TU/e
- Academic internship at Neo4j
- Supervised by:
George Fletcher, Tobias Lindaaker, and Nikolay Yakovets (TU/e Database Group and Neo4j)
- We implemented and evaluated four methods for computing
context-free path query results
Context-Free Grammars
Example: the language of even-length palindromes of {a, b}* = { ε, a a, b b, a a a a, a b b a, b a a b, … }
A grammar that accepts this language:
S ⇒ a S a
S ⇒ b S b
S ⇒ ε
Example derivation of the string a b b a:
S ⇒ a S a ⇒ a b S b a ⇒ a b b a
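As a small illustration, the three productions of the palindrome grammar can be read directly as a recursive recognizer. This is only a sketch; the function name `derives` is made up for this example and does not come from the talk.

```python
def derives(word: str) -> bool:
    """Return True if S can derive `word` with S => a S a | b S b | ε."""
    if word == "":
        return True  # S => ε
    if len(word) >= 2 and word[0] == word[-1] and word[0] in "ab":
        # S => a S a  or  S => b S b: strip the matching outer symbols
        return derives(word[1:-1])
    return False


accepted = [w for w in ["", "aa", "abba", "aba", "abab"] if derives(w)]
```

Here `accepted` keeps the empty string, `aa` and `abba`, and drops `aba` (odd length) and `abab` (not a palindrome).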
Context-Free Path Query
- A query is a context-free grammar
- The terminals of the grammar are edge labels
- Find paths whose sequence of edge labels is accepted by the grammar
Context-Free Path Query
- Why?
- Increased expressiveness w.r.t. regular expressions (regular path query)
- Use-cases in
○ biological data analysis
○ static code analysis
○ …
Our work
- We implemented four context-free path query evaluation methods
- Used Neo4j components
○ Graph store (vertices and edges)
○ PageCache
- Query evaluation is separately implemented on top of these components
○ (not integrated into Cypher)
The evaluated methods
1. Annotating the context-free grammar
Hellings, Jelle. "Path results for context-free grammar queries on graphs." arXiv preprint arXiv:1502.02242 (2015).
2. Matrix multiplication (GPGPU)
Azimov, Rustam, and Semyon Grigorev. "Context-free path querying by matrix multiplication." Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA). ACM, 2018.
3. Adapted GLR (Tomita) parser
Santos, Fred C., Umberto S. Costa, and Martin A. Musicante. "A Bottom-Up Algorithm for Answering Context-Free Path Queries in Graph Databases." International Conference on Web Engineering. Springer, Cham, 2018.
4. Adapted Earley parser
Sevon, Petteri, and Lauri Eronen. "Subgraph queries by context-free grammars." Journal of Integrative Bioinformatics 5.2 (2008): 157-172.
1. Annotating the grammar
Grammar in Chomsky Normal Form:
S ⇒ A B
A ⇒ a
B ⇒ b
Annotate the grammar: A[u,v] ⇔ there exists an A-path from u to v
A[1,4], A[2,1], A[3,4]
B[2,3], B[4,2]
S[1,2], S[3,2]
⇒ (1,2) and (3,2) are vertex pairs matching the grammar
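The annotation step can be sketched as a worklist loop that keeps deriving new facts X[u,v] until nothing changes. This is a minimal sketch and not the implementation evaluated in the talk. The function name `annotate` and the input encoding are assumptions, and the edge list is inferred from the A and B annotations on the slide, since the graph figure itself is not part of the text.

```python
from collections import deque


def annotate(edges, unary, binary):
    """Derive all facts (X, u, v), meaning "there is an X-path from u to v".

    edges:  iterable of (u, label, v)
    unary:  {terminal: [A, ...]} for rules A => terminal
    binary: list of (X, Y, Z) for rules X => Y Z (Chomsky Normal Form)
    """
    facts, work = set(), deque()

    def add(fact):
        if fact not in facts:
            facts.add(fact)
            work.append(fact)

    for u, label, v in edges:
        for nt in unary.get(label, ()):
            add((nt, u, v))

    while work:
        n, u, v = work.popleft()
        for x, y, z in binary:
            if y == n:  # fact is the left part Y[u,v]; look for Z[v,w]
                for m, p, w in list(facts):
                    if m == z and p == v:
                        add((x, u, w))
            if z == n:  # fact is the right part Z[u,v]; look for Y[w,u]
                for m, w, p in list(facts):
                    if m == y and p == u:
                        add((x, w, v))
    return facts


# Edges inferred from the annotations A[1,4], A[2,1], A[3,4], B[2,3], B[4,2].
edges = [(1, "a", 4), (2, "a", 1), (3, "a", 4), (2, "b", 3), (4, "b", 2)]
facts = annotate(edges, {"a": ["A"], "b": ["B"]}, [("S", "A", "B")])
pairs = {(u, v) for nt, u, v in facts if nt == "S"}
```

With these inputs `pairs` contains (1,2) and (3,2), the same vertex pairs as on the slide.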
2. Matrix Multiplication
- Relation matrix representation of the annotated grammar method
- Each grammar non-terminal is stored in the matrix
- The step of combining X ⇒ Y Z is implemented as a “multiplication”
- Can be implemented on GPU
Initial matrix (rows: source vertex, columns: target vertex):
  | 1 | 2 | 3 | 4
1 |   |   |   | A
2 | A |   | B |
3 |   |   |   | A
4 |   | B |   |
After combining S ⇒ A B:
  | 1 | 2 | 3 | 4
1 |   | S |   | A
2 | A |   | B |
3 |   | S |   | A
4 |   | B |   |
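The “multiplication” can be sketched with one Boolean matrix per non-terminal, where combining X ⇒ Y Z is a Boolean matrix product. This is a plain-Python CPU sketch under assumed names (`bool_matmul`, `cfpq_matrix`), not the GPGPU implementation from the talk. The edges are again inferred from the annotations of the example.

```python
def bool_matmul(y, z):
    """Boolean product: result[i][j] is True if some k has y[i][k] and z[k][j]."""
    n = len(y)
    return [[any(y[i][k] and z[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cfpq_matrix(n, edges, unary, binary):
    """Return {non-terminal: n x n Boolean matrix}; vertices are numbered 1..n."""
    nts = {nt for group in unary.values() for nt in group}
    nts |= {symbol for rule in binary for symbol in rule}
    m = {nt: [[False] * n for _ in range(n)] for nt in nts}
    for u, label, v in edges:
        for nt in unary.get(label, ()):
            m[nt][u - 1][v - 1] = True
    changed = True
    while changed:  # repeat until no rule adds a new entry
        changed = False
        for x, y, z in binary:
            product = bool_matmul(m[y], m[z])
            for i in range(n):
                for j in range(n):
                    if product[i][j] and not m[x][i][j]:
                        m[x][i][j] = True
                        changed = True
    return m


edges = [(1, "a", 4), (2, "a", 1), (3, "a", 4), (2, "b", 3), (4, "b", 2)]
matrices = cfpq_matrix(4, edges, {"a": ["A"], "b": ["B"]}, [("S", "A", "B")])
s_pairs = {(i + 1, j + 1)
           for i, row in enumerate(matrices["S"])
           for j, cell in enumerate(row) if cell}
```

The True entries of the S matrix are (1,2) and (3,2), matching the S entries that appear in the matrix after the multiplication step.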
3. Adapted GLR (Tomita) parser
- GLR is a generalization of LR parsers
- Uses context-free grammars to parse input strings
- Whenever the parser has multiple options, the parse state is duplicated and each option is tested separately
- If at least one of these options leads to acceptance, the input is accepted
- Has a data structure that reduces duplicate work
Adaptations for graph parsing instead of string parsing
- A separate parse state is initialized for each vertex
- Consumes edges instead of string symbols
- Accepting states reached at a vertex w are traced back to the vertex v where parsing started
○ Emits result (v,w)
- The data structure helps keep duplicate work low
- Under some conditions this algorithm terminates too early
○ It then fails to produce some results
4. Subgraph Parsing
- Similar to the previous method, this is a string parser (Earley parser) adapted for graph input
- Upon acceptance at vertex v, backtracking is used to find all paths that are accepted at v; these paths are added to a new graph
- Query result is the induced subgraph of accepted paths!
- Termination problem
○ This algorithm depends on a maximum length parameter to stop
○ This makes it unsuitable for matching paths of arbitrary length
○ Furthermore, under some conditions it misses results or returns no results at all
Results
Grammar 1:
S ⇒ A B C
A ⇒ a a
A ⇒ a⁻¹ a⁻¹
B ⇒ b B
B ⇒ b
C ⇒ c C c⁻¹
C ⇒ D
D ⇒ d
Results
Grammar 2:
S ⇒ a X a⁻¹
X ⇒ b X b⁻¹
X ⇒ c X c⁻¹
X ⇒ d
Results
Highly ambiguous grammar:
S ⇒ X
X ⇒ X X
X ⇒ a
X ⇒ b
Tested on a small (a,b)-labeled graph of just 50 vertices
Method | Time (s) | Memory (MB)
GLR (list) | 2,798.6 | 3.15
GLR (matrix) | 372.0 | 2.36
Annotated grammar (relational) | 0.7 | 0.31
Annotated grammar (arbitrary) | 0.7 | 0.48
Annotated grammar (shortest) | 3.7 | 1.55
Annotated grammar (all-path) | 2.8 | 9.09
Matrix Multiplication | 0.1 | < 0.01
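To give a feel for why the grammar S ⇒ X, X ⇒ X X, X ⇒ a, X ⇒ b is called highly ambiguous, the sketch below counts the distinct parse trees for a single fixed label sequence of n symbols. It only illustrates the grammar and does not reproduce the measurements. The name `parse_trees` is made up for this example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_trees(n: int) -> int:
    """Number of distinct parse trees of X over a fixed word of n terminals."""
    if n == 1:
        return 1  # X => a or X => b; the word fixes which one applies
    # X => X X: split the word into a left part of k symbols and the rest
    return sum(parse_trees(k) * parse_trees(n - k) for k in range(1, n))


counts = [parse_trees(n) for n in range(1, 8)]
```

The counts follow the Catalan numbers (1, 1, 2, 5, 14, 42, 132 for n = 1 to 7), so the number of ways to derive one label sequence grows very quickly with its length.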
Conclusions
- CFPQ evaluation is not real-time
○ For a graph of 15,000 vertices, run time typically exceeds 1 hour
- Requires large amounts of memory
○ Grammar 2 at 5,000 vertices required multiple gigabytes of memory for most methods
- Annotating the grammar seems most promising
○ Robust, can handle ambiguous grammars well
○ Many possible query semantics
○ Running time: arbitrary path ≈ all-path
Future work
- Specialized methods for more restrictive grammars could be much faster
- The annotated grammar and the matrix representation could serve as a path index or reachability index, respectively
○ Related to path index work being done at Neo4j