Experimental Study of Context-Free Path Query Evaluation Methods - PowerPoint PPT Presentation

Experimental Study of Context-Free Path Query Evaluation Methods
Jochem Kuijpers
Fifth openCypher Implementers Meeting, Berlin 2019

Introduction
● MSc student CS & Eng. at TU/e
● Academic internship at Neo4j
● Supervised by George Fletcher and Nikolay Yakovets (TU/e Database Group) and Tobias Lindaaker (Neo4j)
● We implemented and evaluated four methods for computing context-free path query results

Context-Free Grammars
Example: the language of even-length palindromes over {a, b} = { ε, a a, b b, a a a a, a b b a, b a a b, … }
A grammar that generates this language:
S ⇒ a S a
S ⇒ b S b
S ⇒ ε
Example derivation of the string a b b a:
S ⇒ a S a ⇒ a b S b a ⇒ a b b a

Context-Free Path Query
● A query is a context-free grammar
● The grammar's terminals are edge labels
● Find paths whose sequence of edge labels is accepted by the grammar
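To make these semantics concrete, here is a brute-force Python sketch; it is not one of the four evaluated methods. It assumes edges are given as (source, label, target) triples and uses the even-length palindrome grammar from the previous slide as the query. Because a graph with cycles has infinitely many paths, it only explores paths of at most a hypothetical `max_len` edges. It returns the vertex pairs (u, v) joined by at least one accepted path; the empty path counts, because of S ⇒ ε.

```python
from collections import defaultdict

def is_even_palindrome(labels):
    """Recognize the language of S => a S a | b S b | ε."""
    i, j = 0, len(labels) - 1
    while i < j:
        # S => a S a and S => b S b: outer symbols must match and be a or b
        if labels[i] != labels[j] or labels[i] not in ("a", "b"):
            return False
        i += 1
        j -= 1
    # Even length ends in S => ε; odd length leaves a middle symbol no rule derives
    return i > j

def cfpq_bruteforce(edges, max_len):
    """Pairs (u, v) joined by a path of at most max_len edges whose labels are accepted."""
    out = defaultdict(list)
    for u, label, v in edges:
        out[u].append((label, v))
    vertices = {u for u, _, _ in edges} | {v for _, _, v in edges}
    results = set()

    def walk(start, current, labels):
        if is_even_palindrome(labels):
            results.add((start, current))
        if len(labels) == max_len:
            return
        for label, nxt in out[current]:
            labels.append(label)
            walk(start, nxt, labels)
            labels.pop()

    for v in vertices:
        walk(v, v, [])
    return results

edges = [(1, "a", 2), (2, "b", 3), (3, "b", 4), (4, "a", 5)]
pairs = cfpq_bruteforce(edges, max_len=4)
print(sorted(p for p in pairs if p[0] != p[1]))  # [(1, 5), (2, 4)]
```

Here the path 1 → 5 spells a b b a and 2 → 4 spells b b. Every other non-empty path is rejected.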

Context-Free Path Query
● Why?
● Greater expressiveness than regular expressions (regular path queries)
● Use cases in
○ biological data analysis
○ static code analysis
○ …

Our work
● We implemented four context-free path query evaluation methods
● Used Neo4j components
○ Graph store (vertices and edges)
○ PageCache
● Query evaluation is implemented separately on top of these components
○ (not integrated into Cypher)

The evaluated methods
1. Annotating the context-free grammar
Hellings, Jelle. "Path results for context-free grammar queries on graphs." arXiv preprint arXiv:1502.02242 (2015).
2. Matrix multiplication (GPGPU)
Azimov, Rustam, and Semyon Grigorev. "Context-free path querying by matrix multiplication." Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA). ACM, 2018.
3. Adapted GLR (Tomita) parser
Santos, Fred C., Umberto S. Costa, and Martin A. Musicante. "A Bottom-Up Algorithm for Answering Context-Free Path Queries in Graph Databases." International Conference on Web Engineering. Springer, Cham, 2018.
4. Adapted Earley parser
Sevon, Petteri, and Lauri Eronen. "Subgraph queries by context-free grammars." Journal of Integrative Bioinformatics 5.2 (2008): 157-172.

1. Annotating the grammar
Grammar in Chomsky Normal Form:
S ⇒ A B
A ⇒ a
B ⇒ b
Annotate the grammar: A[u,v] ⇔ there exists an A-path from u to v (a path whose label sequence can be derived from A)
Annotations for the example graph:
A[1,4], A[2,1], A[3,4]
B[2,3], B[4,2]
S[1,2], S[3,2] ⇒ (1,2) and (3,2) are the vertex pairs matching the grammar
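The annotation step can be sketched as a worklist fixpoint. This is a minimal Python sketch in the spirit of Hellings' method, not the implementation that was evaluated. It assumes a Chomsky Normal Form grammar without ε-rules, given as hypothetical lists `terminal_rules` (pairs for A ⇒ a) and `binary_rules` (triples for A ⇒ B C), and edges as (source, label, target) triples. The example edges are the ones implied by the annotations above: since A ⇒ a and B ⇒ b are the only terminal rules, each A[u,v] is an a-edge and each B[u,v] is a b-edge.

```python
from collections import defaultdict

def annotate(edges, terminal_rules, binary_rules):
    """Return every annotation (A, u, v), i.e. every A-path from u to v."""
    known = set()
    starting_at = defaultdict(set)  # u -> {(A, v)}
    ending_at = defaultdict(set)    # v -> {(A, u)}
    work = []

    def add(a, u, v):
        if (a, u, v) not in known:
            known.add((a, u, v))
            starting_at[u].add((a, v))
            ending_at[v].add((a, u))
            work.append((a, u, v))

    # Rules A => a annotate every a-labelled edge.
    for u, label, v in edges:
        for a, t in terminal_rules:
            if t == label:
                add(a, u, v)

    # Rules A => B C combine annotations until nothing new appears.
    while work:
        b, u, v = work.pop()
        # Popped annotation as right child: C[w, u] and B[u, v] give A[w, v].
        for c, w in list(ending_at[u]):
            for a, left, right in binary_rules:
                if (left, right) == (c, b):
                    add(a, w, v)
        # Popped annotation as left child: B[u, v] and C[v, w] give A[u, w].
        for c, w in list(starting_at[v]):
            for a, left, right in binary_rules:
                if (left, right) == (b, c):
                    add(a, u, w)
    return known

edges = [(1, "a", 4), (2, "a", 1), (3, "a", 4), (2, "b", 3), (4, "b", 2)]
result = annotate(edges, [("A", "a"), ("B", "b")], [("S", "A", "B")])
print(sorted((u, v) for n, u, v in result if n == "S"))  # [(1, 2), (3, 2)]
```

Each annotation is indexed when it is first added, so whichever of two partners is popped later finds the other one; the `known` set keeps the loop finite.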

2. Matrix Multiplication
● Relation-matrix representation of the annotated grammar method
● Each grammar non-terminal is stored in the matrix
● The step of combining X ⇒ Y Z is implemented as a “multiplication”
● Can be implemented on a GPU
Matrix for the example (rows are source vertices, columns are target vertices; the two S entries are added by the multiplication step):
    1  2  3  4
1   .  S  .  A
2   A  .  B  .
3   .  S  .  A
4   .  B  .  .
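The matrix formulation can be sketched in plain Python (no GPU). This is an illustration under assumptions, not the evaluated implementation: the grammar is in Chomsky Normal Form without ε-rules, the rule lists and edge triples use a hypothetical format, and each cell T[i][j] holds a set of non-terminals, as in the matrix above. The loop repeats T ← T ∪ (T × T) until nothing changes. The "product" puts A in cell (i, j) whenever A ⇒ B C, B is in T[i][k], and C is in T[k][j]. A GPU version would more likely split T into one boolean matrix per non-terminal; that step is not shown here.

```python
def cfpq_matrix(n, edges, terminal_rules, binary_rules):
    """Vertices are numbered 1..n as on the slide; row and column 0 stay empty."""
    size = n + 1
    T = [[set() for _ in range(size)] for _ in range(size)]
    for u, label, v in edges:
        for a, t in terminal_rules:
            if t == label:
                T[u][v].add(a)

    changed = True
    while changed:
        changed = False
        # "Multiplication": A goes into (i, j) if A => B C, B in T[i][k], C in T[k][j].
        product = [[set() for _ in range(size)] for _ in range(size)]
        for i in range(size):
            for k in range(size):
                if not T[i][k]:
                    continue
                for j in range(size):
                    for a, b, c in binary_rules:
                        if b in T[i][k] and c in T[k][j]:
                            product[i][j].add(a)
        # Union the product into T; stop once a round adds nothing new.
        for i in range(size):
            for j in range(size):
                if not product[i][j] <= T[i][j]:
                    T[i][j] |= product[i][j]
                    changed = True
    return T

edges = [(1, "a", 4), (2, "a", 1), (3, "a", 4), (2, "b", 3), (4, "b", 2)]
T = cfpq_matrix(4, edges, [("A", "a"), ("B", "b")], [("S", "A", "B")])
for i in range(1, 5):
    print(i, [",".join(sorted(T[i][j])) or "." for j in range(1, 5)])
```

On the example this reproduces the matrix above: the first round adds S at (1, 2) and (3, 2), and the second round adds nothing, so the loop stops.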

3. Adapted GLR (Tomita) parser
● GLR is a generalization of LR parsers
● Uses a context-free grammar to parse input strings
● Whenever the parser has multiple options, the parse state is duplicated and each option is explored separately
● If at least one of these options leads to acceptance, the input is accepted
● A shared data structure (the graph-structured stack) reduces duplicate work

3. Adapted GLR (Tomita) parser
Adaptations for graph parsing instead of string parsing:
● A separate parse state is initialized for each vertex
● Consumes edges instead of string symbols
● An accepting state reached at vertex w is traced back to the vertex v where parsing started
○ Emits result (v, w)
● The data structure helps keep duplicate work low
● Under some conditions this algorithm terminates too early
○ It then fails to produce some results

4. Subgraph Parsing
● Like the previous method, this is a string parser (an Earley parser) adapted for graph input
● Upon acceptance at vertex v, backtracking finds all paths accepted at v, and these paths are added to a new graph
● The query result is the induced subgraph of the accepted paths
● Termination problem
○ The algorithm depends on a maximum length parameter to stop
○ This makes it unsuitable for matching paths of arbitrary length
○ Further, under some conditions it misses results or returns no results at all
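The result semantics of this method, and its dependence on a maximum length, can be illustrated with a brute-force sketch. This is not the Earley-based algorithm with backtracking described above. It enumerates every path of at most a hypothetical `max_len` edges from every vertex and tests each label sequence with a CYK recognizer, which is swapped in for brevity, so the grammar must be in Chomsky Normal Form without ε-rules. It returns the union of the edges of all accepted paths, which is the edge set of the result subgraph. The example reuses the graph and grammar from the annotation slides.

```python
from collections import defaultdict

def cyk_accepts(word, start, terminal_rules, binary_rules):
    """CYK membership test for a CNF grammar without ε-rules."""
    n = len(word)
    if n == 0:
        return False
    # table[i][length] = non-terminals deriving word[i:i + length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, symbol in enumerate(word):
        for a, t in terminal_rules:
            if t == symbol:
                table[i][1].add(a)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for a, b, c in binary_rules:
                    if b in left and c in right:
                        table[i][length].add(a)
    return start in table[0][n]

def accepted_subgraph(edges, start, terminal_rules, binary_rules, max_len):
    """Edges of all accepted paths with at most max_len edges."""
    out = defaultdict(list)
    for edge in edges:
        out[edge[0]].append(edge)
    vertices = {u for u, _, _ in edges} | {v for _, _, v in edges}
    accepted = set()

    def walk(current, path):
        labels = [label for _, label, _ in path]
        if path and cyk_accepts(labels, start, terminal_rules, binary_rules):
            accepted.update(path)
        if len(path) == max_len:
            return
        for edge in out[current]:
            path.append(edge)
            walk(edge[2], path)
            path.pop()

    for v in vertices:
        walk(v, [])
    return accepted

edges = [(1, "a", 4), (2, "a", 1), (3, "a", 4), (2, "b", 3), (4, "b", 2)]
rules = ([("A", "a"), ("B", "b")], [("S", "A", "B")])
print(sorted(accepted_subgraph(edges, "S", *rules, max_len=2)))
# [(1, 'a', 4), (3, 'a', 4), (4, 'b', 2)]
print(accepted_subgraph(edges, "S", *rules, max_len=1))  # set()
```

The second call shows how the answer depends on the length bound. With `max_len=1`, no single edge spells a b, so the result is empty.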

Results
Grammar 1:
S ⇒ A B C
A ⇒ a a
A ⇒ a⁻¹ a⁻¹
B ⇒ b B
B ⇒ b
C ⇒ c C c⁻¹
C ⇒ D
D ⇒ d

Results
Grammar 2:
S ⇒ a X a⁻¹
X ⇒ b X b⁻¹
X ⇒ c X c⁻¹
X ⇒ d
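The ⁻¹ superscripts in these grammars are usually read as traversing an edge against its direction: a⁻¹ matches an a-labelled edge walked from target to source. The slides do not say how the implementations handled this. One simple approach, assumed here, is to add a reversed copy of every edge with an inverse label, after which any of the methods can treat a⁻¹ as an ordinary terminal. The `^-1` suffix is a hypothetical naming choice.

```python
def with_inverse_edges(edges):
    """Return the edges plus, for each u -a-> v, a reverse edge v -a^-1-> u."""
    result = list(edges)
    for u, label, v in edges:
        result.append((v, label + "^-1", u))
    return result

edges = [(1, "a", 2), (2, "d", 3), (4, "a", 3)]
augmented = with_inverse_edges(edges)
print(augmented)
# [(1, 'a', 2), (2, 'd', 3), (4, 'a', 3), (2, 'a^-1', 1), (3, 'd^-1', 2), (3, 'a^-1', 4)]
```

In the augmented graph, the path 1 -a-> 2 -d-> 3 -a^-1-> 4 spells a d a⁻¹. Grammar 2 derives this string from S via X ⇒ d, so (1, 4) would be an answer.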

Results
Highly ambiguous grammar, tested on a small (a,b)-labeled graph of just 50 vertices:
S ⇒ X
X ⇒ X X
X ⇒ a
X ⇒ b

Method                          Time (s)   Memory (MB)
GLR (list)                      2,798.6    3.15
GLR (matrix)                    372.0      2.36
Annotated grammar (relational)  0.7        0.31
Annotated grammar (arbitrary)   0.7        0.48
Annotated grammar (shortest)    3.7        1.55
Annotated grammar (all-path)    2.8        9.09
Matrix Multiplication           0.1        < 0.01

Conclusions
● CFPQ evaluation is not real-time
○ For a graph of 15,000 vertices, run time typically exceeds 1 hour
● Requires large amounts of memory
○ Grammar 2 at 5,000 vertices required multiple gigabytes of memory for most methods
● Annotating the grammar seems most promising
○ Robust; handles ambiguous grammars well
○ Supports many query semantics
○ Running time: arbitrary path ≈ all-path
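The "shortest" semantics in the results table stores, for each annotation, the length of a shortest witnessing path. The slides do not describe how the evaluated implementation does this, so the following is only an illustration. An annotation worklist can be driven by a priority queue, following Knuth's generalization of Dijkstra's algorithm to grammars. The length of A[u,w] built from B[u,v] and C[v,w] is the sum of two positive lengths, so annotations can be finalized in order of increasing length. The grammar is assumed to be in Chomsky Normal Form without ε-rules, and the rule and edge formats are hypothetical. The example queries X from the highly ambiguous grammar directly, which avoids the unit rule S ⇒ X.

```python
import heapq
from collections import defaultdict

def shortest_annotations(edges, terminal_rules, binary_rules):
    """Return {(A, u, v): length of a shortest A-path from u to v}."""
    best = {}                     # tentative lengths
    done = {}                     # finalized lengths
    out_done = defaultdict(dict)  # u -> {(A, v): length}, finalized only
    in_done = defaultdict(dict)   # v -> {(A, u): length}, finalized only
    heap = []

    def relax(item, length):
        if item not in done and length < best.get(item, float("inf")):
            best[item] = length
            heapq.heappush(heap, (length, item))

    for u, label, v in edges:
        for a, t in terminal_rules:
            if t == label:
                relax((a, u, v), 1)

    while heap:
        length, item = heapq.heappop(heap)
        if item in done:
            continue  # stale queue entry
        done[item] = length
        b, u, v = item
        out_done[u][(b, v)] = length
        in_done[v][(b, u)] = length
        # Finalized item as right child: C[w, u] + B[u, v] -> A[w, v]
        for (c, w), lc in list(in_done[u].items()):
            for a, left, right in binary_rules:
                if (left, right) == (c, b):
                    relax((a, w, v), lc + length)
        # Finalized item as left child: B[u, v] + C[v, w] -> A[u, w]
        for (c, w), lc in list(out_done[v].items()):
            for a, left, right in binary_rules:
                if (left, right) == (b, c):
                    relax((a, u, w), length + lc)
    return done

edges = [(1, "a", 2), (2, "b", 3), (3, "a", 4), (1, "b", 4)]
lengths = shortest_annotations(edges, [("X", "a"), ("X", "b")], [("X", "X", "X")])
print(lengths[("X", 1, 4)], lengths[("X", 1, 3)])  # 1 2
```

X[1,4] has a three-edge witness (a b a) but also a direct b-edge, and the queue finalizes the shorter one first.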

Future work
● Specialized methods for more restrictive grammars could be much faster
● The annotated grammar and the matrix representation could serve as a path index and a reachability index, respectively
○ Related to path index work being done at Neo4j
