Experimental Study of Context-Free Path Query Evaluation Methods



slide-1
SLIDE 1

Experimental Study of Context-Free Path Query Evaluation Methods

Jochem Kuijpers

Fifth openCypher Implementers Meeting Berlin 2019

slide-2
SLIDE 2

Introduction

  • MSc student in CS & Eng. at TU/e
  • Academic internship at Neo4j
  • Supervised by:

○ George Fletcher (TU/e Database Group)
○ Tobias Lindaaker (Neo4j)
○ Nikolay Yakovets (TU/e Database Group)

  • We implemented and evaluated four methods for computing context-free path query results

slide-3
SLIDE 3

Context-Free Grammars

Example: the language of even-length palindromes over {a, b}:

{ ε, aa, bb, aaaa, abba, baab, … }

A grammar that accepts this language:

S ⇒ a S a
S ⇒ b S b
S ⇒ ε

slide-4
SLIDE 4

Context-Free Grammars

Example: the language of even-length palindromes over {a, b}:

{ ε, aa, bb, aaaa, abba, baab, … }

A grammar that accepts this language:

S ⇒ a S a
S ⇒ b S b
S ⇒ ε

Example derivation of the string a b b a:

S ⇒ a S a ⇒ a b S b a ⇒ a b b a
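The grammar on this slide can also be exercised in code. The following is a minimal sketch, not part of the talk: the function name `derive` is hypothetical, and it simply peels matching outer symbols off the input, which mirrors applying S ⇒ a S a or S ⇒ b S b from the outside in and S ⇒ ε at the end.

```python
def derive(word):
    """Return the sentential forms deriving `word` under
    S => a S a | b S b | epsilon, or None if the word is not in the language.
    (Hypothetical helper for illustration only.)"""
    steps = ["S"]
    left, right = "", ""
    rest = word
    while rest:
        # Each production adds the same symbol on both sides of S.
        if len(rest) < 2 or rest[0] != rest[-1] or rest[0] not in "ab":
            return None
        left += rest[0]
        right = rest[-1] + right
        rest = rest[1:-1]
        steps.append(left + "S" + right)
    # Final step: S => epsilon removes the non-terminal.
    steps.append(left + right)
    return steps


print(derive("abba"))
```

Odd-length inputs and inputs that are not palindromes yield `None`, since no production sequence can produce them.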

slide-9
SLIDE 9

Context-Free Path Query

  • A query is a context-free grammar
  • The grammar's terminals are edge labels
  • Find paths whose sequence of edge labels is accepted by the grammar
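To make the definition concrete, here is a deliberately naive sketch that enumerates paths up to a fixed length and tests each label word against the even-length palindrome grammar. It is not one of the evaluated methods; the toy graph, the function names, and the length bound are all assumptions made for illustration.

```python
# Hypothetical toy graph as (source, label, target) triples.
EDGES = [(1, "a", 2), (2, "b", 3), (3, "b", 4), (4, "a", 5)]


def is_even_palindrome(labels):
    """Membership test for S => a S a | b S b | epsilon."""
    labels = list(labels)
    return (len(labels) % 2 == 0
            and all(x in ("a", "b") for x in labels)
            and labels == labels[::-1])


def bounded_cfpq(edges, accepts, max_len):
    """Return vertex pairs (u, v) joined by a path of 1..max_len edges
    whose label word is accepted. Empty paths are ignored."""
    results = set()
    # Each frontier entry is (start vertex, end vertex, label word so far).
    frontier = [(s, t, (lab,)) for (s, lab, t) in edges]
    for _ in range(max_len):
        for (u, v, word) in frontier:
            if accepts(word):
                results.add((u, v))
        # Extend every path by one outgoing edge.
        frontier = [(u, t, word + (lab,))
                    for (u, v, word) in frontier
                    for (s, lab, t) in edges if s == v]
    return results


print(sorted(bounded_cfpq(EDGES, is_even_palindrome, 4)))
```

On this toy graph the accepted label words are b b (from vertex 2 to 4) and a b b a (from vertex 1 to 5). Enumerating paths like this grows exponentially on cyclic graphs, which is why dedicated evaluation methods are needed.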

slide-10
SLIDE 10

Context-Free Path Query

  • Why?
  • Increased expressiveness w.r.t. regular expressions (regular path query)
  • Use-cases in

○ biological data analysis
○ static code analysis
○ …

slide-11
SLIDE 11

Our work

  • We implemented four context-free path query evaluation methods
  • Used Neo4j components

○ Graph store (vertices and edges)
○ PageCache

  • Query evaluation is separately implemented on top of these components

○ (not integrated into Cypher)

slide-12
SLIDE 12

The evaluated methods

1. Annotating the context-free grammar

Hellings, Jelle. "Path results for context-free grammar queries on graphs." arXiv preprint arXiv:1502.02242 (2015).

2. Matrix multiplication (GPGPU)

Azimov, Rustam, and Semyon Grigorev. "Context-free path querying by matrix multiplication." Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA). ACM, 2018.

3. Adapted GLR (Tomita) parser

Santos, Fred C., Umberto S. Costa, and Martin A. Musicante. "A Bottom-Up Algorithm for Answering Context-Free Path Queries in Graph Databases." International Conference on Web Engineering. Springer, Cham, 2018.

4. Adapted Earley parser

Sevon, Petteri, and Lauri Eronen. "Subgraph queries by context-free grammars." Journal of Integrative Bioinformatics 5.2 (2008): 157-172.

slide-13
SLIDE 13

1. Annotating the grammar

Grammar in Chomsky Normal Form:

S ⇒ A B
A ⇒ a
B ⇒ b

Annotate the grammar: A[u,v] ⇔ there exists an A-path from u to v

slide-14
SLIDE 14

1. Annotating the grammar

Grammar in Chomsky Normal Form:

S ⇒ A B
A ⇒ a
B ⇒ b

Annotate the grammar:

A[1,4], A[2,1], A[3,4]
B[2,3], B[4,2]

slide-15
SLIDE 15

1. Annotating the grammar

Grammar in Chomsky Normal Form:

S ⇒ A B
A ⇒ a
B ⇒ b

Annotate the grammar:

A[1,4], A[2,1], A[3,4]
B[2,3], B[4,2]
S[1,2], S[3,2]

⇒ (1,2) and (3,2) are vertex pairs matching the grammar

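The annotation step can be sketched as a naive fixpoint computation. This is an illustrative sketch rather than the algorithm from the cited paper: the edge list is inferred from the annotations A[u,v] and B[u,v] shown on the slide, and the names `annotate`, `TERMINAL_RULES`, and `BINARY_RULES` are hypothetical.

```python
# Edges inferred from the slide's annotations: A => a and B => b each
# label a single edge, so A[1,4] implies an a-edge from 1 to 4, and so on.
EDGES = [(1, "a", 4), (2, "a", 1), (3, "a", 4), (2, "b", 3), (4, "b", 2)]
TERMINAL_RULES = {"a": ["A"], "b": ["B"]}   # rules of the form X => t
BINARY_RULES = [("S", "A", "B")]            # rules of the form X => Y Z


def annotate(edges, terminal_rules, binary_rules):
    """Return the set of facts (X, u, v), meaning an X-path exists from u to v."""
    facts = set()
    for (u, label, v) in edges:
        for x in terminal_rules.get(label, []):
            facts.add((x, u, v))
    changed = True
    while changed:  # repeat until no new fact can be derived
        changed = False
        for (x, y, z) in binary_rules:
            # X[u,v] holds if Y[u,w] and Z[w,v] hold for some vertex w.
            new = {(x, u, v)
                   for (y1, u, w) in facts if y1 == y
                   for (z1, w2, v) in facts if z1 == z and w2 == w}
            if not new <= facts:
                facts |= new
                changed = True
    return facts


facts = annotate(EDGES, TERMINAL_RULES, BINARY_RULES)
print(sorted((u, v) for (x, u, v) in facts if x == "S"))
```

The S-facts it derives are the vertex pairs (1,2) and (3,2) listed on the slide. A practical implementation would track only newly derived facts instead of rescanning the whole set on every pass.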
slide-16
SLIDE 16
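
2. Matrix Multiplication

  • Relation matrix representation of the annotated grammar method
  • Cell (u,v) of the matrix stores the non-terminals X for which X[u,v] holds
  • The step of combining X ⇒ Y Z is implemented as a “multiplication”
  • Can be implemented on GPU

Matrix for the annotations A[1,4], A[2,1], A[3,4], B[2,3], B[4,2] (rows: source vertex, columns: target vertex):

     1   2   3   4
1    .   .   .   A
2    A   .   B   .
3    .   .   .   A
4    .   B   .   .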

slide-17
SLIDE 17
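
2. Matrix Multiplication

  • Relation matrix representation of the annotated grammar method
  • Cell (u,v) of the matrix stores the non-terminals X for which X[u,v] holds
  • The step of combining X ⇒ Y Z is implemented as a “multiplication”
  • Can be implemented on GPU

Matrix after applying S ⇒ A B, which adds S[1,2] and S[3,2] (rows: source vertex, columns: target vertex):

     1   2   3   4
1    .   S   .   A
2    A   .   B   .
3    .   S   .   A
4    .   B   .   .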
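The “multiplication” step can be sketched on the CPU with Boolean matrices. This sketch makes two assumptions that differ from the slide: it keeps one Boolean matrix per non-terminal instead of a single matrix of non-terminal sets, and it runs in plain Python rather than on a GPU. All names are hypothetical, and the edges are the ones implied by the annotations of the grammar-annotation example.

```python
EDGES = [(1, "a", 4), (2, "a", 1), (3, "a", 4), (2, "b", 3), (4, "b", 2)]
TERMINAL_RULES = {"a": ["A"], "b": ["B"]}   # X => t
BINARY_RULES = [("S", "A", "B")]            # X => Y Z


def bool_matmul(y, z):
    """Boolean matrix product: result[i][j] is True if y[i][k] and z[k][j] for some k."""
    n = len(y)
    return [[any(y[i][k] and z[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cfpq_matrix(n, edges, terminal_rules, binary_rules):
    """Return {non-terminal: n x n Boolean matrix}; vertices are numbered 1..n."""
    mats = {}

    def mat(x):
        return mats.setdefault(x, [[False] * n for _ in range(n)])

    for (u, label, v) in edges:
        for x in terminal_rules.get(label, []):
            mat(x)[u - 1][v - 1] = True
    changed = True
    while changed:  # iterate until no matrix gains a new entry
        changed = False
        for (x, y, z) in binary_rules:
            product = bool_matmul(mat(y), mat(z))
            target = mat(x)
            for i in range(n):
                for j in range(n):
                    if product[i][j] and not target[i][j]:
                        target[i][j] = True
                        changed = True
    return mats


def pairs(matrix):
    """List the (source, target) vertex pairs set in a Boolean matrix."""
    return {(i + 1, j + 1)
            for i, row in enumerate(matrix)
            for j, cell in enumerate(row) if cell}


result = cfpq_matrix(4, EDGES, TERMINAL_RULES, BINARY_RULES)
print(sorted(pairs(result["S"])))
```

Because each update is a matrix product followed by an element-wise OR, the same loop maps naturally onto GPU matrix kernels, which is presumably what makes this formulation attractive for GPGPU execution.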

slide-18
SLIDE 18

3. Adapted GLR (Tomita) parser

  • GLR is a generalization of LR parsing
  • Uses a context-free grammar to parse input strings
  • Whenever the parser has multiple options, the parse state is duplicated and each option is explored separately
  • If at least one of these options leads to acceptance, the input is accepted
  • Uses a shared data structure (a graph-structured stack) to reduce duplicate work

slide-19
SLIDE 19

3. Adapted GLR (Tomita) parser

Adaptations for graph parsing instead of string parsing

  • A separate parse state is initialized for each vertex
  • Consumes edges instead of string symbols
  • An accepting state reached at vertex w is traced back to the vertex v where parsing started

○ Emits result (v,w)

  • The data structure helps keep duplicate work low
  • Under some conditions this algorithm terminates too early

○ It then fails to produce some results

slide-20
SLIDE 20

4. Subgraph Parsing

  • Like the previous method, this is a string parser (Earley parser) adapted for graph input
  • Upon acceptance at vertex v, backtracking finds all accepted paths that end at v; these paths are added to a new graph
  • Query result is the induced subgraph of accepted paths!
  • Termination problem

○ This algorithm depends on a maximum length parameter to stop
○ This makes it unsuitable for matching paths of arbitrary length
○ Further: under some conditions it misses results or returns no results at all

slide-21
SLIDE 21

Results

Grammar 1:

S ⇒ A B C
A ⇒ a a
A ⇒ a⁻¹ a⁻¹
B ⇒ b B
B ⇒ b
C ⇒ c C c⁻¹
C ⇒ D
D ⇒ d

slide-22
SLIDE 22

Results

Grammar 2:

S ⇒ a X a⁻¹
X ⇒ b X b⁻¹
X ⇒ c X c⁻¹
X ⇒ d
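The grammars above use inverse labels such as a⁻¹. The slides do not define them; the sketch below assumes the common convention that a⁻¹ means traversing an a-edge against its direction. The string encoding `"a^-1"` and both function names are hypothetical, chosen only for this illustration.

```python
def with_inverse_edges(edges):
    """Add a reversed edge labeled 'x^-1' for every edge labeled 'x'.
    Assumes inverse labels denote backward traversal (not stated on the slides)."""
    return edges + [(t, lab + "^-1", s) for (s, lab, t) in edges]


def matches_grammar2(word):
    """Membership test for Grammar 2:
    S => a X a^-1,  X => b X b^-1 | c X c^-1 | d."""
    word = tuple(word)

    def is_x(w):
        if w == ("d",):
            return True
        # X wraps a smaller X in a matching b ... b^-1 or c ... c^-1 pair.
        return (len(w) >= 3 and w[0] in ("b", "c")
                and w[-1] == w[0] + "^-1" and is_x(w[1:-1]))

    return (len(word) >= 3 and word[0] == "a"
            and word[-1] == "a^-1" and is_x(word[1:-1]))


print(with_inverse_edges([(1, "a", 2)]))
print(matches_grammar2(["a", "b", "d", "b^-1", "a^-1"]))
```

Under this reading, Grammar 2 accepts label words whose opening labels are closed in reverse order around a single d, much like balanced brackets.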

slide-23
SLIDE 23

Results

Highly ambiguous grammar:

S ⇒ X
X ⇒ X X
X ⇒ a
X ⇒ b

Tested on a small (a,b)-labeled graph of just 50 vertices

Method                    Time (s)   Memory (MB)
GLR (list)                 2,798.6          3.15
GLR (matrix)                 372.0          2.36
Ann. Gram (relational)         0.7          0.31
Ann. Gram (arbitrary)          0.7          0.48
Ann. Gram (shortest)           3.7          1.55
Ann. Gram (all-path)           2.8          9.09
Matrix Multiplication          0.1        < 0.01
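To see why this grammar is called highly ambiguous, one can count its parse trees. The sketch below is an added illustration, not a result from the talk: under X ⇒ X X | a | b, a word of n symbols can be split at any position, so the number of parse trees follows the Catalan recurrence. The function name is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_parse_trees(n):
    """Number of parse trees of X => X X | a | b for any fixed word of n >= 1 symbols."""
    if n == 1:
        return 1  # a single symbol is derived by X => a or X => b
    # X => X X splits the word into a left part of k symbols and a right part of n - k.
    return sum(count_parse_trees(k) * count_parse_trees(n - k) for k in range(1, n))


print([count_parse_trees(n) for n in range(1, 6)])
```

The count grows exponentially with path length, which gives one plausible reading of why parser-based methods that explore alternatives separately are so much slower here than methods that only record whether a fact holds.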

slide-24
SLIDE 24

Conclusions

  • Context-free path query (CFPQ) evaluation is not real-time

○ For a graph of 15,000 vertices, run time typically exceeds 1 hour

  • Requires large amounts of memory

○ Grammar 2 at 5,000 vertices required multiple gigabytes of memory for most methods

  • Annotating the grammar seems most promising

○ Robust, can handle ambiguous grammars well
○ Many possible query semantics
○ Running time: arbitrary path ≈ all-path

slide-25
SLIDE 25

Future work

  • Specialized methods for more restrictive grammars could be much faster
  • The annotated grammar and the matrix representation could serve as a path index or reachability index, respectively

○ Related to path index work being done at Neo4j