Efficient Regular Path Query Evaluation in PGX

SLIDE 1

Efficient Regular Path Query Evaluation in PGX

Author: Xuming Meng

Supervisor: dr. G.H.L. Fletcher

15-08-2016

SLIDE 2

Introduction & Problem Statement

Regular Path Query (RPQ) in PGX.

  • PGX: an in-memory parallel graph analytics framework developed by Oracle Labs
  • Space requirement
  • Performance requirement
  • Commitment to deliver a result
SLIDE 3

Introduction & Problem Statement

RPQ: (X, knows∘like+∘(like*∘dislike)+, Y)

Three types of clauses:

  • Non-Kleene star clause, e.g. knows
  • Non-nested Kleene star clause, e.g. like+
  • Nested Kleene star clause, e.g. (like*∘dislike)+

Algorithm & possible optimizations:

  • Naive: search in the graph by standard algorithms, such as BFS or DFS
  • Cache: speed-up with materialization (space/speed trade-off)
  • Context-specific: specialized in-memory search
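
As a rough illustration of the naive option, the sketch below evaluates the prefix knows∘like+ of the example query with a plain BFS over a tiny labeled digraph. The edge-triple representation and the helper names (`build_index`, `step`, `plus`) are assumptions made for this example and are not taken from PGX.

```python
from collections import defaultdict, deque


def build_index(edges):
    """Index (source, label, target) triples as label -> source -> [targets]."""
    index = defaultdict(lambda: defaultdict(list))
    for src, label, dst in edges:
        index[label][src].append(dst)
    return index


def step(index, label, sources):
    """Nodes reachable from `sources` by exactly one `label` edge."""
    return {dst for s in sources for dst in index[label].get(s, ())}


def plus(index, label, sources):
    """Nodes reachable from `sources` by one or more `label` edges (BFS)."""
    reached = step(index, label, sources)
    queue = deque(reached)
    while queue:
        node = queue.popleft()
        for dst in index[label].get(node, ()):
            if dst not in reached:
                reached.add(dst)
                queue.append(dst)
    return reached


# Hypothetical toy graph, for illustration only.
edges = [
    ("a", "knows", "b"),
    ("b", "like", "c"),
    ("c", "like", "d"),
    ("d", "dislike", "e"),
]
index = build_index(edges)

# Bindings of Y for X = "a" under knows∘like+
targets = plus(index, "like", step(index, "knows", {"a"}))
```

Every call to `plus` repeats the traversal from scratch, which is the cost that the cache and context-specific options try to avoid.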
SLIDE 4

Existing Approaches

Index-based

  • k-path index (Fletcher et al. 2016)
  • Reachability index (Gubichev et al. 2013)

Automata-based

  • Automata-based approach (Koschmieder et al. 2012)

Datalog-based

  • Datalog-based relational database (Dey et al. 2013)

Transitive Closure-based

  • Full Transitive Closure (Agrawal 1988)

General Drawbacks

  • Large potential intermediate results
  • Impractical precomputation cost
SLIDE 5

RPQ Operator Design

How can transitive closure algorithms be adapted to evaluate non-nested Kleene star clauses on labeled digraphs?

SLIDE 6

RPQ Operator Design

RPQ: (X, dislike+, Y)

Materializing dislike

Reachability Graph (R.G.)
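
The slide does not define the R.G. formally. The sketch below assumes it is the unlabeled subgraph made of the edges that carry the clause's label, so that an ordinary transitive closure routine can run on it. The function names and the toy edges are hypothetical.

```python
from collections import defaultdict, deque


def materialize_rg(edges, label):
    """Copy only the `label` edges into an unlabeled adjacency list."""
    rg = defaultdict(list)
    for src, lab, dst in edges:
        if lab == label:
            rg[src].append(dst)
    return dict(rg)


def transitive_closure(rg):
    """All (x, y) pairs such that y is reachable from x by one or more edges."""
    closure = set()
    for start in rg:
        seen = set()
        queue = deque(rg[start])
        while queue:
            node = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            queue.extend(rg.get(node, ()))
        closure.update((start, node) for node in seen)
    return closure


# Hypothetical toy graph, for illustration only.
edges = [
    ("a", "dislike", "b"),
    ("b", "dislike", "c"),
    ("a", "like", "c"),
    ("c", "like", "d"),
]
rg = materialize_rg(edges, "dislike")
pairs = transitive_closure(rg)  # answers to (X, dislike+, Y)
```

Once the label filter has been applied during materialization, the closure step no longer needs to inspect labels at all.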

SLIDE 7

RPQ Operator Design

Question: what if there is not enough memory for the R.G.?

Materializing dislike

Virtual Reachability Graph
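
One plausible reading of a virtual reachability graph, sketched below, is a view that filters edges by label at traversal time instead of copying them, trading extra work per neighbor lookup for memory. This is an assumption about the design, not a description of the actual PGX operator; `VirtualRG` and `reachable` are hypothetical names.

```python
from collections import deque


class VirtualRG:
    """Read-only view over the base graph that exposes only `label` edges."""

    def __init__(self, adjacency, label):
        self.adjacency = adjacency  # node -> list of (label, target) pairs
        self.label = label

    def neighbors(self, node):
        # Filter on the fly; no edge is copied into a separate structure.
        for lab, dst in self.adjacency.get(node, ()):
            if lab == self.label:
                yield dst


def reachable(view, source):
    """Nodes reachable from `source` by one or more edges of the view."""
    seen = set()
    queue = deque(view.neighbors(source))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(view.neighbors(node))
    return seen


# Hypothetical toy graph, for illustration only.
adjacency = {
    "a": [("dislike", "b"), ("like", "c")],
    "b": [("dislike", "c")],
    "c": [("like", "d")],
}
view = VirtualRG(adjacency, "dislike")
result = reachable(view, "a")
```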

SLIDE 8

Size Estimation Overview

Non-Kleene

  • Capturing correlations between labels in paths is critical to a precise estimate
  • We adopt the method of Aboulnaga et al. (2001), which captures a certain degree of correlation between edge labels in paths
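
To make the correlation idea concrete, the sketch below uses an order-2 Markov table in the spirit of Aboulnaga et al. (2001): it stores exact counts for label paths of length 1 and 2 and estimates longer paths as f(l1/l2/l3) ≈ f(l1/l2) · f(l2/l3) / f(l2). The slide does not say which variant was used, so the table order, the function names and the toy edges are all assumptions.

```python
from collections import Counter, defaultdict


def markov_table(edges):
    """Count label paths of length 1 and 2 in a labeled digraph."""
    out_edges = defaultdict(list)
    for src, label, dst in edges:
        out_edges[src].append((label, dst))
    table = Counter()
    for src, l1, mid in edges:
        table[(l1,)] += 1
        for l2, _ in out_edges.get(mid, ()):
            table[(l1, l2)] += 1
    return table


def estimate(table, labels):
    """Estimate the number of paths that spell out `labels`."""
    labels = tuple(labels)
    if len(labels) <= 2:
        return float(table[labels])  # stored exactly
    est = float(table[labels[:2]])
    for i in range(1, len(labels) - 1):
        denom = table[(labels[i],)]
        if denom == 0:
            return 0.0
        # Multiply by the conditional frequency of the next label.
        est *= table[(labels[i], labels[i + 1])] / denom
    return est


# Hypothetical toy graph, for illustration only.
edges = [
    ("a", "knows", "b"),
    ("b", "like", "c"),
    ("b", "like", "d"),
    ("c", "dislike", "e"),
]
table = markov_table(edges)
size = estimate(table, ["knows", "like", "dislike"])
```

Treating the labels as independent would ignore that only one of the two like edges is followed by a dislike edge; the pair counts retain that information.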

Kleene star

  • Need estimates for transitive closures, e.g. like+
  • Traditional methods produce poor estimates due to lack of deduction
  • We use the min-hash sketch (Cohen 1997) for estimation
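
The following is a minimal sketch of the least-rank estimator from Cohen (1997), assuming exponentially distributed ranks: in each of k rounds every node draws a random rank, each node records the smallest rank among the nodes it can reach (itself included), and the reachable-set size is estimated as (k − 1) divided by the sum of those minima. The function name, the parameter defaults and the chain graph are assumptions for illustration, and the estimate counts the node itself, unlike a strict like+ closure.

```python
import random
from collections import defaultdict


def estimate_reach_sizes(nodes, edges, k=64, seed=0):
    """Estimate, per node, how many nodes it can reach (itself included)."""
    rng = random.Random(seed)
    reverse = defaultdict(list)
    for src, dst in edges:
        reverse[dst].append(src)
    sums = {v: 0.0 for v in nodes}
    for _ in range(k):
        rank = {v: rng.expovariate(1.0) for v in nodes}
        assigned = set()
        # Visit nodes by increasing rank; a reverse search from u gives
        # every not-yet-assigned ancestor of u the minimum rank[u].
        for u in sorted(nodes, key=rank.__getitem__):
            if u in assigned:
                continue
            assigned.add(u)
            stack = [u]
            while stack:
                node = stack.pop()
                sums[node] += rank[u]
                for pred in reverse.get(node, ()):
                    if pred not in assigned:
                        assigned.add(pred)
                        stack.append(pred)
    return {v: (k - 1) / sums[v] for v in nodes}


# Hypothetical chain 0 -> 1 -> ... -> 9: node 0 reaches 10 nodes, node 9 reaches 1.
nodes = list(range(10))
edges = [(i, i + 1) for i in range(9)]
estimates = estimate_reach_sizes(nodes, edges, k=256)
```

Each round touches every node and edge at most once, so the cost is k linear passes rather than a full transitive closure computation. Accuracy improves as k grows.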
SLIDE 9

RPQ Life Cycle

[Flowchart; node labels: RPQ input; Obtain clause; Clause type; Non-Kleene star clause evaluation; Nested Kleene star clause evaluation; R.G. size estimate; Query plan; R.G. construction; TC evaluation; Result merging; Next clause available (Y/N); Return result]

SLIDE 10

RPQ Operator Implementation

Depending on whether the R.G. has the small-world property, one of two BFS variants is used:

  • Bitmap-based BFS (Yang and Zaniolo 2014)
  • Multi-source BFS (Then et al. 2014)
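
The core idea behind multi-source BFS can be sketched as follows: one bit per source, so that a single pass over a node's neighbors advances all traversals that currently sit on that node. Python integers stand in for the fixed-width bit registers a real implementation would use, and the function names are hypothetical; this is not the PGX operator.

```python
def multi_source_bfs(adjacency, sources):
    """Run BFS from all sources together.

    Returns node -> bitset, where bit i is set if sources[i] reaches the
    node by one or more edges.
    """
    seen = {}
    frontier = {}
    for i, s in enumerate(sources):
        frontier[s] = frontier.get(s, 0) | (1 << i)
    while frontier:
        next_frontier = {}
        for node, bits in frontier.items():
            for nbr in adjacency.get(node, ()):
                # Traversals on `node` that have not visited `nbr` yet.
                new = bits & ~seen.get(nbr, 0)
                if new:
                    seen[nbr] = seen.get(nbr, 0) | new
                    next_frontier[nbr] = next_frontier.get(nbr, 0) | new
        frontier = next_frontier
    return seen


def decode(seen, sources):
    """Turn the bitsets back into sets of source nodes."""
    return {
        node: {s for i, s in enumerate(sources) if (bits >> i) & 1}
        for node, bits in seen.items()
    }


# Hypothetical toy R.G.: a cycle a -> b -> c -> a, plus d -> c.
adjacency = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["c"]}
sources = ["a", "d"]
reached_by = decode(multi_source_bfs(adjacency, sources), sources)
```

Sharing works best when many traversals overlap early. That presumably is why the choice depends on the small-world property, though the slide does not say which variant is used in which case.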
SLIDE 11

Experiments & Result analysis

Objectives

  • Effectiveness of materializing the reachability graph
  • Performance impact of reachability graph construction
  • Performance impact of reachability graph type and algorithm choice

Note: all queries are designed with a Kleene star clause. Below, only results from the LDBC dataset are presented.

SLIDE 12

Experiments & Result analysis

SLIDE 13

Experiments & Result analysis

SLIDE 14

Conclusion & Future work

Achievements

  • Boosting RPQ evaluation using partial materialization
  • Switching physical TC operator depending on graph type
  • Trading performance for space if necessary

Possible Improvements

  • A better query estimation method
  • An efficient in-memory RPQ evaluation solution without R.G.
  • Facilitating graph traversal with effective cache usage
SLIDE 15

Thank You