

slide-1
SLIDE 1

1/19

From relation algebra to semi-join algebra: an approach for graph query optimization

Jelle Hellings¹, Catherine L. Pilachowski², Dirk Van Gucht², Marc Gyssens¹, Yuqing Wu³

¹ Hasselt University, ² Indiana University, ³ Pomona College

slide-2
SLIDE 2

2/19

Graph queries: data model

[Figure: example graph with person nodes Alice, Bob, Carol, Dan, Faythe, Grace, Peggy, Victor, and Wendy, connected by edges labeled ParentOf, FriendOf, and WorksWith.]

slide-3
SLIDE 3

3/19

Graph queries: basic path queries

[Figure: example graph with person nodes Alice, Bob, Carol, Dan, Faythe, Grace, Peggy, Victor, and Wendy, connected by edges labeled ParentOf, FriendOf, and WorksWith.]

(WorksWith ∪ FriendOf) ◦ [ParentOf]⁺ ◦ FriendOf
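
A minimal sketch of how such a path query could be evaluated over edge relations stored as sets of pairs. The edges below are hypothetical (the slide's actual edges are not recoverable from the figure), and `compose` and `transitive_closure` are illustrative helper names, not part of any library.

```python
def compose(r, s):
    """R ◦ S: pairs (a, c) with (a, b) in R and (b, c) in S for some b."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def transitive_closure(r):
    """[R]+: extend paths by one edge at a time until nothing new appears."""
    closure = set(r)
    while True:
        extended = closure | compose(closure, r)
        if extended == closure:
            return closure
        closure = extended


# Hypothetical edge relations, chosen only for illustration.
works_with = {("Victor", "Alice")}
friend_of = {("Peggy", "Alice"), ("Carol", "Wendy")}
parent_of = {("Alice", "Bob"), ("Bob", "Carol")}

# (WorksWith ∪ FriendOf) ◦ [ParentOf]+ ◦ FriendOf
result = compose(
    compose(works_with | friend_of, transitive_closure(parent_of)),
    friend_of,
)
print(sorted(result))
```

On this data the query pairs Victor and Peggy with Wendy: both reach Alice, Alice is an ancestor of Carol, and Carol is a friend of Wendy.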

slide-5
SLIDE 5

4/19

Graph queries: node-tests and branching

[Figure: example graph with person nodes Alice, Bob, Carol, Dan, Faythe, Grace, Peggy, Victor, and Wendy, connected by edges labeled ParentOf, FriendOf, and WorksWith.]

π1[ParentOf ◦ ParentOf ◦ ParentOf] ◦ FriendOf

slide-7
SLIDE 7

5/19

Graph querying: relation algebra

◮ RPQs: id, ∪, ◦, +
◮ 2RPQs: additionally ⁻¹ (converse)
◮ Nested RPQs: additionally π
◮ Navigational XPath, Graph XPath: additionally π̄, ∩, −
◮ FO[3] + transitive closure: additionally di

slide-8
SLIDE 8

6/19

Relation algebra and query evaluation

Operators: id, ∪, ◦, +, ⁻¹, π, π̄, ∩, −, di

◮ Cheap (∪, ⁻¹, π, ∩, −): cost linearly upper bounded by the size of the operands
◮ In between (id, π̄): cost linearly upper bounded by #nodes
◮ Expensive (◦, +, di): worst-case cost quadratically lower bounded by #nodes

slide-9
SLIDE 9

7/19

Naive query evaluation: an inefficient example

Return pairs of (great-grandparent, friend)

π1[ParentOf ◦ ParentOf ◦ ParentOf] ◦ FriendOf

1. Compute (grandparent, grandchild): X = ParentOf ◦ ParentOf
2. Compute (great-grandparent, great-grandchild): Y = ParentOf ◦ X
3. Throw away the great-grandchildren: Z = π1[Y]
4. Compute (great-grandparent, friend): Result = Z ◦ FriendOf
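
The four steps can be sketched directly over sets of pairs. This is only an illustration: the edge data is hypothetical, and `compose` and `pi1` are illustrative helper names. π1 is modeled as a node-test, that is, a set of pairs (n, n).

```python
def compose(r, s):
    """R ◦ S: pairs (a, c) with (a, b) in R and (b, c) in S for some b."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def pi1(r):
    """π1[R]: node-test (n, n) for every node n with an outgoing R-pair."""
    return {(a, a) for (a, _) in r}


# Hypothetical data: a chain of parents and a few friendships.
parent_of = {("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dan"), ("Dan", "Faythe")}
friend_of = {("Alice", "Peggy"), ("Bob", "Victor"), ("Carol", "Wendy")}

x = compose(parent_of, parent_of)  # 1. (grandparent, grandchild)
y = compose(parent_of, x)          # 2. (great-grandparent, great-grandchild)
z = pi1(y)                         # 3. drop the great-grandchildren
result = compose(z, friend_of)     # 4. (great-grandparent, friend)
print(sorted(result))
```

Steps 1 and 2 materialize every (ancestor, descendant) pair at the given distance, even though step 3 immediately discards the second column.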

slide-10
SLIDE 10

8/19

Optimize query evaluation: add specialized operators?

Return pairs of (great-grandparent, friend)

π1[ParentOf ◦ ParentOf ◦ ParentOf] ◦ FriendOf

1. Compute (grandparent, ???): X = ParentOf ⋉ ParentOf
2. Compute (great-grandparent, ???): Y = ParentOf ⋉ X
3. Throw away ???: Z = π1[Y]
4. Compute (great-grandparent, friend): Result = Z ⋊ FriendOf

Rewritten query: π1[ParentOf ⋉ (ParentOf ⋉ ParentOf)] ⋊ FriendOf
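
A sketch of the same evaluation with semi-joins, compared against the composition-based plan. The semi-join semantics used here are an assumption chosen to agree with R ◦ S ≡π1 R ⋉ S: R ⋉ S keeps the pairs of R whose second node has an outgoing S-pair, and R ⋊ S keeps the pairs of S whose first node is the target of an R-pair. The data and helper names are hypothetical.

```python
def compose(r, s):
    """R ◦ S: pairs (a, c) with (a, b) in R and (b, c) in S for some b."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def semijoin_left(r, s):
    """R ⋉ S (assumed semantics): pairs of R whose target starts an S-pair."""
    s_sources = {a for (a, _) in s}
    return {(a, b) for (a, b) in r if b in s_sources}


def semijoin_right(r, s):
    """R ⋊ S (assumed semantics): pairs of S whose source ends an R-pair."""
    r_targets = {b for (_, b) in r}
    return {(a, b) for (a, b) in s if a in r_targets}


def pi1(r):
    """π1[R]: node-test (n, n) for every node n with an outgoing R-pair."""
    return {(a, a) for (a, _) in r}


# Hypothetical data: a chain of parents and a few friendships.
parent_of = {("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dan"), ("Dan", "Faythe")}
friend_of = {("Alice", "Peggy"), ("Bob", "Victor"), ("Carol", "Wendy")}

x = semijoin_left(parent_of, parent_of)  # 1. (grandparent, ???)
y = semijoin_left(parent_of, x)          # 2. (great-grandparent, ???)
z = pi1(y)                               # 3. throw away ???
result = semijoin_right(z, friend_of)    # 4. (great-grandparent, friend)

naive = compose(pi1(compose(parent_of, compose(parent_of, parent_of))), friend_of)
print(result == naive)
```

Every intermediate result here is a subset of ParentOf, so no step produces more pairs than its left operand contains.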

slide-11
SLIDE 11

9/19

Simple idea: automatic query rewriting

◮ Rewrite composition into semi-joins
◮ Rewrite transitive closure into fixpoints

In such a way that the rewritten query is equivalent

slide-12
SLIDE 12

10/19

When are expressions equivalent?

Definition

Queries q1 and q2 are

◮ path-equivalent if, for every graph G, ⟦q1⟧_G = ⟦q2⟧_G (denoted by q1 ≡path q2);
◮ left-projection-equivalent if, for every graph G, ⟦q1⟧_G|1 = ⟦q2⟧_G|1 (denoted by q1 ≡π1 q2);
◮ right-projection-equivalent if, for every graph G, ⟦q1⟧_G|2 = ⟦q2⟧_G|2 (denoted by q1 ≡π2 q2).

Example

◮ R ∩ S ≡path R − (R − S)
◮ R ◦ S ≡π1 R ⋉ S
◮ π1[R ◦ S] ≡path π1[R ⋉ S]
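
The three example equivalences can be spot-checked on small random relations. This is a sanity check under assumptions, not a proof: the semi-join semantics (R ⋉ S keeps the pairs of R whose target starts an S-pair) and all helper names are illustrative choices.

```python
import random


def compose(r, s):
    """R ◦ S: pairs (a, c) with (a, b) in R and (b, c) in S for some b."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def semijoin_left(r, s):
    """R ⋉ S (assumed semantics): pairs of R whose target starts an S-pair."""
    s_sources = {a for (a, _) in s}
    return {(a, b) for (a, b) in r if b in s_sources}


def first_column(r):
    """The restriction |1: the set of first components."""
    return {a for (a, _) in r}


def pi1(r):
    """π1[R] as a node-test: pairs (n, n) for n in the first column."""
    return {(a, a) for a in first_column(r)}


def random_relation(rng, nodes=5, density=0.3):
    """A random binary relation over the nodes 0..nodes-1."""
    return {(a, b) for a in range(nodes) for b in range(nodes) if rng.random() < density}


def examples_hold(r, s):
    return (
        (r & s) == r - (r - s)                                              # path-equivalent
        and first_column(compose(r, s)) == first_column(semijoin_left(r, s))  # ≡π1
        and pi1(compose(r, s)) == pi1(semijoin_left(r, s))                  # path-equivalent
    )


rng = random.Random(0)
all_hold = all(
    examples_hold(random_relation(rng), random_relation(rng)) for _ in range(200)
)
print(all_hold)
```

Note that R ◦ S and R ⋉ S generally differ as sets of pairs; only their first columns agree, which is why the second example is stated with ≡π1 rather than ≡path.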

slide-13
SLIDE 13

11/19

The main result

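[Diagram: the relation algebra operators (id, ∪, ◦, +, ⁻¹, π, π̄, di, ∩, −) set against the semi-join algebra operators (id, ∪, ⋉, ⋊, fp, ⁻¹, π, π̄, di, ∩, −), related by ≡path, ≡π1, and ≡π2; the semi-join side is labeled FO[2] + fixpoint.]

◮ Collapse also holds for fragments (that include π)
◮ Example: Nested RPQs are projection-equivalent to expressions using only id, ∪, ⋉, ⋊, fp, ⁻¹, and π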

slide-14
SLIDE 14

12/19

Intersection ∩ and difference −

Issues arise when combining composition with ∩ or −, for example (FriendOf ◦ FriendOf) ∩ FriendOf

◮ Restricting: use ∩ and − only on composition-free expressions
    ◮ Exact syntactic fragment of FO[3] + TC that is projection-equivalent to FO[2] + fixpoint.
◮ Data models: usage of ∩ and − is sometimes redundant
    ◮ Sibling-ordered trees: FOtree π FO[2] + fixpoints.
    ◮ Downward queries on trees [DBPL 2015]
    ◮ ...
◮ Partial rewriting: keep compositions when necessary

slide-22
SLIDE 22

13/19

The rewrite functions - partial rewriting

◮ τ(e) ≡path e
◮ τπ1(e) ≡π1 e
◮ τπ2(e) ≡π2 e
◮ τ◦1(e; ε) ≡π1 e ⋉ ε
◮ τ◦2(e; ε) ≡π2 ε ⋊ e

Example

e = π1[((WorksOn ◦ WorksOn) ∩ FriendOf) ◦ EditorOf] ◦ StudentOf

τ(e) = τπ2(π1[((W ◦ W) ∩ F) ◦ E]) ⋊ τ(S)
     = π1[τπ1(((W ◦ W) ∩ F) ◦ E)] ⋊ S
     = π1[τ◦1((W ◦ W) ∩ F; τπ1(E))] ⋊ S
     = π1[(τ(W ◦ W) ∩ τ(F)) ⋉ E] ⋊ S
     = π1[((τ(W) ◦ τ(W)) ∩ F) ⋉ E] ⋊ S
     = π1[((W ◦ W) ∩ F) ⋉ E] ⋊ S.
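
The end result of the derivation can be spot-checked against the original query on random relations. This sketch does not implement the rewrite functions τ themselves; it only compares the first and last expressions, under assumed semi-join semantics and with hypothetical helper names.

```python
import random


def compose(r, s):
    """R ◦ S: pairs (a, c) with (a, b) in R and (b, c) in S for some b."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def semijoin_left(r, s):
    """R ⋉ S (assumed semantics): pairs of R whose target starts an S-pair."""
    s_sources = {a for (a, _) in s}
    return {(a, b) for (a, b) in r if b in s_sources}


def semijoin_right(r, s):
    """R ⋊ S (assumed semantics): pairs of S whose source ends an R-pair."""
    r_targets = {b for (_, b) in r}
    return {(a, b) for (a, b) in s if a in r_targets}


def pi1(r):
    """π1[R]: node-test (n, n) for every node n with an outgoing R-pair."""
    return {(a, a) for (a, _) in r}


def original(w, f, e, s):
    # π1[((W ◦ W) ∩ F) ◦ E] ◦ S
    return compose(pi1(compose(compose(w, w) & f, e)), s)


def rewritten(w, f, e, s):
    # π1[((W ◦ W) ∩ F) ⋉ E] ⋊ S  (the inner W ◦ W is kept: partial rewriting)
    return semijoin_right(pi1(semijoin_left(compose(w, w) & f, e)), s)


def random_relation(rng, nodes=5, density=0.4):
    """A random binary relation over the nodes 0..nodes-1."""
    return {(a, b) for a in range(nodes) for b in range(nodes) if rng.random() < density}


rng = random.Random(0)
all_equal = True
for _ in range(200):
    w, f, e, s = (random_relation(rng) for _ in range(4))
    all_equal = all_equal and original(w, f, e, s) == rewritten(w, f, e, s)
print(all_equal)
```

The composition W ◦ W under the intersection survives the rewrite, since its full set of pairs is needed to intersect with F.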

slide-27
SLIDE 27

14/19

Query optimization

◮ Cost of each operator ✓
◮ Input size of each operator ✓

Example

Let R = {(1, i) | 0 ≤ i ≤ m}. Consider R ◦ R ≡π1 R ⋉ R.
Solution: use single-column evaluation algorithms

◮ Number of necessary evaluation steps ✗
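
One possible reading of this example, sketched below: when only the first column matters, an evaluator can pass sets of nodes between operators instead of sets of pairs. The slides do not spell out the single-column algorithms, so `sources` and `predecessors` are hypothetical helpers that only illustrate the idea.

```python
def sources(r):
    """Nodes with at least one outgoing R-pair (the first column of R)."""
    return {a for (a, _) in r}


def predecessors(r, nodes):
    """Nodes with an R-pair leading into the given node set."""
    return {a for (a, b) in r if b in nodes}


m = 100
r = {(1, i) for i in range(m + 1)}  # R = {(1, i) | 0 <= i <= m}

# Two-column evaluation of R ◦ R materializes m + 1 pairs.
composed = {(a, c) for (a, b) in r for (b2, c) in r if b == b2}

# Single-column evaluation of the first column of R ⋉ R keeps only node sets.
answer_nodes = predecessors(r, sources(r))

print(len(composed), answer_nodes)
```

Both plans agree on the first column, but the single-column plan never holds more than one node per intermediate result on this input.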

slide-28
SLIDE 28

15/19

Expressions and evaluation steps

Expression size: we denote the expression size of e by |e|.
Evaluation size: we denote the evaluation size of e by eval-steps(e).

Example

e1 = ((R ◦ R) ◦ (R ◦ R)) ◦ ((R ◦ R) ◦ (R ◦ R))
e2 = R ⋉ (R ⋉ (R ⋉ (R ⋉ (R ⋉ (R ⋉ (R ⋉ R))))))

◮ e1 ≡π1 e2
◮ We have |e1| = 7 and eval-steps(e1) = 3:
    1. X = R ◦ R
    2. Y = X ◦ X
    3. Result = Y ◦ Y
◮ We have |e2| = 7 and eval-steps(e2) = 7.
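
A sketch of how the two measures could be computed on this example. It assumes that expression size counts operator occurrences and that eval-steps counts distinct non-atomic subexpressions (each shared subexpression is evaluated once); these definitions are an assumption that happens to reproduce the numbers above. Expressions are modeled as nested tuples `(operator, left, right)` with strings as atoms.

```python
def size(expr):
    """Number of operator occurrences in the expression tree."""
    if isinstance(expr, str):
        return 0
    _, left, right = expr
    return 1 + size(left) + size(right)


def eval_steps(expr):
    """Number of distinct non-atomic subexpressions (assumed definition)."""
    seen = set()

    def visit(node):
        if isinstance(node, str) or node in seen:
            return
        seen.add(node)
        _, left, right = node
        visit(left)
        visit(right)

    visit(expr)
    return len(seen)


# e1 = ((R ◦ R) ◦ (R ◦ R)) ◦ ((R ◦ R) ◦ (R ◦ R))
rr = ("◦", "R", "R")
rrrr = ("◦", rr, rr)
e1 = ("◦", rrrr, rrrr)

# e2 = R ⋉ (R ⋉ (... ⋉ R)) with seven semi-joins
e2 = "R"
for _ in range(7):
    e2 = ("⋉", "R", e2)

print(size(e1), eval_steps(e1))
print(size(e2), eval_steps(e2))
```

The balanced shape of e1 lets identical subexpressions be reused, while every nested semi-join in e2 is a different subexpression.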

slide-31
SLIDE 31

16/19

Evaluation size and unions

Example

e1 = (A ∪ B) ◦ (C ∪ D) ◦ (E ∪ F)
e2 = A ⋉ (C ⋉ E) ∪ A ⋉ (C ⋉ F) ∪ . . .

◮ e1 ≡π1 e2
◮ We have |e1| = eval-steps(e1) = 5.
◮ We have |e2| = eval-steps(e2) = 23.

e3 = (A ⋉ X) ∪ (B ⋉ X), X = (C ⋉ Y) ∪ (D ⋉ Y), Y = (E ∪ F)

◮ e1 ≡π1 e3
◮ We have |e3| = 13 and eval-steps(e3) = 7.
◮ τ◦i(e; ε) does this! ✓
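
The shared-subexpression plan for e3 can be sketched as seven explicit steps and compared with e1 on its first column. The relations are random placeholders, the semi-join semantics are assumed as before, and all helper names are hypothetical.

```python
import random


def compose(r, s):
    """R ◦ S: pairs (a, c) with (a, b) in R and (b, c) in S for some b."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def semijoin_left(r, s):
    """R ⋉ S (assumed semantics): pairs of R whose target starts an S-pair."""
    s_sources = {a for (a, _) in s}
    return {(a, b) for (a, b) in r if b in s_sources}


def first_column(r):
    return {a for (a, _) in r}


def evaluate_e1(a, b, c, d, e, f):
    # (A ∪ B) ◦ (C ∪ D) ◦ (E ∪ F)
    return compose(compose(a | b, c | d), e | f)


def evaluate_e3(a, b, c, d, e, f):
    """Evaluate e3 step by step; returns (result, number of steps)."""
    y = e | f                   # 1. Y = E ∪ F
    cy = semijoin_left(c, y)    # 2. C ⋉ Y
    dy = semijoin_left(d, y)    # 3. D ⋉ Y
    x = cy | dy                 # 4. X = (C ⋉ Y) ∪ (D ⋉ Y)
    ax = semijoin_left(a, x)    # 5. A ⋉ X
    bx = semijoin_left(b, x)    # 6. B ⋉ X
    return ax | bx, 7           # 7. e3 = (A ⋉ X) ∪ (B ⋉ X)


def random_relation(rng, nodes=6, density=0.25):
    """A random binary relation over the nodes 0..nodes-1."""
    return {(a, b) for a in range(nodes) for b in range(nodes) if rng.random() < density}


rng = random.Random(1)
relations = [random_relation(rng) for _ in range(6)]
e3_result, steps = evaluate_e3(*relations)
same_first_column = first_column(evaluate_e1(*relations)) == first_column(e3_result)
print(same_first_column, steps)
```

Naming X and Y once and reusing them is what keeps the step count at 7 even though the fully expanded expression has size 13.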

slide-32
SLIDE 32

17/19

The main results (revised)

Theorem

Let e be an expression. We have τ(e) ≡path e, τπi(e) ≡πi e, and

1. eval-steps(τ(e)) ≤ u + |e|;
2. eval-steps(τπi(e)) ≤ u + |e|;
3. |τ(e)| = Θ(|e| · 2^u) in the worst case;
4. |τπi(e)| = Θ(|e| · 2^u) in the worst case,

with u the number of rewrite steps involving τ◦i(e1 ∪ e2; ε).

slide-33
SLIDE 33

18/19

Conclusion and future work

1. Real-life systems
2. Relational databases
3. Intersection and difference elimination
4. Extending FO[3] (e.g. counting)

slide-34
SLIDE 34

19/19

The FO[2] fixpoint we use

◮ Notation: fp_{i,N}[iterative case union base case]
◮ i specifies the output column
◮ N is a variable representing the growing output (node-test)
◮ Subset of traditional inflationary fixpoints

Example

The query π1[[ParentOf]⁺ ◦ OwnsPet] returns ancestors of pet-owners. We rewrite this into

π1[fp_{1,N}[ParentOf ⋉ N union ParentOf ⋉ OwnsPet]]
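
A minimal sketch of how this fixpoint could be evaluated, compared against the transitive-closure formulation. It assumes the node-test N can be represented as a plain set of nodes and that the fixpoint is iterated until N stops growing; the data and function names are hypothetical.

```python
def compose(r, s):
    """R ◦ S: pairs (a, c) with (a, b) in R and (b, c) in S for some b."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def transitive_closure(r):
    """[R]+: extend paths by one edge at a time until nothing new appears."""
    closure = set(r)
    while True:
        extended = closure | compose(closure, r)
        if extended == closure:
            return closure
        closure = extended


def ancestors_of_pet_owners(parent_of, owns_pet):
    """π1[fp_{1,N}[ParentOf ⋉ N union ParentOf ⋉ OwnsPet]] with N as a node set."""
    owners = {a for (a, _) in owns_pet}
    # Base case: first column of ParentOf ⋉ OwnsPet (parents of pet owners).
    n = {a for (a, b) in parent_of if b in owners}
    while True:
        # Iterative case: first column of ParentOf ⋉ N (parents of nodes in N).
        grown = n | {a for (a, b) in parent_of if b in n}
        if grown == n:
            return n
        n = grown


# Hypothetical data.
parent_of = {("Alice", "Bob"), ("Bob", "Carol"), ("Dan", "Faythe")}
owns_pet = {("Carol", "Rex")}

via_fixpoint = ancestors_of_pet_owners(parent_of, owns_pet)
via_closure = {a for (a, _) in compose(transitive_closure(parent_of), owns_pet)}
print(sorted(via_fixpoint), via_fixpoint == via_closure)
```

The fixpoint only ever stores a set of nodes, whereas the closure-based plan first materializes every (ancestor, descendant) pair.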