THoSP: an Algorithm for Nesting Property Graphs



slide-1
SLIDE 1

THoSP: an Algorithm for Nesting Property Graphs

Giacomo Bergami¹, André Petermann², Danilo Montesi¹

1st Joint GRADES-NDA International Workshop, 10th June 2018

¹ Università di Bologna, ² Universität Leipzig

slide-2
SLIDE 2

Key Ideas

slide-3
SLIDE 3

Key Ideas – Research Problem

1. An operator that generalizes the current “grouping” and “nesting” operations is missing. Current (G)DBMSs can express nesting operations, but their query languages’ plans cannot optimize the whole process by combining the following tasks:

  • path joins, evaluated separately for both patterns;
  • grouping, to create an id collection over the matched elements.

2. The general nesting algorithm could lead to an exponential evaluation time.
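The two separate tasks named above can be sketched as follows. This is only an illustration of the unoptimized plan, not the implementation used by any of the systems discussed: the `AUTHOR_OF` endpoints are hypothetical (the slides do not list them), the function names are invented for this sketch, and the edge pattern is assumed to require two distinct authors.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical authorOf edges as (author id, paper id); endpoints are assumed.
AUTHOR_OF = [(0, 3), (2, 3), (0, 4), (1, 4), (1, 5)]


def match_vertex_pattern(edges):
    """Path join 1: every match of Author -authorOf-> Paper."""
    return list(edges)


def match_edge_pattern(edges):
    """Path join 2: (src, paper, dst) for two distinct authors of one paper."""
    authors_of = defaultdict(list)
    for author, paper in edges:
        authors_of[paper].append(author)
    return [(src, paper, dst)
            for paper, authors in authors_of.items()
            for src, dst in permutations(authors, 2)]


def group(matches, key, value):
    """Grouping step: build an id collection per grouping key."""
    groups = defaultdict(set)
    for match in matches:
        groups[key(match)].add(value(match))
    return dict(groups)


# Each pattern is matched on its own, then grouped on its own: the input
# is visited twice and an extra grouping pass follows each visit.
vertex_groups = group(match_vertex_pattern(AUTHOR_OF),
                      key=lambda m: m[0], value=lambda m: m[1])
edge_groups = group(match_edge_pattern(AUTHOR_OF),
                    key=lambda m: (m[0], m[2]), value=lambda m: m[1])
```

Both path joins traverse the same authorOf edges, which is the redundancy a combined plan would remove.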

1/16

slide-4
SLIDE 4

Key Ideas – Use Case

Vertex Pattern: Author -authorOf-> Paper∗

Edge Pattern: Author_src -authorOf-> Paper∗ <-authorOf- Author_dst, with Author_src ≠ Author_dst

Input Bibliography Network

Vertices:

  • 0: Author (name: Abigail, surname: Conner)
  • 1: Author (name: Baldwin, surname: Oliver)
  • 2: Author (name: Cassie, surname: Norman)
  • 3: Paper (title: On Joining Graphs)
  • 4: Paper (title: Object Databases)
  • 5: Paper (title: On Nesting Graphs)

Edges 6 to 10: AuthorOf, each from an Author to a Paper.

2/16

slide-5
SLIDE 5

Key Ideas – Desired Result

Expected result (figure): a nested graph with the Author vertices 0 (Abigail Conner), 1 (Baldwin Oliver) and 2 (Cassie Norman), annotated ε(0), ε(1) and ε(2); the Paper vertices 3 (On Joining Graphs), 4 (Object Databases) and 5 (On Nesting Graphs); and coauthorship edges annotated ε(0 → 1), ε(1 → 0) and ε(0 → 2), ε(2 → 0).

3/16

slide-8
SLIDE 8

Key Ideas – Research Goals

1. As for graph joins, the data model must enhance the serialization of both operands and graph result.

2. The logical graph nesting operator must be general enough to support both the THoSP algorithm and other graph summarization tasks.

3. Grouping can be avoided by defining a nesting index, through which each contained element is associated with its container. This can be achieved by extending the Graph Join’s data structures with such an index.
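A nesting index of the kind described in goal 3 could look like the sketch below. The class name, its methods and the ids in the usage note are assumptions made for illustration; the slides only state that the index associates contained elements with their container, and ids are simplified to plain integers.

```python
import bisect


class NestingIndex:
    """Sorted (container id, content id) pairs; ids are plain integers here."""

    def __init__(self):
        self._pairs = []

    def write(self, container, content):
        # Record one containment pair as soon as a match is found,
        # so that no separate grouping pass is needed afterwards.
        bisect.insort(self._pairs, (container, content))

    def contents(self, container):
        # All pairs of one container are adjacent in the sorted list.
        lo = bisect.bisect_left(self._pairs, (container,))
        hi = bisect.bisect_left(self._pairs, (container + 1,))
        return [content for _, content in self._pairs[lo:hi]]
```

For instance, after `index.write(11, 3)` and `index.write(11, 4)` for a hypothetical nested vertex 11, `index.contents(11)` lists the two contained papers without any group-by over the matches.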

4/16

slide-9
SLIDE 9

Logical Model

slide-10
SLIDE 10

Logical Model – Design (1)

The nested (property) graph data model is an extension of the logical model for graph joins. Therefore, we want to preserve the same assumptions:

  • The resulting nested graph is not a materialized view (as in SQL’s SELECT).
  • The nested graph is serialized by only using the ID information.
  • Attributes, values and labels can be completely reconstructed from this information and the pattern rewriting information.

5/16

slide-11
SLIDE 11

Logical Model – Design (2)

The following modelling choices allow the reconstruction of the required pieces of information:

  • Vertices and edges are distinctly identified by ids (ℕ²).
  • A nested graph database is a property graph, where each vertex and edge may contain (nest) another property graph (ν, ε).
  • Each vertex or edge within the graph can be considered as a possible graph operand.

6/16

slide-12
SLIDE 12

Logical Model – Definition

Graph Nesting

A nested graph database is a nested graph, where each vertex and edge may represent a graph. Given a nested graph G = (V, E), a vertex pattern g_V and an edge pattern g_E containing grouping references, the nesting operator is defined as:

\[ \eta^{keep}_{\iota}(G) = \Big( \{\, v \in V \mid g_V(v) = \emptyset,\ keep \,\} \cup \iota(g_V(G)),\ \ \{\, e \in E \mid g_E(e) = \emptyset,\ keep \,\} \cup \iota(g_E(G)) \Big) \]

where ι is an indexing function associating each matched graph with a single new identifier not appearing in G, and keep is set to true if the non-traversed vertices and edges must be preserved in the final graph. The newly generated nested graph is inserted into the graph database which also contains G. Values associated with both nested vertices and edges are determined by user-defined functions.

7/16

slide-13
SLIDE 13

THoSP Algorithm

slide-14
SLIDE 14

THoSP Algorithm – Physical Model

Motivations:

1. Reduce the number of graph visits by visiting the sub-pattern first, and then extending the visit to the remaining patterns.

2. Represent the nested graph as an adjacency list enriched with an external nesting index.

The algorithm uses the same principles that were adopted for implementing graph joins:

  • Use memory mapping (OS buffering).
  • Serialized graphs represent vertices associated to both ingoing and outgoing edges.
  • No additional indexing structures are exploited.
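The three principles above can be illustrated with a small sketch. The record layout (`HEADER`, `ENTRY`), the function names and the `EDGES` endpoints are assumptions made for this example; the actual on-disk format of the C++ implementation is not given in the slides.

```python
import mmap
import os
import struct
import tempfile

HEADER = struct.Struct("<QII")  # vertex id, out-degree, in-degree
ENTRY = struct.Struct("<QQ")    # edge id, adjacent vertex id

# Hypothetical authorOf edges {edge id: (source, target)}; endpoints assumed.
EDGES = {6: (0, 3), 7: (2, 3), 8: (0, 4), 9: (1, 4), 10: (1, 5)}


def build_adjacency(vertices, edges):
    """Associate each vertex with both its outgoing and ingoing edges."""
    adjacency = {v: ([], []) for v in vertices}
    for e, (src, dst) in sorted(edges.items()):
        adjacency[src][0].append((e, dst))
        adjacency[dst][1].append((e, src))
    return adjacency


def serialize(path, adjacency):
    """Write one variable-length record per vertex, in id order."""
    with open(path, "wb") as out:
        for v in sorted(adjacency):
            outgoing, ingoing = adjacency[v]
            out.write(HEADER.pack(v, len(outgoing), len(ingoing)))
            for e, neighbour in outgoing + ingoing:
                out.write(ENTRY.pack(e, neighbour))


def scan(path):
    """Visit the memory-mapped file sequentially; no index is consulted."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
        offset = 0
        while offset < len(buf):
            v, n_out, n_in = HEADER.unpack_from(buf, offset)
            offset += HEADER.size
            entries = [ENTRY.unpack_from(buf, offset + i * ENTRY.size)
                       for i in range(n_out + n_in)]
            offset += len(entries) * ENTRY.size
            yield v, entries[:n_out], entries[n_out:]


def roundtrip(adjacency):
    """Serialize to a temporary file and read everything back."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        serialize(path, adjacency)
        return {v: (out, inc) for v, out, inc in scan(path)}
    finally:
        os.remove(path)
```

Because every record carries both edge directions, a visit that starts from a paper can reach its authors through the ingoing entries without a second lookup structure.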

8/16

slide-15
SLIDE 15

THoSP Algorithm – Example

Figure: the input bibliography network, with Author vertices 0 (Abigail Conner), 1 (Baldwin Oliver) and 2 (Cassie Norman), Paper vertices 3 (On Joining Graphs), 4 (Object Databases) and 5 (On Nesting Graphs), and AuthorOf edges 6 to 10.

9/16

slide-16
SLIDE 16

THoSP Algorithm – Example

Figure: the input bibliography network and the nested graph built so far. Nested graph elements: Paper 3 (On Joining Graphs).

9/16

slide-17
SLIDE 17

THoSP Algorithm – Example

Figure: the input bibliography network and the nested graph built so far. Nested graph elements: Paper 3 (On Joining Graphs); Author 0 (Abigail Conner), annotated ε(0).

9/16

slide-18
SLIDE 18

THoSP Algorithm – Example

Figure: the input bibliography network and the nested graph built so far. Nested graph elements: Paper 3 (On Joining Graphs); Author 0 (Abigail Conner), annotated ε(0); Author 2 (Cassie Norman).

9/16

slide-19
SLIDE 19

THoSP Algorithm – Example

Figure: the input bibliography network and the nested graph built so far. Nested graph elements: Paper 3 (On Joining Graphs); Author 0 (Abigail Conner), annotated ε(0); Author 2 (Cassie Norman); a coauthorship edge annotated ε(0 → 2), ε(2 → 0).

9/16

slide-20
SLIDE 20

THoSP Algorithm – Example

Figure: the input bibliography network and the nested graph built so far. Nested graph elements: Papers 3 (On Joining Graphs) and 4 (Object Databases); Author 0 (Abigail Conner), annotated ε(0); Author 2 (Cassie Norman), annotated ε(2); a coauthorship edge annotated ε(0 → 2), ε(2 → 0).

9/16

slide-21
SLIDE 21

THoSP Algorithm – Example

Figure: the input bibliography network and the nested graph built so far. Nested graph elements: Papers 3 (On Joining Graphs) and 4 (Object Databases); Author 0 (Abigail Conner), annotated ε(0); Author 1 (Baldwin Oliver), annotated ε(1); Author 2 (Cassie Norman), annotated ε(2); a coauthorship edge annotated ε(0 → 2), ε(2 → 0).

9/16

slide-22
SLIDE 22

THoSP Algorithm – Example

Figure: the input bibliography network and the nested graph built so far. Nested graph elements: Papers 3 (On Joining Graphs) and 4 (Object Databases); Author 0 (Abigail Conner), annotated ε(0); Author 1 (Baldwin Oliver), annotated ε(1); Author 2 (Cassie Norman), annotated ε(2); coauthorship edges annotated ε(0 → 1), ε(1 → 0) and ε(0 → 2), ε(2 → 0).

9/16

slide-23
SLIDE 23

THoSP Algorithm – Example

Figure: the input bibliography network and the complete nested graph. Nested graph elements: Papers 3 (On Joining Graphs), 4 (Object Databases) and 5 (On Nesting Graphs); Author 0 (Abigail Conner), annotated ε(0); Author 1 (Baldwin Oliver), annotated ε(1); Author 2 (Cassie Norman), annotated ε(2); coauthorship edges annotated ε(0 → 1), ε(1 → 0) and ε(0 → 2), ε(2 → 0).

9/16

slide-24
SLIDE 24

Experimental Evaluation

slide-25
SLIDE 25

Experimental Evaluation – Dataset

We want to show that the combination of THoSP with the proposed physical data model outperforms the query plans of other query languages (Cypher, SPARQL, SQL, AQL). We performed our tests on both synthetic and real-world data, using operands with vertex size 10^n, for n = 1, …, 8:

  • GMark graph generator.
  • Random samples of the Microsoft Academic Graph.

Our tests’ source code is available at: https://bitbucket.org/unibogb/graphnestingc/src

10/16

slide-26
SLIDE 26

Experimental Evaluation – Competing DataBases

Given that the only graph database using Java was the worst performing one, we implemented our solution only in C++. The graph nesting operator was implemented in each DB language by returning ID collections.

  • PostgreSQL was used to evaluate SQL queries. We ran the queries directly in psql.
  • SPARQL queries were evaluated over Virtuoso. SPARQL queries were sent via ODBC (C++).
  • Cypher queries were evaluated over Neo4J. Cypher queries were sent via the execute method.
  • AQL queries were evaluated over ArangoDB. We ran the queries directly in arangosh.

11/16

slide-27
SLIDE 27

Experimental Evaluation – GMark Benchmark

Operands Size            Two HOp Separated Pattern Time (C/C++) (ms)
|V|     #Subgraph        SQL+JSON      SPARQL        AQL           Cypher        THoSP
10      3                2.10          11            15.00         681.40        0.11
10^2    58               9.68          63            3.89          1,943.98      0.14
10^3    968              17.96         63            12.34         >3.60×10^6    0.46
10^4    8,683            69.27         364           46.74         >3.60×10^6    4.07
10^5    88,885           294.23        4,153         508.87        >3.60×10^6    43.81
10^6    902,020          2,611.48      50,341        7,212.19      >3.60×10^6    563.02
10^7    8,991,417        25,666.14     672,273       922,590.00    >3.60×10^6    8,202.93
10^8    89,146,891       396,523.88    >3.60×10^6    >3.60×10^6    >3.60×10^6    91,834.20
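One way to read the GMark figures is as speed-up factors of THoSP over each competitor. The snippet below is only a convenience for that arithmetic: the dictionary copies the timings from the table, timed-out runs (>3.60×10^6 ms) are encoded as infinity, and the helper names are invented for this sketch.

```python
TIMEOUT = float("inf")  # runs reported as >3.60×10^6 ms (more than one hour)

# |V| -> (SQL+JSON, SPARQL, AQL, Cypher, THoSP), all in milliseconds.
GMARK_MS = {
    10**1: (2.10, 11, 15.00, 681.40, 0.11),
    10**2: (9.68, 63, 3.89, 1943.98, 0.14),
    10**3: (17.96, 63, 12.34, TIMEOUT, 0.46),
    10**4: (69.27, 364, 46.74, TIMEOUT, 4.07),
    10**5: (294.23, 4153, 508.87, TIMEOUT, 43.81),
    10**6: (2611.48, 50341, 7212.19, TIMEOUT, 563.02),
    10**7: (25666.14, 672273, 922590.00, TIMEOUT, 8202.93),
    10**8: (396523.88, TIMEOUT, TIMEOUT, TIMEOUT, 91834.20),
}


def speedups(row):
    """Ratio between each competitor's time and THoSP's time."""
    *competitors, thosp = row
    return [t / thosp for t in competitors]


def speedup_over_best_competitor(row):
    """The most conservative figure: THoSP against the fastest competitor."""
    return min(speedups(row))


for size, row in GMARK_MS.items():
    print(size, round(speedup_over_best_competitor(row), 2))
```

Comparing against the fastest competitor at each size gives a lower bound on the advantage, since timed-out systems would otherwise dominate the ratio.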

12/16

slide-28
SLIDE 28

Experimental Evaluation – Microsoft Academic Graph Benchmark

Operands Size            Two HOp Separated Pattern Time (C/C++) (ms)
|V|     #Subgraph        SQL+JSON      SPARQL        AQL           Cypher        THoSP
10      19               1.69·10^0     3.4·10^1      6.57·10^−1    2.38·10^3     2.82·10^−1
10^2    255              1.75·10^0     3.22·10^2     2.51·10^0     1.01·10^4     3.46·10^−1
10^3    23,119           4.71·10^1     1.22·10^3     8.18·10^1     >1H           1.39·10^1
10^4    5,411,205        1.53·10^4     2.77·10^5     2.08·10^4     >1H           2.58·10^3
10^5    97,079,329       1.20·10^6     >1H           OOM^1         >1H           1.97·10^5
10^6    241,448,529      >1H           >1H           OOM^1         >1H           6.22·10^5
10^7    361,759,509      OOM^2         >1H           OOM^1         >1H           7.74·10^5

13/16

slide-29
SLIDE 29

Experimental Evaluation – Results

  • These benchmarks show that the current data models supporting nested representations do not provide query plans that support this specific case of (graph) nesting.

  • The proposed approach extends the secondary-memory property graph representation by adding associations to nested vertices and edges.

  • The serialized data structure provides a graph having an external containment data structure.

  • This data model achieves structural aggregation for graph data, where aggregated data may preserve the original vertices and edges.

14/16

slide-30
SLIDE 30

Experimental Evaluation – Further Results

  • GROQ: THoSP can be generalized into a more general algorithm.
  • Generalized Semistructured Model: this data structure can be generalized into a broader data representation.

15/16

slide-31
SLIDE 31

Experimental Evaluation – Future Work

  • GROQ: Further benchmarks have to be carried out over this more general nesting algorithm.
  • General Nesting: Provide a query plan where either grouping or GROQ is used.

16/16

slide-32
SLIDE 32

Backup Slides

slide-33
SLIDE 33

Backup Slides – Nested Graph Database

Nested Graph DataBase

Given a set Σ∗ of strings, a nested (property) graph database G is a tuple G = ⟨V, E, λ, ℓ, ω, ν, ε⟩, where:

  • V, E ∈ ℕ² s.t. V ∩ E = ∅
  • source and target λ: E → V²
  • labelling ℓ: V ∪ E → ℘(Σ∗)
  • object mapping ω: V ∪ E → Ω
  • vertices’ containment ν: (V ∪ E) → ℘(V)
  • edges’ containment ε: (V ∪ E) → ℘(E)

Each vertex or edge o ∈ V ∪ E induces a nested (property) graph as the following pair:

\[ G_o = \Big\langle\, \nu(o),\ \big\{\, e \in \epsilon(o) \;\big|\; \lambda(e) \in \big( \textstyle\bigcup_{n \ge 0} \nu\epsilon^{(n)}(\{o\}) \big)^{2} \,\big\} \Big\rangle \]

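The tuple above maps naturally onto a small record type. The sketch below is one possible reading, not the authors' data structure: ids are plain integers rather than elements of ℕ², the closure in the induced-graph condition is interpreted as "everything reachable from o through ν and ε", and the class and method names are invented here.

```python
from dataclasses import dataclass, field


@dataclass
class NestedGraphDB:
    V: set
    E: set
    lam: dict                                    # λ: edge id -> (source, target)
    labels: dict = field(default_factory=dict)   # ℓ: id -> set of labels
    omega: dict = field(default_factory=dict)    # ω: id -> object
    nu: dict = field(default_factory=dict)       # ν: id -> contained vertex ids
    eps: dict = field(default_factory=dict)      # ε: id -> contained edge ids

    def __post_init__(self):
        assert not (self.V & self.E), "V and E must be disjoint"
        assert set(self.lam) == self.E, "λ must be defined on every edge"
        assert all(s in self.V and t in self.V for s, t in self.lam.values())

    def closure(self, o):
        """Assumed reading of the union over n: ids reachable via ν and ε."""
        seen, stack = set(), [o]
        while stack:
            x = stack.pop()
            if x in seen:
                continue
            seen.add(x)
            stack.extend(self.nu.get(x, ()))
            stack.extend(self.eps.get(x, ()))
        return seen

    def induced(self, o):
        """G_o: contained vertices, plus contained edges whose ends are nested in o."""
        reachable = self.closure(o)
        edges = {e for e in self.eps.get(o, ())
                 if all(end in reachable for end in self.lam[e])}
        return set(self.nu.get(o, ())), edges
```

Under this reading, an edge listed in ε(o) is dropped from G_o when one of its endpoints is not nested inside o, which keeps the induced graph well formed.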
slide-34
SLIDE 34

THoSP Pseudocode

nest(Cont, patt, u, S):
    for each s in S s.t. patt.doSerialize(s):
        Cont.write(<u, s>)

Input: G, gV, gE
Cont ← ∅
NestedGraph ← ∅
a ← V ∩ E \ (γV ∪ γE^src ∪ γE^dst);
for each v: vertex in G s.t. a(v):
    for each V(u →e v):
        u := dtl(u)_c;
        nest(Cont, V, u, {u, e, v})
        NGraph(V) ← NGraph(V) ∪ {u}
        for each V(w →e′ v) s.t. E(u →e v ←e′ w):
            w := dtl(w)_c; e′ := dtl(u, w)_c;
            nest(Cont, E, e′, {u, e, v, e′, w})
            NGraph(E) ← NGraph(E) ∪ {u →e′ w}
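A runnable approximation of the pseudocode, specialized to the bibliography use case, is sketched below. Several points are assumptions rather than facts from the slides: `dtl` is taken to be a dovetailing (Cantor pairing) function, new ids are shifted by an `offset` larger than every id in G, only the shared Paper vertex is serialized into the nesting index, ids are plain integers, and the authorOf endpoints in `INCOMING` are hypothetical.

```python
def dovetail(i, j):
    """Cantor pairing: an injective map from pairs of naturals to naturals."""
    return (i + j) * (i + j + 1) // 2 + j


def thosp(papers, incoming, offset):
    """papers: ids matching the shared sub-pattern; incoming[v]: (edge, author)."""
    cont = []                  # nesting index as (container id, content id) pairs
    nested_vertices = set()
    nested_edges = set()       # (source id, edge id, target id)
    for v in papers:                       # visit the shared sub-pattern once
        for _, u in incoming[v]:           # vertex pattern: u -e-> v
            nested_u = offset + dovetail(u, u)
            cont.append((nested_u, v))
            nested_vertices.add(nested_u)
            for _, w in incoming[v]:       # edge pattern: u -e-> v <-e'- w
                if w == u:
                    continue               # assumed: the two authors differ
                nested_e = offset + dovetail(u, w)
                cont.append((nested_e, v))
                nested_edges.add((nested_u, nested_e, offset + dovetail(w, w)))
    return cont, nested_vertices, nested_edges


# Hypothetical input: paper id -> [(authorOf edge id, author id), ...].
INCOMING = {3: [(6, 0), (7, 2)], 4: [(8, 0), (9, 1)], 5: [(10, 1)]}
cont, nested_vertices, nested_edges = thosp([3, 4, 5], INCOMING, offset=11)
```

Pairing (u, u) for nested vertices and (u, w) with u ≠ w for nested edges keeps the two id families disjoint, because the pairing function is injective. Each paper is visited once, and both patterns are extended from that single visit.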