THoSP: an Algorithm for Nesting Property Graphs
Giacomo Bergami (1), André Petermann (2), Danilo Montesi (1)
(1) Università di Bologna, (2) Universität Leipzig
1st Joint GRADES-NDA International Workshop, 10th June 2018
Key Ideas
Key Ideas – Research Problem
1 An operator generalizing the current "grouping" and "nesting" operations is missing. Current (G)DBMSs can express nesting operations, but the query plans of their languages cannot optimize the whole process by combining the following tasks:
- path joins, performed separately for both patterns;
- grouping, to create an id collection over the matched elements.
2 The general nesting algorithm could lead to an exponential evaluation time.
Key Ideas – Use Case
Vertex Pattern: Author -[authorOf]-> Paper*
Edge Pattern: Author_src -[authorOf]-> Paper* <-[authorOf]- Author_dst, with Author_src ≠ Author_dst
Input Bibliography Network:
- Author 0: name: Abigail, surname: Conner
- Author 1: name: Baldwin, surname: Oliver
- Author 2: name: Cassie, surname: Norman
- Paper 3: title: On Joining Graphs
- Paper 4: title: Object Databases
- Paper 5: title: On Nesting Graphs
- AuthorOf edges: 6, 7, 8, 9, 10 (their endpoints are shown only in the figure)
Key Ideas – Desired Result
Expected result (figure):
- the authors Abigail Conner, Baldwin Oliver (1) and Cassie Norman (2), annotated with ϵ(0), ϵ(1) and ϵ(2);
- the coauthorship edges ϵ(0 → 1), ϵ(1 → 0) and ϵ(0 → 2), ϵ(2 → 0);
- the papers On Joining Graphs (3), Object Databases (4) and On Nesting Graphs (5) as nested content.
Key Ideas – Research Goals
1 As for graph joins, the data model must enhance the serialization of both the operands and the resulting graph.
2 The logical graph nesting operator must be general enough to support both the THoSP algorithm and other graph summarization tasks.
3 Grouping can be avoided by defining a nesting index, through which the contained elements are associated to their container. This can be achieved by extending the Graph Join's data structures with such an index.
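The third goal can be illustrated with a minimal sketch, assuming that a nesting index is simply an append-only list of (container id, content id) pairs written while the graph is visited. The names `NestingIndex`, `write` and `contents_of`, as well as the ids used below, are hypothetical and are not taken from the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class NestingIndex:
    """Append-only association between a container id and the ids it nests."""

    def __init__(self) -> None:
        self._pairs: List[Tuple[int, int]] = []

    def write(self, container: int, content: int) -> None:
        # One record is appended per matched element during the visit,
        # so no separate grouping pass over the matches is required.
        self._pairs.append((container, content))

    def contents_of(self) -> Dict[int, List[int]]:
        # Rebuild the containment relation on demand from the raw pairs.
        out: Dict[int, List[int]] = defaultdict(list)
        for container, content in self._pairs:
            out[container].append(content)
        return dict(out)


index = NestingIndex()
# Hypothetical ids: container 100 nests elements 0, 6 and 3.
for element in (0, 6, 3):
    index.write(100, element)
index.write(101, 2)
result = index.contents_of()
```

The container is known at the moment each element is matched, so the association is recorded immediately instead of being derived later by grouping.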
Logical Model
Logical Model – Design (1)
The nested (property) graph data model is an extension of the logical model for graph joins. Therefore, we want to preserve the same assumptions:
- The resulting nested graph is not a materialized view (as in SQL's SELECT).
- The nested graph is serialized by only using the ID information. Attributes, values and labels can be completely reconstructed from this information and the pattern rewriting information.
Logical Model – Design (2)
The following modelling choices allow the reconstruction of the required pieces of information:
- Vertices and edges are distinctly identified by ids (in ℕ²).
- A nested graph database is a property graph, where each vertex and edge may contain (nest) another property graph (ν, ϵ).
- Each vertex or edge within the graph can be considered as a possible graph operand.
Logical Model – Definition
Graph Nesting. A nested graph database is a nested graph, where each vertex and edge may represent a graph. Given a nested graph G = (V, E), a vertex pattern gV and an edge pattern gE containing grouping references:

$$\eta^{keep}_{\iota}(G) = \Big( \{ v \in V \mid g_V(v) = \emptyset,\ keep \} \cup \iota(g_V(G)),\ \ \{ e \in E \mid g_E(e) = \emptyset,\ keep \} \cup \iota(g_E(G)) \Big)$$

where ι is an indexing function associating each matched graph to one new single identifier not appearing in G, and keep is set to true when the non-traversed vertices and edges must be preserved in the final graph. The newly generated nested graph is inserted into the graph database which also contains G. Values associated to both nested vertices and edges are determined by user-defined functions.
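The definition above can be read operationally. The following is a minimal sketch, assuming that the pattern matches have already been computed and are passed in as sets of element ids; the function name `eta`, the integer ids and the way ι draws fresh identifiers (a counter starting after the largest id in G) are illustrative assumptions, not the authors' implementation.

```python
from itertools import count
from typing import Dict, FrozenSet, List, Set, Tuple

Match = FrozenSet[int]  # ids of the elements matched by one pattern instance


def eta(
    vertices: Set[int],
    edges: Set[int],
    vertex_matches: List[Match],
    edge_matches: List[Match],
    keep: bool,
) -> Tuple[Set[int], Set[int], Dict[Tuple[str, Match], int]]:
    # iota: every matched graph receives one new id that does not appear in G.
    fresh = count(max(vertices | edges, default=-1) + 1)
    iota: Dict[Tuple[str, Match], int] = {}

    def new_id(kind: str, match: Match) -> int:
        key = (kind, match)
        if key not in iota:
            iota[key] = next(fresh)
        return iota[key]

    nested_vertices = {new_id("V", m) for m in vertex_matches}
    nested_edges = {new_id("E", m) for m in edge_matches}
    if keep:
        # Preserve the vertices not matched by gV and the edges not matched by gE.
        nested_vertices |= vertices - set().union(*vertex_matches)
        nested_edges |= edges - set().union(*edge_matches)
    return nested_vertices, nested_edges, iota


# Hypothetical graph: vertices 0, 1, 2 and edges 10, 11;
# the vertex pattern matches the subgraph {0, 10, 1} once.
V, E = {0, 1, 2}, {10, 11}
kept = eta(V, E, [frozenset({0, 10, 1})], [], keep=True)
dropped = eta(V, E, [frozenset({0, 10, 1})], [], keep=False)
```

With `keep=True` the unmatched vertex 2 survives next to the new nested vertex; with `keep=False` only the nested vertex remains.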
THoSP Algorithm
THoSP Algorithm – Physical Model
Motivations:
1 Reduce the number of graph visits by visiting the subpattern first, and then extending the visit to the remaining patterns.
2 Represent the nested graph as an adjacency list enriched with an external nesting index.
The algorithm uses the same principles that were adopted for implementing graph joins:
- Use memory mapping (OS buffering).
- Serialized graphs represent vertices associated to both ingoing and outgoing edges.
- No additional indexing structures are exploited.
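The principles listed above can be sketched as follows, under stated assumptions: the record layout (a fixed header followed by outgoing and then ingoing edge entries), the names `serialize`, `scan` and `roundtrip`, and the sample ids are all hypothetical. The sketch only shows a serialized adjacency list being scanned sequentially through a memory-mapped file, without any auxiliary index.

```python
import mmap
import os
import struct
import tempfile
from typing import Dict, Iterator, List, Tuple

HEADER = struct.Struct("<QII")  # vertex id, number of outgoing, number of ingoing edges
EDGE = struct.Struct("<QQ")     # edge id, id of the vertex at the other end

EdgeList = List[Tuple[int, int]]
Adjacency = Dict[int, Tuple[EdgeList, EdgeList]]  # vertex -> (outgoing, ingoing)


def serialize(graph: Adjacency, path: str) -> None:
    # Each vertex is stored together with both its outgoing and ingoing edges.
    with open(path, "wb") as f:
        for vid in sorted(graph):
            out_edges, in_edges = graph[vid]
            f.write(HEADER.pack(vid, len(out_edges), len(in_edges)))
            for eid, other in out_edges + in_edges:
                f.write(EDGE.pack(eid, other))


def scan(buf) -> Iterator[Tuple[int, EdgeList, EdgeList]]:
    # Linear scan over the mapped bytes; no extra indexing structure is used.
    offset = 0
    while offset < len(buf):
        vid, n_out, n_in = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        entries = [
            EDGE.unpack_from(buf, offset + i * EDGE.size)
            for i in range(n_out + n_in)
        ]
        offset += (n_out + n_in) * EDGE.size
        yield vid, entries[:n_out], entries[n_out:]


def roundtrip(graph: Adjacency) -> List[Tuple[int, EdgeList, EdgeList]]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "graph.bin")
        serialize(graph, path)
        # The OS pages the file in on demand (memory mapping).
        with open(path, "rb") as f, mmap.mmap(
            f.fileno(), 0, access=mmap.ACCESS_READ
        ) as buf:
            return list(scan(buf))


# Hypothetical fragment: edge 6 goes from vertex 0 to vertex 3.
sample: Adjacency = {0: ([(6, 3)], []), 3: ([], [(6, 0)])}
records = roundtrip(sample)
```

Storing each edge on both of its endpoints costs space, but a visit starting from a paper can reach its authors through the ingoing entries without a reverse lookup.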
THoSP Algorithm – Example
The example runs on the input bibliography network of the use case (authors 0 to 2, papers 3 to 5, AuthorOf edges 6 to 10). The animation adds the following elements to the result, one frame at a time:
1 Paper 3 (On Joining Graphs).
2 Author Abigail Conner, ϵ(0).
3 Author Cassie Norman (2).
4 The coauthorship edges ϵ(0 → 2), ϵ(2 → 0).
5 Paper 4 (Object Databases) and ϵ(2).
6 Author Baldwin Oliver (1), ϵ(1).
7 The coauthorship edges ϵ(0 → 1), ϵ(1 → 0).
8 Paper 5 (On Nesting Graphs).
Experimental Evaluation
Experimental Evaluation – Dataset
We want to show that the combination of THoSP with the proposed physical data model outperforms the query plans of other query languages (Cypher, SPARQL, SQL, AQL). We performed our tests on both synthetic and real-world data, using operands with 10^n vertices, for n ranging from 1 to 8:
- GMark graph generator.
- Random samples of Microsoft Academic Graph.
Our tests' source code is available at: https://bitbucket.org/unibogb/graphnestingc/src
Experimental Evaluation – Competing DataBases
Given that the only graph database using Java was the worst performing one, we implemented our solution only in C++. The graph nesting operator was implemented in each DB language by returning ID collections.
- PostgreSQL was used to evaluate SQL queries. We ran the queries directly in psql.
- SPARQL queries were evaluated over Virtuoso. SPARQL queries were sent via ODBC (C++).
- Cypher queries were evaluated over Neo4J. Cypher queries were sent via the execute method.
- AQL queries were evaluated over ArangoDB. We ran the queries directly in arangosh.
Experimental Evaluation – GMark Benchmark
Operands Size: |V|, #Subgraph. Two HOp Separated Pattern Time (C/C++) (ms): SQL+JSON, SPARQL, AQL, Cypher, THoSP.

|V|   #Subgraph    SQL+JSON     SPARQL       AQL          Cypher       THoSP
10    3            2.10         11           15.00        681.40       0.11
10^2  58           9.68         63           3.89         1,943.98     0.14
10^3  968          17.96        63           12.34        >3.60×10^6   0.46
10^4  8,683        69.27        364          46.74        >3.60×10^6   4.07
10^5  88,885       294.23       4,153        508.87       >3.60×10^6   43.81
10^6  902,020      2,611.48     50,341       7,212.19     >3.60×10^6   563.02
10^7  8,991,417    25,666.14    672,273      922,590.00   >3.60×10^6   8,202.93
10^8  89,146,891   396,523.88   >3.60×10^6   >3.60×10^6   >3.60×10^6   91,834.20
Experimental Evaluation – Microsoft Academic Graph Benchmark
Operands Size: |V|, #Subgraph. Two HOp Separated Pattern Time (C/C++) (ms): SQL+JSON, SPARQL, AQL, Cypher, THoSP.

|V|   #Subgraph     SQL+JSON    SPARQL      AQL          Cypher      THoSP
10    19            1.69·10^0   3.4·10^1    6.57·10^−1   2.38·10^3   2.82·10^−1
10^2  255           1.75·10^0   3.22·10^2   2.51·10^0    1.01·10^4   3.46·10^−1
10^3  23,119        4.71·10^1   1.22·10^3   8.18·10^1    >1H         1.39·10^1
10^4  5,411,205     1.53·10^4   2.77·10^5   2.08·10^4    >1H         2.58·10^3
10^5  97,079,329    1.20·10^6   >1H         OOM1         >1H         1.97·10^5
10^6  241,448,529   >1H         >1H         OOM1         >1H         6.22·10^5
10^7  361,759,509   OOM2        >1H         OOM1         >1H         7.74·10^5
Experimental Evaluation – Results
- These further benchmarks show that the current data models supporting nested representations do not offer query plans tailored to this specific case of (graph) nesting.
- The proposed approach extends the property graph representation in secondary memory by adding associations to nested vertices and edges.
- The serialized data structure provides a graph with an external containment data structure.
- This data model achieves structural aggregation for graph data, where aggregated data may preserve the original vertices and edges.
Experimental Evaluation – Further Results
- GROQ: THoSP can be generalized into a more general algorithm.
- Generalized Semistructured Model: this data structure can be generalized into a broader data representation.
Experimental Evaluation – Future Work
- GROQ: further benchmarks have to be carried out over this more general nesting algorithm.
- General Nesting: provide a query plan where either grouping or GROQ is used.
Backup Slides
Backup Slides – Nested Graph Database
Nested Graph DataBase. Given a set Σ* of strings, a nested (property) graph database G is a tuple G = ⟨V, E, λ, ℓ, ω, ν, ϵ⟩, where:
- V, E ⊆ ℕ² s.t. V ∩ E = ∅
- source and target λ: E → V²
- labelling ℓ: V ∪ E → ℘(Σ*)
- object mapping ω: V ∪ E → Ω
- vertices' containment ν: (V ∪ E) → ℘(V)
- edges' containment ϵ: (V ∪ E) → ℘(E)
Each vertex or edge o ∈ V ∪ E induces a nested (property) graph as the following pair:

$$G_o = \Big( \nu(o),\ \big\{ e \in \epsilon(o) \ \big|\ \lambda(e) \in \big( \textstyle\bigcup_{n \ge 0} \nu_{\epsilon}^{(n)}(\{o\}) \big)^2 \big\} \Big)$$
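A minimal sketch of this definition follows. It rests on two assumptions that the slide does not confirm: ids are simplified to plain integers instead of pairs in ℕ², and the union over n ≥ 0 is read as "o itself plus everything nested in o at any depth". The class name `NestedGraphDB` and its methods are hypothetical; labels (ℓ) and object mappings (ω) are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class NestedGraphDB:
    V: Set[int]
    E: Set[int]
    lam: Dict[int, Tuple[int, int]]                          # λ: edge -> (source, target)
    nu: Dict[int, Set[int]] = field(default_factory=dict)    # ν: vertices' containment
    eps: Dict[int, Set[int]] = field(default_factory=dict)   # ϵ: edges' containment

    def nested_closure(self, o: int) -> Set[int]:
        # ASSUMPTION: o (the n = 0 case) plus all elements reachable by
        # repeatedly following ν and ϵ from o.
        seen: Set[int] = set()
        frontier = [o]
        while frontier:
            x = frontier.pop()
            if x in seen:
                continue
            seen.add(x)
            frontier.extend(self.nu.get(x, set()) | self.eps.get(x, set()))
        return seen

    def induced(self, o: int) -> Tuple[Set[int], Set[int]]:
        # G_o: the vertices contained in o, and the contained edges whose
        # source and target both lie in the closure computed above.
        allowed = self.nested_closure(o)
        edges = {
            e for e in self.eps.get(o, set())
            if all(endpoint in allowed for endpoint in self.lam[e])
        }
        return set(self.nu.get(o, set())), edges


# Hypothetical database: vertex 20 nests vertices 0, 1 and edges 10, 11,
# but edge 11 points to vertex 2, which is not nested in 20.
db = NestedGraphDB(
    V={0, 1, 2, 20},
    E={10, 11},
    lam={10: (0, 1), 11: (1, 2)},
    nu={20: {0, 1}},
    eps={20: {10, 11}},
)
g20 = db.induced(20)
```

Edge 11 is dropped from the induced graph because one of its endpoints lies outside the container, which is the role played by the filter on λ(e).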
THoSP Pseudocode
nest(Cont, patt, u, S):
    for each s in S s.t. patt.doSerialize(s):
        Cont.write(<u, s>)

Input: G, gV, gE
Cont ← ∅
NestedGraph ← ∅
a ← V ∩ E \ (γV ∪ γE^src ∪ γE^dst);
for each v: vertex in G s.t. a(v):
    for each V(u →e v):
        u := dtl(u)_c;
        nest(Cont, V, u, {u, e, v})
        NGraph(V) ← NGraph(V) ∪ {u}
        for each V(w →e' v) s.t. E(u →e v e'← w):
            w := dtl(w)_c; e' := dtl(u, w)_c;
            nest(Cont, E, e', {u, e, v, e', w})
            NGraph(E) ← NGraph(E) ∪ {u →e' w}
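The pseudocode can be made concrete for the bibliography use case with the following sketch. Several parts are assumptions: the edge endpoints are not recoverable from the slide text and were chosen only to be compatible with the coauthorship edges of the expected result; tuples such as `("vertex", u)` stand in for the ids produced by `dtl`; `do_serialize` stands in for `patt.doSerialize` and here keeps only papers; and the `w == u` check assumes that the edge pattern requires distinct source and destination authors.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int, int]  # (edge id, source author, target paper)


def thosp(
    papers: Set[int],
    author_of: Iterable[Edge],
    do_serialize: Callable[[int], bool],
) -> Tuple[List[Tuple[tuple, int]], Set[tuple], Set[tuple]]:
    # The paper is the subpattern shared by the vertex and the edge pattern,
    # so the visit starts from papers and follows their ingoing edges.
    incoming: Dict[int, List[Tuple[int, int]]] = {p: [] for p in papers}
    for eid, src, dst in author_of:
        incoming[dst].append((eid, src))

    cont: List[Tuple[tuple, int]] = []  # nesting index: (container, content)
    nested_vertices: Set[tuple] = set()
    nested_edges: Set[tuple] = set()

    def nest(container: tuple, elements: Iterable[int]) -> None:
        for s in elements:
            if do_serialize(s):
                cont.append((container, s))

    for v in sorted(papers):
        for e, u in incoming[v]:
            u_id = ("vertex", u)           # stand-in for dtl(u)
            nest(u_id, (u, e, v))
            nested_vertices.add(u_id)
            for e2, w in incoming[v]:
                if w == u:
                    continue               # assumed: source and destination differ
                e_id = ("edge", u, w)      # stand-in for dtl(u, w)
                nest(e_id, (u, e, v, e2, w))
                nested_edges.add(e_id)
    return cont, nested_vertices, nested_edges


# ASSUMED endpoints for the AuthorOf edges 6..10 (not given in the slides).
papers = {3, 4, 5}
author_of = [(6, 0, 3), (7, 2, 3), (8, 0, 4), (9, 1, 4), (10, 1, 5)]
cont, nested_v, nested_e = thosp(papers, author_of, lambda s: s in papers)
papers_of_author_0 = [s for c, s in cont if c == ("vertex", 0)]
```

Each paper is visited once, and both the nested author vertices and the coauthorship edges are emitted from that single visit, which is the point of visiting the shared subpattern first.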