Extending In-Memory Relational Database Engines with Native Graph Support - PowerPoint PPT Presentation

Extending In-Memory Relational Database Engines with Native Graph Support
Mohamed S. Hassan (1), Tatiana Kuznetsova (1), Hyun Chai Jeong (1), Walid G. Aref (1), Mohammad Sadoghi (2)
(1) Purdue University, West Lafayette, IN, USA
(2) University of California, Davis, CA, USA
EDBT’18

Graphs are Ubiquitous
[Figure: example graphs: biological network, road network, social network, datacenter network]

Specialized Graph Database Systems
- Specialized graph databases can handle graph query workloads
  - Vital queries include shortest-path and reachability queries

Why Use Relational Database Systems to Support Graphs?
- RDBMS technology is very mature and widely adopted
- Relational data can contain latent graphs
- Graphs are easy to represent using relational tables
- Many applications involve graph queries
  - Queries that involve both relational and graph predicates
    - E.g., for each patient P in a selected area, find the nearest hospital to P
- How can an RDBMS handle graph query workloads effectively and efficiently?

Graph Support in RDBMSs
- Why is it challenging?
  - There is an impedance mismatch between the relational model and the graph model
- Two common approaches for supporting graphs:
  - Native Relational-Core
  - Native Graph-Core
- Native G+R Core (the proposed GRFusion system)

Native Relational-Core
- Use a vanilla RDBMS
- Encode graphs in relational schemas
- Support a limited set of graph queries
- Translate the supported graph queries into SQL or procedural SQL
- E.g., SQLGraph [SIGMOD’15], Grail [CIDR’15]
- Pros:
  - Uses very mature RDBMS technology
- Cons:
  - Several graph queries are inefficient to evaluate using pure SQL
  - Graphs are encoded in complex schemas
[Figure: architecture with Graph Queries, an SQL Translation Layer, Relational Queries (SQL), and a Relational Database holding Relational Data and the Graph Encoded into Relational Tables, producing Results]
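To make the relational-core idea concrete, here is a minimal sketch in SQLite through Python's built-in `sqlite3` module. It encodes a graph as an edge table and answers a reachability query with a recursive common table expression. It is not SQLGraph's or Grail's actual translation. The table `edges` and the functions `make_db` and `reachable` are hypothetical. Each recursion step amounts to one more join against `edges`, which is why long traversals get expensive in pure SQL.

```python
import sqlite3

def make_db(edge_list):
    """Encode a directed graph as a relational edge table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)", edge_list)
    return conn

def reachable(conn, source, target):
    """Reachability via a recursive CTE: each iteration joins the frontier with edges."""
    row = conn.execute(
        """
        WITH RECURSIVE reach(v) AS (
            SELECT ?
            UNION  -- UNION (not UNION ALL) drops repeated rows, so cycles terminate
            SELECT e.dst FROM edges e JOIN reach r ON e.src = r.v
        )
        SELECT EXISTS (SELECT 1 FROM reach WHERE v = ?)
        """,
        (source, target),
    ).fetchone()
    return bool(row[0])

conn = make_db([(1, 2), (2, 3), (3, 1), (4, 5)])
print(reachable(conn, 1, 3))  # True
print(reachable(conn, 1, 5))  # False
```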

Native Graph-Core
- Build on top of an RDBMS
- Extract graphs from the RDBMS
- Store graphs and process queries outside the realm of the RDBMS
- E.g., Ringo [SIGMOD’15], GraphGen [VLDB’15, SIGMOD’17]
- Pros:
  - Native processing of graph operations
- Cons:
  - Graph updates require re-extracting the graphs
  - Queries cannot reference any non-extracted relational data
[Figure: architecture with Graph Queries, a Graph Extraction and Materialization Engine holding Extracted Graphs, Graph Extraction Queries (SQL) against a Relational Database holding Relational Data, producing Results]
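The sketch below illustrates the graph-core pattern. Edges are pulled out of an RDBMS (SQLite here) into an in-memory adjacency list and traversed natively. The functions `extract_graph` and `bfs_reachable` are hypothetical. The sketch also shows the update drawback listed for this approach: a later insert into the relational table is invisible to the extracted graph until the graph is re-extracted.

```python
import sqlite3
from collections import defaultdict, deque

def extract_graph(conn):
    """Graph extraction: materialize the edge table as an adjacency list outside the RDBMS."""
    adj = defaultdict(list)
    for src, dst in conn.execute("SELECT src, dst FROM edges"):
        adj[src].append(dst)
    return adj

def bfs_reachable(adj, source, target):
    """Native traversal over the extracted snapshot (no SQL involved)."""
    seen, queue = {source}, deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            return True
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
conn.executemany("INSERT INTO edges VALUES (?, ?)", [(1, 2), (2, 3)])

snapshot = extract_graph(conn)
conn.execute("INSERT INTO edges VALUES (3, 4)")  # relational update after extraction

print(bfs_reachable(snapshot, 1, 4))             # False: the snapshot is stale
print(bfs_reachable(extract_graph(conn), 1, 4))  # True once the graph is re-extracted
```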

The Relational Model vs. the Graph Model
- Graph-core approach
  - Pro: Queries involving graph traversals (e.g., shortest paths) are handled efficiently in the graph model
  - Con: Not as pervasive and mature as RDBMSs
- Relational-core approach
  - Pro: Mature and pervasive
  - Con: Traversing a graph requires either many temporary inserts/deletes/updates or too many joins
    - This raises problems with intermediate-result size and cardinality estimation
- Can the best of both worlds be combined?
  - Support native graph processing inside an RDBMS

Proposed Approach: Native G+R Core
- Assume that graphs have relational schemas
- Relational schemas describe the edges/nodes
- Enable graphs to be defined as native database objects
- Store graphs in non-relational structures optimized for graph operations
- Extend the SQL language
  - Queries can compose relational and graph operations
- Cross-Data-Model QEPs (Query Evaluation Plans)
- Graph updates are supported
[Figure: Graph-Relational Queries (SQL) are evaluated by graph and relational operators (π, ⋈, σ, GraphOp) in the same QEP, over a Relational Database holding Relational Data and Graph Views (Topology + Tuple Pointers) produced by Graph Construction, yielding Results]

GRFusion: Realizing the G+R Approach
- We realize the G+R approach inside VoltDB
  - An open-source in-memory RDBMS
  - GRFusion: our realization of the G+R approach in VoltDB
  - A demo of GRFusion will appear in SIGMOD 2018
[Figure: Declarative Graph-Relational Queries pass through the Query Parser, Query Optimizer, and Plan Executor of the Graph-Relational Query Engine, over an In-Memory Relational Database holding Relational Data and Graph Views]

Create Graph View
- Create-Graph-View statement
  - Creates a named graph database object that can be referenced in queries
  - Defines the relational sources of the graph’s vertexes/edges
  - Materializes the topology of the graph in main memory as a singleton graph structure
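As a minimal Python sketch of what a graph view might hold, assume the "topology + tuple pointers" design described for the G+R approach. The view keeps only vertex ids and adjacency in memory, plus a pointer (here a row index) into the relational source row for each vertex and edge. The class `GraphView`, the function `create_graph_view`, and the row layout are hypothetical illustrations. They are not GRFusion's actual C++ structures in VoltDB.

```python
from dataclasses import dataclass, field

@dataclass
class GraphView:
    """Hypothetical in-memory graph view: topology plus pointers into relational rows."""
    vertex_rows: dict = field(default_factory=dict)  # vertex id -> row index in the vertex table
    out_edges: dict = field(default_factory=dict)    # vertex id -> [(edge id, dest vertex id)]
    in_edges: dict = field(default_factory=dict)     # vertex id -> [(edge id, src vertex id)]
    edge_rows: dict = field(default_factory=dict)    # edge id -> row index in the edge table

def create_graph_view(vertex_table, edge_table):
    """Build the topology once from relational sources (lists of dict rows here).

    Assumes every edge endpoint appears in vertex_table.
    """
    gv = GraphView()
    for i, row in enumerate(vertex_table):
        gv.vertex_rows[row["id"]] = i
        gv.out_edges[row["id"]] = []
        gv.in_edges[row["id"]] = []
    for i, row in enumerate(edge_table):
        gv.edge_rows[row["id"]] = i
        gv.out_edges[row["src"]].append((row["id"], row["dst"]))
        gv.in_edges[row["dst"]].append((row["id"], row["src"]))
    return gv

users = [{"id": 1, "lname": "Smith"}, {"id": 2, "lname": "Lee"}, {"id": 3, "lname": "Diaz"}]
friendships = [{"id": 10, "src": 1, "dst": 2, "since": 2004},
               {"id": 11, "src": 2, "dst": 3, "since": 1998}]

gv = create_graph_view(users, friendships)
print(gv.out_edges[1])                    # [(10, 2)]
print(users[gv.vertex_rows[3]]["lname"])  # Diaz: attributes are fetched via the pointer
```

Only ids live in the topology. Attributes such as `lname` or `since` stay in the relational rows and are reached through the stored row index.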

Graph-View of a Social Network

Graph-View Structure [Traversal Index]

Graph-View Structure [Traversal Index] (Cont’d)

The VERTEXES Construct
- Appears in the FROM clause and references a graph view
  - Select … From MyGraphView.VERTEXES v
- VERTEXES represents the vertexes of a graph view
- A vertex is a tuple with the following properties:
  - Id
  - FanIn
  - FanOut
  - A property for each vertex attribute
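To illustrate the tuple shape of VERTEXES, this sketch derives Id, FanIn, and FanOut from an adjacency structure and then appends the vertex's relational attributes. The function `vertexes` and the dict-based topology are hypothetical. In GRFusion the construct is used from SQL in the FROM clause, not from Python.

```python
def vertexes(out_edges, vertex_table):
    """Yield one tuple per vertex: (Id, FanIn, FanOut, *attributes).

    out_edges: vertex id -> list of destination vertex ids (hypothetical topology)
    vertex_table: vertex id -> dict of relational attributes
    """
    fan_in = {v: 0 for v in out_edges}
    for dests in out_edges.values():
        for d in dests:
            fan_in[d] += 1
    for v, dests in out_edges.items():
        yield (v, fan_in[v], len(dests), *vertex_table[v].values())

topology = {1: [2, 3], 2: [3], 3: []}
attrs = {1: {"lname": "Smith"}, 2: {"lname": "Lee"}, 3: {"lname": "Diaz"}}

for t in vertexes(topology, attrs):
    print(t)
# (1, 0, 2, 'Smith')
# (2, 1, 1, 'Lee')
# (3, 2, 0, 'Diaz')
```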

The EDGES Construct
- Appears in the FROM clause and references a graph view
  - Select … From MyGraphView.EDGES e
- EDGES represents the edges of a graph view
- An edge is a tuple with the following properties:
  - Id
  - StartVertexId
  - EndVertexId
  - A property for each edge attribute
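Analogously to VERTEXES, an EDGES tuple can be assembled from the topology plus the edge's relational row. In this hypothetical sketch, the topology stores only (edge id, start, end, row index) entries. The edge attributes are appended by dereferencing the row index, following the "topology + tuple pointers" idea. `edges_view` is an illustrative name only.

```python
def edges_view(topology, edge_table):
    """Yield one tuple per edge: (Id, StartVertexId, EndVertexId, *attributes).

    topology: list of (edge id, start vertex id, end vertex id, row index)
    edge_table: list of dict rows holding the edge attributes
    """
    for edge_id, start, end, row_idx in topology:
        yield (edge_id, start, end, *edge_table[row_idx].values())

rows = [{"since": 2004, "kind": "friend"}, {"since": 1998, "kind": "colleague"}]
topo = [(10, 1, 2, 0), (11, 2, 3, 1)]

print(list(edges_view(topo, rows)))
# [(10, 1, 2, 2004, 'friend'), (11, 2, 3, 1998, 'colleague')]
```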

The PATHS Construct: Extended SQL
- Appears in the FROM clause and references a graph view
  - Select … From MyGraphView.PATHS P
- PATHS represents a set of lazily evaluated paths
- A path is a sequence of consecutive edges
- Each edge has two endpoint vertexes
  - E.g., (V:attributes) -(:E:attributes)-> (V:attributes) ...
- A path is a tuple with the following properties:
  - Length
  - StartVertex
  - EndVertex
  - Vertexes
  - Edges
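Because PATHS is described as a set of lazily evaluated paths, a Python generator is a natural way to sketch it: each path is produced only when the consumer asks for the next one. The `Path` fields mirror the listed properties (Length, StartVertex, EndVertex, Vertexes, Edges). The following are assumptions made for this sketch:

- the `paths` function,
- the `max_length` bound,
- the restriction to simple paths (no repeated vertexes).

```python
from typing import Iterator, NamedTuple

class Path(NamedTuple):
    Length: int
    StartVertex: int
    EndVertex: int
    Vertexes: tuple
    Edges: tuple

def paths(out_edges, start, max_length) -> Iterator[Path]:
    """Lazily enumerate simple paths from `start` with 1..max_length edges (DFS order).

    out_edges: vertex id -> list of (edge id, destination vertex id)
    """
    stack = [(start, (start,), ())]
    while stack:
        v, verts, edges = stack.pop()
        if edges:
            yield Path(len(edges), start, v, verts, edges)
        if len(edges) < max_length:
            for edge_id, w in reversed(out_edges.get(v, [])):
                if w not in verts:  # keep paths simple
                    stack.append((w, verts + (w,), edges + (edge_id,)))

g = {1: [(10, 2), (11, 3)], 2: [(12, 3)], 3: []}
it = paths(g, 1, max_length=2)
print(next(it))  # only the first path has been computed at this point
# Path(Length=1, StartVertex=1, EndVertex=2, Vertexes=(1, 2), Edges=(10,))
```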

Declarative Graph-Relational Queries

The PathScan Operator
- PathScan is a logical operator that acts on a graph view
  - It has three corresponding physical operators: BFScan, DFScan, and SPScan
- The output of PathScan is a tuple
  - It extends the standard relational tuple
  - PathScan output can be ingested by other relational operators in the QEP
- PathScan accepts the id of the vertex from which to start the traversal
  - Otherwise, all vertexes are considered start vertexes
- Filters can be pushed into the PathScan operator as hints
  - E.g., P.PathLength = 2
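The PathScan behaviour listed above can be sketched as a small function with three features:

- It takes an optional start vertex and falls back to every vertex.
- It picks a breadth-first or depth-first strategy.
- It uses a pushed-down path-length hint to stop expanding once paths reach that length, rather than filtering afterwards.

The names `path_scan`, `strategy`, and `path_length_hint` are hypothetical. This is not GRFusion's operator code, and SPScan is omitted.

```python
from collections import deque

def path_scan(out_edges, start=None, strategy="BFS", path_length_hint=None):
    """Yield (start, end, vertex tuple) for simple paths, sketching a PathScan operator.

    out_edges: vertex id -> list of neighbour vertex ids
    start: start vertex id, or None to use every vertex as a start vertex
    strategy: "BFS" (BFScan-like) or "DFS" (DFScan-like)
    path_length_hint: if set, only paths with exactly this many edges are emitted,
        and the traversal never expands beyond that length (the pushed-down filter)
    """
    starts = [start] if start is not None else list(out_edges)
    for s in starts:
        frontier = deque([(s,)])
        while frontier:
            path = frontier.popleft() if strategy == "BFS" else frontier.pop()
            length = len(path) - 1
            if length > 0 and (path_length_hint is None or length == path_length_hint):
                yield (s, path[-1], path)
            if path_length_hint is not None and length >= path_length_hint:
                continue  # hint pushdown: do not explore longer paths
            for w in out_edges.get(path[-1], []):
                if w not in path:
                    frontier.append(path + (w,))

g = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(list(path_scan(g, start=1, path_length_hint=2)))
# [(1, 4, (1, 2, 4)), (1, 4, (1, 3, 4))]
```

Because the output is plain tuples, a downstream relational operator (a filter or a join) can consume it directly.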

Friends-of-Friends Query Example
- For all users working as lawyers, retrieve the last names of their friends of friends, where the friendships happened after 1/1/2000
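One way to read this query as a cross-data-model plan has three steps:

1. a relational selection on users (job is lawyer),
2. a two-hop traversal of the friendship graph with the date predicate applied to every edge,
3. a projection on last names.

This Python sketch mimics that plan on tiny, invented tables; GRFusion itself expresses such queries in its extended SQL. The schema (`job`, `lname`) and the data are hypothetical. Treating friendships as directed edges and excluding the start user are further assumptions.

```python
from datetime import date

users = {
    1: {"lname": "Smith", "job": "lawyer"},
    2: {"lname": "Lee", "job": "doctor"},
    3: {"lname": "Diaz", "job": "teacher"},
    4: {"lname": "Khan", "job": "lawyer"},
}
# Hypothetical directed friendship edges with the date each friendship started.
friendships = [
    (1, 2, date(2003, 5, 1)),
    (2, 3, date(2010, 2, 1)),
    (4, 2, date(1999, 7, 1)),  # too old: excluded by the date predicate
]

def friends_of_friends_lnames(users, friendships, cutoff=date(2000, 1, 1)):
    # Relational selection: start vertexes are lawyers.
    lawyers = [u for u, row in users.items() if row["job"] == "lawyer"]
    # Edge predicate applied during traversal: only friendships after the cutoff.
    adj = {}
    for src, dst, since in friendships:
        if since > cutoff:
            adj.setdefault(src, []).append(dst)
    result = set()
    for u in lawyers:  # two-hop traversal (PathLength = 2)
        for f in adj.get(u, []):
            for fof in adj.get(f, []):
                if fof != u:
                    result.add(users[fof]["lname"])  # projection on last name
    return sorted(result)

print(friends_of_friends_lnames(users, friendships))  # ['Diaz']
```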

QEP of the Friends-of-Friends Query

Reachability Query Example
- Check whether Protein X interacts with Protein Y directly (i.e., by an edge) or indirectly (i.e., by a path) through either a covalent or a stable interaction type
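Here is a minimal sketch of this reachability check over a hypothetical list of (protein, protein, interaction type) triples. It runs a breadth-first search that traverses only edges whose type is covalent or stable. A direct interaction is just a path of length one, so the same search covers both cases. Treating interactions as undirected is an assumption, and `interacts` is an illustrative name.

```python
from collections import deque

def interacts(interactions, x, y, allowed=frozenset({"covalent", "stable"})):
    """Return True if x reaches y using only edges whose type is in `allowed`."""
    adj = {}
    for a, b, kind in interactions:
        if kind in allowed:  # edge predicate applied while building the traversal graph
            adj.setdefault(a, []).append(b)
            adj.setdefault(b, []).append(a)  # assumption: interactions are undirected
    seen, queue = {x}, deque([x])
    while queue:
        v = queue.popleft()
        if v == y:
            return True
        for w in adj.get(v, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False

ppi = [("X", "A", "covalent"), ("A", "B", "transient"), ("A", "Y", "stable")]
print(interacts(ppi, "X", "Y"))  # True, via X-A (covalent) and A-Y (stable)
print(interacts(ppi, "X", "B"))  # False: the only edge to B is transient
```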

Shortest-Path Queries with Relational Predicates
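As a sketch of a shortest-path query with relational predicates, consider the motivating example given earlier: for each patient P in a selected area, find the nearest hospital to P. A relational predicate selects the patients. Dijkstra's algorithm, which the experimental setup names as GRFusion's single-source shortest-path algorithm, then runs from each selected patient's location and stops at the first hospital vertex it settles. The road graph, the patient rows, and `nearest_hospital` are hypothetical.

```python
import heapq

def nearest_hospital(roads, start, hospitals):
    """Dijkstra from `start`; stop at the first settled vertex that is a hospital.

    roads: vertex -> list of (neighbour, non-negative distance)
    Returns (hospital vertex, distance), or None if no hospital is reachable.
    """
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        if v in hospitals:
            return v, d
        for w, weight in roads.get(v, []):
            nd = d + weight
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return None

roads = {"a": [("b", 4), ("c", 1)], "c": [("b", 1), ("h2", 7)], "b": [("h1", 2)]}
patients = [
    {"name": "P1", "area": "north", "location": "a"},
    {"name": "P2", "area": "south", "location": "c"},
]
hospitals = {"h1", "h2"}

selected = [p for p in patients if p["area"] == "north"]  # relational predicate first
for p in selected:
    print(p["name"], nearest_hospital(roads, p["location"], hospitals))
# P1 ('h1', 4)
```

Stopping at the first settled hospital is correct because, with non-negative distances, Dijkstra settles vertexes in order of increasing distance.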

Evaluating the Native G+R Approach
- Realized a centralized version of GRFusion (Native G+R Core approach) inside VoltDB version 6.7
- Single node running Linux kernel version 3.17.7
  - 32 cores of Intel Xeon 2.90 GHz
  - 384 GB of RAM
- Compared against:
  - Native relational-core systems:
    - SQLGraph [SIGMOD’15], Grail [CIDR’15]
  - Native graph-core systems:
    - Neo4j [neo4j.com] and Titan [thinkaurelius.github.io/titan]

Experimental Setup
- Native relational-core approach
  - SQLGraph [SIGMOD’15]
    - Represents path traversal using recursive relational joins
    - Commercial system (code not available)
    - We implemented its techniques in VoltDB (in-memory)
  - Grail [CIDR’15]
    - Implemented Grail in VoltDB
    - Also evaluated Grail in Hekaton
    - Reached similar conclusions (the Hekaton results are not reported here)

Experimental Setup (Cont’d)
- Native graph-core approach
  - Neo4j [neo4j.com] and Titan [thinkaurelius.github.io/titan]
    - Native graph cores (specialized graph systems)
    - Disk-based systems
    - Titan: configured to use its in-memory storage configuration
    - Neo4j: run on a RAM disk to mitigate the disk I/O cost
  - GRFusion uses simple graph algorithms (single-source shortest path: Dijkstra’s algorithm)
    - The goal is to investigate the performance gains, if any, of the G+R approach in contrast to the native relational-core approach

Evaluating GRFusion
- Graph queries
  - Reachability queries
  - Reachability queries with filtering predicates
  - Shortest-path queries
  - Subgraph queries (e.g., count triangles)
- Datasets
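For the subgraph queries, the slide's example is triangle counting. A common in-memory technique, not necessarily the one GRFusion uses, orients each undirected edge from the lower to the higher vertex id and intersects neighbour sets. Each triangle is then counted exactly once. `count_triangles` is a hypothetical name.

```python
def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs."""
    higher = {}  # vertex -> set of neighbours with a larger id
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        a, b = min(u, v), max(u, v)
        higher.setdefault(a, set()).add(b)  # sets also absorb duplicate edges
    count = 0
    for a, nbrs in higher.items():
        for b in nbrs:
            # every c > b adjacent to both a and b closes the triangle a < b < c
            count += len(nbrs & higher.get(b, set()))
    return count

print(count_triangles([(1, 2), (2, 3), (1, 3), (3, 4), (2, 4)]))  # 2
```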

Reachability Queries (DBLP Dataset)
- The performance of GRFusion, Neo4j, and Titan is more stable than that of SQLGraph
  - They avoid the overheads of recursive relational joins
- GRFusion performs better than Neo4j and Titan
  - VoltDB is optimized for main memory
  - Disk-based Titan/Neo4j (although run on a RAM disk) are not optimized for main memory
  - Graph views in GRFusion are more compact
    - They encode only the topology within the graph
    - No vertex/edge attributes are stored in the topology
    - Thus, GRFusion makes better use of caching
  - GRFusion/VoltDB are C++-based
  - Neo4j and Titan are Java-based
    - Overheads from Java’s automatic memory management

Reachability Queries (String Dataset)
- String dataset: ~0.5B edges (much larger than DBLP)
- SQLGraph (based on VoltDB):
  - Materializes the join results at each intermediate stage
  - The intermediate results explode in size (more than 11 joins are performed)
- SQLGraph and GRFusion both follow BFS evaluation
- GRFusion follows an iterative model:
  - It evaluates one path at a time
  - Also, only the vertex ids are stored in the BFS queue
  - This is more storage-efficient than storing the tuples of the relational joins as intermediate results
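A small, hypothetical comparison illustrates the storage argument. A join-based evaluation in the style described for SQLGraph keeps every partial path produced by the joins at each stage. A BFS that stores only vertex ids keeps each vertex in its queue at most once. The sketch counts the intermediate items each approach holds on a small layered graph. It covers only the ids-only storage point, not GRFusion's one-path-at-a-time pipeline, and says nothing about the paper's measured numbers. The function names are illustrative.

```python
from collections import deque

def join_based_rows(adj, source, hops):
    """Join-style evaluation: each stage materializes all partial paths (rows)."""
    rows, total = [(source,)], 0
    for _ in range(hops):
        rows = [r + (w,) for r in rows for w in adj.get(r[-1], [])]  # one more join
        total += len(rows)
    return total

def bfs_ids(adj, source, hops):
    """BFS storing only vertex ids: each vertex enters the queue at most once."""
    seen, queue, enqueued = {source}, deque([(source, 0)]), 1
    while queue:
        v, d = queue.popleft()
        if d == hops:
            continue
        for w in adj.get(v, []):
            if w not in seen:
                seen.add(w)
                queue.append((w, d + 1))
                enqueued += 1
    return enqueued

# Layered graph: every vertex in layer i links to every vertex in layer i + 1.
layers = [[0], [1, 2, 3], [4, 5, 6], [7, 8, 9]]
adj = {v: layers[i + 1] for i in range(len(layers) - 1) for v in layers[i]}

print(join_based_rows(adj, 0, 3))  # 39 partial paths (3 + 9 + 27)
print(bfs_ids(adj, 0, 3))          # 10 vertex ids
```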

Reachability Queries (Twitter Dataset)
- Twitter dataset: 1.4B edges
- Fan-out is also a factor in performance
  - However, we did not study the effect of fan-out
  - That would require synthetic datasets
  - The current study focuses on real datasets
