GraphGen: Adaptive Graph Processing using Relational Databases
Department of Computer Science University of Maryland
Graph Analytics / Querying
Graph datasets can provide value in many domains, e.g., social networks, email networks, protein interaction networks, and stock trading networks.
There are many different ways to manage graph data
DECLARATIVE
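Example (figure): graphs hidden in a relational database with tables Customer (c_key, name), Orders (with order_key, customer_key) and LineItem (with order_key, part_key). Customer rows include (c_1, John) and (c_2, Jane).
Joining Orders and LineItem on order_key answers "Which customer bought which product?", giving (c_key, p_key) pairs (c1, p1), (c1, p2), (c3, p2), (c4, p1), (c6, p1).
Self-joining that result on p_key answers "Which customers bought the same item?", giving (cust1, cust2) pairs such as (c1, c4), (c1, c6), (c1, c3): a graph whose edges connect customers who bought a common product.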
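The self-join on p_key can be sketched in a few lines of Python. This is an illustration, not GraphGen's implementation: it starts from the (c_key, p_key) pairs shown in the example, because the Orders and LineItem rows needed to redo the first join are not fully recoverable, and the function name same_item_edges is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# (c_key, p_key) pairs from the example, i.e. the result of joining
# Orders and LineItem on order_key.
BOUGHT = [("c1", "p1"), ("c1", "p2"), ("c3", "p2"), ("c4", "p1"), ("c6", "p1")]


def same_item_edges(bought):
    """Self-join on p_key: connect every pair of distinct customers who bought the same part."""
    buyers = defaultdict(set)  # p_key -> customers who bought it
    for c_key, p_key in bought:
        buyers[p_key].add(c_key)
    edges = set()
    for customers in buyers.values():
        # Sorting stores each undirected edge once, as (smaller key, larger key).
        for a, b in combinations(sorted(customers), 2):
            edges.add((a, b))
    return edges


print(sorted(same_item_edges(BOUGHT)))
# [('c1', 'c3'), ('c1', 'c4'), ('c1', 'c6'), ('c4', 'c6')]
```

Besides the pairs listed in the example, the join also yields (c4, c6), since both customers bought p1.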
Many other graphs of potential interest, e.g., a graph linking customers to the products they bought ("Which customer bought which product?").
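GraphGen architecture (figure). Components: a Java program that issues Graph Definition Queries and Graph Analysis Queries; the DSL Parser + Optimizer; SQL queries sent to the backend relational DBMS; and an In-Memory Engine supporting direct graph access, vertex-centric computation, and analysis directly over the graph, with results returned to the program.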
○ User specifies how to construct the Nodes and Edges
CREATE GRAPHVIEW CoAuthors AS
Nodes(ID, name) :- Author(ID, name).
Edges(ID1, ID2, wt=$COUNT(pub)) :- AuthorPub(ID1, pub), AuthorPub(ID2, pub).
○ Can enable many optimizations
Edge property (wt): number of co-authored publications
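To make the semantics of the CoAuthors rules concrete, here is a minimal in-memory sketch in Python over tiny hypothetical Author and AuthorPub instances. It evaluates the Edges rule as a naive nested-loop self-join on pub and counts the shared publications per pair to get wt. Dropping self-loops (ID1 = ID2) is an assumption of the sketch, since the rule as written would also produce them.

```python
from collections import Counter

# Hypothetical instances of Author(ID, name) and AuthorPub(ID, pub).
AUTHOR = [(1, "Ann"), (2, "Bob"), (3, "Cai")]
AUTHOR_PUB = [(1, "pubA"), (2, "pubA"), (1, "pubB"), (2, "pubB"), (3, "pubB")]


def coauthors_view(author, author_pub):
    # Nodes(ID, name) :- Author(ID, name).
    nodes = dict(author)
    # Edges(ID1, ID2, wt=$COUNT(pub)) :- AuthorPub(ID1, pub), AuthorPub(ID2, pub).
    # Naive nested-loop self-join on pub; wt counts the shared publications.
    wt = Counter()
    for id1, pub1 in author_pub:
        for id2, pub2 in author_pub:
            if pub1 == pub2 and id1 != id2:  # assumption: self-loops dropped
                wt[(id1, id2)] += 1
    return nodes, dict(wt)


nodes, edges = coauthors_view(AUTHOR, AUTHOR_PUB)
print(edges[(1, 2)], edges[(1, 3)])  # 2 1
```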
CREATE GRAPHVIEW AuthorEgoNetworks(X) WHERE Author(X) AS
Nodes(X, name) :- Author(X, name).
Nodes(ID, name) :- AuthorPub(X, pub), AuthorPub(ID, pub), Author(ID, name).
Edges(ID1, ID2) :- AuthorPub(ID1, pub), AuthorPub(ID2, pub).
Extract all ego-graphs
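A naive way to evaluate AuthorEgoNetworks is to build each ego-graph separately, one per author X. The Python sketch below does that on a hypothetical AuthorPub instance and returns only node IDs, not names. Two assumptions are labeled in the code: X ranges over the authors that appear in AuthorPub, and each ego-graph's edges are restricted to pairs of its own nodes.

```python
from collections import defaultdict

# Hypothetical AuthorPub(ID, pub) instance.
AUTHOR_PUB = [(1, "pubA"), (2, "pubA"), (2, "pubB"), (3, "pubB"), (4, "pubC")]


def ego_networks(author_pub):
    """Return {X: (nodes, edges)}, one ego-graph per author X."""
    authors_of = defaultdict(set)  # pub -> authors
    pubs_of = defaultdict(set)     # author -> pubs
    for aid, pub in author_pub:
        authors_of[pub].add(aid)
        pubs_of[aid].add(pub)

    egos = {}
    # Assumption: X ranges over authors appearing in AuthorPub (the DSL uses Author(X)).
    for x in list(pubs_of):
        # Nodes(X, ...) plus Nodes(ID, ...) for every ID sharing a pub with X.
        nodes = {x} | {aid for pub in pubs_of[x] for aid in authors_of[pub]}
        # Edges(ID1, ID2): co-author pairs, restricted (assumption) to this ego-graph's nodes.
        edges = {(a, b) for a in nodes for b in nodes if a != b and pubs_of[a] & pubs_of[b]}
        egos[x] = (nodes, edges)
    return egos


egos = ego_networks(AUTHOR_PUB)
print(sorted(egos[2][0]), sorted(egos[2][1]))
# [1, 2, 3] [(1, 2), (2, 1), (2, 3), (3, 2)]
```

Because neighboring ego-graphs overlap, this per-author loop recomputes many of the same co-author pairs.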
○ Pattern-matching queries (as in SPARQL, Cypher, PGQL etc.) are expressed as computation over the Edges VIEW
USING GRAPHVIEW CoAuthors
Triangle(X, Y, Z) :- Nodes(X, _, "ML"), Nodes(Y, _, "DB"), Nodes(Z, _, "AL"), Edges(X, Y), Edges(Y, Z), Edges(X, Z).
Find triangles of authors whose areas follow: “ML” -> “DB” -> “AL”
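The Triangle rule is a conjunctive query: pick one node per area and require all three edges. The Python sketch below evaluates it by brute force over a hypothetical CoAuthors instance whose nodes carry an area attribute, matching the three-argument Nodes(ID, _, area) atoms in the query. Checking edges in either direction is an assumption that fits CoAuthors, whose self-join produces both (u, v) and (v, u).

```python
# Hypothetical CoAuthors instance: node -> research area, plus co-author edges.
AREA = {"a": "ML", "b": "DB", "c": "AL", "d": "DB"}
EDGES = {("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")}


def has_edge(edges, u, v):
    # The CoAuthors self-join yields both (u, v) and (v, u), so check either direction.
    return (u, v) in edges or (v, u) in edges


def ml_db_al_triangles(area, edges):
    """Triangle(X, Y, Z) with X in "ML", Y in "DB", Z in "AL" and edges X-Y, Y-Z, X-Z."""
    by_area = {}
    for node, node_area in area.items():
        by_area.setdefault(node_area, []).append(node)
    return sorted(
        (x, y, z)
        for x in by_area.get("ML", [])
        for y in by_area.get("DB", [])
        for z in by_area.get("AL", [])
        if has_edge(edges, x, y) and has_edge(edges, y, z) and has_edge(edges, x, z)
    )


print(ml_db_al_triangles(AREA, EDGES))  # [('a', 'b', 'c')]
```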
Where to execute (in memory or in the database) is decided based on the query/analysis.
Execution times in seconds (In-memory execution: In-memory, ETL; In-database execution: MySQL, PostgreSQL)
Dataset | In-memory | ETL | MySQL | PostgreSQL
Small | 0.001 | 2.05 | 0.8 | 0.1
Large | 0.015 | 17.52 | 4.26 | 0.704
Triangle Pattern Matching
Times in seconds
Dataset | DBS1 | DBS2
Small | 0.899 | 0.22
Large | 4.25 | NA
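One way to read the in-memory vs. in-database numbers: in-memory execution pays the ETL cost once and is then much faster per query. Assuming the same analysis query is run n times against one extracted graph (an assumption for illustration, not a claim made in the slides), in-memory execution wins once t_ETL + n * t_mem < n * t_db. The hypothetical helper below computes the smallest such n; with the Small-dataset numbers (ETL 2.05 s, in-memory 0.001 s, PostgreSQL 0.1 s) it gives 21 queries.

```python
import math


def break_even_queries(etl_s, in_memory_s, in_db_s):
    """Smallest number of queries n for which paying the ETL cost once and then
    running in memory is faster than running every query in the database:
        etl_s + n * in_memory_s < n * in_db_s
    Returns None if in-memory execution is never cheaper per query."""
    saving_per_query = in_db_s - in_memory_s
    if saving_per_query <= 0:
        return None
    # Strict inequality: n > etl_s / saving_per_query.
    return math.floor(etl_s / saving_per_query) + 1


print(break_even_queries(2.05, 0.001, 0.1))     # Small vs. PostgreSQL: 21
print(break_even_queries(17.52, 0.015, 0.704))  # Large vs. PostgreSQL: 26
```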
estimation errors associated with them.
Two choices when generating SQL for a graph view:
1) WITH vs. VIEW:
   WITH Nodes AS (...), Edges AS (...) (SQL for answering query)
   CREATE VIEW Edges AS (...); CREATE VIEW Nodes AS (...); (SQL for answering query)
2) Duplicate elimination (DISTINCT) in the Nodes and Edges definitions: it can be skipped when the query / analysis doesn't care about duplicates!
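The two SQL-generation choices can be tried out with Python's built-in sqlite3 module, used here only as a stand-in for the MySQL/PostgreSQL backends. The sketch assumes a hypothetical AuthorPub(aid, pub) table and drops self-loops from Edges. It shows that skipping DISTINCT inside Edges leaves duplicate rows (8 vs. 6), yet a query that only asks which authors are adjacent returns the same answer, and that the same Edges definition can be packaged either as a per-statement WITH clause or as a reusable view.

```python
import sqlite3

# sqlite3 stands in for the MySQL/PostgreSQL backend; AuthorPub rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE AuthorPub (aid TEXT, pub TEXT)")
conn.executemany(
    "INSERT INTO AuthorPub VALUES (?, ?)",
    [("a1", "p1"), ("a2", "p1"), ("a1", "p2"), ("a2", "p2"), ("a3", "p2")],
)


def edges_sql(distinct):
    """Co-author edges via a self-join on pub (self-loops excluded, an assumption)."""
    return (
        f"SELECT {'DISTINCT ' if distinct else ''}x.aid AS id1, y.aid AS id2 "
        "FROM AuthorPub x JOIN AuthorPub y ON x.pub = y.pub WHERE x.aid <> y.aid"
    )


def edge_count_with(distinct):
    # Choice 1a: WITH, a common table expression scoped to this one statement.
    sql = f"WITH Edges AS ({edges_sql(distinct)}) SELECT COUNT(*) FROM Edges"
    return conn.execute(sql).fetchone()[0]


def neighbors_with(author, distinct):
    # The outer DISTINCT makes the answer independent of duplicates inside Edges.
    sql = (
        f"WITH Edges AS ({edges_sql(distinct)}) "
        "SELECT DISTINCT id2 FROM Edges WHERE id1 = ? ORDER BY id2"
    )
    return [row[0] for row in conn.execute(sql, (author,))]


# Choice 1b: a VIEW, which stays defined for later queries on this connection.
conn.execute(f"CREATE TEMP VIEW EdgesView AS {edges_sql(False)}")
view_neighbors = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT id2 FROM EdgesView WHERE id1 = ? ORDER BY id2", ("a1",)
    )
]

print(edge_count_with(False), edge_count_with(True))  # 8 6
print(neighbors_with("a1", False), neighbors_with("a1", True), view_neighbors)
```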
(Chart: time for the query to finish, in seconds.)
Snapshots
CREATE GRAPHVIEW CoAuthorsSnapshot(X) WHERE X IN RANGE(1950, 2017, 1) AS
Nodes(ID, name) :- Author(ID, name).
Edges(ID1, ID2) :- AuthorPub(ID1, pub), AuthorPub(ID2, pub), Publication(pub, _, Y), Y <= X.
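Consecutive snapshots of CoAuthorsSnapshot overlap heavily: the graph for year X+1 is the graph for year X plus the co-author edges of publications from year X+1. The Python sketch below exploits that by computing per-year edge deltas once and accumulating them, instead of re-evaluating the Edges rule for every X. It is one illustrative way to share work across snapshots, not necessarily GraphGen's strategy; the toy relations are hypothetical, only edges are computed, edges are treated as undirected, and RANGE(1950, 2017, 1) is assumed to include both endpoints.

```python
from collections import defaultdict

# Hypothetical toy relations: AuthorPub(aid, pub) and Publication(pub, title, year).
AUTHOR_PUB = [(1, "p1"), (2, "p1"), (2, "p2"), (3, "p2"), (1, "p3"), (3, "p3")]
PUBLICATION = [("p1", "t1", 1990), ("p2", "t2", 2000), ("p3", "t3", 2000)]


def edge_deltas(author_pub, publication):
    """year -> co-author edges contributed by publications of that year (undirected)."""
    year_of = {pub: year for pub, _title, year in publication}
    authors_of = defaultdict(set)
    for aid, pub in author_pub:
        authors_of[pub].add(aid)
    delta = defaultdict(set)
    for pub, authors in authors_of.items():
        for a in authors:
            for b in authors:
                if a < b:  # store each undirected edge once
                    delta[year_of[pub]].add((a, b))
    return delta


def snapshots(delta, start, stop, step):
    """Snapshot X holds every edge from a publication with year <= X.
    Assumes RANGE(start, stop, step) includes stop."""
    years = sorted(delta)
    current, result, i = set(), {}, 0
    for x in range(start, stop + 1, step):
        # Add only the edges whose year became <= x since the previous snapshot.
        while i < len(years) and years[i] <= x:
            current |= delta[years[i]]
            i += 1
        result[x] = frozenset(current)  # copy per snapshot keeps the sketch simple
    return result


snaps = snapshots(edge_deltas(AUTHOR_PUB, PUBLICATION), 1950, 2017, 1)
print(len(snaps), sorted(snaps[1995]), sorted(snaps[2017]))
# 68 [(1, 2)] [(1, 2), (1, 3), (2, 3)]
```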
Key Challenge: Develop a systematic approach to optimizing the extraction of and execution against such multi-graph views.
E.g. Ego-Graph Analysis
Please see full paper
savings
collections of graphs
Ego-graph extraction with tagged edges (figure):
○ Tagged Edges table (aid1, aid2, tag): each edge is tagged with a source author (tag) whose ego-graph it belongs to.
○ Find the edges 1-hop away from the source (tag) by joining the tagged edges on e1.aid2 = e2.aid1, and union the result with the initial Tagged Edges table.
○ Tag Aggregation: collapse the result to one row per edge (aid1, aid2) with an array of tags, tags[]; the tags show which ego-graphs involve the edge.
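The tagging steps can be sketched in Python on a hypothetical directed edge list. Following the slide, each edge starts tagged with its source author, one join on e1.aid2 = e2.aid1 carries each tag one hop further, the result is unioned with the initial table, and aggregation collects the tags per edge. Restricting the initial tags to a chosen set of egos is an assumption of this sketch, as are the names ego_tags and EDGES.

```python
from collections import defaultdict

# Hypothetical directed edges (aid1, aid2).
EDGES = [("a1", "a2"), ("a2", "a3"), ("a3", "a4"), ("a5", "a3")]


def ego_tags(edges, egos):
    """Return {(aid1, aid2): sorted list of ego-graph tags involving that edge}."""
    # Tagged Edges table: each edge leaving an ego is tagged with that ego.
    tagged = {(a, b, a) for a, b in edges if a in egos}

    # Edges 1 hop away: join e1.aid2 = e2.aid1, carrying e1's tag over to e2.
    out_edges = defaultdict(list)
    for a, b in edges:
        out_edges[a].append(b)
    one_hop = {(b, c, tag) for _a, b, tag in tagged for c in out_edges[b]}

    # Union with the initial Tagged Edges table.
    tagged |= one_hop

    # Tag aggregation: one row per edge, with the list of tags.
    tags = defaultdict(set)
    for a, b, tag in tagged:
        tags[(a, b)].add(tag)
    return {edge: sorted(ts) for edge, ts in tags.items()}


for edge, ts in sorted(ego_tags(EDGES, {"a1", "a2", "a3", "a5"}).items()):
    print(edge, ts)
# ('a1', 'a2') ['a1']
# ('a2', 'a3') ['a1', 'a2']
# ('a3', 'a4') ['a2', 'a3', 'a5']
# ('a5', 'a3') ['a5']
```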
graphs stored implicitly in a structured data store.
○ GraphViews over relational schemas
○ Declarative Graph queries
GraphViews
There is a variety of challenges & opportunities here in terms of: