GraphiQL: Graph Intuitive Query Language for Relational Databases


slide-1
SLIDE 1

GraphiQL: Graph Intuitive Query Language for Relational Databases

Alekh Jindal, Sam Madden, Mike Stonebraker, Amol Deshpande

MIT, University of Maryland

IEEE BigData 2014

[Figure: the title information drawn as a graph whose edges are labeled "talking on", "at", "supervisors", "work", "collaborate", and "sabbatical"]

slide-2
SLIDE 2

[Figure: the title-slide graph, with the labels "Relational Database" and "Expensive!"]

slide-3
SLIDE 3

Graph Analysis = Graph Algorithms + Store, Extract, Preprocess, Update, Failover, Postprocess

slide-4
SLIDE 4

Graph Analysis = Graph Algorithms + Store, Extract, Preprocess, Update, Failover, Postprocess

  • “Counting Triangles with Vertica”
  • “Scalable Social Graph Analytics Using the Vertica Analytic Platform”
  • “Graph Analysis: Do We Have to Reinvent the Wheel?”
  • “Query Optimization of Distributed Pattern Matching”
  • “GraphX: A Resilient Distributed Graph System on Spark”
  • “Vertexica: Your Relational Friend for Graph Analytics!”

Relational Database

slide-5
SLIDE 5

Problem!

slide-6
SLIDE 6


SQL

slide-7
SLIDE 7

SELECT, UPDATE, FROM, GROUP BY, SUM, COUNT, WHERE

slide-8
SLIDE 8

Redundant Effort

slide-9
SLIDE 9

Optimizations?

slide-10
SLIDE 10

GraphiQL

slide-11
SLIDE 11


SQL

slide-12
SLIDE 12

GraphiQL

slide-13
SLIDE 13

Key Features

  • Graph view of relational data; the system takes care of mapping to the relational world
  • Inspired by Pig Latin: the right balance between declarative and procedural language styles
  • Key graph constructs: looping, recursion, neighborhood access
  • Compiles to optimized SQL
slide-14
SLIDE 14

[Figure: the title-slide graph, with the labels "Graph Table", "Relational Table", "GraphiQL", and "SQL"]

slide-15
SLIDE 15

Graph Elements

[Figure: a Graph Table holding nodes (node1 to node9) and edges (edge1 to edge9); labels: id, type, weight, outgoing, incoming]

slide-16
SLIDE 16

Graph Table Definition

  • Create

CREATE GRAPHTABLE g AS
 NODE (p1,p2,..)
 EDGE (q1,q2,..)

  • Load

LOAD g AS
 NODE FROM graph_nodes DELIMITER d
 EDGE FROM graph_edges DELIMITER d

  • Drop

DROP GRAPHTABLE g
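
One way to picture what the three statements above might do underneath is a pair of relational tables, one for nodes and one for edges. The sketch below uses Python's sqlite3 module. The table names (`g_nodes`, `g_edges`), the `id`/`src`/`dst` columns, and the three helper functions are hypothetical and only illustrate the idea; they are not GraphiQL's actual schema or API.

```python
import csv
import io
import sqlite3

def create_graph_table(conn, name, node_props, edge_props):
    """CREATE GRAPHTABLE: one node table and one edge table (assumed layout)."""
    node_cols = "".join(f", {p}" for p in node_props)
    edge_cols = "".join(f", {q}" for q in edge_props)
    conn.execute(f"CREATE TABLE {name}_nodes (id INTEGER{node_cols})")
    conn.execute(f"CREATE TABLE {name}_edges (src INTEGER, dst INTEGER{edge_cols})")

def load_graph_table(conn, name, node_file, edge_file, delimiter):
    """LOAD: insert delimited rows into the node table and the edge table."""
    for suffix, handle in (("nodes", node_file), ("edges", edge_file)):
        for row in csv.reader(handle, delimiter=delimiter):
            marks = ",".join("?" for _ in row)
            conn.execute(f"INSERT INTO {name}_{suffix} VALUES ({marks})", row)

def drop_graph_table(conn, name):
    """DROP GRAPHTABLE: remove both underlying tables."""
    conn.execute(f"DROP TABLE {name}_nodes")
    conn.execute(f"DROP TABLE {name}_edges")

conn = sqlite3.connect(":memory:")
create_graph_table(conn, "g", ["pr"], ["weight"])
load_graph_table(conn, "g",
                 io.StringIO("1|0.5\n2|0.5\n"),   # stands in for graph_nodes
                 io.StringIO("1|2|1.0\n"),        # stands in for graph_edges
                 "|")
node_count = conn.execute("SELECT COUNT(*) FROM g_nodes").fetchone()[0]
edge_count = conn.execute("SELECT COUNT(*) FROM g_edges").fetchone()[0]
```

Keeping nodes and edges in separate tables matches the later compiler slide, where g(type=N) and g(type=E) map to relations N and E.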

slide-17
SLIDE 17

Graph Table Manipulation

  • Iterate: FOREACH element IN g [WHILE condition]
  • Filter: g’ = g(k1=v1,k2=v2,…,kn=vn)
  • Retrieve: GET expr1,expr2,…,exprn [WHERE condition]
  • Update: SET variable TO expr [WHERE condition]
  • Aggregate: SUM, COUNT, MIN, MAX, AVG

slide-18
SLIDE 18

Nested Manipulation

[Figure: an outer manipulation (Iterate, Aggregate, Retrieve, Update) enclosing an inner manipulation (Iterate, Aggregate, Retrieve, Update)]

slide-19
SLIDE 19

Example 1: PageRank

FOREACH n IN g(type=N)
 SET n.pr TO new_pr

slide-20
SLIDE 20

Example 1: PageRank

FOREACH n IN g(type=N)
 SET n.pr TO 0.15/num_nodes + 0.85*SUM(pr_neighbors)

slide-21
SLIDE 21

Example 1: PageRank

FOREACH n IN g(type=N)
 SET n.pr TO 0.15/num_nodes + 0.85*SUM(
  FOREACH n’ IN n.in(type=N)
   GET pr_n’
 )

slide-22
SLIDE 22

Example 1: PageRank

FOREACH n IN g(type=N)
 SET n.pr TO 0.15/num_nodes + 0.85*SUM(
  FOREACH n’ IN n.in(type=N)
   GET n’.pr/COUNT(n’.out(type=N))
 )

slide-23
SLIDE 23

Example 1: PageRank

FOREACH iterations IN [1:10]
 FOREACH n IN g(type=N)
  SET n.pr TO 0.15/num_nodes + 0.85*SUM(
   FOREACH n’ IN n.in(type=N)
    GET n’.pr/COUNT(n’.out(type=N))
  )

slide-24
SLIDE 24

Example 1: PageRank

FOREACH iterations IN [1:10]
 FOREACH n IN g(type=N)
  SET n.pr TO 0.15/num_nodes + 0.85*SUM(
   FOREACH n’ IN n.in(type=N)
    GET n’.pr/COUNT(n’.out(type=N))
  )

Callouts: Reason about graph, Neighborhood Access, Looping, Nested Manipulations
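
For readers who want to check the arithmetic, the GraphiQL PageRank program above can be mirrored in a few lines of plain Python. This is only a sketch of the same formula, not of how GraphiQL executes it. The function name, the uniform starting rank of 1/num_nodes, and the choice to compute each iteration from the previous iteration's ranks are assumptions that the slides do not state.

```python
def pagerank(nodes, edges, iterations=10):
    """pr(n) = 0.15/num_nodes + 0.85 * sum over incoming neighbors n'
    of pr(n') / out_degree(n'), repeated for a fixed number of iterations."""
    num_nodes = len(nodes)
    out_degree = {n: 0 for n in nodes}
    incoming = {n: [] for n in nodes}
    for src, dst in edges:
        out_degree[src] += 1        # COUNT(n'.out(type=N))
        incoming[dst].append(src)   # n.in(type=N)
    pr = {n: 1.0 / num_nodes for n in nodes}  # assumed initial rank
    for _ in range(iterations):     # FOREACH iterations IN [1:10]
        pr = {
            n: 0.15 / num_nodes
               + 0.85 * sum(pr[m] / out_degree[m] for m in incoming[n])
            for n in nodes
        }
    return pr

# Three nodes in a directed cycle: every node keeps rank 1/3
ranks = pagerank(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])
```

On the three-node cycle each node has exactly one incoming neighbor with out-degree 1, so the ranks stay at 1/3 in every iteration.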

slide-25
SLIDE 25

Example 2: SSSP

FOREACH n IN g(type=N)
 SET n.dist TO min_dist

slide-26
SLIDE 26

Example 2: SSSP

FOREACH n IN g(type=N)
 SET n.dist TO MIN(n.in(type=N).dist)+1 AS dist’
 WHERE dist’ < n.dist

slide-27
SLIDE 27

Example 2: SSSP

WHILE updates > 0
 FOREACH n IN g(type=N)
  updates =
   SET n.dist TO MIN(n.in(type=N).dist)+1 AS dist’
   WHERE dist’ < n.dist

slide-28
SLIDE 28

Example 2: SSSP

SET g(type=N).dist TO inf
SET g(type=N,id=start).dist TO 0
WHILE updates > 0
 FOREACH n IN g(type=N)
  updates =
   SET n.dist TO MIN(n.in(type=N).dist)+1 AS dist’
   WHERE dist’ < n.dist
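
The same relaxation loop can be written out in plain Python to make the termination condition concrete. This is a sketch of the program's meaning under unit edge weights, as in the slide. The function name, the adjacency dictionary, and starting `updates` at 1 so the loop runs at least once are assumptions.

```python
import math

def sssp(nodes, edges, start):
    """Unit-weight single-source shortest paths by repeated relaxation."""
    incoming = {n: [] for n in nodes}
    for src, dst in edges:
        incoming[dst].append(src)
    dist = {n: math.inf for n in nodes}   # SET g(type=N).dist TO inf
    dist[start] = 0                       # SET g(type=N,id=start).dist TO 0
    updates = 1                           # assumed, so the loop body runs once
    while updates > 0:                    # WHILE updates > 0
        updates = 0
        for n in nodes:                   # FOREACH n IN g(type=N)
            if not incoming[n]:
                continue                  # no incoming neighbors, nothing to relax
            candidate = min(dist[m] for m in incoming[n]) + 1
            if candidate < dist[n]:       # WHERE dist' < n.dist
                dist[n] = candidate
                updates += 1
    return dist

# Node 4 is unreachable from node 1 and keeps an infinite distance
distances = sssp([1, 2, 3, 4], [(1, 2), (2, 3), (1, 3)], start=1)
```

The loop stops after the first pass in which no distance improves.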

slide-29
SLIDE 29

GraphiQL Compiler

  • Graph Table manipulations to relational operators:
      • filter → selection predicates
      • iterate → driver loop
      • retrieve → projections
      • update → update in place
      • aggregate → group-by aggregate
  • Graph Tables to relational tables:
      • mapping
slide-30
SLIDE 30

GraphiQL Compiler

g(type=N) → N
g(type=E) → E
g(type=N).out(type=E) → N ⋈ E
g(type=E).out(type=E) → E ⋈ E
g(type=N).out(type=N) → N ⋈ E ⋈ N
g.out.in = g.in
g.in.out = g.out
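
To make the join mappings concrete, the sketch below runs two of them against a toy database with Python's sqlite3 module. The relation names N and E follow the slide. The column names (`id`, `pr`, `src`, `dst`) and the join conditions are assumptions about how such a mapping could look, not the compiler's actual output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE N (id INTEGER, pr REAL);
    CREATE TABLE E (src INTEGER, dst INTEGER);
    INSERT INTO N VALUES (1, 0.2), (2, 0.3), (3, 0.5);
    INSERT INTO E VALUES (1, 2), (1, 3), (2, 3);
""")

# g(type=N).out(type=E): each node paired with its outgoing edges (N ⋈ E)
node_out_edges = conn.execute(
    "SELECT n.id, e.src, e.dst FROM N n "
    "JOIN E e ON e.src = n.id "
    "ORDER BY n.id, e.dst"
).fetchall()

# g(type=N).out(type=N): each node paired with its out-neighbors (N ⋈ E ⋈ N)
node_out_nodes = conn.execute(
    "SELECT n.id, m.id FROM N n "
    "JOIN E e ON e.src = n.id "
    "JOIN N m ON m.id = e.dst "
    "ORDER BY n.id, m.id"
).fetchall()
```

Node 3 has no outgoing edges, so it does not appear on the left side of either result.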

slide-31
SLIDE 31

Example: SSSP

SET g(type=N).dist TO inf
SET g(type=N,id=start).dist TO 0
WHILE updates > 0
 FOREACH n IN g(type=N)
  updates =
   SET n.dist TO
    MIN(n.in(type=N).dist)+1 AS dist’
   WHERE dist’ < n.dist

Relational plan (the operator glyphs were lost in extraction, so operators are named in words):

loop updateCount>0 (
 n.dist ← σ n.dist>dist’ (
  aggregate min(n’.dist)+1 (
   group by n.id (
    N ⋈ E ⋈ N’
   )
  )
 )
)
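
As a rough illustration of what such a plan could look like once it reaches SQL, the sketch below drives one UPDATE statement in a loop with Python's sqlite3 module until no row changes. The schema, the correlated subquery, and the use of the cursor's `rowcount` as the update count are assumptions made for this example. The SQL that GraphiQL actually generates is not shown in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE N (id INTEGER, dist REAL);
    CREATE TABLE E (src INTEGER, dst INTEGER);
    INSERT INTO E VALUES (1, 2), (2, 3), (1, 3);
""")
INF = float("inf")
# dist = inf everywhere except the start node (id 1), which gets 0
conn.executemany("INSERT INTO N VALUES (?, ?)",
                 [(1, 0.0), (2, INF), (3, INF), (4, INF)])

# min(n'.dist)+1 over incoming neighbors; the correlation on N.id plays the
# role of the group-by on n.id in the plan
candidate = ("(SELECT MIN(m.dist) + 1 FROM E e JOIN N m ON m.id = e.src "
             "WHERE e.dst = N.id)")
# selection n.dist > dist' followed by the in-place update of n.dist
relax = f"UPDATE N SET dist = {candidate} WHERE dist > {candidate}"

update_count = 1
while update_count > 0:            # driver loop: updateCount > 0
    update_count = conn.execute(relax).rowcount

result = dict(conn.execute("SELECT id, dist FROM N ORDER BY id").fetchall())
```

Nodes without incoming edges produce a NULL candidate, so the WHERE clause leaves them untouched.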

slide-32
SLIDE 32

GraphiQL Optimizations

  • De-duplicating graph elements
  • Selection pushdown
  • Cross-product as join
  • Pruning redundant joins
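
Selection pushdown, the second item above, can be illustrated generically: filtering the node relation before joining gives the same answer as filtering after the join, while feeding fewer rows into the join. The sketch below checks that equivalence on a toy database with Python's sqlite3 module. The schema and both queries are assumptions for illustration and do not reproduce GraphiQL's rewrite rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE N (id INTEGER, pr REAL);
    CREATE TABLE E (src INTEGER, dst INTEGER);
    INSERT INTO N VALUES (1, 0.2), (2, 0.3), (3, 0.5);
    INSERT INTO E VALUES (1, 2), (1, 3), (2, 3);
""")

# Selection applied after the join N ⋈ E ⋈ N
late_filter = """
    SELECT m.id FROM N n
    JOIN E e ON e.src = n.id
    JOIN N m ON m.id = e.dst
    WHERE n.id = 1
    ORDER BY m.id
"""

# Selection pushed below the join: filter N first, then join the survivors
pushed_down = """
    SELECT m.id FROM (SELECT id FROM N WHERE id = 1) n
    JOIN E e ON e.src = n.id
    JOIN N m ON m.id = e.dst
    ORDER BY m.id
"""

late_rows = conn.execute(late_filter).fetchall()
pushed_rows = conn.execute(pushed_down).fetchall()
```

Both queries return the out-neighbors of node 1. A relational optimizer may already perform this rewrite on its own, so the second form mainly shows what the rewritten query looks like.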
slide-33
SLIDE 33

Performance

slide-34
SLIDE 34

Performance

Machine: 2GHz, 24 threads, 48GB memory, 1.4TB disk

slide-35
SLIDE 35

Performance

Machine: 2GHz, 24 threads, 48GB memory, 1.4TB disk

Dataset (nodes/edges):

  • Small: 81k/1.7m directed; 334k/925k undirected
  • Large: 4.8m/68m directed; 4m/34m undirected

slide-36
SLIDE 36

Performance - small graph

[Figure: bar chart of running time in seconds for Apache Giraph and GraphiQL on PageRank, Shortest Path, Triangles (global), Triangles (local), Strong Overlap, and Weak Ties; axis ticks at 16, 32, 48, 64]

12x Speedup!

slide-37
SLIDE 37

Performance - large graph

[Figure: bar chart of running time in seconds for Apache Giraph and GraphiQL on PageRank, Shortest Path, Triangles (global), Triangles (local), Strong Overlap, and Weak Ties; axis ticks at 400, 800, 1200, 1600]

4.3x Speedup!

slide-38
SLIDE 38

Summary

  • Several real-world graph analyses are better off in relational databases
  • We need both the graph view and the relational view of data
  • GraphiQL introduces Graph Tables to allow users to think in terms of graphs
  • Graph Tables support recursive association, nested manipulations, and SQL compilation
  • GraphiQL allows users to easily write a variety of graph analyses

slide-39
SLIDE 39

Thanks!

slide-40
SLIDE 40

Other Languages

  • Imperative languages: e.g. Green-Marl
  • XPath: e.g. Cypher, Gremlin
  • Datalog: e.g. SociaLite
  • SPARQL: Teradata blog
  • Procedural language: e.g. Vertex-centric