Graphs On Databases
Alekh Jindal
CSAIL, MIT
Sam Madden Mike Stonebraker
Jena, FlockDB, AllegroGraph, TAO, Pegasus, Neo4j, DEX, HypergraphDB, Pregel, Titan, GraphBase, Twister, Giraph, Trinity, HaLoop, GraphLab, PrItr + SQL = ?
Can we
edge tables
tables
Parallel graph exploration
Nested queries; database handles the optimizations
UDFs, Stored Procedures
Keep connections alive between iterations
Sort orders, indexes
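To make the "sort orders, indexes" point concrete, here is a minimal C++ sketch that uses an in-memory vector as a hypothetical stand-in for the edge table. Sorting it on from_node, as a sort order or clustered index would, makes each node's out-edges one contiguous run, so graph exploration can locate them with a binary search instead of a full scan. The EdgeRow type and the function names are assumptions made for illustration, not part of the original system.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical in-memory stand-in for the edge table, one struct per row,
// using the same column names as the SQL query (from_node, to_node).
struct EdgeRow {
    std::int64_t from_node;
    std::int64_t to_node;
};

// Order rows by from_node (then to_node), like a sort order or clustered
// index on from_node: all out-edges of a node become one contiguous run.
void sortByFromNode(std::vector<EdgeRow>& edges) {
    std::sort(edges.begin(), edges.end(), [](const EdgeRow& a, const EdgeRow& b) {
        if (a.from_node != b.from_node) return a.from_node < b.from_node;
        return a.to_node < b.to_node;
    });
}

// Out-neighbours of `node` via binary search over the sorted table,
// instead of scanning every edge row.
std::vector<std::int64_t> outNeighbours(const std::vector<EdgeRow>& edges, std::int64_t node) {
    auto byFrom = [](const EdgeRow& a, const EdgeRow& b) { return a.from_node < b.from_node; };
    // Debug-only precondition check (compiled out with NDEBUG).
    assert(std::is_sorted(edges.begin(), edges.end(), byFrom));
    auto range = std::equal_range(edges.begin(), edges.end(), EdgeRow{node, 0}, byFrom);
    std::vector<std::int64_t> neighbours;
    for (auto it = range.first; it != range.second; ++it) neighbours.push_back(it->to_node);
    return neighbours;
}
```

A real engine would get the same effect from a physical sort order or index; the sketch only shows why keeping edges clustered by from_node turns neighbour lookup into a range search.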
[Figure: Time (seconds, log scale) on the Facebook, Twitter, GPlus, and LiveJournal graphs for Graph Database, SQL: Main-memory Store, SQL: Row Store, and SQL: Column Store]
Dataset      Nodes  Edges
Facebook     4K     88K
Twitter      76K    2.4M
GPlus        107K   30M
LiveJournal  4.8M   69M
[Figure: Time (seconds, log scale) on Facebook, Twitter, GPlus, and LiveJournal for Graph Database, SQL: Main-memory Store, SQL: Row Store, and SQL: Column Store; data labels: 212.7; 20.2; 21.3; 4.4; 18,702.2; 1,231.7; 168.1; 8.7; 428.4; 4.7; 395.6; 3.2]
UPDATE NNodes AS nnode
SET Estimate = new_nnode.Estimate,
    Predecessor = new_nnode.Predecessor
FROM (SELECT temp.Id, temp.Estimate, edge.from_node AS Predecessor
      FROM NNodes AS nn, edge,
           (SELECT e.to_node AS Id, min(n1.Estimate + 1) AS Estimate
            FROM NNodes AS n1, edge AS e, NNodes AS n2
            WHERE n1.Id = e.from_node AND n2.Id = e.to_node
            GROUP BY e.to_node, n2.Estimate
            HAVING min(n1.Estimate + 1) < n2.Estimate) AS temp
      WHERE nn.Id = edge.from_node
        AND edge.to_node = temp.Id
        AND nn.Estimate = temp.Estimate - 1) AS new_nnode
WHERE nnode.Id = new_nnode.Id;
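The UPDATE statement above performs one relaxation round of single-source shortest paths; presumably it is rerun until it changes no rows. The C++ sketch below mirrors its relational logic clause by clause on in-memory stand-ins for the NNodes and edge tables, which makes the semantics easy to check in isolation. The containers, the kUnreached sentinel, the seeding of the source with Estimate 0, and the shortestPaths driver loop are all assumptions, not the authors' code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <vector>

// "Not reached yet" sentinel; halved so that Estimate + 1 cannot overflow.
constexpr std::int64_t kUnreached = std::numeric_limits<std::int64_t>::max() / 2;

// Hypothetical stand-ins for NNodes(Id, Estimate, Predecessor), keyed by Id,
// and edge(from_node, to_node).
struct NodeRow {
    std::int64_t estimate = kUnreached;
    std::int64_t predecessor = -1;  // -1: no predecessor recorded
};
struct EdgeRow {
    std::int64_t from_node;
    std::int64_t to_node;
};
using NodeTable = std::map<std::int64_t, NodeRow>;

// One execution of the UPDATE. All reads use pre-update node values, as the
// FROM subquery does. Returns the number of rows updated.
std::size_t relaxOnce(NodeTable& nodes, const std::vector<EdgeRow>& edges) {
    // temp: min(n1.Estimate + 1) per e.to_node ...
    std::map<std::int64_t, std::int64_t> best;
    for (const EdgeRow& e : edges) {
        std::int64_t candidate = nodes.at(e.from_node).estimate + 1;
        auto it = best.find(e.to_node);
        if (it == best.end() || candidate < it->second) best[e.to_node] = candidate;
    }
    // ... kept only if it beats n2.Estimate (the HAVING clause).
    NodeTable updates;
    for (const auto& [id, estimate] : best)
        if (estimate < nodes.at(id).estimate) updates[id].estimate = estimate;
    // new_nnode: a predecessor nn with nn.Estimate = temp.Estimate - 1.
    // The query does not fix a choice among ties; here the last edge wins.
    for (const EdgeRow& e : edges) {
        auto it = updates.find(e.to_node);
        if (it != updates.end() && nodes.at(e.from_node).estimate == it->second.estimate - 1)
            it->second.predecessor = e.from_node;
    }
    for (const auto& [id, row] : updates) nodes[id] = row;  // SET Estimate, Predecessor
    return updates.size();
}

// Hypothetical driver: seed the source, then rerun the UPDATE until no row changes.
void shortestPaths(NodeTable& nodes, const std::vector<EdgeRow>& edges, std::int64_t source) {
    assert(nodes.count(source) == 1);
    nodes.at(source).estimate = 0;
    while (relaxOnce(nodes, edges) > 0) {
    }
}
```

Each round advances the frontier by one hop, so the number of rounds is bounded by the longest shortest path plus one final round that updates nothing.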
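void compute(vector<vfloat> messages){
    // The start vertex is at distance 0; every other vertex starts at "infinity".
    vfloat mindist = id==START_NODE ? 0 : DBL_MAX;
    // Take the shortest distance offered by any incoming message.
    for(vector<vfloat>::iterator it = messages.begin(); it != messages.end(); ++it)
        mindist = min(mindist,*it);
    vfloat vvalue = getVertexValue();
    if(mindist < vvalue){
        // Shorter path found: store it and offer distance + 1 to every out-neighbour.
        modifyVertexValue(mindist);
        vector<vint> edges = getOutEdges();
        for(vector<vint>::iterator it = edges.begin(); it != edges.end(); ++it)
            sendMessage(*it, mindist+1);
    }
    // Become inactive until a new message arrives.
    voteToHalt();
}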
queries in a SQL database
vertex in the graph
state or if they have an incoming message
messages, vertex value, vertex edges, etc.
as UDFs in the SQL database
redistributes the messages from one superstep to the next
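The vertex-centric model described above (compute runs on each vertex, a vertex is active if it has not halted or has an incoming message, and messages are redistributed between supersteps) can be sketched as a single-process C++ driver around the compute() function shown earlier. This is a hypothetical stand-in, not the SQL implementation: plain containers replace the vertex, edge, and message tables, compute() receives its context as parameters instead of calling getVertexValue() and the other accessors, and vertex values are assumed to start at DBL_MAX.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the vertex, edge, and message tables:
// plain containers indexed by vertex id 0..n-1.
struct Graph {
    std::vector<std::vector<int>> out_edges;  // vertex edges
    std::vector<double> value;                // vertex value (distance)
    std::vector<bool> halted;                 // has the vertex voted to halt?
};
using Inboxes = std::map<int, std::vector<double>>;  // target vertex -> messages

// Same logic as compute() above; the accessors become direct reads and
// writes on the containers, and sendMessage appends to the outbox.
void compute(Graph& g, int id, int start_node, const std::vector<double>& messages,
             Inboxes& outbox) {
    double mindist = id == start_node ? 0 : DBL_MAX;
    for (double m : messages) mindist = std::min(mindist, m);
    if (mindist < g.value[id]) {
        g.value[id] = mindist;
        for (int target : g.out_edges[id]) outbox[target].push_back(mindist + 1);
    }
    g.halted[id] = true;  // voteToHalt
}

// Runs supersteps until no vertex is active and returns how many ran.
// A vertex is active if it has not halted or if it has an incoming message.
int runSupersteps(Graph& g, int start_node) {
    const int n = static_cast<int>(g.out_edges.size());
    for (const auto& targets : g.out_edges)
        for (int t : targets) assert(t >= 0 && t < n);
    g.value.assign(n, DBL_MAX);  // assumed initial vertex value
    g.halted.assign(n, false);
    const std::vector<double> no_messages;
    Inboxes inbox;
    int supersteps = 0;
    while (true) {
        Inboxes outbox;
        bool any_active = false;
        for (int v = 0; v < n; ++v) {
            auto it = inbox.find(v);
            const bool has_messages = it != inbox.end();
            if (g.halted[v] && !has_messages) continue;  // inactive this superstep
            any_active = true;
            compute(g, v, start_node, has_messages ? it->second : no_messages, outbox);
        }
        if (!any_active) return supersteps;
        inbox = std::move(outbox);  // redistribute messages to the next superstep
        ++supersteps;
    }
}
```

For a graph with edges 0->1, 0->2, 1->2 and an isolated vertex 3, starting at vertex 0, the driver runs three supersteps and leaves distances 0, 1, 1, and DBL_MAX.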
Table unions in place of joins
Batching several vertices in each UDF call
Replace messages table, no in-place updates
Sort orders, indexes
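The optimizations listed above can be sketched together for the same shortest-path computation. In this minimal C++ sketch, which is an assumption about how such a plan might look rather than the authors' implementation, the vertex and message tables are combined with a UNION ALL into one relation and sorted by vertex, so each vertex's messages directly follow its own row and a single scan replaces the join; the scan hands several whole vertex groups to each simulated UDF call (the hypothetical batch_size parameter); and every superstep builds a fresh messages table instead of updating the old one in place. Row, computeBatch, and superstep are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <vector>

// Hypothetical row type for "vertices UNION ALL messages": one relation,
// so no join between the vertex table and the message table is needed.
struct Row {
    int vertex;      // vertex id (the key a join would have matched on)
    bool is_msg;     // false: vertex row, true: message row
    double payload;  // vertex value or message value
};

// One simulated UDF call over rows[begin, end), which holds whole vertex groups:
// each group is the vertex row followed by that vertex's messages.
void computeBatch(const std::vector<Row>& rows, std::size_t begin, std::size_t end,
                  const std::vector<std::vector<int>>& out_edges,
                  std::vector<double>& values, std::vector<Row>& new_messages) {
    std::size_t i = begin;
    while (i < end) {
        assert(!rows[i].is_msg);  // every message must target an existing vertex
        const int v = rows[i].vertex;
        const double current = rows[i].payload;
        double mindist = DBL_MAX;
        for (++i; i < end && rows[i].vertex == v; ++i)
            mindist = std::min(mindist, rows[i].payload);
        if (mindist < current) {
            values[v] = mindist;
            for (int t : out_edges[v]) new_messages.push_back({t, true, mindist + 1});
        }
    }
}

// One superstep. Returns a freshly built messages table that replaces the
// previous one, rather than updating it in place.
std::vector<Row> superstep(std::vector<double>& values,
                           const std::vector<std::vector<int>>& out_edges,
                           const std::vector<Row>& messages, std::size_t batch_size) {
    assert(batch_size > 0);
    std::vector<Row> rows;
    for (int v = 0; v < static_cast<int>(values.size()); ++v) rows.push_back({v, false, values[v]});
    rows.insert(rows.end(), messages.begin(), messages.end());  // UNION ALL
    // Group by vertex with the vertex row first (false sorts before true).
    std::sort(rows.begin(), rows.end(), [](const Row& a, const Row& b) {
        if (a.vertex != b.vertex) return a.vertex < b.vertex;
        return a.is_msg < b.is_msg;
    });
    std::vector<Row> new_messages;
    std::size_t begin = 0;
    while (begin < rows.size()) {
        std::size_t end = begin;  // extend the batch over batch_size whole groups
        for (std::size_t groups = 0; groups < batch_size && end < rows.size(); ++groups) {
            const int v = rows[end].vertex;
            while (end < rows.size() && rows[end].vertex == v) ++end;
        }
        computeBatch(rows, begin, end, out_edges, values, new_messages);
        begin = end;
    }
    return new_messages;
}
```

A caller would seed the messages table with a distance-0 message to the start vertex and call superstep until it returns an empty table.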
[Figure: Time (seconds, log scale) on Facebook, Twitter, GPlus, and LiveJournal for SQL: Column Store, Vertex-centric: Column Store, and Vertex-centric: Apache Giraph; data labels: 100.9; 47.0; 35.0; 28.2; 439.8; 92.5; 18.4; 5.7; 212.7; 20.2; 21.3; 4.4]
… right within the SQL database system!
queries (plus UDFs)
graph-natural query interfaces