The case against specialized graph engines
Jing Fan, Adalbert Gerald Soosai Raj, and Jignesh M. Patel University of Wisconsin – Madison
01/06/2015 University of Wisconsin–Madison 1
The house always wins
Vertex-centric API: easy to write graph programs, higher programmer productivity.
[Figure: a vertex with value Val_i receives messages on edges e1 to e5 and computes a new value Val_i.]

```
do {
  foreach vertex in the graph {
    receive_messages();
    mutate_vertex_value();
    if (send_to_neighbors()) {
      send_messages_to_neighbors();
    }
  }
} until (has_converged() || reached_limit())
```
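The loop above can be made concrete with a minimal Python sketch of single-source shortest path on the example graph used in the following slides. The edge weights come from its Edge table. The function and variable names are hypothetical and are not the API of Pregel, Giraph, or GraphLab. As a simplification, only vertices that received messages do work in a superstep. For this program that is equivalent to visiting every vertex, because a vertex with no messages would not change.

```python
from collections import defaultdict

INF = float("inf")
# Example graph from the walkthrough: (src, dest, weight)
EDGES = [("A", "B", 1), ("A", "C", 2), ("B", "D", 2), ("C", "D", 3)]


def vertex_centric_sssp(edges, source, max_supersteps=100):
    """Synchronous, Pregel-style single-source shortest path (sketch)."""
    out_edges = defaultdict(list)
    vertices = set()
    for src, dst, w in edges:
        out_edges[src].append((dst, w))
        vertices.update((src, dst))

    value = {v: INF for v in vertices}
    inbox = {source: [0]}  # initial message: distance 0 at the source
    superstep = 0
    # until (has_converged() || reached_limit()): stop when no messages remain
    while inbox and superstep < max_supersteps:
        outbox = defaultdict(list)
        for v, messages in inbox.items():     # vertices that received messages
            candidate = min(messages)         # receive_messages()
            if candidate < value[v]:          # mutate_vertex_value() if improved
                value[v] = candidate
                for dst, w in out_edges[v]:   # send_messages_to_neighbors()
                    outbox[dst].append(candidate + w)
        inbox = outbox
        superstep += 1
    return value


print(sorted(vertex_centric_sssp(EDGES, "A").items()))
# [('A', 0), ('B', 1), ('C', 2), ('D', 3)]
```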
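[Figure: single-source shortest path on an example graph with directed edges A→B (weight 1), A→C (2), B→D (2), and C→D (3). A is the source; B, C, and D start at ∞.]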
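[Figure: after the first superstep B = 1 and C = 2, while D is still ∞; D then receives the message (D, 3) and its value changes from ∞ to 3.]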
[Figure: final distances A = 0 (source), B = 1, C = 2, D = 3.]
[Figure: gather-style model: the vertex gathers over its edges e1, e2, e3, computes a new value Val_i, and signals its neighbors.]
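For contrast, here is a Python sketch of the gather-style update, again computing shortest paths on the example graph. It follows the general gather-apply-scatter pattern under our own assumptions, and it is not GraphLab's actual API. In this sketch, a signaled vertex recomputes its value from all of its in-neighbors, and it signals its out-neighbors only when that value changes. The names gas_sssp and EDGES are hypothetical.

```python
from collections import defaultdict

INF = float("inf")
EDGES = [("A", "B", 1), ("A", "C", 2), ("B", "D", 2), ("C", "D", 3)]


def gas_sssp(edges, source):
    """Gather-style SSSP sketch: signaled vertices pull from in-neighbors."""
    in_edges = defaultdict(list)
    out_edges = defaultdict(list)
    vertices = set()
    for src, dst, w in edges:
        in_edges[dst].append((src, w))
        out_edges[src].append(dst)
        vertices.update((src, dst))

    value = {v: INF for v in vertices}
    value[source] = 0
    signaled = set(out_edges[source])   # the source signals its neighbors
    while signaled:
        next_signaled = set()
        for v in signaled:
            # Gather: min over in-edges of (neighbor value + edge weight)
            gathered = min(value[u] + w for u, w in in_edges[v])
            # Apply: keep the gathered value only if it is smaller
            if gathered < value[v]:
                value[v] = gathered
                # Scatter: signal out-neighbors so they recompute next round
                next_signaled.update(out_edges[v])
        signaled = next_signaled
    return value


print(sorted(gas_sssp(EDGES, "A").items()))
# [('A', 0), ('B', 1), ('C', 2), ('D', 3)]
```

The vertex-centric version pushes values to its neighbors. This version instead pulls from them, so signaling only decides *which* vertices recompute.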
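[Figure: the example graph (A→B 1, A→C 2, B→D 2, C→D 3) stored as relational tables.]

Vertex table (initial):

| id | val |
|---|---|
| A | ∞ |
| B | ∞ |
| C | ∞ |
| D | ∞ |

Edge table:

| src | dest | val |
|---|---|---|
| A | B | 1 |
| A | C | 2 |
| B | D | 2 |
| C | D | 3 |

Contents of the next and message tables after each iteration:

| Iteration | next (id: val) | message (id, val) |
|---|---|---|
| 1 | A: 0, B: ∞, C: ∞, D: ∞ | (B, 1), (C, 2) |
| 2 | A: 0, B: 1, C: 2, D: ∞ | (D, 3), (D, 5) |
| 3 | A: 0, B: 1, C: 2, D: 3 | (empty) |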
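The iteration tables above can be replayed in a few lines of Python that mimic the relational operators. The first step is a GROUP BY with MIN over message. The second is a join with next that keeps only improvements. The third is a join with Edge that produces the next message table. In this sketch, plain dictionaries and lists stand in for tables, and the names are hypothetical. As in the Edge table, each message value is the vertex value plus the edge's val.

```python
INF = float("inf")

# Vertex(id, val) and Edge(src, dest, val) from the example
vertex = {"A": INF, "B": INF, "C": INF, "D": INF}
edge = [("A", "B", 1), ("A", "C", 2), ("B", "D", 2), ("C", "D", 3)]


def relational_sssp(vertex, edge, source):
    """Replay the table-at-a-time iterations with dicts/lists as tables."""
    next_tbl = dict(vertex)          # next starts as a copy of Vertex
    message = [(source, 0)]          # initial message table
    trace = []
    while message:                   # stop when there are no new messages
        # cur = SELECT id, MIN(val) FROM message GROUP BY id
        cur = {}
        for vid, val in message:
            cur[vid] = min(val, cur.get(vid, INF))
        # update = rows of cur that improve on next (join on id, cur.val < next.val)
        update = {vid: val for vid, val in cur.items() if val < next_tbl[vid]}
        next_tbl.update(update)      # UPDATE next
        # message = update JOIN edge ON edge.src = update.id; val = update.val + edge.val
        message = [(dst, update[src] + w) for src, dst, w in edge if src in update]
        trace.append((dict(next_tbl), message))
    return next_tbl, trace


final, trace = relational_sssp(vertex, edge, "A")
for i, (nxt, msg) in enumerate(trace, start=1):
    print(f"Iteration {i}: next={nxt} message={msg}")
```

The three printed iterations reproduce the next and message columns of the iteration table above. Note that D receives two candidate messages in iteration 2 and MIN keeps 3.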
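```sql
DECLARE @flag int;
SET @flag = 1;

-- Initialize: every vertex starts at INT_MAX ("infinity")
SELECT vertex.id, 2147483647 AS val INTO next FROM vertex;

-- Initialize the message table: source vertex 1 gets distance 0
CREATE TABLE message (id int, val int);
INSERT INTO message VALUES (1, 0);

WHILE (@flag != 0)
BEGIN
    -- Aggregate the messages: minimum candidate distance per vertex
    SELECT message.id AS id, MIN(message.val) AS val
    INTO cur
    FROM message
    GROUP BY message.id;
    DROP TABLE message;

    -- Create an update table with the vertices whose value improves
    -- ([update] is bracketed because UPDATE is a reserved word in T-SQL)
    SELECT cur.id AS id, cur.val AS val
    INTO [update]
    FROM cur, next
    WHERE cur.id = next.id AND cur.val < next.val;

    -- Update the next table
    UPDATE next
    SET next.val = [update].val
    FROM [update], next
    WHERE next.id = [update].id;

    -- Generate the message table for the next iteration (every edge costs 1)
    SELECT edge.dest AS id, [update].val + 1 AS val
    INTO message
    FROM [update], edge
    WHERE edge.src = [update].id;

    DROP TABLE cur;
    DROP TABLE [update];

    -- Stop when there are no new messages
    SELECT @flag = COUNT(*) FROM message;
END
```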
The same program written against the Grail API:

```
VertexValType:  INT
MessageValType: INT
InitiateVal:    INT_MAX
InitialMessage: (1, 0)
CombineMessage: MIN(message)
UpdateAndSend:  update = cur.val < getVal()
                if (update) {
                    setVal(cur.val)
                    send(out, cur.val + 1)
                }
End:            NO_MESSAGE
```
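Phases of the SQL program: initialize the next table; initialize the message table; aggregate the messages; create an update table, update the next table, and generate the message table for the next iteration; stop when there are no new messages.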
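The SQL program can be tried without SQL Server. This sketch ports it to SQLite through Python's built-in sqlite3 module. SQLite is a stand-in assumption, so a few T-SQL constructs change:

- CREATE TABLE ... AS SELECT replaces SELECT ... INTO.
- A correlated subquery replaces UPDATE ... FROM.
- Double quotes mark the identifiers "next" and "update".

Like the slide's code, each edge adds a cost of 1 and vertex 1 is the source. The function name sql_sssp is hypothetical.

```python
import sqlite3

INT_MAX = 2147483647


def sql_sssp(edges, num_vertices, source=1):
    """Run the slide's SQL loop on SQLite; vertices are numbered 1..num_vertices."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE vertex (id INTEGER)")
    db.execute("CREATE TABLE edge (src INTEGER, dest INTEGER, val INTEGER)")
    db.executemany("INSERT INTO vertex VALUES (?)",
                   [(i,) for i in range(1, num_vertices + 1)])
    db.executemany("INSERT INTO edge VALUES (?, ?, ?)", edges)

    # Initialize: every vertex starts at INT_MAX
    db.execute(f'CREATE TABLE "next" AS SELECT id, {INT_MAX} AS val FROM vertex')
    # Initialize the message table: the source gets distance 0
    db.execute("CREATE TABLE message (id INTEGER, val INTEGER)")
    db.execute("INSERT INTO message VALUES (?, 0)", (source,))

    flag = 1
    while flag != 0:
        # Aggregate the messages: minimum candidate per vertex
        db.execute("CREATE TABLE cur AS "
                   "SELECT id, MIN(val) AS val FROM message GROUP BY id")
        db.execute("DROP TABLE message")
        # Create an update table with the vertices whose value improves
        db.execute('CREATE TABLE "update" AS '
                   'SELECT cur.id AS id, cur.val AS val FROM cur '
                   'JOIN "next" ON cur.id = "next".id WHERE cur.val < "next".val')
        # Update the next table (correlated subquery instead of UPDATE ... FROM)
        db.execute('UPDATE "next" SET val = '
                   '(SELECT u.val FROM "update" AS u WHERE u.id = "next".id) '
                   'WHERE id IN (SELECT id FROM "update")')
        # Generate the message table for the next iteration (+1 per edge, as on the slide)
        db.execute('CREATE TABLE message AS '
                   'SELECT edge.dest AS id, u.val + 1 AS val '
                   'FROM "update" AS u JOIN edge ON edge.src = u.id')
        db.execute("DROP TABLE cur")
        db.execute('DROP TABLE "update"')
        # Stop when there are no new messages
        flag = db.execute("SELECT COUNT(*) FROM message").fetchone()[0]

    result = dict(db.execute('SELECT id, val FROM "next" ORDER BY id'))
    db.close()
    return result
```

For example, number A, B, C, D as 1 to 4. Then `sql_sssp([(1, 2, 1), (1, 3, 2), (2, 4, 2), (3, 4, 3)], 4)` returns `{1: 0, 2: 1, 3: 1, 4: 2}`. These are hop counts, because the `+ 1` ignores the edge weights.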
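The SQL program follows a general pattern with four customization points. An aggregate function (can be a UDAF) combines each vertex's messages; for single-source shortest path it is MIN. A scalar computation (can be a UDF) turns the aggregate into the new vertex value; for shortest path it is the identity. A second scalar computation (can be a UDF) computes the outgoing message values; for shortest path it is the sum of the vertex value and the edge cost. Join attributes control the direction in which messages travel; for shortest path, messages follow outgoing edges.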
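Here is a sketch of this pattern as a hypothetical Python driver. The aggregate, the two scalar computations, and the join direction are all parameters. The names run_pattern, identity, and add_weight are our own and are not Grail's API. The sketch makes one further assumption: a vertex counts as changed when aggregating its old value with the candidate alters the old value. For MIN, this means the candidate is smaller. Changing only the direction, from outgoing to incoming edges, turns single-source shortest path into distances *to* a target vertex.

```python
INF = float("inf")
EDGES = [("A", "B", 1), ("A", "C", 2), ("B", "D", 2), ("C", "D", 3)]


def run_pattern(vertices, edges, initial_value, initial_messages,
                aggregate, scalar_update, scalar_message, direction="out"):
    """Hypothetical driver: aggregate (UDAF), scalar update (UDF),
    join with edges in the chosen direction, scalar message (UDF)."""
    value = {v: initial_value for v in vertices}
    messages = list(initial_messages)
    while messages:
        grouped = {}                                  # GROUP BY message.id
        for vid, m in messages:
            grouped.setdefault(vid, []).append(m)
        changed = {}
        for vid, ms in grouped.items():
            candidate = scalar_update(aggregate(ms))
            # Assumed change test: aggregating old and candidate alters old
            if aggregate([value[vid], candidate]) != value[vid]:
                value[vid] = changed[vid] = candidate
        messages = []
        for src, dst, w in edges:
            # 'out' joins on edge.src (outgoing edges); 'in' joins on edge.dest
            here, there = (src, dst) if direction == "out" else (dst, src)
            if here in changed:
                messages.append((there, scalar_message(changed[here], w)))
    return value


def identity(x):
    return x


def add_weight(val, weight):
    return val + weight


# Single-source shortest path from A: messages follow outgoing edges
print(run_pattern("ABCD", EDGES, INF, [("A", 0)], min, identity, add_weight))
# {'A': 0, 'B': 1, 'C': 2, 'D': 3}
# Same UDFs, incoming direction: shortest distance from each vertex to D
print(run_pattern("ABCD", EDGES, INF, [("D", 0)], min, identity, add_weight,
                  direction="in"))
# {'A': 3, 'B': 2, 'C': 3, 'D': 0}
```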
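[Figure: Grail architecture: a program written against the API is passed to a query translator, which generates SQL queries; these go through the optimizer, which produces the results.]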
Test machine: a single node. Compared with Giraph and GraphLab.
| Dataset | #nodes | #edges | Size |
|---|---|---|---|
| web-google (GO) | 9K | 5M | 71MB |
| com-Orkut (OR) | 3M | 117M | 1.6GB |
| Twitter-10 (TW) | 41.6M | 1.5B | 24GB |
| uk-2007-05 (UK) | 100M | 3.3B | 56GB |
Queries
Execution time in seconds (DNF = did not finish):

| System | GO (71MB) | OR (1.6GB) | TW (24GB) | UK (56GB) |
|---|---|---|---|---|
| Grail | 31 | 62 | 718 | 1827 |
| Giraph | 88 | 85 | 601 | DNF |
| GraphLab (sync) | 12.4 | 67.5 | DNF | DNF |
| GraphLab (async) | 12.7 | 315 | DNF | DNF |
Grail is slower than GraphLab on the smallest datasets, but it catches up as the dataset size grows, and it can handle the largest datasets, where the other systems fail.
David DeWitt, Jae Young Do, Alan Halverson, Ian Rae