SLIDE 1

The case against specialized graph engines

Jing Fan, Adalbert Gerald Soosai Raj, and Jignesh M. Patel
University of Wisconsin–Madison

01/06/2015 University of Wisconsin–Madison 1

SLIDE 2

Motivation

  • Graph analytics is now common
  • Response = new specialized graph engines


The house always wins

SLIDE 3

[Figure: Stanford GPS; chart labeled “82% graphs, 18% others”]

SLIDE 4

Question: Is graph processing that different from other types of data processing?

Our answer: No. It can be subsumed by “traditional” relational processing.

SLIDE 5

What is appealing about these new engines?

Vertex-centric API → easy to write graph programs → higher programmer productivity

SLIDE 6

Graph API: Giraph

[Figure: vertex Vi with value Vali and edges e1–e5, producing New Vali]

Vertex centric:

    do {
        foreach vertex in the graph {
            receive_messages();
            mutate_vertex_value();
            if (send_to_neighbors()) {
                send_messages_to_neighbors();
            }
        }
    } until (has_converged() || reached_limit())
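The loop above can be made concrete with a small single-machine Python sketch. It assumes synchronous supersteps (messages sent in one superstep are delivered in the next) and lets vertices with no incoming messages stay idle; the names `run_vertex_centric` and `sssp_compute` are hypothetical, and this is not Giraph's actual Java API. It runs single source shortest path on the four-vertex graph from the shortest-path example (edges A→B 1, A→C 2, B→D 2, C→D 3).

```python
import math
from collections import defaultdict

# Example graph from the shortest-path slides: (src, dest) -> edge weight.
EDGES = {("A", "B"): 1, ("A", "C"): 2, ("B", "D"): 2, ("C", "D"): 3}


def out_edges(vertex):
    """Outgoing (neighbor, weight) pairs of a vertex."""
    return [(dst, w) for (src, dst), w in EDGES.items() if src == vertex]


def sssp_compute(value, messages):
    """Per-vertex step: keep the smallest candidate distance.

    Returns (new_value, changed); a vertex only sends messages
    when its value improved in this superstep.
    """
    best = min(messages, default=math.inf)
    if best < value:
        return best, True
    return value, False


def run_vertex_centric(vertices, source, max_supersteps=30):
    values = {v: math.inf for v in vertices}
    inbox = {source: [0]}  # initial message: the source is at distance 0
    superstep = 0
    # Until converged (no messages in flight) or the superstep limit is reached.
    while inbox and superstep < max_supersteps:
        outbox = defaultdict(list)
        for v in vertices:  # "foreach vertex in the graph"
            msgs = inbox.get(v, [])  # receive_messages()
            if not msgs:
                continue  # no messages: the vertex stays idle this superstep
            values[v], changed = sssp_compute(values[v], msgs)  # mutate_vertex_value()
            if changed:  # send_to_neighbors()
                for nbr, w in out_edges(v):
                    outbox[nbr].append(values[v] + w)  # send_messages_to_neighbors()
        inbox = dict(outbox)
        superstep += 1
    return values


if __name__ == "__main__":
    print(run_vertex_centric(["A", "B", "C", "D"], source="A"))
    # prints {'A': 0, 'B': 1, 'C': 2, 'D': 3}
```

Vertex D receives two messages in the same superstep (3 via B and 5 via C) and keeps the minimum, which is the same combine step the relational version later expresses as `MIN(message.val)`.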

SLIDE 7

Example: Shortest path

[Figure: input graph with vertices A, B, C, D and edges A→B (1), A→C (2), B→D (2), C→D (3); panel “Iteration 1: computation & communication pattern”]

SLIDE 8

Example: Shortest path

[Figure: same input graph; panel “Iteration 2: computation & communication pattern” showing VB = 1, VC = 2, message (D, 3), and VD changing from ∞ to 3]

SLIDE 9

Example: Shortest path

[Figure: same input graph; panel “Iteration 3: computation & communication pattern” showing VB = 1, VC = 2, VD = 3]

SLIDE 10

GraphLab

  1. Gather values (from neighbors)
  2. Apply updates to local state
  3. Scatter signals to your neighbors

[Figure: vertex Vi with value Vali gathers over edges e1–e3, computes New Vali, and sends signals to its neighbors]
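The gather, apply, scatter cycle can be sketched in Python for the same shortest-path example. This is a minimal sketch assuming a simple sequential schedule that processes signaled vertices round by round; the functions `gather`, `apply_update`, `scatter`, and `run_gas` are hypothetical and do not mirror GraphLab's C++ interface.

```python
import math

# Example graph from the shortest-path slides: (src, dest, weight).
EDGES = [("A", "B", 1), ("A", "C", 2), ("B", "D", 2), ("C", "D", 3)]


def gather(v, dist):
    """1. Gather: best distance offered by in-neighbors (min over in-edges)."""
    return min((dist[src] + w for src, dst, w in EDGES if dst == v), default=math.inf)


def apply_update(v, dist, gathered):
    """2. Apply: keep the smaller of the current and gathered distance."""
    new = min(dist[v], gathered)
    changed = new < dist[v]
    dist[v] = new
    return changed


def scatter(v):
    """3. Scatter: signal every out-neighbor so it runs in the next round."""
    return {dst for src, dst, _ in EDGES if src == v}


def run_gas(vertices, source):
    dist = {v: math.inf for v in vertices}
    dist[source] = 0
    signaled = scatter(source)  # the source starts by signaling its neighbors
    while signaled:
        next_signaled = set()
        for v in sorted(signaled):
            # Only a vertex whose value improved signals its neighbors again.
            if apply_update(v, dist, gather(v, dist)):
                next_signaled |= scatter(v)
        signaled = next_signaled
    return dist
```

Compared with the message-passing loop, the data flow is inverted: a vertex pulls values from its in-neighbors during gather, and the signals sent during scatter carry no data; they only decide which vertices run next.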

SLIDE 11

What is appealing about these new engines?

Vertex-centric API → easy to write graph programs → higher programmer productivity

SLIDE 12

But …

  • Can we build a similar vertex-centric simple API?
  • … and then map it to SQL, with good performance

Advantages:

  • Already have SQL in the enterprise stack
  • Huge advantage to “one size fits many”
  • O(N²) headaches when maintaining N specialized systems
  • Economies of scale


The GRAIL API

SLIDE 13

Example: Shortest path

Input graph, stored as relational tables:

Vertex
    id  val
    A   ∞
    B   ∞
    C   ∞
    D   ∞

Edge
    src  dest  val
    A    B     1
    A    C     2
    B    D     2
    C    D     3

Iteration  next (id: val)            message (id, val)
1          A: 0, B: ∞, C: ∞, D: ∞    (B, 1), (C, 2)
2          A: 0, B: 1, C: 2, D: ∞    (D, 3), (D, 5)
3          A: 0, B: 1, C: 2, D: 3    (empty)

SLIDE 14


The Grail API:

    VertexValType:  INT
    MessageValType: INT
    InitiateVal:    INT_MAX
    InitialMessage: (1, 0)
    CombineMessage: MIN(message)
    UpdateAndSend:
        update = cur.val < getVal()
        if (update) {
            setVal(cur.val)
            send(out, cur.val + 1)
        }
    End: NO_MESSAGE

T-SQL code:

    -- Initialize
    DECLARE @flag int;
    SET @flag = 1;
    SELECT vertex.id, 2147483647 AS val INTO next FROM vertex;
    -- Initialize the message table: source vertex id 1 starts at distance 0
    CREATE TABLE message (id int, val int);
    INSERT INTO message VALUES (1, 0);
    WHILE (@flag != 0)
    BEGIN
        -- Aggregate the messages
        SELECT message.id AS id, MIN(message.val) AS val
        INTO cur
        FROM message
        GROUP BY message.id;
        DROP TABLE message;
        -- Update and generate messages for the next iteration
        -- (UPDATE is a reserved keyword in T-SQL, so the table name is bracketed)
        SELECT cur.id AS id, cur.val AS val
        INTO [update]
        FROM cur, next
        WHERE cur.id = next.id AND cur.val < next.val;
        UPDATE next
        SET next.val = [update].val
        FROM [update], next
        WHERE next.id = [update].id;
        -- + 1: each edge counts as one hop (unit edge weights)
        SELECT edge.dest AS id, [update].val + 1 AS val
        INTO message
        FROM [update], edge
        WHERE edge.src = [update].id;
        DROP TABLE cur;
        DROP TABLE [update];
        -- Stop when there are no new messages
        SELECT @flag = COUNT(*) FROM message;
    END
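To experiment with the same relational loop without SQL Server, the T-SQL above can be ported to SQLite through Python's standard `sqlite3` module. This is a sketch under several assumptions: vertex ids are integers (1 = A, 2 = B, 3 = C, 4 = D), so the initial message `(1, 0)` targets A; `SELECT ... INTO` becomes `CREATE TABLE ... AS SELECT`; the `WHILE` loop moves into Python because SQLite has no procedural control flow; the `update` table is renamed `upd`; and the message value adds the edge weight (`edge.val`) instead of `+ 1`, so the result matches the weighted iteration tables of the shortest-path example.

```python
import sqlite3

INT_MAX = 2147483647  # plays the role of "infinity", as in the T-SQL version

# Hypothetical integer ids for the example graph: 1 = A, 2 = B, 3 = C, 4 = D.
EDGES = [(1, 2, 1), (1, 3, 2), (2, 4, 2), (3, 4, 3)]  # (src, dest, val)


def build_example():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vertex (id INTEGER)")
    conn.execute("CREATE TABLE edge (src INTEGER, dest INTEGER, val INTEGER)")
    conn.executemany("INSERT INTO vertex VALUES (?)", [(v,) for v in range(1, 5)])
    conn.executemany("INSERT INTO edge VALUES (?, ?, ?)", EDGES)
    return conn


def sssp(conn, source):
    """Relational single source shortest path; returns (distances, iterations)."""
    # Initialize: every vertex starts at INT_MAX.
    conn.execute(f"CREATE TABLE next AS SELECT id, {INT_MAX} AS val FROM vertex")
    # Initialize the message table: the source starts at distance 0.
    conn.execute("CREATE TABLE message (id INTEGER, val INTEGER)")
    conn.execute("INSERT INTO message VALUES (?, 0)", (source,))
    iterations = 0
    # The T-SQL WHILE loop runs in Python: stop when there are no new messages.
    while conn.execute("SELECT COUNT(*) FROM message").fetchone()[0] != 0:
        # Aggregate the messages.
        conn.execute(
            "CREATE TABLE cur AS SELECT id, MIN(val) AS val FROM message GROUP BY id"
        )
        conn.execute("DROP TABLE message")
        # Only consider vertices whose value improves.
        conn.execute(
            "CREATE TABLE upd AS SELECT cur.id AS id, cur.val AS val "
            "FROM cur JOIN next ON cur.id = next.id WHERE cur.val < next.val"
        )
        # Update the next table (correlated subquery instead of UPDATE ... FROM).
        conn.execute(
            "UPDATE next SET val = (SELECT upd.val FROM upd WHERE upd.id = next.id) "
            "WHERE id IN (SELECT id FROM upd)"
        )
        # Generate messages for the next iteration along outgoing edges,
        # adding the edge weight rather than the T-SQL version's "+ 1".
        conn.execute(
            "CREATE TABLE message AS SELECT edge.dest AS id, upd.val + edge.val AS val "
            "FROM upd JOIN edge ON edge.src = upd.id"
        )
        conn.execute("DROP TABLE cur")
        conn.execute("DROP TABLE upd")
        iterations += 1
    return dict(conn.execute("SELECT id, val FROM next ORDER BY id")), iterations


if __name__ == "__main__":
    print(sssp(build_example(), source=1))  # ({1: 0, 2: 1, 3: 2, 4: 3}, 3)
```

The loop runs three iterations, and its intermediate `next` and `message` tables follow the per-iteration table on the previous slide, including the two candidate messages for D (3 and 5) in iteration 2.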

SLIDE 15

Annotated steps of the T-SQL code from the previous slide:

  1. Initialize: every vertex in next starts at 2147483647 (INT_MAX)
  2. Initialize the message table
  3. Aggregate the messages
  4. Create an update table and only consider updated vertices
  5. Update the next table
  6. Generate the message table for the next iteration
  7. Stop when there are no new messages

SLIDE 16

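Parts of the SQL that vary by algorithm:

  • Aggregate function that combines messages (can be a UDAF); for single source shortest path: min
  • Scalar computation that produces the new vertex value (can be a UDF); for single source shortest path: identity
  • Scalar computation that produces the outgoing message (can be a UDF); for single source shortest path: sum
  • Join attributes control the direction; for single source shortest path: outgoing edges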
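These plug-in points can be sketched as a small SQL template generator. This is a hypothetical illustration, not Grail's actual query translator: `make_iteration`, `run`, and their parameters are invented here, and SQLite stands in for SQL Server. For single source shortest path, min maps to the `MIN` aggregate, identity to `cur.val`, sum to `upd.val + e.val`, and outgoing edges to the join `e.src = upd.id`. Weakly connected components is shown as a second instantiation of the same template (min aggregate, identity update, identity message), assuming both edge directions are followed by joining against a `UNION ALL` of the edge table with its reverse.

```python
import sqlite3

INT_MAX = 2147483647


def make_iteration(aggregate, update_expr, message_expr, edge_source):
    """Fill the per-iteration SQL template with the algorithm's plug-in points.

    aggregate:    SQL aggregate that combines messages (e.g. MIN)
    update_expr:  scalar expression over cur.val giving the new vertex value
    message_expr: scalar expression over upd.val and e.val for outgoing messages
    edge_source:  table or subquery with (src, dest, val); joining on
                  e.src = upd.id fixes the direction messages travel
    The "<" improvement test assumes a MIN-style algorithm.
    """
    return [
        f"CREATE TABLE cur AS SELECT id, {aggregate}(val) AS val "
        "FROM message GROUP BY id",
        "DROP TABLE message",
        f"CREATE TABLE upd AS SELECT cur.id AS id, {update_expr} AS val "
        f"FROM cur JOIN next ON cur.id = next.id WHERE {update_expr} < next.val",
        "UPDATE next SET val = (SELECT upd.val FROM upd WHERE upd.id = next.id) "
        "WHERE id IN (SELECT id FROM upd)",
        f"CREATE TABLE message AS SELECT e.dest AS id, {message_expr} AS val "
        f"FROM upd JOIN {edge_source} AS e ON e.src = upd.id",
        "DROP TABLE cur",
        "DROP TABLE upd",
    ]


SSSP = make_iteration("MIN", "cur.val", "upd.val + e.val", "edge")
WCC = make_iteration(
    "MIN",
    "cur.val",
    "upd.val",
    "(SELECT src, dest, val FROM edge UNION ALL SELECT dest, src, val FROM edge)",
)


def build_graph():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vertex (id INTEGER)")
    conn.execute("CREATE TABLE edge (src INTEGER, dest INTEGER, val INTEGER)")
    conn.executemany("INSERT INTO vertex VALUES (?)", [(v,) for v in range(1, 7)])
    # Example graph (1 = A ... 4 = D) plus a hypothetical separate component 5-6.
    conn.executemany(
        "INSERT INTO edge VALUES (?, ?, ?)",
        [(1, 2, 1), (1, 3, 2), (2, 4, 2), (3, 4, 3), (6, 5, 1)],
    )
    return conn


def run(conn, initial_messages, statements):
    """Iterate the template until no messages remain; return id -> value."""
    conn.execute(f"CREATE TABLE next AS SELECT id, {INT_MAX} AS val FROM vertex")
    conn.execute("CREATE TABLE message (id INTEGER, val INTEGER)")
    conn.executemany("INSERT INTO message VALUES (?, ?)", initial_messages)
    while conn.execute("SELECT COUNT(*) FROM message").fetchone()[0]:
        for sql in statements:
            conn.execute(sql)
    result = dict(conn.execute("SELECT id, val FROM next ORDER BY id"))
    conn.execute("DROP TABLE next")  # clean up so the connection can be reused
    conn.execute("DROP TABLE message")
    return result
```

For example, `run(build_graph(), [(1, 0)], SSSP)` leaves vertices 5 and 6 at `INT_MAX` because they are unreachable from vertex 1, while `run(build_graph(), [(v, v) for v in range(1, 7)], WCC)` labels each vertex with the smallest id in its component. Only the three expressions and the edge source change between the two algorithms; the loop and the table plumbing stay the same.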

SLIDE 17

Grail: Implementation and Evaluation

[Figure: Grail architecture: Grail API → query translator → SQL queries → SQL Server (optimizer) → results]

Test Machine (single node)

  • Dual 1.8GHz Xeon E5-2450L
  • 96GB of main memory

Compare with

  • Giraph (v.1.1.0)
  • GraphLab (v 2.2): sync and async

Dataset            #nodes   #edges   Size
web-google (GO)    9K       5M       71MB
com-Orkut (OR)     3M       117M     1.6GB
Twitter-10 (TW)    41.6M    1.5B     24GB
uk-2007-05 (UK)    100M     3.3B     56GB

Queries

  • Single source shortest-path
  • Page Rank
  • Weakly connected components
SLIDE 18

Results: Single Source Shortest Path

[Figure: bar chart of execution time in seconds for Grail, Giraph, GraphLab (sync), and GraphLab (async) on GO (71MB), OR (1.6GB), TW (24GB), and UK (56GB); DNF marks runs that did not finish]

Grail is slower than GraphLab on the smallest datasets, but it catches up as the dataset size grows, and it can handle the largest datasets, while the other systems fail.

SLIDE 19

Summary: Graph Analytics on RDBMS

  • A simple API (Grail) addresses the programmer productivity issue
  • Produces far more robust and deployable solutions than specialized graph engines
  • Raises interesting physical schema design and optimization issues


SLIDE 20

Why does Mike not have GraphDB Inc.?


The general case against GraphDB Inc.

SLIDE 21

Thanks!


David DeWitt, Jae Young Do, Alan Halverson, Ian Rae