Giraph: Production-grade graph processing infrastructure for trillion edge graphs
6/22/2014 GRADES Avery Ching
Motivation
Apache Giraph
Inspired by Google's Pregel but runs on Hadoop
Think like a vertex
Maximum value
[Figure: maximum value example; vertex values (5, 2, 1) on Processor 1 and Processor 2 over time, all converging to 5.]
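The maximum value example can be sketched as a toy, single-process simulation. This is not Giraph code: the class `MaxValueSketch`, the method `propagateMax`, and the three-vertex chain with values 5, 1, 2 are all assumptions made for illustration. Each vertex sends its value when it changes, adopts the largest value it receives, and the run ends when no vertex changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy single-process simulation of a "maximum value" vertex program (hypothetical names). */
final class MaxValueSketch {

    /**
     * Runs synchronous supersteps until no vertex changes its value.
     * neighbors[v] lists the vertices that v sends messages to.
     */
    static long[] propagateMax(long[] initial, int[][] neighbors) {
        int n = initial.length;
        long[] value = Arrays.copyOf(initial, n);
        boolean[] changed = new boolean[n];
        Arrays.fill(changed, true); // first superstep: every vertex sends its value
        boolean anyChanged = true;

        while (anyChanged) {
            // Sending half: only vertices whose value changed send messages.
            List<List<Long>> inbox = new ArrayList<>();
            for (int v = 0; v < n; v++) {
                inbox.add(new ArrayList<>());
            }
            for (int v = 0; v < n; v++) {
                if (changed[v]) {
                    for (int target : neighbors[v]) {
                        inbox.get(target).add(value[v]);
                    }
                }
            }
            // Compute half: each vertex keeps the largest value it has seen.
            anyChanged = false;
            for (int v = 0; v < n; v++) {
                changed[v] = false;
                for (long message : inbox.get(v)) {
                    if (message > value[v]) {
                        value[v] = message;
                        changed[v] = true;
                        anyChanged = true;
                    }
                }
            }
        }
        return value;
    }

    public static void main(String[] args) {
        long[] initial = {5, 1, 2};               // assumed starting values
        int[][] neighbors = {{1}, {0, 2}, {1}};   // assumed chain 0 - 1 - 2
        System.out.println(Arrays.toString(propagateMax(initial, neighbors)));
    }
}
```

In a distributed run the two halves of the loop correspond to message delivery between workers and the per-vertex compute call; the simulation only shows the per-vertex logic.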
Send page rank value to neighbors for 30 iterations
Calculate updated page rank value from neighbors
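    import java.io.IOException;
    import org.apache.giraph.graph.BasicComputation;
    import org.apache.giraph.graph.Vertex;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.FloatWritable;
    import org.apache.hadoop.io.LongWritable;

    // The class name is cut off in the slide text; PageRankComputation is a placeholder.
    // Method names follow the Giraph 1.1 Computation API.
    public class PageRankComputation extends BasicComputation<
        LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
      @Override
      public void compute(Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
          Iterable<DoubleWritable> messages) throws IOException {
        if (getSuperstep() >= 1) {
          // Sum the rank contributions received from neighbors.
          double sum = 0;
          for (DoubleWritable message : messages) {
            sum += message.get();
          }
          vertex.setValue(new DoubleWritable((0.15d / getTotalNumVertices()) + 0.85d * sum));
        }
        if (getSuperstep() < 30) {
          // Split the current rank evenly across outgoing edges.
          sendMessageToAllEdges(vertex,
              new DoubleWritable(vertex.getValue().get() / vertex.getNumEdges()));
        } else {
          vertex.voteToHalt();
        }
      }
    }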
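To see the update rule from the compute method in isolation, the following is a minimal plain-Java sketch that applies the same formula (0.15 / N + 0.85 * sum of incoming contributions) without any Giraph dependency. The class `PageRankSketch`, the method `pageRank`, the initial value of 1/N, and the small example graph are assumptions for illustration; the slide does not say how vertex values are initialized.

```java
import java.util.Arrays;

/** Plain-Java sketch of the PageRank update used in the vertex program (hypothetical names). */
final class PageRankSketch {

    /**
     * outEdges[v] lists the targets of v's outgoing edges.
     * Every vertex is assumed to have at least one outgoing edge.
     */
    static double[] pageRank(int[][] outEdges, int iterations) {
        int n = outEdges.length;
        double[] value = new double[n];
        Arrays.fill(value, 1.0 / n); // assumed initial value, not stated in the slides

        for (int step = 0; step < iterations; step++) {
            // "Send" phase: each vertex splits its rank across its out-edges.
            double[] sum = new double[n];
            for (int v = 0; v < n; v++) {
                for (int target : outEdges[v]) {
                    sum[target] += value[v] / outEdges[v].length;
                }
            }
            // "Compute" phase: same formula as in the compute() method.
            for (int v = 0; v < n; v++) {
                value[v] = 0.15 / n + 0.85 * sum[v];
            }
        }
        return value;
    }

    public static void main(String[] args) {
        int[][] outEdges = {{1, 2}, {2}, {0}}; // assumed toy graph
        System.out.println(Arrays.toString(pageRank(outEdges, 30)));
    }
}
```

On a directed cycle every vertex receives exactly what it sends, so the ranks stay at 1/N; that makes a convenient sanity check for the formula.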
Loading the graph
[Figure: input format with Splits 0-3; the Master; Worker 0 and Worker 1 each load and send the graph.]
Storing the graph
[Figure: Worker 0 and Worker 1 write Parts 0-3 through the output format.]
Compute / Iterate
[Figure: Worker 0 and Worker 1 hold the in-memory graph (Parts 0-3), compute and send messages, then send stats to the Master and iterate.]
Master “computes”
[Figure: timeline of Worker 0, Worker 1, and the Master across phases 1a, 1b, 2, and 3.]
Affinity propagation: Frey and Dueck, “Clustering by passing messages between data points”, Science 2007
Organically discovers exemplars based on similarity
[Figure: clustering at initialization, at an intermediate stage, and at convergence.]
Responsibility r(i,k)
Availability a(i,k)
Update exemplars
my exemplar?
Every vertex i with an edge to k maintains responsibility of k for i
Sends responsibility to k in ResponsibilityMessage (senderid, responsibility(i,k))
[Figure: vertices A, B, C, D; responsibility messages r(c,a), r(d,a), r(b,d), r(b,a).]
Vertex sums positive messages
Sends availability to i in AvailabilityMessage (senderid, availability(i,k))
[Figure: vertices A, B, C, D; availability messages a(c,a), a(d,a), a(b,d), a(b,a).]
Dampens availabilities and scans edges to find exemplar k
Updates self-exemplar
[Figure: vertices A, B, C, D each update; resulting exemplars: a, d, a, a.]
[Figure: master compute states: initial state, calculate responsibility, calculate availability, update exemplars, halt.]
if (exemplars agree they are exemplars && changed exemplars < ∆) then halt
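The slides name the three stages but do not spell out the formulas. For reference, these are the standard update rules from the cited Frey and Dueck paper, written with the slides' r(i,k) and a(i,k). The similarity s(i,k) and the damping factor λ are notation from the paper, not from the slides, and how the Giraph implementation applies damping in detail is an assumption here.

```latex
% Responsibility: how well suited k is to be the exemplar for i
r(i,k) \leftarrow s(i,k) - \max_{k' \neq k} \bigl[ a(i,k') + s(i,k') \bigr]

% Availability: k sums the positive responsibilities it received
a(i,k) \leftarrow \min\Bigl( 0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\bigl(0,\, r(i',k)\bigr) \Bigr), \quad i \neq k

a(k,k) \leftarrow \sum_{i' \neq k} \max\bigl(0,\, r(i',k)\bigr)

% Damping applied to each updated message x with factor \lambda \in [0,1)
x^{(t+1)} = \lambda \, x^{(t)} + (1 - \lambda) \, x_{\text{new}}

% Exemplar choice for vertex i
\text{exemplar}(i) = \arg\max_{k} \bigl[ a(i,k) + r(i,k) \bigr]
```

The "sums positive messages" step on the availability slide corresponds to the max(0, r(i',k)) terms, and "scans edges to find exemplar k" corresponds to the final arg max.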
Twitter: 255M MAU (https://about.twitter.com/company), 208 average followers (Beevolve 2012) → Estimated >53B edges
Facebook: 1.28B MAU (Q1/2014 report), 200+ average friends (2011 S1) → Estimated >256B edges
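The edge estimates above are simply users multiplied by average connections. A minimal sketch of that arithmetic follows; the class `EdgeEstimate` and the method `estimateEdges` are hypothetical names, and the inputs are the figures quoted on the slide.

```java
/** Back-of-the-envelope edge estimate: users x average connections (hypothetical names). */
final class EdgeEstimate {

    /** Throws ArithmeticException on overflow instead of wrapping silently. */
    static long estimateEdges(long monthlyActiveUsers, long averageConnections) {
        return Math.multiplyExact(monthlyActiveUsers, averageConnections);
    }

    public static void main(String[] args) {
        // 255M users x 208 followers = 53,040,000,000 (>53B)
        System.out.println("Twitter:  " + estimateEdges(255_000_000L, 208));
        // 1.28B users x 200 friends = 256,000,000,000 (>256B, since the average is "200+")
        System.out.println("Facebook: " + estimateEdges(1_280_000_000L, 200));
    }
}
```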
[Figure: two bar charts of edge counts. Graphs used in research publications (Clueweb 09, Twitter dataset, Friendster, Yahoo! web) on a 0-7 billion axis; rough social network scale* (Twitter Est*, Facebook Est*) on a 0-300 billion axis.]
Application                    Graph Size    CPU Time Speedup   Elapsed Time Speedup
Page rank (single iteration)   400B+ edges   26x                120x
Friends of friends score       71B+ edges    12.5x              48x
Scalability of workers (200B edges)
[Figure: seconds (0-500) vs. # of workers (50-300), Giraph compared with ideal scaling.]
Scalability of edges (50 workers)
[Figure: seconds (0-500) vs. # of edges (1E+09 to 2E+11), Giraph compared with ideal scaling.]
[Figure: minutes per iteration (1-4 axis), 6/30/2013 compared with 6/2/2014.]
Improvements
Random partitioning == good balance BUT ignores entity affinity
[Figure: example graph with vertices 1-11.]
Results from one service: cache hit rate grew from 70% to 85%, bandwidth cut in half
[Figure: vertices regrouped as 3, 5, 6, 9, 11 and 1, 4, 7, 8, 10.]
* Loosely based on Ugander and Backstrom. Balanced label propagation for partitioning massive graphs, WSDM '13
Explicit remapping Native remapping
Explicit remapping

Original graph
Id        Edges
San Jose  (Chicago, 4) (New York, 6)
Chicago   (San Jose, 4) (New York, 3)
New York  (San Jose, 6) (Chicago, 3)

Partitioning Mapping
Id        Alt Id
San Jose  0
Chicago   1
New York  2

Join

Remapped graph
Id  Edges
0   (1, 4) (2, 6)
1   (0, 4) (2, 3)
2   (0, 6) (1, 3)

Compute - shortest paths from 0
Id  Distance
0   0
1   4
2   6

Reverse partition mapping
Alt Id  Id
0       San Jose
1       Chicago
2       New York

Join

Final compute
Id        Distance
San Jose  0
Chicago   4
New York  6
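The explicit remapping flow (join to dense ids, compute, join back) can be sketched in plain Java as follows. This is an illustration under stated assumptions, not Giraph code: the class `ExplicitRemappingSketch` and its methods are hypothetical, Dijkstra's algorithm stands in for the distributed shortest-paths job, and the in-memory maps stand in for the join steps.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Sketch of explicit remapping: names -> dense ids, compute, dense ids -> names. */
final class ExplicitRemappingSketch {

    /** Dijkstra over a dense-id graph; edges[u] holds {target, weight} pairs. */
    static long[] shortestPaths(int[][][] edges, int source) {
        long[] dist = new long[edges.length];
        Arrays.fill(dist, Long.MAX_VALUE);
        dist[source] = 0;
        PriorityQueue<long[]> queue =
            new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
        queue.add(new long[] {0, source});
        while (!queue.isEmpty()) {
            long[] head = queue.poll();
            int u = (int) head[1];
            if (head[0] > dist[u]) {
                continue; // stale queue entry
            }
            for (int[] edge : edges[u]) {
                long candidate = head[0] + edge[1];
                if (candidate < dist[edge[0]]) {
                    dist[edge[0]] = candidate;
                    queue.add(new long[] {candidate, edge[0]});
                }
            }
        }
        return dist;
    }

    static Map<String, Long> run() {
        // Partitioning mapping (Id -> Alt Id); the array index doubles as the reverse mapping.
        String[] names = {"San Jose", "Chicago", "New York"};
        Map<String, Integer> altId = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            altId.put(names[i], i);
        }

        // Original graph keyed by name.
        Map<String, Map<String, Integer>> original = new LinkedHashMap<>();
        original.put("San Jose", Map.of("Chicago", 4, "New York", 6));
        original.put("Chicago", Map.of("San Jose", 4, "New York", 3));
        original.put("New York", Map.of("San Jose", 6, "Chicago", 3));

        // First join: rewrite vertex ids and edge targets to alt ids.
        int[][][] remapped = new int[names.length][][];
        for (Map.Entry<String, Map<String, Integer>> vertex : original.entrySet()) {
            int[][] out = new int[vertex.getValue().size()][];
            int i = 0;
            for (Map.Entry<String, Integer> edge : vertex.getValue().entrySet()) {
                out[i++] = new int[] {altId.get(edge.getKey()), edge.getValue()};
            }
            remapped[altId.get(vertex.getKey())] = out;
        }

        // Compute on the remapped graph, then join back to the original ids.
        long[] dist = shortestPaths(remapped, altId.get("San Jose"));
        Map<String, Long> result = new LinkedHashMap<>();
        for (int id = 0; id < dist.length; id++) {
            result.put(names[id], dist[id]);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The application here has to know about both joins, which is the main cost of the explicit approach compared with the native variants.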
Native remapping (transparent)

Original graph
Id        Edges
San Jose  (Chicago, 4) (New York, 6)
Chicago   (San Jose, 4) (New York, 3)
New York  (San Jose, 6) (Chicago, 3)

Partitioning Mapping
Id        Group
San Jose  0
Chicago   1
New York  2

Original graph with group information
Id        Group  Edges
San Jose  0      (Chicago, 4) (New York, 6)
Chicago   1      (San Jose, 4) (New York, 3)
New York  2      (San Jose, 6) (Chicago, 3)

Compute - shortest paths from “San Jose”

Final compute
Id        Distance
San Jose  0
Chicago   4
New York  6
Native remapping (embedded)

Original graph
Id  Edges
0   (1, 4) (2, 6)
1   (0, 4) (2, 3)
2   (0, 6) (1, 3)

Partitioning Mapping
Id  Mach
0   0
1   1
2   0

Original graph with mapping embedded in Id
Top bits machine, Id  Edges
0, 0                  (Chicago, 4) (New York, 6)
1, 1                  (San Jose, 4) (New York, 3)
0, 2                  (San Jose, 6) (Chicago, 3)

Compute - shortest paths from “San Jose”

Final compute
Id  Distance
0   0
1   4
2   6
Not all graphs can leverage this technique; Facebook can, since its ids are longs with unused bits.
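Embedding the machine in the top bits of a long id can be sketched with plain bit operations. The split of 16 machine bits and 48 id bits is an assumption for illustration (the slides do not state how many bits are used), and the class `EmbeddedIdSketch` and its methods are hypothetical names.

```java
/** Sketch: pack a machine number into the unused top bits of a 64-bit vertex id. */
final class EmbeddedIdSketch {

    static final int MACHINE_BITS = 16;                    // assumed width, not from the slides
    static final int ID_BITS = Long.SIZE - MACHINE_BITS;   // 48 bits left for the original id
    static final long ID_MASK = (1L << ID_BITS) - 1;

    /** Returns the id with the machine number stored in its top bits. */
    static long embed(int machine, long id) {
        if (machine < 0 || machine >= (1 << MACHINE_BITS)) {
            throw new IllegalArgumentException("machine does not fit in " + MACHINE_BITS + " bits");
        }
        if ((id & ~ID_MASK) != 0) {
            throw new IllegalArgumentException("id already uses the top bits");
        }
        return ((long) machine << ID_BITS) | id;
    }

    /** Recovers the machine number; unsigned shift keeps large machine numbers positive. */
    static int machineOf(long embedded) {
        return (int) (embedded >>> ID_BITS);
    }

    /** Recovers the original id by masking off the machine bits. */
    static long originalId(long embedded) {
        return embedded & ID_MASK;
    }

    public static void main(String[] args) {
        long embedded = embed(1, 1L); // vertex 1 placed on machine 1, as in the table above
        System.out.println(machineOf(embedded) + ", " + originalId(embedded));
    }
}
```

Because the partition is derived from the id itself, no separate mapping table or join is needed; the trade-off is that ids which already use the top bits are rejected.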
Explicit / Native Transparent / Native Embedded
Pros
compression
just additional input parameters
Cons
remapping
usage on input
more memory
type
345B edge page rank
[Figure: seconds per iteration (0-160) for Random, 47% Local, and 60% Local partitioning.]
Example: Mutual friends calculation between neighbors
200+ average friends (2011 S1)
8-byte ids (longs)
= 394 TB
394 TB / 100 GB per machine = 3,940 machines (not including the graph)
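The arithmetic behind these figures can be sketched as below. The user count is not present in this slide's text: a value of about 1.23B is an assumption that reproduces the 394 TB figure (the 1.28B quoted earlier in the deck would give roughly 410 TB). The model, in which each user sends its full friend list to each friend, is also an interpretation, and the class and method names are hypothetical.

```java
/** Sketch of the message-volume estimate for mutual friends (assumptions noted inline). */
final class MutualFriendsMemory {

    /** Each user sends a list of avgFriends ids to each of its avgFriends friends. */
    static long messageBytes(long users, long avgFriends, long idBytes) {
        return Math.multiplyExact(
            Math.multiplyExact(users, avgFriends),
            Math.multiplyExact(avgFriends, idBytes));
    }

    /** Ceiling division: machines needed to hold totalBytes. */
    static long machinesNeeded(long totalBytes, long bytesPerMachine) {
        return (totalBytes + bytesPerMachine - 1) / bytesPerMachine;
    }

    public static void main(String[] args) {
        long assumedUsers = 1_230_000_000L; // assumption, not stated on this slide
        long bytes = messageBytes(assumedUsers, 200, 8);
        System.out.println(bytes / 1e12 + " TB");
        // The slide's own figures: 394 TB at 100 GB per machine.
        System.out.println(machinesNeeded(394_000_000_000_000L, 100_000_000_000L) + " machines");
    }
}
```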
[Figure: vertices A, B, C, D, E exchanging friend lists as messages: A:{D}, D:{A,E}, E:{D}, B:{}, C:{D}, D:{C}, A:{C}, C:{A,E}, E:{C}, D:{C}.]
Subsets of sources/destinations edges per superstep
* Currently manual - future work automatic!
Sources: A (on), B (off); Destinations: A (on), B (off)
Sources: A (on), B (off); Destinations: A (off), B (on)
Sources: A (off), B (on); Destinations: A (on), B (off)
Sources: A (off), B (on); Destinations: A (off), B (on)
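The four source/destination combinations above can be sketched as a loop over sub-supersteps in which a message is sent only when its source group and destination group are both switched on. This is a toy illustration, not the Giraph mechanism: the class `SuperstepSplittingSketch`, the method `messagesPerSubStep`, and the assignment of vertices to groups by `id % groups` are assumptions.

```java
import java.util.Arrays;

/** Toy sketch: split one logical superstep into groups x groups sub-supersteps. */
final class SuperstepSplittingSketch {

    /**
     * edges[i] = {source, destination}. Returns how many messages are sent in each
     * sub-superstep, ordered as (source group, destination group) pairs.
     */
    static int[] messagesPerSubStep(int[][] edges, int groups) {
        int[] counts = new int[groups * groups];
        for (int src = 0; src < groups; src++) {
            for (int dst = 0; dst < groups; dst++) {
                int step = src * groups + dst;
                for (int[] edge : edges) {
                    boolean sourceOn = edge[0] % groups == src;      // assumed grouping rule
                    boolean destinationOn = edge[1] % groups == dst; // assumed grouping rule
                    if (sourceOn && destinationOn) {
                        counts[step]++; // this edge's message is sent in this sub-superstep
                    }
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] edges = {
            {0, 1}, {1, 0}, {0, 2}, {2, 0}, {1, 3}, {3, 1}, {2, 3}, {3, 2}
        };
        // Two groups (A = even ids, B = odd ids) give the four combinations listed above.
        System.out.println(Arrays.toString(messagesPerSubStep(edges, 2)));
    }
}
```

Every edge falls into exactly one (source group, destination group) pair, so the total number of messages is unchanged while the number in flight at any one time shrinks, which is the point of splitting when the messages do not fit in memory.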
Over 1.5 years in production
Hundreds of production Giraph jobs processed a week
30+ applications in our internal application repository
Sample production job - 700B+ edges
Job times range from minutes to hours
Graft: The distributed Giraph debugger
[Figure: release timeline: 2/12 - 0.1, 5/13 - 1.0, 6/14 - 1.1.]
Snapshot automatically after a time period and restart at end of queue
Higher level primitives (e.g. HelP - Salihoglu)
Values (ANV)
(LUV)
One Other Vertex (UVUOV)
using a value from one
necessarily a neighbor)
(AGV)
Graph traversal language (e.g. Gremlin)
    // calculate basic collaborative filtering for vertex 1
    m = [:]
    g.v(1).out('likes').in('likes').out('likes').groupCount(m)
    m.sort{-it.value}

    // eigenvector (eigenvector centrality) of a graph
    m = [:]; c = 0;
    g.V.as('x').out.groupCount(m).loop('x'){c++ < 1000}
    m.sort{-it.value}
Implement lots of algorithms?
./run-page-rank
Investigate alternative computing models
Performance Applications
Kabiljo, Sergey Edunov, Pavan Athivarapu, Avery Ching, Sambavi Muthukrishnan