Giraph: Production-grade graph processing infrastructure for trillion edge graphs


  1. Giraph: Production-grade graph processing infrastructure for trillion edge graphs. GRADES, 6/22/2014. Avery Ching.

  2. Motivation

  3. Apache Giraph
  • Inspired by Google's Pregel but runs on Hadoop
  • "Think like a vertex"
  • Maximum value vertex example (a sketch follows below)
  [Figure: two processors exchange vertex values over several supersteps until every vertex holds the maximum value, 5]
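A minimal sketch of the "maximum value" example in the same BasicComputation API used later in this deck; the class name and details are illustrative, not the talk's code. Every vertex keeps the largest value seen so far and re-broadcasts only when it changes, so the computation halts once the maximum has reached everyone.

    import org.apache.giraph.graph.BasicComputation;
    import org.apache.giraph.graph.Vertex;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.FloatWritable;
    import org.apache.hadoop.io.LongWritable;

    public class MaxValueComputation extends BasicComputation<
        LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
      @Override
      public void compute(
          Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
          Iterable<DoubleWritable> messages) {
        // Everyone broadcasts once; afterwards only on improvement.
        boolean changed = getSuperstep() == 0;
        for (DoubleWritable message : messages) {
          if (message.get() > vertex.getValue().get()) {
            vertex.getValue().set(message.get());
            changed = true;
          }
        }
        if (changed) {
          sendMessageToAllEdges(vertex, vertex.getValue());
        }
        // Sleep; an incoming message reactivates the vertex.
        vertex.voteToHalt();
      }
    }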

  4. Giraph on Hadoop / YARN
  [Figure: Giraph runs on both MapReduce and YARN, across Hadoop 0.20.x, 0.20.203, 1.x, and 2.0.x]

  5. Page rank in Giraph

  import org.apache.giraph.graph.BasicComputation;
  import org.apache.giraph.graph.Vertex;
  import org.apache.hadoop.io.DoubleWritable;
  import org.apache.hadoop.io.FloatWritable;
  import org.apache.hadoop.io.LongWritable;

  public class PageRankComputation extends BasicComputation<
      LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
    @Override
    public void compute(
        Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
        Iterable<DoubleWritable> messages) {
      if (getSuperstep() >= 1) {
        // Calculate updated page rank value from neighbors.
        double sum = 0;
        for (DoubleWritable message : messages) {
          sum += message.get();
        }
        vertex.getValue().set(
            (0.15d / getTotalNumVertices()) + 0.85d * sum);
      }
      if (getSuperstep() < 30) {
        // Send page rank value to neighbors for 30 iterations.
        sendMessageToAllEdges(vertex, new DoubleWritable(
            vertex.getValue().get() / vertex.getNumEdges()));
      } else {
        vertex.voteToHalt();
      }
    }
  }
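For context, a computation like this is normally launched through the GiraphRunner driver. The jar name, input/output paths, and worker count below are placeholders, so treat this as a sketch of the invocation rather than anything from the talk:

    hadoop jar giraph-examples-with-dependencies.jar \
        org.apache.giraph.GiraphRunner PageRankComputation \
        -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
        -vip /user/someone/graph/input \
        -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
        -op /user/someone/graph/output \
        -w 50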

  6. Apache Giraph data flow
  [Figure: three phases. Loading the graph: workers read input-format splits and send graph partitions to their owners. Compute/Iterate: workers run compute on their in-memory partitions and send messages, with the master coordinating ("send stats / iterate!"). Storing the graph: workers write their partitions through the output format.]

  7. Pipelined computation
  Master "computes" (a sketch follows below):
  • Sets computation, in/out message, combiner for next superstep
  • Can set/modify aggregator values
  [Figure: timeline of the master and two workers moving through phases 1a, 1b, 2, and 3]
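A minimal sketch of that master hook. DefaultMasterCompute, setComputation, and aggregator registration are the Giraph 1.1-era API; the per-phase computation classes and the "workDone" aggregator are hypothetical.

    import org.apache.giraph.aggregators.LongSumAggregator;
    import org.apache.giraph.master.DefaultMasterCompute;
    import org.apache.hadoop.io.LongWritable;

    public class PhasedMasterCompute extends DefaultMasterCompute {
      @Override
      public void initialize() throws InstantiationException,
          IllegalAccessException {
        // Aggregators are registered once, before the first superstep.
        registerAggregator("workDone", LongSumAggregator.class);
      }

      @Override
      public void compute() {
        // Runs between supersteps: choose the vertex computation
        // (PhaseOneComputation / PhaseTwoComputation are hypothetical).
        if (getSuperstep() % 2 == 0) {
          setComputation(PhaseOneComputation.class);
        } else {
          setComputation(PhaseTwoComputation.class);
        }
        // Aggregator values can be read or reset between supersteps.
        setAggregatedValue("workDone", new LongWritable(0));
      }
    }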

  8. Use case

  9. Affinity propagation
  Frey and Dueck, "Clustering by passing messages between data points," Science 2007.
  Organically discover exemplars based on similarity.
  [Figure: clustering state at initialization, at an intermediate step, and at convergence]

  10. 3 stages
  • Responsibility r(i,k): how well suited is k to be an exemplar for i?
  • Availability a(i,k): how appropriate is it for point i to choose point k as an exemplar, given all of k's responsibilities?
  • Update exemplars: based on the known responsibilities/availabilities, which vertex should be my exemplar?
  * Dampen responsibility and availability (update rules below).
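For reference, these stages follow the update rules from the cited Frey-Dueck paper, with $s(i,k)$ the input similarity; the damping the slide mentions blends each new value with the previous one:

\[
r(i,k) \leftarrow s(i,k) - \max_{k' \neq k} \bigl\{ a(i,k') + s(i,k') \bigr\}
\]
\[
a(i,k) \leftarrow \min\Bigl\{ 0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0, r(i',k)\} \Bigr\},
\qquad
a(k,k) \leftarrow \sum_{i' \neq k} \max\{0, r(i',k)\}
\]
\[
x \leftarrow \lambda\, x_{\text{old}} + (1-\lambda)\, x_{\text{new}}
\quad \text{(damping, } 0 \le \lambda < 1\text{)}
\]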

  11. Responsibility
  Every vertex i with an edge to k maintains the responsibility of k for i.
  It sends that responsibility to k in a ResponsibilityMessage: (sender id, responsibility(i,k)). A sketch of such a message type follows.
  [Figure: B, C, D send r(b,a), r(c,a), r(d,a) to A, and B sends r(b,d) to D]
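The slide specifies only the payload (sender id, responsibility(i,k)); a minimal sketch of such a message as a Hadoop Writable, with field and accessor names as assumptions:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    public class ResponsibilityMessage implements Writable {
      private long senderId;         // vertex i sending r(i,k)
      private double responsibility; // r(i,k)

      public ResponsibilityMessage() {}  // required for deserialization

      public ResponsibilityMessage(long senderId, double responsibility) {
        this.senderId = senderId;
        this.responsibility = responsibility;
      }

      public long getSenderId() { return senderId; }
      public double getResponsibility() { return responsibility; }

      @Override
      public void write(DataOutput out) throws IOException {
        out.writeLong(senderId);
        out.writeDouble(responsibility);
      }

      @Override
      public void readFields(DataInput in) throws IOException {
        senderId = in.readLong();
        responsibility = in.readDouble();
      }
    }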

  12. Availability
  Each vertex sums the positive responsibility messages it received.
  It sends the availability to i in an AvailabilityMessage: (sender id, availability(i,k)).
  [Figure: A sends a(b,a), a(c,a), a(d,a) back to B, C, D, and D sends a(b,d) to B]

  13. Update exemplars
  Each vertex dampens its availabilities and scans its edges to find its exemplar k, then updates its self-exemplar.
  [Figure: B, C, D update to exemplar = a; A updates to exemplar = d]

  14. Master logic
  From the initial state, cycle: calculate responsibility → calculate availability → update exemplars.
  if (exemplars agree they are exemplars && changed exemplars < ∆) then halt, otherwise continue. (A sketch of this loop follows.)
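One way that loop could look in a MasterCompute. The stage computation classes, the "changedExemplars" aggregator, and the threshold are all assumptions, and the slide's "exemplars agree" check is elided:

    import org.apache.giraph.aggregators.LongSumAggregator;
    import org.apache.giraph.master.DefaultMasterCompute;
    import org.apache.hadoop.io.LongWritable;

    public class APMasterCompute extends DefaultMasterCompute {
      private static final long DELTA = 1000;  // assumed threshold

      @Override
      public void initialize() throws InstantiationException,
          IllegalAccessException {
        // Assumed: the update-exemplars stage sums its changes here.
        registerAggregator("changedExemplars", LongSumAggregator.class);
      }

      @Override
      public void compute() {
        // Cycle through the three stages (hypothetical classes).
        switch ((int) (getSuperstep() % 3)) {
          case 0: setComputation(ResponsibilityComputation.class); break;
          case 1: setComputation(AvailabilityComputation.class); break;
          case 2: setComputation(UpdateExemplarsComputation.class); break;
        }
        // After each update-exemplars pass, halt if few exemplars changed.
        if (getSuperstep() > 0 && getSuperstep() % 3 == 0) {
          LongWritable changed = getAggregatedValue("changedExemplars");
          if (changed.get() < DELTA) {
            haltComputation();
          }
        }
      }
    }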

  15. Performance & Scalability

  16. Example graph sizes
  [Chart: graphs used in research publications (ClueWeb09, Twitter dataset, Friendster, Yahoo! web) top out around the single-digit billions of edges, versus rough social-network scale estimates (Twitter Est*, Facebook Est*) in the tens to hundreds of billions.]
  Twitter: 255M MAU (https://about.twitter.com/company) × 208 average followers (Beevolve 2012) → estimated >53B edges.
  Facebook: 1.28B MAU (Q1/2014 report) × 200+ average friends (2011 S-1) → estimated >256B edges.

  17. Faster than Hive?

  Application                   | Graph size  | CPU time speedup | Elapsed time speedup
  Page rank (single iteration)  | 400B+ edges | 26x              | 120x
  Friends of friends score      | 71B+ edges  | 12.5x            | 48x

  18. Apache Giraph scalability
  [Charts: (left) seconds per run vs. number of workers (50-300) on a fixed 200B-edge graph; (right) seconds vs. number of edges (1e9 to 2e11) with a fixed 50 workers. Both axes run 0-500 seconds, and Giraph closely tracks the ideal scaling curve.]

  19. Trillion social edges page rank
  Improvements:
  • GIRAPH-840 - Netty 4 upgrade
  • G1 collector / tuning
  [Chart: minutes per iteration (0-4 scale) on 6/30/2013 vs. 6/2/2014, dropping sharply after the improvements.]

  20. Graph partitioning

  21. Why balanced partitioning
  Random partitioning == good balance, BUT ignores entity affinity.
  [Figure: twelve numbered vertices spread across partitions with no regard for their connections]

  22. Balanced partitioning application
  Results from one service: cache hit rate grew from 70% to 85%, and bandwidth was cut in half.
  [Figure: the same twelve vertices regrouped so connected entities share a partition]

  23. Balanced label propagation results
  * Loosely based on Ugander and Backstrom, "Balanced label propagation for partitioning massive graphs," WSDM '13.

  24. Leveraging partitioning
  • Explicit remapping
  • Native remapping: transparent, or embedded

  25. Explicit remapping
  Compute shortest paths from 0 (San Jose). The original graph is joined with a partitioning mapping before the compute, and the output is joined with the reverse mapping afterwards.

  Original graph:                                 Partitioning mapping:
  Id        Edges                                 Id        Alt Id
  San Jose  (Chicago, 4), (New York, 6)           San Jose  0
  Chicago   (San Jose, 4), (New York, 3)          Chicago   1
  New York  (San Jose, 6), (Chicago, 3)           New York  2

  Remapped graph:          Compute output:        Final output (after reverse join):
  Id  Edges                Id  Distance           Id        Distance
  0   (1, 4), (2, 6)       0   0                  San Jose  0
  1   (0, 4), (2, 3)       1   4                  Chicago   4
  2   (0, 6), (1, 3)       2   6                  New York  6

  26. Native transparent remapping
  Compute shortest paths from San Jose. The original graph is annotated with group information from the partitioning mapping; Giraph handles the remapping internally, and the final output uses the original ids.

  Original graph with group information:          Partitioning mapping:
  Id        Group  Edges                          Id        Group
  San Jose  0      (Chicago, 4), (New York, 6)    San Jose  0
  Chicago   1      (San Jose, 4), (New York, 3)   Chicago   1
  New York  2      (San Jose, 6), (Chicago, 3)    New York  2

  Final compute output:
  Id        Distance
  San Jose  0
  Chicago   4
  New York  6

  27. Native embedded remapping
  Compute shortest paths from San Jose, with the partitioning mapping embedded in the vertex id itself: the top bits of each id carry the machine. Not all graphs can leverage this technique; Facebook can, since its ids are longs with unused bits. (A bit-packing sketch follows.)

  Original graph with the mapping embedded in the id (machine, id):
  Id    Edges                 Partitioning mapping:   Final compute output:
  0, 0  (1, 4), (2, 6)        Id  Mach                Id  Distance
  1, 1  (0, 4), (2, 3)        0   0                   0   0
  0, 2  (0, 6), (1, 3)        1   1                   1   4
                              2   0                   2   6
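A minimal sketch of the bit trick, assuming (as the slide says for Facebook ids) that real ids fit in the low bits of a 64-bit long; the 48/16 split is an assumption:

    public final class EmbeddedIds {
      // Assumed: real ids use only the low 48 bits.
      private static final int MACHINE_SHIFT = 48;
      private static final long ID_MASK = (1L << MACHINE_SHIFT) - 1;

      private EmbeddedIds() {}

      /** Pack the target machine into the unused top bits of the id. */
      public static long embed(long id, int machine) {
        return ((long) machine << MACHINE_SHIFT) | (id & ID_MASK);
      }

      /** Recover the machine from an embedded id. */
      public static int machineOf(long embeddedId) {
        return (int) (embeddedId >>> MACHINE_SHIFT);
      }

      /** Recover the original id from an embedded id. */
      public static long originalId(long embeddedId) {
        return embeddedId & ID_MASK;
      }
    }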

  28. Remapping comparison
  Explicit:
    Pros: can also add id compression.
    Cons: application must be aware of the id remapping; workflow complexity; pre- and post-join overhead.
  Native transparent:
    Pros: no application change, just additional input parameters.
    Cons: additional memory usage on input; group information uses more memory.
  Native embedded:
    Pros: utilizes unused id bits.
    Cons: application changes to the id/input type.

  29. Partitioning experiments
  345B-edge page rank.
  [Chart: seconds per iteration (0-160 scale) for random partitioning vs. 47%-local and 60%-local partitionings, decreasing as locality improves.]

  30. Message explosion

  31. Avoiding out-of-core
  Example: mutual friends calculation between neighbors.
  1. Send your friends a list of your friends.
  2. Intersect with your friend list.
  [Figure: vertices A-E exchange friend lists, e.g. A and B exchange C:{D}, D:{C}, E:{}]
  At Facebook scale (1.23B users as of 1/2014, 200+ average friends per the 2011 S-1, 8-byte long ids), that is 394 TB of messages, or about 3,940 machines at 100 GB per machine (not including the graph).
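The slide's numbers check out as straight arithmetic: every user receives a roughly 200-id friend list from each of roughly 200 friends,

\[
1.23\times10^{9}\ \text{users} \times 200\ \text{friends} \times 200\ \text{ids} \times 8\ \text{bytes}
\approx 3.94\times10^{14}\ \text{bytes} \approx 394\ \text{TB},
\]
\[
394\ \text{TB} \div 100\ \text{GB per machine} \approx 3{,}940\ \text{machines}.
\]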

  32. Superstep splitting
  Send over subsets of the source/destination edges per superstep, so only a fraction of the messages is in memory at once (a sketch follows below).
  * Currently manual - future work: automatic!
  [Figure: one logical superstep run as four passes, toggling sources A/B on/off against destinations A/B on/off]
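A rough sketch of the manual version, where each logical superstep is run SPLITS × SPLITS times and a hash of the vertex id gates which sources send and which destinations receive; the splitting factor and the bucketing function are assumptions:

    import org.apache.giraph.edge.Edge;
    import org.apache.giraph.graph.BasicComputation;
    import org.apache.giraph.graph.Vertex;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.FloatWritable;
    import org.apache.hadoop.io.LongWritable;

    public class SplitSuperstepComputation extends BasicComputation<
        LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
      private static final int SPLITS = 2;  // assumed splitting factor

      @Override
      public void compute(
          Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
          Iterable<DoubleWritable> messages) {
        int pass = (int) (getSuperstep() % (SPLITS * SPLITS));
        int sourceSubset = pass / SPLITS;
        int destSubset = pass % SPLITS;
        for (DoubleWritable message : messages) {
          // Fold in this pass's partial messages (application-specific).
        }
        // Only the "on" source subset sends, and only to the "on"
        // destination subset, so a fraction of messages is in flight.
        if (subsetOf(vertex.getId()) == sourceSubset) {
          for (Edge<LongWritable, FloatWritable> edge : vertex.getEdges()) {
            if (subsetOf(edge.getTargetVertexId()) == destSubset) {
              sendMessage(edge.getTargetVertexId(),
                  new DoubleWritable(vertex.getValue().get()));
            }
          }
        }
      }

      private static int subsetOf(LongWritable id) {
        return (int) (Math.abs(id.get()) % SPLITS);  // assumed bucketing
      }
    }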

  33. Giraph in production
  • Over 1.5 years in production
  • Hundreds of production Giraph jobs processed per week, plus lots of untracked experiments
  • 30+ applications in our internal application repository
  • Sample production job: 700B+ edges
  • Job times range from minutes to hours

  34. GiraphicJam demo

  35. Giraph-related projects
  Graft: the distributed Giraph debugger

  36. Giraph roadmap
  0.1 (2/12), 1.0 (5/13), 1.1 (6/14)

  37. The future

  38. Scheduling jobs
  Snapshot automatically after a time period and restart at the end of the queue.
  [Figure: job timelines being checkpointed and re-queued over time]
