
Pregel: A System for Large-Scale Graph Processing, Grzegorz Malewicz et al. (PowerPoint presentation)



  1. Pregel: A System for Large-Scale Graph Processing
     Grzegorz Malewicz, Matthew H. Austern, et al. Google, Inc. 2010 ACM SIGMOD Conference
     Presented by: Ezequiel Aguilar Gonzalez, Computer Science and Engineering, The University of Texas at Arlington, 2018

  2. Graph Processing
     • Nodes represent entities (people, businesses, accounts, …)
     • Properties are pertinent information related to nodes
     • Edges connect nodes to nodes, or nodes to properties, and represent the relationship between the two
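The following minimal Python sketch shows one way to hold such a graph in memory: an adjacency list plus a property dictionary for each node. The `PropertyGraph` class and its method names are hypothetical. For brevity, properties are stored on the nodes themselves rather than as separate property nodes.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Hypothetical directed graph: node properties plus labeled out-edges."""
    properties: dict = field(default_factory=dict)  # node id -> {property: value}
    out_edges: dict = field(default_factory=dict)   # node id -> [(target id, relationship)]

    def add_node(self, node_id, **props):
        self.properties[node_id] = dict(props)
        self.out_edges.setdefault(node_id, [])

    def add_edge(self, source, target, relationship):
        # Both endpoints must already exist; the edge records their relationship.
        if source not in self.properties or target not in self.properties:
            raise KeyError("add both nodes before connecting them")
        self.out_edges[source].append((target, relationship))

    def neighbors(self, node_id):
        return [target for target, _ in self.out_edges[node_id]]


g = PropertyGraph()
g.add_node("alice", kind="person")
g.add_node("acme", kind="business")
g.add_edge("alice", "acme", "works_at")
print(g.neighbors("alice"))  # ['acme']
```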

  3. Big Graphs [1]
     • Web graph: Google, > 1 trillion indexed pages
     • Social network: Facebook, > 800 million active users
     • Information network: 31 billion RDF triples in 2011
     • Biological network: De Bruijn graph, $4^k$ nodes ($k = 20, \ldots, 40$)
     • Graphs in machine learning: 100M ratings, 480K users, 17K movies
     [1] Arijit Khan. Systems Group, ETH Zurich.

  4. Big Graphs [2]
     • 100M ($10^8$)
     • Social scale: 100B ($10^{11}$)
     • Web scale: 1T ($10^{12}$)
     • Brain scale: 100T ($10^{14}$)
     • Example graphs: US road graph, Internet, knowledge graph, BTC Semantic Web, web graph (Google), human connectome (The Human Connectome Project, NIH)
     [2] Y. Wu. Washington State University.

  5. Graph Computing
     • Diffusion: propagate information from a vertex to its neighbors

  6. Graph Computing
     • Diffusion: propagate information from a vertex to its neighbors
     • Fusion: aggregate information from neighbors into a set of entities
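The two patterns can be sketched in a few lines of Python, assuming a graph stored as a dictionary of out-neighbor lists. The function names `diffuse` and `fuse` are hypothetical. Diffusion pushes each vertex's value along its out-edges. Fusion then combines whatever each vertex received, using any aggregate function.

```python
def diffuse(graph, values):
    """Diffusion: every vertex pushes its value to each out-neighbor."""
    inbox = {v: [] for v in graph}
    for v, neighbors in graph.items():
        for n in neighbors:
            inbox[n].append(values[v])
    return inbox


def fuse(inbox, aggregate=sum):
    """Fusion: combine the values each vertex received (None if it got nothing)."""
    return {v: aggregate(msgs) if msgs else None for v, msgs in inbox.items()}


graph = {"a": ["b", "c"], "b": ["c"], "c": []}  # hypothetical example graph
values = {"a": 1, "b": 2, "c": 3}
inbox = diffuse(graph, values)
print(inbox)             # {'a': [], 'b': [1], 'c': [1, 2]}
print(fuse(inbox))       # {'a': None, 'b': 1, 'c': 3}
print(fuse(inbox, max))  # {'a': None, 'b': 1, 'c': 2}
```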

  7. The Problem
     • Graph computations involve local data, and the connectivity between vertices is sparse. The data may not all fit on one machine.
     [Figure: large graph data and matching graph algorithms: Web → PageRank; transportation routes → shortest path; citation relationships → connected components; social networks → clustering techniques]
     (Source: http://ranger.uta.edu/~sjiang/CSE6350-spring-18/index.htm)

  8. The Problem
     • Many problems can be modeled as graphs and solved with appropriate graph algorithms
     [Figure: Web → PageRank; transportation routes → shortest path; citation relationships → connected components; social networks → clustering techniques]

  9. The Problem
     • Many problems can be modeled as graphs and solved with appropriate graph algorithms
     • Efficient processing of large graphs is challenging:
       - Very little work per vertex
       - Changing degree of parallelism
       - Poor locality of memory access
       - Running over many machines makes the problem worse

  10. The Options
     • Build custom infrastructure for graph processing: expensive to design
     • Use a single computer and a graph library: not scalable
     • Use an existing shared-memory parallel graph algorithm: no fault tolerance

  11. The Options: MapReduce for Graph Analytics
     • MapReduce does not directly support iterative algorithms
     • The invariant graph topology is reloaded and reprocessed at each iteration → wasting I/O, network bandwidth, and CPU. Each PageRank iteration:
       - Input: $(id_1, [PR_t(1), out_{11}, out_{12}, \ldots]), (id_2, [PR_t(2), out_{21}, out_{22}, \ldots]), \ldots$
       - Output: $(id_1, [PR_{t+1}(1), out_{11}, out_{12}, \ldots]), (id_2, [PR_{t+1}(2), out_{21}, out_{22}, \ldots]), \ldots$
     • Materializing intermediate results at every MapReduce iteration harms performance
     • Each iteration needs an extra MapReduce job to detect whether a fixpoint has been reached
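The following single-process Python sketch simulates one PageRank iteration written as a map and a reduce, to make the wasted work concrete. The damping factor of 0.85 is an assumption that the slide does not give. The in-memory "shuffle" stands in for a real MapReduce framework, and the function names are hypothetical. Note that the mapper has to re-emit each unchanged adjacency list so that the reducer can write it back out.

```python
from collections import defaultdict

DAMPING = 0.85  # assumed damping factor; not specified on the slide


def pagerank_map(node_id, record):
    """Map: emit rank contributions AND re-emit the unchanged adjacency list."""
    rank, out_links = record
    yield node_id, ("graph", out_links)  # topology is shipped again every iteration
    for target in out_links:
        yield target, ("rank", rank / len(out_links))


def pagerank_reduce(node_id, values, num_nodes):
    """Reduce: sum incoming contributions and reattach the adjacency list."""
    out_links, total = [], 0.0
    for kind, payload in values:
        if kind == "graph":
            out_links = payload
        else:
            total += payload
    return node_id, ((1 - DAMPING) / num_nodes + DAMPING * total, out_links)


def pagerank_iteration(records):
    """One job: input and output both have the form (id, (PR_t, out-links))."""
    shuffled = defaultdict(list)
    for node_id, record in records.items():
        for key, value in pagerank_map(node_id, record):
            shuffled[key].append(value)
    n = len(records)
    return dict(pagerank_reduce(k, vs, n) for k, vs in shuffled.items())


records = {1: (1 / 3, [2, 3]), 2: (1 / 3, [3]), 3: (1 / 3, [1])}  # hypothetical graph
for _ in range(20):  # each pass would be a separate MapReduce job
    records = pagerank_iteration(records)
```

Every pass pushes each adjacency list through the shuffle even though it never changes. The fixed count of 20 passes stands in for the extra convergence-check job that the slide mentions.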

  12. Pregel
     • Developed at Google
     • Provides scalability, fault tolerance, and the flexibility to express arbitrary graph algorithms
     • The high-level organization of Pregel programs is inspired by Valiant's Bulk Synchronous Parallel (BSP) model (1990)

  13. Bulk Synchronous Parallelism (BSP) [3]
     • Processors (P1 to P5 in the figure):
       - have local memory
       - can perform some computation
       - can communicate pairwise
     • Communication can overlap with another node's computation
     • Barrier synchronization
     [3] Gupta, Amarnath. UC San Diego.
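Here is a small Python simulation of the BSP pattern, using `threading.Barrier` from the standard library. The ring communication pattern and the "add what you received" computation are illustrative assumptions, and `run_bsp` is a hypothetical name. Each thread plays one processor with private local state. Messages sent in one superstep become readable only after every thread has passed the barrier.

```python
import threading


def run_bsp(initial_values, num_supersteps):
    """Simulate BSP: local compute, pairwise messages, barrier per superstep."""
    n = len(initial_values)
    barrier = threading.Barrier(n)
    # mailboxes[s][p]: messages sent to processor p during superstep s, which
    # p reads only in superstep s + 1. Each list has exactly one writer.
    mailboxes = [[[] for _ in range(n)] for _ in range(num_supersteps)]
    results = [None] * n

    def processor(pid):
        local = initial_values[pid]  # this processor's local memory
        for step in range(num_supersteps):
            if step > 0:
                local += sum(mailboxes[step - 1][pid])  # messages from step - 1
            # Send to one peer (a ring, chosen only for illustration).
            mailboxes[step][(pid + 1) % n].append(local)
            barrier.wait()  # nobody continues until all finish this superstep
        results[pid] = local

    threads = [threading.Thread(target=processor, args=(p,)) for p in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_bsp([0, 1, 2, 3], num_supersteps=3))  # [8, 4, 4, 8]
```

The result is deterministic regardless of thread scheduling. Each read happens only after the barrier that ends the superstep in which the message was written.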

  14. Vertex-Oriented
     • Based on the BSP model
     • A directed graph is provided to Pregel as input
     • Runs your computation at each vertex (processor)
     • Repeats until the computation at every vertex has voted to halt
     • Pregel returns a directed graph as the result

  15. Pregel Organized via a C++ API
     • Computation proceeds in supersteps S
     • Application code subclasses Vertex and writes a Compute method, which can:
       - get/set the vertex value
       - get/set the values of outgoing edges
       - send/receive messages
     • In superstep S, Compute for vertex V reads messages sent to V in superstep S-1, sends messages to other vertices that will be received in superstep S+1, and modifies the state of V and its outgoing edges
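The following Python sketch shows the shape of this programming model: a base class that user code subclasses by overriding a compute method, plus a tiny single-process driver. The names `Vertex`, `compute`, `send_message`, `vote_to_halt`, and `run` are hypothetical stand-ins for the C++ API on the slide. They are not Pregel's actual signatures. In the example, each vertex learns its in-degree.

```python
class Vertex:
    """Hypothetical Python analog of Pregel's C++ Vertex base class."""

    def __init__(self, vertex_id, value, out_edges):
        self.id = vertex_id
        self.value = value          # vertex value (get/set)
        self.out_edges = out_edges  # target id -> edge value (get/set)
        self.active = True          # every vertex starts active in superstep 0
        self.outbox = []            # (target, message) pairs sent this superstep

    def send_message(self, target, message):
        self.outbox.append((target, message))  # received in superstep S + 1

    def vote_to_halt(self):
        self.active = False

    def compute(self, messages, superstep):
        raise NotImplementedError  # user code overrides this, like Compute()


class InDegreeVertex(Vertex):
    """Example user subclass: each vertex learns how many edges point to it."""

    def compute(self, messages, superstep):
        if superstep == 0:
            for target in self.out_edges:
                self.send_message(target, 1)
        else:
            self.value = sum(messages)  # messages sent to this vertex in S - 1
            self.vote_to_halt()


def run(vertices):
    """Tiny single-process driver; returns the number of supersteps executed."""
    inbox = {v.id: [] for v in vertices}
    superstep = 0
    while any(v.active for v in vertices) or any(inbox.values()):
        next_inbox = {v.id: [] for v in vertices}
        for v in vertices:
            if inbox[v.id]:
                v.active = True  # an incoming message reactivates a halted vertex
            if v.active:
                v.compute(inbox[v.id], superstep)
                for target, message in v.outbox:
                    next_inbox[target].append(message)
                v.outbox.clear()
        inbox = next_inbox
        superstep += 1
    return superstep


graph = [
    InDegreeVertex("a", 0, {"b": None, "c": None}),
    InDegreeVertex("b", 0, {"c": None}),
    InDegreeVertex("c", 0, {}),
]
steps = run(graph)
print({v.id: v.value for v in graph}, steps)  # {'a': 0, 'b': 1, 'c': 2} 2
```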

  16. C++ API: Message Passing
     • No guaranteed order of message delivery
     • Each message is delivered exactly once
     • A vertex can send a message to any vertex whose identifier it knows
     • If the destination vertex does not exist, a user-defined handler is called
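The delivery rules can be sketched as a routing step between supersteps. In this Python sketch, `deliver` and its `on_missing_vertex` parameter are hypothetical names. The shuffle models the absence of any ordering guarantee, each message is appended to exactly one inbox, and messages addressed to unknown vertices go to the user's handler instead of being dropped silently.

```python
import random


def deliver(outgoing, existing_ids, on_missing_vertex, rng=random):
    """Hypothetical helper: route superstep-S messages to inboxes for S + 1.

    outgoing: list of (target_id, message) pairs produced in superstep S.
    on_missing_vertex: user-defined handler called when a target does not exist.
    """
    inboxes = {vid: [] for vid in existing_ids}
    pending = list(outgoing)
    rng.shuffle(pending)  # no ordering guarantee: receivers must not rely on order
    for target, message in pending:
        if target in inboxes:
            inboxes[target].append(message)  # each message is delivered exactly once
        else:
            on_missing_vertex(target, message)
    return inboxes


missing = []
inboxes = deliver(
    [("a", 1), ("b", 2), ("a", 3), ("z", 4)],
    existing_ids=["a", "b"],
    on_missing_vertex=lambda target, msg: missing.append((target, msg)),
)
print(sorted(inboxes["a"]), inboxes["b"], missing)  # [1, 3] [2] [('z', 4)]
```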

  17. Question 01: What is a superstep in the Pregel graph processing model? In the single-source shortest path problem, what computation is involved in a superstep?

  18. Pregel Supersteps
     • A computation is a series of iterations (supersteps)
     • In each superstep, every vertex V invokes a user-defined function in parallel; V:
       - can read messages sent in the previous superstep (S-1)
       - can send messages, to be read in the next superstep (S+1)
       - can modify the state of its outgoing edges
     [Figure: PREGEL computation model: Input → supersteps (computation, communication, synchronization) → Output]

  19. Pregel Supersteps
     • Superstep 1: get required data; compute (yes or no?); exchange messages (yes or no?); synchronize
     • Superstep 2: get required data; compute (yes or no?); exchange messages (yes or no?); synchronize
     • …

  20. SSSP – Parallel BFS in Pregel
     [Figure: weighted directed graph with edge weights 1, 2, 2, 3, 4, 5, 6, 7, 9, and 10; the source vertex has distance 0 and every other vertex ∞. Legend: active vertex, inactive vertex.]
     (Source: http://ranger.uta.edu/~sjiang/CSE6350-spring-18/index.htm)

  21. SSSP – Parallel BFS in Pregel
     [Figure: the source sends tentative distances 10 and 5 to its two out-neighbors.]

  22. SSSP – Parallel BFS in Pregel
     [Figure: those two vertices take distances 10 and 5; the other vertices remain at ∞.]

  23. SSSP – Parallel BFS in Pregel
     [Figure: messages carrying distances 11, 12, 8, 14, and 7 travel along outgoing edges.]

  24. SSSP – Parallel BFS in Pregel
     [Figure: the non-source distances become 8, 11, 5, and 7.]

  25. SSSP – Parallel BFS in Pregel
     [Figure: messages carrying distances 9, 10, 15, 13, and 14 are sent.]

  26. SSSP – Parallel BFS in Pregel
     [Figure: distances become 8, 9, 5, and 7.]

  27. SSSP – Parallel BFS in Pregel
     [Figure: one more message, carrying distance 13, is sent; distances stay 8, 9, 5, and 7.]

  28. SSSP – Parallel BFS in Pregel
     [Figure: final shortest distances 0, 8, 9, 5, and 7.]
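The vertex program behind this walkthrough can be sketched in Python as a Bellman-Ford-style relaxation run one superstep at a time. In each superstep, a vertex takes the minimum incoming distance. If that distance beats its current value, the vertex adopts it and forwards it along every out-edge, and every vertex then votes to halt. The vertex names s, u, v, x, y and the exact edge placement are hypothetical. They were chosen so that the edge weights and the per-superstep distances match the sequence above.

```python
import math


def pregel_sssp(edges, source):
    """Single-source shortest paths, one Pregel-style superstep per loop pass.

    edges: dict vertex -> dict(neighbor -> weight).
    Returns (distances, number of supersteps executed).
    """
    dist = {v: math.inf for v in edges}  # every vertex starts at infinity
    inbox = {v: [] for v in edges}
    inbox[source].append(0)  # superstep 0: only the source learns a real distance
    supersteps = 0
    while any(inbox.values()):  # everyone has halted and no messages are in transit
        next_inbox = {v: [] for v in edges}
        for v, messages in inbox.items():
            if not messages:
                continue  # halted vertex with no mail stays inactive
            candidate = min(messages)
            if candidate < dist[v]:
                dist[v] = candidate
                for neighbor, weight in edges[v].items():
                    next_inbox[neighbor].append(candidate + weight)
            # implicit vote to halt: only a new message reactivates v
        inbox = next_inbox
        supersteps += 1
    return dist, supersteps


EDGES = {  # hypothetical graph matching the figure's weights
    "s": {"u": 10, "x": 5},
    "u": {"v": 1, "x": 2},
    "v": {"y": 4},
    "x": {"u": 3, "v": 9, "y": 2},
    "y": {"s": 7, "v": 6},
}
dist, steps = pregel_sssp(EDGES, "s")
print(dist, steps)  # {'s': 0, 'u': 8, 'v': 9, 'x': 5, 'y': 7} 5
```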

  29. Question 02: What does synchronicity in Pregel's execution refer to? What benefits can it bring?

  30. Pregel Synchronization
     • Barrier synchronization:
       - All messages are exchanged reliably
       - All peers wait for the others to enter the barrier
       - After leaving the barrier, each peer works independently on the messages sent to it by other peers, and the cycle repeats
     • Disadvantage: fast processors can be delayed by slow ones
     • Benefit: Pregel programs are inherently free of deadlocks and data races

  31. Question 03: How does a Pregel program terminate (complete its execution)?

  32. Pregel Program Termination
     • The algorithm terminates when every vertex has voted to halt and no messages are in transit
     • In superstep 0, every vertex is in the active state
     • A vertex deactivates itself by voting to halt
     • A halted vertex is reactivated when it receives an (external) message
     [Figure: vertex state machine: Active → Inactive on "votes to halt"; Inactive → Active on "message received"]
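The state diagram and the global stopping rule fit in a few lines of Python. In this sketch, `State`, `next_state`, and `terminated` are hypothetical names. The sketch encodes two transitions: voting to halt moves an active vertex to inactive, and receiving a message for the next superstep makes a vertex active again.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"


def next_state(state, voted_to_halt, received_message):
    """Hypothetical transition function for one vertex between supersteps."""
    if received_message:
        return State.ACTIVE  # a message reactivates a halted vertex
    if state is State.ACTIVE and voted_to_halt:
        return State.INACTIVE
    return state


def terminated(states, messages_in_transit):
    """The whole computation ends when all vertices are inactive and no mail is pending."""
    return all(s is State.INACTIVE for s in states) and messages_in_transit == 0
```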

  33. Question 04: Use Figure 2 to illustrate a Pregel program's execution

  34. Finding the largest value in a graph
     • Superstep 0: 3 6 2 1

  35. Finding the largest value in a graph
     • Superstep 0: 3 6 2 1
     • Superstep 1: 6 6 2 6

  36. Finding the largest value in a graph
     • Superstep 0: 3 6 2 1
     • Superstep 1: 6 6 2 6
     • Superstep 2: 6 6 6 6
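The largest-value example can be sketched in Python as follows. In superstep 0, every vertex sends its value to its neighbors. After that, a vertex that receives a larger value adopts it and forwards it. Every vertex votes to halt after computing, and only new messages reactivate it. The four-vertex edge set and the name `propagate_max` are hypothetical. The edges were chosen so that the per-superstep values match the slides (3 6 2 1, then 6 6 2 6, then 6 6 6 6).

```python
def propagate_max(values, edges):
    """Return the vertex values at the end of each superstep.

    values: dict vertex -> initial value; edges: dict vertex -> out-neighbors.
    """
    current = dict(values)
    history = [dict(current)]
    # Superstep 0: every vertex is active and sends its value to its neighbors.
    outbox = {v: [] for v in values}
    for v in values:
        for n in edges[v]:
            outbox[n].append(current[v])
    # Later supersteps: only vertices that received messages are reactivated.
    while any(outbox.values()):
        inbox, outbox = outbox, {v: [] for v in values}
        for v, messages in inbox.items():
            if messages and max(messages) > current[v]:
                current[v] = max(messages)  # learned a larger value: forward it
                for n in edges[v]:
                    outbox[n].append(current[v])
        history.append(dict(current))
    return history


values = {"a": 3, "b": 6, "c": 2, "d": 1}
edges = {"a": ["b"], "b": ["a", "d"], "c": ["d"], "d": ["c"]}  # hypothetical edges
history = propagate_max(values, edges)
for step, snapshot in enumerate(history):
    print(step, list(snapshot.values()))
```

The run takes four supersteps (0 to 3). In the last one, a vertex receives the value 6, which it already holds, so nothing changes and no further messages are sent.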
