
Processing Massive Graphs Amir H. Payberah - PowerPoint PPT Presentation


  1. Processing Massive Graphs Amir H. Payberah amir.payberah@cs.ox.ac.uk University of Oxford Amir H. Payberah (Oxford) Processing Massive Graphs April 17, 2017 1 / 78

  2. What’s the Problem?

  4. Large Graph
     ◮ A large graph either cannot fit into the memory of a single computer or

  5. Big Data

  6. Scale Up vs. Scale Out
     ◮ Scale up or scale vertically.
     ◮ Scale out or scale horizontally.

  8. A Scale Out Example (1/3)
     ◮ Count the number of times each distinct word appears in the file.
     ◮ If the file fits in memory: words(doc.txt) | sort | uniq -c
     ◮ If not?
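For the in-memory case, the pipeline above can be approximated in Python. This is only a sketch under assumptions: the function name `count_words` and the sample sentence are illustrative, and words are split on whitespace, which may differ from what `words(doc.txt)` does in the slide.

```python
from collections import Counter


def count_words(text):
    """Count how often each distinct whitespace-separated word occurs."""
    return Counter(text.split())


# Hypothetical sample input standing in for the contents of doc.txt.
counts = count_words("the cat saw the other cat")

# Print in sorted word order, similar to what `sort | uniq -c` produces.
for word, n in sorted(counts.items()):
    print(n, word)
```

The whole input is held in memory here, which is exactly the limitation the "If not?" bullet points at.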

  9. A Scale Out Example (2/3)
     ◮ Parallelize the data and process.
     ◮ Data-Parallel processing.

  10. A Scale Out Example (3/3)
      ◮ MapReduce
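The word-count example can be expressed in the MapReduce style as three steps: map, shuffle, and reduce. The sketch below runs on one machine and only illustrates the data flow; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the list-of-chunks input are assumptions for illustration, not the API of any real MapReduce framework.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word, 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_phase(groups):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(chunks):
    # Each chunk could be mapped on a different machine; here we loop locally.
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(pairs))


# Hypothetical partitions of a file that does not fit on one machine.
chunks = ["a b a", "b c", "a"]
counts = word_count(chunks)
print(counts)
```

Because each chunk is mapped independently, the map step parallelizes over the data partitions, which is the data-parallel idea from the previous slide.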

  11. Can we use platforms like MapReduce or Spark, which are based on the data-parallel model, for large-scale graph processing?

  12. Large Graph Processing Challenges
      ◮ Difficult to extract parallelism based on partitioning of the data.
      ◮ Difficult to express parallelism based on partitioning of computation.
      ◮ No locality between computations and data access patterns.

  13. Graph-Parallel Processing
      ◮ Computation typically depends on the neighbors.

  14. Graph-Parallel Processing
      ◮ Restricts the types of computation.
      ◮ New techniques to partition and distribute graphs.
      ◮ Exploit graph structure.

  15. Data-Parallel vs. Graph-Parallel Computation

  16. Graph-Parallel Processing Models
      ◮ Vertex-centric processing model
        • Pregel, Giraph, GraphLab, PowerGraph, ...
      ◮ Edge-centric processing model
        • X-Stream, Chaos, ...

  17. Vertex-Centric Programming Model
      ◮ Vertex-centric programming model
        • Write a vertex program.
        • State is stored in vertices.
      ◮ Vertex operations:
        • Gather updates from incoming edges.
        • Scatter updates along outgoing edges.

  19. A Vertex-Centric Program
      ◮ Iterates over vertices

      // the scatter phase
      for all vertices v
        for all outgoing edges from v:
          update = f(v.value)

      // the gather phase
      for all vertices v
        for all incoming edges to v:
          v.value = g(v.value, update)

  20. Vertex-Centric Scatter-Gather (1/5)

      Until convergence {
        // the gather phase
        for all vertices v
          for all incoming edges to v:
            v.value = g(v.value, update)

        // the scatter phase
        for all vertices v
          for all outgoing edges from v:
            update = f(v.value)
      }
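The scatter-gather loop above can be turned into a small runnable Python sketch. Several things here are assumptions rather than part of the slides: the function name `vertex_centric`, the dict-based graph representation, the convergence test (stop when no vertex value changes), and running scatter before gather within a round (a gather with no pending updates would be a no-op). In the example, `f` is the identity and `g` is `min`, so each vertex ends up with the smallest label that can reach it along directed edges.

```python
def vertex_centric(values, out_edges, f, g):
    """Run scatter-gather rounds until no vertex value changes.

    values:    dict vertex -> value (the state stored in vertices)
    out_edges: dict vertex -> list of destination vertices
    """
    values = dict(values)  # work on a copy of the caller's state
    changed = True
    while changed:
        # Scatter phase: every vertex places an update on each outgoing edge.
        incoming = {v: [] for v in values}
        for v in values:
            for dst in out_edges.get(v, ()):
                incoming[dst].append(f(values[v]))

        # Gather phase: every vertex folds in the updates on its incoming edges.
        changed = False
        for v in values:
            for update in incoming[v]:
                new_value = g(values[v], update)
                if new_value != values[v]:
                    values[v] = new_value
                    changed = True
    return values


# Hypothetical graph: 1 -> 2 -> 3 -> 1 forms a cycle, and 4 -> 3 feeds into it.
labels = {1: 1, 2: 2, 3: 3, 4: 4}
out_edges = {1: [2], 2: [3], 3: [1], 4: [3]}
result = vertex_centric(labels, out_edges, f=lambda x: x, g=min)
print(result)
```

Vertex 4 has no incoming edges, so it never gathers an update and keeps its own label.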

  25. Vertex-Centric vs. Edge-Centric (1/2)
      [Figure panels: Vertex-centric; Edge-centric]

  26. Vertex-Centric vs. Edge-Centric (2/2)

      Vertex-centric:

      Until convergence {
        // the scatter phase
        for all vertices v
          for all outgoing edges from v:
            update = f(v.value)

        // the gather phase
        for all vertices v
          for all incoming edges to v:
            v.value = g(v.value, update)
      }

      Edge-centric:

      Until convergence {
        // the scatter phase
        for all edges e
          u = new update
          u.dst = e.dst
          u.value = f(e.src.value)

        // the gather phase
        for all edges e
          e.dst.value = g(e.dst.value, u.value)
      }
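The edge-centric variant can be sketched in the same way. Instead of visiting vertices and following their adjacency lists, it streams over a flat edge list. The function name `edge_centric`, the `(src, dst)` tuple representation, and the stop-when-nothing-changes test are assumptions for illustration; the gather phase here walks the list of updates that scatter produced, one per edge, which is one plausible reading of the pseudocode.

```python
def edge_centric(values, edges, f, g):
    """Run edge-centric scatter-gather rounds until no vertex value changes.

    values: dict vertex -> value
    edges:  list of (src, dst) pairs; no per-vertex adjacency lists are needed
    """
    values = dict(values)  # work on a copy of the caller's state
    changed = True
    while changed:
        # Scatter phase: one sequential pass over the edges, one update per edge.
        updates = [(dst, f(values[src])) for src, dst in edges]

        # Gather phase: one sequential pass applying each update to its target.
        changed = False
        for dst, value in updates:
            new_value = g(values[dst], value)
            if new_value != values[dst]:
                values[dst] = new_value
                changed = True
    return values


# Hypothetical graph: the cycle 1 -> 2 -> 3 -> 1 plus the edge 4 -> 3.
labels = {1: 1, 2: 2, 3: 3, 4: 4}
edges = [(1, 2), (2, 3), (3, 1), (4, 3)]
result = edge_centric(labels, edges, f=lambda x: x, g=min)
print(result)
```

Both phases touch the edge and update lists strictly in order, so the edges themselves can be read sequentially, while lookups of vertex values remain random accesses.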

  27. Edge-Centric Scatter-Gather (1/5)

      Until convergence {
        // the gather phase
        for all edges e
          e.dst.value = g(e.dst.value, u.value)

        // the scatter phase
        for all edges e
          u = new update
          u.dst = e.dst
          u.value = f(e.src.value)
      }

  32. Vertex-Centric Processing Platforms: Pregel and GraphLab

  33. Pregel

  34. Pregel
      ◮ Large-scale graph-parallel processing platform developed at Google.
      ◮ Inspired by the bulk synchronous parallel (BSP) model.

  35. Programming Model
      ◮ Vertex-centric programming: think like a vertex.
      ◮ Each vertex computes its value individually, in parallel.
      ◮ Each vertex can see its local context and update its value.

  36. Execution Model (1/2)
      ◮ Applications run as a sequence of iterations called supersteps.
      ◮ A vertex in superstep S can:
        • read messages sent to it in superstep S-1.
        • send messages to other vertices, which receive them in superstep S+1.
        • modify its state.

  37. Execution Model (2/2)
      ◮ Superstep 0: all vertices are in the active state.
      ◮ A vertex deactivates itself by voting to halt when it has no further work to do.
      ◮ A halted vertex becomes active again if it receives a message.
      ◮ The whole algorithm terminates when:
        • All vertices are simultaneously inactive.
        • There are no messages in transit.

  38. Example: Max Value (1/4)

      i_val := val
      for each message m
        if m > val then val := m
      if i_val == val then
        vote_to_halt
      else
        for each neighbor v
          send_message(v, val)
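The max-value vertex program and the superstep execution model can be combined into one single-machine simulation. This is a hedged sketch, not Pregel's actual API: the function name `pregel_max`, the dict-based inbox and outbox, and the adjacency-list input are all assumptions. It also assumes that in superstep 0 every vertex sends its value to its neighbors, because no messages exist yet to compare against; without that, every vertex would vote to halt immediately.

```python
def pregel_max(values, neighbors):
    """Propagate the maximum value to all vertices, one superstep at a time.

    values:    dict vertex -> initial value
    neighbors: dict vertex -> list of vertices it can send messages to
    Returns the final values and the number of supersteps executed.
    """
    values = dict(values)
    halted = set()                   # vertices that have voted to halt
    inbox = {v: [] for v in values}  # messages sent in the previous superstep
    superstep = 0
    while True:
        # A vertex runs if it is still active or a message reactivates it.
        running = [v for v in values if v not in halted or inbox[v]]
        if not running:
            break  # all vertices inactive and no messages in transit

        outbox = {v: [] for v in values}
        for v in running:
            halted.discard(v)
            i_val = values[v]
            for m in inbox[v]:
                if m > values[v]:
                    values[v] = m
            if superstep > 0 and i_val == values[v]:
                halted.add(v)  # vote to halt: nothing new was learned
            else:
                for n in neighbors[v]:
                    outbox[n].append(values[v])

        # Barrier: messages become visible only in the next superstep.
        inbox = outbox
        superstep += 1
    return values, superstep


# Hypothetical undirected path A - B - C - D, stored as adjacency lists.
initial = {"A": 3, "B": 6, "C": 2, "D": 1}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
final, steps = pregel_max(initial, neighbors)
print(final, steps)
```

Swapping `inbox` for `outbox` only at the end of the loop body plays the role of the BSP barrier: a message sent in superstep S is never read before superstep S+1.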
