slide-1
SLIDE 1

Processing Massive Graphs

Amir H. Payberah

amir.payberah@cs.ox.ac.uk

University of Oxford

Amir H. Payberah (Oxford) Processing Massive Graphs April 17, 2017 1 / 78

slide-2
SLIDE 2

What’s the Problem?


slide-3
SLIDE 3


slide-4
SLIDE 4

Large Graph

◮ A large graph either cannot fit into the memory of a single computer or


slide-5
SLIDE 5

Big Data


slide-6
SLIDE 6

Scale Up vs. Scale Out

◮ Scale up or scale vertically.
◮ Scale out or scale horizontally.


slide-8
SLIDE 8

A Scale Out Example (1/3)

◮ Count the number of times each distinct word appears in the file.
◮ If the file fits in memory: words(doc.txt) | sort | uniq -c, where words(doc.txt) emits one word per line.
◮ If not?


slide-9
SLIDE 9

A Scale Out Example (2/3)

◮ Partition the data and process the partitions in parallel.
◮ Data-parallel processing.


slide-10
SLIDE 10

A Scale Out Example (3/3)

◮ MapReduce
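The word-count job can be sketched as map, shuffle, and reduce steps in plain Python. This is a single-process illustration under stated assumptions, not the Hadoop or Spark API: the function names are hypothetical, and each inner list of lines stands in for a file split that a real cluster would map on a separate machine.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every word in one input split.
    for line in lines:
        for word in line.split():
            yield (word, 1)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group intermediate values by key (done by the framework between map and reduce).
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    # Reduce: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(splits: List[List[str]]) -> Dict[str, int]:
    # Each split could be mapped on a different machine; here they run one after another.
    intermediate = [pair for split in splits for pair in map_phase(split)]
    return reduce_phase(shuffle(intermediate))


if __name__ == "__main__":
    print(word_count([["the cat sat", "the dog"], ["a cat ran"]]))
```

Each split is mapped independently and each reduce call only needs the values for one key, so those are the steps a framework can run in parallel; the shuffle is where data moves between machines.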


slide-11
SLIDE 11

Can we use platforms like MapReduce or Spark, which are based on the data-parallel model, for large-scale graph processing?


slide-12
SLIDE 12

Large Graph Processing Challenges

◮ Difficult to extract parallelism based on partitioning of the data.
◮ Difficult to express parallelism based on partitioning of computation.
◮ No locality between computations and data access patterns.


slide-13
SLIDE 13

Graph-Parallel Processing

◮ Computation typically depends on the neighbors.


slide-14
SLIDE 14

Graph-Parallel Processing

◮ Restricts the types of computation.
◮ New techniques to partition and distribute graphs.
◮ Exploit graph structure.


slide-15
SLIDE 15

Data-Parallel vs. Graph-Parallel Computation


slide-16
SLIDE 16

Graph-Parallel Processing Models

◮ Vertex-centric processing model

  • Pregel, Giraph, GraphLab, PowerGraph, ...

◮ Edge-centric processing model

  • X-Stream, Chaos, ...


slide-17
SLIDE 17

Vertex-Centric Programming Model

◮ Vertex-centric programming model

  • Write a vertex program.
  • State is stored in vertices.

◮ Vertex operations:

  • Gather updates from incoming edges
  • Scatter updates along outgoing edges


slide-19
SLIDE 19

A Vertex-Centric Program

◮ Iterates over vertices

    // the scatter phase
    for all vertices v
      for all outgoing edges from v:
        update = f(v.value)

    // the gather phase
    for all vertices v
      for all incoming edges to v:
        v.value = g(v.value, update)


slide-20
SLIDE 20

Vertex-Centric Scatter-Gather (1/5)

    Until convergence {
      // the scatter phase
      for all vertices v
        for all outgoing edges from v:
          update = f(v.value)
      // the gather phase
      for all vertices v
        for all incoming edges to v:
          v.value = g(v.value, update)
    }
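A minimal in-memory sketch of the vertex-centric loop follows. The choices of f and g are assumptions for illustration, not from the slides: f forwards the source value and g keeps the minimum, so the loop propagates the smallest vertex ID and labels the connected components of an undirected graph (stored with both edge directions). "Until convergence" is taken to mean "until no vertex value changes".

```python
from typing import Dict, List, Tuple


def vertex_centric_min_label(out_edges: Dict[int, List[int]]) -> Dict[int, int]:
    # Assumes every vertex appears as a key of out_edges (possibly with an empty list).
    vertices = list(out_edges)
    in_edges: Dict[int, List[int]] = {v: [] for v in vertices}
    for src, dsts in out_edges.items():
        for dst in dsts:
            in_edges[dst].append(src)

    value = {v: v for v in vertices}        # vertex state: initially its own ID

    def f(x: int) -> int:                   # scatter function (assumed: forward the value)
        return x

    def g(old: int, upd: int) -> int:       # gather function (assumed: keep the minimum)
        return min(old, upd)

    changed = True
    while changed:                          # "until convergence": stop when nothing changes
        # Scatter phase: one update per outgoing edge, computed from the source's value.
        update: Dict[Tuple[int, int], int] = {}
        for v in vertices:
            for dst in out_edges[v]:
                update[(v, dst)] = f(value[v])
        # Gather phase: each vertex folds in the updates waiting on its incoming edges.
        changed = False
        for v in vertices:
            for src in in_edges[v]:
                new = g(value[v], update[(src, v)])
                if new != value[v]:
                    value[v] = new
                    changed = True
    return value


if __name__ == "__main__":
    undirected = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
    print(vertex_centric_min_label(undirected))
```

The gather loop jumps from vertex to vertex to find the update on each incoming edge. That random-access pattern is cheap in memory but becomes the bottleneck once edges live on disk.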


slide-25
SLIDE 25

Vertex-Centric vs. Edge-Centric (1/2)

[Figure: two panels, Vertex-centric and Edge-centric]


slide-26
SLIDE 26

Vertex-Centric vs. Edge-Centric (2/2)

Vertex-centric:

    Until convergence {
      // the scatter phase
      for all vertices v
        for all outgoing edges from v:
          update = f(v.value)
      // the gather phase
      for all vertices v
        for all incoming edges to v:
          v.value = g(v.value, update)
    }

Edge-centric:

    Until convergence {
      // the scatter phase
      for all edges e
        u = new update
        u.dst = e.dst
        u.value = f(e.src.value)
      // the gather phase
      for all updates u
        u.dst.value = g(u.dst.value, u.value)
    }


slide-27
SLIDE 27

Edge-Centric Scatter-Gather (1/5)

Until convergence { // the gather phase for all edges e e.dst.value = g(e.dst.value, u.value) // the scatter phase for all edges e u = new update u.dst = e.dst u.value = f(e.src.value) }


slide-32
SLIDE 32

Vertex-Centric Processing Platforms

Pregel and GraphLab


slide-33
SLIDE 33

Pregel


slide-34
SLIDE 34

Pregel

◮ Large-scale graph-parallel processing platform developed at Google.
◮ Inspired by the bulk synchronous parallel (BSP) model.


slide-35
SLIDE 35

Programming Model

◮ Vertex-centric programming: think like a vertex.
◮ Each vertex computes its value individually, in parallel.
◮ Each vertex sees its local context and updates its value.


slide-36
SLIDE 36

Execution Model (1/2)

◮ Applications run in a sequence of iterations, called supersteps.
◮ A vertex in superstep S can:

  • read messages sent to it in superstep S-1.
  • send messages to other vertices, which receive them in superstep S+1.
  • modify its state.


slide-37
SLIDE 37

Execution Model (2/2)

◮ Superstep 0: all vertices are in the active state.
◮ A vertex deactivates itself by voting to halt when it has no further work to do.
◮ A halted vertex becomes active again if it receives a message.
◮ The whole algorithm terminates when:

  • All vertices are simultaneously inactive.
  • There are no messages in transit.


slide-38
SLIDE 38

Example: Max Value (1/4)

    i_val := val
    for each message m
      if m > val then val := m
    if superstep > 0 and i_val == val then
      vote_to_halt
    else
      for each neighbor v
        send_message(v, val)
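The superstep semantics can be simulated on one machine. In this sketch (hypothetical names; a sequential stand-in for Pregel's distributed workers), messages sent in superstep S are delivered in S+1, a message reactivates a halted vertex, and the run stops when every vertex has halted and no messages are in transit. As in the Pregel paper's version of this example, every vertex sends its value in superstep 0, which is why the halting test is skipped there.

```python
from typing import Dict, List


def pregel_max_value(values: Dict[int, int], neighbors: Dict[int, List[int]]) -> Dict[int, int]:
    val = dict(values)
    active = {v: True for v in val}                       # superstep 0: all vertices active
    inbox: Dict[int, List[int]] = {v: [] for v in val}
    superstep = 0
    # Terminate when every vertex has halted and no messages are in transit.
    while any(active.values()) or any(inbox.values()):
        outbox: Dict[int, List[int]] = {v: [] for v in val}
        for v in val:
            messages = inbox[v]
            if messages:
                active[v] = True                          # a message reactivates a halted vertex
            if not active[v]:
                continue
            i_val = val[v]
            for m in messages:
                if m > val[v]:
                    val[v] = m
            if superstep > 0 and i_val == val[v]:
                active[v] = False                         # vote_to_halt
            else:
                for u in neighbors[v]:
                    outbox[u].append(val[v])              # received in superstep + 1
        inbox = outbox
        superstep += 1
    return val


if __name__ == "__main__":
    chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(pregel_max_value({0: 3, 1: 6, 2: 2, 3: 1}, chain))
```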


slide-42
SLIDE 42

Example: PageRank

◮ Update ranks in parallel.
◮ Iterate until convergence.

$$R[i] = 0.15 + \sum_{j \in \mathrm{Nbrs}(i)} w_{ji} R[j]$$


slide-43
SLIDE 43

Example: PageRank

    Pregel_PageRank(i, messages):
      // receive all the messages
      total = 0
      foreach(msg in messages):
        total = total + msg
      // update the rank of this vertex
      R[i] = 0.15 + total
      // send new messages to neighbors
      foreach(j in out_neighbors[i]):
        sendmsg(R[i] * wij) to vertex j

$$R[i] = 0.15 + \sum_{j \in \mathrm{Nbrs}(i)} w_{ji} R[j]$$
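A sequential sketch of the Pregel PageRank vertex program, run for a fixed number of supersteps instead of testing for convergence. The slide leaves the weights abstract; this sketch assumes w_ij = 0.85 / outdeg(i), the usual damping-factor choice, and assumes every vertex has at least one out-neighbor. The function name is hypothetical.

```python
from typing import Dict, List


def pregel_pagerank(out_neighbors: Dict[int, List[int]], supersteps: int = 30) -> Dict[int, float]:
    # Assumptions: every vertex is a key with at least one out-neighbor,
    # and w_ij = 0.85 / outdeg(i) (the slide leaves w abstract).
    R: Dict[int, float] = {}
    inbox: Dict[int, List[float]] = {i: [] for i in out_neighbors}
    for _ in range(supersteps):
        outbox: Dict[int, List[float]] = {i: [] for i in out_neighbors}
        for i, nbrs in out_neighbors.items():
            # receive all the messages sent in the previous superstep
            total = sum(inbox[i])
            # update the rank of this vertex
            R[i] = 0.15 + total
            # send new messages to neighbors (read in the next superstep)
            w_ij = 0.85 / len(nbrs)
            for j in nbrs:
                outbox[j].append(R[i] * w_ij)
        inbox = outbox
    return R


if __name__ == "__main__":
    print(pregel_pagerank({0: [1, 2], 1: [2], 2: [0]}, supersteps=50))
```

With this normalization and no dangling vertices, the ranks at the fixed point sum to the number of vertices, which is a handy sanity check.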


slide-44
SLIDE 44

Graph Partitioning

◮ Vertices are assigned to partitions based on their vertex ID, e.g., hash(ID) mod N, where N is the number of partitions.
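A sketch of hash partitioning by vertex ID, with hypothetical function names. Python's built-in hash() of a string is salted per process, so this sketch uses zlib.crc32 to get a hash that every worker computes identically; any worker can then find a vertex's owner without consulting a lookup table.

```python
import zlib
from typing import Dict, List


def partition_of(vertex_id: str, num_partitions: int) -> int:
    # hash(ID) mod N; crc32 is stable across processes, unlike Python's salted str hash().
    return zlib.crc32(vertex_id.encode("utf-8")) % num_partitions


def hash_partition(vertex_ids: List[str], num_partitions: int) -> Dict[int, List[str]]:
    # Group vertices by the partition their ID hashes to.
    partitions: Dict[int, List[str]] = {p: [] for p in range(num_partitions)}
    for vid in vertex_ids:
        partitions[partition_of(vid, num_partitions)].append(vid)
    return partitions


if __name__ == "__main__":
    parts = hash_partition([f"v{i}" for i in range(10)], 3)
    print({p: len(vs) for p, vs in parts.items()})
```

The placement ignores graph structure entirely, so neighboring vertices usually land on different machines; that is the price of a partitioner that needs no global coordination.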


slide-45
SLIDE 45

Pregel Limitations

◮ Inefficient if different regions of the graph converge at different speeds.
◮ Can suffer if one task is more expensive than the others.
◮ The runtime of each phase is determined by the slowest machine.


slide-46
SLIDE 46

GraphLab


slide-47
SLIDE 47

GraphLab

◮ GraphLab allows asynchronous iterative computation.
◮ Scope of vertex v: the data stored in v and in all adjacent vertices and edges.


slide-48
SLIDE 48

Programming Model

◮ Vertex-centric programming.
◮ A vertex can read and modify any of the data in its scope.


slide-50
SLIDE 50

Example: PageRank (GraphLab)

    GraphLab_PageRank(i):
      // compute sum over neighbors
      total = 0
      foreach(j in in_neighbors(i)):
        total = total + R[j] * wji
      // update the PageRank
      R[i] = 0.15 + total
      // trigger neighbors to run again
      foreach(j in out_neighbors(i)):
        signal vertex-program on j

$$R[i] = 0.15 + \sum_{j \in \mathrm{Nbrs}(i)} w_{ji} R[j]$$
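A single-threaded sketch of the GraphLab-style program driven by a FIFO scheduler. Assumptions: w_ji = 0.85 / outdeg(j), every vertex has at least one out-neighbor, and the function name is hypothetical. Each run reads the latest ranks of its in-neighbors, so there are no supersteps. On the slide, signaling is unconditional, which would never terminate; this sketch signals out-neighbors only when the rank moves by more than a tolerance tol (an assumption). Real GraphLab runs many vertex programs in parallel under its consistency models; here they run one at a time.

```python
from collections import deque
from typing import Deque, Dict, List, Set


def graphlab_pagerank(out_neighbors: Dict[int, List[int]], tol: float = 1e-9) -> Dict[int, float]:
    # Assumes every vertex has at least one out-neighbor; w_ji = 0.85 / outdeg(j) is assumed.
    in_neighbors: Dict[int, List[int]] = {i: [] for i in out_neighbors}
    for j, nbrs in out_neighbors.items():
        for i in nbrs:
            in_neighbors[i].append(j)
    R: Dict[int, float] = {i: 1.0 for i in out_neighbors}
    queue: Deque[int] = deque(out_neighbors)     # every vertex starts out scheduled
    scheduled: Set[int] = set(out_neighbors)
    while queue:
        i = queue.popleft()
        scheduled.discard(i)
        # compute sum over in-neighbors, reading their current (possibly just-updated) ranks
        total = sum(R[j] * 0.85 / len(out_neighbors[j]) for j in in_neighbors[i])
        new_rank = 0.15 + total
        changed = abs(new_rank - R[i]) > tol
        R[i] = new_rank                          # visible immediately to later updates
        if changed:
            # trigger out-neighbors to run again (skip ones already queued)
            for j in out_neighbors[i]:
                if j not in scheduled:
                    scheduled.add(j)
                    queue.append(j)
    return R


if __name__ == "__main__":
    print(graphlab_pagerank({0: [1, 2], 1: [2], 2: [0]}))
```

Only vertices whose inputs actually changed are re-run, which is how asynchronous execution avoids recomputing regions of the graph that have already converged.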


slide-52
SLIDE 52

Consistency (1/4)

◮ Overlapped scopes: race condition in the simultaneous execution of two update functions.

◮ Full consistency: during the execution of f(v), no other function reads or modifies data within the scope of v.


slide-53
SLIDE 53

Consistency (2/4)

◮ Edge consistency: during the execution of f(v), no other function reads or modifies any of the data on v or on any of the edges adjacent to v.


slide-54
SLIDE 54

Consistency (3/4)

◮ Vertex consistency: during the execution of f(v), no other function is applied to v.


slide-55
SLIDE 55

Consistency (4/4)

Consistency vs. Parallelism

[Low, Y., GraphLab: A Distributed Abstraction for Large Scale Machine Learning (Doctoral dissertation, University of California), 2013.]

slide-56
SLIDE 56

Graph Partitioning

◮ Convert the input graph into a meta-graph.
◮ The meta-graph is very small.
◮ Compute a fast, balanced partition of the meta-graph.


slide-57
SLIDE 57

PowerGraph (GraphLab2)


slide-58
SLIDE 58

PowerGraph

◮ Factorizes the update function into the Gather, Apply, and Scatter phases.

◮ Vertex-cut partitioning.


slide-59
SLIDE 59

Programming Model

◮ Gather-Apply-Scatter (GAS)
◮ Gather: accumulate information about the neighborhood through a generalized sum.
◮ Apply: apply the accumulated value to the center vertex.
◮ Scatter: update adjacent edges and vertices.


slide-61
SLIDE 61

Execution Model (1/2)

◮ Initially, all vertices are active.
◮ PowerGraph executes the vertex-program on the active vertices until none remain.

  • Once a vertex-program completes the scatter phase, it becomes inactive until it is reactivated.
  • Vertices can activate themselves and neighboring vertices.

◮ PowerGraph can execute both synchronously and asynchronously.


slide-63
SLIDE 63

Execution Model (2/2)

◮ Synchronous scheduling like Pregel.

  • Executes the gather, apply, and scatter phases in order.
  • Changes made to the vertex/edge data are committed at the end of each step.

◮ Asynchronous scheduling like GraphLab.

  • Changes made to the vertex/edge data during the apply and scatter functions are immediately committed to the graph.
  • These changes are visible to subsequent computation on neighboring vertices.


slide-66
SLIDE 66

Example: PageRank (PowerGraph)

    PowerGraph_PageRank(i):
      Gather(j -> i): return wji * R[j]
      sum(a, b): return a + b
      // total: Gather and sum
      Apply(i, total): R[i] = 0.15 + total
      Scatter(i -> j): if R[i] changed then activate(j)

$$R[i] = 0.15 + \sum_{j \in \mathrm{Nbrs}(i)} w_{ji} R[j]$$
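The GAS factorization can be sketched as a synchronous engine loop: gather and sum over in-edges for every active vertex using the previous ranks, apply, then scatter activations. Names and weights are assumptions: w_ji = 0.85 / outdeg(j), every vertex has at least one out-neighbor, and "R[i] changed" is taken to mean it moved by more than tol.

```python
from functools import reduce
from typing import Dict, List, Set


def powergraph_pagerank(out_neighbors: Dict[int, List[int]], tol: float = 1e-9,
                        max_iters: int = 1000) -> Dict[int, float]:
    in_neighbors: Dict[int, List[int]] = {i: [] for i in out_neighbors}
    for j, nbrs in out_neighbors.items():
        for i in nbrs:
            in_neighbors[i].append(j)
    R: Dict[int, float] = {i: 1.0 for i in out_neighbors}

    def gather(j: int, i: int) -> float:        # Gather(j -> i): w_ji * R[j]
        return 0.85 / len(out_neighbors[j]) * R[j]

    def gas_sum(a: float, b: float) -> float:    # sum(a, b): a + b
        return a + b

    active: Set[int] = set(out_neighbors)        # initially all vertices are active
    for _ in range(max_iters):
        if not active:
            break
        # Gather + sum for every active vertex, reading the previous ranks.
        totals = {i: reduce(gas_sum, (gather(j, i) for j in in_neighbors[i]), 0.0)
                  for i in active}
        # Apply: commit all new ranks at the end of the step (synchronous mode).
        changed: Set[int] = set()
        for i, total in totals.items():
            new_rank = 0.15 + total
            if abs(new_rank - R[i]) > tol:
                changed.add(i)
            R[i] = new_rank
        # Scatter: a vertex whose rank changed activates its out-neighbors.
        active = {j for i in changed for j in out_neighbors[i]}
    return R


if __name__ == "__main__":
    print(powergraph_pagerank({0: [1, 2], 1: [2], 2: [0]}))
```

Keeping sum separate from gather matters because sum is commutative and associative: with a vertex-cut, each machine holding part of a high-degree vertex's edges can compute a partial sum locally, and only the partial sums need to be combined.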


slide-67
SLIDE 67

Graph Partitioning (1/3)

◮ Natural graphs have a skewed, power-law degree distribution.
◮ Edge-cut algorithms perform poorly on power-law graphs.


slide-68
SLIDE 68

Graph Partitioning (2/3)

Vertex-Cut partitioning


slide-69
SLIDE 69

Graph Partitioning (3/3)

◮ Random vertex-cuts

  • Randomly assign edges to machines.

◮ Greedy vertex-cuts

  • Place each incoming edge so that the number of replicated vertices is minimized.
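The two strategies can be compared by their replication factor, the average number of machines holding a copy of each vertex. This is a simplified sketch, not PowerGraph's exact heuristic: the greedy rule scores each machine by how many new replicas the edge would create and breaks ties by load. The capacity cap (controlled by the hypothetical slack parameter) is an added assumption that keeps the greedy choice from piling every edge onto one machine.

```python
import math
import random
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def replication_factor(placement: Dict[int, Set[int]]) -> float:
    # Average number of machines that hold a replica of each vertex.
    return sum(len(ms) for ms in placement.values()) / len(placement)


def random_vertex_cut(edges: List[Edge], k: int, seed: int = 0) -> Dict[int, Set[int]]:
    rng = random.Random(seed)
    placement: Dict[int, Set[int]] = {}
    for u, v in edges:
        m = rng.randrange(k)                      # randomly assign the edge to a machine
        placement.setdefault(u, set()).add(m)
        placement.setdefault(v, set()).add(m)
    return placement


def greedy_vertex_cut(edges: List[Edge], k: int, slack: float = 0.1) -> Dict[int, Set[int]]:
    # Hypothetical balance cap: no machine may take more than this many edges.
    capacity = math.ceil(len(edges) / k * (1 + slack))
    placement: Dict[int, Set[int]] = {}
    load = [0] * k
    for u, v in edges:
        au, av = placement.get(u, set()), placement.get(v, set())
        candidates = [c for c in range(k) if load[c] < capacity]
        # Prefer the machine that creates the fewest new replicas; break ties by load.
        m = min(candidates, key=lambda c: ((c not in au) + (c not in av), load[c]))
        load[m] += 1
        placement.setdefault(u, set()).add(m)
        placement.setdefault(v, set()).add(m)
    return placement


if __name__ == "__main__":
    star_and_path = [(0, i) for i in range(1, 50)] + [(i, i + 1) for i in range(1, 49)]
    print(replication_factor(random_vertex_cut(star_and_path, 4)))
    print(replication_factor(greedy_vertex_cut(star_and_path, 4)))
```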


slide-70
SLIDE 70

Edge-Centric Processing Platforms

X-Stream


slide-71
SLIDE 71

X-Stream


slide-72
SLIDE 72

Could we process big graphs on a single machine?


slide-74
SLIDE 74

Challenges

◮ Disk-based processing

  • Problem: graph traversal = random access
  • Random access is inefficient for storage

Yoneki E., and Roy A., “Scale-up Graph Processing: A Storage-centric View”, 2013.


slide-75
SLIDE 75

Vertex-Centric vs. Edge-Centric (Reminder)

[Figure: two panels, Vertex-centric and Edge-centric]


slide-76
SLIDE 76

Vertex-Centric vs. Edge-Centric Tradeoff

◮ Vertex-centric scatter-gather: $\frac{\text{EdgeData}}{\text{RandomAccessBandwidth}}$
◮ Edge-centric scatter-gather: $\frac{\text{Scatters} \times \text{EdgeData}}{\text{SequentialAccessBandwidth}}$
◮ Sequential access bandwidth ≫ random access bandwidth.
◮ Real-world graphs need only a few scatter-gather iterations.
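The tradeoff can be made concrete with a back-of-the-envelope calculation. The bandwidth and size figures below are hypothetical, chosen only to show the arithmetic, and Scatters is read here as the number of passes the edge-centric engine makes over the edge list. Because EdgeData appears in both expressions, edge-centric processing wins whenever Scatters < SequentialAccessBandwidth / RandomAccessBandwidth.

```python
def vertex_centric_time(edge_data_bytes: float, random_bw: float) -> float:
    # EdgeData / RandomAccessBandwidth (seconds, if bandwidth is in bytes/s)
    return edge_data_bytes / random_bw


def edge_centric_time(scatters: int, edge_data_bytes: float, sequential_bw: float) -> float:
    # Scatters x EdgeData / SequentialAccessBandwidth
    return scatters * edge_data_bytes / sequential_bw


def edge_centric_wins(scatters: int, random_bw: float, sequential_bw: float) -> bool:
    # EdgeData cancels out: compare Scatters with the bandwidth ratio.
    return scatters < sequential_bw / random_bw


if __name__ == "__main__":
    edge_data = 10**9          # hypothetical: 1 GB of edge data
    random_bw = 10**7          # hypothetical: 10 MB/s of random-access bandwidth
    sequential_bw = 5 * 10**8  # hypothetical: 500 MB/s of sequential bandwidth
    print(vertex_centric_time(edge_data, random_bw))          # seconds
    print(edge_centric_time(10, edge_data, sequential_bw))    # seconds for 10 passes
```

With these made-up numbers the break-even point is 50 passes, which is why a small number of scatter-gather iterations on real-world graphs favors the edge-centric design.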


slide-78
SLIDE 78

◮ Problem: there is still random access to the vertex set.

Solution: partition the graph into streaming partitions.


slide-79
SLIDE 79

Streaming Partitions (1/2)


slide-80
SLIDE 80

Streaming Partitions (2/2)

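Random access for free: the vertex set of each streaming partition fits in fast memory, so random accesses to it are cheap.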


slide-83
SLIDE 83

Streaming Partition Scatter-Gather

    // the scatter phase
    for each streaming_partition p {
      load Vertices(p)
      for each unprocessed e in Edges(p)
        u = new update
        u.dst = e.dst
        u.value = f(e.src.value)
        add u to Update(partition(u.dst))
    }

    // the gather phase
    for each streaming_partition p {
      load Vertices(p)
      for each unprocessed u in Update(p)
        u.dst.value = g(u.dst.value, u.value)
      delete Update(p)
    }
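A minimal in-memory sketch of the streaming-partition loop, using min-label propagation as an assumed f and g (f forwards the source value, g keeps the minimum). The range-based partition function is a hypothetical choice; a real engine would size partitions so each one's vertex set fits in fast memory and would keep Edges(p) and Update(p) as files on disk. Here all vertex values live in one list, but the scatter for partition p only reads sources in p and the gather for p only writes destinations in p, which is the property that makes random access to vertices cheap.

```python
from typing import List, Tuple


def streaming_min_label(num_vertices: int, edges: List[Tuple[int, int]], k: int) -> List[int]:
    # Hypothetical layout: vertex v belongs to streaming partition v * k // num_vertices.
    def partition(v: int) -> int:
        return v * k // num_vertices

    value = list(range(num_vertices))                 # initial label = own ID
    edges_of: List[List[Tuple[int, int]]] = [[] for _ in range(k)]
    for src, dst in edges:
        edges_of[partition(src)].append((src, dst))   # Edges(p): edges whose source is in p

    changed = True
    while changed:
        updates_of: List[List[Tuple[int, int]]] = [[] for _ in range(k)]
        # Scatter phase: stream Edges(p); only vertices of p are read.
        for p in range(k):
            for src, dst in edges_of[p]:
                updates_of[partition(dst)].append((dst, value[src]))
        # Gather phase: stream Update(p); only vertices of p are written.
        changed = False
        for p in range(k):
            for dst, upd in updates_of[p]:
                if upd < value[dst]:                  # g = min
                    value[dst] = upd
                    changed = True
            updates_of[p] = []                        # delete Update(p)
    return value


if __name__ == "__main__":
    edges = [(0, 1), (1, 0), (1, 2), (2, 1), (3, 4), (4, 3)]
    print(streaming_min_label(5, edges, 2))
```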


slide-84
SLIDE 84

Summary


slide-85
SLIDE 85

Summary

◮ Data-parallel vs. graph-parallel processing
◮ Graph-parallel: vertex-centric vs. edge-centric
◮ Vertex-centric: Pregel and GraphLab
◮ Edge-centric: X-Stream


slide-86
SLIDE 86

Questions?

Acknowledgement: some slides were derived from the slides of Amitabha Roy (EPFL).
