Computation on Natural Graphs (Presenter: Mengxiao Wang)

SLIDE 1

CSE 6350: File and Storage System Infrastructure in Data centers Supporting Internet-wide Services

PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs

Presenter: Mengxiao Wang

SLIDE 2

Problem:

Existing distributed graph computation systems perform poorly on natural graphs.

SLIDE 3

Properties of Natural Graphs: Power-Law Degree Distribution
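As a rough illustration of what a power-law degree distribution implies, the sketch below assumes the usual form P(d) ∝ d^(-alpha), truncated at a maximum degree. The function names, `alpha = 2`, and the cutoff `d_max = 1000` are illustrative assumptions, not values taken from the slides. It estimates what share of all edge endpoints belongs to the highest-degree vertices.

```python
def power_law_pmf(alpha, d_max):
    """Return P(d) for d = 1..d_max, with P(d) proportional to d ** -alpha."""
    weights = [d ** -alpha for d in range(1, d_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def edge_share_of_top(pmf, fraction):
    """Share of edge endpoints owned by the top `fraction` of vertices by degree."""
    total_endpoints = sum(d * p for d, p in enumerate(pmf, start=1))
    remaining = fraction          # vertex mass still to be collected
    owned = 0.0
    # Walk from the highest degree downward, collecting vertex mass.
    for d in range(len(pmf), 0, -1):
        take = min(pmf[d - 1], remaining)
        owned += d * take
        remaining -= take
        if remaining <= 0:
            break
    return owned / total_endpoints


if __name__ == "__main__":
    pmf = power_law_pmf(alpha=2.0, d_max=1000)
    print("edge share of top 1% of vertices:", edge_share_of_top(pmf, 0.01))
```

Under these assumptions, a small fraction of the vertices accounts for a disproportionately large share of the edges, which is the skew the later slides refer to.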

SLIDE 4

Properties of Natural Graphs: Low-Quality Partition

  • Power-law graphs do not have low-cost balanced cuts

  • Traditional graph-partitioning algorithms perform poorly on power-law graphs

SLIDE 5

Q1: Use Figure 1 to illustrate the highly skewed power-law degree distribution in a graph, and explain how it presents challenges to graph-parallel execution engines like Pregel (in terms of work balance, partitioning, communication, storage, and computation).

  • Work balance: Because vertex degrees follow a power-law distribution and the cost of a vertex-program grows with the degree of its vertex, the running time of vertex-programs varies widely under graph-parallel execution.

  • Partitioning: Pregel partitions vertices randomly, which results in poor locality: most edges end up cut between machines (with p machines, random placement is expected to cut a fraction 1 - 1/p of the edges).

  • Communication: High-degree vertices communicate with far more vertices than the rest, which creates a bottleneck (much like a traffic jam) and forces a messaging abstraction such as Pregel to send many identical messages.

  • Storage: A high-degree vertex keeps too much edge metadata on a single machine.

  • Computation: Vertex-programs execute in parallel with one another, but these abstractions do not parallelize the work inside a single vertex-program, so high-degree vertices carry much more computation than other vertices.
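The partitioning point above can be checked with a small simulation. The sketch below is only an illustration: the function name, graph size, and seed are assumptions, and the edges are drawn uniformly at random rather than from a real natural graph. It assigns each vertex to one of `p` machines at random and measures the fraction of edges whose endpoints land on different machines.

```python
import random


def random_edge_cut_fraction(num_vertices, num_edges, p, seed=0):
    """Fraction of edges cut when vertices are placed on p machines at random."""
    rng = random.Random(seed)
    # Random vertex placement, standing in for hash partitioning.
    machine = [rng.randrange(p) for _ in range(num_vertices)]
    cut = 0
    for _ in range(num_edges):
        u = rng.randrange(num_vertices)
        v = rng.randrange(num_vertices)
        if machine[u] != machine[v]:
            cut += 1          # this edge crosses machines
    return cut / num_edges


if __name__ == "__main__":
    p = 8
    measured = random_edge_cut_fraction(2000, 50000, p)
    print("measured:", measured, "expected about:", 1 - 1 / p)
```

With 8 machines the measured fraction should land near 1 - 1/8 = 0.875, so almost every edge requires cross-machine communication.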

SLIDE 6
Q2: Use Algorithm 1 and Figure 3 about SSSP to explain how an SSSP problem is solved in the execution of GAS vertex programs.

  • Gather: collect the information from the in-neighbors and in-edges of the vertex. For each in-edge, add the in-neighbor's distance to the edge weight, then take the minimum of these sums.

  • Apply: write that value to the master vertex and update the mirrors on other machines.

  • Scatter: if the value has changed, send the new value to all out-neighbors and activate them so that they run their own GAS vertex programs.
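The three steps above can be sketched on a single machine as follows. This is a minimal illustration, not the PowerGraph API: the function name `sssp_gas`, the edge-list input format, and the use of a simple work queue for activation are assumptions, and edge weights are assumed to be non-negative.

```python
import math
from collections import defaultdict, deque


def sssp_gas(edges, source):
    """edges: list of directed (u, v, weight). Returns {vertex: distance}."""
    in_edges = defaultdict(list)     # v -> [(u, weight)]
    out_nbrs = defaultdict(list)     # u -> [v]
    vertices = {source}
    for u, v, w in edges:
        in_edges[v].append((u, w))
        out_nbrs[u].append(v)
        vertices.update((u, v))

    dist = {x: math.inf for x in vertices}
    dist[source] = 0
    # Setting the source distance activates its out-neighbors.
    active = deque(out_nbrs[source])

    while active:
        v = active.popleft()
        # Gather: in-neighbor distance plus edge weight, combined with min.
        candidate = min((dist[u] + w for u, w in in_edges[v]), default=math.inf)
        # Apply: keep the new value only if it is shorter.
        if candidate < dist[v]:
            dist[v] = candidate
            # Scatter: the value changed, so activate all out-neighbors.
            active.extend(out_nbrs[v])
    return dist


if __name__ == "__main__":
    graph = [("a", "b", 1), ("a", "c", 4), ("b", "c", 2), ("c", "d", 1)]
    print(sssp_gas(graph, "a"))
```

In the example graph, vertex c is first reachable at cost 4 directly from a, but its gather step also sees the path through b (1 + 2), so the minimum wins and d is activated with the shorter value.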

SLIDE 7
SLIDE 8
Q3: Explain how the load balancing issue with a graph of highly skewed power-law degree distribution in Pregel can be addressed in PowerGraph.

  • Evenly assign edges to machines

  • Minimize the number of machines spanned by each vertex

  • Assign each edge as it is loaded

  • Touch each edge only once

  • Propose three distributed approaches:

    – Random Edge Placement

    – Coordinated Greedy Edge Placement

    – Oblivious Greedy Edge Placement

For greedy vertex-cuts:

  • De-randomization: greedily minimizes the expected number of machines spanned

  • Coordinated

    – Requires coordination to place each edge

    – Slower, but higher-quality cuts

  • Oblivious

    – Approximates the greedy objective without coordination

    – Faster, but lower-quality cuts
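A streaming greedy edge placement along these lines might look like the sketch below. It is a simplified, single-process illustration and not PowerGraph's implementation: the function names are hypothetical, and ties or conflicts between candidate machines are resolved here simply by picking the least-loaded one. Each edge is touched once, as it is loaded, and is placed on a machine that already holds replicas of its endpoints whenever possible.

```python
from collections import defaultdict


def greedy_vertex_cut(edges, num_machines):
    """Assign each (u, v) edge to one machine; vertices may span several."""
    spans = defaultdict(set)        # vertex -> machines holding a replica of it
    load = [0] * num_machines       # number of edges placed on each machine
    placement = {}
    for u, v in edges:
        shared = spans[u] & spans[v]    # machines that already hold both endpoints
        either = spans[u] | spans[v]    # machines that hold at least one endpoint
        # Prefer shared machines, then machines with one endpoint, else any machine.
        candidates = shared or either or set(range(num_machines))
        # Simplification: break ties by current load, then by machine id.
        m = min(candidates, key=lambda i: (load[i], i))
        placement[(u, v)] = m
        load[m] += 1
        spans[u].add(m)
        spans[v].add(m)
    return placement, spans, load


def replication_factor(spans):
    """Average number of machines spanned per vertex."""
    return sum(len(ms) for ms in spans.values()) / len(spans)


if __name__ == "__main__":
    edges = [(0, 1), (2, 3), (1, 2), (0, 3)]
    placement, spans, load = greedy_vertex_cut(edges, num_machines=2)
    print(placement, load, replication_factor(spans))
```

In the coordinated variant the `spans` table would be shared across all loading machines, while in the oblivious variant each machine would keep its own private copy of it, which is where the trade-off between speed and cut quality comes from.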

SLIDE 9
  • Rather than edge-cut: must synchronize edges
  • Prefer vertex-cut: must synchronize vertices
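The cost of a vertex-cut can be measured by how many machines each vertex spans. As a hedged illustration, the sketch below uses the standard balls-in-bins expectation for random edge placement: a vertex of degree d whose edges are each placed independently on one of p machines is expected to span p * (1 - (1 - 1/p)^d) machines. The function names and the sample degree list are assumptions for illustration.

```python
def expected_span(degree, p):
    """Expected number of machines spanned by a vertex under random edge placement."""
    return p * (1 - (1 - 1 / p) ** degree)


def expected_replication(degrees, p):
    """Average expected span over all vertices (the replication factor)."""
    return sum(expected_span(d, p) for d in degrees) / len(degrees)


if __name__ == "__main__":
    # Mostly low-degree vertices plus one very high-degree vertex.
    degrees = [1] * 90 + [2] * 9 + [1000]
    for p in (4, 16, 64):
        print(p, expected_replication(degrees, p))
```

Low-degree vertices stay on one or two machines, while only the few high-degree vertices are replicated widely, so the synchronization cost is concentrated on a small number of vertices.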

SLIDE 10

Distributed Execution of a PowerGraph Vertex-Program
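One way to picture the distributed execution of a single vertex-program is sketched below, under the assumption that a vertex cut across machines has one master and several mirrors: each machine gathers over its local edges, the partial results are combined at the master, the master applies the update, and the new value is copied back to the mirrors before each machine scatters locally. The function and parameter names are hypothetical and no real networking is involved.

```python
from functools import reduce


def distributed_vertex_step(local_gathers, current_value, combine, apply):
    """local_gathers: {machine: [values gathered from that machine's local edges]}."""
    # Gather: every machine reduces its own local values to one partial result.
    partials = [reduce(combine, vals) for vals in local_gathers.values() if vals]
    if not partials:
        # Nothing was gathered anywhere, so the value is unchanged.
        return current_value, {m: current_value for m in local_gathers}
    # Each machine contributes a single partial result, combined at the master.
    accumulator = reduce(combine, partials)
    # Apply: only the master computes the new vertex value.
    new_value = apply(current_value, accumulator)
    # The master copies the new value to every replica, which then scatters locally.
    replicas = {m: new_value for m in local_gathers}
    return new_value, replicas


if __name__ == "__main__":
    gathered = {"m0": [5, 3], "m1": [4], "m2": [7, 2]}
    # SSSP-style step: combine with min, apply keeps the smaller distance.
    print(distributed_vertex_step(gathered, 6, min, min))
```

The point of the design is that each machine sends one partial result per vertex rather than one message per edge, so the communication for a high-degree vertex grows with the number of machines it spans, not with its degree.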

SLIDE 11

Thank You!

Questions?