
Computation on Natural Graphs (Presenter: Mengxiao Wang)



  1. CSE 6350: File and Storage System Infrastructure in Data Centers Supporting Internet-wide Services. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. Presenter: Mengxiao Wang

  2. Problem: Existing distributed graph computation systems perform poorly on natural graphs.

  3. Properties of Natural Graphs: Power-Law Degree Distribution

  4. Properties of Natural Graphs: Low-Quality Partition
     • Power-law graphs do not have low-cost balanced cuts
     • Traditional graph-partitioning algorithms perform poorly on power-law graphs

  5. Q1: Use Figure 1 to illustrate the highly skewed power-law degree distribution in a graph, and explain how it challenges graph-parallel execution engines such as Pregel (in terms of work balance, partitioning, communication, storage, and computation).
     • Work balance: Because of the power-law degree distribution, the running time of vertex-programs varies widely under graph-parallel execution, since the cost of a vertex-program grows with the degree of its vertex.
     • Partitioning: Pregel places vertices on machines randomly, which results in poor locality: most edges end up cut across machines.
     • Communication: High-degree vertices exchange messages with a very large number of other vertices, which creates a bottleneck (much like a traffic jam), and many of those messages are identical.
     • Storage: All the edge metadata of a high-degree vertex must be stored on a single machine.
     • Computation: Vertex-programs execute in parallel with one another, but the abstraction does not parallelize within a single vertex-program, so a high-degree vertex performs far more sequential computation than other vertices.
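The partitioning and work-balance points above can be illustrated with a small sketch. It is not taken from the slides: the star graph, the function name `edge_cut_stats`, and the use of `v % p` as a stand-in for hash-based vertex placement are all assumptions made for illustration.

```python
def edge_cut_stats(edges, p):
    """Place vertices on p machines and report the fraction of cut edges
    and the per-machine work (one unit per edge endpoint owned)."""
    def owner(v):
        # Hypothetical stand-in for random (hash-based) vertex placement.
        return v % p

    cut = sum(1 for u, v in edges if owner(u) != owner(v))
    work = [0] * p
    for u, v in edges:
        work[owner(u)] += 1
        work[owner(v)] += 1
    return cut / len(edges), work


# A star graph: vertex 0 is the single high-degree "hub" with 100 neighbors.
star = [(0, leaf) for leaf in range(1, 101)]
cut_fraction, work = edge_cut_stats(star, 4)
print(cut_fraction)  # 0.75
print(work)          # [125, 25, 25, 25]
```

With 4 machines, three quarters of the edges are cut, and the machine that owns the hub does five times the work of any other machine, even though every machine holds roughly the same number of vertices.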

  6. Q2: Use Algorithm 1 and Figure 3 about SSSP to explain how an SSSP problem is solved in the execution of GAS vertex programs.
     • Gather: Collect the information from all in-neighbors and in-edges of the vertex. For each in-neighbor, add the neighbor's current distance to the weight of the connecting in-edge, and take the minimum of these sums.
     • Apply: Write the new value to the master copy of the vertex and propagate it to the mirrors on other machines.
     • Scatter: If the value has changed, send the new value to all out-neighbors and activate them so that they run their own GAS vertex programs.
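The three steps above can be sketched on a single machine as follows. This is a minimal illustration under stated assumptions, not the PowerGraph API: the function name `gas_sssp`, the `(u, v, w)` edge format, and the plain `active` set that replaces the distributed scheduler are all hypothetical, and masters and mirrors are not modeled. Edge weights are assumed to be non-negative.

```python
import math
from collections import defaultdict


def gas_sssp(edges, source):
    """Single-source shortest paths in gather/apply/scatter style.

    edges: iterable of directed (u, v, w) triples with non-negative w.
    """
    in_edges = defaultdict(list)   # v -> [(u, w)] for every edge u -> v
    out_nbrs = defaultdict(list)   # u -> [v] for every edge u -> v
    vertices = {source}
    for u, v, w in edges:
        in_edges[v].append((u, w))
        out_nbrs[u].append(v)
        vertices.update((u, v))

    dist = {v: math.inf for v in vertices}
    dist[source] = 0.0
    active = set(out_nbrs[source])

    while active:
        v = active.pop()
        # Gather: minimum over in-edges of (neighbor distance + edge weight).
        acc = math.inf
        for u, w in in_edges[v]:
            acc = min(acc, dist[u] + w)
        # Apply: keep the new distance only if it improves the old one.
        if acc < dist[v]:
            dist[v] = acc
            # Scatter: the value changed, so activate all out-neighbors.
            active.update(out_nbrs[v])
    return dist
```

For example, `gas_sssp([("a", "b", 1), ("b", "c", 2), ("a", "c", 5)], "a")` settles `c` at distance 3, because the path through `b` beats the direct edge of weight 5.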

  7. Q3: Explain how the load balancing issue with a graph of highly skewed power-law degree distribution in Pregel can be addressed in PowerGraph.
     • Evenly assign edges to machines
     • Minimize machines spanned by each vertex
     • Assign each edge as it is loaded
     • Touch each edge only once
     • Propose three distributed approaches
       • Random Edge Placement
       • Coordinated Greedy Edge Placement
       • Oblivious Greedy Edge Placement
     For greedy vertex-cuts:
     • De-randomization: greedily minimizes the expected number of machines spanned
     • Coordinated
       – Requires coordination to place each edge
       – Slower: higher quality cuts
     • Oblivious
       – Approx. greedy objective without coordination
       – Faster: lower quality cuts
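The contrast between random and greedy edge placement can be sketched as below. This is a simplified illustration, not the PowerGraph implementation: the function names are hypothetical, everything runs in one process (so it resembles the coordinated variant, with a single shared table of vertex placements), and ties are broken by current machine load rather than by any rule stated in the slides.

```python
import random
from collections import defaultdict


def random_placement(edges, p, seed=0):
    """Assign each edge to one of p machines uniformly at random."""
    rng = random.Random(seed)
    return [rng.randrange(p) for _ in edges]


def greedy_placement(edges, p):
    """Assign each edge as it is loaded, touching it only once, and try
    to avoid creating new replicas of its endpoints."""
    spans = defaultdict(set)  # vertex -> machines that already hold it
    load = [0] * p            # number of edges on each machine
    placement = []
    for u, v in edges:
        both = spans[u] & spans[v]    # machines holding both endpoints
        either = spans[u] | spans[v]  # machines holding at least one
        # Prefer a machine with both endpoints, then one with either,
        # then any machine (assumed simplification of the greedy rule).
        candidates = both or either or set(range(p))
        m = min(candidates, key=lambda i: (load[i], i))
        placement.append(m)
        load[m] += 1
        spans[u].add(m)
        spans[v].add(m)
    return placement


def replication_factor(edges, placement):
    """Average number of machines spanned by each vertex."""
    spans = defaultdict(set)
    for (u, v), m in zip(edges, placement):
        spans[u].add(m)
        spans[v].add(m)
    return sum(len(s) for s in spans.values()) / len(spans)
```

Calling `replication_factor` on the output of each placer gives the average number of machines spanned per vertex, which is the quantity the greedy heuristic tries to reduce. This simplified rule only uses load to break ties, so it can pile connected edges onto one machine; a practical placer would also have to enforce the "evenly assign edges to machines" goal.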

  8. • Rather than edge-cut: must synchronize edges
     • Prefer vertex-cut: must synchronize vertices

  9. Distributed Execution of a PowerGraph Vertex-Program

  10. Thank You! Questions?
