PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs



slide-1
SLIDE 1

PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs

Gonzalez et al.

James Trever

slide-2
SLIDE 2

What are Graphs?

Graphs are everywhere and used to encode relationships

slide-3
SLIDE 3

So what are they used for?

Machine Learning and Data Mining

  • Targeted ads
  • Natural Language Processing
  • Identifying influential people and information

slide-4
SLIDE 4

Natural Graphs

Graphs derived from real world phenomena

slide-5
SLIDE 5

Challenges with Natural Graphs

Power-Law Degree Distribution
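
For reference, the PowerGraph paper states the power-law degree distribution as below, where P(d) is the probability that a vertex has degree d and α is a positive constant that controls the skew (the paper reports α ≈ 2 for many natural graphs):

```latex
P(d) \propto d^{-\alpha}
```

As α decreases, the graph gets denser and contains more high-degree vertices. A small number of such vertices then touch a large fraction of the edges, which is what makes these graphs hard to balance across machines.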

slide-6
SLIDE 6
slide-7
SLIDE 7

Graph-Parallel Abstraction

  • A Vertex-Program, designed by the user, runs on every vertex
  • Vertex-Programs interact with one another along their edges
  • Multiple Vertex-Programs are run simultaneously
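
The three points above can be illustrated with a minimal single-machine sketch. Everything in it is hypothetical and not taken from any real system: `min_label_program` is a made-up vertex-program that adopts the smallest label among its neighbours, and `run_until_stable` stands in for the engine that runs the program on every vertex.

```python
from typing import Callable, Dict, List

Graph = Dict[int, List[int]]  # vertex id -> neighbour ids (undirected)


def min_label_program(v: int, values: Dict[int, int], graph: Graph) -> int:
    """Hypothetical vertex-program: adopt the smallest label seen along v's edges."""
    return min([values[v]] + [values[u] for u in graph[v]])


def run_until_stable(graph: Graph,
                     program: Callable[[int, Dict[int, int], Graph], int],
                     values: Dict[int, int]) -> Dict[int, int]:
    """Run the vertex-program on every vertex, round after round, until nothing changes."""
    while True:
        # Each vertex reads the previous round's values, so the per-vertex
        # calls are independent and could run in parallel.
        new_values = {v: program(v, values, graph) for v in graph}
        if new_values == values:
            return values
        values = new_values


graph = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
labels = run_until_stable(graph, min_label_program, {v: v for v in graph})
print(labels)
```

Vertices that end with the same label belong to the same connected component, so the program only ever needs information that travels along edges.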
slide-8
SLIDE 8

Challenges with Natural Graphs

  • Power-Law Graphs are very difficult to partition/cut
  • Often incurs a large communication or storage overhead
slide-9
SLIDE 9

Existing Systems

Pregel & GraphLab

slide-10
SLIDE 10

Pregel

  • Bulk Synchronous Message Passing Abstraction
  • Uses messages to communicate with other vertices
  • Waits until all vertex programs have finished before starting the next “superstep”

  • Uses message combiners
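
A rough sketch of the bulk synchronous model described above, using single-source shortest paths as the example. This is plain Python, not the Pregel API: `pregel_sssp`, `inbox` and `outbox` are hypothetical names, and the combiner is assumed to be `min`, so only one message per destination survives each superstep.

```python
import math


def pregel_sssp(edges, source):
    """edges maps every vertex to a list of (neighbour, weight) pairs."""
    dist = {v: math.inf for v in edges}
    inbox = {source: 0.0}  # combined message per vertex for the coming superstep
    supersteps = 0
    while inbox:  # halt once no messages are in flight
        outbox = {}
        for v, msg in inbox.items():  # every vertex with mail runs in this superstep
            if msg < dist[v]:
                dist[v] = msg
                for u, w in edges[v]:
                    # Combiner: keep only the smallest candidate per destination.
                    outbox[u] = min(outbox.get(u, math.inf), msg + w)
        inbox = outbox  # barrier: messages become visible only in the next superstep
        supersteps += 1
    return dist, supersteps


edges = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 2.0)], "c": []}
dist, supersteps = pregel_sssp(edges, "a")
print(dist, supersteps)
```

The combiner matters for high-degree vertices: without it, a vertex with many in-edges receives one message per edge in every superstep.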
slide-11
SLIDE 11

Pregel

Fan-In and Fan-Out

slide-12
SLIDE 12

GraphLab

  • Asynchronous Distributed Shared-Memory Abstraction
  • Vertex-Programs have shared access to a distributed graph, with data stored on each vertex and edge
  • A Vertex-Program can access the current vertex, adjacent edges and adjacent vertices, irrespective of edge direction
  • Vertex-Programs can schedule other vertices’ execution in the future

slide-13
SLIDE 13

GraphLab

GraphLab Ghosting

slide-14
SLIDE 14

Challenges with Natural Graphs

slide-15
SLIDE 15

PowerGraph

slide-16
SLIDE 16

PowerGraph

  • GAS Decomposition
      • Distribute Vertex-Programs
      • Parallelise high degree vertices
  • Vertex Partitioning
      • Distribute power-law graphs more efficiently
slide-17
SLIDE 17

GAS Decomposition
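
GAS stands for Gather, Apply, Scatter. As a minimal sketch, PageRank can be split into those three phases as shown below. This is a synchronous, single-machine illustration, not PowerGraph's actual API: the function names, the damping factor of 0.85 and the tolerance are all assumptions made for the example.

```python
DAMPING = 0.85  # assumed damping factor (the usual PageRank default)


def gather(rank, out_degree, u):
    """Gather: contribution of one in-neighbour u along the edge u -> v."""
    return rank[u] / out_degree[u]


def combine(a, b):
    """Sum: commutative, associative merge of gathered values."""
    return a + b


def apply(acc):
    """Apply: compute the new vertex value from the accumulated sum."""
    return (1.0 - DAMPING) + DAMPING * acc


def pagerank_gas(vertices, edges, tol=1e-6, max_iters=200):
    in_nbrs = {v: [] for v in vertices}
    out_nbrs = {v: [] for v in vertices}
    for u, v in edges:
        out_nbrs[u].append(v)
        in_nbrs[v].append(u)
    out_degree = {v: len(out_nbrs[v]) for v in vertices}

    rank = {v: 1.0 for v in vertices}
    active = set(vertices)
    for _ in range(max_iters):
        if not active:
            break
        new_rank = dict(rank)
        next_active = set()
        for v in active:
            acc = 0.0
            for u in in_nbrs[v]:
                acc = combine(acc, gather(rank, out_degree, u))
            new_rank[v] = apply(acc)
            # Scatter: wake out-neighbours only if this vertex changed enough.
            if abs(new_rank[v] - rank[v]) > tol:
                next_active.update(out_nbrs[v])
        rank, active = new_rank, next_active
    return rank


ranks = pagerank_gas([0, 1, 2], [(0, 2), (1, 2), (2, 0)])
print(ranks)
```

Because `combine` is commutative and associative, the gather over a high-degree vertex's edges can be computed in pieces on different machines and merged afterwards, which is the point of the decomposition.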

slide-18
SLIDE 18

Vertex Partitioning

Edge Cuts vs. Vertex Cuts

slide-19
SLIDE 19

Vertex Partitioning

slide-20
SLIDE 20

How the vertices are partitioned

  • Evenly assign edges to machines
  • 3 different approaches:
      • Random edge placement
      • Greedy placement, in two variants:
          • Coordinated edge placement
          • Oblivious edge placement
slide-21
SLIDE 21

Random Edge Placements
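
A small sketch of random edge placement and the resulting replication factor (the average number of machines each vertex spans). The helper names are hypothetical. `expected_replication` evaluates the expectation given in the PowerGraph paper for random vertex cuts on p machines, (p / |V|) · Σ_v (1 − (1 − 1/p)^D[v]), where D[v] is the degree of v.

```python
import random
from collections import defaultdict


def random_vertex_cut(edges, p, seed=0):
    """Assign each edge to one of p machines uniformly at random."""
    rng = random.Random(seed)
    return [rng.randrange(p) for _ in edges]


def replication_factor(edges, placement):
    """Average number of machines spanned per vertex."""
    spans = defaultdict(set)
    for (u, v), m in zip(edges, placement):
        spans[u].add(m)
        spans[v].add(m)
    return sum(len(s) for s in spans.values()) / len(spans)


def expected_replication(edges, p):
    """Expected replication factor under uniform random edge placement."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return p / len(degree) * sum(1 - (1 - 1 / p) ** d for d in degree.values())


# Star graph: one hub with 1000 leaves, placed on 4 machines.
star = [(0, leaf) for leaf in range(1, 1001)]
placement = random_vertex_cut(star, p=4)
measured = replication_factor(star, placement)
predicted = expected_replication(star, p=4)
print(measured, predicted)
```

In the star graph only the hub is replicated, so the average stays close to 1 even though the hub itself spans every machine.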

slide-22
SLIDE 22

Greedy Edge Placements

  • Place each edge on a machine that already holds one or both of its vertices
  • If there are multiple options, choose the least loaded machine
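
The two rules above can be sketched as follows. This is a simplification written for illustration, not the paper's exact heuristic: when the two endpoints live on disjoint sets of machines, the paper prefers the endpoint with more unplaced edges, whereas this sketch simply picks the least loaded candidate. The name `greedy_vertex_cut` is hypothetical.

```python
from collections import defaultdict


def greedy_vertex_cut(edges, p):
    """Place edges one at a time, preferring machines that already hold an endpoint."""
    spans = defaultdict(set)  # vertex -> machines holding a replica of it
    load = [0] * p            # number of edges placed on each machine
    placement = []
    for u, v in edges:
        shared = spans[u] & spans[v]  # machines holding both endpoints
        either = spans[u] | spans[v]  # machines holding at least one endpoint
        candidates = shared or either or set(range(p))
        # Tie-break on load first, then on machine index for determinism.
        m = min(candidates, key=lambda i: (load[i], i))
        placement.append(m)
        load[m] += 1
        spans[u].add(m)
        spans[v].add(m)
    return placement


two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
placement = greedy_vertex_cut(two_triangles, p=2)
print(placement)
```

Each triangle ends up on its own machine, so no vertex is replicated. Note that the rule as written can overload one machine (a star graph would land entirely on a single machine), so a practical version would need an additional balance constraint.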
slide-23
SLIDE 23

Greedy Edge Placements

  • Minimises the expected number of machines spanned
  • Coordinated:
      • Requires coordination to place each edge
      • Slower but has higher quality cuts
  • Oblivious:
      • Approximates the greedy objective without coordination
      • Faster but lower quality cuts
slide-24
SLIDE 24

Experiments - Graph Partitioning

slide-25
SLIDE 25

Experiments - Synthetic Work Imbalance and Communication

slide-26
SLIDE 26

Experiments - Synthetic Runtime

slide-27
SLIDE 27

Experiments - Machine Learning

slide-28
SLIDE 28

Other Features

  • 3 different execution modes:
      • Bulk Synchronous
      • Asynchronous
      • Asynchronous Serialisable
  • Delta Caching
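
Delta caching is only named on this slide. As a rough sketch of the idea described in the PowerGraph paper: each vertex keeps a cached copy of its gather accumulator, and when a neighbour changes it pushes the difference into that cache instead of forcing a full re-gather. The class below is a hypothetical single-machine illustration that uses a plain sum of in-neighbour values as the gather.

```python
class DeltaCachedSums:
    """Keeps, per vertex, a cached sum of its in-neighbours' values."""

    def __init__(self, vertices, edges, value):
        self.value = dict(value)
        self.in_nbrs = {v: [] for v in vertices}
        self.out_nbrs = {v: [] for v in vertices}
        for u, v in edges:
            self.out_nbrs[u].append(v)
            self.in_nbrs[v].append(u)
        # Pay for one full gather per vertex up front.
        self.acc = {v: self.full_gather(v) for v in vertices}

    def full_gather(self, v):
        """Recompute the accumulator from scratch (cost grows with in-degree)."""
        return sum(self.value[u] for u in self.in_nbrs[v])

    def update(self, u, new_value):
        """Change u's value and patch each out-neighbour's cache with the delta."""
        delta = new_value - self.value[u]
        self.value[u] = new_value
        for v in self.out_nbrs[u]:
            self.acc[v] += delta


cache = DeltaCachedSums([0, 1, 2], [(0, 2), (1, 2), (2, 0)], {0: 5, 1: 7, 2: 1})
cache.update(1, 10)
print(cache.acc)
```

After the update, every cached accumulator still matches what a full gather would return, but a high-degree vertex did not have to revisit all of its edges.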
slide-29
SLIDE 29

Critical Evaluation

  • Lots of talk of performance, not many tests comparing systems
  • Delta caching only briefly touched on
  • Future work lacks detail
  • Many claims are not backed up
  • Greedy edge placement not very clear
  • No mention of fault tolerance
slide-30
SLIDE 30

Bibliography

  • J. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. OSDI, 2012.
  • J. Gonzalez's original presentation: http://www.cs.berkeley.edu/~jegonzal/talks/powergraph_osdi12.pptx