SLIDE 1

Pregel

Large-Scale Graph Processing

William Jones

SLIDE 2

Analysing large graphs is hard.

  • We are keenly interested in analysing certain very large graphs (e.g. the Web graph).

  • These graphs are now too large to store and process on one machine.
  • Parallelising graph algorithms is hard:
  • Poor memory locality (lots of remote data reads), which ruins performance.
  • There is no scalable system for implementing arbitrary graph algorithms over arbitrary graphs in a large-scale distributed environment.
SLIDE 3

How would you solve this?

  • Use a custom distributed environment.
  • Not general purpose.
  • Use an existing distributed platform like MapReduce.
  • Sub-optimal and inefficient: graph algorithms are often iterative, and MapReduce must write out the whole graph state between passes.
  • Use an existing parallel graph system like Parallel BGL or CGMgraph.
  • Neither currently addresses fault tolerance and other important issues like internal cluster communication.

SLIDE 4

Pregel aims to solve this problem

  • Allows efficient processing of large, distributively stored graphs.
  • Abstracts away distributed-computing issues like fault tolerance and machine communication.
  • Outlines a ‘vertex-centric’ system.
  • All the programmer needs to do is implement a single function.
SLIDE 5
Pregel’s computation model

  • Consists of a sequence of iterations (supersteps), where the same user-defined function is executed for each vertex.
  • This function specifies behaviour at a single vertex V and superstep S. It can read messages sent to the vertex in superstep S-1, send messages to other vertices that will be read in superstep S+1, and modify the state of V and its outgoing edges.
  • Initially each vertex is active. Each vertex can ‘vote to halt’, after which it runs no computation in any further superstep unless it receives a message from another vertex. It is then reactivated and must explicitly vote to halt again to deactivate itself. The algorithm terminates when all vertices have halted and no messages are in transit.
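The superstep loop above can be sketched in a few lines. This is a hypothetical single-machine toy (the names `run_pregel`, `compute`, and the tuple return shape are illustrative, not Pregel's actual C++ API), but it shows the semantics: messages sent in superstep S arrive in S+1, a halted vertex is skipped, and any incoming message reactivates it.

```python
# Minimal Pregel-style superstep loop (a sketch, not Google's real API).
def run_pregel(vertices, edges, compute, max_supersteps=50):
    """vertices: {id: initial_value}; edges: {id: [neighbour ids]}.
    compute(vid, value, incoming, out_edges) -> (new_value, outbox, halt)
    where outbox is a list of (target, message) pairs."""
    values = dict(vertices)
    active = set(vertices)                # initially every vertex is active
    inbox = {v: [] for v in vertices}
    for _ in range(max_supersteps):
        if not active:
            break                         # all halted, no messages: terminate
        next_inbox = {v: [] for v in vertices}
        for vid in list(active):
            new_value, outbox, halt = compute(
                vid, values[vid], inbox[vid], edges.get(vid, []))
            values[vid] = new_value
            for target, msg in outbox:
                next_inbox[target].append(msg)   # delivered next superstep
            if halt:
                active.discard(vid)       # vote to halt
        inbox = next_inbox
        for vid, msgs in inbox.items():
            if msgs:
                active.add(vid)           # a message reactivates its recipient
    return values

def bump(vid, value, incoming, out_edges):
    # demo compute(): increment once, send nothing, vote to halt
    return value + 1, [], True
```

With the `bump` compute function every vertex runs exactly one superstep and then halts, so the whole computation finishes after one iteration.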

SLIDE 6

Pregel simple example - find maximum value

SLIDE 7
Pregel details - the master, combiners, aggregators and fault tolerance

  • Master node
  • Coordinates the computation and maintains a list of all workers.
  • Maintains aggregators.
  • Aggregators
  • Workers send the master a value at each superstep for aggregation.
  • Provide a global statistic to each vertex at each superstep, important for some algorithms like Dijkstra's algorithm.
  • Combiners
  • Combine messages destined for the same vertex to reduce message traffic.
  • Fault tolerance
  • Achieved through checkpointing.
  • The master instructs workers to save their state to persistent storage at the beginning of a superstep.
  • If the master detects that workers are down, it reassigns their partitions to available workers and recomputes the superstep.
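A combiner works because some reductions are associative and commutative: for a min-style algorithm, only the smallest message bound for each vertex matters, so a worker can merge its outgoing messages before they cross the network. A sketch (the function name `combine_min` is illustrative; Pregel's real combiner is a user-supplied C++ class):

```python
# Illustrative min-combiner: merge all pending messages so that at most
# one message per destination vertex is actually sent over the network.
def combine_min(messages):
    """messages: iterable of (target_vertex, value) pairs.
    Returns {target_vertex: smallest value destined for it}."""
    combined = {}
    for target, value in messages:
        if target not in combined or value < combined[target]:
            combined[target] = value
    return combined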

SLIDE 8

Pregel more complicated example - SSSP

Set value to 0 if the vertex is the source and INF otherwise. Compute all the potential min distances from incoming arcs. If the smallest is less than the current min distance, update it and alert neighbours through outgoing arcs. Then vote to halt until another message arrives.
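The SSSP vertex program just described can be sketched as a toy (names such as `pregel_sssp` are illustrative, and the message loop is inlined for brevity): the source starts at distance 0 and everything else at infinity; each superstep a vertex takes the minimum over the candidate distances it received and, only if that improves its value, relaxes its outgoing arcs.

```python
import math

# Sketch of the slide's SSSP vertex program under the superstep model.
def pregel_sssp(edges, source):
    """edges: {vertex: [(neighbour, arc_weight), ...]} (directed)."""
    dist = {v: math.inf for v in edges}
    dist[source] = 0
    # superstep 0: only the source is active; it relaxes its outgoing arcs
    outgoing = [(n, w) for n, w in edges[source]]
    while outgoing:
        inbox = {}
        for target, d in outgoing:
            inbox.setdefault(target, []).append(d)
        outgoing = []
        for v, candidates in inbox.items():
            best = min(candidates)
            if best < dist[v]:                # found a shorter path:
                dist[v] = best                # update and alert neighbours
                outgoing += [(n, best + w) for n, w in edges.get(v, [])]
            # otherwise the vertex stays halted (sends nothing)
    return dist
```

On the graph a → b (weight 1), a → c (weight 4), b → c (weight 1), vertex c is first reached with distance 4 and then improved to 2 in a later superstep when b's relaxation arrives.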

SLIDE 9

Experiments - SSSP with varying graph size and worker numbers

The 16x increase in worker tasks, from 50 to 800, yields roughly a 10-fold speedup. Run time varies linearly with the number of vertices, as it should.

SLIDE 10

Pregel critical analysis

  • It uses network transfers only for messages, removing the need to read from remote memory.
  • Remote reads are conventionally why algorithms over distributively stored graphs are slow.
  • The authors claim all graph algorithms can be expressed in this vertex-centric approach.
  • However, no proof is presented.
  • When a failure occurs, it’s not clear whether only the work for the reassigned graph partition or the entire work for that superstep is recomputed.
  • Doesn’t address when infinite loops might occur or how to account for them.
SLIDE 11

Questions/Discussion