SLIDE 1

One Trillion Edges: Graph Processing at Facebook-Scale

GraphHPC 2015, Moscow

Avery Ching, Sergey Edunov, Maja Kabiljo, Dionysios Logothetis, Sambavi Muthukrishnan (Facebook)

SLIDE 2

Social Graph

SLIDE 3

Example Question: Are Jay and Sambavi friends?

Social Graph

SLIDE 4
SLIDE 5-7

Ranking Features, Recommendations, Data Partitioning

[Figure: example ranking feature scores 7.6, 9.3, 6.4, 8.2]

SLIDE 8-9

Benchmark Graphs vs. Social Graphs

Clueweb 09, Twitter research, Friendster, Yahoo! web, 2015 Twitter (approx.), 2015 Facebook (approx.)

[Chart: edge and vertex counts for each graph]

70x larger than benchmarks!

SLIDE 10-11

Requirements

  • Efficient iterative computing model
  • Easy to program and debug graph-based API
  • Scale to real-world Facebook graph sizes (1B+ nodes and hundreds of billions of edges)
  • Easily interoperable with existing data (Hive)
  • Run multiple jobs in a multi-tenant environment

SLIDE 12-16

Apache Giraph

  • Highly scalable graph processing engine loosely based on Pregel
  • Combiners are used to aggregate message values
  • Aggregators are global data generated on every superstep

Maximum Vertex Example

[Figure: two processors exchange vertex values over successive supersteps until every vertex converges to the maximum value (5)]
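The maximum-vertex example can be written as a short Giraph computation. The following is a minimal sketch of that idea, assuming a Giraph 1.1-style BasicComputation with long ids and double values; the class name and type choices are illustrative assumptions, not code from the deck. Each vertex keeps the largest value it has seen and forwards it to its neighbors until nothing changes.

import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;

public class MaxValueComputation extends BasicComputation<
    LongWritable, DoubleWritable, NullWritable, DoubleWritable> {
  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, NullWritable> vertex,
      Iterable<DoubleWritable> messages) {
    // Adopt the largest value seen so far, either our own or a neighbor's.
    boolean changed = getSuperstep() == 0;  // always announce our value once
    for (DoubleWritable message : messages) {
      if (message.get() > vertex.getValue().get()) {
        vertex.getValue().set(message.get());
        changed = true;
      }
    }
    if (changed) {
      // Propagate the current maximum to all neighbors.
      sendMessageToAllEdges(vertex, vertex.getValue());
    }
    vertex.voteToHalt();  // woken up again only if a new message arrives
  }
}

A message combiner that keeps only the largest incoming value would cut traffic further, which is what the combiner bullet above refers to.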

SLIDE 17
SLIDE 18

Pipelines

Data pipelines framework:

  Applications: Core Analytics
  Execution framework: MapReduce (scheduler)
  Storage: HDFS

SLIDE 19

Architecture

SLIDE 20-22

Architecture

Loading the graph: the master assigns input splits (Split 0-3) to workers (Worker 0, Worker 1); each worker loads its splits and sends graph data to the worker that owns it.

Compute / Iterate: each worker holds its partitions of the in-memory graph (Part 0-3); workers compute and send messages, then send stats to the master, which starts the next iteration.

Storing the graph: each worker writes its partitions (Part 0-3) through the output format.

SLIDE 23-25

Parallelization Model

Worker parallelization: rely on the scheduling framework for parallelism (e.g., more mappers). Pros: simple.

Multithreaded parallelization: multicore machines can run up to (partitions / worker) compute threads. Pros: fewer network connections, better memory usage (e.g., shared message buffering).
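As a rough illustration of the multithreaded model, the sketch below runs one compute task per partition on a bounded thread pool. It is a conceptual sketch only; Partition and computePartition are hypothetical stand-ins, not Giraph classes.

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class WorkerSuperstepRunner {
  /** Hypothetical partition handle holding a worker's share of the graph. */
  static class Partition {}

  public void runSuperstep(List<Partition> partitions, int numThreads)
      throws InterruptedException {
    // At most min(numThreads, #partitions) partitions are computed concurrently.
    ExecutorService pool =
        Executors.newFixedThreadPool(Math.min(numThreads, partitions.size()));
    for (Partition partition : partitions) {
      pool.submit(() -> computePartition(partition));
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);  // barrier: superstep ends when all partitions finish
  }

  private void computePartition(Partition partition) {
    // Placeholder: run compute() on every active vertex and buffer outgoing messages.
  }
}

Because the threads live in one JVM, they can share message buffers and outbound connections, which is the "fewer connections, better memory usage" advantage noted on the slide.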

SLIDE 26

Efficient Java Object Support

/**
 * Interface for data structures that store out-edges for a vertex.
 *
 * @param <I> Vertex id
 * @param <E> Edge value
 */
 public interface OutEdges<I extends WritableComparable, E extends Writable> extends Iterable<Edge<I, E>>, Writable {
 
 void initialize(Iterable<Edge<I, E>> edges);
 
 void initialize(int capacity);
 
 void initialize();
 
 void add(Edge<I, E> edge);
 
 void remove(I targetVertexId);
 
 int size();
 }

  • Edges >> vertices (by more than 2 orders of magnitude)
  • Allow custom out-edge implementations
  • Example: Java primitive arrays, FastUtil libraries
  • Serialize messages into large byte arrays
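To make the primitive-array point concrete, here is a minimal sketch of the underlying idea, not an actual Giraph OutEdges implementation: neighbor ids live in a single long[], so there is no per-edge object or boxing overhead.

import java.util.Arrays;

/** Simplified illustration of a primitive-array edge store. */
public class LongArrayOutEdges {
  private long[] targetIds = new long[4];  // neighbor vertex ids
  private int size = 0;

  public void add(long targetVertexId) {
    if (size == targetIds.length) {
      targetIds = Arrays.copyOf(targetIds, size * 2);  // amortized growth
    }
    targetIds[size++] = targetVertexId;
  }

  public void remove(long targetVertexId) {
    for (int i = 0; i < size; i++) {
      if (targetIds[i] == targetVertexId) {
        targetIds[i] = targetIds[--size];  // swap with last entry; order is not kept
        return;
      }
    }
  }

  public int size() {
    return size;
  }

  public long get(int index) {
    return targetIds[index];
  }
}

Compared with an ArrayList of boxed Long ids, this stores eight bytes per edge plus a small constant, which matters when edges outnumber vertices by more than two orders of magnitude.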

SLIDE 27-31

Page Rank

MapReduce (Hadoop) vs. Giraph

Giraph:

public class PageRankComputation extends BasicComputation<
    LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {
  @Override
  public void compute(
      Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
      Iterable<DoubleWritable> messages) {
    // Calculate new page rank value
    if (getSuperstep() >= 1) {
      double sum = 0;
      for (DoubleWritable message : messages) {
        sum += message.get();
      }
      vertex.getValue().set(0.15d / getTotalNumVertices() + 0.85d * sum);
    }
    // Send page rank value to neighbors
    if (getSuperstep() < 30) {
      sendMessageToAllEdges(vertex,
          new DoubleWritable(vertex.getValue().get() / vertex.getNumEdges()));
    } else {
      vertex.voteToHalt();
    }
  }
}
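Because PageRank only needs the sum of its incoming messages, it pairs naturally with the combiner bullet from the Apache Giraph slides. Below is a hedged sketch of such a sum combiner; the MessageCombiner signature is assumed from Giraph's public API and is not code from the deck.

import org.apache.giraph.combiner.MessageCombiner;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;

public class DoubleSumCombiner
    implements MessageCombiner<LongWritable, DoubleWritable> {
  @Override
  public void combine(LongWritable vertexId, DoubleWritable original,
      DoubleWritable toCombine) {
    // Fold the new message into the accumulated one so only a single sum is delivered.
    original.set(original.get() + toCombine.get());
  }

  @Override
  public DoubleWritable createInitialMessage() {
    return new DoubleWritable(0);  // identity element for summation
  }
}

On a later slide, MasterCompute.setMessageCombiner is the hook that would install such a combiner for the workers.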

SLIDE 32

Pregel Extensions

SLIDE 33-36

Pregel Model Limitations

  • Difficult to construct “multi-stage” graph applications
  • Hard to reuse code

SLIDE 37-41

Extensions

  • Make Computation a first-class object
  • Define computation on the master, the workers, and each vertex
  • Master computation is executed centrally to set the computation and combiner for the workers
  • All computations are now composable and reusable

SLIDE 42-44

First Class Computation

Pregel (C++):

class Vertex {
 public:
  virtual void Compute(MessageIterator* msgs) = 0;
  …
};

Giraph (Java):

public interface Computation<I extends WritableComparable,
    V extends Writable, E extends Writable,
    M1 extends Writable, M2 extends Writable> {
  void compute(Vertex<I, V, E> vertex, Iterable<M1> messages);
  …
}
SLIDE 45

Defining Computation for Master/Worker

public abstract class WorkerContext {
  public abstract void preApplication();
  public abstract void postApplication();
  public abstract void preSuperstep();
  public abstract void postSuperstep();
  …
}

public abstract class MasterCompute {
  public abstract void compute();
  public abstract void haltComputation();
  public final void setComputation(
      Class<? extends Computation> computationClass);
  public final void setMessageCombiner(
      Class<? extends MessageCombiner> combinerClass);
  public final <A extends Writable> void setAggregatedValue(
      String name, A value);
  …
}
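These hooks are what make the multi-stage applications on the next slides possible: the master can swap the Computation class between supersteps. The sketch below is a hypothetical illustration of that pattern; DefaultMasterCompute is assumed as the convenience base class, and the two stage computations are invented stand-ins rather than the deck's code.

import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.giraph.master.DefaultMasterCompute;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;

/** Hypothetical first stage, e.g. "compute candidates to move". */
class ComputeCandidatesStage extends BasicComputation<
    LongWritable, DoubleWritable, NullWritable, DoubleWritable> {
  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, NullWritable> vertex,
      Iterable<DoubleWritable> messages) {
    // Stage 1 logic would go here.
  }
}

/** Hypothetical second stage, e.g. "probabilistically move vertices". */
class MoveVerticesStage extends BasicComputation<
    LongWritable, DoubleWritable, NullWritable, DoubleWritable> {
  @Override
  public void compute(Vertex<LongWritable, DoubleWritable, NullWritable> vertex,
      Iterable<DoubleWritable> messages) {
    // Stage 2 logic would go here.
  }
}

/** Master that alternates the two stages and halts after a fixed budget. */
public class TwoStageMaster extends DefaultMasterCompute {
  private static final int MAX_SUPERSTEPS = 30;

  @Override
  public void compute() {
    if (getSuperstep() >= MAX_SUPERSTEPS) {
      haltComputation();
      return;
    }
    if (getSuperstep() % 2 == 0) {
      setComputation(ComputeCandidatesStage.class);  // even supersteps: stage 1
    } else {
      setComputation(MoveVerticesStage.class);       // odd supersteps: stage 2
    }
  }
}

The same pattern extends to the balanced label propagation and affinity propagation flows that follow: the master tracks which phase it is in and installs the matching computation each superstep.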

SLIDE 46-48

Example Multi-Stage Applications

Balanced Label Propagation: compute candidates to move to partitions, then probabilistically move vertices; continue if the halting condition is not met (e.g., fewer than n vertices moved?). Every 10 cycles of BLP, start and complete an edge-cut metrics calculation.

Affinity Propagation: calculate and send responsibilities, calculate and send availabilities, then update exemplars; continue if the halting condition is not met (e.g., fewer than n vertices changed exemplars?). Every 5 cycles of AP, start and complete an edge-cut metrics calculation.
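The "continue if the halting condition is not met" steps are typically implemented with an aggregator: each vertex that moves adds 1 to a shared counter, and the master reads the total at the start of the next superstep. Below is a hedged sketch of the master side, assuming Giraph's aggregator API (getAggregatedValue, registration in initialize()) and that vertices call aggregate("moved", new LongWritable(1)) whenever they move; names and threshold are illustrative, not the deck's code.

import org.apache.giraph.master.DefaultMasterCompute;
import org.apache.hadoop.io.LongWritable;

/** Master-side halting check; assumes a LongSumAggregator named "moved"
 *  was registered in initialize() (not shown). */
public class HaltWhenStableMaster extends DefaultMasterCompute {
  private static final long MIN_MOVES = 1000;  // illustrative threshold "n"

  @Override
  public void compute() {
    if (getSuperstep() == 0) {
      return;  // nothing has been aggregated yet
    }
    LongWritable moved = getAggregatedValue("moved");
    if (moved.get() < MIN_MOVES) {
      haltComputation();  // halting condition met: fewer than n vertices moved
    }
  }
}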

SLIDE 49-53

Large/Imbalanced Supersteps

  • Example: mutual friends calculation between neighbors
  • Send your friends a list of your friends
  • Intersect with your own friend list

[Figure: vertices A-E exchanging friend-list messages, e.g. A:{D}, D:{A,E}, E:{D}, B:{}]

Messages memory = 1.23B monthly active users (as of 1/2014) x 200+ average friends (2011 S1) x 8-byte ids (longs) = 394 TB of memory required for messages. Assuming 100 GB machines, that is roughly 3,940 machines (not including the graph).
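A minimal sketch of this two-superstep pattern, assuming Giraph-style Java with long ids; the message type and class names here are illustrative assumptions, not the production application. Superstep 0 sends each vertex's neighbor list to all of its neighbors; superstep 1 intersects each received list with the local neighbor set.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.giraph.edge.Edge;
import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;

/** Message carrying the sender's neighbor ids (illustrative type). */
class FriendListWritable implements Writable {
  long[] friendIds = new long[0];

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(friendIds.length);
    for (long id : friendIds) {
      out.writeLong(id);
    }
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    friendIds = new long[in.readInt()];
    for (int i = 0; i < friendIds.length; i++) {
      friendIds[i] = in.readLong();
    }
  }
}

public class MutualFriendsComputation extends BasicComputation<
    LongWritable, LongWritable, NullWritable, FriendListWritable> {
  @Override
  public void compute(Vertex<LongWritable, LongWritable, NullWritable> vertex,
      Iterable<FriendListWritable> messages) {
    if (getSuperstep() == 0) {
      // Superstep 0: send my neighbor list to every neighbor.
      FriendListWritable msg = new FriendListWritable();
      msg.friendIds = new long[vertex.getNumEdges()];
      int i = 0;
      for (Edge<LongWritable, NullWritable> edge : vertex.getEdges()) {
        msg.friendIds[i++] = edge.getTargetVertexId().get();
      }
      sendMessageToAllEdges(vertex, msg);
    } else {
      // Superstep 1: intersect each received list with my own neighbor set.
      Set<Long> myFriends = new HashSet<>();
      for (Edge<LongWritable, NullWritable> edge : vertex.getEdges()) {
        myFriends.add(edge.getTargetVertexId().get());
      }
      long totalMutual = 0;
      for (FriendListWritable msg : messages) {
        for (long id : msg.friendIds) {
          if (myFriends.contains(id)) {
            totalMutual++;
          }
        }
      }
      // A real application would emit per-neighbor counts; here we just store a total.
      vertex.getValue().set(totalMutual);
      vertex.voteToHalt();
    }
  }
}

Each vertex buffers a copy of its neighbors' lists, which is what drives the memory estimate above and motivates the superstep splitting on the next slides.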

SLIDE 54-57

Superstep Splitting

  • Split a superstep into multiple iterations
  • Partition message sources/destinations into groups for activation (e.g., hash the vertex id by the iteration number)

[Figure: one logical superstep run as four iterations over vertex groups A and B; each iteration enables a single (source group, destination group) pair, so only a fraction of the messages is in flight at once]
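A minimal sketch of the activation test, assuming two groups per side as in the figure; this is illustrative Java rather than Giraph internals. With two source groups and two destination groups, one logical superstep becomes four sub-iterations, and a vertex sends (or receives) only when its hashed group matches the sub-iteration.

/** Illustrative only: which vertices send/receive in each sub-iteration of a split superstep. */
public final class SuperstepSplitting {
  private static final int NUM_GROUPS = 2;  // groups "A" (0) and "B" (1) from the figure

  /** Group of a vertex id, derived by hashing (here a simple modulo). */
  static int group(long vertexId) {
    return (int) Math.floorMod(vertexId, NUM_GROUPS);
  }

  /** May this vertex send messages during the given sub-iteration? */
  static boolean isSourceActive(long vertexId, int subIteration) {
    return group(vertexId) == subIteration / NUM_GROUPS;
  }

  /** Are messages addressed to this vertex delivered during the given sub-iteration? */
  static boolean isDestinationActive(long vertexId, int subIteration) {
    return group(vertexId) == subIteration % NUM_GROUPS;
  }

  public static void main(String[] args) {
    // Vertex 5 falls in group B, vertex 4 in group A: 5 sends only in sub-iterations
    // 2 and 3, and messages to 4 are delivered only in sub-iterations 0 and 2.
    for (int it = 0; it < NUM_GROUPS * NUM_GROUPS; it++) {
      System.out.printf("iteration %d: 5 sends=%b, 4 receives=%b%n",
          it, isSourceActive(5, it), isDestinationActive(4, it));
    }
  }
}

The commutativity and associativity requirement on the next slide follows directly: a vertex now applies its incoming messages in several partial batches rather than all at once.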

SLIDE 58

Superstep Splitting Limitations

  • The message-based compute update must be commutative and associative when destination splitting is enabled
  • No single message can overflow the memory buffer of a single vertex
  • In extreme cases, rely on graph partitioning to balance load

SLIDE 59

Benchmarks/Performance

SLIDE 60

Scalability

[Chart: scalability of workers (200B edges): runtime in seconds vs. number of workers (50 to 300), Giraph vs. ideal scaling]

[Chart: scalability of edges (50 workers): runtime in seconds vs. number of edges (1E+09 to 2E+11)]

SLIDE 61-62

“A billion edges isn’t cool… you know what’s cool? A TRILLION EDGES.”

SLIDE 63

Giraph runs page rank on 1,000,000,000,000+ edges in under 3 minutes per iteration, using 200 machines

SLIDE 64

Giraph vs. Hive (Graph Applications)

Application                                  Hive                      Giraph                   Speedup
Page rank (single iteration), 400B+ edges    Total CPU 16.5M secs      Total CPU 0.6M secs      26x
                                             Elapsed time 600 mins     Elapsed time 19 mins     120x
Friends-of-friends score, 71B+ edges         Total CPU 255M secs       Total CPU 18M secs       14x
                                             Elapsed time 7200 mins    Elapsed time 110 mins    65x

SLIDE 65

Giraph vs. Hive (Hive Applications)

Application                                               Hive                      Giraph                   Speedup
Double join query (450B connections, 2.5B+ unique ids)    Total CPU 211 days        Total CPU 43 days        5x
                                                          Elapsed time 425 mins     Elapsed time 50 mins     8.5x
Count distinct query (620B actions, 110M objects)         Total CPU 485 days        Total CPU 78 days        6.2x
                                                          Elapsed time 510 mins     Elapsed time 45 mins     11.3x

SLIDE 66

Operational Experience

  • Runs in Corona (Facebook's MapReduce implementation)
  • One map task per machine, leveraging multithreaded parallelism
  • Strict FIFO queue to avoid deadlock
  • Hive data model: prepare the graph in Hive and/or use multiple Giraph input formats for filtering, pre-processing, and transformation

SLIDE 67-70

Recent Giraph Work

  • Recommendations (http://www.tinyurl.com/fb-mf-cf)
  • Blocks and Pieces (http://giraph.apache.org/blocks.html)
  • Adaptive out-of-core (most of the work already committed to open source)

SLIDE 71

Acknowledgements

Alessandro Presta, Nitay Joffe, Greg Malewicz, Pavan Kumar

SLIDE 72-77

Future Work

  • Scalability
  • Applications
  • Performance

SLIDE 78

Thank You