One Trillion Edges: Graph Processing at Facebook-Scale



  1. One Trillion Edges: Graph Processing at Facebook-Scale. Tong Niu, tong.niu.cn@outlook.com, 11 July 2019

  2. Outline • Introduction • Improvements • Experiment Results • Conclusion & Future Work • Discussion

  3. Introduction • Graph structures (entities, connections) • Social networks • Facebook manages a social graph composed of people, their friendships, subscriptions, likes, posts, and many other connections: 1.39B active users in 2014 with more than 400B edges

  4. Introduction • What is Apache Giraph? • "Think like a vertex" • Each vertex has an id, a value, a list of adjacent neighbors, and corresponding edge values • Bulk synchronous parallel (BSP) processing • Computation is broken into supersteps (iterations) • Messages are sent from one vertex to another during a superstep and delivered in the following superstep
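To make the vertex-centric BSP model concrete, here is a minimal Java sketch of a vertex's compute() for a max-value propagation job. The Vertex fields and the Context callback are simplified stand-ins for illustration, not the actual Giraph API.

```java
import java.util.List;

/** Minimal "think like a vertex" sketch; types are simplified stand-ins, not the Giraph API. */
class MaxValueVertex {
    long id;
    long value;               // the vertex value
    List<Long> neighborIds;   // ids of adjacent vertices
    boolean halted = false;

    /** Stand-in for the framework services available during compute(). */
    interface Context {
        void sendMessage(long targetVertexId, long message); // delivered in the next superstep
        long superstep();
    }

    /** Called once per superstep with the messages sent during the previous superstep. */
    void compute(Iterable<Long> messages, Context ctx) {
        long max = value;
        for (long m : messages) {
            max = Math.max(max, m);
        }
        // Propagate only when the value changed (or on the first superstep), then vote to halt;
        // an incoming message in a later superstep reactivates the vertex.
        if (ctx.superstep() == 0 || max > value) {
            value = max;
            for (long n : neighborIds) {
                ctx.sendMessage(n, value);
            }
        }
        halted = true;
    }
}
```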

  5. Introduction • What is Apache Giraph?

  6. Introduction • What is Apache Giraph? • Master – application coordinator • Assigns partitions to workers • Synchronizes supersteps • Worker – computation, messaging • Loads the graph from input splits • Does the computation/messaging for its assigned partitions
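As a rough illustration of the master's role in assigning partitions, the sketch below hashes a vertex id to a partition and maps partitions to workers round-robin. These are hypothetical helpers, not Giraph's actual assignment logic.

```java
/** Hypothetical partitioning helpers; not Giraph's actual assignment logic. */
class PartitionAssignment {
    /** Map a vertex id to one of numPartitions partitions by hashing. */
    static int partitionOf(long vertexId, int numPartitions) {
        return (Long.hashCode(vertexId) & Integer.MAX_VALUE) % numPartitions;
    }

    /** Assign partitions to workers round-robin, as the master might do. */
    static int workerOf(int partitionId, int numWorkers) {
        return partitionId % numWorkers;
    }
}
```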

  7. 1. Flexible vertex/edge-based input • Original input: • All data (vertex/edge) had to be read from the same record and was assumed to come from the same data source • Modified input: • Allow loading vertex data and edges from separate sources • Add an arbitrary number of data sources
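A conceptual sketch of what the flexible input means in practice: vertex values come from one source while adjacency lists are merged from any number of edge-only sources. The record formats and method names here are assumptions for illustration, not Giraph's input-format API.

```java
import java.util.*;

/** Conceptual sketch of separate vertex/edge loading; formats and names are illustrative only. */
class SeparateInputLoader {
    /** Vertex values loaded from a vertex-only source (assumed format: "id value"). */
    Map<Long, Double> loadVertices(Iterable<String> vertexRecords) {
        Map<Long, Double> vertices = new HashMap<>();
        for (String rec : vertexRecords) {
            String[] f = rec.trim().split("\\s+");
            vertices.put(Long.parseLong(f[0]), Double.parseDouble(f[1]));
        }
        return vertices;
    }

    /** Adjacency lists merged from any number of edge-only sources (assumed format: "src dst"). */
    Map<Long, List<Long>> loadEdges(List<Iterable<String>> edgeSources) {
        Map<Long, List<Long>> adjacency = new HashMap<>();
        for (Iterable<String> source : edgeSources) {
            for (String rec : source) {
                String[] f = rec.trim().split("\\s+");
                adjacency.computeIfAbsent(Long.parseLong(f[0]), k -> new ArrayList<>())
                         .add(Long.parseLong(f[1]));
            }
        }
        return adjacency;
    }
}
```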

  8. 2. Parallelization support • Original: • Scheduled as a single MapReduce job • Modified: • Add more workers per machine • Use local multithreading to maximize resource utilization
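A minimal sketch of per-worker multithreading, assuming partitions within a superstep can be computed independently. The Partition interface is a placeholder; the sketch only shows the idea of fanning local partitions out over a thread pool.

```java
import java.util.List;
import java.util.concurrent.*;

/** Sketch of local multithreading inside one worker; Partition is a placeholder type. */
class MultithreadedWorker {
    interface Partition { void computeSuperstep(); }

    /** Run one superstep over all locally assigned partitions with a fixed-size thread pool. */
    static void runSuperstep(List<Partition> localPartitions, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Partition p : localPartitions) {
            pool.submit(p::computeSuperstep);     // partitions are independent within a superstep
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS); // acts as the barrier before the global sync
    }
}
```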

  9. 3. Memory optimization • Original: • Large memory overhead because of flexibility • Modified: • Serialize the edges of every vertex into a byte array rather than instantiating them as native Java objects • Create an OutEdges interface that allows developers to implement their own edge stores
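A sketch of the byte-array idea: instead of one Java object per edge, the out-edges of a vertex are kept serialized in a single byte[] and deserialized on demand. This is inspired by the OutEdges abstraction described above but is not Giraph's actual implementation.

```java
import java.io.*;
import java.util.*;

/** Sketch of a compact edge store: edges kept serialized in one byte[] (illustrative only). */
class ByteArrayOutEdges {
    private byte[] serializedEdges = new byte[0];
    private int edgeCount = 0;

    /** Serialize (targetId, edgeValue) pairs into the backing byte array. */
    void setEdges(List<long[]> edges) throws IOException {   // each entry: {targetId, edgeValue}
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (long[] e : edges) {
            out.writeLong(e[0]);   // target vertex id
            out.writeLong(e[1]);   // edge value
        }
        serializedEdges = bytes.toByteArray();
        edgeCount = edges.size();
    }

    /** Deserialize on demand; avoids holding one object per edge in memory. */
    List<long[]> readEdges() throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(serializedEdges));
        List<long[]> edges = new ArrayList<>(edgeCount);
        for (int i = 0; i < edgeCount; i++) {
            edges.add(new long[] { in.readLong(), in.readLong() });
        }
        return edges;
    }
}
```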

  10. 4. Sharded aggregators • Global computation (e.g. min/max values) • Provide efficient shared state across workers • Make the values available in the next superstep

  11. 4. Sharded aggregators • Original: • Workers store partially aggregated data in ZooKeeper znodes; the master aggregates all of them and writes the result back to a znode for the workers to access • This becomes a bottleneck when every worker has a large amount of data to aggregate • Modified: • Each aggregator is randomly assigned to one of the workers, which aggregates the partial values • The final values are distributed to the master and the other workers
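The sketch below illustrates the sharded-aggregator idea under simplified assumptions: each aggregator is owned by one worker (chosen here by hashing its name), the owner reduces the partial values sent by all workers, and the final value is fanned back out. Types and method names are hypothetical.

```java
import java.util.*;

/** Conceptual sketch of sharded aggregation; names and types are hypothetical. */
class ShardedAggregators {
    /** Each aggregator is owned by exactly one worker, chosen by hashing its name. */
    static int ownerOf(String aggregatorName, int numWorkers) {
        return (aggregatorName.hashCode() & Integer.MAX_VALUE) % numWorkers;
    }

    /** The owning worker reduces the partial values it received from all workers. */
    static double reduceSum(List<Double> partialValues) {
        double total = 0;
        for (double v : partialValues) {
            total += v;
        }
        return total;
    }

    /** The final value is then distributed to the master and every worker,
     *  so it is available to all vertices in the next superstep. */
    static Map<Integer, Double> distribute(double finalValue, int numWorkers) {
        Map<Integer, Double> outbox = new HashMap<>();
        for (int w = 0; w < numWorkers; w++) {
            outbox.put(w, finalValue);
        }
        return outbox;
    }
}
```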

  12. K-Means clustering In a graph application, input vectors are vertices, and centroids are aggregators.

  13. 1. Worker phases • Add preApplication() to initialize the positions of the centroids • Add preSuperstep() to calculate the new position of each centroid before the next superstep 2. Master computation • Centralized computation prior to every superstep that can communicate with the workers via aggregators
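To show how the worker phases fit the k-means example, here is a simplified sketch: preSuperstep() moves each centroid to the mean of the vectors assigned to it in the previous superstep, and compute() assigns a vertex's input vector to its nearest centroid. The per-cluster sums and counts stand in for values that would normally flow through aggregators; this is an illustrative sketch, not the presented implementation.

```java
import java.util.Arrays;

/** K-means sketch on a vertex-centric model; accumulators stand in for aggregators. */
class KMeansSketch {
    double[][] centroids;     // k centroids of dimension d (shared state)
    double[][] clusterSums;   // per-cluster coordinate sums gathered last superstep
    long[] clusterCounts;     // per-cluster vector counts gathered last superstep

    /** preSuperstep(): recompute each centroid as the mean of its assigned vectors. */
    void preSuperstep() {
        for (int c = 0; c < centroids.length; c++) {
            if (clusterCounts[c] == 0) continue;  // leave empty clusters where they are
            for (int d = 0; d < centroids[c].length; d++) {
                centroids[c][d] = clusterSums[c][d] / clusterCounts[c];
            }
        }
        for (double[] sum : clusterSums) Arrays.fill(sum, 0.0);  // reset accumulators
        Arrays.fill(clusterCounts, 0L);
    }

    /** compute() for one vertex: pick the nearest centroid and contribute to its sums. */
    int compute(double[] vector) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0;
            for (int d = 0; d < vector.length; d++) {
                double diff = vector[d] - centroids[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = c; }
        }
        for (int d = 0; d < vector.length; d++) clusterSums[best][d] += vector[d];
        clusterCounts[best]++;
        return best;
    }
}
```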

  14. 3. Composable computation • Allows us to combine different message types, combiners, and computations to build a powerful k-means application 4. Superstep splitting • For a message-heavy superstep • Send a fragment of the messages to their destinations and do a partial computation during each iteration
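A small sketch of the superstep-splitting idea under simple assumptions: in sub-iteration splitIndex of numSplits, a vertex sends messages only to the neighbors whose position falls into that split, so receivers can fold partial messages into their state instead of buffering everything at once. The Sender interface is a placeholder, not the actual mechanism.

```java
import java.util.List;

/** Sketch of superstep splitting; Sender is a placeholder, not the real implementation. */
class SuperstepSplitting {
    interface Sender { void send(long targetVertexId, double message); }

    /** Send only the fragment of messages belonging to this split of the superstep. */
    static void sendFragment(List<Long> neighborIds, double message,
                             int splitIndex, int numSplits, Sender sender) {
        for (int i = 0; i < neighborIds.size(); i++) {
            if (i % numSplits == splitIndex) {
                sender.send(neighborIds.get(i), message);
            }
        }
    }
}
```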

  15. Experiment results

  16. Experiment results • Giraph (200 machines) vs. Hive (at least 200 machines) • Compare CPU time and elapsed time • Label propagation algorithm • Weighted PageRank

  17. Conclusion & Future work • We have described how a processing framework supports Facebook-scale production workloads and the improvements made to Giraph • Future work: 1. Determine a good-quality graph partitioning prior to computation 2. Make the computation more asynchronous to improve convergence speed 3. Leverage Giraph as a parallel machine-learning platform

  18. Discussion
