SLIDE 1
Processing Massive Sized Graphs using Sector/Sphere
Yunhong Gu, Li Lu: University of Illinois at Chicago Robert Grossman: University of Chicago and Open Data Group Andy Yoo: Lawrence Livermore National Laboratory
SLIDE 2 Background
Processing very large graphs (billions of vertices) is important
in many real-world applications (e.g., social networks)
Traditional systems are often complicated to use and/or
expensive to build
Processing graphs in a distributed setting requires shared data access or
complicated data movement
This paper investigates how to support large graph processing
with “cloud” style compute system
Data-centric model with a simplified API (e.g., MapReduce)
SLIDE 3
Overview
Sector/Sphere
In-Storage Data Processing Framework
Graph Breadth-First Search
Experimental Results
Conclusion
SLIDE 4 Sector/Sphere
Sector: distributed file system
Running on clusters of commodity computers
Software fault tolerance with replication
Application aware
Sphere: parallel data processing framework
In-storage processing
User-defined functions on data segments in parallel
Load balancing and fault tolerance
SLIDE 5 Parallel Data Processing Framework
Data Storage
Locality-aware distributed file system
Data Processing
MapReduce, user-defined functions
Data Exchanging
Hash, Reduce
[Diagram: input segments on disks are processed by UDFs in parallel; results flow to bucket writers, which write output segments back to disks]
SLIDE 6
Key Performance Factors
Input locality
Data is processed on the node where it resides, or on nearest
nodes
Output locality
Output data can be placed at locations such that data
movement in further processing is reduced
In-memory objects
Frequently accessed data may be stored in memory
SLIDE 7 Output Locality: An Example
Join two datasets: scan each one
independently, sending their results to common buckets
Merge the result buckets
[Diagram: UDF 1 instances scan DataSet 1 and UDF 2 instances scan DataSet 2; their outputs feed shared buckets, each processed by a UDF-Join instance]
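The join above can be sketched as follows. This is an illustrative single-machine sketch, not the Sphere API: each scan UDF hashes a record's join key to a bucket ID, so matching records from both datasets land in the same bucket, and the join UDF can then process each bucket independently with no further data movement. All function names here are assumptions for illustration.

```python
# Sketch of output-locality join: scan both datasets into shared hash
# buckets, then join each bucket independently (illustrative names).

def scan_udf(dataset, tag, n_buckets, buckets):
    """Scan one dataset; route each record to a bucket by its join key."""
    for key, value in dataset:
        buckets[hash(key) % n_buckets].append((tag, key, value))

def join_udf(bucket):
    """Join one bucket: match records tagged 1 with records tagged 2."""
    left = {}
    for tag, key, value in bucket:
        if tag == 1:
            left.setdefault(key, []).append(value)
    return [(key, lv, value)
            for tag, key, value in bucket if tag == 2
            for lv in left.get(key, [])]
```

Because both scans use the same hash function and bucket count, the join never needs to look outside its own bucket.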
SLIDE 8 Graph BFS
[Diagram: example graph showing a breadth-first search from vertex a to vertex b]
SLIDE 9
Data Segmentation
Adjacency list representation
Each segment contains approximately the same number of
edges
The edges belonging to the adjacency list of one vertex are
never split across two segments
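The segmentation rule above can be sketched as a greedy packing of whole adjacency lists. This is a minimal sketch with assumed names, not the actual Sector/Sphere implementation.

```python
# Sketch: pack adjacency lists into segments of roughly `target_edges`
# edges each, never splitting one vertex's list across segments.

def segment_adjacency_lists(adj, target_edges):
    segments = []
    current, count = {}, 0
    for vertex, neighbors in adj.items():
        # Close the current segment once it would exceed the edge budget,
        # but always keep a vertex's full adjacency list in one segment.
        if current and count + len(neighbors) > target_edges:
            segments.append(current)
            current, count = {}, 0
        current[vertex] = neighbors
        count += len(neighbors)
    if current:
        segments.append(current)
    return segments
```

Segments end up approximately balanced by edge count, which balances UDF work, while keeping each vertex's neighbors co-located.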
SLIDE 10 Sphere UDF for Graph BFS
Basic idea: scan each data segment to find the neighbors of
the current level's vertices; the next level is the union
of the neighbors of all vertices in the current level. Repeat
this until the destination is found.
Sphere UDF for unidirectional BFS
Input: graph data segment x and current level segment l_x. If a
vertex appears in level segment l_x, then it must exist in graph data segment x
For each vertex in level segment l_x, find its neighbor vertices
in data segment x, and label each neighbor vertex with a bucket ID so that it satisfies the above relationship
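The loop above can be sketched as follows. In Sphere, each segment's scan runs as a UDF in parallel and neighbors are routed to buckets co-located with their graph segments; this single-machine sketch iterates over the segments sequentially instead, and all names are illustrative rather than the Sphere API.

```python
# Minimal sketch of level-synchronous BFS over graph segments.

def bfs_udf(segment, level):
    """Scan one graph segment: for every current-level vertex whose
    adjacency list lives in this segment, emit its neighbors."""
    next_level = set()
    for v in level:
        if v in segment:              # level vertices are co-located
            next_level.update(segment[v])
    return next_level

def bfs(segments, source, dest):
    """Repeat the scan until dest is found; return the path length."""
    visited = {source}
    level, depth = {source}, 0
    while level:
        if dest in level:
            return depth
        # The union of per-segment UDF outputs forms the next level.
        nxt = set()
        for seg in segments:
            nxt |= bfs_udf(seg, level)
        level = nxt - visited
        visited |= level
        depth += 1
    return None
```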
SLIDE 11
Experiments Setup
Data
PubMed: 28M vertices, 542M edges, 6 GB data
PubMedEx: 5B vertices, 118B edges, 1.3 TB data
Testbed
Open Cloud Testbed: JHU, UIC, StarLight, Calit2
4 racks, 120 nodes, 10 GbE interconnection
SLIDE 12
Average Time Cost (seconds) on PubMed using 20 Servers
Length  Count  Percent  Avg Time (Uni-BFS)  Avg Time (Bi-BFS)
2       28     10.8     21                  25
3       85     32.7     26                  29
4       88     33.8     38                  33
5       34     13.1     70                  42
6       13     5.0      69                  42
7       7      2.7      88                  51
8       5      1.9      84                  54
Total   260             40                  33
SLIDE 13 Performance Impact of Various Components in Sphere
Component change                                    Time cost change
Without in-memory objects                           117%
Without bucket location                             146%
With bucket combiner                                106%
With bucket fault tolerance                         110%
Data segmentation by the same number of vertices    118%
SLIDE 14
The Average Time Cost (seconds) on PubMedEx using 60 Servers
Length  Count  Percent  Avg Time
2       11     4.2      56
3       1      0.4      82
4       60     23.2     79
5       141    54.2     197
6       45     17.3     144
7       2      0.7      201
Total   260             156
SLIDE 15
The Average Time Cost (seconds) on PubMedEx on 19, 41, 61, 83 Servers
Path length group:  3    4    5    6    7
Count:              1    24   58   16   1

Servers#  3    4    5    6    7    AVG
19        112  257  274  327  152  275
41        153  174  174  165  280  174
59        184  150  157  140  124  153
83        214  145  146  138  192  147
SLIDE 16 Conclusion
We can process very large graphs with the “cloud”
compute model such as Sphere
Performance is comparable to traditional systems, while
requiring only modest development effort (less than 1,000 lines of code)
A BFS-type query can be done in a few minutes
Future work: concurrent queries