

SLIDE 1

HelP: High-level Primitives for Large-Scale Graph Processing

Semih Salihoglu (Stanford University)
Jennifer Widom (Stanford University)

SLIDE 2

Large-scale Graph Processing

10s or 100s of billions of vertices and edges
Distributed shared-nothing systems

[Figure: graph data partitioned across Machine 1 … Machine k over distributed storage; example systems: Pregel, PowerGraph]

SLIDE 3

APIs of Existing Systems


Specialized map()- and reduce()-style APIs:

  • Pregel’s compute()
  • PowerGraph’s gather(), apply(), scatter()

Vertex-centric / graph-parallel, message-passing programming model (see the sketch below)

[Figure: graph data partitioned across Machine 1 … Machine k over distributed storage]
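To illustrate the vertex-centric, message-passing style, here is a minimal single-source shortest paths program. It is written against GraphX's Pregel operator rather than Pregel's or PowerGraph's own APIs, and the Double vertex and edge types are assumptions made only for this sketch.

```scala
import org.apache.spark.graphx._

// A minimal sketch of the vertex-centric, message-passing style, written against
// GraphX's Pregel operator (not Pregel's or PowerGraph's own APIs).
// Single-source shortest paths: vertex values hold tentative distances,
// edge attributes hold non-negative edge weights.
def sssp(graph: Graph[Double, Double], source: VertexId): Graph[Double, Double] = {
  // Initialize: distance 0 at the source, +infinity everywhere else.
  val init = graph.mapVertices((id, _) =>
    if (id == source) 0.0 else Double.PositiveInfinity)

  Pregel(init, Double.PositiveInfinity)(
    // Vertex program (the "compute()"): keep the smaller of old and incoming distance.
    (_, dist, newDist) => math.min(dist, newDist),
    // Send messages (the "scatter"): relax edges whose source vertex improved.
    triplet =>
      if (triplet.srcAttr + triplet.attr < triplet.dstAttr)
        Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
      else Iterator.empty,
    // Merge messages (the "gather"): take the minimum incoming distance.
    (a, b) => math.min(a, b))
}
```

Even this small example shows the pattern the rest of the talk addresses: all algorithm logic, including initialization and aggregation, is expressed inside per-vertex user-defined functions.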

SLIDE 4

Advantages

  • Transparent parallelism
  • Flexible: can express many graph algorithms:

PageRank, HITS, Shortest Paths, Collaborative Filtering, Affinity Propagation, Loopy Belief Propagation, Weakly Connected Components, Triangle Counting, Strongly Connected Components, Betweenness Centrality, Minimum Spanning Tree, Diameter Estimation, …

SLIDE 5

Disadvantages


Custom code for common operations, such as:

  • Initializing vertex values
  • Aggregating neighbor values

Difficult to read and understand some programs:

  • Complex UDFs hide higher-level graph operations

Too low-level for some operations

  • E.g., forming supervertices in minimum spanning tree
  • Requires multiple rounds of complex messaging inside compute()

…
graph = Pregel.compute(UDF1)
graph = Pregel.compute(UDF2)
graph = Pregel.compute(UDF3)
…

SLIDE 6

HelP Primitives

Large-scale data processing:
  • Low-level: map(), reduce()
  • Higher-level (Pig and Hive): join, group by, select, …

Large-scale graph processing:
  • Low-level: compute(), gather(), apply(), scatter()
  • Higher-level (HelP): ?

SLIDE 7

Steps in Our Work

  1. Implemented a wide suite of distributed graph algorithms
  2. Identified the commonly appearing operations
  3. Abstracted the operations into HelP primitives
  4. Implemented HelP on GraphX
  5. Reimplemented the suite of algorithms on GraphX

SLIDE 8

Graph Algorithms We Implemented

Algorithms:
  • PageRank
  • HITS
  • Conductance
  • Approx. Betweenness Centrality
  • Clustering Coefficient
  • Semi-clustering
  • Multi-level Clustering
  • Approx. Maximum Weight Matching
  • Random Bipartite Matching
  • Weakly Connected Components
  • Strongly Connected Components
  • Single Source Shortest Paths
  • Graph Coloring
  • Maximal Independent Set
  • K-core
  • Triangle Counting
  • Diameter Estimation
  • K-truss
  • Minimum Spanning Forest

SLIDE 9

HelP Primitives

Primitive | Type of Operation
Aggregate Neighbor Values (ANV) | Vertex-centric Update
Local Update of Vertices (LUV) | Vertex-centric Update
Update Vertices Using One Other Vertex (UVUOV) | Vertex-centric Update
Filter | Topology Modification
Form Supervertices (FS) | Topology Modification
Aggregate Global Value (AGV) | Global Aggregation
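As a rough illustration of how such primitives might be exposed to a programmer, the interface sketch below is reconstructed only from the examples on slides 11 and 12. Apart from aggregateNeighborValues and propagateAndAggregate, every name and parameter list here is a hypothetical stand-in, not HelP's actual API.

```scala
import org.apache.spark.graphx.EdgeDirection

// Hypothetical interface sketch (functional rendering); only the two methods
// shown on slides 11-12 are named in the deck, and even their parameter lists
// below are reconstructed from those examples.
trait HelpGraph[V] {
  // ANV: each selected vertex aggregates values from selected neighbors once.
  def aggregateNeighborValues[A](
      vertexPred: V => Boolean,           // which vertices aggregate
      neighborPred: V => Boolean,         // which neighbors contribute
      neighborValue: V => A,              // value taken from each neighbor
      aggregate: (A, A) => A,             // e.g. SUM, MAX
      update: (V, A) => V): HelpGraph[V]  // fold the aggregate into the vertex

  // Iterative ANV: repeat the aggregation until vertex values converge.
  def propagateAndAggregate[A](
      direction: EdgeDirection,
      startPred: V => Boolean,
      vertexValue: V => A,
      aggregate: (A, A) => A,
      update: (V, A) => V): HelpGraph[V]

  // LUV (hypothetical name): update each vertex using only its own value.
  def updateVertices(update: V => V): HelpGraph[V]

  // Filter (hypothetical name): keep only vertices satisfying a predicate.
  def filterVertices(pred: V => Boolean): HelpGraph[V]

  // AGV (hypothetical name): reduce a per-vertex value to one global value.
  def aggregateGlobalValue[A](value: V => A, aggregate: (A, A) => A): A

  // UVUOV and Form Supervertices are omitted from this sketch.
}
```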

SLIDE 10

Algorithms & HelP Primitives

Number of the six primitives (Filter, ANV, LUV, UVUOV, FS, AGV) used per algorithm:
  • PageRank: 2
  • HITS: 3
  • Conductance: 2
  • Approx. Betweenness Centrality: 3
  • Clustering Coefficient: 2
  • Semi-clustering: 3
  • Multi-level Clustering: 3
  • Approx. Maximum Weight Matching: 2
  • Random Bipartite Matching: 3
  • Weakly Connected Components: 2
  • Strongly Connected Components: 4
  • Single Source Shortest Paths: 2
  • Graph Coloring: 3
  • Maximal Independent Set: 3
  • K-core: 2
  • Triangle Counting: 1
  • Diameter Estimation: 3
  • K-truss: 1
  • Minimum Spanning Forest: 4

SLIDE 11

Example: Aggregate Neighbor Values

Vertices aggregate some or all of their neighbors’ values and update their own value with the aggregated result.

Version 1: Non-iterative => aggregateNeighborValues

Ex: PageRank

…
for (i = 0; i < 10; ++i) {
  g.aggregateNeighborValues(
    v -> true,                        /* aggregate at all vertices */
    nbr -> true,                      /* which neighbors to aggregate: all */
    nbr -> nbr.val.pr / nbr.degree,   /* value contributed by each neighbor */
    AggrFnc.SUM,                      /* how to combine the contributions */
    (v, sumPr) -> { v.val.pr = 0.85*sumPr + 0.15/g.numV; })
}
…

SLIDE 12

Version 2: Iterative => propagateAndAggregate

Continue aggregations until vertex values converge.

Ex: Weakly Connected Components

[Figure: label-propagation iterations of WCC on two components {1, 2, 3, 4, 5} and {7, 8, 9}; each vertex repeatedly adopts the maximum component ID among its neighbors until the labels converge to 5 and 9, respectively]

…
g.propagateAndAggregate(
  EdgeDirection.BOTH,
  v -> true,                  /* start propagation from all vertices */
  v -> v.val.wccID,           /* value to propagate and aggregate */
  AggrFnc.MAX,                /* keep the maximum component ID seen */
  (v, aggrWCCID) -> { v.val.wccID = aggrWCCID; })
…
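For comparison, here is a minimal sketch (not HelP's implementation) of the same max-label propagation written directly against GraphX's Pregel operator; this is roughly the iterate-until-convergence loop that a single propagateAndAggregate call packages up.

```scala
import scala.reflect.ClassTag
import org.apache.spark.graphx._

// Sketch only: weakly connected components by max-label propagation, written
// directly against GraphX's Pregel operator. The default active direction
// (Either) corresponds to the slide's EdgeDirection.BOTH.
def wcc[V, E: ClassTag](graph: Graph[V, E]): Graph[VertexId, E] = {
  // Start every vertex with its own ID as its component label.
  val init = graph.mapVertices((id, _) => id)

  Pregel(init, Long.MinValue)(
    // Keep the largest label seen so far (AggrFnc.MAX on the slide).
    (_, label, incoming) => math.max(label, incoming),
    // Propagate a label across an edge only if it improves the other endpoint.
    triplet => {
      if (triplet.srcAttr > triplet.dstAttr)
        Iterator((triplet.dstId, triplet.srcAttr))
      else if (triplet.dstAttr > triplet.srcAttr)
        Iterator((triplet.srcId, triplet.dstAttr))
      else
        Iterator.empty
    },
    // Merge concurrent messages by taking the maximum.
    (a, b) => math.max(a, b))
}
```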

SLIDE 13

Related Work (see paper)

  • Vertex-centric APIs
  • MapReduce-based APIs
  • Higher-Level Data Analysis Languages
  • Domain-Specific Graph Languages
  • MPI-based Libraries

SLIDE 14

GraphX Implementation, Limitations, Future Work


See Our Paper & Poster!

SLIDE 15


Questions?

SLIDE 16

GraphX Implementation (Non-iterative Version)

EdgesRDD:
v1.ID | v2.ID | e1
v1.ID | v3.ID | e2
v2.ID | v3.ID | e3
v3.ID | v1.ID | e4
v4.ID | v2.ID | e5
v4.ID | v1.ID | e6

VerticesRDD:
v1.ID | v1.val
v2.ID | v2.val
v3.ID | v3.val
v4.ID | v4.val

mapReduceTriplets (join + map + reduceBy) over EdgesRDD and VerticesRDD produces

MessagesRDD:
v1.ID | aggrMsg1
v2.ID | aggrMsg2
v3.ID | aggrMsg3

join with VerticesRDD produces

VerticesMsgsRDD:
v1.ID | v1.val | aggrMsg1
v2.ID | v2.val | aggrMsg2
v3.ID | v3.val | aggrMsg3
v4.ID | v4.val | aggrMsg4

map produces

NewVerticesRDD:
v1.ID | v1.newval
v2.ID | v2.newval
v3.ID | v3.newval
v4.ID | v4.newval

Replace VerticesRDD with NewVerticesRDD in the graph.
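A minimal sketch of this dataflow in Scala on GraphX (not the authors' code): aggregateMessages, GraphX's current name for the mapReduceTriplets step shown above, builds MessagesRDD, and outerJoinVertices performs the join + map that produces NewVerticesRDD. The Double vertex-value type, the SUM aggregation, and the update callback are illustrative assumptions.

```scala
import scala.reflect.ClassTag
import org.apache.spark.graphx._

// Sketch of the non-iterative dataflow shown above (not HelP's actual code):
//   EdgesRDD + VerticesRDD --aggregateMessages--> MessagesRDD
//   MessagesRDD join VerticesRDD --map--> NewVerticesRDD, which replaces VerticesRDD.
def aggregateNeighborValuesOnce[E: ClassTag](
    g: Graph[Double, E],                 // vertex value: a single Double, for simplicity
    neighborValue: Double => Double,     // value each neighbor contributes
    update: (Double, Double) => Double   // combine old value with the aggregate
): Graph[Double, E] = {
  // mapReduceTriplets step: map over edge triplets, reduce (here: SUM) per destination vertex.
  val messages: VertexRDD[Double] = g.aggregateMessages[Double](
    ctx => ctx.sendToDst(neighborValue(ctx.srcAttr)),
    _ + _)
  // join + map step: attach each vertex's aggregated message and compute its new value;
  // vertices that received no message keep their old value.
  g.outerJoinVertices(messages) { (_, oldVal, aggrOpt) =>
    aggrOpt.map(a => update(oldVal, a)).getOrElse(oldVal)
  }
}
```

An iterative primitive such as propagateAndAggregate would repeat this step until no vertex value changes.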