X-Stream: Edge-centric Graph Processing using Streaming Partitions - PowerPoint PPT Presentation

X-Stream: Edge-centric Graph Processing using Streaming Partitions
Amitabha Roy, Ivo Mihailovic, Willy Zwaenepoel

Outline: Context, Approach, Model, Implementation, Results & Conclusion

Pregel & PowerGraph: scatter & gather
→ A vertex-centric scatter-gather methodology:
1. scatter(vertex v): send updates over the outgoing edges of v
2. gather(vertex v): apply updates from the inbound edges of v
→ How can this scale up on a single machine?
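A minimal, single-threaded sketch of the vertex-centric scatter-gather model, using min-label propagation (connected components on a symmetric edge list) as an example algorithm. The adjacency-list layout and the function name are illustrative assumptions, not the Pregel or PowerGraph API. The point to notice: scatter walks each vertex's outgoing edges, so the engine must first index the edges by vertex.

```python
from collections import defaultdict


def vertex_centric_components(num_vertices, edges):
    """Min-label propagation with vertex-centric scatter/gather (sketch)."""
    # A vertex-centric engine needs edges indexed by vertex: build adjacency lists.
    out_edges = defaultdict(list)
    for src, dst in edges:
        out_edges[src].append(dst)

    label = list(range(num_vertices))  # every vertex starts with its own id
    active = set(range(num_vertices))

    while active:
        # scatter(vertex v): send v's label over each outgoing edge of v
        inbox = defaultdict(list)
        for v in active:
            for w in out_edges[v]:
                inbox[w].append(label[v])

        # gather(vertex v): apply the updates that arrived on v's inbound edges
        active = set()
        for v, updates in inbox.items():
            best = min(updates)
            if best < label[v]:
                label[v] = best
                active.add(v)
    return label


edges = [(0, 1), (1, 0), (1, 2), (2, 1), (3, 4), (4, 3)]
print(vertex_centric_components(5, edges))  # → [0, 0, 0, 3, 3]
```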

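Trade-off: sequential vs. random access → sequential access delivers much higher bandwidth than random access on every storage medium (main memory, SSD, disk)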

GraphChi: a sequential approach
→ avoids random access by splitting the graph into shards
Problems:
1. requires a pre-processing step that sorts the graph by source vertex
2. remains vertex-centric
3. requires re-sorting edges by destination vertex for the gather step

X-Stream's approach:
1. retain the scatter-gather programming model
2. use an edge-centric implementation
3. stream unordered edge lists
Gains:
1. sequential (not random) access
2. no pre-processing (sorting) step

Scatter-gather: an edge-centric implementation
→ scatter(edge e): send an update over e
→ gather(update u): apply update u to u.destination
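The same min-label propagation, expressed edge-centrically. This is an illustrative sketch, not X-Stream's actual API: `scatter` takes one edge and may emit an `Update`, and `gather` applies one update to its destination. Neither needs an adjacency index or sorted edges; an iteration is one sequential pass over the unordered edge list followed by one pass over the update list.

```python
from typing import NamedTuple, Optional


class Update(NamedTuple):
    destination: int
    value: int


def scatter(edge, label) -> Optional[Update]:
    """scatter(edge e): send an update over e if it can lower the destination's label."""
    src, dst = edge
    if label[src] < label[dst]:
        return Update(dst, label[src])
    return None


def gather(update: Update, label) -> bool:
    """gather(update u): apply u to u.destination; report whether state changed."""
    if update.value < label[update.destination]:
        label[update.destination] = update.value
        return True
    return False


def one_iteration(edges, label) -> bool:
    # Scatter phase: one sequential pass over the unordered edge list.
    updates = [u for u in (scatter(e, label) for e in edges) if u is not None]
    # Gather phase: one sequential pass over the update list.
    changed = False
    for u in updates:
        changed |= gather(u, label)
    return changed


label = list(range(5))
edges = [(2, 1), (1, 0), (0, 1), (1, 2), (4, 3), (3, 4)]  # any order works
while one_iteration(edges, label):
    pass
print(label)  # → [0, 0, 0, 3, 3]
```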

Quick terminology
Fast storage:
→ CPU caches (in-memory setting)
→ main memory (out-of-core setting)
Slow storage:
→ main memory (in-memory setting)
→ SSD/disk (out-of-core setting)

The basic model:
→ Input: an unordered set of directed edges
→ Each iteration applies scatter to every edge, then gather to every resulting update
→ API: the user supplies the scatter and gather implementations
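The basic model amounts to a loop that alternates a scatter pass over all edges with a gather pass over all updates, stopping once an iteration produces no updates. The generic driver below is a hypothetical sketch (the `run` function and its callback signatures are assumptions, not X-Stream's API): the user supplies only the two callbacks, and the driver only ever streams the edge list and the update list front to back. The example callbacks compute BFS levels from vertex 0.

```python
import math
from typing import Callable, Iterable, List, Optional, Tuple

Edge = Tuple[int, int]
Update = Tuple[int, float]  # (destination vertex, proposed value)


def run(edges: Iterable[Edge],
        state: List[float],
        scatter: Callable[[Edge, List[float]], Optional[Update]],
        gather: Callable[[Update, List[float]], None]) -> int:
    """Alternate scatter and gather passes until no updates are produced.

    Returns the number of iterations that produced updates.
    """
    edges = list(edges)  # stands in for an edge file re-streamed every iteration
    iterations = 0
    while True:
        # Scatter: stream every edge once, in whatever order it is stored.
        updates = [u for u in (scatter(e, state) for e in edges) if u is not None]
        if not updates:
            return iterations
        # Gather: stream the update list once.
        for u in updates:
            gather(u, state)
        iterations += 1


def bfs_scatter(edge, dist):
    src, dst = edge
    if dist[src] + 1 < dist[dst]:  # math.inf + 1 is still math.inf
        return (dst, dist[src] + 1)
    return None


def bfs_gather(update, dist):
    dst, d = update
    dist[dst] = min(dist[dst], d)


dist = [0, math.inf, math.inf, math.inf, math.inf]  # source vertex 0
edges = [(2, 3), (0, 1), (1, 2), (0, 2), (3, 4)]    # unordered edge list
rounds = run(edges, dist, bfs_scatter, bfs_gather)
print(dist, rounds)  # → [0, 1, 1, 2, 3] 3
```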


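Problem: the vertices may not fit in fast storage
→ Streaming partitions:
- vertex set V: a subset of the graph's vertices (the vertex sets of different partitions are disjoint and together cover the graph)
- edge list: edges whose source ∈ V
- update list: updates whose destination ∈ V
→ How are they used?
1. scatter and gather iterate over the streaming partitions
2. updates must be shuffled to the partition that owns their destination vertex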
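A sketch of iterating over streaming partitions, under simplifying assumptions: vertex v belongs to partition v % K (a hypothetical assignment; any disjoint split works), each partition's edge list holds the edges whose source is in its vertex set, and the shuffle routes each update to the update list of the partition owning its destination. Scatter reads only source-vertex state, so it emits an update for every edge and leaves the comparison to gather. The sketch keeps all state in one Python list; it shows the access pattern only and does not model fast vs. slow storage.

```python
from collections import defaultdict


def partition_of(v, k):
    # Hypothetical vertex-to-partition assignment; only disjointness matters here.
    return v % k


def build_edge_lists(edges, k):
    """Edge list of partition p: every edge whose source is in p's vertex set."""
    edge_lists = defaultdict(list)
    for src, dst in edges:
        edge_lists[partition_of(src, k)].append((src, dst))
    return edge_lists


def iterate(edge_lists, label, k):
    """One scatter/shuffle/gather iteration of min-label propagation."""
    # Scatter, partition by partition: only source-vertex state is read, and
    # every source belongs to the partition currently being streamed.
    produced = [(dst, label[src]) for p in range(k) for src, dst in edge_lists[p]]
    # Shuffle: move each update to the update list of its destination's partition.
    update_lists = defaultdict(list)
    for dst, value in produced:
        update_lists[partition_of(dst, k)].append((dst, value))
    # Gather, partition by partition: every destination is in partition p.
    changed = False
    for p in range(k):
        for dst, value in update_lists[p]:
            if value < label[dst]:
                label[dst] = value
                changed = True
    return changed


K = 2
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (3, 4), (4, 3)]
label = list(range(5))
edge_lists = build_edge_lists(edges, K)
while iterate(edge_lists, label, K):
    pass
print(label)  # → [0, 0, 0, 3, 3]
```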

Stream buffer: an index array (K entries, one per streaming partition) describing the chunks stored in a chunk array
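A sketch of one way updates could be laid out in a stream buffer: a chunk array holding the updates grouped by destination partition, plus an index array with K entries recording where each partition's chunk starts and how many updates it holds. Filling it with a two-pass counting sort (count per partition, prefix-sum into offsets, then place) is an assumption made for illustration, and the `StreamBuffer` and `IndexEntry` names are hypothetical.

```python
from typing import List, NamedTuple


class IndexEntry(NamedTuple):
    offset: int  # start of this partition's chunk in the chunk array
    size: int    # number of updates in the chunk


class StreamBuffer:
    """Hypothetical stream buffer: a chunk array plus a K-entry index array."""

    def __init__(self, k: int):
        self.k = k
        self.chunk_array: list = []
        self.index_array: List[IndexEntry] = [IndexEntry(0, 0)] * k

    def fill(self, updates, partition_of):
        """Shuffle (dst, value) updates into contiguous per-partition chunks."""
        # Pass 1: count the updates destined for each partition.
        counts = [0] * self.k
        for dst, _ in updates:
            counts[partition_of(dst)] += 1
        # Prefix sums give each chunk's starting offset.
        offset = 0
        for p in range(self.k):
            self.index_array[p] = IndexEntry(offset, counts[p])
            offset += counts[p]
        # Pass 2: place every update at the next free slot of its chunk.
        self.chunk_array = [None] * offset
        cursor = [entry.offset for entry in self.index_array]
        for dst, value in updates:
            p = partition_of(dst)
            self.chunk_array[cursor[p]] = (dst, value)
            cursor[p] += 1

    def chunk(self, p: int):
        """All updates for partition p, read as one contiguous slice."""
        entry = self.index_array[p]
        return self.chunk_array[entry.offset:entry.offset + entry.size]


buf = StreamBuffer(k=3)
buf.fill([(4, 10), (0, 11), (5, 12), (3, 13), (1, 14)], partition_of=lambda v: v % 3)
print(buf.chunk(1))  # → [(4, 10), (1, 14)]
```

Because each partition's updates end up contiguous, gather for partition p can read them with a single sequential scan.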

Out-of-core:
→ folds the shuffle into scatter
● run scatter, appending updates to an in-memory buffer
● when the buffer is full: run an in-memory shuffle
→ 2 stream buffers
→ number of partitions K: N/K + 5SK ≤ M
→ disk I/O
In-memory:
→ parallel multi-stage shuffler & scatter/gather
● stream independently for each streaming partition
● work stealing
● group partitions together into a tree for the shuffler
→ 3 stream buffers
→ number of partitions: derived from CPU_cache_size / footprint, so that each partition's vertex data fits in the CPU cache
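The out-of-core constraint N/K + 5SK ≤ M can be turned into a quick feasibility check. The sketch below assumes N is the total size of vertex state, K the number of streaming partitions, S the I/O unit (chunk) size and M the main memory, all in bytes; these readings and the example sizes are assumptions for illustration, and the factor 5 is taken from the slide as given. Multiplying through by K gives a quadratic in K, so the feasible K form an interval; the left-hand side is smallest near K = sqrt(N / (5S)).

```python
import math


def lhs(k, n, s):
    """Left-hand side of N/K + 5SK <= M."""
    return n / k + 5 * s * k


def feasible_partition_range(n, s, m):
    """Return (k_min, k_max) for integer K with N/K + 5SK <= M, or None."""
    # Multiply through by K: 5S*K^2 - M*K + N <= 0, a quadratic in K.
    disc = m * m - 20 * s * n
    if disc < 0:
        return None  # even the best K needs more than M bytes
    root = math.sqrt(disc)
    lo = max(1, math.ceil((m - root) / (10 * s)))
    hi = math.floor((m + root) / (10 * s))
    # Nudge both ends to correct any floating-point rounding of the roots.
    while lo > 1 and lhs(lo - 1, n, s) <= m:
        lo -= 1
    while lo <= hi and lhs(lo, n, s) > m:
        lo += 1
    while lhs(hi + 1, n, s) <= m:
        hi += 1
    while hi >= lo and lhs(hi, n, s) > m:
        hi -= 1
    return (lo, hi) if lo <= hi else None


GiB, MiB = 2**30, 2**20
N, S, M = 64 * GiB, 16 * MiB, 16 * GiB  # hypothetical sizes
k_min, k_max = feasible_partition_range(N, S, M)
best = min(range(k_min, k_max + 1), key=lambda k: lhs(k, N, S))
print(k_min, k_max, best)  # → 5 200 29
```

With these made-up sizes, fewer than 5 partitions would not leave room for one partition's vertex set, and more than 200 would let the buffer term 5SK alone approach M.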

Chaos: the extension of X-Stream
→ scales out to multiple machines in one cluster
→ Two gains:
1. accessing secondary storage in parallel improves performance
2. increases the size of graph that can be handled

Chaos: the extension of X-Stream
→ Steps:
1. simple initial partitioning
2. spread graph data uniformly over all secondary storage devices
3. work stealing
Assumptions:
1. machine-to-machine network bandwidth > bandwidth of a storage device
2. network switch bandwidth > aggregate bandwidth of all storage devices in the cluster
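A toy illustration of step 2 and the two assumptions. Chunks of graph data are spread over the storage devices by round-robin, which is only our stand-in for "uniformly"; the slide does not say how Chaos places chunks. The bandwidth check assumes one storage device per machine, and the `Cluster` fields and all figures are hypothetical, not measurements.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cluster:
    machines: int
    storage_bw: float  # bandwidth of one storage device (MB/s), one device per machine
    link_bw: float     # machine-to-machine network bandwidth (MB/s)
    switch_bw: float   # total bandwidth of the network switch (MB/s)

    def assumptions_hold(self) -> bool:
        # 1. a network link is faster than a single storage device
        per_link = self.link_bw > self.storage_bw
        # 2. the switch outpaces all storage devices of the cluster combined
        aggregate = self.switch_bw > self.machines * self.storage_bw
        return per_link and aggregate


def spread_uniformly(num_chunks: int, machines: int) -> Dict[int, List[int]]:
    """Assign chunk ids round-robin so each device stores an equal share (within 1)."""
    placement: Dict[int, List[int]] = {m: [] for m in range(machines)}
    for chunk in range(num_chunks):
        placement[chunk % machines].append(chunk)
    return placement


cluster = Cluster(machines=4, storage_bw=500.0, link_bw=1250.0, switch_bw=10000.0)
print(cluster.assumptions_hold())  # → True
print(spread_uniformly(10, cluster.machines))
# → {0: [0, 4, 8], 1: [1, 5, 9], 2: [2, 6], 3: [3, 7]}
```

When the assumptions hold, reading a chunk from a remote device costs roughly the same as reading it locally, which is what lets any machine steal work stored on any other.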

Experiments: → Tested on real-world graphs.

Scalability

Comparison

Comparison: Ligra

Comparison: GraphChi

Conclusion & takeaway
Strengths:
→ sequential access
→ scales up (X-Stream) and scales out (Chaos)
Weaknesses:
→ limited number of problems it can handle
→ limited types of graphs it can handle
→ How would you use it in a real-world scenario?
