Memory-Optimized Distributed Graph Processing through Novel Compression Techniques
Katia Papakonstantinopoulou
Joint work with Panagiotis Liakos and Alex Delis
University of Athens
Athens Colloquium in Algorithms and Complexity, UoA, August 26th, 2016

Motivation
Graph data whose size continuously grows
⇓
distributed graph processing systems (e.g., Pregel and Apache Giraph)
The scale of real-world graphs makes graph processing hard even in distributed environments
⇓
we need efficient distributed representations of such graphs
⇓
We address this problem by exploiting empirically observed properties of behavior graphs.
UoA, Katia Papakonstantinopoulou, Distributed Graph Compression (Intro: Motivation, slide 2/19)

Distributed graph-processing approaches and systems
Most of these approaches adopt the "think like a vertex" programming paradigm introduced with Pregel [5] (⇒ intuitive parallelizable algorithms).
A graph partitioned on a vertex basis in a distributed environment:
[Figure: an example graph whose vertices, each with its list of out-neighbors, are distributed across Worker 1, Worker 2 and Worker 3]
The proposed frameworks [1, 6, 4, 7] fail to handle the huge scale of real-world graphs, as a result of ineffective memory usage [2].
Partitioning makes the task of compression harder.
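As a rough illustration of vertex-based partitioning, the sketch below assigns each vertex, together with its out-neighbor list, to exactly one worker. The modulo assignment rule, the function name `partition_by_vertex` and the toy adjacency lists are assumptions made for this example; the slide does not state which partitioning function is used.

```python
from collections import defaultdict


def partition_by_vertex(adjacency, num_workers):
    """Assign every vertex (with its out-neighbor list) to one worker.

    The modulo rule is an assumption for illustration only.
    """
    workers = defaultdict(dict)
    for vertex, out_neighbors in adjacency.items():
        workers[vertex % num_workers][vertex] = out_neighbors
    return dict(workers)


# Hypothetical toy graph: vertex id -> sorted out-neighbor ids.
adjacency = {1: [2, 3], 2: [1], 3: [1, 2, 4], 4: [3], 5: [1, 4], 6: [2]}
workers = partition_by_vertex(adjacency, num_workers=3)

for worker_id in sorted(workers):
    print(worker_id, workers[worker_id])
```

Note that a vertex's out-neighbors may live on a different worker, so a worker sees only the adjacency lists of its own vertices.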

Indicative memory optimization approaches
Distributed graph-processing systems face significant memory-related issues. Some memory optimization approaches are:
• Apache Giraph [1] (a graph processing system that follows Pregel), with contributions by Facebook: focuses entirely on a more careful implementation of the representation of a vertex's out-edges, without exploiting the redundancy in real-world graphs.
• Gbase [3] (a number of alternative compression techniques to reduce storage and hence network traffic and query execution time): does not follow the vertex-centric model and requires decompression.

Contribution
We follow the Pregel paradigm and partition the graph vertices among the nodes of a distributed computing environment. In this context, we present a number of novel techniques that:
1 offer space-efficient representations of the out-edges of vertices,
2 allow fast in-situ mining of the graph elements without the need for decompression,
3 enable the execution of graph algorithms in memory-constrained settings, and
4 ease the task of memory management, thus allowing faster execution.
Our work lies at the intersection of distributed graph processing systems and compressed graph representations.

Distributed vs non-distributed settings
In non-distributed settings, we can exploit the fact that vertices tend to exhibit similarities (copy property). With vertex-based partitioning, a similar vertex may reside on another worker, which makes this property hard to exploit.
To achieve memory optimization, we need representations that allow mining of the graph's elements without decompression.

Giraph structures for out-neighbors
Giraph's adjacency-list representations:
ByteArrayEdges: vertex id 1 (size 1) | weight 1 (size 2) | vertex id 2 (size 1) | weight 2 (size 2) | ...
The bytes required per out-neighbor are determined by the data type used for its id and weight; for integer numbers, 4 + 4 = 8 bytes are required.
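The 8-bytes-per-out-neighbor figure can be checked with a small sketch that serializes (id, weight) pairs as consecutive 4-byte integers. This is not Giraph's actual code; the function names, the big-endian byte order and the placeholder weight of 1 are assumptions for illustration.

```python
import struct


def pack_edges(edges):
    """Serialize (target_id, weight) pairs as consecutive 4-byte integers."""
    buf = bytearray()
    for target_id, weight in edges:
        # 4 bytes for the id + 4 bytes for the weight = 8 bytes per out-neighbor
        buf += struct.pack(">ii", target_id, weight)
    return bytes(buf)


def unpack_edges(buf):
    """Read the (target_id, weight) pairs back from the byte array."""
    return [struct.unpack_from(">ii", buf, off) for off in range(0, len(buf), 8)]


neighbors = [2, 9, 10, 11, 12, 14, 17, 18, 20, 127]
packed = pack_edges((v, 1) for v in neighbors)  # weight 1 is a placeholder
print(len(packed), "bytes for", len(neighbors), "out-neighbors")
```

For the ten-neighbor sequence used on the following slides, this flat layout takes 10 × 8 = 80 bytes.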

Representations based on consecutive out-edges
Consider the following sequence of neighbors to be represented: (2, 9, 10, 11, 12, 14, 17, 18, 20, 127).
BVEdges
number of intervals: (1)_2, 4 bytes
interval: (9)_2, 4 bytes, followed by γ(0)
residuals: ζ(13) ζ(11) ζ(2) ζ(0) ζ(1) ζ(106)
In the context of graph compression, Elias' γ coding is preferred for rather small values of x, whereas ζ coding is better suited to potentially large values.
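A minimal sketch of the interval/residual split behind this layout follows. It assumes a minimum interval length of 4, which is consistent with the single interval starting at 9 and with the residual gaps 11, 2, 0, 1, 106 shown on the slide. The first residual code, ζ(13), depends on the id of the source vertex, which the slide does not give, so it is not reproduced here. The function names are hypothetical, and `elias_gamma` implements the textbook γ code for x ≥ 1; coding 0, as in γ(0), would require shifting the value first, which is also an assumption.

```python
def split_intervals(neighbors, min_len=4):
    """Split a sorted neighbor list into (start, length) intervals and residuals."""
    intervals, residuals = [], []
    i = 0
    while i < len(neighbors):
        j = i
        # Extend the run while ids are consecutive.
        while j + 1 < len(neighbors) and neighbors[j + 1] == neighbors[j] + 1:
            j += 1
        run = j - i + 1
        if run >= min_len:
            intervals.append((neighbors[i], run))
        else:
            residuals.extend(neighbors[i:j + 1])
        i = j + 1
    return intervals, residuals


def residual_gaps(residuals):
    """Gaps between successive residuals, minus one (the first residual is skipped)."""
    return [b - a - 1 for a, b in zip(residuals, residuals[1:])]


def elias_gamma(x):
    """Textbook Elias gamma code of an integer x >= 1, as a bit string."""
    if x < 1:
        raise ValueError("Elias gamma is defined for x >= 1")
    binary = bin(x)[2:]
    return "0" * (len(binary) - 1) + binary


neighbors = [2, 9, 10, 11, 12, 14, 17, 18, 20, 127]
intervals, residuals = split_intervals(neighbors)
print(intervals, residuals, residual_gaps(residuals))
```

Small values get short γ codes (`elias_gamma(1)` is a single bit), which is why γ suits small quantities such as interval lengths.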

Representations based on consecutive out-edges
Consider again the sequence of neighbors: (2, 9, 10, 11, 12, 14, 17, 18, 20, 127).
IntervalResidualEdges
number of intervals: (2)_2, 4 bytes
1st interval: (9)_2 (4)_2, 4+1 bytes
2nd interval: (17)_2 (2)_2, 4+1 bytes
residuals: (2)_2 (14)_2 (20)_2 (127)_2, 4 bytes each
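The byte layout above can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: the minimum interval length of 2 is inferred from the interval (17, 2), and the function names, the big-endian byte order and the cap of 255 on interval length (so that it fits in one byte) are assumptions.

```python
import struct


def split_runs(neighbors, min_len=2, max_len=255):
    """Split a sorted neighbor list into (start, length) intervals and residuals."""
    intervals, residuals = [], []
    i = 0
    while i < len(neighbors):
        j = i
        while (j + 1 < len(neighbors)
               and neighbors[j + 1] == neighbors[j] + 1
               and j - i + 1 < max_len):
            j += 1
        run = j - i + 1
        if run >= min_len:
            intervals.append((neighbors[i], run))
        else:
            residuals.extend(neighbors[i:j + 1])
        i = j + 1
    return intervals, residuals


def encode(neighbors):
    """4-byte interval count, then (4-byte start, 1-byte length) pairs, then 4-byte residuals."""
    intervals, residuals = split_runs(neighbors)
    buf = bytearray(struct.pack(">i", len(intervals)))
    for start, length in intervals:
        buf += struct.pack(">iB", start, length)
    for residual in residuals:
        buf += struct.pack(">i", residual)
    return bytes(buf)


def decode(buf):
    """Recover the sorted neighbor list from the encoded bytes."""
    (count,) = struct.unpack_from(">i", buf, 0)
    offset, out = 4, []
    for _ in range(count):
        start, length = struct.unpack_from(">iB", buf, offset)
        out.extend(range(start, start + length))
        offset += 5
    while offset < len(buf):
        out.append(struct.unpack_from(">i", buf, offset)[0])
        offset += 4
    return sorted(out)


neighbors = [2, 9, 10, 11, 12, 14, 17, 18, 20, 127]
encoded = encode(neighbors)
print(len(encoded), "bytes")
```

Under these assumptions the example sequence takes 4 + 5 + 5 + 4 × 4 = 30 bytes, and every field sits at a byte boundary, so neighbors can be read directly from the array.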

Representation based on concentration of out-edges
Again using the sequence: (2, 9, 10, 11, 12, 14, 17, 18, 20, 127).
IndexedBitArrayEdges
entries: (0)_2, (1)_2, (2)_2, ..., (15)_2
each entry: 4 bytes followed by 1 byte
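One plausible reading of this layout is that each entry stores a 4-byte block index (neighbor id divided by 8) followed by a 1-byte bit array flagging which of the 8 ids in that block are neighbors. The sketch below follows that reading; the function names, the bit order within the byte and the big-endian byte order are assumptions, not details confirmed by the slide.

```python
import struct


def encode_indexed_bits(neighbors):
    """One (4-byte block index, 1-byte bit array) entry per occupied block of 8 ids."""
    blocks = {}
    for v in neighbors:
        blocks[v // 8] = blocks.get(v // 8, 0) | (1 << (v % 8))
    return b"".join(struct.pack(">iB", idx, bits) for idx, bits in sorted(blocks.items()))


def block_indices(buf):
    """Block indices stored in the encoded array."""
    return [struct.unpack_from(">i", buf, off)[0] for off in range(0, len(buf), 5)]


def has_edge(buf, target):
    """Test membership directly on the encoded bytes, without rebuilding the list."""
    for off in range(0, len(buf), 5):
        idx, bits = struct.unpack_from(">iB", buf, off)
        if idx == target // 8:
            return bool(bits & (1 << (target % 8)))
    return False


neighbors = [2, 9, 10, 11, 12, 14, 17, 18, 20, 127]
encoded = encode_indexed_bits(neighbors)
print(block_indices(encoded), len(encoded), "bytes")
```

With this reading, the example sequence occupies blocks 0, 1, 2 and 15, matching the indices on the slide, and neighbors that are concentrated in the same block share a single 5-byte entry.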

Experimental Evaluation
• How much more space-efficient is each of our three compressed out-edge representations compared to ByteArrayEdges?
• Are our techniques competitive speed-wise when memory is not a concern?
• How much more efficient are our compressed representations when the available memory is constrained?
• Can we execute algorithms for large graphs in settings where it was not possible before?

Space Efficiency Comparison
Memory requirements (in MB) of Giraph's ByteArrayEdges and our three representations for the small and large-scale graphs of our dataset:
graph                 ByteArrayEdges   BVEdges    IntervalResidualEdges   IndexedBitArrayEdges
uk-2007-05@100000              22.61      6.41                     7.92                   8.91
uk-2007-05@1000000            279.16     67.36                    82.7                   97.79
indochina-2004              1,511.67    442.34                   646.03                 554.23
hollywood-2011              1,381.91    287.53                   613.52                 676.88
uk-2002                     2,733.6   1,092.82                 1,224.07               1,255.67
arabic-2005                 4,820.09  1,428.97                 1,674.75               1,849.83
uk-2005                     7,401.88  2,383.54                 2,728.74               2,928.81
sk-2005                    14,829.64  4,889.85                 5,657.79               6,354.17
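To make the table easier to read, the following sketch computes, for each graph, how many times smaller each compressed representation is than ByteArrayEdges. The numbers are copied from the table; the dictionary layout and the helper name `reduction_factors` are assumptions introduced for this example.

```python
# Memory in MB: (ByteArrayEdges, BVEdges, IntervalResidualEdges, IndexedBitArrayEdges)
memory_mb = {
    "uk-2007-05@100000": (22.61, 6.41, 7.92, 8.91),
    "uk-2007-05@1000000": (279.16, 67.36, 82.7, 97.79),
    "indochina-2004": (1511.67, 442.34, 646.03, 554.23),
    "hollywood-2011": (1381.91, 287.53, 613.52, 676.88),
    "uk-2002": (2733.6, 1092.82, 1224.07, 1255.67),
    "arabic-2005": (4820.09, 1428.97, 1674.75, 1849.83),
    "uk-2005": (7401.88, 2383.54, 2728.74, 2928.81),
    "sk-2005": (14829.64, 4889.85, 5657.79, 6354.17),
}
names = ("BVEdges", "IntervalResidualEdges", "IndexedBitArrayEdges")


def reduction_factors(row):
    """Ratio of the ByteArrayEdges footprint to each compressed footprint."""
    baseline, *compressed = row
    return {name: baseline / size for name, size in zip(names, compressed)}


# Largest reduction over all graphs and representations.
best = max(
    ((graph, name, factor)
     for graph, row in memory_mb.items()
     for name, factor in reduction_factors(row).items()),
    key=lambda item: item[2],
)

for graph, row in memory_mb.items():
    print(graph, {n: round(f, 2) for n, f in reduction_factors(row).items()})
print("largest reduction:", best[0], best[1], round(best[2], 2))
```

The largest factor in the table is for hollywood-2011 with BVEdges, at roughly 4.8 times smaller than ByteArrayEdges.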

Execution Time Comparison (small-scale graphs)
Execution time of the PageRank algorithm for graph indochina-2004 using a setup of 2, 4, and 8 workers:
[Figure: bar chart of execution time (in minutes) for ByteArrayEdges, BVEdges, IntervalResidualEdges and IndexedBitArrayEdges with 2, 4 and 8 workers]

Execution Time Comparison (large-scale graphs)
Execution time for each superstep of the PageRank algorithm for graph uk-2005 using 5 workers:
[Figure: execution time (in minutes) per superstep of PageRank execution for ByteArrayEdges, BVEdges, IntervalResidualEdges and IndexedBitArrayEdges]
ByteArrayEdges performance fluctuates due to extensive garbage collection.

Execution Time Comparison (large-scale graphs)
Execution time of the PageRank algorithm for graph uk-2005:
[Figure: bar chart of execution time (in minutes) for ByteArrayEdges, BVEdges, IntervalResidualEdges and IndexedBitArrayEdges on uk-2005 with 5 workers and with 4 workers; one entry is marked FAILED]
IntervalResidualEdges and IndexedBitArrayEdges outperform ByteArrayEdges.

Conclusion
• Our experimental results indicate significant improvements in space efficiency for all proposed techniques. We reduced memory requirements by up to 5 times in comparison with currently applied techniques.
• In settings where earlier approaches were able to execute graph algorithms, we achieve significant performance improvements. We reduced execution time by up to 31% due to memory optimization.
• These findings establish our structures as the preferable option for web graphs, or any other type of behavior graph.

Future Directions
• Design representation methods that favor mutations of the graph.
• Examine the compression of edge weights.
