M-Flash: Fast Billion-Scale Graph Computation Using a Bimodal Block Processing Model (PowerPoint PPT Presentation)



SLIDE 1

M-Flash:

Fast Billion-Scale Graph Computation Using a Bimodal Block Processing Model

Hugo Gualdron, University of Sao Paulo
Robson Cordeiro, University of Sao Paulo
Jose Rodrigues-Jr, University of Sao Paulo
Duen Horng (Polo) Chau, Georgia Tech
Minsuk Kahng, Georgia Tech
U Kang, Seoul National University
Dezhi "Andy" Fang, Georgia Tech (Presenter)

SLIDE 2


Internet

4+ Billion Web Pages

Sources: www.worldwidewebsize.com, www.opte.org

SLIDE 3

Citation Network

250+ Million Articles

Sources: www.scirus.com/press/html/feb_2006.html#2; modified from well-formed.eigenfactor.org

SLIDE 4

Many More

• Twitter: who-follows-whom (310 million monthly active users)
• Who-buys-what (300+ million users)
• Cellphone network: who-calls-whom (130+ million users)
• Protein-protein interactions: 200 million possible interactions in the human genome

Sources: www.selectscience.net, www.phonedog.com, www.mediabistro.com, www.practicalecommerce.com

SLIDE 5


Large Graphs Are Common

Graph                        Nodes        Edges
YahooWeb                     1.4 Billion  6 Billion
Symantec Machine-File Graph  1 Billion    37 Billion
Twitter                      104 Million  3.7 Billion
Phone call network           30 Million   260 Million

(The edges take most of the space.)

SLIDE 6

Scalable Graph Computation on Single Machines

PageRank runtime (s) on the Twitter graph (1.5 billion edges; 10 iterations; lower is better):

Single machine (4 cores): MMap 131, TurboGraph 198, GraphChi 1248
128 cores:                GraphX 209.5, Giraph 298, Spark 428.5

Today’s single machines are very powerful.

Can we do even better?

McSherry, Frank, Michael Isard, and Derek G. Murray. "Scalability! But at what COST?" 15th Workshop on Hot Topics in Operating Systems (HotOS XV), 2015.
Lin, Zhiyuan, et al. "MMap: Fast billion-scale graph computation on a PC via memory mapping." IEEE International Conference on Big Data, 2014.

SLIDE 7

M-Flash:

Fast Billion-Scale Graph Computation Using a Bimodal Block Processing Model


SLIDE 8

Our Observation #1: I/O Is the Bottleneck

Graph edges need to be stored on disk (e.g., the Symantec graph: 37 billion edges, 200+ GB). Disk access is much slower than RAM.

Goal: Reduce I/O, especially random accesses

SLIDE 9

Our Observation #2: Real-world graphs are sparse.

The adjacency matrix contains dense and sparse blocks.

[Figure: adjacency matrix with dense blocks and sparse blocks]

Source: https://web.stanford.edu/class/bios221/labs/networks/lab_7_networks.html

SLIDE 10

M-Flash’s Solutions

• 1. Determine edge block types (dense and sparse)
• 2. Design efficient processing approaches for each block type

SLIDE 11

Determine Block Types In Pre-processing

BlockType = Sparse, if (I/O cost if treated as Sparse) / (I/O cost if treated as Dense) < 1; Dense, otherwise

[Figure: example block grid with one Dense block and three Sparse blocks]

SLIDE 12

Dense Block Processing

(Assuming all blocks are dense)

New vertex values = Adjacency matrix × Old vertex values
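The dense-block computation can be sketched as a blocked matrix-vector multiply: the γ×γ grid of edge blocks is swept row by row, so each destination interval accumulates contributions from every source interval. The sketch below is only an in-memory illustration of this access pattern (M-Flash itself is a C++ system that streams blocks from disk); the block layout and all names are assumptions.

```python
# Illustrative sketch (not M-Flash's actual implementation): one
# iteration of a vertex update, with the adjacency matrix split into a
# gamma x gamma grid of edge blocks. Processing block (i, j) reads the
# old values of source interval j and accumulates into the new values
# of destination interval i -- the access pattern of dense block
# processing.

def dense_iteration(blocks, old_vals, gamma, num_vertices):
    """blocks maps (dest_interval, src_interval) -> list of (src, dst) edges."""
    new_vals = [0.0] * num_vertices
    for i in range(gamma):                   # destination interval (row)
        for j in range(gamma):               # source interval (column)
            for src, dst in blocks.get((i, j), []):
                new_vals[dst] += old_vals[src]
    return new_vals
```

For example, with 4 vertices split into 2 intervals of 2 vertices each, block (0, 1) holds the edges whose source lies in interval 1 and whose destination lies in interval 0.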

SLIDE 13

I/O Cost for Dense Block Processing

O( ((γ + 1)·|V| + |E|) / B + γ² )

Each vertex is read γ times and then written once. |E| = # edges; B = size of each I/O operation; γ = # intervals (= # rows = # columns); |V| = # vertices.

SLIDE 14

Sparse Block Processing

(Assuming all blocks are sparse)

[Figure: source partitions 1 and 2 streamed toward the destination; the source partition is read sequentially]

SLIDE 15

Sparse Block Processing

(Assuming all blocks are sparse)

[Figure: destination partitions 1 and 2; the destination partition is written sequentially]
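The two sequential phases shown on these slides (read the source partition sequentially, then write each destination partition sequentially) can be sketched as below. This is an in-memory illustration of the access pattern only, not the on-disk implementation; the function name and the bucketing scheme are assumptions.

```python
# Illustrative sketch of sparse block processing: edges are streamed
# once while source values are read sequentially; each edge is
# "extended" with its source vertex value and bucketed by destination
# partition, and each bucket is then applied in a sequential pass over
# its destination partition.

def sparse_pass(edges, old_vals, new_vals, num_partitions, interval_size):
    buckets = [[] for _ in range(num_partitions)]
    # Phase 1: sequential read of the source partition; extend edges
    # with their source values.
    for src, dst in edges:
        buckets[dst // interval_size].append((dst, old_vals[src]))
    # Phase 2: sequential write, one destination partition at a time.
    for bucket in buckets:
        for dst, val in bucket:
            new_vals[dst] += val
    return new_vals
```

The bucketing is what turns scattered destination updates into one sequential write per partition.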

SLIDE 16

I/O Cost for Sparse Block Processing

O( (2·|V| + |E| + 2·|E_extended|) / B + γ² )

|E| = # edges; B = size of each I/O operation; γ = # intervals (= # rows = # columns); |V| = # vertices; |E_extended| = # edges with extended information (each edge carrying its source vertex value).

SLIDE 17

Bimodal Block Processing

BlockType = Sparse, if (I/O cost if treated as Sparse) / (I/O cost if treated as Dense) < 1; Dense, otherwise

[Figure: example block grid with one Dense block and three Sparse blocks]
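In code, the bimodal rule amounts to estimating a block's I/O cost under both modes and picking the cheaper one. The two cost functions below mirror the O() terms from the preceding slides; applying them per block with unit-free constants is an assumption made for this sketch, as are all the names.

```python
# Sketch of the bimodal decision rule. Cost formulas follow the
# slides' O() terms; units and per-block application are illustrative.

def dense_cost(num_vertices, num_edges, gamma, io_unit):
    # Each vertex is read gamma times and written once; edges are read
    # once; gamma^2 accounts for per-block accesses.
    return ((gamma + 1) * num_vertices + num_edges) / io_unit + gamma ** 2

def sparse_cost(num_vertices, num_edges, num_ext_edges, gamma, io_unit):
    # Vertices are read and written once; edges are read once; extended
    # edges (edge + source value) are written once and read once.
    return (2 * num_vertices + num_edges + 2 * num_ext_edges) / io_unit + gamma ** 2

def block_type(num_vertices, num_edges, num_ext_edges, gamma, io_unit):
    ratio = (sparse_cost(num_vertices, num_edges, num_ext_edges, gamma, io_unit)
             / dense_cost(num_vertices, num_edges, gamma, io_unit))
    return "sparse" if ratio < 1 else "dense"
```

Intuitively, a block with few edges relative to its vertex intervals comes out sparse (streaming extended edges is cheap), while a heavily populated block comes out dense (re-reading vertex intervals amortizes well).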

SLIDE 18

Large Graphs Used in Evaluation

Graph              Nodes        Edges
LiveJournal        5 Million    69 Million
Twitter            41 Million   1.5 Billion
YahooWeb           1.4 Billion  6.6 Billion
R-Mat (Synthetic)  4 Billion    12 Billion

SLIDE 19

Runtime of M-Flash

[Bar chart: PageRank runtime (s) on the 6-billion-edge YahooWeb graph (1 iteration, shorter is better) at 16 GB, 8 GB, and 4 GB of memory, comparing M-Flash, MMap, TurboGraph, X-Stream, and GraphChi]

SLIDE 20

• Fastest single-node graph computing framework
• Innovative bimodal design that addresses varying edge density in real-world graphs

  • M-Flash Code: https://github.com/M-Flash/m-flash-cpp
  • MMap Project: http://poloclub.gatech.edu/mmap/

M-Flash:

Fast Billion-Scale Graph Computation Using a Bimodal Block Processing Model

Funding: CNPq (grant 444985/2014-0), Fapesp (grants 2016/02557-0, 2014/21483-2), Capes, NSF (grants IIS-1563816, TWC-1526254, IIS-1217559), NSF GRFP (grant DGE-1148903), Korean (MSIP) agency IITP (grant R0190-15-2012)

Dezhi “Andy” Fang

Georgia Tech CS Undergrad http://andyfang.me