


Analyzing the Graph-Processing Pipeline: A comparative study of GraphLab and GraphX
An open source project study
Presented by Niko Stahl for R212
Context
● GraphLab (execution engine: PowerGraph) is built exclusively for graph processing.
● GraphX is built on top of Spark.
Quick Intro: GraphX and Spark
What makes it competitive?
● Spark supports in-memory computation on clusters.
● The main abstraction is the RDD (Resilient Distributed Dataset).
● RDDs are fault tolerant: a lost partition can be recomputed from its lineage, the chain of transformations that produced it.
● Caching RDDs in memory can greatly speed up algorithms that reuse the same data across iterations (e.g. PageRank).
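To make the lineage and caching points concrete, here is a toy Python model. It is not Spark's API: `ToyRDD`, `map`, `cache`, `evict`, `compute_count` and `iterate` are hypothetical names for this sketch. Each dataset remembers how to rebuild itself from its parent; a loop that reads the same dataset repeatedly replays that lineage on every read unless the dataset is cached, and a lost (evicted) cached copy is simply rebuilt from lineage.

```python
from typing import Callable, List, Optional, Tuple


class ToyRDD:
    """Hypothetical toy model of an RDD (not Spark's API): a dataset defined by
    its lineage, i.e. a function that rebuilds it from its parent."""

    def __init__(self, compute: Callable[[], List[int]]) -> None:
        self._compute = compute                   # replays the lineage
        self._cached: Optional[List[int]] = None  # in-memory copy, if cached
        self._use_cache = False
        self.compute_count = 0                    # how often the lineage was replayed

    def map(self, f: Callable[[int], int]) -> "ToyRDD":
        # Child dataset whose lineage is "apply f to every element of self".
        return ToyRDD(lambda: [f(x) for x in self.collect()])

    def cache(self) -> "ToyRDD":
        self._use_cache = True
        return self

    def evict(self) -> None:
        # Simulate losing the in-memory copy (e.g. a failed worker).
        self._cached = None

    def collect(self) -> List[int]:
        if self._use_cache and self._cached is not None:
            return self._cached
        self.compute_count += 1
        data = self._compute()
        if self._use_cache:
            self._cached = data
        return data


def iterate(use_cache: bool, iterations: int = 10) -> Tuple[int, int]:
    """Read the same derived dataset once per iteration (as PageRank reuses its
    link structure) and report how often each lineage step was recomputed."""
    base = ToyRDD(lambda: list(range(5)))
    links = base.map(lambda x: x * 2)
    if use_cache:
        links.cache()
    for _ in range(iterations):
        links.collect()
    return base.compute_count, links.compute_count
```

`iterate(False)` replays both steps on every one of the 10 iterations and returns `(10, 10)`; `iterate(True)` computes each step once and returns `(1, 1)`. After `evict()`, the next `collect()` rebuilds the same data from lineage instead of failing, which is the fault-tolerance idea in miniature.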
Context
● GraphX combines the advantages of data-parallel and graph-parallel systems.
Why is it useful to combine data-parallel and graph-parallel features?
A typical graph-processing pipeline requires moving between different views of the same data, e.g. a table of records and a graph built from them.
Source: http://spark.apache.org/docs/0.9.0/graphx-programming-guide.html
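A minimal sketch of moving between the two views, in plain Python rather than GraphX. The link rows, page names, and helper names (`drop_self_links`, `to_graph`, `in_degrees`, `to_table`) are hypothetical. The data-parallel step treats every row independently; the graph-parallel step has each vertex aggregate one message per incoming edge; the result is turned back into a table.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # one table row: (source page, destination page)


def drop_self_links(rows: List[Edge]) -> List[Edge]:
    """Data-parallel step: each row is kept or dropped independently."""
    return [(src, dst) for src, dst in rows if src != dst]


def to_graph(rows: List[Edge]) -> Dict[str, List[str]]:
    """Table -> graph view: adjacency lists; every endpoint becomes a vertex."""
    adj: Dict[str, List[str]] = {}
    for src, dst in rows:
        adj.setdefault(src, []).append(dst)
        adj.setdefault(dst, [])  # pages with no outgoing links are still vertices
    return adj


def in_degrees(adj: Dict[str, List[str]]) -> Dict[str, int]:
    """Graph-parallel step: every edge sends one message to its destination,
    and each vertex sums the messages it receives."""
    counts = {v: 0 for v in adj}
    for nbrs in adj.values():
        for dst in nbrs:
            counts[dst] += 1
    return counts


def to_table(adj: Dict[str, List[str]]) -> List[Tuple[str, int, int]]:
    """Graph -> table view: one row per page with (out-degree, in-degree)."""
    indeg = in_degrees(adj)
    return sorted((v, len(nbrs), indeg[v]) for v, nbrs in adj.items())
```

In a single-purpose graph system the first and last steps typically happen outside the engine; the slide's point is that GraphX keeps all three in one system.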
Context Switching: GraphX preferred
Source: http://spark.apache.org/docs/0.9.0/graphx-programming-guide.html
Performance: GraphLab preferred
Source: Xin et al., 2013, "GraphX: A Resilient Distributed Graph System on Spark"
● 16-node Amazon EC2 cluster
● Each node: 8 virtual cores, 68 GB memory
● Graph: 4.8M vertices, 69M edges
Project Motivation
“We believe that the loss in performance may, in many cases, be ameliorated by the gains in productivity achieved by the GraphX system.” - Xin et al., 2013
Project Significance
● GraphLab released GraphLab Create earlier this year.
● A goal of GraphLab Create is to introduce a tabular data structure (SFrame) to GraphLab.
● SFrames are similar to R/pandas data frames but are stored on disk.
● To the best of my knowledge, there are no direct comparisons between GraphLab Create and GraphX.
Project Aim - In Detail
● Compare the efficiency and usability of GraphLab Create vs. GraphX in a realistic scenario.
● The pipeline I will evaluate:
1. transform (filter pages of a certain language)
2. process (PageRank)
3. summarize (top k most influential pages)
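The three stages can be sketched on a single machine in plain Python. This is neither the GraphX nor the GraphLab Create API: the input shape (a `pages` dict mapping each page to a language code, plus a list of link pairs), the function names, and the defaults (`damping=0.85`, `iterations=50`) are assumptions for illustration. The PageRank below is the textbook normalized power iteration with dangling-page rank spread uniformly; the library implementations may differ in details such as normalization and stopping criteria.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]  # (source page, destination page)


def transform(pages: Dict[str, str], links: List[Link],
              lang: str) -> Tuple[Set[str], List[Link]]:
    """Stage 1: keep pages in `lang`, and only links between two kept pages."""
    keep = {page for page, page_lang in pages.items() if page_lang == lang}
    kept_links = [(s, d) for s, d in links if s in keep and d in keep]
    return keep, kept_links


def pagerank(vertices: Set[str], links: List[Link],
             damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Stage 2: PageRank by power iteration; ranks sum to 1."""
    n = len(vertices)
    if n == 0:
        return {}
    out: Dict[str, List[str]] = {v: [] for v in vertices}
    for s, d in links:
        out[s].append(d)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by pages without out-links is redistributed uniformly.
        dangling = sum(rank[v] for v in vertices if not out[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in vertices}
        for v in vertices:
            if out[v]:
                share = damping * rank[v] / len(out[v])
                for d in out[v]:
                    new_rank[d] += share
        rank = new_rank
    return rank


def summarize(rank: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Stage 3: the k highest-ranked pages (ties broken by page name)."""
    return sorted(rank.items(), key=lambda item: (-item[1], item[0]))[:k]
```

For example, with pages `A`, `B`, `C` tagged `"en"` and `D` tagged `"de"`, filtering for `"en"` drops `D` and every link touching it; if `A` and `B` both link to `C`, then `C` comes out on top.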
Project Evaluation
● Experiments will take place on an Amazon EC2 cluster.
● Each stage will be evaluated according to:
1. Execution time
2. Programming effort (lines of code, flexibility of API)
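For the execution-time criterion, a small harness can time each stage over several repetitions and report summary statistics, so that a single noisy run does not decide the comparison. This is a sketch under assumptions: `time_stages` and the stage names are hypothetical, and each stage is any zero-argument callable. One caveat when wrapping Spark code: Spark transformations are lazy, so a stage must end in an action (such as a count) or the timer only measures building the lineage.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_stages(stages: Dict[str, Callable[[], object]],
                repeats: int = 5) -> Dict[str, Dict[str, float]]:
    """Run each stage `repeats` times; report mean, stdev and min wall-clock seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    results: Dict[str, Dict[str, float]] = {}
    for name, stage in stages.items():
        samples: List[float] = []
        for _ in range(repeats):
            start = time.perf_counter()
            stage()
            samples.append(time.perf_counter() - start)
        results[name] = {
            "mean_s": statistics.mean(samples),
            # stdev needs at least two samples
            "stdev_s": statistics.stdev(samples) if len(samples) > 1 else 0.0,
            "min_s": min(samples),
        }
    return results
```

Reporting the minimum alongside the mean is a common choice: the minimum approximates the undisturbed cost, while the spread shows how much the environment (for instance, different EC2 machines) perturbs it.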
Expected Outcome
| Stage        | Performance | Programming effort |
|--------------|-------------|--------------------|
| 1. transform | GraphX (?)  | ?                  |
| 2. process   | GraphLab    | ?                  |
| 3. summarize | GraphX (?)  | ?                  |
Project Challenges
● How objective is a comparison on Amazon EC2? -> Every time you launch a cluster, you get different machines.
● How do you objectively evaluate programming effort? -> Lines of code is a crude metric, so this will be a subjective evaluation.
Project Status
● I have launched GraphX on Amazon EC2 and have run standalone Scala applications with GraphX.
● Next steps:
1. Set up preliminary GraphX experiments
2. Set up preliminary GraphLab Create experiments
3. Evaluate how comparable each stage is
4. Tune the experiments and run them repeatedly on Amazon EC2 to get statistics
