A Distributed Multi-GPU System for Fast Graph Processing

  1. A Distributed Multi-GPU System for Fast Graph Processing
     Z. Jia, Y. Kwon, G. Shipman, P. McCormick, M. Erez, A. Aiken
     Presented by Oliver Hope

  2. What is Lux? / Contributions of paper
     Computational model:
     ◮ Two execution models
     ◮ A dynamic repartitioning strategy
     ◮ A performance model for parameter choice
     Implementation:
     ◮ Working code
     ◮ Benchmarked on different algorithms
     ◮ Comparisons to different platforms

  3. Motivation / Prior Work
     Lux: a graph processing framework to run on multi-GPU clusters
     Prior work exists for:
     ◮ Single-node CPU
     ◮ Distributed CPU
     ◮ Single-node GPU
     Prior work cannot be adapted easily to GPU clusters:
     ◮ Data placement (heterogeneous memories)
     ◮ Optimisation interference
     ◮ Load-balancing does not map across from CPUs

  4. Abstraction
     Iteratively modifies a subset of the graph until convergence
     Edges and vertices have properties
     3 stateless functions to implement:
     ◮ void init(Vertex v, Vertex v_old)
     ◮ void compute(Vertex v, Vertex u_old, Edge e)
     ◮ bool update(Vertex v, Vertex v_old)
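
The three stateless functions can be made concrete with a toy example. Below is a hypothetical single-node Python rendering of the abstraction (the real Lux API is C++/CUDA), using PageRank; the dict-based vertices, the `iterate` driver, and the field names are illustrative inventions, not the paper's code.

```python
# Hypothetical Python sketch of Lux's vertex-program abstraction,
# using PageRank. Vertices are dicts; the graph is an edge list.

DAMPING = 0.85
N = 3  # number of vertices in the toy graph below

def init(v, v_old):
    # Reset the accumulator at the start of an iteration.
    v["rank"] = 0.0

def compute(v, u_old, e):
    # Gather a contribution from in-neighbour u (pull direction).
    v["rank"] += DAMPING * u_old["rank"] / u_old["out_degree"]

def update(v, v_old):
    # Finish the new value; True keeps v in the next frontier.
    v["rank"] += (1.0 - DAMPING) / N
    return abs(v["rank"] - v_old["rank"]) > 1e-9

def iterate(vertices, edges):
    # One pull-mode iteration over every vertex.
    old = [dict(v) for v in vertices]
    for v, v_old in zip(vertices, old):
        init(v, v_old)
    for src, dst in edges:
        compute(vertices[dst], old[src], (src, dst))
    return [update(v, v_old) for v, v_old in zip(vertices, old)]

# Toy 3-cycle: the uniform distribution is already the fixed point,
# so every update reports "unchanged" (convergence).
verts = [{"rank": 1.0 / N, "out_degree": 1} for _ in range(N)]
ring = [(0, 1), (1, 2), (2, 0)]
changed = iterate(verts, ring)
```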

  5. Abstraction: Pull vs Push
     Pull: does not require additional synchronisation; takes advantage of GPU caching and aggregation
     Push: better for rapidly changing frontiers
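
The pull/push distinction can be sketched with one BFS step written both ways. This is an illustrative Python sketch (not Lux code, and the function names are invented): in pull mode each vertex writes only to itself, while in push mode frontier vertices write to their neighbours, which on a GPU would need atomic updates.

```python
# The same BFS step in pull and push style (illustrative sketch).

def bfs_pull(adj_in, dist, frontier, level):
    # Pull: every unvisited vertex scans its in-neighbours and
    # writes only to its own slot -- no synchronisation needed.
    nxt = set()
    for v in range(len(adj_in)):
        if dist[v] is None and any(u in frontier for u in adj_in[v]):
            dist[v] = level
            nxt.add(v)
    return nxt

def bfs_push(adj_out, dist, frontier, level):
    # Push: only the (possibly small) frontier does work, but the
    # writes to neighbours would race on a GPU without atomics.
    nxt = set()
    for u in frontier:
        for v in adj_out[u]:
            if dist[v] is None:
                dist[v] = level
                nxt.add(v)
    return nxt

# Tiny graph: 0 -> 1, 0 -> 2, 1 -> 3.
adj_out = [[1, 2], [3], [], []]
adj_in = [[], [0], [0], [1]]

def run(step, adj):
    dist = [0] + [None] * 3
    frontier, level = {0}, 1
    while frontier:
        frontier = step(adj, dist, frontier, level)
        level += 1
    return dist

pull_dist = run(bfs_pull, adj_in)
push_dist = run(bfs_push, adj_out)
```

Both directions compute identical distances; the trade-off is work volume (push touches only the frontier) versus write contention (pull has none).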

  6. Task Execution
     Pull-based:
     ◮ Single GPU kernel for all steps
     ◮ Scan-based gather to resolve load imbalance
     Push-based:
     ◮ Separate kernel for each of the 3 steps
     ◮ All updates have to use device memory to avoid races
     Computation can overflow to CPU+DRAM if there is not enough GPU memory
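
The "scan-based gather" named on the slide is a standard GPU load-balancing trick, sketched below in Python (the helper name is invented): a prefix sum over per-vertex degrees gives every edge a global index, so work can be split into equal-sized edge ranges even when one vertex owns most of the edges.

```python
# Scan-based load balancing sketch: split edges, not vertices, evenly.

from itertools import accumulate

def balanced_edge_ranges(degrees, num_threads):
    # Exclusive prefix sum: offsets[v] is the global index of
    # vertex v's first edge.
    offsets = [0] + list(accumulate(degrees))
    total = offsets[-1]
    # Thread t processes edges [t*total//T, (t+1)*total//T): equal
    # work per thread regardless of degree skew.
    return [(t * total // num_threads, (t + 1) * total // num_threads)
            for t in range(num_threads)]
```

For degrees `[7, 1, 0, 0]` a naive vertex split gives one thread 7 edges and another 1; the edge split gives each thread 4.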

  7. Graph Partitioning
     Lux uses edge partitioning
     Idea: assign an equal number of edges to each partition
     Each partition holds contiguously numbered vertices and the edges pointing to them,
     so the GPU can coalesce reads and writes to consecutive memory
     Very fast to compute (e.g. vs vertex-cut)
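
The idea can be sketched in a few lines of Python (a hypothetical helper, not the paper's code): walk the contiguously numbered vertices, accumulating in-degrees, and cut a partition boundary each time the running edge count reaches the next quota.

```python
# Edge-partitioning sketch: contiguous vertex ranges with roughly
# equal numbers of in-edges per partition.

def edge_partition(in_degrees, num_parts):
    total = sum(in_degrees)
    target = total / num_parts
    parts, start, acc = [], 0, 0
    for v, deg in enumerate(in_degrees):
        acc += deg
        # Close the current partition once the running edge count
        # reaches its cumulative quota.
        if acc >= target * (len(parts) + 1) and len(parts) < num_parts - 1:
            parts.append((start, v + 1))   # [start, v+1) vertex range
            start = v + 1
    parts.append((start, len(in_degrees)))
    return parts
```

A single linear pass over the degree array suffices, which is why this is so much cheaper to compute than a vertex-cut.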

  8. Dynamic Repartitioning
     Figure: estimates of f(x) = Σ_{i=0}^{x} w_i, used to pick pivot vertices.
     1. Collect t_i per partition P_i, update f, calculate a new partitioning
     2. Compare Δgain(G) (improvement) vs Δcost(G) (inter-node transfer)
     3. Globally repartition depending on 2
     4. Locally repartition
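
Step 2's decision can be sketched as follows (a Python sketch with invented names and made-up numbers, not the paper's model): an iteration runs at the speed of the slowest partition, so the gain from a balanced repartition is roughly the gap between the maximum and mean per-partition times, summed over the remaining iterations, and the system repartitions only when that gain exceeds the transfer cost.

```python
# Repartitioning decision sketch: Δgain vs Δcost.

def should_repartition(times, remaining_iters, transfer_cost):
    # times: per-partition times t_i from the last iteration.
    # A balanced repartition would run near the mean instead of the max.
    gain_per_iter = max(times) - sum(times) / len(times)
    delta_gain = gain_per_iter * remaining_iters
    return delta_gain > transfer_cost
```

With a badly skewed load (one partition twice as slow as the rest) the accumulated gain quickly dwarfs a one-off transfer cost; with a balanced load the gain is zero and no repartition happens.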

  9. Performance Model
     Used to preselect an execution model and runtime configuration
     Models performance for a single iteration
     Sums together estimates for:
     1. Load time
     2. Compute time
     3. Update time
     4. Inter-node transfer time
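
The model's additive shape can be sketched as below (a Python sketch; the parameter names and the specific estimate for each term are assumptions, not the paper's exact formulas): each of the four components is estimated from graph size and measured hardware rates, then summed.

```python
# Performance-model sketch: predicted time for one iteration is the
# sum of four independently estimated components.

def predict_iteration(num_edges, num_vertices, bytes_per_edge,
                      bytes_per_vertex, load_bw, edge_rate,
                      update_bw, net_bw, remote_vertices):
    load = num_edges * bytes_per_edge / load_bw        # load edge data
    compute = num_edges / edge_rate                    # per-edge work
    update = num_vertices * bytes_per_vertex / update_bw
    transfer = remote_vertices * bytes_per_vertex / net_bw
    return load + compute + update + transfer
```

Running the model once per candidate configuration (pull vs push, number of GPUs, etc.) and picking the minimum is what lets Lux preselect settings without trial runs.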

  10. Evaluation
     Different hardware used for shared-memory and GPU testing.
     Tried to get the best attainable performance from every system.

  11. Criticisms
     ◮ Abstract claims up to 20x speedup over shared-memory systems (the results look more like 5-10x)
     ◮ “Most popular graph algorithms can be expressed in Lux”, but does not assess what cannot be
     ◮ “For many applications … identical implementation for both push and pull”
     ◮ Did not test the overflow processing to CPU feature
     ◮ For evaluation, all parameters were highly tuned; can’t guarantee the comparison systems were as tuned as Lux
