Autotuning Wavefront Applications for Multicore Multi-GPU Hybrid Architectures

Jan 09, 2024


Autotuning Wavefront Applications for Multicore Multi-GPU Hybrid Architectures
Siddharth Mohanty, Murray Cole
University of Edinburgh
Agenda (1:00)
● Wavefront Pattern (1:00)
● Wavefront Applications (0:30)
● Implementation Strategy + trade-offs (4:30)
● Experimental Programme (1:30)
● Platform And Parameters (1:00)
● Exhaustive Search Results (2:00)
● ESR : Best Points Performance (1:00)
● ESR : Best Points Sensitivity (1:00)
● Autotuning Model (1:00)
● Autotuning Results (1:30)
● Q&A (4:00)
Wavefront Pattern (0:30)
[Figure (c): wavefront pattern. Source: Dios, A. J. et al., "Evaluation of the Task Programming Model in the Parallelization of Wavefront Problems," HPCC 2010, IEEE]
Wavefront Applications (0:30)
● Nash Equilibrium : A game-theoretic problem in economics, characterized by small instances but a very computationally demanding kernel. The internal granularity parameter controls the iteration count of a nested loop.
● Biological Sequence Comparison : A string alignment problem from bioinformatics, characterized by very large instances and very fine-grained kernels, varying with the detailed comparisons made.
[Figure (a). Source: http://en.wikipedia.org/wiki/SmithWaterman_algorithm]
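To make the wavefront dependency concrete, here is a minimal sketch of a local alignment score (in the style of Smith-Waterman) computed one anti-diagonal at a time. It is not the framework from the talk: the function names, the linear gap penalty, and the scoring values (match=2, mismatch=-1, gap=-1) are assumptions chosen for illustration. The point it shows is that every cell on one anti-diagonal reads only from earlier diagonals, so the cells of a diagonal could be computed in parallel.

```python
def antidiagonals(rows, cols):
    """Yield, for each anti-diagonal d, the list of (i, j) cells with i + j == d."""
    for d in range(rows + cols - 1):
        i_lo = max(0, d - cols + 1)
        i_hi = min(rows - 1, d)
        yield [(i, d - i) for i in range(i_lo, i_hi + 1)]


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b, swept diagonal by diagonal.

    Scoring values and the linear gap penalty are illustrative assumptions.
    """
    # H carries an extra zero row and column as the boundary condition.
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for cells in antidiagonals(len(a), len(b)):
        # Cells in this list depend only on north, west and north-west
        # neighbours, all of which lie on earlier diagonals. This inner
        # loop is therefore the part a parallel implementation would spread
        # across cores or GPU work items.
        for i, j in cells:
            s = match if a[i] == b[j] else mismatch
            H[i + 1][j + 1] = max(
                0,
                H[i][j] + s,        # north-west: align a[i] with b[j]
                H[i][j + 1] + gap,  # north: gap in b
                H[i + 1][j] + gap,  # west: gap in a
            )
            best = max(best, H[i + 1][j + 1])
    return best
```

For two strings of lengths m and n the sweep visits m + n - 1 diagonals; the short diagonals near the corners offer little parallelism, while the long ones in the middle offer the most.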
Implementation Strategy (4:30) Dual GPU MultiCore Wavefront Framework
Experimental Programme (1:30)
Platforms and Parameters (0:30)
Exhaustive Search Results (ESR) (2:00)
ESR : Best Point Performance (1:00)
ESR : Best Points Sensitivity (1:00)
Autotuning : Model (1:00)
Autotuning Results (1:30)
Thank You
Appendix : Tuning Challenges
● Problem size (dim) must be large enough to justify parallel computation on the GPU (smaller problems can be computed more quickly on the faster CPU cores).
● Granularity of task (tsize) must be high enough for computation to dominate the cost of starting a GPU and the communication overhead of transferring data between GPU and CPU.
● Communication cost increases with the amount of data (dsize) being transferred.
● Dual GPUs have the additional overhead of exchanging neighbouring data between themselves every few iterations (halo swapping).
● The number of halo swaps decreases as halo size increases, but this has to be traded against redundant computation, which starts to affect performance as task granularity increases.
● GPU tiling (gpu-tile) reduces the number of kernel calls, but this has to be traded against the additional cost of synchronizing work items within each work group.
● When computation dominates communication anyway, time spent in kernel calls no longer matters and GPU tiling may prove counterproductive.
● The type of system affects performance:
  - a fast GPU coupled to a slow CPU means data will mostly be offloaded to the GPU, meaning more diagonals on the GPU (band sizes), with CPU tiling having negligible effect.
  - a fast GPU + fast CPU would similarly mean lower band sizes.
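The band and halo trade-offs above can be illustrated with a small sketch. Everything in it is an assumption made for illustration, not the framework's actual interface: `split_phases` treats band as the number of diagonals on each side of the main diagonal that go to the GPU (a negative value meaning CPU only), and `halo_swaps` uses a deliberately simple model of one exchange per `halo` GPU diagonals.

```python
import math


def split_phases(dim, band):
    """Split the 2*dim - 1 anti-diagonals of a dim x dim grid into
    (cpu_before, gpu, cpu_after) index ranges.

    Assumption: band counts the diagonals on each side of the main
    diagonal that are offloaded to the GPU; band < 0 means CPU only.
    """
    total = 2 * dim - 1
    if band < 0:
        return range(0, total), range(0, 0), range(0, 0)
    main = dim - 1  # index of the longest (main) anti-diagonal
    lo = max(0, main - band)
    hi = min(total, main + band + 1)
    return range(0, lo), range(lo, hi), range(hi, total)


def halo_swaps(gpu_diagonals, halo):
    """Number of GPU-to-GPU exchanges under an assumed model in which
    the two GPUs swap boundary data once every `halo` diagonals."""
    if gpu_diagonals == 0:
        return 0
    return math.ceil(gpu_diagonals / max(1, halo))
```

Under these assumptions, `split_phases(4, 1)` gives the GPU the three longest diagonals and leaves two short diagonals to the CPU on each side, and doubling the halo roughly halves the number of swaps. The sketch does not model the redundant computation that a larger halo introduces, which is the other side of the trade-off.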
Appendix : Framework Interface
Appendix : TBB/Omp/baseline vs skeleton
Appendix : Previous Autotuning Performance
● Synthetic Application – note varying colour key
Appendix : Previous Summarised Results
● Overall Average Performance
