

SLIDE 1

Composable GPU programming

SLIDE 2

GPUs -- what are they?

  • Basic model: SIMD, SPMD, MIMD
  • blocks of PUs with a single PC and local memory (synchronous); warps
  • many blocks (asynchronous); VRAM
  • discontinuities/constraints from the hardware implementation of memory access
  • next-generation hardware likely to mediate this, making programmability more orthogonal
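The two-level model on this slide can be sketched in plain Python (a toy simulation, not GPU code; all names here are illustrative): PUs inside a block advance in lockstep under one program counter, while distinct blocks run independently of each other.

```python
# Toy simulation of the two-level GPU model described above.
# Within a block: one program counter, so every PU performs the SAME
# step before any PU moves on (synchronous). Across blocks: no ordering
# guarantees (asynchronous), so any execution order must give the same result.

def run_block(program, block_id, num_pus, shared):
    """Execute `program` step by step; all PUs in the block run each
    step in lockstep (single PC)."""
    local = [dict() for _ in range(num_pus)]   # per-PU local memory
    for step in program:                       # one shared program counter
        for pu_id in range(num_pus):           # every PU takes this step
            step(block_id, pu_id, local[pu_id], shared)

def launch(program, num_blocks, num_pus):
    """Blocks are independent; running them one after another is just
    one legal interleaving of the asynchronous model."""
    shared = {}
    for block_id in range(num_blocks):
        run_block(program, block_id, num_pus, shared)
    return shared

# Example program: each PU writes its global index into shared memory.
program = [lambda b, p, local, shared: shared.__setitem__((b, p), b * 4 + p)]
result = launch(program, num_blocks=2, num_pus=4)
```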

SLIDE 3

GPUs -- what are they?


Revenge of the PRAM?

SLIDE 4

Programming GPUs

  • CUDA: a C-like language for general-purpose programming, with code generated for GPUs
  • previously: OpenGL for graphics programming
  • coming up: OpenCL (compute language)
  • foo<<<m, n, k>>>(args)
  • executes foo with implicit indices i, j (block, PU) selecting from the arguments
  • care required when accessing memory: out-of-sequence accesses are sequentialized!
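The launch semantics of `foo<<<m, n, k>>>(args)` can be mimicked in Python (a hypothetical sketch; real CUDA is C-like and runs these pairs concurrently): the grid supplies the implicit indices i (block) and j (PU/thread), and the kernel body uses them to select its slice of the bulk arguments.

```python
# Sequential simulation of a CUDA-style grid launch. On real hardware the
# (block, thread) pairs execute in parallel; running them in order here is
# only one legal schedule, so a correct kernel must not depend on the order.

def gpu_launch(kernel, m, n, *args):
    """Run `kernel` once for every (block i, thread j) in an m-by-n grid."""
    for i in range(m):
        for j in range(n):
            kernel(i, j, *args)

def add_kernel(i, j, threads_per_block, xs, ys, out):
    """The implicit indices select one element from the arguments."""
    idx = i * threads_per_block + j
    if idx < len(xs):              # bounds guard, as in real kernels
        out[idx] = xs[idx] + ys[idx]

xs, ys = [1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60]
out = [0] * len(xs)
gpu_launch(add_kernel, 2, 3, 3, xs, ys, out)   # 2 blocks of 3 threads
# out is now [11, 22, 33, 44, 55, 66]
```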

SLIDE 5

GPU language projects

  • Data Parallel Haskell:
  • programming the flat PRAM level
  • nested/compositional programming
  • map (map f) xss
  • Obsidian: combinator language for generating CUDA code
  • explicit synchronization
  • choosing threads, mapping to blocks
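What `map (map f) xss` asks for is nested parallelism. A Python sketch (hypothetical helper names) of the flattening idea behind NESL and Data Parallel Haskell: the nested map becomes one flat map over all inner elements plus segment bookkeeping, so even irregular nesting turns into a single bulk operation a GPU can execute.

```python
# Nested data parallelism and its flattened equivalent.

def nested_map(f, xss):
    """The program as written: an outer map of inner maps."""
    return [[f(x) for x in xs] for xs in xss]

def flattened_map(f, xss):
    """The same computation after flattening: one flat map plus a
    segment descriptor recording where each inner list begins and ends."""
    segs = [len(xs) for xs in xss]             # segment descriptor
    flat = [x for xs in xss for x in xs]       # concatenated data
    flat = [f(x) for x in flat]                # single flat, GPU-friendly map
    out, pos = [], 0
    for n in segs:                             # rebuild the nesting
        out.append(flat[pos:pos + n])
        pos += n
    return out

xss = [[1, 2], [3], [4, 5, 6]]                 # irregular nesting is fine
# flattened_map(lambda x: x * x, xss) == [[1, 4], [9], [16, 25, 36]]
```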
SLIDE 6

How to exploit?

  • Performance: if you have a data-parallel problem, formulate it using scan, map, fold, and permute on bulk data (arrays), and have it shipped out to a GPU!
  • If you can't figure out how to do that, do not expect magic from your compiler.
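A minimal sketch of this advice in Python (illustrative names, not a real GPU API): express the problem with map, fold, and scan over whole arrays rather than index-by-index loops, so it lines up directly with bulk GPU primitives. Two toy instances: dot product as map-then-fold, and exclusive prefix sums as a scan.

```python
from functools import reduce
from itertools import accumulate
import operator

def dot(xs, ys):
    # map (*) over both arrays, then fold (+): both are standard
    # data-parallel bulk primitives
    return reduce(operator.add, map(operator.mul, xs, ys), 0)

def exclusive_scan(xs):
    # scan (+): running sums, shifted right, with identity 0 in front
    # (`initial=` needs Python 3.8+)
    return list(accumulate(xs[:-1], operator.add, initial=0))

# dot([1, 2, 3], [4, 5, 6]) == 32
# exclusive_scan([3, 1, 7, 0]) == [0, 3, 4, 11]
```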

SLIDE 7

Qualities

  • Obsidian: a good candidate for capturing the two-level model (synchronous blocks and asynchronous sets of blocks) and for implementing the APRAM model
  • excellent scan implementations
  • Data Parallel Haskell: a good model for programming the APRAM model and for compositional abstraction on top of it
  • NESL with higher-order functions and polymorphism
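The GPU scan implementations alluded to here are typically variants of the Blelloch work-efficient scan. A Python sketch of its two phases (Obsidian's actual generated CUDA is not reproduced): an up-sweep builds partial sums in a tree, and a down-sweep pushes them back down.

```python
def blelloch_exclusive_scan(xs):
    """Exclusive +-scan; input length must be a power of two in this sketch."""
    a = list(xs)
    n = len(a)
    # up-sweep (reduce): combine pairs, doubling the stride each level
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            a[i] += a[i - d]
        d *= 2
    # down-sweep: clear the root, then distribute partial sums downwards
    a[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            a[i - d], a[i] = a[i], a[i] + a[i - d]
        d //= 2
    return a

# blelloch_exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3])
#   == [0, 3, 4, 11, 11, 15, 16, 22]
```

On a GPU each level of the tree is one synchronous bulk step over a block, which is why the slide's two-level model fits scan so well.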
SLIDE 8

Requirements

  • Need a robust performance model: NESL at the PRAM level, something else lower down
  • Need to stay in the same programming model when engineering/tuning code
  • Need a robust programming model (sw/hw): small changes shouldn't lead to unpredictable, radical changes in performance

SLIDE 9

(End)