CuPP – A framework for easy CUDA integration

CuPP – A framework for easy CUDA integration
Jens Breitbart, University of Kassel, Research Group Programming Languages / Methodologies
Rome, Italy, May 25, 2009

The current state of CUDA development
GPU? GPUs are REALLY fast.

3D Filterbank Convolution
          Performance (gflops)   Development time (hours)
Matlab            0.3                     0.5
C/SSE             9.0                    10.0
PS3             110.0                    30.0
GT200           330.0                    10.0

Source: Nicolas Pinto, James DiCarlo, David Cox (MIT, Harvard)

Introduction / CUDA: Overview
- CUDA is NVIDIA's general-purpose programming system.
- In CUDA the GPU (= device) executes a function (= kernel) in the SPMD model.
- The kernels run in their own memory domain, and data must be explicitly transferred to and from it.
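
The two points above (one function executed by many threads, and a separate memory domain with explicit transfers) can be illustrated with a host-only C++ sketch. It makes no CUDA calls: `DeviceBuffer`, `copy_to_device`, `copy_to_host`, `launch` and `add_one_kernel` are hypothetical names chosen for this illustration, and the "device" is just a second buffer in host memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for device memory: host code only reaches it
// through the explicit copy helpers below.
struct DeviceBuffer {
    std::vector<int> storage;
};

// Explicit host -> device transfer.
void copy_to_device(DeviceBuffer& dst, const std::vector<int>& src) {
    dst.storage = src;
}

// Explicit device -> host transfer.
void copy_to_host(std::vector<int>& dst, const DeviceBuffer& src) {
    dst = src.storage;
}

// The "kernel": every thread runs this same function (SPMD) and uses its
// own index to select the element it works on.
void add_one_kernel(std::size_t thread_id, int* data, std::size_t n) {
    if (thread_id < n) {
        data[thread_id] += 1;
    }
}

// Emulated launch: a GPU would run the threads in parallel; this sketch
// simply runs them one after another.
void launch(std::size_t thread_count, DeviceBuffer& buf) {
    for (std::size_t tid = 0; tid < thread_count; ++tid) {
        add_one_kernel(tid, buf.storage.data(), buf.storage.size());
    }
}

// Full round trip: copy in, run the kernel, copy out.
std::vector<int> add_one_on_device(const std::vector<int>& host_in) {
    DeviceBuffer dev;
    copy_to_device(dev, host_in);
    launch(host_in.size(), dev);
    std::vector<int> host_out;
    copy_to_host(host_out, dev);
    return host_out;
}
```

The round trip in `add_one_on_device` is the part that becomes tedious in real applications: every kernel call is framed by explicit copies that the caller has to place correctly.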

Introduction / CUDA: ... so what is the problem?
[Table: columns "dataflow is known" and "dataflow is not known"; rows "application is written in C" and "application is written in C++". The cell marks did not survive extraction.]

CuPP / Overview: CuPP – one solution to the problems

CuPP / Overview: Device management
- The developer uses a device handle to identify a GPU / device.
- A handle must be passed to all functions using the device.
- Designed to support multiple devices per thread ... but this is not yet implemented.
Some magic so far? No, just some basics.

CuPP / Overview: Memory management
Two levels:
1. CUDA-like, but C++-ified: only use this for short experiments.
2. CuPP memory objects: a memory object represents data stored in device memory. Use them to implement your data structures, not for everyday use.
We still haven't solved a problem ... sorry, but we are close.

CuPP / Overview: Support for classes
Recall: CUDA does not officially support C++.
- "Direct": ... it works with some restrictions.
- Type transformations: allow you to use two independent types.
  - device type: used on the device
  - host type: used on the host
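
The host type / device type split can be sketched in plain C++ as follows. This is an illustration under stated assumptions, not CuPP's actual interface: `HostPoints`, `DevicePoints`, the nested `device_type` typedef and the `transform()` member are hypothetical names, and the host pointer stands in for a device pointer so that the sketch runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Device type: a plain C-style struct, the only form device code would see.
struct DevicePoints {
    const float* xs;
    std::size_t  count;
};

// Host type: an ordinary C++ class used by the CPU side of the application.
class HostPoints {
public:
    // Hypothetical convention: the host type names its device counterpart.
    typedef DevicePoints device_type;

    void add(float x) { xs_.push_back(x); }
    std::size_t size() const { return xs_.size(); }

    // Hypothetical transformation from host type to device type.
    // Assumption: a real implementation would copy xs_ into device memory
    // and return a device pointer; here the host pointer stands in for it.
    device_type transform() const {
        device_type d = { xs_.data(), xs_.size() };
        return d;
    }

private:
    std::vector<float> xs_;
};

// Stand-in for device code: it works only with the device type.
float sum_on_device(DevicePoints p) {
    float s = 0.0f;
    for (std::size_t i = 0; i < p.count; ++i) {
        s += p.xs[i];
    }
    return s;
}
```

The design point is that the host class keeps its C++ conveniences (containers, member functions), while the device side only ever receives a flat struct.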

CuPP / Overview: Kernel call
- (Almost) identical to a C++ function call.
- Supports both call by value and call by reference.
- Behaviour is customizable by using callback functions.
This is where the magic starts ... and multiple seconds of compile time are spent.
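
One way to picture the difference between call by value and call by reference is a call wrapper that inspects the kernel's signature and invokes callbacks on the host object accordingly. The sketch below is an assumption-laden illustration, not CuPP's implementation: `call`, `HostCounter`, `DeviceCounter`, `transform()` and `dirty()` are hypothetical names, and the "kernels" are ordinary host functions.

```cpp
#include <cassert>

// Device-side representation: plain data only.
struct DeviceCounter {
    int value;
};

// Host-side type offering the two hypothetical callbacks the wrapper uses.
class HostCounter {
public:
    explicit HostCounter(int v) : value_(v), syncs_(0) {}

    // Called before the kernel runs: build the device representation.
    DeviceCounter transform() const {
        DeviceCounter d = { value_ };
        return d;
    }

    // Called after the kernel ran, but only when the argument was passed
    // by non-const reference and may therefore have been modified.
    void dirty(const DeviceCounter& d) {
        value_ = d.value;
        ++syncs_;
    }

    int value() const { return value_; }
    int syncs() const { return syncs_; }

private:
    int value_;
    int syncs_;
};

// Stand-ins for kernels with the two parameter-passing styles.
void increment_kernel(DeviceCounter& c) { c.value += 1; }      // by reference
void increment_copy_kernel(DeviceCounter c) { c.value += 1; }  // by value

// Call by reference: transform, run, then report the change to the host.
void call(void (*kernel)(DeviceCounter&), HostCounter& arg) {
    DeviceCounter d = arg.transform();
    kernel(d);
    arg.dirty(d);
}

// Call by value: the kernel works on a copy, so no callback is needed.
void call(void (*kernel)(DeviceCounter), const HostCounter& arg) {
    kernel(arg.transform());
}
```

Selecting the behaviour from the kernel signature at compile time is the kind of work that, done generically with templates, would account for the extra compile time the slide mentions.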

CuPP / Overview: Data structures
... well, currently there is just a std::vector wrapper with some lazy memory copying. But you can easily design (or adapt) your own data structures.
Three steps to adapt your existing data structure:
1. Add memory objects to store your data on the device.
2. Create a C-compatible device type to be used on the GPU.
3. Implement the kernel callback functions to transform between host and device type.
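
The three steps, together with the lazy copying idea, can be sketched in host-only C++. Everything here is hypothetical and simplified: `LazyVector`, `DeviceIntArray`, `transform()`, `dirty()` and `uploads()` are names invented for the illustration, and a second `std::vector` stands in for a device memory object.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Step 2: a C-compatible device type.
struct DeviceIntArray {
    int*        data;
    std::size_t size;
};

class LazyVector {
public:
    LazyVector() : device_valid_(false), host_valid_(true), uploads_(0) {}

    void push_back(int v) {
        sync_host();
        host_.push_back(v);
        device_valid_ = false;  // device copy is now out of date
    }

    // Reading on the host triggers the deferred copy back, if one is pending.
    int at(std::size_t i) {
        sync_host();
        return host_.at(i);
    }

    // Step 3a: callback before a kernel call; copies only when needed.
    DeviceIntArray transform() {
        if (!device_valid_) {
            device_ = host_;  // stands in for a host-to-device copy
            device_valid_ = true;
            ++uploads_;
        }
        DeviceIntArray d = { device_.data(), device_.size() };
        return d;
    }

    // Step 3b: callback after a kernel may have written to the data.
    // The copy back to the host is deferred until the host reads.
    void dirty() { host_valid_ = false; }

    int uploads() const { return uploads_; }

private:
    void sync_host() {
        if (!host_valid_) {
            host_ = device_;  // stands in for a device-to-host copy
            host_valid_ = true;
        }
    }

    std::vector<int> host_;
    std::vector<int> device_;  // Step 1: stand-in for a device memory object
    bool device_valid_;
    bool host_valid_;
    int  uploads_;
};

// Stand-in for a kernel that modifies the data in place.
void double_kernel(DeviceIntArray a) {
    for (std::size_t i = 0; i < a.size; ++i) {
        a.data[i] *= 2;
    }
}
```

With this bookkeeping, consecutive kernel calls on the same data cause a single upload, and data only travels back when host code actually reads it.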

CuPP / Examples: The result of the puzzle

    cupp::device dev;
    cupp::kernel k(get_fct_ptr(), gridDim, blockDim);

    cupp::vector<int> one, two;
    cupp::vector<int> *input = &one;
    cupp::vector<int> *output = &two;

    // Ping-pong between the two vectors: the output of one kernel call
    // becomes the input of the next.
    for (int i = 0; i < 10; ++i) {
        k(dev, *input, *output);
    #if debug
        // Reading on the host; the inner index no longer shadows i.
        for (int j = 0; j < output->size(); ++j)
            std::cout << output->at(j) << ", ";
    #endif
        swap(input, output);
    }

    // After the final swap, input points to the most recent result.
    write_to_file(*input);

Conclusion: What we have (not) done
CuPP can ...
- help you integrate CUDA into your application.
- manage your data on the device.
CuPP can't ...
- manage multiple devices effectively
- double buffering (GPU/CPU concurrency)
Thank you

Conclusion: Data structure to speed up k-nn
The overall performance is similar to that of a data structure that can be both created and transferred effectively.

Conclusion: Einstein@Home
The existing data structure cannot be transferred to the device.
Solution? Design a new data structure and rewrite all existing CPU functions, or use two types.
