CuPP – A framework for easy CUDA integration
Jens Breitbart
University of Kassel
Research Group Programming Languages / Methodologies
Rome, Italy, May 25, 2009
Breitbart CuPP – A framework for easy CUDA integration 1
The current state
                           Matlab   C/SSE   PS3     GT200
Performance (gflops)       0.3      9.0     110.0   330.0
Development Time (hours)   0.5      10.0    30.0    10.0
3D Filterbank Convolution
Nicolas Pinto, James DiCarlo, David Cox (MIT, Harvard)
Introduction CUDA
CUDA is NVIDIA's general-purpose programming system
In CUDA the GPU (= device) executes a function (= kernel) in the SPMD model
The kernels run in their own memory domain, and data must be explicitly transferred to/from it
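The model described above can be mimicked on the host alone. The following is a minimal sketch, not CUDA code: a second buffer stands in for device memory, a loop stands in for the threads, and the names `scale_kernel` and `run_scaled` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "kernel": every simulated thread runs this same function and
// differs only in its index (the SPMD idea).
void scale_kernel(std::size_t thread_id, float* device_data, float factor) {
    assert(device_data != nullptr);
    device_data[thread_id] *= factor;
}

// Host-only emulation of the workflow; no GPU is involved.
std::vector<float> run_scaled(const std::vector<float>& host_in, float factor) {
    // 1. Explicit transfer host -> "device": a separate buffer plays the
    //    role of the device memory domain.
    std::vector<float> device_mem(host_in.begin(), host_in.end());

    // 2. "Launch": one logical thread per element, all running scale_kernel.
    for (std::size_t tid = 0; tid < device_mem.size(); ++tid) {
        scale_kernel(tid, device_mem.data(), factor);
    }

    // 3. Explicit transfer "device" -> host.
    return std::vector<float>(device_mem.begin(), device_mem.end());
}
```

In real CUDA, steps 1 and 3 are explicit copy calls and step 2 is a kernel launch; the sketch only shows why the host has to manage two copies of the data.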
Introduction CUDA
application dataflow is known dataflow is not known it written in C
CuPP Overview
The developer uses a device handle to identify a GPU / device
A handle must be passed to all functions using the device
Designed to support multiple devices per thread...
... but this is not yet implemented
Some magic so far?
No, just some basics.
CuPP Overview
Two levels:
1 CUDA-like, but C++-ified
2 CuPP memory objects
A memory object represents data stored in device memory
Use them to implement your data structures, not for everyday use
We still haven’t solved a problem...
... sorry, but we are close
CuPP Overview
Recall: CUDA does not officially support C++
“Direct” ... it works with some restrictions
Type transformations allow you to use two independent types
device type: used on the device
host type: used on the host
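One way to picture the host type / device type pairing is a compile-time mapping from one to the other. The sketch below is an assumption for illustration only; `device_type_of` and `device_int_array` are hypothetical names and not taken from CuPP.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical C-conforming device type: just a pointer and a length.
struct device_int_array {
    int*        data;
    std::size_t size;
};

// Hypothetical trait mapping a host type to its device type.
// By default a type is its own device type (e.g. int, float).
template <typename Host>
struct device_type_of {
    using type = Host;
};

// A host-side std::vector<int> is represented by device_int_array on the device.
template <>
struct device_type_of<std::vector<int>> {
    using type = device_int_array;
};

// The device type stays a plain, C-compatible layout.
static_assert(std::is_standard_layout<device_int_array>::value,
              "device type must have a C-compatible layout");
```

The host type can then use any C++ feature it likes, while the kernel only ever sees the plain struct.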
CuPP Overview
(Almost) identical to a C++ function call
Supports both call by value and call by reference
Behaviour is customizable by using callback functions
This is where the magic starts ...
... and multiple seconds of compile time are spent
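The difference between call by value and call by reference can be sketched with a small host-only wrapper that inspects the kernel's signature at compile time. This is an illustrative assumption about how such a call could behave, not CuPP's implementation; `call_kernel` and both kernels are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical kernels: one takes its argument by value, one by reference.
void add_one_by_value(int x) { x += 1; }
void add_one_by_ref(int& x)  { x += 1; }

// Host-only stand-in for a kernel call. The parameter type P is deduced from
// the kernel's signature, so the copy-back decision is made at compile time.
template <typename P, typename Arg>
void call_kernel(void (*kernel)(P), Arg& host_arg) {
    assert(kernel != nullptr);

    // "Transfer" the argument to the device: here simply a copy.
    typename std::remove_reference<P>::type device_copy = host_arg;

    kernel(device_copy);  // the kernel works on the device copy only

    // Copy the result back only for non-const reference parameters.
    if (std::is_lvalue_reference<P>::value &&
        !std::is_const<typename std::remove_reference<P>::type>::value) {
        host_arg = device_copy;
    }
}
```

With this wrapper, `call_kernel(add_one_by_value, a)` leaves `a` unchanged, while `call_kernel(add_one_by_ref, b)` writes the incremented value back to `b`.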
CuPP Overview
... well currently there is just a std::vector wrapper
with some lazy memory copying
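Lazy memory copying can be illustrated with a wrapper that uploads its data only when the device copy is out of date. The class below is a host-only sketch under that assumption; `lazy_vector` and its members are hypothetical, and a second `std::vector` stands in for device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch of lazy copying; `uploads()` counts simulated
// host-to-device copies so the effect is observable.
class lazy_vector {
public:
    explicit lazy_vector(std::vector<int> values) : host_(std::move(values)) {}

    // A host-side write marks the device copy as stale.
    void set(std::size_t i, int v) {
        assert(i < host_.size());
        host_[i] = v;
        device_valid_ = false;
    }

    int get(std::size_t i) const { return host_[i]; }

    // Called before a kernel launch: copy only if the device copy is stale.
    const std::vector<int>& device_view() {
        if (!device_valid_) {
            device_ = host_;
            device_valid_ = true;
            ++uploads_;
        }
        return device_;
    }

    int uploads() const { return uploads_; }

private:
    std::vector<int> host_;
    std::vector<int> device_;
    bool device_valid_ = false;
    int  uploads_ = 0;
};
```

Calling `device_view()` repeatedly without touching the host data triggers a single upload; a real implementation would track the opposite direction (device to host) in the same way.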
But you can easily design (or adapt) your own data structures
Three steps to adapt your existing data structure
1 Add memory objects to store your data on the device
2 Create a C-conforming device type to be used on the GPU
3 Implement the kernel callback functions to transform between host and device type
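The three steps above can be sketched for a small wrapper around an existing `std::vector<float>`. Everything here is a host-only assumption for illustration: `device_memory`, `device_samples`, `samples`, `to_device` and `from_device` are hypothetical names, not CuPP's API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Step 1 (stand-in): a minimal "memory object" owning a buffer that plays
// the role of device memory in this sketch.
template <typename T>
class device_memory {
public:
    explicit device_memory(const std::vector<T>& src) : buf_(src) {}
    T*             ptr()            { return buf_.data(); }
    std::size_t    size() const     { return buf_.size(); }
    std::vector<T> download() const { return buf_; }
private:
    std::vector<T> buf_;
};

// Step 2: C-conforming device type, only a pointer and a length.
struct device_samples {
    float*      values;
    std::size_t count;
};

// Host type wrapping the existing data structure.
class samples {
public:
    explicit samples(std::vector<float> v) : host_(std::move(v)), mem_(host_) {}

    // Step 3a: callback run before the kernel; host type -> device type.
    device_samples to_device() {
        mem_ = device_memory<float>(host_);  // upload the current host data
        return device_samples{mem_.ptr(), mem_.size()};
    }

    // Step 3b: callback run after the kernel; fetch possibly modified data.
    void from_device() { host_ = mem_.download(); }

    const std::vector<float>& values() const { return host_; }

private:
    std::vector<float>   host_;  // declared first, so initialized before mem_
    device_memory<float> mem_;
};

// A "kernel" that only ever sees the device type.
void halve(device_samples d) {
    assert(d.values != nullptr || d.count == 0);
    for (std::size_t i = 0; i < d.count; ++i) d.values[i] *= 0.5f;
}
```

A caller would run `halve(s.to_device())` followed by `s.from_device()`; in a framework these two callbacks would be invoked automatically around the kernel call.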
CuPP Examples
cupp::device dev;
cupp::kernel k(get_fct_ptr(), gridDim, blockDim);

cupp::vector<int> one, two;
cupp::vector<int> *input  = &one;
cupp::vector<int> *output = &two;

for (int i = 0; i < 10; ++i) {
    // Run the kernel on the device identified by the handle.
    k(dev, *input, *output);
#if debug
    // Inspect the result on the host (j avoids shadowing the outer i).
    for (std::size_t j = 0; j < output->size(); ++j)
        std::cout << output->at(j) << ", ";
#endif
    // Ping-pong: this iteration's output is the next iteration's input.
    std::swap(input, output);
}

write_to_file(*input);
Conclusion
CuPP can ...
help you integrate CUDA into your application
manage your data on the device
CuPP can’t ...
manage multiple devices effectively
double buffering (GPU/CPU concurrency)
Conclusion
The overall performance is similar to that of a data structure that can be both created and transferred effectively.
Conclusion
The existing data structure cannot be transferred to the device
Solution?
Design a new data structure and rewrite all existing CPU functions
or use two types