CuPP – A framework for easy CUDA integration (Jens Breitbart)

SLIDE 1

CuPP – A framework for easy CUDA integration

Jens Breitbart¹

¹University of Kassel

Research Group Programming Languages / Methodologies

Rome, Italy, May 25, 2009

Breitbart CuPP – A framework for easy CUDA integration 1

SLIDE 2

The current state of CUDA development

GPUs are REALLY fast

3D Filterbank Convolution

                            Matlab   C/SSE   PS3     GT200
Performance (gflops)        0.3      9.0     110.0   330.0
Development time (hours)    0.5      10.0    30.0    10.0

GPU?

Nicolas Pinto, James DiCarlo, David Cox (MIT, Harvard)

SLIDE 3

Introduction CUDA

Overview

  • CUDA is NVIDIA's general-purpose programming system
  • In CUDA the GPU (= device) executes a function (= kernel) in the SPMD model
  • The kernels run in their own memory domain, and data must be explicitly transferred to and from it
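The two points above (one function executed for many thread indices, and explicit transfers between memory domains) can be sketched in plain C++. This is a host-only emulation under stated assumptions: no CUDA runtime is involved, a separate buffer merely stands in for device memory, and every name (`device_buffer`, `copy_to_device`, `launch`, `square_kernel`) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only emulation; no CUDA runtime is used. All names are hypothetical.

// Stands in for device memory: the host code only reaches it through
// the explicit copy functions below.
struct device_buffer {
    std::vector<int> storage;
};

void copy_to_device(device_buffer& dst, const std::vector<int>& src) {
    dst.storage = src;
}

void copy_to_host(std::vector<int>& dst, const device_buffer& src) {
    dst = src.storage;
}

// The "kernel": one function, executed once per thread index (SPMD).
void square_kernel(std::size_t thread_id, device_buffer& data) {
    if (thread_id < data.storage.size())
        data.storage[thread_id] *= data.storage[thread_id];
}

// Emulates a kernel launch by running the kernel for every thread index.
// A real GPU would run these instances in parallel.
void launch(std::size_t thread_count, device_buffer& data) {
    for (std::size_t tid = 0; tid < thread_count; ++tid)
        square_kernel(tid, data);
}

std::vector<int> square_on_device(const std::vector<int>& input) {
    device_buffer dev;
    copy_to_device(dev, input);   // explicit transfer: host -> device
    launch(input.size(), dev);    // same function, many thread indices
    std::vector<int> result;
    copy_to_host(result, dev);    // explicit transfer: device -> host
    return result;
}
```

Even in this toy version, the two copy calls surround every launch; that bookkeeping is what the rest of the talk tries to hide.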

SLIDE 4

Introduction CUDA

... so what is the problem?

Application:

  • dataflow is known
  • dataflow is not known
  • it is written in C
  • it is written in C++
SLIDE 5

CuPP Overview

CuPP – one solution to the problems

SLIDE 6

CuPP Overview

Device management

The developer uses a device handle to identify a GPU / device

A handle must be passed to all functions using the device

Designed to support multiple devices per thread...

... but this is not yet implemented

Some magic so far?

No, just some basics.

SLIDE 7

CuPP Overview

Memory management

Two levels:

1. CUDA-like, but C++-ified

  • Only use this for short experiments

2. CuPP memory objects

  • A memory object represents data stored in device memory
  • Use them to implement your data structures, not for everyday use

We still haven’t solved a problem...

... sorry, but we are close
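The idea of a memory object that owns a block of device memory can be illustrated with a small RAII class. This is only a sketch under stated assumptions: `memory_object`, `copy_from_host` and `copy_to_host` are hypothetical names rather than the CuPP interface, and ordinary heap memory stands in for device memory so that the example runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical sketch: host heap memory stands in for device memory.
template <typename T>
class memory_object {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte copies require a trivially copyable type");
public:
    // Allocates room for `count` elements; released again in the destructor.
    explicit memory_object(std::size_t count)
        : data_(count ? static_cast<T*>(std::malloc(count * sizeof(T))) : nullptr),
          count_(count) {
        assert(count == 0 || data_ != nullptr);
    }
    ~memory_object() { std::free(data_); }

    // Owning handle: copying is disabled to avoid freeing the block twice.
    memory_object(const memory_object&) = delete;
    memory_object& operator=(const memory_object&) = delete;

    // Stand-in for a host -> device transfer.
    void copy_from_host(const std::vector<T>& src) {
        assert(src.size() == count_);
        if (count_ != 0)
            std::memcpy(data_, src.data(), count_ * sizeof(T));
    }

    // Stand-in for a device -> host transfer.
    void copy_to_host(std::vector<T>& dst) const {
        dst.resize(count_);
        if (count_ != 0)
            std::memcpy(dst.data(), data_, count_ * sizeof(T));
    }

    std::size_t size() const { return count_; }

private:
    T* data_;
    std::size_t count_;
};
```

Tying allocation and release to object lifetime is what makes such a class a convenient building block for larger data structures.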

SLIDE 8

CuPP Overview

Support for classes

  • Recall: CUDA does not officially support C++
  • “Direct” use works, with some restrictions
  • Type transformations allow you to use two independent types:

    device type: used on the device
    host type: used on the host

SLIDE 9

CuPP Overview

Kernel call

  • (Almost) identical to a C++ function call
  • Supports both call by value and call by reference
  • Behaviour is customizable by using callback functions

This is where the magic starts ...

... and multiple seconds of compile time are spent
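One way to get function-call syntax together with call-by-value and call-by-reference semantics is to let the kernel's parameter type decide at compile time whether results are copied back. The sketch below is an assumption-laden illustration, not CuPP's implementation: `kernel`, `is_mutable_ref` and `copy_backs` are hypothetical names, only one parameter is supported, and a local copy stands in for the device-side data.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// True only for parameters declared as non-const lvalue references.
template <typename T>
struct is_mutable_ref : std::false_type {};
template <typename T>
struct is_mutable_ref<T&>
    : std::integral_constant<bool, !std::is_const<T>::value> {};

// Hypothetical single-parameter kernel wrapper (host-only emulation).
template <typename Param>
class kernel {
public:
    using value_type = typename std::remove_const<
        typename std::remove_reference<Param>::type>::type;

    explicit kernel(void (*fn)(Param)) : fn_(fn) {}

    // At the call site this looks like an ordinary function call.
    void operator()(value_type& arg) {
        value_type staged = arg;   // stand-in for the host -> device copy
        fn_(staged);
        if (is_mutable_ref<Param>::value) {
            arg = staged;          // stand-in for the device -> host copy
            ++copy_backs_;
        }
    }

    int copy_backs() const { return copy_backs_; }

private:
    void (*fn_)(Param);
    int copy_backs_ = 0;
};

// Two stand-in kernels that differ only in how they take their argument.
void add_one_by_reference(std::vector<int>& v) { for (int& x : v) ++x; }
void add_one_by_value(std::vector<int> v) { for (int& x : v) ++x; }
```

With this design the caller writes `k(data)` in both cases; only a kernel declared with a non-const reference causes a transfer back to the host. The template machinery involved also hints at where the compile time mentioned on the slide goes.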

SLIDE 10

CuPP Overview

Data structures

... well currently there is just a std::vector wrapper

with some lazy memory copying
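Lazy memory copying can be pictured as a vector that keeps a host copy and a device copy and transfers data only when the side being accessed is stale. The following is a minimal sketch under stated assumptions: `lazy_vector` and its members are hypothetical, it is not the CuPP vector, and a second `std::vector` stands in for device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch: a second std::vector stands in for device memory.
class lazy_vector {
public:
    explicit lazy_vector(std::vector<int> values) : host_(std::move(values)) {}

    // Host-side write: refresh the host copy if needed, then mark the
    // device copy as stale.
    void set(std::size_t i, int value) {
        sync_to_host();
        assert(i < host_.size());
        host_[i] = value;
        device_valid_ = false;
    }

    // Host-side read: downloads only if the host copy is stale.
    int get(std::size_t i) {
        sync_to_host();
        assert(i < host_.size());
        return host_[i];
    }

    // Called before a kernel uses the data; uploads only if the device
    // copy is stale.
    std::vector<int>& device_data() {
        if (!device_valid_) {
            device_ = host_;
            device_valid_ = true;
            ++uploads_;
        }
        host_valid_ = false;   // assume the kernel may modify the data
        return device_;
    }

    int uploads() const { return uploads_; }
    int downloads() const { return downloads_; }

private:
    void sync_to_host() {
        if (!host_valid_) {
            host_ = device_;
            host_valid_ = true;
            ++downloads_;
        }
    }

    std::vector<int> host_;
    std::vector<int> device_;
    bool host_valid_ = true;
    bool device_valid_ = false;
    int uploads_ = 0;
    int downloads_ = 0;
};
```

Repeated kernel calls on the same vector then cost a single upload, and nothing is downloaded until the host actually reads the data.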

But you can easily design (or adapt) your own data structures

Three steps to adapt your existing data structure:

1. Add memory objects to store your data on the device

2. Create a C-conforming device type to be used on the GPU

3. Implement the kernel callback functions to transform between host and device type
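The three steps can be walked through on a small, made-up data structure. Everything here is an assumption for illustration: `histogram`, `histogram_device`, `transform` and `dirty` are hypothetical names, a plain `std::vector` takes the place of a memory object, and the "kernel" is an ordinary host function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Step 2: a C-conforming device type (plain struct, no member functions).
struct histogram_device {
    int* bins;
    std::size_t bin_count;
};

// The existing host data structure, extended for device use.
class histogram {
public:
    explicit histogram(std::size_t bin_count)
        : bins_(bin_count, 0), device_bins_(bin_count, 0) {}

    void add(std::size_t bin) {
        assert(bin < bins_.size());
        ++bins_[bin];
    }

    int count(std::size_t bin) const {
        assert(bin < bins_.size());
        return bins_[bin];
    }

    // Step 3a: callback run before a kernel call; host type -> device type.
    histogram_device transform() {
        device_bins_ = bins_;   // stand-in for the host -> device copy
        return {device_bins_.data(), device_bins_.size()};
    }

    // Step 3b: callback run after a kernel call that may have changed the data.
    void dirty() {
        bins_ = device_bins_;   // stand-in for the device -> host copy
    }

private:
    std::vector<int> bins_;
    // Step 1: storage for the device-side data; a real version would
    // hold a memory object here instead of a host vector.
    std::vector<int> device_bins_;
};

// Stand-in for a kernel: it only knows the device type.
void double_bins_kernel(histogram_device h) {
    for (std::size_t i = 0; i < h.bin_count; ++i)
        h.bins[i] *= 2;
}
```

A caller would run `double_bins_kernel(h.transform())` followed by `h.dirty()`; in a framework of this kind those two callbacks would be invoked by the kernel call itself rather than by hand.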

SLIDE 11

CuPP Examples

The result of the puzzle

    cupp::device dev;
    cupp::kernel k(get_fct_ptr(), gridDim, blockDim);

    cupp::vector<int> one, two;
    cupp::vector<int> *input = &one;
    cupp::vector<int> *output = &two;

    for (int i = 0; i < 10; ++i) {
        k(dev, *input, *output);   // the kernel call looks like a function call
    #if debug
        // separate index name so the outer loop counter is not shadowed
        for (std::size_t j = 0; j < output->size(); ++j)
            std::cout << output->at(j) << ", ";
    #endif
        std::swap(input, output);  // this pass's output is the next pass's input
    }

    write_to_file(*input);

SLIDE 12

Conclusion

What we have done

CuPP can ...

  • help you integrate CUDA into your application
  • manage your data on the device

CuPP can’t ...

  • manage multiple devices effectively
  • provide double buffering (GPU/CPU concurrency)

SLIDE 14

Conclusion

Thank you

SLIDE 15

Conclusion

Data structure to speed up k-nn

The overall performance is similar to that of a data structure that can be both created and transferred effectively.

SLIDE 16

Conclusion

Einstein@Home

  • The existing data structure cannot be transferred to the device
  • Solution? Design a new data structure and rewrite all existing CPU functions, or use two types
