SLIDE 1

Investigation of the OpenCL support in the GeantV's Vectorized Geometry

Gabor Biro

22.09.2014.

SLIDE 2

Outline

▶ What is OpenCL™?
▶ VecGeom in a few words
▶ What are the goals?
▶ Results, conclusions

Gabor Biro (ELTE, Hungary) OpenCL support in VecGeom 2014.09.22. 1 / 16

SLIDE 3

What is OpenCL?

Khronos Group

▶ Khronos Group: founded in January 2000
▶ Non-profit, member-funded consortium focused on the creation of royalty-free open standards for parallel computing, graphics and dynamic media (∼100 companies)
▶ Currently 15 active standards, including OpenCL™
▶ OpenCL (Open Computing Language) is a framework that increases application performance by enabling efficient parallel programming of a variety of CPUs and GPUs
▶ Initially developed by Apple
▶ The OpenCL 1.0 specification was released in December 2008; it first shipped publicly with Mac OS X Snow Leopard on August 28, 2009
▶ C++ Wrapper API: built on top of the OpenCL C API 1.2

SLIDE 4

What is OpenCL?

OpenCL and C++

▶ AMD extension for C++ kernel language
  > major supported C++ features: kernel overloading, templates, namespaces, references…
  > major unsupported C++ features: virtual functions, abstract classes, dynamic memory allocation, the :: operator, STL and other standard C++ libraries…
▶ SYCL: "royalty-free, cross-platform C++ abstraction layer that builds on the underlying concepts, portability and efficiency of OpenCL, while adding the ease-of-use and flexibility of C++"
▶ Provisional SYCL 1.2 specification was released on March 19, 2014

SLIDE 5

What is OpenCL?

Heterogeneous systems

▶ Hierarchy of models:
  > platform model
  > execution model
    >> host programs
    >> kernels
  > memory model

Illustrations taken from: Introduction to OpenCL™ Programming, AMD

SLIDE 6

What is OpenCL?

host/device code

host code

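    // Create a context on the default device and list its devices
    cl::Context context(CL_DEVICE_TYPE_DEFAULT);
    std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();

    // Build the program from the kernel source file and create a command queue
    cl::Program program(context, util::loadProgram(clFile), (...));
    cl::CommandQueue queue(context);

    // Wrap the kernel in a callable functor
    auto KernelFunctor = cl::make_kernel<cl::Buffer&, (...)>(program, "kernelfunction");

    // Device buffer backed by the host array `input`
    cl::Buffer inputbuf = cl::Buffer(context, CL_MEM_USE_HOST_PTR,
                                     sizeof(Precision) * datapoints, input);

    // Launch one work-item per data point and wait for completion
    KernelFunctor(cl::EnqueueArgs(queue, cl::NDRange(datapoints)), inputbuf, (...));
    queue.finish();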

kernel code

    __kernel void kernelfunction(__global float* in, __global float* out) {
      // Each work-item handles the element given by its global id
      int id = get_global_id(0);
      out[id] = in[id] * in[id];
    }

scalar code

    void function(int datapoints, float* in, float* out) {
      // The explicit loop replaces the per-work-item global id
      for (int i = 0; i < datapoints; i++) {
        out[i] = in[i] * in[i];
      }
    }
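The relation between the kernel and the scalar loop above can be sketched in plain C++, without any OpenCL dependency. This is only an illustration: `squareWorkItem`, `launchSquare` and `squareAll` are hypothetical names, and the sequential loop merely stands in for a 1D NDRange launch that a real device would execute in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Body of the kernel for one work-item; `id` plays the role of get_global_id(0).
inline void squareWorkItem(std::size_t id, const float* in, float* out) {
  out[id] = in[id] * in[id];
}

// Hypothetical stand-in for a 1D NDRange launch: the kernel body is called
// once per global id. A real device would run these calls in parallel.
inline void launchSquare(std::size_t globalSize, const float* in, float* out) {
  assert(in != nullptr && out != nullptr);
  for (std::size_t id = 0; id < globalSize; ++id) {
    squareWorkItem(id, in, out);
  }
}

// Convenience wrapper: squares every element of `input`.
inline std::vector<float> squareAll(const std::vector<float>& input) {
  std::vector<float> output(input.size());
  if (!input.empty()) {
    launchSquare(input.size(), input.data(), output.data());
  }
  return output;
}
```

Calling `squareAll({1.0f, 2.0f, 3.0f})` yields the squared values in the same order; the loop in `launchSquare` is the part that the OpenCL runtime takes over.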

SLIDE 7

What is OpenCL?

Benefits

▶ OpenCL can manage all available computational resources
▶ Software portability: all the hardware implementation specifics (such as drivers and runtime) are invisible to the upper-level software programmers
▶ Highly customizable: the developer can choose the best hardware without having to reshuffle the upper software infrastructure

    devices[devid].getInfo<cl_device_info>();

where cl_device_info can be

▶ CL_DEVICE_TYPE
▶ CL_DEVICE_VENDOR_ID
▶ CL_DEVICE_MAX_COMPUTE_UNITS
▶ CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS
▶ CL_DEVICE_MAX_WORK_ITEM_SIZES
▶ CL_DEVICE_MAX_WORK_GROUP_SIZE
▶ CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE
▶ CL_DEVICE_MAX_CLOCK_FREQUENCY
▶ CL_DEVICE_GLOBAL_MEM_SIZE
▶ …

SLIDE 8

VecGeom

Vectorized Geometry

▶ Parallelism in
  > particle-level
  > primitive-level
▶ Support multiple architectures without having multiple implementations

[Figure: a generic kernel with scalar (autovectorized?), vectorized and GPU instantiations, shown against input sizes 1, 4, …, 1024]

Illustration borrowed from: First experience with portable high-performance geometry code on GPU, J.d.F.L.

SLIDE 9

VecGeom

Generic templated code

▶ Generic templated code
▶ Backend is specified at compile time
▶ The specific method for a specific volume is called in a generic way

    template <TranslationCode transCodeT, RotationCode rotCodeT>
    template <class Backend>
    VECGEOM_CUDA_HEADER_BOTH
    void ParallelepipedImplementation<transCodeT, rotCodeT>::DistanceToOut(
        UnplacedParallelepiped const &unplaced,
        Vector3D<typename Backend::precision_v> const &point,
        Vector3D<typename Backend::precision_v> const &direction,
        typename Backend::precision_v const &stepMax,
        typename Backend::precision_v &distance) {
      ...vectorized, generic computation...
    }

▶ Calling a method for a given volume:

    v->Specialized()->DistanceToOut(points, directions, fStepMax, distances);
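The idea of one generic routine instantiated per backend can be illustrated with a much smaller example. The sketch below is not VecGeom code: `ScalarBackend`, `Vector4Backend`, `Lanes4` and `SmallestOfThree` are hypothetical names, and the 4-lane struct only imitates a SIMD type. What it shares with the slide is the `Backend::precision_v` convention and the compile-time choice of backend.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar backend: one value per call.
struct ScalarBackend {
  using precision_v = double;
  static precision_v Min(precision_v a, precision_v b) { return std::min(a, b); }
};

// Hypothetical 4-lane value type standing in for a SIMD register.
struct Lanes4 {
  std::array<double, 4> v;
  double at(std::size_t i) const {
    assert(i < 4);
    return v[i];
  }
};

// Hypothetical vector backend: the same operation applied lane by lane.
struct Vector4Backend {
  using precision_v = Lanes4;
  static precision_v Min(const precision_v& a, const precision_v& b) {
    precision_v r;
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = std::min(a.v[i], b.v[i]);
    return r;
  }
};

// Generic routine: written once, instantiated per backend at compile time.
template <class Backend>
typename Backend::precision_v SmallestOfThree(
    const typename Backend::precision_v& a,
    const typename Backend::precision_v& b,
    const typename Backend::precision_v& c) {
  return Backend::Min(Backend::Min(a, b), c);
}
```

`SmallestOfThree<ScalarBackend>(3.0, 1.0, 2.0)` works on single values, while `SmallestOfThree<Vector4Backend>` takes three `Lanes4` arguments and returns the lane-wise minimum; the body of the routine is identical in both cases.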

SLIDE 10

VecGeom + OpenCL

Goals and motivations

▶ Goals:
  > investigate the OpenCL support with the least possible modification of the existing code
  > implement an OpenCL API (without creating too much extra code)
  > investigate the performance with AMD GPUs
▶ Motivations:
  > architecture independence
  > for GPUs the CUDA backend already exists, but there is no support for AMD cards
▶ Target of investigation:
  > simplest volume: Box
▶ Used:
  > AMD Accelerated Parallel Processing (APP) SDK
  > AMD OpenCL Static C++ Kernel Language Extension

SLIDE 11

VecGeom + OpenCL

Experiences

▶ The main problems are the unsupported C++ features like STL libraries
  > macros are not a good solution
▶ Structure of the kernel functions is different compared to the generic volume methods

    int3 id = (int3)(get_global_id(0), get_global_id(1), get_global_id(2));
    int gSize = get_global_size(0);
    get_local_id(0);
    get_num_groups(0);
    ...
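How the index functions listed above relate to each other in one dimension can be sketched in plain C++. The helper type `WorkItemIds` and the function `idsFor` are hypothetical; the sketch assumes a zero global offset and a global size that is a multiple of the work-group size.

```cpp
#include <cassert>
#include <cstddef>

// Indices that one work-item sees in a 1D NDRange (hypothetical helper type).
struct WorkItemIds {
  std::size_t globalId;   // value of get_global_id(0)
  std::size_t localId;    // value of get_local_id(0)
  std::size_t groupId;    // value of get_group_id(0)
  std::size_t numGroups;  // value of get_num_groups(0)
};

// Computes the ids of work-item `globalId`, assuming a zero global offset and
// a global size that is a multiple of the work-group size.
inline WorkItemIds idsFor(std::size_t globalId, std::size_t globalSize,
                          std::size_t localSize) {
  assert(localSize > 0 && globalSize % localSize == 0 && globalId < globalSize);
  WorkItemIds ids;
  ids.groupId = globalId / localSize;
  ids.localId = globalId % localSize;
  // Under the assumptions above this reproduces the input global id.
  ids.globalId = ids.groupId * localSize + ids.localId;
  ids.numGroups = globalSize / localSize;
  return ids;
}
```

For a global size of 16 and a work-group size of 4, work-item 10 sits in group 2 with local id 2, and there are 4 groups in total. A generic volume method has no counterpart to these indices, which is one reason the kernel structure differs.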

SLIDE 12

VecGeom + OpenCL

Experiences

▶ The OpenCL kernel launching process needs a different approach
▶ Structure of the kernel functions is different compared to the generic volume methods

    CommandQueue::enqueueNDRangeKernel(Kernel &kernel,
                                       NDRange &offset,
                                       NDRange &global,
                                       NDRange &local,
                                       VECTOR_CLASS<Event> *events,
                                       Event *event);

▶ Currently the OpenCL support of the volume methods is feasible only with new, external codes
  > minimal modification of existing codes (code maintenance)
  > new kernel codes with a different working mechanism
  > OpenCL should deliver results at least as fast as the vectorized code
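One practical consequence of the `global` and `local` arguments of `enqueueNDRangeKernel`: in OpenCL 1.x, when a work-group size is given explicitly, the global size has to be a multiple of it. A common way to handle an arbitrary number of points is to pad the global size and guard the kernel body. The sketch below emulates this in plain C++; `roundUpToMultiple` and `launchGuarded` are hypothetical helpers, not part of the OpenCL API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest multiple of `localSize` that is >= `count`.
inline std::size_t roundUpToMultiple(std::size_t count, std::size_t localSize) {
  assert(localSize > 0);
  return ((count + localSize - 1) / localSize) * localSize;
}

// Emulated launch over the padded range: work-items with id >= count are
// padding and must not touch memory. Returns how often each element was hit.
inline std::vector<int> launchGuarded(std::size_t count, std::size_t localSize) {
  std::vector<int> touched(count, 0);
  const std::size_t globalSize = roundUpToMultiple(count, localSize);
  for (std::size_t id = 0; id < globalSize; ++id) {
    if (id < count) {  // the bounds check a real kernel would need
      touched[id] += 1;
    }
  }
  return touched;
}
```

With 1000 points and a work-group size of 64, the padded global size is 1024, so 24 work-items fall through the guard without doing any work.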

SLIDE 13

VecGeom + OpenCL

Benchmarks

Figure: Benchmark with an Intel Core i3 M350 CPU

SLIDE 14

VecGeom + OpenCL

Benchmarks

Figure: Benchmark with an ATI Radeon HD 5970 GPU

SLIDE 15

VecGeom + OpenCL

Benchmarks

Figure: Benchmark with an AMD Opteron 6376 CPU

SLIDE 16

Conclusions

▶ Memory management with OpenCL is a difficult task
▶ Optimization and more benchmarks are needed
▶ Using the current version of OpenCL requires some effort
▶ Further investigation of OpenCL/SYCL is needed

SLIDE 17

backup slides

SLIDE 18

Kernel of the DistanceToOut

    // Double precision needs the cl_khr_fp64 extension on OpenCL 1.0/1.1 devices
    #pragma OPENCL EXTENSION cl_khr_fp64 : enable

    __kernel void FastDistanceToOut(__global double* dimensions,
                                    __global double* posX, __global double* posY, __global double* posZ,
                                    __global double* dirX, __global double* dirY, __global double* dirZ,
                                    __global double* distances,
                                    unsigned PointCount) {
      // Points are processed in pairs; if PointCount is odd, the last point is not processed
      unsigned rem = PointCount % 2;
      for (unsigned i = 0; i < PointCount - rem; i = i + 2) {
        // Per-axis distance to the box face in the direction of travel, for points i and i+1;
        // the 1e-30 term avoids division by zero
        double2 distX = (double2)(((copysign(dimensions[0], dirX[i])   - posX[i])   * (1. / (dirX[i]   + 1e-30))),
                                  ((copysign(dimensions[0], dirX[i+1]) - posX[i+1]) * (1. / (dirX[i+1] + 1e-30))));
        double2 distY = (double2)(((copysign(dimensions[1], dirY[i])   - posY[i])   * (1. / (dirY[i]   + 1e-30))),
                                  ((copysign(dimensions[1], dirY[i+1]) - posY[i+1]) * (1. / (dirY[i+1] + 1e-30))));
        double2 distZ = (double2)(((copysign(dimensions[2], dirZ[i])   - posZ[i])   * (1. / (dirZ[i]   + 1e-30))),
                                  ((copysign(dimensions[2], dirZ[i+1]) - posZ[i+1]) * (1. / (dirZ[i+1] + 1e-30))));

        // The result is the minimum over the three axes
        double2 temp  = (double2)(fmin(distX.x, distY.x), distZ.x);
        double2 temp2 = (double2)(fmin(distX.y, distY.y), distZ.y);
        distances[i]   = fmin(temp.x, temp.y);
        distances[i+1] = fmin(temp2.x, temp2.y);
      }
    }
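A scalar C++ reference for the per-point formula used in the kernel can be handy when checking its output. The sketch below handles a single point; `boxDistanceToOut` is a hypothetical name, and treating `dims` as the half-lengths of an origin-centered box is an assumption read off the kernel's arithmetic rather than something the slide states.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Scalar reference for one point: distance along `dir` from `pos` to the box
// surface. Assumption: `dims` holds the half-lengths of an origin-centered
// box and `pos` lies inside it.
inline double boxDistanceToOut(const double dims[3], const double pos[3],
                               const double dir[3]) {
  assert(dims != nullptr && pos != nullptr && dir != nullptr);
  double dist[3];
  for (int k = 0; k < 3; ++k) {
    // Same expression as in the kernel; 1e-30 avoids division by zero.
    dist[k] = (std::copysign(dims[k], dir[k]) - pos[k]) * (1.0 / (dir[k] + 1e-30));
  }
  return std::min(std::min(dist[0], dist[1]), dist[2]);
}
```

For a box with half-lengths (1, 2, 3), a point at the origin moving along +x leaves after a distance of 1, and a point at x = 0.5 moving along -x leaves after 1.5; the axes with zero direction component produce very large values that never win the minimum.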

SLIDE 19

Illustration borrowed from: Khronos Overview, November 2012
