Investigation of the OpenCL support in GeantV's Vectorized Geometry
Gabor Biro (ELTE, Hungary)
22.09.2014
Outline
▶ What is OpenCL™?
▶ VecGeom in a few words
▶ What are the goals?
▶ Results, conclusions
Gabor Biro (ELTE, Hungary) OpenCL support in VecGeom 2014.09.22. 1 / 16
What is OpenCL?
Khronos Group
▶ Khronos Group: founded in January 2000
▶ Non-profit, member-funded consortium focused on the creation of royalty-free open standards for parallel computing, graphics and dynamic media (∼100 companies)
▶ Currently 15 active standards, including OpenCL™
▶ OpenCL (Open Computing Language) is a framework that increases application performance by enabling efficient parallel programming of a variety of devices, such as CPUs and GPUs
▶ Initially developed by Apple
▶ OpenCL 1.0 first shipped with Mac OS X Snow Leopard on August 28, 2009
▶ C++ wrapper API: built on top of the OpenCL 1.2 C API
What is OpenCL?
OpenCL and C++
▶ AMD extension for a C++ kernel language
  > major supported C++ features: kernel overloading, templates, namespaces, references, …
  > major unsupported C++ features: virtual functions, abstract classes, dynamic memory allocation, the :: operator, STL and other standard C++ libraries, …
▶ SYCL: "royalty-free, cross-platform C++ abstraction layer that builds on the underlying concepts, portability and efficiency of OpenCL, while adding the ease-of-use and flexibility of C++"
▶ Provisional SYCL 1.2 specification was released on March 19, 2014
What is OpenCL?
Heterogeneous systems
▶ Hierarchy of models:
  > platform model
  > execution model
    >> host programs
    >> kernels
  > memory model
Illustrations taken from: Introduction to OpenCL™ Programming, AMD
What is OpenCL?
host/device code
host code
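    // Create a context on the default device type and list its devices
    cl::Context context(CL_DEVICE_TYPE_DEFAULT);
    std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();
    // Build the program from the kernel source file
    cl::Program program(context, util::loadProgram(clFile), (...));
    cl::CommandQueue queue(context);
    // Wrap the kernel "kernelfunction" as a callable functor
    auto KernelFunctor = cl::make_kernel<cl::Buffer&, (...)>(program, "kernelfunction");
    // Wrap the host input array in a device buffer
    cl::Buffer inputbuf = cl::Buffer(context, CL_MEM_USE_HOST_PTR, sizeof(Precision)*datapoints, input);
    // Launch one work-item per data point, then wait for completion
    KernelFunctor(cl::EnqueueArgs(queue, cl::NDRange(datapoints)), inputbuf, (...));
    queue.finish();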
kernel code
    __kernel void kernelfunction(__global float* in, __global float* out) {
      int id = get_global_id(0);   // index of this work-item
      out[id] = in[id] * in[id];
    }
scalar code
    void function(int datapoints, float* in, float* out) {
      for (int i = 0; i < datapoints; i++) {
        out[i] = in[i] * in[i];
      }
    }
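To make the correspondence between the two versions explicit, here is a plain C++ sketch (no OpenCL runtime needed) that treats the kernel body as a function of the work-item index and emulates a one-dimensional NDRange with an ordinary loop. The names `square_work_item`, `emulate_ndrange` and `kernel_matches_scalar` are hypothetical; a real OpenCL runtime executes work-items concurrently and in no guaranteed order, which this sequential emulation does not model.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Kernel body for one work-item: 'id' plays the role of get_global_id(0).
void square_work_item(std::size_t id, const float* in, float* out) {
  out[id] = in[id] * in[id];
}

// Sequential stand-in for launching a kernel over cl::NDRange(globalSize):
// every index in [0, globalSize) is visited exactly once.
void emulate_ndrange(std::size_t globalSize,
                     const std::function<void(std::size_t)>& workItem) {
  for (std::size_t id = 0; id < globalSize; ++id) workItem(id);
}

// Scalar reference: the loop-based version of the same computation.
void square_scalar(int datapoints, const float* in, float* out) {
  for (int i = 0; i < datapoints; i++) out[i] = in[i] * in[i];
}

// Runs both versions and reports whether they agree element by element.
bool kernel_matches_scalar(const std::vector<float>& in) {
  std::vector<float> viaKernel(in.size()), viaScalar(in.size());
  emulate_ndrange(in.size(), [&](std::size_t id) {
    square_work_item(id, in.data(), viaKernel.data());
  });
  square_scalar(static_cast<int>(in.size()), in.data(), viaScalar.data());
  return viaKernel == viaScalar;
}
```

The `for` loop of the scalar code disappears from the kernel: the runtime supplies the index, and the kernel only describes what happens for one element.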
What is OpenCL?
Benefits
▶ OpenCL can manage all available computational resources
▶ Software portability: all hardware implementation specifics (such as drivers and runtime) are invisible to upper-level software programmers
▶ Highly customizable: the developer can choose the best hardware without having to reshuffle the upper-level software infrastructure
    devices[devid].getInfo<cl_device_info>();
where cl_device_info can be, for example:
▶ CL_DEVICE_TYPE
▶ CL_DEVICE_VENDOR_ID
▶ CL_DEVICE_MAX_COMPUTE_UNITS
▶ CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS
▶ CL_DEVICE_MAX_WORK_ITEM_SIZES
▶ CL_DEVICE_MAX_WORK_GROUP_SIZE
▶ CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE
▶ CL_DEVICE_MAX_CLOCK_FREQUENCY
▶ CL_DEVICE_GLOBAL_MEM_SIZE
▶ …
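The point about choosing the best hardware can be illustrated without an OpenCL runtime. The sketch below assumes that a few properties (device type, CL_DEVICE_MAX_COMPUTE_UNITS, CL_DEVICE_MAX_CLOCK_FREQUENCY in MHz) have already been read with getInfo into a hypothetical `DeviceInfo` struct. The scoring rule, compute units times clock frequency, is a crude heuristic chosen only for illustration and is not prescribed by the slides; in practice a short benchmark per device is more reliable.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical summary of values obtained via devices[devid].getInfo<...>().
struct DeviceInfo {
  std::string name;
  bool isGpu;                // derived from CL_DEVICE_TYPE
  unsigned maxComputeUnits;  // CL_DEVICE_MAX_COMPUTE_UNITS
  unsigned maxClockMHz;      // CL_DEVICE_MAX_CLOCK_FREQUENCY (MHz)
};

// Crude, illustrative throughput score.
unsigned long long score(const DeviceInfo& d) {
  return static_cast<unsigned long long>(d.maxComputeUnits) * d.maxClockMHz;
}

// Returns the index of the highest-scoring device, optionally restricted to
// GPUs; returns devices.size() if no device qualifies.
std::size_t pick_device(const std::vector<DeviceInfo>& devices, bool gpuOnly) {
  std::size_t best = devices.size();
  for (std::size_t i = 0; i < devices.size(); ++i) {
    if (gpuOnly && !devices[i].isGpu) continue;
    if (best == devices.size() || score(devices[i]) > score(devices[best]))
      best = i;
  }
  return best;
}
```

Because the selection works on queried properties only, the code that launches kernels does not need to change when a different device is picked.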
VecGeom
Vectorized Geometry
▶ Parallelism in
  > particle-level
  > primitive-level
▶ Support multiple architectures without having multiple implementations
[Figure: one generic kernel with a scalar (autovectorized?) instantiation, a vectorized instantiation and a GPU instantiation; axis: input size (1, 4, …, 1024)]
Illustration borrowed from: First experience with portable high-performance geometry code on GPU, J.d.F.L.
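The idea of one generic kernel with several instantiations can be sketched in plain C++: the computation is written once against a `Backend` type that supplies `precision_v`, and each backend instantiates it at compile time. The two backends below (`ScalarBackend` and a four-lane `FourLaneBackend` built on a small `Lanes` struct) and the `SquaredLength` kernel are hypothetical stand-ins; VecGeom's real backends wrap actual SIMD types or CUDA.

```cpp
#include <array>
#include <cstddef>

// Minimal fixed-width "vector register" with element-wise + and *.
template <std::size_t N>
struct Lanes {
  std::array<double, N> v;
  friend Lanes operator+(const Lanes& a, const Lanes& b) {
    Lanes r{};
    for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
  }
  friend Lanes operator*(const Lanes& a, const Lanes& b) {
    Lanes r{};
    for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
  }
};

// Hypothetical backends: each only names its precision_v type.
struct ScalarBackend   { using precision_v = double; };
struct FourLaneBackend { using precision_v = Lanes<4>; };

// Generic kernel written once; the backend is fixed at compile time.
template <class Backend>
typename Backend::precision_v SquaredLength(
    typename Backend::precision_v const& x,
    typename Backend::precision_v const& y,
    typename Backend::precision_v const& z) {
  return x * x + y * y + z * z;
}
```

`SquaredLength<ScalarBackend>` processes one point per call, while `SquaredLength<FourLaneBackend>` processes four points per call with the same source code.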
VecGeom
Generic templated code
▶ Generic templated code
▶ Backend is specified at compile time
▶ The specific method for a specific volume is called in a generic way

    template <TranslationCode transCodeT, RotationCode rotCodeT>
    template <class Backend>
    VECGEOM_CUDA_HEADER_BOTH
    void ParallelepipedImplementation<transCodeT, rotCodeT>::DistanceToOut(
        UnplacedParallelepiped const &unplaced,
        Vector3D<typename Backend::precision_v> const &point,
        Vector3D<typename Backend::precision_v> const &direction,
        typename Backend::precision_v const &stepMax,
        typename Backend::precision_v &distance) {
      ...vectorized, generic computation...
    }

▶ Calling a method for a given volume:

    v->Specialized()->DistanceToOut(points, directions, fStepMax, distances);
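A call such as `v->Specialized()->DistanceToOut(...)` relies on a common C++ pattern: a base class exposes a virtual interface, and a class template specialized on compile-time codes implements it, so the specialized code can work with constants instead of runtime branches. The sketch below shows only that pattern; `TranslationCode` here, `VPlacedSlab`, `SpecializedSlab`, `MakeSlab` and the one-dimensional "slab" geometry are hypothetical simplifications, not VecGeom's actual classes.

```cpp
#include <memory>

// Hypothetical compile-time code, analogous to TranslationCode.
enum class TranslationCode { identity, generic };

// Runtime interface: callers only see this.
class VPlacedSlab {
 public:
  virtual ~VPlacedSlab() = default;
  // Distance, moving along +x, from x to the far face of a slab of
  // half-width h centred at 'shift' (1D simplification for illustration).
  virtual double DistanceToOut(double x) const = 0;
};

// Implementation specialized on the translation code at compile time.
template <TranslationCode transCodeT>
class SpecializedSlab : public VPlacedSlab {
 public:
  SpecializedSlab(double halfWidth, double shift) : h_(halfWidth), shift_(shift) {}
  double DistanceToOut(double x) const override {
    // For the identity code the condition is a compile-time constant,
    // so the compiler can drop the subtraction.
    const double local = (transCodeT == TranslationCode::identity) ? x : x - shift_;
    return h_ - local;
  }
 private:
  double h_;
  double shift_;
};

// Factory that chooses the specialization once, when the volume is created.
std::unique_ptr<VPlacedSlab> MakeSlab(double halfWidth, double shift) {
  if (shift == 0.0)
    return std::unique_ptr<VPlacedSlab>(
        new SpecializedSlab<TranslationCode::identity>(halfWidth, shift));
  return std::unique_ptr<VPlacedSlab>(
      new SpecializedSlab<TranslationCode::generic>(halfWidth, shift));
}
```

The caller writes `slab->DistanceToOut(x)` regardless of which specialization was chosen; the cost is one virtual call, paid once per call rather than once per branch inside the computation.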
VecGeom + OpenCL
Goals and motivations
▶ Goals:
  > investigate the OpenCL support with the least possible modification of the existing code
  > implement an OpenCL API (without creating too much extra code)
  > investigate the performance with AMD GPUs
▶ Motivations:
  > architecture independence
  > for GPUs the CUDA backend already exists, but there is no support for AMD cards
▶ Target of investigation:
  > simplest volume: Box
▶ Used:
  > AMD Accelerated Parallel Processing (APP) SDK
  > AMD OpenCL Static C++ Kernel Language Extension
VecGeom + OpenCL
Experiences
▶ The main problems are the unsupported C++ features, such as the STL
  > working around them with macros is not a good solution
▶ The structure of the kernel functions differs from that of the generic volume methods:

    int3 id = (int3)(get_global_id(0), get_global_id(1), get_global_id(2));
    int gSize = get_global_size(0);
    get_local_id(0);
    get_num_groups(0);
    ...
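Kernels index their data by work-item coordinates rather than by a loop counter. When a three-dimensional NDRange covers a flat array, each work-item typically converts its (x, y, z) global id into a linear offset using the global sizes. The helpers `linear_index` and `covers_each_slot_once` below are hypothetical host-side mirrors of that arithmetic, written in plain C++ so they can be checked without a device; the row-major ordering (x fastest) is an assumption of this example.

```cpp
#include <cstddef>
#include <vector>

// Index arithmetic a 3D kernel would do with get_global_id(0..2) and
// get_global_size(0..1): x varies fastest, then y, then z.
std::size_t linear_index(std::size_t x, std::size_t y, std::size_t z,
                         std::size_t sizeX, std::size_t sizeY) {
  return x + sizeX * (y + sizeY * z);
}

// Checks that every (x, y, z) in a sizeX * sizeY * sizeZ grid maps to a
// distinct slot in [0, sizeX * sizeY * sizeZ).
bool covers_each_slot_once(std::size_t sx, std::size_t sy, std::size_t sz) {
  std::vector<int> hits(sx * sy * sz, 0);
  for (std::size_t z = 0; z < sz; ++z)
    for (std::size_t y = 0; y < sy; ++y)
      for (std::size_t x = 0; x < sx; ++x)
        ++hits[linear_index(x, y, z, sx, sy)];
  for (int h : hits)
    if (h != 1) return false;
  return true;
}
```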
VecGeom + OpenCL
Experiences
▶ The OpenCL kernel launching process needs a different approach
▶ The structure of the kernel functions differs from that of the generic volume methods

    cl_int CommandQueue::enqueueNDRangeKernel(const Kernel &kernel,
                                              const NDRange &offset,
                                              const NDRange &global,
                                              const NDRange &local = NullRange,
                                              const VECTOR_CLASS<Event> *events = NULL,
                                              Event *event = NULL) const;

▶ Currently, OpenCL support for the volume methods is feasible only with new, external code:
  > minimal modification of existing code (code maintenance)
  > new kernel code with a different working mechanism
  > with OpenCL one should obtain results at least as fast as with the vectorized code
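The offset, global and local arguments of enqueueNDRangeKernel define how the index space is split into work-groups. In one dimension, OpenCL 1.x requires the global size to be a multiple of the local (work-group) size, and each work-item's global id equals offset + group id × local size + local id. The sequential C++ sketch below emulates that decomposition on the host; `WorkItemIds` and `emulate_enqueue_1d` are hypothetical names, and a real device runs work-groups concurrently.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>

// Ids a work-item would obtain from get_global_id(0), get_local_id(0)
// and get_group_id(0).
struct WorkItemIds {
  std::size_t global, local, group;
};

// Sequential emulation of a 1D enqueueNDRangeKernel(kernel, offset, global, local).
// Returns the number of work-groups (what get_num_groups(0) would report).
std::size_t emulate_enqueue_1d(std::size_t offset, std::size_t globalSize,
                               std::size_t localSize,
                               const std::function<void(const WorkItemIds&)>& kernel) {
  if (localSize == 0 || globalSize % localSize != 0)
    throw std::invalid_argument("global size must be a multiple of local size");
  const std::size_t numGroups = globalSize / localSize;
  for (std::size_t g = 0; g < numGroups; ++g)
    for (std::size_t l = 0; l < localSize; ++l)
      kernel(WorkItemIds{offset + g * localSize + l, l, g});
  return numGroups;
}
```

For example, offset 10, global size 8 and local size 4 give two work-groups whose work-items see global ids 10 to 17.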
VecGeom + OpenCL
Benchmarks
Figure: Benchmark with an Intel Core i3 M350 CPU
VecGeom + OpenCL
Benchmarks
Figure: Benchmark with an ATI Radeon HD 5970 GPU
VecGeom + OpenCL
Benchmarks
Figure: Benchmark with an AMD Opteron 6376 CPU
Conclusions
▶ Memory management with OpenCL is a difficult task
▶ Optimization and more benchmarks are needed
▶ Using the current version of OpenCL requires some effort
▶ Further investigation with OpenCL/SYCL is needed
backup slides
Kernel of the DistanceToOut
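    __kernel void FastDistanceToOut(__global double* dimensions,
                                    __global double* posX, __global double* posY, __global double* posZ,
                                    __global double* dirX, __global double* dirY, __global double* dirZ,
                                    __global double* distances, unsigned PointCount) {
      // This kernel does not call get_global_id: each work-item loops over all points.
      // Points are processed in pairs using double2 vectors.
      unsigned rem = PointCount % 2;
      for (unsigned i = 0; i < PointCount - rem; i = i + 2) {
        // Distance to the exit plane along each axis (1e-30 avoids division by zero)
        double2 distX = (double2)(
            (copysign(dimensions[0], dirX[i])   - posX[i])   * (1. / (dirX[i]   + 1e-30)),
            (copysign(dimensions[0], dirX[i+1]) - posX[i+1]) * (1. / (dirX[i+1] + 1e-30)));
        double2 distY = (double2)(
            (copysign(dimensions[1], dirY[i])   - posY[i])   * (1. / (dirY[i]   + 1e-30)),
            (copysign(dimensions[1], dirY[i+1]) - posY[i+1]) * (1. / (dirY[i+1] + 1e-30)));
        double2 distZ = (double2)(
            (copysign(dimensions[2], dirZ[i])   - posZ[i])   * (1. / (dirZ[i]   + 1e-30)),
            (copysign(dimensions[2], dirZ[i+1]) - posZ[i+1]) * (1. / (dirZ[i+1] + 1e-30)));
        // Smallest of the three axis distances for each of the two points
        double2 temp  = (double2)(fmin(distX.x, distY.x), distZ.x);
        double2 temp2 = (double2)(fmin(distX.y, distY.y), distZ.y);
        distances[i]   = fmin(temp.x, temp.y);
        distances[i+1] = fmin(temp2.x, temp2.y);
      }
      // The paired loop skips the last point when PointCount is odd
      if (rem) {
        unsigned i = PointCount - 1;
        double dX = (copysign(dimensions[0], dirX[i]) - posX[i]) * (1. / (dirX[i] + 1e-30));
        double dY = (copysign(dimensions[1], dirY[i]) - posY[i]) * (1. / (dirY[i] + 1e-30));
        double dZ = (copysign(dimensions[2], dirZ[i]) - posZ[i]) * (1. / (dirZ[i] + 1e-30));
        distances[i] = fmin(fmin(dX, dY), dZ);
      }
    }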
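To validate a kernel like this one on the host, a scalar reference with the same arithmetic is useful. The sketch below mirrors the kernel's per-point formula (exit-plane distance along each axis, with 1e-30 added to the direction to avoid division by zero, then the minimum over the three axes) and handles any number of points, including an odd count. The `Tracks` struct and the `BoxDistanceToOutRef` name are hypothetical, and we assume `dimensions` holds the box half-lengths; points already outside the box are not treated specially.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Structure-of-arrays input, matching the kernel's separate posX/posY/... buffers.
struct Tracks {
  std::vector<double> posX, posY, posZ, dirX, dirY, dirZ;
};

// Scalar reference for the box DistanceToOut kernel: for each axis, the
// distance to the face the direction points at, then the smallest of the three.
// Assumes dimensions[0..2] are the box half-lengths.
std::vector<double> BoxDistanceToOutRef(const double dimensions[3], const Tracks& t) {
  const std::size_t n = t.posX.size();
  std::vector<double> out(n);
  for (std::size_t i = 0; i < n; ++i) {
    const double dx = (std::copysign(dimensions[0], t.dirX[i]) - t.posX[i]) * (1. / (t.dirX[i] + 1e-30));
    const double dy = (std::copysign(dimensions[1], t.dirY[i]) - t.posY[i]) * (1. / (t.dirY[i] + 1e-30));
    const double dz = (std::copysign(dimensions[2], t.dirZ[i]) - t.posZ[i]) * (1. / (t.dirZ[i] + 1e-30));
    out[i] = std::fmin(std::fmin(dx, dy), dz);
  }
  return out;
}
```

Comparing the kernel's `distances` buffer against this reference for both even and odd point counts would catch pairing mistakes like a skipped last element.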
Illustration borrowed from: Khronos Overview, November 2012