Investigation of the OpenCL support in the GeantV's Vectorized Geometry

Gabor Biro (ELTE, Hungary), 22.09.2014. Outline: What is OpenCL? VecGeom in a few words. What are the goals? Results, conclusions.


  1. Investigation of the OpenCL support in the GeantV's Vectorized Geometry Gabor Biro 22.09.2014.

  2. Outline
     ▶ What is OpenCL™?
     ▶ VecGeom in a few words
     ▶ What are the goals?
     ▶ Results, conclusions
     Gabor Biro (ELTE, Hungary), OpenCL support in VecGeom, 2014.09.22.

  3. What is OpenCL? Khronos Group
     ▶ Khronos Group: founded in January 2000
     ▶ Non-profit, member-funded consortium focused on the creation of royalty-free open standards for parallel computing, graphics and dynamic media (~100 companies)
     ▶ Currently 15 active standards, including OpenCL™
     ▶ OpenCL (Open Computing Language) is a framework that increases application performance by enabling efficient parallel programming of a variety of CPUs and GPUs
     ▶ Initially developed by Apple
     ▶ The OpenCL 1.0 specification was released in December 2008; the first implementation shipped with Mac OS X Snow Leopard on August 28, 2009
     ▶ C++ Wrapper API: built on top of the OpenCL 1.2 C API

  4. What is OpenCL? OpenCL and C++
     ▶ AMD extension for a C++ kernel language
       > major supported C++ features: kernel overloading, templates, namespaces, references…
       > major unsupported C++ features: virtual functions, abstract classes, dynamic memory allocation, the :: operator, STL and other standard C++ libraries…
     ▶ SYCL: "royalty-free, cross-platform C++ abstraction layer that builds on the underlying concepts, portability and efficiency of OpenCL, while adding the ease-of-use and flexibility of C++"
     ▶ The provisional SYCL 1.2 specification was released on March 19, 2014

  5. What is OpenCL? Heterogeneous systems
     ▶ Hierarchy of models:
       > platform model
       > execution model
         >> host programs
         >> kernels
       > memory model
     Illustrations taken from: Introduction to OpenCL™ Programming, AMD

  6. What is OpenCL? host/device code
     host code:
         cl::Context context(CL_DEVICE_TYPE_DEFAULT);
         std::vector<cl::Device> devices = context.getInfo<CL_CONTEXT_DEVICES>();
         cl::Program program(context, util::loadProgram(clFile), (...));
         cl::CommandQueue queue(context);
         auto KernelFunctor = cl::make_kernel<cl::Buffer&, (...)>(program, "kernelfunction");
         cl::Buffer inputbuf = cl::Buffer(context, CL_MEM_USE_HOST_PTR,
                                          sizeof(Precision)*datapoints, input);
         KernelFunctor(cl::EnqueueArgs(queue, cl::NDRange(datapoints)), inputbuf, (...));
         queue.finish();
     kernel code:
         __kernel void kernelfunction(__global float* in,
                                      __global float* out){
           int id = get_global_id(0);
           out[id] = in[id] * in[id];
         }
     scalar code:
         void function(int datapoints, float* in, float* out){
           for (int i = 0; i < datapoints; i++){
             out[i] = in[i] * in[i];
           }
         }
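The relation between the kernel and the scalar loop on this slide can be sketched in plain C++, without an OpenCL runtime: the kernel body is written for a single work-item, and the launch invokes it once per global index. The names `square_work_item`, `run_ndrange_1d` and `square_all` are hypothetical and only illustrate the execution model; a real device runs the work-items in parallel and in no guaranteed order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Body of the kernel for one work-item; `id` plays the role of get_global_id(0).
void square_work_item(std::size_t id, const float* in, float* out) {
  assert(in != nullptr && out != nullptr);
  out[id] = in[id] * in[id];
}

// Hypothetical stand-in for enqueueing a 1D NDRange: calls the kernel body
// once per global index. This emulation is sequential; a device is not.
template <class KernelBody>
void run_ndrange_1d(std::size_t global_size, KernelBody body) {
  for (std::size_t id = 0; id < global_size; ++id) body(id);
}

// Squares every element of `in`, mirroring the launch with NDRange(datapoints).
std::vector<float> square_all(const std::vector<float>& in) {
  std::vector<float> out(in.size());
  run_ndrange_1d(in.size(), [&](std::size_t id) {
    square_work_item(id, in.data(), out.data());
  });
  return out;
}
```

Calling `square_all({1.0f, 2.0f, 3.0f})` gives the same values as the scalar `function` on the slide; the loop has simply moved from the user code into the launcher.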

  7. What is OpenCL? Benefits
     ▶ OpenCL can manage all available computational resources
     ▶ Software portability: all the hardware implementation specifics (such as drivers and runtime) are invisible to the upper-level software programmers
     ▶ Highly customizable: the developer can choose the best hardware without having to reshuffle the upper software infrastructure
     Device properties are queried with
         devices[devid].getInfo<cl_device_info>();
     where cl_device_info can be
     ▶ CL_DEVICE_TYPE
     ▶ CL_DEVICE_VENDOR_ID
     ▶ CL_DEVICE_MAX_COMPUTE_UNITS
     ▶ CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS
     ▶ CL_DEVICE_MAX_WORK_ITEM_SIZES
     ▶ CL_DEVICE_MAX_WORK_GROUP_SIZE
     ▶ CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE
     ▶ CL_DEVICE_MAX_CLOCK_FREQUENCY
     ▶ CL_DEVICE_GLOBAL_MEM_SIZE
     ▶ …

  8. VecGeom Vectorized Geometry
     ▶ Parallelism in:
       > particle-level
       > primitive-level
     ▶ Support multiple architectures without having multiple implementations
     [Figure: one generic kernel, instantiated per architecture by input size: scalar instantiation (autovectorized?), input size 1; vectorized instantiation, input size 4; GPU instantiation, input size 1024]
     Illustration borrowed from: First experience with portable high-performance geometry code on GPU, J.d.F.L.

  9. VecGeom Generic templated code
     ▶ Generic templated code
     ▶ Backend is specified at compile time
     ▶ The specific method for a specific volume is called in a generic way
         template <TranslationCode transCodeT, RotationCode rotCodeT>
         template <class Backend>
         VECGEOM_CUDA_HEADER_BOTH
         void ParallelepipedImplementation<transCodeT, rotCodeT>::DistanceToOut(
             UnplacedParallelepiped const &unplaced,
             Vector3D<typename Backend::precision_v> const &point,
             Vector3D<typename Backend::precision_v> const &direction,
             typename Backend::precision_v const &stepMax,
             typename Backend::precision_v &distance) {
           ...vectorized, generic computation...
         }
     ▶ Calling a method for a given volume:
         v->Specialized()->DistanceToOut(points, directions, fStepMax, distances);
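The backend idea on this slide can be illustrated with a minimal, self-contained sketch: the kernel is written once against `Backend::precision_v`, and the backend chosen at compile time decides whether that type is a single `double` or a group of lanes. `Lanes4`, `ScalarBackend`, `Lanes4Backend` and `Mag2` are hypothetical names invented for this sketch and are not VecGeom's actual types; `Lanes4` is a plain array standing in for a real SIMD vector class.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical 4-lane value type standing in for a real SIMD vector class.
struct Lanes4 {
  std::array<double, 4> v;
  double lane(std::size_t i) const {
    assert(i < 4);
    return v[i];
  }
};

// Lane-wise arithmetic so that generic code can use * and + unchanged.
inline Lanes4 operator*(const Lanes4& a, const Lanes4& b) {
  Lanes4 r{};
  for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i];
  return r;
}
inline Lanes4 operator+(const Lanes4& a, const Lanes4& b) {
  Lanes4 r{};
  for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
  return r;
}

// A backend only names the value type the kernel computes with.
struct ScalarBackend { using precision_v = double; };
struct Lanes4Backend { using precision_v = Lanes4; };

// One generic kernel, written once: squared length of a 3D vector.
template <class Backend>
typename Backend::precision_v Mag2(const typename Backend::precision_v& x,
                                   const typename Backend::precision_v& y,
                                   const typename Backend::precision_v& z) {
  return x * x + y * y + z * z;
}
```

`Mag2<ScalarBackend>(1.0, 2.0, 2.0)` processes one point, while `Mag2<Lanes4Backend>(x, y, z)` processes four points with the same source code, which is the property the slide relies on to avoid multiple implementations.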

  10. VecGeom + OpenCL Goals and motivations
     ▶ Goals:
       > investigate OpenCL support with the least possible modification of the existing code
       > implement an OpenCL API (without creating too much extra code)
       > investigate the performance with AMD GPUs
     ▶ Motivations:
       > architecture independence
       > for GPUs the CUDA backend already exists, but there is no support for AMD cards
     ▶ Target of investigation:
       > simplest volume: Box
     ▶ Used:
       > AMD Accelerated Parallel Processing (APP) SDK
       > AMD OpenCL Static C++ Kernel Language Extension

  11. VecGeom + OpenCL Experiences
     ▶ The main problems are the unsupported C++ features, such as the STL
       > macros are not a good solution
     ▶ The structure of the kernel functions differs from that of the generic volume methods
         int3 id = (int3) ( get_global_id(0), get_global_id(1), get_global_id(2));
         int gSize = get_global_size(0);
         get_local_id(0);
         get_num_groups(0);
         ...

  12. VecGeom + OpenCL Experiences
     ▶ The OpenCL kernel launching process needs a different approach
     ▶ The structure of the kernel functions differs from that of the generic volume methods
         CommandQueue::enqueueNDRangeKernel(Kernel &kernel, NDRange &offset,
             NDRange &global, NDRange &local,
             VECTOR_CLASS<Event> *events, Event *event);
     ▶ Currently, OpenCL support for the volume methods is feasible only with new, external code
       > minimal modification of the existing code (code maintenance)
       > new kernel code with a different working mechanism
       > OpenCL should deliver results at least as fast as the vectorized code
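The index bookkeeping behind `enqueueNDRangeKernel` can be sketched with a few host-side helpers. In OpenCL 1.x, when an explicit local (work-group) size is passed, the global size has to be a multiple of it, so the number of points is usually rounded up and the kernel skips the padding work-items. The helper names `round_up_global`, `num_groups` and `global_id` are hypothetical; the last one reproduces the relation between `get_global_id(0)`, the group index and `get_local_id(0)` for a zero global offset.

```cpp
#include <cassert>
#include <cstddef>

// Smallest multiple of `local` that is >= `n`: a usual choice for the global
// size when an explicit work-group size is given to the launch.
std::size_t round_up_global(std::size_t n, std::size_t local) {
  assert(local > 0);
  return ((n + local - 1) / local) * local;
}

// Number of work-groups the launch is split into (what get_num_groups(0) reports).
std::size_t num_groups(std::size_t global, std::size_t local) {
  assert(local > 0 && global % local == 0);
  return global / local;
}

// Global index of a work-item from its group index and local index,
// assuming a zero global offset.
std::size_t global_id(std::size_t group, std::size_t local_id, std::size_t local) {
  assert(local_id < local);
  return group * local + local_id;
}
```

For 1000 points and a work-group size of 64, `round_up_global` gives 1024, so the kernel needs a guard of the form `if (id < PointCount)` to keep the 24 padding work-items from touching memory past the end of the buffers.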

  13. VecGeom + OpenCL Benchmarks
     Figure: Benchmark with an Intel Core i3 M350 CPU

  14. VecGeom + OpenCL Benchmarks
     Figure: Benchmark with an ATI Radeon HD 5970 GPU

  15. VecGeom + OpenCL Benchmarks
     Figure: Benchmark with an AMD Opteron 6376 CPU

  16. Conclusions
     ▶ Memory management with OpenCL is a difficult task
     ▶ Optimization and more benchmarks are needed
     ▶ Working with the current version of OpenCL takes some effort
     ▶ Further investigation of OpenCL/SYCL is needed

  17. Backup slides

  18. Kernel of the DistanceToOut
         __kernel void FastDistanceToOut(__global double* dimensions,
             __global double* posX, __global double* posY, __global double* posZ,
             __global double* dirX, __global double* dirY, __global double* dirZ,
             __global double* distances, unsigned PointCount){
           // Points are processed in pairs; rem is 1 when PointCount is odd.
           unsigned rem = PointCount%2;
           for (unsigned i = 0; i < PointCount-rem; i=i+2){
             // Per-axis distance to the box face the direction points at;
             // the 1e-30 term avoids a division by zero.
             double2 distX = (double2)(
                 (( copysign(dimensions[0], dirX[i]  ) - posX[i]   ) * ( 1. / ( dirX[i]+1e-30 ) )),
                 (( copysign(dimensions[0], dirX[i+1]) - posX[i+1] ) * ( 1. / ( dirX[i+1]+1e-30 ) )));
             double2 distY = (double2)(
                 (( copysign(dimensions[1], dirY[i]  ) - posY[i]   ) * ( 1. / ( dirY[i]+1e-30 ) )),
                 (( copysign(dimensions[1], dirY[i+1]) - posY[i+1] ) * ( 1. / ( dirY[i+1]+1e-30 ) )));
             double2 distZ = (double2)(
                 (( copysign(dimensions[2], dirZ[i]  ) - posZ[i]   ) * ( 1. / ( dirZ[i]+1e-30 ) )),
                 (( copysign(dimensions[2], dirZ[i+1]) - posZ[i+1] ) * ( 1. / ( dirZ[i+1]+1e-30 ) )));
             // Minimum over the three axes, separately for each point of the pair.
             double2 temp  = (double2) ( fmin( distX.x, distY.x), distZ.x );
             double2 temp2 = (double2) ( fmin( distX.y, distY.y), distZ.y );
             distances[i]   = fmin( temp.x, temp.y);
             distances[i+1] = fmin( temp2.x, temp2.y );
           }
           // Note: for an odd PointCount the last point is not processed by this loop.
         }
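For checking the kernel's output, a scalar reference of the same formula is handy. The sketch below applies the expression from the slide to one point at a time, so it has no pairing and no special case for an odd number of points. The function name `box_distance_to_out` is hypothetical, and the sketch assumes, as the kernel does, that `dim` holds the box half-lengths and that the point lies inside a box centred at the origin.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Distance from a point inside an axis-aligned box to the box surface along
// `dir`. Per axis: (copysign(dim, dir) - pos) * (1 / (dir + 1e-30)); the
// result is the minimum over the three axes, as in the kernel on the slide.
double box_distance_to_out(const double dim[3], const double pos[3],
                           const double dir[3]) {
  double best = std::numeric_limits<double>::infinity();
  for (int k = 0; k < 3; ++k) {
    assert(dim[k] > 0.0);
    const double d =
        (std::copysign(dim[k], dir[k]) - pos[k]) * (1.0 / (dir[k] + 1e-30));
    best = std::min(best, d);
  }
  return best;
}
```

For a box with half-lengths (1, 2, 3), a point at the origin moving along +x is 1 away from the surface, and a point at (0.5, 0, 0) moving along -x is 1.5 away; axes with a zero direction component yield a very large value and drop out of the minimum.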

  19. Illustration borrowed from: Khronos Overview, November 2012
