LLVM AMDGPU for High Performance Computing: are we competitive yet?
Vedran Miletić, HITS gGmbH; Szilárd Páll, KTH; Frauke Gräter, HITS gGmbH
Layers of GPU computing
- GPU accelerated apps and libraries
- CUDA and OpenCL
- Compilers: NVIDIA proprietary compiler, AMD proprietary compiler, Clang
- Drivers: NVIDIA proprietary driver, AMD proprietary driver
- Libclc, LLVM, Mesa, and amdgpu/nouveau (our work)
State of the art: CUDA and OpenCL
- CUDA
– 338 applications listed on NVIDIA’s website
– Over 50% market share in the TOP500 list (Nov 2016)
- OpenCL
– ~70 applications listed on Wikipedia
  - ~30 in the Scientific computing category
  - A couple of benchmarks and toys
OpenCL applications
- Image taken from: Ribeiro, João V., et al. "QwikMD—Integrative Molecular Dynamics Toolkit for Novices and Experts." Scientific Reports 6 (2016).
- Focus on GROMACS, LAMMPS, OpenMM, ASL
Running open source OpenCL stack on Radeon/FirePro/FireStream
- AMD’s proprietary OpenCL driver and compiler
– GPUs released 2012 or later
– Will be open sourced soon™
- Mesa/LLVM
– AMD GPUs released 2009 or later
– Open source from the beginning™
Our work
- No changes or minor changes in apps/libs
- Improvements to LLVM, Clang, libclc, Mesa
– Missing math functions, OpenCL 1.2 API calls
– Bug fixes
GROMACS OpenCL kernel execution time
[Figure: kernel execution time (y-axis, ticks 20 to 140) against system size (x-axis, ticks 1.5 to 3072) for AMDGPU-PRO and AMDGPU]
OpenMM test execution time
[Figure: execution time (y-axis, ticks 10 to 60) of the AndersenThermostat, BrownianIntegrator, LangevinIntegrator and VerletIntegrator tests for AMDGPU-PRO and AMDGPU]
LAMMPS example execution time
[Figure: execution time (y-axis, ticks 10 to 80) of the melt_imd-gpu example for AMDGPU-PRO and AMDGPU]
ASL test execution time
[Figure: execution time (y-axis, ticks 5 to 50) of the testKernel, testKernelMerger and testPrivateVar tests for AMDGPU-PRO and AMDGPU]
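The charts above compare the same workloads under AMDGPU-PRO and AMDGPU. One simple way to read such paired timings is as a ratio per test, sketched below. The function names are hypothetical and no measured values from the slides are included; any numbers fed in must come from your own runs.

```python
# Minimal sketch: express paired timings as "how many times slower is the
# open source stack". Function names are illustrative, not from the talk.

def relative_slowdown(time_amdgpu, time_amdgpu_pro):
    """Return the AMDGPU / AMDGPU-PRO time ratio (1.0 means parity)."""
    if time_amdgpu <= 0 or time_amdgpu_pro <= 0:
        raise ValueError("timings must be positive")
    return time_amdgpu / time_amdgpu_pro


def summarize(timings):
    """Map each test name to its ratio.

    `timings` maps a test name to a (time_amdgpu, time_amdgpu_pro) pair
    measured in the same unit.
    """
    return {
        name: relative_slowdown(open_time, pro_time)
        for name, (open_time, pro_time) in timings.items()
    }
```

A ratio above 1.0 means the Mesa/LLVM stack took longer on that test; a ratio below 1.0 means it was faster.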
Other OpenCL software
- Blender
– Various users report performance issues and crashes
- BEAGLE, phylogenetics library
– Made some progress
- clBLAS and clFFT
– Implemented clEnqueueFillBuffer, requires more work
– Required for Octopus (quantum chem), probably others
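For readers unfamiliar with clEnqueueFillBuffer: it is an OpenCL 1.2 call that fills a region of a buffer with a repeated pattern. The sketch below is a host-side Python model of those fill semantics as the OpenCL 1.2 specification describes them (pattern size restricted to powers of two up to 128 bytes, offset and size required to be multiples of the pattern size). It is not the Mesa implementation, and the name `fill_buffer` is hypothetical.

```python
# Host-side model of the clEnqueueFillBuffer fill semantics.
# Assumption: this only mirrors the argument rules from the OpenCL 1.2
# specification; it does not talk to any OpenCL device or driver.

VALID_PATTERN_SIZES = {1, 2, 4, 8, 16, 32, 64, 128}


def fill_buffer(buffer, pattern, offset, size):
    """Repeat `pattern` over buffer[offset:offset + size], in place.

    `buffer` is a bytearray standing in for device memory; `pattern` is a
    bytes object. Invalid arguments raise ValueError, where the real call
    would return CL_INVALID_VALUE.
    """
    pattern_size = len(pattern)
    if pattern_size not in VALID_PATTERN_SIZES:
        raise ValueError("pattern size must be a power of two from 1 to 128 bytes")
    if offset < 0 or size < 0 or offset + size > len(buffer):
        raise ValueError("fill region lies outside the buffer")
    if offset % pattern_size or size % pattern_size:
        raise ValueError("offset and size must be multiples of the pattern size")
    # Same-length slice assignment, so the buffer keeps its size.
    buffer[offset:offset + size] = pattern * (size // pattern_size)
```

For example, filling bytes 4 to 11 of a 16-byte buffer with the four-byte little-endian encoding of 1.0f (`b"\x00\x00\x80\x3f"`) leaves the first and last four bytes untouched.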
Other OpenCL software
- BOINC, CP2K, Theano
– Users have told me “I would try it if it worked”
- clpeak, opencl-stream, SNU NPB
– Benchmarks
- App or lib you care about?
Acknowledgments
- Matt Arsenault, AMD
- Jan Vesely, Aaron Watry and Serge Martin, Mesa contributors
- Francisco Jerez, Intel
- Peter Eastman, OpenMM
- Tom Stellard, Red Hat
- Freenode channel #radeon