LLVM AMDGPU for High Performance Computing: are we competitive yet?



SLIDE 1

LLVM AMDGPU for High Performance Computing: are we competitive yet?

Vedran Miletić, HITS gGmbH Szilárd Páll, KTH Frauke Gräter, HITS gGmbH

SLIDE 2

Layers of GPU computing

  • GPU accelerated apps and libraries
  • CUDA

– NVIDIA proprietary compiler or Clang
– NVIDIA proprietary driver

  • OpenCL

– AMD proprietary compiler and AMD proprietary driver
– Clang, libclc, LLVM, Mesa, and amdgpu/nouveau (our work)

SLIDE 3

State of the art: CUDA and OpenCL

  • CUDA

– 338 applications listed at NVIDIA’s website
– Over 50% market share in Top 500 (Nov 2016)

  • OpenCL

– ~70 applications listed on Wikipedia
    • ~30 in Scientific computing category
    • Couple of benchmarks and toys

SLIDE 4

OpenCL applications

  • Image taken from: Ribeiro, João V., et al. "QwikMD—Integrative Molecular Dynamics Toolkit for Novices and Experts." Scientific Reports 6 (2016).
  • Focus on GROMACS, LAMMPS, OpenMM, ASL

SLIDE 5

Running open source OpenCL stack on Radeon/FirePro/FireStream

  • AMD’s proprietary OpenCL driver and compiler

– GPUs released 2012 or later
– Will be open sourced soon™

  • Mesa/LLVM

– AMD GPUs released 2009 or later
– Open source from the beginning™

SLIDE 6

Our work

  • No changes or minor changes in apps/libs
  • Improvements to LLVM, Clang, libclc, Mesa

– Missing math functions, OpenCL 1.2 API calls
– Bug fixes

SLIDE 7

GROMACS OpenCL kernel execution time

[Figure: kernel execution time versus system size, AMDGPU-PRO compared with AMDGPU; system sizes range from 1.5 to 3072.]

SLIDE 8

OpenMM test execution time

[Figure: execution time per test, AMDGPU-PRO compared with AMDGPU; tests: AndersenThermostat, BrownianIntegrator, LangevinIntegrator, VerletIntegrator.]

LAMMPS example execution time

[Figure: execution time per example, AMDGPU-PRO compared with AMDGPU; example: melt_imd-gpu.]

SLIDE 9

ASL test execution time

[Figure: execution time per test, AMDGPU-PRO compared with AMDGPU; tests: testKernel, testKernelMerger, testPrivateVar.]

SLIDE 10

Other OpenCL software

  • Blender

– Different users report performance issues and crashes

  • BEAGLE, phylogenetics library

– Made some progress

  • clBLAS and clFFT

– Implemented clEnqueueFillBuffer, requires more work
– Required for Octopus (quantum chemistry), probably others
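
For readers unfamiliar with the call mentioned above: clEnqueueFillBuffer (OpenCL 1.2) fills a region of a buffer with a repeated byte pattern. The following is only a rough pure-Python model of that behavior, not the Mesa implementation and not a real OpenCL binding. The helper name fill_buffer and its arguments are hypothetical. The size and alignment checks follow the error conditions documented for the call in the OpenCL 1.2 specification, as far as we understand them.

```python
# Pure-Python model of what clEnqueueFillBuffer does to a buffer.
# fill_buffer is a hypothetical helper for illustration only; it does not
# talk to any OpenCL device or driver.

# Pattern sizes accepted by clEnqueueFillBuffer (scalar/vector type sizes).
VALID_PATTERN_SIZES = {1, 2, 4, 8, 16, 32, 64, 128}


def fill_buffer(buffer: bytearray, pattern: bytes, offset: int, size: int) -> None:
    """Repeat `pattern` over buffer[offset:offset + size], in place."""
    pattern_size = len(pattern)
    if pattern_size not in VALID_PATTERN_SIZES:
        raise ValueError("pattern size must be one of 1, 2, 4, ..., 128 bytes")
    if offset % pattern_size or size % pattern_size:
        raise ValueError("offset and size must be multiples of the pattern size")
    if offset < 0 or size < 0 or offset + size > len(buffer):
        raise ValueError("fill region lies outside the buffer")
    # The replacement has exactly `size` bytes, so the buffer length is unchanged.
    buffer[offset:offset + size] = pattern * (size // pattern_size)


# Usage: fill the middle 8 bytes of a 16-byte buffer with a 4-byte pattern.
buf = bytearray(16)
fill_buffer(buf, b"\x01\x02\x03\x04", offset=4, size=8)
```

An application such as clBLAS would use the real call to initialize a buffer on the device without first building the data on the host; the model only shows the resulting memory contents.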

SLIDE 11

Other OpenCL software

  • BOINC, CP2K, Theano

– Had users tell me “I would try it if worked”

  • clpeak, opencl-stream, SNU NPB

– Benchmarks

  • App or lib you care about?
SLIDE 12

Acknowledgments

  • Matt Arsenault, AMD
  • Jan Vesely, Aaron Watry and Serge Martin, Mesa contributors

  • Francisco Jerez, Intel
  • Peter Eastman, OpenMM
  • Tom Stellard, Red Hat
  • Freenode channel #radeon