
Alpaka: An Abstraction Library for Parallel Kernel Acceleration



  1. Alpaka: An Abstraction Library for Parallel Kernel Acceleration. Erik Zenker (1,2), Benjamin Worpitz (1,2), René Widera (1), Axel Huebl (1,2), Guido Juckeland (1), Andreas Knüpfer (2), Wolfgang E. Nagel (2), Michael Bussmann (1). Affiliations: (1) Helmholtz-Zentrum Dresden-Rossendorf, (2) Technische Universität Dresden.

  2. We Have Lasers: Risk of Forest Fires! PIConGPU application areas: electron acceleration with lasers (compact X-ray sources), ion acceleration with lasers (tumor therapy), plasma instabilities (astrophysics). Slide footer: Member of the Helmholtz Association · René Widera, Erik Zenker, Guido Juckeland · Computational Radiation Physics · www.hzdr.de/crp · { r.widera, e.zenker, g.juckeland }@hzdr.de

  3. PIConGPU Scales up to 18,432 GPUs. [Figure: two panels plotted against the number of GPUs (1 to 10000): strong scaling (speedup) and weak scaling (efficiency [%]). Legend entries: ideal, 1 to 32, 8 to 256, 64 to 2048, 512 to 16384, 4096 to 16384, PIConGPU. Annotations: efficiency > 95%; 6.9 PFlop/s (SP).]

  4. Alpaka

  5. Good News: There Are Alpakas on the Compute Meadow. Labels shown: C++, TBB, Threads, Fibers.
     ▪ Single zero-overhead interface to existing parallelism models
     ▪ Single-source C++11 kernels
     ▪ Data-agnostic memory model

  6. A Uniform Interface to All These Programming Models?
     ▪ The interface is defined by a set of free functions with template arguments
     ▪ Template arguments need to fulfill type requirements (concepts)
     ▪ The interface is extendable through additional concept implementations (models)
     [Diagram: application code on top of a zero-overhead abstraction interface, which sits on top of Lib 1, Lib 2, ..., Lib n.]
     C++ compilers are able to remove the abstraction layers almost completely.
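The "free functions with template arguments" pattern from slide 6 can be sketched in plain C++11 as follows. This is only an illustration of the general technique, not Alpaka's actual interface: the trait `GetSize`, the wrapper `FixedBuf`, and the free function `getSize` are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical trait: a type models the "sized buffer" concept by
// specializing this template. The primary template is left undefined,
// so using an unsupported type fails at compile time.
template<typename T>
struct GetSize;

// Model 1: std::vector fulfills the concept.
template<typename TElem>
struct GetSize<std::vector<TElem>>
{
    static std::size_t get(std::vector<TElem> const& v) { return v.size(); }
};

// Model 2: a hypothetical fixed-size buffer added later, without
// touching the interface or the first model.
template<typename TElem, std::size_t N>
struct FixedBuf
{
    TElem data[N];
};

template<typename TElem, std::size_t N>
struct GetSize<FixedBuf<TElem, N>>
{
    static std::size_t get(FixedBuf<TElem, N> const&) { return N; }
};

// The uniform interface: one free function that dispatches on its
// template argument. The call is resolved at compile time, so an
// optimizing compiler can typically inline it away.
template<typename T>
std::size_t getSize(T const& buf)
{
    return GetSize<T>::get(buf);
}
```

Application code calls `getSize(x)` for any type that provides a `GetSize` specialization; supporting a new library means adding one more specialization.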

  7. Heterogeneous Codes Need to be Maintainable
     ▪ Heterogeneity: write once, execute everywhere
     ▪ Testability: validate once, get correct results everywhere
     ▪ Sustainability: porting implies minimal code changes
     ▪ Optimizability: tune for good performance at minimum coding effort
     ▪ Openness: open source and open standards
     All of these are supported by a single source.

  8. Abstract Hierarchical Redundant Parallelism Model. [Diagram: Grid level. Legend: synchronize, parallel, sequential.]

  9. Abstract Hierarchical Redundant Parallelism Model. [Diagram: Grid and Block levels. Legend: synchronize, parallel, sequential.]

  10. Abstract Hierarchical Redundant Parallelism Model. [Diagram: Grid, Block, and Thread levels. Legend: synchronize, parallel, sequential.]

  11. Abstract Hierarchical Redundant Parallelism Model. [Diagram: Grid, Block, Thread, and Element levels. Legend: synchronize, parallel, sequential.]
     ● The element level is an explicit sequential layer
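The four levels on slides 8 to 11 can be pictured as a nested index space. The sketch below shows one way to flatten a (block, thread, element) position into a linear element index and back, assuming a one-dimensional layout. The types `WorkDiv` and `Position` and all function names are illustrative assumptions, not Alpaka's API.

```cpp
#include <cassert>
#include <cstddef>

// Extents of each level below the grid (illustrative names).
struct WorkDiv
{
    std::size_t blocksPerGrid;
    std::size_t threadsPerBlock;
    std::size_t elemsPerThread;
};

// A position inside the hierarchy.
struct Position
{
    std::size_t block;
    std::size_t thread;
    std::size_t elem;
};

// Total number of elements covered by the whole grid.
inline std::size_t totalElems(WorkDiv const& wd)
{
    return wd.blocksPerGrid * wd.threadsPerBlock * wd.elemsPerThread;
}

// Flatten (block, thread, element) into a linear element index.
inline std::size_t linearize(WorkDiv const& wd, Position const& p)
{
    return (p.block * wd.threadsPerBlock + p.thread) * wd.elemsPerThread + p.elem;
}

// Inverse of linearize: recover the position from a linear index.
inline Position delinearize(WorkDiv const& wd, std::size_t i)
{
    Position p;
    p.elem = i % wd.elemsPerThread;
    std::size_t const globalThread = i / wd.elemsPerThread;
    p.thread = globalThread % wd.threadsPerBlock;
    p.block = globalThread / wd.threadsPerBlock;
    return p;
}
```

In this picture, the element index is the innermost, sequential loop counter of a single thread, while blocks and threads are the levels a back-end may run in parallel.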

  12. Data Structure Agnostic Memory Model. [Diagram: host memory and device global memory (grid level), connected by an explicit deep copy.]

  13. Data Structure Agnostic Memory Model. [Diagram: host memory and device global memory (grid level), connected by an explicit deep copy; shared memory at the block level.]

  14. Data Structure Agnostic Memory Model. [Diagram: host memory and device global memory (grid level), connected by an explicit deep copy; shared memory at the block level; register memory at the thread level.]
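The "explicit deep copy" between host and device memory on slides 12 to 14 can be illustrated with a small host-only sketch. Everything here is hypothetical: `Buffer`, `HostTag`, `DeviceTag`, and `deepCopy` are names made up for this example, and the "device" buffer is simply a second allocation in host RAM. The point is only that buffers cannot be copied implicitly, so every transfer is visible in the code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Tags standing in for two memory spaces (both live in host RAM here).
struct HostTag {};
struct DeviceTag {};

// A buffer bound to a memory space. Implicit copies are disabled so
// that data only moves through an explicit deepCopy call.
template<typename TElem, typename TSpace>
class Buffer
{
public:
    explicit Buffer(std::size_t n) : m_data(n) {}
    Buffer(Buffer const&) = delete;
    Buffer& operator=(Buffer const&) = delete;
    Buffer(Buffer&&) = default;

    std::size_t size() const { return m_data.size(); }
    TElem* data() { return m_data.data(); }
    TElem const* data() const { return m_data.data(); }

private:
    std::vector<TElem> m_data;
};

// Explicit deep copy of the raw contents between two buffers. It only
// sees a pointer and a size, so it does not depend on how the user
// structures the data inside the buffer.
template<typename TElem, typename TDst, typename TSrc>
void deepCopy(Buffer<TElem, TDst>& dst, Buffer<TElem, TSrc> const& src)
{
    assert(dst.size() == src.size());
    std::copy(src.data(), src.data() + src.size(), dst.data());
}
```

A real back-end would replace the body of `deepCopy` with the transfer mechanism of the target device; the calling code would stay the same.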

  15. Map the Abstraction Model to your Desired Acceleration Back-End. [Diagram: a CPU with RAM and two packages, each with an L3 cache and cores that have L1/L2 caches, registers (R), and AVX units.]
     ● Explicit mapping of parallelization levels to hardware

  16. Map the Abstraction Model to your Desired Acceleration Back-End. [Diagram: the same CPU (RAM, packages with L3 caches, cores with L1/L2 caches, registers, AVX units), overlaid with the Grid level and global memory.]
     ● Explicit mapping of parallelization levels to hardware

  17. Map the Abstraction Model to your Desired Acceleration Back-End. [Diagram: the same CPU, overlaid with the Grid and Block levels and with global and shared memory.]
     ● Explicit mapping of parallelization levels to hardware

  18. Map the Abstraction Model to your Desired Acceleration Back-End. [Diagram: the same CPU, overlaid with the Grid, Block, and Thread levels and with global, shared, and register memory.]
     ● Explicit mapping of parallelization levels to hardware

  19. Map the Abstraction Model to your Desired Acceleration Back-End. [Diagram: the same CPU, overlaid with the Grid, Block, Thread, and Element levels and with global, shared, and register memory.]
     ● Explicit mapping of parallelization levels to hardware

  20. Map the Abstraction Model to your Desired Acceleration Back-End
     ▪ Levels of the model that a back-end does not support can be ignored
     ▪ The abstract interface allows the set of mappings to be extended
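As a rough sketch of what "ignoring an unsupported level" from slide 20 could look like, the functions below build two different work divisions for the same problem size. Both mappings and all names (`WorkDiv`, `mapSerialCpu`, `mapGpuLike`, `capacity`) are assumptions made for illustration and are not taken from Alpaka.

```cpp
#include <cassert>
#include <cstddef>

// Extents of the block, thread, and element levels (illustrative).
struct WorkDiv
{
    std::size_t blocks;
    std::size_t threads;
    std::size_t elems;
};

// Integer division rounded up; b must be non-zero.
inline std::size_t ceilDiv(std::size_t a, std::size_t b)
{
    return (a + b - 1) / b;
}

// Hypothetical serial CPU mapping: the thread level is not used, so
// its extent collapses to 1 and the element level carries the work.
inline WorkDiv mapSerialCpu(std::size_t n, std::size_t elemsPerThread)
{
    return WorkDiv{ceilDiv(n, elemsPerThread), 1, elemsPerThread};
}

// Hypothetical GPU-like mapping: many threads per block, and the
// element level collapses to one element per thread.
inline WorkDiv mapGpuLike(std::size_t n, std::size_t threadsPerBlock)
{
    return WorkDiv{ceilDiv(n, threadsPerBlock), threadsPerBlock, 1};
}

// Number of elements a work division can cover.
inline std::size_t capacity(WorkDiv const& wd)
{
    return wd.blocks * wd.threads * wd.elems;
}
```

Both results cover at least `n` elements, so a kernel that bounds-checks against `n` can run unchanged under either mapping.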

  21. Alpaka: Vector Addition Kernel

      struct VectorAdd
      {
          template<typename TAcc, typename TElem, typename TSize>
          ALPAKA_FN_ACC auto operator()(
              TAcc const & acc,
              TSize const & numElements,
              TElem const * const X,
              TElem * const Y) const -> void
          {
              namespace alp = alpaka;  // short alias for the alpaka namespace

              // Index of this thread within the whole grid (first dimension)
              auto globalIdx = alp::idx::getIdx<alp::Grid, alp::Threads>(acc)[0u];
              // Number of elements each thread processes sequentially
              auto elemsPerThread = alp::workdiv::getWorkDiv<alp::Thread, alp::Elems>(acc)[0u];

              // Half-open element range [begin, end) owned by this thread
              auto begin = globalIdx * elemsPerThread;
              auto end = min(begin + elemsPerThread, numElements);

              for(TSize i = begin; i < end; ++i)
              {
                  Y[i] = X[i] + Y[i];
              }
          }
      };
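To see how the index arithmetic in the kernel on slide 21 divides the work, the following host-only sketch replays it without Alpaka. A sequential loop stands in for the back-end that would normally launch the threads; `vectorAddThread` and `vectorAddHost` are hypothetical helper names, and `float` is assumed as the element type.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Work of one "thread": the same begin/end computation as the kernel,
// with the thread index passed in instead of queried from an accelerator.
inline void vectorAddThread(std::size_t globalIdx, std::size_t elemsPerThread,
                            std::size_t numElements, float const* X, float* Y)
{
    std::size_t const begin = globalIdx * elemsPerThread;
    std::size_t const end = std::min(begin + elemsPerThread, numElements);
    // Sequential element loop; empty if begin is past the end of the data.
    for (std::size_t i = begin; i < end; ++i)
    {
        Y[i] = X[i] + Y[i];
    }
}

// Sequential stand-in for a back-end: runs every thread one after another.
inline void vectorAddHost(std::size_t elemsPerThread,
                          std::vector<float> const& X, std::vector<float>& Y)
{
    assert(elemsPerThread > 0 && X.size() == Y.size());
    std::size_t const n = Y.size();
    // Enough threads to cover n elements; the last one may do less work.
    std::size_t const numThreads = (n + elemsPerThread - 1) / elemsPerThread;
    for (std::size_t t = 0; t < numThreads; ++t)
    {
        vectorAddThread(t, elemsPerThread, n, X.data(), Y.data());
    }
}
```

With 10 elements and 4 elements per thread, three threads run and the last one handles only elements 8 and 9, which is why the kernel clamps `end` to `numElements`.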
