GPU acceleration of plane-wave codes using SIRIUS library
Materials Design Ecosystem at the Exascale: High-Performance and High-Throughput Computing
Anton Kozhevnikov, CSCS
January 29, 2018

Introduction

Piz Daint: #3 supercomputer in the world
▪ Cray XC50, 5320 nodes
▪ Intel Xeon E5-2690v3 12C, 2.6GHz, 64GB + NVIDIA Tesla P100 16GB
▪ 4.761 Teraflops / node

Piz Daint node layout
▪ CPU: ~500 Gigaflops, 64 GB of DDR4 host memory at ~60 GB/s
▪ GPU: ~4.2 Teraflops, 16 GB of high bandwidth memory at 732 GB/s
▪ CPU-GPU link: 32 GB/s bidirectional over PCIe x16
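The node figures above can be turned into a rough back-of-the-envelope estimate of why data movement matters when porting. The sketch below only uses the peak numbers quoted on the slide; the function names are illustrative, and splitting the bidirectional PCIe figure evenly per direction is an assumption, not something the slide states.

```python
# Peak per-node figures quoted on the slide.
CPU_FLOPS = 500e9     # ~500 Gigaflops
GPU_FLOPS = 4.2e12    # ~4.2 Teraflops
CPU_MEM_BW = 60e9     # ~60 GB/s DDR4 host memory
GPU_MEM_BW = 732e9    # 732 GB/s high bandwidth memory
PCIE_BW = 32e9        # 32 GB/s bidirectional over PCIe x16


def machine_balance(flops, bandwidth):
    """Floating-point operations available per byte loaded from memory."""
    return flops / bandwidth


def transfer_time(n_bytes, bandwidth):
    """Seconds needed to move n_bytes at the given bandwidth (bytes/s)."""
    return n_bytes / bandwidth


cpu_balance = machine_balance(CPU_FLOPS, CPU_MEM_BW)
gpu_balance = machine_balance(GPU_FLOPS, GPU_MEM_BW)

# Assumption: half of the bidirectional figure is usable in one direction.
pcie_one_way = PCIE_BW / 2
t_pcie = transfer_time(1e9, pcie_one_way)   # copy 1 GB host -> device
t_hbm = transfer_time(1e9, GPU_MEM_BW)      # read 1 GB from GPU memory

print(f"CPU balance: {cpu_balance:.2f} flop/byte")
print(f"GPU balance: {gpu_balance:.2f} flop/byte")
print(f"PCIe copy is {t_pcie / t_hbm:.1f}x slower than a GPU memory read")
```

Under these assumptions a host-to-device copy is tens of times slower than reading the same data from GPU memory, which is one reason to keep data resident on the GPU between kernels.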

Porting codes to GPUs
No magic “silver bullet” exists!
Usual steps in porting codes to GPUs
▪ cleanup and refactor the code
▪ (possibly) change the data layout
▪ fully utilize CPU threads and prepare code for node-level parallelization
▪ move compute-intensive kernels to GPUs

Porting codes to GPUs
▪ CUDA (C / C++ / Fortran)
▪ OpenCL
▪ OpenACC
▪ OpenMP 4.0

Why do we need a separation of concerns?
[Diagram: users, computational scientists and code developers; code; supercomputer]

Electronic-structure codes

Electronic structure codes
Rows: atomic potential treatment. Columns: basis functions for KS states.

                 | Periodic Bloch functions (plane-waves or similar) | Localized orbitals
Full-potential   | FLEUR, Wien2K, Exciting, Elk                      | FHI-aims, FPLO
Pseudo-potential | VASP, CPMD, Quantum ESPRESSO, Abinit, Qbox        | CP2K, SIESTA, OpenMX

Delta DFT codes effort

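Pseudopotential plane-wave method
▪ Unit cell is mapped to a regular grid
▪ All functions are expanded in plane-waves
▪ Atomic potential is replaced by a pseudopotential
$$\hat{V}_{PS} = V_{loc}(\mathbf{r}) + \sum_{\alpha} \sum_{\xi\xi'} |\beta^{\alpha}_{\xi}\rangle D^{\alpha}_{\xi\xi'} \langle \beta^{\alpha}_{\xi'}|$$
Basis functions:
$$\varphi_{\mathbf{G}+\mathbf{k}}(\mathbf{r}) = \frac{1}{\sqrt{\Omega}} e^{i(\mathbf{G}+\mathbf{k})\mathbf{r}}$$
Potential and density:
$$V(\mathbf{r}) = \sum_{\mathbf{G}} V(\mathbf{G}) e^{i\mathbf{G}\mathbf{r}}, \qquad \rho(\mathbf{r}) = \sum_{\mathbf{G}} \rho(\mathbf{G}) e^{i\mathbf{G}\mathbf{r}}$$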
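The plane-wave expansion of the potential and density can be illustrated with a one-dimensional toy. This is a minimal sketch, assuming a 1D cell of length `a` with reciprocal vectors G_n = 2πn/a; the function name `plane_wave_sum` and the sample coefficients are made up for illustration and do not come from any of the codes discussed. A production code would use an FFT rather than a direct sum.

```python
import cmath
import math


def plane_wave_sum(coeffs, r, a):
    """Evaluate f(r) = sum_G f(G) exp(i G r) in a 1D cell of length a.

    coeffs maps an integer n to the coefficient at G_n = 2*pi*n/a.
    """
    total = 0j
    for n, c in coeffs.items():
        G = 2.0 * math.pi * n / a
        total += c * cmath.exp(1j * G * r)
    return total


# Toy coefficients: V(r) = 1 + cos(2*pi*r/a), i.e. V(0) = 1, V(+-G_1) = 1/2.
a = 2.0
V_G = {0: 1.0, 1: 0.5, -1: 0.5}

for r in (0.0, 0.5, 1.0):
    print(r, plane_wave_sum(V_G, r, a).real)
```

Because the coefficients satisfy V(-G) = V(G)*, the summed function is real up to rounding, as expected for a real potential.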

Pseudopotential plane-wave method
▪ Approximation to atomic potential
▪ Core states are excluded
▪ Number of basis functions: ~1000 / atom
▪ Number of valence states: ~0.001 - 0.01% of the total basis size
▪ Efficient iterative subspace diagonalization schemes exist
▪ Atomic forces can be easily computed
▪ Stress tensor can be easily computed

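Full-potential linearized augmented plane-wave method
▪ Unit cell is partitioned into “muffin-tin” spheres and interstitial region
▪ Inside MT spheres spherical harmonic expansion is used
▪ In the interstitial region functions are expanded in plane-waves
[Figure: unit cell with MT spheres around atom #1 and atom #2 and the interstitial region]
Basis functions:
$$\varphi_{\mathbf{G}+\mathbf{k}}(\mathbf{r}) = \begin{cases} \sum_{\ell m} \sum_{\nu=1}^{O^{\alpha}_{\ell}} A^{\alpha}_{\ell m \nu}(\mathbf{G}+\mathbf{k})\, u^{\alpha}_{\ell\nu}(r)\, Y_{\ell m}(\hat{\mathbf{r}}), & \mathbf{r} \in \mathrm{MT}\alpha \\ \frac{1}{\sqrt{\Omega}} e^{i(\mathbf{G}+\mathbf{k})\mathbf{r}}, & \mathbf{r} \in I \end{cases}$$
Potential and density:
$$V(\mathbf{r}) = \begin{cases} \sum_{\ell m} V^{\alpha}_{\ell m}(r)\, Y_{\ell m}(\hat{\mathbf{r}}), & \mathbf{r} \in \mathrm{MT}\alpha \\ \sum_{\mathbf{G}} V(\mathbf{G}) e^{i\mathbf{G}\mathbf{r}}, & \mathbf{r} \in I \end{cases} \qquad \rho(\mathbf{r}) = \begin{cases} \sum_{\ell m} \rho^{\alpha}_{\ell m}(r)\, Y_{\ell m}(\hat{\mathbf{r}}), & \mathbf{r} \in \mathrm{MT}\alpha \\ \sum_{\mathbf{G}} \rho(\mathbf{G}) e^{i\mathbf{G}\mathbf{r}}, & \mathbf{r} \in I \end{cases}$$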

Full-potential linearized augmented plane-wave method
▪ No approximation to atomic potential
▪ Core states are included
▪ Number of basis functions: ~100 / atom
▪ Number of valence states: ~15-20% of the total basis size
▪ Large condition number of the overlap matrix
▪ Full diagonalization of dense matrix is required (iterative subspace diagonalization schemes are not efficient)
▪ Atomic forces can be easily computed
▪ Stress tensor can’t be easily computed (N-point numerical scheme is often required)

Common features of the FP-LAPW and PP-PW methods
▪ Definition of the unit cell (atoms, atom types, lattice vectors, symmetry operations, etc.)
▪ Definition of the reciprocal lattice, plane-wave cutoffs, G vectors, G+k vectors
▪ Definition of the wave-functions
▪ FFT driver
▪ Generation of the charge density on the regular grid
▪ Generation of the XC-potential
▪ Symmetrization of the density, potential and occupancy matrices
▪ Low-level numerics (spherical harmonics, Bessel functions, Gaunt coefficients, spline interpolation, Wigner D-matrix, linear algebra wrappers, etc.)
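One of the shared building blocks listed above, the set of G+k vectors inside a plane-wave cutoff, can be sketched in a few lines. This is a toy illustration, assuming a simple cubic lattice with lattice constant `a` and a k-point given in fractional coordinates within [-0.5, 0.5]; the function name `gk_vectors` and its parameters are hypothetical and are not the interface of SIRIUS or of any other code mentioned in the talk.

```python
import math


def gk_vectors(a, k_frac, gk_cutoff):
    """Integer triples n such that |G_n + k| <= gk_cutoff (simple cubic cell).

    a         -- lattice constant; reciprocal lattice constant is b = 2*pi/a
    k_frac    -- k-point in fractional coordinates, each component in [-0.5, 0.5]
    gk_cutoff -- maximum length of G+k, in the same units as b
    """
    b = 2.0 * math.pi / a
    # |n + k_frac| <= gk_cutoff / b and |k_frac| <= 0.5 bound the search range.
    nmax = int(gk_cutoff / b) + 1
    found = []
    for n1 in range(-nmax, nmax + 1):
        for n2 in range(-nmax, nmax + 1):
            for n3 in range(-nmax, nmax + 1):
                gk = [b * (n + kf) for n, kf in zip((n1, n2, n3), k_frac)]
                if sum(x * x for x in gk) <= gk_cutoff ** 2:
                    found.append((n1, n2, n3))
    return found


a = 2.0 * math.pi  # chosen so that b = 1
print(len(gk_vectors(a, (0.0, 0.0, 0.0), 1.1)))  # origin + 6 nearest neighbours
print(len(gk_vectors(a, (0.0, 0.0, 0.0), 1.5)))  # adds the 12 face diagonals
```

The number of basis functions grows with the cube of the cutoff length, which is why the cutoff is the main knob controlling the basis size in plane-wave methods.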

SIRIUS library

Motivation for a common domain specific library
Extend the legacy Fortran codes with the API calls to a domain-specific library which runs on GPUs and other novel architectures.
[Diagram, before: Quantum ESPRESSO (inherent PW / PAW implementation) and Exciting / Elk (inherent LAPW implementation) sit on top of BLAS, PBLAS, LAPACK, ScaLAPACK, FFT and run on the CPU]
[Diagram, after: Quantum ESPRESSO and Exciting / Elk keep their inherent implementations and also call the SIRIUS domain specific library (LAPW / PW / PAW implementation), which sits on top of BLAS, PBLAS, LAPACK, ScaLAPACK, FFT, cuBLAS, MAGMA, PLASMA, cuFFT and runs on the CPU and GPU]

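Where to draw the line?
SIRIUS domain specific library: LAPW / PW / PAW implementation
Eigen-value problem
$$\left( -\frac{1}{2}\Delta + v_{eff}(\mathbf{r}) \right) \psi_j(\mathbf{r}) = \varepsilon_j \psi_j(\mathbf{r})$$
Effective potential construction
$$v_{eff}(\mathbf{r}) = \int \frac{\rho(\mathbf{r}')}{|\mathbf{r}' - \mathbf{r}|}\, d\mathbf{r}' + v_{XC}[\rho](\mathbf{r}) + v_{ext}(\mathbf{r})$$
Density generation
$$\rho_{new}(\mathbf{r}) = \sum_j |\psi_j(\mathbf{r})|^2$$
Density mixing
$$\rho(\mathbf{r}) = \alpha \rho_{new}(\mathbf{r}) + (1 - \alpha) \rho_{old}(\mathbf{r})$$
Output:
▪ wave-functions $\psi_j(\mathbf{r})$ and eigen energies $\varepsilon_j$
▪ charge density $\rho(\mathbf{r})$ and magnetization $\mathbf{m}(\mathbf{r})$
▪ total energy $E_{tot}$, atomic forces $\mathbf{F}_{\alpha}$ and stress tensor $\sigma_{\alpha\beta}$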
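The density mixing step of the self-consistency cycle can be illustrated with a toy fixed-point iteration. This is only a sketch: the map `toy_density_map` stands in for the whole "build potential, solve eigen-value problem, sum up the density" chain and is invented for illustration, as are the function and parameter names. It is chosen so that the unmixed iteration (α = 1) diverges while linear mixing with α = 0.5 converges.

```python
def scf_linear_mixing(density_map, rho0, alpha, tol=1e-10, max_iter=200):
    """Iterate rho <- alpha * rho_new + (1 - alpha) * rho_old until converged.

    density_map -- callable returning the output density for an input density
    Returns (rho, iterations_used, converged).
    """
    rho = list(rho0)
    for it in range(1, max_iter + 1):
        rho_new = density_map(rho)
        residual = max(abs(n - o) for n, o in zip(rho_new, rho))
        if residual < tol:
            return rho, it, True
        rho = [alpha * n + (1.0 - alpha) * o for n, o in zip(rho_new, rho)]
    return rho, max_iter, False


def toy_density_map(rho):
    """Hypothetical stand-in for one SCF step; fixed point is rho = 1."""
    return [2.5 - 1.5 * x for x in rho]


rho_mixed, n_iter, ok_mixed = scf_linear_mixing(toy_density_map, [0.0, 2.0], alpha=0.5)
_, _, ok_plain = scf_linear_mixing(toy_density_map, [0.0, 2.0], alpha=1.0, max_iter=50)

print("alpha = 0.5 converged:", ok_mixed, "after", n_iter, "iterations")
print("alpha = 1.0 converged:", ok_plain)
```

In this toy the output density overshoots the input error by a factor of -1.5, so taking the new density as is amplifies the error, while mixing in half of the old density damps it by a factor of -0.25 per step.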

SIRIUS library
▪ full-potential (L)APW+lo
▪ non-magnetic, collinear and non-collinear magnetic ground states
▪ non-relativistic, ZORA and IORA valence solvers
▪ Dirac solver for core states
▪ norm-conserving, ultrasoft and PAW pseudopotentials
▪ non-magnetic, collinear and non-collinear magnetic ground states
▪ spin-orbit correction
▪ atomic forces
▪ stress tensor
▪ Gamma-point case

SIRIUS library
https://github.com/electronic-structure/SIRIUS
SIRIUS is a collection of classes that abstract away the different building blocks of PW and LAPW codes. The class composition hierarchy starts from the most primitive classes (Communicator, mdarray, etc.) and progresses towards several high-level classes (DFT_ground_state, Band, Potential, etc.). The code is written in C++11 with MPI, OpenMP and CUDA programming models.
[Diagram: class composition hierarchy, from high-level to primitive classes: DFT_ground_state, Band, Local_operator, Potential, Density, K_point_set, K_point, Non_local_operator, Beta_projectors, Periodic_function, Matching_coefficients, Simulation_context, Unit_cell, Radial_integrals, Augmentation_operator, Step_function, Atom_type, Radial_grid, Atom, Spline, Eigensolver, Wave_functions, linalg, dmatrix, BLACS_grid, FFT3D, MPI_grid, Gvec, Communicator, mdarray, splindex, matrix3d, vector3d]
