Parallel Scaling of Teter's Minimization for Ab Initio Calculations - PowerPoint PPT Presentation



SLIDE 1


Parallel scaling of Teter’s Minimization for Ab Initio Calculations

Torsten Hoefler

Department of Computer Science, Technical University of Chemnitz

HPCNano Workshop 2006

Supercomputing'06, Tampa, FL, USA

November 13th, 2006

SLIDE 2

Outline

1. Introduction
   - Introduction to ABINIT
   - Teter's Conjugate Gradient Minimization

2. Parallelization
   - Already implemented Parallelization
   - A new Proposal
   - Verifying this Proposal

3. Hunting the Overlap
   - Non-blocking Collectives

SLIDE 4

ABINIT Introduction

ABINIT solves the time-independent Schrödinger equation for the effective one-particle case, using DFT:

  H_tot Φ = E_tot Φ

⇒ an eigenvalue problem. Eigenvalues and eigenvectors are determined with CG minimization (Teter et al.); the wavefunction Φ is written in a plane-wave basis set.

SLIDE 5

ABINIT Program Flow

[Flowchart] Initialization: (1) choose coefficients, (2) calculate electron density, (3) calculate trial potential. SCF cycle: (4) minimize electronic energy, (5) calculate total energy, (6) check convergence; if converged, stop, otherwise (7) mix new density and (8) calculate the potential, then repeat from (4).

SLIDE 6

ABINIT Tracing

Call-graph profile of a representative run (percentage pairs: time including children / time in the routine itself):

- vtowfk (97.3%/4.3%)
  - cgwf (83.6%/1.3%)
    - projbd (36.0%/36.0%)
    - fourwf (27.4%/0.0%) → sg_fftrisc (27.4%/5.7%) → sg_ffty (14.8%/14.8%), sg_fftpx (6.6%/6.6%)
    - nonlop (21.5%/0.0%) → nonlop_pl (21.5%/0.1%) → opernl4a (11.6%/10.3%), opernl4b (9.8%/8.7%)
    - orthon (5.7%/5.6%)

⇒ roughly 83% of the runtime is spent in the Teter minimization (cgwf)

SLIDE 8

Conjugate Gradient Operations

Two operations dominate: dot products and matrix-vector products.

- dot product: ⟨Φi|Φj⟩
- matrix-vector product: HΦ, with H = E^e_kin + V^e_loc + V^e_nl
- E^e_kin and V^e_loc are applied in reciprocal (k-) space
- V^e_nl is applied in real space

⇒ a 3D-FFT transforms between real and reciprocal space

SLIDE 10

K-Point Parallelization

Bands have to be minimized for each k-point, and the minimization for each k-point is independent. All k-point data is only needed for the calculation of ETOT ⇒ straightforward parallelization.

ABINIT implementation:
- Good speedup :-)
- Uses only collective communication :-)
- Limited to nkpt processes :-(
- Uses MPI_COMM_WORLD :-(
- Uses MPI_BARRIER :-(
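To make the k-point idea concrete, here is a minimal Fortran/MPI sketch (not ABINIT's actual code): MPI_COMM_SPLIT builds one sub-communicator per k-point so each group minimizes its bands independently, and only E_tot needs a global reduction; nkpt and the per-process energy are toy placeholders.

  ! sketch only: k-point parallelism via communicator splitting
  program kpoint_split
    use mpi
    implicit none
    integer :: ierr, rank, nproc, nkpt, mykpt, kcomm
    double precision :: e_kpt, e_tot
    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)
    nkpt  = 4                          ! toy value
    mykpt = mod(rank, nkpt)            ! round-robin k-point assignment
    ! one sub-communicator per k-point instead of MPI_COMM_WORLD everywhere
    call MPI_Comm_split(MPI_COMM_WORLD, mykpt, rank, kcomm, ierr)
    ! ... minimize the bands of k-point 'mykpt' within kcomm ...
    e_kpt = dble(mykpt)                ! placeholder per-process contribution
    ! k-point data is only needed globally for E_tot: one reduction, no barrier
    call MPI_Allreduce(e_kpt, e_tot, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
                       MPI_COMM_WORLD, ierr)
    if (rank == 0) print *, 'E_tot (toy) =', e_tot
    call MPI_Comm_free(kcomm, ierr)
    call MPI_Finalize(ierr)
  end program kpoint_split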

SLIDE 11

Band Parallelization

The Teter method allows parallel CG, but the orthogonalization constraint forces a non-ideal solution ⇒ tricky parallelization.

ABINIT implementation:
- Speedup depends on the interconnect :-/
- Uses Send/Recv :-(
- Limited by nband/c (c is not easily predictable)

SLIDE 13

G Parallelization

FFT ⇒ two parallelization schemes:
- distribute the plane-wave coefficients
- distribute the real-space FFT grid

Goals: strict load balancing, minimal communication; possible to combine with band and k-point parallelization.

Vector Distribution

[Figure] The 15 plane-wave coefficients of the example vector distributed across PE0-PE3.

SLIDE 14

Real Space Distribution

[Figure] 3D-FFT distribution: the FFT box is distributed across PE0-PE3; 1D-FFTs are performed along the xy-lines, an MPI_ALLTOALL transposes the data, and 2D-FFTs then run on the local z-planes (a second MPI_ALLTOALL restores the original layout).
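The transpose step can be sketched on its own. Below is a minimal Fortran/MPI version of just the MPI_ALLTOALL redistribution between the two layouts; the 1D/2D FFTs are elided as comments, and nloc is an assumed per-partner block size.

  ! sketch only: the communication step of the distributed 3D-FFT
  program fft_transpose
    use mpi
    implicit none
    integer :: ierr, rank, nproc, nloc
    double precision, allocatable :: sendbuf(:), recvbuf(:)
    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)
    nloc = 8                               ! assumed block size per partner
    allocate(sendbuf(nloc*nproc), recvbuf(nloc*nproc))
    ! ... 1D-FFTs along the local xy-lines would fill sendbuf here ...
    sendbuf = dble(rank)                   ! stand-in data
    ! redistribute the FFT box: block i of sendbuf goes to process i
    call MPI_Alltoall(sendbuf, nloc, MPI_DOUBLE_PRECISION, &
                      recvbuf, nloc, MPI_DOUBLE_PRECISION, &
                      MPI_COMM_WORLD, ierr)
    ! ... 2D-FFTs on the now-local z-planes would run here ...
    call MPI_Finalize(ierr)
  end program fft_transpose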

SLIDE 15

Implementation Issues

Necessary communication (complexity):
- dot products (O(1))
- computation of the kinetic energy (O(1))
- FFT transpose (O(natom))

Only collective communication:
- MPI_ALLREDUCE for reductions
- MPI_ALLTOALL for the FFT transpose

Principles:
- only collective communication
- a separate communicator
- simplification of the main code
- heavy usage of math libraries
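The dot-product pattern above is simple enough to sketch (a minimal example, not ABINIT's routine; npw_loc and the data are made up): a local partial sum over each process's plane-wave coefficients, then one MPI_ALLREDUCE on a duplicated, separate communicator.

  ! sketch only: distributed dot product <phi_i|phi_j>
  program pw_dot
    use mpi
    implicit none
    integer :: ierr, rank, npw_loc, g, gcomm
    double precision, allocatable :: phi_i(:), phi_j(:)
    double precision :: dot_loc, dot
    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_dup(MPI_COMM_WORLD, gcomm, ierr)  ! separate communicator
    npw_loc = 1000                                  ! assumed local coefficients
    allocate(phi_i(npw_loc), phi_j(npw_loc))
    phi_i = 0.5d0                                   ! toy data
    phi_j = 2.0d0
    dot_loc = 0.0d0
    do g = 1, npw_loc                               ! local partial sum
       dot_loc = dot_loc + phi_i(g)*phi_j(g)
    end do
    call MPI_Allreduce(dot_loc, dot, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
                       gcomm, ierr)                 ! O(1) messages per process
    if (rank == 0) print *, '<phi_i|phi_j> =', dot
    call MPI_Finalize(ierr)
  end program pw_dot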

SLIDE 16

Benchmarking the Implementation of cgwf

[Plot] Speedup vs. number of processors (up to 64) for two systems, SiO2 with natom=43, nband=126, npw=48728 and SiO2 with natom=86, nband=251, npw=97624, compared against linear speedup.

SLIDE 17

Possible Reasons for limited Scalability

Serial parts (Amdahl's law):
- allocations
- scalar calculation
- index reordering (packin/packout in the FFT)

Communication overhead:
- the latency of blocking collective operations limits scalability significantly
- this overhead will be modelled in the following

SLIDE 19

The LogP Model

[Figure] The LogP model of point-to-point communication as a time diagram: the sender spends o_s (send overhead), the message travels for L (latency), the receiver spends o_r (receive overhead); g (gap) is the minimum interval between consecutive messages, and P is the number of processors.
SLIDE 20

Modelling the MPI_ALLREDUCE

→ modelled as an MPI_REDUCE to node 0 followed by an MPI_BCAST, each organized as a tree over the processes (P0...P7 in the figure).

[Figure] Time diagram of the tree communication, with effective per-message costs f_s = max(o_s, g) on the sender and f_r = max(o_r, g) on the receiver.

Resulting model (writing o for o_s = o_r):

  t_red(P, size) = 2 · size · (2o + L + (⌈log2 P⌉ − 1) · max{g, 2o + L})
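As a worked check of the model, plugging in the LogP parameters measured later in the talk (see "Verifying the Predictions" at the end: L = 9.78 µs, o = 0.05 µs, g = 0.01 µs) for P = 8 and size = 1:

  \begin{align*}
    2o + L &= 2 \cdot 0.05 + 9.78 = 9.88\ \mu\text{s} = \max\{g,\ 2o+L\} \\
    t_{red}(8,1) &= 2 \cdot 1 \cdot \bigl(9.88 + (\lceil\log_2 8\rceil - 1) \cdot 9.88\bigr)
                  = 6 \cdot 9.88 \approx 59.3\ \mu\text{s}
  \end{align*}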

SLIDE 21

Modelling the MPI_ALLTOALL

→ each node has to send to all others. For a single host (P0 sending to P1...P4 in the figure):

[Figure] Time diagram of one host's sends: successive send overheads o_s separated by the gap g, each message arriving after L with receive overhead o_r.

With all hosts sending, and assuming full bisection bandwidth (FBB):

  t_a2a(P, size) = size · ((2o + L) + (P − 1) · (g + o))
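The corresponding worked example with the same LogP parameters, again for P = 8 and size = 1:

  \[
    t_{a2a}(8,1) = 1 \cdot \bigl((2o+L) + (8-1) \cdot (g+o)\bigr)
                 = 9.88 + 7 \cdot 0.06 \approx 10.3\ \mu\text{s}
  \]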

SLIDE 22

Predicting the Overhead

Reduction overhead:

  o_red(P) = nband · (9 + 2 · nband) · t_red(P, 1)  ⇒  o_red(P) = O(log2 P)

For natom = 43 (nband = 126; with 2o + L = 9.88 µs dominating g, t_red(P, 1) = 2 · ⌈log2 P⌉ · 9.88 µs):

  o_red(P) = 126 · (9 + 2 · 126) · 2 · (⌈log2 P⌉ · 9.88 µs) = 65772 · (⌈log2 P⌉ · 9.88 µs)

All-to-all overhead:

  o_a2a(P) = 2 · t_a2a(P, Nx · Ny · Nz / P)  ⇒  o_a2a(P) = O(1)

For natom = 43:

  o_a2a(P) = . . .
  o_a2a(P) ≈ 6.3 s
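The model is small enough to evaluate directly. A Fortran sketch of exactly these formulas (o_red follows the slide; the FFT grid size Nx·Ny·Nz needed for o_a2a is not given here, so the value below is an assumed placeholder):

  ! sketch only: evaluating the natom=43 overhead model
  program overhead_model
    implicit none
    double precision, parameter :: L = 9.78d0, o = 0.05d0, g = 0.01d0 ! LogP, us
    double precision :: t1, ored, oa2a, grid
    integer :: P, lg
    grid = 1.0d7                   ! ASSUMED Nx*Ny*Nz, not from the talk
    t1 = 2.0d0*o + L               ! 2o + L = 9.88 us
    do P = 2, 64
       lg = 0                      ! lg = ceil(log2(P)), computed exactly
       do while (2**lg < P)
          lg = lg + 1
       end do
       ! o_red(P) = nband*(9 + 2*nband) * t_red(P,1), with nband = 126 and
       ! t_red(P,1) = 2*lg*(2o+L) because max{g, 2o+L} = 2o+L here
       ored = 126.0d0*(9.0d0 + 2.0d0*126.0d0) * 2.0d0*dble(lg)*t1
       ! o_a2a(P) = 2 * t_a2a(P, Nx*Ny*Nz/P)
       oa2a = 2.0d0 * (grid/dble(P)) * (t1 + dble(P-1)*(g + o))
       print '(i4,2f16.1)', P, ored, oa2a    ! both in microseconds
    end do
  end program overhead_model

With these numbers, o_red(64) = 65772 · 6 · 9.88 µs ≈ 3.9 s, the same order as the ≈6.3 s quoted for o_a2a, which is why both terms matter for the scaling limit.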

SLIDE 23

Verifying the Overhead Prediction

[Plot] Measured vs. predicted overhead over 10-60 processors: ALLREDUCE and ALLTOALL overheads against the predicted t_red and t_a2a curves.

SLIDE 24

Can we predict parallel Scaling?

⇒ kind of (with the communication overhead as the limiting factor).

- ideal scaling: t(P) = t(1)/P, so lim P→∞ t(P) = 0
- overhead: o(P) = o_red(P) + o_a2a(P), and lim P→∞ o(P) = ∞
- the crossing point Pc denotes the maximum useful scaling: t(Pc) = o(Pc)
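Stated as a formula (a restatement of the argument above):

  \[
    \frac{t(1)}{P_c} \;=\; o_{red}(P_c) + o_{a2a}(P_c)
  \]

Since t(1)/P decreases monotonically towards 0 while o(P) grows without bound, such a crossing point always exists and bounds the useful processor count.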

SLIDE 25

Modelled Prediction

[Plot] Predicted overhead (o_red + o_a2a) plotted against ideal calculation scaling (t(P=1)/P) over 10-60 processors; the crossing point marks the predicted scaling limit.

SLIDE 26

Comparison to Benchmarks

[Plot] Measured speedup vs. number of processors for the two SiO2 systems (natom=43, nband=126, npw=48728 and natom=86, nband=251, npw=97624) against linear speedup.

SLIDE 27

Intermediate Conclusions

- Teter's scheme is efficiently parallelizable
- k-point, band, and G parallelism can be combined
- parallel scaling can be predicted
- parallel scaling depends on the overhead
- the overhead depends on the system size and the LogP parameters

⇒ the overhead is a hard limitation (is it?)

Overlapping could help ;o)

SLIDE 29

Non-blocking Communication

- Communication can be overlapped with computation
- A programming model that supports overlapping via threads is too complex
- Non-blocking communication does not change the programming model
- Supported by MPI (MPI_ISEND, MPI_IRECV); a minimal sketch follows below
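A minimal Fortran/MPI sketch of the pattern (run with at least two processes): start the transfer, compute something that does not touch the buffer, then wait.

  ! sketch only: overlapping a point-to-point transfer with computation
  program overlap_p2p
    use mpi
    implicit none
    integer :: ierr, rank, req, i
    double precision :: buf(1024), acc
    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    buf = dble(rank)
    if (rank == 0) then
       call MPI_Isend(buf, 1024, MPI_DOUBLE_PRECISION, 1, 0, &
                      MPI_COMM_WORLD, req, ierr)
    else if (rank == 1) then
       call MPI_Irecv(buf, 1024, MPI_DOUBLE_PRECISION, 0, 0, &
                      MPI_COMM_WORLD, req, ierr)
    end if
    acc = 0.0d0
    do i = 1, 1000000                ! independent work overlaps the transfer
       acc = acc + 1.0d0/dble(i)
    end do
    if (rank <= 1) call MPI_Wait(req, MPI_STATUS_IGNORE, ierr)
    call MPI_Finalize(ierr)
  end program overlap_p2p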

SLIDE 30

Send/Recv is there - Why Collectives?

Gorlatch '04: "Send-Receive Considered Harmful" ⇔ Dijkstra '68: "Go To Statement Considered Harmful"

Point-to-point:

  if (rank == 0) then
    call MPI_SEND(...)
  else
    call MPI_RECV(...)
  end if

vs. collective:

  call MPI_GATHER(...)

⇒ compare math libraries vs. hand-written loops

SLIDE 31

Why non-blocking Collectives

Overlap communication and computation:

- many collectives synchronize unnecessarily
- collectives scale at least with O(log2 P) sends
- wasted CPU time: log2 P · L

Typical latencies: Fast Ethernet L = 50-60 µs, Gigabit Ethernet L = 15-20 µs, InfiniBand L = 2-7 µs; 1 µs ≈ 4000 FLOPs on a 2 GHz machine. A minimal overlap sketch follows below.
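The same overlap pattern with a collective. The talk's LibNBC spells this NBC_IALLREDUCE; the sketch below uses the equivalent call that later entered MPI-3 as MPI_IALLREDUCE (an assumption if your MPI predates MPI-3).

  ! sketch only: hiding a reduction behind independent computation
  program overlap_coll
    use mpi
    implicit none
    integer :: ierr, rank, req, i
    double precision :: dot_loc, dot, acc
    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    dot_loc = dble(rank)
    call MPI_Iallreduce(dot_loc, dot, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
                        MPI_COMM_WORLD, req, ierr)   ! starts, returns at once
    acc = 0.0d0
    do i = 1, 1000000             ! work that does not depend on 'dot'
       acc = acc + dble(i)
    end do
    call MPI_Wait(req, MPI_STATUS_IGNORE, ierr)      ! 'dot' valid from here
    if (rank == 0) print *, 'sum of ranks =', dot
    call MPI_Finalize(ierr)
  end program overlap_coll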

SLIDE 32

Final Conclusions and Future Work

Conclusions:
- Teter's minimization scales ok
- communication overhead is the limiting factor
- parallel scaling is predictable (though not easily)
- scaling could be enhanced by overlapping communication and computation to hide latency
- collective communications should be preferred ⇒ non-blocking collective operations: LibNBC, http://www.unixer.de/NBC

Future Work:
- use non-blocking collectives to enhance QM codes
- e.g., overlapping schemes for the 3D-FFT (sketched below)
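As a closing illustration of that future-work item, a hedged Fortran sketch of an overlapping 3D-FFT transpose pipeline: the all-to-all of block b is started, the (elided) FFT work on the previous block runs while it progresses, and the block is then completed. The pipeline constants are made up; LibNBC's NBC_IALLTOALL or MPI-3's MPI_IALLTOALL provide the call.

  ! sketch only: pipelining FFT work with non-blocking all-to-alls
  program fft_pipeline
    use mpi
    implicit none
    integer, parameter :: nblk = 4, nloc = 1024   ! assumed pipeline constants
    integer :: ierr, rank, nproc, b, req(nblk)
    double precision, allocatable :: sendbuf(:,:), recvbuf(:,:)
    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)
    allocate(sendbuf(nloc*nproc, nblk), recvbuf(nloc*nproc, nblk))
    sendbuf = dble(rank)
    do b = 1, nblk
       ! ... 1D-FFTs for block b would fill sendbuf(:,b) here ...
       call MPI_Ialltoall(sendbuf(:,b), nloc, MPI_DOUBLE_PRECISION, &
                          recvbuf(:,b), nloc, MPI_DOUBLE_PRECISION, &
                          MPI_COMM_WORLD, req(b), ierr)
       if (b > 1) then
          call MPI_Wait(req(b-1), MPI_STATUS_IGNORE, ierr)
          ! ... 2D-FFTs on received block b-1 overlap block b's transfer ...
       end if
    end do
    call MPI_Wait(req(nblk), MPI_STATUS_IGNORE, ierr)
    call MPI_Finalize(ierr)
  end program fft_pipeline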

SLIDE 33

The Teter Algorithm

Steepest descent: d_i = −∂f/∂x_i = −G x_i, in analogy:

- f(x) → E, the Kohn-Sham energy functional
- x → ψ, the wavefunction of each electron
- G → H, the Hamilton operator

Teter's scheme:

1. check the residual for convergence
2. compute the steepest descent vector
3. orthogonalize it to all bands
4. compute the preconditioned steepest descent
5. orthogonalize it to all bands
6. compute the conjugate gradient vector
7. step into the CG direction
8. go to 1

SLIDE 34

Verifying the Predictions

Kielmann's logp-mpi benchmark measured: L = 9.78 µs, o = 0.05 µs, g = 0.01 µs

[Plot] Measured vs. predicted times for 16-byte ALLREDUCE and ALLTOALL over 5-30 processors.