New developments in the quantum ESPRESSO software distribution for quantum simulations at the nanoscale. Paolo Giannozzi, Università di Udine, Italy. Workshop From experiments to theory & models..., Roma Tor Vergata, 2017/12/5


slide-1
SLIDE 1

New developments in the quantum ESPRESSO software distribution for quantum simulations at the nanoscale

Paolo Giannozzi
Università di Udine, Italy
Workshop From experiments to theory & models...
Roma Tor Vergata, 2017/12/5


slide-2
SLIDE 2

At the nanoscale

Nanoscale: phenomena happening on length scales up to a few tens of nm. They can be studied using quantum, or first-principles, simulations, that is, calculations based on the electronic structure.

slide-3
SLIDE 3

Size vs accuracy

slide-4
SLIDE 4

At the nanoscale: nanocatalysis

Cobalt-based catalyst for water splitting: J. Am. Chem. Soc. 135, 15353 (2013)

slide-5
SLIDE 5

At the nanoscale: systems of biological interest

Metal-β-amyloid interactions; Metallomics 4, 156 (2012).

slide-6
SLIDE 6

Quantum simulations: basics

Time-dependent Schrödinger equation for nuclei $R \equiv \{\mathbf{R}_I\}$ and electrons $r \equiv \{\mathbf{r}_i\}$:

$$ i\hbar \frac{\partial \hat\Phi(r,R;t)}{\partial t} = \left( -\sum_I \frac{\hbar^2}{2M_I}\nabla^2_{\mathbf{R}_I} - \sum_i \frac{\hbar^2}{2m}\nabla^2_{\mathbf{r}_i} + V(r,R) \right) \hat\Phi(r,R;t) $$

Born-Oppenheimer (or adiabatic) approximation, valid for $M_I \gg m$:

$$ \hat\Phi(r,R;t) \simeq \Phi(R)\,\Psi(r|R)\,e^{-i\hat{E}t/\hbar} $$

The problem splits into an electronic problem depending upon nuclear positions:

$$ \left( -\sum_i \frac{\hbar^2}{2m}\nabla^2_{\mathbf{r}_i} + V(r,R) \right) \Psi(r|R) = E(R)\,\Psi(r|R) $$

and a nuclear problem under an effective interatomic potential $E(R)$, typically treated as classical, with forces on nuclei:

$$ \mathbf{F}_I = -\nabla_{\mathbf{R}_I} E(R) $$
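The classical treatment of the nuclei can be illustrated with a minimal sketch. It assumes a toy effective potential E(R) (a Morse curve for a diatomic molecule with made-up parameters, not the output of any electronic-structure calculation), obtains the forces as the negative gradient of E(R) by finite differences, and propagates the nuclei with the velocity-Verlet integrator. All function names and numbers are hypothetical.

```python
import numpy as np

def energy(R, d_e=0.17, a=1.0, r_e=1.4):
    """Toy effective potential E(R): Morse curve in the bond length (hypothetical parameters)."""
    r = np.linalg.norm(R[0] - R[1])
    return d_e * (1.0 - np.exp(-a * (r - r_e))) ** 2

def forces(R, h=1e-5):
    """F_I = -dE/dR_I, evaluated by central finite differences."""
    F = np.zeros_like(R)
    for I in range(R.shape[0]):
        for x in range(R.shape[1]):
            Rp = R.copy()
            Rm = R.copy()
            Rp[I, x] += h
            Rm[I, x] -= h
            F[I, x] = -(energy(Rp) - energy(Rm)) / (2.0 * h)
    return F

def velocity_verlet(R, V, M, dt, n_steps):
    """Propagate classical nuclei with masses M under the forces above."""
    F = forces(R)
    for _ in range(n_steps):
        V = V + 0.5 * dt * F / M[:, None]
        R = R + dt * V
        F = forces(R)
        V = V + 0.5 * dt * F / M[:, None]
    return R, V

def total_energy(R, V, M):
    """Nuclear kinetic energy plus effective potential energy."""
    return 0.5 * np.sum(M[:, None] * V**2) + energy(R)

# Two nuclei with proton-like masses (atomic units), bond stretched from equilibrium
M = np.array([1836.0, 1836.0])
R0 = np.array([[0.0, 0.0, 0.0], [1.6, 0.0, 0.0]])
V0 = np.zeros_like(R0)
R1, V1 = velocity_verlet(R0, V0, M, dt=2.0, n_steps=200)
```

In a first-principles simulation the call to `energy` would be replaced by a solution of the electronic problem at fixed nuclear positions, and the forces would be computed analytically rather than by finite differences.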

slide-7
SLIDE 7

Density-Functional Theory

Transforms the many-electron problem into an equivalent problem of (fictitious) non-interacting electrons, the Kohn-Sham equations:

$$ H\phi_v \equiv \left( -\frac{\hbar^2}{2m}\nabla^2_{\mathbf{r}} + V_R(\mathbf{r}) \right) \phi_v(\mathbf{r}) = \epsilon_v \phi_v(\mathbf{r}) $$

The effective potential is a functional of the charge density:

$$ V_R(\mathbf{r}) = -\sum_I \frac{Z_I e^2}{|\mathbf{r}-\mathbf{R}_I|} + v[n(\mathbf{r})], \qquad n(\mathbf{r}) = \sum_v |\phi_v(\mathbf{r})|^2 $$

(Hohenberg-Kohn 1964, Kohn-Sham 1965). The exact form is unknown, but simple approximate forms yielding very accurate (ground-state) results are known.
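Because the effective potential depends on the density built from its own eigenfunctions, the equations have to be solved self-consistently. The sketch below shows such a loop for a deliberately simplified one-dimensional model: spinless electrons in a harmonic external potential, with a hypothetical density-dependent term v[n] = g·n standing in for the true functional. It uses a finite-difference grid and simple linear mixing; none of this reflects how QE is implemented.

```python
import numpy as np

def toy_kohn_sham_1d(n_electrons=2, g=1.0, n_grid=201, box=10.0,
                     mix=0.3, tol=1e-8, max_iter=500):
    """Self-consistent loop for a toy 1D model (Hartree atomic units)."""
    x = np.linspace(-box / 2, box / 2, n_grid)
    dx = x[1] - x[0]
    # Kinetic operator -1/2 d^2/dx^2 by second-order finite differences
    off = np.ones(n_grid - 1)
    laplacian = (np.diag(off, -1) - 2.0 * np.eye(n_grid) + np.diag(off, 1)) / dx**2
    kinetic = -0.5 * laplacian
    v_ext = 0.5 * x**2                       # external (harmonic) potential
    n = np.full(n_grid, n_electrons / box)   # initial guess: uniform density
    for _ in range(max_iter):
        v_eff = v_ext + g * n                # hypothetical v[n] = g * n
        eps, phi = np.linalg.eigh(kinetic + np.diag(v_eff))
        phi = phi / np.sqrt(dx)              # normalise so that sum |phi|^2 dx = 1
        n_out = np.sum(phi[:, :n_electrons] ** 2, axis=1)
        if np.max(np.abs(n_out - n)) < tol:  # input and output densities agree
            return x, n_out, eps[:n_electrons], True
        n = (1.0 - mix) * n + mix * n_out    # linear mixing of densities
    return x, n, eps[:n_electrons], False

x, n, eps, converged = toy_kohn_sham_1d()
```

With `g=0.0` the loop reduces to the non-interacting harmonic oscillator, which gives a quick sanity check of the discretisation.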

slide-8
SLIDE 8

Density-Functional Theory II

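The total energy is also a functional of the charge density:

$$ E \Rightarrow E[\{\phi\},R] = -\frac{\hbar^2}{2m}\sum_v \int \phi_v^*(\mathbf{r})\nabla^2\phi_v(\mathbf{r})\,d\mathbf{r} + \int V_R(\mathbf{r})\,n(\mathbf{r})\,d\mathbf{r} + \frac{e^2}{2}\int \frac{n(\mathbf{r})\,n(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}\,d\mathbf{r}' + E_{xc}[n(\mathbf{r})] + \sum_{I\neq J}\frac{e^2}{2}\frac{Z_I Z_J}{|\mathbf{R}_I-\mathbf{R}_J|} $$

Kohn-Sham equations arise from the minimization of the energy functional:

$$ E(R) = \min_{\phi} E[\{\phi\},R], \qquad \int \phi_i^*(\mathbf{r})\,\phi_j(\mathbf{r})\,d\mathbf{r} = \delta_{ij} $$

Hellmann-Feynman theorem holds. Forces on nuclei:

$$ \mathbf{F}_I = -\nabla_{\mathbf{R}_I} E(R) = -\int n(\mathbf{r})\,\nabla_{\mathbf{R}_I} V_R(\mathbf{r})\,d\mathbf{r} $$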
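The Hellmann-Feynman theorem can be checked numerically on a small model. In the sketch below a hypothetical 3×3 Hamiltonian H(λ) = H0 + λ·H1 plays the role of the electronic Hamiltonian, and the parameter λ plays the role of a nuclear coordinate. The derivative of the ground-state energy is computed once as the expectation value of dH/dλ and once by finite differences. The matrices are arbitrary illustrations.

```python
import numpy as np

# Hypothetical model Hamiltonian H(lam) = H0 + lam * H1 (real symmetric)
H0 = np.diag([0.0, 1.0, 2.5])
H1 = np.array([[0.2, 0.5, 0.0],
               [0.5, -0.1, 0.3],
               [0.0, 0.3, 0.4]])

def ground_state(lam):
    """Lowest eigenvalue and normalised eigenvector of H(lam)."""
    eps, vec = np.linalg.eigh(H0 + lam * H1)
    return eps[0], vec[:, 0]

lam, h = 0.3, 1e-4
_, psi = ground_state(lam)

# Hellmann-Feynman: dE/dlam = <psi| dH/dlam |psi> = <psi| H1 |psi>
hf_derivative = psi @ H1 @ psi

# Reference: central finite difference of the ground-state energy
fd_derivative = (ground_state(lam + h)[0] - ground_state(lam - h)[0]) / (2.0 * h)
```

The practical consequence is that forces need only the ground-state density and the explicit derivative of the potential, not the derivatives of the orbitals.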

slide-9
SLIDE 9

Density-Functional Theory in practice

  • Expanding the Kohn-Sham orbitals into a suitable basis set turns Density-Functional Theory into a multi-variate minimization problem, and the Kohn-Sham equations into a non-linear matrix eigenvalue problem
  • The use of pseudopotentials allows one to ignore chemically inert core states and to use plane waves
  • Plane waves are orthogonal and the matrix elements of the Hamiltonian are usually easy to calculate; the completeness of the basis is easy to check
  • Plane waves allow one to efficiently calculate matrix-vector products and to solve the Poisson equation using Fast Fourier Transforms (FFTs)

(NB: Other approaches based on different basis sets and all-electron atoms exist)
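The role of FFTs can be sketched in one dimension. In a plane-wave representation the kinetic operator and the Poisson equation are both diagonal in reciprocal space, so each reduces to a forward transform, a multiplication by a function of G, and a back transform. The example assumes a periodic 1D cell, Hartree atomic units, and hypothetical function names; it is an illustration, not QE code.

```python
import numpy as np

def apply_kinetic(psi, box):
    """Apply T = -(1/2) d^2/dx^2 to a periodic function: multiply by G^2/2 in G-space."""
    G = 2.0 * np.pi * np.fft.fftfreq(psi.size, d=box / psi.size)
    return np.fft.ifft(0.5 * G**2 * np.fft.fft(psi))

def solve_poisson(n, box):
    """Solve d^2V/dx^2 = -4*pi*n(x) for a periodic density with zero average."""
    G = 2.0 * np.pi * np.fft.fftfreq(n.size, d=box / n.size)
    n_G = np.fft.fft(n)
    V_G = np.zeros_like(n_G)
    nonzero = G != 0.0                 # the G = 0 term is dropped (neutral cell)
    V_G[nonzero] = 4.0 * np.pi * n_G[nonzero] / G[nonzero] ** 2
    return np.fft.ifft(V_G).real

box = 10.0
x = np.linspace(0.0, box, 64, endpoint=False)
G1 = 2.0 * np.pi / box

n = np.cos(G1 * x)                     # a single Fourier component of the density
V = solve_poisson(n, box)              # analytically 4*pi*cos(G1*x)/G1^2

psi = np.exp(1j * G1 * x)              # a single plane wave
T_psi = apply_kinetic(psi, box)        # analytically (G1^2/2) * psi
```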

slide-10
SLIDE 10

Requirements on effective software for quantum simulations at the nanoscale

  • Diffusion of first-principles techniques among non-specialists requires software that is easy to use and (reasonably) error-proof
  • Challenging calculations stress the limits of available computer power: software should be fast and efficient
  • Introducing innovation requires new ideas to materialize into new algorithms through codes: software should be easy to extend and to improve
  • Complex problems require a mix of solutions coming from different approaches and methods: software should be interoperable with other software
  • Finally, scientific ethics requires that results be reproducible and algorithms open to validation

slide-11
SLIDE 11

The quantum ESPRESSO distribution

quantum ESPRESSO (QE) stands for Quantum opEn-Source Package for Research in Electronic Structure, Simulation, and Optimization.

QE is a distribution (an integrated suite) of software for first-principles simulations, i.e., atomistic calculations based on electronic structure, using density-functional theory, a plane-wave basis set, and pseudopotentials.

QE is freely available under the terms of the GNU General Public License.

The main goals of QE are

  • innovation in theoretical methods and numerical algorithms
  • efficiency on modern computer architectures

A great effort is also devoted to user friendliness and to the formation of a users’ and developers’ community.

QE has existed since 2002, resulting from the merger of pre-existing packages; some core components have been under development for ∼ 30 years.

slide-12
SLIDE 12

quantum ESPRESSO contributors

QE is one of the community codes of the H2020 project MaX (Materials at the Exascale) and receives contributions from many individuals and partner institutions in Europe and worldwide.

Who “owns” QE ... ? ... the quantum ESPRESSO Foundation: a non-profit (“limited by guarantee”) company, based in London, that

  • coordinates and supports research, education, and outreach within the QE community
  • owns the trademarks and protects the open-source character of QE
  • raises funds to foster the QE project and its development

Current members of the Foundation: SISSA, EPFL, ICTP, IOM-CNR, Cineca, North Texas University, Oxford University

slide-13
SLIDE 13

Users’ community: factoids

  • 2000+ registered users for the pw_forum mailing list
  • An average of ∼ 10 messages a day on pw_forum
  • Latest version downloaded 9500 times in less than two months [*]
  • 30+ Schools or tutorials since 2002, attended by ∼ 1200 users
  • 4 developers’ schools since 2013, latest in 2017
  • Annual developers’ meeting since 2010

[*] Number may be inflated by bots, failed or repeated downloads, etc.

slide-14
SLIDE 14

This is the main documenting paper. After 8 years and ∼ 6000 citations ...

slide-15
SLIDE 15

... a new version is out. What happened meanwhile?

slide-17
SLIDE 17

Verification and Validation of electronic-structure codes

Systematic comparisons of different pseudopotential and all-electron DFT codes: Reproducibility in density-functional theory calculations of solids, K. Lejaeghere et multis aliis, Science 351 (6280), aad3000 (2016), DOI 10.1126/science.aad3000

Tests the precision of the computational methods, not the physical accuracy of the results.

Main outcome: everybody is converging towards the same set of results.

slide-18
SLIDE 18

Comparing QE with Gaussian-based code CRYSTAL

[Figures: effect of the basis set; comparison of charge density in Si; comparison of charge density in Al]

slide-20
SLIDE 20

Solutions for interoperability

  • I/O with a schema-based, standard-compliant XML file, plus binary files (optionally in portable HDF5 format) for large records (e.g. wavefunctions, charge density). Allows easy parsing and transfer of data both inside QE and between QE and external software.
  • More modular code and parallelization logic. Allows QE code to be called as a library and executed inside an MPI communicator provided by the external software.

Applications:

  • QM-MM, with LAMMPS for the MM part (Comput. Phys. Commun. 195, 191 (2015); new version using MPI still under development)
  • Advanced minimization algorithms (basin hopping, genetic algorithms)
  • Path-Integral Molecular Dynamics with i-PI (Comput. Phys. Commun. 185, 1019 (2014))
  • High-throughput computing with AiiDA (Comput. Mater. Sci. 111, 218 (2016))

slide-22
SLIDE 22

Major improvements and extensions

Mostly in the field of “advanced functionals”:

  • New methods for van-der-Waals-bonded systems:
    – non-local functionals (“vdW-DF”)
    – semi-empirical corrections: Grimme’s DFT+D2, DFT+D3
    – not-so-empirical corrections: Tkatchenko-Scheffler, exchange-hole dipole moment model (XDM)
  • New methods to deal with hybrid functionals:
    – Adaptively Compressed Exchange, also in conjunction with Selected Columns of the Density Matrix localization (under development)
    – Car-Parrinello dynamics with localized Wannier functions
  • Usable meta-GGA functionals
slide-23
SLIDE 23

Non-local functionals in molecular crystals

New non-local (vdW-DF) functionals make it possible to treat molecular crystals without semi-empirical schemes, with a computational effort comparable to plain DFT.

First-principles molecular dynamics explains the fourfold symmetry axis observed at ambient conditions, apparently inconsistent with the three-fold symmetry of NH3BH3 molecules.

slide-24
SLIDE 24

Hybrid functionals for plane waves

Double-loop algorithm

Pure DFT
Outer loop: Do n = 1, ...
    $(V_X \varphi_k^{(n)})(\mathbf{r}) = -\sum_i^N \varphi_i^{(n)}(\mathbf{r}) \int d\mathbf{r}'\, \frac{\varphi_i^{(n)}(\mathbf{r}')\,\varphi_k^{(n)}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}$   SLOW!!
    $\{\varphi_i^{(n)}, \psi_i^{(n,0)}\}$
    Inner loop: Do m = 1, ...
        $(V_X \psi_k^{(n,m)})(\mathbf{r}) = -\sum_i^N \varphi_i^{(n)}(\mathbf{r}) \int d\mathbf{r}'\, \frac{\varphi_i^{(n)}(\mathbf{r}')\,\psi_k^{(n,m)}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}$   SLOW!!
        Iterative diagonalization
        Check convergence on $\{\psi_i^{(n,m)}\}$ at fixed $\{\varphi_i^{(n)}\}$
    End Inner loop
    Check convergence on $\{\varphi_i^{(n)}\}$
End Outer loop

The exchange integrals have to be evaluated at every iteration, even though the $\{\varphi_i^{(n)}\}$ functions do not vary in the inner ones.

Giannozzi et al., JPCM, 21, 395502 (2009)

Ivan Carnimeo (UniUd/SISSA) Exact exchange with plane waves 2017-10-02 4 / 18

slide-25
SLIDE 25

Exchange with localized orbitals: inner projection

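We project the exchange operator over the set of occupied orbitals (inner projection method):

$$ \hat W_X = \sum_{ij} \hat V_X|\varphi_i\rangle\, \langle\varphi_i|\hat V_X|\varphi_j\rangle^{-1}\, \langle\varphi_j|\hat V_X = -\sum_i |\xi_i\rangle\langle\xi_i| $$

Adaptively Compressed Exchange (ACE):

$$ |\xi_k\rangle = \sum_i \hat V_X|\varphi_i\rangle\, L^{-T}_{ik} $$

where $L$ is the Cholesky factor of $-\langle\varphi_i|\hat V_X|\varphi_j\rangle$.

The application of $\hat W_X$ during the SCF is equivalent to $\hat V_X$ within the subspace of the projection:

$$ \hat W_X|\psi_k\rangle = -\sum_i |\xi_i\rangle\langle\xi_i|\psi_k\rangle $$

Lin, JCTC, 12, 2242 (2016); Damle, Lin, Ying, JCTC, 11, 1463 (2015); Löwdin, IJQC, 4, 231 (1971).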
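The construction of the compressed operator can be sketched with dense matrices. The example assumes a hypothetical negative-definite symmetric matrix standing in for the exchange operator and a random set of orthonormal "occupied orbitals". It forms the projection vectors through a Cholesky factorization of minus the exchange matrix in the occupied subspace, following the ACE recipe of Lin (2016), and then verifies that the low-rank operator reproduces the full one on the occupied orbitals. Function names and sizes are illustrative only.

```python
import numpy as np

def build_ace_vectors(VX, phi):
    """Return xi such that -xi xi^H equals VX on the span of the columns of phi."""
    W = VX @ phi                      # columns are V_X |phi_i>
    M = phi.conj().T @ W              # M_ij = <phi_i|V_X|phi_j>, negative definite
    L = np.linalg.cholesky(-M)        # -M = L L^H
    # xi = W L^{-H}, obtained from a linear solve instead of an explicit inverse
    return np.linalg.solve(L, W.conj().T).conj().T

def apply_ace(xi, psi):
    """Apply W_X = -sum_i |xi_i><xi_i| to the columns of psi."""
    return -xi @ (xi.conj().T @ psi)

rng = np.random.default_rng(0)
n_basis, n_occ = 20, 4

# Hypothetical stand-in for the exchange operator: symmetric, negative definite
A = rng.standard_normal((n_basis, n_basis))
VX = -(A @ A.T + 0.1 * np.eye(n_basis))

# Hypothetical occupied orbitals: orthonormal columns from a QR factorization
phi, _ = np.linalg.qr(rng.standard_normal((n_basis, n_occ)))

xi = build_ace_vectors(VX, phi)
```

Applying the compressed operator costs two thin matrix products per vector, which is why the inner loop becomes much cheaper than applying the full exchange operator.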


slide-26
SLIDE 26

Hybrid functionals for plane waves

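Double-loop algorithm

Pure DFT
Outer loop: Do n = 1, ...
    $(V_X \varphi_k^{(n)})(\mathbf{r}) = -\sum_i^N \varphi_i^{(n)}(\mathbf{r}) \int d\mathbf{r}'\, \frac{\varphi_i^{(n)}(\mathbf{r}')\,\varphi_k^{(n)}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}$   SLOW!!
    $W_X = -\sum_i |\xi_i^{(n)}\rangle\langle\xi_i^{(n)}| = V_X|\varphi^{(n)}\rangle\, \langle\varphi^{(n)}|V_X|\varphi^{(n)}\rangle^{-1}\, \langle\varphi^{(n)}|V_X$
    $W_X|\varphi_k^{(n)}\rangle = -\sum_i |\xi_i^{(n)}\rangle\langle\xi_i^{(n)}|\varphi_k^{(n)}\rangle$
    $\{\xi_i^{(n)}, \psi_i^{(n,0)}\}$
    Inner loop: Do m = 1, ...
        $W_X|\psi_k^{(n,m)}\rangle = -\sum_i |\xi_i^{(n)}\rangle\langle\xi_i^{(n)}|\psi_k^{(n,m)}\rangle$
        Iterative diagonalization
        Check convergence on $\{\psi_i^{(n,m)}\}$ at fixed $\{\xi_i^{(n)}\}$
    End Inner loop
    Check convergence on $\{\varphi_i^{(n)}\}$
End Outer loop

With the localization step, the exchange integrals in the outer loop now involve two localized functions.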

Lin, JCTC, 12, 2242 (2016); Damle, Lin, Ying, JCTC, 11, 1463 (2015).


slide-27
SLIDE 27

Hybrid functionals for plane waves

[Figure: wall time (sec) along SCF iterations, ACE vs. Full; the exchange step is marked.]

With the ACE method the cost of the inner iterations is comparable to the cost of a pure DFT method.

Example: Ethylene, 1 CPU, 100 Ry, converged to $10^{-7}$ Ry
Pure DFT: 6
Outer loop: 5
Inner loop: 2, 3, 3, 2

  • Tot. calls: 32/5


slide-28
SLIDE 28

Hybrid functionals for plane waves

$$ (\hat V_X w_k)(\mathbf{r}) = -\sum_i^N w_i(\mathbf{r}) \int d\mathbf{r}'\, \frac{w_i(\mathbf{r}')\,w_k(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|} $$

The products between two canonical orbitals are much more delocalized than the product of two localized orbitals, and this can be exploited to reduce the cost of evaluating the exchange integrals.


slide-29
SLIDE 29

Hybrid functionals for plane waves

The SCDM method has been used for the localization, and a threshold has been introduced in order to skip the smallest exchange integrals:

$$ S_{ij} = \int d\mathbf{r}\, |w_i(\mathbf{r})| \cdot |w_j(\mathbf{r})| \le \text{threshold} $$

[Figure: spatial extent of canonical and localized (SCDM) orbitals.]
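The screening criterion can be sketched on a one-dimensional grid. The example assumes three normalised Gaussians as stand-ins for localized orbitals (two close together, one far away), computes the overlap of their absolute values, and keeps only the pairs whose overlap exceeds the threshold. The orbitals, the grid, and the threshold value are hypothetical.

```python
import numpy as np

def screened_pairs(w, dx, threshold):
    """Return S_ij = integral |w_i||w_j| dx and the pairs (i <= j) with S_ij > threshold."""
    a = np.abs(w)                      # w has shape (n_grid, n_orbitals)
    S = (a.T @ a) * dx                 # rectangle-rule integration on the grid
    keep = [(i, j)
            for i in range(S.shape[0])
            for j in range(i, S.shape[1])
            if S[i, j] > threshold]    # pairs at or below the threshold are skipped
    return S, keep

# Hypothetical localized orbitals: normalised Gaussians centred at 0, 1 and 10
x = np.linspace(-10.0, 20.0, 3001)
dx = x[1] - x[0]
centres = [0.0, 1.0, 10.0]
w = np.stack([np.pi ** -0.25 * np.exp(-0.5 * (x - c) ** 2) for c in centres], axis=1)

S, keep = screened_pairs(w, dx, threshold=1e-3)
```

Only the exchange integrals for the pairs in `keep` would then be evaluated; for well-separated orbitals the number of retained pairs grows roughly linearly with system size instead of quadratically.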


slide-30
SLIDE 30

Hybrid functionals for plane waves

[Figure: ratio of the computational time of the ACE+SCDM method to that of ACE, as a function of the number of molecules.]


slide-32
SLIDE 32

Parallelization towards the exascale

Scalability of realistic calculations on up to tens of thousands of cores, using mixed MPI-OpenMP parallelization, has been demonstrated.

Careful optimization of nonscalable RAM and computations required!
Scalability strongly depends upon the kind and size of system!
More and more parallelization levels are being implemented.

[Figure: CP scalability on BG/Q, 1532-atom porphyrin-functionalized carbon nanotube (data from N. Varini et al., Comput. Phys. Commun. 184, 1827 (2013))]
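Why nonscalable computations matter can be illustrated with Amdahl's law, a standard textbook model that is brought in here for illustration and is not taken from the slides. If a fraction of the run time does not parallelize, the achievable speedup saturates at the inverse of that fraction no matter how many cores are used.

```python
def amdahl_speedup(serial_fraction, n_procs):
    """Ideal speedup on n_procs when serial_fraction of the work does not parallelize."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)

# With only 1% of nonscalable work, the speedup can never exceed 100
for n_procs in (64, 1024, 16384):
    print(n_procs, round(amdahl_speedup(0.01, n_procs), 1))
```

The same reasoning applies to memory: arrays that are replicated on every process rather than distributed set a floor on the RAM needed per process.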

slide-33
SLIDE 33

New architectures: GPU

“Accelerated” architectures such as NVIDIA GPUs are the current “big thing” in high-performance computing.

Problem: extensive code rewriting is needed to obtain worthwhile performance. The existing port of QE to GPUs, using NVIDIA’s CUDA language, has maintainability issues: as the code evolves, the GPU version lags behind.

Solution (tentative!): rewrite selected computational kernels using CUDA Fortran, which integrates much better into the Fortran-based code of QE.

slide-34
SLIDE 34

Perspectives and Outlook

  • More packages for advanced methodologies
  • Better-structured distribution, with interfaces to external codes and to Python scripting
  • Porting to new hybrid and accelerated architectures
  • More parallelization everywhere, communication-reducing and latency-hiding algorithms

slide-35
SLIDE 35

Credits

  • Thanks to all people whose slides and pictures I borrowed
  • Thanks to all people who contributed to QE
  • ...and thanks to you all