
OpenAtom: First Principles GW method for electronic excitation



  1. OpenAtom: First Principles GW method for electronic excitation
     Minjung Kim, Subhasish Mandal, and Sohrab Ismail-Beigi (Yale University)
     Eric Mikida, Kavitha Chandrasekar, Eric Bohm, Nikhil Jain, and Laxmikant Kale (University of Illinois at Urbana-Champaign)
     Qi Li and Glenn Martyna (IBM T.J. Watson Research Center)

  2. Density Functional Theory (DFT)
     Energy functional $E[n]$ of the electron density $n(r)$. Minimizing over $n(r)$ gives the exact
     ‣ Ground-state energy $E_0$
     ‣ Ground-state density $n(r)$
     Minimum condition: equivalent to the Kohn-Sham equations.
     LDA/GGA for $E_{xc}$:
     • Good geometries and total energies
     • Bad band gaps and excitations
     Hohenberg & Kohn, Phys. Rev. (1964); Kohn and Sham, Phys. Rev. (1965).

  3. DFT: problems with excitations
     Energy gaps (eV)

     | Material  | LDA | Expt. [1] |
     |-----------|-----|-----------|
     | Diamond   | 3.9 | 5.48      |
     | Si        | 0.5 | 1.17      |
     | LiCl      | 6.0 | 9.4       |
     | SrTiO$_3$ | 2.0 | 3.25      |

     [1] Landolt-Börnstein, vol. III; Baldini & Bosacchi, Phys. Stat. Solidi (1970).
     [Figure: solar spectrum]

  4. DFT: problems with energy alignment
     Interfacial systems:
     • Electrons ($e^-$) can transfer across the interface
     • Transfer depends on energy level alignment across the interface
     • DFT has errors in band energies
     • Is any of it real?

  5. One-particle Green's function
     [Figure: propagation from $(r', 0)$ to $(r, t)$]
     Dyson Equation:
     DFT:

  6. Green's function successes
     Quasiparticle gaps (eV)

     | Material  | LDA | GW      | Expt. |
     |-----------|-----|---------|-------|
     | Diamond   | 3.9 | 5.6*    | 5.48  |
     | Si        | 0.5 | 1.3*    | 1.17  |
     | LiCl      | 6.0 | 9.1*    | 9.4   |
     | SrTiO$_3$ | 2.0 | 3.4-3.8 | 3.25  |

     * Hybertsen & Louie, Phys. Rev. B (1986)
     [Figure: band structure of Cu; Strokov et al., PRL/PRB (1998/2001)]
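As a quick check on the table above, the relative error of each method against experiment can be computed directly. This is a minimal sketch using only the Diamond, Si, and LiCl rows; the function name `relative_error` is introduced here for illustration, and SrTiO$_3$ is left out because its GW value is quoted as a range.

```python
# Gap values (eV) copied from the slide's table: (LDA, GW, experiment).
gaps = {
    "Diamond": (3.9, 5.6, 5.48),
    "Si": (0.5, 1.3, 1.17),
    "LiCl": (6.0, 9.1, 9.4),
}


def relative_error(calculated, experiment):
    """Signed relative error of a calculated gap with respect to experiment."""
    return (calculated - experiment) / experiment


for material, (lda, gw, expt) in gaps.items():
    print(
        f"{material:8s} LDA: {100 * relative_error(lda, expt):+6.1f}%   "
        f"GW: {100 * relative_error(gw, expt):+6.1f}%"
    )
```

For these three materials the LDA gaps are underestimated by tens of percent, while the GW values are much closer to experiment.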

  7. What is a big system for GW?
     Examples: P3HT polymer, zinc oxide nanowire
     • Band alignment for this potential photovoltaic system?
     • 100s of atoms/unit cell
     • Not possible routinely (with current software)

  8. GW is expensive
     Scaling with number of atoms N:
     • DFT: $N^3$
     • GW: $N^4$ (gives better bands)
     • BSE: $N^6$ (gives optical excitations)
     But in practice the GW is the killer. For a nanoscale system with 50-75 atoms (GaN):
     • DFT: 1 cpu x hours
     • GW: 91 cpu x hours
     • BSE: 2 cpu x hours
     ∴ Focus on GW
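To make the scaling exponents concrete, the sketch below computes how much the cost grows when the number of atoms is multiplied by a given factor. It uses only the asymptotic exponents quoted on the slide; prefactors are ignored, which is why it cannot reproduce the measured CPU hours. The helper name `cost_growth` is hypothetical.

```python
# Asymptotic scaling exponents quoted on the slide.
EXPONENTS = {"DFT": 3, "GW": 4, "BSE": 6}


def cost_growth(size_factor, exponent):
    """Factor by which an O(N^exponent) method's cost grows when N grows by size_factor."""
    return size_factor ** exponent


for method, exponent in EXPONENTS.items():
    print(f"{method}: doubling N multiplies the cost by {cost_growth(2, exponent)}")
```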

  9. Steps for typical $G_0W_0$ calculation
     Stage 1: Run DFT calc. on structure → output: $\epsilon_i$ and $\psi_i(r)$
     Stage 2.1: compute polarizability matrix $P(r,r') = \partial n(r) / \partial V(r')$
     Stage 2.2: double FFT rows and columns → $P(G,G')$
     Stage 3: compute and invert dielectric screening function $\epsilon = I - \sqrt{V_{coul}} * P * \sqrt{V_{coul}}$ → $\epsilon^{-1}$
     Stage 4: "plasmon-pole" method → dynamic screening $\epsilon^{-1}(\omega)$
     Stage 5: put together $\epsilon_i$, $\psi_i(r)$ and $\epsilon^{-1}(\omega)$ → self-energy $\Sigma(\omega)$
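Stage 3 can be illustrated with a few lines of NumPy. This is a toy sketch, not the OpenAtom implementation: it assumes the symmetrized form $\epsilon = I - \sqrt{V_{coul}}\,P\,\sqrt{V_{coul}}$, treats $V_{coul}$ as diagonal (as it is in a plane-wave basis), and uses random data in place of a real polarizability. The function names are introduced here for illustration.

```python
import numpy as np


def dielectric(P, v_coul):
    """eps = I - sqrt(V) P sqrt(V), with V given by its diagonal (assumption)."""
    sqrt_v = np.sqrt(v_coul)
    return np.eye(P.shape[0]) - sqrt_v[:, None] * P * sqrt_v[None, :]


def inverse_dielectric(P, v_coul):
    """Dense inversion of the dielectric matrix."""
    return np.linalg.inv(dielectric(P, v_coul))


# Toy data: a Hermitian, negative semidefinite P and a positive diagonal V.
rng = np.random.default_rng(0)
n_g = 6
A = rng.standard_normal((n_g, 3)) + 1j * rng.standard_normal((n_g, 3))
P = -A @ A.conj().T
v_coul = 1.0 / np.arange(1, n_g + 1) ** 2

eps_inv = inverse_dielectric(P, v_coul)
print(np.allclose(eps_inv @ dielectric(P, v_coul), np.eye(n_g)))
```

With a negative semidefinite $P$ the toy $\epsilon$ is positive definite, so the inversion is well conditioned.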

  10. What is so expensive in GW?
      One key element: response of electrons to perturbation
      $P(r,r')$ = response of electron density $n(r)$ at position $r$ to change of potential $V(r')$ at position $r'$

  11. What is so expensive in GW?
      One key element: response of electrons to perturbation
      Standard perturbation theory expression
      Problems:
      1. Must generate "all" empty states (sum over c)
      2. Lots of FFTs to get the functions $\psi_i(r)$
      3. Enormous outer product to form P
      4. Dense r grid: P huge in memory
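For reference, a sum-over-states form of $P$ can be written down from the Laplace-transformed expression on the later r-space slide of this deck. The prefactor $-2$ and the ordering of the orbitals are taken from that slide and are assumptions here; $v$ runs over occupied and $c$ over unoccupied states.

```latex
P(r,r') = -2 \sum_{v} \sum_{c}
  \frac{\psi_v(r)\,\psi_c^{*}(r)\,\psi_v^{*}(r')\,\psi_c(r')}{\epsilon_c - \epsilon_v}
```

Each $(v, c)$ pair contributes an outer product over the $r$ grid, which is where the $N_r^2 \times N_v \times N_c$ cost comes from.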

  12. Computing P in Charm++
      Basic computation:
      • for all l, m: $f_{lm} = \psi_l \times \psi_m^*$
      • for all f: $P \mathrel{+}= f_{lm} f_{lm}^\dagger$
      Parallel decomposition:
      • Ψ vectors (L occupied, M unoccupied, each of length R): 1D chare array
      • P matrix (R × R): 2D tiles in a 2D chare array
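The basic computation can be sketched serially in NumPy as follows. This is an illustration of the loop structure only, not OpenAtom code: the energy denominators are omitted, as on the slide, and the function name `compute_p` and the array layout (one state per row) are assumptions.

```python
import numpy as np


def compute_p(psi_occ, psi_unocc):
    """Accumulate P += f_lm f_lm^dagger with f_lm = psi_l * conj(psi_m).

    psi_occ:   (L, R) array of occupied states on an R-point real-space grid
    psi_unocc: (M, R) array of unoccupied states on the same grid
    """
    n_r = psi_occ.shape[1]
    P = np.zeros((n_r, n_r), dtype=complex)
    for psi_l in psi_occ:
        for psi_m in psi_unocc:
            f = psi_l * np.conj(psi_m)      # pointwise product on the grid
            P += np.outer(f, np.conj(f))    # R x R outer product
    return P


rng = np.random.default_rng(0)
occ = rng.standard_normal((2, 8)) + 1j * rng.standard_normal((2, 8))
unocc = rng.standard_normal((3, 8)) + 1j * rng.standard_normal((3, 8))
P = compute_p(occ, unocc)
print(P.shape)
```

The cost is dominated by the L × M outer products of size R × R, and the result is an R × R matrix, which is why P is distributed as 2D tiles.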

  13. Computing P in Charm++
      1. Duplicate occupied states on each node

  14. Computing P in Charm++
      1. Duplicate occupied states on each node
      2. Broadcast an unoccupied state to compute f vectors

  15. Computing P in Charm++
      1. Duplicate occupied states on each node
      2. Broadcast an unoccupied state to compute f vectors
      3. Locally update each matrix tile

  16. Computing P in Charm++
      1. Duplicate occupied states on each node
      2. Broadcast an unoccupied state to compute f vectors
      3. Locally update each matrix tile
      4. Repeat step 2 for next unoccupied state
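The four steps can be mimicked in a single process to show that the tiled update reproduces the full matrix. This is a serial NumPy sketch of the data flow, assuming square tiles of a hypothetical size `tile`; in Charm++ each tile would be an element of the 2D chare array and the loop over tiles would run in parallel.

```python
import numpy as np


def compute_p_tiled(psi_occ, psi_unocc, tile):
    """Serial stand-in for the tiled P update (energy denominators omitted)."""
    n_r = psi_occ.shape[1]
    starts = range(0, n_r, tile)
    # One block per (row tile, column tile); edge tiles may be smaller.
    tiles = {
        (i0, j0): np.zeros((min(tile, n_r - i0), min(tile, n_r - j0)), dtype=complex)
        for i0 in starts
        for j0 in starts
    }
    # Step 1: psi_occ is available to every tile (duplicated on each node).
    for psi_m in psi_unocc:                   # steps 2 and 4: one unoccupied state at a time
        f_all = psi_occ * np.conj(psi_m)      # (L, R): all f vectors for this state
        for (i0, j0), block in tiles.items():  # step 3: purely local tile update
            rows = f_all[:, i0:i0 + tile]
            cols = f_all[:, j0:j0 + tile]
            block += rows.T @ np.conj(cols)
    # Assemble the tiles only to check the result.
    P = np.zeros((n_r, n_r), dtype=complex)
    for (i0, j0), block in tiles.items():
        P[i0:i0 + block.shape[0], j0:j0 + block.shape[1]] = block
    return P


rng = np.random.default_rng(1)
occ = rng.standard_normal((2, 10)) + 1j * rng.standard_normal((2, 10))
unocc = rng.standard_normal((3, 10)) + 1j * rng.standard_normal((3, 10))
P_tiled = compute_p_tiled(occ, unocc, tile=4)
print(P_tiled.shape)
```

Each tile needs only two slices of the f vectors, so after the broadcast no further communication is required for the update.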

  17. Parallel performance: P calculation
      • 108 atom bulk Si
      • 216 occupied
      • 1832 unoccupied
      • 1 k point
      • 32 processors per node
      • FFT grids (same accuracy): OA 42x42x22, BGW 111x55x55
      Supercomputer: Mira (ANL), BlueGene/Q

  18. Parallel performance: P calculation
      • 108 atom bulk Si
      • 216 occupied
      • 1832 unoccupied
      • 1 k point
      • 32 processors per node
      • FFT grids (same accuracy): OA 42x42x22, BGW 111x55x55
      Supercomputer: Blue Waters (NCSA), Cray XE6
      [Figure: scaling on Blue Waters, 32 cores per node; time (sec) vs. number of nodes, logarithmic axes; series: OpenAtom, BerkeleyGW 1.2]

  19. Reducing the scaling: quartic to cubic
      • $O(N^4) = N_r^2 \times N_v \times N_c$
      • Sum-over-states (i.e., sum over unoccupied c bands) not to blame: removal of unocc. states is still $O(N^4)$ but with a lower prefactor*
      • Working in r-space can reduce to $O(N^3)$ [see also †]
      * Bruneval and Gonze, PRB 78 (2008); Berger, Reining, Sottile, PRB 82 (2010)
      * Umari, Stenuit, Baroni, PRB 81 (2010)
      * Giustino, Cohen, Louie, PRB 81 (2010)
      * Wilson, Gygi, Galli, PRB 78 (2008); Govoni, Galli, J. Chem. Th. Comp. 11 (2015)
      * Gao, Xia, Gao, Zhang, Sci. Rep. 6 (2016)
      † Foerster, Koval, Sanchez-Portal, JCP 135 (2011)
      † Liu, Kaltak, Klimes and Kresse, PRB 94 (2016)

  20. What's special about r-space?
      Quasi-philosophical: all bases are good in quantum mechanics, so why is r-space special? The observable is diagonal in the best basis.
      Practical: P is separable in r-space
      $\frac{1}{\epsilon_c - \epsilon_v} = \int_0^\infty dx\, e^{-(\epsilon_c - \epsilon_v)x}$
      $P(r,r') = -2 \int_0^\infty dx \sum_c \psi_c^*(r)\psi_c(r')\, e^{-\epsilon_c x} \sum_v \psi_v(r)\psi_v^*(r')\, e^{\epsilon_v x}$ (separable)
      Gauss-Laguerre quadrature: $\int_0^\infty f(x)\, e^{-x}\, dx \approx \sum_k^{N_{GL}} w_k f(x_k)$
      $P(r,r') = -2 \sum_k^{N_{GL}} w_k \sum_c \psi_c^*(r)\psi_c(r')\, e^{-\epsilon_c x_k} \sum_v \psi_v(r)\psi_v^*(r')\, e^{\epsilon_v x_k}$
      Cost: $N_r^2\, N_{GL}\, (N_c + N_v) \propto N^3$; $N_{GL}$ is intensive
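The separable form can be checked numerically against the direct sum over states. The sketch below is an illustration under stated assumptions, not the OpenAtom code: it rescales the Laplace variable by the gap $E_g = \epsilon_{c,min} - \epsilon_{v,max}$ so that the Gauss-Laguerre weight $e^{-x}$ appears explicitly, uses random toy orbitals, and picks toy energies with $E_{bw}/E_g = 2$. It relies on `numpy.polynomial.laguerre.laggauss` for nodes and weights.

```python
import numpy as np
from numpy.polynomial.laguerre import laggauss

rng = np.random.default_rng(0)
n_r, n_v, n_c = 12, 3, 5
psi_v = rng.standard_normal((n_v, n_r)) + 1j * rng.standard_normal((n_v, n_r))
psi_c = rng.standard_normal((n_c, n_r)) + 1j * rng.standard_normal((n_c, n_r))
eps_v = np.linspace(-0.5, 0.0, n_v)   # toy occupied energies
eps_c = np.linspace(1.0, 1.5, n_c)    # toy unoccupied energies


def p_sum_over_states():
    """Quartic-style reference: explicit double sum over (v, c) pairs."""
    P = np.zeros((n_r, n_r), dtype=complex)
    for v in range(n_v):
        for c in range(n_c):
            f = psi_v[v] * np.conj(psi_c[c])
            P += -2.0 * np.outer(f, np.conj(f)) / (eps_c[c] - eps_v[v])
    return P


def p_laplace(n_gl):
    """Cubic-style form: one valence and one conduction sum per quadrature node."""
    x, w = laggauss(n_gl)
    gap = eps_c.min() - eps_v.max()
    P = np.zeros((n_r, n_r), dtype=complex)
    for xk, wk in zip(x, w):
        tau = xk / gap
        # Energies are measured from the band edges so that exp(-x) is the weight.
        w_c = np.exp(-(eps_c - eps_c.min()) * tau)
        w_v = np.exp(-(eps_v.max() - eps_v) * tau)
        rho_c = np.einsum("c,cr,cs->rs", w_c, np.conj(psi_c), psi_c)
        rho_v = np.einsum("v,vr,vs->rs", w_v, psi_v, np.conj(psi_v))
        P += -2.0 * (wk / gap) * rho_c * rho_v   # pointwise product in r-space
    return P


print(np.max(np.abs(p_sum_over_states() - p_laplace(16))))
```

The key step is the last line of the loop: in r-space the valence and conduction sums are combined by a pointwise product, so they never need to be paired state by state.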

  21. Windowed cubic Laplace method
      • $N_{GL}$ depends on $E_{bw}/E_g$, where $E_{bw} = E_{c,max} - E_{v,min}$
      • Largest error: $E_c - E_v = E_g$ or $E_{bw}$
      [Figure: $N_{GL}$ (0 to 50) vs. $E_{bw}/E_g$ (0 to 500)]
      Example: 2 by 2 windows
      • $P = P^{11} + P^{12} + P^{21} + P^{22}$
      [Figure: energy axis E marked with $E_{v,min}$, $E_{v,max}$, $E_{c,min}$, $E_{c,max}$ and split into windows $\{E_v\}_1$, $\{E_v\}_2$, $\{E_c\}_1$, $\{E_c\}_2$]
      $P(r,r') = \sum_l^{N_{wv}} \sum_m^{N_{wc}} P^{lm}(r,r')$
      $N_{wv}$: # of windows for $E_v$; $N_{wc}$: # of windows for $E_c$
      • Save computation: small $N_{GL}$ for each window pair
      • Especially for materials with small band gaps
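The benefit of windowing can be seen by computing $E_{bw}/E_g$ separately for every window pair. The sketch below is illustrative only: the window edges are made-up numbers, and the per-pair definitions (bandwidth from the bottom of the valence window to the top of the conduction window, gap from the top of the valence window to the bottom of the conduction window) are assumptions that mirror the global definitions on the slide.

```python
def window_pair_ratios(v_edges, c_edges):
    """Return {(l, m): E_bw^{lm} / E_g^{lm}} for all window pairs.

    v_edges, c_edges: ascending energy edges; consecutive edges bound one window.
    """
    ratios = {}
    for l in range(len(v_edges) - 1):
        for m in range(len(c_edges) - 1):
            e_bw = c_edges[m + 1] - v_edges[l]   # widest transition in the pair
            e_g = c_edges[m] - v_edges[l + 1]    # narrowest transition in the pair
            ratios[(l + 1, m + 1)] = e_bw / e_g
    return ratios


# Hypothetical 2 by 2 windows (energies in eV).
v_edges = [-12.0, -3.0, 0.0]
c_edges = [1.0, 10.0, 60.0]

global_ratio = (c_edges[-1] - v_edges[0]) / (c_edges[0] - v_edges[-1])
pair_ratios = window_pair_ratios(v_edges, c_edges)
print("global:", global_ratio)
for pair, ratio in sorted(pair_ratios.items()):
    print(pair, round(ratio, 2))
```

Every pair has a smaller ratio than the unwindowed problem, which is what allows a smaller $N_{GL}$ per pair.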

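  22. Estimate the computational costs
      Computation cost can be estimated with $E_{bw}$ and $E_g$: the cost $C$ is proportional to a double sum over window pairs, $\sum_l^{N_{vw}} \sum_m^{N_{cw}}$, of terms built from the ratio $E_{bw}^{lm}/E_g^{lm}$ and from the window state counts $N_v \frac{E_{vl}^{max} - E_{vl}^{min}}{E_v^{max} - E_v^{min}}$ and $N_c \frac{E_{cm}^{max} - E_{cm}^{min}}{E_c^{max} - E_c^{min}}$
      Example: 2x2 window, with window boundaries $E_v^*$ and $E_c^*$:
      $E_{v,ratio} = \frac{E_v^* - E_{v,min}}{E_{v,max} - E_v^*}$, $E_{c,ratio} = \frac{E_c^* - E_{c,min}}{E_{c,max} - E_c^*}$
      [Figure: two panels, "Real computational costs" and "Estimated computational costs", each plotted against Ec ratio and Ev ratio; labels $C_{simple}$ and $C_{elab}$]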

  23. Windowed Laplace: example
      • Si crystal (16 atoms); number of bands: 399; $N_{wv} = 1$, $N_{wc} = 4$
      • MgO crystal (16 atoms); number of bands: 433; $N_{wv} = 1$, $N_{wc} = 4$
      Compared to the $O(N^4)$ method, for a bigger system the ratio is $\frac{\text{above ratio}}{N_{at}/16}$

  24. Do I care in practice?
      Correct practical comparison:
      • Our $N^3$ method vs. available $N^4$ method with acceleration
      • Crossover is at very few atoms: $N^3$ method already competitive for small systems
      • 2 atoms Si, 8 k-points
      • Yambo $N^4$ GW software
      • BG* acceleration
      * Bruneval & Gonze, PRB 78 (2008)

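  25. Windowed Laplace method for self-energy
      Dynamic GW self-energy:
      $\Sigma(\omega)_{r,r'} = \sum_{p,n} \frac{B^p_{r,r'}\, \psi_{rn} \psi^*_{r'n}}{\omega - \epsilon_n + \mathrm{sgn}(\mu - \epsilon_n)\, \omega_p} = \sum_{p,n} B^p_{r,r'}\, \psi_{rn} \psi^*_{r'n}\, F(\omega - \epsilon_n \pm \omega_p)$, with $F(x) = \frac{1}{x}$
      $B^p_{r,r'}$: residues; $\omega_p$: energies of the poles of $W(r)_{r,r'}$
      Gauss-Laguerre quadrature not appropriate: $\frac{1}{\omega - \epsilon_n \pm \omega_p} < 0$ OR $\frac{1}{\omega - \epsilon_n \pm \omega_p} > 0$
      $\Sigma(\omega) = \sum_l^{N_{pw}} \sum_m^{N_{nw}} \Sigma(\omega)^{lm}$, with windows $e_m^{min} \le \omega - \epsilon_n < e_m^{max}$ and $\Omega_l^{min} \le \pm\omega_p < \Omega_l^{max}$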
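The pole sum for the dynamic self-energy can be evaluated directly for small toy inputs. This is a sketch under assumptions: it takes the form $\Sigma(\omega)_{r,r'} = \sum_{p,n} B^p_{r,r'} \psi_{rn} \psi^*_{r'n} / (\omega - \epsilon_n + \mathrm{sgn}(\mu - \epsilon_n)\,\omega_p)$, uses random residues and orbitals, and gives $\omega$ a small imaginary part to stay away from the poles. The function name and array layouts are hypothetical.

```python
import numpy as np


def self_energy(omega, B, psi, eps, omega_p, mu):
    """Direct pole sum for Sigma(omega)_{r,r'}.

    B:       (n_p, n_r, n_r) residues B^p_{r,r'}
    psi:     (n_r, n_n) orbitals psi_{rn}
    eps:     (n_n,) orbital energies
    omega_p: (n_p,) pole energies
    """
    sign = np.sign(mu - eps)                                         # +1 occupied, -1 empty
    denom = omega - eps[None, :] + sign[None, :] * omega_p[:, None]  # (n_p, n_n)
    return np.einsum("prs,rn,sn,pn->rs", B, psi, np.conj(psi), 1.0 / denom)


rng = np.random.default_rng(1)
n_p, n_r, n_n = 2, 3, 4
B = rng.standard_normal((n_p, n_r, n_r))
psi = rng.standard_normal((n_r, n_n)) + 1j * rng.standard_normal((n_r, n_n))
eps = np.array([-1.0, -0.5, 0.5, 1.0])
omega_p = np.array([0.3, 0.7])

sigma = self_energy(0.1 + 0.05j, B, psi, eps, omega_p, mu=0.0)
print(sigma.shape)
```

Because the denominator changes sign across states and poles, the terms cannot all be written as a decaying exponential integral, which is the reason the slide gives for needing windows and a different quadrature.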

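  26. New quadrature for overlapping windows
      New quadrature: $F(x) = \mathrm{Im} \int_0^\infty w(v)\, e^{ivx}\, dv$, with $w(v) = e^{-v}$ or $w(v) = e^{-v - v^2/2}$
      Size of quadrature grid $n_q$ (column weights as listed on the slide: $e^{-v - v^2/2}$, $e^{-v}$)

      | % error | $n_q$ | $n_q$ |
      |---------|-------|-------|
      | 5       | 6     | 1     |
      | 1       | 24    | 1     |
      | 0.1     | 124   | 5     |
      | 0.01    | 547   | 15    |
      | 0.001   | 2216  | 36    |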
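For the weight $w(v) = e^{-v}$ the integral has a closed form, $\mathrm{Im}\int_0^\infty e^{-v} e^{ivx}\,dv = x/(1+x^2)$, which makes it easy to see how a Gauss-Laguerre grid handles the oscillatory integrand. This is a minimal sketch using `numpy.polynomial.laguerre.laggauss`; it does not reproduce the error measure or the new quadrature from the slide, and the function names are introduced here for illustration.

```python
import numpy as np
from numpy.polynomial.laguerre import laggauss


def f_quadrature(x, n_q):
    """Gauss-Laguerre estimate of Im int_0^inf exp(-v) exp(i v x) dv."""
    v, w = laggauss(n_q)
    return float(np.sum(w * np.sin(v * x)))


def f_exact(x):
    """Closed form of the same integral."""
    return x / (1.0 + x * x)


for n_q in (4, 8, 16):
    error = abs(f_quadrature(1.0, n_q) - f_exact(1.0))
    print(f"n_q = {n_q:2d}   |error| = {error:.2e}")
```

More oscillatory integrands (larger $|x|$) generally need more nodes for the same accuracy.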

  27. Results - $G_0W_0$ gap
      • Si crystal (16 atoms)
      • Number of bands: 399
      • $N_{pw} = 15$, $N_{nw} = 30$
      [Figure: Si; $G_0W_0$ $E_g$ (eV, 1.35 to 1.65) vs. ratio of computation to $N^4$ method (0 to 0.5); series: Laplace+windowing, $N^4$]

  28. Where we are with OpenAtom GW

      | Phase | Task                | Serial   | Parallel    |
      |-------|---------------------|----------|-------------|
      | 1     | Compute P in RSpace | Complete | Complete    |
      | 2     | FFT P to GSpace     | Complete | Complete    |
      | 3     | Invert epsilon      | Complete | Complete    |
      | 4     | Plasmon pole        | Complete | In Progress |
      | 5     | COHSEX self-energy  | Complete | Complete    |
      | 6     | Dynamic self-energy | Complete | In Progress |
      | 7     | Coulomb Truncation  | Future   | Future      |

      Aim to release parallel COHSEX version late spring 2018
