 
OpenAtom: First Principles GW method for electronic excitation
Minjung Kim, Subhasish Mandal, and Sohrab Ismail-Beigi (Yale University)
Eric Mikida, Kavitha Chandrasekar, Eric Bohm, Nikhil Jain, and Laxmikant Kale (University of Illinois at Urbana-Champaign)
Qi Li and Glenn Martyna (IBM T.J. Watson Research Center)
Density Functional Theory (DFT)
Energy functional $E[n]$ of the electron density $n(r)$
Minimizing over $n(r)$ gives the exact
‣ Ground-state energy $E_0$
‣ Ground-state density $n(r)$
Minimum condition is equivalent to the Kohn-Sham equations
LDA/GGA for $E_{xc}$:
§ Good geometries and total energies
§ Bad band gaps and excitations
Hohenberg & Kohn, Phys. Rev. (1964); Kohn and Sham, Phys. Rev. (1965).
DFT: problems with excitations
Energy gaps (eV)
Material   LDA   Expt. [1]
Diamond    3.9   5.48
Si         0.5   1.17
LiCl       6.0   9.4
SrTiO3     2.0   3.25
[1] Landolt-Börnstein, vol. III; Baldini & Bosacchi, Phys. Stat. Solidi (1970).
[Figure: solar spectrum]
DFT: problems with energy alignment
Interfacial systems:
§ Electrons (e-) can transfer across the interface
§ Depends on energy level alignment across interface
§ DFT has errors in band energies
§ Is any of it real?
One particle Green's function
[Diagram labels: (r', 0) and (r, t)]
Dyson Equation:
DFT:
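For orientation, here is one common textbook way to write the two equations this slide contrasts, in atomic units: the quasiparticle form of the Dyson equation with a non-local, energy-dependent self-energy $\Sigma$, followed by the Kohn-Sham equation with a local $V_{xc}$. This is an assumed form; the symbols $V_{ion}$, $V_H$ and $V_{xc}$ are standard notation rather than something taken from the slide, and the slide may have used a different but equivalent expression.

```latex
\left[-\tfrac{1}{2}\nabla^2 + V_{ion}(r) + V_H(r)\right]\psi_i(r)
  + \int dr'\, \Sigma(r, r'; \epsilon_i)\, \psi_i(r') = \epsilon_i\, \psi_i(r)

\left[-\tfrac{1}{2}\nabla^2 + V_{ion}(r) + V_H(r) + V_{xc}(r)\right]\psi_i(r)
  = \epsilon_i\, \psi_i(r)
```

Written this way, DFT corresponds to replacing $\Sigma(r, r'; \epsilon)$ by $V_{xc}(r)\,\delta(r - r')$.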
Green's function successes
Quasiparticle gaps (eV)
Material   LDA   GW        Expt.
Diamond    3.9   5.6*      5.48
Si         0.5   1.3*      1.17
LiCl       6.0   9.1*      9.4
SrTiO3     2.0   3.4-3.8   3.25
* Hybertsen & Louie, Phys. Rev. B (1986)
[Figure: band structure of Cu; Strokov et al., PRL/PRB (1998/2001)]
What is a big system for GW?
[Figures: P3HT polymer; zinc oxide nanowire]
§ Band alignment for this potential photovoltaic system?
§ 100s of atoms/unit cell
§ Not possible routinely (with current software)
GW is expensive
Scaling with number of atoms N
§ DFT: $N^3$
§ GW: $N^4$ (gives better bands)
§ BSE: $N^6$ (gives optical excitations)
But in practice the GW is the killer
A nanoscale system with 50-75 atoms (GaN):
§ DFT: 1 cpu x hours
§ GW: 91 cpu x hours
§ BSE: 2 cpu x hours
∴ Focus on GW
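To make the scaling argument concrete, the sketch below extrapolates the quoted timings with a pure power law. This is an illustration only: the reference size `N_REF = 64` is a hypothetical value picked from inside the quoted 50-75 atom range, and real codes do not follow a single power law exactly.

```python
# Illustrative only: assumes pure power-law scaling from one reference point.
# (cpu x hours at the reference size, scaling exponent) as quoted on the slide.
REFERENCE_COST = {"DFT": (1.0, 3), "GW": (91.0, 4), "BSE": (2.0, 6)}
N_REF = 64  # hypothetical reference size inside the quoted 50-75 atom range


def extrapolate_cost(method, n_atoms, n_ref=N_REF):
    """Scale the reference cost of `method` by (n_atoms / n_ref) ** exponent."""
    cost_ref, exponent = REFERENCE_COST[method]
    return cost_ref * (n_atoms / n_ref) ** exponent


if __name__ == "__main__":
    for n in (64, 128, 256):
        print(n, {m: extrapolate_cost(m, n) for m in REFERENCE_COST})
```

Under these assumptions GW still dominates when the system size doubles, which is the motivation for focusing on GW.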
Steps for typical $G_0W_0$ calculation
Stage 1: Run DFT calc. on structure → output: $\epsilon_i$ and $\psi_i(r)$
Stage 2.1: compute polarizability matrix $P(r, r') = \partial n(r) / \partial V(r')$
Stage 2.2: double FFT rows and columns → $P(G, G')$
Stage 3: compute and invert dielectric screening function $\epsilon = I - \sqrt{V_{coul}} * P * \sqrt{V_{coul}}$ → $\epsilon^{-1}$
Stage 4: "plasmon-pole" method → dynamic screening $\epsilon^{-1}(\omega)$
Stage 5: put together $\epsilon_i$, $\psi_i(r)$ and $\epsilon^{-1}(\omega)$ → self-energy $\Sigma(\omega)$
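Stage 3 can be sketched in a few lines of NumPy. This is a minimal dense-matrix illustration, not the OpenAtom implementation: it assumes $V_{coul}$ is diagonal in G-space, and the toy `P_toy` and `v_toy` inputs are hypothetical stand-ins for a real polarizability and Coulomb kernel.

```python
import numpy as np


def inverse_dielectric(P, v_coul):
    """Return (eps, eps^-1) for eps = I - sqrt(V) P sqrt(V), with V diagonal."""
    sqrt_v = np.sqrt(v_coul)
    # Scale rows and columns of P by sqrt(V) without forming diagonal matrices
    eps = np.eye(P.shape[0]) - sqrt_v[:, None] * P * sqrt_v[None, :]
    return eps, np.linalg.inv(eps)


# Toy input (hypothetical): a negative semidefinite P and a positive v_coul
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
P_toy = -A @ A.T
v_toy = 1.0 / np.arange(1, 6, dtype=float)  # stand-in for a Coulomb kernel
eps_toy, eps_inv_toy = inverse_dielectric(P_toy, v_toy)
```

The symmetric $\sqrt{V_{coul}}$ form keeps $\epsilon$ symmetric whenever $P$ is, so a symmetric solver could replace the general inverse.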
What is so expensive in GW?
One key element: response of electrons to perturbation
$P(r, r')$ = response of electron density $n(r)$ at position $r$ to change of potential $V(r')$ at position $r'$
What is so expensive in GW?
One key element: response of electrons to perturbation
Standard perturbation theory expression
Problems:
1. Must generate "all" empty states (sum over c)
2. Lots of FFTs to get the functions $\psi_i(r)$
3. Enormous outer product to form P
4. Dense r grid: P huge in memory
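The four problems are easy to see in a direct sum-over-states loop. The sketch below assumes the form $P(r, r') = -2 \sum_{c,v} \psi_c^*(r)\psi_v(r)\psi_c(r')\psi_v^*(r') / (\epsilon_c - \epsilon_v)$, which matches the separable expression used later in the talk; the array sizes and energies are hypothetical toy values.

```python
import numpy as np


def polarizability_direct(psi_v, eps_v, psi_c, eps_c):
    """Sum-over-states P(r, r') on a real-space grid.

    psi_v: (N_v, N_r) occupied states; psi_c: (N_c, N_r) unoccupied states.
    """
    n_r = psi_v.shape[1]
    P = np.zeros((n_r, n_r), dtype=complex)      # dense N_r x N_r matrix in memory
    for v in range(psi_v.shape[0]):              # occupied states
        for c in range(psi_c.shape[0]):          # "all" empty states
            f = psi_v[v] * np.conj(psi_c[c])     # pair function on the r grid
            # N_r^2 outer product for every (c, v) pair: N_v * N_c * N_r^2 work
            P -= 2.0 * np.outer(f, np.conj(f)) / (eps_c[c] - eps_v[v])
    return P


# Toy data (hypothetical sizes and energies)
rng = np.random.default_rng(1)
psi_v_toy = rng.standard_normal((2, 6)) + 1j * rng.standard_normal((2, 6))
psi_c_toy = rng.standard_normal((3, 6)) + 1j * rng.standard_normal((3, 6))
P_direct_toy = polarizability_direct(
    psi_v_toy, np.array([-1.0, -0.6]), psi_c_toy, np.array([0.5, 1.0, 2.0])
)
```

Each term is a rank-one update weighted by a negative number, so the result is Hermitian and negative semidefinite.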
Computing P in Charm++
Basic computation:
for all l, m: $f_{lm} = \psi_l \times \psi_m^*$
for all f: $P \mathrel{+}= f_{lm} f_{lm}^\dagger$
Parallel decomposition:
Ψ vectors: 1D chare array (L occupied, M unoccupied)
P matrix: 2D tiles (R × R), 2D chare array
Computing P in Charm++
1. Duplicate occupied states on each node
2. Broadcast an unoccupied state to compute f vectors
3. Locally update each matrix tile
4. Repeat step 2 for next unoccupied state
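The four steps can be emulated serially to show the data flow. This sketch uses plain NumPy and a dictionary of tiles in place of the 2D chare array; it does not use any Charm++ API, the helper names `compute_p_tiled` and `assemble` are hypothetical, and energy denominators are left out, as in the basic computation $P \mathrel{+}= f f^\dagger$.

```python
import numpy as np


def compute_p_tiled(psi_occ, psi_unocc, tile):
    """Accumulate P += f f^dagger tile by tile (serial stand-in for 2D chares)."""
    n_r = psi_occ.shape[1]
    starts = list(range(0, n_r, tile))
    # One zero block per (row start, column start); edge tiles may be smaller
    tiles = {(i, j): np.zeros((min(tile, n_r - i), min(tile, n_r - j)), dtype=complex)
             for i in starts for j in starts}
    # Step 1 is implicit: psi_occ is visible to every tile
    for psi_m in psi_unocc:                  # steps 2 and 4: next unoccupied state
        f = psi_occ * np.conj(psi_m)         # all f_lm for this m, shape (L, n_r)
        for (i, j), block in tiles.items():  # step 3: each tile updates locally
            block += f[:, i:i + tile].T @ np.conj(f[:, j:j + tile])
    return tiles


def assemble(tiles, n_r):
    """Gather the tiles into one dense matrix (for checking only)."""
    P = np.zeros((n_r, n_r), dtype=complex)
    for (i, j), block in tiles.items():
        P[i:i + block.shape[0], j:j + block.shape[1]] = block
    return P


# Toy data (hypothetical sizes): 3 occupied, 4 unoccupied, 10 grid points
rng = np.random.default_rng(3)
psi_occ_toy = rng.standard_normal((3, 10)) + 1j * rng.standard_normal((3, 10))
psi_unocc_toy = rng.standard_normal((4, 10)) + 1j * rng.standard_normal((4, 10))
tiles_toy = compute_p_tiled(psi_occ_toy, psi_unocc_toy, tile=4)
P_tiled_toy = assemble(tiles_toy, 10)
```

Each tile only needs the two slices of f that cover its rows and columns, which is why the update in step 3 requires no communication between tiles.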
Parallel performance: P calculation
§ 108 atom bulk Si
§ 216 occupied
§ 1832 unoccupied
§ 1 k point
§ 32 processors per node
§ FFT grids (same accuracy): OA 42x42x22, BGW 111x55x55
Supercomputer: Mira (ANL), BlueGene/Q
Parallel performance: P calculation
§ 108 atom bulk Si
§ 216 occupied
§ 1832 unoccupied
§ 1 k point
§ 32 processors per node
§ FFT grids (same accuracy): OA 42x42x22, BGW 111x55x55
Supercomputer: Blue Waters (NCSA), Cray XE6
[Plot: scaling on Blue Waters, 32 cores per node; time (sec) vs. number of nodes on log-log axes; OpenAtom vs. BerkeleyGW 1.2]
Reducing the scaling: quartic to cubic
§ $O(N^4) = N_r^2 \times N_v \times N_c$
§ Sum-over-states (i.e., sum over unoccupied c band) not to blame: removal of unocc. states still $O(N^4)$ but lower prefactor*
§ Working in r-space can reduce to $O(N^3)$ [see also †]
* Bruneval and Gonze, PRB 78 (2008); Berger, Reining, Sottile, PRB 82 (2010)
* Umari, Stenuit, Baroni, PRB 81 (2010)
* Giustino, Cohen, Louie, PRB 81 (2010)
* Wilson, Gygi, Galli, PRB 78 (2008); Govoni, Galli, J. Chem. Th. Comp. 11 (2015)
* Gao, Xia, Gao, Zhang, Sci. Rep. 6 (2016)
† Foerster, Koval, Sanchez-Portal, JCP 135 (2011)
† Liu, Kaltak, Klimes and Kresse, PRB 94 (2016)
What's special about r-space?
Quasi-philosophical: all bases are good in quantum mechanics, so why is r-space special?
Observable is diagonal in the best basis
Practical: P is separable in r-space
$$\frac{1}{\epsilon_c - \epsilon_v} = \int_0^\infty dx\, e^{-(\epsilon_c - \epsilon_v) x}$$
$$P(r, r') = -2 \int_0^\infty dx \sum_c \psi_c^*(r)\, \psi_c(r')\, e^{-\epsilon_c x} \sum_v \psi_v(r)\, \psi_v^*(r')\, e^{\epsilon_v x} \quad \text{(separable)}$$
Gauss-Laguerre quadrature: $\int_0^\infty f(x)\, e^{-x}\, dx \approx \sum_k^{N_{GL}} \omega_k f(x_k)$
$$P(r, r') = -2 \sum_k^{N_{GL}} \omega_k\, e^{x_k} \sum_c \psi_c^*(r)\, \psi_c(r')\, e^{-\epsilon_c x_k} \sum_v \psi_v(r)\, \psi_v^*(r')\, e^{\epsilon_v x_k}$$
Cost: $N_r^2 N_{GL} (N_c + N_v) \propto N^3$; $N_{GL}$ is intensive
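The separable form can be checked numerically against the direct sum over (c, v) pairs. The sketch below is a toy check under stated assumptions: energies are in units where every gap $\epsilon_c - \epsilon_v$ is of order one (so a moderate `n_gl` converges), the states are random arrays rather than real wavefunctions, and `np.polynomial.laguerre.laggauss` supplies the nodes $x_k$ and weights $\omega_k$.

```python
import numpy as np


def polarizability_direct(psi_v, eps_v, psi_c, eps_c):
    """Reference O(N_v N_c N_r^2) sum over all (c, v) pairs."""
    inv_gap = 1.0 / (eps_c[:, None] - eps_v[None, :])
    return -2.0 * np.einsum("cr,vr,cs,vs,cv->rs",
                            np.conj(psi_c), psi_v, psi_c, np.conj(psi_v), inv_gap)


def polarizability_laplace(psi_v, eps_v, psi_c, eps_c, n_gl):
    """Separable form: c and v sums are done independently at each x_k."""
    x, w = np.polynomial.laguerre.laggauss(n_gl)
    n_r = psi_v.shape[1]
    P = np.zeros((n_r, n_r), dtype=complex)
    for x_k, w_k in zip(x, w):
        # A(r, r') = sum_c psi_c^*(r) psi_c(r') exp(-eps_c x_k): N_c N_r^2 work
        A = (np.conj(psi_c).T * np.exp(-eps_c * x_k)) @ psi_c
        # B(r, r') = sum_v psi_v(r) psi_v^*(r') exp(+eps_v x_k): N_v N_r^2 work
        B = (psi_v.T * np.exp(eps_v * x_k)) @ np.conj(psi_v)
        # exp(x_k) undoes the exp(-x) weight built into Gauss-Laguerre
        P -= 2.0 * w_k * np.exp(x_k) * A * B
    return P


# Toy data (hypothetical): gaps between 1.1 and 3.0 in arbitrary units
rng = np.random.default_rng(1)
eps_v_toy = np.array([-1.0, -0.6])
eps_c_toy = np.array([0.5, 1.0, 2.0])
psi_v_toy = rng.standard_normal((2, 6)) + 1j * rng.standard_normal((2, 6))
psi_c_toy = rng.standard_normal((3, 6)) + 1j * rng.standard_normal((3, 6))
P_ref_toy = polarizability_direct(psi_v_toy, eps_v_toy, psi_c_toy, eps_c_toy)
P_gl_toy = polarizability_laplace(psi_v_toy, eps_v_toy, psi_c_toy, eps_c_toy, n_gl=40)
```

The element-wise product `A * B` is what replaces the double loop over c and v: the work per quadrature point is $N_r^2 (N_c + N_v)$ instead of $N_r^2 N_c N_v$.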
Windowed cubic Laplace method
§ $N_{GL}$ depends on $E_{bw}/E_g$, where $E_{bw} = E_{c,max} - E_{v,min}$
§ Largest error: $E_c - E_v = E_g$ or $E_{bw}$
[Plot: $N_{GL}$ (0 to 50) vs. $E_{bw}/E_g$ (0 to 500)]
• Example: 2 by 2 windows, $P = P^{11} + P^{12} + P^{21} + P^{22}$
[Diagram: energy axis E split into windows $\{E_v\}_1$, $\{E_v\}_2$, $\{E_c\}_1$, $\{E_c\}_2$, with $E_{v,min}$, $E_{v,max}$, $E_{c,min}$, $E_{c,max}$ marked]
$$P(r, r') = \sum_l^{N_{wv}} \sum_m^{N_{wc}} P^{lm}(r, r')$$
$N_{wv}$: # of windows for $E_v$
$N_{wc}$: # of windows for $E_c$
§ Save computation: small $N_{GL}$ for each window pair
§ Especially for materials with small band gaps
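The saving from windowing can be illustrated with a toy calculation. Everything specific here is an assumption made for the sketch, not the method's actual recipe: each window pair rescales the Laplace variable by its smallest gap, and `n_gl_for` picks the quadrature size from a simple error bound $2 r^{2n} \le$ `tol` with $r = 1 - E_g^{lm}/E_{bw}^{lm}$. The `work` counter tallies $N_{GL} (N_v^l + N_c^m)$ per pair.

```python
import numpy as np
from math import ceil, log


def n_gl_for(gap_min, gap_max, tol):
    """Assumed heuristic: smallest n with 2 * r**(2n) <= tol, r = 1 - gap_min/gap_max."""
    r = 1.0 - gap_min / gap_max
    if r <= 0.0:
        return 1
    return max(1, ceil(log(tol / 2.0) / (2.0 * log(r))))


def laplace_block(psi_v, eps_v, psi_c, eps_c, tol):
    """P^{lm} for one window pair; returns (P_lm, n_gl used)."""
    gap_min = eps_c.min() - eps_v.max()
    gap_max = eps_c.max() - eps_v.min()
    n_gl = n_gl_for(gap_min, gap_max, tol)
    x, w = np.polynomial.laguerre.laggauss(n_gl)
    P = np.zeros((psi_v.shape[1],) * 2, dtype=complex)
    for x_k, w_k in zip(x, w):
        t_k = x_k / gap_min  # rescale so the smallest gap in the pair maps to 1
        A = (np.conj(psi_c).T * np.exp(-eps_c * t_k)) @ psi_c
        B = (psi_v.T * np.exp(eps_v * t_k)) @ np.conj(psi_v)
        P -= 2.0 * (w_k * np.exp(x_k) / gap_min) * A * B
    return P, n_gl


def polarizability_windowed(psi_v, eps_v, psi_c, eps_c, v_windows, c_windows, tol):
    """Sum P^{lm} over window pairs; windows are lists of state-index arrays."""
    P, work = 0.0, 0
    for iv in v_windows:
        for ic in c_windows:
            P_lm, n_gl = laplace_block(psi_v[iv], eps_v[iv], psi_c[ic], eps_c[ic], tol)
            P = P + P_lm
            work += n_gl * (len(iv) + len(ic))
    return P, work


def polarizability_direct(psi_v, eps_v, psi_c, eps_c):
    """Reference sum over all (c, v) pairs."""
    inv_gap = 1.0 / (eps_c[:, None] - eps_v[None, :])
    return -2.0 * np.einsum("cr,vr,cs,vs,cv->rs",
                            np.conj(psi_c), psi_v, psi_c, np.conj(psi_v), inv_gap)


# Toy data (hypothetical): two well-separated groups of valence and conduction states
rng = np.random.default_rng(2)
eps_v = np.array([-5.0, -4.5, -1.0, -0.5])
eps_c = np.array([0.5, 1.0, 4.0, 5.0])
psi_v = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
psi_c = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
halves = [np.array([0, 1]), np.array([2, 3])]
whole = [np.arange(4)]
P_ref = polarizability_direct(psi_v, eps_v, psi_c, eps_c)
P_one, work_one = polarizability_windowed(psi_v, eps_v, psi_c, eps_c, whole, whole, 1e-6)
P_win, work_win = polarizability_windowed(psi_v, eps_v, psi_c, eps_c, halves, halves, 1e-6)
```

In this toy setup the 2 by 2 windowed run reaches the same tolerance as the single-window run with a smaller `work` count, because each pair sees a much smaller ratio of bandwidth to gap.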
Estimate the computational costs
Computation cost can be estimated with $E_{bw}$ and $E_g$: the estimate $C$ is proportional to a double sum over window pairs, $\sum_l^{N_{vw}} \sum_m^{N_{cw}}$, of terms built from $E_{bw}^{lm} / E_g^{lm}$, $N_v (E_{vl}^{max} - E_{vl}^{min}) / (E_v^{max} - E_v^{min})$ and $N_c (E_{cm}^{max} - E_{cm}^{min}) / (E_c^{max} - E_c^{min})$
Example: 2x2 window
$E_{v,ratio} = (E_v^* - E_{v,min}) / (E_{v,max} - E_v^*)$
$E_{c,ratio} = (E_c^* - E_{c,min}) / (E_{c,max} - E_c^*)$
[Plots: real computational costs and estimated computational costs vs. Ec ratio and Ev ratio; labels: C simple, C elab]
Windowed Laplace: example
Si crystal (16 atoms)
§ Number of bands: 399
§ $N_{wv} = 1$, $N_{wc} = 4$
MgO crystal (16 atoms)
§ Number of bands: 433
§ $N_{wv} = 1$, $N_{wc} = 4$
Compared to the $O(N^4)$ method, the ratio for bigger systems depends on $N_{at}$
Do I care in practice?
Correct practical comparison:
• Our $N^3$ method vs. available $N^4$ method with acceleration
• Crossover is at very few atoms: $N^3$ method already competitive for small systems
Test case:
• 2 atoms Si, 8 k-points
• Yambo $N^4$ GW software
• BG* acceleration
* Bruneval & Gonze, PRB 78 (2008)
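Windowed Laplace method for self-energy
Dynamic GW self-energy:
$$\Sigma(\omega)^{dyn}_{r,r'} = \sum_{p,n} \frac{B^p_{r,r'}\, \psi_{rn}\, \psi^*_{r'n}}{\omega - \epsilon_n + \mathrm{sgn}(\mu - \epsilon_n)\, \omega_p}$$
$B^p_{r,r'}$: residues
$\omega_p$: energies of the poles of $W(r)_{r,r'}$
With $F(x) = 1/x$:
$$\Sigma(\omega)^{dyn}_{r,r'} = \sum_{p,n} B^p_{r,r'}\, \psi_{rn}\, \psi^*_{r'n}\, F(\omega - \epsilon_n \pm \omega_p)$$
Gauss-Laguerre quadrature not appropriate: $\frac{1}{\omega - \epsilon_n \pm \omega_p} < 0$ OR $\frac{1}{\omega - \epsilon_n \pm \omega_p} > 0$
$$\Sigma(\omega) = \sum_l^{N_{pw}} \sum_m^{N_{nw}} \Sigma(\omega)^{lm}$$
with windows $\Omega_l^{min} \le \pm\omega_p < \Omega_l^{max}$ and $e_m^{min} \le \omega - \epsilon_n < e_m^{max}$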
New quadrature for overlapping windows
New quadrature: $F(x) = \mathrm{Im} \int_0^\infty w(v)\, e^{ivx}\, dv$, with $w(v) = e^{-v}$ or $w(v) = e^{-v - v^2/2}$
Size of quadrature grid $n_q$:
% error   n_q ($e^{-v-v^2/2}$)   n_q ($e^{-v}$)
5         6                      1
1         24                     1
0.1       124                    5
0.01      547                    15
0.001     2216                   36
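For the weight $w(v) = e^{-v}$ the integral has the closed form $x / (1 + x^2)$, which approaches $1/x$ for $|x| \gg 1$ and is defined for both signs of $x$. The sketch below checks a Gauss-Laguerre evaluation of $F(x)$ against that closed form; it covers only this one weight, since NumPy does not ship nodes for $e^{-v - v^2/2}$, and the function names are hypothetical.

```python
import numpy as np


def f_quadrature(x, n_q):
    """Approximate F(x) = Im int_0^inf exp(-v) exp(i v x) dv with n_q nodes."""
    v, w = np.polynomial.laguerre.laggauss(n_q)   # nodes and weights for exp(-v)
    phases = np.exp(1j * np.outer(np.atleast_1d(x), v))
    return np.imag(np.sum(w * phases, axis=1))


def f_exact(x):
    """Closed form of the same integral: x / (1 + x^2)."""
    x = np.atleast_1d(x).astype(float)
    return x / (1.0 + x ** 2)


x_toy = np.array([-1.0, -0.5, 0.5, 1.0])  # arguments of either sign
f_num = f_quadrature(x_toy, n_q=30)
f_ref = f_exact(x_toy)
```

The oscillating factor $e^{ivx}$ is what makes larger $|x|$ harder for a fixed grid, so the required $n_q$ grows with the range of $x$ inside a window.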
Results - $G_0W_0$ gap
§ Si crystal (16 atoms)
§ Number of bands: 399
§ $N_{pw} = 15$, $N_{nw} = 30$
[Plot: Si $G_0W_0$ $E_g$ (eV), range 1.35 to 1.65, vs. ratio of computation to $N^4$ method, range 0 to 0.5; curves: Laplace+windowing and $N^4$]
Where we are with OpenAtom GW
Phase   Task                    Serial     Parallel
1       Compute P in RSpace     Complete   Complete
2       FFT P to GSpace         Complete   Complete
3       Invert epsilon          Complete   Complete
4       Plasmon pole            Complete   In Progress
5       COHSEX self-energy      Complete   Complete
6       Dynamic self-energy     Complete   In Progress
7       Coulomb Truncation      Future     Future
Aim to release parallel COHSEX version late spring 2018