
Scalable GW software for excited electrons using OpenAtom


  1. Scalable GW software for excited electrons using OpenAtom
     Kavitha Chandrasekar, Eric Mikida, Eric Bohm and Laxmikant Kale (University of Illinois at Urbana-Champaign)
     Kayahan Saritas, Minjung Kim and Sohrab Ismail-Beigi (Yale University)
     Glenn Martyna (Pimpernel Science, Software and Information Technology)

  2. Electronic structure calculations
     • Time-dependent Schrödinger equation for a many-body system: $i\hbar \frac{\partial}{\partial t} |\Psi(t)\rangle = \hat{H} |\Psi(t)\rangle$, with many $R_I$ and $r_j$
     • Density functional theory (DFT) simplifies this to a one-body problem: solve for wavefunctions $\psi_i(r)$ and energies $\epsilon_i$

  3. Comparison of the methods
     [Figure: computational cost versus number of atoms (1 to 10,000), with the exact Schrödinger equation as the reference. Methods and cost labels, in the order shown: FCI, $O(N!)$; CCSD(T), $O(N^7)$; QMC, $O(N^{3-4})$; GW; HF and DFT, $O(N^3)$; tight binding, $O(N^3)$. Annotations: chemical accuracy, relative energies, transition states?]

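  4. DFT problem with excitations
     [Figure: DFT ground-state band diagram. Valence band (filled) up to $\epsilon_N$, conduction band (empty) from $\epsilon_{N+1}$, separated by the band gap.]
     Janak's theorem: $E_{gap} = \left.\frac{\partial E}{\partial N}\right|_{N+\delta} - \left.\frac{\partial E}{\partial N}\right|_{N-\delta} = \epsilon_{N+1} - \epsilon_N$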

  5. DFT problem with excitations
     Why are band gaps and excitations in a material important?
     • Metallic, semiconducting or insulating?
     • Light-matter interactions in general
     • A lot of engineering implications: PV, lasers, luminescence …
     [Figure: DFT ground-state band diagram. Valence band (filled) up to $\epsilon_N$, conduction band (empty) from $\epsilon_{N+1}$, separated by the band gap.]
     $E_{gap} = \left.\frac{\partial E}{\partial N}\right|_{N+\delta} - \left.\frac{\partial E}{\partial N}\right|_{N-\delta} = \epsilon_{N+1} - \epsilon_N$
     Band gaps (eV):
       Material   DFT   GW        Expt.
       Diamond    3.9   5.6*      5.48
       Si         0.5   1.3*      1.17
       LiCl       6.0   9.1*      9.4
       SrTiO3     2.0   3.4-3.8   3.25

  6. GW method
     Challenges
     • Memory intensive
     • Much larger number of conduction bands: huge number of FFTs
     • Large and dense matrix multiplications
     • Unfavorable scaling, $O(N^4)$
     Goal
     • Efficient and highly scalable GW software
     • $O(N^3)$ scaling method

  7. What is expensive in GW?
     $P(r, r') = -2 \sum_{c}^{N_c} \sum_{v}^{N_v} \frac{\psi^{*}_c(r)\,\psi_v(r)\,\psi^{*}_v(r')\,\psi_c(r')}{E_c - E_v}$
     • Cost of the FFTs: $\sim (N_c + N_v)\, N_r \ln N_r$
     • Cost of the sum for P: $\sim N_c N_v N_r^2 = O(N^4)$
     • Lots of FFTs to get the $\psi_i(r)$ functions
     • However, $\epsilon$ (subscript illegible in this transcript) can converge using a small r-grid
     * Kim et al., (2020), Phys. Rev. B., 101, pp. 035139
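To make the $O(N^4)$ cost concrete, here is a minimal pure-Python sketch of the direct sum for P on a toy grid. The function names, the random test states and the energies are illustrative assumptions, not OpenAtom code. The conjugation pattern follows the usual sum-over-states form and is likewise an assumption, since the slide's conjugate marks did not survive.

```python
import random

def polarizability_direct(psi_v, e_v, psi_c, e_c):
    """Direct sum: P[r][r'] = -2 * sum_{c,v} conj(psi_c[r]) * psi_v[r]
    * conj(psi_v[r']) * psi_c[r'] / (E_c - E_v)."""
    n_r = len(psi_v[0])
    P = [[0j] * n_r for _ in range(n_r)]
    for pc, ec in zip(psi_c, e_c):            # loop over N_c conduction states
        for pv, ev in zip(psi_v, e_v):        # loop over N_v valence states
            inv_gap = 1.0 / (ec - ev)
            for r in range(n_r):              # loop over N_r grid points
                left = pc[r].conjugate() * pv[r]
                for rp in range(n_r):         # second loop over N_r grid points
                    P[r][rp] += -2.0 * left * pv[rp].conjugate() * pc[rp] * inv_gap
    return P

def random_states(n_states, n_r, rng):
    """Hypothetical helper: random complex 'wavefunctions' on n_r grid points."""
    return [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_r)]
            for _ in range(n_states)]

rng = random.Random(0)
n_r = 6
psi_v, e_v = random_states(2, n_r, rng), [-1.0, -0.5]      # toy valence states
psi_c, e_c = random_states(3, n_r, rng), [0.7, 1.1, 1.6]   # toy conduction states
P = polarizability_direct(psi_v, e_v, psi_c, e_c)
```

The four nested loops give $N_c N_v N_r^2$ multiply-adds. Each of these factors grows with system size, which is where the $N^4$ scaling comes from.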

  8. $O(N^3)$ algorithm (CTSP) for P
     CTSP: Complex time shredded propagator
     General form: $X_{r,r'} = \sum_{i}^{N_a} \sum_{j}^{N_b} \frac{A^{i}_{r,r'}\, B^{j}_{r,r'}}{w + a_i - b_j}$
     Polarizability: $P_{r,r'} = -2 \sum_{v}^{N_{occ}} \sum_{c}^{N_{unocc}} \frac{\psi_{r,v}\,\psi^{*}_{r',v}\,\psi^{*}_{r,c}\,\psi_{r',c}}{E_c - E_v}$, with cost $N_r^2 N_{unocc} N_{occ} \sim N^4$
     (1) Laplace transform: $\frac{1}{E_c - E_v} = \int_0^\infty e^{-(E_c - E_v)\tau}\, d\tau = \int_0^\infty e^{-E_c \tau}\, e^{E_v \tau}\, d\tau$
     (2) Gauss-Laguerre quadrature: $\int_0^\infty f(\tau)\, e^{-\tau}\, d\tau \approx \sum_{k}^{N_q} \omega_k f(\tau_k)$, giving cost $N_r^2 N_q (N_{unocc} + N_{occ}) \sim N^3$
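The two steps on this slide can be checked numerically. Below is a minimal sketch that approximates $1/(E_c - E_v)$ with the standard two-point Gauss-Laguerre rule (nodes $2 \mp \sqrt{2}$, weights $(2 \pm \sqrt{2})/4$). The function name, the `scale` parameter and the sample energies are assumptions made for illustration. A production code would use more quadrature points.

```python
import math

# Two-point Gauss-Laguerre rule for integrals of the form int_0^inf e^{-x} f(x) dx.
SQRT2 = math.sqrt(2.0)
NODES = (2.0 - SQRT2, 2.0 + SQRT2)
WEIGHTS = ((2.0 + SQRT2) / 4.0, (2.0 - SQRT2) / 4.0)

def inverse_gap_quadrature(e_c, e_v, scale=1.0):
    """Approximate 1/(E_c - E_v) = int_0^inf exp(-(E_c - E_v) * tau) dtau.

    Substituting tau = x / scale turns the integral into
    (1/scale) * int_0^inf e^{-x} f(x) dx with f(x) = exp(-((E_c - E_v)/scale - 1) * x),
    which is the form the Gauss-Laguerre rule samples.
    """
    gap = e_c - e_v
    total = 0.0
    for x, w in zip(NODES, WEIGHTS):
        total += w * math.exp(-(gap / scale - 1.0) * x)
    return total / scale

approx = inverse_gap_quadrature(e_c=0.7, e_v=-0.5)   # gap of 1.2
exact = 1.0 / 1.2
```

When `scale` equals the gap, the integrand reduces to the bare weight function and the rule is exact. The further the gap is from `scale`, the more quadrature points are needed.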

  9. $O(N^3)$ algorithm (CTSP) for P
     $P_{r,r'} = -2 \sum_{v}^{N_{occ}} \sum_{c}^{N_{unocc}} \psi_{r,v}\,\psi^{*}_{r',v}\,\psi^{*}_{r,c}\,\psi_{r',c} \sum_{k}^{N_q} \omega_k f(\tau_k)$
     $\quad = -2 \sum_{k}^{N_q} \omega_k \left[ \sum_{v}^{N_{occ}} \psi_{r,v}\,\psi^{*}_{r',v}\, e^{E_v \tau_k} \right] \left[ \sum_{c}^{N_{unocc}} \psi^{*}_{r,c}\,\psi_{r',c}\, e^{-E_c \tau_k} \right]$, with cost $N_q (N_{unocc} + N_{occ}) N_r^2 \sim N^3$
     (3) Energy windows: $P_{r,r'} = \sum_{l}^{N_{vw}} \sum_{m}^{N_{cw}} P^{lm}_{r,r'}$
     [Figure: panels a) and b) illustrating the partition of the energy axis into windows.]
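The key point of this slide is that, once the energy denominator is replaced by a quadrature sum, the double sum over (v, c) separates into a product of two single sums. The sketch below checks that identity on toy data with a two-point Gauss-Laguerre rule. All names and test values are illustrative assumptions. The factor `exp(t)` compensates for the $e^{-\tau}$ weight built into the rule.

```python
import math
import random

SQRT2 = math.sqrt(2.0)
GL_NODES = (2.0 - SQRT2, 2.0 + SQRT2)                     # two-point Gauss-Laguerre nodes
GL_WEIGHTS = ((2.0 + SQRT2) / 4.0, (2.0 - SQRT2) / 4.0)   # matching weights

def p_quadrature_n4(psi_v, e_v, psi_c, e_c):
    """Quadrature applied inside the double sum over (c, v): still N^4-like cost."""
    n_r = len(psi_v[0])
    P = [[0j] * n_r for _ in range(n_r)]
    for pc, ec in zip(psi_c, e_c):
        for pv, ev in zip(psi_v, e_v):
            inv_gap = sum(w * math.exp(t) * math.exp(-(ec - ev) * t)
                          for t, w in zip(GL_NODES, GL_WEIGHTS))
            for r in range(n_r):
                for rp in range(n_r):
                    P[r][rp] += (-2.0 * inv_gap * pv[r] * pv[rp].conjugate()
                                 * pc[r].conjugate() * pc[rp])
    return P

def p_factorized_n3(psi_v, e_v, psi_c, e_c):
    """Same quantity, but the sums over v and c are done separately per quadrature point."""
    n_r = len(psi_v[0])
    P = [[0j] * n_r for _ in range(n_r)]
    for t, w in zip(GL_NODES, GL_WEIGHTS):
        for r in range(n_r):
            for rp in range(n_r):
                rho = sum(pv[r] * pv[rp].conjugate() * math.exp(ev * t)
                          for pv, ev in zip(psi_v, e_v))
                rho_bar = sum(pc[r].conjugate() * pc[rp] * math.exp(-ec * t)
                              for pc, ec in zip(psi_c, e_c))
                P[r][rp] += -2.0 * w * math.exp(t) * rho * rho_bar
    return P

def random_states(n_states, n_r, rng):
    """Hypothetical helper: random complex 'wavefunctions' on n_r grid points."""
    return [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_r)]
            for _ in range(n_states)]

rng = random.Random(1)
n_r = 5
psi_v, e_v = random_states(2, n_r, rng), [-0.6, -0.5]
psi_c, e_c = random_states(2, n_r, rng), [0.5, 0.6]
P4 = p_quadrature_n4(psi_v, e_v, psi_c, e_c)
P3 = p_factorized_n3(psi_v, e_v, psi_c, e_c)
```

The two results agree to rounding error because the rearrangement is exact for any fixed quadrature rule. Only the quadrature itself introduces an approximation relative to the true $1/(E_c - E_v)$.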

  10. Steps for typical GW calculations
     • Most expensive: real-space P, $O(N^3)$ method
     • Also expensive: $O(N^4)$

  11. $O(N^3)$ method for self-energy
     $\Sigma^{\pm}(\omega)_{r,r'} = \sum_{p,n} \frac{B^{p}_{r,r'}\,\psi_{r,n}\,\psi^{*}_{r',n}}{\omega - E_n \pm \omega_p}$
     $B^{p}_{r,r'}$: residues; $\omega_p$: energies of the poles of $W(r)_{r,r'}$
     This has the general form $X_{r,r'} = \sum_{i}^{N_a} \sum_{j}^{N_b} \frac{A^{i}_{r,r'}\, B^{j}_{r,r'}}{w + a_i - b_j}$
     • $\omega - E_n \pm \omega_p = 0$ is possible: Gauss-Laguerre quadrature is not applicable
     • A new quadrature was needed and was developed: Hermite-Gauss-Laguerre quadrature, which writes $\frac{1}{\omega - E_n \pm \omega_p}$ as $\mathrm{Im} \int_0^\infty d\tau$ of a damped factor $e^{i(\omega - E_n \pm \omega_p)\tau}$ (the damping term is illegible in this transcript)

  12. Results: Energy gap
     • MgO crystal (16 atoms): number of bands 433; two $N$ parameters (subscripts illegible in this transcript) set to 1 and 4
     • Si crystal (16 atoms): number of bands 399; the same two $N$ parameters set to 1 and 4
     * Kim et al., (2020), Phys. Rev. B., 101, pp. 035139

  13. Performance against other codes
     • Si crystal (16 atoms)
     • Number of bands: 399
     • Two $N$ parameters (subscripts illegible in this transcript) set to 15 and 30
     http://charm.cs.illinois.edu/OpenAtom/
     * Kim et al., (2019), Comput. Phys. Commun., 244, pp. 427-441

  14. OpenAtom GW Parallel Scaling OpenAtom Team

  15. GW-BSE Parallelization
       Phase                                                     Serial     Parallel
       1  Compute P in Rspace ($N^4$ and $N^3$ methods)          Complete   Complete
       2  FFT P to GSpace                                        Complete   Complete
       3  Invert epsilon                                         Complete   Complete
       4  Plasmon pole                                           Complete   Future Work
       5  COHSEX Self-energy                                     Complete   Complete
       6  Dynamic Self-energy                                    Complete   Future Work

  16. GW Phase-I P Matrix Computation ($N^4$ and $N^3$ method)
     [Figure: the Ψ vectors (L occupied, M unoccupied, each of length R) are stored in a 1D chare array; the R x R P matrix is split into 2D tiles stored in a 2D chare array.]

  17. Parallel Decomposition: Input state vectors
     Duplicate occupied and unoccupied states (ψ) on each node

  18. Computation of the P matrix using the $N^3$ method
     • Outer loops are windows of occupied and unoccupied states
     • Most expensive computation: the ρ and ρ' matrices
         for l = 1:Nvw
           for m = 1:Ncw
             for j = 1:Nquad
               calculate ρ (window l, quadrature point j)
               calculate ρ' (window m, quadrature point j)
               P[r,r'] += ρ[r,r'] x ρ'[r,r']
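The loop nest on this slide can be sketched in pure Python as follows. Several choices here are assumptions made for illustration and are not taken from the slides: the two-point Gauss-Laguerre rule, the per-window-pair rescaling by the difference of mean energies, and all function names. A direct $N^4$ sum (`p_direct`) is included only for comparison.

```python
import math

SQRT2 = math.sqrt(2.0)
GL_NODES = (2.0 - SQRT2, 2.0 + SQRT2)                     # two-point Gauss-Laguerre nodes
GL_WEIGHTS = ((2.0 + SQRT2) / 4.0, (2.0 - SQRT2) / 4.0)   # matching weights

def window_matrix(states, energies, tau, occupied):
    """rho (occupied) or rho' (unoccupied) for one window at one quadrature point."""
    n_r = len(states[0])
    out = [[0j] * n_r for _ in range(n_r)]
    for psi, e in zip(states, energies):
        decay = math.exp(e * tau) if occupied else math.exp(-e * tau)
        for r in range(n_r):
            for rp in range(n_r):
                if occupied:
                    out[r][rp] += psi[r] * psi[rp].conjugate() * decay
                else:
                    out[r][rp] += psi[r].conjugate() * psi[rp] * decay
    return out

def p_windowed(v_windows, c_windows, n_r):
    """v_windows and c_windows are lists of (states, energies), one entry per window."""
    P = [[0j] * n_r for _ in range(n_r)]
    for psi_v, e_v in v_windows:                      # for l = 1:Nvw
        for psi_c, e_c in c_windows:                  # for m = 1:Ncw
            # Assumed choice: rescale tau by the mean gap of this window pair.
            scale = sum(e_c) / len(e_c) - sum(e_v) / len(e_v)
            for x, w in zip(GL_NODES, GL_WEIGHTS):    # for j = 1:Nquad
                tau = x / scale
                coeff = -2.0 * w * math.exp(x) / scale
                rho = window_matrix(psi_v, e_v, tau, occupied=True)
                rho_prime = window_matrix(psi_c, e_c, tau, occupied=False)
                for r in range(n_r):
                    for rp in range(n_r):
                        P[r][rp] += coeff * rho[r][rp] * rho_prime[r][rp]
    return P

def p_direct(v_windows, c_windows, n_r):
    """Reference N^4 sum over all (v, c) pairs with exact energy denominators."""
    P = [[0j] * n_r for _ in range(n_r)]
    for psi_v, e_v in v_windows:
        for psi_c, e_c in c_windows:
            for pv, ev in zip(psi_v, e_v):
                for pc, ec in zip(psi_c, e_c):
                    for r in range(n_r):
                        for rp in range(n_r):
                            P[r][rp] += (-2.0 * pv[r] * pv[rp].conjugate()
                                         * pc[r].conjugate() * pc[rp] / (ec - ev))
    return P

# Toy data: one state per window, three grid points.
v_windows = [([[1 + 1j, 0.5 + 0j, -0.25j]], [-1.0]), ([[0.2 + 0j, -1j, 0.7 + 0j]], [-0.4])]
c_windows = [([[0.3j, 1.0 + 0j, -0.5 + 0j]], [0.6]), ([[1 - 1j, 0.1 + 0j, 0.4j]], [1.5])]
P_w = p_windowed(v_windows, c_windows, 3)
P_d = p_direct(v_windows, c_windows, 3)
```

With a single state per window the rescaled quadrature is exact, so `P_w` and `P_d` agree to rounding error. With wider windows the agreement depends on the number of quadrature points, which suggests why narrower windows let each window pair get by with fewer points.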

  19. Computation of the ρ matrix (using occupied states)
     • State vectors are represented with ψ
       ○ Number of occupied states = L, each state has N elements
       ○ All occupied states can be represented as a matrix ψ_V[1:L][1:N]
     ρ -> Add elements of the outer products of ψ_V[1:L]:
         for l = 1:L
           for r = 1:N
             for r' = 1:N
               ρ[r,r'] += ψ_V[l]^T[r] x ψ_V[l][r']
     ρ -> Same as ZGEMM of all ψ_V^T and all ψ_V, i.e. the matrix multiply ZGEMM(ψ_V^T[1:N][1:L], ψ_V[1:L][1:N]):
         for r = 1:N
           for r' = 1:N
             for l = 1:L
               ρ[r,r'] += ψ_V^T[r][l] x ψ_V[l][r']
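The equivalence claimed on this slide is easy to confirm on a small example. The sketch below is plain Python with illustrative names and data, not OpenAtom code. It accumulates the matrix both as a sum of outer products and as a triple-loop matrix multiply of the transposed state matrix with the state matrix. Following the slide, a plain transpose is used with no conjugation.

```python
def rho_by_outer_products(psi):
    """Sum over states l of the outer product psi[l] (x) psi[l]; psi is L x N."""
    n_states, n_r = len(psi), len(psi[0])
    rho = [[0j] * n_r for _ in range(n_r)]
    for l in range(n_states):
        for r in range(n_r):
            for rp in range(n_r):
                rho[r][rp] += psi[l][r] * psi[l][rp]
    return rho

def rho_by_matmul(psi):
    """Matrix multiply of psi^T (N x L) with psi (L x N), written as explicit loops."""
    n_states, n_r = len(psi), len(psi[0])
    psi_t = [[psi[l][r] for l in range(n_states)] for r in range(n_r)]  # transpose
    rho = [[0j] * n_r for _ in range(n_r)]
    for r in range(n_r):
        for rp in range(n_r):
            for l in range(n_states):
                rho[r][rp] += psi_t[r][l] * psi[l][rp]
    return rho

# Toy data: L = 2 states on N = 3 grid points.
psi_v = [[1 + 2j, 0.5j, -1.0 + 0j],
         [0.25 + 0j, 2 - 1j, 3j]]
rho_a = rho_by_outer_products(psi_v)
rho_b = rho_by_matmul(psi_v)
```

Both functions perform the same $L N^2$ multiply-adds. Casting the work as a single matrix multiply matters in practice because it can be handed to an optimized BLAS routine such as ZGEMM, the double-precision complex matrix multiply.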

  20. Computation of the ρ' matrix (using unoccupied states)
     • Number of unoccupied states = M, each state has N elements
     • All unoccupied states can be represented as a matrix ψ_C[1:M][1:N]
     ρ' -> Add elements of the outer products of ψ_C[1:M]:
         for m = 1:M
           for r = 1:N
             for r' = 1:N
               ρ'[r,r'] += ψ_C[m]^T[r] x ψ_C[m][r']
     ρ' -> Same as ZGEMM of all ψ_C^T and all ψ_C, i.e. the matrix multiply ZGEMM(ψ_C^T[1:N][1:M], ψ_C[1:M][1:N]):
         for r = 1:N
           for r' = 1:N
             for m = 1:M
               ρ'[r,r'] += ψ_C^T[r][m] x ψ_C[m][r']

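  21. Computation of P-matrix (tiled) ($N^3$)
     [Figure: the occupied states ψ_V(1:L), an L x N matrix, are multiplied with their N x L transpose by ZGEMM to give the N x N ρ matrix. The unoccupied states ψ_C(1:M), an M x N matrix, are multiplied with their N x M transpose by ZGEMM to give the N x N ρ' matrix. The N x N P matrix is the element-wise product of the ρ and ρ' matrices.]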

  22. Performance of $N^3$ method
     • The $N^3$ method is an order of magnitude faster than the $N^4$ method for the Si108 atoms dataset
       ○ 20k x 20k output matrix size
     • Scales well on Intel KNL and Skylake nodes
     • Future scaling results for larger datasets
     [Figure: execution time versus node count (8, 16, 32, 64) for the $N^4$ and $N^3$ methods, in two panels: Intel KNL nodes (Stampede2, 128 cores per node) and Intel Skylake nodes (Stampede2, 48 cores per node).]

  23. Questions?
