a practical view on linear algebra tools
play

A practical view on linear algebra tools Evgeny Epifanovsky - PowerPoint PPT Presentation

A practical view on linear algebra tools Evgeny Epifanovsky University of Southern California University of California, Berkeley Q-Chem September 25, 2014 What is Q-Chem? Established in 1993, first release in 1997. Software Q-Chem 3.0


  1. A practical view on linear algebra tools Evgeny Epifanovsky University of Southern California University of California, Berkeley Q-Chem September 25, 2014

  2. What is Q-Chem? Established in 1993, first release in 1997. Software Q-Chem 3.0 (2006) 4.0 (2012) 4.1 (2013) 4.2 (2014) Thousands of users Y. Shao et al., Mol. Phys., in press (2014), DOI:10.1080/00268976.2014.952696 A.I. Krylov and P.M.W. Gill, WIREs Comput. Mol. Sci. 3, 317–326 (2013)

  3. What is Q-Chem? Established in 1993, first release in 1997. Software Platform Q-Chem 3.0 (2006) Supported infrastructure for 4.0 (2012) state-of-the-art quantum chemistry 4.1 (2013) 4.2 (2014) Free of charge and open source for developers Thousands of users 157 contributors (Q-Chem 4) Y. Shao et al., Mol. Phys., in press (2014), DOI:10.1080/00268976.2014.952696 A.I. Krylov and P.M.W. Gill, WIREs Comput. Mol. Sci. 3, 317–326 (2013)

  4. What is Q-Chem? Established in 1993, first release in 1997. Software Platform Q-Chem 3.0 (2006) Supported infrastructure for 4.0 (2012) state-of-the-art quantum chemistry 4.1 (2013) 4.2 (2014) Free of charge and open source for developers Thousands of users 157 contributors (Q-Chem 4) Pleasanton, CA Y. Shao et al., Mol. Phys., in press (2014), DOI:10.1080/00268976.2014.952696 A.I. Krylov and P.M.W. Gill, WIREs Comput. Mol. Sci. 3, 317–326 (2013)

  5. Electronic structure model 1. Separate electrons and nuclei Nuclei become point charges, electrons are a quantum system 2. Choose a discretization scheme Introduce atomic orbitals 3. Choose type of wavefunction (or density functional) Collapses the dimensionality from 3N to a reasonable number First choice is mean-field (Hartree–Fock or Kohn–Sham) 4. Solve for parameters of wavefunction HF or KS molecular orbitals

  6. Origin of dense and sparse objects Atomic orbitals Non-orthogonal Local Sparse Molecular orbitals Orthonormal Delocalized Dense Localized MOs Both Local Sparse

  7. J and K matrices

  8. Making J and K matrices in HF and DFT � � J µν = ( µν | λσ ) P λσ K λν = ( µν | λσ ) P µσ λσ µσ � φ µ ( r 1 ) φ ν ( r 1 ) 1 ( µν | λσ ) ≡ φ λ ( r 2 ) φ σ ( r 2 ) dr 1 dr 2 r 12 Nominal scaling of computational cost for J and K is N 4 .

  9. Making J and K matrices in HF and DFT � � J µν = ( µν | λσ ) P λσ K λν = ( µν | λσ ) P µσ λσ µσ � φ µ ( r 1 ) φ ν ( r 1 ) 1 ( µν | λσ ) ≡ φ λ ( r 2 ) φ σ ( r 2 ) dr 1 dr 2 r 12 Nominal scaling of computational cost for J and K is N 4 . For J-matrix: 1. Define significant pairs ( µν | and | λσ ) – O ( N ) 2. Compute integrals – O ( N 2 ) to O ( N ) 3. Contract with density – O ( N 2 ) to O ( N )

  10. Making J and K matrices in HF and DFT � � J µν = ( µν | λσ ) P λσ K λν = ( µν | λσ ) P µσ λσ µσ � φ µ ( r 1 ) φ ν ( r 1 ) 1 ( µν | λσ ) ≡ φ λ ( r 2 ) φ σ ( r 2 ) dr 1 dr 2 r 12 Nominal scaling of computational cost for J and K is N 4 . For J-matrix: For K-matrix repeat for each ( µν | : 1. Define significant pairs 1. Compute ( � µν | λσ ) – O ( N ) ( µν | and | λσ ) – O ( N ) 2. Contract with density – O ( N 2 ) 2. Compute integrals – O ( N 2 ) to O ( N ) 3. Contract with density – O ( N 2 ) to O ( N )

  11. Contractions in coupled-cluster theory � t λσ t ab = ij C λ a C σ b ij ab � � � [( µν | λσ ) − ( λν | µσ )] t λσ ( µν | λσ ) t λσ ( λν | µσ ) t λσ = − ij ij ij λσ λσ λσ Nominal scaling of the steps is O 2 N 4 . Including sparsity reduces scaling of J-type and K-type contractions to O 2 N 2 and O 2 N 3 , respectively.

  12. Resolution of the identity approximation � C P µν ( P | Q ) C Q ( µν | λσ ) ≈ λσ PQ � ( µν | P )( P | Q ) − 1 ( Q | λσ ) = PQ � ( P | Q ) C Q ( µν | P ) = µν Q (no approximation is made if auxiliary basis is complete) � B Q ( µν | P )( P | Q ) − 1 / 2 µν = P � B Q µν B Q ( µν | λσ ) ≈ λσ Q

  13. Make K matrix with RI � B Q µν B Q K λν = λσ P µσ µσ Q ◮ How to factorize the equation (choose intermediates)? To minimize computations? ◮ To stay within given memory constraint? ◮

  14. AO-MO transformation

  15. Integral transformation step in MP2 and RI-MP2 � ( ia | jb ) = ( µν | λσ ) C µ i C ν a C λ j C σ b µνλσ � ( ia | P ) = ( µν | P ) C µ i C ν a µν ◮ With given memory constrains how to choose batch size and intermediates?

  16. Linear algebra in many dimensions

  17. Coupled-cluster doubles (CCD) equations D ab ij = ǫ i + ǫ j − ǫ a − ǫ b �� � � ij − 1 T ab ij D ab f bc t ac � kl || cd � t bd kl t ac ij = � ij || ab � + P − ( ab ) ij 2 c klcd �� � � ik + 1 f jk t ab � kl || cd � t cd jl t ab − P − ( ij ) ik 2 k klcd � � � + 1 kl + 1 kl + 1 � ij || kl � t ab � kl || cd � t cd ij t ab � ab || cd � t cd ij 2 4 2 kl klcd cd �� � � ik − 1 � kb || jc � t ac � kl || cd � t db lj t ac − P − ( ij ) P − ( ab ) ik 2 kc klcd P − ( ij ) A ij = A ij − A ji

  18. Tensor expressions for CCD void ccd_t2_update(...) { letter i, j, k, l, a, b, c, d; btensor<2> f1_oo(oo), f1_vv(vv); btensor<4> ii_oooo(oooo), ii_ovov(ovov); // Compute intermediates f1_oo(i|j) = f_oo(i|j) + 0.5 * contract(k|a|b, i_oovv(j|k|a|b), t2(i|k|a|b)); f1_vv(b|c) = f_vv(b|c) - 0.5 * contract(k|l|d, i_oovv(k|l|c|d), t2(k|l|b|d)); ii_oooo(i|j|k|l) = i_oooo(i|j|k|l) + 0.5 * contract(a|b, i_oovv(k|l|a|b), t2(i|j|a|b)); ii_ovov(i|a|j|b) = i_ovov(i|a|j|b) - 0.5 * contract(k|c, i_oovv(i|k|b|c), t2(k|j|c|a)); // Compute updated T2 t2new(i|j|a|b) = i_oovv(i|j|a|b) + asymm(a, b, contract(c, t2(i|j|a|c), f1_vv(b|c))) - asymm(i, j, contract(k, t2(i|k|a|b), f1_oo(j|k))) + 0.5 * contract(k|l, ii_oooo(i|j|k|l), t2(k|l|a|b)) + 0.5 * contract(c|d, i_vvvv(a|b|c|d), t2(i|j|c|d)) - asymm(a, b, asymm(i, j, contract(k|c, ii_ovov(k|b|j|c), t2(i|k|a|c)))); }

  19. Evaluation of tensor expressions 1. Convert expression to abstract syntax tree (AST) 2. Optimize and transform AST with given constraints � A ij = T 1 T 2 ik T 3 ij + kj k → A(i|j) = T1(i|j) + contract(k, T2(i|k), T3(k|j));

  20. Evaluation of tensor expressions 1. Convert expression to abstract syntax tree (AST) 2. Optimize and transform AST with given constraints 3. Evaluate expression following optimized AST Back-end: ◮ Shared memory threaded model (single node) 3 ◮ Distributed memory parallel model (via CTF) ◮ Replicated memory parallel model 3 E.Epifanovsky et al., J. Comput. Chem. 34, 2293–2309 (2013)

  21. Block tensors in libtensor Three components: ◮ Block tensor space: dimensions + tiling pattern. ◮ Symmetry relations between blocks. ◮ Non-zero canonical data blocks.

  22. Block tensors in libtensor Three components: ◮ Block tensor space: dimensions + tiling pattern. ◮ Symmetry relations between blocks. ◮ Non-zero canonical data blocks. Symmetry: S : SB i �→ ( B j , U ij ) A B 1 B 2 B 3 α β B ¡ B ¡ A α B 1 B 2 β B 3 Permutational Point group Spin

  23. Perturbation theory correction

  24. Perturbation theory � [( ia | jb ) − ( ib | ja )] 2 � t abc ijk ˜ t abc ijk ∆ iajb ∆ iajbkc ijab ijkabc �� � � t abc t cd t ab = P ( ijk ) P ( abc ) ij � kd || ab � + lk � ij || lc � ijk d l � � ˜ t abc = t abc t c i � kj || ab � + f c i t ab ijk + P ( ijk ) P ( abc ) ijk kj P ( ijk ) a ijk = a ijk − a jik − a ikj − a kji + a jki + a kij ◮ How to partition the numerator to minimize computational cost and satisfy memory constraints?

  25. Summary ◮ Most of the problems are sparse multi-dimensional linear algebra problems ◮ For many of those cases there exist a mapping to a dense two-dimensional problem ◮ Almost all new problems contain sparse many-tensor contractions, for which general optimal algorithms have not been developed Open problems ◮ Given a contraction of multiple sparse tensors, what is the best way to factorize it into pairwise contractions? ◮ How to optimally compute a tensor expression satisfying memory constraints?

  26. Scalability and software requirements

  27. Scaling to large problems How well are existing electronic structure methods equipped to benefit from large-scale HPC systems? Do they really need to be massively parallel?

  28. Technical requirements Linear algebra tools are just one component in a large ecosystem: ◮ Routines should have no side-effects ◮ Routines should be thread-safe and otherwise parallel-friendly ◮ User should be able to designate resources for each operation. How to pass this information?

  29. Technical requirements Linear algebra tools are just one component in a large ecosystem: ◮ Routines should have no side-effects ◮ Routines should be thread-safe and otherwise parallel-friendly ◮ User should be able to designate resources for each operation. How to pass this information? Good example: original BLAS

  30. Technical requirements Linear algebra tools are just one component in a large ecosystem: ◮ Routines should have no side-effects ◮ Routines should be thread-safe and otherwise parallel-friendly ◮ User should be able to designate resources for each operation. How to pass this information? Good example: original BLAS Bad example: modern BLAS-OpenMP

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend