Parallelization of DQMC Simulations for Strongly Correlated Electron Systems

  1. Parallelization of DQMC Simulations for Strongly Correlated Electron Systems
     Che-Rung Lee (cherung@cs.nthu.edu.tw)
     Dept. of Computer Science, National Tsing-Hua University, Taiwan
     Joint work with I-Hsin Chung (IBM Research) and Zhaojun Bai (UC Davis)
     IEEE International Parallel and Distributed Processing Symposium 2010 (IPDPS 2010)

  2. Outline
     1. DQMC simulations
     2. DQMC parallelization
        - Algorithmic approaches
        - System approaches
     3. Experiment results
     4. Conclusion

  3. Computational Material Science
     Understanding and exploiting the properties of solid-state materials: magnetism, metal-insulator transition, high-temperature superconductivity, ...
     [Figure with four panels: A. Density; B. Density fluctuations; C. Spin correlations; D. Pairing correlations.]

  5. Hubbard Model and DQMC Simulations
     Many-body simulation on multi-layer lattices using the Hubbard model and the quantum Monte Carlo method.
     QUEST (QUantum Electron Simulation Toolbox): a Fortran 90 package for Determinant Quantum Monte Carlo (DQMC) simulations.

  6. DQMC Algorithm
     Two stages:
     - Warmup stage
     - Sampling stage
     A DQMC step:
     1. Propose a local change: $h \to h'$.
     2. Draw a random number $0 < r < 1$.
     3. Accept the change if $r < \det(e^{-\beta H(h')}) / \det(e^{-\beta H(h)})$.
     [Flowchart: random HS field, then warmup DQMC steps repeated until thermalized, then sampling DQMC steps with measurements repeated until there are enough samples, then aggregation.]
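The acceptance test in step 3 can be sketched in a few lines of Python. This is only an illustration of the rule on the slide, not QUEST code: the names `dqmc_accept` and `acceptance_rate` are hypothetical, and the two determinants are passed in as plain positive numbers instead of being computed from a Hubbard-Stratonovich field.

```python
import random


def dqmc_accept(det_old, det_new, rng):
    """Return True when the proposed change h -> h' is accepted.

    det_old and det_new stand in for det(exp(-beta*H(h))) and
    det(exp(-beta*H(h'))); both are assumed positive in this sketch.
    """
    r = rng.random()  # uniform in [0, 1); the slide states 0 < r < 1
    return r < det_new / det_old


def acceptance_rate(det_old, det_new, trials, seed=0):
    """Fraction of accepted proposals over repeated independent trials."""
    rng = random.Random(seed)
    accepted = sum(dqmc_accept(det_old, det_new, rng) for _ in range(trials))
    return accepted / trials
```

A determinant ratio of at least 1 is always accepted, while a ratio of 0.5 is accepted in roughly half of the trials.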

  9. Computational Kernels
     The equal time Green’s function:
     $$G_k = (I + B_k B_{k+1} \cdots B_L B_1 \cdots B_{k-1})^{-1}$$
     The unequal time Green’s function:
     $$G^{\tau} = \begin{pmatrix} I & & & B_1 \\ -B_2 & I & & \\ & \ddots & \ddots & \\ & & -B_L & I \end{pmatrix}^{-1}$$
     Physical measurements: operations on $G_k$ and $G^{\tau}$, Fourier transform, etc.
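As a rough illustration of the equal time kernel, the sketch below forms the cyclic product and inverts $I$ plus that product for 2×2 matrices. It follows the ordering the deck uses for $G_1$, $G_2$, $G_3$ (start at $B_k$, wrap around after $B_L$). The helper names are hypothetical, the matrices are plain nested lists, and the stabilization steps that a real DQMC code needs are deliberately left out.

```python
def cyclic_order(k, L):
    """1-based indices k, k+1, ..., L, 1, ..., k-1."""
    return [(k - 1 + j) % L + 1 for j in range(L)]


def matmul2(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][m] * B[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def inverse2(A):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]


def equal_time_green(B, k):
    """G_k = (I + B_k B_{k+1} ... B_L B_1 ... B_{k-1})^{-1}.

    B is a dict mapping 1-based indices to 2x2 matrices. The product is
    formed naively, with no stabilization, so this is only suitable for
    small, well-conditioned toy inputs.
    """
    L = len(B)
    product = [[1.0, 0.0], [0.0, 1.0]]
    for idx in cyclic_order(k, L):
        product = matmul2(product, B[idx])
    shifted = [[product[i][j] + (1.0 if i == j else 0.0) for j in range(2)]
               for i in range(2)]
    return inverse2(shifted)
```

For example, `cyclic_order(2, 4)` yields the index order 2, 3, 4, 1 used for $G_2$ when $L = 4$.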

  11. Computational Challenges
      For simulating strongly correlated electron systems:
      - The lattices need to be large.
      - A longer warmup stage is required.
      Numerical stability issues:
      - Additional stabilizing steps are required.
      - Most calculations need double precision.
      - Many fast updating methods and parallel algorithms cannot be used.

  12. DQMC Parallelization
      Algorithmic approaches:
      - Parallel Markov chain
      - Rolling feeder algorithm
      - Parallel matrix computations
      System approaches:
      - Task decomposition
      - Communication and computation overlapping
      - Message compression
      - Load balance

  14. Parallel Markov Chain
      The sampling stage is embarrassingly parallel.
      The speedup from this parallelization is limited by the time spent in the warmup stage (Amdahl’s law):
      $$\rho_{\text{speedup}} = \frac{T_{\text{warmup}} + T_{\text{sampling}}}{T_{\text{warmup}} + T_{\text{sampling}} / N_p} < \frac{T_{\text{warmup}} + T_{\text{sampling}}}{T_{\text{warmup}}}$$
      [Flowchart: random HS field, then warmup DQMC steps repeated until thermalized, then several sampling chains of DQMC steps and measurements running side by side, then aggregation.]
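The speedup formula on this slide is easy to evaluate numerically. The sketch below is a direct transcription of it; the function names are hypothetical, `n_procs` is my reading of $N_p$ as the number of parallel sampling chains, and the timings in the usage note are made-up numbers chosen only to make the arithmetic easy to follow.

```python
def markov_chain_speedup(t_warmup, t_sampling, n_procs):
    """Speedup when only the sampling stage is split over n_procs chains."""
    serial_time = t_warmup + t_sampling
    parallel_time = t_warmup + t_sampling / n_procs
    return serial_time / parallel_time


def speedup_bound(t_warmup, t_sampling):
    """Upper bound on the speedup as n_procs grows without limit."""
    return (t_warmup + t_sampling) / t_warmup
```

With a warmup of 10 time units and a sampling stage of 90, nine chains give a speedup of 100 / 20 = 5, and no number of chains can reach the bound of 100 / 10 = 10.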

  20. Green’s Function Calculation
      The matrix $G_k$ needs to be recomputed cyclically, with $B_{k-1}$ updated.
      $$G_1 = (I + B_1 B_2 \cdots B_{L-1} B_L)^{-1}$$
      $$G_2 = (I + B_2 B_3 \cdots B_L B_1)^{-1}$$
      $$G_3 = (I + B_3 B_4 \cdots B_1 B_2)^{-1}$$
      $$\cdots$$
      Parallel reduction takes $O(N^3 \log L)$ time, but it is numerically unstable!
      [Figure: parallel reduction schedule, alternating "DQMC step" and "Compute G" stages over the matrix indices 1 to 4.]
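The $\log L$ factor in the parallel reduction comes from multiplying neighbouring matrices pairwise, round after round. The sketch below shows that pattern on 2×2 integer matrices and counts the rounds. It runs serially; in a parallel setting the products within one round are independent and could be assigned to different processors. The function names are hypothetical, and exact integer arithmetic sidesteps the numerical instability the slide warns about.

```python
def matmul2(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][m] * B[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def sequential_product(mats):
    """Left-to-right product: L - 1 dependent multiplications."""
    result = mats[0]
    for M in mats[1:]:
        result = matmul2(result, M)
    return result


def tree_product(mats):
    """Pairwise (tree) reduction that preserves the left-to-right order.

    Returns the product and the number of rounds, which is
    ceil(log2(L)) for L input matrices.
    """
    level = list(mats)
    rounds = 0
    while len(level) > 1:
        # Every product in this comprehension is independent of the others.
        nxt = [matmul2(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd one out is carried to the next round
        level = nxt
        rounds += 1
    return level[0], rounds
```

Because matrix multiplication is associative, both functions return the same product in exact arithmetic; they differ only in how many dependent steps are needed.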

  23. Rolling Feeder Algorithm
      The matrix product can be stably computed sequentially.
      [Figure: rolling feeder schedule, alternating "DQMC step" and "Compute G" stages, with each row starting from a different matrix index between 1 and 4.]

      | Tasks to get one $G_k$ | Sequential | Parallel reduction | Rolling feeder |
      |---|---|---|---|
      | 1. Matrix multiplication | $L$ | $\log L$ | 1 |
      | 2. Stabilization step | $O(L)$ | $O(\log L)$ | 1 |
      | 3. Inverting $(I + B_1 \cdots B_L)$ | 1 | 1 | 1 |
      | 4. Data transmission | $N^2$ | $O(LN^2)$ | $N^2$ |

      Comparisons on resources and stability:

      | | Sequential | Parallel reduction | Rolling feeder |
      |---|---|---|---|
      | Processor | $O(1)$ | $O(L)$ | $O(L)$ |
      | Numerically stable | Y | N | Y |

  24. Parallel Matrix Computations
      Two matrix computation kernels are parallelized.
