SLIDE 1

Parallelization of DQMC Simulations for Strongly Correlated Electron Systems

Che-Rung Lee

Dept. of Computer Science, National Tsing-Hua University, Taiwan

Joint work with I-Hsin Chung (IBM Research) and Zhaojun Bai (UC Davis)

IEEE International Parallel and Distributed Processing Symposium 2010


SLIDE 2

Outline

1. DQMC simulations
2. DQMC parallelization: algorithmic approaches and system approaches
3. Experimental results
4. Conclusion

SLIDE 3

Computational Material Science

Understanding and exploiting the properties of solid-state materials: magnetism, metal-insulator transitions, high-temperature superconductivity, ...

[Figure: four lattice maps: (A) density, (B) density fluctuations, (C) spin correlations, (D) pairing correlations.]

SLIDES 4–5

Hubbard Model and DQMC Simulations

Many-body simulation on multi-layer lattices using the Hubbard model and the quantum Monte Carlo method.

QUEST (QUantum Electron Simulation Toolbox): a Fortran 90 package for Determinant Quantum Monte Carlo (DQMC) simulations.

SLIDE 6

DQMC Algorithm

Two stages: a warmup stage and a sampling stage.

A DQMC step:

1. Propose a local change: h → h′.
2. Draw a random number 0 < r < 1.
3. Accept the change if r < det(e^{-\beta H(h')}) / det(e^{-\beta H(h)}).

[Flowchart: in the warmup stage, a random HS field is evolved by DQMC steps until thermalized; in the sampling stage, DQMC steps alternate with measurements until enough samples are collected, then results are aggregated.]
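As a concrete illustration of the step above, here is a minimal Python sketch. The weight uses the fermion-determinant form det(I + B_L ... B_1) that det(e^{-βH(h)}) reduces to in DQMC; `propose` and `b_matrices` are hypothetical stand-ins for the actual QUEST routines, and a production code would use determinant ratios rather than forming the determinants directly.

```python
import numpy as np

def weight(B_list):
    """Configuration weight det(I + B_L ... B_1) for one HS field h.

    Forming the dense product and determinant like this is only for
    illustration; it is neither fast nor stable for large L."""
    N = B_list[0].shape[0]
    A = np.eye(N)
    for B in reversed(B_list):          # multiply B_L ... B_1
        A = A @ B
    return np.linalg.det(np.eye(N) + A)

def dqmc_step(h, propose, b_matrices, rng):
    """One DQMC step: propose h -> h', accept if r < weight ratio."""
    h_new = propose(h, rng)                         # local change h -> h'
    ratio = weight(b_matrices(h_new)) / weight(b_matrices(h))
    r = rng.random()                                # 0 < r < 1
    return h_new if r < ratio else h
```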

SLIDES 7–9

Computational Kernels

The equal-time Green's function:

G_k = (I + B_k B_{k+1} \cdots B_L B_1 \cdots B_{k-1})^{-1}

The unequal-time Green's function:

G^\tau = \begin{pmatrix} I & & & B_1 \\ -B_2 & I & & \\ & \ddots & \ddots & \\ & & -B_L & I \end{pmatrix}^{-1}

Physical measurements: operations on G_k and G^\tau, Fourier transforms, etc.
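To make the equal-time formula concrete, a naive numpy sketch follows. Direct inversion like this is exactly what the stability discussion below warns against; it is shown only to fix notation.

```python
import numpy as np

def equal_time_greens(B_list, k):
    """Naive G_k = (I + B_k B_{k+1} ... B_L B_1 ... B_{k-1})^{-1}.

    B_list = [B_1, ..., B_L], k is 1-based. Unstable for long
    products; real codes factorize as they multiply."""
    L, N = len(B_list), B_list[0].shape[0]
    A = np.eye(N)
    for i in range(L):                   # cyclic order k, k+1, ..., k-1
        A = A @ B_list[(k - 1 + i) % L]
    return np.linalg.inv(np.eye(N) + A)
```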

SLIDES 10–11

Computational Challenges

For simulating strongly correlated electron systems:

• The lattice size needs to be large.
• A longer warmup stage is required.
• Numerical stability is an issue: additional stabilization steps are required, most calculations need double precision, and many fast-updating methods and parallel algorithms cannot be used.

SLIDE 12

DQMC Parallelization

Algorithmic approaches:

• Parallel Markov chains
• Rolling feeder algorithm
• Parallel matrix computations

System approaches:

• Task decomposition
• Communication and computation overlapping
• Message compression
• Load balance

SLIDES 13–14

Parallel Markov Chain

The sampling stage can be parallelized embarrassingly: after warmup, independent chains sample and measure in parallel and their results are aggregated. The speedup is limited by the time of the warmup stage (Amdahl's law):

\rho_{\text{speedup}} = \frac{T_{\text{warmup}} + T_{\text{sampling}}}{T_{\text{warmup}} + T_{\text{sampling}}/N_p} < \frac{T_{\text{warmup}} + T_{\text{sampling}}}{T_{\text{warmup}}}

[Flowchart: one warmup chain (random HS field, DQMC steps until thermalized) feeds many independent sampling chains of DQMC steps and measurements.]
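To see where the "speedup < 21" figure quoted in the summary comes from, plug the benchmark's 1 : 20 warmup-to-sampling ratio into the bound; a two-line Python check:

```python
def sampling_speedup(t_warmup, t_sampling, n_p):
    """Speedup when only the sampling stage is parallelized."""
    return (t_warmup + t_sampling) / (t_warmup + t_sampling / n_p)

# Warmup : sampling = 1 : 20, the benchmark ratio used later.
print(sampling_speedup(1.0, 20.0, 1024))   # ~20.6
print((1.0 + 20.0) / 1.0)                  # limit as n_p -> infinity: 21.0
```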

SLIDES 15–20

Green's Function Calculation

The matrix G_k must be recomputed cyclically as B_{k-1} is updated:

G_1 = (I + B_1 B_2 \cdots B_{L-1} B_L)^{-1}
G_2 = (I + B_2 B_3 \cdots B_L B_1)^{-1}
G_3 = (I + B_3 B_4 \cdots B_1 B_2)^{-1}
\cdots

Parallel reduction computes each product in O(N^3 \log L) time.

[Diagram: DQMC steps interleaved with "Compute G" tasks; the B-matrix products for successive G_k are combined across processors in a reduction tree.]

Numerically unstable!
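The stabilization that DQMC codes apply to these long products is typically a graded QR (UDT) factorization; the sketch below shows that general idea under that assumption. It is not necessarily QUEST's exact scheme.

```python
import numpy as np

def stabilized_product(B_list):
    """Accumulate B_L ... B_1 as Q @ diag(d) @ T via repeated QR.

    Re-factorizing after every multiply keeps the wildly different
    scales of the product isolated in d instead of mixing into one
    ill-conditioned matrix; this is the usual form of the
    'stabilization steps' mentioned above."""
    N = B_list[0].shape[0]
    Q, d, T = np.eye(N), np.ones(N), np.eye(N)
    for B in B_list:                    # B_1 first, B_L last
        M = (B @ Q) * d                 # scale columns by d
        Q, R = np.linalg.qr(M)
        d = np.abs(np.diag(R))
        T = (R / d[:, None]) @ T        # keep T well conditioned
    return Q, d, T                      # B_L ... B_1 = Q @ diag(d) @ T
```

G_k can then be assembled from the factors of (I + Q diag(d) T)^{-1} without ever forming the raw product.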

SLIDES 21–23

Rolling Feeder Algorithm

The matrix product can be computed stably in its sequential order by pipelining: the multiplications for successive G_k are staggered round-robin across feeders, so each feeder performs one multiplication and one stabilization step per DQMC step.

[Diagram: DQMC steps interleaved with "Compute G" tasks rotating across four feeders.]

Tasks to get one G_k:

                                   Sequential   Parallel reduction   Rolling feeder
  1. Matrix multiplications        L            log L                1
  2. Stabilization steps           O(L)         O(log L)             1
  3. Inverting (I + B_1 ... B_L)   1            1                    1
  4. Data transmission             N^2          O(L N^2)             N^2

Comparison of resources and stability:

  Processors                       O(1)         O(L)                 O(L)
  Numerically stable               Y            N                    Y
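My reading of the diagram and the cost table is a one-multiplication-per-feeder pipeline; the toy serial emulation below illustrates that interpretation only (it omits the stabilization steps and the distribution over feeder processes).

```python
import numpy as np

def rolling_feeder_pipeline(B_stream, L, N):
    """Toy emulation: one new B per DQMC step, one G per step once full.

    Each live partial product absorbs the new B (one multiplication
    per feeder per step); after L steps a partial holds all L factors
    and yields its Green's function."""
    partials = []
    for B in B_stream:
        partials.append(np.eye(N))             # start a product for a future G
        partials = [P @ B for P in partials]   # every feeder multiplies once
        if len(partials) == L:                 # oldest product is complete
            A = partials.pop(0)
            yield np.linalg.inv(np.eye(N) + A)
```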

SLIDES 24–26

Parallel Matrix Computations

Two matrix computation kernels are parallelized.

1. The unequal-time Green's function is computed by blocks in parallel:

G^\tau_{k,\ell} =
\begin{cases}
(I + B_k \cdots B_1 B_L \cdots B_{k+1})^{-1} B_k \cdots B_{\ell+1} & k > \ell \\
(I + B_k \cdots B_1 B_L \cdots B_{k+1})^{-1} & k = \ell \\
-(I + B_k \cdots B_1 B_L \cdots B_{k+1})^{-1} B_k \cdots B_1 B_L \cdots B_{\ell+1} & k < \ell
\end{cases}

2. The matrix-matrix multiplication of G_k with each block of G^\tau is sped up using multicore parallelism.

The dimension of G_k (100–1000) is too small for MPI-style parallelization of these matrix computations to pay off.
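Since the blocks are independent given the (read-only) B matrices, they map naturally onto a thread pool; a sketch, where `block_fn` is a hypothetical kernel evaluating one case of the formula above:

```python
from concurrent.futures import ThreadPoolExecutor

def unequal_time_blocks(B_list, block_fn, n_workers):
    """Evaluate every block G^tau_{k,l} independently in parallel.

    block_fn(k, l, B_list) computes one block; threads suffice here
    because dense linear algebra releases the GIL inside BLAS."""
    L = len(B_list)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = {(k, l): pool.submit(block_fn, k, l, B_list)
                   for k in range(1, L + 1) for l in range(1, L + 1)}
        return {kl: f.result() for kl, f in futures.items()}
```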

SLIDE 27

System Design

The system contains several "simulators" for the parallel Markov chains. Each simulator consists of a "walker" and an "M-server".

[Diagram: the MC walker runs DQMC steps through an iterator and several feeders operating on the HS field; the M-server hosts equal-time and unequal-time measurators, each with Green's function calculators (GC), for the physical measurements.]

SLIDE 28

Implementation Techniques

The system is implemented for hybrid systems (cluster + multicore).

[Table: each task (parallel Markov chain, rolling feeder algorithm, unequal-time Green's function, physical measurement) is marked against the techniques it uses: MPI, OpenMP, communication/computation overlapping, message compression, and load balance. The individual marks did not survive extraction.]

SLIDE 29

Communication/Computation Overlapping

Using the fast update algorithm (FUA) to reduce waiting time.

[Timelines: without overlapping, the MC iterator idles while a feeder multiplies B_0 and computes G_0 for the next time slice. With overlapping, the iterator keeps iterating, obtains G_0 early by the FUA, and converts it to G_1 by the FUA while the feeder multiplies B_1, hiding communication behind computation.]
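A hedged mpi4py sketch of the overlap pattern (the slides do not name the MPI binding, and `mc_iterate` and `fast_update` are hypothetical stand-ins for the real kernels): post the receive for the feeder's next Green's function early and keep fast-updating until it lands.

```python
from mpi4py import MPI
import numpy as np

def iterate_with_overlap(comm, feeder_rank, G, n_slices,
                         mc_iterate, fast_update):
    """Hide the feeder's recomputation of G behind MC iteration."""
    G_fresh = np.empty_like(G)
    for _ in range(n_slices):
        req = comm.Irecv(G_fresh, source=feeder_rank)  # post receive early
        mc_iterate(G)
        while not req.Test():          # fresh G not here yet:
            G = fast_update(G)         # advance G by the fast update algorithm
            mc_iterate(G)
        G[...] = G_fresh               # adopt the feeder's recomputed G
```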

SLIDES 30–33

Load Balance

The iterators are fully occupied, so they are the bottleneck of speedup. Processor utilization can be enhanced by merging tasks: for example, when computing the unequal-time Green's function, each processor can take care of more than one block submatrix.

The load-balance problem: how many block submatrices should one processor compute? Queueing theory (Little's law) gives the estimate

n_C = \max_{P \le 1} \frac{P}{\lambda T} = \frac{1}{\lambda T},

where λ is the arrival rate, T the processing time, and P the processor utilization.
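A one-function sanity check of the estimate, with made-up numbers for λ and T:

```python
def blocks_per_processor(arrival_rate, processing_time):
    """n_C = 1/(lambda*T): the most blocks one processor can absorb
    while its utilization P = n_C * lambda * T stays at or below 1."""
    return max(1, int(1.0 / (arrival_rate * processing_time)))

# Blocks arriving at 0.5/s, each taking 0.4 s: n_C = 5, utilization 1.0.
print(blocks_per_processor(0.5, 0.4))
```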

SLIDES 34–35

System and Benchmark

System:

• Runs on an IBM Blue Gene/P.
• Each compute node has an 850 MHz quad-core PowerPC 450 processor and 2 GB of memory.
• IBM XL compilers with the IBM BLAS and LAPACK libraries.

Benchmark:

• DQMC simulation on a two-dimensional periodic lattice of size N = 16 × 16 = 256.
• The ratio of DQMC steps in the warmup stage to the sampling stage is 1 : 20.

SLIDE 36

Communication Pattern

[Trace: MPI timelines of two simulators, each with one iterator, six feeders, and one measurator.]

Green bands show the waiting time in MPI_RECV. The iterators are fully occupied once started.

SLIDE 37

Speedup for Different L

[Plot: speedup versus the number of processors n_P, for several values of L.]

SLIDE 38

Effect of Load Balance (L = 96)

[Plot: speedup versus n_P for different values of n_C, the number of block submatrices computed per processor.]

SLIDES 39–41

Summary

DQMC simulation of strongly correlated materials is a computationally intensive task that calls for parallelization. We targeted hybrid massively parallel systems and exploited the parallelism of DQMC simulations at several levels of granularity. Our implementation shows over 80× speedup on a thousand processors, far better than embarrassing parallelization alone (speedup < 21).

SLIDES 42–45

Future Work

• More fine-grained parallel matrix computation kernels (pivoted QR, QR, matrix inversion) to fully utilize the computational power of multicore processors.
• Better system design to enhance processor utilization.
• Different physics models and methods.
• The code is still at the experimental stage; further development is required for practical use.