
Efficient Correlation-Free Many-States Lattice Monte Carlo on GPUs



  1. Efficient Correlation-Free Many-States Lattice Monte Carlo on GPUs. Jeffrey Kelling, Géza Ódor, Martin Weigel, Sibylle Gemming, 8th May 2017. Member of the Helmholtz Association | Jeffrey Kelling, Géza Ódor, Martin Weigel, Sibylle Gemming | FWIO | http://www.hzdr.de

  2. Part 1, Introduction: what is this talk about? Surface growth, physical aging (and non-equilibrium systems), lattice Monte Carlo. Part 2, Trivial parallelism vs. SIMT.

  3. Applications for Monte Carlo: Stochastic Processes. Examples include game theory (e.g. Perc, Matjaž Eur. J. Phys. 38 (4) 045801 (2017)), sociology, finance, ...; Müller, T., Heinig, K.-H. et al. Appl. Phys. Lett. 85 2373 (2004). [Images: http://en.wikipedia.org/wiki/File:Rub_al_Khali_002.JPG, http://hubblesite.org/newscenter/archive/releases/2007/17/image/a, https://www.hzdr.de/db/Cms?pOid=24344&pNid=2707]

  4. Non-Equilibrium vs Equilibrium. Equilibrium properties: only the final state is relevant; the optimal algorithm reaches equilibrium quickly. Out of equilibrium: the kinetics are of interest; the optimal algorithm reproduces the physical evolution. [Figure: 8-state Potts model (coupling J, $k_BT = 5$), disordered state and ordered state]

  5. Non-Equilibrium Systems. Interface roughness: $W^2(L,t) = \frac{1}{L^2}\sum_{i}^{L^2} h_i^2(t) - \left(\frac{1}{L^2}\sum_{i}^{L^2} h_i(t)\right)^2$, where $L$ is the lateral system size and $h_i$ is the surface height at site $i$. [Figure: $W^2$ vs. $t$ in Monte Carlo steps (MCS), log-log, $10^1$ to $10^7$ MCS, with surface snapshots labeled 0.1 M, 0.6 M, 20.5 M and 150.05 M MCS]
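The roughness formula can be evaluated for a single snapshot in a few lines. This is a minimal Python sketch that assumes the $L^2$ heights arrive as a flat list. The function name `roughness_sq` is hypothetical and does not come from the talk.

```python
def roughness_sq(heights):
    """Interface roughness W^2(L, t) of one surface snapshot.

    heights: flat list of the L*L site heights h_i(t).
    Returns <h^2> - <h>^2, both averages taken over the L^2 sites.
    """
    n = len(heights)  # n = L^2
    mean_h = sum(heights) / n
    mean_h2 = sum(h * h for h in heights) / n
    return mean_h2 - mean_h * mean_h


flat = [5] * 16           # flat 4x4 surface
stepped = [0, 2] * 8      # heights alternating between 0 and 2
print(roughness_sq(flat))     # 0.0
print(roughness_sq(stepped))  # 1.0
```

For tall surfaces with a large mean height, a two-pass variance avoids floating-point cancellation. The two-pass version subtracts the mean from every height before squaring.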

  6. Domain Decomposition. Random Sequential (RS) on GPU uses domain decomposition (tiles labeled 1 2 / 4 3, repeated). + uncorrelated updates; − less than 48 kB per domain in shared memory (smem). Stochastic Cellular Automaton (SCA): update the odd/even sublattice, each site with update probability $p < 1$. + linear memory access ⇒ fast.
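Below is a serial sketch of one checkerboard SCA sweep with update probability $p$. `update_site` is a hypothetical placeholder for the model's local rule. The sketch assumes a nearest-neighbour rule and an even lattice size $L$.

```python
import random


def sca_sweep(lattice, L, p, update_site, rng):
    """One checkerboard SCA sweep over an L x L periodic lattice.

    The even sublattice ((x + y) even) is processed first, then the odd one.
    Every site is updated with probability p. Sites of one sublattice share
    no nearest neighbours (for even L), so on a GPU each half-sweep could run
    fully in parallel; here plain loops stand in for the threads.
    Returns the number of performed updates.
    """
    updates = 0
    for parity in (0, 1):
        for y in range(L):
            for x in range(L):
                if (x + y) % 2 == parity and rng.random() < p:
                    update_site(lattice, x, y)
                    updates += 1
    return updates


def count_update(lattice, x, y):
    """Placeholder local rule: just count how often a site was updated."""
    lattice[y][x] += 1


L = 4
lattice = [[0] * L for _ in range(L)]
rng = random.Random(42)
print(sca_sweep(lattice, L, 1.0, count_update, rng))  # 16: every site once
```

With $p = 1$ the sweep is a deterministic checkerboard update. With $p < 1$, which sites get updated becomes random.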

  7. Parallel random sequential updates are hard. Why should we care about them?

  8. Auto-Correlation of a Lattice Gas. $C(t,s) = \langle\phi(t)\phi(s)\rangle - \langle\phi(t)\rangle\langle\phi(s)\rangle$, with $t$ the time and $s$ the waiting time. [Figure: $C(t,s)\cdot s^{0.76}$ vs. $t/s$, log-log; Random Sequential]

  9. Auto-Correlation of a Lattice Gas. [Figure: $C(t,s)\cdot s^{0.76}$ vs. $t/s$, log-log; Random Sequential compared with Checkerboard SCA and the SCA limit (correction)]
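Below is a minimal estimator of $C(t,s)$ from independent samples. It assumes that $\langle\cdot\rangle$ denotes the average over those samples. For a lattice gas one would typically also average over sites; this sketch omits that step. The function name is hypothetical.

```python
def autocorrelation(phi_t, phi_s):
    """Estimate C(t, s) = <phi(t) phi(s)> - <phi(t)><phi(s)>.

    phi_t[k] and phi_s[k] are the observable of independent sample k at
    time t and at waiting time s; <.> is the average over the samples.
    """
    n = len(phi_t)
    if n == 0 or n != len(phi_s):
        raise ValueError("need two equally long, non-empty sample lists")
    mean_ts = sum(a * b for a, b in zip(phi_t, phi_s)) / n
    mean_t = sum(phi_t) / n
    mean_s = sum(phi_s) / n
    return mean_ts - mean_t * mean_s


# Perfectly correlated +-1 values give C = 1, uncorrelated ones give C = 0.
print(autocorrelation([1, -1, 1, -1], [1, -1, 1, -1]))  # 1.0
print(autocorrelation([1, 1, -1, -1], [1, -1, 1, -1]))  # 0.0
```

Repeating the estimate for several waiting times $s$ gives the data plotted as $C(t,s)\,s^{0.76}$ against $t/s$.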

  10. KPZ–Equation for Surface Growth. [Figure: interface roughness $W^2$ vs. $t$ in Monte Carlo steps (MCS), log-log, $10^1$ to $10^7$ MCS, for the 2+1 D octahedron model, slope $2\beta_{\rm eff}$; Ódor, G., Liedke, B., Heinig, K.-H. Phys. Rev. E 79 021125 (2009)] Kardar–Parisi–Zhang stochastic differential equation: $\partial_t h(\mathbf{x},t) = v + \sigma\nabla^2 h(\mathbf{x},t) + \lambda[\nabla h(\mathbf{x},t)]^2 + \eta(\mathbf{x},t)$, with mean growth velocity $v$, surface tension $\sigma$, local growth velocity $\lambda$ and noise $\eta$. Kardar, M., Parisi, G., Zhang, Y.-C. Phys. Rev. Lett. 56 889 (1986)
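For intuition, a naive explicit discretization of the KPZ equation fits in a few lines. This sketch rests on several assumptions:

- it works in 1+1 dimensions instead of the 2+1 D of the octahedron model;
- the lattice spacing is 1 and the boundaries are periodic;
- the function name is hypothetical.

Direct discretizations of this kind are known to become unstable for large $\lambda$ or large time steps, so `dt` should be kept small.

```python
import math
import random


def kpz_step(h, dt, v, sigma, lam, noise_amp, rng):
    """One explicit Euler-Maruyama step of the KPZ equation in 1+1 dimensions,
    dh/dt = v + sigma * d2h/dx2 + lam * (dh/dx)^2 + eta,
    on a periodic chain with lattice spacing 1.

    eta is Gaussian white noise; noise_amp is its amplitude, so each step adds
    noise_amp * sqrt(dt) * N(0, 1) per site.
    """
    n = len(h)
    new_h = []
    for i in range(n):
        left, right = h[i - 1], h[(i + 1) % n]   # periodic neighbours
        laplacian = left - 2.0 * h[i] + right      # surface tension term
        gradient = 0.5 * (right - left)            # central difference
        eta = noise_amp * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        growth = v + sigma * laplacian + lam * gradient * gradient
        new_h.append(h[i] + dt * growth + eta)
    return new_h


rng = random.Random(1)
h = [0.0] * 64
for _ in range(200):
    h = kpz_step(h, dt=0.05, v=0.0, sigma=1.0, lam=0.5, noise_amp=1.0, rng=rng)
print(sum(h) / len(h))  # mean height after 200 steps
```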

  11. β and the Kim–Kosterlitz Hypothesis: $\beta = 1/4$? (Kim, J. M., Kosterlitz, J. M. Phys. Rev. Lett. 62 2289 (1989)). Octahedron model: $\Delta h = \pm 1$; restricted solid-on-solid (RSOS) model: $|\Delta h| \le N$. $\beta \approx 1/4$ for $N > 1$? $\beta < 1/4$. [Figure: $\beta_{\rm eff}$ (0.236 to 0.246) vs. inverse time; curves labeled 12, 13, 16, 17] Kelling, J., Ódor, G. Phys. Rev. E 84 061150 (2011); Kim, J. M. J. Korean Phys. Soc. 67 (9) 1529 (2015). We need more states.
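The effective exponent $\beta_{\rm eff}$ is a local slope of the roughness curve. Below is a minimal sketch that assumes $W^2 \propto t^{2\beta}$ and uses consecutive data points. The function name is hypothetical.

```python
import math


def effective_beta(times, w2):
    """Local effective growth exponent from a roughness curve.

    Assumes W^2(t) ~ t^(2*beta) in the growth regime, so beta_eff between
    consecutive points is half the local slope of ln W^2 versus ln t.
    """
    betas = []
    for i in range(1, len(times)):
        slope = math.log(w2[i] / w2[i - 1]) / math.log(times[i] / times[i - 1])
        betas.append(0.5 * slope)
    return betas


times = [10.0, 100.0, 1000.0, 10000.0]
w2 = [t ** 0.5 for t in times]   # synthetic data with beta = 1/4 exactly
print(effective_beta(times, w2))  # values equal to 0.25 up to rounding
```

Extrapolating $\beta_{\rm eff}$ towards $1/t \to 0$ then gives an estimate of the asymptotic value.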

  12. Part 2: Trivial parallelism vs. SIMT. Handling more states.

  13. Trivial parallelism vs. SIMT: efficient simulation of independent copies. A vector of 32, ..., 128, 256, ... layers, depending on the application, leads to:
    - "random" accesses to vectors in global memory;
    - no caching of the simulation state required;
    - very efficient use of GPUs (vector processors / data parallelism).
  Ito, N., Kanada, Y. Supercomputer 3 (25) 1988
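The "vector of layers" idea can be illustrated in the spirit of multi-spin coding, with one bit per independent copy. For a two-state site, 32 copies fit in one 32-bit word, and a single bitwise operation updates the same site in all of them. This is a minimal sketch with hypothetical names. How `accept_mask` follows from the neighbours depends on the model and is not shown.

```python
N_COPIES = 32                   # one bit per independent copy (assumed layout)
FULL = (1 << N_COPIES) - 1


def make_lattice(L):
    """L*L sites; bit k of a site's word is that site's state in copy k."""
    return [0] * (L * L)


def flip_site(lattice, L, x, y, accept_mask):
    """Update site (x, y) in all copies with a single bitwise operation:
    the site is flipped in exactly those copies whose bit is set in accept_mask."""
    i = y * L + x
    lattice[i] = (lattice[i] ^ accept_mask) & FULL


def site_in_copy(lattice, L, x, y, k):
    """Read the state of site (x, y) in copy k."""
    return (lattice[y * L + x] >> k) & 1


L = 4
lattice = make_lattice(L)
flip_site(lattice, L, 1, 2, 0b101)   # flip (1, 2) in copies 0 and 2 only
print([site_in_copy(lattice, L, 1, 2, k) for k in range(3)])  # [1, 0, 1]
```

Every copy is updated at the same coordinate. As a result, copies started from identical initial conditions under a deterministic rule would stay identical.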

  14. Trivial parallelism vs. SIMT: efficient simulation of independent copies. Trivially parallel → Multi-Surface:
    - large samples ⇒ good statistics;
    - large parameter studies;
    - large sets of initial conditions;
    - + random site selection.
  Ito, N., Kanada, Y. Supercomputer 3 (25) 1988

  15. Multi-Surface Approach for GPUs. Double tiling at the device layer (tiles labeled 1 2 / 4 3, repeated) with random origin; multi-surface at the block layer. [Diagram: global memory shared by multi-processors 1 ... N, each with up to 48 kB of shared memory and threads 1 ... M that synchronize (sync) between steps]
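The double tiling with a random origin can be sketched as a map from a site to its domain set 1 to 4. The sketch relies on these assumptions:

- tiles are square;
- the system size $L$ is a multiple of twice the tile size;
- the origin is redrawn once per sweep;
- all names are hypothetical.

```python
import random

PATTERN = [[1, 2], [4, 3]]   # domain-set labels as drawn on the slide


def domain_set(x, y, L, tile, origin):
    """Domain set (1-4) of site (x, y) in a double tiling shifted by origin.

    Assumes square tiles of side `tile` and L a multiple of 2 * tile, so the
    2x2 pattern of sets is periodic. Tiles of the same set never share an
    edge or a corner, so for short-range rules and large enough tiles all
    domains of one set could be processed concurrently (e.g. one per block).
    """
    ox, oy = origin
    tx = ((x - ox) % L) // tile
    ty = ((y - oy) % L) // tile
    return PATTERN[ty % 2][tx % 2]


def random_origin(L, rng):
    """Draw a new random origin for the tiling (assumed: once per sweep)."""
    return (rng.randrange(L), rng.randrange(L))


L, tile = 8, 2
rng = random.Random(3)
origin = random_origin(L, rng)
for y in range(L):
    print(" ".join(str(domain_set(x, y, L, tile, origin)) for x in range(L)))
```

Redrawing the origin moves the domain borders, so sites near a border are not systematically treated differently.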

  16. Decorrelating Samples. Random site selection is about introducing uncorrelated noise, and we want to average over independent samples.
    - Domain growth, phase ordering (structure evolution): random initial conditions, independent random update acceptance (Boltzmann factors $\exp(-\Delta E/k_BT)$) and quenched disorder ⇒ no problem.
    - Surface growth: flat initial conditions ⇒ all simulations with identical site selection would be identical ⇒ randomly discard every 2nd update.
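When samples are stored as bits of a word, discarding updates at random and independently per sample can be expressed as a random bit mask over the copies. This sketch assumes a bit-coded layout with one copy per bit and uses hypothetical function names. The proposed state would come from the model's update rule.

```python
import random


def acceptance_mask(rng, n_copies=32):
    """Random per-copy acceptance bits for one commonly selected update.

    Each bit is 1 with probability 1/2, independently, so each copy keeps a
    random half of the updates and discards the rest.
    """
    return rng.getrandbits(n_copies)


def apply_masked(old_word, proposed_word, mask):
    """Take the proposed state in copies whose mask bit is set, keep the old one elsewhere."""
    return (old_word & ~mask) | (proposed_word & mask)


rng = random.Random(7)
old, proposed = 0x00000000, 0xFFFFFFFF   # e.g. proposal: deposit in every copy
new = apply_masked(old, proposed, acceptance_mask(rng))
print(bin(new).count("1"), "of 32 copies performed the update")
```

With flat initial conditions and a shared site sequence, the copies then diverge as soon as they accept different updates.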

  17. Not Decorrelating Samples. Cases where identical noise across samples is desirable: sampling initial conditions; calculating response functions; parallel annealing.

  18. RSOS Multi-Surface. 8 bits per lattice site are enough (4 bits per height difference) ⇒ process 4 packed samples per thread. Word 0 ≡ thread 0 holds samples 0 to 3, word 1 ≡ thread 1 holds samples 4 to 7, and so on; each sample stores the same site $(x, y)$. Randomly select 2 out of 4 samples for each thread ⇒ no idle threads.
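The byte-per-site layout can be sketched as follows. The talk only states 4 bits per height difference and 4 packed samples per thread. Everything else here is an assumption for illustration: the split of the 8 bits into an x and a y height difference, the offset encoding of the signed differences, and all names.

```python
import random

BIAS = 8   # assumed encoding: store a height difference d in [-8, 7] as d + 8


def pack_site(dx, dy):
    """Encode the two 4-bit height differences of one site into 8 bits
    (low nibble: x direction, high nibble: y direction; assumed layout)."""
    if not (-BIAS <= dx < BIAS and -BIAS <= dy < BIAS):
        raise ValueError("height difference does not fit into 4 bits")
    return ((dy + BIAS) << 4) | (dx + BIAS)


def unpack_site(byte):
    return (byte & 0xF) - BIAS, (byte >> 4) - BIAS


def pack_word(site_bytes):
    """Pack one site of 4 samples into a 32-bit word, sample 0 in the lowest byte."""
    if len(site_bytes) != 4:
        raise ValueError("expected 4 samples per word")
    word = 0
    for k, b in enumerate(site_bytes):
        word |= (b & 0xFF) << (8 * k)
    return word


def sample_byte(word, k):
    return (word >> (8 * k)) & 0xFF


def choose_two_of_four(rng):
    """Select which 2 of the thread's 4 samples perform the current update."""
    return sorted(rng.sample(range(4), 2))


word = pack_word([pack_site(1, -1), pack_site(0, 0), pack_site(-1, 1), pack_site(1, 1)])
print(unpack_site(sample_byte(word, 2)))   # (-1, 1)
print(choose_two_of_four(random.Random(5)))
```

Choosing exactly 2 of the 4 samples still discards each sample's update with probability 1/2, and it guarantees that every thread has work. Independent coin flips per sample would leave a thread with nothing to do in 1 out of 16 updates.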

  19. Collective Generation of Random Coordinates. All threads access the same coordinate for each update ⇒ pre-compute a list of update coordinates in shared memory. Each thread computes one component:
    1. generate a random number;
    2. apply the transformations (origin shift, periodic boundary conditions).
  Collectively refill the list when it is used up.
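The shared coordinate list can be sketched sequentially under the following assumptions, all of which are ours:

- domains are square, with side `domain`;
- the domain corner `origin` already includes the random tiling shift;
- the buffer is flat, and thread $t$ computes component $t$ (one reading of "each thread computes one component");
- all names are hypothetical.

```python
import random


def refill(buffer_size, domain, origin, L, rng):
    """Collectively refill a flat coordinate buffer [x0, y0, x1, y1, ...].

    'Thread' t computes component t: it (1) draws a random number inside the
    domain and (2) applies the origin shift and periodic boundary conditions.
    On a GPU the loop body would run once per thread, writing to shared memory.
    """
    buf = [0] * buffer_size
    for t in range(buffer_size):            # stands in for the threads of a block
        axis = t % 2                         # even t: x component, odd t: y component
        r = rng.randrange(domain)            # 1. random number
        buf[t] = (r + origin[axis]) % L      # 2. origin shift + periodic boundaries
    return buf


def update_coordinates(buffer_size, domain, origin, L, rng):
    """Yield the site every thread (and thus every sample) updates next,
    refilling the buffer whenever it is used up."""
    if buffer_size % 2:
        raise ValueError("buffer_size must be even (x and y per coordinate)")
    while True:
        buf = refill(buffer_size, domain, origin, L, rng)
        for i in range(0, buffer_size, 2):
            yield buf[i], buf[i + 1]


coords = update_coordinates(buffer_size=64, domain=4, origin=(6, 6), L=8, rng=random.Random(11))
print([next(coords) for _ in range(3)])
```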

  20. Performance. [Bar chart: update attempts/ns (axis up to 200) for the octahedron, RSOS and Potts models with RS, SCA (p = 0.95 and p = 0.5) and Kawasaki updates, grouped as bit-coded multi-surface (number of states ≤ 4), any number of states, large systems and large samples; bar values shown: 229, 11, 9, 7 and 4.5]

  21. Memory Limits: RSOS. Single-GPU implementations: 64 threads per block ⇒ 256 samples ⇒ 256 B per multi-surface (MS) lattice site ⇒ $2^{12} \times 2^{12}$ sites need 4 GB of global memory (gmem), plus the random number generator states.
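The numbers on this slide follow from one byte per sample and lattice site (8 bits per site, as on the RSOS multi-surface slide) and 4 samples per thread. Here is a small check of the arithmetic, with hypothetical parameter names:

```python
def multisurface_memory(threads_per_block=64, samples_per_thread=4,
                        bytes_per_sample_site=1, log2_L=12):
    """Global-memory footprint of the lattice state (RNG states not included)."""
    samples = threads_per_block * samples_per_thread          # 64 * 4 = 256 samples
    bytes_per_ms_site = samples * bytes_per_sample_site       # 256 B per MS lattice site
    sites = (2 ** log2_L) ** 2                                 # 2^12 x 2^12 sites
    return samples, bytes_per_ms_site, sites * bytes_per_ms_site


samples, per_site, total = multisurface_memory()
print(samples, per_site, total, total / 2 ** 30)   # 256 256 4294967296 4.0
```

The result is $2^{32}$ bytes, i.e. 4 GiB, before the random number generator states are added.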
