Efficient Correlation-Free Many-States Lattice Monte Carlo on GPUs - PowerPoint PPT Presentation

Efficient Correlation-Free Many-States Lattice Monte Carlo on GPUs
Jeffrey Kelling, Géza Ódor, Martin Weigel, Sibylle Gemming
8th May 2017
Member of the Helmholtz Association | Jeffrey Kelling, Géza Ódor, Martin Weigel, Sibylle Gemming | FWIO | http://www.hzdr.de

1 Introduction: What is this talk about? Surface growth, physical aging (and non-equilibrium systems), lattice Monte Carlo
2 Trivial parallelism vs. SIMT

Applications for Monte Carlo: Stochastic Processes
Examples: game theory (e.g. Perc, Matjaž, Eur. J. Phys. 38 (4) 045801 (2017)), sociology, finance, ...
Image and literature credits: http://en.wikipedia.org/wiki/File:Rub_al_Khali_002.JPG; http://hubblesite.org/newscenter/archive/releases/2007/17/image/a; https://www.hzdr.de/db/Cms?pOid=24344&pNid=2707; Müller, T., Heinig, K.-H. et al. Appl. Phys. Lett. 85 2373 (2004)

Non-Equilibrium vs Equilibrium
Equilibrium properties: only the final state is relevant; the optimal algorithm reaches equilibrium quickly.
Out-of-equilibrium: the kinetics are of interest; the optimal algorithm reproduces the physical evolution.
[Figure: 8-states Potts model, disordered state and ordered state; panel labels include kBT = 5 and J]

Non-Equilibrium Systems
Interface roughness:
$$W^2(L,t) = \frac{1}{L^2}\sum_{i}^{L^2} h_i^2(t) - \left(\frac{1}{L^2}\sum_{i}^{L^2} h_i(t)\right)^2$$
where $L$ is the lateral system size and $h_i$ is the surface height at site $i$.
[Figure: interface roughness $W^2$ vs. $t$ [Monte Carlo steps (MCS)], log-log axes, with surface snapshots at 0.1 M, 0.6 M, 20.5 M and 150.05 M MCS]
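The roughness definition on this slide can be evaluated directly on an array of heights. The following is a minimal sketch in plain Python, assuming the heights are stored as a list of L rows with L values each; the function name `roughness_squared` and the storage layout are illustrative and not taken from the talk.

```python
def roughness_squared(heights):
    """Return W^2 = <h^2> - <h>^2, averaged over all sites of the lattice.

    `heights` is a list of rows; this layout is an assumption of the sketch.
    """
    values = [h for row in heights for h in row]
    n = len(values)  # n = L^2 sites
    mean = sum(values) / n
    mean_sq = sum(h * h for h in values) / n
    return mean_sq - mean * mean


flat = [[0, 0], [0, 0]]   # flat surface
rough = [[0, 1], [1, 0]]  # checkerboard of unit steps
print(roughness_squared(flat), roughness_squared(rough))
```

A flat surface has zero roughness, which is why W^2 is a natural measure of how far a growing surface has departed from its flat initial condition.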

Domain Decomposition and Stochastic Cellular Automaton
Random Sequential (RS) on GPU, domain decomposition: + uncorrelated updates; − < 48 kB per domain in smem
Stochastic Cellular Automaton (SCA): update odd/even sublattice with update probability p < 1; + linear memory access ⇒ fast
[Figure: lattice tiled into domains labelled 1 to 4]
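To make the SCA site selection concrete, here is a small sequential sketch in Python. It only shows which sites are picked in one half-step (one sublattice, each site with probability p); the actual model update at a site is omitted. The function name `sca_half_step_sites` and the parity convention (x + y) mod 2 are assumptions for illustration, not the authors' GPU code.

```python
import random


def sca_half_step_sites(size, parity, p, rng):
    """Return the sites updated in one SCA half-step.

    Only sites with (x + y) % 2 == parity are candidates, and each
    candidate is selected independently with probability p.
    """
    selected = []
    for y in range(size):
        for x in range(size):
            if (x + y) % 2 == parity and rng.random() < p:
                selected.append((x, y))
    return selected


rng = random.Random(1)
even_sites = sca_half_step_sites(size=4, parity=0, p=0.5, rng=rng)
odd_sites = sca_half_step_sites(size=4, parity=1, p=0.5, rng=rng)
print(even_sites, odd_sites)
```

Because sites are visited in lattice order, memory access is linear, which is the advantage the slide attributes to SCA; a random sequential scheme would instead draw each site at random.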

Parallel random sequential updates are hard. Why should we care about them?

Auto-Correlation of a Lattice Gas
$$C(t,s) = \langle \phi(t)\,\phi(s) \rangle - \langle \phi(t) \rangle \langle \phi(s) \rangle$$
with $t$, $s$: time, waiting time.
[Figure: $C(t,s)\cdot s^{0.76}$ vs. $t/s$, log-log axes; curves: Random Sequential, Checkerboard SCA, SCA-limit (correction)]

KPZ–Equation for Surface Growth
Kardar–Parisi–Zhang stochastic differential equation:
$$\partial_t h(\mathbf{x},t) = v + \sigma_2 \nabla^2 h(\mathbf{x},t) + \lambda\,[\nabla h(\mathbf{x},t)]^2 + \eta(\mathbf{x},t)$$
Terms, in order: mean growth velocity, surface tension, local growth velocity, noise.
Kardar, M., Parisi, G., Zhang, Y.-C. Phys. Rev. Lett. 56 889 (1986)
[Figure: interface roughness $W^2$ vs. $t$ [Monte Carlo steps (MCS)] for the 2+1 D octahedron model, log-log axes, slope $2\beta_\mathrm{eff}$; Ódor, G., Liedke, B., Heinig, K.-H. Phys. Rev. E 79 021125 (2009)]

β and the Kim–Kosterlitz Hypothesis
β = 1/4? Kim, J. M., Kosterlitz, J. M. Phys. Rev. Lett. 62 2289 (1989)
Octahedron model: Δh = ±1; β < 1/4
Restricted solid-on-solid model: Δh ≤ N; β ≈ 1/4 for N > 1?
[Figure: β_eff vs. 1/t^{1/2}; legend entries 12, 13, 16, 17]
Kelling, J., Ódor, G. Phys. Rev. E 84 061150 (2011); Kim, J. M. J. Korean Phys. Soc. 67 (9) 1529 (2015)
We need more states.

Part 2: Trivial parallelism vs. SIMT. Handling more states.

Trivial parallelism vs. SIMT
Efficient simulation of independent copies: a vector of 32, ..., 128, 256, ... layers, depending on application
⇒ “random” accesses to vectors in global memory
⇒ no caching of simulation state required
⇒ very efficient use of GPUs (vector processors/data parallelism)
Trivially parallel → Multi-Surface:
↦ large samples ⇒ good statistics
↦ large parameter studies
↦ large sets of initial conditions
+ random site-selection
Ito, N., Kanada, Y. Supercomputer 3 (25) 1988

Multi-Surface Approach for GPUs
Double-tiling at device layer, with random origin; Multi-Surface at block layer.
[Figure: lattice tiled into domains labelled 1 to 4; memory hierarchy with global memory, multi-processors 1 ... N each with shared memory (up to 48 kB), and threads 1 ... M separated by sync points]

Decorrelating Samples
Random site-selection is about introducing uncorrelated noise; we want to average over independent samples.
Domain growth, phase ordering (structure evolution): random initial conditions, independent random update acceptance (Boltzmann factors exp ΔE/k_BT), (quenched disorder) ⇒ no problem
Surface growth: flat initial conditions ⇒ all simulations with identical site-selection would be identical
Remedy: randomly discard every 2nd update, i.e. on average half of the updates in each sample
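The surface-growth problem on this slide can be reproduced with a toy example. The sketch below, in plain Python, runs several samples that all see the same site sequence from a flat start. The update rule (incrementing a height) is a placeholder and not a growth model, and the function name `run_samples` and its parameters are hypothetical. With `discard=True`, each sample independently skips each update with probability 1/2, which is one reading of "randomly discard every 2nd update".

```python
import random


def run_samples(n_samples, n_sites, n_updates, discard, seed=0):
    """Toy multi-sample run in which every sample sees the same site sequence.

    The update rule (height += 1) is a placeholder, not a growth model.
    With discard=True each sample skips each update with probability 1/2.
    """
    site_rng = random.Random(seed)      # shared site selection
    skip_rng = random.Random(seed + 1)  # per-sample discard decisions
    samples = [[0] * n_sites for _ in range(n_samples)]  # flat initial state
    for _ in range(n_updates):
        site = site_rng.randrange(n_sites)  # same site for all samples
        for heights in samples:
            if discard and skip_rng.random() < 0.5:
                continue
            heights[site] += 1
    return samples


identical = run_samples(4, 8, 100, discard=False)
decorrelated = run_samples(4, 8, 100, discard=True)
print(all(s == identical[0] for s in identical))
print(all(s == decorrelated[0] for s in decorrelated))
```

Without discarding, the samples are exact copies of each other and averaging over them gains nothing; the random discards give each sample its own noise history at the cost of roughly half the update attempts.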

Not Decorrelating Samples
Cases where identical noise across samples is desirable: sampling initial conditions, calculating response functions, parallel annealing

RSOS Multi-Surface
8 bits per lattice site are enough (4 bits per height difference) ⇒ process 4 packed samples per thread
Layout: word 0 ≡ thread 0 holds samples 0 to 3 at site (x, y); word 1 ≡ thread 1 holds samples 4 to 7 at site (x, y); ...
Randomly select 2 out of 4 samples for each thread ⇒ no idle threads
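The packing described on this slide can be sketched with a few bit operations. The Python code below assumes two 4-bit height differences per site (called `dx` and `dy` here), stored as unsigned values 0 to 15, and four 8-bit sites per 32-bit word. The field order, the helper names and the unsigned encoding are assumptions for illustration; the slide does not specify them.

```python
import random


def pack_site(dx, dy):
    """Pack two 4-bit height differences into one 8-bit site value."""
    assert 0 <= dx < 16 and 0 <= dy < 16
    return (dy << 4) | dx


def pack_word(sites):
    """Pack four 8-bit site values (one per sample) into a 32-bit word."""
    assert len(sites) == 4
    word = 0
    for k, site in enumerate(sites):
        assert 0 <= site < 256
        word |= site << (8 * k)
    return word


def unpack_sample(word, k):
    """Return (dx, dy) of sample k (0..3) stored in `word`."""
    site = (word >> (8 * k)) & 0xFF
    return site & 0xF, site >> 4


word = pack_word([pack_site(1, 2), pack_site(3, 4),
                  pack_site(5, 6), pack_site(7, 8)])
print(unpack_sample(word, 2))  # (5, 6)

# Pick 2 of the 4 packed samples to update in this step.
active = random.Random(0).sample(range(4), 2)
```

Selecting two of the four packed samples per thread keeps every thread busy while still discarding half of the updates in each sample on average.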

Collective Generation of Random Coordinates
All threads access the same coordinate for each update ⇒ pre-compute a list of update coordinates in shared memory
Each thread computes one component:
1 generate random number
2 apply transformations (origin shift, periodic boundary conditions)
Collectively refill the list when used up
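The scheme on this slide can be emulated sequentially to show the data flow. In the Python sketch below, a list stands in for the shared-memory buffer, and each list entry plays the role of the one component computed by one thread. The class name `CoordinateList`, the alternating x/y layout and the use of a single random generator are assumptions of the sketch; on a GPU each thread would use its own generator state and a barrier would follow the refill.

```python
import random


class CoordinateList:
    """Sequential stand-in for a shared-memory list of update coordinates."""

    def __init__(self, size, n_threads, seed=0):
        assert n_threads >= 2 and n_threads % 2 == 0
        self.size = size            # lateral lattice size
        self.n_threads = n_threads  # emulated threads, one component each
        self.rng = random.Random(seed)
        self.components = []
        self.pos = 0

    def _refill(self, origin):
        # Each emulated thread generates one random number and applies the
        # origin shift with periodic boundary conditions.
        self.components = [
            (self.rng.randrange(self.size) + origin[i % 2]) % self.size
            for i in range(self.n_threads)
        ]
        self.pos = 0

    def next_site(self, origin=(0, 0)):
        """Return the next (x, y); refill the list when it is used up."""
        if self.pos + 2 > len(self.components):
            self._refill(origin)
        x, y = self.components[self.pos], self.components[self.pos + 1]
        self.pos += 2
        return x, y


coords = CoordinateList(size=16, n_threads=8, seed=3)
sites = [coords.next_site(origin=(5, 7)) for _ in range(10)]
print(sites)
```

With 8 emulated threads, one refill yields 4 coordinate pairs, so the list is regenerated on every fifth call; larger blocks amortize the refill over more updates.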

Performance
[Figure: bar chart of update attempts/ns for Octahedron RS, Octahedron SCA p = 0.95, Octahedron SCA p = 0.5, RSOS RS, Potts RS and Potts RS Kawasaki. Groups: bit-coded (number of states ≤ 4, large systems) and multi-surface (any number of states, large samples). Bar labels recovered from the chart: 229, 11, 9, 7, 4.5]

Memory Limits: RSOS
Single-GPU implementations: 64 threads per block ⇒ 256 samples ⇒ 256 B / MS lattice site ⇒ $2^{12} \times 2^{12}$ sites need 4 GB of gmem + random number generator states
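The memory estimate on this slide follows from a short calculation, spelled out below in Python. The variable names are illustrative; the inputs (64 threads per block, 4 packed samples per thread, 8 bits per lattice site) come from the RSOS slides, and the random number generator states are not included.

```python
threads_per_block = 64
samples_per_thread = 4     # packed samples per 32-bit word
bytes_per_sample_site = 1  # 8 bits per lattice site

samples = threads_per_block * samples_per_thread
bytes_per_ms_site = samples * bytes_per_sample_site  # per multi-surface site
sites = 2**12 * 2**12
total_bytes = sites * bytes_per_ms_site

print(samples, bytes_per_ms_site, total_bytes / 2**30)  # 256 256 4.0
```

The total is 2^32 bytes, so the 4 GB quoted on the slide is measured in units of 2^30 bytes.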
