Analysis and Optimization of a Molecular Dynamics Code using PAPI and the Vampir Toolchain


SLIDE 1

Center for Information Services and High Performance Computing (ZIH)

Analysis and Optimization of a Molecular Dynamics Code using PAPI and the Vampir Toolchain

May 2, 2012

Thomas William, Zellescher Weg 12, Willers-Bau A 34, +49 351 463 32446, Thomas.William@ZIH.TU-Dresden.de

SLIDE 2

Overview

1. Introduction
2. Serial Analysis
3. PAPI measurements
4. Source Code Analysis
5. Source Code Optimization
6. Tracing and Visualization
7. Conclusion

SLIDE 3

Overview

1. Introduction

SLIDE 4

Introduction

IU ZIH FutureGrid MD

SLIDE 5

Introduction: MD Code

Classical molecular dynamics simulation of dense nuclear matter consisting of either fully ionized atoms or free neutrons and protons.
Main targets are studies of the dense matter in white dwarfs and neutron stars.
Interaction potentials between particles are treated as classical two-particle central potentials.

No complicated particle geometry or orientation

Electrons can be treated as a uniform background charge (not explicitly modeled)

SLIDE 6

MD Code Details

Particle-particle interactions have been implemented in a multitude of ways, located in the different files PP01, PP02, and PP03. The code blocks are selectable using preprocessor macros.

PP01 is the original implementation, with no division into the Ax, Bx, or NBS blocks.
PP02 implements the versions in use by the physicists today:

3 different implementations for the Ax block, 3 implementations for the Bx block, no manual blocking (NBS).

PP03 includes new routines:

3 Ax blocks, 8 Bx blocks; loops can be blocked using the NBS value.

SLIDE 7

MD Code Details

Two sections of code each have two or more variations. One section is labelled A, the other B; the variations are numbered. MD can be compiled as a serial, OpenMP, MPI, or hybrid MPI+OpenMP program, for example:

make MDEF=XRay md_mpi ALG=PP02 BLKA=A0 BLKB=B2 NBS="NBSX=0"

SLIDE 8

MD Workflow

The structure of the algorithm is simple: the program first reads in a parameter file (runmd.in) and an initial particle configuration file (md.in), then calculates the initial set of accelerations and enters a time-stepping loop, a triply nested "do" loop.

SLIDE 9

Runtime Parameters

! Parameters:
  sim_type   = 'ion-mixture',  ! simulation type
  tstart     = 0.00,           ! start time
  dt         = 25.00,          ! time step (fm/c)
! Warmup:
  nwgroup    = 2,              ! groups
  nwsteps    = 50,             ! steps per group
! Measurement:
  ngroup     = 2,              ! groups
  ntot       = 2,              ! per group
  nind       = 25,             ! steps between
  tnormalize = 50,             ! temp normal.
  ncom       = 50,             ! center-of-mass motion cancel.

Figure: Runtime parameters, snippet from runmd.in
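The snippet uses Fortran namelist-style syntax. A minimal sketch of reading such parameters is shown below; the namelist group name, the unit number, and the variable kinds are assumptions here, and the actual reader in MD may differ.

program read_runmd_sketch
  implicit none
  character(len=32) :: sim_type
  double precision  :: tstart, dt
  integer           :: nwgroup, nwsteps, ngroup, ntot, nind, tnormalize, ncom
  ! Group name "params" is an assumption; runmd.in may use a different one.
  namelist /params/ sim_type, tstart, dt, nwgroup, nwsteps, &
                    ngroup, ntot, nind, tnormalize, ncom

  open(unit=10, file='runmd.in', status='old')
  read(10, nml=params)
  close(10)
  print *, 'sim_type = ', trim(sim_type), '  dt = ', dt
end program read_runmd_sketch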

SLIDE 10

The Main Loop

      do 100 ig=1,ngroup
         initialize group ig statistics
         do 40 j=1,ntot
            do i=1,nind
               call newton      ! computes forces
                                ! updates x and v
            enddo
            call vtot
 40      continue
         compute group ig statistics
100   continue

Figure: Simplified version of the main loop

SLIDE 11

MD Implementation Details - newton module

Forces are calculated in the newton module in a pair of nested do-loops

The outer loop runs over target particles, the inner loop over source particles.

Targets are assigned to MPI processes in a round-robin fashion; within each MPI process, the work is shared among OpenMP threads (see the sketch below).
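A minimal schematic of this work distribution is given below. It uses simplified identifiers and a placeholder pair force and is not the actual newton module; myrank and nprocs are assumed to come from MPI_Comm_rank/MPI_Comm_size.

! Sketch of the newton-style work distribution (not the actual MD source):
! targets i are assigned round-robin to MPI ranks, the source loop over j
! is shared among OpenMP threads.
      subroutine forces_sketch(n, myrank, nprocs, x, zii, fj)
      implicit none
      integer, intent(in) :: n, myrank, nprocs
      double precision, intent(in)    :: x(3,0:n-1), zii(0:n-1)
      double precision, intent(inout) :: fj(3,0:n-1)
      double precision :: xx(3), fi(3), r2, fc
      integer :: i, j

      do i = myrank, n-2, nprocs            ! targets: round-robin over MPI ranks
         fi(:) = 0.0d0
!$omp parallel do private(j,xx,r2,fc) reduction(+:fi) schedule(runtime)
         do j = i+1, n-1                    ! sources: shared among OpenMP threads
            xx(:) = x(:,i) - x(:,j)
            r2    = sum(xx*xx)
            fc    = 1.0d0/(r2*sqrt(r2))     ! placeholder, not the MD potential
            fi(:)   = fi(:)   + zii(j)*fc*xx(:)
            fj(:,j) = fj(:,j) - zii(i)*fc*xx(:)
         end do
!$omp end parallel do
         ! ... fi would be accumulated into the total force on particle i ...
      end do
      end subroutine forces_sketch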

SLIDE 12

Overview

2. Serial Analysis

SLIDE 13

XRay - a Cray XT5mTM

Cray XT5m provided by the FutureGrid project.
The XT5m is a 2D mesh of nodes; each node has two sockets, each with four cores.
AMD Opteron 23xx "Shanghai" (45 nm) running at 2.4 GHz.
84 compute nodes with a total of 672 cores.
Compiler: pgi/9.0.4 using the XT PE driver xtpe-shanghai.

SLIDE 14

Time Constraints

5k particles: measurement takes 5 minutes
27k particles: measurement takes 1 hour
55k particles: measurement takes 10 hours

SLIDE 15

Overview PP01, PP02, and PP03

0" 10000" 20000" 30000" 40000" 50000" 60000" P P 1 " A " B " P P 2 " A 1 " B 2 " P P 3 " A " B " 1 6 " P P 3 " A " B 1 " 8 " P P 3 " A " B 2 " 2 " P P 3 " A " B 2 " 2 5 6 " P P 3 " A " B 3 " 1 2 8 " P P 3 " A " B 4 " 6 4 " P P 3 " A " B 5 " 3 2 " P P 3 " A 1 " B " 2 " P P 3 " A 1 " B " 2 5 6 " P P 3 " A 1 " B 1 " 1 2 8 " P P 3 " A 1 " B 2 " 6 4 " P P 3 " A 1 " B 3 " 3 2 " P P 3 " A 1 " B 4 " 1 6 " P P 3 " A 1 " B 5 " 8 " P P 3 " A 1 " B 6 " 2 " P P 3 " A 2 " B " 6 4 " P P 3 " A 2 " B 1 " 3 2 " P P 3 " A 2 " B 2 " 1 6 " P P 3 " A 2 " B 3 " 8 " P P 3 " A 2 " B 4 " 2 " P P 3 " A 2 " B 4 " 2 5 6 " P P 3 " A 2 " B 5 " 1 2 8 "

run$me'in'seconds'

Run$me'for'all'code'combina$ons'

O3" O2" FASTSSE"

Figure : Overview of the runtimes for all (143) code-block combinations with an ion-mix input dataset of 55k particles. The naming scheme for the measurements is ”source-code-file A-block B-block blockingfactor”

SLIDE 16

PP02 Code Blocks

0" 5000" 10000" 15000" 20000" 25000" 30000" 35000" 40000" 45000" 50000" P P 1 " A " B " P P 2 " A " B " P P 2 " A " B 1 " P P 2 " A " B 2 " P P 2 " A 1 " B " P P 2 " A 1 " B 1 " P P 2 " A 1 " B 2 " P P 2 " A 2 " B " P P 2 " A 2 " B 1 " P P 2 " A 2 " B 2 " Run$me'in'seconds' Source'code'file'and'code'block'version'

Run$me,'55k'par$cle'run'

O3" O2" FASTSSE"

SLIDE 17

Annotated Source for -O3

##     do 90 j=i+1,n-1
## !------ Block A ------
## #if defined(A0)
##     r2=0.0d0
        movsd   %xmm2, %xmm1
        movq    %r12, %rdx
        movq    %r15, %rcx
        movl    $8, %eax
        .align  16
.LB2_555:
## lineno: 138

SLIDE 18

Annotated Source for -fast

##     do 90 j=i+1,n-1
## !------ Block A ------
## #if defined(A0)
##     r2=0.0d0
##     do k=1,3
##        xx(k)=x(k,i)-x(k,j)
        movlpd  .BSS2+48(%rip), %xmm2
        movlpd  .C2_291(%rip), %xmm0
        mulsd   %xmm2, %xmm2
        addsd   %xmm1, %xmm2
        movlpd  %xmm2, 344(%rsp)
        sqrtsd  %xmm2, %xmm2
        movlpd  %xmm2, 448(%rsp)
        mulsd   md_globals_10_+120(%rip), %xmm2
        subsd   %xmm2, %xmm0
        .p2align 4,,1

SLIDE 19

Overview

3. PAPI measurements

SLIDE 20

Floating Point Instructions

0" 2E+09" 4E+09" 6E+09" 8E+09" 1E+10" 1.2E+10" A0_B0" A0_B1" A0_B2" A1_B0" A1_B1" A1_B2" A2_B0" A2_B1" A2_B2" #"of"instruc,ons" code"block"

PAPI_FAD_INS"

O2" O3" FAST" 0" 2E+09" 4E+09" 6E+09" 8E+09" 1E+10" 1.2E+10" 1.4E+10" A0_B0" A0_B1" A0_B2" A1_B0" A1_B1" A1_B2" A2_B0" A2_B1" A2_B2" #"of"instruc,ons" code"block"

PAPI_FML_INS"

O2" O3" FAST"

0" 5E+09" 1E+10" 1.5E+10" 2E+10" 2.5E+10" A0_B0" A0_B1" A0_B2" A1_B0" A1_B1" A1_B2" A2_B0" A2_B1" A2_B2" #"of"instruc,ons" code"block"

PAPI_FP_INS"

O2" O3" FAST" 15/34

SLIDE 21

FPU idle times

Figure: FPU idle times per code block (A0_B0 … A2_B2) for -O2, -O3, and -fast, in percent of PAPI-measured total cycles.

SLIDE 22

Branch Miss Predictions

0" 2E+09" 4E+09" 6E+09" 8E+09" 1E+10" 1.2E+10" A0_B0" A0_B1" A0_B2" A1_B0" A1_B1" A1_B2" A2_B0" A2_B1" A2_B2" #"of"instruc,ons" code"block"

PAPI_BR_INS"

O2" O3" FAST" 0.E+00% 1.E+08% 2.E+08% 3.E+08% 4.E+08% 5.E+08% 6.E+08% A0_B0% A0_B1% A0_B2% A1_B0% A1_B1% A1_B2% A2_B0% A2_B1% A2_B2% #"of"instruc,ons" code"block"

PAPI_BR_MSP"

O2% O3% FAST%

0.00%$ 2.00%$ 4.00%$ 6.00%$ 8.00%$ 10.00%$ 12.00%$ 14.00%$ 16.00%$ A0_B0$ A0_B1$ A0_B2$ A1_B0$ A1_B1$ A1_B2$ A2_B0$ A2_B1$ A2_B2$ miss$rate$in$%$ code$block$

Branch$predic4on$miss$rate$

O2$ O3$ FAST$ 17/34

SLIDE 23

Overview

4. Source Code Analysis

SLIDE 24

Block A of the PP02 file

#if defined(A0)
      r2=0.0d0
      do k=1,3
         xx(k)=x(k,i)-x(k,j)
         if (xx(k).gt.+halfl(k)) xx(k)=xx(k)-xl(k)
         if (xx(k).lt.-halfl(k)) xx(k)=xx(k)+xl(k)
         r2=r2+xx(k)*xx(k)
      enddo
#elif defined(A1)
      r2=0.0d0
      do k=1,3
         xx(k)=x(k,i)-x(k,j)
         xx(k)=xx(k)-aint(xx(k)*halfli(k))*xl(k)
         r2=r2+xx(k)*xx(k)
      enddo
#elif defined(A2)
      xx(:)=x(:,i)-x(:,j)
      xx=xx-aint(xx*halfli)*xl
      r2=xx(1)*xx(1)+xx(2)*xx(2)+xx(3)*xx(3)
#else
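All three variants appear to implement the same periodic minimum-image correction. Assuming halfl(k) is half the box length L/2, halfli(k) its reciprocal, and xl(k) the full box length L (these names are not defined on the slide), A1 and A2 replace A0's two branches with a truncation:

$$\Delta x \;\leftarrow\; \Delta x - \operatorname{aint}\!\left(\frac{\Delta x}{L/2}\right) L$$

For |Δx| < L this gives the same result as A0's branches; e.g. with L = 10 and Δx = 7, aint(7/5) = 1 and Δx becomes -3, just as the first if in A0 would produce.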

SLIDE 25

Block B of the PP02 file

#if defined(B0)
      r=sqrt(r2)
      fc=exp(-xmuc*r)*(1./r+xmuc)/r2
      do k=1,3
         fi(k)=fi(k)+zii(j)*fc*xx(k)
         fj(k,j)=fj(k,j)-zii(i)*fc*xx(k)
      enddo
#elif defined(B1)
      r=sqrt(r2)
      fc=exp(-xmuc*r)*(1./r+xmuc)/r2
      fi(:)=fi(:)+zii(j)*fc*xx(:)
      fj(:,j)=fj(:,j)-zii(i)*fc*xx(:)
#elif defined(B2)
      if (r2.le.rcutoff2) then
         r=sqrt(r2)
         fc=exp(-xmuc*r)*(1./r+xmuc)/r2
         fi(:)=fi(:)+zii(j)*fc*xx(:)
         fj(:,j)=fj(:,j)-zii(i)*fc*xx(:)
      endif
#else
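The common factor fc is consistent with the pair force of a screened (Yukawa-type) Coulomb potential; this reading assumes xmuc is the screening parameter and zii(i), zii(j) are the particle charges, which is not stated on the slide:

$$V(r)=\frac{z_i z_j\,e^{-\mu r}}{r},\qquad F_k=-\frac{\partial V}{\partial x_k}=z_i z_j\,e^{-\mu r}\left(\frac{1}{r}+\mu\right)\frac{\Delta x_k}{r^2}$$

With fc = exp(-xmuc*r)*(1./r+xmuc)/r2, the statement fi(k) = fi(k) + zii(j)*fc*xx(k) accumulates exactly this force on particle i; B2 simply skips pairs outside the cut-off sphere (r2 > rcutoff2).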

SLIDE 26

Overview

5. Source Code Optimization

SLIDE 27

OpenMP Overhead in Vampir

SLIDE 28

Code Changes

  • The !$omp parallel do for the j-loop is removed
  • An !$omp parallel region is placed around the i-loop
  • The j-loop is parallelized explicitly
  • The entire ij-loop section is wrapped in a third loop over threads
  • Each iteration of this outer loop is assigned to an OpenMP thread

Old version (OpenMP parallel do on the inner j-loop):

      allocate(fj(3,0:n-1))
      do 100 i=myrank,n-2,nprocs
         fi(:)=0.0d0
!$omp parallel do private(r2,k,xx,r,fc), &
!$omp&            reduction(+:fi), schedule(runtime)
         do 90 j=i+1,n-1
! ------ A-Block ------
            ...

New version (explicit third loop over threads):

!$omp parallel private(nthrd,nj,nr,j0,j1,xx,r2,r,fc,fi)
      nthrd = omp_get_num_threads()
      allocate(fi(3,0:n-1))
!$omp do schedule(static,1)
      do 110 ithrd=0,nthrd-1
         do 100 i=myrank,n-2,nprocs
            fi(:,i)=0.0d0
            nj=(n-i-1)/nthrd
            nr=mod(n-i-1,nthrd)
            j0=(i+1)+ithrd*nj+min(ithrd,nr)
            j1=(i+1)+(ithrd+1)*nj+min(ithrd+1,nr)-1
            do 90 j=j0,j1
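As a sanity check on the index arithmetic above, the minimal standalone sketch below prints each thread's j-range and confirms that the chunks cover i+1 … n-1 exactly once; the values of n, i, and the thread count are arbitrary examples, not taken from the MD runs.

program check_partition
  implicit none
  integer :: n, i, nthrd, ithrd, nj, nr, j0, j1, total

  n     = 23          ! example particle count
  i     = 4           ! example target particle
  nthrd = 4           ! example thread count
  total = 0

  do ithrd = 0, nthrd-1
     ! Same partition formulas as in the new version above.
     nj = (n-i-1)/nthrd
     nr = mod(n-i-1, nthrd)
     j0 = (i+1) + ithrd*nj     + min(ithrd,   nr)
     j1 = (i+1) + (ithrd+1)*nj + min(ithrd+1, nr) - 1
     print '(a,i0,a,i0,a,i0)', 'thread ', ithrd, ': j = ', j0, ' .. ', j1
     total = total + (j1 - j0 + 1)
  end do

  ! The chunk sizes differ by at most one and sum to n-i-1.
  print '(a,i0,a,i0)', 'indices covered: ', total, ' of ', n-i-1
end program check_partition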

SLIDE 29

New MPI/OpenMP Version in Vampir

SLIDE 30

Parallel Efficiency of the OpenMP Application

Figure: Parallel efficiency in percent versus number of OpenMP threads (1-8) for the old and the new code.

SLIDE 31

Parallel Efficiency of the MPI Version

Figure: Parallel efficiency in percent of the MPI-only version versus number of MPI cores (8-672).

SLIDE 32

Parallel Efficiency of the Hybrid Version

Figure: Parallel efficiency in percent of the hybrid MPI+OpenMP version versus number of cores (8-672).

SLIDE 33

Overview

6. Tracing and Visualization

SLIDE 34

8 core inter-node run of the MPI-only version of MD

SLIDE 35

8 core inter-node run of the MPI-only version of MD

SLIDE 36

8 core inter-node run of the MPI-only version of MD

SLIDE 37

8 core inter-node run of the MPI-only version of MD

SLIDE 38

8 core inter-node run of the MPI-only version of MD

SLIDE 39

8 core inter-node run of the MPI-only version of MD

SLIDE 40

Impact of MPI_Allreduce on the scaling

Table: MPI-only, PP02 A0 B2 run, all processes, accumulated exclusive time per function

  function name        672 cores            8 cores
                       time in s      %     time in s      %
  accel_ion_mix          13293.4   72.7       11735.0   97.9
  MPI_Allreduce           1885.8   10.3          44.1    0.4
  sync                     939.4    5.1           0.1      -
  newton                   937.2    5.1           4.9    0.0
  MPI_Bcast                807.1    4.4           8.2    0.1
  vtot_ion_mix             189.6    1.0         188.2    1.6
  MPI_Init                 112.8    0.6           0.1      -
  MPI_Barrier              107.5    0.6           1.2      -

SLIDE 41

Overview

7. Conclusion

SLIDE 42

Conclusion

The analysis of MD found the best serial version.

Using a cut-off sphere reduces the number of interactions to 19%.

The runtime decreases although the performance counters indicate otherwise.

The OpenMP parallelization was moved from the inner to the outer loop.

This improved the parallel efficiency: a dual-socket, dual-chip, 16-thread AMD Interlagos system shows a parallel efficiency of 97.6% up to 32 cores.

The changes to the source code will be included in future versions of the SPEC OpenMP benchmark.

SLIDE 43

Questions?

Thank you for your attention.

SLIDE 44

AMD Interlagos OpenMP Efficiency

0" 0.1" 0.2" 0.3" 0.4" 0.5" 0.6" 0.7" 0.8" 0.9" 1" 1.1" 1" 3" 5" 7" 9" 11" 13" 15" 17" 19" 21" 23" 25" 27" 29" 31" 33" 35" 37" 39" 41" 43" 45" 47" 49" 51" 53" 55" 57" 59" 61" 63" Parallel"Efficiency" Number"of"OpenMP"Threads"

Parallel"Efficiency"

SLIDE 45

Kraken - Old Version

SLIDE 46

Introduction

The code has been extended and rewritten multiple times; there are several different implementations of the same semantics.

Loops were reimplemented using Fortran array syntax.
A cut-off sphere restricts interactions to a subset of relatively nearby nucleons/ions.

The user must specify which variation of the force calculation routines to use when building the code.

Simulation types: nucleon, pure-ion, ion-mixture

SLIDE 47

672 core run of the MPI-only version of MD

SLIDE 48

8 core inter-node run of the MPI-only version of MD

SLIDE 49
