Molecular Dynamics (MD) on GPUs
Feb. 2, 2017
Accelerating Discoveries
Using a supercomputer powered by the Tesla Platform with over 3,000 Tesla accelerators, University of Illinois scientists performed the first all-atom simulation of the HIV virus and discovered the chemical structure of its capsid, “the perfect target for fighting the infection.” Without GPUs, the supercomputer would need to be 5x larger for similar performance.
Overview of Life & Material Accelerated Apps
MD: All key codes are GPU-accelerated
- Great multi-GPU performance
- Focus on dense (up to 16) GPU nodes and/or a large number of GPU nodes
- ACEMD*, AMBER (PMEMD)*, BAND, CHARMM, DESMOND, ESPResSo, Folding@Home, GPUgrid.net, GROMACS, HALMD, HOOMD-Blue*, LAMMPS, Lattice Microbes*, mdcore, MELD, miniMD, NAMD, OpenMM, PolyFTS, SOP-GPU* & more
QC: All key codes are ported or being optimized
- Focus on using GPU-accelerated math libraries and OpenACC directives
- GPU-accelerated and available today: ABINIT, ACES III, ADF, BigDFT, CP2K, GAMESS, GAMESS-UK, GPAW, LATTE, LSDalton, LSMS, MOLCAS, MOPAC2012, NWChem, OCTOPUS*, PEtot, QUICK, Q-Chem, QMCPack, Quantum Espresso/PWscf, TeraChem*
- Active GPU acceleration projects: CASTEP, GAMESS, Gaussian, ONETEP, Quantum Supercharger Library*, VASP & more
green* = application where >90% of the workload is on GPU
MD vs. QC on GPUs
“Classical” Molecular Dynamics | Quantum Chemistry (MO, PW, DFT, Semi-Emp)
Simulates positions of atoms over time; chemical-biological or chemical-material behaviors | Calculates electronic properties; ground state, excited states, spectral properties, making/breaking bonds, physical properties
Forces calculated from simple empirical formulas (bond rearrangement generally forbidden) | Forces derived from electron wave function (bond rearrangement OK, e.g., bond energies)
Up to millions of atoms | Up to a few thousand atoms
Solvent included without difficulty | Generally in a vacuum but if needed, solvent treated classically (QM/MM) or using implicit methods
Single precision dominated | Double precision is important
Uses cuBLAS, cuFFT, CUDA | Uses cuBLAS, cuFFT, OpenACC
GeForce (Workstations), Tesla (Servers) | Tesla recommended
ECC off | ECC on
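To make "forces calculated from simple empirical formulas" concrete, here is a minimal Python sketch of classical MD in one dimension: two atoms joined by a harmonic bond and advanced with velocity Verlet. Every constant and function name here (K_BOND, R0, bond_force, velocity_verlet) is a hypothetical illustration, not taken from any of the codes listed in this deck, and real force fields add angle, torsion, and non-bonded terms.

```python
# Toy 1-D classical MD: two atoms joined by a harmonic bond.
# All parameters are illustrative, in arbitrary units.
K_BOND = 100.0   # bond force constant
R0 = 1.0         # equilibrium bond length
MASS = 1.0       # mass of each atom
DT = 0.001       # integration time step

def bond_force(x1, x2):
    """Forces on atom 1 and atom 2 from E = 0.5 * K_BOND * (r - R0)**2."""
    r = x2 - x1
    f1 = K_BOND * (r - R0)   # a stretched bond pulls atom 1 toward atom 2
    return f1, -f1

def velocity_verlet(x, v, n_steps):
    """Advance positions x and velocities v (lists of two floats)."""
    f = bond_force(x[0], x[1])
    for _ in range(n_steps):
        v = [v[i] + 0.5 * DT * f[i] / MASS for i in range(2)]  # half kick
        x = [x[i] + DT * v[i] for i in range(2)]               # drift
        f = bond_force(x[0], x[1])                             # new forces
        v = [v[i] + 0.5 * DT * f[i] / MASS for i in range(2)]  # half kick
    return x, v

def total_energy(x, v):
    """Bond potential energy plus kinetic energy."""
    r = x[1] - x[0]
    kinetic = sum(0.5 * MASS * vi * vi for vi in v)
    return 0.5 * K_BOND * (r - R0) ** 2 + kinetic

x0, v0 = [0.0, 1.2], [0.0, 0.0]      # bond starts stretched by 0.2
x1, v1 = velocity_verlet(x0, v0, 1000)
print(total_energy(x0, v0), total_energy(x1, v1))
```

The bond topology is fixed in advance, which is why bond rearrangement is generally out of reach for this class of method. The total energy should stay close to its starting value if the time step is small enough.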
GPU-Accelerated Molecular Dynamics Apps
ACEMD, AMBER, CHARMM, DESMOND, ESPResSo, Folding@Home, GPUGrid.net, GROMACS, HALMD, HOOMD-Blue, LAMMPS, mdcore, MELD, NAMD, OpenMM, PolyFTS
Green lettering indicates performance slides included.
GPU performance compared against dual-socket multi-core x86 CPUs.
Benefits of MD GPU-Accelerated Computing
- 3x-8x Faster than CPU only systems in all tests (on average)
- Most major compute intensive aspects of classical MD ported
- Large performance boost with marginal price increase
- Energy usage cut by more than half
- GPUs scale well within a node and/or over multiple nodes
- K80 GPU is our fastest and lowest power high performance GPU yet
Try GPU accelerated MD apps for free – www.nvidia.com/GPUTestDrive
Why wouldn’t you want to turbocharge your research?
ACEMD
www.acellera.com
- 470 ns/day on 1 GPU for L-Iduronic acid (1362 atoms)
- 116 ns/day on 1 GPU for DHFR (23K atoms)
M. Harvey, G. Giupponi and G. De Fabritiis, ACEMD: Accelerated molecular dynamics simulations in the microseconds timescale, J. Chem. Theory Comput. 5, 1632 (2009)
NVT, NPT, PME, TCL, PLUMED, CAMSHIFT [1]
[1] M. J. Harvey and G. De Fabritiis, An implementation of the smooth particle-mesh Ewald (PME) method on GPU hardware, J. Chem. Theory Comput., 5, 2371–2377 (2009)
[2] For a list of selected references see http://www.acellera.com/acemd/publications
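As a rough guide to what these throughput figures mean for microsecond-scale work, the following Python sketch converts ns/day into wall-clock days per microsecond of simulated time. The function name `days_to_simulate` and the 1 µs target are illustrative choices, not part of ACEMD.

```python
def days_to_simulate(target_ns, ns_per_day):
    """Wall-clock days needed to reach target_ns of simulated time."""
    return target_ns / ns_per_day

# Single-GPU ACEMD throughput quoted above (ns/day).
acemd_ns_per_day = {
    "L-Iduronic acid (1362 atoms)": 470.0,
    "DHFR (23K atoms)": 116.0,
}

ONE_MICROSECOND_NS = 1000.0
for system, rate in acemd_ns_per_day.items():
    days = days_to_simulate(ONE_MICROSECOND_NS, rate)
    print(f"{system}: {days:.1f} days per microsecond")
```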
June 2017
AMBER 16
JAC_NVE on GP100s
Running AMBER version 16. The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
23,558 atoms, PME, 2fs. Performance in ns/day:
- 1 node + 1x GP100 per node (PCIe): 320.19
- 1 node + 1x GP100 per node (NVLink): 320.14
- 1 node + 2x GP100 per node (PCIe): 370.32
- 1 node + 2x GP100 per node (NVLink): 404.09
JAC_NVE on GP100s
Running AMBER version 16. The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
23,558 atoms, PME, 4fs. Performance in ns/day:
- 1 node + 1x GP100 per node (PCIe): 614.42
- 1 node + 1x GP100 per node (NVLink): 613.16
- 1 node + 2x GP100 per node (PCIe): 714.23
- 1 node + 2x GP100 per node (NVLink): 782.11
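Reading the 2fs and 4fs labels on the two JAC_NVE slides as the integration time step (an assumption, since the slides do not spell it out), the ns/day figures can be converted into integration steps per wall-clock second. This is a minimal Python sketch; the helper name `steps_per_second` is ours and not part of AMBER.

```python
FS_PER_NS = 1_000_000      # femtoseconds per nanosecond
SECONDS_PER_DAY = 86_400

def steps_per_second(ns_per_day, timestep_fs):
    """Convert simulated ns/day into integration steps per wall-clock second."""
    steps_per_day = ns_per_day * FS_PER_NS / timestep_fs
    return steps_per_day / SECONDS_PER_DAY

# 1x GP100 (PCIe) results from the JAC_NVE slides.
rate_2fs = steps_per_second(320.19, 2.0)
rate_4fs = steps_per_second(614.42, 4.0)
print(f"2fs run: {rate_2fs:.0f} steps/s, 4fs run: {rate_4fs:.0f} steps/s")
```

Under that assumption both runs execute a similar number of steps per second, so nearly all of the ns/day gain comes from the longer step rather than from faster step execution.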
JAC_NPT on GP100s
Running AMBER version 16. The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
23,558 atoms, PME, 2fs. Performance in ns/day:
- 1 node + 1x GP100 per node (PCIe): 295.75
- 1 node + 1x GP100 per node (NVLink): 295.42
- 1 node + 2x GP100 per node (PCIe): 333.03
- 1 node + 2x GP100 per node (NVLink): 360.64
JAC_NPT on GP100s
Running AMBER version 16. The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
23,558 atoms, PME, 4fs. Performance in ns/day:
- 1 node + 1x GP100 per node (PCIe): 580.47
- 1 node + 1x GP100 per node (NVLink): 578.48
- 1 node + 2x GP100 per node (PCIe): 654.66
- 1 node + 2x GP100 per node (NVLink): 706.53
FactorIX_NVE on GP100s
Running AMBER version 16. The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
90,906 atoms, PME. Performance in ns/day:
- 1 node + 1x GP100 per node (PCIe): 106.23
- 1 node + 1x GP100 per node (NVLink): 105.98
- 1 node + 2x GP100 per node (PCIe): 142.45
- 1 node + 2x GP100 per node (NVLink): 166.61
FactorIX_NPT on GP100s
Running AMBER version 16. The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
90,906 atoms, PME. Performance in ns/day:
- 1 node + 1x GP100 per node (PCIe): 102.27
- 1 node + 1x GP100 per node (NVLink): 102.26
- 1 node + 2x GP100 per node (PCIe): 126.75
- 1 node + 2x GP100 per node (NVLink): 146.34
Cellulose_NVE on GP100s
Running AMBER version 16. The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
408,609 atoms, PME. Performance in ns/day:
- 1 node + 1x GP100 per node (PCIe): 24.01
- 1 node + 1x GP100 per node (NVLink): 24.02
- 1 node + 2x GP100 per node (PCIe): 31.35
- 1 node + 2x GP100 per node (NVLink): 36.91
Cellulose_NPT on GP100s
Running AMBER version 16. The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
408,609 atoms, PME. Performance in ns/day:
- 1 node + 1x GP100 per node (PCIe): 22.76
- 1 node + 1x GP100 per node (NVLink): 22.8
- 1 node + 2x GP100 per node (PCIe): 28.76
- 1 node + 2x GP100 per node (NVLink): 32.37
STMV_NPT on GP100s
Running AMBER version 16. The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
1,067,095 atoms, PME. Performance in ns/day:
- 1 node + 1x GP100 per node (PCIe): 15.64
- 1 node + 1x GP100 per node (NVLink): 15.43
- 1 node + 2x GP100 per node (PCIe): 20.22
- 1 node + 2x GP100 per node (NVLink): 23.13
TRPCAGE on GP100s
Running AMBER version 16. The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
304 atoms, GB. Performance in ns/day:
- 1 node + 1x GP100 per node (PCIe): 1216.56
- 1 node + 1x GP100 per node (NVLink): 1187.3
Myoglobin on GP100s
Running AMBER version 16. The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
2,492 atoms, GB. Performance in ns/day:
- 1 node + 1x GP100 per node (PCIe): 470.41
- 1 node + 1x GP100 per node (NVLink): 458.28
- 1 node + 2x GP100 per node (PCIe): 443.49
- 1 node + 2x GP100 per node (NVLink): 447.23
Nucleosome on GP100s
Running AMBER version 16. The green nodes contain Dual Intel(R) Core(TM) i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
25,095 atoms, GB. Performance in ns/day:
- 1 node + 1x GP100 per node (PCIe): 11.47
- 1 node + 1x GP100 per node (NVLink): 11.29
- 1 node + 2x GP100 per node (PCIe): 21.29
- 1 node + 2x GP100 per node (NVLink): 20.51
February 2017
AMBER 16
PME-Cellulose_NPT on K80s
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs. 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
PME-Cellulose_NPT, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 2.35
- 1 node + 1x K80 per node: 11.36 (4.8X)
- 1 node + 2x K80 per node: 15.43 (6.6X)
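The speedup labels on these charts appear to be the GPU ns/day divided by the CPU-only ns/day, rounded to one decimal place. A minimal Python sketch of that arithmetic, using the PME-Cellulose_NPT K80 numbers above (the function name `speedups` and the dictionary labels are illustrative):

```python
def speedups(baseline_ns_per_day, gpu_ns_per_day):
    """Return each GPU result as a multiple of the CPU-only baseline."""
    return {
        label: round(value / baseline_ns_per_day, 1)
        for label, value in gpu_ns_per_day.items()
    }

cpu_only = 2.35  # 1 Broadwell node, ns/day
k80_results = {"1x K80": 11.36, "2x K80": 15.43}

k80_speedups = speedups(cpu_only, k80_results)
for label, factor in k80_speedups.items():
    print(f"{label}: {factor}X")
```

The same check can be applied to any slide in this section to confirm that a speedup label belongs to a given bar.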
PME-Cellulose_NPT on P100s PCIe
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs. 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
PME-Cellulose_NPT, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 2.35
- 1 node + 1x P100 PCIe (16GB) per node: 21.85 (9.3X)
- 1 node + 2x P100 PCIe (16GB) per node: 30.00 (12.8X)
PME-Cellulose_NPT on P100s SXM2
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs. 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
PME-Cellulose_NPT, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 2.35
- 1 node + 1x P100 SXM2 per node: 23.37 (9.9X)
- 1 node + 2x P100 SXM2 per node: 32.22 (13.7X)
- 1 node + 4x P100 SXM2 per node: 36.65 (15.6X)
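The 1x, 2x and 4x P100 SXM2 results also show how well a single simulation scales across GPUs in one node. One common way to express this is parallel efficiency, the measured throughput divided by ideal linear scaling from the 1-GPU result. The Python sketch below uses that definition, which is our choice and not a metric reported on the slide.

```python
def parallel_efficiency(ns_per_day_by_gpus):
    """Efficiency of n GPUs relative to perfect scaling from the 1-GPU run."""
    single = ns_per_day_by_gpus[1]
    return {
        n: rate / (n * single)
        for n, rate in ns_per_day_by_gpus.items()
    }

# PME-Cellulose_NPT on P100 SXM2, ns/day keyed by GPU count.
cellulose_npt_sxm2 = {1: 23.37, 2: 32.22, 4: 36.65}

efficiency = parallel_efficiency(cellulose_npt_sxm2)
for n, eff in sorted(efficiency.items()):
    print(f"{n} GPU(s): {eff:.0%} of linear scaling")
```

Efficiency falls as GPUs are added to this 408,609-atom run. Where throughput is the goal, running independent simulations on each GPU may use the hardware better than spreading one run across all of them.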
PME-Cellulose_NVE on K80s
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs. 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
PME-Cellulose_NVE, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 2.47
- 1 node + 1x K80 per node: 11.85 (4.8X)
- 1 node + 2x K80 per node: 16.53 (6.7X)
PME-Cellulose_NVE on P100s PCIe
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs. 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
PME-Cellulose_NVE, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 2.47
- 1 node + 1x P100 PCIe (16GB) per node: 23.34 (9.4X)
- 1 node + 2x P100 PCIe (16GB) per node: 32.55 (13.2X)
PME-Cellulose_NVE on P100s SXM2
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs. 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
PME-Cellulose_NVE, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 2.47
- 1 node + 1x P100 SXM2 per node: 24.94 (10.1X)
- 1 node + 2x P100 SXM2 per node: 35.16 (14.2X)
- 1 node + 4x P100 SXM2 per node: 40.88 (16.6X)
PME-FactorIX_NPT on K80s
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs. 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
PME-FactorIX_NPT, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 11.43
- 1 node + 1x K80 per node: 48.54 (4.2X)
- 1 node + 2x K80 per node: 66.68 (5.8X)
PME-FactorIX_NPT on P100s PCIe
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs. 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
PME-FactorIX_NPT, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 11.43
- 1 node + 1x P100 PCIe (16GB) per node: 98.77 (8.6X)
- 1 node + 2x P100 PCIe (16GB) per node: 132.86 (11.6X)
PME-FactorIX_NPT on P100s SXM2
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs. 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
PME-FactorIX_NPT, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 11.43
- 1 node + 1x P100 SXM2 per node: 106.25 (9.3X)
- 1 node + 2x P100 SXM2 per node: 144.11 (12.6X)
- 1 node + 4x P100 SXM2 per node: 159.80 (14.0X)
PME-FactorIX_NVE on K80s
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs. 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
PME-FactorIX_NVE, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 11.98
- 1 node + 1x K80 per node: 51.14 (4.3X)
- 1 node + 2x K80 per node: 71.49 (6.0X)
PME-FactorIX_NVE on P100s PCIe
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs. 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
PME-FactorIX_NVE, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 11.98
- 1 node + 1x P100 PCIe (16GB) per node: 105.86 (8.8X)
- 1 node + 2x P100 PCIe (16GB) per node: 145.83 (12.2X)
PME-FactorIX_NVE on P100s SXM2
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs. 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
PME-FactorIX_NVE, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 11.98
- 1 node + 1x P100 SXM2 per node: 114.88 (9.6X)
- 1 node + 2x P100 SXM2 per node: 159.24 (13.3X)
- 1 node + 4x P100 SXM2 per node: 178.02 (14.9X)
PME-JAC_NPT on K80s
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs. 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
PME-JAC_NPT, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 45.89
- 1 node + 1x K80 per node: 162.09 (3.5X)
- 1 node + 2x K80 per node: 216.78 (4.7X)
PME-JAC_NPT on P100s PCIe
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs. 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
PME-JAC_NPT, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 45.89
- 1 node + 1x P100 PCIe (16GB) per node: 283.60 (6.2X)
- 1 node + 2x P100 PCIe (16GB) per node: 327.69 (7.1X)
PME-JAC_NPT on P100s SXM2
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs. 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
PME-JAC_NPT, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 45.89
- 1 node + 1x P100 SXM2 per node: 310.52 (6.8X)
- 1 node + 2x P100 SXM2 per node: 360.64 (7.9X)
- 1 node + 4x P100 SXM2 per node: 423.09 (9.2X)
PME-JAC_NVE on K80s
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs. 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
PME-JAC_NVE, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 47.90
- 1 node + 1x K80 per node: 173.20 (3.6X)
- 1 node + 2x K80 per node: 234.99 (4.9X)
PME-JAC_NVE on P100s PCIe
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs. 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
PME-JAC_NVE, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 47.90
- 1 node + 1x P100 PCIe (16GB) per node: 308.46 (6.4X)
- 1 node + 2x P100 PCIe (16GB) per node: 363.79 (7.6X)
PME-JAC_NVE on P100s SXM2
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs. 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
PME-JAC_NVE, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 47.90
- 1 node + 1x P100 SXM2 per node: 339.81 (7.1X)
- 1 node + 2x P100 SXM2 per node: 402.18 (8.4X)
- 1 node + 4x P100 SXM2 per node: 473.10 (9.9X)
GB-Myoglobin on K80s
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs. 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
GB-Myoglobin, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 28.86
- 1 node + 1x K80 per node: 288.47 (10.0X)
- 1 node + 2x K80 per node: 339.45 (11.8X)
GB-Myoglobin on P100s PCIe
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs. 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
GB-Myoglobin, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 28.86
- 1 node + 1x P100 PCIe (16GB) per node: 483.37 (16.7X)
- 1 node + 4x P100 PCIe (16GB) per node: 561.94 (19.5X)
GB-Myoglobin on P100s SXM2
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs. 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
GB-Myoglobin, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 28.86
- 1 node + 1x P100 SXM2 per node: 534.28 (18.5X)
- 1 node + 4x P100 SXM2 per node: 639.37 (22.2X)
GB-Nucleosome on K80s
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs. 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
GB-Nucleosome, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 0.40
- 1 node + 1x K80 per node: 5.84 (14.6X)
- 1 node + 2x K80 per node: 11.31 (28.3X)
- 1 node + 4x K80 per node: 20.55 (51.4X)
GB-Nucleosome on P100s PCIe
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs. 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
GB-Nucleosome, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 0.40
- 1 node + 1x P100 PCIe (16GB) per node: 11.91 (29.8X)
- 1 node + 2x P100 PCIe (16GB) per node: 22.77 (56.9X)
- 1 node + 4x P100 PCIe (16GB) per node: 39.91 (99.8X)
- 1 node + 8x P100 PCIe (16GB) per node: 45.92 (114.8X)
GB-Nucleosome on P100s SXM2
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs. 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
GB-Nucleosome, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 0.40
- 1 node + 1x P100 SXM2 per node: 13.36 (33.4X)
- 1 node + 2x P100 SXM2 per node: 25.53 (63.8X)
- 1 node + 4x P100 SXM2 per node: 46.29 (115.7X)
- 1 node + 8x P100 SXM2 per node: 48.29 (120.7X)
Rubisco-75K on K80s
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs. 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
Rubisco-75K, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 0.01
- 1 node + 1x K80 per node: 0.35 (35.0X)
- 1 node + 2x K80 per node: 0.69 (69.0X)
- 1 node + 4x K80 per node: 1.34 (134.0X)
Rubisco-75K on P100s PCIe
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs. 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
Rubisco-75K, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 0.01
- 1 node + 1x P100 PCIe (16GB) per node: 0.71 (71.0X)
- 1 node + 2x P100 PCIe (16GB) per node: 1.40 (140.0X)
- 1 node + 4x P100 PCIe (16GB) per node: 2.69 (269.0X)
- 1 node + 8x P100 PCIe (16GB) per node: 4.20 (420.0X)
Rubisco-75K on P100s SXM2
Running AMBER version 16.3. The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs. 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
Rubisco-75K, ns/day (speedup vs. 1 Broadwell node):
- 1 Broadwell node: 0.01
- 1 node + 1x P100 SXM2 per node: 0.80 (80.0X)
- 1 node + 2x P100 SXM2 per node: 1.57 (157.0X)
- 1 node + 4x P100 SXM2 per node: 3.06 (306.0X)
- 1 node + 8x P100 SXM2 per node: 4.46 (446.0X)
AMBER 14
AMBER 14 vs. AMBER 12
Courtesy of Scott Le Grand From GTC 2014 presentation
AMBER 14: large P2P and small Boost Clocks impacts
AMBER 14 (ns/day) on 4x K40; P2P and Boost Clocks Impact. DHFR NVE PME, 2fs Benchmark (CUDA 6.0, ECC off)
- No Boost, No P2P: 2 x Xeon E5-2690 v2@3.00GHz + 4 x Tesla K40@745MHz (no P2P): 125.77
- Boost, No P2P: 2 x Xeon E5-2690 v2@3.00GHz + 4 x Tesla K40@875MHz (no P2P): 132.97
- No Boost, P2P: 2 x Xeon E5-2690 v2@3.00GHz + 4 x Tesla K40@745MHz (P2P): 196.68
- Boost, P2P: 2 x Xeon E5-2690 v2@3.00GHz + 4 x Tesla K40@875MHz (P2P): 215.18
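To separate the two effects on this slide, the four measurements can be treated as a 2x2 comparison against the run with neither P2P nor boost clocks. The Python sketch below computes the percentage gains, assuming (as the clock labels suggest) that 745MHz is the non-boosted and 875MHz the boosted K40 setting; the names `ns_per_day` and `percent_gain` are illustrative.

```python
# ns/day for DHFR NVE on 4x K40, keyed by (boost_clocks, p2p_enabled).
ns_per_day = {
    (False, False): 125.77,
    (True, False): 132.97,
    (False, True): 196.68,
    (True, True): 215.18,
}

def percent_gain(new_rate, old_rate):
    """Relative throughput gain of new_rate over old_rate, in percent."""
    return (new_rate / old_rate - 1.0) * 100.0

baseline = ns_per_day[(False, False)]
boost_only = percent_gain(ns_per_day[(True, False)], baseline)
p2p_only = percent_gain(ns_per_day[(False, True)], baseline)
both = percent_gain(ns_per_day[(True, True)], baseline)

print(f"Boost only: +{boost_only:.1f}%")
print(f"P2P only:   +{p2p_only:.1f}%")
print(f"Both:       +{both:.1f}%")
```

On these numbers, enabling P2P accounts for most of the improvement, which matches the slide title.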
AMBER Performance Over Time
Courtesy of Scott Le Grand From GTC 2014 presentation
Cellulose on K40s, K80s and M6000s
Running AMBER version 14. The blue node contains Dual Intel E5-2698 v3@2.3GHz, 3.6GHz Turbo CPUs. The green nodes contain Dual Intel E5-2698 v3@2.3GHz, 3.6GHz Turbo CPUs + either NVIDIA Tesla K40@875MHz, Tesla K80@562MHz (autoboost), or Quadro M6000@987MHz GPUs.
PME-Cellulose_NVE, simulated time in ns/day (speedup vs. 1 Haswell node):
- 1 Haswell Node: 1.93
- 1 CPU Node + 1x K40: 8.96 (4.6X)
- 1 CPU Node + 0.5x K80: 7.87 (4.1X)
- 1 CPU Node + 1x K80: 11.76 (6.1X)
- 1 CPU Node + 1x M6000: 10.49 (5.4X)
- 1 CPU Node + 2x K40: 13.67 (7.1X)
- 1 CPU Node + 2x K80: 15.38 (8.0X)
- 1 CPU Node + 2x M6000: 14.90 (7.7X)
Factor IX on K40s, K80s and M6000s
Running AMBER version 14. The blue node contains Dual Intel E5-2698 v3@2.3GHz, 3.6GHz Turbo CPUs. The green nodes contain Dual Intel E5-2698 v3@2.3GHz, 3.6GHz Turbo CPUs + either NVIDIA Tesla K40@875MHz, Tesla K80@562MHz (autoboost), or Quadro M6000@987MHz GPUs.
PME-FactorIX_NVE, simulated time in ns/day (speedup vs. 1 Haswell node):
- 1 Haswell Node: 9.68
- 1 CPU Node + 1x K40: 40.48 (4.2X)
- 1 CPU Node + 0.5x K80: 33.59 (3.5X)
- 1 CPU Node + 1x K80: 50.70 (5.2X)
- 1 CPU Node + 1x M6000: 47.80 (5.0X)
- 1 CPU Node + 2x K40: 61.18 (6.4X)
- 1 CPU Node + 2x K80: 60.93 (6.3X)
- 1 CPU Node + 2x M6000: 66.89 (7.0X)
JAC on K40s, K80s and M6000s
Running AMBER version 14. The blue node contains Dual Intel E5-2698 v3@2.3GHz, 3.6GHz Turbo CPUs. The green nodes contain Dual Intel E5-2698 v3@2.3GHz, 3.6GHz Turbo CPUs + either NVIDIA Tesla K40@875MHz, Tesla K80@562MHz (autoboost), or Quadro M6000@987MHz GPUs.
PME-JAC_NVE, simulated time in ns/day (speedup vs. 1 Haswell node):
- 1 Haswell Node: 37.38
- 1 CPU Node + 1x K40: 134.82 (3.6X)
- 1 CPU Node + 0.5x K80: 121.30 (3.2X)
- 1 CPU Node + 1x K80: 174.34 (4.7X)
- 1 CPU Node + 1x M6000: 161.53 (4.3X)
- 1 CPU Node + 2x K40: 200.34 (5.4X)
- 1 CPU Node + 2x K80: 225.34 (6.0X)
- 1 CPU Node + 2x M6000: 219.83 (5.9X)
Cellulose on M40s
Running AMBER version 14. The blue node contains a single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU. The green nodes contain a single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.
PME - Cellulose_NPT, simulated time in ns/day (speedup vs. 1 CPU node):
- 1 Node: 1.07
- 1 Node + 1x M40 per node: 10.12 (9.5X)
- 1 Node + 2x M40 per node: 14.40 (13.5X)
- 1 Node + 4x M40 per node: 15.90 (14.9X)
Cellulose on M40s
Running AMBER version 14. The blue node contains a single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU. The green nodes contain a single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.
PME - Cellulose_NVE, simulated time in ns/day (speedup vs. 1 CPU node):
- 1 Node: 1.07
- 1 Node + 1x M40 per node: 10.50 (9.8X)
- 1 Node + 2x M40 per node: 15.41 (14.4X)
- 1 Node + 4x M40 per node: 17.13 (16.0X)
FactorIX on M40s
Running AMBER version 14. The blue node contains a single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU. The green nodes contain a single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.
PME - FactorIX_NPT, simulated time in ns/day (speedup vs. 1 CPU node):
- 1 Node: 5.38
- 1 Node + 1x M40 per node: 46.90 (8.7X)
- 1 Node + 2x M40 per node: 67.37 (12.5X)
- 1 Node + 4x M40 per node: 72.96 (13.6X)
FactorIX on M40s
Running AMBER version 14. The blue node contains a single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU. The green nodes contain a single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.
PME - FactorIX_NVE, simulated time in ns/day (speedup vs. 1 CPU node):
- 1 Node: 5.47
- 1 Node + 1x M40 per node: 49.33 (9.0X)
- 1 Node + 2x M40 per node: 73.00 (13.3X)
- 1 Node + 4x M40 per node: 80.04 (14.6X)
JAC on M40s
Running AMBER version 14. The blue node contains a single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU. The green nodes contain a single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.
PME - JAC_NPT, simulated time in ns/day (speedup vs. 1 CPU node):
- 1 Node: 20.88
- 1 Node + 1x M40 per node: 149.40 (7.2X)
- 1 Node + 2x M40 per node: 211.97 (10.2X)
- 1 Node + 4x M40 per node: 226.63 (10.9X)
JAC on M40s
Running AMBER version 14. The blue node contains a single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU. The green nodes contain a single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.
PME - JAC_NVE, simulated time in ns/day (speedup vs. 1 CPU node):
- 1 Node: 21.11
- 1 Node + 1x M40 per node: 157.68 (7.5X)
- 1 Node + 2x M40 per node: 230.18 (10.9X)
- 1 Node + 4x M40 per node: 246.15 (11.7X)
Myoglobin on M40s
Running AMBER version 14. The blue node contains a single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU. The green nodes contain a single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.
GB - Myoglobin, simulated time in ns/day (speedup vs. 1 CPU node):
- 1 Node: 9.83
- 1 Node + 1x M40 per node: 232.20 (23.6X)
- 1 Node + 2x M40 per node: 300.86 (30.6X)
- 1 Node + 4x M40 per node: 322.09 (32.8X)
Nucleosome on M40s
Running AMBER version 14. The blue node contains a single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU. The green nodes contain a single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.
GB - Nucleosome, simulated time in ns/day (speedup vs. 1 CPU node):
- 1 Node: 0.13
- 1 Node + 1x M40 per node: 4.67 (35.9X)
- 1 Node + 2x M40 per node: 9.05 (69.6X)
- 1 Node + 4x M40 per node: 16.11 (123.9X)
TrpCage on M40s
Running AMBER version 14. The blue node contains a single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU. The green nodes contain a single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.
GB - TrpCage, simulated time in ns/day (speedup vs. 1 CPU node):
- 1 Node: 408.88
- 1 Node + 1x M40 per node: 831.91 (2.03X)
- 1 Node + 2x M40 per node: 551.36 (1.3X)
- 1 Node + 4x M40 per node: 464.63 (1.1X)
Recommended GPU Node Configuration for AMBER Computational Chemistry
Workstation or Single Node Configuration
- # of CPU sockets: 2
- Cores per CPU socket: 6+ (1 CPU core drives 1 GPU)
- CPU speed (GHz): 2.66+
- System memory per node (GB): 16
- GPUs: Kepler K20, K40, K80; Pascal P100
- # of GPUs per CPU socket: 1-4
- GPU memory preference (GB): 6
- GPU to CPU connection: PCIe 3.0 16x or higher
- Server storage: 2 TB
- Network configuration: Infiniband QDR or better
Scale to multiple nodes with same single node configuration
July 2016
CHARMM DOMDEC-GUI
CHARMM DOMDEC-GUI 465 K System Benchmark
Running CHARMM version c40a1. The blue node contains Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs. The green nodes contain Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs.
Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), which is responsible for any benchmarking errors.
465 K System (Her1_HER1_membrane), ns/day (higher is better; speedup vs. 1 Haswell node):
- 1 Haswell node: 0.36
- 1 node + 1x K80 per node: 2.15 (6.0X)
CHARMM DOMDEC-GUI 534 K System Benchmark
Running CHARMM version c40a1. The blue node contains Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs. The green nodes contain Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs.
Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), which is responsible for any benchmarking errors.
534 K System (POPC_PSPC_CHL1mixture), ns/day (higher is better; speedup vs. 1 Haswell node):
- 1 Haswell node: 0.18
- 1 node + 1x K80 per node: 1.43 (8.0X)
CHARMM DOMDEC-GUI 20 K System Benchmark
Running CHARMM version c40a1. The blue node contains Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs. The green nodes contain Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs + Tesla M40 GPUs.
Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), which is responsible for any benchmarking errors.
20 K System (Crambin), ns/day (higher is better; speedup vs. 1 Haswell node):
- 1 Haswell node: 16.00
- 1 node + 1x M40 per node: 59.68 (3.7X)
CHARMM DOMDEC-GUI 61 K System Benchmark
Running CHARMM version c40a1. The blue node contains Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs. The green nodes contain Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs + Tesla M40 GPUs.
Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), which is responsible for any benchmarking errors.
61 K System (GlnBP), ns/day (higher is better; speedup vs. 1 Haswell node):
- 1 Haswell node: 3.90
- 1 node + 1x M40 per node: 25.08 (6.4X)
73
CHARMM DOMDEC-GUI 465 K System Benchmark
Running CHARMM version c40a1 The blue node contains Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs + Tesla M40 GPUs
Benchmarks were run on the STANDARD CHARMM c40a1 version by the Yang group (FSU), which is responsible for any benchmarking error.
465 K System (Her1_HER1_membrane), ns/day (*Higher is better)
1 Haswell node: 0.36
1 node + 1x M40 per node: 2.27 (6.3X)
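The speedup labels on the CHARMM charts are the GPU-node throughput divided by the CPU-only throughput. A minimal Python sketch of that arithmetic, using the M40 figures printed on these slides; the function name `speedup` and the dictionary layout are illustrative assumptions, not part of CHARMM or of the original benchmark scripts:

```python
def speedup(baseline_ns_per_day, accelerated_ns_per_day):
    """Return how many times faster the accelerated run is than the baseline."""
    if baseline_ns_per_day <= 0:
        raise ValueError("baseline throughput must be positive")
    return accelerated_ns_per_day / baseline_ns_per_day


# (CPU-only ns/day, 1x M40 ns/day) as printed on the slides
charmm_m40 = {
    "20 K System (Crambin)": (16.00, 59.68),
    "61 K System (GlnBP)": (3.90, 25.08),
    "465 K System (Her1_HER1_membrane)": (0.36, 2.27),
}

for system, (cpu, gpu) in charmm_m40.items():
    print(f"{system}: {speedup(cpu, gpu):.1f}X")
```

Ratios recomputed from the rounded values on a chart can differ in the last digit from the chart's own label, which was presumably derived from unrounded measurements.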
October 2016
GROMACS 2016
75
Erik Lindahl (GROMACS developer) video
76
Water 1.5M on K80s
Running GROMACS version 2016 The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
Water 1.5M (ns/day)
1 Broadwell node: 2.79
1 node + 2x K80 per node: 5.22 (1.9X)
1 node + 4x K80 per node: 6.14 (2.2X)
77
Water 3M on K80s
Running GROMACS version 2016 The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
Water 3M (ns/day)
1 Broadwell node: 1.32
1 node + 2x K80 per node: 2.66 (2.0X)
1 node + 4x K80 per node: 3.05 (2.3X)
78
Water 1.5M on M40s
Running GROMACS version 2016 The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla M40 (autoboost) GPUs
Water 1.5M (ns/day)
1 Broadwell node: 2.79
1 node + 2x M40 per node: 6.15 (2.2X)
1 node + 4x M40 per node: 7.60 (2.7X)
79
Water 3M on M40s
Running GROMACS version 2016 The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla M40 (autoboost) GPUs
Water 3M (ns/day)
1 Broadwell node: 1.32
1 node + 2x M40 per node: 2.97 (2.3X)
1 node + 4x M40 per node: 3.94 (3.0X)
80
Water 1.5M on P40s
Running GROMACS version 2016 The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P40 GPUs
Water 1.5M (ns/day)
1 Broadwell node: 2.79
1 node + 2x P40 per node: 6.60 (2.4X)
1 node + 4x P40 per node: 8.07 (2.9X)
81
Water 3M on P40s
Running GROMACS version 2016 The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P40 GPUs
Water 3M (ns/day)
1 Broadwell node: 1.32
1 node + 2x P40 per node: 3.36 (2.5X)
1 node + 4x P40 per node: 4.19 (3.2X)
82
Water 1.5M on P100 PCIes
Running GROMACS version 2016 The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
Water 1.5M (ns/day)
1 Broadwell node: 2.79
1 node + 2x P100 PCIe (16GB) per node: 6.34 (2.3X)
1 node + 4x P100 PCIe (16GB) per node: 7.11 (2.5X)
83
Water 3M on P100 PCIes
Running GROMACS version 2016 The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
Water 3M (ns/day)
1 Broadwell node: 1.32
1 node + 2x P100 PCIe (16GB) per node: 3.16 (2.4X)
1 node + 4x P100 PCIe (16GB) per node: 3.43 (2.6X)
February 2017
GROMACS 5.1.2
85
Water 1.5M on K80s
Running GROMACS version 5.1.2 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs ➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
Water 1.5M (ns/day)
1 Broadwell node: 3.04
1 node + 1x K80 per node: 3.49 (1.1X)
1 node + 2x K80 per node: 5.75 (1.9X)
86
Water 1.5M on P100s PCIe
Water 1.5M (ns/day)
1 Broadwell node: 3.04
1 node + 1x P100 PCIe (16GB) per node: 4.39 (1.4X)
1 node + 2x P100 PCIe (16GB) per node: 6.96 (2.3X)
1 node + 4x P100 PCIe (16GB) per node: 7.21 (2.4X)
Running GROMACS version 5.1.2 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs ➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
87
Water 1.5M on P100s SXM2
Running GROMACS version 5.1.2 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs ➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
Water 1.5M (ns/day)
1 Broadwell node: 3.04
1 node + 1x P100 SXM2 per node: 4.11 (1.4X)
1 node + 2x P100 SXM2 per node: 6.70 (2.2X)
1 node + 4x P100 SXM2 per node: 7.18 (2.4X)
1 node + 8x P100 SXM2 per node: 7.88 (2.6X)
88
Water 3M on K80s
Water 3M (ns/day)
1 Broadwell node: 1.38
1 node + 1x K80 per node: 1.59 (1.2X)
1 node + 2x K80 per node: 2.98 (2.2X)
Running GROMACS version 5.1.2 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs ➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
89
Water 3M on P100s PCIe
Water 3M (ns/day)
1 Broadwell node: 1.38
1 node + 1x P100 PCIe (16GB) per node: 1.96 (1.4X)
1 node + 2x P100 PCIe (16GB) per node: 3.43 (2.5X)
1 node + 4x P100 PCIe (16GB) per node: 3.80 (2.8X)
Running GROMACS version 5.1.2 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs ➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
90
Water 3M on P100s SXM2
Running GROMACS version 5.1.2 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs ➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
Water 3M (ns/day)
1 Broadwell node: 1.38
1 node + 1x P100 SXM2 per node: 1.84 (1.3X)
1 node + 2x P100 SXM2 per node: 3.50 (2.5X)
1 node + 4x P100 SXM2 per node: 3.82 (2.8X)
91
Recommended GPU Node Configuration for GROMACS Computational Chemistry
Workstation or Single Node Configuration
# of CPU sockets: 2
Cores per CPU socket: 6+
CPU speed (GHz): 2.66+
System memory per socket (GB): 32
GPUs: Kepler K20, K40, K80
# of GPUs per CPU socket: 1x (Kepler GPUs need fast Sandy Bridge or Ivy Bridge CPUs, or high-end AMD Opterons)
GPU memory preference (GB): 6
GPU to CPU connection: PCIe 3.0 or higher
Server storage: 500 GB or higher
Network configuration: Gemini, InfiniBand
February 2017
HOOMD-Blue 1.3.3
93
lj-liquid on K80s
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs ➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
lj-liquid (avg timesteps/sec)
1 Broadwell node: 326.52
1 node + 1x K80 per node: 1324.84 (4.1X)
1 node + 2x K80 per node: 1594.37 (4.9X)
1 node + 4x K80 per node: 1942.12 (5.9X)
94
lj-liquid on P100s PCIe
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs ➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
lj-liquid (avg timesteps/sec)
1 Broadwell node: 326.52
1 node + 1x P100 PCIe (16GB) per node: 2912.66 (8.9X)
1 node + 8x P100 PCIe (16GB) per node: 3217.68 (9.9X)
95
lj-liquid on P100s SXM2
lj-liquid (avg timesteps/sec)
1 Broadwell node: 326.52
1 node + 1x P100 SXM2 per node: 3129.11 (9.6X)
1 node + 8x P100 SXM2 per node: 3397.74 (10.4X)
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs ➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
96
lj_liquid_512k on K80s
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs ➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
lj_liquid_512k (avg timesteps/sec)
1 Broadwell node: 43.43
1 node + 1x K80 per node: 220.10 (5.1X)
1 node + 2x K80 per node: 334.59 (7.7X)
1 node + 4x K80 per node: 526.47 (12.1X)
97
lj_liquid_512k on P100s PCIe
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs ➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
lj_liquid_512k (avg timesteps/sec)
1 Broadwell node: 43.43
1 node + 1x P100 PCIe (16GB) per node: 398.12 (9.2X)
1 node + 2x P100 PCIe (16GB) per node: 534.54 (12.3X)
1 node + 4x P100 PCIe (16GB) per node: 770.18 (17.7X)
1 node + 8x P100 PCIe (16GB) per node: 1045.50 (24.1X)
98
lj_liquid_512k on P100s SXM2
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs ➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
lj_liquid_512k (avg timesteps/sec)
1 Broadwell node: 43.43
1 node + 1x P100 SXM2 per node: 443.74 (10.2X)
1 node + 2x P100 SXM2 per node: 568.51 (13.1X)
1 node + 4x P100 SXM2 per node: 793.36 (18.3X)
1 node + 8x P100 SXM2 per node: 1119.76 (25.8X)
99
lj_liquid_1m on K80s
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs ➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
lj_liquid_1m (avg timesteps/sec)
1 Broadwell node: 22.07
1 node + 1x K80 per node: 109.54 (5.0X)
1 node + 2x K80 per node: 181.42 (8.2X)
1 node + 4x K80 per node: 303.00 (13.7X)
100
lj_liquid_1m on P100s PCIe
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs ➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
lj_liquid_1m (avg timesteps/sec)
1 Broadwell node: 22.07
1 node + 1x P100 PCIe (16GB) per node: 204.67 (9.3X)
1 node + 2x P100 PCIe (16GB) per node: 294.88 (13.4X)
1 node + 4x P100 PCIe (16GB) per node: 465.58 (21.1X)
1 node + 8x P100 PCIe (16GB) per node: 672.46 (30.5X)
101
lj_liquid_1m on P100s SXM2
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs ➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
lj_liquid_1m (avg timesteps/sec)
1 Broadwell node: 22.07
1 node + 1x P100 SXM2 per node: 221.02 (10.0X)
1 node + 2x P100 SXM2 per node: 315.07 (14.3X)
1 node + 4x P100 SXM2 per node: 488.04 (22.1X)
1 node + 8x P100 SXM2 per node: 707.73 (32.1X)
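The multi-GPU charts also show how well a benchmark scales inside one node. A rough sketch, assuming ideal scaling means throughput grows linearly with GPU count from the 1-GPU result; the helper `scaling_efficiency` is a hypothetical name and not part of HOOMD-Blue, and the input values are the lj_liquid_1m P100 SXM2 figures from the slide:

```python
def scaling_efficiency(rates_by_gpu_count):
    """Map each GPU count to measured throughput divided by ideal linear
    throughput extrapolated from the smallest GPU count."""
    base_gpus = min(rates_by_gpu_count)
    base_rate = rates_by_gpu_count[base_gpus]
    return {
        gpus: rate / (base_rate * gpus / base_gpus)
        for gpus, rate in sorted(rates_by_gpu_count.items())
    }


# GPU count -> avg timesteps/sec, as printed on the slide
lj_liquid_1m_sxm2 = {1: 221.02, 2: 315.07, 4: 488.04, 8: 707.73}

efficiency = scaling_efficiency(lj_liquid_1m_sxm2)
for gpus, eff in efficiency.items():
    print(f"{gpus}x P100 SXM2: {eff:.0%} of ideal linear scaling")
```

On these figures the efficiency falls as GPUs are added, which is the usual pattern for a fixed-size problem split across more devices.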
102
Microsphere on K80s
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs ➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
microsphere (avg timesteps/sec)
1 Broadwell node: 17.53
1 node + 1x K80 per node: 64.87 (3.7X)
1 node + 2x K80 per node: 98.43 (5.6X)
1 node + 4x K80 per node: 166.74 (9.5X)
103
Microsphere on P100s PCIe
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs ➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
microsphere (avg timesteps/sec)
1 Broadwell node: 17.53
1 node + 1x P100 PCIe (16GB) per node: 145.71 (8.3X)
1 node + 2x P100 PCIe (16GB) per node: 179.54 (10.2X)
1 node + 4x P100 PCIe (16GB) per node: 257.58 (14.7X)
1 node + 8x P100 PCIe (16GB) per node: 371.24 (21.2X)
104
Microsphere on P100s SXM2
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs ➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
microsphere (avg timesteps/sec)
1 Broadwell node: 17.53
1 node + 1x P100 SXM2 per node: 151.51 (8.6X)
1 node + 2x P100 SXM2 per node: 186.01 (10.6X)
1 node + 4x P100 SXM2 per node: 271.21 (15.5X)
1 node + 8x P100 SXM2 per node: 384.72 (21.9X)
105
Polymer on K80s
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs ➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
polymer (avg timesteps/sec)
1 Broadwell node: 362.19
1 node + 1x K80 per node: 975.14 (2.7X)
1 node + 2x K80 per node: 1209.45 (3.3X)
1 node + 4x K80 per node: 1518.99 (4.2X)
106
Polymer on P100s PCIe
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs ➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
polymer (avg timesteps/sec)
1 Broadwell node: 362.19
1 node + 1x P100 PCIe (16GB) per node: 1999.64 (5.5X)
1 node + 4x P100 PCIe (16GB) per node: 2143.15 (5.9X)
1 node + 8x P100 PCIe (16GB) per node: 2480.70 (6.8X)
107
Polymer on P100s SXM2
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs ➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
polymer (avg timesteps/sec)
1 Broadwell node: 362.19
1 node + 1x P100 SXM2 per node: 2111.99 (5.8X)
1 node + 4x P100 SXM2 per node: 2272.27 (6.3X)
1 node + 8x P100 SXM2 per node: 2651.56 (7.3X)
108
Quasicrystal on K80s
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs ➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
quasicrystal (avg timesteps/sec)
1 Broadwell node: 78.32
1 node + 1x K80 per node: 502.53 (6.4X)
1 node + 2x K80 per node: 767.90 (9.8X)
1 node + 4x K80 per node: 1280.44 (16.3X)
109
Quasicrystal on P100s PCIe
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs ➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
quasicrystal (avg timesteps/sec)
1 Broadwell node: 78.32
1 node + 1x P100 PCIe (16GB) per node: 851.29 (10.9X)
1 node + 2x P100 PCIe (16GB) per node: 1199.64 (15.3X)
1 node + 4x P100 PCIe (16GB) per node: 1791.41 (22.9X)
1 node + 8x P100 PCIe (16GB) per node: 2261.72 (28.9X)
110
Quasicrystal on P100s SXM2
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs ➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
quasicrystal (avg timesteps/sec)
1 Broadwell node: 78.32
1 node + 1x P100 SXM2 per node: 939.53 (12.0X)
1 node + 2x P100 SXM2 per node: 1249.90 (16.0X)
1 node + 4x P100 SXM2 per node: 1940.29 (24.8X)
1 node + 8x P100 SXM2 per node: 2429.68 (31.0X)
111
Triblock-copolymer on K80s
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs ➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
triblock-copolymer (avg timesteps/sec)
1 Broadwell node: 361.42
1 node + 1x K80 per node: 953.01 (2.6X)
1 node + 2x K80 per node: 1170.47 (3.2X)
1 node + 4x K80 per node: 1492.01 (4.1X)
112
Triblock-copolymer on P100s PCIe
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs ➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
triblock-copolymer (avg timesteps/sec)
1 Broadwell node: 361.42
1 node + 1x P100 PCIe (16GB) per node: 1999.14 (5.5X)
1 node + 4x P100 PCIe (16GB) per node: 2155.27 (6.0X)
1 node + 8x P100 PCIe (16GB) per node: 2456.09 (6.8X)
113
Triblock-copolymer on P100s SXM2
Running HOOMD-Blue version 1.3.3 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs ➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
triblock-copolymer (avg timesteps/sec)
1 Broadwell node: 361.42
1 node + 1x P100 SXM2 per node: 2132.92 (5.9X)
1 node + 4x P100 SXM2 per node: 2253.83 (6.2X)
1 node + 8x P100 SXM2 per node: 2587.91 (7.2X)
February 2017
LAMMPS 2016
115
Atomic-Fluid Lennard-Jones 2.5 Cutoff on K80s
Running LAMMPS version 2016 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
Atomic-Fluid Lennard-Jones 2.5 Cutoff (1/seconds)
1 Broadwell node: 0.37
1 node + 2x K80 per node: 0.57 (1.5X)
116
Atomic-Fluid Lennard-Jones 2.5 Cutoff on P100s PCIe
Running LAMMPS version 2016 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
Atomic-Fluid Lennard-Jones 2.5 Cutoff (1/seconds)
1 Broadwell node: 0.37
1 node + 2x P100 PCIe (16GB) per node: 0.62 (1.7X)
117
Atomic-Fluid Lennard-Jones 2.5 Cutoff on P100s SXM2
Running LAMMPS version 2016 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 (autoboost) GPUs
Atomic-Fluid Lennard-Jones 2.5 Cutoff (1/seconds)
1 Broadwell node: 0.37
1 node + 2x P100 SXM2 per node: 0.64 (1.7X)
118
Atomic-Fluid Lennard-Jones 5.0 Cutoff on K80s
Running LAMMPS version 2016 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs ➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
Atomic-Fluid Lennard-Jones 5.0 Cutoff (1/seconds)
1 Broadwell node: 0.10
1 node + 1x K80 per node: 0.14 (1.4X)
1 node + 2x K80 per node: 0.26 (2.6X)
1 node + 4x K80 per node: 0.36 (3.6X)
119
Atomic-Fluid Lennard-Jones 5.0 Cutoff on P100s PCIe
Running LAMMPS version 2016 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs ➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
Atomic-Fluid Lennard-Jones 5.0 Cutoff (1/seconds)
1 Broadwell node: 0.10
1 node + 1x P100 PCIe (16GB) per node: 0.22 (2.2X)
1 node + 2x P100 PCIe (16GB) per node: 0.35 (3.5X)
1 node + 4x P100 PCIe (16GB) per node: 0.37 (3.7X)
1 node + 8x P100 PCIe (16GB) per node: 0.38 (3.8X)
120
Atomic-Fluid Lennard-Jones 5.0 Cutoff on P100s SXM2
Running LAMMPS version 2016 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs ➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
Atomic-Fluid Lennard-Jones 5.0 Cutoff (1/seconds)
1 Broadwell node: 0.10
1 node + 1x P100 SXM2 per node: 0.22 (2.2X)
1 node + 2x P100 SXM2 per node: 0.36 (3.6X)
1 node + 4x P100 SXM2 per node: 0.41 (4.1X)
121
Coarse-grain Water on K80s
Running LAMMPS version 2016 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
Coarse-grain Water (1/seconds)
1 Broadwell node: 0.00437
1 node + 4x K80 per node: 0.00444 (1.0X)
122
Coarse-grain Water on P100s PCIe
Running LAMMPS version 2016 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
Coarse-grain Water (1/seconds)
1 Broadwell node: 0.0044
1 node + 4x P100 PCIe (16GB) per node: 0.0061 (1.4X)
1 node + 8x P100 PCIe (16GB) per node: 0.0093 (2.1X)
123
Coarse-grain Water on P100s SXM2
Running LAMMPS version 2016 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
Coarse-grain Water (1/seconds)
1 Broadwell node: 0.0044
1 node + 4x P100 SXM2 per node: 0.0069 (1.6X)
1 node + 8x P100 SXM2 per node: 0.0110 (2.5X)
124
EAM on K80s
Running LAMMPS version 2016 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs ➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
EAM (1/seconds)
1 Broadwell node: 0.01
1 node + 1x K80 per node: 0.02 (2.0X)
1 node + 2x K80 per node: 0.04 (4.0X)
1 node + 4x K80 per node: 0.07 (7.0X)
125
EAM on P100s PCIe
Running LAMMPS version 2016 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs ➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
EAM (1/seconds)
1 Broadwell node: 0.01
1 node + 1x P100 PCIe (16GB) per node: 0.03 (3.0X)
1 node + 2x P100 PCIe (16GB) per node: 0.05 (5.0X)
1 node + 4x P100 PCIe (16GB) per node: 0.08 (8.0X)
1 node + 8x P100 PCIe (16GB) per node: 0.13 (13.0X)
126
EAM on P100s SXM2
Running LAMMPS version 2016 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs ➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
EAM (1/seconds)
1 Broadwell node: 0.01
1 node + 1x P100 SXM2 per node: 0.03 (3.0X)
1 node + 2x P100 SXM2 per node: 0.05 (5.0X)
1 node + 4x P100 SXM2 per node: 0.08 (8.0X)
1 node + 8x P100 SXM2 per node: 0.13 (13.0X)
127
Gay-Berne on K80s
Running LAMMPS version 2016 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs ➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
Gay-Berne (1/seconds)
1 Broadwell node: 0.01
1 node + 1x K80 per node: 0.02 (2.0X)
1 node + 2x K80 per node: 0.03 (3.0X)
1 node + 4x K80 per node: 0.04 (4.0X)
128
Gay-Berne on P100s PCIe
Running LAMMPS version 2016 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs ➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
Gay-Berne (1/seconds)
1 Broadwell node: 0.01
1 node + 1x P100 PCIe (16GB) per node: 0.02 (2.0X)
1 node + 2x P100 PCIe (16GB) per node: 0.04 (4.0X)
1 node + 4x P100 PCIe (16GB) per node: 0.05 (5.0X)
129
Gay-Berne on P100s SXM2
Running LAMMPS version 2016 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs ➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
Gay-Berne (1/seconds)
1 Broadwell node: 0.01
1 node + 1x P100 SXM2 per node: 0.02 (2.0X)
1 node + 2x P100 SXM2 per node: 0.04 (4.0X)
1 node + 4x P100 SXM2 per node: 0.05 (5.0X)
130
Rhodopsin on K80s
Running LAMMPS version 2016 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs ➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
Rhodopsin (1/seconds)
1 Broadwell node: 0.22
1 node + 1x K80 per node: 0.22
1 node + 2x K80 per node: 0.31 (1.4X)
1 node + 4x K80 per node: 0.38 (1.7X)
131
Rhodopsin on P100s PCIe
Running LAMMPS version 2016 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs ➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
Rhodopsin (1/seconds)
1 Broadwell node: 0.22
1 node + 1x P100 PCIe (16GB) per node: 0.29 (1.3X)
1 node + 2x P100 PCIe (16GB) per node: 0.33 (1.5X)
1 node + 4x P100 PCIe (16GB) per node: 0.48 (2.2X)
1 node + 8x P100 PCIe (16GB) per node: 0.52 (2.4X)
132
Rhodopsin on P100s SXM2
Running LAMMPS version 2016 The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs ➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
Rhodopsin (1/seconds)
1 Broadwell node: 0.22
1 node + 1x P100 SXM2 per node: 0.30 (1.4X)
1 node + 2x P100 SXM2 per node: 0.38 (1.7X)
1 node + 4x P100 SXM2 per node: 0.49 (2.2X)
1 node + 8x P100 SXM2 per node: 0.50 (2.3X)
133
Recommended GPU Node Configuration for LAMMPS Computational Chemistry
Workstation or Single Node Configuration
# of CPU sockets: 2
Cores per CPU socket: 6+
CPU speed (GHz): 2.66+
System memory per socket (GB): 32
GPUs: GTX Titan X, Kepler K20, K40, K80, M40
# of GPUs per CPU socket: 1-2
GPU memory preference (GB): 6+
GPU to CPU connection: PCIe 3.0 or higher
Server storage: 500 GB or higher
Network configuration: Gemini, InfiniBand
Scale to thousands of nodes with same single node configuration
July 2017
NAMD 2.12
135
APOA1 on K80s
Running NAMD version 2.12 The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
APOA1 (ns/day)
1 Broadwell node: 3.45
1 node + 1x K80 per node: 14.92
1 node + 2x K80 per node: 17.73
136
APOA1 on P100s PCIe
Running NAMD version 2.12 The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
APOA1 (ns/day)
1 Broadwell node: 3.45
1 node + 1x P100 PCIe (16GB) per node: 22.58
1 node + 2x P100 PCIe (16GB) per node: 22.85
137
APOA1 on P100s SXM2
Running NAMD version 2.12 The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
APOA1 (ns/day)
1 Broadwell node: 3.45
1 node + 1x P100 SXM2 per node: 22.98
1 node + 2x P100 SXM2 per node: 23.44
1 node + 4x P100 SXM2 per node: 23.87
138
F1ATPASE on K80s
Running NAMD version 2.12 The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
F1ATPASE (ns/day)
1 Broadwell node: 1.15
1 node + 1x K80 per node: 4.81
1 node + 2x K80 per node: 6.27
139
F1ATPASE on P100s PCIe
Running NAMD version 2.12 The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
F1ATPASE (ns/day)
1 Broadwell node: 1.15
1 node + 1x P100 PCIe (16GB) per node: 7.34
1 node + 2x P100 PCIe (16GB) per node: 6.99
1 node + 4x P100 PCIe (16GB) per node: 7.40
F1ATPASE on P100s SXM2
Running NAMD version 2.12. The blue node contains dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs. The green nodes contain dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
F1ATPASE (ns/day):
- 1 Broadwell node: 1.15
- 1 node + 1x P100 SXM2 per node: 7.11
- 1 node + 2x P100 SXM2 per node: 6.85
- 1 node + 4x P100 SXM2 per node: 7.11
STMV on K80s
Running NAMD version 2.12. The blue node contains dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs. The green nodes contain dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs.
STMV (ns/day):
- 1 Broadwell node: 0.292
- 1 node + 1x K80 per node: 1.274
- 1 node + 2x K80 per node: 2.085
STMV on P100s PCIe
Running NAMD version 2.12. The blue node contains dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs. The green nodes contain dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs.
STMV (ns/day):
- 1 Broadwell node: 0.29
- 1 node + 1x P100 PCIe (16GB) per node: 2.15
- 1 node + 2x P100 PCIe (16GB) per node: 2.32
STMV on P100s SXM2
Running NAMD version 2.12. The blue node contains dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs. The green nodes contain dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
STMV (ns/day):
- 1 Broadwell node: 0.292
- 1 node + 1x P100 SXM2 per node: 2.077
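To put the ns/day figures in practical terms, a rate can be converted into wall-clock time for a given trajectory length. The following is a minimal Python sketch using the two STMV rates from the P100 SXM2 slide; the 100 ns target and the `days_to_simulate` helper are illustrative assumptions, not part of NAMD or the original deck.

```python
# ns/day values copied from the "STMV on P100s SXM2" slide.
STMV_P100_SXM2 = {
    "1 Broadwell node": 0.292,
    "1 node + 1x P100 SXM2": 2.077,
}

# Hypothetical trajectory length chosen only for illustration.
TARGET_NS = 100.0


def days_to_simulate(target_ns, ns_per_day):
    """Wall-clock days needed to simulate target_ns at a sustained ns/day rate."""
    if ns_per_day <= 0:
        raise ValueError("ns_per_day must be positive")
    return target_ns / ns_per_day


for config, rate in STMV_P100_SXM2.items():
    days = days_to_simulate(TARGET_NS, rate)
    print(f"{config}: {days:.1f} days for {TARGET_NS:.0f} ns")
```

The estimate assumes the benchmark rate is sustained for the whole run, which ignores I/O, checkpointing, and queue time.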
NAMD 2.11 – Up to 2X Faster
New GPU features in NAMD 2.11
- GPU-accelerated simulations up to twice as fast as NAMD 2.10
- Pressure calculation with fixed atoms on GPU works as on CPU
- Improved scaling for GPU-accelerated particle-mesh Ewald calculation
  - CPU-side operations overlap better and are parallelized across cores.
- Improved scaling for GPU-accelerated simulations
  - Nonbonded force calculation results are streamed from the GPU for better overlap.
- NVIDIA CUDA GPU-acceleration binaries for Mac OS X
Selected Text from the NAMD website
NAMD 2.11 is up to 2x faster
APoA1 (92,224 atoms), simulated time (ns/day). Speedup of NAMD 2.11 over NAMD 2.10:
- 1 Node: 1.2X
- 2 Nodes: 1.6X
- 4 Nodes: 2.0X
NAMD 2.10 and NAMD 2.11 were both run on nodes containing dual Intel E5-2697 v2@2.7GHz (Ivy Bridge) CPUs + 2 Tesla K80 (autoboost) GPUs.
NAMD 2.11 APoA1 on 1 and 2 nodes
Running NAMD version 2.11. The blue nodes contain dual Intel E5-2698 v3@2.3GHz (Haswell) CPUs. The green nodes contain dual Intel E5-2698 v3@2.3GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs.
APoA1 (92,224 atoms), simulated time (ns/day), with speedup over the CPU-only run at the same node count:
- 1 Node: 2.77
- 1 Node + 1x K80: 11.67 (4.2X)
- 1 Node + 2x K80: 16.99 (6.1X)
- 2 Nodes: 5.22
- 2 Nodes + 1x K80: 19.73 (3.8X)
- 2 Nodes + 2x K80: 24.31 (4.7X)
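The speedup labels on this slide are consistent with dividing each GPU-accelerated rate by the CPU-only rate at the same node count. A minimal Python sketch of that arithmetic follows; the dictionary layout and the `speedup` helper are illustrative names, not part of NAMD or the original deck.

```python
# ns/day values copied from the "NAMD 2.11 APoA1 on 1 and 2 nodes" slide.
# Keys are node counts; the layout is an assumption made for illustration.
APOA1_NS_PER_DAY = {
    1: {"cpu": 2.77, "1x K80": 11.67, "2x K80": 16.99},
    2: {"cpu": 5.22, "1x K80": 19.73, "2x K80": 24.31},
}


def speedup(gpu_rate, cpu_rate):
    """GPU rate relative to the CPU-only baseline, rounded to one decimal."""
    return round(gpu_rate / cpu_rate, 1)


for nodes, rates in APOA1_NS_PER_DAY.items():
    for config in ("1x K80", "2x K80"):
        factor = speedup(rates[config], rates["cpu"])
        print(f"{nodes} node(s) + {config}: {factor}X")
```

The same calculation applies to the F1-ATPase and STMV slides by swapping in their ns/day values.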
NAMD 2.11 APoA1 on 4 and 8 nodes
Running NAMD version 2.11. The blue nodes contain dual Intel E5-2698 v3@2.3GHz (Haswell) CPUs. The green nodes contain dual Intel E5-2698 v3@2.3GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs.
APoA1 (92,224 atoms), simulated time (ns/day), with speedup over the CPU-only run at the same node count:
- 4 Nodes: 10.27
- 4 Nodes + 1x K80: 20.64 (2.0X)
- 4 Nodes + 2x K80: 23.52 (2.3X)
- 8 Nodes: 16.85
- 8 Nodes + 1x K80: 27.83 (1.7X)
- 8 Nodes + 2x K80: 27.74 (1.6X)
NAMD 2.11 is up to 1.8x faster
F1-ATPase (327,506 atoms), simulated time (ns/day). Speedup of NAMD 2.11 over NAMD 2.10:
- 1 Node: 1.1X
- 2 Nodes: 1.8X
- 4 Nodes: 1.4X
NAMD 2.10 and NAMD 2.11 were both run on nodes containing dual Intel E5-2697 v2@2.7GHz (Ivy Bridge) CPUs + 2 Tesla K80 (autoboost) GPUs.
NAMD 2.11 F1-ATPase on 1 and 2 nodes
Running NAMD version 2.11. The blue nodes contain dual Intel E5-2698 v3@2.3GHz (Haswell) CPUs. The green nodes contain dual Intel E5-2698 v3@2.3GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs.
F1-ATPase (327,506 atoms), simulated time (ns/day), with speedup over the CPU-only run at the same node count:
- 1 Node: 0.94
- 1 Node + 1x K80: 3.87 (4.1X)
- 1 Node + 2x K80: 6.11 (6.5X)
- 2 Nodes: 1.86
- 2 Nodes + 1x K80: 7.23 (3.9X)
- 2 Nodes + 2x K80: 10.58 (5.7X)
NAMD 2.11 F1-ATPase on 4 and 8 nodes
Running NAMD version 2.11. The blue nodes contain dual Intel E5-2698 v3@2.3GHz (Haswell) CPUs. The green nodes contain dual Intel E5-2698 v3@2.3GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs.
F1-ATPase (327,506 atoms), simulated time (ns/day), with speedup over the CPU-only run at the same node count:
- 4 Nodes: 3.63
- 4 Nodes + 1x K80: 11.66 (3.2X)
- 4 Nodes + 2x K80: 12.62 (3.5X)
- 8 Nodes: 6.88
- 8 Nodes + 1x K80: 14.22 (2.1X)
- 8 Nodes + 2x K80: 15.74 (2.3X)
NAMD 2.11 is up to 1.5x faster
STMV (1,066,628 atoms), simulated time (ns/day). Speedup of NAMD 2.11 over NAMD 2.10:
- 1 Node: 1.5X
- 2 Nodes: 1.1X
- 4 Nodes: 1.5X
NAMD 2.10 and NAMD 2.11 were both run on nodes containing dual Intel E5-2697 v2@2.7GHz (Ivy Bridge) CPUs + 2 Tesla K80 (autoboost) GPUs.
NAMD 2.11 STMV on 1 and 2 nodes
Running NAMD version 2.11. The blue nodes contain dual Intel E5-2698 v3@2.3GHz (Haswell) CPUs. The green nodes contain dual Intel E5-2698 v3@2.3GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs.
STMV (1,066,628 atoms), simulated time (ns/day), with speedup over the CPU-only run at the same node count:
- 1 Node: 0.23
- 1 Node + 1x K80: 1.03 (4.5X)
- 1 Node + 2x K80: 1.75 (7.6X)
- 2 Nodes: 0.46
- 2 Nodes + 1x K80: 1.98 (4.3X)
- 2 Nodes + 2x K80: 3.27 (7.1X)
NAMD 2.11 STMV on 4 and 8 nodes
Running NAMD version 2.11. The blue nodes contain dual Intel E5-2698 v3@2.3GHz (Haswell) CPUs. The green nodes contain dual Intel E5-2698 v3@2.3GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs.
STMV (1,066,628 atoms), simulated time (ns/day), with speedup over the CPU-only run at the same node count:
- 4 Nodes: 0.90
- 4 Nodes + 1x K80: 3.61 (4.0X)
- 4 Nodes + 2x K80: 4.54 (5.0X)
- 8 Nodes: 1.74
- 8 Nodes + 1x K80: 5.86 (3.4X)
- 8 Nodes + 2x K80: 6.24 (3.6X)
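Multi-node scaling can be summarized as parallel efficiency: the measured rate divided by the single-node rate times the node count. The sketch below applies that to the STMV figures for 2x K80 per node from the two NAMD 2.11 STMV slides; the `parallel_efficiency` helper and the dictionary layout are illustrative assumptions, not something the deck defines.

```python
# ns/day for STMV with 2x K80 per node, copied from the
# "NAMD 2.11 STMV on 1 and 2 nodes" and "on 4 and 8 nodes" slides.
STMV_2X_K80 = {1: 1.75, 2: 3.27, 4: 4.54, 8: 6.24}


def parallel_efficiency(rates):
    """Map node count -> measured rate as a fraction of ideal linear scaling."""
    base = rates[1]  # single-node rate is the reference point
    return {nodes: rate / (base * nodes) for nodes, rate in rates.items()}


for nodes, eff in parallel_efficiency(STMV_2X_K80).items():
    print(f"{nodes} node(s): {eff:.0%} of ideal linear scaling")
```

An efficiency of 1.0 means the rate grew in proportion to the node count; lower values indicate how much of the added hardware is lost to communication and other overheads.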
Benefits of MD GPU-Accelerated Computing
- 3x-8x faster than CPU-only systems in all tests (on average)
- Most major compute-intensive aspects of classical MD ported
- Large performance boost with marginal price increase
- Energy usage cut by more than half
- GPUs scale well within a node and/or over multiple nodes
- K80 GPU is our fastest and lowest-power high-performance GPU yet
Try GPU-accelerated MD apps for free – www.nvidia.com/GPUTestDrive
Why wouldn’t you want to turbocharge your research?
- Dec. 19, 2016
Molecular Dynamics (MD) on GPUs
GPU-Accelerated Quantum Chemistry Apps
Abinit, ACES III, ADF, BigDFT, CP2K, GAMESS-US, Gaussian, GPAW, LATTE, LSDalton, MOLCAS, Mopac2012, NWChem
Green lettering indicates performance slides included.
GPU Perf compared against dual multi-core x86 CPU socket.