Molecular Dynamics (MD) on GPUs
October 2017 and RELION too
Accelerating Discoveries
Using a supercomputer powered by the Tesla Platform with over 3,000 Tesla accelerators, University of Illinois scientists performed the first all-atom simulation of the HIV virus and discovered the chemical structure of its capsid, “the perfect target for fighting the infection.” Without GPUs, the supercomputer would need to be 5x larger for similar performance.
Overview of Life & Material Accelerated Apps
MD: All key codes are GPU-accelerated
- Great multi-GPU performance
- Focus on dense (up to 16) GPU nodes and/or a large number of GPU nodes
- ACEMD*, AMBER (PMEMD)*, BAND, CHARMM, DESMOND, ESPResSo, Folding@Home, GPUGrid.net, GROMACS, HALMD, HOOMD-Blue*, LAMMPS, Lattice Microbes*, mdcore, MELD, miniMD, NAMD, OpenMM, PolyFTS, SOP-GPU* & more
QC: All key codes are ported or optimizing
- Focus on using GPU-accelerated math libraries, OpenACC directives
- GPU-accelerated and available today: ABINIT, ACES III, ADF, BigDFT, CP2K, GAMESS, GAMESS-UK, GPAW, LATTE, LSDalton, LSMS, MOLCAS, MOPAC2012, NWChem, OCTOPUS*, PEtot, QUICK, Q-Chem, QMCPack, Quantum Espresso/PWscf, TeraChem*
- Active GPU acceleration projects: CASTEP, GAMESS, Gaussian, ONETEP, Quantum Supercharger Library*, VASP & more
* (green on the slide) = application where >90% of the workload is on GPU
MD vs. QC on GPUs
“Classical” Molecular Dynamics:
- Simulates positions of atoms over time; chemical-biological or chemical-material behaviors
- Forces calculated from simple empirical formulas (bond rearrangement generally forbidden)
- Up to millions of atoms
- Solvent included without difficulty
- Single precision dominated
- Uses cuBLAS, cuFFT, CUDA
- GeForce (Workstations), Tesla (Servers)
- ECC off
Quantum Chemistry (MO, PW, DFT, Semi-Emp):
- Calculates electronic properties; ground state, excited states, spectral properties, making/breaking bonds, physical properties
- Forces derived from electron wave function (bond rearrangement OK, e.g., bond energies)
- Up to a few thousand atoms
- Generally in a vacuum but if needed, solvent treated classically (QM/MM) or using implicit methods
- Double precision is important
- Uses cuBLAS, cuFFT, OpenACC
- Tesla recommended
- ECC on
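To make “forces calculated from simple empirical formulas” concrete, here is a minimal sketch of one such formula, the Lennard-Jones pair potential. It is an illustration only: the slides name no specific force field, and the function name and the reduced units (epsilon = sigma = 1) are assumptions.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Return (energy, force) for two particles separated by distance r.

    Hypothetical helper for illustration; reduced units are assumed.
    """
    sr6 = (sigma / r) ** 6
    energy = 4.0 * epsilon * (sr6 * sr6 - sr6)
    # Force along the separation vector: F = -dU/dr.
    force = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r
    return energy, force


# At the potential minimum r = 2^(1/6) * sigma the force vanishes
# and the energy equals -epsilon.
r_min = 2.0 ** (1.0 / 6.0)
energy, force = lennard_jones(r_min)
print(f"U(r_min) = {energy:.6f}, |F(r_min)| = {abs(force):.6f}")
```

Each pair interaction needs only a handful of multiplications, which is why this kind of force evaluation maps well onto single-precision GPU arithmetic.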
GPU-Accelerated Molecular Dynamics Apps
ACEMD, AMBER, CHARMM, DESMOND, ESPResSo, Folding@Home, GENESIS, GPUGrid.net, GROMACS, HALMD, HOOMD-Blue, HTMD, LAMMPS, mdcore, MELD, NAMD, OpenMM, PolyFTS
Green lettering indicates performance slides included.
GPU performance compared against dual multi-core x86 CPU socket.
Benefits of MD GPU-Accelerated Computing
- 3x-8x faster than CPU-only systems in all tests (on average)
- Most major compute-intensive aspects of classical MD ported
- Large performance boost, with “Big Money” saved on CPUs and networks
- Energy usage cut by more than half
- GPUs scale well within a node and/or over multiple nodes
- P100 GPU is our fastest and lowest power high performance GPU yet
Try GPU-accelerated MD apps for free: www.nvidia.com/GPUTestDrive
Why wouldn’t you want to turbocharge your research?
August 2017
RELION 2.0.3
Plasmodium ribosome on P100s PCIe
Running RELION version 2.0.3
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
Data Citation: http://en.community.dell.com/techcenter/high-performance-computing/b/general_hpc/archive/2017/03/14/application-performance-on-p100-pcie-gpus
Chart (1/Minutes):
- 1 Broadwell node: 0.0003
- 1 node + 1x P100 PCIe (16GB) per node: 0.0027
- 1 node + 2x P100 PCIe (16GB) per node: 0.0046
- 1 node + 4x P100 PCIe (16GB) per node: 0.0070
- 2 nodes + 8x P100 PCIe (16GB): 0.0101
- 3 nodes + 12x P100 PCIe (16GB): 0.0112
- 4 nodes + 16x P100 PCIe (16GB): 0.0120
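The chart reports throughput in 1/Minutes, so the wall-clock time per run is the reciprocal. The sketch below converts the plotted values to minutes and to a speedup over the CPU-only node. Pairing the values with the configurations in plot order is an assumption, as are the shortened labels and variable names; the plotted values are rounded to four decimals, so the ratios are approximate.

```python
# Throughput values (1/minutes) read from the chart; assumed to be in plot order.
throughput = {
    "1 Broadwell node": 0.0003,
    "1 node + 1x P100": 0.0027,
    "1 node + 2x P100": 0.0046,
    "1 node + 4x P100": 0.0070,
    "2 nodes + 8x P100": 0.0101,
    "3 nodes + 12x P100": 0.0112,
    "4 nodes + 16x P100": 0.0120,
}

baseline = throughput["1 Broadwell node"]

# Runtime is the reciprocal of throughput; speedup is the throughput ratio.
runtime_minutes = {name: 1.0 / rate for name, rate in throughput.items()}
speedup = {name: rate / baseline for name, rate in throughput.items()}

for name in throughput:
    print(f"{name:20s} {runtime_minutes[name]:8.0f} min  {speedup[name]:5.1f}x")
```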
ACEMD: Extremely efficient and robust MD software built on GPUs
610 ns/day on 1 GPU for DHFR (23K atoms)
- M. Harvey, G. Giupponi and G. de Fabritiis, ACEMD: Accelerated molecular dynamics simulations in the microseconds timescale, J. Chem. Theory and Comput. 5, 1632 (2009)
- Standardised and easy to use: ACEMD reads CHARMM/NAMD and AMBER input files and uses similar syntax to other MD software.
- Fully featured: NVT, NPT, PME, TCL, PLUMED. [1]
- Robust: ACEMD is a proven computational engine and is used in one of the largest distributed projects worldwide: GPUGRID.
- Compatible: ACEMD works with CUDA and OpenCL, the new standard framework for parallel and high-performance computing.
- Validated: ACEMD is used in reputable academic and industrial institutions. Results describing its applications have appeared in peer-reviewed journals of high impact such as Nature Chemistry, PNAS, Scientific Reports, PLoS and JACS. [2]
- [1] M. J. Harvey and G. de Fabritiis, An implementation of the smooth particle-mesh Ewald (PME) method on GPU hardware, J. Chem. Theory Comput., 5, 2371-2377 (2009)
- [2] For a list of selected references see http://www.acellera.com/science
October 2017
AMBER 16.8
PME-Cellulose_NPT on V100s PCIe
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 1.94
- 1 node + 2x V100 PCIe per node (16GB): 47.67 (24.6X)
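The speedup label on this slide is the ratio of the two throughputs. A minimal sketch of that arithmetic follows; it also converts ns/day into the days needed for a hypothetical 1 microsecond (1000 ns) trajectory. The function names and the 1000 ns target are illustrative assumptions, not part of the benchmark.

```python
def speedup(gpu_ns_per_day, cpu_ns_per_day):
    """Ratio of GPU-node throughput to CPU-only throughput."""
    return gpu_ns_per_day / cpu_ns_per_day


def days_to_simulate(target_ns, ns_per_day):
    """Wall-clock days needed to reach target_ns at a given throughput."""
    return target_ns / ns_per_day


cpu, gpu = 1.94, 47.67  # ns/day, PME-Cellulose_NPT values from the slide

print(f"speedup:  {speedup(gpu, cpu):.1f}X")
print(f"CPU-only: {days_to_simulate(1000.0, cpu):.0f} days")
print(f"2x V100:  {days_to_simulate(1000.0, gpu):.0f} days")
```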
PME-Cellulose_NPT on V100s SXM2
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 1.94
- 1 node + 2x V100 SXM2 per node (16GB): 54.74 (28.2X)
- 1 node + 4x V100 SXM2 per node (16GB): 55.52 (28.6X)
PME-Cellulose_NVE on V100s PCIe
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 1.96
- 1 node + 2x V100 PCIe per node (16GB): 54.08 (27.6X)
PME-Cellulose_NVE on V100s SXM2
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 1.96
- 1 node + 2x V100 SXM2 per node (16GB): 63.04 (32.2X)
- 1 node + 4x V100 SXM2 per node (16GB): 65.02 (33.2X)
PME-FactorIX_NPT on V100s PCIe
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 9.33
- 1 node + 2x V100 PCIe per node (16GB): 193.16 (20.7X)
PME-FactorIX_NPT on V100s SXM2
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 9.33
- 1 node + 2x V100 SXM2 per node (16GB): 217.95 (23.4X)
- 1 node + 4x V100 SXM2 per node (16GB): 224.23 (24.0X)
PME-FactorIX_NVE on V100s PCIe
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 9.61
- 1 node + 2x V100 PCIe per node (16GB): 217.95 (22.7X)
PME-FactorIX_NVE on V100s SXM2
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 9.61
- 1 node + 2x V100 SXM2 per node (16GB): 249.63 (26.0X)
- 1 node + 4x V100 SXM2 per node (16GB): 261.19 (27.2X)
PME-JAC_NPT on V100s PCIe
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 34.35
- 1 node + 2x V100 PCIe per node (16GB): 439.87 (12.8X)
PME-JAC_NPT on V100s SXM2
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 34.35
- 1 node + 2x V100 SXM2 per node (16GB): 481.75 (14.0X)
- 1 node + 4x V100 SXM2 per node (16GB): 515.36 (15.0X)
PME-JAC_NVE on V100s PCIe
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 36.53
- 1 node + 2x V100 PCIe per node (16GB): 490.77 (13.4X)
PME-JAC_NVE on V100s SXM2
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 36.53
- 1 node + 2x V100 SXM2 per node (16GB): 539.78 (14.8X)
- 1 node + 4x V100 SXM2 per node (16GB): 583.33 (16.0X)
PME-JAC_NPT_4fs on V100s PCIe
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 65.74
- 1 node + 2x V100 PCIe per node (16GB): 863.80 (13.1X)
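The “_4fs” benchmarks use a 4 fs timestep, so part of their higher ns/day comes from each step covering more simulated time. The sketch below converts ns/day to integration steps per second to separate the two effects. The 2 fs timestep for the plain PME-JAC_NPT run is an assumption (the slides do not state it); the ns/day figures are the 2x V100 PCIe values from the slides, and the function name is illustrative.

```python
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000


def steps_per_second(ns_per_day, timestep_fs):
    """Integration steps per wall-clock second for a given throughput."""
    steps_per_day = ns_per_day * FS_PER_NS / timestep_fs
    return steps_per_day / SECONDS_PER_DAY


# 2x V100 PCIe results from the slides.
jac_npt = steps_per_second(439.87, timestep_fs=2.0)      # 2 fs is an assumption
jac_npt_4fs = steps_per_second(863.80, timestep_fs=4.0)  # 4 fs per the benchmark name

print(f"PME-JAC_NPT:     {jac_npt:.0f} steps/s")
print(f"PME-JAC_NPT_4fs: {jac_npt_4fs:.0f} steps/s")
```

Under that assumption the two runs step at nearly the same rate, so the roughly 2x gain in ns/day would come almost entirely from the longer timestep.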
PME-JAC_NPT_4fs on V100s SXM2
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 65.74
- 1 node + 2x V100 SXM2 per node (16GB): 946.57 (14.4X)
- 1 node + 4x V100 SXM2 per node (16GB): 1006.32 (15.3X)
PME-JAC_NVE_4fs on V100s PCIe
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 67.10
- 1 node + 2x V100 PCIe per node (16GB): 940.32 (14.0X)
PME-JAC_NVE_4fs on V100s SXM2
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 67.10
- 1 node + 2x V100 SXM2 per node (16GB): 1027.44 (15.3X)
- 1 node + 4x V100 SXM2 per node (16GB): 1123.40 (16.7X)
PME-STMV_NPT_4fs on V100s PCIe
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 1.06
- 1 node + 2x V100 PCIe per node (16GB): 33.21 (31.3X)
PME-STMV_NPT_4fs on V100s SXM2
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 1.06
- 1 node + 2x V100 SXM2 per node (16GB): 37.24 (35.1X)
GB-Myoglobin on V100s PCIe
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 22.30
- 1 node + 2x V100 PCIe per node (16GB): 699.21 (31.4X)
GB-Myoglobin on V100s SXM2
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 22.30
- 1 node + 2x V100 SXM2 per node (16GB): 750.76 (33.7X)
GB-Nucleosome on V100s PCIe
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 0.31
- 1 node + 2x V100 PCIe per node (16GB): 49.14 (158.5X)
- 1 node + 4x V100 PCIe per node (16GB): 78.39 (252.9X)
GB-Nucleosome on V100s SXM2
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 0.31
- 1 node + 2x V100 SXM2 per node (16GB): 52.89 (170.6X)
- 1 node + 4x V100 SXM2 per node (16GB): 92.46 (298.3X)
Rubisco on V100s PCIe
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 0.01
- 1 node + 2x V100 PCIe per node (16GB): 2.79 (279.0X)
- 1 node + 4x V100 PCIe per node (16GB): 5.22 (522.0X)
- 1 node + 8x V100 PCIe per node (16GB): 6.78 (678.0X)
Rubisco on V100s SXM2
(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 0.01
- 1 node + 2x V100 SXM2 per node (16GB): 3.00 (300.0X)
- 1 node + 4x V100 SXM2 per node (16GB): 5.96 (596.0X)
- 1 node + 8x V100 SXM2 per node (16GB): 7.00 (700.0X)
February 2017
AMBER 16
PME-Cellulose_NPT on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
PME-Cellulose_NPT, ns/day (speedup in parentheses):
- 1 Broadwell node: 2.35
- 1 node + 1x P100 PCIe (16GB) per node: 21.85 (9.3X)
- 1 node + 2x P100 PCIe (16GB) per node: 30.00 (12.8X)
PME-Cellulose_NPT on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
PME-Cellulose_NPT, ns/day (speedup in parentheses):
- 1 Broadwell node: 2.35
- 1 node + 1x P100 SXM2 per node: 23.37 (9.9X)
- 1 node + 2x P100 SXM2 per node: 32.22 (13.7X)
- 1 node + 4x P100 SXM2 per node: 36.65 (15.6X)
PME-Cellulose_NVE on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
PME-Cellulose_NVE, ns/day (speedup in parentheses):
- 1 Broadwell node: 2.47
- 1 node + 1x P100 PCIe (16GB) per node: 23.34 (9.4X)
- 1 node + 2x P100 PCIe (16GB) per node: 32.55 (13.2X)
PME-Cellulose_NVE on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
PME-Cellulose_NVE, ns/day (speedup in parentheses):
- 1 Broadwell node: 2.47
- 1 node + 1x P100 SXM2 per node: 24.94 (10.1X)
- 1 node + 2x P100 SXM2 per node: 35.16 (14.2X)
- 1 node + 4x P100 SXM2 per node: 40.88 (16.6X)
PME-FactorIX_NPT on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
PME-FactorIX_NPT, ns/day (speedup in parentheses):
- 1 Broadwell node: 11.43
- 1 node + 1x P100 PCIe (16GB) per node: 98.77 (8.6X)
- 1 node + 2x P100 PCIe (16GB) per node: 132.86 (11.6X)
PME-FactorIX_NPT on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
PME-FactorIX_NPT, ns/day (speedup in parentheses):
- 1 Broadwell node: 11.43
- 1 node + 1x P100 SXM2 per node: 106.25 (9.3X)
- 1 node + 2x P100 SXM2 per node: 144.11 (12.6X)
- 1 node + 4x P100 SXM2 per node: 159.80 (14.0X)
PME-FactorIX_NVE on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
PME-FactorIX_NVE, ns/day (speedup in parentheses):
- 1 Broadwell node: 11.98
- 1 node + 1x P100 PCIe (16GB) per node: 105.86 (8.8X)
- 1 node + 2x P100 PCIe (16GB) per node: 145.83 (12.2X)
PME-FactorIX_NVE on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
PME-FactorIX_NVE, ns/day (speedup in parentheses):
- 1 Broadwell node: 11.98
- 1 node + 1x P100 SXM2 per node: 114.88 (9.6X)
- 1 node + 2x P100 SXM2 per node: 159.24 (13.3X)
- 1 node + 4x P100 SXM2 per node: 178.02 (14.9X)
PME-JAC_NPT on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
PME-JAC_NPT, ns/day (speedup in parentheses):
- 1 Broadwell node: 45.89
- 1 node + 1x P100 PCIe (16GB) per node: 283.60 (6.2X)
- 1 node + 2x P100 PCIe (16GB) per node: 327.69 (7.1X)
PME-JAC_NPT on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
PME-JAC_NPT, ns/day (speedup in parentheses):
- 1 Broadwell node: 45.89
- 1 node + 1x P100 SXM2 per node: 310.52 (6.8X)
- 1 node + 2x P100 SXM2 per node: 360.64 (7.9X)
- 1 node + 4x P100 SXM2 per node: 423.09 (9.2X)
PME-JAC_NVE on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
PME-JAC_NVE, ns/day (speedup in parentheses):
- 1 Broadwell node: 47.90
- 1 node + 1x P100 PCIe (16GB) per node: 308.46 (6.4X)
- 1 node + 2x P100 PCIe (16GB) per node: 363.79 (7.6X)
PME-JAC_NVE on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
PME-JAC_NVE, ns/day (speedup in parentheses):
- 1 Broadwell node: 47.90
- 1 node + 1x P100 SXM2 per node: 339.81 (7.1X)
- 1 node + 2x P100 SXM2 per node: 402.18 (8.4X)
- 1 node + 4x P100 SXM2 per node: 473.10 (9.9X)
GB-Myoglobin on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
GB-Myoglobin, ns/day (speedup in parentheses):
- 1 Broadwell node: 28.86
- 1 node + 1x P100 PCIe (16GB) per node: 483.37 (16.7X)
- 1 node + 4x P100 PCIe (16GB) per node: 561.94 (19.5X)
GB-Myoglobin on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
GB-Myoglobin, ns/day (speedup in parentheses):
- 1 Broadwell node: 28.86
- 1 node + 1x P100 SXM2 per node: 534.28 (18.5X)
- 1 node + 4x P100 SXM2 per node: 639.37 (22.2X)
GB-Nucleosome on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
GB-Nucleosome, ns/day (speedup in parentheses):
- 1 Broadwell node: 0.40
- 1 node + 1x P100 PCIe (16GB) per node: 11.91 (29.8X)
- 1 node + 2x P100 PCIe (16GB) per node: 22.77 (56.9X)
- 1 node + 4x P100 PCIe (16GB) per node: 39.91 (99.8X)
- 1 node + 8x P100 PCIe (16GB) per node: 45.92 (114.8X)
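Multi-GPU scaling can be summarised as parallel efficiency: throughput on n GPUs divided by n times the single-GPU throughput. The sketch below applies that standard definition to the GB-Nucleosome P100 PCIe values above; the variable names are illustrative.

```python
# GB-Nucleosome on P100 PCIe: GPUs per node -> ns/day (values from the slide).
ns_per_day = {1: 11.91, 2: 22.77, 4: 39.91, 8: 45.92}

single_gpu = ns_per_day[1]

# Parallel efficiency relative to one GPU: 1.0 means perfect linear scaling.
efficiency = {n: rate / (n * single_gpu) for n, rate in ns_per_day.items()}

for n, eff in efficiency.items():
    print(f"{n} GPU(s): {ns_per_day[n]:6.2f} ns/day, efficiency {eff:.0%}")
```

On these numbers efficiency stays above 80% up to 4 GPUs and falls below 50% at 8 GPUs.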
GB-Nucleosome on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
GB-Nucleosome, ns/day (speedup in parentheses):
- 1 Broadwell node: 0.40
- 1 node + 1x P100 SXM2 per node: 13.36 (33.4X)
- 1 node + 2x P100 SXM2 per node: 25.53 (63.8X)
- 1 node + 4x P100 SXM2 per node: 46.29 (115.7X)
- 1 node + 8x P100 SXM2 per node: 48.29 (120.7X)
Rubisco-75K on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
Rubisco-75K, ns/day (speedup in parentheses):
- 1 Broadwell node: 0.01
- 1 node + 1x P100 PCIe (16GB) per node: 0.71 (71.0X)
- 1 node + 2x P100 PCIe (16GB) per node: 1.40 (140.0X)
- 1 node + 4x P100 PCIe (16GB) per node: 2.69 (269.0X)
- 1 node + 8x P100 PCIe (16GB) per node: 4.20 (420.0X)
Rubisco-75K on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
Rubisco-75K, ns/day (speedup in parentheses):
- 1 Broadwell node: 0.01
- 1 node + 1x P100 SXM2 per node: 0.80 (80.0X)
- 1 node + 2x P100 SXM2 per node: 1.57 (157.0X)
- 1 node + 4x P100 SXM2 per node: 3.06 (306.0X)
- 1 node + 8x P100 SXM2 per node: 4.46 (446.0X)
Recommended GPU Node Configuration for AMBER Computational Chemistry
Workstation or Single Node Configuration
- # of CPU sockets: 2
- Cores per CPU socket: 6+ (1 CPU core drives 1 GPU)
- CPU speed (GHz): 2.66+
- System memory per node (GB): 16
- GPUs: P100, V100
- # of GPUs per CPU socket: 1-4
- GPU memory preference (GB): 6
- GPU to CPU connection: PCIe 3.0 16x or higher
- Server storage: 2 TB
- Network configuration: InfiniBand QDR or better
Scale to multiple nodes with same single node configuration
July 2016
CHARMM DOMDEC-GUI
CHARMM DOMDEC-GUI 465 K System Benchmark
Running CHARMM version c40a1
The blue node contains Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs
Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error.
465 K System (Her1_HER1_membrane), ns/day (speedup in parentheses; higher is better):
- 1 Haswell node: 0.36
- 1 node + 1x K80 per node: 2.15 (6.0X)
CHARMM DOMDEC-GUI 534 K System Benchmark
Running CHARMM version c40a1
The blue node contains Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs
Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error.
534 K System (POPC_PSPC_CHL1mixture), ns/day (speedup in parentheses; higher is better):
- 1 Haswell node: 0.18
- 1 node + 1x K80 per node: 1.43 (8.0X)
CHARMM DOMDEC-GUI 20 K System Benchmark
Running CHARMM version c40a1
The blue node contains Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs + Tesla M40 GPUs
Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error.
20 K System (Crambin), ns/day (speedup in parentheses; higher is better):
- 1 Haswell node: 16.00
- 1 node + 1x M40 per node: 59.68 (3.7X)
CHARMM DOMDEC-GUI 61 K System Benchmark
Running CHARMM version c40a1
The blue node contains Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs + Tesla M40 GPUs
Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error.
61 K System (GlnBP), ns/day (speedup in parentheses; higher is better):
- 1 Haswell node: 3.90
- 1 node + 1x M40 per node: 25.08 (6.4X)
CHARMM DOMDEC-GUI 465 K System Benchmark
Running CHARMM version c40a1
The blue node contains Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs + Tesla M40 GPUs
Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error.
465 K System (Her1_HER1_membrane), ns/day (speedup in parentheses; higher is better):
- 1 Haswell node: 0.36
- 1 node + 1x M40 per node: 2.27 (6.3X)
October 2017
GROMACS 2016.4
Water 1.5M on V100s PCIe
(Untuned on Volta) Running GROMACS version 2016.4
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 2.28
- 1 node + 2x V100 PCIe per node (16GB): 5.34 (2.3X)
- 1 node + 2x V100 PCIe per node (16GB): 7.30 (3.2X)
Water 3M on V100s PCIe
(Untuned on Volta) Running GROMACS version 2016.4
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs
ns/day (speedup in parentheses):
- 1 Broadwell node: 1.12
- 1 node + 2x V100 PCIe per node (16GB): 2.53 (2.3X)
- 1 node + 2x V100 PCIe per node (16GB): 3.85 (3.4X)
October 2016
GROMACS 2016
Water 1.5M on P100 PCIes
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
Water 1.5M, ns/day (speedup in parentheses):
- 1 Broadwell node: 2.79
- 1 node + 2x P100 PCIe (16GB) per node: 6.34 (2.3X)
- 1 node + 4x P100 PCIe (16GB) per node: 7.11 (2.5X)
Water 3M on P100 PCIes
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
Water 3M, ns/day (speedup in parentheses):
- 1 Broadwell node: 1.32
- 1 node + 2x P100 PCIe (16GB) per node: 3.16 (2.4X)
- 1 node + 4x P100 PCIe (16GB) per node: 3.43 (2.6X)
February 2017
GROMACS 5.1.2
Water 1.5M on P100s PCIe
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
Water 1.5M, ns/day (speedup in parentheses):
- 1 Broadwell node: 3.04
- 1 node + 1x P100 PCIe (16GB) per node: 4.39 (1.4X)
- 1 node + 2x P100 PCIe (16GB) per node: 6.96 (2.3X)
- 1 node + 4x P100 PCIe (16GB) per node: 7.21 (2.4X)
Water 1.5M on P100s SXM2
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
Water 1.5M, ns/day (speedup in parentheses):
- 1 Broadwell node: 3.04
- 1 node + 1x P100 SXM2 per node: 4.11 (1.4X)
- 1 node + 2x P100 SXM2 per node: 6.70 (2.2X)
- 1 node + 4x P100 SXM2 per node: 7.18 (2.4X)
- 1 node + 8x P100 SXM2 per node: 7.88 (2.6X)
Water 3M on P100s PCIe
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell)
Water 3M, ns/day (speedup in parentheses):
- 1 Broadwell node: 1.38
- 1 node + 1x P100 PCIe (16GB) per node: 1.96 (1.4X)
- 1 node + 2x P100 PCIe (16GB) per node: 3.43 (2.5X)
- 1 node + 4x P100 PCIe (16GB) per node: 3.80 (2.8X)
73
Water 3M on P100s SXM2
Running GROMACS version 5.1.2.
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
Water 3M (ns/day):
- 1 Broadwell node: 1.38
- 1 node + 1x P100 SXM2 per node: 1.84 (1.3X)
- 1 node + 2x P100 SXM2 per node: 3.50 (2.5X)
- 1 node + 4x P100 SXM2 per node: 3.82 (2.8X)
74
Recommended GPU Node Configuration for GROMACS Computational Chemistry
Workstation or Single Node Configuration
- # of CPU sockets: 2
- Cores per CPU socket: 6+
- CPU speed (GHz): 2.66+
- System memory per socket (GB): 32
- GPUs: Tesla P100, V100
- # of GPUs per CPU socket: 1x (Kepler GPUs: need fast Sandy Bridge or Ivy Bridge, or high-end AMD Opterons)
- GPU memory preference (GB): 6
- GPU to CPU connection: PCIe 3.0 or higher
- Server storage: 500 GB or higher
- Network configuration: Gemini, InfiniBand
September 2017
HOOMD-Blue 2.1.6
76
lj-liquid on V100s PS PCIe
(Untuned on Volta) Running HOOMD-Blue version 2.1.6.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) or V100 PS PCIe (16GB) GPUs.
lj-liquid (ns/day):
- 1 Broadwell node: 238.47
- 1 node + 2x P100 PCIe per node (16GB): 2730.94
- 1 node + 2x V100 PS PCIe per node (16GB): 3890.73
77
microsphere on V100s PS PCIe
(Untuned on Volta) Running HOOMD-Blue version 2.1.6.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) or V100 PS PCIe (16GB) GPUs.
microsphere (ns/day):
- 1 Broadwell node: 9.79
- 1 node + 2x P100 PCIe per node (16GB): 182.20
- 1 node + 4x P100 PCIe per node (16GB): 262.79
- 1 node + 8x P100 PCIe per node (16GB): 360.26
- 1 node + 2x V100 PS PCIe per node (16GB): 298.15
- 1 node + 4x V100 PS PCIe per node (16GB): 371.06
- 1 node + 8x V100 PS PCIe per node (16GB): 466.88
78
quasicrystal on V100s PS PCIe
(Untuned on Volta) Running HOOMD-Blue version 2.1.6.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) or V100 PS PCIe (16GB) GPUs.
quasicrystal (ns/day):
- 1 Broadwell node: 52.82
- 1 node + 2x P100 PCIe per node (16GB): 1184.14
- 1 node + 4x P100 PCIe per node (16GB): 1819.76
- 1 node + 2x V100 PS PCIe per node (16GB): 2371.16
- 1 node + 4x V100 PS PCIe per node (16GB): 2530.74
79
triblock-copolymer on V100s PS PCIe
(Untuned on Volta) Running HOOMD-Blue version 2.1.6.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) or V100 PS PCIe (16GB) GPUs.
triblock-copolymer (ns/day):
- 1 Broadwell node: 234.71
- 1 node + 2x P100 PCIe per node (16GB): 1972.93
- 1 node + 2x V100 PS PCIe per node (16GB): 2761.75
80
dodecahedron on V100s PS PCIe
(Untuned on Volta) Running HOOMD-Blue version 2.1.6.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) or V100 PS PCIe (16GB) GPUs.
dodecahedron (ns/day):
- 1 Broadwell node: 25.84
- 1 node + 2x P100 PCIe per node (16GB): 121.49
- 1 node + 4x P100 PCIe per node (16GB): 196.18
- 1 node + 8x P100 PCIe per node (16GB): 226.39
- 1 node + 2x V100 PS PCIe per node (16GB): 172.28
- 1 node + 4x V100 PS PCIe per node (16GB): 277.85
- 1 node + 8x V100 PS PCIe per node (16GB): 293.25
81
hexagon on V100s PS PCIe
(Untuned on Volta) Running HOOMD-Blue version 2.1.6.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) or V100 PS PCIe (16GB) GPUs.
hexagon (ns/day):
- 1 Broadwell node: 6.33
- 1 node + 2x P100 PCIe per node (16GB): 30.30
- 1 node + 4x P100 PCIe per node (16GB): 55.15
- 1 node + 8x P100 PCIe per node (16GB): 102.16
- 1 node + 2x V100 PS PCIe per node (16GB): 37.30
- 1 node + 4x V100 PS PCIe per node (16GB): 69.55
- 1 node + 8x V100 PS PCIe per node (16GB): 126.70
October 2017
HOOMD-Blue 2.1.6
83
lj-liquid on V100s PCIe
(Untuned on Volta) Running HOOMD-Blue version 2.1.6.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs.
Average TPS:
- 1 Broadwell node: 238.47
- 1 node + 2x V100 PCIe per node (16GB): 3890.73 (16.3X)
84
lj-liquid on V100s SXM2
(Untuned on Volta) Running HOOMD-Blue version 2.1.6.
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs.
Average TPS:
- 1 Broadwell node: 238.47
- 1 node + 2x V100 SXM2 per node (16GB): 4285.59 (18.0X)
- 1 node + 4x V100 SXM2 per node (16GB): 4435.12 (18.6X)
85
microsphere on V100 PCIe
(Untuned on Volta) Running HOOMD-Blue version 2.1.6.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs.
Average TPS:
- 1 Broadwell node: 9.79
- 1 node + 2x V100 PCIe per node (16GB): 298.15 (30.5X)
- 1 node + 4x V100 PCIe per node (16GB): 371.06 (37.9X)
- 1 node + 8x V100 PCIe per node (16GB): 466.88 (47.7X)
86
microsphere on V100s SXM2
(Untuned on Volta) Running HOOMD-Blue version 2.1.6.
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs.
Average TPS:
- 1 Broadwell node: 9.79
- 1 node + 2x V100 SXM2 per node (16GB): 329.43 (33.6X)
- 1 node + 4x V100 SXM2 per node (16GB): 506.09 (51.7X)
- 1 node + 8x V100 SXM2 per node (16GB): 688.99 (70.4X)
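The microsphere results on V100 SXM2 cover 2, 4, and 8 GPUs per node, so they can also be read as a within-node scaling curve. The sketch below computes a scaling efficiency relative to the 2-GPU run; this efficiency definition (measured speedup divided by the ideal speedup from the added GPUs) and the `scaling_efficiency` helper are assumptions made for illustration, not a metric reported in the deck.

```python
# Average TPS from the "microsphere on V100s SXM2" slide, keyed by GPUs per node.
tps_by_gpus = {2: 329.43, 4: 506.09, 8: 688.99}


def scaling_efficiency(tps, reference_gpus):
    """Measured speedup over the reference run divided by the ideal speedup.

    A value of 1.0 means throughput grew in proportion to the GPU count.
    """
    reference_tps = tps[reference_gpus]
    return {
        gpus: (value / reference_tps) / (gpus / reference_gpus)
        for gpus, value in sorted(tps.items())
    }


efficiency = scaling_efficiency(tps_by_gpus, reference_gpus=2)
for gpus, eff in efficiency.items():
    print(f"{gpus} GPUs: {eff:.0%} of ideal scaling from 2 GPUs")
```

Under this definition, throughput keeps rising from 2 to 8 GPUs, but each added GPU contributes less than the previous ones.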
87
quasicrystal on V100s PCIe
(Untuned on Volta) Running HOOMD-Blue version 2.1.6.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs.
Average TPS:
- 1 Broadwell node: 52.82
- 1 node + 2x V100 PCIe per node (16GB): 2371.16 (44.9X)
- 1 node + 4x V100 PCIe per node (16GB): 2530.74 (47.9X)
88
quasicrystal on V100s SXM2
(Untuned on Volta) Running HOOMD-Blue version 2.1.6.
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs.
Average TPS:
- 1 Broadwell node: 52.82
- 1 node + 2x V100 SXM2 per node (16GB): 2546.38 (48.2X)
- 1 node + 4x V100 SXM2 per node (16GB): 3015.42 (57.1X)
89
triblock-copolymer on V100s PCIe
(Untuned on Volta) Running HOOMD-Blue version 2.1.6.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs.
Average TPS:
- 1 Broadwell node: 234.71
- 1 node + 2x V100 PCIe per node (16GB): 2761.75 (11.8X)
90
triblock-copolymer on V100s SXM2
(Untuned on Volta) Running HOOMD-Blue version 2.1.6.
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs.
Average TPS:
- 1 Broadwell node: 234.71
- 1 node + 2x V100 SXM2 per node (16GB): 2958.60 (12.6X)
- 1 node + 4x V100 SXM2 per node (16GB): 3188.84 (13.6X)
91
dodecahedron on V100s PCIe
(Untuned on Volta) Running HOOMD-Blue version 2.1.6.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs.
Average TPS:
- 1 Broadwell node: 25.84
- 1 node + 2x V100 PCIe per node (16GB): 172.28 (6.7X)
- 1 node + 4x V100 PCIe per node (16GB): 277.85 (10.8X)
- 1 node + 8x V100 PCIe per node (16GB): 293.25 (11.3X)
92
dodecahedron on V100s SXM2
(Untuned on Volta) Running HOOMD-Blue version 2.1.6.
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs.
Average TPS:
- 1 Broadwell node: 25.84
- 1 node + 2x V100 SXM2 per node (16GB): 179.94 (7.0X)
- 1 node + 4x V100 SXM2 per node (16GB): 309.65 (12.0X)
- 1 node + 8x V100 SXM2 per node (16GB): 317.00 (12.3X)
93
hexagon on V100s PCIe
(Untuned on Volta) Running HOOMD-Blue version 2.1.6.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs.
Average TPS:
- 1 Broadwell node: 6.33
- 1 node + 2x V100 PCIe per node (16GB): 37.30 (5.9X)
- 1 node + 4x V100 PCIe per node (16GB): 69.55 (11.0X)
- 1 node + 8x V100 PCIe per node (16GB): 126.70 (20.0X)
94
hexagon on V100s SXM2
(Untuned on Volta) Running HOOMD-Blue version 2.1.6.
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs.
Average TPS:
- 1 Broadwell node: 6.33
- 1 node + 2x V100 SXM2 per node (16GB): 38.70 (6.1X)
- 1 node + 4x V100 SXM2 per node (16GB): 69.08 (10.9X)
- 1 node + 8x V100 SXM2 per node (16GB): 119.50 (18.9X)
October 2017
LAMMPS 2017
96
Atomic-Fluid Lennard-Jones 2.5 Cutoff on V100s PCIe
Performance (1/seconds), 2,048,000 atoms:
- 1 Broadwell node: 0.25
- 1 node + 2x V100 PCIe per node (16GB): 0.73 (3.0X)
(Untuned on Volta) Running LAMMPS version 2017.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs.
97
Atomic-Fluid Lennard-Jones 2.5 Cutoff on V100s SXM2
Performance (1/seconds), 2,048,000 atoms:
- 1 Broadwell node: 0.25
- 1 node + 2x V100 SXM2 per node (16GB): 0.82 (3.3X)
(Untuned on Volta) Running LAMMPS version 2017.
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs.
98
Atomic-Fluid Lennard-Jones 5.0 Cutoff on V100s PCIe
Performance (1/seconds), 2,048,000 atoms:
- 1 Broadwell node: 0.06
- 1 node + 2x V100 PCIe per node (16GB): 0.45 (7.5X)
- 1 node + 4x V100 PCIe per node (16GB): 0.47 (7.8X)
- 1 node + 8x V100 PCIe per node (16GB): 0.60 (10.0X)
(Untuned on Volta) Running LAMMPS version 2017.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs.
99
Atomic-Fluid Lennard-Jones 5.0 Cutoff on V100s SXM2
Performance (1/seconds), 2,048,000 atoms:
- 1 Broadwell node: 0.06
- 1 node + 2x V100 SXM2 per node (16GB): 0.48 (8.0X)
- 1 node + 4x V100 SXM2 per node (16GB): 0.55 (9.2X)
- 1 node + 8x V100 SXM2 per node (16GB): 0.56 (9.3X)
(Untuned on Volta) Running LAMMPS version 2017.
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs.
100
Coarse-grain Water on V100s PCIe
Performance (1/seconds), 2,048,000 atoms:
- 1 Broadwell node: 0.003
- 1 node + 2x V100 PCIe per node (16GB): 0.007 (2.3X)
- 1 node + 4x V100 PCIe per node (16GB): 0.011 (3.7X)
- 1 node + 8x V100 PCIe per node (16GB): 0.016 (5.3X)
(Untuned on Volta) Running LAMMPS version 2017.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs.
101
Coarse-grain Water on V100s SXM2
Performance (1/seconds), 2,048,000 atoms:
- 1 Broadwell node: 0.003
- 1 node + 2x V100 SXM2 per node (16GB): 0.009 (3.0X)
- 1 node + 4x V100 SXM2 per node (16GB): 0.014 (4.7X)
- 1 node + 8x V100 SXM2 per node (16GB): 0.020 (6.7X)
(Untuned on Volta) Running LAMMPS version 2017.
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs.
102
Gay-Berne on V100s PCIe
Performance (1/seconds), 2,097,152 atoms:
- 1 Broadwell node: 0.01
- 1 node + 2x V100 PCIe per node (16GB): 0.05 (7.5X)
(Untuned on Volta) Running LAMMPS version 2017.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs.
103
Gay-Berne on V100s SXM2
Performance (1/seconds), 2,097,152 atoms:
- 1 Broadwell node: 0.01
- 1 node + 2x V100 SXM2 per node (16GB): 0.05 (5.0X)
(Untuned on Volta) Running LAMMPS version 2017.
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs.
104
Rhodopsin on V100s PCIe
Performance (1/seconds), 256,000 atoms:
- 1 Broadwell node: 0.17
- 1 node + 2x V100 PCIe per node (16GB): 0.44 (2.6X)
- 1 node + 4x V100 PCIe per node (16GB): 0.53 (3.1X)
- 1 node + 8x V100 PCIe per node (16GB): 0.58 (3.4X)
(Untuned on Volta) Running LAMMPS version 2017.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs.
105
Rhodopsin on V100s SXM2
Performance (1/seconds), 256,000 atoms:
- 1 Broadwell node: 0.17
- 1 node + 2x V100 SXM2 per node (16GB): 0.42 (2.5X)
- 1 node + 4x V100 SXM2 per node (16GB): 0.60 (3.5X)
- 1 node + 8x V100 SXM2 per node (16GB): 0.68 (4.0X)
(Untuned on Volta) Running LAMMPS version 2017.
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs.
September 2017
LAMMPS 2017
107
Atomic-Fluid Lennard-Jones 2.5 Cutoff on V100s PS PCIe
Performance (1/seconds), 2,048,000 atoms:
- 1 Broadwell node: 0.25
- 1 node + 2x P100 PCIe per node (16GB): 0.73
- 1 node + 2x V100 PS PCIe per node (16GB): 0.73
(Untuned on Volta) Running LAMMPS version 2017.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) or V100 PS PCIe (16GB) GPUs.
108
Atomic-Fluid Lennard-Jones 2.5 Cutoff on V100s PS SXM2
Performance (1/seconds), 2,048,000 atoms:
- 1 Broadwell node: 0.25
- 1 node + 2x P100 SXM2 per node (16GB): 0.70
- 1 node + 2x V100 PS SXM2 per node (16GB): 0.82
(Untuned on Volta) Running LAMMPS version 2017.
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 (16GB) or V100 PS SXM2 (16GB) GPUs.
109
Atomic-Fluid Lennard-Jones 5.0 Cutoff on V100s PS PCIe
Performance (1/seconds), 2,048,000 atoms:
- 1 Broadwell node: 0.06
- 1 node + 2x P100 PCIe per node (16GB): 0.41
- 1 node + 4x P100 PCIe per node (16GB): 0.46
- 1 node + 8x P100 PCIe per node (16GB): 0.56
- 1 node + 2x V100 PS PCIe per node (16GB): 0.45
- 1 node + 4x V100 PS PCIe per node (16GB): 0.47
- 1 node + 8x V100 PS PCIe per node (16GB): 0.60
(Untuned on Volta) Running LAMMPS version 2017.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) or V100 PS PCIe (16GB) GPUs.
110
Atomic-Fluid Lennard-Jones 5.0 Cutoff on V100s PS SXM2
Performance (1/seconds), 2,048,000 atoms:
- 1 Broadwell node: 0.06
- 1 node + 2x P100 SXM2 per node (16GB): 0.36
- 1 node + 4x P100 SXM2 per node (16GB): 0.44
- 1 node + 8x P100 SXM2 per node (16GB): 0.44
- 1 node + 2x V100 PS SXM2 per node (16GB): 0.48
- 1 node + 4x V100 PS SXM2 per node (16GB): 0.55
- 1 node + 8x V100 PS SXM2 per node (16GB): 0.56
(Untuned on Volta) Running LAMMPS version 2017.
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 (16GB) or V100 PS SXM2 (16GB) GPUs.
111
Coarse-grain Water on V100s PS PCIe
Performance (1/seconds), 2,048,000 atoms:
- 1 Broadwell node: 0.003
- 1 node + 2x P100 PCIe per node (16GB): 0.004
- 1 node + 4x P100 PCIe per node (16GB): 0.007
- 1 node + 8x P100 PCIe per node (16GB): 0.011
- 1 node + 2x V100 PS PCIe per node (16GB): 0.007
- 1 node + 4x V100 PS PCIe per node (16GB): 0.011
- 1 node + 8x V100 PS PCIe per node (16GB): 0.016
(Untuned on Volta) Running LAMMPS version 2017.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) or V100 PS PCIe (16GB) GPUs.
112
Coarse-grain Water on V100s PS SXM2
Performance (1/seconds), 2,048,000 atoms:
- 1 Broadwell node: 0.003
- 1 node + 2x P100 SXM2 per node (16GB): 0.004
- 1 node + 4x P100 SXM2 per node (16GB): 0.007
- 1 node + 8x P100 SXM2 per node (16GB): 0.012
- 1 node + 2x V100 PS SXM2 per node (16GB): 0.009
- 1 node + 4x V100 PS SXM2 per node (16GB): 0.014
- 1 node + 8x V100 PS SXM2 per node (16GB): 0.020
(Untuned on Volta) Running LAMMPS version 2017.
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 (16GB) or V100 PS SXM2 (16GB) GPUs.
113
Gay-Berne on V100s PS PCIe
Performance (1/seconds), 2,097,152 atoms:
- 1 Broadwell node: 0.01
- 1 node + 2x P100 PCIe per node (16GB): 0.04
- 1 node + 2x V100 PS PCIe per node (16GB): 0.05
(Untuned on Volta) Running LAMMPS version 2017.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) or V100 PS PCIe (16GB) GPUs.
114
Gay-Berne on V100s PS SXM2
Performance (1/seconds), 2,097,152 atoms:
- 1 Broadwell node: 0.01
- 1 node + 2x P100 SXM2 per node (16GB): 0.04
- 1 node + 2x V100 PS SXM2 per node (16GB): 0.05
(Untuned on Volta) Running LAMMPS version 2017.
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 (16GB) or V100 PS SXM2 (16GB) GPUs.
115
Rhodopsin on V100s PS PCIe
Performance (1/seconds), 256,000 atoms:
- 1 Broadwell node: 0.17
- 1 node + 2x P100 PCIe per node (16GB): 0.41
- 1 node + 4x P100 PCIe per node (16GB): 0.55
- 1 node + 2x V100 PS PCIe per node (16GB): 0.44
- 1 node + 4x V100 PS PCIe per node (16GB): 0.53
(Untuned on Volta) Running LAMMPS version 2017.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) or V100 PS PCIe (16GB) GPUs.
116
Rhodopsin on V100s PS SXM2
Performance (1/seconds), 256,000 atoms:
- 1 Broadwell node: 0.17
- 1 node + 2x P100 SXM2 per node (16GB): 0.40
- 1 node + 4x P100 SXM2 per node (16GB): 0.54
- 1 node + 2x V100 PS SXM2 per node (16GB): 0.42
- 1 node + 4x V100 PS SXM2 per node (16GB): 0.60
(Untuned on Volta) Running LAMMPS version 2017.
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 (16GB) or V100 PS SXM2 (16GB) GPUs.
117
Recommended GPU Node Configuration for LAMMPS Computational Chemistry
Workstation or Single Node Configuration
- # of CPU sockets: 2
- Cores per CPU socket: 6+
- CPU speed (GHz): 2.66+
- System memory per socket (GB): 32
- GPUs: GTX Titan X, Tesla P100, V100
- # of GPUs per CPU socket: 1-2
- GPU memory preference (GB): 6+
- GPU to CPU connection: PCIe 3.0 or higher
- Server storage: 500 GB or higher
- Network configuration: Gemini, InfiniBand
Scale to thousands of nodes with same single node configuration
July 2017
NAMD 2.12
119
APOA1 on P100s PCIe
Running NAMD version 2.12.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs.
APOA1 (ns/day):
- 1 Broadwell node: 3.45
- 1 node + 1x P100 PCIe (16GB) per node: 22.58
- 1 node + 2x P100 PCIe (16GB) per node: 22.85
120
APOA1 on P100s SXM2
Running NAMD version 2.12.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
APOA1 (ns/day):
- 1 Broadwell node: 3.45
- 1 node + 1x P100 SXM2 per node: 22.98
- 1 node + 2x P100 SXM2 per node: 23.44
- 1 node + 4x P100 SXM2 per node: 23.87
121
F1ATPASE on P100s PCIe
Running NAMD version 2.12.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs.
F1ATPASE (ns/day):
- 1 Broadwell node: 1.15
- 1 node + 1x P100 PCIe (16GB) per node: 7.34
- 1 node + 2x P100 PCIe (16GB) per node: 6.99
- 1 node + 4x P100 PCIe (16GB) per node: 7.40
122
F1ATPASE on P100s SXM2
Running NAMD version 2.12.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
F1ATPASE (ns/day):
- 1 Broadwell node: 1.15
- 1 node + 1x P100 SXM2 per node: 7.11
- 1 node + 2x P100 SXM2 per node: 6.85
- 1 node + 4x P100 SXM2 per node: 7.11
123
STMV on P100s PCIe
Running NAMD version 2.12.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs.
STMV (ns/day):
- 1 Broadwell node: 0.29
- 1 node + 1x P100 PCIe (16GB) per node: 2.15
- 1 node + 2x P100 PCIe (16GB) per node: 2.32
124
STMV on P100s SXM2
Running NAMD version 2.12.
The blue node contains Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4@2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
STMV (ns/day):
- 1 Broadwell node: 0.292
- 1 node + 1x P100 SXM2 per node: 2.077
125
Benefits of MD GPU-Accelerated Computing
- 3x-8x faster than CPU-only systems across all tests (on average)
- Most major compute-intensive aspects of classical MD ported
- Large performance boost while saving “Big Money” on CPUs and networks
- Energy usage cut by more than half
- GPUs scale well within a node and/or over multiple nodes
- K80 GPU is our fastest and lowest power high performance GPU yet
Try GPU accelerated MD apps for free – www.nvidia.com/GPUTestDrive
Why wouldn’t you want to turbocharge your research?
- Dec. 19, 2016
Molecular Dynamics (MD) on GPUs
127
GPU-Accelerated Quantum Chemistry Apps
Abinit, ACES III, ADF, BigDFT, CP2K, GAMESS-US, Gaussian, GPAW, LATTE, LSDalton, MOLCAS, Mopac2012, NWChem
Green lettering indicates performance slides included.
GPU Perf compared against dual multi-core x86 CPU socket.