October 2017
AMBER 16 on V100s

PME-Cellulose_NPT on V100s PCIe

Configuration                             ns/day   Speedup
1 Broadwell node                            1.94   baseline
1 node + 2x V100 PCIe per node (16GB)      47.67   24.6X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 PCIe (16GB) GPUs.
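
The speedup figure on these slides is the GPU-node throughput divided by the CPU-only throughput. The short Python sketch below reproduces it for the PME-Cellulose_NPT PCIe numbers and converts throughput into wall-clock time. The function names and the 100 ns target are illustrative assumptions, not part of the original deck.

```python
def speedup(gpu_ns_per_day: float, cpu_ns_per_day: float) -> float:
    """Throughput of the GPU node relative to the CPU-only baseline."""
    return gpu_ns_per_day / cpu_ns_per_day


def days_for(target_ns: float, ns_per_day: float) -> float:
    """Wall-clock days needed to simulate target_ns nanoseconds."""
    return target_ns / ns_per_day


# Values taken from the PME-Cellulose_NPT on V100s PCIe slide.
cpu_ns_per_day = 1.94
gpu_ns_per_day = 47.67

# Hypothetical simulation length, chosen only for illustration.
target_ns = 100.0

print(f"speedup: {speedup(gpu_ns_per_day, cpu_ns_per_day):.1f}X")
print(f"CPU-only node: {days_for(target_ns, cpu_ns_per_day):.1f} days")
print(f"2x V100 PCIe:  {days_for(target_ns, gpu_ns_per_day):.1f} days")
```

The same two functions apply to any slide in this deck by substituting its ns/day values.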

PME-Cellulose_NPT on V100s SXM2

Configuration                             ns/day   Speedup
1 Broadwell node                            1.94   baseline
1 node + 2x V100 SXM2 per node (16GB)      54.74   28.2X
1 node + 4x V100 SXM2 per node (16GB)      55.52   28.6X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 SXM2 (16GB) GPUs.
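
Where a slide reports both 2x and 4x GPU results, the gain from the extra GPUs can be read off directly from the ns/day values. The sketch below does this for the PME-Cellulose_NPT SXM2 numbers. The helper name is an assumption for illustration, and the deck itself does not state why the gain is what it is.

```python
def relative_gain(ns_day_more_gpus: float, ns_day_fewer_gpus: float) -> float:
    """Fractional throughput gain when moving to the larger GPU count."""
    return ns_day_more_gpus / ns_day_fewer_gpus - 1.0


# ns/day by GPU count, from the PME-Cellulose_NPT on V100s SXM2 slide.
results = {2: 54.74, 4: 55.52}

gain = relative_gain(results[4], results[2])
print(f"2x -> 4x V100 SXM2: {gain:.1%} more ns/day")
```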

PME-Cellulose_NVE on V100s PCIe

Configuration                             ns/day   Speedup
1 Broadwell node                            1.96   baseline
1 node + 2x V100 PCIe per node (16GB)      54.08   27.6X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 PCIe (16GB) GPUs.

PME-Cellulose_NVE on V100s SXM2

Configuration                             ns/day   Speedup
1 Broadwell node                            1.96   baseline
1 node + 2x V100 SXM2 per node (16GB)      63.04   32.2X
1 node + 4x V100 SXM2 per node (16GB)      65.02   33.2X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 SXM2 (16GB) GPUs.

PME-FactorIX_NPT on V100s PCIe

Configuration                             ns/day   Speedup
1 Broadwell node                            9.33   baseline
1 node + 2x V100 PCIe per node (16GB)     193.16   20.7X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 PCIe (16GB) GPUs.

PME-FactorIX_NPT on V100s SXM2

Configuration                             ns/day   Speedup
1 Broadwell node                            9.33   baseline
1 node + 2x V100 SXM2 per node (16GB)     217.95   23.4X
1 node + 4x V100 SXM2 per node (16GB)     224.23   24.0X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 SXM2 (16GB) GPUs.

PME-FactorIX_NVE on V100s PCIe

Configuration                             ns/day   Speedup
1 Broadwell node                            9.61   baseline
1 node + 2x V100 PCIe per node (16GB)     217.95   22.7X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 PCIe (16GB) GPUs.

PME-FactorIX_NVE on V100s SXM2

Configuration                             ns/day   Speedup
1 Broadwell node                            9.61   baseline
1 node + 2x V100 SXM2 per node (16GB)     249.63   26.0X
1 node + 4x V100 SXM2 per node (16GB)     261.19   27.2X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 SXM2 (16GB) GPUs.

PME-JAC_NPT on V100s PCIe

Configuration                             ns/day   Speedup
1 Broadwell node                           34.35   baseline
1 node + 2x V100 PCIe per node (16GB)     439.87   12.8X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 PCIe (16GB) GPUs.

PME-JAC_NPT on V100s SXM2

Configuration                             ns/day   Speedup
1 Broadwell node                           34.35   baseline
1 node + 2x V100 SXM2 per node (16GB)     481.75   14.0X
1 node + 4x V100 SXM2 per node (16GB)     515.36   15.0X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 SXM2 (16GB) GPUs.

PME-JAC_NVE on V100s PCIe

Configuration                             ns/day   Speedup
1 Broadwell node                           36.53   baseline
1 node + 2x V100 PCIe per node (16GB)     490.77   13.4X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 PCIe (16GB) GPUs.

PME-JAC_NVE on V100s SXM2

Configuration                             ns/day   Speedup
1 Broadwell node                           36.53   baseline
1 node + 2x V100 SXM2 per node (16GB)     539.78   14.8X
1 node + 4x V100 SXM2 per node (16GB)     583.33   16.0X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 SXM2 (16GB) GPUs.

PME-JAC_NPT_4fs on V100s PCIe

Configuration                             ns/day   Speedup
1 Broadwell node                           65.74   baseline
1 node + 2x V100 PCIe per node (16GB)     863.80   13.1X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 PCIe (16GB) GPUs.

PME-JAC_NPT_4fs on V100s SXM2

Configuration                             ns/day   Speedup
1 Broadwell node                           65.74   baseline
1 node + 2x V100 SXM2 per node (16GB)     946.57   14.4X
1 node + 4x V100 SXM2 per node (16GB)    1006.32   15.3X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 SXM2 (16GB) GPUs.

PME-JAC_NVE_4fs on V100s PCIe

Configuration                             ns/day   Speedup
1 Broadwell node                           67.10   baseline
1 node + 2x V100 PCIe per node (16GB)     940.32   14.0X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 PCIe (16GB) GPUs.

PME-JAC_NVE_4fs on V100s SXM2

Configuration                             ns/day   Speedup
1 Broadwell node                           67.10   baseline
1 node + 2x V100 SXM2 per node (16GB)    1027.44   15.3X
1 node + 4x V100 SXM2 per node (16GB)    1123.40   16.7X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 SXM2 (16GB) GPUs.

PME-STMV_NPT_4fs on V100s PCIe

Configuration                             ns/day   Speedup
1 Broadwell node                            1.06   baseline
1 node + 2x V100 PCIe per node (16GB)      33.21   31.3X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 PCIe (16GB) GPUs.

PME-STMV_NPT_4fs on V100s SXM2

Configuration                             ns/day   Speedup
1 Broadwell node                            1.06   baseline
1 node + 2x V100 SXM2 per node (16GB)      37.24   35.1X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 SXM2 (16GB) GPUs.

GB-Myoglobin on V100s PCIe

Configuration                             ns/day   Speedup
1 Broadwell node                           22.30   baseline
1 node + 2x V100 PCIe per node (16GB)     699.21   31.4X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 PCIe (16GB) GPUs.

GB-Myoglobin on V100s SXM2

Configuration                             ns/day   Speedup
1 Broadwell node                           22.30   baseline
1 node + 2x V100 SXM2 per node (16GB)     750.76   33.7X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 SXM2 (16GB) GPUs.

GB-Nucleosome on V100s PCIe

Configuration                             ns/day   Speedup
1 Broadwell node                            0.31   baseline
1 node + 2x V100 PCIe per node (16GB)      49.14   158.5X
1 node + 4x V100 PCIe per node (16GB)      78.39   252.9X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 PCIe (16GB) GPUs.

GB-Nucleosome on V100s SXM2

Configuration                             ns/day   Speedup
1 Broadwell node                            0.31   baseline
1 node + 2x V100 SXM2 per node (16GB)      52.89   170.6X
1 node + 4x V100 SXM2 per node (16GB)      92.46   298.3X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 SXM2 (16GB) GPUs.

Rubisco on V100s PCIe

Configuration                             ns/day   Speedup
1 Broadwell node                            0.01   baseline
1 node + 2x V100 PCIe per node (16GB)       2.79   279.0X
1 node + 4x V100 PCIe per node (16GB)       5.22   522.0X
1 node + 8x V100 PCIe per node (16GB)       6.78   678.0X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 PCIe (16GB) GPUs.

Rubisco on V100s SXM2

Configuration                             ns/day   Speedup
1 Broadwell node                            0.01   baseline
1 node + 2x V100 SXM2 per node (16GB)       3.00   300.0X
1 node + 4x V100 SXM2 per node (16GB)       5.96   596.0X
1 node + 8x V100 SXM2 per node (16GB)       7.00   700.0X

(Untuned on Volta) Running AMBER version 16.8. The CPU-only node (blue bar) contains dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. The GPU nodes (green bars) contain the same CPUs plus Tesla V100 SXM2 (16GB) GPUs.
Recommended GPU Node Configuration for AMBER Computational Chemistry
Workstation or Single Node Configuration
# of CPU sockets: 2
Cores per CPU socket: 6+ (1 CPU core drives 1 GPU)
CPU speed (GHz): 2.66+
System memory per node (GB): 16
GPUs: P100, V100
# of GPUs per CPU socket: 1-4
GPU memory preference (GB): 6
GPU to CPU connection: PCIe 3.0 16x or higher
Server storage: 2 TB
Network configuration: InfiniBand QDR or better
Scale to multiple nodes with same single node configuration
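
As a rough illustration, the recommendations above can be turned into a simple checklist. The Python sketch below is hypothetical: the `NodeSpec` fields, the `check_node` function, and the two example nodes are assumptions introduced here. It also reads the table values for system memory and GPU memory as minimums, which the slide does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class NodeSpec:
    """Hypothetical description of one node; field names are illustrative."""
    cpu_sockets: int
    cores_per_socket: int
    cpu_ghz: float
    system_memory_gb: int
    gpus: int
    gpu_memory_gb: int


def check_node(node: NodeSpec) -> list:
    """Return the recommendations from the table that the node misses."""
    issues = []
    if node.cores_per_socket < 6:
        issues.append("cores per CPU socket below 6")
    if node.cpu_ghz < 2.66:
        issues.append("CPU speed below 2.66 GHz")
    # Assumption: 16 GB and 6 GB are treated as lower bounds.
    if node.system_memory_gb < 16:
        issues.append("system memory below 16 GB")
    if node.gpu_memory_gb < 6:
        issues.append("GPU memory below 6 GB")
    gpus_per_socket = node.gpus / node.cpu_sockets
    if not 1 <= gpus_per_socket <= 4:
        issues.append("GPUs per CPU socket outside 1-4")
    # The table notes that 1 CPU core drives 1 GPU.
    if node.cpu_sockets * node.cores_per_socket < node.gpus:
        issues.append("fewer CPU cores than GPUs")
    return issues


# Two made-up nodes, for illustration only.
good = NodeSpec(cpu_sockets=2, cores_per_socket=8, cpu_ghz=3.0,
                system_memory_gb=64, gpus=4, gpu_memory_gb=16)
weak = NodeSpec(cpu_sockets=2, cores_per_socket=4, cpu_ghz=2.2,
                system_memory_gb=8, gpus=10, gpu_memory_gb=4)

print(check_node(good))
for issue in check_node(weak):
    print("-", issue)
```

Storage, interconnect, and GPU model are left out of the check because they are not simple numeric thresholds.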