AMBER 16 on V100s (October 2017)



SLIDE 1

October 2017

AMBER 16 on V100s

SLIDE 2

PME-Cellulose_NPT on V100s PCIe

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 1.94
1 node + 2x V100 PCIe per node (16GB): 47.67 (24.6X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs
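The 24.6X label on this chart is the ratio of the GPU-node bar to the CPU-only bar. A minimal sketch of that arithmetic, using the ns/day values from the slide (the function name `speedup` is illustrative and does not come from the deck):

```python
def speedup(gpu_ns_per_day: float, cpu_ns_per_day: float) -> float:
    """Return GPU-node throughput relative to the CPU-only node."""
    if cpu_ns_per_day <= 0:
        raise ValueError("CPU baseline must be positive")
    return gpu_ns_per_day / cpu_ns_per_day


# Values read from the PME-Cellulose_NPT PCIe chart.
cellulose_npt = speedup(47.67, 1.94)
print(f"{cellulose_npt:.1f}X")  # prints 24.6X
```

The same function reproduces the labels on the other charts when given their ns/day values.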

SLIDE 3

PME-Cellulose_NPT on V100s SXM2

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 1.94
1 node + 2x V100 SXM2 per node (16GB): 54.74 (28.2X)
1 node + 4x V100 SXM2 per node (16GB): 55.52 (28.6X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs
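The two GPU bars on this chart can also be compared with each other to see how much the extra pair of GPUs adds. The sketch below is plain arithmetic on the slide's numbers; `scaling_efficiency` is a hypothetical helper name, and the deck itself does not report this metric or explain the result.

```python
def scaling_efficiency(ns_day_small: float, gpus_small: int,
                       ns_day_large: float, gpus_large: int) -> float:
    """Throughput gain divided by the increase in GPU count (1.0 = ideal)."""
    gain = ns_day_large / ns_day_small
    ideal = gpus_large / gpus_small
    return gain / ideal


# PME-Cellulose_NPT on V100 SXM2: 2 GPUs vs. 4 GPUs in one node.
eff = scaling_efficiency(54.74, 2, 55.52, 4)
print(f"2 -> 4 GPUs: {eff:.0%} of ideal scaling")  # prints 51%
```

For this benchmark, doubling the GPU count raises throughput by about 1.4%, which is why the result is close to 50% of ideal.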

SLIDE 4

PME-Cellulose_NVE on V100s PCIe

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 1.96
1 node + 2x V100 PCIe per node (16GB): 54.08 (27.6X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs

SLIDE 5

PME-Cellulose_NVE on V100s SXM2

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 1.96
1 node + 2x V100 SXM2 per node (16GB): 63.04 (32.2X)
1 node + 4x V100 SXM2 per node (16GB): 65.02 (33.2X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs

SLIDE 6

PME-FactorIX_NPT on V100s PCIe

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 9.33
1 node + 2x V100 PCIe per node (16GB): 193.16 (20.7X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs

SLIDE 7

PME-FactorIX_NPT on V100s SXM2

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 9.33
1 node + 2x V100 SXM2 per node (16GB): 217.95 (23.4X)
1 node + 4x V100 SXM2 per node (16GB): 224.23 (24.0X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs

SLIDE 8

PME-FactorIX_NVE on V100s PCIe

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 9.61
1 node + 2x V100 PCIe per node (16GB): 217.95 (22.7X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs

SLIDE 9

PME-FactorIX_NVE on V100s SXM2

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 9.61
1 node + 2x V100 SXM2 per node (16GB): 249.63 (26.0X)
1 node + 4x V100 SXM2 per node (16GB): 261.19 (27.2X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs

SLIDE 10

PME-JAC_NPT on V100s PCIe

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 34.35
1 node + 2x V100 PCIe per node (16GB): 439.87 (12.8X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs

SLIDE 11

PME-JAC_NPT on V100s SXM2

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 34.35
1 node + 2x V100 SXM2 per node (16GB): 481.75 (14.0X)
1 node + 4x V100 SXM2 per node (16GB): 515.36 (15.0X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs

SLIDE 12

PME-JAC_NVE on V100s PCIe

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 36.53
1 node + 2x V100 PCIe per node (16GB): 490.77 (13.4X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs

SLIDE 13

PME-JAC_NVE on V100s SXM2

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 36.53
1 node + 2x V100 SXM2 per node (16GB): 539.78 (14.8X)
1 node + 4x V100 SXM2 per node (16GB): 583.33 (16.0X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs

SLIDE 14

PME-JAC_NPT_4fs on V100s PCIe

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 65.74
1 node + 2x V100 PCIe per node (16GB): 863.80 (13.1X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs
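The 4fs variant reports roughly twice the ns/day of the plain PME-JAC_NPT run on the same PCIe node (863.80 vs. 439.87). Converting both to integration steps per day shows how much of that comes from the longer time step. This sketch assumes the plain JAC run uses a 2 fs step; that is the usual convention for this benchmark, but it is not stated on the slides. The helper name `steps_per_day` is illustrative.

```python
FS_PER_NS = 1_000_000  # 1 ns = 10^6 fs


def steps_per_day(ns_per_day: float, timestep_fs: float) -> float:
    """Convert simulated time per day into integration steps per day."""
    if timestep_fs <= 0:
        raise ValueError("time step must be positive")
    return ns_per_day * FS_PER_NS / timestep_fs


# 2x V100 PCIe results from the PME-JAC_NPT and PME-JAC_NPT_4fs charts.
jac_npt_2fs = steps_per_day(439.87, 2.0)  # ASSUMPTION: 2 fs step, not given in the deck
jac_npt_4fs = steps_per_day(863.80, 4.0)  # 4 fs step, from the benchmark name

print(f"2 fs run: {jac_npt_2fs / 1e6:.0f} million steps/day")
print(f"4 fs run: {jac_npt_4fs / 1e6:.0f} million steps/day")
```

Under that assumption both runs complete a similar number of steps per day (about 220 million vs. 216 million), so the higher ns/day figure reflects the longer step rather than faster stepping.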

SLIDE 15

PME-JAC_NPT_4fs on V100s SXM2

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 65.74
1 node + 2x V100 SXM2 per node (16GB): 946.57 (14.4X)
1 node + 4x V100 SXM2 per node (16GB): 1006.32 (15.3X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs

SLIDE 16

PME-JAC_NVE_4fs on V100s PCIe

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 67.10
1 node + 2x V100 PCIe per node (16GB): 940.32 (14.0X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs

SLIDE 17

PME-JAC_NVE_4fs on V100s SXM2

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 67.10
1 node + 2x V100 SXM2 per node (16GB): 1027.44 (15.3X)
1 node + 4x V100 SXM2 per node (16GB): 1123.40 (16.7X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs

SLIDE 18

PME-STMV_NPT_4fs on V100s PCIe

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 1.06
1 node + 2x V100 PCIe per node (16GB): 33.21 (31.3X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs

SLIDE 19

PME-STMV_NPT_4fs on V100s SXM2

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 1.06
1 node + 2x V100 SXM2 per node (16GB): 37.24 (35.1X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs

SLIDE 20

GB-Myoglobin on V100s PCIe

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 22.30
1 node + 2x V100 PCIe per node (16GB): 699.21 (31.4X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs

SLIDE 21

GB-Myoglobin on V100s SXM2

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 22.30
1 node + 2x V100 SXM2 per node (16GB): 750.76 (33.7X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs

SLIDE 22

GB-Nucleosome on V100s PCIe

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 0.31
1 node + 2x V100 PCIe per node (16GB): 49.14 (158.5X)
1 node + 4x V100 PCIe per node (16GB): 78.39 (252.9X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs

SLIDE 23

GB-Nucleosome on V100s SXM2

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 0.31
1 node + 2x V100 SXM2 per node (16GB): 52.89 (170.6X)
1 node + 4x V100 SXM2 per node (16GB): 92.46 (298.3X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs

SLIDE 24

Rubisco on V100s PCIe

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 0.01
1 node + 2x V100 PCIe per node (16GB): 2.79 (279.0X)
1 node + 4x V100 PCIe per node (16GB): 5.22 (522.0X)
1 node + 8x V100 PCIe per node (16GB): 6.78 (678.0X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs

SLIDE 25

Rubisco on V100s SXM2

ns/day (speedup vs. 1 Broadwell node):
1 Broadwell node: 0.01
1 node + 2x V100 SXM2 per node (16GB): 3.00 (300.0X)
1 node + 4x V100 SXM2 per node (16GB): 5.96 (596.0X)
1 node + 8x V100 SXM2 per node (16GB): 7.00 (700.0X)

(Untuned on Volta) Running AMBER version 16.8
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs

SLIDE 26

Recommended GPU Node Configuration for AMBER Computational Chemistry

Workstation or Single Node Configuration

# of CPU sockets: 2
Cores per CPU socket: 6+ (1 CPU core drives 1 GPU)
CPU speed (GHz): 2.66+
System memory per node (GB): 16
GPUs: P100, V100
# of GPUs per CPU socket: 1-4
GPU memory preference (GB): 6
GPU to CPU connection: PCIe 3.0 16x or higher
Server storage: 2 TB
Network configuration: InfiniBand QDR or better

Scale to multiple nodes with same single node configuration
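The recommendations on this slide can be written down as a simple checklist. The sketch below is only an illustration: `NodeSpec`, `check_node`, and all field names are hypothetical, and it assumes the memory figures (16 GB system, 6 GB GPU) are meant as minimums, which the slide does not say explicitly.

```python
from dataclasses import dataclass


@dataclass
class NodeSpec:
    """Hypothetical description of a single node; field names are illustrative."""
    cpu_sockets: int
    cores_per_socket: int
    cpu_ghz: float
    system_memory_gb: int
    gpus_per_socket: int
    gpu_memory_gb: int


def check_node(spec: NodeSpec) -> list[str]:
    """Return the slide's recommendations that the node does not meet."""
    problems = []
    if spec.cpu_sockets != 2:
        problems.append("recommended # of CPU sockets is 2")
    if spec.cores_per_socket < 6:
        problems.append("recommended cores per CPU socket is 6+")
    if spec.cores_per_socket < spec.gpus_per_socket:
        problems.append("1 CPU core drives 1 GPU: fewer cores than GPUs")
    if spec.cpu_ghz < 2.66:
        problems.append("recommended CPU speed is 2.66+ GHz")
    if spec.system_memory_gb < 16:  # ASSUMPTION: 16 GB read as a minimum
        problems.append("recommended system memory per node is 16 GB")
    if not 1 <= spec.gpus_per_socket <= 4:
        problems.append("recommended # of GPUs per CPU socket is 1-4")
    if spec.gpu_memory_gb < 6:  # ASSUMPTION: 6 GB read as a minimum
        problems.append("GPU memory preference is 6 GB")
    return problems


# Hypothetical node: 2 sockets, 8 cores each, 3.0 GHz, 64 GB RAM, 2 GPUs per socket, 16 GB per GPU.
example = NodeSpec(2, 8, 3.0, 64, 2, 16)
print(check_node(example) or "meets the listed recommendations")
```

The checks cover only the numeric rows; GPU model, PCIe link, storage, and network would need their own fields.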
