VASP 5.4.4 (October 2017)

SLIDE 1

October 2017

VASP 5.4.4

SLIDE 2

Silica IFPEN on V100s PCIe

Throughput in 1/seconds, with speedup over the CPU-only node:

  1 Broadwell node                          0.00210
  1 node + 2x V100 PCIe per node (16GB)     0.00418   2.0X
  1 node + 4x V100 PCIe per node (16GB)     0.00537   2.6X
  1 node + 8x V100 PCIe per node (16GB)     0.00628   3.0X

(Untuned on Volta.) Running VASP version 5.4.4.
The blue node contains dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs.

Benchmark: 240 ions, cristobalite (high) bulk; 720 bands; ? plane waves; ALGO = VeryFast (RMM-DIIS).
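The speedup labels on this slide are consistent with dividing each configuration's throughput (1/seconds) by that of the CPU-only Broadwell node. A minimal Python sketch that reproduces them from the chart values; the dictionary keys and the `speedups` helper are illustrative names, not part of the original deck:

```python
# Throughput values (1/seconds) read off the Silica IFPEN PCIe chart.
throughput = {
    "1 Broadwell node": 0.00210,
    "2x V100 PCIe": 0.00418,
    "4x V100 PCIe": 0.00537,
    "8x V100 PCIe": 0.00628,
}


def speedups(values, baseline_key):
    """Return each configuration's throughput divided by the baseline's."""
    base = values[baseline_key]
    return {name: rate / base for name, rate in values.items()}


silica_speedups = speedups(throughput, "1 Broadwell node")
for name, factor in silica_speedups.items():
    # One decimal place matches the precision of the labels on the slide.
    print(f"{name}: {factor:.1f}X")
```

Because the chart values are themselves rounded, the recomputed factors can differ from a slide label in the last digit for other benchmarks in the deck.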

SLIDE 3

Silica IFPEN on V100s SXM2

Throughput in 1/seconds, with speedup over the CPU-only node:

  1 Broadwell node                          0.00210
  1 node + 2x V100 SXM2 per node (16GB)     0.00423   2.0X
  1 node + 4x V100 SXM2 per node (16GB)     0.00541   2.6X
  1 node + 8x V100 SXM2 per node (16GB)     0.00580   2.8X

(Untuned on Volta.) Running VASP version 5.4.4.
The blue node contains dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs.

Benchmark: 240 ions, cristobalite (high) bulk; 720 bands; ? plane waves; ALGO = VeryFast (RMM-DIIS).

SLIDE 4

Si-Huge on V100s PCIe

Throughput in 1/seconds, with speedup over the CPU-only node:

  1 Broadwell node                          0.00017
  1 node + 2x V100 PCIe per node (16GB)     0.00045   2.6X
  1 node + 4x V100 PCIe per node (16GB)     0.00057   3.4X
  1 node + 8x V100 PCIe per node (16GB)     0.00065   3.8X

(Untuned on Volta.) Running VASP version 5.4.4.
The blue node contains dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs.

Benchmark: 512 Si atoms; 1282 bands; 864000 plane waves; ALGO = Normal (blocked Davidson).

SLIDE 5

Si-Huge on V100s SXM2

Throughput in 1/seconds, with speedup over the CPU-only node:

  1 Broadwell node                          0.00017
  1 node + 2x V100 SXM2 per node (16GB)     0.00044   2.6X
  1 node + 4x V100 SXM2 per node (16GB)     0.00056   3.3X
  1 node + 8x V100 SXM2 per node (16GB)     0.00067   4.0X

(Untuned on Volta.) Running VASP version 5.4.4.
The blue node contains dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs.

Benchmark: 512 Si atoms; 1282 bands; 864000 plane waves; ALGO = Normal (blocked Davidson).

SLIDE 6

SupportedSystems on V100s PCIe

Throughput in 1/seconds, with speedup over the CPU-only node:

  1 Broadwell node                          0.0037
  1 node + 2x V100 PCIe per node (16GB)     0.0068   1.8X
  1 node + 4x V100 PCIe per node (16GB)     0.0087   2.4X

(Untuned on Volta.) Running VASP version 5.4.4.
The blue node contains dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs.

Benchmark: 267 ions; 788 bands; 762048 plane waves; ALGO = Fast (Davidson + RMM-DIIS).

SLIDE 7

SupportedSystems on V100s SXM2

Throughput in 1/seconds, with speedup over the CPU-only node:

  1 Broadwell node                          0.0037
  1 node + 2x V100 SXM2 per node (16GB)     0.0068   1.8X
  1 node + 4x V100 SXM2 per node (16GB)     0.0087   2.4X
  1 node + 8x V100 SXM2 per node (16GB)     0.0100   2.7X

(Untuned on Volta.) Running VASP version 5.4.4.
The blue node contains dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs.

Benchmark: 267 ions; 788 bands; 762048 plane waves; ALGO = Fast (Davidson + RMM-DIIS).

SLIDE 8

NiAl-MD on V100s PCIe

Throughput in 1/seconds, with speedup over the CPU-only node:

  1 Broadwell node                          0.0031
  1 node + 2x V100 PCIe per node (16GB)     0.0063   2.0X
  1 node + 4x V100 PCIe per node (16GB)     0.0068   2.2X

(Untuned on Volta.) Running VASP version 5.4.4.
The blue node contains dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs.

Benchmark: 500 ions; 3200 bands; 729000 plane waves; ALGO = Fast (Davidson + RMM-DIIS).

SLIDE 9

NiAl-MD on V100s SXM2

Throughput in 1/seconds, with speedup over the CPU-only node:

  1 Broadwell node                          0.0031
  1 node + 2x V100 SXM2 per node (16GB)     0.0064   2.1X
  1 node + 4x V100 SXM2 per node (16GB)     0.0070   2.3X
  1 node + 8x V100 SXM2 per node (16GB)     0.0074   2.4X

(Untuned on Volta.) Running VASP version 5.4.4.
The blue node contains dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs.

Benchmark: 500 ions; 3200 bands; 729000 plane waves; ALGO = Fast (Davidson + RMM-DIIS).

SLIDE 10

B.hR105 on V100s PCIe

Throughput in 1/seconds, with speedup over the CPU-only node:

  1 Broadwell node                          0.0008
  1 node + 2x V100 PCIe per node (16GB)     0.0077    9.6X
  1 node + 4x V100 PCIe per node (16GB)     0.0112   14.0X
  1 node + 8x V100 PCIe per node (16GB)     0.0119   14.9X

(Untuned on Volta.) Running VASP version 5.4.4.
The blue node contains dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs.

Benchmark: 105 boron atoms (β-rhombohedral structure); 216 bands; 110592 plane waves; hybrid functional with blocked Davidson (ALGO = Normal); LHFCALC = .TRUE. (exact exchange).
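The charts plot 1/seconds, so the order-of-magnitude gain for this hybrid-functional case is easier to appreciate as elapsed time. The sketch below inverts the B.hR105 PCIe values, assuming the plotted metric is the reciprocal of the benchmark's elapsed time; that interpretation, the `seconds_per_run` helper and the dictionary keys are assumptions, not stated on the slide:

```python
# Throughput values (1/seconds) read off the B.hR105 PCIe chart.
rates = {
    "1 Broadwell node": 0.0008,
    "2x V100 PCIe": 0.0077,
    "4x V100 PCIe": 0.0112,
    "8x V100 PCIe": 0.0119,
}


def seconds_per_run(rate):
    """Invert a throughput in 1/seconds into an elapsed time in seconds."""
    if rate <= 0:
        raise ValueError("throughput must be positive")
    return 1.0 / rate


runtimes = {name: seconds_per_run(rate) for name, rate in rates.items()}
for name, seconds in runtimes.items():
    print(f"{name}: about {seconds:.0f} s")
```

The chart values carry only one or two significant digits for the CPU baseline, so the derived times are approximate.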

SLIDE 11

B.hR105 on V100s SXM2

Throughput in 1/seconds, with speedup over the CPU-only node:

  1 Broadwell node                          0.0008
  1 node + 2x V100 SXM2 per node (16GB)     0.0079    9.9X
  1 node + 4x V100 SXM2 per node (16GB)     0.0116   14.5X
  1 node + 8x V100 SXM2 per node (16GB)     0.0128   16.0X

(Untuned on Volta.) Running VASP version 5.4.4.
The blue node contains dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs.

Benchmark: 105 boron atoms (β-rhombohedral structure); 216 bands; 110592 plane waves; hybrid functional with blocked Davidson (ALGO = Normal); LHFCALC = .TRUE. (exact exchange).

SLIDE 12

B.aP107 on V100s PCIe

Throughput in 1/seconds, with speedup over the CPU-only node:

  1 Broadwell node                          0.000038
  1 node + 2x V100 PCIe per node (16GB)     0.000323    8.5X
  1 node + 4x V100 PCIe per node (16GB)     0.000462   12.2X
  1 node + 8x V100 PCIe per node (16GB)     0.000490   12.9X

(Untuned on Volta.) Running VASP version 5.4.4.
The blue node contains dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs.
The green nodes contain dual Intel Xeon E5-2690 v4 @ 2.6GHz [3.5GHz Turbo] (Broadwell) CPUs + Tesla V100 PCIe (16GB) GPUs.

Benchmark: 107 boron atoms (symmetry-broken 107-atom β′ variant); 216 bands; 110592 plane waves; hybrid functional with blocked Davidson (ALGO = Normal); LHFCALC = .TRUE. (exact exchange); no k-point parallelization.

SLIDE 13

B.aP107 on V100s SXM2

Throughput in 1/seconds, with speedup over the CPU-only node:

  1 Broadwell node                          0.000038
  1 node + 2x V100 SXM2 per node (16GB)     0.000324    8.5X
  1 node + 4x V100 SXM2 per node (16GB)     0.000465   12.2X
  1 node + 8x V100 SXM2 per node (16GB)     0.000523   13.8X

(Untuned on Volta.) Running VASP version 5.4.4.
The blue node contains dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla V100 SXM2 (16GB) GPUs.

Benchmark: 107 boron atoms (symmetry-broken 107-atom β′ variant); 216 bands; 110592 plane waves; hybrid functional with blocked Davidson (ALGO = Normal); LHFCALC = .TRUE. (exact exchange); no k-point parallelization.
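The speedups grow more slowly than the GPU count. One way to quantify that, which is not a metric the slides themselves report, is a scaling efficiency relative to the 2-GPU configuration: the throughput gain divided by the increase in GPU count. A minimal sketch using the B.aP107 SXM2 values; the `scaling_efficiency` helper and its definition are assumptions introduced here for illustration:

```python
# Throughput (1/seconds) by GPU count, read off the B.aP107 SXM2 chart.
rate_by_gpus = {2: 0.000324, 4: 0.000465, 8: 0.000523}


def scaling_efficiency(rates, reference_gpus):
    """Throughput gain over the reference, divided by the GPU-count ratio.

    A value of 1.0 would mean throughput grows in proportion to GPU count.
    """
    ref_rate = rates[reference_gpus]
    return {
        gpus: (rate / ref_rate) / (gpus / reference_gpus)
        for gpus, rate in rates.items()
    }


efficiency = scaling_efficiency(rate_by_gpus, reference_gpus=2)
for gpus, value in sorted(efficiency.items()):
    print(f"{gpus} GPUs: {value:.2f}")
```

Under this definition, quadrupling the GPU count from 2 to 8 on this benchmark retains roughly 40% of ideal proportional scaling, based on the rounded chart values.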