bwHPC: Hardware and Storage Architecture
Peter Weisbrod, SCC, KIT
Steinbuch Centre for Computing (SCC)
www.bwhpc-c5.de


Reference: bwHPC-C5 Best Practices Repository

Most information given in this talk can be found at http://bwhpc-c5.de/wiki:

Category:Hardware_and_Architecture
Or choose the cluster, then "Hardware and Architecture" or "File Systems".


Clusters @ Tier 2+3

bwUniCluster (02/2014): General purpose, Teaching & Education
ForHLR I+II (09/2014, 03/2016): Research, high scalability
bwForCluster JUSTUS (12/2014): Computational Chemistry
bwForCluster MLS&WISO (10/2015): Economics & Social Sciences, Molecular Life Science
bwForCluster NEMO (09/2016): Neurosciences, Micro Systems Engineering, Elementary Particle Physics
bwForCluster BinAC (11/2016): Bioinformatics, Astrophysics

[Map: cluster sites in Baden-Württemberg – Karlsruhe, Ulm, Freiburg, Tübingen, Mannheim, Heidelberg – locating Hazel Hen, ForHLR, bwUniCluster, JUSTUS, MLS&WISO, NEMO and BinAC]


System Architecture


System and Storage Architecture (bwUniCluster)

Each (compute/login) node has sixteen Intel Xeon processor cores, local memory, disks and network adapters; the nodes are connected by a fast InfiniBand 4X FDR interconnect.

Roles:

Login Nodes
Compute Nodes
File Server Nodes
Administrative Server Nodes


bwUniCluster

Federated HPC tier 3 resources. Selected characteristics:

General purpose HPC, entry level, incl. education
Universities are shareholders
Federated operations, multilevel fair-sharing

                  Thin                   Fat                    In Preparation
# nodes           512                    8                      352
Cores/node        16                     32                     28
Processor         2.6 GHz (Sandy Br.)    2.4 GHz (Sandy Br.)    2.0 GHz (Broadwell)
Main memory       64 GiB                 1024 GiB               128 GiB
Local storage     2 TB HDD               7 TB HDD               480 GB SSD
Interconnect      InfiniBand 4x FDR (in-preparation partition: InfiniBand FDR/EDR)
Blocking          1:1 (50%), 1:8 (50%) (in-preparation partition: 1:1)
PFS – HOME        427 TB Lustre
PFS – Workspaces  853 TB Lustre


System Properties (1)

Compute node types:

Thin: for applications using a high number of processors, with distributed memory and communication over InfiniBand (MPI)
Fat: for shared-memory applications (OpenMP or explicit multithreading)
Other types exist on some clusters

Processor types:

(older ← → newer) … – Sandy Bridge – Ivy Bridge – Haswell – Broadwell – ...

Main memory:

Useful to know when requesting resources (pmem, mem) during batch job submission; see the sketch below.
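
A minimal sketch of such a resource request, assuming the MOAB batch system (msub) used on the bwHPC clusters at the time of this talk; the node count, memory values, job name and program name are placeholders, not recommendations:

  #!/bin/bash
  #MSUB -l nodes=4:ppn=16       # 4 thin nodes, 16 processes per node (placeholder values)
  #MSUB -l walltime=02:00:00    # requested run time
  #MSUB -l pmem=3gb             # memory per process; ppn x pmem must fit into the node's main memory
  #MSUB -N mpi_example          # job name (hypothetical)

  mpirun my_mpi_program         # 'my_mpi_program' is a placeholder; MPI traffic runs over InfiniBand

The pmem request is where the table above matters: on a 64 GiB thin node running 16 processes, roughly 4 GiB per process is the natural upper bound.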


System Properties (2)

Local Storage:

Size and read/write performance are of interest when using the local file system ($TMP / $TMPDIR); see the staging sketch below.

InfiniBand:

(older ← → newer, higher speed, lower latency) … – QDR – FDR – EDR – …, or Omni-Path instead

Blocking:

Ratio of uplink to downlink bandwidth; non-blocking if equal. A blocking factor of 1:8, for example, means that in the worst case eight nodes share uplink bandwidth equal to a single node link. Example bwUniCluster: both a blocking area and a non-blocking "fat tree" area.
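
The usual way to exploit the node-local storage mentioned above is a stage-in / compute / stage-out pattern. A minimal sketch, assuming a POSIX shell inside a batch job; file and program names are placeholders:

  # stage input from the global file system to the node-local disk or SSD
  cp "$HOME/input.dat" "$TMPDIR/"
  cd "$TMPDIR"

  # run the I/O-intensive part against the fast local file system
  my_program input.dat > output.dat     # 'my_program' is a placeholder

  # stage results back before the job ends ($TMPDIR is cleaned up afterwards)
  cp output.dat "$HOME/results/"

Because $TMPDIR is private to each node, a multi-node job has to stage its data on every node that reads it.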



bwForCluster JUSTUS

Federated HPC tier 3 resources

                  Diskless   SSD        Big SSD    Large Mem SSD   Visual
# nodes           202        204        22         16              2
Cores/node        16         16         16         16              16
Processor         2.4 GHz (Xeon E5-2630v3, Haswell)
Main memory       128 GiB    256 GiB    512 GiB    512 GiB
Local storage     –          1 TB SSD   2 TB SSD   4 TB HDD
Interconnect      InfiniBand QDR
Blocking          1:8
HOME              200 TB NFS
PFS – Workspaces  200 TB Lustre
Block storage     480 TB (local mount via RDMA)
Special feature   NVIDIA K6000

Selected characteristics:

Dedicated to computational chemistry
High I/O, large-memory jobs
User and software support by the bwHPC competence center


bwForCluster MLS&WISO

Federated HPC tier 3 resources. Selected characteristics:

Dedicated to molecular life science, economics and social sciences, plus a cluster for method development
User and software support by the bwHPC competence center


bwForCluster NEMO

Federated HPC tier 3 resources. Selected characteristics:

Dedicated to neuroscience, elementary particle physics, micro systems engineering
Virtual machine images deployable
User and software support by the bwHPC competence center


bwForCluster BinAC

Federated HPC tier 3 resources. Selected characteristics:

Dedicated to astrophysics, bioinformatics
Dual-GPU systems
User and software support by the bwHPC competence center


ForHLR I

Federated HPC tier 2 resources. Selected characteristics:

Next level for advanced HPC users
Research, high scalability

                  Thin                  Fat
# nodes           512                   16
Cores/node        20                    32
Processor         2.5 GHz (Sandy Br.)   2.6 GHz (Sandy Br.)
Main memory       64 GiB                512 GiB
Local storage     2 TB HDD              8 TB HDD
Interconnect      InfiniBand 4x FDR
Blocking          Non-blocking
PFS – HOME        427 TB Lustre
PFS – Workspaces  PROJECT 427 TB Lustre; WORK/workspace 853 TB Lustre


ForHLR II

Federated HPC tier 2 resources. Selected characteristics:

Next level for advanced HPC users
Research, high scalability

                  Thin                  Fat
# nodes           1152                  21
Cores/node        20                    48
Processor         2.6 GHz (Haswell)     2.1 GHz (Haswell)
Main memory       64 GiB                1024 GiB
Local storage     480 GB SSD            3840 GB SSD
Graphics cards    4 NVIDIA GeForce GTX980 Ti
Interconnect      InfiniBand 4x EDR
Blocking          Non-blocking
PFS – HOME        427 TB Lustre
PFS – Workspaces  PROJECT 610 TB Lustre; WORK 1220 TB Lustre; workspace 3050 TB Lustre


Storage Architecture


System and Storage Architecture (bwUniCluster)

File Systems:

Local ($TMP or $TMPDIR): each node has its own file system
Global ($HOME, $PROJECT, $WORK, workspaces): all nodes access the same file system, located in the parallel file system


File Systems

All Clusters:

$TMP or $TMPDIR: local; files are removed at the end of the batch job; no backup
$HOME: global, permanent; backup on most clusters; quota; same home directories on ForHLR I+II and bwUniCluster
Workspaces: global; the entire workspace expires after a fixed period; no backup, no quota, higher throughput; see the command sketch below. HowTo: http://www.bwhpc-c5.de/wiki/index.php/Workspace

ForHLR I+II, bwUniCluster:

$WORK: global; no backup, no quota, higher throughput; file lifetime 28 days (1 week guaranteed)

ForHLR I+II:

$PROJECT: global, permanent, backup, quota. Use $PROJECT instead of $HOME, because the $HOME quota for a project group is very small.
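
A typical workspace life cycle with the command-line tools described in the HowTo above; the workspace name and durations are placeholders, and the permitted maximum lifetime and number of extensions differ per cluster:

  ws_allocate my_sim 30     # create workspace 'my_sim' with a 30-day lifetime
  ws_list                   # list your workspaces and their remaining lifetimes
  ws_find my_sim            # print the workspace path, e.g. for use in job scripts
  ws_extend my_sim 30       # extend the lifetime (number of extensions is limited)
  ws_release my_sim         # release the workspace when the data is no longer needed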


Thank you for your attention! Questions?