HPC Architectures Types of resource currently in use Outline - - PowerPoint PPT Presentation



SLIDE 1

HPC Architectures

Types of resource currently in use

SLIDE 2

Outline

  • Shared memory architectures
  • Distributed memory architectures
  • Distributed memory with shared-memory nodes
  • Accelerators
  • What is the difference between different Tiers?
      • Interconnect
      • Software
      • Job-size bias (capability)
SLIDE 3

Shared memory architectures

Simplest to use, hardest to build

SLIDE 4

Symmetric Multi-Processing Architectures

  • All cores have the same access to memory
SLIDE 5

Non-Uniform Memory Access Architectures

  • Cores have faster/wider access to local memory
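The NUMA effect can be pictured as a simple cost model: a core pays less to reach the memory attached to its own region than to reach another region's memory. A toy sketch (the latency numbers below are made up purely for illustration, not measured from any real machine):

```python
# Assumed, illustrative latencies in nanoseconds - not real hardware figures.
LOCAL_NS, REMOTE_NS = 80, 140

def access_latency(core_region, memory_region):
    """A core reaches its own NUMA region's memory faster than a remote one's."""
    return LOCAL_NS if core_region == memory_region else REMOTE_NS

# A core in region 0 touching local vs. remote memory:
assert access_latency(0, 0) < access_latency(0, 1)
```

This is why thread and data placement matters on NUMA machines: keeping a thread's data in its own region avoids the remote penalty.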
SLIDE 6

Shared-memory architectures

  • Most computers are now shared-memory machines due to multicore.
  • Some are true SMP architectures (e.g. BlueGene/Q nodes), but most are NUMA.
  • NUMA machines are programmed as if they were SMP – the details are hidden from the user.
  • Shared-memory systems with large core counts (> 1024 cores) are difficult to build, and are expensive and power hungry.
  • Some systems manage it by using software to provide the shared-memory capability.
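What makes shared memory the simplest model to use is that every thread sees one address space and can update a common data structure directly. A minimal sketch, using Python's `threading` module as a stand-in for a shared-memory programming model such as OpenMP:

```python
import threading

counter = 0                  # one variable in one shared address space
lock = threading.Lock()      # protects the shared update

def work(n_increments):
    global counter
    for _ in range(n_increments):
        with lock:           # without the lock, concurrent updates would race
            counter += 1

# Four threads all updating the same memory location.
threads = [threading.Thread(target=work, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 4 threads x 1000 increments = 4000
```

No data was sent anywhere: the threads communicate simply by reading and writing the same memory, which is exactly what the hardware (or, on some systems, a software layer) has to make coherent.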

SLIDE 7

Distributed memory architectures

Clusters and interconnects

SLIDE 8

Distributed-Memory Architectures

SLIDE 9

Distributed-memory architectures

  • Each self-contained part is called a node.
  • Almost all HPC machines are distributed memory in some way, although they all tend to be shared memory within a node.
  • The performance of parallel programs often depends on the interconnect performance, although once it is of a certain (high) quality, applications usually reveal themselves to be CPU, memory or IO bound.
  • Low-quality interconnects (e.g. 10 Mb/s – 1 Gb/s Ethernet) do not usually provide the performance required.
  • Specialist interconnects are required to produce the largest supercomputers, e.g. Cray Aries, IBM BlueGene/Q.
  • Infiniband is dominant on smaller systems.
SLIDE 10

Distributed/shared memory hybrids

Almost everything now falls into this class

SLIDE 11

Hybrid Architectures

SLIDE 12

Hybrid architectures

  • Almost all HPC machines fall into this class.
  • Most applications use a message-passing (MPI) model for programming, usually with a single process per core.
  • There is increased use of hybrid message-passing + shared-memory (MPI+OpenMP) programming.
  • This usually uses one or more processes per NUMA region, and then the appropriate number of shared-memory threads to occupy all the cores.
  • Placement of processes and threads can become complicated on these machines.
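The usual hybrid recipe – one MPI process per NUMA region, with enough threads to fill that region's cores – is just arithmetic, and can be sketched as follows (the node and core counts in the example are illustrative):

```python
def hybrid_layout(nodes, numa_regions_per_node, cores_per_numa_region):
    """Return (MPI processes, threads per process) for the common
    one-process-per-NUMA-region hybrid placement."""
    processes = nodes * numa_regions_per_node
    threads_per_process = cores_per_numa_region
    # Every core on the machine is occupied by exactly one thread.
    total_cores = nodes * numa_regions_per_node * cores_per_numa_region
    assert processes * threads_per_process == total_cores
    return processes, threads_per_process

# e.g. 8 nodes, each with 2 NUMA regions of 12 cores:
print(hybrid_layout(8, 2, 12))  # (16, 12): 16 ranks x 12 threads = 192 cores
```

Pinning each process to its NUMA region, and its threads to that region's cores, is the part that gets complicated in practice; the batch system and MPI launcher flags for doing so vary between machines.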

SLIDE 13

Example: ARCHER

  • ARCHER has two 12-way multicore processors per node
  • Each 12-way processor is made up of two 6-core dies
  • Each node is a 24-core, shared-memory, NUMA machine
SLIDE 14

Accelerators

How are they incorporated?

SLIDE 15

Including accelerators

  • Accelerators are usually incorporated into HPC machines using the hybrid architecture model:
      • a number of accelerators per node
      • nodes connected using interconnects
  • Communication from accelerator to accelerator depends on the hardware:
      • NVIDIA GPUs support direct communication
      • AMD GPUs have to communicate via CPU memory
      • Intel Xeon Phi communicates via CPU memory
  • Communicating via CPU memory involves lots of extra copy operations and is usually very slow.
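The cost of staging through CPU memory can be seen by counting copies: a direct transfer moves the data once, while going via the host moves it at least twice. A toy model (the bandwidth figure is assumed purely for illustration):

```python
def transfer_time(n_bytes, bandwidth_bytes_per_s, hops):
    """Each hop is one full copy of the data across a link; staging
    through CPU memory doubles the copies versus direct GPU-to-GPU."""
    return hops * n_bytes / bandwidth_bytes_per_s

size = 1 << 30                # 1 GiB payload
bw = 10_000_000_000           # assumed 10 GB/s link, for illustration only

direct = transfer_time(size, bw, hops=1)   # GPU -> GPU
staged = transfer_time(size, bw, hops=2)   # GPU -> CPU memory -> GPU

assert staged == 2 * direct
```

Real transfers add per-hop latency and synchronisation on top of this, so staging through the host is usually even worse than the simple copy count suggests.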
SLIDE 16

Comparison of types

What is the difference between different tiers?

SLIDE 17

HPC Facility Tiers

  • HPC facilities are often spoken about as belonging to Tiers:
      • Tier 0 – Pan-national Facilities
      • Tier 1 – National Facilities
      • Tier 2 – Regional Facilities
      • Tier 3 – Institutional Facilities

SLIDE 18

Summary

  • The vast majority of HPC machines are shared-memory nodes linked by an interconnect.
      • Hybrid HPC architectures – a combination of shared and distributed memory.
  • Most are programmed using a pure MPI model (more later on MPI).
      • This does not really reflect the hardware layout.
  • Shared HPC machines span a wide range of sizes:
      • from Tier 0 – multi-petaflops (1 million cores)
      • to workstations with multiple CPUs (+ accelerators)