Introduction to High Performance Computing at ZIH: Architecture of the PC Farm (Deimos)

SLIDE 1

Center for Information Services and High Performance Computing (ZIH)

Introduction to High Performance Computing at ZIH
Architecture of the PC Farm (Deimos)

Guido Juckeland (guido.juckeland@tu-dresden.de)
Zellescher Weg 12, Trefftz-Bau/HRSK 151
Phone +49 351 - 463 - 39871

SLIDE 2

Slide 2 - Guido Juckeland

Agenda

PC Farm Components
AMD Opteron Processors and Systems
Infiniband Networks

SLIDE 3

PC Farm Components (Deimos)

SLIDE 4

Linux Networx PC-Farm (Deimos)

1292 AMD Opteron x85 dual-core CPUs (2.6 GHz)
726 compute nodes with 2, 4 or 8 CPU cores
2 GiByte main memory per core
2 Infiniband interconnects (MPI and I/O fabric)
68 TByte SAN storage
70, 150 or 290 GByte scratch disk per node
OS: SuSE SLES 10
Batch system: LSF
Compilers: Pathscale, PGI, Intel, GNU
3rd party applications: Ansys100, CFX, Fluent, Gaussian, LS-DYNA, Matlab, MSC,…
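The node sizes follow directly from the figures above: two cores per CPU and 2 GiByte per core. The short sketch below only does that arithmetic; the function names are illustrative, and it covers the standard configurations, not the 24 quad nodes that carry 32 GiByte.

```python
# Figures from the slides; the derived values are plain arithmetic.
TOTAL_CPUS = 1292      # AMD Opteron x85 dual-core CPUs
CORES_PER_CPU = 2      # dual-core
GIB_PER_CORE = 2       # 2 GiByte main memory per core


def node_cores(cpus_per_node: int) -> int:
    """Number of cores in a node with the given number of CPU sockets."""
    return cpus_per_node * CORES_PER_CPU


def node_memory_gib(cpus_per_node: int) -> int:
    """Standard main memory of a node (the 24 large quad nodes have 32 GiByte instead)."""
    return node_cores(cpus_per_node) * GIB_PER_CORE


total_cores = TOTAL_CPUS * CORES_PER_CPU

if __name__ == "__main__":
    print(f"Total cores: {total_cores}")
    for cpus in (1, 2, 4):
        print(f"{cpus}-CPU node: {node_cores(cpus)} cores, {node_memory_gib(cpus)} GiByte")
```

This gives 2584 cores in total and 4, 8 or 16 GiByte for single, dual and quad nodes.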

SLIDE 5

Deimos - Partitions

2 Master Nodes
– Not accessible for users; PC-Farm management
4 Login Nodes
– 4-core nodes
– Accessible via DNS round robin under deimos.hrsk.tu-dresden.de
Single-, Dual- and Quad-Nodes
– 1, 2 or 4 CPUs
– 4, 8 or 16 GiByte main memory (24 Quads with 32 GiByte)
– 80, 160 or 300 GByte local disks
Setup in phase 1 and phase 2 nodes
– Identical hardware
– Differences in the connection to the MPI and the I/O fabric (covered later)
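With DNS round robin, one host name resolves to several addresses, and the server rotates their order between queries, so successive logins tend to land on different login nodes. The toy model below illustrates only that rotation. The class name and the addresses (RFC 5737 documentation range) are hypothetical, and real resolvers and clients may cache or reorder answers, so the spread in practice is not strictly even.

```python
from collections import deque


class RoundRobinZone:
    """Toy model of a DNS server that answers one name with several
    address records and rotates their order after every query."""

    def __init__(self, addresses):
        self._records = deque(addresses)

    def resolve(self) -> list:
        answer = list(self._records)
        self._records.rotate(-1)  # move the first record to the end
        return answer


# Hypothetical addresses, one per login node.
login_zone = RoundRobinZone(["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"])

# A simple client connects to the first address it receives.
first_choices = [login_zone.resolve()[0] for _ in range(6)]
print(first_choices)
```

Six consecutive "logins" cycle through the four addresses and then start over.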

SLIDE 6

AMD Opteron Processors and Systems

SLIDE 7

AMD Opteron CPU - Design

AMD Opteron x85 (2.6 GHz)
Memory controller on-chip (2 memory channels with 3.2 GiByte/s transfer bandwidth each)
Each core:
– 64 KiByte level 1 instruction cache and 64 KiByte level 1 data cache
– 1 MiByte level 2 cache
64-bit extension of the IA-32 x86 architecture (x86-64, also called AMD64, x64 or EM64T)
Now also available as quad-core CPUs
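Because every Opteron brings its own memory controller, the theoretical peak memory bandwidth of a node grows with the number of sockets. The sketch below computes that peak from the channel figures above. It assumes each CPU works on its locally attached memory; measured (sustained) bandwidth is lower than this peak, and the function name is illustrative.

```python
CHANNELS_PER_CPU = 2        # on-chip memory controller with 2 channels
GIB_S_PER_CHANNEL = 3.2     # transfer bandwidth per channel
CORES_PER_CPU = 2           # dual-core Opteron x85


def peak_bandwidth_gib_s(cpus: int) -> float:
    """Theoretical peak main-memory bandwidth of a node in GiByte/s.

    Each CPU has its own memory controller, so the peak adds up over the
    sockets, assuming every CPU works on its locally attached memory.
    """
    return cpus * CHANNELS_PER_CPU * GIB_S_PER_CHANNEL


for cpus in (1, 2, 4):
    total = peak_bandwidth_gib_s(cpus)
    per_core = total / (cpus * CORES_PER_CPU)
    print(f"{cpus} CPU(s): {total:.1f} GiByte/s peak, {per_core:.1f} GiByte/s per core")
```

The per-core share stays at 3.2 GiByte/s for every node size, since cores and memory controllers are added together.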

SLIDE 8

AMD Opteron – Block diagram

[Figure: AMD Opteron block diagram. Front end: instruction TLB, level 1 instruction cache, branch prediction (2k branch targets, 16k history counters, return address stack and target address), fetch, pick and three decode lanes (Decode 1, Decode 2, Pack, Decode). Integer core: three 8-entry schedulers, each feeding an ALU and an AGU. Floating point: a 36-entry scheduler feeding the FADD, FMUL and FMISC units. Memory side: data TLB, level 1 data cache (ECC), level 2 cache with tags (ECC), System Request Queue (SRQ), crossbar (XBAR), memory controller and HyperTransport.]

SLIDE 9

Deimos – Layout of a single-CPU node

[Figure: one AMD Opteron 185 with its memory (4 GiByte), connected via HyperTransport to the peripheral devices (Infiniband, Ethernet, disk).]

SLIDE 10

Deimos – Layout of a dual-CPU node

[Figure: two AMD Opteron 285, each with its own memory (4 GiByte), linked to each other via HyperTransport; a further HyperTransport link connects to the peripheral devices (Infiniband, Ethernet, disk).]

SLIDE 11

Deimos – Layout of a quad-CPU node

[Figure: four AMD Opteron 885, each with its own memory (4 GiByte), interconnected via HyperTransport links; one HyperTransport link connects to the peripheral devices (Infiniband, Ethernet, disk).]

SLIDE 12

Infiniband Networks

SLIDE 13

Basic Layout

SLIDE 14

More complicated structures

SLIDE 15

Infiniband-Stack

SLIDE 16

Consequences for the user

No standard Linux network interfaces (eth0, ...)
No IP addresses
No direct traffic monitoring possible
Very low MPI latency (about 5–15 μs)
High MPI bandwidth (up to 900 MiByte/s)
The batch system does not know the state of the Infiniband fabric
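Latency and bandwidth together decide how fast a message travels. A common simplification, the Hockney (alpha-beta) model, which is not taken from the slide, estimates the time as t(n) = latency + n / bandwidth. The sketch below plugs in the lower latency bound and the peak bandwidth quoted above. Small messages are dominated by latency, and half of the peak bandwidth is only reached at n = latency × bandwidth, about 4.6 KiByte. Real MPI transfers deviate from this model (protocol switches, contention).

```python
LATENCY_S = 5e-6                 # lower end of the quoted 5-15 us
BANDWIDTH_B_S = 900 * 2**20      # 900 MiByte/s in bytes per second


def transfer_time(nbytes: float, latency: float = LATENCY_S,
                  bandwidth: float = BANDWIDTH_B_S) -> float:
    """Estimated point-to-point time: t(n) = latency + n / bandwidth."""
    return latency + nbytes / bandwidth


def effective_bandwidth(nbytes: float) -> float:
    """Bytes per second actually achieved for a message of nbytes."""
    return nbytes / transfer_time(nbytes)


# Message size at which half of the peak bandwidth is reached.
n_half = LATENCY_S * BANDWIDTH_B_S

for size in (8, 1024, 64 * 1024, 4 * 2**20):
    t = transfer_time(size)
    print(f"{size:>8} B: {t * 1e6:8.1f} us, {effective_bandwidth(size) / 2**20:6.1f} MiByte/s")
print(f"half-bandwidth message size: {n_half / 1024:.1f} KiByte")
```

For an 8-byte message almost all of the time is latency, while a 4 MiByte message comes close to the 900 MiByte/s peak.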

SLIDE 17

Deimos Infiniband Layout (rough sketch)

[Figure: the nodes are connected to two separate Infiniband fabrics, the MPI network and the I/O network.]

SLIDE 18

Deimos MPI-Fabric

+-------------------+       +--------------------+       +-------------------+
|     Switch 1      |       |      Switch 2      |       |     Switch 3      |
|                   |  30x  |                    |  30x  |                   |
|      Rack 05      |-------|      Rack 20       |-------|      Rack 25      |
|                   |       |                    |       |                   |
| all Phase1 Nodes  |       | Phase2 Duals+Quads |       |  Phase 2 Singles  |
+-------------------+       +--------------------+       +-------------------+

3 × 288-port Voltaire ISR 9288 IB switches with 4x Infiniband ports
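The chain of three switches means traffic between racks has to share the inter-switch links. The sketch below estimates how many ports remain for nodes on each switch and the worst-case ratio of node ports to links toward one neighbour. It assumes that the "30x" label stands for 30 links between neighbouring switches and that every remaining port is in use; the actual port population is not given on the slide, so these are upper bounds.

```python
PORTS_PER_SWITCH = 288     # Voltaire ISR 9288
LINKS_PER_BUNDLE = 30      # assumption: "30x" means 30 links between two switches

# Switch 2 is in the middle of the chain and connects to both neighbours.
bundles_per_switch = {"Switch 1": 1, "Switch 2": 2, "Switch 3": 1}


def node_ports(switch: str) -> int:
    """Ports left for nodes once the inter-switch links are subtracted (upper bound)."""
    return PORTS_PER_SWITCH - bundles_per_switch[switch] * LINKS_PER_BUNDLE


def worst_case_ratio(switch: str) -> float:
    """Node ports per link toward one neighbour, if every node port is in use
    and all traffic has to cross the same bundle."""
    return node_ports(switch) / LINKS_PER_BUNDLE


for name in bundles_per_switch:
    print(f"{name}: {node_ports(name)} node ports, {worst_case_ratio(name):.1f}:1")
```

Under these assumptions, an MPI job whose processes stay within one switch avoids the shared 30-link bundles entirely.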

SLIDE 19

Deimos I/O Fabric

Tree structure with
– 1 × 192-port Voltaire ISR 9288 IB switch with 4x Infiniband ports (Rack 07)
– 36 × 24-port Mellanox IB switches (4x)

[Figure: the passive Voltaire core switch at the root, with the 24-port Mellanox switches below it serving the phase 1 and phase 2 nodes.]