Introduction to High Performance Computing at ZIH




  1. Introduction to High Performance Computing at ZIH: Architecture of the PC Farm (Deimos)
     - Center for Information Services and High Performance Computing (ZIH)
     - Guido Juckeland (guido.juckeland@tu-dresden.de)
     - Zellescher Weg 12, Trefftz-Bau/HRSK 151
     - Phone +49 351 - 463 - 39871

  2. Agenda
     - PC Farm Components
     - AMD Opteron Processors and Systems
     - Infiniband Networks

  3. PC Farm Components (Deimos)

  4. Linux Networx PC-Farm (Deimos)
     - 1292 AMD Opteron x85 dual-core CPUs (2.6 GHz)
     - 726 compute nodes with 2, 4 or 8 CPU cores
     - 2 GiByte main memory per core
     - 2 Infiniband interconnects (MPI and I/O fabric)
     - 68 TByte SAN storage
     - 70, 150 or 290 GByte scratch disk per node
     - OS: SuSE SLES 10
     - Batch system: LSF
     - Compilers: PathScale, PGI, Intel, GNU
     - 3rd-party applications: Ansys100, CFX, Fluent, Gaussian, LS-DYNA, Matlab, MSC, …
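The headline figures on this slide can be cross-checked with a little arithmetic. The sketch below derives the total core count and the aggregate main memory from the numbers above. The peak-performance estimate additionally assumes 2 double-precision floating-point operations per core per cycle, a value that is not stated on the slide, so treat that line as a rough estimate only.

```python
# Figures taken from the slide.
CPUS = 1292               # dual-core AMD Opteron x85 CPUs
CORES_PER_CPU = 2
CLOCK_GHZ = 2.6
MEM_PER_CORE_GIB = 2      # nominal value; nodes with more memory are ignored here

# Assumption (not on the slide): 2 double-precision FLOPs per core per cycle.
FLOPS_PER_CYCLE = 2

total_cores = CPUS * CORES_PER_CPU
total_mem_tib = total_cores * MEM_PER_CORE_GIB / 1024
peak_tflops = total_cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000

print(f"cores:       {total_cores}")
print(f"main memory: {total_mem_tib:.2f} TiB")
print(f"peak:        {peak_tflops:.1f} TFLOP/s (under the stated assumption)")
```

With these inputs the farm has 2584 cores and roughly 5 TiB of main memory in total.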

  5. Deimos - Partitions
     - 2 Master Nodes
       - Not accessible for users, PC-Farm management
     - 4 Login Nodes
       - 4-core nodes
       - Accessible via DNS round robin under deimos.hrsk.tu-dresden.de
     - Single-, Dual- and Quad-Nodes
       - 1, 2 or 4 CPUs
       - 4, 8 or 16 GiByte main memory (24 Quads with 32 GiByte)
       - 80, 160 or 300 GByte local disks
     - Setup in phase 1 and phase 2 nodes
       - Identical hardware
       - Differences in the connection to the MPI and the I/O fabric (later)
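The node types on this slide tie back to the "2 GiByte per core" figure quoted earlier, since every CPU is a dual-core part. A minimal sketch of that check follows; the node-type labels are chosen only for this illustration.

```python
CORES_PER_CPU = 2  # dual-core Opterons

# (label, CPUs, main memory in GiByte) as listed on the slide;
# the labels themselves are illustrative.
NODE_TYPES = [
    ("single", 1, 4),
    ("dual", 2, 8),
    ("quad", 4, 16),
    ("quad, large memory", 4, 32),
]

def memory_per_core(cpus, mem_gib):
    """Main memory per core in GiByte for one node."""
    return mem_gib / (cpus * CORES_PER_CPU)

for label, cpus, mem_gib in NODE_TYPES:
    cores = cpus * CORES_PER_CPU
    print(f"{label:<20} {cores} cores, {memory_per_core(cpus, mem_gib):.0f} GiByte per core")
```

Only the 24 large-memory quad nodes deviate from the nominal value, offering 4 GiByte per core.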

  6. AMD Opteron Processors and Systems

  7. AMD Opteron CPU - Design
     - AMD Opteron x85 (2.6 GHz)
     - On-chip memory controller (2 memory channels with 3.2 GiByte/s transfer bandwidth each)
     - Each core: 64 KiByte each of Level 1 instruction and data cache, 1 MiByte Level 2 cache
     - 64-bit extension of the IA-32 (x86) architecture (x86-64, x64 or EM64T)
     - Now also available as quad-core CPUs
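Because the memory controller sits on the CPU, every additional CPU in a node brings its own two memory channels. The sketch below turns the per-channel figure from the slide into theoretical peak bandwidths per CPU, per core and per node. It assumes that every core only touches memory attached to its own CPU and that both cores of a CPU share the channels evenly; sustained bandwidth in practice will be lower.

```python
CHANNELS_PER_CPU = 2
BW_PER_CHANNEL = 3.2      # GiByte/s, from the slide
CORES_PER_CPU = 2

def local_memory_bandwidth(cpus):
    """Theoretical peak bandwidths in GiByte/s for a node with `cpus` CPUs."""
    per_cpu = CHANNELS_PER_CPU * BW_PER_CHANNEL
    return {
        "per_cpu": per_cpu,
        "per_core": per_cpu / CORES_PER_CPU,   # both cores streaming at once
        "per_node": per_cpu * cpus,            # local accesses only
    }

for label, cpus in (("single", 1), ("dual", 2), ("quad", 4)):
    bw = local_memory_bandwidth(cpus)
    print(f"{label:<6} node: {bw['per_node']:.1f} GiByte/s aggregate, "
          f"{bw['per_core']:.1f} GiByte/s per core")
```

The per-core share stays the same across node types, which is the practical benefit of attaching memory to each CPU rather than to a shared bus.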

  8. AMD Opteron – Block diagram (figure: instruction fetch with Level 1 instruction cache, TLB, branch targets and history counters; three decode pipelines; three 8-entry schedulers feeding ALU/AGU pairs and one 36-entry scheduler feeding FADD, FMUL and FMISC; Level 1 data cache with ECC and TLB; Level 2 cache with ECC and tags; System Request Queue (SRQ), Cross Bar (XBAR), memory controller and HyperTransport)

  9. Deimos – Layout of a single-CPU node (figure: one AMD Opteron 185 with 4 GiByte memory; peripheral devices (Infiniband, Ethernet, disk) attached via HyperTransport)

  10. Deimos – Layout of a dual-CPU node (figure: two AMD Opteron 285, each with 4 GiByte memory, linked via HyperTransport; peripheral devices (Infiniband, Ethernet, hard disk) attached via HyperTransport)

  11. Deimos - Layout of a quad-CPU node (figure: four AMD Opteron 885, each with 4 GiByte memory, linked via HyperTransport; peripheral devices (Infiniband, Ethernet, hard disk) attached via HyperTransport)

  12. Infiniband Networks

  13. Basic Layout

  14. More complicated structures

  15. Infiniband Stack

  16. Consequences for the user
      - No standard Linux network interfaces (eth0, ...)
      - No IP addresses
      - No direct traffic monitoring possible
      - Very low MPI latency (about 5-15 μs)
      - High MPI bandwidth (up to 900 MiByte/s)
      - The batch system does not know about the state of the Infiniband fabric
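The latency and bandwidth figures on this slide can be combined into a rough estimate of how long a single MPI message takes. The sketch below uses the common linear model, time = latency + size / bandwidth. That model is an assumption made here for illustration and is not taken from the slides; the inputs are the best-case values quoted above (5 μs and 900 MiByte/s).

```python
LATENCY_S = 5e-6                   # 5 μs, lower end of the quoted 5-15 μs range
BANDWIDTH_BYTES_S = 900 * 2**20    # 900 MiByte/s, the quoted upper bound

def transfer_time(n_bytes, latency=LATENCY_S, bandwidth=BANDWIDTH_BYTES_S):
    """Estimated one-way time in seconds for a message of n_bytes."""
    return latency + n_bytes / bandwidth

def effective_bandwidth(n_bytes, latency=LATENCY_S, bandwidth=BANDWIDTH_BYTES_S):
    """Achieved bytes per second once the latency is included."""
    return n_bytes / transfer_time(n_bytes, latency, bandwidth)

# Message size at which half of the peak bandwidth is reached in this model.
n_half = LATENCY_S * BANDWIDTH_BYTES_S

for n in (8, 1024, 64 * 1024, 1024 * 1024):
    mib_s = effective_bandwidth(n) / 2**20
    print(f"{n:>8} bytes: {transfer_time(n) * 1e6:8.1f} μs, {mib_s:6.1f} MiByte/s")
print(f"half of peak bandwidth at about {n_half:.0f} bytes")
```

Under this model, messages smaller than about 4.6 KiByte are dominated by latency, so aggregating many small messages into fewer large ones is what brings an application close to the quoted bandwidth.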

  17. Deimos Infiniband layout (rough sketch; figure: the nodes are connected to two separate fabrics, the MPI network and the I/O network)

  18. Deimos MPI Fabric
      - 3 Voltaire ISR 9288 IB switches (288 ports each) with 4x Infiniband ports

      +-------------------+       +--------------------+       +-------------------+
      | Switch 1          |       | Switch 2           |       | Switch 3          |
      |                   |  30x  |                    |  30x  |                   |
      | Rack 05           |-------| Rack 20            |-------| Rack 25           |
      |                   |       |                    |       |                   |
      | all Phase1 Nodes  |       | Phase2 Duals+Quads |       | Phase 2 Singles   |
      +-------------------+       +--------------------+       +-------------------+
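The slide above shows the three MPI switches connected in a chain, with each group of nodes attached to one switch. A small sketch of what that means for a message between two nodes follows. It assumes that "30x" denotes a bundle of 30 links between neighbouring switches and counts each switch chassis as a single hop; the partition labels are hypothetical names chosen for this example.

```python
# Partition -> MPI switch number, read off the slide.
# The partition labels are hypothetical names used only in this sketch.
SWITCH_OF = {
    "phase1": 1,
    "phase2-dual": 2,
    "phase2-quad": 2,
    "phase2-single": 3,
}

def link_bundles_crossed(src, dst):
    """Inter-switch link bundles on the path (chain: switch 1 - 2 - 3)."""
    return abs(SWITCH_OF[src] - SWITCH_OF[dst])

def switches_traversed(src, dst):
    """Switch chassis on the path, counting each chassis as one hop."""
    return link_bundles_crossed(src, dst) + 1

for src, dst in (("phase1", "phase1"),
                 ("phase1", "phase2-quad"),
                 ("phase1", "phase2-single")):
    print(f"{src} -> {dst}: {switches_traversed(src, dst)} switch(es), "
          f"{link_bundles_crossed(src, dst)} shared link bundle(s)")
```

Jobs that stay within one partition never use the shared inter-switch links, whereas traffic between phase 1 nodes and phase 2 single nodes has to cross both bundles.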

  19. Deimos I/O Fabric
      - Tree structure with
        - 1 192-port Voltaire ISR 9288 IB switch with 4x Infiniband ports (Rack 07)
        - 36 24-port Mellanox IB switches (4x), passive
      - (figure: the 24-port Mellanox switches of phase 1 and phase 2 connect to the Voltaire core switch)
