

  1. Introduction to PC-Cluster Hardware (II)
     Russian-German School on High Performance Computer Systems, 27th of June until 6th of July 2005, Novosibirsk
     Day 1, 27th of June 2005
     HLRS, High Performance Computing Center Stuttgart, University of Stuttgart

  2. Outline
     • I/O
       – Bus
       – PCI
       – PCI-X
       – PCI-Express
     • Network Interconnects
       – Ethernet
       – Myrinet
       – Quadrics Elan4
       – Infiniband
     • Mass Storage
       – Hard disks and RAIDs
     • Cluster File Systems

  3. I/O Bus Layout Example
     [Diagram: example I/O bus layout]

  4. PCI
     • Stands for Peripheral Component Interconnect
     • Standard for the I/O interface in PCs since 1992
     • 32 bit wide
     • 33.33 MHz clock
     • Max. 133 MB/s throughput
     • Extended to 64 bit, 266 MB/s throughput
     • Several adapters can share the bus (and the bandwidth)

  5. PCI and PCI-X overview
     Bus           Width    Clock       Throughput   Voltage
     PCI           32 bit   33.33 MHz    133 MB/s    3.3 and 5 V
     PCI           64 bit   33.33 MHz    266 MB/s    3.3 and 5 V
     PCI(-X) 66    64 bit   66.66 MHz    533 MB/s    3.3 V
     PCI-X 100     64 bit   100 MHz      800 MB/s    3.3 V
     PCI-X 133     64 bit   133 MHz     1066 MB/s    3.3 V
     (PCI-X 266)   64 bit   266 MHz     2133 MB/s    3.3 and 1.5 V
     (PCI-X 533)   64 bit   533 MHz     4266 MB/s    3.3 and 1.5 V
     PCI-X and PCI-X 2.0 also add further features beyond the higher clock rates.
     (Peak throughput is simply bus width times clock rate; see the sketch below.)
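The table's throughput column follows directly from width and clock: a parallel bus moves (width in bytes) bytes per clock cycle. A minimal sketch of that arithmetic, assuming Python; the helper name and the small bus list are illustrative, not part of any standard:

```python
# Peak throughput of a parallel bus = (bus width in bytes) x (clock rate).
# Recomputes a few rows of the PCI/PCI-X table above for illustration.

def peak_throughput_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Peak transfer rate in MB/s, assuming one transfer per clock cycle."""
    return (width_bits / 8) * clock_mhz   # bytes per cycle * million cycles/s

buses = [
    ("PCI 32 bit / 33 MHz", 32, 33.33),
    ("PCI 64 bit / 33 MHz", 64, 33.33),
    ("PCI-X 133",           64, 133.33),
]

for name, width, clock in buses:
    print(f"{name}: {peak_throughput_mb_s(width, clock):.0f} MB/s")
# -> roughly 133, 267 and 1067 MB/s, matching the table within rounding.
```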

  6. PCI-Express (PCIe)
     • Formerly known as 3GIO
     • Based on PCI programming concepts
     • Uses a serial communication system
       – Much faster
       – 2.5 Gbit/s per lane (5 and 10 Gbit/s in the future)
       – 8B/10B encoding → max. 250 MB/s per lane (calculation sketched below)
     • Standard allows for 1, 2, 4, 8, 12, 16 and 32 lanes
     • Point-to-point connection between adapter and chipset
     • Achieves about 95 % of the peak rate for large transfers
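The 250 MB/s per lane follows from the 8B/10B line code: only 8 of every 10 transmitted bits carry payload. A short worked example in Python; the variable names are ours and purely illustrative:

```python
# Effective PCIe 1.x lane bandwidth with 8B/10B encoding.
line_rate_gbit = 2.5            # 2.5 Gbit/s line rate per lane (PCIe 1.x)
encoding_efficiency = 8 / 10    # 8B/10B: 8 payload bits per 10 line bits

payload_gbit = line_rate_gbit * encoding_efficiency   # 2.0 Gbit/s of payload
payload_mb_s = payload_gbit * 1000 / 8                # 250 MB/s per lane, per direction

print(f"{payload_mb_s:.0f} MB/s per lane per direction")
# A x16 link therefore peaks at 16 * 250 MB/s = 4 GB/s each way,
# which matches the performance table on the next slide.
```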

  7. PCI-Express (II) - Performance
     Lanes      Clock     Throughput unidir.   Throughput bidir.
     1 lane     2.5 GHz      250 MB/s             500 MB/s
     4 lanes    2.5 GHz        1 GB/s               2 GB/s
     8 lanes    2.5 GHz        2 GB/s               4 GB/s
     16 lanes   2.5 GHz        4 GB/s               8 GB/s
     32 lanes   2.5 GHz        8 GB/s              16 GB/s

  8. Outline
     • I/O
       – Bus
       – PCI
       – PCI-X
       – PCI-Express
     • Network Interconnects
       – Ethernet
       – Myrinet
       – Quadrics
       – Infiniband
     • Mass Storage
       – Hard disks and RAIDs
     • Cluster File Systems

  9. Ethernet
     • Gigabit Ethernet
       – Standard
       – Available in nearly every PC
       – Mostly copper
       – Cheap
       – But costs CPU performance
     • 10 Gbit Ethernet
       – First adapters available
       – Copper/fibre
       – Currently expensive
       – Eats up to 100 % of a CPU
       – TCP offloading to decrease CPU load

  10. Myrinet
     • Preferred cluster interconnect for quite a long time
     • Higher bandwidth than Gigabit Ethernet
     • Lower latency than Ethernet
     • Has a processor on each adapter → overlap of computation and communication possible (see the sketch below)
     • RDMA capability
     • Link aggregation possible
     • Myrinet 10G planned
     • Only one supplier, Myricom
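Adapters with an on-board processor and RDMA can move data while the host CPU keeps computing. A minimal sketch of that overlap pattern using non-blocking MPI calls, assuming mpi4py and NumPy are available (buffer size, tag and file name are arbitrary, and this is only an illustration of the idea, not Myrinet-specific code):

```python
# Overlap of computation and communication with non-blocking MPI.
# Run with e.g. "mpirun -np 2 python overlap.py" (assumes mpi4py + NumPy).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
peer = 1 - rank                       # exchange partner, for exactly 2 ranks

send_buf = np.full(1 << 20, rank, dtype=np.float64)
recv_buf = np.empty(1 << 20, dtype=np.float64)

# Post the transfers; on RDMA-capable networks the NIC moves the data.
reqs = [comm.Isend(send_buf, dest=peer, tag=0),
        comm.Irecv(recv_buf, source=peer, tag=0)]

# ... useful computation happens here while the messages are in flight ...
local_work = np.sin(send_buf).sum()

MPI.Request.Waitall(reqs)             # communication done, recv_buf is valid
```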

  11. Quadrics Elan4
     • Interconnect used for high performance clusters
     • Higher bandwidth than Myrinet
     • Lower latency
     • Has a processor on each adapter → overlap of computation and communication possible
     • RDMA capability
     • Link aggregation possible
     • Quite expensive
     • Only one supplier, Quadrics

  12. Infiniband
     • Specified standard protocol
     • Interconnect often used today for high performance clusters
     • Bandwidth comparable to Quadrics
     • Latency comparable to Myrinet
     • RDMA capability
     • Link aggregation possible
     • Costs similar to Myrinet; planned to become as cheap as Gigabit Ethernet
     • Many vendors

  13. Bandwidth
     [Chart: throughput in MB/s vs. message size for Myrinet, Myrinet dual, Quadrics Elan 4, Elan 4 dual rail, Infiniband PCI-X and Infiniband PCI-Express]

  14. Bandwidth Infiniband
     System interface   Unidirectional   Bidirectional
     PCI-X                830 MB/s         900 MB/s
     PCI Express          930 MB/s        1800 MB/s

  15. Network latency
     Gigabit Ethernet     min. 11 µs, up to 40 µs
     10 G Ethernet        ?
     Myrinet              3.5 to 6 µs
     Quadrics Elan 4      2.5 µs
     Infiniband PCI-X     4.5 µs
     Infiniband PCIe      3.5 µs
     (Latencies of this kind are typically measured with a ping-pong benchmark; see the sketch below.)
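A minimal MPI ping-pong sketch of the kind used to obtain such latency numbers, assuming mpi4py and NumPy; this is an illustration only, not the benchmark behind the slide's figures:

```python
# MPI ping-pong: half the round-trip time of a tiny message is the one-way latency.
# Run with "mpirun -np 2 python pingpong.py" (assumes mpi4py + NumPy).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
buf = np.zeros(1, dtype=np.uint8)     # 1-byte message -> latency-dominated
iters = 10000

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(iters):
    if rank == 0:
        comm.Send(buf, dest=1, tag=0)
        comm.Recv(buf, source=1, tag=0)
    else:
        comm.Recv(buf, source=0, tag=0)
        comm.Send(buf, dest=0, tag=0)
t1 = MPI.Wtime()

if rank == 0:
    # Each iteration is one round trip; divide by two for the one-way latency.
    print(f"one-way latency: {(t1 - t0) / iters / 2 * 1e6:.2f} us")
```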

  16. Outline
     • I/O
       – Bus
       – PCI
       – PCI-X
       – PCI-Express
     • Network Interconnects
       – Ethernet
       – Myrinet
       – Quadrics Elan4
       – Infiniband
     • Mass Storage
       – Hard disks and RAIDs
     • Cluster File Systems

  17. Technologies to connect HDDs (I)
     • IDE/PATA
       – Bus
       – Max. 2 devices
       – Max. 133 MB/s (ATA/133)
       – Typically system internal
     • SATA
       – Point-to-point
       – 150 MB/s (300 MB/s with SATA 2.0)
       – Typically system internal

  18. Technologies to connect HDDs (II)
     • SCSI
       – Bus
       – Max. 7/15 devices
       – Up to 320 MB/s throughput per bus
       – System internal and external
     • FC (Fibre Channel)
       – Network (fabric) and loop
       – Max. 127 devices per loop
       – Used for storage area networks
       – 2 Gbit/s, in the near future 4 Gbit/s
       – 8 and 10 Gbit/s planned

  19. Storage Area Network (SAN)
     • Fabric
       – HBAs
       – Switches
       – Today typically Fibre Channel, but also IP (iSCSI)
     [Diagram: hosts connected through a SAN fabric to storage]

  20. Storage Media
     • Single hard disks
     • RAID systems (disk arrays)
       – Fault tolerance: RAID 1, RAID 3, RAID 5
       – Higher throughput: RAID 0, RAID 3, RAID 5 (striping and parity sketched below)
       – FC, SCSI and SATA (performance and reliability <-> costs)
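RAID 0 raises throughput by striping consecutive blocks across disks; RAID 5 additionally stores a rotating XOR parity block per stripe so that any single failed disk can be reconstructed. A minimal Python sketch of those two ideas (pure illustration, not a RAID driver; the function names and block contents are made up):

```python
# Illustration of RAID 0 striping and RAID 5 parity (not a real RAID implementation).
from functools import reduce

def stripe_raid0(data: bytes, n_disks: int, chunk: int = 4):
    """RAID 0: distribute consecutive chunks round-robin over the disks."""
    disks = [bytearray() for _ in range(n_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % n_disks] += data[i:i + chunk]
    return disks

def parity_block(blocks):
    """RAID 5 parity: bytewise XOR over the data blocks of one stripe."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

print(stripe_raid0(b"0123456789abcdef", 4))   # chunks spread over 4 "disks"

stripe = [b"AAAA", b"BBBB", b"CCCC"]          # one stripe of a 4-disk RAID 5
p = parity_block(stripe)

# If the disk holding b"BBBB" fails, XOR of the survivors and the parity restores it:
recovered = parity_block([stripe[0], stripe[2], p])
assert recovered == b"BBBB"
```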

  21. File Systems for Clusters

  22. Topologies
     • Roughly two classes:
       – Shared storage class
       – Network centric class (shared nothing class)

  23. Shared Storage Class
     • Sharing physical devices (disks)
       – Mainly by using a Fibre Channel network (SAN)
       – IP SAN with iSCSI is also possible
       – SRP within Infiniband
       – (Using a metadata server to organize disk access)
     [Diagram: nodes sharing disks over a SAN]

  24. Shared Storage Class - Implementation
     • Topology (CXFS, OpenGFS, SNFS and NEC GFS)
     [Diagram: CXFS clients and CXFS server connected by a private IP network and a SAN]

  25. Network Centric Class
     • The storage is in the network
       – On storage nodes
       – May be on all nodes
     [Diagram: nodes attached to storage over the network]

  26. PVFS Topology
     [Diagram: PVFS topology with a manager node (mgr), I/O nodes and clients connected over the network]

  27. File Systems for Clusters
     • Distributed file system (e.g. NFS/CIFS): the server is the bottleneck
     • Symmetric clustered file system (e.g. GPFS): lock management is the bottleneck
     • Parallel file system (like PFS): component servers plus a metadata server; the metadata server is the bottleneck, scaling is limited
     • SAN-based, asymmetric file systems (like SANergy): the metadata (MD) server is the bottleneck

  28. Lustre Solution
     • Asymmetric cluster file system
     • Scalable: the MDS handles object allocation, the OSTs handle block allocation
     [Diagram: clients, MDS cluster and OSTs]

  29. Necessary Features of a Cluster File System
     • Accessibility / global namespace
     • Access method
     • Authorization (and accounting), security
     • Maturity
     • Safety, reliability, availability
     • Parallelism
     • Scalability
     • Performance
     • Interfaces for backup, archiving and HSM systems
     • (Costs)

  30. HLRS File System Benchmark
     • The disk-I/O benchmark
       – Allows throughput measurements for reads and writes
         • Arbitrary file size
         • Arbitrary I/O chunk size
         • Arbitrary number of performing processes
       – Allows metadata performance measurements
         • File creation, file status (list) and file deletion rate
         • With an arbitrary number of processes (p-threads or MPI)
     (A minimal throughput-measurement sketch follows below.)
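The HLRS benchmark itself is not reproduced here; the following is only a minimal sketch of how a write-throughput measurement of this kind can be structured. File path, file size and chunk size are arbitrary placeholders:

```python
# Minimal write-throughput sketch in the spirit of the benchmark described above
# (illustrative only; not the HLRS benchmark code).
import os
import time

def write_throughput(path: str, file_size: int, chunk_size: int) -> float:
    """Write file_size bytes in chunk_size pieces and return the rate in MB/s."""
    chunk = b"\0" * chunk_size
    t0 = time.time()
    with open(path, "wb") as f:
        written = 0
        while written < file_size:
            f.write(chunk)
            written += chunk_size
        f.flush()
        os.fsync(f.fileno())          # make sure the data really reached the disk
    elapsed = time.time() - t0
    os.remove(path)
    return written / elapsed / 1e6

if __name__ == "__main__":
    # 256 MB file written in 4 MB chunks; the path is a placeholder.
    print(f"{write_throughput('/tmp/io_test.dat', 256 << 20, 4 << 20):.1f} MB/s")
```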

  31. Measurement Method
     • Measurement with disk I/O
       – HLRS file system benchmark
     • Measuring of throughput
       – Depending on the I/O chunk size (1, 2, 4, 8 and 16 MB chunks)
       – For clients and servers
     • Measuring of metadata performance (sketch below)
       – Essential for cluster file systems
       – Measured on clients and servers
       – File creation, file status and file deletion rate
       – With 1, 5, 10 and 20 processes on a client
       – 50 files per process
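A metadata-rate measurement of the kind listed above can be sketched as follows: each process creates, stats and deletes a fixed number of files, and the rate is the file count divided by the elapsed time (50 files per process as on the slide). The directory and helper names are placeholders, and the real benchmark runs this concurrently with p-threads or MPI rather than in a single process:

```python
# Sketch of a metadata-rate measurement: create / stat / delete 50 files and
# report operations per second (single process; placeholders throughout).
import os
import time

def metadata_rates(directory: str, n_files: int = 50):
    paths = [os.path.join(directory, f"meta_{i}.tmp") for i in range(n_files)]
    rates = {}
    # Order matters: the files must exist before they can be stat'ed or deleted.
    for name, op in [("create", lambda p: open(p, "w").close()),
                     ("stat",   os.stat),
                     ("delete", os.remove)]:
        t0 = time.time()
        for p in paths:
            op(p)
        rates[name] = n_files / (time.time() - t0)   # operations per second
    return rates

print(metadata_rates("/tmp"))   # e.g. {'create': ..., 'stat': ..., 'delete': ...}
```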

  32. Measurement Environment
     • CXFS
       – Server: Origin 3000, 8 procs, 6 GByte buffer cache, 2x2 Gbit FC
       – Client: Origin 3000, 20 procs, 20 GByte buffer cache, 2x2 Gbit FC
       – RAID: Data Direct Networks
     • PVFS
       – 4 IA-64 systems, 2 procs, 8 GB memory and 36 GB local disk each; symmetric setup
     • Lustre
       – 7 IA-32 systems, Pentium III 1 GHz, 18 GB local disk; 1 MDS, 2 OSTs, 4 clients; Lustre 0.7 (old version!)
     • Measurements were performed in 2003
