SLIDE 1

The Beowulf Cluster at the Center for Computational Mathematics, CU-Denver

Jan Mandel, CCM Director Russ Boice, System Administrator

CLUE North September 18, 2003 Supported by National Science Foundation Grant DMS-0079719 www-math.cudenver.edu/ccm/beowulf

SLIDE 2

Overview

  • Why a Beowulf cluster?
  • Parallel programming
  • Some really big clusters
  • Design constraints and objectives
  • System hardware and software
  • System administration
  • Development tools
  • The burn-in experience
  • Lessons learned
SLIDE 3

Why a Beowulf Cluster?

  • Parallel supercomputer on the cheap
  • Take advantage of bulk datacenter pricing
  • Open source software tools available
  • Uniform system administration
  • Looks like one computer from the outside
  • Better than a network of workstations
SLIDE 4

Why parallel programming?

  • Speed: Divide the problem into parts that can run on different CPUs
    – Communication between the parts is necessary, and
    – the art of efficient parallel programming is to minimize the communication
  • Memory: On a cluster, the memory needed for the problem is distributed between the nodes
  • But parallel programming is hard!
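A back-of-the-envelope way to see why minimizing communication matters is Amdahl's law: any part of the work that cannot be parallelized (including time spent communicating) caps the achievable speedup. A minimal sketch; the 5% serial fraction below is illustrative, not a figure from the slides:

```python
def amdahl_speedup(serial_fraction, n_cpus):
    """Ideal speedup on n_cpus when serial_fraction of the work
    cannot be parallelized (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cpus)

# Even a small serial/communication fraction caps the speedup:
# with 5% serial work, 70 CPUs give only about a 16x speedup.
print(round(amdahl_speedup(0.05, 70), 1))
```

With a serial fraction of zero the speedup is the full CPU count, which is why reducing communication is the art of the game.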
SLIDE 5

Numerical parallel programming software layers

  • Interconnect hardware drivers (Ethernet, SCI, Myrinet, …)
  • Message passing libraries (MPI, PVM)
  • Shared memory (hardware, virtual)
  • Distributed parallel object libraries (PETSc, HYPRE, …)
  • OpenMP
  • High Performance Fortran (HPF)

SLIDE 6

Top 500 list

  • Maintained by www.top500.org
  • Speed measured in floating point operations per second (FLOPs)
  • LINPACK benchmark = solving dense square linear systems of algebraic equations by Gaussian elimination (www.netlib.org)
  • Published twice a year at the International Supercomputing Conference
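The kernel LINPACK times can be sketched in a few lines; below is a pure-Python toy version of Gaussian elimination with partial pivoting that also counts floating point operations (real LINPACK runs call optimized BLAS and are vastly faster than this sketch):

```python
def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Returns (x, flops); the flop count grows like (2/3) n**3."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    flops = 0
    for k in range(n):                                # forward elimination
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            flops += 1
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
                flops += 2
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                    # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
        flops += 2 * (n - i)
    return x, flops

# 2x + y = 3, x + 3y = 5  ->  x = 0.8, y = 1.4
x, flops = gauss_solve([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
```

The (2/3)n³ flop count of this elimination is exactly what turns a benchmark run time into the FLOPs figure reported on the Top 500 list.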

SLIDE 7

SLIDE 8

Source: Jack Dongarra http://www.cs.utk.edu/~dongarra/esc.pdf

SLIDE 9

Source: Jack Dongarra http://www.cs.utk.edu/~dongarra/esc.pdf

SLIDE 10

Source: www.top500.org

SLIDE 11

Design objectives and constraints

  • Budget $200,000, including 3 year warranty
  • Maximize computing power in GFLOPs
  • Maximize communication speed
  • Maximize memory per node
  • Run standard MPI codes
  • Nodes useful as computers in themselves
  • Use existing application software licenses
  • Run existing software, porting, development
  • Remote control of everything, including power
  • System administration over low bandwidth links
SLIDE 12

Basic choices

  • Linux, because
    – It is free and we have been using it on the whole network already for years
    – Cluster software runs on Linux
    – Our applications run on Linux
  • Thick nodes, with disks and complete Linux, because
    – Nodes need to be useful for classical computations
    – Local disks are faster than disks over the network
    – Tested Scyld (global process space across the cluster), which did not work well at the time
    – At least we know how to make them run
SLIDE 13

Interconnects available in early 2001

  • 100Mb/s Ethernet: slow, high latency
  • 1Gb/s Ethernet: expensive (fibre only), high latency
  • Myrinet: nominal 2Gb/s duplex, star topology, needs an expensive switch
  • SCI (Dolphin): nominal 10Gb/s, actual 1.6Gb/s duplex, torus topology, no switch, best latency and best price per node. Also promised serial consoles and remote power cycling of individual nodes.
  • Dolphin and Myrinet avoid the TCP/IP stack
  • Speed limited by the PCI bus – 64bit 66MHz required to get fast communication
  • Decision: SCI Dolphin Wulfkit with Scali cluster software
SLIDE 14

x86 CPUs available in early 2001

  • Intel PIII: 1GHz, approx. 0.8 GFLOPs
    – Dual CPUs = best GFLOPs/$
    – 64bit 66MHz PCI bus available on server-class motherboards
    – 1U 2-CPU possible
    – Cheap DRAM
  • Intel P4 1.5GHz
    – SSE2 = double precision floating point vector processor
    – Theoretically fast, but no experience with SSE2 at the time
    – No 64bit 66MHz PCI bus, no dual processors
    – Rambus memory only, expensive
  • AMD Athlon
    – Not available with dual processors
    – No experience in house
  • Decision: Dual PIII, server-class motherboard, 1U
SLIDE 15

Disks available in early 2001

  • ATA100
    – Internal only, 2 devices/bus, no RAID
    – Simple drives, less expensive
  • Ultra160 SCSI
    – Internal/external, RAID
    – 16bit bus, up to 160MB/s
    – Up to 16 devices/channel
    – More intelligence on the drive, more expensive
    – Disk operation interleaving
    – High-end server-class motherboards have SCSI
  • Decision: Ultra160 SCSI
SLIDE 16

Remote console management

  • Goal: manage the cluster from off campus
  • Considered KVM switches
  • Solutions exist to convert KVM to a graphics session, but
    – they required a windoze client
    – and lots of bandwidth; even DSL may not be enough (bad experience with sluggish VNC even over 10Mb/s)
    – would the client run through a firewall?
  • All we wanted was to convert KVM to a telnet session when the display is in green-screen text mode – once we are up and running X we do not need a console … but found no such gadget on the market
  • Decision: console through serial port and reverse telnet via terminal servers
SLIDE 17

Purchasing

  • Bids at internet prices + a few % for integration, delivery, installation, tech support, and 3-year warranty
  • Vendor acts as a single point for all warranties and tech support (usual in the cluster business)
  • Worked out detailed specs with vendors
    – DCG, which became ATIPA in the process
    – Paralogic
    – Finally bought from Western Scientific
SLIDE 18

The Beowulf Cluster at CCM

SLIDE 19

Cluster hardware

  • 36 nodes (master + 35 slaves)
    – Dual PIII-933MHz, 2GB memory
    – Slaves have an 18GB IBM Ultrastar Ultra160 SCSI disk, floppy
  • Master node
    – Mirrored 36GB IBM Ultrastar Ultra160 SCSI disks, CDROM
    – External enclosure, 8×160GB Seagate Barracuda Ultra160 SCSI, PCI RAID card with dual SCSI channels, mirrored & striped
    – Dual gigabit fiber Ethernet
    – VXA 30 AIT tape library
  • SCI Dolphin interconnect
  • Cluster infrastructure
    – 100Mb/s switch with gigabit fiber uplink to master
    – 4 APC UPS 2200 with temperature sensors and Ethernet
    – 3 Perle IOLAN+ terminal servers for the serial consoles
    – 10Mb/s hub for the utility subnet (UPS, terminal servers)
SLIDE 20

Performance

  • CPU: theoretical ~60 GFLOPs
  • Actual: 38 GFLOPs on the LINPACK benchmark
  • Disk array: 4 striped disks @ 40MB/s on a 160MB/s channel = 160MB/s theoretical, 100MB/s actual disk array bandwidth
  • SCI interconnect: 10Gb/s between cards, card to node 528MB/s theoretical (PCI), 220MB/s actual bandwidth, <5µs latency
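The latency and bandwidth figures combine in the usual first-order model for a message transfer, time = latency + size / bandwidth. A small sketch using the measured SCI numbers from this slide (5 µs latency, 220 MB/s):

```python
def transfer_time_us(size_bytes, latency_us, bandwidth_mb_s):
    """First-order message-transfer model: time = latency + size/bandwidth.
    With MB = 10**6 bytes, 1 MB/s = 1 byte/us, so bytes/(MB/s) gives us."""
    return latency_us + size_bytes / bandwidth_mb_s

# One 8-byte double is entirely latency-bound:
small = transfer_time_us(8, 5.0, 220.0)          # ~5.04 us
# An 8 MB buffer is entirely bandwidth-bound:
large = transfer_time_us(8_000_000, 5.0, 220.0)  # ~36 ms
```

The two extremes show why good parallel codes batch their communication into a few large messages rather than many small ones.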

SLIDE 21

SCI

[Diagram: cluster wiring. 36 nodes (master + 35 slaves), 2 CPUs and 2GB RAM each, connected by the SCI cable interconnect and by a 100Mb/s Ethernet switch; a gigabit fiber link connects the switch to the master, and a second gigabit fiber interface connects the master to the Internet. Three terminal controllers carry the RS-232 serial consoles; a 10Mb/s hub serves the utility subnet. Four UPSes, fed from four 30A 115VAC circuits, each supply two power strips, which in turn supply the nodes and other equipment.]
SLIDE 22

[Diagram: SCI Dolphin interconnect topology – the master (M) and nodes 1–35 arranged in a 6 × 6 2D torus]
SLIDE 23

Mass storage

[Diagram: the master node connects over two 160MB/s SCSI channels to an external array of eight 160GB SCSI drives (with RAID, 698GB actual capacity), and over one SCSI channel to two VXA tape drives with a 30-tape autoload library (recording rate 4000 kB/s, ~70GB typical compressed capacity per tape); the master's internal mirrored 36GB drives are on 160MB/s SCSI. Nodes 1–35 each have an internal 18GB SCSI drive.]
SLIDE 24

WARNING

EXPLICIT PICTURES OF BIG HARD DRIVES NEXT

SLIDE 25

The master node
SLIDE 26

The slave nodes
SLIDE 27

The back

SCI Dolphin cables; keyboard, monitor, and mouse plugged into a slave node; serial console cables; Ethernet cables
SLIDE 28

The uninterruptible power supplies (UPS), utility hub, two UPS network interface boxes, and temperature sensor on top

SLIDE 29

Disk array 8*160GB Tape library An extra fan to blow in the beast’s face

SLIDE 30

The beast eats lots of power

SLIDE 31

Perfectly useless doors that came with the beast but just inhibit air flow. Note the shiny surface of the left door, holes only on the sides.
SLIDE 32

Ethernet switch Terminal servers for serial consoles

SLIDE 33

The disk array and the tape library

SLIDE 34

The 10Gb/s SCI Dolphin cables are a bit too long.

SLIDE 35

Office power strips were commandeered to distribute the load between the outlets on the uninterruptible power supplies to avoid tripping the circuit breakers.
SLIDE 36

The backplane in gory detail

SLIDE 37

Cluster software

  • Redhat Linux 7.2
  • Scali cluster software
    – SCI Dolphin accessible through the MPI library only
    – Management tools, Portable Batch System (PBS)
  • Portland Group compilers
    – C, C++, Fortran 90
    – High Performance Fortran (HPF) is the easiest way to program the cluster
  • Totalview debugger by Etnus
    – Switch between groups of processes in an MPI job, control all processes at once
    – Alternative: one gdb per process in an xterm window…
SLIDE 38

Networking

  • All slaves and one master fiber interface run a local network with NAT
  • Master is the gateway, the only node visible from the outside
  • Disk array on master shared via NFS with slaves
  • Utility hub (power supplies, serial consoles) is accessible from the outside
  • Master runs an ntp server – important to keep time in sync
  • All other protocols pass through master to the outside
  • FlexLM license server for apps is outside of the cluster
SLIDE 39

System administration

  • All slave nodes kept in sync
    – Moving files etc. done by Scali commands (basically just loops over the nodes)
    – Slave disks are cloned (boot Linux, run dd on the raw device) when a disk is replaced
  • Software on master and nodes kept in sync by duplicating /opt
  • A few custom cron scripts
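Since the Scali commands are basically loops over the nodes, a hand-rolled equivalent is easy to sketch. The `node1`…`node35` hostnames and the rsync of /opt below are illustrative assumptions, not the actual Scali commands:

```python
def fanout(cmd, nodes):
    """Build one ssh invocation per node for the given shell command."""
    return ["ssh %s '%s'" % (n, cmd) for n in nodes]

# 35 slave nodes; the naming scheme is an assumption for illustration.
nodes = ["node%d" % i for i in range(1, 36)]

# e.g. keep /opt in sync with the master on every slave:
cmds = fanout("rsync -a master:/opt/ /opt/", nodes)
```

Each string in `cmds` would then be handed to the shell (or run in parallel); the point is simply that "cluster management" here reduces to a loop.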
SLIDE 40

System administration cron scripts

  • From cron on another box, log periodically into the UPSes and turn everything off if the room temperature is too high
  • Check periodically whether the Dolphin cards work and take failed nodes offline so that PBS will not assign jobs to them
  • Export a status report to the web
  • Start and stop PBS queues from cron depending on whether jobs are waiting; the built-in scheduler is not smart enough and cannot be changed easily
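The temperature shutoff reduces to a simple check over the UPS sensor readings. A minimal sketch; the 35 °C threshold and the list-of-readings interface are assumptions (the slides do not give the actual limit or the UPS protocol):

```python
def shutdown_needed(ups_temps_c, limit_c=35.0):
    """True if any UPS temperature sensor reads above the limit.
    The 35 C default is an assumed threshold, not from the slides."""
    return any(t > limit_c for t in ups_temps_c)

# Readings as polled from the four UPS network interfaces (made-up values):
if shutdown_needed([24.0, 25.5, 23.8, 36.2]):
    pass  # here the real cron script would log in to each UPS and power off
```

A cron job on another box polls the sensors, calls a check like this, and triggers the shutoff sequence when it returns True.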

SLIDE 41

Common libraries and application software

  • Optimized BLAS (Basic Linear Algebra Subroutines)
  • Parallel distributed object libraries:
    – PETSc (Argonne National Lab)
    – HYPRE (Lawrence Livermore Nat. Lab.)
  • Matlab
    – Single processor only
SLIDE 42

The burn-in: Software

  • Vendor technicians struggled with serial consoles
  • SCI drivers conflicted with the RAID drivers on the master; Scali developers had to compile a custom kernel
  • Scali management tools never fully worked
  • Arkeia tape library software never worked
  • The PBS (batch system) did not work properly
  • Etc… it took about 8 months, an upgrade to Redhat 7.2, and the new Scali 4.0 for things to start working reasonably well
  • The PGI compilers, the Totalview debugger, the Scali SCI drivers, and the Redhat version never worked together well: everybody fully supports only the previous version of the other people's software!
SLIDE 43

The burn-in: Power and heat

  • Too few UPSes; the vendor had to buy 2 more
  • Did not have enough power outlets; had to steal power from all over the floor
  • The UPSes did not support automatic shutoff at high temperature as promised; had to buy network boxes for all UPSes and write a shutoff script
  • Running the benchmark overheated the 1U slave nodes
  • Vendor replaced front plates and added internal fans
  • Running the benchmark for a few minutes tripped the circuit breakers on the UPSes
  • Redistributed the load between outlets on the UPSes using office power strips
  • After running the benchmark for a few hours, the machine room air-conditioning compressor failed
  • After 6 months, the benchmark finally goes through!
SLIDE 44

The burn-in: Disks, CPUs, and motherboards

  • Whenever all the nodes were opened, at least one of them would not work again
  • A node disk, motherboard, or Dolphin card occasionally dies (less often now)
  • Probably caused by more heat in the 1U enclosures; the master had no problems
SLIDE 45

Cluster use

  • Numerical mathematics: research on iterative solvers for large-scale systems of equations, eigenvalue solvers
  • Discrete mathematics: search for graphs with special properties
  • Massive experiments: run a large number of copies of the same code on different data (also for tables and graphs)
  • Computational chemistry: runs existing codes for molecular simulation
  • Student training
SLIDE 46

Node usage

  • Master: compilation, light load only
  • Node 1 and node 2: interactive work, debugging
  • Nodes 3 to 36: jobs submitted through PBS (Portable Batch System) only
    – each job gets assigned full nodes to run on
  • No enforcement, relies on user cooperation
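A job on the batch nodes is submitted with a small PBS script. A minimal sketch; the resource numbers, job name, and `my_solver` binary are placeholders, not from the slides:

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=2       # request two full nodes, both CPUs on each
#PBS -l walltime=04:00:00   # wall-clock limit for the job
#PBS -N my_solver           # job name (placeholder)

cd "$PBS_O_WORKDIR"         # run from the directory where qsub was invoked
mpirun -np 4 ./my_solver    # 4 MPI processes across the 2 nodes (placeholder binary)
```

The script is handed to the queue with `qsub`, and PBS assigns the job whole nodes, matching the full-node policy above.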

SLIDE 47

Lessons learned

  • You get what you pay for
  • Refuse any hardware not supported by the stock kernel straight off the CD
  • Air must stream through the nodes like in a jet engine; anything less will overheat
  • Using the cluster is hard
  • Avoid inconveniencing users
SLIDE 48

What we would buy now

  • Dual P4, of course
  • Memory depending on usage; we would go for 2GB or 4GB so that the nodes are useful as computers on their own in what we do
  • Interconnect by twisted-pair gigabit
    – Motherboards now have dual gigabit ports
    – Switches are reasonably priced
    – Reasonably fast
    – High latency, but well-written parallel programs do not use that much communication
    – One port for the MPI subnet, one for everything else, esp. NFS
  • Make sure of good air flow: 2U or blower fans
SLIDE 49

What we would buy now (cont.)

  • Use ATA disks; they are now fast enough and much cheaper than SCSI
  • Networked power strips to power-cycle each node individually over the network
  • Basic KVM but no remote consoles – if you have a problem that requires a console, you probably have to go to the machine room anyway
  • A cheap, slow, huge disk array offsite in a friendly machine room for disk-to-disk backups; no tape library, just one tape drive on the master for the system