Bright Cluster Manager - Advanced HPC cluster management made easy - PowerPoint PPT Presentation
SLIDE 1

Bright Cluster Manager

Advanced HPC cluster management made easy

Martijn de Vries

CTO Bright Computing

SLIDE 2

About Bright Computing

Bright Computing

  • Develops and supports Bright Cluster Manager for HPC systems and server farms
  • Incorporated in the USA (HQ in San Jose, California)
  • Development office in Amsterdam, NL
  • Backed by ING Bank as shareholder and investor
  • Sells through a rapidly growing network of resellers and OEMs world-wide
  • Customers and resellers in the US, Canada, Brazil, Europe, the Middle East, India, Singapore, Japan
  • Installations in academia, government, and industry, ranging from 4-node to TOP500 systems

SLIDE 3

Customers

Industry, Government, Academia

SLIDE 4

The Commonly Used “Toolkit” Approach

  • Most HPC cluster management solutions use the “toolkit” approach (Linux distro + tools)
  • Examples: Rocks, PCM, OSCAR, UniCluster, CMU, bullx, etc.
  • Tools typically used: Ganglia, Cacti, Nagios, Cfengine, SystemImager, Puppet, Cobbler, Hobbit, Big Brother, Zabbix, GroundWork, etc.
  • Issues with the “toolkit” approach:
  • Tools rarely designed to work together
  • Tools rarely designed for HPC
  • Tools rarely designed to scale
  • Each tool has its own command line interface and GUI
  • Each tool has its own daemon and database
  • Roadmap dependent on developers of the tools
  • Making a collection of unrelated tools work together:
  • Requires a lot of expertise and scripting
  • Rarely leads to a really easy-to-use and scalable solution
SLIDE 5

About Bright Cluster Manager

  • Bright Cluster Manager takes a much more fundamental & integrated approach

  • Designed and written from the ground up
  • Single cluster management daemon provides all functionality
  • Single, central database for configuration and monitoring data
  • Single CLI and GUI for ALL cluster management functionality
  • Which makes Bright Cluster Manager …
  • Extremely easy to use
  • Extremely scalable
  • Secure & reliable
  • Complete
  • Flexible
SLIDE 6

Architecture

[Diagram: CMDaemon]

SLIDE 7

Bright Cluster Manager — Elements

[Diagram: Cluster Management Shell and Cluster Management GUI connect over SSL / SOAP / X509 / IPtables to the Cluster Management Daemon, which manages disk, Ethernet, interconnect, IPMI/iLO, PDU, CPU, GPU, and memory; integrates workload managers (PBS Pro, Torque, Maui/MOAB, Grid Engine, SLURM, LSF*); provides monitoring, automation, health checks, management, and provisioning; ships compilers, libraries, debuggers, and profilers; runs on SLES / RHEL / CentOS / SL / Oracle EL, with ScaleMP vSMP]

SLIDE 8

HPC User Environment

  • Let users focus on performing computations
  • Rich collection of HPC software
  • Compilers (GNU, Intel*, Portland*, Open64, etc.)
  • Parallel middleware (MPI libraries, threading libraries, OpenMP, Global Arrays, etc.)
  • Mathematical libraries (ACML, MKL*, LAPACK, BLAS, etc.)
  • Development tools (debuggers, profilers, etc.)
  • Environment modules
  • Intel Cluster Ready compliant: compliant applications run out of the box

SLIDE 9

Management Interface

Graphical User Interface (GUI)

  • Offers administrator full cluster control
  • Standalone desktop application
  • Manages multiple clusters simultaneously
  • Runs on Linux, Windows, MacOS X*
  • Built on top of Mozilla XUL engine

Cluster Management Shell (CMSH)

  • All GUI functionality also available through the Cluster Management Shell

  • Interactive and scriptable in batch mode
SLIDE 10

SLIDE 11

SLIDE 12

Cluster Management Shell (CMSH)

Features:

  • Modular interface
  • Command completion using tab key
  • Command line history
  • Output redirection to file or shell command
  • Scriptable in batch mode
  • Support for looping over objects

Example

[demo]% device
[demo->device]% status
demo ................ [ UP ]
node001 ............. [ UP ]
node002 ............. [ UP ]

SLIDE 13

SLIDE 14

SLIDE 15

SLIDE 16

SLIDE 17

Node Provisioning

Image based

  • Slave node image is a directory on the head node
  • Unlimited number of images can be created
  • Software changes for the slave nodes are made inside the image(s) on the head node
  • Provisioning system ensures that changes are propagated to the slave nodes

Nodes always boot over the network

  • Slave nodes PXE boot into the Node Installer, which:
  • Identifies the node (switch port or MAC based)
  • Configures the BMC
  • Partitions disks (if any) and creates file systems
  • Installs or updates the software image
  • Pivots the root from NFS to the local file system
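The identification step above can be sketched as a lookup from MAC address to hostname. This is illustrative only: the MAC table below is hypothetical, and the real mapping lives in the cluster management database, not in a shell script.

```shell
#!/bin/sh
# Illustrative sketch of MAC-based node identification, as performed
# by the Node Installer during PXE boot. MAC addresses and hostnames
# below are made up for the example.
identify_node() {
    case "$1" in
        00:25:90:aa:bb:01) echo "node001" ;;
        00:25:90:aa:bb:02) echo "node002" ;;
        *)                 echo "unknown" ;;
    esac
}

identify_node 00:25:90:aa:bb:02   # prints "node002"
```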
SLIDE 18

SLIDE 19

SLIDE 20

Architecture — Monitoring

[Diagram: CMDaemon collecting monitoring data from each node's BMC]

SLIDE 21

SLIDE 22

SLIDE 23

SLIDE 24

SLIDE 25

Bright Cluster Manager for GPGPU

SLIDE 26

GPU Development Environment

  • CUDA & OpenCL redistribution rights
  • Current and previous versions of CUDA & OpenCL
  • Easy switching between CUDA & OpenCL versions
  • CUDA driver automatically compiled at boot time
  • Support for new Fermi architecture
  • Native 64-bit GPU support
  • Multiple copy engine support
  • ECC reporting
  • Concurrent kernel execution
  • Fermi HW debugging support in cuda-gdb
SLIDE 27

GPU Monitoring

SLIDE 28

SLIDE 29

SLIDE 30

SLIDE 31

Cluster Health Checking

  • Goal: provide a problem-free environment for running jobs
  • Hardware & software health
  • Three types of health check:
  • Health checks before jobs are run
    – Halt workload manager a few (milli)seconds before the job is executed
    – Check health of each reserved node
    – If unhealthy, take the node offline and inform the system administrator
    – Hand the job back to the workload manager
  • Frequently scheduled health checks
    – Run health check when the node is not used
    – Run health check through the queuing system
  • Hardware burn-in environment
    – Most thorough health check
    – Requires reboot
  • All types are extensible
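Because all three types are extensible, a custom health check is essentially a script whose exit status reports node health. A minimal sketch, assuming a hypothetical check on root-filesystem usage; the 95% threshold and the PASS/FAIL output format are made up for illustration and are not Bright's built-in conventions.

```shell
#!/bin/sh
# Sketch of a custom health check: exit status 0 means healthy,
# non-zero means unhealthy. Metric and threshold are illustrative.
THRESHOLD=95

check_rootfs() {
    # Percentage of space used on /, digits only (strip the '%').
    usage=$(df -P / | awk 'NR==2 { gsub(/%/, ""); print $5 }')
    if [ "$usage" -ge "$THRESHOLD" ]; then
        echo "FAIL: root filesystem ${usage}% full"
        return 1
    fi
    echo "PASS: root filesystem ${usage}% full"
}

check_rootfs
status=$?    # the health-check framework would act on this status
```

A non-zero status from such a script is what would let the management daemon take the node offline and notify the administrator, as described above.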
SLIDE 32

SLIDE 33

SLIDE 34

SLIDE 35

SLIDE 36

Scalability

Cluster management software should not be the limiting factor for cluster size. Philosophy used for Bright Cluster Manager:

  • All tasks performed by the master node should be off-loadable to dedicated nodes.
  • If the master node cannot handle a task as a result of cluster size, the task can be placed on one or more dedicated nodes.
  • For example: multiple dedicated load-balanced provisioning nodes may be assigned in a cluster.
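The load-balancing idea can be illustrated with a simple round-robin dispatch of provisioning requests across dedicated provisioning nodes. Node names are hypothetical, and this is only a sketch: the real scheduler also tracks per-node provisioning slots and queueing.

```shell
#!/bin/sh
# Illustrative round-robin assignment of provisioning requests to
# provisioning nodes. Host names below are made up for the example.
PROVISIONERS="prov01 prov02 prov03"
set -- $PROVISIONERS
count=$#

i=0
for request in node001 node002 node003 node004; do
    idx=$(( i % count + 1 ))      # cycle through the provisioners
    eval "prov=\${$idx}"
    echo "$request -> $prov"
    i=$(( i + 1 ))
done
```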

SLIDE 37

Image Based Provisioning

  • Software image (or “image”) is a directory on the head node
  • Image contains a full Linux file-tree (/bin, /usr, …)
  • Software is not installed on nodes directly, but rather into the image
  • After the image has been changed, changes can be propagated to the compute nodes
  • Propagating image changes to nodes can be done in two ways:
  1. Rebooting nodes
  2. Using “device imageupdate” in CMSH, or “Update Node” in the GUI
  • The latter allows nodes to be updated without a reboot
  • Some changes do require a reboot (e.g. a kernel update)
SLIDE 38

Provisioning Process

  • Node Installer submits a provisioning request to the head node
  • Head node will queue the request until a provisioning slot becomes available on one of the provisioning nodes (possibly just the head node itself)
  • Provisioning node will connect to the compute node to provision the software image to the local file system
  • Two install modes:
  • FULL: re-partition hard drives, transfer image from scratch
  • SYNC: only transfer differences between image and local disk
  • Default install mode is SYNC
  • Disk setup mismatch triggers FULL install mode
SLIDE 39

SLIDE 40

Changing Software Images

  • Installing/updating RPMs

rpm --root=/cm/images/default-image -i myapp.rpm
yum --installroot=/cm/images/default-image install myapp
yum --installroot=/cm/images/default-image update

  • Installing software from source

make DESTDIR=/cm/images/default-image install

Note that not all Makefiles support $DESTDIR. Usage example from a Makefile:

install -m644 file-example $(DESTDIR)/etc/file

  • Making changes manually

chroot /cm/images/default-image
cd /usr/src/myapp; make install

emacs /cm/images/default-image/etc/file
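The DESTDIR pattern can be tried safely against a scratch directory standing in for /cm/images/default-image; file and path names mirror the Makefile example above and are purely illustrative.

```shell
#!/bin/sh
# Stage an install into a scratch "image" root, the same way
# `make DESTDIR=/cm/images/default-image install` would, assuming
# the Makefile honours $(DESTDIR).
DESTDIR=$(mktemp -d)             # stand-in for /cm/images/default-image

printf 'example=1\n' > file-example
mkdir -p "$DESTDIR/etc"
install -m644 file-example "$DESTDIR/etc/file"   # same rule as in the Makefile

ls -l "$DESTDIR/etc/file"
```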
SLIDE 41

Cloud Bursting (in development)

  • Allow clusters to be extended with cloud resources
  • Cluster can grow or shrink based on workload and policies
  • Integrated interface to public cloud providers
  • Unsolved problem: how to deal with local storage?
SLIDE 42

Looking for challenging and exciting jobs in HPC?

www.brightcomputing.com www.clustervision.com