XT9? Integrating and Operating a Conjoined XT4+XT5 System

SLIDE 1

Managed by UT-Battelle for the Department of Energy

XT9? Integrating and Operating a Conjoined XT4+XT5 System

presented by Don Maxwell, HPC Systems, ORNL

SLIDE 2

What is a Conjoined XT4+XT5?

[Diagram labels: Jaguar XT4, Jaguar XT5]

MOAB on Cray XT

SLIDE 3

What is a Conjoined XT4+XT5?

                                        Jaguar XT5                     Jaguar XT4
Cabinets                                200                            84
Processors                              AMD Opteron 2.3 GHz quad-core  AMD Opteron 2.1 GHz quad-core
Compute Cores                           149,504                        31,328
Memory (TB)                             300                            62
Links                                   115,200                        48,384
Theoretical Peak Performance (TFLOPS)   1,375                          263
I/O Capacity (TB)                       4,100*                         700
I/O Bandwidth (GB/s)                    100*                           40
Service Nodes                           256                            116

* The current filesystem on Jaguar XT5 is an InfiniBand direct-attached configuration using roughly half of the available storage capacity. The other half is being used to develop a Lustre routed filesystem called Spider. The two halves will be merged into a single Spider configuration, which will be mounted center-wide during the next few months.

SLIDE 4

What is a Conjoined XT4+XT5?

• Combining two resources into one
• SION
• External Logins
• Need a platform for access to both machines

[Diagram labels: Jaguar XT4, Jaguar XT5, Cisco IB Core 1, Cisco IB Core 2, Cisco IB Aggregation, Cisco IB 2nd Floor, External Logins]

SLIDE 5

Routing XT Computes

XT Compute Node Routes

• 192 IB nodes on XT5
• 48 IB nodes on XT4
• IB Router <-> IB Router selection based on IB switch
• Compute node router selection based on distance

[Diagram labels: SION, Jaguar XT4 Computes, Cisco IB Core 1, Cisco IB Core 2, XT4 Service Nodes, XT5 Service Nodes, Jaguar XT5 Computes]
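
The distance-based router selection mentioned on this slide can be illustrated with a minimal sketch. It assumes that nodes are addressed by coordinates on a 3D torus and that "distance" means hop count with wraparound links; the slides do not define the metric. The torus size, the router positions, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Coord = Tuple[int, int, int]


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Hop count between two nodes on a 3D torus with wraparound links."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def nearest_router(compute: Coord, routers: Sequence[Coord], dims: Coord) -> Coord:
    """Pick the router with the fewest hops from the compute node.

    Ties go to the router listed first, because min() keeps the first minimum.
    """
    return min(routers, key=lambda r: torus_hops(compute, r, dims))


if __name__ == "__main__":
    dims = (8, 8, 8)                    # hypothetical torus size
    routers = [(0, 0, 0), (4, 4, 4)]    # hypothetical router positions
    print(nearest_router((7, 0, 1), routers, dims))
```

With wraparound, the node at (7, 0, 1) is two hops from the router at (0, 0, 0), so that router is chosen even though its x coordinate looks far away.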

SLIDE 6

External Login Nodes

• Motivation
  – Single platform for accessing both XTs
  – Provide a much more capable platform for software development than the current service nodes directly attached to the XTs
• Prototype Hardware
  – Quad-socket AMD Opteron 2.0 GHz quad-core
  – 32 GB memory
  – SLES 10.2
  – Autoyast
  – Cfengine
  – Conserver

SLIDE 7

External Login Nodes

• XT Software
  – Batch Systems
  – Filesystems
  – Cray XT Stack

SLIDE 8

Batch

• Moab/TORQUE
  – History dating back to 2005
  – First port to XT platform on ORNL development system
  – Requirements discussion in December for conjoined project
• Two potential development paths
  – Modify existing XT native resource manager
  – Use grid model
• Modifying existing RM seemed to be the easiest path

SLIDE 9

Moab features support NCCS mission

• Job templates to categorize job sizes
  – Large jobs favored to support capability mission
  – DOE metrics requirement for Capability Usage
    • In the first year following general availability of a new or upgraded system, 35% of the CPU time used on the system will be accumulated by jobs using 20% or more of the available processors
    • In subsequent years, 30% of the CPU time used on the system will be accumulated by jobs using 30% or more of the available processors
    • Supported through use of Moab job templates/fairshare/priorities
• Identity manager to import project priorities
  – RATS maintains project information
  – Priorities changed dynamically via import from ASCII file
• Size 0 jobs eliminate need for user cron jobs
  – Cron can cause issues with filesystem unmounts
    • Batch control more desirable
  – Accounting method same as traditional batch jobs
• LENS Visualization cluster job pre-emption
  – 32 nodes, each containing four quad-core 2.3 GHz AMD Opteron processors with 64 GB of memory and 2 NVIDIA 8800 GTX GPUs
  – Computational jobs allowed unless an analysis job appears
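
The Capability Usage metric described on this slide is a ratio that is easy to check from accounting records. The sketch below is only an illustration: it assumes "CPU time" can be treated as cores multiplied by wall-clock hours, and the `Job` record and function name are hypothetical rather than part of Moab or of the DOE definition.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Job:
    cores: int      # cores allocated to the job
    hours: float    # wall-clock hours the job ran


def capability_fraction(jobs: Iterable[Job], total_cores: int, size_threshold: float) -> float:
    """Fraction of consumed core-hours that came from jobs using at least
    size_threshold (e.g. 0.20) of the machine's cores."""
    jobs = list(jobs)
    used = sum(j.cores * j.hours for j in jobs)
    if used == 0:
        return 0.0
    big = sum(j.cores * j.hours for j in jobs if j.cores >= size_threshold * total_cores)
    return big / used


if __name__ == "__main__":
    # Made-up job mix on a machine with the XT5 core count from slide 3.
    sample = [Job(40000, 10), Job(8000, 50), Job(30000, 5)]
    share = capability_fraction(sample, total_cores=149504, size_threshold=0.20)
    print(f"capability share: {share:.1%} (first-year target: 35%)")
```

In this made-up mix, the two jobs above the 20% size threshold account for 550,000 of 950,000 core-hours, so the first-year target would be met.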

SLIDE 10

Batch

• What’s the model?
• ALPS only has knowledge of one XT/domain
• Passwordless ssh using sudo for communication
• External Moab allows each XT to operate independently

[Diagram labels: Jaguar XT4, Jaguar XT5, External Logins, ALPS, TORQUE, Moab, External Server, ALPS, TORQUE]

SLIDE 11

Batch

• Features
  – Target a particular resource
    • qsub
    • msub -l partition=(xt4|xt5)
  – No specific resource
    • msub
    • Load balancer
      – Simple algorithm based purely on availability of resources at the time of job launch
      – Open to more sophisticated algorithm
        • Delay choice until runtime
        • Queue depth
        • Historical utilization
  – Restrict each partition based on user
  – Direct jobs based on size using job templates
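
The "simple algorithm based purely on availability" can be pictured with a short sketch. This is not Moab's implementation: the function name, the idle-core snapshot, and the tie-breaking rule (prefer the partition with the most idle cores) are assumptions made for illustration.

```python
from typing import Dict, Optional


def choose_partition(requested_cores: int, idle_cores: Dict[str, int]) -> Optional[str]:
    """Return the partition with the most idle cores that can start the job now,
    or None if no partition currently has enough idle cores."""
    candidates = {p: idle for p, idle in idle_cores.items() if idle >= requested_cores}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    snapshot = {"xt4": 500, "xt5": 2000}    # hypothetical idle-core counts
    print(choose_partition(1000, snapshot))
```

A more sophisticated policy of the kind listed above would replace the idle-core snapshot with additional inputs such as queue depth or historical utilization.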

SLIDE 12

Filesystems

• Production
  – 3 Fibre Channel Lustre filesystems on XT4
    • 150 TB spans first half of DDN 9550s
    • 150 TB spans second half of DDN 9550s
    • 300 TB spans all DDN 9550s
  – 1 InfiniBand direct-attached 4.5 PB Lustre filesystem on XT5
• How do I mount these filesystems on external login nodes? Answer: Not easily

SLIDE 13

Filesystems

• Method
  – LNET routing via SION
• Advantages
  – Users have same filesystems available to them on external login nodes
• However…
  – Using XTs as Lustre file servers is a bad idea
    • Hangs for users accessing filesystems
  – Users have to compile for multiple filesystems if allowing the system to choose the partition
• LMON
  – Script to monitor health of filesystems
  – lctl ping mds to detect state
  – umount problems
    • /etc/mtab locking issues
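
A health check in the spirit of LMON might look like the sketch below. It is not the LMON script itself, which the slides do not show. It assumes that `lctl ping <nid>` exits with status 0 when the MDS answers, and it treats a hang or a missing `lctl` binary as unhealthy. The filesystem names and NIDs are hypothetical.

```python
import subprocess
from typing import Callable, Dict


def run_lctl_ping(nid: str, timeout_s: float) -> bool:
    """Run `lctl ping <nid>`; True only if it exits 0 within the timeout."""
    try:
        result = subprocess.run(
            ["lctl", "ping", nid],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A hang or a missing lctl binary both count as "not healthy".
        return False
    return result.returncode == 0


def check_filesystems(
    mds_nids: Dict[str, str],
    ping: Callable[[str, float], bool] = run_lctl_ping,
    timeout_s: float = 10.0,
) -> Dict[str, bool]:
    """Map each filesystem name to True (MDS reachable) or False."""
    return {name: ping(nid, timeout_s) for name, nid in mds_nids.items()}


if __name__ == "__main__":
    # Hypothetical filesystem names and MDS NIDs.
    print(check_filesystems({"xt4-scratch": "10.0.0.1@o2ib", "xt5-scratch": "10.0.1.1@o2ib"}))
```

The `ping` parameter is injectable so the decision logic can be exercised without a Lustre client. A monitor would then decide whether to unmount or hold logins for any filesystem reported as unreachable.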

SLIDE 14

[Diagram labels: Jaguar XT4, Jaguar XT5, SION, External Logins]

SLIDE 15

Cray XT software

• Same versions of XT software must be available on external logins
• Method
  – xt-rpm utility
    • External NFS Sharedroot for Cray XT software
    • /opt/xt* links back to External NFS Sharedroot
    • Separate RPM database
• Default programming environment for both XTs same
  – Software packages per machine can vary

SLIDE 16

XT Modules

• Module named XT4 or XT5 will be loaded as a key to determine which machine is being addressed
• XT-specific commands such as apstat, xtnodestats, etc. will be wrapped based on XT module
• Lustre scratch directory /tmp/work/$USER changes based on XT module
• Provides TORQUE environment
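
One way to picture a wrapper keyed on the XT module is sketched below. The slides do not say how the wrapping is done, so this assumes the wrapper forwards the command to the selected machine over ssh, and that the loaded module can be read from the `LOADEDMODULES` variable set by Environment Modules. The host names are hypothetical.

```python
import os
from typing import List, Mapping, Optional

# Hypothetical login hosts; these names do not come from the slides.
MACHINE_HOSTS = {
    "XT4": "xt4-login.example",
    "XT5": "xt5-login.example",
}


def active_machine(env: Mapping[str, str]) -> Optional[str]:
    """Return 'XT4' or 'XT5' according to the loaded modules, else None.

    LOADEDMODULES is a colon-separated list; a version suffix such as
    'XT5/1.0' is ignored.
    """
    for entry in env.get("LOADEDMODULES", "").split(":"):
        name = entry.split("/")[0]
        if name in MACHINE_HOSTS:
            return name
    return None


def wrap_command(argv: List[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Build the command line that runs argv on the selected XT via ssh."""
    machine = active_machine(env)
    if machine is None:
        raise RuntimeError("load the XT4 or XT5 module first")
    return ["ssh", MACHINE_HOSTS[machine], *argv]


if __name__ == "__main__":
    print(wrap_command(["apstat"], {"LOADEDMODULES": "gcc/4.2:XT5"}))
```

The same lookup could also choose which Lustre scratch filesystem /tmp/work/$USER points to for the selected machine.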

SLIDE 17

Status

• Prototype up and working
  – External login node up with SLES 10.2
  – Using XT5 TDS/XT4 TDS for XTs
  – Cray software installed and communication working with XTs using XT[45] modules
  – Local Lustre filesystems from each XT mounted
  – Single scheduler running on external server
• 4 External Logins in testing for Jaguar with SLES 10.2
  – Local Lustre filesystems from XT4/XT5 mounted
  – LMON hardening
  – Moab policy review for final configuration underway

SLIDE 18

XT9?

• Futures
  – Filesystems
    • Spiders everywhere
  – More sophisticated Moab load-balancing algorithm
  – Moab priorities based on fairshare force Grid model?
  – Cray software is multi-XT aware
  – Spanning machines
    • Moab can span partitions using a QOS with SPAN feature
    • Requires OpenMPI or another MPI derivative

SLIDE 19

Questions?
