IRIX Resource Management Plans & Status, Dan Higgins



SLIDE 1

IRIX Resource Management Plans & Status

Dan Higgins

djh@sgi.com

Engineering Manager Resource Management Team

SGI

41st Cray User Group Conference Minneapolis, Minnesota

SLIDE 2

IRIX Resource Management Plans and Status, Dan Higgins -- CUG Minn, May 1999 -- Page 2

IRIX Resource Management

Overview

• IRIX Job Limits
• IRIX Comprehensive System Accounting (CSA)
• IRIX Scheduling
  – Share II Fair Share Scheduler
  – Miser
  – eXtensible Resource Scheduler (XRS)
• Workload management
  – LSF Integration
  – NQE

SLIDE 3

IRIX Job Limits

What is it?

• Job Concept
• Limit Domains
• Supported Limits

SLIDE 4

IRIX Job Concept

Every connection to the machine starts a “job”

[Figure: batch submit, telnet, and rsh connections, each shown starting its own job that contains its processes (proc, p2, p3).]

SLIDE 5

Limit Domains

• Allows administrators & vendors to set limits on a per-user basis
• Extendable domains - batch, interactive, ++
• Limits set when a job is initiated

SLIDE 6

Supported Limits for Jobs

• Extends current IRIX process limits across all processes within a job
• A couple of new job-only limits restrict the number of processes and tapes (enforceable by TMF) per job
• Used via new setjlimit(2) & getjlimit(2) calls
• jlimit command displays or alters job limits
• ps command modified to show job ids
• Job ids are unique in a cluster
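To make the idea of a job-wide limit concrete, here is a minimal Python sketch of a limit that applies to the sum over all processes in a job rather than to each process separately. It is a toy model only: the `Job` and `Process` classes, their fields, and the numbers are hypothetical and do not reflect the setjlimit(2)/getjlimit(2) interface, whose signatures the slides do not give.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    pid: int
    cpu_seconds: float = 0.0


@dataclass
class Job:
    """Hypothetical model of a job: a container of processes with shared limits."""
    jid: int
    max_processes: int        # job-only limit: number of processes in the job
    max_cpu_seconds: float    # process-style limit applied to the job total
    processes: list = field(default_factory=list)

    def spawn(self, pid):
        # The process-count limit is checked against the whole job.
        if len(self.processes) >= self.max_processes:
            raise RuntimeError(f"job {self.jid}: process limit reached")
        proc = Process(pid)
        self.processes.append(proc)
        return proc

    def cpu_seconds(self):
        # Usage is accumulated across every process in the job.
        return sum(p.cpu_seconds for p in self.processes)

    def over_cpu_limit(self):
        return self.cpu_seconds() > self.max_cpu_seconds


job = Job(jid=1, max_processes=2, max_cpu_seconds=100.0)
first = job.spawn(pid=10)
second = job.spawn(pid=11)
first.cpu_seconds = 60.0
second.cpu_seconds = 50.0
print(job.over_cpu_limit())  # True: 110 s in total, though neither process exceeds 100 s alone
```

The point of the sketch is the last line: a per-process limit of 100 seconds would not trip here, while the job-wide limit does.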

SLIDE 7

IRIX Job Limits

Status

• Requirements, User Interface and Design documents are complete
• Most of the IRIX kernel changes are complete
• Beta testing in September at Boeing
• General availability with IRIX 6.5.7 in Q1CY00
• Integrating IRIX Job Limits with LSF

SLIDE 8

IRIX Comprehensive System Accounting (CSA)

An alternative accounting package for customers that demand more detail

• Use Cray accounting functionality with IRIX terminology
• Standard UNIX V accounting and IRIX extended accounting still supported and coexist
• Published API for vendor integration

SLIDE 9

IRIX CSA Features

Phase 1

• Per-job accounting
• User job accounting (ja command)
• Daemon accounting
• Flexible accounting periods
• Flexible system billing units (SBUs)
• +++
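As a rough illustration of what a flexible billing unit can mean, the Python sketch below charges a job a weighted sum of its resource usage, with site-chosen weights. The resource names, weights, and the `billing_units` function are assumptions made for this example; the slides do not describe how CSA actually computes SBUs.

```python
def billing_units(usage, weights):
    """Weighted sum of resource usage; resources without a weight are not billed."""
    return sum(weights.get(name, 0.0) * amount for name, amount in usage.items())


# Hypothetical per-job usage record and site-defined weights.
usage = {"cpu_seconds": 120.0, "memory_mb_hours": 4.0}
weights = {"cpu_seconds": 0.5, "memory_mb_hours": 2.0}

print(billing_units(usage, weights))  # 68.0 = 0.5 * 120 + 2.0 * 4
```

Changing the `weights` table is all it takes to rebalance what a site charges for, which is the sense of "flexible" assumed here.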

SLIDE 10

IRIX Comprehensive System Accounting (CSA)

Status

• Requirements and Design documents complete
• Significant amount of coding for IRIX kernel changes already complete
• Beta testing in December at Boeing
• General availability with IRIX 6.5.8 (Q2CY00)
• Integrating IRIX CSA with LSF

SLIDE 11

IRIX CSA Futures

Features for consideration (post phase 1)

• Support for specific hardware capabilities:
  – Multi-tasking records
  – MPP records for MPI jobs
• Incremental accounting for long-running jobs
• Accounting by Array Session Handle (ASH)
• API for reading the accounting records

SLIDE 12

IRIX Scheduling

Overview

• Share II
• Miser
• eXtensible Resource Scheduler (XRS)

SLIDE 13

Share II Resource Manager

“Fair share” scheduling

• Users and/or groups can be guaranteed a certain percentage of the machine
• Uses group dynamics to keep overall usage fair
• Often used when multiple groups share a machine
• Currently single system only
• Available for IRIX 6.5

SLIDE 14

Share II Resource Manager

Single system Origin

[Figure: share hierarchy on a single Origin system. The groups Physics, Chemistry, and Math hold 100 shares each; the user shares shown are Dan 100, Marlys 20, Sam 35, Tina 35, Todd 30, and Tom 70.]
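A share hierarchy of this kind can be turned into machine fractions by multiplying a group's fraction of all group shares by a user's fraction of the shares inside that group. The Python sketch below does only that arithmetic. The group membership, the user names, and the `entitlements` function are hypothetical (the slide text does not say which user belongs to which group), and the sketch ignores usage history, which a real fair share scheduler would also take into account.

```python
from fractions import Fraction


def entitlements(tree):
    """Map each user to a fraction of the machine.

    tree: {group_name: (group_shares, {user_name: user_shares})}
    """
    total_group_shares = sum(shares for shares, _ in tree.values())
    result = {}
    for group_shares, users in tree.values():
        group_fraction = Fraction(group_shares, total_group_shares)
        total_user_shares = sum(users.values())
        for user, user_shares in users.items():
            result[user] = group_fraction * Fraction(user_shares, total_user_shares)
    return result


# Hypothetical hierarchy: three groups with equal shares, uneven users inside.
tree = {
    "physics": (100, {"user_a": 100}),
    "chemistry": (100, {"user_b": 30, "user_c": 70}),
    "math": (100, {"user_d": 50, "user_e": 50}),
}

shares = entitlements(tree)
print(shares["user_b"])  # 1/10: one third of the machine times 30/100 within the group
```

Because each level is normalized separately, adding a user to one group dilutes only that group's members and leaves the other groups' fractions unchanged.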

SLIDE 15

Miser

Overview

• Deterministic batch scheduler for applications with known time and space requirements
• Generally available since IRIX 6.5
• Didn’t quite meet users’ functional expectations
• Had some stability issues

SLIDE 16

Miser

Many improvements

• Improved repeatability
• Many Miser-related panics fixed
• Added repack policy (backfill)
• Increased performance & CPU utilization
• Miser_cpuset job tracking problem
• Miser_cpuset recovery mechanism
• Additional information in command output
• Better documentation

SLIDE 17

Miser

Plans

• Evaluating integration of Miser queues & miser_cpusets
• Integrating Miser & miser_cpusets with LSF 4.0 (available Q4CY99)
• Fix critical customer issues
• Add new functionality into XRS

SLIDE 18

IRIX eXtensible Resource Scheduler (XRS)

Next Generation Resource Scheduler

• Manages the allocation of resources for jobs
  – Guaranteed resource reservations
• Flexible resource reservation framework
  – Customer extensible to meet unique scheduling requirements
  – User-specific placements
• Published API

SLIDE 19

IRIX Extensible Resource Scheduler (XRS)

XRS Scheduling Domains

[Figure: scheduling domains. Labels shown: XRS domain (xrsd/OS); TimeShare domain (OS); batch submission user; interactive user; LSF, etc.; XRS client; xrsd.]

SLIDE 20

IRIX Extensible Resource Scheduler (XRS)

Scheduling Partitions

• The XRS scheduling domain can be organized into various scheduling partitions
• A scheduling partition is a collection of resources and the scheduling policy that manages those resources
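One way to picture a scheduling partition is as a record that pairs a set of resources with exactly one policy, with the domain holding several such records. The Python sketch below is a hypothetical data model, not the XRS API: the class names, fields, and the rule that partitions may not share CPUs are assumptions made for illustration. Only the policy names are taken from the deck.

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):
    PREDICTIVE = "predictive"
    AVAILABILITY = "availability"
    PRIORITY = "priority"
    SHARED = "shared"


@dataclass(frozen=True)
class Partition:
    """A collection of resources plus the one policy that manages them."""
    name: str
    cpu_ids: frozenset
    memory_mb: int
    policy: Policy


@dataclass
class SchedulingDomain:
    partitions: dict = field(default_factory=dict)

    def add(self, partition):
        # Assumption for this sketch: a CPU belongs to at most one partition.
        for other in self.partitions.values():
            if other.cpu_ids & partition.cpu_ids:
                raise ValueError(f"{partition.name} overlaps {other.name}")
        self.partitions[partition.name] = partition

    def partition_for_cpu(self, cpu_id):
        for partition in self.partitions.values():
            if cpu_id in partition.cpu_ids:
                return partition
        return None


domain = SchedulingDomain()
domain.add(Partition("batch", frozenset(range(0, 8)), 4096, Policy.PREDICTIVE))
domain.add(Partition("interactive", frozenset(range(8, 12)), 2048, Policy.SHARED))
print(domain.partition_for_cpu(9).policy.value)  # shared
```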

SLIDE 21

IRIX Extensible Resource Scheduler (XRS)

Resources to be managed initially:

• CPU - speed, cache size and speed, local memory size, neighbor CPUs
• Memory - allocations managed per-node, cross-referenced against resident CPUs
• Topology - user can provide dplace-compliant placement file

SLIDE 22

XRS Scheduling Policies

• Predictive
  – predictive completion times, no preemption
• Availability
  – like predictive with repack if jobs complete early
• Priority
  – like availability with priority scheme and re-ordering
• Shared
  – allows over-subscription of renewable resources
• Preemptive
  – user may preempt a running job; the running job is suspended or checkpointed. Supplementary to all but Predictive.
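The difference between the Predictive and Availability policies can be sketched with a toy reservation planner in Python. Each job asks for a number of CPUs and a run time, jobs are placed in submission order at the earliest slot with enough free CPUs, and nothing is preempted, so completion times are known in advance. A repack simply re-plans the jobs that have not started once an earlier job finishes ahead of its reservation. Everything here (the `Reservation` class, the `plan` function, the 4-CPU machine, the job sizes) is hypothetical and is not the XRS algorithm.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    name: str
    cpus: int
    start: int
    end: int


def cpus_in_use(reservations, t):
    return sum(r.cpus for r in reservations if r.start <= t < r.end)


def earliest_start(reservations, cpus, duration, total_cpus, not_before):
    if cpus > total_cpus:
        raise ValueError("request exceeds machine size")
    # Free capacity only grows when a reservation ends, so those are the candidates.
    candidates = sorted({not_before} | {r.end for r in reservations if r.end > not_before})
    for t in candidates:
        # Usage only rises at reservation starts, so check t and each start in the window.
        points = [t] + [r.start for r in reservations if t < r.start < t + duration]
        if all(cpus_in_use(reservations, p) + cpus <= total_cpus for p in points):
            return t


def plan(requests, total_cpus, not_before=0):
    """requests: list of (name, cpus, duration) in submission order."""
    reservations = []
    for name, cpus, duration in requests:
        start = earliest_start(reservations, cpus, duration, total_cpus, not_before)
        reservations.append(Reservation(name, cpus, start, start + duration))
    return reservations


requests = [("A", 4, 10), ("B", 2, 5), ("C", 2, 5)]

# Predictive: fixed reservations, so B and C are promised to start at t=10.
predictive = [(r.name, r.start, r.end) for r in plan(requests, total_cpus=4)]
print(predictive)  # [('A', 0, 10), ('B', 10, 15), ('C', 10, 15)]

# Availability: A actually finishes at t=6, so the waiting jobs are repacked.
repacked = [(r.name, r.start, r.end) for r in plan(requests[1:], total_cpus=4, not_before=6)]
print(repacked)  # [('B', 6, 11), ('C', 6, 11)]
```

Under the Predictive policy the promised times would stand even though the machine sits idle from t=6 to t=10; the repack trades that predictability for better utilization.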

SLIDE 23

IRIX Extensible Resource Scheduler (XRS)

Status

• Requirements and Concept documents are complete
• Research, prototyping, and design in progress
• Beta testing in Q2CY00 at Boeing
• General availability planned for IRIX 6.5.9 (Q3CY00)
• Integrating IRIX XRS with LSF

SLIDE 24

Workload Management

Partnership with Platform Computing

• LSF 3.2 for IRIX, UNICOS & UNICOS/mk available now
• LSF will support SNx & SVx
• MPT supported with LSF Parallel available now
• NQE features in LSF 4.0 available in Q4CY99:
  – File Transfer Agent (FTA)
  – Improved output file handling
  – UNICOS accounting support
  – Job-based limits for major resources
• Integrating IRIX job limits, CSA, Miser, and XRS with LSF

SLIDE 25

Workload Management

Network Queuing Environment (NQE)

• NQE feature development is complete for SGI platforms with NQE 3.3
• NQE support for SGI platforms (including SV1) continues through 2004
• NQE is retired for non-SGI platforms

SLIDE 26

IRIX Resource Management Roadmap

[Figure: roadmap timeline across IRIX releases 6.5.2 through 6.5.9 (1999 to 2000), with entries for Miser Stability, Miser Support in LSF, IRIX Job Limits, IRIX Comprehensive System Accounting, and eXtensible Resource Scheduler (XRS).]

SLIDE 27

Summary

• IRIX Job Limits in IRIX 6.5.7 (Q1CY00)
• IRIX CSA in IRIX 6.5.8 (Q2CY00)
• Miser much more reliable and performs better in IRIX 6.5.4
• IRIX XRS in IRIX 6.5.9 (Q3CY00)
• LSF is our workload management solution
• NQE 3.3 supported on SGI platforms through 2004
• NQE retired on non-SGI platforms