Workload Management: NQE/LSF Status & Plans
Jack Thompson
Marketing Product Manager
SGI
jt@sgi.com
Brian MacDonald
Technical Relationship Manager
Platform Computing
brian@platform.com
41st Cray User Group Conference
Minneapolis, Minnesota
2
3
- Core competency issue
- Multi-vendor environment
4
- Support through year-end 2004
- Critical bugs fixed
- Call center support
5
- Available through January 31, 2000
- Developed jointly by Platform and SGI
6
- Including Cray SV1
- LSF Standard Edition, LSF Parallel, LSF Client
- LSF Analyzer, LSF MultiCluster, LSF JobScheduler, LSF Make
7
- Single point of control and administration
- Logically present a single system image to users, applications and networks
- Application of policies across the consolidated platform
- Uniform policies to satisfy workload performance response time
- Improved application availability, both for failures and planned outages
8
12 jobs, 900 MB
access
9
CPU Utilization
1
- Miser (Q4 99)
- Miser CPU sets (Q4 99)
- OS service follow-on (XRS)
2
- Match necessary conditions
- Choose the best from eligible candidates
- Adjust load values for selected hosts
- Define locality of parallel jobs
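The four steps above can be sketched as a small host-selection routine. This is only an illustration under stated assumptions, not LSF's actual implementation: the `Host` fields, the `group` locality label, the memory-only eligibility test, and the `load_bump` value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_mem_mb: int   # assumed resource used for the eligibility test
    load: float        # assumed load index; lower is better
    group: str         # hypothetical locality label (e.g. one machine or array)


def select_hosts(hosts, need_mem_mb, count, load_bump=1.0):
    """Pick `count` hosts for a parallel job; return their names."""
    # 1. Match necessary conditions: keep only hosts with enough memory.
    eligible = [h for h in hosts if h.free_mem_mb >= need_mem_mb]

    # 4. Locality: all tasks of the job must land in the same group.
    by_group = {}
    for h in eligible:
        by_group.setdefault(h.group, []).append(h)

    # 2. Choose the best from eligible candidates: lowest total load.
    best = None
    for members in by_group.values():
        if len(members) < count:
            continue
        picked = sorted(members, key=lambda h: h.load)[:count]
        total = sum(h.load for h in picked)
        if best is None or total < best[0]:
            best = (total, picked)
    if best is None:
        return []

    # 3. Adjust load values for selected hosts so the next decision
    #    does not pile more work onto the same machines.
    for h in best[1]:
        h.load += load_bump
    return [h.name for h in best[1]]


hosts = [
    Host("a1", 2000, 0.2, "A"),
    Host("a2", 2000, 0.9, "A"),
    Host("b1", 2000, 0.1, "B"),
    Host("b2", 500, 0.0, "B"),
]
first = select_hosts(hosts, need_mem_mb=1000, count=2)
second = select_hosts(hosts, need_mem_mb=1000, count=1)
```

In this example the two-task job lands on group A because group B has only one host with enough memory; the following single-task job then goes to `b1`, since the group A hosts have had their load values raised.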
3
LIM
4
Parallel Application Manager
Remote Execution Server
• placement
• control (signals, limits, message)
• consolidated accounting
• SGI Array Session
• Task startup and control
• ASH returned to PAM
• ASH sent to RES used to discover per job usage
• MPT 1.3 Plug-in
5
• Application Checkpoint Restart
• Transparent host selection
• Accounting for ISV applications
ISVs, Custom Scientific and Commercial Applications transparently gain access to resource management services without changing their code
6
- Scalability improvements with all features turned on (fair-share + back-filling)
- Dynamic re-configuration without re-start
- Client query scalability
- Adaptive dispatch for high-throughput, short-running jobs
- Time-dependent configuration for queues
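For readers unfamiliar with back-filling, the idea can be sketched as follows: slots held idle for a large reserved job may be lent to smaller jobs, provided those jobs are guaranteed to finish before the reserved job is due to start. This is a minimal, simplified sketch and not LSF's scheduler; the function name, the tuple layout of the queue, and the use of a declared run limit are assumptions made for illustration.

```python
def backfill(free_slots, now, reserved_start, queue):
    """Return names of queued jobs that can run in the idle slots.

    queue: list of (name, slots_needed, run_limit) in priority order.
    A job is back-filled only if it fits in the free slots and its
    declared run limit ends no later than reserved_start.
    """
    started = []
    for name, slots, run_limit in queue:
        fits = slots <= free_slots
        done_in_time = now + run_limit <= reserved_start
        if fits and done_in_time:
            started.append(name)
            free_slots -= slots
    return started


queue = [
    ("big", 8, 50),     # needs more slots than are free
    ("short", 2, 60),   # fits and finishes before the reservation
    ("long", 2, 200),   # would overrun the reservation
    ("tiny", 2, 100),   # fits and finishes exactly at the reservation
]
started = backfill(free_slots=4, now=0, reserved_start=100, queue=queue)
```

The sketch depends on every job declaring a run limit; without one, the scheduler cannot know whether a small job would delay the reserved job.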
7
- Improved Input/Output handling support
- Integrated FTA supported within LSF
- Job Flow
- Kill re-queue
- Non-shared daemon configuration support
- Automatic host type and model detection