Comparison of Scheduling Policies and Workloads on the NCCS and NICS XT4 Systems at Oak Ridge National Laboratory

Troy Baer, HPC System Administrator, National Institute for Computational Sciences, University of Tennessee
Don Maxwell, Senior HPC System Administrator, National Center for Computational Sciences, Oak Ridge National Laboratory

Overview
• Introduction
• System Descriptions
  – Hardware & Software
  – Batch Environment
• Queue Structures
• Job Prioritization
• Quality of Service Levels
• Other Scheduling Policies
• Allocation Processes
• Workload Analysis
  – Overall Utilization
  – Breakdown by Job Size
  – Quantifying User Experience
  – Application Usage
• Conclusions and Future Work

Introduction
• Oak Ridge National Laboratory is home to two supercomputing centers:
  – National Center for Computational Sciences
    • Founded in 1992.
    • DoE Leadership Computing Facility
  – National Institute for Computational Sciences
    • Joint project between ORNL and University of Tennessee, founded in 2007.
    • NSF Petascale Track 2B awardee
• Both centers have Cray XT4 systems
  – Jaguar (NCCS)
  – Kraken (NICS)
• Both systems have the goal of running as many big jobs as possible

System Hardware and Software
Jaguar
• 84 cabinets
• 7,832 compute nodes (31,328 cores)
• Quad-core Opteron @ 2.1 GHz
• 61.19 TB of RAM
• 700 TB of disk
• CLE 2.0
Kraken
• 40 cabinets
• 4,508 compute nodes (18,032 cores)
• Quad-core Opterons @ 2.3 GHz
• 17.61 TB of RAM
• 450 TB of disk
• CLE 2.0

Batch Environment
• Both Jaguar and Kraken use TORQUE as their batch system, with Moab as the scheduler.
• Moab has a number of advanced features, including a “native resource manager” interface for connecting to e.g. ALPS.
• While the software is the same on the two systems, there are significant differences in how it is configured.

Jaguar Queue Structure
• dataxfer
  – Size = 0
  – Max time = 24 hrs.
• batch
  – Max time = 24 hrs.
• debug
  – Max time = 4 hrs.
• Additional walltime limits for smaller jobs (size < 1024) imposed by TORQUE submit filter

Kraken Queue Structure
• dataxfer
  – Size = 0
  – Max time = 24 hrs.
• small
  – 0 <= size <= 512
  – Max time = 12 hrs.
• longsmall
  – 0 <= size <= 512
  – Max time = 60 hrs.
• medium
  – 512 < size <= 2048
  – Max time = 24 hrs.
• large
  – 2048 < size <= 8192
  – Max time = 24 hrs.
• capability
  – 8192 < size <= 18032
  – Max time = 48 hrs.
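The Kraken size and walltime limits above can be read as a lookup table. The following is a minimal Python sketch of that table, not the actual TORQUE/Moab configuration. The function name route_job is hypothetical, and the sketch assumes that size-zero jobs go to dataxfer and that a small job asking for more than 12 hours falls through to longsmall. The slides do not say whether jobs are routed automatically or users pick the queue themselves.

```python
from typing import Optional

# (queue name, size lower bound (exclusive), size upper bound (inclusive),
#  max walltime in hours), taken from the Kraken queue limits on the slide.
KRAKEN_QUEUES = [
    ("small", 0, 512, 12),
    ("medium", 512, 2048, 24),
    ("large", 2048, 8192, 24),
    ("capability", 8192, 18032, 48),
]

LONGSMALL_MAX_HOURS = 60  # longsmall shares the size range of small


def route_job(size: int, walltime_hours: float) -> Optional[str]:
    """Return a queue that accepts the request, or None if no queue does.

    Assumption: size-zero jobs are treated as dataxfer jobs.
    """
    if size == 0:
        return "dataxfer" if walltime_hours <= 24 else None
    for name, low, high, max_hours in KRAKEN_QUEUES:
        if low < size <= high:
            if walltime_hours <= max_hours:
                return name
            if name == "small" and walltime_hours <= LONGSMALL_MAX_HOURS:
                return "longsmall"
            return None
    return None  # negative or larger than the machine
```

For example, route_job(512, 40) returns "longsmall", while route_job(2048, 30) returns None because the medium queue is capped at 24 hours.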

Job Prioritization
Jaguar
• Priority thought of in units of “days”, equivalent to one day of queue wait time
• Components:
  – QoS, assigned based mainly on job size
  – Queue wait time
  – Fair share targets assigned to QoS
Kraken
• Priority units are arbitrary
• Components:
  – Job size
  – Queue wait time
  – Expansion factor (ratio of queue time plus run time to run time)
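The expansion factor defined above is easy to state in code. The sketch below also combines Kraken's three listed components into a weighted sum, purely for illustration. The weights and the function name kraken_style_priority are hypothetical, since the slides say only that the priority units are arbitrary and give no weights. The run time used here would presumably be the requested walltime for a job that has not yet run.

```python
def expansion_factor(queue_hours: float, run_hours: float) -> float:
    """Ratio of (queue time + run time) to run time, as defined on the slide."""
    if run_hours <= 0:
        raise ValueError("run_hours must be positive")
    return (queue_hours + run_hours) / run_hours


def kraken_style_priority(size: int, queue_hours: float, run_hours: float,
                          w_size: float = 1.0, w_queue: float = 1.0,
                          w_xfactor: float = 1.0) -> float:
    """Illustrative weighted sum of job size, queue wait time and expansion
    factor. The weights are placeholders, not Kraken's real settings."""
    return (w_size * size
            + w_queue * queue_hours
            + w_xfactor * expansion_factor(queue_hours, run_hours))
```

A job that waits 3 hours and runs for 1 hour has an expansion factor of 4.0. A job that starts immediately has an expansion factor of 1.0, so short jobs gain priority from this term faster than long ones.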

Quality of Service Levels on Jaguar
• sizezero
  – size = 0
  – +90 days priority.
  – Max 10 jobs/user.
• smallmaxrun
  – 0 < size <= 256
  – 20% fair share target.
  – Max 2 jobs/user.
• nonldrship
  – 256 < size <= 6000
  – 20% fair share target.
• ldrship
  – 6000 < size <= 17000
  – +8 days priority.
  – 80% fair share target.
• topprio
  – size > 17000
  – +10 days priority.
  – 80% fair share target.
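As a sketch of how these size thresholds map a job to a QoS, the Python below encodes the Jaguar table and adds the QoS boost to queue wait time, both in days. The QoS dataclass, jaguar_qos and priority_days are hypothetical names introduced for illustration. The fair share contribution is left out because the slides give only targets, not how they are weighted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QoS:
    name: str
    priority_boost_days: int
    fair_share_target_pct: Optional[int]  # None where the slide lists none
    max_jobs_per_user: Optional[int]      # None where the slide lists none


def jaguar_qos(size: int) -> QoS:
    """Pick the Jaguar QoS for a job of the given size (cores)."""
    if size < 0:
        raise ValueError("size must be non-negative")
    if size == 0:
        return QoS("sizezero", 90, None, 10)
    if size <= 256:
        return QoS("smallmaxrun", 0, 20, 2)
    if size <= 6000:
        return QoS("nonldrship", 0, 20, None)
    if size <= 17000:
        return QoS("ldrship", 8, 80, None)
    return QoS("topprio", 10, 80, None)


def priority_days(size: int, queue_wait_days: float) -> float:
    """Queue wait plus QoS boost, in 'days'. Fair share is not modeled."""
    return queue_wait_days + jaguar_qos(size).priority_boost_days
```

Under these assumptions a 20,000-core job that has waited 1.5 days has a priority of 11.5 days, so it outranks any nonldrship job that has waited less than 11.5 days.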

Quality of Service Levels on Kraken
• sizezero
  – size = 0
  – Queue time target of 00:00:01.
  – Priority grows exponentially after queue time target is passed.
• negbal
  – Applied to jobs from projects with negative balances.
  – -100000 priority.
  – Additional penalties (e.g. disabling backfill or a small fair share target) have been discussed as well.

Other Scheduling Policies on Kraken
• longsmall jobs limited to 1,600 cores total.
• Only 1 capability job is eligible to run at any given time.
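One way to read these two policies is as a filter applied to the idle queue before the scheduler picks jobs to start. The sketch below is an interpretation, not Moab's actual mechanism. It assumes the 1,600-core cap counts running plus newly eligible longsmall jobs, and that the capability limit means only the highest-priority idle capability job is considered. The job dictionaries and the name eligible_jobs are hypothetical.

```python
def eligible_jobs(idle_jobs, running_jobs, longsmall_core_limit=1600):
    """Return ids of idle jobs the scheduler may consider, in priority order.

    idle_jobs must already be sorted by descending priority. Each job is a
    dict with 'id', 'queue' and 'size' keys (an assumed record layout).
    """
    # Cores already held by running longsmall jobs count against the cap.
    longsmall_cores = sum(
        job["size"] for job in running_jobs if job["queue"] == "longsmall"
    )
    capability_taken = False
    eligible = []
    for job in idle_jobs:
        if job["queue"] == "capability":
            if capability_taken:
                continue  # only one capability job is eligible at a time
            capability_taken = True
        elif job["queue"] == "longsmall":
            if longsmall_cores + job["size"] > longsmall_core_limit:
                continue  # would exceed the longsmall core cap
            longsmall_cores += job["size"]
        eligible.append(job["id"])
    return eligible
```

With 1,200 longsmall cores already running, a 512-core longsmall job is held back while a 256-core one behind it remains eligible.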

Allocation Processes
Jaguar
• DoE INCITE
• Made annually
• Allocations can last multiple years
• Applications must be able to use a “significant fraction” of LCF systems at ORNL and/or ANL
Kraken
• NSF/TeraGrid TRAC
• Made quarterly
• Allocations last one year (i.e. “use it or lose it”)
• No major requirement on application scalability

Workload Analysis
• TORQUE accounting records parsed and loaded into a database.
• Job scripts also captured and stored in DB.
  – On Kraken, this happens automatically.
  – On Jaguar, the aprun parts of scripts are reconstructed using another database.
• Period of interest is the 4th quarter of 2008.
  – Both XT4 machines in production and allocated.
  – XT5 successor systems not yet generally available.
• To be able to compare apples to apples, size breakdowns are normalized by the size of each machine.
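The parsing and normalization steps above might look roughly like the following Python sketch. It assumes the usual TORQUE accounting layout of four semicolon-separated parts (timestamp, record type, job id, space-separated key=value pairs) and that the requested core count appears as Resource_List.size. The function names are hypothetical, and this is not the authors' actual tooling. The bin edges are the ones used in the size breakdowns, and the core counts are those given for each machine.

```python
MACHINE_CORES = {"jaguar": 31328, "kraken": 18032}

# Upper edge (inclusive) and label of each normalized-size bin.
BINS = [
    (0.01, "<=0.01"),
    (0.10, ">0.01-0.10"),
    (0.25, ">0.10-0.25"),
    (0.50, ">0.25-0.5"),
    (0.75, ">0.5-0.75"),
]


def parse_record(line: str) -> dict:
    """Split one accounting line into timestamp, type, job id and fields."""
    timestamp, record_type, job_id, message = line.rstrip("\n").split(";", 3)
    fields = {}
    for token in message.split():
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return {"timestamp": timestamp, "type": record_type,
            "job_id": job_id, "fields": fields}


def size_bin(cores: int, machine: str) -> str:
    """Label a job by the fraction of the machine's cores it uses."""
    fraction = cores / MACHINE_CORES[machine]
    for upper, label in BINS:
        if fraction <= upper:
            return label
    return ">0.75"
```

An 8,192-core job is about 26% of Jaguar and about 45% of Kraken, so it lands in the ">0.25-0.5" bin on both machines, even though 8,192 cores is the top of the large queue on Kraken.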

Overall Utilization for 4Q2008
Jaguar
• 46,023 jobs run.
• 54.46 million CPU-hours consumed.
• 89.7% average utilization.
• 300 active users.
• 142 active projects.
Kraken
• 15,744 jobs run.
• 21.00 million CPU-hours consumed.
• 57.0% average utilization.
• 116 active users.
• 40 active projects.

Breakdown by Job Size – Count
[Figures: Jaguar Job Count by Normalized Core Count; Kraken Job Count by Normalized Core Count. Categories: <=0.01, >0.01-0.10, >0.10-0.25, >0.25-0.5, >0.5-0.75, >0.75.]

Breakdown by Job Size – CPU Hours
[Figures: Jaguar CPU Hours by Normalized Core Count; Kraken CPU Hours by Normalized Core Count. Categories: <=0.01, >0.01-0.10, >0.10-0.25, >0.25-0.5, >0.5-0.75, >0.75.]

Quantifying User Experience
[Figure: Average Queue Time on Jaguar and Kraken by Normalized Core Count. Y-axis: average queue time (hours), 0 to 25. X-axis: normalized core count (<=0.01, >0.01-0.10, >0.10-0.25, >0.25-0.5, >0.5-0.75, >0.75). Series: Jaguar, Kraken.]

Quantifying User Experience (cont.)
[Figure: Expansion Factor on Jaguar and Kraken by Normalized Core Count. Y-axis: expansion factor, 0 to 60. X-axis: normalized core count (<=0.01, >0.01-0.10, >0.10-0.25, >0.25-0.5, >0.5-0.75, >0.75). Series: Jaguar, Kraken.]
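The two user-experience metrics, average queue time and expansion factor per normalized-size bin, could be computed from job timestamps along these lines. This is a sketch under assumptions: each job is a tuple of bin label plus queue, start and end times in epoch seconds, and the per-bin expansion factor is the mean of per-job values. The slides do not say which averaging the authors used, and summarize_by_bin is a hypothetical name.

```python
from collections import defaultdict


def summarize_by_bin(jobs):
    """Return {bin_label: (avg_queue_hours, avg_expansion_factor)}.

    jobs is an iterable of (bin_label, qtime, start, end) tuples with times
    in epoch seconds. Jobs with non-positive run time are skipped.
    """
    totals = defaultdict(lambda: [0, 0.0, 0.0])  # count, wait hours, xfactor
    for label, qtime, start, end in jobs:
        run = end - start
        if run <= 0:
            continue
        wait = start - qtime
        entry = totals[label]
        entry[0] += 1
        entry[1] += wait / 3600.0
        entry[2] += (wait + run) / run
    return {
        label: (wait_sum / count, xf_sum / count)
        for label, (count, wait_sum, xf_sum) in totals.items()
    }
```

Two one-hour jobs in the same bin that waited 1 hour and 3 hours give an average queue time of 2.0 hours and an average expansion factor of 3.0.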

Application Usage
Top 10 Jaguar Applications by CPU Hours
• chimera, ccsm, vasp, gtc, pwscf, qmc, xgc, pop, namd, cfd++, other
Top 10 Kraken Applications by CPU Hours
• namd, amber, dns2d, hmc, milc, aces3, overlap, sovereign, wrf, enzo, other

Conclusions
• Jaguar and Kraken do many of the same things using different mechanisms.
• Both systems achieve their goal of running big jobs.
  – For Jaguar, the workload consists mostly of jobs using 10% or more of the system each, with a strong bias toward jobs using 25% or more.
  – For Kraken, the distribution is more bimodal, with many small jobs (<25% of the system) and a significant number of whole-system jobs, with not much in between.
  – The difference is largely due to how the systems are allocated.

Future Work

Future Work (cont.)
• XT5 systems will require some changes due to sheer scale.
• Better understanding of queue time
  – Resource availability and policy components.
  – Sometimes these overlap (e.g. standing reservations).
• Fair share on Kraken?
  – On per-project basis, based on allocation balance.
• More complex queue structure on Jaguar?
  – Centralize where walltime limits are defined.
  – Would simplify TORQUE submit filter.
