
Learning-based Approaches to Estimate Job Wait Time in HTC Datacenters



  1. Learning-based Approaches to Estimate Job Wait Time in HTC Datacenters
     Luc Gombert and Frédéric Suter
     IN2P3 Computing Center / CNRS, Villeurbanne, France
     HEPiX Fall Workshop, October 13, 2020
     F. Suter – HEPiX Fall 2020 Workshop 1/15

  2. Previously in HEPiX series...
     - A first study of the workload processed at CC-IN2P3
     - Focus on fairness for Local users
     - Simulation of queue reconfiguration

  3. Acknowledgment
     - The original motivation for this work came from a talk by Wataru Takase (KEK) at the FJPPL Japan-France workshop on computing technologies

  4. Motivations and Objectives
     - Fair-share scheduling ⇒ no estimation of job start time returned to the user!
     - Distribution of Local job wait time
     - Over 23 weeks, from June 25, 2018 to December 2, 2018
     - 5,748,922 jobs on 35,000 cores
     [Figure: density of Local job wait time; x-axis: job wait time (10s to 1mo), y-axis: density (0.00 to 0.25); annotated shares: 26.9 %, 29.3 %, 33.6 %, 10.2 %]

  5. Motivations and Objectives (continued)
     1. Can we explain why a job waits more than another?
     2. Can we train some Machine Learning algorithms?
     3. Can we get a good estimation of job wait time in the orange and red zones?
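
The wait-time distribution discussed above could be recomputed from accounting records along the following lines. This is only a sketch: the `(submit_ts, start_ts)` record layout, the function name `wait_time_shares`, and the choice of 30 days for "1mo" are assumptions made for illustration, not details taken from the slides. The bin edges reuse the tick labels of the plot's x-axis.

```python
from bisect import bisect_left
from collections import Counter

# Bin edges in seconds, named after the tick labels on the plot's x-axis.
# "1mo" is taken as 30 days, which is an assumption.
EDGES = [
    ("10s", 10), ("1m", 60), ("5mn", 300), ("30mn", 1800), ("3h", 10800),
    ("9h", 32400), ("1d", 86400), ("3d", 259200), ("1w", 604800),
    ("1mo", 2592000),
]


def wait_time_shares(jobs):
    """Return the fraction of jobs falling in each wait-time bin.

    `jobs` is an iterable of (submit_ts, start_ts) pairs in seconds
    (hypothetical layout). A bin labelled "<= X" holds waits above the
    previous edge and up to X; longer waits go to "> 1mo".
    """
    bounds = [seconds for _, seconds in EDGES]
    labels = [f"<= {name}" for name, _ in EDGES] + ["> 1mo"]
    counts = Counter()
    total = 0
    for submit_ts, start_ts in jobs:
        wait = start_ts - submit_ts
        # bisect_left places a wait equal to an edge in that edge's bin.
        counts[labels[bisect_left(bounds, wait)]] += 1
        total += 1
    # Empty bins are skipped, so an empty input yields an empty dict.
    return {label: counts[label] / total for label in labels if counts[label]}


# Toy records, not real CC-IN2P3 data.
shares = wait_time_shares([(0, 5), (0, 10), (0, 45), (100, 7300)])
```

Summing the shares of adjacent bins gives zone-level figures comparable to the percentages annotated on the plot, provided the zone boundaries are known.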

  6. Outline
     - Introduction
     - Some Intuitive Causes of Job Wait Time
       - Who Submits the Job?
       - What is the Job Requesting?
       - When and Where is the Job Submitted?
     - Learning-Based Job Wait Time Estimators
       - Objectives and Performance Metrics
       - ML Algorithm Selection
       - Experimental Evaluation
     - Conclusion and Future Work

  7. Who Submits the Job?
     Job Features
     - Owner: more than 2,500 individual accounts at CC-IN2P3
     - Group: about 80 scientific collaborations
     Resource Allocation Principle
     1. Groups express pledges every year (as computing power in HS06)
     2. The sum of all pledges defines what CC-IN2P3 has to deliver
     3. Each group gets a proportional share of this
     - Defines a consumption objective
     - Used by the job scheduler as a basis of its Fair-Share policy

  8. Who Submits the Job? (continued)
     Intuitive Causes
     1. Small groups get fewer resources ⇒ wait more!
     2. Overconsumption of share ⇒ lower priority ⇒ wait more!
     3. Job owners can be manually blocked by operators ⇒ wait more!
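
The resource allocation principle (each group's share is proportional to its pledge) and the notion of overconsumption can be sketched as follows. The group names, the numbers, and both function names are hypothetical. The slides do not give the scheduler's actual priority formula, so this sketch only measures the gap between a group's consumed share and its target share.

```python
def fair_share_targets(pledges_hs06):
    """Map each group to its proportional share of the summed pledges."""
    total = sum(pledges_hs06.values())
    return {group: pledge / total for group, pledge in pledges_hs06.items()}


def share_gap(targets, usage_hs06):
    """Return consumed share minus target share for each group.

    A positive value means the group has consumed more than its target,
    which is the situation described as overconsumption. How the real
    scheduler turns this into a priority is not specified here.
    """
    total_used = sum(usage_hs06.values())
    return {
        group: usage_hs06.get(group, 0.0) / total_used - target
        for group, target in targets.items()
    }


# Hypothetical pledges and consumption, both in HS06.
pledges = {"group_a": 600.0, "group_b": 300.0, "group_c": 100.0}
usage = {"group_a": 500.0, "group_b": 400.0, "group_c": 100.0}

targets = fair_share_targets(pledges)
gaps = share_gap(targets, usage)
```

In this toy case `group_b` is above its target and `group_a` is below it, so under a fair-share policy one would expect `group_b` jobs to be deprioritized.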

  9. What is the Job Requesting?
     Job Features
     - Time: either walltime or CPU time
       - Hard or soft limits; default values if none provided
     - Memory: either resident or virtual
       - Hard or soft limits; default values if none provided
     - Slots: almost always one for Local jobs
     - Access to special resources: subject to quotas
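
One possible way to turn these request features into inputs for a learning algorithm is sketched below. It covers only a subset of the features (one time limit, one memory limit, slots, special resources). The class name, field names, and the default values are placeholders: the slides state that defaults exist when the user provides none, but do not give them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Placeholder defaults; the real site values are not given in the slides.
DEFAULT_CPU_TIME_S = 24 * 3600
DEFAULT_RESIDENT_MB = 2048


@dataclass(frozen=True)
class JobRequest:
    cpu_time_s: Optional[int] = None   # requested CPU time limit, if any
    resident_mb: Optional[int] = None  # requested resident memory, if any
    slots: int = 1                     # almost always one for Local jobs
    special_resources: Tuple[str, ...] = ()  # quota-controlled resources


def to_features(req: JobRequest) -> dict:
    """Apply defaults and record whether each value was user-provided."""
    cpu_default = req.cpu_time_s is None
    mem_default = req.resident_mb is None
    return {
        "cpu_time_s": DEFAULT_CPU_TIME_S if cpu_default else req.cpu_time_s,
        "cpu_time_is_default": cpu_default,
        "resident_mb": DEFAULT_RESIDENT_MB if mem_default else req.resident_mb,
        "resident_is_default": mem_default,
        "slots": req.slots,
        "n_special_resources": len(req.special_resources),
    }


# A request with no limits, and one with an explicit CPU time limit and
# one special resource (the resource name is made up).
plain = to_features(JobRequest())
custom = to_features(JobRequest(cpu_time_s=3600, special_resources=("res_x",)))
```

Keeping an explicit "is default" flag lets a model distinguish a user who asked for the default value from one who asked for nothing, which may matter if the two populations behave differently.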
