Evolution of CMS workload management towards multicore job support

  1. Evolution of CMS workload management towards multicore job support
     A. Pérez-Calero Yzquierdo, J. Hernandez, F. Khan, K. Larson, J. Letts, A. Malta, A. McCrea, E. Vaandering
     for CMS Computing
     OIST, Okinawa, 13th April, 2015

  2. Outline
     ● Multicore jobs for LHC run 2
     ● CMS workload management and submission infrastructure
     ● Results from latest tests
     ● Monitoring deployment
     ● Conclusions & Outlook for future developments
     4-13-2015 Evolution of CMS WM for multicore support - Antonio Pérez-Calero Yzquierdo

  3. Multicore jobs for LHC run 2
     Motivation for multithreaded applications:
     ● Hardware evolution: best exploitation of current multicore CPUs
     ● Evolution of LHC conditions: increased data volumes and event complexity
     ● New era for HEP computing with the integration of elements of Grid Computing and High Performance Computing (distributed parallel computing), which requires adapting different levels of our computing:
       ● Multithreaded applications
       ● Grid-wide scheduling
       ● Site scheduling
     ● CMS priority in 2015 is Tier1 sites: Prompt data reconstruction will be run using T0+50% of T1 CPUs
       ● Drives first phase of deployment, focused on T1s
       ● Simulation and digitisation will follow, with deployment to T2s
     ● Single core and multicore jobs will coexist during run 2: mixed scheduling is mandatory
     See talk "Using the CMS Threaded Application in a Production Environment" for details on CMS multithreaded applications

  4. CMS WM and SI
     CMS workload management and submission infrastructure is based on:
     ● WMAgents: manage centralized workflows, populating job queues, assigning job priorities, handling errors and job retrials, etc.
     ● GlideinWMS: matches jobs to resources, managing a transient pool of computing resources controlled by pilot jobs
     ● Main tool: multicore pilots with internal dynamic partitioning of resources
     [Figure: WMAgent(s) pass jobs to the GlideinWMS FE; Jobs 1 to 10 packed inside one pilot. Axes: CPU cores (1 to 4) vs. time]
     See talk "Using the glideinWMS System as a Common Resource Provisioning Layer in CMS" for more details
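The idea of a multicore pilot with internal dynamic partitioning can be sketched in a few lines. This is only an illustrative toy model, not GlideinWMS or HTCondor code: the class `PartitionablePilot`, its methods and the job names are all hypothetical, and real matching also involves priorities and job ranks that are ignored here.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PartitionablePilot:
    """Toy model of a multicore pilot (hypothetical, for illustration only)."""
    total_cores: int
    # job name -> number of cores currently carved out for it
    running: Dict[str, int] = field(default_factory=dict)

    @property
    def free_cores(self) -> int:
        return self.total_cores - sum(self.running.values())

    def try_start(self, job: str, cores: int) -> bool:
        """Carve `cores` out of the pilot for `job` if enough are free."""
        if cores <= 0:
            raise ValueError("a job must request at least one core")
        if job in self.running or cores > self.free_cores:
            return False
        self.running[job] = cores
        return True

    def finish(self, job: str) -> None:
        """Return the cores of a finished job to the free pool."""
        self.running.pop(job)


# An 8-core pilot receiving a mix of 4-core and single-core jobs.
pilot = PartitionablePilot(total_cores=8)
started = [
    pilot.try_start("reco-1", 4),
    pilot.try_start("single-1", 1),
    pilot.try_start("single-2", 1),
    pilot.try_start("reco-2", 4),  # only 2 cores are free at this point
]
```

In this sketch the second 4-core job is refused until `pilot.finish("reco-1")` frees enough cores, which is the behaviour the mixed single core and multicore scheduling has to manage.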

  5. Multicore pilot model
     Advantages of managing all CMS workflows with multicore partitionable pilots:
     ● Total control of scheduling priorities of single and multicore jobs
     ● Remove unwanted effects from single core and multicore pilot competition
       ● for resources at the sites
       ● for matching jobs once running
     ● Reduced number of pilots in the system

  6. Multicore pilot model
     Disadvantages of multicore pilots:
     ● Inefficiency in draining of retiring pilots
     ● Slow ramp up of resources in shared sites that protect their farms from excessive WN draining
     [Figure: Slots for 8-core pilots available at PIC over a week. Annotations: 95%; Simulation; ~4 days for 1000 slots]
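The draining inefficiency can be made concrete with a small calculation. The sketch below assumes, for illustration only, that a retiring pilot accepts no new jobs, that every running job uses one core, and that the pilot exits when its last job ends; the function name and the sample runtimes are hypothetical and not taken from the talk.

```python
from typing import Sequence


def draining_idle_core_hours(pilot_cores: int,
                             remaining_hours: Sequence[float]) -> float:
    """Core-hours left idle while a retiring pilot waits for its last job.

    remaining_hours holds the remaining runtime of each running
    single-core job (one entry per busy core).
    """
    if len(remaining_hours) > pilot_cores:
        raise ValueError("more running single-core jobs than cores")
    if not remaining_hours:
        return 0.0
    drain_time = max(remaining_hours)
    # All cores are held for drain_time; only the summed runtimes are useful.
    return float(drain_time * pilot_cores - sum(remaining_hours))


# Hypothetical 8-core pilot whose jobs end at staggered times.
remaining = [1, 2, 3, 4, 5, 6, 7, 8]
idle = draining_idle_core_hours(8, remaining)
efficiency = sum(remaining) / (sum(remaining) + idle)
```

Under these assumptions the pilot holds 64 core-hours while draining but uses only 36 of them, so the ratio of job and pilot running times matters for overall efficiency.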

  7. Multicore pilot model
     ● Fragmentation of the pilot internal resources limits pilot ability to pull new multicore jobs.
     ● Renewal of finite-lifetime pilots provides fresh non-fragmented pilots continuously
     ● No forced defragmentation inside the pilots needed
     ● Tuning of the system is essential: performance dependence on ratio of job and pilot running times, mixture of single and multicore jobs, number of cores, job ranks, etc.
     [Figure: Jobs 1 to 10 packed inside pilot 1 and pilot 2. Axes: CPU cores (1 to 4) vs. time]
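A toy comparison of a fragmented pilot and a fresh one may help to see why pilot renewal removes the need for forced defragmentation. The sketch assumes hourly time steps and that no new single-core jobs are matched while waiting, which is an optimistic simplification; the job tuples and function names are hypothetical.

```python
from typing import List, Optional, Tuple

# (start_hour, end_hour, cores); the job runs for start_hour <= hour < end_hour
Job = Tuple[int, int, int]


def free_cores(pilot_cores: int, jobs: List[Job], hour: int) -> int:
    """Cores not claimed by any job running at the given hour."""
    busy = sum(cores for start, end, cores in jobs if start <= hour < end)
    return pilot_cores - busy


def first_hour_fitting(pilot_cores: int, jobs: List[Job],
                       request: int, horizon: int) -> Optional[int]:
    """First hour in [0, horizon) with at least `request` free cores."""
    for hour in range(horizon):
        if free_cores(pilot_cores, jobs, hour) >= request:
            return hour
    return None


# Old 8-core pilot: six single-core jobs ending at staggered times.
old_pilot = [(0, end, 1) for end in range(1, 7)]
# Fresh pilot: nothing running yet.
fresh_pilot: List[Job] = []

wait_old = first_hour_fitting(8, old_pilot, request=4, horizon=10)
wait_fresh = first_hour_fitting(8, fresh_pilot, request=4, horizon=10)
```

Here the old pilot has only 2 free cores at hour 0 and can take a 4-core job only after two more single-core jobs end, while the fresh pilot can take it immediately. If single-core jobs kept refilling freed cores, the old pilot might never accumulate 4 free cores at all.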

  8. Multicore job scheduling overview
     ● CMS target for 2015: 50% of the T1 resources in use by multicore jobs for PromptReco
     ● Objectives:
       ● Integrate scheduling of both multicore and single-core jobs: control fragmentation of resources in use
       ● High efficiency CPU usage, minimizing any inefficiencies deriving from scheduling
     ● Components in WM and SI which required modifications and tuning:
       ● WMAgent, managing mixed single core and multicore loads with correct job prioritization
       ● Pilot allocation to the sites:
         – GlideinWMS pilot factories produce enough multicore pilots for each site
         – Local resource managers (T1s batch systems)
       ● GlideinWMS FE: pilot performance in scheduling mixed loads inside the running pilots
     ● Tuning and debugging the system requires fully developed monitoring:
       ● Pilots
       ● Jobs

  9. Latest results

  10. Scale test to Tier1s
     ● Multicore job submission works: successfully submitting and running multicore jobs to all CMS T1s
     ● PromptReco multicore jobs regularly submitted to T1s to test scale:
       ● Targeting 18k cores at KIT, PIC, CCIN2P3, CNAF, JINR, RAL and FNAL
       ● Using 4-core jobs inside 8-core pilots
       ● Tested under heavy pressure from single core jobs
     ● Good results, close to target overall
       ● Peaks of 8k cores achieved
       ● Results vary from site to site: work ongoing to reach target at each site independently
     [Figure: single core job slots and multicore job slots]
