Managing a Dynamic Sharded Pool
Anthony Tiradani, HTCondor Week 2019

SLIDE 1

Managing a Dynamic Sharded Pool

Anthony Tiradani, HTCondor Week 2019, 22 May 2019

SLIDE 2
Introduction

  • Some archaeology from my time at Fermilab
    – Earliest archived Fermilab talks at HTCondor Week: 15 years ago!
    – My earliest HTCondor Week talk was in 2012
  • Describe the current state of the cluster(s)
  • Along the way, I hope to:
    – Show some (maybe) unique uses of HTCondor
    – Explain why we did what we did
    – Give a peek into some future activities

SLIDE 3
In the Beginning… (At least for me)

  • There was HTCondor! And it was Good.
    – When I started, the silent "HT" hadn't been added to the name yet

[Diagram: three separate pools]

  CMS Tier-1
    • Single VO
    • Grid-enabled
    • Priorities
    • CMS + OSG

  GPGrid
    • Multi-VO Pool
    • Grid-enabled
    • Quotas
    • Many experiments + OSG

  CMS LPC
    • Single VO Pool
    • Local Analysis only
    • Priority-based scheduling

SLIDE 4

Net Batch Slot Utilization – 2013 Scientific Computing Portfolio Review

[Chart: queued, idle, and busy slots over the last 3 months (~24,000 slots), with a visible dip during the holidays]

SLIDE 5
FIFEBatch

  • FIFEBatch was created using GlideinWMS
    – Main motivation was the desire to use OSG resources seamlessly

[Diagram: FIFEBatch (GlideinWMS pool) running pilots on GPGrid and on OSG sites]

SLIDE 6
FIFEBatch

  • FIFEBatch was a GlideinWMS pool
    – All slots are similar – controlled by the pilot (glidein)
    – Used the GlideinWMS Frontend to implement policies
    – Used the OSG Factory for pilot submission
    – Pilot "shape" defined by the Factory
    – All of the benefits of GlideinWMS and OSG
  • All FNAL experiment jobs ran within the FIFEBatch pool
  • FIFEBatch managed by the experiment support team
  • GPGrid managed by the Grid Computing team

SLIDE 7

SC-PMT – GPGrid processing requests: large memory or multi-core as a single slot

  • We began to see increased demand for large-memory or multi-core slots
  • For context:
    – A "standard" slot was defined as 1 core, 2 GB RAM
  • Partitionable slots are limited by the pilot size (see the sketch below)
  • Unable to use extra worker resources beyond what is claimed by the pilot

[Chart: large-memory/multi-core requests from last year's SC-PMT (Scientific Computing Portfolio Management Team) review]
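To make the pilot-size limit concrete, here is a minimal sketch in HTCondor configuration and submit-file syntax. The slot layout and numbers are illustrative assumptions, not the actual FermiGrid settings: a glidein that claimed 8 cores and 16 GB advertises one partitionable slot of exactly that size, so a larger request can never match that pilot, even when the underlying worker node has idle resources.

    # Pilot-side startd configuration (illustrative values): the glidein
    # advertises a single partitionable slot sized to the pilot's claim.
    SLOT_TYPE_1 = cpus=8, memory=16384
    SLOT_TYPE_1_PARTITIONABLE = True
    NUM_SLOTS_TYPE_1 = 1

    # Job submit file (illustrative): this fits, carving a 4-core
    # dynamic slot out of the partitionable slot above.
    request_cpus = 4
    request_memory = 8192

    # A request_cpus = 16 job can never match this pilot, even if the
    # bare worker node has 32 cores: only the pilot's claimed resources
    # are visible to the pool.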

SLIDE 8

Combined: GPGrid + FIFEBatch = FermiGrid

[Diagram: FermiGrid – GlideinWMS services submit pilots to OSG and to FermiGrid worker nodes; quota-based and priority-based scheduling coexist in the merged pool]

A sketch of how quotas and priorities can coexist follows.
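In HTCondor, quota-based scheduling on top of the usual user-priority scheduling is typically expressed with accounting groups and group quotas. A minimal sketch with made-up group names and numbers (the talk does not show the real FermiGrid quotas):

    # Negotiator configuration (illustrative names and numbers):
    # accounting groups provide quota-based scheduling; jobs submitted
    # outside any group still fall through to ordinary user-priority
    # scheduling.
    GROUP_NAMES = group_cms, group_nova, group_mu2e
    GROUP_QUOTA_group_cms  = 8000
    GROUP_QUOTA_group_nova = 2000
    GROUP_QUOTA_group_mu2e = 1000

    # Let a group exceed its quota when slots would otherwise sit idle.
    GROUP_ACCEPT_SURPLUS = True

    # Jobs opt in by declaring a group in the submit file, e.g.:
    #   accounting_group = group_nova
    #   accounting_group_user = someuser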

SLIDE 9
CMS Tier-1 + LPC

  • New requirements:
    – Make LPC available to CMS Connect
    – Make CRAB3 jobs run on LPC resources
  • LPC workers reconfigured to remove all extra storage mounts
    – Now LPC workers look identical to the Tier-1 workers
  • LPC needed a Grid interface for CMS Connect and CRAB3
    – The Tier-1 was already Grid-enabled
  • However, two competing usage models:
    – Tier-1 wants to be fully utilized
    – LPC wants resources at the time of need

SLIDE 10

CMS Tier-1 + LPC

[Diagram: a combined CMS pool containing both LPC workers and Tier-1 workers. LPC users submit directly from interactive login nodes to the CMS LPC schedd; CRAB3 and CMS Connect jobs enter through the CMS LPC and CMS Tier-1 HTCondor-CEs, with Global Pool pilots and reserved glideins for CRAB and CMS Connect submissions]

SLIDE 11

CMS – Docker

[Diagram: steering Docker jobs through the CE and schedd]

  • HTCondor worker advertises:
    – FERMIHTC_DOCKER_CAPABLE = True
    – FERMIHTC_DOCKER_TRUSTED_IMAGES = <comma-separated list>
  • GlideinWMS pilot advertises:
    – FERMIHTC_DOCKER_CAPABLE = False
  • HTCondor-CE Job Router and the LPC schedd job transform each:
    – Set WantDocker = MachineAttrFERMIHTC_DOCKER_CAPABLE0
    – Set DockerImage = <image expression>

A configuration sketch of both halves follows.
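A minimal sketch of the two halves in HTCondor configuration syntax. The FERMIHTC_* attribute names and the WantDocker/DockerImage assignments come from the slide; the trusted-image list, the image name, and the transform name are illustrative assumptions. MachineAttrFERMIHTC_DOCKER_CAPABLE0 only appears in the job ad if the schedd is told to record it (via SYSTEM_JOB_MACHINE_ATTRS), so treat this as the shape of the idea rather than a drop-in configuration.

    # Worker-side startd configuration: advertise Docker capability.
    FERMIHTC_DOCKER_CAPABLE = True
    FERMIHTC_DOCKER_TRUSTED_IMAGES = "fermilab/wn-sl6,fermilab/wn-sl7"  # hypothetical list
    STARTD_ATTRS = $(STARTD_ATTRS) FERMIHTC_DOCKER_CAPABLE FERMIHTC_DOCKER_TRUSTED_IMAGES

    # Schedd-side: record the machine attribute into the job ad at match
    # time, so the job's WantDocker expression can reference it.
    SYSTEM_JOB_MACHINE_ATTRS = $(SYSTEM_JOB_MACHINE_ATTRS) FERMIHTC_DOCKER_CAPABLE

    # Schedd-side job transform (hypothetical name), mirroring what the
    # slide says the CE Job Router and the LPC schedd transform do.
    JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) Docker
    JOB_TRANSFORM_Docker @=end
       SET WantDocker MachineAttrFERMIHTC_DOCKER_CAPABLE0
       SET DockerImage "fermilab/wn-sl7:latest"   # hypothetical image expression
    @end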

SLIDE 12

HEPCloud – Drivers for Evolving the Facility

  • HEP computing needs will be 10-100x current capacity
    – Two new programs coming online (DUNE, High-Luminosity LHC), while new physics search programs (e.g., Mu2e) will be operating
  • Scale of industry at or above R&D
    – Commercial clouds offering increased value for decreased cost compared to the past

[Chart: price of one core-year on commercial cloud]

SLIDE 13

HEPCloud – Drivers for Evolving the Facility: Elasticity

  • Usage is not steady-state
  • Computing schedules driven by real-world considerations (detector, accelerator, …) but also ingenuity – this is research and development of cutting-edge science

[Chart: NOvA jobs in the queue at FNAL, plotted against the facility size]

SLIDE 14

HEPCloud – Classes of Resource Providers

  Grid – Trust Federation – "Things you are given"
    • Virtual Organizations (VOs) of users trusted by Grid sites
    • VOs get allocations ➜ pledges
      – Unused allocations: opportunistic resources

  Cloud – Economic Model – "Things you rent"
    • Community Clouds – similar trust federation to Grids
    • Commercial Clouds – pay-as-you-go model
      – Strongly accounted
      – Near-infinite capacity ➜ elasticity
      – Spot price market

  HPC – Grant Allocation – "Things you borrow"
    • Researchers granted access to HPC installations
    • Peer review committees award allocations
      – Awards model designed for individual PIs rather than large collaborations

SLIDE 15
HEPCloud

  • New DOE requirements: use LCF facilities
  • HEPCloud adds Cloud and HPC resources to the pool
  • Cloud and HPC resource requests are carefully curated for specific classes of jobs
    – Only want appropriate jobs to land on Cloud and HPC resources
    – An additional negotiator also gives more flexibility in handling new resource types (see the sketch below)
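Running more than one negotiator in a single pool is a documented HTCondor pattern: each instance gets its own name plus constraints on which slots and jobs it may consider. A minimal central-manager sketch; the instance name and the IsHEPCloudSlot/WantHEPCloud attribute names are illustrative assumptions, not the production HEPCloud configuration.

    # Run a second negotiator instance alongside the default one.
    DAEMON_LIST = $(DAEMON_LIST) NEGOTIATOR_HEPCLOUD
    NEGOTIATOR_HEPCLOUD = $(NEGOTIATOR)
    NEGOTIATOR_HEPCLOUD_ARGS = -local-name NEGOTIATOR_HEPCLOUD

    # Knobs scoped to the second instance: only consider curated jobs,
    # and only hand out Cloud/HPC pilot slots.
    NEGOTIATOR_HEPCLOUD.NEGOTIATOR_NAME = hepcloud
    NEGOTIATOR_HEPCLOUD.NEGOTIATOR_SLOT_CONSTRAINT = (MY.IsHEPCloudSlot =?= True)
    NEGOTIATOR_HEPCLOUD.NEGOTIATOR_JOB_CONSTRAINT = (MY.WantHEPCloud =?= True)

    # Keep the default negotiator away from the HEPCloud slots.
    NEGOTIATOR.NEGOTIATOR_SLOT_CONSTRAINT = (MY.IsHEPCloudSlot =!= True)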

SLIDE 16

HEPCloud Era

[Diagram: the CMS pool in the HEPCloud era – LPC workers and Tier-1 workers served by the LPC negotiator and Tier-1 scheduler, with HEPCloud services and a separate HEPCloud negotiator matching HPC and Cloud pilots]

SLIDE 17

Monitoring – Negotiation Cycles

[Dashboard: negotiation cycle time, successful matches, idle jobs, considered jobs, rejected jobs]
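Panels like these map onto per-cycle statistics the negotiator publishes in its daemon ad (the LastNegotiationCycle*<N> attributes, where 0 is the most recent cycle). The exact attributes behind the Fermilab dashboards are not shown in the talk, but a monitoring probe could poll them along these lines:

    # Query the negotiator's classad for recent-cycle statistics.
    condor_status -negotiator -af Name \
        LastNegotiationCycleDuration0 \
        LastNegotiationCycleMatches0 \
        LastNegotiationCycleNumIdleJobs0 \
        LastNegotiationCycleRejections0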

SLIDE 18

Monitoring – Central Manager

[Dashboard: recent updates and average match rates at the central manager]

SLIDE 19
Next Steps

  • CI/CD pipelines for Docker containers
  • Containerizing workers? (Kubernetes, DC/OS, etc.)
  • HTCondor on HPC facilities with no outbound networking
  • Better handling of MPI jobs
    – No dedicated FIFO scheduler
    – No preemption

SLIDE 20

Questions, Comments?
