

SLIDE 1

LHCb, Vac, Vcycle status

Andrew McNab University of Manchester

SLIDE 2

LHCb, Vac, Vcycle - Andrew.McNab@cern.ch - GridPP technical meeting 2

LHCb status

  • Running production jobs in VMs at 3 UK Vac sites and on 2 IaaS Cloud sites using Vcycle
    – Manchester, Lancaster, Oxford; Imperial and CERN
  • Both Vac and Vcycle are advertised as GridPP products
  • Vac has been a supported LHCb platform since last year
  • Vcycle now adopted by LHCb too

  • (LHCb hasn't tried running the HLT as a Cloud service, since it has been a production DIRAC site for several years)
  • LHCb's VM architecture is done by us, using the Pilot VM model also used to make GridPP DIRAC and ATLAS VMs

  • DIRAC Pilot 2.0, with improved monitoring, VM support, and modularity, will also be joint CERN and Manchester work
  • More VM slots at sites for LHCb would be welcome!
SLIDE 3


LHCb jobs in VMs

  • CLOUD jobs run in VMs managed by Vcycle on OpenStack
    – CLOUD.CERN.ch is ~500 VM slots
  • VAC jobs run in VMs managed by Vac, of course

SLIDE 4


Vac

  • On each physical node, the Vac VM factory daemon runs to create and apply contextualization to transient VMs
  • Multiple VM flavours (“VM types”) are supported, ~1 per experiment
  • Each site or Vac “space” is composed of autonomous factory nodes, all using the same /etc/vac.conf

  • Factories communicate with each other via UDP
    – The type of VM to start in a free slot is chosen based on what else is running and on target shares
    – So there is no headnode central point of failure; robust against losing individual nodes

  • Aims for reliability and robustness through simplicity
    – VM instantiation failure rate << 1/1000, much better than typical IaaS sites

  • Running LHCb production jobs since last year; and ATLAS production jobs at Manchester (40K+ jobs), Lancaster, and Oxford since early April
  • Documentation, RPMs, and links to GitHub at www.gridpp.ac.uk/vac
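The slot-filling decision above, where a factory picks which VM type to start based on target shares and what is already running across the space, can be sketched roughly as follows. This is a hypothetical illustration, not Vac's actual internals: the function name, the dict-based inputs, and the deficit heuristic are all assumptions.

```python
# Sketch: pick the VM type whose running fraction is furthest below its
# target share. running_counts would come from the state each factory
# learns about the space via the UDP protocol.

def choose_vmtype(target_shares, running_counts):
    """Return the VM type with the largest deficit versus its target
    share, or None if no types are defined."""
    total_target = sum(target_shares.values())
    total_running = sum(running_counts.get(t, 0) for t in target_shares)
    best_type, best_deficit = None, None
    for vmtype, share in target_shares.items():
        desired = share / total_target
        actual = (running_counts.get(vmtype, 0) / total_running
                  if total_running else 0.0)
        deficit = desired - actual
        if best_deficit is None or deficit > best_deficit:
            best_type, best_deficit = vmtype, deficit
    return best_type

# LHCb has a 2:1 target share but only 1 of 3 running VMs,
# so it gets the free slot
print(choose_vmtype({'lhcb': 2, 'atlas': 1}, {'lhcb': 1, 'atlas': 2}))
```

Because each factory can compute this locally from the UDP-gathered state, no central headnode is needed to make the scheduling decision.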
SLIDE 5


Vcycle on OpenStack etc

  • Uses the Vac approach to run VMs on IaaS cloud platforms
  • A Python daemon manages the lifecycle of VMs in a tenancy
    – (Re)creates VMs using a boot image and user_data
  • Supports multiple tenancies and multiple vmtypes per tenancy
  • Doesn't need to know about task queues etc
    – VMs are black boxes: created, run, shutdown, then deleted
    – Vcycle can be run by the experiment, the site, or a third party

  • In production for LHCb at CERN since early May (~500 VMs)
  • Running production ATLAS and LHCb VMs in the gridpp-vcycle tenancy at Imperial College
  • Being evaluated for ATLAS at CERN
  • Sources in the Vac GitHub area
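The black-box lifecycle above, where VMs are created up to a target count, run, and deleted once they shut down, can be sketched as a simple loop. The `Tenancy` class here is a toy in-memory stand-in invented for illustration; the real Vcycle talks to OpenStack (and other IaaS) APIs.

```python
# Sketch of a Vcycle-style lifecycle pass: delete shut-down VMs,
# then (re)create VMs up to the target count from a boot image
# and user_data. All names here are illustrative assumptions.

class Tenancy:
    """Toy in-memory stand-in for an IaaS tenancy API."""
    def __init__(self):
        self.vms = {}          # name -> state ('running' or 'shutdown')
        self.counter = 0
    def list_vms(self):
        return dict(self.vms)
    def create_vm(self, boot_image, user_data):
        self.counter += 1
        name = 'vcycle-%d' % self.counter
        self.vms[name] = 'running'
        return name
    def delete_vm(self, name):
        del self.vms[name]

def cycle(tenancy, max_vms, boot_image, user_data):
    """One pass of the lifecycle loop: VMs are black boxes, so the
    only decisions are delete-if-finished and top-up-to-target."""
    for name, state in tenancy.list_vms().items():
        if state == 'shutdown':        # black box finished: delete it
            tenancy.delete_vm(name)
    while len(tenancy.list_vms()) < max_vms:
        tenancy.create_vm(boot_image, user_data)

t = Tenancy()
cycle(t, 3, 'cernvm3.iso', '#cloud-config ...')
print(sorted(t.list_vms()))            # three running VMs
t.vms['vcycle-1'] = 'shutdown'         # simulate one VM finishing
cycle(t, 3, 'cernvm3.iso', '#cloud-config ...')   # replaced by a new VM
```

Since the loop never inspects what runs inside a VM, the same daemon can be operated by the experiment, the site, or a third party, exactly as the slide notes.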
SLIDE 6


Immediate plans

  • LHCb
    – Begin work on Pilot 2.0
    – New monitoring framework
    – Multiple concurrent single-processor payloads in one pilot job or pilot VM
    – Improve TimeLeft handling, for better elastic MC jobs and multiple-payload jobs
  • Vac
    – CloudInit support
    – Increase robustness of the UDP protocol under high (50%?) packet loss
    – Increase scalability from the present level of hundreds of VMs
    – Generic condor worker VM based on CernVM condor support
  • Vcycle
    – Man page, Admin Guide, RPMs
    – (Re-)add EC2 support for non-OpenStack tenancies
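The TimeLeft handling mentioned under the LHCb plans boils down to a check of the kind sketched below: a pilot should only accept another payload if it can plausibly finish before the slot's wall-clock limit. The function name, parameters, and safety margin are illustrative assumptions, not DIRAC's actual TimeLeft implementation.

```python
# Sketch of a TimeLeft-style check for multiple-payload pilots:
# accept another payload only if it should finish before the
# wall-clock limit, leaving a margin for graceful shutdown.
# All times are in seconds; all names are hypothetical.

def can_accept_payload(wallclock_limit, elapsed, expected_payload_time,
                       shutdown_margin=600):
    """True if the payload fits in the remaining wall-clock time,
    with shutdown_margin seconds spare for cleanup."""
    time_left = wallclock_limit - elapsed
    return expected_payload_time + shutdown_margin <= time_left

# 24h slot with 22h used: a 3h payload no longer fits...
print(can_accept_payload(86400, 79200, 10800))   # False
# ...but a 1h payload still does
print(can_accept_payload(86400, 79200, 3600))    # True
```

The same estimate is what would let elastic MC jobs size their event count to the time actually remaining, rather than to a fixed assumption.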