LHCb, Vac, Vcycle status: Andrew McNab, University of Manchester

  1. LHCb, Vac, Vcycle status: Andrew McNab, University of Manchester

  2. LHCb status
     ● Running production jobs in VMs at 3 UK Vac sites and on 2 IaaS Cloud sites using Vcycle
       – Manchester, Lancaster, Oxford; Imperial and CERN
       – Both Vac and Vcycle are advertised as GridPP products
       – Vac has been a supported LHCb platform since last year
       – Vcycle now adopted by LHCb too
     ● (LHCb hasn't tried running the HLT as a Cloud service, since it has been a production DIRAC site for several years)
     ● LHCb's VM architecture is done by us, using the Pilot VM model also used to make GridPP DIRAC and ATLAS VMs
     ● DIRAC Pilot 2.0, with improved monitoring, VM support, and modularity, will also be joint CERN and Manchester work
     ● More VM slots at sites for LHCb would be welcome!
     LHCb, Vac, Vcycle - Andrew.McNab@cern.ch - GridPP technical meeting

  3. LHCb jobs in VMs
     ● CLOUD jobs in VMs managed by Vcycle on OpenStack
     ● CLOUD.CERN.ch is ~500 VM slots
     ● VAC jobs in VMs managed by Vac, of course

  4. Vac
     ● On each physical node, the Vac VM factory daemon runs to create and apply contextualization to transient VMs
     ● Multiple VM flavours (“VM types”) are supported, ~1 per experiment
     ● Each site or Vac “space” is composed of autonomous factory nodes
       – All using the same /etc/vac.conf
     ● Factories communicate with each other via UDP
       – The type of VM to start in a free slot is based on what else is running and on the target shares
       – So there is no head node acting as a central point of failure; robust against losing individual nodes
     ● Aims for reliability and robustness through simplicity
       – VM instantiation failure rate << 1/1000, much better than typical IaaS sites
     ● Running LHCb production jobs since last year, and ATLAS production jobs at Manchester (40K+ jobs), Lancaster and Oxford since early April
     ● Documentation, RPMs, links to GitHub at www.gridpp.ac.uk/vac
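The share-based choice on this slide can be illustrated with a minimal Python sketch. It is not Vac's actual code. The function name `choose_vmtype`, the shape of the per-factory reports, and the "largest deficit against target share wins" rule are all assumptions made for illustration. Each factory is assumed to have already collected counts of running VMs from the other factories in its space (over UDP in Vac). The sketch only shows how those counts and the target shares could decide which VM type to start in a free slot.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def choose_vmtype(
    target_shares: Dict[str, float],
    reports: Iterable[Dict[str, int]],
) -> Optional[str]:
    """Return the VM type to start in a free slot, or None if none is configured.

    target_shares: hypothetical mapping of VM type name -> relative target share.
    reports: per-factory counts of running VMs by VM type (assumed to have been
             collected from the other factories in the space, e.g. over UDP).
    """
    # Sum the running VMs reported by every factory in the space.
    running: Counter = Counter()
    for report in reports:
        running.update(report)

    # Only VM types with a positive target share are candidates.
    candidates = {name: share for name, share in target_shares.items() if share > 0}
    total_share = sum(candidates.values())
    if total_share == 0:
        return None

    # VMs of types that are no longer configured are ignored.
    total_running = sum(running[name] for name in candidates)

    best_name: Optional[str] = None
    best_deficit = float("-inf")
    # Sorted iteration makes tie-breaking deterministic.
    for name in sorted(candidates):
        wanted = candidates[name] / total_share
        current = running[name] / total_running if total_running else 0.0
        deficit = wanted - current
        if deficit > best_deficit:
            best_name, best_deficit = name, deficit
    return best_name
```

For example, `choose_vmtype({"lhcb": 2, "atlas": 1}, [{"lhcb": 1, "atlas": 1}, {"atlas": 1}])` returns `"lhcb"`. LHCb holds one third of the running VMs against a target of two thirds. Every factory reads the same configuration and can run the same calculation on what it hears from its peers, so no head node is needed to make the decision.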

  5. Vcycle on OpenStack etc.
     ● Uses the Vac approach to run VMs on IaaS cloud platforms
     ● Python daemon manages the lifecycle of VMs in a tenancy
       – (Re)creates VMs using a boot image and user_data
     ● Supports multiple tenancies and multiple vmtypes per tenancy
     ● Doesn't need to know about task queues etc.
       – VMs are black boxes: created, run, shut down, then deleted
       – Vcycle can be run by the experiment, the site, or a third party
     ● In production for LHCb at CERN since early May (~500 VMs)
     ● Running production ATLAS and LHCb VMs in the gridpp-vcycle tenancy at Imperial College
     ● Being evaluated for ATLAS at CERN
     ● Sources in the Vac GitHub area
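The lifecycle on this slide can be pictured as a reconciliation loop. The Python sketch below is a minimal illustration under stated assumptions, not Vcycle's actual code. `FakeTenancy` is a hypothetical in-memory stand-in for an OpenStack tenancy, and no real cloud API is called. The `cycle` function, the state names, and the `max_vms`/`image`/`user_data` keys are invented for the example. Each pass deletes VMs that have shut down, then creates new ones from the boot image and user_data until each vmtype is back at its limit. It never inspects what a VM did inside.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VM:
    name: str
    vmtype: str
    state: str  # assumed states: "starting", "running", "shutdown"


@dataclass
class FakeTenancy:
    """Hypothetical in-memory stand-in for an IaaS tenancy; a real daemon would call the cloud API."""
    vms: Dict[str, VM] = field(default_factory=dict)
    counter: int = 0

    def list_vms(self) -> List[VM]:
        # Return a copy so callers can delete while iterating.
        return list(self.vms.values())

    def create_vm(self, vmtype: str, image: str, user_data: str) -> VM:
        # A real implementation would pass image and user_data to the cloud's create call.
        self.counter += 1
        vm = VM(name=f"{vmtype}-{self.counter}", vmtype=vmtype, state="starting")
        self.vms[vm.name] = vm
        return vm

    def delete_vm(self, name: str) -> None:
        del self.vms[name]


def cycle(tenancy: FakeTenancy, vmtypes: Dict[str, dict]) -> None:
    """One pass of a Vcycle-style loop over a single tenancy.

    vmtypes: hypothetical mapping of vmtype -> {"max_vms", "image", "user_data"}.
    VMs are treated as black boxes: the loop only looks at their state.
    """
    # Delete VMs that have shut down; their slots become free.
    for vm in tenancy.list_vms():
        if vm.state == "shutdown":
            tenancy.delete_vm(vm.name)

    # (Re)create VMs until each vmtype is back at its configured limit.
    for vmtype, cfg in vmtypes.items():
        existing = sum(1 for vm in tenancy.list_vms() if vm.vmtype == vmtype)
        for _ in range(cfg["max_vms"] - existing):
            tenancy.create_vm(vmtype, cfg["image"], cfg["user_data"])
```

Supporting several tenancies would amount to calling `cycle` once per tenancy on each pass of the daemon. The loop only looks at VM state and needs no knowledge of the experiment's task queue. That is what allows the experiment, the site, or a third party to run it.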

  6. Immediate plans
     ● LHCb
       – Begin work on Pilot 2.0
       – New monitoring framework
       – Multiple concurrent single-processor payloads in one pilot job or pilot VM
       – Improve TimeLeft handling, for better elastic MC jobs and multiple-payload jobs
     ● Vac
       – CloudInit support
       – Increase robustness of the UDP protocol under high (50%?) packet loss
       – Increase scalability from the present level of hundreds of VMs
       – Generic condor worker VM based on CernVM condor support
     ● Vcycle
       – Man page, Admin Guide, RPMs
       – (Re-)add EC2 support for non-OpenStack tenancies
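A back-of-the-envelope estimate shows why high packet loss matters for a UDP protocol like the one between Vac factories. It assumes, for illustration only, that each query needs one request datagram and one reply datagram. Each datagram is lost independently with probability p, and the sender simply retries up to k times. Vac's actual message pattern may differ.

```latex
% Assumed: one request and one reply datagram per query,
% each lost independently with probability p; up to k attempts.
P_{\mathrm{1\,try}} = (1-p)^2
\qquad
P_{k\,\mathrm{tries}} = 1 - \bigl(1 - (1-p)^2\bigr)^{k}

% With p = 0.5:
P_{\mathrm{1\,try}} = 0.25,\quad
P_{5\,\mathrm{tries}} = 1 - 0.75^{5} \approx 0.76,\quad
P_{10\,\mathrm{tries}} = 1 - 0.75^{10} \approx 0.94
```

Under these assumptions, a single-shot query at 50% loss succeeds only a quarter of the time. Even ten blind retries leave about a 6% chance of hearing nothing back.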
