GlideinWMS Marco Mambelli Stakeholders Meeting May 11, 2018 - - PowerPoint PPT Presentation



slide-1
SLIDE 1

GlideinWMS

Marco Mambelli Stakeholders Meeting May 11, 2018

slide-2
SLIDE 2

Overview

  • Releases since last stakeholders meeting
  • Upcoming releases
  • Current focus
  • GlideinWMS roadmap
  • Reference slides

– GlideinWMS Architecture
– Quick Facts
– Releases since last stakeholders meeting

05/11/2018 Marco Mambelli | GlideinWMS - Stakeholders Meeting 2

slide-3
SLIDE 3

Releases Since Last Stakeholders Meeting

  • v3_2_22 released on April 10

– Bug Fix: Incorrect behavior of Singularity
– Bug Fix: proxy-renewal-script updates and bug fixes
– Bug Fix: Protection against malformed Frontend messages and hardening of forked processes

  • v3_2_22_1 and v3_2_22_2 followed shortly after, on April 11 and 17, to adapt to the new Singularity 2.4.6 requirement and because the rushed 3.2.22.1 contained only a partial fix

– Fixes to the proxy-renewal-script (OSG contributed) were also added

  • v3_3_3 (Development series) released on April 17

– Includes all features and bug fixes released in v3_2_22_2


slide-4
SLIDE 4

Next Planned Release

  • v3_4 planned for May 24

– Merging of the production and development branches (v3.2 and v3.3), which will bring Google CE support and the policy plugin to the production version
– Code modernization to Python 2.7 (and 2.6) standards
– Increase the number and coverage of the unit tests

[Figure: "Tickets per release" bar chart for releases 3.4, 3.2.22.2, and 3.2.21; series: Features, Bug fix, Other, Total; axis ticks from 5 to 30]

  • 10k lines of code changed
  • Doubled unit test coverage
  • More than doubled the number of tests
slide-5
SLIDE 5

Next Planned Release (cont)

  • v3_4 planned for May 24

– Glidein lifetime no longer based on the length of the proxy
– Internal support of condor_switchboard (discontinued by HTCondor)
– New option to kill glideins when job requests decrease
– Estimate in advance the cores provided to glideins that discover cores automatically
– Add entry monitoring breakdown for metasites
– Review Factory and Frontend tools, especially glidein_off and manual_glidein_submit.py


slide-6
SLIDE 6

GlideinWMS: Current Focus

  • Improve stability

– More automated testing & CI (pylint, pythoscope, futurize, unittest …) is an ongoing focus
– Developers’ test infrastructure to connect to Factory ITB services for scale testing
– External contributions should be production ready

  • Minimize wastage of resources from over-provisioning

– Consider site topology
– AUTO estimate
– Actively follow the requests and adapt as the request goes down
– Solution addressed in phases

  • First phase of the solution is available in v3.2.21, next in 3.4
  • Consider “transactional provisioning”
  • Containerization

– Singularity support changes

  • Security

– Adapt to sites with tighter security restrictions

  • Support for shorter proxy lifetime

– Impacts how we determine lifetime of a glidein

– Successful test w/ FIFE
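
The "actively follow the requests" point under over-provisioning can be pictured with a small sketch. This is not GlideinWMS code: the function name `excess_glideins`, its parameters, and the policy of removing only idle glideins are all assumptions made for illustration.

```python
import math


def excess_glideins(requested_cores, total_glideins, idle_glideins,
                    cores_per_glidein=1):
    """Return how many idle glideins could be removed (hypothetical policy).

    requested_cores   -- cores currently requested by jobs (assumed input)
    total_glideins    -- glideins currently provisioned
    idle_glideins     -- provisioned glideins not running any job
    cores_per_glidein -- cores each glidein provides (assumed uniform)
    """
    if cores_per_glidein < 1:
        raise ValueError("cores_per_glidein must be at least 1")
    # Glideins needed to cover the current request, rounded up.
    needed = math.ceil(requested_cores / cores_per_glidein)
    # Anything provisioned beyond that is surplus.
    surplus = max(0, total_glideins - needed)
    # Only idle glideins are candidates, so running jobs are never killed.
    return min(surplus, idle_glideins)


if __name__ == "__main__":
    # Requests dropped to 40 cores while 10 eight-core glideins are up.
    print(excess_glideins(40, 10, 3, cores_per_glidein=8))
```

Capping the result at the number of idle glideins is one conservative choice; a real policy would also have to account for site topology and for glideins that are still starting up.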


slide-7
SLIDE 7

GlideinWMS Roadmap

  • Medium term (2018 – mid 2019)

– Keep up with the scalability requirements

  • Investigate and incorporate new technologies like pandas dataframes, numpy, etc

– Outsource GlideinWMS functionality to HTCondor

  • Work with the HTCondor team to provide some of the frontend functionality natively through HTCondor

– Leaner & modular Frontend

  • Adapt to changes/introduction of Acquisition Engine by HTCondor

– Dependent on the work that will be done in HTCondor in future

  • Very thin GlideinWMS factory

– Support for new HPC sites with stricter policies (e.g. no outbound connection except gateways, MFA)

  • Depends on support from HTCondor. Discussion with the HTCondor team next week.

– Monitoring Modernization

  • Retire GlideinWMS monitoring pages
  • Move to a Grafana/Graphite/Elasticsearch based solution


slide-8
SLIDE 8

GlideinWMS Roadmap

  • Long term (> mid-2019)

– Moving to Decision Engine (DE)

  • Replace frontend with the Decision Engine

– Make Glidein a service capable of talking to multiple WMS middleware/frameworks


slide-9
SLIDE 9

Questions/Comments


slide-10
SLIDE 10

Reference Slides


slide-11
SLIDE 11

GlideinWMS

[Figure: GlideinWMS architecture diagram. Components shown: condor submit, VO Frontend, HTCondor Central Manager, HTCondor Schedulers, GlideinWMS Factory, HTCondor-G, HTCondor CE. Resource types: Clouds (AWS/OpenStack/OpenNebula), Super Computers (via BOSCO), Grid Site. On each resource a glidein runs an HTCondor Startd on the worker node or virtual machine and pulls jobs. Year labels in the figure: 2014, 2014, 2012, 2006.]

NOTE: A Frontend can talk to multiple Factories; a Factory can serve multiple Frontends.

slide-12
SLIDE 12

GlideinWMS: Quick Facts

  • GlideinWMS is an open-source product (http://tinyurl.com/glideinWMS)
  • Relies heavily on HTCondor (UW Madison); we work closely with the HTCondor team
  • Effort:

Role                   Resources                                  Effort (FTE)
Project Mgmt/Lead      Parag Mhashilkar (0.15 USCMS)              0.15
Development & Support  Parag Mhashilkar (0.20 SCD)                2.45
                       Marco Mambelli (1 SCD)
                       Dennis Box (0.75 SCD)
                       Marco Mascheroni (0.5 CMS - Contractor)
TOTAL                                                             2.60

Table: Current Resources & Roles

slide-13
SLIDE 13

Quick Facts: Releases & Support Structure

  • Releases

– Issues tracked in redmine issue tracker

  • https://cdcvs.fnal.gov/redmine/projects/glideinwms/issues
  • Categorized and prioritized based on impact, urgency and requester

– Issues are now associated with respective stakeholders

  • Issues are assigned based on developer’s expertise and other workload
  • Roadmap for upcoming releases available in redmine (See reference slides)

– SCM

  • All releases are version controlled and tagged
  • http://glideinwms.fnal.gov/doc.prd/download.html

– Release notes & history

  • http://glideinwms.fnal.gov/doc.prd/history.html
  • Support

– Entire development team is responsible for support


slide-14
SLIDE 14

Quick Facts: Project Status & Communication Channels

Area of Interest       Mailing Lists
Support                glideinwms-support@fnal.gov
Stakeholders           glideinwms-stakeholders@fnal.gov
Release Announcements  glideinwms-support@fnal.gov
                       cms-dct-wms@fnal.gov
                       glideinwms-stakeholders@fnal.gov
Future Release plans   See next slide
Discussions            glideinwms-discuss@fnal.gov
Code commits           glideinwms-commit@fnal.gov
Twitter                Tag: @glideinwms


  • Project meeting: Wednesdays 10 – 11 am

– Technical discussions & status updates
– Regular stakeholder participation
– Contact Parag Mhashilkar if you need an invite for this meeting

  • Quarterly Stakeholders Meeting
  • Project Management

– Project Status reported monthly at CS Project status meetings

slide-15
SLIDE 15

Tracking Releases in Redmine


  1. Visit the redmine issues tab for GlideinWMS or the URL
  2. Click the custom query for stakeholder or version roadmap

The default tabs are not very useful.
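
For readers who prefer scripting to clicking through custom queries, a per-version issue list could in principle be requested through Redmine's REST interface. The sketch below only builds the query URL. It assumes that this Redmine instance exposes the REST API and that the numeric id of the target version is known; the helper name `issues_query_url` and the id `123` are hypothetical.

```python
from urllib.parse import urlencode

# Base URL taken from the issue tracker link on the Releases slide.
REDMINE_BASE = "https://cdcvs.fnal.gov/redmine"


def issues_query_url(project="glideinwms", status_id="open",
                     fixed_version_id=None):
    """Build a Redmine issues URL filtered by status and target version.

    fixed_version_id is Redmine's numeric id for a target version; the
    value has to be looked up on the server (none is given in the slides).
    """
    params = {"status_id": status_id}
    if fixed_version_id is not None:
        params["fixed_version_id"] = fixed_version_id
    return f"{REDMINE_BASE}/projects/{project}/issues.json?{urlencode(params)}"


if __name__ == "__main__":
    # 123 is a placeholder id, not a real GlideinWMS version id.
    print(issues_query_url(fixed_version_id=123))
```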