Worker Node Software Management: the VO perspective
Mark Santcroos, Dennis van Dok
Introduction
- e-BioScience group
– Bioinformatics Laboratory
– Clinical Epidemiology, Biostatistics and Bioinformatics
– Academic Medical Centre, Amsterdam
- Intermediary between medical researchers and the Dutch NGI
- Support a wide range of applications in Next Generation Sequencing and Medical Imaging
Worker Node Software
- Running on 15 sites in the Netherlands
- Base worker node installation (glite-WN)
- Proof of Concept (PoC) software installation, heritage of Virtual Laboratory for e-Science (ended 2009)
Perspective
- Dennis van Dok is part of the team that developed and managed the PoC environment at BiG Grid
- Mark Santcroos is a VO manager for the vlemed VO
Job / Application Scenarios
- Use installed software
- Application in Job Sandbox
- Fetch Application using wrapper
- Upgrade versions in PoC distribution
- Lobby for new versions with Site admins
Limitations
- Sandbox solution has size limits
- Sandbox and wrapper have network overhead
- Installed version out of date / too new
- Responsibility of maintaining applications for end-user not always preferable
- Site admins have to be in the loop
High Level Goal
- Have a flexible solution to make software available on the grid for end users that is also manageable from a VO admin perspective.
Packaging Requirements
- Automatic dependency resolution
- Supported on Linux
- Tools for install/update/remove/status
- Running entirely in userspace, unprivileged
- Multiple installed versions of the same software
Unsuitable candidates
- rpm/yum
- deb/apt
- portage
- Arch User Repository
- pacman
- …
- Reasons: too OS-specific, difficult to manage as an unprivileged user
Pkgsrc
- Originating in NetBSD
- Supported on Linux
- Self contained
- Actively maintained
- Can be used as a non-privileged user
- Large collection of applications already packaged
- Can make use of system provided dependencies
- Allows maintaining a local set of packages
- Could add packages to the main distribution
- Supports binary and source packages
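As a rough illustration of the non-privileged mode: the pkgsrc bootstrap script accepts `--unprivileged` and `--prefix`. The sketch below only prints the command it would run. The function name and both paths are hypothetical and not taken from the talk.

```shell
#!/bin/sh
# Sketch: compose the command that bootstraps pkgsrc without root.
# Both paths are hypothetical placeholders, not taken from the talk.

# bootstrap_cmd TREE PREFIX: print the bootstrap command line for an
# extracted pkgsrc tree in TREE, installing under PREFIX.
bootstrap_cmd() {
    tree=$1
    prefix=$2
    printf 'cd %s/bootstrap && ./bootstrap --unprivileged --prefix %s\n' \
        "$tree" "$prefix"
}

# Print only; a real run would execute the command instead.
bootstrap_cmd "$HOME/pkgsrc" "$HOME/pkg"
```

Printing instead of executing keeps the sketch safe to try on any machine. On a grid site the prefix would presumably point into the VO's shared software area.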
Creating a package
DISTNAME=	vlet-1.3.2
CATEGORIES=	local
MASTER_SITES=	http://orange.ebioscience.amc.nl/pkgsrc/distfiles/
EXTRACT_SUFX=	.zip

MAINTAINER=	m.a.santcroos@amc.uva.nl
HOMEPAGE=	http://orange.ebioscience.amc.nl/pkgsrc/distfiles/
COMMENT=	This is the VL-e Toolkit
LICENSE=	apache-2.0

NO_CONFIGURE=	yes
NO_BUILD=	yes
PKG_DESTDIR_SUPPORT=	user-destdir
INSTALLATION_DIRS=	bin lib

post-extract:
	${CP} ${FILESDIR}/Makefile ${WRKSRC}/Makefile

.include "../../mk/bsd.pkg.mk"
Package Tree Management
- update-tree.sh
– Pull upstream pkgsrc changes
– Create tarball
– Put on website
Implementation Principles
- $VO_[VONAME]_SW_DIR is a directory shared between all worker nodes on a site
- Run with a Software (VO) Manager proxy
- Install packages per site / cluster / CE
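A minimal sketch of how a job script might locate the shared software area for a VO. It assumes the variable name is formed by upper-casing the VO name and replacing every character other than letters and digits with an underscore. That mapping and both function names are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: derive the name of the shared software area variable for a VO.
# Assumption: the VO name is upper-cased and every character that is not
# a letter or digit (such as '.' or '-') becomes '_'.

vo_sw_dir_var() {
    suffix=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | sed 's/[^A-Z0-9]/_/g')
    printf 'VO_%s_SW_DIR\n' "$suffix"
}

# vo_sw_dir VO: print the value of that variable, or fail if it is unset.
vo_sw_dir() {
    var=$(vo_sw_dir_var "$1")
    # The name was sanitised above, so eval only sees [A-Z0-9_].
    eval "dir=\${$var:-}"
    [ -n "$dir" ] || return 1
    printf '%s\n' "$dir"
}

vo_sw_dir_var vlemed
```

Failing when the variable is unset lets a management job stop early on a site that does not provide a shared area.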
Architecture
[Diagram: Server (UI), Shared Storage Area, Worker Nodes; connections labelled "Management Jobs" and "Mount"]
Managing packages
- site-pkgtool.sh
– Program to manage packages centrally
– Initiates grid jobs
- Install, Remove, Update
- Init, Reinit, Check, Dump, Info, Version
Script on the worker node
- pkgsrc-cmd.sh
– Wrapper program that runs on the worker node
- Running as a grid job
Information Management
- list-installed-packages.sh
– Display information about installed packages for sites
- get-site-status.sh
– Gather information from all supported sites
- verify-package.sh
– Check if a certain package is available on a site
- get-tags.sh
– Get all the package tags for the configured sites
Installing a package
- Check if distribution is fresh
- Extract tree in scratch space
- Build package and dependencies
- Install package in shared software area
- Install modulefile
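The steps above can be sketched as a dry run that only prints what each step would do. The tarball name, directory layout and function name are hypothetical. `bmake build` and `bmake install` are the usual pkgsrc targets on Linux, but whether the project's scripts call them this way is an assumption.

```shell
#!/bin/sh
# Sketch of the installation steps as a dry run: each step is only printed.
# All paths and file names are hypothetical.

install_plan() {
    pkg=$1        # package directory in the tree, e.g. local/vlet
    sw_dir=$2     # shared software area
    scratch=$3    # scratch space on the worker node

    echo "1. compare checksum of $sw_dir/pkgsrc.tar.gz with the published one"
    echo "2. tar -xzf $sw_dir/pkgsrc.tar.gz -C $scratch"
    echo "3. (cd $scratch/pkgsrc/$pkg && bmake build)"
    echo "4. (cd $scratch/pkgsrc/$pkg && bmake install)"
    echo "5. install modulefile under $sw_dir/modulefiles"
}

install_plan local/vlet /sw /tmp/job
```

Building in scratch space keeps the heavy I/O off the shared area. Only the final install and the modulefile touch it.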
Environment Modules
- “The Environment Modules package provides for the dynamic modification of a user's environment via modulefiles.”
- Select versions
- Setup environment
- Integrates with system provided setup
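To make the idea concrete, here is a minimal sketch of a helper that writes a modulefile for one installed version. The helper name and the `bin`/`lib` layout under the prefix are assumptions. The modulefile itself uses only the standard `#%Module1.0` header and the `module-whatis` and `prepend-path` commands.

```shell
#!/bin/sh
# Sketch: write a minimal modulefile for one installed package version.
# The directory layout (PREFIX/bin, PREFIX/lib) is an assumption.

# write_modulefile NAME VERSION PREFIX MODDIR
write_modulefile() {
    name=$1; version=$2; prefix=$3; moddir=$4
    mkdir -p "$moddir/$name" || return 1
    cat > "$moddir/$name/$version" <<EOF
#%Module1.0
module-whatis "$name $version"
prepend-path PATH $prefix/bin
prepend-path LD_LIBRARY_PATH $prefix/lib
EOF
}
```

With such a file in place, a job could run `module use` on the modulefile directory and then `module load vlet/1.3.2` to select that version. One file per version is what allows several versions of the same software to coexist.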
Tags
- Software Tags in Information System (BDII)
- Publish installed software versions per CE
- Used for resource selection by adding it to the “Requirements” of a JDL
- Use lcg-ManageVOTag tool to publish tag
- Structure of tags is VO-${vo}_SW_${package}
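A small sketch of how a tag following that structure could be turned into a JDL requirement. The function names and the package string `vlet-1.3.2` are made up for illustration. The JDL line uses the `Member()` function against the Glue attribute `GlueHostApplicationSoftwareRunTimeEnvironment`, which is where software tags are normally published.

```shell
#!/bin/sh
# Sketch: build a software tag and the matching JDL requirement.
# The tag layout follows the slide; the package string is a made-up example.

# sw_tag VO PACKAGE
sw_tag() {
    printf 'VO-%s_SW_%s\n' "$1" "$2"
}

# jdl_requirement VO PACKAGE
jdl_requirement() {
    # Member() tests whether the tag is in the list of runtime
    # environment tags published for a CE.
    printf 'Requirements = Member("%s", other.GlueHostApplicationSoftwareRunTimeEnvironment);\n' \
        "$(sw_tag "$1" "$2")"
}

jdl_requirement vlemed vlet-1.3.2
```

Because `Member()` is an exact string match, a job asking for one version will not match a site that only publishes a newer one.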
Practical issues
- Tags are not omnipresent
- Shared area can become bottleneck
- No intelligent matching on tags
Conclusions
- Flexible software management system
- Relieves burden from user
- Creating packages is still labor-intensive work
Discussion
- One size fits all? (Did we reinvent the wheel?)
- Connect to EGI AppDB?
- EMI Community Repositories?
- Usable for data distribution?
- Other mechanism for matching?
Links
- pkgsrc
– http://www.netbsd.org/docs/software/packages.html
- Modules
– http://modules.sourceforge.net/
- BiG Grid
– http://www.biggrid.nl/
- Bioinformatics Laboratory
– http://www.bioinformaticslaboratory.nl/
- Project Code
– http://dvandok.github.com/userspace-package-management/
Acknowledgements
- AMC Bioinformatics Laboratory
– Prof. dr. Antoine van Kampen
– Dr. Silvia Delgado Olabarriaga
– Barbera van Schaik
- BiG Grid / Nikhef