Worker Node Software Management: the VO perspective


slide-1
SLIDE 1

Worker Node Software Management: the VO perspective

Mark Santcroos Dennis van Dok

slide-2
SLIDE 2

Introduction

  • e-BioScience group

– Bioinformatics Laboratory
– Clinical Epidemiology, Biostatistics and Bioinformatics
– Academic Medical Centre, Amsterdam

  • Intermediary between medical researchers and the Dutch NGI

  • Support a wide range of applications in Next Generation Sequencing and Medical Imaging

slide-3
SLIDE 3

Worker Node Software

  • Running on 15 sites in the Netherlands
  • Base worker node installation (glite-WN)
  • Proof of Concept (PoC) software installation, a heritage of the Virtual Laboratory for e-Science (ended 2009)

slide-4
SLIDE 4

Perspective

  • Dennis van Dok is part of the team that developed and managed the PoC environment at BiG Grid

  • Mark is a VO manager for the vlemed VO
slide-5
SLIDE 5

Job / Application Scenarios

  • Use installed software
  • Application in Job Sandbox
  • Fetch Application using wrapper
  • Upgrade versions in PoC distribution
  • Lobby for new versions with Site admins
slide-6
SLIDE 6

Limitations

  • Sandbox solution has size limits
  • Sandbox and wrapper have network overhead
  • Installed version may be out of date or too new
  • Leaving responsibility for maintaining applications with the end user is not always preferable

  • Site admins have to be in the loop
slide-7
SLIDE 7

High Level Goal

  • Have a flexible solution to make software available on the grid for end users that is also manageable from a VO admin perspective.

slide-8
SLIDE 8

Packaging Requirements

  • Automatic dependency resolution
  • Supported on Linux
  • Tools for install/update/remove/status
  • Running entirely in userspace, unprivileged
  • Multiple installed versions of the same software
slide-9
SLIDE 9

Unsuitable candidates

  • rpm/yum
  • deb/apt
  • portage
  • Arch User Repository
  • pacman
  • Reasons: too OS-specific, difficult to manage as an unprivileged user

slide-10
SLIDE 10

Pkgsrc

  • Originating in NetBSD
  • Supported on Linux
  • Self contained
  • Actively maintained
  • Can be used as a non-privileged user
  • Large collection of applications already packaged
  • Can make use of system provided dependencies
  • Allows maintaining a local set of packages
  • Could add packages to the main distribution
  • Supports binary and source packages
slide-11
SLIDE 11

Creating a package

DISTNAME=	vlet-1.3.2
CATEGORIES=	local
MASTER_SITES=	http://orange.ebioscience.amc.nl/pkgsrc/distfiles/
EXTRACT_SUFX=	.zip

MAINTAINER=	m.a.santcroos@amc.uva.nl
HOMEPAGE=	http://orange.ebioscience.amc.nl/pkgsrc/distfiles/
COMMENT=	This is the VL-e Toolkit
LICENSE=	apache-2.0

NO_CONFIGURE=	yes
NO_BUILD=	yes
PKG_DESTDIR_SUPPORT=	user-destdir
INSTALLATION_DIRS=	bin lib

post-extract:
	${CP} ${FILESDIR}/Makefile ${WRKSRC}/Makefile

.include "../../mk/bsd.pkg.mk"

slide-12
SLIDE 12

Package Tree Management

  • update-tree.sh

– Pull upstream pkgsrc changes
– Create tarball
– Put on website
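
The tarball and publish steps of such a script could be sketched roughly as follows. This is not the authors' update-tree.sh: the function names, the date-stamped tarball name, and publishing by a plain copy into a web-served directory are all assumptions made for illustration. The first step, pulling upstream changes, is only indicated in a comment because the slide does not say how the tree is tracked.

```shell
#!/bin/sh
# Hypothetical sketch of an update-tree.sh style workflow.
# All function names and paths are illustrative, not from the project code.

# Build a tarball name from a date stamp such as YYYYMMDD.
tarball_name() {
    printf 'pkgsrc-%s.tar.gz\n' "$1"
}

# Pack the package tree in directory $1 into the tarball $2.
create_tarball() {
    tree_dir=$1
    out=$2
    tar -czf "$out" -C "$(dirname "$tree_dir")" "$(basename "$tree_dir")"
}

# Copy the tarball $1 into the web-served directory $2.
publish_tarball() {
    cp "$1" "$2/"
}

# Example flow (step 1 depends on how the tree is tracked):
#   (cd pkgsrc && <pull upstream changes>)
#   name=$(tarball_name "$(date +%Y%m%d)")
#   create_tarball ./pkgsrc "/tmp/$name"
#   publish_tarball "/tmp/$name" /path/to/webroot
```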

slide-13
SLIDE 13

Implementation Principles

  • $VO_[VONAME]_SW_DIR is a directory shared between all worker nodes on a site

  • Run with a Software (VO) Manager proxy
  • Install packages per site / cluster / CE
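
A job on a worker node could locate the shared software area along these lines. The mapping from VO name to variable name (upper-casing, with dots and hyphens turned into underscores) is an assumption that should be checked on the target sites, and both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: find the shared software area for a VO on a worker node.

# Derive the variable name, e.g. vlemed -> VO_VLEMED_SW_DIR.
# Assumption: dots and hyphens in the VO name become underscores.
vo_sw_dir_var() {
    printf 'VO_%s_SW_DIR\n' \
        "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]' | tr '.-' '__')"
}

# Print the directory the variable points to; fail if it is unset or empty.
vo_sw_dir() {
    var=$(vo_sw_dir_var "$1")
    eval "dir=\${$var:-}"
    [ -n "$dir" ] || { echo "$var is not set" >&2; return 1; }
    printf '%s\n' "$dir"
}
```

A wrapper could then call `sw=$(vo_sw_dir vlemed) || exit 1` before touching any installed package.
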
slide-14
SLIDE 14

Architecture

[Architecture diagram. Components: Server (UI), Shared Storage Area, Worker Nodes. Connection labels: Management Jobs, Mount.]

slide-15
SLIDE 15

Managing packages

  • site-pkgtool.sh

– Program to manage packages centrally
– Initiates grid jobs

  • Install, Remove, Update
  • Init, Reinit, Check, Dump, Info, Version
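
A front end with this set of actions might validate its first argument as sketched below. The slide lists the action names but not their arguments or behaviour, so this example only checks and classifies them; the function name and the group labels "package" and "site" are invented for illustration and simply mirror the two bullets above.

```shell
#!/bin/sh
# Hypothetical sketch: validate an action name for a site-pkgtool.sh
# style front end. Matching is case-insensitive.
action_group() {
    action=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case $action in
        install|remove|update)
            echo package ;;
        init|reinit|check|dump|info|version)
            echo site ;;
        *)
            echo "unknown action: $1" >&2
            return 1 ;;
    esac
}
```
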
slide-16
SLIDE 16

Script on the worker node

  • pkgsrc-cmd.sh

– Wrapper program that runs on the worker node

  • Running as a grid job
slide-17
SLIDE 17

Information Management

  • list-installed-packages.sh

– Display information about installed packages for sites

  • get-site-status.sh

– Gather information from all supported sites

  • verify-package.sh

– Check if a certain package is available on a site

  • get-tags.sh

– Get all the package tags for the configured sites

slide-18
SLIDE 18

Installing a package

  • Check if distribution is fresh
  • Extract tree in scratch space
  • Build package and dependencies
  • Install package in shared software area
  • Install modulefile
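
The five steps above form a chain in which each step should run only if the previous one succeeded. A minimal sketch of that control flow, with every step reduced to a placeholder that merely reports what the real step would do (all function names are hypothetical):

```shell
#!/bin/sh
# Sketch of the install sequence; the step_* functions are placeholders.
step_check_fresh()  { echo "check that the distribution is fresh"; }
step_extract_tree() { echo "extract the package tree in scratch space"; }
step_build()        { echo "build $1 and its dependencies"; }
step_install()      { echo "install $1 in the shared software area"; }
step_modulefile()   { echo "install the modulefile for $1"; }

# Run the steps in order and stop at the first one that fails.
install_package() {
    pkg=$1
    step_check_fresh &&
    step_extract_tree &&
    step_build "$pkg" &&
    step_install "$pkg" &&
    step_modulefile "$pkg"
}
```

Chaining with `&&` means a failed build leaves the shared software area untouched, which matters because that area is visible to every worker node on the site.
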
slide-19
SLIDE 19

Environment Modules

  • “The Environment Modules package provides for the dynamic modification of a user's environment via modulefiles.”

  • Select versions
  • Setup environment
  • Integrates with system provided setup
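
To make this concrete, a minimal modulefile for one installed package version could be generated as sketched here. The function name and the assumption that the package lives under a single prefix with `bin` and `lib` subdirectories are illustrative; the project's actual modulefiles may look different.

```shell
#!/bin/sh
# Sketch: print a minimal modulefile for a package installed under a prefix.
# Assumption: executables are in $prefix/bin and libraries in $prefix/lib.
write_modulefile() {
    name=$1
    version=$2
    prefix=$3
    cat <<EOF
#%Module1.0
module-whatis "$name $version"
prepend-path PATH $prefix/bin
prepend-path LD_LIBRARY_PATH $prefix/lib
EOF
}
```

If the output is stored as `vlet/1.3.2` in a directory listed in `MODULEPATH`, a job can select that version with `module load vlet/1.3.2`.
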
slide-20
SLIDE 20

Tags

  • Software Tags in Information System (BDII)
  • Publish installed software versions per CE
  • Used for resource selection by adding it to the “Requirements” of a JDL

  • Use the lcg-ManageVOTag tool to publish tags
  • Structure of tags is VO-${vo}_SW_${package}
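
As an illustration of the tag structure and its use in a JDL, the sketch below builds a tag and a matching Requirements line. Two things are assumptions: that the package part carries name and version together (for example `vlet-1.3.2`), and that the tag is matched against the Glue attribute `GlueHostApplicationSoftwareRunTimeEnvironment`. The function names are hypothetical.

```shell
#!/bin/sh
# Build a software tag with the structure VO-${vo}_SW_${package}.
sw_tag() {
    printf 'VO-%s_SW_%s\n' "$1" "$2"
}

# Emit a JDL Requirements line selecting CEs that publish the tag.
# Assumption: tags appear in GlueHostApplicationSoftwareRunTimeEnvironment.
jdl_requirement() {
    printf 'Requirements = Member("%s", other.GlueHostApplicationSoftwareRunTimeEnvironment);\n' \
        "$(sw_tag "$1" "$2")"
}
```

Calling `jdl_requirement vlemed vlet-1.3.2` prints a line that can be pasted into a job description so that the job is matched only to CEs advertising that package.
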
slide-21
SLIDE 21

Practical issues

  • Tags are not omnipresent
  • Shared area can become bottleneck
  • No intelligent matching on tags
slide-22
SLIDE 22

Conclusions

  • Flexible software management system
  • Relieves burden from user
  • Creating packages is still labor-intensive work
slide-23
SLIDE 23

Discussion

  • One size fits all? (Did we reinvent the wheel?)
  • Connect to EGI AppDB?
  • EMI Community Repositories?
  • Usable for data distribution?
  • Other mechanism for matching?
slide-24
SLIDE 24

Links

  • pkgsrc

– http://www.netbsd.org/docs/software/packages.html

  • Modules

– http://modules.sourceforge.net/

  • BiG Grid

– http://www.biggrid.nl/

  • Bioinformatics Laboratory

– http://www.bioinformaticslaboratory.nl/

  • Project Code

– http://dvandok.github.com/userspace-package-management/

slide-25
SLIDE 25

Acknowledgements

  • AMC Bioinformatics Laboratory

– Prof. dr. Antoine van Kampen
– Dr. Silvia Delgado Olabarriaga
– Barbera van Schaik

  • BiG Grid / Nikhef

– Jan Just Keijser

slide-26
SLIDE 26

Thanks!