Jenkins + CVMFS: Distributed Development, Centralised Delivery
Bruce Becker | bbecker@csir.co.za
Coordinator: SAGrid
SANREN, Meraka Institute, CSIR
Bruce Becker: Coordinator, SAGrid | bbecker@csir.co.za | http://www.sagrid.ac.za
Outline
- What users want
- SAGrid VO – a catch-all VO with many applications
- Problem statements:
  - Problem 1: "the usual problem" - maintaining applications in a distributed computing environment
  - Problem 2: "another usual problem" - maintaining a complex application inventory
- General solution : CVMFS + Jenkins
- Some specifics of SAGrid CI platform
- Outlook
SAGrid as a catch-all VO
- The South African National Grid operates a catch-all VO which all South African researchers can use to access computing and data resources.
- The SAGrid VO is not domain-specific, so the applications it supports have widely varying uses
- Applications are requested by users or communities themselves
What users want
- Amazing infrastructure
- Some users want a highly varied, modular application selection
- Vertically integrated
- Highly specialised applications
- Highly trained support
What users get sometimes
The problem (1) - "the usual problem"
- Software distribution was done mostly "by hand":
  - Someone from the ops team develops a script to install the application
  - Apps are installed via job submission
  - Tags are applied via script or by the job itself
- Issues:
  - Major overhead of work
  - Inconsistent installation procedures between applications and sites
  - Bottleneck in porting applications (has to be done by someone in the VO)
  - Duplication of effort, especially in dependencies of applications
  - Difficult to manage application lifecycles
The problem (2) - what about the community?
- Managing the inventory in a catch-all VO can be complex when there are many applications
- Prioritising porting requests depends on the knowledge of the expert porting the application
- Can lead to major delays in porting and deploying applications
- However, a user or community usually has an expert who knows how to tune, port and configure the application properly, as well as its dependencies
- Usually, "they" have to conform to "us": learn grid tools and terminology, etc.
Problem (3): Changes to the playing field
- New middleware stacks
- New architectures – GPGPU, ARM
Questions to answer
- How do we lower the barrier to entry to the grid or cloud infrastructure?
- How can the application expert prove to the resource provider that the application will actually run on the execution environment of the site?
- How can we manage the lifecycle of applications across multiple versions, architectures and configurations?
- How can we ensure that once applications are "certified", they are actually available on as many sites as possible?
General Solution: Jenkins + CVMFS
- The issues outlined are "typical" in a large software project
- Usually solved by judicious use of a Continuous Integration system
- Once applications have been "ported", put them into a trusted repository
- Previously we built RPMs, but these required site-admin intervention
- One-time configuration with CVMFS
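The one-time site configuration mentioned above might look roughly like the following on a worker node. This is only a sketch under assumptions: the repository name sagrid.ac.za is taken from the "Next steps" slide, while the proxy URL, the quota value and the helper name write_cvmfs_client_config are placeholders invented for illustration. The helper writes into a directory passed as an argument, so it can be tried without touching /etc/cvmfs.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal CVMFS client configuration.
# CVMFS_REPOSITORIES, CVMFS_HTTP_PROXY and CVMFS_QUOTA_LIMIT are standard
# client parameters; the values used here are assumptions for illustration.
write_cvmfs_client_config() {
    conf_dir=$1      # would be /etc/cvmfs on a real worker node
    proxy=$2         # site squid proxy (placeholder URL below)
    mkdir -p "$conf_dir" || return 1
    cat > "$conf_dir/default.local" <<EOF
CVMFS_REPOSITORIES=sagrid.ac.za
CVMFS_HTTP_PROXY="$proxy"
CVMFS_QUOTA_LIMIT=20000
EOF
}

# Dry run into a scratch directory.
scratch=$(mktemp -d)
write_cvmfs_client_config "$scratch" "http://squid.example.org:3128"
cat "$scratch/default.local"
```

On a real node the site admin would also need the repository's public key and server URL in place, and would typically finish with `cvmfs_config setup` followed by `cvmfs_config probe`. After that, new application versions reach the site without further admin intervention.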
First, some changes
- Distribute the effort, centralise the tools
- Move the repository from a "closed" SVN repo (https://ops.sagrid.ac.za/trac/svn/repo) to git (https://github.com/SAGridOps/SoftwareInstallation)
- Don't have to give write access to a single repo; instead, accept pull requests
- Take advantage of all the GitHub infrastructure
- Expand possible contributors to those "outside" the infrastructure
- Recognise individuals' contributions
Recognise individuals...
Decentralise the team
Collaborate with code
Let the robots do the work
- Define what we want to deploy, and let the experts take care of how to deploy it
- DevOps paradigm: same review/tag/release mechanisms on operations code as we have for scientific applications
- Teach a marketable skill
- Allow specialisation
- Enable remote management of complex services
- Ensure that the published methodology is the adopted methodology
Quality Control and feedback
- Ensure that requested applications are included in the repo
- Provide testing and QA infrastructure
- Self-service for users
The CI environment
- Jenkins is extremely flexible... can do almost anything
- AuthN/AuthZ
  - Currently using GitHub OAuth
  - Take advantage of future Identity Federation
- We wanted to simulate different execution environments
  - Already in production
  - Planned for the future
- Track and re-use dependencies
Matrix-based builds
- Independent builds and build statuses for different configurations:
  - Application name
  - Version
  - OS
  - Architecture
  - … can add specific tuning configurations...
- We can see exactly what's broken where, and so build more resilient integration code.
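The matrix axes listed above can be pictured as nested loops, with one independent build (and build status) per cell. The sketch below only enumerates the cells and prints the artifact path each one would publish, following the /repo/$SITE/$OS/$ARCH/<name>/<version>/build.tar.gz layout that appears in the generic build script. The axis values, the SITE value and the function name matrix_cells are hypothetical examples, not the actual SAGrid inventory.

```shell
#!/bin/sh
# Enumerate a build matrix: application x version x OS x architecture.
# All axis values below are hypothetical examples.
SITE=generic
APPS="gsl:1.16 fftw:2.1.5"      # name:version pairs
OSES="sl6 u1204"
ARCHES="x86_64 i386"

matrix_cells() {
    for app in $APPS; do
        name=${app%%:*}          # text before the first colon
        version=${app#*:}        # text after the first colon
        for os in $OSES; do
            for arch in $ARCHES; do
                # Same layout the generic build script reads artifacts from.
                echo "/repo/$SITE/$os/$arch/$name/$version/build.tar.gz"
            done
        done
    done
}

matrix_cells
```

In Jenkins itself the same expansion is provided by a multi-configuration (matrix) project, so no hand-written loop is needed. The loop is shown only to make explicit how many independent cells a small matrix already produces.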
Typical workflow
[Workflow diagram; labels: Infrastructure expert, Application developer, "Defines relevant tests in Jenkins", "Reads description of execution environment tests", "Writes code to pass required tests", Testing matrix, Dev/Stage env., "Promote a build to CVMFS"]
Dependency management: simple case
- Common problem with applications: need a specific version of a compiler
- Compiling the compiler can itself be tricky...
- Jenkins tests the full dependency chain necessary
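One way to picture "testing the full dependency chain" is a pre-build step that refuses to start until every dependency already has a published artifact. The following is a sketch under assumptions, not the SAGrid job definition: the function name check_dependencies and the hdf5 version number are invented for illustration, and the artifact layout is borrowed from the generic build script.

```shell
#!/bin/sh
# Hypothetical pre-build check: every dependency must already have a
# published artifact, otherwise the build should fail early.
check_dependencies() {
    repo_root=$1; shift          # e.g. /repo/$SITE/$OS/$ARCH
    missing=0
    for dep in "$@"; do          # each dep is "name/version"
        if [ ! -f "$repo_root/$dep/build.tar.gz" ]; then
            echo "missing artifact for $dep" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Demonstration against a scratch repository holding a single artifact.
root=$(mktemp -d)
mkdir -p "$root/fftw/2.1.5"
: > "$root/fftw/2.1.5/build.tar.gz"

if check_dependencies "$root" fftw/2.1.5; then
    echo "fftw chain complete"
fi
if ! check_dependencies "$root" fftw/2.1.5 hdf5/1.8.12; then
    echo "hdf5 not built yet"
fi
```

Reporting every missing dependency before returning, rather than stopping at the first one, gives the application expert the whole list of upstream jobs that still need to succeed.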
Real-world application
- GADGET: astrophysics hydrodynamic simulations
- Many (levels of) dependencies
Public Application Dashboard
Authenticated view
Generic build script
# Set up the environment
# GADGET requires HDF5, FFTW2, ZLIB and openmpi
module add ci
module add fftw/2.1.5
module add hdf5
module add openmpi
module add gsl

# Clean build, retrieve dependency artifacts
rm -rf "$FFTW_DIR"
tar xvfz "/repo/$SITE/$OS/$ARCH/fftw/$FFTW_VERSION/build.tar.gz" -C /
rm -rf "$HDF5_DIR"
tar xvfz "/repo/$SITE/$OS/$ARCH/hdf5/$HDF5_VERSION/build.tar.gz" -C /
rm -rf "$OPENMPI_DIR"
tar xvfz "/repo/$SITE/$OS/$ARCH/openmpi/$OPENMPI_VERSION/build.tar.gz" -C /
rm -rf "$GSL_DIR"
tar xvfz "/repo/$SITE/$OS/$ARCH/gsl/$GSL_VERSION/build.tar.gz" -C /
Generic build script (continued)
# Actually build... Create the artifact
make install DESTDIR="$WORKSPACE/build"
mkdir -p "$REPO_DIR"
rm -rf "$REPO_DIR"/*
tar -cvzf "$REPO_DIR/build.tar.gz" -C "$WORKSPACE/build" apprepo

# Create the modulefile
(
cat <<MODULE_FILE
#%Module1.0
## $NAME modulefile
##
proc ModulesHelp { } {
    puts stderr " This module does nothing but alert the user"
    puts stderr " that the [module-info name] module is not available"
}
# One prereq line per dependency, so that all of them are required
prereq gsl
prereq fftw/2.1.5
prereq hdf5
module-whatis "$NAME $VERSION."
setenv GSL_VERSION $VERSION
setenv GSL_DIR /apprepo/$::env(SITE)/$::env(OS)/$::env(ARCH)/$NAME/$VERSION
prepend-path LD_LIBRARY_PATH $::env(GSL_DIR)/lib
MODULE_FILE
) > modules/$VERSION
So, it works! … almost
Next steps
- We have an open, collaborative, low-barrier platform for researchers to bring applications to the grid
- Small technical tasks:
  - Implement promoted builds mechanism to populate the sagrid.ac.za CVMFS repo
  - Implement SAML AuthN, integrate IdF
  - Probes to check that CVMFS is mounted on sites (?)
- Operating in "stealth mode" at the moment: not advertising, but open to anyone who is interested, in order to collect feedback
- Addressing specific user communities to test drive the system:
  - Machine learning astro applications (rapid prototyping)
  - Bioinformatics application suites (complex ecosystem)
- Present the next phase of the project in November in Cape Town and move to production
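The proposed probe for checking that CVMFS is mounted on a site could be as small as the sketch below. It is an assumption about what such a probe might do, not an existing SAGrid check: the function name probe_repo and the OK/CRITICAL wording are invented, and the default path assumes the repository appears under /cvmfs with the name sagrid.ac.za. A repository is treated as available when its directory can be listed and is not empty; listing the path also gives autofs the chance to mount it on demand.

```shell
#!/bin/sh
# Hypothetical site probe: succeed only if the repository directory
# exists, can be listed, and contains at least one entry.
probe_repo() {
    repo_path=$1                 # e.g. /cvmfs/sagrid.ac.za
    [ -d "$repo_path" ] || return 1
    first_entry=$(ls -A "$repo_path" 2>/dev/null | head -n 1)
    [ -n "$first_entry" ]
}

# Probe the path given on the command line, or the assumed default.
if probe_repo "${1:-/cvmfs/sagrid.ac.za}"; then
    echo "OK: repository is available"
else
    echo "CRITICAL: repository is not available"
fi
```

Because the function takes the path as an argument, the same check can be pointed at any other repository a site is expected to mount.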