SLIDE 1

Global Software Distribution with CernVM-FS

Jakob Blomer, CERN
CCL Workshop on Scalable Computing, October 19th, 2016

SLIDE 2

The Anatomy of a Scientific Software Stack

(In High Energy Physics)

SLIDE 3

The Anatomy of a Scientific Software Stack

(In High Energy Physics)

Software stack (bottom to top):

  • CentOS 6 and Utilities: O(10) libraries
  • Simulation and I/O Libraries: ROOT, Geant4, MC-XYZ
  • CMS Software Framework: O(1000) C++ classes
  • My Analysis Code: < 10 Python classes

SLIDE 4

The Anatomy of a Scientific Software Stack

(In High Energy Physics)

Software stack (bottom to top):

  • CentOS 6 and Utilities: O(10) libraries
  • Simulation and I/O Libraries: ROOT, Geant4, MC-XYZ
  • CMS Software Framework: O(1000) C++ classes
  • My Analysis Code: < 10 Python classes

How to install on . . .

  • my laptop: compile into /opt, ∼ 1 week
  • my local cluster: ask the sys-admin to install in /nfs/software, > 1 week
  • someone else's cluster: ?

SLIDE 5

The Anatomy of a Scientific Software Stack

(In High Energy Physics)

Software stack (bottom to top):

  • CentOS 6 and Utilities: O(10) libraries
  • Simulation and I/O Libraries: ROOT, Geant4, MC-XYZ
  • CMS Software Framework: O(1000) C++ classes
  • My Analysis Code: < 10 Python classes

Lower layers: stable. Upper layers: changing.

How to install (again) on . . .

  • my laptop: compile into /opt, ∼ 1 week
  • my local cluster: ask the sys-admin to install in /nfs/software, > 1 week
  • someone else's cluster: ?

SLIDE 6

Beyond the Local Cluster

Worldwide LHC Computing Grid

  • ∼ 200 sites: from 100 to 100 000 cores
  • Different countries, institutions, batch schedulers, OSs, . . .
  • Augmented by clouds, supercomputers, LHC@Home

SLIDE 7

What about Docker?

Example: in Docker

  $ docker pull r-base          → 1 GB image
  $ docker run -it r-base
  $ ... (fitting tutorial)      → only 30 MB used

[Diagram: a container ("App") bundles the application with its libraries on top of Linux.]

It's hard to scale Docker:

  • iPhone app: 20 MB, changes every month, phones update staggered
  • Docker image: 1 GB, changes twice a week, servers update synchronized

→ Your preferred cluster or supercomputer might not run Docker

SLIDE 8

A File System for Software Distribution

[Diagram: on each worker node, the software file system appears next to the basic system utilities on top of the OS kernel; behind it sits a global HTTP cache hierarchy: the worker node's memory buffer (megabytes), the worker node's disk cache (gigabytes), and a central web server holding the entire software stack (terabytes).]

Pioneered by CCL's GROW-FS for CDF at the Tevatron; refined in CernVM-FS, in production for CERN's LHC and other experiments.

1. Single point of publishing
2. HTTP transport, access and caching on demand (sketched below)
3. Important for scaling: bulk meta-data download (not shown)
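
A minimal sketch of the cache-on-demand idea, assuming libcurl, a placeholder web server URL, a made-up data/<hash> URL layout, and a toy cache directory; this is an illustration, not CernVM-FS code:

#include <stdio.h>
#include <sys/stat.h>
#include <curl/curl.h>

/* Download an object into the local cache unless it is already there. */
static int fetch_if_missing(const char *server, const char *hash,
                            const char *cache_dir)
{
    char url[512], cache_path[512];
    struct stat st;

    snprintf(cache_path, sizeof(cache_path), "%s/%s", cache_dir, hash);
    if (stat(cache_path, &st) == 0)
        return 0;                                  /* cache hit: no network I/O */

    snprintf(url, sizeof(url), "%s/data/%s", server, hash);  /* assumed URL layout */
    FILE *f = fopen(cache_path, "wb");
    if (!f)
        return -1;

    CURL *curl = curl_easy_init();
    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, f);  /* default callback writes to the FILE* */
    curl_easy_setopt(curl, CURLOPT_FAILONERROR, 1L);
    CURLcode rc = curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    fclose(f);

    if (rc != CURLE_OK) {
        remove(cache_path);                        /* do not keep a broken cache entry */
        return -1;
    }
    return 0;
}

int main(void)
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    mkdir("/tmp/demo-cache", 0755);                /* toy cache directory */
    fetch_if_missing("http://webserver.example.org/cvmfs/demo.example.org",
                     "806fbb67373e9", "/tmp/demo-cache");
    curl_global_cleanup();
    return 0;
}

Repeated requests for the same object are then served from local disk, so the network only sees each object once per cache.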

SLIDE 9

One More Ingredient: Content-Addressable Storage

[Diagram: the software publisher / master source works on a read/write file system; a transformation turns it into content-addressed objects, which reach the worker nodes as a read-only file system via HTTP transport, caching and replication.]

Two independent issues

1. How to mount a file system (on someone else's computer)?
2. How to distribute immutable, independent objects?

SLIDE 10

Content-Addressable Storage: Data Structures

Repository

  /cvmfs/icecube.opensciencegrid.org/
    amd64-gcc6.0/
      4.2.0/
        ChangeLog → 806fbb67373e9...
        . . .

The directory tree is mapped onto an object store and file catalogs (compression, SHA-1).

Object Store

  • Compressed files and chunks
  • De-duplicated

File Catalog

  • Directory structure, symlinks
  • Content hashes of regular files
  • Digitally signed ⇒ integrity, authenticity
  • Time to live
  • Partitioned / Merkle hashes (possibility of sub-catalogs)

⇒ Immutable files, trivial to check for corruption, versioning
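
To make the object store concrete, here is a small sketch (an illustration, not CernVM-FS source) that compresses a buffer with zlib, hashes the compressed bytes with SHA-1, and derives an object path from the hash; the data/<2 hex digits>/<remaining digits> layout is an assumption made for the example:

#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>
#include <openssl/sha.h>

int main(void)
{
    const unsigned char content[] = "example file content\n";

    /* 1. compress the file content */
    uLongf zlen = compressBound(sizeof(content));
    unsigned char *zbuf = malloc(zlen);
    compress(zbuf, &zlen, content, sizeof(content));

    /* 2. hash the compressed object */
    unsigned char digest[SHA_DIGEST_LENGTH];
    SHA1(zbuf, zlen, digest);

    /* 3. derive the object path from the hash (assumed layout) */
    char hex[2 * SHA_DIGEST_LENGTH + 1];
    for (int i = 0; i < SHA_DIGEST_LENGTH; i++)
        sprintf(hex + 2 * i, "%02x", digest[i]);
    printf("object path: data/%.2s/%s\n", hex, hex + 2);

    free(zbuf);
    return 0;
}

Because the path is a pure function of the content, identical files land on the same object (de-duplication), and a client can re-hash a downloaded object to detect corruption.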

SLIDE 11

Transactional Publish Interface

[Diagram: the read/write interface is a union file system (AUFS or OverlayFS) that overlays a read/write scratch area on top of the read-only CernVM-FS mount; the published result is stored on a file system or in S3.]

Publishing New Content

[ ~ ]# cvmfs_server transaction icecube.opensciencegrid.org
[ ~ ]# make DESTDIR=/cvmfs/icecube.opensciencegrid.org/amd64-gcc6.0/4.2.0 install
[ ~ ]# cvmfs_server publish icecube.opensciencegrid.org

Uses cvmfs-server tools and an Apache web server

SLIDE 12

Transactional Publish Interface

[Diagram: the read/write interface is a union file system (AUFS or OverlayFS) that overlays a read/write scratch area on top of the read-only CernVM-FS mount; the published result is stored on a file system or in S3.]

Publishing New Content

[ ~ ]# cvmfs_server transaction icecube.opensciencegrid.org
[ ~ ]# make DESTDIR=/cvmfs/icecube.opensciencegrid.org/amd64-gcc6.0/4.2.0 install
[ ~ ]# cvmfs_server publish icecube.opensciencegrid.org

Uses cvmfs-server tools and an Apache web server

Reproducible: as in git, you can always come back to this state

SLIDE 13

Content Distribution over the Web

Server side: stateless services.

[Diagram: worker nodes in a data center fetch over HTTP from a caching proxy (O(100) nodes per proxy server), which in turn fetches over HTTP from a web server (O(10) data centers per web server).]
SLIDE 14

Content Distribution over the Web

Server side: stateless services.

[Diagram: the same setup with several caching proxies per data center; worker nodes load-balance their HTTP requests across them.]
SLIDE 15

Content Distribution over the Web

Server side: stateless services.

[Diagram: if a caching proxy fails, worker nodes fail over to another proxy via HTTP.]
SLIDE 16

Content Distribution over the Web

Server side: stateless services.

[Diagram: the single web server is replaced by a set of mirror servers; clients are directed to the closest one by Geo-IP, keeping the proxy failover.]
SLIDE 17

Content Distribution over the Web

Server side: stateless services.

[Diagram: if a mirror server becomes unavailable, the caching proxies fail over to another mirror.]
SLIDE 18

Content Distribution over the Web

Server side: stateless services.

[Diagram: in addition, worker nodes can be seeded with a prefetched cache, so that only the remaining objects travel through the caching proxies (O(100) nodes per proxy) and mirror servers (O(10) data centers per server).]
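
The failover shown in these diagrams boils down to trying an ordered list of URLs. Below is a hedged illustration with libcurl and placeholder mirror names (in practice the ordering would come from Geo-IP and the list would include the site's caching proxies); it is not the actual CernVM-FS download manager:

#include <stdio.h>
#include <curl/curl.h>

static size_t discard(void *ptr, size_t size, size_t nmemb, void *userdata)
{
    (void)ptr; (void)userdata;
    return size * nmemb;               /* a real client would write into the cache */
}

int main(void)
{
    const char *mirrors[] = {          /* ordered list, e.g. closest mirror first */
        "http://mirror-eu.example.org/cvmfs/demo.example.org/data/806fbb67373e9",
        "http://mirror-us.example.org/cvmfs/demo.example.org/data/806fbb67373e9",
    };
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, discard);
    curl_easy_setopt(curl, CURLOPT_FAILONERROR, 1L);
    curl_easy_setopt(curl, CURLOPT_TIMEOUT, 10L);  /* give up quickly and fail over */

    for (unsigned i = 0; i < sizeof(mirrors) / sizeof(mirrors[0]); i++) {
        curl_easy_setopt(curl, CURLOPT_URL, mirrors[i]);
        if (curl_easy_perform(curl) == CURLE_OK) {
            printf("served by %s\n", mirrors[i]);
            break;                     /* first working mirror wins */
        }
        fprintf(stderr, "%s failed, trying next mirror\n", mirrors[i]);
    }
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}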

SLIDE 19

Mounting the File System
Client: Fuse

Available for RHEL, Ubuntu, OS X; Intel, ARM, Power. Works on most grids and virtual machines (cloud).

[Diagram: an open("/ChangeLog") call passes from glibc through the syscall interface into the kernel VFS (inode and dentry caches, next to ext3, NFS, ...), is routed to the Fuse module and via /dev/fuse to the user-space CernVM-FS client (libfuse), which resolves the SHA-1, fetches the object with an HTTP GET, inflates and verifies it, and returns a file descriptor.]

SLIDE 20

Mounting the File System
Client: Parrot

Available for Linux / Intel. Works on supercomputers, opportunistic clusters, in containers.

[Diagram: with the Parrot sandbox there is no kernel-side mount: the open("/ChangeLog") call is intercepted in user space by Parrot (libparrot), which calls libcvmfs directly to resolve the SHA-1, fetch the object with an HTTP GET, inflate and verify it, and return a file descriptor.]
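
A typical usage pattern (a hedged sketch; the exact invocation and flags depend on the cctools version) is to start the job under the sandbox, e.g. with parrot_run bash, and then simply read paths below /cvmfs/<repository> as if the file system were mounted, with Parrot and libcvmfs translating each access into HTTP requests against the repository.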

SLIDE 21

Scale of Deployment

  • > 350 million files under management
  • > 50 repositories
  • Installation service by OSG and EGI

SLIDE 22

Docker Integration

[Diagram: today, the Docker daemon pulls and pushes whole container images from a Docker registry; a funded project is building an improved Docker daemon that obtains files on demand from the CernVM File System (file-based transfer).]

Under Construction!

SLIDE 23

Client Cache Manager Plugins

[Diagram: the cvmfs/fuse and libcvmfs/parrot clients talk over a transport channel (TCP, socket, ...) to a cache manager, a key-value store provided as a C library, with backends such as memory, Ceph, RAMCloud, and third-party plugins.]

Draft C Interface

cvmfs_add_refcount(struct hash object_id, int change_by);
cvmfs_pread(struct hash object_id, int offset, int size, void *buffer);

// Transactional writing in fixed-sized chunks
cvmfs_start_txn(struct hash object_id, int txn_id, struct info object_info);
cvmfs_write_txn(int txn_id, void *buffer, int size);
cvmfs_abort_txn(int txn_id);
cvmfs_commit_txn(int txn_id);
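
For illustration, a toy in-memory backend behind two of these draft calls; struct hash, the int return values, and the fixed-size table are assumptions made for this sketch, not part of the actual plugin API:

#include <string.h>

struct hash { unsigned char digest[20]; };          /* e.g. a SHA-1 */

struct entry {                                      /* toy object table slot */
    struct hash id;                                 /* all-zero id marks an unused slot */
    int refcount;
    int size;
    char data[4096];
};

static struct entry table[64];

static struct entry *lookup(struct hash id)
{
    for (int i = 0; i < 64; i++)
        if (memcmp(&table[i].id, &id, sizeof(id)) == 0)
            return &table[i];
    return 0;
}

int cvmfs_add_refcount(struct hash object_id, int change_by)
{
    struct entry *e = lookup(object_id);
    if (!e)
        return -1;
    e->refcount += change_by;                       /* pin or release the object */
    return 0;
}

int cvmfs_pread(struct hash object_id, int offset, int size, void *buffer)
{
    struct entry *e = lookup(object_id);
    if (!e || offset < 0 || offset + size > e->size)
        return -1;
    memcpy(buffer, e->data + offset, size);
    return size;                                    /* number of bytes read */
}

A real cache manager would also implement the transactional write calls and could evict any object whose reference count has dropped to zero.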

Under Construction!

SLIDE 24

Summary

CernVM-FS

  • Global, HTTP-based file system for software distribution
  • Works great with Parrot
  • Optimized for small files, heavy meta-data workload
  • Open source (BSD), used beyond high-energy physics

Use Cases

  • Scientific software
  • Distribution of static data, e. g. conditions, calibration
  • VM / container distribution, cf. CernVM
  • Building block for long-term data preservation

Source code: https://github.com/cvmfs/cvmfs
Downloads: https://cernvm.cern.ch/portal/filesystem/downloads
Documentation: https://cvmfs.readthedocs.org
Mailing list: cvmfs-talk@cern.ch

SLIDE 25

Backup Slides

SLIDE 26

CernVM-FS Client Tools

Fuse Module

  • Normal namespace: /cvmfs/<repository>, e. g. /cvmfs/atlas.cern.ch
  • Private mount as a user possible
  • One process per fuse module + watchdog process
  • Cache on local disk, LRU managed
  • NFS export mode
  • Hotpatch functionality: cvmfs_config reload

Parrot

  • Built in by default

Mount helpers

  • Set up the environment (number of file descriptors, access rights, ...)
  • Used by autofs on /cvmfs
  • Used by /etc/fstab or by mount as root:
    mount -t cvmfs atlas.cern.ch /cvmfs/atlas.cern.ch

Diagnostics

  • Nagios check available
  • cvmfs_config probe
  • cvmfs_config chksetup
  • cvmfs_fsck
  • cvmfs_talk: connect to a running instance

SLIDE 27

Experiment Software from a File System Viewpoint

[Plot: statistics over 2 years; number of file system entries (×10^6): files.]

Software Directory Tree

[Directory tree: atlas.cern.ch / repo / software / x86_64-gcc43 / {17.1.0, 17.2.0, ...}]

SLIDE 28

Experiment Software from a File System Viewpoint

[Plot: statistics over 2 years; file system entries (×10^6), labels: files, kernel, duplicates.]

Software Directory Tree

[Directory tree: atlas.cern.ch / repo / software / x86_64-gcc43 / {17.1.0, 17.2.0, ...}]

SLIDE 29

Experiment Software from a File System Viewpoint

[Plot: statistics over 2 years; file system entries (×10^6), labels: files, kernel, duplicates.]

Software Directory Tree

[Directory tree: atlas.cern.ch / repo / software / x86_64-gcc43 / {17.1.0, 17.2.0, ...}]

Between consecutive software versions: only ≈ 15 % new files

SLIDE 30

Experiment Software from a File System Viewpoint

[Plot: statistics over 2 years; file system entries (×10^6), labels: files, kernel, duplicates, directories, symlinks.]

Software Directory Tree

[Directory tree: atlas.cern.ch / repo / software / x86_64-gcc43 / {17.1.0, 17.2.0, ...}]

Fine-grained software structure (Conway's law)

Between consecutive software versions: only ≈ 15 % new files

SLIDE 31

Directory Organization

[Plot: fraction of files (%) versus directory depth for Athena 17.0.1, CMSSW 4.2.4, and LCG Externals R60.]

Typical (non-LHC) software: majority of files in directory level ≤ 5

SLIDE 32

Cumulative File Size Distribution

[Plot: cumulative file size distribution; percentile versus file size in bytes (2^4 to 2^18) for ATLAS, LHCb, ALICE, CMS, "Unix", "Web server", and requested files.]

  • cf. Tanenbaum et al. 2006 for "Unix" and "Web server"

Good compression rates (factor 2–3)

SLIDE 33

Runtime Behavior

Working Set

  • ≈10 % of all available files are requested at runtime
  • Median of file sizes: < 4 kB

Flash Crowd Effect

  • Up to 500 kHz meta-data request rate
  • Up to 1 kHz file open request rate

[Diagram: many jobs hitting the shared software area (/share) at the same time, in effect a distributed denial of service (DDoS) on the software server.]

SLIDE 34

Software vs. Data

Based on ATLAS figures, 2012:

  Software                            Data
  POSIX interface                     put, get, seek, streaming
  File dependencies                   Independent files
  10^7 objects                        10^8 objects
  10^12 B volume                      10^16 B volume
  Whole files                         File chunks
  Absolute paths                      Any mount point
  Open source                         Confidential
  WORM ("write-once-read-many")       Versioned
