Torrent-based software distribution Costin Grigoras Pablo Saiz - - PowerPoint PPT Presentation

SLIDE 1

Torrent-based software distribution

Costin Grigoras Pablo Saiz

ALICE Offline Week – 24.06.2009

SLIDE 2

Current way of distributing software

➢ Build servers (AliRoot & deps): SLC4 32bit, SLC4 64bit, SLC5 32bit, SLC5 64bit, Ubuntu 64bit, SLC4 Itanium, Mac 32bit, Mac 64bit
➢ AliEn: ALICE::CERN::SE, Catalogue
➢ Grid Site X: VoBox (PackMan), shared software area (NFS/AFS/...), worker nodes

SLIDE 3

Current way of distributing software

Advantages

➢ A single service per site manages the installation of required packages

Disadvantages

➢ Shared software area is a single point of failure / bottleneck
➢ Difficult to update packages while keeping the version number
➢ Need to keep a short list of active software packages

SLIDE 4

How can we avoid using a shared software area?

➢ Worker nodes are independent
➢ Self-consistent software packages are required
➢ No site-local software repository
➢ Avoid overloading central software repositories
➢ Would be nice to be able to quickly update software packages if needed
➢ We are trying to use BitTorrent technology to solve all the above

SLIDE 5

Preparing for torrent

➢ package.tar.bz2 is split into chunks of equal size
➢ package.tar.bz2.torrent (tens of KB) holds metadata info of the original file:
  • SHA1 hashes of chunks
  • SHA1 hash of the entire file (uniquely identifies the file)
  • Tracker location (entry point)
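
A minimal sketch of how such a metadata file could be produced, assuming the standard single-file BitTorrent metainfo layout. The function names, the default 256 KiB chunk size and the tracker URL in the usage note are illustrative assumptions, not taken from the AliEn implementation. Note that in the standard format the identifying hash (the "info hash") is computed over the bencoded info dictionary, which covers the file name, length and all chunk hashes, rather than directly over the file bytes.

```python
import hashlib


def bencode(value):
    """Encode ints, strings/bytes, lists and dicts using BitTorrent bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def make_torrent(data, name, announce, piece_length=256 * 1024):
    """Return (torrent_bytes, info_hash_hex) for a single in-memory file."""
    # One 20-byte SHA1 digest per equal-size chunk (the last may be shorter).
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    info = {
        "name": name,
        "length": len(data),
        "piece length": piece_length,
        "pieces": pieces,
    }
    meta = {"announce": announce, "info": info}
    info_hash = hashlib.sha1(bencode(info)).hexdigest()
    return bencode(meta), info_hash
```

For example, `make_torrent(open("package.tar.bz2", "rb").read(), "package.tar.bz2", "http://tracker.example:8088/announce")` would return the bytes to write to `package.tar.bz2.torrent` together with the hash that identifies this exact version of the package. Any change to the archive changes at least one chunk hash and therefore the identifying hash.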

SLIDE 6

Data flow in torrent networks

➢ Tracker: discovery service that keeps track of who has which files/chunks. HTTP-based protocol.
➢ Seeders: clients that have the complete file and serve it.
➢ Clients: are in the process of downloading the file. Cooperate to download faster.
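
As a rough illustration of the HTTP-based tracker protocol, the sketch below builds the announce request a client would send to discover peers, using the query parameters defined by the standard BitTorrent tracker protocol. The function name is hypothetical, and the `/announce` path and port 6881 in the usage note are assumptions; only the tracker host and port 8088 appear in these slides.

```python
from urllib.parse import urlencode


def build_announce_url(tracker_url, info_hash, peer_id, port, left,
                       uploaded=0, downloaded=0, event="started"):
    """Build the HTTP GET URL a client uses to announce itself to a tracker."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must both be 20 bytes long")
    query = urlencode({
        "info_hash": info_hash,    # raw 20-byte SHA1, percent-encoded by urlencode
        "peer_id": peer_id,        # 20-byte identifier chosen by the client
        "port": port,              # port on which this client accepts peers
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,              # bytes still missing; 0 means a seeder
        "compact": 1,
        "event": event,
    })
    return f"{tracker_url}?{query}"
```

A worker node that already holds the complete package would announce with `left=0`, for instance `build_announce_url("http://alitorrent.cern.ch:8088/announce", info_hash, peer_id, 6881, left=0)`, and the tracker would then list it as a seeder for other clients.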

SLIDE 7

Implementation in AliEn

➢ Build servers (AliRoot & deps): SLC4 32bit, SLC4 64bit, SLC5 32bit, SLC5 64bit, Ubuntu 64bit, SLC4 Itanium, Mac 32bit, Mac 64bit
➢ AliEn: http://alitorrent.cern.ch, Seeder (alitorrent:8092), Tracker (alitorrent:8088), Catalogue (torrent://...)
➢ Grid Site X: VoBox, worker nodes

SLIDE 8

Implementation in AliEn

➢ Worker nodes keep seeding the packages that they have downloaded
➢ Other worker nodes will fetch the content mostly from local nodes
➢ Worker nodes from site A are usually firewalled from site B, so there is no inter-site traffic
➢ If the initial download is not possible via torrent, fall back to wget and then seed the fetched files
➢ Multiple versions of the same file can co-exist since they will have different hash codes; old ones will be gracefully phased out.
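
The fallback behaviour described above can be sketched as follows. This is a hedged outline under stated assumptions, not the AliEn code: `fetch_package`, `DownloadError` and the three callables passed in are hypothetical names standing in for the real torrent client, the wget-style HTTP download and the seeding step.

```python
class DownloadError(Exception):
    """Raised by a fetcher when it cannot obtain the package."""


def fetch_package(name, torrent_fetch, http_fetch, seed):
    """Try the torrent network first, fall back to HTTP, then seed the file.

    torrent_fetch and http_fetch take a package name and return a local path;
    seed takes that path and starts serving the file to other worker nodes.
    """
    try:
        path = torrent_fetch(name)
        method = "torrent"
    except DownloadError:
        # No reachable peers or tracker: download directly instead.
        path = http_fetch(name)
        method = "http"
    # Whichever way the file arrived, share it so that later jobs on
    # neighbouring worker nodes can fetch it locally.
    seed(path)
    return path, method
```

Passing the fetchers in as arguments keeps the sketch independent of any particular torrent client and makes the fallback path easy to exercise with stub functions.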
SLIDE 9

Current status

➢ AliEn itself is packaged in a small (35MB) archive
➢ AliRoot, Root & deps. are packaged in single archives:
  • max. 300MB/job
➢ Subatech is used as testbed
➢ LDAP flag to switch modes:

name=Subatech-CREAM,ou=CE,ou=Services,ou=Subatech,ou=Sites,o=alice,dc=cern,dc=ch
installMethod=Torrent

➢ Production jobs work fine
➢ Analysis jobs fail to load a particular library; most probably a configuration issue, which is currently being tracked
➢ You can download precompiled packages from http://alitorrent.cern.ch/
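
To illustrate how the LDAP flag shown above might be consumed, the sketch below splits the entry's distinguished name into components and checks the `installMethod` attribute. Both helper names are hypothetical, the DN parser assumes no escaped commas, and treating a missing attribute as "not torrent" is an assumption about the default rather than documented behaviour.

```python
def parse_dn(dn):
    """Split an LDAP DN into (attribute, value) pairs.

    Simplified: assumes values contain no escaped commas.
    """
    pairs = []
    for part in dn.split(","):
        attr, _, value = part.partition("=")
        pairs.append((attr.strip(), value.strip()))
    return pairs


def uses_torrent(entry_attributes):
    """Return True if the entry's installMethod attribute selects Torrent."""
    # Assumption: an absent attribute means the site stays on the
    # shared-software-area mode.
    return entry_attributes.get("installMethod", "").lower() == "torrent"


dn = "name=Subatech-CREAM,ou=CE,ou=Services,ou=Subatech,ou=Sites,o=alice,dc=cern,dc=ch"
components = parse_dn(dn)
torrent_enabled = uses_torrent({"installMethod": "Torrent"})
```

Here `components[0]` is the pair identifying the computing element, and `torrent_enabled` is true for the Subatech testbed entry quoted in the slide.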

SLIDE 10

Future plans

➢ Full-scale testing of the solution
➢ Evaluate the need for caching
  • On worker nodes, as files
  • On VoBox, as seeder
  • Regional seeders
  • All these would require managers
➢ Try to use the solution for distributing data files or pre-compiled PAR files
  • Latest version would be fetched at every execution, no cleanup required for previous ones