Dune computing Workshop
Cédric Serfon
Cedric.Serfon@cern.ch On behalf of the Rucio team
2018-10-08 Rucio - Dune Workshop
Rucio in a nutshell (1)
Rucio is a Distributed Data Management system built initially for the ATLAS experiment
○ It federates data located on different heterogeneous sites (small/big, grid/cloud, tape/disk) under a common namespace and hides the complexity of the underlying storage layer
○ It provides tools to manage the data efficiently according to the policies defined by the collaboration
○ It provides tools for end-users to interact with the data
○ Designed using experience from the previous ATLAS data management system, DQ2
○ Integrates new features and technologies
○ Used by ATLAS, AMS and Xenon1T. CMS has just chosen to move to Rucio for LHC Run 3
○ Being evaluated by other small/medium/big HEP/Astro experiments:
more collaborations/scientific communities
○ Attended by more than 80 people
○ A new workshop will be organized in spring (3 institutes have already volunteered to host the meeting)
○ File and dataset catalog (logical definition and replicas)
○ Transfers between sites and staging capabilities
○ User Interface and Command Line Interface allowing users to download/upload/transfer their data
○ Extensive monitoring
○ Powerful policy engines (rules and subscriptions)
○ Bad file identification and recovery
○ Dataset popularity based replication
○ …
○ Already supporting PanDA (ATLAS WFMS)
○ Possible integration with other systems such as DIRAC
More advanced features
○ Data coming from the detector
○ Monte Carlo data
○ User data
○ Ensuring the replication of files according to the replication policy specified by ATLAS
○ Replicating the data for other applications (e.g. PanDA) and for the end-users
○ Ensuring file recovery
○ Staging data from tape
○ And plenty of other things
○ More than 1B files, ~0.4 EB
○ Up to 4M files/2.5 PB transferred per day
○ More than 1000 active users
group
different credentials (X.509, Kerberos token, userpass, SSH) to connect to Rucio
Data IDentifier (DID)
○ Files
○ Datasets: collections of files
○ Containers: collections of datasets and/or containers
○ A scope: up to 25 characters, used to partition your data, e.g. data17, mc17
○ A name (up to 255 characters)
○ Bytes
○ Checksum (for files)
○ Number of events
○ Datatype
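The scope and name limits above can be illustrated with a small sketch. This is not Rucio code: the `DID` class and `parse_did` helper are hypothetical names, and the only facts taken from the slides are the two length limits and the `scope:name` notation (as in `data15_13TeV:mydatasetname`).

```python
from dataclasses import dataclass

MAX_SCOPE_LEN = 25   # scope limit quoted on the slide
MAX_NAME_LEN = 255   # name limit quoted on the slide


@dataclass(frozen=True)
class DID:
    """Hypothetical stand-in for a Data IDentifier: a scope plus a name."""
    scope: str
    name: str

    def __post_init__(self):
        # Reject empty or over-long components.
        if not 0 < len(self.scope) <= MAX_SCOPE_LEN:
            raise ValueError(f"scope must be 1..{MAX_SCOPE_LEN} characters")
        if not 0 < len(self.name) <= MAX_NAME_LEN:
            raise ValueError(f"name must be 1..{MAX_NAME_LEN} characters")

    def __str__(self):
        return f"{self.scope}:{self.name}"


def parse_did(text):
    """Split 'scope:name' at the first colon and validate both parts."""
    scope, sep, name = text.partition(":")
    if not sep:
        raise ValueError("expected 'scope:name'")
    return DID(scope, name)


if __name__ == "__main__":
    print(parse_did("data17:my_dataset"))
```

A scope longer than 25 characters, or a string without a colon, raises `ValueError` in this sketch.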
○ No software needed to run at the site
○ RSE names are arbitrary (e.g., "CERN-PROD_DATADISK", "AWS_REGION_USEAST", …)
○ Usually one RSE per site and storage data class
○ Protocols, hostnames, ports, prefixes, paths, implementations, …
○ Data access priorities can be set (e.g., to prefer a protocol for LAN access)
○ Key/value pairs (e.g., country=UK, type=TAPE, support=brian@unl.edu)
○ You can use RSE expressions to describe a list of RSEs (e.g. country=UK&type=TAPE)
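To make the idea of an RSE expression concrete, here is a minimal sketch of how an expression such as `country=UK&type=TAPE` could select RSEs from their key/value attributes. It is a simplification written for illustration: it only handles `key=value` terms joined by `&`, and the RSE names, attribute table and `resolve` function are all hypothetical, not Rucio's own parser.

```python
# Hypothetical attribute table: RSE name -> key/value attributes.
RSES = {
    "UK_SITE_TAPE": {"country": "UK", "type": "TAPE"},
    "UK_SITE_DISK": {"country": "UK", "type": "DISK"},
    "FR_SITE_TAPE": {"country": "FR", "type": "TAPE"},
}


def resolve(expression, rses):
    """Return the sorted RSE names matching every key=value term.

    Terms are joined by '&', meaning all of them must hold.
    """
    terms = []
    for term in expression.split("&"):
        key, sep, value = term.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed term: {term!r}")
        terms.append((key.strip(), value.strip()))
    return sorted(
        name
        for name, attrs in rses.items()
        if all(attrs.get(key) == value for key, value in terms)
    )


if __name__ == "__main__":
    print(resolve("country=UK&type=TAPE", RSES))
```

With the table above, `country=UK&type=TAPE` selects only `UK_SITE_TAPE`, while `type=TAPE` alone selects both tape RSEs.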
○ Describe how a Data IDentifier (DID) must be replicated on a list of Rucio Storage Elements (RSE)
○ e.g.: Make 2 replicas of dataset data15_13TeV:mydatasetname on tier=1&disk=1
○ Rucio will create the minimum number of replicas to optimise storage space, minimise the number of transfers and automate data distribution
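The "minimum number of replicas" idea can be sketched as follows: given the RSEs matched by the rule's expression and the RSEs that already hold a copy, reuse existing copies first and only schedule transfers for what is missing. This is an illustration under simplifying assumptions, not Rucio's actual rule evaluation; `plan_replicas` and the RSE names are hypothetical.

```python
def plan_replicas(copies, candidate_rses, existing_replicas):
    """Return (rses_reused, rses_needing_transfer) for a rule.

    copies            -- number of replicas the rule asks for
    candidate_rses    -- RSEs matched by the rule's RSE expression
    existing_replicas -- RSEs that already hold the data
    """
    candidates = sorted(candidate_rses)
    # Reuse copies that already sit on a matching RSE: no transfer needed.
    reused = [rse for rse in candidates if rse in existing_replicas][:copies]
    missing = copies - len(reused)
    # Only the shortfall is transferred to RSEs without a copy.
    free = [rse for rse in candidates if rse not in existing_replicas]
    if missing > len(free):
        raise ValueError("not enough matching RSEs to satisfy the rule")
    return reused, free[:missing]


if __name__ == "__main__":
    reused, transfers = plan_replicas(
        copies=2,
        candidate_rses={"T1_A_DISK", "T1_B_DISK", "T1_C_DISK"},
        existing_replicas={"T1_B_DISK"},
    )
    print("reuse:", reused, "transfer to:", transfers)
```

In this example one copy already exists on a matching RSE, so a rule asking for 2 replicas triggers a single transfer instead of two.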
○ Replication policies based on Data IDentifier metadata, for Data IDentifiers that will be produced in the future
○ e.g.: Make 2 replicas of datasets with scope=data15_13TeV and datatype=AOD on tier=1&disk=1
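A subscription can be pictured as a metadata filter paired with a rule template: whenever a new DID appears, every subscription whose filter matches its metadata yields a rule. The sketch below assumes exact-match filters only; the dictionary layout and the `rules_for_new_did` function are hypothetical and do not reflect Rucio's internal data model.

```python
# Hypothetical subscription: a metadata filter plus the rule to create.
SUBSCRIPTIONS = [
    {
        "filter": {"scope": "data15_13TeV", "datatype": "AOD"},
        "rule": {"copies": 2, "rse_expression": "tier=1&disk=1"},
    },
]


def matches(did_metadata, subscription_filter):
    """True if the DID metadata has every key/value required by the filter."""
    return all(
        did_metadata.get(key) == value
        for key, value in subscription_filter.items()
    )


def rules_for_new_did(did_metadata, subscriptions):
    """Return the rule templates triggered by a newly registered DID."""
    return [
        sub["rule"]
        for sub in subscriptions
        if matches(did_metadata, sub["filter"])
    ]


if __name__ == "__main__":
    new_did = {"scope": "data15_13TeV", "name": "some_dataset", "datatype": "AOD"}
    print(rules_for_new_did(new_did, SUBSCRIPTIONS))
```

A dataset with a different datatype or scope matches no filter here and therefore produces no rule.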
○ Support for generic metadata added this summer
■ I.e. Rucio can now support arbitrary key:value pairs on DIDs. This was requested by many collaborations during the 1st Rucio workshop
○ Support for archive files
■ This new feature allows registering the constituents of an archive file
■ Rucio can automatically extract the constituents of the archive
○ New authentication/authorization based on Macaroons/SciTokens
○ Support for different QoS for the storages
○ Right now the core development team is composed of people from ATLAS and CMS
collaboration O(1 EB)
system and hope that more collaborations will follow this path
Website: http://rucio.cern.ch
Documentation: https://rucio.readthedocs.io
Repository: https://github.com/rucio/
Continuous Integration: https://travis-ci.org/rucio/
Images: https://hub.docker.com/r/rucio/
Online support: https://rucio.slack.com/messages/#support/
Developer contact: rucio-dev@cern.ch