
SLIDE 1

www.egi.eu EGI-InSPIRE RI-261323

EGI-InSPIRE

iRODS: Setup and Use of a National Data Management System in the French NGI

Jerome PANSANEL & the FG-iRODS Team

jerome.pansanel@iphc.cnrs.fr
Hubert Curien Multidisciplinary Institute, Strasbourg, France

05/21/14

SLIDE 2

iRODS

SLIDE 3

Scientific Data Today

  • Large amounts of data are collected by scientists and have to be analyzed
  • Data are distributed across the world and have to be shared between the different partners
  • Each data center has its own storage infrastructure (most of the time based on heterogeneous systems)
  • The physical organization of the data should be transparent to the users
  • Data should be easy to manage
  • Data should be easily retrievable (for example with metadata search)
  • Data and access to them have to be protected (data replication, specific ACLs, ...)
  • Data have to be available from anywhere

→ How to solve these challenges?

SLIDE 4

iRODS: integrated Rule-Oriented Data System

[Figure: iRODS diagram with labels Disk, MS, Disk]

  • Project started in 2006 (based on SRB)
  • Released under an open-source license (BSD)
  • Developed by the DICE group and several collaborators
  • Rule engine applies user-defined policies and rules
  • Integrates a descriptive metadata system for managing data
  • Data collections can be managed across several sites and heterogeneous hardware
  • Logical organization of files is independent of their physical storage
  • Enforcement of data consistency and homogeneity

http://irods.org/
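The independence of the logical organization from physical storage can be pictured as a catalogue that maps each logical path to one or more physical replicas. The sketch below is a toy model in plain Python, not the iRODS catalogue or any iRODS API: the `Replica` and `Catalog` names and the physical vault paths are hypothetical, while the two resource names are borrowed from the `ils -l` listing on the User Interfaces slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Replica:
    """One physical copy of a data object (hypothetical model)."""
    resource: str       # name of the storage resource holding the copy
    physical_path: str  # where the bytes live on that resource


@dataclass
class Catalog:
    """Toy catalogue: logical path -> list of physical replicas."""
    entries: dict[str, list[Replica]] = field(default_factory=dict)

    def register(self, logical_path: str, replica: Replica) -> None:
        # Adding a copy never changes the logical path that users see.
        self.entries.setdefault(logical_path, []).append(replica)

    def resolve(self, logical_path: str, preferred: str | None = None) -> Replica:
        """Return the replica on `preferred` if there is one, else the first registered copy."""
        replicas = self.entries[logical_path]
        for replica in replicas:
            if replica.resource == preferred:
                return replica
        return replicas[0]


catalog = Catalog()
path = "/frgrid/home/UNECOLLAB/RAWDATA/BE/run0977156_123.raw"
# Hypothetical physical locations for the two replicas.
catalog.register(path, Replica("ps-lpsc-lpscdata7-fr", "/vault/BE/run0977156_123.raw"))
catalog.register(path, Replica("iphcCache1", "/cache/BE/run0977156_123.raw"))
print(catalog.resolve(path, preferred="iphcCache1").physical_path)  # prints /cache/BE/run0977156_123.raw
```

Users keep addressing the same logical path whichever copy actually serves the request, which is the property the slide describes.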

SLIDE 5

iRODS Consortium

  • Objectives:
    ◦ Guide the continuous development of iRODS
    ◦ Obtain funding to support this development
    ◦ Provide fully tested software by combining the testing, packaging and expertise processes developed at RENCI
    ◦ Evangelize iRODS among potential users
  • For more information:

→ http://irods-consortium.org/

SLIDE 6

Fact Sheet

  • Usable at any scale, from a personal laptop to institutional repositories and international projects
  • Thousands of users
  • Billions of files and several petabytes of data
  • Extensive documentation: https://wiki.irods.org/index.php/Documentation

  • Binary packages available

SLIDE 7

Under the Hood

[Figure: Desktop, Grid, Cloud]

  • An iRODS system (a “zone”) is based on three main elements:
    ◦ Database (the iCAT metadata catalogue)
    ◦ Rule engine
    ◦ Resources
  • Data servers can be spread geographically within one zone
  • Different zones can be interconnected (federated)
  • Available user interfaces: GUIs, CLIs and APIs (C, Java, Python, …)
  • Automatic and manual data processing through the rule engine
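iRODS logical paths begin with the zone name (as in /frgrid/home/... or /tempZone/home/... elsewhere in this deck), which is what lets interconnected zones share one namespace. The following plain-Python sketch illustrates only that idea: `Zone`, `zone_of`, `route` and `partnerZone` are hypothetical names, and the real servers perform this resolution internally rather than through functions like these.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Zone:
    """Toy zone: a name plus its storage resources (catalogue and rule engine not modelled)."""
    name: str
    resources: list[str] = field(default_factory=list)  # data servers, possibly on different sites


def zone_of(logical_path: str) -> str:
    """Return the first path component, which names the zone (/frgrid/home/... -> 'frgrid')."""
    if not logical_path.startswith("/"):
        raise ValueError(f"expected an absolute logical path, got {logical_path!r}")
    parts = [p for p in logical_path.split("/") if p]
    if not parts:
        raise ValueError("the root path '/' does not name a zone")
    return parts[0]


def route(logical_path: str, local: Zone, federated: dict[str, Zone]) -> Zone:
    """Choose the zone whose catalogue should answer a request for `logical_path`."""
    name = zone_of(logical_path)
    if name == local.name:
        return local
    if name in federated:
        return federated[name]  # remote zone reached through federation
    raise LookupError(f"zone {name!r} is neither local nor federated")


frgrid = Zone("frgrid", ["ps-lpsc-lpscdata7-fr", "iphcCache1"])
partner = Zone("partnerZone")  # hypothetical federated zone
print(route("/frgrid/home/UNECOLLAB/RAWDATA", frgrid, {"partnerZone": partner}).name)  # prints frgrid
```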

SLIDE 8

User Interfaces

[user ~]$ ils
/frgrid/home/UNECOLLAB/RAWDATA:
  C- /frgrid/home/UNECOLLAB/RAWDATA/CALIBRATION
  C- /frgrid/home/UNECOLLAB/RAWDATA/BE
  C- /frgrid/home/UNECOLLAB/RAWDATA/ZR
[user ~]$ ils -l BE/
/frgrid/home/UNECOLLAB/RAWDATA/BE:
  owner  0 ps-lpsc-lpscdata7-fr    80072192 2013-11-11.16:21 & run0977156_123.dst
  owner  0 ps-lpsc-lpscdata7-fr  1748189011 2013-11-11.15:48 & run0977156_123.raw
  owner  1 iphcCache1            1748189011 2013-11-11.16:42 & run0977156_123.raw
  owner  0 ps-lpsc-lpscdata7-fr    80072192 2013-11-11.16:21 & run0977234_673.dst
  ...
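Each line of the `ils -l` listing gives the owner, the replica number, the resource, the size in bytes, the modification time, a status flag and the object name; run0977156_123.raw appears twice because it has replicas on two resources. A small parser makes that structure explicit. This is a hedged sketch: the column layout is inferred from the listing above rather than from a specification, the `&` flag is kept as a raw string (it is generally read as marking an up-to-date replica), and `parse_ils_long` and `resources_by_object` are hypothetical helpers.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ListingLine:
    owner: str
    replica_number: int
    resource: str
    size: int          # bytes
    modified: str      # raw "YYYY-MM-DD.HH:MM" timestamp, left unparsed
    status_flag: str   # "&" when present, "" otherwise
    name: str


def parse_ils_long(line: str) -> ListingLine:
    """Parse one data-object line of `ils -l` output, assuming the column layout shown above.

    Whitespace splitting means names containing spaces are not supported.
    """
    fields = line.split()
    if len(fields) == 7:
        status_flag, name = fields[5], fields[6]
    elif len(fields) == 6:
        status_flag, name = "", fields[5]
    else:
        raise ValueError(f"unexpected ils -l line: {line!r}")
    owner, replica_number, resource, size, modified = fields[:5]
    return ListingLine(owner, int(replica_number), resource, int(size), modified, status_flag, name)


def resources_by_object(lines: list[str]) -> dict[str, list[str]]:
    """Map each data object name to the resources holding one of its replicas."""
    groups: dict[str, list[str]] = defaultdict(list)
    for line in lines:
        entry = parse_ils_long(line)
        groups[entry.name].append(entry.resource)
    return dict(groups)


listing = [
    "owner 0 ps-lpsc-lpscdata7-fr 80072192 2013-11-11.16:21 & run0977156_123.dst",
    "owner 0 ps-lpsc-lpscdata7-fr 1748189011 2013-11-11.15:48 & run0977156_123.raw",
    "owner 1 iphcCache1 1748189011 2013-11-11.16:42 & run0977156_123.raw",
]
print(resources_by_object(listing)["run0977156_123.raw"])  # prints ['ps-lpsc-lpscdata7-fr', 'iphcCache1']
```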

SLIDE 9

Metadata

  • Can be attached to a data object (file), a collection, a resource or a user
  • Stored as Attribute-Value-Unit (AVU) triplets: name, value and unit

[user ~]$ imeta add -d run0977156_123.raw length 10 cm
[user ~]$ imeta add -d run0977156_123.raw hall east
[user ~]$ imeta ls -d run0977156_123.raw
AVUs defined for dataObj run0977156_123.raw:
attribute: length
value: 10
units: cm
----
attribute: hall
value: east
units:
[user ~]$ imeta qu -d hall = east
collection: /frgrid/home/UNECOLLAB/RAWDATA/ZR
dataObj: run0977156_123.raw
----
collection: /frgrid/home/UNECOLLAB/RAWDATA/ZR
dataObj: run0817773_556.raw
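The AVU model behind these commands fits in a few lines. The sketch below is an in-memory illustration in plain Python, assuming string values and exact-match queries only; `AVUStore` and its methods are hypothetical names, whereas real iRODS metadata queries support further operators and go through the catalogue.

```python
from __future__ import annotations

from collections import defaultdict


class AVUStore:
    """In-memory sketch of Attribute-Value-Unit metadata keyed by logical path.

    Not the iRODS catalogue: it only mirrors the add / ls / qu operations of the session above.
    """

    def __init__(self) -> None:
        self._avus: dict[str, list[tuple[str, str, str]]] = defaultdict(list)

    def add(self, path: str, attribute: str, value: str, units: str = "") -> None:
        # Mirrors `imeta add -d PATH ATTRIBUTE VALUE [UNITS]`; units are optional.
        self._avus[path].append((attribute, value, units))

    def ls(self, path: str) -> list[tuple[str, str, str]]:
        # Mirrors `imeta ls -d PATH`; .get() avoids creating an empty entry.
        return list(self._avus.get(path, []))

    def query(self, attribute: str, value: str) -> list[str]:
        # Mirrors `imeta qu -d ATTRIBUTE = VALUE`: exact match, paths returned sorted.
        return sorted(
            path
            for path, avus in self._avus.items()
            if any(a == attribute and v == value for a, v, _ in avus)
        )


store = AVUStore()
zr = "/frgrid/home/UNECOLLAB/RAWDATA/ZR"
store.add(f"{zr}/run0977156_123.raw", "length", "10", "cm")
store.add(f"{zr}/run0977156_123.raw", "hall", "east")
store.add(f"{zr}/run0817773_556.raw", "hall", "east")
print(store.query("hall", "east"))
```

The final query returns both ZR data objects, matching the `imeta qu` result shown on the slide.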

SLIDE 10

iRODS Rule Sample

  • Structure of a rule:

actionDef | condition | workflow-chain | recovery-chain

  • Example:

acPostProcForPut {
    ON($objPath like "/tempZone/home/rods/monitored/\*") {
        msiSplitPath($objPath, *collection, *fileName);
        msiCollRsync(*collection, "/targetZone/home/rods/safe-copy", "demoResc", "IRODS_TO_IRODS", *Status);
        writeLine("serverLog", "Rsync of *collection to its safe copy done (status=*Status) Triggered by creation of $objPath");
    }
}
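The four-part rule structure can be mimicked outside iRODS to show how a triggering action, a guard condition, an ordered workflow and a recovery chain fit together. This plain-Python sketch is hypothetical throughout: `Rule`, `fire` and the step functions are illustrative names, `fnmatchcase` stands in for the rule language's `like` operator, the rsync is replaced by a log message, and the rollback order (undo completed steps, most recent first) is an assumption about recovery semantics rather than a description of the iRODS rule engine.

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Any, Callable, Dict

Context = Dict[str, Any]  # hypothetical: stands in for $objPath and *variables
Step = Callable[[Context], None]


@dataclass
class Rule:
    action: str                            # actionDef, e.g. "acPostProcForPut"
    condition: Callable[[Context], bool]   # condition
    workflow: list[Step]                   # workflow-chain, executed in order
    recovery: list[Step]                   # recovery-chain, one entry per workflow step


def fire(rules: list[Rule], action: str, ctx: Context) -> bool:
    """Run the first rule matching `action` whose condition holds.

    Returns True only if its whole workflow-chain completed.
    """
    for rule in rules:
        if rule.action != action or not rule.condition(ctx):
            continue
        completed = 0
        try:
            for step in rule.workflow:
                step(ctx)
                completed += 1
        except Exception:
            # Undo the steps that succeeded, most recent first.
            for undo in reversed(rule.recovery[:completed]):
                undo(ctx)
            return False
        return True
    return False


def split_path(ctx: Context) -> None:
    # Plays the role of msiSplitPath.
    ctx["collection"], _, ctx["fileName"] = ctx["objPath"].rpartition("/")


def log_sync(ctx: Context) -> None:
    # Stand-in for msiCollRsync + writeLine: records a message, copies no data.
    ctx.setdefault("log", []).append(
        f"Rsync of {ctx['collection']} triggered by creation of {ctx['objPath']}"
    )


def no_op(ctx: Context) -> None:
    # Nothing to undo in this toy example.
    pass


monitor = Rule(
    action="acPostProcForPut",
    condition=lambda ctx: fnmatchcase(ctx["objPath"], "/tempZone/home/rods/monitored/*"),
    workflow=[split_path, log_sync],
    recovery=[no_op, no_op],
)

ctx = {"objPath": "/tempZone/home/rods/monitored/run001.dat"}
print(fire([monitor], "acPostProcForPut", ctx), ctx["collection"])  # prints True /tempZone/home/rods/monitored
```

Only objects put under the monitored collection trigger the workflow; a put anywhere else leaves the context untouched.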

SLIDE 11

Genomic Data Management with iRODS

WTSI Use Case [1]:

  • Managing and accessing sequencing Binary Alignment/Map (BAM) files
  • 500 TB of SAN storage
  • Integrated into the sequencing pipeline
  • Fine-grained access control
  • Data replication
  • Alignment metadata added automatically
  • Data federation with other research institutes

[1] G.-T. Chiang, P. Clapham, G. Qi, K. Sale & G. Coates: Implementing a genomics data management system using iRODS in the Wellcome Trust Sanger Institute. BMC Bioinformatics 2011, 12, 361.

SLIDE 12

Other Examples


  • Astrophysics: Auger supernova search
  • Atmospheric science: NASA Langley Atmospheric Sciences Center
  • Biology: Phylogenetics at CC-IN2P3
  • Climate: NOAA National Climatic Data Center
  • Cognitive Science: Temporal Dynamics of Learning Center
  • Computer Science: GENI experimental network
  • Cosmic Ray: AMS experiment on the International Space Station
  • Dark Matter Physics: Edelweiss II
  • Digital Library: French National Library, Texas Digital Libraries
  • Earth Science: NASA Center for Climate Simulations, Vhub (volcanism)
  • Ecology: CEED Caveat Emptor Ecological Data
  • Engineering: CIBER-U
  • High Energy Physics: BaBar
  • Hydrology: Institute for the Environment, UNC-CH; Hydroshare
  • Genomics: Broad Institute, Wellcome Trust Sanger Institute, NGS
  • Indexing: Cheshire
  • Institutional repository: Carolina Digital Repository
  • Medicine: Sick Kids Hospital
  • Neuroscience: International Neuroinformatics Coordinating Facility
  • Neutrino Physics: T2K and dChooz neutrino experiments
  • Oceanography: Ocean Observatories Initiative
  • Optical Astronomy: National Optical Astronomy Observatory
  • Particle Physics: Indra

SLIDE 13

FG-iRODS

SLIDE 14

FG-iRODS Federated Infrastructure

  • Coordinated by France Grilles
  • A single production instance:
    ◦ Federated resources and workforce
    ◦ Hosting users from any scientific domain
    ◦ Designed for small and medium-sized projects
    ◦ Open to new resource providers
    ◦ User support and training

[Figure: FG-iRODS resources: 40 TB, 20 TB, iCAT, 20 TB replicated]

SLIDE 15

French iRODS Federated Infrastructure

Collaboration:

  • National instance coordinated by the French NGI "France Grilles"
  • Project started in 2013
  • Authentication with login credentials or certificates
  • Administered jointly by four partners
  • Centralised iRODS rule engine and catalogue to enforce coherent and homogeneous data management
  • Resources distributed across several locations for high data availability

SLIDE 16

FG-iRODS Team

  • Yonny CARDENAS (CC-IN2P3, Lyon)
  • Jean-Yves NIEF (CC-IN2P3, Lyon)
  • Gilles MATHIEU (France Grilles, Lyon)
  • Geneviève ROMIER (France Grilles, Lyon)
  • Jerome PANSANEL (IPHC, Strasbourg)
  • Catherine BISCARAT (LPSC, Grenoble)
  • David BENABEN (CBIB & INRA, Bordeaux)
  • Pierre GAY (MCIA, Bordeaux)
  • Benoît HIROUX (MCIA, Bordeaux)

SLIDE 17

Achievements

  • Federated set of resources totalling 80 TB
  • Real synergy between the administrators
  • Reliable and highly available storage
  • Usage policy published
  • First training session held (Clermont-Ferrand, February 2014)
  • First users now hosted (proteomics and biological data)
  • iRODS clients installed on all grid sites supporting the france-grilles VO
  • iRODS packages with GSI support (deb, rpm)
  • Web interface available
  • VM appliance provided to access both the computing grid and iRODS

SLIDE 18

Perspectives

→ http://www.france-grilles.fr/Pour-les-chercheurs-ou-ingenieurs#iRODS

  • Extend the storage pool with new resource providers
  • Welcome new users
  • Deploy a monitoring solution to ensure infrastructure reliability
  • Test the S3 plugin
  • Find new funding to ensure the sustainability of the infrastructure
  • Share expertise in data management and user support with other groups