FAIR Sequencing Data Repository based on iRODS - Felipe O. Gutierrez

SLIDE 1

Felipe O. Gutierrez

AMC - Academic Medical Center - Amsterdam, Netherlands
A.C.Camargo Cancer Center - São Paulo, Brazil

FAIR Sequencing Data Repository based on iRODS

Silvia D. Olabarriaga

  • F. Oliveira Gutierrez
  • P.F.G. De Geest
  • Diogo Ferreira Patrão
  • A.H.C. van Kampen
  • J.T. van den Berg
  • Aldo Jongejan
  • Sjoerd Repping

SLIDE 2

Problem

  • Inadequate RDM (Research Data Management) solution for NGS (Next Generation Sequencing) data:

○ Individual storage and backup
○ Dispersed datasets
○ Disconnected from metadata
○ Not FAIR

SLIDE 3

Considerations

Fit within organization

  • ICT culture
  • Research culture
  • Sustainability vision

Adhere to international community best practices

Reuse and extend existing solutions

Freeman, 1983

SLIDE 4

Fit into AMC Vision for RDM

Based on NFU Data4Lifesciences WP2

An NGS repository that is:

  • Part of an ecosystem
  • Controlled by AMC
  • Distributed
  • Scalable
  • FAIR compliant
  • Easy to use
SLIDE 5

System Design

  • iRODS 4.1.10

○ Middleware
○ Data virtualization

  • Virtuoso 7.2

○ Triplestore
○ Supports ontologies

  • User interfaces:

○ Metalnx web
○ Davrods 4.1
○ iCommands
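
As an illustration of the iCommands interface listed above, the following Python sketch assembles `iput` and `imeta` invocations as argument lists. It is not taken from the presentation: the zone, collection, file, and attribute names are hypothetical, and the commands are only printed, not executed.

```python
import shlex


def iput_command(local_path, collection):
    """Build an `iput` call; -K asks iRODS to verify the checksum after upload."""
    return ["iput", "-K", local_path, collection]


def imeta_add_command(object_path, attribute, value, unit=None):
    """Build an `imeta add -d` call that attaches one attribute-value(-unit)
    entry to a data object."""
    cmd = ["imeta", "add", "-d", object_path, attribute, value]
    if unit is not None:
        cmd.append(unit)
    return cmd


# Hypothetical zone, collection, file, and attribute names (assumptions).
collection = "/exampleZone/home/researcher/run42"
commands = [
    iput_command("sample1.fastq.gz", collection),
    imeta_add_command(collection + "/sample1.fastq.gz", "sample_id", "S1"),
]

# Print shell-quoted command lines instead of running them.
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping each command as a list of arguments means it could later be passed to `subprocess.run` without shell quoting issues, should one choose to execute it against a real iRODS zone.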

SLIDE 6

System Architecture

SLIDE 7

Stewardship: Ontologies

  • EDAM

Ontology for bioinformatics

○ Operations, types of data, data identifiers, data formats, and topics

  • OMIABIS

Ontologized Minimum Information About Biobank data Sharing (MIABIS)

  • OBI

Ontology for Biomedical Investigations

  • EFO

Experimental Factor Ontology
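
To make the role of these ontologies more concrete, the sketch below shows how a file's format could be annotated with an ontology term and serialized as an N-Triples statement, the kind of data a triplestore can load. The slides do not state which predicates or terms the repository uses, so everything here is an assumption: the subject IRI is a hypothetical `example.org` address, the predicate is the Dublin Core Terms `format` property, and the EDAM IRI for FASTQ should be checked against the EDAM release in use.

```python
# Dublin Core Terms 'format' property (chosen for illustration only).
DCT_FORMAT = "http://purl.org/dc/terms/format"

# Assumed EDAM IRI for the FASTQ format; verify against the EDAM release in use.
EDAM_FASTQ = "http://edamontology.org/format_1930"


def triple(subject_iri, predicate_iri, object_iri):
    """Serialize one statement with three IRI terms as an N-Triples line."""
    return f"<{subject_iri}> <{predicate_iri}> <{object_iri}> ."


# Hypothetical IRI standing in for a data object in the repository.
sample_file = "http://example.org/ngs/run42/sample1.fastq.gz"

statement = triple(sample_file, DCT_FORMAT, EDAM_FASTQ)
print(statement)
```

Using ontology IRIs rather than free-text labels is what lets different datasets be queried with the same terms.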

SLIDE 8

Workflow: Data Ingestion

SLIDE 9

Workflow: (meta)data Registration

SLIDE 10

Workflow: (meta)data Retrieval

SLIDE 11

Access and Security

SLIDE 12

Status

SLIDE 13

Report file

SLIDE 14

nmon read KB/s

SLIDE 15

nmon write KB/s

SLIDE 16

nmon IOPs

SLIDE 17

Qualitative & Quantitative questions

  • (meta)data preparation? Clear, doable, easy, ...
  • (meta)data upload? Type, size, quantity, integrity, ...
  • Rule processing? Report file clear and easy, system delay feedback, ...
  • (meta)data retrieval? Findable, Accessible, Organized, Interoperable, Reusable, ...
  • Concurrent users, variation in the number and size of files.
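
One way to examine the integrity aspect of the upload question is to compare a file's checksum before upload with the checksum of the copy retrieved afterwards. The sketch below does this with Python's standard `hashlib`; it is not the evaluation procedure from the presentation, and the helper name, the choice of SHA-256, and the temporary demo file are assumptions.

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, reading it in chunks so that
    large sequencing files do not have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Small temporary file standing in for a sequencing file (illustration only).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ACGT" * 1000)
    demo_path = tmp.name

before = file_checksum(demo_path)
# In practice this second digest would come from the retrieved copy.
after = file_checksum(demo_path)
print("intact" if before == after else "corrupted")

os.remove(demo_path)
```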

SLIDE 18

Acknowledgements

KEBB:

  • Barbera van Schaik
  • Allard van Altena

ADICT: Hans van den Berg
UvA ICTS: Joyce Nijkamp
Medical Library: Lieuwe Kool
Clinical Research Unit: Rudy Scholte
Reproductive medicine: Sjoerd Repping
Genetic Metabolic Diseases: Frédéric Vaz
Immunogenomics: Niek de Vries