NEUROIMAGING RESEARCH DATA LIFE-CYCLE MANAGEMENT, Hurng-Chun (Hong) Lee et al. (PowerPoint presentation)

SLIDE 1

NEUROIMAGING RESEARCH DATA LIFE-CYCLE MANAGEMENT

Hurng-Chun (Hong) Lee, Robert Oostenveld, Erik van den Boogert, Eric Maris

SLIDE 2

Outline

  • Lifecycle RDM: objectives and challenges
  • The method
    • the RDM protocol
    • Donders Research Data Repository (DRDR): usage of iRODS
  • Strengths and weaknesses of DRDR
  • Future focuses

SLIDE 3

Lifecycle RDM

  • RDM spans the entire research lifecycle
  • Objectives:
    • long-term data preservation
    • scientific-process documentation
    • data publication

Lifecycle: conception of research → data acquisition → data analysis → publication

SLIDE 4

Challenges

  • large institute with heterogeneous scientific-administrative workflows
    • 600 researchers in PI groups ➞ more than 150 projects per year
    • 3 centres with 4 administrative domains
  • data complexity:
    • text, audio/video, imaging or signal data, etc.
    • sensitive data
    • sizes ranging from a few large files (>2 GB) to huge numbers of small files (<1 MB)
  • user expectations

SLIDE 5

The RDM protocol

Lifecycle: conception of research → data acquisition → data analysis → data publication

* Box icons made by Roundicons and Pixel Buddha from www.flaticon.com are licensed under CC BY 3.0.

SLIDE 6

The RDM protocol

Lifecycle: conception of research → data acquisition → data analysis → data publication

Collections:
  • research documentation collection
  • data acquisition collection
  • data sharing collection

collection: a container of
  ✓ data (files/folders)
  ✓ metadata
  and has a “state” attribute

SLIDE 7

The RDM protocol

Lifecycle: conception of research → data acquisition → data analysis → data publication

Collections:
  • research documentation collection
  • data acquisition collection, with a PID (persistent identifier)
  • data sharing collection

collection: a container of
  ✓ data (files/folders)
  ✓ metadata
  and has a “state” attribute

SLIDE 8

The RDM protocol

Lifecycle: conception of research → data acquisition → data analysis → data publication

Collections:
  • research documentation collection
  • data acquisition collection, with a PID (persistent identifier)
  • data sharing collection

Roles: research administrator (RA), manager, contributor, viewer

collection: a container of
  ✓ data (files/folders)
  ✓ metadata
  and has a “state” attribute

SLIDE 9

The RDM protocol

Lifecycle: conception of research → data acquisition → data analysis → data publication

Collections:
  • research documentation collection
  • data acquisition collection, with a PID (persistent identifier)
  • data sharing collection

Roles: research administrator (RA), manager, contributor, viewer

Workflow · responsibility · eligibility

collection: a container of
  ✓ data (files/folders)
  ✓ metadata
  and has a “state” attribute

SLIDE 10

The RDM protocol

Organisational Unit (OU)

Lifecycle: conception of research → data acquisition → data analysis → data publication

Collections:
  • research documentation collection
  • data acquisition collection, with a PID (persistent identifier)
  • data sharing collection

Roles: research administrator (RA), manager, contributor, viewer

Workflow · responsibility · eligibility

collection: a container of
  ✓ data (files/folders)
  ✓ metadata
  and has a “state” attribute
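
The collection concept above (a container of data and metadata with a “state” attribute, accessed through manager, contributor and viewer roles) can be sketched as a small data model. This is a minimal illustration under stated assumptions, not the DRDR implementation: the `Collection` class, the `editable` state name and the permission table are hypothetical, and the OU-level research administrator role is left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    MANAGER = "manager"
    CONTRIBUTOR = "contributor"
    VIEWER = "viewer"


# Hypothetical permission table: roles allowed to perform each action.
PERMISSIONS = {
    "read": {Role.MANAGER, Role.CONTRIBUTOR, Role.VIEWER},
    "write": {Role.MANAGER, Role.CONTRIBUTOR},
    "manage_roles": {Role.MANAGER},
}


@dataclass
class Collection:
    """A container of data (files/folders) and metadata with a state attribute."""
    kind: str                  # e.g. "data acquisition", "research documentation", "data sharing"
    ou: str                    # organisational unit the collection belongs to
    state: str = "editable"    # hypothetical state name
    files: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)
    members: dict[str, Role] = field(default_factory=dict)

    def allowed(self, user: str, action: str) -> bool:
        role = self.members.get(user)
        return role is not None and role in PERMISSIONS[action]

    def add_file(self, user: str, path: str) -> None:
        # Writes need a writing role *and* a collection still in the editable state.
        if self.state != "editable" or not self.allowed(user, "write"):
            raise PermissionError(f"{user} may not add files to this collection")
        self.files.append(path)

    def grant(self, by: str, user: str, role: Role) -> None:
        # Only a manager may hand out roles on the collection.
        if not self.allowed(by, "manage_roles"):
            raise PermissionError(f"{by} may not manage roles")
        self.members[user] = role


dac = Collection(kind="data acquisition", ou="DCCN", members={"jdoe": Role.MANAGER})
dac.grant("jdoe", "asmith", Role.VIEWER)
dac.add_file("jdoe", "sub-01/meg/run-1.ds")
print(dac.allowed("asmith", "write"))  # False
```

Tying write access to both the role and the state is one way to let a state change (for example after archiving) freeze a collection without touching its role assignments.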

SLIDE 11

The data repository

  • an iRODS-based ICT system implementing the workflow defined by the protocol
  • a single data-management system enabling internal collaboration and external data sharing

Architecture:
  • interfaces: WebDAV, Stager, management portal
  • data management middleware: iRODS with management rules, ELK stack
  • storage system

  ✓ file-based system
  ✓ single and uniform namespace
  ✓ access-controlled metadata management
  ✓ authentication via trusted identity providers
  ✓ role-based authorisation
  ✓ data replication for disaster recovery
  ✓ workflow automation and policy enforcement

SLIDE 12

Storage resources

iRODS resources and vaults:
  • resc_ou1: vault_ou1_1, vault_ou1_2 (data of collections of OU1)
  • resc_ou2: vault_ou2_1, vault_ou2_2 (data of collections of OU2)
  • resc_nl: vault_nl_1
  • a quota on each OU vault

Storage: filesystems at Location A (first replica) and Location B (second replica), exported to iRODS via NFS, with asynchronous data replication between the two locations

  • dataflow control: load-balancing and OU-level quota
  • two identical copies of data (disaster recovery)
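
The slide names load-balancing and OU-level quotas as the dataflow controls but does not say how a vault is chosen. The sketch below assumes one simple policy: place each new file on the OU vault with the most free quota, and refuse the write once no vault of that OU has room. `Vault`, `place` and the quota figures are hypothetical; the actual iRODS resource hierarchy in DRDR may work differently.

```python
from dataclasses import dataclass


@dataclass
class Vault:
    name: str
    quota_gb: int      # hypothetical per-vault quota
    used_gb: int = 0

    @property
    def free_gb(self) -> int:
        return self.quota_gb - self.used_gb


def place(vaults: list[Vault], size_gb: int) -> Vault:
    """Choose the vault with the most free quota that can still hold size_gb.

    Ties go to the vault listed first. Raises when no vault of the OU has room,
    i.e. the OU-level quota is exhausted.
    """
    candidates = [v for v in vaults if v.free_gb >= size_gb]
    if not candidates:
        raise RuntimeError("OU-level quota exceeded")
    target = max(candidates, key=lambda v: v.free_gb)
    target.used_gb += size_gb
    return target


resc_ou1 = [Vault("vault_ou1_1", quota_gb=100), Vault("vault_ou1_2", quota_gb=100)]
print(place(resc_ou1, 30).name)  # vault_ou1_1 (tie, first listed wins)
print(place(resc_ou1, 30).name)  # vault_ou1_2 (now has more free quota)
```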

SLIDE 13

Collection namespace

/rdm/DI/DCCN/DAC_3010000.01_123/
  • rdm: iRODS zone
  • DI: organisation
  • DCCN: organisational unit
  • DAC_3010000.01_123: DRDR collection

Roles: admin, viewer (organisational unit); manager, contributor, viewer (DRDR collection)

  • namespace reflects administrative hierarchy
  • metadata in KVU triplets
  • role-based authorisation with iRODS groups
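
Because the namespace mirrors the administrative hierarchy, a collection path can be decoded level by level. A minimal sketch, assuming every DRDR collection sits exactly four levels deep as in the example path above; `CollectionPath`, `parse_collection_path` and the group-naming scheme in `role_group` are hypothetical.

```python
from typing import NamedTuple


class CollectionPath(NamedTuple):
    zone: str           # iRODS zone
    organisation: str
    ou: str             # organisational unit
    collection: str     # DRDR collection identifier


def parse_collection_path(path: str) -> CollectionPath:
    """Split /zone/organisation/ou/collection into its named levels."""
    parts = [p for p in path.split("/") if p]
    if len(parts) != 4:
        raise ValueError(f"expected /zone/organisation/ou/collection, got {path!r}")
    return CollectionPath(*parts)


def role_group(path: CollectionPath, role: str) -> str:
    """Hypothetical naming scheme: one iRODS group per collection and role."""
    return f"{path.ou}.{path.collection}.{role}"


p = parse_collection_path("/rdm/DI/DCCN/DAC_3010000.01_123/")
print(p.ou, p.collection)       # DCCN DAC_3010000.01_123
print(role_group(p, "viewer"))  # DCCN.DAC_3010000.01_123.viewer
```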

SLIDE 14

Management rules

  • an RPC-like interface for collection management
  • example: rdmUpdateCollectionMetadata, called by the client and executed on the server (core.re)
    • inputs: *collName, *kvp
    • filter out attributes the client doesn’t have the right to set
    • verify attribute values
    • update collection attributes
    • return up-to-date collection attributes
    • output: *errorcode, *collectionAttrs
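
The rule itself lives in the iRODS rule language (core.re). The Python sketch below only mirrors the four server-side steps listed on the slide, to make the control flow concrete; the role table, the attribute names (`title`, `keyword`), the validators and the error codes are all hypothetical.

```python
# Hypothetical: attributes each collection role may set.
SETTABLE = {
    "manager": {"title", "keyword"},
    "contributor": {"keyword"},
    "viewer": set(),
}

# Hypothetical value checks per attribute.
VALIDATORS = {
    "title": lambda v: 0 < len(v) <= 256,
    "keyword": lambda v: v.isprintable() and 0 < len(v) <= 64,
}

ERR_OK, ERR_INVALID_VALUE = 0, 1  # hypothetical error codes


def rdm_update_collection_metadata(store, coll_name, kvp, client_role):
    """Mirror of the rule's flow; returns (errorcode, collectionAttrs)."""
    attrs = store[coll_name]
    # 1. Filter out attributes the client doesn't have the right to set.
    permitted = {k: v for k, v in kvp.items() if k in SETTABLE.get(client_role, set())}
    # 2. Verify attribute values; reject the whole update on the first bad value.
    for key, value in permitted.items():
        if not VALIDATORS[key](value):
            return ERR_INVALID_VALUE, dict(attrs)
    # 3. Update collection attributes.
    attrs.update(permitted)
    # 4. Return the up-to-date collection attributes (a copy, not the live dict).
    return ERR_OK, dict(attrs)


store = {"/rdm/DI/DCCN/DAC_3010000.01_123": {"title": "pilot"}}
print(rdm_update_collection_metadata(
    store, "/rdm/DI/DCCN/DAC_3010000.01_123",
    {"title": "MEG pilot", "keyword": "MEG"}, "contributor"))
# (0, {'title': 'pilot', 'keyword': 'MEG'})
```

Silently dropping unauthorised attributes (rather than failing) matches the slide's "filter out" step; the contributor's `title` change above is ignored while the `keyword` is stored.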

SLIDE 15

User provisioning and authentication

  • authentication via a national federated IdP
  • users are provisioned upon sign-up to the management portal
  • IdP attributes are stored as user KVU triplets in iRODS
  • PAM authentication with one-time passwords (OTP) is set up for data access
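
Storing IdP attributes as user KVU triplets amounts to flattening the attribute set released at sign-up. A minimal sketch, assuming attributes arrive as a mapping of name to list of values; the `idp.` key prefix and the helper names are hypothetical. It only builds `imeta add -u` command lines and does not execute them, nor does it cover the PAM/OTP step.

```python
import shlex


def idp_to_kvu(idp_attrs: dict[str, list[str]], prefix: str = "idp.") -> list[tuple[str, str, str]]:
    """Flatten (possibly multi-valued) IdP attributes into (key, value, unit) triplets."""
    triplets = []
    for name in sorted(idp_attrs):
        for value in idp_attrs[name]:
            triplets.append((prefix + name, value, ""))  # no unit for identity attributes
    return triplets


def imeta_commands(username: str, triplets: list[tuple[str, str, str]]) -> list[list[str]]:
    """Build `imeta add -u` argument lists that would attach the triplets to the iRODS user."""
    cmds = []
    for key, value, unit in triplets:
        cmd = ["imeta", "add", "-u", username, key, value]
        if unit:
            cmd.append(unit)
        cmds.append(cmd)
    return cmds


attrs = {"mail": ["jdoe@example.org"], "displayName": ["Jane Doe"]}
for cmd in imeta_commands("jdoe", idp_to_kvu(attrs)):
    print(shlex.join(cmd))
# imeta add -u jdoe idp.displayName 'Jane Doe'
# imeta add -u jdoe idp.mail jdoe@example.org
```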

SLIDE 16

Event logging

Components: iCAT, PEP, rodsLog, filebeat; used for reporting and auditing

  • essential user actions are logged as events
  • events are streamed to the Elastic stack in a non-blocking way via filebeat
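
In this design a PEP only has to write one line per event to the rodsLog; filebeat tails the log and ships the lines to the Elastic stack, so the user's operation never waits on the logging pipeline. The Python sketch below shows one possible line format and a matching decoder; the `RDM_EVENT` marker, the field names and the action name are hypothetical. In practice the decoding would more likely happen on the shipping side (for example with Filebeat's `decode_json_fields` processor) than in hand-written code.

```python
import json
from datetime import datetime, timezone

EVENT_MARKER = "RDM_EVENT "  # hypothetical tag separating events from other rodsLog lines


def format_event(action: str, user: str, collection: str, **details: str) -> str:
    """Render one user action as a single log line carrying a JSON payload."""
    event = {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "user": user,
        "collection": collection,
        **details,
    }
    return EVENT_MARKER + json.dumps(event, sort_keys=True)


def extract_events(log_lines):
    """Yield decoded events from log lines, skipping everything else."""
    for line in log_lines:
        pos = line.find(EVENT_MARKER)
        if pos != -1:
            yield json.loads(line[pos + len(EVENT_MARKER):])


log = [
    "NOTICE: agent started",
    "NOTICE: " + format_event("grant_role", "jdoe", "/rdm/DI/DCCN/DAC_3010000.01_123", role="viewer"),
]
print([e["action"] for e in extract_events(log)])  # ['grant_role']
```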

SLIDE 17

User interfaces

  • separating data access from collection management
  • web portal for collection management
  • WebDAV (Davrods) for “easy” data transfer
  • file stager: a service “intelligently” managing bulk file transfers between local storage and the repository

File stager: web interface → transfer job manager/queue → transfer agents → actual transfer (irsync) between local storage and DRDR
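
The stager decouples the web interface from the actual transfer: requests go into a job queue, and a pool of transfer agents works through it by calling irsync. A minimal sketch with Python's standard `queue` and `threading` modules; the `TransferJob` type, the agent count and the use of `irsync -r` are assumptions, and by default the runner is a dry run (pass `subprocess.call` to actually execute the commands).

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TransferJob:
    job_id: int
    src: str  # local path, or an iRODS path with the "i:" prefix irsync expects
    dst: str


def irsync_argv(job: TransferJob) -> list[str]:
    # -r: synchronise directories recursively
    return ["irsync", "-r", job.src, job.dst]


def run_stager(jobs: list[TransferJob], n_agents: int = 2,
               runner: Optional[Callable[[list[str]], int]] = None) -> dict[int, int]:
    """Feed jobs through a queue to n_agents worker threads; return {job_id: exit code}."""
    run = runner or (lambda argv: 0)  # dry run by default
    q: "queue.Queue[Optional[TransferJob]]" = queue.Queue()
    results: dict[int, int] = {}
    lock = threading.Lock()

    def agent() -> None:
        while True:
            job = q.get()
            if job is None:  # sentinel: no more work for this agent
                return
            code = run(irsync_argv(job))
            with lock:
                results[job.job_id] = code

    agents = [threading.Thread(target=agent) for _ in range(n_agents)]
    for t in agents:
        t.start()
    for job in jobs:
        q.put(job)
    for _ in agents:
        q.put(None)  # one sentinel per agent
    for t in agents:
        t.join()
    return results


jobs = [
    TransferJob(1, "/home/jdoe/meg_raw", "i:/rdm/DI/DCCN/DAC_3010000.01_123/raw"),
    TransferJob(2, "i:/rdm/DI/DCCN/DAC_3010000.01_123/raw", "/scratch/jdoe/raw"),
]
print(sorted(run_stager(jobs).items()))  # [(1, 0), (2, 0)]
```

Keeping the queue between the interface and the agents lets the number of concurrent irsync processes be capped independently of how many requests users submit.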

SLIDE 18

Strengths and weaknesses

👍 It fits the combined scientific-administrative workflow of a large and heterogeneous institute

👍 It provides sufficient functionality for
  • 1. sharing data for publication
  • 2. implementing a Data Management Plan (DMP)

👎 It has weak integration with the data-analysis facility

👎 It doesn't implement a standard way of organising collection content

SLIDE 19

Future focuses

  • seamless integration with the computing (data-analysis) facility
  • FAIRness of published data collections
  • adoption beyond neuroimaging

SLIDE 20

Summary

  • We structured an RDM workflow
    • which covers the entire research lifecycle
    • in which both researchers and administrators share responsibility
    • which is specified by a protocol and implemented by an iRODS-based digital repository

https://data.donders.ru.nl
