SLIDE 1

Putting the Trust into Trusted Data Repositories:

A Federated Solution for the Australian National Imaging Facility

Andrew Mehnert

Joint NIF and Microscopy Australia Informatics Fellow
Senior Lecturer – Data Management, Analysis and Visualisation
Centre for Microscopy, Characterisation & Analysis (CMCA)
The University of Western Australia

* https://www.slideshare.net/OpenAIRE_eu/overview-of-the-data-pilot-and-openaire-tools-elly-dijk-and-marjan-grootveld-openaire-workshop-ghent-nov2015

14th International Digital Curation Conference (IDCC19) | Melbourne – Australia | 4 - 7 February 2019

SLIDE 2
  • NIF is a 200 million AUD network of characterisation facilities
  • State-of-the-art imaging capability for the characterisation of humans, animals, plants and materials for the Australian research community
  • Its MRI, PET and CT scanners produce vast amounts of valuable research data

Australian National Imaging Facility

http://anif.org.au

SLIDE 3
  • For many characterisation facilities, data management is the user’s responsibility!
  • Several drawbacks:
      ➢ Security and virus infection risks
      ➢ Inability to monitor the quality of the data acquired
      ➢ Difficulty tracking outcomes such as publications and data reuse
      ➢ Does not follow best practices - data management, legal & funding obligations
      ➢ Difficulty collaborating with other researchers and institutions
      ➢ Impracticality of moving and analysing the data as instruments generate ever-larger volumes of data
      ➢ Does not support the reuse of data

Data Management

SLIDE 4
  • Why trusted data repositories?
      ➢ To be able to share data
      ➢ To preserve the initial investment in collecting the data
      ➢ To ensure that data remain useful and meaningful into the future
      ➢ Funding authorities increasingly require continued access to data produced by the projects they fund

Solution: Trusted Data Repositories

https://www.coretrustseal.org

SLIDE 5
  • 12-month project completed December 2017
  • Broad aim:
    To enhance the quality, durability and reliability of data generated by NIF
      Quality - data captured according to the NIF Agreed Process
      Durable - data that has guaranteed availability for 10 years
      Reliable - data that is useful for future researchers
  • Motivation:
      ➢ NIF’s desire to enhance the quality of the data acquired across its facilities
      ➢ ARDC’s desire to establish TDRs and learn how to move beyond simple data storage services

NIF/RDS/ANDS Trusted Data Repositories Project

Delivering durable, reliable, high-quality image data

SLIDE 6
  • Broad goals:
      ➢ Define requirements and best practices for a federated network of repositories for NIF
      ➢ Exemplar services across several NIF nodes

  • NIF nodes:

NIF/RDS/ANDS Trusted Data Repositories Project

Delivering durable, reliable, high-quality image data

SLIDE 7

1. Trust in the repository service (Container)
   → CoreTrustSeal https://www.coretrustseal.org
      ➢ Community-based non-profit organisation
      ➢ Core-level certification for a data repository
      ➢ 16+ requirements

2. Trust in the data quality (Contents)
   A NIF user’s expectation is that an animal, plant or material can be scanned and from that data reliable outcomes/characterisations can be obtained (e.g. signal, volume, morphology) over time and across NIF sites
   → NIF Agreed Process

Two Types of Trust

SLIDE 8

1. Do not mandate a particular software platform for implementing the exemplar TDR services
2. Implement the exemplar TDR services using existing
      ➢ local and national infrastructure
      ➢ open source software platforms
3. Focus on magnetic resonance imaging (MRI) instrumentation
4. Be guided by the requirements needed to attain trusted data repository certification rather than seek certification

Project Scope

SLIDE 9

1. NIF Agreed Process (NAP) to obtain trusted data from NIF instruments
2. Requirements necessary and sufficient for a basic NIF trusted data repository service (platform agnostic)
3. Exemplar repository services across all four participating nodes
4. Self-assessments against the “Core Trustworthy Data Repositories Requirements”

Key Project Outcomes

SLIDE 10
  • Requirements that must be satisfied to obtain high-quality or NIF-certified data suitable for ingestion in a NIF trusted data repository service
  • Repository data must be organised by Project ID
  • For data to meet the definition of NIF-certified it must:
      ➢ Have been acquired on a NIF-compliant instrument
      ➢ Possess NIF-minimal metadata including cross-reference to instrument Quality Control data
      ➢ Include native data generated by the instrument including the acquisition settings/parameters
      ➢ Include conversions to one or more open data formats

Key project outcomes

  • 1. NIF Agreed Process for acquiring high-quality data

Projects Datasets Datafiles
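
The NIF-certified criteria on this slide lend themselves to an automated check. The following is a minimal Python sketch and not the project's implementation: the `DatasetRecord` class, its field names and the `unmet_criteria` function are hypothetical, chosen only to mirror the Project ID rule and the four criteria listed above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical description of one dataset offered for ingestion."""
    project_id: Optional[str] = None
    instrument_id: Optional[str] = None
    instrument_is_nif_compliant: bool = False
    qc_reference: Optional[str] = None  # cross-reference to instrument QC data
    native_files: List[str] = field(default_factory=list)
    acquisition_parameters_included: bool = False
    open_format_files: List[str] = field(default_factory=list)


def unmet_criteria(record: DatasetRecord) -> List[str]:
    """Return the criteria the record fails; an empty list means none failed."""
    problems = []
    if not record.project_id:
        problems.append("missing Project ID")
    if not record.instrument_is_nif_compliant:
        problems.append("not acquired on a NIF-compliant instrument")
    if not (record.instrument_id and record.qc_reference):
        problems.append("NIF-minimal metadata incomplete")
    if not (record.native_files and record.acquisition_parameters_included):
        problems.append("native data or acquisition parameters missing")
    if not record.open_format_files:
        problems.append("no conversion to an open data format")
    return problems


def is_nif_certified(record: DatasetRecord) -> bool:
    """A record counts as NIF-certified here only if every check passes."""
    return not unmet_criteria(record)
```

Returning the list of unmet criteria, rather than a bare true/false, would let either a client-side uploader or a server-side validator report why a dataset was not flagged as NIF-certified.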

SLIDE 11
  • “Core Trustworthy Data Repositories Requirements”
  • Additional NIF requirements

      ➢ Project ID
      ➢ Instrument ID
      ➢ Quality control (QC)
      ➢ Authentication
      ➢ Interoperability
      ➢ Redeployability
      ➢ Service

Key project outcomes

  • 2. NIF requirements for a TDR service
SLIDE 12

Key project outcomes

  • 3. Exemplar repository services
SLIDE 13

Blue numbers indicate responses showing a variance greater than 1 across the field

0 – Not applicable
1 – The repository has not considered this yet
2 – The repository has a theoretical concept
3 – The repository is in the implementation phase
4 – The guideline has been fully implemented in the repository

Rn | Description | Monash | UNSW | UQ | UWA
1 | Mission / scope: The repository has an explicit mission to provide access to and preserve data in its domain. | 3 | 4 | 4 | 4
2 | Licenses: The repository maintains all applicable licenses covering data access and use and monitors compliance. | 3 | 3 | 3 | 3
3 | Continuity of access: The repository has a continuity plan to ensure ongoing access to and preservation of its holdings. | 1 | 4 | 4 | 4
4 | Confidentiality/Ethics: The repository ensures, to the extent possible, that data are created, curated, accessed, and used in compliance with disciplinary and ethical norms. | 4 | 2 | 3 | 2
5 | Organizational infrastructure: The repository has adequate funding and sufficient numbers of qualified staff managed through a clear system of governance to effectively carry out the mission. | 4 | 3 | 4 | 3
6 | Expert guidance: The repository adopts mechanism(s) to secure ongoing expert guidance and feedback (either in-house, or external, including scientific guidance, if relevant). | 4 | 3 | 3 | 3
7 | Data integrity and authenticity: The repository guarantees the integrity and authenticity of the data. | 3 | 3 | 4 | 3
8 | Appraisal: The repository accepts data and metadata based on defined criteria to ensure relevance and understandability for data users. | 3 | 3 | 4 | 3

Key project outcomes

  • 4. CoreTrustSeal self-assessments
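
The blue highlighting is lost in this transcript, and the slide does not say how "variance" was calculated. As a hedged illustration only, the Python sketch below applies two possible readings to the R1 to R8 scores listed above: the sample variance across the four nodes, and the simple spread (maximum minus minimum). The names `SCORES`, `flagged_by_variance` and `flagged_by_range` are assumptions made for this example, and neither reading is confirmed as the one used by the project.

```python
from statistics import variance

NODES = ("Monash", "UNSW", "UQ", "UWA")

# Self-assessment scores for requirements R1-R8, in NODES order.
SCORES = {
    1: (3, 4, 4, 4),
    2: (3, 3, 3, 3),
    3: (1, 4, 4, 4),
    4: (4, 2, 3, 2),
    5: (4, 3, 4, 3),
    6: (4, 3, 3, 3),
    7: (3, 3, 4, 3),
    8: (3, 3, 4, 3),
}


def flagged_by_variance(scores, threshold=1):
    """Requirements whose sample variance across nodes exceeds the threshold."""
    return [rn for rn, row in scores.items() if variance(row) > threshold]


def flagged_by_range(scores, threshold=1):
    """Requirements whose max-minus-min spread across nodes exceeds the threshold."""
    return [rn for rn, row in scores.items() if max(row) - min(row) > threshold]


print(flagged_by_variance(SCORES))  # [3]
print(flagged_by_range(SCORES))     # [3, 4]
```

Under either reading, R3 (continuity of access) stands out; R4 (confidentiality/ethics) is flagged only when the spread is used.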
SLIDE 14

Blue numbers indicate responses showing a variance greater than 1 across the field

0 – Not applicable
1 – The repository has not considered this yet
2 – The repository has a theoretical concept
3 – The repository is in the implementation phase
4 – The guideline has been fully implemented in the repository

Key project outcomes

  • 4. CoreTrustSeal self-assessments
SLIDE 15

NIF TDR Project in a nutshell

Instrument PC

  • Uploader client

Research Data Australia (RDA)

  • Data + service discovery

Instrument record

  • Unique handle (Instrument ID)

  • Instrument description

Quality Control (QC) Dataset

  • QC standard operating procedure
  • QC data

NIF-agreed process

TruDat@{UWA, UQ, UNSW, Monash}

  • Login via AAF
  • Datasets organised by Project ID
  • Dataset
      ➢ Linked to an instrument
      ➢ NIF-certification flag
  • Instrument
      ➢ Linked to a QC Project ID
      ➢ Handle to a record in RDA

Projects Datasets Datafiles

User Dataset

  • NIF-minimal metadata
      ➢ Project ID, Instrument ID
      ➢ Date and time
      ➢ Implicit metadata
  • Native data
  • Data conversions to one or more open formats

https://researchdata.ands.org.au
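
The relationships on this slide (datasets organised by Project ID, each dataset linked to an instrument, each instrument linked to a QC Project ID and to a handle in RDA) can be pictured as a small data model. The Python sketch below is illustrative only: the class and field names are assumptions, and it does not reproduce the MyTardis schema or any TruDat code.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instrument:
    handle: str          # unique handle (Instrument ID) pointing to a record in RDA
    description: str
    qc_project_id: str   # Project ID under which this instrument's QC datasets sit


@dataclass
class Dataset:
    project_id: str
    instrument: Instrument
    acquired: str                    # date and time as text, e.g. ISO 8601
    nif_certified: bool = False      # NIF-certification flag
    datafiles: List[str] = field(default_factory=list)


def group_by_project(datasets: List[Dataset]) -> Dict[str, List[Dataset]]:
    """Organise datasets by Project ID (Projects > Datasets > Datafiles)."""
    projects = defaultdict(list)
    for ds in datasets:
        projects[ds.project_id].append(ds)
    return dict(projects)


# Placeholder values, not real identifiers.
mri = Instrument(handle="EXAMPLE-HANDLE", description="Example MRI scanner",
                 qc_project_id="QC-EXAMPLE")
user_ds = Dataset("PROJ-A", mri, "2019-02-04T10:00:00", nif_certified=True,
                  datafiles=["native.dat", "converted.nii"])
qc_ds = Dataset(mri.qc_project_id, mri, "2019-02-04T08:00:00",
                datafiles=["phantom.dat"])
by_project = group_by_project([user_ds, qc_ds])
```

Because the instrument carries its QC Project ID, the QC datasets that apply to any user dataset can be looked up with `by_project[user_ds.instrument.qc_project_id]`.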

SLIDE 16

Projects Datasets Datafiles

Based on a Docker deployment of MyTardis* + extensions

*Client-server software platform for storing, sharing, visualising and annotating instrument data – originated at Monash University

Exemplar: TruDat@UWA

https://trudat.cmca.uwa.edu.au

SLIDE 17

TruDat@UWA: Project/dataset view

SLIDE 18

TruDat@UWA: Dataset/datafile view

SLIDE 19

Challenges

Multiple NIF nodes, universities and states posed two major challenges:

1. Project management and execution

  • Project lead at each node; one appointed overall Project Manager
  • Steering Committee
  • Face-to-face workshops
  • Regular fortnightly minuted Zoom meetings
  • Document sharing/collaboration via Google Docs

2. Institutional differences with respect to existing technical approaches

  • “NIF requirements for a trusted data repository service” – platform agnostic
  • NAP – either client-side NIF-certification of data or server-side validation
SLIDE 20

Lessons learned

  • Community collaboration and consensus are essential
      ➢ NIF agreed process, TDR requirements, QC process (common)
      ➢ Documentation registry, software repository
      ➢ Regular meetings and collaborative platforms for documentation and meetings
  • Project documents will evolve with time and need to be adapted for different instruments
  • CoreTrustSeal
      ➢ Guide to establishment of services
      ➢ Metric against which to assess services

  • Platform-agnostic service branding

SLIDE 21
  • Maintenance of services for 10 years + refinements and improvements
  • Integration of additional NIF instruments (MRI, PET, CT)
  • Planned new national and international service deployments

      ➢ UWA integration of Microscopy Australia instruments (CT, EM)
      ➢ South Australian Health and Medical Research Institute (SAHMRI)
      ➢ Euro-BioImaging Finnish Advanced Light Microscopy (EuBI Finnish ALM) Node in Turku
      ➢ King’s College London

  • CoreTrustSeal certification
  • Support the ARDC C-DeVL project

Future Developments

SLIDE 22
  • Benefits for NIF users and the broader community
      ➢ Reliable and durable access to data
      ➢ Improved reliability of research outputs and provenance
      ➢ Making NIF data more FAIR* (Findable, Accessible, Interoperable, Reusable)
      ➢ Easier linkages between publications and data
      ➢ Stronger research partnerships
  • Benefits for NIF and research institutions
      ➢ Improved data quality
      ➢ Improved international reputation
      ➢ The ability to run or engage in multi-centre imaging trials and projects
      ➢ A means by which to comply with the Australian Code for the Responsible Conduct of Research

Conclusions

*https://www.ands.org.au/working-with-data/fairdata

SLIDE 23

Acknowledgements

Project Manager and UWA lead: Andrew Mehnert (NIF Informatics Fellow, Centre for Microscopy, Characterisation and Analysis)
NIF lead: Graham Galloway (Chief Executive Officer, NIF)
UQ lead: Andrew Janke (NIF Informatics Fellow, Centre for Advanced Imaging)
UNSW lead: Marco Gruwel (Senior Research Associate, Mark Wainwright Analytical Centre)
Monash lead: Wojtek Goscinski (Associate Director, Monash eResearch Centre)

SLIDE 24

It used to be “simple” …

Memorial portrait of Robert Hooke for Christ Church Oxford, where he studied. Painted by Rita Greer, 2011.

SLIDE 25

Research Data Australia

https://researchdata.ands.org.au

http://hdl.handle.net/102.100.100/50041

SLIDE 26

Quality Control

SLIDE 27

Australian National Infrastructure and Tools

  • National Servers (Storage)
  • Research Cloud (Computation)
  • eResearch Tools & Virtual Laboratories (Functionality)

SLIDE 28
  • A University centre - we collaborate in microscopy & characterisation supporting research excellence locally, nationally and internationally
  • ~48 instruments (~$45m)
  • ~35 staff (15 academics)
  • >500 users
  • Integrated microscopy & microanalysis, imaging, cytometry, metabolomics, genomics, NMR

About the CMCA

SLIDE 29

Where is the CMCA located?

  • UWA main campus
  • Harry Perkins Institute, QEII Medical Centre

SLIDE 30

TruDat@UWA: Client + Server

  • Data upload from an instrument or companion PC
  • TruDat@UWA web-based application

SLIDE 31

TruDat@UWA: Facility overview

SLIDE 32
  • Provides researchers with access to instrumentation and expertise to probe and measure structures & properties of matter
  • Essential across natural, agricultural, physical, life and biomedical sciences and engineering
  • Encompasses a diverse range of techniques including:
      ➢ Optical, electron, X-ray and ion-beam techniques, magnetic resonance imaging, positron emission tomography, mass spectrometry, ultrasound and cytometry

  • State characterisation facility
  • National characterisation facilities

Characterisation Facility

SLIDE 33

TruDat@UWA: QC project view

SLIDE 34

1. List of requirements necessary and sufficient for a basic NIF TDR service;
2. NIF-wide agreed process (NAP) for acquiring quality or trusted data and uploading it to a NIF TDR service;
3. Instantiations of the NAP for repository services supporting two exemplar data collections chosen to reflect both preclinical and clinical imaging within NIF, and also the difference between data acquisition for a specialist repository and a more general repository;
4. Exemplar repository services across the participating nodes;
5. Assessments against the CoreTrustSeal “Core Trustworthy Data Repositories Requirements” (CoreTrustSeal, 2016b).

NIF TDR Project: Goals
