
Putting the Trust into Trusted Data Repositories: A Federated Solution for the Australian National Imaging Facility



  1. Putting the Trust into Trusted Data Repositories: A Federated Solution for the Australian National Imaging Facility
     Andrew Mehnert*
     *Joint NIF and Microscopy Australia Informatics Fellow; Senior Lecturer – Data Management, Analysis and Visualisation, Centre for Microscopy, Characterisation & Analysis (CMCA), The University of Western Australia
     https://www.slideshare.net/OpenAIRE_eu/overview-of-the-data-pilot-and-openaire-tools-elly-dijk-and-marjan-grootveld-openaire-workshop-ghent-nov2015
     14th International Digital Curation Conference (IDCC19) | Melbourne – Australia | 4 - 7 February 2019

  2. Australian National Imaging Facility
     http://anif.org.au
     • NIF is a 200 million AUD network of characterisation facilities
     • State-of-the-art imaging capability for the characterisation of humans, animals, plants and materials for the Australian research community
     • Its MRI, PET and CT scanners produce vast amounts of valuable research data

  3. Data Management
     • For many characterisation facilities it is the user’s responsibility!
     • Several drawbacks:
       - Security and virus infection risks
       - Inability to monitor the quality of the data acquired
       - Difficulty tracking outcomes such as publications and data reuse
       - Does not follow best practices: data management, legal & funding obligations
       - Difficulty collaborating with other researchers and institutions
       - Impracticality of moving and analysing the data as instruments generate ever-larger volumes of data
       - Does not support the reuse of data

  4. Solution: Trusted Data Repositories
     https://www.coretrustseal.org
     • Why trusted data repositories?
       - To be able to share data
       - To preserve the initial investment in collecting the data
       - To ensure that data remain useful and meaningful into the future
       - Funding authorities increasingly require continued access to data produced by the projects they fund

  5. NIF/RDS/ANDS Trusted Data Repositories Project
     Delivering durable, reliable, high-quality image data
     • 12-month project completed December 2017
     • Broad aim: To enhance the quality, durability and reliability of data generated by NIF
       - Quality: data captured according to the NIF Agreed Process
       - Durable: data that has guaranteed availability for 10 years
       - Reliable: data that is useful for future researchers
     • Motivation:
       - NIF’s desire to enhance the quality of the data acquired across its facilities
       - ARDC’s desire to establish TDRs and learn how to move beyond simple data storage services

  6. NIF/RDS/ANDS Trusted Data Repositories Project
     Delivering durable, reliable, high-quality image data
     • Broad goals:
       - Define requirements and best practices for a federated network of repositories for NIF
       - Exemplar services across several NIF nodes
     • NIF nodes:

  7. Two Types of Trust
     1. Trust in the repository service (Container) → CoreTrustSeal https://www.coretrustseal.org
        - Community-based non-profit organisation
        - Core-level certification for a data repository
        - 16+ requirements
     2. Trust in the data quality (Contents) → NIF Agreed Process
        A NIF user’s expectation is that an animal, plant or material can be scanned and from that data reliable outcomes/characterisations can be obtained (e.g. signal, volume, morphology) over time and across NIF sites

  8. Project Scope
     1. Do not mandate a particular software platform for implementing the exemplar TDR services
     2. Implement the exemplar TDR services using existing
        - local and national infrastructure
        - open source software platforms
     3. Focus on magnetic resonance imaging (MRI) instrumentation
     4. Be guided by the requirements needed to attain trusted data repository certification rather than seek certification

  9. Key Project Outcomes
     1. NIF Agreed Process (NAP) to obtain trusted data from NIF instruments
     2. Requirements necessary and sufficient for a basic NIF trusted data repository service (platform agnostic)
     3. Exemplar repository services across all four participating nodes
     4. Self-assessments against the “Core Trustworthy Data Repositories Requirements”

  10. Key project outcomes
      1. NIF Agreed Process for acquiring high-quality data
      • Requirements that must be satisfied to obtain high-quality or NIF-certified data suitable for ingestion in a NIF trusted data repository service
      • Repository data must be organised by Project ID
        [Diagram: Projects → Datasets → Datafiles]
      • For data to meet the definition of NIF-certified it must:
        - Have been acquired on a NIF-compliant instrument
        - Possess NIF-minimal metadata including cross-reference to instrument Quality Control data
        - Include native data generated by the instrument including the acquisition settings/parameters
        - Include conversions to one or more open data formats
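
The four NIF-certification criteria on this slide can be read as a simple check over a dataset record. The following is a minimal Python sketch under that reading. The `Dataset` class, its field names, and the functions `unmet_criteria` and `is_nif_certified` are hypothetical illustrations, not part of the NIF Agreed Process or of any TruDat software, and the choice of fields standing in for "NIF-minimal metadata" is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Dataset:
    """Hypothetical dataset record; every field name here is an assumption."""
    project_id: Optional[str] = None           # data are organised by Project ID
    instrument_id: Optional[str] = None
    instrument_is_nif_compliant: bool = False
    qc_reference: Optional[str] = None         # cross-reference to instrument QC data
    native_files: List[str] = field(default_factory=list)
    acquisition_parameters_included: bool = False
    open_format_files: List[str] = field(default_factory=list)


def unmet_criteria(ds: Dataset) -> List[str]:
    """Return the slide's certification criteria that this record does not satisfy."""
    problems = []
    if not ds.instrument_is_nif_compliant:
        problems.append("not acquired on a NIF-compliant instrument")
    if not (ds.project_id and ds.instrument_id and ds.qc_reference):
        problems.append("NIF-minimal metadata or QC cross-reference missing")
    if not (ds.native_files and ds.acquisition_parameters_included):
        problems.append("native data or acquisition settings missing")
    if not ds.open_format_files:
        problems.append("no conversion to an open data format")
    return problems


def is_nif_certified(ds: Dataset) -> bool:
    """A record counts as NIF-certified only when no criterion is unmet."""
    return not unmet_criteria(ds)


# Illustrative record with made-up identifiers and file names.
example = Dataset(
    project_id="P001",
    instrument_id="MRI-1",
    instrument_is_nif_compliant=True,
    qc_reference="QC-P001",
    native_files=["scan.raw"],
    acquisition_parameters_included=True,
    open_format_files=[],  # conversion step not yet done
)
print(is_nif_certified(example), unmet_criteria(example))
```

Returning the list of unmet criteria, rather than only a boolean, lets an uploader client tell the user what is still missing before a dataset is flagged as NIF-certified.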

  11. Key project outcomes
      2. NIF requirements for a TDR service
      • “Core Trustworthy Data Repositories Requirements”
      • Additional NIF requirements
        - Project ID
        - Instrument ID
        - Quality control (QC)
        - Authentication
        - Interoperability
        - Redeployability
        - Service

  12. Key project outcomes 3. Exemplar repository services

  13. Key project outcomes
      4. CoreTrustSeal self-assessments

      Rn  Monash  UNSW  UQ  UWA  Description
      1   3       4     4   4    Mission / scope: The repository has an explicit mission to provide access to and preserve data in its domain.
      2   3       3     3   3    Licenses: The repository maintains all applicable licenses covering data access and use and monitors compliance.
      3   1       4     4   4    Continuity of access: The repository has a continuity plan to ensure ongoing access to and preservation of its holdings.
      4   4       2     3   2    Confidentiality/Ethics: The repository ensures, to the extent possible, that data are created, curated, accessed, and used in compliance with disciplinary and ethical norms.
      5   4       3     4   3    Organizational infrastructure: The repository has adequate funding and sufficient numbers of qualified staff managed through a clear system of governance to effectively carry out the mission.
      6   4       3     3   3    Expert guidance: The repository adopts mechanism(s) to secure ongoing expert guidance and feedback (either in-house, or external, including scientific guidance, if relevant).
      7   3       3     4   3    Data integrity and authenticity: The repository guarantees the integrity and authenticity of the data.
      8   3       3     4   3    Appraisal: The repository accepts data and metadata based on defined criteria to ensure relevance and understandability for data users.

      Blue numbers indicate responses showing a variance greater than 1 across the field
      0 – Not applicable
      1 – The repository has not considered this yet
      2 – The repository has a theoretical concept
      3 – The repository is in the implementation phase
      4 – The guideline has been fully implemented in the repository
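
The slide highlights responses "showing a variance greater than 1 across the field" but does not say how that variance is computed, and the blue highlighting is not preserved in this transcript. The sketch below, in Python, applies two plausible readings to the R1 to R8 scores from this slide: the spread between the highest and lowest node score, and the population variance. Both readings and the function names are assumptions for illustration, not the authors' stated method.

```python
from statistics import pvariance

# Self-assessment scores for requirements R1-R8 as listed on the slide,
# in the order (Monash, UNSW, UQ, UWA).
scores = {
    1: (3, 4, 4, 4),
    2: (3, 3, 3, 3),
    3: (1, 4, 4, 4),
    4: (4, 2, 3, 2),
    5: (4, 3, 4, 3),
    6: (4, 3, 3, 3),
    7: (3, 3, 4, 3),
    8: (3, 3, 4, 3),
}


def flagged_by_spread(table, threshold=1):
    """Assumed reading 1: flag rows where max score minus min score exceeds the threshold."""
    return [r for r, s in table.items() if max(s) - min(s) > threshold]


def flagged_by_variance(table, threshold=1):
    """Assumed reading 2: flag rows whose population variance exceeds the threshold."""
    return [r for r, s in table.items() if pvariance(s) > threshold]


print("spread > 1:  ", flagged_by_spread(scores))
print("variance > 1:", flagged_by_variance(scores))
```

The two readings do not flag the same set of requirements for these scores, so the definition matters when comparing self-assessments across nodes.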

  14. Key project outcomes
      4. CoreTrustSeal self-assessments
      Blue numbers indicate responses showing a variance greater than 1 across the field
      0 – Not applicable
      1 – The repository has not considered this yet
      2 – The repository has a theoretical concept
      3 – The repository is in the implementation phase
      4 – The guideline has been fully implemented in the repository

  15. NIF TDR Project in a nutshell
      [Diagram; box contents regrouped from the flattened slide text]
      User dataset
      • NIF-minimal metadata
        - Project ID, Instrument ID
        - Date and time
        - Implicit metadata
      • Native data
      • Data conversions to one or more open formats
      Projects → Datasets → Datafiles
      NIF-agreed process
      Instrument PC
      • Uploader client
      Quality Control (QC) dataset
      • QC standard operating procedure
      • QC data
      TruDat@{UWA, UQ, UNSW, Monash}
      • Login via AAF
      • Datasets organised by Project ID
      • Dataset
        - Linked to an instrument
        - NIF-certification flag
      • Instrument
        - Linked to a QC Project ID
        - Handle to a record in RDA
      Instrument record
      • Unique handle (Instrument ID)
      • Instrument description
      Research Data Australia (RDA) https://researchdata.ands.org.au
      • Data + service discovery

  16. Exemplar: TruDat@UWA
      https://trudat.cmca.uwa.edu.au
      [Diagram: Projects → Datasets → Datafiles]
      Based on a Docker deployment of MyTardis* + extensions
      *Client-server software platform for storing, sharing, visualising and annotating instrument data; originated at Monash University

  17. TruDat@UWA: Project/dataset view

  18. TruDat@UWA: Dataset/datafile view
