 
Bringing Long-Tail Microscopy & Characterisation Data into the Light

Characterisation: Process of probing and measuring the structures and properties of materials at the micro, nano and atomic scales

Long-tail data: Relatively small (KB, MB, GB), unstructured and un-curated

Question: What is needed to extend the ARDC-funded NIF Trusted Data Repository solution to include Microscopy Australia instrument data and to facilitate FAIR for both characterisation communities?

[Workflow diagram *; labelled elements:]
• User
• RAiD service
• NIF/MA Instrument
• User Dataset: Project ID; Instrument ID; Native data; Implicit/explicit metadata; Conversions to open format(s)
• Quality Control (QC) Dataset: QC standard operating procedure; QC data
• NIF/MA Trusted Data Repository: Projects, Datasets, Datafiles
• Instrument record: Instrument description; Instrument ID
• Published dataset record: Dataset description; Dataset ID
• Handle minting service
• DOI minting service
• Data & service discovery portal

* Icons made by Freepik from www.flaticon.com

Key issues

Findable:
• PIDs and rich metadata for projects (RAiD), datasets (DOI) and instruments (handle)
• Data & Service Discovery portal (RDA)

Accessible:
• Deposit of quality data into a trusted data repository service (TruDat@UWA)

Interoperable:
• Data packaging specification for interoperability (DataCrate, RO-Crate)

Reusable:
• Licenses for data publishing
• Open data formats

Flexible data model
• Projects, Datasets, Datafiles
• Instrument

Catalogue metadata schemas and vocabularies
• DCC list of Metadata Standards, RDA Metadata Directory, FAIRsharing.org Standards…
• USID, OME-XML, (EPS) Equipment Data Standard, Directory Interchange Format, AODN Instrument vocabulary, Core Scientific Metadata Model (CSMD), …

Catalogue file types, metadata extraction tools and file conversion tools
• 60+ instruments recorded to date
• Visits to CMCA + ACMM in November

Data packaging standard
• BagIt
• DataCrate
• RO-Crate (Research Object Crate)

Standardised protocol for collecting quality data
• V1.0 derived from NIF Trusted Data Repository Project

Matrix of candidate repository platforms
• MyTardis, DSpace, CKAN, XNAT, 4Ceed/Clowder, IMS, OMERO, LORIS, …

Community-agreed licenses for data publishing
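
The flexible data model and the PID assignments listed under Findable can be pictured with a small sketch. This is only an illustration under stated assumptions: the class names, field names, the helper `missing_pids` and all sample values are hypothetical and are not taken from the NIF/MA repository implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instrument:
    name: str
    handle: Optional[str] = None  # instrument PID (Handle); hypothetical field name


@dataclass
class Datafile:
    path: str
    file_format: str  # native or open format


@dataclass
class Dataset:
    title: str
    instrument: Instrument
    datafiles: List[Datafile] = field(default_factory=list)
    doi: Optional[str] = None  # dataset PID (DOI)


@dataclass
class Project:
    title: str
    raid: Optional[str] = None  # project PID (RAiD)
    datasets: List[Dataset] = field(default_factory=list)


def missing_pids(project: Project) -> List[str]:
    """List the identifiers that have not yet been assigned for a project."""
    missing = []
    if not project.raid:
        missing.append(f"RAiD for project '{project.title}'")
    for ds in project.datasets:
        if not ds.doi:
            missing.append(f"DOI for dataset '{ds.title}'")
        if not ds.instrument.handle:
            missing.append(f"Handle for instrument '{ds.instrument.name}'")
    return missing


# Illustrative records only; every value below is made up.
microscope = Instrument(name="Example TEM")
dataset = Dataset(
    title="Example session",
    instrument=microscope,
    datafiles=[Datafile(path="session/image_001.tif", file_format="TIFF")],
)
project = Project(title="Example project", raid="raid-placeholder", datasets=[dataset])

for item in missing_pids(project):
    print(item)
```

Keeping the instrument as its own record, referenced from each dataset, mirrors the poster's separation of instrument PIDs from dataset PIDs; a real repository platform would impose its own schema.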

Lessons learnt and findings to date
• Most crucial information must be captured at the project creation and data collection stages
• Lack of open standards
  ◦ 100s to 1000s of hours wasted in finding and sharing data, converting between formats, seeking missing parameters and fixing missing values
• Need to support a variety of data repository platforms
• Agree upon a common data packaging standard to facilitate interoperability
  ◦ Metadata schemas and vocabularies
  ◦ Tools for metadata extraction and data transformation
• Cloud-based service for metadata extraction and file conversion to open formats
• PID services needed: RAiD (Project), DOI (Dataset), ORCiD (User), Handle (Instrument)
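
To make the idea of a common data packaging standard concrete, here is a minimal sketch of a BagIt-style package (BagIt is one of the candidate standards named on this poster). It uses only the Python standard library and writes just the bag declaration and a SHA-256 payload manifest; the function name `make_bag` and the sample file are hypothetical, and a production workflow would more likely rely on an established BagIt or RO-Crate tool.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def make_bag(source_dir: Path, bag_dir: Path) -> Path:
    """Copy source_dir into a BagIt-style layout and write a SHA-256 manifest."""
    payload_dir = bag_dir / "data"
    shutil.copytree(source_dir, payload_dir)  # creates bag_dir and bag_dir/data

    # Bag declaration as described in the BagIt specification (RFC 8493).
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )

    # One "<checksum>  <path relative to the bag>" line per payload file.
    lines = []
    for file_path in sorted(p for p in payload_dir.rglob("*") if p.is_file()):
        digest = hashlib.sha256(file_path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {file_path.relative_to(bag_dir).as_posix()}")
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(lines) + "\n", encoding="utf-8"
    )
    return bag_dir


# Illustrative run in a temporary directory; the file content is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "session"
    source.mkdir()
    (source / "image_001.txt").write_text("placeholder pixel data", encoding="utf-8")

    bag = make_bag(source, Path(tmp) / "bag")
    manifest_text = (bag / "manifest-sha256.txt").read_text(encoding="utf-8")
    print(manifest_text)
```

The checksummed manifest is what lets a receiving repository platform verify a transferred package without knowing anything about the instrument's native file formats.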

Acknowledgements

Project Team:
• Andrew Mehnert¹,² (CMCA, UWA – Project Lead)
• Roger Wepf¹ (Director, CMM, UQ; Head, MA D&I Committee)
• Aswin Narayanan² (CAI, UQ)
• Lisa Yen (Chief Operating Officer, MA, USyd)
• Ryan Sullivan (eResearch Consultant, USyd)
• Matt Foley¹ (ACMM, USyd)
• Abby Asomani (Library Information Specialist, UWA)
• Mingfang Wu (ARDC)
• Alexander Joos (CMCA, UWA)
• Instrument managers and technique group leaders at ACMM, CMM, CAI and CMCA

1. Microscopy Australia Data & Informatics Committee
2. National Imaging Facility Informatics Fellow