Development of the new Research Infrastructure for Europe's Natural Science Collections using novel building blocks in EOSC

slide-1
SLIDE 1

Development of the new Research Infrastructure for Europe's Natural Science Collections using novel building blocks in EOSC

wouter.addink@naturalis.nl Naturalis Biodiversity Center Distributed System of Scientific Collections (DiSSCo)

slide-2
SLIDE 2

European Collections

European collection facilities:

  • > 1 billion specimens
  • > 80% of the world's species
  • > 5,000 scientists employed
  • > 16,000 scientific visitors per year
  • > 10 million public visitors per year
  • > 25 million web visitors per year

slide-3
SLIDE 3

Lowering barriers for users

25,000 researchers travel every year to physically access scientific collections, and 800k objects are packed and shipped (at an annual public cost of more than €70M)

slide-4
SLIDE 4

DiSSCo Collections

530 Million

slide-5
SLIDE 5

DiSSCo: A new European infrastructure

115 national facilities, 21 countries

  • Largest ever formal agreement between natural science collection facilities
  • Centralised governance model
  • Synchronisation of facilities at access, data and policy level
  • One European virtual collection

slide-6
SLIDE 6

DiSSCo science services

1. e-Science services: a one-stop shop for services providing unified discovery, access, interpretation and analysis of complex linked data

2. Physical and remote access services: a universal harmonised physical access service and digitisation on demand service

3. Support & Training services: an integrated user support desk and implementation of multi-modal training programmes to enhance data skills

single entry point

slide-7
SLIDE 7

Specimen representations become the centrepiece of the DiSSCo knowledge base. They are used as anchoring points for dispersed data classes.

All data classes are unambiguously linked to the physical objects they derive from.

Data classes: Occurrence, Specimen, Taxon Concept, Taxon Interaction, Taxon Name, Publication, Trait, Collection, Sequence, Gene
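
The anchoring idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DiSSCo's data model: the `DigitalSpecimen` class, its `links` field and all identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DigitalSpecimen:
    """Hypothetical specimen representation used as an anchoring point."""
    specimen_id: str
    # Maps a data class name (e.g. "Sequence") to the identifiers linked to it.
    links: Dict[str, List[str]] = field(default_factory=dict)

    def link(self, data_class: str, identifier: str) -> None:
        """Attach a record of the given data class to this specimen."""
        self.links.setdefault(data_class, []).append(identifier)

    def related(self, data_class: str) -> List[str]:
        """Return a copy of the identifiers of one data class anchored here."""
        return list(self.links.get(data_class, []))


# Made-up identifiers, for illustration only.
specimen = DigitalSpecimen("specimen-001")
specimen.link("Taxon Name", "name-123")
specimen.link("Sequence", "seq-456")
specimen.link("Sequence", "seq-789")
```

Because every record hangs off one specimen identifier, a question such as "which sequences derive from this object?" becomes a single lookup: `specimen.related("Sequence")`.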

slide-8
SLIDE 8

Collections-related data classes

Re-unite and Serve

  • GBIF: Occurrence / Images
  • Catalogue of Life: Taxonomy
  • GenBank: Genomic information
  • EoL TraitBank: Traits
  • Plazi TreatmentBank: Literature, Treatments
  • GloBI: Species Interactions
  • IPNI / ZooBank: Nomenclature

slide-9
SLIDE 9

Building block: Digital Objects (DO)

A new, simple model for organising data

[Diagram: Digital Object (DO) model. Nodes: DO, bit sequence, repository, persistent ID, metadata, collection, d-entity. Relations: is_described-by, is_referenced-by, is_represented-by, is_stored_in, is_a, aggregates.]

  • Digital Objects are widely discussed in RDA GEDE by experts from 47 large Research Infrastructures
  • Piloted in C2Camp (github.com/c2camp/core/wiki) by RIs (ICOS, CLARIN, DiSSCo, ENES) and others to create critical mass across 3+ continents

slide-10
SLIDE 10

Why DO?

  • Data heterogeneity hampers data exchange and reuse
  • 80% of researchers' time in data-intensive projects is wasted on data wrangling
  • To a large extent inefficiencies are due to bad data organisation

Developments in science towards a stable data domain:

  • DONA foundation: global domain of resolvable PIDs (Handles, DOIs, ePIC, etc)
  • FAIR principles: globally agreed guidance for proper data management/stewardship
  • Various RDA WG results
  • Organisation: Research Infrastructures, eInfrastructures, clusters, EOSC

But: A breakthrough for harmonised infrastructure building is still needed!

We now use HTML/HTTP for everything. The web is great, but not for creating a stable data domain.

DO Architecture is a logical extension of the Internet to simplify the task of information management

Sources: RDA EU 2013 Survey: 75%; M. Brodie, MIT S.: 80%; CrowdFlower 2017 S.: 79%

slide-11
SLIDE 11

What will DO bring us?

  • Abstraction for cross domain data management
  • Reusability (by binding metadata and data with PID to a digital object)
  • Interoperability by Registration of Types (RDA working group on Data Type Registries) and the Digital Object Interface Protocol (DOIP)
  • Collections (PIDs pointing to a list of PIDs) enabling recursion
  • Encapsulated Complexity for the User's View of the DO Cloud

Global Digital Object Cloud, Larry Lannom, 2016

(See presentation by Ulrich Schwardmann, GWDG, GEDE Workshop on DO, 2018:)

The DOIP Specification (version 2.0) will be conveyed by CNRI to the DONA Foundation in the coming weeks for public release – https://www.dona.net
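
Two of the points on this slide, binding metadata and data to a PID and collections as PIDs pointing to a list of PIDs, can be illustrated with a toy in-memory model. This is a sketch under stated assumptions and does not implement DOIP or any real PID service: the `DigitalObject` class, the `leaf_pids` helper, the registry and all PIDs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class DigitalObject:
    """Toy digital object: a PID bound to metadata and either data or members."""
    pid: str
    metadata: Dict[str, str] = field(default_factory=dict)
    data: Optional[bytes] = None
    # A non-empty member list makes this object a collection of other PIDs.
    members: List[str] = field(default_factory=list)


def leaf_pids(pid: str, registry: Dict[str, DigitalObject],
              seen: Optional[Set[str]] = None) -> List[str]:
    """Recursively expand a PID into the PIDs of its non-collection members."""
    seen = set() if seen is None else seen
    if pid in seen:
        # Each PID is visited at most once, which also protects against
        # collections that directly or indirectly contain themselves.
        return []
    seen.add(pid)
    obj = registry[pid]
    if not obj.members:
        # In this toy model an object without members is treated as a leaf.
        return [pid]
    result: List[str] = []
    for member in obj.members:
        result.extend(leaf_pids(member, registry, seen))
    return result


# Made-up PIDs in a made-up registry.
registry = {
    "pid/a": DigitalObject("pid/a", {"type": "specimen"}, b"..."),
    "pid/b": DigitalObject("pid/b", {"type": "image"}, b"..."),
    "pid/inner": DigitalObject("pid/inner", {"type": "collection"},
                               members=["pid/b"]),
    "pid/outer": DigitalObject("pid/outer", {"type": "collection"},
                               members=["pid/a", "pid/inner"]),
}
```

Here `leaf_pids("pid/outer", registry)` walks through the nested collection `pid/inner`, which is the recursion the slide refers to: a collection is itself addressed by a PID, so it can be a member of another collection.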

slide-12
SLIDE 12

[Diagram: a Digital Specimen Digital Object (DSDO) shown with a PO PID, a GenBank Accession No, an Occurrence ID and a Taxon PID, and two operations: GET Physical Object (PO) PID and GET PO PID metadata]

DSDO: Digital Specimen Digital Object
PO PID: Physical object PID

slide-13
SLIDE 13

Why Digital Objects?

Why the DO approach is appropriate for re-uniting natural science collections-derived information

1. Specimens are atomic items

  • Like journal articles, archaeological artefacts, DNA sequences, YouTube videos, taxon concepts, software programs, workflows, etc.
  • Deserve individual and unique identification to avoid ambiguity around use and interpretation

2. Digital objects collect all core information about the thing in one place

  • What it is, how it came into being, where it can be found, and pointers to other related things
  • Editable but accuracy/authenticity can be controlled

3. A new kind of industrial object that pervades every aspect of our life today, a technical essence of a thing in cyberspace

  • Virtual collection joined together through logical and temporal relations, networks, etc.

Further reading: 1) Yuk Hui, On the existence of digital objects; 2) Jannis Kallinikos et al., A theory of digital objects

slide-14
SLIDE 14

Building block: PIDs & Minimal metadata

Developed in RDA Data Fabric IG

  • Enabling DOIP protocol
  • Building on RDA Kernel Information WG
  • To be aligned with minimum metadata schema in EOSC Service catalogue

slide-15
SLIDE 15

Building block: AARC based AAI

  • Piloted in Synthesys+ by the EGI AAI technology provider GRNET
  • Experimentation with user profiles based on augmented ORCIDs
  • Enables monitoring of Open Science done in the RI
  • Supports implementation of metrics for open science and collection management

slide-16
SLIDE 16

Building block: Attribution model

RDA/TDWG Attribution Metadata Working Group Recommendation linking people, the curatorial actions they perform, and the objects they are curating.
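
The three-way link between people, curatorial actions and objects can be pictured with a small sketch. The class and field names below are illustrative assumptions and are not the terms defined in the RDA/TDWG recommendation; all identifiers are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Agent:
    """A person performing curatorial work (hypothetical structure)."""
    identifier: str  # placeholder; in practice this could be an ORCID iD
    name: str


@dataclass(frozen=True)
class CuratorialAction:
    """One action by one agent on one object (hypothetical structure)."""
    agent: Agent
    action: str      # free-text label, not a controlled vocabulary term
    object_pid: str  # identifier of the curated object
    date: str        # ISO 8601 date


def agents_for_object(actions: List[CuratorialAction], object_pid: str) -> List[str]:
    """Names of everyone who acted on the given object, in first-seen order."""
    names: List[str] = []
    for a in actions:
        if a.object_pid == object_pid and a.agent.name not in names:
            names.append(a.agent.name)
    return names


alice = Agent("person-1", "Alice")
bob = Agent("person-2", "Bob")
actions = [
    CuratorialAction(alice, "identified", "specimen-001", "2019-03-01"),
    CuratorialAction(bob, "digitised", "specimen-001", "2019-04-12"),
    CuratorialAction(alice, "georeferenced", "specimen-002", "2019-05-20"),
]
```

Keeping the action as its own record, rather than a field on the person or the object, is what lets the same data answer both "who worked on this specimen?" and "what has this person curated?".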

slide-17
SLIDE 17

Building block: Data packaging

Requirements:

  • Easy to use by end users
  • Flexible (extensible, scalable and customisable)
  • Machine readable metadata that is human-editable
  • Use of existing standard formats
  • Language, technology and infrastructure agnostic

Examples:

  • DarwinCore Archives (github.com/gbif/ipt/wiki/DwCAHowToGuide)
  • Data Packages (frictionlessdata.io)
  • Linked Data Fragments (linkeddatafragments.org)
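
As an illustration of "machine readable metadata that is human-editable", the sketch below builds a minimal Data Package style descriptor (by convention stored as `datapackage.json`) using only the Python standard library. The package name, the file `specimens.csv` and the choice of columns are assumptions made for this example; the column names are borrowed from Darwin Core terms.

```python
import json

# Minimal descriptor: one tabular resource with a simple field schema.
descriptor = {
    "name": "example-specimens",          # hypothetical package name
    "resources": [
        {
            "name": "specimens",
            "path": "specimens.csv",      # hypothetical data file
            "schema": {
                "fields": [
                    {"name": "occurrenceID", "type": "string"},
                    {"name": "scientificName", "type": "string"},
                    {"name": "eventDate", "type": "date"},
                ]
            },
        }
    ],
}

# Serialise to indented JSON, the form a person would open and edit by hand.
descriptor_json = json.dumps(descriptor, indent=2)

# Read it back the way a tool would, and list the declared columns.
loaded = json.loads(descriptor_json)
field_names = [f["name"] for f in loaded["resources"][0]["schema"]["fields"]]
```

The descriptor is plain JSON, so it stays language, technology and infrastructure agnostic: any tool that can parse JSON can discover the files and columns in the package.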
slide-18
SLIDE 18

Questions on DiSSCo Technical Architecture? Contact

info@dissco.eu