INTRODUCTION TO EUDAT CDI AND B2 SERVICES SUITE Mark van de Sanden - - PowerPoint PPT Presentation



slide-1
SLIDE 1

INTRODUCTION TO EUDAT CDI AND B2 SERVICES SUITE

eudat.eu @eudat_eu Mark van de Sanden | EUDAT/SURFsara

slide-2
SLIDE 2

Outline

  • EUDAT B2 services suite
  • EUDAT CDI infrastructure
  • Example use cases
  • Q&A

slide-3
SLIDE 3

CDI Data Domain

Data Domain modeled on the ANDS¹ Data Curation Continuum

  • 1. Australian National Data Service organization – www.ands.org.au

WORKSPACE (TEMPORARY - TRANSIENT)

Register Digital Objects
Stage Digital Objects

REGISTERED DATA DOMAIN

Discovery of Digital Objects

PUBLISHED DATA DOMAIN

Linking Publications To Digital Objects

slide-4
SLIDE 4

Community-Driven Pilots

EUDAT services are designed, built and implemented based on user community requirements.

slide-5
SLIDE 5

Community Repositories

(thematic data centres)

EUDAT generic data service provider

storage, workflows, processing, archive

slide-6
SLIDE 6
slide-7
SLIDE 7

Service Diagram

slide-8
SLIDE 8

B2FIND

  • Guidelines for data providers which are DataCite and OpenAIRE compliant
  • Harvesting via OAI-PMH, JSON-API and CSW 2.0
  • Data selection on the basis of 9 facets, including Spatial, Time, Publication Year, Tags
  • Full text search on metadata

Who

  • Anyone

What

  • Find collections of scientific data quickly and easily, irrespective of their origin, discipline or community
  • Get quick overviews of available data
  • Browse through collections using standardized facets

Why

  • Unique collection
  • Ease of Searching

http://b2find.eudat.eu/
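
The OAI-PMH harvesting listed for B2FIND follows the standard request pattern of that protocol. A minimal sketch of how a harvester might build `ListRecords` requests; the endpoint URL and the function name are assumptions for illustration and are not taken from the slides:

```python
from urllib.parse import urlencode

# Hypothetical endpoint: substitute the OAI-PMH base URL of the repository to harvest.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it is sent with the verb only, to fetch the next page.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# First page of Dublin Core records, then a follow-up page.
print(list_records_url(BASE_URL))
print(list_records_url(BASE_URL, resumption_token="abc"))
```

A harvester would repeat the second form until the response no longer carries a resumption token.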

slide-9
SLIDE 9

Faceted Search and Data Access

B2FIND provides ‘faceted’ search for

  • Free text
  • Geo spatial
  • Temporal coverage
  • Publication year
  • Textual facets such as
    – Tags
    – Creator
    – Discipline
    – Language
    – Publisher
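
The idea behind faceted search can be sketched in a few lines: each selected facet value narrows the result set, and the remaining values are counted to drive the next selection. This is an illustration only, not B2FIND's implementation; the records and field names are invented for the example:

```python
from collections import Counter

# Illustrative records; titles, field names and values are invented for this sketch.
RECORDS = [
    {"title": "Ocean temperature profiles", "Discipline": "Oceanography",
     "Language": "English", "Tags": ["ocean", "temperature"]},
    {"title": "Sea level time series", "Discipline": "Oceanography",
     "Language": "English", "Tags": ["ocean", "sea level"]},
    {"title": "Dialect recordings", "Discipline": "Linguistics",
     "Language": "German", "Tags": ["speech"]},
]


def facet_filter(records, **selected):
    """Keep only the records that match every selected facet value."""
    def matches(record, facet, value):
        field = record.get(facet)
        if isinstance(field, list):  # multi-valued facet such as Tags
            return value in field
        return field == value

    return [r for r in records if all(matches(r, f, v) for f, v in selected.items())]


def facet_counts(records, facet):
    """Count how many records carry each value of a facet."""
    counts = Counter()
    for record in records:
        field = record.get(facet, [])
        counts.update(field if isinstance(field, list) else [field])
    return counts


hits = facet_filter(RECORDS, Discipline="Oceanography", Tags="temperature")
print([r["title"] for r in hits])
print(facet_counts(RECORDS, "Tags"))
```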
slide-10
SLIDE 10

Faceted Search and Data Access

Dataset view displays metadata:

  • Spatial extent
  • Table of field-value pairs
  • Links to data resources
slide-11
SLIDE 11
slide-12
SLIDE 12

B2DROP

  • Support for direct publishing of data sets in B2SHARE
  • Personal quota of 20GB, extended quota possible
  • Fine-grained control to share data with other researchers
  • Share data with researchers across other B2DROP and other ownCloud/Nextcloud instances
  • Easy integration with other research platforms

Who

  • Citizen scientists and small teams

What

  • Store and exchange data
  • Synchronize multiple versions
  • Ensure automatic desktop synchronization

Why

  • Ease of Use
  • Trusted European Service

https://b2drop.eudat.eu/

slide-13
SLIDE 13

Easy sharing of data

slide-14
SLIDE 14

Direct publishing to B2SHARE

slide-15
SLIDE 15

B2SHARE

  • Minimum metadata compliant with DataCite and OpenAIRE, flexible support for community specific metadata extensions
  • Support for DOIs on dataset level
  • Support for PIDs, checksums and download statistics on object level
  • Dataset record lifecycle and versioning
  • Authorisation for community domains
  • Metadata automatically harvested by B2FIND
  • Support for annotation via B2NOTE
  • Direct uploads from B2DROP
  • Easily installable as local instance via Docker

Who

  • Small to Medium Teams

What

  • Store data (incl. software) and add domain metadata
  • Share registered research data worldwide
  • Preserve (small-scale) research data for the long term

Why

  • Register Data for Publications (FAIR)
  • Make known to wider community

https://b2share.eudat.eu/
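
To make "minimum metadata compliant with DataCite" concrete, a client could check a record against the six mandatory DataCite properties before submission. A minimal sketch; the key names and the example record are chosen for this illustration and are not B2SHARE's actual schema:

```python
# The six mandatory properties of the DataCite Metadata Schema 4.x.
# The snake_case key names are an assumption of this sketch, not B2SHARE's schema.
DATACITE_MANDATORY = (
    "identifier",
    "creator",
    "title",
    "publisher",
    "publication_year",
    "resource_type",
)


def missing_mandatory(record):
    """Return the mandatory properties that are absent or empty in a record."""
    return [key for key in DATACITE_MANDATORY if not record.get(key)]


# Invented example record with no publisher set.
EXAMPLE = {
    "identifier": "10.1234/example",
    "creator": ["Doe, Jane"],
    "title": "Example data set",
    "publisher": "",
    "publication_year": 2018,
    "resource_type": "Dataset",
}

print(missing_mandatory(EXAMPLE))
```

Community-specific metadata extensions would be validated on top of this baseline rather than replacing it.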

slide-16
SLIDE 16

Features

slide-17
SLIDE 17

Community metadata extensions

slide-18
SLIDE 18

B2NOTE

An annotation is “a note added to a text, book, drawing, etc., as a comment or an explanation” (Merriam-Webster).

  • Provides a service to add annotations to digital assets
  • New B2 service, launched in January 2018
  • Can be integrated within community repositories and services
  • Manual annotations via WUI, or programmatic via a REST API
  • Annotation on existing ontologies (in Biomedical domain)
  • Integrated with B2SHARE on the basis of PIDs
  • Uses the W3C Annotation Model standard (JSON-LD and RDF)

https://b2note.eudat.eu/
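
For readers unfamiliar with the W3C Annotation Model, the sketch below builds a minimal annotation as JSON-LD. It shows the data model only: the target PID URL, the comment and the creator are invented, and the B2NOTE REST API itself is not shown because the slides do not describe it.

```python
import json


def make_annotation(target_pid_url, comment, creator_name):
    """Build a minimal W3C Web Annotation with a plain-text comment as its body."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": "commenting",
        "creator": {"type": "Person", "name": creator_name},
        "body": {"type": "TextualBody", "value": comment, "format": "text/plain"},
        # The annotated resource, referenced here by a PID URL.
        "target": target_pid_url,
    }


# Hypothetical PID URL, used only for illustration.
annotation = make_annotation(
    "https://hdl.handle.net/21.T12345/example",
    "Contains outliers in the 2016 series",
    "Jane Doe",
)
print(json.dumps(annotation, indent=2))
```

Referencing the target by PID rather than by storage location keeps the annotation valid if the data object moves.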

slide-19
SLIDE 19

B2SHARE integration

slide-20
SLIDE 20

B2SAFE

  • Support for different storage systems (e.g. POSIX, NFS, S3, Tape, UMS)
  • Policies for data replication, registration of PIDs, data integrity checks, versioning (alpha)
  • Access via GridFTP and HTTP APIs
  • Data downloads via PIDs
  • Support for automated data publication via B2SHARE (alpha)
  • Central policy management

Who

  • Community Data Managers
  • ‘Sophisticated’ Organisations

What

  • Provide an abstraction layer which virtualizes large-scale data resources
  • Guard against data loss in long-term archiving and preservation
  • Optimize access for users from different regions and to computing resources
  • Data management on the basis of policies

Why

  • Performance
  • Replication between trusted sites
  • Data Preservation

slide-21
SLIDE 21

Data Policy Manager

  • Data policies are centrally managed
  • Policy rules are implemented and enforced by site-local rule engines
  • Policies are described in an abstract language
  • Community data managers must authenticate to provide trust
  • Support for policies for data replication and integrity checking
  • Central logging for auditable data policies to monitor execution
  • Active collaboration with the RDA Practical Policy WG
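
A replication policy with an integrity check comes down to "copy, then compare checksums". The sketch below illustrates that with local directories standing in for two sites. It is plain Python for illustration, not the site-local rule engine used in production, and the choice of SHA-256 is an assumption:

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_and_verify(source, replica_dir):
    """Copy a file to a replica directory and check that the checksums agree."""
    replica_dir = Path(replica_dir)
    replica_dir.mkdir(parents=True, exist_ok=True)
    replica = replica_dir / Path(source).name
    shutil.copy2(source, replica)
    return replica, sha256_of(source) == sha256_of(replica)


# Two temporary directories stand in for two storage sites.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "site_a" / "data.bin"
    source.parent.mkdir()
    source.write_bytes(b"example payload")
    replica, intact = replicate_and_verify(source, Path(tmp) / "site_b")
    print(replica.name, "intact:", intact)
```

Re-running the checksum comparison on a schedule would give the periodic integrity check that a preservation policy calls for.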

slide-22
SLIDE 22

B2STAGE

  • Common API on the basis of GridFTP and HTTP
  • Upload/Download of data to/from B2SAFE
  • Downloads via PIDs for both APIs (GridFTP and HTTP)
  • Support for anonymous data access
  • HTTP API defined via OpenAPI

Who

  • Users and Communities who want to interact with EUDAT CDI services

What

  • Provide a common access layer to B2 services
  • Copy large data sets, ingesting them onto EUDAT data services
  • Enable data transfer for large data collections from EUDAT storages to external HPC facilities for processing

Why

  • Support data transfers between PRACE and EGI
  • Simplify data transfers

http://petstore.swagger.io/?url=https://b2stage.cineca.it/api/specs&docExpansion=none - /

slide-23
SLIDE 23

B2HANDLE

  • Based on Handle v8
  • Machine readable via HTTP RESTful API
  • EUDAT standardized PID record and data types
  • B2HANDLE API Python library for easy integration in client services
  • PID prefixes provided via ePIC
  • Multiple B2HANDLE service providers

Who

  • Groups or Communities who want to make their data citable

What

  • Follows policies to register data and make it referable and citable in the long term
  • Reliability through mutual PID mirroring
  • Provides abstraction layer between a globally unique persistent identifier and physical location of data objects
  • PIDs are globally resolvable

Why

  • Simple integration
  • Technology Agnostic
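
Because Handle records are machine readable over HTTP, a client can look up the location stored behind a PID. The sketch below uses the public Handle proxy REST endpoint and parses a response offline. The handle, the sample record and the `EXAMPLE_CHECKSUM` type are invented, and the B2HANDLE Python library is deliberately not used, since its calls are not shown in the slides:

```python
import json

# REST endpoint of the global Handle proxy.
HANDLE_PROXY = "https://hdl.handle.net/api/handles/"


def api_url(handle):
    """Return the REST URL for looking up a handle record."""
    return HANDLE_PROXY + handle


def extract_values(response_json, value_type="URL"):
    """Pull all values of one type out of a Handle REST JSON response."""
    record = json.loads(response_json)
    if record.get("responseCode") != 1:  # 1 means success
        return []
    return [
        value["data"]["value"]
        for value in record.get("values", [])
        if value.get("type") == value_type
    ]


# Invented sample response in the shape returned by the Handle REST API.
SAMPLE_RESPONSE = json.dumps({
    "responseCode": 1,
    "handle": "21.T12345/abc-123",
    "values": [
        {"index": 1, "type": "URL",
         "data": {"format": "string",
                  "value": "https://repository.example.org/objects/abc-123"}},
        {"index": 2, "type": "EXAMPLE_CHECKSUM",
         "data": {"format": "string", "value": "0123abcd"}},
    ],
})

print(api_url("21.T12345/abc-123"))
print(extract_values(SAMPLE_RESPONSE))
```

In a live setting the JSON would come from an HTTP GET on the URL returned by `api_url`.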

slide-24
SLIDE 24

B2ACCESS

  • Support for eduGAIN, Social Identities (Facebook, Google, Microsoft, GitHub), ORCID and local accounts
  • IdP integration support for SAML, OpenID, OAuth2, X.509
  • Community IdP (ELIXIR, PRACE, EGI)
  • SP integration support for SAML, OIDC, OAuth2, X.509
  • Integrated with B2SHARE, B2SAFE, B2STAGE, B2DROP, B2NOTE, SPMT, DPMT and GitLab
  • Joint proposal for the Life Sciences with GEANT and EGI
  • Common Federated AAI planned in EOSC-hub

Who

  • Anyone wanting to use the B2 Services

What

  • Complies with community ownerships and access rights, basis of trust
  • Credential conversion approach (e.g. SAML, OpenID, X.509, Username/password)
  • Identity provider for citizen scientists

Why

  • Use your own ID in a federated environment

slide-25
SLIDE 25
slide-26
SLIDE 26

CDI Data Domain

Data Domain modeled on the ANDS¹ Data Curation Continuum

  • 1. Australian National Data Service organization – www.ands.org.au

WORKSPACE (TEMPORARY - TRANSIENT)

Register Digital Objects
Stage Digital Objects

REGISTERED DATA DOMAIN

Discovery of Digital Objects

PUBLISHED DATA DOMAIN

Linking Publications To Digital Objects

Data Objects
Data Entities

EUDAT

slide-27
SLIDE 27

User Documentation

Total of 33 documents maintained and revised

3 levels of documentation:

  • Engage: for Community decision-makers and data managers
  • Deploy: for system and support engineers
  • Use: for researchers and end users

Participation from community experts

https://eudat.eu/services/userdoc

slide-28
SLIDE 28

Training Material

https://eudat.eu/training - https://github.com/EUDAT-Training

Total of 14 training modules developed and maintained

Hands-on training environments for:

  • B2SAFE
  • B2SHARE
  • B2FIND
  • B2HANDLE
  • B2NOTE

slide-29
SLIDE 29

CDI members

More than 20 European research organisations, data and computing centres in 14 countries

slide-30
SLIDE 30

CDI Infrastructure

  • Thematic Service Provider
  • Repository Provider
  • Generic Service Provider (Archive, Large Storage System, HTC/HPC)

providing PIDs

Operational and Support services

  • Project (Configuration) Management: DPMT
  • Service Portfolio & Catalogue Management Tool
  • A&R Monitoring, Software Version Monitoring
  • Accounting, Reporting
  • Helpdesk

Operational Services

Vulnerability Scanning, CSIRT

PID Service Provider

Service Hosting Framework

slide-31
SLIDE 31

Helpdesk and Support Team

Requests via the Helpdesk Webform

Helpdesk Channels

  • Support Request Webform
  • Trouble Ticketing System
  • B2-Service Queues
  • Site Queues

Responsibilities

  • 1st Level Support (BSC)
  • 2nd Level Support: Project Enabling Team (OP)
  • 3rd Level Support: Service Developer Teams (DEV)

https://eudat.eu/contact-support-request

slide-32
SLIDE 32

Use Cases

  • Example SeaDataNet: EUDAT CDI Cloud
  • Example EuroArgo: Data Subscription service

slide-33
SLIDE 33
Use Case

  • SeaDataNet consortium operates a state-of-the-art pan-European infrastructure to manage high quality ocean and marine data
  • SeaDataCloud is the third proposal, after SeaDataNet and SeaDataNet2
  • Duration: 2016 - 2019
  • Aim:
    – To advance SeaDataNet services and increase their usage by adopting cloud and HPC technology
  • EUDAT CDI:
    – Leverage EUDAT CDI infrastructure for long-term digital preservation and curation, and provide unified data access
    – 5 partners: DKRZ, CINECA, CSC, GRNET, STFC
    – B2 services: B2DROP, B2SHARE, B2SAFE, B2HOST, B2STAGE, B2FIND and B2ACCESS

slide-34
SLIDE 34

Common Data Index

slide-35
SLIDE 35
EUDAT Cloud

  • Import area: distributed across the five EUDAT partners
  • Production area: replicated across the five EUDAT partners

slide-36
SLIDE 36

Data Subscription Service

  • User subscribes to a data set, e.g. through the use of queries
  • When a match for a query is found, the user is notified
  • A notification includes information (a link) to access the data set found through the query
  • Expiration of the subscription?
    – Persistent: user removes out-dated subscription?
    – Time-limited: automatic expiration and expiry notification?
  • Notification intervals?
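
The subscription flow described above can be sketched as follows. The slides leave expiration and notification intervals as open questions, so this sketch simply models both persistent and time-limited subscriptions. All class, function and field names, and the example data set, are invented for illustration:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass
class Subscription:
    user: str
    query: Callable[[dict], bool]          # predicate evaluated against new data sets
    expires_at: Optional[datetime] = None  # None models a persistent subscription

    def is_active(self, now):
        return self.expires_at is None or now < self.expires_at


def notify_matches(subscriptions, dataset, now):
    """Return one notification (user and link) per active subscription that matches."""
    return [
        {"user": sub.user, "link": dataset["link"]}
        for sub in subscriptions
        if sub.is_active(now) and sub.query(dataset)
    ]


def has_salinity(dataset):
    return "salinity" in dataset["keywords"]


subs = [
    Subscription("alice", has_salinity),                                  # persistent
    Subscription("bob", has_salinity, expires_at=datetime(2018, 1, 1)),   # time-limited
]
dataset = {"keywords": ["salinity", "float"],
           "link": "https://repository.example.org/datasets/42"}

# After bob's subscription has expired, only alice is notified.
print(notify_matches(subs, dataset, datetime(2018, 6, 1)))
```

Notification intervals could be layered on top by batching the returned notifications per user before sending them.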
slide-37
SLIDE 37

Data Subscription Service

slide-38
SLIDE 38
slide-39
SLIDE 39

www.eudat.eu @eudat_eu