preservation / curation




SLIDE 1

preservation / curation

  • "Research cannot flourish if data are not preserved and made accessible. All concerned must act accordingly."

Nature 461, 145 (10 September 2009), doi:10.1038/461145a

SLIDE 2

Photon Sciences + X-ray Facilities

  • motivation: costs of data creation

2,000 €/h beamline + overhead
120 k€ to 50 M€ (ribosome structure)

  • data preservation and re-use (status)

portable USB hard disks, responsibility of the researcher

  • data preservation and re-use (goal)

sharing, re-use and integration, validation of results

  • requirements

– federated repositories
– open access (after a 5-year moving wall)
– Web-based (Shibboleth) but integrated in the Grid
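The 5-year moving wall can be read as a simple date rule: a data set becomes openly accessible once five years have passed. The following is a minimal sketch, assuming the wall is counted from the acquisition date; the slides do not say which date is used, and the name `is_open_access` and the parameter `wall_years` are hypothetical.

```python
from datetime import date


def is_open_access(acquired: date, today: date, wall_years: int = 5) -> bool:
    """Return True once the moving wall has passed (assumed rule, not from the slides)."""
    try:
        release = acquired.replace(year=acquired.year + wall_years)
    except ValueError:
        # Acquired on 29 February and the release year is not a leap year.
        release = acquired.replace(year=acquired.year + wall_years, day=28)
    return today >= release


# Made-up dates, for illustration only.
print(is_open_access(date(2005, 3, 1), date(2010, 3, 1)))  # True
print(is_open_access(date(2008, 3, 1), date(2010, 3, 1)))  # False
```

Because the wall moves with each data set, a repository would evaluate this rule per object at access time rather than storing a fixed "open" flag.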

SLIDE 3

German Language Studies

  • preserve, share, analyze, re-use German literature texts
  • 1.5 million texts + metadata + annotations = 5 TB
  • requirements

– versioning
– provenance
– licensing + IPR (author, publisher) → data are irreplaceable, but may need to be deleted

http://www.nlcphs.org/Academics/English/Pictures/shakespeare.jpg

SLIDE 4

variation in community requirements

  • existing infrastructure

– available (climate)
– not available (bio statistics, social sciences, archaeology)

  • data types

– homogeneous (literature)
– heterogeneous (medicine)

  • data integrity

– not changeable (climate)
– erasable (literature)
– phases, versioning (humanities)

  • rights

– open access: climate
– licensing: literature
– de-personalization: medicine
– private data: photon sciences

SLIDE 5

curation layers

  • conceptual object: research data / semantics
  • logical object: formats, e.g. images, XML
  • physical object: bit-stream
  • digital object: research data

  • cf. Thibodeau: Overview of Technological Approaches to Digital Preservation and Challenges in Coming Years, 2002. http://www.clir.org/pubs/reports/pub107/thibodeau.html
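The layers above can be illustrated with a small data model. This is a sketch only: the class `DigitalObject`, its field names, and the sample values are hypothetical and not taken from the slides or from Thibodeau's report. It shows that a format migration changes the logical and physical layers while the conceptual layer stays the same.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class DigitalObject:
    conceptual: str  # what the data mean to the researcher (semantics)
    logical: str     # format, e.g. "image/tiff" or "application/xml"
    physical: bytes  # the bit-stream actually stored

    def bitstream_digest(self) -> str:
        """SHA-256 of the physical layer, usable for bit-level comparison."""
        return hashlib.sha256(self.physical).hexdigest()


# Made-up example values.
original = DigitalObject("diffraction image", "image/tiff", b"\x49\x49\x2a\x00")
migrated = DigitalObject(original.conceptual, "image/png", b"\x89PNG")

print(original.conceptual == migrated.conceptual)                  # True
print(original.bitstream_digest() == migrated.bitstream_digest())  # False
```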


SLIDE 6

roles / responsibilities

Content Preservation, Bitstream Preservation, Data Curation

WissGrid, D-Grid, Community-Grid

SLIDE 7


SLIDE 8

  • catalogue services
  • archive and storage infrastructure services (pluggable)
  • preservation planning
  • registries: formats, ontologies, ...
  • persistent identifier
  • AAI, user management
  • monitoring, logging
  • workflow
  • metadata extraction
  • community-spec services
  • validation, quality control
  • rights management
  • search
  • integrity checks, replication
  • provenance
  • repository: ingest, storage, access
  • conversion
  • repository: metadata management

WissGrid, Community, D-Grid ++

SLIDE 9

grid/repository integration

  • employing grid storage resources
  • Bit Preservation + Trust Zones
  • employing grid compute resources
  • our focus: data to service
  • repository federation

[diagram labels: repos, grid, data, storage coupling; numbered 1 to 3]
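Bit preservation on replicated grid storage is commonly backed by fixity checks: a checksum is recorded at ingest and each replica is later compared against it. The following is a minimal sketch of that idea, not the WissGrid or D-Grid implementation; the function names, site names, and payload are hypothetical.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Checksum used as the fixity value."""
    return hashlib.sha256(data).hexdigest()


def find_damaged_replicas(replicas: dict[str, bytes], expected: str) -> list[str]:
    """Return the names of replicas whose checksum differs from the recorded one."""
    return sorted(name for name, data in replicas.items() if sha256_of(data) != expected)


# Made-up payload and sites; "site-c" holds a corrupted copy.
payload = b"beamline frame 0001"
recorded = sha256_of(payload)  # stored at ingest time
replicas = {"site-a": payload, "site-b": payload, "site-c": b"beamline frame 0O01"}

print(find_damaged_replicas(replicas, recorded))  # ['site-c']
```

A damaged replica found this way would be restored from one of the intact copies, which is why integrity checks and replication are usually planned together.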

SLIDE 10

  • research data repository
  • curation services
  • community-specific data handling
  • community-specific tools
  • D-Grid infrastructure
  • AAI, security, VO management, licenses, SLA, ...

SLIDE 11

finally

  • no single approach to curation serves all needs
  • let interoperability emerge locally: foster it, do not force it
  • link activities and communities