The Role of Trustworthy Digital Repositories in Sustainability



SLIDE 1

Big Data to Knowledge AHM & Open Data Science Symposium Bethesda, MD 29 Nov – 1 Dec 2016 The Role of Trustworthy Digital Repositories in Sustainability David Giaretta www.giaretta.org 1

The Role of Trustworthy Digital Repositories in Sustainability

David Giaretta david@giaretta.org www.giaretta.org and www.iso16363.org Big Data to Knowledge AHM & Open Data Science Symposium 29 Nov – 1 Dec 2016

SLIDE 2

Interoperability, Re-use, Preservation and Sustainability

[Diagram labels: Interoperability, Replication of results, Exploitation/Re-use, Preservation, Usability, “metadata”, VALUE, Sustainability]

  • What do the bits mean?
  • Need “metadata”
  • What kinds? How much of each kind?
  • EU Commissioner for the Digital Agenda said: “Data is the new Gold”, but
  • Gold is precious because it is rare, and does not combine
  • Data is precious because there is so much and it becomes more valuable when it is combined

SLIDE 3

Digitally encoded information – 1’s and 0’s

What does this mean?

  • BITS:

01001110 01001101 01010001 01001101 01010000 01001010 00100000 00100000

  • HEX:

4e 4d 51 4d 50 4a 20 20

Example: “ca fe ba be” at start indicates Java class file

  • Two IEEE 754 32 bit real numbers (assuming “big-endian”):

8.6116435E8 1.35644119E10

  • Two 32 bit integers (assuming “big-endian”):

1313689933 1347035168

  • Actually... ASCII characters:

NMQMPJ

  • ... which was my flight reference
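The point of this slide can be tried out directly. The following is a minimal sketch, not part of the original talk, that reads the same eight bytes in several ways using Python's standard `struct` module. The function name `interpretations` is invented for illustration, and big-endian byte order is assumed throughout, as on the slide.

```python
import struct

# The eight bytes shown on the slide, written as hex.
raw = bytes.fromhex("4e4d514d504a2020")


def interpretations(data: bytes) -> dict:
    """Return several readings of the same eight bytes (big-endian assumed)."""
    return {
        "bits": " ".join(f"{b:08b}" for b in data),
        "hex": " ".join(f"{b:02x}" for b in data),
        # Two IEEE 754 32-bit floats, big-endian.
        "floats": struct.unpack(">2f", data),
        # Two signed 32-bit integers, big-endian.
        "ints": struct.unpack(">2i", data),
        # Eight 7-bit ASCII characters.
        "ascii": data.decode("ascii"),
    }


if __name__ == "__main__":
    for name, value in interpretations(raw).items():
        print(f"{name:>6}: {value!r}")
```

Nothing in the bytes themselves says which reading is the intended one; that knowledge has to be recorded separately.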

SLIDE 4

…semantics…

Can anyone guess what this table means?

[Table; column headings: Date, Longitude, Latitude, Ozone]

Could be Findable and Accessible - encoded as a Comma Separated Value (CSV) file in ASCII or Unicode, or encoded with XML markup
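A small sketch of why findable and accessible is not enough: the CSV row below parses without trouble, but only becomes meaningful once column names and units are attached. Everything here is an assumption made for illustration: the data values, the column order, the units, and the helper name `describe` are all hypothetical and are not taken from the table on the slide.

```python
import csv
import io

# Hypothetical CSV content, invented for illustration only.
bare_csv = "2016-11-29,-77.1,39.0,285\n"

# Hypothetical semantics for each column: (name, unit). Assumed, not from the slide.
columns = [
    ("Date", "ISO 8601 calendar date"),
    ("Longitude", "degrees east"),
    ("Latitude", "degrees north"),
    ("Ozone", "Dobson units"),
]


def describe(row, columns):
    """Pair each raw value with its assumed column name and unit."""
    return {name: f"{value} ({unit})" for (name, unit), value in zip(columns, row)}


rows = list(csv.reader(io.StringIO(bare_csv)))

if __name__ == "__main__":
    print("Without semantics:", rows[0])
    print("With semantics:   ", describe(rows[0], columns))
```

The parser succeeds in both cases; only the second output tells a reader what the numbers are.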

SLIDE 5

OAIS (ISO 14721) and digital preservation

  • Reference Model for an Open Archival Information System (OAIS) provides a very general approach
  • OAIS approach to digital preservation:
– covers all types of digitally encoded information
– provides a way to test whether preservation is successful
– does not require seeing into the future
– does require transparency – be clear what is being promised
  • but does not require “open access”
  • Very widely accepted and provides the basis for pretty well all work in digital preservation
  • OAIS provides a good basis for certification
  • Available free from https://public.ccsds.org/Pubs/650x0m2.pdf
SLIDE 6

Preserving digitally encoded information

  • Using/understanding the bits requires what OAIS calls “Representation Information”: anything needed to allow the data to be interpreted by software or people. This certainly includes semantics, and many other things
  • Additional things such as software, which are readily available now, may not be available in future
  • If the bits are unchanged we can keep hashes and be pretty sure of authenticity
  • If we have to change the bits, e.g. transform to another format, then
– Evidence of Authenticity needs care
– Probably needs other software etc
  • It may be that the information must be handed over
– To a different system and/or a different organisation
  • Need to take care of the details which tend to be ignored
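The hash point above can be sketched in a few lines. This is an illustrative example only, not a procedure from OAIS or ISO 16363: the choice of SHA-256, the sample content, and the names `fixity` and `verify` are assumptions. It shows that a stored digest confirms unchanged bits, but stops matching as soon as the same information is transformed to another encoding.

```python
import hashlib


def fixity(data: bytes) -> str:
    """Return a SHA-256 digest of the bits (algorithm choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_digest: str) -> bool:
    """True if the bits still match the digest recorded earlier."""
    return fixity(data) == expected_digest


# Hypothetical content, invented for illustration.
original = "Longitude,Latitude,Ozone\n".encode("ascii")
stored_digest = fixity(original)

# Same information, different bits: re-encoded from ASCII to UTF-16.
transformed = original.decode("ascii").encode("utf-16")

if __name__ == "__main__":
    print("unchanged bits verify:  ", verify(original, stored_digest))
    print("transformed bits verify:", verify(transformed, stored_digest))
    print("same text after decode: ",
          transformed.decode("utf-16") == original.decode("ascii"))
```

After a transformation the old digest can no longer serve as evidence on its own, which is why a record of what was changed, how, and by whom is needed alongside it.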
SLIDE 7

Partial Representation Information Network for MERIS Level 2 data

SLIDE 8

Role of people (and automated systems)

  • Creation of data and capture/creation of the metadata required for use/exploitation now and into the future
  • Follow “Active” Data Management Plans (RDA and CCSDS/ISO)
  • Funding, Management and Operation of the repository
  • Defines the “Designated Community”, e.g. people who understand a particular sub-discipline
  • Undertakes preservation activities for the data – ensuring that the data will be usable by members of the Designated Community despite changes in h/w, s/w, environment etc
  • Use the data (including by the Designated Community)
  • Exploit and create value from the data
  • Judge the value of the data
SLIDE 9

Many types of Audit and Certification

  • ISO 16363 focuses on keeping the Information understandable / usable
– www.iso16363.org
– based on OAIS concepts – including usability
– 100+ metrics covering all aspects of the repository to ensure the auditor looks at the details
– uses the ISO certification process on which our lives depend in so many areas, e.g. medical equipment, food safety, airlines, automobiles etc. - 3rd party visits and evaluation
  • ISO 27000 type audits focus on keeping the bits safe in the context of the needs of the organisation
– the information is an asset of the business – what happens after the organisation ceases to exist is of no concern. Security certification may be needed for any information that can be used to identify an individual
  • DIN 31644
– audit and certification process not clear
  • ISO 15489 – Records Management
– No formal audit process
  • World Data System and Data Seal of Approval
– Small set (16) of metrics – not detailed
– Recognised as much “lower” than ISO 16363 (DSA as “bronze” and ISO 16363 as “gold”)
SLIDE 10

ISO Standards for certification

  • ISO 16363: Audit and Certification of Trustworthy Digital Repositories
– Available free from https://public.ccsds.org/Pubs/652x0m1.pdf
  • ISO 16919: Requirements For Bodies Providing Audit And Certification of Trustworthy Digital Repositories
– Available free from https://public.ccsds.org/Pubs/652x1m2.pdf
– Used for accreditation of auditors by National Accreditation Bodies
  • Auditors available early next year
SLIDE 11

Sustainability and Trustworthiness

  • Requires resources ($ / £ / …)
  • Are the resources being well spent – will the data be usable?
  • Is the Value (or potential value likely to be derived) worth the Cost?
  • An important factor in appraisal – cannot preserve everything
  • There are economies of scale
  • There are limits to the availability of expertise
  • Competition between repositories?
  • Trustworthiness is a way to choose between repositories
  • ISO 16363 certification requires detailed evidence and is fundamentally linked to usability - from which value, and hence sustainability, is derived

SLIDE 12

Useful Links

  • OAIS
– Web pages: www.oais.info
– Site to gather proposals for OAIS updates in 2017: http://review.oais.info
  • ISO 16363:
– www.iso16363.org
  • Integrated GLOSSARY of digital preservation: http://www.alliancepermanentaccess.org/index.php/consultancy/dpglossary/
– SKOS ontology to show relationship between terms from different glossaries
– OAIS, APARSEN, DPC, ANZ, SNIA, INTERPARES, ISO16363
  • Active Data Management Plans:
– CCSDS/ISO: http://cwe.ccsds.org/moims/default.aspx#_MOIMS-DAI
– Research Data Alliance: https://www.rd-alliance.org/groups/active-data-management-plans.html
  • Me:
– www.giaretta.org
SLIDE 13

END

david@giaretta.org