ARCLib Development of Open Source Solution for Long-term - - PowerPoint PPT Presentation

ARCLib Development of Open Source Solution for Long-term Preservation Martin Lhoták Library of the Czech Academy of Sciences 11. 6. 2019 Open Repositories, Hamburg ARCLib Complex Solution for Long Term Archiving of (Library)


slide-1
SLIDE 1

ARCLib – Development of Open Source Solution for Long-term Preservation

Martin Lhoták Library of the Czech Academy of Sciences

  • 11. 6. 2019

Open Repositories, Hamburg

slide-2
SLIDE 2

ARCLib

  • Complex Solution for Long Term Archiving of (Library) Digital Collections
  • Applied research grant from the Ministry of Culture of the Czech Republic
  • Technologies and methodologies for preservation of cultural heritage
  • 2016-2020
  • 850k Euro
slide-3
SLIDE 3

ARCLib

Goals:

  • Development of a complex open source (OS) long-term preservation (LTP) system using Archivematica as one of its components
  • Logical preservation methodology for Czech institutions
  • Bit-level preservation methodology
slide-4
SLIDE 4

ARCLib

State of archiving in CZ libraries

  • simple file system with TAR or ZIP packages, backups
  • uploads to cloud services (mostly just backups)
  • central (commercial) solution in the National Digital Library project (mainly for the NL)
  • no reasonable alternative (40 libraries with Kramerius, 15 with DSpace, ...)
  • master copies, access copies, several generations of metadata standards, some metadata without standards
  • central registry of digitization, but no registry of archival packages
slide-5
SLIDE 5

ARCLib – inspiration

Systems

  • Archivematica
  • RODA
  • Commercial solutions (Rosetta, Preservica, ...)
  • Custom made solutions (NDK, NDA SVK)

Projects

  • CESNET LTP Pilot (testing of Archivematica)
  • NDK – National Digital Library
  • Czech National Archive (Archivematica)
  • Foreign projects (Finland, Germany)

slide-6
SLIDE 6

ARCLib – inspiration

Archivematica

  • open source software
  • rapid development
  • too big and too general ... (Finland experience)
  • dependency, uncertainty
  • it could probably be simpler (we don’t need a universal product, we need a system for a defined environment and type of data)
  • inspiring approaches – microservices, the way ingest is managed
slide-7
SLIDE 7

ARCLib – standards

Data and metadata standards

  • ISO 14721 (OAIS)
  • ISO 16363 (Audit and certification)
  • Data Seal of Approval
  • NDK (National Digital Library) standard
  • Kramerius and DSpace formats
  • Export from ProArc production system (NDK or wider)
slide-8
SLIDE 8

ARCLib – tools

Available open tools

  • format identification, validation, technical metadata extraction
  • DROID, FIDO, JHOVE, JPYLYZER, etc.
slide-9
SLIDE 9

ARCLib – functional requirements

slide-10
SLIDE 10

ARCLib AIP

AIP consists of two parts:

the provided SIP + metadata

partly extracted from the SIP

  • BibMD – DC + MODS
  • TechMD – type of scanner, date of scanning, operator, data from JHOVE, etc.

and partly generated by ARCLib on ingest

  • AdmMD – data provider, workflow, validation log, validation profile, format identification, date, etc.
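The two-part structure above can be pictured as a small data model. This is only an illustrative sketch: the class name `ArclibAip`, its field names, and the sample values are assumptions made for this example and are not taken from the ARCLib code base or its AIP XML schema.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ArclibAip:
    """Hypothetical in-memory model of an AIP: the provided SIP plus metadata."""

    sip_path: Path                                # the SIP exactly as provided
    bib_md: dict = field(default_factory=dict)    # extracted from the SIP (DC, MODS)
    tech_md: dict = field(default_factory=dict)   # extracted from the SIP (scanner, JHOVE data, ...)
    adm_md: dict = field(default_factory=dict)    # generated by the archive on ingest

    def extracted(self) -> dict:
        """Metadata that was taken out of the SIP itself."""
        return {"BibMD": self.bib_md, "TechMD": self.tech_md}

    def generated(self) -> dict:
        """Metadata that was created during ingest."""
        return {"AdmMD": self.adm_md}


# Sample values below are invented for illustration only.
aip = ArclibAip(
    sip_path=Path("sip_0001.zip"),
    bib_md={"dc:title": "Example title"},
    tech_md={"scan_date": "2019-06-11"},
    adm_md={"data_provider": "example-library", "validation_profile": "default"},
)
```

Keeping extracted and generated metadata in separate groups mirrors the split on the slide: the first group can always be re-derived from the SIP, the second records what happened during ingest.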

slide-11
SLIDE 11

ARCLib Ingest

Creation of AIP packages from SIP packages

  • antivirus and MD5 checksum checks
  • validation according to validation profiles
  • extraction of metadata from XML of SIP
  • identification of formats in SIP
  • creation of ARCLib AIP and transfer to persistent storage
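As an illustration of the MD5 step in the list above, the following is a minimal sketch of a fixity check using only the Python standard library. The function names `md5_of` and `verify_sip`, and the idea of passing expected checksums as a dictionary, are assumptions for this example; the slides do not describe how ARCLib implements the check.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sip(expected: dict[Path, str]) -> list[Path]:
    """Return the files whose actual MD5 differs from the expected value."""
    return [p for p, md5 in expected.items() if md5_of(p) != md5.lower()]


# Tiny demonstration with a temporary file standing in for SIP content.
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "page_0001.txt"
    page.write_bytes(b"hello")
    good = verify_sip({page: "5d41402abc4b2a76b9719d911017c592"})
    bad = verify_sip({page: "0" * 32})
    failed_names = [p.name for p in bad]
```

An empty result means every file matched its declared checksum; any returned path would be a reason to reject the SIP before the later ingest steps run.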
slide-12
SLIDE 12

ARCLib Data Management

Database of AIP packages (location, BibMD, TechMD, AdmMD)

  • indexing of ARCLib AIP XML
  • indexing of the complete content of the SIP
  • editing of ARCLib AIP XML
  • export of DIP
  • API
slide-13
SLIDE 13

ARCLib Archival Storage

  • Archival Storage is a complex service for bit-level preservation that replicates data across multiple geographic locations, using advanced technologies to store the data
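The core idea of bit-level replication can be sketched as "copy the package to every location, then verify each copy against the original checksum". In the sketch below, local directories stand in for geographically separate storage, and the function names and the choice of SHA-256 are assumptions for this example, not a description of the ARCLib Archival Storage implementation.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum of a whole file (fine for a small demo package)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def replicate(package: Path, locations: list[Path]) -> dict[str, bool]:
    """Copy the package to each location and report whether each copy is intact."""
    expected = sha256_of(package)
    results = {}
    for location in locations:
        location.mkdir(parents=True, exist_ok=True)
        copy = location / package.name
        shutil.copy2(package, copy)
        results[location.name] = sha256_of(copy) == expected
    return results


# Two temporary directories play the role of two storage sites.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    aip = root / "aip_0001.zip"
    aip.write_bytes(b"archival package bytes")
    report = replicate(aip, [root / "site_a", root / "site_b"])
```

A real deployment would repeat the verification periodically, since the point of keeping several copies is to detect and repair silent corruption in any one of them.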

slide-14
SLIDE 14

ARCLib Archival Storage

slide-15
SLIDE 15

ARCLib Administration

  • Workflow configuration
  • Infrastructure control
  • Registry of users, their roles, profiles of SIPs, data providers, validation profiles and storage locations

  • Communication with database
  • Control of storage capacity
  • Administration of jobs
slide-16
SLIDE 16

ARCLib Access

  • ARCLib is a back-end application not intended for end users
  • ARCLib doesn’t handle any access policies
  • export from AIP to DIP is 1 : 1
slide-17
SLIDE 17

ARCLib Preservation Planning

  • format database – registry of used formats
  • registry of profiles / workflows
  • events in archive
slide-18
SLIDE 18

ARCLib

  • ARCLib is a solution for logical and bit-level preservation/protection of digital data.
  • ARCLib doesn’t have a pre-ingest or deposit module – it doesn’t involve conversion to SIP.
  • ARCLib is a system for management of archival packages.
  • ARCLib is a dark archive. It is not a repository for end users.

slide-19
SLIDE 19

ARCLib release schedule

  • 2018-2019: prototype testing
  • 2020: full final version

https://arclib.cz/

slide-20
SLIDE 20

Thank you for your attention

Martin Lhoták Lhotak@knav.cz http://www.knav.cz