Bring out yer SIPs: An Introduction to Digital Preservation with Archivematica

SLIDE 1

Bring out yer SIPs: An Introduction to Digital Preservation with Archivematica

iSkills Workshop February 9, 2018

Grant Hurley, Digital Preservation Librarian, Scholars Portal

SLIDE 2

Agenda

  • Basic concepts in digital preservation
  • Introduction to Archivematica
  • Preparing transfers + Demo
  • Processing transfers + Demo
  • Looking at AIPs
  • Thinking about DIPs
  • Processing activity
SLIDE 3

What’s this “digital preservation” thing?

Uh oh

SLIDE 4
  • Digital objects (both born digital and digitized) need active management to ensure ongoing access
  • Quickly changing technological norms create risks that must be managed from the object’s creation
  • Digital preservation is a set of theories and practices that work to keep digital objects authentic, available and reliable over time.

SLIDE 5

Identity: what it is; format identification, descriptive information, provenance, etc.

Integrity: establishing that a file remains unaltered over time

SLIDE 6

Identity: File formats

filename : '/Users/hurleyg/Documents/Teaching/iSkills/CheckYourBits.jpg'
filesize : 582231
modified : 2018-01-24T15:50:08-05:00
errors   :
matches  :
  - ns      : 'pronom'
    id      : 'fmt/43'
    format  : 'JPEG File Interchange Format'
    version : '1.01'
    mime    : 'image/jpeg'
    basis   : 'extension match jpg; byte match at [[[0 14]] [[582229 2]]]'
    warning :

File format identifications/descriptions come from PRONOM (UK National Archives); the ID is the PRONOM identifier (PUID). Archivematica uses Siegfried or FIDO for identification.
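To make the "byte match" in the output above concrete: a signature-based identifier looks for known byte patterns at known offsets (here, 14 bytes at the start of the file and the last 2 bytes). The Python sketch below is a deliberately simplified approximation for JFIF, not the actual PRONOM fmt/43 signature, and the function name `jfif_version` is hypothetical. It checks the JPEG start-of-image and APP0 markers, the `JFIF` identifier and the end-of-image marker, then reads the two version bytes.

```python
from typing import Optional


def jfif_version(data: bytes) -> Optional[str]:
    """Return the JFIF version (e.g. '1.01') if `data` looks like a JFIF file.

    Simplified for illustration: real tools such as Siegfried match the
    full PRONOM signature for fmt/43, not just these few bytes.
    """
    # Need at least a 13-byte header plus the 2-byte trailer
    if len(data) < 15:
        return None
    # Start-of-image marker (FF D8) followed by an APP0 marker (FF E0)
    if data[0:4] != b"\xff\xd8\xff\xe0":
        return None
    # Bytes 4-5 hold the APP0 segment length; the identifier follows
    if data[6:11] != b"JFIF\x00":
        return None
    # End-of-image marker (FF D9) in the last two bytes of the file
    if data[-2:] != b"\xff\xd9":
        return None
    major, minor = data[11], data[12]
    return f"{major}.{minor:02d}"
```

In practice you would let Siegfried or FIDO do this, since a real signature covers many more cases than these four checks.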

SLIDE 7

Integrity: The almighty checksum

md5 checksum = 2c93b97c3d7e53dab9161e389c98465c
md5 checksum = 1148058955697062ca583d0cc0474322
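Checksums like the ones above are the basis of fixity checking: record a checksum when a file enters your care, recompute it later, and compare. Even a one-byte change will, in practice, produce a completely different value. A minimal sketch using Python's standard `hashlib`; the helper names are hypothetical, and MD5 is used only to match the slide (`hashlib.sha256` drops in the same way).

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Compute a file's MD5 checksum, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: str, recorded: str) -> bool:
    """True if the file's current checksum matches the one recorded earlier."""
    return md5_of_file(path) == recorded.strip().lower()


def demo_fixity() -> bool:
    """Record a checksum, change a single byte, then re-check fixity."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        with open(path, "wb") as f:
            f.write(b"abc")
        recorded = md5_of_file(path)
        with open(path, "wb") as f:
            f.write(b"abd")  # one byte differs from the original
        return fixity_ok(path, recorded)
```

`demo_fixity()` returns `False`: the file was altered after its checksum was recorded, so the check fails.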

SLIDE 8

The even more almighty OAIS

SLIDE 9

Other important concepts

  • Identification: determining a particular file’s format and version
  • Characterization: extracting metadata related to the file’s intrinsic properties; for example, audio sample rate, channels, etc. for an MP3 file
  • Validation: determining whether a file is well-formed and valid according to its specification
  • Normalization: converting a file from its source format to a standardized format
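Characterization in practice means reading technical properties out of the file itself. As a sketch, assuming a WAV file rather than the slide's MP3 (Python's standard library can parse WAV headers but not MP3), the `wave` module exposes exactly the kind of intrinsic properties mentioned above. The function names are hypothetical; Archivematica itself relies on external characterization tools.

```python
import io
import wave


def characterize_wav(data: bytes) -> dict:
    """Extract intrinsic technical properties from WAV audio bytes."""
    with wave.open(io.BytesIO(data), "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": 8 * w.getsampwidth(),
            "duration_s": frames / rate,
        }


def make_silent_wav(seconds: int = 1, rate: int = 44100, channels: int = 2) -> bytes:
    """Build a silent 16-bit WAV in memory so the example needs no input file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 2 bytes = 16 bits per sample
        w.setframerate(rate)
        w.writeframes(b"\x00" * (2 * channels * rate * seconds))
    return buf.getvalue()
```

`characterize_wav(make_silent_wav())` reports 2 channels, 44100 Hz, 16 bits per sample and a duration of 1.0 seconds.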

SLIDE 10

What is Archivematica?

SLIDE 11

What it does

  • Creates well-formed data packages for long-term preservation and access
  • Takes a pre-structured transfer from a data source
  • Makes a Submission Information Package (SIP)
  • Transforms the SIP into an Archival Information Package (AIP)
  • Can also create a Dissemination Information Package (DIP) for access
  • Each of these functions has configurable tasks associated with it
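The SIP to AIP to DIP flow can be sketched as plain data transformations. This is a toy model under stated assumptions, not Archivematica's internals: the class and function names are hypothetical, files are represented only by their paths, and the normalization functions are supplied by the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class InformationPackage:
    kind: str  # "SIP", "AIP" or "DIP"
    files: List[str]
    metadata: Dict[str, str] = field(default_factory=dict)


def make_sip(transfer_files: List[str], metadata: Dict[str, str]) -> InformationPackage:
    """Package a (pre-structured) transfer and its metadata for ingest."""
    return InformationPackage("SIP", sorted(transfer_files), dict(metadata))


def ingest(
    sip: InformationPackage,
    to_preservation: Callable[[str], str],
    to_access: Callable[[str], str],
    create_dip: bool = False,
) -> Tuple[InformationPackage, Optional[InformationPackage]]:
    """Turn a SIP into an AIP (originals + preservation copies) and, optionally, a DIP."""
    preservation_copies = [to_preservation(f) for f in sip.files]
    aip = InformationPackage("AIP", sip.files + preservation_copies, dict(sip.metadata))
    dip = None
    if create_dip:
        # In this sketch, access copies are derived from the original files
        dip = InformationPackage("DIP", [to_access(f) for f in sip.files], dict(sip.metadata))
    return aip, dip
```

For example, passing `lambda f: f.rsplit(".", 1)[0] + ".tif"` as `to_preservation` models a JPEG-to-TIFF preservation copy sitting alongside the original in the AIP.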
SLIDE 12

What it does

  • Stores and applies preservation policies for normalization, access copies, etc.
  • Allows access to, and deletion of, AIPs
  • Assists in ingest of descriptive metadata and rights information
  • Manages data flows in and out of the system through a separate Storage Service module
  • Can connect to access systems for DIP deposit (mostly just AtoM)
  • Can be fully automated
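Applying "preservation policies for normalization" amounts to a lookup from an identified format to the actions to take. A minimal sketch, assuming policies keyed by PRONOM ID; the single rule shown (TIFF for preservation, JPEG for access) is a made-up example rather than one of Archivematica's shipped defaults, and `Policy`, `POLICIES` and `plan_actions` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Optional


class Policy(NamedTuple):
    preservation_target: Optional[str]  # None means keep the original as-is
    access_target: Optional[str]        # None means no access copy


# Hypothetical rules keyed by PRONOM ID; NOT Archivematica's actual defaults.
POLICIES: Dict[str, Policy] = {
    "fmt/43": Policy(preservation_target="TIFF", access_target="JPEG"),
}


def plan_actions(identified: Dict[str, str]) -> List[str]:
    """Map each file (name -> PRONOM ID) to the normalization actions its policy calls for."""
    actions = []
    for name, puid in sorted(identified.items()):
        policy = POLICIES.get(puid)
        if policy is None:
            actions.append(f"{name}: no rule for {puid}; keep original only")
            continue
        if policy.preservation_target:
            actions.append(f"{name}: preservation copy as {policy.preservation_target}")
        if policy.access_target:
            actions.append(f"{name}: access copy as {policy.access_target}")
    return actions
```

Keeping the rules in data rather than code is what lets an institution change its policies without touching the workflow itself.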
SLIDE 13

Where it came from

  • Standards for digital preservation developed in the late 1990s and early 2000s, but no easy way of applying them
  • UNESCO released 2007 report advocating for an open source digital preservation system
  • Artefactual Systems started up by creating the Access to Memory (AtoM) system for archival description
  • Various small open source tools were also being developed by others for particular tasks
  • Artefactual developed Archivematica beginning in 2008
  • Beta release in 2012; current release is 1.6.1 (2017)
SLIDE 14

What it is

  • Modular workflow created using a microservices design pattern
  • Data follows a structured, chained pathway, where the result of one step triggers the initiation of the next step
  • Components can be replaced or turned off/on
  • Accessible through the browser
  • Requires a virtual machine to run on (Ubuntu or CentOS)
  • Runs in a LAMP-style environment (Linux, web server, MySQL, Python)
  • Open source, developed by Artefactual Systems staff
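A chained, microservice-style pipeline can be sketched as a list of independent steps where each step's output becomes the next step's input. The step functions below are hypothetical toys (real microservices call tools such as virus scanners and format identifiers); the point is the chaining and the halt-on-failure behaviour.

```python
from typing import Any, Callable, Dict, List, Tuple

Package = Dict[str, Any]
Step = Callable[[Package], Package]


def run_chain(package: Package, steps: List[Tuple[str, Step]]) -> Tuple[Package, List[str]]:
    """Run steps in order, feeding each step's output into the next.

    A step signals failure by raising an exception; the chain stops there.
    """
    log: List[str] = []
    for name, step in steps:
        try:
            package = step(package)
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            break
        log.append(f"{name}: ok")
    return package, log


# Hypothetical toy steps; each takes a package and returns a (possibly new) package.
def check_not_empty(pkg: Package) -> Package:
    empty = sorted(n for n, data in pkg["files"].items() if not data)
    if empty:
        raise ValueError(f"empty files: {empty}")
    return pkg


def assign_ids(pkg: Package) -> Package:
    ids = {name: i for i, name in enumerate(sorted(pkg["files"]))}
    return {**pkg, "ids": ids}


def record_sizes(pkg: Package) -> Package:
    return {**pkg, "sizes": {n: len(d) for n, d in pkg["files"].items()}}


STEPS: List[Tuple[str, Step]] = [
    ("check_not_empty", check_not_empty),
    ("assign_ids", assign_ids),
    ("record_sizes", record_sizes),
]
```

Because the chain is just a list, turning a component off amounts to removing its entry from `STEPS`, and replacing one means swapping in another function with the same signature.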
SLIDE 15

What it isn’t

  • A storage system
  • An access system
  • Easy to install or maintain in production
  • User friendly
  • A complete digital archives workflow
SLIDE 16

Who uses it

Largely, memory institutions (libraries, archives, galleries, museums) with digital collections that need preserving

  • Libraries:
      • Digitized/born-digital content in institutional repositories
      • Research data management (several current projects trying to develop Archivematica’s capacity in this domain)
      • Digital collections (books, journals, maps, etc.)
  • Archives:
      • Digitized collections (photographs, audio-visual materials, etc.)
      • Born digital donations (all sorts of stuff)
      • Private papers/collections
      • Records from corporate bodies, institutions, etc.
SLIDE 17

The Workflow

  • Pre-Transfer*: Selection of objects to preserve; Metadata preparation; Packaging for transfer
  • Transfer: Generates the METS file that later steps write to; Virus scan; File ID, characterization, validation
  • Backlog: You can send something here if you don’t want to continue processing it
  • Appraisal: File format view/analysis; Selection for retention; ID sensitive data
  • Ingest: Normalize files; Create & store AIP/DIP
  • Storage & Access*: Store in location; Send access copies to other systems

*Pre-Transfer: not in Archivematica. Storage & Access: linked to by Archivematica.

SLIDE 18

Preparing transfers

SLIDE 19

Steps

  • Determine content and structure (1 SIP = 1 AIP = fonds, series, item? Or section of one of these?)
  • Gather and structure metadata (next slide)
  • Gather submission documentation (not in demo)
  • Package and structure for ingest
  • All data needs to be in a directory, at minimum
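As a sketch of the "package and structure" step, the function below lays out a transfer directory with content in `objects/` and metadata in `metadata/` (including `metadata/submissionDocumentation/`). This layout reflects Archivematica's documented transfer structure as we understand it, so check the documentation for your version; `build_transfer` and `demo_transfer` are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def build_transfer(root: Path, files: Dict[str, bytes], metadata_csv: str) -> None:
    """Lay out a transfer directory: content in objects/, metadata in metadata/."""
    objects = root / "objects"
    metadata = root / "metadata"
    objects.mkdir(parents=True, exist_ok=True)
    # Submission documentation (not used in the demo) would go in this folder
    (metadata / "submissionDocumentation").mkdir(parents=True, exist_ok=True)
    for name, data in files.items():
        (objects / name).write_bytes(data)
    (metadata / "metadata.csv").write_text(metadata_csv, encoding="utf-8")


def demo_transfer() -> List[str]:
    """Build a throwaway transfer and list its contents relative to the root."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "my_transfer"
        build_transfer(
            root,
            {"photo1.jpg": b"placeholder bytes"},
            "filename,dc.title\nobjects/photo1.jpg,Photo 1\n",
        )
        return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))
```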
SLIDE 20

Metadata

Descriptive metadata

  • Uses simple Dublin Core as key standard; other information is recorded as ‘Custom’
  • Transfer level can be added through interface or imported
  • Item level must be imported via CSV file

Rights metadata

  • Mapped to PREMIS
  • Same import structure as above
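Item-level descriptive metadata is imported from a CSV file. A minimal sketch with Python's `csv` module, assuming the convention of a `filename` column holding paths such as `objects/photo1.jpg` followed by `dc.`-prefixed Dublin Core columns; verify the exact column names against the Archivematica documentation for your version. `metadata_csv` and `DC_FIELDS` are hypothetical names.

```python
import csv
import io
from typing import Dict, List

# Assumed Dublin Core column names; extend with the elements you actually use.
DC_FIELDS = ["dc.title", "dc.creator", "dc.date", "dc.description"]


def metadata_csv(rows: List[Dict[str, str]]) -> str:
    """Render one row per file (keyed by 'filename') as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["filename"] + DC_FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)  # missing columns are written as empty values
    return out.getvalue()
```

`csv.DictWriter` leaves missing fields empty and quotes values containing commas, so a title like "Toronto, 1920" is written safely.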
SLIDE 21

Demo

  • Set of photos + metadata csv file
  • Bagging using Python script
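The demo's bagging script isn't reproduced here. To show what bagging produces, this is a stdlib-only sketch of a minimal BagIt bag: payload files copied under `data/`, a `manifest-md5.txt` with one checksum line per payload file, and a `bagit.txt` declaration. It omits optional tag files such as `bag-info.txt`; for real work, the Library of Congress `bagit` Python library (`bagit.make_bag`) handles the full specification. `make_minimal_bag` and `demo_bag` are hypothetical names.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def make_minimal_bag(src: Path, bag: Path) -> None:
    """Copy `src` into `bag/data/` and write the two required tag files."""
    payload = bag / "data"
    shutil.copytree(src, payload)  # also creates `bag` itself
    lines = []
    for f in sorted(p for p in payload.rglob("*") if p.is_file()):
        digest = hashlib.md5(f.read_bytes()).hexdigest()
        # Manifest paths are relative to the bag root, with forward slashes
        lines.append(f"{digest}  {f.relative_to(bag).as_posix()}\n")
    (bag / "manifest-md5.txt").write_text("".join(lines), encoding="utf-8")
    (bag / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )


def demo_bag() -> str:
    """Bag a one-file folder in a temp directory and return its manifest."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "photos"
        src.mkdir()
        (src / "photo1.jpg").write_bytes(b"abc")
        bag = Path(tmp) / "photos_bag"
        make_minimal_bag(src, bag)
        return (bag / "manifest-md5.txt").read_text(encoding="utf-8")
```

The manifest is what lets the receiving system (here, Archivematica) confirm that every payload file arrived unaltered.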
SLIDE 22

Processing transfers

SLIDE 23

Demo

  • Same materials as before
  • Uploaded to transfer source on Ontario Library Research Cloud (OLRC)

  • Process using standard workflow and settings
  • Briefly demo backlog/appraisal tabs
  • Store AIP on OLRC
  • No DIP
SLIDE 24

Looking at AIPs

SLIDE 25

AIP Contents

  • METS file
  • Originals + normalized copies in ‘objects’ folder
  • Materials that made up original transfer
  • Logs
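To see how a METS file points at the AIP's contents, the sketch below lists each file location in a METS `fileSec` along with its file group's `USE` attribute, using Python's `xml.etree.ElementTree`. The sample document is invented and heavily trimmed; a real Archivematica METS file also carries descriptive and administrative (PREMIS) sections that this sketch ignores, and `list_mets_files` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def list_mets_files(mets_xml: str) -> List[Tuple[str, str]]:
    """Return (fileGrp USE, file location) pairs from a METS fileSec."""
    root = ET.fromstring(mets_xml)
    pairs = []
    for grp in root.iterfind("mets:fileSec/mets:fileGrp", NS):
        use = grp.get("USE", "")
        for flocat in grp.iterfind("mets:file/mets:FLocat", NS):
            pairs.append((use, flocat.get(XLINK_HREF, "")))
    return pairs


# Invented, heavily trimmed example; a real AIP METS has many more sections.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
  xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="original">
      <mets:file ID="file-1">
        <mets:FLocat LOCTYPE="OTHER" xlink:href="objects/photo1.jpg"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="preservation">
      <mets:file ID="file-2">
        <mets:FLocat LOCTYPE="OTHER" xlink:href="objects/photo1.tif"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""
```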
SLIDE 26

Thinking about DIPs

SLIDE 27

DIPs

  • Set of normalized files for access, created with access policies in preservation planning module
  • Archivematica can connect to AtoM for DIP deposit to existing description
  • Can transfer over some metadata, so description work can be lessened, but only at transfer/item level

SLIDE 28

Activity time!