SLIDE 1 Prioritizing use over perfection: a risk management approach to digital preservation
Matthew Mihalik, George Washington University; Rachel Trent, Library of Congress (formerly George Washington University)
SLIDE 2
GW has committed to a transparent digital stewardship strategy that prioritizes access to digital assets over adherence to digital preservation standards. Our current strategy is informed by our past experience, our failures, and our users' needs.
Introduction
SLIDE 3
Invested 1.5 years developing a custom storage environment, ID system, and custom inventory and audit tools.
Features: file system inventory, web management UI, synchronization between storage environments, built-in checksum auditing.
Outcome: the project became paralyzed by complexity and was never adopted.
Distant history
SLIDE 4
No articulated commitments for our digital preservation work. Disconnected storage environments split between access and preservation. No active auditing or inventorying of digital assets. Patrons were only able to access a subset of our digital collections.
Recent history
SLIDE 5
- No available inventory of digital assets on storage
- No access controls to preservation servers
- No auditing of digital assets on preservation servers
- Limited redundant copies of digital assets
- No clearly articulated policy of commitments to our
digital content
- Unclear roles and responsibilities between GW units
for management of digital assets
Initial risk landscape
SLIDE 6 We defined a set of principles that our stakeholders were able to commit to and defined our minimum viable product. They decided that we would first provide access to more of our digital collections and then ensure that we know what we have, where we have it, and whether it has changed.
What did we decide to do about it?
SLIDE 7 GW Libraries’ Digital Stewardship Services provides long-term preservation of selected unique, rare, and institutionally-created digital materials, such as student and faculty research products, University records of enduring value, and specialized cultural heritage collections. These include born-digital and digitized materials.
New mission statement
SLIDE 8 As part of our digital stewardship initiatives, we committed to being transparent with our stakeholders and users about what GW is and is not doing for our digital assets. GW has committed to preserving and providing access to this carefully selected set of digital materials over the long term. Commitments are the result of strategic resource planning that balances the benefits of providing engaging, rich access for today’s users with key investments to support access for future users.
Transparent commitments
SLIDE 9
Tier 2
SLIDE 10
Tier 1
SLIDE 11
Tier 0
SLIDE 12
- Storage environment composed of Linux servers
○ Current file systems mirror legacy storage file systems.
○ Offsite backups of these servers are accepted as our “second” copy.
- Access environment built on Hyrax
- Simple Audit Tool
- Amazon Web Services for offsite copies
○ Reserved for a selective subset of materials
Current infrastructure
SLIDE 13
Stakeholder group at GW responsible for digital stewardship decision making and resourcing. Membership includes: associate deans, IT staff, developers, scholarly communication staff, and digital services staff
Digital stewardship group
SLIDE 14
- Know if something has been added to a filesystem
- Know if something has been removed from a
filesystem
- Know if an asset on the filesystem has changed
- Know who performed actions on the filesystem
- Ability to schedule audits and run ad-hoc
- Email reports with results
Digital services team needs
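The add, remove, and change checks listed above can be sketched as a comparison between two inventories. This is a minimal illustration under stated assumptions, not the implementation of GW's audit tool: it assumes each inventory is a mapping of relative file path to checksum, and the function name compare_inventories and the sample data are hypothetical.

```python
def compare_inventories(previous, current):
    """Compare two {relative_path: checksum} mappings (assumed format).

    Returns the paths that were added, removed, or changed between
    the previous inventory and the current one.
    """
    previous_paths = set(previous)
    current_paths = set(current)

    # Present now but not before: something was added.
    added = sorted(current_paths - previous_paths)
    # Present before but not now: something was removed.
    removed = sorted(previous_paths - current_paths)
    # Present in both, but the checksum differs: the asset changed.
    changed = sorted(
        path
        for path in previous_paths & current_paths
        if previous[path] != current[path]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical sample data for illustration only.
previous = {"a.tif": "111", "b.tif": "222", "c.tif": "333"}
current = {"a.tif": "111", "b.tif": "999", "d.tif": "444"}
report = compare_inventories(previous, current)
```

A report shaped like this could feed an emailed summary, although the needs listed here do not specify a report format.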
SLIDE 15
Developed to meet our digital services team's basic needs: to know where assets are stored, what we have, and whether anything has changed. Command-line tool written in Python that can be run manually or via a cron job. Available on GitHub: gwu-libraries/audit-tool
Simple audit tool
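To illustrate the general idea of knowing what is stored and whether it has changed, here is a minimal Python sketch that walks a directory tree and records a SHA-256 checksum for each file. It is not taken from gwu-libraries/audit-tool: the function names (sha256_of, build_inventory, save_inventory), the choice of SHA-256, and the JSON layout are all assumptions made for illustration.

```python
import hashlib
import json
import os


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root):
    """Map each file's path (relative to root) to its checksum."""
    inventory = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            relative = os.path.relpath(full_path, root)
            inventory[relative] = sha256_of(full_path)
    return inventory


def save_inventory(inventory, destination):
    """Write the inventory as JSON (an assumed, illustrative layout)."""
    with open(destination, "w", encoding="utf-8") as handle:
        json.dump(inventory, handle, indent=2, sort_keys=True)
```

Saving an inventory on each run and comparing it with the one from the previous run is one way a scheduled job could detect additions, removals, and changes.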
SLIDE 16
Excel report of files missing from inventory
SLIDE 17
Excel report results summary
SLIDE 18
JSON report sample
SLIDE 19
- Inventory of digital assets on storage tracking adds, deletes, and
changes
- Limited access control in place on preservation servers
- Proactive and ad-hoc auditing of digital assets on preservation
servers with routine reporting
- Redundant copies of selective digital assets
- Clearly articulated policy of commitments to our digital content
- Clear roles and responsibilities between GW units for management of
digital assets
Current risk landscape
SLIDE 20
- Implement an administrative web interface to facilitate searching for items on storage servers by filename
- Enhance our support for Tier 1 content
- Explore automated restoration of files using JSON report outputs
What’s next? pt. 1
SLIDE 21
- Develop a collection management policy for born-digital content
- Evaluate restructuring our storage server filesystems from legacy paths to a modern storage hierarchy
- Annually reassess our risk management strategy
What’s next? pt. 2
SLIDE 22
- Explore integrating MetaArchive as a storage location within our infrastructure
- Look at using Simple Audit Tool with items stored in MetaArchive for consistency
- Update our digital services catalog to reflect this new endpoint
What’s next? pt. 3
SLIDE 23