Web At Risk: Extending the Digital Curation Mission to the Web

DigCCurr 2007 April 18-20 UNC Building Capabilities for Digital Curation Repositories Web At Risk: Extending the Digital Curation Mission to the Web Patricia Cruse, Director, Digital Preservation Program Kirsten Neilsen, Digital


slide-1
SLIDE 1

Preservation Program

Digital Preservation Program

Web At Risk: Extending the Digital Curation Mission to the Web

Patricia Cruse, Director, Digital Preservation Program Kirsten Neilsen, Digital Preservation Services Manager California Digital Library

DigCCurr 2007 – April 18-20 UNC Building Capabilities for Digital Curation Repositories

slide-2
SLIDE 2

The Digital Preservation Program

  • Established in 2002
  • UC-wide program
  • Goal: ensure long-term availability and accessibility to materials that are important to the research, teaching, and learning on the UC campuses.
  • Centrally managed
  • Central and external funds
  • A partnership
slide-3
SLIDE 3
slide-4
SLIDE 4

Cornerstone of the Program: Digital Preservation Repository (DPR)

  • Suite of tools & services:
      – Digital Preservation Repository
      – Documentation, guidelines, policies
  • International standards & open source
  • Service-oriented architecture: flexible, adaptable, simple
  • Preservation partnership
      – Curate
      – Preserve

slide-5
SLIDE 5

Digital Preservation Repository core services

  • A set of services that support the long-term retention of digital objects:
      – Submit (deposit) digital objects
      – Manage digital objects: add versions, replace, update, delete
      – Request dissemination
      – Request administrative reports (forthcoming)
  • What the service is not…
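
The core services listed above (submit, manage versions, request dissemination) can be pictured with a small in-memory sketch. This is only an illustration under stated assumptions: `ToyRepository`, `StoredObject`, and every method name are hypothetical and are not the DPR's actual interface, and the use of SHA-256 as the stored digest is an assumption as well.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One digital object and its ordered versions (hypothetical model)."""
    object_id: str
    versions: list = field(default_factory=list)  # entries: (sha256 hex digest, bytes)


class ToyRepository:
    """In-memory stand-in for a preservation repository; not the DPR's real API."""

    def __init__(self):
        self._objects = {}

    def submit(self, object_id, content):
        """Deposit a new object; refuse to silently overwrite an existing one."""
        if object_id in self._objects:
            raise ValueError(f"{object_id} already exists; use add_version")
        self._objects[object_id] = StoredObject(object_id)
        return self.add_version(object_id, content)

    def add_version(self, object_id, content):
        """Append a version and return its 1-based version number."""
        obj = self._objects[object_id]
        obj.versions.append((hashlib.sha256(content).hexdigest(), content))
        return len(obj.versions)

    def disseminate(self, object_id, version=None):
        """Return the bytes of a version (latest by default) after a digest check."""
        versions = self._objects[object_id].versions
        digest, content = versions[-1] if version is None else versions[version - 1]
        if hashlib.sha256(content).hexdigest() != digest:
            raise IOError(f"stored digest mismatch for {object_id}")
        return content

    def delete(self, object_id):
        """Remove an object together with all of its versions."""
        del self._objects[object_id]


repo = ToyRepository()
first_version = repo.submit("report-001", b"first draft")
repo.add_version("report-001", b"second draft")
latest = repo.disseminate("report-001")
original = repo.disseminate("report-001", version=1)
```

Keeping every version rather than overwriting is a design choice of this sketch; the slide itself lists add, replace, update, and delete as separate management operations.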
slide-6
SLIDE 6

slide-7
SLIDE 7
slide-8
SLIDE 8

DPR to Web Archiving Service

slide-9
SLIDE 9

Web-at-Risk: NDIIPP Funds

Jan 2005 – Jan 2008

  • Build tools to allow librarians to capture, curate, and preserve web-based government and political information.
      – Create topical and event-based archives
      – Capture individual sites and documents
  • Assess the impact of these tools on traditional collection development practices.
  • Explore web archiving service sustainability.
slide-10
SLIDE 10

Project Partners

slide-11
SLIDE 11

Preserving the Web

  • Why all the fuss?
  • What is “Web Archiving?”
  • Web Archiving Service (WAS)
      – Collecting content
      – Curating content
  • Current status & future plans
slide-12
SLIDE 12

slide-13
SLIDE 13

slide-14
SLIDE 14

  • 2003 survey of the .gov domain:
      – As much as 65 percent of all government publications that are distributed to libraries through the Federal Depository Library Program are currently produced exclusively in electronic form and distributed via the web.

slide-15
SLIDE 15

What is a “Web Archive?”

  • Automated method to gather web content
  • Collections composed of multiple sites
  • Captured content preserved
  • Meaningful access to content provided
      – Public or end-user access may not be available

slide-16
SLIDE 16

slide-17
SLIDE 17

Domain-Based Web Archives

  • Nordic Web Archive (Nordic National Libraries)
  • Kulturarw3 (National Library of Sweden)
  • National Web Archive (National Library of Iceland)

slide-18
SLIDE 18

Topical Web Archives

slide-19
SLIDE 19

Event-Based Web Archives

slide-20
SLIDE 20

slide-21
SLIDE 21

Web Archiving Lingo

  • Crawler
  • Host
  • Site
  • Seed
  • Capture
  • Robots.txt
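
Several of these terms can be shown in a few lines of code. The sketch below uses the terms as they are commonly used in web crawling (the slide itself gives no definitions) and relies only on the Python standard library. The seed URL, the robots.txt rules, and the crawler name `toy-crawler` are invented for illustration; nothing is fetched over the network.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Seed: a URL the crawler starts from (hypothetical example address).
seed = "http://www.example.gov/reports/index.html"

# Host: the machine name inside the URL. A site may span one or more hosts.
host = urlsplit(seed).hostname

# Robots.txt: the site's crawl rules. An inline sample is parsed here
# instead of downloading a real file.
robots = RobotFileParser()
robots.parse([
    "User-agent: *",
    "Disallow: /private/",
])


def may_capture(url, agent="toy-crawler"):
    """Return True if the sample robots.txt lets this crawler fetch the URL."""
    return robots.can_fetch(agent, url)


allowed = may_capture(seed)
blocked = may_capture("http://www.example.gov/private/memo.html")
```

A crawler that honors robots.txt would capture the seed page and skip anything under `/private/`.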
slide-22
SLIDE 22

slide-23
SLIDE 23

slide-24
SLIDE 24

slide-25
SLIDE 25

Sample Collection Plan

  • Section 1. Mission & Scope
  • Section 2. Selection
  • Section 3. Acquisition
  • Section 4. Descriptive Metadata
  • Section 5. Rights and Access
  • Section 6. Maintenance and Weeding
  • Section 7. Preservation
  • Appendix A. Letter of Agreement
  • Appendix B. Seed List
  • Appendix C. Metadata
slide-26
SLIDE 26

Flexibility in the face of uncertainty

slide-27
SLIDE 27

  • Title: Parallel Title, Alternate Title, Added Title, Series Title, Serial Title, Uniform Title, Other
  • Creator: Creator Name, Creator Role, Creator Information
  • Contributor: Contributor Name, Contributor Role, Contributor Information
  • Publisher: Publisher Name, Place of Publication, Publisher Information
  • Date: Original Resource Creation Date, Digital Creation Date
  • Language
  • Description: Content Description, Physical Description
  • Subject and Keywords
  • Primary Source
  • Coverage: Place Name, Time Period, Date, Date Range
  • Source
  • Relation
  • Collection
  • Institution
  • Rights Management
  • Resource Type
  • Format
  • Identifier: URL, URN, DOI, ISBN, ISSN, OCLC No., Report No., Government Document No., Accession or Local Control No., UNT Catalog No., RISM No., Other Identifier
  • Note
  • Metadata Information: Metadata Creator, Date of Creation, Metadata Modifier, Date of Modification
  • File Information: File Size, File Name, Format Name, Format Version, File description, Resolution, Dimension, Duration, Rate, Tonal-Resolution, Color, Compression, Other File information
  • Fixity Information: Authentication Type, Authentication Result, Date First Date Last date
  • System Information
  • Software: Creation Application Software (Creation Application Name, Creation Application Version), Access Application Software (Access Application Name, Access Application Version), Other Software Information
  • Hardware: Creation Hardware, Access Hardware, Other Hardware Information
  • Documentation
  • Structural Composition
  • Storage Medium
  • Access Inhibitors: Inhibitor Key
  • Functionality Exception
  • Alteration History: Action Taken, Date of Alteration, Modifier, Other Alteration Information
  • Metadata Information: Metadata Editor/Modifier, Metadata Creation/Modification Date, Metadata Modification Action, Other Metadata Information
  • Comments

What metadata will you need?

slide-28
SLIDE 28

Rights Management Approaches

  • Library of Congress
      – Extensive rights management efforts
      – Permission secured for any site not clearly in the public domain
          • If no response, the site is not captured
  • Internet Archive
      – Opt-out policy
      – Obey robots.txt
  • WAS
      – Flexibility
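
The two contrasting approaches above can be written as a small decision function. This is a simplified sketch, not any institution's actual workflow: the `Policy` names and the `may_capture` parameters are hypothetical. It also assumes that "flexibility" could mean letting a curator choose a policy per collection, which the slide does not spell out.

```python
from enum import Enum


class Policy(Enum):
    PERMISSION = "permission"  # ask first; capture only with a clear right to do so
    OPT_OUT = "opt_out"        # capture by default; stop if the owner objects


def may_capture(policy, *, public_domain=False, permission_granted=None,
                robots_allows=True, owner_opted_out=False):
    """Decide whether a site may be captured under the chosen policy.

    permission_granted is True, False, or None (None means no response yet).
    """
    if policy is Policy.PERMISSION:
        # No response is treated the same as a refusal: the site is not captured.
        return public_domain or permission_granted is True
    if policy is Policy.OPT_OUT:
        return robots_allows and not owner_opted_out
    raise ValueError(f"unknown policy: {policy!r}")


# Permission-based: a site owner who never answered is skipped.
no_reply = may_capture(Policy.PERMISSION, permission_granted=None)
# Opt-out: the same site is captured unless robots.txt or the owner says no.
default_capture = may_capture(Policy.OPT_OUT)
```

The same site can therefore end up in one archive and not in another, depending only on the policy applied.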

slide-29
SLIDE 29

Preservation

  • Content preserved in the DPR
      – Bit preservation (fixity, integrity)
      – Replication
      – Desiccation
  • Massive storage requirements
      – Multiple projects investigating mass storage environments
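
Fixity checking and replication work together: a digest recorded at ingest lets a later audit detect a damaged copy, and an intact replica can then be used to repair it. The sketch below illustrates that idea only. The function names and replica labels are hypothetical, and SHA-256 is an assumed choice, since the slide does not name a digest algorithm.

```python
import hashlib


def fixity(data):
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def audit(replicas, expected_digest):
    """Return the names of replicas whose content no longer matches the digest."""
    return [name for name, data in replicas.items()
            if fixity(data) != expected_digest]


def repair(replicas, expected_digest):
    """Overwrite damaged replicas from an intact copy; return the names repaired."""
    damaged = audit(replicas, expected_digest)
    intact = [name for name in replicas if name not in damaged]
    if damaged and not intact:
        raise RuntimeError("no intact replica left to repair from")
    for name in damaged:
        replicas[name] = replicas[intact[0]]
    return damaged


capture = b"<html>archived page</html>"
recorded = fixity(capture)  # digest stored at ingest time

# Three copies, one of which has suffered a single-byte corruption.
replicas = {
    "site-a": capture,
    "site-b": capture,
    "site-c": b"<html>archived pagf</html>",
}

damaged = audit(replicas, recorded)
repaired = repair(replicas, recorded)
clean_after_repair = audit(replicas, recorded)
```

Running the audit on a schedule, rather than only at access time, is what turns stored digests into an integrity guarantee.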

slide-30
SLIDE 30

WAS: Now & into the Future

  • Current Status
      – In development
      – 12/07 roll out to current curators
  • Beyond 2007
      – Extending service to additional curators
      – Developing end user access
      – Exploring release of open access tools

slide-31
SLIDE 31

Acknowledgements

  • Tracy Seneca, Web Archiving Coordinator
      – CDL WAS development team
  • Kathleen Murray
      – UNT Partners
  • NDIIPP