SLIDE 1

PANIC

Scientific Data Preservation using Semantic Grid Services

Jane Hunter, jane@dstc.edu.au

SLIDE 2

20th APAN Meeting Taipei Aug 2005

Objective

Address the long-term preservation and accessibility of digital objects/scientific data

SLIDE 3

Problems

  • Obsolescence of physical storage devices
  • Obsolescence of hardware
  • Obsolescence of software
    – Operating systems
    – Authoring software
    – Web, Application, Database Servers
    – Search, retrieval software
    – Rendering/Display software
    – Browser plugins
  • Obsolescence of file formats

SLIDE 4

Problems

Within digital libraries/scientific data archives:

  • Wide range of file formats - different platforms, different authoring/display software
  • Massive collections
  • Composite mixed-media objects – web pages, images, video, audio, Flash, SMIL, SVG
  • Highly proprietary – software & hardware dependent
  • Dynamic and interactive
  • Difficult to capture – boundary problem
  • Few guides/recommendations

SLIDE 5

Related Work

  • LoC National Digital Information Infrastructure and Preservation Program (NDIIPP)
  • CEDARS, CAMiLEON
  • National Library of Australia, PANDORA
  • Networked European Deposit Library (NEDLIB)
  • OCLC/RLG Preservation Metadata WG
    – PREMIS Preservation Metadata
  • International Internet Preservation Consortium
  • IBM – UVC (Universal Virtual Computer)
  • UK Digital Curation Centre

SLIDE 6

Current Strategies

  • Maintenance
    – of obsolete hardware/software
  • Migration
    – convert to a sequence of new formats
  • Emulation
    – mimic the original software application in the current environment
  • Preservation Metadata/Encapsulation
    – gather information that assists in the process of preservation (e.g., METS)
    – usually used in conjunction with Emulation or Migration
  • Normalisation
    – original file is converted into platform-independent XML

SLIDE 7

Existing Tools

  • OCLC’s INFORM, Cornell’s VRC – risk assessment -> notification services
  • GDFR, PRONOM, DCC-RR – Format registries
  • VersionTracker, IIPC – Software Registries
  • XENA, TOM – Conversion services
  • UVC – Emulation services

SLIDE 8

Objectives

Provide an Integrated Preservation Framework which supports:

  • Large, heterogeneous, distributed collections
  • Multiple formats
  • Changing organizational needs
    – Range of solutions
  • Flexible, Dynamic, Scalable, Extensible
  • New emerging formats, software, recommendations
  • New migration, emulation services
  • Recommender services/decision support
  • Sustainable - cost-effective, semi-automated

SLIDE 9

[Figure: PANIC overview diagram, showing PANIC together with the following groups of resources.]

  • Preservation Metadata Capture Tools (PREMINT, JHOVE, NLNZ)
  • Networked Distributed Archives (Protein Data Bank, SDSS SkyServer, ESO Science Archive, GenBank, ADIL)
  • Registries
    – Software Registry (VersionTracker)
    – Format Registry (PRONOM, GDFR)
    – Recommendation Registry (INFORM)
  • Web services
    – Service Descriptions (OWL-S)
    – Risk Assessment & Notification Services (VRC, INFORM)
    – Preservation Services (XENA, TOM, UVC)

SLIDE 10

Steps

  • Archival – selection and capture of digital object + preservation metadata
  • Risk assessment and notification of potential obsolescence
    – New recommendations, format, software versions
  • Service Specification and Request
    – Emulation or Migration
    – Inputs/Outputs
    – Cost
    – Speed
    – Reliability
    – Lossiness
  • Select, Compose, Invoke Preservation Service
  • Record preservation events
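
The "Service Specification and Request" step can be pictured as a small record that carries the criteria listed above. The following is a minimal sketch, not PANIC's actual data model: the names `Strategy` and `ServiceRequest`, the field names, the three-level scale and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    """The two kinds of preservation service named in the step above."""
    EMULATION = "emulation"
    MIGRATION = "migration"


@dataclass(frozen=True)
class ServiceRequest:
    """Hypothetical record of what a collections manager asks for."""
    strategy: Strategy
    input_format: str      # format of the at-risk object
    output_format: str     # desired target format
    max_cost: float        # highest acceptable cost
    min_speed: str         # "low", "medium" or "high"
    min_reliability: str   # "low", "medium" or "high"
    lossless: bool         # True if no information loss is acceptable

    def __post_init__(self):
        # Reject values outside the assumed three-level scale.
        levels = {"low", "medium", "high"}
        if self.min_speed not in levels or self.min_reliability not in levels:
            raise ValueError("speed and reliability must be low, medium or high")
        if self.max_cost < 0:
            raise ValueError("max_cost must be non-negative")


request = ServiceRequest(
    strategy=Strategy.MIGRATION,
    input_format="TIFF",
    output_format="JPEG2000",
    max_cost=0.0,
    min_speed="medium",
    min_reliability="high",
    lossless=True,
)
```

Validating the request when it is created keeps malformed criteria from reaching the discovery step.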

SLIDE 11

PANIC Architecture

[Figure: PANIC architecture diagram. Labels recoverable from the diagram:]

  • Custodial organization: Collections Manager, Preservation Metadata input tool, Multimedia Collection, Preservation Metadata
  • Invocation component: Requester Agent (Obsolescence Detector, Service Discovery, Service Selection, Service Invocation)
  • Notification component: Notification Service, Registry(s)
  • Discovery component: Discovery Agent (e.g. Semantic Matchmaker), Preservation Service Registry, OWL-S Profiles
  • Provider component: Preservation Service Provider Agent (Retrieve and Invoke Appropriate Service(s)), Preservation Web Services (TIFF-to-JPEG2000, AIFF-to-MP3, Mac OS1 Emulator)
  • Communication over the Internet: WSDL, SOAP
  • Implementation: Apache AXIS, Sesame RDF Store

SLIDE 12

Preservation Metadata Input/Capture Tool

  • XML Schema based on extended METS schema
  • XML metadata is used by Invocation component
  • PREMINT Demo available: http://metadata.net/panic

[Figure: Collections Manager, Preservation Metadata input tool, Multimedia Collection, Preservation Metadata.]

SLIDE 13

METS

  • Metadata Encoding and Transmission Standard
  • Extended to include presentation and creator intention information
  • Structural metadata – use SMIL

[Figure: structure of a METS document with the extensions.]

  • Descriptive Metadata
  • Administrative
    – Technical Metadata
    – Rights Metadata
    – Source Metadata
    – DigiProv Metadata
  • File Groups
  • Structural Map
  • Extensions
    – Presentation Metadata
    – Intention Metadata

SLIDE 14

Notification component

[Figure: Obsolescence Detector. Format details and software dependencies are extracted from the object's Metadata Encoding and Transmission Standard (METS) record, compared with the three registries below, and incompatibilities are returned.]

  • Format Registry: FormatName, FormatType, CurrentVersion, PreviousVersion, ReleaseDate
  • Software Registry: SoftwareName, SoftwareType, CurrentVersion, PreviousVersion, ReleaseDate, FormatSupported, Company, Platform
  • Recommendation Registry: Recommendation, FormatName, FormatVersion, Authority, URL, ReleaseDate

Obsolescence detector – periodically compares the preservation metadata for each object with registries to determine when an object is at risk of obsolescence
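
The comparison performed by the obsolescence detector can be sketched as follows. This is an illustration under stated assumptions rather than the PANIC implementation: the registries are modelled as plain in-memory dictionaries, versions are compared by simple string inequality, the periodic scheduling is left out, and every class, function and sample name (`ObjectMetadata`, `Registries`, `detect_risks`, `ExampleFormat`, `ExampleViewer`) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectMetadata:
    """Format details and software dependencies extracted for one object."""
    object_id: str
    format_name: str
    format_version: str
    software_name: str
    software_version: str


@dataclass
class Registries:
    """Toy stand-ins for the format, software and recommendation registries."""
    # format name -> current version
    format_current: dict = field(default_factory=dict)
    # software name -> (current version, set of supported format names)
    software_current: dict = field(default_factory=dict)
    # format name -> recommended replacement format
    recommendations: dict = field(default_factory=dict)


def detect_risks(meta, reg):
    """Return a list of incompatibilities found for one object."""
    risks = []

    # Format registry: has a newer version of the format been released?
    current = reg.format_current.get(meta.format_name)
    if current is not None and current != meta.format_version:
        risks.append(f"format version {meta.format_version} superseded by {current}")

    # Software registry: does the current software release still read the format?
    entry = reg.software_current.get(meta.software_name)
    if entry is not None:
        current_version, supported_formats = entry
        if meta.format_name not in supported_formats:
            risks.append(
                f"{meta.software_name} {current_version} does not support {meta.format_name}"
            )

    # Recommendation registry: has an authority recommended a replacement?
    replacement = reg.recommendations.get(meta.format_name)
    if replacement is not None:
        risks.append(f"recommended replacement for {meta.format_name}: {replacement}")

    return risks


registries = Registries(
    format_current={"ExampleFormat": "2.0"},
    software_current={"ExampleViewer": ("3.0", {"OtherFormat"})},
    recommendations={"ExampleFormat": "OtherFormat"},
)
item = ObjectMetadata("obj-1", "ExampleFormat", "1.0", "ExampleViewer", "2.5")
risks = detect_risks(item, registries)
```

An empty list means no registry reports a problem for the object; a non-empty list is what a notification service would pass on to the collections manager.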

SLIDE 15

PANIC Architecture

Step 2. Semi-automated Migration

[Figure: PANIC architecture diagram, as on Slide 11, annotated with the following notes.]

  • Invocation component: client-side software modules which control the invocation of preservation services
  • Notification component: provides an interface to Software Version, Format Version and Recommendations registries
  • Discovery component: provides an interface to match service request to Web service registries
  • Provider component: delivers and invokes the chosen preservation service
  • Components communicate with each other using platform-neutral standards: OWL-S, WSDL and SOAP

SLIDE 16

Invocation component

  • Service Discovery – provides a user interface so the collections manager can specify the type of preservation service they are looking for.
  • Service Selection – presents the services retrieved by the Discovery agent for selection.
  • Service Invocation – invokes the chosen service and updates the preservation metadata where necessary.

[Figure: Invocation component. Requester Agent with Obsolescence Detector, Service Discovery, Service Selection and Service Invocation.]
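
The last responsibility, invoking the chosen service and updating the preservation metadata, might look roughly like the sketch below. It is only an illustration: the metadata record is a plain dictionary, the service is a local callable standing in for a remote Web service call, and the names `invoke_and_record` and `tiff_to_jpeg2000` as well as the event fields are hypothetical.

```python
from datetime import datetime, timezone


def invoke_and_record(service_name, service, metadata):
    """Run a preservation service and log the outcome as a preservation event.

    `service` is any callable that takes the current format name and returns
    the new format name.
    """
    old_format = metadata["format"]
    new_format = service(old_format)

    updated = dict(metadata)  # leave the caller's record untouched
    updated["format"] = new_format
    updated["events"] = list(metadata.get("events", [])) + [{
        "type": "migration",
        "service": service_name,
        "from": old_format,
        "to": new_format,
        "when": datetime.now(timezone.utc).isoformat(),
    }]
    return updated


def tiff_to_jpeg2000(fmt):
    """Placeholder conversion: checks the input format and relabels it."""
    if fmt != "TIFF":
        raise ValueError(f"unsupported input format: {fmt}")
    return "JPEG2000"


record = {"id": "image-001", "format": "TIFF", "events": []}
updated = invoke_and_record("TIFF-to-JPEG2000", tiff_to_jpeg2000, record)
```

Returning a new record instead of modifying the old one keeps the pre-migration metadata available, which is useful if the migration later has to be audited or reversed.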

SLIDE 17

OWL-S Ontology for Web Services

[Figure: top-level OWL-S ontology.]

  • Resource provides Service
  • Service presents ServiceProfile – what the service does (automatic discovery)
  • Service describedBy ServiceModel – how it works (automatic composition)
  • Service supports ServiceGrounding – how to access it (automatic invocation)

SLIDE 18

OWL-S Preservation Extensions

[Figure: class diagram of the preservation extensions to the OWL-S Service class; subClassOf links lead to PreservationService.]

  • PreservationService
    – ExecutionStatus: Remote Execution or Download
    – SystemRequirement (e.g. Windows XP)
    – Creator (e.g. John Doe)
    – ReleaseDate (e.g. 8-12-2003)
    – ServiceQuality: Speed (e.g. High), Reliability (e.g. Low)
  • Emulation
    – EmulatedObject (e.g. MAC OS)
    – EmulationType (e.g. OS)
    – SystemSetting (e.g. 256 bit palette)
  • Migration
    – OriginalObjectFormat (e.g. TIFF)
    – OriginalObjectVersion (e.g. 5.12)
    – TargetObjectFormat (e.g. JPEG 2000)
    – TargetObjectVersion (e.g. 2.02)
    – Lossiness (e.g. lossless)

SLIDE 19

Discovery component

  • Discovery Agent - matches service request against OWL-S descriptions of Preservation Web services
  • Returns a ranked list of Preservation Web services that match the request

[Figure: Discovery component. Discovery Agent (e.g. Semantic Matchmaker), Preservation Service Registry, OWL-S Profiles, Sesame RDF Store.]
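
The matching and ranking behaviour can be illustrated with a much simpler stand-in. A semantic matchmaker reasons over OWL-S descriptions; the sketch below only compares plain attributes, treating the formats and the lossless requirement as hard constraints and ranking the remaining services by speed and reliability. The function `rank_services`, the dictionary keys, the scoring rule and the service names are hypothetical.

```python
# Assumed three-level scale for the service quality attributes.
LEVELS = {"low": 0, "medium": 1, "high": 2}


def rank_services(request, services):
    """Return the names of matching services, best match first."""
    matches = []
    for service in services:
        # Hard constraints: the service must convert between the requested formats.
        if service["original_format"] != request["original_format"]:
            continue
        if service["target_format"] != request["target_format"]:
            continue
        # Hard constraint: honour a lossless requirement if one is given.
        if request.get("lossless") and service["lossiness"] != "lossless":
            continue
        # Soft criteria: higher speed and reliability rank higher.
        score = LEVELS[service["speed"]] + LEVELS[service["reliability"]]
        matches.append((score, service["name"]))

    # Sort by descending score, then by name so ties are deterministic.
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in matches]


services = [
    {"name": "A", "original_format": "TIFF", "target_format": "JPEG2000",
     "lossiness": "lossless", "speed": "high", "reliability": "low"},
    {"name": "B", "original_format": "TIFF", "target_format": "JPEG2000",
     "lossiness": "lossless", "speed": "medium", "reliability": "high"},
    {"name": "C", "original_format": "TIFF", "target_format": "JPEG2000",
     "lossiness": "lossy", "speed": "high", "reliability": "high"},
    {"name": "D", "original_format": "TIFF", "target_format": "PNG",
     "lossiness": "lossless", "speed": "high", "reliability": "high"},
]
request = {"original_format": "TIFF", "target_format": "JPEG2000", "lossless": True}
ranked = rank_services(request, services)
```

Here services C and D are filtered out by the hard constraints, and B ranks above A because its combined speed and reliability score is higher.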

SLIDE 20

Provider component

Provider Agent either:

  • retrieves and invokes preservation service locally; or
  • invokes preservation service remotely

[Figure: Provider component. Preservation Service Provider Agent, Preservation Web Services (TIFF-to-JPEG2000, AIFF-to-MP3, Mac OS1 Emulator).]

SLIDE 21

Hypothetical Example

  • Russel Coight is an astronomer at the Australia Telescope National Facility (ATNF)
  • Large collection of astronomy images in TIFF format
  • ImageViewer 1.0 used to view TIFF images
  • New version of ImageViewer (2.0) no longer supports TIFF
  • RLG recommends that TIFF format be replaced by JPEG2000 for archival

SLIDE 22

SLIDE 23

SLIDE 24

SLIDE 25

SLIDE 26

SLIDE 27

SLIDE 28

SLIDE 29

Future Directions

  • Ongoing refinement
  • Web Services Resource Framework (WSRF)
  • Evaluation within real scientific archive
  • Integrate - GDFR, PRONOM, TOM, XENA, media longevity estimates, risk assessments
  • Trusted services - quality ratings
  • Composite services
  • AONS project – NLA and UK DCC

SLIDE 30

Conclusions

  • (No need to) PANIC
  • Collaborative effort ->
    – dynamic, adaptable, intelligent
    – scalable, extensible, customizable
    – platform neutral
    – leverages existing and emerging work
    – interactive and/or automatic
    – cost-effective, sustainable

SLIDE 31

Reference

http://metadata.net/panic

Jane Hunter, jane@dstc.edu.au