PANIC: Scientific Data Preservation using Semantic Grid Services
Jane Hunter, jane@dstc.edu.au
20th APAN Meeting Taipei Aug 2005
Objective
Address the long term preservation and accessibility of digital objects/scientific data
Problems
- Obsolescence of physical storage devices
- Obsolescence of hardware
- Obsolescence of software
– Operating systems
– Authoring software
– Web, application and database servers
– Search and retrieval software
– Rendering/display software
– Browser plugins
- Obsolescence of file formats
Problems
Within digital libraries/scientific data archives:
- Wide range of file formats – different platforms, different authoring/display software
- Massive collections
- Composite mixed-media objects – web pages, images, video, audio, Flash, SMIL, SVG
- Highly proprietary – software & hardware dependent
- Dynamic and interactive
- Difficult to capture – boundary problem
- Few guides/recommendations
Related Work
- LoC National Digital Information Infrastructure and Preservation Program (NDIIPP)
- CEDARS, CAMiLEON
- National Library of Australia, PANDORA
- Networked European Deposit Library (NEDLIB)
- OCLC/RLG Preservation Metadata WG
– PREMIS Preservation Metadata
- International Internet Preservation Consortium
- IBM – UVC (Universal Virtual Computer)
- UK Digital Curation Centre
Current Strategies
- Maintenance
– of obsolete hardware/software
- Migration
– convert to a sequence of new formats
- Emulation
– mimic the original software application in the current environment
- Preservation Metadata/Encapsulation
– gather information that assists in the process of preservation (e.g., METS)
– usually used in conjunction with Emulation or Migration
- Normalisation
– original file is converted into platform-independent XML
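Normalisation can be sketched in a few lines, assuming the original object is a simple comma-separated table. Tools such as XENA cover far richer formats; the function `normalise_csv_to_xml` and the `<table>/<row>/<cell>` vocabulary below are hypothetical, chosen only to show the idea of re-encoding content as platform-independent XML.

```python
import csv
import io
import xml.etree.ElementTree as ET


def normalise_csv_to_xml(csv_text: str) -> str:
    """Convert CSV text into a platform-independent XML string.

    Hypothetical schema: <table><row><cell name="...">value</cell></row></table>.
    The first CSV line is treated as the column header.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    table = ET.Element("table")
    for record in reader:
        row = ET.SubElement(table, "row")
        for name, value in zip(header, record):
            cell = ET.SubElement(row, "cell", name=name)
            cell.text = value
    return ET.tostring(table, encoding="unicode")


if __name__ == "__main__":
    sample = "object,ra,dec\nM31,00h42m44s,+41d16m09s\n"
    print(normalise_csv_to_xml(sample))
```

Because the output is plain XML with self-describing element and attribute names, it can be read without the software that produced the original file.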
Existing Tools
- OCLC’s INFORM, Cornell’s VRC – risk assessment -> notification services
- GDFR, PRONOM, DCC-RR – Format registries
- VersionTracker, IIPC – Software Registries
- XENA, TOM – Conversion services
- UVC – Emulation services
Objectives
Provide an Integrated Preservation Framework which supports:
- Large, heterogeneous, distributed collections
- Multiple formats
- Changing organizational needs
– Range of solutions
- Flexible, Dynamic, Scalable, Extensible
- New emerging formats, software, recommendations
- New migration, emulation services
- Recommender services/decision support
- Sustainable - cost-effective, semi-automated
Figure: PANIC overview. PANIC links:
- Networked distributed archives: Protein Data Bank, SDSS SkyServer, ESO Science Archive, GenBank, ADIL
- Preservation metadata capture tools: PREMINT, JHOVE, NLNZ
- Registries: Software Registry (VersionTracker), Format Registry (PRONOM, GDFR), Recommendation Registry (INFORM)
- Web services with service descriptions in OWL-S: Risk Assessment & Notification Services (VRC, INFORM), Preservation Services (XENA, TOM, UVC)
Steps
- Archival – selection and capture of the digital object + preservation metadata
- Risk assessment and notification of potential obsolescence
– triggered by new recommendations, format versions or software versions
- Service Specification and Request
– Emulation or Migration
– Inputs/Outputs
– Cost
– Speed
– Reliability
– Lossiness
- Select, Compose, Invoke Preservation Service
- Record preservation events
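The Service Specification and Request step lists the constraints a request can carry. A minimal sketch, assuming a flat request/offer model with an ordinal Low/Medium/High scale for speed and reliability; `ServiceRequest`, `ServiceOffer`, `satisfies` and `select` are hypothetical names, and PANIC itself expresses requests against OWL-S descriptions rather than Python objects.

```python
from dataclasses import dataclass
from typing import List

# Assumed ordinal scale for qualitative Speed/Reliability values.
LEVELS = {"Low": 0, "Medium": 1, "High": 2}


@dataclass
class ServiceRequest:
    """Hypothetical encoding of a preservation service request."""
    kind: str                 # "Emulation" or "Migration"
    input_format: str
    output_format: str
    max_cost: float
    min_speed: str = "Low"
    min_reliability: str = "Low"
    lossless_only: bool = False


@dataclass
class ServiceOffer:
    """Hypothetical summary of one advertised preservation service."""
    name: str
    kind: str
    input_format: str
    output_format: str
    cost: float
    speed: str
    reliability: str
    lossless: bool


def satisfies(offer: ServiceOffer, req: ServiceRequest) -> bool:
    """True if the offer meets every functional and quality constraint."""
    return (
        offer.kind == req.kind
        and offer.input_format == req.input_format
        and offer.output_format == req.output_format
        and offer.cost <= req.max_cost
        and LEVELS[offer.speed] >= LEVELS[req.min_speed]
        and LEVELS[offer.reliability] >= LEVELS[req.min_reliability]
        and (offer.lossless or not req.lossless_only)
    )


def select(offers: List[ServiceOffer], req: ServiceRequest) -> List[ServiceOffer]:
    """Keep the offers that satisfy the request, cheapest first."""
    return sorted((o for o in offers if satisfies(o, req)), key=lambda o: o.cost)


if __name__ == "__main__":
    offers = [
        ServiceOffer("tiff2jp2-fast", "Migration", "TIFF", "JPEG2000",
                     5.0, "High", "Medium", False),
        ServiceOffer("tiff2jp2-archival", "Migration", "TIFF", "JPEG2000",
                     8.0, "Medium", "High", True),
    ]
    req = ServiceRequest("Migration", "TIFF", "JPEG2000", max_cost=10.0,
                         min_reliability="High", lossless_only=True)
    print([o.name for o in select(offers, req)])  # ['tiff2jp2-archival']
```

Hard constraints filter the candidates and cost orders the survivors; a collections manager could equally rank by speed or reliability.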
PANIC Architecture
Figure: PANIC architecture, showing:
- Custodial organization: Collections Manager, Multimedia Collection, Preservation Metadata and the Preservation Metadata input tool
- Notification component: Obsolescence Detector, Notification Service, Registry(s)
- Invocation component: Requester Agent with Service Discovery, Service Selection and Service Invocation
- Discovery component: Discovery Agent (e.g. Semantic Matchmaker) and a Preservation Service Registry of OWL-S Profiles
- Provider component: Preservation Service Provider Agent, which retrieves and invokes the appropriate Preservation Web Services (e.g. TIFF-to-JPEG2000, AIFF-to-MP3, Mac OS1 Emulator)
- Communication over the Internet via WSDL and SOAP; built with Apache AXIS and a Sesame RDF Store
Preservation Metadata Input/Capture Tool
- XML Schema based on an extended METS schema
- XML metadata is used by the Invocation component
- PREMINT demo available: http://metadata.net/panic
Figure: The Collections Manager uses the preservation metadata input tool to record preservation metadata for a multimedia collection.
METS
- Metadata Encoding and Transmission Standard
- Extended to include presentation and creator intention information
- Structural metadata – use SMIL
Figure: Extended METS structure: Descriptive Metadata; Administrative Metadata (Technical, Rights, Source and DigiProv Metadata); File Groups; Structural Map; Extensions (Presentation Metadata, Intention Metadata).
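The extended METS layout might be built as follows, using only the Python standard library. The METS namespace and section names (`dmdSec`, `amdSec`, `techMD`, `rightsMD`, `sourceMD`, `digiprovMD`, `fileSec`, `structMap`) follow METS, but the `ext:` namespace, the `presentation` and `intention` elements, and the choice to wrap them as `MDTYPE="OTHER"` metadata are assumptions for illustration; the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
EXT_NS = "urn:example:panic-ext"  # hypothetical namespace for the extensions

ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)
ET.register_namespace("ext", EXT_NS)


def q(ns: str, tag: str) -> str:
    """Return a namespace-qualified name in ElementTree's {uri}local form."""
    return f"{{{ns}}}{tag}"


def build_mets(obj_id: str, href: str, mimetype: str,
               presentation: str, intention: str) -> ET.Element:
    """Build a skeleton METS record carrying the extra extension metadata."""
    mets = ET.Element(q(METS_NS, "mets"), {"OBJID": obj_id})

    # Descriptive metadata (left empty in this sketch).
    ET.SubElement(mets, q(METS_NS, "dmdSec"), {"ID": "DMD1"})

    # Presentation and intention extensions, wrapped as "OTHER" metadata.
    ext = ET.SubElement(mets, q(METS_NS, "dmdSec"), {"ID": "EXT1"})
    wrap = ET.SubElement(ext, q(METS_NS, "mdWrap"),
                         {"MDTYPE": "OTHER", "OTHERMDTYPE": "PANIC"})
    data = ET.SubElement(wrap, q(METS_NS, "xmlData"))
    ET.SubElement(data, q(EXT_NS, "presentation")).text = presentation
    ET.SubElement(data, q(EXT_NS, "intention")).text = intention

    # Administrative metadata: technical, rights, source, digital provenance.
    amd = ET.SubElement(mets, q(METS_NS, "amdSec"), {"ID": "AMD1"})
    for section in ("techMD", "rightsMD", "sourceMD", "digiprovMD"):
        ET.SubElement(amd, q(METS_NS, section), {"ID": section.upper() + "1"})

    # File groups: one file pointing at the content by URL.
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    grp = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": "master"})
    file_el = ET.SubElement(grp, q(METS_NS, "file"),
                            {"ID": "FILE1", "MIMETYPE": mimetype, "ADMID": "TECHMD1"})
    ET.SubElement(file_el, q(METS_NS, "FLocat"),
                  {"LOCTYPE": "URL", q(XLINK_NS, "href"): href})

    # Structural map: a single div pointing at the file.
    smap = ET.SubElement(mets, q(METS_NS, "structMap"))
    div = ET.SubElement(smap, q(METS_NS, "div"), {"DMDID": "DMD1 EXT1"})
    ET.SubElement(div, q(METS_NS, "fptr"), {"FILEID": "FILE1"})
    return mets


if __name__ == "__main__":
    record = build_mets("obj-001", "http://example.org/image.tif", "image/tiff",
                        presentation="display at full resolution",
                        intention="preserve exact pixel values")
    print(ET.tostring(record, encoding="unicode"))
```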
Notification component
Obsolescence detector – periodically compares the preservation metadata for each object with the registries to determine when the object is at risk of obsolescence
Figure: The Obsolescence Detector extracts format details and software dependencies from the object's METS record, compares them with the registries and returns any incompatibilities:
- Format Registry: FormatName, FormatType, CurrentVersion, PreviousVersion, ReleaseDate
- Software Registry: SoftwareName, SoftwareType, CurrentVersion, PreviousVersion, ReleaseDate, FormatSupported, Company, Platform
- Recommendation Registry: Recommendation, FormatName, FormatVersion, Authority, URL, ReleaseDate
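The comparison performed by the Obsolescence Detector can be sketched as below, assuming the format details and software dependencies have already been extracted from the METS record and that the three registries can be read as simple lookup tables. The class and field names are hypothetical, the registry contents are toy data echoing the ImageViewer scenario in the hypothetical example, and the periodic scheduling is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ObjectMetadata:
    """Format details and software dependencies taken from one METS record."""
    object_id: str
    format_name: str
    format_version: str
    software: Dict[str, str]  # software name -> version the object relies on


@dataclass
class Registries:
    """Toy stand-ins for the Format, Software and Recommendation registries."""
    current_format_version: Dict[str, str]     # FormatName -> CurrentVersion
    formats_supported: Dict[str, List[str]]    # "SoftwareName CurrentVersion" -> FormatSupported
    current_software_version: Dict[str, str]   # SoftwareName -> CurrentVersion
    recommendations: Dict[str, str] = field(default_factory=dict)  # FormatName -> recommended format


def detect(obj: ObjectMetadata, reg: Registries) -> List[str]:
    """Compare one object's metadata with the registries; return incompatibilities."""
    issues: List[str] = []
    current = reg.current_format_version.get(obj.format_name)
    if current is not None and current != obj.format_version:
        issues.append(f"{obj.format_name} {obj.format_version} superseded by {current}")
    for name, version in obj.software.items():
        latest = reg.current_software_version.get(name)
        if latest is None or latest == version:
            continue  # unknown software, or the object already uses the latest version
        supported = reg.formats_supported.get(f"{name} {latest}", [])
        if obj.format_name not in supported:
            issues.append(f"{name} {latest} does not support {obj.format_name}")
    target = reg.recommendations.get(obj.format_name)
    if target:
        issues.append(f"recommendation: migrate {obj.format_name} to {target}")
    return issues


if __name__ == "__main__":
    reg = Registries(
        current_format_version={"TIFF": "6.0"},
        formats_supported={"ImageViewer 2.0": ["JPEG2000", "PNG"]},
        current_software_version={"ImageViewer": "2.0"},
        recommendations={"TIFF": "JPEG2000"},
    )
    obj = ObjectMetadata("img-42", "TIFF", "6.0", {"ImageViewer": "1.0"})
    for issue in detect(obj, reg):
        print(issue)
```

A non-empty result is what would trigger a notification to the collections manager.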
PANIC Architecture
Step 2. Semi-automated Migration
- Invocation component – client-side software modules which control the invocation of preservation services
- Notification component – provides an interface to the Software Version, Format Version and Recommendation registries
- Discovery component – provides an interface to match a service request against Web service registries
- Provider component – delivers and invokes the chosen preservation service
- Components communicate with each other using platform-neutral standards: OWL-S, WSDL and SOAP
Invocation component
- Service Discovery – provides a user interface so the collections manager can specify the type of preservation service they are looking for
- Service Selection – presents the services retrieved by the Discovery agent for selection
- Service Invocation – invokes the chosen service and updates the preservation metadata where necessary
Figure: Invocation component: Requester Agent with Service Discovery, Service Selection and Service Invocation, linked to the Obsolescence Detector.
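The Service Invocation step can be sketched as a wrapper that calls the chosen service and then records a preservation event. This is a minimal sketch under stated assumptions: the metadata is a plain dict standing in for the METS record, the service is any Python callable standing in for a remote Web service, and the event keys are hypothetical, only loosely modelled on PREMIS event fields.

```python
from datetime import datetime, timezone
from typing import Callable, Dict


def invoke_and_record(metadata: Dict, service_name: str,
                      service: Callable[[bytes], bytes],
                      data: bytes, target_format: str) -> bytes:
    """Invoke a migration service and log the outcome in the object's metadata.

    `metadata` is a hypothetical dict view of the METS record with a "format"
    string and an "events" list; `service` stands in for the chosen Web service.
    """
    event = {
        "eventType": "migration",
        "agent": service_name,
        "from": metadata["format"],
        "to": target_format,
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
    }
    try:
        result = service(data)
    except Exception as exc:
        # Record the failed attempt, leave the format unchanged, and re-raise.
        event["eventOutcome"] = f"failure: {exc}"
        metadata["events"].append(event)
        raise
    event["eventOutcome"] = "success"
    metadata["events"].append(event)
    metadata["format"] = target_format  # update only after a successful migration
    return result


def fake_tiff_to_jpeg2000(data: bytes) -> bytes:
    """Hypothetical stand-in for a TIFF-to-JPEG2000 Web service."""
    return b"JP2" + data


if __name__ == "__main__":
    record = {"format": "TIFF", "events": []}
    invoke_and_record(record, "TIFF-to-JPEG2000", fake_tiff_to_jpeg2000,
                      b"pixels", "JPEG2000")
    print(record["format"], record["events"][-1]["eventOutcome"])  # JPEG2000 success
```

Logging failed attempts as well as successes keeps the provenance trail complete.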
OWL-S Ontology for Web Services
Figure: OWL-S upper ontology. A Resource provides a Service. The Service presents a ServiceProfile (what the service does, for automatic discovery), is describedBy a ServiceModel (how it works, for automatic composition) and supports a ServiceGrounding (how to access it, for automatic invocation).
OWL-S Preservation Extensions
Figure: Preservation extensions to the OWL-S Service class:
- Service properties: ExecutionStatus (Remote Execution or Download), SystemRequirement (e.g. Windows XP), Creator (e.g. John Doe), ReleaseDate (e.g. 8-12-2003), ServiceQuality with Speed (e.g. High) and Reliability (e.g. Low)
- Emulation (subClassOf PreservationService): EmulatedObject (e.g. MAC OS), EmulationType (e.g. OS), SystemSetting (e.g. 256 bit palette)
- Migration (subClassOf PreservationService): OriginalObjectFormat (e.g. TIFF), OriginalObjectVersion (e.g. 5.12), TargetObjectFormat (e.g. JPEG 2000), TargetObjectVersion (e.g. 2.02), Lossiness (e.g. lossless)
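One way to see the extension hierarchy is to mirror it in Python dataclasses. This is an illustrative sketch, not OWL: `PreservationService`, `Emulation` and `Migration` follow the class names in the figure, while the snake_case field names and the `to_triples` helper (which flattens a description into subject, predicate, object tuples) are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class PreservationService:
    """Properties shared by every preservation service."""
    name: str
    execution_status: str    # "Remote Execution" or "Download"
    system_requirement: str  # e.g. "Windows XP"
    creator: str             # e.g. "John Doe"
    release_date: str        # e.g. "8-12-2003"
    speed: str               # ServiceQuality, e.g. "High"
    reliability: str         # ServiceQuality, e.g. "Low"


@dataclass
class Emulation(PreservationService):
    emulated_object: str = ""  # e.g. "MAC OS"
    emulation_type: str = ""   # e.g. "OS"
    system_setting: str = ""   # e.g. "256 bit palette"


@dataclass
class Migration(PreservationService):
    original_object_format: str = ""   # e.g. "TIFF"
    original_object_version: str = ""  # e.g. "5.12"
    target_object_format: str = ""     # e.g. "JPEG 2000"
    target_object_version: str = ""    # e.g. "2.02"
    lossiness: str = ""                # e.g. "lossless"


def to_triples(svc: PreservationService) -> List[Tuple[str, str, str]]:
    """Flatten a description into (subject, predicate, object) tuples, skipping empty values."""
    triples = [(svc.name, "rdf:type", type(svc).__name__)]
    for key, value in asdict(svc).items():
        if key != "name" and value:
            triples.append((svc.name, key, value))
    return triples


if __name__ == "__main__":
    svc = Migration("TIFF-to-JPEG2000", "Remote Execution", "Windows XP",
                    "John Doe", "8-12-2003", "High", "Low",
                    original_object_format="TIFF", original_object_version="5.12",
                    target_object_format="JPEG 2000", target_object_version="2.02",
                    lossiness="lossless")
    for triple in to_triples(svc):
        print(triple)
```

Subclassing captures the subClassOf relation: every `Migration` or `Emulation` is also a `PreservationService` and carries the shared properties.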
Discovery component
- Discovery Agent – matches a service request against the OWL-S descriptions of Preservation Web services
- Returns a ranked list of the Preservation Web services that match the request
Figure: Discovery component: Discovery Agent (e.g. Semantic Matchmaker) and a Preservation Service Registry of OWL-S Profiles held in a Sesame RDF Store.
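The ranking step can be illustrated with a simplified degree-of-match scheme, loosely in the spirit of semantic matchmakers for OWL-S. Assumptions: formats form a small hypothetical class hierarchy (`PARENT`); a service matches if it accepts the requested input type or a superclass of it and produces the requested output type or a subclass of it; and identical classes rank above subclass matches. A real matchmaker would reason over the OWL-S profiles in the RDF store instead.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical format taxonomy (child -> parent), standing in for OWL classes.
PARENT: Dict[str, Optional[str]] = {
    "Image": None,
    "RasterImage": "Image",
    "TIFF": "RasterImage",
    "JPEG2000": "RasterImage",
    "PNG": "RasterImage",
    "Audio": None,
    "AIFF": "Audio",
    "MP3": "Audio",
}

DEGREES = {"exact": 2, "subsumed": 1, "fail": 0}


def is_subclass(cls: str, ancestor: str) -> bool:
    """True if `cls` is `ancestor` or lies below it in PARENT."""
    current: Optional[str] = cls
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


def match(specific: str, general: str) -> str:
    """Degree to which `specific` fits under `general`."""
    if specific == general:
        return "exact"
    if is_subclass(specific, general):
        return "subsumed"
    return "fail"


def rank(services: List[Dict[str, str]], req_in: str,
         req_out: str) -> List[Tuple[str, str]]:
    """Return (service name, degree) pairs, best match first, failures dropped."""
    ranked = []
    for svc in services:
        d_in = match(req_in, svc["input"])     # service must accept the requested input
        d_out = match(svc["output"], req_out)  # and produce the requested output
        overall = min(d_in, d_out, key=DEGREES.__getitem__)
        if overall != "fail":
            ranked.append((svc["name"], overall))
    ranked.sort(key=lambda pair: (-DEGREES[pair[1]], pair[0]))
    return ranked


if __name__ == "__main__":
    services = [
        {"name": "TIFF-to-JPEG2000", "input": "TIFF", "output": "JPEG2000"},
        {"name": "Raster-to-PNG", "input": "RasterImage", "output": "PNG"},
        {"name": "AIFF-to-MP3", "input": "AIFF", "output": "MP3"},
    ]
    print(rank(services, "TIFF", "JPEG2000"))     # [('TIFF-to-JPEG2000', 'exact')]
    print(rank(services, "TIFF", "RasterImage"))
```

A request for any raster image output returns both image services ranked as subclass matches, while the audio service is dropped.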
Provider component
Provider Agent either:
- retrieves and invokes the preservation service locally; or
- invokes the preservation service remotely
Figure: Provider component: Preservation Service Provider Agent offering Preservation Web Services such as TIFF-to-JPEG2000, AIFF-to-MP3 and a Mac OS1 Emulator.
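The Provider Agent's choice between local and remote execution might look like the sketch below, keyed on the ExecutionStatus values from the OWL-S preservation extensions (Download or Remote Execution). The service dictionary keys, `provide`, and the injected `transport` callable (standing in for an HTTP POST to a SOAP endpoint) are hypothetical; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ET.register_namespace("soap", SOAP_ENV)


def soap_envelope(operation: str, service_ns: str, params: Dict[str, str]) -> str:
    """Build a minimal SOAP 1.1 request envelope for `operation`."""
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for key, value in params.items():
        ET.SubElement(op, f"{{{service_ns}}}{key}").text = value
    return ET.tostring(env, encoding="unicode")


def provide(service: Dict[str, str], params: Dict[str, str],
            local_impls: Dict[str, Callable[..., str]],
            transport: Callable[[str, str], str]) -> str:
    """Run a preservation service locally or remotely.

    `service` is a hypothetical description with "name", "execution_status",
    and, for remote services, "endpoint", "namespace" and "operation" keys.
    """
    if service["execution_status"] == "Download":
        # Retrieve the implementation and run it in the local environment.
        return local_impls[service["name"]](**params)
    # Remote Execution: send a SOAP request to the provider's endpoint.
    envelope = soap_envelope(service["operation"], service["namespace"], params)
    return transport(service["endpoint"], envelope)


if __name__ == "__main__":
    remote = {"name": "TIFF-to-JPEG2000", "execution_status": "Remote Execution",
              "endpoint": "http://example.org/soap",
              "namespace": "urn:example:preservation", "operation": "migrate"}
    # Echo the envelope instead of sending it over the network.
    print(provide(remote, {"source": "img.tif"}, {}, lambda url, env: env))
```

Injecting the transport keeps the dispatch logic testable without a live SOAP server such as Apache AXIS.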
Hypothetical Example
- Russel Coight is an astronomer at the Australia Telescope National Facility (ATNF)
- Large collection of astronomy images in TIFF format
- ImageViewer 1.0 used to view TIFF images
- New version of ImageViewer (2.0) no longer supports TIFF
- RLG recommends that the TIFF format be replaced by JPEG2000 for archival storage
Future Directions
- Ongoing refinement
- Web Services Resource Framework (WSRF)
- Evaluation within a real scientific archive
- Integrate GDFR, PRONOM, TOM, XENA, media longevity estimates and risk assessments
- Trusted services - quality ratings
- Composite services
- AONS project – NLA and UK DCC
Conclusions
- (No need to) PANIC
- Collaborative effort ->
– dynamic, adaptable, intelligent
– scalable, extensible, customizable
– platform neutral
– leverages existing and emerging work
– interactive and/or automatic
– cost-effective, sustainable