 
FORTH-ICS
A Case Study of Long-Running Business Processes: Digital Information Preservation
Yannis Tzitzikas
Assistant Professor, Department of Computer Science, University of Crete
Associate Researcher, Institute of Computer Science (FORTH-ICS)
First SSME Workshop and Summer School “The Business Process in the Science of Service”
Heraklion, May 30-June 3, 2007
Outline
• What is Digital Information Preservation?
• Why is it important?
• Aspects of Preservation
• Preservation Approaches (Strategies)
• The OAIS Reference Model
• The CASPAR Project
• On Preserving the Intelligibility of Digital Objects
  – Formalizing Intelligibility and Intelligibility Gaps
  – Intelligibility-aware Processes
• Concluding Remarks and Directions for Further Research
Yannis Tzitzikas, SSME'07
What is Digital Information Preservation?
Phaistos Disk (dated to 1700 BC)
We still cannot understand it (its meaning has not been preserved).
Egyptian Pyramids
We still don’t know how the pyramids were constructed (the process has not been preserved).
Digital Objects
How can we be sure that, in the future, someone will still be able to understand this bit stream?
100110110000110111011011101110010111100111
Read as decimal character codes: 089 097 110 110 105 115
It is “Yannis” in ASCII.
How will we preserve the meaning of digital objects?
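A minimal Python sketch of why the bits alone are not enough. The slide gives only the bit stream and the decimal codes; the reading rule used below (7-bit ASCII codes, each written least significant bit first) is our assumption, reconstructed by checking it against those codes. Read with the opposite bit order, the very same bits yield a different string, and that reading rule is exactly the kind of information that has to be preserved alongside the data.

```python
# The bit stream shown on the slide.
BITS = "100110110000110111011011101110010111100111"


def decode(bits: str, width: int = 7, lsb_first: bool = True) -> str:
    """Split `bits` into fixed-width groups and map each group to an ASCII character.

    `width` and `lsb_first` are the (assumed) representation information:
    how many bits encode one character, and in which order they are written.
    """
    if len(bits) % width:
        raise ValueError("length of bit stream is not a multiple of the code width")
    chars = []
    for i in range(0, len(bits), width):
        group = bits[i:i + width]
        if lsb_first:
            group = group[::-1]  # the first bit in the stream is the least significant one
        chars.append(chr(int(group, 2)))
    return "".join(chars)


print([ord(c) for c in decode(BITS)])  # [89, 97, 110, 110, 105, 115]
print(decode(BITS))                    # Yannis
print(decode(BITS, lsb_first=False))   # MC;;Kg  (same bits, different assumption)
```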
Digital Objects: the need to preserve the process that created a digital object
• How was this image derived?
• When, and by whom, was it taken?
• How was the satellite image processed (by what algorithms and with what parameters)?
How will we preserve the digital process?
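The questions above are provenance questions. One way to keep their answers is to store, next to the image, a record of who captured it, when, and every processing step with its algorithm and parameters. The Python sketch below assumes such a record; the class names, field names and example values (satellite, algorithm names, parameters) are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class ProcessingStep:
    algorithm: str                # which algorithm was applied
    parameters: Dict[str, float]  # with what parameter values
    agent: str                    # who (person or software) ran it
    performed_at: datetime        # when it was run


@dataclass
class ProvenanceRecord:
    object_id: str
    captured_by: str              # by whom the original was taken
    captured_at: datetime         # when it was taken
    steps: List[ProcessingStep] = field(default_factory=list)

    def record(self, step: ProcessingStep) -> None:
        self.steps.append(step)

    def describe(self) -> List[str]:
        """Human-readable answer to 'how was this image derived?'."""
        lines = [f"{self.object_id} captured by {self.captured_by} "
                 f"on {self.captured_at:%Y-%m-%d}"]
        for n, s in enumerate(self.steps, 1):
            params = ", ".join(f"{k}={v}" for k, v in sorted(s.parameters.items()))
            lines.append(f"step {n}: {s.algorithm}({params}) by {s.agent} "
                         f"on {s.performed_at:%Y-%m-%d}")
        return lines


# Hypothetical example values.
rec = ProvenanceRecord("sat-image-001", "satellite-X", datetime(2007, 5, 1))
rec.record(ProcessingStep("radiometric_correction", {"gain": 1.2},
                          "ground-station", datetime(2007, 5, 2)))
rec.record(ProcessingStep("cloud_mask", {"threshold": 0.3},
                          "analyst A", datetime(2007, 5, 3)))
for line in rec.describe():
    print(line)
```

Because each step keeps its parameters, the record answers not only "who and when" but also makes it possible, in principle, to re-run the derivation.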
Digital Objects: the need to preserve everyday knowledge
I know UML, but what does this diagram specify?
If I knew Portuguese, I could map it to its English equivalent:
• Empresa → Company, Empregado → Person, Emprego → Employment
• nome → name, sobrenome → surname, idade → age
• empregue(p) → hire(p: Person), fogo(p) → fire(p: Person), aumenteIdade() → increaseAge(), promova(p, inc) → promote(p, incr)
• stipend → salary, comeceData → startDate, termineData → endDate
Plus everyday knowledge:
• A person cannot start a job before his/her birth.
• A promotion cannot lower the salary of an employee.
⇒ Now I can develop the system, or I can guess how the existing system operates.
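Once the diagram has been translated, the everyday knowledge can be written down as explicit checks instead of being left implicit. A minimal Python sketch follows. It uses the English names of the diagram, but storing a `birth_date` instead of `age` (which makes `increaseAge()` unnecessary), the salary argument of `hire`, and the example values are our assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Person:
    name: str
    surname: str
    birth_date: date  # assumption: stored instead of `age`, so no increaseAge() is needed


@dataclass
class Employment:
    person: Person
    salary: float
    start_date: date
    end_date: Optional[date] = None


class Company:
    def __init__(self, name: str) -> None:
        self.name = name
        self.employments: List[Employment] = []

    def _current(self, p: Person) -> List[Employment]:
        return [e for e in self.employments if e.person is p and e.end_date is None]

    def hire(self, p: Person, salary: float, start: date) -> Employment:
        # Everyday knowledge: a person cannot start a job before his/her birth.
        if start < p.birth_date:
            raise ValueError("a person cannot start a job before his/her birth")
        e = Employment(p, salary, start)
        self.employments.append(e)
        return e

    def fire(self, p: Person, end: date) -> None:
        for e in self._current(p):
            e.end_date = end

    def promote(self, p: Person, incr: float) -> None:
        # Everyday knowledge: a promotion cannot lower the salary of an employee.
        if incr < 0:
            raise ValueError("a promotion cannot lower the salary of an employee")
        for e in self._current(p):
            e.salary += incr


acme = Company("Acme")                          # hypothetical company
ann = Person("Ann", "Smith", date(1970, 1, 1))  # hypothetical employee
job = acme.hire(ann, 1000.0, date(1995, 9, 1))
acme.promote(ann, 200.0)
print(job.salary)                               # 1200.0
```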
Everything flows, nothing stands still [Heraclitus]
The need for tackling changes
We need to tackle changes in software/hardware and in community knowledge.
Suppose a tourist agency keeps a web site where a large number of tourist brochures (for destinations all over the world) are made available in electronic form. All the material is stored in a digital repository.
Brochure: Tour of Maribor
Metadata:
• Format: gif
• City: Maribor
• Country: Yugoslavia
• Currency.type: Yugoslav dinars (YUM)
• Currency.Value: 5
Notice that:
• The flag is no longer valid.
• The country “does not exist” any more.
• The currency is not valid.
• We may want to change the image format (e.g. .gif → .png).
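Noticing such problems can be partly automated if the archive keeps a record of what the community currently considers valid. The Python sketch below compares the brochure's metadata with such a record. The `CURRENT_KNOWLEDGE` table is hypothetical and deliberately incomplete (a snapshot as of 2006, with png assumed to be the archive's preferred image format); fields it says nothing about are simply not checked.

```python
from typing import Any, Dict, List, Set

# Hypothetical, deliberately incomplete snapshot of what is considered valid in 2006.
CURRENT_KNOWLEDGE: Dict[str, Set[Any]] = {
    "Country": {"Bosnia & Herzegovina", "Croatia", "FYROM", "Montenegro", "Serbia", "Slovenia"},
    "Currency.type": {"Slovenian Tolar"},
    "Format": {"png"},  # assumption: the archive's preferred image format
}

BROCHURE_METADATA: Dict[str, Any] = {
    "Format": "gif",
    "City": "Maribor",
    "Country": "Yugoslavia",
    "Currency.type": "Yugoslav dinars (YUM)",
    "Currency.Value": 5,
}


def stale_fields(metadata: Dict[str, Any], knowledge: Dict[str, Set[Any]]) -> List[str]:
    """Fields whose value is not among the currently valid values.
    Fields the knowledge base says nothing about are not checked."""
    return [key for key, value in metadata.items()
            if key in knowledge and value not in knowledge[key]]


print(stale_fields(BROCHURE_METADATA, CURRENT_KNOWLEDGE))
# ['Format', 'Country', 'Currency.type']
```

Note that the check can only be as good as the knowledge table: `Currency.Value` is not flagged, even though it depends on the currency, because nothing in the table describes it.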
We need to tackle changes because … everything flows, nothing stands still [Heraclitus]
Figure: the same region in 1977 (Yugoslavia) and in 2006 (Bosnia & Herzegovina, Croatia, FYROM, Montenegro, Serbia, Slovenia). In both years the flag image carries the same metadata:
• Format: gif
• Type: Flag
• Country: Yugoslavia
Tackling changes
Brochure: Tour of Maribor
Metadata in 1977 → after format migration and knowledge update (July 2006):
• Format: gif → png
• City: Maribor → Maribor
• Country: Yugoslavia → Slovenia
• Currency.type: Yugoslav dinars (YUM) → Slovenian Tolar
• Currency.Value: 5 → 3.4
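The two kinds of change can be sketched as functions that return a new version of the metadata instead of overwriting the old one. Keeping both versions, so that the 1977 description stays available, is our design assumption; the slide only shows the before and after states. A real format migration would also convert the image bytes, which this Python sketch omits.

```python
from typing import Any, Dict

Metadata = Dict[str, Any]


def migrate_format(record: Metadata, new_format: str) -> Metadata:
    """Format migration: return a copy with the new format (image conversion not shown)."""
    return {**record, "Format": new_format}


def update_knowledge(record: Metadata, changes: Metadata) -> Metadata:
    """Knowledge update: return a copy in which outdated facts are replaced."""
    return {**record, **changes}


v1977 = {
    "Format": "gif",
    "City": "Maribor",
    "Country": "Yugoslavia",
    "Currency.type": "Yugoslav dinars (YUM)",
    "Currency.Value": 5,
}
v2006 = update_knowledge(
    migrate_format(v1977, "png"),
    {"Country": "Slovenia", "Currency.type": "Slovenian Tolar", "Currency.Value": 3.4},
)
versions = {"1977": v1977, "July 2006": v2006}  # the old version is kept, not overwritten

for key in v1977:
    if v1977[key] != v2006[key]:
        print(f"{key}: {v1977[key]} -> {v2006[key]}")
```

Running it prints one line per changed field (Format, Country, Currency.type, Currency.Value), while `versions["1977"]` still describes the brochure as it was originally published.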
Preservation of Digital Information: why is it important?
• The world produces around 2 exabytes (1 exabyte = 2^60 bytes) of unique information per year,
  – 90% of which is digital, with a 50% annual growth rate.
• “Everything flows, nothing stands still” [Heraclitus]
⇒ Digital information has to be preserved not only against hardware and software technology changes, but also against changes in the knowledge of the community.
Aspects of Preservation
But what should we preserve?
• We certainly have to preserve the bits of the digital objects.
• We should also try to preserve the information carried by the digital objects:
  – their accessibility
  – their integrity
  – their authenticity
  – their provenance
  – their intelligibility (by human or artificial actors)
Preservation has been termed “interoperability with the future”.
What are the current preservation approaches and initiatives?
Current preservation approaches
Approaches
• Replication: keep multiple copies.
• Refreshing: copy data onto newer media or systems.
• Migration: replace digital objects in old formats with "equivalent" objects in new formats.
• Emulation: an emulator duplicates (provides an emulation of) the functions of one system using a different system, so that the second system behaves like (and appears to be) the first system.
Standards
• OAIS (discussed next)
Ongoing EU projects
• PLANETS
  – Objective: support humans in deciding which preservation policy (emulation, migration) to adopt, based on criteria such as cost and loss of information.
• CASPAR (discussed next)
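Replication only protects the information if damaged copies can be detected and repaired. A minimal Python sketch of one simple policy, majority voting over the copies, follows; the policy, the function name and the example bytes are our assumptions, not something the approaches above prescribe.

```python
from collections import Counter
from typing import List


def repair_replicas(replicas: List[bytes]) -> List[bytes]:
    """Replication sketch: if a strict majority of the copies agree, return that
    content for every copy (i.e. overwrite the deviating ones); otherwise give up."""
    content, count = Counter(replicas).most_common(1)[0]
    if 2 * count <= len(replicas):
        raise RuntimeError("no majority: cannot tell which copy is intact")
    return [content] * len(replicas)


copies = [b"Yannis", b"Yannhs", b"Yannis"]  # hypothetical: the second copy is corrupted
print(repair_replicas(copies))  # [b'Yannis', b'Yannis', b'Yannis']
```

With only two copies a single corruption already leaves no majority, which is one reason to keep at least three.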
OAIS: Open Archival Information System (ISO 14721:2003)
OAIS: Open Archival Information System
OAIS: An archive, consisting of an organization of people and systems, that has accepted the responsibility to preserve information and make it available for a Designated Community (OAIS 1.7.2).
• Development led by the Consultative Committee for Space Data Systems (CCSDS)
• Published in early 2003 as ISO 14721:2003
• Delivers two high-level models:
  – Information Model
  – Functional Model
OAIS Information Model: Kinds of Metadata
• Representation Information
  – objective: to take a collection of bits and convert it into something useful
  – key notions: Structure, Semantics, Algorithms, ...
• Preservation Description Information
  – objective: to document the origins and relevance of any digital information
  – key notions: Provenance, Fixity, Reference and Context
• Descriptive Information
  – role: important for data management, discovery and access
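Of the Preservation Description Information, Fixity is the most mechanical part: a value computed from the bits that later reveals whether they have changed. A minimal Python sketch, assuming SHA-256 as the fixity algorithm (OAIS asks for fixity information; this particular choice is ours) and hypothetical values for Reference, Provenance and Context:

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PreservationDescription:
    reference: str   # identifier of the object (hypothetical naming scheme)
    provenance: str  # where the object came from and what happened to it
    context: str     # how it relates to other objects
    fixity: str      # SHA-256 digest of the bits, taken when the record is created


def describe(data: bytes, reference: str, provenance: str, context: str) -> PreservationDescription:
    return PreservationDescription(reference, provenance, context,
                                   hashlib.sha256(data).hexdigest())


def fixity_ok(data: bytes, pdi: PreservationDescription) -> bool:
    """True if the bits are unchanged since the fixity value was recorded."""
    return hashlib.sha256(data).hexdigest() == pdi.fixity


original = b"Yannis"
pdi = describe(original, "obj-0001", "typed by the author, 2007", "standalone example")
print(fixity_ok(original, pdi))   # True
print(fixity_ok(b"Yannhs", pdi))  # False
```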
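OAIS Information Model
Figure: an Information Object consists of a Data Object interpreted using Representation Information; Representation Information may itself be interpreted using further Representation Information. A Data Object is either a Physical Object or a Digital Object, and a Digital Object consists of one or more Bit Sequences.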
OAIS Information Model: Kinds of Metadata
Figure (class diagram “OAIS Information Model”): Information Object; Data Object, specialized into Digital Object (made of Bit Sequences) and Physical Object; the interpretedUsing association to Representation Information; and the specializations of Representation Information: Structure Information, Semantic Information, Software Information and Algorithms Information.
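The class structure above can be sketched in Python. Because Representation Information is itself information, and so may need further Representation Information to be understood, the sketch follows the interpretedUsing links transitively to list everything a consumer would need in order to understand an object. Class and field names follow the diagram loosely, and the example network (a bit layout, an ASCII specification stored as PDF, a PDF specification) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class RepresentationInformation:
    name: str
    kind: str  # "structure", "semantic", "software" or "algorithms"
    # Representation Information may itself need Representation Information.
    interpreted_using: List["RepresentationInformation"] = field(default_factory=list)


@dataclass
class DigitalObject:
    bit_sequences: List[str]  # bit sequences written as '0'/'1' strings for readability


@dataclass
class InformationObject:
    data_object: DigitalObject
    interpreted_using: List[RepresentationInformation]


def required_representation(obj: InformationObject) -> Set[str]:
    """Names of all Representation Information reachable via interpretedUsing links."""
    seen: Set[str] = set()
    stack = list(obj.interpreted_using)
    while stack:
        ri = stack.pop()
        if ri.name in seen:
            continue  # already visited; also protects against cycles
        seen.add(ri.name)
        stack.extend(ri.interpreted_using)
    return seen


# Hypothetical representation network for the "Yannis" bit stream.
pdf_spec = RepresentationInformation("PDF format specification", "structure")
ascii_spec = RepresentationInformation("ASCII specification (stored as PDF)", "semantic", [pdf_spec])
bit_layout = RepresentationInformation("7-bit codes, least significant bit first", "structure")
name_obj = InformationObject(
    DigitalObject(["100110110000110111011011101110010111100111"]),
    [bit_layout, ascii_spec],
)
print(sorted(required_representation(name_obj)))
# ['7-bit codes, least significant bit first', 'ASCII specification (stored as PDF)',
#  'PDF format specification']
```

The PDF specification appears in the result even though the object does not reference it directly: it is needed only to read the ASCII specification.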
OAIS Functional Model
Figure: Producer → (SIP) → Ingest → (AIP) → Archival Storage → (AIP) → Access → (DIP) → Consumer. Ingest passes descriptive information to Data Management, which answers queries from Access with result sets; the Consumer sends queries and orders to Access. Preservation Planning and Administration span the archive, with Management outside it.
Functional Model of OAIS (6 entities):
• Ingest
• Archival Storage
• Data Management
• Administration
• Preservation Planning
• Access
Information packages:
• SIP: Submission Information Package
• AIP: Archival Information Package, which consists of
  – IO (Information Object): Data Object + Representation Information (e.g. format)
  – PDI (Preservation Description Information): provenance, context, fixity
• DIP: Dissemination Information Package
  – the version of the information package delivered to the Consumer in response to an access request. It may differ in form (e.g. TIFF to JPEG) or in content (e.g. amount of metadata supplied) from the package that resides in the archival store.
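A minimal Python sketch of how the three kinds of package flow through Ingest, Archival Storage and Access. It models only those three functional entities, reduces Representation Information to a format name and PDI to provenance plus a SHA-256 fixity value, and uses hypothetical identifiers and placeholder bytes; it illustrates the flow under these assumptions and is not an OAIS implementation.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SIP:  # Submission Information Package, as sent by the Producer
    object_id: str
    data: bytes       # Data Object
    fmt: str          # Representation Information, reduced here to a format name
    provenance: str


@dataclass(frozen=True)
class AIP:  # Archival Information Package: IO + PDI
    object_id: str
    data: bytes       # IO: Data Object ...
    fmt: str          # ... + Representation Information
    provenance: str   # PDI
    fixity: str       # PDI: SHA-256 digest computed at ingest


@dataclass(frozen=True)
class DIP:  # Dissemination Information Package, as delivered to the Consumer
    object_id: str
    data: bytes
    fmt: str
    provenance: Optional[str]  # a DIP may carry less metadata than the AIP


archival_storage: Dict[str, AIP] = {}


def ingest(sip: SIP) -> AIP:
    """Ingest: turn a SIP into an AIP and hand it to Archival Storage."""
    aip = AIP(sip.object_id, sip.data, sip.fmt, sip.provenance,
              hashlib.sha256(sip.data).hexdigest())
    archival_storage[aip.object_id] = aip
    return aip


def access(object_id: str, with_metadata: bool = False) -> DIP:
    """Access: build a DIP from the stored AIP after checking its fixity."""
    aip = archival_storage[object_id]
    if hashlib.sha256(aip.data).hexdigest() != aip.fixity:
        raise RuntimeError(f"fixity check failed for {object_id}")
    return DIP(aip.object_id, aip.data, aip.fmt,
               aip.provenance if with_metadata else None)


ingest(SIP("maribor-brochure", b"GIF89a", "gif", "tourist agency, 1977"))  # placeholder bytes
dip = access("maribor-brochure")
print(dip.fmt, dip.provenance)  # gif None
```

Here the DIP differs from the AIP only in content (the provenance is left out unless requested); a change of form, such as TIFF to JPEG, would be a conversion step inside `access`, which the sketch omits.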