The Open Archives Initiative: a low-barrier framework for interoperability

SLIDE 1

www.openarchives.org

The Open Archives Initiative: a low-barrier framework for interoperability

Carl Lagoze Computing and Information Science Cornell University lagoze@cs.cornell.edu

SLIDE 2

Interoperability Trade-offs

Diagram: cost vs. functionality, ranging from "less function, more acceptance" to "more function, less acceptance"
Standards shown: ASCII, SGML, HTML, MARC/AACR2, FGDC, Dublin Core, OAI-PMH

SLIDE 3

The Open Archives Initiative

The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content. The Open Archives Initiative has its roots in an effort to enhance access to e-print archives as a means of increasing the availability of scholarly communication. … The fundamental technological framework and standards that are developing to support this work are, however, independent of both the type of content offered and the economic mechanisms surrounding that content, and promise to have much broader relevance in opening up access to a range of digital materials.

– OAI Mission Statement

SLIDE 4

OAI Protocol for Metadata Harvesting (OAI-PMH)

The goal of the Open Archives Initiative Protocol for Metadata Harvesting … is to supply and promote an application-independent interoperability framework that can be used by a variety of communities who are engaged in publishing content on the Web. The OAI protocol … permits metadata harvesting.

SLIDE 5

OAI-PMH: A simple two-party model for sharing structured information

Diagram: Data Providers expose metadata; Service Providers harvest it (metadata harvesting) and build services on it, such as discovery, current awareness, and preservation.
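
The two-party model can be sketched in a few lines of Python. This is an illustrative, in-memory sketch only: the `Record`, `DataProvider`, and `ServiceProvider` names are hypothetical, and a real data provider answers OAI-PMH requests over HTTP rather than through method calls. It shows the division of labor: the data provider only exposes records (optionally filtered by datestamp and set), while the service provider harvests them and builds its own service, here a trivial title index.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Iterable, Optional


@dataclass
class Record:
    """Hypothetical metadata record held by a data provider."""
    identifier: str
    datestamp: date
    sets: set[str] = field(default_factory=set)
    metadata: dict[str, str] = field(default_factory=dict)


class DataProvider:
    """Hypothetical repository that exposes its metadata records."""

    def __init__(self, records: Iterable[Record]) -> None:
        self._records = list(records)

    def list_records(self, from_: Optional[date] = None, until: Optional[date] = None,
                     set_spec: Optional[str] = None) -> list[Record]:
        """Return records with from_ <= datestamp <= until that belong to set_spec."""
        return [
            rec for rec in self._records
            if (from_ is None or rec.datestamp >= from_)
            and (until is None or rec.datestamp <= until)
            and (set_spec is None or set_spec in rec.sets)
        ]


class ServiceProvider:
    """Hypothetical harvester that builds a title index from harvested records."""

    def __init__(self) -> None:
        self.index: dict[str, str] = {}

    def harvest(self, provider: DataProvider, from_: Optional[date] = None) -> int:
        """Harvest records dated on or after from_ and return the size of the index."""
        for rec in provider.list_records(from_=from_):
            self.index[rec.identifier] = rec.metadata.get("title", "")
        return len(self.index)


if __name__ == "__main__":
    dp = DataProvider([
        Record("oai:example.org:1", date(2002, 1, 10), {"physics"}, {"title": "Paper A"}),
        Record("oai:example.org:2", date(2002, 3, 5), {"cs"}, {"title": "Paper B"}),
    ])
    sp = ServiceProvider()
    print(sp.harvest(dp, from_=date(2002, 2, 1)))  # prints 1: only record 2 is recent enough
```

A service provider that remembers the date of its last harvest could pass it as `from_` and fetch only what changed since then, which is one simple way to support current awareness.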

SLIDE 6

Yes, it's about resource discovery over distributed collections

Diagram: metadata records (Author, Title, Abstract, Identifier)
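
OAI-PMH requires every repository to support unqualified Dublin Core, exposed under the `oai_dc` metadata prefix. The following minimal Python sketch shows how a service provider might pull the fields named on this slide out of an `oai_dc` record. Mapping author to `dc:creator` and abstract to `dc:description` is an assumption about a typical crosswalk, and the sample record and `parse_oai_dc` helper are hypothetical. Values come back as lists because Dublin Core elements are optional and repeatable.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by oai_dc records in OAI-PMH 2.0.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Assumed mapping from the slide's labels to Dublin Core elements.
FIELDS = {
    "author": "creator",
    "title": "title",
    "abstract": "description",
    "identifier": "identifier",
}


def parse_oai_dc(xml_text: str) -> dict:
    """Return {label: [values]} for each slide field found in an oai_dc record."""
    root = ET.fromstring(xml_text)  # root element is <oai_dc:dc>
    return {
        label: [el.text or "" for el in root.findall(f"dc:{element}", NS)]
        for label, element in FIELDS.items()
    }


# Hypothetical sample record.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An Example E-print</dc:title>
  <dc:creator>Doe, Jane</dc:creator>
  <dc:description>A short abstract.</dc:description>
  <dc:identifier>http://example.org/eprints/42</dc:identifier>
</oai_dc:dc>"""

if __name__ == "__main__":
    print(parse_oai_dc(SAMPLE))
```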

SLIDE 7

Facilitating/Monitoring Longevity of Distributed Content

Diagram: a Preservation Service drawing on web sites (via selective web crawling) and managed repositories (via metadata harvesting of preservation metadata); a Policy Enforcer applies actions (P1/A1, P2/A2, P3/A3) and produces event records.

SLIDE 8

Personalization of Content

Diagram: one DigitalObject (RealAudio video, PowerPoint presentation, SMIL synchronization metadata, structural metadata) offered through Portal A and Portal B, with a Tool Repository.

View A:

  • View slides
  • View video
  • View synchronized presentation using applet

View B:

  • Get transcript of audio
  • Search for keyword
  • Get slides translated to French

SLIDE 9

Cross-Repository Reference Linking

Diagram: a Linkage Service gathers citation metadata from multiple repositories.

SLIDE 10

Brief History of the OAI

  • Motivation: expand impact of ePrint archives through federation
  • 1999: Santa Fe Meeting and convention
  • 2000: OAI-PMH formation
    – Scope broadens
    – OAI steering committee
  • 2001: OAI-PMH v. 1.0 "experimental" protocol
  • 2002: OAI-PMH v. 2.0 "stable" protocol
SLIDE 11

OAI-PMH Key technical features

  • Deploy-now technology (80/20 rule)
  • Simple HTTP encoding
  • Foundation of established XML standards
  • Multiple metadata formats
  • Repository partitioning (sets)
  • Selective harvesting (sets and dates)
  • Clean partition between core and implementation-specific extensions
    – Multiple item-level metadata
    – Collection-level metadata
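
"Simple HTTP encoding" means an OAI-PMH request is an ordinary HTTP GET (or POST) whose `verb` parameter names the request and whose other parameters carry its arguments. The Python sketch below builds such a URL for a selective harvest by set and date range. The base URL `http://example.org/oai` and the `oai_request_url` helper are hypothetical, and `from_` is used only because `from` is a reserved word in Python.

```python
from typing import Optional
from urllib.parse import urlencode


def oai_request_url(base_url: str, verb: str, **args: Optional[str]) -> str:
    """Encode an OAI-PMH request as an HTTP GET URL; arguments set to None are omitted."""
    params = {"verb": verb}
    for name, value in args.items():
        if value is not None:
            # Map the Python-safe keyword from_ to the protocol argument "from".
            params[name.rstrip("_")] = value
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Selective harvest: Dublin Core records in set "physics" with datestamps in 2002.
    print(oai_request_url(
        "http://example.org/oai", "ListRecords",
        metadataPrefix="oai_dc", set="physics",
        from_="2002-01-01", until="2002-12-31",
    ))
```

Keeping the request as plain query parameters is what lets any HTTP client, or even a browser, act as a harvester.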

SLIDE 12

OAI Verbs

  • Identify – repository characteristics
  • ListMetadataFormats – supported formats (DC required)
  • ListSets – repository partitioning
  • ListRecords – (selectively) harvest metadata
  • ListIdentifiers – (selectively) harvest metadata identifiers

  • GetRecord – known item retrieval
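
A harvester typically combines these verbs, for example calling ListRecords repeatedly until the repository has returned everything. The slides do not cover flow control; in OAI-PMH 2.0 a large response is split into chunks, each ending with a `resumptionToken` that the harvester sends back (as the only argument besides the verb) to get the next chunk, and an empty or missing token marks the end. The Python sketch below follows that loop and yields record identifiers. The `harvest_identifiers` function is hypothetical, and the `fetch` callable stands in for the HTTP request so the example runs without a live repository.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest_identifiers(fetch: Callable[[Dict[str, str]], str],
                        metadata_prefix: str = "oai_dc") -> Iterator[str]:
    """Yield identifiers from every ListRecords chunk, following resumptionTokens."""
    args = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(args))
        for header in root.iterfind("oai:ListRecords/oai:record/oai:header", OAI_NS):
            yield header.findtext("oai:identifier", namespaces=OAI_NS)
        token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=OAI_NS)
        if not token:  # missing or empty token: this was the last chunk
            return
        # A resumptionToken request carries only the verb and the token.
        args = {"verb": "ListRecords", "resumptionToken": token}


if __name__ == "__main__":
    # Two canned response chunks standing in for a live repository.
    chunks = {
        None: ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
               '<record><header><identifier>oai:example.org:1</identifier></header></record>'
               '<resumptionToken>chunk2</resumptionToken></ListRecords></OAI-PMH>'),
        "chunk2": ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
                   '<record><header><identifier>oai:example.org:2</identifier></header></record>'
                   '<resumptionToken/></ListRecords></OAI-PMH>'),
    }
    print(list(harvest_identifiers(lambda a: chunks[a.get("resumptionToken")])))
```

In a real harvester `fetch` might encode the arguments as a GET URL and read the response with `urllib.request.urlopen`. Error responses such as `noRecordsMatch` contain no ListRecords element, so this loop simply yields nothing for them.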
SLIDE 13

Measures of Success

  • Registered data providers
  • Adoption by major projects
  • Acceptance as ‘fundamental infrastructure’ for research and implementation

SLIDE 14

OAI Registered Data Providers

Chart: Total # Registered Sites (y-axis, up to 120) by month, 1/15/2001 to 7/15/2002

SLIDE 15

National Science Digital Library (NSDL)

  • Very large-scale distributed digital library
    – 1,000,000 users
    – 10,000,000 items
    – 100,000 collections
  • Large institutional and funding commitment
    – $25M+ funding
    – Over 80 collaborating institutions
  • Technical infrastructure builds on OAI-PMH foundation
    – Aggregation and dissemination of metadata

  • http://www.nsdl.org
SLIDE 16

Fundamental Infrastructure

  • Eprints.org servers
    – e.g., Caltech ePrint framework
  • Open Language Archives Community
  • JISC FAIR awards
  • Mellon OAI service providers
  • ECDL, DCADL, JCDL research papers
SLIDE 17

Some questions remain

  • Is OAI-PMH really low-barrier infrastructure?
    – NSDL experience indicates that significant barriers remain
  • Utility of core metadata (unqualified DC)
    – Experience at NSDL and elsewhere raises doubts
  • Utility outside of resource discovery
    – Certification, reference linking, etc.

SLIDE 18

Future Questions and Directions

  • "Standardization"?
    – De facto?
    – Maintenance agency?
    – Formal standards agency?
  • Future OAI-PMH versions?
    – Expanded functionality?
  • Targeted ‘application profiles’?
    – ePrints community?