DigCCur 2007, Chapel Hill, North Carolina, April 18, 2007



slide-1
SLIDE 1

Jane Greenberg, Associate Professor and Director, SILS Metadata Research Center, School of Information and Library Science, University of North Carolina at Chapel Hill

Consensus Building and Prioritizing <Metadata> Development for Project DRIADE: A Case Study

DigCCur 2007, Chapel Hill, North Carolina, April 18, 2007

slide-2
SLIDE 2

Overview

Introduce DRIADE

Motivation

Consensus building

Functional requirements

Metadata framework

Conclusions and next steps

Implications for digital curation education

slide-3
SLIDE 3

DRIADE: Digital Repository of Information and Data for Evolution

Internet impact / “small science”

Knowledge Network for Biocomplexity (KNB)

Marine Metadata Initiative (MMI)

Evolutionary biology

Ecology, genomics, paleontology, population genetics, physiology, systematics, …genomics

Data deposition (GenBank, TreeBASE)

Supplementary data

Molecular Biology and Evolution

http://www.caffedriade.com/

slide-4
SLIDE 4

DRIADE’s goals

One-stop shopping for scientific data objects supporting published research

Support data acquisition, preservation, resource discovery, data sharing, and data reuse of heterogeneous digital datasets

Balance a need for low barriers with higher-level … data synthesis

slide-5
SLIDE 5

DRIADE Team

NESCent

Todd Vision, Director of Informatics and Assistant Professor, Biology, UNC-CH

Hilmar Lapp, Assistant Director of Informatics

UNC-CH/SILS/MRC

Jane Greenberg, Associate Professor

Jed Dube, MRC Doctoral Fellow

Sarah Carrier, MRC Research Assistant

Amy Bouck, UNC/Duke Biology Postdoc

slide-6
SLIDE 6

Consensus building: Stakeholders’ workshop

  • 1. Unanimous support for DRIADE

⎯ Advance science, cultural change, policing

  • 2. Challenges

⎯ Scope, representation, quality control, security, cultural change, sustainability

  • 3. Priorities and next steps

⎯ Preservation – access – synthesis

  • Maslow’s hierarchy of life needs!

⎯ Cultural change: editorials, publicizing at conferences, requirements

slide-7
SLIDE 7

Functional requirements

Repositories compared: GBIF, KNB/SEEK, NSDL, ICPSR, MMI

Heterogeneous digital datasets ▪ ▪ ▪ ▪ ▪

Long-term data stewardship ▪ ▪

Tools and incentives to researchers ▪ ▪ ▪ ▪ ▪

Minimize technical expertise and time required ▪ ▪ ▪ ▪

Intellectual property rights ▪ ▪ ▪

Published Datasets

slide-8
SLIDE 8

Functional requirements

Support:

Computer-aided metadata generation / augmentation

Specialized modules linking data submission and manuscript review

Data and metadata quality control by integrating human and automatic techniques

Data security

Basic metadata repository functions, such as resource discovery, sharing, and interoperability

slide-9
SLIDE 9

DRIADE’s functional model based on OAIS

[Diagram: OAIS functional model with DRIADE functions]

Entities: Producer, Consumer, Management

OAIS functions: Ingest, Archival storage, Data Management, Administration, Preservation Planning, Access

Information packages: SIP, AIP, DIP

Consumer interactions: queries, result sets, orders

DRIADE functions:

Authentication and authorization

Data deposition

Query expansion and data discovery

Data licensing and security

Metadata and data quality curation

Metadata and data format augmentation

Data repository and metadata registry
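The package flow named in the diagram (a Producer submits a SIP, Ingest turns it into an AIP held in Archival storage, and Access answers a Consumer's queries and orders with a DIP) can be sketched roughly as follows. This is a minimal illustration of the general OAIS flow only; the `Archive` class, its method names, the `aip-N` identifiers, and the metadata fields are hypothetical and are not taken from DRIADE.

```python
from dataclasses import dataclass


@dataclass
class SIP:
    """Submission Information Package handed over by a Producer."""
    producer: str
    files: list
    metadata: dict


@dataclass
class AIP:
    """Archival Information Package kept in archival storage."""
    identifier: str
    files: list
    metadata: dict


@dataclass
class DIP:
    """Dissemination Information Package delivered to a Consumer."""
    identifier: str
    files: list
    metadata: dict


class Archive:
    """Hypothetical in-memory stand-in for Ingest, Archival storage, and Access."""

    def __init__(self):
        self._storage = {}

    def ingest(self, sip):
        # Ingest: assign an identifier and record who deposited the package.
        identifier = f"aip-{len(self._storage) + 1}"
        metadata = dict(sip.metadata, depositor=sip.producer)
        self._storage[identifier] = AIP(identifier, list(sip.files), metadata)
        return identifier

    def query(self, **criteria):
        # Access: return identifiers whose metadata match every criterion.
        return [
            identifier
            for identifier, aip in self._storage.items()
            if all(aip.metadata.get(key) == value for key, value in criteria.items())
        ]

    def order(self, identifier):
        # Access: build a DIP as a copy of the stored AIP's content.
        aip = self._storage[identifier]
        return DIP(identifier, list(aip.files), dict(aip.metadata))
```

A real repository would add the Administration, Data Management, and Preservation Planning functions, which this sketch leaves out.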

slide-10
SLIDE 10

DRIADE metadata framework

Level 1 – initial repository implementation

Preservation, access, and basic usage of data (limited use of controlled vocabularies, CVs)

Level 2 – full repository implementation

Level 1 plus expanded usage, interoperability, preservation, administration, etc.; greater use of CVs and authority control

Level 3 – “next generation” implementation

Considering Web 2.0 functionalities

slide-11
SLIDE 11

Application profiles

“…consist of data elements drawn from one or more namespace schemas combined together by implementors and optimised for a particular local application.” (Heery & Patel, 2000)

Data Elements: Title, Name, Coverage, Identifier, etc.

Namespace schemas:

⎯ Dublin Core

⎯ Data Documentation Initiative (DDI)

⎯ Ecological Metadata Language (EML)

⎯ PREMIS

⎯ Darwin Core
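As a rough illustration of the Heery and Patel definition, an application profile can be pictured as a mapping from local field labels to elements borrowed from several namespace schemas. The sketch below is hypothetical: the `PROFILE` dictionary, the two helper functions, and the particular selection of elements are illustrative assumptions, not DRIADE's actual profile.

```python
# Hypothetical sketch: local labels mapped to (namespace prefix, element) pairs.
PROFILE = {
    "Title": ("dc", "title"),
    "Name": ("dc", "creator"),
    "Coverage": ("dc", "coverage"),
    "Identifier": ("dc", "identifier"),
    "Depositor": ("ddi", "depositr"),
    "Fixity": ("premis", "fixity"),
}


def namespaces_used(profile):
    """Return the sorted namespace prefixes the profile draws on."""
    return sorted({prefix for prefix, _ in profile.values()})


def qualified_name(profile, label):
    """Return 'prefix:element' for a local label."""
    prefix, element = profile[label]
    return f"{prefix}:{element}"


print(namespaces_used(PROFILE))
print(qualified_name(PROFILE, "Title"))
```

The point of the mapping is that local labels stay stable for users while each field remains traceable to the schema it was borrowed from.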

slide-12
SLIDE 12

Why create an Application Profile?

Single existing schemes are often not sufficient

The Dublin Core scheme doesn’t meet all of DRIADE’s needs

Do not need all elements in a single scheme (e.g., in DDI or EML)

Don't want to re-invent the wheel

Interoperability

slide-13
SLIDE 13

Why does DRIADE need an application profile?

Evolutionary biology data requires a range of metadata to effectively support:

Unstructured datasets, non-standard formats

Varied data relationships, methods, software

Varied data object relationships (e.g., part of larger studies, linkages to publications)

Immediate and future dataset preservation

slide-14
SLIDE 14

Level 1+ Application Profile

Module 1: Bibliographic Citation

dc:title / Title*

dc:creator / Author*

dc:subject / Subject*

dc:publisher / Publisher*

dcterms:issued / Year*

dcterms:bibliographicCitation / Citation information*

dc:identifier / Digital Object Identifier*
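A simple completeness check over the Module 1 elements might look like the sketch below. The slides do not say what the asterisk means; the sketch assumes it marks elements that must be supplied, and the function name `missing_elements` and the example record are hypothetical.

```python
# Module 1 elements as listed on the slide; treating the asterisk as
# "required" is an assumption, not something the slide states.
MODULE_1_ELEMENTS = [
    "dc:title",
    "dc:creator",
    "dc:subject",
    "dc:publisher",
    "dcterms:issued",
    "dcterms:bibliographicCitation",
    "dc:identifier",
]


def missing_elements(record, required=MODULE_1_ELEMENTS):
    """Return the required elements that are absent or blank in the record."""
    missing = []
    for name in required:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Hypothetical, deliberately incomplete record.
example = {
    "dc:title": "An example article title",
    "dc:creator": "A. Author",
    "dcterms:issued": "2007",
}
print(missing_elements(example))
```

A check of this kind is one way the automatic half of quality control could hand incomplete records over to a human curator.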

slide-15
SLIDE 15

Level 1+ Application Profile

Module 2: Data Object

dc:creator / Name

dc:title / Data set title

dc:identifier / Data set identifier

fixity (PREMIS) / (hidden)

dc:relation / Digital Object Identifier of published article

DDI: <depositr> / Depositor or submitter name*

DDI: <contact> / Contact information for <depositr>*

dc:rights / Rights statement

dc:description / Description of the data set*

dc:subject / Keywords describing the data set*

dc:date / Date modified

dc:date / (hidden)

dc:format / File format

dc:format / File size

dc:software / Software

dc:coverage / Locality

dc:coverage / Date range
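The hidden fixity element in Module 2 is the kind of value a repository computes rather than asks the depositor for. A minimal sketch, assuming a SHA-256 checksum and using the PREMIS semantic unit names `messageDigestAlgorithm` and `messageDigest` as dictionary keys, could look like this. The function names are hypothetical, and the slides do not say which algorithm DRIADE intended.

```python
import hashlib


def compute_fixity(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks and return a PREMIS-style fixity record."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data sets need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "messageDigestAlgorithm": algorithm,
        "messageDigest": digest.hexdigest(),
    }


def verify_fixity(path, stored):
    """Recompute the digest and compare it with a stored fixity record."""
    current = compute_fixity(path, stored["messageDigestAlgorithm"])
    return current["messageDigest"] == stored["messageDigest"]
```

Storing the record at deposit time and rerunning `verify_fixity` later is one way to detect whether a data file has changed since it was archived.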

slide-16
SLIDE 16

Level 3, brainstorm…

Personalization, query results, workflow “macros”, user interface

Virtual societies utilizing “social tagging”

Integration and extension of existing ontologies

Implementation of emerging standards

⎯ Minimal Information About a Phylogenetic Analysis (MIAPA)

Harvesting metadata (pull) / Exposing metadata (push)

Visualizations: topic clustering, data relationship maps

slide-17
SLIDE 17

Conclusions and next steps

Conclusions

Teamwork required

⎯ stakeholders (scientists and journal representatives), metadata experts, and sustainability partner

Late to the game, benefit from what’s been accomplished (e.g., application profile, models)

Need to understand DRIADE’s unique goals

Next steps:

Survey and use-case/life-cycle studies

Metadata application profile experiment

slide-18
SLIDE 18

Implications for digital curation education

Student participation, service learning

Curriculum needs to address the whole picture:

⎯ Digital resource life cycle

⎯ Metadata life cycle

⎯ IA components

⎯ Human factors

Language barriers and communication skills

Metadata facets… woo woo???

Conferences like DigCCur

slide-19
SLIDE 19

References

Application profiles: mixing and matching metadata schemas: http://www.ariadne.ac.uk/issue25/app-profiles/

Application Profiles, or how to Mix and Match Metadata Schemas: http://www.cultivate-int.org/issue3/schemas/

Dublin Core Element Set: http://dublincore.org/documents/dces/

Data Documentation Initiative (DDI): http://www.icpsr.umich.edu/DDI/

Ecological Metadata Language (EML): http://knb.ecoinformatics.org/software/eml/

PREMIS: http://www.oclc.org/research/projects/pmwg/

Darwin Core Wiki: http://wiki.tdwg.org/twiki/bin/view/DarwinCore/WebHome