Tools and Resources for Data Curation, Stephen Abrams and Perry Willett

SLIDE 1

Tools and Resources for Data Curation

Stephen Abrams Perry Willett UC Curation Center / California Digital Library

Summer Institute June 2014

SLIDE 2

Agenda

  • Who we are
  • Data curation, publication, and sharing
  • Tools to help you

– DMPTool
– DataUp
– Dash
– WAS

  • Summary
  • Discussion

BITSS Summer Institute 2 June 2014

SLIDE 3

Who we are

“To support the University of California community’s pursuit of scholarship and … public service mission”


calibermag.org

UC Libraries

SLIDE 4

Data curation, publication, and sharing

  • Increasingly, a requirement for funding and publication
  • Transparency ↔ trust
  • Reduce needless duplication of effort
  • Leverage prior investments
  • Expand the reach of your research, and get credit for it
  • Good for science, good for scientists


www.flickr.com/photos/_after8_/4052028795 berkeley.edu/teach www.flickr.com/photos/infocux/8450190120

SLIDE 5

Data curation, publication, and sharing

  • Create/acquire a dataset in a form that is inherently preservable and (re)usable
  • Describe the dataset in scientifically-meaningful ways
  • Give the dataset a unique identifier for persistent citation
  • License the dataset under CC0 or CC-BY
  • Deposit the dataset in a (non-commercial) repository where it will receive pro-active curation management
  • Expose the dataset for harvesting by abstracting/indexing services and search engines
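
The steps above can be sketched as a small record that carries the description, identifier, license, and repository together. This is only an illustration: the `DatasetRecord` class, its field names, the citation layout, and the sample values (including the identifier) are assumptions, not a format prescribed by any tool in this deck.

```python
from dataclasses import dataclass
from typing import List

# Licenses named on the slide; anything else is flagged for review.
OPEN_LICENSES = {"CC0", "CC-BY"}


@dataclass
class DatasetRecord:
    """Hypothetical minimal dataset description; field names are assumptions."""
    creators: List[str]
    title: str
    year: int
    repository: str
    identifier: str
    license: str

    def problems(self) -> List[str]:
        """List the curation steps this record has not yet satisfied."""
        issues = []
        if not self.creators or not self.title.strip():
            issues.append("missing description (creators or title)")
        if not self.identifier.strip():
            issues.append("missing persistent identifier")
        if self.license not in OPEN_LICENSES:
            issues.append("license is not CC0 or CC-BY")
        if not self.repository.strip():
            issues.append("no repository named")
        return issues

    def citation(self) -> str:
        """Build a citation string (this layout is an assumed convention)."""
        authors = "; ".join(self.creators)
        return f"{authors} ({self.year}). {self.title}. {self.repository}. {self.identifier}"


# Made-up example values, including the identifier.
record = DatasetRecord(
    creators=["Doe, Jane"],
    title="Example survey data",
    year=2014,
    repository="Example Repository",
    identifier="doi:10.1234/EXAMPLE",
    license="CC0",
)
print(record.problems())
print(record.citation())
```

A record with no outstanding problems has, in this sketch, a description, an identifier, an open license, and a named repository, which mirrors the checklist on the slide.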

SLIDE 6

DMPTool

  • “Fulfill institutional and funder mandates”


dmptool.org blog.dmptool.org github.com/CDLUC3/dmptool/wiki

SLIDE 7

DMPTool

  • Free and open for all
  • Hosted by CDL, with code released as open source
  • Supports data management requirements for NSF, NIH, NEH, NOAA, IMLS, and other federal agencies and private funders
  • New version released on May 29
  • Developed by a partnership of universities, museums, and researchers, with support from the Sloan Foundation and IMLS

SLIDE 8

DMPTool

  • In addition to fulfilling external requirements, the DMPTool provides:

– Framework to plan for management of research data
– Comprehensive list of issues involved with data management best practices
– Information about local resources and services: repositories, workshops, consultation services, etc.
– Community of stakeholders: researchers, lab managers, IT specialists, archivists, grant administrators, funding agencies

SLIDE 9

DataUp

  • “Curation for tabular datasets”


dataup.org dataup.cdlib.org

SLIDE 10

DataUp

  • Excel is often the database of choice for research

SLIDE 11

DataUp

  • Drag-and-drop data upload
  • Opportunity to add descriptive metadata
  • Assignment of persistent identifier / generation of persistent citation
  • Best practices check
  • Packaging and submission to ONEShare repository
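
To make the "best practices check" concrete, here is a minimal sketch of what such a check on a tabular file could look like. The slide does not list the rules DataUp actually applies, so the three rules below, and the `check_table` function itself, are assumptions chosen for illustration.

```python
import csv
import io
from typing import List


def check_table(csv_text: str) -> List[str]:
    """Return human-readable warnings for a CSV table (illustrative rules only)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["table is empty"]
    header, body = rows[0], rows[1:]
    warnings = []

    # Assumed rule 1: every column needs a non-blank name.
    for position, name in enumerate(header, start=1):
        if not name.strip():
            warnings.append(f"column {position} has no header")

    # Assumed rule 2: column names must be unique (case-insensitive).
    seen = set()
    for name in header:
        key = name.strip().lower()
        if key and key in seen:
            warnings.append(f"duplicate header: {name.strip()}")
        seen.add(key)

    # Assumed rule 3: every data row has as many cells as the header.
    for line_number, row in enumerate(body, start=2):
        if len(row) != len(header):
            warnings.append(
                f"row {line_number} has {len(row)} cells, expected {len(header)}"
            )
    return warnings


# Made-up sample with a repeated column name and a short row.
sample = "site,temp,temp\nA,1.5,2.0\nB,3.1\n"
for warning in check_table(sample):
    print(warning)
```

Returning warnings rather than raising an error lets the researcher see every issue at once and decide which to fix before submission.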

Performed automatically
SLIDE 12

EZID

  • “Long-term identifiers made easy”


ezid.cdlib.org

SLIDE 13

EZID

  • “Long-term identifiers made easy”

SLIDE 14

EZID

  • “Long-term identifiers made easy”

No more 404 errors!
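
The idea behind avoiding 404 errors is indirection: a citation carries the identifier, and the identifier is mapped to wherever the data currently lives. The sketch below is a toy in-memory resolver, not EZID's interface; the `Resolver` class, its method names, the identifier, and the URLs are all made up for illustration.

```python
class Resolver:
    """Toy in-memory resolver; a stand-in for a real identifier service."""

    def __init__(self):
        self._targets = {}

    def register(self, identifier: str, url: str) -> None:
        """Bind a new identifier to the data's current location."""
        if identifier in self._targets:
            raise ValueError(f"already registered: {identifier}")
        self._targets[identifier] = url

    def update(self, identifier: str, new_url: str) -> None:
        """Point an existing identifier at a new location."""
        if identifier not in self._targets:
            raise KeyError(identifier)
        self._targets[identifier] = new_url

    def resolve(self, identifier: str) -> str:
        """Return the current location for an identifier."""
        return self._targets[identifier]


resolver = Resolver()
resolver.register("doi:10.1234/EXAMPLE", "https://old.example.org/data.csv")

# The data moves; only the mapping changes, not the published citation.
resolver.update("doi:10.1234/EXAMPLE", "https://new.example.org/data.csv")
print(resolver.resolve("doi:10.1234/EXAMPLE"))
```

A paper that cites the identifier keeps working after the move, whereas a paper that cited the old URL directly would now lead to a broken link.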

SLIDE 15

EZID

  • “Long-term identifiers made easy”

DOI for persistent citation and bi-directional linking between publications and underlying data
SLIDE 16

EZID

  • “Long-term identifiers made easy”

SLIDE 17

Merritt

  • “Preservation and access”


merritt.cdlib.org

  • No prescriptive requirements on content genre, type, format, structure, or metadata
  • Strong versioning maintains complete change history
  • Restricted or public access – under your control
  • Enforceable data use agreements (DUAs)
  • Storage replication to UCLA and UCSD, with ongoing auditing
  • Integration with EZID and DataONE
  • Proactive preservation analysis, planning, and intervention
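
One common way to audit replicated storage is a fixity check: record a checksum when the file is deposited, then periodically recompute it for every copy. The slide does not describe how Merritt performs its auditing, so the sketch below is a generic illustration; the `audit` function, the replica names, and the choice of SHA-256 are assumptions.

```python
import hashlib
from typing import Dict, List


def digest(content: bytes) -> str:
    """Compute a SHA-256 checksum for a file's bytes."""
    return hashlib.sha256(content).hexdigest()


def audit(expected: str, replicas: Dict[str, bytes]) -> List[str]:
    """Return the names of replicas whose content no longer matches."""
    return sorted(
        name for name, content in replicas.items() if digest(content) != expected
    )


# Checksum recorded at deposit time.
original = b"site,temp\nA,1.5\n"
recorded = digest(original)

# Hypothetical replica names; the third copy has silently changed.
replicas = {
    "primary": original,
    "replica-1": original,
    "replica-2": b"site,temp\nA,1.6\n",
}
print(audit(recorded, replicas))
```

A replica flagged by the audit can then be repaired from one of the copies that still matches the recorded checksum.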

SLIDE 18

DataONE

  • “Data observation network for Earth”


dataone.org

  • Cyberinfrastructure

– Distributed grid of member and coordinating nodes
– Aggregated discovery
– Investigator’s toolkit

  • Community
SLIDE 19

Dash

  • “Data sharing made easy”


datashare.ucsf.edu

SLIDE 20

Dash

  • Preservation repositories are complex systems
  • Far too often, their interfaces are complicated and meant only for IT professionals and archivists
  • Dash provides a set of user-friendly screens to step through the process:

– Select/upload files associated with a dataset
– Augment with descriptive metadata
– Review that the dataset meets requirements and is ready
– Submit to the Merritt preservation repository with optional replication to DataONE
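
The four steps above amount to a gated workflow: submission is only allowed once files are attached and the description is complete. The sketch below illustrates that gating only; it is not Dash's code, and the `Submission` class and the list of required fields are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed required fields; the slide does not list the real requirements.
REQUIRED_FIELDS = ("title", "creator", "description")


@dataclass
class Submission:
    """Hypothetical dataset submission moving through the four steps."""
    files: List[str] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)
    submitted: bool = False

    def missing(self) -> List[str]:
        """Review step: report what is still needed before submission."""
        gaps = [] if self.files else ["files"]
        gaps += [
            name for name in REQUIRED_FIELDS
            if not self.metadata.get(name, "").strip()
        ]
        return gaps

    def submit(self) -> None:
        """Submit step: refuse unless the review step finds no gaps."""
        gaps = self.missing()
        if gaps:
            raise ValueError("not ready, missing: " + ", ".join(gaps))
        self.submitted = True


draft = Submission()
draft.files.append("survey.csv")            # select/upload
print(draft.missing())                      # review: metadata still missing
draft.metadata.update(                      # augment with description
    title="Example survey data",
    creator="Doe, Jane",
    description="Made-up example dataset",
)
draft.submit()                              # submit
print(draft.submitted)
```

Keeping the review logic in one place means the same check can drive both the on-screen checklist and the final submit button.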

SLIDE 21

Dash

  • Upload dataset files

SLIDE 22

Dash

  • Add descriptive information

SLIDE 23

Dash

  • Review the dataset

SLIDE 24

Dash

  • Submit to a repository

SLIDE 25

Dash

  • Search/browse and discovery

SLIDE 26

WAS

  • “Capture and preserve the web”


was.cdlib.org webarchives.cdlib.org

SLIDE 27

WAS

  • The web is a volatile environment

SLIDE 28

WAS

  • WAS captures and preserves important web content

SLIDE 29

WAS

  • WAS captures the web over time

SLIDE 30

WAS

  • WAS provides curators with tools to capture the free web:

– Schedule web crawls on a regular or customized basis
– Focus on the website itself or include linked sites
– Brief 1-hour or full 36-hour crawls
– Analyze results with a range of reports
– Search across captured websites
– Keep archive restricted, or provide public access
– Fee-based service

SLIDE 31

WAS

  • WAS includes archives based on events:

– 2003 California recall election
– 2007 Southern California wildfires

  • Thematic archives:

– Grateful Dead archives
– US Labor unions and organizations
– California political blogs

  • Comprehensive archives of web-domains:

– Emory University
– University of Michigan

SLIDE 32

Service takeaways


  • DMPTool: Create data management plans required by funders or journals using campus resources
  • DataUp: Curation services tailored for tabular datasets
  • Dash: Simplified interfaces for repository submission and discovery
  • EZID / Merritt: Core infrastructural services generally hidden beneath simple intuitive interfaces
  • WAS: Curation services tailored for web-published content and data

SLIDE 33

Summary


  • Good data management practice is critical to the success of the academic enterprise and scholarly advancement
  • Management solutions should be integrated into existing research systems and workflows
  • The UC Libraries are a natural partner for data management advice and solutions
  • UC3 offers a comprehensive roster of innovative and intuitive curation services applicable across the data and scholarly lifecycle

SLIDE 34

For more information

  • UC Curation Center: www.cdlib.org/uc3, datapub.cdlib.org, uc3@ucop.edu
  • DMPTool: dmptool.org
  • DataUp/ONEShare: dataup.org
  • Dash: datashare.ucsf.edu

– EZID: ezid.cdlib.org
– Merritt: merritt.cdlib.org
– DataONE: dataone.org

  • WAS: was.cdlib.org
