slide-1
SLIDE 1

Preserving Geospatial Data: The National Geospatial Digital Archive’s Approach

Greg Janée

UC Santa Barbara

slide-2
SLIDE 2

Archiving 2009 • 2009-05-05 2

NGDA genesis

  • One of eight initial NDIIPP partners
  • Members

– UCSB, Stanford, UT Knoxville, Vanderbilt

  • Goal

– How to preserve geospatial data, on a national scale, for future generations?

slide-3
SLIDE 3

Three questions

  • What’s special about geospatial?
  • Are there any design principles that can last a century?
  • Can we define a useful, implementable, minimal level of preservation?

slide-4
SLIDE 4

Geospatial data

  • Representations of Earth’s surface

– remote-sensing imagery
– aerial photography
– maps
– sensor data
– GIS data

georeferenced

  • geotagged photos, documents

geospatial

slide-5
SLIDE 5

Challenges

  • No uniform data model

– vector, raster, topological, discrete, continuous, …

  • Proprietary formats

⇒ Many barriers to data mobility

(diagram labels: formats, tools)

slide-6
SLIDE 6

Challenges (cont.)

  • Multiple granule sizes

– features
– layers
– databases
– projects
– cartographic end products

  • Relational data

– geodatabases

a0000004d.gdbindexes
a0000004d.gdbtable
a0000004d.gdbtablx
a0000004e.blk_key_index.atx
a0000004e.col_index.atx
a0000004e.gdbindexes
a0000004e.gdbtable
a0000004e.gdbtablx
a0000004e.row_index.atx
a0000004f.gdbindexes
a0000004f.gdbtable
a0000004f.gdbtablx
a00000050.gdbtable
a00000050.gdbtable.sdc
a00000050.gdbtable.sdc.prj
a00000050.gdbtable.sdi
…

slide-7
SLIDE 7

Challenges (cont.)

  • Large extent

– storage
– time

  • Extensive context
  • Implicit context
  • Dynamic data

Visit the USGS Landsat website for important information regarding:

  • ground station facts,
  • Landsat calibration parameter file details,

  • satellite ephemeris information,
  • satellite anomaly investigations,
  • data acquisition information,
  • image processing particulars,
  • data product guidance,
  • SLC-off data product details,
  • and sample data products.

http://landsat.gsfc.nasa.gov/data/tech_details.html

slide-8
SLIDE 8

Ocean color example

semianalytic model*

*S. Maritorena, D. Siegel (2005), Consistent merging of satellite ocean color data sets using a bio-optical model, Remote Sens. Env. 94(4):429–440, doi:10.1016/j.rse.2004.08.014

surface radiance SeaWiFS MODIS ... chlorophyll ...

slide-9
SLIDE 9

User’s view

semianalytic model* surface radiance SeaWiFS MODIS ... chlorophyll ... metadata data format (HDF)

slide-10
SLIDE 10

Preservation of use (only)

semianalytic model* surface radiance SeaWiFS MODIS ... chlorophyll ... metadata data format (HDF) preserve & migrate

slide-11
SLIDE 11

The curse of reprocessing

  • SeaWiFS*

– Reprocessing 5.2 - Completed July 12, 2007
– Reprocessing 5.1 - Completed July 5, 2005
– Reprocessing 5 - Completed March 18, 2005
– Reprocessing 4.1 - Completed May 24, 2004
– Reprocessing 4 - Completed July 25, 2002
– Reprocessing 3 - Completed May 24, 2000

  • Calibration Update - December 1, 2000
  • Calibration Update - April 10, 2001

– Reprocessing 2 - August, 1998
– Reprocessing 1 - January, 1998

*http://oceancolor.gsfc.nasa.gov/REPROCESSING/

new atmospheric, solar irradiance models

slide-12
SLIDE 12

Preservation of functionality

semianalytic model* surface radiance SeaWiFS MODIS ... chlorophyll ... metadata data format (HDF) algorithms software calibration ... preserve, migrate, reprocess, revalidate lineage dependency
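
The lineage and dependency labels on this slide suggest that an archive needs to know which products must be reprocessed and revalidated when an upstream input changes. Below is a minimal Python sketch of that idea. The product names are loosely borrowed from the slide labels, and the dependency edges in `LINEAGE` are illustrative assumptions, not the actual SeaWiFS or MODIS processing chain or any NGDA data structure.

```python
from __future__ import annotations

from collections import defaultdict, deque

# Hypothetical lineage: each product maps to the inputs it was derived from.
LINEAGE = {
    "surface_radiance": ["raw_sensor_data", "calibration", "atmospheric_model"],
    "chlorophyll": ["surface_radiance", "semianalytic_model"],
}


def downstream(dependencies: dict[str, list[str]], changed: str) -> list[str]:
    """Return every product that depends, directly or indirectly, on `changed`."""
    # Invert the mapping: input -> products derived from it.
    dependents = defaultdict(list)
    for product, inputs in dependencies.items():
        for item in inputs:
            dependents[item].append(product)

    # Breadth-first walk so nearer products are listed before farther ones.
    seen, order, queue = {changed}, [], deque([changed])
    while queue:
        current = queue.popleft()
        for product in dependents[current]:
            if product not in seen:
                seen.add(product)
                order.append(product)
                queue.append(product)
    return order


if __name__ == "__main__":
    # A calibration update invalidates everything derived from it.
    print(downstream(LINEAGE, "calibration"))
```

Recording lineage as explicit data, rather than leaving it implicit in processing scripts, is what would let a future archive decide what a calibration update invalidates.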

slide-13
SLIDE 13

Mike Linda, “OMPS Aggregation and Packaging,” 2006 CLASS Users’ Workshop

Ozone reprocessing requirements

  • xDRs
  • Delivered IPs
  • Engineering data (incl. C3S data if not in RDRs)
  • Upload files
  • Databases
  • Software (source code)
  • Calibration artifacts

– data
– analysis tools
– tables
– logs
– notebooks
– instrument design

  • All project documentation
  • All scientific papers
  • All reports
slide-14
SLIDE 14

Challenges: conclusion

  • NGDA archive design requirements:

– compound objects
– aggregations and inter-object relationships
– extensive context
– equal treatment of data, context

  • Unmet challenges:

– storage size
– proprietary formats
– relational data

slide-15
SLIDE 15

Relay principle

  • A preservation system should support its own migration

(timeline diagram, now to 100 years: system, system, system, ...)

slide-16
SLIDE 16

Fallback principle

archive

export ingest

archive storage system storage system

slide-17
SLIDE 17

Fallback principle

  • A preservation system should support some form of handoff of its content even if the system itself is no longer functional.

archive archive storage system storage system

slide-18
SLIDE 18

iPhoto example

iPhoto Library/
  2008/
    11/
      DSC_0035.jpg
      DSC_0036.jpg
    12/
      DSC_0042.jpg
      ...
  AlbumData.xml
  Dir.data
  Library.data
  …

  • all metadata
  • self-describing schema
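
The point of the iPhoto example is that the content stays reachable through the plain filesystem even if the application is gone. A minimal Python sketch of such a fallback read follows. It assumes only the year/month directory layout shown on this slide; the function name `recover_by_month` is hypothetical, and real iPhoto libraries may be organized differently.

```python
from __future__ import annotations

from pathlib import Path


def recover_by_month(library: Path) -> dict[str, list[str]]:
    """Group photo file names by 'YYYY-MM' using only the directory layout.

    No application code or catalog file is consulted: the year and month
    come from the two directory levels above each .jpg file.
    """
    recovered: dict[str, list[str]] = {}
    pattern = "[0-9][0-9][0-9][0-9]/[0-9][0-9]/*.jpg"
    for photo in sorted(library.glob(pattern)):
        year, month = photo.parts[-3], photo.parts[-2]
        recovered.setdefault(f"{year}-{month}", []).append(photo.name)
    return recovered


if __name__ == "__main__":
    # Assumed location; adjust to wherever the library directory lives.
    for month, names in recover_by_month(Path("iPhoto Library")).items():
        print(month, names)
```

The catalog files such as AlbumData.xml would still be needed to recover albums and other metadata, which is why a self-describing schema matters for the handoff.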

slide-19
SLIDE 19

Resurrection principle

  • A preservation system should allow archived information to lapse out of usability, but at all times should support future resurrection of full use of the information.

(timeline diagram, now to 100 years: fully curated, somewhat usable, resurrectable)

slide-20
SLIDE 20

NGDA archive system

  • archive

– management, policies, services, access
– custom software

  • logical data model

– standard packaging of data, semantics
– instantiation of OAIS

  • physical data model

– survivable, vendor-neutral representation of above
– filesystems, files, XML

  • storage virtualization layer

– seamless movement, reliability, redundancy
– Logistical Networking

slide-21
SLIDE 21

Physical data model

identifier
...pathname/
manifest.xml
cnty24k97.xml
data/
source/
cnty24k97.shp
cnty24k97.dbf
...
cnty24k97.png

  • object structure
  • fixity metadata
  • inter- and intra-object relationships
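
To make the role of manifest.xml concrete, here is a minimal Python sketch that records object structure and fixity metadata for a directory of files. The element and attribute names (`manifest`, `file`, `path`, `size`, `sha256`) and the choice of SHA-256 are assumptions for illustration; the slides do not give NGDA's actual manifest schema or checksum algorithm, and inter-object relationships are left out.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute a SHA-256 digest in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(object_dir: Path, identifier: str) -> ET.Element:
    """Describe every file under object_dir (hypothetical schema)."""
    root = ET.Element("manifest", {"identifier": identifier})
    for path in sorted(p for p in object_dir.rglob("*") if p.is_file()):
        if path.name == "manifest.xml":
            continue  # the manifest does not describe itself
        ET.SubElement(root, "file", {
            "path": path.relative_to(object_dir).as_posix(),
            "size": str(path.stat().st_size),
            "sha256": sha256_of(path),
        })
    return root


def write_manifest(object_dir: Path, identifier: str) -> Path:
    """Write manifest.xml at the top of the object directory."""
    manifest_path = object_dir / "manifest.xml"
    tree = ET.ElementTree(build_manifest(object_dir, identifier))
    tree.write(manifest_path, encoding="utf-8", xml_declaration=True)
    return manifest_path
```

Because the result is plain files plus XML, a later system could re-verify the checksums with nothing more than a filesystem and an XML parser, which is the survivability property the physical data model aims for.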

slide-22
SLIDE 22

Defining context

  • Community-related problems

– distributed, implicit, inscrutable to outsiders
– “known well to those that know it well”

  • Semantic problems

– formal semantics are too hard
– multiple, conflicting, informal specifications
– multiple software implementations

  • Conclusion

– context defined by community of practice

slide-23
SLIDE 23

Capturing context

project wikis ? software scientific literature documentation metadata AIP AIP AIP AIP archive

slide-24
SLIDE 24

NGDA format registry

wiki page + uploads repository archival object

community curators

templated automatic synchronization; curator mediation

slide-25
SLIDE 25

Acknowledgements

  • UC Santa Barbara

– James Frew
– Catherine Masi
– Justin Mathena
– Adam Ross

  • Stanford

– Nancy Hoebelheinrich
– Keith Johnson
– Julie Sweetkind-Singer

  • UT Knoxville

– Micah Beck
– Terry Moore

  • NCSU

– Steve Morris

  • EDINA

– Guy McGarva