SLIDE 1

OPeNDAP Hyrax

An extensible data access framework within the Earth System Grid Federation

Patrick West¹, Peter Fox¹, James Gallagher², Nathan Potter², Dan Halloway², Stephan Zednik¹

  • 1. Tetherless World Constellation (http://tw.rpi.edu)

– Rensselaer Polytechnic Institute (http://www.rpi.edu)

  • 2. OPeNDAP (http://www.opendap.org)

AGUFM2011-IN43C-04


OPeNDAP

SLIDE 2

Motivation

  • More and larger data sets are being collected all the time
  • Researchers don’t have the capability to download all of this data in order to get their work done
  • More and more server-side functionality needs to be provided to work with the data
  • More advanced services also need to be provided for additional client support and data manipulation
  • The OPeNDAP Hyrax framework provides for these additional services and capabilities due to its modular design

SLIDE 3

Key Points

  • The Earth System Grid Federation (ESG) is one such project, where more and larger datasets are being collected for use by researchers
  • But … the most common method for researchers to use this data is to download it
  • OPeNDAP Hyrax is a data access service that can be used to access remote data
  • But … there is still a lot of work that can be done within the Hyrax framework
  • At the same time, there are other services that can be provided, but not within the context of a data-access framework

SLIDE 4

ESGF

  • The Earth System Grid Federation (ESGF) is a non-profit organization, formed by the participants of GO-ESSP, that provides software for the access and dissemination of climate data

SLIDE 5

ESGF

[Diagram]
SLIDE 6

OPeNDAP

  • OPeNDAP (Open-source Project for a Network Data Access Protocol) is a non-profit organization that provides software and services for the access and manipulation of data in support of the DAP2 (Data Access Protocol) standard.
  • OPeNDAP Hyrax is a software framework that provides for the networking of scientific data, implementing the DAP2 standard.
  • TDS (THREDDS Data Server) is not an OPeNDAP software product, but a server provided by Unidata that can act as a DAP server
  • Different language bindings for DAP: C++, Pydap, JDap

SLIDE 7

[Diagram labels: SQL Database, THREDDS Catalogs, netCDF Data Files; OLFS, BES; netCDF, DAP, SQL, WxS; BES Commands, XML Encapsulated Response; DAP2, THREDDS, HTML, SOAP]

BES Modules

  • dap
  • dap-service
  • cdf
  • cedar
  • csv
  • ferret
  • fileout_netcdf
  • fits
  • freeform
  • gateway
  • hdf4
  • hdf5
  • jgofs
  • ncml
  • netCDF
  • wcs
  • xml-rdf

Response Types

  • DAS
  • DDS
  • DDX
  • DataDDS
  • Ascii
  • HTML Form
  • Info
  • NetCDF
  • RDF/XML
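
As a rough illustration of how a client might ask for each of these response types, the sketch below builds request URLs by appending a suffix to a dataset URL and, optionally, a DAP2 constraint expression. The suffix table reflects conventions commonly seen on DAP2 servers and is an assumption here, not something stated on the slide; the host, dataset path, and variable name `sst` are hypothetical.

```python
from urllib.parse import quote

# Assumed mapping from the response types listed above to URL suffixes.
# Check the server's own documentation before relying on any of these.
RESPONSE_SUFFIXES = {
    "DAS": ".das",
    "DDS": ".dds",
    "DDX": ".ddx",
    "DataDDS": ".dods",
    "Ascii": ".ascii",
    "HTML Form": ".html",
    "Info": ".info",
    "NetCDF": ".nc",
    "RDF/XML": ".rdf",
}


def dap_url(dataset_url, response, constraint=""):
    """Return the request URL for one response type.

    `constraint` is an optional DAP2 constraint expression such as a
    hyperslab projection: "sst[0:1:0][0:9][0:9]".
    """
    url = dataset_url + RESPONSE_SUFFIXES[response]
    if constraint:
        # Keep the characters used by hyperslab syntax readable.
        url += "?" + quote(constraint, safe=",[]:&=")
    return url


# Hypothetical dataset location, for illustration only.
base = "http://example.org/opendap/data/sst.nc"
print(dap_url(base, "DDS"))
print(dap_url(base, "DataDDS", "sst[0:1:0][0:9][0:9]"))
```

Requesting a subset this way is what lets a researcher avoid downloading the whole file: only the selected slice of `sst` would be returned.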

OPeNDAP Hyrax

SLIDE 8

Hyrax BES - Extensible

  • New responses
  • New data types
  • Reporting mechanism - Metrics
  • Register Server-Side functions with libdap
  • New BES commands
  • Exception handling callbacks
  • Initialization and termination callbacks
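
The extension points above can be pictured with a small registry sketch. This is only a toy model of the idea of a modular back end, written in Python for brevity; it is not the BES API (the BES itself is not written in Python), and every class, method, and module name below is hypothetical.

```python
class ModuleRegistry:
    """Toy stand-in for a modular back end; not the real BES API."""

    def __init__(self):
        self.readers = {}          # data type name -> callable(path) -> data
        self.responses = {}        # response name -> callable(data) -> output
        self.init_callbacks = []   # run once when the server starts
        self.term_callbacks = []   # run once when the server stops

    def add_reader(self, data_type, reader):
        self.readers[data_type] = reader

    def add_response(self, name, builder):
        self.responses[name] = builder

    def start(self):
        for callback in self.init_callbacks:
            callback()

    def stop(self):
        for callback in self.term_callbacks:
            callback()

    def handle(self, data_type, response, path):
        """Read `path` with the registered reader, then build the response."""
        data = self.readers[data_type](path)
        return self.responses[response](data)


def load_csv_module(registry):
    """Hypothetical module: adds one data type and one response type."""
    registry.add_reader("csv", lambda path: {"source": path})
    registry.add_response("info", lambda data: "dataset: " + data["source"])


registry = ModuleRegistry()
load_csv_module(registry)
registry.start()
print(registry.handle("csv", "info", "data/a.csv"))
registry.stop()
```

The point of the design is that the core never needs to know which data types or responses exist; a module adds them at load time.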

SLIDE 9

ESG Requirements

  • Only requirement we received: OPeNDAP Hyrax to be a drop-in replacement for TDS (THREDDS Data Server)
  • Basic DAP requests and responses
  • Read THREDDS catalogs, including NCML documents embedded within (ncml_module)
  • Ferret integration (ferret_module)
– Ferret data manipulation functionality only
– LAS (Live Access Server) does more with Ferret, but not the DAP server

SLIDE 10

Use Cases

  • Clearly defined use case with:
– a descriptive name and a clearly stated goal
– a summary of what the final product of the use case will provide for the system
– Actors
– Preconditions
– Triggers
– Basic Flow
– User interaction with the system
– Resources required for the use case
– … and more
  • Using use cases provides for a more iterative approach

[Diagram: Drop-in Replacement at one end, 400-page Requirements at the other; Happy Medium = Use Cases]
SLIDE 11

From our Use Cases

  • DAP responses: nothing new to do here, functionality already provided (DONE)
  • Ferret module
– First pass, data manipulation using Ferret (DONE)
– Future passes, provide new response types (data product types) such as images and movies using Ferret
  • THREDDS catalogs
– First pass, be able to read THREDDS catalogs (DONE)
– Future passes, integrate with portals that provide richer data inventory browsing, semantic knowledge incorporation, semantic knowledge provenance, etc. (ongoing)
  • NCML documents
– Be able to read NCML documents and perform aggregation (DONE)
– Future passes, be able to dynamically pass in NCML documents and perform aggregations (DONE)
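
To make the NCML aggregation item concrete, here is a minimal sketch that generates an NcML document joining several files along an existing dimension. It assumes the standard NcML 2.2 namespace and the `joinExisting` aggregation type; the file names and the `time` dimension are hypothetical, and whether a given server accepts such a document passed in dynamically depends on its configuration.

```python
import xml.etree.ElementTree as ET

# Standard NcML 2.2 namespace.
NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def build_join_existing(dim_name, locations):
    """Return an NcML string aggregating `locations` along `dim_name`."""
    ET.register_namespace("", NCML_NS)
    root = ET.Element("{%s}netcdf" % NCML_NS)
    aggregation = ET.SubElement(
        root,
        "{%s}aggregation" % NCML_NS,
        {"type": "joinExisting", "dimName": dim_name},
    )
    # One child <netcdf location="..."/> per file to be joined.
    for location in locations:
        ET.SubElement(aggregation, "{%s}netcdf" % NCML_NS, {"location": location})
    return ET.tostring(root, encoding="unicode")


# Hypothetical monthly files joined along their shared time dimension.
print(build_join_existing("time", ["sst_2011_01.nc", "sst_2011_02.nc"]))
```

A client would then see the two files as a single dataset with one longer `time` axis.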

SLIDE 12

Future Work

  • Improvements for Data Access

– Asynchronous support
– Distributed data access, manipulation, & transformation
– Storing intermediate data products for future access and manipulation
– Sharing data products with other researchers
– Building citation information with data products
– Data access provenance
– Semantic responses and features
– Semantic data services descriptions
– Data metrics: the who, what, when, where, and how of data access
– Administrative features
– Better and more advanced middle-tier capabilities

SLIDE 13

Future Work - but not for OPeNDAP (?)

  • OPeNDAP is a great tool for data access.
– Data access is only part of the cyberinfrastructure framework
– Keep with its strengths: reliable, scalable, good performance, extensible
  • Data discovery mechanisms
  • Data inventory browsing and querying
– Faceted browsing/Hierarchical browsing
  • Embedded provenance support, from data collection to data product creation and visualization

  • Data Citation and Attribution
