INFRASTRUCTURE At the GHRC DAAC Will Ellett IT Manager

SLIDE 1

Presented at the GHRC User Working Group Meeting September 25-26, 2014

INFRASTRUCTURE at the GHRC DAAC

Will Ellett, IT Manager
sysadmin@itsc.uah.edu

Support: Michele Garrett, Michael McEniry, Jason Toone

SLIDE 2
GHRC Overview

  • Data Systems
      • Ingest & Processing
      • Public (Web, FTP)
      • Database
  • Storage Systems
      • Tape-based Archive
      • Disk-based Archive
      • Backup
      • NAS

Data Storage

SLIDE 3
GHRC Network

  • NASA Public
      • Web
      • FTP
  • NASA Private
      • Ingest
      • Processing
      • Archive
      • NAS
  • UAH Private
      • User Workstations

[Network diagram labels: Internet, Firewall, VPN, NASA Public, NASA Private, UAH Private]

SLIDE 4

Public Network Systems (web/ftp)

  • Production Sites: Dell PowerEdge R510
      • GHRC.nsstc.nasa.gov
      • LIGHTNING.nsstc.nasa.gov
      • SCS3.nsstc.nasa.gov
  • Field Campaigns: Dell PowerEdge R510
      • AIRBORNESCIENCE.nsstc.nasa.gov
      • FCPORTAL.nsstc.nasa.gov
      • GPM.nsstc.nasa.gov
  • LANCE Project: Dell PowerEdge R720
      • LANCE.nsstc.nasa.gov
  • HS3 Project: Dell PowerEdge R510
      • HS3.nsstc.nasa.gov
  • RTMM Project: Dell PowerEdge 2950
      • RTMM2.nsstc.nasa.gov (retiring soon)

SLIDE 5

Private Network Systems

  • Ingest/Processing: Dell PowerEdge R510 (gale)
  • LMA Processing: Dell Precision T7500
  • AMSR Processing: Sun Fire X4270 (AMSR1-3)
  • AMSR Storage: Sun Storage
      • amsrnas1: 16TB NAS
      • amsrnas2: 20TB NAS (scalable to 60TB)
  • LANCE Processing: Dell PowerEdge R720 (gwen1)
  • Database: Sun Fire X4250 (neptune)
  • Storage: NetGear PowerNAS 4200, 20TB NAS
  • Backup/Logs: Dell PowerEdge R710 (underdog)

SLIDE 6

Private Network Systems

  • KELVIN: Sun V880/L700 with LTO3 drives, 90TB Tape Archive, 75% full
  • GHRCARC1-2: Sun ZFS Storage 7420, 120TB Disk Archive, 10% full

SLIDE 7

Archive Migration

Replacing the aging Tape Archive, to be completed by Summer 2015

  • Sun V880/L700 with LTO3 drives: installed Sept 2002, 90TB usable, scalable to 500TB
  • Sun ZFS Storage 7420: installed June 2013, 120TB usable, scalable to 2PB
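
The capacity figures on the two archive slides (90TB tape archive at 75% full, 120TB disk archive at 10% full) allow a rough check of how full the disk archive would be after migration. The sketch below is only back-of-the-envelope arithmetic. It assumes every byte on tape is copied to disk with no overlap, deduplication, or change in compression, which the slides do not state.

```python
# Figures quoted on the slides: usable capacity in TB and fraction in use.
TAPE_USABLE_TB = 90
TAPE_FRACTION_FULL = 0.75
DISK_USABLE_TB = 120
DISK_FRACTION_FULL = 0.10


def used_tb(usable_tb: float, fraction_full: float) -> float:
    """Return the occupied capacity in TB."""
    return usable_tb * fraction_full


def disk_fill_after_migration() -> float:
    """Fraction of the disk archive in use if all tape data is copied over.

    Assumption (not from the slides): the tape and disk archives hold
    disjoint data and the data occupies the same space on disk as on tape.
    """
    total_tb = (used_tb(TAPE_USABLE_TB, TAPE_FRACTION_FULL)
                + used_tb(DISK_USABLE_TB, DISK_FRACTION_FULL))
    return total_tb / DISK_USABLE_TB
```

Under these assumptions the tape archive holds about 67.5TB and the disk archive about 12TB, so the migrated total of roughly 79.5TB would occupy about two thirds of the 120TB currently installed.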

SLIDE 8
Data Backup

  • Tape Backup
      • System files
      • Source code
      • Critical data
  • Tape/Disk Archive
      • Datasets (multiple copies)
  • Researching Off-Site Archive
      • Datasets

[Diagram labels: Internet, Firewall, GHRC Public, GHRC Private, Backup, Archive, Amazon Glacier (Future)]

SLIDE 9
Future Projects

  • User Registration System (URS)
      • Require registration for data access
  • FTP to HTTPS
      • Evaluate Impact on Users
  • LIS Space Station
      • Set up new Operations Center
  • Development Server
      • Help reduce load on gale
      • Additional Storage
  • Off-Site Archive
      • Amazon Glacier

SLIDE 10

Presented at the GHRC User Working Group Meeting September 25-26, 2014

GHRC DATA PROCESSING

Lamar Hawkins, Operations Manager
dhawkins@itsc.uah.edu

Bruce Beaumont, Lead Software Engineer
beaumont@itsc.uah.edu

SLIDE 11

The Situation by the Numbers

  • ~300 cataloged datasets
  • ~30 ongoing datasets
  • Frequent field campaigns
  • ~25 real-time data ingests (each)

  • 1-1/2 Operations staff

SLIDE 12

Goals

Automate everything!

  • Standardize data processing
  • Simplify data flow
  • Reduce duplicated code
  • Increase maintainability
  • Document everything
  • Automated watchdogs

SLIDE 13

Environments

  • DEV (development)
      • Writable by all developers
      • Basic (unit) testing done here
  • TEST (integration & test)
      • Writable by Operations staff only
      • Acceptance testing done here
  • OPS (production)
      • Writable by SysAdmin only
      • Certain directories are writable by Ops staff
      • Operational processing done here

SLIDE 14

Overall Data Flow

Ingest → Process → Distribute

SLIDE 15

Data Ingest

  • PUSH method
      • Remote site delivers data to us periodically
      • Standard software discovers new data
  • PULL method
      • We poll a remote site for new data
      • Standard software handles new data
  • Other method
      • Data delivered on media
      • Other PUSH method (socket, LDAP)
  • Ingest metrics are generated for most streams
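
The "discovers new data" step for PUSH deliveries can be pictured as a periodic scan of a delivery directory. The following is a minimal sketch of that idea only. It is not the GHRC's standard software, and the function name `discover_new_files`, the `inbox` directory, and the in-memory `seen` set are all hypothetical.

```python
from pathlib import Path
from typing import List, Set


def discover_new_files(inbox: Path, seen: Set[str]) -> List[Path]:
    """Return files in `inbox` that have not been reported before.

    Files are returned oldest first (by modification time, then name),
    and their names are added to `seen` so the next scan skips them.
    """
    candidates = [
        p for p in inbox.iterdir()
        if p.is_file() and p.name not in seen
    ]
    candidates.sort(key=lambda p: (p.stat().st_mtime, p.name))
    seen.update(p.name for p in candidates)
    return candidates
```

A real ingest would also need to avoid picking up files that are still being written and to persist the `seen` state between runs. Neither is handled here.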

SLIDE 16

Processing

  • Science processing for some data
  • May include reformatting, renaming, etc.
  • Processing is not required
  • Modules are stream-specific

SLIDE 17

Data Distribution

  • Data distribution is handled by a common module
  • Distribution may include
      • Copying files to public or private FTP areas
      • Putting files on the archive (in OPS only!)
      • Staging files for delivery to external users via PUSH
  • File-level metadata are generated for most streams
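
The rule that files reach the archive only in OPS is the kind of guard a common distribution module can enforce in one place. Below is a hedged sketch of such a guard. The function `distribute`, the target names, and the directory layout are illustrative assumptions and do not describe the actual GHRC module.

```python
import shutil
from pathlib import Path
from typing import Dict, List


def distribute(src: Path, environment: str, targets: Dict[str, Path]) -> List[Path]:
    """Copy `src` into each target directory and return the paths written.

    `targets` maps a target name (for example "public_ftp" or "archive")
    to a directory. The "archive" target is skipped unless `environment`
    is "OPS", mirroring the OPS-only archive rule on the slide.
    """
    written = []
    for name, directory in targets.items():
        if name == "archive" and environment != "OPS":
            continue  # never touch the archive from DEV or TEST
        directory.mkdir(parents=True, exist_ok=True)
        written.append(Path(shutil.copy2(src, directory / src.name)))
    return written
```

Keeping the environment check inside the shared function means individual stream modules cannot bypass it by accident.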

SLIDE 18

Presented at the GHRC User Working Group Meeting September 25-26, 2014

GHRC DATA SEARCH, ACCESS AND ORDER

Mary Nair, DBA and Data Management Team Member
mnair@itsc.uah.edu

Sherry Harrison, User Services and Data Management Team Member
sharrison@itsc.uah.edu

SLIDE 19

Overview

  • Search
      • HyDRO
      • Reverb
      • GCMD
      • Data Set List
      • OpenSearch
      • Tropical Storm Tracks
  • Access
      • Field Campaign Portals
      • DOIs
      • Data Set Landing Pages
      • Guides
      • OPeNDAP
      • FTP
      • Future: HTTPS
  • Order
      • Automated Order Processing
      • Data Subscriptions: PUSH & GDX
SLIDE 20
Hydrologic Data Search, Retrieval, and Order System (HyDRO)

http://ghrc.nsstc.nasa.gov/hydro/

  • Application developed at the GHRC by Bruce Beaumont
  • Highlights
      • Quick Search
      • Advanced Search
      • Data Sets by Collection
      • Data Set Information
      • Download Data
      • Order Data

SLIDE 21
Data Search Tools

  • Reverb: http://reverb.echo.nasa.gov
  • Global Change Master Directory (GCMD): http://gcmd.gsfc.nasa.gov/
  • Data Set List: http://ghrc.nsstc.nasa.gov/hydro/search.pl
  • OpenSearch: http://ghrc.nsstc.nasa.gov/hydro/ghost.xml
      • Provides a web service API for searching the GHRC catalog
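
An OpenSearch description document advertises a URL template such as `...?q={searchTerms}&count={count?}`, where a trailing `?` marks an optional parameter. The sketch below shows how a client might fill in such a template. The template used in the usage note is a made-up example on example.org, not the one published in the GHRC description document, and `fill_template` is a hypothetical helper.

```python
import re
from urllib.parse import quote

# Matches {name} and {name?}; names may carry a namespace prefix such as geo:box.
_PARAM = re.compile(r"\{([A-Za-z:]+)(\?)?\}")


def fill_template(template: str, values: dict) -> str:
    """Substitute OpenSearch template parameters with URL-encoded values.

    Optional parameters ({name?}) with no supplied value become empty
    strings; a missing required parameter raises KeyError.
    """
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""
        raise KeyError(f"missing required OpenSearch parameter: {name}")

    return _PARAM.sub(substitute, template)
```

For instance, `fill_template("http://example.org/search?q={searchTerms}&count={count?}", {"searchTerms": "lightning imaging"})` yields a URL with the search terms percent-encoded and the optional count left empty.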

SLIDE 22

Tropical Storm Tracks

http://ghrc.nsstc.nasa.gov/storms/

  • Application developed at the GHRC
  • Storm data from the National Hurricane Center
  • ~6 hour interval updates during active storms

SLIDE 23

Field Campaign Portals

http://fcportal.nsstc.nasa.gov/

  • Access restricted to field campaign participants and collaborators

SLIDE 24

Digital Object Identifiers (DOIs)

  • What is a DOI?
      • Unique alphanumeric string used to identify a digital object
      • Provides persistent identification with a permanent online link
      • Enables easier access to research data
      • Assigned and regulated by The International DOI Foundation (IDF)
      • Often used in citations in online publications
  • DOIs at the GHRC
      • DOIs have been defined for most of the approximately 300 datasets in the GHRC catalog, with about 65% of these registered through ESDIS.
      • Dataset Landing Pages are already provided for all GHRC datasets, whether or not a DOI is in place.
      • DOI example: http://dx.doi.org/10.5067/MEASURES/DMSP-F17/SSMIS/DATA302
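
The DOI example above is a resolver address followed by the DOI itself. The short sketch below turns a bare DOI string into such a link. The helper name `doi_to_url` and its light validation are assumptions made for illustration; the resolver address is the one shown on the slide.

```python
def doi_to_url(doi: str, resolver: str = "http://dx.doi.org/") -> str:
    """Return a resolver link for `doi`.

    Accepts an optional "doi:" prefix. Only checks that the string looks
    like "10.<registrant>/<suffix>"; it does not confirm the DOI exists
    and does not percent-encode unusual characters in the suffix.
    """
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    return resolver + doi
```

Called with `"10.5067/MEASURES/DMSP-F17/SSMIS/DATA302"`, it reproduces the example link from the slide.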

SLIDE 25
Data Set Landing Pages

http://ghrc.nsstc.nasa.gov/hydro/details.pl?ds=gpmparprbgcpex

  • One-paragraph description
  • Citation Information
  • Basic metadata
  • Coverage information
  • Links to documentation and software
  • DOI

We get this information from the PI.

SLIDE 26

Guides

http://ghrc.nsstc.nasa.gov/uso/ds_docs/tpw/rssm1tpwn_dataset.html

  • Data set overview document composed by the GHRC from PI-provided information
  • Features
      • Instrument Overview
      • Data Format and File Naming Convention
      • Investigator Information
      • Algorithm Details
      • PI Documentation and Software Information and Links
      • Citations and References
SLIDE 27

Additional Access Methods

  • ftp://ghrc.nsstc.nasa.gov/
  • ftp://gpm.nsstc.nasa.gov/
  • http://ghrc.nsstc.nasa.gov/opendap/

Future: HTTPS

SLIDE 28

Automated Order Processing

Order Submitter (HyDRO, Reverb) → GHRC Order Database → Order Broker → Value-added process → FTP area

  • Extracts files from tarred/gzipped bundles
  • Performs HEW (HDF-EOS) subsetting
  • Packs results into convenient tar bundles for delivery
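
The extract-and-repack steps listed above can be sketched with Python's standard `tarfile` module. This is an illustration of the general pattern, not the GHRC order broker. The function `repack_order`, the `wanted` filter, and the working directory are hypothetical, and the HEW subsetting step is left out entirely.

```python
import tarfile
from pathlib import Path
from typing import Callable, List


def repack_order(bundle: Path, wanted: Callable[[str], bool],
                 out_bundle: Path, workdir: Path) -> List[str]:
    """Extract selected files from `bundle` and pack them into `out_bundle`.

    `bundle` may be a plain or compressed tar file. Members are written to
    `workdir` under their base names, so archive paths cannot escape it.
    Returns the names placed in the gzip-compressed output bundle.
    """
    workdir.mkdir(parents=True, exist_ok=True)
    selected = []
    with tarfile.open(bundle, "r:*") as src:
        for member in src.getmembers():
            if member.isfile() and wanted(member.name):
                name = Path(member.name).name
                (workdir / name).write_bytes(src.extractfile(member).read())
                selected.append(name)
    with tarfile.open(out_bundle, "w:gz") as dst:
        for name in selected:
            dst.add(workdir / name, arcname=name)
    return selected
```

Reading each member fully into memory keeps the sketch short; large granules would call for streaming copies instead.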

SLIDE 29

Data Subscriptions

  • Data subscription
      • Scheduled delivery of data on a near-real-time basis to individual subscribers
      • Delivery via applications developed at the GHRC (PUSH, GDX)
      • Access to subscription applications is limited to GHRC operations staff
  • Product / User Subscription Handler (PUSH)
      • Primary Data Subscription Service
      • Configurable for the dataset and the transfer interval
  • GPM Data Interchange (GDX)
      • Command line mechanism for data transfer which includes handshaking
      • Near-real-time LIS provided to PPS (Erich Stocker)
      • Configurable to transfer various data sets
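
A subscription that is "configurable for the dataset and the transfer interval" can be modeled as a small record plus a due-time check. The sketch below is one possible shape for that idea and does not reflect how PUSH is actually implemented. The `Subscription` class, its fields, and `due_subscriptions` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Subscription:
    """One subscriber's standing request for one dataset."""
    subscriber: str
    dataset: str
    interval: timedelta                      # how often to transfer
    last_delivery: Optional[datetime] = None  # None means never delivered

    def is_due(self, now: datetime) -> bool:
        """True if no delivery has happened yet or the interval has elapsed."""
        return self.last_delivery is None or now - self.last_delivery >= self.interval


def due_subscriptions(subs: List[Subscription], now: datetime) -> List[Subscription]:
    """Return the subscriptions that should be serviced at time `now`."""
    return [s for s in subs if s.is_due(now)]
```

A scheduler would call `due_subscriptions` periodically, transfer the new files for each result, and then update `last_delivery`.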
SLIDE 30

Discussion

THANK YOU for your attention!

  • Please cite your data.
  • When the DOI is available, please use it in your data citation.
  • When your publication cites our data, please notify us.
  • What data formats do you prefer?
  • What metadata is most useful to you?
  • Do you find the user guide documents useful?
  • Are there additional data access methods to consider?

If you have not already done so, please respond to the ESDIS survey for the GHRC DAAC.

Please contact GHRC User Services for any help or questions: ghrcdaac@itsc.uah.edu