How the Virtual Observatory Has Influenced Data Discovery, Access, and Re-Use in Other Disciplines

slide-1
SLIDE 1

Robert Hanisch Director, Office of Data and Informatics Material Measurement Laboratory National Institute of Standards and Technology Tuesday November 21, 2017

How the Virtual Observatory Has Influenced Data Discovery, Access, and Re-Use in Other Disciplines

slide-2
SLIDE 2

Hanisch, Open Universe Workshop UNOOSA, Vienna, November 21, 2017

  • Materials Resource Registry (data, code)
  • International Metrology Resource Registry
  • NIST Enterprise Data Inventory
  • data.gov
  • NIST Public Data Repository and Search Portal
  • Standard Reference Data
  • Materials Data Repository
  • Materials Data Facility
  • Persistent identifiers (DOIs, handles)
  • Materials Data Curator
  • Data type registry
  • Schema repository
  • Lab info mgmt systems

Discover Access Interoperate

slide-3
SLIDE 3

  • Why?
      – Support FAIR* principles: Findable, Accessible, Interoperable, Re-usable
      – Assure maximum return on national investment in basic research
      – Demonstrate best practices
      – Address reproducibility “crisis”
  • US OMB, OSTP directives; FASTR legislation
  • But, Astronomy was here ~20 years ago!

*Wilkinson et al. 2016, Nature Scientific Data, DOI: 10.1038/sdata.2016.18

slide-4
SLIDE 4

slide-5
SLIDE 5

https://materials.registry.nist.gov/

slide-6
SLIDE 6

https://materials.registry.nist.gov/

slide-7
SLIDE 7

http://imrr.bipm.org/

slide-8
SLIDE 8

slide-9
SLIDE 9

Diagram: Resource Registry. Labels: major data providers; Local Publishing Registry (two shown); Full Searchable Registry (two shown); harvest (pull); replicate; search queries; Users, applications; OAI/PMH.
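The harvest (pull) step over OAI/PMH named on this slide can be sketched as follows. This is a minimal illustration, not the registry's actual code: the endpoint `https://registry.example.org/oai`, the sample record, and the function names are hypothetical, and the sketch assumes the publishing registry exposes Dublin Core (`oai_dc`) records. It builds a `ListRecords` request and parses a response without touching the network.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical response from a publishing registry (illustration only).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:resource-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example resource</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for a publishing registry."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{OAI_NS}record"):
        records.append({
            "identifier": rec.findtext(f"{OAI_NS}header/{OAI_NS}identifier"),
            "title": rec.findtext(f".//{DC_NS}title"),
        })
    token = root.findtext(f".//{OAI_NS}resumptionToken")
    return records, (token or None)


records, token = parse_records(SAMPLE_RESPONSE)
next_url = list_records_url("https://registry.example.org/oai", resumption_token=token)
```

A searchable registry would repeat the request with each returned token until none is given, then index the collected records for search queries.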

slide-10
SLIDE 10

Search NIST public data records

  • View metadata
  • Filter results
  • Access data files, metadata
  • APIs allow interoperability with client tools
  • Records link to Public Data Repository

slide-11
SLIDE 11

slide-12
SLIDE 12

Diagram labels: External Users; Data Application Layers; Server & Network Storage Infrastructure; Local Server/Storage; AWS EC2/S3; Custom Services/Portals (SRD, DB, …); MIDAS (Management of Institutional Data Assets); Data.Gov; Collaboration Tools (Box, Google…); GitHub; Socrata; Science Researcher; NIST Data Portal; Industry/Collaborators/Partners; Public Data Listing; Landing Pages; Data & APIs; Data Repository (DSpace, Islandora, Custom ...); Common Services (DOI, Preservation); Systems Deployment; Data Package; Data Review.

slide-13
SLIDE 13

slide-14
SLIDE 14

slide-15
SLIDE 15

  • Integrated Collaborative Environment (ICE)
      – Running now at http://ice.nist.gov
      – Developed by Air Force Research Laboratory
  • Timely and Trustworthy Curating and Coordinating Data Framework (T2C2) 4CeeD system
      – Running now at http://t2c2.nist.gov:32500/
      – Developed by University of Illinois at Urbana-Champaign
  • Also considering Discovery Environment for Relational Information and Versioned Assets (DERIVA) from USC

slide-16
SLIDE 16

  • Capture instrument metadata at the source
      – Metadata extractors
      – Often must reverse engineer proprietary binary formats
  • Move experiment metadata into database
      – Enable search across many experiments
      – Do not use filenames/file system for metadata storage
  • Enable scripted data processing, calibration, feature extraction
  • Support data management from acquisition to publication; improve reproducibility
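One way to picture "move experiment metadata into database" is the sketch below, which uses Python's built-in `sqlite3` module. The table layout, column names, and helper functions are assumptions made for illustration and do not describe any NIST system. The first record reuses field values shown elsewhere in this deck (CPD-1B, PW3710, Cu); the second record is hypothetical.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Open (or create) a small experiment-metadata catalog."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS experiment (
               id INTEGER PRIMARY KEY,
               data_file TEXT NOT NULL,
               instrument TEXT,
               anode TEXT,
               measured_at TEXT
           )"""
    )
    return conn


def register_experiment(conn, data_file, metadata):
    """Store extracted metadata as columns, not as parts of the file name."""
    cur = conn.execute(
        "INSERT INTO experiment (data_file, instrument, anode, measured_at) "
        "VALUES (?, ?, ?, ?)",
        (
            data_file,
            metadata.get("instrument"),
            metadata.get("anode"),
            metadata.get("measured_at"),
        ),
    )
    conn.commit()
    return cur.lastrowid


def find_by_anode(conn, anode):
    """Search across all registered experiments by anode material."""
    rows = conn.execute(
        "SELECT data_file FROM experiment WHERE anode = ? ORDER BY id", (anode,)
    ).fetchall()
    return [row[0] for row in rows]


conn = open_catalog()
register_experiment(
    conn,
    "CPD-1B",
    {"instrument": "PW3710", "anode": "Cu", "measured_at": "20/12/1997 17:18"},
)
# Hypothetical second experiment, included only to show filtering.
register_experiment(conn, "hypothetical-run-2", {"instrument": "PW3710", "anode": "Mo"})
copper_runs = find_by_anode(conn, "Cu")
```

Because the metadata sits in queryable columns, a search across many experiments is a single SQL statement and does not depend on file-naming conventions.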

slide-17
SLIDE 17

Diagram: LIMS. Data lifecycle stages: Plan, Acquire, Process, Analyze, Store, Share, Reuse, Dispose. Other labels: Metadata; Data; Read + Extract; Front-End File Management Tools; Convert + Export; Curation; Archive.

slide-18
SLIDE 18

slide-19
SLIDE 19

Diagram labels: Web Framework; Data Management & Search Engine; Harvester; REST API; GUI; Data Provider; Exporter; User Scripts; Simulation; Measurement; Database; Large Dataset Repository; Images; Large Files; BLOBs; Data; Metadata; Digital Data & Metadata (any format); Data Analysis Infrastructure.

slide-20
SLIDE 20

SampleIdent CPD RR S
BANK 1 7251 726 CONST 500.00 2.00 0.000 0.000
1 153 1 161 1 141 1 141 1 148 1 163 1 139 1 139 1 129 1 132
1 129 1 129 1 151 1 121 1 129 1 127 1 127 1 151 1 139 1 146
1 129 1 134 1 125 1 114 1 129 1 127 1 125 1 129 1 121 1 121

SampleIdent CPD RR Sample 1B
DataFileName CPD-1B
DiffrType PW3710
GeneratorVoltage 40
TubeCurrent 40
Anode Cu
Alpha1 1.54056
Alpha2 1.54439
Ratio 0.50000
MonochromatorUsed YES
DivergenceSlit 1
ReceivingSlit 0.3
5.000 0.020 150.000
MeasureDateTime 20/12/1997 17:18
StepTime 3.00
184 171 182 184 176 169 156 161 182 166
171 163 146 158 158 169 182 151 171 136
156 158 148 153 151 156 139 158 125 163

This file was converted to xda by WinFit!
5.0000 150.0000 0.0200 1.0000
177. 182. 174. 154. 177. 156. 172. 161. 146. 169.
144. 154. 161. 156. 144. 166. 164. 119. 182. 135.
164. 128. 154. 114. 142. 121. 144. 154. 137. 137.

Undefined Structure

Only an expert human can understand this number. To a computer, this is a meaningless collection of numbers.

Different Formats
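To make the problem concrete, the sketch below parses the named header fields from the legacy sample above (DiffrType, Anode, Alpha1, and so on) and attaches the names and units that the file itself never states. It assumes the original file held one field per line, which the flattened slide text does not confirm. The function names are hypothetical, and the "angstrom" unit and Siegbahn labels are taken from the structured version of the same values shown elsewhere in this deck.

```python
# Header lines copied from the legacy sample; one field per line is an
# assumption about the original file layout.
SAMPLE_HEADER = [
    "SampleIdent CPD RR Sample 1B",
    "DataFileName CPD-1B",
    "DiffrType PW3710",
    "GeneratorVoltage 40",
    "TubeCurrent 40",
    "Anode Cu",
    "Alpha1 1.54056",
    "Alpha2 1.54439",
    "5.000 0.020 150.000",
    "StepTime 3.00",
]


def parse_header(lines):
    """Map 'Key value' lines to a dict; lines starting with a number are skipped."""
    header = {}
    for line in lines:
        key, _, value = line.strip().partition(" ")
        if key and key[0].isalpha():
            header[key] = value.strip()
    return header


def to_structured(header):
    """Attach the names and units that the legacy file leaves implicit.

    The unit and the Siegbahn labels are supplied by hand here; nothing in
    the legacy header states them.
    """
    emission_lines = [
        {"Siegbahn": label,
         "wavelength": {"value": float(header[key]), "unit": "angstrom"}}
        for key, label in (("Alpha1", "Kalpha1"), ("Alpha2", "Kalpha2"))
        if key in header
    ]
    return {
        "tube": {
            "anode-material": header.get("Anode"),
            "spectra": {"emission-line": emission_lines},
        }
    }


header = parse_header(SAMPLE_HEADER)
record = to_structured(header)
```

The unlabeled line `5.000 0.020 150.000` is skipped by the parser, since nothing in the file says what the three numbers mean. That is the slide's point: the missing meaning has to come from a person who knows the format.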

slide-21
SLIDE 21

{"diffractogram": {
  "xray-source": {
    "tube": {
      "anode-material": "Cu",
      "spectra": {"emission-line": [
        {"Siegbahn": "Kalpha", "wavelength": {"value": 1.54184, "unit": "angstrom"}},
        {"Siegbahn": "Kalpha1", "wavelength": {"value": 1.54056, "unit": "angstrom"}},
        {"Siegbahn": "Kalpha2", "wavelength": {"value": 1.54439, "unit": "angstrom"}}
      ]}}},
  "pattern-data": {
    "angle-2-theta": {
      "value": [9.3, 9.32, 9.34, ... 75.16, 75.18, 75.2],
      "unit": "degree"},
    "intensity": {
      "value": [681.02, 687.34, 703.49, ... 127.52, 124.29, 118.32],
      "unit": "arbitrary"}}}}
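With named fields and explicit units, a generic client can read the record without format-specific knowledge. The sketch below is one possible reader, not part of any NIST tool. The helper names are hypothetical, and the arrays are cut down to the first three values shown on the slide, because the slide elides the rest and a literal `...` is not valid JSON.

```python
import json

# Same structure as the slide; arrays hold only the leading values shown there.
DOC = json.loads("""
{"diffractogram": {
  "xray-source": {"tube": {"anode-material": "Cu",
    "spectra": {"emission-line": [
      {"Siegbahn": "Kalpha",  "wavelength": {"value": 1.54184, "unit": "angstrom"}},
      {"Siegbahn": "Kalpha1", "wavelength": {"value": 1.54056, "unit": "angstrom"}},
      {"Siegbahn": "Kalpha2", "wavelength": {"value": 1.54439, "unit": "angstrom"}}
    ]}}},
  "pattern-data": {
    "angle-2-theta": {"value": [9.3, 9.32, 9.34], "unit": "degree"},
    "intensity": {"value": [681.02, 687.34, 703.49], "unit": "arbitrary"}}}}
""")


def wavelength(doc, siegbahn):
    """Look up an emission line by its Siegbahn label; return (value, unit)."""
    lines = doc["diffractogram"]["xray-source"]["tube"]["spectra"]["emission-line"]
    for line in lines:
        if line["Siegbahn"] == siegbahn:
            return line["wavelength"]["value"], line["wavelength"]["unit"]
    raise KeyError(siegbahn)


def pattern_points(doc):
    """Pair each 2-theta angle with its intensity and report both units."""
    pattern = doc["diffractogram"]["pattern-data"]
    angles, counts = pattern["angle-2-theta"], pattern["intensity"]
    if len(angles["value"]) != len(counts["value"]):
        raise ValueError("angle and intensity arrays differ in length")
    points = list(zip(angles["value"], counts["value"]))
    return points, (angles["unit"], counts["unit"])


kalpha1 = wavelength(DOC, "Kalpha1")
points, units = pattern_points(DOC)
```

Every number is reached through a named key and carries its unit, so the reader needs no positional conventions or expert decoding.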

slide-22
SLIDE 22

slide-23
SLIDE 23

Substance Module Physical Quantity Types

slide-24
SLIDE 24

slide-25
SLIDE 25

slide-26
SLIDE 26

  • Data management practices in astronomy, and the VO in particular, have inspired similar efforts in other fields
      – Space science/space physics VxOs (VSO, VHO, VMO, VITMO)
      – Materials science
      – Metrology
      – Ecology/environmental science
      – Life/bioscience
      – Neuroscience
  • Other communities are envious of astronomy’s global data format, FITS
      – Other fields must contend with a myriad of formats, many proprietary
  • Also similar challenges, such as interoperability and semantic standards
  • Independent development of similar architecture

slide-27
SLIDE 27

slide-28
SLIDE 28

slide-29
SLIDE 29

slide-30
SLIDE 30

  • Sean Hill, European Brain Initiative…

slide-31
SLIDE 31

slide-32
SLIDE 32

  • Core VO design/architecture is strong and has been adopted by major data-providing organizations in astronomy; now informing data system design in other fields
  • Federated, distributed systems have many benefits…
      – Scalability
      – Flexibility
      – Distributed curation by experts
      – Communities can build on core infrastructure
  • …and challenges
      – Consensus must be reached on metadata standards and realistic goals for interoperability

slide-33
SLIDE 33

https://www.nist.gov/mml/odi