SLIDE 1 DI4R @ Lisbon 11/10/2018 J.B.R. Oonk (SURFsara)
Frontier of Data Discovery: Metadata (MD) in Astronomy “A user's perspective”
* Background story
* Data formats & Metadata
* VO standards & Provenance
* Conclusions & LOFAR EOSC
[Image: Stellar nursery in Cygnus]
SLIDE 2 An introduction to (radio) astronomy
Astronomy: scientific study of objects that exist naturally in space → moon, sun, planets, stars, galaxies, black holes, ...
To understand celestial sources we want sensitive measurements with the highest possible resolution across the EM spectrum.
[Figure: flux versus wavelength (Orion)]
SLIDE 3 LOFAR: a pan-European astronomy project
Aeneas @ Bologna 08/10/2018 J.B.R. Oonk (SURFsara)
[Map labels: Amsterdam, Poznan; arrow toward Medicina, Italy]
LOFAR data → Archive
* 3 Long Term Archive (LTA) sites
- SURFsara (NL)
- FZ Juelich (GER)
- PSNC (POL)
Data is findable via database metadata (web portal)
SLIDE 4 LOFAR: a pan-European astronomy project
LOFAR Science (Key Science Projects, KSP)
* Birth of the first stars
* Evolution of black holes
* Neutron stars / pulsars
* Exo-planets
* Space weather
SLIDE 5
World map of Radio Astronomy
Radio telescopes: spread all over the world and run by different organizations
SLIDE 6
World map of Radio+Optical Astronomy
Optical telescopes: spread all over the world and run by different organizations
SLIDE 7
World/Space map of Astronomy
Progress in astronomy is driven by combining data. This requires:
* Data sharing, common data formats & metadata standards!
SLIDE 8 Common Data formats – FITS & MS
FITS = Flexible Image Transport System (since 1981; v4.0 in 2016; IAU)
Radio: Measurement Set (MS) = set of tables in a directory structure
(courtesy: Allegrazza 2012)
SLIDE 9 Metadata – headers & keywords
* Header = machine-readable metadata → 80 characters per line (number of lines not limited)
* Header can also be used for provenance, but this is not automatically enforced (it is left to users)!
(courtesy: Allegrazza 2012)
Metadata make data findable, queryable and quantitative.
SLIDE 10 Virtual Observatory & Standards
“The Virtual Observatory (VO) is the vision that astronomical datasets and other resources should work as a seamless whole.”
* Governed by the International Virtual Observatory Alliance (IVOA)
* VOs → access & interoperability between data collections
* >20 VO projects world-wide (e.g. large data centres: ESA, NASA)
Note: finding sustained funding for the maintenance of VO tools is difficult
* IVOA has agreed on an improved (W3C) provenance model (2018)
* Some IVOA standards have been adopted by other sciences.
(Arvisset+2018)
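To make the idea of a W3C-style provenance model concrete, here is a minimal sketch built on the three core W3C PROV concepts (entity, activity, agent) and the relations "used" and "was generated by". It is not the IVOA data model itself; the class layout, the `lineage` helper and all file and agent names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A data product, e.g. a raw Measurement Set or a FITS image."""
    name: str


@dataclass
class Activity:
    """A processing step: who ran it, what it used, what it generated."""
    name: str
    agent: str                                      # "was associated with"
    used: list = field(default_factory=list)        # input entities
    generated: list = field(default_factory=list)   # output entities


def lineage(entity, activities):
    """Return the names of all upstream entities, nearest ancestor first."""
    result, seen, frontier = [], {entity}, [entity]
    while frontier:
        current = frontier.pop(0)
        for activity in activities:
            if current in activity.generated:
                for source in activity.used:
                    if source not in seen:
                        seen.add(source)
                        result.append(source.name)
                        frontier.append(source)
    return result


# Hypothetical processing chain: raw data -> image -> source catalogue.
raw = Entity("raw_visibilities.MS")
image = Entity("image.fits")
catalogue = Entity("source_catalogue")
steps = [
    Activity("imaging", agent="user A", used=[raw], generated=[image]),
    Activity("source finding", agent="user A", used=[image], generated=[catalogue]),
]
print(lineage(catalogue, steps))
```

With records like these stored next to each product, a high-level catalogue can be traced back to the raw observation it came from.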
SLIDE 11
Data Discovery
VO standards enable (incomplete) data discovery across collections
→ e.g. missing: (i) collections from some telescopes, (ii) high-level user-processed data
[Screenshots: ESASky (sky.esa.int), NASA MAST (archive.stsci.edu)]
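As one example of how a VO standard lets the same query run against different collections, the sketch below builds a request URL in the style of the IVOA Simple Cone Search protocol, which takes a sky position and search radius as `RA`, `DEC` and `SR` in decimal degrees. The service address is a placeholder, the coordinates are only an approximate position for the Orion Nebula, and no network request is made.

```python
from urllib.parse import urlencode


def cone_search_url(base_url, ra_deg, dec_deg, radius_deg):
    """Build a Simple Cone Search style URL (RA, DEC, SR in decimal degrees)."""
    if not 0.0 <= ra_deg <= 360.0:
        raise ValueError("RA must be between 0 and 360 degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("DEC must be between -90 and +90 degrees")
    if radius_deg < 0.0:
        raise ValueError("search radius must be non-negative")
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{query}"


# Placeholder service address; approximate position of the Orion Nebula.
url = cone_search_url("https://vo.example.org/scs", 83.82, -5.39, 0.1)
print(url)
```

Because the parameter names are fixed by the standard, only the base address changes from one archive to the next. A collection that does not expose such an interface stays invisible to this kind of discovery.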
SLIDE 12 HL user products I: Journals
High-level (HL) user products are only described in journals → provenance?
Journals & papers provide “metadata” that sits far from the data itself.
Access to and discovery of papers in astronomy across journals:
Search on keywords (e.g. sky objects) [open access]
SLIDE 13 HL user products II: CDS/VizieR
Access to some HL products from journals → does not solve provenance!
(digital repositories for tables, images)
* Alternatives: e.g. NASA/IPAC Extragalactic Database, Canadian Astronomy Data Centre.
SLIDE 14 Conclusions – MD & Astronomy
Astronomy → FITS & MS are the standard formats for (meta)data
- High-level (user) products often lack provenance!
- Data discovery: there are many VOs, but are they complete?
LOFAR is part of modern multi-wavelength astronomy
LOFAR metadata & EOSC (Pilot/Hub):
* LOFAR archived/products data are MS/FITS with metadata incl.
* LTA model has a metadata model that includes provenance
* Metadata will always be kept, even for deleted raw data
* High-level user product archive: PID, B2 services – EOSC
* LOFAR sky surveys aim for publication following VO standards
SLIDE 15
Extra
FAIR data in astronomy
F – findable: Yes, there are many archives
A – accessible: Yes, but no single sign-on
I – interoperable: Yes, VO standards (+ common data formats)
R – reusable: Yes, driven by F, A, I and the scientific need to combine data in time, space and frequency
* Caveat: current high-level products generated by individual users lack provenance → reproducibility