Data at the Leibniz-Institute for Astrophysics Kristin Riebe AIP - - PowerPoint PPT Presentation

Aug 18, 2022

SLIDE 1

Data at the Leibniz-Institute for Astrophysics

Kristin Riebe

SLIDE 2

AIP – Leibniz-Institute for Astrophysics Potsdam

Research areas:

– cosmic magnetic fields (solar/stellar physics, magnetohydrodynamics)
– extragalactic astrophysics (galactic archeology, galaxies and quasars, cosmology)

Development of research technology and infrastructure:

– Robotic telescopes, (3D) spectroscopy
– Supercomputing and E-Science

Participation in many projects:

– e.g. RAVE, ROSAT, XMM-Newton, LOFAR, MUSE, ...

SLIDE 3

Example data types at AIP

Observations:

– RAVE: radial velocity measurements + spectra
– SDSS: mirror of DR7, catalog server
– "minor data sets":
  Plate archive (historical plates)
  CALIFA (spectra of galaxies)
  Cepheids (collection of data for time series), ...

Simulation data:

– Magnetohydrodynamics
– Cosmological simulations: particle data, dark matter halo catalogues, halo merger history, ...

SLIDE 4

Behind the scenes

Supercomputers: Leibniz, Babel, for in-house simulations, data processing

Almagest: Graywulf cluster for archiving, exchanging data, hosting databases, publishing data, 700 TB disk space

Virtual research environment:

– Erebos: ~ 250 TB disk space
– Used by CLUES collaboration to exchange and process data

Web servers for publishing smaller data sets

SLIDE 5

Data center task: Extract – Transform – Load

[Diagram: Extract, Transform, Load; boxes labelled Server and Webserver]

– Extract: from different sources
– Transform: checking, corrections, additions; bring into (standard) format
– Load: publish the data

SLIDE 6

Example: MultiDark Database

Collaboration with Spanish MultiDark project
Publish data of cosmological simulations in a simulation database
Similar success to MillenniumDB! :-)
http://www.multidark.org
2 simulations uploaded (12+6 TB)
> 1 million queries in 2 years, ~ 1500 per day, 4 TB downloaded
~ 140 registered users

SLIDE 7

Example workflow: MultiDark Database

Extract:

– Cosmologists produce data, copy them to a server at AIP (VRE)

Transform:

– We check data and reading routines, data curation (C/Fortran/Perl/Python)

Load:

– Ingest data into database (SQL, bulk copy)

Check and test:

– Check the data for completeness, consistency (SQL)
– Create Peano-Hilbert keys, indexes (C#, Spatial 3D library (T. Budavari, G. Lemson))

Publish:

– Using simpledb (Gerard Lemson, Millennium DB, jsp)
– Write/update documentation; update admin tables of the database
– Inform users
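
The extract, transform, load and check steps above can be sketched in a few lines. This is only an illustration and not the MultiDark pipeline: it uses Python's built-in sqlite3 module instead of the production database, and the table name `halos`, its columns and the sample rows are all made up for the example.

```python
import sqlite3

# Hypothetical rows as a reading routine might deliver them:
# (halo_id, x, y, z, mass). All values are invented for illustration.
RAW_ROWS = [
    (1, 10.0, 20.0, 30.0, 1.0e12),
    (2, 11.5, 19.0, 31.0, 5.0e11),
    (3, float("nan"), 5.0, 6.0, 2.0e11),  # broken record
]


def transform(rows):
    """Drop rows containing NaN (NaN is the only value not equal to itself)."""
    return [row for row in rows if all(value == value for value in row)]


def load(conn, rows):
    """Create the (assumed) table and bulk-insert all rows in one call."""
    conn.execute(
        "CREATE TABLE halos ("
        "halo_id INTEGER PRIMARY KEY, x REAL, y REAL, z REAL, mass REAL)"
    )
    conn.executemany("INSERT INTO halos VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def check(conn, expected_rows):
    """Completeness and consistency checks expressed in SQL."""
    (n_rows,) = conn.execute("SELECT COUNT(*) FROM halos").fetchone()
    (n_bad,) = conn.execute(
        "SELECT COUNT(*) FROM halos WHERE mass <= 0"
    ).fetchone()
    return n_rows == expected_rows and n_bad == 0


conn = sqlite3.connect(":memory:")
clean = transform(RAW_ROWS)
load(conn, clean)
ok = check(conn, expected_rows=len(clean))
```

Keeping the checks in SQL means they run against what actually ended up in the database, not against the input files.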

SLIDE 8

Transform: Data curation

Check completeness of data sets
Create homogeneous data sets, bring into useful (standard) formats
Add identifiers, grid indexes etc. for faster queries & for representing relations in the database
Cross-link data with other catalogues

=> usually we applied tailor-made solutions, tuned to each individual data set, custom reading routines required
=> now things are improving ...
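
As an illustration of what "adding grid indexes" can mean, the sketch below maps a position in a periodic cubic box to one integer cell number that a database can index. The function name, the box size of 1000 and the 10 cells per side are assumptions chosen for the example, not values taken from the slides.

```python
def grid_index(x, y, z, box_size, n_cells):
    """Return one integer cell index for a position in a periodic cubic box.

    The box is divided into n_cells cells along each axis; positions
    outside [0, box_size) wrap around periodically.
    """
    cell = box_size / n_cells
    ix = int(x // cell) % n_cells
    iy = int(y // cell) % n_cells
    iz = int(z // cell) % n_cells
    return (ix * n_cells + iy) * n_cells + iz


# Assumed example: box of side 1000 (arbitrary length units), 10 cells per side.
idx = grid_index(150.0, 250.0, 350.0, box_size=1000.0, n_cells=10)
```

A spatial query can then be restricted to a handful of cell numbers before the exact coordinates are compared.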

SLIDE 9

DBIngestor and libhilbert

DBIngestor library + AsciiIngest

– Adrian Partl, https://github.com/adrpar/DBIngestor, …/AsciiIngest
– Apply converters (unit conversions, adding identifiers for db indexing, spatial grid indexes)
– Apply asserters (nan, inf etc.)
– => transform and load in one go
– Easy to write own converters & add own reading routines for binary data
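
The converter/asserter idea can be sketched as a chain of small functions applied to each column while a row is read. This is a conceptual Python sketch only; it does not use or reproduce the DBIngestor API, and all names (`assert_finite`, `kpc_to_mpc`, `ingest_row`, `PIPELINE`) are hypothetical.

```python
import math


def assert_finite(value):
    """Asserter: reject NaN and infinite values, pass everything else through."""
    if math.isnan(value) or math.isinf(value):
        raise ValueError(f"non-finite value: {value!r}")
    return value


def kpc_to_mpc(value):
    """Converter: example unit conversion from kpc to Mpc."""
    return value / 1000.0


def ingest_row(row, pipeline):
    """Apply each column's chain of asserters/converters, in order."""
    out = {}
    for column, steps in pipeline.items():
        value = row[column]
        for step in steps:
            value = step(value)
        out[column] = value
    return out


# Hypothetical per-column configuration.
PIPELINE = {
    "x": [assert_finite, kpc_to_mpc],
    "mass": [assert_finite],
}

converted = ingest_row({"x": 2500.0, "mass": 1.0e12}, PIPELINE)
```

Because checking and conversion happen while the row is being read, the transformed row can be handed straight to the database insert, which is the "transform and load in one go" point above.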

C-library libhilbert

– For creating indexes of space-filling Peano-Hilbert curve in 20 dimensions
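
To show what such a key looks like, here is the well-known textbook algorithm for the two-dimensional Hilbert curve. It is a sketch of the idea only, not libhilbert's interface; the function name `hilbert_key_2d` is made up, and `n` is assumed to be a power of two.

```python
def hilbert_key_2d(n, x, y):
    """Return the Hilbert-curve index of cell (x, y) on an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Order all cells of a 4 x 4 grid along the curve.
cells = sorted(
    ((x, y) for x in range(4) for y in range(4)),
    key=lambda c: hilbert_key_2d(4, c[0], c[1]),
)
```

Cells that follow each other in `cells` are always direct neighbours on the grid, which is why sorting or indexing rows by such a key keeps spatially close objects close together on disk.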

SLIDE 10

Data publication

Many possibilities, very often individual solutions for each project
Now: new webapp Daiquiri, http://escience.aip.de/daiquiri/
Developed by Jochen Klar and Adrian Partl
Web application for publishing data
Modular, highly customizable
Using PHP, Zend Framework
Modern interface using Bootstrap, jQuery
Authentication, query interface
WordPress integration
One code base to serve most needs, open source, (easily) extendable

SLIDE 11

Daiquiri examples

MultiDark2
CALIFA
4MOST workshop
Plate Archive
Jubilee, Curie simulation database in Madrid

http://escience.aip.de/daiquiri/

SLIDE 12

Screenshot

SLIDE 13

Screenshot

SLIDE 14

Screenshot

SLIDE 15

VO compliance

Currently working on including VO protocols with Daiquiri

– Download data as VOTables (MySQL-VOTable-Dump, see github)
– TAP protocol for accessing data
– UWS for job queues (MySQL query queue)
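
For readers unfamiliar with the format, a VOTable is an XML document with FIELD declarations followed by rows of TR/TD cells. The sketch below writes a minimal one with Python's standard library. It is not the MySQL-VOTable-Dump tool; the function name, the table name `results`, the sample columns and the choice of VOTable version 1.3 are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"


def rows_to_votable(fields, rows):
    """Serialize rows into a minimal VOTable (TABLEDATA encoding).

    fields: list of (name, datatype) pairs, e.g. ("mass", "double")
    rows:   list of tuples with one value per field
    """
    votable = ET.Element("VOTABLE", {"version": "1.3", "xmlns": VOTABLE_NS})
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE", {"name": "results"})
    for name, datatype in fields:
        ET.SubElement(table, "FIELD", {"name": name, "datatype": datatype})
    data = ET.SubElement(table, "DATA")
    tabledata = ET.SubElement(data, "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")


# Hypothetical query result: two halos with id and mass.
xml_text = rows_to_votable(
    [("halo_id", "long"), ("mass", "double")],
    [(1, 1.0e12), (2, 5.0e11)],
)
```

A production writer would also add units, UCDs and descriptions to each FIELD; they are left out here to keep the structure visible.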

Problems:

– No public PHP libraries for IVOA protocols available (only in Java)
– But the community rather needs PHP or Python implementations
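
As a small example of what a Python-side implementation involves, the sketch below builds the URL for a synchronous TAP query using the request parameters defined in TAP 1.0 (REQUEST, LANG, FORMAT, QUERY). The base URL `https://example.org/tap` and the table name `halos` are placeholders, not a real service, and no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def tap_sync_url(base_url, adql):
    """Build the URL of a synchronous TAP query (TAP 1.0 style parameters)."""
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "FORMAT": "votable",
        "QUERY": adql,
    }
    return base_url.rstrip("/") + "/sync?" + urlencode(params)


# Placeholder service and table, for illustration only.
url = tap_sync_url("https://example.org/tap", "SELECT TOP 10 * FROM halos")
query_params = parse_qs(urlsplit(url).query)
```

Long-running queries would instead go through the asynchronous endpoint, where UWS takes over the job handling.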

SLIDE 16

Concluding Remarks

Common tasks for each data publication: extracting, transforming, uploading the data

Different tool for each data set?

– Should rather use only a few generalized tools, reusable, easier to maintain
– Takes a lot of time to develop
– => Collect tools from data centers? Combine efforts?

Would like to have more implementations/libraries of VO protocols, in different languages