Data at the Leibniz-Institute for Astrophysics, Kristin Riebe, AIP



SLIDE 1

Data at the Leibniz-Institute for Astrophysics

Kristin Riebe

SLIDE 2

AIP – Leibniz-Institute for Astrophysics Potsdam

  • Research areas:

– cosmic magnetic fields (solar/stellar physics, magnetohydrodynamics)
– extragalactic astrophysics (galactic archeology, galaxies and quasars, cosmology)

  • Development of Research Technology and Infrastructure

– Robotic telescopes, (3D) spectroscopy
– Supercomputing and E-Science

  • Participation in many projects

– e.g. RAVE, ROSAT, XMM-Newton, LOFAR, MUSE, ...

SLIDE 3

Example data types at AIP

  • Observations:

– RAVE

    • Radial velocity measurements + spectra

– SDSS

    • Mirror of DR7, catalog server

– "minor data sets":

    • Plate archive (historical plates)
    • CALIFA (spectra of galaxies)
    • Cepheids (collection of data for time series), ...

  • Simulation data:

– Magnetohydrodynamics
– Cosmological simulations: particle data, dark matter halo catalogues, halo merger history, ...

SLIDE 4

Behind the scenes

  • Supercomputers: Leibniz, Babel, for in-house simulations, data processing
  • Almagest: Graywulf cluster for archiving, exchanging data, hosting databases, publishing data, 700 TB disk space
  • Virtual research environment:

– Erebos: ~ 250 TB disk space
– Used by CLUES collaboration to exchange and process data

  • Web servers for publishing smaller data sets

SLIDE 5

Data center task: Extract – Transform – Load

  • Extract: from different sources
  • Transform: checking, corrections, additions; bring into (standard) format
  • Load: publish the data (server, webserver)

SLIDE 6

Example: MultiDark Database

  • Collaboration with Spanish MultiDark project
  • Publish data of cosmological simulations in a simulation database
  • Similar success to MillenniumDB! :-)
  • http://www.multidark.org
  • 2 simulations uploaded (12+6 TB)
  • > 1 million queries in 2 years, ~ 1500 per day, 4 TB downloaded

  • ~ 140 registered users

SLIDE 7

Example workflow: MultiDark Database

  • Extract:

– Cosmologists produce data, copy them to a server at AIP (VRE)

  • Transform:

– We check data and reading routines, data curation (C/Fortran/Perl/Python)

  • Load:

– Ingest data into database (SQL, bulk copy)

  • Check and test:

– Check the data for completeness, consistency (SQL)
– Create Peano-Hilbert keys, indexes (C#, Spatial 3D library (T. Budavari, G. Lemson))

  • Publish:

– Using simpledb (Gerard Lemson, Millennium DB, jsp)
– Write/update documentation; update admin tables of the database
– Inform users
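
The extract, transform, load and check steps of this workflow can be illustrated with a minimal Python sketch. It is not the AIP pipeline: the table name `halos`, the column names, the sample rows and the use of an in-memory SQLite database are all assumptions made for illustration (the slides mention SQL and bulk copy, but not a specific schema).

```python
import csv
import io
import math
import sqlite3

# Hypothetical halo catalogue as it might arrive from a simulation group.
# Columns and values are invented; the last row is deliberately broken.
RAW = """haloId,x,y,z,mass
1,10.5,20.1,30.2,1.2e12
2,11.0,21.3,29.8,3.4e11
3,nan,5.0,6.0,2.0e10
"""

def extract(text):
    """Extract: read rows from the delivered file (here an in-memory CSV)."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert types and drop rows with non-finite values."""
    clean = []
    for row in rows:
        values = [float(row[k]) for k in ("x", "y", "z", "mass")]
        if all(math.isfinite(v) for v in values):
            clean.append((int(row["haloId"]), *values))
    return clean

def load(conn, rows):
    """Load: bulk-insert all rows in a single transaction."""
    conn.execute(
        "CREATE TABLE halos "
        "(haloId INTEGER PRIMARY KEY, x REAL, y REAL, z REAL, mass REAL)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO halos VALUES (?, ?, ?, ?, ?)", rows)

def check(conn, expected):
    """Check: compare the row count in the database with the expected count."""
    (count,) = conn.execute("SELECT COUNT(*) FROM halos").fetchone()
    return count == expected

conn = sqlite3.connect(":memory:")
rows = transform(extract(RAW))
load(conn, rows)
print(check(conn, expected=len(rows)))
```

In a real ingest the completeness check would compare against a count supplied by the data producers rather than against the transformed rows themselves.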

SLIDE 8

Transform: Data curation

  • Check completeness of data sets
  • Create homogeneous data sets, bring into useful (standard) formats
  • Add identifiers, grid indexes etc. for faster queries & for representing relations in the database
  • Cross-link data with other catalogues

=> so far we have usually applied tailor-made solutions, tuned to each individual data set, custom reading routines required
=> now things are improving ...
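
As an illustration of how a grid index can speed up queries, the sketch below assigns each particle a single integer cell number and indexes that column, so a lookup of "everything in this cell" becomes an indexed integer comparison instead of six range conditions on x, y, z. The box size, grid resolution, table layout and sample particles are assumptions for this example, not values from the MultiDark database.

```python
import sqlite3

BOX = 1000.0   # assumed box size (e.g. in Mpc/h); hypothetical value
NGRID = 10     # assumed number of grid cells per dimension

def grid_index(x, y, z, box=BOX, ngrid=NGRID):
    """Map a position inside the box to a single integer cell index."""
    cell = box / ngrid
    # min() guards against positions sitting exactly on the upper box edge
    ix, iy, iz = (min(int(c / cell), ngrid - 1) for c in (x, y, z))
    return (ix * ngrid + iy) * ngrid + iz

# Invented sample particles: (id, x, y, z)
particles = [
    (1, 50.0, 50.0, 50.0),
    (2, 55.0, 60.0, 99.0),
    (3, 950.0, 10.0, 500.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE particles (id INTEGER, x REAL, y REAL, z REAL, cell INTEGER)"
)
conn.executemany(
    "INSERT INTO particles VALUES (?, ?, ?, ?, ?)",
    [(pid, x, y, z, grid_index(x, y, z)) for pid, x, y, z in particles],
)
conn.execute("CREATE INDEX idx_cell ON particles (cell)")

# All particles in the cell containing (50, 50, 50): one indexed lookup
target = grid_index(50.0, 50.0, 50.0)
ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM particles WHERE cell = ? ORDER BY id", (target,)
    )
]
print(ids)
```

A query for a larger region would first compute the set of overlapping cells and then filter the candidates by their exact coordinates.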

SLIDE 9

DBIngestor and libhilbert

  • DBIngestor library + AsciiIngest

– Adrian Partl, https://github.com/adrpar/DBIngestor, …/AsciiIngest
– Apply converters (unit conversions, adding identifiers for db indexing, spatial grid indexes)
– Apply asserters (nan, inf etc.)
– => transform and load in one go
– Easy to write own converters & add own reading routines for binary data

  • C-library libhilbert

– For creating indexes of space-filling Peano-Hilbert curve in 20 dimensions
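
The converter/asserter idea and the Peano-Hilbert key can be sketched together in a few lines of Python. This is not the API of DBIngestor or libhilbert (both are C/C++ libraries whose interfaces are not shown in the slides); every function name below is hypothetical. The Hilbert index uses the standard iterative 2D algorithm on a small grid purely for illustration, while the real library targets higher dimensions.

```python
import math

def hilbert_index_2d(n, x, y):
    """Index of cell (x, y) along a Hilbert curve on an n x n grid.

    n must be a power of two. Standard iterative 2D algorithm.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Converters: take a row (dict) and return a modified copy
def kpc_to_mpc(row):
    """Unit conversion: positions from kpc to Mpc."""
    return {**row, "x": row["x"] / 1000.0, "y": row["y"] / 1000.0}

def add_hilbert_key(row, n=4, box=1.0):
    """Add a space-filling-curve key for a box of side `box` (assumed Mpc)."""
    ix = min(int(row["x"] / box * n), n - 1)
    iy = min(int(row["y"] / box * n), n - 1)
    return {**row, "phkey": hilbert_index_2d(n, ix, iy)}

# Asserter: raise if a row contains NaN or inf
def assert_finite(row):
    for key, value in row.items():
        if isinstance(value, float) and not math.isfinite(value):
            raise ValueError(f"non-finite value in column {key!r}")

def ingest(rows, converters, asserters):
    """Run asserters, then converters, on every row ("transform in one go")."""
    out = []
    for row in rows:
        for check in asserters:
            check(row)
        for convert in converters:
            row = convert(row)
        out.append(row)
    return out

# Invented sample rows with positions in kpc
rows = [{"id": 1, "x": 100.0, "y": 100.0}, {"id": 2, "x": 900.0, "y": 100.0}]
result = ingest(rows, [kpc_to_mpc, add_hilbert_key], [assert_finite])
print([r["phkey"] for r in result])
```

Because consecutive Hilbert indices always belong to neighbouring cells, rows sorted by such a key keep spatially close objects close together on disk, which is what makes the key useful for spatial queries.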

SLIDE 10

Data publication

  • Many possibilities, very often individual solutions for each project
  • Now: new webapp Daiquiri, http://escience.aip.de/daiquiri/
  • Developed by Jochen Klar and Adrian Partl
  • Web application for publishing data
  • Modular, highly customizable
  • Using PHP, Zend-framework
  • Modern interface using bootstrap, jQuery
  • Authentication, Query Interface
  • Wordpress integration
  • One code base to serve most needs
  • Open source, (easily) extendable

SLIDE 11

Daiquiri examples

  • MultiDark2
  • Califa
  • 4MOST workshop
  • Plate Archive
  • Jubilee, Curie simulation database in Madrid

http://escience.aip.de/daiquiri/

SLIDE 12

Screenshot

SLIDE 13

Screenshot

SLIDE 14

Screenshot

SLIDE 15

VO compliance

  • Currently working on including VO protocols with Daiquiri

– Download data as VOTables (MySQL-VOTable-Dump, see github)
– TAP protocol for accessing data
– UWS for job queues (MySQL query queue)

  • Problems:

– No public PHP libraries for IVOA protocols available (only in Java)
– But community rather needs PHP or Python implementations
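
To give an idea of what a client-side TAP request looks like, the sketch below builds the URL for a synchronous query using the parameters defined in TAP 1.0 (`REQUEST`, `LANG`, `FORMAT`, `QUERY`). The base URL and the table name are placeholders, not an actual Daiquiri endpoint, and no network request is made.

```python
from urllib.parse import urlencode

def tap_sync_url(base_url, adql, fmt="votable"):
    """Build the URL for a synchronous TAP query (TAP 1.0 parameter names)."""
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "FORMAT": fmt,
        "QUERY": adql,
    }
    return f"{base_url.rstrip('/')}/sync?{urlencode(params)}"

# Hypothetical service URL and table name, for illustration only
url = tap_sync_url("https://example.org/tap", "SELECT TOP 10 * FROM halos")
print(url)
```

A service that implements TAP would answer such a request with the query result, by default as a VOTable. Longer-running queries would go through the asynchronous endpoint, whose job handling follows UWS.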

SLIDE 16

Concluding Remarks

  • Common tasks for each data publication: extracting, transforming, uploading the data
  • Different tool for each data set?

– Should rather use only a few, generalized tools, reusable, easier to maintain
– Takes a lot of time to develop
– => Collect tools from data centers? Combine efforts?

  • Would like to have more implementations/libraries of VO protocols, in different languages