
Data at the Leibniz-Institute for Astrophysics, Kristin Riebe (AIP)



  1. Data at the Leibniz-Institute for Astrophysics Kristin Riebe

  2. AIP – Leibniz-Institute for Astrophysics Potsdam
     • Research areas:
       – cosmic magnetic fields (solar/stellar physics, magnetohydrodynamics)
       – extragalactic astrophysics (galactic archeology, galaxies and quasars, cosmology)
     • Development of Research Technology and Infrastructure
       – Robotic telescopes, (3D) spectroscopy
       – Supercomputing and E-Science
     • Participation in many projects
       – e.g. RAVE, ROSAT, XMM-Newton, LOFAR, MUSE, ...

  3. Example data types at AIP
     • Observations:
       – RAVE
         • Radial velocity measurements + spectra
       – SDSS
         • Mirror of DR7, catalog server
       – "minor data sets":
         • Plate archive (historical plates)
         • CALIFA (spectra of galaxies)
         • Cepheids (collection of data for time series), ...
     • Simulation data:
       – Magnetohydrodynamics
       – Cosmological simulations: particle data, dark matter halo catalogues, halo merger history, ...

  4. Behind the scenes
     • Supercomputers: Leibniz, Babel, for in-house simulations, data processing
     • Almagest: Graywulf cluster for archiving, exchanging data, hosting databases, publishing data, 700 TB disk space
     • Virtual research environment:
       – Erebos: ~ 250 TB disk space
       – Used by CLUES collaboration to exchange and process data
     • Web servers for publishing smaller data sets

  5. Data center task: Extract – Transform – Load
     • Extract: from different sources (server)
     • Transform: checking, corrections, additions; bring into (standard) format
     • Load: publish the data (web server)
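The three steps on this slide can be sketched in a few lines of Python. This is only an illustration, not the AIP pipeline: the in-memory SQLite database stands in for the real database server, and the sample catalogue, the `halos` table and its columns, and the kpc/h to Mpc/h conversion are all assumptions made for the example.

```python
import csv
import io
import math
import sqlite3

# Hypothetical source data: a tiny ASCII halo catalogue (positions in kpc/h).
RAW = """id x y z mass
1 1000.0 2000.0 3000.0 1.5e12
2 4000.0 5000.0 6000.0 nan
3 7000.0 8000.0 9000.0 3.0e11
"""

def extract(text):
    """Extract: read space-separated rows from the source."""
    return list(csv.DictReader(io.StringIO(text), delimiter=" "))

def transform(rows):
    """Transform: drop rows with a NaN mass, convert positions to Mpc/h."""
    clean = []
    for row in rows:
        mass = float(row["mass"])
        if math.isnan(mass):
            continue  # checking step: skip invalid rows
        pos = [float(row[c]) / 1000.0 for c in ("x", "y", "z")]
        clean.append((int(row["id"]), *pos, mass))
    return clean

def load(rows, conn):
    """Load: bulk-insert the cleaned rows into the (assumed) halos table."""
    conn.execute(
        "CREATE TABLE halos (id INTEGER PRIMARY KEY, x REAL, y REAL, z REAL, mass REAL)"
    )
    conn.executemany("INSERT INTO halos VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
n_loaded = conn.execute("SELECT COUNT(*) FROM halos").fetchone()[0]
print("rows loaded:", n_loaded)
```

Keeping the three steps as separate functions means a new data set only needs a new `extract` routine, while the checks and the loader can be reused.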

  6. Example: MultiDark Database
     • Collaboration with Spanish MultiDark project
     • Publish data of cosmological simulations in a simulation database
     • Similar success to MillenniumDB! :-)
     • http://www.multidark.org
     • 2 simulations uploaded (12+6 TB)
     • > 1 million queries in 2 years, ~ 1500 per day, 4 TB downloaded
     • ~ 140 registered users

  7. Example workflow: MultiDark Database
     • Extract:
       – Cosmologists produce data, copy them to a server at AIP (VRE)
     • Transform:
       – We check data and reading routines, data curation (C/Fortran/Perl/Python)
     • Load:
       – Ingest data into database (SQL, bulk copy)
     • Check and test:
       – Check the data for completeness, consistency (SQL)
       – Create Peano-Hilbert keys, indexes (C#, Spatial 3D library (T. Budavari, G. Lemson))
     • Publish:
       – Using simpledb (Gerard Lemson, Millennium DB, jsp)
       – Write/update documentation; update admin tables of the database
       – Inform users
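To illustrate what a Peano-Hilbert key is, here is a minimal Python sketch of the widely used iterative algorithm for the two-dimensional Hilbert curve. It is not the C# Spatial 3D library named on the slide, and it is 2D only for brevity; the function names, box size and grid resolution are assumptions for the example.

```python
def hilbert_key_2d(n, x, y):
    """Map cell (x, y) on an n x n grid (n a power of two) to its Hilbert index."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d

def position_to_key(px, py, box_size, n):
    """Assign a position inside a periodic box to the Hilbert key of its grid cell."""
    cx = min(int(px / box_size * n), n - 1)
    cy = min(int(py / box_size * n), n - 1)
    return hilbert_key_2d(n, cx, cy)

# Keys for every cell of an 8 x 8 grid: cells with consecutive keys are neighbours.
keys = {hilbert_key_2d(8, x, y): (x, y) for x in range(8) for y in range(8)}
print([keys[d] for d in range(4)])
```

Because cells with nearby keys are also nearby in space, an index on the key column lets the database answer spatial range queries with a small number of contiguous key ranges.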

  8. Transform: Data curation
     • Check completeness of data sets
     • Create homogeneous data sets, bring into useful (standard) formats
     • Add identifiers, grid indexes etc. for faster queries & for representing relations in the database
     • Cross-link data with other catalogues
     => usually we applied tailor-made solutions, tuned to each individual data set, custom reading routines required
     => now things are improving ...
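A grid index of the kind mentioned above can be as simple as the number of the cell a position falls into. The sketch below assumes a cubic box divided into `n_cells` cells per dimension; the function name, the box size of 1000 and the 10 cells per side are illustrative choices, not values from the talk.

```python
def grid_index(x, y, z, box_size, n_cells):
    """Return a single integer cell index for a position inside a cubic box."""
    def cell(coord):
        # Clamp so that a coordinate exactly at the box edge stays in the last cell.
        return min(int(coord / box_size * n_cells), n_cells - 1)

    ix, iy, iz = cell(x), cell(y), cell(z)
    return (ix * n_cells + iy) * n_cells + iz

# A particle at (150, 250, 350) in a box of size 1000 with 10 cells per side.
idx = grid_index(150.0, 250.0, 350.0, box_size=1000.0, n_cells=10)
print(idx)
```

Storing this value in an indexed column allows a query for a small region to touch only the rows of the few cells that overlap it, instead of scanning all positions.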

  9. DBIngestor and libhilbert
     • DBIngestor library + AsciiIngest
       – Adrian Partl, https://github.com/adrpar/DBIngestor, …/AsciiIngest
       – Apply converters (unit conversions, adding identifiers for db indexing, spatial grid indexes)
       – Apply asserters (nan, inf etc.)
       – => transform and load in one go
       – Easy to write own converters & add own reading routines for binary data
     • C-library libhilbert
       – For creating indexes of space-filling Peano-Hilbert curve in 20 dimensions
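The converter/asserter idea can be illustrated with a small Python sketch. This is not the DBIngestor API (DBIngestor is a separate library with its own interface, and it also loads the data); every name below is hypothetical, and the sketch covers only the transform part.

```python
import math

def assert_finite(value):
    """Asserter: reject NaN and infinite values."""
    if math.isnan(value) or math.isinf(value):
        raise ValueError(f"non-finite value: {value}")
    return value

def kpc_to_mpc(value):
    """Converter: unit conversion from kpc to Mpc."""
    return value / 1000.0

# Hypothetical per-column chain of asserters and converters.
COLUMN_STEPS = {
    "x": [assert_finite, kpc_to_mpc],
    "mass": [assert_finite],
}

def ingest(rows, first_id=1):
    """Run each row through its column steps and add a running identifier."""
    accepted, rejected = [], []
    next_id = first_id
    for row in rows:
        try:
            out = {}
            for column, steps in COLUMN_STEPS.items():
                value = float(row[column])
                for step in steps:
                    value = step(value)
                out[column] = value
        except ValueError as err:
            rejected.append((row, str(err)))
            continue
        out["row_id"] = next_id  # converter that adds an identifier for indexing
        next_id += 1
        accepted.append(out)
    return accepted, rejected

rows = [
    {"x": "1500.0", "mass": "2.0e12"},
    {"x": "inf", "mass": "1.0e11"},
    {"x": "250.0", "mass": "nan"},
]
accepted, rejected = ingest(rows)
print(len(accepted), "accepted,", len(rejected), "rejected")
```

Keeping the steps as plain functions in a per-column table makes it straightforward to add a new check or conversion for one data set without touching the reader.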

  10. Data publication
      • Many possibilities, very often individual solutions for each project
      • Now: new webapp Daiquiri, http://escience.aip.de/daiquiri/
      • Developed by Jochen Klar and Adrian Partl
      • Web application for publishing data
      • Modular, highly customizable
      • Using PHP, Zend Framework
      • Modern interface using Bootstrap, jQuery
      • Authentication, Query Interface
      • WordPress integration
      • One code base to serve most needs, open source, (easily) extendable

  11. Daiquiri examples
      • MultiDark2
      • Califa
      • 4MOST workshop
      • Plate Archive
      • Jubilee, Curie simulation database in Madrid
      http://escience.aip.de/daiquiri/

  12. Screenshot

  13. Screenshot

  14. Screenshot

  15. VO compliance
      • Currently working on including VO protocols with Daiquiri
        – Download data as VOTables (MySQL-VOTable-Dump, see github)
        – TAP protocol for accessing data
        – UWS for job queues (MySQL query queue)
      • Problems:
        – No public PHP libraries for IVOA protocols available (only in Java)
        – But community rather needs PHP or Python implementations
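As a small illustration of what a client of the TAP protocol sends, the Python sketch below builds the URL of a synchronous TAP query using the standard `REQUEST`, `LANG` and `QUERY` parameters. The service address `https://example.org/tap` and the table name `halos` are placeholders, not a real Daiquiri endpoint, and nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def tap_sync_url(base_url, adql):
    """Build the URL for a synchronous TAP query (parameters per the IVOA TAP standard)."""
    params = {
        "REQUEST": "doQuery",
        "LANG": "ADQL",
        "QUERY": adql,
    }
    return base_url.rstrip("/") + "/sync?" + urlencode(params)

# Placeholder service URL and table name, for illustration only.
url = tap_sync_url("https://example.org/tap", "SELECT TOP 10 * FROM halos")
print(url)

# Decoding the query string again shows the parameters a service would receive.
received = parse_qs(urlsplit(url).query)
print(received["QUERY"][0])
```

A service answering such a request would typically return the result as a VOTable; long-running queries go through the asynchronous endpoint, whose job handling follows UWS.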

  16. Concluding Remarks
      • Common tasks for each data publication: extracting, transforming, uploading the data
      • Different tool for each data set?
        – Should rather use only a few, generalized tools, reusable, easier to maintain
        – Takes a lot of time to develop
        – => Collect tools from data centers? Combine efforts?
      • Would like to have more implementations/libraries of VO protocols, in different languages
