
Data Ingestion in CTA

Stefano Gallozzi (1), Eva Sciacca (2), L. Angelo Antonelli (1,3), Alessandro Costa (2)

(1) INAF, Astrophysical Observatory of Rome
(2) INAF, Astronomical Observatory of Catania
(3) ASDC, ASI-Science Data Center

RIA-653549
stefano.gallozzi@oa-roma.inaf.it

Cherenkov Telescope Array
https://cta-observatory.org

WHAT: CTA is the worldwide project for the future of Very High Energy gamma-ray astronomy.
  ~20 telescopes for the North site (Canary Islands)
  ~100 telescopes for the South site (Chile)

WHO: The CTA Consortium consists of scientists and engineers from 32 countries on 5 continents and has become a truly global (ESFRI) project.

WHY: One of the major technological challenges is the handling and archiving of the huge amount of data (from 20 to 100 PB/year) coming from the observatory facilities.

Data Life Cycle

CTA Data Model

Data Level  Short Name     Description
DL0         DAQ-RAW        Acquired raw data.
DL1         CALIBRATED     Calibrated camera data.
DL2         RECONSTRUCTED  Reconstructed shower parameters (such as energy, direction, particle ID).
DL3         REDUCED        Sets of selected events with associated instrumental response characterizations needed for science analysis.
DL4         SCIENCE        High-level binned data products (such as spectra, sky maps, or light curves).
DL5         OBSERVATORY    Legacy observatory data (such as survey sky maps or source catalog).
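The data levels in the table above can be captured as a small lookup structure. The following is only an illustrative sketch: the class name `DataLevel` and the helper `by_short_name` are hypothetical and not part of any CTA software, and the descriptions are abbreviated from the table.

```python
from enum import Enum


class DataLevel(Enum):
    """CTA data levels from the table above: (short name, description)."""

    DL0 = ("DAQ-RAW", "Acquired raw data")
    DL1 = ("CALIBRATED", "Calibrated camera data")
    DL2 = ("RECONSTRUCTED", "Reconstructed shower parameters")
    DL3 = ("REDUCED", "Selected events with instrumental response characterizations")
    DL4 = ("SCIENCE", "High-level binned data products")
    DL5 = ("OBSERVATORY", "Legacy observatory data")

    def __init__(self, short_name, description):
        self.short_name = short_name
        self.description = description

    @property
    def number(self):
        # Numeric part of the level name, e.g. 3 for DL3.
        return int(self.name[2:])


def by_short_name(short_name):
    """Hypothetical helper: find a level by its short name, or return None."""
    for level in DataLevel:
        if level.short_name == short_name:
            return level
    return None
```

An ingestion tool could use such a structure to tag each incoming file with its level before indexing it.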

Data Requirements

Without data compression and assuming 165 operational nights/yr:

ASTRI/Prot. → ~0.8 TB/night → ~0.3 PB/year
Mini-Array → ~3 TB/night → ~6.1 TB/night A.R. → ~1.0 PB/year A.R.
CTA → ~8.5 GB/s → ~40 TB/night → ~4 PB/year → ~20 PB/year A.R.

(A.R. = After Reduction → input + processed data, including calibrations, intermediate reduction and MC simulation data)

The figures above are the OPTIMISTIC SCENARIO; the pessimistic one can take ~>100 PB/year!

The CTA Archive system must store, manage, preserve and provide easy access to such a huge amount of data for a long time.
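As a rough check of how a nightly rate turns into a yearly volume, the Mini-Array after-reduction row (~6.1 TB/night, ~1.0 PB/year) can be reproduced with the 165 nights/yr stated above. This sketch assumes decimal units (1 PB = 1000 TB); the function name is hypothetical, and it is applied only to that one row.

```python
NIGHTS_PER_YEAR = 165  # operational nights per year, as stated on the slide
TB_PER_PB = 1000       # assumption: decimal units


def yearly_volume_pb(tb_per_night, nights=NIGHTS_PER_YEAR):
    """Convert a nightly data volume in TB into a yearly volume in PB."""
    return tb_per_night * nights / TB_PER_PB


# Mini-Array after reduction: 6.1 TB/night * 165 nights = 1006.5 TB
mini_array_pb = yearly_volume_pb(6.1)
print(f"Mini-Array A.R.: ~{mini_array_pb:.1f} PB/year")
```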

Archive Framework

Open Archive Information System (OAIS) standard

• The INGEST unit is a collection of software and/or middleware able to receive bulk data of different types coming from the array and to prepare them for storage, performing basic operations such as data indexing, dependency tracking and compression.
• The STORAGE unit guarantees efficient retrieval of ingested data and provides simple management and maintenance of the archive hierarchy. It also supervises the status of the media used in the archive, guaranteeing error control and data security.
• The ADMINISTRATION unit deals with all operations related to the CTA archive system and its management. It ensures archive performance and the fulfillment of standards and requirements by means of dedicated monitoring functionality and recovery from failures.
• The ACCESS unit is a collection of software and on-line services that provide efficient access to the data for the other CTA components (e.g. the data processing pipelines). It also lets CTA users access CTA data according to their specific data access privileges.
• The DATA TRANSFER unit guarantees the transfer of data and data products between the on-site and the off-site zones of the archive system.
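The basic operations attributed to the INGEST unit (indexing, dependencies, compression) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `IngestRecord` fields, the `ingest` function, the file names, and the choice of gzip and SHA-256 are not taken from the CTA archive design.

```python
import gzip
import hashlib
from dataclasses import dataclass, field


@dataclass
class IngestRecord:
    """Hypothetical index entry produced for each ingested data object."""

    name: str
    sha256: str        # checksum of the uncompressed payload
    raw_size: int      # bytes before compression
    stored_size: int   # bytes after compression
    depends_on: list = field(default_factory=list)


def ingest(name, payload, depends_on=()):
    """Compress `payload` and return (index record, compressed bytes)."""
    compressed = gzip.compress(payload)
    record = IngestRecord(
        name=name,
        sha256=hashlib.sha256(payload).hexdigest(),
        raw_size=len(payload),
        stored_size=len(compressed),
        depends_on=list(depends_on),
    )
    return record, compressed


# Example: a fake raw-data block that depends on a calibration object.
record, blob = ingest("run_0001.dl0", b"\x00" * 4096, depends_on=["calib_0001"])
```

The record would go to the metadata index while the compressed bytes go to storage; keeping the checksum of the uncompressed payload lets the STORAGE unit verify integrity after retrieval.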

Architecture

Running the Prototype

• The test infrastructure has been set up using VirtualBox virtual machines and Docker containers.
• Demo datasets coming from the ASTRI project are uploaded to the CTA OneZone within a space supported by the two providers.
• The ingested data are enriched with metadata through the Cloud Data Management Interface (CDMI) or, alternatively, the REST API.
• Metadata queries are performed using the REST API and indexing functions (associated with the space) on pre-defined extended attributes (metadata).
• Alternatively, the CouchBase database (embedded in OneData) can be used to query and retrieve the metadata, using search and query engines (e.g. Elasticsearch, N1QL) or common MapReduce functions, through the standard CouchBase console and the SDK on the client side. This will give higher-level application frameworks and end-user analysis tools versatile access to the whole CTA dataset.

Astronomical data (FITS format?)
data descriptors == metadata

Metadata

Metadata

Sample ingestion:

    curl -k -H "$TOKEN_HEADER" -H "$CDMI_VSN_HEADER" \
         -H 'Content-Type: application/cdmi-object' \
         -d '{"metadata" : {"PROGRAM_ID" : "001"}}' \
         -X PUT "$ENDPOINTDATA"

Sample indexing function:

    function(meta) {
        if (meta['PROGRAM_ID']) {
            return meta['PROGRAM_ID'];
        }
        return null;
    }

Query using a REST-API call:

    curl -v -k --tlsv1.2 -Ss -H "X-Auth-Token: $TOKEN" \
         -X GET "https://$HOST:8443/api/v3/oneprovider/query-index/$INDEX_ID?key=\"0001\"&stale=false"
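To make the role of the indexing function concrete, the sketch below emulates it locally: it maps each file's metadata to an index key and then looks files up by that key. This is only an illustration of the idea, not how OneData evaluates indexes. The file names and the helpers `build_index` and `query_url` are hypothetical; the URL layout is copied from the query shown above, and no request is actually sent.

```python
import json
from urllib.parse import urlencode


def index_key(meta):
    """Python counterpart of the sample indexing function."""
    value = meta.get("PROGRAM_ID")
    return value if value else None


def build_index(files):
    """Group file paths by the key emitted by the indexing function."""
    index = {}
    for path, meta in files.items():
        key = index_key(meta)
        if key is not None:
            index.setdefault(key, []).append(path)
    return index


def query_url(host, index_id, key):
    """Build the query URL; the key is sent as a JSON-encoded string."""
    params = urlencode({"key": json.dumps(key), "stale": "false"})
    return f"https://{host}:8443/api/v3/oneprovider/query-index/{index_id}?{params}"


# Hypothetical files with their extended attributes.
files = {
    "astri_001.fits": {"PROGRAM_ID": "001"},
    "astri_002.fits": {},                     # no PROGRAM_ID: not indexed
    "astri_003.fits": {"PROGRAM_ID": "001"},
}
index = build_index(files)
```

Files without a `PROGRAM_ID` attribute are simply left out of the index, which mirrors the `return null` branch of the sample function.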

Distributed Archive Advantages

• lower costs compared with a single huge data center
• easy manageability and maintenance by the human resources of each single site
• distributed database of metadata within the architecture
• easy scalability by adding new nodes to the system
• disaster recovery at no extra cost, thanks to redundancy across different sites

Issues and Future Works

• Improve metadata queries: possibility to perform more complex queries.
• Test OneData roles and data permissions (through connection with an Authentication and Authorization Infrastructure). Data is proprietary for a fixed period, then it becomes public.
• Test the replication policies between providers.
• Automatic metadata ingestion (from FITS file headers).
• Prototype deployment of the CTA Archive at 3 sites (INAF-Catania, INAF-Rome, ASDC) to enable CTA users to test it.
• Prototype deployment with Data-Grid functionalities for CTA-specific users (simulation and pipelines).
• A look ahead to cloud services, to be ready for the migration of the CTA Workload Management System (DIRAC) from the DataGrid environment to the cloud paradigm.
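The automatic metadata ingestion item above could start from something like the following sketch, which reads selected keywords from a FITS primary header using only the standard library. It relies on the FITS layout of 80-character cards terminated by an END card. The keyword mapping, the `PROG_ID` keyword and the helper names are hypothetical, and a real implementation would more likely use a dedicated FITS library.

```python
CARD_LENGTH = 80  # every FITS header card is 80 ASCII characters

# Hypothetical mapping: FITS keyword -> metadata key used by the archive.
KEYWORD_MAP = {"TELESCOP": "TELESCOPE", "PROG_ID": "PROGRAM_ID"}


def parse_value(field):
    """Return the value part of a card as a string, dropping any comment."""
    field = field.strip()
    if not field.startswith("'"):
        return field.split("/", 1)[0].strip()
    end = 1
    while True:
        end = field.index("'", end)     # raises ValueError if unterminated
        if field[end:end + 2] == "''":  # doubled quote = escaped quote
            end += 2
            continue
        break
    return field[1:end].replace("''", "'").rstrip()


def extract_metadata(header, keyword_map=KEYWORD_MAP):
    """Collect the mapped keywords from raw header bytes, stopping at END."""
    meta = {}
    for start in range(0, len(header), CARD_LENGTH):
        card = header[start:start + CARD_LENGTH].decode("ascii")
        keyword = card[:8].strip()
        if keyword == "END":
            break
        if card[8:10] == "= " and keyword in keyword_map:
            meta[keyword_map[keyword]] = parse_value(card[10:])
    return meta


def make_card(keyword, value):
    """Build one padded header card (used here only to fake a header)."""
    return f"{keyword:<8}= {value}".ljust(CARD_LENGTH)


header = "".join([
    make_card("SIMPLE", "T"),
    make_card("TELESCOP", "'ASTRI   '"),
    make_card("PROG_ID", "'001     ' / observing program"),
    "END".ljust(CARD_LENGTH),
]).encode("ascii")

metadata = extract_metadata(header)
```

The resulting dictionary has the same shape as the `"metadata"` object sent in the sample ingestion request, so it could be serialized to JSON and attached to the file at upload time.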

References

• CTA web page: http://www.cta-observatory.org/
• ASTRI web page: http://www.brera.inaf.it/astri/
• YouTube demo: https://youtu.be/TbmJn1bIizE
• OneData documentation: https://onedata.org/docs/index.html
• OneData @ Docker Hub: https://hub.docker.com/u/onedata/

Ready to share experience!

stefano.gallozzi@oa-roma.inaf.it
eva.sciacca@oact.inaf.it
alessandro.costa@oact.inaf.it
angelo.antonelli@oa-roma.inaf.it

https://www.indigo-datacloud.eu
Better Software for Better Science.

Questions?

[Images: CTA-North, CTA-South]
