
SLIDE 1

APP Computing

Volker Beckmann, François Arago Centre / APC, CNRS / IN2P3

Image credits: Auger (ASPERA / G. Toma / A. Saftoiu), KM3NeT (ASPERA / G. Toma / A. Saftoiu), HESS collaboration (F. Acero and H. Gast), Virgo collaboration

SLIDE 2

Outline

• Status
• Computing
• Data storage
• Grid vs. Cloud
• Challenges
• CNRS / IN2P3 initiative

Berghöfer et al. 2015 arXiv:1512.00988

SLIDE 3

Data vs. computing

• Very different experiments
• Different data types: events, time series, images
• Shared computing resources
• High-Throughput Computing (HTC): large computing centres, like CC-IN2P3, Grid (EGI)
• Local computing clusters
• Minor importance (French view): HPC, GPUs, (academic) cloud systems


Images: HESS, Fermi, Cherenkov Telescope Array (CTA)

SLIDE 4

Space based experiments

+ Rather low data rates (~10s of Gbyte / day; see the sketch below)
  • All-sky, all-mission analyses (e.g. Fermi/LAT, INTEGRAL IBIS / SPI, Swift/BAT)
  • Complex analysis
+ Low storage needs (Tbyte range)
+ Accessibility and usability: centralised archives, common data formats, common tools
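A minimal back-of-the-envelope sketch, not taken from the slides, of why a few tens of Gbyte per day keeps a space mission in the Tbyte range per year; the 20 Gbyte/day figure is an assumed illustration, not a quoted mission value.

```python
# Illustration only: daily downlink volume -> yearly raw-data volume.
gb_per_day = 20                        # assumed value for "~10s of Gbyte / day"
tb_per_year = gb_per_day * 365 / 1000  # 1 Tbyte = 1000 Gbyte
print(f"~{tb_per_year:.1f} Tbyte of raw data per year")  # ~7.3 Tbyte/year
```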


Images: Fermi, INTEGRAL, Swift

SLIDE 5

Ground based experiments

• High data rates (~ Tbytes / day)
  + Event lists (HESS, Auger, Antares)
  + Time series (adv. Virgo / LIGO)
• Computing intensive
• Remote observation sites
• Accessibility, common tools (e.g. ctools/gammalib)

Image credits: Auger (ASPERA / G. Toma / A. Saftoiu); Antares (F. Montanet)

SLIDE 6

Data rate evolution

Space based: limited by downlink bandwidth, ~100 Mbps max (X-band)
  • INTEGRAL (2002): 1.2 Gbyte/day
  • Hubble Space Telescope (1990): 15 Gbyte/day
  • Gaia (2013), Euclid (2021): ~50 Gbyte/day, i.e. ~10-20 Tbyte raw data per year
Ground based: fast increase through fast read-out systems and multiple charge-coupled devices (CCDs)
  • 1990s: 1 Mbyte / CCD frame
  • LSST (2022): 3 Gbyte / exposure (15 s), ~10 Pbyte raw data per year (see the sketch below)
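As a quick sanity check, not from the slides themselves, the sketch below compares the LSST exposure stream quoted above with the ~100 Mbps X-band downlink that caps space-based missions.

```python
# Rough comparison of sustained data rates: LSST read-out vs. X-band downlink.
lsst_gb_per_exposure = 3.0             # Gbyte per exposure (from the slide)
exposure_s = 15.0                      # seconds per exposure (from the slide)
lsst_mb_per_s = lsst_gb_per_exposure * 1000 / exposure_s   # ~200 Mbyte/s

xband_mbit_per_s = 100                 # max X-band downlink (from the slide)
xband_mb_per_s = xband_mbit_per_s / 8  # ~12.5 Mbyte/s

print(f"LSST: {lsst_mb_per_s:.0f} Mbyte/s, X-band: {xband_mb_per_s:.1f} Mbyte/s, "
      f"ratio ~{lsst_mb_per_s / xband_mb_per_s:.0f}x")
```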

SLIDE 7

Daily data rates (Europe)

SLIDE 8

European computing needs


Requirement in units of CERN-LHC Tier-0 centre

SLIDE 9

CPU requirements (Europe)


kHS06 (1 kHS06 ≈ 100 CPU cores)                          2016    2020
Event-like data (Auger, HESS, CTA, KM3NeT, Fermi, …)      149     380
Signal-like (VIRGO, LIGO)                                 780    1290
Image-like (LSST, Euclid, …)                              117     280
Total in kHS06                                           1047    1951
Total in LHC Tier-0 (2012) units                          1.6     3.0
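A small sketch of the unit conversion implied by the table header; the ~10 HS06 per core figure is an assumption consistent with "1 kHS06 ≈ 100 CPU cores", not a measured benchmark.

```python
# Convert the kHS06 totals above into an approximate number of CPU cores.
hs06_per_core = 10.0                   # assumed, so that 1 kHS06 ~ 100 cores
totals_khs06 = {2016: 1047, 2020: 1951}

for year, khs06 in totals_khs06.items():
    cores = khs06 * 1000 / hs06_per_core
    print(f"{year}: {khs06} kHS06 ~ {cores:,.0f} cores")
# 2016: ~104,700 cores; 2020: ~195,100 cores, i.e. 1.6x and 3.0x the 2012 LHC Tier-0.
```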

SLIDE 10

Disk storage requirements


Pbytes                                                   2016    2020
Event-like data (Auger, HESS, CTA, KM3NeT, Fermi, …)        7      39
Signal-like (VIRGO, LIGO)                                 5.1      11
Image-like (LSST, Euclid, …)                              2.4      21
Total in Pbyte                                             16      72
Total in LHC Tier-0 (2012) units                         0.52     2.4

Trend similar for tape storage
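The sketch below is a simple worked calculation rather than material from the slides; it shows the growth implied by the disk totals above, consistent with the "factor of 5 until the end of the decade" estimate on the data storage slide.

```python
# Implied growth of the total disk-storage requirement between 2016 and 2020.
total_2016_pb, total_2020_pb = 16, 72
factor = total_2020_pb / total_2016_pb        # ~4.5x over four years
annual = factor ** (1 / 4) - 1                # ~46 % average growth per year
print(f"growth factor {factor:.1f}x, ~{annual:.0%} per year")
```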

SLIDE 11

Analysis, simulation, modelling

• Air shower experiments require extensive simulations: Cosmic Ray Simulations for KASCADE (CORSIKA)
  • Large CPU + storage requirements; GPUs
• Gravitational waves: small data sets, large computing need; HPC, GPUs
• Increase in computing needs by a factor of 2 until the end of this decade

SLIDE 12

Data storage

• High-energy cosmic ray experiments: raw : reconstructed : simulation ≈ 60 : 10 : 30
• Ground based: dominated by raw data
• Space based: dominated by derived data
• Increase by a factor of 5 until the end of this decade!
• Commercial cloud systems?

SLIDE 13

Grid vs. Cloud

• EGI runs 1 million jobs per day
• LHC: largely batch processing, MC simulations, event reconstruction
• HESS, CTA (but also others; in the future e.g. KM3NeT)


Chart: usage of CPU time on France Grilles per project (ALICE, ATLAS, CMS, LHCb, CTA, others)

SLIDE 14

Grid vs. Cloud

• EGI runs 1 million jobs per day
• LHC: largely batch processing, MC simulations, event reconstruction
• HESS, CTA (but also others; in the future e.g. KM3NeT)
• Cloud solutions:
  • Software as a Service (SaaS) – run online tasks (Hera at HEASARC, VO)
  • Platform as a Service (PaaS) – complete s/w platform (e.g. Euclid CODEEN)
  • Data as a Service (DaaS) – use data remotely (e.g. CALDB, iRODS)
  • Hardware as a Service (HaaS) – you provide OS + s/w + data (e.g. StratusLab)
  • Infrastructure as a Service (IaaS) – Grid on demand

SLIDE 15

Grid vs. Cloud


                 GRID                      Cloud
Middleware       gLite, UNICORE, ARC, …    SlipStream, Hadoop
Resources        EGI                       Local academic clouds
Availability     ++
Input / output   +
Ease of use                                +
Flexibility                                ++
SLIDE 16

Approach

• Try to reduce the amount of data! On-site processing (ground based: LOFAR, CTA, SKA, …) and on-board satellite processing (Gaia, INTEGRAL) where possible
• Then: centralise if possible (INTEGRAL, Gaia, LOFAR, SKA); use as few sites as possible (LSST, Euclid)
• GPUs: fast, but not good on i/o
  • Fourier transformation (e.g. LOFAR), template fitting; challenging to use, training needed (see the sketch below)
• GRID: infrastructure, middleware, relatively heavy to use
  • CTA, KM3NeT
• HPC / HTC with Hadoop, Hive (again, training needed)
• Cloud: virtualisation, flexibility, lower performance
  • project development, production phase, smaller projects
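A minimal sketch of the kind of FFT-heavy step mentioned in the GPU bullet, using NumPy on the CPU; on a GPU the same call pattern is available through drop-in replacements such as CuPy's cupy.fft. The signal parameters are arbitrary, chosen only for illustration.

```python
import numpy as np

# Build 4 s of a noisy 300 Hz tone at an assumed 4096 Hz sampling rate.
fs = 4096
t = np.arange(0, 4.0, 1 / fs)
signal = np.sin(2 * np.pi * 300 * t) + 0.5 * np.random.randn(t.size)

# One-sided FFT of the time series and the frequency of the strongest bin.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)
peak = freqs[np.abs(spectrum).argmax()]
print(f"dominant frequency ~ {peak:.0f} Hz")   # recovers the injected 300 Hz tone
```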


Image: InSiDe Jülich

SLIDE 17

Open Issues

• Need for experts in computing + projects
• Grid evolution
• Academic cloud systems in production (reliability)
• Federated cloud systems
• Commercial cloud systems (peak processing / replacement?)
• Standards?
• Preservation – VM archive?
• Data storage evolution (x5 until 2020)

SLIDE 18

Computing branch at CNRS / IN2P3

Objectives:
• Support research in applied computing at IN2P3 labs
• Identify the main projects and interested colleagues
• Animate and coordinate initiatives
• Support education (training, master, PhD, HDR)
• Discuss and shape the future of computing in high-energy (astroparticle) physics

We need your expertise!

SLIDE 19

Open Issues

• Need for experts and training on computing + projects
• Grid evolution (also EGI)
• Academic cloud systems in production (reliability)
• Federated cloud systems
• Commercial cloud systems (peak processing / replacement?)
• Standards?
• Preservation – VM archive?
• Data storage evolution (x5 until 2020)

SLIDE 20

Additional slides
 supplementary material
 even more slides
 additional material
 supplementary material

SLIDE 21

Computing requirements

SLIDE 22

Data storage (disk)

SLIDE 23

Challenges

• Astroparticle physics goes BigData: CTA, KM3NeT, SKA, …
• Pbyte-scale data with a need for Tflop processing
• Solutions depend on the science requirements: space or ground, remote or central, real-time processing or not, …
• Advantage: a community used to working together; file format standards, coding standards (C++, Python)
• Development platforms

SLIDE 24

Astro is 25% of CC-IN2P3

Credit: Pierre Macchi (CC-IN2P3). Computing of 7 PNHE projects amounts to 5% of GENCI computing; see the previous presentation by F. Casse.

SLIDE 25

One file format: FITS

  • Used for: images, spectra, light curves, tables, data cubes …
  • Used in: space-based and ground-based astrophysics, across all disciplines
  • Standards for keywords, headers, coordinate systems, …
  • FITS i/o libraries
  • Tools to visualize and manipulate FITS files (ftools, ds9, pyfits, …; see the sketch below)
  • http://fits.gsfc.nasa.gov/
  • BigData: Volume, Velocity, Variety, Veracity and Value
  • Future? HDF5 ?
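A minimal sketch of reading a FITS file with astropy.io.fits (the maintained successor to the standalone pyfits mentioned above); "example.fits" and the keywords queried are placeholders.

```python
from astropy.io import fits

# Open a FITS file, list its header/data units, and read a couple of standard keywords.
with fits.open("example.fits") as hdul:        # placeholder file name
    hdul.info()                                # summary of all HDUs
    header = hdul[0].header                    # primary header
    print(header.get("TELESCOP"), header.get("DATE-OBS"))
    data = hdul[1].data if len(hdul) > 1 else hdul[0].data  # table or image data
```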
SLIDE 26

CTA Computing: data volume

Raw-data rate
  • CTA South: 5.4 GB/s
  • CTA North: 3.2 GB/s
  • 1314 hours of observation per year

Raw-data volume
  • ~40 PB/year
  • ~4 PB/year after reduction

Total volume
  • ~27 PB/year including calibrations, reduced data and all copies

  • L. Arrabito et al. (2015)
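A back-of-the-envelope check, assuming the raw-data volume is simply the combined raw-data rate integrated over the quoted observing time; the result is consistent with the ~40 PB/year above.

```python
# Combined CTA raw-data rate integrated over one observing year.
south_gb_per_s = 5.4
north_gb_per_s = 3.2
obs_hours = 1314

seconds = obs_hours * 3600
raw_pb_per_year = (south_gb_per_s + north_gb_per_s) * seconds / 1e6   # 1 PB = 1e6 GB
print(f"~{raw_pb_per_year:.0f} PB/year of raw data")                  # ~41 PB/year
```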
SLIDE 27

Big-Data tomorrow: Euclid

How to achieve science goals:

Euclid Red Book (2012)
• Images (optical / infrared), spectra
• External (ground-based) images
• Merging of the data
• Photometric redshifts (distances), spectra
• Shape measurements
• High-level science products

SLIDE 28

Big-Data tomorrow: Euclid

Euclid Red Book (2012)