SLIDE 1

Plans of the WLCG for Run3 and HL-LHC era

Jose F. Salt Cairols, Instituto de Física Corpuscular

XI CPAN DAYS 21-23 October 2019


SLIDE 2

Overview

1.- The WLCG Global Collaboration
2.- Run 3 and HL-LHC Plan
3.- The Spanish LHC Computing GRID community (LCG-ES)
4.- Usage of additional compute resources
5.- Heterogeneity and Federation
6.- Software Optimization
7.- Spanish Strategy in Computing
8.- Summary and Outlook


SLIDE 3

1.- The WLCG Global Collaboration

The Worldwide LHC Computing GRID: a distributed, high-throughput computing infrastructure to store, process and analyze the data produced by the LHC experiments. In numbers:

  • 167 sites, 42 countries, 63 MoUs
  • ~800 kcores
  • ~500 PB disk storage
  • ~750 PB tape storage
  • Optical private network (LHCOPN) and an overlay over NRENs (LHCONE) with 10/100 Gbps links

CERN Computing Center

The equipment purchased by the centers (T0, T1 & T2) gives service to the whole collaboration (like a detector). WLCG is a worldwide and non-stop infrastructure.

It contributes to the scientific and technological progress of the centers which participate in WLCG: scientific infrastructure, expert personnel, etc.


SLIDE 4

2.- Run 3 and HL-LHC Plan

BEST GUESS for Run 3 (a small arithmetic sketch follows the list):

  • 2021 is a very low-data test run; resources -> same as 2018 for pp
  • A full Heavy-Ion run is likely -> will need some level of additional resources
  • 2022 is a full year with a resource level of 1.5 times 2018
  • 2023-24: moderate (20%) growth rates
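The bullets above translate into a simple multiplicative profile relative to 2018. The sketch below is only an illustrative reading of them: the yearly factors come from the bullets, while chaining the 20% growth year on year and the variable names are assumptions.

```python
# Tiny arithmetic sketch of the Run 3 "best guess": resource level per year,
# expressed relative to 2018. Illustrative interpretation only, not an official
# projection.

levels = {2021: 1.0}                  # very low-data test run: same as 2018 for pp
levels[2022] = 1.5                    # full year at 1.5 times the 2018 level
levels[2023] = levels[2022] * 1.20    # moderate 20% growth
levels[2024] = levels[2023] * 1.20

for year, rel in levels.items():
    print(f"{year}: {rel:.2f} x 2018 resources")
```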


SLIDE 5

Resource Evolution

From I. Bird's talk at the 7th Scientific Computing Forum (SCF), CERN, 4th October 2019


SLIDE 6
  • 4-5 times gap between the 'flat budget, 20% annual increase' scenario and the resource requirements for HL-LHC

  • Intense R&D to reduce data and resource requirements


SLIDE 7
  • Cost evolution is not well established
  • Assumed price reduction: 10% CPU, 15% disk, 20% tape
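To make the last two slides concrete, here is a back-of-the-envelope sketch of the capacity a constant ("flat") budget buys under those price reductions, set against the growth needed for HL-LHC. Reading the reductions as per-year figures and using a 10-year horizon are assumptions for illustration, not numbers from the talk.

```python
# Flat-budget capacity growth implied by assumed annual price drops
# (illustrative assumptions, not figures from the talk).

def flat_budget_growth(annual_price_drop: float, years: int) -> float:
    """Capacity multiplier that a constant budget buys after `years` of price drops."""
    return (1.0 / (1.0 - annual_price_drop)) ** years

horizon = 10  # roughly from the end of Run 2 (2018) into the HL-LHC era
for resource, drop in [("CPU", 0.10), ("disk", 0.15), ("tape", 0.20)]:
    gain = flat_budget_growth(drop, horizon)
    print(f"{resource}: flat budget buys x{gain:.1f} capacity after {horizon} years")

# Even ~20%/year effective capacity growth under a flat budget yields only about
# 1.2**10 ~ 6x more resources over a decade, while the HL-LHC estimates sit far
# above that, hence the 4-5 times gap quoted on the previous slide.
```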


SLIDE 8

3.- The Spanish LHC Computing GRID Community (LCG-ES)

Clouds:

  • CERN, CA, DE, ES, FR, IT, ND, NL, RU, TW, UK, US

The PIC Cloud (ES):

  • Tier-1: PIC Barcelona, providing 5% of the Tier-1 data processing of CERN's LHC detectors ATLAS, CMS and LHCb
  • Tier-2s:
    – CMS Spanish Tier-2: CIEMAT Madrid, IFCA Santander
    – ATLAS Spanish Tier-2: IFIC Valencia, IFAE Barcelona, UAM Madrid
    – LHCb Spanish Tier-2: USC Santiago de Compostela, UB (Universitat de Barcelona), LIP Lisbon (Portugal), UTFSM Santiago (Chile), UNLP La Plata (Argentina, inactive)
  • Integrated in the WLCG project (Worldwide LHC Computing GRID) and following the ATLAS/CMS/LHCb computing models
  • We represent 4% of the total Tier-2 resources and 5% of the Tier-1 ones


LCG-ES total accounting of resources: CPU = 182 kHS06, disk = 14.5 PB, tape = 19.6 PB

SLIDE 9

Spanish Cloud performance in Run II:

  • More than 22 million finished jobs
  • On average, 5,000 slots occupied by running jobs daily
  • More than 196 million events processed
  • More than 46 million files produced


SLIDE 10

4.- Usage of additional compute resources

  • Supercomputers for LHC
    – Growing funding in supercomputing (HPC) infrastructures
      • Roadmap towards Exaflop machines
      • Countries/funding agencies pushing the HEP community to use these resources
    – EuroHPC: B€ funding, 2 machines of approx. 200 PFlops by 2021 and 2 Exaflop machines by 2024
    – Data-intensive computing with HPC facilities is not easy:
      • Limited or no network connectivity on compute nodes
      • Limited storage for caching I/O event data files
    – The 'call for resource allocation' model is not suitable:
      • We need a guaranteed share of resources
      • Agreement with BSC
    – LHC applications are NOT really suited for HPC:
      • No large parallelization (no use of fast node interconnects)
      • No essential use of accelerators (GPU, FPGA)
    – Substantial integration work is needed to make HPC work for HTC (see the sketch below)
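A minimal sketch of what "HTC on HPC" integration can look like: many independent, single-core simulation tasks packed into one multi-node batch allocation, since LHC payloads do not exploit the fast interconnect. Everything here is hypothetical (script name, node size, partition defaults, the run_sim.sh payload); it is not the actual ATLAS/BSC setup.

```python
# Generate a SLURM batch script that packs many independent single-core tasks
# into one multi-node allocation (HTC-style work on an HPC machine).
# All names and sizes are illustrative placeholders.

N_TASKS = 960          # independent simulation tasks to pack
CORES_PER_NODE = 48    # assumed cores per compute node
NODES = N_TASKS // CORES_PER_NODE

batch_script = f"""#!/bin/bash
#SBATCH --job-name=htc-pack
#SBATCH --nodes={NODES}
#SBATCH --ntasks={N_TASKS}
#SBATCH --cpus-per-task=1
#SBATCH --time=12:00:00

# One independent payload per core; no MPI communication between tasks.
# Each copy of run_sim.sh would pick its workload from SLURM_PROCID.
# Inputs/outputs must be staged explicitly if compute nodes have no outbound network.
srun --ntasks={N_TASKS} ./run_sim.sh
"""

with open("submit_htc_pack.sbatch", "w") as f:
    f.write(batch_script)
print(f"Wrote submit_htc_pack.sbatch: {NODES} nodes x {CORES_PER_NODE} cores")
```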


SLIDE 11
  • Use of BSC (Barcelona Supercomputing Center) resources:

– Recommendation from the Funding Agency to use the computing resources of BSC
– ATLAS: effort devoted to adapting the queues at BSC to run simulation production jobs. In 2018, IFIC and IFAE started to apply for computing time, and several requests have been granted
  • Computing hours have been requested in the Spanish Supercomputing Network (RES) and in Europe (PRACE); IFAE has been granted 2.8 M hours and IFIC 1.2 M hours in Mare Nostrum (BSC), plus 2 M hours in Lusitania (Cenit)
  • The ATLAS software and the necessary tools to execute ATLAS detector simulation have been installed in these HPCs, so resources outside the Spanish Tier centers have been used; more than 60 million events have been simulated
  • IFIC/IFAE-PIC led the ATLAS simulation profiting from opportunistic HPC resources: more than 60 million events simulated, with more than 90% of the jobs ending successfully

– CMS:
  • CIEMAT/PIC: regarding the use of BSC resources by CMS, we still cannot use them due to the lack of network connectivity from the nodes, which is necessary in CMS to integrate them into the WMS. There is a project with the HTCondor team to address that limitation.

  • IFCA: adaptation of ALTAMIRA (RES node in Cantabria) within the GRID infrastructure (input from Ibán). The grid infrastructure of the T2 has been redesigned so that, when the T2 is saturated, it checks the availability of free HPC resources and forwards jobs there. At the moment pilot examples are operating using ALTAMIRA in "parasitic" mode, but this can easily be changed (a toy sketch of the overflow logic follows this list).

– LHCb: at the Spanish level, the LHCb groups have not started these activities yet
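The T2-to-HPC overflow idea mentioned for IFCA/ALTAMIRA can be summarized in a few lines of policy code. This is a toy sketch under assumed thresholds; the SiteState class, route_job function and numbers are illustrative placeholders, not the real implementation.

```python
# Toy overflow policy: when the Tier-2 is saturated, forward surplus jobs to an
# HPC partition if free slots exist. Illustrative placeholders only.

from dataclasses import dataclass

@dataclass
class SiteState:
    running: int
    capacity: int

    @property
    def saturated(self) -> bool:
        return self.running >= self.capacity

def route_job(job_id: str, tier2: SiteState, hpc_free_slots: int) -> str:
    """Decide where a job should run under the simple overflow policy."""
    if not tier2.saturated:
        return "tier2"      # normal case: run on the grid Tier-2
    if hpc_free_slots > 0:
        return "hpc"        # overflow: "parasitic" use of idle HPC slots
    return "queue"          # nowhere to run right now, keep it queued

# Example: a saturated Tier-2 with some idle HPC capacity available.
t2 = SiteState(running=2000, capacity=2000)
print(route_job("job-001", t2, hpc_free_slots=64))   # -> "hpc"
```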


SLIDE 12
  • December 2018: meeting at BSC to explore the possibility of having a dedicated share for LHC computing needs, taking as an example another special 'project' agreement with BSC
  • February-April 2019: preparation of a draft LHC Computing-BSC agreement; discussion of technical and policy questions
  • July 2019: Sergi Girona (BSC) will prepare the definitive agreement document, to be approved at the November BSC 'Junta de Gobierno' (Executive Board)
  • February-March 2020: the agreement could (hopefully) be opened for users


Meeting at BSC in December 2018

View of Mare Nostrum

SLIDE 13

Cloud Computing Resources:


Experiments have run large-scale tests using cloud compute nodes (Google Cloud, Amazon AWS, Microsoft Azure):

  • More than ~50K cores concurrently for a few days
  • Commercial cloud is not profitable for either (a) storage or (b) computing
  • But it can be useful to test new architectures without investing

=> Currently essentially no commercial cloud use for LHC computing
=> Potential future opportunities: the European Open Science Cloud (EOSC), an EU model for the use of cloud computing in the private and public sector

SLIDE 14


European Science Cluster of Astronomy & Particle Physics ESFRI Research Infrastructures (ESCAPE)

SLIDE 15

5.- Heterogeneity and resource federation


SLIDE 16


SLIDE 17

Federation is the key


  • Federation in data storage:
    – The idea is to localize bulk data in a cloud service (data lake): minimize replication, assure availability
    – Serve data to remote (or local) compute: grid, cloud, HPC, ???
    – Simple caching is all that is needed at the compute site (or none, if the network is fast); a minimal caching sketch follows this list
    – Federated data at national, regional and global scales
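A minimal read-through cache sketch of the data-lake idea: the bulk copy lives in the lake and the compute site only caches the files its jobs actually touch. The directory layout, the fetch_from_lake stand-in and the open_dataset_file helper are hypothetical, not a real WLCG or XRootD implementation.

```python
# Read-through cache sketch for the data-lake model described above.
# fetch_from_lake() stands in for a remote read (e.g. over HTTP or XRootD);
# the cache directory and helper names are hypothetical.

from pathlib import Path

CACHE_DIR = Path("datalake-cache")   # assumed small local cache area at the compute site

def fetch_from_lake(logical_name: str, dest: Path) -> None:
    # Stand-in for a wide-area read from the data lake; a real client would stream the file.
    dest.write_bytes(b"payload for " + logical_name.encode())

def open_dataset_file(logical_name: str) -> Path:
    """Return a local path for the file, pulling it from the lake only on a cache miss."""
    cached = CACHE_DIR / logical_name
    if cached.exists():
        return cached                      # cache hit: no wide-area transfer needed
    cached.parent.mkdir(parents=True, exist_ok=True)
    fetch_from_lake(logical_name, cached)  # cache miss: a single copy is pulled in
    return cached

# Example: the second access is served from the local cache.
p = open_dataset_file("mc16/EVNT/sample_000.root")
p = open_dataset_file("mc16/EVNT/sample_000.root")
print(p)
```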

SLIDE 18


  • Federation of computing resources:
    – Main issues: reducing the hardware cost and reducing the operational cost
    – Co-location of data and processors is not guaranteed; sites can be 'diskless'
    – Heterogeneous computing

PIC is contributing actively in the first group, with studies of data access and popularity for CMS at PIC and CIEMAT, measuring the effect on applications of accessing real data remotely.

SLIDE 19

6.- Software Optimization

  • Solutions could come from the software:
    – 50 million lines of code, mainly C++
    – "a project / experiment cannot afford to have bad software" (Graeme's talk in Granada)
  • Initiatives:
    – HEP Software Foundation
    – IRIS-HEP: Institute for Research & Innovation in Software for HEP ($25M, 5 years)
    – Proposal for an EU Scientific Software Institute
    – In Spain: the COMCHA forum
  • New hardware architectures:
    – High-level parallelism, new instruction sets, ...
    – Support in software frameworks for heterogeneous hardware
  • New/faster algorithms:
    – Machine Learning / Deep Learning
    – Rewrite physics algorithms for new hardware (a small example follows)
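As a small illustration of rewriting an algorithm for modern hardware, the sketch below computes a toy invariant mass first with a per-event Python loop and then with vectorized NumPy array operations, which map much better onto SIMD and parallel hardware. The kinematics and array sizes are made up; this is not code from any experiment framework.

```python
# Per-event loop vs. vectorized array computation of a toy invariant mass.
# Illustrative only: random four-vectors, negative mass-squared clipped to zero.

import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
# Toy four-vectors (E, px, py, pz), one particle per event.
E, px, py, pz = rng.uniform(1.0, 100.0, size=(4, n))

def mass_loop(E, px, py, pz):
    out = np.empty(len(E))
    for i in range(len(E)):                  # scalar loop: poor use of modern CPUs
        out[i] = np.sqrt(max(E[i]**2 - px[i]**2 - py[i]**2 - pz[i]**2, 0.0))
    return out

def mass_vectorized(E, px, py, pz):
    m2 = E**2 - px**2 - py**2 - pz**2        # whole-array arithmetic, SIMD friendly
    return np.sqrt(np.clip(m2, 0.0, None))

# Same result, very different hardware utilization.
assert np.allclose(mass_loop(E[:1000], px[:1000], py[:1000], pz[:1000]),
                   mass_vectorized(E[:1000], px[:1000], py[:1000], pz[:1000]))
```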


Improvement in CPU consumption by using faster physics algorithms in FASTSIM/FASTRECO

SLIDE 20

7.- Spanish Strategy in Computing


  • A common theme in many contributions to the EPPS symposium in Granada is the desire to collaborate with and benefit from LHC R&D work
  • Synergies and 'not reinventing the wheel'
  • Situation in different projects:
    – DUNE and CTA will leverage the WLCG for their computing infrastructure
    – Nuclear Physics collaborations: ESCAPE addresses FAIR data management
    – The LHC Computing Model has been adapted to the needs and the size of the AGATA collaboration
    – Computing @ Future Accelerators: a meeting in May 2019 addressed the outstanding questions for CLIC and the Future Circular Colliders

SLIDE 21


This also implies governance evolution. Our strategy in Spain could be to establish a Computing Committee in order to coordinate the study of the computing/storage needs of the different projects/initiatives. Our organization would be fully embedded in the governance model described above.

SLIDE 22

8.- Summary and Conclusions

  • The Spanish LHC GRID Computing projects have been essential for the scientific achievements of the LHC projects
  • New needs and objectives for Run 3 and HL-LHC will imply deep changes in our organization and technical challenges for the HEP Computing Community:
    – HPC resources / Cloud Computing / HLT
    – Resource Federation: Data Lakes
  • Export, partially or globally, the WLCG organization and perspective to other Astroparticle, Nuclear and High Energy scientific projects (synergy)
  • Benefit from the experience of the LHC Computing GRID groups at the Spanish centers, since these centers are also involved in other, non-LHC experiments


SLIDE 23

THANKS! QUESTIONS?
