Computing Construction Project: DUNE UK/FNAL planning meeting

SLIDE 1

DUNE-UK

Computing Construction Project

DUNE UK/FNAL planning meeting, Edinburgh, 8/9 Oct 2018
Pete Clarke, Edinburgh

IET, Oct 09

SLIDE 2

Dummies' guide to how computing support works in the UK

SLIDE 3

Physical Computing Resources (HS06, PBytes)

• First port of call is GridPP

  • GridPP is the UK project which provides computing for HEP
  • The current GridPP incarnation is GridPP5, until March 2020
  • GridPP5 is funded primarily for the LHC (90%)
  • 10% is for “other HEP experiments known at the time”, i.e. not really for DUNE
  • But GridPP always tries to help anyway, and generally succeeds
  • The GridPP6 incarnation will (hopefully) run April 2020 - 2024
  • The GridPP6 “ask” will include DUNE requirements
  • But STFC is still mainly operating under “flat-cash” resources
  • So “ask” != “get”

➢ See talk by Dave Britton (GridPP Project Leader)

SLIDE 4

Physical Computing Resources (HS06-years, PBytes)

• Next port of call is a thing called IRIS

  • IRIS is too complicated to explain fully here – pub or coffee bar later
  • IRIS is the coordinating body for all computing across all of STFC:
    • Particle Physics
    • Astronomy
    • Astroparticle
    • Nuclear
    • Photon source (Diamond)
    • Neutrons (ISIS)
    • Laser (CLF)
  • IRIS also has capital resources until 2022 (from the ministry, BEIS)
  • DUNE is in scope of IRIS (fair share along with all others)

➢ DUNE will benefit from IRIS resources (until 2022) provided via GridPP

SLIDE 5

Physical Computing Resources (HS06, PBytes)

• Next ports of call are more nebulous

  • Effectively, IRIS has just made a further bid for additional hardware resource
  • Probability of success ≤ 50%, and it may take years to happen
  • Meanwhile, all Research Councils in the UK have just amalgamated into UKRI
  • UKRI is attempting to “harmonise” eInfrastructure
  • This brings an opening to ask the Ministry (BEIS) for hardware resource at some point
  • Probability of success in the short term ≤ 30%, in the long term ~ 80%
  • It’s a hard path, but we have to walk it

SLIDE 6

Software Infrastructure: Common

  • This means distributed computing middleware common to all experiments: WLCG software components:
    • Compute Elements, Storage Elements
    • Tape service
    • CVMFS
    • Databases
    • Information service
    • Security, VOMS
    • Accounting, monitoring
    • Network
    • …
  • GridPP is funded for the people to deploy and operate WLCG middleware

→ GridPP is the only UK effort for WLCG middleware
→ GridPP cannot really support things which are single-experiment specific

  • It would be “difficult” to get resource in the UK to support a fundamentally different fabric

SLIDE 7

Software Infrastructure

Diagram: a layered software stack. Labels: Activity 1, Activity 2, Activity 3, Vertical Layer. Elements, in the order they appear:

  • Per activity: scientists using analysis frameworks and adding analysis-specific code
  • Research software engineering: engineering, re-engineering, development, porting, moving code down the stack
  • Per activity: activity-specific reconstruction, simulation and analysis framework software
  • Per activity: activity-specific production computing, data management, trigger and online computing
  • Common: distributed computing software infrastructure, operations, support and development
  • Common: global services such as security response, service registries, monitoring and accounting; software verification and rollout
  • Common: physical infrastructure and operations staff

Annotation on the diagram: “GridPP is here”

SLIDE 8

Software Infrastructure DUNE specific

  • Software Infrastructure (sInfrastructure) support – DUNE specific

Diagram: the same layered stack as on slide 7 (labels Activity 1, Activity 2, Activity 3, Vertical Layer), with one wording change: the per-activity production layer reads “Activity specific effort for production computing, data management, trigger and online computing”.

Annotation on the diagram: “DUNE”

SLIDE 9

Software Infrastructure: DUNE specific

  • This means the vertical elements of the diagram
  • This includes:
    – DUNE user analysis software
    – DUNE reconstruction and simulation software
    – DUNE distributed production computing software
  • The funding route (now) in the UK is through the PPRP (Projects Peer Review Panel)
  • We have just submitted the DUNE UK Construction Proposal to the PPRP, covering 2019-2026:
    → WP1: Physics and Computing
    → WP2: DAQ
    → WP3: APAs
  • WP1:
    → WP 1.1, 1.2 and 1.4 are Physics
    → WP 1.3 is Production Computing Construction

SLIDE 10

Software Infrastructure: DUNE specific

• WP 1.3 - Computing Construction (15 FTE-years)

➢ WP 1.3.1 Data movement and management

  • Starts with RUCIO, with SAM integration as necessary, but flexible as the project evolves

➢ WP 1.3.2 Offline Production Management and Monitoring (2.5 FTE-y)

  • Production management
  • Workload management
  • Monitoring

➢ WP 1.3.3 Integration with Cloud Platforms (1.75 FTE-y)

  • Integration of HEPCloud, IRIS and other cloud resources into DUNE production

➢ WP 1.3.4 AI Application to Offline Production (3.25 FTE-y)

  • Applying AI to the monitoring of DUNE production work: is it working, and what is the data quality?
  • Applying AI to data selection (if we can)

➢ WP 1.3.5 Computing Production System for SURF (5.25 FTE-y)

  • SURF data centre
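
As a quick cross-check of the WP 1.3 effort figures, the sketch below adds up the four sub-packages that carry a stated FTE-year number and compares the sum with the 15 FTE-year total. The slide gives no figure for WP 1.3.1, so the remainder printed here is only what would be implied if the total were exactly the sum of the five sub-packages; that is an assumption, not a number from the proposal. The variable names are illustrative.

```python
# Efforts stated on the slide, in FTE-years. WP 1.3.1 has no stated figure.
TOTAL_FTE_YEARS = 15.0

stated = {
    "WP 1.3.2 Offline Production Management and Monitoring": 2.5,
    "WP 1.3.3 Integration with Cloud Platforms": 1.75,
    "WP 1.3.4 AI Application to Offline Production": 3.25,
    "WP 1.3.5 Computing Production System for SURF": 5.25,
}

# All values are multiples of 0.25, so floating-point addition is exact here.
stated_sum = sum(stated.values())

# Assumption: the 15 FTE-year total is exactly the sum of WP 1.3.1 to WP 1.3.5.
implied_wp131 = TOTAL_FTE_YEARS - stated_sum

print(f"Sum of stated sub-packages: {stated_sum} FTE-years")
print(f"Implied WP 1.3.1 effort (assumption): {implied_wp131} FTE-years")
```

Under that assumption, the four stated sub-packages account for 12.75 FTE-years and 2.25 FTE-years would remain for data movement and management.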

SLIDE 11

Software Infrastructure: People

• WP 1.3: UK DUNE project people

  • Edinburgh: Perry, Nebot, Clarke, Muheim, (Washbrook), (Gambetta)
  • Manchester: McNab
  • RAL: Nandakumar, Brew, Wilson

• DUNE UK compute group: DUNE members + helpful GridPP people

  • Jones, Bauer, Dewhurst, Nowak, Pec, Davda, Moore, Blake, Hartnell, Back, Doige, Long, Fayer,

SLIDE 12

Work areas so far

• Data Management

  • Edinburgh working on RUCIO development
  • RAL, Imperial working on multi-VO RUCIO

• UK Resources

  • Work by GridPP people (Manchester, RAL, Imperial, Liverpool)
  • Have enabled GridPP resources
    • Disk: 0.9 PB of Proto-DUNE data
    • CPU :

SLIDE 13

Summary

• Physical compute resources for DUNE/ProtoDUNE in the short term: ✓
• Physical compute resources for DUNE during exploitation: unknowable, but optimistic
• Computing middleware support via GridPP: to be requested in GridPP6, but optimistic
• Computing project construction staff: 15 SY requested; believe it when it happens, given the track record of the process

SLIDE 14

Leave you with a sobering example of how services can go wrong.

Protocol is:
→ send English to an email translation service
→ Welsh translation is emailed back to you

The Welsh text that came back says: “I am not in the office at the moment. Send any work to be translated”.
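
One way to guard a request/reply protocol like this against that failure is to check whether the reply declares itself automatic before treating its body as the answer. The sketch below is a minimal illustration using Python's standard `email` package and the `Auto-Submitted` header defined in RFC 3834. It assumes the auto-responder sets that header, which not every mail system does. The function names, addresses and sample messages are hypothetical, and the sample body is left in English purely for readability.

```python
from email import message_from_string
from email.message import Message


def is_automatic_reply(msg: Message) -> bool:
    """Return True if the message declares itself automatic (RFC 3834)."""
    value = msg.get("Auto-Submitted", "no")
    # The header value is case-insensitive and may carry parameters after ';'.
    keyword = value.split(";", 1)[0].strip().lower()
    return keyword != "no"


def extract_translation(raw_reply: str) -> str:
    """Return the reply body, or raise if the reply is an automatic response."""
    msg = message_from_string(raw_reply)
    if is_automatic_reply(msg):
        raise ValueError("reply is an automatic response, not a translation")
    return msg.get_payload().strip()


# Hypothetical replies, made up for illustration only.
out_of_office = (
    "From: translator@example.org\n"
    "Subject: Re: please translate\n"
    "Auto-Submitted: auto-replied\n"
    "\n"
    "I am not in the office at the moment. Send any work to be translated.\n"
)
genuine = (
    "From: translator@example.org\n"
    "Subject: Re: please translate\n"
    "\n"
    "(translated text would appear here)\n"
)

print(extract_translation(genuine))
try:
    extract_translation(out_of_office)
except ValueError as err:
    print(f"rejected: {err}")
```

A header check is only a first line of defence. A reply without the header would still pass, so a human check of anything destined for publication remains the safer design.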