SLIDE 1

Software and Computing Requirements: WMS and DDM

Maxim Potekhin

potekhin@bnl.gov

DUNE WMS/DDM Workshop@FNAL 07/28/2016

SLIDE 2

M Potekhin|DUNE WMS/DDM workshop@FNAL, July 2016

About this presentation

  • The value of the requirements (the document):
    – inform and guide the evolution of the DUNE Computing Model
    – allow us to think through potential problems before they become issues
    – are based on broad consensus in the Collaboration
    – serve as a common reference and systematized list of computing items that need to be addressed
  • The WMS/DDM requirements were influenced by the recommendation from DOE to use the LHC experience in scoping and designing the systems for DUNE.
  • WMS/DDM are an important part of the requirements.
  • The requirements do not imply a preference for any specific solutions.
  • They are a “living document” and will be updated as needed, including feedback from discussions such as this workshop.
  • The requirements have been updated in 2015-2016 and incorporated as “Appendix B” to the DUNE Computing Model, DUNE-doc-914-v2.
    – http://docs.dunescience.org:8080/cgi-bin/ShowDocument?docid=914

SLIDE 3

The structure of the Requirements

SLIDE 4

The Content of the Requirements (DocDB 914)

SLIDE 5

The following slides contain parts of the Requirements related to Workload Management and Data Management, abridged or paraphrased as necessary to conserve time.

SLIDE 6

Grid and Cloud - Issues

  • We aim for the most efficient utilization of all computing resources and hardware available to the Collaboration.
  • We need to hedge against significant uncertainties inherent in estimating and planning resource allocation over the next 10 years.
  • Grid sites may have a wide range of capabilities, interfaces and other configuration parameters (heterogeneity).
  • We need to insulate the users from the heterogeneous nature of the Grid and instead present a homogeneous computing medium to them.

SLIDE 7

Grid and Cloud - Requirements

  • A widely distributed computing infrastructure, featuring a network of federated resources (including Grid- and Cloud-based resources), shall be implemented in close cooperation with participating computing sites, institutions and agencies (cf. the Open Science Grid etc.).
  • Necessary tools and procedures shall be provided for streamlined incorporation of new facilities and efficient use of opportunistic resources.
  • A Grid Information System shall keep the information about the Grid sites.
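
To make the Grid Information System requirement concrete, here is a minimal Python sketch of a site registry. Everything in it is an assumption for illustration: the `SiteInfo` fields, the `GridInfoSystem` class, the site names and the interface labels are hypothetical and are not taken from the Requirements document.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SiteInfo:
    """Hypothetical record describing one Grid site."""
    name: str
    cpu_slots: int
    storage_tb: float
    interfaces: list = field(default_factory=list)
    opportunistic: bool = False


class GridInfoSystem:
    """Toy in-memory registry; a real system would be a persistent service."""

    def __init__(self) -> None:
        self._sites = {}

    def register(self, site: SiteInfo) -> None:
        # Adding a new facility is a single call, in the spirit of the
        # "streamlined incorporation of new facilities" requirement.
        self._sites[site.name] = site

    def find(self, min_slots: int = 0, interface: Optional[str] = None) -> list:
        # Return the sorted names of sites that satisfy the request.
        return sorted(
            s.name
            for s in self._sites.values()
            if s.cpu_slots >= min_slots
            and (interface is None or interface in s.interfaces)
        )


gis = GridInfoSystem()
gis.register(SiteInfo("SITE_A", cpu_slots=5000, storage_tb=800.0,
                      interfaces=["interface-1"]))
gis.register(SiteInfo("SITE_B", cpu_slots=200, storage_tb=10.0,
                      interfaces=["interface-2"], opportunistic=True))

large_sites = gis.find(min_slots=1000)
by_interface = gis.find(interface="interface-2")
```

A client such as a WMS would query the registry rather than hard-code site details, which is one way to hide site heterogeneity from users.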

SLIDE 8

Distributed Computing: WMS definition

  • What is a WMS?
    – An early example, from gLite: “The Workload Management System (WMS) is a collection of components that provide the service responsible for distributing and managing tasks across computing and storage resources available on a Grid.”
    – According to the DUNE Requirements: “a system that enables automated placement of computational payload jobs submitted by its users on distributed resources, using the underlying Grid layer, and makes subsequent record keeping, accounting, elements of data management and general monitoring available to the user”.

SLIDE 9

Distributed Computing: WMS description

  • Workload Management Systems insulate individual users from specific configuration details and certain failure modes of Grid sites and networks, and provide substantial automation in managing the user's computational payload on the Grid.
  • Monitoring capabilities of a WMS (down to the job level) serve as a valuable debugging tool, and represent an essential toolkit for the operational support teams.
  • A WMS must be capable of keeping proper information about releases and documenting the software configuration used for a specific production run.

SLIDE 10

Distributed Computing: WMS Requirements

  • DUNE shall implement a Workload Management System (WMS) for resource management and brokerage functionality, which will govern distribution of most types of computational workload in DUNE (e.g. production jobs, group analysis etc.) across the variety of resources available to the Collaboration.
  • The DUNE WMS shall be capable of keeping a precise record of the software configuration used for each and every production job deployed on the Grid, including the DUNE Offline Software Release information.
  • The DUNE WMS shall be capable of quickly suspending participating sites due to outages, network congestion or potential security issues.
  • The DUNE WMS shall have a Workflow Management layer, which will help create and manage large groups of Grid tasks supporting the scientific workflows.
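
Two of the requirements above (per-job software configuration records and quick site suspension) can be illustrated with a short Python sketch. It is a toy under stated assumptions: the `Broker` and `JobRecord` names, the round-robin placement rule, the site names and the release string are all hypothetical, and real brokerage logic would be far more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JobRecord:
    """Hypothetical per-job record kept by the WMS."""
    job_id: int
    site: str
    software_release: str  # placeholder for the offline software release tag


class Broker:
    def __init__(self, sites) -> None:
        self.sites = list(sites)
        self.suspended = set()
        self.records = []

    def suspend(self, site: str) -> None:
        # Takes effect for all subsequent placements.
        self.suspended.add(site)

    def resume(self, site: str) -> None:
        self.suspended.discard(site)

    def place(self, job_id: int, software_release: str) -> Optional[JobRecord]:
        active = [s for s in self.sites if s not in self.suspended]
        if not active:
            return None  # nothing available; a real WMS would queue the job
        # Trivial round-robin stand-in for real brokerage logic.
        site = active[job_id % len(active)]
        record = JobRecord(job_id, site, software_release)
        self.records.append(record)  # the record outlives the placement
        return record


broker = Broker(["SITE_A", "SITE_B"])
broker.suspend("SITE_B")  # e.g. after an outage report
placed = [broker.place(i, "release-x.y.z") for i in range(3)]
```

Keeping the suspension state inside the broker means that a single call removes a site from all future placements, while the stored records still show which release each earlier job used.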

SLIDE 11

WMS Requirements (cont'd)

  • A DUNE WMS Monitoring System shall be implemented to ensure efficient operation of the WMS, by helping ascertain status and progress of Grid jobs, accounting of resource utilization, identification and debugging of failure modes etc.
  • The DUNE WMS Monitoring System shall have interfaces conducive to integration both with a Web UI for users and operators, and with automated systems which consume the WMS data.

SLIDE 12

Distributed Data - Requirements

  • Raw Data replication - summary of basic requirements
    – redundant replicas (number of copies TBD)
    – site requirements are to be established (e.g. capacity, network throughput etc.)
    – replicas can be striped if necessary across a few sites
  • Processed Data replication - summary of basic requirements
    – Data placement based on research interests of the corresponding working group operating at a particular location, resource availability and scheduling policies of the WMS. The number of replicas of the processed data shall not be subject to a fixed minimum.
  • General:
    – Assertion of validity of the data being replicated and/or transmitted (e.g. checksum controls). Control of data placement, volume, status and other characteristics shall be available.
    – A highly symmetrical placement strategy for the processed data shall exist, i.e. in principle both input and output data for any job or application can reside at any site or host which is a part of the DUNE distributed data network.
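
The checksum control mentioned under "General" can be sketched in Python as follows. This is only an illustration under stated assumptions: SHA-256 is used because it is available in the standard library (the Requirements do not name an algorithm), a local file copy stands in for a wide-area transfer, and the function names are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile


def file_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(src: str, dst: str) -> bool:
    """Copy src to dst and report whether the replica matches the source."""
    expected = file_checksum(src)
    shutil.copyfile(src, dst)  # stand-in for a real transfer between sites
    return file_checksum(dst) == expected


workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "raw.dat")
dst = os.path.join(workdir, "replica.dat")
with open(src, "wb") as f:
    f.write(b"example payload" * 1000)

ok = replicate(src, dst)

# Simulate corruption of the replica and check that it is detected.
with open(dst, "ab") as f:
    f.write(b"x")
corruption_detected = file_checksum(dst) != file_checksum(src)

shutil.rmtree(workdir)
```

In practice the checksum would typically be computed once at the source and stored alongside the file's catalog entry, so that any replica can be validated later without rereading the original.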

SLIDE 13

File Catalog Requirements

  • A file catalog system shall be put in place by the DUNE Software working group.
  • The file catalog system shall be protected from data loss to the greatest extent possible, by utilizing redundancy, replication and backup and restore systems.
  • The file catalog system shall have interfaces which are flexible and extensible enough to cover the range of data storage and distribution technologies employed in DUNE.
  • The file catalog system shall cover the totality of distributed storage used by DUNE, i.e. it will allow its clients to locate potentially multiple replicas of the data at multiple sites.
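
The replica-location requirement in the last bullet amounts to a mapping from a logical file name to physical copies at several sites. The Python sketch below shows that mapping in its simplest form; the `FileCatalog` class, the file names, the site names and the `example.org` URLs are hypothetical, and redundancy, backup and access control are deliberately left out.

```python
from collections import defaultdict


class FileCatalog:
    """Toy catalog mapping a logical file name to its physical replicas."""

    def __init__(self) -> None:
        # logical file name -> {site name: physical location}
        self._replicas = defaultdict(dict)

    def add_replica(self, lfn: str, site: str, location: str) -> None:
        self._replicas[lfn][site] = location

    def locate(self, lfn: str) -> dict:
        # Return a copy so that callers cannot modify catalog state.
        return dict(self._replicas.get(lfn, {}))


catalog = FileCatalog()
catalog.add_replica("/dune/raw/run0001.dat", "SITE_A",
                    "https://storage-a.example.org/dune/raw/run0001.dat")
catalog.add_replica("/dune/raw/run0001.dat", "SITE_B",
                    "https://storage-b.example.org/dune/raw/run0001.dat")

replicas = catalog.locate("/dune/raw/run0001.dat")
missing = catalog.locate("/dune/raw/no-such-file.dat")
```

Storing the location as an opaque string keeps the catalog independent of any particular storage or transfer technology, which is one way to approach the flexibility requirement above.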

SLIDE 14

Metadata Requirements

  • A file metadata system shall be created to support the distributed data processing capabilities of DUNE. It will cover the data managed by all participating sites, utilizing a variety of middleware and storage media.
  • The DUNE metadata system must scale to expected file, site and job/access multiplicities and rates.
  • Information contained in the metadata system shall be protected from data loss to the greatest extent possible, by utilizing redundancy, replication and backup and restore systems.

SLIDE 15

Stuff that is yet to be included

  • WMS and DDM
    – The current version of the requirements does not contain many specifics on the interaction of the WMS and DDM.
  • WMS deployment
    – It must be possible to run the WMS at any site with adequate network connectivity and hardware, without relying on a specific site configuration.
  • Job submission
    – Job submission via a network client running on the user's computer (e.g. laptop), which can be located anywhere.
  • Log files
    – Important both for infrastructure diagnostics/debugging and for payload characterization and debugging (cf. noSQL tech used to handle logs).
  • Auth/auth in WMS/DDM
    – Solutions are pretty well known but the requirements aren't stated (i.e. compatibility with security frameworks such as X.509/VOMS).

SLIDE 16

Comments

  • Certain features of the WMS often define how efficiently it can be deployed and used:
    – portability, i.e. whether the system can be deployed on a variety of platforms and without many prerequisites
    – requirements for participating Grid sites (the fewer, the better)
    – monitoring for the users and operational support
  • Ideally, there is a comprehensive monitoring system, from cloud level to site level down to the task and job level (complete with log files but also making available the information on “live” status of jobs and data transmission).
