
DAQ Consortium Strategy - January 2018

Dave Newbold, Georgia Karagiorgi, 23rd January 2018 DRAFT

1 Consortium Scope

The consortium scope has been clarified in discussions over the last few months, and documented via a series of draft system requirements and interface agreements.

  • On the detector side, DAQ responsibility begins with the optical fibres emerging from the detectors. DAQ is not responsible for any on-cryostat components. Note that, since some components of the signal processing chain for the SP TPC will be housed within the CUC, those components fall within DAQ scope.

  • DAQ is responsible for all hardware operating at SURF, up to the WAN link to FNAL. Operation of the WAN link and all elements thereafter are the responsibility of the offline computing group. Elements of offline software running on the DAQ system are the responsibility of the computing group.

  • Interfaces to other subsystems of DUNE FD exist (e.g. slow controls and calibrations), where DAQ will provide network and computing services, as well as synchronisation and databases as needed.

2 Consortium Status

Several key technical and architectural decisions have been made in recent months, which will form an agreed basis for future discussions on design and implementation.

  • The DAQ will operate as a single system for all detectors, allowing the use of common components, full synchronisation of front-end electronics, and triggering based on information from all DUNE FD elements.

  • The DAQ will operate as a deadtimeless system, capable of storing continuous sequences of data for periods of up to tens of seconds, or of independent overlapping samples from multiple space/time regions of the detectors.


  • The DAQ will be capable of storing the full information from the detectors (i.e. without any form of lossy data manipulation) for short periods of time. This, however, will not necessarily be the mode of operation at all times.

  • The DAQ will consist of a set of front-end buffers and a processing pipeline for each sub-detector, an event builder system for each sub-detector, a single WAN transfer buffer, and a data selection system. The data selection system will comprise components that are specific to individual sub-detectors, and central components (spanning all sub-detectors).

  • The DAQ will provide human interfaces for operation (configuration, run control, data taking), monitoring, data quality management, and data book-keeping.

  • The DAQ architecture and implementation will be designed with the following characteristics, in priority order: robustness, scalability, ease of deployment and commissioning, ease of design and construction, cost-effectiveness, and running costs and space and power requirements.

  • The DAQ will be sized to allow for significant conservatism in the operation of data selection for the first sub-detector(s), i.e. it will be able to operate with a far greater data rate than would be expected asymptotically, when the detector performance is fully understood. This implies sizing key elements of infrastructure for four detectors from the start.

  • The final DAQ design will be demonstrated in a series of pre-production and integration tests of increasing scale, including tests with detector prototypes, before a decision to proceed with construction is made.

The consortium has agreed on a draft schedule and WBS, and is defining a list of areas of institutional interest. At the time of writing, there are few firm institutional commitments or firm sources of funding. Based on the scope defined above, it will be necessary to increase both the person power and the financial resources available to the consortium, on a time scale allowing a fully resourced plan to be presented in the TDR.


3 Time line and decision-making process

3.1 Milestones

The DAQ consortium has defined a set of internal milestones, spanning the period between late 2017 and TDR delivery in July 2019.

  • M1 (Dec 2017) Interface documents complete (DONE)
  • M2 (Jan 2018) Functional specifications complete
  • M3 (Mar 2018) Cost and infrastructure requirements complete
  • M4 (Apr 2018) Technical Proposal
  • M5 (Aug 2018) Internal (preliminary) review of baseline TDR design
  • M6 (Dec 2018) First prototype components available
  • M7 (Dec 2018) TDR structure and institute responsibilities defined
  • M8 (Mar 2019) Slice test with CE completed
  • M9 (Mar 2019) Review of baseline TDR design
  • M10 (Jul 2019) Technical Design Report

At the time of writing, the DAQ design is not yet finalised: several key technical decisions remain to be made, and the feasibility of the chosen solutions has yet to be demonstrated. It is understood that the reviews (M5 and M9) will also consider alternatives to the baseline design as outlined in the TP. More specifically, the consortium will continue to explore alternatives and allow the baseline design to evolve as new findings become available through R&D.

3.2 Technical Proposal (M4)

The Technical Proposal will present:

  • A well-motivated set of design parameters for the DAQ, based on physics requirements and anticipated detector performance

  • A design philosophy for the DAQ, encompassing the agreed points specified above

  • An overall architectural view of the DAQ for the first two detectors
  • A baseline design consisting of a technically plausible and conservative solution for each DAQ component, based on protoDUNE experience wherever possible

  • A documented set of alternative solutions for components, where applicable, along with a statement of the R&D strategy and the criteria for deciding between solutions

  • An outline cost and schedule for the baseline solution.

3.3 Pre-TDR Review (M9)

At a formal external review in March 2019, we will receive:

  • An iteration on the system requirements in light of further physics and data selection studies

  • Concrete costed proposals for implementation of each system component
  • Concrete expressions of interest from groups of institutes capable of delivering the proposed solutions

This strategy allows up to a year after the TP for groups of institutes to carry out the necessary R&D into particular solutions; to demonstrate those solutions through practical tests or simulation; and to seek support from funding agencies for the implementation of components. It will also permit time for institutes from whom expressions of interest are only now being received to join the consortium and make a substantive contribution before decisions are made for the TDR.

The review will result in a set of decisions on component implementations, which will form the basis of the TDR design. In some cases, it may be optimal to let multiple solutions continue into a post-TDR phase of R&D; however, this should only happen where emerging technologies are of interest, rather than as a means of postponing decisions.

Decisions will be made by an appointed review panel, comprising consortium and experiment leadership, experts from within the consortium, collaboration members with applicable expertise, and invited external reviewers. The criteria for decision making will be: the maturity and demonstrated effectiveness of the proposals; the match to the criteria for the overall DAQ design expressed above; and the presence of a team able to implement the concept on the required time scale.


3.4 Technical Design Report (M10)

For the Technical Design Report, we will present:

  • A final set of design parameters for the DAQ, along with a statement of their uncertainties and the margins allowed

  • A proposed detailed design for the DAQ for the first two detectors
  • A resourced work plan for delivery of the DAQ by the required date (currently understood to be around 2023), including the prototyping, construction, installation and commissioning phases

  • A full budget and risk mitigation strategy

4 Key decision points

Although several components of the currently envisioned baseline design are well established, and in some cases are being demonstrated in the ProtoDUNEs, in other areas there are multiple options for component implementation. There are also remaining uncertainties on the requirements from physics in some areas. Listed below are the key areas where binary decisions must eventually be made, or where there is significant interaction with systems outside the DAQ and time-critical information is needed. This is in contrast to ‘tactical’ and potentially reversible decisions (e.g. data selection strategy, decisions on component implementation), which can be made over a more extended time scale, and in a more discursive way.

4.1 Noise assumptions for SP TPC

The complexity of processing SP TPC data, and hence the scale and cost of the DAQ system, depend strongly on the level and nature of noise in the SP TPC. Until firm results are obtained from tests of the final electronics in a realistic environment, it will be necessary to make an agreed-upon assumption here, including a worst-case livable scenario. The DAQ should be capable of scaling, at additional cost, to the worst-case noise scenario. Time scale: we require an initial agreed assumption for the TP, which will be reviewed before the TDR.
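The sensitivity of DAQ scale to noise can be made concrete with a back-of-envelope sketch. All numbers below are illustrative assumptions for the sake of the arithmetic, not agreed DUNE parameters: with zero suppression, the output data rate scales roughly with the fraction of samples above threshold, so excess noise translates directly into DAQ throughput.

```python
# Illustrative only: how zero-suppressed data volume scales with noise.
# RAW_RATE_TBPS is an assumed raw rate for one SP module, not an agreed figure.

RAW_RATE_TBPS = 1.15  # assumed raw SP TPC rate, TB/s

def suppressed_rate_tbps(occupancy: float) -> float:
    """Data rate after zero suppression, assuming output scales
    linearly with the fraction of samples above threshold."""
    return RAW_RATE_TBPS * occupancy

# A quiet detector (signal-dominated occupancy) vs a noisy one:
quiet = suppressed_rate_tbps(0.01)  # ~1% occupancy
noisy = suppressed_rate_tbps(0.10)  # ~10% occupancy from excess noise
print(f"quiet: {quiet:.3f} TB/s, noisy: {noisy:.3f} TB/s")
```

A factor of ten in occupancy is a factor of ten in post-suppression throughput, which is why an agreed noise assumption is needed before the system can be sized.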


4.2 Data rate to offline computing

The data rate to offline computing is a key parameter, since it affects the capital and operational costs of the DAQ and computing systems (in opposite directions!). It also determines the sizing of the WAN link, which may or may not be a cost driver in the medium term. The data selection strategy depends critically on this parameter. Time scale: a loosely justified baseline assumption of 30 PB/year has been made, but needs to be confirmed on both sides before the TP.
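The 30 PB/year baseline implies a readily computed average rate on the WAN link. The sketch below is a simple unit conversion (protocol overheads and duty factors are ignored, so real provisioning would need headroom above this figure):

```python
# Back-of-envelope: average WAN bandwidth implied by the 30 PB/year
# baseline assumption. Overheads and duty factors are ignored.

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def average_bandwidth_gbps(petabytes_per_year: float) -> float:
    """Average sustained rate in Gb/s for a given yearly volume."""
    bits_per_year = petabytes_per_year * 1e15 * 8
    return bits_per_year / SECONDS_PER_YEAR / 1e9

print(f"{average_bandwidth_gbps(30):.1f} Gb/s")  # roughly 7.6 Gb/s average
```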

4.3 DAQ parameters for supernova physics

The key parameters of supernova fake rate and data selection strategy (what to keep on a trigger, and for how long) have a strong effect upon DAQ architecture and scale. These parameters are loosely bounded by physics arguments, and the need for efficiency has to be balanced against system cost and the data rate to computing. An acceptable set of working assumptions needs to be identified. Time scale: needed for the TP, with agreement and justification from the physics groups.

4.4 DAQ parameters for calibrations

The DUNE program requires sufficient calibration data in both type and statistics. The calibration triggers and any related data selection strategy also have a strong effect upon DAQ architecture and scale. These parameters are also loosely bounded by physics arguments, but must be balanced against system cost and the data rate to computing. An acceptable set of working assumptions needs to be identified. Time scale: needed for the TP, with agreement and justification from the calibrations and physics groups.

4.5 Physical location and arrangement of DAQ systems

The CDR design for the DAQ proposed that the system be entirely contained in the underground CUC, with around 12 racks and 100 kW allocated per detector. There are strong concerns about DAQ operations in such conditions, and uncertainty as to whether a sufficiently scalable system design can be installed in this way. The cost implications of placing part of the DAQ on the surface are yet to be fully established.

slide-7
SLIDE 7

Time scale: in the TP, we will present a conservative option in this respect, i.e. part of the DAQ on the surface, and attempt to identify an outline cost. The final plan should be reviewed as a matter of urgency before mid-2018 (during the internal review or a wider-experiment review).

4.6 Data processing implementation

The optimal split of data-processing functionality between hardware (i.e. FPGAs) and software (on either CPUs or GPUs) is yet to be determined, and should be the subject of intensive study during 2018. The cost-power-performance tradeoffs could differ significantly depending on the decision, and there will also be implications for the level and nature of expertise required to build the system. This decision cannot be isolated from the skills of the teams prepared to build and fund the system. The decision is particularly acute in the trigger primitives / data selection area, where multiple plausible options clearly exist. Time scale: conceptual design proposals should be made to the internal review in 2018, and concrete, costed, and demonstrated proposals should be made to the external review in 2019, requiring significant effort and time during 2018 in preparation. Decisions on the implementation need to be made in good time for the TDR.
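To illustrate the kind of per-channel computation at stake in the hardware/software split, the sketch below shows a minimal trigger-primitive generator: pedestal subtraction followed by threshold-based hit finding. This is an illustrative sketch only; the waveform values, threshold, and function name are invented for the example, and the actual primitive definitions remain to be decided.

```python
# Minimal sketch of trigger-primitive generation for one channel:
# pedestal subtraction, then threshold-based hit finding.
# All values here are invented for illustration.

from statistics import median

def find_hits(waveform: list[int], threshold: int) -> list[tuple[int, int, float]]:
    """Return (start_sample, width, summed_adc) for each contiguous
    region where the pedestal-subtracted waveform exceeds threshold."""
    pedestal = median(waveform)  # crude pedestal estimate
    hits = []
    start = None
    total = 0.0
    for i, adc in enumerate(waveform):
        if (adc - pedestal) > threshold:
            if start is None:
                start, total = i, 0.0
            total += adc - pedestal
        elif start is not None:
            hits.append((start, i - start, total))
            start = None
    if start is not None:  # hit running off the end of the window
        hits.append((start, len(waveform) - start, total))
    return hits

# A flat baseline near ADC 500 with one small pulse:
wf = [500, 501, 499, 500, 530, 560, 540, 500, 499, 500]
print(find_hits(wf, threshold=20))  # one hit spanning samples 4-6
```

The same logic maps naturally onto FPGA firmware (a running comparator and accumulator per channel) or software, which is precisely why the cost-power-performance comparison needs quantitative study.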

4.7 RAM buffer and NVM implementation

A key component of the system is a large RAM buffer, used both for rate matching between the detector and permanent storage after a substantial data burst (e.g. an SN trigger), and for event building. This buffer must be backed by a high-performance non-volatile storage system for data security. It is possible to implement this part of the system in custom hardware, in commodity computing, or in a combination of both. An exploration of the design tradeoffs (which also have implications for networking) must be made. Time scale: concrete, costed, and demonstrated proposals should be made to the external review, requiring significant effort and time during 2018 in preparation. Decisions on the implementation need to be made in good time for the TDR. However, it is also necessary to identify quickly whether any potential solution requires a substantial uplift in network bandwidth in critical parts of the system (e.g. fibres up the elevator shaft), and so an early review of possibilities should be made after the TP, during the internal review.
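The scale of this buffer can be estimated with simple arithmetic. The channel count, sampling rate, and bytes per sample below are illustrative assumptions (roughly one SP 10 kt module), not agreed design numbers:

```python
# Back-of-envelope sizing of the RAM / NVM buffer for a supernova
# readout window. All detector parameters are illustrative assumptions.

CHANNELS = 384_000     # assumed TPC channel count for one SP module
SAMPLE_RATE_HZ = 2e6   # assumed digitisation rate per channel
BYTES_PER_SAMPLE = 2   # assumed storage per 12-bit sample

def buffer_size_tb(seconds: float) -> float:
    """Unsuppressed data volume, in TB, for a readout window."""
    return CHANNELS * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * seconds / 1e12

print(f"{buffer_size_tb(30):.0f} TB for a 30 s window")
```

Under these assumptions a tens-of-seconds unsuppressed window sits in the tens-of-TB range per module, which frames both the RAM/NVM tradeoff and the associated network-bandwidth question.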


4.8 Slice test strategy and location(s)

A strategy should be defined for the first slice tests of the DAQ (some elements of which may need to be completed before the TDR), including where these tests will be carried out. Experience indicates that fully distributed development of a trigger / DAQ system is highly inefficient (and may not have been attempted at this scale before), and that regular instances of tests and practical engagement by groups are required. A central location for this should be identified, with all necessary space, equipment and facilities, possibly moving closer to SURF as the project progresses. Time scale: a decision should be made during mid-2018 (internal review), and well before planning for the post-protoDUNE era is made at CERN.