
SLIDE 1

ProtoDUNE Data Flow Protocol

For discussion at the DAQ meeting
Kurt Biery, Giovanna Lehmann Miotto
21-Nov-2016

SLIDE 2

[Diagram: dataflow components and message paths]

Components shown: BoardReader Processes (each with a FragmentGenerator) on multi-core nodes; EventBuilder Processes (art, EventStore) on multi-core nodes; Timing System; Dataflow Manager (DFO)

Message paths (numbered 1 to 4 in the diagram): Trigger Messages, Event Readout Requests, Fragment Readout Requests, Data Fragment Flow

Route fragments from each trigger to the same EventBuilder
Support dynamic control of dataflow into EventBuilders

Trigger/Event counter forwarded through DFO to EB to BR, to be put into fragment and event headers
Data requested from BRs by TIMESTAMP from EB

21/11/16 KAB, GLM | protoDUNE Dataflow Protocol
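The message sequence on this slide can be sketched as a small single-process simulation. This is only an illustration, not the artdaq implementation: the class names `Dfo`, `EventBuilder` and `BoardReader`, the round-robin choice of EventBuilder, and the dictionary used as a fragment are all assumptions made for the example. It shows the two points noted above: the trigger/event counter travels from DFO to EB to BR and ends up in every fragment header, and the EB requests data from the BRs by timestamp.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class BoardReader:
    """Hypothetical BoardReader: holds data keyed by timestamp."""
    name: str
    data: dict = field(default_factory=dict)  # timestamp -> payload bytes

    def request(self, event_id: int, timestamp: int) -> dict:
        # Fragment readout request in, data fragment out.
        # The event counter from the DFO is stamped into the fragment header.
        return {"source": self.name, "event_id": event_id,
                "timestamp": timestamp,
                "payload": self.data.get(timestamp, b"")}


@dataclass
class EventBuilder:
    """Hypothetical EventBuilder: collects one fragment per BoardReader."""
    name: str
    readers: list
    events: dict = field(default_factory=dict)  # event_id -> list of fragments

    def build(self, event_id: int, timestamp: int) -> None:
        # Event readout request in: ask every BoardReader for the same timestamp.
        self.events[event_id] = [r.request(event_id, timestamp)
                                 for r in self.readers]


class Dfo:
    """Hypothetical Dataflow Manager: numbers triggers and picks an EB."""

    def __init__(self, builders):
        self._next_builder = cycle(builders)  # assumed policy: round robin
        self._event_counter = 0

    def on_trigger(self, timestamp: int) -> EventBuilder:
        # Trigger message in: assign the whole event to exactly one EB.
        self._event_counter += 1
        eb = next(self._next_builder)
        eb.build(self._event_counter, timestamp)
        return eb


readers = [BoardReader("rce0", {100: b"a", 200: b"b"}),
           BoardReader("ssp0", {100: b"c", 200: b"d"})]
builders = [EventBuilder("eb0", readers), EventBuilder("eb1", readers)]
dfo = Dfo(builders)
first = dfo.on_trigger(100)
second = dfo.on_trigger(200)
```

Because the DFO picks the EventBuilder once per trigger, all fragments for that trigger land in the same process, whichever assignment policy is used.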

SLIDE 3

Additional Dataflow Considerations

Building on the dataflow slides shown by Karol last week… Some proposals:

1. One BoardReader per RCE, one per SSP, etc.
2. Which entity should handle the possibility that triggers “arrive” before the data?
   1. Propose that the BoardReader/FragmentGenerator handle this (with a timeout)
   2. Support for this already exists in the FragmentGenerator base class
   3. Other options are possible (DFO, EB), but adding artificial delays or retries there adds complication
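The proposal that the BoardReader/FragmentGenerator waits, with a timeout, for data that arrive after the trigger can be sketched as below. This is an illustration only and does not reproduce the artdaq FragmentGenerator base class: the `FragmentBuffer` class, its method names, and the 0.5 s default timeout are assumptions. A request for a timestamp blocks until matching data show up or the timeout expires.

```python
import threading


class FragmentBuffer:
    """Hypothetical buffer inside a BoardReader, keyed by timestamp."""

    def __init__(self):
        self._data = {}
        self._cond = threading.Condition()

    def add_data(self, timestamp, payload):
        # Called by the readout thread whenever new data arrive.
        with self._cond:
            self._data[timestamp] = payload
            self._cond.notify_all()

    def wait_for(self, timestamp, timeout=0.5):
        # Called when a request arrives; returns the payload,
        # or None if the data did not arrive within `timeout` seconds.
        with self._cond:
            found = self._cond.wait_for(lambda: timestamp in self._data,
                                        timeout=timeout)
            return self._data.pop(timestamp) if found else None


buf = FragmentBuffer()
# Simulate data that arrive 50 ms after the request was made.
threading.Timer(0.05, buf.add_data, args=(100, b"adc samples")).start()
late = buf.wait_for(100)                  # blocks briefly, then gets the data
missing = buf.wait_for(999, timeout=0.1)  # never arrives, so None
```

Keeping the wait inside the BoardReader means the DFO and EB need no artificial delays or retries, which is the argument made in item 2.3.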


SLIDE 4

Additional Dataflow Considerations, continued

3. When can data be cleared from various buffers in the system?
   1. Since TCP/IP will be used, propose to delete data as soon as they have been sent on (BoardReader -> EB, EB -> Aggregator, Aggregator -> disk)
4. When will EB notify DFO that event is complete or finished?
   1. When full event is queued for output?
   2. When event has been sent to Aggregator?
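The "delete as soon as sent" proposal in item 3.1 might look like the sketch below. The function name `send_and_release` and the dictionary buffer are hypothetical, and a local `socket.socketpair()` stands in for the BoardReader -> EB TCP connection. The entry is removed only after `sendall` returns without raising, so a failed send leaves the data in the buffer.

```python
import socket


def send_and_release(buffer, key, sock):
    """Send buffer[key] over a connected stream socket, then drop it."""
    payload = buffer[key]
    sock.sendall(payload)  # raises OSError on failure, so the entry is kept
    del buffer[key]        # data have been handed on: free the buffer slot
    return len(payload)


# Demo: a local socket pair stands in for the BoardReader -> EB link.
br_side, eb_side = socket.socketpair()
pending = {1: b"fragment-1", 2: b"fragment-2"}
sent = send_and_release(pending, 1, br_side)
received = eb_side.recv(1024)
br_side.close()
eb_side.close()
```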


SLIDE 5

Data Flow Error Conditions

1. A BoardReader never finds a match between a request and the data
   1. Detection: do we base the detection on a timeout or on the availability of fragments associated with (much) higher timestamps?
   2. Reaction: should the BoardReader create an empty fragment to send to the EB? (propose Yes) If this is done, the EB can assume that it will ALWAYS build complete events (except if a BoardReader crashes)
2. A BoardReader crashes
   1. Question: should this be considered a Fatal Error?
   2. Detection: the process management application(s) detect that the BR process is gone; EBs detect that the connection to the BR has been lost
   3. Reaction: end the current run or build incomplete events or create empty fragments for the missing pieces?
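One of the two detection options in item 1.1, combined with the proposed empty-fragment reaction, can be sketched as follows. This is not taken from any existing code: the function `answer_request`, the `margin` parameter, and its value of 1000 timestamp ticks are assumptions. The request is declared unmatched once the buffer already holds data with a much higher timestamp.

```python
def answer_request(buffered, requested_ts, margin=1000):
    """Return (kind, payload) for a request, or None if it is too early to decide.

    buffered: dict mapping timestamp -> payload held by the BoardReader.
    margin: assumed gap (in timestamp ticks) beyond which data count as missing.
    """
    if requested_ts in buffered:
        return ("data", buffered.pop(requested_ts))
    newest = max(buffered, default=None)
    if newest is not None and newest > requested_ts + margin:
        # Much later data are already here, so the requested data will not come.
        # Send an empty fragment so the EB can still complete the event.
        return ("empty", b"")
    return None  # keep waiting (a timeout could be applied here instead)


buffered = {5000: b"x", 9000: b"y"}
hit = answer_request(buffered, 5000)        # matching data found
gone = answer_request(buffered, 3000)       # 9000 is far beyond 3000 + margin
undecided = answer_request(buffered, 8500)  # 9000 is within the margin
```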


SLIDE 6

Data Flow Error Conditions continued

3. A BoardReader restarts
   1. Do we want to make a BoardReader crash a recoverable error or not?
   2. Reconfiguring and re-syncing the BoardReader and its associated hardware with the rest of the system seems like it will be non-trivial
   3. There would also be reconnection with EBs to be done
   4. This may be a great longer-term goal, but maybe we consider this a low priority for protoDUNE. We expect that this would involve interaction with RunControl.
4. DFO crash: FATAL, start new run


SLIDE 7

Data Flow Error Conditions continued

5. An EB node crashes:
   1. Question: should this be considered a Fatal Error? Does the answer to this question depend on the configuration of the system (whether the EBs are writing data to disk, if all of the EBs are needed to handle the full rate, etc.)?
   2. Detection: the process management application(s) detect the failure, the TCP connection to the DFO will be closed, and events assigned to the bad node will not be eliminated in the DFO
   3. Reaction: the DFO continues assigning events to other EB nodes. Events assigned to the crashed EB will be lost.
6. An EB node restarts:
   1. Do we want to foresee a recovery scenario? The way in which this can be done very much depends on how EBs announce themselves to the DFO nodes and how the connections to the BRs are handled. And, EB processes will need to be properly configured.
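The reaction proposed in item 5.3 can be sketched as DFO bookkeeping. Everything here is hypothetical: the class `DfoBookkeeping`, its method names, and the round-robin assignment are assumptions for illustration. When the connection to an EB is lost, that EB is taken out of the rotation, its in-flight events are written off as lost, and new events go to the remaining EBs.

```python
class DfoBookkeeping:
    """Hypothetical DFO state: which EB owns which unfinished event."""

    def __init__(self, builders):
        self.available = list(builders)
        self.in_flight = {}  # event_id -> EB name
        self._turn = 0

    def assign(self, event_id):
        if not self.available:
            raise RuntimeError("no EventBuilders left")
        eb = self.available[self._turn % len(self.available)]
        self._turn += 1
        self.in_flight[event_id] = eb
        return eb

    def event_done(self, event_id):
        # The EB reported the event as complete: clear it from the DFO.
        self.in_flight.pop(event_id, None)

    def eb_lost(self, eb):
        """Called when the TCP connection to an EB closes.

        Returns the ids of the events that were assigned to it and are lost.
        """
        if eb in self.available:
            self.available.remove(eb)
        lost = sorted(e for e, owner in self.in_flight.items() if owner == eb)
        for e in lost:
            del self.in_flight[e]
        return lost


dfo = DfoBookkeeping(["eb0", "eb1", "eb2"])
owners = [dfo.assign(i) for i in range(1, 7)]  # events 1 and 4 go to eb0
dfo.event_done(1)                              # event 1 finished before the crash
lost = dfo.eb_lost("eb0")                      # only event 4 is lost
after = [dfo.assign(i) for i in range(7, 11)]  # eb0 is no longer used
```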


SLIDE 8

Data Flow Error Conditions continued

7. An event is never requested:
   1. How will we clear it from the BoardReaders eventually? Timeout, circular buffer?
   2. This depends somewhat on the implementation of the FragGen, e.g. the basic operation of the FELIX FragGen will include dropping of unwanted data.
8. Aggregator cannot write more data
   1. Detection: the EventBuilders assigned to this Aggregator will not be able to send more data and will no longer be assigned new events by the DFO.
   2. Recovery: continue writing to other Aggregators, or Fatal Error?
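The circular-buffer option raised in item 7.1 can be sketched as below. The `RingBuffer` class and its capacity of 3 are assumptions for illustration and say nothing about how the FELIX FragGen actually drops data. Data that are never requested are simply overwritten once the buffer wraps around.

```python
from collections import deque


class RingBuffer:
    """Fixed-capacity buffer: the oldest unrequested data are overwritten."""

    def __init__(self, capacity):
        self._slots = deque(maxlen=capacity)  # (timestamp, payload) pairs

    def add(self, timestamp, payload):
        self._slots.append((timestamp, payload))  # drops the oldest when full

    def take(self, timestamp):
        # Return and remove the payload for a requested timestamp, if still held.
        for i, (ts, payload) in enumerate(self._slots):
            if ts == timestamp:
                del self._slots[i]
                return payload
        return None


ring = RingBuffer(capacity=3)
for ts in (10, 20, 30, 40):
    ring.add(ts, f"data@{ts}".encode())
dropped = ring.take(10)  # timestamp 10 was overwritten when 40 arrived
kept = ring.take(30)     # still in the buffer
remaining = [ts for ts, _ in ring._slots]
```

A timeout-based cleanup would instead compare each entry's age against a limit. The ring buffer needs no clock but ties retention time to the data rate.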
