DAQ development and operations at PDSP
DUNE Collaboration Week, 2019-05-20
Roland Sipos, CERN EP-DT
Overview
This talk is about the development and operations of the ProtoDUNE-SP DAQ. Three main elements:
Detector operations: ○ Ensure system stability for data taking ○ Support for DAQ users
○ Understanding limitations and issues, eliminating problems ○ Tests of new features during the dedicated development days
○ DUNE DAQ components development ○ Integration of the stable new features
2
○ Development Fridays moved to Mondays, to avoid starting the week with hidden issues in the system
○ System-wide stability issues in January
○ Efforts for better issue tracking
○ Extremely useful for understanding limitations of the system
3
Currently the DAQ support approach is informal and operates on a best-effort basis. This is not sustainable; we are aware of it and are continuously working on improvements. Planning the re-introduction of an on-call DAQ shift for ProtoDUNE:
4
Slack is a great tool for communication, but not so much for keeping track of progress. We introduced an issue tracker for ongoing developments and pending problems.
○ Still not too many users, but slowly growing
We need to encourage developers to follow up on issues, and also to track their progress
The loss of manpower on critical components needs to be compensated.
5
○ Finalized: the DAQ is fully prepared for APA7
○ Partition 6 with RCE readout
○ This might change, as we gradually move to full FELIX readout
○ HV current limit threshold
○ Ground plane signals
○ Purity monitor signals
Under development:
○ For automated purity monitor runs
6
DAQ servers
○ To avoid early Meltdown/Spectre mitigation (retpoline)
○ Aligned configuration
○ Some servers have reboot/poweroff issues
○ SSD firmware upgrade of FELIX servers
○ Intel QAT driver automation (with user support)
Services
○ WIB and SSP operational monitoring scripts
○ Still have some issues
7
Needed for several noise study runs, which take substantial time (max. 3 hours)
○ Design goal: 25 Hz
Still, there are some hidden issues under the hood!
○ And the introduction of the DFO will eliminate the use of UDP messaging for the routing table
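For readers less familiar with the routing table: in a token-based scheme, each EventBuilder (EB) announces free buffers as tokens, and a central component assigns each event sequence ID to one EB, so that all BoardReaders send that event's fragments to the same place. The toy model below assumes a simple first-come token policy; the class and method names are hypothetical and do not reflect the artdaq Routing Master or the DFO interfaces.

```python
from collections import deque


class HypotheticalOrchestrator:
    """Toy token-based router (illustrative only, not the DFO API).

    Each EB adds a token when it has a free buffer; each new sequence ID
    is routed to the EB whose token arrived first.
    """

    def __init__(self):
        self.tokens = deque()    # FIFO of EB ranks with a free buffer
        self.routing_table = {}  # sequence_id -> EB rank

    def add_token(self, eb_rank):
        # An EB signals that it can accept one more event.
        self.tokens.append(eb_rank)

    def route(self, sequence_id):
        # Assign the event to the next available EB; None if all EBs are busy.
        if not self.tokens:
            return None
        eb_rank = self.tokens.popleft()
        self.routing_table[sequence_id] = eb_rank
        return eb_rank
```

With tokens from EB 0, EB 1 and EB 0 again, sequence IDs 1, 2 and 3 go to EBs 0, 1 and 0; a fourth ID gets no route until another token arrives, which is where back-pressure would come from in such a scheme.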
8
○ ArtDAQ
  ■ Several parts of the framework (details on next slide)
○ FELIX
  ■ Align software versions to the newest ATLAS FELIX suite
  ■ Better operational monitoring and automated error recovery
○ Run Control
  ■ Alarm system improvements
○ System administration
  ■ Automation of missing elements (e.g., FELIX)
○ Feature requests
○ Continuously discussed and followed up
9
Substantial improvements in the DAQ software framework (Many thanks to Kurt and the ArtDAQ developers)
○ Crashed EBs can be restarted in the same run!
○ Event integrity check
○ Ongoing work to group FELIX data from each APA in its own art/ROOT data product
○ And on work area packaging/handling
10
Offline experts reported incomplete events in the data; therefore, a new plugin (FragmentWatcher) was introduced for the EventBuilder
○ Missing fragments
○ Empty fragments
○ Mostly from SSP BoardReaders
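The two failure modes the plugin watches for can be illustrated with a small per-event check. This is a minimal sketch assuming a hypothetical `Fragment` record (source ID plus payload) and a known list of expected fragment IDs per event; it is not the FragmentWatcher implementation or the artdaq fragment API.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    # Hypothetical minimal fragment: source ID plus raw payload bytes.
    fragment_id: int
    payload: bytes


def check_event(fragments, expected_ids):
    """Return (missing_ids, empty_ids) for one event.

    missing: expected fragment IDs that never arrived.
    empty:   fragments that arrived with a zero-length payload.
    """
    received = {f.fragment_id for f in fragments}
    missing = sorted(set(expected_ids) - received)
    empty = sorted(f.fragment_id for f in fragments if len(f.payload) == 0)
    return missing, empty
```

For an event expecting fragments 1, 2 and 3 where fragment 2 arrives empty and fragment 3 never arrives, `check_event` reports `([3], [2])`. Counting both lists per BoardReader would make it easy to see which sources dominate.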
11
To improve the user experience and to warn users when the system is in an error state:
○ Introduction of a RunControl bot on Slack
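A bot like this mainly needs two pieces: deciding when to speak, and posting the message. The sketch below assumes hypothetical RunControl state names and uses a Slack incoming webhook (a POST with a JSON body containing a `text` field); the webhook URL and function names are placeholders, not the actual bot's code.

```python
import json
import urllib.request

ERROR_STATES = {"error", "failed"}  # hypothetical RunControl state names


def alerts_for(transitions):
    """Yield one message per transition *into* an error state,
    so users are warned once instead of on every poll."""
    for old, new in transitions:
        if new in ERROR_STATES and old not in ERROR_STATES:
            yield f"RunControl entered state '{new}' (was '{old}')"


def build_slack_request(webhook_url, text):
    # Slack incoming webhooks accept a JSON body with a "text" field.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req):
    # Performs the actual HTTP call; not exercised in this sketch.
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

Alerting only on transitions keeps the channel quiet while the system stays in the same error state, which matters if the bot polls RunControl frequently.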
12
RCE APAs are gradually moving to FELIX
○ APA4 moves to FELIX in June (half of the detector read out by FELIX)
13
○ HitFinding
○ Self-triggering chain
○ Single-host FELIX setup
○ Co-processor
○ Control and Configuration Management (CCM studies)
○ Fault-recovery
○ … and many more!
14
15
From Philip Rodrigues
1 WIB frame (464 B) => 256 ADCs + headers @ 2 MHz
○ This implies the need to extend the FELIX Overlay with another version
Preliminary tests show a gain in CPU utilization
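The frame figures above translate directly into a data rate. The arithmetic below assumes one frame per 2 MHz tick on a single stream and tightly packed 12-bit ADC values (the 12-bit width is taken from the next slide); the variable names are illustrative.

```python
FRAME_BYTES = 464          # one WIB frame, from the slide
CHANNELS = 256             # ADC values per frame
ADC_BITS = 12              # assumed packed ADC width
FRAME_RATE_HZ = 2_000_000  # one frame per 2 MHz sample tick

payload_bytes = CHANNELS * ADC_BITS // 8       # packed ADC data per frame
header_bytes = FRAME_BYTES - payload_bytes     # everything else in the frame
stream_bytes_per_s = FRAME_BYTES * FRAME_RATE_HZ
stream_gbps = stream_bytes_per_s * 8 / 1e9     # gigabits per second

print(payload_bytes, header_bytes, stream_bytes_per_s, stream_gbps)
```

This gives 384 B of ADC payload and 80 B of headers per frame, or 928 MB/s (about 7.4 Gb/s) per frame stream, which is why per-frame processing cost shows up so quickly in CPU utilization.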
16
From Giovanna’s plenary DAQ talk
○ Expand 12-bit ADC values into 16 bits
○ Reorder wires in order to select only the collection plane
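Both steps can be sketched in a few lines. This assumes a generic little-endian bit stream in which value i occupies bits [12i, 12i + 12), which is not the actual WIB/FELIX layout, and a hypothetical per-channel plane map using "X" for the collection plane.

```python
from array import array


def pack_12bit(values):
    """Pack integers into a little-endian stream of 12-bit fields
    (helper to produce test input for the sketch)."""
    bits = 0
    for i, v in enumerate(values):
        bits |= (v & 0xFFF) << (12 * i)
    nbytes = (12 * len(values) + 7) // 8
    return bits.to_bytes(nbytes, "little")


def unpack_12bit(packed, count):
    """Expand `count` 12-bit ADC values into 16-bit unsigned integers."""
    bits = int.from_bytes(packed, "little")
    return array("H", ((bits >> (12 * i)) & 0xFFF for i in range(count)))


def select_collection(adcs, plane_of_channel):
    """Keep only collection-plane ('X') channels, in channel order.
    `plane_of_channel` is a hypothetical channel-to-plane map."""
    return array("H", (a for a, p in zip(adcs, plane_of_channel) if p == "X"))
```

Expanding to 16 bits costs memory (2 bytes instead of 1.5 per sample) but gives aligned values that later stages can process without bit manipulation.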
Full-chain tests are already ongoing: FELIX BR -> HitFinder BR -> SoftwareTrigger BR. Close to reality: the full chain will be tested during the next DAQ testing periods at NP04.
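To give a feel for what the HitFinder stage does with collection-plane data, here is a toy threshold-over-pedestal hit finder for one channel. It assumes the pedestal is already known and uses a fixed threshold; the `Hit` fields and `find_hits` function are hypothetical and are not the HitFinder BR algorithm.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    channel: int
    start_tick: int
    end_tick: int  # exclusive
    peak_adc: int  # max ADC above pedestal within the hit
    sum_adc: int   # integrated ADC above pedestal


def find_hits(channel, waveform, pedestal, threshold):
    """A hit opens when (adc - pedestal) exceeds `threshold` and closes
    when the signal drops back to or below it."""
    hits = []
    start = None
    peak = total = 0
    for tick, adc in enumerate(waveform):
        signal = adc - pedestal
        if signal > threshold:
            if start is None:
                start, peak, total = tick, 0, 0
            peak = max(peak, signal)
            total += signal
        elif start is not None:
            hits.append(Hit(channel, start, tick, peak, total))
            start = None
    if start is not None:  # hit still open at the end of the window
        hits.append(Hit(channel, start, len(waveform), peak, total))
    return hits
```

The resulting hits are what a downstream trigger stage would group into trigger candidates.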
17
From Giovanna’s plenary DAQ talk
Goal: eliminate the 100 Gb/s peer-to-peer connection between the FELIX host server and the BoardReader application by merging the FELIX data processing software with the BoardReader's data selection.
Gain: less space required (one server fewer) and lower cost (1 x server and 2 x NICs)
Also R&D towards the DUNE approach
First working version (not production ready; needs manual adjustments)
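Architecturally, the change replaces a network hop with an in-process hand-off. The sketch below models that with two Python threads sharing a bounded `queue.Queue`, which provides back-pressure when selection falls behind; the function names are hypothetical stand-ins, not the actual FELIX or artdaq code.

```python
import queue
import threading


def felix_reader(frames, out_q):
    """Stand-in for the FELIX data-processing side: hand frames to an
    in-process queue instead of sending them over a network link."""
    for frame in frames:
        out_q.put(frame)  # blocks if the consumer falls behind
    out_q.put(None)       # sentinel: end of data


def boardreader_selection(in_q, selected, accept):
    """Stand-in for the BoardReader's data selection."""
    while True:
        frame = in_q.get()
        if frame is None:
            break
        if accept(frame):
            selected.append(frame)


def run_single_host(frames, accept, depth=1024):
    q = queue.Queue(maxsize=depth)  # bounded buffer gives back-pressure
    selected = []
    producer = threading.Thread(target=felix_reader, args=(frames, q))
    consumer = threading.Thread(
        target=boardreader_selection, args=(q, selected, accept))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return selected
```

For example, `run_single_host(range(10), lambda f: f % 2 == 0)` keeps the even "frames" in order, since a single producer and a single consumer share one FIFO.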
18
Main goals:
○ Introduction of the DataFlow Orchestrator in the EB
○ Eliminate the Routing Master
○ Still support only full event building
○ Introduce software hit finding, trigger candidate and module trigger applications
○ Introduce FELIX firmware with data reordered by planes
○ Provide FELIX readout for APA4
19
Thank you for your attention!
20