What to Learn From MicroBooNE DAQ?


  1. What to Learn From MicroBooNE DAQ?
     Wesley Ketchum, with input from lots of MicroBooNE people
     30 October 2017

  2. First things first
     - MicroBooNE Detector Paper: JINST 12, P02017 (2017), https://arxiv.org/abs/1612.05824
       - (Basically) everything in this talk that is not my opinion comes from there
     - MicroBooNE continues running well
       - Starting third year of data-taking
       - >95% of POT delivered is recorded to tape
       - That's integrated, so the 5% loss is not (all) due to DAQ (typical uptime >97%)
     [Plot: delivered and recorded POT, Oct '15 through Oct '17]

  3. Design: Electronics
     [Diagram: readout electronics design; 36 PMT channels]

  4. TPC Readout Electronics
     [Diagram: TPC readout electronics]

  5. PMT/Trigger Readout Electronics
     - PMT readout
       - Beam discriminator: unbiased readout for 23.4 us around trigger
       - Cosmic discriminator: threshold requirement, readout for 625 ns
     - Clock
       - Common "frame number" (1.6 ms counter) from start of run
       - Pulse-per-second signal from GPS latches the time, allowing a lookup map to real time (see the sketch below)
       - Used for matching auxiliary data (like beam and cosmic ray tagger)
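To illustrate how a PPS-latched clock of this kind can be used, here is a minimal Python sketch (not MicroBooNE code; the latch format and helper names are assumptions) that maps a frame number to an absolute time using the most recent GPS latch:

```python
# Hypothetical sketch: converting a DAQ "frame number" to an absolute time
# using GPS pulse-per-second (PPS) latches. Assumptions: 1.6 ms frames counted
# from the start of the run; each PPS latches the frame at which the second
# boundary arrived, paired with the corresponding GPS (UNIX) second.

FRAME_PERIOD_S = 1.6e-3  # one readout frame = 1.6 ms

def build_pps_map(pps_latches):
    """pps_latches: list of (latched_frame, gps_unix_second) pairs,
    one per second, recorded by the clock/trigger hardware."""
    return sorted(pps_latches)

def frame_to_utc(frame, pps_map):
    """Compute an absolute (UNIX) time for a frame by extrapolating from the
    most recent PPS latch at or before that frame."""
    latched_frame, gps_second = max(
        (entry for entry in pps_map if entry[0] <= frame),
        key=lambda entry: entry[0],
    )
    return gps_second + (frame - latched_frame) * FRAME_PERIOD_S

# Example: two PPS latches one second (625 frames) apart
pps_map = build_pps_map([(375, 1509345000), (1000, 1509345001)])
print(frame_to_utc(1200, pps_map))  # 1509345001.32
```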

  6. Two data streams
     - "Triggered" (NU) stream:
       - TPC: lossless Huffman compression
       - PMT: no compression applied, readout of 4 frames (no trimming)
       - Operating point: data ~150 MB per event before compression, ~35 MB after compression
     - "Continuous" (SN) stream:
       - TPC: lossy zero suppression, read out frame-by-frame (15 MB/s per crate); a zero-suppression sketch follows below
       - PMT: just reads out (~7 MB/s)
       - Preference given to triggered-stream data
     - Additional data
       - Cosmic ray tagger panels added around the MicroBooNE detector, being (or will be) combined in an offline process
       - Read out continuously, matched to TPC data on timestamp
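As a concrete picture of what a lossy zero-suppressed TPC stream can look like, here is a minimal sketch (not the MicroBooNE firmware; the baseline, threshold, and padding values are made-up assumptions) that keeps only regions of interest around above-threshold activity:

```python
# Hypothetical sketch: simple zero suppression for a continuous TPC stream.
# Assumptions: a per-channel baseline and threshold, plus a few samples of
# padding kept on either side of each region of interest.
import numpy as np

def zero_suppress(waveform, baseline, threshold, padding=3):
    """Return a list of (start_index, samples) blocks where the waveform
    deviates from baseline by more than threshold, padded on both sides."""
    over = np.abs(waveform - baseline) > threshold
    blocks = []
    i, n = 0, len(waveform)
    while i < n:
        if over[i]:
            start = max(0, i - padding)
            j = i
            while j < n and over[j]:
                j += 1
            end = min(n, j + padding)
            blocks.append((start, waveform[start:end].copy()))
            i = end
        else:
            i += 1
    return blocks

# Example: a flat baseline of 400 ADC counts with one small pulse
wf = np.full(100, 400, dtype=np.int16)
wf[50:55] += 25
print(zero_suppress(wf, baseline=400, threshold=10))
```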

  7. Thoughts on readout design
     - First and foremost: it works for the needs of the experiment
       - And works pretty darn well
     - Largest struggles were in dealing with real data
       - PMT rate higher than expected → modifications of thresholds/buffers
       - Likely leading cause of DAQ crashes is FIFO overflow on the PMT readout
       - TPC noise higher / generally not as expected
       - Huffman compression factor of x5 instead of the hoped-for x10
       - More complications in the continuous-readout-mode rates
     - Continuous stream competes for resources
       - Despite a dedicated readout stream, there are still some shared resources (data transmission on the crate; both streams go to the same server)
     - Lacked parasitic data-taking modes for testing DAQ components
       - Hardware-based PMT trigger and continuous stream were not online at the start of beam → difficulty commissioning without losing data
       - Also, you really, really need to use real data for commissioning

  8. Software data flow
     - MicroBooNE doesn't use artdaq, but shares the overall design
       - I'll translate to artdaq names
     - BoardReaders
       - Receive data from hardware
       - Move it to a large circular buffer
       - Process it, identify data belonging to a single event, move it to an outbound queue
       - Send to the EventBuilder
     - EventBuilder + Aggregator (one multi-threaded process for us; see the sketch below)
       - Collect fragments
       - When an event is complete, transfer its fragments to the raw event queue
       - Process raw events, apply the software trigger, write to disk
       - 50 events per file, no filtering into separate files
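A minimal Python sketch of the "all fragments must report" event-building model described above (not the MicroBooNE code; source names and fragment contents are assumptions):

```python
# Hypothetical sketch: an event builder that holds fragments per event number
# and declares an event complete only once every expected source has reported.
from collections import defaultdict
from queue import Queue

class EventBuilder:
    def __init__(self, expected_sources):
        self.expected = set(expected_sources)   # e.g. crate/board identifiers
        self.pending = defaultdict(dict)        # event_number -> {source: fragment}
        self.raw_event_queue = Queue()          # complete events, ready for the software trigger

    def add_fragment(self, event_number, source, fragment):
        self.pending[event_number][source] = fragment
        # An event is built only when *all* sources report: one missing piece
        # means the whole event (and, in practice, the run) is held up.
        if set(self.pending[event_number]) == self.expected:
            self.raw_event_queue.put((event_number, self.pending.pop(event_number)))

# Example: three readout sources; event 42 completes once all three arrive
builder = EventBuilder(expected_sources=["tpc_crate_1", "tpc_crate_2", "pmt_crate"])
builder.add_fragment(42, "tpc_crate_1", b"...")
builder.add_fragment(42, "pmt_crate", b"...")
builder.add_fragment(42, "tpc_crate_2", b"...")
print(builder.raw_event_queue.get()[0])  # 42
```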

  9. Software trigger
     - High-level (software) trigger to reduce the rate
     - Low-level trigger from neutrino beam gates
     - High-level trigger looks for coincident PMT signals above threshold (see the sketch below)
       - Accepts prescaled unbiased data
       - <~10 ms per event total algorithm time
       - ~factor 20 reduction in data rate
     - Trigger applied after event-building
       - Limits the low-level trigger rate to network bandwidth (20 Hz) / readout-crate stability
       - Better to have PMT info at the low-level trigger...
     [Diagram: PMT Readout + TPC Readout → Event Builder → Software Trigger (pass/fail) → Data Logger]
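A minimal sketch of what such a software trigger can look like (not the MicroBooNE trigger code; the coincidence window, thresholds, and prescale factor are made-up assumptions):

```python
# Hypothetical sketch: pass an event on a PMT coincidence above threshold,
# or keep 1-in-N events unbiased via a prescale, as described above.

def pmt_coincidence(pmt_hits, window_ns=100.0, n_required=2, pe_threshold=2.0):
    """pmt_hits: list of (time_ns, photoelectrons). Return True if at least
    n_required hits above pe_threshold fall within a window_ns-wide window."""
    times = sorted(t for t, pe in pmt_hits if pe > pe_threshold)
    for i, t0 in enumerate(times):
        in_window = sum(1 for t in times[i:] if t - t0 <= window_ns)
        if in_window >= n_required:
            return True
    return False

def software_trigger(event_number, pmt_hits, prescale=1000):
    """Pass on a PMT coincidence, or keep 1-in-prescale events unbiased."""
    if event_number % prescale == 0:
        return True, "unbiased_prescale"
    if pmt_coincidence(pmt_hits):
        return True, "pmt_coincidence"
    return False, "fail"

# Example: two PMT hits 40 ns apart pass; an empty event fails unless prescaled
print(software_trigger(7, [(1200.0, 5.1), (1240.0, 3.4)]))  # (True, 'pmt_coincidence')
print(software_trigger(7, []))                              # (False, 'fail')
print(software_trigger(1000, []))                           # (True, 'unbiased_prescale')
```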

  10. Thoughts on software design
      - General strategy: everything needs to work, or we get nothing
      - We rely on ...
        - Well-formatted data (well, with hard-coded exceptions)
        - In-sync fragments, with all fragments reporting
      - Pros: simpler (no partial events to handle/monitor, everything in a shared state); when it works, you trust it
      - Cons: one piece goes down and you have nothing; special modes really are a bit special
      - This has generally worked well for MicroBooNE
        - Things much more often than not work! But it's a simple system
      - Data format: binary data
        - Needs conversion to the offline format, which didn't really happen until later in commissioning → hectic moments in early commissioning to understand the data

  11. Additional software
      - Run control
        - Simple console-based python/shell scripts in a VNC session
        - Highly automated
          - Automatic re-launching of runs, no selection of components, etc.: pick a configuration and run length, and go
        - Music to wake the shifter in case of major errors
      - Monitoring
        - Custom metrics reported to a real-time database with ganglia
        - Some reported to SlowMonitoring / the central alarm area
          - The ones that aren't are "expert" level
        - Online data processing to monitor basic PMT and TPC waveforms/activity rates
          - Runs off of spying on data in shared memory, processes the binary data
      - Logging
        - We just write log files out for history
      - Configuration database
        - PSQL ↔ FCL tool: upload new configs by making new fcl files (a sketch of this pattern follows below)
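A rough sketch of the "PSQL ↔ FCL" upload pattern (not the MicroBooNE tool; the table schema, column names, and connection parameters are all assumptions for illustration):

```python
# Hypothetical sketch: store the text of a new .fcl configuration file in a
# PostgreSQL table so the DAQ can later pull it back out by name/id.
import psycopg2

def upload_config(fcl_path, config_name, db_params):
    with open(fcl_path) as f:
        fcl_text = f.read()
    conn = psycopg2.connect(**db_params)
    try:
        with conn, conn.cursor() as cur:   # commit on success
            cur.execute(
                "INSERT INTO daq_configurations (name, fcl_text, uploaded_at) "
                "VALUES (%s, %s, now()) RETURNING id",
                (config_name, fcl_text),
            )
            return cur.fetchone()[0]       # new configuration id
    finally:
        conn.close()

# Example (hypothetical connection parameters):
# new_id = upload_config("physics_run.fcl", "physics_v12",
#                        dict(host="daqdb", dbname="configs", user="daq"))
```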

  12. Thoughts on those additional elements
      - MicroBooNE gets away with a highly automated, console-based DAQ because there are not too many components, and the overall system is simple
      - Configurations must be carefully maintained... this can create a high load on experts
      - Online monitoring runs off of raw binary data, separate from the offline data format → prefer only slow changes in the quantities to monitor
      - In periods of duress, we demand swift conversion of files and dedicated people to continuously analyze the data
      - We don't collect enough run information into databases
        - E.g., local log text files are written with run uptime, to be used for POT integration information
        - → If it's worth having, plan to store it in a database

  13. Data Management
      - DAQ responsibility ends once a file hits the local disk
      - Online DM takes over for getting the file from disk to tape-backed storage
        - Automated local processes to ... (see the sketch below)
          - Search for new files
          - Generate metadata/auxiliary files
          - Copy to the outbound dropbox
          - Monitor when data is whisked away
          - Clean up local files
      - Nearline/Offline DM takes it from there
        - Automated processes on the grid for ...
          - Keep-up "swizzling" (reformatting) and reconstruction
          - Occupies ~100 nodes for "normal" (1 Hz) data rates
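A minimal sketch of the automated local data-management loop described above (not the MicroBooNE DM code; directory names, the ".ubdaq" extension, and the metadata format are assumptions):

```python
# Hypothetical sketch: find new DAQ files, write a metadata sidecar, copy to
# an outbound dropbox, and clean up locally once the dropbox copy is taken.
import json, shutil, time
from pathlib import Path

DATA_DIR = Path("/data/daq/output")   # where the DAQ writes files (assumed)
DROPBOX = Path("/data/dm/dropbox")    # outbound dropbox watched by offline DM (assumed)
SHIPPED = set()                       # files already handed off

def make_metadata(path):
    """Write a simple JSON sidecar next to the copied file."""
    meta = {"file_name": path.name, "size_bytes": path.stat().st_size,
            "created": path.stat().st_mtime}
    (DROPBOX / (path.name + ".json")).write_text(json.dumps(meta))

def ship_new_files():
    for path in sorted(DATA_DIR.glob("*.ubdaq")):
        if path.name in SHIPPED:
            continue
        shutil.copy2(path, DROPBOX / path.name)
        make_metadata(path)
        SHIPPED.add(path.name)

def cleanup_shipped_files():
    """Delete local copies only after the dropbox copy has been whisked away."""
    for name in list(SHIPPED):
        if not (DROPBOX / name).exists():        # offline DM has taken it
            (DATA_DIR / name).unlink(missing_ok=True)
            SHIPPED.discard(name)

# Simple polling loop
while True:
    ship_new_files()
    cleanup_shipped_files()
    time.sleep(60)
```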

  14. General thoughts
      - Requires very close coordination of the DAQ and DM groups
        - The DM group needs local DAQ cluster resources (CPU, disk read/write, network bandwidth) that can compete with DAQ functions
      - MicroBooNE woefully underestimated its data rate, volume, and resource needs
        - From the TDR:
          - Expected final compression was x10 (we achieved only x5)
          - Expected recorded data rate from BNB was 0.05 Hz (actual: ~0.15 Hz)
          - No careful accounting of any additional trigger sources (reality → ~0.7 Hz total rate)
        - And the physics groups demand more data still...
        - Need carefully validated and realistic data-volume and resource estimates (a back-of-the-envelope example follows below)
      - Additional considerations to ease offline DM?
        - MicroBooNE DAQ writes everything to one file
          - E.g., filtering on trigger streams would likely help offline re-swizzling/reconstruction
        - "Swizzling" takes significant resources
          - Reduce reformatting? Improve decompression routines, or make them less necessary?
      - To borrow from Josh Klein: try to be less paranoid and greedy
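As an example of the kind of estimate being asked for, here is a back-of-the-envelope sketch combining the numbers quoted on these slides (the live-time fraction is an assumption for illustration):

```python
# Hypothetical sketch: total data volume from event size, compression factor,
# trigger rate, and live time. Event size and compression are from slide 6;
# the total trigger rate is from slide 14; the live time is assumed.

raw_event_mb = 150.0          # uncompressed triggered event
compression = 5.0             # achieved compression factor (hoped-for was 10)
total_rate_hz = 0.7           # all trigger sources combined
live_seconds = 0.97 * 365 * 24 * 3600   # ~97% uptime over one year (assumed)

event_mb = raw_event_mb / compression
tb_per_year = event_mb * total_rate_hz * live_seconds / 1e6
print(f"{event_mb:.0f} MB/event, ~{tb_per_year:.0f} TB/year to tape")
# With the hoped-for x10 compression and the 0.05 Hz TDR rate, the same
# formula gives roughly a factor ~28 less data: the scale of the underestimate.
```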

  15. Conclusions/discussion
      - MicroBooNE DAQ is running, running well, and fits our physics needs
        - Very useful experience running a real physics experiment
        - With real results! And MORE COMING!
      - Elements to learn from, as discussed, from a design point of view:
        - For multiple data streams, careful evaluation of shared resources is needed
        - Need flexibility in data handling, compression, and triggering
        - Early and close integration with data management
        - Design for realistic data rates/volumes at a global/integrated level (DAQ + DM)
          - And then resist pressure for changes without a complete reevaluation of the entire chain
      - Also, MicroBooNE has loads of operational experience/advice, but I won't dwell on that here...
      - DISCUSSION TIME
