What to Learn From MicroBooNE DAQ?



SLIDE 1

What to Learn From MicroBooNE DAQ?

Wesley Ketchum with input of lots of MicroBooNE people

30 October 2017

SLIDE 2

First things first

• MicroBooNE Detector Paper: JINST 12, P02017 (2017)
  • https://arxiv.org/abs/1612.05824
  • (Basically) everything in this talk that is not my opinion comes from there
• MicroBooNE continues running well
  • Starting third year of data-taking
  • >95% of POT delivered is recorded to tape
  • That's integrated, so the 5% loss is not (all) due to DAQ (typical uptime >97%)


[Figure: plot with time axis labeled Oct '15, Oct '16, Oct '17]

SLIDE 3

Design: Electronics


36 PMT channels

SLIDE 4

TPC Readout Electronics

SLIDE 5

PMT/Trigger Readout Electronics

• PMT readout
  • Beam disc: unbiased readout for 23.4 µs around trigger
  • Cosmic disc: threshold requirement, readout for 625 ns
• Clock
  • Common “frame number” (1.6 ms counter) from start of run
  • Pulse-per-second (PPS) signal from GPS latches the time, allowing a lookup map to real time
  • Used for matching auxiliary data (like beam and cosmic ray tagger)
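The frame-number-to-real-time lookup can be sketched roughly as below. This is only an illustration, not MicroBooNE code: the latch record layout, the GPS timestamps, and the function name `frame_to_gps_time` are all assumptions. Only the 1.6 ms frame length comes from the slide.

```python
from bisect import bisect_right

FRAME_LENGTH_S = 1.6e-3  # frame counter period quoted on the slide

# Hypothetical PPS latch records: (frame number latched at the pulse, GPS time in s).
# A real latch would also carry a sub-frame sample count; omitted here for brevity.
pps_latches = [
    (0, 1_509_321_600.0),
    (625, 1_509_321_601.0),   # 1 s / 1.6 ms = 625 frames
    (1250, 1_509_321_602.0),
]

def frame_to_gps_time(frame, latches=pps_latches):
    """Estimate the GPS time at the start of `frame` from the most recent PPS latch."""
    frames = [f for f, _ in latches]
    i = bisect_right(frames, frame) - 1
    if i < 0:
        raise ValueError("frame precedes the first PPS latch")
    latched_frame, gps_time = latches[i]
    return gps_time + (frame - latched_frame) * FRAME_LENGTH_S
```

Interpolating from the most recent latch (rather than from the start of run) keeps any drift of the frame clock from accumulating over more than one second.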

SLIDE 6

Two data streams

• “Triggered” (NU):
  • TPC lossless Huffman compressed
  • PMT has no compression applied, readout 4 frames (no trimming)
  • Data ~150 MB before compression, ~35 MB after compression
• “Continuous” (SN):
  • TPC lossy zero-suppression, read out frame-by-frame (15 MB/s per crate)
  • PMT just reads out (~7 MB/s)
• Preference to triggered stream data
• Additional data
  • Cosmic tagger panels added around MicroBooNE detector, being (or will be) combined in offline process
  • Read out continuously, matched to TPC data on timestamp
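For readers unfamiliar with zero-suppression, here is a minimal sketch of the general idea: keep only samples that deviate from the baseline, plus a little padding. The fixed baseline, threshold, and padding are hypothetical choices; the slide does not describe the actual firmware algorithm.

```python
def zero_suppress(waveform, baseline, threshold, pad=2):
    """Return (start_index, samples) regions where |adc - baseline| > threshold.

    Each above-threshold sample is padded by `pad` samples on both sides,
    and overlapping regions are merged. Everything else is discarded (lossy).
    """
    n = len(waveform)
    keep = [False] * n
    for i, adc in enumerate(waveform):
        if abs(adc - baseline) > threshold:
            for j in range(max(0, i - pad), min(n, i + pad + 1)):
                keep[j] = True

    regions = []
    start = None
    for i, flagged in enumerate(keep):
        if flagged and start is None:
            start = i                                   # region opens
        elif not flagged and start is not None:
            regions.append((start, waveform[start:i]))  # region closes
            start = None
    if start is not None:
        regions.append((start, waveform[start:]))
    return regions
```

Storing the start index with each region is what lets the continuous stream be matched back to a position within the frame.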


[Figure annotation: Operating point]

SLIDE 7

Thoughts on readout design

• First/foremost: it works for the needs of the experiment
  • And works pretty darn well
• Largest struggles in dealing with real data
  • PMT rate higher than expected → modifications of thresholds/buffers
    • Likely leading cause of DAQ crashes is FIFO overflow on PMT readout
  • TPC noise higher/generally not as expected
    • Huffman compression factor x5 instead of hoped-for x10
    • More complications on continuous readout mode rates
  • Continuous stream competition for resources
    • Despite dedicated readout stream, still some shared resources (data transmission on crate, go to same server)
• Lacked parasitic data-taking modes for testing DAQ components
  • Hardware-based PMT trigger and continuous stream not online at start of beam → difficulty in commissioning without losing data
  • Also, you really really need to use real data for commissioning
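Why noise hurts the Huffman compression factor can be seen in a toy model. The sketch below assumes the encoded symbols are sample-to-sample ADC differences, that raw samples occupy 16 bits, and that noise is Gaussian on a flat baseline; all three are simplifying assumptions for illustration, and the code-table overhead is ignored.

```python
import heapq
import random
from collections import Counter

def huffman_code_lengths(symbols):
    """Map each distinct symbol to its Huffman code length in bits."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth in tree})
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def compression_factor(waveform, raw_bits=16):
    """Raw size divided by Huffman-coded size of sample-to-sample differences."""
    diffs = [b - a for a, b in zip(waveform, waveform[1:])]
    lengths = huffman_code_lengths(diffs)
    coded_bits = raw_bits + sum(lengths[d] for d in diffs)  # first sample kept raw
    return raw_bits * len(waveform) / coded_bits

def noisy_baseline(n, sigma, seed=0):
    """Flat baseline at 2048 ADC with Gaussian noise of width `sigma` (toy model)."""
    rng = random.Random(seed)
    return [round(2048 + rng.gauss(0, sigma)) for _ in range(n)]

quiet = compression_factor(noisy_baseline(5000, sigma=0.5))
noisy = compression_factor(noisy_baseline(5000, sigma=3.0))
```

In this toy model the wider noise spreads the differences over more symbols, so the average code gets longer and `noisy` comes out well below `quiet`.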

SLIDE 8

Software data flow

• MicroBooNE doesn’t use artdaq, but shares the overall design
  • I’ll translate to artdaq names
• BoardReaders
  • Receive data from hardware
  • Move to large circular buffer
  • Process, identify data belonging to single event, move to outbound queue
  • Send to EventBuilder
• EventBuilder+Aggregator (one multi-threaded process for us)
  • Collect fragments
  • When event complete, transfer fragments to raw event queue
  • Process raw events, apply software trigger, write to disk
  • 50 events per file, no filtering into separate files
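The fragment-collection step can be sketched as follows. This is a single-threaded illustration under assumed names (`Fragment`, `EventBuilder`, the source labels); it is not the artdaq API and not the MicroBooNE implementation.

```python
from collections import defaultdict, deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    event_number: int
    source: str      # hypothetical BoardReader/crate name
    payload: bytes

class EventBuilder:
    """Collect fragments per event; queue the event once every source has reported."""

    def __init__(self, expected_sources):
        self.expected = frozenset(expected_sources)
        self.pending = defaultdict(dict)   # event_number -> {source: Fragment}
        self.raw_event_queue = deque()     # complete events awaiting processing

    def add_fragment(self, frag):
        if frag.source not in self.expected:
            raise ValueError(f"unexpected source {frag.source!r}")
        self.pending[frag.event_number][frag.source] = frag
        if set(self.pending[frag.event_number]) == self.expected:
            # Event complete: hand all fragments over as one raw event.
            self.raw_event_queue.append(self.pending.pop(frag.event_number))
```

Note that an event missing even one fragment stays in `pending` forever here, which mirrors the "everything needs to work, or we get nothing" strategy: there is no partial-event path.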

SLIDE 9

Software trigger

• High-level software trigger to reduce rate
  • Low-level trigger from neutrino beam gates
  • High-level trigger looks for coincident PMT signals above threshold
  • Accepts prescaled unbiased data
  • <~10 ms per event total algorithm time
  • ~factor 20 reduction in data rate
• Trigger applied after event-building
  • Limits low-level trigger rate to network bandwidth (20 Hz)/readout crate stability
  • Better to have PMT info at low-level trigger…
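A rough sketch of a trigger of this kind: a PMT coincidence test combined with a prescaled unbiased pass-through. The threshold, multiplicity, window length, and all names are hypothetical; the actual MicroBooNE algorithm is not given in the slide.

```python
def pmt_coincidence(waveforms, threshold, min_pmts, window):
    """True if at least `min_pmts` channels exceed `threshold` within
    some span of `window` consecutive samples."""
    n = min(len(w) for w in waveforms)
    for start in range(n - window + 1):
        fired = sum(
            1 for w in waveforms
            if max(w[start:start + window]) > threshold
        )
        if fired >= min_pmts:
            return True
    return False

class PrescaledPass:
    """Accept every `prescale`-th event regardless of content (unbiased sample)."""

    def __init__(self, prescale):
        self.prescale = prescale
        self.count = 0

    def accept(self):
        self.count += 1
        return self.count % self.prescale == 0

def software_trigger(waveforms, unbiased, threshold=50, min_pmts=2, window=4):
    """Keep the event if it is a prescaled unbiased pick or shows a PMT coincidence."""
    keep_unbiased = unbiased.accept()  # always advance the prescale counter
    return keep_unbiased or pmt_coincidence(waveforms, threshold, min_pmts, window)
```

The prescale counter is advanced before the coincidence test so the unbiased sample does not depend on what the PMTs saw.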


[Diagram labels: PMT Readout, TPC Readout, Event Builder, Software Trigger (Pass/Fail), Data Logger]

SLIDE 10

Thoughts on software design

• General strategy: everything needs to work, or we get nothing
  • We rely on …
    • Well-formatted data (well, with hard-coded exceptions)
    • In-sync fragments, all fragments report
  • Pros: simpler (no partial events to handle/monitor, everything in shared state); when it works you trust it
  • Cons: one piece goes down, you have nothing; special modes really a bit special
  • This has generally worked well for MicroBooNE
    • Things much more often than not work! But it’s a simple system
• Data format: binary data
  • Needs conversion to offline format, which didn’t really happen until later in commissioning → hectic moments in early commissioning to understand data

SLIDE 11

Additional software

• Run control
  • Simple console-based Python/shell scripts in VNC
  • Highly automated
    • Automatic re-launching of runs, no selection of components, etc.: pick configuration, run length, and go
  • Music to wake shifter in case of major errors
• Monitoring
  • Custom metrics reported to real-time database with ganglia
    • Some reported to SlowMonitoring / central alarm area
    • The ones that aren’t are “expert” level
  • Online data processing to monitor basic PMT and TPC waveforms/activity rates
    • Runs off of spying data in shared memory, processes binary data
  • Logging
    • We just write log files out for history
• Configuration database
  • PSQL↔FCL tool: upload new configs by making new fcl files

SLIDE 12

Thoughts on those additional elements

• MicroBooNE gets away with a highly-automated console-based DAQ because there are not too many components, and it is an overall simple system
  • Configurations must be carefully maintained…can create high load on experts
• Online monitoring runs off of raw binary data, separate from the offline data format → changes to the monitored quantities are slow to make
  • In periods of duress, we demand the swift conversion of files and dedicated people to continuously analyze the data
• Don’t collect enough run information into databases
  • E.g. local log text files written with run uptime, to be used for POT integration information
  • → If it’s worth having, plan to store in a database
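As an illustration of the "store it in a database" advice, run uptime could be recorded and queried roughly as below. This sketch uses Python's built-in sqlite3 purely to stay self-contained; the table name, columns, and helper functions are hypothetical and do not describe any MicroBooNE schema.

```python
import sqlite3

def open_run_db(path=":memory:"):
    """Open (or create) a small run-uptime database."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS run_uptime (
               run INTEGER NOT NULL,
               subrun INTEGER NOT NULL,
               start_utc REAL NOT NULL,
               stop_utc REAL NOT NULL,
               PRIMARY KEY (run, subrun)
           )"""
    )
    return conn

def record_subrun(conn, run, subrun, start_utc, stop_utc):
    """Insert one subrun's start/stop times."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO run_uptime VALUES (?, ?, ?, ?)",
            (run, subrun, start_utc, stop_utc),
        )

def live_time(conn, t0, t1):
    """Total seconds of recorded uptime overlapping the interval [t0, t1]."""
    row = conn.execute(
        "SELECT COALESCE(SUM(MIN(stop_utc, ?) - MAX(start_utc, ?)), 0) "
        "FROM run_uptime WHERE stop_utc > ? AND start_utc < ?",
        (t1, t0, t0, t1),
    ).fetchone()
    return row[0]
```

With uptime in a table, a question like "how much live time overlaps this beam period" becomes one query instead of a log-parsing script.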

SLIDE 13

Data Management

• DAQ responsibility ends once file hits local disk
• Online DM takes over for getting file from disk to tape-backed storage
  • Automated local processes for …
    • Search for new files
    • Generate metadata/auxiliary files
    • Copy to outbound dropbox
    • Monitor when data is whisked away
    • Clean up local files
• Nearline/Offline DM takes it from there
  • Automated processes on grid for ...
    • Keep-up “swizzling” (reformatting) and reconstruction
    • Occupies ~100 nodes for “normal” (1 Hz) data rates
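The online DM steps (find new files, write metadata, copy to a dropbox, clean up once the data is gone) could look roughly like the sketch below. Directory layout, the `*.dat` pattern, the JSON sidecar, and the rule "absent from the dropbox means transferred" are all assumptions made for illustration; a production system would confirm the transfer with the storage system before deleting anything.

```python
import hashlib
import json
import shutil
from pathlib import Path

def stage_new_files(daq_dir, dropbox_dir, pattern="*.dat"):
    """Write a JSON metadata sidecar for each unstaged file and copy both to the dropbox."""
    daq_dir, dropbox_dir = Path(daq_dir), Path(dropbox_dir)
    staged = []
    for data_file in sorted(daq_dir.glob(pattern)):
        meta_file = data_file.with_name(data_file.name + ".json")
        if meta_file.exists():
            continue  # already staged on an earlier pass
        metadata = {
            "file_name": data_file.name,
            "size_bytes": data_file.stat().st_size,
            # Reads the whole file at once; large files would be hashed in chunks.
            "sha256": hashlib.sha256(data_file.read_bytes()).hexdigest(),
        }
        shutil.copy2(data_file, dropbox_dir / data_file.name)
        meta_file.write_text(json.dumps(metadata))
        shutil.copy2(meta_file, dropbox_dir / meta_file.name)
        staged.append(data_file.name)
    return staged

def clean_up_transferred(daq_dir, dropbox_dir, pattern="*.dat"):
    """Delete local copies whose dropbox copy is no longer present (assumed picked up)."""
    daq_dir, dropbox_dir = Path(daq_dir), Path(dropbox_dir)
    removed = []
    for data_file in sorted(daq_dir.glob(pattern)):
        meta_file = data_file.with_name(data_file.name + ".json")
        if meta_file.exists() and not (dropbox_dir / data_file.name).exists():
            data_file.unlink()
            meta_file.unlink()
            removed.append(data_file.name)
    return removed
```

Both functions are safe to call repeatedly from a polling loop: the sidecar file doubles as the "already staged" marker.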

SLIDE 14

General thoughts

• Requires very close coordination of DAQ and DM groups
  • DM group needs local DAQ cluster resources (CPU, disk read/write, network bandwidth) that can compete with DAQ functions
• MicroBooNE woefully underestimated its data rate, volume, and resource needs
  • From TDR:
    • Expected final compression was x10 (we achieved only x5)
    • Expected recorded data rate from BNB was 0.05 Hz (actual: ~0.15 Hz)
    • No careful accounting of any additional trigger sources (reality → ~0.7 Hz total rate)
  • And physics groups demand more data still...
  • Need carefully validated and realistic data volume and resource estimates
• Additional considerations to ease offline DM?
  • MicroBooNE DAQ writes everything to one file
    • E.g. filtering on trigger streams likely would help offline re-swizzling/reconstruction
  • “Swizzling” takes significant resources
    • Reduce reformatting? Improve/make less necessary decompression routines?
  • To borrow from Josh Klein: try to be less paranoid and greedy
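To get a feel for how the underestimates compound, the numbers quoted in this talk can be combined in a back-of-the-envelope calculation. The sketch assumes every recorded event is a full ~150 MB triggered readout (the uncompressed size quoted earlier in the talk) and uses the round x5 and x10 compression factors; treating all trigger sources this way is an assumption, not a statement from the slides.

```python
SECONDS_PER_DAY = 86_400
RAW_EVENT_MB = 150.0  # uncompressed triggered event size quoted earlier in the talk

def daily_volume_tb(rate_hz, compression):
    """Data volume per day in TB (1 TB = 1e6 MB) for a given rate and compression factor."""
    return RAW_EVENT_MB / compression * rate_hz * SECONDS_PER_DAY / 1e6

tdr_estimate = daily_volume_tb(rate_hz=0.05, compression=10)  # TDR assumptions
bnb_actual = daily_volume_tb(rate_hz=0.15, compression=5)     # BNB triggers only
all_triggers = daily_volume_tb(rate_hz=0.7, compression=5)    # all trigger sources
```

Under these assumptions the three cases come to roughly 0.06, 0.39, and 1.8 TB per day: a factor of 2 in compression times a factor of 14 in rate gives about 28 times the TDR-style estimate.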

SLIDE 15

Conclusions/discussion

• MicroBooNE DAQ is running, running well, and fits our physics needs
• Very useful experience running a real physics experiment
  • With real results! And MORE COMING!
• Elements to learn, as discussed, from design point of view
  • For multiple data streams, need careful evaluation of shared resources
  • Need flexibility in data handling, compression, and triggering
  • Early and close integration with data management
  • Design for realistic data rates/volume at a global/integrated level (DAQ+DM)
    • And then resist pressure for changes without complete reevaluation of entire chain
• Also, MicroBooNE has loads of operational experience/advice, but won’t dwell on that here…
• DISCUSSION TIME
