[PPT] - Thoughts on alternate DUNE DAQ design Georgia Karagiorgi DUNE DAQ PowerPoint Presentation

SLIDE 1

Thoughts on alternate DUNE DAQ design

Georgia Karagiorgi DUNE DAQ Meeting

Oct. 16, 2017

SLIDE 2

Introduction

This talk presents a conceptual DAQ design & architecture for the DUNE FD.

I will advocate that it be explored further and evaluated against DUNE DAQ requirements, as something it would be advantageous to move toward.

Concept: Online (real-time) image processing and data selection.

– Event ID “on the fly”, minimizing offline processing and reconstruction needs
– Could potentially require far less processing in the front-end DAQ

Design being explored leverages advancements in Deep Neural Networks and their applications, and is informed and motivated by
– Columbia Nevis Labs experience with MicroBooNE readout/DAQ system
– Deep Learning development and results by MicroBooNE & DUNE collaborations
– Recent work done in collaboration with L. Carloni’s Group (Columbia Comp. Sci.)

For this talk: explore the design from the perspective of the single-phase detector; the expectation is that the same design is equally applicable to dual-phase.


SLIDE 3

DUNE DAQ System Parameters: Data Rates

Consider a single 10kton module:

No data reduction:
– Continuous readout rate: 150 APA x 2,560 ch x 2 MHz x 1.5 B = 1.1 TB/s
– Single, localized event size (~size of an APA x 1 drift): 2,560 ch x 2 MHz x 1.5 B x 2.25 ms = 17.3 MB
– Single, extended event size (all APAs x 2.4 drifts): 150 APA x 2,560 ch x 2 MHz x 1.5 B x 5.4 ms = 6.22 GB
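The arithmetic above can be reproduced with a short script. This is a minimal sketch using only the parameters quoted on the slide; the function and constant names are illustrative and not part of any DUNE software. The unrounded continuous rate comes out at 1.152 TB/s, which the slide quotes as 1.1 TB/s.

```python
# Parameters quoted on the slide for a single 10 kton module.
N_APA = 150              # APAs per module
CH_PER_APA = 2560        # readout channels per APA
SAMPLING_HZ = 2.0e6      # 2 MHz sampling rate
BYTES_PER_SAMPLE = 1.5   # 1.5 bytes (12 bits) per sample
DRIFT_S = 2.25e-3        # one drift window, 2.25 ms


def continuous_rate(n_apa=N_APA):
    """Unreduced readout rate in bytes/s for n_apa APAs."""
    return n_apa * CH_PER_APA * SAMPLING_HZ * BYTES_PER_SAMPLE


def event_size(n_apa, n_drifts):
    """Unreduced event size in bytes for n_apa APAs read out over n_drifts drift windows."""
    return continuous_rate(n_apa) * n_drifts * DRIFT_S


if __name__ == "__main__":
    print(f"continuous rate : {continuous_rate() / 1e12:.3f} TB/s")
    print(f"localized event : {event_size(1, 1.0) / 1e6:.2f} MB")
    print(f"extended event  : {event_size(N_APA, 2.4) / 1e9:.2f} GB")
```

The localized event uses 1 APA and 1 drift; the extended event uses all 150 APAs and 2.4 drifts (5.4 ms), matching the two cases listed above.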

[Figure: two sketches of the APA array (APA 1 ... APA N), one showing a localized event and one showing an extended event]


SLIDE 4

DUNE DAQ System Parameters: Data Rates

Consider a single 10kton module:

With data reduction (e.g. factor of 500-1800, depending on noise and radiological backgrounds, see docdb-4481):
– Continuous readout rate: 150 APA x 2,560 ch x 2 MHz x 1.5 B / (500-1800) = 0.6-2.2 GB/s
– Single, localized event size (~size of an APA x 1 drift): 2,560 ch x 2 MHz x 1.5 B x 2.25 ms / (500-1800) = 10-35 kB
– Single, extended event size (all APAs x 2.4 drifts): 150 APA x 2,560 ch x 2 MHz x 1.5 B x 5.4 ms / (500-1800) = 3.5-12 MB
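The reduced figures follow from dividing the unreduced numbers by the two ends of the assumed reduction range. The sketch below is illustrative only (the helper name `reduced_range` is hypothetical). With the unrounded 1.152 TB/s the upper end of the continuous rate is about 2.3 GB/s; the 2.2 GB/s quoted above corresponds to starting from the rounded 1.1 TB/s.

```python
# Unreduced quantities, built from the parameters quoted on the slides.
RAW_RATE = 150 * 2560 * 2.0e6 * 1.5        # bytes/s, full module
RAW_LOCALIZED = 2560 * 2.0e6 * 1.5 * 2.25e-3   # bytes, 1 APA x 1 drift
RAW_EXTENDED = RAW_RATE * 5.4e-3           # bytes, all APAs x 2.4 drifts


def reduced_range(raw, factor_lo=500, factor_hi=1800):
    """Return (min, max) after applying the largest and smallest reduction factor."""
    return raw / factor_hi, raw / factor_lo


if __name__ == "__main__":
    lo, hi = reduced_range(RAW_RATE)
    print(f"continuous rate : {lo / 1e9:.2f}-{hi / 1e9:.2f} GB/s")
    lo, hi = reduced_range(RAW_LOCALIZED)
    print(f"localized event : {lo / 1e3:.1f}-{hi / 1e3:.1f} kB")
    lo, hi = reduced_range(RAW_EXTENDED)
    print(f"extended event  : {lo / 1e6:.2f}-{hi / 1e6:.2f} MB")
```

Because every output scales inversely with the assumed factor, a reduction factor that turns out lower than 500 raises all three numbers proportionally.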

[Figure: two sketches of the APA array (APA 1 ... APA N), one showing a localized event and one showing an extended event]


SLIDE 5

DUNE DAQ System Parameters: Data Rates


Note: A system which rests on noise assumptions and assumed data reduction factors is risk-prone


SLIDE 6

DUNE DAQ: Rethinking our challenge

DUNE is a 3D imaging device
Raw data format is ideally suited for deep-learning-based image processing techniques
Promising performance for powerful image processing and classification
E.g. performance with offline (GPU) training and inference:
– DUNE SP FD simulations [J. Hewes]: VGG16 CNN trained to isolate n-nbar oscillation events from atmospheric neutrino backgrounds (more in Jeremy’s thesis). Excellent separation between atm. nu and n-nbar signal images.
– MicroBooNE: CNNs successful in identification and differentiation among different particle types [JINST 12, P03011 (2017)].

[Table residue: columns “Detection Accuracy (%)” and “Most Frequent MisID (%)”]

SLIDE 7

New DAQ philosophy

Real-time image processing utilizing Deep Neural Networks (ideally on FPGA)
Minimize disk buffering needs (more on back-end)
Minimize reconstruction needs
Minimize reliance on noise rates (trainability)
Concerns

1. Speed of inference (per “image”): can we keep up with rates if we want to process every drift window (necessary for SN)?
2. Reliability of inference: we already know MC-only training is deficient; how can we train reliably? Rare event searches often have no “control” data samples.
3. Changing detector conditions and need for retraining: what features are DNNs most sensitive to? What retraining frequency? What resources does this require?
4. How do we practically (re)train on data in real time?
5. Cost, technology lifecycle, power consumption, lifetime, …

Studies are needed to address the above concerns and demonstrate the feasibility of a DNN-based readout & DAQ scheme early on.


SLIDE 8

A “stab” at a conceptual DAQ design:

[Diagram: conceptual DAQ data flow. Blocks recoverable from the slide:]
Input: cold elec. “slice” (e.g. 1 APA)
Noise filtering (e.g. coh. noise removal)
Channel data regrouping (e.g. by wire plane, by APA volume)
Signal processing (blocks “Signal processing 1” and “Signal processing 2”)
Active “frame” selection (e.g. min. integr. charge AND/OR external/beam trigger), three blocks; inputs from external/beam/photodet. triggers
ROI finding (cropping in channel and time space), three blocks
DNN: ROI image classif. (A/B/C/D/...), per plane or all 3 planes; Full frame classif. (SN), per plane or all 3 planes
Outputs: Reco. class A (high-E nu); Reco. class B (e.g. CRM); Reco. class C (e.g. p->Knu); Reco. class D (e.g. n-nbar); …; Reco. class X (SN)
Path labels: localized events; extended events
Region labels: “traditional” FPGA; DNN
Stage labels: data pre-processing; data pre-selection/selection; disk writing
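To make the “active frame selection” and “ROI finding” blocks of the diagram concrete, here is a toy sketch in plain Python. It is not the proposed firmware: the frame layout (a list of per-channel waveforms of pedestal-subtracted ADC counts), the thresholds, and all function names are assumptions made for illustration. The selection below uses the OR variant of the “AND/OR” condition on the slide.

```python
def integrated_charge(frame):
    """Sum of ADC counts over all channels and time ticks of a frame."""
    return sum(sum(waveform) for waveform in frame)


def select_frame(frame, min_charge, external_trigger=False):
    """Keep a frame if an external trigger fired OR its integrated charge is large enough."""
    return external_trigger or integrated_charge(frame) >= min_charge


def find_roi(frame, adc_threshold):
    """Bounding box (ch_lo, ch_hi, t_lo, t_hi), inclusive, of samples above threshold.

    Returns None if no sample exceeds the threshold.
    """
    hits = [(ch, t)
            for ch, waveform in enumerate(frame)
            for t, adc in enumerate(waveform)
            if adc > adc_threshold]
    if not hits:
        return None
    channels = [ch for ch, _ in hits]
    ticks = [t for _, t in hits]
    return min(channels), max(channels), min(ticks), max(ticks)


def crop(frame, roi):
    """Crop a frame in channel and time space to the given ROI."""
    ch_lo, ch_hi, t_lo, t_hi = roi
    return [waveform[t_lo:t_hi + 1] for waveform in frame[ch_lo:ch_hi + 1]]


if __name__ == "__main__":
    # 4 channels x 5 ticks, with a small cluster of charge in the middle.
    frame = [
        [0, 0, 0, 0, 0],
        [0, 0, 7, 9, 0],
        [0, 0, 0, 8, 0],
        [0, 0, 0, 0, 0],
    ]
    if select_frame(frame, min_charge=20):
        roi = find_roi(frame, adc_threshold=5)
        print(roi, crop(frame, roi))
```

In the design above, the cropped ROI image would go to the ROI classifier, while the full-frame (SN) classifier would see the uncropped frame.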


SLIDE 9

A “stab” at a conceptual DAQ design:

[Diagram: conceptual DAQ data flow, as on Slide 8]

Raw data (channel, ADC, TDC) flows from left to right, organized serially in “frames”. A frame is O(1) drift and, e.g., 1 APA wire plane (well-defined boundary). Every single frame is processed down to this stage (at least); frames can then optionally be dropped.


SLIDE 10

What is this layer?

Layer developed and optimized for application of DNN for both image selection and classification
Can be a combination of FPGA and GPU devices:
GPU: acceleration of training
FPGA: acceleration of inference
During normal operations, DNNs implemented in FPGA select and classify frames/ROIs of interest.
GPU allows semi-offline (re)training and adjustment to changing detector conditions.
After this layer, images are already classified; specialized, topology-targeted reconstruction can be applied separately on each event class

A “stab” at a conceptual DAQ design:

[Diagram: conceptual DAQ data flow, as on Slide 8]


SLIDE 11

A “stab” at a conceptual DAQ design:

[Diagram: conceptual DAQ data flow, as on Slide 8]

ü Modularity ü Flexibility ü Accessibility ü Built-in redundancies ü Minimization of offline processing and reconstruction ü No explicit assumption on noise levels and data reduction Additional considerations (per docdb-8841) to be addressed:

Cost per channel
Technology lifecycle
Power consumption
Lifetime & reliability



SLIDE 12

A “stab” at a conceptual DAQ design:

[Diagram: conceptual DAQ data flow, as on Slide 8]

ü Built-in redundancies

Coherent noise filtering may be redundant (though unlikely)
Channel data regrouping may be unnecessary, as training may work on mixed-wire arrangements
Signal processing (deconvolution, further noise filtering) may not be necessary; e.g. DNN performance
n raw waveforms may be equivalently good (ongoing study by Y. Zhou)
Active frame selection will be necessary if DNN inference is “too slow”
ROI cropping will be necessary if DNN implementation (e.g. in FPGA) is limited in terms of image size
If frame-by-frame processing is fast enough, then localized and extended events paths can be merged
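One way to read “built-in redundancies” is as a chain of stages that can each be bypassed without changing the rest of the chain. The following is a hedged sketch of that idea only; the stage names, the toy median-based coherent-noise filter, and the `run_pipeline` helper are all hypothetical and stand in for whatever the real pre-processing blocks would be.

```python
from statistics import median


def coherent_noise_filter(frame):
    """Toy coherent-noise removal: subtract the per-tick median over channels."""
    n_ticks = len(frame[0])
    common = [median(wf[t] for wf in frame) for t in range(n_ticks)]
    return [[wf[t] - common[t] for t in range(n_ticks)] for wf in frame]


def active_frame_selection(frame, min_charge=10):
    """Return the frame if its integrated charge passes the cut, else None (frame dropped)."""
    total = sum(sum(wf) for wf in frame)
    return frame if total >= min_charge else None


# Ordered list of (name, stage) pairs; order follows the left-to-right data flow.
STAGES = [
    ("noise_filter", coherent_noise_filter),
    ("frame_selection", active_frame_selection),
]


def run_pipeline(frame, stages, enabled):
    """Apply each enabled stage in order; stages missing from `enabled` default to on."""
    for name, stage in stages:
        if not enabled.get(name, True):
            continue  # stage bypassed
        frame = stage(frame)
        if frame is None:
            return None  # frame was dropped by a selection stage
    return frame


if __name__ == "__main__":
    frame = [[1, 1, 1], [1, 21, 1], [1, 1, 1]]
    print(run_pipeline(frame, STAGES, {}))
    print(run_pipeline(frame, STAGES, {"noise_filter": False}))
```

Switching a stage off in `enabled` corresponds to the cases listed above, for example feeding raw waveforms to the DNN or skipping active frame selection when inference is fast enough.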



SLIDE 13

Concern 1: Speed of inference: How fast is fast?

Standalone study performed on GPU, by Simone Rossi (Columbia Comp. Sci.)
VGG16 network used in n-nbar analysis study
Collection plane only; after deconvolution; ROI images preselected and down-sampled to fit a 600x600 image

Timing calculated on GPU server w/
– 1 GPU: NVIDIA GeForce GTX 1080 with 8 GB
– Intel Core i7-6700 CPU (4 cores, 8 threads) @ 3.4 GHz (up to 4.0 GHz)
– 64 GB of RAM
– Ubuntu 16.04.2 LTS
Inference speed on GPU:
Average: 18.4 ms per ROI image
Min: 17.6 ms
Max: 71.38 ms
Still 1 order of magnitude too slow for frame-by-frame processing; requires “active frame” pre-selection, or further optimization.

Frame-by-frame processing is a critical requirement for real-time SN selection.

In parallel, CPU vs FPGA speed comparisons (ongoing):
Preliminary: FPGA offers speed-up by x4 in matrix multiplication
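The “order of magnitude” gap can be made explicit with a back-of-the-envelope calculation. The sketch below assumes that keeping up means classifying one image per 2.25 ms drift window per processing stream; that reading, the `images_per_frame` parameter, and the function names are assumptions for illustration, not statements from the study.

```python
import math

DRIFT_MS = 2.25          # one drift window, from the data-rate slides
T_INFER_AVG_MS = 18.4    # measured average GPU inference time per ROI image
T_INFER_MAX_MS = 71.38   # measured maximum


def slowdown_factor(t_infer_ms, frame_ms=DRIFT_MS):
    """How many drift windows elapse during one inference."""
    return t_infer_ms / frame_ms


def parallel_units_needed(t_infer_ms, images_per_frame=1, frame_ms=DRIFT_MS):
    """Inference engines needed to sustain images_per_frame images every drift window."""
    return math.ceil(images_per_frame * t_infer_ms / frame_ms)


if __name__ == "__main__":
    print(f"average slowdown : x{slowdown_factor(T_INFER_AVG_MS):.1f}")
    print(f"engines (average): {parallel_units_needed(T_INFER_AVG_MS)}")
    print(f"engines (maximum): {parallel_units_needed(T_INFER_MAX_MS)}")
```

Under these assumptions the average inference time is about 8 drift windows, so roughly 9 engines per stream (or an equivalent speed-up) would be needed to process every frame, and more if the worst-case latency has to be covered.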


SLIDE 14

Concerns 2-4: Letting the data itself inform image selection

CNN studies so far seem extremely sensitive to (1) modeling/simulation deficiencies, and possibly (2) noise characteristics. How do we mitigate this?

Furthermore, and this is regardless of architecture and data selection strategy, one of our biggest challenges will be how to adapt to changing detector conditions.
We need a system which is capable of “learning” from the data itself, *almost* in real time, and of adapting to new conditions; what latency is acceptable is TBD.

With that in mind, next steps: identifying the specific architecture of this block, and how it can be implemented in practice.
Hardware and cost considerations will follow this step.

[Diagram: the DNN block, containing “ROI image classif. (A/B/C/D/...); per plane, or all 3 planes” and “Full frame classif. (SN); per plane, or all 3 planes”]


SLIDE 15

Summary

A specialized, and possibly mixed FPGA/GPU, architecture for online (real-time) DNN application offers certain advantages:
1. Efficient and fast ROI/frame processing and selection
2. Potential for event classification/ID on-the-fly
3. Potential for semi-online (re)training and “auto-tuning” (to deal with inherent data/MC differences and changing detector conditions)
4. Modular and scalable design, with little sensitivity to “event boundary” effects (based on studies so far; see backup)

A first stab at a flexible design with redundancies and built-in knobs to mitigate noise risks has been presented. It could be extremely advantageous to adopt such a design for the DUNE FD. Even if such a design may not be realistic for implementation on Day 1, we should consider eventually moving to this solution.

Will be continuing R&D for such an implementation
Feedback and collaboration are welcome!


SLIDE 16

Backup slides

SLIDE 17

ZS legacy Noise, and Noise+Radiologicals


SLIDE 18

Insensitivity to APA boundary

It seems possible to train CNNs irrespective of event APA containment.
