Ideas for Real-Time Analysis for HL-LHC using the CMS DAQ System
Remigius K Mommsen, Fermilab
Disclaimer
The idea of the L1 scouting originates from Emilio Meschi (CERN). This talk is based to a large extent on material presented by Hannes Sakulin (CERN) at CHEP 2019, Adelaide, Australia. Any mistakes or misinterpretations are mine.
[CMS Phase-2 detector and trigger/DAQ overview]
Detector: Tracker (low material budget), Barrel and Endcap Calorimeters, Muon Systems, MIP Timing Detector
Data rate: 300 TB/s @ 40 MHz; L1 rate: 750 kHz; HLT rate: ~7.5 kHz
Readout chain: Digitizers → Front-end pipelines → LV1 (μs) → Event-builder nodes → HLT (sec) → Storage (P5 / Tier 0)
Phase 0 & 1 (2008-24): 40 MHz collision rate, 100 kHz L1 rate, 2 kHz HLT rate, 1.5 MB event size, 0.15 TB/s throughput
Phase 2 (2027-): 40 MHz collision rate, 750 kHz L1 rate, 7.5 kHz HLT rate, 7.5 MB event size, 5.5 TB/s throughput
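As a rough cross-check, these throughputs are just the trigger rate times the event size: 100 kHz × 1.5 MB = 0.15 TB/s for Phase 0 & 1, and 750 kHz × 7.5 MB ≈ 5.6 TB/s for Phase 2, consistent with the quoted 5.5 TB/s. At the full 40 MHz bunch-crossing rate, 40 MHz × 7.5 MB = 300 TB/s, the figure shown on the detector overview.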
12 µs latency
High-resolution objects from firmware reconstruction
Topological algorithms including invariant/transverse mass cuts
Machine learning algorithms
Inter-BX algorithms (limited to ±3 BX)
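To make the last item concrete, here is a minimal, purely illustrative sketch of an inter-BX coincidence search with a ±3 BX window. This is a software rendering of the idea only, not the L1 firmware; the TriggerObject type and its fields are hypothetical.

# Illustrative sketch of an inter-BX algorithm: look for coincidences between
# trigger objects whose bunch crossings differ by at most 3 BX. The real
# algorithms run in FPGA firmware within the L1 latency budget; the
# TriggerObject type and its fields are hypothetical.
from collections import namedtuple

TriggerObject = namedtuple("TriggerObject", ["bx", "pt", "eta", "phi"])

def inter_bx_pairs(objects, max_dbx=3):
    """Yield pairs of objects from different bunch crossings within max_dbx of each other."""
    by_bx = {}
    for obj in objects:
        by_bx.setdefault(obj.bx, []).append(obj)
    for bx, objs in by_bx.items():
        for dbx in range(1, max_dbx + 1):  # look forward only, to avoid double counting
            for other in by_bx.get(bx + dbx, []):
                for obj in objs:
                    yield obj, other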
Analyze events while the data is being taken
Store summary results for certain topologies at higher rate
LHCb will do most of its analysis in "real-time"
Data scouting at HLT has been used successfully in CMS since 2011; only the online (HLT) reconstruction is performed for these events
[DAQ chain diagram: Detectors → Digitizers → Front-end pipelines → LV1 (μs) → Readout buffers → Switching networks → Processor farms / HLT (sec)]
Tiny events at a higher rate
Acquire L1 trigger data at the full bunch-crossing rate
Processing must keep up with the rate; analyze certain topologies at full rate
Planned for HL-LHC
40 MHz Level-1 Trigger Scouting System added to the DAQ chain
Physics use case
(Available cuts give low efficiency at the attributed rate budget)
Several physics channels identified where L1 scouting could potentially make a difference
Scouting provides invaluable diagnostic and monitoring opportunities as well
Per-bunch luminosity measurement using physics channels with high statistics
Anomaly detection with deep-learning algorithms
[Proposed HL-LHC L1 scouting architecture]
Trigger links: 8x 25 Gb/s from the trigger; same protocol as in the trigger, no back-pressure
Input board(s): 1 or 2 Kintex Ultrascale+ boards per I/O node; processing using ML on the Kintex Ultrascale+ (multi-BX possible); DMA over PCIe Gen4
I/O node: 200 Gbps NIC, software ZS, CPU / GPU / other accelerators, short-term storage of ~2 min (1-3 TB; RAM? NVRAM?); features or full events shipped via distributed processing (MPI?)
HPC interconnect(s): InfiniBand HDR, 200 GbE
Back end: distributed (global) stream processing; feature DB, medium term (key-value store?); attached storage, long term; query-based analysis
Expect Xilinx Kintex Ultrascale+ based HW board to be commercially available
Trigger data captured directly from the Level-1 using spare outputs of the processing boards
Input hardware: PCIe boards with (modest) FPGA in a 1U PC (I/O node); the uGMT scouting prototype uses a KCU1500 (limited to 16 Gbps)
I/O nodes (CPU, GPU, other accelerators) use distributed algorithms to extract features while data are buffered in memory
Distributed global stream processing and storage into “feature DB”
Analysis by query, analysis results to permanent storage
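As a purely illustrative sketch of what a record in such a "feature DB" might look like and how a query-based analysis could use it: the (orbit, BX) key layout, the field names, and the use of a plain Python dict in place of a distributed key-value store are all assumptions, not the actual design.

# Illustrative only: one possible layout for per-bunch-crossing feature records,
# keyed by (orbit, bx). A real system would use a distributed key-value store;
# a Python dict stands in for it here. All names are hypothetical.
from dataclasses import dataclass

@dataclass
class BxFeatures:
    orbit: int
    bx: int
    n_muons: int = 0
    max_muon_pt: float = 0.0
    sum_et: float = 0.0

feature_db = {}  # (orbit, bx) -> BxFeatures

def store(f: BxFeatures) -> None:
    feature_db[(f.orbit, f.bx)] = f

def query(predicate):
    """Query-based analysis: return all stored records matching a predicate."""
    return [f for f in feature_db.values() if predicate(f)]

# Example query: bunch crossings with at least two muons, leading muon above 10 (hardware units)
dimuon_bxs = query(lambda f: f.n_muons >= 2 and f.max_muon_pt > 10)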
[Staged deployment of the L1 scouting system]
Data sources: Trigger Primitives, Tracker, Muon, Calo, Global Decision → I/O nodes
HPC interconnect(s): InfiniBand HDR, 200 GbE
Back end: distributed (global) stream processing; feature DB, medium term; attached storage, long term; query-based analysis
Stage 1: 9 nodes @ 200 Gbps
Stage 2: add 28 nodes @ 200 Gbps
Stage 3: add 98 nodes @ 200 Gbps
Stage 4: add hundreds of nodes @ 200 Gbps
When: Oct / Nov 2018
Types of runs: capture @ 40 MHz; muon candidates from the barrel region
40 MHz Scouting Prototype System
[Run-2 40 MHz scouting prototype dataflow]
Input: 8x 10 Gb/s from the GMT (2x QSFP = 8x 10 Gbps, ~8 GB/s of payload) into a KCU1500 board (Xilinx Kintex Ultrascale 115, PCIe Gen3 x8, 2x); firmware ZS (1/20); DMA over PCIe Gen3, max 800 MB/s
First Dell R720: software ZS (1/8), max 100 MB/s, RAM-disk mount, 10 Gbps NIC
10/40 Gbps switch
Second Dell R720: BZIP compression (1/2), max 50 MB/s, 8 TB RAM-disk RAID, 40 Gbps NIC; InfiniBand NIC to Lustre
Controller PC: firmware update & monitoring
1.1 TB per 24-hour beam day after compression in pp @ 2E34
Data collected in the last week of pp running (x8 compression), and for the entire HI run
About one trillion non-empty BXs collected
GMT muon data format (two 32-bit words):
Word 1: [31:23] eta extrapolated, [22:19] quality, [18:10] transverse momentum, [9:0] phi extrapolated
Word 2: [31:30] reserved, [29:21] eta, [20:11] phi, [10:4] index bits, [3] charge valid, [2] charge, [1:0] iso
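A minimal sketch of unpacking these two 32-bit words in software, following the bit layout above. Only raw hardware values are extracted; the conversion to physical eta, phi and pT (which depends on the GMT scale definitions) is deliberately left out.

# Unpack the two 32-bit GMT muon words according to the bit layout above.
# Returns raw hardware values; converting them to physical units would require
# the GMT LSB/scale definitions, which are not reproduced here.
def bits(word: int, hi: int, lo: int) -> int:
    """Extract the inclusive bit field word[hi:lo]."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def unpack_gmt_muon(word1: int, word2: int) -> dict:
    return {
        "eta_extrapolated": bits(word1, 31, 23),
        "quality":          bits(word1, 22, 19),
        "pt":               bits(word1, 18, 10),
        "phi_extrapolated": bits(word1, 9, 0),
        "eta":              bits(word2, 29, 21),
        "phi":              bits(word2, 20, 11),
        "index":            bits(word2, 10, 4),
        "charge_valid":     bits(word2, 3, 3),
        "charge":           bits(word2, 2, 2),
        "iso":              bits(word2, 1, 0),
    }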
Emittance scan = method to determine beam overlap
Beams moved in x (or y) w.r.t. each other
Measure interaction rate by counting muons from the GMT 40 MHz scouting analysis
Results from GMT scouting compatible with other luminometers
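The core of this measurement is simply counting muon candidates from the scouting stream in fixed time slices while the beams are scanned. A minimal sketch, assuming an input iterable of (timestamp in seconds, number of muons in that bunch crossing) records and 0.4 s time bins as suggested by the plot below; the record format is hypothetical.

# Bin the muon rate from the 40 MHz scouting stream in fixed time slices.
# Input: iterable of (timestamp_seconds, n_muons_in_bx) records (hypothetical format).
from collections import Counter

def muon_rate_vs_time(records, bin_width=0.4):
    counts = Counter()
    for timestamp, n_muons in records:
        counts[int(timestamp / bin_width)] += n_muons
    # return (bin centre, muon count) pairs, sorted in time
    return [((b + 0.5) * bin_width, n) for b, n in sorted(counts.items())]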
[Plot: number of muons vs. time during the Fill 7333 "late" emittance scan (0.4 s bins); Tim Brueckler et al. (BRIL)]
Using Kafka Java producers from a cloud node
Topic with 80 partitions on 3 brokers
Stream processing being optimized
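For illustration of this setup only: the prototype used Kafka Java producers, but the sketch below uses the Python confluent-kafka client instead; the broker addresses, topic name, payload format and replication factor are made up.

# Illustrative only (the talk used Java producers): create a topic with 80
# partitions and publish scouting records to it with the Python confluent-kafka
# client. Broker addresses, topic name and payload format are hypothetical.
from confluent_kafka import Producer
from confluent_kafka.admin import AdminClient, NewTopic

brokers = "broker1:9092,broker2:9092,broker3:9092"  # the 3 brokers (hypothetical addresses)

admin = AdminClient({"bootstrap.servers": brokers})
futures = admin.create_topics([NewTopic("gmt-scouting", num_partitions=80, replication_factor=1)])
futures["gmt-scouting"].result()  # wait until the topic exists

producer = Producer({"bootstrap.servers": brokers, "linger.ms": 5})

def publish(orbit: int, payload: bytes) -> None:
    # key by orbit number so records from the same orbit land in the same partition
    producer.produce("gmt-scouting", key=str(orbit).encode(), value=payload)
    producer.poll(0)  # serve delivery callbacks

producer.flush()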
When: starting 2021
Capture @ 40 MHz (displaced muons)
40 MHz Scouting Prototype System
CMS will continue to use the scouting technique in Run 3
Tiny events at a higher rate
Plenty of disk space on local HLT machine
Large buffer space in the event builder would allow delaying the HLT selection
DAQDB: a scalable key-value store for DAQ buffering
Persistent memory for performance and affordable capacity
NVMe storage to extend the capacity
Adaptive radix tree (ARTree) for efficient range queries
Key lookup, range queries, and next-event retrieval
Complete dataflow simulation
Grzegorz Jereczek on behalf of the DAQDB team, CHEP 2019, Adelaide, Australia
CMS is planning a 40 MHz L1 trigger scouting system for HL-LHC
Prototyped Global Muon Trigger scouting in Run 2
HLT scouting technique will be expanded for Run 3
R&D ongoing on various fronts