The Baseline DataFlow System of the ATLAS Trigger & DAQ



  1. The Baseline DataFlow system of the ATLAS Trigger & DAQ. Jos Vermeulen, NIKHEF, on behalf of the ATLAS Trigger/DAQ DataFlow group

  3. The ATLAS detector
     • 7 TeV protons
     • Calorimeters: all channels read out; very large energy depositions (> 32 GeV, rare) add some data
     • Muon detector: zero-suppressed read-out, occupancy determined by background, ~ 2% for precision chambers for current design luminosity estimate
     • Si Pixels, Si strips (SCT), Transition Radiation Tracker (TRT): zero-suppressed read-out; occupancy estimate for design luminosity: Pixels << 1%, SCT < ~ 1%, TRT up to about 40%

  4. TDAQ Documentation History
     • December 1994: ATLAS Technical Proposal
     • June 1998: Level-1 Trigger TDR; Trigger Performance Status Report; DAQ, EF, LVL2 and DCS Technical Progress Report
     • March 2000: HLT, DAQ and DCS Technical Proposal
     • June 2003: High-Level Trigger, Data Acquisition and Controls Technical Design Report

  5. ATLAS TDAQ (block diagram)
     • First-level trigger; detectors
     • Read-Out Drivers *): multiplex data from detectors onto Read-Out Links (ROLs)
     • Read-Out Links *)
     • Region of Interest Builder *)
     • Read-Out Sub-system; Read-Out Buffers *)
     • Second-level trigger supervisors
     • Data flow Manager
     • Sub-Farm Input
     • Farms with second-level trigger processors (L2PUs)
     • Farms with Event Filter processors
     • Sub-Farm Output
     • Boundary of TDAQ system
     The DataFlow subsystem takes care of data movement in the TDAQ system.
     *) special-purpose hardware

  6. Event dataflow into the ROSs and passing of RoI information
     A LVL1 accept causes the front-end buffers to send event data to the RODs, which assemble event fragments and pass these via 1600 ROLs (S-LINK, optical fibers, each 160 MByte/s bandwidth) to, in the baseline design, 144 ROSs. RoI information is passed to the RoI Builder; its output is sent to one of the L2SVs.
     • NB1: LVL1 is not part of the TDAQ DataFlow subsystem
     • NB2: LVL1 trigger uses data from the calorimeters and dedicated muon trigger detectors
     • LVL1 accept rate: max. 75 kHz, upgradable to 100 kHz, nominally 25 - 40 kHz
     • Average fragment size per ROL < 1.6 kByte
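
The read-out figures on this slide can be cross-checked with a short calculation. The sketch below is not part of the talk: it assumes 1 kByte = 1000 bytes, treats the quoted "< 1.6 kByte" average fragment size as an upper bound, and uses illustrative function and constant names.

```python
# Back-of-the-envelope check of the read-out figures quoted on the slide.
# Assumptions (not from the talk): 1 kByte = 1000 bytes, and the quoted
# "< 1.6 kByte" average fragment size is used as an upper bound.

N_ROLS = 1600             # Read-Out Links
N_ROSS = 144              # ROS PCs in the baseline design
FRAGMENT_BYTES = 1.6e3    # upper bound on the average fragment size per ROL
ROL_BANDWIDTH = 160e6     # bytes/s per S-LINK


def rol_load(lvl1_rate_hz):
    """Upper bound on the data rate on one ROL, in bytes/s."""
    return FRAGMENT_BYTES * lvl1_rate_hz


def total_input(lvl1_rate_hz):
    """Upper bound on the aggregate data rate into all ROSs, in bytes/s."""
    return N_ROLS * rol_load(lvl1_rate_hz)


for rate in (75e3, 100e3):
    print(f"LVL1 {rate / 1e3:.0f} kHz: "
          f"{rol_load(rate) / 1e6:.0f} MByte/s per ROL, "
          f"{total_input(rate) / 1e9:.0f} GByte/s in total")

print(f"ROLs per ROS on average: {N_ROLS / N_ROSS:.1f}")
```

Under these assumptions a 1.6 kByte fragment at 100 kHz corresponds to exactly the 160 MByte/s link bandwidth, which is consistent with the slide quoting the fragment size as an upper limit.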

  7. RODs and ROSs (diagram)
     • RODs: 90 9U crates (~40 racks), VME bus; each crate holds RODs and a Crate Processor (RCP, 6U, Linux) for config & control, event sampling & calibration data
     • ROLs: data over 1600 links (HOLA S-LINK, 160 MByte/s per link)
     • ROSs: 144 4U PCs (~15 racks), Linux, each with ROBINs on the PCI bus and a NIC
     • Gigabit Ethernet links to the LVL2 & Event Builder networks; alternative data paths are indicated
     • Locations shown: USA15 (underground) and SDX15 (at surface)
     • See M. Müller, "A RobIn Prototype for a PCI-Bus based Atlas Readout-System" (next talk)

  8. Networks, EB and HLT (LVL2 and EF) (diagram)
     • 1600 ROLs into the ROSs
     • Network protocol within this domain: raw or UDP
     • Other diagram labels: TCP/IP; from RoI builder; small switch; Linux (on the nodes)
     • ~ 500 dual-CPU 8 GHz processors
     • When individual ROBINs are connected to the network, additional "concentrating" switches between the central switches and the ROBINs may be used.

  9. Software on Linux nodes: C++, use of POSIX threads and of the Standard Template Library. For detailed information see S. Gadomski, CHEP2003, "Experience with multi-threaded C++ applications in the ATLAS DataFlow", ATL-DAQ-2003-007, http://cdsweb.cern.ch/search.py?recid=621381

  10. RoI requests
      After a LVL1 accept the L2SV sends the RoI information to a L2PU. The RoI information indicates which data has to be requested from the ROSs as input for the LVL2 selection. On average: 1.6 RoI per event.
      An L2PU will request data corresponding to a RoI in steps, e.g. for an electron/gamma RoI first data from the em calorimeter, next from the hadron calorimeter and then from the inner detector. Only a fraction of the events is accepted in each step: in the example 19% after the first step, 11% of the original number after the second step.
      Messages:
      • L2SV -> L2PU, RoI information (1 message)
      • L2PU -> ROSs, RoI requests (several messages)
      • Event fragments (several messages)
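
The effect of requesting data in steps can be made concrete with the acceptance fractions from the example. The sketch below is an illustration, not part of the talk; it assumes the electron/gamma sequence has exactly the three steps named on the slide, and it counts request steps rather than data volume.

```python
# Expected number of request steps per electron/gamma RoI, using the
# acceptance fractions quoted on the slide. Each entry holds the fraction
# of the ORIGINAL events that reaches the given step.
# Assumption: the sequence has exactly the three steps named on the slide.

STEPS = [
    ("em calorimeter", 1.00),      # every RoI starts here
    ("hadron calorimeter", 0.19),  # 19% survive the first step
    ("inner detector", 0.11),      # 11% of the original survive the second step
]


def expected_steps(steps):
    """Average number of request steps executed per RoI."""
    return sum(fraction for _, fraction in steps)


def saved_fraction(steps):
    """Fraction of step executions avoided compared with always running all steps."""
    return 1.0 - expected_steps(steps) / len(steps)


print(f"steps per RoI: {expected_steps(STEPS):.2f}")
print(f"saved versus always running all steps: {saved_fraction(STEPS):.0%}")
```

With these fractions an RoI needs about 1.3 request steps on average instead of 3, which is why sequential requests keep the load on the ROSs low.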

  11. RoI request rates are estimated with the "paper model"
      "Paper" -> "back-of-the-envelope" calculations. In practice: C++ program (formerly spreadsheet).
      Basic assumption: RoI rate does not depend on the η and φ of the centre of the RoI, only on the area in η - φ space associated with the RoI.
      The RoI rates are obtained with a straightforward calculation using:
      • the LVL1 accept rate,
      • exclusive rates for the various LVL1 trigger menu items,
      • the number of RoIs associated with each trigger item,
      • the η - φ area associated with each possible RoI location.
      The request rates are then obtained using:
      • information on the mapping of the ROLs onto the detector,
      • the acceptance factors of the various LVL2 trigger steps,
      • the η - φ areas from which data is requested (RoI and detector dependent).
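
The structure of such a calculation can be sketched in a few lines. The code below is only an illustration of the stated assumption (rate proportional to area); it is not the ATLAS paper model program. All numbers, the `MenuItem` type, the function names and the total area are hypothetical placeholders.

```python
# Minimal sketch of a "paper model" style estimate. ALL numbers below are
# HYPOTHETICAL placeholders chosen for illustration; they are not the ATLAS
# trigger menu, ROL mapping or acceptance factors.
from dataclasses import dataclass
import math


@dataclass
class MenuItem:
    name: str
    exclusive_rate_hz: float  # exclusive LVL1 rate of this menu item
    rois_per_event: int       # number of RoIs associated with the item


# Total eta-phi area over which RoI centres are assumed to be uniformly
# distributed (hypothetical: |eta| < 2.5, full azimuth).
TOTAL_AREA = 5.0 * 2.0 * math.pi


def roi_rate(menu):
    """Total RoI rate in Hz, summed over the menu items."""
    return sum(m.exclusive_rate_hz * m.rois_per_event for m in menu)


def ros_request_rate(menu, touch_area, acceptance):
    """Request rate seen by one ROS, in Hz.

    touch_area: eta-phi area of RoI centres whose request region overlaps
                the detector region read out by this ROS.
    acceptance: fraction of RoIs that reach the LVL2 step using this ROS.
    """
    return roi_rate(menu) * (touch_area / TOTAL_AREA) * acceptance


menu = [
    MenuItem("single_em", 12_000.0, 1),  # hypothetical item
    MenuItem("double_em", 4_000.0, 2),   # hypothetical item
]
print(f"RoI rate: {roi_rate(menu):.0f} Hz")
print(f"ROS request rate: {ros_request_rate(menu, 1.5, 0.19):.0f} Hz")
```

Because the rate depends only on area, the per-ROS result follows from the ROL mapping (which sets `touch_area`) and the step acceptance alone; no position-dependent rate map is needed.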

  12. Paper model result (luminosity: 2·10^33). Histogram: number of ROS units versus RoI request frequency per ROS unit (Hz, axis range 0 to 20000); LVL1 accept rate: 100 kHz. A 20 kHz label appears on the plot.

  13. Paper model result (luminosity: 2·10^33). Histogram: number of ROS units versus output data rate per ROS unit for LVL2 (MByte/s, axis range 0 to 40); LVL1 accept rate: 100 kHz. A 40 MByte/s label appears on the plot.

  14. LVL2 output
      After production of a decision by a LVL2 processor, the decision is communicated to the L2SV which sent the RoI request. For accepted events, data produced by the trigger algorithms are also passed to the pROS ("pseudo-ROS", collects LVL2 results).
      Messages:
      • L2PU -> L2SV, LVL2 decision (1 message, LVL1 accept rate)
      • L2PU -> pROS, LVL2 results (1 message)
      • L2SV -> DFM (1 message)
      LVL2 accept rate = 3.0 - 3.5 kHz for 100 kHz LVL1 accept rate

  15. Event building
      For each event accepted by LVL2 the DFM sends a build request to an SFI. This in turn sends requests for data to the ROSs (including the pROS). The ROSs return the fragments (identified by the LVL1 id) requested.
      Messages:
      • DFM -> SFI, Build request (1 message)
      • SFI -> ROSs, Data request (1 message per ROS)
      • ROSs -> SFI, Event fragment (1 message)
      Event building rate = 3.0 - 3.5 kHz for 100 kHz LVL1 accept rate. Event size ~ 1.5 MByte
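
The event-building load implied by these numbers can be worked out directly. This sketch is not from the talk. It assumes 1 MByte = 1e6 bytes, that all 144 ROSs of the baseline design plus the pROS answer every build request, and that the load is shared evenly by about 90 SFIs (the SFI count quoted on the Event Filter slide).

```python
# Event-building load from the figures on the slides. Assumptions (not
# stated on this slide): 1 MByte = 1e6 bytes; 144 ROSs plus one pROS answer
# each build request; about 90 SFIs share the load evenly.

EVENT_SIZE = 1.5e6     # bytes
N_SOURCES = 144 + 1    # ROSs plus the pROS
N_SFIS = 90


def building_bandwidth(rate_hz):
    """Aggregate event-building bandwidth in bytes/s."""
    return rate_hz * EVENT_SIZE


def per_sfi_bandwidth(rate_hz):
    """Average input bandwidth of one SFI in bytes/s."""
    return building_bandwidth(rate_hz) / N_SFIS


def messages_per_event():
    """Build request + data requests + fragment replies for one event."""
    return 1 + N_SOURCES + N_SOURCES


for rate in (3.0e3, 3.5e3):
    print(f"{rate / 1e3:.1f} kHz: "
          f"{building_bandwidth(rate) / 1e9:.2f} GByte/s in total, "
          f"{per_sfi_bandwidth(rate) / 1e6:.0f} MByte/s per SFI")
print(f"messages per built event: {messages_per_event()}")
```

Under these assumptions each SFI receives roughly 50 to 58 MByte/s, below the 125 MByte/s raw capacity of a single Gigabit Ethernet link.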

  16. Event clearing
      After completion of event building an EoE (End of Event) message is sent by the SFI to the DFM. The DFM stores these and LVL2 reject messages until ~ 300 of these have been received. Event clear commands for the LVL1 ids associated with the EoE and LVL2 reject messages are then sent to the ROSs, with ~ 300 of these commands in a single message. These messages are multi-cast, and the rate is the LVL1 accept rate divided by the blocking factor (330 Hz for a LVL1 accept rate of 100 kHz).
      Messages:
      • SFI -> DFM, EoE: "End of Event" (1 message)
      • DFM -> ROSs, Event clear (1 message for ~300 events), multi-cast to ROS
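
The grouping of clear commands can be sketched as a small batching class. This is an illustration of the mechanism described on the slide, not the DFM code: the class name, the callback and the fixed block size of 300 are assumptions, and the real DFM is a C++ application with its own interfaces.

```python
# Minimal sketch of grouping event-clear commands, in the spirit of the DFM
# behaviour described on the slide. Class and callback names are
# hypothetical; the block size of 300 stands in for the "~ 300" on the slide.

class ClearBatcher:
    def __init__(self, send_multicast, block_size=300):
        self.send_multicast = send_multicast  # callable taking a list of LVL1 ids
        self.block_size = block_size
        self.pending = []

    def on_event_done(self, lvl1_id):
        """Call for every EoE or LVL2 reject message received."""
        self.pending.append(lvl1_id)
        if len(self.pending) >= self.block_size:
            self.flush()

    def flush(self):
        """Send one clear message covering all pending LVL1 ids."""
        if self.pending:
            self.send_multicast(self.pending)
            self.pending = []  # start a new list; the sent one is left intact


sent = []                                # stands in for the multicast network
batcher = ClearBatcher(sent.append)
for lvl1_id in range(1000):
    batcher.on_event_done(lvl1_id)

print(len(sent), "clear messages sent,", len(batcher.pending), "ids pending")
print(f"clear message rate at 100 kHz: {100e3 / 300:.0f} Hz")
```

With a block size of exactly 300 the message rate at 100 kHz is about 333 Hz, in line with the roughly 330 Hz quoted on the slide.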

  17. Event Filter and mass storage
      • Rate of accepted events ≈ 200 Hz
      • ~ 1600 EF nodes (dual 8 GHz CPUs)
      • ~ 90 SFIs and 30 SFOs
      After building the event it is delivered to one of the Event Filter processors (on request by these processors). A further decision is taken on acceptance or rejection. The data of accepted events are passed to the SFOs, where the events are buffered and passed to central mass storage in the CERN computer centre.
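
These figures imply a processing-time budget and a storage bandwidth that are easy to estimate. The sketch below is not from the talk. It assumes the Event Filter input rate equals the event-building rate (3.0 - 3.5 kHz), that each CPU handles one event at a time, that accepted events keep the ~ 1.5 MByte size, and that 1 MByte = 1e6 bytes.

```python
# Event Filter budget from the slide figures. Assumptions (not from the
# talk): EF input rate = event-building rate, one event per CPU at a time,
# accepted events keep the ~1.5 MByte size, 1 MByte = 1e6 bytes.

N_EF_NODES = 1600
CPUS_PER_NODE = 2
EVENT_SIZE = 1.5e6       # bytes
ACCEPT_RATE_HZ = 200.0
N_SFOS = 30


def time_budget_per_event(input_rate_hz):
    """Average processing time available per event, in seconds."""
    return N_EF_NODES * CPUS_PER_NODE / input_rate_hz


def storage_bandwidth():
    """Data rate to mass storage, in bytes/s."""
    return ACCEPT_RATE_HZ * EVENT_SIZE


def rejection_factor(input_rate_hz):
    """Ratio of EF input rate to accepted rate."""
    return input_rate_hz / ACCEPT_RATE_HZ


for rate in (3.0e3, 3.5e3):
    print(f"{rate / 1e3:.1f} kHz input: "
          f"{time_budget_per_event(rate):.2f} s per event, "
          f"rejection factor {rejection_factor(rate):.1f}")
print(f"to mass storage: {storage_bandwidth() / 1e6:.0f} MByte/s, "
      f"{storage_bandwidth() / N_SFOS / 1e6:.0f} MByte/s per SFO")
```

Under these assumptions the Event Filter has on the order of one second of CPU time per event.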

  18. Modelling of queue formation in switches (diagram)
      • Bus-based ROS unit with ROBINs connecting to 12 ROLs
      • L2SV; DFM; 4 processors (in model, in real system 1 DFM)
      • Central LVL2 switch and central EB switch; points I and II are marked in the diagram
      • 100 LVL2 subfarm switches, 5 L2PUs per switch; L2PUs in model: PCs with two 8 GHz CPUs
      • 64 SFIs; SFIs in model: PCs with one 8 GHz CPU
      Queues tend to form at I and II. Can be controlled:
      • by limiting number of events assigned simultaneously to each L2PU/SFI
      • with assignment pattern of events to L2PUs/SFIs
      • by limiting number of outstanding requests per L2PU/SFI

  19. Results for point I
      Obtained with discrete event simulation 1), assuming use of raw Ethernet, with paper model assumptions for trigger menus, ROL mapping, acceptance factors of the different stages of LVL2 processing, 100 kHz LVL1 accept rate, design luminosity. Switches are assumed to be crossbar switches with buffers on the output ports (no flow control).
      1) using simdaq, a dedicated C++ program
      Distributions shown (axis label: number of frames):
      • round-robin assignment (rr)
      • least-queued assignment: to L2PU handling smallest number of events (lq)
      • least-queued assignment with preference for assignment to L2PU connected to different subfarm switch, at max. 4 events handled simultaneously by the same L2PU (lq4j)
      • same as lq4j, with at max. 4 outstanding requests (lq44j)
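
The difference between the assignment policies can be illustrated with a small sketch. It is not the simdaq implementation: the class names and the `assign` interface are hypothetical, and the preference for a different subfarm switch (the "j" in lq4j) and the limit on outstanding requests are not modelled.

```python
# Sketch of two event-assignment policies named on the slide: round-robin
# (rr) and least-queued (lq), with an optional cap on events per L2PU.
# Illustration only, not the simdaq code; the subfarm-switch preference of
# lq4j and the outstanding-request limit of lq44j are not modelled.
from itertools import count


class RoundRobin:
    def __init__(self, n_l2pus):
        self.n = n_l2pus
        self.counter = count()

    def assign(self, load):
        """Return the next L2PU index in turn, ignoring the current load."""
        return next(self.counter) % self.n


class LeastQueued:
    def __init__(self, n_l2pus, max_events=None):
        self.n = n_l2pus
        self.max_events = max_events  # e.g. 4 for a cap of 4 events per L2PU

    def assign(self, load):
        """Return the index of the least-loaded L2PU, or None if all are at the cap."""
        best = min(range(self.n), key=lambda i: load[i])
        if self.max_events is not None and load[best] >= self.max_events:
            return None
        return best


load = [3, 0, 2, 1]  # events currently handled by each L2PU
rr = RoundRobin(len(load))
lq = LeastQueued(len(load), max_events=4)
print("rr picks:", [rr.assign(load) for _ in range(5)])
print("lq picks:", lq.assign(load))
print("lq at cap:", lq.assign([4, 4, 4, 4]))
```

Round-robin ignores how busy each L2PU is, whereas least-queued steers new events to idle processors; returning `None` at the cap leaves it to the caller to hold the event back until an L2PU frees up.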

  20. LVL2 decision time for same model
      • round-robin assignment (rr)
      • round-robin assignment, at max. 4 events handled simultaneously by the same L2PU (rr4)
      • least-queued assignment: to L2PU handling smallest number of events: better load balancing (lq)
      Peaks in distribution due to steps in LVL2 processing.
