PENNACCHIO Elisabetta, PUGNÈRE Denis
CNRS / IN2P3 / IPNL
EOS as an online DAQ buffer for the ProtoDUNE Dual Phase experiment
EOS workshop, CERN, 4-5/02/2019

Outline
– The DUNE experiment
– The ProtoDUNE experiments at CERN
– The ProtoDUNE …
– CMS & ALICE T3 (GOCDB: IN2P3-IPNL)
– 1PB (DPM, XrootD), +1800 cores (CREAM-CE)
NP02/WA105 (dual phase) and NP04 (single phase)
[Photo] EHN1 hall: single phase detector, dual phase detector, single phase control rooms, dual phase control rooms, CPU farm (12 racks), single phase cryogenic system, DAQ room (6 racks for SP & DP)
µTCA crate
Charge: 12 µTCA crates, 10 AMC cards / crate, 64 channels / card => 7680 channels (12 charge readout + 1 for light readout) * 10 Gb/s links = 13 * 10 Gb/s uplinks to DAQ
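As a quick cross-check of the readout arithmetic above, a minimal sketch (plain Python, illustration only, not DAQ code):

# Cross-check of the charge readout numbers quoted above (illustration only).
crates = 12          # charge readout uTCA crates
amc_per_crate = 10   # AMC cards per crate
ch_per_amc = 64      # channels per AMC card

channels = crates * amc_per_crate * ch_per_amc   # 12 * 10 * 64 = 7680 channels
uplinks = crates + 1                             # 12 charge crates + 1 light readout crate
print(channels, "channels,", uplinks, "x 10 Gb/s uplinks to the DAQ")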
Double phase liquid argon TPC, 6x6x6 m³ active volume:
– Cathode, drift field E = 0.5 kV/cm, segmented anode in the gas phase with double-phase amplification above the LAr volume
– Prompt UV light from dE/dx ionization detected by photomultipliers
– X and Y charge collection strips: 3.125 mm pitch, 3 m long, 7680 readout channels
– Drift coordinate: 6 m = 4 ms drift; sampling 2.5 MHz (400 ns), 12 bits, 10000 samples per drift window
– Event size: drift window of 7680 channels x 10000 samples = 146.8 MB
– ProtoDUNE single-phase: event size 230 MB, trigger rate 25 Hz
=> data rate 46 Gb/s => 11.25 Gb/s assuming a compression factor of 4
– ProtoDUNE dual-phase: 146.8 MB / event, trigger rate 100 Hz
7680 channels, 10 000 samples, 12 bits (2.5 MHz sampling, 4 ms drift window)
=> data rate 120 Gb/s => ~12 Gb/s assuming a compression factor of 10 (see the sketch after this list)
– ~1 PB (needed to buffer 1 to 3 days of raw data taking)
– Built to store files at a 130 Gb/s data rate
– Huffman coding lossless data compression: estimated compression factor of 10
– Goal: online DQM, reconstruction
– Data copied to the CERN T0 via a dedicated 40 Gb/s link
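The sketch referenced in the list above: a back-of-the-envelope check of the raw and compressed data rates (plain Python; the event sizes, trigger rates and compression factors are the ones quoted above, nothing else is assumed):

def rate_gbps(event_size_mb, trigger_hz, compression=1.0):
    # event size (MB) * 8e-3 (Gb per MB) * trigger rate (Hz), divided by the compression factor
    return event_size_mb * 8e-3 * trigger_hz / compression

# ProtoDUNE single-phase: 230 MB events at 25 Hz, compression factor ~4
print(rate_gbps(230, 25), rate_gbps(230, 25, compression=4))         # ~46 Gb/s raw, ~11.5 Gb/s compressed
# ProtoDUNE dual-phase: 146.8 MB events at 100 Hz, Huffman compression factor ~10
print(rate_gbps(146.8, 100), rate_gbps(146.8, 100, compression=10))  # ~117 Gb/s raw, ~11.7 Gb/s compressed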
[Diagram] ProtoDUNE dual-phase online DAQ architecture:
– Event builders: 2 L1 EVBs fed by the 6 * 10Gb + 7 * 10Gb detector uplinks, and 8 L2 EVBs, each connected over 40Gb links; 2 * 40Gb towards the NP02 online storage
– NP02 online storage: 20 * 10Gb + 24 * 10Gb links
– Online farm (1k cores): shifters & monitoring
– Online / offline boundary: 40Gb link from the online storage to the CERN central EOS via FTS (XrootD 3rd party copy); from EOS, data goes to tape (CASTOR) and to FNAL
L1 Event Builders:
– Dell R740, 2 * CPU Gold 5122, 384 GB RAM
– 2 * Intel X710 4 * 10Gb/s
– 2 * Mellanox Connect-X3 Pro 2 * 40Gb/s
– Data collection: each L1 EVB handles 1/2 of the detector, corresponding to 1/2 of the event
L2 Event Builders:
– Dell R740, 2 * CPU Gold 5122, 192 GB RAM
– 2 * Mellanox Connect-X3 Pro 2 * 40Gb/s
– Event merging, writing files to the online storage
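To make the two-level event-building scheme concrete, a purely conceptual sketch (hypothetical function and file names, not the actual DAQ software): each L1 EVB ships half of the detector data for a trigger, and an L2 EVB merges the two halves and writes the complete event to the online storage.

import os

def build_event(half_a: bytes, half_b: bytes) -> bytes:
    # In the real system the halves carry headers and are merged by the DAQ back-end;
    # here they are simply concatenated for illustration.
    return half_a + half_b

def write_event(event: bytes, run: int, seq: int, outdir: str = "/eos/np02/raw"):
    # Hypothetical output path on the online storage.
    path = os.path.join(outdir, f"run{run:06d}_evt{seq:08d}.bin")
    with open(path, "wb") as f:
        f.write(event)
    return path

# Usage with dummy payloads standing in for the two detector halves:
event = build_event(b"\x00" * 1024, b"\x01" * 1024)
# write_event(event, run=1234, seq=1)   # enable where the output directory exists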
NP02 online storage servers:
– (very) old Dell R510, 2 * CPU E5620, 32 GB RAM, 12 * 3TB SAS HDD
– Dell MD1200: 12 * 3TB SAS HDD
– 1 * 10Gb/s, 4 * RAID 6 on 6 HDD
– Dell R610, 2 * CPU E5540, 48 GB RAM
Distributed storage systems compared (Lustre / BeeGFS / GlusterFS / GPFS / MooseFS / XtreemFS / XRootD / EOS):
– Versions: v2.7.0-3 / v2015.03.r10 / 3.7.8-4 / v4.2.0-1 / 2.0.88-1 / 1.5.1 / 4.3.0-1 / Citrine 4.0.12
– POSIX: Yes / Yes / Yes / Yes / Yes / Yes / via FUSE / via FUSE
– Open Source: Yes / Client=Yes, Server=EULA / Yes / No / Yes / Yes / Yes / Yes
– Need for MetaData Server?: Yes / Metadata + Manager / No / No / Metadata + Manager / Yes / Yes / Yes
– Support RDMA / Infiniband: Yes / Yes / Yes / Yes / No / No / No / No
– Striping: Yes / Yes / Yes / Yes / No / Yes / No / No
– Failover: M + D (1) / DR (1) / M + D (1) / M + D (1) / M + DR (1) / M + DR (1) / No / M + D (1)
– Quota: Yes / Yes / Yes / Yes / Yes / No / No / Yes
– Snapshots: No / No / Yes / Yes / Yes / Yes / No / No
– Integrated tool to move data over data servers?: Yes / Yes / Yes / Yes / No / Yes / No / Yes
(1) : M=Metadata, D=Data, M+D=Metadata+Data, DR=Data Replication
Each file is divided into « chunks » distributed over all the storage servers. This work is always done by the client CPU (the DAQ back-end).
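A minimal sketch of what this client-side chunking looks like (hypothetical chunk size and server names; real systems add metadata, checksums and parity):

# The client (DAQ back-end) cuts the file into fixed-size chunks and spreads them
# round-robin over the storage servers, so the splitting cost is paid by the client CPU.
CHUNK_SIZE = 4 * 1024 * 1024                      # hypothetical 4 MB chunks
SERVERS = [f"fst{i:02d}" for i in range(1, 10)]   # hypothetical list of 9 storage servers

def stripe(filename):
    """Yield (server, chunk_index, chunk_bytes), distributing chunks round-robin."""
    with open(filename, "rb") as f:
        index = 0
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            yield SERVERS[index % len(SERVERS)], index, chunk
            index += 1

# for server, i, chunk in stripe("event.bin"):
#     send(server, i, chunk)   # the network transfer itself is also driven by the client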
WA105 Technical Board meeting, June 15, 2016 : Results on distributed storage tests https://indico.fnal.gov/event/12347/contribution/3/material/slides/0.pdf
For EOS this is now Yes with the raid6/raiddp layouts
Cisco Nexus 9372TX: 6 ports 40 Gb/s QSFP+ and 48 ports 10 Gb/s
[Diagram] 2016 test platform @ IPNL:
– 9 storage servers (9 * Dell R510, bought Q4 2010), 1 * 10Gb/s each (10.3.3.17 - 10.3.3.25)
– Client: Dell R630, 2 * 10Gb/s + 2 * 40Gb/s (10.3.3.4)
– MDS / Management: 2 * Dell R630, 1 * 10Gb/s each (10.3.3.3, 10.3.3.5)
On this 2016 platform benchmark (@IPNL) :
[Chart] Distributed storage systems performance (8 threads): throughput (MB/s) and client CPU % for Lustre, BeeGFS, GlusterFS, GPFS, MooseFS, XtreemFS, XRootD and EOS, with 10 MB, 1 GB, 10 GB and 20 GB files, for 1, 2, 4 and 8 targets, EC 8+1 (GlusterFS) and 9 targets (GPFS / MooseFS).
Annotations: 67.02 %, 59.48 % and 43.93 % of the average client network bandwidth; sum of synchronous storage elements writing bandwidth.
Results from 1 of the 48 tests (2016)
– Low-latency storage
– Very efficient on the client side
– POSIX, Kerberos, GSI access control
– XRootD and POSIX file access protocols
– 3rd party copy support (needed for FTS)
– Checksums support
– Redundancy:
– Data server lifecycle management (draining, start/stop operation)
Tests on the EOS system with 19 FSTs:
– Assuming we decided to record 3 GB raw data files
– Test = copy (xrdcp) a file from a RAMDISK of each Event builder to the storage (see the sketch below)
– Using different flow patterns: 6, 8, 16, 20 or 32 parallel xrdcp / Event builder
– From 1, 2, 3 or 4 Event builders
– Using different EOS configurations (1 FS / group; combining 2 RAID file-systems into 1 FS / group)
– On different EOS releases: eos-server 4.4.10 & 4.4.18
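The sketch referenced above: one way to drive such a copy test from a single event builder (plain Python wrapper around xrdcp; the source file, EOS endpoint/path and thread count are placeholders, not the actual test scripts):

import subprocess, time

SRC = "/mnt/ramdisk/test_3GB.bin"                     # 3 GB raw-data-like file held on a RAMDISK
DST = "root://np02-eos-mgm.cern.ch//eos/np02/test"    # placeholder EOS endpoint and path
THREADS = 8                                           # 6, 8, 16, 20 or 32 in the tests above

procs = []
start = time.time()
for i in range(THREADS):
    procs.append(subprocess.Popen(["xrdcp", "--force", SRC, f"{DST}/evb1_copy{i}.bin"]))
for p in procs:
    p.wait()
elapsed = time.time() - start
print(f"{THREADS} parallel copies in {elapsed:.1f} s "
      f"-> {THREADS * 3 / elapsed:.2f} GB/s aggregate")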
[Plots] ← 1*EVB | 2*EVB →  ← 3*EVB | 4*EVB →. Configuration: eos-server 4.4.10, 36 groups, 1 FS / group. Something strange with 2, 3 and 4 * EVB.
– No effect of the skipSaturatedPlct parameter: used for client file scheduling?
> eos geosched set skipSaturatedPlct 1
– A group with fewer FS than the other groups is quickly saturated, reducing the whole system efficiency
– Many errors with raid6/raiddp with 4.4.10 (not shown here), not yet re-tested with other releases
– Huge performance improvements from 4.4.10 to 4.4.18
On the group with fewer FS, the number of « wopen » is increasing; FSTs are overloaded
← eos 4.4.10 | eos 4.4.18 →
4*EVB, eos-server 4.4.10 | 4*EVB, eos-server 4.4.18; 6 threads / EVB, 32 threads / EVB
[Photo] ProtoDUNE Dual-Phase online DAQ, DAQ room @ EHN1, Dual Phase racks: storage servers, EVBL1A, EVBL1B, EVBL2A-EVBL2D, 9 DAQ service machines, router and switches.
5 successive tests: 6, 8, 16, 20 and 32 parallel xrdcp from 4*EVB
Inside ProtoDUNE : http://cds.cern.ch/images/CERN-PHOTO-201710-248-2
Many thanks to :