 
PENNACCHIO Elisabetta, PUGNÈRE Denis
CNRS / IN2P3 / IPNL
EOS as an online DAQ buffer for the ProtoDUNE Dual Phase experiment
EOS workshop, CERN, 4-5/02/2019
Outline
● The DUNE experiment
● The ProtoDUNE experiments at CERN
● The ProtoDUNE Dual-Phase DAQ
● Storage systems tested
● Storage system choice : EOS
● Data challenges and some results
● Conclusion
Who am I
● Network and system engineer
● Working at the « Institut de Physique Nucléaire de Lyon »
● Managing a WLCG site :
– CMS & Alice T3 (Gocdb : IN2P3-IPNL)
– 1PB (DPM, XrootD), +1800 cores (CREAM-CE)
● Responsible for the DAQ back-end, the processing and the online data storage of the ProtoDUNE Dual Phase experiment
DUNE
http://www.dunescience.org/
DUNE : Deep Underground Neutrino Experiment (~2025)
● Exploring the phenomenon of neutrino oscillations, search for signs of proton decay
● Neutrino beam + near detector @ Fermilab
● Far detector (4 * 10kT liquid argon TPC) @ Sanford (underground)
● 2 technologies : single-phase / dual-phase (dual-phase => signal amplification)
● 2 ProtoDUNE prototypes built on the surface @ CERN (2018/2019) : NP02/WA105 (dual phase) and NP04 (single phase)
EHN1 (CERN North area), 01 Oct 2018
[Photo of the hall. Labels: dual-phase detector, dual-phase control rooms, single-phase control rooms, CPU farm (12 racks), DAQ room (6 racks for SP & DP), single-phase detector, single-phase cryogenic system]
Dual-phase ProtoDUNE DAQ
● 6x6x6 m³ active volume (300T liquid argon = 1/20 of the 10kTon LBNO)
● Charge readout : 12 µTCA crates, 10 AMC cards / crate, 64 channels / card => 7680 channels
● One 10 Gb/s link per µTCA crate : (12 charge readout + 1 for light readout) * 10 Gb/s links = 13 * 10 Gb/s uplinks to the DAQ
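The channel and uplink counts above are simple products. A minimal sketch that reproduces them, using only the figures quoted on the slide (the constant and function names are illustrative, not taken from the DAQ software):

```python
# Figures quoted on the slide; names are illustrative only.
CHARGE_CRATES = 12          # µTCA crates for charge readout
LIGHT_CRATES = 1            # µTCA crate for light readout
AMC_CARDS_PER_CRATE = 10
CHANNELS_PER_CARD = 64
LINK_GBPS = 10              # one 10 Gb/s uplink per crate


def charge_channels() -> int:
    """Number of charge readout channels."""
    return CHARGE_CRATES * AMC_CARDS_PER_CRATE * CHANNELS_PER_CARD


def uplink_count() -> int:
    """Number of 10 Gb/s uplinks going to the DAQ."""
    return CHARGE_CRATES + LIGHT_CRATES


def uplink_capacity_gbps() -> int:
    """Aggregate nominal uplink capacity in Gb/s."""
    return uplink_count() * LINK_GBPS


if __name__ == "__main__":
    print(charge_channels(), uplink_count(), uplink_capacity_gbps())
```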
Double phase liquid argon TPC, 6x6x6 m³ active volume
● Event size : drift window of 7680 channels x 10000 samples = 146.8 MB
● X and Y charge collection strips, 3.125 mm pitch, 3 m long => 7680 readout channels
● Segmented anode in gas phase with double-phase amplification
● Drift coordinate : 6 m = 4 ms at E = 0.5 kV/cm
● Sampling 2.5 MHz (400 ns), 12 bits => 10000 samples per drift window
[Detector drawing. Labels: drift LAr volume 6 m x 6 m x 6 m, dE/dx ionization, prompt UV light, cathode, photomultipliers]
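The sample count and event size follow from the sampling period and the drift window. A minimal sketch of that arithmetic; the slide does not say how the 12-bit samples are stored, so the 2 bytes per sample used here is an assumption (one 12-bit value per 16-bit word), which lands close to the quoted 146.8 MB when expressed in binary megabytes:

```python
CHANNELS = 7680
DRIFT_WINDOW_NS = 4_000_000   # 4 ms drift window
SAMPLING_PERIOD_NS = 400      # 2.5 MHz sampling


def samples_per_drift_window() -> int:
    """Samples recorded per channel during one drift window."""
    return DRIFT_WINDOW_NS // SAMPLING_PERIOD_NS


def event_size_bytes(bytes_per_sample: int = 2) -> int:
    """Uncompressed event size.

    bytes_per_sample=2 is an assumption (12-bit sample in a 16-bit word),
    not something stated on the slide.
    """
    return CHANNELS * samples_per_drift_window() * bytes_per_sample


def to_mib(n_bytes: int) -> float:
    """Convert bytes to binary megabytes (MiB)."""
    return n_bytes / (1024 * 1024)


if __name__ == "__main__":
    size = event_size_bytes()
    print(samples_per_drift_window(), size, round(to_mib(size), 1))
```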
ProtoDUNE dual-phase experiment needs
● Bandwidth needed by the 2 ProtoDUNE experiments :
– ProtoDUNE single-phase : event size 230 MB, trigger rate 25 Hz => data rate 46 Gb/s => 11.5 Gb/s assuming a compression factor of 4
– ProtoDUNE dual-phase : 146.8 MB / event, trigger rate 100 Hz (7680 channels, 10 000 samples, 12 bits, 2.5 MHz, drift window 4 ms) => data rate ~120 Gb/s => ~12 Gb/s assuming a compression factor of 10
● ProtoDUNE dual-phase online DAQ storage buffer specifications :
– ~1 PB (needed to buffer 1 to 3 days of raw data taking)
– Built to store files at a 130 Gb/s data rate
– Huffman coding lossless data compression : compression factor of 10 estimated
– Goal : online DQM, reconstruction
– Data copied to the CERN T0 via a dedicated 40 Gb/s link
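The bandwidth figures above are event size times trigger rate, divided by the assumed compression factor. A minimal sketch of that calculation, taking MB as 10^6 bytes and Gb as 10^9 bits (the function name is illustrative):

```python
def data_rate_gbps(event_size_mb: float, trigger_rate_hz: float,
                   compression_factor: float = 1.0) -> float:
    """Data rate in Gb/s for a given event size (MB) and trigger rate (Hz).

    Units are decimal: 1 MB = 8e6 bits, 1 Gb = 1e9 bits.
    """
    bits_per_second = event_size_mb * 8e6 * trigger_rate_hz
    return bits_per_second / 1e9 / compression_factor


if __name__ == "__main__":
    # Single-phase: 230 MB events at 25 Hz, compression factor 4
    print(data_rate_gbps(230, 25), data_rate_gbps(230, 25, 4))
    # Dual-phase: 146.8 MB events at 100 Hz, compression factor 10
    print(data_rate_gbps(146.8, 100), data_rate_gbps(146.8, 100, 10))
```

With the slide's inputs this gives 46 Gb/s and 11.5 Gb/s for the single-phase detector, and about 117 Gb/s and 11.7 Gb/s for the dual-phase detector, which the slide rounds to 120 Gb/s and 12 Gb/s.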
ProtoDUNE Dual-Phase DAQ back-end design
[Architecture diagram, split into ONLINE and OFFLINE domains]
● Event builders : 2 * L1 EVB and 4 * L2 EVB ; detector uplinks 6 * 10Gb and 7 * 10Gb into the L1 EVB, 2 * 40Gb per L1 EVB, 40Gb links on the L2 EVB
● NP02 central online storage (EOS), 20 * 10Gb : RAW data files, Root files
● Online farm (1k cores) : detector monitoring, data quality monitoring, online reconstruction ; shifters & monitoring
● Offline : FTS (Xrootd 3rd party copy) over 40Gb to CERN (tape, CASTOR) and FNAL
● Other link labels in the diagram : 24 * 10Gb, 40Gb
ProtoDUNE Dual-Phase Event-builders
● 2 * L1 Event Builders
– Dell R740, 2 * CPU Gold 5122, 384 GB RAM
– 2 * Intel X710 4 * 10Gb/s
– 2 Mellanox ConnectX-3 Pro 2 * 40Gb/s
– Data collection : each EVB = 1/2 of the detector, corresponding to 1/2 of the event
● 4 * L2 Event Builders
– Dell R740, 2 * CPU Gold 5122, 192 GB RAM
– 2 Mellanox ConnectX-3 Pro 2 * 40Gb/s
– Event merging, writing files to the online storage
● RDMA/RoCE communication between the Event-Builders
ProtoDUNE Dual-Phase online storage system
● 20 * Data storage servers (= 20 EOS FST)
– (very) old Dell R510, 2 * CPU E5620, 32 GB RAM : 12 * 3TB SAS HDD
– Dell MD1200 : 12 * 3TB SAS HDD
– 1 * 10Gb/s
– 4 * RAID 6 on 6 HDD
● 2 * EOS Metadata servers (MGM)
– Dell R610, 2 * CPU E5540, 48 GB RAM
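The hardware list above can be checked against the ~1 PB and 130 Gb/s buffer specification. A rough sketch, assuming that all 24 disks of a server (12 in the R510, 12 in the MD1200) sit in the four 6-disk RAID 6 groups, that RAID 6 costs two disks of parity per group, and ignoring filesystem and EOS overheads:

```python
SERVERS = 20
RAID_GROUPS_PER_SERVER = 4
DISKS_PER_GROUP = 6
RAID6_PARITY_DISKS = 2      # RAID 6 stores two disks' worth of parity per group
DISK_TB = 3                 # decimal terabytes
NIC_GBPS = 10               # one 10 Gb/s port per server


def usable_tb_per_server() -> int:
    """Usable capacity of one server after RAID 6 parity, in TB."""
    data_disks = DISKS_PER_GROUP - RAID6_PARITY_DISKS
    return RAID_GROUPS_PER_SERVER * data_disks * DISK_TB


def usable_tb_total() -> int:
    """Usable capacity of the whole buffer, in TB."""
    return SERVERS * usable_tb_per_server()


def aggregate_nic_gbps() -> int:
    """Sum of the nominal network bandwidth of all storage servers."""
    return SERVERS * NIC_GBPS


if __name__ == "__main__":
    print(usable_tb_per_server(), usable_tb_total(), aggregate_nic_gbps())
```

Under these assumptions the buffer offers 48 TB per server, 960 TB in total, and 200 Gb/s of nominal aggregate network bandwidth.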
Storage systems tested (2016)

Feature | Lustre | BeeGFS | GlusterFS | GPFS | MooseFS | XtreemFS | XRootD | EOS
Version | v2.7.0-3 | v2015.03.r10 | 3.7.8-4 | v4.2.0-1 | 2.0.88-1 | 1.5.1 | 4.3.0-1 | Citrine 4.0.12
POSIX | Yes | Yes | Yes | Yes | Yes | Yes | via FUSE | via FUSE
Open source | Yes | Client=Yes, Server=EULA | Yes | No | Yes | Yes | Yes | Yes
Need for metadata server ? | Yes | Metadata + Manager | No | No | Metadata + Manager | (see note) | (see note) | (see note)
Support RDMA / Infiniband | Yes | Yes | Yes | Yes | No | No | No | No
Striping | Yes | Yes | Yes | Yes | No | Yes | No | No
Failover | M + D (1) | DR (1) | M + D (1) | M + D (1) | M + DR (1) | M + DR (1) | No | M + D (1)
Quota | Yes | Yes | Yes | Yes | Yes | No | No | Yes
Snapshots | No | No | Yes | Yes | Yes | Yes | No | No
Integrated tool to move data over data servers ? | Yes | Yes | Yes | Yes | No | Yes | No | Yes

(1) : M=Metadata, D=Data, M+D=Metadata+Data, DR=Data Replication
Need for metadata server, XtreemFS / XRootD / EOS : only two values (Yes, Yes) are legible for these three systems.
Striping : each file is divided into « chunks » distributed over all the storage servers. This is always at the charge of the client CPU (DAQ back-end).
Striping, EOS : this is now Yes with raid6/raiddp.

WA105 Technical Board meeting, June 15, 2016 : Results on distributed storage tests
https://indico.fnal.gov/event/12347/contribution/3/material/slides/0.pdf
Benchmarking platform (2016)
On this 2016 benchmark platform (@IPNL) :
- copy a file from a RAMDISK to the storage
- tests varying file size : 100MB, 1GB, 10GB or 20GB
- tests varying number of parallel flows from the client : 1, 6 or 8
Client : Dell R630
● 1 CPU E5-2637 @ 3.5GHz (4c, 8c HT)
● 32 GB RAM 2133 MHz DDR4
● 2 * Mellanox CX313A 40Gb/s
● 2 * 10Gb/s (X540-AT2)
● CentOS 7.0
MDS / Management : 2 * Dell R630
● 1 CPU E5-2637 @ 3.5GHz (4c, 8c HT)
● 32 GB RAM 2133 MHz DDR4
● 2 * 10Gb/s (X540-AT2)
● Scientific Linux 6.5 and CentOS 7.0
Network : Cisco Nexus 9372TX (6 ports 40Gb/s QSFP+ and 48 ports 10Gb/s)
● MDS / management and client nodes : 10.3.3.3, 10.3.3.4, 10.3.3.5 (links labelled 1 * 10Gb/s, 2 * 40Gb/s, 2 * 10Gb/s, 1 * 10Gb/s)
● 9 storage servers : 10.3.3.17 to 10.3.3.25, 10Gb/s each
9 Storage Servers (9 * Dell R510, bought Q4 2010)
● 2 * CPU E5620 @ 2.40GHz (4c, 8c HT), 16 GB RAM
● 1 PERC H700 (512MB) : 1 RAID 6 of 12 HDD 2TB (10D+2P) = 20TB
● 1 Ethernet Intel 10Gb/s (X520/X540)
● Scientific Linux 6.5
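The test procedure (copy one file from a RAM disk to the storage with 1, 6 or 8 parallel flows) can be sketched as below. This is not the tool used for the 2016 tests, which the slides do not name; the function names and file layout are hypothetical, and the sketch does not flush caches or force data to disk, so it only illustrates the measurement loop:

```python
import os
import shutil
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def run_copy_benchmark(source: str, dest_dir: str, flows: int) -> float:
    """Copy `source` into `dest_dir` with `flows` parallel copies.

    Returns the aggregate throughput in MB/s (1 MB = 1e6 bytes).
    """
    size = os.path.getsize(source)
    targets = [os.path.join(dest_dir, f"flow_{i}.dat") for i in range(flows)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=flows) as pool:
        # Each worker copies the same source file to its own target.
        list(pool.map(lambda target: shutil.copyfile(source, target), targets))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return flows * size / 1e6 / elapsed


def self_check(size_bytes: int = 1_000_000, flows: int = 4) -> float:
    """Run the benchmark on temporary directories as a smoke test."""
    with tempfile.TemporaryDirectory() as src_dir, \
            tempfile.TemporaryDirectory() as dst_dir:
        source = os.path.join(src_dir, "input.dat")
        with open(source, "wb") as handle:
            handle.write(os.urandom(size_bytes))
        rate = run_copy_benchmark(source, dst_dir, flows)
        copied = len(os.listdir(dst_dir))
    assert copied == flows
    return rate


if __name__ == "__main__":
    print(f"{self_check():.1f} MB/s")
```

In a real run, `source` would sit on the RAM disk and `dest_dir` on the mounted storage system under test, with the call repeated for each file size and flow count.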
Results from 1 of the 48 tests (2016)
[Chart: Distributed storage systems performance (8 threads). Left axis : throughput (MB/s), 0 to 7000 ; right axis : CPU %, 0 to 300. Series, for both throughput and CPU % : 1 target, 2 targets, 4 targets, 8 targets, EC 8+1 (GlusterFS), 9 targets (GPFS / MooseFS). Reference line : sum of synchronous storage elements writing bandwidth. X axis : Lustre, BeeGFS, GlusterFS, GPFS, MooseFS, XtreemFS, XRootD, EOS, each with file sizes 100 MB, 1 GB, 10 GB and 20 GB. Annotations : 67.02 %, 59.48 % and 43.93 % of the average client network bandwidth]
Storage system choice : EOS
● EOS chosen (after the 2016 tests) :
– Low-latency storage,
– Very efficient on the client side,
– POSIX, Kerberos, GSI access control,
– XrootD, POSIX file access protocol,
– 3rd-party copy support (needed for FTS),
– Checksums support,
– Redundancy :
● Metadata servers
● Data server (x replicas, RAIN raid6/raiddp)
– Data server lifecycle management (draining, start/stop operation)
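Checksum support matters for a DAQ buffer because a file can be verified after it has been written and again after a third-party copy. The slide does not say which algorithm is configured, so the sketch below assumes Adler-32 purely for illustration and computes it in chunks with the standard `zlib` module, so that a large raw data file never has to fit in memory:

```python
import io
import zlib


def adler32_stream(stream, chunk_size: int = 1024 * 1024) -> int:
    """Adler-32 of a binary stream, read in chunks.

    Adler-32 is an assumption made for this example; the slide only
    states that checksums are supported.
    """
    value = 1  # Adler-32 starting value
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        value = zlib.adler32(chunk, value)
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    # Any binary file object works; BytesIO stands in for a raw data file.
    print(format(adler32_stream(io.BytesIO(b"Wikipedia")), "08x"))  # 11e60398
```

Comparing the value computed at the source with the one computed at the destination is enough to detect a corrupted transfer.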