EOS as an online DAQ buffer for the ProtoDUNE Dual Phase experiment

SLIDE 1

PENNACCHIO Elisabetta, PUGNÈRE Denis

CNRS / IN2P3 / IPNL

EOS workshop, CERN, 4-5/02/2019

EOS as an online DAQ buffer for the ProtoDUNE Dual Phase experiment

SLIDE 2

Outline

  • The DUNE experiment
  • The ProtoDUNE experiments at CERN
  • The ProtoDUNE Dual-Phase DAQ
  • Storage systems tested
  • Storage system choice: EOS
  • Data challenges and some results
  • Conclusion
SLIDE 3

Who am I

  • Network and system engineer
  • Working in « Institut de Physique Nucléaire de Lyon »
  • Managing a WLCG site:

– CMS & Alice T3 (Gocdb : IN2P3-IPNL)

– 1 PB (DPM, XRootD), 1800+ cores (CREAM-CE)

  • Responsible for the DAQ back-end, the processing and the online data storage of the ProtoDUNE Dual-Phase experiment

SLIDE 4

DUNE http://www.dunescience.org/

  • DUNE : Deep Underground Neutrino Experiment (~2025)
  • Exploring the phenomenon of neutrino oscillations and searching for signs of proton decay

  • Neutrino beam + near detector @ Fermilab
  • Far detector (4 * 10kT liquid argon TPC) @ Sanford (underground)
  • 2 technologies : single-phase / dual-phase (dual-phase => signal amplification)
  • 2 ProtoDUNE prototypes built at the surface @ CERN (2018/2019): NP02/WA105 (dual phase) and NP04 (single phase)

SLIDE 5

EHN1 (CERN North Area), 01 Oct 2018

[Photo: single-phase detector, dual-phase detector, single-phase and dual-phase control rooms, CPU farm (12 racks), single-phase cryogenic system, DAQ room (6 racks for SP & DP)]

SLIDE 6

Dual-phase ProtoDUNE DAQ:

6x6x6 m³ active volume (300 t of liquid argon = 1/20 of a 10 kton LBNO module)

Charge readout: 12 µTCA crates, 10 AMC cards / crate, 64 channels / card => 7680 channels. (12 charge readout + 1 light readout) * 10 Gb/s links = 13 * 10 Gb/s uplinks to the DAQ.

SLIDE 7

Double-phase liquid argon TPC, 6x6x6 m³ active volume:

  • Drift field E = 0.5 kV/cm between the cathode and a segmented anode, with double-phase amplification in the gas phase
  • X and Y charge collection strips: 3.125 mm pitch, 3 m long => 7680 readout channels
  • Drift coordinate: 6 m = 4 ms; sampling at 2.5 MHz (400 ns), 12 bits => 10000 samples per drift window
  • Prompt UV light (dE/dx => ionization) detected by photomultipliers
  • Event size: drift window of 7680 channels x 10000 samples = 146.8 MB (see the check below)
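A back-of-the-envelope check of that event size, assuming the 12-bit samples are stored as 16-bit words (the word size is not stated on the slide):

  # Event size: channels * samples * 2 bytes per (16-bit padded) sample
  echo "7680 * 10000 * 2" | bc            # 153600000 bytes
  echo "7680 * 10000 * 2 / 1048576" | bc  # ~146 MiB, matching the 146.8 MB figure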

SLIDE 8

ProtoDUNE dual-phase experiment needs

  • Bandwidth needed by the 2 ProtoDUNE experiments:

– ProtoDUNE single-phase: event size 230 MB, trigger rate 25 Hz => data rate 46 Gb/s => 11.25 Gb/s assuming a compression factor of 4

– ProtoDUNE dual-phase: 146.8 MB / event, trigger rate 100 Hz; 7680 channels, 10000 samples, 12 bits (2.5 MHz: drift window 4 ms) => data rate 120 Gb/s => ~12 Gb/s assuming a compression factor of 10 (checked in the sketch after this list)

  • ProtoDUNE dual-phase online DAQ storage buffer specifications:

– ~1 PB (needed to buffer 1 to 3 days of raw data taking)
– Built to store files at a 130 Gb/s data rate
– Huffman coding lossless data compression: compression factor of 10 estimated
– Goal: online DQM, reconstruction
– Data copied to the CERN T0 via a dedicated 40 Gb/s link
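A quick sanity check of the uncompressed dual-phase rate (my arithmetic, not from the slide):

  # Data rate = event size * trigger rate, converted to bits
  echo "scale=1; 146.8 * 100 * 8 / 1000" | bc   # ~117.4 Gb/s, quoted as ~120 Gb/s
  echo "scale=1; 117.4 / 10" | bc               # ~11.7 Gb/s after 10x compression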

SLIDE 9

ProtoDUNE Dual-Phase DAQ back-end design

[Diagram: the 13 detector uplinks (6 * 10Gb + 7 * 10Gb) feed 2 L1 EVBs; the L1 EVBs feed the L2 EVBs over 40Gb links (2 * 40Gb per EVB pair); the event builders write to the NP02 online storage (20 * 10Gb), read by the online farm over 24 * 10Gb]

  • Online farm (1k cores): detector monitoring, data quality monitoring, online reconstruction (for shifters & monitoring)
  • RAW data files and ROOT files are copied from the online storage to CERN central EOS over a dedicated 40Gb link: this is the online/offline boundary
  • From CERN EOS, FTS (XRootD 3rd-party copy) exports the data to tape (CASTOR) and to FNAL

SLIDE 10

ProtoDUNE Dual-Phase event builders

  • 2 * L1 event builders:

– Dell R740, 2 * CPU Gold 5122, 384 GB RAM
– 2 * Intel X710 4*10Gb/s
– 2 * Mellanox ConnectX-3 Pro 2*40Gb/s
– Data collection: each L1 EVB reads 1/2 of the detector, corresponding to 1/2 of the event

  • 4 * L2 event builders:

– Dell R740, 2 * CPU Gold 5122, 192 GB RAM
– 2 * Mellanox ConnectX-3 Pro 2*40Gb/s
– Event merging, writing files to the online storage

  • RDMA/RoCE communication between the event builders (see the sketch below)
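For illustration (not from the slides): with Mellanox ConnectX-3 Pro NICs, the RoCE links between event builders could be sanity-checked with the standard libibverbs/perftest tools, assuming they are installed on the hosts; the hostname evb-l2a is hypothetical:

  # List RDMA-capable devices and their port state (libibverbs)
  ibv_devinfo | grep -e hca_id -e state

  # Measure RDMA write bandwidth between two EVBs (perftest package)
  ib_write_bw            # on the receiving EVB
  ib_write_bw evb-l2a    # on the sending EVB, pointing at the receiver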


SLIDE 11

ProtoDUNE Dual-Phase online storage system

  • 20 * data storage servers (= 20 EOS FSTs):

– (very) old Dell R510, 2 * CPU E5620, 32 GB RAM, 12 * 3TB SAS HDD
– Dell MD1200 enclosure: 12 * 3TB SAS HDD
– 1 * 10Gb/s link
– 4 * RAID6 arrays of 6 HDDs each

  • 2 * EOS metadata servers (MGM):

– Dell R610, 2 * CPU E5540, 48 GB RAM
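A rough capacity check under these numbers (my arithmetic; a RAID6 array of 6 disks leaves 4 data disks):

  # Per server: 4 RAID6 arrays * 4 data disks * 3 TB
  echo "4 * 4 * 3" | bc        # 48 TB usable per server
  echo "20 * 4 * 4 * 3" | bc   # 960 TB over 20 servers, i.e. the ~1 PB buffer quoted earlier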


SLIDE 12

Storage systems tested (2016)

|                          | Lustre   | BeeGFS                   | GlusterFS | GPFS      | MooseFS            | XtreemFS   | XRootD   | EOS            |
| Version                  | v2.7.0-3 | v2015.03.r10             | 3.7.8-4   | v4.2.0-1  | 2.0.88-1           | 1.5.1      | 4.3.0-1  | Citrine 4.0.12 |
| POSIX                    | Yes      | Yes                      | Yes       | Yes       | Yes                | Yes        | via FUSE | via FUSE       |
| Open source              | Yes      | Client=Yes, Server=EULA  | Yes       | No        | Yes                | Yes        | Yes      | Yes            |
| Needs metadata server?   | Yes      | Metadata + Manager       | No        | No        | Metadata + Manager | Yes        | Yes      | Yes            |
| RDMA/InfiniBand support  | Yes      | Yes                      | Yes       | Yes       | No                 | No         | No       | No             |
| Striping                 | Yes      | Yes                      | Yes       | Yes       | No                 | Yes        | No       | No (2)         |
| Failover                 | M + D (1)| DR (1)                   | M + D (1) | M + D (1) | M + DR (1)         | M + DR (1) | No       | M + D (1)      |
| Quota                    | Yes      | Yes                      | Yes       | Yes       | Yes                | No         | No       | Yes            |
| Snapshots                | No       | No                       | Yes       | Yes       | Yes                | Yes        | No       | No             |
| Integrated data mover?   | Yes      | Yes                      | Yes       | Yes       | No                 | Yes        | No       | Yes            |

(1): M=Metadata, D=Data, M+D=Metadata+Data, DR=Data Replication
(2): EOS striping is now "Yes" with the raid6/raiddp layouts

With striping, each file is divided into « chunks » distributed over all the storage servers; this work is always borne by the client CPU (the DAQ back-end).

WA105 Technical Board meeting, June 15, 2016: Results on distributed storage tests https://indico.fnal.gov/event/12347/contribution/3/material/slides/0.pdf

SLIDE 13

Benchmarking platform (2016)

Cisco Nexus 9372TX: 6 * 40Gb/s QSFP+ ports and 48 * 10Gb/s ports

[Diagram: 9 storage servers at 1 * 10Gb/s each; client at 2 * 10Gb/s + 2 * 40Gb/s; 2 MDS/management nodes at 1 * 10Gb/s each]

9 storage servers (9 * Dell R510, bought Q4 2010):

  • 2 * CPU E5620 @ 2.40GHz (4c, 8t HT), 16 GB RAM
  • 1 PERC H700 (512MB): 1 RAID6 of 12 * 2TB HDD (10D+2P) = 20TB
  • 1 * Intel 10Gb/s Ethernet (X520/X540)
  • Scientific Linux 6.5

Client: Dell R630

  • 1 CPU E5-2637 @ 3.5GHz (4c, 8t HT)
  • 32 GB RAM 2133 MHz DDR4
  • 2 * Mellanox CX313A 40Gb/s
  • 2 * 10Gb/s (X540-AT2)
  • CentOS 7.0

MDS / management: 2 * Dell R630

  • 1 CPU E5-2637 @ 3.5GHz (4c, 8t HT)
  • 32 GB RAM 2133 MHz DDR4
  • 2 * 10Gb/s (X540-AT2)
  • Scientific Linux 6.5 and CentOS 7.0

On this 2016 benchmark platform (@ IPNL), each test (see the sketch below):

  • copies a file from a RAMDISK to the storage
  • varies the file size: 100MB, 1GB, 10GB or 20GB
  • varies the number of parallel flows from the client: 1, 6 or 8
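A minimal sketch of one such test, assuming the storage system under test is mounted at /mnt/storage (path and sizes illustrative):

  #!/bin/bash
  # Stage a test file in RAM (tmpfs) so the source is not disk-bound
  dd if=/dev/urandom of=/dev/shm/test.dat bs=1M count=1024   # 1 GB sample

  # Copy it with N parallel flows and time the whole batch
  N=8
  time (
    for i in $(seq 1 $N); do
      cp /dev/shm/test.dat /mnt/storage/test.$i.dat &
    done
    wait
  )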
SLIDE 14

[Chart: distributed storage system performance, 8 threads. For each system (Lustre, BeeGFS, GlusterFS, GPFS, MooseFS, XtreemFS, XRootD, EOS) and each file size (10 MB, 1 GB, 10 GB, 20 GB): throughput (MB/s, left axis) and client CPU % (right axis), with series for 1, 2, 4 and 8 targets, EC 8+1 (GlusterFS) and 9 targets (GPFS / MooseFS). Annotated points reach 67.02%, 59.48% and 43.93% of the average client network bandwidth; the reference line is the sum of the synchronous storage elements' writing bandwidth.]

Results from 1 of the 48 tests (2016)

SLIDE 15

Storage system choice: EOS

  • EOS was chosen (after the 2016 tests) for:

– Low-latency storage
– Very efficient client side
– POSIX, Kerberos, GSI access control
– XRootD and POSIX file access protocols
– 3rd-party copy support (needed for FTS)
– Checksum support
– Redundancy:

  • Metadata servers
  • Data servers (x replicas, RAIN raid6/raiddp)

– Data server lifecycle management (draining, start/stop operations), illustrated below
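For illustration (my sketch, not from the slides), that lifecycle management maps onto standard EOS Citrine CLI commands like the following; the filesystem id 42 is hypothetical:

  # List filesystems and their status on the instance
  eos fs ls

  # Put one filesystem into drain, so its replicas/stripes migrate elsewhere
  eos fs config 42 configstatus=drain

  # Return it to normal operation once the hardware has been serviced
  eos fs config 42 configstatus=rw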

SLIDE 16

Data challenges so far

  • Benchmarks on the ProtoDUNE DAQ storage system with 19 FSTs (see the sketch below):

– Assuming we decided to record raw data files of 3GB
– Test = copy (xrdcp) a file from a RAMDISK on each event builder to the storage
– Using different flow patterns: 6, 8, 16, 20 or 32 parallel xrdcp per event builder
– From 1, 2, 3 or 4 event builders
– Using different EOS configurations:

  • 4 EOS groups with 19 FS in each group (1 FS / FST in each group)
  • 2 EOS groups with 19 FS in each group (1 FS / FST in each group, combining 2 RAID file systems into 1 FS)
  • 10 EOS groups with 2 FS from each of 2 FSTs (= 4 FS / group)
  • 19 EOS groups with 1 FS from each of 2 FSTs (= 2 FS / group)

– On different EOS releases: eos-server 4.4.10 & 4.4.18
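A minimal sketch of one such run from a single event builder, assuming a 3GB file staged in /dev/shm and an MGM reachable as eos-np02.cern.ch (hostname and destination path hypothetical):

  #!/bin/bash
  # 3 GB test file on the event builder's RAMDISK
  dd if=/dev/urandom of=/dev/shm/raw.dat bs=1M count=3072

  # Launch N parallel xrdcp flows to the online EOS instance
  N=32
  time (
    for i in $(seq 1 $N); do
      xrdcp --force /dev/shm/raw.dat \
        root://eos-np02.cern.ch//eos/np02/daq/raw.$HOSTNAME.$i.dat &
    done
    wait
  )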


SLIDE 17

Some results: data from 1, 2, 3 or 4 * EVB

[Charts: ← 1*EVB | 2*EVB → and ← 3*EVB | 4*EVB →. Configuration: eos-server 4.4.10, 36 groups, 1 FS / group. Something strange with 2, 3 and 4 * EVB.]

SLIDE 18

Some results: 4*EVB

  • Some findings (inspection commands sketched below):

– No effect of the skipSaturatedPlct parameter, used for client file scheduling(?):
  > eos geosched set skipSaturatedPlct 1
– A group with fewer FS than the other groups quickly saturates, reducing the whole system's efficiency: on the group with fewer FS, the number of « wopen » keeps increasing and the FSTs are overloaded
– Many errors with raid6/raiddp on 4.4.10 (not shown here), not yet re-tested with other releases
– Huge performance improvements from 4.4.10 to 4.4.18

[Charts: ← eos 4.4.10 | eos 4.4.18 →]
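For illustration (assuming the standard EOS CLI on the MGM), the scheduler settings and the per-filesystem write-open counts behind these findings can be inspected like this:

  # Current geosched parameters (skipSaturatedPlct among them)
  eos geosched show param

  # Per-filesystem IO statistics; the wopen column shows concurrent write-opens
  eos fs ls --io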

SLIDE 19

Some results: 4*EVB, various group configurations

[Charts: 4*EVB, eos-server 4.4.10 and 4*EVB, eos-server 4.4.18; 6 threads / EVB and 32 threads / EVB]

SLIDE 20

Conclusion

  • EOS is performing well, with huge recent improvements
  • EOS is an SDS (software-defined storage) which can take advantage of old hardware with decent performance (over 16 GB/s continuous writing)
  • Requires some fine tuning

[Photo: DAQ room @ EHN1, dual-phase racks: router and switches, storage servers, EVBL1A/B, EVBL2A-D, 9 DAQ service machines]

ProtoDUNE Dual-Phase online DAQ:

  • 6 event builders Dell R740 (CERN/KEK)
  • 9 service servers Dell R610 provided by CC-IN2P3
  • 20 storage servers Dell R510+MD1200 provided by CC-IN2P3

[Chart: 5 successive tests, 6, 8, 16, 20 and 32 parallel xrdcp from 4*EVB]

SLIDE 21

Inside ProtoDUNE: http://cds.cern.ch/images/CERN-PHOTO-201710-248-2

Many thanks to :

  • The EOS dev team & support
  • The EOS community for the help
  • The Neutrino platform
  • IT-ProtoDUNE coordination

Thanks for your attention