SLIDE 1

CDF Data production model

  • S. Hou

for the CDF data production team, 02 May 2006

SLIDE 2


Outline

Data streams

  • trigger, streaming, data logging

Computing model

  • architecture, CAF linux farms, SAM data-handling

Production tasks

  • submission, concatenation

Monitoring and Bookkeeping

  • resource, file counting, recovery

Scalability

  • capacity, limits, scaling options
SLIDE 3


CDF collaboration

Collider Detector experiment at the Fermilab Tevatron collider

  • Study proton-antiproton collisions at CM energy ~ 2 TeV
  • Large data volume, computing load
SLIDE 4


Trigger, detector data flow

3-level trigger / data buffer; 52 physics triggers. CDF detector data-taking capacity:

                          2006 upgrade          2005 achieved
Tevatron luminosity :     3x10^32 cm^-2 s^-1    1.8x10^32 cm^-2 s^-1
Level-1 acceptance :      40 kHz                27 kHz
Level-2 acceptance :      1 kHz                 850 Hz
Event Builder (EVB) :     500 MB/s              850 x 0.2 MB/s
Level-3 acceptance :      150 Hz                110 Hz
To tape storage rate :    40 MB/s               20 MB/s

Event size ~140 kByte; '06 data-taking rate ~5 M events/day. Upgrade to improve DAQ efficiency.
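These figures are mutually consistent; a quick arithmetic cross-check (a sketch using only the numbers quoted above, not new measurements):

```python
# Consistency check on the quoted rates; plain arithmetic on the slide's figures.

EVENT_SIZE_KB  = 140      # ~140 kByte per event
L3_RATE_HZ     = 150      # 2006 Level-3 acceptance
EVENTS_PER_DAY = 5e6      # '06 data-taking rate, ~5 M events/day

peak_to_tape_mb_s = L3_RATE_HZ * EVENT_SIZE_KB / 1024         # ~20 MB/s sustained
daily_volume_gb   = EVENTS_PER_DAY * EVENT_SIZE_KB / 1024**2  # ~670 GB/day

print(f"peak to tape ~ {peak_to_tape_mb_s:.0f} MB/s")  # within the 20-40 MB/s budget
print(f"daily volume ~ {daily_volume_gb:.0f} GB/day")
```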

SLIDE 5


Streams, data logging

8 streams, served to online consumers: A, B, C, D, E, J, G, H

Consumer server/Logger (CSL)

  • receive physics events
  • write to disks in 8 streams
  • distribute to online consumers

An event may fire multiple triggers; the stream overlap is ~5% and increases with Tevatron luminosity.

Data are recorded under 52 physics triggers.

SLIDE 6


Data logging rate

1.3 fb^-1 of data were written to tape up to Nov 2005; the data logging rate increases with Tevatron luminosity. Good-run physics data:

Feb 2002 - Dec 2004 : 1040 M events = 210 k files = 188 TByte
Dec 2004 - Feb 2006 : 1270 M events = 172 k files = 159 TByte

1.6 fb^-1 delivered by the Tevatron, 1.3 fb^-1 on tape!

SLIDE 7


CDF computing model, '06

[Diagram: CDF DAQ → production farm → Enstore; dCache serves the raw datasets and production datasets to the CDF Analysis Farm, remote CAFs, and user desktops]

SLIDE 8


Computing network, '06

[Diagram: dCache file-servers on a 10 Gbit backbone link the Enstore tape library and its file-servers to the analysis farm and the production farm (2 Gbit each); the CDF online DAQ, the Oracle DB (2 Gbit), remote sites, and offline users also connect]
SLIDE 9


Production data flow

[Diagram: Level-1/2 trigger and DAQ, the sub-detector calibration DataBase, and the Level-3 farm feed the run splitter and file catalog; output flows to dCache, the CAF, and fileservers]

Data are split in production: 8 raw data streams → 52 physics datasets.

Final storage: Enstore tape library, STK 9940B drives, 200 GB/tape, 30 MByte/s read/write, steady R/W rate ~1 TByte/drive/day.

SLIDE 10


Data production, 1st model

In service 2000-2004, with direct I/O to Enstore tape:

  • Custom I/O node to Enstore

FBS batch system

  • dfarm: the collection of all worker IDE disks, used as a buffer for input and output files

Farm Processing system

  • MySQL for bookkeeping
  • Concatenation in rigid event order
  • Output truncated to 1 GB files

Performance

  • Peak rate at 1 TB input/day

[Diagram: numbered workflow (1-6) over dfarm, network, MySQL DB, run-splitter, and calibration: register input, stager, worker, concatenator, register concatenated output]

SLIDE 11


SAM-farm upgrade, '05: to CAF & SAM Data Handling

Toward a distributed computing infrastructure. The CAF (CDF Analysis Farm) is a Condor system with a CAF interface for job submission and monitoring. Advantages:

  • uniform platform with other CDF computing facilities
  • compatible with distributed computing development

SAM (Sequential Access via Metadata) data handling system: file delivery and DB service; dCache virtualizes disk usage.

SLIDE 12


SAM production farm

[Diagram: numbered workflow (1-5) over the fileserver network, SAM DB, input-URL, run-splitter, and calibration: workers declare/update metadata, output is merged via dCache]

CAF/SAM in parallel:

  • SAM project: activates file delivery of an assigned SAM dataset and tracks file consumption status
  • Condor batch job: consumes files in the SAM project and updates/declares SAM metadata for bookkeeping

Concatenation of output:

  • merge output files sorted in run sequence
  • store to Enstore via SAM
  • declare metadata and update file parentage for bookkeeping
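The project/consumer pairing can be illustrated with a minimal sketch. This is not CDF's actual farm code; the helper names (`get_next_file`, `release_file`, `declare_metadata`) are hypothetical stand-ins for the SAM client calls of the era:

```python
# Sketch of the CAF/SAM consumption loop: a Condor batch job attached to a
# SAM project pulls files one at a time, processes them, and reports status
# so SAM can track consumption. All helper names are hypothetical.

def run_sam_consumer(project, process_file, declare_metadata):
    """Consume every file SAM delivers for `project`, declaring output metadata."""
    while True:
        url = project.get_next_file()          # SAM decides delivery order/location
        if url is None:                        # project exhausted: job is done
            break
        try:
            outputs = process_file(url)        # run the reconstruction binary
        except Exception:
            project.release_file(url, status="failed")   # SAM can redeliver later
            continue
        for out in outputs:
            declare_metadata(out, parents=[url])         # parentage for bookkeeping
        project.release_file(url, status="consumed")
```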

SLIDE 13


Production challenge

Operation

Resource monitoring; automatic submission and monitoring of:

  • 1. binary jobs of SAM projects on CAF farms
  • 2. concatenation on fileservers
  • 3. store to SAM/Enstore

Service interface

Network, Enstore tape I/O, dCache, SAM data handling, DB service, CDF online, calibration DB, software.

Every event collected must be processed in a timely way, interfacing to data handling, the database, and multiple CAFs, with precision bookkeeping on millions of files: zero tolerance for error, every event is counted.

SLIDE 14


Use cases in production

Fast beam-line calibration:

  • immediately after data are available on Enstore: raw data → histograms → concatenation

Detector monitoring:

  • quick detector feedback and good-run definition, immediately after the beam-line is available: raw data → production/Enstore → histograms

Physics calibration:

  • statistics required for chosen events: raw data → histograms

Physics production:

  • raw data → multiple outputs → concatenation → Enstore
  • production files → single output → concatenation → Enstore

SLIDE 15


SAM projects in production

Cron jobs accessing the SAM DB (steps 1-2 sketched after the diagram):

  • 1. Check the online DB, make the SAM input datasets
  • 2. Submit SAM projects to the Condor CAF
  • 3. Merge output files and samStore on the fileserver

[Diagram: the operation node runs ProExe and merge; SAM handles query/nextfile/declare among raw, merged, and reco-children files; input datasets (gphysr_runXX) are built from the online DB good_runs table (step 1), submitted (step 2), and merged/samStored into physics datasets (reco.gphysr_…, step 3), with control metadata linking input datasets to physics datasets]
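A minimal sketch of the cron driver for steps 1-2, assuming hypothetical `online_db` and `sam` client objects and a `CafSubmit`-style submission command (none of these names are from the slides):

```python
# Sketch of the cron driver: find new good runs in the online DB, define a SAM
# input dataset for each, and submit a SAM project as a CAF job. The DB schema,
# client API, and submit command are hypothetical placeholders.
import subprocess

def submit_new_datasets(online_db, sam, caf_submit="CafSubmit"):
    # 'processed' is an assumed bookkeeping column, not from the slides
    runs = online_db.query("SELECT run FROM good_runs WHERE processed = 0")
    for run in runs:
        dataset = f"gphysr_run{run}"                 # naming pattern from the slide
        sam.create_dataset(dataset, query=f"run_number = {run}")
        # one SAM project per dataset, submitted to the Condor CAF
        subprocess.run([caf_submit, "--dataset", dataset,
                        "--tarball", "ProdExe.tar.gz"], check=True)
        online_db.execute("UPDATE good_runs SET processed = 1 WHERE run = %s", run)
```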

SLIDE 16


Data handling in production

Independent cron jobs on the operation node / fileservers (sketched below):

  • 1. Submit a SAM project / CAF job; fetch the files in the input dataset
  • 2. Concatenation on the fileserver
  • 3. samStore to Enstore

[Diagram: 1. ProExe runs on the Condor CAF with R/W access to dCache; 2. merge turns reco files into merged files on the fileserver; 3. samStore writes to Enstore, via /pnfs, /dCache, /samcache paths]
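Since the three stages are independent cron jobs, each can be its own entry point. A sketch under that assumption; all helper objects are hypothetical:

```python
# Sketch of the three independent cron stages (names hypothetical). Each stage
# is a separate process, so submission, concatenation, and tape storage can
# run and fail independently of one another.

def stage_submit(sam, caf):
    """1. Submit SAM projects / CAF jobs for any unprocessed input dataset."""
    for dataset in sam.datasets(status="new"):
        caf.submit(dataset)

def stage_concatenate(fileserver, merger):
    """2. Merge reconstructed output files on the fileserver."""
    for group in fileserver.complete_output_groups():
        merger.merge(group)            # rootd-based merge, ~3 min per GByte

def stage_store(sam):
    """3. samStore merged files to Enstore and update parentage in SAM."""
    for f in sam.files(status="merged", location="fileserver"):
        sam.store(f)                   # copies to Enstore, declares the location
```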

SLIDE 17


Binary jobs on Condor worker

Each CPU takes one job; the CAF headnode dispatches the ProdExe tarball (self-contained with all libraries, 140 MB).

Production script (sketched below):

  • 1. Fetch one input file from the assigned SAM dataset
  • 2. Binary execution (split table, calibration)
  • 3. Declare split outputs
  • 4. Copy to the concatenation area
  • 5. Update bookkeeping
  • 6. Cleanup

~4 hours per file (1 GByte) on a 1 GHz P3.

[Diagram: the CAF headnode dispatches to workers, which unpack the tarball in a scratch area; input comes from the SAM DB and dCache, constants from the calibration DB, and output goes to concatenation]
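The six steps map naturally onto a short worker script. A sketch with hypothetical helper objects (`sam_project`, `binary`, `bookkeeping`) standing in for the real production tools:

```python
# Sketch of the six-step production script run by each Condor worker.
# Step numbers match the slide; the helper objects are hypothetical.
import shutil, tempfile

def worker_job(sam_project, binary, bookkeeping, concat_area):
    scratch = tempfile.mkdtemp(prefix="prod_")           # worker scratch area
    infile = sam_project.fetch_next_file(to=scratch)     # 1. fetch input (~1 GB)
    outputs = binary.run(infile, workdir=scratch)        # 2. execute: split by
                                                         #    trigger table, apply
                                                         #    calibration (~4 h on
                                                         #    a 1 GHz P3)
    for out in outputs:
        sam_project.declare(out, parent=infile)          # 3. declare split outputs
        shutil.copy(out, concat_area)                    # 4. copy to concatenation
    bookkeeping.mark_processed(infile, outputs)          # 5. update bookkeeping
    shutil.rmtree(scratch)                               # 6. cleanup
```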

SLIDE 18


Concatenation / SAMstore

Runs locally on fileservers to reduce IDE and network bandwidth, independently of production submission:

  • 1. SAM DB query: make file lists in order for a dataset; input sizes vary from 5 MB to 1 GB, output size ~1 GByte
  • 2. Merge: rootd binary, ~3 min per GByte
  • 3. SAM DB update: declare merged files

SAMstore merged files:

  • directly to Enstore
  • SAM DB updates file parentage

The challenge is in the bookkeeping (see the check sketched below):

  • plural SAM DB queries
  • no data loss
  • no duplication
  • 100% exact in production, easy recovery

[Diagram: production of 8 streams into 52 datasets → concatenation → SAMstore]
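The "no data loss, no duplication" requirement amounts to a set comparison between a dataset's declared inputs and the parentage of its merged outputs. A sketch of such a check, with hypothetical SAM query helpers:

```python
# Sketch of the exactness check behind "no data loss, no duplication": compare
# the files registered as input against the parentage of the merged output
# recorded in the SAM DB. The query helpers are hypothetical.

def check_dataset_complete(sam, input_dataset):
    """Return (missing, duplicated) input files for one dataset's merged output."""
    inputs = set(sam.files_in_dataset(input_dataset))
    parents = []                                   # parents of every merged file
    for merged in sam.files(query=f"parent_dataset = {input_dataset!r}"):
        parents.extend(sam.parents_of(merged))
    missing    = inputs - set(parents)             # data loss: never merged
    duplicated = {f for f in parents if parents.count(f) > 1}   # merged twice
    return missing, duplicated
```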

SLIDE 19


Resource monitoring

Monitored: CDF DB, SAM DB, data handling, the CAF Condor batch system, and fileserver storage. Jobs whose required services are missing are prohibited from running.

SLIDE 20


CAF farm monitoring

Worker CPUs (Ganglia) and input (rcp) waiting; traffic to the fileserver (xfs).

Bandwidth limits:

  • input: Enstore loading to dCache
  • output: multiple workers to fileservers
  • 1 Gbit network port to IDE: 40 MB/s
  • 1 output dataset to Enstore: 30 MB/s

SLIDE 21


CAF condor monitoring

The tarball (archived execution binary) is distributed to worker CPUs. Input files are copied via SAM from dCache. At the end of a job, output files are copied to the assigned fileserver.

CPU engagement is monitored. Commands show:

  • commands executed now
  • CPU of a section
  • CPUs of a CAF job

SLIDE 22


SAM project monitor

Input is delivered by SAM Data-Handling system

  • Input files are organized in datasets
  • Each data-set is submitted to a SAM project
  • Each project is associated with a CAF condor job

  • list of projects
  • cumulative file consumption

SLIDE 23


Monitor a SAM project

Consumption of a dataset is monitored

  • File delivery by SAM from registered locations (dCache, samCache, Enstore etc)
  • Consumption by CAF worker is monitored
  • status of file consumption of a project
  • list of files / parentage in a dataset

SLIDE 24


Bookkeeping / SAM metadata

Each file created in production has metadata. Parent-daughter relations are updated after concatenation and SAMstore. The metadata is used for bookkeeping: SAM queries on the metadata are tabulated and counted.
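A sketch of the metadata declared for a merged file, with the parent-daughter link that drives the bookkeeping; the field names are illustrative, not CDF's actual SAM schema:

```python
# Sketch of the per-file metadata record and the parent-daughter update made
# after concatenation. Field names and helpers are illustrative placeholders.

def declare_merged_file(sam, merged_name, parent_files, dataset):
    metadata = {
        "file_name":   merged_name,
        "dataset":     dataset,
        "event_count": sum(sam.event_count(p) for p in parent_files),
        "parents":     parent_files,      # parentage drives bookkeeping queries
    }
    sam.declare(metadata)                 # a SAM query on this metadata can later
                                          # be tabulated and counted per dataset
```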

SLIDE 25


Recovery

SAM input datasets are tabulated (sketched below):

  • 1. SAM/CAF submission is automatic for datasets with incomplete daughters
  • 2. The table is updated after concatenation / samStore, to complete the production tasks of an input dataset
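Recovery thus reduces to resubmitting whatever inputs still lack daughters. A sketch, again with hypothetical SAM helpers:

```python
# Sketch of the automatic recovery pass: any input dataset whose daughter
# files are incomplete is resubmitted. All helper names are hypothetical.

def recover_incomplete(sam, caf):
    for dataset in sam.input_datasets():
        expected = set(sam.files_in_dataset(dataset))
        done = {p for m in sam.daughters_of(dataset) for p in sam.parents_of(m)}
        if expected - done:                  # some inputs have no daughters yet
            caf.submit(dataset, files=sorted(expected - done))  # reprocess only
                                                                # the missing part
```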

SLIDE 26


Production rate

SAM farm peak performance: jobs are distributed to two CAFs (analysis & production farm), using 540 CPUs for 6 physics streams, with 8 dCache input fileservers and 6 output fileservers; stable processing speed of 25 M events/day (~5 times the '05 CDF DAQ rate), 3 TB input and 4 TB output per day (the output has 20% overlap across the 52 datasets, and the H, J streams are 15% compressed).

[Plots: integrated output event logging; daily file consumption]

SLIDE 27


Stability in production

CAF Condor is very reliable; worker hardware failures and down-graded RAID are occasional. Service is 24x7: Oracle and Enstore service, SAM and dCache shift support. Production runs in parallel: 6 streams, output to 6 fileservers. CPU usage gets rougher at the end as streams finish up.

[Plots: CAF+Farm jobs (max = 540) and farm CPU; traffic to/from the production farm (green: in bits/s, blue: out bits/s, dark: peak in, pink: peak out)]

SLIDE 28


Bottlenecks

[Diagram: production farm ↔ dCache over 10 Gbit; 2 Gbit farm switch, 1 Gbit to each fileserver; reco and merged files on the fileservers]

Server port and IDE speed: 1 Gbit peak ~ 50 MB/s, IDE peak ~ 40 MB/s, matching ~ 100 CPUs max.

Enstore single-dataset write: single mover, 30 MB/s instantaneous, ~ 1 TB/day.

Dual P3 server (2 TB): network average 50 MB/s in+out; concatenation CPU at 1 GB / 3 min / CPU ~ 1 TB/day.

Dual Xeon server (8 TB): network average 100 MB/s in+out; concatenation CPU at 1 GB / 1.5 min / CPU ~ 2 TB/day. The port ratio is unmatched.
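The per-server concatenation figures follow from simple arithmetic; a quick cross-check of the numbers quoted above:

```python
# Cross-check of the concatenation throughput figures quoted above
# (plain arithmetic on the slide's numbers, not new measurements).

SECONDS_PER_DAY = 86400

def daily_tb(gb_per_merge, minutes_per_merge, n_cpu):
    """Daily concatenation volume for n_cpu CPUs merging at the given rate."""
    merges_per_day = SECONDS_PER_DAY / (minutes_per_merge * 60)
    return gb_per_merge * merges_per_day * n_cpu / 1024   # TB/day

print(daily_tb(1, 3.0, 2))   # dual P3:   ~0.94 TB/day  (slide: ~1 TB/day)
print(daily_tb(1, 1.5, 2))   # dual Xeon: ~1.9 TB/day   (slide: ~2 TB/day)
```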

SLIDE 29


Scaling capacity

At the peak performance of 3 TB input and 4 TB output per day, the farm switch (2 Gbit capacity) carries an average load of 800 Mbit/s, limited by the fileserver Gbit links (40 MB/s each).

Scaling on CPU:

  • more CPUs in a CAF, or more CAFs

Scaling on network I/O:

  • more streams in parallel production
  • more fileservers (more Gbit links)

The eventual limit is the tape drives.

SLIDE 30


Upgrade to GRID

Little change is required for submission to GRID.

Operation:

  • bookkeeping stays with SAM
  • production scripts are portable and can be multiplied
  • the binary tarball is self-contained and grid compatible

Services:

  • data handling: SAMGrid, with a modified dCache copy
  • CAF: OSG-CAF, with modified batch submission

SLIDE 31


Summary

CDF data production is stable, currently at 3 TByte/day. Capacity is scalable by adding CPUs, I/O ports, and storage. Zero data loss, tolerance to error, and easy recovery.