Michael Ernst
DESY / CMS
Testing the Grid – Service & Data Challenges
LHC Data Analysis Challenges for the Experiments and 100 Computing Centres in 20 Countries
GridKa School Karlsruhe 15 September 2006
... as defined by LCG
Purpose: Develop, build and maintain a distributed computing environment for the storage and analysis of data from the LHC experiments
Ensure the computing service … and common application libraries and tools
Phase I – 2002-2005 – Development & planning
Phase II – 2006-2008 – Deployment & commissioning
The Collaboration:
~100 computing centres
12 large centres (Tier-0, Tier-1)
38 federations of smaller “Tier-2” centres
20 countries
Tier-0 – the accelerator centre
Canada – TRIUMF (Vancouver); France – IN2P3 (Lyon); Germany – Forschungszentrum Karlsruhe; Italy – CNAF (Bologna); Netherlands – Tier-1 (Amsterdam); Nordic countries – distributed Tier-1; Spain – PIC (Barcelona); Taiwan – Academia Sinica (Taipei); UK – CLRC (Oxford); US – Fermilab (Illinois), Brookhaven (NY)
Tier-1 – “online” to the data acquisition process; high availability
grid-enabled data service
Tier-2 – ~100 centres in ~40 countries; end-user analysis – batch and interactive
Resource split in 2008 – CPU: CERN 18%, all Tier-1s 39%, all Tier-2s 43%; Disk: CERN 12%, all Tier-1s 55%, all Tier-2s 33%; Tape: CERN 34%, all Tier-1s 66%
All experiments – 2008 (from LCG TDR, June 2005):
                      CERN   All Tier-1s   All Tier-2s   Total
CPU (MSPECint2000)      25            56            61     142
Disk (PetaBytes)         7            31            19      57
Tape (PetaBytes)        18            35             –      53
(the percentage split above follows from these totals; see the sketch below)
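The pie-chart percentages quoted above follow directly from the TDR totals in the table. A minimal sketch of that arithmetic; every number is copied from the table, nothing else is assumed:

    # Resource totals for 2008 from the LCG TDR (June 2005), as quoted in the table above
    resources = {
        "CPU (MSPECint2000)": {"CERN": 25, "All Tier-1s": 56, "All Tier-2s": 61},
        "Disk (PB)":          {"CERN": 7,  "All Tier-1s": 31, "All Tier-2s": 19},
        "Tape (PB)":          {"CERN": 18, "All Tier-1s": 35},
    }

    for resource, shares in resources.items():
        total = sum(shares.values())
        split = ", ".join(f"{site} {100 * value / total:.0f}%" for site, value in shares.items())
        print(f"{resource}: total {total} -> {split}")
    # CPU:  CERN 18%, Tier-1s 39%, Tier-2s 43%
    # Disk: CERN 12%, Tier-1s 54%, Tier-2s 33%  (the slide rounds the Tier-1 share to 55%)
    # Tape: CERN 34%, Tier-1s 66%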
LCG tier topology (diagram): Tier-1 centres – IN2P3, GridKa, TRIUMF, ASCC, Fermilab, Brookhaven, Nordic, CNAF, SARA, PIC, RAL – each serving a group of Tier-2 centres
Tier-0 and the Tier-1s are linked by 10 Gbit links on an Optical Private Network
T2s and T1s are inter-connected by the general purpose research networks
Any Tier-2 may access data at any Tier-1
Tiered architecture (diagram):
Experiment -> Online System: ~PByte/sec
Online System -> CERN Center (PBs of disk; tape robot): ~100-1500 MBytes/sec
CERN -> Tier-1 centres (FNAL, GridKa, INFN, RAL, …): 2.5-10 Gbps
Tier-1 -> Tier-2 centres: ~2.5-10 Gbps
Tier-2 -> Tier 3 (institutes, physics data cache) and Tier 4 (workstations): 0.1 to 10 Gbps
Tens of Petabytes by 2007-8; an Exabyte ~5-7 years later
CERN/Outside Resource Ratio ~1:2; Tier0/(Σ Tier1)/(Σ Tier2) ~1:1:1
Emerging Vision: A Richly Structured, Global Dynamic System
Tier-0 data flow (diagram): CPU farm -> disk buffer -> tape and export to the Tier-1s; each Tier-1 forwards data on to the other Tier-1s and to its Tier-2s
RAW     1.6 GB/file   0.02 Hz    1.7K f/day    32 MB/s   2.7 TB/day
ESD2    0.5 GB/file   0.02 Hz    1.7K f/day    10 MB/s   0.8 TB/day
AOD2    10 MB/file    0.2 Hz     17K f/day      2 MB/s   0.16 TB/day
AODm2   500 MB/file   0.004 Hz   0.34K f/day    2 MB/s   0.16 TB/day
RAW + ESD2 + AODm2 combined: 0.044 Hz, 3.74K f/day, 44 MB/s, 3.66 TB/day
Further streams between Tier-1s and to the Tier-2s: ESD1/ESD2 at 10 MB/s (0.8 TB/day) and AODm1/AODm2 at 18-20 MB/s (1.44-1.6 TB/day)
Plus simulation & analysis data flow (the per-stream arithmetic is checked in the sketch below)
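The per-stream bandwidths and daily volumes in the table follow from file size times file rate. A minimal check of that arithmetic, using only the stream sizes and rates quoted above:

    # Streams quoted above: (file size in MB, file rate in Hz)
    streams = {
        "RAW":   (1600.0, 0.02),
        "ESD2":  (500.0,  0.02),
        "AOD2":  (10.0,   0.2),
        "AODm2": (500.0,  0.004),
    }

    SECONDS_PER_DAY = 86400

    for name, (size_mb, rate_hz) in streams.items():
        mb_per_s = size_mb * rate_hz                   # e.g. RAW: 1600 MB * 0.02 Hz = 32 MB/s
        files_per_day = rate_hz * SECONDS_PER_DAY      # e.g. RAW: 0.02 * 86400 ~ 1.7K files/day
        tb_per_day = mb_per_s * SECONDS_PER_DAY / 1e6  # e.g. RAW: 32 MB/s over a day ~ 2.7 TB
        print(f"{name}: {mb_per_s:.0f} MB/s, {files_per_day / 1000:.2f}K files/day, {tb_per_day:.2f} TB/day")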
Tier-0 to Tier-1 flow – Accessing the data – Tier-1 to Tier-2 flow
real grid service – run for weeks/months at a time (not just limited to experiment Data Challenges)
availability, scalability, end-to-end performance
Jun-Sep 2006 – SC4 – pilot service
Autumn 2006 – LHC service in continuous operation – ready for data taking in 2007
A stable service on which experiments can make a full demonstration of the experiment offline chain
data recording, calibration, reconstruction
simulation, batch and end-user analysis
And sites can test their operational readiness
Extension to most Tier-2 sites
that we can run reliable services
– middleware, grid operations, computer centres, ….
90% site availability
90% user job success
to monitor, measure, debug
First data will arrive next year – NOT an option to get things going later
Test? Too ambitious?
End September 2006 - end of Service
8 Tier-1s and 20 Tier-2s
April 2007 – Service fully commissioned
All Tier-1s and 30 Tier-2s
BDII, MyProxy, VOMS, R-GMA, …. (EGEE)
(Chart: availability of Tier-1 sites, month by month from Jul-05 through spring 2006, percentage availability)
Tier-1 sites – basic tests only
Only partially corrected for scheduled down time
Not corrected for sites with less than 24 hour coverage
Average value of sites shown
... as it was declared by LCG in April 2006
3D distributed database services – development test deployment; stable service for experiment tests
SRM 2 – test and deployment plan being elaborated; October target
Additional functionality to be agreed, developed, evaluated, then tested and deployed ?? Deployment schedule ??
Timeline: 2006 – cosmics; 2007 – first physics; 2008 – full physics run
Pilot Services – stable service from 1 June 06
LHC Service in operation – 1 Oct 06, ramping up to full operational capacity & performance
LHC service commissioned – 1 Apr 07
Definition: A 50 million event exercise to test the workflow and dataflow associated with the data handling model of CMS (the steps are restated compactly in the sketch after this list)
Receive previously simulated events
Perform prompt reconstruction at Tier-0, including determination and application of calibration constants
Creation of Analysis Object Data (AOD)
Distribution of AOD to all participating Tier-1 Centers
Physics jobs running on AOD at some Tier-1 Centers
Skim jobs at some Tier-1 Centers with data propagated to Tier-2s
Physics jobs on skimmed data at some Tier-2 Centers
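The definition above is a chain of steps, each tied to a tier. A compact restatement of that chain, containing nothing beyond the steps already listed:

    # CSA06 workflow steps and where each runs, as listed above
    CSA06_WORKFLOW = [
        ("Tier-0",           "Receive previously simulated events"),
        ("Tier-0",           "Prompt reconstruction, incl. determination and application of calibration constants"),
        ("Tier-0",           "Creation of Analysis Object Data (AOD)"),
        ("Tier-0 -> Tier-1", "Distribution of AOD to all participating Tier-1 centers"),
        ("Tier-1",           "Physics jobs running on AOD at some Tier-1 centers"),
        ("Tier-1 -> Tier-2", "Skim jobs at some Tier-1 centers, output propagated to Tier-2s"),
        ("Tier-2",           "Physics jobs on skimmed data at some Tier-2 centers"),
    ]

    for tier, step in CSA06_WORKFLOW:
        print(f"{tier:18s} {step}")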
A 25% capacity test of what we need in 2008
Demonstrate Workflow
Primary goal: Test the data handling model
Demonstrate Dataflow
The main exercise of SC4
Demonstrate production-grade reconstruction
Includes Calibration and Detector performance
Provide Services to a wide User Community
Simulate 50M Events
Mass Production started end of July
Runs on Grid Resources only
4 Teams:
UWM running on OSG Resources
CIEMAT running on LCG Resources
Aachen/DESY running on LCG Resources
INFN-Bari running on LCG Resources
1200 CPUs
In the challenge, as in real data taking, the item that drives much of the provisioning is the total data rate out of CERN: at 40 Hz this is ~75 MB/s
(incl. factor of 2 in provisioning -> 3Gbps for outgoing networking out of CERN in CSA06)
180 TB
CSA06 data sample is ~50M events (cycled twice), including the raw and reconstructed events (see the sketch below)
Demonstrate prompt reconstruction at 25% of HLT bandwidth (~40Hz)
Weeks of running at sustained rate
Efficiency
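A rough cross-check of the Tier-0 numbers above. The per-event size is inferred from the quoted 75 MB/s at 40 Hz; the ~150 MB/s aggregate used for the network estimate is an assumption taken from the aggregate Tier-1 target quoted later in the talk, so treat this as an illustration rather than the official accounting:

    # Quoted CSA06 Tier-0 figures
    export_rate_mb_s = 75.0    # MB/s out of CERN at 40 Hz (quoted above)
    event_rate_hz = 40.0       # ~25% of the HLT bandwidth
    events_total = 100e6       # ~50M events, cycled twice

    event_size_mb = export_rate_mb_s / event_rate_hz        # ~1.9 MB per event (RAW + reconstructed), implied
    sample_size_tb = events_total * event_size_mb / 1e6     # ~190 TB, consistent with the quoted ~180 TB
    gbps_single_stream = export_rate_mb_s * 8 / 1000        # ~0.6 Gbps for this stream alone
    gbps_provisioned = 2 * 150 * 8 / 1000                   # assumed ~150 MB/s aggregate export, factor 2
                                                             # provisioning -> ~2.4 Gbps, of the order of the
                                                             # ~3 Gbps quoted for CSA06
    print(f"{event_size_mb:.2f} MB/event, {sample_size_tb:.0f} TB sample, "
          f"{gbps_single_stream:.1f} Gbps raw, {gbps_provisioned:.1f} Gbps provisioned")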
2400 CPUs across all Tier-1 Centers
Tier-0 Center is ~20% of the computing resources; Tier-1 Centers are 40% by the computing model ratio
Ideally Tier-1 Centers (incl. CAF) would all provide at least 300 CPUs
70 TB Disk (nominal size Tier-1)
Allow storage of a large fraction of data on disk, while exercising faults to tape
Expected Performance WN - SE (Disk Cache): 800 MB/s
Based on 1 MB/s per batch slot; exercised and documented in CSA06; Castor and dCache have shown good performance
Goal for CSA06 is 12k jobs/day at Tier-1 Centers (see the sketch below)
Anticipated job success rate: goal 90%, threshold 70%
The job submission infrastructure is currently the bottleneck
At the current resource level this means running 2-3 hour jobs to meet the goal
Primary goal is to keep the existing resources productive at the 25% scale
Scaling tape rates by pledge, aiming for 160 MB/s
Network provisioning should be at least twice the rate
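A quick sanity check of the Tier-1 job and I/O targets. The average job length and the reading of the 800 MB/s figure as ~800 concurrently running batch slots are assumptions for illustration:

    # Quoted CSA06 Tier-1 targets
    jobs_per_day_goal = 12_000
    avg_job_hours = 2.5          # assumed: the slide says 2-3 hour jobs
    io_per_slot_mb_s = 1.0       # quoted: 1 MB/s per batch slot

    # Concurrent batch slots needed to sustain 12k jobs/day with 2-3 hour jobs
    slots_needed = jobs_per_day_goal * avg_job_hours / 24   # ~1250 slots

    # WN - SE bandwidth implied by the per-slot I/O figure;
    # mapping the quoted 800 MB/s to ~800 running slots is an assumption
    slots_for_800mb_s = 800 / io_per_slot_mb_s

    print(f"~{slots_needed:.0f} concurrent slots for {jobs_per_day_goal} jobs/day; "
          f"800 MB/s corresponds to ~{slots_for_800mb_s:.0f} slots at {io_per_slot_mb_s} MB/s each")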
Tier-1 Performance Metric
Number of participating Tier-1 Centers – Goal: 7, Threshold: 5
Demonstrate calibration/analysis jobs at Tier-1 Centers
Demonstrate writing of new calibration constants into an offline database
Demonstrate re-reconstruction from some raw data as part of the exercise
Demonstration of a skim job at each Tier-1 Center
Automatic transfer of skim job output to Tier-2 Centers
40% of the computing resources are located at Tier-2 Centers
CMS assumes 25 full size centers => ~100 CPUs per Tier-2
~10 TB Disk
Expected Performance WN - SE (Disk Storage): 200 MB/s
Network estimates for Tier-2 vary widely
CMS Computing Model defines the expected minimum in 2008 at 1 Gbps
Given the end-to-end transfers experienced in SC4, it makes sense to try much larger scale tests at some Tier-2 Centers
Perform ~40k jobs/day in CSA06 (see the sketch below)
Analysis and centrally submitted production applications
Primary goal of CSA06 is to make efficient use of the resources for production and analysis
Submission infrastructure is known to be the bottleneck
Job completion efficiency can be low under high stress conditions
Goal is to demonstrate we have both of these under control in CSA06
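Per site, the 40k jobs/day target implies fairly short jobs on the assumed ~100 CPUs per Tier-2. The 25-site count is the computing-model assumption quoted above; the resulting job length is only an illustration:

    # Quoted CSA06 Tier-2 figures
    jobs_per_day_total = 40_000
    n_tier2_sites = 25            # "CMS assumes 25 full size centers"
    cpus_per_site = 100           # implied ~100 CPUs per Tier-2

    jobs_per_site_per_day = jobs_per_day_total / n_tier2_sites              # 1600 jobs/day/site
    cpu_hours_per_site_per_day = cpus_per_site * 24                         # 2400 CPU-hours/day
    max_avg_job_hours = cpu_hours_per_site_per_day / jobs_per_site_per_day  # ~1.5 h/job to keep up

    print(f"{jobs_per_site_per_day:.0f} jobs/day per site; average job length must stay below "
          f"~{max_avg_job_hours:.1f} hours to keep {cpus_per_site} CPUs busy")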
Tier-2 Performance Metric
Number of participating sites – Goal: 20, Threshold: 15
Demonstration of a user analysis job at each Tier-2
Tier-0 to Tier-1 (Tape)
Individual goals for each Tier-1, aggregate is 160 MB/s
Goal: 25% of 2008 rates
Tier-1 to Tier-1
No such dataflow in CSA06
Tier-1 to Tier-2
Goal: 20 MB/s into each Tier-2 Center
Threshold: 5 MB/s
Overall success is to have 50% of participating sites meet the goal (see the sketch below)
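Two small derived numbers from the transfer targets above: the daily volumes the sustained rates correspond to, and the 2008 aggregate implied by "25% of 2008 rates". The latter is an inference from the quoted figures, not a number stated on the slide:

    SECONDS_PER_DAY = 86400

    t0_t1_aggregate_mb_s = 160   # quoted aggregate Tier-0 -> Tier-1 (tape) goal
    t1_t2_goal_mb_s = 20         # quoted per-Tier-2 goal
    t1_t2_threshold_mb_s = 5     # quoted per-Tier-2 threshold

    print(f"T0->T1 aggregate: {t0_t1_aggregate_mb_s * SECONDS_PER_DAY / 1e6:.1f} TB/day")
    print(f"T1->T2 per site:  goal {t1_t2_goal_mb_s * SECONDS_PER_DAY / 1e6:.2f} TB/day, "
          f"threshold {t1_t2_threshold_mb_s * SECONDS_PER_DAY / 1e6:.2f} TB/day")
    # If 160 MB/s is 25% of the 2008 rate, the implied 2008 aggregate is ~640 MB/s
    print(f"Implied 2008 aggregate: {t0_t1_aggregate_mb_s / 0.25:.0f} MB/s")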
To grow from 25k jobs/day to 50k jobs/day we need:
25k jobs/day is already a strain on the current RB infrastructure
Job Robots are ideal users (flat load over a 24h period); real users generate unexpected usage patterns and load
A switch to the gLite RB with bulk submission is needed
Deployment came later than we had expected, in May
Close to being fully commissioned in CMS for CSA06
Bulk submission (100 jobs per bulk)
With a shared input sandbox, the submission time is proportional to the number of files
Packing everything in one shared tarball suppresses the job submission time to a constant (see the sketch below)
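A minimal sketch of the shared-tarball idea: bundle the common input files once, so each job in the bulk carries only a tiny job-specific configuration and the submission payload no longer grows with the number of files per job. File names and the splitting parameters here are hypothetical; the actual packaging is done by the submission tools:

    import tarfile
    from pathlib import Path

    # Hypothetical common input files shared by every job in the bulk
    common_files = ["analysis_config.py", "calibration.db", "helper_lib.py"]
    for name in common_files:
        Path(name).touch()       # stand-ins so the sketch runs end to end

    # Pack them once; the shared sandbox no longer grows with the number of jobs,
    # so the per-bulk submission time stays (roughly) constant.
    with tarfile.open("shared_sandbox.tar.gz", "w:gz") as tar:
        for name in common_files:
            tar.add(name)

    # Each of the (up to 100) jobs in the bulk only needs a small per-job snippet,
    # e.g. which slice of events it should process.
    def per_job_config(job_index, events_per_job=1000):
        first = job_index * events_per_job
        return f"firstEvent={first}\nmaxEvents={events_per_job}\n"

    for i in range(100):
        Path(f"job_{i:03d}.cfg").write_text(per_job_config(i))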
Need to enable users to perform actual work
CMS Remote Analysis Builder (CRAB): interface to the new CMS Data Management and job configuration implemented
08/03/2006 – Oracle DB maintenance
pool crashes fixed; accept FNAL only
CMS Tier-0 workflow (diagram): the Experiment at P5 (Cessy) sends RAW data and indices to Meyrin; Tier-0 input buffers feed Repack and Prompt Reconstruction; merged RAW and RECO go to Castor tape archiving and to the Tier-0 export buffers for Tier-1 export
3 steps for using CRAB for data analysis:
Create CRAB jobs:
split the given total number of events into separate jobs using the number of events per job
create n jobs
Submit CRAB jobs from the UI
submit created jobs to the grid
Retrieve CRAB jobs
retrieve job output
One single command to start CRAB analysis
Look & feel very similar to LCG job handling (see the sketch below)
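The three steps above map onto three invocations of the crab command driven by a crab.cfg file. A minimal sketch of that sequence, wrapped in Python for illustration; the option names and the configuration fragment shown are assumptions based on CRAB usage of that era, so check them against your CRAB installation:

    import subprocess

    # Hypothetical crab.cfg fragment: dataset name and number of events, as described above
    CRAB_CFG = """
    [CMSSW]
    datasetpath = /SomeDataset/CMSSW_1_0_X/RECO   # hypothetical dataset name
    total_number_of_events = 10000
    events_per_job = 1000
    """

    def crab(step):
        """Run one CRAB step (create / submit / getoutput) from the UI."""
        subprocess.run(["crab", f"-{step}"], check=True)

    if __name__ == "__main__":
        with open("crab.cfg", "w") as f:
            f.write(CRAB_CFG)
        for step in ("create", "submit", "getoutput"):   # the three steps listed above
            crab(step)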
USER provides in the cfg file: dataset name, n events to analyze
DBS/DLS
List of sites where datasets are hosted
Query DBS – it returns the dataset information
CRAB queries each local DLS to obtain, for example:
ContactString=http://t2-cms-dls.desy.de..  ContactProtocol=HTTP  CatalogueType=XML
SE=srm-dcache.desy.de  CE=grid-ce.desy.de  Nevents=7000  RunRange=1-5
SE=srm-dcache.desy.de  CE=grid-ce.desy.de  Nevents=7000  RunRange=20-400
If (at least) one catalog entry found and n_events > user_n_event
CRAB creates and submits jobs to the GRID
Requirements in the JDL: CMSSW version and data location
CRAB creates init files (the splitting step is sketched below)
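A minimal sketch of the splitting step described above: given the events requested by the user and the per-block event counts returned by the DLS query, produce one job per slice, each tied to a site hosting the data. The block list mirrors the example entries above; the splitting rule itself is a simplification of what CRAB actually does:

    # Catalogue entries as in the example above (SE/CE, events, run range)
    blocks = [
        {"se": "srm-dcache.desy.de", "ce": "grid-ce.desy.de", "nevents": 7000, "runrange": "1-5"},
        {"se": "srm-dcache.desy.de", "ce": "grid-ce.desy.de", "nevents": 7000, "runrange": "20-400"},
    ]

    def split_jobs(blocks, total_events, events_per_job):
        """Create job descriptions until total_events are covered (simplified splitting)."""
        jobs, remaining = [], total_events
        for block in blocks:
            available = block["nevents"]
            while available > 0 and remaining > 0:
                n = min(events_per_job, available, remaining)
                jobs.append({"ce": block["ce"], "runrange": block["runrange"], "events": n})
                available -= n
                remaining -= n
        return jobs

    for i, job in enumerate(split_jobs(blocks, total_events=10000, events_per_job=3000)):
        print(f"job {i}: {job['events']} events, runs {job['runrange']}, CE {job['ce']}")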
CMS Analysis Jobs Bandwidth Requirements (Disk)
130 jobs at 800 MB/s ⇒ ~6 MB/s per job
ExDSTAnalysis running on Dual (single core) Opteron 2.4 GHz
2 ExDSTAnalysis processes running
Total delivered by srm-dcache instance
15 March 2006 (DESY Tier-2 Resources)
Period: 06/13 – 06/23
Period: 07/01/2006 – 08/31/2006
problems and difficulties are