Computing at LHC experiments in the first year of data taking at 7 TeV
Daniele Bonacorsi
[ deputy CMS Computing coordinator - University of Bologna, Italy ]
on behalf of ALICE, ATLAS, CMS, LHCb Computing
Growing up with Grids
✦ First Grid Deployment Board (GDB) in 2002
✦ LCG has collaborated with a number of Grid Projects
✦ EGEE, NorduGrid, and Open Science Grid (OSG)
✦ Coordination and service support for the operations of the 4 LHC experiments
✦ Distributed computing achieved by previous experiments
✦ A huge collaborative effort throughout the years, and massive cross-…
Grid Solution for Wide Area Computing and Data Handling
✦ ~150k CPU cores, hit 1M jobs/day
✦ >50 PB disk
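As a rough, purely illustrative consistency check of the first two figures (assuming every core stays busy and all jobs are of equal length), they imply an average job duration of a few hours:

\[
\frac{1.5\times10^{5}\ \text{cores}\times 86\,400\ \text{s/day}}{10^{6}\ \text{jobs/day}}\;\approx\;1.3\times10^{4}\ \text{s}\;\approx\;3.6\ \text{h per job}
\]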
✦ at Tier‐0/1/2 levels
✦ Result of a huge collaborative work
✦ Thanks to WLCG and site admins!
[Plot: 2006–2011 (Jul’06–Feb’11); annotation: 2010 data taking start at 7 TeV]
✦ Critical tests, per Tier, per experiment
✦ e.g. CMS defines a “site readiness” based on these tests (see the sketch below)
  ‐ Easy to be OK on some
  ‐ Hard to be OK on all, and in a stable manner...
[Plot: Sep’08–Mar’11, reaching ~ plateau; 2010 data taking start at 7 TeV marked]
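A minimal sketch of how a per-site readiness flag could be derived from such critical tests; the test names, the "all tests pass" daily rule, and the 80% stability threshold are illustrative assumptions, not the actual CMS Site Readiness algorithm.

# Illustrative sketch only: per-site readiness from critical-test results.
# Test names, the daily all-tests-pass rule and the stability threshold are
# assumptions for illustration, not the actual CMS Site Readiness metric.

CRITICAL_TESTS = ["job_submission", "data_transfer", "sam_availability"]

def ok_today(day_results):
    # a site is OK on a given day only if every critical test passed
    return all(day_results.get(test, False) for test in CRITICAL_TESTS)

def site_ready(history, min_ok_fraction=0.8):
    # require stability: OK on at least min_ok_fraction of the recent days
    ok_days = sum(ok_today(day) for day in history)
    return ok_days >= min_ok_fraction * max(len(history), 1)

# hypothetical usage: two good days, one day with a failed transfer test
history = [
    {"job_submission": True, "data_transfer": True, "sam_availability": True},
    {"job_submission": True, "data_transfer": False, "sam_availability": True},
    {"job_submission": True, "data_transfer": True, "sam_availability": True},
]
print(site_ready(history))   # False with the 0.8 threshold (only 2/3 days OK)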
✦ Tiered computing facilities to meet the needs of the LHC experiments
✦ It served the community remarkably well, evolutions in progress
[Diagram: “cloud” / full mesh]
“Data Challenges”: experiment-specific, independent tests
  - DC04 (ALICE, CMS, LHCb), DC2 (ATLAS): first full chain of computing models on grids
“Service Challenges”: since 2004, to demonstrate service aspects:
  - SC1, SC2 (network transfer tests)
  - SC3 (sustained transfer rates, DM, service reliability)
  - SC4 (nominal LHC rates, disk→tape tests, all T1, some T2s)
+ “Readiness/Scale Challenges”: Data/Service Challenges to exercise aspects
  - CCRC08 (phase I and II): readiness challenge, all exps, ~full computing models
  - STEP’09: scale challenges, all exps + multi-VO overlap, FULL computing models
  - More experiment-specific challenges...
Run the service(s): focus on real and continuous production use → pp+HI data taking
✦ At the beginning, a “good” weekend could
✦ a significant failure or outage for a fill
✦ Time in stable beams per week reached 40% only a few times
✦ Slower ramp has allowed predicted activities to be performed more frequently
✦ Means no service interruptions
✦ Very well serving the needs of the LHC experiments
✦ A joint and long commissioning and testing effort to achieve this
[Plot annotations: CCRC’08 challenge (phase I and II), STEP’09 challenge, ICHEP’10 Conference; 1 PB]
✦ ATLAS massive reprocessing campaigns
[Plot: ATLAS data transfer throughput (GB/s per day), Jan–Dec 2010; categories: MC transfers in clouds, Data consolidation (MC transfers extra-clouds), T0 export (incl. calib streams), Data brokering (Analysis data), User subscriptions; annotations: 2009 Data reproc, MC reproc, 2010 data taking start at 7 TeV, Data + MC reproc, Data taking + MC prod, 2010 pp reproc, PbPb reproc @T1s]
✦ Average: ~2.3 GB/s (daily average)
✦ Peak: ~7 GB/s (daily average)
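For scale, the daily average converts (simple arithmetic added here, decimal units assumed) to roughly:

\[
2.3\ \text{GB/s}\times 86\,400\ \text{s/day}\;\approx\;2\times10^{5}\ \text{GB/day}\;\approx\;200\ \text{TB/day},
\]

comparable in scale to the >200 TB/day of production transfers quoted for CMS below.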
✦ Can sustain up to >200 TB/day of production transfers on the overall topology
CMS improved by ad-hoc challenges of increasing complexity and by computing commissioning activities
✦ RAW data is replicated to one of the Tier-1 sites
[Plots: # done transfers, 325k; GB, 80k]
✦ New calibrations, improved software, new data formats
Pass-2 reco; HI reco: opportunistic usage of resources
4 reproc campaigns in 2010
✦ Feb’10: 2009 pp data + cosmics
✦ Apr’10: 2009/2010 data
✦ May’10: 2009/2010 data+MC
✦ Nov’10: full 2010 data+MC (from tapes)
+ HI reprocessing foreseen in Mar’11
(reprocessing passes only)
~ a dozen reproc passes in 2010
[Plot: Fraction Complete (normalised for each T1) vs Campaign Day (1–12), per T1 cloud (CA, DE, ES, FR, IT, ND, NL, UK, US), for ESD and dESD, AOD outputs]
✦ RAW → ESD
✦ ESD merge
✦ ESD → dESD, AOD
✦ Grid distribution of derived data
(1.5G evts)
Actually, from 7 days onwards mostly dealing with tails
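A toy sketch, just to make the staging of such a campaign explicit; the function names and trivial data model are invented for illustration and are not ATLAS software.

# Toy illustration of the reprocessing chain listed above:
# RAW -> ESD, ESD merge, ESD -> dESD/AOD, then Grid distribution of outputs.
# Everything here is invented for illustration only.

def raw_to_esd(raw_files):
    return ["ESD:" + f for f in raw_files]               # full reconstruction

def merge_esd(esd_files, group_size=2):
    return [esd_files[i:i + group_size]                  # fewer, larger files
            for i in range(0, len(esd_files), group_size)]

def esd_to_derived(merged_esd):
    return {"dESD": merged_esd, "AOD": merged_esd}        # reduced analysis formats

def distribute(derived, clouds):
    return {cloud: derived for cloud in clouds}           # replicate to T1 clouds

raw = ["raw_001", "raw_002", "raw_003", "raw_004"]
outputs = distribute(esd_to_derived(merge_esd(raw_to_esd(raw))), ["DE", "FR", "IT"])
print(sorted(outputs))                                    # ['DE', 'FR', 'IT']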
✦ Accounts for a large fraction of the global Grid usage
✦ One of the earliest Grid applications
✦ Fluctuations due to a range of causes
✦ E.g. LHCb: Simulation 50%, Analysis 29%, Reconstruction 21%
✦ E.g. CMS: initially on 50% of T2 resources, recently expanded to T1s as well
✦ A large fraction of the overall Grid usage
[Plot annotations: Average ~8.8k, Peak ~23k; Average ~12k, Peak ~27k]
✦ Simulation, reconstruction, stripping, WG analysis (μDST)
✦ >100 T2 sites stably supporting simulation
✦ LHCb hit ~140 T2s
✦ 4.2M jobs
[Plot period: Jun’09–Feb’10]
[Plot: # jobs, Feb’10–Feb’11, scale ~60k; each color is a regional “cloud” (including CERN); detail: digging into CNAF T2s; annotation: 2010 data taking start at 7 TeV]
Mostly T2s and some opportunistic T3s; since August 2010, T1s also
[Plots: Jan’10–Jan’11; each color is a Tier; scales: 20k, 600M]
Simulated data produced (Jul’09 → Feb’10)
✦ to match real data analysis
MC production (Jan’10 → Feb’11)
✦ being produced already in 2010
✦ Each framework implements an instance of this concept in different ways
✦ They manage creation, submission and tracking of jobs, and return results to users
✦ Efficiency of completion, CPU efficiency, user experience, status tracking, ...
[Workflow diagram, steps: local environment packaged → discovery and choice of site(s) with the desired data files or resources → successful submission through site grid interface → arrival at batch farm → authenticate user and VO → find local environment → data file or remote file → communicate job status and write results somewhere]
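A minimal, self-contained sketch of this common life cycle; class and method names are hypothetical and do not correspond to any experiment's actual framework (CRAB, Ganga, PanDA, AliEn, ...).

# Illustrative sketch of the generic distributed-analysis life cycle that each
# experiment framework implements in its own way. All names are hypothetical.

from dataclasses import dataclass, field

@dataclass
class Job:
    dataset: str
    site: str
    status: str = "created"
    results: list = field(default_factory=list)

class AnalysisFramework:
    def __init__(self, catalogue):
        self.catalogue = catalogue        # dataset -> sites hosting a replica
        self.queue = []

    def submit(self, user_task, dataset):
        sites = self.catalogue[dataset]   # discovery: which sites hold the data
        job = Job(dataset=dataset, site=sites[0])   # choice of a site with the data
        job.status = "submitted"          # submission through the grid interface
        self.queue.append((job, user_task))
        return job                        # handle used for status tracking

    def run_all(self):
        for job, task in self.queue:      # job arrives at the site batch farm
            job.status = "running"        # environment found, user/VO authenticated
            job.results = [task(evt) for evt in range(3)]   # toy event loop
            job.status = "done"           # status and results reported back

# Hypothetical usage
fw = AnalysisFramework(catalogue={"/MinBias/Run2010A": ["T2_IT_Legnaro"]})
job = fw.submit(user_task=lambda evt: evt * evt, dataset="/MinBias/Run2010A")
fw.run_all()
print(job.status, job.results)            # -> done [0, 1, 4]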
✦ Largest fraction of analysis computing at LHC is at the T2 level
✦ Flexibility of the transfer model helps to reduce the latency seen by the analysis end-users
[Plot (log scale): T1-T2 dominates, T2-T2 emerges]
>95% of the entire matrix commissioned
Up to 30 links commissioned per day; average ~7 links/day over the first 6 months of data taking
[Plot: # T2-T2 links used for data transfers monthly (not always the same ones), 2010; average: 7]
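For a sense of what “the entire matrix” means: a full mesh of N T2 sites has N(N−1) directed links, so even a modest N (the value below is illustrative, not the actual CMS site count) gives thousands of links to commission:

\[
N = 50 \;\Rightarrow\; N(N-1) = 50\times 49 = 2450\ \text{directed T2--T2 links}
\]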
[Plot: # running jobs; average ~1.7k]
Increasing trend in the # of end-user analysers (continues after Xmas)
[Plot annotation: Xmas + AliEn release, ‘alitrain’ only]
✦ User code is picked up and executed with other analyses
✦ >9M user jobs completed over the last 12 months
Each color is a user, blue is the total
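A minimal sketch of the “analysis train” idea behind ‘alitrain’: many user tasks are attached to one train, so each event is read from Grid storage once and handed to every task. Class and function names are illustrative, not the actual AliEn/ALICE analysis framework API.

# Minimal sketch of the "analysis train" idea (illustrative only): user code
# is picked up and executed together with other analyses over a single pass
# through the data, instead of each user reading the dataset independently.

class AnalysisTrain:
    def __init__(self):
        self.tasks = []                 # user tasks picked up by the operators

    def attach(self, task):
        self.tasks.append(task)         # task: callable(event) -> result

    def run(self, events):
        results = {i: [] for i in range(len(self.tasks))}
        for event in events:            # each event is read once...
            for i, task in enumerate(self.tasks):
                results[i].append(task(event))   # ...and seen by every task
        return results

# Hypothetical usage: two users share one pass over the same events
train = AnalysisTrain()
train.attach(lambda ev: ev > 2)         # user A: a toy selection
train.attach(lambda ev: ev * 10)        # user B: a toy observable
print(train.run(events=[1, 2, 3]))      # {0: [False, False, True], 1: [10, 20, 30]}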
2010 data taking start at 7 TeV
✦ After that, roughly stable load
  ‐ e.g. ICHEP’10
(only for pAthena‐Panda system; ganga‐WMS not counted)
[Plot: # jobs, Feb’10–Feb’11, scale ~20k]
✦ ~300-350 distinct daily users
✦ Up to >500 users per week during peaks
✦ >800 individuals per month
NOTE: weekends and Xmas “holes” are visible only in the distributed analysis pattern, not in scheduled processing (e.g. MC, re-reco)
[Plot annotation: Ad-hoc CMS Computing scale exercise focussed ...]
[Plot: # running user jobs, scale ~6k; 2010 data taking start at 7 TeV marked]
✦ Share by availability of resources and data
✦ Toy MC, private small simulations, etc.
Successful user jobs at T1s
[Plots: # jobs/hr, Mar’10–Feb’11 and Sep’10–Feb’11; sites: CERN, CNAF, GRIDKA, RAL, ...; averages: ~1.4k, ~310; each color is a user]
[Plot: Apr’10–Feb’11]
[ NOTE: the line is _not_ a fit ]
✦ is allowing wider access to Grid(s)
✦ is building more solid data analysis teams
✦ Able to cope with the load in all sectors
  ‐ (Rare) backlogs or (rare) service losses showed no impact whatsoever on physics
✦ Real data is collected, stored, reprocessed, skimmed, transferred
✦ Simulated data are produced, according to the physics needs
✦ Data and simulations are successfully delivered to analysis end-users
✦ If you worked on even some bits of this, YOU are part of this success
✦ Not all resources equally utilized in 2010
✦ A resource-constrained environment in 2011
✦ Enthusiasm and hope for discoveries are very high