CMS Data Transfer tests towards LHC data taking
D Bonacorsi
CMS Facilities Infrastructure Operations, INFN-CNAF, Bologna, Italy
On behalf of the CMS experiment
*Thanks* to all site people and operators.
Main CMS workflows (simplified)
[Diagram labels: Calibration; Prompt reco; Skims, re-reco; MC prod upload; Skims download; Data Transfer (repeated)]
In this talk: focus on distributed transfer only
PhEDEx
Physics Experiment Data Export
reliable scalable data replication system for HEP experiments
CMS is exercising the data placement system since
PhEDEx fully interfaced also with gLite FTS since yrs
CMS data transfers in a nutshell:
T T T T s T s T s s T T s s in current PhEDEx transfer topology
Data transfers operated as if experiment was already running
we have had service outages exceeding hrs in the last yrs
Current transfers at GB/s, TB/day, PB/month global average rate
average CMS file size likely to be GB; k files/day
Evolution of a LoadTest from early on
[Plot: Jan 2007 to mid-March 2007]
A flexible infrastructure to generate data transfer traffic among CMS Tiers
fake but real
No real physics files; fully PhEDEx-compliant though
activity since mid-February
full cycles, weeks each, before Jun; then extended into DDT/CSA preparation activities: TT tape, TT, TT regional, TT non-regional
Start moving
Start walking
[Plots: Cycle-1 and Cycle-2; time ranges Jan 2007 to mid-March 2007 and Jan 2007 to mid-April 2007]
Walk better and faster
[Plots: time ranges Sep 2006 to today and Jan 2007 to mid-April 2007]
CMS LoadTest 2007
CMS CSA06: ~1 PB in ~1 month to participating Tiers
>12 PB in ~6 months
among all PhEDEx Tiers joining the LoadTest
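As a rough cross-check of the two volumes quoted above, the following minimal sketch converts "volume moved in a given time" into an average rate. It assumes decimal units (1 PB = 10^15 bytes) and a nominal 30-day month; both conventions, and the function name, are assumptions made for illustration and are not taken from the slides.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30  # assumption: nominal month length


def average_rate_mb_s(volume_pb: float, months: float) -> float:
    """Average rate in MB/s for `volume_pb` petabytes moved in `months` months."""
    total_bytes = volume_pb * 1e15  # decimal petabytes (assumption)
    seconds = months * DAYS_PER_MONTH * SECONDS_PER_DAY
    return total_bytes / seconds / 1e6  # bytes/s -> MB/s


csa06_rate = average_rate_mb_s(1, 1)      # CSA06: ~1 PB in ~1 month
loadtest_rate = average_rate_mb_s(12, 6)  # LoadTest 2007: >12 PB in ~6 months

print(f"CSA06 average:    ~{csa06_rate:.0f} MB/s")
print(f"LoadTest average: ~{loadtest_rate:.0f} MB/s")
```

Under these assumptions the LoadTest average comes out at about twice the CSA06 average, sustained over a period six times longer.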
[ courtesy of L.Tuura, CHEP07 ]
2007/Q12
Data transfers: grandview
Status as of mid: With yr to LHC startup, CMS approaches the real transfers in scale, but not yet in full complexity
From reliable transfers over the full transfer mesh to multi-VO exercises…
The progress is evident, though. Main sources of this are:
A well-designed, robust, scalable transfer system
A remarkable manpower investment to commission the transfer system
Continuing efforts are needed on debugging data transfers
[ courtesy of L.Tuura ]
Screenshot as of Sep 2007
Quality of transfers
Clear improvement in Tiers' participation in test transfers since LT (LoadTest) started
Still no evident improvement in quality though
It's not one problem that lacks a solution: there is a wide span of them
Greenish quality plots (successful transfers with fewer retrials) only when storage at both ends, network, PhEDEx setup, site config and operators all work fine… simultaneously!
[Plot, Jan06 to Jun07, with CSA06 and LoadTest periods marked: # Tiers making some successful transfers; # Tiers making transfers at >50% quality]
Positive note: now stably at a ~challenge traffic load
Need to: keep sites exercised + debug and improve quality
Debugging Data Transfers
A CMS program to maintain a high-quality transfer network
to be handed over to CMS Data Operations
Debug/commission transfer links among CMS Tiers
a Task Force (DDTTF) has been in charge since July
Joint effort with CMS Facilities/Commissioning, T liaisons, PhEDEx Central Ops, FTS/SRM experts, network experts, site admins…
Troubleshooting by Tiers: work by milestones
focus on watching logs, pinging site admins, fixing problems
commission TT, T downlinks to T, T uplinks to T…
Infrastructural issues by Facilities/Network projects
Overall activity: DDTTF work on deliverables
E.g. a real-time status map with reasons of all TierX-TierY links
E.g. a number of documented success stories in troubleshooting
DDT: gain confidence (started in July 07)
The first DDT metric
First step was to define and implement a metric by which links can become commissioned and subsequently handed over to Data Operations
There are several stages through which a link can get commissioned:
NOT-TESTED: links never actually tested
i.e. showing no successful transfer attempts within PhEDEx
PENDING-COMMISSIONING: links that have transferred successfully at least files in PhEDEx, but have not yet passed the reqs below
COMMISSIONED: links that are demonstrated to work * and can be delivered to Data Ops
day period
volume of TB, to match service business hrs support at Ts
PROBLEM-RATE: links that were working but whose rate has dropped off
every days. Otherwise the link must be re-commissioned by following the procedure above
* i.e. this does not imply that the link or the site has met the reqs of the CMS Computing Model, but simply that the link has passed some minimum reqs to be considered usable for Data Operations, namely for production-quality activity, not tests
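The four link stages listed on this slide can be read as a small classification rule. The sketch below is one possible reading, not the DDT implementation: the record fields, the function name and the `min_files` threshold are hypothetical (the actual thresholds are not recoverable from the slide), and links below the file threshold are simply treated as NOT-TESTED, which is a simplification.

```python
from dataclasses import dataclass
from enum import Enum


class LinkState(Enum):
    NOT_TESTED = "NOT-TESTED"
    PENDING_COMMISSIONING = "PENDING-COMMISSIONING"
    COMMISSIONED = "COMMISSIONED"
    PROBLEM_RATE = "PROBLEM-RATE"


@dataclass
class LinkHistory:
    """Hypothetical per-link summary; field names are illustrative only."""
    files_ok: int               # successfully transferred files seen so far
    passed_commissioning: bool  # has the link ever met the commissioning reqs?
    kept_rate: bool             # has it kept meeting the periodic rate check since?


def classify(link: LinkHistory, min_files: int) -> LinkState:
    """Map a link summary onto one of the four stages (assumed reading)."""
    if link.files_ok < min_files:
        return LinkState.NOT_TESTED  # simplification, see lead-in
    if not link.passed_commissioning:
        return LinkState.PENDING_COMMISSIONING
    if link.kept_rate:
        return LinkState.COMMISSIONED
    return LinkState.PROBLEM_RATE  # was working, rate has dropped off


# Made-up threshold, for illustration only.
MIN_FILES = 5
print(classify(LinkHistory(10, True, False), MIN_FILES).value)
```

A link in PROBLEM-RATE goes back through the commissioning procedure, which in this sketch corresponds to resetting `passed_commissioning`.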
Legend:
ROW-COLUMN: upper half of box
COLUMN-ROW: lower half of box
States:
T1-T1, T1-T2
[ … plus many more … ]
Steady influx of new links…
The DDT status in late
number of COMMISSIONED links that were in danger of decommissioning within the next two days
Going beyond
The requirements/thresholds in this metric:
were developed with the idea of having a higher threshold to commission than to decommission the link
can be increased in time as networks and sites develop
the rates implied by GB/day are of the order of MB/s per link, far below the commitments envisioned in the Computing Model
Ts being able to download a total of up to TB/day from T sites, or over MB/s sustained downloads
match the idea that transfers would be at continuous rate over several days
The Computing Model actually envisions that transfers will occur in bursts
the metric used during CSA07 deviated from this model to prove the stability of data transfer links
It was worth a metric revision later. But first, see what happened with this one metric!
Before DDT started
(1 month, May 2007)
After DDT started
(1 month, Oct 2007, during CSA07)
The plots show the fraction successes/attempts in file transfers. A clear improvement in the number and quality of data transfer links is seen soon after DDT started (Jul 07).
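The quality figure shown in these plots is the ratio of successful to attempted file transfers per link. A minimal sketch of that ratio follows, together with a count of links above 50% quality; the site names and counts are made up for illustration and are not CMS data.

```python
from typing import Optional


def link_quality(successes: int, attempts: int) -> Optional[float]:
    """Fraction successes/attempts, or None for a link with no attempts."""
    if attempts == 0:
        return None
    return successes / attempts


# Hypothetical (source, destination) -> (successes, attempts) sample.
sample = {
    ("SiteA", "SiteB"): (90, 100),
    ("SiteA", "SiteC"): (20, 80),
    ("SiteB", "SiteC"): (0, 50),
    ("SiteC", "SiteA"): (0, 0),
}

quality = {link: link_quality(ok, tried) for link, (ok, tried) in sample.items()}

# Links with at least one success, and links above 50% quality.
with_success = [link for link, q in quality.items() if q]
above_half = [link for link, q in quality.items() if q is not None and q > 0.5]

print(len(with_success), "links with some successful transfers")
print(len(above_half), "links at >50% quality")
```

Keeping "no attempts" distinct from "0% quality" matters here, since an untested link and a failing link call for different follow-up.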
In the meantime…. CCRC’08/phase-1 (WLCG Common-VO Computing Readiness Challenge)
Plans
TT: CERN, FNAL, FZK, IN2P3, PIC
TT: FNAL, FZK, IN2P3, PIC export to other Ts among the
TT: CNAF, RAL, ASGC, regional T
TT: IT, UK, Taiwan Ts outbound traffic to CNAF, RAL, ASGC respectively
TT: CERN, CNAF, RAL, ASGC
TT: CNAF, RAL, ASGC export to other Ts among the
TT: FNAL, FZK, IN2P3, PIC, regional T
TT: US, Germany, France, Spain Ts outbound traffic to FNAL, FZK, IN2P3, PIC
TT: CERN, all Ts (golden week for running concurrently with ATLAS)
TT: in principle nothing (repetition slot)
TT: in principle nothing (repetition slot)
TT: in principle nothing (repetition slot)
TT: in principle nothing (free for further superimposition with ATLAS)
TT: full or repetition slot
TT: full or repetition slot
TT: full or repetition slot
https:twikicernchtwikibinviewCMSCCRCPhaseITestTransfersOperations
SRMv2 deployment status for CMS Tiers
At the start of CCRC:
(week-1 day-1)
Situation much improved in all regions, and faster than expected, during CCRC weeks-1/2/3:
https:twikicernchtwikibinviewCMSTierSRM
End of CCRC week-3:
(week-3 day-7)
from: https:twikicernchtwikibinviewCMSCCRCPhaseITestTransfersOperations
TT targets
[Plot: CMS, CCRC Week-1 to Week-4; reference rates 500 MB/s and 1.5 GB/s]
CERN outbound traffic (PhEDEx)
T0-T1
Grandview on T0-T1 in CCRC’08/Feb
[Plot over Week-1 to Week-4, with Start and End marked; annotations: CCRC F2F meeting; delay in LT set-up; issues with SRMv2 @CERN; not all 7 T1s; all 7 T1s]
T0-T1
Zoom on week: all T1s together
[Plot annotations: PhEDEx issue; PhEDEx issue + slow support responses; nominal CERN-outbound rate for CMS]
Summary of T0-T1 tests
from: https:twikicernchtwikibinviewCMSCCRCPhaseITestTransfersOperations
T1-T1 targets
Note: CCRC targets for T1-T1 are EXPORT targets
T1-T1
Grandview on T1-T1 in CCRC’08/Feb
[Plot: Week-1 to Week-4]
CNAF to all T1s
CNAF to T1: ASGC, FZK, FNAL, PIC, IN2P3, RAL
MB/s/day for days in a row
Legend: colours = match the plot; underlined = in another continent
Overall T1-outbound to T1s: 6 x 6 MB/s
[Plot: Week-1 to Week-4]
[ an example of a T1 exporting successfully to all other 6 T1s ]
Summary of T1-T1 tests
from: https:twikicernchtwikibinviewCMSCCRCPhaseITestTransfersOperations
T1-T2 targets
FNAL to regional T2s
Target: 117 MB/s/day for 3 days in a row
FNAL to T2: Caltech, Nebraska, Wisconsin, Florida, Purdue, MIT, UCSD, UERJ
Aggregate FNAL-outbound rate to regional T2s is OK. Very good quality!
[Plot: Week-1 to Week-4]
[ an example of a region with stable T1 outbound traffic to regional T2s ]
Summary of T1-T2 tests
from: https:twikicernchtwikibinviewCMSCCRCPhaseITestTransfersOperations
TT targets
Summary of TT tests
PhEDEx-driven traffic in CCRC’08/Feb (tests only)
PB moved by CMS in CCRC’08/Feb
GB/s
When a challenge ends… … a debugging data transfer program continues.
DDT decommissioning metric for
Commissioning (in PhEDEx Debug instance):
TT and TT downlink: MB/s for day
in terms of data volume: TB in less than h
TT uplink: MB/s for day
in terms of data volume: TB in less than h
Decommissioning:
On complaint from DataOps, or…
Periodic exercising of transfers by DDTTF (in PhEDEx Debug instance):
From each T, check one T cross link and two T downlinks each day
Exercise each link in rotation at commissioning rate
If link cannot pass in days in a row, then decommission
decommission = deactivate the path in PhEDEx Prod topology
scheduled downtimes, central PhEDEx failures, holidays exempt…
New metric effective Feb th
Link exercising began Feb th also
if a link passed in the two previous weeks it automatically passes the exercising
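The decommissioning rule on this slide (a run of failed days, with some days exempt) can be sketched as a streak counter. This is an illustrative reading only: the names are hypothetical, the number of days is a parameter because the slide's value is not recoverable, and treating exempt days as neither extending nor resetting the streak is an assumption.

```python
from enum import Enum
from typing import Iterable


class Day(Enum):
    PASS = "pass"
    FAIL = "fail"
    EXEMPT = "exempt"  # scheduled downtime, central PhEDEx failure, holiday


def should_decommission(days: Iterable[Day], max_failed_days: int) -> bool:
    """True if the link fails `max_failed_days` non-exempt days in a row."""
    streak = 0
    for day in days:
        if day is Day.EXEMPT:
            continue  # assumption: exempt days leave the streak untouched
        if day is Day.FAIL:
            streak += 1
            if streak >= max_failed_days:
                return True
        else:
            streak = 0  # a passing day resets the streak
    return False


# Made-up history: a failure, an exempt day, then two more failures.
history = [Day.FAIL, Day.EXEMPT, Day.FAIL, Day.FAIL]
print(should_decommission(history, max_failed_days=3))
```

The automatic pass for links that succeeded in the two previous weeks could be modelled by recording those days as `Day.PASS` before calling the function.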
Actual DDTTF procedure
GOAL: to exercise and pass (or, if it fails, recommission) every previously commissioned link
Procedure: https:twikicernchtwikibinviewCMSDDTLinkExercising
rotation of active links to/from each T, trying not to overload any one site
Three somewhat equal groups for testing:
Group 1: links beginning or ending at FNAL or ASGC: links
Group 2: links beginning/ending at IN2P3, FZK, RAL and not in G1: links
Group 3: all other links, mainly CERN, CNAF, PIC: links
Injection rate increased (tmp), or one-time injection, for links exercising
Site administrators informed of progress or problems
Production transfers count towards metric goals
DDT Exercising Matrix under development by DDTTF
Green: passed exercising metric in past weeks
Yellow: commissioned, though still not passed and/or exercised in past weeks
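The three exercising groups on this slide are defined by link endpoints, which lends itself to a short sketch. The function names, the example links (including the placeholder site `T2_Example`) and the "one group per week, cycling" rotation are assumptions made for illustration; the slide only states the grouping and that links are exercised in rotation.

```python
from typing import List, Tuple

Link = Tuple[str, str]  # (source site, destination site)

GROUP1_SITES = {"FNAL", "ASGC"}
GROUP2_SITES = {"IN2P3", "FZK", "RAL"}


def exercise_group(source: str, destination: str) -> int:
    """Assign a link to exercising group 1, 2 or 3 from its endpoints."""
    endpoints = {source, destination}
    if endpoints & GROUP1_SITES:
        return 1  # begins or ends at FNAL or ASGC
    if endpoints & GROUP2_SITES:
        return 2  # begins or ends at IN2P3, FZK, RAL and not in group 1
    return 3      # all other links, mainly CERN, CNAF, PIC


def links_for_week(links: List[Link], week_index: int) -> List[Link]:
    """Assumed rotation: one group per week, cycling 1, 2, 3."""
    group = week_index % 3 + 1
    return [link for link in links if exercise_group(*link) == group]


# Hypothetical links, for illustration only.
links = [
    ("CERN", "FNAL"),
    ("FNAL", "RAL"),
    ("RAL", "CNAF"),
    ("CERN", "PIC"),
    ("PIC", "T2_Example"),
]
for week in range(3):
    print(week, links_for_week(links, week))
```

Checking group 1 before group 2 is what implements the "and not in G1" clause: a FNAL-RAL link lands in group 1 only.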
How to monitor DDT daily today
e.g. screenshot for CERN/T1 links only
[Plot annotation: CNAF scheduled downtime]
Status of DDT link commissioning
as of last week
Requirements: from the Site Commissioning document
Specifies the CMS reqs for CMS Tiers
not only for transfers
T Link Commissioning Requirements: CERN-T and T-CERN are commissioned, plus TT downlinks, TT bidirectional links
T Link Commissioning Requirements: uplink to associated T commissioned, plus downlinks from Ts
Status: COMMISSIONED links as of last week
all CERN-T, TT crosslinks COMMISSIONED
TT downlinks
Ts pass link section of Site Commissioning reqs:
ASGC, CERN, CNAF, FNAL, FZK, IN2P3, PIC, RAL
TT uplinks
Ts have a commissioned uplink to the associated T
Of these, have at least commissioned TT downlinks
TT crosslinks not in Computing Model
Troubleshooting and support
Problems found are:
Rough overview of problems: problems over last few weeks
Savannah CMS Computing Infrastructure support project
Specific DDT squad and category
Sites notify problems
Ops people steer the process and assign tickets to relevant and available experts
Experts feel ownership, problems get solved and tracked
TODO: Ops people review problems tracked and extract certified solutions, FAQ…
[ courtesy of L.Tuura ]
2007 2008
… and this is the overall effect.
What's next in DDT?
About of links had real issues when it came time to exercise, so this activity seems useful and important
Is it still a long path?
Yes. Despite solid progress and the experience gained in running daily
TT matrix is complete, gains are stable. But…
Only of T downlinks commissioned
Only of T uplinks commissioned
Only active T pass Site Commissioning Link Requirements
Also other T sites not transferring any data
DDT can exercise links/week only
Is once per weeks often enough?
Is more frequent exercising possible with current manpower and T throughput overhead of MB/s?
Probably not. May think of smaller continuous exercises (à la heartbeat) though
Focus on uncommissioned T sites with just a few commissioned links, investigate bottlenecks there, and complete the T picture
Better this than to encourage sites with many commissioned links to complete their connectivity… that will come later
Summary
Thanks to all CMS people and site admins for the excellent and constant work to support CMS activities
Constantly increasing involvement of Tiers in transfer operations since yrs
Scale improvements in data transfers
First visible boost in performance of data transfers also
Focus on achieving stable improvements through clear procedures
Both CMS-specific and multi-VO challenges felt as useful exercises
CSA CCRCphase… soon CSA CCRCphase
The DDT program is an extremely useful computing effort within CMS
Constant refinement of DDT metric
mostly fixed within the exercising time period of working days
DDT can now exercise only links/week: not bad, but we need more
Limited by effort (people) and extra I/O load on T sites
Next:
Already now: next generation DDT CSA CCRCphase soon