CMS Data Transfer tests towards LHC data taking



SLIDE 1

CMS Data Transfer tests towards LHC data taking

D. Bonacorsi

CMS Facilities/Infrastructure Operations, INFN-CNAF, Bologna, Italy

On behalf of the CMS experiment

*Thanks* to all site people and operators.

SLIDE 2
  • ISGC Symposium Taipei April

D. Bonacorsi

Main CMS workflows (simplified)

[ diagram: calibration, prompt reco, skims, re-reco, MC prod upload and skims download, connected by data transfers ]

In this talk: focus on distributed transfer only

SLIDE 3

PhEDEx

Physics Experiment Data Export

reliable, scalable data replication system for HEP experiments

CMS has been exercising the data placement system since

PhEDEx also fully interfaced with gLite FTS since yrs

CMS data transfers in a nutshell:

T T T T s T s T s s T T s s in current PhEDEx transfer topology

  • f CMS Institutes

Data transfers operated as if experiment was already running

we have had service outages exceeding hrs in the last yrs

Current transfers at GB/s, TB/day, PB/month global average rate

average CMS file size likely to be GB: k files/day

SLIDE 4

Evolution of a LoadTest from early on

[ plot: Jan 2007 to mid-March 2007 ]

A flexible infrastructure to generate data transfer traffic among CMS Tiers

fake but real

No real physics files; fully PhEDEx-compliant though

activity since mid-February

full cycles, weeks each, before Jun; then extended into DDT and CSA preparation activities (TT tape, TT, TT regional, TT non-regional)

Start moving

SLIDE 5

Start walking

[ plots: Cycle-1 and Cycle-2; time ranges Jan 2007 to mid-March 2007 and Jan 2007 to mid-April 2007 ]

SLIDE 6

Walk better and faster

[ plots: Sep 2006 to today; Jan 2007 to mid-April 2007 ]

SLIDE 7

CMS LoadTest 2007

CMS CSA06: ~1 PB in ~1 month to participating Tiers

>12 PB in ~6 months

among all PhEDEx Tiers joining the LoadTest

SLIDE 8

[ courtesy of L.Tuura, CHEP07 ]

2007/Q12

SLIDE 9

Data transfers: grandview

Status as of mid: with yr to LHC startup, CMS approaches the real transfers in scale, but not yet in full complexity

From reliable transfers over the full transfer mesh to multi-VO exercises…

The progress is evident, though. Main sources of this are:

  • A well-designed, robust, scalable transfer system
  • A remarkable manpower investment to commission the transfer system

Continuing efforts are needed on debugging data transfers

SLIDE 10

[ courtesy of L.Tuura ]

Screenshot as of Sep 2007

SLIDE 11

Quality of transfers

Clear improvement in Tiers' participation in test transfers since LT started. Still no evident improvement in quality, though

It's not one problem that lacks a solution: there is a wide span of them

Greenish quality plots (successful transfers with fewer retrials) only when storage at both ends, network, PhEDEx setup, site config, operators… all work fine simultaneously!

[ plot, Jan06 to Jun07: # Tiers making some successful transfers, and # Tiers making transfers at >50% quality; CSA06 and LoadTest periods marked ]

Positive note: now stably at a ~challenge traffic load

Need to: keep sites exercised + debug and improve quality

SLIDE 12

Debugging Data Transfers

A CMS program to maintain a high-quality transfer network

to be handed over to CMS Data Operations

Debug/commission transfer links among CMS Tiers

a Task Force (DDTTF) has been in charge since July

Joint effort with CMS Facilities/Commissioning, T liaisons, PhEDEx Central Ops, FTS/SRM experts, network experts, site admins, …

Troubleshooting by Tiers, work by milestones

focus on watching logs, pinging site admins, fixing problems

commission TT, T downlinks to T, T uplinks to T, …

Infrastructural issues by Facilities/Network projects

Overall activity: DDTTF work on deliverables

  • E.g. a real-time status map, with reasons, of all TierX-TierY links
  • E.g. a number of documented success stories in troubleshooting

DDT: gain confidence (started in July 07)

SLIDE 13

The first DDT metric

First step was to define and implement a metric by which links can become commissioned and subsequently handed over to Data Operations. There are several stages through which a link can get commissioned:

  • NOTTESTED: links never actually tested, i.e. showing no successful transfer attempts within PhEDEx

  • PENDINGCOMMISSIONING: links that have transferred successfully at least files in PhEDEx, but have not yet passed the reqs below

  • COMMISSIONED: links that are demonstrated to work * and can be delivered to Data Ops
      • Transfer GB/day for out of consecutive days, and transfer a total of TB during that same day period
      • For links involving an endpoint at a T, this req is relaxed to out of days and a total transfer volume of TB, to match service business hrs support at Ts

  • PROBLEMRATE: links that were working but whose rate has dropped off
      • To remain COMMISSIONED, a link must transfer at least GB/day for a single day at least once every days. Otherwise the link must be re-commissioned by following the procedure above

* i.e. this does not imply that the link or the site has met the reqs of the CMS Computing Model, but simply that the link has passed some minimum reqs to be considered usable for Data Operations, namely: production-quality activity, not tests
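The commissioning stages above can be read as a small classification over a link's daily transfer volumes. The Python sketch below is one possible reading, not the DDT implementation: the function names and every numeric threshold passed in are hypothetical (the actual values are not legible in this text), decimal units (1 TB = 1000 GB) are assumed, and the PROBLEMRATE stage is left out for brevity.

```python
from typing import Sequence


def passes_commissioning(
    daily_gb: Sequence[float],
    window_days: int,
    good_days: int,
    daily_gb_min: float,
    total_tb_min: float,
) -> bool:
    """Return True if some window of `window_days` consecutive days contains
    at least `good_days` days at or above `daily_gb_min` GB and carries at
    least `total_tb_min` TB in total (assuming 1 TB = 1000 GB)."""
    for start in range(len(daily_gb) - window_days + 1):
        window = daily_gb[start:start + window_days]
        n_good = sum(1 for volume in window if volume >= daily_gb_min)
        total_tb = sum(window) / 1000.0
        if n_good >= good_days and total_tb >= total_tb_min:
            return True
    return False


def classify_link(
    successful_files: int,
    daily_gb: Sequence[float],
    window_days: int,
    good_days: int,
    daily_gb_min: float,
    total_tb_min: float,
) -> str:
    """Map a link's history onto the stage names used on the slide.
    All thresholds are caller-supplied placeholders."""
    if successful_files == 0:
        return "NOTTESTED"
    if passes_commissioning(daily_gb, window_days, good_days,
                            daily_gb_min, total_tb_min):
        return "COMMISSIONED"
    return "PENDINGCOMMISSIONING"
```

Keeping the thresholds as arguments rather than constants reflects the point made in the talk that they can be raised over time as networks and sites develop.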

SLIDE 14

The DDT status in late

[ link matrices: T1T1, T1T2, … plus many more … ]

Legend: ROWCOLUMN = upper half of box; COLUMNROW = lower half of box

Steady influx of new links…

number of COMMISSIONED links that were in danger of decommissioning within the next two days

SLIDE 15

Going beyond

The requirements/thresholds in this metric:

  • were developed with the idea of having a higher threshold to commission than to decommission the link
  • can be increased in time as networks and sites develop: the rates implied by GB/day are of the order of MB/s per link, far below the commitments envisioned in the computing model (Ts being able to download a total of up to TB/day from T sites, or over MB/s sustained downloads)
  • match the idea that transfers would be at continuous rate over several days, whereas the Computing Model actually envisions that transfers will occur in bursts (the metric used during CSA deviated from this model to prove the stability of data transfer links)

It was worth a metric revision later. But first, see what happened with this one metric!
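The comparison between a daily-volume threshold and a sustained per-link rate is a plain unit conversion. The Python sketch below shows it under the assumption of decimal units (1 GB = 1000 MB, 1 TB = 1,000,000 MB); the function names are illustrative, and the values in the usage note are arbitrary, not the thresholds of the metric.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400 seconds


def gb_per_day_to_mb_per_s(gb_per_day: float) -> float:
    """Average rate in MB/s equivalent to a daily volume in GB
    (assuming 1 GB = 1000 MB)."""
    return gb_per_day * 1000.0 / SECONDS_PER_DAY


def mb_per_s_to_tb_per_day(mb_per_s: float) -> float:
    """Daily volume in TB moved by a rate sustained for 24 hours
    (assuming 1 TB = 1,000,000 MB)."""
    return mb_per_s * SECONDS_PER_DAY / 1_000_000.0
```

For instance, an arbitrary 86.4 GB/day averages out to 1 MB/s, and a rate of 100 MB/s sustained for a full day moves 8.64 TB.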

SLIDE 16

Before DDT started

(1 month, May 2007)

After DDT started

(1 month, Oct 2007, during CSA07)

The plots show the fraction successes/attempts in file transfers. A clear improvement in the number and quality of data transfer links is seen soon after DDT started (Jul 07).

SLIDE 17

In the meantime…. CCRC’08/phase-1 (WLCG Common-VO Computing Readiness Challenge)

SLIDE 18

Plans

  • TT: CERN, FNAL, FZK, INP, PIC
  • TT: FNAL, FZK, INP, PIC export to other Ts among the
  • TT: CNAF, RAL, ASGC, regional T
  • TT: IT, UK, Taiwan Ts outbound traffic to CNAF, RAL, ASGC respectively

  • TT: CERN, CNAF, RAL, ASGC
  • TT: CNAF, RAL, ASGC export to other Ts among the
  • TT: FNAL, FZK, INP, PIC, regional T
  • TT: US, Germany, France, Spain Ts outbound traffic to FNAL, FZK, INP, PIC

  • TT: CERN, all Ts (golden week for running concurrently with ATLAS)
  • TT: in principle nothing (repetition slot)
  • TT: in principle nothing (repetition slot)
  • TT: in principle nothing (repetition slot)

  • TT: in principle nothing (free for further superimposition with ATLAS)
  • TT: full or repetition slot
  • TT: full or repetition slot
  • TT: full or repetition slot

https:twikicernchtwikibinviewCMSCCRCPhaseITestTransfersOperations

SLIDE 19

SRMv deployment status for CMS Tiers

At the start of CCRC:

(week-1 day-1)

Situation much improved in all regions, and faster than expected, during CCRC weeks-1/2/3:

  • Check details out at:

https:twikicernchtwikibinviewCMSTierSRM

End of CCRC week-3:

(week-3 day-7)

SLIDE 20

TT

SLIDE 21

from: https:twikicernchtwikibinviewCMSCCRCPhaseITestTransfersOperations

TT targets

SLIDE 22

[ plot: CMS rates over CCRC Week-1 to Week-4; levels marked at 500 MB/s and 1.5 GB/s ]

SLIDE 23

CERN outbound traffic (PhEDEx)

SLIDE 24

T T

Grandview on TT in CCRC/Feb

[ plot: Week-1 to Week-4, with Start and End marked; annotations: CCRC F2F meeting, delay in LT set-up, issues with SRMv2 @CERN; not all 7 T1s, then all 7 T1s ]

SLIDE 25

T T

Zoom on week: all Ts together

[ plot annotations: PhEDEx issue; PhEDEx issue + slow support responses; nominal CERN-outbound rate for CMS ]

SLIDE 26

Summary of TT tests

SLIDE 27

TT

SLIDE 28

from: https:twikicernchtwikibinviewCMSCCRCPhaseITestTransfersOperations

TT targets

Note: CCRC targets for T1-T1 are EXPORT targets

SLIDE 29

T T

Grandview on TT in CCRC/Feb

[ plot: Week-1 to Week-4 ]

SLIDE 30

CNAF to all Ts

CNAF T: ASGC, FZK, FNAL, PIC, INP, RAL

MB/s/day for days in a row

  • Target achieved

Legend: colours match the plot; underlined = in another continent

Overall T1-outbound to T1s: 6 x 6 MB/s

[ plot: Week-1 to Week-4 ]

[ an example of a T1 exporting successfully to all other 6 T1s ]

SLIDE 31

Summary of TT tests

SLIDE 32

TT

SLIDE 33

from: https:twikicernchtwikibinviewCMSCCRCPhaseITestTransfersOperations

TT targets

SLIDE 34

FNAL regional Ts

Target: 117 MB/s/day for 3 days in a row

FNAL T: Caltech, Nebraska, Wisconsin, Florida, Purdue, MIT, UCSD, UERJ

Aggregate FNAL-outbound rate to reg-T is OK. Very good quality!

  • Target achieved

[ plot: Week-1 to Week-4 ]

[ an example of a region with stable T1 outbound traffic to regional T2s ]
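A target phrased as a rate held "for 3 days in a row" can be checked against a series of daily average rates. The Python sketch below assumes one particular reading of the target on this slide, namely that the daily average aggregate rate must reach 117 MB/s on 3 consecutive days; the function names and the sample rates in the usage note are made up for illustration.

```python
from typing import Sequence


def longest_streak(daily_rates_mb_s: Sequence[float], target_mb_s: float) -> int:
    """Length of the longest run of consecutive days whose average rate
    is at or above the target."""
    best = current = 0
    for rate in daily_rates_mb_s:
        current = current + 1 if rate >= target_mb_s else 0
        best = max(best, current)
    return best


def target_achieved(
    daily_rates_mb_s: Sequence[float],
    target_mb_s: float = 117.0,   # target rate quoted on the slide
    days_in_a_row: int = 3,       # streak length quoted on the slide
) -> bool:
    """True if the target rate was held on enough consecutive days."""
    return longest_streak(daily_rates_mb_s, target_mb_s) >= days_in_a_row
```

With invented daily averages of 120, 118, 90, 117, 130 and 125 MB/s, the last three days form a qualifying streak, so the check passes even though the first attempt was interrupted on day three.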

SLIDE 35

Summary of TT tests

SLIDE 36

TT

SLIDE 37

from: https:twikicernchtwikibinviewCMSCCRCPhaseITestTransfersOperations

TT targets

SLIDE 38

Summary of TT tests

SLIDE 39

PhEDEx-driven traffic in CCRC/Feb (tests only)

PB of test data moved by CMS in CCRC/Feb

GB/s on links

SLIDE 40

When a challenge ends… … a debugging data transfer program continues.

SLIDE 41

DDT decommissioning metric for

Commissioning (in PhEDEx Debug instance):

  • TT and TT downlink: MB/s for day (in terms of data volume: TB in less than h)
  • TT uplink: MB/s for day (in terms of data volume: TB in less than h)

Decommissioning:

  • On complaint from DataOps, or…
  • Periodic exercising of transfers by DDTTF (in PhEDEx Debug instance): from each T, check one T cross link and two T downlinks each day
  • Exercise each link in rotation at commissioning rate
  • If link cannot pass in days in a row, then decommission
  • decommission = deactivate the path in PhEDEx Prod topology (scheduled downtimes, central PhEDEx failures, holidays exempt, …)

New metric effective Feb th

Link exercising began Feb th also

if a link passed in the two previous weeks, it automatically passes the exercising
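The decommissioning rule above (a link that cannot pass for some number of days in a row is decommissioned, with certain days exempt) can be sketched as a scan over a link's daily exercising results. This Python example is an illustration only: the function name is hypothetical, the number of days is a caller-supplied placeholder because the actual value is not legible here, and it assumes that exempt days (scheduled downtimes, central PhEDEx failures, holidays) neither extend nor reset a run of failures.

```python
from typing import Iterable, Tuple


def should_decommission(
    history: Iterable[Tuple[bool, bool]],
    max_failed_days: int,
) -> bool:
    """`history` is a chronological sequence of (passed, exempt) pairs,
    one per day. Return True once `max_failed_days` non-exempt failures
    occur in a row."""
    streak = 0
    for passed, exempt in history:
        if exempt:
            continue  # assumption: exempt days are simply skipped
        if passed:
            streak = 0
        else:
            streak += 1
            if streak >= max_failed_days:
                return True
    return False
```

Under this reading, a failure on a day covered by a scheduled downtime does not count against the link, which matches the exemptions listed in the metric.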

SLIDE 42

Actual DDTTF procedure

GOAL: to exercise and pass (or, if it fails, then re-commission) every previously commissioned link

Procedure: https:twikicernchtwikibinviewCMSDDTLinkExercising

rotation of active links to/from each T, trying not to overload any one site

Three somewhat equal groups for testing:

  • Group : links beginning or ending at FNAL or ASGC: links
  • Group : links beginning/ending at INP, FZK, RAL, and not in G: links
  • Group : all other links (mainly CERN, CNAF, PIC): links

Injection rate increased (tmp) or one-time injection for links exercising

Site administrators informed of progress or problems

Production transfers count towards metric goals

DDT Exercising Matrix under development by DDTTF

Green: passed exercising metric in past weeks

Yellow: commissioned, but not yet passed and/or exercised in past weeks
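The split of links into three testing groups follows directly from the endpoints of each link. The Python sketch below shows one way to express it; the function name and the group numbering 1 to 3 are assumptions, and the site names are plain strings copied as they appear in this text.

```python
# Sites named on the slide for the first two groups, as written in the text.
GROUP1_SITES = frozenset({"FNAL", "ASGC"})
GROUP2_SITES = frozenset({"INP", "FZK", "RAL"})


def exercising_group(source: str, destination: str) -> int:
    """Return the testing group of a link, based on its two endpoints.
    The first group takes precedence, so a link touching both a
    first-group and a second-group site lands in the first group."""
    endpoints = {source, destination}
    if endpoints & GROUP1_SITES:
        return 1
    if endpoints & GROUP2_SITES:
        return 2
    return 3  # all other links (mainly CERN, CNAF, PIC)
```

Checking the first group before the second is what implements the "and not in G" condition on the second group.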

SLIDE 43

How to monitor DDT daily today

e.g. screenshot for CERN/T1 links only

CNAF scheduled downtime

SLIDE 44

Status of DDT link commissioning

as of last week

Requirements: from the Site Commissioning document

Specifies the CMS reqs to CMS Tiers (not only for transfers)

T Link Commissioning Requirements: CERN-T and T-CERN are commissioned, plus TT downlinks, TT bidirectional links

T Link Commissioning Requirements: uplink to associated T commissioned, plus downlinks from Ts

Status: COMMISSIONED links as of last week

all CERNTT crosslinks COMMISSIONED TT downlinks

Ts pass link section of Site Commissioning reqs: ASGC, CERN, CNAF, FNAL, FZK, INP, PIC, RAL

TT uplinks

Ts have a commissioned uplink to the associated T. Of these, have at least commissioned TT downlinks

TT crosslinks not in Computing Model

SLIDE 45

Troubleshooting and support

Problems found are:

  • Categorized
  • if possible investigated by central ops team volunteers…
  • Always reported to site people who join debugging or do it themselves

Rough overview of problems over the last few weeks:

  • FTS channel config issues, FTS timeouts
  • SRM errors of some kind, gridftp timeouts, storage system issues
  • myproxy problems (errors in bind, or simply proxy expired)
  • Network failures
  • File exists, or error in path at destination
  • Filesystem authentication problems
  • PhEDEx agents down unnoticed or misconfigured; LT samples not available

Savannah: CMS Computing Infrastructure support project

Specific DDT squad and category

  • Sites notify problems
  • Ops people steer the process and assign tickets to relevant and available experts
  • Experts feel ownership; problems get solved and tracked
  • TODO: Ops people review problems tracked and extract certified solutions, FAQ, …

SLIDE 46

[ courtesy of L.Tuura ]

2007 2008

… and this is the overall effect.

SLIDE 47

What's next in DDT?

About of links had real issues when it came time to exercise, so this activity seems useful and important

Is it still a long path?

Yes. Despite solid progress and the experience gained in running daily ops, more still needs to be done

TT matrix is complete, gains are stable. But…

  • Only of T downlinks commissioned
  • Only of T uplinks commissioned
  • Only active T pass Site Commissioning Link Requirements
  • Also other T sites not transferring any data

DDT can exercise links/week only

Is once per weeks often enough? Is more frequent exercising possible with current manpower and T throughput overhead of MB/s?

Probably not. May think of smaller continuous exercises (à la heartbeat), though

Focus on uncommissioned T sites with just a few commissioned links, investigate bottlenecks there, and complete the T picture

Better this than to encourage sites with many commissioned links to complete their connectivity… that will come later

SLIDE 48

Summary

Thanks to all CMS people and site admins for the excellent and constant work to support CMS activities

Constantly increasing involvement of Tiers in transfer operations since yrs

Scale improvements in data transfers

First visible boost in performance of data transfers also

Focus on achieving stable improvements through clear procedures

Both CMS-specific and multi-VO challenges felt as useful exercises

CSA, CCRC phase… soon CSA, CCRC phase

The DDT program is an extremely useful computing effort within CMS

Constant refinement of DDT metric

  • of previously commissioned links have PASSED the latest metric
  • of tested links had problems uncovered in exercising

mostly fixed within the exercising time period of working days

DDT can now exercise only links/week: not bad, but we need more

Limited by effort (people) and extra I/O load on T sites

Next:

Already now: next generation DDT. CSA, CCRC phase soon