Testing the Grid - Service & Data Challenges: LHC Data Analysis (PowerPoint presentation)



SLIDE 1

Michael Ernst
DESY / CMS

Testing the Grid- Service & Data Challenges

LHC Data Analysis Challenges for the Experiments and 100 Computing Centres in 20 Countries

GridKa School Karlsruhe 15 September 2006

SLIDE 2

The Worldwide LHC Computing Grid

... as defined by LCG

Purpose: develop, build and maintain a distributed computing environment for the storage and analysis of data from the four LHC experiments; ensure the computing service … and common application libraries and tools

Phase I (2002-2005) – Development & planning
Phase II (2006-2008) – Deployment & commissioning of the initial services
SLIDE 3

WLCG Collaboration

The Collaboration:
  • ~100 computing centres
  • 12 large centres (Tier-0, Tier-1)
  • 38 federations of smaller “Tier-2” centres
  • 20 countries

SLIDE 4

Service Hierarchy

Tier-0 – the accelerator centre
  • Data acquisition & initial processing
  • Long-term data curation
  • Distribution of data to Tier-1 centres

Tier-1 centres: Canada – TRIUMF (Vancouver); France – IN2P3 (Lyon); Germany – Forschungszentrum Karlsruhe; Italy – CNAF (Bologna); Netherlands Tier-1 (Amsterdam); Nordic countries – distributed Tier-1; Spain – PIC (Barcelona); Taiwan – Academia Sinica (Taipei); UK – CLRC (Oxford); US – Fermilab (Illinois) and Brookhaven (NY)

Tier-1 – “online” to the data acquisition process, high availability
  • Managed Mass Storage – grid-enabled data service
  • Data-heavy analysis
  • National, regional support

Tier-2 – ~100 centres in ~40 countries
  • Simulation
  • End-user analysis – batch and interactive

SLIDE 5
[Pie charts] CPU: CERN 18%, all Tier-1s 39%, all Tier-2s 43%. Disk: CERN 12%, all Tier-1s 55%, all Tier-2s 33%. Tape: CERN 34%, all Tier-1s 66%.

Summary of Computing Resource Requirements

All experiments, 2008 (from the LCG TDR, June 2005):

                        CERN   All Tier-1s   All Tier-2s   Total
CPU (MSPECint2000s)       25            56            61     142
Disk (PetaBytes)           7            31            19      57
Tape (PetaBytes)          18            35             –      53
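As a quick consistency check (ours, not part of the slides), the totals and the pie-chart shares on the previous slide follow directly from the TDR numbers:

```python
# Recompute the 2008 totals and per-tier shares from the LCG TDR table above.
requirements = {
    # resource: (CERN, all Tier-1s, all Tier-2s), units as in the table
    "CPU (MSPECint2000s)": (25, 56, 61),
    "Disk (PB)": (7, 31, 19),
    "Tape (PB)": (18, 35, 0),   # no Tier-2 tape in the table
}

def total(resource):
    return sum(requirements[resource])

def shares(resource):
    # rounded percentage of the total for CERN, Tier-1s, Tier-2s
    t = total(resource)
    return tuple(round(100 * x / t) for x in requirements[resource])

print(total("CPU (MSPECint2000s)"))   # 142, matching the table
print(shares("CPU (MSPECint2000s)"))  # (18, 39, 43), the pie-chart slices
```

The disk and tape shares come out as (12, 54, 33) and (34, 66, 0), matching the chart within rounding.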

SLIDE 6

Two major science grid infrastructures ….

  • EGEE – Enabling Grids for E-sciencE
  • OSG – US Open Science Grid
SLIDE 7

LCG

Tier-1s – IN2P3, GridKa, TRIUMF, ASCC, Fermilab, Brookhaven, Nordic, CNAF, SARA, PIC, RAL – each serving a group of Tier-2s.

T2s and T1s are inter-connected by the general purpose research networks; 10 Gbit links, Optical Private Network.

Any Tier-2 may access data at any Tier-1.

.. and an excellent Wide Area Network

SLIDE 8

LHC Data Grid Hierarchy

[Diagram] Online System (~PByte/sec from the experiment) feeds Tier 0+1: the CERN Center, PBs of disk, tape robot, at ~100-1500 MBytes/sec. Tier 1: FNAL, GridKa, INFN and RAL Centers, over 2.5-10 Gbps. Tier 2: Tier2 Centers, over ~2.5-10 Gbps. Tier 3: institutes with physics data caches, at 0.1 to 10 Gbps. Tier 4: workstations.

Tens of Petabytes by 2007-8. An Exabyte ~5-7 years later.

CERN/Outside Resource Ratio ~1:2; Tier0/(Σ Tier1)/(Σ Tier2) ~1:1:1

Emerging Vision: A Richly Structured, Global Dynamic System

SLIDE 9

[Diagram: four Tier-1s and four Tier-2s with inter-connecting data flows]

Experiment computing models define specific data flows between Tier-1s and Tier-2s

SLIDE 10

ATLAS “average” Tier-1 Data Flow (2008)

Tier-0 CPU farm to the Tier-1 disk buffer:
  • RAW: 1.6 GB/file, 0.02 Hz, 1.7K files/day, 32 MB/s, 2.7 TB/day
  • ESD2: 0.5 GB/file, 0.02 Hz, 1.7K files/day, 10 MB/s, 0.8 TB/day
  • AOD2: 10 MB/file, 0.2 Hz, 17K files/day, 2 MB/s, 0.16 TB/day
  • AODm2: 500 MB/file, 0.004 Hz, 0.34K files/day, 2 MB/s, 0.16 TB/day
  • RAW + ESD2 + AODm2 combined: 0.044 Hz, 3.74K files/day, 44 MB/s, 3.66 TB/day

Disk buffer to tape:
  • RAW: 1.6 GB/file, 0.02 Hz, 1.7K files/day, 32 MB/s, 2.7 TB/day

Disk storage and exchanges with other Tier-1s and Tier-2s:
  • ESD2: 0.5 GB/file, 0.02 Hz, 1.7K files/day, 10 MB/s, 0.8 TB/day (three flows)
  • AOD2: 10 MB/file, 0.2 Hz, 17K files/day, 2 MB/s, 0.16 TB/day
  • AODm2: 500 MB/file, 0.004 Hz, 0.34K files/day, 2 MB/s, 0.16 TB/day
  • AODm2: 500 MB/file, 0.036 Hz, 3.1K files/day, 18 MB/s, 1.44 TB/day (two flows)
  • AODm2: 500 MB/file, 0.04 Hz, 3.4K files/day, 20 MB/s, 1.6 TB/day
  • ESD1: 0.5 GB/file, 0.02 Hz, 1.7K files/day, 10 MB/s, 0.8 TB/day
  • AODm1: 500 MB/file, 0.04 Hz, 3.4K files/day, 20 MB/s, 1.6 TB/day (two flows)

Plus simulation & analysis data flow

Real data storage, reprocessing and distribution
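The per-stream figures above are internally consistent: MB/s is file size times file frequency, and files/day and TB/day follow. A small sketch (ours, not ATLAS code) checking the RAW stream:

```python
# Derive the per-day figures for one data stream from its file size and rate.
def stream_rates(file_size_mb, freq_hz):
    """Return (MB/s, files/day, TB/day) for a stream of fixed-size files."""
    mb_per_s = file_size_mb * freq_hz
    files_per_day = freq_hz * 86400          # seconds per day
    tb_per_day = mb_per_s * 86400 / 1e6      # 1e6 MB per TB
    return mb_per_s, files_per_day, tb_per_day

# RAW: 1.6 GB/file at 0.02 Hz
mb_s, files_day, tb_day = stream_rates(1600, 0.02)
# -> 32 MB/s, ~1.7K files/day, ~2.7 TB/day, as on the slide
```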

SLIDE 11

CMS Data Flows


SLIDE 12

CMS Basic Data Flows

  • Tier-0 to Tier-1 Flow
  • Accessing the Data
  • Tier-1 to Tier-2 Flow

SLIDE 13

Service Challenges

  • Purpose
  • Understand what it takes to operate a real grid service – run for weeks/months at a time (not just limited to experiment Data Challenges)
  • Trigger and verify Tier-1 & large Tier-2 planning and deployment – tested with realistic usage patterns
  • Get the essential grid services ramped up to target levels of reliability, availability, scalability, end-to-end performance
  • Four progressive steps from October 2004 through September 2006
  • End 2004 – SC1 – data transfer to subset of Tier-1s
  • Spring 2005 – SC2 – include mass storage, all Tier-1s, some Tier-2s
  • 2nd half 2005 – SC3 – Tier-1s, >20 Tier-2s – first set of baseline services
  • Jun-Sep 2006 – SC4 – pilot service
  • Autumn 2006 – LHC service in continuous operation – ready for data taking in 2007

SLIDE 14

SC4 – the Pilot LHC Service from June 2006

A stable service on which experiments can make a full demonstration of the experiment offline chain
  • DAQ to Tier-0 to Tier-1 and/or Tier-2: data recording, calibration, reconstruction
  • Offline analysis – Tier-1 / Tier-2 data exchange: simulation, batch and end-user analysis

And sites can test their operational readiness
  • Service metrics at MoU service levels
  • Grid services
  • Mass storage services, including magnetic tape

Extension to most Tier-2 sites

SLIDE 15

  • The Service Challenge program this year must show that we can run reliable services
  • Grid reliability is the product of many components – middleware, grid operations, computer centres, ….
  • Target for this Fall (LCG goal): 90% site availability, 90% user job success
  • Requires a major effort by everyone to monitor, measure, debug

First data will arrive next year – NOT an option to get things going later

Too modest? Too ambitious?

SLIDE 16

Availability Targets

... as anticipated by LCG

End September 2006 – end of Service Challenge 4: 8 Tier-1s and 20 Tier-2s at > 90% of MoU targets

April 2007 – Service fully commissioned: all Tier-1s and 30 Tier-2s at 100% of MoU targets

SLIDE 17

Measuring Response times and Availability

Site Functional Test Framework:
  • monitoring services by running regular tests
  • basic services – SRM, LFC, FTS, CE, RB, Top-level BDII, Site BDII, MyProxy, VOMS, R-GMA, ….
  • VO environment – tests supplied by experiments
  • results stored in database
  • displays & alarms for sites, grid operations, experiments
  • high level metrics for management
  • integrated with the EGEE operations portal – main tool for daily operations (EGEE)
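The framework's loop – run every test against every site on a schedule, store the results, derive availability – can be illustrated with a minimal sketch. The real framework is the EGEE SFT; every name and all the logic below is invented for illustration:

```python
# Illustrative sketch of a Site Functional Test cycle (not the real SFT code).
import time

SERVICES = ["SRM", "LFC", "FTS", "CE", "RB", "BDII"]  # subset from the slide

results = []  # stand-in for the results database

def run_test(site, service):
    # A real test would contact the service; here we just record a pass.
    return {"site": site, "service": service, "ok": True, "t": time.time()}

def run_cycle(sites):
    # One monitoring cycle: test every basic service at every site.
    for site in sites:
        for service in SERVICES:
            results.append(run_test(site, service))

def availability(site):
    # High-level metric: fraction of this site's tests that passed.
    mine = [r for r in results if r["site"] == site]
    return sum(r["ok"] for r in mine) / len(mine)

run_cycle(["FZK", "CERN"])   # hypothetical site names
```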

SLIDE 18

Site Functional Tests

[Charts: availability of 10 Tier-1 sites, and of 5 Tier-1 sites (Tier-1 sites without BNL); percentage availability per month, Jul-05 through May-06; average value of the sites shown]

Basic tests only. Only partially corrected for scheduled down time. Not corrected for sites with less than 24 hour coverage.

SLIDE 19

Medium Term Schedule

... as it was declared by LCG in April 2006

  • 3D distributed database services – development, test, deployment
  • SC4 – stable service for experiment tests
  • SRM 2 – test and deployment plan being elaborated; October target
  • Additional functionality to be agreed, developed, evaluated, then tested and deployed ?? Deployment schedule ??

SLIDE 20

LCG Service Deadlines

Timeline: cosmics in 2006, first physics in 2007, full physics run in 2008

Pilot Services – stable service from 1 June 06
LHC Service in operation – 1 Oct 06; over the following six months, ramp up to full operational capacity & performance
LHC service commissioned – 1 Apr 07

SLIDE 21

Overview of CMS Computing, Software & Analysis Challenge 2006 (CSA06) Goals and Metrics

SLIDE 22

CSA06 – What it is

Definition: a 50 million event exercise to test the workflow and dataflow associated with the data handling model of CMS
  • Receive previously simulated events
  • Perform prompt reconstruction at Tier-0, including determination and application of calibration constants
  • Creation of Analysis Object Data (AOD)
  • Distribution of AOD to all participating Tier-1 Centers
  • Physics jobs running on AOD at some Tier-1 Centers
  • Skim jobs at some Tier-1 Centers with data propagated to Tier-2s
  • Physics jobs on skimmed data at some Tier-2 Centers

SLIDE 23

CSA06 – Overall Goals

A 25% capacity test of what we need in 2008
  • Demonstrate workflow – primary goal: test the data handling model
  • Demonstrate dataflow – the main exercise of SC4
  • Demonstrate production-grade reconstruction software – includes calibration and detector performance
  • Provide services to a wide user community

SLIDE 24

Pre-Challenge Steps

Simulate 50M events
  • Mass production started end of July
  • Runs on Grid resources only
  • 4 teams:
  • UWM running on OSG resources
  • CIEMAT running on LCG resources
  • Aachen/DESY running on LCG resources
  • INFN-Bari running on LCG resources

SLIDE 25

Production Ramp-up (Minbias Sample)

SLIDE 26

Resource Expectations & Metrics

  • Tier-0: 1200 CPUs
  • In the challenge, as in real data taking, the item that drives much of the offline computing capacity is the Tier-0 reconstruction rate (25% = ~40 Hz)
  • Total data rate out of CERN at 40 Hz = ~75 MB/s
  • If we assume a factor of 2 to catch up from lost time, we reach the 150 MB/s for this year (incl. factor of 2 in provisioning ⇒ 3 Gbps for outgoing networking out of CERN in CSA06)
  • 180 TB: the CSA06 data sample is ~50M events (cycled twice), including the raw and reconstructed events
  • Demonstrate prompt reconstruction at 25% of HLT bandwidth (~40 Hz)
  • Weeks of running at sustained rate – Goal: 4, Threshold: 2
  • Efficiency – Goal: 80%, Threshold: 30%; measured as the unattended uptime fraction over the 2 best weeks of the running period
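The Tier-0 rate chain above can be reproduced in a few lines (the per-event size is implied by the slide's numbers, not stated on it):

```python
# Chain the Tier-0 figures together: rate -> outgoing bandwidth -> provisioning.
rate_hz = 40                     # 25% of HLT bandwidth
out_mb_s = 75                    # total data rate out of CERN at 40 Hz
implied_event_mb = out_mb_s / rate_hz            # ~1.9 MB per event (implied)
catchup_mb_s = out_mb_s * 2                      # factor 2 to catch up -> 150 MB/s
provisioned_gbps = catchup_mb_s * 2 * 8 / 1000   # provisioning factor 2 -> 2.4 Gbps,
                                                 # i.e. roughly the 3 Gbps quoted
```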
SLIDE 27

Resource Expectations & Metrics

  • Tier-1: 2400 CPUs across all Tier-1 Centers
  • Tier-0 Center is ~20% of the computing resources; Tier-1 Centers are 40% by the computing model ratio
  • Ideally Tier-1 Centers (incl. CAF) would all provide at least 300 CPUs – approximately the same percentage of shortfall we see in 2008
  • 70 TB disk (nominal size Tier-1): allows storage of a large fraction of the data on disk, while exercising faults to tape
  • Expected performance WN to SE (disk cache): 800 MB/s, based on 1 MB/s per batch slot; exercised and documented in CSA06; Castor and dCache have shown good performance
  • Goal for CSA06 is 12k jobs/day at Tier-1 Centers
  • Anticipated job success rate: goal 90%, threshold 70%
  • The job submission infrastructure is currently the bottleneck
  • At the current resource level we will be running 2-3 hour jobs to meet the goal
  • Primary goal is to keep the existing resources productive at the 25% scale
  • Scaling tape rates by pledge, aiming for 160 MB/s; network provisioning should be at least twice that rate
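A quick sanity check (ours) of the 12k jobs/day goal against the 2400 CPUs explains the 2-3 hour job length:

```python
# Why 2-3 hour jobs are needed to reach 12k jobs/day on 2400 CPUs.
cpus = 2400
jobs_per_day_goal = 12_000
cpu_hours_per_day = cpus * 24                             # 57,600 CPU-hours available
max_job_length_h = cpu_hours_per_day / jobs_per_day_goal  # 4.8 h at 100% occupancy
# With realistic (below-100%) slot occupancy and submission overheads,
# jobs of 2-3 hours are what actually meets the goal.
```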

SLIDE 28

Resource Expectations & Metrics

Tier-1 Performance Metric
  • Number of participating Tier-1 Centers – Goal: 7, Threshold: 5
  • Demonstrate calibration/analysis jobs at Tier-1 Centers
  • Demonstrate writing of new calibration constants into an offline DB to be used for re-reconstruction
  • Demonstrate re-reconstruction from some raw data as part of the calibration exercise
  • Demonstration of a skim job at each Tier-1 Center
  • Automatic transfer of skim job output to Tier-2 Centers

SLIDE 29

Resource Expectations & Metrics

  • Tier-2: 40% of the computing resources are located at Tier-2 Centers
  • CMS assumes 25 full-size centers ⇒ ~100 CPUs per Tier-2
  • ~10 TB disk; expected performance WN to disk storage SE: 200 MB/s
  • Network estimates for Tier-2 vary widely
  • The CMS Computing Model defines the expected minimum in 2008 at 1 Gbps – would be 250 Mbps for CSA06
  • Given that many sites already have 1 Gbps and some 10 Gbps, and the difficulties for end-to-end transfers experienced in SC4, it makes sense to try much larger scale tests at some Tier-2 Centers
  • Perform ~40k jobs/day in CSA06: analysis and centrally submitted production applications
  • Primary goal of CSA06 is to make efficient use of the resources for production and analysis
  • Submission infrastructure is known to be the bottleneck
  • Job completion efficiency can be low under high-stress conditions
  • Goal is to demonstrate we have both of these under control in CSA06
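The 250 Mbps figure is simply the challenge's 25% scale applied to the 2008 minimum:

```python
# CSA06 Tier-2 network target derived from the 2008 computing-model minimum.
minimum_2008_gbps = 1.0
csa06_fraction = 0.25                                    # CSA06 is a 25% capacity test
csa06_mbps = minimum_2008_gbps * 1000 * csa06_fraction   # 250 Mbps
```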

SLIDE 30

Resource Expectations & Metrics

Tier-2 Performance Metric
  • Number of participating sites – Goal: 20, Threshold: 15
  • Demonstration of a user analysis job at each Tier-2 Center on skimmed output

SLIDE 31

Data Transfer

Tier-0 to Tier-1 (tape)
  • Individual goals for each Tier-1; aggregate is 160 MB/s
  • Goal: 25% of 2008 rates

Tier-1 to Tier-1
  • No such dataflow in CSA06

Tier-1 to Tier-2
  • Goal: 20 MB/s into each Tier-2 Center; Threshold: 5 MB/s
  • Overall success is to have 50% of participating sites at or above goal and 90% above threshold
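The overall success criterion can be written as a small predicate; the site names and rates below are made up for illustration:

```python
# Tier-1 -> Tier-2 overall success test: >= 50% of sites at or above the
# 20 MB/s goal AND >= 90% above the 5 MB/s threshold.
GOAL_MB_S, THRESHOLD_MB_S = 20, 5

def transfer_success(site_rates):
    n = len(site_rates)
    at_goal = sum(1 for r in site_rates.values() if r >= GOAL_MB_S)
    above_thr = sum(1 for r in site_rates.values() if r > THRESHOLD_MB_S)
    return at_goal / n >= 0.5 and above_thr / n >= 0.9

# Hypothetical rates: 2/4 sites at goal, 4/4 above threshold -> criterion met
rates = {"T2_DESY": 25, "T2_Bari": 22, "T2_UWM": 12, "T2_CIEMAT": 8}
```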

SLIDE 32

Increasing the Processing

Two improvements from SC4

To grow from 25k jobs/day to 50k jobs/day we need to switch the submission infrastructure
  • 25k jobs/day is already a strain on the current RB infrastructure
  • Job Robots are ideal users (flat load over a 24h period); real users generate unexpected usage patterns and load

The switch to the gLite RB with bulk submission is needed for CSA06
  • Deployment came later than we had expected in May
  • Close to being fully commissioned in CMS for CSA06

SLIDE 33

Test results

Bulk submission of 300 and 800 jobs to WMProxy (300 jobs dispatched in about 600 sec). Different issues have been solved, especially at the matchmaking level: information collection and caching from the Information System.

SLIDE 34

Input sandbox sharing in gLite ... bulk submission

  • File size ~11 KBytes
  • 300 jobs are submitted in bulk (100 jobs per bulk)
  • In the case of a multi-file, non-shared input sandbox, the submission time is proportional to the number of files
  • Sharing multiple files can suppress the job submission time to a constant
  • “One tarball” means packing all files in one shared tarball
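A toy model (not gLite code; the cost constants are invented) of why sharing the sandbox flattens the submission-time curve:

```python
# Toy model of bulk-submission cost with and without a shared input sandbox.
def submit_time_s(n_jobs, n_files, shared):
    per_job = 0.5           # hypothetical fixed cost per job
    per_file_upload = 0.2   # hypothetical cost per file upload
    if shared:
        uploads = n_files             # each file uploaded once for the bulk
    else:
        uploads = n_files * n_jobs    # every job re-uploads every file
    return n_jobs * per_job + uploads * per_file_upload

# Sharing removes the n_jobs factor from the upload term, so the sandbox
# contribution becomes a constant:
# submit_time_s(300, 10, shared=True) << submit_time_s(300, 10, shared=False)
```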

SLIDE 35

Including Users

The user load for the Physics TDR was extremely useful for tool and infrastructure validation.

Need to enable users to perform actual work:
  • CMS Remote Analysis Builder (CRAB) for the new CMS Software Framework (CMSSW) released
  • Data Management interface and job configuration implemented

SLIDE 36

Crabmon Statistics (16 – 22 March 2006) ... contributing to CMS P-TDR

SLIDE 37

Data Transfer to CMS SC4 Sites

SLIDE 38

High Rate Transfers across the Atlantic

[Plot annotations: 08/03/2006; Oracle DB maintenance]

SLIDE 39

SLIDE 40

PhEDEx Dashboard for All to T2_DESY

[Plot annotations: pool crashes fixed; accept FNAL only]

SLIDE 41

Transfer Quality

SLIDE 42

CMS Magnet Tests & Worldwide Data Distribution

[Diagram] P5 (Cessy) to Meyrin: RAW and indices flow from the experiment into the Tier-0 input buffers; merge, prompt reconstruction and repack produce RECO; RAW and RECO go to Castor tape archiving and to the Tier-0 export buffers for Tier-1 export.

SLIDE 43

MTCC Transfers – Magnet Test on 08/27 ... low latency transfers from the DAQ at Pit

SLIDE 44

SLIDE 45

CMS Remote Analysis Builder (CRAB)

3 steps for using CRAB for data analysis:

Create CRAB jobs
  • split the given total number of events into separate jobs using the number of events per job
  • create n jobs

Submit CRAB jobs from the UI
  • submit the created jobs to the grid

Retrieve CRAB jobs
  • retrieve the job output

One single command to start a CRAB analysis; look & feel very similar to LCG job handling
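The splitting step above can be sketched as follows (the function and its interface are illustrative, not CRAB's actual API):

```python
# Split a total number of events into jobs of a given size, the way the
# CRAB "create" step is described on this slide.
def split_events(total_events, events_per_job):
    jobs = []
    first = 0
    while first < total_events:
        n = min(events_per_job, total_events - first)
        jobs.append({"first_event": first, "n_events": n})
        first += n
    return jobs

jobs = split_events(7000, 2000)   # 4 jobs: 2000 + 2000 + 2000 + 1000 events
```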

SLIDE 46

Data Discovery Flow

The USER provides in the cfg file: dataset name and number of events to analyze.

CRAB queries DBS, which returns the list of sites where the datasets are hosted. CRAB then queries each local DLS to obtain, for example:

ContactString=http://t2-cms-dls.desy.de..
ContactProtocol=HTTP
CatalogueType=XML
SE=srm-dcache.desy.de  CE=grid-ce.desy.de  Nevents=7000  RunRange=1-5
SE=srm-dcache.desy.de  CE=grid-ce.desy.de  Nevents=7000  RunRange=20-400

If (at least) one catalog entry is found and n_events > user_n_event, CRAB creates the init files and submits the jobs to the GRID. The jdl requirements include the CMSSW version and other.GlueCEInfoHostName == "CEs".
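The decision CRAB makes from the DLS answer can be sketched like this (the data structures are invented; only the field names mirror the example output above):

```python
# Sketch of the data-discovery decision: proceed only if at least one
# catalog entry was found and enough events are hosted.
def select_catalog_entries(dls_entries, user_n_events):
    total = sum(e["Nevents"] for e in dls_entries)
    if dls_entries and total >= user_n_events:
        return dls_entries   # CRAB would now create and submit jobs
    return None              # nothing to run on

entries = [
    {"SE": "srm-dcache.desy.de", "CE": "grid-ce.desy.de",
     "Nevents": 7000, "RunRange": "1-5"},
    {"SE": "srm-dcache.desy.de", "CE": "grid-ce.desy.de",
     "Nevents": 7000, "RunRange": "20-400"},
]
```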

SLIDE 47

CMS Analysis Jobs Bandwidth Requirements (Disk)

130 jobs sharing 800 MB/s ⇒ ~6 MB/s per job (ExDSTAnalysis running on dual single-core Opteron 2.4 GHz nodes, 2 ExDSTAnalysis processes running per node)

Total delivered by the srm-dcache instance, 15 March 2006 (DESY Tier-2 resources)
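The per-job figure is just the aggregate SE rate divided by the number of concurrently running jobs:

```python
# Per-job disk bandwidth implied by the DESY measurement above.
aggregate_mb_s = 800
concurrent_jobs = 130
per_job_mb_s = aggregate_mb_s / concurrent_jobs   # ~6.2 MB/s per job
```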

SLIDE 48

06/13 – 06/23

SLIDE 49

CMS Analysis Jobs in SC4

Period: 07/01/2006 – 08/31/2006

SLIDE 50

SLIDE 51

Conclusions

  • LHC Data Analysis will depend on
  • ~100 computer centres – run by you
  • two major science grid infrastructures – EGEE and OSG
  • excellent global research networking
  • We have
  • understanding of the experiment computing models
  • agreement on the baseline services
  • good experience from previous service challenges on what the problems and difficulties are
  • Grids are now operational
  • ~200 sites between EGEE and OSG
  • Grid operations centres running for over a year
  • Using GGUS to communicate operational issues
  • > 30K jobs per day accounted (⇒ 200k jobs/day for CMS in 2008)
  • ~20K simultaneous jobs with the right load and job mix

BUT – a long way to go on reliability