
SLIDE 1

1st EGEE User Forum – CERN, 1st March 2006

LHC Computing

Nick Brook

  • The LHC & experiments
  • Requirements
  • Computing models
  • Experiences so far
  • Interoperability
  • LCG Baseline service group
  • Future requirements
  • Summary

SLIDE 2

The CERN LHC

4 Large Experiments

The world’s most powerful particle accelerator. First (proton-proton) collisions due in 2007.

SLIDE 3

  • 7,000 tonnes; 42 m long, 22 m wide, 22 m high
  • 2,000 physicists, 150 institutes, 34 countries

ATLAS Detector

(About the height of a 5 storey building)

SLIDE 4

LHC Physics Goals

What is mass?

Particles acquire their masses by interacting with another particle, the Higgs boson

Is there supersymmetry?

Links the matter particles (the quarks and leptons) with the force particles (the gauge bosons) – “grand unified theory”

What is Dark Matter?

The discovery of supersymmetric particles could have important implications for cosmology

Where has all the antimatter gone?

In the very early moments after the Big Bang the universe should have contained equal amounts of matter and antimatter, but the universe we see around us is made up almost entirely of matter

Why are there three “generations” of quarks and leptons?

The answer to this question is probably linked to the answers to the other questions, in particular to the ideas of supersymmetry and the resolution of the matter–antimatter problem.

SLIDE 5

  • CERN (Tier-0 centre)
    – First-pass reconstruction; storage of one copy of RAW data from the detectors, calibration data and first-pass reconstructed data
  • Large external computing centres + CERN (Tier-1 centres)
    – Reconstruction and production-type analysis; storage of the second copy of RAW data and a copy of all data to be kept; disk replicas of reconstructed and analysis data
  • Smaller external computing centres (Tier-2 centres)
    – Simulation and end-user analysis; disk replicas of analysis data

Typical LHC experiment computing model: Tier-1 and Tier-2 centres are defined by the level of service they provide
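The tier roles above can be written down as a small data model. This is a minimal sketch: the `Tier` class and the helper functions are hypothetical, and the role strings paraphrase the slide. It shows one property of the model, namely that RAW data has exactly two custodial copies (one at CERN, one spread across the Tier-1s), and it lets you look up which tiers take on a given kind of work.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    """One level of the tiered computing model (roles paraphrased from the slide)."""
    name: str
    roles: list[str]
    stores_raw_copy: bool = False  # does this tier hold a custodial copy of RAW data?


# Hypothetical encoding of the "typical LHC experiment computing model".
MODEL = [
    Tier("Tier-0 (CERN)",
         ["first-pass reconstruction", "calibration data", "first-pass reconstructed data"],
         stores_raw_copy=True),
    Tier("Tier-1",
         ["reconstruction", "production-type analysis",
          "disk replicas of reconstructed and analysis data"],
         stores_raw_copy=True),
    Tier("Tier-2",
         ["simulation", "end-user analysis", "disk replicas of analysis data"]),
]


def raw_copies(model: list[Tier]) -> int:
    """Count how many tiers keep a copy of the RAW data."""
    return sum(t.stores_raw_copy for t in model)


def tiers_for(role_keyword: str, model: list[Tier]) -> list[str]:
    """Names of tiers whose role list mentions the given keyword."""
    return [t.name for t in model if any(role_keyword in r for r in t.roles)]


if __name__ == "__main__":
    print(raw_copies(MODEL))               # 2: one copy at CERN, one across the Tier-1s
    print(tiers_for("simulation", MODEL))  # ['Tier-2']
```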

SLIDE 6

CPU Requirements

[Chart: CPU requirements in MSI2000 by year, 2007–2010, broken down by experiment (ALICE, ATLAS, CMS, LHCb) and by tier (CERN, Tier-1, Tier-2); axis up to 350 MSI2000]

SLIDE 7

Disk Requirements

[Chart: disk requirements in PB by year, 2007–2010, broken down by experiment (ALICE, ATLAS, CMS, LHCb) and by tier (CERN, Tier-1, Tier-2); axis up to 160 PB]

SLIDE 8

Tape Requirements

[Chart: tape requirements in PB by year, 2007–2010, broken down by experiment (ALICE, ATLAS, CMS, LHCb) for CERN and the Tier-1 centres only; axis up to 160 PB]

SLIDE 9

LCG/EGEE Usage by LHC Experiments

The major use of the Grid so far has been for Monte Carlo simulation.

SLIDE 10

LCG/EGEE Usage by LHC Experiments

SLIDE 11

Example Use of EGEE Resources

[Diagram: DIRAC, the LHCb workload management system, running on LCG resources. Components shown: GANGA UI, DIRAC API and Production Manager; DIRAC Job Management Service; DIRAC services (JobMonitorSvc, JobAccountingSvc with AccountingDB, ConfigurationSvc, FileCatalogSvc, BookkeepingSvc), together with a Job monitor, BK query webpage and FileCatalog browser; DIRAC CEs at DIRAC sites with a Sites Agent; LCG resources (LCG Resource Broker, CE 1–3, each with an agent); storage (DiskFile, gridftp); FileCatalog with agents.]
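The diagram places agents on the LCG CEs as well as central DIRAC services. DIRAC uses a pull model: an agent first lands on a worker node, and only then does it ask the central service for a job that fits the resources it finds there. The following minimal sketch shows that idea with an in-memory queue. `TaskQueue`, `Job` and `run_agent` are hypothetical names, not part of DIRAC's API, and matching on free memory alone is an illustrative simplification.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    min_memory_mb: int  # resource requirement used for matching (hypothetical)


class TaskQueue:
    """Central queue of waiting jobs (a stand-in for the DIRAC Job Management Service)."""

    def __init__(self, jobs):
        self._waiting = deque(jobs)

    def request_job(self, free_memory_mb: int):
        """Hand out the first waiting job that fits the agent's resources, or None."""
        for _ in range(len(self._waiting)):
            job = self._waiting.popleft()
            if job.min_memory_mb <= free_memory_mb:
                return job
            self._waiting.append(job)  # does not fit here: keep it queued for another node
        return None


def run_agent(queue: TaskQueue, free_memory_mb: int) -> list[int]:
    """A pilot agent on a worker node: keep pulling matching jobs until none are left."""
    done = []
    while (job := queue.request_job(free_memory_mb)) is not None:
        # A real agent would run the payload here; this sketch only records the job ID.
        done.append(job.job_id)
    return done
```

A job that no agent can satisfy stays queued until a suitable node asks for work. A worker node that fails its own checks never pulls a job in the first place.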

SLIDE 12

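ALICE job submission

[Diagram: ALICE job submission through LCG, split between the ALICE central services and the site (VO-Box).]

  • User submits a job to the ALICE Job Catalogue
  • Optimizer splits jobs by input logical file names (LFNs):
    – Job 1 (lfn1, lfn2, lfn3, lfn4) → Job 1.1 (lfn1), Job 1.2 (lfn2), Job 1.3 (lfn3, lfn4)
    – Job 2 (lfn1, lfn2, lfn3, lfn4) → Job 2.1 (lfn1, lfn3), Job 2.2 (lfn2, lfn4)
    – Job 3 (lfn1, lfn2, lfn3) → Job 3.1 (lfn1, lfn3), Job 3.2 (lfn2)
  • CE agent on the VO-Box submits a job agent through the LCG RB to a CE and worker node (WN)
  • Job agent checks the environment: if it is not OK, the agent dies with grace; if it is OK, the agent asks for a workload
  • The task queue (TQ) matchmakes, knowing which SEs are close to the site; the agent receives and retrieves the workload, runs the user job and sends the job result; the TQ is updated
  • Output is registered in the ALICE File Catalogue, which maps each lfn to a guid and the set of SEs holding it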
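The File Catalogue maps each LFN to a GUID and to the storage elements (SEs) that hold replicas. The Optimizer then splits a job into sub-jobs. The minimal sketch below assumes that splitting groups a job's input files by the SE that stores them. The catalogue contents and SE names are invented: the replica layout was chosen so that it reproduces the Job 2 and Job 3 splits in the diagram. The function names are hypothetical and are not AliEn's.

```python
from collections import defaultdict

# Hypothetical catalogue entries: lfn -> (guid, storage elements holding a replica)
CATALOGUE = {
    "lfn1": ("guid-1", {"SE_A"}),
    "lfn2": ("guid-2", {"SE_B"}),
    "lfn3": ("guid-3", {"SE_A"}),
    "lfn4": ("guid-4", {"SE_B"}),
}


def split_by_se(job_id: str, lfns: list[str], catalogue=CATALOGUE) -> dict[str, list[str]]:
    """Group a job's input LFNs by storage element; each group becomes one sub-job."""
    groups = defaultdict(list)
    for lfn in lfns:
        _guid, ses = catalogue[lfn]
        groups[min(ses)].append(lfn)  # if several SEs hold the file, pick one deterministically
    # Number the sub-jobs job_id.1, job_id.2, ... in SE-name order.
    return {f"{job_id}.{i}": files
            for i, (_se, files) in enumerate(sorted(groups.items()), start=1)}
```

With this layout, `split_by_se("Job 2", ["lfn1", "lfn2", "lfn3", "lfn4"])` gives `{"Job 2.1": ["lfn1", "lfn3"], "Job 2.2": ["lfn2", "lfn4"]}`, so each sub-job can run close to its data.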

SLIDE 13

Status of production

Production job duration: 8½ hours on a 1 kSI2k CPU; output archive size: 1 GB (20 files)

2450 jobs
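The production figures above lead to some simple capacity arithmetic. This is a sketch built on stated assumptions: run time is taken to scale inversely with CPU power, every job is assumed to produce about 1 GB, and the figure of 2450 is treated as a job count. None of these assumptions is stated on the slide.

```python
JOB_WORK_KSI2K_HOURS = 8.5 * 1.0  # 8.5 h on a 1 kSI2k CPU
OUTPUT_GB_PER_JOB = 1.0           # output archive size per job
FILES_PER_JOB = 20                # files in each output archive


def wall_hours(cpu_ksi2k: float) -> float:
    """Run time on a CPU of the given power, assuming perfect linear scaling."""
    return JOB_WORK_KSI2K_HOURS / cpu_ksi2k


def total_output_tb(n_jobs: int) -> float:
    """Output volume for n_jobs jobs, in TB (decimal units: 1 TB = 1000 GB)."""
    return n_jobs * OUTPUT_GB_PER_JOB / 1000


mean_file_mb = OUTPUT_GB_PER_JOB * 1000 / FILES_PER_JOB  # 50 MB per output file
```

Under these assumptions, the same job takes about 4.25 h on a 2 kSI2k CPU, and 2450 such jobs would write roughly 2.45 TB.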

SLIDE 14

Production Grid

  • Basic middleware
  • A set of baseline services agreed and initial versions in production
  • All major LCG sites active
  • Grid job failure rate 5–10% for most experiments, down from ~30% in 2004
  • Sustained 10K jobs per day
  • > 10K simultaneous jobs during prolonged periods

[Chart: average number of jobs/day on the EGEE Grid in 2005, January to November; axis up to 14,000 jobs/day]
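The fall in the failure rate matters because failed jobs have to be resubmitted. If each attempt fails independently with probability p, the expected number of submissions per successful job is 1/(1 − p), and the probability that a job fails k times in a row is p^k. Independence between attempts is an assumption; real failures often cluster at a particular site.

```python
def expected_attempts(p: float) -> float:
    """Mean number of submissions until success, for independent attempts with failure probability p."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return 1 / (1 - p)


def still_failing(p: float, k: int) -> float:
    """Probability that all k attempts fail."""
    return p ** k


# Compare the 2004 failure rate with the current 5-10% range.
for p in (0.30, 0.10, 0.05):
    print(f"p={p:.2f}: {expected_attempts(p):.2f} attempts per job, "
          f"{still_failing(p, 3):.4f} chance of failing 3 times")
```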

SLIDE 15

[Diagram: ATLAS production system. Components shown: a central production database (ProdDB); executors Dulcinea (several instances), Lexor, CondorG (CG) and PANDA; Resource Brokers (RB); CEs.]

ATLAS ProdSys
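ProdSys connects a single central ProdDB to several grid-specific executors, such as Dulcinea, Lexor, Condor-G and PANDA. The minimal sketch below shows that separation. The class names are hypothetical, and the round-robin hand-out is an illustrative assumption; it does not describe how ProdSys actually assigned jobs to executors.

```python
from abc import ABC, abstractmethod
from itertools import cycle


class Executor(ABC):
    """Grid-specific back end: turns a generic job definition into a grid submission."""
    name = "base"

    @abstractmethod
    def submit(self, job: dict) -> str:
        ...


class LCGExecutor(Executor):  # hypothetical
    name = "lcg"

    def submit(self, job):
        return f"lcg:{job['id']}"  # placeholder for a Resource Broker submission


class OtherGridExecutor(Executor):  # hypothetical
    name = "other"

    def submit(self, job):
        return f"other:{job['id']}"  # placeholder for a submission to another grid


def dispatch(prod_db: list[dict], executors: list[Executor]) -> list[str]:
    """Hand each pending job from the production DB to the next executor in turn."""
    return [ex.submit(job) for job, ex in zip(prod_db, cycle(executors))]
```

With this separation, supporting another grid means adding one `Executor` subclass; the job definitions in the database stay unchanged.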

SLIDE 16

Country        Sites | Country     Sites | Country      Sites
Austria            2 | India           1 | Russia          10
Belgium            1 | Israel          2 | Singapore        1
Bulgaria           4 | Italy          25 | Slovakia         3
Canada             6 | Japan           1 | Slovenia         1
China              1 | Korea           1 | Spain           13
Croatia            1 | Netherlands     2 | Sweden           2
Cyprus             1 | Macedonia       1 | Switzerland      2
Czech Republic     2 | Pakistan        2 | Taiwan           4
France             8 | Poland          4 | Turkey           1
Germany            8 | Portugal        1 | UK & Ireland    35
Greece             6 | Puerto Rico     1 | USA              3
Hungary            1 | Romania         1 | Yugoslavia       1

EGEE/LCG-2 grid: 174 sites, 40 countries, >17,000 processors, ~5 PB storage

46 CEs, 15,459 CPUs, 6 SEs

50 sites, 13 countries, >5,000 CPUs

Interoperability is a major issue

SLIDE 17

EGEE – OSG:

  • Job submission demonstrated in both directions
  • Done in a sustainable manner
  • EGEE WN tools installed as a grid job on OSG nodes

EGEE – ARC:

  • Longer term: want to agree standard interfaces to grid services
  • Short term:
    – EGEE→ARC: try to use the Condor component that talks to the ARC CE
    – ARC→EGEE: discussions with EGEE WMS developers to understand where to interface
  • Default solution: NDGF acts as a gateway

In both cases:

  • Catalogues are experiment choices – generally, local catalogues use local grid implementations

Interoperability
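A gateway between grids has to translate job descriptions from one grid's format to the other's. EGEE jobs are described in JDL, and ARC jobs are described in xRSL. The sketch below maps a handful of common attributes from the first format to the second. The mapping table covers only these few attributes and is an illustrative assumption. It is not how the NDGF gateway is implemented.

```python
# Minimal JDL -> xRSL attribute mapping (only a few common attributes; illustrative).
JDL_TO_XRSL = {
    "Executable": "executable",
    "Arguments": "arguments",
    "StdOutput": "stdout",
    "StdError": "stderr",
}


def jdl_to_xrsl(jdl: dict[str, str]) -> str:
    """Render JDL-style attributes as an xRSL conjunction: &(attr="value")...

    Values containing double quotes are not escaped in this sketch.
    """
    parts = []
    for key, value in jdl.items():
        if key not in JDL_TO_XRSL:
            raise KeyError(f"no xRSL mapping for JDL attribute {key!r}")
        parts.append(f'({JDL_TO_XRSL[key]}="{value}")')
    return "&" + "".join(parts)
```

For example, `jdl_to_xrsl({"Executable": "/bin/echo", "Arguments": "hello"})` returns `&(executable="/bin/echo")(arguments="hello")`. When an attribute has no equivalent, the sketch raises an error instead of silently dropping it.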

SLIDE 18

Recent Service Challenges - throughput phase

SLIDE 19

Recent Service Challenges - throughput phase

[Chart: SC3 re-run throughput in MB/s, achieved vs goal, for ASCC, BNL, CNAF, DESY, FNAL, GridKa, IN2P3, NDGF, PIC, RAL, SARA and TRIUMF; axis up to 300 MB/s]
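Per-site rates are easier to judge when expressed as daily volumes. This sketch assumes decimal units (1 TB = 10^6 MB) and a rate sustained over a full day. The 150 and 200 MB/s figures used in the example are illustrative values, not readings from the chart.

```python
SECONDS_PER_DAY = 86_400


def mb_per_s_to_tb_per_day(rate_mb_s: float) -> float:
    """Convert a sustained rate in MB/s to TB/day (decimal units: 1 TB = 10**6 MB)."""
    return rate_mb_s * SECONDS_PER_DAY / 1e6


def fraction_of_goal(achieved_mb_s: float, goal_mb_s: float) -> float:
    """Achieved throughput as a fraction of the target rate."""
    return achieved_mb_s / goal_mb_s
```

At a sustained 200 MB/s, a site receives about 17.3 TB per day. A site achieving 150 MB/s against a 200 MB/s goal reaches 75% of its target.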

SLIDE 20

Recent Service Challenges - experiment experiences

SLIDE 21

Extremely useful for shaking down sites, experiment systems & WLCG

  • Many new components used for the first time in anger
  • Need for additional functionality in services
    – F(ile) T(ransfer) S(ervice), L(CG) F(ile) C(atalog), S(torage) R(esource) M(anager), …

Reliability seems to be the major issue

  • MSS at CERN: still ironing out problems, but big improvements
  • Coordination issues
  • Problems with sites and networks
    – MSS, security, network, services…

FTS:

  • Performs well for well-defined sites/channels after tuning
  • Timeout problems when accessing data from MSS

SRM:

  • Limitations/ambiguity in functionality for v1.1

SC3 summary – experiment perspective
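A file held in a mass storage system (MSS) may have to be recalled from tape before it can be copied. A transfer that waits for this staging step can therefore exceed its timeout. The minimal sketch below polls for staging against a deadline and copies only once the file is on disk. `is_staged` and `copy` are hypothetical callables; they are not FTS or SRM API calls.

```python
import time
from collections.abc import Callable


def transfer_with_staging(is_staged: Callable[[], bool],
                          copy: Callable[[], None],
                          timeout_s: float,
                          poll_s: float = 1.0) -> bool:
    """Wait until the source file is on disk, then copy it.

    Returns False if staging does not finish within timeout_s, which is the
    kind of MSS timeout problem reported for FTS above.
    """
    deadline = time.monotonic() + timeout_s
    while not is_staged():
        if time.monotonic() >= deadline:
            return False  # give up; a real service would mark the transfer as failed
        time.sleep(poll_s)
    copy()
    return True
```

Keeping the staging wait separate from the copy allows a long timeout for the tape recall and a shorter one for the network transfer.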

SLIDE 22

Ganga

  • Designed for data analysis on the Grid
    – LHCb will do all its analysis on T1s; T2s mostly for simulation
  • System should not be general – we know all main use cases
    – Use prior knowledge
    – Identified use pattern
  • Aid user in
    – Bookkeeping aspects
    – Keeping track of many individual jobs
  • Developed in cooperation between LHCb and ATLAS with EGEE support
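Keeping track of many individual jobs is largely a bookkeeping task: each job gets an ID, a backend and a status that changes as the job progresses. The following is a minimal sketch of such a registry. `JobRegistry`, `JobRecord` and their methods are hypothetical and do not represent Ganga's interface, and the backend labels are illustrative.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import count


@dataclass
class JobRecord:
    job_id: int
    backend: str  # e.g. "LCG" or "Local" (illustrative labels)
    status: str = "new"


class JobRegistry:
    """In-memory bookkeeping for many user jobs."""

    def __init__(self):
        self._ids = count()
        self._jobs: dict[int, JobRecord] = {}

    def create(self, backend: str) -> JobRecord:
        rec = JobRecord(next(self._ids), backend)
        self._jobs[rec.job_id] = rec
        return rec

    def update(self, job_id: int, status: str) -> None:
        self._jobs[job_id].status = status

    def summary(self) -> Counter:
        """How many jobs are in each status."""
        return Counter(rec.status for rec in self._jobs.values())

    def select(self, status: str) -> list[int]:
        """IDs of jobs in the given status, e.g. to resubmit the failed ones."""
        return [j for j, rec in self._jobs.items() if rec.status == status]
```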

SLIDE 23

SLIDE 24

SLIDE 25

SLIDE 26

CMS Analysis on the Grid

[Charts: most accessed sites since July ’05; CRAB (CMS Remote Analysis Builder) jobs so far]

Many tens of thousands of jobs were run to produce results for the CMS Technical Design Report

SLIDE 27

LCG Baseline Service Working Group – goals

  • Agreement between the experiments and regional centres on baseline services
    – Support the computing models for the initial period of LHC
    – Thus must be in operation by September 2006
  • The services concerned are those that
    – supplement the basic services (e.g. provision of operating system services, local cluster scheduling, compilers, ..)
    – and which are not already covered by other LCG groups, such as the Tier-0/1 Networking Group or the 3D Project
  • Not a middleware group – focus on what the experiments need & how to provide it
  • Expose experiment plans and ideas

SLIDE 28

SLIDE 29

Service Challenge 4 – the Pilot LHC Service

  • Start service June ’06
  • Must be able to support a demonstration of the complete chain
    – Data Acquisition → Tier-0 → Tier-1: data recording, calibration, reconstruction
    – Simulation, batch and end-user analysis; Tier-1 ↔ Tier-2 data exchange
  • Service metrics → agreed (MoU) service levels
  • Extension of the service to most Tier-2 sites
  • Production LCG service by October ’06
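Turning service metrics into agreed service levels requires measuring quantities such as the fraction of time a service was available, and then comparing that measurement with a target. This sketch assumes that availability is computed from a list of up/down intervals. The target values used in testing it are placeholders, not figures from the MoU.

```python
def availability(intervals: list[tuple[float, bool]]) -> float:
    """Fraction of time a service was up, given (duration_hours, was_up) intervals."""
    total = sum(d for d, _ in intervals)
    if total == 0:
        raise ValueError("no time recorded")
    up = sum(d for d, ok in intervals if ok)
    return up / total


def meets_target(intervals: list[tuple[float, bool]], target: float) -> bool:
    """Compare measured availability with an agreed service level (e.g. 0.97)."""
    return availability(intervals) >= target
```

For example, a week with 160 hours up and 8 hours down gives an availability of about 95.2%.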

SLIDE 30

  • Steady progress in understanding and fixing problems
    – Not everything is there at the moment
    – Every Service Challenge has increasingly ambitious objectives and an increasing number of problems to be fixed!
  • Experiments, sites and middleware developers all work in the same direction and with the same objectives / priorities
  • (Some) experiment-specific solutions are unavoidable and should be supported in a form acceptable to all parties
    – First collisions due in 2007 & experiments need a stable system beforehand to ensure their systems work
  • Proper prioritisation of experiment requirements is now mandatory in order to satisfy them
    – Ongoing activity with EGEE TCG
  • Progress needs to continue to allow differently flavoured Grids to interoperate

Summary