
BaBar Distributed Computing

Stephen J. Gowdy, SLAC
Super B-Factory Workshop, 22nd April 2005


Overview

  • Foundations
  • Tier-A Sites
  • Data Distribution

Software Distribution

  • All source is in a CVS repository in AFS
    – Allows the code to be seen anywhere in the world
  • SoftRelTools/SiteConfig is used to configure each site
    – Location of external software, compilers
    – Server names (Objectivity lock servers, etc.)
  • UserLogin package to set up the environment
    – More site customisation here (these modifications are not in CVS); an illustrative fragment follows
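
As an illustration only, the kind of settings this site configuration carries might look like the sketch below; every path, hostname and variable name here is hypothetical, not the actual contents of SoftRelTools/SiteConfig or UserLogin.

    # Hypothetical site-configuration fragment (all names are illustrative)
    export BFROOT=/nfs/babar                        # assumed local BaBar root
    export EXTERNALS=$BFROOT/external               # location of external software
    export CC=/usr/local/gcc-3.2/bin/gcc            # site compiler
    export OBJY_LOCKSERVER=objylock.example.edu     # Objectivity lock server name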


Software Distribution (Cont.)

  • “bin” package contains bootstrapping scripts
    – Installed at sites as $BFROOT/bin
  • importrel is used to import a BaBar Software Release
    – By default it imports all architectures; importarch can be used to import only selected platforms (it tells importrel not to import any itself)
    – Once local, run “gmake siteinstall” to reconfigure the release for the local site
  • Should then be able to run applications as you would at SLAC; a sketch of the sequence follows
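
A minimal sketch of that bootstrap sequence, assuming a hypothetical directory layout and that the release name is passed as an argument (the slides do not show the exact command-line options):

    # Hypothetical import sequence; release name and layout are assumptions
    cd $BFROOT/dist
    importarch Linux2            # optional: restrict to selected platforms
    importrel 14.3.1a            # import the release from SLAC
    cd 14.3.1a
    gmake siteinstall            # reconfigure the release for the local site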


Eventstore

  • Collection names are trivially mapped to the first logical file name (LFN)

    /store/PR/R14/AllEvents/0004/02/14.3.1a/AllEvents_00040228_14.3.1aV00
    /store/PR/R14/AllEvents/0004/02/14.3.1a/AllEvents_00040228_14.3.1aV00.01.root

  • Mapping from LFN to physical file name via a site-specific configuration file
    – $BFROOT/kanga/config/KanAccess.cfg

    [yakut06] ~/reldirs/tstanalysis-24/workdir > KanAccess /store/PR/R14/AllEvents/0004/02/14.3.1a/AllEvents_00040228_14.3.1aV00.01.root
    root://kanolb-a:1094///store/PR/R14/AllEvents/0004/02/14.3.1a/AllEvents_00040228_14.3.1aV00.01.root

    – This one uses xrootd for access; an illustrative configuration fragment follows
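
Purely as an illustration, a prefix-rewrite rule in such a file could look like the sketch below; the syntax and keyword are assumptions, not the real KanAccess.cfg format:

    # Hypothetical KanAccess.cfg fragment (syntax is assumed)
    # Rewrite the LFN prefix to the site's xrootd redirector
    prefix  /store/  root://kanolb-a:1094///store/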


Eventstore (Cont.)

  • xrootd used for production data access
    – Resilient against many failure modes
    – Very little overhead to disk IO
    – Now part of the ROOT distribution
  • Latest versions at http://xrootd.slac.stanford.edu

  [Figure from 15th April 2005]
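
As a concrete taste of xrootd access, the xrdcp utility shipped with xrootd can copy one of the production files shown earlier to local disk; the destination path below is an assumption:

    # Copy a production file over xrootd (local destination is hypothetical)
    xrdcp root://kanolb-a:1094///store/PR/R14/AllEvents/0004/02/14.3.1a/AllEvents_00040228_14.3.1aV00.01.root /tmp/AllEvents.root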


Eventstore (Cont.)

  • Collections are made up of different components
    – Generally two classes of files come from production
  • Micro
    – Header
    – User Data (ntuple-like information associated with particles)
    – (B)Tag Information (event-level information)
    – Candidates (physics-level reconstructed objects)
    – Analysis Object Data (AOD, detector-level information)
    – Truth (if MC data)
  • Mini
    – Event Summary Data (ESD)
  • (A third class contains RAW and Simulation data)
  • File names let you know what is in them, as in the example below

    /store/PR/R14/AllEvents/0004/02/14.3.1a/AllEvents_00040228_14.3.1aV00.02E.root
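
The KanAccess lookup from the earlier slide works for any component file; the output below is what one would expect under that same configuration, shown here as an assumption rather than a captured transcript:

    KanAccess /store/PR/R14/AllEvents/0004/02/14.3.1a/AllEvents_00040228_14.3.1aV00.02E.root
    root://kanolb-a:1094///store/PR/R14/AllEvents/0004/02/14.3.1a/AllEvents_00040228_14.3.1aV00.02E.root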


Eventstore (Cont.)

  • Production skimming is done on data and Monte Carlo
    – Currently have 189 skims defined
  • Skims vary a great deal in selection rate (from a fraction of a percent to ~10%)
  • Each skim can decide to be only a pointer, to deep-copy the micro, or to deep-copy the micro and mini
    – All include the Tag, Candidates and User data
    – Pointer skims require the underlying production collections (not available at all sites)
    – Deep-copy skims are expected to be more performant
  • Analysis runs on skims


Bookkeeping

  • RDBMS-based system
    – Supports Oracle and MySQL
  • Knows about collections (Data Set Entity)
  • Groups collections into datasets
    – Analysis is performed on datasets
    – Example datasets are:
      • AllEvents-Run5-OnPeak-R18 (data)
      • SP-998-Run4 (MC)
  • Tool to mirror databases to different sites
  • Have a key-distribution system to allow off-site access to databases; a quick command-line example follows
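
For a quick taste of the bookkeeping command-line interface (the full transcript is reproduced in the backup slides), listing the datasets that match a wildcard looks like this:

    BbkDatasetTcl -l '*BSemiExcl-Run4-*R16a'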


Simulation Production


Tier-A Sites

  • Currently have 6 Tier-A sites
    – SLAC (Prompt Calibration, analysis, simulation, skimming)
    – CC-IN2P3, France (analysis, simulation)
    – RAL (analysis, simulation)
    – Padova (Event Reconstruction, skimming, simulation)
    – GridKa, Germany (analysis, skimming, simulation)
    – CNAF, Italy (analysis)


Tier-A Sites (Cont.)

  • Tasks at each Tier-A site are based on local expertise and the needed level of resources
  • Countries receive a Common Fund rebate based on the resources they contribute (50% of the cost saving at SLAC; the other 50% is distributed to all other countries)
    – Actual usage is reported every six months to the International Finance Committee (funding agencies)


Data Distribution

  • Primary method uses the Bookkeeping tools to do distribution
    – Sites can choose to import certain datasets
      • Perhaps only the AOD, or the full AOD & ESD
    – A site can have a local database to remember which files have been imported
    – The Bookkeeping tools warn users if they do not have all of the data locally
  • All data import and export is via SLAC
    – Could set up other Tier-A sites for export
    – Cluster of servers; a sketch of a site-side import loop follows
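
A minimal sketch of what a site-side import loop could look like, combining xrdcp with a list of LFNs; the import.list file and destination layout are hypothetical, not BaBar's actual import tooling:

    # Hypothetical import loop (import.list and layout are assumptions)
    while read lfn; do
        mkdir -p "$BFROOT/kanga$(dirname "$lfn")"
        # Fetch each file from the SLAC xrootd servers into local storage
        xrdcp "root://kanolb-a:1094//$lfn" "$BFROOT/kanga$lfn"
    done < import.list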


Data Distribution (Cont.)

  • Recently decided to allocate datasets to Tier-A sites based on Analysis Working Groups (AWGs)
    – Each AWG has a set of skims associated with it
    – All the skims for an AWG are put at one site


Summary

  • BaBar has a very productive Distributed Computing system
  • For analysis, users have the inconvenience of using a specific site (one they may not have used before)
    – In the future the “Grid” is forecast to solve this


Backup Slides


BbkDatasetTcl

[yakut06] ~ > BbkDatasetTcl -l '*BSemiExcl-Run4-*R16a'
BbkDatasetTcl: 7 datasets found:-
BSemiExcl-Run4-OffPeak-R16a
BSemiExcl-Run4-OnPeak-R16a
SP-1005-BSemiExcl-Run4-R16a
SP-1235-BSemiExcl-Run4-R16a
SP-1237-BSemiExcl-Run4-R16a
SP-3429-BSemiExcl-Run4-R16a
SP-998-BSemiExcl-Run4-R16a
[yakut06] ~ > BbkDatasetTcl 'BSemiExcl-Run4-*R16a'
BbkDatasetTcl: wrote BSemiExcl-Run4-OffPeak-R16a.tcl (7 collections, 1300477/132941301 events, ~9990.6/pb)
BbkDatasetTcl: wrote BSemiExcl-Run4-OnPeak-R16a.tcl (73 collections, 22851621/1448776065 events, ~99532.6/pb)
Selected 80 collections, 24152098/1581717366 events, ~109523.1/pb, from 2 datasets


SLAC Usage

  – Extra disk space was originally made available for the CM2 conversion; ~80 TB to be freed of old Kanga+Objy
  – SLAC CPU time is a mix of dedicated and batch use

  [Plots: disk space; batch time + dedicated CPU]


IN2P3 Usage

  – Note: IN2P3 uses a dynamic staging system (HPSS)
  – Batch utilization has come back strong after a decline last summer

  [Plots: disk space; batch time]


RAL Usage

  – Actual disk space is slightly exceeding the 2004 MOU
  – Batch use peaked in October; the recent drop is the effect of transitioning away from old Kanga

  [Plots: disk space; batch time]


INFN Usage

  – Disk is already above the 2004 MOU, including CNAF
  – Dedicated CPU reached the 2004 MOU in Dec 2004; analysis has started to add to that

  [Plots: disk space; dedicated CPU + batch time]


GridKa Usage

  – Disk space reached the MOU in mid-2004
  – CPU usage continued the positive trend of the 1st half of 2004
  – With analysis use, it has peaked above the MOU level

  [Plots: disk space; batch time]