SLIDE 1

Data Management in ATLAS

Angelos Molfetas on behalf of the ATLAS DQ2 team


SLIDE 2

ATLAS DDM COLLABORATION

A.Molfetas (CERN), F.Barreiro (CERN), A.Tykhonov (Jožef Stefan Institute), V.Garonne (CERN), S.Campana (CERN), M.Lassnig (CERN), M.Barisits (Vienna University of Technology), D.Zang (Institute of high energy physics, Chinese Academy of Sciences), C.Serfon (LMU Munich), P.Calfayan (LMU Munich), D.Oleynik (Joint Institute for Nuclear Research), D.Kekelidze (Joint Institute for Nuclear Research) , A.Petrosyan (Joint Institute for Nuclear Research), S.Jezequel (IN2P3), I.Ueda (University of Tokyo), Gancho Dimitrov (Deutsches Elektronen- Synchrotron), Florbela Tique Aires Viegas (CERN)

SLIDE 3

Introduction

¤ Presentation intended for a general audience
¤ Current issues & trends
¤ Covers some of the issues we are facing in ATLAS Distributed Data Management (DDM)
¤ ATLAS grid:
  ¤ Over 800 end points
  ¤ Petabytes of data managed on the grid
  ¤ The system responsible for this is the DQ2 middleware

SLIDE 4

The ATLAS Computing Model

¤ Grid Sites are organised in Tiers

¤ Tier-0 ¤ record RAW detector data ¤ distributed data to Tier-1s ¤ calibration and first-pass reconstruction ¤ Tier-1s ¤ permanent storage ¤ capacity for reprocessing and bulk analysis ¤ Tier-2s ¤ Monte-Carlo simulation ¤ user analysis

[Diagram: data flow between the Tiers: Tier-0 (online filter farm and reconstruction farm), Tier-1 (re-reconstruction and analysis farms), Tier-2 (Monte Carlo and analysis farms); RAW, ESD and AOD data move between them]

SLIDE 5

The ATLAS Computing Model

¤ Sites are also organised in clouds

¤ not the “computer science” definition of clouds, though!

¤ Every cloud has a major Tier-1 and associated Tier-2s
¤ Clouds are mostly geographical and/or political, reflecting:
  ¤ support
  ¤ deployment
  ¤ funding

[Map: example clouds. The FR cloud groups the LYON Tier-1 with Tier-2s such as Tokyo, GRIF, LPC, Clermont, LAPP, CCPM, Roumanie and Pékin; the BNL cloud groups the BNL Tier-1 with NET2, NW, GL, SW, SLAC, TWT2 and Melbourne, plus a T3. Other Tier-1s include FZK, TRIUMF, ASGC, PIC, SARA, RAL, CNAF and NG, with CERN as the Tier-0]

SLIDE 6

DQ2 (Don Quijote 2)

¤ DQ2 enforces dataset

¤ placement ¤ replication ¤ deletion ¤ access ¤ consistency ¤ monitoring ¤ accounting

[Diagram: DQ2 architecture. DQ2 clients & API sit on a common modular framework used by production, analysis, interactive and physics-metadata workflows; central catalogs (repository, content, location, accounting, subscription, tracer, data export) and site services (transfer, deletion, consistency) run on top of the WLCG infrastructures: the LHC Computing Grid, the Open Science Grid and NorduGrid]

SLIDE 7

Managing Heterogeneous Resources

¤ Users need to be able to:
  ¤ download/upload data from the grid
  ¤ transfer data between sites
¤ Users should not need to know about each storage system
¤ Many different mass storage systems are used, so we need a simplified interface that hides the grid’s heterogeneity
¤ Not trivial
¤ In ATLAS this is done by the DQ2 middleware and abstraction layers like SRM
¤ For example:
  ¤ A user downloads a dataset via the CLI: “dq2-get user.angelos.xxxxxxx”
  ¤ No specific knowledge is required about CASTOR, dCache, xrootd, etc.
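The dq2-get example can be pictured as a thin dispatch layer: the client selects a backend implementation from each replica's protocol, so the user never interacts with CASTOR or dCache directly. A hypothetical sketch (class names and copy commands are illustrative, not the actual DQ2 internals):

```python
class StorageBackend:
    """Common interface that hides each mass storage system's specifics."""
    def download_command(self, replica_url, local_path):
        raise NotImplementedError

class CastorBackend(StorageBackend):
    def download_command(self, replica_url, local_path):
        return f"rfcp {replica_url} {local_path}"   # CASTOR-specific copy tool

class DCacheBackend(StorageBackend):
    def download_command(self, replica_url, local_path):
        return f"dccp {replica_url} {local_path}"   # dCache-specific copy tool

BACKENDS = {"castor": CastorBackend(), "dcache": DCacheBackend()}

def dq2_get(dataset, replicas):
    """Resolve each replica's protocol and delegate to the right backend."""
    commands = []
    for url in replicas[dataset]:
        scheme = url.split("://", 1)[0]
        commands.append(BACKENDS[scheme].download_command(url, dataset))
    return commands
```

Adding a new storage system then only means registering one more backend; the user-facing command never changes.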

SLIDE 8

Catalogs

¤ Maintain global state of data (central catalog of all datasets on the grid)
  ¤ This has to scale
  ¤ It is a central point of failure
¤ In ATLAS we also have Local File Catalogs (LFCs), which have to be maintained as well
¤ For example, uploading data to the grid:
  ¤ dq2-put -s files_location user.angelos.xxxxxxx
  ¤ Has to handle different storage systems
  ¤ Has to register files in the central catalogs
  ¤ Has to register files in the LFC
¤ Not trivial, e.g. the order of operations in dq2-put can create dark data (files on storage that no catalog references)
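The dark-data hazard can be illustrated with a toy model of the upload sequence (function and variable names are hypothetical, not dq2-put's actual implementation): if the client crashes after the physical copy but before the catalog registrations, the file exists on storage yet no catalog knows about it.

```python
def dq2_put(files, storage, lfc, central_catalog):
    """Toy upload sequence: copy first, register afterwards."""
    for f in files:
        storage.append(f)           # 1. physical copy to the storage system
    for f in files:
        lfc.append(f)               # 2. register replica in the Local File Catalog
    for f in files:
        central_catalog.append(f)   # 3. register in the central dataset catalogs

def dark_data(storage, lfc):
    """Files present on storage that the catalog does not reference."""
    return [f for f in storage if f not in lfc]
```

A crash between steps 1 and 2 leaves files that only a storage-versus-catalog comparison like dark_data can find, which is essentially what a consistency service has to do, but at grid scale.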

SLIDE 9

Maintaining Consistency

¤ Consistency service for identifying data corruption on the grid
¤ Have to maintain awareness of changing datasets on the grid. For example, if we replicate dataset user.angelos.xxxxx to sites A, B and C, and the dataset then changes, the changes have to be propagated
¤ At the ATLAS scale we need to enforce the concept of dataset immutability
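A consistency check of this kind boils down to comparing each replica's checksum against the reference value stored in the catalog. A hypothetical sketch (the real service is far more elaborate; md5 is used here only for brevity):

```python
import hashlib

def checksum(data: bytes) -> str:
    """Reference checksum for a file's content (md5 here for brevity)."""
    return hashlib.md5(data).hexdigest()

def find_bad_replicas(catalog, site_replicas):
    """catalog: {filename: reference checksum}
    site_replicas: {site: {filename: content bytes}}
    Returns (site, filename) pairs whose content disagrees with the catalog."""
    bad = []
    for site, files in site_replicas.items():
        for name, data in files.items():
            ref = catalog.get(name)
            if ref is not None and checksum(data) != ref:
                bad.append((site, name))
    return bad
```

Immutability is what makes this tractable: if datasets never change after closing, a single reference checksum per file stays valid for every replica forever.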

SLIDE 10

Replication policy

¤ Replication is largely driven by the ATLAS Computing Model
¤ Datasets are marked as:
  ¤ Primary: mandated by the Computing Model
  ¤ Secondary: in excess of the Computing Model
¤ Secondary replicas are reduced according to popularity
¤ Determining the popularity of datasets:
  ¤ collecting traces
  ¤ aggregating traces
¤ Problems with the current approach motivate more dynamic approaches
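Trace aggregation and popularity-driven cleanup can be sketched as follows (helper names are hypothetical; in production, traces are collected from every client access):

```python
from collections import Counter

def aggregate_traces(traces):
    """traces: iterable of (user, dataset) access events -> access counts."""
    return Counter(dataset for _user, dataset in traces)

def deletion_candidates(popularity, secondary_replicas, n):
    """Pick the n least-accessed secondary replicas for cleanup.
    Primary replicas are never considered: the Computing Model mandates them."""
    return sorted(secondary_replicas, key=lambda ds: popularity.get(ds, 0))[:n]
```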

SLIDE 11

Scalability

¤ At the grid level, scalability is a primary concern
¤ New technologies
¤ Sevenfold increase of file events over the year
¤ Disk I/O is the bottleneck
¤ Parkinson's Law

[Chart: “File Events on the Grid”, growing over the year towards roughly 8×10^7]

SLIDE 12

Trends

¤ Moving towards a metadata-driven model, rather than the hierarchical container -> dataset -> file model
¤ Increased emphasis on searching by metadata
¤ Simplification of services, consolidation (e.g. consolidation of LFCs)
¤ Optimisation by simulation
¤ Move to open protocols
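A metadata-driven model replaces hierarchy walks with attribute queries. A minimal sketch of the idea (catalog layout and field names are invented for illustration):

```python
def search_by_metadata(catalog, **criteria):
    """catalog: {dataset name: {metadata key: value}}.
    Return datasets matching every given key/value criterion."""
    return [name for name, meta in catalog.items()
            if all(meta.get(k) == v for k, v in criteria.items())]
```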

SLIDE 13

Summary

¤ Major Issues:
  ¤ Scalability
  ¤ Consistency
  ¤ Replication policy
  ¤ Heterogeneity
¤ Trends:
  ¤ Addressing scalability
  ¤ Metadata
  ¤ Simplification of services
  ¤ Simulation

SLIDE 14

Backup slides

SLIDE 15

SRM and Space Tokens

¤ Storage systems implement a common interface:
  ¤ the Storage Resource Manager (SRM)
  ¤ gridftp as the common transfer protocol
  ¤ storage-specific access protocols
¤ Space Tokens:
  ¤ partitioning of storage resources according to activities
¤ Each ATLAS site endpoint is identified by a site name and the corresponding space token, for example:
  ¤ DESY-ZN_PRODDISK
  ¤ 'srm': 'token:ATLASPRODDISK:srm://lcg-se0.ifh.de:8443/srm/managerv2?SFN=/pnfs/ifh.de/data/atlas/atlasproddisk/'
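The endpoint string above follows the pattern 'token:&lt;SPACETOKEN&gt;:&lt;srm URL&gt;?SFN=&lt;storage path&gt;', which a client can split mechanically. A hypothetical parser sketch (not part of DQ2's actual API):

```python
from urllib.parse import urlsplit

def parse_srm_endpoint(value):
    """Split 'token:<SPACETOKEN>:srm://host:port/path?SFN=<sfn>' into parts."""
    _, space_token, srm_url = value.split(":", 2)
    parts = urlsplit(srm_url)
    sfn = parts.query.split("SFN=", 1)[1]   # storage file namespace path
    return space_token, parts.hostname, parts.port, sfn
```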

[Diagram: SRM requests are dispatched to the back-end storage systems (CASTOR, dCache, DPM, StoRM, BeStMan), which also serve gridFTP transfers and local access]