Operating the Distributed NDGF Tier-1 - PowerPoint PPT Presentation


SLIDE 1

Operating the Distributed NDGF Tier-1

Michael Grønager, Technical Coordinator, NDGF. International Symposium on Grid Computing 08, Taipei, April 10th 2008

SLIDE 2

ISGC08, Taipei, April 2008

Talk Outline

 What is NDGF?
 Why a distributed Tier-1?
 Services
   Computing
   Storage
   Databases
   VO Specific
 Operation
 Results

SLIDE 3

Nordic DataGrid Facility

 A Co-operative Nordic Data and Computing Grid facility
 Nordic production grid, leveraging national grid resources
 Common policy framework for Nordic production grid
 Joint Nordic planning and coordination
 Operate Nordic storage facility for major projects
 Co-ordinate & host major eScience projects (i.e., the Nordic WLCG Tier-1)
 Develop grid middleware and services
 NDGF 2006-2010
 Funded (2 M€/year) by the National Research Councils of the Nordic countries


SLIDE 4

Nordic DataGrid Facility

 Nordic Participation in Big Science:
 WLCG – the Worldwide LHC Computing Grid
 Gene databases for the bio-informatics sciences
 Screening of reservoirs suitable for CO2 sequestration
 ESS – European Spallation Source
 Astronomy projects
 Other...

SLIDE 5

Why a Distributed Tier-1?

SLIDE 6

Why a Distributed Tier-1?

 Computer centers are small and distributed

SLIDE 7

Why a Distributed Tier-1?

 Computer centers are small and distributed  Even the biggest adds up to 7

SLIDE 8

Why a Distributed Tier-1?

 Computer centers are small and distributed  Even the biggest adds up to 7  Strong Nordic HEP community

SLIDE 9

Why a Distributed Tier-1?

 Computer centers are small and distributed  Even the biggest adds up to 7  Strong Nordic HEP community  Technical reasons:  Added redundancy

SLIDE 10

Why a Distributed Tier-1?

 Computer centers are small and distributed  Even the biggest adds up to 7  Strong Nordic HEP community  Technical reasons:  Added redundancy  Only one 24x7 center

SLIDE 11

Why a Distributed Tier-1?

 Computer centers are small and distributed  Even the biggest adds up to 7  Strong Nordic HEP community  Technical reasons:  Added redundancy  Only one 24x7 center  Fast inter-Nordic network

SLIDE 12

Organization – Tier-1 related

SLIDE 13

Tier-1 Services

 Storage – Tape and Disk
 Computing – well connected to storage
 Network – part of the LHC OPN
 Databases:
   3D for e.g. ATLAS
   LFC for indexing files
 File Transfer Service
 Information systems
 Monitoring
 Accounting
 VO Services:
   ATLAS specific
   ALICE specific

SLIDE 14

Resources at Sites

 Storage is distributed  Computing is distributed  Many services are distributed  But the sites are heterogeneous...

SLIDE 15

Resources at Sites

SLIDE 16

Computing

 A distributed compute center uses a grid as its LRMS...  Need to run on all kinds of Linux distributions  Use resources optimally  Easy to deploy

SLIDE 17

Computing

 A distributed compute center uses a grid as its LRMS...  Need to run on all kinds of Linux distributions  Use resources optimally  Easy to deploy  NorduGrid/ARC!  Already deployed  Runs on all Linux flavors  Uses resources optimally

SLIDE 18

Computing

 A distributed compute center uses a grid as its LRMS...  Need to run on all kinds of Linux distributions  Use resources optimally  Easy to deploy  NorduGrid/ARC!  Already deployed  Runs on all Linux flavors  Uses resources optimally  gLite keeps nodes idle during up/download

SLIDE 19

Computing

 A distributed compute center uses a grid as its LRMS...  Need to run on all kinds of Linux distributions  Use resources optimally  Easy to deploy  NorduGrid/ARC!  Already deployed  Runs on all Linux flavors  Uses resources optimally  ARC uses the CE for data handling
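The contrast with gLite comes down to who moves the data. In ARC, the job description declares its inputs and outputs up front, so the CE stages files in before dispatching the job to a worker node and uploads results afterwards; no CPU slot sits idle waiting on a WAN transfer. A minimal xRSL sketch of such a job (the hostname, paths, and script name are invented for illustration):

```
&( executable = "run_analysis.sh" )
 ( jobName = "ndgf-demo" )
 ( cpuTime = "2 hours" )
 ( inputFiles = ( "events.root" "gsiftp://se.example.org/atlas/events.root" ) )
 ( outputFiles = ( "histos.root" "gsiftp://se.example.org/atlas/histos.root" ) )
 ( stdout = "out.log" )
 ( stderr = "err.log" )
```

The CE fetches `events.root` to the cluster before the batch job starts, so the worker node only ever runs compute.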

SLIDE 20

Storage

SLIDE 21

Storage

SLIDE 22

Storage

SLIDE 23

Storage

SLIDE 24

Storage

 dCache
   Java based – so runs even on Windows!
   Separation between resources and services
   Open source
 Pools at sites
 Doors and Admin nodes centrally
 Part of the development
   Added GridFTP2 to bypass door nodes in transfers
   Various improvements and tweaks for distributed use
 Central services at the GEANT endpoint
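A sketch of what the door bypass means in practice: the client still addresses a central door, but with the GridFTP2 extension the door only brokers the transfer, and the data channel is opened directly to the pool holding the file. An illustrative copy out of the distributed dCache instance (the endpoint and path below are hypothetical):

```shell
# With plain GridFTP all bytes would be relayed through the central door;
# with GridFTP2 the door redirects the data channel straight to the
# site-local pool, so bulk traffic never crosses the central services.
globus-url-copy -p 4 \
  gsiftp://door.ndgf.example.org:2811/pnfs/ndgf.example.org/data/atlas/file.root \
  file:///tmp/file.root
```

For a distributed Tier-1 this matters: the door at the GEANT endpoint stays a lightweight broker instead of becoming a bandwidth bottleneck for every pool in the Nordic countries.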

SLIDE 25

Storage

SLIDE 26

Network

[Network diagram: CERN (LHC) connected via NORDUnet to the NDGF AS (AS39590), with national IP networks, national switches, and central hosts at Örestaden, plus the sites HPC2N, PDC, NSC and others in DK/FI/NO/SE]

 Dedicated 10GE to CERN via GEANT (LHCOPN)  Dedicated 10GE between participating Tier-1 sites

SLIDE 27

Other Tier-1 Services

 Catalogue: RLS & LFC
 FTS – File Transfer Service
 3D – Distributed Database Deployment
 SGAS -> APEL
 Service Availability Monitoring – via ARC-CE SAM sensors

SLIDE 28

ATLAS Services

 So far part of Dulcinea
 Moving to PanDA
   The aCT (ARC Control Tower, aka “the fat pilot”)
   PanDA improves gLite performance through better data handling (similar to ARC)
 Moving RLS to LFC

SLIDE 29

ALICE Services

 Many VO Boxes – one per site
   Aalborg, Bergen, Copenhagen, Helsinki, Jyväskylä, Linköping, Lund, Oslo, Umeå
 Central VO Box integrating distributed dCache with xrootd
 Ongoing efforts to integrate ALICE and ARC

SLIDE 30

NDGF Facility - 2008Q1

SLIDE 31

Operations

SLIDE 32

Operation

SLIDE 33

Operation

 1st line support – (in operation)
   NORDUnet NOC – 24x7
 2nd line support – (in operation)
   Operator on Duty – 8x365
 3rd line support – (in operation)
   NDGF Operation Staff
   Sys Admins at sites
 Shared tickets with NUNOC

SLIDE 34

People

SLIDE 35

Results - Accounting

 According to the EGEE Accounting Portal for 2007:
   NDGF contributed 4% of all EGEE
   NDGF was the 5th biggest EGEE site
   NDGF was the 3rd biggest ATLAS Tier-1 worldwide
   NDGF was the biggest European ATLAS Tier-1

SLIDE 36

Results - Reliability

 NDGF has been running SAM tests since 2007Q3
 Overall 2007Q4 reliability was 96%
 Which made us the most reliable Tier-1 in the world

SLIDE 37

Results - Efficiency

 The efficiency of the NorduGrid cloud (NDGF + Tier-2/3s using ARC) was 93%
 Result was mainly due to:
   High middleware efficiency
   High reliability
 This was due to:
   Distributed setup
   Professional operation team

SLIDE 38

Worries

 Can reconstructions run on a distributed setup?
   High data throughput
   Low CPU consumption
 NDGF, TRIUMF and BNL reprocessed M5 data in February in CCRC08-1

  Shown to work
  Bottleneck was the 3D DB (which is running on only one machine)
SLIDE 39

Looking ahead...

 The Distributed Tier-1 is a success

 High efficiency  High reliability  Passed the CCRC08-1 tests

 Partnering with EGEE on:

 Operation (taking part in CIC on Duty)  Interoperability

 Tier-2s are being set up

 CMS will use gLite interoperability to run on ARC

SLIDE 40

Thanks!

Questions?