Operating the Distributed NDGF Tier-1
Michael Grønager, Technical Coordinator, NDGF
International Symposium on Grid Computing 08, Taipei, April 10th 2008
ISGC08, Taipei, April 2008
Talk Outline
- What is NDGF?
- Why a distributed Tier-1?
- Services
  - Computing
  - Storage
  - Databases
  - VO specific
- Operation
- Results
Nordic DataGrid Facility
A co-operative Nordic data and computing grid facility:
- Nordic production grid, leveraging national grid resources
- Common policy framework for the Nordic production grid
- Joint Nordic planning and coordination
- Operate a Nordic storage facility for major projects
- Co-ordinate and host major eScience projects (e.g., the Nordic WLCG Tier-1)
- Develop grid middleware and services
NDGF 2006-2010:
- Funded (2 M€/year) by the National Research Councils of the Nordic countries
[Figure: NOS-N and the DK, SF, N, S research councils; Nordic Data Grid Facility]
Nordic DataGrid Facility
Nordic participation in Big Science:
- WLCG – the Worldwide LHC Computing Grid
- Gene databases for bio-informatics sciences
- Screening of reservoirs suitable for CO2 sequestration
- ESS – European Spallation Source
- Astronomy projects
- Other...
Why a Distributed Tier-1?
- Computer centers are small and distributed
  - Even the biggest adds up to 7
- Strong Nordic HEP community
- Technical reasons:
  - Added redundancy
  - Only one 24x7 center
  - Fast inter-Nordic network
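Why does spreading the Tier-1 across sites add redundancy? A minimal sketch, assuming a service or dataset is replicated at several sites, stays usable as long as any one replica site is up, and that site outages are independent (a strong simplification: shared network links or central services break it). The function name and per-site availability figures are hypothetical, not NDGF measurements.

```python
from math import prod


def replicated_availability(site_availabilities):
    """Probability that at least one replica site is up.

    Assumes independent site failures, so P(all sites down) is the
    product of the per-site downtime fractions.
    """
    return 1.0 - prod(1.0 - a for a in site_availabilities)


if __name__ == "__main__":
    # Hypothetical per-site availabilities, for illustration only.
    one_site = [0.95]
    three_sites = [0.95, 0.95, 0.95]
    print(f"1 site:  {replicated_availability(one_site):.4f}")
    print(f"3 sites: {replicated_availability(three_sites):.6f}")
```

Only replicated components gain from this; a service that runs on a single central host does not.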
Organization – Tier-1 related
Tier-1 Services
- Storage – tape and disk
- Computing – well connected to storage
- Network – part of the LHC OPN
- Databases:
  - 3D (e.g. for ATLAS)
  - LFC for indexing files
- File Transfer Service
- Information systems
- Monitoring
- Accounting
- VO services:
  - ATLAS specific
  - ALICE specific
Resources at Sites
- Storage is distributed
- Computing is distributed
- Many services are distributed
- But the sites are heterogeneous...
Computing
A distributed compute center uses a grid as its LRMS (local resource management system)...
- Must run on all kinds of Linux distributions
- Must use resources optimally
- Must be easy to deploy
NorduGrid/ARC!
- Already deployed
- Runs on all Linux flavors
- Uses resources optimally
  - gLite keeps nodes idle during upload/download
  - ARC uses the CE for data handling
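The data-handling point above can be made concrete with a toy model. Assume a job needs input staged in, some compute time, and output staged out. If the job stages its own data on the worker node (the pattern the slide attributes to gLite), the batch slot is held for all three phases; if the CE front-end does the staging (the ARC approach), the slot is held only for compute. The `Job` fields and timings are hypothetical, and the model ignores CE-side load and queueing.

```python
from dataclasses import dataclass


@dataclass
class Job:
    stage_in_s: float   # seconds to download input files
    compute_s: float    # seconds of actual computation
    stage_out_s: float  # seconds to upload output files


def slot_time(job: Job, ce_does_staging: bool) -> float:
    """Seconds a worker-node batch slot is occupied by the job."""
    if ce_does_staging:
        # The CE stages data before and after; the slot is held only for compute.
        return job.compute_s
    # The job moves its own data, so the slot sits idle during transfers.
    return job.stage_in_s + job.compute_s + job.stage_out_s


def slot_efficiency(job: Job, ce_does_staging: bool) -> float:
    """Fraction of the occupied slot time spent computing."""
    return job.compute_s / slot_time(job, ce_does_staging)


if __name__ == "__main__":
    job = Job(stage_in_s=600, compute_s=3000, stage_out_s=300)  # hypothetical
    print(f"CE-side staging:     {slot_efficiency(job, True):.0%}")
    print(f"worker-side staging: {slot_efficiency(job, False):.0%}")
```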
Storage
dCache
- Java based, so it runs even on Windows!
- Separation between resources and services
- Open source
- Pools at sites
- Doors and admin nodes centrally
- Part of the development:
  - Added GridFTP2 to bypass door nodes in transfers
  - Various improvements and tweaks for distributed use
- Central services at the GEANT endpoint
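The split between central doors and site-local pools, and the reason for bypassing door nodes, can be illustrated with a toy model. This is not dCache's API or its pool-selection algorithm: the `Pool` and `Door` classes and the "most free space" policy are hypothetical. The sketch only shows that in a proxied transfer every byte crosses the central door, whereas in a direct (GridFTP2-style) transfer the door just picks a pool and the data flows from the client straight to that pool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pool:
    name: str
    site: str
    free_bytes: int


@dataclass
class Door:
    """Hypothetical central door node in front of pools at many sites."""
    pools: List[Pool]
    bytes_through_door: int = 0

    def select_pool(self, size: int) -> Pool:
        # Toy policy: pick the pool with the most free space that fits the file.
        candidates = [p for p in self.pools if p.free_bytes >= size]
        if not candidates:
            raise RuntimeError("no pool has room for this file")
        return max(candidates, key=lambda p: p.free_bytes)

    def write(self, size: int, direct: bool) -> Pool:
        pool = self.select_pool(size)
        pool.free_bytes -= size
        if not direct:
            # Proxied transfer: the data stream is relayed through the door.
            self.bytes_through_door += size
        # Direct transfer: the door only brokers; data goes client -> pool.
        return pool


if __name__ == "__main__":
    door = Door([Pool("pool-a", "site-1", 10**12), Pool("pool-b", "site-2", 5 * 10**11)])
    door.write(2 * 10**9, direct=False)  # proxied through the door
    door.write(2 * 10**9, direct=True)   # door bypassed
    print(f"bytes relayed by the door: {door.bytes_through_door}")
```

With pools spread over the sites and the door at one central location, relaying data through the door would funnel all site traffic through a single host; the direct path keeps bulk data between the client and the site.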
Network
[Figure: NDGF network diagram showing CERN LHC, the NORDUnet NREN, national IP networks and national switches (SE, NO, FI, DK), national sites (Örestaden, HPC2N, PDC, NSC, ...), central host(s), and the NDGF AS (AS39590)]
- Dedicated 10GE to CERN via GEANT (LHCOPN)
- Dedicated 10GE between participating Tier-1 sites
Other Tier-1 Services
- Catalogue: RLS and LFC
- FTS – File Transfer Service
- 3D – Distributed Database Deployment
- Accounting: SGAS -> APEL
- Service Availability Monitoring – via ARC-CE SAM sensors
ATLAS Services
- So far part of Dulcinea
- Moving to PanDA
- The aCT (ARC Control Tower, aka "the fat pilot")
- PanDA improves gLite performance through better data handling (similar to ARC)
- Moving RLS to LFC
ALICE Services
- Many VO boxes – one per site:
  Aalborg, Bergen, Copenhagen, Helsinki, Jyväskylä, Linköping, Lund, Oslo, Umeå
- Central VO box integrating distributed dCache with xrootd
- Ongoing efforts to integrate ALICE and ARC
NDGF Facility - 2008Q1
Operations
- 1st line support (in operation): NORDUnet NOC, 24x7
- 2nd line support (in operation): Operator on Duty, 8x365
- 3rd line support (in operation): NDGF operation staff and sys admins at sites
- Shared tickets with NUNOC
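The three support lines can be sketched as a simple escalation path. A minimal illustration, assuming a ticket always enters at the 24x7 NOC, can move on to the Operator on Duty only during the 8-hour daily duty window, and reaches NDGF operation staff or site sys admins only when expert or site-level work is needed. The duty-window times and the `escalation_path` function are hypothetical; the slides only give "8x365".

```python
from datetime import datetime, time
from typing import List

# Hypothetical duty window; the slides only state "8x365" (8 hours a day, every day).
OOD_START = time(8, 0)
OOD_END = time(16, 0)


def escalation_path(received: datetime, needs_expert: bool) -> List[str]:
    """Support lines that act on a ticket at the moment it arrives (toy model)."""
    path = ["1st line: NORDUnet NOC (24x7)"]
    if not (OOD_START <= received.time() < OOD_END):
        # Outside the duty window the NOC holds the ticket until the next shift.
        return path
    path.append("2nd line: Operator on Duty (8x365)")
    if needs_expert:
        path.append("3rd line: NDGF operation staff / site sys admins")
    return path


if __name__ == "__main__":
    print(escalation_path(datetime(2008, 4, 10, 3, 0), needs_expert=True))
    print(escalation_path(datetime(2008, 4, 10, 10, 0), needs_expert=True))
```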
People
Results - Accounting
According to the EGEE Accounting Portal for 2007:
- NDGF contributed 4% of the EGEE total
- NDGF was the 5th biggest EGEE site
- NDGF was the 3rd biggest ATLAS Tier-1 worldwide
- NDGF was the biggest European ATLAS Tier-1
Results - Reliability
- NDGF has been running SAM tests since 2007Q3
- Overall 2007Q4 reliability was 96%
  - which made us the most reliable Tier-1 in the world
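For reading the 96% figure: in WLCG SAM-based reporting, availability is commonly defined as the fraction of time a site passes its tests, while reliability removes scheduled downtime from the denominator, so announced maintenance does not count against the site. A minimal sketch of both ratios, using hypothetical hours (not NDGF's actual 2007Q4 data) and ignoring periods where the test status is unknown:

```python
def availability(up_h: float, total_h: float) -> float:
    """Fraction of the period the site passed its tests."""
    return up_h / total_h


def reliability(up_h: float, total_h: float, scheduled_down_h: float) -> float:
    """Like availability, but scheduled downtime is removed from the denominator."""
    return up_h / (total_h - scheduled_down_h)


if __name__ == "__main__":
    total = 92 * 24  # one quarter (Oct-Dec), in hours
    scheduled = 48   # hypothetical announced maintenance
    up = 2074        # hypothetical hours with passing tests
    print(f"availability: {availability(up, total):.1%}")
    print(f"reliability:  {reliability(up, total, scheduled):.1%}")
```

Because scheduled downtime is excluded, reliability is never lower than availability for the same period.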
Results - Efficiency
The efficiency of the NorduGrid cloud (NDGF + Tier-2/3s using ARC) was 93%.
The result was mainly due to:
- High middleware efficiency
- High reliability
These in turn were due to:
- The distributed setup
- A professional operation team
Worries
Can reconstruction run on a distributed setup?
- High data throughput
- Low CPU consumption
NDGF, TRIUMF and BNL reprocessed M5 data in February as part of CCRC08-1:
- Shown to work
- The bottleneck was the 3D DB (which runs on only one machine)
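Why a single database machine can cap reprocessing: a minimal throughput sketch, assuming each job issues a fixed number of conditions-database queries and the database serves a fixed query rate that scales linearly with added capacity. Sustained job throughput is then the smaller of what the CPUs and what the database can deliver. The function name and all numbers are hypothetical, not CCRC08-1 measurements.

```python
def max_job_rate(cpu_slots: int, job_s: float,
                 db_queries_per_job: float, db_queries_per_s: float) -> float:
    """Jobs per second the system can sustain (toy two-stage bottleneck model)."""
    cpu_rate = cpu_slots / job_s                     # limit set by compute
    db_rate = db_queries_per_s / db_queries_per_job  # limit set by the database
    return min(cpu_rate, db_rate)


if __name__ == "__main__":
    # Hypothetical: 1000 slots, 1-hour jobs, 500 DB queries per job.
    one_db = max_job_rate(1000, 3600, 500, 50)      # one database machine
    three_dbs = max_job_rate(1000, 3600, 500, 150)  # database capacity tripled
    print(f"1x DB capacity: {one_db:.3f} jobs/s")
    print(f"3x DB capacity: {three_dbs:.3f} jobs/s")
```

While the database is the limiting stage, adding CPUs does not help; adding database capacity does, until compute becomes the limit again.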
Looking ahead...
The distributed Tier-1 is a success:
- High efficiency
- High reliability
- Passed the CCRC08-1 tests
Partnering with EGEE on:
- Operation (taking part in CIC on Duty)
- Interoperability
Tier-2s are being set up
CMS will use gLite interoperability to run on ARC