Grid Deployment & Operations in the UK
Jeremy Coles GridPP Production Manager UK&I Operations for EGEE J.Coles@rl.ac.uk
Wednesday 3rd May ISGC 2006, Taipei
Overview
1 Background to e-Science: the UK Grid projects NGS & GridPP
2 The deployment and operations models and vision
3 GridPP performance measures
4 Progress in GridPP against LCG requirements
5 Future plans
6 Summary
1 Background to e-Science: the UK Grid projects NGS & GridPP
– Application-focused/led developments
– Varying degree of “infrastructure” …
‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’
John Taylor, Director General of Research Councils, Office of Science and Technology
http://www.rcuk.ac.uk/escience/
LHC, ISIS TS2, HPCx + HECToR
Users get common access, tools, information and nationally supported services through NGS
Integrated internationally
VRE, VLE, IE
Regional and Campus grids
Community Grids
cover traditional computational sciences
– Both user and pre-installed software
Several data-focused activities
– Distributed data and/or collaborators
collaborations
– Explicitly encourage new users
– Common infrastructure/interfaces
Consisted of 3 partners in EGEE-I:
National Grid Service (NGS)
[Figure: Number of registered NGS users against date, 14 January 2004 to 14 December 2005 (axis 0 to 300 users). Series: NGS user registrations, with a linear trend line.]
Grid-Ireland focus:
built over the Higher Education Authority network
EGEE components
and a Tier-1 as per the LCG Tier model
In EGEE-II:
and Grid-Ireland unchanged
GridPP Tier-2s becomes a partner.
UK Core e-Science Programme Institutes Tier-2 Centres CERN LCG EGEE
Tier-1/A Middleware, Security, Networking Experiments Grid Support Centre
Not to scale!
Apps Dev Apps Int
Production Manager
NorthGrid Coordinator, SouthGrid Coordinator, ScotGrid Coordinator, London Tier-2 Coordinator
Tier-2 support (×4), Site Administrator (×4)
Tier-1 Manager
Tier-1 Technical Coordinator
Tier-1 support & administrators
Storage Group, Networking Group, VOMS support, Catalogue support, Helpdesk support
Tier-2 Board, Tier-1 Board, Deployment Board, User Board, Project Management Board, Collaboration Board, Oversight Committee
Supporting dCache
Supporting DPM
Supporting network testing
Recent output from SOME areas follows…
Example activities from across these areas
How effectively are resources being used?
A script developed at the Tier-1 uses one simple measure: sum(CPU time) / sum(wall time). Low efficiencies in 2005 were generally caused by a few jobs making the overall situation look bad.
http://www.gridpp.rl.ac.uk/stats/ Problems with SEs 2006
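The efficiency measure above can be sketched in a few lines of Python. This is only an illustration and not the Tier-1 script itself: the `Job` record, its field names and the sample numbers are all hypothetical. It shows how a single long job that uses almost no CPU can pull the aggregate ratio down sharply.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Job:
    """Hypothetical accounting record for one batch job."""
    cpu_seconds: float
    wall_seconds: float


def cpu_efficiency(jobs: Iterable[Job]) -> Optional[float]:
    """Return sum(CPU time) / sum(wall time), or None if no wall time was recorded."""
    jobs = list(jobs)
    total_cpu = sum(j.cpu_seconds for j in jobs)
    total_wall = sum(j.wall_seconds for j in jobs)
    if total_wall == 0:
        return None
    return total_cpu / total_wall


# Hypothetical sample: two healthy jobs and one job stalled for ten hours.
jobs = [Job(3600, 4000), Job(3500, 3600), Job(10, 36000)]

overall = cpu_efficiency(jobs)              # dominated by the stalled job
without_stalled = cpu_efficiency(jobs[:2])  # same measure without it
```

Because the measure is a ratio of sums rather than an average of per-job ratios, long-running jobs carry more weight, which is consistent with the observation that a few jobs can make a whole period look bad.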
http://gridportal.hep.ph.ic.ac.uk/rtm/reports.html * Data shown for Q4 2005
What are the underlying reasons for big differences in overall efficiency?
Does the usage distribution make sense?
Operations needs to check mappings and discover why some sites are not used.
Storage provided
Scheduled downtime
SFT failures
[Figure: Hours of scheduled downtime per site for October, November and December (axis 0 to 600 hours). Sites shown: Liverpool, Manchester*, UCL-HEP, Brunel, Sheffield, Durham, Queen Mary UL, Birmingham, Royal Holloway UL, IC HEP, UCL-CCC, IC LeSC**, Glasgow, Edinburgh, Oxford, Cambridge, RALPP, RAL-Tier-1, Lancaster, Bristol.]
[Figure: Average occupancy and contribution to UK Tier-2 processing per site (axis 0% to 50%).]
[Figure: Number of critical SFT failures per site for October, November and December (axis 0 to 140).]
[Figure: Number of tickets and average time in hours to resolve tickets per site, Q3 & Q4 2005 (axis 0 to 120).]
[Figure: Number of supported VOs per site (axis 0 to 16).]
WHAT MAKES A SITE BETTER (beyond manpower)?
periods
data!
meeting MoU/SLA targets
Example: Tier-2 individual transfer tests
172 Mb/s
QMUL IC-HEP
461 Mb/s
Birmingham
456 Mb/s
Oxford
74 Mb/s
Cambridge
193 Mb/s
Durham
Cam 289 Mb/s Birmingham 252 Mb/s Oxford Durham 118 Mb/s QMUL 388 Mb/s
RAL-PPD IC-HEP
331 Mb/s
Glasgow
440 Mb/s
Edinburgh
150 Mb/s
Manchester Lancaster
397 Mb/s 84 Mb/s 166 Mb/s 156 Mb/s 350 Mb/s ~800 Mb/s
RAL Tier-1
RAL-PPD IC-HEP Glasgow Edinburgh Manchester Lancaster RAL Tier-1
Receiving
Example rates from throughput tests
Site connectivity (and contention)
SRM setup and level of optimisation
Scheduling tests was not straightforward
Status of hardware deployment
http://wiki.gridpp.ac.uk/wiki/Service_Challenge_Transfer_Tests
Initial focus was on getting SRMs understood and deployed…
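For readers comparing the rates quoted above, a minimal Python sketch of how an average transfer rate in Mb/s can be derived from a test run follows. It assumes that "Mb" means megabits with 1 Mb = 10**6 bits; the function name and the sample transfer size and duration are hypothetical and are not taken from the GridPP tests.

```python
def throughput_mbps(bytes_transferred: int, seconds: float) -> float:
    """Average rate in megabits per second, assuming 1 Mb = 10**6 bits."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    bits = bytes_transferred * 8
    return bits / 1e6 / seconds


# Hypothetical test: 10**12 bytes moved in six hours.
rate = throughput_mbps(10**12, 6 * 3600)
print(f"{rate:.0f} Mb/s")
```

An average like this hides stalls and retries within the run, so two sites with the same figure may have behaved quite differently during the test.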
Example: Tier-1 & Tier-2 combined transfer tests
http://wiki.gridpp.ac.uk/wiki/SC4_Aggregate_Throughput
Lancaster
Tier-1 & Tier-2 combined transfer tests: rerun
http://wiki.gridpp.ac.uk/wiki/SC4_Aggregate_Throughput
SRM deployments are now stable and focus has shifted to improving site configurations and optimisations
Sites are now more comfortable with the release/reporting process but concerns remain: gLite 3.0
tests to include such things as sustained simultaneous reading and writing
Several sites are receiving new equipment; we need to ensure a smooth
the Tier-2 capabilities.
integration and automation is far from ideal.
Security: several areas, but extending the ROC security challenge and implementing an approach for joint logging are in progress.
Summary
1 UK e-science has a broad vision with NGS a central part
2 There will be increasing interoperation between UK activities
3 The UK particle physics grid remains one of the largest projects
4 Operational focus will shift to performance measures
5 Progress being made for LHC pilot service but not always smoothly
6 There are clear areas where further work is required