www.egi.eu EGI-InSPIRE RI-261323
EGI-InSPIRE
Grid Oversight, Status and Issues
Ron Trompert COD
9/19/12
History
– Transition from 10 Regional Operations Centres (ROCs) to the current 37 National Grid Initiatives (NGIs)
AP
– Start follow-up of A/R tickets
– Transition from SAM to Nagios
– Read naively, a 94% availability means that the grid is down for about 2 days every month
– But the grid is not down for 2 days every month: 94% is the average availability of the sites, not the availability of the grid
– If the availability of the grid is defined as the probability that the ops VO can store a file and run a job on the grid, that availability is much higher
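A quick back-of-the-envelope check of these two statements, written as a minimal Python sketch. The helper names `monthly_downtime_days` and `grid_availability` are hypothetical, and the grid-level figure rests on a loudly simplifying assumption: sites fail independently, and the ops VO needs only one usable site (storage and compute both up) to store a file and run a job. Correlated failures, such as a top-BDII or monitoring outage, break that assumption.

```python
import math


def monthly_downtime_days(availability: float, days_in_month: float = 30.0) -> float:
    """Expected time (in days) a single site is unavailable during one month."""
    return (1.0 - availability) * days_in_month


def grid_availability(site_availabilities: list[float]) -> float:
    """Probability that at least one site is usable.

    ASSUMPTION: site failures are independent, and one usable site
    (storage and compute both up) is enough for the ops VO to store a
    file and run a job. Correlated outages invalidate this model.
    """
    # Probability that every site is down at the same moment.
    p_all_down = math.prod(1.0 - a for a in site_availabilities)
    return 1.0 - p_all_down


if __name__ == "__main__":
    print(f"{monthly_downtime_days(0.94):.1f} days")  # 1.8 days
    print(grid_availability([0.94] * 10))
```

With 94% per site, the first helper gives 1.8 days in a 30-day month, which is the "about 2 days" above. Under the independence assumption, even ten such sites are all down at once with probability 0.06^10, roughly 6 × 10^-13, which is why the grid-level availability is so much higher than the per-site average.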
– Is the monthly follow-up of the A/R metrics beneficial?
– If this activity is stopped, will the A/R drop?
Start follow-up RPI
– Holidays and, in the past, weekends
– Ignored alarms
– Regional SE down
– Nagios problems
– Top-BDII problems
– Closed in non-OK status
– Bad coordination
– Holidays
– Ignored alarms
– Problems with monitoring system
– Non-production service
– These alarms should have been handled.
– Bad coordination
– People go on holiday and forget to hand over their shift to a colleague
– People who forgot that they were on shift
– Some NGIs “certify” sites to get them to make the
– This is how it should go down:
– Should not be closed in principle and a ticket
– There are cases when it is OK to close them
– Sometimes an alarm is closed with the
– Please have a look at the “Performance