The network monitoring in grid context


Enabling Grids for E-sciencE The network monitoring in grid context Operations Perspective Emir Imamagic / SRCE EGEE’09, Barcelona, Spain www.eu-egee.org


  1. Enabling Grids for E-sciencE The network monitoring in grid context Operations Perspective Emir Imamagic / SRCE EGEE’09, Barcelona, Spain www.eu-egee.org

  2. Overview • Monitoring In Operations • Service Availability Monitoring – Architecture – Network Monitoring • Performance Monitoring • Possible Future Work • Conclusion EGEE-III INFSO-RI-222667

  3. Monitoring In Operations • Provide site and grid operators with the means to monitor their resources • Focus on improving availability and reliability by spotting problems and issuing alarms • Define procedures for escalation and resolution of more complex problems

  4. Service Availability Monitoring (schema provided by Karolis Eigelis)

  5. The New Architecture (schema provided by Karolis Eigelis)

  6. The New Architecture

  7. Which Other Systems Are Used? • Database components – Aggregated Topology Provider (ATP) – Metric Description Database (MDDB) • Operations services – GOCDB, ENOC, OIM • Grid information services – BDII

  8. What Do We Check? • SAM probes – various grid services (CE, WN and SRM) • WLCG probes (SRCE, CERN) – various grid services (e.g. GridFTP, LFC) • BDII & Gstat probes – validation of content in information system BDII • Nagios native probes – standard services (e.g. web, ftp, ssh servers)

  9. Network Monitoring • Collaboration with ENOC – integration of ENOC Downcollector features into SAM • Added lightweight service checks – based on nmap – executed with high frequency – used for masking other alarms
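As an illustration of the lightweight checks on slide 9, here is a minimal sketch of a high-frequency port check and of alarm masking. It is not the SAM implementation: the slide says the checks were based on nmap, while this sketch substitutes a plain TCP connect from the Python standard library. The function names `check_tcp_port` and `mask_alarms` are hypothetical, and the masking rule (drop alarms from heavier probes when the lightweight check for the same service has already failed) is an assumption about what "masking other alarms" means. Only the exit codes follow the standard Nagios plugin convention.

```python
import socket

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_tcp_port(host, port, timeout=2.0):
    """Try one TCP connect and return (exit_code, message)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} accepts connections"
    except OSError as exc:
        return CRITICAL, f"CRITICAL - {host}:{port} not reachable ({exc})"


def mask_alarms(port_status, probe_alarms):
    """Drop alarms for services whose lightweight check already failed.

    port_status:  {service_name: exit_code} from the lightweight check
    probe_alarms: list of (service_name, alarm_text) from heavier probes
    Services missing from port_status are treated as OK (assumption).
    """
    return [
        (service, text)
        for service, text in probe_alarms
        if port_status.get(service, OK) == OK
    ]
```

For example, if the lightweight check for a storage service is already CRITICAL, a failed file-transfer probe against that same service adds no new information and would be suppressed, while alarms for services that still accept connections pass through.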

  10. Network Monitoring • Integrated network topology data – ENOC provided a static list of border routers for all sites – Nagios supports network hierarchy – in case of router failure, site resources are flagged as unreachable
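The effect of the network hierarchy on slide 10 can be sketched as follows. This is a simplified model, not Nagios code: a host whose check fails is reported as DOWN if it has no parents or at least one parent is still UP, and as UNREACHABLE if every parent is itself DOWN or UNREACHABLE. The function name `classify_hosts`, the host names, and the input layout are hypothetical, and the parent graph is assumed to be acyclic.

```python
def classify_hosts(parents, check_failed):
    """Classify every host as "UP", "DOWN", or "UNREACHABLE".

    parents:      {host: [parent hosts]}; an empty list means the host
                  is directly adjacent to the monitoring server
    check_failed: set of hosts whose own check failed
    """
    states = {}

    def state_of(host):
        if host in states:
            return states[host]
        if host not in check_failed:
            states[host] = "UP"
        else:
            upstream = parents.get(host, [])
            # Blame the host itself only if some path to it still works.
            if upstream and all(state_of(p) != "UP" for p in upstream):
                states[host] = "UNREACHABLE"
            else:
                states[host] = "DOWN"
        return states[host]

    all_hosts = set(parents) | set(check_failed)
    for upstream in parents.values():
        all_hosts.update(upstream)
    for host in all_hosts:
        state_of(host)
    return states


# Hypothetical site: two grid services behind one border router.
topology = {
    "border-router": [],
    "ce.example.org": ["border-router"],
    "se.example.org": ["border-router"],
}
outage = classify_hosts(
    topology, {"border-router", "ce.example.org", "se.example.org"}
)
```

In the example, only `border-router` ends up DOWN; both services are UNREACHABLE, so operators see one root-cause alarm instead of one alarm per site resource.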

  11. Performance Monitoring - Grid • Several grid systems gather performance data – BDII, GridFTP transfers – Dashboards and VO-specific systems • Some raise alarms based on performance data

  12. Performance Monitoring - Network • Majority of sites are without dedicated links – without SLAs, what should we alarm on? • Severe degradation of network performance – e.g. failure of primary link – interpreted as service unavailability

  13. Possible Future Work – Availability Monitoring • Lightweight checks improvement? • Dynamic network topology info? • Better integration with network monitoring systems? • End-to-end monitoring between sites?

  14. Possible Future Work – Performance Monitoring • Dynamic performance testing – to distinguish between failure and severe degradation – interesting for grid services (job & file transfer management) • With dedicated links – monitoring network parameters – raising alarms in case of degradation • Monitoring dynamic link reservation
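One way to picture the distinction slide 14 draws between failure and severe degradation is a classifier over the result of a test transfer. This is only a sketch of the idea: the slides do not specify any thresholds or metrics, so the function name `classify_transfer`, the baseline throughput, and the 20% cut-off are all hypothetical placeholders.

```python
def classify_transfer(measured_mbps, baseline_mbps, degraded_fraction=0.2):
    """Classify one test transfer as "OK", "DEGRADED", or "FAILED".

    measured_mbps:     observed throughput, or None if the transfer
                       did not complete at all
    baseline_mbps:     throughput expected on a healthy link (assumed known)
    degraded_fraction: share of the baseline below which the link counts
                       as severely degraded (hypothetical default)
    """
    if baseline_mbps <= 0:
        raise ValueError("baseline_mbps must be positive")
    if measured_mbps is None or measured_mbps <= 0:
        return "FAILED"
    if measured_mbps < degraded_fraction * baseline_mbps:
        return "DEGRADED"
    return "OK"
```

With such a split, a transfer that completes slowly over a backup link would be reported as DEGRADED rather than being interpreted as service unavailability, which is the case slide 12 describes.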

  15. Conclusion • Multilevel monitoring provides the means for administrators to better monitor their services • Integration with existing components to automate operations of monitoring instances • Network monitoring mainly focused on end-to-end links

  16. Links • OAT web page https://twiki.cern.ch/twiki/bin/view/EGEE/OAT_EGEE_III • OAT Multi-level monitoring architecture https://twiki.cern.ch/twiki/bin/view/EGEE/MultiLevelMonitoringOverview

  17. Thank You! Questions?
