

SLIDE 1

Enabling Grids for E-sciencE

The network monitoring in grid context

Operations Perspective

Emir Imamagic / SRCE
EGEE’09, Barcelona, Spain
www.eu-egee.org

SLIDE 2

Overview

  • Monitoring In Operations
  • Service Availability Monitoring
    – Architecture
    – Network Monitoring
  • Performance Monitoring
  • Possible Future Work
  • Conclusion

EGEE-III INFSO-RI-222667

SLIDE 3

Monitoring In Operations

  • Provide site and grid operators with the means to monitor their resources
  • Focus on improving availability and reliability by spotting problems and issuing alarms
  • Define procedures for escalation and resolution of more complex problems

SLIDE 4

Service Availability Monitoring

Schema provided by Karolis Eigelis

SLIDE 5

The New Architecture

Schema provided by Karolis Eigelis

SLIDE 6

The New Architecture

SLIDE 7

Which Other Systems Are Used?

  • Database components
    – Aggregated Topology Provider (ATP)
    – Metric Description Database (MDDB)
  • Operations services
    – GOCDB, ENOC, OIM
  • Grid information services
    – BDII

SLIDE 8

What Do We Check?

  • SAM probes
    – various grid services (CE, WN and SRM)
  • WLCG probes (SRCE, CERN)
    – various grid services (e.g. GridFTP, LFC)
  • BDII & Gstat probes
    – validation of content in the BDII information system
  • Nagios native probes
    – standard services (e.g. web, ftp, ssh servers)

SLIDE 9

Network Monitoring

  • Collaboration with ENOC
    – integration of ENOC Downcollector features into SAM
  • Added lightweight service checks
    – based on nmap
    – executed with high frequency
    – used for masking other alarms
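
The lightweight checks listed on this slide can be sketched roughly as follows. This is an illustration only, not the SAM implementation: the slide says the real checks are based on nmap, while this sketch substitutes a plain TCP connect from Python's standard library. The names `check_tcp` and `mask_alarms` and the default timeout are hypothetical; the exit codes 0 (OK) and 2 (CRITICAL) follow the standard Nagios plugin convention.

```python
import socket

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_tcp(host, port, timeout=2.0):
    """Return OK if a TCP connection to host:port succeeds, else CRITICAL.

    Stand-in for the nmap-based check mentioned on the slide (assumption).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return CRITICAL


def mask_alarms(lightweight_status, probe_alarms):
    """Suppress alarms from heavier probes when the lightweight check fails.

    If the service port cannot be reached at all, the detailed probe
    failures are treated as consequences, so a single alarm is reported.
    """
    if lightweight_status != OK:
        return ["service unreachable"]
    return list(probe_alarms)
```

Because the check is a single connection attempt, it is cheap enough to run at high frequency, and its result can decide whether the slower probe alarms are worth raising.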

SLIDE 10

Network Monitoring

  • Integrated network topology data
    – ENOC provided a static list of border routers for all sites
    – Nagios supports network hierarchy
    – in case of router failure, site resources are flagged as unreachable
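
The effect of the network hierarchy can be illustrated with a simplified sketch of the reachability logic. It assumes each site resource lists its border router as a parent (Nagios expresses this with the `parents` directive in host definitions) and that the hierarchy has no cycles. The function name `classify_hosts` and the host names in the usage note are hypothetical, and the real Nagios logic has more states and retries than shown here.

```python
def classify_hosts(parents, failed):
    """Classify each host as UP, DOWN or UNREACHABLE.

    parents: dict mapping host -> list of parent hosts (empty for top level).
    failed:  set of hosts whose own check failed.
    Assumes the parent relation is acyclic.
    """
    status = {}

    def resolve(host):
        if host in status:
            return status[host]
        if host not in failed:
            status[host] = "UP"
        else:
            host_parents = parents.get(host, [])
            # A failed host is DOWN if it has no parents or some parent is UP;
            # it is UNREACHABLE if none of its parents is UP.
            if not host_parents or any(resolve(p) == "UP" for p in host_parents):
                status[host] = "DOWN"
            else:
                status[host] = "UNREACHABLE"
        return status[host]

    for host in set(parents) | set(failed):
        resolve(host)
    return status
```

For example, with `parents = {"router.site-a": [], "ce.site-a": ["router.site-a"]}` and both hosts failing their checks, only the router is reported as DOWN and the CE is flagged UNREACHABLE, so operators get a single alarm pointing at the router.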

SLIDE 11

Performance Monitoring - Grid

  • Several grid systems gather performance data
    – BDII, GridFTP transfers
    – Dashboards and VO-specific systems
  • Some raise alarms based on performance data

SLIDE 12

Performance Monitoring - Network

  • Majority of sites are without dedicated links
    – without SLAs what should we alarm on?
  • Severe degradation of network performance
    – e.g. failure of primary link
    – interpreted as service unavailability

SLIDE 13

Possible Future Work – Availability Monitoring

  • Lightweight checks improvement?
  • Dynamic network topology info?
  • Better integration with network monitoring systems?
  • End-to-end monitoring between sites?

SLIDE 14

Possible Future Work – Performance Monitoring

  • Dynamic performance testing
    – to distinguish between failure and severe degradation
    – interesting for grid services (job & file transfer management)
  • With dedicated links
    – monitoring network parameters
    – raising alarms in case of degradation
  • Monitoring dynamic link reservation
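
One way to picture the distinction between failure and severe degradation is a test transfer compared against a per-path baseline. The sketch below is purely illustrative: the function name `classify_transfer`, the use of throughput in Mbit/s as the metric, and the 20% threshold are all assumptions and do not come from the slides.

```python
def classify_transfer(measured_mbps, baseline_mbps, degraded_fraction=0.2):
    """Classify a test transfer against a path's baseline throughput.

    measured_mbps:     throughput of the test transfer, or None if it failed.
    baseline_mbps:     typical throughput for this path (must be positive).
    degraded_fraction: hypothetical threshold; a result below this share of
                       the baseline counts as severe degradation.
    """
    if baseline_mbps <= 0:
        raise ValueError("baseline_mbps must be positive")
    if measured_mbps is None or measured_mbps <= 0:
        return "FAILURE"
    if measured_mbps < degraded_fraction * baseline_mbps:
        return "SEVERE_DEGRADATION"
    return "OK"
```

With a rule of this kind, a site running on a slow backup link would be reported as degraded rather than unavailable.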

SLIDE 15

Conclusion

  • Multilevel monitoring provides the means for administrators to better monitor their services
  • Integration with existing components to automate operations of monitoring instances
  • Network monitoring mainly focused on end-to-end links

SLIDE 16

Links

  • OAT web page
    https://twiki.cern.ch/twiki/bin/view/EGEE/OAT_EGEE_III
  • OAT Multi-level monitoring architecture
    https://twiki.cern.ch/twiki/bin/view/EGEE/MultiLevelMonitoringOverview

SLIDE 17

Thank You!

Questions?