
Short introduction to monitoring systems for large computer farms


  1. Short introduction to monitoring systems for large computer farms
     Immediate Reaction / Configuration Change
     Security Auditing: Definition and setup of security policies for systems, users and network. Daily system checks and log analysis to find any incidents.
     Failure Monitoring: Monitoring of hardware and software failures. Event-based alerting makes it possible to react autonomously.
     Performance Monitoring: Long-term monitoring and statistical evaluation of system and network performance values.
     Baseline (Service Level Agreement)
     [ Pierre Zelnicek 2010 ]

  2. Performance Monitoring
     What has to be monitored? CPU, disk and memory (CDM) usage; network bandwidth usage; CPU, disk and network I/O; network latency.
     Where does it have to be monitored? On the systems themselves, per single user, process or instance. Network monitoring is done by accessing the switches via SNMP.
     Which open-source solutions exist? Ganglia, Lemon, Cacti, Smokeping, and self-built solutions based on RRDTool.
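
As a rough illustration of sampling CPU, disk and memory usage "on the systems themselves", the following Python sketch collects one sample on a Linux host. It is not how Ganglia, Lemon or Cacti work internally; the function names are hypothetical, and it assumes a kernel that exposes `MemAvailable` in `/proc/meminfo`.

```python
import os
import shutil


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_used_fraction(meminfo):
    """Fraction of RAM in use, based on MemTotal and MemAvailable (assumed present)."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return (total - available) / total


def sample_host(path="/"):
    """Collect one CPU/disk/memory sample; Linux only, reads /proc/meminfo."""
    with open("/proc/meminfo") as handle:
        meminfo = parse_meminfo(handle.read())
    disk = shutil.disk_usage(path)
    return {
        "load_1min": os.getloadavg()[0],
        "disk_used_fraction": disk.used / disk.total,
        "memory_used_fraction": memory_used_fraction(meminfo),
    }
```

A collector would call `sample_host()` at a fixed interval and hand the values to a time-series store such as an RRDTool database for long-term evaluation.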

  3. Failure Monitoring
     What has to be monitored? Errors and failures in hardware and software components. Critical thresholds for performance values.
     Where does it have to be monitored? On the systems themselves, per single software instance or hardware component. Performance thresholds can be monitored via access to a performance monitoring system.
     Which open-source solutions exist? SysMES, Nagios.
     What should the software additionally provide? Autonomous execution of reactions to known or possible errors and failures.
     Is it possible to foresee hardware failures? Yes, for some hardware components which provide indicator values.
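
The combination of critical thresholds and autonomous reactions can be sketched as a small rule table. This is a minimal illustration only, not the mechanism used by SysMES or Nagios; the `Rule` type, the metric names and both reaction functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    metric: str                            # name of the monitored value
    critical: float                        # threshold at which the reaction fires
    reaction: Callable[[str, float], str]  # what to do when it fires


def restart_service(metric, value):
    # Hypothetical reaction; a real one would call the init system.
    return f"restart triggered: {metric}={value}"


def notify_operator(metric, value):
    # Hypothetical reaction; a real one would send a mail or a page.
    return f"operator notified: {metric}={value}"


def evaluate(rules: List[Rule], sample: Dict[str, float]) -> List[str]:
    """Run the reaction of every rule whose metric is at or above its threshold."""
    actions = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is not None and value >= rule.critical:
            actions.append(rule.reaction(rule.metric, value))
    return actions
```

Keeping the reactions as plain callables means new responses to known failures can be added without touching the evaluation loop.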

  4. Failure Monitoring
     Enclosure Device ID: 252
     Slot Number: 7
     Device Id: 11
     Sequence Number: 2
     Media Error Count: 0
     Other Error Count: 0
     Predictive Failure Count: 0
     Last Predictive Failure Event Seq Number: 0
     PD Type: SATA
     Raw Size: 931.512 GB [0x74706db0 Sectors]
     Non Coerced Size: 931.012 GB [0x74606db0 Sectors]
     Coerced Size: 930.390 GB [0x744c8000 Sectors]
     Firmware state: Online, Spun Up
     SAS Address(0): 0x96803721a299998b
     Connected Port Number: 7(path0)
     Inquiry Data: WD-WMATV1432482WDC WD1002FBYS-02A6B0 03.00C06
     FDE Capable: Not Capable
     FDE Enable: Disable
     Secured: Unsecured
     Locked: Unlocked
     Needs EKM Attention: No
     Foreign State: None
     Device Speed: 3.0Gb/s
     Link Speed: 3.0Gb/s
     Media Type: Hard Disk Device
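
Indicator values such as the error and predictive failure counters in this drive report can be checked automatically. The sketch below assumes the report arrives as plain text with one `Key: Value` pair per line; the function names and the choice of watched counters are illustrative assumptions, not part of any particular tool.

```python
def parse_drive_report(text):
    """Split 'Key: Value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Counters taken from the report; which ones to watch is an assumption.
WATCHED_COUNTERS = (
    "Media Error Count",
    "Other Error Count",
    "Predictive Failure Count",
)


def failure_indicators(fields):
    """Return the watched counters that are present and greater than zero."""
    raised = {}
    for name in WATCHED_COUNTERS:
        raw = fields.get(name)
        if raw is not None and raw.isdigit() and int(raw) > 0:
            raised[name] = int(raw)
    return raised
```

For the drive shown here all three counters are 0, so `failure_indicators` would return an empty dict; any non-zero counter is a candidate for an alert before the disk actually fails.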

  5. Failure Monitoring

  6. Security Auditing
     What does security auditing cover? Checking the policy enforcement setup and checking for incidents.
     Where does it have to be done? On the systems themselves for the policy enforcement setup. On a central logging facility for detecting incidents.
     Which open-source solutions exist? SELinux or AppArmor for policy enforcement. RSyslog, LogCheck and SNORT for incident detection.
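
Incident detection on a central logging facility can be illustrated with a small log scan. This sketch is not how LogCheck or SNORT work; it assumes OpenSSH "Failed password" messages collected via syslog, and the function names and the default limit of 3 are hypothetical.

```python
import re
from collections import Counter

# Assumed message format: OpenSSH "Failed password" lines as they appear in syslog.
FAILED_LOGIN = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")


def failed_logins_by_source(lines):
    """Count failed SSH password logins per source address."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


def suspicious_sources(lines, limit=3):
    """Return, sorted, the sources with at least `limit` failed logins."""
    counts = failed_logins_by_source(lines)
    return sorted(source for source, n in counts.items() if n >= limit)
```

Running the scan on the central log host rather than on each node means an attacker who compromises a single system cannot easily erase the evidence.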

  7. [Diagram; recoverable labels, in extraction order:]
     Reporting
     Service Level Agreement [ week / month / year ]
     Configuration/Setup Change Management Policy Change
     Security Auditing
     Check for incident
     Failure Monitoring
     Check thresholds
     Performance Monitoring
     Immediate autonomous reaction / response
