Use puppet and network inventory to populate nagios/icinga




  1. GRNET NOC (http://www.grnet.gr)
     Use puppet and network inventory to populate nagios/icinga configuration
     TF-NOC Dublin
     Alexandros Kosiaris (alex@noc.grnet.gr)

  2. Network & Equipment
     • Optical Network:
       - ~70 cities (+30 within next year)
       - 15-year leased dark fiber
       - DWDM/CWDM network
     • Optical Equipment:
       - Alcatel 1626LM, 1696MS, 1678MCC
       - Adva FSP2000
     • Routing Equipment:
       - Juniper T1600, Juniper MX960
       - ~10x Cisco 12000s
       - A few Cisco 7200s/7300s
     • Switching Equipment:
       - Cisco 6500
       - Several Cisco 3750, Cisco 2970, Juniper ex4200, Extreme X450a/X350
     • Storage Equipment:
       - Netapp/IBM N5300
       - EMC Celerra NS-480
     • Computing Equipment:
       - Virtualization (KVM)
       - 12 Blade servers, HP BL-460c
       - 12 IBM 1U Servers
       - 128 1U Fujitsu Servers
       - 275 2U HP Proliant Servers
       - ~200 VMs

  3. Nagios + Network Equipment or (more accurately) Switching and Routing
     • In-house developed Network Inventory (a.k.a. GRNETDB)
       - A MySQL database of almost 150 tables
       - Populated multiple times a day by a PHP discovery script
         - SNMP, telnet + expect
       - Basic concepts:
         - Node
         - Interface
         - Layer
         - Domain
         - Location
       - These concepts get extended to represent functionality
         - Routing, Switching nodes
         - Layer2, Layer3 interfaces
         - Switching, administrative domains

  4. Nagios + Network Equipment or (more accurately) Switching and Routing
     • In-house developed python Django project, with multiple sub-apps
       - Network (the interface to the database)
       - RG (router graphs, take a peek at http://mon.grnet.gr/rg)
       - Maps (take a look at http://mon.grnet.gr/network/maps)
       - Hostmaster
       - Optical network (built mostly on Location info)
       - Nadjicingo
         - Builds on the network app and generates a nagios/icinga configuration
       - Nagvis
         - Same thing, but generates/updates the nagvis config

  5. Nadjicingo
     • A Django management command outputting nagios/icinga configuration
       - Run by crontab every hour (manage.py nadjicingo)
       - Will generate nagios configuration objects for
         - Routers
         - Switches
         - Interfaces
       - L3 topology aware (nagios hates cyclic dependencies, a.k.a. redundant links); populates the parents field for most devices
       - Hardware checks in devices
       - Business logic embedded in interface descriptions:
         - Part of it is a unique identifier for a customer's link
           - [.NTUA-4] => National Technical University's L3 link
           - [AUTH@ERMOU-1] => Aristotle University of Thessaloniki L2 link at Ermou PoP
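
The slides do not show how Nadjicingo picks parents, so the following is only a minimal sketch of one way to get a cycle-free `parents` field out of a topology with redundant links: walk the graph breadth-first from the monitoring host's nearest router and let each device's parent be the node it was first reached from. The function names, the triangle topology and the address are hypothetical; a real host definition would also need a template (`use`) and other directives.

```python
from collections import deque


def assign_parents(links, root):
    """Return {node: parent} along a BFS spanning tree rooted at `root`.

    `links` is an iterable of (a, b) pairs describing undirected L3 links.
    Redundant links are simply never used as parent edges, so the result
    cannot contain a cycle.
    """
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    parents = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        # sorted() only makes the outcome deterministic between runs
        for peer in sorted(neighbours.get(node, ())):
            if peer not in parents:
                parents[peer] = node
                queue.append(peer)
    return parents


def render_host(name, address, parent):
    """Render a (deliberately incomplete) nagios host object as text."""
    lines = [
        "define host {",
        f"    host_name  {name}",
        f"    address    {address}",
    ]
    if parent is not None:
        lines.append(f"    parents    {parent}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical triangle: core1 - core2 - edge1 - core1 (one redundant link)
links = [("core1", "core2"), ("core2", "edge1"), ("edge1", "core1")]
parents = assign_parents(links, root="core1")
edge1_cfg = render_host("edge1", "192.0.2.3", parents["edge1"])
```

A shortest-path tree is a convenient choice here because a device is then reported as unreachable (rather than down) only when the closest path towards the monitoring server is broken.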

  6. Nagvis
     • A Django management command (again...)
       - Run by crontab every hour (manage.py nagvis)
       - Will update a specific nagvis map configuration by:
         - Removing obsolete nodes
         - Adding new nodes to a special area for manual positioning on the map
       - Also features an automated positioning mode based on each device's latitude and longitude
         - Nice for showing off, but not for an overview in monitoring applications
       - Will only populate host objects in the map
       - Service objects cluttered it too much, and the information is readily available anyway
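
The update step described above amounts to reconciling two sets of host names. The sketch below is an illustration only, not the actual management command: it keeps the position of hosts already on the map, drops hosts that left the inventory, and parks new hosts on a grid in a staging corner for manual positioning. The function name, the coordinates and the grid parameters are all assumptions.

```python
def reconcile_map(map_hosts, inventory_hosts,
                  staging_origin=(20, 20), step=40, per_row=10):
    """Reconcile a map with the inventory.

    map_hosts:       {host_name: (x, y)} as currently placed on the map
    inventory_hosts: iterable of host names that should be on the map
    Returns (updated_map, added, removed).
    """
    inventory = set(inventory_hosts)

    # Keep manually positioned hosts that still exist in the inventory
    updated = {name: pos for name, pos in map_hosts.items()
               if name in inventory}
    removed = sorted(set(map_hosts) - inventory)
    added = sorted(inventory - set(map_hosts))

    # Park new hosts on a simple grid inside the staging area
    x0, y0 = staging_origin
    for i, name in enumerate(added):
        updated[name] = (x0 + (i % per_row) * step,
                         y0 + (i // per_row) * step)
    return updated, added, removed


current = {"r1": (100, 200), "old-sw": (300, 50)}
updated, added, removed = reconcile_map(current, ["r1", "sw-new1", "sw-new2"])
```

Leaving existing coordinates untouched is what makes an hourly cron run safe: manual placement work on the map is never overwritten.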

  7. Nagvis Network Map

  8. Servers, Services?
     • A little bit of history
       - For years, GRNET only had very basic services (DNS, email, Web)
       - And some router-supporting services (Looking glass, mrtg, rancid)
       - And very few servers (<=10)
       - 3 years ago, major paradigm shift from networking to services
       - 20 servers bought, and then 132, and recently 275 more
       - End user services were born:
         - Public cloud storage service (Pithos)
         - Virtual Private Servers (ViMa)
         - Students' book statements (Eudoxus)
         - Student ID cards (Paso)
         - Public IaaS (Okeanos)
         - Academic Professor Elections (Apella)
       - Plus many other services and projects (TCS, Whois, NTP, VoD, …)
       - The result? => 200 VMs were created for managing all this infrastructure

  9. Puppet to the rescue
     • What is Puppet?
       - It's a stack of applications
       - It's a language (a declarative one as well)
       - It's a policy and state enforcing tool
       - It's an attribute and state discovery tool (kind of...)
       - It's a new paradigm in managing systems!
     • What is Puppet not?
       - Not just an automation tool
       - Not a "for loop"
       - Not a command execution framework (it can be reduced to that though)
     • AGAIN: A new paradigm, you need to change the way you work

  10. Puppet Concepts
      • Facts
        - Attributes of a system:
          - OS version and family
          - Available memory
          - CPUs
          - Block devices
          - IP addresses/netmasks
          - MAC addresses
          - And anything else you can write discovery code for
            - LLDP neighbours
            - IPMI functionality
            - Hardware info
            - Apache vhosts
        - Discovered by facter and then made available to Puppet

  11. Puppet Concepts (2)
      • Resources
        - Files, Directories
        - Users, Groups
        - Packages
        - Vlans
        - Interfaces
        - Nagios objects!!!!
        - And a lot more (http://docs.puppetlabs.com/references/latest/type.html)
      • Classes
        - A way to group resources
        - Support inheritance and mixins (a.k.a. including)
        - The standard class has 3 resources defined
          - package { 'software': }
          - file { '/etc/software.conf': }
          - service { 'softwared': }

  12. Puppet Concepts (3)
      • Nodes
        - A.k.a. machines (VM or hardware)
        - A node CAN (and probably will) have multiple puppet classes
        - Node population can be done in multiple ways:
          - Puppet language config
          - LDAP
          - External script
      • Puppetd agents running on each machine (daemon or crontab)
      • Central Puppetmaster (with an RDBMS) holds all the configuration and data

  13. Hello World example

        class helloworld {
          file { '/tmp/helloworld':
            ensure  => present,
            owner   => 'root',
            group   => 'root',
            mode    => '0640',   # quoted octal string; a bare 640 is ambiguous
            content => 'Hello world',
          }
        }

        node mynode {
          include helloworld
        }

      • Will create /tmp/helloworld with all the attributes as defined above
      • More importantly, if run again it will wipe any changes made in the meantime and restore the state defined above
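
The second bullet is the key idea: a resource describes a desired state, and every run converges the machine to it. The following sketch imitates that behaviour for a single file. It is not how Puppet is implemented; `ensure_file` is a hypothetical helper, it assumes a POSIX system, and it ignores owner and group because changing them needs root.

```python
import os
import tempfile


def ensure_file(path, content, mode=0o640):
    """Bring `path` to the desired state; return True if anything changed."""
    changed = False

    # Compare the current content with the desired content
    try:
        with open(path, "r", encoding="utf-8") as fh:
            current = fh.read()
    except FileNotFoundError:
        current = None
    if current != content:
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(content)
        changed = True

    # Compare the permission bits with the desired mode
    if (os.stat(path).st_mode & 0o777) != mode:
        os.chmod(path, mode)
        changed = True
    return changed


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "helloworld")
    first = ensure_file(target, "Hello world")    # file is created
    second = ensure_file(target, "Hello world")   # nothing left to do
    with open(target, "w", encoding="utf-8") as fh:
        fh.write("tampered")                      # simulate manual drift
    third = ensure_file(target, "Hello world")    # drift is reverted
    with open(target, encoding="utf-8") as fh:
        final_content = fh.read()
```

The second call reporting no change is the property that makes it safe to run the agent repeatedly from a daemon or crontab.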

  14. Back to nagios
      • Let's use a puppet native type

          nagios_host { "$hostname":
            address        => '10.10.10.10',
            alias          => 'myhost',
            contact_groups => 'hostadmins',
            hostgroups     => 'Puppeted Servers',
          }

      • /etc/nagios/nagios_host.cfg gets populated
      • Problem is ...
        - This is executed on the machine running puppetd, not on the nagios server
      • No problem. Puppet supports exported resources.

  15. Exported resources
      • Let's prepend the definition with two @ signs

          @@nagios_service { 'myservice':
            contact_groups => 'hostadmins',
            host_name      => $hostname,
            tag            => 'collect_me_nagios_server',
          }

        - Exports the resource but does not realize it on the machine running puppetd
        - No /etc/nagios/nagios_service.cfg file is created there
      • In the nagios server's manifest, collect everything carrying the same tag:

          Nagios_service <<| tag == 'collect_me_nagios_server' |>>

        - /etc/nagios/nagios_service.cfg gets populated
        - nagios.cfg or icinga.cfg can now just include the file/directory and monitoring begins
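
To make the export/collect round trip concrete, here is a small simulation. It is a sketch only: `ResourceStore` stands in for the puppetmaster's database, the host names are made up, and using the resource title as `service_description` is a simplification (the native type has a separate attribute for that).

```python
class ResourceStore:
    """Stand-in for the puppetmaster database holding exported resources."""

    def __init__(self):
        self._exported = []

    def export(self, rtype, title, **params):
        # Equivalent of declaring an @@resource on an agent node:
        # the resource is stored centrally, nothing is applied locally
        self._exported.append({"type": rtype, "title": title, **params})

    def collect(self, rtype, tag):
        # Equivalent of a <<| tag == '...' |>> collector on the nagios server
        return [r for r in self._exported
                if r["type"] == rtype and r.get("tag") == tag]


def render_service(resource):
    """Render one collected resource as a nagios service object."""
    return "\n".join([
        "define service {",
        f"    host_name            {resource['host_name']}",
        f"    service_description  {resource['title']}",
        f"    contact_groups       {resource['contact_groups']}",
        "}",
    ])


store = ResourceStore()
for host in ("web1", "db1"):  # hypothetical agent nodes
    store.export("nagios_service", f"myservice_{host}", host_name=host,
                 contact_groups="hostadmins", tag="collect_me_nagios_server")
# A resource with a different tag is stored but not collected below
store.export("nagios_service", "other", host_name="web1",
             contact_groups="hostadmins", tag="something_else")

collected = store.collect("nagios_service", "collect_me_nagios_server")
config = "\n\n".join(render_service(r) for r in collected)
```

The tag on the exporting side and in the collector must match exactly; a one-character difference silently yields an empty configuration.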

  16. Simple example
      • A manifest for all authoritative DNS servers
      • Install bind9, install configuration and ensure it is running
      • Open up firewall
      • Set up a simple DNS check

          class authoritativedns {
            include bind9
            include service::dns

            @@nagios_service { "authdns":
              check_command => "check_dig!www.grnet.gr",
              servicegroups => "DNS,DNS:Authoritative",
            }
          }

  17. Interesting use cases
      • Class hierarchy means:
        - A base class nagios::host that is included in all others
        - So all servers are nagios-monitored without any intervention
      • But:
        - A server is physical and has IPMI capabilities
        - So export another nagios host for it

          if $ipmi_capable {
            @@nagios_host { "$ipmi_dns":
              address => $ipmi_ipaddress,
              tag     => "hardwarehost",
            }
          }

  18. Interesting use cases (2)
      • Server is an HP Proliant Server

          class hp-health {
            package { [ 'hp-health', 'hpacucli' ]:
              ensure => present,
            }

            nagios::host::service { 'hpacucli':
              ensure        => present,
              servicegroups => 'HARDWARE',
              command       => 'check_nrpe!dsa-check-hpacucli!0',
            }

            nagios::host::service { 'hpasm':
              ensure        => present,
              servicegroups => 'HARDWARE',
              command       => 'check_nrpe!dsa-check-hpasm!0',
            }
          }

  19. Interesting use cases (3)
      • Multicast beacons (double exported resources!!!)

          define ssmping_check($ipv4, $ipv6) {
            $local  = $::fqdn
            $remote = $name

            if ($::ipaddress and $ipv4 and $local != $remote) {
              @@nagios_service { "ping-ssm-$remote-$local-v4":
                ensure              => present,
                check_command       => "check_nrpe!check_ssmping!$ipv4",
                host_name           => $local,
                service_description => "Multicast from $remote SSM IPv4",
              }
              …
            }
          }

          # export the checks...
          @@ssmping_check { $fqdn:
            ipv4 => $ipaddress,
            ipv6 => $ipv6address,
          }
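
The "double export" is easier to follow when expanded by hand: every beacon exports one `ssmping_check`, every beacon collects all of them, and each collected instance in turn exports one `nagios_service` for the (remote, local) pair. The sketch below reproduces that expansion for the IPv4 branch only; it assumes every beacon collects every exported check, and the host names and addresses are invented.

```python
def expand_ssmping(beacons):
    """Expand exported ssmping checks into nagios services (IPv4 only).

    beacons: {fqdn: ipv4 address or None}
    Each host evaluates the define once per exported check; `local` is the
    host doing the evaluation, `remote` is the host that exported the check.
    """
    services = []
    for local, local_ip in sorted(beacons.items()):
        for remote, remote_ip in sorted(beacons.items()):
            # Mirrors: if ($::ipaddress and $ipv4 and $local != $remote)
            if local_ip and remote_ip and local != remote:
                services.append({
                    "title": f"ping-ssm-{remote}-{local}-v4",
                    "host_name": local,
                    "check_command": f"check_nrpe!check_ssmping!{remote_ip}",
                })
    return services


beacons = {
    "a.example.org": "192.0.2.1",
    "b.example.org": "192.0.2.2",
    "c.example.org": "192.0.2.3",
}
services = expand_ssmping(beacons)
```

With N beacons this yields N x (N - 1) services, a full mesh, so adding one beacon node automatically adds checks to and from every existing beacon.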

  20. Interesting use cases (4)
      • Standard checks for all servers

          nagios::host::service { "disk":
            command => "check_nrpe!check_disk!13% 7%",
          }
          nagios::host::service { "load":
            command => "check_nrpe!check_load!4,3,2 5,4,3",
          }
          nagios::host::service { "users":
            command => "check_nrpe!check_users!20 30",
          }
          nagios::host::service { "swap":
            command => "check_nrpe!check_swap!60 40",
          }
          nagios::host::service { "check_tainted":
            command => "check_nrpe!check_tainted!0",
          }
          nagios::host::service { "check_firewall":
            command => "check_nrpe!check_firewall!0",
          }
