Mod-Gearman Distributed Monitoring based on the Gearman Framework



SLIDE 1

18.10.2012

Mod-Gearman

Distributed Monitoring based on the Gearman Framework

Sven Nierlein

SLIDE 2

18.10.2012 www.consol.com

Consol

  • http://www.consol.de/open-source-monitoring/
SLIDE 3

  • Introduction
  • Common Scenarios
  • Installation
  • Configuration
  • Performance Data
  • Improved Plugin Output
  • Exports
  • Tools
  • Performance
SLIDE 4

Introduction

SLIDE 5

Introduction

  • Gearman
    • Distributes tasks across the network from multiple clients to multiple workers
    • Load balancing
    • Clients and workers can be written in C, Java, Perl, PHP, Python and Shell
    • Asynchronous
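
The distribution idea can be illustrated with a small in-process model. This is only a sketch of the pattern (several workers pulling jobs from one shared queue), not the Gearman protocol or any Gearman client library; the function name `run_jobs` and the worker names are made up for the example.

```python
import queue
import threading


def run_jobs(jobs, worker_count):
    """Hand out jobs from one shared queue to several worker threads.

    Returns a dict mapping each job to the name of the worker that took it.
    """
    job_queue = queue.Queue()
    handled_by = {}
    lock = threading.Lock()

    def worker(name):
        while True:
            job = job_queue.get()
            if job is None:  # sentinel: no more work for this worker
                break
            with lock:
                handled_by[job] = name

    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for job in jobs:
        job_queue.put(job)
    for _ in threads:  # one sentinel per worker so every thread stops
        job_queue.put(None)
    for thread in threads:
        thread.join()
    return handled_by


if __name__ == "__main__":
    print(run_jobs(["check_ping", "check_http", "check_disk"], worker_count=2))
```

Whichever worker is idle takes the next job, so the load spreads without the client knowing how many workers exist.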
SLIDE 6

Introduction

[Architecture diagram: Nagios with the Mod-Gearman NEB module, the Gearman daemon, Mod-Gearman workers and a PNP4Nagios worker. Labeled flows: checks / events, check results, perfdata, perfdata / exports. The tools send_gearman and send_multi submit check results.]

SLIDE 7

Common Scenarios

SLIDE 8

Load Reduction & Non Blocking

Nagios

hosts=yes services=yes eventhandler=yes

Worker

hosts=yes services=yes eventhandler=yes

Pros

  • Move blocking events (event handlers, on-demand host checks) away from the Nagios core
  • Reduce the forking overhead of a huge Nagios core process
  • Reduces load even when Nagios and the worker run on the same host
SLIDE 9

Load Balancing

Worker

hosts=yes services=yes eventhandler=yes

Nagios

hosts=yes services=yes eventhandler=yes

Worker

hosts=yes services=yes eventhandler=yes

Pros

  • Spread load across multiple hosts
SLIDE 10

Distributed Setup

Nagios

hosts=yes services=yes eventhandler=yes hostgroups=remote

Worker

hosts=no services=no eventhandler=no hostgroups=remote

Worker

hosts=yes services=yes eventhandler=yes

Pros

  • Easy replacement for remote Nagios installations

  • Central configuration
SLIDE 11

Distributed & Load Balancing

Nagios

hosts=yes services=yes eventhandler=yes hostgroups=remote

Worker

hosts=no services=no eventhandler=no hostgroups=remote

Worker

hosts=yes services=yes eventhandler=yes

Worker

hosts=no services=no eventhandler=no hostgroups=remote

Worker

hosts=yes services=yes eventhandler=yes

Pros

  • Active/active remote sites
SLIDE 12

Distributed & Load Balancing + Graphing

Nagios

hosts=yes services=yes eventhandler=yes hostgroups=remote perfdata=yes

Worker

hosts=no services=no eventhandler=no hostgroups=remote

Worker

hosts=yes services=yes eventhandler=yes

Worker

hosts=no services=no eventhandler=no hostgroups=remote

Worker

hosts=yes services=yes eventhandler=yes

PNPWorker

SLIDE 13

Check Serialization

Nagios

hosts=no services=no eventhandler=no servicegroups=serial

Worker

hosts=no services=no eventhandler=no servicegroups=serial max-worker=1

Pros

  • Useful for checks that must not run in parallel (e.g. check_selenium, Java checks, etc.)
  • “parallelize_check” has been removed in Nagios 3.x
  • Works better than “max_concurrent_checks”
SLIDE 14

Installation

SLIDE 15

Installation

  • Standalone
    • Packages are available for CentOS/Red Hat/SLES
    • http://mod-gearman.org/pkg/
    • including Gearmand
    • Mod-Gearman is part of Debian 7 (Wheezy)
  • Consol Labs Repository
    • https://labs.consol.de/repo/
    • Packages for Mod-Gearman, Gearmand, Thruk, OMD
  • OMD
    • Mod-Gearman is included in OMD

SLIDE 16

Configuration

SLIDE 17

Configuration - NEB Module

  • Load Broker Module
    • nagios.cfg:
      broker_module=.../lib/mod_gearman/mod_gearman.o config=/etc/mod-gearman/server.cfg

SLIDE 18

Configuration

  • NEB configuration should be the sum of all workers

[Diagram, two examples:
  Nagios (hosts=yes services=yes eventhandler=yes) = Worker (hosts=yes services=yes eventhandler=yes)
  Nagios (hosts=yes services=yes eventhandler=yes hostgroups=remote) = Worker (hosts=no services=no eventhandler=yes) + Worker (hosts=yes services=yes eventhandler=no hostgroups=remote)]
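
The "sum of all workers" rule can be sketched as a small merge function. This is an illustration only, assuming each worker's settings are held in a plain dict of option name to string value; the helper name `neb_config` is made up and is not part of Mod-Gearman.

```python
def neb_config(workers):
    """Combine worker settings into the settings the NEB module would need.

    A yes/no option becomes "yes" as soon as any worker enables it, and
    group lists are merged, so every queue a worker serves is also filled.
    """
    merged = {"hosts": "no", "services": "no", "eventhandler": "no"}
    groups = {"hostgroups": [], "servicegroups": []}

    for worker in workers:
        for option in merged:
            if worker.get(option) == "yes":
                merged[option] = "yes"
        for option, names in groups.items():
            for name in worker.get(option, "").split(","):
                name = name.strip()
                if name and name not in names:
                    names.append(name)

    for option, names in groups.items():
        if names:  # only emit group options that are actually used
            merged[option] = ",".join(names)
    return merged


if __name__ == "__main__":
    workers = [
        {"hosts": "no", "services": "no", "eventhandler": "yes"},
        {"hosts": "yes", "services": "yes", "eventhandler": "no",
         "hostgroups": "remote"},
    ]
    print(neb_config(workers))
```

With the two workers from the second example on the slide, the result is hosts=yes, services=yes, eventhandler=yes and hostgroups=remote, matching the Nagios side of the diagram.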

SLIDE 19

Configuration - Common

  • config: can be used to specify/include config files
  • server: list of gearmand servers to connect to
  • encryption: enable/disable encryption
  • key: plaintext key used for encryption
  • keyfile: read key from this file

SLIDE 20

Configuration - Queues

  • services: all service checks
  • hosts: all host checks
  • hostgroups: list of hostgroups going into a separate queue
  • servicegroups: list of servicegroups going into a separate queue
  • eventhandler: execute event handlers with Mod-Gearman
  • localhostgroups: list of hostgroups not managed by Mod-Gearman
  • localservicegroups: list of servicegroups not managed by Mod-Gearman
  • do_hostchecks: can be used to let Nagios manage host checks

SLIDE 21

Configuration - Queues

[Flowchart, evaluated from top to bottom:]

  • localservicegroups? Let Nagios take care of this check
  • localhostgroups? Let Nagios take care of this check
  • servicegroups? Put check in servicegroup queue: servicegroup_<groupname>
  • hostgroups? Put check in hostgroup queue: hostgroup_<groupname>
  • hosts=yes? Put check in generic “hosts” queue
  • services=yes? Put check in generic “services” queue
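
The queue decision order can be written down as a short function. This is a sketch of the flowchart only, not Mod-Gearman's actual code. Two things are assumptions: the configuration is modelled as a dict with lists and booleans, and a check that matches none of the questions falls back to Nagios (returned as `None`).

```python
def select_queue(check_type, hostgroups, servicegroups, cfg):
    """Return the queue name for a check, or None if Nagios runs it itself.

    check_type is "host" or "service"; hostgroups and servicegroups are the
    groups the checked object belongs to; cfg mirrors the queue options.
    """
    def first_match(groups, configured):
        return next((group for group in groups if group in configured), None)

    # Local groups win: these checks are not handed to Mod-Gearman at all.
    if first_match(servicegroups, cfg.get("localservicegroups", [])):
        return None
    if first_match(hostgroups, cfg.get("localhostgroups", [])):
        return None

    # Dedicated group queues come before the generic queues.
    servicegroup = first_match(servicegroups, cfg.get("servicegroups", []))
    if servicegroup:
        return f"servicegroup_{servicegroup}"
    hostgroup = first_match(hostgroups, cfg.get("hostgroups", []))
    if hostgroup:
        return f"hostgroup_{hostgroup}"

    if check_type == "host" and cfg.get("hosts"):
        return "hosts"
    if check_type == "service" and cfg.get("services"):
        return "services"
    return None  # assumption: nothing matched, so Nagios keeps the check


if __name__ == "__main__":
    cfg = {"hosts": True, "services": True, "hostgroups": ["remote"]}
    print(select_queue("service", ["remote"], [], cfg))
```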

SLIDE 22

Configuration - Queues by Custom Variable

  • set queue by custom variable
    • NEB: queue_custom_variable=worker
    • Nagios:

      define host {
        ...
        _WORKER hostgroup_test
      }

    • Worker: hostgroups=test
  • http://labs.consol.de/nagios/mod-gearman/#_how_to_set_queue_by_custom_variable

SLIDE 23

Configuration - Embedded Perl

  • Embedded Perl has serious memory leaks
    • bad for Nagios: the process grows and gets slower and slower
    • ok with Mod-Gearman: worker processes will be renewed from time to time
  • worker:
    • enable_embedded_perl=on
      • enable embedded Perl
    • use_embedded_perl_implicitly=off
      • only when explicitly enabled by the script itself:

        #!/usr/bin/perl
        # nagios: +epn

SLIDE 24

Configuration - Worker

  • identifier: unique name of this worker, defaults to hostname
  • min-worker: minimum total number of workers
  • max-worker: maximum total number of workers
  • spawn-rate: rate at which new workers will be spawned
  • idle-timeout: timeout in seconds before an idle worker exits
  • max-jobs: maximum number of jobs before a worker exits
  • dupserver: useful to send a copy of each result to another Gearmand server

SLIDE 25

Performance Data

SLIDE 26

Performance Data

[Diagram: Nagios with the Mod-Gearman NEB module sends perfdata through the Gearman daemon to the PNP4Nagios worker.]

Config

  • Set “perfdata=yes” in your Mod-Gearman NEB configuration.
  • Set “process_performance_data=1” in your nagios.cfg.
  • Adjust gearman options in process_perfdata.cfg and start pnp_gearman_worker.
SLIDE 27

Improved Plugin Output

SLIDE 28

Improved Plugin Output

  • STDERR output included:
    • display worker identifier on errors
    • display stderr output for easy plugin debugging
    • translated signal names

SLIDE 29

Exports

SLIDE 30

Exports

  • Export core events and data into gearman queues
  • Format is JSON
  • Write workers in any language Gearman supports (C, Java, Perl, PHP, Python and Shell)
  • No need to poll for data all the time
  • Example
  • Syntax:

export=<queue>:<returncode>:<callback>[,<callback>,...]

  • mod_gearman_neb.cfg:

export=log_queue:1:NEBCALLBACK_LOG_DATA

  • Limited to a few callbacks currently:
  • NEBCALLBACK_PROCESS_DATA
  • NEBCALLBACK_TIMED_EVENT_DATA
  • NEBCALLBACK_LOG_DATA
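
To make the export syntax concrete, here is a minimal parser for one `export=` line. It only follows the `<queue>:<returncode>:<callback>[,<callback>,...]` pattern shown above; the names `ExportRule` and `parse_export` are made up for this sketch, and Mod-Gearman's own parsing may differ in details such as whitespace handling.

```python
from typing import NamedTuple


class ExportRule(NamedTuple):
    queue: str        # Gearman queue the events are written to
    returncode: int   # return code given back to the core for the callback
    callbacks: list   # NEB callback names exported into the queue


def parse_export(line):
    """Parse 'export=<queue>:<returncode>:<callback>[,<callback>,...]'."""
    key, _, value = line.strip().partition("=")
    if key.strip() != "export":
        raise ValueError(f"not an export line: {line!r}")
    # Raises ValueError as well if fewer than three fields are present.
    queue, returncode, callbacks = value.split(":", 2)
    names = [name.strip() for name in callbacks.split(",") if name.strip()]
    if not names:
        raise ValueError("at least one callback is required")
    return ExportRule(queue.strip(), int(returncode), names)


if __name__ == "__main__":
    print(parse_export("export=log_queue:1:NEBCALLBACK_LOG_DATA"))
```

A worker reading from `log_queue` would then receive each exported event as a JSON document and could decode it with `json.loads`.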
SLIDE 31

Tools

SLIDE 32

gearman_top

  • Shows current state of all queues
  • $ gearman_top -H localhost:4730
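
The same kind of queue overview can be read from gearmand directly. The sketch below sends the text command `status` to the daemon and parses the reply, assuming the usual admin format of one tab-separated line per queue (name, total jobs, running jobs, available workers) ended by a line with a single dot. It is not the implementation of gearman_top; the function names are made up, and `localhost:4730` is simply the address from the example above.

```python
import socket


def parse_status(text):
    """Parse a gearmand 'status' reply into a dict keyed by queue name."""
    queues = {}
    for line in text.splitlines():
        line = line.strip()
        if line == ".":  # a single dot ends the reply
            break
        if not line:
            continue
        name, total, running, workers = line.split("\t")
        queues[name] = {
            "waiting": int(total) - int(running),
            "running": int(running),
            "workers": int(workers),
        }
    return queues


def fetch_status(host="localhost", port=4730, timeout=5.0):
    """Ask a running gearmand for its queue status and return the raw text."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"status\n")
        data = b""
        while not (data == b".\n" or data.endswith(b"\n.\n")):
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode("ascii", errors="replace")


if __name__ == "__main__":
    for name, stats in sorted(parse_status(fetch_status()).items()):
        print(name, stats)
```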
SLIDE 33

check_gearman

  • Use as a Nagios plugin to check Gearmand and workers
  • $ ./check_gearman -H localhost

check_gearman CRITICAL - failed to connect to localhost:4730 - Connection refused

  • $ ./check_gearman -H localhost

check_gearman OK - 0 jobs running and 0 jobs waiting. Version: 0.25|...

SLIDE 34

send_gearman

  • Similar to send_nsca, but with extended functionality
  • Can be used to send passive check results via Mod-Gearman
  • Can send active results with --active
  • Use --latency, --starttime, --finishtime to preserve those attributes too
  • $ ./bin/send_gearman --server=mo --keyfile=etc/mod-gearman/secret.key \
        --host='localhost' --service='ping' --message='Ping OK' --returncode=0
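
When passive results are submitted from a script, it can help to build the send_gearman call as an argument list rather than a shell string, so quoting of the message is not an issue. The sketch below uses only the options shown above; the helper name, the default binary path and the server address `localhost:4730` are assumptions for illustration.

```python
import shlex


def send_gearman_command(server, keyfile, host, service, message, returncode,
                         binary="./bin/send_gearman"):
    """Build the argument list for one passive check result."""
    return [
        binary,
        f"--server={server}",
        f"--keyfile={keyfile}",
        f"--host={host}",
        f"--service={service}",
        f"--message={message}",
        f"--returncode={int(returncode)}",
    ]


if __name__ == "__main__":
    command = send_gearman_command(
        server="localhost:4730",  # hypothetical gearmand address
        keyfile="etc/mod-gearman/secret.key",
        host="localhost",
        service="ping",
        message="Ping OK",
        returncode=0,
    )
    # Print the command in a form that can be pasted into a shell.
    print(shlex.join(command))
```

The list can be passed to `subprocess.run(command, check=True)` on a machine where send_gearman is installed.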
SLIDE 35

send_multi

  • Return multiple results from check_multi
  • Basically:

    $ check_multi -r 256 -f check.cfg | ./bin/send_multi --config=mod_gearman.cfg --host=<host>

  • Better:

    #!/bin/bash
    host=$1; shift;
    other=$*
    report="256"
    [ "$other" != "" ] && report="13"
    out=`.../libexec/check_by_ssh -H $host -q -C ".../check_multi -f .../multi.cfg -r $report $other" 2>&1`
    rc=$?
    if [ `echo "$out" | grep -c "CHILD"` -eq 0 -o "$other" != "" ]; then
      echo "$out"
      exit $rc
    fi
    echo "$out" | .../send_multi config=.../mod_gearman.conf host=$host

  • “check_multi -i <subcheck>” allows you to reschedule single checks from a multi.cfg

    $ ./better.sh               # for all
    $ ./better.sh -i check17    # for a single check

SLIDE 36

gearman_proxy.pl

[Diagram: Nagios, Gearman Daemon (Main), gearman_proxy.pl, Gearman Daemon (DMZ) and a Mod-Gearman worker; checks / events and check results flow between the two Gearman daemons via gearman_proxy.pl.]

  • All connections are initiated from the worker/client
  • Use gearman_proxy.pl where it is not possible to access the gearmand directly from remote locations

SLIDE 37

Thruk

  • Thruk's dashboard has some Mod-Gearman-related panels
SLIDE 38

Performance

SLIDE 39

Performance

  • Main reason for Mod-Gearman was making distributed monitoring easy
    • but it’s quite fast too
  • all tests done with Livestatus and Mod-Gearman Module loaded
  • tests were made on a single virtual machine

SLIDE 40

Performance

  • Debian 6 VM, 2 x 2.5 GHz, with 2 GB RAM + 2 external workers
  • nearly 2,000 active service checks per second!

SLIDE 41

Questions?

SLIDE 42

Resources

  • http://labs.consol.de/nagios/mod-gearman/
  • http://gearman.org/
  • http://docs.pnp4nagios.org/de/pnp-0.6/modes#gearman_mode
  • http://my-plugin.de/wiki/projects/check_multi/feed_passive
  • http://packages.debian.org/de/source/sid/mod-gearman
  • http://mod-gearman.org/pkg/