Bringing ZFS information into SNMP Thomas Stibor GSI Helmholtz - - PowerPoint PPT Presentation



SLIDE 1

Bringing ZFS information into SNMP

Thomas Stibor

GSI Helmholtz Centre for Heavy Ion Research, HPC

  • 27 January 2014
SLIDE 2

What is SNMP?

  • Simple Network Management Protocol (SNMP) is a protocol for network management.

  • It allows collecting information from switches, printers, Linux boxes, ... and also configuring them (write access).

thomas@lxdv65:~>snmpget -v 1 -c public localhost 1.3.6.1.2.1.1.1.0
iso.3.6.1.2.1.1.1.0 = STRING: "Linux lxdv65 3.8.13-tstibor-lxdv65-rev1 #1 SMP Wed May 15 12:32:59 CEST 2013 x86_64"
thomas@lxdv65:~>snmpget -v 1 -c public localhost 1.3.6.1.2.1.25.1.4.0
iso.3.6.1.2.1.25.1.4.0 = STRING: "BOOT_IMAGE=/boot/vmlinuz-3.8.13-tstibor-lxdv65-rev1 root=/dev/mapper/vg0-debian ro quiet"

What are these strange-looking numbers, e.g. 1.3.6.1.2.1.25.1.4.0?

  • Each Object Identifier (OID for short) identifies a variable that can be read or set via SNMP.

  • OIDs are organized hierarchically.
SLIDE 3

OID(s) as a Tree (snmpwalk)

thomas@lxdv65:~> snmpwalk -c public -v 2c localhost 1.3.6.1.4.1.2021.9.1
iso.3.6.1.4.1.2021.9.1.1.1 = INTEGER: 1
iso.3.6.1.4.1.2021.9.1.1.2 = INTEGER: 2
...
iso.3.6.1.4.1.2021.9.1.1.12 = INTEGER: 12
iso.3.6.1.4.1.2021.9.1.2.1 = STRING: "/"
iso.3.6.1.4.1.2021.9.1.2.2 = STRING: "/sys"
...
iso.3.6.1.4.1.2021.9.1.2.12 = STRING: "/zfs/pools-deduplication"
iso.3.6.1.4.1.2021.9.1.3.1 = STRING: "rootfs"
iso.3.6.1.4.1.2021.9.1.3.2 = STRING: "sysfs"
...
iso.3.6.1.4.1.2021.9.1.3.5 = STRING: "devpts"
iso.3.6.1.4.1.2021.9.1.3.6 = STRING: "tmpfs"

[Figure: OID tree rooted at 1.3.6.1.4.1.2021.9.1 with child nodes 1, 2 and 3; leaves 1, 2, ..., 12 below nodes 1 and 2, and leaves 1, 2, ..., 6 below node 3]

SLIDE 4

Human-Readable OIDs

Given an OID:

  • What is the semantic meaning of, e.g., the value returned here?

thomas@lxdv65:~>snmpget -v 1 -c public localhost 1.3.6.1.2.1.25.1.6.0
iso.3.6.1.2.1.25.1.6.0 = Gauge32: 597

  • Is there a description giving us more information?

thomas@lxdv65:~>snmptranslate -m SNMPv2-MIB 1.3.6.1.2.1.1.1
SNMPv2-MIB::sysDescr
thomas@lxdv65:~>snmptranslate -m SNMPv2-MIB -On -Td 1.3.6.1.2.1.1.1
.1.3.6.1.2.1.1.1
sysDescr OBJECT-TYPE
  -- FROM       SNMPv2-MIB
  -- TEXTUAL CONVENTION DisplayString
  SYNTAX        OCTET STRING (0..255)
  DISPLAY-HINT  "255a"
  MAX-ACCESS    read-only
  STATUS        current
  DESCRIPTION   "A textual description of the entity. This value should
                include the full name and version identification of the
                system's hardware type, software operating-system, and
                networking software."
::= { iso(1) org(3) dod(6) internet(1) mgmt(2) mib-2(1) system(1) 1 }

This information is provided in Management Information Base (MIB) files.

SLIDE 5

ZFS Information to Bring into SNMP

thomas@lxdv65:~>sudo zpool list
NAME     SIZE  ALLOC   FREE  CAP     DEDUP  HEALTH    ALTROOT
domov-0  178M   244K   178M   0%     1.00x  DEGRADED  -
domov-1  178M   235K   178M   0%     1.00x  ONLINE    -
domov-2  178M   235K   178M   0%     1.00x  ONLINE    -
domov-3  178M  28.5M   150M  15%     1.00x  ONLINE    -
domov-4  178M   235K   178M   0%     1.00x  ONLINE    -
domov-5  178M   235K   178M   0%     1.00x  ONLINE    -
domov-6  178M   235K   178M   0%     1.00x  ONLINE    -
domov-7  178M   599K   177M   0%  2048.00x  ONLINE    -
thomas@lxdv65:~>sudo zfs get all | grep "avail\|used "
domov-0  used       163K   -
domov-0  available  86.4M  -
domov-1  used       157K   -
domov-1  available  86.4M  -
domov-2  used       157K   -
domov-2  available  86.4M  -
domov-3  used       18.9M  -
domov-3  available  67.6M  -
domov-4  used       157K   -
domov-4  available  86.4M  -
domov-5  used       157K   -
domov-5  available  86.4M  -
domov-6  used       157K   -
domov-6  available  86.4M  -
domov-7  used       256M   -
domov-7  available  86.2M  -

Specify a MIB file to bring this ZFS information into SNMP.

SLIDE 6

ZFS-MIB File

ZFS-MIB.txt

ZFS-MIB DEFINITIONS ::= BEGIN

IMPORTS
    OBJECT-TYPE, MODULE-IDENTITY, enterprises, Counter64, Integer32
        FROM SNMPv2-SMI
    ...

-- A brief description and update information about the ZFS-MIB.
zfs MODULE-IDENTITY
    LAST-UPDATED "201312190000Z"
    ORGANIZATION "GSI"
    CONTACT-INFO "t.stibor@gsi.de"
    DESCRIPTION  "This MIB module describes read-only ZFS information
                 gathered through libzfs. This encompasses the health
                 status, the available, used and total space, as well as
                 the compression and deduplication ratios of pools."
    REVISION     "201312190000Z"
    DESCRIPTION  "Initial revision."
    ::= { hpc 1 }

SLIDE 7

ZFS-MIB File (cont.)

ZFS-MIB.txt

...
ZFSUnsigned64 ::= TEXTUAL-CONVENTION
    DISPLAY-HINT "d"
    STATUS       current
    DESCRIPTION  "A 64-bit unsigned integer type (which does not exist in
                 SMIv2) holding any unsigned 64-bit value. It is defined
                 as a Counter64 but does not carry counter semantics."
    SYNTAX       Counter64

-- We are hosted under enterprises 2021 (note: enterprise number 2021 is
-- registered to UC Davis and used by Net-SNMP's UCD-SNMP-MIB, not to GSI).
gsi OBJECT IDENTIFIER ::= { enterprises 2021 }
hpc OBJECT IDENTIFIER ::= { gsi 255 }

poolTable OBJECT-TYPE
    SYNTAX      SEQUENCE OF PoolEntry
    MAX-ACCESS  not-accessible
    STATUS      current
    DESCRIPTION "ZFS pool monitoring information."
    ::= { zfs 1 }

poolEntry OBJECT-TYPE
    SYNTAX      PoolEntry
    MAX-ACCESS  not-accessible
    STATUS      current
    DESCRIPTION "An entry containing information on a ZFS pool."
    INDEX       { poolIndex }
    ::= { poolTable 1 }
...

SLIDE 8

ZFS-MIB File (cont.)

ZFS-MIB.txt

...
PoolEntry ::= SEQUENCE {
    poolIndex          Integer32,       -- 1
    poolName           DisplayString,   -- 2
    poolHealth         DisplayString,   -- 3
    poolAvail          ZFSUnsigned64,   -- 4
    poolUsed           ZFSUnsigned64,   -- 5
    poolTotal          ZFSUnsigned64,   -- 6
    poolCompressRatio  DisplayString,   -- 7
    poolDedupRatio     DisplayString    -- 8
}

poolIndex OBJECT-TYPE
    SYNTAX      Integer32 (0..255)
    MAX-ACCESS  read-only
    STATUS      current
    DESCRIPTION "Reference index for each observed ZFS pool."
    ::= { poolEntry 1 }

poolName OBJECT-TYPE
    SYNTAX      DisplayString (SIZE (0..255))
    MAX-ACCESS  read-only
    STATUS      current
    DESCRIPTION "Name of ZFS pool."
    ::= { poolEntry 2 }
...

SLIDE 9

Inspect our ZFS-MIB File

thomas@lxdv65:~>snmptranslate -Tp -IR ZFS-MIB::poolTable
+--poolTable(1)
   |
   +--poolEntry(1)
      |  Index: poolIndex
      |
      +-- -R-- Integer32 poolIndex(1)
      |        Range: 0..255
      +-- -R-- String    poolName(2)
      |        Textual Convention: DisplayString
      |        Size: 0..255
      +-- -R-- String    poolHealth(3)
      |        Textual Convention: DisplayString
      |        Size: 0..15
      +-- -R-- Counter64 poolAvail(4)
      |        Textual Convention: ZFSUnsigned64
      +-- -R-- Counter64 poolUsed(5)
      |        Textual Convention: ZFSUnsigned64
      +-- -R-- Counter64 poolTotal(6)
      |        Textual Convention: ZFSUnsigned64
      +-- -R-- String    poolCompressRatio(7)
      |        Textual Convention: DisplayString
      |        Size: 0..15
      +-- -R-- String    poolDedupRatio(8)
               Textual Convention: DisplayString
               Size: 0..15

SLIDE 10

Inspect our ZFS-MIB File (cont.)

thomas@lxdv65:~>snmptranslate -On -Td ZFS-MIB::poolHealth
.1.3.6.1.4.1.2021.255.1.1.1.3
poolHealth OBJECT-TYPE
  -- FROM       ZFS-MIB
  -- TEXTUAL CONVENTION DisplayString
  SYNTAX        OCTET STRING (0..15)
  DISPLAY-HINT  "255a"
  MAX-ACCESS    read-only
  STATUS        current
  DESCRIPTION   "Health status of ZFS pool."
::= { iso(1) org(3) dod(6) internet(1) private(4) enterprises(1) gsi(2021) hpc(255) zfs(1) poolTable(1) poolEntry(1) 3 }

  • How do we implement an SNMP sub-agent daemon once the MIB file is specified? An excellent starting point: http://www.net-snmp.org/wiki/index.php/Tutorials

SLIDE 11

From MIB ⇒ C

thomas@lxdv65:~>env MIBS="+ZFS-MIB" mib2c -c mib2c.iterate.conf poolTable
# poolTable.h poolTable.c

/*
 * Note: this file originally auto-generated by mib2c using
 *  : mib2c.iterate.conf 17821 2009-11-11 09:00:00Z dts12
 */
#ifndef POOLTABLE_H
#define POOLTABLE_H

/* function declarations */
void init_poolTable(void);
void initialize_table_poolTable(void);
Netsnmp_Node_Handler poolTable_handler;
Netsnmp_First_Data_Point poolTable_get_first_data_point;
Netsnmp_Next_Data_Point poolTable_get_next_data_point;

/* column number definitions for table poolTable */
#define COLUMN_POOLINDEX         1
#define COLUMN_POOLNAME          2
#define COLUMN_POOLHEALTH        3
#define COLUMN_POOLAVAIL         4
#define COLUMN_POOLUSED          5
#define COLUMN_POOLTOTAL         6
#define COLUMN_POOLCOMPRESSRATIO 7
#define COLUMN_POOLDEDUPRATIO    8

#endif /* POOLTABLE_H */

SLIDE 12

From MIB ⇒ C (cont.)

/** Handles requests for the poolTable entries. */
int
poolTable_handler(netsnmp_mib_handler *handler,
                  netsnmp_handler_registration *reginfo,
                  netsnmp_agent_request_info *reqinfo,
                  netsnmp_request_info *requests)
{
    netsnmp_request_info *request;
    netsnmp_table_request_info *table_info;
    struct poolTable_entry *table_entry;
    char result[ZFS_MAXPROPLEN];

    switch (reqinfo->mode) {
    case MODE_GET:
        for (request = requests; request; request = request->next) {
            table_entry = (struct poolTable_entry *)
                netsnmp_extract_iterator_context(request);
            table_info = netsnmp_extract_table_info(request);

            switch (table_info->colnum) {
            ...
            case COLUMN_POOLHEALTH:
                if (!table_entry) {
                    netsnmp_set_request_error(reqinfo, request,
                                              SNMP_NOSUCHINSTANCE);
                    continue;
                }
                /* Pool health. */
                if (get_zpool_prop(libzfs_handle, table_entry->poolName,
                                   ZPOOL_PROP_HEALTH, result) == ERROR)
                    netsnmp_set_request_error(reqinfo, request,
                                              SNMP_NOSUCHINSTANCE);
                else {
                    /* Bounded copy: result (ZFS_MAXPROPLEN bytes) can be
                     * longer than the poolHealth array. */
                    snprintf(table_entry->poolHealth,
                             sizeof(table_entry->poolHealth), "%s", result);
                    table_entry->poolHealth_len = strlen(table_entry->poolHealth);
                    snmp_set_var_typed_value(request->requestvb, ASN_OCTET_STR,
                                             (u_char *)table_entry->poolHealth,
                                             table_entry->poolHealth_len);
                }

SLIDE 13

Ask libzfs (/dev/zfs) for ZFS Information

First approach, don’t do that!

#define COMMAND_ARCSTATS        "/bin/cat /proc/spl/kstat/zfs/arcstats"
#define COMMAND_ZPOOL_HEALTH    "/usr/local/sbin/zpool list -H -o name,health"
#define COMMAND_ZGET_AVAIL_USED "/usr/local/sbin/zfs get -Hpo value used,available"

/* Open the command for reading. */
fp = popen(COMMAND_ZPOOL_HEALTH, "r");
if (fp == NULL) {
    perror("popen");
    return ERROR;
}
i = 0;
while (fgets(line, sizeof(line) - 1, fp) != NULL) {
    line_dup = strdup(line);
    while ((tok_str = strsep(&line_dup, "\t"))) {
        if (n_token == 0) {
            //printf("poolname: %s\n", tok_str);
            name_temp = strdup(tok_str);
        } else if (n_token == 1) {
            tok_str[strlen(tok_str) - 1] = '\0';  /* Remove trailing newline */
            health_temp = strdup(tok_str);
        } else {
            pclose(fp);
            free_pool_info(pool_info);
            return ERROR;
        }
        n_token = (n_token + 1) % 2;
    }
    pool_info[i] = malloc(sizeof(pool_info_t));
    strcpy(pool_info[i]->name, name_temp);
    strcpy(pool_info[i]->health, health_temp);
    ...

SLIDE 14

Ask libzfs (/dev/zfs) for ZFS Information (cont.)

Much more efficient and cleaner!

#include <stdio.h>
#include <libzfs.h>

int get_zpool_prop(libzfs_handle_t *libzfs_handle, const char *pool_name,
                   zpool_prop_t prop, char result[ZFS_MAXPROPLEN])
{
    zpool_handle_t *zpool_handle;
    int rc;

    if (libzfs_handle == NULL) {
        fprintf(stderr, "Error: libzfs_handle is NULL pointer\n");
        return ERROR;
    }
    zpool_handle = zpool_open_canfail(libzfs_handle, pool_name);
    if (zpool_handle == NULL) {
        fprintf(stderr, "Error: zpool_open_canfail(%p, %s)\n",
                (void *)libzfs_handle, pool_name);
        return ERROR;
    }
    rc = zpool_get_prop(zpool_handle, prop, result, ZFS_MAXPROPLEN, NULL);
    if (rc != SUCCESS) {
        /* Report the pool name; result holds no valid value here. */
        fprintf(stderr, "Error: zpool_get_prop(%p, %s), rc = %d\n",
                (void *)zpool_handle, pool_name, rc);
        zpool_close(zpool_handle);
        return ERROR;
    }
    zpool_close(zpool_handle);
    return SUCCESS;
}
...

SLIDE 15

Demo (server)

Start it by means of an init script.

thomas@lxdv65:~>sudo /etc/init.d/zfsnmpd start
[ ok ] Starting ZFS SNMP Sub-Agent: zfsnmpd.
thomas@lxdv65:~>sudo /etc/init.d/zfsnmpd stop
[ ok ] Stopping ZFS SNMP Sub-Agent: zfsnmpd.

Let’s look at the syntax first

thomas@lxdv65:~/dev/zfsnmpd>sudo ./zfsnmpd --help
unknown parameter: --help
syntax: ./zfsnmpd
 -f (optional parameter for running in foreground)
    (if no parameter is given it runs in background as a daemon)
version 0.1, written by t.stibor@gsi.de, HPC Group at GSI, 2014

Start in foreground

thomas@lxdv65:[1]~/dev/zfsnmpd>sudo ./zfsnmpd -f
NET-SNMP version 5.4.3 AgentX subagent connected
zfsnmpd is up and running.

SLIDE 16

Client for querying (efficiently) ZFS information

thomas@lxdv65:~/dev/zfsnmpd>./zfsnmp
syntax: ./zfsnmp <hostname1> <hostname2> ... (specify hostname(s) or IP address(es))
 -f <hostfile> (specify hostfile where each row contains a hostname or IP address)
 -p [optional] parameter named 'problem' to quickly see whether unhealthy ZFS pools exist
example: ./zfsnmp -f zfsnmphosts.txt -p
example: ./zfsnmp 10.10.2.17 lx-zfs01.gsi.de lx-zfs73.gsi.de
version 0.1, written by t.stibor@gsi.de, HPC Group at GSI, 2014
thomas@lxdv65:~/dev/zfsnmpd>./zfsnmp localhost -p
host: 'localhost' has one or several unhealthy pools and functioning can be compromised!
...
void synchronous_query(int pflag)
{
    struct host *hp;
    unsigned int oid_i = 1;
    unsigned int host_index = 0;
    double summary_avail = 0;
    double summary_used = 0;
    double summary_total = 0;
    struct timespec start_time, end_time;

    clock_gettime(CLOCK_MONOTONIC, &start_time);

    /* Iterate over all hosts. */
    for (hp = hosts; hp->name; hp++) {
        struct snmp_session ss, *sp;
        struct oid *op;
        struct oid *op_i;
        ...

SLIDE 17

Client for querying (efficiently) ZFS information (cont.)

thomas@lxdv65:~/dev/zfsnmpd>./zfsnmp localhost localhost
localhost
|     name     health    available  used     total    compress  dedup
+-1   domov-0  DEGRADED  86.39M     0.16M    86.55M   1.00x     1.00x
+-2   domov-1  ONLINE    86.40M     0.15M    86.55M   1.00x     1.00x
+-3   domov-2  ONLINE    86.40M     0.15M    86.55M   1.00x     1.00x
+-4   domov-3  ONLINE    67.64M     18.91M   86.55M   2.38x     1.00x
+-5   domov-4  ONLINE    86.40M     0.15M    86.55M   1.00x     1.00x
+-6   domov-5  ONLINE    86.40M     0.15M    86.55M   1.00x     1.00x
+-7   domov-6  ONLINE    86.40M     0.15M    86.55M   1.00x     1.00x
+-8   domov-7  ONLINE    86.16M     256.01M  342.18M  1.00x     2048.00x
localhost
|     name     health    available  used     total    compress  dedup
+-1   domov-0  DEGRADED  86.39M     0.16M    86.55M   1.00x     1.00x
+-2   domov-1  ONLINE    86.40M     0.15M    86.55M   1.00x     1.00x
+-3   domov-2  ONLINE    86.40M     0.15M    86.55M   1.00x     1.00x
+-4   domov-3  ONLINE    67.64M     18.91M   86.55M   2.38x     1.00x
+-5   domov-4  ONLINE    86.40M     0.15M    86.55M   1.00x     1.00x
+-6   domov-5  ONLINE    86.40M     0.15M    86.55M   1.00x     1.00x
+-7   domov-6  ONLINE    86.40M     0.15M    86.55M   1.00x     1.00x
+-8   domov-7  ONLINE    86.16M     256.01M  342.18M  1.00x     2048.00x
summary: 2 host(s) queried in 0.06 secs with overall capacities (summarized over all hosts)
available space: 1.31G
used space: 0.54G
total space: 1.85G

SLIDE 18

Client for querying (efficiently) ZFS information (cont.)

thomas@lxdv65:~/dev/zfsnmpd>cat zfsnmphosts.txt
# This denotes a comment
127.0.0.1
10.10.1.1
thomas@lxdv65:[1]~/dev/zfsnmpd>./zfsnmp -f zfsnmphosts.txt
127.0.0.1
|     name     health    available  used     total    compress  dedup
+-1   domov-0  DEGRADED  86.39M     0.16M    86.55M   1.00x     1.00x
+-2   domov-1  ONLINE    86.40M     0.15M    86.55M   1.00x     1.00x
+-3   domov-2  ONLINE    86.40M     0.15M    86.55M   1.00x     1.00x
+-4   domov-3  ONLINE    67.64M     18.91M   86.55M   2.38x     1.00x
+-5   domov-4  ONLINE    86.40M     0.15M    86.55M   1.00x     1.00x
+-6   domov-5  ONLINE    86.40M     0.15M    86.55M   1.00x     1.00x
+-7   domov-6  ONLINE    86.40M     0.15M    86.55M   1.00x     1.00x
+-8   domov-7  ONLINE    86.16M     256.01M  342.18M  1.00x     2048.00x
cannot connect to host: 10.10.1.1
summary: 2 host(s) queried in 6.04 secs with overall capacities (summarized over all hosts)
available space: 0.66G
used space: 0.27G
total space: 0.93G

SLIDE 19

Summary & Outlook

Summary:

  • An SNMP sub-agent daemon and a query client have been developed.
  • Implement trap mechanism.
  • Final code polishing, e.g. switch between fprintf(stderr,...) and syslog(LOG_ERR,...).

  • Debian package (will require GSI-ZFS.deb).
  • Will be publicly available (GPL3), e.g. at http://git.stibor.net or http://github.com/stibor

Outlook:

  • SNMP query client for Lustre.

thomas@[SSH]apollo:~/linux/lustre/lustre-release/snmp>ll
total 136
drwxr-xr-x 2 thomas thomas  4096 Nov 28 13:32 autoconf
-rw-r--r-- 1 thomas thomas 28864 Nov 28 13:32 Lustre-MIB.txt
-rw-r--r-- 1 thomas thomas 27854 Nov 28 13:32 lustre-snmp.c
-rw-r--r-- 1 thomas thomas  1963 Nov 28 13:32 lustre-snmp.h
-rw-r--r-- 1 thomas thomas 18223 Nov 28 13:32 lustre-snmp-trap.c
-rw-r--r-- 1 thomas thomas  1466 Nov 28 13:32 lustre-snmp-trap.h
-rw-r--r-- 1 thomas thomas 23882 Nov 28 13:32 lustre-snmp-util.c
-rw-r--r-- 1 thomas thomas  8553 Nov 28 13:32 lustre-snmp-util.h
-rw-r--r-- 1 thomas thomas   397 Nov 28 13:32 Makefile.am
-rw-r--r-- 1 thomas thomas   214 Nov 28 13:32 README.install