SLIDE 1

Better Integration of Systems Management Hardware with Linux

LINUXCON NORTH AMERICA Aug 2014 Charles Rose Engineer Dell Inc.

SLIDE 2

Agenda

  • Introduction
      – Systems Management Hardware/Software
      – Information Available to the Service Processor
  • The Need for Better Integration
      – Integration of the Service Processor with Linux
      – Managing Servers In-band and Out-of-band
  • Current State
      – IPMI
      – Exchange of information between OS and Service Processor
      – System Recovery/Debug
      – SNMP Redirection
      – USB NIC Pass-through
      – Server Health
  • Future Features
      – OS Event logging in Service Processor
      – Aid with Diagnostic/Debugging
      – Automatic Configuration of console redirection

SLIDE 3

Introduction

SLIDE 4

Systems Management Hardware/Software

  • Systems Management Hardware on Server systems:
      – Helps manage, monitor, update and deploy Servers.
      – Provides remote management and configuration options.
      – Independent of the presence and status of the Operating System.
      – Referred to as Service Processor/Baseboard Management Controller (BMC)
  • Interfaces/API
      – IPMI
      – CIM
      – WSMAN
      – SSH
      – SNMP
      – Telnet
      – VNC
      – Web UI

SLIDE 5

Information Available in the Service Processor

  • Server Hardware
      – CPU
      – RAM
      – Storage/RAID Controller
      – NIC
      – Convergent Network Adapter/Fibre Channel
  • Server Firmware
      – BIOS
      – Service Processor
      – NIC, Storage Controller
  • Server Software
      – NIC IP, drivers

SLIDE 6

The need for better Integration

SLIDE 7

Integration of the Service Processor with Linux

  • Servers can be managed:
      – Over the systems management interface (IPMI, CIM, SNMP): out-of-band.
      – Over the OS’s network interface (SNMP, CIM, etc.): in-band.
  • In-band or out-of-band should not result in loss of information/functionality.
  • OS information should be available in the Service Processor.
  • Service processor information should be available in the OS.
  • Eliminate the need for any proprietary agents on the OS.
  • Utilize OS to Service Processor Pass-through network.
      – LAN On Motherboard.
      – Virtual USB NIC.
  • Security Considerations.

[Diagram: Server Hardware, Operating System, and Service Processor, with in-band and out-of-band management paths]

SLIDE 8

Managing Servers In-band and Out-of-band

[Diagram: a Management Console and three Managed Servers, each with Server Hardware, Operating System, and Service Processor, reached in-band and out-of-band]

SLIDE 9

Current Status

SLIDE 10

IPMI

IPMI kernel module Autoload

  • Older systems required OpenIPMI’s startup script to load ipmi kernel modules
  • Kernel 3.10 and later will autoload ipmi modules
      – ipmi_devintf
      – ipmi_si
      – ipmi_msghandler
  • Simplifies IPMI’s use in installation/livecd environments
  • ipmi_watchdog does not yet load automatically
      – TODO: autoload ipmi_watchdog
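
A quick way to confirm the autoload behaviour on a given system is to look for the three modules in /proc/modules. The following is a minimal sketch, not part of the original talk; the function names are illustrative, and it assumes the drivers are built as modules (drivers compiled into the kernel do not appear in /proc/modules).

```python
# Modules the slide lists as autoloaded on kernel 3.10 and later.
EXPECTED_MODULES = ("ipmi_devintf", "ipmi_si", "ipmi_msghandler")


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def missing_ipmi_modules(proc_modules_text, expected=EXPECTED_MODULES):
    """Return the expected IPMI modules that are not loaded, in listed order."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in expected if name not in loaded]
```

On a live system the input would be the contents of /proc/modules, for example `missing_ipmi_modules(open("/proc/modules").read())`; an empty list means all three modules are present.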

SLIDE 11

Exchange Information between OS and Service Processor

  • What OS is running on a server?
  • What is the Service processor’s IP/URL?
  • OS information is set in the Service Processor
      – System Host Name
      – Operating System
      – Operating System Version
  • Service Processor’s IP/URL is exported to the OS
  • /etc/init.d/exchange-bmc-os-info
      – ipmitool/contrib
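
The OS-to-Service-Processor direction can be sketched as follows. This is an illustration only, not the contrib script itself: it gathers the host name and os-release fields and builds ipmitool argument lists without running them. The `mc setsysinfo` parameter names used here (`system_name`, `os_name`, `delloem_os_version`) are assumptions to check against the installed ipmitool's help output.

```python
import shlex


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        parts = shlex.split(value)  # strips the optional shell-style quotes
        info[key] = parts[0] if parts else ""
    return info


def setsysinfo_commands(hostname, os_release):
    """Build ipmitool argument lists; nothing is executed here.

    The parameter names below are assumptions, not confirmed by the slides.
    """
    fields = [
        ("system_name", hostname),
        ("os_name", os_release.get("NAME", "")),
        ("delloem_os_version", os_release.get("VERSION_ID", "")),
    ]
    return [
        ["ipmitool", "mc", "setsysinfo", key, value]
        for key, value in fields
        if value
    ]
```

Each returned list could be passed to `subprocess.run` by an init script or service; keeping command construction separate from execution makes the mapping easy to inspect first.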

SLIDE 12

System Recovery/Debug

  • On OS lock-up, capture information that can aid with debugging.
  • Watchdog timer facility provided by the Service Processor
  • Unlike the Chipset Watchdog (iTCO), does more than just resetting the system.
      – Record failure in System Event Log (SEL)
      – Send alerts over SNMP/SMS/Phone, etc.
      – Capture VGA as a JPEG, Capture Video.

SLIDE 13

System Recovery/Debug

  • IPMI driver has had support to detect/log kernel panic events for years.
  • Linux Watchdog API: ipmi_watchdog.ko
      – /dev/watchdog interface to the Service Processor.
      – watchdog pings converted to KCS messages to BMC.
      – Traditionally required agents in OS to send KCS messages to BMC.
      – watchdogd or systemd can act as watchdog daemons in the OS.
  • Can co-exist with/supplement kdump/kexec, requires some guesswork.
  • TODO: Update ipmi_watchdog.ko to support multi-watchdog.
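
What a watchdog daemon does with /dev/watchdog can be sketched in a few lines. This is a minimal illustration of the generic Linux watchdog API, not code from the talk: any write to the device counts as a keepalive, and writing the character "V" before closing asks the driver to stop the timer (the "magic close", which only works when the driver was not loaded with nowayout). The function name and parameters are illustrative.

```python
import time


def pet_watchdog(dev, interval_s, pings, sleep=time.sleep):
    """Send `pings` keepalives to an open watchdog device, then disarm it.

    `dev` is a writable binary file object, normally /dev/watchdog.
    """
    for _ in range(pings):
        dev.write(b"\0")  # any write resets the watchdog countdown
        dev.flush()
        sleep(interval_s)
    dev.write(b"V")       # magic close: request that the timer stop on close
    dev.flush()
```

A real daemon would run as root and use something like `with open("/dev/watchdog", "wb", buffering=0) as dev: pet_watchdog(dev, 10, 6)`, with the interval kept well below the configured timeout. If the loop stops without the final "V", the Service Processor's timer expires and its configured action fires.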

SLIDE 14

SNMP Redirection

  • Service Processor has exhaustive Hardware information.
  • OS contains information for resources it manages.
  • Many Management Consoles communicate with OS’s SNMP agent.
  • Hardware health/inventory information available to OS is limited/non-exhaustive.
  • Service Processor’s OID is grafted as part of the OS’s SNMP MIB.
  • Traps from Service Processor can be configured to reach the network’s Trap Sink.
  • Hardware Health is now available to management console.
  • Support SNMP v2 and v3.

[Diagram: Server Hardware, Operating System, Service Processor, and Management Console; labels: SNMP get/set, TRAP, SNMP proxy, TRAP forward]

SLIDE 15

SNMP Redirection – Operation

Get/Set

  • Enable SNMP on the Service Processor
  • “proxy” get/set SNMP requests to the Service Processor’s IP for a subset of OID
  • SNMPv2-SMI::enterprises.674.10892

Trap

  • Enable snmptrapd to accept traps from Service Processor’s IP.
  • “forward” traps to sink configured on the host.
  • Enable SNMP Alerting on Service Processor
  • ipmitool-1.8.15
      – contrib/bmc-snmp-proxy
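
The two halves of the redirection map onto one net-snmp directive each: `proxy` in snmpd.conf and `forward` in snmptrapd.conf. The sketch below only renders those lines; it is not the contrib/bmc-snmp-proxy script. The community string, addresses, and function names are placeholders, and SNMP v2c is assumed for brevity.

```python
# Numeric form of SNMPv2-SMI::enterprises.674.10892 (enterprises = 1.3.6.1.4.1).
DELL_OID = ".1.3.6.1.4.1.674.10892"


def snmpd_proxy_line(bmc_ip, community, oid=DELL_OID):
    """snmpd.conf line that proxies requests under `oid` to the Service Processor."""
    return f"proxy -v 2c -c {community} {bmc_ip} {oid}"


def snmptrapd_lines(community, bmc_ip, trap_sink):
    """snmptrapd.conf lines: accept traps from the Service Processor and forward them."""
    return [
        # "net" permits forwarding; the trailing address limits the trap source.
        f"authCommunity log,net {community} {bmc_ip}",
        f"forward default {trap_sink}",
    ]
```

With the Service Processor at its default pass-through address, `snmpd_proxy_line("169.254.0.1", "public")` gives the line to append to snmpd.conf; both daemons need a restart after their configuration changes.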

SLIDE 16

USB NIC Pass-Through

  • Dedicated channel for OS to Service Processor communication
  • Service Processor at 169.254.0.1 (default). Non-routable.
  • Automatic configuration with Avahi and nss-mdns or NetworkManager.
  • Service processor can be reached with “idrac.local”
      – http://idrac.local
      – # ipmitool -I lan -H idrac.local
      – # snmpget idrac.local

[Diagram: Server Hardware, Operating System, and Service Processor, with a USB NIC between the Operating System and the Service Processor]
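
A script can verify the pass-through setup by resolving the name and checking that the answer is a link-local address. This is a minimal sketch with illustrative function names; it assumes nss-mdns (or an equivalent) is configured so that the system resolver can answer for "idrac.local", and it looks at IPv4 only.

```python
import ipaddress
import socket


def is_link_local(address):
    """True for addresses in 169.254.0.0/16 (or fe80::/10 for IPv6)."""
    return ipaddress.ip_address(address).is_link_local


def resolve_service_processor(name="idrac.local"):
    """Resolve `name` through the system resolver and return its IPv4 addresses."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    return sorted({info[4][0] for info in infos})
```

If `resolve_service_processor()` returns an address for which `is_link_local` is false, the name is probably being answered by something other than the USB NIC pass-through.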

SLIDE 17

System Health

  • Health of CPU, Fan, Temp, Voltages, etc. available already
  • Aggregate the above into a machine-readable “System Health” value.
  • Available in-band and/or out-of-band
  • Can be used by cluster software, virtualization managers, cloud compute managers to make workload migration decisions
  • Available over SNMP or IPMI
  • SNMP redirection can make health available in-band

[Diagram: Server Hardware, Operating System, and Service Processor; “Health” labels on two paths]

SLIDE 18

System Health over IPMI and SNMP

  • IPMI
      – raw 0x30 0x51
  • Byte 5: Global and Storage status
      – Bit 0 set = Storage status Normal
      – Bit 1 set = Storage status Error (non-critical)
      – Bit 2 set = Storage status Failed (critical)
      – Bit 3 set = Storage status Unknown
      – Bit 4 set = Global status Normal
      – Bit 5 set = Global status Error (non-critical)
      – Bit 6 set = Global status Failed (critical)
      – Bit 7 set = Global status Unknown
  • SNMP
      – SNMPv2-SMI::enterprises.674.10892.5.2.2.0
      – 1: other -- the status is not one of the below.
      – 2: unknown -- not known or monitored.
      – 3: ok -- the status is ok.
      – 4: nonCritical -- the status is warning, non-critical.
      – 5: critical -- the status is critical (failure).
      – 6: nonRecoverable -- the status is non-recoverable (dead).
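
The bit layout and the SNMP enumeration above translate directly into a small decoder. The sketch below is illustrative: it assumes the response is the hex text that `ipmitool raw` prints and that "Byte 5" means the fifth byte of that output counting from 1; both should be confirmed against a real system.

```python
# Bit meanings within each 4-bit field, lowest bit first (from the slide).
STATES = ("Normal", "Error (non-critical)", "Failed (critical)", "Unknown")

# Values of SNMPv2-SMI::enterprises.674.10892.5.2.2.0 (from the slide).
SNMP_STATUS = {
    1: "other",
    2: "unknown",
    3: "ok",
    4: "nonCritical",
    5: "critical",
    6: "nonRecoverable",
}


def _flags(nibble):
    """Return the state names for every set bit in a 4-bit field."""
    return [STATES[i] for i in range(4) if nibble & (1 << i)]


def decode_health_byte(value):
    """Split the status byte into storage (bits 0-3) and global (bits 4-7) states."""
    return {"storage": _flags(value & 0x0F), "global": _flags((value >> 4) & 0x0F)}


def health_from_raw(text):
    """Decode space-separated hex output; byte 5 is taken as index 4 (assumption)."""
    data = bytes(int(token, 16) for token in text.split())
    return decode_health_byte(data[4])
```

Cluster or virtualization software could treat anything other than a single "Normal" entry under "global" as a reason to consider moving workloads.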

SLIDE 19

Opportunities…

SLIDE 20

OS event logging in Service Processor

  • Log OS Events to the Service Processor to have a better understanding of the host OS:
      – OS Started
      – OS Stopped
      – OS Install Started
      – OS Install Stopped
      – OS Install Aborted
      – OS Install Failed
  • Standard IPMI Sensor Events
  • Combined with OS Name, OS Version and Power Status information, this will help administrators/console software determine server state.
  • SUSE’s YaST2 Hooks
SLIDE 21

Aid with Debugging

  • OS configuration and logs are crucial for debugging
  • Logs might be unavailable if the system has locked up or there was a Kernel Panic.

On application/kernel error:

  • Collect relevant configuration and logs.
  • Store in Service Processor.
  • Accessible out-of-band even with host OS down.

SLIDE 22

Automatic Configuration of Console Redirection

  • Most headless servers use IPMI Serial Over LAN to access remote server’s console.
  • BIOS contains options to set up redirection to serial console.
  • Administrator has to duplicate BIOS setup information on kernel command line.
      – console=ttyS0,115200

  • Can reduce overhead if kernel can read BIOS serial port information.
  • ACPI already has SPCR – Serial Port Console Redirection.
  • Linux support was introduced in 2.4 and removed in 2.5.
  • Would be nice to have something similar.
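
Until the kernel can pick this up from firmware, a tool that wants to compare the BIOS setting with the running configuration first has to extract the console arguments from the kernel command line. The following is a minimal sketch of that half only; the function name is illustrative, and reading the BIOS side is out of scope here.

```python
def console_args(cmdline):
    """Return (device, options) for each console= argument, in order."""
    consoles = []
    for token in cmdline.split():
        if token.startswith("console="):
            device, _, options = token[len("console="):].partition(",")
            consoles.append((device, options))
    return consoles
```

On a running system the input would be the contents of /proc/cmdline; the result for the example on the slide is `[("ttyS0", "115200")]`.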
SLIDE 23

References

  • IPMI on Linux
      – http://openipmi.sourceforge.net/IPMI.pdf
      – http://ipmitool.sourceforge.net/
      – http://www.gnu.org/software/freeipmi/
  • Related Projects
      – http://www.openlmi.org/
      – https://github.com/abrt/abrt/wiki/ABRT-Project
  • Scripts
      – Exchange Information: http://sourceforge.net/p/ipmitool/source/ci/master/tree/contrib/exchange-bmc-os-info.init.redhat
      – SNMP Redirection: http://sourceforge.net/p/ipmitool/source/ci/master/tree/contrib/bmc-snmp-proxy
      – Installer Status Event logging: http://sourceforge.net/p/ipmitool/patches/97/
      – Fedora Feature Page: http://fedoraproject.org/wiki/Features/AgentFreeManagement
  • Dell iDRAC
      – http://en.community.dell.com/techcenter/systems-management/w/wiki/3204.dell-remote-access-controller-drac-idrac.aspx

SLIDE 24

Thank You!

  • charles_rose@dell.com
  • linux-poweredge@dell.com
SLIDE 25

Backup

SLIDE 26

Server Block Diagram

SLIDE 27

Automated System Recovery with Systemd Watchdog Daemon

  • Set RuntimeWatchdogSec
  • Set ipmi_watchdog timeout to the same value
  • Blacklist chipset watchdog
  • Load ipmi_watchdog
  • Reload systemd: systemctl daemon-reexec
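
The first four steps come down to three small configuration files. The sketch below renders their contents so that a single timeout value is used in both places. The file paths, the drop-in file names, and the chipset module name `iTCO_wdt` are assumptions for illustration; on systemd versions without drop-in support, RuntimeWatchdogSec goes directly into /etc/systemd/system.conf.

```python
def watchdog_config(timeout_s, chipset_module="iTCO_wdt"):
    """Return {path: contents} for the steps on the slide (paths are assumed)."""
    return {
        # systemd pings /dev/watchdog while RuntimeWatchdogSec is set.
        "/etc/systemd/system.conf.d/watchdog.conf":
            f"[Manager]\nRuntimeWatchdogSec={timeout_s}\n",
        # Same timeout for the IPMI watchdog; keep the chipset watchdog out.
        "/etc/modprobe.d/ipmi_watchdog.conf":
            f"options ipmi_watchdog timeout={timeout_s}\n"
            f"blacklist {chipset_module}\n",
        # Load ipmi_watchdog at boot, since it is not autoloaded.
        "/etc/modules-load.d/ipmi_watchdog.conf": "ipmi_watchdog\n",
    }
```

After writing the files and loading ipmi_watchdog, `systemctl daemon-reexec` makes systemd pick up the new setting.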