Better Integration of Systems Management Hardware with Linux
LINUXCON NORTH AMERICA Aug 2014 Charles Rose Engineer Dell Inc.
Agenda
Introduction
– Systems Management Hardware/Software
– Information Available to the Service Processor
– Integration of the Service Processor with Linux
– Managing Servers In-band and Out-of-band
– IPMI
– Exchange of Information between OS and Service Processor
– System Recovery/Debug
– SNMP Redirection
– USB NIC Pass-through
– Server Health
– OS Event Logging in Service Processor
– Aid with Diagnostics/Debugging
– Automatic Configuration of Console Redirection
– Helps manage, monitor, update and deploy servers.
– Provides remote management and configuration options.
– Works independently of the presence and status of the operating system.
– Referred to as the Service Processor or Baseboard Management Controller (BMC).
– IPMI, CIM, WSMAN, SSH, SNMP, Telnet, VNC, Web UI
– CPU, RAM, Storage/RAID Controller, NIC, Converged Network Adapter/Fibre Channel
– BIOS, Service Processor, NIC, Storage Controller
– NIC IP addresses, drivers
– Over the systems management interface (IPMI, CIM, SNMP): out-of-band.
– Over the OS’s network interface (SNMP, CIM, etc.): in-band.
information/functionality.
– LAN On Motherboard (LOM)
– Virtual USB NIC
[Diagram: Server Hardware, Operating System and Service Processor, with in-band and out-of-band paths]
[Diagram: a Management Console and several Managed Servers (each with Server Hardware, Operating System and Service Processor), connected in-band and out-of-band]
IPMI kernel module Autoload
to load ipmi kernel modules
– ipmi_devintf
– ipmi_si
– ipmi_msghandler
environments
– TODO: autoload ipmi_watchdog
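A quick way to check whether the modules above are loaded is to look them up in /proc/modules. The helper below is an illustrative sketch (the function names are hypothetical, not part of ipmitool or any distribution); drivers built into the kernel never appear in /proc/modules, so a missing entry only means "not loaded as a module".

```python
# Sketch: check which IPMI kernel modules are loaded, by parsing text in the
# /proc/modules format (one module per line, module name in the first field).
REQUIRED_IPMI_MODULES = ("ipmi_msghandler", "ipmi_si", "ipmi_devintf")


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            names.add(fields[0])
    return names


def missing_ipmi_modules(proc_modules_text, required=REQUIRED_IPMI_MODULES):
    """List the required modules that do not appear in the given text.

    Built-in (non-modular) drivers never appear in /proc/modules, so a
    missing entry only means "not loaded as a module".
    """
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]
```

On a server with only ipmi_si and ipmi_msghandler loaded, `missing_ipmi_modules(open("/proc/modules").read())` would return `["ipmi_devintf"]`, which could then be loaded with `modprobe ipmi_devintf`.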
Exchange Information between OS and Service Processor
Processor
– System Host Name
– Operating System
– Operating System Version
the OS
– ipmitool/contrib
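The host name and OS strings can be carried to the BMC as IPMI System Info parameters. The sketch below assumes the IPMI 2.0 "Set System Info Parameters" command (NetFn App 0x06, command 0x58) with a block layout in which block 0 holds an encoding byte, a length byte and the first 14 characters, and later blocks hold 16 bytes each. The parameter selector values and the layout are assumptions to check against the IPMI specification. The helper names are hypothetical, and the ipmitool/contrib script may use higher-level ipmitool commands rather than `ipmitool raw`.

```python
# Sketch: split a string (e.g. the host name) into 16-byte System Info blocks
# and format each block as an `ipmitool raw` argument list.
# ASSUMPTION: selector values and block layout as described in the lead-in.
PARAM_SYSTEM_NAME = 0x02      # assumed selector for "System Name"
PARAM_PRIMARY_OS_NAME = 0x03  # assumed selector for "Primary OS Name"
PARAM_OS_NAME = 0x04          # assumed selector for "OS Name"
ENCODING_ASCII_LATIN1 = 0x00
BLOCK_SIZE = 16


def sysinfo_blocks(text):
    """Return 16-byte blocks; block 0 starts with the encoding and length bytes."""
    data = text.encode("latin-1")
    if len(data) > 255:
        raise ValueError("string too long for a one-byte length field")
    stream = bytes([ENCODING_ASCII_LATIN1, len(data)]) + data
    blocks = []
    for offset in range(0, len(stream), BLOCK_SIZE):
        chunk = stream[offset:offset + BLOCK_SIZE]
        blocks.append(chunk.ljust(BLOCK_SIZE, b"\x00"))  # zero-pad last block
    return blocks


def ipmitool_raw_args(param, text):
    """One `ipmitool raw 0x06 0x58 <param> <set selector> <16 data bytes>` per block."""
    commands = []
    for set_selector, block in enumerate(sysinfo_blocks(text)):
        args = ["ipmitool", "raw", "0x06", "0x58",
                f"0x{param:02x}", f"0x{set_selector:02x}"]
        args += [f"0x{b:02x}" for b in block]
        commands.append(args)
    return commands
```

For example, `ipmitool_raw_args(PARAM_SYSTEM_NAME, "web01")` yields a single command because the name fits in block 0, while a 20-character name needs two.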
aid with debugging.
Service Processor
more than just resetting the system.
– Record the failure in the System Event Log (SEL).
– Send alerts over SNMP, SMS, phone, etc.
– Capture the VGA screen as a JPEG, or capture video.
panic events for years.
– /dev/watchdog interface to the Service Processor.
– Watchdog pings are converted to KCS messages to the BMC.
– Traditionally required agents in the OS to send KCS messages to the BMC.
– watchdogd or systemd can act as the watchdog daemon in the OS.
some guesswork.
multi-watchdog.
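In the Linux watchdog API, any write to /dev/watchdog resets the timer, and writing the character "V" immediately before closing asks the driver to disable the timer ("magic close"), provided the driver supports it and nowayout is not in effect. The loop below is a minimal sketch of what a watchdog daemon such as watchdogd, or systemd via `RuntimeWatchdogSec=` in system.conf, does against the BMC-backed ipmi_watchdog device. `run_pinger` and its parameters are hypothetical names.

```python
import time

# Sketch of a user-space keepalive loop for the Linux /dev/watchdog API,
# e.g. backed by ipmi_watchdog so that the BMC resets a hung host.
PING = b"\0"        # any write resets the watchdog timer
MAGIC_CLOSE = b"V"  # written just before close to request "magic close"


def run_pinger(write, pings, interval_s, sleep=time.sleep, magic_close=True):
    """Send `pings` keepalives via `write`, `interval_s` seconds apart.

    `write` is any callable taking bytes, such as the .write method of
    /dev/watchdog opened unbuffered. Returns the bytes that were sent.
    """
    sent = []
    for i in range(pings):
        write(PING)
        sent.append(PING)
        if i < pings - 1:
            sleep(interval_s)
    if magic_close:
        # Only disarms the timer if the driver supports magic close and the
        # kernel was not built with CONFIG_WATCHDOG_NOWAYOUT.
        write(MAGIC_CLOSE)
        sent.append(MAGIC_CLOSE)
    return b"".join(sent)
```

A real daemon would loop indefinitely, for example with `open("/dev/watchdog", "wb", buffering=0)` and `dev.write` as the callable. The interval must stay well below the timeout configured for ipmi_watchdog, and only one process should hold the device open.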
agent.
limited/non-exhaustive.
MIB.
network’s Trap Sink.
[Diagram: Server Hardware, Operating System, Service Processor and Management Console; SNMP get/set through an SNMP proxy, TRAP and TRAP forward]
Get/Set
Processor’s IP for a subset of OIDs
Trap
Processor’s IP.
– contrib/bmc-snmp-proxy
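The Get/Set redirection can be sketched as an OID-subtree rule. Requests under a BMC-owned subtree are relayed to the Service Processor, and all others are answered by the OS agent. This sketch assumes the Dell subtree .1.3.6.1.4.1.674.10892.5, which is the prefix of the health OID listed later in this deck. It renders a net-snmp snmpd.conf `proxy` directive for that subtree. It illustrates the approach and is not the contents of contrib/bmc-snmp-proxy; the community string is a placeholder.

```python
# Sketch of the routing rule behind an SNMP proxy for the Service Processor.
ENTERPRISES = (1, 3, 6, 1, 4, 1)
BMC_SUBTREE = ENTERPRISES + (674, 10892, 5)  # assumed BMC-owned subtree


def parse_oid(oid):
    """Turn '.1.3.6.1...' or '1.3.6.1...' into a tuple of ints."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def is_under(oid, subtree):
    """Component-wise prefix match, so ...674 does not match ...6740."""
    parts = parse_oid(oid)
    return parts[:len(subtree)] == subtree


def snmpd_proxy_line(bmc_host, community, subtree=BMC_SUBTREE):
    """Render a net-snmp snmpd.conf `proxy` directive for the subtree."""
    oid = "." + ".".join(str(n) for n in subtree)
    return f"proxy -v 2c -c {community} {bmc_host} {oid}"
```

Matching on OID components rather than string prefixes matters: a plain `startswith` check would wrongly send enterprise 6740 to the BMC.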
NetworkManager.
– http://idrac.local
– # ipmitool -I lan -H idrac.local
– # snmpget idrac.local
[Diagram: Server Hardware, Operating System and Service Processor connected through a USB NIC]
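Once the USB NIC link is up, the host can reach the Service Processor by name, as in the commands above. The helpers below are a hedged sketch. They assume the USB NIC link uses IPv4 link-local addressing (169.254.0.0/16), which should be confirmed on the actual system. They also only build the ipmitool argument list from the slide plus a `-U` user name. A password option such as `-a` (prompt) is left to the caller. All helper names are hypothetical.

```python
import ipaddress

# Sketch for the USB NIC pass-through case.
# ASSUMPTION: the host/Service Processor USB NIC link uses IPv4 link-local.


def is_ipv4_link_local(addr):
    """True if addr is in 169.254.0.0/16."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 4 and ip.is_link_local


def usb_nic_candidates(addrs):
    """Keep only addresses that could belong to the USB NIC link."""
    return [a for a in addrs if is_ipv4_link_local(a)]


def ipmitool_lan_argv(host, user, *command):
    """argv for `ipmitool -I lan -H <host>` as on the slide, plus a user name."""
    return ["ipmitool", "-I", "lan", "-H", host, "-U", user, *command]
```

For example, `ipmitool_lan_argv("idrac.local", "root", "chassis", "status")` produces an argv that could be passed to `subprocess.run`; the user name here is only an example.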
value.
compute managers to make workload migration decisions
[Diagram: Server Hardware, Operating System and Service Processor, with Health information flows]
– raw 0x30 0x51
– Bit 0 set = Storage status Normal
– Bit 1 set = Storage status Error (non-critical)
– Bit 2 set = Storage status Failed (critical)
– Bit 3 set = Storage status Unknown
– Bit 4 set = Global status Normal
– Bit 5 set = Global status Error (non-critical)
– Bit 6 set = Global status Failed (critical)
– Bit 7 set = Global status Unknown
– SNMPv2-SMI::enterprises.674.10892.5.2.2.0
– 1: other -- the status is not one of the below.
– 2: unknown -- not known or monitored.
– 3: ok -- the status is ok.
– 4: nonCritical -- the status is warning, non-critical.
– 5: critical -- the status is critical (failure).
– 6: nonRecoverable -- the status is non-recoverable (dead).
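Both health encodings above can be decoded mechanically. This sketch assumes the status byte described in the bit list is the first byte printed by `ipmitool raw 0x30 0x51`; ipmitool prints response bytes as space-separated hex. `should_migrate` is a hypothetical policy for the compute-manager use case and is not defined by the MIB.

```python
# Sketch: decode the IPMI health byte (bit layout from the slide) and the
# SNMP global status integer.
STATES = ("Normal", "Error (non-critical)", "Failed (critical)", "Unknown")

SNMP_STATUS = {
    1: "other", 2: "unknown", 3: "ok",
    4: "nonCritical", 5: "critical", 6: "nonRecoverable",
}


def parse_raw_output(text):
    """`ipmitool raw` prints response bytes as space-separated hex."""
    return [int(tok, 16) for tok in text.split()]


def decode_health_byte(value):
    """Bits 0-3: storage status, bits 4-7: global status."""
    storage = [STATES[bit] for bit in range(4) if value & (1 << bit)]
    overall = [STATES[bit] for bit in range(4) if value & (1 << (bit + 4))]
    return storage, overall


def snmp_status_name(value):
    """Map the SNMP status integer to its MIB name."""
    return SNMP_STATUS.get(value, f"invalid({value})")


def should_migrate(snmp_status):
    """Hypothetical policy: move workloads off a critical or dead server."""
    return snmp_status in (5, 6)
```

For example, a response byte of 0x11 decodes to storage Normal and global Normal. A byte of 0x42 decodes to storage Error (non-critical) and global Failed (critical).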
– OS Started
– OS Stopped
– OS Install Started
– OS Install Stopped
– OS Install Aborted
– OS Install Failed
administrators/console software on server state.
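One plausible IPMI encoding of these events uses the generic sensor types for OS boot/installation (0x1F) and OS stop/shutdown (0x20). It uses event/reading type 0x6F (sensor-specific) and places the event offset in the low nibble of event data 1. This is a sketch under those assumptions. The offset values reflect one reading of the IPMI 2.0 sensor-specific offset table, and their pairing with the slide's event names is hypothetical. The ipmitool patch referenced in the resources may encode them differently.

```python
# Sketch: one possible IPMI encoding of the OS events above.
# ASSUMPTION: sensor type 0x1F = OS boot/installation, 0x20 = OS
# stop/shutdown, event/reading type 0x6F = sensor-specific; the offsets and
# their pairing with the event names are illustrative only.
OS_BOOT = 0x1F
OS_STOP = 0x20
SENSOR_SPECIFIC = 0x6F

OS_EVENTS = {
    "OS Started": (OS_BOOT, 0x06),          # assumed: boot completed
    "OS Stopped": (OS_STOP, 0x03),          # assumed: graceful shutdown
    "OS Install Started": (OS_BOOT, 0x07),
    "OS Install Stopped": (OS_BOOT, 0x08),  # assumed: install completed
    "OS Install Aborted": (OS_BOOT, 0x09),
    "OS Install Failed": (OS_BOOT, 0x0A),
}


def event_bytes(name, deassert=False):
    """Return (sensor type, event dir/type byte, event data 1) for an event."""
    sensor_type, offset = OS_EVENTS[name]
    direction = 0x80 if deassert else 0x00  # bit 7: 1 = deassertion
    # Event data 1: bits 7:4 = 0 (bytes 2/3 unspecified), bits 3:0 = offset.
    return sensor_type, direction | SENSOR_SPECIFIC, offset & 0x0F
```

Recording these events in the Service Processor's log lets console software see OS and installer progress without an agent polling the host.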
debugging
locked up or there was a kernel panic. On application/kernel error:
down.
– console=ttyS0,115200
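The `console=ttyS0,115200` argument names the serial port and baud rate, and console redirection through the Service Processor has to match both. The sketch below reads these values back from a kernel command line such as /proc/cmdline. The helper names are hypothetical. An entry without a baud rate returns None for the speed instead of guessing the kernel default.

```python
import re

# Sketch: read the serial console settings from a kernel command line such
# as the contents of /proc/cmdline.
_SERIAL_CONSOLE = re.compile(r"^console=(ttyS\d+)(?:,(\d+))?")


def serial_console(cmdline):
    """Return (device, baud) for the last console=ttyS* entry, else None.

    baud is None when the entry omits it (e.g. "console=ttyS1").
    """
    found = None
    for token in cmdline.split():
        m = _SERIAL_CONSOLE.match(token)
        if m:
            baud = int(m.group(2)) if m.group(2) else None
            found = (m.group(1), baud)
    return found


def console_arg(port, baud):
    """Build the kernel argument, e.g. console_arg(0, 115200)."""
    return f"console=ttyS{port},{baud}"
```

Trailing options such as `n8` in `115200n8` are ignored by the regular expression, so only the device and speed are compared against the Service Processor's serial settings.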
– http://openipmi.sourceforge.net/IPMI.pdf
– http://ipmitool.sourceforge.net/
– http://www.gnu.org/software/freeipmi/
– http://www.openlmi.org/
– https://github.com/abrt/abrt/wiki/ABRT-Project
– Exchange Information
– http://sourceforge.net/p/ipmitool/source/ci/master/tree/contrib/exchange-bmc-os-info.init.redhat
– SNMP Redirection
– http://sourceforge.net/p/ipmitool/source/ci/master/tree/contrib/bmc-snmp-proxy
– Installer Status Event logging
– http://sourceforge.net/p/ipmitool/patches/97/
– Fedora Feature Page
– http://fedoraproject.org/wiki/Features/AgentFreeManagement
– http://en.community.dell.com/techcenter/systems-management/w/wiki/3204.dell-remote-access-controller-drac-idrac.aspx
Automated System Recovery with Systemd Watchdog Daemon