Non-intrusive, Out-of-band and Out-of-the-Box Systems Monitoring in the Cloud
SIGMETRICS, June 18, 2014
Canturk Isci, Sahil Suneja, Vasanth Bala, Eyal de Lara, Todd Mummert
University of Toronto; IBM T.J. Watson Research


SLIDE 1

Non-intrusive, Out-of-band and Out-of-the-Box Systems Monitoring in the Cloud

SIGMETRICS, June 18, 2014
Canturk Isci, Sahil Suneja, Vasanth Bala, Eyal de Lara, Todd Mummert
University of Toronto; IBM T.J. Watson Research

SLIDE 2

IBM Research

Data Center Machines

[Diagram: Traditional vs. Modern data center machines. Labels: HARDWARE, OS]

VMs = new processes for the cloud computer!

SLIDE 3

Traditional Systems Monitoring

SLIDE 4

Traditional Systems Monitoring

[Diagram labels: HARDWARE, VIRTUALIZATION LAYER, VM (x4)]

SLIDE 5

Introducing: Near Field Monitoring

[Diagram labels: HARDWARE, VIRTUALIZATION LAYER, VM (x4)]

SLIDE 6

Near Field Monitoring (NFM)

SLIDE 7

NFM's Advantages

  • Always-on: Works for unresponsive or compromised systems
  • Out-of-the-box: Unmodified guest
      – No agent or hook installation
  • Non-intrusive: No guest cooperation
      – No interference with guest operation
  • Out-of-band: Outside guest's context
      – Decouple execution and monitoring
  • Virtualization-aware: Holistic knowledge
      – Accurate and efficient monitoring

SLIDE 8

NFM's Architecture

[Architecture diagram, divided into Frontend and Backend. Labels: VM (OS, MEM, Disk), Hypervisor, MEM View, Disk View, Memory Crawl API, Disk Crawl API, Crawl Logic, Frames (structured view of VM states), Frame Datastore, Cloud Analytics, Analytics Apps]

SLIDE 9

Approach: VM Memory Introspection

  • 1. Exposing VM Memory State
      – Gain access to VM’s memory image from outside
          • Unmodified VM
          • Unmodified hypervisor
  • 2. Exploit VM Memory State
      – Reconstruct VM's runtime state from the memory image
      – In-memory kernel data structure traversal

[Diagram labels: VM (OS, MEM, Disk), Hypervisor, MEM View, Disk View, Memory Crawl API, Disk Crawl API, Crawl Logic]

SLIDE 10

Approach | Exposing VM Mem State

  • Memory dump
      – Dump / migrate guest memory to file
      – KVM-QEMU pmemsave or migrate-to-file
      – High overhead: VM paused for dump duration
  • Live R/O memory handle
      – Xen
          • Map guest memory into crawler process: xc_map_foreign_range()
      – KVM
          • No default support
          • New live handle, read VM mem directly via QEMU process' /proc/<pid>/mem + /proc/<pid>/maps
          • Negligible impact on VM
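
The KVM live handle above can be sketched roughly as follows. This is a minimal illustration, not the authors' crawler: it parses the text of /proc/<pid>/maps, guesses which mapping holds guest RAM, and reads bytes from /proc/<pid>/mem. The names `Region`, `parse_maps`, `guess_guest_ram`, and `read_guest_memory` are hypothetical, and the "largest read-write mapping" heuristic is an assumption made here for illustration. Reading another process' memory this way normally needs root or ptrace permission.

```python
import re
from typing import List, NamedTuple


class Region(NamedTuple):
    start: int
    end: int
    perms: str
    path: str

    @property
    def size(self) -> int:
        return self.end - self.start


# /proc/<pid>/maps line: address-range perms offset dev inode [pathname]
MAPS_LINE = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S{4})\s+\S+\s+\S+\s+\S+\s*(.*)$"
)


def parse_maps(text: str) -> List[Region]:
    """Turn the text of /proc/<pid>/maps into a list of regions."""
    regions = []
    for line in text.splitlines():
        m = MAPS_LINE.match(line.strip())
        if m:
            start, end = int(m.group(1), 16), int(m.group(2), 16)
            regions.append(Region(start, end, m.group(3), m.group(4)))
    return regions


def guess_guest_ram(regions: List[Region]) -> Region:
    """Assumption: guest RAM is the largest read-write mapping in QEMU."""
    candidates = [r for r in regions if r.perms.startswith("rw")]
    return max(candidates, key=lambda r: r.size)


def read_guest_memory(pid: int, region: Region, offset: int, length: int) -> bytes:
    """Read `length` bytes at `offset` into the region, without pausing the VM."""
    if offset < 0 or offset + length > region.size:
        raise ValueError("read falls outside the mapped region")
    with open(f"/proc/{pid}/mem", "rb") as mem:
        mem.seek(region.start + offset)
        return mem.read(length)
```

A crawler would call `parse_maps(open(f"/proc/{pid}/maps").read())` once per QEMU process and then issue reads as needed; because nothing is written and the VM is not paused, this matches the "negligible impact" claim in spirit.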

SLIDE 11

Approach | Exploiting VM Mem State

  • Extract system information by traversing Linux kernel's C structs in exposed memory image
      – Different structs for different kinds of information
          • task_struct, mm_struct, files_struct, net_device etc.
  • Requirements:
      – Starting addresses for structs
          • /boot/System.map
      – Offsets for particular struct fields
          • Linux source or vmlinux
          • /boot/<Build.config>
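
The two requirements above (a starting address and per-field offsets) can be illustrated with a toy traversal of the kernel's circular task list. This is a sketch under loud assumptions: the offsets `OFF_TASKS`, `OFF_PID`, and `OFF_COMM` are made up (real values depend on the kernel build), addresses are treated as plain offsets into the image (a real crawler must translate kernel virtual addresses), and `build_demo_image` only fabricates a synthetic image so the walk can be exercised.

```python
import struct
from typing import Dict, Iterator, List, Tuple

# Hypothetical task_struct layout; real offsets come from the kernel build.
OFF_TASKS = 8    # struct list_head tasks (its `next` pointer comes first)
OFF_PID = 24     # pid_t pid
OFF_COMM = 32    # char comm[16]
COMM_LEN = 16
TASK_SIZE = 64   # size of each synthetic record in the demo image


def parse_system_map(text: str) -> Dict[str, int]:
    """Map symbol name -> address from 'address type name' lines."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            symbols[parts[2]] = int(parts[0], 16)
    return symbols


def walk_tasks(image: bytes, init_task: int) -> Iterator[Tuple[int, str]]:
    """Follow tasks.next pointers from init_task until the list wraps around."""
    task = init_task
    while True:
        (pid,) = struct.unpack_from("<i", image, task + OFF_PID)
        raw = image[task + OFF_COMM: task + OFF_COMM + COMM_LEN]
        yield pid, raw.split(b"\0", 1)[0].decode("ascii", "replace")
        (next_ptr,) = struct.unpack_from("<Q", image, task + OFF_TASKS)
        task = next_ptr - OFF_TASKS  # container_of: list node -> enclosing struct
        if task == init_task:
            break


def build_demo_image(tasks: List[Tuple[int, str]], base: int = 0x100) -> bytes:
    """Synthetic image: task records laid out back to back, linked in a circle."""
    image = bytearray(base + TASK_SIZE * len(tasks))
    for i, (pid, comm) in enumerate(tasks):
        addr = base + i * TASK_SIZE
        nxt = base + ((i + 1) % len(tasks)) * TASK_SIZE
        struct.pack_into("<Q", image, addr + OFF_TASKS, nxt + OFF_TASKS)
        struct.pack_into("<i", image, addr + OFF_PID, pid)
        struct.pack_into(f"{COMM_LEN}s", image, addr + OFF_COMM, comm.encode("ascii"))
    return bytes(image)
```

The subtraction in `walk_tasks` mirrors the kernel's `container_of` idiom: list pointers address the embedded `tasks` field, not the start of the struct, which is why field offsets are needed in addition to the symbol address.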

SLIDE 12

Backend | Crawl Output

[Diagram labels: VM Mem/Disk handle, Crawl Logic, Frames (structured view of VM states), Frame Datastore, Cloud Analytics, Analytics Apps]

  • CPU: NumCores, Hz, CacheSize, ...
  • OS: Nodename, Release, Arch, ...
  • N/W device: HWaddr, Ipaddr, TX/RX bytes, ...
  • Modules: Name, State, ...
  • Process: PID, Command, RSS, ...
  • Open files: FD → filename, ...
  • Memory Mapping: MappedFiles, VA → PA mappings, ...
  • N/W connections: SocketState, {Src, Dst, Ports}, ...
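
As a rough picture of what a frame might look like, the sketch below assembles a few of the listed categories into one JSON document. The slide only shows "{ ....... }", so the JSON encoding, the key names, the `make_frame` helper, and the sample values are all assumptions for illustration.

```python
import json
from typing import Any


def make_frame(vm_name: str, timestamp: float, **features: Any) -> str:
    """Bundle crawled features for one VM at one point in time (hypothetical schema)."""
    frame = {"vm": vm_name, "timestamp": timestamp}
    frame.update(features)
    return json.dumps(frame, sort_keys=True)


# Illustrative values only; a real frame would be filled in by the crawl logic.
frame = make_frame(
    "vm1",
    0.0,
    cpu={"num_cores": 1},
    process=[
        {"pid": 1942, "command": "malloc", "rss_kb": 1047368},
        {"pid": 1, "command": "systemd", "rss_kb": 1164},
    ],
    connections=[
        {"state": "SS_CONNECTED", "src": "9.XX.XXX.110:22", "dst": "9.XX.XXX.15:49845"},
    ],
)

# A backend app can then query frames without touching the VM again.
big = [p["command"] for p in json.loads(frame)["process"] if p["rss_kb"] > 100000]
```

Keeping frames self-describing like this is what would let analytics apps run against the datastore rather than against live VMs.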

SLIDE 13

Backend | Prototype Apps

  • 1. CTop: Cloud-wide consolidated resource monitoring
  • 2. PaVScan: Hypervisor paging aware virus scanner
  • 3. RConsole: Remote console
  • 4. TopoLog: Network topology discovery

SLIDE 14

Evaluating NFM

  • Latency / monitoring frequency?
  • Accuracy?
  • Overhead?
  • Advantages?

SLIDE 15

NFM's High Monitoring Frequency

[Chart: monitoring frequency for Basic Crawl and Full Crawl. Annotations: Safe: 10Hz, KVM: 20Hz, Xen: 200Hz]

SLIDE 16

NFM's Accuracy: Cloud Top vs. top

top:

    top - 11:58:42 up 1 day, 22:19, 1 user, load average: 0.90, 0.22, 0.11
    Tasks: 57 total, 3 running, 54 sleeping, 0 stopped, 0 zombie
    Cpu(s): 99.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.7%hi, 0.0%si, 0.3%st
    Mem: 2052104k total, 1976340k used, 75764k free, 3996k buffers
    Swap: 6160380k total, 304068k used, 5856312k free, 1868k cached

    PID   USER  PR  NI  VIRT   RES   SHR  S  %CPU  %MEM  TIME+    COMMAND
    1942  root  20      1028m  1.0g  188  R  49.9  51.0  0:08.98  malloc
    1940  root  20      1028m  780m  136  R  49.5  38.9  0:11:91  malloc
    1     root  20      56220  1164  408  S  0.0   0.1   0:00.71  systemd
    2     root  20                        S  0.0   0.0   0:00.00  kthreadd

cTop:

    Every 0.5s: ./topUpdate.sh
    CPU up time: 4461430125 jiffies

    PID   VIRT       RES        %CPU  %MEM  TIME+    COMMAND
    1942  1052704KB  1047368KB  45.8  51.0  0:08:33  malloc
    1940  1052704KB  798816KB   45.8  38.9  0:11.92  malloc
    1     56220KB    1164KB     0.0   0.1   0:00.70  systemd
    2                           0.0   0.0   0:00.00  kthreadd
    :
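
The cTop view refreshes every 0.5s and reports an uptime counter in jiffies. The slide does not say how %CPU is derived, so the sketch below shows one plausible approach, offered as an assumption: difference a process's cumulative CPU time (for example utime plus stime from task_struct) across two crawls and divide by the elapsed jiffies. The function name and arguments are hypothetical.

```python
def cpu_percent(proc_prev: int, proc_now: int, total_prev: int, total_now: int) -> float:
    """%CPU of one process between two crawls, all inputs in jiffies.

    proc_*  : cumulative CPU time of the process at each crawl
    total_* : system uptime counter at each crawl
    """
    elapsed = total_now - total_prev
    if elapsed <= 0:
        return 0.0  # same or out-of-order samples: nothing to report
    return 100.0 * (proc_now - proc_prev) / elapsed


# A process that ran for 25 of the 50 jiffies between two crawls:
usage = cpu_percent(1000, 1025, 5000, 5050)
```

Because both samples come from the memory image, small differences from in-VM `top` are expected whenever the two tools sample at slightly different instants.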

SLIDE 17

NFM's High Accuracy

<4% variation

SLIDE 18

NFM's Low VM Overhead

[Charts, 512MB WS: Reply rate [/s] and Response time [ms]. Series: base, 10Hz monitoring, virus scanning, hashing]

+ 256MB WS in paper

SLIDE 19

NFM's Advantages: Analyze Dysfunctional Systems

  • Via RConsole: out-of-band, console-like R/O interface
  • Supported functions: ls, lsmod, ps, netstat, ifconfig, ...
  • Time travel: sync and seed APIs
  • Analyzes unresponsive systems: kernel panic, misconfigured n/w
  • Detects (some) rootkits:

In-VM Console:

    Active Internet connections (servers and established)
    Proto  Local Address       Foreign Address    State
    tcp    127.0.0.1:25        0.0.0.0:*          LISTEN
    tcp    9.XX.XXX.110:52019  9.XX.XXX.109:22    ESTABLISHED
    :
    tcp    9.XX.XXX.110:22     9.XX.XXX.15:49845  ESTABLISHED

RConsole:

    Active Internet connections
    Proto  Local Address       Foreign Address    State           PID    Process
    tcp    127.0.0.1:25        0.0.0.0:0          SS_UNCONNECTED  741    [sendmail]
    tcp    9.XX.XXX.110:52019  9.XX.XXX.109:22    SS_CONNECTED    6177   [ssh]
    :
    tcp    9.XX.XXX.110:22     9.XX.XXX.15:49845  SS_CONNECTED    14894  [sshd]
    tcp    0.0.0.0:2476        0.0.0.0:0          SS_UNCONNECTED  23304  [datacpy]
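
The rootkit check on this slide amounts to comparing what the guest reports about itself with what is reconstructed out-of-band. A minimal sketch of that comparison, using only the connection rows visible on the slide (elided rows are ignored), might look like the following. The helper names are hypothetical, and the normalization of `*` to port `0` is an assumption made so the two output styles can be compared.

```python
from typing import List, Tuple

InVmConn = Tuple[str, str]        # (local address, foreign address)
OobConn = Tuple[str, str, str]    # (local address, foreign address, process)


def normalize(addr: str) -> str:
    """Make netstat-style 'host:*' comparable with the out-of-band 'host:0'."""
    host, _, port = addr.rpartition(":")
    return f"{host}:{'0' if port == '*' else port}"


def hidden_connections(in_vm: List[InVmConn], out_of_band: List[OobConn]) -> List[OobConn]:
    """Connections present in the memory image but missing from the guest's own view."""
    seen = {(normalize(local), normalize(foreign)) for local, foreign in in_vm}
    return [
        conn for conn in out_of_band
        if (normalize(conn[0]), normalize(conn[1])) not in seen
    ]


in_vm = [
    ("127.0.0.1:25", "0.0.0.0:*"),
    ("9.XX.XXX.110:52019", "9.XX.XXX.109:22"),
    ("9.XX.XXX.110:22", "9.XX.XXX.15:49845"),
]
out_of_band = [
    ("127.0.0.1:25", "0.0.0.0:0", "sendmail"),
    ("9.XX.XXX.110:52019", "9.XX.XXX.109:22", "ssh"),
    ("9.XX.XXX.110:22", "9.XX.XXX.15:49845", "sshd"),
    ("0.0.0.0:2476", "0.0.0.0:0", "datacpy"),
]
hidden = hidden_connections(in_vm, out_of_band)
```

Here the only entry left over is the `datacpy` socket on port 2476, the row that appears in the RConsole output but not in the in-VM console.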

SLIDE 20

NFM's Advantages: Better Accuracy

  • Distributed Application
      – 3 LAMP instances

                 VM1   VM2   VM3
    Reservation  30%   30%   30%
    Allocation   100%  70%   30%

SLIDE 21

NFM's Advantages: Better Accuracy

SLIDE 22

Conclusion

  • Current monitoring techniques unfit for modern virtualized Cloud
  • Introducing Near Field Monitoring: Leverage virtualization for a fundamentally different VM monitoring approach
      – Eliminates in-VM hooks, provides same fidelity monitoring out-of-band
      – Decoupled VM monitoring - execution architecture
      – Alleviates concerns with existing techniques
          • Always-on, non-intrusive, holistic view, ...
  • Evaluation:
      – High frequency
      – Low overhead
      – Better accuracy
      – Higher efficiency