Virtual CPU Validation Nadav Amit, Dan Tsafrir, Assaf Schuster - - PowerPoint PPT Presentation

SLIDE 1

Virtual CPU Validation

Nadav Amit, Dan Tsafrir, Assaf Schuster
Ahmad Ayoub, Eran Shlomo

SLIDE 2

Question

Your video server freezes once a month. Why?
  • OS, drivers, BIOS
  • CPU, hardware
  • Virus / Hack
  • Cosmic rays / Power

Anything else?

SLIDE 3

“75% of x86 server workloads are virtualized” [Gartner’15]

[Chart: virtualized workloads (0% to 80%) by year, 2011 to 2015]

SLIDE 4

Hypervisor Bugs

  • HW assists virtualization, but SW is still there
  • Bug implications: security, stability
  • CPU virtualization is hardest, and its bugs have the greatest impact

[Diagram: stack of application, OS, CPU, hypervisor]

SLIDE 5

Real-Life Example

  • Non-existent register reads leaked host data
  • Security vulnerability
  • Patching required reboot
SLIDE 6

Existing Solutions

  • Micro-hypervisors [Steinberg’10]: reduced trusted computing base, not hypervisor code
  • Formal verification [Leinenbach’09]: no formal model of the CPU
  • Fuzzing [Martignoni’12]: no knowledge of CPU semantics

SLIDE 7

Observation

  • CPU vendors invest heavily in developing testing tools
  • Hundreds of person-years or more!
  • Physical and virtual CPUs should behave similarly
  • So tools for testing physical CPUs should be able to find bugs in virtual CPUs

SLIDE 8

Contribution

  • 1. Adapt & apply physical CPU testing tools to VCPUs

  • 2. Study hypervisor bugs
  • Found, fixed, and analyzed >100 bugs
SLIDE 9

Outline

  • Motivation
  • System
  • Physical CPU testing tools
  • Adapting tools to VCPUs
  • Results
  • Causes of bugs
  • Impact of bugs
  • Architectural flaws (as opposed to SW bugs)
  • Conclusions
SLIDE 10

Physical CPU Testing

[Diagram: a generator, together with an architectural simulator, produces a test consisting of initialization, random code & test templates, and completion]

SLIDE 11

Physical CPU Testing

[Diagram: the generator, together with the architectural simulator, produces a test; a loader runs it on the SUT CPU; the test results go to the debug tools]
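The flow in the diagram can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the vendors' tooling: the two-instruction "architecture", the register file, and the names `generate_test`, `reference_simulator`, `buggy_sut`, and `run_test` are all hypothetical.

```python
import random

MASK32 = 0xFFFFFFFF

def generate_test(seed, length=8):
    """Generator: produce a random sequence of (op, dst, src) instructions."""
    rng = random.Random(seed)
    return [(rng.choice(["add", "xor"]), rng.randrange(4), rng.randrange(4))
            for _ in range(length)]

def reference_simulator(test, init_regs):
    """Architectural simulator: compute the expected final register state."""
    regs = list(init_regs)
    for op, dst, src in test:
        if op == "add":
            regs[dst] = (regs[dst] + regs[src]) & MASK32
        else:
            regs[dst] ^= regs[src]
    return regs

def buggy_sut(test, init_regs):
    """Stand-in for the system under test; its 'add' forgets to wrap at 32 bits."""
    regs = list(init_regs)
    for op, dst, src in test:
        if op == "add":
            regs[dst] = regs[dst] + regs[src]  # deliberate bug: no wrap-around
        else:
            regs[dst] ^= regs[src]
    return regs

def run_test(test, init_regs, sut):
    """Run the test on the SUT and return (register, expected, actual) mismatches."""
    expected = reference_simulator(test, init_regs)
    actual = sut(test, init_regs)
    return [(i, e, a) for i, (e, a) in enumerate(zip(expected, actual)) if e != a]

# A hand-written test that overflows register 0 exposes the deliberate bug.
print(run_test([("add", 0, 1)], [0xFFFFFFFF, 1, 2, 3], buggy_sut))
```

Because the expected state comes from a simulator rather than from a second run on hardware, each mismatch points at a specific register, which is the property the later slides rely on for debugging.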

SLIDE 12

Benefits

  • High coverage
  • Due to intimate architecture semantic awareness + effort
  • Low false-positive rate
  • No undefined results of instructions
  • No nondeterministic results (due to errata or asynchronous events)
  • Easy to debug
  • Interim checks
  • Detailed failure indications
  • Trace of expected architectural execution
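One way a checker could keep undefined results from causing false positives is to mask them out before comparing. The slides do not say how the tools achieve this, so the sketch below is an assumption; the table `UNDEFINED_FLAGS` and the function `flags_match` are hypothetical names. It uses the x86 MUL instruction, which leaves SF, ZF, AF, and PF undefined while CF and OF are defined.

```python
# EFLAGS bit positions for the x86 status flags.
CF, PF, AF, ZF, SF, OF = 1 << 0, 1 << 2, 1 << 4, 1 << 6, 1 << 7, 1 << 11

# Tiny illustrative table: flags the architecture leaves undefined per mnemonic.
UNDEFINED_FLAGS = {
    "mul": PF | AF | ZF | SF,
}

def flags_match(mnemonic, expected, actual):
    """Compare two EFLAGS values, ignoring bits that are undefined for this instruction."""
    mask = ~UNDEFINED_FLAGS.get(mnemonic, 0)
    return (expected & mask) == (actual & mask)

# After MUL, a difference in ZF is ignored, but a difference in CF is reported.
print(flags_match("mul", CF | OF, CF | OF | ZF))
print(flags_match("mul", CF | OF, OF))
```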
SLIDE 13

Adaptation: Test Generation

  • Broken or missing virtualization features
  • Add:
  • Cache-line monitoring
  • Performance Monitor Unit v3
  • Workaround:
  • Nested virtualization
  • Data breakpoints

[Diagram: the generator, together with the architectural simulator, produces the test]

SLIDE 14

Adaptation: Execution and Debug

[Diagram: Vloader, SUT CPU, test results, debug tools]

  • Load tests using hypervisor monitor protocol
  • Curb OS jitter
  • Emulate test device for I/O instructions

  • Enhance debug tools
SLIDE 15

Effort and Testing Time

  • Bootstrapping effort
  • 2 weeks to run the first empty test
  • 1.5 months to run the first full test
  • Per-test time
  • Generation – 5 seconds
  • Execution – less than a second / 1MB
  • Failure debugging – about 3 hours on average (high variance)
SLIDE 16

Outline

  • Motivation
  • System
  • Physical CPU testing tools
  • Adapting tools to VCPUs
  • Results
  • Causes of bugs
  • Impact of bugs
  • Architectural flaws (as opposed to SW bugs)
  • Conclusions
SLIDE 17

Testing KVM: 117 Bugs

  • Instruction emulator: 62%
  • Other: 7%
  • Local APIC: 6%
  • Model specific registers: 7%
  • Task switch: 4%
  • Reset: 5%
  • Debug: 9%

SLIDE 18

Instruction Emulator

  • Why does a hypervisor need an instruction emulator?
  • Port I/O and Memory Mapped I/O (MMIO): emulating instructions that access emulated devices
  • Support for old hardware: restricted guest; shadow page tables
  • Vendor-specific instructions: migration between AMD and Intel

  • Instruction emulator stress
  • Emulate every instruction
  • Run natively if emulation is unsupported
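The stress mode described in the last two bullets can be sketched as a dispatch with a fallback. This is a toy illustration only: the "instructions" operate on a single integer, and `emulate`, `run_native`, `stress_step`, and `UnsupportedInstruction` are hypothetical names, not hypervisor APIs.

```python
class UnsupportedInstruction(Exception):
    """Raised by the toy emulator for instructions it does not implement."""

# Hypothetical semantics on a single integer "register".
NATIVE = {"inc": lambda v: v + 1, "dec": lambda v: v - 1, "neg": lambda v: -v}
EMULATED = {"inc": lambda v: v + 1, "dec": lambda v: v - 1}  # no "neg" support

def emulate(insn, value):
    """Toy instruction emulator covering only part of the instruction set."""
    if insn not in EMULATED:
        raise UnsupportedInstruction(insn)
    return EMULATED[insn](value)

def run_native(insn, value):
    """Stand-in for letting the CPU execute the instruction directly."""
    return NATIVE[insn](value)

def stress_step(insn, value, stats):
    """Try to emulate every instruction; fall back to native execution if unsupported."""
    try:
        result = emulate(insn, value)
        stats["emulated"] += 1
    except UnsupportedInstruction:
        result = run_native(insn, value)
        stats["native"] += 1
    return result

stats = {"emulated": 0, "native": 0}
value = 5
for insn in ["inc", "neg", "dec"]:
    value = stress_step(insn, value, stats)
print(value, stats)
```

Forcing emulation wherever it is supported pushes far more instructions through the emulator than normal operation would, which is what exposes its bugs.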
SLIDE 19

Bug Causes

  • Mostly due to not following specifications
  • Documentation can be improved
  • Plain coding errors
  • Races
  • Null dereferences
  • Wrong error codes
  • Decimal/Hex

[Chart: bug causes. Not following specifications 78%, coding errors 15%, unclear documentation 7%]

SLIDE 20

Implications: Security

  • 6 vulnerabilities
  • Impact:
  • Host compromised: 3 host DoS
  • VM compromised: 2 VM DoS, 1 privilege escalation
  • Main cause – instruction emulator bugs
  • x86 ISA consists of 800+ instructions
  • Usually, many instructions should not be emulated
  • But the hypervisor can be tricked into emulating them
SLIDE 21

Implications: Security - Example

  • (1) vCPU0 executes an MMIO instruction: “MOV R8, [HPET]”
  • (2) VM-exit to the hypervisor
  • (3) vCPU1 writes a “buggy” instruction in its place: “SYSENTER”
  • (4) The hypervisor emulates the “buggy” instruction: “SYSENTER”

Exploiting CVE-2015-0239 – potential privilege escalation
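The interleaving on this slide can be modeled as a time-of-check to time-of-use race. The sketch below is a deliberately simplified assumption of how the steps fit together: guest memory is a dict, the address `0x1000` is arbitrary, and `causes_mmio_exit`, `hypervisor_emulate`, and `run_attack` are hypothetical names that do not correspond to real KVM code.

```python
def causes_mmio_exit(insn):
    """Steps (1)-(2): in this toy model, touching the emulated HPET traps."""
    return "[HPET]" in insn

def hypervisor_emulate(guest_memory, rip):
    """Step (4): the toy emulator re-reads the instruction from guest memory."""
    return guest_memory[rip]

def run_attack():
    """Replay the four steps in order; return (trapped, emulated) instructions."""
    rip = 0x1000
    guest_memory = {rip: "MOV R8, [HPET]"}
    trapped = guest_memory[rip]                      # (1) vCPU0 executes the MMIO access
    assert causes_mmio_exit(trapped)                 # (2) VM-exit to the hypervisor
    guest_memory[rip] = "SYSENTER"                   # (3) vCPU1 rewrites the instruction
    emulated = hypervisor_emulate(guest_memory, rip) # (4) emulator sees the new one
    return trapped, emulated

print(run_attack())
```

The instruction that caused the exit and the instruction that gets emulated differ, so a guest can steer the emulator toward an instruction it would not normally have to handle.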

SLIDE 22

Implications: Stability

  • Hard to quantify
  • One bug caused virtual machines to freeze
  • Nontrivial race
  • Turned out to be a 5-year-old bug
  • Had been seen a number of times over the years
  • 4 additional software regressions
SLIDE 23

Hardware Flaws

  • Found 4 architecture flaws
  • Desired virtual machine properties
  • Equivalence
  • Efficiency
  • Resource Control
  • Causes:
  • Non-virtualizable state
  • Missing state save/restore facilities
  • Errata

Both cannot be kept

SLIDE 24

Hardware Flaw: FPU state

  • Old CPUs: restore either 16-bit FCS or 64-bit FIP
  • New CPUs: deprecate FCS save/restore

New Problem in Real-Mode: FIP = (FCS << 4) | FIP

[Diagram: saved FPU instruction pointer layouts, a 16-bit FCS with a 32-bit FIP versus a 64-bit FIP]
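The real-mode expression on this slide can be evaluated directly. This is a small sketch with hypothetical function names and made-up example values; it takes the slide's formula literally and sets it beside the usual real-mode segment:offset computation, which adds the shifted segment to the offset. The two agree whenever the shifted segment and the offset share no set bits.

```python
def fip_as_on_slide(fcs, fip):
    """The slide's expression, taken literally: (FCS << 4) | FIP."""
    return (fcs << 4) | fip

def real_mode_linear(segment, offset):
    """Standard real-mode address formation: segment * 16 + offset."""
    return (segment << 4) + offset

# No overlapping bits: both forms give the same value.
print(hex(fip_as_on_slide(0x1234, 0x0005)), hex(real_mode_linear(0x1234, 0x0005)))

# Overlapping bits: OR and addition diverge.
print(hex(fip_as_on_slide(0x1234, 0x5678)), hex(real_mode_linear(0x1234, 0x5678)))
```

Either way, once FCS and the offset are folded into a single value, the original pair cannot be recovered from it, which is consistent with the slide calling this a new problem.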

SLIDE 25

Outline

  • Motivation
  • System
  • Physical CPU testing tools
  • Adapting tools to VCPUs
  • Results
  • Causes of bugs
  • Impact of bugs
  • Architectural flaws (as opposed to SW bugs)
  • Conclusions
SLIDE 26

Conclusions

  • Virtualization robustness/security should not be assumed

  • CPU vendors are able to test hypervisors efficiently
  • And it is in their best interest…
  • Demand it from your CPU vendor!