Scotch: Combining Software Guard Extensions and System Management Mode to Monitor Cloud Resource Usage

Kevin Leach¹, Fengwei Zhang², and Westley Weimer¹

¹University of Michigan, ²Wayne State University

Leach, Zhang, & Weimer 1 / 19


Summary

We use Intel Software Guard Extensions (SGX) and System Management Mode (SMM) to accurately monitor the resource consumption of virtual machines (VMs), even in the presence of a compromised VM or hypervisor.

Motivation

◮ Multi-tenant cloud computing increases utilization
◮ The client agrees to pay the cloud provider for a particular service level
  ◮ e.g., $1 per hour of CPU time
◮ The cloud provider depends on a hypervisor/virtual machine monitor (VMM) platform to distribute resources
  ◮ Xen, QEMU, etc.

[Figure: a cloud VMM hosting VM 1, VM 2, and VM 3, each allotted 1/3 of the CPU.]

If all 3 VMs peg the CPU, the VMM must decide how to allocate CPU time based on each client’s service level.
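One common way to honor different service levels is proportional sharing: each contending VM receives CPU time in proportion to a weight. The sketch below assumes hypothetical integer weights per client and a hypothetical helper `proportional_shares`; Xen's credit scheduler also uses per-domain weights, but its actual algorithm is not modeled here.

```python
from fractions import Fraction


def proportional_shares(weights):
    """Split one CPU among contending VMs in proportion to (hypothetical) service-level weights."""
    total = sum(weights.values())
    return {vm: Fraction(w, total) for vm, w in weights.items()}


# Equal service levels: each of the three VMs gets 1/3 of the CPU, as in the figure.
equal = proportional_shares({"VM 1": 1, "VM 2": 1, "VM 3": 1})
# A client paying for twice the service level gets twice the share of the others.
tiered = proportional_shares({"VM 1": 2, "VM 2": 1, "VM 3": 1})
print(equal)
print(tiered)
```

Exact fractions keep the shares summing to exactly one CPU, which makes it easy to see that any CPU time one VM gains must come out of another VM's share.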

Motivation

◮ The cloud provider depends on the VMM platform to distribute resources
◮ Two issues
  1. What if the VMM/cloud provider is malicious?
     ◮ It could manipulate resource consumption to bill customers more
  2. What if the VMM is vulnerable to malicious VMs?
     ◮ A malicious VM could manipulate resource consumption to steal resources from benign customers

Resource Accounting Attacks

◮ Benign Behavior

[Figure: timeline of CPU time from 0 ms to 180 ms in 30 ms steps. VMs 1 and 2 alternate on the CPU (1 2 1 2 1 2 1), and the guest the VMM decides to bill follows the same pattern (1 2 1 2 1 2 1).]

The Xen hypervisor periodically checks which VM is active and uses these samples to determine how much CPU time each VM consumes.
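To make the sampling idea concrete, here is a small Python model of tick-based accounting, in which a whole tick is charged to whichever VM occupies the CPU when the tick fires. The 30 ms period is read off the slide's time axis, and `sampled_bill` and `true_usage` are hypothetical helpers; Xen's real scheduler accounting is more involved and is not reproduced here. In the benign schedule, the sampled bill matches true usage.

```python
TICK_MS = 30  # sampling period, assumed from the slide's 30 ms time axis


def sampled_bill(schedule, tick_ms=TICK_MS):
    """Charge a whole tick to whichever VM is running when each tick fires.

    schedule: contiguous, ordered list of (vm, start_ms, end_ms) intervals.
    """
    horizon = schedule[-1][2]
    billed = {}
    for t in range(tick_ms, horizon + 1, tick_ms):
        for vm, start, end in schedule:
            if start < t <= end:  # the VM occupying the CPU just before the tick
                billed[vm] = billed.get(vm, 0) + tick_ms
                break
    return billed


def true_usage(schedule):
    """Sum the time each VM actually spent on the CPU."""
    used = {}
    for vm, start, end in schedule:
        used[vm] = used.get(vm, 0) + (end - start)
    return used


# Benign case: VMs 1 and 2 alternate in 30 ms slices, so sampling is exact.
benign = [(1, 0, 30), (2, 30, 60), (1, 60, 90), (2, 90, 120), (1, 120, 150), (2, 150, 180)]
print(sampled_bill(benign))  # {1: 90, 2: 90}
print(true_usage(benign))    # {1: 90, 2: 90}
```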

Resource Accounting Attacks

◮ Malicious Behavior

[Figure: the same timeline. VMs 1 and 2 still alternate on the CPU (1 2 1 2 1 2 1), but the VMM decides to bill only VM 1 (1 1 1 1).]

A malicious VM (VM 2) that knows how the VMM accounts for CPU time can distort the apparent resource consumption of both itself and benign VMs.
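A small Python model shows how tick sampling can be gamed. In this hypothetical schedule, VM 2 runs for 28 ms of every 30 ms period and hands the CPU to VM 1 just before each tick, so VM 1 is billed for every tick. This yield-before-the-tick pattern is one illustrative way to reproduce the slide's "1 1 1 1" billing; the helper `bill_vs_truth`, the 30 ms tick, and all timings are assumptions.

```python
def bill_vs_truth(schedule, tick_ms=30):
    """Return (billed, used) per VM for a tick-sampling accountant.

    schedule: contiguous (vm, start_ms, end_ms) intervals. A whole tick is
    charged to the VM running just before each tick fires.
    """
    billed, used = {}, {}
    for vm, start, end in schedule:
        used[vm] = used.get(vm, 0) + (end - start)
    for t in range(tick_ms, schedule[-1][2] + 1, tick_ms):
        vm = next(v for v, s, e in schedule if s < t <= e)
        billed[vm] = billed.get(vm, 0) + tick_ms
    return billed, used


# Hypothetical attack: VM 2 runs 28 ms of every 30 ms period, then yields to VM 1 before the tick.
attack = []
for p in range(0, 180, 30):
    attack += [(2, p, p + 28), (1, p + 28, p + 30)]

billed, used = bill_vs_truth(attack)
print(billed)  # {1: 180}
print(used)    # {2: 168, 1: 12}
```

VM 1 is billed for all 180 ms while actually using only 12 ms, and VM 2 consumes 168 ms without being billed at all.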

Resource Interference Attacks

The attacker can take advantage of known victim behavior.

[Figure: the malicious VM floods the victim VM's HTTP server (webserver); the victim thrashes on I/O and frees CPU cycles.]

A malicious VM can cause a benign VM to free up resources, which the malicious VM then uses for itself.

VM Escape Attack

A malicious VM can exploit a buggy VMM implementation to execute code with VMM privilege

◮ It could then alter resource consumption data to hide itself

Scotch: Transparent Cloud Resource Accounting

Two desired properties

  1. Transparent
     ◮ The underlying VMM and VMs are not aware that accounting occurs
  2. Tamper-resistant
     ◮ A malicious VMM or VM guest cannot reliably alter accounting data

Insights for Scotch

◮ System Management Mode

A high-priority System Management Interrupt (SMI) causes the CPU to atomically execute the SMM handler code

◮ Use SMM to collect raw resource consumption data
◮ SMM collects the data, then relays it to the SGX enclave

Insights for Scotch

◮ Software Guard Extensions

Enclave-based trusted execution environment (TEE): userspace code runs in isolation

◮ Use an SGX enclave so that a benign user can monitor and verify their own resource consumption

◮ Raw data collected by SMM is relayed to the SGX enclave

Scotch: Transparent Cloud Resource Accounting

[Figure: a protected system running a VMM (e.g., Xen) with guests VM1, VM2, and VM3; an SMI handler holding per-VM accounting data (VM1 data, VM2 data, VM3 data); a true timer; and SGX enclaves. Arrows numbered 1 to 5 correspond to the steps below.]

  1. The VMM decides to switch between VM guests
  2. Scotch measures resource consumption by invoking SMM on every context switch
  3. The SMM handler executes resource accounting in isolation
  4. Data is marshalled to the SGX enclave within the VM
  5. A benign VM can monitor resource accounting data with high integrity
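Steps 1 to 4 can be sketched as a small Python model in which an accountant is invoked at every context switch and charges the elapsed time to the outgoing VM. `SmiAccountant` and its methods are hypothetical names, and the timestamps are illustrative; real SMM handler code runs from SMRAM, not Python, so this models only the bookkeeping, not the hardware isolation that makes it tamper-resistant.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SmiAccountant:
    """Hypothetical model of per-context-switch accounting.

    In Scotch this state would live in SMRAM and be updated only by the SMI
    handler; here it is an ordinary object.
    """
    totals: Dict[int, int] = field(default_factory=dict)
    current_vm: Optional[int] = None
    last_ts: int = 0

    def on_context_switch(self, next_vm: int, now: int) -> None:
        # Steps 2-3: at each switch, charge the elapsed time (in the real
        # system, read from a trusted timer) to the VM that was just running.
        if self.current_vm is not None:
            elapsed = now - self.last_ts
            self.totals[self.current_vm] = self.totals.get(self.current_vm, 0) + elapsed
        self.current_vm, self.last_ts = next_vm, now

    def marshal_for(self, vm: int) -> int:
        # Step 4: hand a VM only its own total, for its enclave to read.
        return self.totals.get(vm, 0)


acct = SmiAccountant()
# Step 1: the VMM switches guests at these timestamps (ms); VM 2 yields just before 30 and 60.
for vm, ts in [(2, 0), (1, 28), (2, 30), (1, 58), (2, 60)]:
    acct.on_context_switch(vm, ts)
print(acct.marshal_for(2), acct.marshal_for(1))  # 56 4
```

Because every switch is charged, a VM that yields just before a periodic sampling tick is still billed for the 28 ms it actually ran in each period.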

Evaluation

Research Questions

◮ RQ1: Can we maintain accurate accounting during scheduler attacks?
◮ RQ2: What is our overhead on benign workloads?
◮ RQ3: Can we maintain accurate accounting during resource interference attacks?
◮ RQ4: Can we maintain accurate accounting during VM escape attacks?

RQ1: Scheduler Attacks

◮ Implement a controllable scheduler
◮ Simulate an attacker by altering the CPU time allocation to varying degrees
◮ Run two VMs, one simulated attacker and one benign
◮ Both run indicative workloads: pi, gzip, and the PARSEC benchmarks
◮ Compare the observed CPU time consumption reported by Xen vs. Scotch
◮ TL;DR: Scotch reveals a significant difference in allocated CPU time between the attacker and the benign VM

RQ1: Scheduler Attacks

Table: Ratio of attacker VM CPU time to benign guest VM CPU time, by scheduler attack severity level.

Severity level   Benign   1      3      5      7      9      10
Scotch           1.00     1.04   1.10   1.17   1.26   1.36   1.41
Ground truth     0.99     1.05   1.12   1.17   1.25   1.35   1.39

The attacker receives disproportionate CPU time. Ground truth was obtained with Xentrace.
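A quick way to quantify how closely Scotch tracks the Xentrace ground truth is to compare the two rows directly. The values below are copied from the table; the computation is a simple illustration, not an analysis taken from the slides.

```python
severity = ["Benign", "1", "3", "5", "7", "9", "10"]
scotch = [1.00, 1.04, 1.10, 1.17, 1.26, 1.36, 1.41]
ground_truth = [0.99, 1.05, 1.12, 1.17, 1.25, 1.35, 1.39]

# Per-level absolute and relative differences between Scotch and ground truth.
abs_err = [abs(s - g) for s, g in zip(scotch, ground_truth)]
rel_err = [e / g for e, g in zip(abs_err, ground_truth)]

worst = max(range(len(severity)), key=lambda i: rel_err[i])
print(f"max absolute difference: {max(abs_err):.2f}")
print(f"max relative difference: {rel_err[worst]:.1%} at severity {severity[worst]}")
```

Across all severity levels the two rows differ by at most 0.02 in the ratio, under 2% relative, with the largest relative gap at severity level 3.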

RQ2: Overhead

◮ Invoking SMIs to run accounting code can be costly
◮ Accounting code takes 2248 ± 69 cycles to execute
  ◮ Roughly 1µs is incurred on every context switch
◮ Xen takes roughly 20,000 cycles (7.1µs) per context switch
  ◮ Scotch adds 14% overhead per context switch
  ◮ 0.0033% system overhead on CPU-bound workloads
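As a rough consistency check, the overhead arithmetic can be sketched in Python. Two inputs are assumptions rather than slide data: the clock rate is inferred from "20,000 cycles (7.1µs)", and the system-wide figure assumes a CPU-bound guest is switched once per 30 ms scheduling quantum (the default timeslice of Xen's credit scheduler).

```python
# Back-of-the-envelope check of the RQ2 overhead figures.
# Values from the slide
XEN_SWITCH_CYCLES = 20_000
XEN_SWITCH_US = 7.1
ACCOUNTING_CYCLES = 2248
SCOTCH_SWITCH_US = 1.0  # the slide's rounded per-switch cost

# Assumption (not from the slides): one context switch per 30 ms quantum
TIMESLICE_MS = 30.0

clock_ghz = XEN_SWITCH_CYCLES / (XEN_SWITCH_US * 1_000)  # cycles per ns == GHz
accounting_us = ACCOUNTING_CYCLES / (clock_ghz * 1_000)  # cycles -> microseconds
per_switch = SCOTCH_SWITCH_US / XEN_SWITCH_US            # relative to Xen's switch cost
switches_per_s = 1_000 / TIMESLICE_MS
system = switches_per_s * SCOTCH_SWITCH_US / 1_000_000   # fraction of each second

print(f"implied clock: {clock_ghz:.2f} GHz")
print(f"accounting code alone: {accounting_us:.2f} us")
print(f"per-switch overhead: {per_switch:.1%}")
print(f"system overhead: {system:.4%}")
```

The 2248-cycle accounting code alone comes to about 0.8µs at the implied clock; the 14% figure follows from the rounded 1µs per switch (1 / 7.1 ≈ 14.1%), and about 33 switches per second at 1µs each gives roughly 0.0033% of CPU time.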

RQ3: Resource Interference Attacks

◮ By construction, Scotch provides accurate accounting information
◮ Scotch does not automatically detect resource interference attacks
◮ However, SGX gives userspace access to reliable accounting information
  ◮ Clients can monitor their own resource usage and perform their own analysis for their use case
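As one hypothetical example of such client-side analysis, a client could flag intervals in which its VM received noticeably less CPU time than it pays for. The function `flag_low_share`, the per-interval feed, and the 20% tolerance are all illustrative assumptions, not part of Scotch.

```python
def flag_low_share(cpu_ms_per_interval, interval_ms, expected_share, tolerance=0.2):
    """Return indices of intervals where this VM got noticeably less CPU than expected.

    cpu_ms_per_interval: CPU time per interval, as read from the enclave (hypothetical feed).
    expected_share: fraction of one CPU the client pays for, e.g. 1/3.
    tolerance: allowed shortfall before flagging (an arbitrary illustrative default).
    """
    threshold = expected_share * interval_ms * (1 - tolerance)
    return [i for i, used in enumerate(cpu_ms_per_interval) if used < threshold]


# One-second intervals for a VM paying for 1/3 of a CPU; intervals 2 and 3 dip sharply.
print(flag_low_share([330, 335, 120, 110, 332], interval_ms=1000, expected_share=1 / 3))
# [2, 3]
```

A flagged dip is only a hint: a webserver thrashing on I/O also uses less CPU, so the client would still need to correlate such dips with its own workload (for example, a burst of incoming HTTP requests) before concluding that an interference attack occurred.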

RQ4: VM Escape Attacks

◮ Accounting code is stored in isolated SMRAM and an SGX enclave
◮ Even if an attacker roots the hypervisor, they cannot change the accounting code
◮ The BIOS locks SMRAM, so if the BIOS is trusted, the attacker has no opportunity to infiltrate SMM

Conclusion

◮ Scotch is an SMM- and SGX-based framework for accurately accounting for resource consumption in cloud infrastructure
◮ Scotch accounts for resource usage on every context switch, introducing minimal overhead on indicative workloads
◮ Scotch accurately accounts for CPU time consumption in the presence of scheduler attacks
◮ Porting drivers to SMM would readily admit accounting for additional types of resources, such as network usage
◮ By construction, Scotch's accounting remains protected against VM escape and other control-hijacking attacks on the hypervisor
