System-Level Protection Against Cache-Based Side Channel Attacks in the Cloud - PowerPoint PPT Presentation
SLIDE 1

System-Level Protection Against Cache-Based Side Channel Attacks in the Cloud

Taesoo Kim (MIT CSAIL), Marcus Peinado and Gloria Mainar-Ruiz (Microsoft Research)

SLIDE 2

Security is a big concern in cloud adoption

SLIDE 3

Why are cache-based side channel attacks important?

  • CPU cache is the most fine-grained shared resource in the cloud environment
  • Cache-based side channel attacks:
    • 2003 DES by Tsunoo et al. (with 2^26.0 samples)
    • 2005 AES by Bernstein et al. (with 2^18.9 samples)
    • 2005 RSA by Percival et al. (-)
    • 2011 AES by Gullasch et al. (with 2^6.6 samples)
SLIDE 4

Background: CPU & Memory

[Figure: CPU with per-core L1 and L2 caches in front of main memory]
SLIDE 5

Background: cache structure

[Figure: Core1-Core4 share an 8 MB L3 cache backed by 16 GB of RAM; a cache hit costs ~50 cycles vs. ~240 cycles for a miss, and RAM is ~2,046x the size of the L3]
SLIDE 6

Background: cache terminology

  • Pre-image set: the set of memory locations that map to the same cache line

[Figure: a pre-image set in RAM mapping to one L3 cache line]
SLIDE 7

Background: cache terminology

  • Pre-image set: the set of memory locations that map to the same cache line
  • Cache line set: the set of cache lines mapped to by the same pre-image set

[Figure: cache line sets and their pre-image sets in RAM; pre-image sets of different cache line sets form different classes of colored pages]

SLIDE 8

Background: cache-based side channel

[Figure: victim (Core1) and attacker (Core2) share the 8 MB L3 over 16 GB of RAM; cache hit ~50 cycles, cache miss ~240 cycles]
SLIDE 9

Cache-based side channel attacks (cache attacks)

The attacker times its own memory accesses in a loop:

    while (1) {
        beg  = rdtsc();
        /* access memory */
        diff = rdtsc() - beg;
    }

[Figure: attacker on Core2 plots diff over time t while the victim on Core1 performs S-box lookups in the shared L3]

SLIDE 10

Types of cache attacks

  • Time-driven attacks: measure access times depending on the state of the cache
    • Passive time-driven attacks: measure the victim's total execution time
    • Active time-driven attacks: additionally manipulate the state of the cache
  • Trace-driven attacks: probe which cache lines the victim has accessed

→ Attackers must co-locate with a victim
SLIDE 11

Goal

To provide cloud tenants a protection mechanism against cache attacks:

  • Active time-driven attacks
  • Trace-driven attacks

But our solution still provides:

  • Minimal performance overhead
  • Compatibility with commodity hardware
SLIDE 12

Idea: protect only sensitive data

  • Give a private page to each cloud tenant
  • No other tenants can cause cache interference
  • Load sensitive data to the private page

    void *sm_alloc(size_t size);
    void  sm_free(void *ptr);

SLIDE 13

Strawman: construct a private page

  • Do not assign the pre-image sets of the private pages (the same-colored pages) to other VMs

[Figure: M1, a private page of VM1, locked in the L3; its same-colored pages in RAM (~1%) are reserved and withheld from other VMs]
SLIDE 14

Strawman: assign a private page to each VM

[Figure: each VM's private page M1-M5 occupies its own cache line set in the L3; the corresponding same-colored pages in RAM are reserved]

  • 1. How to make sure that a private page stays in the cache?
SLIDE 15

Strawman: assign a private page to each VM

[Figure: as before, one private page per VM; with five VMs, M1-M5 already consume five cache line sets]

  • 2. How to make it scalable as the number of VMs increases?
SLIDE 16

Strawman: assign a private page to each VM

[Figure: same setup; the reserved regions now amount to ~1% of RAM x 5 VMs]

  • 3. How to utilize the reserved regions?
SLIDE 17

Three challenges

  • 1. How to make sure that a private page stays in the cache?
    → Lock cache lines
  • 2. How to make it scalable as the number of VMs increases?
    → Assign a private page per core
  • 3. How to utilize the reserved regions?
    → Mediate accesses to the reserved regions
SLIDE 18
  • 1. Locking cache lines
    • Locked: never evicted from the cache
    • Inertia property of the cache (shared LLC):
      • An eviction can only happen when there is an attempt to add another item to the cache
      • Cache lines stay put until we access an address that is not in the cache

SLIDE 19

Cache interference

[Figure: three sources of cache interference between VM1 and VM2: context switches on the same core, hyperthreads sharing a core, and simultaneous execution on different cores sharing the L3]
SLIDE 20

Keep cache lines locked

  • Context switch:
  • Reload locked cache lines
  • Hyperthread:
  • Force gang schedule

(no two VMs run on the same core simultaneously)

  • Simultaneous execution:
  • Never map pages that collide with private pages
SLIDE 21
  • 2. Assign a private page per core

[Figure: per-core private pages M1 (Core1) and M2 (Core2) are locked in the L3; each VM's own private page M1-M5 resides in RAM until that VM is scheduled]

  • Load the private page of the active VM onto the private page of the core
SLIDE 22
  • 2. Assign a private page per core

[Figure: same per-core setup as above; each running VM accesses only its own core's locked private page]

  • No cache interference between running VMs
SLIDE 23

Save / load private pages on context switch

[Figure: on a context switch, the outgoing VM's private-page contents are saved to its page in RAM and the incoming VM's private page is loaded into the core's locked lines]
SLIDE 24
  • 3. Utilize reserved regions

[Figure: besides the per-core private pages M1 and M2, the remaining reserved same-colored pages in RAM are handed back to the VMs]

  • Assign pages to VMs
  • Mediate their accesses
SLIDE 25

Page Table Alert (PTA)

[Figure: the hypervisor marks reserved pages invalid (I) in the EPT, leaving only valid (V) entries mapped to host physical addresses (HPA)]

  • Mark reserved pages (pre-image sets) as invalid
  • Mediate accesses to them in the page fault handler
SLIDE 26

Handle Page Table Alert (PTA)

  • On each PTA, reload the locked private line, make the faulting reserved page valid, and mark another valid page in the same pre-image set invalid, maintaining per cache line set:

    (# valid pages) + (# locked lines) = set-associativity (w = 3 in the figure)

[Figure: a pre-image set of four reserved pages ①-④ sharing one cache line set with the locked private line; successive accesses each trigger a PTA, a reload, and the invalidation of another page]

SLIDE 27

Summary of design

  • Tenants use a private page for sensitive data
  • Assign a private page per core
    • Use a fixed amount of reserved memory
    • Load the private page of the active VM onto one of the cores
  • Utilize the reserved regions
    • Assign reserved regions to VMs as usual
    • Mediate their accesses with PTA
SLIDE 28

Implementation: StealthMem

  • Host OS: Windows Server 2008 R2
  • bcdedit: configure reserved area as bad pages
  • Hypervisor: HyperV
  • Disable large pages (2MB/4MB)
  • Mediate invd, wbinvd instructions from VMs
  • Expose a single private page to VM

Component           Modified lines of code
Bootmgr/Winloader   500 lines of C
HyperV              5,000 lines of C
SLIDE 29

Evaluation

  • How much overhead?
  • How does it compare with the stock HyperV?
  • How does it compare with other mechanisms?
  • How to understand overhead characteristics?
  • How easy to adopt in existing applications?
  • How to secure popular block ciphers?
SLIDE 30

Overhead without large pages

  • Run SPEC2006; average overhead:
    • w/o large pages: 4.9%
    • StealthMem: 5.9%
SLIDE 31

Compare with PageColoring

  • PageColoring: statically divide caches per VM
  • Run SPEC2006 with various #VM

[Figure: SPEC2006 overhead of StealthMem vs. PageColoring for varying numbers of VMs]
SLIDE 32

Microbench: overheads with various working sets

  • Microbench:
  • Working set: vary array size between 1~12 MB
  • Read array in quasi-linear fashion
  • Measure execution time
  • Settings:
  • Each VM has a private page
  • 7 VMs: one VM runs microbench while others idle

  – Baseline, PageColoring
  – StealthMem (w/o PTA): do not utilize reserved regions
  – StealthMem (w/ PTA): utilize reserved regions with PTA

SLIDE 33

Microbench: overheads with various working sets

[Figure: overhead vs. working set size, annotated with the TLB reach (2 MB = 4 KB x 512) and the L3 size (8 MB)]
SLIDE 34

Microbench: overheads with various working sets

SLIDE 35

Modifying existing applications

  • e.g., modify Blowfish to use StealthMem

Encryption   Size of S-box        LoC changes
DES          256 * 8  = 2 kB      5 lines
AES          1024 * 4 = 4 kB      34 lines
Blowfish     1024 * 4 = 4 kB      3 lines

Original:

    static unsigned long S[4][256];

Modified:

    typedef unsigned long ULA[256];
    static ULA *S;
    /* in an initialization function: */
    S = sm_alloc(4*4*256);
SLIDE 36

Overhead of secured ciphers

  • Encryption throughput of DES / AES / Blowfish
    • Baseline: unmodified version
    • Stealth: secured S-box with StealthMem

Cipher     Small buffer (50,000 bytes)     Large buffer (5,000,000 bytes)
           Baseline    Stealth             Baseline    Stealth
DES        60 MB/s     58 MB/s (-3%)       59 MB/s     57 MB/s (-3%)
AES        150 MB/s    143 MB/s (-5%)      142 MB/s    135 MB/s (-5%)
Blowfish   77 MB/s     75 MB/s (-2%)       75 MB/s     74 MB/s (-2%)
SLIDE 37

Related work

  • Initial abstraction of StealthMem (by Erlingsson and Abadi)
  • Hardware-based:
  • Obfuscating access patterns: PLcache, RPcache ...
  • Dynamic cache partitioning
  • App. specific hardware: AES encryption instruction

→ StealthMem works on commodity hardware

  • Software-based:
  • Static partitioning: PageColoring
  • App. specific mitigation: reducing timing channels

→ StealthMem is flexible and provides better performance
SLIDE 38

Conclusion

  • StealthMem: an efficient system-level protection mechanism against cache-based side channel attacks
  • Implements the abstraction of StealthMem
  • Three new techniques:
    • Locking cache lines
    • Assigning a private page per core
    • Mediating accesses to the private pages with PTA