System-Level Protection Against Cache-Based Side Channel Attacks in - PowerPoint PPT Presentation

System-Level Protection Against Cache-Based Side Channel Attacks in the Cloud
Taesoo Kim, Marcus Peinado, Gloria Mainar-Ruiz
MIT CSAIL, Microsoft Research

Security is a big concern in cloud adoption

Why are cache-based side channel attacks important?
● CPU cache is the most fine-grained shared resource in the cloud environment
● Cache-based side channel attacks:
    ● 2003 DES by Tsunoo et al. (with 2^26.0 samples)
    ● 2005 AES by Bernstein et al. (with 2^18.9 samples)
    ● 2005 RSA by Percival et al. (-)
    ● …
    ● 2011 AES by Gullasch et al. (with 2^6.6 samples)

Background: CPU & Memory
[Figure: CPU and memory, with L1 and L2 caches]

Background: cache structure
● Cache hit: ~50, cache miss: ~240
[Figure: Core1 to Core4 share an 8 MB L3 cache in front of 16 GB RAM (> 2046 x the L3 size)]

Background: cache terminology
● Pre-image set: set of memory locations mapped into the same cache line
[Figure: a pre-image set in RAM mapping into L3]

Background: cache terminology
● Pre-image set: set of memory locations mapped into the same cache line
● Cache line set: set of cache lines mapped by the same pre-image set
[Figure: a cache line and a cache line set in L3; pre-image sets in RAM shown as colored pages, with a different color for each class of pages]
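The mapping from physical addresses to cache line sets and page colors can be sketched as below. Only the 8 MB L3 size comes from the slides; the 64-byte line, 4 KB page, 16-way associativity, and the plain modulo indexing (no slice hashing) are assumptions made for illustration, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache geometry; only the 8 MB L3 size is from the slides. */
#define LINE_SIZE      64u
#define PAGE_SIZE      4096u
#define CACHE_SIZE     (8u * 1024u * 1024u)
#define WAYS           16u

#define NUM_SETS       (CACHE_SIZE / (LINE_SIZE * WAYS))   /* 8192 sets  */
#define SETS_PER_PAGE  (PAGE_SIZE / LINE_SIZE)             /* 64 sets    */
#define NUM_COLORS     (NUM_SETS / SETS_PER_PAGE)          /* 128 colors */

/* Index of the cache line set that a physical address maps to. */
static inline uint32_t cache_set_index(uint64_t paddr)
{
    return (uint32_t)((paddr / LINE_SIZE) % NUM_SETS);
}

/* Color of the page containing paddr: pages of the same color compete
 * for the same group of cache line sets. */
static inline uint32_t page_color(uint64_t paddr)
{
    return (uint32_t)((paddr / PAGE_SIZE) % NUM_COLORS);
}

/* Two addresses belong to the same pre-image set when they map to the
 * same cache line set. */
static inline int same_preimage_set(uint64_t a, uint64_t b)
{
    return cache_set_index(a) == cache_set_index(b);
}
```

Under these assumptions, pages whose frame numbers differ by a multiple of 128 share a color, so reserving one color takes 1/128 of RAM.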

Background: cache-based side channel
● Cache hit: ~50, cache miss: ~240
[Figure: victim and attacker run on different cores (Core1 to Core4) that share the 8 MB L3 cache and 16 GB RAM]

Cache-based side channel attacks (cache attacks)
    while (1) {
        beg = rdtsc();
        /* access memory */
        diff = rdtsc() - beg;
    }
[Figure: victim on Core1 and attacker on Core2 share L3; the attacker plots diff over time t to guess the victim's S-Box accesses]
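What the attacker does with the measured `diff` values can be sketched as a simple threshold test. The ~50 and ~240 figures are from the slides; treating them as cycle counts, using their midpoint as the threshold, and the function names are all assumptions of this sketch rather than anything the presentation specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Latencies from the slides (~50 for a hit, ~240 for a miss).
 * Interpreting them as cycle counts is an assumption. */
#define HIT_LATENCY     50u
#define MISS_LATENCY    240u
#define MISS_THRESHOLD  ((HIT_LATENCY + MISS_LATENCY) / 2u)   /* 145 */

/* A probe slower than the midpoint is classified as a cache miss. */
static int is_cache_miss(uint64_t diff)
{
    return diff > MISS_THRESHOLD;
}

/* For n probed lines, set evicted[i] to 1 when the i-th reload was slow
 * (the line was evicted, e.g. by a victim access). Returns how many
 * lines were classified as evicted. */
static size_t mark_evicted_lines(const uint64_t *diff, size_t n, int *evicted)
{
    size_t i, count = 0;

    assert(diff != NULL && evicted != NULL);
    for (i = 0; i < n; i++) {
        evicted[i] = is_cache_miss(diff[i]);
        count += (size_t)evicted[i];
    }
    return count;
}
```

Real measurements are noisy, so an attacker would typically repeat the probe many times; this sketch classifies a single round only.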

Types of cache attacks
● Time-driven attacks: measure access time depending on states of cache
    ● Passive time-driven attacks: measure total execution time of victim
    ● Active time-driven attacks: manipulate states of cache
● Trace-driven attacks: probe which cache lines victim has accessed
→ Attackers should co-locate with a victim

Goal
To provide cloud tenants a protection mechanism against cache attacks:
● Active time-driven attacks
● Trace-driven attacks
While still providing:
● Minimal performance overhead
● Compatibility with commodity hardware

Idea: protect only sensitive data
● Give a private page to each cloud tenant
● No other tenants can cause cache interference
● Load sensitive data to the private page
    void *sm_alloc(size_t size);
    void sm_free(void *ptr);
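To make the interface concrete, here is a user-space stand-in for `sm_alloc` / `sm_free`. Only the two signatures come from the slides. The static 4 KB buffer, the bump allocation, the 16-byte rounding, and the rule that the page is reused once every allocation has been freed are assumptions of this sketch; an ordinary buffer like this one provides none of the cache protection that a real private page would.

```c
#include <assert.h>
#include <stddef.h>

#define SM_PAGE_SIZE 4096u

/* Stand-in for the single private page (assumption: one 4 KB page).
 * The union forces an alignment suitable for common data types. */
static union {
    unsigned char bytes[SM_PAGE_SIZE];
    long double   align_ld;
    long long     align_ll;
    void         *align_ptr;
} sm_page;

static size_t sm_used;   /* bytes handed out so far  */
static size_t sm_live;   /* outstanding allocations  */

/* Allocate size bytes from the private page, or return NULL if the
 * request does not fit in what is left of the page. */
void *sm_alloc(size_t size)
{
    size_t rounded;
    void *p;

    if (size == 0 || size > SM_PAGE_SIZE)
        return NULL;
    rounded = (size + 15u) & ~(size_t)15u;      /* keep 16-byte alignment */
    if (rounded > SM_PAGE_SIZE - sm_used)
        return NULL;
    p = sm_page.bytes + sm_used;
    sm_used += rounded;
    sm_live++;
    return p;
}

/* Release an allocation; the page becomes reusable once nothing is live. */
void sm_free(void *ptr)
{
    if (ptr == NULL)
        return;
    assert(sm_live > 0);
    if (--sm_live == 0)
        sm_used = 0;
}
```

A cipher would call `sm_alloc` once at initialization for its lookup tables and keep the pointer for the lifetime of the process.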

Strawman: construct a private page
● Do not assign pre-image sets of the private pages (same colored pages) to other VMs
[Figure: M1, a private page of VM1, sits in L3; the same-colored pages in RAM (~1%) are reserved and not given to VM2]

Strawman: assign a private page to each VM
1. How to make sure that a private page stays in the cache?
[Figure: VM1 to VM5 with private pages M1 to M5 on Core1 and Core2; M1 to M5 occupy L3 and reserved regions in RAM]

Strawman: assign a private page to each VM
2. How to make it scalable if we increase the number of VMs?
[Figure: VM1 to VM5 with private pages M1 to M5 on Core1 and Core2; M1 to M5 occupy L3 and reserved regions in RAM]

Strawman: assign a private page to each VM
3. How to utilize the reserved regions?
[Figure: VM1 to VM5 with private pages M1 to M5 on Core1 and Core2; the reserved regions take ~1% x 5 of RAM]

Three challenges
1. How to make sure that a private page stays in the cache?
   → Lock cache lines
2. How to make it scalable if we increase the number of VMs?
   → Assign a private page per core
3. How to utilize the reserved regions?
   → Mediate accesses on reserved regions

1. Locking cache lines
● Locked: never evicted from the cache
● Inertia property of cache (shared LLC):
    ● An eviction can happen only when there is an attempt to add another item into the cache
    ● Cache lines stay in place until we access an address that is not in the cache
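The inertia property can be illustrated with a toy model of one cache line set. The 4-way associativity and the LRU replacement policy are assumptions of this sketch (real last-level caches use other policies), and all names are hypothetical. The point it shows is that hits never evict anything: a line leaves the set only when a miss brings in a new line.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WAYS 4   /* assumed associativity of the toy set */

struct cache_set {
    uint64_t tag[WAYS];
    int      valid[WAYS];
    unsigned age[WAYS];    /* larger value = less recently used */
};

static void cache_set_init(struct cache_set *s)
{
    memset(s, 0, sizeof(*s));
}

static int cache_contains(const struct cache_set *s, uint64_t tag)
{
    int i;
    for (i = 0; i < WAYS; i++)
        if (s->valid[i] && s->tag[i] == tag)
            return 1;
    return 0;
}

/* Access one line. Returns 1 on a hit and 0 on a miss. Only a miss can
 * replace a line (an empty way first, otherwise the least recently used). */
static int cache_access(struct cache_set *s, uint64_t tag)
{
    int i, victim = 0;

    for (i = 0; i < WAYS; i++)
        s->age[i]++;
    for (i = 0; i < WAYS; i++) {
        if (s->valid[i] && s->tag[i] == tag) {
            s->age[i] = 0;
            return 1;               /* hit: contents of the set unchanged */
        }
    }
    for (i = 0; i < WAYS; i++) {
        if (!s->valid[i]) {
            victim = i;             /* free way available */
            break;
        }
        if (s->age[i] > s->age[victim])
            victim = i;             /* track the oldest line */
    }
    s->tag[victim] = tag;
    s->valid[victim] = 1;
    s->age[victim] = 0;
    return 0;
}
```

In this model a line that is never touched again still survives any number of hits on other lines, and is lost only when a fifth distinct line is brought into the set.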

Cache interference
[Figure: three ways VM1 and VM2 can interfere in the cache: simultaneous execution on Core1 and Core2, context switches on one CPU, and hyperthreads on one CPU; each panel shows the L1C, L1D, L2, and L3 caches]

Keep cache lines locked
● Context switch:
    ● Reload locked cache lines
● Hyperthread:
    ● Force gang schedule (no two VMs run on the same core simultaneously)
● Simultaneous execution:
    ● Never map pages that collide with private pages

2. Assign a private page per core
● Load a private page of active VM onto the private page of the core
[Figure: VM1 to VM5 with private pages M1 to M5; L3 holds one private page per core (Core1, Core2); the private pages of inactive VMs are kept in RAM]

2. Assign a private page per core
● No cache interference between running VMs
[Figure: VM1 to VM5 with private pages M1 to M5; the per-core private pages in L3 do not interfere with each other; the private pages of inactive VMs are kept in RAM]

Save / load private pages on context switch
[Figure: when a core switches between VMs, the outgoing VM's private page is saved from the core's private page to RAM and the incoming VM's private page is loaded in its place]
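The save / load step can be sketched as two copies per context switch. The 2 cores and 5 VMs match the figure; the 4 KB page size, the data structures, and the function name are assumptions of this sketch. It models only the data movement, not the reloading of locked cache lines that the hypervisor also performs.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NUM_CORES 2      /* Core1, Core2 as in the figure */
#define NUM_VMS   5      /* VM1 to VM5 as in the figure   */

/* One private page per core (the page whose lines stay locked in cache). */
static unsigned char core_private_page[NUM_CORES][PAGE_SIZE];
/* Saved private-page contents of every VM, kept in ordinary RAM. */
static unsigned char vm_backing_page[NUM_VMS][PAGE_SIZE];
/* VM currently running on each core, or -1 when the core is idle. */
static int running_vm[NUM_CORES] = { -1, -1 };

/* Switch a core to next_vm: save the outgoing VM's private data,
 * then load the incoming VM's data into the core's private page. */
static void context_switch(int core, int next_vm)
{
    int prev;

    assert(core >= 0 && core < NUM_CORES);
    assert(next_vm >= 0 && next_vm < NUM_VMS);

    prev = running_vm[core];
    if (prev == next_vm)
        return;
    if (prev >= 0)
        memcpy(vm_backing_page[prev], core_private_page[core], PAGE_SIZE);
    memcpy(core_private_page[core], vm_backing_page[next_vm], PAGE_SIZE);
    running_vm[core] = next_vm;
}
```

Because the number of private pages depends on the number of cores rather than the number of VMs, the reserved memory stays fixed as more VMs are added.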

3. Utilize reserved regions
● Assign pages to VMs
● Mediate their accesses
[Figure: VM1 and VM2 on Core1 and Core2 are given pages from the reserved regions in RAM that collide with the private pages in L3]

Page Table Alert (PTA)
● Mark reserved pages (pre-image sets) as invalid
● Mediate their accesses in the page fault handler
[Figure: the hypervisor's EPT maps host physical addresses (HPA) for VM1 and VM2; entries are marked V (valid) or I (invalid), and reserved pages in RAM are marked I]

Handle Page Table Alert (PTA)
● Invariant: # valid pages + locked = set-associativity (w = 3 in the figure)
● Each access to an invalid page of the pre-image set raises a PTA; the handler marks another page invalid and reloads the locked cache line
[Figure: accesses ①, ②, ③ to pages of a pre-image set in memory; each raises a PTA, the cache line set keeps the locked private-page line, and the remaining ways hold lines of the currently valid pages]
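The bookkeeping behind the invariant "# valid pages + locked = set-associativity" can be sketched as follows, with w = 3 as in the figure. The choice of which page to invalidate (oldest first), the five-page pre-image set, the reload counter, and all names are assumptions of this sketch; the slides do not say how the hypervisor picks the page to invalidate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define WAYS          3                      /* w = 3, as in the figure       */
#define LOCKED_LINES  1                      /* the private page's line       */
#define MAX_VALID     (WAYS - LOCKED_LINES)  /* colliding pages valid at once */
#define NUM_PAGES     5                      /* assumed pre-image set size    */

struct pta_state {
    int      valid[NUM_PAGES];   /* 1 if the page is marked valid         */
    int      fifo[MAX_VALID];    /* valid pages, oldest first             */
    size_t   num_valid;
    unsigned reloads;            /* times the locked line was reloaded    */
};

static void pta_init(struct pta_state *s)
{
    memset(s, 0, sizeof(*s));    /* every reserved page starts invalid */
}

/* Access a reserved page. Returns 1 if the access raised a PTA (the page
 * was invalid) and 0 if it went through without a fault. */
static int pta_access(struct pta_state *s, int page)
{
    size_t i;

    assert(page >= 0 && page < NUM_PAGES);
    if (s->valid[page])
        return 0;

    if (s->num_valid == MAX_VALID) {
        /* Keep the invariant: invalidate the oldest valid page first. */
        s->valid[s->fifo[0]] = 0;
        for (i = 1; i < MAX_VALID; i++)
            s->fifo[i - 1] = s->fifo[i];
        s->num_valid--;
    }
    s->valid[page] = 1;
    s->fifo[s->num_valid++] = page;
    s->reloads++;                /* handler reloads the locked line */
    return 1;
}
```

With at most w - 1 colliding pages valid at any time, the valid pages plus the locked line never exceed the associativity, so in this model the locked line always has a way available in its set.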

Summary of design
● Tenants use a private page for sensitive data
● Assign a private page per core
    ● Use a fixed amount of reserved memory
    ● Load the private page of a VM onto one of the cores
● Utilize reserved regions
    ● Assign reserved regions to VMs as usual
    ● Mediate their accesses with PTA

Implementation: StealthMem
● Host OS: Windows Server 2008 R2
    ● bcdedit: configure reserved area as bad pages
● Hypervisor: HyperV
    ● Disable large pages (2MB/4MB)
    ● Mediate invd, wbinvd instructions from VMs
    ● Expose a single private page to VM

Component            Modified lines of code
Bootmgr/Winloader    500 lines of C
HyperV               5,000 lines of C

Evaluation
● How much overhead?
● How does it compare with the stock HyperV?
● How does it compare with other mechanisms?
● How to understand overhead characteristics?
● How easy to adopt in existing applications?
● How to secure popular block ciphers?

Overhead without large pages
● Run SPEC2006
● Average: w/o large pages -4.9%, StealthMem -5.9%

Compare with PageColoring
● PageColoring: statically divide caches per VM
● Run SPEC2006 with various #VM
[Figure: StealthMem vs. PageColoring]

Microbench: overheads with various working sets
● Microbench:
    ● Working set: vary array size between 1 and 12 MB
    ● Read array in quasi-linear fashion
    ● Measure execution time
● Settings:
    ● Each VM has a private page
    ● 7 VMs: one VM runs microbench while others idle
        - Baseline, PageColoring
        - StealthMem (w/o PTA): do not utilize reserved regions
        - StealthMem (w/ PTA): utilize reserved regions with PTA

Microbench: overheads with various working sets
[Figure: execution time vs. working set size, annotated at TLB: 2MB = 4KB x 512 and L3: 8MB]

Microbench: overheads with various working sets

Modifying existing applications
● e.g., modify Blowfish to use StealthMem

original:
    static unsigned long S[4][256];

modified:
    typedef unsigned long ULA[256];
    static ULA *S;
    /* in the initialization function */
    S = sm_alloc(4*4*256);

Encryption    Size of S-box        LoC changes
DES           256 * 8 = 2 kB       5 lines
AES           1024 * 4 = 4 kB      34 lines
Blowfish      1024 * 4 = 4 kB      3 lines

Overhead of secured ciphers
● Encryption throughput of DES / AES / Blowfish
● Baseline: unmodified version
● Stealth: secured S-Box with StealthMem

            Small buffer (50,000 bytes)       Large buffer (5,000,000 bytes)
Cipher      Baseline     Stealth              Baseline     Stealth
DES         60 MB/s      58 MB/s (-3%)        59 MB/s      57 MB/s (-3%)
AES         150 MB/s     143 MB/s (-5%)       142 MB/s     135 MB/s (-5%)
Blowfish    77 MB/s      75 MB/s (-2%)        75 MB/s      74 MB/s (-2%)

Related work
● Initial abstraction of StealthMem (by Erlingsson and Abadi)
● Hardware-based:
    ● Obfuscating access patterns: PLcache, RPcache ...
    ● Dynamic cache partitioning
    ● App. specific hardware: AES encryption instruction
    → StealthMem works on commodity hardware
● Software-based:
    ● Static partitioning: PageColoring
    ● App. specific mitigation: reducing timing channels
    → StealthMem provides flexible, better performance

Conclusion
● StealthMem: an efficient system-level protection mechanism against cache-based side channel attacks
● Implement the abstraction of StealthMem
● Three new techniques:
    ● Locking cache lines
    ● Assigning a private page per core
    ● Mediating access on the private pages with PTA
