Microarchitectural Attacks: Protecting Cloud Accelerators
By Ahmad “Daniel” Moghimi PhD Candidate Worcester Polytechnic Institute (WPI) @danielmgmi
OUTLINE
▪ Summary of Recent Contributions:
  ▪ Microarchitecture → MemJam
  ▪ Intel SGX → CacheZoom
  ▪ Intel EPID → CacheQuote
  ▪ Speculation → Spoiler
  ▪ Mitigation → MicroWalk
▪ Shared FPGA-CPU Hardware Security
  ▪ Proposal
  ▪ Lab Equipment/Setup
  ▪ Ongoing Work
Figure: five instructions moving through the pipeline stages, built up over several slides.
1. add %ebx, %eax
2. sub %eax, %edx
3. xor %ecx, %ecx
4. add %eax, %edi
5. sub %ecx, %edi
Stages: IF = Instruction Fetch, ID = Instruction Decode, EX = Execute, WB = Write Back
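The overlap of stages in the pipeline figure can be sketched with a small model. This is a simplification under stated assumptions: a single-issue, in-order, stall-free pipeline with one cycle per stage, which is not necessarily the exact schedule drawn on the slides. The function name `ideal_schedule` is made up for this illustration.

```python
STAGES = ["IF", "ID", "EX", "WB"]
INSTRUCTIONS = [
    "add %ebx, %eax",
    "sub %eax, %edx",
    "xor %ecx, %ecx",
    "add %eax, %edi",
    "sub %ecx, %edi",
]

def ideal_schedule(instructions, stages=STAGES):
    """Map instruction index -> {stage: cycle} for a stall-free pipeline.

    Assumption: instruction i enters IF in cycle i + 1 and advances one
    stage per cycle. Data hazards and multi-issue are ignored.
    """
    return {
        i: {stage: i + s + 1 for s, stage in enumerate(stages)}
        for i in range(len(instructions))
    }

schedule = ideal_schedule(INSTRUCTIONS)
total_cycles = max(cycles["WB"] for cycles in schedule.values())

# Print one row per instruction, one column per cycle.
for i, text in enumerate(INSTRUCTIONS):
    row = ["  "] * total_cycles
    for stage, cycle in schedule[i].items():
        row[cycle - 1] = stage
    print(f"{text:<16} " + " ".join(row))
```

Under these assumptions the five instructions finish in 8 cycles instead of the 20 that a non-pipelined machine with the same four stages would need.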
▪ Memory loads/stores are executed out of order and speculatively
▪ The dependency is verified after the execution!
▪ 4K Aliasing: Addresses that are a multiple of 4 KB apart (same low 12 bits) are assumed dependent
▪ The load and the instructions that depend on it are re-executed because of the false dependency
▪ Virtual-to-physical address translation → Memory disambiguation
Figure: the store mov %eax, (%ebx) is followed in program order by the load mov (%ecx), %edx. The load is executed before the store; the dependency is checked afterwards (Dependent? Yes).
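The 4K aliasing condition itself is simple arithmetic on addresses. The sketch below is only the address comparison, assuming the check looks at the low 12 bits and nothing else; real hardware also takes access sizes into account, and the helper name `is_4k_alias` and the example addresses are made up for illustration.

```python
PAGE_OFFSET_MASK = 0xFFF  # low 12 bits of an address

def is_4k_alias(store_addr: int, load_addr: int) -> bool:
    """True if the two addresses share the same low 12 bits.

    Assumption: this models only the partial address comparison; two
    distinct addresses that match here form a false dependency.
    """
    return (store_addr & PAGE_OFFSET_MASK) == (load_addr & PAGE_OFFSET_MASK)

# Different pages, same page offset 0x200: treated as dependent.
print(is_4k_alias(0x12ABC200, 0x7FFE1200))
# Offsets 0x200 and 0x204 differ: no false dependency.
print(is_4k_alias(0x12ABC200, 0x7FFE1204))
```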
Figure: two hyperthreads (HT) on one core. Thread A executes and times a sequence of loads (0xFECD1 through 0xFECD8). Over successive slides, Thread B first issues nothing, then repeatedly stores to 0x12ABCDEF, then to 0x12ABC200, then to 0x12ABC.
Figure: address bits and the attacks that use them.
▪ Least-significant 12 bits (Virtual Address = Physical Address)
▪ Rest of the bits (Virtual != Physical)
▪ Labels: L1 Cache Attacks, L2/LLC Cache Attacks
▪ Intra-cache-line leakage through conflicts (4-byte granularity)
▪ Higher execution time correlates with memory accesses that share bits 3 to 12
▪ 4 bits of intra-cache-line leakage
Figure: a CPU with two cores, each with two hyperthreads (HT). An encryption service runs as an interleaved sequence of loads and computation. It is executed, then executed again; the time is higher when there are more 4K conflicts.
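The figure of 4 bits follows from the granularity. A short worked calculation, assuming 64-byte cache lines (the constant names and the helper `word_in_line` are illustrative):

```python
import math

CACHE_LINE_BYTES = 64   # assumption: 64-byte cache lines
GRANULARITY_BYTES = 4   # 4-byte granularity stated on the slide

def word_in_line(addr: int) -> int:
    """Index of the 4-byte word that addr falls into within its cache line."""
    return (addr % CACHE_LINE_BYTES) // GRANULARITY_BYTES

# Number of distinguishable positions inside one line, and the bits that gives.
positions = CACHE_LINE_BYTES // GRANULARITY_BYTES
leaked_bits = int(math.log2(positions))
print(positions, leaked_bits)
```

A classic cache attack only learns which line was touched; distinguishing 16 positions inside the line is what adds the extra 4 bits.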
▪ Scatter-gather implementation of AES
  ▪ 256-byte S-Box in 4 cache lines
  ▪ Cache-independent access pattern
▪ Implemented and distributed as part of Intel products
  ▪ Intel SGX Linux Software Development Kit (SDK)
  ▪ Intel IPP Cryptography Library
Figure: S-Box lookup. The S-Box occupies 4 cache lines of 64 bytes each; one value is read from each line (A, B, C, D) into a local buffer, and the result (B in the example) is taken from that buffer.
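A scatter-gather lookup of this kind can be sketched as follows. This is a minimal illustration, not Intel's code: it assumes that every lookup reads the same offset in each of the 4 lines and then selects the wanted value with a mask. The table contents are a placeholder rather than the AES S-box, and `gather_lookup` is a made-up name. Python gives no timing guarantees; the point is only the access pattern.

```python
CACHE_LINE = 64
LINES = 4
TABLE = bytes(range(256))  # placeholder table, NOT the real AES S-box

def gather_lookup(table: bytes, index: int):
    """Return (value, touched_addresses) for a scatter-gather style lookup.

    Every cache line is read once, so the set of touched lines does not
    depend on index. The offset inside each line still does.
    """
    offset = index % CACHE_LINE   # same offset is read in every line
    wanted = index // CACHE_LINE  # line that holds the real value
    result = 0
    touched = []
    for line in range(LINES):
        addr = line * CACHE_LINE + offset
        touched.append(addr)
        mask = -(line == wanted) & 0xFF  # 0xFF for the wanted line, else 0
        result |= table[addr] & mask
    return result, touched

value, touched = gather_lookup(TABLE, 0x9A)
print(hex(value), [hex(a) for a in touched])
```

All four lines are touched for every index, which defeats line-granular cache attacks. The offset within the line, `index % 64`, still depends on the secret index, and that is the part an intra-cache-line channel can observe.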
▪ Trusted Execution Environment (TEE)
▪ Enclave: Hardware-protected user-level software module
  ▪ Loaded by the user program
  ▪ Mapped by the Operating System
  ▪ Authenticated and Encrypted by the CPU
▪ Memory accesses are protected by the hardware
1. Isolation of the target & victim cache
interrupting the victim
Figure: L1D cache sets, built up over several slides.
Step 1: Attacker primes all the L1D sets
Step 2: Victim executes some code
Step 3: Attacker interrupts the execution pipeline
Step 4: Attacker probes the access times → Go to Step 1
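The four steps can be sketched on a toy cache model. Everything here is an assumption made for illustration: 64 sets, 8 ways, 64-byte lines, LRU replacement, and hits and misses returned directly instead of measured access times. Step 3 (interrupting the victim) is represented only by the order of the calls. The class and function names are hypothetical.

```python
NUM_SETS = 64
WAYS = 8
LINE = 64

class ToyL1D:
    """Set-associative cache with LRU replacement (simulation only)."""

    def __init__(self):
        self.sets = [[] for _ in range(NUM_SETS)]  # oldest entry first

    def access(self, owner: str, addr: int) -> bool:
        """Access addr on behalf of owner; return True on a hit."""
        index = (addr // LINE) % NUM_SETS
        entry = (owner, addr // (LINE * NUM_SETS))
        lines = self.sets[index]
        hit = entry in lines
        if hit:
            lines.remove(entry)
        elif len(lines) == WAYS:
            lines.pop(0)  # evict the least recently used line
        lines.append(entry)
        return hit

def attacker_addresses():
    """One attacker address per (way, set) pair, covering the whole cache."""
    for way in range(WAYS):
        for index in range(NUM_SETS):
            yield (way * NUM_SETS + index) * LINE

def prime(cache):
    # Step 1: fill every way of every set with attacker lines.
    for addr in attacker_addresses():
        cache.access("attacker", addr)

def probe(cache):
    # Step 4: re-access the attacker lines; a miss marks a set the victim used.
    return {(addr // LINE) % NUM_SETS
            for addr in attacker_addresses()
            if not cache.access("attacker", addr)}

cache = ToyL1D()
prime(cache)                        # Step 1
for victim_addr in (0x1040, 0x2FC0):
    cache.access("victim", victim_addr)  # Step 2 (victim then interrupted, Step 3)
print(sorted(probe(cache)))         # Step 4: sets touched by the victim
```

In this model the probe also restores the primed state, so a second probe with no victim activity in between reports no sets, which mirrors the "go to Step 1" loop.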
▪ Quoting Enclave:
  ▪ Built-in enclave from Intel that implements the EPID signature scheme
  ▪ Attests the integrity of a user-provided enclave
▪ The EPID implementation was not constant-time
▪ The loop iteration count leaks the number of leading zero bits
▪ CacheZoom is used to measure it accurately
▪ Feed the short vectors to a lattice and
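Why a loop count reveals leading zero bits can be shown with a generic sketch. This is not the EPID code: it assumes a loop that starts at the most significant set bit of a secret scalar and runs once per remaining bit, so the iteration count equals the scalar's bit length. The function names and the 16-bit width are chosen for illustration only.

```python
def loop_iterations(k: int) -> int:
    """Iterations of a bitwise loop that starts at the top set bit of k."""
    iterations = 0
    while k:
        iterations += 1
        k >>= 1
    return iterations

def leaked_leading_zeros(k: int, width: int) -> int:
    """Leading zero bits of k that an observer of the loop count learns."""
    return width - loop_iterations(k)

# Hypothetical 16-bit secrets: a shorter loop means more leading zeros.
for secret in (0x8001, 0x00FF, 0x0003):
    print(f"{secret:#06x}: {loop_iterations(secret)} iterations, "
          f"{leaked_leading_zeros(secret, 16)} leading zero bits")
```

An attacker who can count iterations precisely therefore learns which secrets are unusually short.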
▪ Detecting Leakages based on Binary Instrumentation and Mutual Information Analysis
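The mutual information part of such an analysis can be sketched as below. It assumes execution traces have already been collected (for example by instrumenting the binary) and reduced to one label per run; the labels and the helper `mutual_information` are made up for this example and do not reflect the tool's actual interface.

```python
from collections import Counter
from math import log2

def mutual_information(pairs):
    """Mutual information in bits between secrets and observed traces.

    pairs is a list of (secret, trace) samples; probabilities are estimated
    from relative frequencies.
    """
    n = len(pairs)
    joint = Counter(pairs)
    secrets = Counter(s for s, _ in pairs)
    traces = Counter(t for _, t in pairs)
    return sum(
        (count / n) * log2(count * n / (secrets[s] * traces[t]))
        for (s, t), count in joint.items()
    )

# Trace depends on the secret bit: 1 bit of leakage.
leaky = [(0, "path_A"), (1, "path_B")] * 50
# Trace is identical for every secret: no leakage.
constant = [(0, "path_A"), (1, "path_A")] * 50
print(mutual_information(leaky), mutual_information(constant))
```

A result of zero means the traces are independent of the secret in the collected samples; anything above zero quantifies how much an observer of the trace learns.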
▪ FPGAs on the cloud can boost applications
  ▪ Optimized Application-specific Hardware Configuration
  ▪ E.g., Real-time Artificial Intelligence
▪ New Attack Surface:
  ▪ Accelerator Function Units (AFUs) placed on the FPGA can be used to interact with the CPU
  ▪ AFU to AFU Attack
  ▪ AFU to HPS Attack
  ▪ AFU to CPU Attack
  ▪ CPU to AFU Attack
  ▪ Across VMs?
▪ Rowhammer
▪ Trojan Bitstreams
▪ Cache Attacks
▪ Cold Boot
▪ DMA/IOMMU
▪ FPGA-centric Attacks
▪ The field is in its infancy: an Attack/Defense Playground (like Intel SGX in 2015)
▪ Customizable Hardware → More Devastating Attacks
  ▪ E.g., design your own timers, direct access to the memory interface, etc.
▪ Complex Threat Model
▪ Weekly Meeting (2 faculty + 3 students = 5 people actively involved)
▪ Software
  ▪ OPAE Stack
  ▪ Intel Quartus (Synthesis)
  ▪ KVM (Virtualization Scenario)
▪ Hardware
  ▪ Remote Access to Intel Labs (Xeon)
  ▪ Local Server including Intel PAC
  ▪ Heavy Load Workstation (Synthesis)
▪ Threat Modeling of the Technology based on Modern Use Cases
▪ Security Analysis of the Entire Stack Based on Available Resources
▪ Thanks to Carlos Rosaz, Matthias Schunter, Anand Rajan, Evan Custodio from Intel
▪ Questions?
▪ Memory Interface and the Cache Coherency Protocol
▪ Side-channel Analysis of Memory Operations