Microarchitectural Attacks: Protecting Cloud Accelerators
By Ahmad “Daniel” Moghimi, PhD Candidate, Worcester Polytechnic Institute (WPI), @danielmgmi

OUTLINE
▪ Summary of Recent Contributions:
  ▪ Microarchitecture → MemJam
  ▪ Intel SGX → CacheZoom
  ▪ Intel EPID → CacheQuote
  ▪ Speculation → Spoiler
  ▪ Mitigation → MicroWalk
▪ Shared FPGA-CPU Hardware Security
  ▪ Proposal
  ▪ Lab Equipment/Setup
  ▪ Ongoing Work

Microarchitecture (Memory)

μ Arch Attacks: Data Dependency
  add %ebx, %eax
  sub %eax, %edx
  xor %ecx, %ecx
  add %eax, %edi
  sub %ecx, %edi
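The dependencies in this five-instruction example can be listed mechanically. The sketch below is illustrative only: the `(op, src, dst)` tuple encoding and the helper name `raw_dependencies` are assumptions made for this example, and it tracks only read-after-write (RAW) dependencies. It treats `xor %ecx, %ecx` as a zeroing idiom that does not read its operand.

```python
# Illustrative sketch: find read-after-write (RAW) dependencies.
# AT&T syntax "op src, dst" reads src and dst, then writes dst.
# The tuple encoding below is an assumption made for this example.
PROGRAM = [
    ("add", "ebx", "eax"),
    ("sub", "eax", "edx"),
    ("xor", "ecx", "ecx"),
    ("add", "eax", "edi"),
    ("sub", "ecx", "edi"),
]

def raw_dependencies(program):
    """Return (consumer, producer, register) triples, numbered from 1."""
    last_writer = {}  # register -> number of the most recent writer
    deps = []
    for i, (op, src, dst) in enumerate(program, start=1):
        # xor of a register with itself is a zeroing idiom: no real read
        reads = set() if (op == "xor" and src == dst) else {src, dst}
        for reg in sorted(reads):
            if reg in last_writer:
                deps.append((i, last_writer[reg], reg))
        last_writer[dst] = i
    return deps

DEPS = raw_dependencies(PROGRAM)
for consumer, producer, reg in DEPS:
    print(f"instruction {consumer} needs %{reg} from instruction {producer}")
```

Under these assumptions, instructions 2 and 4 wait on `%eax` from instruction 1, and instruction 5 waits on `%ecx` from instruction 3 and `%edi` from instruction 4.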

μ Arch Attacks: Pipelined Memory Exec
  add %ebx, %eax
  sub %eax, %edx
  xor %ecx, %ecx
  add %eax, %edi
  sub %ecx, %edi
Pipeline stages: Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Write Back (WB)
[Figure: animated pipeline diagram in which the five instructions advance through the IF, ID, EX, and WB stages in overlapping cycles]

μ Arch Attacks: 4K Aliasing False Dependency
▪ Memory loads/stores are executed out of order and speculatively
▪ The dependency is verified after the execution!
  mov %eax, (%ebx)    (store)
  mov (%ecx), %edx    (load)
[Figure: the store and the load are both executed; a "Dependent?" check then answers "Yes"]
▪ 4K Aliasing: Addresses that are a multiple of 4 KiB apart (same low 12 bits) are assumed dependent
▪ Re-execute the load and corresponding instructions due to false dependency
▪ Virtual-to-physical address translation → Memory disambiguation
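The aliasing condition can be written as a comparison of the low 12 address bits. This is a minimal sketch, not a model of the real memory disambiguation logic: the function name `aliases_4k` and the example addresses are hypothetical, and comparing all 12 bits is a simplification.

```python
PAGE_OFFSET_MASK = 0xFFF  # low 12 bits: the offset inside a 4 KiB page

def aliases_4k(store_addr, load_addr):
    """True when a load shares its low 12 bits with a store to a different address.

    Simplified assumption: the hardware check is modeled as a plain
    comparison of the page-offset bits.
    """
    same_low_bits = (store_addr & PAGE_OFFSET_MASK) == (load_addr & PAGE_OFFSET_MASK)
    return same_low_bits and store_addr != load_addr

# Hypothetical addresses chosen for illustration
print(aliases_4k(0x12ABC200, 0x7FFE0200))  # same low 12 bits, different pages
print(aliases_4k(0x12ABC200, 0x12ABC204))  # different low 12 bits
```

In this model, a load that follows such a store would be treated as dependent and re-executed, even though the two addresses are unrelated.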

μ Arch Attacks – Hyperthreading 4K Aliasing
[Figure: one core with two hyperthreads, Thread A and Thread B. A column of loads (Load 0xFECD1 through Load 0xFECD8) is labelled "Execute & Time". In the first frame there are no stores; in the following frames the other column repeats Store 0x12ABCDEF, then Store 0x12ABC200, then Store 0x12ABC.]

MemJam

MemJam – Intra Cache Line Resolution
[Figure: address bit layout. Labels: Least 12 bits (Virtual Address = Physical Address); Rest of the bits (Virtual != Physical); L1 Cache Attacks; L2/LLC Cache Attacks]
▪ Conflicted intra-cache line Leakage (4-byte granularity)
▪ Higher access time correlates with memory accesses that share bits 3 to 12 of the address
▪ 4 bits of intra-cache-line leakage
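The bit ranges on this slide can be made concrete with a small address decomposition. The sketch assumes that "bit 3 to 12" counts bits from 1, so it corresponds to zero-based bits 2 to 11, and it assumes 64-byte cache lines. The helper names `split_address` and `may_conflict` and the addresses are hypothetical.

```python
LINE_BITS = 6   # 64-byte cache line (assumption)
PAGE_BITS = 12  # 4 KiB page: low 12 bits are equal in virtual and physical addresses
WORD_BITS = 2   # 4-byte granularity, as stated on the slide

def split_address(addr):
    """Split the low 12 bits of an address into the fields relevant here."""
    page_offset = addr & ((1 << PAGE_BITS) - 1)
    return {
        "l1_set": page_offset >> LINE_BITS,                # zero-based bits 6..11
        "word_in_line": (page_offset >> WORD_BITS) & 0xF,  # zero-based bits 2..5 (4 bits)
        "alias_tag": page_offset >> WORD_BITS,             # zero-based bits 2..11
    }

def may_conflict(addr_a, addr_b):
    """True when two addresses agree on zero-based bits 2..11."""
    return split_address(addr_a)["alias_tag"] == split_address(addr_b)["alias_tag"]

fields = split_address(0x12ABC234)
print(fields["l1_set"], fields["word_in_line"])
```

Under these assumptions, bits 2 to 5 select one of 16 four-byte words inside a cache line, which is where the 4 bits of intra-cache-line leakage come from.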

MemJam Attack
[Figure: a CPU with two cores, each with two hyperthreads (HT). Labels: Execute; Execute Again; a repeated sequence of load and compute steps; Encryption Service]
▪ Higher time if there are more 4K conflicts

Constant time AES – Safe2Encrypt_RIJ128
▪ Scatter-gather implementation of AES
▪ 256-byte S-Box spanning 4 cache lines
▪ Cache independent access pattern
▪ Implemented and distributed as part of Intel products
  ▪ Intel SGX Linux Software Development Kit (SDK)
  ▪ Intel IPP Cryptography Library
[Figure: S-Box lookup. The S-Box occupies 4 cache lines of 64 bytes each; bytes labelled A, B, C, D are gathered into a local buffer]
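A scatter-gather lookup of this kind can be sketched as follows. This is a simplified illustration and not Intel's code: the function name `scatter_gather_lookup`, the `touched` parameter, and the stand-in `TABLE` (a simple permutation, not the real AES S-Box) are assumptions. The point is that every lookup reads one byte from each of the 4 cache lines, so the set of lines touched does not depend on the index.

```python
LINE_SIZE = 64  # bytes per cache line, as on the slide
NUM_LINES = 4   # 256-byte table / 64 bytes per line

# Stand-in 256-entry table (NOT the real AES S-Box), used only for the demo
TABLE = [(7 * i + 3) % 256 for i in range(256)]

def scatter_gather_lookup(table, index, touched=None):
    """Read the same offset in every cache line and keep only the wanted byte."""
    offset = index % LINE_SIZE
    wanted_line = index // LINE_SIZE
    result = 0
    for line in range(NUM_LINES):
        position = line * LINE_SIZE + offset
        if touched is not None:
            touched.append(position)          # record the access for inspection
        value = table[position]               # one read in every cache line
        mask = -(line == wanted_line) & 0xFF  # 0xFF for the wanted line, else 0
        result |= value & mask
    return result

accesses = []
value = scatter_gather_lookup(TABLE, 0x53, accesses)
print(value == TABLE[0x53], [hex(p) for p in accesses])
```

Note that the offset inside each line still depends on the index. That residual, intra-cache-line dependence is the kind of signal the 4-byte-granularity leakage described earlier is aimed at.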

MemJam Attack on Safe2Encrypt_RIJ128
[Figure: the S-Box layout (4 cache lines of 64 bytes each) and the local buffer]

Intel SGX

INTEL SOFTWARE GUARD EXTENSIONS (SGX)
▪ Trusted Execution Environment (TEE)
▪ Enclave: Hardware protected user-level software module
▪ Loaded by the user program
▪ Mapped by the Operating System
▪ Authenticated and Encrypted by CPU
▪ Memory accesses are protected by the hardware

MemJam Attack on SGX

CacheZoom: Controlled Cache Attack on SGX
1. Isolation of the target & victim cache
2. Stabilize the processor frequency
3. Perform the attack on small execution steps by interrupting the victim
4. Measure and filter the remaining noise

CacheZoom: Interrupted Cache Attack
[Figure: L1D cache sets 0 to 63, with the victim's program counter (PC) advancing between frames]
Step 1: Attacker primes all the L1D sets
Step 2: Victim executes some code
Step 3: Attacker interrupts the execution pipeline
Step 4: Attacker probes the access times → Go to step 1
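The four steps can be sketched as a toy prime-and-probe simulation. Everything here is a software model under stated assumptions: the class `ToyL1D`, the 8-way associativity, the LRU replacement policy, and the victim addresses are hypothetical, and hits and misses stand in for the access times a real attacker would measure.

```python
NUM_SETS = 64  # L1D sets 0..63, as on the slide
WAYS = 8       # associativity: an assumption for this sketch
LINE = 64      # bytes per cache line (assumption)

class ToyL1D:
    """Toy set-associative cache with LRU replacement (a model, not real hardware)."""
    def __init__(self):
        self.sets = [[] for _ in range(NUM_SETS)]  # each list is kept in LRU order

    def access(self, owner, addr):
        """Touch a line; return True on a hit and False on a miss."""
        tag = (owner, addr // LINE)
        lines = self.sets[(addr // LINE) % NUM_SETS]
        hit = tag in lines
        if hit:
            lines.remove(tag)
        elif len(lines) == WAYS:
            lines.pop(0)  # evict the least recently used line
        lines.append(tag)
        return hit

def attacker_addresses(set_index):
    """One attacker address per way, all mapping to the given set."""
    return [(way * NUM_SETS + set_index) * LINE for way in range(WAYS)]

def prime(cache):
    for s in range(NUM_SETS):
        for addr in attacker_addresses(s):
            cache.access("attacker", addr)

def probe(cache):
    """Return the sets in which the attacker observes at least one miss."""
    touched = []
    for s in range(NUM_SETS):
        misses = [not cache.access("attacker", a) for a in attacker_addresses(s)]
        if any(misses):
            touched.append(s)
    return touched

def one_round(cache, victim_step):
    prime(cache)                      # Step 1: prime all L1D sets
    for addr in victim_step:          # Step 2: victim runs a short step
        cache.access("victim", addr)  # Step 3: the interrupt ends the step here
    return probe(cache)               # Step 4: probe, then repeat from Step 1

cache = ToyL1D()
print(one_round(cache, [5 * LINE, 4096 + 60 * LINE]))
```

Keeping each victim step short is what makes the probe result attributable to only a few memory accesses.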

CacheZoom: Interrupted Cache Attack

CacheQuote

CacheQuote Attack
▪ Quoting Enclave:
  ▪ Built-in enclave by Intel that implements the EPID signature scheme
  ▪ Attests the integrity of user-provided enclaves
▪ EPID implementation was not constant-time

CacheQuote Attack
▪ Loop iteration leaks Leading Zero Bits
▪ CacheZoom to accurately measure
▪ Feed the short vectors to a lattice and
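The first bullet can be illustrated with a generic loop whose iteration count depends on the bit length of a secret. This is a stand-in and not the EPID code: the function names, the 256-bit width, and the example value are assumptions. It covers only the leakage of leading zero bits, not the lattice step.

```python
WIDTH = 256  # bit width of the secret value: an assumption for this sketch

def variable_time_loop(secret):
    """Stand-in for a non-constant-time loop: one iteration per significant bit."""
    iterations = 0
    while secret:
        secret >>= 1
        iterations += 1
    return iterations

def leading_zero_bits(iterations, width=WIDTH):
    """Leading zero bits implied by an observed iteration count."""
    return width - iterations

observed = variable_time_loop(1 << 250)  # hypothetical secret with a short bit length
print(leading_zero_bits(observed))
```

An attacker who can count the iterations, for example with a fine-grained cache measurement, learns how many of the top bits of the secret are zero.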

Memory Speculation

Speculative Memory Accesses

Spoiler on Spoiler Attack

MicroWalk: Finding μ Arch Sources in Binaries
▪ Detecting Leakages based on Binary Instrumentation and Mutual Information Analysis
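Mutual information analysis can be sketched on toy data. This is not MicroWalk's implementation: the function name `mutual_information`, the sample format, and the trace labels are assumptions. It estimates I(X;Y) in bits from pairs of a secret input and the trace observed for it.

```python
from collections import Counter
from math import log2

def mutual_information(pairs):
    """Estimate I(secret; trace) in bits from (secret_input, observed_trace) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    secrets = Counter(x for x, _ in pairs)
    traces = Counter(y for _, y in pairs)
    # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
    return sum(
        (c / n) * log2((c * n) / (secrets[x] * traces[y]))
        for (x, y), c in joint.items()
    )

# Hypothetical samples: traces that depend on the secret vs. traces that do not
leaky = [(s, f"trace-{s}") for s in range(4)]
constant = [(s, "same-trace") for s in range(4)]
print(mutual_information(leaky), mutual_information(constant))
```

A result of zero means the observed traces carry no information about the secret in these samples, while a higher value flags the instrumented code as a candidate leak.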

Accelerators in the Cloud

Side-channel Threats: Shared FPGA-CPU Platforms
▪ FPGAs on the cloud can boost applications
  ▪ Optimized Application-specific Hardware Configuration
  ▪ e.g. Real-time Artificial Intelligence
▪ New Attack Surface:
  ▪ Accelerator Function Units (AFUs) placed on the FPGA can be used to interact with the CPU or other AFUs for malicious purposes.
  ▪ AFU to AFU Attack
  ▪ AFU to HPS Attack
  ▪ AFU to CPU Attack
  ▪ CPU to AFU Attack
  ▪ Across VMs?

Shared FPGA-CPU Platforms

Attack Vectors
▪ Rowhammer
▪ DMA/IOMMU
▪ Cache Attacks
▪ Trojan Bitstreams
▪ FPGA-centric Attacks
▪ Cold Boot

What is interesting about FPGA-CPU in the Cloud?
▪ Infancy, Attack/Defense Playground (Intel SGX in 2015)
▪ Customizable Hardware → More Devastating Attacks
  ▪ E.g. Design your own timers, Direct access to memory interface, etc.
▪ Complex Threat Model
