Spectre and Meltdown – Clifford Wolf – q/Talk 2018-01-30
Spectre and Meltdown
- Spectre (CVE-2017-5753 and CVE-2017-5715)
– Is an architectural security bug that affects most modern processors with
speculative execution
– It allows a program to read memory locations in its address space without
ever architecturally accessing those locations.
– This is a problem for code running in “sandbox environments”, such as a
web browser executing JavaScript code: the JavaScript code can access all data in the browser's memory, such as login credentials for webpages.
- Meltdown (CVE-2017-5754)
– Is a related hardware vulnerability in some Intel x86, some IBM POWER, and
some ARM processors.
– It allows an unprivileged process to read all memory in the system.
But how does it work?
- To answer this question we must first discuss some implementation details of
modern speculating superscalar out-of-order processors.
– Scalar – execute one instruction per cycle
– Superscalar – execute >1 instruction per cycle
– Out-of-order – execute instructions in a different order than they appear in the program code
– Speculative execution – instead of waiting for the result of a computation, guess the result
and keep executing. Roll back if the guess turned out to be incorrect. This helps avoid pipeline stalls in cases where it's possible to make good guesses.
– The types of speculative execution important for understanding Spectre/Meltdown:
- Branch prediction – guess if a branch is taken or not
- Branch target prediction – guess the target of a dynamic jump
- Trap optimism – always guess that instructions will not cause traps
– When a guess turns out to be wrong, we need to roll back the entire CPU state so that it looks to the
software as if no code had been executed speculatively.
What is pipelining?
- The more work we do in one cycle, the slower our circuit gets.
[Figure: a slow circuit with four tasks of combinational logic between two flip-flops, vs. the same logic split into four pipeline stages separated by flip-flops, sharing data and clock lines]
- But we want high clock rates for CPUs! The pipelined circuit works with a 4x faster clock rate: decode → load → exec → store
Program: A: r1 ⊙ r2 → r3, B: r4 ⊙ r5 → r6, C: r7 ⊙ r8 → r9, D: r9 ⊙ r1 → r1, E: r10 ⊙ r11 → r12, F: r13 ⊙ r14 → r15, G: r16 ⊙ r17 → r18
[Figure: pipeline timing diagram — instructions A..G advancing through stages 1–4 over time, one new instruction entering the pipeline per cycle]
In-order pipeline stalls
D (r9 ⊙ r1 → r1) has to wait for C (which produces r9) => the pipeline stalls
[Figure: pipeline timing diagrams for the same program A..G — in-order: the pipeline bubbles while D waits for C; out-of-order: E, F, G are issued ahead of D to fill the bubbles]
Out-of-order execution to the rescue!
E, F, G are executed out-of-order to improve system performance. But what if D traps?
Out-of-order execution in modern CPUs
- Some instructions (such as memory loads) can stall for >100 cycles. We need very deep
out-of-order execution to hide this latency.
– Without speculative execution it would be impossible to keep the processor busy for so many cycles.
There is no way around speculative execution for modern high-speed processors.
- We need many more physical registers than are available in the ISA to remember previous
states (scoreboarding isn't sufficient → register renaming, Tomasulo's algorithm)
– We need previous states for rollback when instructions trap or branch prediction is wrong.
– And we need more registers because the dynamic instruction order may have significantly higher
register pressure than the original instruction order.
- But there is more to the processor state than just general purpose registers. Clean rollback is
incredibly hard!
– For memory writes there is a store buffer that holds the pending writes during speculative execution.
– CPU flags may be stored in shadow registers for each checkpoint we might need to roll back to.
– But there is no mechanism to roll back the state of the CPU caches.
- Caches are just a performance optimization, so it can’t hurt if information from speculative
execution can be recovered from cache timings ... right? Unfortunately this is wrong.
What is a CPU cache?
- Caches are local memories close to the CPU that have much faster access times than main
memory.
– Addresses “in the cache” can be accessed quickly
– The first access to an address moves that memory location into the cache
– Addresses that haven't been accessed in a while are evicted from the cache
– The granularity of this is aligned cache lines of usually 64 bytes each.
- There are special instructions to flush the CPU caches (such as clflush on x86).
- Even without those commands we can access memory in a way that guarantees that all
cache lines of interest are evicted from the cache. (By accessing other memory locations that are mapped to the same cache slot.)
- By measuring the access time to a memory location we can measure if that location is in the
cache or not.
- This allows us to detect which memory locations the CPU has accessed recently.
Spectre Variant 1 – CVE 2017-5753 (bounds check bypass, simplified explanation)
- Consider something like the following code:
- peek(128) will return 1 if the least significant bit of
protected_data[0] is set.
- We have effectively bypassed the (i < 128) bounds check.
uint8_t unprotected_data[128];
uint8_t protected_data[1];

int peek(int i)
{
    flush_or_evict_caches();
    if (slow_predicted_true(i < 128)) {
        int a = unprotected_data[i];
        int b = unprotected_data[64*(a&1)];
        return b;
    }
    return is_in_cache(&unprotected_data[64]);
}
Spectre Variant 2 – CVE 2017-5715 (branch target injection)
- Variant 1 relies on tricking the branch predictor into making an incorrect guess
on whether a branch is taken or not.
- But processors can also branch to dynamic locations:
– x86: jmp eax; jmp [eax]; ret; jmp dword ptr [0x12345678]
– ARM: MOV pc, r14
– MIPS: jr $ra
– RISC-V: jalr x0, x1, 0
- Spectre Variant 2 tricks the branch predictor into incorrectly guessing the
destination of such dynamic jumps.
- This can be used to speculatively execute arbitrary code gadgets, similar to
return-oriented-programming (ROP).
- The leaked data is then exfiltrated using the cache side channel.
Spectre and JIT Sandboxes
- Spectre only allows a process to read its own memory. So you might ask:
“What is the problem?”
- The problem is JIT sandboxes, where we run JIT-compiled untrusted code in our
process, assuming the bounds checks added by the JIT compiler will prevent the code from reading data it should not have access to.
– For example: A website running JavaScript code in your browser might access
security credentials or other private data in your browser memory.
- But that means the JavaScript code must be tailored to a JIT compiler to
yield the correct malicious machine code.
– For example you can't simply flush the CPU caches. Instead you must execute memory
access patterns that will evict the relevant cache lines from the cache.
– That's why it said “simplified explanation” on the slide for Variant 1.
– The Spectre paper contains a JavaScript code snippet that demonstrates such an
attack using the V8 JavaScript engine.
Meltdown – CVE 2017-5754
- The Meltdown attack exploits a privilege escalation vulnerability specific to
some processors:
– At least sometimes, Intel processors don’t check memory protection during speculative
execution.
– Instead memory protection is checked after the fact when instructions are committed.
But at that point we already exfiltrated data using the cache side channel.
– By adding a trapping instruction before the access to privileged memory we prevent
the access from ever being committed. So “it never happened” and no access violation is reported to the OS. But the data that was read can still be reconstructed from the cache state.
- Every Intel x86 / x86_64 processor since 1995 is affected
– The only exceptions, as far as I know, are Intel Atom processors from before 2013
- AMD x86 / x86_64 processors are not affected by Meltdown
- Very few ARM processors are affected, for example the ARM Cortex-A75
- IBM POWER and System Z are also affected by variants of Meltdown
Meltdown Mitigations
- Short-term mitigation for existing processors:
– Flush TLBs when leaving kernel code
– This prevents speculative access to kernel memory
– But it also adds a performance penalty that can be significant for some workloads,
especially on processors that do not support selective TLB flushing (most Intel processors before Haswell).
- Long-term fix:
– Better isolation of kernel and user-land page tables
– Probably at the cost of not allowing speculative execution into kernel code (such as
system calls)
- In my opinion there is no doubt that Meltdown is a hardware bug that needs to
be addressed in future hardware generations.
- But Intel says its processors “work as designed” and calls the mitigation a “security
feature” instead of a “bug fix”.
Spectre Mitigations
- In my opinion it is yet unclear to what extent we need to change our processors and to what extent
we need to change the model of what a processor does that we use when writing software.
- Possible mitigations without software changes include:
– Do not perform any speculative execution. For example your phone most likely has a processor that does not even
perform out-of-order execution.
– Do not speculatively load or evict any cache lines. That would slow down the processor. This slowdown would be
significant on a system where access to main memory is pretty slow (such as a modern PC).
– Add special hidden cache slots used for speculative execution. This would allow rollback to also correctly restore
the cache state, eliminating the cache side-channel.
- Possible mitigations that require software changes and some kind of hardware support and/or
compiler support:
– Use special code sequences for bounds checking that make sure we never speculatively execute memory
accesses that are out-of-bounds.
– Use special code sequences for dynamic jumps that eliminate branch speculation (or always speculate with a safe
branch target).
- Possible software-only mitigations:
– Never run JIT code with sensitive data mapped into the process address space. For example, run JIT code in a
separate process and use explicit IPC to exchange data between it and the main program.
This is just the tip of the iceberg!
- Spectre and Meltdown are just the beginning.
- We need to fundamentally rethink the way we design complex computer systems.
– Formal modeling of all information flow – With regard to side channels in general: What is SW responsibility and what is HW/OS
responsibility? We need to rethink our models for HW that we use for writing SW.
- Other variations of Spectre:
– Scenarios where a victim process only executes the speculative code and the data is then
extracted from the side-channel in another process.
- This would still be an issue even if rollback were perfect, because the other process could monitor the usage
of shared resources in real-time while the speculative execution is happening.
- A variation on this would be using multiple threads on a hyper-threaded CPU. This would enable
attacks on systems that don't speculatively load data into the L1 cache: measure whether the other thread gets scheduled as a result of the victim thread being stalled, and use loads of data not in the L1 cache to signal the “result” of the speculative execution.
– Instead of L1 cache timings an attacker could monitor any other part of the system that gets
utilized during speculative execution, such as the L2/L3 caches, main memory throughput, utilization of compute resources shared between cores (such as FPUs), power consumption, etc.
References
- Spectre and Meltdown papers and additional information:
https://meltdownattack.com/
- Link to this presentation:
http://clifford.at/papers/2018/spectre/