Nemesis: Studying Microarchitectural Timing Leaks in Rudimentary CPU Interrupt Logic


slide-1
SLIDE 1

Nemesis: Studying Microarchitectural Timing Leaks in Rudimentary CPU Interrupt Logic

Jo Van Bulck Frank Piessens Raoul Strackx

imec-DistriNet, KU Leuven

ACM CCS, October 2018

slide-2
SLIDE 2
slide-3
SLIDE 3

Microarchitectural side-channels and where to find them

CPU cache, branch prediction, address translation

1 / 14

slide-6
SLIDE 6

Microarchitectural side-channels and where to find them

Intel response [Int18]: “This is not a bug or a flaw ... [side-channels] can’t be eliminated”

⇒ Systematically study microarchitectural leakage

1 / 14

slide-8
SLIDE 8

Nemesis: Studying rudimentary CPU interrupt logic

Overview
⇒ Interrupts leak instruction execution times
⇒ Determine control flow in enclave programs

Research contributions
⇒ (First) remote µ-arch attack on embedded CPUs
⇒ Understanding CPU pipeline leakage (~Meltdown)

2 / 14

slide-11
SLIDE 11

Back to basics: Fetch decode execute

Fetch instruction → Decode → Execute → Interrupt

Interrupts delayed till instruction retirement

3 / 14

slide-12
SLIDE 12

Wait a cycle: Interrupt latency as a side-channel

[Timing diagram: CLK, CMD, and IRQ signals for two cases, an interrupt arriving during NOP and during ADD; in each case the IRQ logic runs before the ISR]

4 / 14
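
The timing idea on this slide can be sketched as a toy model. This is not the Sancus or SGX hardware: the cycle counts in `CYCLES` are illustrative assumptions, and `irq_latency` is a hypothetical helper. It only encodes the rule from the previous slide, that an interrupt is serviced once the current instruction retires, so the observed latency equals the cycles the interrupted instruction still needs.

```python
# Toy model: an interrupt is serviced only after the current instruction retires.
# Cycle counts are illustrative assumptions, not measured values.
CYCLES = {"nop": 1, "add": 2}


def irq_latency(instruction: str, arrival_cycle: int) -> int:
    """Cycles between IRQ arrival and retirement of the interrupted instruction.

    arrival_cycle is the 0-based cycle within the instruction at which
    the interrupt request arrives.
    """
    total = CYCLES[instruction]
    if not 0 <= arrival_cycle < total:
        raise ValueError("IRQ must arrive while the instruction executes")
    return total - arrival_cycle


# An IRQ arriving in the first cycle of each instruction sees a different delay.
for name in CYCLES:
    print(name, irq_latency(name, 0))
```

Under these assumptions, an attacker who always fires the interrupt in the first cycle can tell the two instructions apart from the latency alone.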

slide-14
SLIDE 14

Enclaved execution adversary model

[Diagram: Mem, HDD, OS kernel, Hypervisor, TPM, CPU, App, App, Enclave app; components marked Trusted or Untrusted]

Intel SGX promise: hardware-level isolation and attestation

5 / 14

slide-15
SLIDE 15

Enclaved execution adversary model

[Diagram: Mem, HDD, OS kernel, Hypervisor, TPM, CPU, App, App, Enclave app; components marked Trusted or Untrusted]

Untrusted OS → new class of powerful side-channels

5 / 14

slide-17
SLIDE 17

Sancus: Open source trusted computing for the IoT

Embedded enclaved execution:
  • ISA extensions for isolation & attestation
  • Save + clear CPU state on enclave interrupt

Extremely low-end processor (openMSP430):
  • Area: ≤ 2 kLUTs
  • Deterministic execution: no pipeline/cache/MMU/...
  • No known microarchitectural side-channels (!)

Noorman et al. “Sancus 2.0: A Low-Cost Security Architecture for IoT devices”, TOPS 2017 [NVBM+17]
https://github.com/sancus-pma and https://distrinet.cs.kuleuven.be/software/sancus/

6 / 14

slide-18
SLIDE 18

Secure input-output with Sancus enclaves

Driver enclave: Exclusive access to memory-mapped I/O device

Van Bulck et al. “VulCAN: Vehicular component authentication and software isolation”, ACSAC 2017 [VBMP17]

7 / 14

slide-19
SLIDE 19

Secure input-output with Sancus enclaves

Driver enclave: 16-bit vector indicates which keys are down 0100000000000000

traverse bits

PIN code enclave

7 / 14

slide-20
SLIDE 20

Secure input-output with Sancus enclaves

Attacker: Interrupt conditional control flow to infer secret PIN

Key 'B' was pressed!

0100000000000000

traverse bits

IRQ PIN code enclave

7 / 14

slide-21
SLIDE 21

Sancus IRQ timing attack: Inferring key strokes

[Plot: IRQ latency vs. instruction (interrupt number)]

Enclave x-ray: Start-to-end trace enclaved execution

8 / 14

slide-22
SLIDE 22

Sancus IRQ timing attack: Inferring key strokes

[Plot: IRQ latency vs. instruction (interrupt number)]

Enclave x-ray: Keymap bit traversal (ground truth)

8 / 14

slide-23
SLIDE 23

Sancus IRQ timing attack: Inferring key strokes

[Plot: IRQ latency (cycles, axis ticks 1 to 4) vs. instruction (interrupt number); annotated regions: 0 (no press), 1 (key pressed), 0 (no press)]

8 / 14
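
Reading the key bits off such a trace can be sketched as follows. This is a simplified illustration, not the attack code from the paper: it assumes the attacker has already reduced the trace to one latency sample per loop iteration, and that the "key pressed" branch shows a distinct, known latency (`pressed_latency`). Both the helper name and the cycle values are hypothetical.

```python
def infer_key_bits(latencies, pressed_latency):
    """Return one bit per loop iteration: 1 if the sample matches the
    latency assumed for the 'key pressed' branch, else 0."""
    return [1 if lat == pressed_latency else 0 for lat in latencies]


# Hypothetical trace: 16 samples, one per keymap bit (cycle values are made up).
trace = [1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
bits = infer_key_bits(trace, pressed_latency=4)
print("".join(map(str, bits)))
```

Exact matching is only reasonable here because the Sancus core executes deterministically; a noisy platform would need a statistical classifier instead.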

slide-24
SLIDE 24

Interrupting and resuming Intel SGX enclaves

Challenge: x86 execution time prediction (timer)

9 / 14

slide-25
SLIDE 25

Interrupting and resuming Intel SGX enclaves

SGX-Step: user space APIC timer + IRQ handling

SGX-Step user space

Van Bulck et al. “SGX-Step: A practical attack framework for precise enclave execution control”, SysTEX 2017 [VBPS17]
https://github.com/jovanbulck/sgx-step

9 / 14

slide-26
SLIDE 26

Microbenchmarks: Measuring x86 instruction latencies

Latency distribution: 10,000 samples from benchmark enclave

[Histogram: frequency vs. IRQ latency (cycles) for nop, add, rdrand, fscale, lfence]

10 / 14

slide-27
SLIDE 27

Microbenchmarks: Measuring x86 instruction latencies

Timing leak: reconstruct instruction latency class

[Histogram: frequency vs. IRQ latency (cycles) for nop, add, rdrand, fscale, lfence]

10 / 14
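
One way to turn noisy x86 latency samples into a latency class is a nearest-median rule, sketched below. This is an assumed post-processing step for illustration, not the method from the paper; `classify` is a hypothetical helper, and the reference medians are made-up numbers, not measurements from the benchmark enclave.

```python
from statistics import median


def classify(samples, reference_medians):
    """Assign the instruction whose reference median is closest to the
    median of the observed latency samples."""
    observed = median(samples)
    return min(
        reference_medians,
        key=lambda name: abs(reference_medians[name] - observed),
    )


# Hypothetical reference medians in cycles (made-up numbers for illustration).
reference = {"nop": 7600, "add": 7610, "rdrand": 8100}

# Repeated samples of one unknown instruction, including one outlier.
print(classify([8090, 8120, 8105, 7990, 8110], reference))
```

Using the median rather than the mean keeps a single outlier from shifting the estimate; instructions whose distributions overlap would still be indistinguishable with this rule.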

slide-28
SLIDE 28

Microbenchmarks: Measuring x86 cache misses

Timing leak: reconstruct micro-architectural cache state

[Histogram: frequency vs. IRQ latency (cycles) for a load with cache hit and a load with cache miss]

11 / 14

slide-29
SLIDE 29

Microbenchmarks: Measuring x86 cache misses

Timing leak: many more → see paper!

[Histogram: frequency vs. IRQ latency (cycles) for a load with cache hit and a load with cache miss]

11 / 14

slide-30
SLIDE 30

Single-stepping SGX enclaves in practice

Enclave x-ray: Start-to-end trace enclaved execution

[Plot: IRQ latency (cycles) vs. instruction (interrupt number)]

12 / 14

slide-31
SLIDE 31

Single-stepping SGX enclaves in practice

Enclave x-ray: Spotting high-latency instructions

[Plot: IRQ latency (cycles) vs. instruction (interrupt number); annotation: rdrand (generate stack canary on enclave entry)]

12 / 14

slide-32
SLIDE 32

Single-stepping SGX enclaves in practice

Enclave x-ray: Zooming in on bsearch function

[Plot: IRQ latency (cycles) vs. instruction (interrupt number)]

12 / 14

slide-33
SLIDE 33

De-anonymizing enclave lookups

Binary search: Find 40 in {20, 30, 40, 50, 80, 90, 100}

13 / 14

slide-34
SLIDE 34

De-anonymizing enclave lookups

Adversary: Infer secret lookup in known array

[Diagram labels: left, right, hit]

13 / 14

slide-35
SLIDE 35

De-anonymizing enclave lookups

Goal: Infer lookup → reconstruct bsearch control flow

[Plot: IRQ latency (cycles, roughly 7800 to 7950) vs. interrupt (instruction number)]

13 / 14

slide-36
SLIDE 36

De-anonymizing enclave lookups

Goal: Infer lookup → reconstruct bsearch control flow

[Plot: IRQ latency (cycles, roughly 7800 to 7950) vs. interrupt (instruction number); series: Left, Right, Hit]

13 / 14

slide-37
SLIDE 37

De-anonymizing enclave lookups

⇒ Sample instruction latencies in secret-dependent path

[Plot: IRQ latency (cycles, roughly 7800 to 7950) vs. interrupt (instruction number); annotated patterns: HLLL, LLHL, HHHH]

13 / 14
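
Once each iteration has been labeled left, right, or hit, recovering the looked-up element is a replay of the search on the known array. The sketch below assumes a conventional half-open binary search (midpoint `(lo + hi) // 2`, "left" shrinks the upper bound, "right" raises the lower bound); the enclave's actual `bsearch` may compute its midpoint differently, and `reconstruct_lookup` is a hypothetical helper.

```python
def reconstruct_lookup(array, trace):
    """Replay a binary search over a known sorted array using the observed
    per-iteration decisions ('left', 'right', 'hit') and return the element
    that was found, or None if the trace ends without a hit."""
    lo, hi = 0, len(array)
    for step in trace:
        if lo >= hi:
            raise ValueError("trace is longer than the search could run")
        mid = (lo + hi) // 2
        if step == "hit":
            return array[mid]
        if step == "left":
            hi = mid
        elif step == "right":
            lo = mid + 1
        else:
            raise ValueError(f"unknown step: {step!r}")
    return None


known_array = [20, 30, 40, 50, 80, 90, 100]
print(reconstruct_lookup(known_array, ["left", "right", "hit"]))
```

For the example array from the earlier slide, the decisions left (at 50), right (at 30), hit (at 40) identify the secret lookup as 40.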

slide-39
SLIDE 39

Conclusions

Nemesis contributions ⇒ Understanding CPU interrupt leakage ⇒ (First) embedded + high-end µ-arch channel

https://github.com/jovanbulck/nemesis

14 / 14

slide-40
SLIDE 40

References I

  • Intel Corporation. Resources and response to side channel variants 1, 2, 3. intel.com/content/www/us/en/architecture-and-technology/side-channel-variants-1-2-3.html, 2018.

  • S. Lee, M.-W. Shih, P. Gera, T. Kim, H. Kim, and M. Peinado. Inferring fine-grained control flow inside SGX enclaves with branch shadowing. In Proceedings of the 26th USENIX Security Symposium. USENIX Association, 2017.

  • J. Noorman, J. T. Mühlberg, and F. Piessens. Authentic execution of distributed event-driven applications with a small TCB. In 13th International Workshop on Security and Trust Management (STM’17), vol. 10547 of LNCS, pp. 55–71, Heidelberg, 2017. Springer.

  • J. Noorman, J. Van Bulck, J. T. Mühlberg, F. Piessens, P. Maene, B. Preneel, I. Verbauwhede, J. Götzfried, T. Müller, and F. Freiling. Sancus 2.0: A low-cost security architecture for IoT devices. ACM Transactions on Privacy and Security (TOPS), 2017.

  • J. Van Bulck, J. T. Mühlberg, and F. Piessens. VulCAN: Efficient component authentication and software isolation for automotive control networks. In Proceedings of the 33rd Annual Computer Security Applications Conference (ACSAC’17). ACM, 2017.

  • J. Van Bulck, J. Noorman, J. T. Mühlberg, and F. Piessens. Towards availability and real-time guarantees for protected module architectures. In Companion Proceedings of the 15th International Conference on Modularity (MASS’16), pp. 146–151. ACM, 2016.

  • J. Van Bulck, F. Piessens, and R. Strackx. SGX-Step: A practical attack framework for precise enclave execution control. In Proceedings of the 2nd Workshop on System Software for Trusted Execution, SysTEX’17, pp. 4:1–4:6. ACM, 2017.

15 / 14

slide-41
SLIDE 41

Appendix: Interrupting and resuming SGX enclaves

16 / 14

slide-42
SLIDE 42

Appendix: Sancus keypad application scenario

[Diagram labels: MMIO, SM_driver, SM_secure, (asm), Timer_A, MSP430 core, INTERRUPT]

    while (poll_keypad())

    function poll_keypad:
        key_state = read_key_state()
        for i = 0 to 15 do
            if key_state & (0x1 << i) then
                secret_pin.add(keymap[i])
            end if
        end for

17 / 14
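
The polling loop above can be written as a small runnable sketch. The key layout in `KEYMAP`, the bit order, and the idea of passing `key_state` in as an argument (instead of reading it from the memory-mapped device) are assumptions made for illustration; the slide does not specify them, nor what `poll_keypad` returns to the outer `while` loop.

```python
# Hypothetical 16-key layout: bit i of the state vector maps to KEYMAP[i].
KEYMAP = "0123456789ABCDEF"


def poll_keypad(key_state, secret_pin, keymap=KEYMAP):
    """Append the key for every set bit in the 16-bit key_state vector.

    The `if` below is the secret-dependent branch: extra instructions run
    only for pressed keys, which is what the interrupt latency reveals.
    """
    for i in range(16):
        if key_state & (0x1 << i):
            secret_pin.append(keymap[i])


pin = []
poll_keypad(1 << 11, pin)  # with this assumed layout, bit 11 is key 'B'
print(pin)
```

A constant-time variant would have to execute the same instruction sequence for pressed and unpressed bits, so that the per-iteration latencies no longer differ.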

slide-43
SLIDE 43

Appendix: Measuring x86 data dependencies

Division: execution time ≈ dividend significant bits

18 / 14

slide-44
SLIDE 44

Appendix: Measuring x86 page table walks

TLB miss: flush unprotected page table entries

19 / 14

slide-45
SLIDE 45

Appendix: Measuring x86 cache misses

20 / 14

slide-46
SLIDE 46

Appendix: Boxplot binary search distribution

⇒ 100 bsearch runs: left (blue), right (green), hit (red)

[Boxplot: IRQ latency (cycles, axis from 7700 to 8200) per instruction (steps 1 to 10) of the bsearch paths; instruction labels include cmp, jg, jne, je, add, sub, mov, test, lea, pop, shr, imul, retq]

21 / 14

slide-47
SLIDE 47

Appendix: Boxplot Zigzagger distribution

⇒ 100 zigzag runs: branch taken (blue), not-taken (red)

[Boxplot: IRQ latency (cycles, axis from 7700 to 8200) per instruction of the Zigzagger code; instruction labels include nop, lea, jmp, jmpq, cmp, cmove; code labels include zz2, zz3, zz4, b1, b2, b1.j, b2.j, b3.j; the secret-dependent code path is marked]

22 / 14