Inferring Fine-grained Control Flow Inside SGX Enclaves with Branch Shadowing




SLIDE 1

Inferring Fine-grained Control Flow Inside SGX Enclaves with Branch Shadowing

Sangho Lee, Ming-Wei Shih, Prasun Gera, Taesoo Kim, Hyesoon Kim, Marcus Peinado

26th USENIX Security Symposium, August 17, 2017

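SLIDE 3

Intel Software Guard Extension (SGX)

[Diagram: a user process contains a trusted enclave. Sensitive operations run inside the enclave and normal operations run outside it; the two sides communicate through ECALL and OCALL/return. Access to the enclave from system software (OS, hypervisor, …) is prohibited. Enclave data is decrypted inside the CPU (cache) and encrypted when it leaves, so cold-boot attacks do not work.]

Q: What about side-channel attacks?
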
SLIDE 4

Side-channel attacks against Intel SGX are getting attention

Monitor page-fault or page-access sequences (Oakland15, ASIACCS16, Security17)

  • Noise-free, but coarse-grained (page address)

Measure cache hit/miss timing (EuroSec17, DIMVA17, ATC17, WOOT17)

  • Fine-grained (cache line), but noisy


SLIDE 6

Page-fault side channel (Oakland15)

Unmap all pages and monitor the page-fault sequence. In this example each function resides on its own page: Page 1 holds is_member(), Page 2 holds welcome(), and Page 3 holds bye().

    if (is_member(person)) {
        welcome();
    } else {
        bye();
    }

  • Page 1 -> Page 2: A member
  • Page 1 -> Page 3: Not a member

Does not work when a sensitive control-flow change occurs within the same page (or cache line).
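
The inference on this slide can be sketched as a small simulation. The page layout in `PAGE_OF` mirrors the slide, while the helpers `victim_trace` and `infer_membership` are illustrative names, not code from the attack. The sketch only models the fact that the observer sees page numbers and nothing finer.

```python
# Toy model of the page-fault side channel: the observer sees only the
# sequence of faulting pages, never the instructions executed inside a page.
# The layout mirrors the slide; the helper names are hypothetical.
PAGE_OF = {"is_member": 1, "welcome": 2, "bye": 3}


def victim_trace(person_is_member, layout=PAGE_OF):
    """Return the page-fault sequence produced by the if/else on the slide."""
    callee = "welcome" if person_is_member else "bye"
    return [layout["is_member"], layout[callee]]


def infer_membership(trace, layout=PAGE_OF):
    """Map an observed page sequence back to the branch outcome, if possible."""
    if layout["welcome"] == layout["bye"]:
        return None  # both targets share a page: the sequence is ambiguous
    if trace == [layout["is_member"], layout["welcome"]]:
        return True
    if trace == [layout["is_member"], layout["bye"]]:
        return False
    return None


# Hypothetical layout where both callees land on the same page.
SAME_PAGE = {"is_member": 1, "welcome": 2, "bye": 2}

print(infer_membership(victim_trace(True)))
print(infer_membership(victim_trace(False)))
print(infer_membership(victim_trace(True, SAME_PAGE), SAME_PAGE))
```

With the `SAME_PAGE` layout the function returns `None`, which is the limitation stated above: once both outcomes touch the same page, the page sequence carries no information about the branch.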

SLIDE 7

Branch shadowing: A fine-grained side-channel attack against Intel SGX

  • Can attack each branch instruction
      • Neither page nor cache-line granularity
  • Deterministically identify branch history
      • Either taken or not taken
      • Not about timing difference
  • Achieve high attack success rate
      • Recover 66% of a 1024-bit RSA private key from a single run

SLIDE 8

Observation: SGX does not clear branch history!

The CPU caches how each branch instruction has executed for later prediction, even for branches inside SGX enclaves.

  • Either taken or not taken, as well as its target address

Does an attacker have a reliable way to extract branch history from SGX?

SLIDE 9

Performance monitoring unit (PMU) is prohibited

  • PMU features that profile branch history
      • Last branch record (LBR) and processor trace (PT)
      • Prediction results (success/failure), target address, …
  • Anti side-channel inference (ASCI)
      • SGX doesn’t publish hardware performance events to PMUs.
  • A malicious OS cannot directly use PMUs to get SGX’s branch history.

SLIDE 10

Branch collision timing attack works for SGX but has limitations

A mispredicted branch takes longer than a correctly predicted branch, because a misprediction forces the CPU to roll back and re-execute.

  • But we cannot directly time a target branch inside SGX.

    if (is_member(p)) { … } else { … }

SLIDE 11

Branch collision timing attack works for SGX but has limitations

Colliding branches affect each other’s prediction (MICRO16).

  • e.g., if a branch has been taken, the CPU will predict that other colliding branches will also be taken.

[Diagram: branch prediction entries are indexed by ADDR[31:0] and store taken/not-taken and the target address. The branch instructions at 0xff12345678 and 0xffc12345678 have colliding addresses because the CPU truncates the higher bits to reduce storage overhead.]
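
The collision condition can be checked with a few lines of arithmetic. This is a minimal sketch that assumes, as the slide states, that only ADDR[31:0] identifies a branch; real predictor indexing is not fully documented, and the names `predictor_key` and `collides` are hypothetical.

```python
# Assumption taken from the slide: only ADDR[31:0] identifies a branch.
# Real predictor indexing is not publicly documented in full.
INDEX_BITS = 32
INDEX_MASK = (1 << INDEX_BITS) - 1


def predictor_key(addr):
    """Truncate a branch address to the bits the predictor is assumed to keep."""
    return addr & INDEX_MASK


def collides(addr_a, addr_b):
    """Two distinct branches collide when their truncated addresses match."""
    return addr_a != addr_b and predictor_key(addr_a) == predictor_key(addr_b)


A, B = 0xFF12345678, 0xFFC12345678  # the two addresses shown on the slide
print(hex(predictor_key(A)), hex(predictor_key(B)), collides(A, B))
```

Both slide addresses reduce to the same 32-bit key, so under this assumption they share one prediction entry.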

SLIDE 12

Branch collision timing attack works for SGX but has limitations

Branch execution inside SGX affects colliding branches outside of SGX (shadow branches).

  • We can time a shadow branch instead of the actual target to learn whether it has been mispredicted, but…

This attack has two critical limitations.

  • Suffers from high measurement noise
  • Difficult to synchronize target and shadow branches

SLIDE 13

Limitation 1: High measurement noise

A mispredicted branch takes a long time to roll back, and its latency has high variance.

[Chart: mean and standard deviation of branch latency in cycles (axis 200 to 1000) for correct prediction versus misprediction. Chart annotations: ~25 cycles; ~800 cycles (depending on rolled-back instructions).]

* Measured 10,000 times, with 120 NOPs on the fall-through path

SLIDE 14

Limitation 2: Difficulty in synchronization

We need to time a shadow branch right after the target branch has executed, before other branches overwrite its history.

  • e.g., Skylake’s branch target buffer: 4 ways x 1,024 sets
  • Worst case: Five branch executions would overwrite the target branch history.

Synchronization is difficult because SGX does not allow single-stepping.

SLIDE 15

How does branch shadowing overcome the two limitations?

Apply LBR to a shadow branch to identify branch prediction results instead of timing

  • No ASCI because a shadow branch is outside of SGX
  • Deterministic: Either correctly predicted or mispredicted

Realize near single-stepping by increasing timer interrupt frequency and disabling the cache

  • Can interrupt SGX enclaves every ~5 cycles

SLIDE 16

Threat model

  • Attacker knows the source code or binary of a target enclave.
  • Attacker can frequently interrupt the target enclave’s execution to execute attack code.
  • Attacker prevents or disrupts the target enclave from accessing a trusted time source.


SLIDE 19

Step 1: Prepare a shadow copy of an SGX program to monitor it with LBR

SGX enclave (LBR blocked by ASCI):

    cmp …
    je L1
    …
    …
    jmpq *rdx
    …

Shadow code (outside of SGX, monitored with LBR):

    cmp rax,rax
    je L1’
    … (nop)
    mov addr,rdx
    jmpq *rdx
    … (nop)

Colliding branch instructions can monitor all branch executions.
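
Laying out the shadow copy amounts to choosing, for every enclave branch, an address outside the enclave with the same low bits. The following is a rough sketch under the ADDR[31:0] collision assumption from the earlier slide; `shadow_address`, `plan_shadow`, the branch kinds, and the example addresses are all made up for illustration.

```python
# Hypothetical planner: for each enclave branch, place a shadow branch whose
# low 32 address bits match (the collision condition assumed earlier).
LOW_MASK = (1 << 32) - 1


def shadow_address(enclave_addr, shadow_base):
    """Keep the low 32 bits of the enclave address, swap in other high bits."""
    if shadow_base & LOW_MASK:
        raise ValueError("shadow_base must have its low 32 bits clear")
    return shadow_base | (enclave_addr & LOW_MASK)


def plan_shadow(enclave_branches, shadow_base):
    """enclave_branches: list of (address, kind), kind in {'cond', 'indirect'}."""
    plan = []
    for addr, kind in enclave_branches:
        if kind == "cond":
            body = "cmp rax,rax; je <target>"   # always-taken conditional
        elif kind == "indirect":
            body = "mov <next>,rdx; jmpq *rdx"  # jumps to the next instruction
        else:
            raise ValueError(f"unknown branch kind: {kind}")
        plan.append((shadow_address(addr, shadow_base), body))
    return plan


# Example addresses are invented for illustration only.
PLAN = plan_shadow([(0x00400530, "cond"), (0x00400580, "indirect")],
                   shadow_base=0xFF00000000)
for addr, body in PLAN:
    print(hex(addr), body)
```

Everything between the shadow branches would be padded with NOPs, as the shadow listing above indicates, so that each shadow branch stays at its colliding offset.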


SLIDE 24

Step 2: Interrupt SGX execution and monitor shadow code with LBR

[Diagram: the SGX enclave code (cmp …; je L1; …; jmpq *rdx; …) and the shadow code (cmp rax,rax; je L1’; … (nop); mov addr,rdx; jmpq *rdx; … (nop)) take turns on the CPU.]

  • The enclave executes.
  • The attacker interrupts it and executes the shadow code while enabling LBR (predicted or mispredicted?).
  • The enclave resumes.

Whether or not shadow branches were correctly predicted reveals the history of target branches.
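
The interrupt-and-probe loop can be modeled in a few lines. This is a toy sketch, not the attack implementation: it assumes a single colliding predictor entry that remembers only the most recent direction, and `observe` and `branches_per_interrupt` are hypothetical names. It shows why the interrupt granularity matters.

```python
def observe(victim_directions, branches_per_interrupt):
    """Return what the attacker learns: one observation per interrupt.

    victim_directions: list of booleans (True = taken) for successive
    executions of one enclave branch. The shared predictor entry is modeled
    as remembering only the most recent direction (a simplification).
    """
    shared_history = None
    observations = []
    for i, taken in enumerate(victim_directions, start=1):
        shared_history = taken        # the enclave branch updates the entry
        if i % branches_per_interrupt == 0:
            # Interrupt: run the always-taken shadow branch with LBR enabled.
            # In this model a correct prediction means "target was taken".
            observations.append(shared_history)
            shared_history = True     # the shadow branch leaves its own mark
    return observations


VICTIM = [True, False, False, True]
print(observe(VICTIM, 1))  # one probe per enclave branch
print(observe(VICTIM, 2))  # coarser interrupts: only every second outcome
```

When the attacker can interrupt after every branch, the observations equal the victim's full history; with coarser interrupts, earlier outcomes within each slice are overwritten before they can be probed.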

SLIDE 25

Shadow conditional branch and prediction result

SGX enclave (direction unknown to the attacker):

    cmp $0, rax
    0x00*530: je 0x00*5f4
    0x00*532: inc rbx
    …
    0x00*5f4: dec rbx

Shadow code (colliding address, always taken):

    cmp rax, rax
    0xff*530: je 0xff*5f4
    0xff*532: nop
    …
    0xff*5f4: nop

LBR does not report not-taken branches, so we make our shadow branch always taken.


SLIDE 27

Shadow conditional branch and prediction result

  • Our shadow branch should be taken, but how does the CPU predict it with the target branch’s history?
  • If the target branch has been taken
      ➢ LBR: The shadow branch has been correctly predicted.
  • If the target branch has not been taken
      ➢ LBR: The shadow branch has been mispredicted.

Deterministically identify whether a target conditional branch has been taken or not taken
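
The decoding rule for conditional branches can be expressed with a deliberately simplified predictor. The sketch below assumes a predictor that remembers only the last direction per entry (real predictors keep richer history), and `OneBitPredictor` and `probe_conditional` are illustrative names rather than anything from the authors' code.

```python
class OneBitPredictor:
    """Simplified model: remembers only the last direction per entry."""

    def __init__(self):
        self.last_taken = {}

    def execute(self, key, taken):
        """Run a branch; return True if it was predicted correctly."""
        predicted = self.last_taken.get(key, False)  # no history: not taken
        self.last_taken[key] = taken
        return predicted == taken


def probe_conditional(predictor, key):
    """Run the always-taken shadow branch and decode the LBR-style flag."""
    correctly_predicted = predictor.execute(key, taken=True)
    return "taken" if correctly_predicted else "not taken"


KEY = 0x530  # stands in for the colliding low address bits
predictor = OneBitPredictor()
RESULTS = []
for enclave_direction in (True, False, True):
    predictor.execute(KEY, enclave_direction)          # enclave branch runs
    RESULTS.append(probe_conditional(predictor, KEY))  # attacker probes
print(RESULTS)
```

In this model the probe result tracks the enclave branch exactly, because the shadow branch is always taken and therefore only mispredicts when the shared entry says "not taken".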

SLIDE 28

Shadow indirect branch and prediction result

SGX enclave (execution unknown to the attacker):

    0x00*530: jmpq *rdx
    0x00*532: inc rbx
    …
    0x00*5f4: dec rbx

Shadow code (colliding address, jumps to the next instruction):

    mov 0xff*532,rdx
    0xff*530: jmpq *rdx
    0xff*532: nop
    …
    0xff*5f4: nop

For an indirect branch, LBR reports a target prediction result. We use its default target: the next instruction.


SLIDE 30

Shadow indirect branch and prediction result

  • Our shadow branch will be correctly predicted unless the target branch updates the cached destination.
  • If the target branch has been executed
      ➢ LBR: The shadow branch has been mispredicted.
  • If the target branch has not been executed
      ➢ LBR: The shadow branch has been correctly predicted.

Deterministically identify whether a target indirect branch has been executed or not
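
The indirect-branch case can be modeled the same way with a toy branch target buffer. This sketch assumes the buffer caches one destination per entry, that a missing entry predicts the next instruction, and that the enclave branch jumps somewhere other than the shadow's fall-through; `ToyBTB` and `probe_indirect` are hypothetical names.

```python
class ToyBTB:
    """Simplified model: caches the last destination per truncated address."""

    def __init__(self):
        self.target = {}

    def execute_indirect(self, key, actual_target, fallthrough):
        """Run an indirect branch; return True if the target was predicted."""
        predicted = self.target.get(key, fallthrough)  # default: next instruction
        self.target[key] = actual_target
        return predicted == actual_target


def probe_indirect(btb, key, shadow_next):
    """Run the shadow branch (jumps to the next instruction) and decode."""
    correct = btb.execute_indirect(key, shadow_next, shadow_next)
    return "not executed" if correct else "executed"


KEY, NEXT, ENCLAVE_TARGET = 0x530, 0x532, 0x5F4  # low bits as on the slides
btb = ToyBTB()
RESULTS = [probe_indirect(btb, KEY, NEXT)]        # enclave branch has not run
btb.execute_indirect(KEY, ENCLAVE_TARGET, NEXT)   # enclave branch runs
RESULTS.append(probe_indirect(btb, KEY, NEXT))
RESULTS.append(probe_indirect(btb, KEY, NEXT))    # nothing ran in between
print(RESULTS)
```

Each probe also rewrites the cached destination back to the shadow's own target, which is why the third probe reports "not executed" again.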


SLIDE 32

Near single-stepping: Frequent timer and disabled cache

Increase timer interrupt frequency

  • Adjust the timestamp counter value of the local APIC timer using a model-specific register, MSR_IA32_TSC_DEADLINE

Disable the CPU cache

  • CD bit of the CR0 register (code?)

With the frequent timer: an interrupt every ~50 cycles

SLIDE 33

Near single-stepping: Frequent timer and disabled cache

Increase timer interrupt frequency

  • Adjust the timestamp counter value of the local APIC timer using a model-specific register, MSR_IA32_TSC_DEADLINE

Disable the CPU cache

  • CD bit of the CR0 register (code?)

With the frequent timer and the cache disabled: an interrupt every ~5 cycles
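
The two knobs on this slide come down to simple register arithmetic. The sketch below only computes values and performs no privileged operation (writing CR0 or an MSR requires ring 0 and is not shown). The bit positions and MSR index are the architectural ones (CR0.CD is bit 30, CR0.NW is bit 29, IA32_TSC_DEADLINE is MSR 0x6E0); the function names and the sample register values are assumptions for illustration.

```python
# Architectural constants for x86; helper names below are hypothetical.
IA32_TSC_DEADLINE = 0x6E0  # MSR index of the TSC-deadline timer
CR0_NW = 1 << 29           # CR0 "not write-through" bit
CR0_CD = 1 << 30           # CR0 "cache disable" bit


def cr0_with_cache_disabled(cr0):
    """Return the CR0 value with CD set and NW cleared (no-fill mode)."""
    return (cr0 | CR0_CD) & ~CR0_NW


def next_deadline(current_tsc, interval_cycles):
    """TSC value to program as the next deadline, interval_cycles from now."""
    return current_tsc + interval_cycles


SAMPLE_CR0 = 0x80050033  # an example protected-mode CR0 value, for illustration
print(hex(cr0_with_cache_disabled(SAMPLE_CR0)))
print(next_deadline(current_tsc=1_000_000, interval_cycles=50))
```

A shorter `interval_cycles` means more frequent interrupts; how small it can be in practice depends on interrupt-handling overhead, which this sketch does not model.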


SLIDE 35

Attack evaluation: Sliding-window exponentiation

    /* X = A^E mod N */
    mbedtls_mpi_exp_mod(X, A, E, N, _RR) {
        …
        while (1) {
            // i-th bit of exponent
            ei = (E->p[nblimbs] >> bufsize) & 1;
            if (ei == 0 && state == 0)
                continue;
            if (ei == 0 && state == 1)
                mpi_montmul(X, X, N, mm, &T);
            …
        }
        …

Annotation on the code: taken only when ei is one

We can recover 66% of a 1024-bit RSA private key from a single run (~10 runs are enough to fully recover it).
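
A back-of-the-envelope calculation shows why about ten runs are plausible for full recovery. It rests on a simplifying assumption that is not stated in the slides: each run reveals each key bit independently with probability 0.66. The function names are hypothetical.

```python
def expected_coverage(per_run_rate, runs):
    """Probability that a given bit is revealed by at least one of the runs,
    assuming independent runs (a simplification, not a measured property)."""
    return 1.0 - (1.0 - per_run_rate) ** runs


def expected_missing_bits(key_bits, per_run_rate, runs):
    """Expected number of key bits never revealed after the given runs."""
    return key_bits * (1.0 - per_run_rate) ** runs


for runs in (1, 5, 10):
    print(runs,
          round(expected_coverage(0.66, runs), 6),
          round(expected_missing_bits(1024, 0.66, runs), 3))
```

Under this assumption the expected number of unrecovered bits of a 1024-bit key falls below one well before the tenth run, which is consistent with the figure quoted on the slide.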

SLIDE 36

Attack demo

https://youtu.be/jf9PanlF374


SLIDE 37

Hardware countermeasure: Flush branch history at SGX mode switch

Most effective, but needs hardware modification

  • It would not be realizable by a microcode update.

Overhead depends on how frequently SGX mode switches occur.

SLIDE 38

Simulation result

[Chart: normalized IPC (axis 0.2 to 1) for bzip2, gcc, mcf, h264ref, omnetpp, astar, gamess, namd, sphinx3, and their geometric mean (GMEAN), with one series each for 100, 1k, 10k, 100k, 1M, and 10M cycles.]

Overhead was ~2% when mode switching occurs every 100k cycles.

  • Ten times more frequent than the timer interrupt of Windows 10 (generated every 1M cycles @ 4GHz CPU)
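
The cycle counts on this slide convert to wall-clock intervals with simple arithmetic. This is only a unit conversion of the numbers quoted above (100k and 1M cycles at 4 GHz); the helper names are hypothetical.

```python
def cycles_to_microseconds(cycles, clock_hz):
    """Duration of a number of cycles at the given clock rate."""
    return cycles / clock_hz * 1e6


def events_per_second(cycles_between_events, clock_hz):
    """How often an event fires if it occurs once per given cycle count."""
    return clock_hz / cycles_between_events


CLOCK_HZ = 4e9  # the 4 GHz figure quoted on the slide
for cycles in (100_000, 1_000_000):
    print(cycles,
          cycles_to_microseconds(cycles, CLOCK_HZ),
          events_per_second(cycles, CLOCK_HZ))
```

The ratio between the two event rates is the factor of ten stated in the bullet above.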

SLIDE 39

Software mitigation: Branch obfuscation

Replace a set of branches with a single indirect branch plus conditional move instructions

  • The indirect branch only reveals when and whether it has been executed, not its target.
  • A conditional move is used to conditionally update the indirect branch’s target.

Modify LLVM for automatic transformation

  • Average overhead: Below 1.3x (nbench)


SLIDE 41

Example of branch obfuscation

Original (can identify whether L1 or L2 has been executed):

    L0: cmp $0,$a
        je L2
    L1: …
    L2: …

After transformation (can identify whether Z1 has been executed, but not its target):

    L0: mov $L1,r15
        cmp $0,$a
        cmov $L2,r15
        jmp Z1
    L1: …
    L2: …
        …
    Z1: jmpq *r15
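
The shape of the transformation can be mimicked in a high-level model: pick the destination with a branch-free select, then dispatch through a single indirect call. This is an illustrative sketch, not the LLVM pass; Python itself gives no constant-time guarantees, and `select`, `run_obfuscated`, and `HANDLERS` are made-up names. Each handler models only its own block and ignores any fall-through between L1 and L2.

```python
def select(cond, if_true, if_false):
    """Branch-free select on non-negative ints, standing in for cmov."""
    mask = -int(bool(cond))  # all ones when cond holds, zero otherwise
    return (if_true & mask) | (if_false & ~mask)


def run_obfuscated(a, handlers):
    """Mirror of the transformed listing: one dispatch point for both paths."""
    target = 1                           # mov  $L1, r15
    target = select(a == 0, 2, target)   # cmp  $0, $a ; cmov $L2, r15
    return handlers[target]()            # Z1: jmpq *r15


# Hypothetical block bodies, keyed by label number.
HANDLERS = {1: lambda: "L1", 2: lambda: "L2"}

print(run_obfuscated(0, HANDLERS))
print(run_obfuscated(7, HANDLERS))
```

Both inputs pass through the same dispatch statement, which corresponds to the single indirect branch Z1: an observer of that branch learns that it ran, but the chosen destination is carried in data rather than in a conditional jump.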

SLIDE 42

Conclusion

Branch shadowing: Fine-grained and deterministic side-channel attack on SGX

  • Reveals the direction and/or execution of individual branch instructions

Proposed hardware- and software-based countermeasures

  • Branch history flushing and obfuscation

Thanks for listening!
Sangho Lee (sangho@gatech.edu)