Jump Over ASLR: Attacking Branch Predictors to Bypass ASLR



SLIDE 1

Jump Over ASLR: Attacking Branch Predictors to Bypass ASLR

Presentation by Eric Newberry and Youssef Tobah
Paper by Dmitry Evtyushkin, Dmitry Ponomarev, and Nael Abu-Ghazaleh

SLIDE 2

Motivation

  • Buffer overflow attacks modify the control flow of a program

      ○ Exploit unsafe writes to overwrite return addresses stored on the stack
      ○ The function then returns to the location specified by the attacker

  • Numerous security measures have been proposed to counteract these attacks:

      ○ No eXecute (NX) page bit: prevents non-code memory pages from being executed
      ○ Address space layout randomization (ASLR): randomizes some bits in virtual addresses

  • Recovering the randomized ASLR bits would make a buffer overflow attack much less difficult
  • Can we exploit a hardware side-channel to do this?

SLIDE 3

Goals of the Attack

  • Recover the randomized (ASLR) bits of a virtual memory address by causing collisions in the Branch Target Buffer (BTB) and measuring the resulting delays
  • Do so for both userland ASLR and kernel ASLR (KASLR) bits
  • The attacker can then proceed with the buffer overflow attack
  • However, the attacker must be able to do this from an unprivileged process

      ○ No special privileges
      ○ No memory disclosures

SLIDE 4

Branch Target Buffer (BTB)

Evtyushkin et al, “Jump Over ASLR: Attacking Branch Predictors to Bypass ASLR”

SLIDE 5

The Attack in a Nutshell

  • Multiple addresses can map to the same location in the BTB

      ○ On Intel Haswell processors, the BTB tag covers only a portion of the upper-order address bits, even though virtual addresses are 48 bits wide

  • The attacker process creates collisions in the BTB using branch instructions
  • A collision increases the latency of the attacker process, because its previously loaded BTB entries were overwritten by the victim
  • From these latency measurements, we can significantly narrow down (or guess outright) the ASLR bits of the victim process or the kernel
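
The collision mechanism can be illustrated with a toy model. The sketch below is not how real hardware is organized: it assumes a BTB that sees only the low 31 address bits (the `BTB_ADDR_BITS` value, the class `ToyBTB`, and all addresses are hypothetical choices for illustration) and simply records whether the stored prediction matched.

```python
# Toy BTB model. All sizes, names, and addresses are illustrative assumptions.
BTB_ADDR_BITS = 31  # assumed number of address bits visible to the BTB


class ToyBTB:
    def __init__(self, addr_bits=BTB_ADDR_BITS):
        self.mask = (1 << addr_bits) - 1
        self.entries = {}  # truncated branch address -> last seen target

    def execute_branch(self, branch_addr, actual_target):
        """Return True if the stored prediction was correct, then update it."""
        key = branch_addr & self.mask
        predicted = self.entries.get(key)
        self.entries[key] = actual_target
        return predicted == actual_target


btb = ToyBTB()
spy_branch = 0x0000_0000_0040_1000
spy_target = 0x0000_0000_0040_2000
victim_branch = 0x0000_7F00_0040_1000   # same low 31 bits as spy_branch
victim_target = 0x0000_7F00_0050_0000

cold = btb.execute_branch(spy_branch, spy_target)          # False: nothing stored yet
warm = btb.execute_branch(spy_branch, spy_target)          # True: entry is primed
btb.execute_branch(victim_branch, victim_target)           # victim overwrites the entry
after_victim = btb.execute_branch(spy_branch, spy_target)  # False: misprediction
```

In the real attack the spy never sees `after_victim` directly; it infers it from the extra cycles its branch takes.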

SLIDE 6

Attacking Other Userland Processes

  • The attacker loads code into memory at a location whose upper-order address bits match those of the targeted code in the victim
  • The attacker signals the victim process so that it executes the targeted code
  • The victim executes, overwriting the attacker’s entry in the BTB
  • When the attacker resumes, it executes its branch and measures the execution time
  • If the branch took longer than usual to run, a BTB collision is likely

      ○ A slowdown occurs only if the target addresses of the attacker’s and victim’s branches differ
      ○ Intel states that an incorrect BTB prediction causes an 8-cycle IF bubble
      ○ Times are averaged over multiple attempts for increased accuracy

Measured spy latency in cycles (T1, T2 = branch targets):

               Spy T2            Spy T1
  Victim T2    55.76             69.38 (+11.12)
  Victim T1    64.93 (+9.17)     58.26
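
As a quick check on these measurements, the sketch below reads them as a 2×2 table (rows: victim target, columns: spy target; this layout is an interpretation of the slide) and recomputes the slowdown relative to the matching-target case.

```python
# Latencies in cycles copied from the slide; the (victim, spy) layout is assumed.
latency = {
    ("T2", "T2"): 55.76,
    ("T2", "T1"): 69.38,
    ("T1", "T2"): 64.93,
    ("T1", "T1"): 58.26,
}

BTB_BUBBLE_CYCLES = 8  # misprediction bubble quoted on the slide


def slowdown(victim_target, spy_target):
    """Extra cycles the spy pays compared with the matching-target baseline."""
    baseline = latency[(spy_target, spy_target)]
    return round(latency[(victim_target, spy_target)] - baseline, 2)


delta_spy_t1 = slowdown("T2", "T1")  # 11.12
delta_spy_t2 = slowdown("T1", "T2")  # 9.17
```

Both differences sit a little above the 8-cycle bubble, which is consistent with the slowdown appearing only when the two branches have different targets.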

SLIDE 7

Attacking the Kernel

  • The kernel shares the same virtual address space as the attacker process
  • The first 18 bits of the kernel address (out of 48) are known to be fixed
  • The attacker cannot place code at the kernel’s own virtual addresses, but because the BTB tag uses only the upper-order portion of the last 31 address bits, attacker and kernel branches can still collide (at least on Intel Haswell processors)
  • Therefore, the attack can be carried out in much the same way as against userland processes
  • In fact, it is easier to trigger targeted code in the kernel, thanks to syscalls
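
To make the aliasing concrete, here is a small sketch that assumes the BTB sees exactly the low 31 bits of an address (the slide’s “last 31 bits”). The kernel address and the helper `btb_visible_bits` are hypothetical and chosen only for illustration.

```python
BTB_ADDR_BITS = 31  # assumption taken from the slide's "last 31 bits"


def btb_visible_bits(addr, bits=BTB_ADDR_BITS):
    """Keep only the address bits that the (assumed) BTB can distinguish."""
    return addr & ((1 << bits) - 1)


kernel_branch = 0xFFFF_FFFF_8123_4560          # hypothetical kernel-space branch
user_branch = btb_visible_bits(kernel_branch)  # 0x0123_4560, a user-space address

aliases = btb_visible_bits(user_branch) == btb_visible_bits(kernel_branch)  # True
```

Under this assumption a branch placed at `user_branch` in the spy’s own address space is indistinguishable, from the BTB’s point of view, from the kernel branch.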

SLIDE 8

KASLR in Linux

  • KASLR generates random bits during boot

      ○ Determines the offset of the kernel’s physical memory location
      ○ The same offset is applied to virtual memory

  • Only 9 bits randomized

SLIDE 9

Recovering KASLR Bits

  • Locate branch instruction in kernel to execute
  • List the potential addresses for the branch, and at each one:

      ○ Allocate a buffer
      ○ Load it with code containing a branch and a time-measurement instruction
      ○ Activate the kernel branch
      ○ Activate the spy branch
      ○ Measure the time taken to execute the spy branch
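
The scan can be sketched as a simulation. Nothing below touches real hardware: `simulated_spy_latency` stands in for the "activate kernel branch, activate spy branch, measure" steps, and the base address, 2 MiB slot size, branch offset, latency figures, and 31-bit mask are all assumptions made for illustration.

```python
import random

KASLR_SLOTS = 512                     # 2**9 candidates for 9 randomized bits
SLOT_SIZE = 0x20_0000                 # assumed 2 MiB step between candidates
KERNEL_BASE = 0xFFFF_FFFF_8000_0000   # assumed unrandomized base address
BRANCH_OFFSET = 0x1234                # hypothetical offset of the kernel branch
BTB_MASK = (1 << 31) - 1              # assumed BTB-visible address bits
SAMPLES = 50                          # measurements per candidate


def candidate_addresses():
    """List every address where the kernel branch could be."""
    return [KERNEL_BASE + slot * SLOT_SIZE + BRANCH_OFFSET
            for slot in range(KASLR_SLOTS)]


def simulated_spy_latency(candidate, true_branch, rng):
    """Stand-in for one timed spy branch; a collision adds about 10 cycles."""
    noise = rng.uniform(-3.0, 3.0)
    collides = (candidate & BTB_MASK) == (true_branch & BTB_MASK)
    return 56.0 + noise + (10.0 if collides else 0.0)


def recover_branch_address(true_branch, rng):
    """Average SAMPLES timings per candidate and return the slowest one."""
    best, best_mean = None, float("-inf")
    for cand in candidate_addresses():
        total = sum(simulated_spy_latency(cand, true_branch, rng)
                    for _ in range(SAMPLES))
        mean = total / SAMPLES
        if mean > best_mean:
            best, best_mean = cand, mean
    return best


rng = random.Random(0)
secret = candidate_addresses()[rng.randrange(KASLR_SLOTS)]
recovered = recover_branch_address(secret, rng)
```

In a real attack the spy’s buffer would live at a user-space address sharing the BTB-visible bits of each candidate, and the timing would come from a cycle counter rather than a random number generator.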

SLIDE 10

KASLR Recovery Results

  • Experiment performed using the ‘open’ system call
  • Intentionally provide an incorrect file name

      ○ Increases speed
      ○ Reduces noise

  • 512 possible addresses for the branch

      ○ 50 measurements collected at each address
      ○ Attack completed in 60 milliseconds
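
A back-of-the-envelope calculation from the numbers on this slide gives the time budget per measurement. It assumes the 60 ms covers all candidates and all samples; the variable names are illustrative.

```python
candidates = 2 ** 9            # 512 possible addresses (9 randomized bits)
samples_per_candidate = 50
total_time_s = 0.060           # 60 milliseconds for the whole attack

total_measurements = candidates * samples_per_candidate         # 25,600
per_measurement_us = total_time_s / total_measurements * 1e6     # about 2.34 µs
```

Each probe (syscall plus timed spy branch) therefore has to fit in a couple of microseconds on average.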

SLIDE 11

Recovering Userland ASLR Bits

  • Similar to KASLR bit recovery

      ○ Targets a branch instruction in the victim process
      ○ Makes a list of potential addresses
      ○ Allocates a buffer at each address
      ○ Forces the victim to execute the branch and use the BTB
      ○ Spy runs its jump multiple times and measures the time
      ○ Measures again with the spy branching to a different target

SLIDE 12

ASLR Co-residency

  • For KASLR, co-residency comes for free: every process shares the kernel
  • For userland ASLR, the attack must ensure co-residency to get BTB collisions
  • Two methods to ensure co-residency:

      ○ Load all cores with dummy processes, except the core with the victim and spy
      ○ Have the spy run on every core

SLIDE 13

Experimental Results

  • Not all ASLR bits retrieved

      ○ Bits 12 to 40 are randomized
      ○ Only 18 bits are used in the BTB

  • Checking 100 addresses took about 1 second

      ○ Further optimizations could improve performance
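
A rough extrapolation from these figures, assuming a naive scan of every value of the 18 BTB-visible bits at the measured rate, and reading "bits 12 to 40" as an inclusive range (both are assumptions, not results from the paper):

```python
recoverable_bits = 18
addresses_per_second = 100

scan_candidates = 2 ** recoverable_bits                   # 262,144 addresses
worst_case_s = scan_candidates / addresses_per_second     # 2621.44 seconds
worst_case_min = worst_case_s / 60                        # roughly 44 minutes

randomized_bits = 40 - 12 + 1                             # 29 if the range is inclusive
unknown_bits = randomized_bits - recoverable_bits         # 11 bits left to brute-force
remaining_guesses = 2 ** unknown_bits                     # 2,048 possibilities
```

Under these assumptions the side channel shrinks the search space from 2^29 to 2^11 guesses, which is why the slides frame it as narrowing down rather than fully recovering the address.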

SLIDE 14

Technical Insights

  • ASLR vulnerability exposed
  • Potential mitigation techniques proposed
  • Software solutions

      ○ Finer-grained ASLR
      ○ Fuzzing timing measurements
      ○ For KASLR, randomize higher-order bits for every process launch

  • Hardware Solutions

      ○ Use the full virtual address when accessing the BTB
      ○ For the kernel, add a secret value to the BTB hash function
          ■ Use a different hash value for each process
      ○ Flush the BTB on context switches
      ○ Each BTB entry could be marked with a process ID
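
The "secret value in the BTB hash" idea can be sketched in a few lines. This is only a toy: XOR stands in for whatever hash real hardware would use, and the 31-bit mask, the function `btb_index`, the addresses, and the secrets are all hypothetical.

```python
BTB_MASK = (1 << 31) - 1  # assumed BTB-visible address bits


def btb_index(branch_addr, process_secret=0):
    """Hypothetical keyed index; XOR is a placeholder for a real hash."""
    return (branch_addr ^ process_secret) & BTB_MASK


victim_branch = 0x0000_7F00_1234_5000
spy_guess = 0x0000_0000_1234_5000   # same low 31 bits as victim_branch

# Without a secret, a correct guess collides with the victim's branch.
collides_unkeyed = btb_index(spy_guess) == btb_index(victim_branch)   # True

# With different per-process secrets, the same guess lands elsewhere.
victim_secret = 0x0F0F_0F0F
spy_secret = 0x3C3C_3C3C
collides_keyed = (btb_index(spy_guess, spy_secret)
                  == btb_index(victim_branch, victim_secret))          # False
```

A collision can still occur at some other spy address, but without knowing both secrets the spy cannot map that address back to the victim’s.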

SLIDE 15

Related Work

  • Other side channel attacks

      ○ Cache side channels (many works)
      ○ Branch predictor attack to get secret keys (Aciicmez et al.)
      ○ Branch predictor attack to build an inter-process communication channel (Evtyushkin et al.)

  • Other ASLR work

      ○ Brute-force attacks (Shacham et al.)
      ○ Memory disclosure attacks (Roglia et al.)
      ○ Attacking KASLR by causing cache collisions between kernel and userland processes (Hund et al.)
      ○ TLB manipulation attack to reveal kernel memory pages (Hund et al.)

  • BTB attacks are simpler and do not rely on noisy cache measurements

SLIDE 16

Conclusion

  • By causing collisions in the Branch Target Buffer (BTB), we can recover (or at least narrow down) ASLR bits
  • This mechanism is extremely effective for both userland processes and the kernel
  • The authors proposed multiple mechanisms, both in software and hardware, to close this side channel and prevent ASLR bits from being recovered

SLIDE 17

Discussion Points

  • For the userland ASLR attack, not all random bits are known by the end of the attack. Is this acceptable?

      ○ It essentially reduces the difficulty of a brute-force attack

  • Is the method of loading every core but one with dummy processes sufficiently stealthy?
  • Is there a software mechanism that could be developed to detect this attack?

      ○ This would be useful on older systems designed before this side channel was discovered

  • One of the mitigations the authors propose is finer-grained randomization of ASLR, even going as far as instruction-level granularity. Would this cause any difficulties for developers?

      ○ Particularly during debugging?
