ASLR on the Line
Ben Gras, Kaveh Razavi, Erik Bosman, Herbert Bos, Cristiano Giuffrida
VUSec
Erik Bosman (@brainsmoke)
Kaveh Razavi (@gober)
Ben Gras (@bjg)
Stephan van Schaik
Address Space Layout Randomization (ASLR)
Widely deployed exploit mitigation strategy: choose a different location for code and data every time a process is run.
[Diagram: the virtual address space, from lower addresses to higher addresses (2^48 − 1), with code and data placed at random locations]
Address Space Layout Randomization
Makes life for exploit writers a bit more difficult: usually, exploits need to know the location of code and data.
Address Space Layout Randomization
Exploit writers need to find a bug which leaks addresses without crashing the program. ... or do they?
ASLR⊕Cache (AnC): a side-channel attack on a process baked into the hardware, to discover ASLR information from JavaScript in the browser.
Modern CPU architectures

[Diagram: each CPU core has its own L1 code / L1 data caches and an L2 cache; the L3 (Last Level Cache) is shared between cores, in front of DDR memory. A memory access presents a virtual address; the MMU translates it to a physical address, caching translations in the TLB; on a TLB miss, the MMU performs a page table (PT) walk.]
Timing an operation with performance.now():

t0 = performance.now();
// operation to time
t1 = performance.now();
t = t1 - t0;

[Plot: measured time vs. real time]
After anti-side-channel mitigations (Firefox), performance.now() is too coarse, so we build our own clock by counting:

c = 0;
t0 = p.now();
while (t0 == p.now());
t1 = p.now();
while (t1 == p.now()) { c++; }

[Plots: measured time (counter value) vs. real time, after anti-side-channel mitigations in Firefox and in Chrome]
SharedArrayBuffer: memory which may be shared between multiple worker threads. Enabled by default in Firefox, Chrome and Edge since 2017.
Building a clock using SharedArrayBuffer and worker threads:

// thread 2 (counter thread):
c = 0;
while (buf[0] == 0);
while (buf[0] == 1) { c++; }

// thread 1 (measuring thread):
buf[0] = 1;
// operation to time
buf[0] = 0;

[Plot: measured time (counter value c) vs. real time]
L3 cache structure:

- cache line: 64 bytes; 1 page (4KB) = 64 cache lines
- N-way associative: 2048 cache sets, with as many slices as cores
- cache_set = (addr >> 6) % 2048 — a direct mapping of the physical address, repeated every 128KB
- cache_slice = xor_hash(addr)
- two cache lines mapping to the same cache set have the same physical address modulo 128KB, and therefore the same offset within a 4KB memory page

[Diagram: memory access → physical address → L3 cache set]
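A sketch of this set-index mapping (slice selection via xor_hash is left out, and the address is treated as physical). Addresses a multiple of 128KB (2048 sets × 64-byte lines) apart land in the same cache set:

```javascript
function cacheSet(addr) {
  return (addr >>> 6) & 2047;            // (addr >> 6) % 2048
}

const a = 0x1234540;
console.log(cacheSet(a));                // → 1301
console.log(cacheSet(a + 128 * 1024));   // → 1301: same set, 128KB apart
console.log(cacheSet(a + 4096));         // → 1365: the next page shifts the set by 64
```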
L3 cache EVICT + TIME (does an operation use a specific cache line?):

evict(line_x);    // access an eviction set in mybuf: lines X X X ... mapping to the same cache set
t0 = time();
// trigger memory access (or not)
t = time() - t0;  // a slow reload means the operation used the evicted cache line
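A sketch of evict(line_x): gather, from a large buffer, offsets that share the target's cache set and touch them all. Assumptions to flag: the buffer is treated as physically contiguous and cache slices are ignored; a real attack must find true conflicts, e.g. by timing candidate sets.

```javascript
const SET_STRIDE = 128 * 1024;                // the same cache set repeats every 128KB
const mybuf = new Uint8Array(8 * 1024 * 1024);

function evictionOffsets(target, count) {
  const offs = [];
  for (let o = target % SET_STRIDE; offs.length < count && o < mybuf.length; o += SET_STRIDE)
    offs.push(o);
  return offs;
}

function evict(offs) {
  let sink = 0;                               // reads fill the cache set, evicting line_x
  for (const o of offs) sink += mybuf[o];
  return sink;
}

const offs = evictionOffsets(12345, 16);
evict(offs);
console.log(offs.length, offs[1] - offs[0]);  // → 16 131072
```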
[Recap diagram: on a TLB miss, the MMU performs a page table (PT) walk to translate the virtual address.]
Virtual address lookup (x86_64)

The 48-bit virtual address space (lower addresses … 2^48 − 1) is translated through a 4-level page table walk starting from the CR3 register:
- level 4: 512 entries, covering 512GB each
- level 3: 512 entries, covering 1GB each
- level 2: 512 entries, covering 2MB each
- level 1: 512 entries, pointing to 4096-byte regions in memory
Example: the address 7 F 8 3 B 6 3 7 3 4 misses in the TLB. The MMU walks from CR3:
- level-4 page table: entry 255 (of 512)
- level-3 page table: entry 14
- level-2 page table: entry 433
- level-1 page table: entry 370
- the actual data, at offset 64 within its 4K page

Address information is directly encoded into the page table lookups, and the page tables are 4K pages themselves.
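A sketch of how a 48-bit x86_64 virtual address splits into the four 9-bit page-table indices plus the 12-bit page offset (BigInt is used because the values exceed JavaScript's 32-bit bitwise range):

```javascript
function ptIndices(vaddr) {
  const v = BigInt(vaddr);
  return {
    pml4: Number((v >> 39n) & 0x1FFn),  // level 4: each entry covers 512GB
    pdpt: Number((v >> 30n) & 0x1FFn),  // level 3: 1GB per entry
    pd:   Number((v >> 21n) & 0x1FFn),  // level 2: 2MB per entry
    pt:   Number((v >> 12n) & 0x1FFn),  // level 1: 4KB per entry
    off:  Number(v & 0xFFFn),           // offset within the 4K page
  };
}

console.log(ptIndices(0x7FFFFFFFF000));
// → { pml4: 255, pdpt: 511, pd: 511, pt: 511, off: 0 }
```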
CR3
... ...
255 254 253 252 251 250 249 248 256 247 255
... ...
255 254 253 252 251 250 249 248 256 247 255 1 Cache line = 64 bytes = 8 possible page table entries
... ...
255 254 253 252 251 250 249 248 256 247 255 1 Cache line = 64 bytes = 8 possible page table entries
... ...
255 254 253 252 251 250 249 248 256 247 255 1 Cache line = 64 bytes = 8 possible page table entries cache line reveals 6 address bits
[Diagram: observed cache lines for page-table entries 255, 14, 433, 370 and page offset 64]

The location of our object within its page is known by studying the browser's memory allocator.
Max entropy left: 4 × 3 bits (3 unresolved bits per level) plus, since we don't know which cache hit belongs to which cache line (i.e. which page-table level), log2(4 × 3 × 2 × 1) bits for the ordering: ~16.6 bits in total.
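A sketch of the per-level information AnC observes. Page-table entries are 8 bytes, so a 64-byte cache line holds 8 consecutive entries: seeing the line reveals the top 6 of each 9-bit index, leaving 3 bits unknown.

```javascript
const lineOf = (index) => index >> 3;   // which of the page table's 64 cache lines
console.log(lineOf(370), lineOf(433));  // → 46 54

// Remaining entropy: 3 unknown bits at each of the 4 levels, plus not
// knowing which observed cache hit belongs to which level (4! orderings).
const bits = 4 * 3 + Math.log2(4 * 3 * 2 * 1);
console.log(bits.toFixed(1));           // → 16.6
```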
Allocate a buffer; perform this side-channel attack on buffer entries 4096 bytes apart; measure when the page table lookup crosses a cache line boundary.
[Diagram: sliding in +4096-byte steps advances the level-1 page-table entry from 370 through 371, 372, … until at 376 the lookup crosses into the next cache line]
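A simulation of this sliding step (no real cache probing): each +4KB step advances the level-1 index by one entry, and the step count at which the index enters the next cache line reveals its low 3 bits.

```javascript
function stepsUntilLineCross(ptIndex) {
  const startLine = ptIndex >> 3;       // cache line of the initial entry
  let steps = 0;
  while (((ptIndex + steps) >> 3) === startLine) steps++;
  return steps;                         // in the real attack: observed via the cache side channel
}

const steps = stepsUntilLineCross(370);
console.log(steps);                     // → 6: the boundary is crossed at entry 376
console.log(8 - steps);                 // → 2: the low 3 bits of 370 (370 % 8 === 2)
```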
we can do the same thing for the 2nd level page table
[Diagram: sliding in +2MB steps advances the level-2 page-table entry from 433 through 434, 435, … to 440, crossing a cache line boundary]
Max entropy left after resolving levels 1 and 2: 2 × 3 + log2(2 × 1) = 7 bits
[Diagram: sliding in +1GB steps resolves the level-3 entry (14); sliding in +512GB steps resolves the level-4 entry (255)]
Allocating large chunks of memory

Firefox (on Linux) does not initialize ArrayBuffers, so Linux does not allocate space for the actual pages: we can allocate huge chunks and use sliding to recover the whole address.

Chrome does initialize memory, but jumps ahead in the address space every time it creates a new heap: the 3rd-level address bits can be recovered, while the 4th-level bits would need Chrome to initialize and free up to 4TB :-)
This side channel was found to work on 22 out of 22 tested architectures!
Demo video
Conclusions

Protecting against side-channel attacks has been neglected in favor of adding features :,-(

Attacks from JavaScript on the Memory Management Unit can recover ASLR information.
VUSec
project page: https://vusec.net/projects/anc