Delta Pointers: Buffer Overflow Checks Without the Checks


slide-1
SLIDE 1

Delta Pointers: Buffer Overflow Checks Without the Checks

Taddeüs Kroes & Koen Koning, Erik van der Kouwe, Herbert Bos, Cristiano Giuffrida

June 19, 2018

slide-2
SLIDE 2

Preview

buffer[10] secret

2

slide-3
SLIDE 3

Preview

buffer[10] secret

buffer[5]

2

slide-4
SLIDE 4

Preview

buffer[10] secret

buffer[5] buffer[11]

2

slide-5
SLIDE 5

Preview

buffer[10] secret

buffer[5] buffer[11]

secret

2

slide-6
SLIDE 6

Preview

buffer[10] secret

buffer[5] buffer[11]

2

slide-7
SLIDE 7

Preview

buffer[10] secret

buffer[5] buffer[11]

Automatic! FAST!

  • no branches
  • no mem access

2

slide-8
SLIDE 8

Buffer overflows still very common today

3

slide-9
SLIDE 9

Bounds checking is slow

Runtime overhead (%):

MPX               139%
SGXBounds          94%
ASan               80%
BaggyBounds        72%
Low-Fat Pointers   64%

4

slide-10
SLIDE 10

What is bounds checking?

void foo(char *buffer, size_t n) { buffer[n] = 10; }

5

slide-11
SLIDE 11

What is bounds checking?

void foo(char *buffer, size_t n) { buffer[n] = 10; }

Attacker-controlled?

5

slide-12
SLIDE 12

What is bounds checking?

void foo(char *buffer, size_t n) { if (n >= SIZE(buffer)) ERROR("overflow"); buffer[n] = 10; }

Automatically inserted

5

slide-13
SLIDE 13

What is bounds checking?

void foo(char *buffer, size_t n) { if (n >= SIZE(buffer)) ERROR("overflow"); buffer[n] = 10; }

Needs metadata

5

slide-14
SLIDE 14

What is bounds checking?

void foo(char *buffer, size_t n) { if (n >= SIZE(buffer)) ERROR("overflow"); buffer[n] = 10; }

Needs metadata Branching check

5

slide-15
SLIDE 15

What is bounds checking?

void foo(char *buffer, size_t n) { if (n >= SIZE(buffer)) ERROR("overflow"); buffer[n] = 10; }

Needs metadata Branching check

Overhead!

5

slide-16
SLIDE 16

What is bounds checking?

void foo(char *buffer, size_t n) { if (n >= SIZE(buffer)) ERROR("overflow"); buffer[n] = 10; }

Needs metadata Branching check

Overhead!

Efficient solution: pointer tagging

5

slide-17
SLIDE 17

What is bounds checking?

void foo(char *buffer, size_t n) { if (n >= SIZE(buffer)) ERROR("overflow"); buffer[n] = 10; }

Needs metadata

Branching check

Overhead!

Efficient solution: pointer tagging

Still slow

5
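The check on these slides relies on a `SIZE(buffer)` lookup that plain C does not provide. As a minimal sketch of what such an inserted check costs, assume the size travels next to the pointer in a small struct. The names `bounded_buf` and `checked_store` are illustrative and do not come from the talk.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": the base address plus the metadata (size)
 * that a bounds checker has to keep somewhere. */
struct bounded_buf {
    char  *base;
    size_t size;
};

/* Store `value` at index n. Returns false instead of writing when n is
 * out of bounds; this is the branching check the slides refer to. */
bool checked_store(struct bounded_buf buf, size_t n, char value)
{
    if (n >= buf.size)      /* metadata read + compare + branch */
        return false;
    buf.base[n] = value;
    return true;
}
```

Every store pays for a metadata read and a conditional branch, which are the two costs labelled "Needs metadata" and "Branching check" above.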

slide-18
SLIDE 18

Our approach: Delta Pointers

◮ Use pointer tagging

◮ No memory access for metadata lookup

◮ No need for branches

◮ Delegate checks to (off-the-shelf) hardware instead

◮ Focus on common case: upper bound on x86-64

◮ Mitigates all CVEs reported by related work

6

slide-21
SLIDE 21

Our approach: Delta Pointers

Runtime overhead (%):

MPX               139%
SGXBounds          94%
ASan               80%
BaggyBounds        72%
Low-Fat Pointers   64%
Delta Pointers     35%

7

slide-22
SLIDE 22

Regular pointers

00 e8 02 0c 40 10 00 00

virtual address ext

48 bit 16 bit

8

slide-23
SLIDE 23

Regular pointers

00 e8 02 0c 40 10 00 00

virtual address ext

48 bit 16 bit

Upper 16 bits must be zero

8

slide-24
SLIDE 24

Regular pointers

00 e8 02 0c 40 10 00 00

virtual address ext

48 bit 16 bit

00 e8 02 0c 40 10 00 01

virtual address ext

48 bit 16 bit

Non-canonical, MMU faults!

8
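The fault on this slide comes from the x86-64 canonical-address rule. A small sketch of that rule, assuming 48-bit virtual addresses (4-level paging): bits 63..47 must all equal bit 47, which for user-space addresses means the upper 16 bits are zero. The function name `is_canonical48` and the test values in the usage note are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Canonical form with 48-bit virtual addresses: bits 63..47 must all be
 * copies of bit 47, so they are either all zero or all one. */
bool is_canonical48(uint64_t addr)
{
    uint64_t upper = addr >> 47;            /* the 17 bits 63..47 */
    return upper == 0 || upper == 0x1ffff;
}
```

For example, `is_canonical48(0x00007fffffffffffULL)` holds, while any value with a stray bit in the upper 16 bits, such as `0x0001000000000000ULL`, does not, and dereferencing it would fault.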

slide-25
SLIDE 25

Tagged pointers

00 e8 02 0c 40 10 12 34

virtual address tag

48 bit 16 bit

Encode information in unused bits!

9

slide-26
SLIDE 26

Tagged pointers

02 0c 40 10 12 34 56 78

virtual address tag

32 bit 32 bit

Shrink address space for bigger tags

9

slide-27
SLIDE 27

Delta Pointers

02 0c 40 10 00 00 00 18

virtual address tag

32 bit 32 bit

size Size=24

10

slide-28
SLIDE 28

Delta Pointers

02 0c 40 1c 00 00 00 18

virtual address tag

32 bit 32 bit

size Size=24

What about internal pointers?

10

slide-29
SLIDE 29

Delta Pointers

02 0c 40 1c 00 00 00 0c

virtual address tag

32 bit 32 bit

distance

Check upper bound for any pointer

Distance=12

10

slide-30
SLIDE 30

Delta Pointers

02 0c 40 1c ff ff ff f4

virtual address tag

32 bit 32 bit

-distance

10

slide-31
SLIDE 31

Delta Pointers

02 0c 40 1c 7f ff ff f4

virtual address tag

32 bit 32 bit

-distance

overflow bit

Set to 1 if out-of-bounds

10

slide-32
SLIDE 32

Delta Pointers

02 0c 40 1c 7f ff ff f4

virtual address delta tag

32 bit 32 bit

-distance

overflow bit

Set to 1 if out-of-bounds

10

slide-33
SLIDE 33

Instrumentation

02 0c 40 10 0 00 00 00 00

char *p = malloc(24);

11

slide-34
SLIDE 34

Instrumentation

02 0c 40 10 0 00 00 00 00

char *p = malloc(24);

Distance=24

11

slide-35
SLIDE 35

Instrumentation

02 0c 40 10 0 00 00 00 00

char *p = malloc(24);

Distance=24

-distance

0 00 00 00 00

11

slide-36
SLIDE 36

Instrumentation

02 0c 40 10 0 00 00 00 00

char *p = malloc(24);

Distance=24

| (-24 << 32);

0 7f ff ff e8

-24

11
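The tagging step can be simulated on plain 64-bit integers. The sketch below assumes the layout shown on the slides (bit 63 is the overflow bit, bits 62..32 hold the 31-bit delta, bits 31..0 hold the address) and an object size below 2 GB. The helper names `dp_tag` and `dp_remaining` are illustrative, not taken from the prototype.

```c
#include <stdint.h>

#define DP_ADDR_BITS  32
#define DP_DELTA_MASK 0x7fffffffULL  /* 31-bit delta field, bits 62..32 */

/* Tag a 32-bit address with the negated object size, truncated to 31 bits,
 * so the overflow bit (bit 63) starts out as 0. */
uint64_t dp_tag(uint32_t addr, uint32_t size)
{
    uint64_t delta = (0 - (uint64_t)size) & DP_DELTA_MASK;
    return (delta << DP_ADDR_BITS) | addr;
}

/* Bytes left before the upper bound; only meaningful while the overflow
 * bit is clear. */
uint32_t dp_remaining(uint64_t p)
{
    uint64_t delta = (p >> DP_ADDR_BITS) & DP_DELTA_MASK;
    return (uint32_t)((0x80000000ULL - delta) & DP_DELTA_MASK);
}
```

With the slide's values, `dp_tag(0x020c4010, 24)` yields `0x7fffffe8020c4010`: the address `02 0c 40 10` in the low half and the tag `7f ff ff e8` in the high half.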

slide-37
SLIDE 37

Instrumentation

02 0c 40 27 0 7f ff ff e8

p += 23;

+23

11

slide-38
SLIDE 38

Instrumentation

02 0c 40 27 0 7f ff ff e8

p += 23;

+23

Replicate arithmetic on tag

11

slide-39
SLIDE 39

Instrumentation

02 0c 40 27 0 7f ff ff e8

p += 23;

+23

+ (23 << 32);

+23 0 7f ff ff ff

Distance=1

11

slide-40
SLIDE 40

Instrumentation

02 0c 40 28 1 00 00 00 00

carry

p += 1 + (1 << 32);

+1 +1

Distance=0, overflowed!

11

slide-41
SLIDE 41

Instrumentation

02 0c 40 27 0 7f ff ff ff

p += -1 + (-1 << 32);

-1 -1

carry

Distance=1, in-bounds again

11

slide-42
SLIDE 42

Instrumentation

02 0c 40 27 0 7f ff ff ff

p += -1 + (-1 << 32);

-1 -1

carry

Distance=1, in-bounds again

One operation!

11
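The pointer arithmetic on the preceding slides can be replayed on 64-bit integers. This is a sketch under the same assumed layout (overflow bit 63, delta in bits 62..32, address in bits 31..0), and it also assumes the address half itself never leaves the 32-bit range. `dp_add` and `dp_overflowed` are illustrative names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Apply a signed byte offset to the address and to the tag with a single
 * 64-bit addition: p += offset + (offset << 32). */
uint64_t dp_add(uint64_t p, int32_t offset)
{
    uint64_t off = (uint64_t)(int64_t)offset;   /* sign-extend to 64 bits */
    return p + off + (off << 32);
}

/* The overflow bit is set once the pointer has passed the upper bound. */
bool dp_overflowed(uint64_t p)
{
    return (p >> 63) != 0;
}
```

Starting from `0x7fffffe8020c4010` (the tagged result of `malloc(24)`), adding 23 gives `0x7fffffff020c4027`, adding 1 more gives `0x80000000020c4028` with the overflow bit set, and adding -1 returns to `0x7fffffff020c4027`, matching the byte patterns on the slides.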

slide-43
SLIDE 43

Dereferencing an in-bounds pointer

02 0c 40 27 0 7f ff ff ff

12

slide-44
SLIDE 44

Dereferencing an in-bounds pointer

02 0c 40 27 0 7f ff ff ff

ff ff ff ff 1 00 00 00 00

&

Strips away distance

12

slide-45
SLIDE 45

Dereferencing an in-bounds pointer

02 0c 40 27 0 7f ff ff ff

ff ff ff ff 1 00 00 00 00

&

02 0c 40 27 0 00 00 00 00

12

slide-46
SLIDE 46

Dereferencing an in-bounds pointer

02 0c 40 27 0 7f ff ff ff

ff ff ff ff 1 00 00 00 00

&

02 0c 40 27 0 00 00 00 00

Normal (in-bounds) pointer, access OK!

12

slide-47
SLIDE 47

Dereferencing an out-of-bounds pointer

02 0c 40 2c 1 00 00 00 04

13

slide-48
SLIDE 48

Dereferencing an out-of-bounds pointer

02 0c 40 2c 1 00 00 00 04

ff ff ff ff 1 00 00 00 00

&

13

slide-49
SLIDE 49

Dereferencing an out-of-bounds pointer

02 0c 40 2c 1 00 00 00 04

ff ff ff ff 1 00 00 00 00

&

02 0c 40 2c 1 00 00 00 00

13

slide-50
SLIDE 50

Dereferencing an out-of-bounds pointer

02 0c 40 2c 1 00 00 00 04

ff ff ff ff 1 00 00 00 00

&

Non-canonical pointer, MMU faults!

02 0c 40 2c 1 00 00 00 00

13
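The masking step before a load or store can be sketched in the same integer simulation. The mask keeps the 32 address bits and the overflow bit and clears the 31-bit delta, as on the slides. `DP_DEREF_MASK`, `dp_mask`, and `dp_would_fault` are illustrative names, and the fault is only modelled here, not triggered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Keep the address (bits 31..0) and the overflow bit (bit 63);
 * clear the delta field (bits 62..32). */
#define DP_DEREF_MASK 0x80000000ffffffffULL

uint64_t dp_mask(uint64_t p)
{
    return p & DP_DEREF_MASK;
}

/* After masking, any bit left in the upper half can only be the overflow
 * bit, which makes the address non-canonical on x86-64. */
bool dp_would_fault(uint64_t p)
{
    return (dp_mask(p) >> 32) != 0;
}
```

The in-bounds pointer `0x7fffffff020c4027` masks to the ordinary address `0x00000000020c4027`. The out-of-bounds pointer `0x80000004020c402c` masks to `0x80000000020c402c`, which the MMU rejects without any compare or branch in the instrumented code.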

slide-51
SLIDE 51

Implementation

◮ LLVM-based prototype for C/C++

◮ Stack + heap + globals

◮ 32-bit address → 4GB address space

◮ 31-bit distance → 2GB allocations

◮ Instrument NULL pointer with distance = −1

◮ Optimizations: omit instrumentation on in-bounds pointers

14

slide-52
SLIDE 52

Pointer tagging breaks things

◮ Uninstrumented libraries

// strdup(ptr);
TAG(strdup(MASK(ptr)));

◮ Non-zero NULL pointer

◮ Subtraction, addition, multiplication, vectors, etc.

◮ Incomplete type information (e.g., unions)

◮ Compiler quirks

◮ . . . and more

◮ Solved with TBAA + def-use chain analysis

◮ Details in paper

15

slide-58
SLIDE 58

Evaluation

16

slide-59
SLIDE 59

Nginx

[Figure: latency (ms) against throughput (x1000 reqs/s), Baseline vs. Delta Pointers]

3-6% (I/O bound)

17

slide-60
SLIDE 60

SPEC CPU2006 (C/C++)

[Figure: per-benchmark overhead (0% to 100%), split into masking, tagging, and arithmetic. Benchmarks: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 433.milc, 444.namd, 445.gobmk, 447.dealII, 450.soplex, 453.povray, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 470.lbm, 471.omnetpp, 473.astar, 482.sphinx3, 483.xalancbmk]

35% geomean with optimizations

18

slide-61
SLIDE 61

Is that any good?

19

slide-62
SLIDE 62

Is that any good? Is it better than branches?

19

slide-63
SLIDE 63

Is that any good? Is it better than branches?

Branching implementation: 48% overhead

19

slide-64
SLIDE 64

Is that any good? Is it better than branches?

Branching implementation: 48% overhead > 35%!

19

slide-65
SLIDE 65

Is that any good? Is it better than branches?

Branching implementation: 48% overhead > 35%!

Yes

19

slide-66
SLIDE 66

Conclusion

◮ Reliable pointer tagging implementation

◮ We can check (upper) bounds without checks

◮ Faster than existing solutions

https://github.com/vusec/deltapointers

VUSec

20

slide-67
SLIDE 67

Related work

System          C++  Metadata  Checks  Passing OoB pointers  Non-linear  Runtime  Memory
Softbound       ✗    Table     Deref   ✓                     ✓           67%      64%
Baggy Bounds    ✗    Layout    Arith   ✓ (a)                 ✓           72%      11%
PAriCheck       ✗    Shadow    Arith   ✓                     ✓ (b)       96%      18%
LBC             ✗    Shadow    Deref   ✓                     ✗           22%      7.7%
ASan            ✓    Shadow    Deref   ✓                     ✗           80%      237%
Intel MPX       ✓    Table     Deref   ✓                     ✓           139%     90%
LowFat          ✓    Layout    Deref   ✗                     ✓           54%      5.2%
SGXBounds       ✓    Tag       Deref   ✓                     ✓           89%      0.1%
Delta Pointers  ✓    Tag       -       ✓                     ✓           35%      0%

(a) Only up to alloc size/2 on 32-bit.
(b) Unless wrap-around on 16-bit labels occurs.

21

slide-68
SLIDE 68

Impact of optimization with static analysis

[Figure: per-benchmark overhead (0% to 100%), unoptimized vs. optimized. Benchmarks: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 433.milc, 444.namd, 445.gobmk, 447.dealII, 450.soplex, 453.povray, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 470.lbm, 471.omnetpp, 473.astar, 482.sphinx3, 483.xalancbmk]

41% ⇒ 35%

22

slide-69
SLIDE 69

Statistics

◮ 72% of SPEC offsets are dynamic

◮ 80% increase in code size with Delta Pointers

23

slide-70
SLIDE 70

Branching implementation

void foo(int n) {
    char *buffer = malloc(24);
    char *p = buffer + n;
    *p = 'x';
}

24

slide-71
SLIDE 71

Branching implementation

Store end pointer distance in tag

void foo(int n) {
    char *buffer = malloc(24);
    buffer |= (buffer + 24) << 32;
    char *p = buffer + n;
    *p = 'x';
}

24

slide-72
SLIDE 72

Branching implementation

Extract tag on load/store

void foo(int n) {
    char *buffer = malloc(24);
    buffer |= (buffer + 24) << 32;
    char *p = buffer + n;
    tag = p >> 32;
    p = p & 0xffffffff;
    *p = 'x';
}

24

slide-73
SLIDE 73

Branching implementation

Branching check

void foo(int n) {
    char *buffer = malloc(24);
    buffer |= (buffer + 24) << 32;
    char *p = buffer + n;
    tag = p >> 32;
    p = p & 0xffffffff;
    if (p >= tag) ERROR("overflow");
    *p = 'x';
}

24
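The slide code mixes pointer and integer operations and would not compile as written. A compilable approximation of the branching variant, simulated on 64-bit integers, might look like the following. It assumes that the address and the end pointer both fit in 32 bits, and the `bp_*` names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Store the end pointer (one past the last byte) in the upper 32 bits. */
uint64_t bp_tag(uint32_t addr, uint32_t size)
{
    uint64_t end = (uint64_t)addr + size;
    return (end << 32) | addr;
}

/* Pointer arithmetic only touches the address half; the tag is constant. */
uint64_t bp_add(uint64_t p, int32_t n)
{
    uint32_t addr = (uint32_t)p + (uint32_t)n;
    return (p & 0xffffffff00000000ULL) | addr;
}

/* Extract the tag, compare, and branch. On success the plain address is
 * written to *addr_out; on failure it is left untouched. */
bool bp_check(uint64_t p, uint32_t *addr_out)
{
    uint32_t end  = (uint32_t)(p >> 32);
    uint32_t addr = (uint32_t)(p & 0xffffffffULL);
    if (addr >= end)
        return false;       /* the ERROR("overflow") path on the slide */
    *addr_out = addr;
    return true;
}
```

Unlike the delta encoding, every access here needs an explicit compare and branch, which is the difference the 48% versus 35% comparison measures.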

slide-74
SLIDE 74

Some pointer tagging challenges

◮ Some operations need masking to preserve semantics

char a[10];
// size_t len = &a[10] - &a[0];
size_t len = MASK(&a[10]) - MASK(&a[0]);

◮ Pointers that look like integers

union { char *buf; uint64_t foo; } field;
field.buf += 42; // should instrument
field.foo += 42; // should NOT

25
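To see why the subtraction needs masking, consider the two tagged values for `&a[0]` and `&a[10]` in an integer simulation. The sketch assumes the delta layout from the earlier slides (tag in the upper 32 bits), and `dp_diff_raw` and `dp_diff_masked` are illustrative names.

```c
#include <stdint.h>

#define DP_ADDR_MASK 0x00000000ffffffffULL

/* Subtracting tagged values directly also subtracts the tags. */
uint64_t dp_diff_raw(uint64_t a, uint64_t b)
{
    return a - b;
}

/* Masking both operands first recovers the ordinary pointer difference. */
int64_t dp_diff_masked(uint64_t a, uint64_t b)
{
    return (int64_t)(a & DP_ADDR_MASK) - (int64_t)(b & DP_ADDR_MASK);
}
```

For a 10-byte array at the made-up address `0x1000`, `&a[0]` is tagged as `0x7ffffff600001000` and `&a[10]` as `0x800000000000100a`. The raw difference is `0x0000000a0000000a`, because the tags differ by 10 as well, while the masked difference is the expected 10.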