Safely Optimizing Casts between Pointers and Integers


slide-1
SLIDE 1

Safely Optimizing Casts between Pointers and Integers

EuroLLVM’19 SRC

Juneyoung Lee, Chung-Kil Hur (Seoul National Univ.)
Ralf Jung (MPI-SWS)
Zhengyang Liu, John Regehr (University of Utah)
Nuno P. Lopes (Microsoft Research)

slide-2
SLIDE 2

Overview

             Assembly (x86-64, ARM, ..)   LLVM IR
    Pointer  [0, 2^64)                    [0, 2^64) + provenance
    Integer  [0, 2^64)                    [0, 2^64) + ?

slide-3
SLIDE 3

Problem: Pointer as a Pure Integer

We use C syntax for LLVM IR code for readability.

    char p[1], q[1] = {0};
    int ip = (int)(p+1);
    int iq = (int)q;
    if (iq == ip) {
      *(p+1) = 10;
      print(q[0]);
    }

* https://godbolt.org/z/9eNt6w

slide-16
SLIDE 16

Problem: Pointer as a Pure Integer

Source program, with memory starting at 0x0, p allocated at 0x100 and q at 0x101:

    char p[1], q[1] = {0};   // p = 0x100, q = 0x101
    int ip = (int)(p+1);
    int iq = (int)q;
    if (iq == ip) {          // true
      *(p+1) = 10;           // p+1 = 0x101, so 10 is stored into q[0]
      print(q[0]);           // prints 10
    }

After constant prop.:

    char p[1], q[1] = {0};
    int ip = (int)(p+1);
    int iq = (int)q;
    if (iq == ip) {
      *(p+1) = 10;
      print(0);
    }

Problem with “pointer as a pure integer”: it cannot protect accesses from different blocks.

* https://godbolt.org/z/9eNt6w
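
The walkthrough above can be replayed with a toy model in which memory is a flat map from addresses to bytes and a pointer is nothing but an address. This is only a sketch under that assumption; the function names are hypothetical and the addresses 0x100 and 0x101 are taken from the slide.

```python
# Toy "pointer as a pure integer" model: a pointer is just an address.
P, Q = 0x100, 0x101          # p[0] lives at 0x100, q[0] at 0x101


def run_source():
    """Source program: print(q[0]) after the store through p+1."""
    mem = {P: 0, Q: 0}
    ip, iq = P + 1, Q
    if iq == ip:
        mem[P + 1] = 10      # *(p+1) = 10 lands on q[0]
        return mem[Q]        # print(q[0])
    return None


def run_after_constant_prop():
    """Same program after constant prop. replaced q[0] with 0."""
    mem = {P: 0, Q: 0}
    ip, iq = P + 1, Q
    if iq == ip:
        mem[P + 1] = 10
        return 0             # print(0)
    return None
```

In this model the two programs disagree (10 versus 0), which is why constant propagation cannot be justified if pointers are pure integers.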

slide-22
SLIDE 22

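LLVM’s Solution: Pointers have Provenance

Each pointer carries its provenance together with its address: p = (p,0x100), q = (q,0x101).

Memory (starting at 0x0):

    0x100   p: -
    0x101   q: 0

Source program:

    char p[1], q[1] = {0};   // p = (p,0x100), q = (q,0x101)
    int ip = (int)(p+1);
    int iq = (int)q;
    if (iq == ip) {          // true
      *(p+1) = 10;           // p+1 = (p,0x101): Undefined Behavior because p ≠ q
      print(q[0]);
    }

After constant prop.:

    char p[1], q[1] = {0};
    int ip = (int)(p+1);
    int iq = (int)q;
    if (iq == ip) {
      *(p+1) = 10;
      print(0);
    }

The store goes through a pointer with provenance p to an address that belongs to q, so the source program has undefined behavior and constant prop. is justified.

* https://godbolt.org/z/9eNt6w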
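
As a rough illustration of the provenance check, the sketch below models a pointer as a (provenance, address) pair and treats a store as defined only when the address lies inside the block named by the provenance. The class and function names are hypothetical, and the rule is a simplification of LLVM's actual semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    prov: str   # name of the block this pointer was derived from
    addr: int


# Layout from the slide: block name -> (base address, size in bytes).
BLOCKS = {"p": (0x100, 1), "q": (0x101, 1)}


def store_is_defined(ptr: Ptr) -> bool:
    """A one-byte store is defined only inside the pointer's own block."""
    base, size = BLOCKS[ptr.prov]
    return base <= ptr.addr < base + size


p = Ptr("p", 0x100)
q = Ptr("q", 0x101)
p_plus_1 = Ptr(p.prov, p.addr + 1)   # (p,0x101): same address as q, provenance p
```

Here `p_plus_1` and `q` hold the same address but are different pointers, and only the store through `q` is defined.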

slide-23
SLIDE 23

What about Integers?

             Assembly (x86-64, ARM, ..)   LLVM IR
    Pointer  [0, 2^64)                    [0, 2^64) + provenance
    Integer  [0, 2^64)                    [0, 2^64) + ?

Pointers and integers are connected by casting.

slide-36
SLIDE 36

Miscompilation with PtrIntCast

All four programs start with the same declarations:

    char p[1], q[1] = {0};
    int ip = (int)(p+1);
    int iq = (int)q;

Source program (prints 10):

    if (iq == ip) { *(char*)iq = 10; print(q[0]); }

After int. eq. prop.:

    if (iq == ip) { *(char*)(int)(p+1) = 10; print(q[0]); }

After cast elim.:

    if (iq == ip) { *(p+1) = 10; print(q[0]); }

After constant prop.:

    if (iq == ip) { *(p+1) = 10; print(0); }

We found this miscompilation bug in both LLVM & GCC.

slide-37
SLIDE 37

Which pass is responsible for it?

slide-38
SLIDE 38

Problem depends on the model

The chain of transformations, shown on the body of the if (the declarations of p, q, ip, and iq are unchanged throughout):

    (1)  *(char*)iq = 10;           print(q[0]);
    (2)  *(char*)(int)(p+1) = 10;   print(q[0]);    (1) → (2): int. eq. prop.
    (3)  *(p+1) = 10;               print(q[0]);    (2) → (3): cast elim.
    (4)  *(p+1) = 10;               print(0);       (3) → (4): constant prop.

  • Integer with provenance cannot explain int. eq. prop.
  • Integer without provenance cannot explain cast elim.

slide-42
SLIDE 42

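Integer-With-Provenance Model

Before int. eq. prop.:

    char p[1], q[1] = {0};
    int ip = (int)(p+1);
    int iq = (int)q;
    if (iq == ip) {
      *(char*)iq = 10;              // has provenance q
      print(q[0]);
    }

After int. eq. prop.:

    char p[1], q[1] = {0};
    int ip = (int)(p+1);
    int iq = (int)q;
    if (iq == ip) {
      *(char*)(int)(p+1) = 10;      // has provenance p
      print(q[0]);
    }

Replacing iq with the equal integer (int)(p+1) changes the provenance of the stored-to address from q to p, so the transformation cannot be explained in this model.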

slide-45
SLIDE 45

Integer-Without-Provenance Model

Before cast elim.:

    char p[1], q[1] = {0};
    int ip = (int)(p+1);
    int iq = (int)q;
    if (iq == ip) {
      *(char*)(int)(p+1) = 10;      // provenance p removed
      print(q[0]);
    }

After cast elim.:

    char p[1], q[1] = {0};
    int ip = (int)(p+1);
    int iq = (int)q;
    if (iq == ip) {
      *(p+1) = 10;                  // provenance p remains
      print(q[0]);
    }

slide-48
SLIDE 48

Integer-With-Provenance is Unnatural

  • Hard to explain integer equality propagation
  • Hard to explain many other transformations as well

    r = (i + j) - k      →   r = i + (j - k)     prov. - prov.? prov. + prov.?
    r = (int)(float)j    →   r = j               provenance in float types?

slide-49
SLIDE 49

Our Suggestion [OOPSLA’18]: Integer-Without-Provenance Model

             Assembly (x86-64, ARM, ..)   LLVM IR
    Pointer  [0, 2^64)                    [0, 2^64) + provenance
    Integer  [0, 2^64)                    [0, 2^64)

slide-50
SLIDE 50

Integer-Without-Provenance Model

  • Semantics of Casts
  • Problematic Optimizations
  • How to Recover Performance?

slide-51
SLIDE 51

Semantics of Casts [OOPSLA’18]

  • 1. Pointer-to-integer casts remove provenance
  • 2. Integer-to-pointer casts gain full provenance

How to regain protection from unknown accesses?
By exploiting nondeterministic allocation.

How to perform in-bounds checking on full-provenance pointers?
By recording in-bounds offsets at the pointer & checking them when dereferenced.
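
The two cast rules can be sketched with a toy pointer type. The model below is an illustration only: the names `Ptr`, `ptrtoint`, `inttoptr`, `can_access`, and the block layout are hypothetical, and it leaves out nondeterministic allocation and the recorded in-bounds offsets mentioned above.

```python
from dataclasses import dataclass

FULL = "full"  # provenance of a pointer produced by an integer-to-pointer cast


@dataclass(frozen=True)
class Ptr:
    prov: str   # block name, or FULL
    addr: int


# Hypothetical layout: block name -> (base address, size in bytes).
BLOCKS = {"p": (0x100, 1), "q": (0x101, 1)}


def ptrtoint(ptr: Ptr) -> int:
    return ptr.addr            # rule 1: the provenance is removed


def inttoptr(value: int) -> Ptr:
    return Ptr(FULL, value)    # rule 2: the result gains full provenance


def can_access(ptr: Ptr) -> bool:
    """True if a one-byte access through ptr is defined in this toy model."""
    candidates = BLOCKS.values() if ptr.prov == FULL else [BLOCKS[ptr.prov]]
    return any(base <= ptr.addr < base + size for base, size in candidates)


p_plus_1 = Ptr("p", 0x101)                  # p+1, one past the end of p
round_trip = inttoptr(ptrtoint(p_plus_1))   # (char*)(int)(p+1)
```

In this sketch `round_trip` may access q while `p_plus_1` may not, which is the difference that makes removing the round-trip cast observable.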

slide-54
SLIDE 54

Optimizations Unsound in Our Model

  • 1. Cast Elimination

    Before:  p2 = (char*)(int)p              // full provenance
    After:   p2 = p                          // provenance p

  • 2. Integer Comparison to Pointer Comparison

    Before:  c = icmp eq (int)p, (int)q      // comparison of integers
    After:   c = icmp eq p, q                // comparison of pointers

slide-55
SLIDE 55

Performance Issue

  • Cast elimination removes a significant portion of casts
      • 13% of ptrtoints, 40% of inttoptrs from C/C++ benchmarks *
  • Disabling cast elimination hinders other optimizations
      • ptrtoint makes variables escaped
      • inttoptr is regarded as pointing to an unknown object
  • Disabling cast elimination causes slowdown
      • 1% slowdown in perlbench_r, blender_r

* SPEC2017rate + LLVM test-suite, -O3

slide-56
SLIDE 56

Our Solution

  • 1. Do not generate Ptr↔Int casts in the first place
      • 86% of Ptr↔Int casts are introduced by LLVM, not by programmers
          • Ptr→Int casts are generated from pointer subtractions
          • Int→Ptr casts are from canonicalizing loads/stores as int types
      • How: by introducing new features
  • 2. Allow the previous optimizations conditionally
      • How: by developing an analyzer to check such conditions

slide-57
SLIDE 57

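To reduce Ptr→Int Casts: Introduce Pointer Subtraction Operation

Before Fix (Uses ptrtoint):

    ip = ptrtoint p
    iq = ptrtoint q
    i = ip - iq

After Fix (Uses psub):

    i = psub p, q

\[
\mathrm{psub}\ p, q \;\stackrel{\text{def}}{=}\;
\begin{cases}
p - q & \text{if } \mathrm{prov}(p) = \mathrm{prov}(q) \,\lor\, \mathrm{prov}(p) = \text{full} \,\lor\, \mathrm{prov}(q) = \text{full} \\
\text{poison} & \text{otherwise}
\end{cases}
\]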
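
The definition of psub translates almost directly into code. The following is a minimal sketch, assuming pointers are modeled as (provenance, address) tuples and poison as a sentinel object; both representations are illustrative choices, not part of the proposal.

```python
POISON = object()   # sentinel standing in for LLVM's poison value
FULL = "full"       # provenance of a pointer obtained from inttoptr


def psub(p, q):
    """Pointer subtraction on (provenance, address) pairs."""
    p_prov, p_addr = p
    q_prov, q_addr = q
    if p_prov == q_prov or p_prov == FULL or q_prov == FULL:
        return p_addr - q_addr
    return POISON
```

For example, `psub(("p", 0x104), ("p", 0x100))` yields 4, while subtracting pointers into two different blocks yields `POISON`.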

slide-62
SLIDE 62

To reduce Int→Ptr Casts: Stop Canonicalizing Loads/Stores as Ints

Two loads from the same address with different types:

    v  = load i64* p
    v2 = load i8** p

Canonicalized as an integer load, which introduces inttoptr:

    v  = load i64* p
    v2 = inttoptr v

Both loaded as pointers:

    v  = load i8** p
    v2 = load i8** p

* https://godbolt.org/z/y48Mkt

Use ‘d64’ (data type) instead:

           Has Provenance   Supports Integer operations
    d64    Yes              No
    i64    No               Yes

Unlike a cast between int↔ptr, a cast between d64↔ptr preserves provenance.

slide-63
SLIDE 63

Conditionally Allowing Cast Elimination

    p2 = inttoptr(ptrtoint p)
    c = icmp eq/ne p2, q
  →
    c = icmp eq/ne p, q

    // p and q have same underlying object
    p2 = inttoptr(ptrtoint p)
    c = psub p2, q
  →
    c = psub p, q

  • More examples & descriptions are listed at https://github.com/aqjune/eurollvm19

slide-64
SLIDE 64

Evaluation: the # of Casts

                          Baseline      No Cast   Reduce Cast    Conditionally
                          (LLVM 8.0)    Fold      Introduction   Fold
    Before O3
      # of ptrtoints      44K           44K       14K            14K
      # of inttoptrs      1.5K          1.5K      1.5K           1.5K
    After O3
      # of ptrtoints      57K           66K       11K            11K
      # of inttoptrs      29K           45K       5K             4.8K

    No Cast Fold: Disable unsound opts.
    Reduce Cast Introduction: Add psub, stop load/store to int
    Conditionally Fold: Conditionally allow cast elim.

  • C/C++ benchmarks of SPEC2017rate + LLVM Nightly Tests used
  • 81% of ptrtoints / 83% of inttoptrs removed (compared to baseline)
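
The two percentages follow from the "After O3" counts in the table (baseline versus conditional folding). A quick check of that arithmetic, using the rounded counts from the slide and hypothetical variable names:

```python
# "After O3" counts from the table, in number of casts (rounded as on the slide).
baseline = {"ptrtoint": 57_000, "inttoptr": 29_000}      # Baseline (LLVM 8.0)
conditional = {"ptrtoint": 11_000, "inttoptr": 4_800}    # Conditionally Fold


def removed_percent(before: int, after: int) -> int:
    """Share of casts removed relative to the baseline, rounded to a whole percent."""
    return round(100 * (before - after) / before)


removed = {kind: removed_percent(baseline[kind], conditional[kind]) for kind in baseline}
```

With these inputs `removed` is `{"ptrtoint": 81, "inttoptr": 83}`, matching the figures quoted on the slide.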
slide-65
SLIDE 65

Evaluation: Performance Impact

  • LLVM Nightly Tests (C/C++): ~0.1% avg. slowdown (-1% ~ 3.6%)

[Figure: SPEC2017rate speedup in percent on i5-6600 and i7-7700. Positive number means faster.]

slide-67
SLIDE 67

Conclusion

  • Provenance helps compiler do more optimizations on pointers
  • Integer with provenance works badly with integer optimizations
  • We suggest separating pointers/integers conceptually
  • We show how to regain performance after removing invalid optimizations

https://github.com/aqjune/eurollvm19

We’re updating Alive to support pointer-integer casts!

    PROGRAM:
    Name: ptrintload3
    ENTRY:
      v16 = ptrtoint i8* p1 to i16
      p2 = inttoptr i16 v16 to i8*
      v2 = load i8* p2
      v1 = load i8* p1
    PRECONDS:
      Instruction "v2 = load i8* p2" has no UB.
    CHECK:
      Instruction "v1 = load i8* p1" has no UB?
      v1 === v2?
    Result: INCORRECT

slide-69
SLIDE 69

supplementary slides

slide-71
SLIDE 71

Constant Propagation and Readonly function

    char p[1], q[1] = {0};
    if (foo(p, q)) {       // readonly
      *(p+i) = 10;
      print(q[0]);
    }

After constant prop.:

    char p[1], q[1] = {0};
    if (foo(p, q)) {       // readonly
      *(p+i) = 10;
      print(0);
    }

foo: return (int)(p+1) == (int)q? 1?

slide-72
SLIDE 72

Integer Equality Propagation and Performance

  • Performed by many optimizations
      • CVP, Instruction Simplify, GVN, Loop Exit Value Rewrite, …
  • Reduces code size
      • 10% in minisat, -6% in smg2000, -4% in simple_types_constant_folding, …
  • Boosts performance in small benchmarks
      • x2000 speedup in nestedloop

slide-73
SLIDE 73

Sound Optimizations that are already in LLVM

    select (p==null), p, null    →   null                       // null=(void*)0
    gep(p, -(int)q)              →   (void*)((int)p-(int)q)

Rationale: It is safe to replace p with (void*)(int)p.

slide-74
SLIDE 74

Delayed Inbounds Checking

    p  = (char*)0x100          // p  = (0x100,*)
    p2 = gep p, 1              // p2 = (0x101,*)
    p3 = gep inbounds p, 1     // p3 = (0x101,*,{0x100,0x101})
    load p3                    // 0x100, 0x101 should be in-bounds addrs
                               // of the object at 0x101
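
The delayed check can be sketched as follows. This toy model assumes a pointer is a triple of address, provenance, and a set of recorded addresses, that `gep inbounds` only records addresses, and that the check runs at the load. The names and the one-past-the-end rule are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ptr:
    addr: int
    prov: str = "*"                      # "*" stands for full provenance
    inbounds: frozenset = frozenset()    # addresses that must turn out in-bounds


def gep(ptr: Ptr, offset: int) -> Ptr:
    return Ptr(ptr.addr + offset, ptr.prov, ptr.inbounds)


def gep_inbounds(ptr: Ptr, offset: int) -> Ptr:
    # Record the base and the result; nothing is checked yet.
    recorded = ptr.inbounds | {ptr.addr, ptr.addr + offset}
    return Ptr(ptr.addr + offset, ptr.prov, recorded)


def load_is_defined(ptr: Ptr, blocks) -> bool:
    """blocks: list of (base, size) pairs. The delayed check happens here."""
    for base, size in blocks:
        if base <= ptr.addr < base + size:
            # Assumption: one-past-the-end counts as in-bounds.
            return all(base <= a <= base + size for a in ptr.inbounds)
    return False


p = Ptr(0x100)
p2 = gep(p, 1)
p3 = gep_inbounds(p, 1)
```

With a two-byte object at 0x100 the load through `p3` is defined; with a one-byte object at 0x101 it is not, because 0x100 was recorded but lies outside that object, whereas the load through `p2` carries no such obligation.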