Taming Undefined Behavior in LLVM

PLDI 2017, Barcelona. Taming Undefined Behavior in LLVM. Juneyoung Lee, Yoonseung Kim, Youngju Song, Chung-Kil Hur (Seoul National Univ.); Sanjoy Das (Azul Systems); David Majnemer (Google); John Regehr (University of Utah); Nuno P. Lopes (Microsoft)


SLIDE 1

Taming Undefined Behavior in LLVM

PLDI 2017, Barcelona

Nuno P. Lopes (Microsoft Research)
Juneyoung Lee, Yoonseung Kim, Youngju Song, Chung-Kil Hur (Seoul National Univ.)
Sanjoy Das (Azul Systems)
David Majnemer (Google)
John Regehr (University of Utah)

SLIDE 2

What this talk is about

  • A compiler IR (Intermediate Representation) can be designed to allow more optimizations by supporting “undefined behaviors (UBs)”
  • LLVM IR’s UB model
      • Complicated
      • Invalidates some textbook optimizations
  • Our new UB model
      • Simpler
      • Can validate textbook optimizations (and more)

SLIDE 3

Undefined Behavior (UB) & Problems

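SLIDE 9

Motivation for UB

Peephole Optimization (int* p, int a, int b)

  • Before (IR): output(p + a > p + b)
  • After (IR): output(a > b)

With p = 0xFFFFFF00 and a = 0x100, p + a wraps to 0x0 (overflow!). The code before the optimization then outputs false, while the code after it outputs true.

Simple UB Model: Pointer Arithmetic Overflow is Undefined Behavior

Under this model the overflowing execution is UB, so the compiler may ignore it and the rewrite is justified.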
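The mismatch can be reproduced with a small model. This is only a sketch: it assumes a 32-bit address space represented by Python integers and assumes b = 0 (the slide does not show a value for b at this point). The helper names `ptr_add_wrapping`, `before` and `after` are made up for illustration.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit pointers


def ptr_add_wrapping(p: int, offset: int) -> int:
    """Pointer addition that silently wraps around at 32 bits."""
    return (p + offset) & MASK


def before(p: int, a: int, b: int) -> bool:
    """output(p + a > p + b), evaluated with wrapping arithmetic."""
    return ptr_add_wrapping(p, a) > ptr_add_wrapping(p, b)


def after(a: int, b: int) -> bool:
    """output(a > b), the result of the peephole rewrite."""
    return a > b


p, a, b = 0xFFFFFF00, 0x100, 0x0  # b = 0 is an assumed value
print(before(p, a, b), after(a, b))  # prints: False True
```

For inputs that do not overflow, the two functions agree, which is why excluding the overflowing case is enough to justify the rewrite.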

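SLIDE 13

Problems with UB

Loop Invariant Code Motion

Before (IR):

  ...
  for(i=0; i<n; ++i) {
    a[i] = p + 0x100
  }

After (IR):

  q = p + 0x100
  for(i=0; i<n; ++i) {
    a[i] = q
  }

Simple UB Model: Pointer Arithmetic Overflow is Undefined Behavior

With p = 0xFFFFFF00, p + 0x100 overflows. After hoisting, the addition always executes, so the optimized code raises UB even when the loop body never runs and the original code is well defined.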
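A rough model of this problem, assuming 32-bit pointers and taking n = 0 as the case where the loop body never runs. `UndefinedBehavior`, `ptr_add`, `before`, `after` and `is_ub` are hypothetical names for this sketch; raising an exception only stands in for "anything may happen".

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit pointers


class UndefinedBehavior(Exception):
    """Stands in for 'anything may happen' in this model."""


def ptr_add(p: int, offset: int) -> int:
    """Simple UB model: overflowing pointer arithmetic is immediate UB."""
    result = p + offset
    if result > MASK:
        raise UndefinedBehavior("pointer arithmetic overflow")
    return result


def before(p: int, n: int) -> list:
    a = [0] * n
    for i in range(n):
        a[i] = ptr_add(p, 0x100)  # only evaluated if the loop runs
    return a


def after(p: int, n: int) -> list:
    a = [0] * n
    q = ptr_add(p, 0x100)  # hoisted: evaluated even when n == 0
    for i in range(n):
        a[i] = q
    return a


def is_ub(f, *args) -> bool:
    """True if running f(*args) hits UB in this model."""
    try:
        f(*args)
    except UndefinedBehavior:
        return True
    return False


print(is_ub(before, 0xFFFFFF00, 0), is_ub(after, 0xFFFFFF00, 0))  # prints: False True
```

The hoisted version has UB on an input where the original has none, so under the simple model the transformation is not a valid refinement.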

SLIDE 14

Existing Approaches

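SLIDE 17

Poison Value: A Deferred UB

Simple UB Model: Pointer Arithmetic Overflow is Undefined Behavior
LLVM’s UB Model: Pointer Arithmetic Overflow is A Poison “Value”

Loop Invariant Code Motion

Before (IR):

  ...
  for(i=0; i<n; ++i) {
    a[i] = p + 0x100
  }

After (IR):

  q = p + 0x100
  for(i=0; i<n; ++i) {
    a[i] = q
  }

With p = 0xFFFFFF00, the hoisted p + 0x100 overflows. Under the simple model that is UB. Under LLVM’s model q is a poison value, and UB is deferred until the poison is used in a way that raises it.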

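SLIDE 21

Poison Value: A Deferred UB

LLVM’s UB Model: Pointer Arithmetic Overflow is A Poison “Value”

  • Before (IR): output(p + a > p + b)
  • After (IR): output(a > b)

With p = 0xFFFFFF00, a = 0x100 and b = 0, p + a wraps to 0x0 (overflow!) and is poison. The poison reaches output, which raises UB, so the peephole optimization is still justified.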

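SLIDE 26

Summary of Poison

Expression tree for output(p + a > p + b), with p = 0xFFFFFF00 and a = 0x100:

  • p + a overflows and yields poison
  • Propagate: the comparison > with a poison operand yields poison
  • Raise UB: output(poison) is UB

“Poison is Sometimes Too Poisonous”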
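The propagate-then-raise behavior can be sketched as a tiny interpreter. This is an illustrative model under assumed 32-bit pointers, not LLVM's implementation; `POISON`, `ptr_add`, `icmp_ugt`, `output` and `licm_after` are hypothetical names.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit pointers


class UndefinedBehavior(Exception):
    """Stands in for 'anything may happen' in this model."""


class Poison:
    def __repr__(self) -> str:
        return "poison"


POISON = Poison()


def ptr_add(p, offset):
    """Overflow yields poison instead of immediate UB."""
    if p is POISON or offset is POISON:
        return POISON
    result = p + offset
    return POISON if result > MASK else result


def icmp_ugt(x, y):
    """Propagate: a comparison with a poison operand is poison."""
    if x is POISON or y is POISON:
        return POISON
    return x > y


def output(value):
    """Raise UB: a side-effecting use of poison is UB."""
    if value is POISON:
        raise UndefinedBehavior("output of poison")
    return value


def licm_after(p, n):
    """Hoisted loop from the LICM example: computing poison alone is not UB."""
    q = ptr_add(p, 0x100)
    return [q for _ in range(n)]


p, a, b = 0xFFFFFF00, 0x100, 0
print(icmp_ugt(ptr_add(p, a), ptr_add(p, b)))  # prints: poison
print(licm_after(p, 0))  # prints: []
```

Deferring UB this way keeps both earlier examples valid: the hoisted addition is harmless when its result is unused, and the peephole rewrite is justified because the overflowing case still ends in UB at output.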

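SLIDE 34

Problems with LLVM’s UB

Global Value Numbering (GVN)

  • Before: if (x == y) { .. use x .. }
  • After: if (x == y) { .. use y .. }

LLVM’s UB Model: Branching on poison is ???

If y is poison, the condition x == y is poison, and the rewritten body uses poison y where the original used x. The rewrite is justified only if the branch on the poison condition is already UB in both programs:

LLVM’s UB Model: Branching on poison is Undefined Behavior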
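A sketch of why GVN needs this rule, comparing two candidate meanings for a branch on poison. Everything here is a hypothetical model: `branch_ub` is the rule chosen on the slide, `branch_pick_true` is an assumed alternative in which the branch just picks a side, and `use` is assumed to be a side-effecting use that raises UB on poison.

```python
class UndefinedBehavior(Exception):
    """Stands in for 'anything may happen' in this model."""


class Poison:
    def __repr__(self) -> str:
        return "poison"


POISON = Poison()


def icmp_eq(x, y):
    if x is POISON or y is POISON:
        return POISON
    return x == y


def use(value):
    """Assumed side-effecting use: UB on poison."""
    if value is POISON:
        raise UndefinedBehavior("use of poison")
    return value


def branch_ub(cond):
    """Rule from the slide: branching on poison is UB."""
    if cond is POISON:
        raise UndefinedBehavior("branch on poison")
    return cond


def branch_pick_true(cond):
    """Assumed alternative: a poison condition just takes the true side."""
    return True if cond is POISON else cond


def before(x, y, branch):
    if branch(icmp_eq(x, y)):
        return use(x)


def after(x, y, branch):
    if branch(icmp_eq(x, y)):
        return use(y)  # GVN replaced x with y


def is_ub(f, *args) -> bool:
    try:
        f(*args)
    except UndefinedBehavior:
        return True
    return False


# x is defined, y is poison.
print(is_ub(before, 1, POISON, branch_pick_true), is_ub(after, 1, POISON, branch_pick_true))
print(is_ub(before, 1, POISON, branch_ub), is_ub(after, 1, POISON, branch_ub))
```

Under the alternative rule the rewritten program has UB where the original does not, so GVN would be unsound. Under the UB rule both programs are already undefined on that input, and the rewrite is acceptable.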

SLIDE 37

Problems with LLVM’s UB

Loop Unswitching (LU)

Before:

  while (n > 0) {
    if (cond) A
    else B
  }

After:

  if (cond)
    while (n > 0) { A }
  else
    while (n > 0) { B }

LLVM’s UB Model: Branching on poison is Undefined Behavior

If cond is poison and the loop never executes, the original code never branches on cond, but the unswitched code does, which is UB.

SLIDE 38

Inconsistency in LLVM

  • GVN + LU is inconsistent.
  • We found a miscompilation bug in LLVM due to the inconsistency (LLVM Bugzilla 31652).
      • It is being discussed in the community
      • No solution has been found yet

SLIDE 39

Our Approach

SLIDE 43

Overview

Existing Approaches

  • Value kinds (toward “More Defined”): UB, poison values, undef. values, defined values
  • Can’t Control Poison
  • Complex
  • Inconsistent (GVN + LU)

Our Approach

  • Value kinds (toward “More Defined”): UB, poison values, defined values
  • freeze: Can Control Poison
  • Simpler
  • Consistent

SLIDE 44

Key Idea: “Freeze”

  • Introduce a new instruction: y = freeze x
  • Semantics:
      • When x is a defined value: freeze x = x
      • When x is a poison value: freeze x = a nondeterministic choice of a defined value (1, 2, ...)
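The semantics above can be modeled in a few lines. This is a sketch, not the LLVM instruction itself: it assumes an 8-bit integer type and uses a pseudo-random generator to stand in for the nondeterministic choice; `POISON` and `freeze` are names chosen for the example.

```python
import random


class Poison:
    def __repr__(self) -> str:
        return "poison"


POISON = Poison()


def freeze(x, rng: random.Random, bits: int = 8):
    """Return x if it is defined; otherwise pick some defined value."""
    if x is POISON:
        # Assumption: any value of the (here 8-bit) type may be chosen.
        return rng.randrange(1 << bits)
    return x


rng = random.Random()
print(freeze(5, rng))  # prints: 5
y = freeze(POISON, rng)
# y is now an ordinary defined value: every later use sees the same number.
print(y == y)  # prints: True
```

The choice is made once, when freeze executes, so the result behaves like any other defined value afterwards.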

SLIDE 48

Our Solution

Loop Unswitching

Before:

  while (n > 0) {
    if (cond) A
    else B
  }

After:

  if (freeze(cond))
    while (n > 0) { A }
  else
    while (n > 0) { B }

Our UB Model: Branching on poison is Undefined Behavior

If cond is poison, freeze(cond) is nondeterministically true or false, so the hoisted branch is on a defined value and is not UB.
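A small model of the fix, under stated assumptions: the loop bodies A and B are reduced to recording a label, and the nondeterministic choice made by freeze is passed in explicitly as `pick` so both outcomes can be checked. All function names are hypothetical.

```python
class UndefinedBehavior(Exception):
    """Stands in for 'anything may happen' in this model."""


class Poison:
    def __repr__(self) -> str:
        return "poison"


POISON = Poison()


def branch(cond):
    """Branching on poison is UB."""
    if cond is POISON:
        raise UndefinedBehavior("branch on poison")
    return cond


def freeze(x, pick: bool):
    """pick models the nondeterministic choice made for poison."""
    return pick if x is POISON else x


def before(n: int, cond) -> list:
    trace = []
    while n > 0:
        trace.append("A" if branch(cond) else "B")
        n -= 1
    return trace


def unswitched(n: int, cond) -> list:
    if branch(cond):  # UB on poison, even if the loop would not run
        return ["A"] * max(n, 0)
    return ["B"] * max(n, 0)


def unswitched_freeze(n: int, cond, pick: bool) -> list:
    if branch(freeze(cond, pick)):  # always a defined condition
        return ["A"] * max(n, 0)
    return ["B"] * max(n, 0)


def is_ub(f, *args) -> bool:
    try:
        f(*args)
    except UndefinedBehavior:
        return True
    return False


print(is_ub(before, 0, POISON), is_ub(unswitched, 0, POISON))  # prints: False True
print(unswitched_freeze(0, POISON, True), unswitched_freeze(0, POISON, False))  # prints: [] []
```

When n > 0 and cond is poison, the original program is already UB, so either choice made by freeze is an acceptable refinement.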

SLIDE 50

Summary of Freeze

  • Branching on freeze(poison) => Nondet.
      • Used for Loop Unswitching
  • Branching on poison => UB
      • Used for Global Value Numbering

Compilers can control poison!

Freeze can also fix many other UB-related problems.

SLIDE 56

Further Example

Hoisting Division

Before:

  // bitwise-or
  k = x | 0x1
  while (n > 0)
    use(100 / k)

After:

  // bitwise-or
  k = x | 0x1
  t = 100 / k
  while (n > 0)
    use(t)

If x is poison, then k is poison as well, and the hoisted division 100 / k is UB.

LLVM does not currently support it.

SLIDE 61

Further Example

Hoisting Division

Before:

  // bitwise-or
  k = x | 0x1
  while (n > 0)
    use(100 / k)

After:

  // bitwise-or
  k = freeze(x) | 0x1
  t = 100 / k
  while (n > 0)
    use(t)

With a plain k = x | 0x1, a poison x makes k poison, and LLVM does not currently support hoisting the division. With freeze(x), k is a defined, non-zero value even when x is poison, so the division can be hoisted.

Freeze can make LLVM support it!
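The division example can be checked with the same kind of model. This sketch assumes that dividing by poison is UB (as the earlier build of this slide marks it) and passes the value freeze chooses for poison as the hypothetical parameter `pick`; `or_`, `udiv` and the three loop functions are illustrative names.

```python
class UndefinedBehavior(Exception):
    """Stands in for 'anything may happen' in this model."""


class Poison:
    def __repr__(self) -> str:
        return "poison"


POISON = Poison()


def or_(x, c: int):
    """Bitwise-or propagates poison."""
    return POISON if x is POISON else x | c


def udiv(a: int, b):
    """Assumed model: a zero or poison divisor is UB."""
    if b is POISON or b == 0:
        raise UndefinedBehavior("division by zero or poison")
    return a // b


def freeze(x, pick: int = 0):
    """pick models the arbitrary defined value chosen for poison."""
    return pick if x is POISON else x


def before(x, n: int) -> list:
    k = or_(x, 1)
    uses = []
    while n > 0:
        uses.append(udiv(100, k))  # only divides if the loop runs
        n -= 1
    return uses


def hoisted(x, n: int) -> list:
    k = or_(x, 1)
    t = udiv(100, k)  # UB if x is poison, even when n <= 0
    return [t] * max(n, 0)


def hoisted_freeze(x, n: int, pick: int = 0) -> list:
    k = or_(freeze(x, pick), 1)  # defined and odd, hence non-zero
    t = udiv(100, k)
    return [t] * max(n, 0)


def is_ub(f, *args) -> bool:
    try:
        f(*args)
    except UndefinedBehavior:
        return True
    return False


print(is_ub(before, POISON, 0), is_ub(hoisted, POISON, 0))  # prints: False True
print(hoisted_freeze(POISON, 0), hoisted_freeze(6, 2))  # prints: [] [14, 14]
```

Whatever value freeze picks, the low bit set by the bitwise-or keeps k non-zero, so the hoisted division cannot trap.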

SLIDE 62

Implementation

  • Target: LLVM 4.0 RC 4 (Mar. 2017)
  • Add Freeze instruction to LLVM IR
  • Bug Fixes Using Freeze
      • Loop Unswitching Optimization
      • C Bitfield Translation to LLVM IR
      • InstCombine Optimizations

* More details are given in the paper

SLIDE 64

Experiment Results

  • Benchmarks (4.6M LOC):
      • SPEC CPU2006
      • LLVM Nightly Test
      • Large Single File Benchmarks
  • Compilation Time: ± 1%
  • Compilation Memory Usage: Max + 2%
  • Generated Code Size: ± 0.5%
  • Execution Time: ± 3%

* More details are given in the paper

“Freeze” Can Fix UB Semantics Without Significant Performance Penalty

SLIDE 65

Conclusion

  • Modern compilers’ UB models cannot support some textbook optimizations.
  • We propose “freeze” to fix such problems.
  • Freeze has little impact on performance.