Shuffler: Fast and Deployable Continuous Code Re-Randomization David Williams-King, Graham Gobieski, Kent Williams-King, James P. Blake, Xinhao Yuan, Patrick Colp, Michelle Zheng, Vasileios P. Kemerlis, Junfeng Yang, William Aiello OSDI 2016


SLIDE 1

Shuffler: Fast and Deployable Continuous Code Re-Randomization

David Williams-King, Graham Gobieski, Kent Williams-King, James P. Blake, Xinhao Yuan, Patrick Colp, Michelle Zheng, Vasileios P. Kemerlis, Junfeng Yang, William Aiello

OSDI 2016

SLIDE 3

Software Remains Vulnerable

  • High-profile server breaches are commonplace
  • 90% of today’s attacks utilize ROP [1]
SLIDE 7

Return-Oriented Programming

  • Reuse fragments of legitimate code (gadgets)

(Figure: a buffer overrun overwrites the stack past the original data with a ROP gadget chain: a sequence of return addresses that each point into existing program code.)
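As a toy illustration (a model in C, not exploit code), the gadget chain can be pictured as a list of return addresses that a series of ret instructions consumes one at a time. Here the gadgets are ordinary functions, a "return address" is an index into a hypothetical program_code array, and a loop stands in for the rets; all names are made up for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy machine state: two "registers" the gadgets manipulate. */
typedef struct {
    long rax;
    long rbx;
} regs_t;

/* Hypothetical gadgets; each stands for a short instruction sequence ending in ret. */
static void gadget_load_7(regs_t *r) { r->rax = 7; }        /* mov $7, %rax; ret   */
static void gadget_copy(regs_t *r)   { r->rbx = r->rax; }   /* mov %rax, %rbx; ret */
static void gadget_add(regs_t *r)    { r->rax += r->rbx; }  /* add %rbx, %rax; ret */

typedef void (*gadget_t)(regs_t *);

/* "Program code": gadget locations the attacker has learned (here, array indices). */
static const gadget_t program_code[] = { gadget_load_7, gadget_copy, gadget_add };
#define NUM_GADGETS (sizeof program_code / sizeof program_code[0])

/* Each ret pops the next "return address" off the attacker-controlled stack
   and transfers control there; the loop plays the role of those rets. */
static regs_t run_chain(const size_t *stack, size_t depth) {
    regs_t r = { 0, 0 };
    for (size_t sp = 0; sp < depth; sp++) {
        assert(stack[sp] < NUM_GADGETS);
        program_code[stack[sp]](&r);
    }
    return r;
}
```

For example, the chain {0, 1, 2} leaves 14 in rax without any injected code. The attack depends entirely on knowing where each gadget lives, which is exactly the knowledge re-randomization invalidates.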

SLIDE 12

Modern ROP Attacks

  • JIT-ROP [2]: iteratively read code at runtime

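(Figure: the attacker repeatedly reads the target program's code (func_1, func_2, func_3) through a memory disclosure, builds a ROP gadget chain from what it read, and injects the exploit.)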

SLIDE 17

The Shuffler Idea

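  • What if we re-randomize code more rapidly than an attacker discovers gadgets?

(Figure: by the time the attacker injects the ROP gadget chain, the functions it was built from have moved, so the chain no longer points at the intended code.)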

SLIDE 21

How Is This Possible?

  • Re-randomize code before an attacker uses it

– faster than disclosure vulnerability execution time;
– faster than gadget chain computation time;
– or, faster than network communication time

  • Before the code is re-randomized, one memory disclosure can travel only 820 miles!
SLIDE 22

What Is Shuffler?

  • Defense based on continuous re-randomization

– Defeats all known code reuse attacks
– 20-50 millisecond shuffling, scales to 24 threads

  • Fast: bounds attacker’s available time

– Defeats even attackers with zero network latency

  • Deployable:

– Binary analysis without modifying the kernel, compiler, ...

  • Egalitarian:

– Shuffler runs in same address space, defends itself

SLIDE 24

Outline

  • 1. Continuous re-randomization
  • 2. Accelerating our randomization
  • 3. Binary analysis and egalitarianism
  • 4. Results and Demo
SLIDE 26

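Continuous Re-Randomization

  • Easy to copy code & fix direct references

(Figure: func_1 contains a direct call to func_2. func_2 is copied to a new location, the call in func_1 is fixed to target the copy, and the old func_2 is deleted.)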
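Fixing a direct reference mostly means re-encoding a relative displacement. A minimal sketch, assuming the x86-64 call rel32 form (opcode 0xE8 followed by a signed 32-bit offset measured from the end of the 5-byte instruction); the helper names are hypothetical, and a real rewriter must also handle jumps, other encodings, and new copies placed more than ±2 GiB away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CALL_REL32_OPCODE 0xE8
#define CALL_REL32_LEN    5   /* 1 opcode byte + 4-byte displacement */

/* Point the call rel32 at code[offset], which executes at runtime address
   call_addr, to new_target. Returns 0 on success, or -1 if the bytes are not
   a call rel32 or the new displacement does not fit in 32 bits. */
static int patch_call_rel32(uint8_t *code, size_t offset,
                            uint64_t call_addr, uint64_t new_target) {
    if (code[offset] != CALL_REL32_OPCODE)
        return -1;
    /* The displacement is measured from the end of the call instruction. */
    int64_t disp = (int64_t)(new_target - (call_addr + CALL_REL32_LEN));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;
    int32_t rel = (int32_t)disp;
    memcpy(&code[offset + 1], &rel, sizeof rel);  /* host byte order: little-endian on x86-64 */
    return 0;
}

/* Decode the absolute target of the call rel32 at code[offset]. */
static uint64_t call_rel32_target(const uint8_t *code, size_t offset, uint64_t call_addr) {
    assert(code[offset] == CALL_REL32_OPCODE);
    int32_t rel;
    memcpy(&rel, &code[offset + 1], sizeof rel);
    return call_addr + CALL_REL32_LEN + (uint64_t)(int64_t)rel;
}
```

After func_2 is copied, calling patch_call_rel32 on each direct call site with the copy's address is enough; no other memory refers to func_2 through these instructions.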

SLIDE 31

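Continuous Re-Randomization

  • Easy to copy code & fix direct references
  • What about code pointers?

(Figure: func_1 stores the address of func_2 into ptr (mov $func_2, ptr) and later calls through it (call *ptr). After func_2 is copied and the old copy is deleted, ptr still holds the stale &func_2.)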

SLIDE 32

Continuous Re-Randomization

  • Easy to copy code & fix direct references
  • What about code pointers?
  • How to update all propagated pointers?

(Figure: copies of &func_2 have spread throughout memory; every one of them goes stale once func_2 moves and its old copy is deleted.)

SLIDE 36

Continuous Re-Randomization

  • Solution: add extra level of indirection

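(Figure: ptr and every propagated copy hold the index f_2_idx instead of &func_2; the table reached through %gs: maps f_2_idx to &func_2. When func_2 moves, only that table entry changes, and the old copy can be deleted.)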

SLIDE 39

Code Pointer Abstraction

  • Transforming *code_ptr into **code_ptr

– Correctness: pointer updates are sound & precise
– Disclosure-resilience: the code pointer table is hidden

(Figure: ptr holds f_2_idx, an index into the table at %gs:, whose entry points to func_2.)

Rewrite initialization points:

mov $0x40054d, %rax   =>   mov $0x20, %rax

Rewrite call sites:

callq *%rax   =>   callq *%gs:(%rax)
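The **code_ptr idea can be modeled in C with an ordinary array standing in for the %gs-relative table. This sketch assumes 8-byte table slots, which fits the mov $0x20 (slot 4) rewrite above but is an assumption here; the function names are hypothetical, and the sketch makes no attempt to hide the table, which the slides list as a separate property of the real design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*fn_t)(int);

/* Two copies of the same function: the original and its "shuffled" copy. */
static int add_one(int x)       { return x + 1; }
static int add_one_moved(int x) { return x + 1; }

/* Stand-in for the table Shuffler reaches through %gs. With 8-byte slots,
   a stored code "pointer" is a byte offset such as 0x20 (slot 4). */
#define SLOT_SIZE   8
#define TABLE_SLOTS 16
static fn_t code_ptr_table[TABLE_SLOTS];

/* Initialization point: instead of materializing &fn, hand out its table offset
   (cf. mov $0x40054d, %rax => mov $0x20, %rax). */
static uintptr_t register_code_ptr(size_t slot, fn_t fn) {
    assert(slot < TABLE_SLOTS);
    code_ptr_table[slot] = fn;
    return (uintptr_t)(slot * SLOT_SIZE);
}

/* Call site: dereference twice, through the table (cf. callq *%gs:(%rax)). */
static int call_via_table(uintptr_t offset, int arg) {
    assert(offset % SLOT_SIZE == 0 && offset / SLOT_SIZE < TABLE_SLOTS);
    return code_ptr_table[offset / SLOT_SIZE](arg);
}

/* Re-randomization: the function moved, so only its table entry changes.
   Every copy of the offset already held by the program stays valid. */
static void move_code_ptr(uintptr_t offset, fn_t new_location) {
    assert(offset % SLOT_SIZE == 0 && offset / SLOT_SIZE < TABLE_SLOTS);
    code_ptr_table[offset / SLOT_SIZE] = new_location;
}
```

The design choice is visible in move_code_ptr: however many copies of the offset 0x20 the program has made, a shuffle touches exactly one table slot.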

SLIDE 40

Outline

  • 1. Continuous re-randomization
  • 2. Accelerating our randomization
  • 3. Binary analysis and egalitarianism
  • 4. Results and Demo
SLIDE 42

Return Address Encryption

  • Return addresses are code pointers too
  • Could use code pointer table, but inefficient

– call/ret instructions are highly optimized

  • Alternative mechanism – correct and hidden

– Use normal call instructions
– Encrypt return addresses with an XOR key

SLIDE 47

Return Address Encryption

  • Prevent return address disclosure
  • We use binary rewriting (expand basic blocks)

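func:
    mov %fs:0x28, %r11    # load the XOR key
    xor %r11, (%rsp)      # encrypt the return address pushed by call
    # ... original code ...
    mov %fs:0x28, %r11    # reload the key (%r11 is a scratch register)
    xor %r11, (%rsp)      # decrypt the return address
    ret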

(Figure: every return address on the thread stack, for frames in func_1, func_2, and func_3, is stored XORed with the key.)
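The effect of the two inserted xor instructions can be modeled in a few lines of C. The key value and helper names are hypothetical; the sketch only shows that a disclosed stack slot holds an encrypted value while ret still sees the original address, not how the key is chosen or protected.

```c
#include <assert.h>
#include <stdint.h>

/* XOR with a per-thread key. XOR is its own inverse, so the same operation
   encrypts on function entry and decrypts just before ret. */
static inline uint64_t xor_ret_addr(uint64_t slot_value, uint64_t key) {
    return slot_value ^ key;
}

/* Simulate one call/return through a single stack slot. Returns the address
   ret would jump to; *leaked receives what a stack disclosure would reveal. */
static uint64_t simulate_call_return(uint64_t ret_addr, uint64_t key, uint64_t *leaked) {
    uint64_t stack_slot = ret_addr;                /* call pushes the return address */
    stack_slot = xor_ret_addr(stack_slot, key);    /* prologue: xor %r11,(%rsp) */
    *leaked = stack_slot;                          /* an attacker reading the stack sees this */
    stack_slot = xor_ret_addr(stack_slot, key);    /* epilogue: xor %r11,(%rsp) */
    return stack_slot;                             /* ret pops the original address */
}
```

Because one per-thread key covers every frame, the scheme depends on the key itself not being disclosed.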

SLIDE 50

Return Address Migration

  • Unwind stack and re-encrypt new addresses

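(Figure: each encrypted return address on the thread stack is decrypted, translated from the old copy of func_1, func_2, or func_3 to the same position in its new copy, and re-encrypted with the XOR key; the old copies are then deleted.)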
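Migration can be sketched as a loop over return-address slots: decrypt, map the address from the old code copy to the same offset in the new copy, and re-encrypt. This sketch assumes the unwinder has already located every slot and that the same key is reused; func_move_t and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout record: where one function lives in the old and new copies. */
typedef struct {
    uint64_t old_start;
    uint64_t new_start;
    uint64_t size;
} func_move_t;

/* Translate a plaintext return address from the old copy to the new one.
   Returns 0 if the address is not inside any moved function. */
static uint64_t translate(uint64_t addr, const func_move_t *moves, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (addr >= moves[i].old_start && addr < moves[i].old_start + moves[i].size)
            return moves[i].new_start + (addr - moves[i].old_start);
    }
    return 0;
}

/* For each located return-address slot: decrypt, translate, re-encrypt.
   Slots outside any moved function are left untouched. Returns the count migrated. */
static size_t migrate_return_addresses(uint64_t *slots, size_t nslots, uint64_t key,
                                       const func_move_t *moves, size_t nmoves) {
    size_t migrated = 0;
    for (size_t i = 0; i < nslots; i++) {
        uint64_t plain = slots[i] ^ key;
        uint64_t moved = translate(plain, moves, nmoves);
        if (moved != 0) {
            slots[i] = moved ^ key;
            migrated++;
        }
    }
    return migrated;
}
```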

SLIDE 56

Asynchronous Randomization

  • Creating new code copies takes time
  • Shuffler prepares new code asynchronously
  • Each thread unwinds its own stack in parallel

(Figure: timeline of one 20ms shuffle period. Performed inline, the shuffle steps (generate permutation, make new code copy, fix call instructions, update code pointer table, stack unwind) cost 15ms of shuffling overhead, leaving 5ms of real work. With Shuffler preparing new code asynchronously, each program thread only unwinds its own stack: 0.06ms versus 19.94ms of real work, i.e. 0.3% versus 99.7% of runtime.)
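The asynchronous split can be sketched with a separate Shuffler thread that does the expensive preparation (reduced here to a Fisher-Yates permutation of function order) and publishes each result with a single atomic pointer store, so program threads never wait on it. Everything here is a simplified assumption: there is no real code copying or shuffle timer, the names and fixed seed are hypothetical, and reclaiming old layouts, which must wait until every thread has migrated its stack, is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NFUNCS     8    /* hypothetical number of functions to lay out */
#define MAX_ROUNDS 64   /* every prepared layout stays allocated (no reclamation) */

/* One prepared layout: the order in which functions go into a new code copy. */
typedef struct {
    unsigned generation;
    int order[NFUNCS];
} layout_t;

/* Generation 0 is the original, unshuffled order (written out for NFUNCS == 8). */
static layout_t pool[MAX_ROUNDS + 1] = { { 0, { 0, 1, 2, 3, 4, 5, 6, 7 } } };

/* Layout currently in use: program threads only ever load this pointer. */
static _Atomic(layout_t *) current_layout = &pool[0];

/* Small xorshift PRNG; a real defense would use a cryptographic source. */
static uint32_t next_random(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fisher-Yates shuffle ("Generate permutation"); modulo bias is ignored. */
static void random_permutation(int *order, int n, uint32_t *state) {
    for (int i = 0; i < n; i++)
        order[i] = i;
    for (int i = n - 1; i > 0; i--) {
        int j = (int)(next_random(state) % (uint32_t)(i + 1));
        int tmp = order[i];
        order[i] = order[j];
        order[j] = tmp;
    }
}

/* Shuffler thread: the expensive preparation happens here, off the program's
   critical path, and each result is published with one atomic store.
   A real shuffler would also wait out the shuffle period between rounds. */
static void *shuffler_thread(void *arg) {
    int rounds = *(const int *)arg;
    uint32_t state = 0x9e3779b9u;  /* hypothetical fixed seed, for reproducibility */
    for (int r = 1; r <= rounds; r++) {
        layout_t *next = &pool[r];
        next->generation = (unsigned)r;
        random_permutation(next->order, NFUNCS, &state);
        atomic_store(&current_layout, next);
    }
    return NULL;
}

/* Run the shuffler for `rounds` periods and return the generation now in use
   (0 if the thread could not be started). */
static unsigned run_shuffler(int rounds) {
    assert(rounds >= 0 && rounds <= MAX_ROUNDS);
    pthread_t tid;
    if (pthread_create(&tid, NULL, shuffler_thread, &rounds) != 0)
        return 0;
    /* A program thread would keep computing here, occasionally loading
       current_layout; it never waits for a permutation to be generated. */
    pthread_join(tid, NULL);
    return atomic_load(&current_layout)->generation;
}

/* Check that order[0..n) holds each of 0..n-1 exactly once. */
static int is_permutation(const int *order, int n) {
    int seen[NFUNCS] = { 0 };
    assert(n <= NFUNCS);
    for (int i = 0; i < n; i++) {
        if (order[i] < 0 || order[i] >= n || seen[order[i]]++)
            return 0;
    }
    return 1;
}
```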

SLIDE 57

Outline

  • 1. Continuous re-randomization
  • 2. Accelerating our randomization
  • 3. Binary analysis and egalitarianism
  • 4. Results and Demo
SLIDE 62

Augmented Binary Analysis

  • Use additional info from unmodified compilers

– Symbols, to distinguish code and data (so do not strip with -s)
– Relocations, to find all code pointers (link with --emit-relocs, which asks the linker to preserve relocations in the output)

.section .rodata
    .quad 0x400620
.section .text
    mov $0x400620, %rax

Code pointer, or integer? The same bits are also the integer 4195872:

.section .rodata
    .quad 4195872
.section .text
    mov $4195872, %rax

Relocations (metadata) tell the two cases apart.
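A sketch of how relocation metadata settles the question, assuming a simplified relocation record and a known .text address range (in practice both come from the binary's section headers and relocation sections). A real ELF Elf64_Rela entry also carries r_info (type and symbol) and r_addend; the types and function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified relocation: "the 8 bytes at this section offset
   hold an absolute address". */
typedef struct {
    uint64_t offset;
} reloc_t;

typedef enum { VALUE_INTEGER, VALUE_DATA_POINTER, VALUE_CODE_POINTER } value_kind_t;

/* Classify the 8-byte value at `offset` in a data section. The bytes alone
   cannot distinguish 0x400620 from 4195872; a relocation at that offset marks
   the value as an address, and the .text range tells code from data. */
static value_kind_t classify_quad(const uint8_t *section, uint64_t offset,
                                  const reloc_t *relocs, size_t nrelocs,
                                  uint64_t text_start, uint64_t text_end) {
    assert(section != NULL);
    uint64_t value;
    memcpy(&value, section + offset, sizeof value);
    for (size_t i = 0; i < nrelocs; i++) {
        if (relocs[i].offset == offset)
            return (value >= text_start && value < text_end) ? VALUE_CODE_POINTER
                                                             : VALUE_DATA_POINTER;
    }
    return VALUE_INTEGER;
}
```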

SLIDE 64

Augmented Binary Analysis

  • Allows accurate and complete disassembly
  • Many special cases, but we handle them
SLIDE 66

Where to Re-Randomize From

  • Most defenses operate at higher privilege level

– e.g., kernel, hypervisor, hardware
– Or else declare their own code “trusted”

  • Shuffler is egalitarian

– Same level of privilege, no system modifications
– Defends itself from attack

SLIDE 71

Egalitarian Bootstrapping

  • Problem: transformations break original code

– e.g. memcpy uses code pointers

memcpy’s code dispatches through a jump table at 0x400620:

mov 0x400620(,%rax,8), %rax
jmpq *%rax

0x400620: 0x400508 0x400514
0x400630: 0x400520 0x40052c
0x400640: 0x400538 0x400544

Rewrite main, printf, ..., memcpy, ...

New memcpy code:

mov 0x400620(,%rax,8), %rax
jmpq *%gs:(%rax)

0x400620: 0x20 0x28
0x400630: 0x30 0x88
0x400640: 0x40 0x48

Rewriting memcpy invalidates its jump table for the old code, but the rewrite process itself still uses the (old) memcpy.

SLIDE 79

Egalitarian Bootstrapping

  • Problem: transformations break original code

– e.g. memcpy uses code pointers

  • Solution: use two copies of Shuffler

– Make new copies

(Figure: the loader loads the program, the C library, other libraries, and two copies of Shuffler. Shuffler stage 1 rewrites the other components, invokes stage 2, and the original code is erased. From then on, Shuffler stage 2 keeps making new copies of itself, the other libraries, the C library, and the program.)

SLIDE 80

Outline

  • 1. Continuous re-randomization
  • 2. Accelerating our randomization
  • 3. Binary analysis and egalitarianism
  • 4. Results and Demo
SLIDE 82

Performance Evaluation

  • SPEC CPU overhead with a 50ms shuffle period: 14.9%
  • Multiprocess Nginx scales up to 24 workers
SLIDE 84

Security Evaluation

  • Two disclosure-based attack methodologies:

– Scan many pages for the desired gadgets

  • impacted by disclosure time, network latency

– Explore gadget space in small number of pages

  • impacted by ROP chain computation time (> 40 seconds)
  • Published JIT-ROP attacks take 2300-378000 ms
  • Shuffler typically re-randomizes every 20-50 ms
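A back-of-the-envelope comparison using only the numbers on this slide shows how many shuffle periods even the fastest published JIT-ROP run spans:

```latex
\frac{2300\ \text{ms}}{50\ \text{ms}} = 46, \qquad
\frac{2300\ \text{ms}}{20\ \text{ms}} = 115, \qquad
\frac{40\,000\ \text{ms}}{50\ \text{ms}} = 800
```

So even at the slowest 50 ms period the code layout changes roughly 46 times during a 2300 ms attack, and more than 800 times during a ROP chain computation that takes over 40 seconds.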
SLIDE 85

Demo

SLIDE 88

Conclusion

  • Continuous re-randomization every 20-50 ms
  • Fast:

– Defeats all known code reuse attacks
– Asynchronous shuffling offloads overhead

  • Deployable:

– Binary analysis without modifying the kernel, compiler, ...

  • Egalitarian:

– No additional privileges required
– Shuffler defends its own code

SLIDE 89

Questions?

Demo website: http://shuffled.elfery.net:8000

SLIDE 90

Related Work

  • JIT-ROP, IEEE S&P 2013
  • Oxymoron, USENIX Security 2014
  • Code Pointer Integrity, OSDI 2014
  • Stabilizer, SIGARCH 2013
  • Remix, CODASPY 2016
  • TASR, CCS 2015
  • ...more related work in our paper

[1] https://securityintelligence.com/anti-rop-a-moving-target-defense/
[2] http://www.ieee-security.org/TC/SP2013/papers/4977a574.pdf

SLIDE 91

Future Work

  • Translating stack unwind information

– Without it, C++ exceptions, pthread_cancel, etc. break

  • Cannot currently shuffle the loader

– Breaks dlopen

  • If shuffling takes too long, there is no mechanism to pause the target program

SLIDE 92

Shuffler Thread Performance

  • Asynchronous shuffling runs quickly
  • Synchronous runtime is 0.3% of total runtime
SLIDE 93

Scalability

  • Tradeoff for server workers

– Multithreaded => lower performance overhead
– Multiprocess => no disclosures across workers

  • Both techniques scale well in practice (up to 24x)

(Figure: in a multithreaded program, the program threads each run computations and unwind (unw) their own stacks, served by 1 common Shuffler thread; in a multiprocess program, each process has its own Shuffler thread, for n Shuffler threads in total.)