SaBRe: Load-time selective binary rewriting, by Paul-Antoine Arras et al. (presentation slides)

SLIDE 1

SaBRe

Load-time selective binary rewriting

Paul-Antoine Arras, Anastasios Andronidis, Luís Pina, Karolis Mituzas, Qianyi Shu, Daniel Grumberg, Cristian Cadar

Software Reliability Group, Imperial College London

FOSDEM 2020

SLIDE 2

How resilient is my software?

  • Assess fault tolerance
  • E.g. disk full, memory exhausted
  • Hard to reproduce on real system
  • Can we simulate a fault?
  • Yes, but...
  • Kernel hacking is dangerous
  • Tinkering with libraries can also be painful
  • What’s in between?

SLIDE 3

Hello world

Python: print('Hello, world!')

Diagram (layers, top to bottom): user code, library, system call interface, operating system. User code and library run in user space; the operating system runs in kernel space.

SLIDE 4

System call interface

  • Set of low-level operations
  • Request a service
  • Very similar to function call

    ○ Several arguments
    ○ One result

Python (user space): print('Hello, world!')
System call, C syntax (crosses the system call interface into kernel space): write(1, "Hello, world!", 13)

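The "several arguments, one result" shape of a system call can be observed from Python, whose os.write is a thin wrapper over write. This is only an illustrative sketch: a pipe is assumed in place of file descriptor 1 so that nothing is printed to the terminal.

```python
import os

# A pipe stands in for the terminal (fd 1) so the example has no visible side effects.
read_fd, write_fd = os.pipe()

message = b"Hello, world!"
# Arguments: file descriptor and buffer. Result: number of bytes written.
written = os.write(write_fd, message)
os.close(write_fd)

received = os.read(read_fd, 64)
os.close(read_fd)
print(written, received)
```

The value returned by os.write is the same "size written" that the kernel reports for the underlying write call.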
SLIDE 5

System call errors

  • Return value

    ○ ≥ 0 → success
    ○ < 0 → failure

  • write

    ○ Success: size written
    ○ Failure: e.g. permission denied (EPERM), disk full (ENOSPC)

Python (user space):
    file = open("/tmp/hello", "w")
    file.write("Hello!")

System calls (kernel space):
    open("/tmp/hello", "w") = 8
    write(8, "Hello!", 6) = 6            (success)
    write(8, "Hello!", 6) = -EPERM < 0   (failure)

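To make the return-value convention concrete, here is a small sketch that maps Python's exception-based error reporting back to the raw convention described above. The helper name try_write is hypothetical. The failure is provoked by writing to a descriptor opened read-only, which on Linux is expected to fail with EBADF rather than the EPERM used on the slide.

```python
import errno
import os
import tempfile


def try_write(fd, data):
    """Return the byte count on success, or the negated errno on failure,
    mirroring the raw system call convention (>= 0 success, < 0 failure)."""
    try:
        return os.write(fd, data)
    except OSError as exc:
        return -exc.errno


with tempfile.NamedTemporaryFile() as tmp:
    writable_fd = os.open(tmp.name, os.O_WRONLY)
    read_only_fd = os.open(tmp.name, os.O_RDONLY)
    ok_result = try_write(writable_fd, b"Hello!")     # succeeds
    bad_result = try_write(read_only_fd, b"Hello!")   # fd is not open for writing
    os.close(writable_fd)
    os.close(read_only_fd)

print(ok_result, errno.errorcode.get(-bad_result))
```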
SLIDE 6

Fault injection

  • How to simulate e.g. a permission error at system call level?
  • Swap return value with error code

Real result:      write(8, "Hello!", 6) = 6
Injected result:  write(8, "Hello!", 6) = -EPERM

How to achieve that?

SLIDE 7

Binary rewriting

SLIDE 8

What is binary rewriting?

  • Modify program at machine code level
  • No source code needed
  • Does not require recompilation
  • Only requirement: disassembling

    ○ Break program into sequence of instructions
    ○ Assume done for now

Binary:       0 1 0 0 1 0 0 1 0 0 1
                  ↓ disassembling
Disassembly:  push R0
              load 0x14,R0
              call fnct
              or 0x67,R2

SLIDE 10

Disassembly

Offset  Size (bytes)  Human-readable instruction (mnemonic + operands)
0x00    1             push R0
0x01    2             load 0x14,R0
0x03    5             call fnct
0x08    3             or 0x67,R2
0x0b    2             jump L1
0x0d    3             and 0x45,R2
0x10    5             jump L2
…

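The listing above can be held as plain data, which is handy for the rewriting steps that follow. A minimal sketch, assuming a tuple layout of (offset, size, text) that is not part of the talk; it checks the property the later slides rely on, namely that each instruction starts exactly where the previous one ends.

```python
# (offset, size in bytes, human-readable instruction), copied from the slide.
LISTING = [
    (0x00, 1, "push R0"),
    (0x01, 2, "load 0x14,R0"),
    (0x03, 5, "call fnct"),
    (0x08, 3, "or 0x67,R2"),
    (0x0B, 2, "jump L1"),
    (0x0D, 3, "and 0x45,R2"),
    (0x10, 5, "jump L2"),
]


def is_contiguous(listing):
    """True if every instruction begins where its predecessor ends."""
    return all(
        prev_off + prev_size == next_off
        for (prev_off, prev_size, _), (next_off, _, _) in zip(listing, listing[1:])
    )


def end_offset(listing):
    """Offset of the first byte after the last instruction."""
    last_off, last_size, _ = listing[-1]
    return last_off + last_size


print(is_contiguous(LISTING), hex(end_offset(LISTING)))
```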
SLIDE 11

Operations on instructions

  • Remove
  • Replace
  • Add

SLIDE 12

Remove

Pad with nops

Before:
0x00  1  push R0
0x01  2  load 0x14,R0
0x03  5  call fnct
0x08  3  or 0x67,R2

After (load removed):
0x00  1  push R0
0x01  1  nop
0x02  1  nop
0x03  5  call fnct
0x08  3  or 0x67,R2

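Removal by nop padding is simple enough to sketch directly. The function name and the (offset, size, text) tuple layout below are assumptions made for illustration; the one-byte nop matches the listing on the slide.

```python
def remove_instruction(listing, offset):
    """Overwrite the instruction at `offset` with one-byte nops.

    Every other instruction keeps its offset, so no jump has to be recomputed.
    """
    result = []
    for off, size, text in listing:
        if off == offset:
            result.extend((off + i, 1, "nop") for i in range(size))
        else:
            result.append((off, size, text))
    return result


BEFORE = [
    (0x00, 1, "push R0"),
    (0x01, 2, "load 0x14,R0"),
    (0x03, 5, "call fnct"),
    (0x08, 3, "or 0x67,R2"),
]
AFTER = remove_instruction(BEFORE, 0x01)
for off, size, text in AFTER:
    print(f"{off:#04x} {size} {text}")
```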
SLIDE 13

Replace

Call → jump

Before:
0x00  1  push R0
0x01  2  load 0x14,R0
0x03  5  call fnct
0x08  3  or 0x67,R2

After:
0x00  1  push R0
0x01  2  load 0x14,R0
0x03  ?  jump
0x??  3  or 0x67,R2

SLIDE 14

Size matters

  • Shifting instructions is impractical

    ○ Jumps become invalid
    ○ Addresses have to be recomputed

  • Do rewritten instructions fit?
  • Compare instruction sizes

    ○ Original S(O)
    ○ Rewritten S(R)

SLIDE 15

Replace

Call → jump

Before:
0x00  1  push R0
0x01  2  load 0x14,R0
0x03  5  call fnct
0x08  3  or 0x67,R2

After:
0x00  1  push R0
0x01  2  load 0x14,R0
0x03  ?  jump
0x??  3  or 0x67,R2

S(O) = 5
S(R) = ?

SLIDE 16

Replace

Call → jump

Before:
0x00  1  push R0
0x01  2  load 0x14,R0
0x03  5  call fnct
0x08  3  or 0x67,R2

After:
0x00  1  push R0
0x01  2  load 0x14,R0
0x03  5  jump
0x08  3  or 0x67,R2

S(O) = S(R)

SLIDE 17

Replace

Call → jump

Before:
0x00  1  push R0
0x01  2  load 0x14,R0
0x03  5  call fnct
0x08  3  or 0x67,R2

After:
0x00  1  push R0
0x01  2  load 0x14,R0
0x03  3  jump
0x06  1  nop
0x07  1  nop
0x08  3  or 0x67,R2

S(R) < S(O)

SLIDE 18

Replace depends on relative sizes

  • If S(R) = S(O) → just replace
  • If S(R) < S(O) → pad with nops
  • If S(R) > S(O) → ???
SLIDE 19

Problem: How to fit larger instructions?

SLIDE 20

Detour

  • Problem: S(R) > S(O)
  • Shifting instructions still not an option
  • Solution: relocate instructions to out-of-line scratch space

    1. Allocate memory
    2. Move instructions
    3. Add jumps into and out of moved instructions

SLIDE 22

Add with detour

Insert a jump to rewritten instructions

Before:
     0x00    1  push R0
     0x01    2  load 0x14,R0
     0x03    5  call fnct
     0x08    3  or 0x67,R2

After:
     0x00    1  push R0
     0x01    2  load 0x14,R0
     0x03    5  jump D0
L0:  0x08    3  or 0x67,R2
     …

Out-of-line scratch space:
D0:  0xffec  5  call fnct
     …          // added instructions
     0xfffd  5  jump L0

S(O) = S(J)

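The detour shown above can be sketched on the same toy listing. Everything here is illustrative: the function name, the (offset, size, text) layout, and the 5-byte jump size (taken from the slide) are assumptions, and the sketch deliberately handles only the case S(O) = S(J).

```python
JUMP_SIZE = 5  # size of a jump in the slide's example


def add_with_detour(listing, offset, added, scratch_base):
    """Replace the instruction at `offset` by a jump into scratch space.

    The scratch space holds the original instruction, then the `added`
    (size, text) instructions, then a jump back to the following instruction.
    """
    patched, scratch = [], []
    for off, size, text in listing:
        if off != offset:
            patched.append((off, size, text))
            continue
        if size != JUMP_SIZE:
            raise ValueError("this sketch only handles S(O) == S(J)")
        patched.append((off, JUMP_SIZE, f"jump {scratch_base:#x}"))
        cursor = scratch_base
        for s, t in [(size, text), *added]:
            scratch.append((cursor, s, t))
            cursor += s
        scratch.append((cursor, JUMP_SIZE, f"jump {off + size:#x}"))
    return patched, scratch


LISTING = [
    (0x00, 1, "push R0"),
    (0x01, 2, "load 0x14,R0"),
    (0x03, 5, "call fnct"),
    (0x08, 3, "or 0x67,R2"),
]
patched, scratch = add_with_detour(LISTING, 0x03, [(3, "or 0x01,R1")], 0xFFEC)
```

Because the jump has the same size as the instruction it overwrites, no offset in the original code changes; only the scratch space grows.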
SLIDE 23

Replace with detour

  • If S(J) ≤ S(O) → replace and pad with nops
  • Otherwise, relocate neighbouring instructions

Before:
     0x00    1  push R0
     0x01    2  load 0x14,R0
     0x03    5  call fnct
     0x08    3  or 0x67,R2

After:
     0x00    1  push R0
     0x01    5  jump D0
     0x06    1  nop
     0x07    1  nop
L0:  0x08    3  or 0x67,R2
     …

Out-of-line scratch space:
D0:  …          // substitute instructions
     0xfff8  5  call fnct
     0xfffd  5  jump L0

S(J) = 5
S(O) = 2

SLIDE 24

Replace depends on relative sizes

  • S(R) = S(O) → replace O with R
  • S(R) < S(O) → replace and pad with nops
  • S(R) > S(O) → detour with jump (J)

    ○ S(J) = S(O) → replace O with J
    ○ S(J) < S(O) → replace and pad with nops
    ○ S(J) > S(O) → replace and relocate surrounding instructions

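The case analysis above translates directly into a small decision function. This is a sketch only: the function name, the returned strings, and the default jump size of 5 bytes (the value used in the slides' examples) are assumptions.

```python
def choose_strategy(s_o, s_r, s_j=5):
    """Pick a rewriting strategy from the sizes of the original instruction (s_o),
    the rewritten instruction (s_r) and a detour jump (s_j)."""
    if s_r == s_o:
        return "replace O with R"
    if s_r < s_o:
        return "replace with R and pad with nops"
    # From here on S(R) > S(O): the rewritten code goes out of line.
    if s_j == s_o:
        return "replace O with J"
    if s_j < s_o:
        return "replace with J and pad with nops"
    return "replace with J and relocate surrounding instructions"


for sizes in [(5, 5), (5, 3), (5, 9), (7, 9), (2, 9)]:
    print(sizes, "->", choose_strategy(*sizes))
```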
SLIDE 25

Can instructions always be relocated?

SLIDE 26

Side effects

Original:
push R0
test R0,R1      // sets parity flag
jpe L0          // jump if parity even
or 0x67,R2

After adding an instruction:
push R0
test R0,R1
add R2,R3       // alters status flags
jpe L0
or 0x67,R2

Solution: whitelist of instructions known to be safe to relocate

SLIDE 27

PC-relative addressing

Before:
     0x00    1  push R0
     0x01    2  load 0x48(PC),R0
     0x03    5  call fnct
     0x08    3  or 0x67,R2

After:
     0x00    1  push R0
     0x01    5  jump D0
     0x06    1  nop
     0x07    1  nop
L0:  0x08    3  or 0x67,R2
     …
D0:  …          // added instructions
     0xffea  2  load 0x48(PC),R0
     0xffec  5  call fnct
     0xfffd  5  jump L0

Address loaded before: 0x48 + 0x01 = 0x49
Address loaded after:  0x48 + 0xFFEA = 0x10032

SLIDE 28

PC-relative addressing

Before:
     0x00    1  push R0
     0x01    2  load 0x48(PC),R0
     0x03    5  call fnct
     0x08    3  or 0x67,R2

After:
     0x00    1  push R0
     0x01    5  jump D0
     0x06    1  nop
     0x07    1  nop
L0:  0x08    3  or 0x67,R2
     …
D0:  …          // added instructions
     0xffe6  6  load -0xff9d(PC),R0
     0xffec  5  call fnct
     0xfffd  5  jump L0

New displacement: 0x49 - 0xFFE6 = -0xFF9D

Solution: fix up displacement in relocated instruction

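The fixup is a one-line computation: keep the absolute target constant and recompute the displacement from the new location. A minimal sketch with a hypothetical function name, following the slide's convention that PC is the offset of the instruction itself:

```python
def fixup_displacement(old_pc, old_disp, new_pc):
    """Return the displacement that makes the relocated instruction
    address the same absolute target as before."""
    target = old_pc + old_disp
    return target - new_pc


# Values from the slide: load 0x48(PC),R0 moves from 0x01 to 0xFFE6.
new_disp = fixup_displacement(old_pc=0x01, old_disp=0x48, new_pc=0xFFE6)
print(hex(new_disp))

# Without the fixup, the relocated copy at 0xFFEA would address a different location.
unfixed_target = 0xFFEA + 0x48
print(hex(unfixed_target))
```

In the slide's example the larger displacement also needs a longer encoding (6 bytes instead of 2), which is why the relocated load starts at 0xffe6 rather than 0xffea.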
SLIDE 29

Branch target

Before:
     0x00    1  push R0
     0x01    2  load 0x14,R0
L0:  0x03    5  call fnct
     0x08    3  or 0x67,R2
     …
     0x68    2  jump L0

After:
     0x00    1  push R0
     0x01    5  jump D0
     0x06    1  nop
     0x07    1  nop
L1:  0x08    3  or 0x67,R2
     …
     0x68    2  jump L0     // L0 (0x03) now falls inside jump D0
     …
D0:  0xffec  2  load 0x14,R0
     …          // added instructions
     0xfff8  5  call fnct
     0xfffd  5  jump L1

Solution: record branch target locations before rewriting

SLIDE 30

Problematic instructions

  • Branch targets → do not rewrite
  • PC-relative addressing → fixup displacement
  • Side effects → only rewrite white-listed instructions
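The three rules above can be combined into a single check per instruction. This is a sketch under stated assumptions: the whitelist contents, the "(PC)" text test for PC-relative operands, and the function name are all hypothetical and chosen only to match the toy listings in these slides.

```python
# Hypothetical whitelist: mnemonics assumed to have no side effect on status flags.
SAFE_TO_RELOCATE = {"load", "call"}


def relocation_decision(offset, text, branch_targets):
    """Decide what to do with one instruction when a detour needs its bytes."""
    mnemonic = text.split()[0]
    if offset in branch_targets:
        return "do not rewrite"            # another jump lands here
    if mnemonic not in SAFE_TO_RELOCATE:
        return "do not rewrite"            # possible side effects
    if "(PC)" in text:
        return "relocate and fix up displacement"
    return "relocate"


# Branch targets are recorded before any rewriting takes place.
targets = {0x03}
print(relocation_decision(0x01, "load 0x14,R0", targets))
print(relocation_decision(0x03, "call fnct", targets))
```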
SLIDE 31

What if not enough instructions can be relocated?

SLIDE 32

Cannot relocate instructions

  • Cannot accommodate jump
  • Detour cannot be used
  • Instead, insert short illegal instruction
  • Set up signal handler to catch SIGILL
  • Put added instructions into handler
  • Significant overhead but extremely rare

SLIDE 33

Disassembling

SLIDE 34

What is disassembling?

Break binary code into sequence of instructions

Binary:       0 1 0 0 1 0 0 1 0 0 1
                  ↓ disassembling
Disassembly:  push R0
              load 0x14,R0
              call fnct
              or 0x67,R2

SLIDE 35

Disassembler types

  • Dynamic

    ○ Actually run program
    ○ Decode instructions just in time
    ○ Runtime penalty
    ○ E.g. Dyninst, DynamoRIO, Pin

  • Static

    ○ Program is not run
    ○ Binary scanned according to algorithm
    ○ No runtime penalty
    ○ E.g. Multiverse

SLIDE 36

Static disassembler

Linear sweep:
     0x00  1  push R0
     0x01  2  load 0x14,R0
     0x03  5  call fnct
     0x08  3  or 0x67,R2
     0x0b  2  jump L1
     0x0d  3  and 0x45,R2
     0x10  5  jump L2
     0x15  ?  bad           // garbage

Recursive traversal:
     0x00  1  push R0
     0x01  2  load 0x14,R0
     0x03  5  call fnct
     0x08  3  or 0x67,R2
     0x0b  2  jump L1
     …                      // skipped instructions
L1:  0x6c  4  move R0,R1

SLIDE 37

Disassembly challenges

  • Code discovery or content classification problem

    ○ Mixed code and data
    ○ Halting problem

  • Instruction overlapping

    ○ Variable-length ISA
    ○ One byte encodes several instructions
    ○ Obfuscation technique

SLIDE 38

SaBRe: Load-time selective binary rewriting for system calls and function prologues

SLIDE 39

Fault injection

  • How to simulate e.g. a permission error at system call level?
  • Swap return value with error code

Real result:      write(8, "Hello!", 6) = 6
Injected result:  write(8, "Hello!", 6) = -EPERM

How to achieve that?

SLIDE 40

Intercepting system calls

  • Problem: syscalls are in libraries, not user programs
  • How to ensure all syscalls are intercepted?
  • Rewriting on disk ahead of time is impractical
  • Rewriting in memory just in time incurs overhead
  • Solution: “just ahead of time”

    ○ At load time
    ○ Intercept dynamic linker
    ○ Rewrite libraries when mapped
    ○ In process memory

SLIDE 41

Application Programming Interface

  • Instrumentation through user-defined plugins

    ○ E.g. fault injection plugin
    ○ Plugin implements hook functions

  • Hook function for syscalls:

    long (*sbr_sc_handler_fn)(long, long, long, long, long, long, long)

    ○ Receives syscall number and 6 args
    ○ Responsible for actually issuing syscall
    ○ May alter parameters and return value

SLIDE 43

Basic fault injection plugin

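    #include <errno.h>        /* EPERM */
    #include <sys/syscall.h>  /* SYS_write */

    /* Hook called for each intercepted syscall: number plus 6 arguments. */
    long handle_syscall(long sc_no, long arg1, long arg2, long arg3,
                        long arg4, long arg5, long arg6) {
      /* Issue the real syscall first (real_syscall as named on the slide). */
      long ret = real_syscall(sc_no, arg1, arg2, arg3, arg4, arg5, arg6);
      /* Then report a permission error for every write. */
      if (sc_no == SYS_write)
        ret = -EPERM;
      return ret;
    }
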
SLIDE 44

Available plugins

  • Identity

    ○ Dummy plugin
    ○ Passes syscalls on unchanged
    ○ Testing and benchmarking

  • Fault injector

    ○ Test application resiliency
    ○ Simulate syscall failures
    ○ User sets failure probability by syscall category

  • Tracer

    ○ strace replacement
    ○ Does not use ptrace
    ○ Much more efficient

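The idea of a per-category failure probability can be illustrated in Python. This is an analogy, not SaBRe's plugin API: the category table, the handler factory and the choice of EPERM as the injected error are all assumptions made for the sketch, and the handler wraps Python-level functions rather than rewritten syscalls.

```python
import errno
import os
import random

# Hypothetical grouping of syscalls into categories.
CATEGORIES = {"write": "file", "read": "file", "mmap": "memory"}


def make_handler(real_syscalls, failure_probability, rng):
    """Build a hook that fails a call with the probability set for its category."""

    def handle_syscall(name, *args):
        category = CATEGORIES.get(name)
        if rng.random() < failure_probability.get(category, 0.0):
            return -errno.EPERM          # injected fault: the real call is skipped
        try:
            return real_syscalls[name](*args)
        except OSError as exc:
            return -exc.errno            # genuine failure, same convention

    return handle_syscall


read_fd, write_fd = os.pipe()
always_fail = make_handler({"write": os.write}, {"file": 1.0}, random.Random(0))
never_fail = make_handler({"write": os.write}, {"file": 0.0}, random.Random(0))

injected = always_fail("write", write_fd, b"Hello!")
genuine = never_fail("write", write_fd, b"Hello!")
os.close(write_fd)
os.close(read_fd)
```

Unlike the C plugin on the earlier slide, which issues the real syscall and then overwrites its result, this sketch skips the call when a fault is injected; either choice is a design decision for the plugin author.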
SLIDE 45

Supported archs

  • x86_64

    ○ Main dev target
    ○ Continuous integration

  • RISC-V

    ○ Open ISA
    ○ Devroom yesterday
    ○ QEMU Linux support

SLIDE 46

Current implementation

  • Target OS: Linux
  • Languages: mainly C + ASM snippets
  • No third-party library
  • LoC

    ○ Backbone: 2108
    ○ x86_64: 1333
        ■ C: 963
        ■ ASM: 410
    ○ API + support code: 2016

  • Program size

    ○ x86_64: 47 KiB
    ○ RISC-V: 40 KiB

SLIDE 47

Evaluation

SLIDE 48

Worst-case overhead

  • Experimental setup

    ○ Plugin: identity
    ○ Applications: nginx, lighttpd, redis, memcached
    ○ Glibc 2.28
    ○ 3-minute execution, millions of requests
    ○ CPU utilisation: 100%

  • Results

    ○ 404 syscalls detoured
    ○ Load-time overhead: 60 ms
    ○ Run-time overhead: ≤ 3%

SLIDE 49

System call tracer

  • Plugin mimics strace’s output
  • Not using ptrace
  • Compare with strace and equivalent Pin plugin
  • Issue read and write with dd

○ ~1M syscalls issued

  • Estimate disk usage of 150GB tree with du

SLIDE 50

dd results

SLIDE 51

du results

SLIDE 52

Fault injector

  • Applications: GNU coreutils
  • Busybox testsuite
  • Bugs found

    ○ Cryptic error messages
    ○ Lack of resiliency
    ○ Crashes

SLIDE 53

SaBRe in a nutshell

  • Selective binary rewriting

    ○ Syscalls and vDSO
    ○ Function prologues
    ○ x86_64-specific: RDTSC

  • At load time, in process memory
  • Low overhead, suitable for embedded devices
  • Simple API to build plugins
  • Available plugins

    ○ Fault injector
    ○ Tracer

  • GPLv3

SLIDE 54

Availability

  • GitHub: https://github.com/srg-imperial/SaBRe
  • Licence: GPLv3

○ Plugins also under GPLv3

  • PRs and bug reports welcome!