DynaGuard: Armoring Canary-Based Protections against Brute-force Attacks




SLIDE 1

DynaGuard: Armoring Canary-Based Protections against Brute-force Attacks

2015 Annual Computer Security Applications Conference (ACSAC) Los Angeles, California, USA

Theofilos Petsios, Angelos D. Keromytis (Columbia University)
Vasileios P. Kemerlis (Brown University)
Michalis Polychronakis (Stony Brook University)

SLIDE 2

Theofilos Petsios (theofilos@cs.columbia.edu) ACSAC 2015 /36 DynaGuard

Background: Stack Smashing Protection

  • Prevents the overwrite of the return address by a stack buffer overflow
  • Places a random value after critical data in the stack
      • Random value ➡ “Canary” or “Canary Cookie”
      • Critical data ➡ Return address, Frame pointer, etc.
  • The canary is 4 bytes long in x86, 8 bytes in x86-64
  • Generated dynamically at the creation of each thread, and stored in the Thread-Local Storage (TLS) area
  • Checked upon function epilogue
  • Supported in GCC, Microsoft VS (/GS) and LLVM

SLIDE 3

Background: Stack Smashing Protection

[Figure: stack frame of vuln(), drawn between Higher Addresses and Lower Addresses. Labels: copy of str, Return Address, Frame Pointer, Canary, char buffer[], int *x, int i, copy of n. The overflow direction runs from the buffer start (byte 0x0, byte 0x1, ..., byte 0x7) through the canary (canary start, byte 0x0 ... byte 0x7, canary end).]

    int vuln(int n, char *str) {
        int i;
        int *x = NULL;
        char buffer[8];
        ...
        /* unbounded copy */
        memcpy(buffer, str, n);
        ...
    }
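The protection described on these two slides can be modelled without relying on undefined behavior. The sketch below is an illustration only: `struct frame`, `frame_enter`, `frame_copy` and `frame_check` are hypothetical names, the "return address" is simulated data, and the overflow is confined to one struct so the program stays well defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a protected frame: the canary sits between the
 * local buffer and the (simulated) saved return address. */
struct frame {
    char     buffer[8];
    uint64_t canary;
    uint64_t return_address;
};

/* Prologue: copy the reference canary (kept in the TLS in a real program)
 * into the frame. */
static void frame_enter(struct frame *f, uint64_t tls_canary, uint64_t ret) {
    memset(f->buffer, 0, sizeof f->buffer);
    f->canary = tls_canary;
    f->return_address = ret;
}

/* Stand-in for the unbounded memcpy in vuln(): writes n bytes starting at
 * the buffer. The copy is clamped to the struct so the model itself never
 * writes out of bounds. */
static void frame_copy(struct frame *f, const void *src, size_t n) {
    if (n > sizeof *f)
        n = sizeof *f;
    memcpy(f, src, n);
}

/* Epilogue: returns 1 if the canary is intact, 0 if the program would abort. */
static int frame_check(const struct frame *f, uint64_t tls_canary) {
    return f->canary == tls_canary;
}
```

A copy of up to 8 bytes leaves the canary intact; any longer copy changes it before it can reach the simulated return address, so the epilogue check fails first.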

SLIDE 4

Canary Brute-force

An attacker may brute-force the canary byte-by-byte in very few attempts if they are able to perform the following steps:

  • Force child processes to be forked by the same parent process
  • Verify if these child processes crashed or not
  • Overwrite a single byte of the canary each time until all the bytes are recovered
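The three steps above can be sketched as a small simulation. This is not an exploit: `child_survives` is a hypothetical oracle that stands in for "fork a child, overflow the first bytes of its canary, and see whether it crashed", and the canary length of 8 bytes assumes x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CANARY_LEN 8

/* Hypothetical crash oracle: a child survives a partial overwrite only if
 * every overwritten canary byte matches the inherited canary. */
static int child_survives(const uint8_t *secret, const uint8_t *guess, size_t len) {
    return memcmp(secret, guess, len) == 0;
}

/* Recovers the canary one byte at a time and returns the number of probes
 * (that is, the number of children that had to be forked). */
static unsigned brute_force(const uint8_t secret[CANARY_LEN], uint8_t out[CANARY_LEN]) {
    unsigned attempts = 0;
    for (size_t i = 0; i < CANARY_LEN; i++) {
        for (unsigned v = 0; v < 256; v++) {
            out[i] = (uint8_t)v;
            attempts++;
            /* Bytes 0..i-1 are already known, so only byte i is in doubt. */
            if (child_survives(secret, out, i + 1))
                break;
        }
    }
    return attempts;
}
```

Each byte is confirmed independently because every child inherits the same canary from the parent, which is the property DynaGuard removes.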

SLIDE 5

Canary Brute-force

  • Possible due to the current process creation mechanism:
      • Certain data is inherited from the parent process, although it should be different (other examples include VM side channel attacks and the PRNG state in forked processes)
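The PRNG example mentioned above is easy to observe directly. The following sketch assumes a POSIX system; `fork_shares_prng_state` is a hypothetical helper that seeds the C library PRNG, forks, and compares the next value drawn by the parent and by the child.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if parent and child draw the same "random" byte after fork(),
 * 0 if they differ, and -1 on error. */
static int fork_shares_prng_state(unsigned seed) {
    srand(seed);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(rand() & 0xff);        /* child: report its next value */
    int parent_byte = rand() & 0xff; /* parent: draw from the same state */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == parent_byte;
}
```

Because fork() duplicates the parent's memory, the child starts from the same PRNG state, just as it starts with the same stack canary.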

SLIDE 6

Canary Brute-force

[Figure: the stack frame of vuln() from slide 3, with the overflow direction running from the buffer start (byte 0x0 ... byte 0x7) into the canary (canary start, byte 0x0 ... byte 0x7, canary end).]

SLIDE 7

Canary Brute-force

SLIDE 8

Canary Brute-force

SLIDE 9

Canary Brute-force

SLIDE 10

Canary Brute-force

[Figure: the stack frame of vuln() from slide 3, with the overflow direction running from the buffer start (byte 0x0 ... byte 0x7) up to the canary (canary start, canary end).]

SLIDE 11

A byte-by-byte brute-force requires at most 4*256 = 1024 attempts on x86 and at most 8*256 = 2048 on x86-64 (about half as many on average), assuming a fully random canary
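The arithmetic behind these figures can be written out as follows. The helper names are hypothetical, and the expected value assumes each canary byte is uniformly random and guessed in a fixed order 0..255, so a byte costs (1 + 2 + ... + 256) / 256 = 128.5 probes on average.

```c
#include <assert.h>

/* Upper bound: every byte needs at most 256 probes. */
static unsigned worst_case_probes(unsigned canary_bytes) {
    return canary_bytes * 256u;
}

/* Expected probes under the uniform-byte assumption stated above. */
static double expected_probes(unsigned canary_bytes) {
    return canary_bytes * 128.5;
}
```

For comparison, guessing a 4-byte canary as a whole would need up to 2^32 attempts, which is why the per-byte oracle matters.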

SLIDE 12

Canary Brute-force Guessing Timeline

2006: Ben Hawkes introduced the technique at RUXCON 2006 (Title: "Exploiting OpenBSD")

2010: Adam Zabrocki (pi3) discussed remote stack exploitation techniques in Linux, FreeBSD and OpenBSD and, among other things, revisited Ben's attack in Phrack #67

2013: Nikolaos Rangos (Kingcope) released an exploit for the Nginx web-server that builds upon the previous attack(s) to construct a remote exploit

2014: Andrea Bittau et al. introduced the BROP technique, which, among other things, uses a generalized version of the above to leak/bypass stack canaries

SLIDE 13

DynaGuard Design

Key idea: Upon each fork() update the inherited (old) canaries in the child process

  • Update the canary in the TLS of the new (child) process
  • Update the canaries in all inherited stack frames (from the parent process) with the new canary value

SLIDE 14

Simply updating the canary in the TLS for new (child) processes (as proposed in a recent paper) is not enough, as it will cause a false abort if execution reaches one of the parent’s inherited frames
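The false abort can be reproduced in a small model. Everything here is a hypothetical stand-in: `struct process` holds a TLS canary plus the canary copies of the live frames, and the two `fork_*` functions contrast a TLS-only update with an update of all inherited frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FRAMES 4

/* Hypothetical model of a process: the reference canary in the TLS and the
 * canary copies stored in the currently live stack frames. */
struct process {
    uint64_t tls_canary;
    uint64_t frame_canary[MAX_FRAMES];
    size_t   depth;
};

/* Function prologue: push a copy of the TLS canary. */
static void enter_frame(struct process *p) {
    assert(p->depth < MAX_FRAMES);
    p->frame_canary[p->depth++] = p->tls_canary;
}

/* Function epilogue: returns 1 if the check passes, 0 if it would abort. */
static int leave_frame(struct process *p) {
    assert(p->depth > 0);
    return p->frame_canary[--p->depth] == p->tls_canary;
}

/* fork() that renews only the TLS canary of the child. */
static struct process fork_tls_only(const struct process *parent, uint64_t fresh) {
    struct process child = *parent;
    child.tls_canary = fresh;
    return child;
}

/* fork() that also rewrites the canaries in all inherited frames. */
static struct process fork_update_all(const struct process *parent, uint64_t fresh) {
    struct process child = *parent;
    child.tls_canary = fresh;
    for (size_t i = 0; i < child.depth; i++)
        child.frame_canary[i] = fresh;
    return child;
}
```

With the TLS-only variant, the first return into an inherited frame compares the old canary against the new TLS value and aborts even though no overflow occurred.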

SLIDE 15

[Figure: a process stack with frames a and b above the previous frames, each frame holding a canary, next to the TLS.]

SLIDE 16

[Figure: Parent Process and Child Process side by side, each with its TLS and stack frames a and b.]

SLIDE 17

[Figure: Parent Process and Child Process side by side, each with its TLS and stack frames a, b and c.]

SLIDE 18

[Figure: Parent Process and Child Process side by side, each with its TLS and stack frames a, b and c; two canary comparisons (= ?) are marked.]

SLIDE 19

[Figure: a process stack (frames a and b above the previous frames) and its TLS, together with a canary address buffer holding &(canary a), &(canary b), ...; annotations: canary push, canary reference, canary check (= ?).]

SLIDE 20

[Figure: Parent Process and Child Process, each with its TLS, stack frames a and b, and a canary address buffer holding &(can. a), &(can. b), ...]

SLIDE 21

[Figure: Parent Process and Child Process, each with its TLS, stack frames a, b and c, and a canary address buffer holding &(can. a), &(can. b), &(can. c), ...]

SLIDE 22

[Figure: Parent Process and Child Process, each with its TLS, stack frames a, b and c, and a canary address buffer holding &(can. a), &(can. b), &(can. c), ...; two canary comparisons (= ?) are marked.]

SLIDE 23

Implementation

Two flavors: Compiler-based and DBI-based

SLIDE 24

Implementation: Compiler-based Version

SLIDE 25

Implementation: Compiler-based Version

  • Two components:
      • GCC plugin
      • Runtime library
  • Total of ~1250 LOC
  • Maintain two canaries at runtime:
      • DynaGuard-compiled code uses DynaGuard canaries
      • legacy code/libraries use the glibc canaries

SLIDE 26

Implementation: Compiler-based Version

  • Both canaries have same entropy but are stored in different TLS offsets
  • GCC plugin replaces the glibc canaries with the DynaGuard canaries
  • DynaGuard’s runtime library:
      • allocates Canary Address Buffer (CAB) in the heap for each thread, before it starts executing and deallocates it when terminating
      • performs CAB bookkeeping
      • updates all canaries in the child process’s stack, as well as its TLS upon a fork()
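The CAB bookkeeping on this slide can be pictured with a minimal data structure. This is a sketch under assumptions, not DynaGuard's code: `struct cab` and the `cab_*` functions are hypothetical names, and growth is done with an explicit bounds check and realloc() rather than the guard-page mechanism the runtime library uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread canary address buffer (CAB). */
struct cab {
    uint64_t **slots; /* addresses of the canaries in live stack frames */
    size_t     index; /* next free slot */
    size_t     size;  /* capacity, in slots */
};

/* Allocated on the heap before the thread starts executing. */
static int cab_init(struct cab *c, size_t size) {
    assert(size > 0);
    c->slots = malloc(size * sizeof *c->slots);
    c->index = 0;
    c->size = c->slots ? size : 0;
    return c->slots ? 0 : -1;
}

/* Released when the thread terminates. */
static void cab_free(struct cab *c) {
    free(c->slots);
    c->slots = NULL;
    c->index = c->size = 0;
}

/* Canary push: remember where the new frame stores its canary. */
static int cab_push(struct cab *c, uint64_t *canary_addr) {
    if (c->index == c->size) {
        uint64_t **grown = realloc(c->slots, 2 * c->size * sizeof *grown);
        if (!grown)
            return -1;
        c->slots = grown;
        c->size *= 2;
    }
    c->slots[c->index++] = canary_addr;
    return 0;
}

/* Canary pop: the frame is gone, so forget its canary address. */
static void cab_pop(struct cab *c) {
    assert(c->index > 0);
    c->index--;
}

/* In the child after fork(): renew the TLS canary and every live canary. */
static void cab_rewrite_all(struct cab *c, uint64_t *tls_canary, uint64_t fresh) {
    *tls_canary = fresh;
    for (size_t i = 0; i < c->index; i++)
        *c->slots[i] = fresh;
}
```

Only the frames still recorded in the buffer are rewritten; a frame that was popped before the fork keeps whatever stale value it held.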

SLIDE 27

Compiler-based Version: DynaGuard GCC Plugin

  • Reserve 4 out of 8 __padding entries of the tcbhead_t struct in the TLS. Reserved TLS offsets range from 0x2a0 to 0x2b8:
      • CAB address: %fs:0x2a0
      • CAB current index: %fs:0x2a8
      • CAB size: %fs:0x2b0
      • DynaGuard canary: %fs:0x2b8
  • Insert code to push/pop canary addresses in CAB upon a canary push/pop
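The four reserved slots can be viewed as one small record placed at %fs:0x2a0. The struct below is a hypothetical illustration of that layout, not glibc's tcbhead_t definition, and it assumes an LP64 target where pointers and uint64_t are 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DG_TLS_BASE 0x2a0 /* first reserved offset from %fs, as listed above */

/* Hypothetical view of the four reserved 8-byte TLS entries. */
struct dg_tls_slots {
    uint64_t **cab;       /* %fs:0x2a0  CAB address */
    uint64_t   cab_index; /* %fs:0x2a8  CAB current index */
    uint64_t   cab_size;  /* %fs:0x2b0  CAB size */
    uint64_t   canary;    /* %fs:0x2b8  DynaGuard canary */
};

_Static_assert(sizeof(void *) == 8, "sketch assumes an LP64 target");

/* Translates a member offset into the corresponding %fs-relative offset. */
static size_t dg_fs_offset(size_t member_offset) {
    return DG_TLS_BASE + member_offset;
}
```

For example, `dg_fs_offset(offsetof(struct dg_tls_slots, canary))` gives 0x2b8, the offset the instrumented code reads instead of glibc's %fs:0x28.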

SLIDE 28

Compiler-based Version: DynaGuard GCC Plugin

Original:

    ; function prologue
    push   %rbp
    mov    %rsp,%rbp
    sub    $0x40,%rsp
    ; canary stack placement
    mov    %fs:0x28,%rax
    mov    %rax,-0x8(%rbp)
    xor    %eax,%eax
    ...
    ; canary check
    mov    -0x8(%rbp),%rcx
    xor    %fs:0x28,%rcx
    je     <exit>
    callq  <__stack_chk_fail@plt>

DynaGuard:

    push   %rbp
    mov    %rsp,%rbp
    sub    $0x40,%rsp
    push   %r14                 (1)
    push   %r15
    lea    -0x8(%rbp),%rax      (2)
    mov    %fs:0x2a0,%r14       (3)
    mov    %fs:0x2a8,%r15       (4)
    mov    %rax,(%r14,%r15,8)   (5)
    incq   %fs:0x2a8            (6)
    pop    %r15                 (7)
    pop    %r14
    mov    %fs:0x2b8,%rax       (8)
    mov    %rax,-0x8(%rbp)
    xor    %eax,%eax
    ...
    decq   %fs:0x2a8            (9)
    mov    -0x8(%rbp),%rcx
    xor    %fs:0x2b8,%rcx       (10)
    je     <exit>
    callq  <__stack_chk_fail@plt>

SLIDE 29

Compiler-based Version: DynaGuard Runtime Library

  • PIC module loaded via LD_PRELOAD
  • Invoked only for CAB setup and resize operations, as well as for canary updates
  • All push/pop operations of canary addresses are implemented by the GCC plugin

SLIDE 30

Compiler-based Version: DynaGuard Runtime Library

  • Constructor routine allocates CAB in main thread
  • Hooks:
      • pthread_create, to set up the entries in TLS before start_routine starts executing
      • the fork() system call, to update all canaries in the child process's stack (before the child commences execution)
      • stack unwinding routines, to update the CAB accordingly
  • Write-protects the last page of CAB, registers a SIGSEGV handler, and hooks signal and sigaction
      • If the signal is due to a full CAB, resize accordingly and resume execution
      • Else, invoke the original signal handler and let the application handle the signal
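The fork() hook can be approximated with standard POSIX facilities. The sketch below swaps in pthread_atfork() for the LD_PRELOAD-based hook described on the slides, uses globals as stand-ins for the TLS canary and the CAB, and reads /dev/urandom for fresh values; all names (`g_canary`, `g_live`, `refresh_in_child`, `demo_fork_refresh`) are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static uint64_t  g_canary;   /* stand-in for the canary kept in the TLS */
static uint64_t *g_live[16]; /* stand-in for the canary address buffer */
static size_t    g_live_count;

/* Draws a fresh 64-bit value; aborts if no randomness is available. */
static uint64_t fresh_canary(void) {
    uint64_t v;
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0 || read(fd, &v, sizeof v) != (ssize_t)sizeof v)
        abort();
    close(fd);
    return v;
}

/* Runs in the child right after fork(), before fork() returns there. */
static void refresh_in_child(void) {
    uint64_t fresh = fresh_canary();
    g_canary = fresh;
    for (size_t i = 0; i < g_live_count; i++)
        *g_live[i] = fresh;
}

/* Returns 1 if the child got a new canary, its inherited frame still passes
 * the check, and the parent's canary is unchanged; 0 or -1 otherwise. */
static int demo_fork_refresh(void) {
    uint64_t frame_canary; /* canary copy in a frame the child inherits */
    g_canary = fresh_canary();
    frame_canary = g_canary;
    g_live[g_live_count++] = &frame_canary;
    uint64_t parent_value = g_canary;

    if (pthread_atfork(NULL, NULL, refresh_in_child) != 0)
        return -1;
    pid_t pid = fork();
    if (pid == 0)
        _exit((g_canary != parent_value && frame_canary == g_canary) ? 0 : 1);
    g_live_count--;
    if (pid < 0)
        return -1;
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
           g_canary == parent_value;
}
```

Each call registers the handler again, which is harmless for a demonstration but would be done once, in a constructor, in a real library.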

SLIDE 31

Implementation: DBI-based Version

Implemented using Intel’s Pin DBI framework

  • No source code needed
  • Same design as previously except now execution occurs under Pin

[Figure: Pin architecture diagram. Labels: Instr. Binary, DynaGuard, Pin Instrumentation API, Pin Virtual Machine, Code Cache, Analysis Code, Instr. Code, Native Code, Single Address Space, User Space, Kernel Space.]

SLIDE 32

Implementation: DBI-based Version

  • Monitor all canary push and pop operations
  • Update all canaries in the child process accordingly upon a fork
  • No need for complex tracking of stack unwinding: simply track modifications of the stack pointer
  • Maintain a per-thread CAB buffer, eliminating the overhead of using the Pin built-in trace buffer

Instrumentation Pseudocode:

    if ((instruction has segment prefix) &&
        (prefix is one of fs/gs) &&
        (offset from fs/gs is 0x28/0x14) &&
        (instr. is a 'mov' from mem to reg) &&
        (next instr. is a 'mov' from reg to mem) &&
        (dest. operand (register) of current instr. is the source operand of next instr.)) {
        insert_analysis_call(before_next_instr,
                             push_canary(thread_context, canary_address))
    }

Sample Function Prologue:

    push   %rbp
    mov    %rsp,%rbp
    sub    $0x40,%rsp
    mov    %fs:0x28,%rax     (1)
    mov    %rax,-0x8(%rbp)   (2)
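The pattern in the pseudocode can be expressed as a predicate over two consecutive instructions. The sketch below does not use Pin's API: `struct insn` is a hypothetical, simplified decoded instruction, and the pairing of %fs with offset 0x28 (x86-64) and %gs with offset 0x14 (x86) reflects where glibc keeps the stack canary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum seg  { SEG_NONE, SEG_FS, SEG_GS };
enum kind { INS_OTHER, INS_MOV_MEM_TO_REG, INS_MOV_REG_TO_MEM };

/* Hypothetical, simplified decoded instruction (not Pin's INS type). */
struct insn {
    enum kind kind;
    enum seg  seg;  /* segment prefix of the memory operand, if any */
    int64_t   disp; /* displacement of the memory operand */
    int       reg;  /* destination register of a load, source of a store */
};

/* True if `cur` loads the canary from the TLS and `next` stores that same
 * register to memory, i.e. the pair forms a canary push. */
static bool is_canary_push(const struct insn *cur, const struct insn *next) {
    bool tls_canary_load =
        cur->kind == INS_MOV_MEM_TO_REG &&
        ((cur->seg == SEG_FS && cur->disp == 0x28) ||
         (cur->seg == SEG_GS && cur->disp == 0x14));
    return tls_canary_load &&
           next->kind == INS_MOV_REG_TO_MEM &&
           next->reg == cur->reg;
}
```

Applied to the sample prologue, instructions (1) and (2) match, and the memory operand of (2) is the canary address that would be recorded in the CAB.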

SLIDE 33

Evaluation

Effectiveness:

  • Successfully defends against BROP and Nginx public exploits without breaking correctness

Performance:

  • SPEC CPU 2006 INT benchmarks
  • Popular Server Applications: Apache, Nginx, PostgreSQL, MySQL, SQLite
  • Phoronix default profile for all server applications except MySQL (for which we used SysBench)
  • Average overhead 1.2% in GCC version, 2.92% on top of Pin in DBI version

SLIDE 34

Compiler-based version of DynaGuard

[Figure: bar chart of slowdown (normalized over native), y-axis from 0.995 to 1.06. SPEC CPU2006 benchmarks: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk. I/O-bound benchmarks: Apache, Nginx, PostgreSQL, SQLite, MySQL.]

  • SPEC CPU2006: 1.5%
  • Server applications (Phoronix and SysBench): 0.46%

SLIDE 35

DBI-based version of DynaGuard

[Figure: bar chart of slowdown (normalized over native), y-axis from 0.9 to 3.3, with two series (Pin, DynaGuard). SPEC CPU2006 benchmarks: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 458.sjeng, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk. I/O-bound benchmarks: Apache, Nginx, PostgreSQL, SQLite, MySQL.]

  • SPEC CPU2006: 3.2% - 2.19x (avg 1.56x)
  • PostgreSQL: 0.4%
  • SQLite: 8.19%
  • MySQL: 214%
  • Apache: 3.2x
  • Nginx: 2.8x

Average CPU overhead 170.66%, 2.92% atop Pin

SLIDE 36

Summary

  • DynaGuard protects canary-based defenses against byte-by-byte brute forcing of the canary cookie
  • Supports applications for which source code is available as well as binary-only programs
  • Offers a lightweight solution for the more general problem of memory duplication with respect to reduced entropy for security-sensitive applications (e.g., PRNGs of OpenSSL and LibreSSL)
  • Has minimal incremental overhead over the respective underlying protection (e.g., GCC’s SSP and Pin’s native DBI, respectively)
  • Source code is available at https://github.com/nettrino/dynaguard