Guarding Vulnerable Code: Module 1: Sanitization (Mathias Payer)



slide-1
SLIDE 1

1

Guarding Vulnerable Code: Module 1: Sanitization

Mathias Payer, Purdue University http://hexhive.github.io

slide-2
SLIDE 2

2

Vulnerabilities everywhere?

slide-3
SLIDE 3

3

Common Languages: TIOBE’18

Jul 2018  Jul 2017  Change  Language      Ratings  Change
1         1                 Java          16.139%  +2.37%
2         2                 C             14.662%  +7.34%
3         3                 C++            7.615%  +2.04%
4         4                 Python         6.361%  +2.82%
5         7         +       VB .NET        4.247%  +1.20%
6         5         -       C#             3.795%  +0.28%
7         6         -       PHP            2.832%  -0.26%
8         8                 JavaScript     2.831%  +0.22%
9         -         ++      SQL            2.334%  +2.33%
10        18        ++      Objective-C    1.453%  -0.44%
slide-4
SLIDE 4

4

Software is highly complex

  • Google Chrome: 76 MLoC
  • Gnome: 9 MLoC
  • Xorg: 1 MLoC
  • glibc: 2 MLoC
  • Linux kernel: 17 MLoC

Low-level languages (C/C++) trade type safety and memory safety for performance

slide-5
SLIDE 5

5

Defense: Testing vs. Mitigations

Mitigations

  • Stop exploitation
  • Always on
  • Low overhead

Software Testing

  • Discover bugs
  • Development tool
  • Result oriented
slide-6
SLIDE 6

6

Memory Corruption

slide-7
SLIDE 7

7

Memory error: invalid dereference

  • Dangling pointer (temporal): free(foo); *foo = 23;
  • Out-of-bounds pointer (spatial): char foo[40]; foo[42] = 23;
  • Violation iff: pointer is read, written, or freed
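The two error classes can be illustrated without executing undefined behavior. The following is a minimal sketch, not the slides' own code: it assumes standard-library checks (std::array::at for the spatial case, std::weak_ptr for the temporal case) as stand-ins for what a memory-safety policy would enforce. The helper names write_checked and write_if_alive are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <stdexcept>

// Spatial check: std::array::at throws std::out_of_range instead of
// writing past the end, unlike the raw foo[42] = 23 on the slide.
bool write_checked(std::array<char, 40>& foo, std::size_t idx, char value) {
    try {
        foo.at(idx) = value;
        return true;
    } catch (const std::out_of_range&) {
        return false;  // out-of-bounds access rejected
    }
}

// Temporal check: a weak_ptr can tell whether the object it refers to has
// already been destroyed, so a dangling access can be refused.
bool write_if_alive(const std::weak_ptr<int>& ref, int value) {
    if (auto p = ref.lock()) {
        *p = value;
        return true;
    }
    return false;  // object already freed
}
```

With a 40-element array, index 39 is accepted and index 42 is rejected; once the owning shared_ptr is reset, write_if_alive refuses the write.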

slide-8
SLIDE 8

8

Type Confusion

slide-9
SLIDE 9

9

Type confusion through downcasts

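Class hierarchy: Greeter and Exec both derive from Base.

Greeter *g = new Greeter();
Base *b = static_cast<Base*>(g);  // √ upcast to the parent class
Exec *e = static_cast<Exec*>(b);  // X downcast to a sibling class: type confusion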

slide-10
SLIDE 10

10

C++ casting operations

  • static_cast<ToClass>(Object)
    – Compile time check
    – No runtime type information

  • dynamic_cast<ToClass>(Object)
    – Runtime check
    – Requires Runtime Type Information (RTTI)
    – Not used in performance critical code
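The difference between the two casts can be shown in a few lines. This is a minimal sketch: the class names follow the slides, but the virtual destructor on Base and the helpers is_greeter and is_exec are assumptions added so that RTTI is available and the result can be observed.

```cpp
#include <string>

struct Base { virtual ~Base() = default; };  // polymorphic, so RTTI is available
struct Greeter : Base { std::string hi() const { return "hi"; } };
struct Exec : Base { std::string run() const { return "exec"; } };

// dynamic_cast consults RTTI at runtime and yields nullptr for a wrong downcast.
bool is_greeter(Base* b) { return dynamic_cast<Greeter*>(b) != nullptr; }
bool is_exec(Base* b)    { return dynamic_cast<Exec*>(b) != nullptr; }

// static_cast<Exec*>(b) would compile for any Base*, even one that points
// to a Greeter; using the result would then be undefined behavior.
```

Calling is_exec on a Greeter object returns false, which is exactly the information a static_cast never has.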

slide-11
SLIDE 11

11

Static cast

Base *b = …; a = static_cast<Greeter*>(b);

    movq -24(%rbp), %rax   # Load pointer
                           # Type “check”
    movq %rax, -40(%rbp)   # Store pointer

slide-12
SLIDE 12

12

Dynamic cast (O2)

Base *b = …; a = dynamic_cast<Greeter*>(b);

    leaq _ZTI7Greeter(%rip), %rdx
    leaq _ZTI4Base(%rip), %rsi
    xorl %ecx, %ecx
    movq %rbp, %rdi            # Load pointer
    call __dynamic_cast@PLT    # Type check

slide-13
SLIDE 13

13

Type confusion

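class Base { public: int x; };
class Greeter : public Base { public: int y; virtual void Hi(); };
…
Base *Bptr = new Base();
Greeter *Gptr;
Gptr = static_cast<Greeter*>(Bptr);  // Type confusion
Gptr->y = 0x43;                      // Memory safety violation!
Gptr->Hi();                          // Control-flow hijacking

Object layout: a Base object (B) holds only x; a Greeter object (G) holds vtable*, x, and y. Bptr and Gptr point to the same Base object, so the vtable* and y that Gptr expects (vtable*?, y?) do not exist in the allocation.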

slide-14
SLIDE 14

14

Type Confusion Demo

slide-15
SLIDE 15

15

C++ virtual dispatch

class Base { … };
class Exec: public Base {
public:
  virtual void exec(const char *prg) { system(prg); }
};
class Greeter: public Base {
public:
  virtual void sayHi(const char *str) { std::cout << str << std::endl; }
};

Greeter *greeter = new Greeter();
greeter->sayHi("Oh, hello there!");

Class hierarchy: Greeter and Exec both derive from Base.

slide-16
SLIDE 16

16

Simple exploitation demo

int main() {
  Base *b1 = new Greeter();
  Base *b2 = new Exec();
  Greeter *g;

  g = static_cast<Greeter*>(b1);
  g->sayHi("Greeter says hi!");   // g[0][0](str);

  g = static_cast<Greeter*>(b2);  // type confusion: b2 points to an Exec
  g->sayHi("/usr/bin/xcalc");     // g[0][0](str); dispatches to Exec::exec

  delete b1;
  delete b2;
  return 0;
}

Object layout: b1 → vtable* → GreeterT; b2 → vtable* → ExecT
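Why the second sayHi call ends up in Exec::exec can be modeled with a hand-written vtable. This is a sketch under stated assumptions: VTable, Object, call_slot0, and the two slot functions are hypothetical stand-ins for what the compiler generates, and run_prog only returns a string instead of calling system, so nothing is executed.

```cpp
#include <string>

// Hand-written stand-in for a compiler-generated vtable with one slot.
using Slot = std::string (*)(const char*);
struct VTable { Slot slot0; };
struct Object { const VTable* vtable; };

std::string say_hi(const char* s)   { return std::string("print: ") + s; }
// Stand-in for Exec::exec; it only reports what would be run.
std::string run_prog(const char* s) { return std::string("system: ") + s; }

const VTable kGreeterT{&say_hi};
const VTable kExecT{&run_prog};

// Conceptually what g->sayHi(str) compiles to: g[0][0](str).
// The static type of g plays no role; only the object's vtable pointer does.
std::string call_slot0(const Object* g, const char* str) {
    return g->vtable->slot0(str);
}
```

An Object whose vtable pointer is &kExecT, dispatched through call_slot0 with "/usr/bin/xcalc", returns "system: /usr/bin/xcalc": the same call site reaches a different function because slot 0 of the Exec table holds exec.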

slide-17
SLIDE 17

17

Sanitization

slide-18
SLIDE 18

18

Problem: broken abstractions?

C/C++:

    void log(int a) {
      printf("Log: ");
      printf("%d", a);
    }
    void (*fun)(int) = &log;
    void init() {
      fun(15);
    }

ASM:

    log: ...
    fun: .quad log
    init: ...
      movl $15, %edi
      movq fun(%rip), %rax
      call *%rax

slide-19
SLIDE 19

19

LLVM Sanitization

  • Test cases detect bugs through assertions, segmentation faults, traps, exceptions
  • Enforce stronger policies during testing!
    – Address Sanitizer: memory safety
    – Leak Sanitizer: memory leaks
    – Memory Sanitizer: uninitialized memory
    – UBSan: undefined behavior
    – Thread Sanitizer: data races
    – HexVASAN: variadic argument checker
    – HexType: type safety
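As a rough illustration of "stronger policies during testing", the sketch below contains a latent out-of-bounds read that a plain build may never report. The function name read_unchecked and the file name demo.cpp are hypothetical; the -fsanitize flags in the comment are the ones documented for Clang.

```cpp
#include <cstddef>
#include <vector>

// Latent spatial bug: idx is never checked. For idx >= v.size() this reads
// out of bounds, which is undefined behavior and may go unnoticed in a
// plain build.
int read_unchecked(const std::vector<int>& v, std::size_t idx) {
    return v.data()[idx];
}

// Build sketches (demo.cpp is a placeholder file name):
//   clang++ -g -fsanitize=address   demo.cpp   // AddressSanitizer
//   clang++ -g -fsanitize=undefined demo.cpp   // UBSan
//   clang++ -g -fsanitize=thread    demo.cpp   // ThreadSanitizer
//   clang++ -g -fsanitize=memory    demo.cpp   // MemorySanitizer
// Under AddressSanitizer, a test that calls read_unchecked(v, v.size())
// is expected to abort with a heap-buffer-overflow report.
```

The point of the sanitizer build is that the failing test case no longer depends on the bug happening to crash.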

slide-20
SLIDE 20

20

Type Safety

slide-21
SLIDE 21

21

Type confusion detection*

  • A static cast is checked only at compile time
    – Fast but no runtime guarantees
  • Dynamic casts are checked at runtime
    – High overhead, limited to polymorphic classes
  • HexType design:
    – Conceptually check all casts dynamically
    – Aggressively optimize design and implementation

* TypeSanitizer: Practical Type Confusion Detection. Istvan Haller, Yuseok Jeon, Hui Peng, Mathias Payer, Herbert Bos, Cristiano Giuffrida, Erik van der Kouwe. In CCS'16
* HexType: Efficient Detection of Type Confusion Errors for C++. Yuseok Jeon, Priyam Biswas, Scott A. Carr, Byoungyoung Lee, and Mathias Payer. In CCS'17

slide-22
SLIDE 22

22

Making type checks explicit

  • Enforce runtime check at all cast sites
    – static_cast<ToClass>(Object)
    – dynamic_cast<ToClass>(Object)
    – reinterpret_cast<ToClass>(Object)
    – (ToClass)(Object)
  • Build global type hierarchy
  • Keep track of the allocation type of each object
    – Must instrument all forms of allocation
    – Requires disjoint metadata
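The three ingredients (a global type hierarchy, per-object allocation types kept in disjoint metadata, and a check at each cast site) can be sketched as a small runtime table. This is an illustrative model only, not HexType's implementation: the class TypeTracker and its methods are hypothetical, types are identified by name strings, and only single inheritance is handled.

```cpp
#include <map>
#include <string>

// Hypothetical sketch of explicit cast checking; all names are illustrative.
class TypeTracker {
public:
    // Global type hierarchy: child -> direct parent (single inheritance only).
    void add_class(const std::string& child, const std::string& parent) {
        parent_[child] = parent;
    }
    // Disjoint metadata: the allocation type is stored outside the object.
    void on_alloc(const void* obj, const std::string& type) { alloc_type_[obj] = type; }
    void on_free(const void* obj) { alloc_type_.erase(obj); }

    // A cast to `target` passes iff the allocated type is `target` or derives from it.
    bool check_cast(const void* obj, const std::string& target) const {
        auto it = alloc_type_.find(obj);
        if (it == alloc_type_.end()) return false;  // untracked object: report
        std::string t = it->second;
        while (true) {
            if (t == target) return true;
            auto p = parent_.find(t);
            if (p == parent_.end()) return false;   // reached the root without a match
            t = p->second;
        }
    }

private:
    std::map<std::string, std::string> parent_;
    std::map<const void*, std::string> alloc_type_;
};
```

In this model an object allocated as Base fails check_cast(obj, "Greeter"), while an object allocated as Greeter passes a cast to Base. Every allocation path has to call on_alloc, which is why the slide stresses instrumenting all forms of allocation.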

slide-23
SLIDE 23

23

HexType: design

[Figure: HexType build pipeline. Components: Source code, Clang (Type Hierarchy Information), LLVM Pass (Instrumentation: type casting verification), Link with the HexType Runtime Library, HexType Binary]

slide-24
SLIDE 24

24

HexType: aggressive optimization

  • Limit tracing to unsafe types

– Remove tracing of types that are never cast

  • Limit checking to unsafe casts

– Remove statically verifiable casts

  • No more RTTI for dynamic casts

– Replace dynamic casts with fast lookup

slide-25
SLIDE 25

25

Demo Time!

slide-26
SLIDE 26

26

HexType coverage

slide-27
SLIDE 27

27

Newly discovered bugs

  • Discovered seven new vulnerabilities, in Apache Xerces C++ and the Qt base library

[Figure: Apache Xerces C++ classes DOMNode, DOMCharacterData, DOMElement, DOMElementImpl, DOMText, DOMTextImpl, with one cast marked "Type Confusion!"; Qt base library classes QMapNodeBase and QMapNode]

slide-28
SLIDE 28

28

Sanitizer Summary: Type Safety

  • Type confusion fundamental in today’s exploits
  • Existing sanitizers are incomplete, partial, slow
  • HexType
    – (Almost) full coverage (2-6x increase)
    – Reasonable overhead (SPEC CPU: 0-32x improvement, Firefox: 0-0.5x slowdown)
    – Future work: remaining coverage, optimizations

slide-29
SLIDE 29

29

T-Fuzz

slide-30
SLIDE 30

30

Fuzzing Challenges

  • Challenges
    – Shallow coverage
    – Hard to find “deep” bugs
  • Root cause
    – Fuzzer-generated inputs cannot bypass complex sanity checks in the target program
    – Existing work limits itself to input generation

[Figure: control flow from start to end through check1, check2, check3, separating shallow code paths from deep code paths; the bug lies on a deep path]

slide-31
SLIDE 31

31

T-Fuzz: Fuzz the Program!

  • Option 1: generate input to bypass checks by heavy-weight program analysis techniques
    – Driller (concolic analysis)
    – VUzzer (dynamic taint analysis)
  • Our idea: remove program’s sanity checks
    – Checks filter orthogonal input, e.g., magic values, checksum, or hashes (Non-Critical Check, NCC)
    – Insight: removing NCCs is safe

if (strncmp(hdr, "ELF", 3) == 0) {
  // main program logic
} else {
  error();
}
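The effect of removing such a check can be sketched by comparing the original function with a transformed one. This is a minimal illustration that assumes removal is done by negating the condition, as the slides describe later under Program Transformation; the function names and return strings are hypothetical.

```cpp
#include <cstring>
#include <string>

// Original check: only inputs that start with the magic value reach the logic.
std::string parse_original(const char* hdr) {
    if (std::strncmp(hdr, "ELF", 3) == 0) {
        return "main logic";
    }
    return "error";
}

// Transformed program: the check is negated, so fuzzer inputs that lack the
// magic value (nearly all random inputs) now reach the main logic.
std::string parse_transformed(const char* hdr) {
    if (std::strncmp(hdr, "ELF", 3) != 0) {
        return "main logic";
    }
    return "error";
}
```

A random input such as "AAAA" is rejected by parse_original but exercises the main logic in parse_transformed. A crash found this way still has to be checked against the original program, since the original check might have prevented it.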

slide-32
SLIDE 32

32

Design and Implementation

  • Fuzzer generates inputs
  • When “stuck”
    – Detect NCCs*
  • Transform program
  • Verify crashes

*Approximation of NCCs: edges in the CFG connecting covered/uncovered nodes

[Figure: T-Fuzz workflow with components Fuzzer (e.g. AFL), Program Transformer, and Crash Analyzer; arrows labeled Inputs, Transformed Programs, and Crashing inputs; outputs Bug Reports and False Positives]

slide-33
SLIDE 33

33

Detecting NCCs

  • Approximate NCCs as edges connecting covered and uncovered nodes in CFG
    – Over approximate, may contain false positives
    – Lightweight and simple to implement

[Figure: CFG with legend: Covered Node, Uncovered Node, NCC Candidates]
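The approximation above amounts to a single pass over the CFG edges. The following is a minimal sketch, assuming the CFG is given as an adjacency map over integer node ids and the fuzzer's coverage as a set; the function name ncc_candidates and the type aliases are hypothetical.

```cpp
#include <map>
#include <set>
#include <utility>
#include <vector>

using Node = int;
using Edge = std::pair<Node, Node>;

// Returns every CFG edge whose source node was covered by the fuzzer but
// whose destination was not. These edges are the NCC candidates; the set is
// an over-approximation and may contain checks that are actually critical.
std::vector<Edge> ncc_candidates(const std::map<Node, std::vector<Node>>& cfg,
                                 const std::set<Node>& covered) {
    std::vector<Edge> out;
    for (const auto& entry : cfg) {
        const Node src = entry.first;
        if (covered.count(src) == 0) continue;  // source never executed
        for (Node dst : entry.second) {
            if (covered.count(dst) == 0) out.emplace_back(src, dst);
        }
    }
    return out;
}
```

For a diamond-shaped CFG 0→{1,2}, 1→3, 2→3 with nodes 0, 2, and 3 covered, the only candidate is the edge (0, 1).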

slide-34
SLIDE 34

34

Program Transformation

  • Our approach: negate NCCs
    – Simple: static binary rewriting
    – Zero runtime overhead in resulting target program
    – Unchanged CFG
    – Trace in transformed program maps to original program
    – Path constraints of original program can be recovered

[Figure: the original program branches on A == B into a true branch and a false branch; the transformed program has the same shape with the check negated to A != B]
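One way such a negation can be done with static binary rewriting is to flip the condition of a conditional jump in place. The slides do not give the rewriting details, so the sketch below is an assumption: it targets x86, where the lowest bit of a Jcc opcode selects between a condition and its negation (for example je 0x74 and jne 0x75). The function name negate_jcc is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Negates the x86 conditional jump that starts at `off`, in place.
// Returns false if the bytes at `off` are not a Jcc instruction.
bool negate_jcc(std::vector<std::uint8_t>& code, std::size_t off) {
    if (off < code.size() && (code[off] & 0xF0) == 0x70) {
        code[off] ^= 0x01;      // short form, rel8: opcodes 0x70..0x7F
        return true;
    }
    if (off + 1 < code.size() && code[off] == 0x0F &&
        (code[off + 1] & 0xF0) == 0x80) {
        code[off + 1] ^= 0x01;  // near form, rel32: opcodes 0x0F 0x80..0x8F
        return true;
    }
    return false;
}
```

Because only one bit changes, the instruction length and the jump target stay the same. That is consistent with the slide's claims of an unchanged CFG and zero runtime overhead, and it keeps addresses in a trace of the transformed program aligned with the original.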

slide-35
SLIDE 35

35

Comparison to Symbolic Execution

  • Explores all code paths, tracks constraints
  • Path explosion, e.g., loops
  • Each branch doubles the number of code paths
  • Resource requirement
  • Theoretically beautiful, limited scalability

[Figure: execution tree whose leaves are (Path1, constraint set1) ... (Pathn, constraint setn)]

slide-36
SLIDE 36

36

Comparison to Concolic Execution

  • Guided by concrete inputs
  • Follows single code path, collects constraints for new code paths
  • Reduced resource requirements
  • Still an exponential number of paths to explore!

[Figure: execution tree with one path followed by the concrete input and a branch labeled C1 / Not C1]

slide-37
SLIDE 37

37

Comparison to Driller (Fuzz & CE)

  • Fuzzing until coverage wall
  • When fuzzing gets “stuck”, concolic execution explores new code paths using fuzzer generated inputs
  • Limitations
    – “SE & constraints solving” slows down fuzzing
    – Not able to bypass “hard” checks

[Figure: Driller loop with components Fuzzer, Inputs (mutating), target program, Crashes, and SE & constraint solving]

slide-38
SLIDE 38

38

T-Fuzz: fuzz first, solve only crashes

  • Fuzzing/SE decoupled
  • SE only applied to detected crashes
  • For “hard” checks, T-Fuzz detects the guarded bug, but cannot verify it

[Figure: T-Fuzz in action, with components Fuzzer, program, Program Transformation, Crashes, and SE & constraints solving]

slide-39
SLIDE 39

39

Evaluation

  • Implementation
    – Fuzzer: shellphish fuzzer (python wrapper of AFL)
    – Program Transformer: angr tracer, radare2
    – Crash Analyzer: 2k LoC Python hackery
  • Evaluation
    – DARPA CGC dataset
    – LAVA-M dataset
    – 4 real-world programs

slide-40
SLIDE 40

40

DARPA CGC Dataset

  • Improvement over Driller/AFL: 55 (45%) / 61 (58%)
  • Driller outperforms T-Fuzz
    – 3 due to false crashes (L1)
    – 7 due to transformation explosion (L2)

Method             # bugs
AFL                105
Driller            121
T-Fuzz             166
Driller - AFL      16
T-Fuzz - AFL       61
T-Fuzz - Driller   55
Driller - T-Fuzz   10

[Venn diagram: AFL (105), T-Fuzz (166), Driller (121); region labels 10, 6, 55]
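The percentages on this slide follow from the bug counts in its table. The sketch below reproduces the arithmetic under the assumption that each improvement is the number of additional bugs divided by the baseline tool's total, truncated to a whole percent; the constant and function names are hypothetical.

```cpp
// Bug counts taken from the slide's table.
constexpr int kAfl = 105;
constexpr int kDriller = 121;
constexpr int kTFuzz = 166;
constexpr int kTFuzzOnlyVsDriller = 55;  // T-Fuzz - Driller
constexpr int kDrillerOnlyVsTFuzz = 10;  // Driller - T-Fuzz
constexpr int kTFuzzOnlyVsAfl = 61;      // T-Fuzz - AFL

// Additional bugs relative to a baseline, in whole percent (truncated).
constexpr int improvement_percent(int extra, int baseline) {
    return extra * 100 / baseline;
}

// 55 / 121 is about 45% (over Driller); 61 / 105 is about 58% (over AFL).
static_assert(improvement_percent(kTFuzzOnlyVsDriller, kDriller) == 45, "vs Driller");
static_assert(improvement_percent(kTFuzzOnlyVsAfl, kAfl) == 58, "vs AFL");
```

The set differences are also consistent with the totals: 121 - 10 + 55 = 166 and 105 + 61 = 166.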

slide-41
SLIDE 41

41

LAVA-M Dataset

  • T-Fuzz outperforms VUzzer and Steelix for “hard” checks
  • T-Fuzz defeated by Steelix due to transformation explosion in who, but still found more bugs than VUzzer
  • T-Fuzz found 1 unintended bug in who

Program   # of bugs   VUzzer   Steelix   T-Fuzz
base64    44          17       43        43
uniq      28          27       24        26
md5sum    57          1        28        49
who       2136        50       194       95*

slide-42
SLIDE 42

42

Evaluation on Real Programs

  • Time budget: 24 hours
    – T-Fuzz triggers more crashes than AFL
    – T-Fuzz found 3 new bugs in latest versions of ImageMagick and libpoppler (marked by *)

Program + library                  AFL   T-Fuzz
pngfix + libpng (1.7.0)                  11
tiffinfo + libtiff (3.8.2)         53    124
magick + ImageMagick (7.0.7)             2*
pdftohtml + libpoppler (0.62.0)          1*

slide-43
SLIDE 43

43

T-Fuzz Summary

  • Fuzzers hit coverage wall, no “deep” bugs
    – T-Fuzz mutates both input and target program
    – T-Fuzz improves over Driller/AFL by 45%/58%
    – T-Fuzz triggers bugs guarded by “hard” checks
    – New bugs: 1 in LAVA-M, 3 in real-world programs

slide-44
SLIDE 44

44

Conclusion

slide-45
SLIDE 45

45

Conclusion

  • Goal: Protect systems despite vulnerabilities
  • Sanitization finds bugs during testing

    – HexType brings type safety to C++
    – T-Fuzz explores deep program paths

  • Combine sanitization and fuzzing for best results

Thank you! Questions?

Source: https://hexhive.github.io
