Guarding Vulnerable Code: Module 1: Sanitization
Mathias Payer, Purdue University http://hexhive.github.io
Vulnerabilities everywhere?
Common Languages: TIOBE Index, July 2018
Jul 2018  Jul 2017  Change  Language     Ratings   Change
1         1                 Java         16.139%   +2.37%
2         2                 C            14.662%   +7.34%
3         3                 C++          7.615%    +2.04%
4         4                 Python       6.361%    +2.82%
5         7         +       VB .NET      4.247%    +1.20%
6         5                 (missing)    3.795%    +0.28%
7         6                 (missing)    2.832%    (missing)
8         8                 JavaScript   2.831%    +0.22%
9         -         ++      SQL          2.334%    +2.33%
10        18        ++      Objective-C  1.453%    (missing)
Low-level languages (C/C++) trade type safety and memory safety for performance
Dangling pointer (temporal): free(foo); *foo = 23;
Out-of-bounds pointer (spatial): char foo[40]; foo[42] = 23;
Violation iff the pointer is read, written, or freed
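To make the two violation classes concrete, here is a minimal sketch of what a checked access could look like. `CheckedBuffer`, `release`, and `write_is_rejected` are hypothetical names for illustration, not part of any sanitizer; real tools instrument raw pointers rather than wrapping them in a class.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical checked buffer: every access is validated against the
// object's bounds (spatial) and its liveness (temporal).
class CheckedBuffer {
public:
    explicit CheckedBuffer(std::size_t size) : data_(size, 0), live_(true) {}

    // Models free(): the storage is marked dead; later accesses are rejected.
    void release() { live_ = false; }

    void write(std::size_t index, char value) {
        check(index);
        data_[index] = value;
    }

    char read(std::size_t index) const {
        check(index);
        return data_[index];
    }

private:
    void check(std::size_t index) const {
        if (!live_) throw std::logic_error("temporal violation: use after free");
        if (index >= data_.size()) throw std::out_of_range("spatial violation: out of bounds");
    }

    std::vector<char> data_;
    bool live_;
};

// Returns true if the write was rejected by the spatial or temporal check.
bool write_is_rejected(CheckedBuffer &buf, std::size_t index, char value) {
    try {
        buf.write(index, value);
        return false;
    } catch (const std::logic_error &) {  // std::out_of_range derives from std::logic_error
        return true;
    }
}
```

With a 40-byte buffer, a write at index 42 is rejected as a spatial violation, and any write after `release()` is rejected as a temporal violation.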
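Class hierarchy: Greeter and Exec both derive from Base.
Greeter *g = new Greeter();
Base *b = static_cast<Base*>(g);   // upcast to the parent class: valid √
Exec *e = static_cast<Exec*>(b);   // downcast to a sibling class: compiles, but the object is a Greeter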
static_cast<ToClass>(Object)
– Compile-time check
– No runtime type information
dynamic_cast<ToClass>(Object)
– Runtime check
– Requires Runtime Type Information (RTTI)
– Not used in performance-critical code
Base *b = …; a = static_cast<Greeter*>(b);
    movq -24(%rbp), %rax    # Load pointer
                            # Type “check”: no instructions are emitted
    movq %rax, -40(%rbp)    # Store pointer
Base *b = …; a = dynamic_cast<Greeter*>(b);
    leaq _ZTI7Greeter(%rip), %rdx
    leaq _ZTI4Base(%rip), %rsi
    xorl %ecx, %ecx
    movq %rbp, %rdi            # Load pointer
    call __dynamic_cast@PLT    # Type check
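class Base { public: int x; };
class Greeter : public Base { public: int y; virtual void Hi(); };
…
Base *Bptr = new Base();
Greeter *Gptr;
Gptr = static_cast<Greeter*>(Bptr);  // Type confusion
Gptr->y = 0x43;                      // Memory safety violation!
Gptr->Hi();                          // Control-flow hijacking
Memory layout: a Base object (B) holds only x; a Greeter object (G) holds x, vtable*, and y. Bptr points to a B, so the vtable* and y that Gptr expects lie outside the object (vtable*? y?).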
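The size mismatch behind this violation can be checked at compile time. The sketch below mirrors the two classes (members made public and `Hi` given an empty body so it compiles on its own); `bytes_past_base` is a hypothetical helper, and exact sizes depend on the platform ABI.

```cpp
#include <cstddef>

// Same shape as the classes in the example above, with public members so
// that the sketch compiles on its own.
class Base {
public:
    int x;
};

class Greeter : public Base {
public:
    int y;
    virtual void Hi() {}
};

// A Greeter needs room for x, y, and a vtable pointer, so it is strictly
// larger than a Base. Accessing y or the vtable pointer through a Greeter*
// that really points to a Base reaches past the end of the allocation.
static_assert(sizeof(Greeter) > sizeof(Base),
              "Greeter occupies more memory than Base");

// Number of bytes a confused Greeter* may touch beyond a Base allocation.
constexpr std::size_t bytes_past_base() {
    return sizeof(Greeter) - sizeof(Base);
}
```

Whatever happens to live in those extra bytes is what `Gptr->y = 0x43` overwrites and what `Gptr->Hi()` interprets as a vtable pointer.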
#include <cstdlib>   // system
#include <iostream>  // std::cout
class Base { … };
class Exec : public Base {
public:
  virtual void exec(const char *prg) { system(prg); }
};
class Greeter : public Base {
public:
  virtual void sayHi(const char *str) { std::cout << str << std::endl; }
};
Greeter *greeter = new Greeter();
greeter->sayHi("Oh, hello there!");
(Diagram: Base at the top, Greeter and Exec as its children)
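int main() {
  Base *b1 = new Greeter();
  Base *b2 = new Exec();
  Greeter *g;
  g = static_cast<Greeter*>(b1);
  g->sayHi("Greeter says hi!");   // g[0][0](str): first vtable entry is Greeter::sayHi
  g = static_cast<Greeter*>(b2);  // type confusion: b2 points to an Exec
  g->sayHi("/usr/bin/xcalc");     // g[0][0](str): first vtable entry is Exec::exec
  delete b1;
  delete b2;
  return 0;
}
Diagram: b1 holds a vtable* pointing to Greeter's vtable (GreeterT); b2 holds a vtable* pointing to Exec's vtable (ExecT).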
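For contrast, a runtime-checked downcast rejects the confused pointer. This is a minimal sketch using simplified stand-ins for the classes above: the methods return strings instead of printing or calling `system()`, Base is given a virtual destructor so that it is polymorphic, and `greet_checked` is a hypothetical helper.

```cpp
#include <string>

// Simplified stand-ins for the classes above: the methods return strings
// instead of printing or calling system(), so the sketch has no side effects.
class Base {
public:
    virtual ~Base() = default;  // makes Base polymorphic, which dynamic_cast requires
};

class Exec : public Base {
public:
    virtual std::string exec(const std::string &prg) { return "exec: " + prg; }
};

class Greeter : public Base {
public:
    virtual std::string sayHi(const std::string &str) { return "hi: " + str; }
};

// Hypothetical helper: greets only if b really points to a Greeter.
// dynamic_cast returns nullptr when the runtime type does not match.
std::string greet_checked(Base *b, const std::string &msg) {
    if (Greeter *g = dynamic_cast<Greeter *>(b)) {
        return g->sayHi(msg);
    }
    return "rejected: not a Greeter";
}
```

Passing an Exec object to `greet_checked` yields the rejection string instead of dispatching into `Exec::exec`. The price is the RTTI lookup on every cast.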
C/C++
void log(int a) {
  printf("Log: ");
  printf("%d", a);
}
void (*fun)(int) = &log;
void init() { fun(15); }
ASM
log: ...
fun: .quad log
init: ...
  movl $15, %edi
  movq fun(%rip), %rax
  call *%rax
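The indirect call goes wherever the pointer currently points. The sketch below shows this with a legitimate reassignment standing in for an attacker's memory write; `log_value`, `other_target`, and `last_call` are hypothetical names chosen for illustration.

```cpp
#include <string>

// Records which function actually ran, so the effect can be observed.
std::string last_call;

void log_value(int a) { last_call = "log_value(" + std::to_string(a) + ")"; }
void other_target(int a) { last_call = "other_target(" + std::to_string(a) + ")"; }

// Writable function pointer, like fun in the example above.
void (*fun)(int) = &log_value;

// Compiles to an indirect call through whatever address fun holds.
void init() { fun(15); }
```

Calling `init()` runs `log_value`; after `fun = &other_target;` the same `init()` runs `other_target`, although the code of `init` has not changed.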
segmentation faults, traps, exceptions
– Address Sanitizer: memory safety
– Leak Sanitizer: memory leaks
– Memory Sanitizer: uninitialized memory
– UBSan: undefined behavior
– Thread Sanitizer: data races
– HexVASAN: variadic argument checker
– HexType: type safety
– static_cast: fast but no runtime guarantees
– dynamic_cast: high overhead, limited to polymorphic classes
– Type sanitizers (TypeSanitizer*, HexType*): conceptually check all casts dynamically; aggressively optimize design and implementation
* TypeSanitizer: Practical Type Confusion Detection. Istvan Haller, Yuseok Jeon, Hui Peng, Mathias Payer, Herbert Bos, Cristiano Giuffrida, Erik van der Kouwe. In CCS'16
* HexType: Efficient Detection of Type Confusion Errors for C++. Yuseok Jeon, Priyam Biswas, Scott A. Carr, Byoungyoung Lee, and Mathias Payer. In CCS'17
– static_cast<ToClass>(Object)
– dynamic_cast<ToClass>(Object)
– reinterpret_cast<ToClass>(Object)
– (ToClass)(Object)

– Must instrument all forms of allocation
– Requires disjoint metadata
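The two requirements above can be pictured with a toy runtime: a table, kept apart from the objects themselves, that maps each allocated object to its true type and is consulted at every cast. This is a sketch only; `TypeRuntime` and its methods are hypothetical and do not reflect HexType's real data structures or API, and type names are plain strings purely for readability.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical runtime, loosely modeled on the two requirements above.
// It is NOT HexType's real data structure or API.
class TypeRuntime {
public:
    // Record that `derived` directly inherits from `base`.
    void add_inheritance(const std::string &derived, const std::string &base) {
        parents_[derived].insert(base);
    }

    // Called by instrumented allocation sites: remember the true type of
    // the object in a table kept apart from the object itself.
    void track_allocation(const void *object, const std::string &type) {
        object_type_[object] = type;
    }

    // Called by instrumented deallocation sites.
    void untrack(const void *object) { object_type_.erase(object); }

    // Called by instrumented casts: the cast is allowed only if the
    // object's true type is `target` or derives from it.
    bool cast_allowed(const void *object, const std::string &target) const {
        auto it = object_type_.find(object);
        if (it == object_type_.end()) return false;  // unknown object: rejected in this sketch
        return is_same_or_derived(it->second, target);
    }

private:
    bool is_same_or_derived(const std::string &type, const std::string &target) const {
        if (type == target) return true;
        auto it = parents_.find(type);
        if (it == parents_.end()) return false;
        for (const std::string &parent : it->second) {
            if (is_same_or_derived(parent, target)) return true;
        }
        return false;
    }

    std::unordered_map<std::string, std::unordered_set<std::string>> parents_;
    std::unordered_map<const void *, std::string> object_type_;
};
```

Because the table lives outside the objects, the check also works for non-polymorphic classes, which carry no RTTI of their own. An allocation that is never tracked cannot be verified, which is why all forms of allocation need instrumentation.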
Source code → Clang (type hierarchy information) → LLVM pass (instrumentation: type casting verification) → link against the HexType runtime library → HexType binary
– Remove tracing of types that are never cast
– Remove statically verifiable casts
– Replace dynamic casts with fast lookup
Apache Xerces C++: DOMNode, DOMCharacterData, DOMElement, DOMElementImpl, DOMText, DOMTextImpl
Qt base library: QMapNodeBase, QMapNode
Type confusion!
– (Almost) full coverage (2-6x increase)
– Reasonable overhead (SPEC CPU: 0-32x improvement, Firefox: 0-0.5x slowdown)
– Future work: remaining coverage, optimizations
– Shallow coverage
– Hard to find “deep” bugs
– Fuzzer-generated inputs cannot bypass complex sanity checks in the target program
– Existing work limits itself to input generation
Figure: between start and end, check1, check2, and check3 separate shallow code paths from deep code paths; the bug sits on a deep code path.
heavy-weight program analysis techniques
– Driller (concolic analysis)
– VUzzer (dynamic taint analysis)
– Checks filter orthogonal input, e.g., magic values, checksums, or hashes (Non-Critical Check, NCC)
– Insight: removing NCCs is safe
if (strncmp(hdr, "ELF", 3) == 0) {
  // main program logic
} else {
  error();
}
– Detect NCCs*
*Approximation of NCCs: edges in the CFG connecting covered/uncovered nodes
Figure (T-Fuzz workflow): Fuzzer (e.g., AFL), Program Transformer, and Crash Analyzer, connected by Inputs, Transformed Programs, and Crashing inputs; the outputs are Bug Reports and False Positives.
NCC candidates: edges connecting covered and uncovered nodes in the CFG
– Over-approximate, may contain false positives
– Lightweight and simple to implement
Figure legend: Covered Node, Uncovered Node, NCC Candidates
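The approximation can be sketched in a few lines: given a CFG and the set of nodes the fuzzer has covered, collect every edge that leaves a covered node and enters an uncovered one. The representation (integer node IDs, an adjacency map) and the name `ncc_candidates` are assumptions made for this sketch, not T-Fuzz's implementation.

```cpp
#include <map>
#include <set>
#include <utility>
#include <vector>

using Node = int;
using Edge = std::pair<Node, Node>;

// cfg maps each node to its successor nodes; covered holds the nodes that
// fuzzing has reached. Returns every edge from a covered node to an
// uncovered node, in sorted order.
std::vector<Edge> ncc_candidates(const std::map<Node, std::vector<Node>> &cfg,
                                 const std::set<Node> &covered) {
    std::set<Edge> candidates;
    for (const auto &entry : cfg) {
        if (covered.count(entry.first) == 0) continue;  // source must be covered
        for (Node succ : entry.second) {
            if (covered.count(succ) == 0) {
                candidates.insert({entry.first, succ});
            }
        }
    }
    return std::vector<Edge>(candidates.begin(), candidates.end());
}
```

For a CFG where node 0 branches to 1 (error path, covered) and 2 (main logic, never reached), the only candidate is the edge 0 → 2. The result over-approximates: an edge may be uncovered simply because the fuzzer has not run long enough.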
– Simple: static binary rewriting
– Zero runtime overhead in resulting target program
– Unchanged CFG
– Trace in transformed program maps to original program
– Path constraints of original program can be recovered
Figure: the original program branches on A == B (true branch, false branch) between start and end; the transformed program branches on the negated check A != B, with the same true and false branches.
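At source level, the effect of negating a check looks roughly as follows. T-Fuzz rewrites the binary, so this is only an illustration; the parser, the `Outcome` type, and the planted bug (a crash when the fourth byte is '!') are all hypothetical.

```cpp
#include <cstring>
#include <string>

// Outcome of running the toy parser on one input.
enum class Outcome { Rejected, Ok, Crash };

// Main program logic guarded by the check. The "bug" is hypothetical:
// it fires when the byte after the magic value is '!'.
Outcome main_logic(const std::string &input) {
    if (input.size() > 3 && input[3] == '!') return Outcome::Crash;
    return Outcome::Ok;
}

// Original program: the magic-value check is the NCC.
Outcome original(const std::string &input) {
    if (std::strncmp(input.c_str(), "ELF", 3) == 0) {
        return main_logic(input);
    }
    return Outcome::Rejected;
}

// Transformed program: same CFG, but the condition is negated, so inputs
// WITHOUT the magic value now reach the main logic.
Outcome transformed(const std::string &input) {
    if (std::strncmp(input.c_str(), "ELF", 3) != 0) {
        return main_logic(input);
    }
    return Outcome::Rejected;
}
```

The input "AAA!" is rejected by `original` but crashes `transformed`. Since the CFG is unchanged, the crashing trace maps back to the original program, where the input "ELF!" satisfies the original check and reproduces the crash.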
tracks constraints
number of code paths
limited scalability
(Path 1, constraint set 1) ... (Path n, constraint set n)
collects constraints for new code paths
requirements
Figure: execution tree for one input, branching on C1 / Not C1
concolic execution explores new code paths using fuzzer generated inputs
– “SE & constraint solving” slows down fuzzing
– Not able to bypass “hard” checks
Figure: Fuzzer (mutating Inputs), target program, Crashes, SE & constraint solving
detected crashes
T-Fuzz detects the guarded bug, but cannot verify it
Figure (T-Fuzz in action): Fuzzer, program, Crashes, Program Transformation, SE & constraint solving
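A crash found in a transformed program only counts as a bug if some input drives the original program down the same path. The sketch below illustrates that verification step on a deliberately tiny input space: one byte, with exhaustive search standing in for a real constraint solver. `Constraint` and `solve` are hypothetical names, and this is not how T-Fuzz's crash analyzer is implemented.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// A constraint over a single input byte (a deliberately tiny input space).
using Constraint = std::function<bool(std::uint8_t)>;

// Stand-in for a constraint solver: exhaustively tries all 256 byte values.
// Returns a value that satisfies every constraint, or -1 if none exists.
int solve(const std::vector<Constraint> &constraints) {
    for (int value = 0; value <= 255; ++value) {
        const std::uint8_t byte = static_cast<std::uint8_t>(value);
        bool all_hold = true;
        for (const Constraint &c : constraints) {
            if (!c(byte)) {
                all_hold = false;
                break;
            }
        }
        if (all_hold) return value;
    }
    return -1;  // unsatisfiable: the crash cannot occur in the original program
}
```

If the original check requires `x > 100` and the crash requires `x == 150`, the search returns 150 and the bug is confirmed. If the crash instead requires `x < 50`, no input satisfies both and the crash is a false positive. For checks such as hashes, a real solver may fail to find an answer at all, in which case the crash can be detected but not verified.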
– Fuzzer: shellphish fuzzer (Python wrapper of AFL)
– Program Transformer: angr tracer, radare2
– Crash Analyzer: 2k LoC Python hackery

– DARPA CGC dataset
– LAVA-M dataset
– 4 real-world programs
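T-Fuzz finds 55 bugs that Driller misses (45% of Driller's 121) and 61 bugs that AFL misses (58% of AFL's 105)
Driller finds 10 bugs that T-Fuzz misses:
– 3 due to false crashes (L1)
– 7 due to transformation explosion (L2)
Method             # bugs
AFL                105
Driller            121
T-Fuzz             166
Driller - AFL      16
T-Fuzz - AFL       61
T-Fuzz - Driller   55
Driller - T-Fuzz   10
Venn diagram: bugs found by AFL (105), T-Fuzz (166), and Driller (121); region labels 10, 6, 55.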
in who, but still found more bugs than VUzzer
Program   # of bugs   VUzzer   Steelix   T-Fuzz
base64    44          17       43        43
uniq      28          27       24        26
md5sum    57          1        28        49
who       2136        50       194       95*
– T-Fuzz triggers more crashes than AFL
– T-Fuzz found 3 new bugs in latest versions of ImageMagick and libpoppler (marked by *)
Program + library                  AFL         T-Fuzz
pngfix + libpng (1.7.0)            (missing)   11
tiffinfo + libtiff (3.8.2)         53          124
magick + ImageMagick (7.0.7)       (missing)   2*
pdftohtml + libpoppler (0.62.0)    (missing)   1*
– T-Fuzz mutates both input and target program
– T-Fuzz improves over Driller/AFL by 45%/58%
– T-Fuzz triggers bugs guarded by “hard” checks
– New bugs: 1 in LAVA-M, 3 in real-world programs
– HexType brings type safety to C++
– T-Fuzz explores deep program paths
Source: https://hexhive.github.io