Compiler-Agnostic Function Detection in Binaries


  1. Compiler-Agnostic Function Detection in Binaries
     Dennis Andriesse†, Asia Slowinska, Herbert Bos†
     † Vrije Universiteit Amsterdam
     EuroS&P 2017

  2. Introduction: Disassembly in Systems Security
     Disassembly is the backbone of all binary-level systems security work (and more):
     • Control-Flow Integrity
     • Automatic Vulnerability/Bug Search
     • Lifting binaries to LLVM IR (e.g., for reoptimization)
     • Malware Analysis
     • Binary Hardening
     • Binary Instrumentation
     • ...

  3. Introduction: Results from Previous Work
     • Function detection is currently the main disassembly challenge
     • Even function start detection yields many false positives/negatives (FPs/FNs, 20%+)
     • Complex cases: non-standard prologues, tail calls, inlining, ...
     • Binary analysis commonly requires function information
     [Figure: Correctly detected function start addresses (% correct, geometric mean) for angr 4.6.1.4, BAP 0.9.9, ByteWeight 0.9.9, Dyninst 9.1.0, Hopper 3.11.5, IDA Pro 6.7, and Jakstab 0.8.4 on SPEC (C) and SPEC (C++), compiled with gcc-5.1.1, clang-3.7.0, and Visual Studio '15 for x86 and x64 at O0 to O3]

  4. Introduction: Function Detection: False Negative
     Listing: False negative (an indirectly called function) for IDA Pro 6.7 (the gcc benchmark compiled with gcc at O3 for x64 ELF)
       6caf10 <ix86_fp_compare_mode>:
       6caf10: mov    0x3f0dde(%rip),%eax
       6caf16: and    $0x10,%eax
       6caf19: cmp    $0x1,%eax
       6caf1c: sbb    %eax,%eax
       6caf1e: add    $0x3a,%eax
       6caf21: retq

  5. Introduction: Function Detection: False Positive
     Listing: False positive function (shaded on the original slide) for Dyninst (perlbench compiled with gcc at O3 for x64 ELF)
       46b990 <Perl_pp_enterloop>:
       [...]
       46ba02: ja     46bb50 <Perl_pp_enterloop+0x1c0>
       46ba08: mov    %rsi,%rdi
       46ba0b: shl    %cl,%rdi
       46ba0e: mov    %rdi,%rcx
       46ba11: and    $0x46,%ecx
       46ba14: je     46bb50 <Perl_pp_enterloop+0x1c0>
       [...]
       46bb47: pop    %r12
       46bb49: retq
       46bb4a: nopw   0x0(%rax,%rax,1)
       46bb50: sub    $0x90,%rax

  6. Current Approaches: Signature-Based Function Detection
     • Most current approaches scan for prologue/epilogue signatures
       • IDA Pro, Dyninst, ByteWeight (Bao et al. 2014), Shin et al. (2015)
     • Error-prone: signatures may be missing or optimized away
     • Non-scalable: new signatures are needed for every compiler version/platform
     • Even machine learning approaches need continuous retraining
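The following is a minimal Python sketch of a naive signature scanner. It makes the "error-prone" point concrete. The two byte patterns and the function name `scan_prologues` are illustrative assumptions. They are not the signature sets that IDA Pro, Dyninst, or ByteWeight actually use. The input bytes are a hand re-encoding of the ix86_fp_compare_mode listing from slide 4, so they may not match the real binary byte for byte.

```python
# Illustrative prologue signatures (assumption): real tools use far larger,
# compiler-specific sets.
PROLOGUE_SIGS = [
    bytes.fromhex("554889e5"),  # push %rbp; mov %rsp,%rbp  (typical -O0 frame setup)
    bytes.fromhex("f30f1efa"),  # endbr64                    (CET-enabled builds)
]


def scan_prologues(code: bytes, base: int) -> list[int]:
    """Return sorted addresses at which any prologue signature occurs."""
    hits = set()
    for sig in PROLOGUE_SIGS:
        pos = code.find(sig)
        while pos != -1:
            hits.add(base + pos)
            pos = code.find(sig, pos + 1)
    return sorted(hits)


# Hand re-encoding of ix86_fp_compare_mode (slide 4): an optimized leaf
# function with no frame setup at all.
FP_COMPARE_MODE = bytes.fromhex(
    "8b05de0d3f00"  # mov 0x3f0dde(%rip),%eax
    "83e010"        # and $0x10,%eax
    "83f801"        # cmp $0x1,%eax
    "19c0"          # sbb %eax,%eax
    "83c03a"        # add $0x3a,%eax
    "c3"            # retq
)

# An unoptimized-style function: push %rbp; mov %rsp,%rbp; pop %rbp; retq
FRAMED = bytes.fromhex("554889e55dc3")

print(scan_prologues(FP_COMPARE_MODE, 0x6caf10))         # []
print([hex(a) for a in scan_prologues(FRAMED, 0x1000)])  # ['0x1000']
```

A scanner of this kind finds nothing in the frameless leaf function. To discover it, the scanner would need other evidence, such as a direct call to it.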

  7. Overview of Our Approach: Compiler-Agnostic Function Detection
     • We propose a signature-less approach based on structural analysis of the Control-Flow Graph (CFG)
     • Basic premise: Weakly Connected Components Analysis
     • Compiler-agnostic: no training/maintenance needed
     • Able to detect all basic blocks of a function
     • Inherent support for corner cases such as non-contiguous functions
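The basic premise can be illustrated with a short union-find sketch in Python. It assumes a toy CFG given as (source, target, kind) edge tuples. The kind strings ("fall", "jmp", "call") are hypothetical labels. Call edges are skipped and edge direction is ignored, so each remaining weakly connected component is a candidate function. This is an illustration of the idea, not Nucleus's actual implementation.

```python
from collections import defaultdict


def weakly_connected_components(blocks, edges):
    """Partition basic blocks into weakly connected components.

    blocks: iterable of basic-block start addresses.
    edges:  iterable of (src, dst, kind) tuples; edges of kind "call" are
            hidden, and all other edges are treated as undirected.
    """
    parent = {b: b for b in blocks}

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for src, dst, kind in edges:
        if kind == "call":
            continue  # hide interprocedural call edges
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_dst] = root_src  # merge; edge direction is irrelevant

    groups = defaultdict(list)
    for b in parent:
        groups[find(b)].append(b)
    # Deterministic order: sort each component, then order components by lowest address.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Toy CFG: a caller (0x10-0x30) calls a helper (0x100-0x110) twice.
BLOCKS = [0x10, 0x20, 0x30, 0x100, 0x110]
EDGES = [
    (0x10, 0x20, "fall"),
    (0x20, 0x30, "jmp"),
    (0x10, 0x100, "call"),
    (0x20, 0x100, "call"),
    (0x100, 0x110, "jmp"),
]
print([[hex(b) for b in comp] for comp in weakly_connected_components(BLOCKS, EDGES)])
# [['0x10', '0x20', '0x30'], ['0x100', '0x110']]
```

If the call edges were kept, the caller and the helper would merge into a single component. Hiding them is what lets the components line up with functions.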

  8. Overview of Our Approach
     Step 1: Disassemble the binary and generate the interprocedural CFG (linear disassembly + switch/inline data detection)

  9. Overview of Our Approach
     Step 2: Hide call edges, e ∈ E_call

  10. Overview of Our Approach
      Step 3: Locate directly called entry points and expand functions by following control flow (ignoring edge direction)
      [Figure: CFG with functions f1, f2, f3]
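The third step finds directly called entry points and expands them. It can be sketched as a breadth-first search from each direct call target over the non-call edges, treated as undirected. The sketch below is a hypothetical simplification. The edge-tuple format and the name `expand_from_call_targets` are assumptions. When two call targets fall in the same component, it keeps the lowest-addressed one as the entry. That is only one possible policy for multi-entry functions.

```python
from collections import defaultdict, deque


def expand_from_call_targets(edges):
    """Map each directly called entry point to the set of blocks it reaches.

    edges: (src, dst, kind) tuples; the dst of a "call" edge is taken as a
    function entry, and every other edge is followed in both directions.
    """
    neighbours = defaultdict(set)
    entries = set()
    for src, dst, kind in edges:
        if kind == "call":
            entries.add(dst)
        else:
            neighbours[src].add(dst)
            neighbours[dst].add(src)

    functions, owner = {}, {}
    for entry in sorted(entries):
        if entry in owner:
            continue  # already reached from a lower-addressed entry
        reached, queue = {entry}, deque([entry])
        while queue:
            block = queue.popleft()
            for nb in neighbours[block]:
                if nb not in reached:
                    reached.add(nb)
                    queue.append(nb)
        for block in reached:
            owner[block] = entry
        functions[entry] = reached
    return functions


EDGES = [
    (0x10, 0x20, "fall"),
    (0x20, 0x100, "call"),  # direct call: 0x100 becomes an entry point
    (0x100, 0x110, "jmp"),
    (0x120, 0x110, "jmp"),  # 0x120 only jumps *into* the function, so it is
                            # found only because direction is ignored
    (0x200, 0x210, "jmp"),  # never directly called: left for the next step
]
print({hex(e): sorted(hex(b) for b in bs) for e, bs in expand_from_call_targets(EDGES).items()})
# {'0x100': ['0x100', '0x110', '0x120']}
```

Blocks 0x10 to 0x20 and 0x200 to 0x210 stay unassigned here. In this example that is what an indirectly called function looks like: nothing calls it directly.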

  11. Overview of Our Approach
      Step 4: Find the remaining functions using connected components analysis, then analyze control flow to find their entry points
      [Figure: CFG with functions f1, f2, f3, f4]
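The remaining blocks can be handled in a similar way. In this hypothetical sketch, blocks not claimed from direct call targets are grouped into weakly connected components over non-call edges. Each component's entry is guessed as its lowest-addressed block with no incoming edge from inside the component. If no such block exists, for example when the entry is also a loop header, it falls back to the lowest address. That entry heuristic is an assumption made for illustration; the slide only says that control flow is analyzed to find entry points.

```python
from collections import defaultdict


def remaining_functions(blocks, edges, assigned):
    """Group unassigned blocks into components and guess one entry per component."""
    rest = set(blocks) - set(assigned)
    neighbours = defaultdict(set)
    has_pred = set()  # blocks with an incoming non-call edge from another unassigned block
    for src, dst, kind in edges:
        if kind == "call" or src not in rest or dst not in rest:
            continue
        neighbours[src].add(dst)
        neighbours[dst].add(src)
        has_pred.add(dst)

    functions, seen = {}, set()
    for start in sorted(rest):
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:  # depth-first flood fill, ignoring edge direction
            block = stack.pop()
            for nb in neighbours[block]:
                if nb not in component:
                    component.add(nb)
                    stack.append(nb)
        seen |= component
        roots = sorted(b for b in component if b not in has_pred)
        entry = roots[0] if roots else min(component)  # heuristic (assumption)
        functions[entry] = component
    return functions


BLOCKS = [0x10, 0x20, 0x100, 0x110, 0x200, 0x210, 0x220]
EDGES = [
    (0x10, 0x20, "fall"),
    (0x20, 0x100, "call"),
    (0x100, 0x110, "jmp"),
    (0x200, 0x210, "jmp"),
    (0x210, 0x220, "jmp"),
    (0x220, 0x210, "jmp"),  # loop back-edge
]
ASSIGNED = {0x100, 0x110}  # already claimed by the direct call target 0x100
for entry, comp in remaining_functions(BLOCKS, EDGES, ASSIGNED).items():
    print(hex(entry), sorted(hex(b) for b in comp))
# 0x10 ['0x10', '0x20']
# 0x200 ['0x200', '0x210', '0x220']
```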

  12. Evaluation: Function Start Detection
      [Figure: F-score of function start detection for Nucleus (our approach), Dyninst 9.1.0, BAP/ByteWeight 0.9.9, and IDA Pro 6.7 on C and C++ benchmarks, compiled with gcc-5.1.1, clang-3.7.0, and Visual Studio '15 for x86 and x64 at O0 to O3]
      • Overall average F-score of 0.96 for SPEC CPU 2006 (similar for the server binaries)
      • Stable performance across compilers, platforms, and optimization levels
      • Main improvement over others: higher recall (fewer FNs)

  13. Evaluation: Function Boundary Detection
      [Figure: F-score of function boundary detection for Nucleus, Dyninst 9.1.0, BAP/ByteWeight 0.9.9, and IDA Pro 6.7 on C and C++ benchmarks, compiled with gcc-5.1.1, clang-3.7.0, and Visual Studio '15 for x86 and x64 at O0 to O3]
      • Overall average F-score of 0.90 for SPEC CPU 2006
      • Even better for C-only server tests (average F-score 0.97)
      • Again, more stable than other approaches
      • Best alternative: IDA Pro, with an average F-score of 0.84
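For reference, the F-score combines precision and recall as F = 2PR / (P + R). Below is a minimal Python sketch for function start detection in a single binary. It treats detected and ground-truth entry addresses as sets. How scores are averaged across binaries, and how boundary matches are counted, is not specified here. The sketch therefore covers only the per-binary F-score for function starts, and the addresses are made up.

```python
def f_score(detected: set, truth: set) -> float:
    """F1 score of detected function start addresses against ground truth."""
    true_pos = len(detected & truth)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(detected)  # fraction of detections that are real
    recall = true_pos / len(truth)        # fraction of real functions found
    return 2 * precision * recall / (precision + recall)


truth = {0x10, 0x100, 0x200, 0x300}
balanced = {0x10, 0x100, 0x200, 0x250}  # one FP, one FN
timid = {0x10, 0x100}                   # no FPs, but two FNs
print(round(f_score(balanced, truth), 3))  # 0.75
print(round(f_score(timid, truth), 3))     # 0.667
```

The second case shows why recall matters. A detector with no false positives can still score poorly if it misses functions.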

  14. Evaluation: More Results
      • In-depth analysis of results (including FPs/FNs) in the paper
      • Most complex cases are handled correctly (non-contiguous functions, multi-entry functions, ...)
      • Main problematic case: tail calls

  15. Evaluation: Runtime
      [Figure: Runtime in seconds (0 to 160) versus number of instructions (1,000 to 1,000,000) for Nucleus, Dyninst 9.1.0, IDA Pro 6.7, and BAP/ByteWeight 0.9.9]
      • On par with the fastest alternatives

  16. Applicability to Malware Analysis: Resistance to Obfuscation
      • Although this talk is in the Malware session, we do not explicitly target malware
      • That said, our approach is agnostic to some basic obfuscation techniques:
        • Instruction-level polymorphism
        • Mangling of function prologues/epilogues
        • Some control-flow obfuscations (e.g., converting direct calls to indirect ones, branching functions, ...)
      • But we make no promises for arbitrary obfuscations!

  17. Issues with Evaluation of Machine Learning Approaches: Performance Discrepancies
      • During our evaluation, we noticed far lower performance for ByteWeight than previously reported (Bao et al. 2014)
      • Mean F-score 0.32 points lower than expected
      • Observation persists for gcc (v4.7 to v5.1), clang, and Visual Studio
      • Upon closer inspection, we discovered issues with the test suite used to evaluate all major machine learning-based function detection work (Bao et al. 2014 and Shin et al. 2015)

  18. Issues with Evaluation of Machine Learning Approaches: Test Suite Issues
      • Both Bao et al. and Shin et al. use ten-fold cross-validation to evaluate their work
        • Partition the test suite into a training set (B_T, 90% of binaries) and an evaluation set (B_E)
        • Repeat ten times such that each binary is in B_E exactly once
      • It is crucial to ensure sufficient variation in the test suite to prevent overfitting!
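The ten-fold protocol can be sketched in Python as follows. The random shuffling, the seed, the placeholder binary names and the function name are all assumptions for illustration. The slide does not say how binaries were assigned to folds. The split is made per binary, so nothing prevents the same function from appearing in both B_T and B_E.

```python
import random


def ten_fold_splits(binaries, seed=0):
    """Yield (B_T, B_E) pairs such that every binary lands in B_E exactly once."""
    items = list(binaries)
    random.Random(seed).shuffle(items)
    folds = [items[i::10] for i in range(10)]  # ten roughly equal folds
    for k in range(10):
        eval_set = folds[k]
        train_set = [b for j, fold in enumerate(folds) if j != k for b in fold]
        yield train_set, eval_set


# 106 coreutils + 16 binutils + 7 findutils binaries (names are placeholders).
suite = [f"bin{i:03d}" for i in range(106 + 16 + 7)]
for train, evaluation in ten_fold_splits(suite):
    print(len(train), len(evaluation))
# 116 13 (nine times), then 117 12
```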

  19. Issues with Evaluation of Machine Learning Approaches: Test Suite Issues
      • The Linux test suite used by Bao et al. and Shin et al. consists of coreutils (106 binaries), binutils (16 binaries), and findutils (7 binaries)
      • The average coreutils binary shares 54% of its functions with all other coreutils binaries
      • The average coreutils binary shares 94% of its functions with at least one other coreutils binary
      • For the average coreutils binary in B_E, at least 86% of its functions are expected to occur in B_T
      • There is a large degree of overfitting in the evaluation of machine learning approaches; re-evaluation is needed
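The two sharing percentages can be computed with a few set operations. The sketch below uses toy data, and its function names are only illustrative. It identifies "the same function" by name, which is an assumption made for illustration. The slide does not say how shared functions were matched.

```python
def sharing_stats(funcs_by_binary):
    """Per binary: (fraction of its functions found in ALL other binaries,
    fraction found in AT LEAST ONE other binary)."""
    stats = {}
    for name, funcs in funcs_by_binary.items():
        others = [f for other, f in funcs_by_binary.items() if other != name]
        if not funcs or not others:
            stats[name] = (0.0, 0.0)
            continue
        in_all = set.intersection(*others)  # functions every other binary has
        in_any = set.union(*others)         # functions some other binary has
        stats[name] = (len(funcs & in_all) / len(funcs), len(funcs & in_any) / len(funcs))
    return stats


# Toy data (hypothetical): function names per binary.
TOY = {
    "ls":  {"xmalloc", "quotearg", "close_stdout", "version_etc", "print_dir"},
    "cp":  {"xmalloc", "quotearg", "close_stdout", "version_etc", "copy_file"},
    "cat": {"xmalloc", "close_stdout", "version_etc", "cat_file"},
}
for name, (in_all, in_any) in sharing_stats(TOY).items():
    print(f"{name}: {in_all:.0%} shared with all others, {in_any:.0%} with at least one")
# ls: 60% shared with all others, 80% with at least one
# cp: 60% shared with all others, 80% with at least one
# cat: 75% shared with all others, 75% with at least one
```

Even in this tiny example, most of each binary's functions also occur elsewhere in the suite. This is how a per-binary split can still leak functions from B_T into B_E.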

  20. Conclusion
      • We introduced a novel compiler-agnostic function detector
      • No maintenance/learning phase required
      • More accurate results than existing approaches
      • Inherent support for complex cases
      • Available open source: https://www.vusec.net/projects/function-detection/
      • Features export to IDA Pro → easy to use in real-world settings
