Compiler-Agnostic Function Detection in Binaries, Dennis Andriesse et al. (PowerPoint presentation)


slide-1
SLIDE 1

Compiler-Agnostic Function Detection in Binaries

Dennis Andriesse†, Asia Slowinska, Herbert Bos†

†Vrije Universiteit Amsterdam

EuroS&P 2017

slide-2
SLIDE 2

Introduction

Disassembly in Systems Security

Disassembly is the backbone of all binary-level systems security work (and more)

  • Control-Flow Integrity
  • Automatic Vulnerability/Bug Search
  • Lifting binaries to LLVM IR (e.g., for reoptimization)
  • Malware Analysis
  • Binary Hardening
  • Binary Instrumentation
  • . . .

Compiler-Agnostic Function Detection in Binaries 1 of 19

slide-3
SLIDE 3

Introduction

Results from Previous Work

Function detection is currently the main disassembly challenge

  • Even function start detection yields many FPs/FNs (20%+)
  • Complex cases: non-standard prologues, tailcalls, inlining, . . .
  • Binary analysis commonly requires function information

Figure: Correctly detected function start addresses (% correct, geometric mean) for angr 4.6.1.4, BAP 0.9.9, ByteWeight 0.9.9, Dyninst 9.1.0, Hopper 3.11.5, IDA Pro 6.7, and Jakstab 0.8.4 on SPEC (C) and SPEC (C++). Panels: gcc-5.1.1, clang-3.7.0, and Visual Studio '15, each on x86 and x64, at optimization levels O0 to O3.


slide-4
SLIDE 4

Introduction

Function Detection: False Negative

Listing: False negative for IDA Pro 6.7, an indirectly called function that goes undetected (gcc compiled with gcc at O3 for x64 ELF)

6caf10 <ix86_fp_compare_mode>:
  6caf10: mov    0x3f0dde(%rip),%eax
  6caf16: and    $0x10,%eax
  6caf19: cmp    $0x1,%eax
  6caf1c: sbb    %eax,%eax
  6caf1e: add    $0x3a,%eax
  6caf21: retq


slide-5
SLIDE 5

Introduction

Function Detection: False Positive

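Listing: False positive function for Dyninst (perlbench compiled with gcc at O3 for x64 ELF). The spurious function starts at 46bb50, right after the padding that follows a retq, although that block is a jump target inside Perl_pp_enterloop (Perl_pp_enterloop+0x1c0).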

46b990 <Perl_pp_enterloop>:
  [...]
  46ba02: ja     46bb50 <Perl_pp_enterloop+0x1c0>
  46ba08: mov    %rsi,%rdi
  46ba0b: shl    %cl,%rdi
  46ba0e: mov    %rdi,%rcx
  46ba11: and    $0x46,%ecx
  46ba14: je     46bb50 <Perl_pp_enterloop+0x1c0>
  [...]
  46bb47: pop    %r12
  46bb49: retq
  46bb4a: nopw   0x0(%rax,%rax,1)
  46bb50: sub    $0x90,%rax


slide-6
SLIDE 6

Current Approaches

Signature-Based Function Detection

  • Most current approaches scan for prologue/epilogue signatures
  • IDA Pro, Dyninst, ByteWeight (Bao et al. 2014), (Shin et al. 2015)
  • Error-prone: signatures may be missing or optimized away
  • Non-scalable: new signatures needed for every compiler version/platform
  • Even machine learning approaches need continuous retraining
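To illustrate why signature scanning is fragile, here is a minimal sketch of a naive scanner. The function name and the single signature are assumptions for illustration, not any tool's actual rule set: it searches raw x86-64 bytes for the common frame-setup prologue push %rbp; mov %rsp,%rbp, which encodes as 55 48 89 e5.

```python
# Hypothetical naive signature scanner; one x86-64 prologue signature only:
# push %rbp (55) followed by mov %rsp,%rbp (48 89 e5).
PROLOGUE = bytes([0x55, 0x48, 0x89, 0xE5])


def scan_prologues(code: bytes, base: int = 0) -> list[int]:
    """Return every address at which the prologue byte pattern occurs."""
    hits = []
    start = code.find(PROLOGUE)
    while start != -1:
        hits.append(base + start)
        start = code.find(PROLOGUE, start + 1)
    return hits


# Two toy functions:
# 0x1000: push %rbp; mov %rsp,%rbp; pop %rbp; retq   (55 48 89 e5 5d c3)
# 0x1006: mov $0x1,%eax; retq                        (b8 01 00 00 00 c3)
code = bytes([0x55, 0x48, 0x89, 0xE5, 0x5D, 0xC3,
              0xB8, 0x01, 0x00, 0x00, 0x00, 0xC3])
print([hex(a) for a in scan_prologues(code, base=0x1000)])
```

The second toy function, written without a frame pointer as optimizing compilers commonly do, is never reported (a false negative). Because the scan ignores instruction boundaries, the same bytes appearing inside another instruction would also yield false positives.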


slide-7
SLIDE 7

Overview of Our Approach

Compiler-Agnostic Function Detection

  • We propose a signature-less approach based on structural analysis of the Control-Flow Graph (CFG)
  • Basic premise: Weakly Connected Components Analysis
  • Compiler-agnostic: no training/maintenance needed
  • Able to detect all basic blocks of a function
  • Inherent support for corner cases such as non-contiguous functions


slide-8
SLIDE 8

Overview of Our Approach

Step 1: Disassemble the binary and generate the interprocedural CFG, including its call edges (linear disassembly plus switch-table and inline-data detection)


slide-9
SLIDE 9

Overview of Our Approach

Step 2: Hide the call edges (all edges e ∈ E_call)
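A minimal sketch of step 2, assuming the interprocedural CFG is stored as a list of typed edges between basic-block addresses. The Edge type, its edge kinds, and the addresses are hypothetical; the slides do not describe the tool's internal representation, and this toy graph has no return edges.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """One edge of a toy interprocedural CFG between basic-block addresses."""
    src: int
    dst: int
    kind: str  # hypothetical kinds: "fallthrough", "jump", or "call"


def hide_call_edges(edges: list[Edge]) -> list[Edge]:
    """Return E minus E_call: only the edges that stay inside one function."""
    return [e for e in edges if e.kind != "call"]


# Hypothetical graph: block 0x14 ends in a call to 0x40 and then falls
# through to its return site at 0x18.
icfg = [
    Edge(0x10, 0x14, "fallthrough"),
    Edge(0x14, 0x40, "call"),
    Edge(0x14, 0x18, "fallthrough"),
    Edge(0x40, 0x48, "jump"),
]
intra = hide_call_edges(icfg)
print([(hex(e.src), hex(e.dst)) for e in intra])
```

Once call edges are hidden, the callee (0x40, 0x48) is no longer connected to its caller, so function bodies fall apart into separate pieces of the graph.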


slide-10
SLIDE 10

Overview of Our Approach

Step 3: Locate directly called entry points and expand functions by following control flow, ignoring edge direction (functions f1, f2, and f3 in the slide's figure)
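Step 3 can be sketched as a breadth-first search over the call-free CFG treated as undirected, starting from each direct call target. The function name and data layout are assumptions. How the paper resolves two entry points that reach each other is not shown on the slides; this sketch simply records both as entries of one (multi-entry) function.

```python
from collections import defaultdict, deque


def expand_functions(edges, call_targets):
    """Grow one function from each directly called entry point by following
    intraprocedural edges in both directions (ignoring edge direction).

    edges: iterable of (src, dst) basic-block addresses, call edges removed.
    call_targets: addresses that are targets of direct calls.
    Returns ({function_id: {"entries": set, "blocks": set}}, {block: function_id}).
    """
    adj = defaultdict(set)  # undirected adjacency
    for src, dst in edges:
        adj[src].add(dst)
        adj[dst].add(src)

    functions = {}
    owner = {}
    for entry in sorted(call_targets):
        if entry in owner:
            # Already reached from an earlier entry point: record it as an
            # additional entry of that function (multi-entry function).
            functions[owner[entry]]["entries"].add(entry)
            continue
        blocks = {entry}
        queue = deque([entry])
        while queue:  # breadth-first search over the undirected graph
            block = queue.popleft()
            for neighbor in adj[block]:
                if neighbor not in blocks:
                    blocks.add(neighbor)
                    queue.append(neighbor)
        for block in blocks:
            owner[block] = entry
        functions[entry] = {"entries": {entry}, "blocks": blocks}
    return functions, owner


# Toy call-free edges: 0x40 and 0x44 are both call targets that converge on 0x48.
edges = [(0x10, 0x14), (0x14, 0x18), (0x40, 0x48), (0x44, 0x48)]
functions, owner = expand_functions(edges, call_targets={0x10, 0x40, 0x44})
for fid, f in functions.items():
    print(hex(fid), sorted(map(hex, f["entries"])), sorted(map(hex, f["blocks"])))
```

Ignoring direction is what lets the search pick up blocks that are only reachable backwards, such as 0x44 here, which has an edge into 0x48 but no edge from 0x40.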


slide-11
SLIDE 11

Overview of Our Approach

Step 4: Find the remaining functions (f4 in the slide's figure) using Connected Components Analysis, then analyze their control flow to find entry points
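A sketch of step 4 under stated assumptions: blocks not claimed in step 3 are grouped into weakly connected components with union-find, and each component receives an entry point. The entry-point rule used here (prefer a block with no incoming edge from inside its component, else the lowest address) is a hypothetical stand-in for the control-flow analysis the slide mentions, not the paper's exact rule.

```python
from collections import defaultdict


def weakly_connected_components(nodes, edges):
    """Group nodes into weakly connected components using union-find.
    Every edge endpoint must appear in nodes."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, dst in edges:
        root_a, root_b = find(src), find(dst)
        if root_a != root_b:
            parent[root_a] = root_b  # direction ignored: weak connectivity

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return list(groups.values())


def guess_entry(component, edges):
    """Hypothetical heuristic: prefer a block with no incoming edge from inside
    its component; fall back to the lowest address (e.g. for pure loops)."""
    has_pred = {dst for src, dst in edges if src in component and dst in component}
    candidates = sorted(component - has_pred)
    return candidates[0] if candidates else min(component)


# Blocks left unclaimed after step 3 and the intraprocedural edges among them.
unclaimed = [0x80, 0x84, 0x88, 0xA0]
edges = [(0x80, 0x84), (0x84, 0x88), (0x88, 0x84)]  # 0x84 <-> 0x88 is a loop
for comp in sorted(weakly_connected_components(unclaimed, edges), key=min):
    print(hex(guess_entry(comp, edges)), sorted(map(hex, comp)))
```

The loop between 0x84 and 0x88 gives both blocks a predecessor, so 0x80 is the only block without one and is chosen as the entry; the isolated block 0xa0 becomes a single-block function.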


slide-12
SLIDE 12

Evaluation

Figure: Function start detection F-score for Nucleus, Dyninst 9.1.0, BAP/ByteWeight 0.9.9, and IDA Pro 6.7 on C and C++ binaries. Panels: gcc-5.1.1, clang-3.7.0, and Visual Studio '15, each on x86 and x64, at optimization levels O0 to O3.

Function Start Detection

  • Overall average F-score of 0.96 for SPEC CPU 2006 (similar for servers)

  • Stable performance across compiler/platform/optimization level
  • Main improvement over others: higher recall (fewer FNs)
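The slides report F-scores without defining them. Assuming the standard F1 score (the harmonic mean of precision and recall) computed over sets of function start addresses, a minimal sketch looks like this; the function name and toy addresses are hypothetical.

```python
def f_score(detected: set[int], truth: set[int]) -> float:
    """F1 score of detected function starts against ground-truth starts."""
    tp = len(detected & truth)  # correctly detected starts
    if tp == 0:
        return 0.0
    precision = tp / len(detected)  # penalizes false positives
    recall = tp / len(truth)        # penalizes false negatives
    return 2 * precision * recall / (precision + recall)


truth = {0x1000, 0x1040, 0x1080, 0x10C0}
detected = {0x1000, 0x1040, 0x10A0}  # one false positive, two false negatives
print(round(f_score(detected, truth), 3))
```

Here precision is 2/3 and recall is 1/2, giving an F-score of 4/7. At equal precision, every missed start lowers recall and therefore the F-score, which is why fewer false negatives translate directly into higher scores.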


slide-13
SLIDE 13

Evaluation

Figure: Function boundary detection F-score for Nucleus, Dyninst 9.1.0, BAP/ByteWeight 0.9.9, and IDA Pro 6.7 on C and C++ binaries. Panels: gcc-5.1.1, clang-3.7.0, and Visual Studio '15, each on x86 and x64, at optimization levels O0 to O3.

Function Boundary Detection

  • Overall average F-score of 0.90 for SPEC CPU 2006
  • Even better for C-only server tests (average F-score 0.97)
  • Again, more stable than other approaches
  • Best alternative: IDA Pro, average F-score of 0.84


slide-14
SLIDE 14

Evaluation

More Results

  • In-depth analysis of results (including FPs/FNs) in paper
  • Most complex cases handled correctly (non-contiguous functions, multi-entry functions, . . . )

  • Main problematic case: tail calls


slide-15
SLIDE 15

Evaluation

Figure: Runtime (s) versus number of instructions (1,000 to 1,000,000, log scale) for Nucleus, Dyninst 9.1.0, IDA Pro 6.7, and BAP/ByteWeight 0.9.9

Runtime

  • On par with the fastest alternatives


slide-16
SLIDE 16

Applicability to Malware Analysis

Resistance to Obfuscation

  • Although this talk is in the Malware session, we do not explicitly target malware
  • That said, our approach is agnostic to some basic obfuscation techniques:
      • Instruction-level polymorphism
      • Mangling of function prologues/epilogues
      • Some control-flow obfuscations (e.g., converting direct calls to indirect ones, branching functions, . . . )
  • But we make no promises for arbitrary obfuscations!


slide-17
SLIDE 17

Issues with Evaluation of Machine Learning Approaches

Performance Discrepancies

  • During our evaluation, we noticed far lower performance for ByteWeight than previously reported (Bao et al. 2014)
  • Mean F-score 0.32 points lower than expected
  • Observation persists for gcc (v4.7–v5.1), clang, and Visual Studio
  • Upon closer inspection, we discovered issues with the test suite used to evaluate all major machine learning-based function detection work (Bao et al. 2014 and Shin et al. 2015)


slide-18
SLIDE 18

Issues with Evaluation of Machine Learning Approaches

Test Suite Issues

  • Both Bao et al. and Shin et al. use ten-fold cross-validation to evaluate their work
  • Partition the test suite into a training set (B_T, 90% of binaries) and an evaluation set (B_E)
  • Repeat ten times such that each binary is in B_E exactly once
  • Crucial to ensure sufficient variation in the test suite to prevent overfitting!
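The ten-fold procedure described above can be sketched as follows. The function name, the seeded shuffle, and the placeholder binary names are assumptions; neither paper's actual tooling is shown on the slides.

```python
import random


def ten_fold_splits(binaries, k=10, seed=0):
    """Yield (B_T, B_E) pairs for k-fold cross-validation over whole binaries.

    Each binary lands in B_E in exactly one of the k rounds; in that round the
    remaining (k - 1)/k of the binaries form the training set B_T.
    """
    items = list(binaries)
    random.Random(seed).shuffle(items)       # deterministic shuffle
    folds = [items[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        b_e = folds[i]
        b_t = [b for j, fold in enumerate(folds) if j != i for b in fold]
        yield b_t, b_e


suite = [f"bin{i:03d}" for i in range(129)]  # placeholder binary names
splits = list(ten_fold_splits(suite))
print([len(b_e) for _, b_e in splits])
```

Splitting by binary guarantees that no binary is both trained on and evaluated in the same round, but it does nothing to stop the same function from appearing in both B_T and B_E when binaries share code.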


slide-19
SLIDE 19

Issues with Evaluation of Machine Learning Approaches

Test Suite Issues

  • Linux test suite used by Bao et al. and Shin et al. consists of coreutils (106 binaries), binutils (16 binaries), and findutils (7 binaries)
  • Average coreutils binary shares 54% of its functions with all other coreutils binaries
  • Average coreutils binary shares 94% of its functions with at least one other coreutils binary
  • For the average coreutils binary in B_E, at least 86% of its functions are expected to occur in B_T
  • Large degree of overfitting in evaluation of machine learning approaches; re-evaluation needed
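The two sharing statistics above can be computed as in the following sketch. The slides do not say how functions were matched across binaries, so this assumes each function has a comparable identifier (for instance a name or a hash of its bytes); the function name and the toy data are hypothetical and not measured from real binaries.

```python
def sharing_stats(funcs_by_binary):
    """For each binary, return two fractions of its functions:
    (present in *every* other binary, present in *at least one* other binary).

    funcs_by_binary maps a binary name to a non-empty set of function identifiers.
    """
    stats = {}
    for name, funcs in funcs_by_binary.items():
        others = [f for other, f in funcs_by_binary.items() if other != name]
        in_all = set.intersection(*others) if others else set()
        in_any = set.union(*others) if others else set()
        stats[name] = (len(funcs & in_all) / len(funcs),
                       len(funcs & in_any) / len(funcs))
    return stats


# Toy data only: three "binaries" sharing some helper functions.
suite = {
    "ls":   {"main_ls", "util_a", "util_b", "util_c"},
    "cat":  {"main_cat", "util_a", "util_b", "util_c"},
    "echo": {"main_echo", "util_a", "util_c"},
}
for name, (all_others, any_other) in sharing_stats(suite).items():
    print(f"{name}: {all_others:.0%} shared with all others, {any_other:.0%} with at least one")
```

This is why the expected overlap with B_T is so high: a function present in every other binary is always in B_T, whatever the split, and a function shared with at least one other binary is missing from B_T only if every binary containing it falls into the same evaluation fold.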


slide-20
SLIDE 20

Conclusion

  • We introduced a novel compiler-agnostic function detector
  • No maintenance/learning phase required
  • More accurate results than existing approaches
  • Inherent support for complex cases
  • Available open source: https://www.vusec.net/projects/function-detection/

  • Features export to IDA Pro → easy to use in real-world setting
