SLIDE 1

Finding library subroutines in stripped statically-linked binaries

findmagic Katharina Bogad

Technische Universität München

Computer Science Department

SS 2015 January 18, 2017

  • K. Bogad

findmagic SS 2015 January 18, 2017 1 / 39

SLIDE 2

Obligatory tl;dr me slide

▸ Computer Science student
▸ Member of the H4x0rPsch0rr CTF team and CTF player for fun (and sometimes profit)
▸ Interested in reverse engineering for a long time
▸ Hates QR codes

SLIDE 4

Preliminary audience questions

Who among you has...

▸ basic knowledge of graph theory?
▸ reverse engineered a statically linked binary at least once?

SLIDE 5

Problem description

Why?

▸ Traditional pattern matching: the exact library is needed for decent results
▸ Works reasonably well in homogeneous environments like MSVCRT
▸ Open source libraries?
▸ Embedded devices?

SLIDE 6

Problem description

Why?

So, what do we do if we cannot have symbols?

▸ Looking at the arguments?
▸ Looking at suspicious constants? Think of 0x8080808080 for strlen(3)

Let’s automate this!
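Why a run of 0x80 bytes hints at strlen(3): word-at-a-time string routines commonly test a whole machine word for a terminating zero byte with a bit trick built on the masks 0x0101... and 0x8080.... The Python sketch below illustrates that trick for 64-bit words. The function name `has_zero_byte` and the choice of a 64-bit word are assumptions made for illustration; the sketch is not taken from any particular libc.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
ONES = 0x0101010101010101    # 0x01 in every byte
HIGHS = 0x8080808080808080   # 0x80 in every byte: the "suspicious" constant


def has_zero_byte(word: int) -> bool:
    """Return True if any of the 8 bytes of a 64-bit word is 0x00."""
    word &= MASK64
    # Subtracting 1 from each byte sets its high bit only if the byte was
    # 0x00 or at least 0x81; "& ~word" drops the bytes that were >= 0x80.
    return ((word - ONES) & ~word & HIGHS) != 0


# A word without a terminator, and one with a terminator in the middle.
print(has_zero_byte(int.from_bytes(b"ABCDEFGH", "little")))
print(has_zero_byte(int.from_bytes(b"ABC\x00EFGH", "little")))
```

A function whose immediates include both masks is therefore a plausible candidate for strlen or a close relative, which is the kind of hint the constant multiset is meant to capture.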

SLIDE 8

Problem description

Why?

However, there are caveats:

▸ Finding arguments is not a trivial task.
▸ What makes a constant suspicious?

But automating gives new perspectives: comparing call graphs!

SLIDE 9

Algorithm Design

Graph definition

▸ A program is a set of attributed graphs G = (N, B)
▸ Nodes N are functions
▸ Branches B are calls between functions

SLIDE 14

Algorithm Design

Definitions for later use

We need:

▸ A string definition

\begin{align}
& \big(\forall (i,c) \in str : (c \ge \mathtt{0x20} \land c \le \mathtt{0xDF}) \lor c = \mathtt{0x0A} \lor c = \mathtt{0x0D} \lor c = \mathtt{0x09} \lor c = \mathtt{0x00}\big) \notag \\
& \land\ |str| > 1 \tag{1} \\
& \land\ \big(\forall (i,c) \in str \mid i = \max(i, str) : c = \mathtt{0x00}\big) \notag \\
& \land\ \big(\forall (i,c) \in str \mid i \ne \max(i, str) : c \ne \mathtt{0x00}\big) \notag
\end{align}

In words:

▸ Printable characters from extended ASCII ...
▸ ... and \n, \r, \t and 0x00 ...
▸ ... with a minimum length of 2 ...
▸ ... where the last character is 0x00 and no other character is 0x00.
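The string definition (1) translates almost line by line into a predicate. The following Python sketch is one possible reading of it; the function name `is_string` is hypothetical and the byte range 0x20 to 0xDF is copied from the slide.

```python
ALLOWED_CONTROL = {0x0A, 0x0D, 0x09}  # \n, \r, \t


def is_string(data: bytes) -> bool:
    """Check a byte sequence against string definition (1)."""
    if len(data) < 2:        # |str| > 1
        return False
    if data[-1] != 0x00:     # the last character must be 0x00
        return False
    # Every other character must be printable or one of \n, \r, \t.
    # 0x00 fails both tests, so no inner NUL byte is accepted.
    return all(0x20 <= c <= 0xDF or c in ALLOWED_CONTROL for c in data[:-1])


print(is_string(b"malloc(): memory corruption\x00"))
print(is_string(b"a\x00b\x00"))
```

In practice the candidate bytes would be read from the address a pointer-loading instruction refers to, up to and including the first 0x00.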

SLIDE 15

Algorithm Design

Definitions for later use

We need:

▸ A node definition

N = (n, s, C, S, I)

▸ n: Function name
▸ s: Function address
▸ C: Multiset of constant values
▸ S: Multiset of cross-referenced strings
▸ I: Ordered multiset of the machine instructions
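As a rough illustration of the node tuple N = (n, s, C, S, I) and the graph G = (N, B), the Python sketch below models both with dataclasses. All class and field names are assumptions chosen for readability, and the addresses in the usage lines are made up; the actual implementation may store this differently.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str                                              # n: function name
    address: int                                           # s: function address
    constants: Counter = field(default_factory=Counter)    # C: multiset of constants
    strings: Counter = field(default_factory=Counter)      # S: multiset of strings
    instructions: list = field(default_factory=list)       # I: ordered instructions


@dataclass
class CallGraph:
    nodes: dict = field(default_factory=dict)     # address -> Node
    branches: set = field(default_factory=set)    # (caller address, callee address)

    def add_call(self, caller: Node, callee: Node) -> None:
        """Record a call from caller to callee as a branch."""
        self.nodes.setdefault(caller.address, caller)
        self.nodes.setdefault(callee.address, callee)
        self.branches.add((caller.address, callee.address))


# Usage with made-up addresses.
graph = CallGraph()
regfree = Node("__regfree", 0x401000)
free = Node("free", 0x402000)
regfree.constants[0x30] += 1
graph.add_call(regfree, free)
print(len(graph.nodes), sorted(graph.branches))
```

A `Counter` is used for C and S because both are multisets: the same constant or string may be referenced several times by one function.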

SLIDE 16

Algorithm Design

Get crackin’

Objective: Generate a bijective mapping M: N1 → N2

▸ N1: known library function
▸ N2: function inside the target library

SLIDE 20

Algorithm Design

Get crackin’

1. Acquire target library with debug symbols
2. Build the graphs for it
3. Build graphs for the binary we analyse
4. Match them

SLIDE 22

Algorithm Design

Get crackin’

Do we need exactly the same binary used for linking?

▸ Short answer: no.
▸ Long answer: it depends.

SLIDE 23

Algorithm Design

Get crackin’

▸ A reasonably close version is enough
▸ Watch out for compiler flags
▸ Also problematic: assert()

SLIDE 24

Algorithm Design

Why assert() is evil

Caution: real world example

    assert((unsigned long) (old_size) < (unsigned long) (nb + MINSIZE));

with relocation:

    (unsigned long) (old_size) < (unsigned long) (
      nb + (unsigned long)(
        (((__builtin_offsetof (struct malloc_chunk, fd_nextsize)) +
          (
            (2 * (sizeof(size_t))) - 1
          ))
          & ~(
            (2 * (sizeof(size_t))) - 1
          ))))

No code, but debug strings vary!

without relocation:

    (unsigned long) (old_size) < (unsigned long) (
      nb + (unsigned long)(
        (((__builtin_offsetof(struct malloc_chunk, fd_nextsize)) +
          ((2 * (sizeof(size_t)) < __alignof__ (long double) ?
            __alignof__ (long double) :
            2 * (sizeof(size_t))
          ) - 1))
          & ~(
            (2 * (sizeof(size_t)) < __alignof__ (long double) ?
              __alignof__ (long double) :
              2 * (sizeof(size_t))
            ) - 1
          ))))

SLIDE 25

Automatic binary analysis

Overview

1. Iterate over subroutines
2. Iterate over the instructions of these subroutines
3. If something interesting is found, add it to the corresponding list [1]

[1] See the paper for a marvellous formal definition of this

SLIDE 26

Automatic binary analysis

Call analysis

▸ call instructions add a new branch to the function’s call graph
▸ Additionally for the Intel x86_64 architecture:
  ▸ Only if it’s a near call (opcode 0xE8)
  ▸ This ensures we’re in the same section
▸ Other architectures may need different conditions!
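On x86_64, a near call with opcode 0xE8 is five bytes long: the opcode followed by a signed 32-bit displacement relative to the address of the next instruction. The Python sketch below resolves the callee address from raw bytes. The function name `near_call_target` and its parameters are assumptions for illustration, and it presumes that `offset` is already known to be an instruction boundary (for example from a disassembler), because a 0xE8 byte can also occur inside other instructions or data.

```python
import struct


def near_call_target(code: bytes, offset: int, base: int):
    """Return the target of an E8 near call at code[offset], or None.

    code:   raw bytes of the section
    offset: offset of the instruction inside `code`
    base:   virtual address at which code[0] is loaded
    """
    if offset + 5 > len(code) or code[offset] != 0xE8:
        return None
    # Signed little-endian 32-bit displacement after the opcode byte.
    (rel32,) = struct.unpack_from("<i", code, offset + 1)
    next_insn = base + offset + 5      # the call instruction is 5 bytes long
    return (next_insn + rel32) & 0xFFFFFFFFFFFFFFFF


# call +0x0b, placed at address 0x1000
print(hex(near_call_target(bytes.fromhex("e80b000000"), 0, 0x1000)))
```

Indirect calls such as `call rax` (bytes `ff d0`) return None here, which matches the slide's restriction to near calls within the same section.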

SLIDE 27

Automatic binary analysis

Strings

▸ Look for something that loads a pointer (x86_64: lea, mov)
▸ Check if it’s a string by our definition
▸ If so, add it to the strings S of the current function

SLIDE 28

Automatic binary analysis

Constants

▸ We don’t want to add pointer arithmetic as constants
▸ Interesting constants are often bitmasks
▸ Thus, we limit ourselves to the immediates of and, or, xor and mov
▸ Optionally, we may filter further by checking the value of the constant
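The constant filter can be sketched as follows. The input format, a list of `(mnemonic, immediate)` pairs with `None` for instructions without an immediate, is a hypothetical pre-decoded form, and the `min_value` threshold is only one assumed way to implement the optional value check; neither is taken from the actual tool.

```python
from collections import Counter

# Mnemonics whose immediates are considered, as listed on the slide.
INTERESTING = {"and", "or", "xor", "mov"}


def collect_constants(instructions, min_value=0x100):
    """Build the constant multiset C from (mnemonic, immediate) pairs.

    min_value is an assumed value check that drops small immediates,
    which are rarely characteristic of a function.
    """
    constants = Counter()
    for mnemonic, imm in instructions:
        if mnemonic in INTERESTING and imm is not None and imm >= min_value:
            constants[imm] += 1
    return constants


decoded = [
    ("mov", 0x8080808080808080),
    ("add", 0x30),                 # pointer arithmetic: ignored
    ("and", 0xFF),                 # below the assumed threshold
    ("xor", None),                 # register-only xor: no immediate
    ("and", 0x8080808080808080),
]
print(collect_constants(decoded))
```

Restricting the mnemonics keeps structure offsets such as the `add rax, 0x30` above out of C, at the cost of missing constants that only appear in other instructions.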

SLIDE 29

Automatic binary analysis

Matching

Isomorphism:

▸ Ancient Greek: isos = equal and morphe = shape
▸ Mathematical way to compare the structure of objects

SLIDE 33

Automatic binary analysis

Matching

Choosing the right algorithm:

▸ Ullmann’s algorithm
▸ Nauty (no automorphism, yes?)
▸ VF2

SLIDE 34

Automatic binary analysis

Matching

Choosing the right algorithm:

▸ Call graphs cannot be considered randomly connected
▸ Some functions imply calls to other functions
  ▸ malloc() & free(), accept() & close(), ...
▸ VF2 is very fast in this situation
▸ Also, VF2 can check semantic equality of the nodes in the same step

SLIDE 35

VF2

▸ G1 = (N1, B1), G2 = (N2, B2)
▸ Mapping M ⊂ N1 × N2
▸ M must be a bijective function
▸ M must not alter the branch structure

SLIDE 36

VF2

▸ State Space Representation (SSR) with states s
▸ Essentially a set of tuples (n1, n2)
▸ M(s) denotes a partial mapping
▸ Two subgraphs G1(s) and G2(s) can be derived, containing only the nodes in the matching and the branches connecting them
▸ Same for M1(s), M2(s), B1(s), B2(s)

SLIDE 37

VF2

▸ Transition from state s to s′: s′ = s ∪ {(n,m)}
▸ But: only a small subset of these states is consistent
▸ We introduce k-lookahead rules to conclude whether a consistent state can be reached after k steps
▸ These rules will be called feasibility rules

SLIDE 38

VF2

Feasibility rules

Feasibility function:

\[ F(s,n,m) = F_{syn}(s,n,m) \land F_{sem}(s,n,m) \]

▸ F_syn → syntactic feasibility
▸ F_sem → semantic feasibility

SLIDE 39

VF2

Feasibility rules

▸ Initial state is empty, i.e. $M(s_0) = \emptyset$
▸ In each step, compute $P(s)$, the set of candidate node pairs to be added
▸ $T_n^{in}(s)$ → nodes with branches ending in $G_n(s)$
▸ $T_n^{out}(s)$ → nodes with branches starting from $G_n(s)$
▸ $P(s)$ is built from the out-sets if neither $T_1^{out}$ nor $T_2^{out}$ is empty, and from the in-sets $T_n^{in}$ otherwise:

\[ P(s) = \{ (n,m) \mid n \in T_1^{out}(s),\ m \in T_2^{out}(s) \} \]

▸ If $P(s)$ is still empty, backtrack until a state s is reached whose $P(s)$ contains node pairs that have not been examined yet
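The candidate set P(s) can be sketched in Python as below. Graphs are assumed to be dictionaries mapping every node to the set of its successors, and the partial mapping M(s) is a dictionary from nodes of G1 to nodes of G2; both representations and all function names are hypothetical. The terminal sets exclude nodes that are already matched, and the final fallback to all unmatched nodes follows the original VF2 description rather than the slide, since it is needed to get started from the empty initial state.

```python
def terminal_sets(succ, mapped):
    """Return (T_out, T_in) of one graph for the matched node set `mapped`."""
    # Targets of branches that start inside the matched subgraph.
    t_out = {v for u in mapped for v in succ[u]} - mapped
    # Origins of branches that end inside the matched subgraph.
    t_in = {u for u in succ for v in succ[u] if v in mapped} - mapped
    return t_out, t_in


def candidate_pairs(succ1, succ2, mapping):
    """Compute P(s) for the partial mapping `mapping` (dict n -> m)."""
    m1, m2 = set(mapping), set(mapping.values())
    out1, in1 = terminal_sets(succ1, m1)
    out2, in2 = terminal_sets(succ2, m2)
    if out1 and out2:
        return {(n, m) for n in out1 for m in out2}
    if in1 and in2:
        return {(n, m) for n in in1 for m in in2}
    # Assumed fallback: pair up all nodes that are not matched yet.
    return {(n, m) for n in set(succ1) - m1 for m in set(succ2) - m2}


succ1 = {"a": {"b", "c"}, "b": set(), "c": set()}
succ2 = {1: {2, 3}, 2: set(), 3: set()}
print(sorted(candidate_pairs(succ1, succ2, {"a": 1})))
```

Each pair in P(s) is then passed through the feasibility function F(s,n,m) before it is added to the mapping.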

SLIDE 40

VF2

Feasibility rules

Check predecessors of current node:

\[
\begin{aligned}
R_{pred}(s,n,m) \iff{} & (\forall n' \in M_1(s) \cap Pred(G_1,n)\ \exists m' \in Pred(G_2,m) \mid (n',m') \in M(s)) \\
\land{} & (\forall m' \in M_2(s) \cap Pred(G_2,m)\ \exists n' \in Pred(G_1,n) \mid (n',m') \in M(s))
\end{aligned}
\]

SLIDE 41

VF2

Feasibility rules

Check successors of current node:

\[
\begin{aligned}
R_{succ}(s,n,m) \iff{} & (\forall n' \in M_1(s) \cap Succ(G_1,n)\ \exists m' \in Succ(G_2,m) \mid (n',m') \in M(s)) \\
\land{} & (\forall m' \in M_2(s) \cap Succ(G_2,m)\ \exists n' \in Succ(G_1,n) \mid (n',m') \in M(s))
\end{aligned}
\]
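The predecessor and successor rules check that adding (n,m) keeps the already matched neighbourhood consistent: every matched predecessor (successor) of n must map to a predecessor (successor) of m, and vice versa. A minimal Python sketch, assuming graphs as dictionaries from node to successor set and the partial mapping as a dictionary n -> m (all names hypothetical):

```python
def preds(succ, node):
    """Predecessors of `node` in a graph given as node -> successor set."""
    return {u for u, targets in succ.items() if node in targets}


def r_succ(succ1, succ2, mapping, n, m):
    """R_succ: matched successors of n and m must correspond to each other."""
    inverse = {v: k for k, v in mapping.items()}
    forward = all(mapping[x] in succ2[m] for x in succ1[n] if x in mapping)
    backward = all(inverse[y] in succ1[n] for y in succ2[m] if y in inverse)
    return forward and backward


def r_pred(succ1, succ2, mapping, n, m):
    """R_pred: matched predecessors of n and m must correspond to each other."""
    inverse = {v: k for k, v in mapping.items()}
    p1, p2 = preds(succ1, n), preds(succ2, m)
    forward = all(mapping[x] in p2 for x in p1 if x in mapping)
    backward = all(inverse[y] in p1 for y in p2 if y in inverse)
    return forward and backward


g1 = {"a": {"b"}, "b": set()}
g2 = {1: {2}, 2: set()}
print(r_pred(g1, g2, {"a": 1}, "b", 2), r_pred(g1, g2, {"a": 1}, "b", 1))
```

With "a" already matched to 1, the pair ("b", 2) passes because 1 calls 2, whereas ("b", 1) fails because the matched caller of "b" has no counterpart among the callers of 1.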

SLIDE 42

VF2

Feasibility rules

1-lookahead:

\[
\begin{aligned}
R_{in}(s,n,m) \iff{} & (|Succ(G_1,n) \cap T_1^{in}(s)| = |Succ(G_2,m) \cap T_2^{in}(s)|) \\
\land{} & (|Pred(G_1,n) \cap T_1^{in}(s)| = |Pred(G_2,m) \cap T_2^{in}(s)|)
\end{aligned}
\]

\[
\begin{aligned}
R_{out}(s,n,m) \iff{} & (|Succ(G_1,n) \cap T_1^{out}(s)| = |Succ(G_2,m) \cap T_2^{out}(s)|) \\
\land{} & (|Pred(G_1,n) \cap T_1^{out}(s)| = |Pred(G_2,m) \cap T_2^{out}(s)|)
\end{aligned}
\]

SLIDE 43

VF2

Feasibility rules

2-lookahead:

\[
\begin{aligned}
R_{new}(s,n,m) \iff{} & (|\tilde{N}_1(s) \cap Pred(G_1,n)| = |\tilde{N}_2(s) \cap Pred(G_2,m)|) \\
\land{} & (|\tilde{N}_1(s) \cap Succ(G_1,n)| = |\tilde{N}_2(s) \cap Succ(G_2,m)|)
\end{aligned}
\]
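The three lookahead rules R_in, R_out and R_new all compare cardinalities, so they can be checked together by counting how the neighbours of n and m fall into the in-set, the out-set and the remaining nodes. The Python sketch below does that; it assumes that Ñ(s) denotes the nodes that are neither matched nor in a terminal set, as in the original VF2 description, and it uses the same hypothetical graph representation (node -> successor set) as before.

```python
def preds(succ, node):
    """Predecessors of `node` in a graph given as node -> successor set."""
    return {u for u, targets in succ.items() if node in targets}


def neighbour_counts(succ, mapped, node):
    """Count successors and predecessors of `node` in T_in, T_out and the rest."""
    t_out = {v for u in mapped for v in succ[u]} - mapped
    t_in = {u for u in succ for v in succ[u] if v in mapped} - mapped
    rest = set(succ) - mapped - t_out - t_in   # assumed meaning of the tilde set
    neighbours = (succ[node], preds(succ, node))
    return tuple(len(a & b) for a in neighbours for b in (t_in, t_out, rest))


def lookahead_ok(succ1, succ2, mapping, n, m):
    """R_in, R_out and R_new hold iff all six counts agree."""
    counts1 = neighbour_counts(succ1, set(mapping), n)
    counts2 = neighbour_counts(succ2, set(mapping.values()), m)
    return counts1 == counts2


g1 = {"a": {"b", "c"}, "b": set(), "c": set()}
g2 = {1: {2, 3}, 2: set(), 3: set()}
print(lookahead_ok(g1, g2, {}, "a", 1), lookahead_ok(g1, g2, {}, "a", 2))
```

Requiring equality corresponds to graph isomorphism, which is the case shown on the slides; these checks prune pairs early instead of discovering the mismatch several steps later.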

SLIDE 50

VF2

Feasibility rules

Example:

[Figure, shown in several animation steps: a call graph with known names (__regfree, __free_dfa_content, free_state, .free, free_token) next to a call graph with unknown names (sub_deadbeef, sub_00c0ffee, sub_b00bbabe, sub_1234abcd, sub_13371337)]

SLIDE 51

VF2

Feasibility rules

Semantic feasibility: We define a compatibility relation ≈

\[
\begin{aligned}
n \approx m \iff{} & (\forall c \in C_n\ \exists c' \in C_m \mid c = c') \land (\forall c \in C_m\ \exists c' \in C_n \mid c = c') \\
\land{} & (\forall s \in S_n\ \exists s' \in S_m \mid s = s') \land (\forall s \in S_m\ \exists s' \in S_n \mid s = s')
\end{aligned}
\]
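Read literally, the compatibility relation requires that every constant and every string of one node also occurs in the other node, in both directions. That amounts to comparing the distinct elements of the multisets and ignoring how often each occurs. A small Python sketch of this reading (the function name and argument order are assumptions):

```python
from collections import Counter


def compatible(consts_n, strings_n, consts_m, strings_m) -> bool:
    """n ≈ m: same distinct constants and same distinct strings."""
    return (set(consts_n) == set(consts_m)
            and set(strings_n) == set(strings_m))


lib_consts = Counter({0x8080808080808080: 2})
bin_consts = Counter({0x8080808080808080: 1})
print(compatible(lib_consts, Counter(), bin_consts, Counter()))
```

Ignoring multiplicities makes the check more tolerant of compiler differences such as loop unrolling, at the cost of being less selective than a full multiset comparison.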

SLIDE 52

VF2

Feasibility rules

This yields the final rule:

\[
\begin{aligned}
F_{sem}(s,n,m) \iff{} & n \approx m \\
\land{} & \forall (n',m') \in M(s),\ (n,n') \in B_1 \Rightarrow (n,n') \approx (m,m') \\
\land{} & \forall (n',m') \in M(s),\ (n',n) \in B_1 \Rightarrow (n',n) \approx (m',m)
\end{aligned}
\]

SLIDE 53

Matching

▸ Matching is done in a brute-force manner
▸ Multiple sets:
  ▸ Functions that can be exactly identified
  ▸ Functions that have multiple possible matches
  ▸ Functions that cannot be found via matching (no strings, no constants and no function calls, e.g. IO_default_sync)
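The three result sets can be illustrated with a small sorting step over the candidates found for each unknown function. In this Python sketch the input, a dictionary from each function of the analysed binary to the set of library function names it matched, is a hypothetical intermediate result, and the sample names and their pairing are made up for the example.

```python
def classify(candidates):
    """Split match candidates into exact, ambiguous and unmatched functions.

    candidates: dict mapping a function of the analysed binary to the set
    of library function names it could be matched with.
    """
    exact, ambiguous, unmatched = {}, {}, set()
    for func, names in candidates.items():
        if len(names) == 1:
            exact[func] = next(iter(names))     # exactly identified
        elif names:
            ambiguous[func] = sorted(names)     # multiple possible matches
        else:
            unmatched.add(func)                 # nothing to match on
    return exact, ambiguous, unmatched


sample = {
    "sub_deadbeef": {"__regfree"},
    "sub_1234abcd": {"strcpy_sse2", "strcpy_sse3"},
    "sub_13371337": set(),
}
print(classify(sample))
```

Keeping the ambiguous set instead of discarding it is what allows the tool to offer hints for manual reversing.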

SLIDE 54

Evaluation

Implementation

▸ Test implementation was created
▸ Free as in speech (GPLv3 or later)
▸ Grab it from GitHub: https://github.com/masterofjellyfish/findmagic
▸ Disclaimer: You need .NET Framework or Mono
▸ Supports only x86_64 for now
▸ Major code cleanup and more architectures (ARM, MIPS) are planned

SLIDE 55

Evaluation

Results

Exact matches:

definitions               linked against               findmagic   FLIRT (IDA)
glibc 2.21 (Arch Linux)   glibc 2.21 (Arch Linux)      376         233
glibc 2.21 (Arch Linux)   glibc 2.13 (Debian wheezy)   105         72

SLIDE 56

Evaluation

Results

▸ Algorithm can also provide hints
▸ For example: strcpy_sse2, strcpy_sse3
  ▸ Same constants, same call graph
  ▸ Indistinguishable by the algorithm, but they do the same job
▸ Helpful for manual reversing!

SLIDE 57

Evaluation

Known Limitations

Recovery fails if there are multiple matching possibilities and the function is not part of a unique call graph. Example:

    .CapstoneX86Detail
    push rbp
    mov rbp, rsp
    mov rax, rdi
    add rax, 0x30
    leave
    retn

Pseudocode:

    cs_x86* CapstoneX86Detail(cs_detail *detail) {
        return &detail->x86;
    }

SLIDE 58

Questions?

Thanks!