Be a Binary Rockstar: An Introduction to Program Analysis with Binary Ninja



slide-1
SLIDE 1

Be a Binary Rockstar

An Introduction to Program Analysis with Binary Ninja

slide-2
SLIDE 2

Agenda

  • Motivation
  • Current State of Program Analysis
  • Design Goals of Binja Program Analysis

  • Building Tools

2

slide-3
SLIDE 3

Motivation

3

slide-4
SLIDE 4
  • Tooling - Concrete -> Symbolic

○ Increase speed & effectiveness of RE / VR

  • Make Program Analysis more accessible & useful

4

slide-5
SLIDE 5

Foundations

  • Need to understand code semantics
  • Could be done directly on the assembly
  • An Intermediate Language (IL) is needed

5

slide-6
SLIDE 6

Why IL?

  • Architecture Abstraction
  • Smaller number of instructions

6

slide-7
SLIDE 7

Easy to lift

  • Simple flags calculation
  • As close to native instructions as possible
  • Typeless - types inferred later

7

slide-8
SLIDE 8

Easy to read

  • Intuitive to read
  • Tree-based infix notation
  • No register abstraction
  • Flags calculation only when necessary
  • Avoid excessive temporaries

8

slide-9
SLIDE 9

IL Instruction Set Size

[Figure: an axis labeled "Instruction Set Size", with "Easier to analyze" at one end and "Easier to lift" at the other]

9

slide-10
SLIDE 10

The Options

10

slide-11
SLIDE 11

Existing Options for IL

  • BAP
  • VEX
  • REIL
  • LLVM
  • IDA

11

slide-12
SLIDE 12

BAP

  • Tree based :)
  • Flags are explicit and inhibit readability :(
  • Written in OCaml :(

12

slide-13
SLIDE 13

    add ebx, eax
    shl ebx, cl

    addr 0x0 @asm "add %eax,%ebx"
    t:u32 = REBX:u32
    REBX:u32 = REBX:u32 + REAX:u32
    RCF:bool = REBX:u32 < t:u32
    addr 0x2 @asm "shl %cl,%ebx"
    t1:u32 = REBX:u32 >> 0x20:u32 - (RECX:u32 & 0x1f:u32)
    RCF:bool = ((RECX:u32 & 0x1f:u32) = 0:u32) & RCF:bool | ~((RECX:u32 & 0x1f:u32) = 0:u32) & low:bool(t1:u32)

13

slide-14
SLIDE 14

VEX

  • Register names are abstracted :(
  • Single assignment :(
  • Over 1000 instructions! :(
  • Yet they call it “RISC-like”
  • Even Angr is planning a move away from it

14

slide-15
SLIDE 15

    subs R2, R2, #8

    t0 = GET:I32(16)
    t1 = 0x8:I32
    t3 = Sub32(t0,t1)
    PUT(16) = t3
    PUT(68) = 0x59FC8:I32

15

slide-16
SLIDE 16

REIL

  • Tiny instruction set
  • Horrible readability
  • Makes abstractions nearly impossible
  • Flags are explicit and inhibit readability :(

16

slide-17
SLIDE 17

    test eax, eax

    00000000.00 STR R_EAX:32, , V_00:32
    00000000.01 STR 0:1, , R_CF:1
    00000000.02 AND V_00:32, ff:8, V_01:8
    00000000.03 SHR V_01:8, 7:8, V_02:8
    00000000.04 SHR V_01:8, 6:8, V_03:8
    00000000.05 XOR V_02:8, V_03:8, V_04:8
    00000000.06 SHR V_01:8, 5:8, V_05:8
    00000000.07 SHR V_01:8, 4:8, V_06:8
    00000000.08 XOR V_05:8, V_06:8, V_07:8
    00000000.09 XOR V_04:8, V_07:8, V_08:8
    00000000.0a SHR V_01:8, 3:8, V_09:8
    00000000.0b SHR V_01:8, 2:8, V_10:8
    00000000.0c XOR V_09:8, V_10:8, V_11:8
    00000000.0d SHR V_01:8, 1:8, V_12:8
    00000000.0e XOR V_12:8, V_01:8, V_13:8
    00000000.0f XOR V_11:8, V_13:8, V_14:8
    00000000.10 XOR V_08:8, V_14:8, V_15:8
    00000000.11 AND V_15:8, 1:1, V_16:1
    00000000.12 NOT V_16:1, , R_PF:1
    00000000.13 STR 0:1, , R_AF:1
    00000000.14 EQ V_00:32, 0:32, R_ZF:1
    00000000.15 SHR V_00:32, 1f:32, V_17:32
    00000000.16 AND 1:32, V_17:32, V_18:32
    00000000.17 EQ 1:32, V_18:32, R_SF:1
    00000000.18 STR 0:1, , R_OF:1

17

slide-18
SLIDE 18

LLVM

  • Easy to analyze and has great tools already available
  • It’s a compiler!

○ Reversers want a decompiler.
○ Cannot be the only goal

18

slide-19
SLIDE 19

LLVM Challenges

  • Hard to lift well from compiled binaries

○ Designed for compiler output

  • Expects type information in the instructions
  • SSA form - assembly is not
  • Stack in assembly looks like a structure, but structures lose many advantages of SSA

19

slide-20
SLIDE 20

IDA

?

20

slide-21
SLIDE 21

Binary Ninja’s Answer

  • Binary Ninja Intermediate Language (BNIL)

21

slide-22
SLIDE 22

IL Goals & Design

22

slide-23
SLIDE 23

Why Another IL?

  • Popular existing ILs for compiled binaries are not very human readable. They are extremely low level and verbose.
  • Existing ILs are single stage. Heavyweight analysis must be performed to get anywhere close to decompiled output.
  • Writing a lifter for a new architecture is usually very time consuming.

23

slide-24
SLIDE 24

Binary Ninja IL

  • Create a family of ILs with multiple stages of analysis
  • Lowest level is close to assembly
  • After analysis and transformations, higher levels are closer to decompiled output and would be much easier to translate to good LLVM code
  • Analysis involved in each transformation is easy to understand, fast, and directly aids further analysis

24

slide-25
SLIDE 25

IL Design Goals

  • Human readable
  • Computer understandable (SSA, 3AF, etc.)
  • Plugin understandable
  • Easy to lift native architectures
  • Translation to other ILs such as LLVM
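
One way to picture a "computer understandable" form: assuming "3AF" here stands for three-address form, a nested expression tree can be flattened so that every statement has at most one operator. The sketch below is a toy illustration only. The tuple encoding and the names `to_three_address` and `flatten` are hypothetical and are not part of the Binary Ninja API.

```python
import itertools

def to_three_address(expr, out, temps):
    """Flatten a nested (op, left, right) tuple; return the name holding the result."""
    if isinstance(expr, str):          # a leaf: register name or constant
        return expr
    op, left, right = expr
    left_name = to_three_address(left, out, temps)
    right_name = to_three_address(right, out, temps)
    name = f"t{next(temps)}"           # fresh temporary for this sub-expression
    out.append(f"{name} = {left_name} {op} {right_name}")
    return name

def flatten(dest, expr):
    """Return three-address statements that compute expr into dest."""
    out = []
    result = to_three_address(expr, out, itertools.count())
    out.append(f"{dest} = {result}")
    return out

# Hypothetical input tree for: x8 = (x8 << 2) + table
statements = flatten("x8", ("+", ("<<", "x8", "2"), "table"))
```

The tree form is what makes the IL pleasant to read; the flattened form is what makes each intermediate value individually addressable by an analysis.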

25

slide-26
SLIDE 26

Human Readable

  • Reads like pseudocode, even in lowest level form

  • Flags are resolved into readable expressions

26

slide-27
SLIDE 27

Low Level IL Example

x86-64 Assembly:

    lea rax, [0x201047]
    lea rdi, [0x201040]
    push rbp
    sub rax, rdi
    mov rbp, rsp
    cmp rax, 0xe
    ja 0x68d

Low Level IL:

    rax = 0x201047
    rdi = 0x201040
    push(rbp)
    rax = rax - rdi
    rbp = rsp
    if (rax u> 0xe) then 6 @ 0x68d else 8 @ 0x68b

27

slide-28
SLIDE 28

Low Level IL Example

MIPS Assembly:

    addiu $sp, $sp, -0x18
    sw $ra, 0x14($sp)
    lw $a0, ($a1)
    jal atoi
    nop
    sltiu $at, $v0, 0x20
    beqz $at, 0x4002d8
    nop

Low Level IL:

    $sp = $sp - 0x18
    [$sp + 0x14].d = $ra
    $a0 = [$a1].d
    call(atoi)
    $at = $v0 u< 0x20 ? 1 : 0
    if ($at == 0) then 7 @ 0x4002d8 else 12 @ 0x400290

28

slide-29
SLIDE 29

Computer Understandable

  • Multiple IL forms
  • Pick the right IL for the task at hand

29

slide-30
SLIDE 30

IL Forms

Lifted IL

  • ASM -> IL

Low Level IL

  • Flags use resolved
  • SSA / 3AF

Medium Level IL

  • Stack usage resolved
  • Type propagation
  • SSA / 3AF

High Level IL

  • Calls in high level form
  • Expression folding
  • Like decompiled output

30

slide-31
SLIDE 31

Plugin Understandable

  • All IL forms directly accessible from API
  • Analysis performed on IL also accessible by API

31

slide-32
SLIDE 32

Easy to Lift

  • Expression tree
  • Designed for quick, modular lifter implementations
  • Semantic flags ease the burden of describing flag effects during lifting

32

slide-33
SLIDE 33

Semantic Flags

  • Architecture plugins define the set of flags and their semantic roles
  • Instructions can define a set of flags they write
  • Data flow analysis is performed to link flag uses to flag writes

33

slide-34
SLIDE 34

Semantic Flags

  • In most compiled code, flags are resolved to simple comparison expressions with no effort from the architecture plugin
  • Special cases fall back to emitting concrete flag write expressions

34

slide-35
SLIDE 35

Semantic Flags Example

    sub.q{*}(rax, 0xe)               “Writes to all ALU flags”
    if (u>) then … else …            “Flag state representing unsigned greater than”
    if (rax u> 0xe) then … else …    Folded expression describing use of flags
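
The folding step in this example can be sketched as a small data-flow pass: remember the operands of the latest flag-writing instruction, and substitute them when a conditional uses a flag state. This is a toy model under stated assumptions, not Binary Ninja's implementation. The tuple encoding and the name `resolve_flags` are hypothetical.

```python
# Toy model of semantic flag resolution (hypothetical encoding, not the Binary Ninja API).
# ("sub", dest_or_None, left, right, writes_flags) models a subtraction or compare;
# ("if", condition) models a branch on a semantic flag state such as "u>".

def resolve_flags(instrs):
    """Fold each flag-using 'if' into a comparison on the operands that set the flags."""
    pending = None  # (left, right) of the latest flag write, while both are still intact
    out = []
    for ins in instrs:
        if ins[0] == "sub":
            _, dest, left, right, writes_flags = ins
            if dest is not None:
                out.append(f"{dest} = {left} - {right}")
            if writes_flags:
                # Folding is only safe if the operands keep their old values.
                pending = None if dest in (left, right) else (left, right)
            elif pending is not None and dest in pending:
                pending = None  # an operand of the earlier comparison was overwritten
        elif ins[0] == "if":
            cond = ins[1]
            if pending is not None:
                out.append(f"if ({pending[0]} {cond} {pending[1]})")
            else:
                out.append(f"if (flag:{cond})")  # fall back to a concrete flag use
    return out

# cmp rax, 0xe ; ja ...
folded = resolve_flags([("sub", None, "rax", "0xe", True), ("if", "u>")])
# sub rax, rdi ; ja ...  (rax is clobbered, so the sketch does not fold)
unfolded = resolve_flags([("sub", "rax", "rax", "rdi", True), ("if", "u>")])
```

The fallback branch mirrors the "special cases" bullet: when the simple pattern does not apply, the flag use stays explicit.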

35

slide-36
SLIDE 36

Translating Upwards

  • Semantic flags analysis gives Low Level IL with flag usage fully resolved
  • Stack is represented as memory accesses, so data flow can be difficult to compute on stack variables in Low Level IL
  • Need to analyze and translate to Medium Level IL

36

slide-37
SLIDE 37

Low Level IL to Medium Level IL

  • Low Level IL is translated to SSA form
  • Use implicit data flow from SSA to resolve stack layout
  • Data flow based stack layout resolution avoids problems with nonstandard frame pointer behavior
  • Translate loads and stores on stack to stack variable uses and assignments
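
A minimal sketch of the idea, assuming 32-bit x86 and a made-up tuple encoding for Low Level IL: track which registers hold "entry stack pointer plus a constant", then rename loads and stores through those registers to stack variables. The names `name_for` and `resolve_stack` are hypothetical, and the real analysis works on SSA form rather than a single forward pass.

```python
# Toy stack-layout resolution (hypothetical encoding, 32-bit x86 assumed).
# Offsets are relative to the stack pointer value at function entry.

def name_for(offset):
    """Name a stack slot by its offset from the entry stack pointer."""
    return f"var_{-offset:x}" if offset < 0 else f"arg_{offset:x}"

def resolve_stack(instrs):
    offsets = {"esp": 0}  # registers known to hold entry_sp + constant
    out = []
    for op, *args in instrs:
        if op == "push":                      # push(reg)
            offsets["esp"] -= 4
            out.append(f"{name_for(offsets['esp'])} = {args[0]}")
        elif op == "copy":                    # dst = src
            dst, src = args
            if src in offsets:
                offsets[dst] = offsets[src]   # dst now also tracks the stack
            else:
                offsets.pop(dst, None)
                out.append(f"{dst} = {src}")
        elif op == "add":                     # reg = reg + const, reg tracks the stack
            reg, const = args
            offsets[reg] += const
        elif op == "load":                    # dst = [base + disp]
            dst, base, disp = args
            offsets.pop(dst, None)
            out.append(f"{dst} = {name_for(offsets[base] + disp)}")
        elif op == "store":                   # [base + disp] = src
            base, disp, src = args
            out.append(f"{name_for(offsets[base] + disp)} = {src}")
    return out

llil = [
    ("push", "ebp"),            # push(ebp)
    ("copy", "ebp", "esp"),     # ebp = esp
    ("add", "esp", -0x18),      # esp = esp - 0x18
    ("load", "eax", "ebp", 8),  # eax = [ebp + 8].d
    ("store", "esp", 0, "eax"), # [esp].d = eax
]
mlil = resolve_stack(llil)
```

Because the pass follows data flow instead of assuming `ebp` is a frame pointer, it does not care which register the code uses to address the frame.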

37

slide-38
SLIDE 38

Medium Level IL Example

Low Level IL:

    push(ebp)
    ebp = esp
    esp = esp - 0x18
    eax = [ebp + 8].d
    [esp].d = eax
    call(free)
    esp = ebp
    ebp = pop
    <return> jump(pop)

Medium Level IL:

    var_4 = ebp
    eax = arg_4
    var_1c = eax
    free(var_1c)
    ebp = var_4
    return

38

slide-39
SLIDE 39

Medium Level IL

  • Registers and stack usage are now both treated as variables
  • Stack variables no longer use explicit memory access
  • Translate to SSA form to obtain implicit data flow on both registers and stack variables

  • Type propagation is performed on SSA form

39

slide-40
SLIDE 40

Using Medium Level IL - Jump Tables

40

slide-41
SLIDE 41

Using Medium Level IL - Jump Tables

  • Jump table resolution based on path-sensitive data flow
  • SSA conversion process also tracks control flow dependence for every block
  • Data flow computations allow disjoint sets of possible values

  • Reads from memory are simulated
  • At jump site, possible values are the possible jump targets

41

slide-42
SLIDE 42

Jump Table Example

Medium Level IL SSA Form:

    x8#1 = zx.q(x0#2.d)
    if (x0#2.d u> 0x1f) then … else …
    …
    x8#2 = sx.q([table + (x8#1 << 2)].d)
    x8#3 = x8#2 + table
    jump(x8#3)

Solve for this to get jump targets

42

slide-43
SLIDE 43

Jump Table Example

    x8#1 = zx.q(x0#2.d)
    if (x0#2.d u> 0x1f) then … else …
    …
    x8#2 = sx.q([table + (x8#1 << 2)].d)
    x8#3 = x8#2 + table
    jump(x8#3)

Track flow backwards with SSA to find definitions

43

slide-44
SLIDE 44

Jump Table Example

    x8#1 = zx.q(x0#2.d)
    if (x0#2.d u> 0x1f) then … else …
    …
    x8#2 = sx.q([table + (x8#1 << 2)].d)
    x8#3 = x8#2 + table
    jump(x8#3)

Memory read depends on value of x8#1

44

slide-45
SLIDE 45

Jump Table Example

    x8#1 = zx.q(x0#2.d)
    if (x0#2.d u> 0x1f) then … else …
    …
    x8#2 = sx.q([table + (x8#1 << 2)].d)
    x8#3 = x8#2 + table
    jump(x8#3)

Value used in branch comparison

45

slide-46
SLIDE 46

Jump Table Example

    x8#1 = zx.q(x0#2.d)
    if (x0#2.d u> 0x1f) then … else …
    …
    x8#2 = sx.q([table + (x8#1 << 2)].d)
    x8#3 = x8#2 + table
    jump(x8#3)

Branch condition must be false to reach jump site

46

slide-47
SLIDE 47

Jump Table Example

    x8#1 = zx.q(x0#2.d)
    if (x0#2.d u> 0x1f) then … else …
    …
    x8#2 = sx.q([table + (x8#1 << 2)].d)
    x8#3 = x8#2 + table
    jump(x8#3)

When false, we know that x0#2.d is between 0 and 0x1f inclusive

47

slide-48
SLIDE 48

Jump Table Example

    x8#1 = zx.q(x0#2.d)
    if (x0#2.d u> 0x1f) then … else …
    …
    x8#2 = sx.q([table + (x8#1 << 2)].d)
    x8#3 = x8#2 + table
    jump(x8#3)

Resolve forward to obtain possible jump targets
Set of possible values here are the jump targets
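
The forward resolution step can be sketched in plain Python. Once the branch constraint bounds the index, each possible index is pushed through a simulated memory read and the final addition. This is an illustration under assumptions, not Binary Ninja's algorithm: memory is modeled as a `bytes` object indexed by address, the table is assumed to be little-endian, and the name `jump_targets` and the sample table contents are made up.

```python
import struct

def jump_targets(memory, table, max_index):
    """Enumerate table + sx.q([table + (i << 2)].d) for every i in [0, max_index]."""
    targets = set()
    for i in range(max_index + 1):            # index range learned from the false branch
        start = table + (i << 2)
        (rel,) = struct.unpack("<i", memory[start:start + 4])  # simulated signed 32-bit read
        targets.add(table + rel)              # x8#3 = x8#2 + table
    return targets

# Hypothetical image: 0x10 bytes of padding, then a 4-entry table of relative offsets.
table = 0x10
memory = bytes(0x10) + struct.pack("<4i", -8, -4, -8, 0x20)
targets = jump_targets(memory, table, 3)
```

Returning a set rather than a range matters here: the targets are disjoint addresses, which is why the data flow needs to support disjoint sets of possible values.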

48

slide-49
SLIDE 49

Using Medium Level IL - Jump Tables

  • More complex idioms need to combine multiple sources of information
  • Value through SSA ϕ-functions is the set union of the inputs
  • Value of a specific SSA variable is the set intersection of the information found in the definition and all uses of the variable
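
The two combination rules map directly onto set operations. A minimal sketch, with hypothetical helper names `phi` and `refine` and made-up value sets:

```python
def phi(*inputs):
    """Possible values after a phi-function: the union of all incoming sets."""
    return set().union(*inputs)

def refine(definition, *use_constraints):
    """Possible values of one SSA variable: its definition intersected with
    every constraint learned from its uses."""
    result = set(definition)
    for constraint in use_constraints:
        result &= constraint
    return result

# x#3 = phi(x#1, x#2), and a later bounds check implies x#3 <= 0x1f
merged = phi({0, 1}, {5, 40})
bounded = refine(merged, set(range(0x20)))
```

Union widens at merge points, intersection narrows as more facts about one variable are discovered.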

49

slide-50
SLIDE 50

Leveraging the Jump Table Algorithm

  • Single jump table algorithm works on all architectures with no additional effort from architecture plugin

  • Control flow dependence information accessible from API
  • Queries for set of possible values accessible from API

50

slide-51
SLIDE 51

The Final Forms

  • Medium Level IL has the type information, stack knowledge, and SSA form to translate easily into LLVM IR
  • Further analysis can be performed to translate to High Level IL, the Binary Ninja IL that will be used to create its decompiler
  • All aspects of every IL form are plugin accessible, so translating to other representations is straightforward

51

slide-52
SLIDE 52

Binary Ninja for Profit

52

slide-53
SLIDE 53

Binja API

  • Python, C and C++ APIs (headless)
  • Branches: basic block / function edges (incoming & outgoing)
  • Get the register states, some naive range analysis

  • api.binary.ninja/search.html

53

slide-54
SLIDE 54

binja_memcpy.py: IL /bin/bash

54

slide-55
SLIDE 55

binja_memcpy.py: IL /bin/bash

55

slide-56
SLIDE 56

binja_memcpy.py: API

56

slide-57
SLIDE 57

binja_memcpy.py: API

57

slide-58
SLIDE 58

binja_memcpy.py: API

58

slide-59
SLIDE 59

binja_memcpy.py: API

59

slide-60
SLIDE 60

binja_memcpy.py: Output

60

slide-61
SLIDE 61

SSA: Uninitialized variable

    for func in bv.functions:
        for block in func.medium_level_il.ssa_form:
            for instr in block:
                visit_instr(instr)

61

slide-62
SLIDE 62

SSA: Uninitialized variable

    def visit_instr(inst):
        # Read of variable
        if inst.operation == MLIL_VAR_SSA:
            # Not written
            if inst.index == 0:
                if inst.src.type == StackVariableSourceType:
                    # Local variables
                    if inst.src.identifier < 0:
                        print("Uninitialized stack variable reference at " + hex(inst.address))

62

slide-63
SLIDE 63

SSA: Uninitialized variable

        else:
            # Not a variable read: recurse into sub-expressions
            for op in inst.operands:
                if isinstance(op, MediumLevelILInstruction):
                    visit_instr(op)

63

slide-64
SLIDE 64

Symbolic Execution

  • Very accurate
  • Takes time, data, and memory; often not feasible
  • Idea: reason only about what we care about
  • Apply complex data to abstract domains!
  • Domains: type, sign, range, color, etc.

64

slide-65
SLIDE 65

Abstract Interpretation

  • Sets of concrete values are abstracted imprecisely
  • Galois Connection formalizes Concrete <-> Abstract

65

slide-66
SLIDE 66

Abstract Interpretation

  • x's value is imprecise
  • Compilers perform imprecise abstraction

    int x;
    int[] a = new int[10];
    a[2 * x] = 3;

  • 1. Add precision - i.e. declare abstract value [0, 9]
  • 2. Symbolically execute with abstract domain / values
  • Requires control-flow analysis
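
As a rough illustration of executing with abstract values, the sketch below uses an interval (range) domain to check the `a[2 * x]` access from this slide. The `Interval` class and `index_is_safe` helper are hypothetical names, and the interval chosen for `x` is an assumption made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Abstract value: every integer between lo and hi inclusive."""
    lo: int
    hi: int

    def scale(self, k):
        """Abstract multiplication by a constant k."""
        a, b = self.lo * k, self.hi * k
        return Interval(min(a, b), max(a, b))

    def within(self, other):
        """True if every concrete value of self also lies in other."""
        return other.lo <= self.lo and self.hi <= other.hi

def index_is_safe(x, array_len=10):
    """Check a[2 * x] against the valid index range [0, array_len - 1]."""
    return x.scale(2).within(Interval(0, array_len - 1))

safe = index_is_safe(Interval(0, 4))    # 2 * x is in [0, 8]
unsafe = index_is_safe(Interval(0, 9))  # 2 * x is in [0, 18]
```

The result `[0, 8]` also contains odd numbers that `2 * x` can never produce. That is the imprecision the slide refers to: the abstraction trades exactness for a check that is cheap to compute.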

66

slide-67
SLIDE 67

Abstract Domains & Sign Analysis

    int a, b, c;
    a = 42;
    b = 87;
    if (input) {
        c = a + b;
    } else {
        c = a - b;
    }

  • Map variables to an abstract value
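
A minimal sketch of a sign domain applied to this snippet. The four abstract values and the helper names are assumptions chosen for illustration, not the plugin's actual code.

```python
POS, NEG, ZERO, TOP = "+", "-", "0", "?"  # TOP means the sign is unknown

def sign_of(n):
    """Abstract a concrete integer to its sign."""
    return POS if n > 0 else NEG if n < 0 else ZERO

def add(a, b):
    """Abstract addition over signs."""
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    if a == b and a != TOP:
        return a       # (+) + (+) = (+) and (-) + (-) = (-)
    return TOP         # mixed signs or an unknown input

def negate(a):
    return {POS: NEG, NEG: POS}.get(a, a)

def sub(a, b):
    """Abstract subtraction: a - b is a + (-b)."""
    return add(a, negate(b))

def join(a, b):
    """Merge the abstract values arriving from two branches."""
    return a if a == b else TOP

a, b = sign_of(42), sign_of(87)
c_then = add(a, b)          # c = a + b
c_else = sub(a, b)          # c = a - b
c = join(c_then, c_else)    # value of c after the if/else
```

In this sketch `c_else` comes out as unknown even though 42 - 87 is concretely negative. The sign domain only knows "positive minus positive", which shows how much precision a coarse domain gives up.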

67

slide-68
SLIDE 68

Abstract Domains & Sign Analysis

  • Binary Ninja plugin
  • Path sensitive - construct lattices of abstract values
  • Under-approximate
  • One abstract state per CFG node
  • Avoid loss in precision for fractions.

68

slide-69
SLIDE 69

Demo!

  • Analyze example program
  • PHP CVE-2016-6289

69

slide-70
SLIDE 70

UAF Analysis: PointsTo for Binja IL

blog.trailofbits.com/2016/03/09/the-problem-with-dynamic-program-analysis/

  • Before: Allocation -> Write
  • UAF Analysis: Allocation -> Free -> Use
  • Key Idea: Data flow graph, assignments, copies, dereferences, and frees of pointers
  • Context and path sensitive (path API == soon!).
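
The Allocation -> Free -> Use pattern can be sketched over a straight-line list of pointer events. This toy version tracks which allocation each variable points to, so a use through a copied pointer is still caught. The event encoding and the name `find_use_after_free` are hypothetical, and the sketch ignores branching, so it is neither context nor path sensitive.

```python
def find_use_after_free(events):
    """Return (event_index, variable) for every use of a freed allocation.

    Events are ("alloc", var), ("copy", dst, src), ("free", var) or ("use", var).
    """
    points_to = {}   # variable name -> allocation id
    freed = set()    # allocation ids that have been freed
    findings = []
    for index, event in enumerate(events):
        op = event[0]
        if op == "alloc":
            points_to[event[1]] = index        # fresh allocation, identified by its event index
        elif op == "copy":                     # dst = src
            _, dst, src = event
            if src in points_to:
                points_to[dst] = points_to[src]
            else:
                points_to.pop(dst, None)
        elif op == "free":
            if event[1] in points_to:
                freed.add(points_to[event[1]])
        elif op == "use":
            if points_to.get(event[1]) in freed:
                findings.append((index, event[1]))
    return findings

trace = [("alloc", "p"), ("copy", "q", "p"), ("free", "p"), ("use", "q")]
findings = find_use_after_free(trace)
```

Tracking allocation identities instead of variable names is what lets the alias `q` be reported even though only `p` was passed to the free.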

70

slide-71
SLIDE 71

Devirtualizing C++

  • VTable Function Call
  • Example: mov eax, [ecx]; call [eax + 4]

https://blog.trailofbits.com/2017/02/13/devirtualizing-c-with-binary-ninja/
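
When the object address in `ecx` is known, the two-instruction idiom above amounts to two pointer reads. A minimal sketch, assuming 32-bit little-endian pointers and memory modeled as a `bytes` object indexed by address. The name `resolve_virtual_call` and the sample layout are made up for the example.

```python
import struct

def resolve_virtual_call(memory, object_addr, slot_offset):
    """Resolve `mov eax, [ecx]; call [eax + slot_offset]` for ecx == object_addr."""
    (vtable,) = struct.unpack_from("<I", memory, object_addr)           # mov eax, [ecx]
    (target,) = struct.unpack_from("<I", memory, vtable + slot_offset)  # call [eax + 4]
    return target

# Hypothetical image: object at 0x0 holding a vtable pointer to 0x10,
# and a vtable at 0x10 with two function pointers.
memory = struct.pack("<I", 0x10) + bytes(12) + struct.pack("<2I", 0x1000, 0x2000)
target = resolve_virtual_call(memory, 0x0, 4)
```

The hard part in practice is knowing which object `ecx` refers to at the call site, which is where the data flow information from the IL comes in.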

71

slide-72
SLIDE 72

Devirtualizing C++

72

slide-73
SLIDE 73

Devirtualizing C++

73

slide-74
SLIDE 74

Playing with Scripts!

  • memcpy, headless Python API script
  • Depth-first-search, path sensitive CFG template
  • Sign analysis, abstract domain plugin, CFG traversal script

https://github.com/trailofbits/binjascripts

  • And much much more…
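
For a sense of what a depth-first, path-sensitive CFG traversal looks like, here is a self-contained sketch that enumerates acyclic paths through a graph given as a dictionary. It is not taken from the repository above. The name `enumerate_paths` and the sample graph are hypothetical.

```python
def enumerate_paths(cfg, start, end):
    """Yield every acyclic path from start to end in a CFG given as {block: [successors]}."""
    stack = [(start, [start])]
    while stack:
        block, path = stack.pop()
        if block == end:
            yield path
            continue
        for succ in cfg.get(block, []):
            if succ not in path:   # skip back edges so loops terminate
                stack.append((succ, path + [succ]))

cfg = {
    "entry": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}
paths = sorted(enumerate_paths(cfg, "entry", "exit"))
```

Each yielded path is a place to run a per-path analysis, such as carrying one abstract state along the blocks of that path.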

74

slide-75
SLIDE 75

Conclusion: Resources

  • binary.ninja/
  • Abstract Interpretation talk: santos.cs.ksu.edu/schmidt/Escuela03/WSSA/talk1p.pdf
  • Static Program Analysis Book! cs.au.dk/~amoeller/spa/spa.pdf

75

slide-76
SLIDE 76

Conclusion: Binary Ninja

76

slide-77
SLIDE 77

Contact Us

Sophia d’Antoine

  • IRC/Slack: @quend
  • sophia@trailofbits.com

Peter LaFosse, Rusty Wagner

  • https://binaryninjaslack.herokuapp.com
  • binaryninja@vector35.com

77