Be a Binary Rockst r An Introduction to Program Analysis with - - PowerPoint PPT Presentation
Be a Binary Rockst r An Introduction to Program Analysis with - - PowerPoint PPT Presentation
Be a Binary Rockst r An Introduction to Program Analysis with Binary Ninja Agenda Motivation Current State of Program Analysis Design Goals of Binja Program Analysis Building Tools 2 Motivation 3 Tooling - Concrete ->
Agenda
- Motivation
- Current State of Program Analysis
- Design Goals of Binja Program
Analysis
- Building Tools
2
Motivation
3
- Tooling - Concrete -> Symbolic
○ Increase speed & effectiveness of RE / VR
- Make Program Analysis more accessible &
useful
4
Foundations
- Need to understand code semantics
- Could be done directly on the assembly
- An Intermediate Language (IL) is needed
5
Why IL?
- Architecture Abstraction
- Smaller number of instructions
6
Easy to lift
- Simple flags calculation
- As close to native instructions as possible
- Typeless - types inferred later
7
Easy to read
- Intuitive to read
- Tree-based infix notation
- No register abstraction
- Flags calculation only when necessary
- Avoid excessive temporaries
8
IL Instruction Set Size
Instruction Set Size
Easier to analyze Easier to lift
9
The Options
10
Existing Options for IL
- BAP
- VEX
- REIL
- LLVM
- IDA
11
BAP
- Tree-tree based :)
- Flags are explicit and inhibit readability :(
- Written in OCAML :(
12
add ebx, eax shl ebx, cl addr 0x0 @asm ”add %eax,%ebx” t:u32 = REBX:u32 REBX:u32 = REBX:u32 + REAX:u32 RCF:bool = REBX:u32 < t:u32 addr 0x2 @asm ”shl %cl,%ebx” t1:u32 = REBX:u32 >> 0x20:u32 − (RECX:u32 & 0x1f:u32) RCF:bool = ((RECX:u32 & 0x1f:u32) = 0:u32) & RCF:bool | ̃((RECX:u32 & 0x1f:u32) = 0:u32) & low:bool(t1:u32)
13
VEX
- Register names are abstracted :(
- Single assignment :(
- Over 1000 instructions! :(
- Yet they call it “RISC-like”
- Even Angr is planning a move away from it
14
t0 = GET:I32(16) t1 = 0x8:I32 t3 = Sub32(t0,t1) PUT(16) = t3 PUT(68) = 0x59FC8:I32 subs R2, R2, #8
15
REIL
- Tiny instruction set
- Horrible readability
- Makes abstractions nearly impossible
- Flags are explicit and inhibit readability :(
16
00000000.00 STR R_EAX:32, , V_00:32 00000000.01 STR 0:1, , R_CF:1 00000000.02 AND V_00:32, ff:8, V_01:8 00000000.03 SHR V_01:8, 7:8, V_02:8 00000000.04 SHR V_01:8, 6:8, V_03:8 00000000.05 XOR V_02:8, V_03:8, V_04:8 00000000.06 SHR V_01:8, 5:8, V_05:8 00000000.07 SHR V_01:8, 4:8, V_06:8 00000000.08 XOR V_05:8, V_06:8, V_07:8 00000000.09 XOR V_04:8, V_07:8, V_08:8 00000000.0a SHR V_01:8, 3:8, V_09:8 00000000.0b SHR V_01:8, 2:8, V_10:8 00000000.0c XOR V_09:8, V_10:8, V_11:8 00000000.0d SHR V_01:8, 1:8, V_12:8 00000000.0e XOR V_12:8, V_01:8, V_13:8 00000000.0f XOR V_11:8, V_13:8, V_14:8 00000000.10 XOR V_08:8, V_14:8, V_15:8 00000000.11 AND V_15:8, 1:1, V_16:1 00000000.12 NOT V_16:1, , R_PF:1 00000000.13 STR 0:1, , R_AF:1 00000000.14 EQ V_00:32, 0:32, R_ZF:1 00000000.15 SHR V_00:32, 1f:32, V_17:32 00000000.16 AND 1:32, V_17:32, V_18:32 00000000.17 EQ 1:32, V_18:32, R_SF:1 00000000.18 STR 0:1, , R_OF:1
test eax, eax
17
LLVM
- Easy to analyze and has great tools already
available
- It’s a compiler!
○ Reversers want a decompiler. ○ Cannot be the only goal
18
LLVM Challenges
- Hard to lift well from compiled binaries
○ Designed for compiler output
- Expects type information in the instructions
- SSA form - assembly is not
- Stack in assembly looks like a structure, but
structures lose many advantages of SSA
19
IDA
?
20
Binary Ninja’s Answer
- Binary Ninja Intermediate Language (BNIL)
21
IL Goals & Design
22
Why Another IL?
- Popular existing ILs for compiled binaries are not very
human readable. They are extremely low level and verbose.
- Existing ILs are single stage. Heavyweight analysis must
be performed to get anywhere close to decompiled output.
- Writing a lifter for a new architecture is usually very time
consuming.
23
Binary Ninja IL
- Create a family of ILs with multiple stages of analysis
- Lowest level is close to assembly
- After analysis and transformations, higher levels are
closer to decompiled output and would be much easier to translate to good LLVM code
- Analysis involved in each transformation is easy to
understand, fast, and directly aids further analysis
24
IL Design Goals
- Human readable
- Computer understandable (SSA, 3AF, etc.)
- Plugin understandable
- Easy to lift native architectures
- Translation to other ILs such as LLVM
25
Human Readable
- Reads like pseudocode, even in lowest level
form
- Flags are resolved into readable expressions
26
Low Level IL Example
lea rax, [0x201047] lea rdi, [0x201040] push rbp sub rax, rdi mov rbp, rsp cmp rax, 0xe ja 0x68d rax = 0x201047 rdi = 0x201040 push(rbp) rax = rax - rdi rbp = rsp if (rax u> 0xe) then 6 @ 0x68d else 8 @ 0x68b
x86-64 Assembly Low Level IL
27
Low Level IL Example
addiu $sp, $sp, -0x18 sw $ra, 0x14($sp) lw $a0, ($a1) jal atoi nop sltiu $at, $v0, 0x20 beqz $at, 0x4002d8 nop $sp = $sp - 0x18 [$sp + 0x14].d = $ra $a0 = [$a1].d call(atoi) $at = $v0 u< 0x20 ? 1 : 0 if ($at == 0) then 7 @ 0x4002d8 else 12 @ 0x400290
MIPS Assembly Low Level IL
28
Computer Understandable
- Multiple IL forms
- Pick the right IL for the task at hand
29
IL Forms
Lifted IL
ASM -> IL
Low Level IL
Flags use resolved
Medium Level IL
Stack usage resolved Type propagation
SSA / 3AF SSA / 3AF
High Level IL
Calls in high level form Expression folding Like decompiled output
30
Plugin Understandable
- All IL forms directly accessible from API
- Analysis performed on IL also accessible by
API
31
Easy to Lift
- Expression tree
- Designed for quick, modular lifter
implementations
- Semantic flags eases the burden of describing
flag effects during lifting
32
Semantic Flags
- Architecture plugins define the set of flags and
their semantic roles
- Instructions can define a set of flags they write
- Data flow analysis is performed to link flag
uses to flag writes
33
Semantic Flags
- In most compiled code, flags are resolved to
simple comparison expressions with no effort from the architecture plugin
- Special cases fall back to emitting concrete
flag write expressions
34
Semantic Flags Example
sub.q{*}(rax, 0xe) if (u>) then … else … if (rax u> 0xe) then … else … “Writes to all ALU flags” “Flag state representing unsigned greater than” Folded expression describing use of flags
35
Translating Upwards
- Semantic flags analysis gives Low Level IL with flag
usage fully resolved
- Stack is represented as memory accesses, so data flow
can be difficult to compute on stack variables in Low Level IL
- Need to analyze and translate to Medium Level IL
36
Low Level IL to Medium Level IL
- Low Level IL is translated to SSA form
- Use implicit data flow from SSA to resolve stack layout
- Data flow based stack layout resolution avoids problems
with nonstandard frame pointer behavior
- Translate loads and stores on stack to stack variable uses
and assignments
37
Medium Level IL Example
push(ebp) ebp = esp esp = esp - 0x18 eax = [ebp + 8].d [esp].d = eax call(free) esp = ebp ebp = pop <return> jump(pop) var_4 = ebp eax = arg_4 var_1c = eax free(var_1c) ebp = var_4 return Medium Level IL
38
Medium Level IL
- Registers and stack usage are now both treated as
variables
- Stack variables no longer use explicit memory access
- Translate to SSA form to obtain implicit data flow on both
registers and stack variables
- Type propagation is performed on SSA form
39
Using Medium Level IL - Jump Tables
40
Using Medium Level IL - Jump Tables
- Jump table resolution based on path-sensitive data flow
- SSA conversion process also tracks control flow
dependence for every block
- Data flow computations allow disjoint sets of possible
values
- Reads from memory are simulated
- At jump site, possible values are the possible jump targets
41
Jump Table Example
x8#1 = zx.q(x0#2.d) if (x0#2.d u> 0x1f) then … else … … x8#2 = sx.q([table + (x8#1 << 2)].d) x8#3 = x8#2 + table jump(x8#3) Solve for this to get jump targets
Medium Level IL SSA Form
42
Jump Table Example
x8#1 = zx.q(x0#2.d) if (x0#2.d u> 0x1f) then … else … … x8#2 = sx.q([table + (x8#1 << 2)].d) x8#3 = x8#2 + table jump(x8#3) Track flow backwards with SSA to find definitions
43
Jump Table Example
x8#1 = zx.q(x0#2.d) if (x0#2.d u> 0x1f) then … else … … x8#2 = sx.q([table + (x8#1 << 2)].d) x8#3 = x8#2 + table jump(x8#3) Memory read depends on value of x8#1
44
Jump Table Example
x8#1 = zx.q(x0#2.d) if (x0#2.d u> 0x1f) then … else … … x8#2 = sx.q([table + (x8#1 << 2)].d) x8#3 = x8#2 + table jump(x8#3) Value used in branch comparison
45
Jump Table Example
x8#1 = zx.q(x0#2.d) if (x0#2.d u> 0x1f) then … else … … x8#2 = sx.q([table + (x8#1 << 2)].d) x8#3 = x8#2 + table jump(x8#3) Branch condition must be false to reach jump site
46
Jump Table Example
x8#1 = zx.q(x0#2.d) if (x0#2.d u> 0x1f) then … else … … x8#2 = sx.q([table + (x8#1 << 2)].d) x8#3 = x8#2 + table jump(x8#3) When false, we know that x0#2.d is between 0 and 0x1f inclusive
47
Jump Table Example
x8#1 = zx.q(x0#2.d) if (x0#2.d u> 0x1f) then … else … … x8#2 = sx.q([table + (x8#1 << 2)].d) x8#3 = x8#2 + table jump(x8#3) Resolve forward to obtain possible jump targets Set of possible values here are the jump targets
48
Using Medium Level IL - Jump Tables
- More complex idioms need to combine multiple sources of
information
- Value through SSA ϕ-functions is the set union of the
inputs
- Value of a specific SSA variable is the set intersection of
the information found in the definition and all uses of the variable
49
Leveraging the Jump Table Algorithm
- Single jump table algorithm works on all architectures with
no additional effort from architecture plugin
- Control flow dependence information accessible from API
- Queries for set of possible values accessible from API
50
The Final Forms
- Medium Level IL has the type information, stack
knowledge, and SSA form to translate easily into LLVM IR
- Further analysis can be performed to translate to High
Level IL, the Binary Ninja IL that will be used to create its decompiler
- All aspects of every IL form are plugin accessible, so
translating to other representations is straightforward
51
Binary Ninja for Profit
52
Binja API
- Python, C and C++ API’s
(headless)
- Branches: Basic block/ Function edges
(incoming & outgoing)
- Get the register states, some naive range
analysis
- api.binary.ninja/search.html
53
binja_memcpy.py: IL /bin/bash
54
binja_memcpy.py: IL /bin/bash
55
binja_memcpy.py: API
56
binja_memcpy.py: API
57
binja_memcpy.py: API
58
binja_memcpy.py: API
59
binja_memcpy.py: Output
60
SSA: Uninitialized variable
for func in bv.functions: for block in func.medium_level_il.ssa_form: for instr in block: visit_instr(instr)
61
SSA: Uninitialized variable
def visit_instr(inst): # Read of variable if inst.operation == MLIL_VAR_SSA: # Not written if inst.index == 0: if inst.src.type == StackVariableSourceType: # Local variables if inst.src.identifier < 0: print ("Uninitialized stack variable reference at " + hex(inst.address))
62
SSA: Uninitialized variable
else: for op in instr.operands: if isinstance(op, MediumLevelILInstruction): visit_instr(op)
63
Symbolic Execution
- Very accurate
- Takes time, data, and memory, often not feasible
- IDEA! Reasoning only about what we care about.
- Apply complex data to abstract domains !
- Domains: type, sign, range, color etc….
64
Abstract Interpretation
- Sets of
concrete values are abstracted imprecisely
- Galois
Connection formalizes Concrete <-> Abstract
65
- X ‘s value is imprecise
- Compilers perform imprecise
abstraction
int x; int[] a = new int[10]; a[2 * x] = 3;
- 1. Add precision - i.e. declare
abstract value [0, 9] 1.
- 2. Symbolically execute with
abstract domain/ values
- Requires control-flow analysis
Abstract Interpretation
66
Abstract Domains & Sign Analysis
int a,b,c; a = 42; b = 87; if (input) { c = a + b; } else { c = a - b; }
- Map variables to an
abstract value
67
Abstract Domains & Sign Analysis
- Binary Ninja plugin
- Path sensitive - construct lattices of
abstract values
- Under approximate
- One abstract state per CFG node
- Avoid loss in precision for fractions.
68
Demo!
- Analyze
example program
- PHP
CVE-2016-6289
69
UAF Analysis: PointsTo for Binja IL
blog.trailofbits.com/2016/03/09/the-problem-with-dynamic-program-analysis/
- Before: Allocation -> Write
- UAF Analysis: Allocation -> Free -> Use
- Key Idea: Data flow graph, assignments, copies, dereferences,
and frees of pointers
- Context and path sensitive (path API == soon!).
70
Devirtualizing C++
- VTable Function Call
- Example: mov eax, [ecx]; call [eax + 4]
https://blog.trailofbits.com/2017/02/13/devirtualizing-c-with-binary-ninja/
71
Devirtualizing C++
72
Devirtualizing C++
73
Playing with Scripts!
- memcpy, headless python
API script
- Depth-first-search, path
sensitive CFG template
- Sign analysis, abstract domain
plugin, CFG traversal script
https://github.com/ trailofbits/binjascripts
- And much much more ….
74
Conclusion: Resources
- binary.ninja/
- Abstract Interpretation talk:
santos.cs.ksu.edu/schmidt/Escuela03/WSSA/talk1p.pdf
- Static Program Analysis Book!
cs.au.dk/~amoeller/spa/spa.pdf
75
Conclusion: Binary Ninja
76
Contact Us
Sophia d’Antoine
- IRC/Slack: @quend
- sophia@trailofbits.com
Peter LaFosse Rusty Wagner
- https://binaryninjaslack.herokuapp.com
- binaryninja@vector35.com
77