Dagger: Decompiling to IR. Ahmed Bougacha, with Geoffroy Aubey, Pierre Collet, Thomas Coudray, Jonathan Salwan, Amaury de la Vieuville


slide-1
SLIDE 1

Decompiling to IR

Dagger

Ahmed Bougacha

with Geoffroy Aubey, Pierre Collet, Thomas Coudray, Jonathan Salwan, Amaury de la Vieuville

slide-2
SLIDE 2

Semantics?
The decompilation process
Use cases & tools

slide-3
SLIDE 3

Semantics

Binary > IR

slide-4
SLIDE 4

x86:
    add rax, 15
    sub [rbx + 8], rax

slide-5
SLIDE 5

x86:
    add rax, 15

IR:
    %rax2 = add i64 %rax1, 15

slide-10
SLIDE 10

x86:
    sub [rbx + 8], rax

IR:
    %1 = add i64 %rbx1, 8
    %2 = inttoptr i64 %1 to i64*
    %3 = load i64* %2
    %4 = sub i64 %3, %rax2
    store i64 %4, i64* %2

slide-14
SLIDE 14

x86:
    add rax, 15
    sub [rbx + 8], rax

IR:
    %rax2 = add i64 %rax1, 15
    %1 = add i64 %rbx1, 8
    %2 = inttoptr i64 %1 to i64*
    %3 = load i64* %2
    %4 = sub i64 %3, %rax2
    store i64 %4, i64* %2
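
The renaming visible above (each register write produces a fresh name such as %rax2, and later reads pick up the latest one) can be sketched in a few lines of Python. This is only an illustration of the idea, not Dagger's implementation: the `Lifter` class, its method names, and the tiny assembly parser are all hypothetical, and only the two instruction shapes from the slides are handled.

```python
import re

class Lifter:
    """Toy lifter: every register write creates a new SSA name (rax1 -> rax2)."""

    def __init__(self):
        self.version = {}  # register name -> current SSA version
        self.tmp = 0       # counter for unnamed temporaries (%1, %2, ...)
        self.ir = []       # emitted IR lines

    def read(self, reg):
        # The first read of a register refers to its incoming value, version 1.
        return f"%{reg}{self.version.setdefault(reg, 1)}"

    def write(self, reg):
        self.version[reg] = self.version.get(reg, 1) + 1
        return f"%{reg}{self.version[reg]}"

    def temp(self):
        self.tmp += 1
        return f"%{self.tmp}"

    def lift(self, asm):
        op, dst, src = re.fullmatch(r"(\w+) (.+), (\w+)", asm).groups()
        src = src if src.isdigit() else self.read(src)
        mem = re.fullmatch(r"\[(\w+) \+ (\d+)\]", dst)
        if mem:  # read-modify-write on a memory operand
            addr, ptr, old, new = (self.temp() for _ in range(4))
            self.ir += [
                f"{addr} = add i64 {self.read(mem[1])}, {mem[2]}",
                f"{ptr} = inttoptr i64 {addr} to i64*",
                f"{old} = load i64* {ptr}",
                f"{new} = {op} i64 {old}, {src}",
                f"store i64 {new}, i64* {ptr}",
            ]
        else:    # register destination: read the old value, then rename
            lhs = self.read(dst)
            self.ir.append(f"{self.write(dst)} = {op} i64 {lhs}, {src}")

lifter = Lifter()
lifter.lift("add rax, 15")
lifter.lift("sub [rbx + 8], rax")
print("\n".join(lifter.ir))
```

Run on the two instructions from the slide, the sketch prints the same six IR lines shown above.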

slide-15
SLIDE 15

Dozens of SUBs (x86):

    ...
    sub reg32, reg32    // SUB32rr
    sub mem32, reg32    // SUB32mr
    sub reg32, imm32    // SUB32ri
    sub reg64, reg64    // SUB64rr
    ...

slide-16
SLIDE 16

Dozens of SUBs (x86):

    ...
    sub reg32, reg32
    sub mem32, reg32
    sub reg32, imm32
    sub reg64, reg64
    ...

IR:

    %dst = sub iXX %src1, %src2

slide-17
SLIDE 17

Dening Semantics

Binary > Mir > IR

slide-18
SLIDE 18

def SUB : InstructionSemantics<[
  (set vop0, (sub vop1, vop2))
]>;

slide-19
SLIDE 19

def : OpcodesSemantics<
  SUB,
  [SUB32ri, SUB32mr, SUB32rr, ...]
>;
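
The point of sharing one semantics definition across many opcodes can be illustrated with a small lookup table. The Python below is a hedged sketch: the `OPCODES` table and `ir_for` helper are hypothetical names, and the bit widths are simply read off the opcode names listed on the earlier slide.

```python
# Hypothetical table: opcode name -> (shared semantics, operand width in bits).
OPCODES = {
    "SUB32rr": ("sub", 32),
    "SUB32mr": ("sub", 32),
    "SUB32ri": ("sub", 32),
    "SUB64rr": ("sub", 64),
}

def ir_for(opcode, dst, src1, src2):
    """Every SUB variant produces the same IR shape; only the width differs."""
    semantics, bits = OPCODES[opcode]
    return f"{dst} = {semantics} i{bits} {src1}, {src2}"

print(ir_for("SUB32ri", "%dst", "%src1", "%src2"))
print(ir_for("SUB64rr", "%dst", "%src1", "%src2"))
```

How each variant obtains %src1 and %src2 (register, memory, immediate) is a separate question, which is what the operand slides that follow address.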

slide-20
SLIDE 20

TableGen Operands

    def GR32        // RegisterClass
    ...
    def i32mem      // Operand
    ...
    def SUB32mr {   // Instruction
      ...
      dag OutOperandList = (outs);
      dag InOperandList = (ins i32mem:$dst, GR32:$src);
      ...

slide-21
SLIDE 21

sub [ebx + 8], eax

    ## <MCInst #2562 SUB32mr
    ##  <MCOperand Reg:45>
    ##  <MCOperand Imm:1>
    ##  <MCOperand Reg:0>
    ##  <MCOperand Imm:8>
    ##  <MCOperand Reg:0>
    ##  <MCOperand Reg:43>>

MC Operands
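
The mismatch between the two views is that TableGen declares two operands for SUB32mr (i32mem, GR32) while the MCInst carries six flat operands. A minimal Python sketch of regrouping the flat list follows. The field order (base, scale, index, displacement, segment) is my understanding of LLVM's x86 memory operand convention and should be treated as an assumption, as should the `OPERAND_KINDS`, `WIDTH`, and `group_operands` names.

```python
from typing import NamedTuple

class MemOperand(NamedTuple):
    # Assumed order of the five MC operands making up an x86 memory reference.
    base: int
    scale: int
    index: int
    disp: int
    segment: int

# Operand kinds per opcode, mirroring the TableGen (ins ...) list.
OPERAND_KINDS = {"SUB32mr": ["i32mem", "GR32"]}
# Number of flat MC operands each kind consumes.
WIDTH = {"i32mem": 5, "GR32": 1}

def group_operands(opcode, flat):
    """Regroup a flat MC operand list into one entry per TableGen operand."""
    grouped, pos = [], 0
    for kind in OPERAND_KINDS[opcode]:
        chunk = flat[pos:pos + WIDTH[kind]]
        grouped.append(MemOperand(*chunk) if kind == "i32mem" else chunk[0])
        pos += WIDTH[kind]
    return grouped

# Values taken from the MCInst dump above.
print(group_operands("SUB32mr", [45, 1, 0, 8, 0, 43]))
```

With the dump above, the memory operand becomes base register 45, scale 1, no index, displacement 8, and the source register is 43.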

slide-22
SLIDE 22

Virtual Operands

slide-23
SLIDE 23

Virtual Operands

Input:
Register class: get the register value
Operand: look for OperandMapping

slide-24
SLIDE 24

Virtual Operands

Output:
Register class: put the value in the register
Operand: look for OperandMapping

slide-25
SLIDE 25

def : OperandMapping<
  GR32,
  /* In  */ (get mc_op0),
  /* Out */ (put mc_op0, result)
>;

Operand Mapping: Register Classes

slide-26
SLIDE 26

def : OperandMapping<
  imm32,
  /* In  */ (mov mc_op0),
  /* Out */ ()
>;

Operand Mapping: Immediates

slide-27
SLIDE 27

// base + index * scale + offset
// op0 + op1 * op2 + op3
def BISO : SemaFrag<
  (add mc_op0, (add mc_op3, (mul mc_op1, mc_op2)))
>;

Operand Mapping: Custom Operands

slide-28
SLIDE 28

def : OperandMapping<
  i32mem,
  /* In  */ (load (BISO)),
  /* Out */ (store (BISO), result)
>;

Operand Mapping: Custom Operands
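
To make the memory operand mapping concrete, here is a hedged Python model of it: `biso` computes the address with the same nesting as the SemaFrag, the "In" side loads from that address, and the "Out" side stores the result back. The function names, the dictionary standing in for memory, and the 32-bit wrap-around are assumptions made for the sketch; operands are passed as already-resolved values rather than register numbers.

```python
MASK32 = 0xFFFFFFFF

def biso(op0, op1, op2, op3):
    # Same nesting as the SemaFrag: op0 + (op3 + op1 * op2), wrapped to 32 bits.
    return (op0 + (op3 + op1 * op2)) & MASK32

def i32mem_in(memory, ops):
    """In mapping: (load (BISO))."""
    return memory[biso(*ops)]

def i32mem_out(memory, ops, result):
    """Out mapping: (store (BISO), result)."""
    memory[biso(*ops)] = result & MASK32

def sub32mr(memory, ops, src):
    """sub [mem], reg: read through the In mapping, write through the Out mapping."""
    i32mem_out(memory, ops, i32mem_in(memory, ops) - src)

# sub [ebx + 8], eax with ebx = 0x1000 and eax = 8 (values chosen for illustration).
memory = {0x1008: 50}
sub32mr(memory, (0x1000, 1, 0, 8), 8)
print(memory[0x1008])
```

The same address expression is evaluated on both the load and the store side, which matches the earlier IR where a single inttoptr result feeds both the load and the store.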

slide-30
SLIDE 30

Virtual Operand Expansion

(sub vop1, vop2)

SUB32mr:
    (sub (load (add ..)), (get mc_op5))

SUB32ri:
    (sub (get mc_op0), (mov mc_op1))

slide-31
SLIDE 31

Virtual Operand Expansion

(sub vop1, vop2)

SUB32ri, untyped expression tree:
    (sub (get mc_op0), (mov mc_op1))

SUB32ri, typed instruction list:
    %0 = get32 mcop0
    %1 = mov32 mcop1
    %r = sub32 %0, %1

slide-32
SLIDE 32

Mir

Binary > Mir > IR

slide-33
SLIDE 33

Mir: Target registers

    get %td0, 4
    ...
    put 4, %td3

slide-34
SLIDE 34

Mir: Advance

x86:
    9: 81 c3 d2 04 00 00    add ebx, 1234

Mir:
    advance @9
    get %td0, EBX
    mov %td1, 1234
    add %td2, %td0, %td1
    put EBX, %td2
    advance +6
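
One way to read this listing is as a tiny register-transfer program bracketed by address bookkeeping: `advance @9` pins the current address and `advance +6` skips the six bytes of the decoded instruction. The Python interpreter below is a sketch under that reading. The tuple encoding of Mir and the `run` function are hypothetical, and only the opcodes used on this slide are handled.

```python
def run(mir, regs):
    """Execute a straight-line Mir listing; return the final address."""
    tmp, pc = {}, None
    for op, *args in mir:
        if op == "advance":
            # "@9" sets the address, "+6" moves past the decoded instruction.
            pc = int(args[0][1:]) if args[0][0] == "@" else pc + int(args[0])
        elif op == "get":   # read a target register into a temporary
            tmp[args[0]] = regs[args[1]]
        elif op == "mov":   # materialize an immediate
            tmp[args[0]] = args[1]
        elif op == "add":   # 32-bit wrapping add on temporaries
            tmp[args[0]] = (tmp[args[1]] + tmp[args[2]]) & 0xFFFFFFFF
        elif op == "put":   # write a temporary back to a target register
            regs[args[0]] = tmp[args[1]]
    return pc

mir = [
    ("advance", "@9"),
    ("get", "%td0", "EBX"),
    ("mov", "%td1", 1234),
    ("add", "%td2", "%td0", "%td1"),
    ("put", "EBX", "%td2"),
    ("advance", "+6"),
]
regs = {"EBX": 1}
print(run(mir, regs), regs["EBX"])
```

Starting from EBX = 1, the listing leaves EBX at 1235 and the address at 15, the byte following the six-byte instruction at address 9.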

slide-35
SLIDE 35

IR

Binary > Mir > IR

slide-43
SLIDE 43

Generating IR

x86:
    sub ebx, ecx
    add ebx, 12

Mir:
    ...
    sub %td2, ...
    put EBX, %td2
    get %td0, EBX
    mov %td1, 12
    add %td2, %td0, %td1
    put EBX, %td2

IR:
    ...
    %ebx2 = sub i32 ...
    %ebx3 = add i32 %ebx2, 12

slide-44
SLIDE 44

Generating Branches

x86:
    22: 48 83 c1 08    add rcx, 8
    ...
    xx: xx xx xx xx    jmp 22

Mir:
    advance @22
    get %tq0, RCX
    mov %tq1, 8
    add %tq2, %tq0, %tq1
    put RCX, %tq2
    advance +4
    ...
    jmp 22

IR:
    I22:
      %rcx2 = add i64 %rcx1, 8
      br label %I22

slide-45
SLIDE 45

Generating Indirect Branches

x86:
    22: 48 83 c1 08    add rcx, 8
    26: 83 eb 03       sub ebx, 3

IR:
    JumpTable:
      %p = phi ...
      switch i64 %p, label %fail [i64 22, label %I22
                                  i64 26, label %I26]
    I22:
      %rcx2 = add i64 %rcx1, 8
    I26:
      %ebx2 = sub i32 %ebx1, 3

slide-46
SLIDE 46

Generating Predicated Instructions

ARM:
    addge r7, r5, #1

Mir:
    get %td0, R5
    mov %td1, 1
    add %td2, %td0, %td1
    select %td3, xx, %td2, %td0
    put R7, %td3

IR:
    %1 = add i32 %r5_1, 1
    %r7_2 = select xx, i32 %1, %r5_1

slide-47
SLIDE 47

Generating Condition Codes

x86:
    22: 48 83 c1 08    add rcx, 8
    26: xx xx xx xx    jne 22

Mir:
    advance @22
    get %tq0, RCX
    mov %tq1, 8
    add %tq2, %tq0, %tq1
    ...
    cmpne %f3, %tq2
    ...
    put RCX, %tq2
    advance +4
    jmpne 22

IR:
    I22:
      %rcx2 = add i64 %rcx1, 8
      %ne2 = icmp ne i64 %rcx2, 0
      br i1 %ne2, label %I22

slide-48
SLIDE 48

Using the IR

IR > ?

slide-51
SLIDE 51

Binary Rewriting
Missing semantics ➔ Inline assembly
Data sections ➔ Map it all

slide-52
SLIDE 52

Static Binary Translation

slide-55
SLIDE 55

Dynamic Binary Translation
Self-altering code ➔ Mark read/execute
Code discovery ➔ Per-BB translation
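
Per-basic-block translation sidesteps static code discovery: a block is translated only when execution actually reaches its address, and the result is cached. The Python loop below is a generic sketch of that idea and not Dagger's API. `run_dbt`, the `translate` callback, and the convention that a translated block returns the next address (or None to stop) are all assumptions.

```python
def run_dbt(entry, translate, cache=None):
    """Translate basic blocks lazily: only addresses that are actually reached."""
    cache = {} if cache is None else cache
    pc = entry
    while pc is not None:
        if pc not in cache:
            cache[pc] = translate(pc)  # per-basic-block translation, done once
        pc = cache[pc]()               # run the block; it returns the next address
    return cache

# Hypothetical program: block 0 loops twice, then jumps to 8; block 4 is never reached.
state = {"count": 0}

def block0():
    state["count"] += 1
    return 0 if state["count"] < 2 else 8

blocks = {0: block0, 4: lambda: 0, 8: lambda: None}
translated = []

def translate(addr):
    translated.append(addr)
    return blocks[addr]

run_dbt(0, translate)
print(translated)
```

Block 0 runs twice but is translated once, and the unreachable block at address 4 is never translated.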

slide-56
SLIDE 56

Dynamic Binary Instrumentation

slide-57
SLIDE 57

Binary Analysis

slide-60
SLIDE 60

Simulation
Missing semantics ➔ Runtime library
Cycle accuracy ➔ Machine Model?

slide-64
SLIDE 64

To-source Decompilation
C source output ➔ C Backend!
IR “highering” ➔ Optimizations
Lack of accuracy ➔ Metadata

slide-69
SLIDE 69

Going forward
Merging semantics with SD patterns?
Removing the Mir backend
Analyses & Highering
Tools!

slide-70
SLIDE 70

http://dagger.repzret.org

Questions?