Dagger Decompiling to IR Ahmed Bougacha with Geoffroy Aubey, Pierre Collet, Thomas Coudray, Jonathan Salwan, Amaury de la Vieuville

Semantics ? The decompilation process Use cases & tools

Semantics Binary > IR

x86 > IR

add rax, 15
    %rax2 = add i64 %rax1, 15

sub [rbx + 8], rax
    %1 = add i64 %rbx1, 8
    %2 = inttoptr i64 %1 to i64*
    %3 = load i64* %2
    %4 = sub i64 %3, %rax2
    store i64 %4, i64* %2
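To make the decomposition concrete, here is a minimal Python sketch that executes the two instructions one IR step at a time. It is not part of Dagger. The register dictionary, the word-addressed `mem` dictionary, and the function name `run` are assumptions made for illustration only.

```python
MASK64 = (1 << 64) - 1

def run(regs, mem):
    """Execute `add rax, 15; sub [rbx + 8], rax`, mirroring the IR above.

    regs: dict of register name -> 64-bit value (hypothetical model)
    mem:  dict of address -> 64-bit word (hypothetical model)
    """
    # add rax, 15          ->  %rax2 = add i64 %rax1, 15
    rax2 = (regs["rax"] + 15) & MASK64

    # sub [rbx + 8], rax
    addr = (regs["rbx"] + 8) & MASK64      # %1 = add i64 %rbx1, 8 (then inttoptr)
    loaded = mem.get(addr, 0)              # %3 = load i64* %2
    result = (loaded - rax2) & MASK64      # %4 = sub i64 %3, %rax2

    new_mem = dict(mem)
    new_mem[addr] = result                 # store i64 %4, i64* %2
    new_regs = dict(regs, rax=rax2)
    return new_regs, new_mem
```

The memory operand costs three extra steps (address computation, load, store) around the same `sub` that a register-only form would need on its own.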

Dozens of SUBs
x86:
    ...
    sub reg32, reg32   // SUB32rr
    sub mem32, reg32   // SUB32mr
    sub reg32, imm32   // SUB32ri
    sub reg64, reg64   // SUB64rr
    ...
IR:
    %dst = sub iXX %src1, %src2

Defining Semantics Binary > Mir > IR

def SUB : InstructionSemantics<[ (set vop0, (sub vop1, vop2)) ]>;

def : OpcodesSemantics< SUB, [SUB32ri, SUB32mr, SUB32rr, ...] >;
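The two definitions above amount to a many-to-one table from opcodes to a shared semantics pattern. A rough Python sketch of that idea follows. The dictionary names and the tuple encoding of the pattern are assumptions for illustration, not Dagger's actual data structures.

```python
# One semantics pattern, written as a nested tuple (hypothetical encoding):
# (set vop0, (sub vop1, vop2))
SEMANTICS = {
    "SUB": ("set", "vop0", ("sub", "vop1", "vop2")),
}

# Filled in by opcodes_semantics(): opcode name -> shared pattern.
OPCODE_TO_SEMANTICS = {}

def opcodes_semantics(semantics_name, opcodes):
    """Attach one semantics pattern to every opcode in the list."""
    pattern = SEMANTICS[semantics_name]
    for opcode in opcodes:
        OPCODE_TO_SEMANTICS[opcode] = pattern

opcodes_semantics("SUB", ["SUB32ri", "SUB32mr", "SUB32rr"])
```

Every SUB variant now resolves to the same pattern. What differs per opcode is only how `vop0`, `vop1`, and `vop2` are bound to real operands.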

TableGen Operands
    def GR32      // RegisterClass
    ...
    def i32mem    // Operand
    ...
    def SUB32mr { // Instruction
      ...
      dag OutOperandList = (outs);
      dag InOperandList = (ins i32mem:$dst, GR32:$src);
      ...

MC Operands
sub [ebx + 8], eax
    ## <MCInst #2562 SUB32mr
    ##  <MCOperand Reg:45>
    ##  <MCOperand Imm:1>
    ##  <MCOperand Reg:0>
    ##  <MCOperand Imm:8>
    ##  <MCOperand Reg:0>
    ##  <MCOperand Reg:43>>

Virtual Operands
Input
    Register class: get the register value
    Operand: look for OperandMapping
Output
    Register class: put the value in the register
    Operand: look for OperandMapping

Operand Mapping: Register Classes def : OperandMapping< GR32, /* In */ (get mc_op0), /* Out */ (put mc_op0, result) >;

Operand Mapping: Immediates def : OperandMapping< imm32, /* In */ (mov mc_op0), /* Out */ () >;

Operand Mapping: Custom Operands
    // base + index * scale + offset
    // op0 + op1 * op2 + op3
    def BISO : SemaFrag<
      (add mc_op0, (add mc_op3, (mul mc_op1, mc_op2)))
    >;

Operand Mapping: Custom Operands def : OperandMapping< i32mem, /* In */ (load (BISO)), /* Out */ (store (BISO), result) >;
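The BISO fragment and the `i32mem` mapping can be sketched in Python as an address computation wrapped by a load and a store. This is an illustrative model only. It assumes the four operand values have already been resolved to integers (with an absent register reading as 0), 32-bit wraparound, and a dictionary standing in for memory.

```python
MASK32 = 0xFFFFFFFF

def biso(op0, op1, op2, op3):
    """(add mc_op0, (add mc_op3, (mul mc_op1, mc_op2))), truncated to 32 bits."""
    return (op0 + (op3 + op1 * op2)) & MASK32

def load_i32mem(mem, ops):
    """'In' side of the mapping: (load (BISO))."""
    return mem[biso(*ops)]

def store_i32mem(mem, ops, result):
    """'Out' side of the mapping: (store (BISO), result)."""
    mem[biso(*ops)] = result & MASK32
```

For `sub [ebx + 8], eax` with `ebx = 0x1000`, the resolved operands would be `(0x1000, 1, 0, 8)`. Because `op1` and `op2` are only multiplied together, their order does not affect the resulting address.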

Virtual Operand Expansion
(sub vop1, vop2)
    SUB32mr: (sub (load (add ..)), (get mc_op5))
    SUB32ri: (sub (get mc_op0), (mov mc_op1))

SUB32ri, untyped expression tree:
    (sub (get mc_op0), (mov mc_op1))
SUB32ri, typed instruction list:
    %0 = get32 mcop0
    %1 = mov32 mcop1
    %r = sub32 %0, %1
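The two steps on this slide, substituting virtual operands and then flattening the tree into typed instructions, can be approximated with the short Python sketch below. The tuple encoding and the function names `expand` and `linearize` are assumptions for illustration. The sketch numbers every result (`%2` where the slide writes `%r`).

```python
from itertools import count

def expand(pattern, bindings):
    """Replace virtual operands (vop1, vop2, ...) with their operand-mapping trees."""
    if isinstance(pattern, str):
        return bindings.get(pattern, pattern)
    return tuple(expand(p, bindings) for p in pattern)

def linearize(tree, width):
    """Flatten a nested (op, arg, ...) tuple into a typed instruction list."""
    out = []
    numbers = count()

    def visit(node):
        if isinstance(node, str):
            return node                      # leaf: an MC operand such as "mcop0"
        op, *args = node
        values = [visit(a) for a in args]    # operands first (post-order)
        name = f"%{next(numbers)}"
        out.append(f"{name} = {op}{width} " + ", ".join(values))
        return name

    visit(tree)
    return out

# SUB32ri: vop1 is a GR32 register (get), vop2 is an imm32 (mov).
tree = expand(("sub", "vop1", "vop2"),
              {"vop1": ("get", "mcop0"), "vop2": ("mov", "mcop1")})
instructions = linearize(tree, 32)
```

Here `tree` is `("sub", ("get", "mcop0"), ("mov", "mcop1"))`, and `instructions` holds the three typed operations in evaluation order.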

Mir Binary > Mir > IR

Mir: Target registers get %td0, 4 ... put 4, %td3

Mir: Advance
    9: 81 c3 d2 04 00 00    add ebx, 1234

    advance @9
    get %td0, EBX
    mov %td1, 1234
    add %td2, %td0, %td1
    put EBX, %td2
    advance +6

IR Binary > Mir > IR

Generating IR
x86:
    ...
    sub ebx, ecx
    add ebx, 12
Mir:
    ...
    sub %td2, ...
    put EBX, %td2
    get %td0, EBX
    mov %td1, 12
    add %td2, %td0, %td1
    put EBX, %td2
IR:
    ...
    %ebx2 = sub i32 ...
    %ebx3 = add i32 %ebx2, 12
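The key point of this slide is that `get` and `put` emit no IR of their own: they only track which IR value currently holds each register. A minimal Python sketch of that bookkeeping follows. The tuple encoding of Mir, the function name `mir_to_ir`, and the `%vN` result names are assumptions for illustration (the slide names results after the register, as in `%ebx3`).

```python
def mir_to_ir(mir, reg_values, bits=32):
    """Translate a straight-line Mir sequence into IR text.

    mir:        list of tuples such as ("get", "%td0", "EBX")
    reg_values: register name -> IR value that holds it on entry
    """
    regs = dict(reg_values)   # register -> current IR value
    temps = {}                # Mir temporary -> IR value or constant
    ir = []
    for op, *args in mir:
        if op == "get":       # read a register: no IR emitted
            dst, reg = args
            temps[dst] = regs[reg]
        elif op == "mov":     # immediate: becomes an IR constant
            dst, imm = args
            temps[dst] = str(imm)
        elif op in ("add", "sub"):
            dst, lhs, rhs = args
            name = f"%v{len(ir)}"
            ir.append(f"{name} = {op} i{bits} {temps[lhs]}, {temps[rhs]}")
            temps[dst] = name
        elif op == "put":     # write a register: just rebind it
            reg, src = args
            regs[reg] = temps[src]
        else:
            raise ValueError(f"unsupported Mir operation: {op}")
    return ir, regs

mir = [
    ("get", "%td0", "EBX"),
    ("mov", "%td1", 12),
    ("add", "%td2", "%td0", "%td1"),
    ("put", "EBX", "%td2"),
]
ir, regs = mir_to_ir(mir, {"EBX": "%ebx2"})
```

Five Mir operations collapse into a single `add` whose left operand is the value produced by the preceding `sub`.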

Generating Branches
x86:
    22: 48 83 c1 08    add rcx, 8
    ...
    xx: xx xx xx xx    jmp 22
Mir:
    advance @22
    get %tq0, RCX
    mov %tq1, 8
    add %tq2, %tq0, %tq1
    put RCX, %tq2
    advance +4
    ...
    jmp 22
IR:
    I22:
      %rcx2 = add i64 %rcx1, 8
      ...
      br label %I22

Generating Indirect Branches
x86:
    22: 48 83 c1 08    add rcx, 8
    26: 83 eb 03       sub ebx, 3
IR:
    JumpTable:
      %p = phi ...
      switch i64 %p, label %fail [i64 22, label %I22
                                  i64 26, label %I26]
    I22:
      %rcx2 = add i64 %rcx1, 8
    I26:
      %ebx2 = sub i32 %ebx1, 3
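The jump-table block behaves like a dispatcher keyed by original instruction address, with a failure case for unknown targets. The Python sketch below models that control flow. The block functions, the `state` dictionary, and its `"next"` field (standing in for a computed branch target) are all assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def block_22(state):
    """I22: add rcx, 8, then an indirect jump to a computed target."""
    state["rcx"] = (state["rcx"] + 8) & MASK64
    return state["next"]          # hypothetical computed target address

def block_26(state):
    """I26: sub ebx, 3, then stop (None means no further target)."""
    state["ebx"] = (state["ebx"] - 3) & MASK32
    return None

# Address -> translated block, like the cases of the IR switch.
JUMP_TABLE = {22: block_22, 26: block_26}

def run(entry, state, max_steps=1000):
    """Dispatch on the target address until a block stops or the target is unknown."""
    target = entry
    for _ in range(max_steps):
        if target is None:
            return "done"
        block = JUMP_TABLE.get(target)
        if block is None:
            return "fail"         # the switch's default label
        target = block(state)
    raise RuntimeError("step limit reached")
```

A target that was never translated, for example `30`, lands on the failure case, which is the role of `label %fail` in the switch.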

Generating Predicated Instructions
addge r7, r5, #1
Mir:
    get %td0, R5
    mov %td1, 1
    add %td2, %td0, %td1
    select %td3, xx, %td2, %td0
    put R7, %td3
IR:
    %1 = add i32 %r5_1, 1
    %r7_2 = select xx, i32 %1, %r5_1

Generating Condition Codes
x86:
    22: 48 83 c1 08    add rcx, 8
    26: xx xx xx xx    jne 22
Mir:
    advance @22
    get %tq0, RCX
    mov %tq1, 8
    add %tq2, %tq0, %tq1
    ...
    cmpne %f3, %tq2
    ...
    put RCX, %tq2
    advance +4
    jmpne 22
IR:
    I22:
      %rcx2 = add i64 %rcx1, 8
      %ne2 = icmp ne i64 %rcx2, 0
      br i1 %ne2, label %I22

Using the IR IR > ?

Binary Rewriting
    Missing semantics ➔ Inline assembly
    Data sections ➔ Map it all

Static Binary Translation

Dynamic Binary Translation
    Self-altering code ➔ Mark read/execute
    Code discovery ➔ Per-BB translation

Dynamic Binary Instrumentation

Binary Analysis

Simulation
    Missing semantics ➔ Runtime library
    Cycle accuracy ➔ Machine Model?

To-Source Decompilation
    C source output ➔ C Backend!
    IR “highering” ➔ Optimizations
    Lack of accuracy ➔ Metadata

Going forward
    Merging semantics with SD patterns?
    Removing the Mir backend
    Analyses & Highering
    Tools!

Questions? http://dagger.repzret.org
