SLIDE 1

Binary‐level program analysis: Assembly basics

Gang Tan

CSE 597 Spring 2019 Penn State University

SLIDE 2

Source code, Assembly code, Object code, and Executable Code

  • Then a linker links the object code of different compilation units (files, libraries) into executable code

  • Assembly code

– Consists of assembly instructions
– Specific to a particular architecture (x86, x64, ARM, SPARC, etc.)

  • Object code

– Consists of the byte encodings of assembly instructions

  • Executable code

– AKA machine code
– In a particular file format (e.g., ELF or PE)

Pipeline (slide diagram): Source code → Compiler → Assembly code → Assembler → Object code
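
To make the "particular file format" point concrete, here is a minimal sketch that guesses the container format from a file's leading bytes. It relies only on two well-known signatures: ELF files begin with the bytes 0x7f 'E' 'L' 'F', and PE files begin with the DOS signature 'M' 'Z'. The names guess_format and exe_format are hypothetical and not from the slides, and a real tool would validate far more of the header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
typedef enum { FMT_UNKNOWN, FMT_ELF, FMT_PE } exe_format;

/* Guess the container format of an executable from its first bytes.
 * Assumption: checking only the magic number is good enough for a demo;
 * a real loader also validates the rest of the header. */
exe_format guess_format(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    /* ELF magic: 0x7f followed by "ELF". The string literal is split so
     * that 'E' is not parsed as part of the \x7f hex escape. */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return FMT_ELF;

    /* PE files start with the legacy DOS header signature "MZ". */
    if (len >= 2 && buf[0] == 'M' && buf[1] == 'Z')
        return FMT_PE;

    return FMT_UNKNOWN;
}
```

Reading the first few bytes of ./hello into a buffer and passing them to guess_format would be one way to use it.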

SLIDE 3

Example Source Code: hello.c

#include <stdio.h>

int main() {
    printf("Hello, World!");
    return 0;
}

SLIDE 4

Example Assembly Code: After "gcc -S -o hello.s hello.c"

        .file   "hello.c"
        .section        .rodata
.LC0:
        .string "Hello, World!"
        .text
.globl main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movl    $.LC0, %eax
        movq    %rax, %rdi
        movl    $0, %eax
        call    printf
        movl    $0, %eax
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:
        .size   main, .-main
        .ident  "GCC: (GNU) 4.4.7 20120313 (Red Hat 4.4.7-23)"
        .section        .note.GNU-stack,"",@progbits

SLIDE 5

Example Executable Code: After "gcc -o hello hello.c"

Run "objdump -s ./hello" to dump the full contents of every section in hex

SLIDE 6

Binary Code Analysis

  • Refers to analyzing assembly or executable code

  • If given executable code

– Step 1: disassemble it into assembly code
– Step 2: analyze the assembly code

  • The disassembly step may be hard or easy

– Depending on whether meta information is embedded into executable code
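
Step 1 can be illustrated with a toy linear-sweep disassembler: decode one instruction, advance by its length, and repeat. The sketch below recognizes only the handful of x86-64 encodings that correspond to the instructions in the hello.s listing; the function names are hypothetical, and a real disassembler handles the full, variable-length instruction set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Decode the instruction at code[0]. Returns its length in bytes and sets
 * *mnemonic, or returns 0 if the bytes are not one of the few encodings
 * this toy knows. Assumption: 64-bit mode, AT&T-style mnemonics. */
size_t toy_insn_length(const unsigned char *code, size_t avail,
                       const char **mnemonic)
{
    assert(code != NULL && mnemonic != NULL);

    if (avail >= 1 && code[0] == 0x55) { *mnemonic = "push %rbp"; return 1; }
    if (avail >= 1 && code[0] == 0xc9) { *mnemonic = "leave";     return 1; }
    if (avail >= 1 && code[0] == 0xc3) { *mnemonic = "ret";       return 1; }
    if (avail >= 3 && code[0] == 0x48 && code[1] == 0x89) {
        if (code[2] == 0xe5) { *mnemonic = "mov %rsp,%rbp"; return 3; }
        if (code[2] == 0xc7) { *mnemonic = "mov %rax,%rdi"; return 3; }
    }
    /* Opcode byte followed by a 4-byte immediate or displacement. */
    if (avail >= 5 && code[0] == 0xb8) { *mnemonic = "mov $imm32,%eax"; return 5; }
    if (avail >= 5 && code[0] == 0xe8) { *mnemonic = "call rel32";      return 5; }

    *mnemonic = "(unknown)";
    return 0;
}

/* Linear sweep: start at offset 0 and assume each instruction begins where
 * the previous one ended. Stops at the first unrecognized byte sequence.
 * Returns the number of instructions decoded; prints them if print != 0. */
size_t toy_linear_sweep(const unsigned char *code, size_t len, int print)
{
    size_t off = 0, count = 0;

    while (off < len) {
        const char *mnemonic;
        size_t n = toy_insn_length(code + off, len - off, &mnemonic);
        if (n == 0)
            break;
        if (print)
            printf("%4zx: %s\n", off, mnemonic);
        off += n;
        count++;
    }
    return count;
}
```

The sweep only works if the starting offset really is an instruction boundary, which is exactly the information that embedded meta information provides and a stripped binary lacks.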

SLIDE 7

Meta information in Executable Code

  • During compilation, meta information can be embedded into executable code

  • Meta information: symbol tables

– Information about symbols (e.g., function and variable names) from source code
– Each entry contains:

  • The symbol name
  • The binding address
  • Type of the symbol
  • Misc. info

– Symbol tables are consumed by linkers and debuggers

SLIDE 8
  • objdump --syms ./hello (prints the symbol table of the executable)

SLIDE 9

Meta information in Executable Code

  • Relocation information

– Before linking, memory addresses of functions and global data are unknown
– Compilers generate relocation entries
– Static/dynamic linkers patch the program during linking
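
The patching step can be sketched for one relocation kind. On x86-64, a 32-bit PC-relative relocation (R_X86_64_PC32) stores S + A - P, where S is the symbol's final address, A is the addend from the relocation entry, and P is the address of the field being patched. The struct and function names below are hypothetical simplifications, not a real linker's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified relocation entry for this sketch. */
typedef struct {
    size_t  offset;  /* where in the section the 4-byte field lives */
    int64_t addend;  /* constant A added to the symbol address      */
} toy_reloc;

/* Patch one 32-bit PC-relative field with S + A - P.
 * section_addr is the address the section was assigned at link time;
 * sym_addr is the resolved address S of the referenced symbol. */
void toy_apply_pc32(unsigned char *section, size_t section_len,
                    uint64_t section_addr, const toy_reloc *r,
                    uint64_t sym_addr)
{
    assert(section != NULL && r != NULL);
    assert(r->offset + 4 <= section_len);

    uint64_t place = section_addr + r->offset;                 /* P */
    uint32_t value = (uint32_t)(sym_addr + (uint64_t)r->addend - place);

    /* x86 is little-endian: least significant byte goes first. */
    for (int i = 0; i < 4; i++)
        section[r->offset + i] = (unsigned char)(value >> (8 * i));
}
```

For a call instruction (opcode e8 followed by a 4-byte displacement), the addend is usually -4, because the CPU measures the displacement from the end of the instruction while P points at the start of the displacement field.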

SLIDE 10

Meta information in Executable Code

  • Debugging information

– Generated by the compiler and consumed by debuggers (e.g., gdb)
– During debugging, the debugger uses debugging info to relate binary code to source code

  • E.g., this instruction was generated from this source code line

– Includes:

  • Source code info: types and scopes of identifiers
  • Line‐number info: to relate binary to source code
  • Other info such as location description

– Debugging info formats: DWARF and STABS
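
The line-number part can be pictured as a table of (address, source line) rows sorted by address; to relate an instruction to source, a debugger finds the last row whose address is not greater than the instruction's address. The sketch below assumes such a table has already been decoded into an array. It does not parse DWARF or STABS, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded row: code starting at addr comes from this line. */
typedef struct {
    uint64_t addr;
    unsigned line;
} toy_line_row;

/* Map an instruction address to a source line.
 * rows must be sorted by ascending addr. Returns the line of the last row
 * with addr <= pc, or 0 if pc lies before the first row. */
unsigned toy_addr_to_line(const toy_line_row *rows, size_t n, uint64_t pc)
{
    assert(rows != NULL || n == 0);

    /* Binary search for the first row whose addr is greater than pc.
     * Invariant: rows[0..lo) have addr <= pc, rows[hi..n) have addr > pc. */
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rows[mid].addr <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? 0 : rows[lo - 1].line;
}
```

Real line tables carry more per row (file, column, statement boundaries), so this only captures the core lookup.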

SLIDE 11

Stripped versus unstripped binaries

  • Stripped binaries

– Pure binary code; no meta information
– Disassembly is hard (we do not even know where functions start)

  • Unstripped binaries

– Binary code plus meta information
– Disassembly is easy

  • Why stripped binaries?

– Meta information occupies space
– Stripped binaries are harder to reverse engineer, making it easier to protect intellectual property
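
Without a symbol table, one common heuristic for guessing function starts is to scan for a typical prologue. The sketch below looks for the byte sequence 55 48 89 e5 (push %rbp; mov %rsp,%rbp), the same prologue that appears in the hello.s listing. This is an illustrative assumption rather than a reliable method: optimized code often omits the frame pointer, and the same bytes can occur inside data or in the middle of another instruction. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scan code for the prologue "push %rbp; mov %rsp,%rbp" and record the
 * offset of each match in out (at most max_out entries are written).
 * Returns the total number of matches, which may exceed max_out. */
size_t toy_find_prologues(const unsigned char *code, size_t len,
                          size_t *out, size_t max_out)
{
    static const unsigned char prologue[] = { 0x55, 0x48, 0x89, 0xe5 };
    size_t found = 0;

    assert(code != NULL || len == 0);
    assert(out != NULL || max_out == 0);

    for (size_t i = 0; i + sizeof prologue <= len; i++) {
        if (memcmp(code + i, prologue, sizeof prologue) == 0) {
            if (found < max_out)
                out[found] = i;   /* candidate function start */
            found++;
        }
    }
    return found;
}
```

Each reported offset is only a candidate; a fuller analysis would confirm it, for example by checking that a disassembly starting there decodes cleanly.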

SLIDE 12

Next: IA32 and Reverse Engineering basics

  • NSA tutorial on reverse engineering

– https://codebreaker.ltsnet.net/resources
– Introduction to x86 Assembly
– Reverse Engineering Machine Code Pt. 1
– Reverse Engineering Machine Code Pt. 2