Program Analysis


  1. Program Analysis
     Attackers need to analyze our program to modify it! Defenders need to analyze our program to protect it!
     Two kinds of analyses:
       1. Static analysis tools collect information about a program by studying its code.
       2. Dynamic analysis tools collect information from executing the program.

  8. Static and Dynamic Analyses
     - control-flow graphs: representation of functions.
     - call graphs: representation of (possible) function calls.
     - debugging: what path does the program take?
     - tracing: which functions/system calls get executed?
     - profiling: what gets executed the most?
     - disassembly: turn raw executables into assembly code.
     - decompilation: turn raw assembly code into source code.
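As a rough illustration of the dynamic side (tracing and profiling), the sketch below counts which Python-level functions run, and how often, while a small workload executes. It relies on Python's standard `sys.setprofile` hook. The names `trace_calls`, `square_mod`, and this Python `modexp` are illustrative assumptions, not part of the slides; `modexp` is only a loose port of the deck's example function.

```python
import sys
from collections import Counter


def square_mod(r, n):
    """Helper for the workload: r squared, modulo n."""
    return r * r % n


def modexp(y, x, n):
    """Loose Python port of the deck's example; x is a list of 0/1 values."""
    s, last = 1, 0  # `last` starts at 0 so an empty x still returns a value
    for bit in x:
        r = (s * y) % n if bit == 1 else s
        s = square_mod(r, n)
        last = r
    return last


def trace_calls(target, *args):
    """Run target(*args) and count Python function calls by function name."""
    counts = Counter()

    def profiler(frame, event, arg):
        # "call" fires once per Python-level function entry.
        if event == "call":
            counts[frame.f_code.co_name] += 1

    sys.setprofile(profiler)
    try:
        result = target(*args)
    finally:
        sys.setprofile(None)  # always remove the hook again
    return result, counts


if __name__ == "__main__":
    result, counts = trace_calls(modexp, 3, [1, 0, 1], 7)
    print(result, counts.most_common())
```

The call counts answer the tracing question (which functions ran) and the profiling question (which ran most) for this one execution only; a different input may exercise different code, which is the usual limitation of dynamic analysis.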

  9. Outline
     1. Static Analysis: Control-flow analysis
     2. Reconstituting source: Disassembly

  10. Control-flow Graphs (CFGs)
      - A way to represent functions.
      - Nodes are called basic blocks.
      - Each block consists of straight-line code ending (possibly) in a branch.
      - An edge A → B means control could flow from A to B.

  11. Example function (C source):

          int modexp(int y, int x[], int w, int n) {
              int R, L;
              int k = 0;
              int s = 1;
              while (k < w) {
                  if (x[k] == 1)
                      R = (s * y) % n;
                  else
                      R = s;
                  s = R * R % n;
                  L = R;
                  k++;
              }
              return L;
          }

      The same function as numbered straight-line code with explicit branches:

          (1)  k=0
          (2)  s=1
          (3)  if (k>=w) goto (12)
          (4)  if (x[k]!=1) goto (7)
          (5)  R=(s*y)%n
          (6)  goto (8)
          (7)  R=s
          (8)  s=R*R%n
          (9)  L=R
          (10) k++
          (11) goto (3)
          (12) return L

  12. The resulting graph

          B0: (1)  k=0
              (2)  s=1
          B1: (3)  if (k>=w) goto B6
          B2: (4)  if (x[k]!=1) goto B4
          B3: (5)  R=(s*y) mod n
              (6)  goto B5
          B4: (7)  R=s
          B5: (8)  s=R*R mod n
              (9)  L=R
              (10) k++
              (11) goto B1
          B6: (12) return L

      Edges (from the branches and fall-throughs above): B0 → B1, B1 → B2, B1 → B6, B2 → B3, B2 → B4, B3 → B5, B4 → B5, B5 → B1.

  13. BuildCFG(F):
      1. Mark every instruction which can start a basic block as a leader:
         - the first instruction is a leader;
         - any target of a branch is a leader;
         - the instruction following a conditional branch is a leader.
      2. A basic block consists of the instructions from a leader up to, but not including, the next leader.
      3. Add an edge A → B if A ends with a branch to B or can fall through to B.
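The three BuildCFG steps can be sketched as follows, applied to the numbered straight-line form of `modexp`. This is a minimal sketch under assumptions: the tuple encoding of instructions, the kind tags (`plain`, `cond`, `goto`, `ret`), and the names `MODEXP` and `build_cfg` are illustrative choices, not from the slides.

```python
# Each instruction is (text, kind, target). kind is "plain", "cond"
# (conditional branch), "goto" (unconditional branch) or "ret"; target is a
# 1-based instruction number, or None. This encoding is an assumption.
MODEXP = [
    ("k=0", "plain", None),                # (1)
    ("s=1", "plain", None),                # (2)
    ("if (k>=w) goto (12)", "cond", 12),   # (3)
    ("if (x[k]!=1) goto (7)", "cond", 7),  # (4)
    ("R=(s*y)%n", "plain", None),          # (5)
    ("goto (8)", "goto", 8),               # (6)
    ("R=s", "plain", None),                # (7)
    ("s=R*R%n", "plain", None),            # (8)
    ("L=R", "plain", None),                # (9)
    ("k++", "plain", None),                # (10)
    ("goto (3)", "goto", 3),               # (11)
    ("return L", "ret", None),             # (12)
]


def build_cfg(instrs):
    """Return (blocks, edges): blocks are (first, last) instruction numbers,
    edges are (from_block, to_block) index pairs."""
    n = len(instrs)

    # Step 1: mark leaders.
    leaders = {1}  # the first instruction
    for i, (_, kind, target) in enumerate(instrs, start=1):
        if kind in ("cond", "goto"):
            leaders.add(target)  # any target of a branch
        if kind == "cond" and i < n:
            leaders.add(i + 1)   # instruction after a conditional branch

    # Step 2: a block runs from a leader up to just before the next leader.
    starts = sorted(leaders)
    blocks = []
    for idx, start in enumerate(starts):
        end = starts[idx + 1] - 1 if idx + 1 < len(starts) else n
        blocks.append((start, end))
    block_of = {start: b for b, (start, _) in enumerate(blocks)}

    # Step 3: add branch edges and fall-through edges.
    edges = set()
    for b, (_, end) in enumerate(blocks):
        _, kind, target = instrs[end - 1]
        if kind in ("cond", "goto"):
            edges.add((b, block_of[target]))
        if kind in ("plain", "cond") and end < n:
            edges.add((b, block_of[end + 1]))  # control can fall through
    return blocks, sorted(edges)


if __name__ == "__main__":
    blocks, edges = build_cfg(MODEXP)
    for b, (first, last) in enumerate(blocks):
        print(f"B{b}: instructions ({first})..({last})")
    print(edges)
```

Because the leaders are sorted by instruction number, the block indices come out in the same order as the B0 to B6 labels used for the resulting graph.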

  14. Interprocedural control flow
      - Interprocedural analysis also considers flow of information between functions.
      - Call graphs are a way to represent possible function calls.
      - Each node represents a function.
      - An edge A → B means A might call B.

  15. Building call-graphs

          void h();
          void f() { h(); }
          void g() { f(); }
          void h() { f(); g(); }
          void k() {}

      Call-graph nodes shown in the figure: main, f, g, h, k.
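A call graph for the functions above can be approximated with a few lines of pattern matching. This is only a sketch: the regular expressions handle this toy snippet (no nested braces, direct calls only) and are not a real C parser, and `SOURCE`, `build_call_graph`, and `reachable` are hypothetical names. The body of `main` is not given, so `main` is left out.

```python
import re

# The functions from the slide; main is omitted because its body is not shown.
SOURCE = """
void h();
void f() { h(); }
void g() { f(); }
void h() { f(); g(); }
void k() {}
"""


def build_call_graph(source):
    """Map each defined function to the set of functions it calls directly."""
    graph = {}
    # A definition looks like `type name(args) { body }` with no nested braces.
    # Declarations such as `void h();` do not match because no `{` follows.
    definition = r"\w+\s+(\w+)\s*\([^)]*\)\s*\{([^{}]*)\}"
    for name, body in re.findall(definition, source):
        # Every `identifier(` inside the body is treated as a direct call.
        graph[name] = set(re.findall(r"(\w+)\s*\(", body))
    return graph


def reachable(graph, root):
    """Functions that might run if root is called (root included)."""
    seen, stack = set(), [root]
    while stack:
        fn = stack.pop()
        if fn not in seen:
            seen.add(fn)
            stack.extend(graph.get(fn, ()))
    return seen


if __name__ == "__main__":
    graph = build_call_graph(SOURCE)
    for caller in sorted(graph):
        print(caller, "->", sorted(graph[caller]))
    print(sorted(reachable(graph, "h")))
```

An edge here means "might call", matching the slide's definition. A static scan like this cannot see calls made through function pointers, so a real tool would have to add conservative edges for those.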

  17. Reconstituting source
      Build pipeline (figure): p.c → cc → p.s → as → p.o → ld → p → strip → p’
      Sections shown in the figure: header, .text, .data, symbols, relocation. The stripped binary p’ keeps only header, .text, and .data.

  18. Attacking stripped binary code
      Two routes from the stripped binary p’ (header, .data, .text) to a modified binary p’’ (figure):
      - patch p’ directly with a hex editor to get p’’;
      - p’ → dis → p’.s → dcc → p’.c → edit → p’’.c → cc → p’’.

  26. Why is disassembly hard?
      - Variable-length instruction sets: instructions can overlap.
      - Mixing data and code: data may be misclassified as instructions.
      - Indirect jumps: must assume that any location could be the start of an instruction!
      - Finding the beginning of functions if all calls are indirect.
      - Finding the end of functions if there is no dedicated return instruction.
      - Handwritten assembly code: won't conform to the standard calling conventions.
      - Code compression: the code of two functions may overlap.
      - Self-modifying code.

  27. Instruction set 1

          opcode  mnemonic  operands  semantics
          0       call      addr      function call to addr
          1       calli     reg       function call to address in reg
          2       brg       offset    branch to pc + offset if flags for > are set
          3       inc       reg       reg ← reg + 1
          4       bra       offset    branch to pc + offset
          5       jmpi      reg       jump to address in reg
          6       prologue            beginning of function
          7       ret                 return from function

      Instruction set for a small architecture. All operators and operands are one byte long. Instructions can be 1-3 bytes long.

  28. Instruction set 2

          opcode  mnemonic  operands       semantics
          8       load      reg1, (reg2)   reg1 ← [reg2]
          9       loadi     reg, imm       reg ← imm
          10      cmpi      reg, imm       compare reg and imm and set flags
          11      add       reg1, reg2     reg1 ← reg1 + reg2
          12      brge      offset         branch to pc + offset if flags for ≥ are set
          13      breq      offset         branch to pc + offset if flags for = are set
          14      store     (reg1), reg2   [reg1] ← reg2
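To make the two instruction-set tables concrete, here is a minimal sketch of a linear-sweep decoder for this toy architecture. The operand counts are read off the tables (one byte per operand). The byte sequence `CODE` is a made-up example, not the one from the slides, and `ISA`, `linear_sweep`, and the `<bad>` / `<truncated>` markers are illustrative names.

```python
# opcode -> (mnemonic, number of one-byte operands), as read from the tables.
ISA = {
    0: ("call", 1), 1: ("calli", 1), 2: ("brg", 1), 3: ("inc", 1),
    4: ("bra", 1), 5: ("jmpi", 1), 6: ("prologue", 0), 7: ("ret", 0),
    8: ("load", 2), 9: ("loadi", 2), 10: ("cmpi", 2), 11: ("add", 2),
    12: ("brge", 1), 13: ("breq", 1), 14: ("store", 2),
}

# Hypothetical program: prologue; loadi r0, 3; inc r0; ret
CODE = [6, 9, 0, 3, 3, 0, 7]


def linear_sweep(code, start=0):
    """Decode instructions back to back, beginning at byte offset `start`.

    Returns a list of (offset, mnemonic, operands) tuples.
    """
    out = []
    pc = start
    while pc < len(code):
        opcode = code[pc]
        if opcode not in ISA:
            out.append((pc, "<bad>", ()))  # not a valid opcode: skip one byte
            pc += 1
            continue
        mnemonic, n_operands = ISA[opcode]
        operands = tuple(code[pc + 1 : pc + 1 + n_operands])
        if len(operands) < n_operands:
            out.append((pc, "<truncated>", operands))  # ran off the end
            break
        out.append((pc, mnemonic, operands))
        pc += 1 + n_operands
    return out


if __name__ == "__main__":
    for start in (0, 2):
        print(f"sweep from offset {start}:")
        for offset, mnemonic, operands in linear_sweep(CODE, start):
            print(f"  {offset}: {mnemonic} {operands}")
```

Starting the sweep at offset 2 instead of 0 decodes the operand bytes of `loadi` as a `call` instruction. That is the overlapping-instruction problem of variable-length instruction sets: the same bytes give different, equally plausible listings depending on where decoding begins.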

  29. Disassembly: example

          6 0 1 0 9 0 4 3 1 0 7 0 6 9 0 1 1 0 0 1 2 2 6 9 1 3 0 1 1 1 0 8 2 1 5 2 3 2 3 7 9 1 3 4 7 9 1 4 4 2 7 6 9 0 3 7 6 9 0 1 7 4 2 2 4 3 1 7 4 3 4 1

      Next few slides show the results of different disassembly algorithms. Correctly disassembled regions are in pink.
