Reassembleable Disassembling
Shuai Wang, Pei Wang, Dinghao Wu (Pennsylvania State University)
24th USENIX Security Symposium, August 2015


  1. Reassembleable Disassembling
     Shuai Wang, Pei Wang, Dinghao Wu
     Pennsylvania State University
     24th USENIX Security Symposium, August 2015

  3. Motivation
     Analysing and retrofitting COTS binaries with...
     - software fault isolation
     - control-flow integrity
     - symbolic taint analysis
     - elimination of ROP gadgets
     Binary rewriting comes with major drawbacks/limitations:
     - runtime overhead from patching due to control-flow transfers
     - patching requires PIC if code is relocated
     - instrumentation significantly increases binary size
     - binary reuse only works for small binaries (coverage)

  4. Goal
     Produce reassembleable assembly code from stripped COTS binaries in a fully automated manner.
     - Allows binary-based whole-program transformations
     - Requires relocatable assembly code → symbolization of immediate values
     - Complementary to existing work

  5. Symbolization
     Given an immediate value in assembly code, is it a constant or a memory address?
     - Reassembling a transformed program changes the binary layout
     - Address changes invalidate memory references
     x86:
     - No distinction between code and data
     - Variable-length instruction encoding

  6. (Un)Relocatable Assembly Code
     Binary:
         mov 0xc0, %eax
         0xc0: 0xa08
     Unrelocatable disassembly:
         .text
         mov 0xc0, %eax
         .data
         .long 0xa08
       After assembling: mov 0xc0, %eax, but 0xc0: ?
     Relocatable disassembly:
         .text
         mov Glob, %eax
         .data
         Glob: .long 0xa08
       After assembling: mov Glob, %eax, with Glob: 0xa08

  7. Types of Symbol References
     c2c, c2d, d2c, d2d: code-to-code, code-to-data, data-to-code, and data-to-data references.
     Code section:
         fun1:
             call fun2                     (c2c)
         fun2:
             mov ptr, %eax                 (c2d)
             lea (%eax, %ebx, 4), %ecx
             call *%ecx
         handler1: ...
         handler2: ...
     Data section:
         ptr:
             .long table                   (d2d)
         table:
             .long handler1                (d2c)
             .long handler2                (d2c)

  8. Symbolization of c2c and c2d References
     - Valid memory references point into a code or data section
     - Assume all immediates to be references and filter out invalid ones
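
The filtering step on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Uroboros's implementation: the function names, the half-open `(start, end)` section ranges, and the example addresses are all hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: a section is a half-open address range [start, end).
Section = Tuple[int, int]


def in_any_section(value: int, sections: Iterable[Section]) -> bool:
    """True if value falls inside one of the given address ranges."""
    return any(start <= value < end for start, end in sections)


def symbolize_immediates(
    immediates: Iterable[int],
    code_sections: List[Section],
    data_sections: List[Section],
) -> List[Tuple[int, str]]:
    """Treat every immediate as a candidate reference, then drop the ones
    that point into neither a code nor a data section."""
    kept = []
    for imm in immediates:
        if in_any_section(imm, code_sections):
            kept.append((imm, "code"))   # candidate c2c reference
        elif in_any_section(imm, data_sections):
            kept.append((imm, "data"))   # candidate c2d reference
        # anything else is left as a plain constant
    return kept


# Example with made-up section addresses.
code = [(0x8048000, 0x8049000)]
data = [(0x804A000, 0x804B000)]
result = symbolize_immediates([0x8048100, 0x10, 0x804A010], code, data)
```

In this example the immediate `0x10` is left as a constant, while the other two values are kept as candidate references into code and data respectively.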

  11. Symbolization of d2c and d2d References
      Assumption 1: “All symbol references stored in data sections are n-byte aligned, where n is 4 for 32-bit binaries and 8 for 64-bit binaries.”
      → Consider only n-byte values which are n-byte aligned
      Assumption 2: “Users do not need to perform transformation on the original binary data.”
      → Keep start addresses of data sections during reassembly and ignore d2d references
      Assumption 3: “d2c symbol references are only used as function pointers or jump table entries.”
      → References need to point to the start of a function or form a jump table
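
A rough Python sketch of how the three assumptions narrow down d2c candidates in a data section. Everything here is an assumption made for illustration: the function name, its parameters, little-endian encoding, and in particular the jump-table test (a run of at least `min_table_len` consecutive code pointers), which is a simplification and not necessarily the check Uroboros performs.

```python
import struct
from typing import List, Set, Tuple


def find_d2c_references(
    data: bytes,
    base: int,
    n: int,
    code_range: Tuple[int, int],
    function_starts: Set[int],
    min_table_len: int = 2,
) -> List[Tuple[int, int]]:
    """Return (slot address, target) pairs for likely d2c references."""
    fmt = "<I" if n == 4 else "<Q"          # little-endian n-byte value
    code_start, code_end = code_range

    # Assumption 1: only look at n-byte values stored at n-byte aligned addresses.
    first = (-base) % n
    slots = []
    for off in range(first, len(data) - n + 1, n):
        (value,) = struct.unpack_from(fmt, data, off)
        slots.append((base + off, value))

    refs = []
    i = 0
    while i < len(slots):
        # Find the run of consecutive slots whose values point into code.
        j = i
        while j < len(slots) and code_start <= slots[j][1] < code_end:
            j += 1
        if j == i:
            # Assumption 2: values pointing into data (d2d) are ignored,
            # since data sections keep their original start addresses.
            i += 1
            continue
        run = slots[i:j]
        is_table = len(run) >= min_table_len  # simplified jump-table test
        for addr, value in run:
            # Assumption 3: keep function pointers and jump-table entries only.
            if is_table or value in function_starts:
                refs.append((addr, value))
        i = j
    return refs


# Example with made-up addresses: six 4-byte slots at 0x804a000.
blob = struct.pack("<6I", 0x8048010, 0x2A, 0x8048123, 0x804A000, 0x8048200, 0x8048240)
found = find_d2c_references(blob, 0x804A000, 4, (0x8048000, 0x8049000), {0x8048010})
```

Here the first slot is kept because it holds a function start, the isolated code pointer `0x8048123` is rejected, the data pointer is ignored, and the last two slots are kept as a two-entry table.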

  12. Evaluation
      - Uroboros: 13,209 SLOC in OCaml and Python; works with x86/x64 ELF binaries
      - Intel Core i7-3770 @ 3.4GHz with 8GiB RAM running Ubuntu 12.04
      - 122 programs compiled for 32- and 64-bit targets
      - gcc 4.6.3 with default configuration and optimization of each program
      - stripped before testing

      | Collection | Size | Content |
      |------------|------|---------|
      | COREUTILS  | 103  | GNU Core Utilities |
      | REAL       | 7    | bc, ctags, gzip, mongoose, nweb, oftpd, thttpd |
      | SPEC       | 12   | C programs in SPEC2006 |

  13. Architecture of Uroboros
      [Diagram; labels recovered from the slide]
      - Input: Binary
      - Disassembly Module: Linear Disassembler, Disassembly Validator
      - Analysis Module: Symbol Lifting, Control-Flow Structure Recovery
      - Output: Relocatable Assembly Code
      - External Analyses & Transformations
      - Other labels: Data, Meta-Data

  15. Correctness
      Test input shipped with programs or custom test of major functionality (some of REAL)

      Binaries Failing Functionality Tests

      | Assumption Set | 32-bit | 64-bit |
      |----------------|--------|--------|
      | {}             | h264ref, gcc, gobmk, hmmer | perlbench, gcc, gobmk, hmmer, sjeng, h264ref, lbm, sphinx3 |
      | {A1}           | h264ref, gcc, gobmk | perlbench, gcc, gobmk |
      | {A1, A2}       | h264ref, gcc, gobmk | perlbench, gcc, gobmk |
      | {A1, A3}       | gobmk | gcc, gobmk |
      | {A1, A2, A3}   | gobmk | (none) |

  16. Symbolization Errors
      Table 4: Symbolization false positives of 32-bit SPEC, REAL and COREUTILS (others have zero false positives). Each cell gives FP (FP rate) for one assumption set.

      | Benchmark | # of Ref. | {} | {A1} | {A1, A2} | {A1, A3} | {A1, A2, A3} |
      |-----------|-----------|----|------|----------|----------|--------------|
      | perlbench | 76538  | 2 (0.026‰)     | 0 (0.000‰)    | 0 (0.000‰)    | 0 (0.000‰)  | 0 (0.000‰) |
      | hmmer     | 13127  | 12 (0.914‰)    | 0 (0.000‰)    | 0 (0.000‰)    | 0 (0.000‰)  | 0 (0.000‰) |
      | h264ref   | 20600  | 27 (1.311‰)    | 1 (0.049‰)    | 1 (0.049‰)    | 0 (0.000‰)  | 0 (0.000‰) |
      | gcc       | 262698 | 49 (0.187‰)    | 32 (0.122‰)   | 32 (0.122‰)   | 0 (0.000‰)  | 0 (0.000‰) |
      | gobmk     | 65244  | 1348 (20.661‰) | 985 (15.097‰) | 912 (13.978‰) | 78 (1.196‰) | 5 (0.077‰) |

      Table 5: Symbolization false negatives of 32-bit SPEC, REAL and COREUTILS (others have zero false negatives). Each cell gives FN (FN rate) for one assumption set.

      | Benchmark | # of Ref. | {} | {A1} | {A1, A2} | {A1, A3} | {A1, A2, A3} |
      |-----------|-----------|----|------|----------|----------|--------------|
      | perlbench | 76538  | 2 (0.026‰)  | 0 (0.000‰) | 0 (0.000‰) | 0 (0.000‰) | 0 (0.000‰) |
      | hmmer     | 13127  | 12 (0.914‰) | 0 (0.000‰) | 0 (0.000‰) | 0 (0.000‰) | 0 (0.000‰) |
      | h264ref   | 20600  | 27 (1.311‰) | 0 (0.000‰) | 0 (0.000‰) | 0 (0.000‰) | 0 (0.000‰) |
      | gcc       | 262698 | 11 (0.042‰) | 0 (0.000‰) | 0 (0.000‰) | 0 (0.000‰) | 0 (0.000‰) |
      | gobmk     | 65244  | 86 (1.318‰) | 0 (0.000‰) | 0 (0.000‰) | 0 (0.000‰) | 0 (0.000‰) |
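
The rates in Tables 4 and 5 are consistent with the error count divided by the number of references, expressed per mille (‰). A small Python check of that reading, where the helper name `per_mille` is ours and the input numbers are taken from the gobmk rows:

```python
def per_mille(count: int, total: int) -> float:
    """Error rate in per mille, rounded to three decimals as in the tables."""
    return round(1000.0 * count / total, 3)


# gobmk, 32-bit, 65244 references
fp_no_assumptions = per_mille(1348, 65244)   # false positives with {}
fp_all_assumptions = per_mille(5, 65244)     # false positives with {A1, A2, A3}
fn_no_assumptions = per_mille(86, 65244)     # false negatives with {}
```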

  17. Overhead for REAL and SPEC
      [Figures: bar charts per benchmark; y-axes "Normalized Overhead (%)" and "Processing Time (Seconds)"]
      No increase in binary size after first disassemble-assemble cycle

  18. Conclusion
      - Heuristic-based symbolization of memory references
      - Uroboros (available at https://github.com/s3team/uroboros) provides reassembleable disassembly
      - Assumes availability of raw disassembly and function starting addresses
      - Tested with gcc and Clang compiled binaries
      - Limited support for C++ (need to parse DWARF)
