Tracelet-Based Code Search in Executables



  1. Tracelet-Based Code Search in Executables. Yaniv David & Eran Yahav, Technion, Israel.

  2. Finding vulnerable apps. We can find identical or patched code. Three functions, int foo(), int alsoFoo() and int patchedFoo(), each contain the same buffer overflow followed by printf( … ); patchedFoo() additionally has an if ( … ) {} before its printf( … ). Where else does this vulnerable function exist?

  3. Finding vulnerable apps (same three functions as slide 2). What if we don't have the source code? Where else does this vulnerable function exist?

  4. Search in Binaries. Binary functions: ... mov [esp+18h+var_18], offset aD1; mov ecx, 1; mov [esp+18h+var_14], ecx; call _printf ... Function 1: wc, Coreutils 6.12. Function 2: diff, Coreutils 7.15.

  5. Search engine core: Measure Similarity, producing a similarity score. • Fast & Scalable • Accurate (low false positives)

  6. Challenge 1: similarity at the binary level. Compare printf( … ) in foo() with printf( … ) in patchedFoo(). Both functions contain the buffer overflow followed by printf( … ); patchedFoo() has an added if ( … ) {} before its printf( … ).

  7. Challenge 1: similarity at the binary level. loc_401358: mov [esp+18h+var_18], offset aD1; mov ecx, 1; mov [esp+18h+var_14], ecx; call _printf. loc_401370: mov [esp+28h+var_28], offset aD1; mov ebx, 1; mov esi, 4; mov [esp+28h+var_24], ebx; call _printf. Differences: offsets in memory, register allocation, a new instruction.

  8. Challenge 2: similarity between different structures. foo's CFG contains loc_401358: mov [esp+18h+var_18], offset aD1; mov ecx, 1; mov [esp+18h+var_14], ecx; call _printf. patchedFoo's CFG contains loc_401370: mov [esp+28h+var_28], offset aD1; mov ebx, 1; mov esi, 4; mov [esp+28h+var_24], ebx; call _printf.

  9. In this talk: • A system for searching code in executables – Based on tracelet decomposition of each function – Works by solving a set of alignment and dataflow constraints with minimal violations on tracelets • An evaluation methodology based on tools from Information Retrieval – How do we know that our search engine is good?

  10. Our Approach: (1) extract tracelets, which deals with structural changes; (2) pair tracelets using alignment and rewrite, which deals with the code changes; (3) compute the similarity score.

  11. Using tracelets to deal with CFG structural changes. A tracelet is a fixed-length sub-trace. The example CFG has blocks A1 to A5, one of which holds mov [esp+18h+var_18], offset aD1; mov ecx, 1; mov [esp+18h+var_14], ecx; call _printf. For length = 3, in this example we get: (A1,A2,A5), (A1,A3,A5), (A3,A4,A5).
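
The decomposition on slide 11 can be sketched as a bounded walk over the control-flow graph. This is a minimal illustration, not the authors' implementation: the function name `extract_tracelets` and the edge list in `foo_cfg` are assumptions, with the edges inferred from the tracelets listed on slides 11 and 21.

```python
from typing import Dict, List, Tuple


def extract_tracelets(cfg: Dict[str, List[str]], k: int) -> List[Tuple[str, ...]]:
    """Return every path of exactly k basic blocks in the CFG."""
    tracelets: List[Tuple[str, ...]] = []

    def walk(path: List[str]) -> None:
        if len(path) == k:
            tracelets.append(tuple(path))
            return
        # Extend the partial path along each outgoing edge.
        for succ in cfg.get(path[-1], []):
            walk(path + [succ])

    for block in cfg:
        walk([block])
    return tracelets


# Hypothetical edges for foo's CFG (assumed, inferred from the slides).
foo_cfg = {
    "A1": ["A2", "A3"],
    "A2": ["A5"],
    "A3": ["A4", "A5"],
    "A4": ["A5"],
    "A5": [],
}

tracelets = extract_tracelets(foo_cfg, 3)
for t in tracelets:
    print(t)
```

Under these assumed edges the walk also yields (A1,A3,A4), which slide 11 does not list but slide 21 uses. Because every path is cut at k blocks, the recursion terminates even when the graph contains loops.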

  12. Using tracelets to calculate similarity between different structures. foo's CFG has blocks A1 to A5, including loc_401358: mov [esp+18h+var_18], offset aD1; mov ecx, 1; mov [esp+18h+var_14], ecx; call _printf. patchedFoo's CFG has blocks B1 to B7, including mov [esp+28h+var_28], offset aD1; mov ebx, 1; mov esi, 4; mov [esp+28h+var_24], ebx; call _printf. We need to find the corresponding tracelet.

  13. Comparing tracelets. foo's tracelet (A1, A2, A5): (1) mov [esp+18h+var_18], offset aD1; (2) mov ecx, 1; (3) mov [esp+18h+var_14], ecx; (4) call _printf. patchedFoo's tracelet (B1, B2, B7): (1) mov [esp+28h+var_28], offset aD1; (2) mov ebx, 1; (X) mov esi, 4; (3) mov [esp+28h+var_24], ebx; (4) call _printf. After align & rewrite (RW), patchedFoo's tracelet reads: (1) mov [esp+28h+var_18], offset aD1; (2) mov ecx, 1; (X) mov esi, 4; (3) mov [esp+28h+var_14], ecx; (4) call _printf. Flow: graph -> linear code -> align & RW -> edit distance.

  14. Dealing with code changes: Align. Align tracelets using a specialized edit distance. Aligned pairs, foo's tracelet (A1, A2, A5) / patchedFoo's tracelet (B1, B2, B7): (1) mov [esp+18h+var_18], offset aD1 / mov [esp+28h+var_28], offset aD1; (2) mov ecx, 1 / mov ebx, 1; (X) no counterpart / mov esi, 4; (3) mov [esp+18h+var_14], ecx / mov [esp+28h+var_24], ebx; (4) call _printf / call _printf.
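
One way to picture the alignment step is a standard dynamic-programming alignment over the two instruction lists. The sketch below is an assumption-laden stand-in for the "specialized edit-distance" named on the slide: the scoring function `sim`, the operand classes in `kind`, and the tie-breaking toward the earliest target instruction are all illustrative choices, not taken from the talk.

```python
import re
from typing import List, Optional, Tuple

REGS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def parse(ins: str) -> Tuple[str, List[str]]:
    """Split 'mov a, b' into ('mov', ['a', 'b'])."""
    opcode, _, rest = ins.partition(" ")
    operands = [o.strip() for o in rest.split(",")] if rest else []
    return opcode, operands


def kind(operand: str) -> str:
    """Coarse operand class: memory, register, immediate or symbol."""
    if operand.startswith("["):
        return "mem"
    if operand in REGS:
        return "reg"
    if re.fullmatch(r"[0-9][0-9A-Fa-f]*h?", operand):
        return "imm"
    return "sym"


def sim(a: str, b: str) -> int:
    """Assumed score: same opcode is required, same operand kinds add to it."""
    op_a, args_a = parse(a)
    op_b, args_b = parse(b)
    if op_a != op_b or len(args_a) != len(args_b):
        return -1
    return 2 + sum(kind(x) == kind(y) for x, y in zip(args_a, args_b))


def align(ref: List[str], tgt: List[str]) -> List[Tuple[Optional[int], Optional[int]]]:
    """Return index pairs (ref, tgt); None marks an unmatched instruction."""
    n, m = len(ref), len(tgt)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j],
                score[i][j - 1],
                score[i - 1][j - 1] + sim(ref[i - 1], tgt[j - 1]),
            )
    pairs: List[Tuple[Optional[int], Optional[int]]] = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i][j - 1]:      # skip a target instruction
            pairs.append((None, j - 1))
            j -= 1
        elif score[i][j] == score[i - 1][j]:    # skip a reference instruction
            pairs.append((i - 1, None))
            i -= 1
        else:                                   # pair the two instructions
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
    while i > 0:
        pairs.append((i - 1, None))
        i -= 1
    while j > 0:
        pairs.append((None, j - 1))
        j -= 1
    pairs.reverse()
    return pairs


foo = [
    "mov [esp+18h+var_18], offset aD1",
    "mov ecx, 1",
    "mov [esp+18h+var_14], ecx",
    "call _printf",
]
patched = [
    "mov [esp+28h+var_28], offset aD1",
    "mov ebx, 1",
    "mov esi, 4",
    "mov [esp+28h+var_24], ebx",
    "call _printf",
]
pairs = align(foo, patched)
print(pairs)
```

With this scoring, mov ecx, 1 matches mov ebx, 1 and mov esi, 4 equally well; the traceback order happens to leave mov esi, 4 unmatched, as on the slide. A real tool would need a principled way to settle such ties.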

  15. Dealing with code changes: DFA. Analyze data flow and record live registers on the aligned tracelets from slide 14: foo's (1) to (4) in A1, A2, A5, and patchedFoo's (1), (2), (X), (3), (4) in B1, B2, B7.

  16. Dealing with code changes: Symbolize. Move to symbolic names. foo's tracelet (A1, A2, A5) stays as is: (1) mov [esp+18h+var_18], offset aD1; (2) mov ecx, 1; (3) mov [esp+18h+var_14], ecx; (4) call _printf. patchedFoo's tracelet (B1, B2, B7) becomes: (1) mov [r11+28h+m12], OF13; (2) mov r21, 1; (X) mov esi, 4; (3) mov [r31+28h+m32], r33; (4) call FC41.

  17. Dealing with code changes: Solve & Rewrite. Use alignment & DFA to create constraints, then solve them using a constraint solver with minimal conflicts. foo's tracelet (A1, A2, A5): (1) mov [esp+18h+var_18], offset aD1; (2) mov ecx, 1; (3) mov [esp+18h+var_14], ecx; (4) call _printf. patchedFoo's symbolized tracelet (B1, B2, B7): (1) mov [r11+28h+m12], OF13; (2) mov r21, 1; (X) mov esi, 4; (3) mov [r31+28h+m32], r33; (4) call FC41. Data flow constraints: r21=r33; r11=r31. Alignment constraints: r11=esp; OF13= … ; m12=var_18; r21=ecx; r31=esp; m32=var_14; r33=ecx; FC41=_printf.
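
For constraints of the simple shape shown on slide 17 (symbol = symbol from data flow, symbol = concrete value from alignment), a minimal-conflict assignment can be sketched with union-find plus a majority vote. This is a simplified stand-in, not the solver used in the talk; the names `solve` and `rewrite` are hypothetical, and the OF13 constraint is left out because its value is elided on the slide.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def solve(dataflow: List[Tuple[str, str]],
          alignment: List[Tuple[str, str]]) -> Tuple[Dict[str, str], int]:
    """Treat data-flow equalities as hard and alignment constraints as soft votes."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Symbols linked by data flow must receive the same value.
    for a, b in dataflow:
        parent[find(a)] = find(b)

    # Each alignment constraint votes for a value of its symbol's class.
    votes: Dict[str, Counter] = defaultdict(Counter)
    for sym, val in alignment:
        votes[find(sym)][val] += 1

    chosen = {root: c.most_common(1)[0][0] for root, c in votes.items()}
    assignment = {s: chosen[find(s)] for s in list(parent) if find(s) in chosen}
    conflicts = sum(1 for sym, val in alignment if assignment[sym] != val)
    return assignment, conflicts


def rewrite(instr: str, assignment: Dict[str, str]) -> str:
    """Replace every symbolic name that has an assigned value."""
    return re.sub(r"[A-Za-z_]\w*",
                  lambda m: assignment.get(m.group(0), m.group(0)),
                  instr)


# Constraints copied from slide 17 (OF13 omitted: its value is elided there).
dataflow = [("r21", "r33"), ("r11", "r31")]
alignment = [
    ("r11", "esp"), ("m12", "var_18"), ("r21", "ecx"), ("r31", "esp"),
    ("m32", "var_14"), ("r33", "ecx"), ("FC41", "_printf"),
]

assignment, conflicts = solve(dataflow, alignment)
print(conflicts)
print(rewrite("mov r21, 1", assignment))
print(rewrite("mov [r31+28h+m32], r33", assignment))
print(rewrite("call FC41", assignment))
```

Picking the most common value per equality class minimizes the number of violated alignment constraints only for this restricted constraint form; richer constraints would need a real solver.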

  18. Dealing with code changes: Solve & Rewrite (result). foo's tracelet (A1, A2, A5): (1) mov [esp+18h+var_18], offset aD1; (2) mov ecx, 1; (3) mov [esp+18h+var_14], ecx; (4) call _printf. patchedFoo's tracelet (B1, B2, B7) after rewrite: (1) mov [esp+28h+var_18], offset aD1; (2) mov ecx, 1; (X) mov esi, 4; (3) mov [esp+28h+var_14], ecx; (4) call _printf. Distance after rewrite = 1 instruction delete + 2 value changes.

  19. Our Approach (recap): (1) extract tracelets, which deals with structural changes; (2) pair tracelets using alignment and rewrite, which deals with the code changes; (3) compute the similarity score.

  20. From paired tracelets to function similarity score: Ratio, Containment.

  21. Using tracelets to calculate similarity between different structures. foo's CFG has blocks A1 to A5, including loc_401358: mov [esp+18h+var_18], offset aD1; mov ecx, 1; mov [esp+18h+var_14], ecx; call _printf. patchedFoo's CFG has blocks B1 to B7, including mov [esp+28h+var_28], offset aD1; mov ebx, 1; mov esi, 4; mov [esp+28h+var_24], ebx; call _printf. Matches: (A1,A2,A5)~(B1,B2,B7), (A1,A3,A4)~(B1,B3,B4), (A3,A4,A5)~(B3,B4,B7); (A1,A3,A5) -> "lost".
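
To turn the pairing on slide 21 into a single number, one plausible reading is the fraction of the reference function's tracelets that found a partner. The slides do not spell out the normalization, so the `coverage` function below is an assumption made for illustration only.

```python
from typing import Dict, List, Tuple

Tracelet = Tuple[str, ...]


def coverage(ref_tracelets: List[Tracelet],
             matches: Dict[Tracelet, Tracelet]) -> float:
    """Fraction of reference tracelets that were paired with a target tracelet."""
    matched = [t for t in ref_tracelets if t in matches]
    return len(matched) / len(ref_tracelets)


# Pairing taken from slide 21; (A1,A3,A5) has no partner ("lost").
foo_tracelets = [
    ("A1", "A2", "A5"),
    ("A1", "A3", "A4"),
    ("A3", "A4", "A5"),
    ("A1", "A3", "A5"),
]
matches = {
    ("A1", "A2", "A5"): ("B1", "B2", "B7"),
    ("A1", "A3", "A4"): ("B1", "B3", "B4"),
    ("A3", "A4", "A5"): ("B3", "B4", "B7"),
}

score = coverage(foo_tracelets, matches)
print(score)
```

Under this reading, three of foo's four tracelets are matched, so the score is 3/4.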
