RESource: A Framework for Online Matching of Assembly with Open Source Code


  1. RESource: A Framework for Online Matching of Assembly with Open Source Code • Ashkan Rahimian*, Philippe Charland**, Stere Preda*, and Mourad Debbabi* • *Computer Security Laboratory, CIISE, Concordia University, Montreal, Quebec, Canada • **Mission Critical Cyber Security Section, Defence R&D Canada - Valcartier, Quebec, Canada • ETS – Montréal, Oct. 26th, 2012

  2. Outline • Background • Motivation • Methodology • Case study • Conclusion

  3. Background • Software Reverse Engineering • Problem : Binary (Assembly) to Source Matching • Domain : Malware Analysis • Facts : Code Reuse • Code Search Engines • Shared Library Imports and Utilization • E.g., cryptographic libraries • Free and Open Source Software (FOSS) • Assumptions: No obfuscation, De-obfuscated code

  4. Background • Malware might be built on top of standard components (e.g., VCL, MFC, …) • Malware developers use specific development environments (MS Visual Studio, Borland (Embarcadero), Eclipse, …) • Some code may contain fingerprints of the programmer (Executable File) • Malware authors may utilize free and open-source software (Encryption algorithms) • Malware often calls low-level kernel APIs (User level vs. Kernel level, Bypass common signature templates)

  5. Outline • Background • Motivation • Methodology • Case study • Conclusion

  6. Motivation • 26 million new malware samples identified in 2011 [1] • Software reverse engineering is a manually intensive and time-consuming process • Malware authors share source code • Code sharing websites, Forums, etc. • E.g. Flame and Stuxnet are linked • Open source libraries widespread • Koders, Ohloh, Antepidia, Krugle, Google Code, etc. • Software reverse engineers need Automated Tools • Mapping ASM to Source Code • First attempt: RE-Google
     [1] Panda Security, “PandaLabs Annual Report: 2011 Summary,” Jan. 2011; http://press.pandasecurity.com/wp-content/uploads/2012/01/Annual-Report-PandaLabs-2011.pdf.

  7. Outline • Background • Motivation • Methodology • Case study • Conclusion

  8. Methodology (1/4) • Static Code Analysis • Input: ASM file obtained with IDA Pro
     [Figure: RESource architecture diagram. Recoverable labels: ASM, Feature Extraction, Data Extraction, Query Generator, S.E. Driver, Request, Response, Processing Engine, Parser Engine, Code Search Engines, Offline Analysis]

  9. Methodology (2/4) • Features Extraction • Something exploitable at both ASM and Source Code levels • E.g., function names • Types of Features • Immediate Values (Constants) • Strings • Functions Imports • Exports (By name, Ordinal) • Function Prototypes (Signatures) • Stack Frame Information (Offline Analysis) • Var., Ret. Values, Parameters, Arguments • Size, Number, Sequence • Register utilization
     Source example: int sum (int a, int b){ return a + b; }
     ASM example: sum: push %ebp; mov %esp,%ebp; mov 0xc(%ebp),%eax; add 0x8(%ebp),%eax; pop %ebp; ret
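
The feature types listed on this slide can be illustrated with a minimal sketch that pulls a few of them out of a textual listing. It is not the RESource implementation: it assumes AT&T-syntax text like the `sum` example on the slide, a 32-bit cdecl stack frame (parameters at `0x8(%ebp)` and above), and the helper name `extract_features` is hypothetical.

```python
import re

# AT&T-syntax listing of the sum() example from the slide.
ASM_SUM = """\
sum:
    push %ebp
    mov %esp,%ebp
    mov 0xc(%ebp),%eax
    add 0x8(%ebp),%eax
    pop %ebp
    ret
"""

# A label at the start of a line is treated as a function name.
LABEL_RE = re.compile(r"^(\w+):", re.MULTILINE)
# Positive displacements from %ebp (assumption: cdecl, 32-bit frame).
PARAM_RE = re.compile(r"(?<![-\w])(0x[0-9a-fA-F]+)\(%ebp\)")
# Immediate operands carry a leading '$' in AT&T syntax.
IMM_RE = re.compile(r"\$(0x[0-9a-fA-F]+|\d+)")


def parse_immediate(text):
    """Convert a hex ('0x10') or decimal ('16') literal to an int."""
    return int(text, 16) if text.lower().startswith("0x") else int(text)


def extract_features(asm_text):
    """Collect function names, parameter offsets, and immediates (hypothetical helper)."""
    offsets = {int(m, 16) for m in PARAM_RE.findall(asm_text)}
    # Offsets 0 and 4 hold the saved %ebp and the return address, not parameters.
    param_offsets = sorted(o for o in offsets if o >= 8)
    return {
        "functions": LABEL_RE.findall(asm_text),
        "param_offsets": param_offsets,
        "param_count": len(param_offsets),
        "immediates": [parse_immediate(v) for v in IMM_RE.findall(asm_text)],
    }


features = extract_features(ASM_SUM)
```

For the `sum` listing, the two distinct offsets `0x8` and `0xc` suggest two 4-byte parameters, which agrees with the `int sum(int a, int b)` prototype shown on the slide.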

  10. Methodology (3/4) • Processing Engine • Query Building for Code Search Engines • Encoding HTTP Requests • Query Filtering (Removing Special Chars) • Parsing and Information Extraction • Filenames and URLs • Pre-defined Regex Template • Online Analysis • Search Code Repositories for a close match • Specify programming languages as part of Request
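
The processing steps on this slide (query filtering, HTTP request encoding, and regex-based extraction of filenames and URLs) might look roughly like the sketch below. The endpoint `codesearch.example.com`, the `q` and `lang` parameter names, and the result markup matched by `RESULT_RE` are all assumptions made for illustration; each real code search engine defines its own request format and response layout.

```python
import re
from urllib.parse import urlencode

# Hypothetical endpoint; not the URL of any real code search engine.
SEARCH_URL = "https://codesearch.example.com/search"


def filter_query(feature):
    """Replace runs of special characters with a space and trim the result."""
    return " ".join(re.sub(r"[^\w]+", " ", feature, flags=re.ASCII).split())


def build_request_url(features, language="c"):
    """Join the filtered features into one query and URL-encode it."""
    terms = [filter_query(f) for f in features]
    query = " ".join(t for t in terms if t)
    # 'q' and 'lang' are assumed parameter names.
    return SEARCH_URL + "?" + urlencode({"q": query, "lang": language})


# Assumed result markup: <a class="result" href="URL">FILENAME</a>
RESULT_RE = re.compile(
    r'<a class="result" href="(?P<url>[^"]+)">(?P<filename>[^<]+)</a>'
)


def parse_results(html):
    """Return (filename, url) pairs found by the pre-defined regex template."""
    return [(m.group("filename"), m.group("url")) for m in RESULT_RE.finditer(html)]


url = build_request_url(["sum(int, int)", "%s: error!"])
hits = parse_results(
    '<li><a class="result" href="http://example.com/calc/math.c">math.c</a></li>'
)
```

Keeping one regex template per engine keeps the parser small, at the cost of breaking whenever an engine changes its result page layout.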

  11. Methodology (4/4) • Offline Analysis • Information about function prototypes: • Complement Online Analysis Results • Lower level analysis for each function • Function stack frame analysis • Dictionary of low-level system calls (Windows API) • A statement for describing the overall functionality • Return values, Number and size of arguments • Number and size of parameters and type information • Rank the results based on typing information • Output: ASM file with Comments, Analysis Report
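
The "dictionary of low-level system calls" idea can be sketched as a lookup from Windows API names to a functional category, from which a one-line description of a function is produced. The API names below are real Windows functions, but the category labels, the tiny table, and the helper `describe_function` are illustrative assumptions rather than the dictionary used by RESource.

```python
from collections import Counter

# Small illustrative subset; the categories are assumed labels.
API_CATEGORIES = {
    "CreateFileA": "File I/O",
    "ReadFile": "File I/O",
    "WriteFile": "File I/O",
    "GetDC": "Screen Capture",
    "BitBlt": "Screen Capture",
    "CreateCompatibleBitmap": "Screen Capture",
    "LoadLibraryA": "Loading Libraries",
    "GetProcAddress": "Loading Libraries",
    "OpenSCManagerA": "Services",
    "CreateServiceA": "Services",
    "InternetOpenA": "Network Connectivity",
    "InternetOpenUrlA": "Network Connectivity",
    "WSAStartup": "Low-level Network",
    "connect": "Low-level Network",
}


def describe_function(called_apis):
    """Summarize a function from the API names it calls (hypothetical helper)."""
    counts = Counter(
        API_CATEGORIES[name] for name in called_apis if name in API_CATEGORIES
    )
    if not counts:
        return "No known API calls"
    # Categories with more matching calls are listed first.
    ranked = [category for category, _ in counts.most_common()]
    return "Likely functionality: " + ", ".join(ranked)


comment = describe_function(["CreateFileA", "WriteFile", "GetDC", "memcpy"])
```

A string like `comment` is the kind of statement that could be attached to a function as a comment in the ASM output.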

  12. Implementation (1/2) • Plug-in for IDA Pro • Execution Flow • Python 2.7.3, IDAPython 1.5.2, IDA Pro 6.1+

  13. Implementation (2/2) • Example for query building: • Multiple search engine support • Interleaving algorithm (Optimizing Time) • The results are added as comments in the ASM file • for both Online and Offline analysis
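
The slide does not describe the interleaving algorithm itself. One plausible reading, sketched here purely as an assumption, is a round-robin ordering of requests so that consecutive requests go to different search engines instead of queuing behind a single one. The function name `interleave_requests` and the engine keys are hypothetical.

```python
from itertools import chain, zip_longest

_SKIP = object()  # Sentinel for engines that have run out of queries.


def interleave_requests(queries_by_engine):
    """Round-robin merge of per-engine query lists into (engine, query) pairs."""
    tagged = [
        [(engine, query) for query in queries]
        for engine, queries in queries_by_engine.items()
    ]
    merged = chain.from_iterable(zip_longest(*tagged, fillvalue=_SKIP))
    return [item for item in merged if item is not _SKIP]


schedule = interleave_requests(
    {"engine_a": ["sum", "printf", "memcpy"], "engine_b": ["sum"]}
)
```

With this ordering, a per-engine delay between requests can overlap with work sent to the other engines, which is one way the total analysis time could be reduced.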

  14. Outline • Background • Motivation • Methodology • Case study • Conclusion

  15. Case Study (1/2) • PreciseCalc Project • Open source project • Hosted on SourceForge • Using the Koders search engine • Several full matches found • Matches for mathematical functions

  16. Case Study (2/2) • Malware Analysis • Low-level API matching • Offline Analysis proves more useful • Gives insight into the potential code output • Screenshots • Example 1: File I/O • Example 2: Screen Capture • Example 3: Network Connectivity • Example 4: Loading Libraries • Example 5: Services • Example 6: Low-level Network

  17. Example 1. File I/O

  18. Example 2. Screen Capture

  19. Example 3. Network Connectivity

  20. Example 4. Loading Libraries

  21. Example 5. Services

  22. Example 6. Low-level Network Con.

  23. Outline • Background • Motivation • Methodology • Case study • Conclusion

  24. Conclusion • Improved the idea of RE-Google • Offline Analysis, Multiple Search Engines • Better results handling • Automated tool for reverse engineers • Malware Analysis • Limitations • Quality of output depends on the repositories • Currently optimized for C/C++ • Some features may not always be available • For validation, we need all source files (CFG)

  25. Q&A • Thank you. • Q&A?
