Context-Sensitive Analysis of Obfuscated x86 Executables
Arun Lakhotia (1), Davidson Boccardo (2), Anshuman Singh (1), and Aleardo Manacero Jr. (2)
(1) University of Louisiana at Lafayette, USA
(2) Paulista State University (UNESP), Brazil
PEPM 2010 (01/19/10), Madrid, Spain

Disassembled binary with procedures: An example

    Main:                     Max:
    L1: PUSH 4                L9:  MOV eax, [esp+4]
    L2: PUSH 2                L10: MOV ebx, [esp+8]
    L3: CALL Max              L11: CMP eax, ebx
    L4: PUSH 6                L12: JG L14
    L5: PUSH 4                L13: MOV eax, ebx
    L6: CALL Max              L14: RET 8
    L7: PUSH 0
    L8: CALL ExitProcess

Context-sensitive interprocedural data-flow analysis: classical methods
- Call-string
  - Sharir and Pnueli's k-call-string method, which maps a call string to its k-length suffix.
  - Emami et al.'s method, which represents each recursive path in a call string by a single node.
- Procedure summary
- Inlining

Assumptions of call-string-based approaches
- The program uses special instructions, such as call and ret, that can be identified and paired statically.
- Valid and invalid paths in the ICFG can be described in terms of appropriate pairing of call-ret edges.

Call and Ret are atomic
Call and Ret are atomic in the sense that they:
- transfer control; and
- change context.

Call obfuscation
Call and Ret can be obfuscated using instructions that transfer control and change context separately.
Call obfuscation can be employed by:
- Malware writers ⇒ to hide malicious behavior and to evade detection.
- Software developers ⇒ to protect intellectual property and to increase security.

Call obfuscation using push/ret instructions

Call obfuscation using push/jmp instructions
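A minimal sketch of the two obfuscations named on these slides, assuming the common encodings: push/jmp pushes the return address and jumps to the callee, while push/ret pushes the return address, pushes the callee address, and lets ret transfer control. The toy machine, its opcodes, and the labels are hypothetical and do not model real x86 encodings. The point is only that all three variants reach the callee and return to the instruction after the call site, although only the first one contains a call instruction.

```python
# Toy machine (hypothetical): a program maps integer labels to (opcode, operand).

def run(program, entry=0, max_steps=100):
    """Execute until 'halt'; return the visited labels and the final stack."""
    stack, eip, trace = [], entry, []
    for _ in range(max_steps):
        op, arg = program[eip]
        trace.append(eip)
        if op == "halt":
            return trace, stack
        if op == "push":      # changes context only
            stack.append(arg)
            eip += 1
        elif op == "jmp":     # transfers control only
            eip = arg
        elif op == "call":    # atomic: pushes the return address and transfers control
            stack.append(eip + 1)
            eip = arg
        elif op == "ret":     # atomic: pops an address and transfers control to it
            eip = stack.pop()
        else:                 # "nop": stands in for the callee body
            eip += 1
    raise RuntimeError("step limit reached")

CALLEE = {10: ("nop", None), 11: ("ret", None)}

plain = {0: ("call", 10), 1: ("halt", None), **CALLEE}
push_jmp = {0: ("push", 2), 1: ("jmp", 10), 2: ("halt", None), **CALLEE}
push_ret = {0: ("push", 3), 1: ("push", 10), 2: ("ret", None),
            3: ("halt", None), **CALLEE}

for name, prog in [("call", plain), ("push/jmp", push_jmp), ("push/ret", push_ret)]:
    trace, stack = run(prog)
    print(name, trace, stack)
```

In the push/ret variant the first ret behaves as a call, so pairing call and ret instructions statically would mismatch the contexts.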

Motivation
Classical call-string-based analyses are not directly applicable to context-sensitive analysis of binaries that have obfuscated calls.
This is because they are tied to the semantics of procedure call and return statements of high-level languages, and therefore to the call and ret instructions of assembly language.

Proposed method
Objective: Design a context-sensitive analysis, based on program semantics and abstract interpretation, that is resilient to call and ret obfuscation attacks.

Steps
1. Context abstractions (generic versions independent of ICFG-based definitions)
2. Context-trace semantics (cannot rely on ICFG-based soundness results)
3. Language (a simple assembly language without call and ret)
4. Stack context (to model change of context)
5. Transfer of control (modeled using value-set analysis)
6. Derive the context-sensitive analyzer from the context-insensitive one
7. Prove soundness of our analysis

Generalized notion of contexts
Opening and closing instructions are defined by two sets, written here as $\mathcal{O}$ and $\mathcal{C}$:
- $\mathcal{O} \subseteq I$: the set of instructions that open contexts.
- $\mathcal{C} \subseteq I$: the set of instructions that close contexts.
For example, in conventional interprocedural analysis, $\mathcal{O}$ contains the call instructions and $\mathcal{C}$ contains the ret instructions.
A context-string is a sequence of instructions that open contexts; the set of context strings is $\mathcal{O}^{*} \subseteq I^{*}$.

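k-context
Let $\mathcal{O}$ denote the set of instructions that open contexts. Let $\mathcal{O}^{k}$ be the set of sequences of opening contexts of length $\le k$, together with the sequences of length $k + 1$ created by appending $\top$ to $k$-length sequences of opening contexts. An element of $\mathcal{O}^{k}$ is called a k-context.
We can establish a map $\alpha_k : \mathcal{O}^{*} \to \mathcal{O}^{k}$ as:
$$\alpha_k\,\nu = \begin{cases} \nu & \text{if } |\nu| \le k \\ \nu_k . \top & \text{otherwise, where } \exists \nu' : \nu = \nu_k \nu' \wedge |\nu_k| = k \end{cases}$$
$\mathcal{O}^{*}$ and $\mathcal{O}^{k}$ form a Galois insertion with the abstraction map $\alpha_k$.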
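A minimal sketch of the k-context abstraction, assuming a context string is written with the most recent opening instruction first, so that the retained part is its length-k prefix. The names `alpha_k` and `TOP` and the instruction names `c1`, `c2`, `c3` are illustrative only.

```python
TOP = "⊤"  # stands for "some further, unrecorded opening contexts"

def alpha_k(nu, k):
    """Abstract a context string (tuple, most recent opening first) to a k-context."""
    if len(nu) <= k:
        return tuple(nu)              # short strings are kept exactly
    return tuple(nu[:k]) + (TOP,)     # keep k openings, summarize the rest as TOP

print(alpha_k(("c2", "c1"), 2))
print(alpha_k(("c2", "c3", "c2", "c1"), 2))
```

Every string longer than k maps to a string of length k + 1, so the abstract domain is finite whenever the set of opening instructions is finite.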

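ℓ-context
Let $\mathcal{O}$ denote the set of instructions that open contexts. Let $\mathcal{O}^{\ell}$ be the set of sequences of opening contexts of length $\le |\mathcal{O}|$ in which cyclic subsequences are represented by $+$. For example, the term $c^{+}$ represents all cyclic context strings from $c$ to $c$.
A map $\alpha_\ell : \mathcal{O}^{*} \to \mathcal{O}^{\ell}$ can be defined such that $\mathcal{O}^{*}$ and $\mathcal{O}^{\ell}$ form a Galois insertion with the abstraction map $\alpha_\ell$.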
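The slide states that such a map can be defined but does not give it. One possible reading, offered purely as an assumption, folds every cycle from an opening instruction back to itself into a single entry marked with +. The function name `alpha_l`, the tuple representation (most recent opening first), and the instruction names are hypothetical.

```python
def alpha_l(nu):
    """Abstract a context string (tuple, most recent opening first) to an l-context."""
    abstract = []                      # oldest first: [instruction, is_cyclic]
    for c in reversed(nu):             # replay the openings from oldest to newest
        seen = [entry[0] for entry in abstract]
        if c in seen:
            idx = seen.index(c)
            del abstract[idx + 1:]     # fold the cycle that leads from c back to c
            abstract[idx][1] = True    # and remember it as c+
        else:
            abstract.append([c, False])
    return tuple(c + "+" if cyclic else c for c, cyclic in reversed(abstract))

print(alpha_l(("c2", "c3", "c2", "c1")))
print(alpha_l(("c5", "c5", "c2", "c4", "c2", "c1")))
```

Because each instruction occurs at most once in the result, the abstract string never exceeds the number of opening instructions, which matches the length bound stated on the slide.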

Examples of context abstractions

    Context              2-Context    ℓ-Context
    c2 c1                c2 c1        c2 c1
    c2 c3 c2 c1          c2 c3 ⊤      c2+ c1
    c2 c4 c2 c1          c2 c4 ⊤      c2+ c1
    c2 c4 c2 c3 c2 c1    c2 c4 ⊤      c2+ c1
    c2 c3 c2 c4 c2 c1    c2 c3 ⊤      c2+ c1
    c3 c2 c4 c2 c1       c3 c2 ⊤      c3 c2+ c1
    c2 c4 c2 c1          c2 c4 ⊤      c2+ c1
    c5 c2 c4 c2 c1       c5 c2 ⊤      c5 c2+ c1
    c3 c5 c2 c4 c2 c1    c3 c5 ⊤      c3 c5 c2+ c1
    c5 c5 c2 c4 c2 c1    c5 c5 ⊤      c5+ c2+ c1
    c2 c1                c2 c1        c2 c1
    ε                    ε            ε

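Context-trace semantics
A context-trace is a pair of a context string and a trace, $(\nu, \sigma) \in (\mathcal{O}^{*} \times \Sigma^{*})$, where $\mathcal{O}^{*}$ is the set of context strings.
The set of all context-traces of a program, an element of $\wp(\mathcal{O}^{*} \times \Sigma^{*}) \equiv \mathcal{O}^{*} \to \wp(\Sigma^{*})$, gives its context-trace semantics.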
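The equivalence between a set of context-trace pairs and a map from context strings to sets of traces can be illustrated with a small sketch. The representation of context strings and traces as tuples, and the names `by_context` and `to_pairs`, are assumptions made for illustration.

```python
from collections import defaultdict

def by_context(context_traces):
    """Set of (context string, trace) pairs -> map from context string to set of traces."""
    table = defaultdict(set)
    for nu, sigma in context_traces:
        table[nu].add(sigma)
    return dict(table)

def to_pairs(table):
    """Inverse direction: flatten the map back into a set of pairs."""
    return {(nu, sigma) for nu, traces in table.items() for sigma in traces}

pairs = {
    (("c1",), ("s0", "s1")),
    (("c1",), ("s0", "s2")),
    ((), ("s0",)),
}
table = by_context(pairs)
print(table)
```

Going from pairs to the map and back returns the original set, which is the sense in which the two views carry the same information.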

Language

Syntactic categories:
    b ∈ B            (boolean expressions)
    e, e′ ∈ E        (integer expressions)
    i ∈ I            (instructions)
    l, l′ ∈ L ⊆ Z    (labels)
    z ∈ Z            (integers)
    p ∈ P            (programs)
    r ∈ R            (references)

Syntax:
    e ::= l | z | r | ∗r | e1 op e2      (op ∈ {+, −, ∗, /, ...})
    b ::= true | false | e1 < e2 | ¬b | b1 && b2
    i ::= l : esp = esp + e ‖ eip = e′
        | l : esp = e ‖ eip = e′
        | l : ∗esp = e ‖ eip = e′
        | l : r = e ‖ eip = e′
        | l : ∗r = e ‖ eip = e′
        | l : if (b) eip = e ; eip = l′
    p ::= seq(i)

Mapping Call and Ret in our language
An instruction "Call l" may be mapped to the following sequence of instructions in our language:
    l0 : esp = esp − 1 ‖ eip = l1
    l1 : ∗esp = l2 ‖ eip = l
where l2 is the address of the instruction after the call instruction. These two instructions need not appear contiguously in the code.
A Ret instruction may be mapped to the following instruction in our language:
    l0 : esp = esp + 1 ‖ eip = ∗esp
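A minimal sketch of this mapping as an executable toy interpreter. It assumes that the two assignments of an instruction both read the state as it was before the instruction ran, which is what makes the Ret encoding read the return address from the old top of stack. The state layout, the concrete labels (0, 1, 2, 100), and the initial esp value are hypothetical.

```python
def step(state, instr):
    """Run one instruction of the form 'target = value, eip = next' on the old state."""
    target, value_of, next_of = instr
    value, nxt = value_of(state), next_of(state)   # both right-hand sides see the old state
    mem = dict(state["mem"])
    esp = state["esp"]
    if target == "esp":
        esp = value
    elif target == "*esp":
        mem[state["esp"]] = value
    return {"esp": esp, "eip": nxt, "mem": mem}

CALLEE, RETURN_TO = 100, 2   # hypothetical labels for l and l2
program = {
    # l0: esp = esp - 1, eip = l1
    0: ("esp", lambda s: s["esp"] - 1, lambda s: 1),
    # l1: *esp = l2, eip = l
    1: ("*esp", lambda s: RETURN_TO, lambda s: CALLEE),
    # Ret at the callee: esp = esp + 1, eip = *esp
    CALLEE: ("esp", lambda s: s["esp"] + 1, lambda s: s["mem"][s["esp"]]),
}

state = {"esp": 10, "eip": 0, "mem": {}}
trace = [state["eip"]]
while state["eip"] in program:
    state = step(state, program[state["eip"]])
    trace.append(state["eip"])
print(trace, state["esp"])
```

The run visits the two call instructions, the callee, and then the return label, and the stack pointer ends at its initial value.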

Stack context
Idea: make the information about instructions that manipulate the stack pointer part of the context.
The stack context can be described by the sets of opening contexts and closing contexts, $\mathcal{O}_{asm} \subseteq I \times \mathbb{N}$ and $\mathcal{C}_{asm} \subseteq I \times \mathbb{N}$ respectively, which are defined as:
$$\mathcal{O}_{asm} \triangleq \{ (i, n) \mid \exists \delta, \delta' : \delta' \in (I\, i\, \delta) \wedge (\delta'\, esp) = (\delta\, esp) - n \}$$
$$\mathcal{C}_{asm} \triangleq \{ (i, n) \mid \exists \delta, \delta' : \delta' \in (I\, i\, \delta) \wedge (\delta'\, esp) = (\delta\, esp) + n \}$$
A context string is a sequence belonging to $\mathcal{O}^{*}_{asm}$.
The k-context and ℓ-context abstractions can be applied to $\mathcal{O}^{*}_{asm}$ to reduce the complexity of the analysis.
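A minimal sketch of how an instruction could be classified from the change it makes to esp. The classification follows the two set definitions; the rule used in `update` for closings (a closing of size n cancels the most recent openings whose sizes sum to n) is an assumption, since the slide defines the sets but not how a closing is matched. All names and labels are hypothetical.

```python
def classify(label, esp_before, esp_after):
    """Return ('open', (label, n)), ('close', (label, n)), or None if esp is unchanged."""
    delta = esp_after - esp_before
    if delta < 0:
        return ("open", (label, -delta))    # esp decreased by n: opens a context
    if delta > 0:
        return ("close", (label, delta))    # esp increased by n: closes a context
    return None

def update(context, event):
    """Update a context string (tuple, most recent opening first) with one event."""
    if event is None:
        return context
    kind, (label, n) = event
    if kind == "open":
        return ((label, n),) + context
    remaining = n                           # assumed rule: cancel openings totalling n
    while context and remaining > 0:
        remaining -= context[0][1]
        context = context[1:]
    return context

ctx = ()
ctx = update(ctx, classify("l0", 10, 9))    # esp = esp - 1 opens a context
ctx_after_open = ctx
ctx = update(ctx, classify("l1", 9, 9))     # *esp = l2 leaves esp unchanged
ctx = update(ctx, classify("ret", 9, 10))   # esp = esp + 1 closes it
print(ctx_after_open, ctx)
```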

Transfer of control
Upon execution of each instruction, the instruction pointer register, eip, is updated with the label (a numerical value) of the next instruction to be executed.
The value of the label may be computed from an expression involving values of registers and memory locations.
We use Balakrishnan and Reps' Value-Set Analysis (VSA) to recover information about the contents of memory locations and registers.
VSA uses the domain $RIC = \mathbb{N} \times \mathbb{Z} \times \mathbb{Z}$ to abstract $\wp(\mathbb{Z})$.
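A minimal sketch of how a triple can abstract a set of integers, assuming the triple is read as (stride, lower bound, upper bound) in the manner of a strided interval. The function names `alpha`, `gamma`, and `join` are illustrative and are not the API of any VSA implementation.

```python
from functools import reduce
from math import gcd

def alpha(values):
    """Abstract a non-empty finite set of integers to (stride, lower, upper)."""
    lo, hi = min(values), max(values)
    stride = reduce(gcd, (v - lo for v in values), 0)   # 0 for a singleton
    return (stride, lo, hi)

def gamma(ric):
    """Concretize (stride, lower, upper) back to the set of integers it stands for."""
    stride, lo, hi = ric
    if stride == 0:
        return {lo}
    return set(range(lo, hi + 1, stride))

def join(a, b):
    """Least upper bound of two triples under this reading."""
    (s1, l1, h1), (s2, l2, h2) = a, b
    stride = reduce(gcd, (s1, s2, abs(l1 - l2)), 0)
    return (stride, min(l1, l2), max(h1, h2))

targets = {2, 6, 14}          # e.g. candidate values of eip
ric = alpha(targets)
print(ric, sorted(gamma(ric)))
```

The concretization may contain values that were not in the original set (here 10), which is the over-approximation that keeps the analysis sound when it resolves the targets of computed control transfers.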

Derivation of a static analyzer
The analysis is derived from a chain of Galois connections linking the concrete domain $\wp((I \times Store)^{*})$ to the analysis domain $I \to AbStore$. The steps of the derivation are:
- The set $\wp((I \times Store)^{*})$, called the set of traces, is approximated by a trace of sets, represented by $(\wp(I \times Store))^{*}$.
- The trace of sets is equivalent to $(I \to \wp(Store))^{*}$.
- This sequence of mappings from instructions to sets of stores can be approximated by $I \to \wp(Store)$.
- Finally, a Galois connection between $\wp(Store)$ and $AbStore$ completes the analysis.
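The first three steps of the derivation can be sketched on concrete data. The example assumes, purely for illustration, that a store is a single integer and a trace is a tuple of (instruction label, store) pairs; the function names are hypothetical, and the last step (abstracting each set of stores) is not shown.

```python
from itertools import zip_longest

def trace_of_sets(traces):
    """Set of traces -> trace of sets: collect the states seen at each position."""
    columns = zip_longest(*traces)   # pads shorter traces with None
    return [{state for state in column if state is not None} for column in columns]

def per_instruction(sets):
    """Trace of sets -> one map from instruction to the set of stores seen there."""
    result = {}
    for states in sets:
        for instr, store in states:
            result.setdefault(instr, set()).add(store)
    return result

traces = {
    (("l1", 0), ("l2", 1)),
    (("l1", 5), ("l3", 6), ("l2", 7)),
}
sets = trace_of_sets(traces)
print(sets)
print(per_instruction(sets))
```

Each step forgets something: the trace of sets no longer records which states belonged to the same execution, and the final map also forgets the position at which a state occurred.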

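Deriving the context-sensitive analyzer
Starting from the concrete domain $\mathcal{O}^{*}_{asm} \xrightarrow{\Pi_{asm}} \wp(\Sigma^{*})$, where $\mathcal{O}^{*}_{asm}$ is the set of stack context strings, and the domain of Venable et al.'s context-insensitive analyzer, $I \to R + L \to ASG \times RIC$, we obtain our context-sensitive analyzer $\hat{\mathcal{O}}^{\ell}_{asm} \to I \to R + L \to RIC$ using the following results:
1. $\mathcal{O}^{*}_{asm} \sqsubseteq \hat{\mathcal{O}}^{\ell}_{asm}$
2. $\wp(\mathbb{Z}) \sqsubseteq RIC$
3. $\mathcal{O}^{*} \xrightarrow{\Pi} \wp(\Sigma^{*}) \equiv \wp(\Sigma^{*})$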

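Soundness
The concrete context-trace semantics is given by the least fixpoint of the function
$$F_c : (\mathcal{O}^{*}_{asm} \xrightarrow{\Pi_{asm}} \wp(\Sigma^{*})) \to (\mathcal{O}^{*}_{asm} \xrightarrow{\Pi_{asm}} \wp(\Sigma^{*})),$$
where $\Sigma = I \times (R + L \to \mathbb{Z})$ and $\mathcal{O}^{*}_{asm}$ is the set of stack context strings.
The context-trace semantics of the context-sensitive analyzer is given by the least fixpoint of the function
$$F^{\#} : (\hat{\mathcal{O}}^{\ell}_{asm} \to I \to R + L \to RIC) \to (\hat{\mathcal{O}}^{\ell}_{asm} \to I \to R + L \to RIC).$$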

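Soundness
Lemma: $\mathcal{O}^{*}_{asm} \xrightarrow{\Pi_{asm}} \wp(\Sigma^{*}) \sqsubseteq \hat{\mathcal{O}}^{\ell}_{asm} \to I \to R + L \to RIC$, where $\mathcal{O}^{*}_{asm}$ is the set of stack context strings and $\hat{\mathcal{O}}^{\ell}_{asm}$ its ℓ-context abstraction.
It follows from the lemma and the fixpoint transfer theorem that $F^{\#}$ is a sound approximation of $F_c$.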

DOC (Detector of Obfuscated Calls)
We implemented our derived analysis in a tool called DOC.
We measured how much the analysis of obfuscated code improves when our ℓ-context-sensitive version of Venable et al.'s analysis is used instead of its context-insensitive version.
We performed the analysis using two sets of programs:
- Programs in the first set were hand-crafted with a known obfuscated calling structure.
- The second set contains W32.Evol.a, a metamorphic virus that employs call obfuscation.

Time evaluation

Size of sets evaluation

Histogram of evaluations for Win32.Evol.a
