
SATTVA: SpArsiTy inspired classificaTion of malware VAriants



  1. SATTVA: SpArsiTy inspired classificaTion of malware VAriants • Lakshmanan Nataraj, S. Karthikeyan, B.S. Manjunath • Vision Research Lab, University of California, Santa Barbara • Sattva (सत्त्व) means Purity

  2. Introduction • The number of malware samples is increasing! • In 2014, Kaspersky Lab reported that they process on average 325,000 new malicious files per day • The main reason for this deluge is malware mutation: the process of creating new malware from existing ones http://usa.kaspersky.com/about-us/press-center/press-releases/kaspersky-lab-detecting-325000-new-malicious-files-every-day

  3. Introduction • Variants are created either by making small changes to the malware code or by changing the structure of the code using executable packers • Based on their function, variants are classified into different malware families • Identifying the family of a malware plays an important role in understanding and thwarting new attacks

  4. Examples of malware variants [Figure: byte-plot images of variants of family Alueron.gen!J and variants of family Fakerean]

  5. Problem Statement • Consider a malware dataset comprising: • N labelled malware • L malware families • P malware per family • The problem is to identify the family of an unknown malware u

  6. Related Work • Static code analysis based features • Disassembles the executable code and studies its control flow • Suffers from obfuscation (packing) • Dynamic analysis based features • Executes malware in a virtual environment and studies its behavior • Time consuming, and many recent malware are VM-aware • Statistical and content based features • Analyzes statistical patterns based on the malware content • n-grams, fuzzy hashing, image similarity based features

  7. Statistical and Content based Features • n-grams • n-grams are computed either on raw bytes or on instructions • n > 1, which makes this computationally expensive • Fuzzy hashing (ssdeep, pehash) • Fuzzy hashes are computed on raw bytes or on PE-parsed data • Does not work well on packed malware • Image similarity • Malware binaries are converted to digital images • Image similarity features (GIST) are computed on the malware images (Malware Images: Visualization and Automatic Classification, L. Nataraj, S. Karthikeyan, G. Jacob, B.S. Manjunath, VizSec 2011)
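To make the n-gram cost concrete, here is a minimal sketch (ours, not from the paper) of a byte bigram histogram; with 256 possible byte values the feature length grows as 256^n, which is why n > 1 quickly becomes expensive:

```python
import numpy as np

def byte_bigram_histogram(path):
    """Histogram of adjacent byte pairs (n = 2) of a binary file."""
    data = np.fromfile(path, dtype=np.uint8)
    # Map each adjacent byte pair to a single index in [0, 256**2).
    pairs = data[:-1].astype(np.int64) * 256 + data[1:]
    return np.bincount(pairs, minlength=256 * 256)
```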

  8. Image Similarity based Features [Block diagram: a malware image is resized, passed through sub-band filtering (sub-bands N = 1 … k), each sub-band is sub-block averaged into an L-D feature vector, and the k vectors are concatenated into a kL-D feature vector]
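The byte-to-image conversion behind these features can be sketched as follows (the fixed width and output size here are illustrative choices of ours; the VizSec 2011 work picks the width based on file size):

```python
import numpy as np
from PIL import Image

def malware_to_image(path, width=256, size=(64, 64)):
    """Reshape raw bytes into a grayscale image and resize it."""
    data = np.fromfile(path, dtype=np.uint8)
    rows = len(data) // width               # drop any trailing partial row
    img = Image.fromarray(data[:rows * width].reshape(rows, width))
    return img.resize(size)                 # resizing causes the artifacts noted on the next slide
```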

  9. Image Similarity based Features • Pros • Fast and compact • Better than static code based analysis (works on both packed and unpacked malware) • Comparable with dynamic analysis • Cons • Arbitrary column cutting and reshaping • Images are resized to a small size for normalization, which introduces interpolation artifacts • A large malware image, on resizing, loses a lot of information

  10. Approach – Signal Representation • Let x be the signal representation of a malware sample • Every entry of x is a byte value of the sample, in the range [0, 255]
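A minimal sketch of this representation (the helper name is ours):

```python
import numpy as np

def to_signal(path):
    """Read a binary file as a 1-D vector of byte values in [0, 255]."""
    return np.fromfile(path, dtype=np.uint8).astype(np.float64)
```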

  11. Variants in Signal Representation [Figure: signal plots of Variant 1 and Variant 2 of the recently exposed Regin malware, which differ in only 7 out of 13,284 bytes (0.0527%)]

  12. Approach – Dataset as a Matrix • Since malware samples are of different sizes, the vectors are zero-padded so that all vectors are of length M, the number of bytes in the largest malware • We now represent the dataset as an M × N matrix A, where every column of A is a malware sample

  13. Approach – Dataset as a Matrix • Further, for every family k (k = 1, 2, …, L), we define an M × P block matrix A_k: A_k = [x_k1, x_k2, …, x_kP] • A can now be represented as a concatenation of block matrices: A = [A_1, A_2, …, A_L]
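A sketch of this construction, assuming the samples are already loaded as 1-D byte vectors grouped by family (helper names are ours):

```python
import numpy as np

def build_dataset(signals_per_family):
    """Zero-pad all signals to length M and stack them as columns of A.

    signals_per_family: list over families k = 1..L, each a list of 1-D arrays.
    Returns A (M x N) and a label array marking which block A_k each column is in.
    """
    signals, labels = [], []
    for k, family in enumerate(signals_per_family):
        for x in family:
            signals.append(x)
            labels.append(k)
    M = max(len(x) for x in signals)
    A = np.zeros((M, len(signals)))
    for j, x in enumerate(signals):
        A[:len(x), j] = x
    return A, np.array(labels)
```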

  14. Approach – Sparse Linear Combination • Let u ∈ R^M be an unknown malware test sample whose family is to be determined • Then u can be represented as a sparse linear combination of the training samples: u = Σ_{k=1..L} Σ_{j=1..P} α_kj x_kj = Aα, where α = [α_11, α_12, …, α_kj, …, α_LP]^T is the coefficient vector

  15. Approach – Sparse Linear Combination • u = Aα [Diagram: the unknown test sample u (M × 1) equals the matrix of training samples A = [A_1 A_2 … A_L] (M × N) times the sparse coefficient vector α (N × 1)]

  16. Illustration • Let the unknown malware belong to family 2 • Then u = α_21 x_21 + α_22 x_22 and α = [0, 0, …, α_21, α_22, …, 0, 0]^T, i.e., only the coefficients associated with family 2 are non-zero

  17. Approach – Sparse Solution • The sparsest solution can be obtained by Basis Pursuit, solving the ℓ1-norm minimization problem: α = argmin_{α′ ∈ R^N} ||α′||_1 subject to u = Aα′, where ||·||_1 represents the ℓ1-norm
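One standard way to solve this Basis Pursuit problem is to rewrite it as a linear program (a sketch of ours; the paper does not commit to a particular solver): write α = p − q with p, q ≥ 0, then minimize the sum of p + q subject to [A, −A][p; q] = u.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, u):
    """min ||alpha||_1 s.t. A @ alpha = u, via the LP reformulation."""
    _, N = A.shape
    res = linprog(c=np.ones(2 * N),
                  A_eq=np.hstack([A, -A]), b_eq=u,
                  bounds=[(0, None)] * (2 * N), method="highs")
    p, q = res.x[:N], res.x[N:]
    return p - q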

  18. Approach – Minimal Residue • To estimate the family of u, we compute residues for every family in the training set and then choose the family with the minimal residue: r_k(u) = ||u − A δ_k(α)||_2, c = argmin_k r_k(u), where δ_k(α) is the characteristic function that selects the coefficients of α associated with family k and zeros out the rest, and c is the index of the estimated family
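In code, the residue rule is a loop over families, masking α down to δ_k(α) (a sketch reusing the labels from the dataset-building snippet above):

```python
import numpy as np

def classify(A, labels, alpha, u):
    """Return the family index with the smallest reconstruction residue."""
    families = np.unique(labels)
    residues = [np.linalg.norm(u - A @ np.where(labels == k, alpha, 0.0))
                for k in families]          # delta_k(alpha) keeps family k only
    return families[int(np.argmin(residues))]
```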

  19. Random Projections • The dimensionality M of malware can be high • We project all the malware to lower dimensions using Random Projections: w = Ru = RAα, where R is a D × M pseudorandom matrix (D ≪ M) and w is a D × 1 lower-dimensional vector
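The slide says "pseudorandom matrix" without fixing a distribution; a Gaussian R is one common, distance-preserving choice (Johnson-Lindenstrauss), sketched here under that assumption:

```python
import numpy as np

def random_projection(M, D, seed=0):
    """A D x M Gaussian random projection matrix, scaled by 1/sqrt(D)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((D, M)) / np.sqrt(D)

# w = R @ u projects a test sample; RA = R @ A projects the whole dataset.
```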

  20. Sparse Solution • The system of equations is underdetermined and can be solved using ℓ1-norm minimization: α = argmin_{α′ ∈ R^N} ||α′||_1 subject to w = RAα′ [Diagram: the projected test sample w (D × 1) equals RA (D × N) times the sparse coefficient vector α (N × 1)]

  21. Complete Approach [Diagram: complete pipeline. A malware binary is converted to its signal representation u; the dataset gives the sparse model u = Aα, with A = [A_1 A_2 … A_L] of size M × N and α of size N × 1; Random Projections reduce this to w = RAα, with RA of size D × N]
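Tying the earlier sketches together, an end-to-end run looks roughly like this (the file path and the helpers to_signal, build_dataset, random_projection, basis_pursuit and classify come from the snippets above, not from the paper):

```python
import numpy as np

# signals_per_family: per-family byte vectors, as in the dataset sketch.
A, labels = build_dataset(signals_per_family)   # M x N training matrix
R = random_projection(A.shape[0], 512)          # D = 512
RA = R @ A                                      # D x N projected dataset
u = to_signal("unknown_sample.bin")             # hypothetical test file
u = np.pad(u, (0, A.shape[0] - len(u)))         # zero-pad to length M (assumes len(u) <= M)
w = R @ u                                       # projected test sample
alpha = basis_pursuit(RA, w)                    # sparse coefficients
family = classify(RA, labels, alpha, w)         # minimal-residue family estimate
```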

  22. Modeling Malware Variants • New variants are created from existing malware samples by making small changes, so both variants share code • We model a malware variant as: u′ = u + e_u = Aα + e_u, where u′ is the vector representing the malware variant and e_u is the error vector

  23. Modeling Malware Variants • This can be expressed in matrix form as: u′ = [A I_M] s_u = B_u s_u, where B_u = [A I_M] is an M × (N + M) matrix, I_M is the M × M identity matrix, and s_u = [α^T, e_u^T]^T • This ensures that the above system of equations is always underdetermined and sparse solutions can be obtained
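Constructing the extended matrix is a one-liner; appending the identity lets the solver absorb a variant's byte-level changes into e_u (a sketch, with the helper name ours):

```python
import numpy as np

def extended_system(A):
    """B_u = [A  I_M], of size M x (N + M)."""
    M = A.shape[0]
    return np.hstack([A, np.eye(M)])

# Solving min ||s'||_1 s.t. u' = B_u @ s' with the basis-pursuit sketch above
# yields s_u; its first N entries are alpha and the last M are the error e_u.
# The projected version on the next slide is B_w = np.hstack([R @ A, np.eye(D)]).
```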

  24. Sparse Solutions in Lower Dimensions • The sparse solution and residues are computed as before, now with the extended matrix: s_w = argmin_{s′} ||s′||_1 subject to w′ = B_w s′, r_k(w′) = ||w′ − B_w δ_k(s_w)||_2, c = argmin_k r_k(w′), where w′ = w + e_w = Ru + e_w, B_w = [RA I_D] is a D × (N + D) matrix, I_D is the D × D identity matrix, and s_w = [α^T, e_w^T]^T

  25. Experiments • Two datasets: Malimg and Malheur • Malimg dataset: 25 families, 80 samples per family, M = 840,960 • Malheur dataset: 23 families, 20 samples per family, M = 3,364,864 • We vary the randomly projected dimension D in {48, 96, 128, 256, 512} • We compare with GIST features of the same dimensions • Two classification methods: Sparse Representation based Classification (SRC) and Nearest Neighbor (NN) • 80% training and 20% testing
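A sketch of this protocol for the NN baseline (feature extraction is assumed to have already produced an N x D feature matrix; the stratified 80/20 split mirrors the setup above):

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def nn_accuracy(features, labels, seed=0):
    """80/20 split, 1-nearest-neighbor classification accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, labels, test_size=0.2, stratify=labels, random_state=seed)
    clf = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)
```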

  26. Results on Malimg Dataset [Plot: classification accuracy (%) vs. projected dimension D for RP+NN, GIST+NN, GIST+SRC and RP+SRC; y-axis 80 to 100, x-axis 0 to 550]

  27. Results on Malimg Dataset • Best classification accuracy of 92.83% for the combination of Random Projections (RP) + Sparse Representation based Classification (SRC) at D = 512 • Accuracies of GIST features for both classifiers are almost the same, in the range 88% to 90% • Lowest accuracy for the RP + Nearest Neighbor (NN) classifier

  28. Results on Malheur Dataset [Plot: classification accuracy (%) vs. projected dimension D for RP+NN, GIST+NN, GIST+SRC and RP+SRC; y-axis 80 to 100, x-axis 0 to 550]

  29. Results on Malheur Dataset • Again, best classification accuracy of 98.66% for the combination of Random Projections (RP) + Sparse Representation based Classification (SRC) at D = 512 • Accuracies of GIST features for both classifiers are almost the same, at around 93% • However, the combination of RP + Nearest Neighbor (NN) classifier also had a high accuracy of 96.06% (projections closely packed)
