SATTVA: SpArsiTy inspired classificaTion of malware VAriants

SLIDE 1

SATTVA: SpArsiTy inspired classificaTion of malware VAriants

Lakshmanan Nataraj, S. Karthikeyan, B.S. Manjunath
Vision Research Lab, University of California, Santa Barbara

Sattva (सत्त्व) means Purity

SLIDE 2

Introduction

  • The number of malware samples is increasing!
  • In 2014, Kaspersky Lab reported that they process on average 325,000 new malicious files per day
  • The main reason for such a deluge is malware mutation: the process of creating new malware from existing ones

http://usa.kaspersky.com/about-us/press-center/press-releases/kaspersky-lab-detecting-325000-new-malicious-files-every-day

SLIDE 3

Introduction

  • Variants are created either by making small changes to the malware code or by changing the structure of the code using executable packers
  • Based on their function, variants are classified into different malware families
  • Identifying the family of a malware plays an important role in understanding and thwarting new attacks

SLIDE 4

Examples of malware variants

[Figure: Variants of Family Alueron.gen!J; Variants of Family Fakerean]

SLIDE 5

Problem Statement

  • Consider a malware dataset comprising:
  • N labelled malware
  • L malware families
  • P malware per family
  • The problem is to identify the family of an unknown malware sample u

SLIDE 6

Related Work

  • Static code analysis based features
  • Disassembles the executable code and studies its control flow
  • Suffers from obfuscation (packing)
  • Dynamic analysis based features
  • Executes malware in a virtual environment and studies its behavior
  • Time consuming, and many recent malware are VM-aware
  • Statistical and content based features
  • Analyzes statistical patterns based on the malware content
  • n-grams, fuzzy hashing, image similarity based features

SLIDE 7

Statistical and Content based Features

  • n-grams
  • n-grams are computed either on raw bytes or instructions
  • n > 1 makes this computationally expensive
  • Fuzzy hashing (ssdeep, pehash)
  • Fuzzy hashes are computed on raw bytes or PE parsed data
  • Does not work well on packed malware
  • Image similarity
  • Malware binaries are converted to digital images
  • Image similarity features (GIST) are computed on the malware images

Malware Images: Visualization and Automatic Classification, L. Nataraj, S. Karthikeyan, G. Jacob, B.S. Manjunath, VizSec 2011

SLIDE 8

Image Similarity based Features

[Figure: GIST feature extraction pipeline. The malware image is resized, passed through sub-band filters (N = 1, ..., k), sub-block averaging is applied to each filtered sub-band, and the resulting L-D feature vectors are concatenated into a kL-D feature vector.]

SLIDE 9

Image Similarity based Features

  • Pros
  • Fast and compact
  • Better than static code based analysis (works on both packed and unpacked malware)
  • Comparable with dynamic analysis
  • Cons
  • Arbitrary column cutting and reshaping
  • Images are resized to a small size for normalization, which introduces interpolation artifacts
  • A large malware image, on resizing, loses a lot of information

SLIDE 10

Approach – Signal Representation

  • Let x be the signal representation of a malware sample
  • Every entry of x is a byte value of the sample in the range [0, 255]

SLIDE 11

Variants in Signal Representation

[Figure: Signal plots of two variants (Variant 1, Variant 2) of the recently exposed Regin malware. They differ in only 7 out of 13,284 bytes (0.0527%).]

SLIDE 12

Approach – Dataset as a Matrix

  • Since malware are of different sizes, the vectors are zero padded so that all vectors have length M, the number of bytes in the largest malware
  • We now represent the dataset as an M × N matrix A, where every column of A is a malware sample
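The padding-and-stacking step can be sketched as follows (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def build_dataset_matrix(signals):
    """Zero-pad each byte signal to length M (the longest sample) and
    stack them as the columns of an M x N matrix A."""
    M = max(len(s) for s in signals)
    A = np.zeros((M, len(signals)))
    for j, s in enumerate(signals):
        A[: len(s), j] = s  # remaining entries stay zero (the padding)
    return A
```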

SLIDE 13

Approach – Dataset as a Matrix

  • Further, for every family k (k = 1, 2, ..., L), we define an M × P block matrix A_k: A_k = [x_k1, x_k2, ..., x_kP]
  • A can now be represented as a concatenation of block matrices: A = [A_1, A_2, ..., A_L]

SLIDE 14

Approach – Sparse Linear Combination

  • Let u ∈ R^M be an unknown malware test sample whose family is to be determined
  • Then u can be represented as a sparse linear combination of the training samples:

u = Σ_{j=1}^{L} Σ_{k=1}^{P} α_jk x_jk = Aα

where α = [α_11, α_12, ..., α_jk, ..., α_LP]^T is the coefficient vector

SLIDE 15

Approach – Sparse Linear Combination

u = Aα

[Figure: the M × 1 unknown test sample u (entries u_1, u_2, ..., u_M) equals the M × N matrix of training samples A (block columns A_1, A_2, ..., A_L) times the N × 1 sparse coefficient vector α (entries α_1, α_2, ..., α_N).]

SLIDE 16

Illustration

  • Let the unknown malware belong to family 2; then u = α_21 x_21 + α_22 x_22

α = [0, 0, ..., α_21, α_22, ..., 0, 0]^T

SLIDE 17

Approach – Sparse Solution

  • The sparsest solution can be obtained by Basis Pursuit, solving the ℓ1-norm minimization problem:

α̂ = argmin_{α' ∈ R^N} ||α'||_1 subject to u = Aα'

where ||·||_1 denotes the ℓ1-norm
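This ℓ1 problem reduces to a linear program by splitting α = p − q with p, q ≥ 0. A sketch using SciPy's linprog (the LP reduction and solver choice are mine; the slides do not specify an implementation):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, u):
    """min ||a||_1 s.t. u = A a, via the LP: min 1^T(p+q) s.t. A(p-q) = u, p,q >= 0."""
    N = A.shape[1]
    res = linprog(c=np.ones(2 * N),
                  A_eq=np.hstack([A, -A]), b_eq=u,
                  bounds=[(0, None)] * (2 * N))
    p, q = res.x[:N], res.x[N:]
    return p - q  # recovered sparse coefficient vector
```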

SLIDE 18

Approach – Minimal Residue

  • To estimate the family of u, we compute residues for every family in the training set and then choose the family with minimal residue:

r_k(u) = ||u − A δ_k(α̂)||_2,  ĉ = argmin_k r_k(u)

where δ_k(α̂) is the characteristic function that selects the coefficients of α̂ associated with family k and zeros out the rest, and ĉ is the index of the estimated family
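The δ_k selection and minimal-residue rule can be sketched as follows (the helper name and the per-column family labels are my assumptions):

```python
import numpy as np

def classify_by_residue(u, A, alpha, labels):
    """Return the family whose selected coefficients delta_k(alpha)
    give the smallest l2 reconstruction residue ||u - A delta_k(alpha)||_2."""
    labels = np.asarray(labels)
    best, best_res = None, np.inf
    for k in np.unique(labels):
        delta_k = np.where(labels == k, alpha, 0.0)  # keep only family k's coefficients
        res = np.linalg.norm(u - A @ delta_k)
        if res < best_res:
            best, best_res = k, res
    return best
```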

SLIDE 19

Random Projections

  • The dimensionality M of a malware sample can be high
  • We project all malware to lower dimensions using Random Projections: w = Ru = RAα, where R is a D × M pseudo-random matrix (D ≪ M) and w is a D × 1 lower dimensional vector
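A sketch of the projection step (the Gaussian choice of R, the scaling, and the seed are my assumptions; any pseudo-random matrix with D ≪ M works similarly):

```python
import numpy as np

def random_projection_matrix(D, M, seed=0):
    """A D x M pseudo-random (Gaussian) projection matrix, D << M."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((D, M)) / np.sqrt(D)

# w = R @ u projects an M-dimensional malware signal down to D dimensions
```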

SLIDE 20

Sparse Solution

  • The system of equations is underdetermined and can be solved using ℓ1-norm minimization:

α̂ = argmin_{α' ∈ R^N} ||α'||_1 subject to w = RAα'

[Figure: the D × 1 projected sample w (entries w_1, ..., w_D) equals the D × N matrix RA (blocks RA_1, ..., RA_L) times the N × 1 sparse coefficient vector α (entries α_1, α_2, ..., α_N).]

SLIDE 21

Complete Approach

[Figure: overview of the complete approach. Malware data is converted to its signal representation, modeled sparsely as u = Aα with the M × N training matrix A (blocks A_1, ..., A_L), and projected to lower dimensions by random projections, giving w = RAα with the D × N matrix RA (blocks RA_1, ..., RA_L).]

SLIDE 22

Modeling Malware Variants

  • New variants are created from existing malware samples by making small changes, and both variants share code
  • We model a malware variant as: u' = u + e_u = Aα + e_u, where u' is the vector representing the malware variant and e_u is the error vector

SLIDE 23

Modeling Malware Variants

  • This can be expressed in matrix form as:

u' = [A I_M] [α^T e_u^T]^T = B_u s_u

where B_u = [A I_M] is an M × (N + M) matrix, I_M is the M × M identity matrix, and s_u = [α^T e_u^T]^T
  • This ensures that the above system of equations is always underdetermined and sparse solutions can be obtained
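The augmented dictionary can be formed directly; a minimal sketch (function and variable names are mine):

```python
import numpy as np

def augment_with_identity(A):
    """Form B_u = [A  I_M], the M x (N + M) dictionary that absorbs
    the per-byte error vector e_u of a variant."""
    M = A.shape[0]
    return np.hstack([A, np.eye(M)])
```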

SLIDE 24

Sparse Solutions in Lower Dimensions

ŝ_w = argmin_{s'} ||s'||_1 subject to w' = B_w s'

r_k(w') = ||w' − B_w δ_k(ŝ_w)||_2,  ĉ = argmin_k r_k(w')

where w' = w + e_w = Ru + e_w, B_w = [RA I_D] is a D × (N + D) matrix, I_D is the D × D identity matrix, and s_w = [α^T e_w^T]^T.

SLIDE 25

Experiments

  • Two datasets: Malimg and Malheur
  • Malimg Dataset: 25 families, 80 samples per family, M = 840,960
  • Malheur Dataset: 23 families, 20 samples per family, M = 3,364,864
  • We vary the randomly projected dimension D in {48, 96, 128, 256, 512}
  • We compare with GIST features of the same dimensions
  • Two classification methods: Sparse Representation based Classification (SRC) and Nearest Neighbor (NN) classifier
  • 80% training and 20% testing

SLIDE 26

Results on Malimg Dataset

[Figure: classification accuracy (80-100%) vs. number of dimensions D (48-512) for RP+NN, GIST+NN, GIST+SRC, and RP+SRC.]

SLIDE 27

Results on Malimg Dataset

  • Best classification accuracy of 92.83% for the combination of Random Projections (RP) + Sparse Representation based Classification (SRC) at D = 512
  • Accuracies of GIST features for both classifiers are almost the same, in the range 88%-90%
  • Lowest accuracy for the RP + Nearest Neighbor (NN) classifier

SLIDE 28

Results on Malheur Dataset

[Figure: classification accuracy (80-100%) vs. number of dimensions D (48-512) for RP+NN, GIST+NN, GIST+SRC, and RP+SRC.]

SLIDE 29

Results on Malheur Dataset

  • Again, the best classification accuracy of 98.66% for the combination of Random Projections (RP) + Sparse Representation based Classification (SRC) at D = 512
  • Accuracies of GIST features for both classifiers are almost the same, at around 93%
  • However, the combination of RP + Nearest Neighbor (NN) classifier also had a high accuracy of 96.06% (projections closely packed)

SLIDE 30

Comparison with Other Features

  • Compare with 3 content based features:
  • ssdeep (fuzzy hash based feature)
  • GIST
  • 2-grams (2^16 dimensions)

Dataset          ssdeep  GIST   2-grams  RP
Malimg Dataset   67.63   89.08  91.75    92.83
Malheur Dataset  81.6    94.21  94.26    98.55

SLIDE 31

AV Labeling and Low Confidence Samples

  • Ground truth labels generated by Anti-Virus (AV) software are not consistent
  • Often, there are singletons or outliers in a family
  • Using sparse modeling, we show how singletons can be rejected

SLIDE 32

Low Confidence Samples

  • Sparsity Coefficient Index (SCI) of a coefficient vector α:

SCI(α) = (L · max_k ||δ_k(α)||_1 / ||α||_1 − 1) / (L − 1)

  • SCI = 1 → the test sample is a linear combination of samples from a single family
  • SCI = 0 → the test sample is spread evenly across all families
  • SCI is a confidence measure, and a threshold τ ∈ [0, 1] can be used to reject potential low confidence samples
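The SCI can be computed from the coefficient vector and the per-column family labels; a minimal sketch (function and variable names are mine):

```python
import numpy as np

def sci(alpha, labels, L):
    """Sparsity Coefficient Index: 1 when all l1 mass sits in one family,
    0 when it is spread evenly across all L families."""
    labels = np.asarray(labels)
    mass = np.abs(alpha).sum()
    per_family = [np.abs(alpha[labels == k]).sum() for k in range(L)]
    return (L * max(per_family) / mass - 1) / (L - 1)
```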

SLIDE 33

Low Confidence Samples

  • For both datasets, we fix D = 512 and vary the threshold τ
  • For the Malimg Dataset, an "accuracy" of 100% is achieved at τ = 0.5, at which 25% of samples are rejected
  • For the Malheur Dataset, an "accuracy" of 100% is achieved at τ = 0.6, with only 5% of samples rejected

SLIDE 34

SCI Threshold for Malheur Dataset

SLIDE 35

Orthogonal Matching Pursuit (OMP)

  • Basis Pursuit (BP) is computationally expensive
  • Orthogonal Matching Pursuit (OMP) is a greedy method that approximately solves the ℓ1-norm minimization
  • It iteratively selects a subset of training samples that are almost orthogonal
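A minimal OMP sketch in NumPy (this is the generic greedy algorithm, not the authors' implementation; it assumes the columns of A are roughly normalized):

```python
import numpy as np

def omp(A, u, n_nonzero):
    """Greedy OMP: repeatedly pick the column most correlated with the
    current residual, then re-fit least squares on the chosen support."""
    residual = u.astype(float)
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        j = int(np.argmax(np.abs(A.T @ residual)))  # best-matching column
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], u, rcond=None)
        residual = u - A[:, support] @ coef
    alpha = np.zeros(A.shape[1])
    alpha[support] = coef
    return alpha
```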

SLIDE 36

Basis Pursuit (BP) vs Orthogonal Matching Pursuit (OMP)

  • OMP is several times faster than BP (18 times for Malimg and 30 times for Malheur)
  • But accuracy is slightly lower for both datasets (a tradeoff)

Dataset          BP Accuracy  OMP Accuracy  BP Comp Time  OMP Comp Time
Malimg Dataset   92.83        89.25         420           24
Malheur Dataset  98.55        97.39         180           6

SLIDE 37

Large Scale Experiments

  • Two diverse large scale datasets (no prior results reported on these)
  • Used OMP on both, with 80% training and 20% testing
  • Offensive Computing Dataset:
  • 2,124 families, 20 samples per family, N = 42,480 and M = 9.3 MB
  • Many families and fewer samples per family
  • Anubis Dataset:
  • 209 behavioral clusters, 176 samples per cluster, N = 36,784, M = 8.1 MB
  • Fewer clusters and more samples per cluster

SLIDE 38

Results on Offensive Computing Dataset

  • Average classification accuracy with 2,124 families = 66.34%
  • 927 families had 100% accuracy with an SCI value of 0.97
  • At an SCI threshold of 0.6, accuracy = 77.08% with 24.78% of samples rejected
  • Overall computation time was 4 hours on a standard desktop without parallelization

SLIDE 39

Results on Anubis Dataset

  • Average classification accuracy with 209 clusters = 57.36%
  • 27 clusters had 100% accuracy and 50 clusters had > 90% accuracy with an SCI value of 0.97
  • At an SCI threshold of 0.6, accuracy = 77.12% with 34.64% of samples rejected
  • Overall computation time was 3 hours on a standard desktop without parallelization

SLIDE 40

Discussion

  • Accuracies for both datasets are similar (77%) at an SCI threshold of 0.6
  • Computation time depends on both the total number of samples and the number of classes

SLIDE 41

Future Work

  • Use Random Projections as malware signatures
  • Project the full malware and individual sections to lower dimensions and represent the malware as a bag of randomly projected features
  • Find the exact source of malware variants
  • Use the error model to find the commonalities between variants and also the exact positions where they vary

SLIDE 42

Conclusion

  • We presented a novel method for identifying malware families using a combination of Sparse Representation based Classification and Random Projections
  • We represented malware binaries as signals, opening avenues for applying signal processing techniques to analyze malware
  • We showed the efficacy and scalability of our method on real, large malware datasets

SLIDE 43

Thank you

Questions?
