SATTVA: SpArsiTy inspired classificaTion of malware VAriants
Lakshmanan Nataraj, S. Karthikeyan, B.S. Manjunath
Vision Research Lab
University of California, Santa Barbara
Sattva (सत्त्व) means Purity
increasing!
Kaspersky Lab processes on average 325,000 new malicious files per day
is: malware mutation: the process of creating new malware from existing
http://usa.kaspersky.com/about-us/press-center/press-releases/kaspersky-lab-detecting-325000-new-malicious-files-every-day
Variants of Family Alueron.gen!J
Variants of Family Fakerean
Malware Images: Visualization and Automatic Classification, L. Nataraj, S. Karthikeyan, G. Jacob, B.S. Manjunath, VizSec 2011
[Figure: GIST feature computation. The malware image is resized and passed through k sub-band filters (N = 1, ..., k); each filtered output is sub-block averaged into an L-D feature vector, and the k vectors are concatenated into a kL-D feature vector.]
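The sub-block averaging and concatenation steps of the pipeline above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sub-band filters themselves are not implemented (random arrays stand in for the filtered outputs), and the grid size, the number of sub-bands `k`, and the function names are all hypothetical.

```python
import numpy as np


def sub_block_average(band: np.ndarray, grid: int = 4) -> np.ndarray:
    """Average a 2-D array over a grid x grid set of equal-sized blocks.

    Returns a vector of L = grid * grid block means (row-major order).
    """
    h, w = band.shape
    if h % grid or w % grid:
        raise ValueError("image sides must be divisible by the grid size")
    # Axes after reshape: (block row, row in block, block column, column in block).
    blocks = band.reshape(grid, h // grid, grid, w // grid)
    return blocks.mean(axis=(1, 3)).ravel()


def concat_features(bands, grid: int = 4) -> np.ndarray:
    """Concatenate the L-D vectors of k filtered outputs into one kL-D vector."""
    return np.concatenate([sub_block_average(b, grid) for b in bands])


# Stand-ins for k = 3 sub-band filter outputs of an 8 x 8 image (assumed sizes).
rng = np.random.default_rng(0)
k, grid = 3, 4
bands = [rng.standard_normal((8, 8)) for _ in range(k)]
features = concat_features(bands, grid)
print(features.shape)
```

With `grid = 4`, each sub-band contributes L = 16 values, so the final vector has kL entries.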
malware)
interpolation artifacts
Variants of the recently exposed Regin malware. They differ in only 7 out of 13,284 (0.0527%)
Variant 1 Variant 2
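The percentage quoted for the two variants can be checked with a short sketch. It assumes the count refers to positions in two equal-length byte sequences, which the slide does not state; the toy data below is made up and is not the Regin samples.

```python
def diff_positions(a: bytes, b: bytes) -> list:
    """Return the indices at which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length inputs")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def percent_different(num_diff: int, total: int) -> float:
    """Share of differing positions, as a percentage."""
    return 100.0 * num_diff / total


# Hypothetical stand-ins for two variants: 13,284 positions, 7 of them changed.
variant_1 = bytes(13284)
mutated = bytearray(variant_1)
for pos in (10, 200, 3000, 4500, 7000, 9999, 13000):
    mutated[pos] = 0xFF
variant_2 = bytes(mutated)

changed = diff_positions(variant_1, variant_2)
pct = percent_different(len(changed), len(variant_1))
print(len(changed), round(pct, 4))  # 7 0.0527
```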
$\mathbf{u} = \sum_{i=1}^{L} \sum_{j=1}^{P} \alpha_{ij}\,\mathbf{A}_{ij}$
$\mathbf{u} = \mathbf{A}\boldsymbol{\alpha}$
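[Figure: $\mathbf{u} = \mathbf{A}\boldsymbol{\alpha}$, where $\mathbf{u} = [u_1, u_2, \ldots, u_M]^T$ ($M \times 1$) is the unknown test sample, $\mathbf{A} = [\mathbf{A}_1\ \mathbf{A}_2\ \ldots\ \mathbf{A}_N]$ ($M \times N$) is the matrix of training samples, and $\boldsymbol{\alpha} = [\alpha_1, \alpha_2, \ldots, \alpha_N]^T$ ($N \times 1$) is the sparse coefficient vector.]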
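$\boldsymbol{\alpha} = [0, 0, \ldots, \alpha_{21}, \alpha_{22}, \ldots, 0, 0]^T$
$\hat{\boldsymbol{\alpha}} = \arg\min_{\boldsymbol{\alpha}' \in \mathbb{R}^{N}} \|\boldsymbol{\alpha}'\|_1 \quad \text{subject to} \quad \mathbf{u} = \mathbf{A}\boldsymbol{\alpha}'$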
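The $\ell_1$ problem above, with test sample $\mathbf{u}$, training matrix $\mathbf{A}$ and coefficient vector $\boldsymbol{\alpha}'$, can be rewritten as a linear program, which is one common way it is solved in practice. The auxiliary vectors $\mathbf{p}$ and $\mathbf{q}$ below are introduced only for illustration and do not appear in the slides; they are the positive and negative parts of $\boldsymbol{\alpha}' = \mathbf{p} - \mathbf{q}$.

```latex
\begin{aligned}
&\min_{\boldsymbol{\alpha}' \in \mathbb{R}^{N}} \|\boldsymbol{\alpha}'\|_1
  \quad \text{subject to} \quad \mathbf{u} = \mathbf{A}\boldsymbol{\alpha}' \\
\Longleftrightarrow\quad
&\min_{\mathbf{p},\,\mathbf{q} \in \mathbb{R}^{N}} \mathbf{1}^{T}(\mathbf{p} + \mathbf{q})
  \quad \text{subject to} \quad \mathbf{A}(\mathbf{p} - \mathbf{q}) = \mathbf{u},
  \quad \mathbf{p} \ge \mathbf{0},\ \mathbf{q} \ge \mathbf{0}
\end{aligned}
```

At an optimum, $p_n$ and $q_n$ are never both positive for the same index $n$, so $\mathbf{1}^{T}(\mathbf{p} + \mathbf{q})$ equals $\|\mathbf{p} - \mathbf{q}\|_1$.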
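$r_k(\mathbf{u}) = \|\mathbf{u} - \mathbf{A}\hat{\boldsymbol{\alpha}}_k\|_2$
$\text{class}(\mathbf{u}) = \arg\min_{k} r_k(\mathbf{u})$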
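The residual-based decision rule can be sketched as below. This assumes a coefficient vector has already been estimated, and that the class-k part of it keeps the coefficients of class-k training samples and zeros the rest; the function name and the toy data are hypothetical.

```python
import numpy as np


def classify_by_residual(u, A, labels, alpha_hat):
    """Return (predicted_class, residuals) from class-wise reconstruction errors."""
    labels = np.asarray(labels)
    residuals = {}
    for k in np.unique(labels):
        # Keep only the coefficients that belong to class k.
        alpha_k = np.where(labels == k, alpha_hat, 0.0)
        residuals[int(k)] = float(np.linalg.norm(u - A @ alpha_k))
    predicted = min(residuals, key=residuals.get)
    return predicted, residuals


# Toy training matrix: columns 0-1 belong to class 1, columns 2-3 to class 2.
A = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9],
              [0.0, 0.0, 0.0, 0.0]])
labels = [1, 1, 2, 2]
alpha_hat = np.array([0.0, 0.0, 0.5, 0.5])  # assumed already estimated
u = A @ alpha_hat                            # test sample built from class 2

predicted, residuals = classify_by_residual(u, A, labels, alpha_hat)
print(predicted, residuals)
```

Here the test sample is built only from class-2 columns, so the class-2 residual is zero and class 2 is selected.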
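$\hat{\boldsymbol{\alpha}} = \arg\min_{\boldsymbol{\alpha}' \in \mathbb{R}^{N}} \|\boldsymbol{\alpha}'\|_1 \quad \text{subject to} \quad \mathbf{w} = \mathbf{R}\mathbf{A}\boldsymbol{\alpha}'$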
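The reason the sparse model survives random projection is that projection is linear: if a sample u equals A times α, then w = Ru equals (RA) times the same α. The sketch below checks this numerically. The Gaussian entries, the scaling by the square root of D, and all dimensions are assumptions made for illustration; the slides do not specify how R is drawn.

```python
import numpy as np


def random_projection_matrix(D: int, M: int, seed: int = 0) -> np.ndarray:
    """Draw a D x M Gaussian random projection matrix (assumed distribution)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((D, M)) / np.sqrt(D)


M, N, D = 1000, 20, 50                 # original dim, training samples, projected dim
rng = np.random.default_rng(1)
A = rng.standard_normal((M, N))        # stand-in training matrix
alpha = np.zeros(N)
alpha[[3, 4]] = [0.7, 0.3]             # sparse coefficient vector
u = A @ alpha                          # test sample in the original space

R = random_projection_matrix(D, M)
w = R @ u                              # projected test sample (D x 1)
RA = R @ A                             # projected training matrix (D x N)

# The same sparse coefficients explain the projected sample.
print(np.allclose(w, RA @ alpha))
```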
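[Figure: Signal representation, random projections and sparse modeling. The sparse model $\mathbf{u} = \mathbf{A}\boldsymbol{\alpha}$ (with $\mathbf{u}$ of size $M \times 1$, $\mathbf{A} = [\mathbf{A}_1\ \mathbf{A}_2\ \ldots\ \mathbf{A}_N]$ of size $M \times N$, and $\boldsymbol{\alpha}$ of size $N \times 1$) becomes, after random projection, $\mathbf{w} = \mathbf{R}\mathbf{A}\boldsymbol{\alpha}$, where $\mathbf{w} = [w_1, \ldots, w_D]^T$ is $D \times 1$, $\mathbf{R}\mathbf{A} = [\mathbf{R}\mathbf{A}_1\ \ldots\ \mathbf{R}\mathbf{A}_N]$ is $D \times N$, and $\boldsymbol{\alpha} = [\alpha_1, \alpha_2, \ldots, \alpha_N]^T$ is $N \times 1$.]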
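$\min_{\boldsymbol{\alpha}' \in \mathbb{R}^{N}} \|\boldsymbol{\alpha}'\|_1 \quad \text{subject to} \quad \mathbf{w}' = \mathbf{B}_w \mathbf{s}_w$
$r_k(\mathbf{w}') = \|\mathbf{w}' - \mathbf{B}_w \hat{\mathbf{s}}_{w,k}\|_2$
$\text{class}(\mathbf{w}') = \arg\min_{k} r_k(\mathbf{w}')$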
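One way to read the augmented system is sketched below. It assumes, following the usual robust sparse-representation formulation, that the matrix B_w stacks the projected training matrix RA next to a D x D identity block, and that s_w stacks the coefficient vector α on top of an error vector e, so that a corrupted sample w' = RAα + e is still a linear system. The slides do not spell out these definitions, so this is an assumption, and all names and sizes are hypothetical.

```python
import numpy as np


def build_augmented_system(RA: np.ndarray, alpha: np.ndarray, e: np.ndarray):
    """Return (B_w, s_w) with B_w = [RA, I] and s_w = [alpha; e] (assumed layout)."""
    D = RA.shape[0]
    B_w = np.hstack([RA, np.eye(D)])
    s_w = np.concatenate([alpha, e])
    return B_w, s_w


rng = np.random.default_rng(0)
D, N = 6, 4
RA = rng.standard_normal((D, N))       # stand-in projected training matrix
alpha = np.array([0.0, 0.8, 0.2, 0.0]) # sparse coefficients
e = np.zeros(D)
e[2] = 0.5                             # sparse corruption in one coordinate

w_noisy = RA @ alpha + e               # corrupted projected sample w'
B_w, s_w = build_augmented_system(RA, alpha, e)
print(np.allclose(B_w @ s_w, w_noisy))
```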
[Figure: Accuracy (80 to 100) versus number of dimensions (50 to 550) for RP+NN, GIST+NN, GIST+SRC and RP+SRC.]
[Figure: Accuracy (80 to 100) versus number of dimensions (50 to 550) for RP+NN, GIST+NN, GIST+SRC and RP+SRC.]
Dataset           ssdeep   GIST    2-grams   RP
Malimg Dataset    67.63    89.08   91.75     92.83
Malheur Dataset   81.6     94.21   94.26     98.55
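$\mathrm{SCI}(\boldsymbol{\alpha}) = \dfrac{L \cdot \max_i \|\delta_i(\boldsymbol{\alpha})\|_1 / \|\boldsymbol{\alpha}\|_1 - 1}{L - 1}$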
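The index above can be computed directly from a coefficient vector and the class label of each training sample. The sketch below assumes the class-i part of α keeps only the coefficients of class-i samples and that L is the number of classes; the function name and the toy inputs are hypothetical.

```python
def sparsity_concentration_index(alpha, labels):
    """Return a value in [0, 1]: 1 if all weight is in one class, 0 if spread evenly."""
    classes = sorted(set(labels))
    L = len(classes)
    if L < 2:
        raise ValueError("need at least two classes")
    total = sum(abs(a) for a in alpha)
    if total == 0:
        raise ValueError("coefficient vector is all zeros")
    # l1 norm of the coefficients belonging to each class.
    per_class = [sum(abs(a) for a, lab in zip(alpha, labels) if lab == k)
                 for k in classes]
    return (L * max(per_class) / total - 1) / (L - 1)


labels = [1, 1, 2, 2, 3, 3]
concentrated = sparsity_concentration_index([0, 0, 0.5, 0.5, 0, 0], labels)
spread = sparsity_concentration_index([1, 1, 1, 1, 1, 1], labels)
print(concentrated, spread)  # 1.0 0.0
```

A value near 1 means the coefficients concentrate on a single class; a value near 0 means they are spread across classes.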
Dataset           BP Accuracy   OMP Accuracy   BP Comp Time   OMP Comp Time
Malimg Dataset    92.83         89.25          420            24
Malheur Dataset   98.55         97.39          180            6
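For readers unfamiliar with the greedy alternative compared in the table, a textbook form of orthogonal matching pursuit (OMP) is sketched below. This is a generic version, not necessarily the variant or stopping rule used for the reported numbers; the function name, the sparsity budget and the toy dictionary are assumptions.

```python
import numpy as np


def omp(A: np.ndarray, u: np.ndarray, n_nonzero: int, tol: float = 1e-10) -> np.ndarray:
    """Greedy sparse coding: pick one column per step, then refit by least squares.

    Assumes every column of A is nonzero.
    """
    N = A.shape[1]
    u = np.asarray(u, dtype=float)
    residual = u.copy()
    support = []
    coef = np.zeros(0)
    col_norms = np.linalg.norm(A, axis=0)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) <= tol:
            break
        # Column most correlated with the current residual.
        correlations = np.abs(A.T @ residual) / col_norms
        if support:
            correlations[support] = 0.0   # never pick the same column twice
        support.append(int(np.argmax(correlations)))
        # Refit all selected coefficients jointly.
        coef, *_ = np.linalg.lstsq(A[:, support], u, rcond=None)
        residual = u - A[:, support] @ coef
    alpha = np.zeros(N)
    if support:
        alpha[support] = coef
    return alpha


# Toy dictionary with orthonormal columns, where greedy selection is exact.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
alpha_true = np.zeros(8)
alpha_true[[1, 5]] = [2.0, -3.0]
alpha_hat = omp(Q, Q @ alpha_true, n_nonzero=2)
print(np.allclose(alpha_hat, alpha_true))
```

Each iteration costs one correlation pass and one small least-squares solve, which is why a greedy method of this kind is typically much cheaper than solving the full basis pursuit (BP) problem, at some cost in accuracy.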
represent the malware as a bag of randomly projected features
exact positions where they vary