Automatic code features extraction using bio-inspired algorithms

slide-1
SLIDE 1

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/308793546

Automatic code features extraction using bio-inspired algorithms - presentation

Data · October 2016

CITATIONS: 0 · READS: 35

3 authors, including:

Ciprian Oprisa, Universitatea Tehnica Cluj-Napoca

32 PUBLICATIONS 70 CITATIONS

Adrian Coleșa, Universitatea Tehnica Cluj-Napoca

32 PUBLICATIONS 67 CITATIONS

Some of the authors of this publication are also working on this related project: Virtualization-Based Security of User Security-Sensitive Applications.

All content following this page was uploaded by Ciprian Oprisa on 03 October 2016.

slide-2
SLIDE 2

Automatic Code Features Extraction Using Bio-inspired Algorithms

EICAR 2013

Ciprian Oprișa, George Cabău and Adrian Coleșa

Bitdefender, Technical University of Cluj-Napoca

November 18, 2013

slide-3
SLIDE 3

Agenda

1. Introduction
2. Objectives
3. OpCodes Extraction and Normalization
4. Automatic Filters Selection
5. Experimental results
6. Conclusions and future work

C. Oprișa (Bitdefender), Automatic Code Features Extraction Using Bio-inspired Algorithms, November 18, 2013

slide-5
SLIDE 5
  • 1. Introduction

Where are we? (1)

We need to detect malware.

[Diagram: the hash(es) of a sample are looked up in the malware database, with two possible outcomes]
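The hash-based lookup on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions: SHA-256 is assumed as the hash, and an in-memory set stands in for the malware database; `KNOWN_MALWARE_HASHES` and `is_known_malware` are hypothetical names, not part of the presentation.

```python
import hashlib

# Hypothetical stand-in for the malware database: hex-encoded SHA-256 digests
KNOWN_MALWARE_HASHES = {
    hashlib.sha256(b"example malicious payload").hexdigest(),
}


def is_known_malware(sample: bytes) -> bool:
    """Return True if the sample's SHA-256 digest is in the database."""
    return hashlib.sha256(sample).hexdigest() in KNOWN_MALWARE_HASHES


print(is_known_malware(b"example malicious payload"))   # True
print(is_known_malware(b"example malicious payload!"))  # False
```

The second call shows that changing a single byte yields a different digest, so an exact-hash lookup misses even trivially modified samples.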

slide-14
SLIDE 14

Where are we? (2)

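→ push, mov, sub, mov, push, lea, push, call, mov, . . .

→ pmsmplpcmlpctjczczczmJ (one symbol per OpCode)

<pmsmplpc>, <msmplpcm>, <smplpcml>, <mplpcmlp>, <plpcmlpc>, . . . (overlapping 8-grams)

[Diagram: the n-grams are looked up in the malware database, with two possible outcomes]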
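The two steps on this slide, mapping each OpCode to one symbol and sliding a window of length 8 over the result, can be sketched as follows. The `SYMBOLS` table is a hypothetical excerpt inferred from the example: it covers only the five OpCodes visible on the slide, since the full mapping is not shown.

```python
# Hypothetical one-letter symbols, inferred from the example on this slide
SYMBOLS = {"push": "p", "mov": "m", "sub": "s", "lea": "l", "call": "c"}


def normalize(opcodes: list[str]) -> str:
    """Map each OpCode to its symbol and concatenate the result."""
    return "".join(SYMBOLS[op] for op in opcodes)


def ngrams(sequence: str, n: int = 8) -> list[str]:
    """Return all overlapping substrings of length n, in order."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


ops = ["push", "mov", "sub", "mov", "push", "lea", "push", "call", "mov"]
print(normalize(ops))  # pmsmplpcm
print(ngrams("pmsmplpcmlpctjczczczmJ")[:5])
# ['pmsmplpc', 'msmplpcm', 'smplpcml', 'mplpcmlp', 'plpcmlpc']
```

A sequence of length L yields L − n + 1 n-grams, so the 22-symbol sequence above produces 15 of them.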

slide-16
SLIDE 16
  • 2. Objectives

Objectives

Goal

Improve the detection of .NET malware by filtering the OpCodes, so that more meaningful n-grams are extracted:

  • Extract OpCode sequences from .NET applications.
  • Eliminate unreachable code.
  • Design a fitness function that evaluates the quality of an OpCode filter.
  • Use bio-inspired algorithms to find the best filter.

slide-18
SLIDE 18
  • 3. OpCodes Extraction and Normalization

Parsing and disassembling .NET

A .NET executable uses an extension of the Microsoft Portable Executable (PE) format, with many, many metadata tables.

=== Method 4: name='mpress._::Main'; RVA=0x0000254C; FA=0x0000074C; size=0x9A ===
= Exception handlers: 000025D6; =
0000254C: [00]             nop
0000254D: [28 0E 00 00 0A] call 0x0A00000E
00002552: [12 00]          ldloca.s 0x00
00002554: [28 03 00 00 06] call 0x06000003
00002559: [13 06]          stloc.s 0x06
0000255B: [11 06]          ldloc.s 0x06
0000255D: [2D 10]          brtrue.s 0x10
0000255F: [00]             nop
00002560: [72 01 00 00 70] ldstr 0x70000001
00002565: [72 23 00 00 70] ldstr 0x70000023
0000256A: [28 0F 00 00 0A] call 0x0A00000F
0000256F: [26]             pop
00002570: [15]             ldc.i4.m1
00002571: [13 05]          stloc.s 0x05
00002573: [2B 02]          br.s 0x02
00002575: [26]             pop
00002576: [06]             ldloc.0
00002577: [28 10 00 00 0A] call 0x0A000010
0000257C: [80 01 00 00 04] stsfld 0x04000001
00002581: [7E 01 00 00 04] ldsfld 0x04000001
00002586: [6F 11 00 00 0A] callvirt 0x0A000011
0000258B: [0B]             stloc.1
...
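The listing can be reproduced with a minimal linear decoder. This sketch only knows the one-byte OpCodes that appear above, with their operand sizes; real CIL also has two-byte OpCodes (prefixed with 0xFE) and the variable-length `switch`, which are not handled. `OPCODES` and `decode` are illustrative names, not the tool used in the presentation.

```python
# One-byte CIL OpCodes from the listing: byte -> (mnemonic, operand size in bytes)
OPCODES = {
    0x00: ("nop", 0), 0x06: ("ldloc.0", 0), 0x0B: ("stloc.1", 0),
    0x11: ("ldloc.s", 1), 0x12: ("ldloca.s", 1), 0x13: ("stloc.s", 1),
    0x15: ("ldc.i4.m1", 0), 0x26: ("pop", 0), 0x28: ("call", 4),
    0x2B: ("br.s", 1), 0x2D: ("brtrue.s", 1), 0x6F: ("callvirt", 4),
    0x72: ("ldstr", 4), 0x7E: ("ldsfld", 4), 0x80: ("stsfld", 4),
}


def decode(code: bytes, base: int = 0):
    """Linearly decode a method body into (address, mnemonic, operand) tuples."""
    result, pos = [], 0
    while pos < len(code):
        mnemonic, size = OPCODES[code[pos]]  # KeyError for OpCodes outside this table
        # Operands are read as unsigned little-endian (short branch offsets are really signed)
        operand = int.from_bytes(code[pos + 1:pos + 1 + size], "little") if size else None
        result.append((base + pos, mnemonic, operand))
        pos += 1 + size
    return result


body = bytes.fromhex("00 28 0E 00 00 0A 12 00 28 03 00 00 06 13 06 11 06 2D 10")
for addr, mnemonic, operand in decode(body, base=0x254C):
    text = mnemonic if operand is None else f"{mnemonic} {operand:#x}"
    print(f"{addr:08X}: {text}")
```

Metadata tokens such as 0x0A00000E come out correctly only because the four operand bytes are read in little-endian order.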

slide-22
SLIDE 22

CIL instruction types

  • instructions that move data around: ldc (load constant), ldarg (load argument), . . .
  • arithmetic and logic instructions: add, div, or, and, xor, . . .
  • object model instructions: newobj, . . .
  • instructions that modify the control flow:
      – returning instructions, after which execution continues with the next instruction (call, callvirt)
      – unconditional branches (br, br.s)
      – conditional branches (brtrue, brfalse, beq.s)
      – flow disruptive instructions (ret, throw, jmp)
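The control-flow categories above are what the reachability analysis needs to know about each instruction. A minimal lookup, assuming only the mnemonics named on the slide plus the short forms `brtrue.s` and `brfalse.s`, with every other mnemonic treated as ordinary sequential code; `FlowKind` and `flow_kind` are hypothetical names.

```python
from enum import Enum


class FlowKind(Enum):
    SEQUENTIAL = "sequential"        # data movement, arithmetic, object model
    RETURNING = "returning"          # call, callvirt: execution resumes after them
    UNCONDITIONAL = "unconditional"  # br, br.s
    CONDITIONAL = "conditional"      # brtrue, brfalse, beq.s, ...
    DISRUPTIVE = "disruptive"        # ret, throw, jmp


FLOW_KINDS = {
    "call": FlowKind.RETURNING, "callvirt": FlowKind.RETURNING,
    "br": FlowKind.UNCONDITIONAL, "br.s": FlowKind.UNCONDITIONAL,
    "brtrue": FlowKind.CONDITIONAL, "brtrue.s": FlowKind.CONDITIONAL,
    "brfalse": FlowKind.CONDITIONAL, "brfalse.s": FlowKind.CONDITIONAL,
    "beq.s": FlowKind.CONDITIONAL,
    "ret": FlowKind.DISRUPTIVE, "throw": FlowKind.DISRUPTIVE, "jmp": FlowKind.DISRUPTIVE,
}


def flow_kind(mnemonic: str) -> FlowKind:
    """Classify a CIL mnemonic; anything not listed falls through sequentially."""
    return FLOW_KINDS.get(mnemonic, FlowKind.SEQUENTIAL)


print(flow_kind("brtrue.s"))  # FlowKind.CONDITIONAL
print(flow_kind("ldstr"))     # FlowKind.SEQUENTIAL
```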

slide-23
SLIDE 23

Eliminating unreachable code

Enqueue the entry point and the exception handlers.
While the queue is not empty:
  • Dequeue the next address.
  • Sweep until already reached code or the end of the buffer is encountered:
      – unconditional branch → follow the branch
      – conditional branch → enqueue the branch target, continue sweeping
      – flow disruptive instruction → stop the current sweep
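The sweep described above can be sketched as a worklist algorithm. The instruction model is an assumption made for illustration: a method is a dict from address to a hypothetical `Instr` record holding a control-flow kind, the encoded size and, for branches, the absolute target address.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    kind: str                     # "seq" (incl. call/callvirt), "uncond", "cond" or "stop"
    size: int                     # encoded length in bytes
    target: Optional[int] = None  # absolute branch target, if any


def reachable(code: dict[int, Instr], entry: int, handlers: list[int]) -> set[int]:
    """Return the addresses reachable from the entry point and exception handlers."""
    queue = deque([entry, *handlers])
    reached: set[int] = set()
    while queue:
        addr = queue.popleft()
        # Sweep until already reached code or the end of the buffer
        while addr in code and addr not in reached:
            reached.add(addr)
            ins = code[addr]
            if ins.kind == "uncond":    # follow the branch
                addr = ins.target
            elif ins.kind == "cond":    # enqueue the target, keep sweeping
                queue.append(ins.target)
                addr += ins.size
            elif ins.kind == "stop":    # ret / throw / jmp end this sweep
                break
            else:                       # ordinary or returning instruction
                addr += ins.size
    return reached


# Toy method: addresses 3 and 7 can never be executed from the entry point
code = {
    0: Instr("cond", 1, target=4),
    1: Instr("seq", 1),
    2: Instr("uncond", 1, target=6),
    3: Instr("seq", 1),
    4: Instr("seq", 1),
    5: Instr("stop", 1),
    6: Instr("stop", 1),
    7: Instr("seq", 1),
}
print(sorted(reachable(code, entry=0, handlers=[])))  # [0, 1, 2, 4, 5, 6]
```

Passing an exception handler address (for example `handlers=[7]`) makes that code reachable as well, which is why handlers are enqueued next to the entry point.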

slide-30
SLIDE 30

OpCodes normalization

Definition

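The basic normalization function, mapping the OpCodes O to symbols of the alphabet Σ or to the empty string ε:

$$\mathrm{normal} : O \to \Sigma \cup \{\varepsilon\}, \qquad \mathrm{normal}(\texttt{nop}) = \varepsilon, \qquad \mathrm{normal}(\texttt{brtrue}) = \mathrm{normal}(\texttt{brfalse})$$

Definition

Filtering (Λ-normalization), for $\Lambda \subseteq \Sigma$:

$$\mathrm{normal}_\Lambda(o) = \begin{cases} \mathrm{normal}(o), & \text{if } \mathrm{normal}(o) \in \Lambda \\ \varepsilon, & \text{otherwise} \end{cases}$$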
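The two definitions translate directly into code. In this sketch ε is the empty string, and `NORMAL` is a small hypothetical excerpt of the mapping (the full assignment of CIL OpCodes to Σ is not given in the presentation); it only encodes the two stated properties, normal(nop) = ε and normal(brtrue) = normal(brfalse).

```python
EPSILON = ""

# Hypothetical excerpt of normal : O -> Σ ∪ {ε}
NORMAL = {
    "nop": EPSILON,   # normal(nop) = ε
    "brtrue": "b",    # brtrue and brfalse share one symbol
    "brfalse": "b",
    "call": "c",
    "ldstr": "s",
    "pop": "p",
}


def normal(opcode: str) -> str:
    return NORMAL[opcode]


def normal_filtered(opcode: str, allowed: set[str]) -> str:
    """Λ-normalization: keep normal(o) only if it belongs to Λ (= allowed)."""
    symbol = normal(opcode)
    return symbol if symbol in allowed else EPSILON


def normalize_sequence(opcodes: list[str], allowed: set[str]) -> str:
    # ε contributes nothing, so filtered-out OpCodes simply disappear
    return "".join(normal_filtered(o, allowed) for o in opcodes)


ops = ["nop", "call", "ldstr", "brtrue", "pop", "brfalse"]
print(normalize_sequence(ops, {"b", "c", "p", "s"}))  # csbpb
print(normalize_sequence(ops, {"b", "c"}))            # cbb
```

Because ε is the empty string, concatenation drops filtered OpCodes for free, and the n-grams are then taken over the shorter sequence.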

slide-37
SLIDE 37
  • 4. Automatic Filters Selection

Λ-detectability

Sequences of symbols from:
  • 558695 clean methods
  • 272 malware clusters
Different filters Λ produce different n-grams.

p1 → ng1, ng3, ng4, ng5, ng7 → cleanset filtering → ng1, ng4, ng7
p2 → ng2, ng4, ng5, ng8, ng9 → cleanset filtering → ng2, ng4, ng8, ng9
p3 → ng3, ng4, ng5, ng6 → cleanset filtering → ng4, ng6

Cleanset filtering discards the n-grams that also occur in clean methods (here ng3 and ng5).
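The example can be reproduced with set operations. The exact definition of Λ-detectability is not spelled out on these slides, so this sketch rests on two assumptions: cleanset filtering removes every n-gram seen in the clean methods, and what matters for detection is an n-gram that all samples of the cluster still share. The clean set {ng3, ng5} is inferred from the example, not stated.

```python
# n-grams extracted from the three samples of one cluster (from the slide)
samples = {
    "p1": {"ng1", "ng3", "ng4", "ng5", "ng7"},
    "p2": {"ng2", "ng4", "ng5", "ng8", "ng9"},
    "p3": {"ng3", "ng4", "ng5", "ng6"},
}
# Assumption: the n-grams removed in the example are exactly those found in clean code
clean_ngrams = {"ng3", "ng5"}

filtered = {name: grams - clean_ngrams for name, grams in samples.items()}
shared = set.intersection(*filtered.values())

for name, grams in filtered.items():
    print(name, sorted(grams))
print("shared:", shared)  # shared: {'ng4'}
```

Under these assumptions ng4 is the only n-gram that survives filtering in every sample, so it could serve as a signature for the whole cluster.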

slide-38
SLIDE 38

The fitness function

Definition

The fitness function $f : \mathcal{P}(\Sigma) \to \mathbb{R}$,

$$f(\Lambda) = \frac{\sum_{\text{clusters}} \text{detectability}}{\text{number of clusters}}$$

Search space: $|\mathcal{P}(\Sigma)| = 2^{|\Sigma|}$

Example

Sequences: ecbeceaaed, bedccecaeed

Λ = Σ = {a, b, c, d, e}: ecbeceaaed, bedccecaeed (unchanged)

Λ = {e}: eeee, eeee

Λ = {a, b, e}: ebeeaae, beeaee
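A sketch of the fitness computation, reusing the slide's two sequences as one toy cluster. Several details are assumptions for illustration, since the presentation does not define them at this level: the detectability rule (1 if every sample in a cluster shares an n-gram that never occurs in the clean set, else 0), the bigram length, the second cluster and the clean sequence.

```python
def apply_filter(sequence: str, allowed: set[str]) -> str:
    """Drop every symbol outside Λ (here: allowed) from a normalized sequence."""
    return "".join(ch for ch in sequence if ch in allowed)


def ngrams(sequence: str, n: int) -> set[str]:
    return {sequence[i:i + n] for i in range(len(sequence) - n + 1)}


def detectability(cluster, clean, allowed, n):
    """Assumed rule: 1.0 if all samples share an n-gram absent from the clean set."""
    clean_grams = set().union(*(ngrams(apply_filter(s, allowed), n) for s in clean))
    per_sample = [ngrams(apply_filter(s, allowed), n) - clean_grams for s in cluster]
    return 1.0 if set.intersection(*per_sample) else 0.0


def fitness(allowed, clusters, clean, n=2):
    """f(Λ) = (sum of cluster detectabilities) / (number of clusters)."""
    return sum(detectability(c, clean, allowed, n) for c in clusters) / len(clusters)


sigma = {"a", "b", "c", "d", "e"}
clusters = [
    ["ecbeceaaed", "bedccecaeed"],  # the slide's two sequences as one cluster
    ["abab", "dcdc"],               # hypothetical cluster with nothing in common
]
clean = ["ecbee"]                   # hypothetical clean method

for allowed in (sigma, {"e"}, {"a", "b", "e"}):
    print(sorted(allowed), fitness(allowed, clusters, clean))
print("search space:", 2 ** len(sigma))  # 32 candidate filters
```

With these toy inputs Λ = {e} collapses both slide sequences to "eeee", whose only bigram also appears in the clean method, so the filter scores 0.0; this is the kind of information loss the fitness function is meant to penalize. Even for |Σ| = 5 there are 32 candidate filters, and the count doubles with every added symbol, which motivates a heuristic search.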

slide-43
SLIDE 43

Evolutionary algorithms

Start with a population of random solutions.
At each step, the individuals interact and evolve towards better solutions.
Eventually, they should reach an optimal solution (global or local).

Two such algorithms are used here:
  • Genetic Algorithm
  • Particle Swarm Optimization

slide-44
SLIDE 44

Genetic Algorithm

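Binary encoding, one bit per symbol of Σ, set to 1 when the symbol belongs to Λ: 1 1 1 1 . . . 1

Crossover: $\Lambda_1, \Lambda_2 \xrightarrow{\text{crossover}} \Lambda'_1, \Lambda'_2$

Mutation: $\Lambda \xrightarrow{\text{mutation}} \Lambda'$

Roulette Wheel selection:

$$P_{\text{selection}}(\Lambda_k) = \frac{f(\Lambda_k)}{\sum_{\Lambda} f(\Lambda)}$$

Elitism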
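A compact genetic algorithm over bit-string filters, following the ingredients listed on the slide. One-point crossover, a per-bit mutation rate of 1/|Σ|, the population size, the number of generations and the toy fitness are assumptions for illustration; the talk does not give these parameters. The toy fitness rewards agreement with a hypothetical target mask so that the run can be checked.

```python
import random

# A filter Λ ⊆ Σ is a bit list: bit i is 1 when the i-th symbol of Σ is kept.


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability f(Λk) / (sum of all f(Λ))."""
    total = sum(fitnesses)
    if total == 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return individual
    return population[-1]  # guard against floating-point round-off


def crossover(a, b, rng):
    """One-point crossover: swap the tails after a random cut."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in bits]


def genetic_algorithm(fitness, n_bits, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        fits = [fitness(ind) for ind in population]
        elite = population[fits.index(max(fits))]
        history.append(max(fits))
        next_gen = [elite[:]]  # elitism: the best filter survives unchanged
        while len(next_gen) < pop_size:
            a = roulette_select(population, fits, rng)
            b = roulette_select(population, fits, rng)
            for child in crossover(a, b, rng):
                next_gen.append(mutate(child, 1 / n_bits, rng))
        population = next_gen[:pop_size]
    return max(population, key=fitness), history


TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]  # hypothetical "ideal" filter for the toy fitness


def toy_fitness(bits):
    return sum(b == t for b, t in zip(bits, TARGET)) / len(TARGET)


best, history = genetic_algorithm(toy_fitness, n_bits=len(TARGET))
print(best, toy_fitness(best))
```

Elitism is what makes the best fitness per generation non-decreasing: the top individual is copied unchanged, so later generations can only match or improve it. In the real setting `toy_fitness` would be replaced by the cluster-detectability fitness f(Λ).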

slide-45
SLIDE 45

Particle Swarm Optimization

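Representation: $p = (X, V, X_{\text{best}}, \text{best fitness})$, with $X \in [0, 1]^{|\Sigma|}$ and $V \in [-1, 1]^{|\Sigma|}$

Update:

$$X' = X + V$$

$$V' = \omega V + \varphi_1 r_1 (X_{\text{best}} - X) + \varphi_2 r_2 (\text{global best} - X)$$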
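The update rule can be sketched as follows. The slide gives the ranges of X and V but not how a particle becomes a filter or how out-of-range values are handled, so this sketch assumes the following: symbol i belongs to Λ when X_i ≥ 0.5; X and V are clipped back into their ranges; r1 and r2 are drawn uniformly from [0, 1] per dimension; and ω = 0.7, φ1 = φ2 = 1.5 are placeholder constants. Both updates use the current X and V, as the equations are written.

```python
import random


def clip(value, low, high):
    return max(low, min(high, value))


def decode(x):
    """Assumed decoding: symbol i is in Λ when its coordinate is at least 0.5."""
    return [1 if xi >= 0.5 else 0 for xi in x]


def pso(fitness, dim, n_particles=15, iterations=50, omega=0.7, phi1=1.5, phi2=1.5, seed=0):
    rng = random.Random(seed)
    xs = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    p_best = [x[:] for x in xs]
    p_fit = [fitness(decode(x)) for x in xs]
    g = max(range(n_particles), key=lambda i: p_fit[i])
    g_best, g_fit = p_best[g][:], p_fit[g]
    history = [g_fit]
    for _ in range(iterations):
        for i in range(n_particles):
            x, v = xs[i], vs[i]
            new_x = [clip(x[d] + v[d], 0.0, 1.0) for d in range(dim)]
            new_v = [clip(omega * v[d]
                          + phi1 * rng.random() * (p_best[i][d] - x[d])
                          + phi2 * rng.random() * (g_best[d] - x[d]), -1.0, 1.0)
                     for d in range(dim)]
            xs[i], vs[i] = new_x, new_v
            fit = fitness(decode(new_x))
            if fit > p_fit[i]:          # update the particle's own best
                p_best[i], p_fit[i] = new_x[:], fit
                if fit > g_fit:         # and, if better still, the global best
                    g_best, g_fit = new_x[:], fit
        history.append(g_fit)
    return decode(g_best), g_fit, history


TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]  # hypothetical "ideal" filter


def toy_fitness(bits):
    return sum(b == t for b, t in zip(bits, TARGET)) / len(TARGET)


best, best_fit, history = pso(toy_fitness, dim=len(TARGET))
print(best, best_fit)
```

Unlike the GA, the particles move in a continuous space, so a thresholding step (here the assumed 0.5 cut-off) is needed to read a filter out of each position.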

slide-48
SLIDE 48
  • 5. Experimental results
Experimental results (1)

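Parallel speedup for the fitness function, given by Amdahl's law, where B is the sequential fraction of the computation and k the number of parallel workers:

$$S(k) = \frac{T(1)}{T(k)} = \frac{1}{B + \frac{1 - B}{k}}$$

Experimentally, B = 0.04, so

$$S_{\max} = \lim_{k \to \infty} S(k) = \frac{1}{B} = 25$$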
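Plugging the measured value into Amdahl's law shows how quickly the returns diminish. B = 0.04 is the slide's figure; the worker counts below are arbitrary illustrative choices.

```python
def amdahl_speedup(k: int, serial_fraction: float) -> float:
    """S(k) = 1 / (B + (1 - B) / k) for k parallel workers."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / k)


B = 0.04  # measured sequential fraction reported on the slide
for k in (1, 8, 16, 24, 96):
    print(k, round(amdahl_speedup(k, B), 2))
print("upper bound:", round(1 / B, 2))
```

With B = 0.04, 24 workers already give a speedup of 12.5, half of the asymptotic bound of 25, while 96 workers only reach 20.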

slide-49
SLIDE 49
Experimental results (2)

Learning evaluation, best fitness learnt:
  • GA: 0.3965
  • PSO: 0.4029

Cross-validation results:

                            GA best    PSO best
  Similar malware samples   0.1819     0.1833
  Obfuscated samples        0.8859     0.8859

slide-51
SLIDE 51
  • 6. Conclusions and future work

Summary

slide-52
SLIDE 52

Conclusions

  • n-grams are a robust way to classify programs.
  • Existing methods can be improved by filtering the OpCode sequences.
  • Bio-inspired algorithms can be used to find good filters.

slide-53
SLIDE 53

Thank you! Questions?