SLIDE 1

Malware Analysis Using Visualized Image Matrices

Tzu-Ming Huang

CISC850 Cyber Analytics

SLIDE 2

Overview

  • A visual analysis method for malware

– Converts binary files into images

  • Reduces computation by selecting only major blocks

– Defines a method for calculating similarity between these images

SLIDE 3

Method Overview

SLIDE 4

Extract opcode sequences from binary

SLIDE 5

Repetition Filtering

SLIDE 6

Extract opcode sequences from binary

SLIDE 7

Major Block Selection

  • Not all basic blocks are used (file headers and meaningless blocks are excluded)

  • Targets suspicious behavior
  • Selects blocks that include a “CALL” instruction
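The selection rule above can be sketched as follows. This is a minimal sketch that assumes each basic block has already been disassembled into a list of opcode mnemonics. The `Block` type and `select_major_blocks` function are hypothetical names. Only the "contains CALL" criterion comes from the slide; excluding file headers and meaningless blocks is not modeled here.

```python
from typing import List

Block = List[str]  # hypothetical: a basic block as a list of opcode mnemonics


def select_major_blocks(blocks: List[Block]) -> List[Block]:
    """Keep only the basic blocks that contain at least one CALL instruction."""
    return [block for block in blocks if any(op.upper() == "CALL" for op in block)]


if __name__ == "__main__":
    blocks = [
        ["PUSH", "MOV", "CALL", "ADD"],
        ["XOR", "INC", "JMP"],
        ["MOV", "CALL"],
    ]
    for block in select_major_blocks(blocks):
        print(" ".join(block))
    # PUSH MOV CALL ADD
    # MOV CALL
```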
SLIDE 8

Major Block Selection

SLIDE 9

Extract opcode sequences from binary

SLIDE 10

Parsing Opcode Sequence

  • Use only the first three characters of each opcode

– 41.4% of opcodes have 3 characters
– Meaning is preserved
– E.g. PUSH -> PUS; CALL -> CAL; OR -> ?

  • These three-character opcodes are concatenated together
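Here is a minimal sketch of the parsing step. It assumes the input is a list of opcode mnemonics from the selected major blocks, and `parse_opcodes` is a hypothetical name. The slide does not say how opcodes shorter than three characters (such as OR) are handled, so this sketch simply keeps them unchanged.

```python
from typing import List


def parse_opcodes(opcodes: List[str]) -> str:
    """Truncate each opcode to its first three characters and concatenate them.

    Opcodes shorter than three characters (e.g. OR) are kept unchanged here.
    This is an assumption: the slide does not say how they are handled.
    """
    return "".join(op.upper()[:3] for op in opcodes)


if __name__ == "__main__":
    print(parse_opcodes(["PUSH", "MOV", "CALL", "OR"]))  # PUSMOVCALOR
```

Keeping OR as two characters breaks the fixed three-character alignment of the concatenated string. That is one reason this case needs an explicit rule.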

SLIDE 11

Parsing Opcode Sequence

SLIDE 12

Generate Image Matrix

  • A hash function (SimHash) determines the X-Y coordinate and RGB color of each pixel

  • The length and width of the matrix are 2^n (n = 8)
  • If multiple hashes map to the same X-Y coordinate, their RGB color values are simply summed
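The slide lists the ingredients: SimHash, X-Y coordinates, RGB colors, a 2^n × 2^n matrix, and summing on collisions. It does not give the exact bit layout or say what text each hash covers. The sketch below fills those gaps with explicit assumptions:

- It hashes a sliding window of `WINDOW` consecutive three-character opcodes (a hypothetical parameter).
- It uses a standard 64-bit SimHash with MD5 feature hashes.
- It takes the low bits of the hash for x, y, R, G and B, in that order.
- It stores the matrix sparsely, as a dict from (x, y) to [R, G, B].

```python
import hashlib
from typing import Dict, List, Tuple

N = 8            # matrix side is 2**N (reading the slide's "2n (8)" as 2^n with n = 8)
SIDE = 2 ** N    # 256 x 256
WINDOW = 4       # hypothetical: consecutive opcodes hashed together into one pixel

Image = Dict[Tuple[int, int], List[int]]  # sparse matrix: (x, y) -> [R, G, B]


def simhash64(features: List[str]) -> int:
    """Classic SimHash: each 64-bit feature hash votes +1/-1 on every bit position."""
    votes = [0] * 64
    for feature in features:
        h = int.from_bytes(hashlib.md5(feature.encode()).digest()[:8], "big")
        for bit in range(64):
            votes[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(64) if votes[bit] > 0)


def generate_image_matrix(opcodes: List[str]) -> Image:
    """Map each window of three-character opcodes to one pixel via its SimHash."""
    image: Image = {}
    if not opcodes:
        return image
    mask = SIDE - 1
    for start in range(max(1, len(opcodes) - WINDOW + 1)):
        h = simhash64(opcodes[start:start + WINDOW])
        # Hypothetical bit layout: N bits for x, N bits for y, then 8 bits per color.
        x = h & mask
        y = (h >> N) & mask
        rgb = [(h >> (2 * N + 8 * c)) & 0xFF for c in range(3)]
        pixel = image.setdefault((x, y), [0, 0, 0])
        for c in range(3):
            pixel[c] += rgb[c]  # same coordinate: sum the RGB values
    return image


if __name__ == "__main__":
    tokens = ["PUS", "MOV", "CAL", "ADD", "PUS", "CAL", "RET"]
    matrix = generate_image_matrix(tokens)
    for (x, y), rgb in sorted(matrix.items()):
        print(f"({x:3d}, {y:3d}) -> {rgb}")
```

A sparse dict keeps memory proportional to the number of occupied pixels rather than to the full 2^n × 2^n matrix.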

SLIDE 13

Generate Image Matrix

SLIDE 14

Choose Representative Image Matrix

SLIDE 15

Similarity Calculation Using Image Matrix

  • Faster than comparing opcode strings directly

  • Finding matching pairs in strings: O(n^2)
  • SimHash plus similarity calculation on images: O(n)
SLIDE 16

Similarity Calculation Using Image Matrix

SLIDE 17

Similarity Calculation Using Image Matrix

  • Vector angular-based distance measurement algorithm

– Each pixel is viewed as a 3D (R, G, B) vector
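The slide names the measure but not its exact formula or how pixel scores are combined. This sketch is therefore an assumption throughout:

- It scores two pixels by the angle θ between their RGB vectors, mapped to [0, 1] as 1 − 2θ/π.
- It aggregates the scores with a hypothetical rule: sum the scores at coordinates occupied in both images, then divide by the number of coordinates occupied in either image.
- The images use the same sparse (x, y) -> [R, G, B] dict representation.

```python
import math
from typing import Dict, List, Tuple

Image = Dict[Tuple[int, int], List[int]]  # sparse matrix: (x, y) -> [R, G, B]


def pixel_similarity(p: List[int], q: List[int]) -> float:
    """Angular similarity of two RGB vectors: 1.0 when parallel, 0.0 when orthogonal."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    if norm == 0:
        return 0.0  # a black pixel has no direction
    cos_theta = max(-1.0, min(1.0, dot / norm))  # guard against rounding outside [-1, 1]
    # RGB components are non-negative, so the angle lies in [0, pi/2].
    return 1.0 - (2.0 / math.pi) * math.acos(cos_theta)


def image_similarity(a: Image, b: Image) -> float:
    """Hypothetical aggregation: sum pixel similarities at coordinates occupied in
    both images, divided by the number of coordinates occupied in either image."""
    union = a.keys() | b.keys()
    if not union:
        return 0.0
    shared = a.keys() & b.keys()
    return sum(pixel_similarity(a[k], b[k]) for k in shared) / len(union)


if __name__ == "__main__":
    img1 = {(0, 0): [255, 0, 0], (3, 7): [10, 20, 30]}
    img2 = {(0, 0): [255, 0, 0], (9, 9): [0, 0, 255]}
    print(round(image_similarity(img1, img2), 3))  # 0.333
```

Only occupied pixels are visited, so this sketch runs in time linear in the number of non-empty pixels. That matches the O(n) figure given for image-based comparison.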

SLIDE 18

Similarity Calculation Using Image Matrix

SLIDE 19

Experiment: Major Block Selection?

SLIDE 20

Experiment: Major Block Selection?

SLIDE 21

Experiment: Feasibility

SLIDE 22

Experiment: Feasibility

  • Similarity between malware samples from the same family: 0.19 to 0.36

  • Similarity between malware samples from different families: < 0.05

  • Classification accuracy = 0.9896