SLIDE 1
Malware Analysis Using Visualized Image Matrices
Tzu-Ming Huang
CISC850 Cyber Analytics
SLIDE 2 Overview
- A visual analysis method for malware
– Converts binary files into images
- Reduces computation by selecting major blocks
– Calculates similarity between the resulting images
SLIDE 3
Method Overview
SLIDE 4 Extract opcode sequences from binary
SLIDE 5
Repetition Filtering
SLIDE 6 Extract opcode sequences from binary
SLIDE 7 Major Block Selection
- Not all basic blocks are used (e.g., file headers and
meaningless blocks are skipped)
- Target suspicious behavior
- Select blocks that include a “CALL” instruction
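The selection rule above can be sketched in Python. This is a minimal sketch. It assumes each basic block has already been disassembled into a list of opcode mnemonics. The `BasicBlock` type and the `select_major_blocks` function are hypothetical names. Disassembly and basic-block splitting are out of scope.

```python
from typing import List

# Hypothetical representation: a basic block is a list of opcode mnemonics.
BasicBlock = List[str]


def select_major_blocks(blocks: List[BasicBlock]) -> List[BasicBlock]:
    """Keep only basic blocks that contain at least one CALL instruction."""
    return [block for block in blocks if any(op.upper() == "CALL" for op in block)]


blocks = [
    ["PUSH", "MOV", "CALL", "ADD"],  # contains CALL -> kept
    ["XOR", "MOV", "RET"],           # no CALL -> dropped
    ["LEA", "CALL", "TEST", "JZ"],   # contains CALL -> kept
]
major = select_major_blocks(blocks)
print(len(major))  # 2
```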
SLIDE 8
Major Block Selection
SLIDE 9 Extract opcode sequences from binary
SLIDE 10 Parsing Opcode Sequence
- Keep only the first three characters of each opcode
– 41.4% of opcodes have 3 characters
– Meaning is maintained
– E.g., PUSH -> PUS; CALL -> CAL; OR?
- These three-character opcodes are
concatenated together
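Below is a minimal sketch of the truncate-and-concatenate step. The slide leaves open how to handle opcodes shorter than three characters, such as OR. This sketch assumes they are kept unchanged. `parse_opcode_sequence` is a hypothetical name.

```python
from typing import List


def parse_opcode_sequence(opcodes: List[str]) -> str:
    """Truncate each opcode to its first three characters and concatenate.

    Assumption: opcodes shorter than three characters (e.g. OR) are kept
    as-is, since slicing a short string simply returns the whole string.
    """
    return "".join(op.upper()[:3] for op in opcodes)


print(parse_opcode_sequence(["PUSH", "MOV", "CALL", "OR"]))  # PUSMOVCALOR
```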
SLIDE 11
Parsing Opcode Sequence
SLIDE 12 Generate Image Matrix
- Use a hash function (SimHash) to determine the X-Y
coordinate and RGB color of each pixel
- The length and width of the matrix are 2^n (here n = 8)
- If two hashes map to the same X-Y coordinate, their
RGB color values are simply summed
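Below is a minimal sketch of the image-matrix step. It rests on several assumptions that the slides do not state:

- a 64-bit SimHash built from MD5 feature hashes;
- each block's three-character opcodes serve as the SimHash features;
- a hypothetical bit layout: the lowest n bits give X, the next n bits give Y, then 8 bits each give R, G and B.

The names `simhash`, `new_matrix` and `add_block` are illustrative only.

```python
import hashlib
from typing import List, Sequence

N = 8            # matrix side is 2**N (n = 8, as read from the slide)
SIDE = 2 ** N
HASH_BITS = 64   # assumed SimHash width

Matrix = List[List[List[int]]]


def simhash(features: Sequence[str], bits: int = HASH_BITS) -> int:
    """Classic SimHash: per bit, add +1/-1 over feature hashes, keep the sign."""
    counts = [0] * bits
    for feature in features:
        # Take the top `bits` bits of the 128-bit MD5 digest as the feature hash.
        h = int.from_bytes(hashlib.md5(feature.encode()).digest(), "big") >> (128 - bits)
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def new_matrix() -> Matrix:
    """An empty SIDE x SIDE matrix of [R, G, B] pixels."""
    return [[[0, 0, 0] for _ in range(SIDE)] for _ in range(SIDE)]


def add_block(matrix: Matrix, opcode_tokens: Sequence[str]) -> None:
    """Hash one block's opcodes and add its color at its (x, y) pixel."""
    h = simhash(opcode_tokens)
    x = h & (SIDE - 1)             # hypothetical layout: lowest N bits -> x
    y = (h >> N) & (SIDE - 1)      # next N bits -> y
    r = (h >> (2 * N)) & 0xFF      # then 8 bits each for R, G, B
    g = (h >> (2 * N + 8)) & 0xFF
    b = (h >> (2 * N + 16)) & 0xFF
    pixel = matrix[y][x]
    # Blocks landing on the same (x, y) simply sum their color values.
    pixel[0] += r
    pixel[1] += g
    pixel[2] += b


matrix = new_matrix()
block = ["PUS", "MOV", "CAL"]
add_block(matrix, block)
add_block(matrix, block)  # identical block -> same pixel, colors summed
```

Summed channels are not clamped to 255, following the "simply sum" rule on the slide. Because SimHash maps similar feature sets to similar fingerprints, similar blocks tend to land on similar pixels.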
SLIDE 13
Generate Image Matrix
SLIDE 14
Choose Representative Image Matrix
SLIDE 15 Similarity Calculation Using Image Matrix
- Faster than comparing opcode strings
directly
- Finding matching pairs between strings: O(n^2)
- Computing SimHash and image similarity: O(n)
SLIDE 16
Similarity Calculation Using Image Matrix
SLIDE 17 Similarity Calculation Using Image Matrix
- Vector angular-based distance measurement
algorithm
– Each pixel is viewed as a 3D (R, G, B) vector
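The slides name a vector angular-based distance but do not show its formula. Here is one plausible sketch, built on two assumptions:

- the per-pixel similarity is 1 - θ/(π/2), where θ is the angle between the two RGB vectors;
- the image similarity is the mean of that value over every coordinate occupied in either matrix.

Both the formula and the averaging rule are assumptions. `pixel_similarity` and `image_similarity` are hypothetical names.

```python
import math
from typing import Sequence

Pixel = Sequence[float]


def pixel_similarity(p: Pixel, q: Pixel) -> float:
    """Angular similarity of two RGB vectors: 1 at 0 degrees, 0 at 90 degrees."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # assumption: an empty pixel matches nothing
    cos_theta = max(-1.0, min(1.0, dot / (norm_p * norm_q)))
    theta = math.acos(cos_theta)
    # RGB components are non-negative, so theta lies in [0, pi/2].
    return 1.0 - theta / (math.pi / 2)


def image_similarity(m1, m2) -> float:
    """Mean pixel similarity over coordinates occupied in either matrix."""
    total, count = 0.0, 0
    for row1, row2 in zip(m1, m2):
        for p, q in zip(row1, row2):
            if any(p) or any(q):
                total += pixel_similarity(p, q)
                count += 1
    return total / count if count else 0.0


a = [[[255, 0, 0], [0, 0, 0]], [[0, 0, 0], [10, 20, 30]]]
b = [[[128, 0, 0], [0, 0, 0]], [[0, 0, 0], [0, 0, 0]]]
print(image_similarity(a, b))  # 0.5
```

Comparing angles rather than raw magnitudes means that two pixels with the same color direction but different summed intensities still count as a match. The cost is one pass over the fixed-size matrix.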
SLIDE 18
Similarity Calculation Using Image Matrix
SLIDE 19
Experiment: Major Blocks Selection?
SLIDE 20
Experiment: Major Blocks Selection?
SLIDE 21
Experiment: Feasibility
SLIDE 22 Experiment: Feasibility
- Similarity between malware samples from the same
family: 0.19 to 0.36
- Similarity between malware samples from different
families: < 0.05
- Classification accuracy = 0.9896