NEEDLES IN A HAYSTACK: MINING INFORMATION FROM PUBLIC DYNAMIC SANDBOXES FOR MALWARE INTELLIGENCE

SLIDE 1

NEEDLES IN A HAYSTACK: MINING INFORMATION FROM PUBLIC DYNAMIC SANDBOXES FOR MALWARE INTELLIGENCE

Mariano Graziano, Davide Canali, Leyla Bilge, Andrea Lanzi and Davide Balzarotti

Eurecom, Symantec Research Labs, Università degli Studi di Milano

USENIX Security ’15 - Washington DC, USA

SLIDE 2

A PILE OF MALWARE SAMPLES

SLIDE 3

CAMPAIGN          TIME BEFORE PUBLIC DISCLOSURE   SUBMITTED BY
Operation Aurora  4 months                        US
Red October       8 months                        Romania
APT1              43 months                       US
Stuxnet           1 month                         US
Beebus            22 months                       Germany
LuckyCat          3 months                        US
BrutePOS          5 months                        France
NetTraveller      14 months                       US
Pacific PluX      12 months                       US
Pitty Tiger       42 months                       US
Regin             44 months                       UK
Equation          23 months                       US

SLIDE 4

Constant interaction: criminals vs. sandbox

SLIDE 5

GOAL

  • Observation: Malware authors use public sandboxes to test their developments
  • Design data mining techniques to automatically discover malware developments

SLIDE 6

SYSTEM OVERVIEW

SLIDE 7

SYSTEM OVERVIEW

SLIDE 8

DATA REDUCTION

32M

Initial Dataset

SLIDE 9

DATA REDUCTION

6M

Submitted by regular users

SLIDE 10

DATA REDUCTION

522K

Not already part of large submissions

SLIDE 11

DATA REDUCTION

214K

Previously unknown by Symantec & VirusTotal

SLIDE 12

DATA REDUCTION

121K

Final (not packed binary)
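
The reduction steps on slides 8 to 12 read as a chain of filters, each applied to the output of the previous one. A minimal sketch of that funnel follows. The `Submission` fields and the `reduce_dataset` helper are hypothetical names introduced for illustration only; the slides do not say how each property is determined.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    # All fields are hypothetical stand-ins for the properties named on the slides.
    sample_id: str
    from_regular_user: bool         # "Submitted by regular users"
    part_of_large_submission: bool  # "Not already part of large submissions"
    previously_known: bool          # known to Symantec or VirusTotal
    is_packed: bool                 # "Final (not packed binary)"


# Filters in the order the slides present them.
FILTERS = [
    ("submitted by regular users", lambda s: s.from_regular_user),
    ("not part of large submissions", lambda s: not s.part_of_large_submission),
    ("previously unknown", lambda s: not s.previously_known),
    ("not packed", lambda s: not s.is_packed),
]


def reduce_dataset(submissions):
    """Apply each filter in turn; return the survivors and the size after each stage."""
    remaining = list(submissions)
    counts = [("initial dataset", len(remaining))]
    for name, keep in FILTERS:
        remaining = [s for s in remaining if keep(s)]
        counts.append((name, len(remaining)))
    return remaining, counts
```

The per-stage counts returned by `reduce_dataset` correspond to the kind of sequence shown on the slides (32M, 6M, 522K, 214K, 121K).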

SLIDE 13

SYSTEM OVERVIEW

SLIDE 14

CLUSTERING

  • Agglomerative clustering (similarity threshold: 70%):
      • Binary similarity (ssdeep)
      • Submissions metadata
  • Sliding window of seven days:
      • Reduce comparisons
      • Ensure binary similarity
  • 5972 clusters, 4.5 elements each on average
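
The clustering step above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `difflib.SequenceMatcher` stands in for the ssdeep comparison so the example runs with the standard library only, submission metadata is ignored, and the linkage is single-linkage (two samples end up in the same cluster if a chain of pairs at or above the threshold connects them). The names `similarity` and `cluster` are hypothetical.

```python
from datetime import datetime, timedelta
from difflib import SequenceMatcher

WINDOW = timedelta(days=7)  # sliding window from the slide
THRESHOLD = 70.0            # similarity threshold from the slide, in percent


def similarity(a: bytes, b: bytes) -> float:
    """Stand-in for an ssdeep comparison; returns a score between 0 and 100."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def cluster(samples):
    """Group samples given as (submission_time, content) tuples.

    Only pairs submitted within WINDOW of each other are compared, which
    keeps the number of comparisons low. Returns lists of input indices.
    """
    order = sorted(range(len(samples)), key=lambda i: samples[i][0])
    parent = {i: i for i in order}

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            if samples[j][0] - samples[i][0] > WINDOW:
                break  # later samples are even further away in time
            if similarity(samples[i][1], samples[j][1]) >= THRESHOLD:
                parent[find(j)] = find(i)

    groups = {}
    for i in order:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

With the real ssdeep library, `similarity` would compare fuzzy hashes rather than raw bytes; the windowing and merging logic would stay the same.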

SLIDE 15

SYSTEM OVERVIEW

SLIDE 16

FINE-GRAINED ANALYSIS

  • Binary code normalisation
  • Call graph comparison [Flake04,Gao08]
  • Control flow graph comparison [Flake04,Kruegel06,Jang13]
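
To illustrate why normalisation comes before comparison, here is a deliberately simplified sketch. It does not implement the call graph or control flow graph comparison from the cited work. Instead it assumes a swapped-in, much cruder technique: numeric constants in disassembled instructions are replaced by a placeholder, each function is hashed, and two binaries are scored by the Jaccard similarity of their function fingerprints. All function names are hypothetical.

```python
import hashlib
import re

# Matches hexadecimal (0x...) and decimal constants that stand alone as tokens.
CONSTANT = re.compile(r"\b(0x[0-9a-f]+|\d+)\b")


def normalise(instruction: str) -> str:
    """Lower-case, collapse whitespace, and replace numeric constants with IMM."""
    text = " ".join(instruction.lower().split())
    return CONSTANT.sub("IMM", text)


def function_fingerprint(instructions) -> str:
    """Hash the normalised instruction sequence of one function."""
    body = "\n".join(normalise(i) for i in instructions)
    return hashlib.sha256(body.encode()).hexdigest()


def binary_similarity(funcs_a, funcs_b) -> float:
    """Jaccard similarity (0.0 to 1.0) between two sets of function fingerprints."""
    fa = {function_fingerprint(f) for f in funcs_a}
    fb = {function_fingerprint(f) for f in funcs_b}
    if not fa and not fb:
        return 1.0
    return len(fa & fb) / len(fa | fb)
```

After normalisation, two builds of the same function that differ only in addresses or constants produce the same fingerprint, so recompiling a sample does not by itself lower the score.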

74% 87% 92%

SLIDE 17

SYSTEM OVERVIEW

SLIDE 18

FEATURE EXTRACTION

  • Comprises two phases:
      • Per sample (25 features in 6 groups)
      • Per cluster (48 features in 5 groups)
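
The two phases can be pictured as a per-sample extraction followed by an aggregation over each cluster. The sketch below is only an illustration: this transcript does not list the actual features, so the two per-sample features (the delay between compile time and submission time, and the file size) and the min/max/mean aggregation are assumptions, as are the function names.

```python
from datetime import datetime
from statistics import mean


def sample_features(compile_time: datetime, submission_time: datetime, size: int) -> dict:
    """Phase 1: features computed for a single sample (hypothetical examples)."""
    return {
        "compile_to_submit_s": (submission_time - compile_time).total_seconds(),
        "size": size,
    }


def cluster_features(per_sample: list) -> dict:
    """Phase 2: aggregate the per-sample features of one cluster."""
    if not per_sample:
        raise ValueError("a cluster needs at least one sample")
    features = {"n_samples": len(per_sample)}
    for name in per_sample[0]:
        values = [s[name] for s in per_sample]
        features[f"{name}_min"] = min(values)
        features[f"{name}_max"] = max(values)
        features[f"{name}_mean"] = mean(values)
    return features
```

A short delay between compilation and submission, repeated across the samples of one cluster, is the kind of pattern such cluster-level aggregates would make visible to a classifier.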

SLIDE 19

SAMPLE FEATURES

SLIDE 20

CLUSTER FEATURES

SLIDE 21

CLUSTER FEATURES

SLIDE 22

CLUSTER FEATURES

UNKNOWN UNKNOWN MALICIOUS

SLIDE 23

CLUSTER FEATURES

COMPLEX BEHAVIOR COMPLEX BEHAVIOR NO BEHAVIOR

SLIDE 24

SYSTEM OVERVIEW

SLIDE 25

MACHINE LEARNING

  • Logistic Model Tree (LMT)
  • Training Set (157 clusters):
      • Non-development: 91 clusters
      • Development: 66 clusters

SLIDE 26

RESULTS

  • 3038 potential development clusters
  • 1474 malicious clusters
  • 135 days on average for the detection
  • Thousands of computers infected in 13 countries

CLUSTERS  TYPE
1082      Trojans
83        Backdoors
65        Worms
45        Botnets
21        Tools
4         Keyloggers

SLIDE 27

EXAMPLES

SLIDE 28

ANTI-SANDBOX

[Figure: timeline (t) of compile times and submission times for Sample 1, Sample 1, Sample 2 and Sample 3. Axis timestamps: 16:59:13, 16:59:33, 17:05:21, 17:06:06, 17:13:26, 17:14:16. Labels: Unknown, Unknown, Malicious.]

SLIDE 30

TROJAN DROPPER

[Figure: timeline of submission times and compile times. Dates shown: 1992-06-20 (four times), 2008-10-04. Times shown: 22:35:08, 00:44:06, 01:18:48, 01:25:16, 13:07:26. Labels: DELPHI, VB.]

SLIDE 31

TROJAN DROPPER

[Figure: timeline of submission times and compile times. Dates shown: 1992-06-20 (four times), 2008-10-04. Times shown: 22:35:08, 00:44:06, 01:18:48, 01:25:16, 13:07:26. Labels: DELPHI, VB.]

  • VirusTotal: 37/50 (trojan dropper)
  • Two IP addresses:
      • Dynamic DNS service (no-ip)
  • Connect-back behavior: 1817 clusters overall

SLIDE 32

LIMITATIONS

  • No packed binaries
  • Evasions:
      • Sandbox interaction still required to develop evasion techniques
      • More sophisticated analysis techniques are required to link a probe to the final malware

SLIDE 33

CONCLUSION

SLIDE 34

THE END

THANK YOU

graziano@eurecom.fr magrazia@cisco.com @emd3l