Poking the Bear: Lessons Learned from Probing Three Android Malware Datasets
Aleieldin Salem and Alexander Pretschner
Technische Universität München, Garching bei München
{salem, pretschn}@in.tum.de
Montpellier, 04.09.2018


SLIDE 1

Aleieldin Salem and Alexander Pretschner Technische Universität München Garching bei München {salem, pretschn}@in.tum.de Montpellier, 04.09.2018

Poking the Bear: Lessons Learned from Probing Three Android Malware Datasets

SLIDE 2
  • Stumbled upon some inconsistencies while experimenting with different Android malware datasets
  • Investigated the source of the discrepancies
  • A series of experiments performed on three Android malware datasets
  • Some (interesting) findings

Abstract

Alei Salem (TUM) | A-Mobile 2018 | Montpellier, France 2

SLIDE 3
  • Working on a solution based on “Active Learning”
  • Evaluating on Malgenome vs. Piggybacking, two datasets of repackaged/piggybacked malware
  • Malgenome = great results!
  • Piggybacking = mediocre results?
  • Trying it on AMD and Drebin
  • Works like a charm!
  • What the ..?

Background

SLIDE 4

Research Questions

SLIDE 5
  • Infer some information about the malicious instances found in:
  • Malgenome (Zhou et al. 2012)
  • Piggybacking (Li et al. 2017)
  • AMD (Wei et al. 2017)
  • VirusTotal detection rates, involved marketplaces, malware types, etc.
  • Backed up by information in Euphony (Hurier et al. 2017)
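The first of these statistics, the VirusTotal detection rate, is just the share of scanners that flag a sample. A minimal sketch, assuming a VirusTotal-style report with `positives` and `total` fields (the field names and the sample numbers are illustrative, not taken from the datasets):

```python
def detection_rate(report):
    """Fraction of VirusTotal scanners flagging the sample as malicious."""
    total = report.get("total", 0)
    if total == 0:
        return 0.0
    return report["positives"] / total

# Hypothetical report for one APK (made-up numbers)
report = {"positives": 36, "total": 60}
rate = detection_rate(report)  # 0.6
```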

Dissection Experiments

SLIDE 6
  • Backed up by information in Euphony (Hurier et al. 2017)

Dissection Experiments

More information: https://androidmalwareinsights.github.io
SLIDE 7
  • Backed up by information in Euphony (Hurier et al. 2017)

Dissection Experiments

More information: https://androidmalwareinsights.github.io
SLIDE 8
  • Backed up by information in Euphony (Hurier et al. 2017)

Dissection Experiments

More information: https://androidmalwareinsights.github.io
SLIDE 9
  • What about repackaging?
  • What, in fact, is the definition of repackaging?
  • E.g., must the app be decompiled/disassembled?
  • Wei et al. [the authors of AMD] claim it has been declining
  • How can we quickly infer whether an app is repackaged?
  • A simple technique using compiler fingerprinting (with APKiD¹)

Dissection Experiments (cont'd)

1 https://rednaga.io/2016/07/31/detecting_pirated_and_malicious_android_apps_with_apkid/
SLIDE 10
  • Simple technique using compiler fingerprinting (with APKiD¹)
  • Different compilers leave unique marks on the compiled code
  • Legitimate developer = access to source code = using an IDE
  • = compiling the app with the Android SDK’s dx and dexmerge compilers
  • App compiled using other compilers (e.g., dexlib) = repackaged = no access to source code ≠ legitimate developer?
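The slide's rule can be sketched as follows. The JSON layout below imitates APKiD's report format, but its exact structure is an assumption here; only the dx/dexmerge-versus-dexlib rule comes from the slide:

```python
# Compilers shipped with the Android SDK; anything else hints at repackaging.
SDK_COMPILERS = {"dx", "dexmerge"}

def looks_repackaged(apkid_result):
    """Slide's heuristic: an APK whose dex files were (re)built by a
    non-SDK compiler (e.g., dexlib) was likely modified without source access."""
    for entry in apkid_result.get("files", []):
        for compiler in entry.get("matches", {}).get("compiler", []):
            # APKiD reports strings like "dexlib 2.x"; compare the first token
            if compiler.split()[0] not in SDK_COMPILERS:
                return True
    return False

# Hypothetical APKiD-style outputs (structure assumed, data made up)
original = {"files": [{"filename": "classes.dex",
                       "matches": {"compiler": ["dx"]}}]}
modified = {"files": [{"filename": "classes.dex",
                       "matches": {"compiler": ["dexlib 2.x"]}}]}
```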

Dissection Experiments (cont'd)

1 https://rednaga.io/2016/07/31/detecting_pirated_and_malicious_android_apps_with_apkid/
SLIDE 11
  • What about repackaging?
  • What, in fact, is the definition of repackaging?

Dissection Experiments (cont'd)

SLIDE 12
  • What about repackaging?
  • What, in fact, is the definition of repackaging?

Dissection Experiments (cont'd)

lazy developers? wrong labeling?
SLIDE 13
  • What about repackaging?
  • What, in fact, is the definition of repackaging?

Dissection Experiments (cont'd)

86% repackaged?! declining?
SLIDE 14
  • How do conventional detection techniques fare against different datasets?
  • Conventional:
  • Machine learning classifiers
  • Trained with static/dynamic features
  • Validated using K-fold CV
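The validation step above (K-fold CV) can be sketched without any ML library: split the sample indices into K folds, train on K−1 of them, and test on the held-out fold. A minimal index generator:

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs for K-fold cross-validation."""
    indices = list(range(n_samples))
    # Distribute samples as evenly as possible across the k folds
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# 20 samples, 10 folds of 2 samples each
folds = list(k_fold_indices(20, k=10))
```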

Detection Experiments

SLIDE 15
  • How do conventional detection techniques fare against different datasets?
  • Ensemble classifier:
  • KNN, with K = {10, 25, 50, 100, 250, 500}
  • Random Forests, with estimators = {10, 25, 50, 75, 100}
  • Support Vector Machine with a linear kernel
  • 10-fold CV
  • Trained with static/dynamic features
  • Static: extracted from the APK using androguard
  • Dynamic: running apps within a VM + recording issued API calls
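The slides call this a voting classifier but do not spell out the voting rule; assuming simple hard (majority) voting over the base models' verdicts, and leaving out the actual KNN/Random-Forest/SVM models, the combination step reduces to:

```python
from collections import Counter

def majority_vote(predictions_per_classifier):
    """Combine per-classifier predictions (one list per base model)
    into one label per sample by majority vote."""
    n_samples = len(predictions_per_classifier[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in predictions_per_classifier)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Hypothetical verdicts from three base models on four apps (1 = malicious)
knn = [1, 0, 1, 1]
rf  = [1, 0, 0, 1]
svm = [0, 0, 1, 1]
ensemble = majority_vote([knn, rf, svm])  # [1, 0, 1, 1]
```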

Detection Experiments

SLIDE 16
  • How do conventional detection techniques fare against different datasets?

Detection Experiments

SLIDE 17
  • How do conventional detection techniques fare against different datasets?
  • But why?
  • Piggybacking = original, benign apps + repackaged, malicious versions
  • Majority = Adware
  • ~70% of misclassified apps = Adware

Detection Experiments

SLIDE 18
  • What is the lifespan of malware datasets?
  • Can we use an old/new dataset to detect newer/older datasets?
  • Train voting classifier using dataset A, and test using dataset B
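The train-on-A/test-on-B protocol can be sketched with a toy stand-in for the voting classifier — here a nearest-centroid model over two-dimensional feature vectors (all data below is made up, purely for illustration):

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def train(benign, malicious):
    """Toy stand-in for the voting classifier: one centroid per class."""
    return {"benign": centroid(benign), "malicious": centroid(malicious)}

def predict(model, x):
    """Assign x the label of the nearest class centroid."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(model, key=lambda label: dist(model[label], x))

# "Dataset A" trains the model; a sample from "dataset B" is then classified
model = train(benign=[[0.0, 0.1], [0.1, 0.0]],
              malicious=[[0.9, 1.0], [1.0, 0.9]])
verdict = predict(model, [0.8, 0.85])  # "malicious"
```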

Detection Experiments (cont'd)

SLIDE 19
  • What is the lifespan of malware datasets?
  • Can we use an old/new dataset to detect newer/older datasets?
  • Train voting classifier using dataset A, and test using dataset B

Detection Experiments (cont'd)

SLIDE 20
  • How can an adversary make use of this?
  • Consider a marketplace using an ML classifier as its “bouncer”
  • The classifier is trained using malicious + benign apps
  • If I [the adversary] figure out one (or more) of the benign apps:
  • Repackage the benign apps + upload them to the marketplace
  • The classifier will be confused!!
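A toy illustration of why this confuses the classifier, using a 1-nearest-neighbor "bouncer" and made-up feature vectors: the piggybacked app sits right next to its benign original in feature space, so it inherits the benign label despite carrying a payload:

```python
def nearest_label(training, x):
    """1-NN: return the label of the closest training vector
    (squared Euclidean distance)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(training, key=lambda item: dist(item[0], x))[1]

# Hypothetical training set of the marketplace's classifier
training = [
    ([0.0, 0.1], "benign"),     # a known benign app
    ([1.0, 0.9], "malicious"),  # a known malicious app
]

# The adversary repackages the benign training app; the small payload
# barely perturbs its feature vector
piggybacked = [0.1, 0.15]

verdict = nearest_label(training, piggybacked)  # "benign": the bouncer is fooled
```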

Adversarial Experiments

SLIDE 21
  • How can an adversary make use of this?
  • If I [adversary] figure out one (or more) of the benign apps
  • Many people presume apps on Google Play to be benign
  • Use Google Play apps as benchmark/reference for benign behaviors
  • Adversaries make the same assumption!

Adversarial Experiments (cont'd)

SLIDE 22
  • Piggybacking dataset = benign apps + repackaged versions
  • Train a voting classifier with dataset A, and test with dataset B
  • Observe the effect of adding the “Original” segment of Piggybacking on classification accuracy

Adversarial Experiments (cont'd)

SLIDE 23
  • Observe the effect of adding the “Original” segment of Piggybacking on classification accuracy

Adversarial Experiments

SLIDE 24
  • Observe the effect of adding the “Original” segment of Piggybacking on classification accuracy

Adversarial Experiments

SLIDE 25
  • Trojans appear to be most popular malware type
  • Adware is the go-to model for repackaging
  • Repackaging is losing popularity
  • Malicious apps continue to bypass Google Play’s safeguards

Conclusion

SLIDE 26
  • AMD is 5-6 years younger than Malgenome
  • Yet, apps from Malgenome are still out there!
  • Malware authors prefer re-using/building on older malware
  • Five years to use a dataset for training?

Conclusion (cont'd)

SLIDE 27
  • Already answered in the detection experiments
  • Adware is the most challenging to detect = ambiguous nature
  • A binary-labeling problem? What are the alternatives?

Conclusion (cont'd)

SLIDE 28
  • In what we called the “adversarial setting”:
  • An adversary can effectively circumvent app vetting safeguards (especially ML-based ones)
  • By repackaging benign apps used during training

Conclusion (cont'd)

SLIDE 29


Thank You

Any questions?

SLIDE 30
  • Working on a solution based on “Active Learning”

How it all began
