

  1. Poking the Bear: Lessons Learned from Probing Three Android Malware Datasets Aleieldin Salem and Alexander Pretschner Technische Universität München Garching bei München {salem, pretschn}@in.tum.de Montpellier, 04.09.2018

  2. Abstract • Stumbled upon some inconsistencies while experimenting with different Android malware datasets • Investigate the source of discrepancies • A series of experiments performed on three Android malware datasets • Some (interesting) findings 2 Alei Salem (TUM) | A-Mobile 2018 | Montpellier, France

  3. Background • Working on a solution based on “Active Learning” • Evaluating on Malgenome vs. Piggybacking • Datasets of Repackaged/Piggybacked Malware • Malgenome = great results! • Piggybacking = mediocre results? • Trying on AMD and Drebin • Works like a charm! • What the .. ?

  4. Research Questions

  5. Dissection Experiments • Infer some information about the malicious instances found in: • Malgenome (Zhou et al. 2012) • Piggybacking (Li et al. 2017) • AMD (Wei et al. 2017) • VirusTotal detection rates, involved marketplaces, malware types, etc. • Backed up by information in Euphony (Hurier et al. 2017)
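The per-sample VirusTotal detection rates used in the dissection experiments can be reduced from a file report. A minimal sketch, assuming VirusTotal's v2 file-report layout (`positives`, `total`, per-engine `scans`); the field names and the sample report are assumptions for illustration, not taken from the talk:

```python
def detection_rate(report: dict) -> float:
    """Fraction of antivirus engines that flagged the sample as malicious."""
    # v2 reports carry aggregate counts directly
    if report.get("total"):
        return report["positives"] / report["total"]
    # fall back to counting per-engine verdicts
    scans = report.get("scans", {})
    hits = sum(1 for r in scans.values() if r.get("detected"))
    return hits / len(scans) if scans else 0.0

# Abbreviated, hypothetical report for one APK
sample_report = {
    "positives": 38, "total": 60,
    "scans": {"EngineA": {"detected": True}, "EngineB": {"detected": False}},
}
print(detection_rate(sample_report))  # 38/60 ≈ 0.633
```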

  6. Dissection Experiments • Backed up by information in Euphony (Hurier et al. 2017) • More information: https://androidmalwareinsights.github.io


  9. Dissection Experiments (cont'd) • What about repackaging? • What is in fact the definition of repackaging? • E.g. must the app be decompiled/disassembled? • Wei et al. [authors of AMD] claim it has been declining • How to quickly infer whether an app is repackaged? • Simple technique using compiler fingerprinting (with APKiD [1]) [1]: https://rednaga.io/2016/07/31/detecting_pirated_and_malicious_android_apps_with_apkid/

  10. Dissection Experiments (cont'd) • Simple technique using compiler fingerprinting (with APKiD [1]) • Legitimate developer = access to source code = using IDE • Compiles app using Android SDK’s dx and dexmerge compilers • If app compiled using other compilers (e.g., dexlib) = repackaged = no access to source code ≠ legitimate developer? • Different compilers leave unique marks on the compiled code [1]: https://rednaga.io/2016/07/31/detecting_pirated_and_malicious_android_apps_with_apkid/
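The heuristic above can be sketched as a small filter over APKiD's JSON output (`apkid -j <app.apk>`). The report schema (`files`/`matches`/`compiler`) and the label strings are assumptions based on APKiD's documented output format, not the authors' actual tooling:

```python
import json

# Compilers used by the standard Android SDK toolchain; anything else
# suggests the DEX was rebuilt without access to the original source.
SDK_COMPILERS = {"dx", "dexmerge"}

def looks_repackaged(apkid_json: str) -> bool:
    """Flag an APK as likely repackaged when APKiD fingerprints a
    non-SDK compiler (e.g. dexlib) in any of its DEX files."""
    report = json.loads(apkid_json)
    for f in report.get("files", []):
        for compiler in f.get("matches", {}).get("compiler", []):
            # APKiD labels look like "dx", "dexmerge", "dexlib 2.x", ...
            if compiler.split()[0] not in SDK_COMPILERS:
                return True
    return False

# Hypothetical APKiD-style reports:
pirated = '{"files": [{"filename": "classes.dex", "matches": {"compiler": ["dexlib 2.x"]}}]}'
clean = '{"files": [{"filename": "classes.dex", "matches": {"compiler": ["dx"]}}]}'
print(looks_repackaged(pirated))  # True  -> likely repackaged
print(looks_repackaged(clean))    # False -> SDK-built
```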

  12. Dissection Experiments (cont'd) • What about repackaging? • What is in fact the definition of repackaging? • (chart annotations: lazy developers? wrong labeling?)

  13. Dissection Experiments (cont'd) • What about repackaging? • What is in fact the definition of repackaging? • (chart annotations: 86% repackaged?! declining?)

  14. Detection Experiments • How do conventional detection techniques fare against different datasets? • Conventional: • Machine learning classifiers • Trained with static/dynamic features • Validated using K-fold CV

  15. Detection Experiments • How do conventional detection techniques fare against different datasets? • Ensemble classifier • KNN, with K = {10, 25, 50, 100, 250, 500} • Random Forests with estimators = {10, 25, 50, 75, 100} • Support Vector Machine with a linear kernel • 10-fold CV • Trained with static/dynamic features • Static: extracted from the APK using androguard • Dynamic: running apps within a VM + recording issued API calls
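The ensemble on this slide can be sketched with scikit-learn's `VotingClassifier`. The feature matrix below is a synthetic stand-in (the talk used androguard static features and recorded API calls), and the largest K values from the slide are omitted because the toy sample is small:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-in for per-app feature vectors and malware labels
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# KNN members (K = 250, 500 from the slide dropped: too large for 200 samples),
# Random Forest members with the slide's estimator counts, and a linear SVM
estimators = (
    [(f"knn{k}", KNeighborsClassifier(n_neighbors=k)) for k in (10, 25, 50, 100)]
    + [(f"rf{n}", RandomForestClassifier(n_estimators=n, random_state=0))
       for n in (10, 25, 50, 75, 100)]
    + [("svm", LinearSVC())]
)
clf = VotingClassifier(estimators=estimators, voting="hard")

# 10-fold cross-validation, as on the slide
scores = cross_val_score(clf, X, y, cv=10)
print("mean accuracy:", scores.mean())
```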

  16. Detection Experiments • How do conventional detection techniques fare against different datasets?

  17. Detection Experiments • How do conventional detection techniques fare against different datasets? • But why? • Piggybacking = original, benign apps + repackaged, malicious versions • Majority = Adware • ~70% of misclassified apps = Adware

  18. Detection Experiments (cont'd) • What is the lifespan of malware datasets? • Can we use an old/new dataset to detect newer/older datasets? • Train voting classifier using dataset A, and test using dataset B
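The train-on-A/test-on-B setup can be illustrated with a toy concept-drift experiment. The datasets and features below are synthetic stand-ins, not Malgenome/AMD features; the shift parameter mimics the drift between datasets collected years apart:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)

def make_dataset(shift):
    # Toy stand-in for an Android malware feature matrix; `shift`
    # mimics drift in the feature distribution between collections.
    X = rng.normal(size=(300, 8))
    X[:, 0] += shift
    y = (X[:, 0] > shift).astype(int)  # same labeling rule, shifted features
    return X, y

X_a, y_a = make_dataset(0.0)   # "older" dataset A
X_b, y_b = make_dataset(1.5)   # "newer" dataset B

# Train on dataset A only, then evaluate on both A and B
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_a, y_a)
acc_a = accuracy_score(y_a, clf.predict(X_a))
acc_b = accuracy_score(y_b, clf.predict(X_b))
print("A -> A:", acc_a)
print("A -> B:", acc_b)
```

With drift present, accuracy on the unseen dataset B drops well below the in-distribution score, which is the pattern the lifespan question probes.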

  20. Adversarial Experiments • How can an adversary make use of this? • Consider a marketplace using an ML classifier as its “bouncer” • The classifier is trained using malicious + benign apps • If I [the adversary] figure out one (or more) of the benign apps • Repackage benign apps + upload to the marketplace • The classifier will be confused!

  21. Adversarial Experiments (cont'd) • How can an adversary make use of this? • If I [the adversary] figure out one (or more) of the benign apps • Many people presume apps on Google Play to be benign • Use Google Play apps as a benchmark/reference for benign behaviors • The adversary makes the same assumption!

  22. Adversarial Experiments (cont'd) • Piggybacking dataset = benign apps + repackaged versions • Train voting classifier with dataset A, and test with dataset B • Observe the effect of adding “Original” segment of Piggybacking on classification accuracy
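A toy illustration of the adversarial effect: with a nearest-neighbor "bouncer", a repackaged app's closest training sample flips from a known-malicious sample to its own benign original once the originals enter the benign training set. All features and the payload offset are invented for illustration; the talk's actual experiment used the Piggybacking dataset and a voting classifier:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)

# Each app is a 10-dim feature vector; a repackaged version is its
# original plus a fixed "payload" offset, so the pair sits much closer
# together than two unrelated apps do.
originals = rng.normal(size=(50, 10))      # benign apps the adversary repackages
payload = np.zeros(10)
payload[0] = 3.0                           # hypothetical payload footprint
repackaged = originals + payload           # malicious versions
other_benign = rng.normal(size=(200, 10))  # unrelated benign apps

train_mal, test_mal = repackaged[:25], repackaged[25:]
test_originals = originals[25:]            # the "Original" segment

def detection_rate(extra_benign=None):
    # Train a 1-NN classifier; optionally add the originals as benign samples
    X_ben = other_benign if extra_benign is None else np.vstack([other_benign, extra_benign])
    X = np.vstack([X_ben, train_mal])
    y = np.r_[np.zeros(len(X_ben)), np.ones(len(train_mal))]
    clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
    return clf.predict(test_mal).mean()    # fraction of repackaged apps caught

d_without = detection_rate()
d_with = detection_rate(test_originals)
print("detection without originals in training:", d_without)
print("detection with originals in training   :", d_with)
```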


  25. Conclusion • Trojans appear to be the most popular malware type • Adware is the go-to model for repackaging • Repackaging is losing popularity • Malicious apps continue to bypass Google Play’s safeguards

  26. Conclusion (cont'd) • AMD is 5-6 years younger than Malgenome • Yet, apps from Malgenome are still out there! • Malware authors prefer re-using/building on older malware • Five years to use a dataset for training?

  27. Conclusion (cont'd) • Already answered that in the detection experiments. • Adware most challenging to detect = Ambiguous nature • Binary-labeling problem? What are the alternatives?

  28. Conclusion (cont'd) • In what we called the “adversarial setting” • Repackaging benign apps used during training • Effectively circumvents app vetting safeguards (especially ML-based ones)

  29. Thank You Any questions? 31

  30. How it all began • Working on a solution based on “Active Learning”

