MARVIN: Efficient And Comprehensive Mobile App Classification Through Static and Dynamic Analysis


SLIDE 1

MARVIN: Efficient And Comprehensive Mobile App Classification Through Static and Dynamic Analysis

Martina Lindorfer, Matthias Neugschwandtner, Christian Platzer
SBA Research, Vienna, Austria; IBM Research, Zurich, Switzerland; International Secure Systems Lab, Vienna University of Technology, Austria

SLIDE 2

Martina Lindorfer: MARVIN (COMPSAC 2015)

State of Mobile Malware


SLIDE 3


Real or Fake Flappy Bird App?

Signals a user might check: origin, reviews, permissions, antivirus, app verification

SLIDE 4

Use Cases


SELECT * FROM apps
 WHERE malice_score > 5.0
   AND has_nw_traffic = True
 ...

SLIDE 5

Outline

  • App Classification
  • Evaluation
  • Future Work and Conclusion


App Classification

SLIDE 6

Classification Goals

  • Use machine learning to classify Android apps
  • Address the grey area between malware and goodware
  • Provide the user with a malice score from 0 to 10
  • Address drawbacks of related work, which:
    • only considers static features
    • is trained and evaluated on very small datasets
    • does not account for the history of the dataset
  • Long-term practicality through efficient retraining

SLIDE 7

[System overview diagram: reference apps (training mode) and end-user apps (classification mode) pass through static and dynamic analysis for feature extraction; feature selection and classifier training yield a model that outputs a malice score]

System Overview

SLIDE 8

Static vs. Dynamic Analysis

  • Static analysis…
    • code is not executed
    • all possible branches can be examined (in theory)
    • quite fast
  • Problems of static analysis…
    • undecidable in the general case; approximations necessary
    • obfuscated & packed code
    • self-modifying code
    • code (down)loaded at runtime

SLIDE 9

Static vs. Dynamic Analysis

  • Dynamic analysis…
    • code is executed
    • sees the behavior that is actually executed
    • sees dynamically loaded code
  • Problems of dynamic analysis…
    • in general, only a single path is examined
    • the analysis environment is possibly not invisible
    • scalability issues


Combine features from static AND dynamic analysis

SLIDE 10

Feature Extraction in ANDRUBIS

  • Extended the ANDRUBIS app analysis sandbox [BADGERS2014]
  • Static analysis:
    • required/used permissions, activities, services, receivers, …
    • certificate metadata (owner, validity, …)
    • included libraries
  • Dynamic analysis:
    • file/network/phone activities
    • cryptographic operations
    • leaked data
    • loading of dynamic code (DEX and native code)
  • Output: a sparse feature vector of binary features
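The sparse binary representation can be sketched as follows; the feature names (`perm:…`, `dyn:…`, `cert:…`) are made-up placeholders, not MARVIN's actual feature vocabulary:

```python
def build_vocabulary(reports):
    """Assign a column index to every feature seen during training."""
    vocab = {}
    for features in reports:
        for f in features:
            vocab.setdefault(f, len(vocab))
    return vocab

def to_sparse_vector(features, vocab):
    """Represent an app as the sorted indices of its active binary
    features; features unseen during training are dropped."""
    return sorted(vocab[f] for f in features if f in vocab)

# Hypothetical per-app analysis reports (training set):
reports = [
    ["perm:SEND_SMS", "dyn:loads_dex"],
    ["perm:INTERNET", "cert:debug_key"],
]
vocab = build_vocabulary(reports)
print(to_sparse_vector(["perm:SEND_SMS", "perm:INTERNET"], vocab))  # [0, 2]
```

Storing only the indices of active features keeps the representation compact when the full feature space has very many dimensions but each app activates only a handful.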

SLIDE 11

[System overview diagram, repeated: reference apps (training mode) and end-user apps (classification mode) pass through static and dynamic analysis for feature extraction; feature selection and classifier training yield a model that outputs a malice score]

System Overview

SLIDE 12

Classification Challenges

  • High-dimensional feature space
    • explicit feature selection: order features by discriminative power (F-score)
    • implicit feature selection: order features by the weights the classifier assigns
  • Sparse data
  • Grey area between malware and goodware
    • the classifier outputs the probability that a sample belongs to a class
    • scale this probability to the interval [0, 10]
  • Performance
    • experiments with an SVM and a linear classifier with different regularization methods
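The two scoring steps above can be sketched as follows. This uses the standard (libsvm-style) F-score formula for a single binary feature and a simple linear scaling of the classifier's probability; whether MARVIN uses exactly these variants is an assumption:

```python
from statistics import mean

def f_score(pos, neg):
    """F-score of one binary feature; pos/neg are its 0/1 values over
    the malware and goodware samples respectively."""
    m, mp, mn = mean(pos + neg), mean(pos), mean(neg)
    var_p = sum((x - mp) ** 2 for x in pos) / (len(pos) - 1)
    var_n = sum((x - mn) ** 2 for x in neg) / (len(neg) - 1)
    denom = var_p + var_n
    return ((mp - m) ** 2 + (mn - m) ** 2) / denom if denom else float("inf")

def malice_score(p_malware):
    """Scale the classifier's malware probability to the interval [0, 10]."""
    return round(10 * p_malware, 2)

# A feature mostly present in malware and mostly absent in goodware:
print(f_score(pos=[1, 1, 1, 0], neg=[0, 0, 0, 1]))  # 0.25
print(malice_score(0.87))                            # 8.7
```

Ranking all features by this score and keeping the top k is the explicit selection path; the implicit path instead sorts features by the absolute weight a trained linear model assigns them.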

SLIDE 13

Outline

  • App Classification
  • Evaluation
  • Future Work and Conclusion


Evaluation

SLIDE 14

Evaluation Overview

  • Large training and testing sets:
    • goodware apps from the Google Play Store
    • known malware with AV labels from VirusTotal
    • 135,823 unique Android applications (15,741 known malware)

Goals:

  1. Evaluate the accuracy of different classifiers
  2. Evaluate performance (market-scale classification)
  3. Evaluate long-term practicality
     • the history of samples in the dataset matters [ESSoS2015]
     • estimate retraining intervals and efficiency
  4. Evaluate the most distinguishing features

SLIDE 15

Classification Accuracy


  • Overall accuracy of 99.83%
  • 0.0275% false positives
  • 1.3543% false negatives
  • Bayesian detection rate of 98.24%
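The Bayesian detection rate is the probability that an app flagged as malicious actually is malware, which depends on the malware base rate as well as on the error rates. A sketch using this slide's error rates with an assumed, purely illustrative 10% base rate (the 98.24% figure above corresponds to the paper's own evaluation prior):

```python
def bayesian_detection_rate(tpr, fpr, base_rate):
    """P(malware | alarm) by Bayes' theorem."""
    alarms = tpr * base_rate + fpr * (1 - base_rate)
    return tpr * base_rate / alarms

tpr = 1 - 0.013543  # true-positive rate = 1 - false-negative rate
fpr = 0.000275      # false-positive rate
print(round(bayesian_detection_rate(tpr, fpr, base_rate=0.10), 4))  # 0.9975
```

The lower the base rate of malware in the wild, the more even a tiny false-positive rate erodes this number, which is why the metric is reported alongside plain accuracy.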
SLIDE 16

Market-Scale Classification

  • Extrapolated to the ~1,500,000 apps in Google Play:
    • best configuration: 58.5 expected false alarms
    • worst configuration: 471 expected false alarms
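The false-alarm counts here are simply the false-positive rate multiplied by the market size; as a cross-check, slide 15's overall configuration (0.0275% false positives) lands between the best and worst configurations shown:

```python
def expected_false_alarms(fpr, n_apps):
    """Expected number of apps wrongly flagged, assuming (almost) all
    n_apps are benign."""
    return fpr * n_apps

market_size = 1_500_000  # approximate Google Play size at the time
print(expected_false_alarms(0.000275, market_size))  # ≈ 412.5
```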

SLIDE 17

Market-Scale Classification

  • Google Play: up to 45,000 new apps per month
  • Our current capacity: 3,500 apps/day
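A back-of-the-envelope check (30-day month assumed) that the stated analysis capacity keeps pace with Google Play's growth:

```python
daily_capacity = 3_500       # apps analyzed per day
new_apps_per_month = 45_000  # upper bound of new Google Play apps per month
monthly_capacity = daily_capacity * 30
print(monthly_capacity, monthly_capacity >= new_apps_per_month)  # 105000 True
```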

SLIDE 18

Long-Term Practicality (Fewer Features)

SLIDE 19

Long-Term Practicality (More Features)

SLIDE 20

Distinguishing Features

  • Gain insight into the classification through F-scores/feature weights
  • Features most relevant for classifying malware:
    • required/used permissions
    • certificates
    • SMS-related features
    • information leaks
    • dynamic code loading
    • network activity and contacted hosts

SLIDE 21

Outline

  • App Classification
  • Evaluation
  • Future Work and Conclusion


Future Work and Conclusion

SLIDE 22

Future Work

  • Dynamic features++:
    • system-level events from native code analysis
    • more intelligent, user-like UI interactions
  • Static features++:
    • meta information from app markets via AndRadar [DIMVA2014]
    • interception of the app installation process
  • Defence against analysis evasion (arms race)

SLIDE 23

Conclusion

  • Classification of Android apps using machine learning
    • based on static AND dynamic features
    • results presented as a malice score
  • Large-scale evaluation on over 135,000 apps
    • correctly classifies 98.24% of malware samples
    • very low false-positive rate of < 0.04%
    • retraining to maintain accuracy over time
  • Publicly available for submissions through a web interface and a dedicated mobile app

SLIDE 24

Questions?

Email: mlindorfer@iseclab.org, andrubis@iseclab.org
Twitter: @iseclaborg
Web: http://www.iseclab.org/people/mlindorfer
 https://anubis.iseclab.org
 https://play.google.com/store/apps/details?id=org.iseclab.andrubis

SLIDE 25

References

[BADGERS2014] Martina Lindorfer, Matthias Neugschwandtner, Lukas Weichselbaum, Yanick Fratantonio, Victor van der Veen, Christian Platzer. Andrubis - 1,000,000 Apps Later: A View on Current Android Malware Behaviors. International Workshop on Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS), 2014.

[ESSoS2015] Kevin Allix, Tegawendé F. Bissyandé, Jacques Klein, Yves Le Traon. Are Your Training Datasets Yet Relevant? International Symposium on Engineering Secure Software and Systems (ESSoS), 2015.

[DIMVA2014] Martina Lindorfer, Stamatis Volanis, Alessandro Sisto, Matthias Neugschwandtner, Elias Athanasopoulos, Federico Maggi, Christian Platzer, Stefano Zanero, Sotiris Ioannidis. AndRadar: Fast Discovery of Android Applications in Alternative Markets. Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), 2014.
