Malware Defense II
TDDD17 – Information Security, Second Course
Alireza Mohammadinodooshan
Department of Computer and Information Science Linköping University
What Has Been Covered
Malware
– Different types of functionality
– Different infection methods
– Signature-based detection
– More complex signatures and static heuristics
– Static unpacking and emulation
– Cloud-based detection
– Machine learning detection
1/31/2020 2 TDDD17 - Malware Defense II
– Specific challenges
– Specific risks
– Security models and their effect on malware detection
– Detection countermeasures
– Motivation
– Terminology
– Learning types
– Machine learning-based malware detection challenges
https://gs.statcounter.com/os-market-share
target for the malware authors.
to 20 million
https://www.symantec.com/content/dam/symantec/docs/reports/istr-24-2019-en.pdf
– Phone – Tablet – Watch – TV
– PawnStorm.A
– YiSpecter
– Android/Filecoder.C
your local files in exchange for a ransom between $94 and $188.
– Plankton
and sends premium SMS messages
https://forensics.spreitzenbarth.de/
1. Lots of users
– Botnets
– Banking info – Personal Photos – Contact info
– 4G – Wifi – Bluetooth
– Limited capabilities for on-device detection
– Repackaged apps
and re-bundle and publish the app
– Fake apps also exist!
in apps is moved to app stores to analyze the apps
– For third-party stores, and to some extent even for the Google Play store, this trust is misplaced (we will elaborate on this …)
– Attackers are also motivated to deliver their malware through stores (official or third-party)
– Drive-by downloads also exist, but are rare
have a better understanding of its vulnerabilities if they exist
compared to PC malware due to stronger isolation between apps
– Memory isolation – User isolation
limited access to the system as well as to other apps' resources
– Battery draining – Disabling system functions
– Sending SMS or MMS messages to premium numbers – Dialing premium numbers – Deleting important data
Peng, S., Yu, S., & Yang, A. (2013). Smartphone malware and its propagation modeling: A
– Privacy – Stealing bank account information
– Denial-of-service (DoS)
– Startup and updates are authorized
– File-level data protection uses strong encryption keys derived from the user’s unique passcode.
– Applications run in their own sandboxes.
– More important than this …
https://developer.apple.com/app-store/review/
vetting process
– Manual testing
– Static analysis
– Apps cannot perform actions beyond what they declare
– provides standard interfaces that make the device hardware capabilities available to the higher-level Java API framework.
– On newer Android devices, each app runs in its own process with its own instance of the Android Runtime (ART). Before ART, the Dalvik VM was used
– It is possible to package compiled C/C++ code with an APK; this code can be called through the Java Native Interface (JNI)
https://developer.android.com/guide/platform
https://justamomentgoose.wordpress.com/2013/06/04/android-started-note-2-android-file-apk-decompile/
system regarding this app
– Minimum Android API
– Linked libraries
– Components, activities, services, …
– Required permissions
– Android automatically assigns a unique UID to each app at installation – App is allowed to access :
– More access :
– <uses-permission android:name="android.permission.READ_PHONE_STATE" />
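As a rough illustration of how the requested permissions can be read out of a manifest, the sketch below parses a decoded, plain-text AndroidManifest.xml with Python's standard library. It assumes the manifest has already been converted from the binary XML form stored inside an APK. The sample manifest, the package name and the `extract_permissions` helper are made up for this example.

```python
import xml.etree.ElementTree as ET

# XML namespace that Android uses for attributes such as android:name.
ANDROID_NS = "http://schemas.android.com/apk/res/android"


def extract_permissions(manifest_xml):
    """Return the permission names requested in a decoded manifest string."""
    root = ET.fromstring(manifest_xml)
    names = []
    for element in root.iter("uses-permission"):
        # Namespaced attributes are looked up as {namespace}localname.
        name = element.get("{" + ANDROID_NS + "}name")
        if name:
            names.append(name.strip())
    return names


# Hypothetical manifest, invented for illustration only.
SAMPLE_MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
    <uses-permission android:name="android.permission.READ_PHONE_STATE" />
    <uses-permission android:name="android.permission.SEND_SMS" />
</manifest>
"""

permissions = extract_permissions(SAMPLE_MANIFEST)
print(permissions)
```

A list like this is the kind of input a permission-based detection technique could work on.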
– Does not require an exhaustive app vetting process
– Apps are dynamically tested with a Google security service known as Bouncer.
– Researchers have shown the feasibility of fingerprinting Bouncer
– Malware may be able to bypass Bouncer
detect that they are running in Bouncer, they do not show their actual behavior
– Signature-Based Technique
– Permission-Based Technique
malware samples
– Dalvik Bytecode-Based Technique
samples (API calls, data flows, …)
– Sequence of system calls – Accessed files
– Obfuscation
harder
– Packing
– Sandbox detection
– E.g. the sandbox does not support GPS or does not report realistic GPS accuracy
– Garble the identifiers used in the source code, e.g. rename them to 'a', 'b', 'aa', 'ab', 'ac'
– Replace the constant strings in the dex file with their encrypted form and add code that decrypts them on the fly
– Injecting dead code
– Re-ordering statements
– Inserting opaque predicates
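To see why encrypted constant strings make signature matching harder, consider the minimal sketch below. It uses a repeating-key XOR purely as a stand-in for whatever cipher an obfuscator might apply. The key, the constant and the helper names are hypothetical.

```python
def xor_bytes(data, key):
    """XOR every byte of data with a repeating key (same call encrypts and decrypts)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


KEY = b"k3y"                 # hypothetical key embedded by the obfuscator
PLAINTEXT = "example.com"    # hypothetical constant string in the original app

# What the obfuscated app stores instead of the plain string.
ENCRYPTED = xor_bytes(PLAINTEXT.encode("utf-8"), KEY)


def decrypt_constant(blob):
    """Small decryption stub the obfuscator would add to the app."""
    return xor_bytes(blob, KEY).decode("utf-8")


# A static signature that searches for the plain string no longer matches ...
signature = b"example.com"
found_in_stored_form = signature in ENCRYPTED

# ... although the app still recovers the original value at run time.
recovered = decrypt_constant(ENCRYPTED)
print(found_in_stored_form, recovered)
```

The string only appears in clear form at run time, which is one reason dynamic analysis can complement static signatures.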
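var1 = 10
var2 = [var1 for i in range(10)]
# Opaque predicate: always true, but not obvious to a static analyzer
if var1 == var2[0]:
    pass  # the guarded original code goes here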
couldn’t keep up with the emerging flow of malware.
– Zero-day malware
relation between the app features is hard for a human to find
– A procedure we use to prioritize the apps that should be examined
computers the ability to learn without being explicitly programmed
– Learning from the data
– It is used when we want to (explicitly or implicitly) learn the relation using some available data (known as training data)
– Unsupervised – Supervised
– Given X (X1 and X2 in the following figure)
– The goal is to discover the structure of the data
– Application
» Malware family detection
– Given X and y (shown in the following figure)
– We try to find the relation between them (X and y)
– Malware detection
– Discriminative methods
– Anomaly detection
– When y is a discrete variable
– When y is a continuous variable
family (e.g. can be used for triaging the app)
[Figure: Training: benign apps and malware samples, feature extraction, benign app features and malware features, training, predictive model. Testing: unknown app, predictive model, output benign or malware.]
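The pipeline in the figure (feature extraction, training, then testing an unknown app) can be sketched end to end in a few lines. This is a toy example under stated assumptions: apps are represented only by their permission sets, the feature list and all samples are invented, and a nearest-centroid rule stands in for whichever predictive model is actually used.

```python
# Hypothetical feature vocabulary: one binary feature per permission.
FEATURES = ["SEND_SMS", "READ_CONTACTS", "INTERNET", "CAMERA"]


def extract_features(permissions):
    """Feature extraction: map a permission set to a 0/1 vector."""
    return [1.0 if name in permissions else 0.0 for name in FEATURES]


def centroid(vectors):
    """Component-wise mean of a list of equally long vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(benign_apps, malware_apps):
    """Training: the 'model' is simply one centroid per class."""
    return {
        "benign": centroid([extract_features(app) for app in benign_apps]),
        "malware": centroid([extract_features(app) for app in malware_apps]),
    }


def predict(model, permissions):
    """Testing: assign the label of the closest class centroid."""
    vector = extract_features(permissions)
    return min(model, key=lambda label: squared_distance(vector, model[label]))


# Invented training samples, for illustration only.
benign_apps = [{"INTERNET"}, {"INTERNET", "CAMERA"}, {"CAMERA"}]
malware_apps = [
    {"SEND_SMS", "INTERNET"},
    {"SEND_SMS", "READ_CONTACTS"},
    {"SEND_SMS", "READ_CONTACTS", "INTERNET"},
]

model = train(benign_apps, malware_apps)
unknown_app = {"SEND_SMS", "READ_CONTACTS", "CAMERA"}
prediction = predict(model, unknown_app)
print(prediction)
```

With so few samples the result says nothing about real malware; the point is only the order of the steps.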
– Example of bad practice
apps, but all the malware samples we collected have a size between 1 and 2 megabytes, which is not representative of real malware
– Can have disastrous effects as we have seen in the previous lecture
– Examples
– The header values
– Set of Privileges
– Obfuscation status
– Feature selection methods can be used to limit the number of features
during the training phase are optimized using the training data
– This optimization happens based on a particular metric.
– This particular metric is usually the classification
– We have a set of (Xi, Yi) training points
– We want to find the regression line
points
– aopt ? – bopt ?
– For each point Xi compute the response Fi
– Compute ERRtot = SUM((Fi - Yi)^2)
– Now we can compute aopt and bopt
– Closed form
– Optimization
– For the classification, for example
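A worked example of the closed-form option may help. The sketch below assumes the regression line has the form Fi = a * Xi + b (the roles of a and b as slope and intercept are an assumption) and uses the standard least-squares formulas. The data points are invented.

```python
def fit_line(xs, ys):
    """Closed-form least-squares estimates for the line F = a * X + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    a = s_xy / s_xx              # slope that minimizes the total squared error
    b = mean_y - a * mean_x      # intercept: the line passes through the means
    return a, b


def total_squared_error(xs, ys, a, b):
    """ERRtot = SUM((Fi - Yi)^2) with Fi = a * Xi + b."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))


# Invented training points that lie exactly on Y = 2 * X + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

a_opt, b_opt = fit_line(xs, ys)
err_opt = total_squared_error(xs, ys, a_opt, b_opt)
err_other = total_squared_error(xs, ys, a_opt + 0.5, b_opt)
print(a_opt, b_opt, err_opt, err_other)
```

Any other choice of a and b, such as the perturbed slope above, gives a larger total error on these points. An iterative optimizer such as gradient descent would search for the same minimum when no closed form is available.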
b) we test it on testing data.
– To see whether it can generalize to unseen data
– Or it has just memorized the training data
– We train a model by minimizing its error on the training data
– The training error is different from the testing error
– This testing error value is computed on test data.
– The model is unable to obtain a low error even on the training set.
– The training error is small enough but not the testing error.
https://blog.booleanhunter.com/using-machine-learning-to-predict-the-quality-of-wines/
[Figure: error as a function of model capacity]
https://www.kaggle.com/dansbecker/underfitting-and-overfitting
– Each observation in our dataset has the opportunity of being tested
– We divide the dataset into k sets
– For k rounds, we go over the dataset and in each round:
we can tune the model capacity
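The k-round procedure can be sketched as an index generator. The sketch assumes the standard k-fold scheme, in which each round holds out one of the k sets for testing and trains on the remaining ones. The function name is made up, and shuffling is left out to keep the example deterministic.

```python
def k_fold_splits(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering all samples."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]                 # held-out fold
        train_idx = indices[:start] + indices[start + size:]   # everything else
        splits.append((train_idx, test_idx))
        start += size
    return splits


splits = k_fold_splits(10, 3)
for round_number, (train_idx, test_idx) in enumerate(splits, start=1):
    print(round_number, train_idx, test_idx)
```

Averaging the test error over the k rounds gives one estimate per model configuration, which is what makes the scheme usable for comparing model capacities. In practice the indices are usually shuffled first.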
http://ethen8181.github.io/machine-learning/model_selection/model_selection.html
– Suppose 99 percent of the samples in the dataset are benign
– Now a naïve malware detection classifier that classifies all the samples as benign reaches an accuracy of 99 percent
– Few other models will reach such a high accuracy
– But is accuracy a good metric to optimize the model for?
– Evidently not. This model cannot detect any malware!
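– Accuracy: $\frac{TP + TN}{TP + FP + TN + FN}$
– Precision: $\frac{TP}{TP + FP}$
– Recall: $\frac{TP}{TP + FN}$
– F1 score: $\frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$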
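The effect of an imbalanced dataset on these measures can be checked numerically. The sketch below computes accuracy, precision, recall and F1 from confusion-matrix counts for a classifier that labels every sample as benign, on an invented dataset of 99 benign samples and 1 malware sample. Treating malware as the positive class and returning 0.0 when a denominator is zero are assumptions of this example.

```python
def confusion_counts(y_true, y_pred, positive="malware"):
    """Count true/false positives and negatives for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, fp, tn, fn


def safe_div(numerator, denominator):
    """Return 0.0 instead of failing when a metric is undefined (assumption)."""
    return numerator / denominator if denominator else 0.0


def evaluate(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Invented dataset: 1 malware sample among 99 benign ones.
y_true = ["malware"] + ["benign"] * 99
# Naive classifier: everything is benign.
y_pred = ["benign"] * 100

scores = evaluate(y_true, y_pred)
print(scores)
```

Accuracy looks excellent while recall and F1 are zero, which is why the latter measures are more informative when malware is rare.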
https://en.wikipedia.org/wiki/Precision_and_recall
machine learning methods.
– Recall the bad practice for data collection
points
– We cannot collect representative data there!
– Lots of users, privacy concerns, Widespread access to networks, ...
– System damage, Economic risks, ....
– We discussed the differences between the iOS and Android vetting processes
– We have seen how they affect the malware prevalence on each platform
– Static, dynamic, hybrid
– Supervised – Unsupervised
– Collecting training data – Extracting features from training data – Training the model – Validating the model
– Under- and Over-fitting – Imbalanced datasets – Performance evaluation measures – Dataset quality