Tricking binary trees: The (in)security of machine learning
JOE GARDINER (@THECYBERJOE)
Who am I?
Final year PhD student at Lancaster University
President, Lancaster University Ethical Hacking Group (LUHack)
Joining University of …
2016
Given a sufficiently large dataset…
[Diagram: “ML goes here” between the input data and the answers]
Toy example: learned rules such as “contains a and x; length is always 9 characters”.
Similar items can be grouped using a clustering algorithm.
IT’S JUST IF STATEMENTS, RIGHT?
Supervised learning: learn from labelled training data to make predictions about new data.
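To make the “just if statements” joke concrete: a trained decision tree really does compile down to nested conditionals over feature values. A minimal sketch, with invented features and thresholds:

```python
# A trained decision tree is just nested if statements.
# Toy sketch: classify an email from two made-up features
# (the thresholds here are invented for illustration).

def classify_email(num_links: int, num_spam_words: int) -> str:
    # Root split: lots of links is suspicious
    if num_links > 5:
        # Second split: confirm with spammy vocabulary
        if num_spam_words > 2:
            return "spam"
        return "ham"
    return "ham"

print(classify_email(num_links=8, num_spam_words=4))  # spam
print(classify_email(num_links=1, num_spam_words=0))  # ham
```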
k-nearest neighbours: take a majority vote among the k closest training points, e.g. 3 As, 1 B -> assign A.
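A toy illustration of that vote, with made-up training points and k = 4:

```python
# Minimal k-nearest-neighbours sketch of the "3 As, 1 B -> assign A" vote.
from collections import Counter
import math

train = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((0.8, 1.1), "A"),
         ((3.0, 3.0), "B"), ((3.2, 2.8), "B")]

def knn_predict(x, k=4):
    # Sort training points by Euclidean distance to x
    neighbours = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    # Majority vote among the k nearest labels
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

print(knn_predict((1.1, 1.0)))  # 3 As and 1 B among the 4 nearest -> "A"
```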
Support vector machines: learn a hyperplane separating the classes; a new point is labelled by which side of the hyperplane it falls on.
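A minimal sketch of that decision rule, with invented weights standing in for a trained SVM:

```python
# A linear classifier labels a point by which side of the hyperplane
# w.x + b = 0 it falls on. Weights are invented for illustration.
import numpy as np

w = np.array([1.0, -2.0])  # normal vector of the hyperplane
b = 0.5                    # offset

def classify(x):
    # Positive side -> malicious, negative side -> benign (arbitrary labels)
    return "malicious" if np.dot(w, x) + b > 0 else "benign"

print(classify(np.array([3.0, 0.5])))  # w.x + b = 2.5  -> malicious
print(classify(np.array([0.0, 2.0])))  # w.x + b = -3.5 -> benign
```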
k-means clustering: assign all data points to the nearest centroid, recompute the centroids, and repeat until stable.
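The full loop, sketched on synthetic data:

```python
# Sketch of the k-means loop: assign every point to its nearest
# centroid, recompute centroids, repeat. Data is synthetic.
import numpy as np

rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0, 0.5, (20, 2)),   # cluster around (0, 0)
                    rng.normal(4, 0.5, (20, 2))])  # cluster around (4, 4)

k = 2
centroids = points[rng.choice(len(points), k, replace=False)]
for _ in range(10):
    # Assignment step: index of the nearest centroid for each point
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    assign = dists.argmin(axis=1)
    # Update step: each centroid moves to the mean of its points
    # (a robust version would also handle empty clusters)
    centroids = np.array([points[assign == j].mean(axis=0) for j in range(k)])

print(centroids)  # roughly (0, 0) and (4, 4)
```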
Hierarchical clustering: each layer is formed by merging clusters from the layer below.
ML is everywhere: it shapes what you read and models user behavior.
In security: e.g. separating legitimate traffic from malware traffic.
[Diagram: legitimate traffic vs. two groups of malware traffic]
Attack taxonomy (Barreno et al.):
Influence
- Causative: alter the training process through influence
- Exploratory: use probing or offline analysis to discover information
Specificity
- Targeted: focus on a particular set of points
- Indiscriminate: no specific target, flexible goal, e.g. increase false positives
Security violation
- Integrity: result in attack points labelled as normal (false negatives)
- Availability: increase false positives and false negatives so the system becomes unusable
Marco Barreno, Blaine Nelson, Russell Sears, Anthony D. Joseph, and J. D. Tygar. Can Machine Learning Be Secure? (2006)
Nedim Srndic and Pavel Laskov. Practical Evasion of a Learning-Based Classifier: A Case Study. (2014)
Attacker model, according to Biggio et al. (for classifiers):
1. The attacker’s influence, in terms of causative or exploratory.
2. Whether (and to what extent) the attack affects the class priors.
3. The amount of, and which, samples (training and testing) that can be controlled in each class.
4. Which features, and to what extent, can be modified by the adversary.
Also applicable to clustering (with the exception of 2).
Battista Biggio, Giorgio Fumera, and Fabio Roli. Security Evaluation of Pattern Classifiers under Attack. IEEE Transactions on Knowledge and Data Engineering (2014).
Surrogate classifier: the attacker uses the exact same algorithm as the target learner, and may use an estimated dataset for training/testing.
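A minimal sketch of the surrogate idea, using a toy nearest-centroid learner (not any algorithm from the literature) and invented data distributions:

```python
# Limited-knowledge setting: the attacker runs the same algorithm as
# the target, but trains it on their own estimated dataset.
import numpy as np

rng = np.random.default_rng(1)

def train(benign, malicious):
    # The "model" is just one centroid per class
    return benign.mean(axis=0), malicious.mean(axis=0)

def predict(model, x):
    c_ben, c_mal = model
    return "mal" if np.linalg.norm(x - c_mal) < np.linalg.norm(x - c_ben) else "ben"

# Defender's real training data
target = train(rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2)))
# Attacker's estimate: similar distributions, smaller and noisier sample
surrogate = train(rng.normal(0.3, 1.2, (20, 2)), rng.normal(4.7, 1.2, (20, 2)))

# The surrogate mostly agrees with the target, so probing it offline
# tells the attacker how the real system is likely to respond.
probes = rng.normal(2.5, 2.0, (200, 2))
agree = np.mean([predict(target, x) == predict(surrogate, x) for x in probes])
print(f"surrogate agrees with target on {agree:.0%} of probe points")
```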
Evasion: modify the malicious point at test time until it resembles a benign point.
Demonstrated against several learner variants, e.g. naive Bayes, neural networks, by changing feature values.
Mimicry: modify the malicious file to match a benign file as seen by the learner.
Img Src: Practical Evasion of a Learning-Based Classifier: A Case Study, Srndic and Laskov 2014
A PDF file consists of a header, body, cross-reference table and trailer.
Content injected between the cross-reference table and the trailer is ignored by PDF viewers.
The injected content can modify 35 of the classifier’s features.
Features are padded towards target values: e.g. if the current count of “obj” is 5 and the target is 7, the attack string “obj obj” is injected; the author-name length is set to 3 by adding “/Author(abc)”.
Img Src: Practical Evasion of a Learning-Based Classifier: A Case Study, Srndic and Laskov 2014
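A toy sketch of the padding trick (the feature names, targets and byte layout are simplified; the real attack injects into the gap before the trailer rather than appending):

```python
# Append bytes that PDF readers ignore but that a naive feature
# extractor counts. Targets are invented for illustration.

def pad_features(pdf_bytes: bytes, target_obj_count: int, author: str) -> bytes:
    current = pdf_bytes.count(b"obj")
    missing = max(0, target_obj_count - current)
    # Injected content the viewer never parses
    injection = b" ".join([b"obj"] * missing) + f"/Author({author})".encode()
    return pdf_bytes + injection

doc = b"%PDF-1.4 1 0 obj ... endobj trailer"
evasive = pad_features(doc, target_obj_count=7, author="abc")
print(evasive.count(b"obj"), b"/Author(abc)" in evasive)  # 7 True
```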
The attack is evaluated in both perfect knowledge (PK) and limited knowledge (LK) cases.
Also applied to website fingerprinting (identifying web pages by traffic volume).
Obfuscation attack: create a version of the malicious points that resembles benign points, to the point where they are clustered together.
Demonstrated against single-linkage hierarchical clustering.
Expected to generalise to other algorithms based on distance functions.
Requires knowledge of the clusters and the ability to change feature values.
Gradient-based evasion: find the optimal attack point by iteratively changing features until the point is misclassified.
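A minimal sketch of that loop against an invented linear score, where the gradient with respect to the input is simply the weight vector (real attacks also constrain how far the point may move):

```python
# Gradient-descent evasion: starting from the malicious point, walk
# down the gradient of the classifier's score until the label flips.
import numpy as np

w = np.array([2.0, 1.0])   # invented weights (score > 0 => malicious)
b = -1.0

def score(x):
    return np.dot(w, x) + b

x = np.array([2.0, 3.0])   # malicious sample, score = 6 > 0
step = 0.1
while score(x) > 0:
    # For a linear score the gradient w.r.t. x is just w
    x = x - step * w
print(x, score(x))         # first point at/over the decision boundary
```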
Only requires the ability to change feature values.
Attack effectiveness is reduced as the accuracy of the surrogate dataset is reduced.
Label flips: flipping the labels of a small fraction of training points causes widespread misclassification.
Source: Adversarial Label Flips Attack on Support Vector Machines, Xiao et al 2012
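A rough sketch of the effect; random flips stand in for the optimised flips of Xiao et al., so the accuracy drop here is milder than in the paper:

```python
# Label-flip attack: re-train the SVM with some training labels
# flipped and compare test accuracy. Data is synthetic.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y_train = np.array([0] * 100 + [1] * 100)
X_test = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y_test = np.array([0] * 100 + [1] * 100)

clean = SVC(kernel="linear").fit(X_train, y_train)
print("clean accuracy:", clean.score(X_test, y_test))

# Attacker flips 20% of the training labels
flip = rng.choice(len(y_train), size=40, replace=False)
y_poisoned = y_train.copy()
y_poisoned[flip] = 1 - y_poisoned[flip]

poisoned = SVC(kernel="linear").fit(X_train, y_poisoned)
print("poisoned accuracy:", poisoned.score(X_test, y_test))
```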
Poisoning the spam filter’s learner: an indiscriminate attack drives a 90% FP rate, while a focused attack on the target email causes a 60% FP rate.
May be viable with a surrogate dataset, although not tested in the literature
Defences typically reduce performance while not under attack.
Bridge-based poisoning: inject points between clusters (evaluated on malware, a C&C dataset and handwritten digits). Causes clusters to merge in all three cases.
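A minimal sketch of the bridging effect on single-linkage clustering, with invented data rather than the datasets from the paper:

```python
# Single-linkage clustering merges two clusters once a chain of
# injected "bridge" points connects them.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
benign = rng.normal(0, 0.3, (30, 2))
malware = rng.normal(5, 0.3, (30, 2))

def n_clusters(points):
    # Cut the single-linkage dendrogram at a fixed distance threshold
    labels = fcluster(linkage(points, method="single"), t=1.0,
                      criterion="distance")
    return len(set(labels))

print("before:", n_clusters(np.vstack([benign, malware])))  # 2 clusters

# Attacker injects a line of bridge points between the clusters
bridge = np.linspace([0, 0], [5, 5], num=12)
print("after:", n_clusters(np.vstack([benign, malware, bridge])))  # 1 cluster
```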
In deployment, the learner sits inside a much larger system (e.g. two rounds of clustering with different algorithms). Attacks are evaluated against simplified systems (usually just feature extraction and the algorithm itself), so it is unclear how attacks will perform against complete systems.
Open problems: defences receive little attention (attacks are better selling points for papers); settings vary widely (classifiers, clustering, etc.), and there are no clear metrics for easily evaluating security performance.
PRE-PRINT PAPER AVAILABLE FOR FREE AT HTTP://EPRINTS.LANCS.AC.UK/83888/1/PAPER_ACMSURVEYS_CHANGES.PDF (OR EMAIL ME FOR PRINT VERSION J.GARDINER1@LANCASTER.AC.UK)