SLIDE 1

Albert-Ludwigs-Universität Freiburg Fakultät für Angewandte Wissenschaften Dec 5, 2006

Alert classification to reduce false positives in intrusion detection

Tadeusz Pietraszek /tʌ'deuʃ pɪe'trʌʃek/

tadek@pietraszek.org

PhD Defense Presentation

SLIDE 2

Thesis Statement

The thesis lies at the intersection of machine learning and computer security:
  • 1. Using machine learning, it is possible to train classifiers in the form of human-readable classification rules by observing the human analyst.
  • 2. Abstaining classifiers can significantly reduce the number of misclassified alerts with an acceptable abstention rate and are useful in intrusion detection.
  • 3. Combining supervised and unsupervised learning in a two-stage alert-processing system forms a robust framework for alert processing.

SLIDE 3

Outline

  • Background and problem statement.
  • 1. Adaptive learning for alert classification.
  • 2. Abstaining classifiers.
  • 3. Combining supervised and unsupervised learning.

  • Summary and conclusions.
SLIDE 4

Intrusion Detection Background

  • Intrusion Detection Systems (IDSs) [And80, Den87] detect intrusions, i.e., sets of actions that attempt to compromise the integrity, confidentiality, or availability of a computing resource [HLMS90].
  • IDSs have to be effective (detect as many intrusions as possible) and keep false positives at an acceptable level; however, in real environments 95–99% of alerts are false positives [Axe99, Jul01, Jul03].
  • Eliminating false positives is a difficult problem:
  – intrusions may differ only slightly from normal actions (IDSs have limited context-processing capabilities),
  – writing a good signature is a difficult task (specific vs. general),
  – actions considered intrusive in one system may be normal in others,
  – viewed as a statistical problem – the base-rate fallacy.

SLIDE 5

Global picture – IDS monitoring

Manual knowledge acquisition is not used for classifying alerts.
  – Fact 1: Large database of historical alerts.
  – Fact 2: Analyst typically analyzes alerts in real time.

SLIDE 6

Problem statement

  • Given

  – A sequence of alerts (A1, A2, …, Ai, …) in an alert log L
  – A set of classes C = {C1, C2, …, Cn}
  – An intrusion detection analyst O sequentially and in real time assigning classes to alerts
  – A utility function U describing the value of a classifier to the analyst O

  • Find

– A system classifying alerts, maximizing the utility function U

  • Misclassified alerts
  • Analyst’s workload
  • Abstentions
SLIDE 7

Outline

  • Background and problem statement.
  • 1. Adaptive learning for alert classification.
  • 2. Abstaining classifiers.
  • 3. Combining supervised and unsupervised learning.

  • Summary and conclusions.
SLIDE 8

ALAC (Adaptive Learner for Alert Classification)

Automatically learn an alert classifier based on analyst’s feedback using machine learning techniques.

[Architecture: the IDS sends alerts to the alert classifier (rules and parameters, plus background knowledge); classified alerts go to the ID analyst, whose feedback provides training examples for the machine-learning component, which updates the rules and the model.]

Recommender mode:
  • Misclassifications
SLIDE 9

ALAC (Adaptive Learner for Alert Classification)

[Architecture: as in recommender mode, but with a confidence test – alerts classified with sufficient confidence are processed automatically and not shown to the analyst; the remaining alerts go to the ID analyst, whose feedback is used to update the model. See the sketch below.]

Agent mode:
  • Misclassifications
  • Analyst’s workload
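A minimal Python sketch of the agent-mode control loop described above. The classifier interface (predict, should_retrain, retrain), the confidence threshold and the sampling rate are illustrative assumptions, not the thesis implementation.

import random

def agent_mode(alerts, classifier, analyst_label, confidence_threshold=0.9, sampling_rate=0.1):
    # Agent-mode loop (sketch): confident false-positive predictions are discarded
    # automatically; everything else (plus a random audit sample) goes to the
    # analyst, whose labels become new training examples.
    training, processed = [], []
    for alert in alerts:
        label, confidence = classifier.predict(alert)          # assumed API
        audit = random.random() < sampling_rate                # keep sampling the auto-discarded stream
        if label == "FALSE_POSITIVE" and confidence >= confidence_threshold and not audit:
            processed.append((alert, label))                   # handled without the analyst
        else:
            true_label = analyst_label(alert, suggestion=label)    # recommender-style feedback
            processed.append((alert, true_label))
            training.append((alert, true_label))
            if classifier.should_retrain(len(training)):       # assumed API: batch-incremental retraining
                classifier.retrain(training)
    return processed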
SLIDE 10

Why does learning work and why can it be difficult?

  • The approach hinges on two assumptions:
  – Analysts are able to classify most alerts correctly.
  – It is possible to learn a classifier from historical alerts.
  • It is a difficult learning problem. The system should:
  1. Use the analyst’s feedback (learning from training examples).
  2. Generate the rules in a human-readable form (so their correctness can be verified).
  3. Be efficient for large data files.
  4. Use background knowledge.
  5. Assess the confidence of classification.
  6. Work with skewed class distributions / misclassification costs.
  7. Adapt to environment changes.

SLIDE 11

Requirements - revisited

  • 1. Core algorithm – RIPPER.
  • 2. Rules in readable form.
  • 3. Efficient on large datasets.
  • 4. Background knowledge represented in attribute–value form.
  • 5. Confidence – rule performance on testing data with Laplace correction (see the sketch below).
  • 6. Cost sensitivity – weighted examples.
  • 7. Incremental learning – “batch-incremental approach”; the batch size depends on the current classification accuracy.
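For requirement 5, a small sketch of the standard Laplace-corrected confidence estimate. The function name and the plain (p + 1)/(n + k) form are assumptions; the thesis only states that rule performance on testing data is Laplace-corrected.

def laplace_confidence(covered_correct, covered_total, n_classes=2):
    # Laplace-corrected estimate of a rule's accuracy on the examples it covers:
    # (p + 1) / (n + k), with p correct covered examples, n covered examples,
    # k classes. A rule covering 8 examples, 7 correctly, gets
    # (7 + 1) / (8 + 2) = 0.8 instead of the optimistic 7/8.
    return (covered_correct + 1) / (covered_total + n_classes)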

SLIDE 12

Results - Thesis Statement (1)

Adaptive Learner for Alert Classification (ALAC)

  • Human feedback, background knowledge, ML techniques.
  – Recommender mode (focusing on the misclassifications in the utility function U):
    • Good performance: fn = 0.025, fp = 0.038 (DARPA); fn = 0.003, fp = 0.12 (Data Set B).
  – Agent mode (focusing on the misclassifications and the workload in the utility function U):
    • Similar number of misclassifications, and more than 66% of false positives are discarded automatically.
  – Many rules are interpretable.

SLIDE 13

Outline

  • Background and problem statement.
  • 1. Adaptive learning for alert classification.
  • 2. Abstaining classifiers.
  • 3. Combining supervised and unsupervised learning.

  • Summary and conclusions.
SLIDE 14

Metaclassifier Aα,β

An abstaining binary classifier A is a classifier that in certain cases can refrain from classification. We construct it from two binary classifiers Cα, Cβ as follows:

$$A_{\alpha,\beta}(x) = \begin{cases} + & \text{if } C_\alpha(x) = + \\ ? & \text{if } C_\alpha(x) = - \,\wedge\, C_\beta(x) = + \\ - & \text{if } C_\beta(x) = - \end{cases}$$

where Cα, Cβ are such that

$$\forall x:\; \bigl(C_\alpha(x) = + \Rightarrow C_\beta(x) = +\bigr) \;\wedge\; \bigl(C_\beta(x) = - \Rightarrow C_\alpha(x) = -\bigr)$$

    Cα   Cβ   Result
    +    +    +
    −    +    ?
    −    −    −
    +    −    impossible

(These are the conditions used by Flach & Wu [FW05] in their work on repairing concavities of ROC curves; they are met in particular if Cα, Cβ are constructed from a single scoring classifier R.) Can we optimally select Cα, Cβ?
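A minimal sketch of the construction hinted at above: when Cα and Cβ come from a single scoring classifier R, two thresholds θα ≥ θβ automatically satisfy the required condition. Names and the threshold convention are illustrative.

def abstaining_metaclassifier(score, theta_alpha, theta_beta):
    # C_alpha and C_beta are one ranker R thresholded at theta_alpha >= theta_beta,
    # so C_alpha(x) = "+" implies C_beta(x) = "+", as required above.
    assert theta_alpha >= theta_beta
    c_alpha = score >= theta_alpha
    c_beta = score >= theta_beta
    if c_alpha:        # both classifiers say "+"
        return "+"
    if c_beta:         # the two classifiers disagree -> abstain
        return "?"
    return "-"         # both say "-"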

SLIDE 15

“Optimal” Metaclassifier Aα,β

How do we compare binary classifiers and abstaining classifiers? How do we select an optimal classifier? There is no single answer; we consider:
  – a cost-based model (an extension of [Tor04]),
  – boundary conditions:
    • a maximum number of instances classified as “?” (bounded-abstention model),
    • a maximum misclassification cost (bounded-improvement model).
SLIDE 16

Cost-based model – a simulated example

[Figure: an ROC curve with two optimal classifiers A and B, and surface plots of the misclassification cost for different combinations of A and B as a function of FP(a) and FP(b).]

The optimal Cα and Cβ satisfy

$$f'_{ROC}(fp_\alpha) = \frac{c_{21} - c_{23}}{c_{13}}\cdot\frac{N}{P}, \qquad f'_{ROC}(fp_\beta) = \frac{c_{23}}{c_{12} - c_{13}}\cdot\frac{N}{P}$$
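A small sketch that turns the slope conditions reconstructed above into code: given a 2×3 cost matrix and the class sizes N, P, it returns the target ROCCH slopes for Cα and Cβ, and refuses when the non-triviality condition on the 2×3 matrix (discussed later in the deck) is violated. The numeric values in the usage comment are made up for illustration.

def target_slopes(c12, c21, c13, c23, N, P):
    # Target ROCCH slopes for C_alpha and C_beta; returns None when only a
    # trivial binary classifier is optimal.
    nontrivial = (c12 > c13) and (c21 > c23) and (c12 * c21 >= c13 * c21 + c23 * c12)
    if not nontrivial:
        return None
    slope_alpha = (c21 - c23) / c13 * N / P
    slope_beta = c23 / (c12 - c13) * N / P
    return slope_alpha, slope_beta

# Illustrative values only:
# target_slopes(c12=1.0, c21=50.0, c13=0.5, c23=0.5, N=10000, P=500)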

SLIDE 17

Bounded models

Problem: a 2×3 cost matrix is not always given and would have to be estimated; moreover, the resulting classifier is very sensitive to c13 and c23. We therefore look for other optimization criteria for an abstaining classifier that use only a standard 2×2 cost matrix:
  – calculate the misclassification cost per classified instance,
  – follow the same reasoning to find the optimal classifier.

SLIDE 18

Bounded models equation

We obtain the following equations, which determine the relationship between k and rc as a function of the classifiers Cα, Cβ:

$$rc = \frac{1}{(1-k)(N+P)}\bigl(c_{21}\,FP_\alpha + c_{12}\,FN_\beta\bigr)$$

$$k = \frac{1}{N+P}\bigl((FP_\beta - FP_\alpha) + (FN_\alpha - FN_\beta)\bigr)$$

  – Constrain k, minimize rc → bounded-abstention model.
  – Constrain rc, minimize k → bounded-improvement model.

There is no algebraic solution; however, for a convex ROCCH we can give an efficient algorithm.
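The two quantities expressed as a small helper, assuming the confusion counts of Cα and Cβ are known; a sketch only, with illustrative names.

def rc_and_k(FP_a, FN_a, FP_b, FN_b, N, P, c12, c21):
    # k  = fraction of instances the metaclassifier abstains on;
    # rc = misclassification cost per *classified* instance (equations above).
    k = ((FP_b - FP_a) + (FN_a - FN_b)) / (N + P)
    rc = (c21 * FP_a + c12 * FN_b) / ((1.0 - k) * (N + P))
    return rc, k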

SLIDE 19

Bounded-abstention model

Among classifiers abstaining on no more than a fraction kMAX of instances, find the one that minimizes rc.

This is useful in real-time processing, where the non-classified instances will be processed by another classifier with a limited processing speed.

Algorithm: a three-step derivation.
  – Step 1: Show an (impractical) solution for a smooth ROCCH and the equality k = kMAX.
  – Step 2: Extend it to the inequality k ≤ kMAX.
  – Step 3: Derive an algorithm for a piecewise-linear ROCCH.

SLIDE 20

Bounded-abstention model – Step 1 and 2

Using the Lagrange method (constrained optimization under equality conditions), ∇rc × ∇k = 0, we obtain

$$f'_{ROC}(fp_\beta)\left(f'_{ROC}(fp_\alpha) + \frac{N}{P}\Bigl(1 - \frac{c_{21}}{c_{12}}\Bigr)\right) = \left(\frac{N}{P}\right)^{2}\frac{c_{21}}{c_{12}}$$

Starting from a known optimal classifier for a given k, we can construct an optimal classifier path for k + δk.
  – The known points can be either the optimal binary classifier or the all-abstaining classifier.
  – Such a solution is impractical.

We can show that, except for a very special boundary case, this classifier is also optimal for k ≤ kMAX.

SLIDE 21

Bounded-abstention model – Step 3

The ROCCH consists of line segments connecting points Pi, Pi+1 with coefficients Ai and Bi such that tp = Ai·fp + Bi. Using reasoning similar to Step 1 we obtain that:
  – either Cα or Cβ is located on a vertex Pi or Pj,
  – the optimal classifier depends on the sign of X:

$$X = A_j\left(A_i + \frac{N}{P}\Bigl(1 - \frac{c_{21}}{c_{12}}\Bigr)\right) - \left(\frac{N}{P}\right)^{2}\frac{c_{21}}{c_{12}}$$

  – Case 1 (X > 0): Pi, Pj, Pi+1;  Case 2 (X < 0): Pi, Pj, Pj−1.

This yields an O(n) algorithm for finding the optimal classifier.
SLIDE 22

Bounded-abstention model – a simulated example

[Figure: surface and path plots of the misclassification cost as a function of FP(a) and FP(b), showing the optimal classifier path in the bounded-abstention model.]

SLIDE 23

How can we use it in ALAC? ALAC+

The ALAC architecture naturally fits a tri-state classifier.

[Architecture: the IDS sends alerts to a tri-state alert classifier; alerts assigned a class (+/−) are processed automatically, while alerts classified as “?” (abstentions) go to the ID analyst, whose feedback provides training examples for learning the tri-state classifier.]

SLIDE 24

Results - Thesis Statement (2)

  • We applied abstaining classifiers to alert classification (ALAC+):
  – Recommender mode:
    • DARPA: comparable fn, significantly lower fp (up to 97%), cost reduction by 15–20%.
    • Data Set B: lowered fn (76%) and fp (97%), cost reduction by 87%.
  – Agent mode:
    • DARPA: comparable fn, much lower fp, comparable cost.
    • Data Set B: lowered fn (60%) and fp (96%), cost reduction by 72%.
  – ALAC+ reduced the overall number of misclassifications (in particular fp) and, in most cases, the misclassification costs.
  – Higher precision is better for human analysts [Axe99].

SLIDE 25

Outline

  • Background and problem statement.
  • 1. Adaptive learning for alert classification.
  • 2. Abstaining classifiers.
  • 3. Combining supervised and unsupervised learning.
  • Summary, conclusions and contributions.

SLIDE 26

Clustering (CLARAty)

  • Julisch [Jul03] observed that a great number of alerts can be attributed to a small number of root causes, which are persistent over time.
  – Julisch used a modified AOI [Jul03] to generate human-readable cluster descriptions.
  – Root causes can be identified and removed.
  • Inputs:
  – Alerts
  – Generalization hierarchies (mostly for IP addresses)
  • Outputs:
  – Clusters (in the form of generalized alerts)

SLIDE 27

Two-stage alert classification system

[Architecture: alerts from the IDS first go through alert clustering; the analyst interprets the clusters and finds root causes, which feed back into the environment (investigating network and configuration problems, investigating intrusions), into the IDS (filtering rules, modified signatures) and into an alert filter. The remaining alerts are passed to the adaptive alert classifier (agent mode with a confidence test), which is trained from the analyst’s feedback using background knowledge, rules and parameters.]

Alert clustering + adaptive alert classification:
  • CLARAty is used for filtering and labeling alerts:
  – Filtering mode (FI)
  – Feature-construction mode (FC)
  • Alerts are subsequently passed on to ALAC (see the sketch below).
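A minimal sketch of the two-stage pipeline in the two modes named above; the cluster-matching and ALAC interfaces are assumed helpers, not the thesis code.

def two_stage(alerts, fp_clusters, alac, mode="FI"):
    # Stage 1: CLARAty-style FP-only clusters either filter alerts (FI) or add a
    # cluster-id feature (FC). Stage 2: the remaining alerts go through ALAC.
    for alert in alerts:
        cluster = next((c for c in fp_clusters if c.matches(alert)), None)  # assumed helper
        if cluster is not None and mode == "FI":
            continue                              # filtered as a known false positive
        if cluster is not None and mode == "FC":
            alert["cluster_id"] = cluster.id      # feature construction for stage 2
        yield alac.classify(alert)                # assumed ALAC interface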
SLIDE 28

Results - Thesis Statement (3)

  • We proposed a two-stage alert classification system based on CLARAty [Jul03]:
  – using clusters for retrospective alert analysis,
  – an automated cluster-processing system,
  – a two-stage alert-processing system.
  • Feature construction (FC) does not yield significant improvements.
  • Filtering (FI) performs better (in terms of FN) and comparably (in terms of FP), most likely because the “easy” alerts have already been removed.
  • Thanks to the first stage, the number of alerts to be processed in the second stage (the analyst’s workload) has been reduced by 63%.

SLIDE 29

Outline

  • Background and problem statement.
  • 1. Adaptive learning for alert classification.
  • 2. Abstaining classifiers.
  • 3. Combining supervised and unsupervised learning.

  • Summary and conclusions.
SLIDE 30

Conclusions

  • Evolution of IDSs:
  – Level 1: Improving IDSs themselves
  – Level 2: Leveraging the environment
  – Level 3: Alert postprocessing
  – Level 4: Analyst’s involvement
  • Used ML techniques for IDS alert classification.
  • Verified the three-part thesis statement.
  • The system works, but there is an inherent risk that some attacks might be missed.
  • A step towards a more efficient and reliable alert-management system.

SLIDE 31

Thank you!

SLIDE 32

Future Work

  • Combining with existing multi-stage alert correlation systems.
  • Other learning algorithms: SVM, Bayesian, predictive clustering rules?
  • Multi-class classification.
  • Link mining.
  • Dynamic ROC evaluation in incremental settings.
  • HCI issues.

SLIDE 33

Can Machine Learning be secure? [NKS06], [BNSJ+06] ML does not deal with active attackers [CB06]:
  – “The Mutagenesis dataset never tried to evade your classifier.”

All automated classification systems bear a certain risk (it is a matter of trade-offs!):
  – an attacker may try to hide their activities among background alerts, hoping to evade detection,
  – BUT they do this anyway, because such attacks already have a lower chance of being caught!

  • By removing irrelevant alerts the system can highlight the important ones, but there is no guarantee.
  • It is also possible that this effect is amplified by ALAC.
SLIDE 34

Can Machine Learning be secure? [NKS06], [BNSJ+06]

Good news:
  – ALAC does not provide immediate feedback.
  – The interaction with background knowledge is complex.
  – There are only so many attacks the attacker can try.
  – Such attempts might be treated as noise.

Bad news:
  – There is no guarantee.
  – Once such systems are common, they may turn into an “arms race” (cf. spam). But for this to happen, IDSs would have to be much better than they are now. Let’s see how spam filters and automated signature generation deal with it first ;-)

SLIDE 35

Publication List

  • Tadeusz Pietraszek. On the use of ROC analysis for the optimization of abstaining classifiers. Machine Learning Journal, (accepted with minor revisions, to appear), 2007.
  • Tadeusz Pietraszek. Classification of intrusion detection alerts using abstaining classifiers. Intelligent Data Analysis Journal, 11(3):(to appear), 2007.
  • Tadeusz Pietraszek and Axel Tanner. Data Mining and Machine Learning – Towards Reducing False Positives in Intrusion Detection. Information Security Technical Report Journal, 10(3):169–183, 2005.
  • Tadeusz Pietraszek and Chris Vanden Berghe. Defending against Injection Attacks through Context-Sensitive String Evaluation. In Recent Advances in Intrusion Detection (RAID 2005), volume 3858 of Lecture Notes in Computer Science, pages 124–145, Seattle, WA, 2005.
  • Tadeusz Pietraszek. Optimizing Abstaining Classifiers using ROC Analysis. In Proceedings of the 22nd International Conference on Machine Learning (ICML 2005), pages 665–672, Bonn, Germany, 2005.
  • Tadeusz Pietraszek. Using Adaptive Alert Classification to Reduce False Positives in Intrusion Detection. In Recent Advances in Intrusion Detection (RAID 2004), volume 3324 of Lecture Notes in Computer Science, pages 102–124, Sophia Antipolis, France, 2004.

SLIDE 36

References (1)

  • [And80] James P. Anderson. Computer security threat monitoring and surveillance. Technical report, James P. Anderson Co., 1980.
  • [Axe05] Stefan Axelsson. Understanding Intrusion Detection Through Visualization. PhD thesis, Chalmers University of Technology, 2005.
  • [Axe99] Stefan Axelsson. The base-rate fallacy and its implications for intrusion detection. In Proceedings of the 6th ACM Conference on Computer and Communications Security, pages 1–7, Kent Ridge Digital Labs, Singapore, 1999.
  • [BNSJ+06] Marco Barreno, Blaine Nelson, Russell Sears, Anthony D. Joseph, and J. D. Tygar. Can Machine Learning Be Secure? In Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security, pages 16–25, Taipei, Taiwan, 2006.
  • [Bugtraq03] SecurityFocus. BugTraq. Web page at http://www.securityfocus.com/bid, 1998–2004.
  • [CAMB02] Frederic Cuppens, Fabien Autrel, Alexandre Miege, and Salem Benferhat. Correlation in an intrusion detection process. In Proceedings of Sécurité des Communications sur Internet (SECI02), pages 153–171, 2002.
  • [CB06] Alvaro A. Cárdenas and John S. Baras. Evaluation of Classifiers and Learning Rules: Considerations for Security Applications. In Proceedings of the AAAI 2006 Workshop on Evaluation Methods for Machine Learning, Boston, Massachusetts, July 17, 2006.
  • [DC01] Olivier Dain and Robert K. Cunningham. Fusing a heterogeneous alert stream into scenarios. In Proceedings of the 2001 ACM Workshop on Data Mining for Security Applications, pages 1–13, Philadelphia, PA, 2001.
  • [Den87] Dorothy E. Denning. An intrusion detection model. IEEE Transactions on Software Engineering, SE-13(2):222–232, 1987.

SLIDE 37

References (2)

  • [Der03] Renaud Deraison. The Nessus Project. Web page at http://www.nessus.org, 2000–2003.
  • [DW01] Herve Debar and Andreas Wespi. Aggregation and correlation of intrusion-detection alerts. In Recent Advances in Intrusion Detection (RAID 2001), volume 2212 of Lecture Notes in Computer Science, pages 85–103. Springer-Verlag, 2001.
  • [FW05] P. A. Flach and S. Wu. Repairing concavities in ROC curves. In Proceedings of the 2003 UK Workshop on Computational Intelligence, pages 38–44, Bristol, UK, 2003.
  • [HLMS90] Richard Heady, George Luger, Arthur Maccabe, and Mark Servilla. The architecture of a network level intrusion detection system. Technical report, University of New Mexico, 1990.
  • [How97] John D. Howard. An Analysis of Security Incidents on the Internet 1989–1995. PhD thesis, Carnegie Mellon University, 1997.
  • [IBM03] IBM. IBM Tivoli Risk Manager. Tivoli Risk Manager User's Guide, Version 4.1, 2002.
  • [Jul01] Klaus Julisch. Mining Alarm Clusters to Improve Alarm Handling Efficiency. In Proceedings of the 17th Annual Computer Security Applications Conference, pages 12–21, New Orleans, LA, December 2001.
  • [Jul03a] Klaus Julisch. Clustering intrusion detection alarms to support root cause analysis. ACM Transactions on Information and System Security (TISSEC), 6(4):443–471, 2003.
  • [Jul03b] Klaus Julisch. Using Root Cause Analysis to Handle Intrusion Detection Alarms. PhD thesis, University of Dortmund, Germany, 2003.
  • [Krs98] Ivan Victor Krsul. Software Vulnerability Analysis. PhD thesis, Purdue University, 1998.

SLIDE 38

References (3)

  • [LBMC94] Carl E. Landwehr, Alan R. Bull, John P. McDermott, and William S. Choi. A taxonomy of computer program security flaws. ACM Computing Surveys (CSUR), 26(3):211–254, 1994.
  • [LWS02] Richard Lippmann, Seth Webster, and Douglas Stetson. The effect of identifying vulnerabilities and patching software on the utility of network intrusion detection. In Recent Advances in Intrusion Detection (RAID 2002), volume 2516 of Lecture Notes in Computer Science, pages 307–326. Springer-Verlag, 2002.
  • [MC03] Matthew V. Mahoney and Philip K. Chan. An analysis of the 1999 DARPA/Lincoln Laboratory evaluation data for network anomaly detection. In Recent Advances in Intrusion Detection (RAID 2003), volume 2820 of Lecture Notes in Computer Science, pages 220–237. Springer-Verlag, 2003.
  • [McH00] John McHugh. The 1998 Lincoln Laboratory IDS evaluation. A critique. In Recent Advances in Intrusion Detection (RAID 2000), volume 1907 of Lecture Notes in Computer Science, pages 145–161. Springer-Verlag, 2000.
  • [MIT03] MITRE. Common Vulnerabilities and Exposures. Web page at http://cve.mitre.org, 1999–2004.
  • [MHL94] Biswanath Mukherjee, Todd L. Heberlein, and Karl N. Levitt. Network intrusion detection. IEEE Network, 8(3):26–41, 1994.
  • [NKS06] James Newsome, Brad Karp, and Dawn Song. Paragraph: Thwarting Signature Learning by Training Maliciously. In Recent Advances in Intrusion Detection (RAID 2006), Hamburg, Germany, 2006.
  • [PB88] Mark Paradies and David Busch. Root cause analysis at Savannah River Plant. In Proceedings of the IEEE Conference on Human Factors and Power Plants, 1988.

SLIDE 39

References (4)

  • [PV05] Tadeusz Pietraszek and Chris Vanden Berghe. Defending against injection attacks through context-sensitive string evaluation. In Recent Advances in Intrusion Detection (RAID 2005), volume 3858 of Lecture Notes in Computer Science, pages 124–145, Seattle, WA, 2005. Springer-Verlag.
  • [RZD05] James Riordan, Diego Zamboni, and Yann Duponchel. Billy Goat, an accurate worm-detection system (revised version) (RZ 3609). Technical report, IBM Zurich Research Laboratory, 2005.
  • [SP01] Umesh Shankar and Vern Paxson. Active mapping: Resisting NIDS evasion without altering traffic. In Proceedings of the 2003 IEEE Symposium on Security and Privacy, pages 44–62, Oakland, CA, 2003.
  • [SP03] Robin Sommer and Vern Paxson. Enhancing byte-level network intrusion detection signatures with context. In Proceedings of the 10th ACM Conference on Computer and Communications Security, pages 262–271, Washington, DC, 2003.
  • [VS01] Alfonso Valdes and Keith Skinner. Probabilistic alert correlation. In Recent Advances in Intrusion Detection (RAID 2001), volume 2212 of Lecture Notes in Computer Science, pages 54–68. Springer-Verlag, 2001.
  • [VVCK04] F. Valeur, G. Vigna, C. Kruegel, and R. Kemmerer. A comprehensive approach to intrusion detection alert correlation. IEEE Transactions on Dependable and Secure Computing, 1(3):146–169, 2004.
  • [WF00] Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann Publishers, San Francisco, CA, 2000.

SLIDE 40

Security – supporting slides

SLIDE 41

Computer Security

  • Confidentiality – prevention of (un)intentional unauthorized disclosure of data.
  • Integrity – prevention of (un)intentional unauthorized modification of data.
  • Availability – prevention of unauthorized withholding of computing resources.
  • Intrusion – any set of actions that attempt to compromise the confidentiality, integrity or availability of a computing resource.

SLIDE 42

Intrusion detection

The traditional approach to security is to build a “protective shield” around systems [MHL94]:
  – there is a trade-off between security and usability,
  – open systems are more productive,
  – “secure systems” are vulnerable to attacks exploiting internal errors (e.g., buffer overflows, injection attacks, race conditions),
  – systems are vulnerable to insider attacks (intentional or unintentional).

Intrusion detection [Den87]: retrofit systems with security by detecting attacks and alerting an SSO.

SLIDE 43

Intrusion Detection Systems (IDS)

  • An Intrusion Detection System is an automated system for detecting and alerting on any situation where an intrusion has taken place or is about to take place [Axe05].

[Architecture: the monitored system feeds audit collection and audit storage; a processing component uses active/processing data, reference data and configuration data to raise alerts to the SSO/ID analyst, who issues a manual response; automated responses are also possible.]

SLIDE 44

Intrusion Detection Systems: anomaly vs. misuse

  • Anomaly-based model.
  • Misuse-based model – detects only known attacks. Why not just prevent them if they are known?
  – Window of vulnerability
  – Detecting failed attacks
  – Detecting policy violations
  – Additional layer of protection
  – Generalized and intent-guessing signatures

SLIDE 45

Snort – an open source IDS

SLIDE 46

Snort – signature examples

SLIDE 47

Striving to reduce false positives

  • Level 1: Improving IDSs themselves
  – More sophisticated protocol analyzers, state keeping [Roe05, Pax99]
  – Highly specialized IDSs: Billy Goat [RZD05], CSSE [PV05]
  • Level 2: Leveraging the environment
  – Active mapping [SP01], context signatures [SP03]
  – Vulnerability correlation [LWS02, VVCK04]
  • Level 3: Alert postprocessing
  – Data mining [Jul03b]
  – Alert correlation systems [CAMB02, DW01, VS01, VVCK04]
  • Level 4: Analyst’s involvement
  – The idea pursued in this thesis, mostly orthogonal to the other approaches.

[Figure: the four levels placed along the path IDS → Alerts → ID Analyst.]

SLIDE 48

Binary vs. multi-class classification

  • Analysts analyze:
  – the root cause of alerts [PB88],
  – the impact on the environment,
  – the actions that need to be taken.
  • Taxonomizing root causes is a difficult task [How97, Jul03b, Krs98, LBMC94].
  • Ad-hoc classifications exist, for example:
  – intentional/malicious (e.g., scanning, unauthorized access, privilege escalation, policy violation, DoS attack),
  – inadvertent/non-malicious (e.g., network misconfiguration, normal activities).
  • The main distinction for the analyst is “Is the alert actionable or not?”
  – This is determined by the combination of the root cause and the impact on the environment.
  – For our purposes we assume that this is equivalent to our two classes: true positives and false positives.

SLIDE 49

ALAC – supporting slides

SLIDE 50

Evaluation problem – two datasets [Pie04, PT05]

  • DARPA 1999 Data Set
  – Used network traces, run through the Snort IDS
  – Used attack truth tables to label the alerts
  • Data Set B
  – Real network traces collected in a mid-sized corporate network
  – Used the Snort IDS to generate alerts
  – Manually labeled (bias!)

SLIDE 51

Evaluation Problem – new dataset

  • MSSD Datasets
  – Real datasets from MSSD; different commercial NIDSs, some companies with more than one.
  – We looked at some 20 companies over a time period of 6 months.
  – Some alerts belong to incidents, labeled by security analysts.

SLIDE 52

Evaluation problem

  • There is a lack of publicly available data sources for the evaluation of IDSs:
  – No common reference for evaluation.
  – Everybody can install an IDS in their own network:
    • yes, but this data often cannot be shared (sensitive information),
    • and it has no labels.
  – Honeypot data [PDP05]:
    • all data is by definition suspicious,
    • more useful for detecting automated attacks than real attackers.
  • DARPA 1998 and DARPA 1999 efforts:
  – MIT Lincoln Labs simulated environment.
  – Many flaws have been identified [McH01, MC03].
  – Still used in many papers (e.g., UCI Dataset and KDD CUP 1999).
  • A recent effort was presented at ETRICS 2006 (Qian et al.).
  • Proprietary data:
  – Data Set B: undisclosed customer, collected with Snort, classified by the author.
  – MSSD Data Sets: data from IBM’s SOC, implicitly classified by real security analysts.

SLIDE 53

Evaluation Problem – Summary

  • These datasets are quite different!
  – DARPA 1999 Data Set & Data Set B:
    • On average 1472 alerts per company per day, out of which 359 are true positives (24%).
  – MSSD Dataset:
    • On average 3250 alerts per company per day, out of which 11 are true positives (0.34%).
    • Most alerts are clustered in incidents, on average 1 incident every 9 days.
  – Moreover, we are not sure whether all the labels are correct:
    • some incidents could have been missed,
    • some incidents may have turned out to be false positives,
    • we should probably handle them differently.
SLIDE 54

Background Knowledge

  • Network topology
  – Classification of IP addresses
  – Create rules using generalized concepts
  • Installed software
  • Alert semantics
  – How do we understand the attack? CVE [MIT03], Bugtraq [Bugtraq03]
  – Was the attack successful? IDD [IBM03], Nessus [Der03]
  • Alert context, i.e., alerts related to the current one (correlation in intrusion detection, e.g. [DW01, DC01, VS01])
  – Set or sequence of alerts related to the current one
  – Additional features (aggregates, alert summaries, alert categories), expressing domain knowledge in intrusion detection

SLIDE 55

Background Knowledge

Alerts with their classification have been written to a relational database. We use scripts to generate background knowledge in attribute–value form:
  – IP address classification
  – OS classification
  – Aggregates, all in three different time windows – 1 min, 5 min, 30 min (see the sketch below):
    • number of alerts coming from the same IP addresses (src, dst, src–dst),
    • number of alerts of the same type,
    • number of alerts with a similar classification.
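A minimal sketch of one such aggregate (the 1-minute same-source-IP count), assuming time-sorted alert tuples; attribute names and the tuple layout are illustrative.

from collections import deque

def windowed_counts(alerts, window_seconds=60):
    # alerts: (timestamp, src_ip, dst_ip, signature) tuples sorted by time.
    # For each alert, count how many alerts in the preceding window share its source IP.
    recent = deque()
    counts = []
    for ts, src, dst, sig in alerts:
        while recent and ts - recent[0][0] > window_seconds:
            recent.popleft()
        counts.append(sum(1 for _, s, _, _ in recent if s == src))
        recent.append((ts, src, dst, sig))
    return counts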
SLIDE 56

Misclassification, Statistics (DARPA1999)

[Plots: false negative rate (fn) and false positive rate (fp) versus the number of alerts processed by the system, for agent mode with sampling rates 0.1, 0.25 and 0.5, recommender mode, and batch classification.]

SLIDE 57

Misclassification, Statistics (Data Set B)

[Plots: false negative rate (fn) and false positive rate (fp) versus the number of alerts processed by the system (Data Set B), for agent mode with sampling rates 0.1, 0.25 and 0.5, recommender mode, and batch classification.]

SLIDE 58

Automatic Processing

[Plots: discarded false positive rate versus the number of alerts processed by the system, for agent mode with sampling rates 0.1, 0.25 and 0.5, on the DARPA 1999 data (left) and Data Set B (right).]

SLIDE 59

Understanding the Rules

The rules are quite understandable. They use attributes generated from the background knowledge.

(cnt_intr_w1 <= 0) and (cnt_sign_w3 >= 1) and (cnt_sign_w1 >= 1) and (cnt_dstIP_w1 >= 1) => class=FALSE
(cnt_srcIP_w3 <= 6) and (cnt_int_w2 <= 0) and (cnt_ip_w2 >= 2) and (sign = ICMP PING NMAP) => class=FALSE

“If there have been similar alerts recently and they were all false alarms (no intrusions), then the current alert is a false alert.”
“If the number of NMAP pings is small and there are no intrusions, the alert is a false alert.”

SLIDE 60

Experiments - Setting ALAC Parameters

  • Using the ROC curve, one can choose the optimal classifier:
  – one needs to know the target class distributions and misclassification costs;
  – we did not have such data, so we selected the value ad hoc: CR = 50 (more on this later!).
  • Classification accuracy:
  – when to retrain the model;
  – we selected a value based on the performance in ROC analysis.
  • Automatic processing – confidence:
  – currently chosen ad hoc; we are looking for something better (more on this later!).
SLIDE 61

ML – supporting slides

SLIDE 62

Evaluating classifiers – confusion and cost matrices

Cost matrix (A = actual, C = classified as):

    A\C    +      −
    +      0      c12
    −      c21    0

Confusion matrix:

    A\C    +      −
    +      TP     FN     P
    −      FP     TN     N

$$tp = \frac{TP}{TP+FN}, \quad fn = \frac{FN}{TP+FN}, \quad fp = \frac{FP}{FP+TN}, \quad tn = \frac{TN}{FP+TN}, \quad CR = \frac{c_{21}}{c_{12}}$$
SLIDE 63

ROC Background

ROC (Receiver Operating Characteristic) analysis is used for model evaluation and model selection for binary classifiers – multiple-class extensions are not used in practice. It allows one to evaluate model performance under all class and cost distributions:
  – a 2D plot of fp × tp (X axis – false positive rate, Y axis – true positive rate),
  – one point corresponds to one classifier.

SLIDE 64

ROC Background

A classifier C produces a single point (fp, tp) in ROC space. A classifier Cτ (or a machine-learning method Lτ) has a parameter τ; varying it produces multiple points. Therefore we consider an ROC curve a function f: τ ↦ (fpτ, tpτ). We can find an inverse function f⁻¹: (fpτ, tpτ) ↦ τ and approximate it with f̂⁻¹.
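A short sketch of how such a curve is obtained empirically from a scoring classifier: each threshold τ yields one (fp, tp) point. It assumes binary labels and that higher scores mean "more positive".

def roc_points(scores, labels):
    # Each threshold tau gives one classifier C_tau and hence one (fp, tp) point.
    P = sum(1 for y in labels if y)
    N = len(labels) - P
    points = []
    for tau in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= tau and y) / P
        fp = sum(1 for s, y in zip(scores, labels) if s >= tau and not y) / N
        points.append((fp, tp))
    return sorted(points)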

SLIDE 65

ROC Background

The ROC Convex Hull (ROCCH) is a piecewise-linear, convex-down curve fR with the following properties:
  • fR(0) = 0, fR(1) = 1;
  • the slope of fR is monotonically non-increasing;
  – we assume that for any value m there exists a point with f′R(x) = m [PF98]:
    • vertices have “slopes” assuming values between the slopes of the adjacent edges,
    • we assume sentinel edges: a 0th edge with slope ∞ and an (n+1)th edge with slope 0.
  – We will use the ROCCH instead of the ROC curve.

SLIDE 66

Abstaining Classifiers – supporting slides

SLIDE 67

Selecting the Optimal Classifier

The criterion is to minimize the misclassification cost:

$$rc = \frac{1}{N+P}\bigl(c_{21}\,FP + c_{12}\,FN\bigr) = \frac{1}{N+P}\Bigl(c_{21}\,N\,fp + c_{12}\,P\,\bigl(1 - f_{ROC}(fp)\bigr)\Bigr)$$

$$\frac{d\,rc}{d\,fp} = 0 \;\Rightarrow\; f'_{ROC}(fp) = \frac{c_{21}}{c_{12}}\cdot\frac{N}{P}$$

SLIDE 68

Cost Minimizing Criteria for One Classifier

$$f'_{ROC}(fp) = CR\cdot\frac{N}{P}$$

  • These are the known iso-performance lines [PF98].
SLIDE 69

Cost-based model - selecting the optimal classifier

Similar criterion – minimize the cost:

$$rc = \frac{1}{N+P}\Bigl(\underbrace{c_{12}\,FN_\beta + c_{21}\,FP_\alpha}_{\text{misclassified}} + \underbrace{c_{13}\,(FN_\alpha - FN_\beta) + c_{23}\,(FP_\beta - FP_\alpha)}_{\text{disagreement (abstained)}}\Bigr)$$

$$\frac{\partial rc}{\partial fp_\alpha} = 0 \;\wedge\; \frac{\partial rc}{\partial fp_\beta} = 0 \;\Rightarrow\; f'_{ROC}(fp_\alpha) = \frac{c_{21}-c_{23}}{c_{13}}\cdot\frac{N}{P}, \quad f'_{ROC}(fp_\beta) = \frac{c_{23}}{c_{12}-c_{13}}\cdot\frac{N}{P}$$

The optimum depends only on the slopes of the ROC curve (similar to the single-classifier case).

SLIDE 70

Cost-based model - understanding cost matrices

The 2×2 cost matrix is well understood. 2×3 cost matrices have some interesting properties, e.g., under which conditions the optimal classifier is an abstaining classifier. Our derivation is valid for

$$(c_{12} > c_{13}) \;\wedge\; (c_{21} \ge c_{23}) \;\wedge\; (c_{12}\,c_{21} \ge c_{13}\,c_{21} + c_{23}\,c_{12}) \qquad (*)$$

and we can prove that if this condition is not met, the optimal classifier is a trivial binary classifier.

SLIDE 71

Cost-based model - understanding cost matrices

  • Theorem. If (*) is not met, the optimal classifier is a trivial binary classifier.

Proof (sketch):
  – show that for an optimal classifier $f'_R(fp^*_\alpha) \ge f'_R(fp^*) \ge f'_R(fp^*_\beta)$, where $fp^*$ corresponds to the optimal binary classifier;
  – show that if (*) is not met, $\partial rc/\partial fp_\alpha$ is positive for $fp^*_\alpha < fp^*$ and $\partial rc/\partial fp_\beta$ is positive for $fp^*_\beta > fp^*$;
  – therefore $fp^*_\alpha = fp^* = fp^*_\beta$.

$$(c_{12} > c_{13}) \;\wedge\; (c_{21} \ge c_{23}) \;\wedge\; (c_{12}\,c_{21} \ge c_{13}\,c_{21} + c_{23}\,c_{12}) \qquad (*)$$

SLIDE 72

Cost-based model - interesting cases

How should c13 and c23 be set so that the optimal classifier is a non-trivial abstaining classifier? Two interesting cases:
  – the symmetric case (c13 = c23):

$$c_{13} = c_{23} \le \frac{c_{12}\,c_{21}}{c_{12}+c_{21}}$$

  – the proportional case (c13/c12 = c23/c21):

$$c_{13} \le \frac{c_{12}}{2} \;\Leftrightarrow\; c_{23} \le \frac{c_{21}}{2}$$

SLIDE 73

Bounded-abstention model – Algorithm

SLIDE 74

Experiments

We tested the approach with 15 UCI KDD datasets, using averaged cross-validation. In each model we used one independent parameter: c13 = c23, k, or f. The classifier was a Bayesian classifier from Weka [WF00]. In many cases we obtained a large cost reduction even with a small abstention rate k = 0.1. We are now applying it to alert classification.

SLIDE 75

Results with abstaining classifiers

SLIDE 76

CLARAty – supporting slides

SLIDE 77

CLARAty algorithm [Jul03b]

SLIDE 78

CLARAty & cluster labeling

  • Running CLARAty with no labels and trying to label the resulting clusters as:
  – containing only false positives,
  – containing only true positives,
  – mixed.
  • Two main purposes:
  – Retroactive alert analysis:
    • by looking at cluster descriptions again, the analysts may spot previously missed incidents or large groups of alerts indicating problems,
    • rules recognizing some incidents can be written.
  – Predictive value.

[Diagram: alerts and historical alert data feed alert clustering; historical incident data is correlated with the clusters to create trigger rules (TP-only clusters), filtering rules (FP-only clusters) for the alert filter, or to split clusters / investigate missed true positives.]
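A minimal sketch of the labeling step in the diagram, assuming clusters are given as sets of alert ids and incidents as a set of alert ids; the names are illustrative.

def label_clusters(clusters, incident_alert_ids):
    # clusters: {cluster_id: set of alert ids it generalizes}
    # incident_alert_ids: alert ids that belong to labeled incidents
    labels = {}
    for cid, alert_ids in clusters.items():
        hits = sum(1 for a in alert_ids if a in incident_alert_ids)
        if hits == 0:
            labels[cid] = "FP-only"      # candidate filtering rule
        elif hits == len(alert_ids):
            labels[cid] = "TP-only"      # candidate trigger rule
        else:
            labels[cid] = "mixed"        # investigate / split cluster
    return labels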

SLIDE 79

Clustering Conclusions

  • Clusters are persistent:
  – average clustering coverage 90%,
  – average filtering coverage 63%.
  • Most of the clusters are FP-only clusters (avg. 95%):
  – these clusters tend to be persistent.
  • There are only very few TP-only clusters (avg. 0.2%):
  – these clusters are ephemeral.
  • Mixed clusters (avg. 5%):
  – these clusters need to be investigated more carefully.
  • Filtering works well, although if clusters are not reviewed some attacks can be missed.
  – We investigated these cases; they were mostly due to incorrect labeling.

SLIDE 80

Automated Clustering & Analysis Framework

  • Only FP-only clusters are filtered out.
  – The evaluation verifies that no true positives are missed.
  – Applied to the DARPA 1999 Data Set and Data Set B.
  – Applied 26 times to each company on the MSSD data (weekly clustering).

SLIDE 81

Cluster filtering (DARPA1999 and Data Set B)

[Plots (DARPA 1999 Data and Data Set B, clustering period 1 week): number of alerts, positives and clusters per week, and the corresponding fractions of alerts, Feb 28–Mar 30 (DARPA) and Nov 14–Dec 09 (Data Set B).]

SLIDE 82

Cluster persistency (DARPA1999 and Data Set B)

[Plots “Filtering using Clustering” (DARPA 1999 Data and Data Set B, clustering period 1 week): per week, the number of alerts, alerts covered by clustering, alerts covered by FP-only clusters, filtered alerts, positives and missed positives, both as counts and as fractions.]

SLIDE 83

Cluster Accuracy & Coverage (DARPA1999)

[Plots: clustering accuracy (per cluster) and clustering coverage (per incident) at the clustering and filtering stages, DARPA 1999 data.]
SLIDE 84

Cluster Accuracy & Coverage (Data Set B)

[Plots: clustering accuracy (per cluster) and clustering coverage (per incident) at the clustering and filtering stages, Data Set B.]
SLIDE 85

Two-stage alert classification – ROC analysis

  • Feature construction performs only marginally better.
  • Filtering performs much better for DARPA and comparably for Data Set B.

[Plots: ROC curves (fp vs. tp) of the two-stage system on DARPA and Data Set B, comparing the original system, feature construction (2FC), filtering (2FI), and filtering (2FI, rescaled), annotated with cost-ratio operating points.]

SLIDE 86

Misclassifications (two-stage) (DARPA)

[Plots: false negative rate (fn) and false positive rate (fp) versus the number of alerts processed by the system (DARPA), for recommender and agent modes, each in the original, 2FC and 2FI variants.]

SLIDE 87

Misclassifications (two-stage) (Data Set B)

[Plots: false negative rate (fn) and false positive rate (fp) versus the number of alerts processed by the system (Data Set B), for recommender and agent modes, each in the original, 2FC and 2FI variants.]

SLIDE 88

Automatic Processing (two-stage)

[Plots: discarded false positive rate versus the number of alerts processed by the system, for the agent, agent (2FC) and agent (2FI) variants, on the DARPA 1999 data and Data Set B.]

SLIDE 89

END!