Anomaly Detection in Computer Networks
Carolina Fortuna, Blaž Fortuna, Mihael Mohorčič
Outline
- Intrusion Detection
- Data Acquisition
- Datasets
- Related Work
- Data Preprocessing
- Evaluation Criteria
- Scenarios
- Results
- Conclusions
Intrusion Detection
- The intrusion detector learning task is to build a predictive model (i.e., a classifier) capable of distinguishing between "bad" connections, called intrusions or attacks, and "good" normal connections.
http://www.acm.org/sigs/sigkdd/kddcup/index.php?section=1999&method=task
Data Acquisition
- Nine weeks of raw TCP dump data from a local-area network (LAN) simulating a typical U.S. Air Force LAN.
- The TCP dump data was then processed and prepared for the learning contest.
Data Acquisition
- Attacks fall into four main categories:
– DoS: denial-of-service
– R2L: unauthorized access from a remote machine
– U2R: unauthorized access to local superuser (root) privileges
– probe: surveillance and other probing
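As a concrete illustration, some of the connection labels found in the dataset can be mapped to these four categories. The label list below is a small, non-exhaustive sample of the KDD Cup 1999 labels, not the full mapping:

```python
# A few KDD Cup 1999 connection labels and their attack categories
# (illustrative subset; the training data contains 23 distinct labels).
ATTACK_CATEGORY = {
    "smurf": "DoS", "neptune": "DoS", "back": "DoS",
    "portsweep": "probe", "ipsweep": "probe", "satan": "probe",
    "guess_passwd": "R2L", "warezclient": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R",
}

def categorize(label: str) -> str:
    """Map a connection label to its attack category; 'normal' stays normal."""
    if label == "normal":
        return "normal"
    return ATTACK_CATEGORY.get(label, "unknown")
```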
Training Datasets
Full dataset:
- 4,898,431 instances
- 42 attributes
– 8 nominal
– 34 continuous
- Nominal target variable
– 23 values
- No missing values
For the experiments:
- a 10% subset of the full dataset
- a 100,000-instance subset
- a filtered subset
Test Dataset
- 311029 instances
- 42 attributes
– 8 nominal
– 34 continuous
- Nominal target variable
– 38 values, 17 of which do not appear in the training dataset (Table 2)
- No missing values
Data Analysis
Attack    10% dataset       full dataset       test dataset
category  no.     rel.      no.      rel.      no.     rel.
probe     4107    0.8%      41102    0.8%      4166    1.3%
DoS       391458  79.2%     3883370  79.2%     229853  73.9%
U2R       52      0.01%     52       0.001%    70      0.2%
R2L       1126    0.2%      1126     0.02%     16347   5.2%
[Chart: number of instances per traffic category (normal, probe, DoS, U2R, R2L), log scale, for the 10%, full, and test datasets]
Data Analysis
                10% dataset     full dataset     test dataset
Normal traffic  97278 (19.6%)   972781 (19.8%)   60593 (19.4%)
Data Preprocessing
- Two-class input data for the SVM.
- Continuous values kept as they are; nominal values encoded in a binary manner.
- Feature values normalized.
- Instance format: "class feature_No:value … feature_No:value"; features with zero value can be omitted.
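A minimal sketch of this encoding, under two assumptions the slides do not spell out: nominal attributes become one-hot binary features, and continuous attributes are scaled to [0, 1] by their maximum (the exact normalization used by the authors may differ):

```python
def encode_instance(label, nominal, nominal_domains, continuous, cont_max):
    """Encode one connection as an SVM-light-style line: 'class i:v ... i:v'.

    nominal / nominal_domains: nominal attribute values and their value sets
    continuous / cont_max: continuous attribute values and their maxima.
    Zero-valued features are omitted, as described on the slide."""
    features = []
    idx = 1
    # one-hot (binary) encoding of nominal attributes
    for value, domain in zip(nominal, nominal_domains):
        for v in domain:
            if value == v:
                features.append((idx, 1.0))
            idx += 1
    # max-scaling of continuous attributes, skipping zeros
    for value, vmax in zip(continuous, cont_max):
        if value != 0:
            features.append((idx, value / vmax if vmax else 0.0))
        idx += 1
    return " ".join([label] + [f"{i}:{v:g}" for i, v in features])
```

For example, a TCP connection with protocol domain (tcp, udp, icmp) and two continuous attributes of which the second is zero encodes to a sparse line with only two features.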
Related Work
- Mixture of bagging and boosting with modified sampling and replacement.
- Decision trees.
- Dynamic subset selection and self-organizing maps.
- K-nearest neighbors.
- SVM.
- …
Support Vector Machines
- Determines a hyperplane that is able to separate positive examples from negative examples.
- A linear classifier known as the maximum margin classifier.
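The decision side of such a classifier can be sketched as follows; the weight vector w and bias b below are illustrative stand-ins for the values an SVM trainer would actually produce:

```python
def svm_decision(w, b, x):
    """Return +1 or -1 depending on which side of the separating
    hyperplane w.x + b = 0 the instance x falls. In the two-class IDS
    setup, the sign distinguishes attack from normal traffic."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1
```

Training chooses w and b so that the margin (the distance from the hyperplane to the nearest examples of either class) is maximized.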
Evaluation Criteria
Cost matrix (rows: actual class, columns: predicted class):

        normal  probe  DOS  U2R  R2L
normal  0       1      2    2    2
probe   1       0      2    2    2
DOS     2       1      0    2    2
U2R     3       2      2    0    2
R2L     4       2      2    2    0

ACTE = \frac{1}{311029} \sum_{i,j} C_{i,j} \, CM_{i,j}

- Criteria used for the KDD Cup 1999.
- Confusion matrix.
- Average cost per test example (ACTE), computed using the entrywise product of the cost matrix C and the confusion matrix CM over the 311029 test instances.
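The scoring can be written out directly; the cost values are those of the KDD Cup 1999 task, and 311029 is the number of test instances:

```python
# KDD Cup 1999 cost matrix; rows: actual, columns: predicted,
# class order: normal, probe, DOS, U2R, R2L.
COST = [
    [0, 1, 2, 2, 2],  # normal
    [1, 0, 2, 2, 2],  # probe
    [2, 1, 0, 2, 2],  # DOS
    [3, 2, 2, 0, 2],  # U2R
    [4, 2, 2, 2, 0],  # R2L
]

def acte(confusion, cost=COST, n=311029):
    """Average cost per test example: (1/n) * sum_ij cost[i][j] * cm[i][j]."""
    return sum(cm_ij * c_ij
               for cm_row, c_row in zip(confusion, cost)
               for cm_ij, c_ij in zip(cm_row, c_row)) / n
```

A perfect classifier (all mass on the diagonal) scores 0, since the diagonal costs are zero; misclassifying everything as the next-worst class drives the score toward the corresponding cost entry.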
The One-to-all Scenario
The One-to-one Scenario
The One-to-all3Categ Scenario
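The slides name the three scenarios without detailing them at this point. As an assumption, the two generic decision rules they are presumably built on, one-vs-rest scoring and one-vs-one majority voting, can be sketched as follows (this is not necessarily the authors' exact two-level voting setup):

```python
def one_to_all_predict(scores):
    """One-vs-rest: given {class: decision score} from one binary
    'class vs. everything else' model per class, pick the class with
    the highest score."""
    return max(scores, key=scores.get)

def one_to_one_predict(pairwise_winners):
    """One-vs-one: given {(class_a, class_b): winner} from one binary
    model per pair of classes, pick the class with the most votes."""
    votes = {}
    for winner in pairwise_winners.values():
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)
```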
Results (100k instances)
                  One-to-all  One-to-one  One-to-all-3categ
ACTE              0.5306      1.6656      0.2641
Detection rate    99.2%       95.0%       90.3%
Diagnosis rate    91.3%       3.3%        90.1%
False alarm rate  99.6%       12.8%       1.6%
- One-to-one IDS has the worst ACTE; One-to-all-3categ IDS has the best ACTE.
- One-to-all IDS:
– high detection rate, good diagnosis rate, very high false alarm rate: classifies most of the normal traffic as intrusion
– does not detect probe, R2L and U2R
– quite often confuses DoS with normal
– needs parameter optimization
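The three rates in the table can be computed from a confusion matrix. The definitions below are assumed, since the slides do not state them explicitly: detection counts attacks flagged as any attack, diagnosis counts attacks assigned their correct category, and false alarms are normal instances flagged as attacks:

```python
def ids_rates(cm, labels=("normal", "probe", "DoS", "U2R", "R2L")):
    """Return (detection rate, diagnosis rate, false alarm rate) for a
    confusion matrix cm (rows: actual, columns: predicted)."""
    k = len(labels)
    n = labels.index("normal")
    attacks = [i for i in range(k) if i != n]
    total_attack = sum(cm[i][j] for i in attacks for j in range(k))
    detected = sum(cm[i][j] for i in attacks for j in range(k) if j != n)
    diagnosed = sum(cm[i][i] for i in attacks)
    total_normal = sum(cm[n])
    return (detected / total_attack,
            diagnosed / total_attack,
            (total_normal - cm[n][n]) / total_normal)
```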
- One-to-one scenario has a lower false alarm rate but poor diagnosis performance: it detects most of the attacks but does not classify them correctly.
- Its high ACTE seems to come from misclassifying DoS attacks as R2L attacks.
- One-to-all-3categ IDS gives the best results: good ACTE, good detection and diagnosis rates and a low false alarm rate.
Results
- Next:
– tune parameters using 10-fold cross-validation
– use a larger training dataset
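The planned tuning step can be sketched as plain 10-fold cross-validation over candidate values of the SVM parameter c; `train` and `error` below are hypothetical stand-ins for the real SVM training and evaluation routines:

```python
def cross_validate(data, c, k=10, train=None, error=None):
    """Average held-out error of a model with parameter c over k folds."""
    folds = [data[i::k] for i in range(k)]
    total = 0.0
    for i in range(k):
        held_out = folds[i]
        train_set = [x for j, f in enumerate(folds) if j != i for x in f]
        model = train(train_set, c)
        total += error(model, held_out)
    return total / k

def best_c(data, candidates, **kw):
    """Pick the candidate c with the lowest cross-validated error."""
    return min(candidates, key=lambda c: cross_validate(data, c, **kw))
```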
[Chart: F1 and BEP, with standard deviations, as a function of the SVM parameter c (log scale, 0.01–1000) for the U2R class (j=10)]
Results (10% dataset)
                  One-to-all  One-to-one  One-to-all-3categ
ACTE              0.2625      0.2479      0.2653
Detection rate    90.2%       90.9%       90.3%
Diagnosis rate    90.1%       90.7%       90.1%
False alarm rate  1.6%        2.02%       1.6%
- One-to-all IDS improved in overall performance as well as in the detection, diagnosis and false alarm rates.
- One-to-one IDS also improved: it has the smallest ACTE and good detection and diagnosis rates.
- One-to-all-3categ IDS:
– unexpected result
– no improvement in the detection, diagnosis and false alarm rates
Results (10% dataset)
Confusion matrices (rows: actual, columns: predicted; last column: per-class recall, last row: per-class precision).

One-to-all:
        normal  probe  DOS     U2R   R2L   %
normal  59611   300    678     4     0     98.3
probe   1053    2922   191     0     0     70.1
DOS     7242    22     222589  0     0     96.8
U2R     54      0      0       11    5     15.7
R2L     15959   16     2       2     368   2.2
%       71.0    89.6   99.6    64.7  98.6

One-to-one:
        normal  probe  DOS     U2R   R2L   %
normal  59367   211    818     12    185   97.9
probe   901     3002   148     0     115   72.0
DOS     7047    52     222754  0     0     96.9
U2R     32      0      0       32    6     45.7
R2L     14791   11     2       11    1532  9.3
%       72.2    91.6   99.5    58.1  83.3

One-to-all-3categ:
        normal  probe  DOS     U2R   R2L   %
normal  59593   313    672     5     10    98.3
probe   767     3120   181     6     92    74.8
DOS     7113    324    222406  0     10    96.7
U2R     60      0      0       5     5     7.1
R2L     16186   11     2       1     147   0.8
%       71.1    82.8   99.6    29.4  55.6
Results
- Tradeoff: the more accurate the SVM model is at classifying R2L connections, the poorer it is at classifying normal connections, and the other way around.
- One-to-all-3categ IDS performs worse than the other two IDSs in classifying R2L and U2R attacks, and slightly better in classifying probe attacks.
- Even though we introduced the One-to-all-3categ IDS in order to better separate the three minority classes from the two majority ones, it seems the model built using SVM is not accurate enough for this voting system to prove efficient.
Evaluation
- One-to-one IDS, with 0.2479 ACTE, would rank 8th in the KDD Cup 1999.
- Less accurate than other results in the literature, but a simpler system.
- Higher accuracy can be obtained by increasing the complexity of the system:
– SVMs with different kernels
– hybrid systems that combine several machine learning methods, or that combine machine learning methods with the more classical signature-based ones
Conclusions
- Very large and unbalanced dataset.
- Proposed a two-level voting IDS that performed well on a small training set but relatively poorly when the training dataset grew.
- Attacks such as R2L and U2R, which generate only a small number of traffic packets, seem to pose a real challenge for detection and diagnosis.
- Usually, simplicity and speed are traded for accuracy, and machine learning methods are complemented by traditional signature-based methods.
Bibliography
- KDD Cup 1999 Task Description, http://kdd.ics.uci.edu/databases/kddcup99/task.html
- Bernhard Pfahringer, Winning the KDD99 Classification Cup: Bagged Boosting, ACM SIGKDD Explorations Newsletter, Volume 1, Issue 2, pp. 65-66, January 2000.
- Itzhak Levin, KDD-99 Classifier Learning Contest: LLSoft's Results Overview, ACM SIGKDD Explorations Newsletter, Volume 1, Issue 2, pp. 67-75, January 2000.
- Vladimir Miheev, Alexei Vopilov, Ivan Shabalin, The MP13 Approach to the KDD'99 Classifier Learning Contest, ACM SIGKDD Explorations Newsletter, Volume 1, Issue 2, pp. 76-77, January 2000.
- Computer Security and Intrusion Detection, http://www.acm.org/crossroads/xrds11-1/csid.html
- H. Gunes Kayacik, Nur Zincir-Heywood, Malcolm I. Heywood, Selecting Features for Intrusion Detection: A Feature Relevance Analysis on KDD '99 Benchmark, http://www.unb.ca/pstnet/pst2005/Shaughnessy%20Room/Oct13/GK_FeatRelevance.ppt
- Results of the KDD Cup 1999 Classifier Learning Contest, http://www-cse.ucsd.edu/users/elkan/clresults.html
- TextGarden – Text Mining Tools, http://kt.ijs.si/Dunja/textgarden/
- Support Vector Machine, http://kt.ijs.si/blazf/software.html