SLIDE 1

Practical Anomaly Detection based on Classifying Frequent Traffic Patterns

Ignasi Paredes-Oliva¹, Ismael Castell-Uroz¹, Pere Barlet-Ros¹, Xenofontas Dimitropoulos², Josep Solé-Pareta¹

¹ UPC BarcelonaTech, Spain ({iparedes,icastell,pbarlet,pareta}@ac.upc.edu)
² ETH Zürich, Switzerland (fontas@tik.ee.ethz.ch)

15th IEEE Global Internet Symposium (GI), Orlando, FL, United States
March 30th, 2012

SLIDE 2

Introduction Related Work Our Proposal Performance Evaluation Conclusions

Outline

1. Introduction
2. Related Work
3. Our Proposal
4. Performance Evaluation
5. Conclusions


SLIDE 4

The problem

• Growth of cyber-attacks¹
• Anomaly detection systems are not widely deployed (e.g., too many false positives, complex black boxes)
• Anomaly classification and root-cause analysis are still open issues (e.g., manual analysis is error-prone, complex, slow and expensive²)

Our goal

• Simple system for automatic anomaly detection and classification
• High classification accuracy and low false positives
• Conceptually simple working scheme

¹ Kim-Kwang Raymond Choo, "The cyber threat landscape: Challenges and future research directions," Computers & Security, 2011.
² M. Molina et al., "Anomaly Detection in Backbone Networks: Building a Security Service Upon an Innovative Tool," TNC 2010.



SLIDE 7

Related work and contributions

• Many proposals exist on anomaly detection
• Anomaly classification has been only marginally studied

Contributions of this paper:

• Novel approach for automatic anomaly detection and classification based on classifying frequent traffic patterns
• Evaluated using data from two large networks
• High classification accuracy and low false-positive ratio
• System deployed in the Catalan NREN



SLIDE 10

System Overview

Two phases:

• Offline: build a model to classify anomalies (Frequent Item-Set Mining → Feature Extraction → Machine Learning → Model)
• Online: use the model to classify incoming traffic (Frequent Item-Set Mining → Feature Extraction → Classification)
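The two-phase scheme can be sketched as a pair of functions (a minimal Python sketch; the function and parameter names are illustrative, not from the paper):

```python
# Minimal sketch of the two-phase scheme. The stage functions (mine,
# extract, learn) are placeholders for the components named above;
# none of these names come from the authors' implementation.

def offline_phase(training_flows, mine, extract, learn):
    """Offline: mine frequent item-sets from labeled traffic, extract
    their features, and train a classification model."""
    itemsets = mine(training_flows)            # frequent item-set mining
    features = [extract(i) for i in itemsets]  # feature extraction
    return learn(features)                     # machine learning -> model

def online_phase(live_flows, mine, extract, model):
    """Online: run the same mining and feature extraction on incoming
    traffic, then classify each item-set with the trained model."""
    itemsets = mine(live_flows)
    return [(i, model(extract(i))) for i in itemsets]
```

The key design point is that both phases share the mining and feature-extraction stages, so the model sees the same representation at training and classification time.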

SLIDE 11

Frequent Item-Set Mining

• Originally used in market-basket analysis to find products that are frequently bought together and make appealing offers (e.g., beer and chips)
• What is an item-set? A compact summarization of elements occurring together
• Why is it useful for anomaly detection? Many attacks involve a high volume of flows with common features (e.g., a Port Scan produces many flows with the same sIP and dIP)
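A toy illustration of the mining step over flow records (an exhaustive enumeration for clarity; real miners such as Apriori or FP-growth prune the search space, and nothing here is the paper's implementation):

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(flows, min_support):
    """Naive frequent item-set mining over flow records.

    Each flow is a dict of header fields; an item is a (field, value)
    pair, and an item-set is any combination of items that occurs in
    at least `min_support` flows. Exhaustive enumeration is exponential
    in the number of fields; this is a toy for clarity only."""
    counts = Counter()
    for flow in flows:
        items = sorted(flow.items())  # canonical order, so subsets match
        for r in range(1, len(items) + 1):
            for subset in combinations(items, r):
                counts[subset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}
```

On a scan-like batch of flows sharing sIP and dIP, the pair {sIP, dIP} surfaces as a frequent item-set while the varying dPort values do not.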


SLIDE 15

Frequent Item-Set Mining

Port Scan example:

               sIP          dIP            sPort   dPort
  1st flow     X.77.17.59   Y.88.243.209   41393   21209
  2nd flow     X.77.17.59   Y.88.243.209   41393   54766
  3rd flow     X.77.17.59   Y.88.243.209   41393   31448
  4th flow     X.77.17.59   Y.88.243.209   41393   58514
  ...
  2911th flow  X.77.17.59   Y.88.243.209   41393   48732

  item-set     X.77.17.59   Y.88.243.209   41393   *

Further information is needed per item-set in order to classify it.
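Collapsing such a group of flows into one item-set amounts to keeping the fields with a single common value and wildcarding the rest (illustrative sketch; `summarize` is not a function from the paper):

```python
def summarize(flows, fields=("sIP", "dIP", "sPort", "dPort")):
    """Collapse a group of flows into one item-set: a field shared by
    every flow keeps its value, a varying field becomes a wildcard '*'."""
    itemset = {}
    for f in fields:
        values = {flow[f] for flow in flows}
        itemset[f] = values.pop() if len(values) == 1 else "*"
    return itemset
```

Applied to the Port Scan flows above, sIP, dIP and sPort survive while dPort collapses to `*`, exactly as in the item-set row of the table.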


SLIDE 18

Feature Extraction

Features computed for each frequent item-set:

  Feature                    Value if defined    Value if undefined
  Src IP / Dst IP            True                False
  Src / Dst Port             port number         NaN
  Protocol                   protocol number     NaN
  URG/ACK/PSH/RST/SYN/FIN    True                False
  Bytes per Packet (bpp)     #Bytes / #Packets
  Packets per Flow (ppf)     #Packets / #Flows
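A sketch of this feature computation for one item-set (the TCP-flag features are omitted for brevity; the function name and signature are illustrative, not the paper's):

```python
import math

def extract_features(itemset, n_bytes, n_packets, n_flows):
    """Feature vector for one frequent item-set, per the table above.
    A field the item-set wildcards ('*') or lacks is 'undefined':
    boolean features become False, numeric ones become NaN."""
    def defined(field):
        return itemset.get(field, "*") != "*"

    return {
        "sIP_defined": defined("sIP"),
        "dIP_defined": defined("dIP"),
        "sPort": itemset["sPort"] if defined("sPort") else math.nan,
        "dPort": itemset["dPort"] if defined("dPort") else math.nan,
        "proto": itemset["proto"] if defined("proto") else math.nan,
        "bpp": n_bytes / n_packets,   # bytes per packet
        "ppf": n_packets / n_flows,   # packets per flow
    }
```

The ratios bpp and ppf are always defined since they come from the aggregate counters of the flows matching the item-set.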

SLIDE 19

Building the classifier (offline)

• Goal: build a model from manually labeled frequent item-sets
• Output classes:
  - Anomalous: DoS (DDoS, SYN/ACK/UDP/ICMP floods), Network Scans (ICMP/Other Network Scans), Port Scans (SYN/ACK/UDP Port Scans)
  - Normal: legitimate traffic
  - Unknown: not normal and did not fit any anomalous class
• Labeled item-sets + features + output classes are given to the C5.0 machine-learning algorithm → output: classification model
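The paper trains with C5.0; as a self-contained illustration of the same idea, here is a toy learner that fits a single-split decision stump on labeled feature vectors (the real system grows a full decision tree, and none of these names come from the paper):

```python
from collections import Counter

def train_stump(samples):
    """Toy stand-in for C5.0: given labeled samples
    [(features_dict, label), ...] with numeric features, pick the
    (feature, threshold) split that classifies the most training
    samples correctly, and return it as a classifier function."""
    def majority(rows):
        return Counter(lbl for _, lbl in rows).most_common(1)[0][0]

    best = None
    for feat in samples[0][0]:
        for thr in sorted({f[feat] for f, _ in samples})[:-1]:
            left = [s for s in samples if s[0][feat] <= thr]
            right = [s for s in samples if s[0][feat] > thr]
            score = sum(lbl == majority(left) for _, lbl in left) + \
                    sum(lbl == majority(right) for _, lbl in right)
            if best is None or score > best[0]:
                best = (score, feat, thr, majority(left), majority(right))

    _, feat, thr, lo_class, hi_class = best
    return lambda features: lo_class if features[feat] <= thr else hi_class
```

C5.0 applies this split-selection step recursively (using information gain rather than raw accuracy), which yields trees like the excerpt on the next slide.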


SLIDE 22

Classifying an item-set (online)

The model is used to classify each incoming item-set.

[Decision-tree excerpt: from the root, successive splits on bpp (≤ 29 vs. > 29), proto (≤ 6 vs. > 6), sIP_defined (false vs. true) and ppf (≤ 1.04 vs. > 1.04) lead to leaf classes such as Port Scan and DDoS; other subtrees elided.]
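One plausible reading of the tree excerpt as nested tests (the split ordering and leaf assignments are guesses from the slide; the trained model's actual structure may differ, and elided subtrees fall through to "Unknown" here):

```python
def classify_itemset(f):
    """Hand-coded rendering of the decision-tree excerpt above.
    `f` is a feature dict (bpp, proto, sIP_defined, ppf)."""
    if f["bpp"] <= 29:                 # tiny packets: scan/flood territory
        if f["proto"] <= 6:            # TCP (protocol 6) or below
            if f["sIP_defined"]:
                if f["ppf"] <= 1.04:   # about one packet per flow
                    return "Port Scan"
                return "DDoS"
            return "Unknown"           # subtree elided on the slide
        return "Unknown"               # subtree elided on the slide
    return "Unknown"                   # subtree elided on the slide
```

Evaluating one item-set is a handful of comparisons, which is what makes the online phase cheap and the resulting decisions easy to explain.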


SLIDE 24

Datasets

1. GÉANT
   • European backbone NREN
   • Connects 34 European NRENs, 12 non-European NRENs and 2 commercial providers
   • Sampled NetFlow

2. Anella Científica
   • Catalan NREN
   • Connects more than 80 research institutions
   • Unsampled NetFlow
   • Our system is currently deployed in this scenario


SLIDE 26

Building the Ground Truth

1. Run frequent item-set mining on GÉANT NetFlow data
2. Manually analyze and classify the returned item-sets as Anomalous, Normal or Unknown

The resulting ground truth consists of 1249 labeled item-sets.


SLIDE 30

Results in GÉANT

[Bar chart: per-class precision and recall for ACK Flood, ACK Port Scan, DDoS, ICMP Flood, ICMP Scan, Network Scan, Normal, SYN Flood, SYN Port Scan, UDP Flood, UDP Port Scan and Unknown]

• Unbalanced model → overall accuracy is good (95.7%) but poor for ACK Port Scans and ICMP Floods
• Balanced model (representativeness of those classes was increased) → great improvement: 98% overall accuracy

SLIDE 32

Results in the Catalan NREN

[Bar chart: per-class precision and recall for ACK Port Scan, DDoS, ICMP Scan, Network Scan, SYN Flood, SYN Port Scan and Unknown]

• Decision tree trained on GÉANT data
• Initially: 18 false positives out of 310 anomalies in 10 days (overall accuracy: 94.11%)
• Low precision for DDoS and ACK Port Scans → 80% of these false positives were misclassified replies to Network Scans and SYN Floods
• After improving the system to account for this: 4 false positives out of 310 anomalies in 10 days (overall accuracy: 99.1%)


SLIDE 35

Conclusions

• Novel system to detect and classify anomalies in network traffic
• Conceptually simple approach → easy to comprehend and reason about detected anomalies
• High classification accuracy (> 98%)
• Low number of false positives (≈ 1%)
• Classification model trained on GÉANT and successfully reused in the Catalan NREN
• System deployed in the Catalan NREN

SLIDE 36

Practical Anomaly Detection based on Classifying Frequent Traffic Patterns

Ignasi Paredes-Oliva¹, Ismael Castell-Uroz¹, Pere Barlet-Ros¹, Xenofontas Dimitropoulos², Josep Solé-Pareta¹

¹ UPC BarcelonaTech, Spain ({iparedes,icastell,pbarlet,pareta}@ac.upc.edu)
² ETH Zürich, Switzerland (fontas@tik.ee.ethz.ch)

15th IEEE Global Internet Symposium (GI), Orlando, FL, United States

We thank DANTE and CESCA for providing us access to GÉANT and Anella Científica, respectively. This work was partially funded by the Spanish Ministry of Education under contract TEC2011-27474 and the Catalan Government under contract 2009SGR-1140.