Discovering Packet Structure through Lightweight Hierarchical Clustering

SLIDE 1

Understanding Network Traffic Approach Network Traffic Clustering: ADHIC Evaluation

Discovering Packet Structure through Lightweight Hierarchical Clustering

Abdulrahman Hijazi, Hajime Inoue, Ashraf Matrawy, P.C. van Oorschot, Anil Somayaji

SLIDE 2

The Problem

Network traffic is complex
  Many users and uses
  Numerous applications and protocols
  A massive number of operating systems and connected devices
New applications/protocols appear quickly
Tools are limited and require a priori knowledge

SLIDE 3

Example

Surge in port 80 traffic. Why?
  Flash crowd
  Web service config error
  Worm
  P2P

SLIDE 4

Two common approaches

What tools are available to understand traffic?

1. Header-based classifiers (IP addresses and port numbers): fail with misleading and ambiguous protocols
2. Protocol dissectors (e.g. Wireshark): fail with unknown protocols and are knowledge intensive

SLIDE 5

Our target

Devise a technique to complement existing tools in detecting/understanding novel traffic patterns. This technique:
  works at wire/router speeds
  doesn't require signatures or built-in knowledge
  groups network traffic into semantically equivalent clusters
  automatically adapts to the ever-changing network traffic

SLIDE 6

Approach

An unsupervised clustering algorithm that creates semantically equivalent classes without manually labeled training data.
The algorithm tries to find patterns (within the whole packet) across protocols, and uses them to cluster the network traffic.
Packets are clustered, rather than classified, in order to capture the commonalities of novel, unknown network protocols and usage patterns.

SLIDE 7

(p, n)-grams

We define (p, n)-grams:

n-gram: n consecutive bytes within a packet
(p, n)-gram: n-gram at position p

In our experiments we found that network packets generally contain a significant number of high- and moderate-frequency (p, n)-grams whose frequencies appear to follow a power law analogous to Zipf's law.
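
The definition above can be illustrated with a minimal sketch that extracts every (p, n)-gram from a packet and counts how many packets contain each one. The function names `pn_grams` and `count_pn_grams`, the default n = 2, and the toy packet bytes are assumptions made for illustration; they are not taken from the slides or from NetADHICT.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

PNGram = Tuple[int, bytes]  # (position p, the n bytes found at that position)


def pn_grams(packet: bytes, n: int = 2) -> Iterator[PNGram]:
    """Yield every (p, n)-gram of a packet: the n bytes starting at offset p."""
    for p in range(len(packet) - n + 1):
        yield (p, packet[p:p + n])


def count_pn_grams(packets: Iterable[bytes], n: int = 2) -> Counter:
    """Count in how many packets each (p, n)-gram occurs."""
    counts: Counter = Counter()
    for packet in packets:
        # A given position p occurs once per packet, so each packet
        # contributes at most 1 to any (p, n)-gram's count.
        counts.update(pn_grams(packet, n))
    return counts


# Toy "packets" (hypothetical bytes, not real traffic).
packets = [
    bytes([0x45, 0x00, 0x00, 0x28]),
    bytes([0x45, 0x00, 0x00, 0x3C]),
    bytes([0x60, 0x00, 0x00, 0x28]),
]
counts = count_pn_grams(packets, n=2)
for (p, gram), freq in counts.most_common(3):
    print(p, gram.hex(), freq)
```

Sorting the resulting counts by frequency gives the kind of rank-frequency list whose shape the slide compares to Zipf's law.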

SLIDE 8

(p, n)-gram Frequency Distribution

[Plot: the 1000 most frequent (p, n)-grams in a 3-hour-long dataset]

SLIDE 13

ADHIC

ADHIC (Approximate Divisive HIerarchical Clustering) produces a hierarchical decomposition of network traffic in the form of a cluster-identifying decision tree.
ADHIC starts with one cluster and then develops and shapes the decision tree over time using splitting and deletion.
We supplement ADHIC with a port-based classifier in order to label the clusters in the leaf nodes.
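
One way to picture the leaf-labeling step is a majority vote over per-packet port labels. The sketch below is only an assumption about how such a supplement could work: the slides do not say how the port-based classifier is combined with the tree, and `PORT_LABELS`, `port_label`, and `label_leaf` are hypothetical names. The port numbers themselves are standard well-known assignments.

```python
from collections import Counter
from typing import Iterable, Tuple

# Small illustrative table of well-known ports (deliberately incomplete).
PORT_LABELS = {80: "HTTP", 110: "POP", 143: "IMAP", 631: "IPP", 993: "IMAPS"}


def port_label(src_port: int, dst_port: int) -> str:
    """Label one packet by whichever of its ports appears in the table."""
    for port in (dst_port, src_port):
        if port in PORT_LABELS:
            return PORT_LABELS[port]
    return "unknown"


def label_leaf(ports: Iterable[Tuple[int, int]]) -> str:
    """Name a leaf cluster after its most common per-packet port label.

    Majority voting is an assumption of this sketch, not a documented
    ADHIC rule.
    """
    votes = Counter(port_label(src, dst) for src, dst in ports)
    if not votes:
        return "empty"
    return votes.most_common(1)[0][0]


# (source port, destination port) pairs of packets that reached one leaf.
leaf_ports = [(51000, 80), (80, 51000), (51001, 993)]
print(label_leaf(leaf_ports))
```

The clustering itself never looks at the port table; the table is only consulted afterwards to give a human-readable name to a cluster.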

SLIDE 14

ADHIC Decision Tree

How ADHIC Trees Start and Develop

N1 10565 (100.00%) 80250 (100.00%) 22

ADHIC Tree starts with one node accepting all traffic

SLIDE 15

ADHIC Decision Tree

How ADHIC Trees Start and Develop

N1 11109 (100.00%) 148757 (100.00%) 21

ADHIC Tree starts with one node accepting all traffic

SLIDE 16

ADHIC Decision Tree

How ADHIC Trees Start and Develop

N2  43, 0x00 0x00  11100 (100.00%)  11100 (5.89%)
  N3  5228 (47.10%)  5228 (2.77%)  7
  N4  5872 (52.90%)  5872 (3.12%)  22

ADHIC chooses a (p, n)-gram that matches 40%-60% of the traffic and splits on it: <== matching | non-matching ==>

SLIDE 17

ADHIC Decision Tree

How ADHIC Trees Start and Develop

N2  43, 0x00 0x00  8713 (100.00%)  19813 (10.67%)
  N3  4013 (46.06%)  9241 (4.98%)  8
  N4  4700 (53.94%)  10572 (5.69%)  22

ADHIC chooses a (p, n)-gram that matches 40%-60% of the traffic and splits on it: <== matching | non-matching ==>
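
The splitting rule on this slide can be sketched as follows: count (p, n)-grams over the packets that reached a node, keep the ones matched by 40%-60% of them, and divide the packets on one of those. This is a simplified illustration under stated assumptions. The names `choose_split` and `split` are hypothetical, the exhaustive counting is for clarity only (the "approximate" algorithm presumably avoids it to run at wire speed), and the tie-break toward 50% is an assumption of this sketch.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

PNGram = Tuple[int, bytes]  # (position p, n bytes)


def choose_split(packets: Sequence[bytes], n: int = 2,
                 low: float = 0.40, high: float = 0.60) -> Optional[PNGram]:
    """Return a (p, n)-gram matched by 40%-60% of the packets, or None."""
    if not packets:
        return None
    counts: Counter = Counter()
    for pkt in packets:
        for p in range(len(pkt) - n + 1):
            counts[(p, pkt[p:p + n])] += 1
    total = len(packets)
    candidates = [(abs(c / total - 0.5), gram)
                  for gram, c in counts.items()
                  if low <= c / total <= high]
    if not candidates:
        return None
    # Assumed tie-break: closest to an even split, then lowest position.
    return min(candidates)[1]


def split(packets: Sequence[bytes],
          gram: PNGram) -> Tuple[List[bytes], List[bytes]]:
    """Divide packets into (matching, non-matching) for one (p, n)-gram."""
    p, data = gram
    matching = [pkt for pkt in packets if pkt[p:p + len(data)] == data]
    non_matching = [pkt for pkt in packets if pkt[p:p + len(data)] != data]
    return matching, non_matching


# Toy packets (hypothetical bytes, not real traffic).
packets = [b"\x00\x00A", b"\x00\x00B", b"\x01\x01A", b"\x01\x01B"]
gram = choose_split(packets)
left, right = split(packets, gram)
print(gram, len(left), len(right))
```

Applying `choose_split` again to each half is what produces the deeper trees shown on the following slides.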

SLIDE 18

ADHIC Decision Tree

How ADHIC Trees Start and Develop

N2  43, 0x00 0x00  7053 (100.00%)  153214 (100.00%)
  N5  51, 0x00 0x00  2724 (38.62%)  30980 (20.22%)
    N6  1581 (22.42%)  14147 (9.23%)  4
    N7  1143 (16.21%)  16833 (10.99%)  6
  N8  31, 0x75 0x15  4329 (61.38%)  55875 (36.47%)
    N9  1365 (19.35%)  32485 (21.20%)  9
    N10  2964 (42.02%)  23390 (15.27%)  21

Further splitting...

SLIDE 19

ADHIC Decision Tree

How ADHIC Trees Start and Develop

N2  43, 0x00 0x00  8820 (100.00%)  155076 (100.00%)
  N5  51, 0x00 0x00  4093 (46.41%)  44132 (28.46%)
    N6  2616 (29.66%)  22025 (14.20%)  6
    N7  1477 (16.75%)  22107 (14.26%)  8
  N8  31, 0x75 0x15  4727 (53.59%)  71887 (46.36%)
    N9  1851 (20.99%)  40391 (26.05%)  7
    N10  2876 (32.61%)  31496 (20.31%)  20

Further splitting...

SLIDE 20

ADHIC

ADHIC (Approximate Divisive HIerarchical Clustering)

Recursively subdivides traffic into binary classes until:
  Volume is below threshold
  Group is too similar or too dissimilar
Produces a binary decision tree
  Internal nodes match against (p, n)-grams
  Leaf nodes constitute terminal clusters
Path from root to leaf constitutes a boolean expression
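
The tree structure described above can be sketched as a small data structure plus a walk from the root to a leaf that also records the boolean expression of the path. Everything here is an illustrative assumption: the `Node` class, the `classify` function, the expression syntax, and the tiny three-leaf tree with short positions are hypothetical and not taken from NetADHICT.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    name: str
    gram: Optional[Tuple[int, bytes]] = None  # (p, bytes); None for a leaf
    match: Optional["Node"] = None            # child for matching packets
    no_match: Optional["Node"] = None         # child for non-matching packets


def classify(root: Node, packet: bytes) -> Tuple[str, str]:
    """Walk root-to-leaf; return (leaf name, boolean expression of the path)."""
    node, terms = root, []
    while node.gram is not None:
        p, data = node.gram
        hit = packet[p:p + len(data)] == data
        term = f"pkt[{p}:{p + len(data)}]=={data.hex()}"
        terms.append(term if hit else f"NOT {term}")
        node = node.match if hit else node.no_match
    return node.name, " AND ".join(terms) or "TRUE"


# Hypothetical tree: two internal nodes, three leaf clusters.
tree = Node("N2", (1, b"\x00\x00"),
            match=Node("N5", (0, b"\x01"),
                       match=Node("A"), no_match=Node("B")),
            no_match=Node("C"))

print(classify(tree, b"\x01\x00\x00"))
print(classify(tree, b"\x02\x00\x00"))
print(classify(tree, b"\xff\xff\xff"))
```

Each leaf is reached by exactly one conjunction of matched and negated (p, n)-gram tests, which is the sense in which a root-to-leaf path is a boolean expression.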

SLIDE 21

Example Tree

Tree nodes (internal nodes show a (p, n)-gram as position, bytes; leaf nodes show a traffic label):

N2    43, 0x00 0x00
N5    51, 0x00 0x00
N14   6, 0x00 0x01
N11   ARP
N53   TCP (control)
N29   16, 0x00 0x30
N218  64, 0x00 0x0f
N245  0, 0x00 0x03
N335  82, 0x00 0x00
N219  EIGRP
N246  TCP (control)
N222  EIGRP
N409  TCP (control)
N336  DTP
N32   16, 0x00 0x30
N56   TCP (control)
N80   22, 0x01 0x11
N221  25, 0x29 0x86
N338  0, 0x00 0x03
N81   Ganglia
N339  TCP (control)
N412  IGMP
N8    31, 0x75 0x15
N17   16, 0x00 0x28
N35   TCP (control)
N62   9, 0x70 0xad
N101  8, 0xd3 0x3b
N254  HTTP + TCP (control)
N179  CUPS
N98   HTTP + TCP (control)
N20   37, 0xc1 0x00
N41   HSRP
N44   9, 0x70 0xad
N158  46, 0x50 0x18
N653  POP
N299  16, 0x05 0x8c
N443  54, 0x01 0x01
N545  22, 0x2c 0x06
N546  IMAPS
N608  HTTP
N548  27, 0x75 0x1b
N566  46, 0x80 0x10
N569  IMAPS + TCP (control)
N683  TCP (control)
N686  IPP + TCP (control)
N116  7, 0xd0 0xd3
N170  16, 0x05 0x8c
N458  HTTP
N227  56, 0x00 0x00
N308  54, 0x01 0x01
N379  IMAPS + TCP (control)
N228  NBSS + TCP (control)
N452  TCP (control)
N140  61, 0x00 0x0c
N141  STP
N173  50, 0x00 0x00
N497  ARP
N203  55, 0x53 0x63
N204  CUPS
N416  30, 0xff 0xff
N417  Mix.
N527  174, 0x00 0x00
N528  NBDGM

SLIDE 22

Case Study: Exploring P2P

Ports are not needed to classify traffic
New traffic is segregated; the tree then goes back to its original form

SLIDE 23

Case Study: Exploring P2P

A decision tree just BEFORE P2P traffic

SLIDE 24

Case Study: Exploring P2P

The decision tree WITH P2P traffic
  dark blue = P2P UDP tracking packets
  green and yellow = P2P TCP data packets running on port 80

SLIDE 27

Case Study: Exploring P2P

The decision tree just AFTER P2P traffic

SLIDE 28

Evaluation

In experiments with lab- and university-level traffic, ADHIC:
  Finds semantically equivalent clusters
  Adapts to the changing nature of traffic patterns
  Uses no knowledge of packet structure
  Does not rely on specific fields
  Runs in sub-linear time (currently: better than 200 Mb/s)

Limitation: ADHIC does not tell you what you have found! Therefore, it is best complemented with other tools.

SLIDE 29

NetADHICT

NetADHICT is licensed under the GNU General Public License and is available from: http://www.ccsl.carleton.ca/software

Questions? See me for Demo...
