Towards Reliable Traffic Classification Using Visual Motifs


SLIDE 1

Background Visual Motifs Traffic Classification Evaluation

Towards Reliable Traffic Classification Using Visual Motifs

Wilson Lian (1), John McHugh (1, 2), Fabian Monrose (1)

(1) University of North Carolina at Chapel Hill
(2) RedJack, LLC

FloCon 2010

Wilson Lian, John McHugh, Fabian Monrose. Towards Reliable Traffic Classification Using Visual Motifs

SLIDE 2

Overview

Background
Visual Motifs
Traffic Classification
Evaluation

SLIDE 3

Motivation

Internet Network Administrator

SLIDE 4

Motivation

GET /index.ht... d3b07384d113e... d41d8cd98f00b... 7d8ad5cb9c940... MAIL FROM: foo@... d41d8cd98f00b...

SLIDE 5

Goals

Port 22: 36fd6d8c3f5af4... | fc2394c1a922... | f4d6d8c3f5a36... | 222394c1a9fc... | 5ad6d8c3ff436... | a92394c122fc...
Port 25: MAIL FROM: foo@... | f98698466c3ef... | ef8698466c3f9... | DATA\r\nSubject: fo... | f98698466c3ef...
Port 80: b314caafaa3e... | b314caafaa3e... | POST /login.ph... | b314caafaa3e... | POST /AuthSv... | GET /index.ht...
Port 1214: 113edec49eaa... | 006f7b3db8f4f... | aa3edec49e11... | 4f006f7b3db8f0... | 9e3edec4aa11... | b8006f7b3d4ff0...

SLIDE 6

Assumptions

Reliable transport via TCP
Stream cipher
No access to payload
Length preservation
Negligible packet loss & retransmission

SLIDE 7

Related Work

Scatter (and other) Plots for Visualizing User Profiling Data and Network Traffic, Goldring 2004.
Using Visual Motifs to Classify Encrypted Traffic, Wright et al. 2006.
Intelligent Classification and Visualization of Network Scans, Muelder et al. 2008.
FloVis: A Network Security Visualization Framework, Taylor 2009.

SLIDE 8

Timeline Heatmaps

Image credit: Wright et al. 2006

SLIDE 15

Unigram Heatmaps

Image credit: Wright et al. 2006

SLIDE 16

Bigram Heatmaps

SLIDE 17

Heatmap Construction

Figure: packet timeline between client and server. Labeled packets: SYN (48 bytes), SYN-ACK (48 bytes), ACK (40 bytes), HTTP Request (891 bytes), then packets of 40, 270, 1500, 40, 1500, and 40 bytes.

Signed packet sizes: 48, -48, 40, 891, -40, -270, -1500, 40, -1500, 40

Bigrams: (48, -48) (-48, 40) (40, 891) (891, -40) (-40, -270) (-270, -1500) (-1500, 40) (40, -1500) (-1500, 40)
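
The pairing shown on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `to_bigrams` is hypothetical, and the signed sizes are read off the slide, where the sign appears to encode packet direction.

```python
from typing import List, Tuple


def to_bigrams(signed_sizes: List[int]) -> List[Tuple[int, int]]:
    """Pair each packet with its successor; n packets give n - 1 bigrams."""
    return list(zip(signed_sizes, signed_sizes[1:]))


# Signed packet sizes from the slide (sign taken to encode direction).
session = [48, -48, 40, 891, -40, -270, -1500, 40, -1500, 40]
bigrams = to_bigrams(session)
print(bigrams)
```

Ten packets yield nine bigrams, which is why the fractions on the following slides have a denominator of 9.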

SLIDE 18

Heatmap Construction

Bigrams: (48, -48) (-48, 40) (40, 891) (891, -40) (-40, -270) (-270, -1500) (-1500, 40) (40, -1500) (-1500, 40)

Highlighted bigram: (40, 891)

SLIDE 19

Heatmap Construction

Bigrams: (48, -48) (-48, 40) (40, 891) (891, -40) (-40, -270) (-270, -1500) (-1500, 40) (40, -1500) (-1500, 40)

Highlighted bigram: (40, 891)

Fractions of the 9 bigrams: 3/9 = 33.3%, 2/9 = 22.2%, 1/9 = 11.1%, 3/9 = 33.3%
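
The fractions 3/9, 2/9, 1/9, and 3/9 match what one gets by counting the nine bigrams per sign quadrant. The slide does not state that mapping, so the sketch below treats it as an assumption; the helper name `quadrant` is hypothetical.

```python
from collections import Counter

bigrams = [(48, -48), (-48, 40), (40, 891), (891, -40), (-40, -270),
           (-270, -1500), (-1500, 40), (40, -1500), (-1500, 40)]


def quadrant(bigram):
    """Label a bigram by the signs of its two packet sizes (assumed binning)."""
    first, second = bigram
    return ("+" if first > 0 else "-", "+" if second > 0 else "-")


counts = Counter(quadrant(b) for b in bigrams)
for quad, count in sorted(counts.items()):
    print(quad, f"{count}/{len(bigrams)} = {count / len(bigrams):.1%}")
```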

SLIDE 20

Heatmap Construction

SLIDE 21

Bigram Heatmaps

SLIDE 22

Modeling Protocol Behavior

Bigrams: (48, -48) (-48, 40) (40, 891) (891, -40) (-40, -270) (-270, -1500) (-1500, 40) (40, -1500) (-1500, 40)

Highlighted bigram: (40, 891)

Fractions of the 9 bigrams: 3/9 = 33.3%, 2/9 = 22.2%, 1/9 = 11.1%, 3/9 = 33.3%

Bin labels: 1, 3, 4, 2

SLIDE 23

Modeling Protocol Behavior

Figure: bar chart of Probability by Bin.

Bin 1: .333
Bin 2: .111
Bin 3: .222
Bin 4: .333

SLIDE 24

Comparing Protocol Models

Figure: bar chart of Probability by Bin for two distributions.

Bin   First   Second
1     .333    .100
2     .111    .700
3     .222    .150
4     .333    .050

SLIDE 25

Comparing Protocol Models

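\[
A_{\text{total}} = \sum_{k=1}^{n} A_k
\qquad
B_{\text{total}} = \sum_{k=1}^{n} B_k
\]

\[
\text{Score}_{A \leftrightarrow B}
= \sum_{i=1}^{n} \left| \frac{A_i}{A_{\text{total}}} - \frac{B_i}{B_{\text{total}}} \right|
= \frac{1}{A_{\text{total}} \cdot B_{\text{total}}} \sum_{i=1}^{n} \left| A_i \cdot B_{\text{total}} - B_i \cdot A_{\text{total}} \right|
\]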
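
As a rough check that the two forms of the score agree, here is a small sketch that computes both from raw bin counts. The function names and the example counts are hypothetical; only the formula itself comes from the slide.

```python
from typing import Sequence


def difference_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum over bins of |A_i / A_total - B_i / B_total|."""
    if len(a) != len(b):
        raise ValueError("both histograms need the same number of bins")
    a_total, b_total = sum(a), sum(b)
    if a_total == 0 or b_total == 0:
        raise ValueError("histograms must not be empty")
    return sum(abs(x / a_total - y / b_total) for x, y in zip(a, b))


def difference_score_cross(a: Sequence[float], b: Sequence[float]) -> float:
    """Same score with the division factored out of the sum."""
    a_total, b_total = sum(a), sum(b)
    cross = sum(abs(x * b_total - y * a_total) for x, y in zip(a, b))
    return cross / (a_total * b_total)


counts_a = [3, 1, 2, 3]    # hypothetical bin counts, total 9
counts_b = [2, 14, 3, 1]   # hypothetical bin counts, total 20
print(difference_score(counts_a, counts_b))
print(difference_score_cross(counts_a, counts_b))
```

The factored form needs only one division, which may be convenient when the bins hold integer counts.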

SLIDE 26

Comparing Protocol Models

Figure: per-bin Difference in Probability between the two distributions.

Bin 1: |.333 - .100| = .233
Bin 2: |.111 - .700| = .589
Bin 3: |.222 - .150| = .072
Bin 4: |.333 - .050| = .283

Score = .233 + .589 + .072 + .283 = 1.177

SLIDE 27

Comparing Protocol Models

Figure: bar chart of Probability by Bin for two distributions.

Bin   First   Second
1     .333    .400
2     .111    .150
3     .222    .150
4     .333    .300

SLIDE 28

Comparing Protocol Models

Figure: per-bin Difference in Probability between the two distributions.

Bin 1: |.333 - .400| = .067
Bin 2: |.111 - .150| = .039
Bin 3: |.222 - .150| = .072
Bin 4: |.333 - .300| = .033

Score = .067 + .039 + .072 + .033 = .211

SLIDE 29

Classifying Samples: Easy as 1-2-3

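1. Create training models for desired protocols
2. Build distribution for sample network trace
3. Find training model with lowest difference score

\[
\text{Score}_{A \leftrightarrow B}
= \sum_{i=1}^{n} \left| \frac{A_i}{A_{\text{total}}} - \frac{B_i}{B_{\text{total}}} \right|
= \frac{1}{A_{\text{total}} \cdot B_{\text{total}}} \sum_{i=1}^{n} \left| A_i \cdot B_{\text{total}} - B_i \cdot A_{\text{total}} \right|
\]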
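
Step 3 can be sketched as a minimum over per-model scores. The sketch assumes the distributions are already normalized probabilities, so it skips the division by the totals. The model names `model_b` and `model_c` are hypothetical labels for the two comparison distributions used in the earlier worked examples.

```python
from typing import Dict, Sequence, Tuple


def l1_score(sample: Sequence[float], model: Sequence[float]) -> float:
    """Sum of absolute per-bin probability differences."""
    return sum(abs(s - m) for s, m in zip(sample, model))


def classify(sample: Sequence[float],
             models: Dict[str, Sequence[float]]) -> Tuple[str, Dict[str, float]]:
    """Return the name of the model with the lowest score, plus all scores."""
    scores = {name: l1_score(sample, model) for name, model in models.items()}
    best = min(scores, key=scores.get)
    return best, scores


sample = [0.333, 0.111, 0.222, 0.333]
models = {
    "model_b": [0.100, 0.700, 0.150, 0.050],  # hypothetical name
    "model_c": [0.400, 0.150, 0.150, 0.300],  # hypothetical name
}
best, scores = classify(sample, models)
print(best, {name: round(value, 3) for name, value in scores.items()})
```

With these numbers the second distribution wins, in line with the scores of 1.177 and .211 worked out on the comparison slides.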

SLIDE 30

Evaluation

How much traffic must be collected for:
Training
Testing

Precision? true positives / (true positives + false positives)
Recall? true positives / (true positives + false negatives)
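
The two metrics can be computed per port from lists of true and predicted labels. This is a generic sketch; the port labels in the example are invented for illustration and are not results from the talk.

```python
from typing import Hashable, Sequence, Tuple


def precision_recall(true_labels: Sequence[Hashable],
                     predicted_labels: Sequence[Hashable],
                     positive: Hashable) -> Tuple[float, float]:
    """Precision and recall for one class, treating it as the positive label."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented labels: one port-80 sample is misclassified as port 25.
truth = [80, 80, 25, 25, 22]
guess = [80, 25, 25, 25, 22]
print(precision_recall(truth, guess, positive=80))
print(precision_recall(truth, guess, positive=25))
```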

SLIDE 31

Data

CRAWDAD Dataset
Weekdays: January 19, 2004 – February 6, 2004
Ports with sufficient traffic: ≥ 1M packets (0.3% of ports → 95.21% of packets)
Keep top 10 ports by number of sessions observed
No ground truth

Total Packets: 1.3 Billion
Traffic Volume: 707 GB
Observed Ports: 64,214
Sessions: 5.2 Million
Port 80 Sessions: 1.7 Million

SLIDE 32

Methodology

Trial :=
1. Randomly sample some percentage of available data for each port and train classifier
2. Randomly sample some number of the remaining data points for each port and create testing samples
3. Classify testing samples

50 Trials

Figure: data for ports 80, 110, and 445, each split into Training Data and Testing Data.
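
Steps 1 and 2 of a trial amount to a per-port random split. The sketch below is one way to do it, under assumptions: the function name `split_trial`, the toy data, and the seed are all hypothetical, and a "data point" is simply an opaque item here.

```python
import random
from typing import Dict, List, Tuple


def split_trial(points_by_port: Dict[int, List[int]],
                train_fraction: float,
                test_size: int,
                rng: random.Random) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    """Per port: sample a fraction for training, then test_size of the rest."""
    training, testing = {}, {}
    for port, points in points_by_port.items():
        shuffled = list(points)
        rng.shuffle(shuffled)
        n_train = int(len(shuffled) * train_fraction)
        training[port] = shuffled[:n_train]
        # The remainder is already in random order, so a prefix is a random sample.
        testing[port] = shuffled[n_train:n_train + test_size]
    return training, testing


# Toy data: 100 placeholder data points for each of three ports.
data = {80: list(range(100)), 110: list(range(100, 200)), 445: list(range(200, 300))}
train, test = split_trial(data, train_fraction=0.15, test_size=20, rng=random.Random(0))
print({port: (len(train[port]), len(test[port])) for port in data})
```

Drawing the testing points only from what remains after the training split keeps the two sets disjoint within a trial.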

SLIDE 33

Training Size Selection

Figure: Average Recall for Varying Training Size. X axis: Training Sample Size (%). Y axis: Average Recall. Annotated values: .884 and .952 (6.8% improvement).

SLIDE 34

Training Size Selection

Figure: Average Precision for Varying Training Size. X axis: Training Sample Size (%). Y axis: Average Precision. Annotated values: .887 and .953 (6.6% improvement).

SLIDE 35

Testing Size Selection

Figure: Average Recall for Varying Testing Size. X axis: Testing Sample Size (Data Points), 10,000 to 50,000. Y axis: Average Recall. Annotated values: .898, .926, and .954 (5.6% improvement).

SLIDE 36

Testing Size Selection

Figure: Average Precision for Varying Testing Size. X axis: Testing Sample Size (Data Points), 10,000 to 50,000. Y axis: Average Precision. Annotated values: .914 and .961 (4.7% improvement).

SLIDE 37

Results

Trials: 50
Training Set Size: 15%
Testing Set Size: 50,000 Data Points
Precision: 96.5%
Recall: 96.0%

SLIDE 38

Classification Confidence Threshold

Goal: Eliminate close calls
Require 1st place candidate to lead 2nd place by certain amount to make decision
Standard deviation of scores
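
One possible reading of this slide is that the lead of the best candidate over the runner-up is measured in units of the standard deviation of all candidate scores. The slide does not spell this out, so the sketch below is an assumption, as are the function name `confident_choice`, the use of the population standard deviation, and the example scores.

```python
import statistics
from typing import Dict, Optional


def confident_choice(scores: Dict[str, float], lead_threshold: float) -> Optional[str]:
    """Return the lowest-scoring model only if it leads the runner-up by enough.

    Assumed rule: lead = (second best - best) / standard deviation of all scores.
    """
    if len(scores) < 2:
        raise ValueError("need at least two candidate models")
    ranked = sorted(scores.items(), key=lambda item: item[1])
    (best, best_score), (_, second_score) = ranked[0], ranked[1]
    spread = statistics.pstdev(list(scores.values()))
    if spread == 0:
        return None  # all scores identical: no basis for a decision
    lead = (second_score - best_score) / spread
    return best if lead >= lead_threshold else None


clear = {"port_80": 0.2, "port_25": 1.0, "port_22": 1.2}   # invented scores
close = {"port_80": 0.50, "port_25": 0.52, "port_22": 1.5}  # invented scores
print(confident_choice(clear, lead_threshold=1.0))
print(confident_choice(close, lead_threshold=1.0))
```

Returning `None` for a close call leaves it to the caller to decide what to do with an undecided sample.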

SLIDE 39

Methodology v2.0

Randomly sample some percentage of available data for each port and train classifier
Randomly sample some number of the remaining data points for each port and create testing samples
Attempt to classify testing samples

If all testing samples reach threshold, done.
If any testing sample fails, rebuild testing samples and try again.
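
The retry rule can be sketched as a loop around two caller-supplied functions. Everything named here is hypothetical: `build_samples` stands in for the resampling step, `classify` returns `None` when a sample fails the threshold, and the `max_attempts` cap is an added safeguard that the slide does not mention.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

Sample = TypeVar("Sample")


def run_until_confident(build_samples: Callable[[], List[Sample]],
                        classify: Callable[[Sample], Optional[str]],
                        max_attempts: int = 100) -> Tuple[List[str], int]:
    """Rebuild the testing samples until every one of them gets a decision."""
    for attempt in range(1, max_attempts + 1):
        decisions = [classify(sample) for sample in build_samples()]
        if all(decision is not None for decision in decisions):
            return decisions, attempt
    raise RuntimeError("no attempt reached the threshold for every sample")


# Stub inputs: the first batch contains a failing sample, the second does not.
batches = iter([[1, -1], [2, 3]])
decisions, attempts = run_until_confident(
    build_samples=lambda: next(batches),
    classify=lambda value: "decided" if value > 0 else None,
)
print(decisions, attempts)
```

Counting attempts also gives a handle on how much extra traffic the threshold costs, which is the quantity plotted against the lead threshold later in the deck.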

SLIDE 40

Classification Confidence

Trials: 50
Training Set Size: 5%
Testing Set Size: 35,000 Data Points
Lead Threshold: 1.0
Precision: 96.9%
Recall: 96.6%

SLIDE 41

Classification Confidence

Figure: Testing Sessions Sampled for 50 Trials, 35,000 Data Point Testing Samples. X axis: Lead Threshold (0.25, 0.5, 0.75, 1, 1.25). Y axis: Testing Sessions Sampled. Annotated values: 1.0 M, 1.0 M, 1.2 M, 2.4 M, 5.4 M.

SLIDE 42

Ground Truth Testing

MIT Lincoln Labs DARPA Data
50 trials, 5% training sample size, 35,000 data point testing sample size, 1.25 lead threshold
Precision: 98.3%
Recall: 98.0%

SLIDE 43

Results

Trials: 50
Training Set Size: 5%
Testing Size: 35,000 Data Points
Lead Threshold: 1.25
Ground Truth

SLIDE 44

Evasion

One might attempt to thwart our technique by padding all packets to MTU.
Reduces problem to 4-quadrant problem.
Can still make decisions based on relative prevalence of each quadrant.
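
To see why padding leaves a 4-quadrant problem, the sketch below pads every packet of an example session to an assumed MTU of 1500 bytes and compares the sign-quadrant counts before and after. The helper names and the MTU value are assumptions; the session is the one used in the heatmap construction slides.

```python
from collections import Counter
from typing import List

MTU = 1500  # assumed MTU in bytes


def pad_to_mtu(signed_sizes: List[int]) -> List[int]:
    """Pad every packet to the MTU while keeping its direction (sign)."""
    return [MTU if size > 0 else -MTU for size in signed_sizes]


def sign_quadrants(signed_sizes: List[int]) -> Counter:
    """Count bigrams by the signs of their two packets."""
    pairs = zip(signed_sizes, signed_sizes[1:])
    return Counter((first > 0, second > 0) for first, second in pairs)


session = [48, -48, 40, 891, -40, -270, -1500, 40, -1500, 40]
padded = pad_to_mtu(session)
print(sorted(set(zip(padded, padded[1:]))))
print(sign_quadrants(padded) == sign_quadrants(session))
```

Padding collapses every bigram onto one of four points, but how often each quadrant occurs is unchanged, which is the information the slide says remains usable.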

SLIDE 45

Current/Future Work

Packet loss/re-transmission may cause unpredictable results
On-line classification
Training and testing from separate datasets
UDP
Subcategorization

SLIDE 46

Conclusion

Modeling protocol behavior using only packet size, direction, and order
Resistant to encryption and padding
Average precision and recall > 97%
Quick and reliable traffic inspection
Useful for pre-screening traffic for deeper analysis

SLIDE 47

Questions?

Thanks for listening. Q & A wwlian@gmail.com
