Network Telescope Data Analysis: IBR Monitoring

SLIDE 1

Network Telescope Data Analysis: IBR Monitoring

telescope/darknets/darkspace, 22 Mar 11

Nevil Brownlee

IBR Monitoring CAIDA, 2011 – p.1/17

SLIDE 2

Background, Earlier Work

Moore, Shannon et al, CAIDA/UCSD, 2000..2006

the telescope gives a whole-world (1/256) view; a mathematical model of that view was used to track the rise of fast-spreading worms

Pang, Yegneswaran, Barford, Paxson and Peterson Characteristics of Internet Background Radiation, SIGCOMM 2004

passive analysis showing activity over time for different ports & telescopes (darkspaces); active responders to investigate what sources were trying to do, e.g. Code Red, Agobot, Welchia, etc.

Wustrow, Karir, Bailey, Jahanian and Huston Internet Background Radiation Revisited, IMC, 2010

evolution of IBR since 2004: a steady increase in Mb/s each year; address pollution (looking at newly-allocated /8 prefixes; traffic to prefixes within 1.0.0.0/8)


SLIDE 3

Telescope Monitoring – what do we want?

A set of web pages we can look at each day that tells us "something interesting is happening". We would like to classify the unsolicited traffic sources into groups somehow, so that we could look for:

changes in the levels of each group
new groups appearing
old groups disappearing

This is the same problem as that of managing a network

Network Managers want a display that shows them “what’s happening in the network now” and the ability to ’drill down’ (by clicking on the display) to find more detail


SLIDE 4

Constraints, Approaches

Problems:

data volume: UCSD telescope trace files are big, about 4–10 GiB every hour
we only do passive monitoring
we need to do the monitoring in near-real time, so as to see changes as they appear
we'd like to save 'interesting' trace files for later fine-detail analysis

Many opinions about what's 'interesting'!

for long-term monitoring (per-hour plots) we need to decide what we want to plot, e.g. (simple example) TCP/UDP/ICMP source/packet/byte volumes
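As a minimal sketch of this simple example, assuming packet records are (timestamp, protocol, source_ip, length) tuples (an assumed record shape, not the telescope's actual trace format), the per-hour source/packet/byte volumes could be accumulated like this:

```python
from collections import defaultdict

def hourly_volumes(packets):
    """Accumulate per-hour, per-protocol (sources, packets, bytes)."""
    stats = defaultdict(lambda: {"sources": set(), "packets": 0, "bytes": 0})
    for ts, proto, src, length in packets:
        hour = int(ts) // 3600              # bucket timestamps by whole hours
        entry = stats[(hour, proto)]
        entry["sources"].add(src)
        entry["packets"] += 1
        entry["bytes"] += length
    # collapse source sets to counts, ready for plotting
    return {key: (len(v["sources"]), v["packets"], v["bytes"])
            for key, v in stats.items()}
```

Each (hour, protocol) key then maps to the three volumes one would plot per hour.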

Two approaches

(1) Use fixed groups
(2) Automated grouping (clustering)


SLIDE 5

Approach 1: Pre-determined Groups

Nevil's work, Mar 2010 – Feb 2011. Define a taxonomy of 'interesting' source groups:

TCP: port probe, vertical & horizontal scans, other
UDP: port probe, vertical & horizontal scans, other
Backscatter: TCP ACK+SYN & TCP ACK+RST, ICMP TTL exceeded & destination unreachable
Others: Conficker C, ICMP only, . . .

Analysis methodology

build a table of sources, counting the number of TCP/UDP/ICMP/other packets and the ports used by TCP/UDP
at the end of the trace, use those counts to classify sources into the above groups, and write a summary file for the trace
the summary has counts & distributions of various packet metrics for each group, e.g. source lifetime, number of packets sent by source, . . .
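The table-and-classify step might look roughly like this sketch; the packet-record shape, group names, and scan heuristics (one port/many hosts vs many ports/one host) are simplified assumptions, not the exact taxonomy rules:

```python
from collections import defaultdict

def classify_sources(packets):
    """Build a per-source table of protocol counts, ports and targets,
    then classify each source.  Records are assumed to be
    (src, proto, dst_ip, dst_port) tuples."""
    table = defaultdict(lambda: {"tcp": 0, "udp": 0, "icmp": 0, "other": 0,
                                 "ports": set(), "dsts": set()})
    for src, proto, dst, port in packets:
        entry = table[src]
        entry[proto if proto in ("tcp", "udp", "icmp") else "other"] += 1
        if proto in ("tcp", "udp") and port is not None:
            entry["ports"].add(port)
        entry["dsts"].add(dst)
    groups = {}
    for src, e in table.items():
        if e["tcp"] + e["udp"] == 0:
            groups[src] = "icmp_only" if e["icmp"] else "other"
        elif len(e["ports"]) == 1 and len(e["dsts"]) > 1:
            groups[src] = "horizontal_scan"   # one port, many hosts
        elif len(e["ports"]) > 1 and len(e["dsts"]) == 1:
            groups[src] = "vertical_scan"     # many ports, one host
        else:
            groups[src] = "port_probe"
    return groups
```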


SLIDE 6

Approach 1, example plots (a) probe ports

[Plot: TCP probe sources (kS/h) by port number vs time (UTC), 03–10 Apr 2010]

[Plot: UDP probe sources (kS/h) by port number vs time (UTC), 03–10 Apr 2010]

Number of probe sources sending to top 100 destination ports each hour for the week 03-10 Apr 2010

Only two popular ports for TCP probe sources

UDP probe sources used a wide range of ephemeral high-numbered ports


SLIDE 7

Approach 1, example plots (b)

[Plot: thousands of Conficker P2P sources (kS) per hour, Jan – Apr 2010 (UTC)]

Conficker C (p2p) sources seen each hour, showing a steady decrease from early January

[Stacked-bar time series: source % by group, Jan – Apr 2010 (UTC); groups: Conficker P2P, Other UDP, TCP and UDP, TCP, Unclassified]

Percentage of sources in each group seen each hour: 1.8 MS/h around 30 Jan; growth in UDP sources from early Feb


SLIDE 8

Approach 2: Automated grouping of traffic sources

Classify into groups using a 'volume' metric (bytes/packets/flows)
Split the groups into smaller groups using a 'classifier' metric
Example analysis systems:

aguri: volume = byte/s

classifier = source address / prefix length; a simple system, no GUI (produces lists of the prefix hierarchy)

NetADHICT: volume = bytes

classifier = n-grams (p, n), where p = byte position in packet, n = value of the byte(s); automatically determines the n-gram used to split a group by finding some bytes common to 50% of the group; picks arbitrary n-grams
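The split-on-a-common-n-gram idea described above can be sketched as follows; this only illustrates splitting on a 1-gram (byte position, value) shared by roughly half the packets in a group, not the tool's actual algorithm:

```python
from collections import Counter

def best_split(packets, positions=range(8)):
    """Find the (position, value) pair whose packet count is closest to
    a 50/50 split of the group.  Only the first 8 byte positions are
    scanned in this sketch."""
    total = len(packets)
    best, best_score = None, 1.0
    for p in positions:
        counts = Counter(pkt[p] for pkt in packets if len(pkt) > p)
        for value, c in counts.items():
            score = abs(c / total - 0.5)    # 0.0 means an exact 50% split
            if score < best_score:
                best, best_score = (p, value), score
    return best

def split_group(packets, ngram):
    """Partition a group on the chosen (position, value) pair."""
    p, value = ngram
    inside = [pkt for pkt in packets if len(pkt) > p and pkt[p] == value]
    outside = [pkt for pkt in packets if not (len(pkt) > p and pkt[p] == value)]
    return inside, outside
```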


SLIDE 9

Clustering metrics

Volume metrics:

sources seen / s
packets seen / s
. . .

Classifier metrics:

source address / length (/ means 'split on')
source port / port number → p% in group
IP protocol (really only see TCP, UDP, ICMP)
average packet length (not useful for TCP)
packets/bytes (big ⇒ DOS attack, small ⇒ vulnerability probe)
packet inter-arrival distribution (Nevil's current project)


SLIDE 10

Comments on clustering

When we observe a group, we don’t know what application is generating its packets

to find that out we need to select out the packets for sources in the group and examine them, so as to determine their protocol and (perhaps) the generating application; that's hard to do automatically!

Groups found by automatic classifiers are not stable. If we use clustering techniques to make groups 0..n:

n will vary over time
a group with the same characteristics may change group numbers with each sample

Such variability makes automatic grouping difficult to use for long-term trend monitoring


SLIDE 11

Approach 2: Clustering, using kmeans

Nevil's work, 7–18 Mar 2011
Look at packet inter-arrival time (IAT) distributions for each source. Can we use IAT statistics to identify source applications?
Collect IAT distributions (180 bins) for every source in an hour. The hour ending 16:00 on 8 March had 1.5 M sources.

Find metrics we can use to represent an IAT distribution

use log-scale bins, 0.012 to 600 s two metrics: median and skewness
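A minimal sketch of these metrics, assuming 180 logarithmically spaced bins from 0.012 s to 600 s; the exact bin edges and the skewness definition used here are assumptions, not the talk's actual formulas:

```python
import math

N_BINS, LO, HI = 180, 0.012, 600.0          # log-scale bins, 0.012 to 600 s
LOG_LO, LOG_HI = math.log(LO), math.log(HI)

def log_bin(iat):
    """Map an inter-arrival time to its logarithmic bin index."""
    iat = min(max(iat, LO), HI)             # clamp into the binned range
    frac = (math.log(iat) - LOG_LO) / (LOG_HI - LOG_LO)
    return min(int(frac * N_BINS), N_BINS - 1)

def iat_metrics(iats):
    """Return (median bin, skewness proxy) for a source's IATs."""
    bins = [0] * N_BINS
    for iat in iats:
        bins[log_bin(iat)] += 1
    # median bin: first bin where the cumulative count passes 50%
    half, cum, median = sum(bins) / 2.0, 0, 0
    for i, count in enumerate(bins):
        cum += count
        if cum >= half:
            median = i
            break
    # crude skewness proxy: mass right of the median minus mass left of it
    left = sum(bins[:median])
    right = sum(bins[median + 1:])
    return median, 100.0 * (right - left) / max(sum(bins), 1)
```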

Tried clustering using Dan Pelleg's kmeans program (Dan is at the Auton Lab, CMU).

k-means clustering finds clusters in n-dimensional space, given that you know n. Dan has extended this idea so that the system determines how many clusters it can reliably find.
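Plain k-means (with k fixed) can be sketched in a few lines on the (median, skewness) points; this illustrative version is not Dan Pelleg's program, which additionally estimates the number of clusters:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm on 2-D points such as (median, skewness) pairs."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)         # random distinct starting centers
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                    # assign each point to nearest center
            nearest = min(range(k),
                          key=lambda i: (p[0] - centers[i][0]) ** 2 +
                                        (p[1] - centers[i][1]) ** 2)
            clusters[nearest].append(p)
        # recompute each center as its cluster's mean (keep old if empty)
        new_centers = [tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl
                       else centers[i] for i, cl in enumerate(clusters)]
        if new_centers == centers:          # converged
            break
        centers = new_centers
    return centers, clusters
```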

This idea simply did not work well for IBR IATs


SLIDE 12

IATs using pre-determined groups

Nevil’s work from 19 Mar 2011 (!) Simpler pragmatic approach:

make 'postage-stamp' sheets showing individual IAT distributions
find metrics for the distributions, print them on the sheets
look for recurring patterns, i.e. source groups; find metric ranges that could be used to determine each distribution's group
print new postage-stamp sheets, one for each group
iterate as more groups become apparent

IAT metrics

bin-zero %: > 95% → DOS source
mode IAT: 2.5..3.5 s → Windows XP's TCP retry
skewness: → left, right or evenly balanced
maximum IAT: high values → 'stealth' probe sources
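These metric ranges translate directly into a rule-based classifier; in the sketch below the skewness cutoff and the 'stealth' maximum-IAT threshold are assumed values for illustration, as are the XP_left/XP_even group names borrowed from the following slides:

```python
def iat_group(b0pc, mode_iat, skew, max_iat, stealth_max=300.0):
    """Assign a source to an IAT group from its four distribution metrics.
    stealth_max and the skewness cutoff are assumed thresholds."""
    if b0pc > 95.0:
        return "DOS"                  # almost all IATs fall in bin zero
    if 2.5 <= mode_iat <= 3.5:        # Windows XP's TCP retry timing
        if skew < -20.0:              # assumed cutoff for left-skewed
            return "XP_left"
        return "XP_even"
    if max_iat > stealth_max:
        return "stealth"              # very long gaps between probes
    return "other"
```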


SLIDE 13

Group 0: DOS sources

[Four postage-stamp IAT distribution plots (x: packet inter-arrival time, 0.01–300 s; y: %), SAN telescope, 16:00 Tue 8 Mar 2011 (UTC), with per-distribution metrics:
b0pc=99.85 mode=0.01 skew=-0.09 max=0.99
b0pc=33.33 mode=1.05 skew=0.00 max=1.05
b0pc=33.33 mode=500.49 skew=0.00 max=637.39
b0pc=40.00 mode=6.07 skew=40.00 max=179.11]


SLIDE 14

Group 1: XP_even sources

[Four postage-stamp IAT distribution plots (x: packet inter-arrival time, 0.01–300 s; y: %), SAN telescope, 16:00 Tue 8 Mar 2011 (UTC), with per-distribution metrics:
b0pc=0.16 mode=3.12 skew=-7.44 max=60.34
b0pc=0.62 mode=2.94 skew=3.50 max=68.09
b0pc=0.52 mode=3.12 skew=-6.64 max=76.84
b0pc=4.23 mode=3.12 skew=14.08 max=242.32]


SLIDE 15

Group 2: XP_left sources

[Four postage-stamp IAT distribution plots (x: packet inter-arrival time, 0.01–300 s; y: %), SAN telescope, 16:00 Tue 8 Mar 2011 (UTC), with per-distribution metrics:
b0pc=0.12 mode=3.32 skew=-39.12 max=110.44
b0pc=0.51 mode=3.12 skew=-47.18 max=21.59
b0pc=2.05 mode=3.12 skew=-79.76 max=22.94
b0pc=12.00 mode=2.94 skew=-44.00 max=9.84]


SLIDE 16

Group 3: Other sources

[Four postage-stamp IAT distribution plots (x: packet inter-arrival time, 0.01–300 s; y: %), SAN telescope, 16:00 Tue 8 Mar 2011 (UTC), with per-distribution metrics:
b0pc=0.64 mode=2.60 skew=29.87 max=32.97
b0pc=2.66 mode=0.51 skew=16.16 max=10.45
b0pc=0.11 mode=2.31 skew=46.22 max=29.21
b0pc=0.11 mode=3.12 skew=49.89 max=8.21]


SLIDE 17

Conclusion, Future Work

Work on understanding the IAT distributions
Look at their counts:

how many are there in each group over an hour?
how does the time of first packet within the hour influence the distribution?

How many IAT groups do we think are common? How do the IAT groups relate to the “Nevil’s taxonomy” groups?
