Mining Network Traffic Data
Ljiljana Trajković
ljilja@cs.sfu.ca
Communication Networks Laboratory
http://www.ensc.sfu.ca/cnl
School of Engineering Science
Simon Fraser University, Vancouver, British Columbia, Canada
July 19-20, 2007 IWCSN 2007, Guilin, China 2
Roadmap
Introduction
Traffic data and analysis tools:
  data collection, statistical analysis, clustering tools, prediction analysis
Case studies:
  satellite network: ChinaSat
  packet data networks: Internet
  public safety wireless network: E-Comm
Conclusions and references
Graduate students
M.Sc. and M.Eng. students at SFU:
- ChinaSat data analysis: Qing (Kenny) Shao, Savio Lau
- E-Comm data analysis: Duncan Sharp, Hao (Leo) Chen, Bozidar Vujičić, Nikola Cackov, Svetlana Vujičić, Nenad Lasković
- Internet data analysis: Hao (Johnson) Chen
Network traffic measurements
Traffic measurements in operational networks help:
  understand traffic characteristics in deployed networks
  develop traffic models
  evaluate performance of protocols and applications
Analysis of traffic:
  provides information about user behavior patterns
  enables network operators to understand the behavior of network users
Traffic prediction: important to assess future network capacity requirements and to plan future network developments
Self-similarity
Self-similarity implies a "fractal-like" behavior:
  data on various time scales have similar patterns
A wide-sense stationary process X(n) is called (exactly second order) self-similar if its autocorrelation function satisfies:
  $r^{(m)}(k) = r(k), \quad k \ge 0, \quad m = 1, 2, \ldots, n,$
  where m is the level of aggregation
Implications:
  no natural length of bursts
  bursts exist across many time scales
  traffic does not become "smoother" when aggregated (unlike Poisson traffic)
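For reference, the aggregated process behind the level of aggregation m is conventionally obtained by averaging the original series over non-overlapping blocks of size m. The slide does not spell this out, so the block below states the standard textbook definition rather than the author's own notation:

```latex
X^{(m)}(k) = \frac{1}{m} \sum_{i=(k-1)m+1}^{km} X(i), \qquad k = 1, 2, \ldots
```

Here $r^{(m)}(k)$ denotes the autocorrelation function of $X^{(m)}$, so the condition above says that averaging over blocks leaves the correlation structure unchanged.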
Self-similar processes
Properties:
  slowly decaying variance
  long-range dependence
  Hurst parameter (H)
Processes with only short-range dependence (Poisson): H = 0.5
Self-similar processes: 0.5 < H < 1.0
As the traffic volume increases, the traffic becomes more bursty, more self-similar, and the Hurst parameter increases
Long-range dependence: properties
High variability:
  when the sample size increases, the variance of the sample mean decays more slowly than the reciprocal of the sample size (the rate expected for uncorrelated samples)
Burstiness over a range of timescales:
  long runs of large values followed by long runs of small values, repeated in aperiodic patterns
Figure: fGn (fractional Gaussian noise) trace
Estimation of H
Various estimators:
- variance-time plots
- R/S plots
- periodograms
- wavelets
Their performance often depends on the characteristics of the data trace under analysis
Variance-time plot: $H = 1 + \text{slope}/2$
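As an illustration of the first estimator in the list, the following is a minimal sketch of a variance-time estimate, assuming the relation H = 1 + slope/2, where the slope is fitted to log(variance of the aggregated series) versus log(aggregation level m). The function name `variance_time_hurst`, the block sizes, and the synthetic input are illustrative choices, not taken from the slides.

```python
import math
import random

def variance_time_hurst(x, block_sizes):
    """Estimate H from a variance-time plot: H = 1 + slope / 2."""
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        if n_blocks < 2:
            continue  # not enough blocks to estimate a variance
        # Aggregate the series: mean of each non-overlapping block of size m.
        agg = [sum(x[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        mean = sum(agg) / n_blocks
        var = sum((a - mean) ** 2 for a in agg) / n_blocks
        log_m.append(math.log10(m))
        log_var.append(math.log10(var))
    # Least-squares slope of log10(variance) versus log10(m).
    mean_x = sum(log_m) / len(log_m)
    mean_y = sum(log_var) / len(log_var)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_m, log_var))
    sxx = sum((a - mean_x) ** 2 for a in log_m)
    return 1.0 + (sxy / sxx) / 2.0

random.seed(0)
# Independent Gaussian samples have only short-range dependence (H close to 0.5).
white_noise = [random.gauss(0.0, 1.0) for _ in range(50_000)]
h_est = variance_time_hurst(white_noise, [1, 2, 4, 8, 16, 32, 64, 128])
print(f"estimated H = {h_est:.2f}")
```

For a self-similar trace the fitted slope is shallower than -1, which pushes the estimate above 0.5.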
Clustering analysis
Clustering analysis groups or segments a collection of objects into subsets or clusters
Objects within a cluster are more similar to each other than objects in distinct clusters
An object can be described by a set of measurements or by its relations to other objects
Network users are classified into clusters, according to the similarity of their behavior patterns
Clustering analysis
- Groups a collection of objects into subsets (clusters):
  resulting intra-cluster similarity is high while inter-cluster similarity is low
- The inter-cluster distance reflects dissimilarity between clusters:
  Euclidean distance between two cluster centroids (mean value of objects in a cluster, viewed as the cluster's center of gravity)
- The intra-cluster distance expresses coherent similarity of data in the same cluster:
  average distance of objects from their cluster centroids
- Better clustering:
  large inter-cluster and small intra-cluster distances
Clustering quality
Overall clustering quality: defined as the difference between the minimum inter-cluster and maximum intra-cluster distances
  a larger indicator implies better overall clustering quality
Silhouette coefficient of a data point x:
  $SC(x) = \dfrac{b(x) - a(x)}{\max\{a(x), b(x)\}}$
  a(x) and b(x) are average distances between data point x and other data points in clusters A and B, respectively (A is the cluster containing x, B is the nearest other cluster)
  independent of the number of clusters K
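The silhouette coefficient can be sketched in a few lines. The example below assumes Euclidean distance, at least two clusters, and the usual convention that A is the cluster containing x while B is the other cluster with the smallest average distance to x. The function name `silhouette` and the four sample points are hypothetical.

```python
import math

def silhouette(points, labels, idx):
    """Silhouette coefficient of points[idx] using Euclidean distance."""
    own = labels[idx]
    # Collect distances from x to the members of every cluster (x excluded).
    dists = {}
    for j, (p, lab) in enumerate(zip(points, labels)):
        if j != idx:
            dists.setdefault(lab, []).append(math.dist(points[idx], p))
    if own not in dists:
        return 0.0  # common convention when x is alone in its cluster
    a = sum(dists[own]) / len(dists[own])  # a(x): average distance within cluster A
    # b(x): smallest average distance to any other cluster (cluster B)
    b = min(sum(d) / len(d) for lab, d in dists.items() if lab != own)
    return (b - a) / max(a, b)

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
sc = [silhouette(points, labels, i) for i in range(len(points))]
average_sc = sum(sc) / len(sc)
print(f"average SC = {average_sc:.3f}")
```

Values close to 1 indicate that a point sits well inside its own cluster; values near 0 or below suggest it lies between clusters or is misassigned.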
Clustering algorithms
Two approaches:
  partitioning clustering (k-means)
  hierarchical clustering
Clustering tools:
  AutoClass tool
  k-means algorithm
- P. Cheeseman and J. Stutz, “Bayesian classification (AutoClass): theory and results,” in Advances in Knowledge Discovery and Data Mining, U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Eds., AAAI Press/MIT Press, 1996.
- L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. New York: John Wiley & Sons, 1990.
Clustering algorithms: k-means
The k-means algorithm is commonly used for data clustering
The algorithm is well-known for its simplicity and efficiency
Based on the input parameter k, it partitions a set of n objects into k clusters so that the resulting intra-cluster similarity is high and the inter-cluster similarity is low
Similarity of clusters is measured with respect to the mean value of the objects in a cluster (viewed as the cluster's center of gravity)
k-means: partitioning clustering
Constructs k partitions of the data from n objects, where k ≤ n
Two constraints:
  each cluster must contain at least one object
  each object must belong to exactly one cluster
Finding the optimal cluster solution requires exhaustive enumeration of all possible combinations
k-means clustering
- Generates k clusters from n objects
- Requires two inputs:
  k: number of desired partitions
  n objects
- Uses random placement of initial clusters
- Determines clustering results through an iteration technique to relocate objects to the most similar cluster:
  similarity is defined as the distance between objects
  objects that are closer to each other are more similar
- Computational complexity of O(nkt), where t is the maximum number of iterations
k-means clustering: algorithm
1. Randomly select k objects to be the centers of k clusters
2. Assign each remaining object to the cluster to which it is the most similar
3. Re-calculate the cluster mean after all objects are (re)assigned
4. Re-evaluate all objects and place them in the cluster to which they are the most similar
5. Repeat Steps 3 and 4 until no changes have been made (full convergence) or the maximum number of iterations is reached (partial convergence)
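The five steps can be sketched as follows. This is a minimal illustration and not the tool used in the case studies. It assumes Euclidean distance as the similarity measure, and the function name `k_means`, the seed, and the sample points are hypothetical.

```python
import math
import random

def k_means(points, k, max_iter=100, seed=0):
    """Plain k-means on tuples of floats; returns (labels, centers)."""
    rng = random.Random(seed)
    # Step 1: randomly select k objects as the initial cluster centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Steps 2 and 4: place every object in the cluster with the nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # full convergence: no object changed cluster
        labels = new_labels
        # Step 3: re-calculate each cluster mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster became empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = k_means(points, k=2)
print(labels, centers)
```

Reaching `max_iter` without a stable assignment corresponds to the partial convergence case in Step 5.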
Finding number of clusters
The number of clusters k is not known a priori
The k-means algorithm is repeated for different k values
The number of clusters is found by comparing the average SC value for various values of k:
  the average SC is calculated for all objects
  the natural number of clusters k is found at the local maxima
SC: silhouette coefficient
Hierarchical clustering
- Objects are grouped into a tree of clusters (dendrogram)
- Two approaches are employed: agglomerative and divisive
- Agglomerative approach (bottom-up):
  each object begins in its own cluster
  successive steps merge objects close to each other until all objects belong to one cluster or a termination condition is reached
- Clusters are merged (or split) based on a distance measure
- Four distance measures are commonly employed: minimum, maximum, mean, and average
Hierarchical clustering: algorithm
1. For n objects, an n × n similarity matrix is generated. Each value records the distance between two objects (or the number of identical values if a series of values is used)
2. Objects are assigned to clusters from 1 to n
3. Each iteration merges the two clusters that are closest to each other (minimum similarity value)
4. Repeat Steps 2 and 3 until all objects are merged into a single cluster or until a termination condition is reached
5. Groups can be found by selecting k or selecting a maximum merge distance
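The agglomerative procedure can be sketched as below. The example assumes the minimum (single-linkage) distance measure with Euclidean distance and stops when k clusters remain. The function name `agglomerative` and the sample points are hypothetical, and the pairwise search is written for clarity rather than efficiency.

```python
import math

def agglomerative(points, k):
    """Single-linkage agglomerative clustering down to k clusters."""
    # Each object starts in its own cluster (clusters hold object indices).
    clusters = [[i] for i in range(len(points))]
    merges = []  # (merge distance, members of the merged cluster)
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Minimum distance between any two members of the two clusters.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        merges.append((d, merged))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return clusters, merges

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (20.0, 20.0)]
clusters, merges = agglomerative(points, k=2)
for distance, members in merges:
    print(f"merged {members} at distance {distance:.2f}")
```

The recorded merge distances are the link heights a dendrogram would display. Stopping at a maximum merge distance instead of a fixed k only changes the loop condition.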
Hierarchical clustering
Visualized by dendrograms
Determined by two choices:
  desired number of clusters k
  selected cutoff based on inconsistency coefficients:
    inconsistency coefficient is the difference between the height of a dendrogram link and the average height of links at the same level
    links connecting two distinct clusters have a higher inconsistency coefficient
Figure: Dendrogram example (two slides)
Traffic prediction: ARIMA model
Auto-Regressive Integrated Moving Average (ARIMA) model:
  general model for forecasting time series
  past values: AutoRegressive (AR) structure
  effect of past random fluctuations: Moving Average (MA) process
ARIMA model explicitly includes differencing
ARIMA (p, d, q):
  autoregressive parameter: p
  number of differencing passes: d
  moving average parameter: q
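For reference, the three parameters enter the model as shown below. This is the standard Box-Jenkins form, written with the backshift operator B (defined by $B X_t = X_{t-1}$) and a white-noise term $\varepsilon_t$. The symbols $\phi_i$ and $\theta_j$ are the usual textbook notation and do not appear in the slides.

```latex
\phi(B)\,(1 - B)^{d}\, X_t = \theta(B)\,\varepsilon_t,
\qquad
\phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^{i},
\qquad
\theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^{j}
```

With d = 1, for example, the model is fitted to the first differences $X_t - X_{t-1}$ rather than to the raw series.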
Traffic prediction: SARIMA model
Seasonal ARIMA (SARIMA) is a variation of the ARIMA model
SARIMA model:
  captures seasonal pattern
SARIMA additional model parameters:
  seasonal period parameter: S
  seasonal autoregressive parameter: P
  number of seasonal differencing passes: D
  seasonal moving average parameter: Q
$(p, d, q) \times (P, D, Q)_S$
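Written out, the seasonal model multiplies the non-seasonal operators by seasonal ones that act at lag S. The form below is the standard multiplicative SARIMA equation, with B the backshift operator ($B X_t = X_{t-1}$) and $\varepsilon_t$ white noise. The operator symbols are textbook notation and are not taken from the slides.

```latex
\Phi_P(B^{S})\,\phi_p(B)\,(1 - B)^{d}\,(1 - B^{S})^{D}\, X_t
  = \Theta_Q(B^{S})\,\theta_q(B)\,\varepsilon_t
```

Here $\phi_p$ and $\theta_q$ are polynomials of degree p and q in B, while $\Phi_P$ and $\Theta_Q$ are polynomials of degree P and Q in $B^{S}$. For hourly traffic with a daily cycle, a natural choice would be S = 24.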
SARIMA models: selection criteria
Order (p,d,q) selected based on:
  time series plot of traffic data
  autocorrelation and partial autocorrelation functions
Validity of parameter selection:
  Akaike's information criterion: AIC
  corrected Akaike's information criterion: AICc
  Bayesian information criterion: BIC
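The three criteria have the standard definitions below, where $\hat{L}$ is the maximized likelihood of the fitted model, k is the number of estimated parameters, and n is the number of observations. These symbols are the usual textbook ones, since the slides list only the names of the criteria.

```latex
\mathrm{AIC} = -2 \ln \hat{L} + 2k,
\qquad
\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1},
\qquad
\mathrm{BIC} = -2 \ln \hat{L} + k \ln n
```

In each case a smaller value indicates a preferable model, and the penalty terms discourage adding parameters that do not improve the fit.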
ChinaSat data: analysis
Analysis of network traffic:
  characteristics of TCP connections
  network traffic patterns
  statistical and cluster analysis of traffic
  anomaly detection:
    statistical methods
    wavelets
    principal component analysis
TCP: Transmission Control Protocol
Network and traffic data
ChinaSat: network architecture and TCP
Analysis of billing records:
  aggregated traffic
  user behavior
Analysis of tcpdump traces:
  general characteristics
  TCP options and operating system (OS) fingerprinting
  network anomalies
Figure: DirecPC system diagram
Characteristics of satellite links
Large coverage area
High bandwidth
Long propagation delay
Large bandwidth-delay product
High bit error rates:
  $10^{-6}$ without error correction
  $10^{-3}$ or $10^{-2}$ due to extreme weather and interference
Path asymmetry
Characteristics of satellite links
- ChinaSat hybrid satellite network
  Employs geosynchronous satellites deployed by Hughes Network Systems Inc.
  Provides data and television services:
    DirecPC (Classic): unidirectional satellite data service
    DirecTV: satellite television service
    DirecWay (HughesNet): new bi-directional satellite data service that replaces DirecPC
  DirecPC transmission rates:
    400 kb/s from satellite to user
    33.6 kb/s from user to network operations center (NOC) using dial-up
  Improves performance using TCP splitting with spoofing
ChinaSat data: analysis
ChinaSat traffic is self-similar and non-stationary
Hurst parameter differs depending on traffic load
Modeling of TCP connections:
  inter-arrival time is best modeled by the Weibull distribution
  number of downloaded bytes is best modeled by the lognormal distribution
The distribution of visited websites is best modeled by the discrete Gaussian exponential (DGX) distribution
ChinaSat data: analysis
Traffic prediction:
  autoregressive integrated moving average (ARIMA) was successfully used to predict uploaded traffic (but not downloaded traffic)
  wavelet + autoregressive model outperforms the ARIMA model
- Q. Shao and Lj. Trajković, “Measurement and analysis of traffic in a hybrid satellite-terrestrial network,” Proc. SPECTS 2004, San Jose, CA, July 2004, pp. 329–336.
Analysis of collected data
Analysis of patterns and statistical properties of two sets of data from the ChinaSat DirecPC network:
  billing records
  tcpdump traces
Billing records:
  daily and weekly traffic patterns
  user classification:
    single and multi-variable k-means clustering
    time series clustering using hierarchical clustering and empirical approach
Analysis of collected data
Analysis of tcpdump traces:
  protocols and applications
  TCP options
  operating system fingerprinting
  network anomalies
C program pcapread processes tcpdump files without using the packet capture library libpcap
Network anomalies
Scans and worms:
  packets are sent to probe network hosts
  used to discover and exploit resources
Denial of service:
  large number of packets is directed to a single destination
  makes a host incapable of handling incoming connections or exhausts available bandwidth along paths to the destination
Network anomalies
Flash crowd:
  high volume of traffic is destined to a single destination
  caused by breaking news, availability of new software
Traffic shift:
  redirection of traffic from one set of paths to another
  caused by route changes, link unavailability, or network congestion
Network anomalies
- Alpha traffic:
  unusually high volume of traffic between two endpoints
  caused by file transfers or bandwidth measurements
- Traffic volume anomalies:
  significant deviation of traffic volume from usual daily or weekly patterns
  classified as:
    outages: caused by unavailable links, crashed servers, or routing problems
    short term increases in demand: caused by short term events such as holiday traffic
  involve multiple sources and destinations
Billing records
Records were collected during the continuous period from 23:00 on Oct. 31, 2002 to 11:00 on Jan. 10, 2003
Each file contains the hourly traffic summary for each user
Fields of interest:
  SiteID (user identification)
  Start (record start time)
  CTxByt (number of bytes downloaded by a user)
  CRxByt (number of bytes uploaded by a user)
  CTxPkt (number of packets downloaded by a user)
  CRxPkt (number of packets uploaded by a user)
Billing records: characteristics
186 unique SiteIDs
Daily and weekly cycles:
  lower traffic volume on weekends
  daily cycle starts at 7 AM, rises to three daily maxima at 11 AM, 3 PM, and 7 PM, then decreases monotonically until 7 AM
Highest daily traffic recorded on Dec. 24, 2002
Outage occurred on Jan. 3, 2003
Figure: Aggregated hourly traffic
Figure: Aggregated daily traffic
Figure: Daily diurnal traffic: average downloaded bytes
Figure: Weekly traffic: average downloaded bytes
Ranking of user traffic
Users are ranked according to the traffic volume
The top user downloaded 78.8 GB, uploaded 11.9 GB, and downloaded/uploaded ~205 million packets
Most users downloaded/uploaded little traffic
Cumulative distribution functions (CDFs) are constructed from the ranks:
  top user accounts for 11% of downloaded bytes
  top 25 users contributed 93.3% of downloaded bytes
  top 37 users contributed 99% of total traffic (packets and bytes)
Figure: Cumulative distribution functions
k-means: clustering results
Natural number of clusters is k=3 for downloaded and uploaded bytes
Most users belong to the group with small traffic volume
For k=3:
  159 users in group 1 (average 0.0–16.8 MB downloaded per hour)
  24 users in group 2 (average 16.8–70.6 MB downloaded per hour)
  3 users in group 3 (average 70.6–110.7 MB downloaded per hour)
Three most common traffic patterns
Idle users:
  rarely download/upload traffic
  represented by zero traffic
Active users:
  download/upload traffic for more than 18 hours a day
  represented by traffic over 24 hours each day
Semi-active users:
  download/upload traffic for 8–12 hours a day
  represented by a cycle of 10 hours ACTIVE/14 hours IDLE each day
Clustering results using three most common traffic patterns
Traffic pattern         Number of users
Idle                    162
Active                  16
Semi-active             8
Total number of users   186
tcpdump traces
- Traces were continuously collected from 11:30 on Dec. 14, 2002 to 11:00 on Jan. 10, 2003 at the NOC
- The first 68 bytes of each TCP/IP packet were captured
- ~63 GB of data contained in 127 files
- User IP address is not constant due to the use of the private IP address range and dynamic IP
- Majority of traffic is TCP:
  94% of total bytes and 84% of total packets
  WWW (port 80) accounts for 90% of TCP connections and 76% of TCP bytes
  FTP (port 21) accounts for 0.2% of TCP connections and 11% of TCP bytes
OS fingerprinting results
Analyzed 9 hours of tcpdump trace on Dec. 14, 2002 using the open-source tool p0f.v2
Assumed constant IP addresses
Detected 171 users:
  137 users did not initiate any connections and cannot be identified (no SYN packets)
  14 users employ Microsoft Windows
  2 users employ Linux
  1 user employs an unknown OS (identified as an MSS-modifying proxy)
OS: operating system
Network anomalies
Tools: Ethereal/Wireshark, tcptrace, and pcapread
Four types of network anomalies were detected:
  invalid TCP flag combinations
  large number of TCP resets
  UDP and TCP port scans
  traffic volume anomalies
Analysis of TCP flags
TCP flag                                                       Packet count   % of Total
SYN only                                                       19,050,849     48.500
RST only                                                       7,440,418      18.900
FIN only                                                       12,679,619     32.300
*SYN+FIN                                                       408            0.001
*RST+FIN (no PSH)                                              85,571         0.200
*RST+PSH (no FIN)                                              18,111         0.050
*RST+FIN+PSH                                                   8,329          0.020
*Total number of packets with invalid TCP flag combinations    112,419        0.300
Total packet count                                             39,283,305     100.000
Large number of TCP resets
Connections are terminated by either TCP FIN or TCP RST:
  12,679,619 connections were terminated by FIN (63%)
  7,440,418 connections were terminated by RST (37%)
Large number of TCP RST indicates that connections are terminated under error conditions
TCP RST is employed by Microsoft Internet Explorer to terminate connections instead of TCP FIN
TCP: Transmission Control Protocol
UDP and TCP port scans
- UDP port scans are found on UDP port 137 (NETBEUI)
- TCP port scans are found on these TCP ports:
  80 Hypertext transfer protocol (HTTP)
  139 NETBIOS extended user interface (NETBEUI)
  443 HTTP over secure socket layer (HTTPS)
  1433 Microsoft structured query language (MS SQL)
  27374 Subseven trojan
- No HTTP(S) servers were active in the ChinaSat network
- An MS SQL vulnerability was discovered in Oct. 2002, which may be the cause of scans on TCP port 1433
- The Subseven trojan is a backdoor program used with malicious intent
TCP: Transmission Control Protocol
UDP: User Datagram Protocol
UDP port scans originating from the ChinaSat network
192.168.2.30:137 - 195.x.x.98:1025
192.168.2.30:137 - 202.x.x.153:1027
192.168.2.30:137 - 210.x.x.23:1035
192.168.2.30:137 - 195.x.x.42:1026
192.168.2.30:137 - 202.y.y.226:1026
192.168.2.30:137 - 218.x.x.238:1025
192.168.2.30:137 - 202.y.y.226:1025
192.168.2.30:137 - 202.y.y.226:1027
192.168.2.30:137 - 202.y.y.226:1028
192.168.2.30:137 - 202.y.y.226:1029
192.168.2.30:137 - 202.y.y.242:1026
192.168.2.30:137 - 61.x.x.5:1028
192.168.2.30:137 - 219.x.x.226:1025
192.168.2.30:137 - 213.x.x.189:1028
192.168.2.30:137 - 61.x.x.193:1025
192.168.2.30:137 - 202.y.y.207:1028
192.168.2.30:137 - 202.y.y.207:1025
192.168.2.30:137 - 202.y.y.207:1026
192.168.2.30:137 - 202.y.y.207:1027
192.168.2.30:137 - 64.x.x.148:1027
Client (192.168.2.30) source port (137) scans external network addresses at destination ports (1025-1040):
  > 100 are recorded within a three-hour period
  targeted IP addresses are variable
  multiple ports are scanned per IP
  may correspond to Bugbear, OpaSoft, or other worms
UDP port scans directed to the ChinaSat network
210.x.x.23:1035 - 192.168.1.121:137
210.x.x.23:1035 - 192.168.1.63:137
210.x.x.23:1035 - 192.168.2.11:137
210.x.x.23:1035 - 192.168.1.250:137
210.x.x.23:1035 - 192.168.1.25:137
210.x.x.23:1035 - 192.168.2.79:137
210.x.x.23:1035 - 192.168.1.52:137
210.x.x.23:1035 - 192.168.6.191:137
210.x.x.23:1035 - 192.168.1.241:137
210.x.x.23:1035 - 192.168.2.91:137
210.x.x.23:1035 - 192.168.1.5:137
210.x.x.23:1035 - 192.168.1.210:137
210.x.x.23:1035 - 192.168.6.127:137
210.x.x.23:1035 - 192.168.1.201:137
210.x.x.23:1035 - 192.168.6.179:137
210.x.x.23:1035 - 192.168.2.82:137
210.x.x.23:1035 - 192.168.1.239:137
210.x.x.23:1035 - 192.168.1.87:137
210.x.x.23:1035 - 192.168.1.90:137
210.x.x.23:1035 - 192.168.1.177:137
210.x.x.23:1035 - 192.168.1.39:137
External address (210.x.x.23) scans for port (137) (NETBEUI) response within the ChinaSat network from source port (1035):
  > 200 are recorded within a three-hour period
  targeted IP addresses are not sequential
  may correspond to Bugbear, OpaSoft, or other worms
Detection of traffic volume anomalies using wavelets
Traffic is decomposed into various frequencies using the wavelet transform
Traffic volume anomalies are identified by the large variation in wavelet coefficient values
The coarsest scale level where the anomalies are found indicates the time scale of an anomaly
Detection of traffic volume anomalies using wavelets
tcpdump traces are binned in terms of packets or bytes (each second)
Wavelet transform of 12 levels is employed to decompose the traffic
The coarsest level approximately represents the hourly traffic
Anomalies are:
  detected with a moving window of size 20 and by calculating the mean and standard deviation (σ) of the wavelet coefficients in each window
  identified when wavelet coefficients lie outside the ± 3σ of the mean value
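The detection rule can be sketched as follows. The slides do not name the wavelet or say how the window is positioned, so this example makes two assumptions: it uses the Haar wavelet, and it compares each coefficient with the mean and σ of the 20 coefficients preceding it. The function names, the 3-level decomposition, and the synthetic trace are illustrative only.

```python
import math
import statistics

def haar_details(signal, levels):
    """Return Haar detail coefficients per level (index 0 = finest scale)."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = len(approx) // 2
        if pairs == 0:
            break
        details.append([(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2)
                        for i in range(pairs)])
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2)
                  for i in range(pairs)]
    return details

def flag_anomalies(coeffs, window=20, n_sigma=3.0):
    """Flag coefficients outside mean +/- n_sigma * sigma of the preceding window."""
    flagged = []
    for i in range(window, len(coeffs)):
        ref = coeffs[i - window:i]
        mean = statistics.fmean(ref)
        sigma = statistics.pstdev(ref)
        if abs(coeffs[i] - mean) > n_sigma * sigma:
            flagged.append(i)
    return flagged

# Synthetic per-second byte counts: small periodic variation plus one burst.
traffic = [100.0 + (i % 5) for i in range(256)]
traffic[200] += 500.0
details = haar_details(traffic, levels=3)
anomalies = flag_anomalies(details[0])
print(anomalies)
```

Each finest-scale coefficient covers two samples, so a flagged index i corresponds to samples 2i and 2i + 1 of the binned trace. Coarser levels cover proportionally longer intervals.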
Figure: Wavelet approximate coefficients
Figure: Wavelet detail coefficients: d9
Figure: Wavelet detail coefficients: d8
Autonomous System (AS)
Internet is a network of Autonomous Systems:
  groups of networks sharing the same routing policy
  identified with Autonomous System Numbers (ASN)
Autonomous System Numbers: http://www.iana.org/assignments/as-numbers
Internet topology on AS-level:
  the arrangement of ASs and their interconnections
Border Gateway Protocol (BGP):
  inter-AS protocol used to exchange network reachability information among BGP systems
  reachability information is stored in routing tables
Internet AS-level data
Sources of data are routing tables:
  Route Views: http://www.routeviews.org
    most participating ASs reside in North America
  RIPE (Réseaux IP Européens): http://www.ripe.net/ris
    most participating ASs reside in Europe
Internet AS-level data
Data used in prior research (partial list):
Study               Route Views   RIPE
Faloutsos, 1999     Yes           No
Chang, 2001         Yes           Yes
Vukadinovic, 2001   Yes           No
Mihail, 2003        Yes           Yes
Research results have been used in developing Internet simulation tools:
  power-laws are employed to model and generate Internet topologies: BA model, BRITE, Inet2
Data sets
Emerging concerns about the use of the two datasets:
  different observations about AS degrees:
    power-law distribution: Route Views [Faloutsos, 1999]
    Weibull distribution: Route Views + RIPE [Chang, 2001]
  data completeness:
    RIPE dataset contains ~ 40% more AS connections and 2% more ASs than Route Views [Chang, 2001]
Route Views and RIPE: statistics
Number of    Route Views   RIPE
AS paths     6,398,912     6,375,028
Probed ASs   15,418        15,433
AS pairs     34,878        35,225
AS pair: a pair of connected ASs
15,369 probed ASs (99.7%) in both datasets are identical
29,477 AS pairs in Route Views (85%) and in RIPE (84%) are identical
Route Views and RIPE samples collected on May 30, 2003
Core ASs
ASs with largest degrees
16 of the core ASs in Route Views and RIPE are identical
Core ASs in Route Views have larger degrees than core ASs in RIPE
        Route Views       RIPE
Rank    AS      Degree    AS      Degree
1       701     2595      701     2448
2       1239    2569      1239    1784
3       7018    1999      7018    1638
4       3561    1036      209     861
5       1       999       3561    705
6       209     863       3356    673
7       3356    662       3549    612
8       3549    617       702     580
9       702     562       2914    561
10      2914    556       1       489
11      6461    498       4589    482
12      4513    468       6461    476
13      4323    315       8220    450
14      16631   294       3303    429
15      6347    291       13237   412
16      8220    289       6730    313
17      3257    277       4323    305
18      4766    263       3257    305
19      3786    263       16631   296
20      7132    258       6347    281
Spectral analysis of graphs
Normalized Laplacian matrix N(G) [Chung, 1997]:
$$N(i, j) = \begin{cases} 1 & \text{if } i = j \text{ and } d_i \neq 0 \\ -\dfrac{1}{\sqrt{d_i d_j}} & \text{if } i \text{ and } j \text{ are adjacent} \\ 0 & \text{otherwise} \end{cases}$$
  $d_i$ and $d_j$ are degrees of nodes i and j, respectively
The second smallest eigenvalue [Fiedler, 1973]
The largest eigenvalue [Chung, 1997]
Characteristic valuation [Fiedler, 1975]
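To make the definition of N(G) concrete, the sketch below builds the matrix for a small undirected graph and approximates its largest eigenvalue by power iteration. It assumes a simple graph with no self-loops or repeated edges and uses only the standard library. The function names and the 4-node star graph are hypothetical and unrelated to the Route Views or RIPE data.

```python
import math

def normalized_laplacian(n, edges):
    """Build N(G) for a simple undirected graph with nodes 0..n-1."""
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    N = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if degree[i] != 0:
            N[i][i] = 1.0  # diagonal entry for nodes with non-zero degree
    for i, j in edges:
        # adjacent nodes: -1 / sqrt(d_i * d_j)
        N[i][j] = N[j][i] = -1.0 / math.sqrt(degree[i] * degree[j])
    return N, degree

def largest_eigenvalue(M, iterations=200):
    """Power iteration; suitable because N(G) is symmetric positive semidefinite."""
    size = len(M)
    v = [1.0 + i for i in range(size)]  # arbitrary non-zero start vector
    lam = 0.0
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient of the normalized vector
        lam = sum(v[i] * sum(M[i][j] * v[j] for j in range(size))
                  for i in range(size))
    return lam

# Hypothetical 4-node star: node 0 is connected to nodes 1, 2, and 3.
N, degree = normalized_laplacian(4, [(0, 1), (0, 2), (0, 3)])
lam_max = largest_eigenvalue(N)
print(f"largest eigenvalue = {lam_max:.6f}")
```

The eigenvalues of N(G) lie between 0 and 2, and a bipartite graph such as this star reaches the upper bound. A full analysis would use a numerical eigensolver to obtain the eigenvectors as well.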
Characteristic valuation: example
Eigenvector of the second smallest eigenvalue: 0.1, 0.3, -0.2, 0
AS1(0.1), AS2(0.3), AS3(-0.2), AS4(0)
Sort ASs by element value: AS3, AS4, AS1, AS2
AS3 and AS1 are connected
Figure: connectivity status versus index of elements
Spectral analysis of topology data
- Consider only ASs with the first 30,000 assigned AS numbers
- AS degree distribution in Route Views and RIPE datasets:
Figure: before the sort: (a) RouteViews_original, (b) RIPE_original; after the sort: (c) RouteViews_min, (d) RIPE_min
Figure: before the sort: (a) RouteViews_original, (b) RIPE_original; after the sort: (c) RouteViews_max, (d) RIPE_max
Data analysis results
Eigenvector of the second smallest eigenvalue:
  separates connected ASs from disconnected ASs
  Route Views and RIPE datasets are similar on a coarser scale
Eigenvector of the largest eigenvalue:
  reveals highly connected clusters
  Route Views and RIPE datasets differ on a finer scale
Observations
The two datasets are similar on coarse scales:
  number of ASs, number of AS connections, core ASs
They exhibit different clustering characteristics:
  Route Views data contain larger AS clusters
  core ASs in Route Views have larger degrees than core ASs in RIPE
  core ASs in Route Views connect a larger number of smaller ASs
Case study: E-Comm network
E-Comm network: an operational trunked radio system serving as a regional emergency communication system
The E-Comm network is capable of both voice and data transmissions
Voice traffic accounts for over 99% of network traffic
A group call is a standard call made in a trunked radio system
More than 85% of calls are group calls
A distributed event log database records every event occurring in the network: call establishment, channel assignment, call drop, and emergency call
E-Comm network: coverage and user agencies
Figure: coverage and user agencies (RCMP and Police, Ambulance, Fire, Other)
Figure: agencies (Agency 1 (Police), Agency 2 (Fire Dept.), ...), talk groups (TG 1 to TG n), and radio devices (R1 to R8, ...)
TG: Talk group
R: Radio device (user)
E-Comm network architecture
Figure: network architecture with labeled elements: Burnaby, Vancouver, other EDACS systems, PSTN, PBX, dispatch console, users, database server, data gateway, management console, transmitters/repeaters, network switch
Traffic data
2001 data set:
  2 days of traffic data
  2001-11-01 to 2001-11-02 (110,348 calls)
2002 data set:
  28 days of continuous traffic data
  2002-02-10 to 2002-03-09 (1,916,943 calls)
2003 data set:
  92 days of continuous traffic data
  2003-03-01 to 2003-05-31 (8,756,930 calls)
Observations
Presence of daily cycles:
  minimum utilization: ~ 2 PM
  maximum utilization: 9 PM to 3 AM
2002 sample data:
  cell 5 is the busiest
  others seldom reach their capacities
2003 sample data:
  several cells (2, 4, 7, and 9) have all channels occupied during busy hours
Network utilization
OPNET based simulation of two weeks of network activity
Network utilization exhibits daily cycles
Between February 2002 and March 2003:
  number of calls increased by ~ 60 %
  average utilization increased non-uniformly across the network
Several cells may become congested in future
- N. Cackov, B. Vujičić, S. Vujičić, and Lj. Trajković, “Using network activity data to model the utilization of a trunked radio system,” in Proc. SPECTS 2004, San Jose, CA, July 2004, pp. 517–524.
- N. Cackov, J. Song, B. Vujičić, S. Vujičić, and Lj. Trajković, “Simulation of a public safety wireless networks: a case study,” Simulation, vol. 81, no. 8, pp. 571–585, Aug. 2005.
Performance analysis
Modeling and Performance Analysis of Public Safety Wireless Networks
WarnSim: a simulator for public safety wireless networks (PSWN)
Traffic data analysis
Traffic modeling
Simulation and prediction
- J. Song and Lj. Trajković, “Modeling and performance analysis of public safety wireless networks,” in Proc. IEEE IPCCC, Phoenix, AZ, Apr. 2005, pp. 567–572.
WarnSim overview
Simulators such as OPNET, ns-2, and JSim are
designed for packet-switched networks
WarnSim is a simulator developed for circuit-switched networks, such as PSWN
WarnSim:
publicly available simulator
http://www.vannet.ca/warnsim
effective, flexible, and easy to use
developed using Microsoft Visual C# .NET
operates on Windows platforms
Call arrival rate in 2002 and 2003: cyclic patterns
the busiest hour is around midnight
the busiest day is Thursday
useful for scheduling periodic maintenance tasks
[Figure: number of calls vs. time (hours) over a 24-hour day, 2002 and 2003 data]
[Figure: number of calls (×10^4) vs. time (days), Saturday to Friday, 2002 and 2003 data]
Modeling and characterization of traffic
We analyzed voice traffic from a public safety wireless
network in Vancouver, BC
call inter-arrival and call holding times during five busy hours from each year (2001, 2002, 2003)
Statistical distribution and the autocorrelation function of the traffic traces:
Kolmogorov-Smirnov goodness-of-fit test
autocorrelation functions
wavelet-based estimation of the Hurst parameter
- B. Vujičić, N. Cackov, S. Vujičić, and Lj. Trajković, “Modeling and characterization of traffic in public safety wireless networks,” in Proc. SPECTS 2005, Philadelphia, PA, July 2005, pp. 214–223.
Erlang traffic models
$P_B$: probability of rejecting a call
$P_C$: probability of delaying a call
$N$: number of channels/lines
$A$: total traffic volume

Erlang B:
$$P_B = \frac{\dfrac{A^N}{N!}}{\sum_{x=0}^{N} \dfrac{A^x}{x!}}$$

Erlang C:
$$P_C = \frac{\dfrac{A^N}{N!}\,\dfrac{N}{N-A}}{\sum_{x=0}^{N-1} \dfrac{A^x}{x!} + \dfrac{A^N}{N!}\,\dfrac{N}{N-A}}$$
Erlang models
Erlang B model assumes:
call holding time follows exponential distribution
blocked call will be rejected immediately
Erlang C model assumes:
call holding time follows exponential distribution
blocked call will be put into a FIFO queue with infinite size
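A minimal Python sketch of how the Erlang B and Erlang C probabilities might be evaluated. The function names `erlang_b` and `erlang_c` are hypothetical (not from the talk). Erlang B is computed with the standard recurrence, which is numerically safer than evaluating factorials directly, and Erlang C is derived from the Erlang B value.

```python
def erlang_b(traffic, channels):
    """Blocking probability for offered traffic A (Erlangs) and N channels."""
    # Recurrence: B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1)).
    b = 1.0
    for n in range(1, channels + 1):
        b = traffic * b / (n + traffic * b)
    return b


def erlang_c(traffic, channels):
    """Probability that a call is delayed (queue is stable only if A < N)."""
    if traffic >= channels:
        return 1.0  # offered load exceeds capacity: every call waits
    b = erlang_b(traffic, channels)
    # Standard relation between the Erlang C and Erlang B probabilities.
    return channels * b / (channels - traffic * (1.0 - b))


# Hypothetical example: A = 2 Erlangs offered to N = 2 channels.
p_block = erlang_b(2.0, 2)
p_delay = erlang_c(1.0, 2)
```

With these illustrative inputs, `erlang_b(2.0, 2)` is about 0.4 and `erlang_c(1.0, 2)` is about 1/3.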
Kolmogorov-Smirnov test
Goodness-of-fit test: quantitative decision whether the empirical cumulative distribution function (ECDF) of a set of observations is consistent with a random sample from an assumed theoretical distribution
ECDF is a step function (step size 1/N) of N ordered data points $Y_1, Y_2, \ldots, Y_N$:
$$E_N = \frac{n(i)}{N}$$
$n(i)$: the number of data samples with values smaller than $Y_i$
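As an illustration of the test statistic, the sketch below computes the largest distance between the ECDF of a sample and a candidate CDF. It uses only the Python standard library; the helper names and the sample values are hypothetical, and the decision step (comparing the statistic with a critical value or p-value) is left out.

```python
import math


def ks_statistic(samples, cdf):
    """Largest distance between the ECDF of samples and a theoretical cdf."""
    ordered = sorted(samples)
    n = len(ordered)
    d = 0.0
    for i, y in enumerate(ordered):
        f = cdf(y)
        # The ECDF jumps from i/n to (i+1)/n at y: check both sides of the step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def exponential_cdf(rate):
    """CDF of an exponential distribution with the given rate (1/s)."""
    return lambda t: 1.0 - math.exp(-rate * t)


# Hypothetical inter-arrival times in seconds.
inter_arrivals = [0.2, 0.5, 0.9, 1.4, 2.3]
rate = len(inter_arrivals) / sum(inter_arrivals)  # MLE of the exponential rate
d = ks_statistic(inter_arrivals, exponential_cdf(rate))
```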
Traffic data
Records of network events:
established, queued, and dropped calls in the
Vancouver cell
Traffic data span periods during 2001, 2002, and 2003:
Trace (dataset)    Time span             No. of established calls
2001               November 1–2, 2001    110,348
2002               March 1–7, 2002       370,510
2003               March 24–30, 2003     387,340
Hourly traces
Call holding and call inter-arrival times from the five
busiest hours in each dataset (2001, 2002, and 2003)
2001                               2002                               2003
Day/hour                  No.      Day/hour                  No.      Day/hour                  No.
02.11.2001 15:00–16:00    3,718    01.03.2002 04:00–05:00    4,436    26.03.2003 22:00–23:00    4,919
01.11.2001 00:00–01:00    3,707    01.03.2002 22:00–23:00    4,314    25.03.2003 23:00–24:00    4,249
02.11.2001 16:00–17:00    3,492    01.03.2002 23:00–24:00    4,179    26.03.2003 23:00–24:00    4,222
01.11.2001 19:00–20:00    3,312    01.03.2002 00:00–01:00    3,971    29.03.2003 02:00–03:00    4,150
02.11.2001 20:00–21:00    3,227    02.03.2002 00:00–01:00    3,939    29.03.2003 01:00–02:00    4,097
Example: March 26, 2003
[Figure: call holding times (s) vs. time (hh:mm:ss), 22:18:00 to 22:19:00, with the call inter-arrival time marked]
Statistical distributions
Fourteen candidate distributions:
exponential, Weibull, gamma, normal, lognormal, logistic, log-logistic, Nakagami, Rayleigh, Rician, t-location scale, Birnbaum-Saunders, extreme value, inverse Gaussian
Parameters of the distributions: calculated by
performing maximum likelihood estimation
Best fitting distributions are determined by:
visual inspection of the distribution of the trace
and the candidate distributions
K-S test on potential candidates
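For two of the candidate distributions the maximum likelihood estimates have closed forms, which the following sketch illustrates. The function names and sample values are hypothetical; the other candidates (for example Weibull and gamma) generally require numerical optimization and are not covered here.

```python
import math


def fit_exponential(samples):
    """MLE of the exponential mean (in the units of the samples)."""
    return sum(samples) / len(samples)


def fit_lognormal(samples):
    """MLE of (mu, sigma): mean and standard deviation of ln(X)."""
    logs = [math.log(x) for x in samples]
    mu = sum(logs) / len(logs)
    # The MLE uses the 1/n (not 1/(n-1)) variance estimate.
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma


# Hypothetical call holding times in seconds.
holding_times = [1.8, 2.5, 3.1, 4.0, 5.6, 9.2]
mean_holding = fit_exponential(holding_times)
mu, sigma = fit_lognormal(holding_times)
```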
Call inter-arrival times: pdf candidates
[Figure: probability density vs. call inter-arrival time (s): traffic data with exponential, lognormal, Weibull, gamma, Rayleigh, and normal models]
Call inter-arrival times: K-S test results (2003 data)
Distribution   Parameter   26.03.2003    25.03.2003    26.03.2003    29.03.2003    29.03.2003
                           22:00–23:00   23:00–24:00   23:00–24:00   02:00–03:00   01:00–02:00
Exponential    h           1             1             0             1             1
               p           0.0027        0.0469        0.4049        0.0316        0.1101
               k           0.0283        0.0214        0.0137        0.0205        0.0185
Weibull        h           0             0             0             0             0
               p           0.4885        0.4662        0.2065        0.286         0.2337
               k           0.0130        0.0133        0.0164        0.014         0.0159
Gamma          h           0             0             0             0             0
               p           0.3956        0.3458        0.127         0.145         0.1672
               k           0.0139        0.0146        0.0181        0.0163        0.0171
Lognormal      h           1             1             1             1             1
               p           1.015E-20     4.717E-15     2.97E-16      3.267E-23     4.851E-21
               k           0.0689        0.0629        0.0657        0.0795        0.0761
Call inter-arrival times: best-fitting distributions (cdf)
[Figure: cumulative distribution vs. call inter-arrival time (s): traffic data with exponential, Weibull, and gamma models]
Call inter-arrival times: estimates of H
Traces pass the test for time constancy of α:
estimates of H are reliable
2001                               2002                               2003
Day/hour                  H        Day/hour                  H        Day/hour                  H
02.11.2001 15:00–16:00    0.907    01.03.2002 04:00–05:00    0.679    26.03.2003 22:00–23:00    0.788
01.11.2001 00:00–01:00    0.802    01.03.2002 22:00–23:00    0.757    25.03.2003 23:00–24:00    0.832
02.11.2001 16:00–17:00    0.770    01.03.2002 23:00–24:00    0.780    26.03.2003 23:00–24:00    0.699
01.11.2001 19:00–20:00    0.774    01.03.2002 00:00–01:00    0.741    29.03.2003 02:00–03:00    0.696
02.11.2001 20:00–21:00    0.663    02.03.2002 00:00–01:00    0.747    29.03.2003 01:00–02:00    0.705
Call holding times: pdf candidates
[Figure: probability density vs. call holding time (s): traffic data with lognormal, gamma, Weibull, exponential, normal, and Rayleigh models]
Call holding times: best-fitting distributions (cdf)
[Figure: cumulative distribution vs. call holding time (s): traffic data with lognormal, exponential, gamma, and Weibull models]
Call holding times: K-S test results (2003 data)
No distribution passes the test when the entire trace is tested (significance levels = 0.1 and 0.01)
Lognormal distribution passes the test (significance level = 0.01) for:
5-6 of 15 randomly chosen 1,000-sample sub-traces
almost all 500-sample sub-traces
Test rejects the null hypothesis when the sub-traces are compared with candidate distributions:
exponential
Weibull
gamma
Call holding times: estimates of H
2001                               2002                               2003
Day/hour                  H        Day/hour                  H        Day/hour                  H
02.11.2001 15:00–16:00    0.493    01.03.2002 04:00–05:00    0.490    26.03.2003 22:00–23:00    0.483
01.11.2001 00:00–01:00    0.471    01.03.2002 22:00–23:00    0.460    25.03.2003 23:00–24:00    0.483
02.11.2001 16:00–17:00    0.462    01.03.2002 23:00–24:00    0.489    26.03.2003 23:00–24:00    0.463 *
01.11.2001 19:00–20:00    0.467    01.03.2002 00:00–01:00    0.508    29.03.2003 02:00–03:00    0.526
02.11.2001 20:00–21:00    0.479    02.03.2002 00:00–01:00    0.503    29.03.2003 01:00–02:00    0.466
All traces except one pass the test for constancy of α
only one unreliable estimate (*): consistent value
Call inter-arrival and call holding times
                 2001                                2002                                2003
                 Day/hour                  Avg. (s)  Day/hour                  Avg. (s)  Day/hour                  Avg. (s)
inter-arrival    02.11.2001 15:00–16:00    0.97      01.03.2002 04:00–05:00    0.81      26.03.2003 22:00–23:00    0.73
holding                                    3.78                                4.07                                4.08
inter-arrival    01.11.2001 00:00–01:00    0.97      01.03.2002 22:00–23:00    0.83      25.03.2003 23:00–24:00    0.85
holding                                    3.95                                3.84                                4.12
inter-arrival    02.11.2001 16:00–17:00    1.03      01.03.2002 23:00–24:00    0.86      26.03.2003 23:00–24:00    0.85
holding                                    3.99                                3.88                                4.04
inter-arrival    01.11.2001 19:00–20:00    1.09      01.03.2002 00:00–01:00    0.91      29.03.2003 02:00–03:00    0.87
holding                                    3.97                                3.95                                4.14
inter-arrival    02.11.2001 20:00–21:00    1.12      02.03.2002 00:00–01:00    0.91      29.03.2003 01:00–02:00    0.88
holding                                    3.84                                4.06                                4.25
- Avg. call inter-arrival times: 1.08 s (2001), 0.86 s (2002), 0.84 s (2003)
- Avg. call holding times: 3.91 s (2001), 3.96 s (2002), 4.13 s (2003)
Busy hour: best fitting distributions
Busy hour                 Call inter-arrival times                  Call holding times
Distribution              Weibull             Gamma                 Lognormal
                          a         b         a         b           µ         σ
02.11.2001 15:00–16:00    0.9785    1.1075    1.0326    0.9407      1.0913    0.6910
01.11.2001 00:00–01:00    0.9907    1.0517    1.0818    0.8977      1.0801    0.7535
02.11.2001 16:00–17:00    1.0651    1.0826    1.1189    0.9238      1.1432    0.6803
01.03.2002 04:00–05:00    0.8313    1.0603    1.1096    0.7319      1.1746    0.6671
01.03.2002 22:00–23:00    0.8532    1.0542    1.0931    0.7643      1.1157    0.6565
01.03.2002 23:00–24:00    0.8877    1.0790    1.1308    0.7623      1.1096    0.6803
26.03.2003 22:00–23:00    0.7475    1.0475    1.0910    0.6724      1.1838    0.6553
25.03.2003 23:00–24:00    0.8622    1.0376    1.0762    0.7891      1.1737    0.6715
26.03.2003 23:00–24:00    0.8579    1.0092    1.0299    0.8292      1.1704    0.6696
Traffic prediction
E-Comm network and traffic data:
data preprocessing and extraction
Data clustering
Traffic prediction:
based on aggregate traffic
cluster based
- H. Chen and Lj. Trajković, “Trunked radio systems: traffic prediction based on
user clusters,” in Proc. IEEE ISWCS 2004, Mauritius, Sept. 2004, pp. 76–80.
- B. Vujičić, L. Chen, and Lj. Trajković, “Prediction of traffic in a public safety
network,” in Proc. ISCAS 2006, Kos, Greece, May 2006, pp. 2637–2640.
Traffic data: preprocessing
Collected data contain continuous data records from
92 days: March 1st 2003 – May 31st 2003
Original database: ~6 GBytes, with 44,786,489
record rows:
contains event log tables recording network
activities
aggregated from distributed database of individual
network management systems
sorted in 92 event log tables, each containing one
day’s events
9 (out of original 26) fields are of interest for our
analysis
Traffic data: preprocessing
Data pre-processing:
cleaning the database
filtering the outliers
removing redundant records
extracting accurate user calling activity
After the data cleaning and extraction, number of
records was reduced to only 19% of original records
Traffic data: sample
Date        Time      Ms   Duration  Sys_id  Chl_id  Caller  Callee  C_type  C_state  Multi
2003-03-20  00:00:01  450  3730      8       4       6155    1801
2003-03-20  00:00:01  469  3730      6       7       6155    1801
2003-03-20  00:00:01  560  3730      3       7       6155    1801
2003-03-20  00:00:01  570  3730      2       7       6155    1801
2003-03-20  00:00:01  640  3730      1       7       6155    1801
2003-03-20  00:00:01  880  5260      9       6       13314   251
2003-03-20  00:00:01  910  5260      7       6       13314   251
2003-03-20  00:00:01  970  5260      6       8       13314   251
2003-03-20  00:00:01  980  2520      7       7       13911   418
2003-03-20  00:00:02  29   5270      4       2       13314   251
2003-03-20  00:00:02  109  5260      2       8       13314   251
2003-03-20  00:00:02  139  5270      1       8       13314   251
2003-03-20  00:00:02  9    2510      6       1       13911   418
2003-03-20  00:00:02  149  2510      2       9       13911   418
2003-03-20  00:00:05  289  3560      8       5       6011    2035
2003-03-20  00:00:05  309  3550      6       3       6011    2035
2003-03-20  00:00:05  389  3560      3       2       6011    2035
2003-03-20  00:00:05  449  3550      2       2       6011    2035
2003-03-20  00:00:05  480  3550      1       9       6011    2035
2003-03-20  00:00:05  550  3440      1       12      7614    945
2003-03-20  00:00:05  550  3440      2       3       7614    945
2003-03-20  00:00:05  949  9780      6       4       15840   418
2003-03-20  00:00:05  959  9780      7       2       15840   418
2003-03-20  00:00:06  679  3040      2       6       13931   471
2003-03-20  00:00:06  709  3040      1       2       13931   471
2003-03-20  00:00:06  130  9780      2       4       15840   418
2003-03-20  00:00:08  109  6640      9       2       13420   251
2003-03-20  00:00:08  179  6630      7       3       13420   251
2003-03-20  00:00:08  200  6640      6       5       13420   251
2003-03-20  00:00:08  270  6630      4       5       13420   251
2003-03-20  00:00:08  329  6640      1       4       13420   251
2003-03-20  00:00:08  340  6640      2       7       13420   251
Traffic data: sample
Traffic data after cleaning and extraction:
Date        Time      Ms   Duration  Caller  Callee  C_type  C_state  Multi  # Sys  System List
2003-03-20  00:00:01  450  3730      6155    1801                            5      8,6,3,2,1
2003-03-20  00:00:01  980  2520      13911   418                             3      7,6,2
2003-03-20  00:00:01  880  5260      13314   251                             6      9,7,6,4,2,1
2003-03-20  00:00:05  550  3440      7614    945                             2      1,2
2003-03-20  00:00:05  289  3560      6011    2035                            5      8,6,3,2,1
2003-03-20  00:00:05  949  9780      15840   418                             3      6,7,2
2003-03-20  00:00:06  810  2350      8022    817                             1      1
2003-03-20  00:00:06  819  1590      13902   497                             4      10,9,8,4
2003-03-20  00:00:06  440  3030      13931   471                             5      10,9,4,2,1
2003-03-20  00:00:08  109  6640      13420   251                             6      9,7,6,4,1,2
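The step from one record per system to one record per call could be sketched as below. The grouping rule (same caller and callee, start time within one second of the first record of the call), the tuple layout, and the use of milliseconds since midnight are assumptions made for illustration; the talk does not state the rule that was actually applied.

```python
def combine_records(records, window_ms=1000):
    """Merge per-system records of one call into a single call record.

    records: iterable of (time_ms, duration_ms, sys_id, caller, callee),
    assumed to be sorted by time_ms.
    """
    calls = []       # combined calls, in order of first appearance
    latest = {}      # (caller, callee) -> index of the most recent call
    for time_ms, duration, sys_id, caller, callee in records:
        key = (caller, callee)
        idx = latest.get(key)
        if idx is not None and time_ms - calls[idx]["time_ms"] <= window_ms:
            calls[idx]["systems"].append(sys_id)   # same call, another system
        else:
            latest[key] = len(calls)               # start a new call record
            calls.append({"time_ms": time_ms, "duration": duration,
                          "caller": caller, "callee": callee,
                          "systems": [sys_id]})
    return calls


# First rows of the raw sample, with time converted to ms since midnight.
raw = [
    (1450, 3730, 8, 6155, 1801),
    (1469, 3730, 6, 6155, 1801),
    (1560, 3730, 3, 6155, 1801),
    (1570, 3730, 2, 6155, 1801),
    (1640, 3730, 1, 6155, 1801),
    (1880, 5260, 9, 13314, 251),
]
combined = combine_records(raw)
```

Under these assumptions the five records of caller 6155 collapse into one call that spans systems 8, 6, 3, 2, and 1.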
Data preparation
Date               Original      Cleaned       Combined
2003/03/01         466,862       204,357       91,143
2003/03/02         415,715       184,973       88,014
2003/03/03         406,072       182,311       76,310
2003/03/04         464,534       207,016       84,350
2003/03/05         585,561       264,226       97,714
2003/03/06         605,987       271,514       104,715
2003/03/07         546,230       247,902       94,511
2003/03/08         513,459       233,982       90,310
2003/03/09         442,662       201,146       79,815
2003/03/10         419,570       186,201       76,197
2003/03/11         504,981       225,604       88,857
2003/03/12         516,306       233,140       94,779
2003/03/13         561,253       255,840       95,662
2003/03/14         550,732       248,828       99,458
Total (92 days)    44,786,489    20,130,718    8,663,586
Share of original                44.95%        19.34%
User clustering with K-means: k = 3
First cluster (heavy network users):
17 talk groups, contributing to 59% of the calls, with
an average number of calls ranging from 94 to 208 per hour
They are dispatch groups that assign and schedule other talk groups for specific tasks
Second cluster (average network users):
31 talk groups, contributing to 26% of the calls
Third cluster (least frequent network users):
569 talk groups, contributing to only 15% of the
calls
They represent over 90% of all talk groups
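A bare-bones version of K-means (Lloyd's algorithm) on talk-group footprints might look as follows. The footprint vectors are invented for illustration and the initialization (the first k points) is deliberately naive; practical runs usually repeat the algorithm from several random starts. This is a sketch, not the tool used in the study.

```python
def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm on equal-length numeric vectors."""
    centroids = [list(p) for p in points[:k]]  # naive start: first k points
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = []
        for p in points:
            dists = [sum((x - c) ** 2 for x, c in zip(p, cen)) for cen in centroids]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical footprints: average calls per hour in two time windows.
footprints = [[100, 120], [20, 30], [1, 2], [110, 130], [25, 28], [0, 1]]
labels, centroids = kmeans(footprints, k=3)
```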
User clusters with K-means: k = 3
User clusters with K-means: k = 6
Clustering results
Larger values of the silhouette coefficient indicate better clustering:
values between 0.7 and 1.0 imply clustering with excellent separation between clusters
Cluster sizes:
17, 31, and 569 for K = 3
17, 33, 4, and 563 for K = 4
13, 17, 22, 3, 34, and 528 for K = 6
K = 3 produces the best clustering results (based on overall clustering quality and silhouette coefficient)
Interpretations of the three clusters have been confirmed by the E-Comm domain experts
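For reference, the mean silhouette coefficient can be computed directly from its definition, as in this sketch. The function name and the toy data are hypothetical, Euclidean distance is assumed, and at least two clusters are required.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient (needs at least two clusters)."""
    def dist(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        # Mean distance from point i to the other members of each cluster.
        mean_dist = {}
        for c in clusters:
            others = [q for j, q in enumerate(points) if labels[j] == c and j != i]
            if others:
                mean_dist[c] = sum(dist(p, q) for q in others) / len(others)
        own = labels[i]
        if own not in mean_dist:
            scores.append(0.0)  # singleton cluster: silhouette taken as 0
            continue
        a = mean_dist[own]                                      # cohesion
        b = min(v for c, v in mean_dist.items() if c != own)    # separation
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical one-dimensional footprints forming two tight groups.
score = silhouette([[0], [1], [10], [11]], [0, 0, 1, 1])
```

For this toy input the score is close to 0.9, which falls in the range the slide associates with excellent separation.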
K-means clusters of talk groups: k = 3
Cluster size   Minimum number of calls   Maximum number of calls   Average number of calls   Total number of calls   Total number of calls (%)
17             0-6                       352-700                   94-208                    5,091,695               59
31             0-3                       135-641                   17-66                     2,261,055               26
569            0                         1-1613                    0-16                      1,310,836               15
Traffic prediction
Traffic prediction: important to assess future network
capacity requirements and to plan future network developments
A network traffic trace consists of a series of observations in a dynamical system environment
Traditional prediction: considers aggregate traffic and
assumes a constant number of network users
Approach that focuses on individual users has high
computational cost for networks with thousands of users
Employing clustering techniques for predicting
aggregate network traffic bridges the gap between the two approaches
Prediction based on aggregate traffic
- The aggregate network traffic consists of all network
users' traffic
- The R system was used to identify, estimate, and verify
the SARIMA model for the aggregate users' traffic
- Both 24-hour (one day) and 168-hour (one week) intervals
were selected as seasonal period parameters
- Based on m past traffic data samples, we forecast the
future n traffic data samples
- The prediction quality was measured using the normalized mean square error nmse:
$$nmse(a, b) = \frac{1}{\sigma_a^2\, n} \sum_{i=m+1}^{m+n} (a_i - b_i)^2$$
- where $a_i$ is the observed and $b_i$ is the predicted data, and $\sigma_a^2$ is the variance of the observed data
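A short sketch of an nmse computation, assuming the common definition in which the mean squared prediction error is divided by the variance of the observed data over the forecast window. Under that assumption, a forecast equal to the mean of the observed data gives nmse = 1. The function name and sample values are illustrative.

```python
def nmse(observed, predicted):
    """Normalized mean square error of a forecast (variance-normalized)."""
    n = len(observed)
    mean = sum(observed) / n
    variance = sum((a - mean) ** 2 for a in observed) / n
    squared_error = sum((a - b) ** 2 for a, b in zip(observed, predicted))
    return squared_error / (n * variance)


# Hypothetical hourly call counts and two forecasts.
observed = [100.0, 200.0, 300.0]
perfect = nmse(observed, [100.0, 200.0, 300.0])
mean_only = nmse(observed, [200.0, 200.0, 200.0])
```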
SARIMA models: selection criteria
Order (0,1,1) is used for seasonal part (P,D,Q):
cyclical seasonal pattern is usually random-walk
may be modeled as MA process after one-time differencing
Model’s goodness-of-fit is validated using null hypothesis test:
time plot analysis and autocorrelation of model residual
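The seasonal order (0,1,1) has D = 1, which corresponds to one seasonal differencing of the series before the moving-average term is fitted. The sketch below shows only that differencing step for a period s (24 or 168 hours); it is not a SARIMA fit, and the function name and series are hypothetical.

```python
def seasonal_difference(series, period):
    """Return y[t] - y[t - period] for t >= period (one seasonal difference)."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Hypothetical hourly series with a perfectly repeating daily pattern.
hourly_calls = [hour % 24 for hour in range(72)]
differenced = seasonal_difference(hourly_calls, period=24)
```

A perfectly periodic series differences to zeros; what remains in real traffic after this step is the part that the non-seasonal and MA terms have to model.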
Prediction quality
Models (2,0,9)×(0,1,1)24 and (2,0,1)×(0,1,1)168 have
smallest criterion values based on 1,680 training data
Normalized mean square error (nmse) is used to
measure prediction quality by comparing deviation between predicted and observed data
The nmse of forecast is equal to ratio of normalized
sum of variance of forecast to squared bias of forecast
Smaller values of nmse indicate better prediction
model
SARIMA models: summary of selection criteria
(p,d,q) x (P,D,Q)s       m       nmse     AIC        AICc       BIC
(2,0,9) x (0,1,1)24      1680    0.379    22744.6    22744.9    22826.8
(2,0,1) x (0,1,1)168     1680    0.174    23129.8    23129.8    23161.9
(1,0,1) x (0,1,1)168     1680    0.175    23145.1    23145.1    23170.8
(2,0,9) x (1,1,1)24      1680    0.525    25292.1    25292.4    25382.1
(2,0,1) x (0,1,1)24      1680    0.408    25360.5    25360.6    25392.6
(3,0,1) x (0,1,1)24      1680    0.404    25361.2    25361.2    25399.7
(1,0,2) x (1,1,1)24      1680    0.411    25332.6    25332.6    25371.2
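The three selection criteria in the table have standard definitions in terms of the maximized log-likelihood, the number of estimated parameters k, and the number of samples n. The sketch below uses those textbook formulas; how the original analysis counted k and n (for example after differencing) is not stated in the slides, so the inputs here are purely illustrative.

```python
import math


def information_criteria(log_likelihood, num_params, num_samples):
    """Return (AIC, AICc, BIC) from the textbook definitions."""
    k, n = num_params, num_samples
    aic = -2.0 * log_likelihood + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)   # small-sample correction
    bic = -2.0 * log_likelihood + k * math.log(n)
    return aic, aicc, bic


# Hypothetical fit: log-likelihood -100 with 2 parameters on 100 samples.
aic, aicc, bic = information_criteria(-100.0, 2, 100)
```

Smaller values are preferred for all three criteria; BIC penalizes additional parameters more heavily than AIC once n exceeds about 8.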
Prediction: based on the aggregate traffic
Models forecast future n traffic data based on m past traffic data samples
No.   p   d   q   P   D   Q   S     m      n     nmse
A1    2   0   9   0   1   1   24    1512   672   0.3790
A2    2   0   1   0   1   1   24    1512   672   0.3803
A3    2   0   9   0   1   1   168   1512   672   0.1742
A4    2   0   1   0   1   1   168   1512   672   0.1732
B1    2   0   9   0   1   1   24    1680   168   0.3790
B2    2   0   1   0   1   1   24    1680   168   0.4079
B3    2   0   9   0   1   1   168   1680   168   0.1736
B4    2   0   1   0   1   1   168   1680   168   0.1745
C1    2   0   9   0   1   1   24    2016   168   0.3384
C2    2   0   1   0   1   1   24    2016   168   0.3433
C3    2   0   9   0   1   1   168   2016   168   0.1282
C4    2   0   1   0   1   1   168   2016   168   0.1178
Prediction: based on the aggregate traffic
Two groups of models, with 24-hour and 168-hour seasonal periods:
SARIMA (2, 0, 9) x (0, 1, 1)24 and 168
SARIMA (2, 0, 1) x (0, 1, 1)24 and 168
Comparisons:
rows A1 with A2, B1 with B2, and C1 with C2
SARIMA (2, 0, 9) × (0, 1, 1)24 gives better prediction results than SARIMA (2, 0, 1) × (0, 1, 1)24
Models with a 168-hour seasonal period provided better prediction than the four 24-hour period based models, particularly when predicting long term traffic data
Prediction of 168 hours of traffic based on 1,680 past hours: sample
Comparison of the 24-hour and the 168-hour models
- Solid line: observation
- o: prediction of 168-hour seasonal model
- *: prediction of 24-hour seasonal model
Prediction with user clustering
Raw network log data collected over 92 days:
March 1st 2003 – May 31st 2003
Footprint of network usage for talk groups: the hourly
number of calls
AutoClass and the K-means algorithm were used to
classify network talk groups into clusters
The behavior of each user cluster was predicted using
Seasonal Autoregressive Integrated Moving Average (SARIMA)
We used aggregation to predict the overall network
behavior
Traffic prediction based on user clusters
- The developed aggregate users based prediction assumes
that the adopted model is static: the number of network users and their behavior pattern are constant in time
- This assumption does not hold when planning further
network expansions and cannot be used to forecast network traffic
- We employed a user clusters based prediction approach to
predict the network traffic by accumulating the prediction results from user clusters
- In large networks with many users, it is impractical to
predict individual users' traffic and then aggregate the predicted data
- With user clusters, traffic prediction is reduced to
predicting and aggregating users' traffic from few clusters
Prediction of 168 hours of traffic based on 1,680 past hours
Comparisons: model (1,0,1)x(0,1,1)168 * observation * prediction without clustering
- prediction with clustering
Traffic prediction with user clusters: examples (2,0,1) x (0,1,1)
Cluster   S     m       n     nmse
1         168   1,920   24    0.2241
2         168   1,920   24    0.3818
3         168   1,920   24    0.1163
*         168   1,920   24    0.0969
A         168   1,920   24    0.1175

1         24    1,920   24    0.2508
2         24    1,920   24    0.2697
3         24    1,920   24    0.3020
*         24    1,920   24    0.1941
A         24    1,920   24    0.2052

1         24    1,680   168   0.5477
2         24    1,680   168   0.6883
3         24    1,680   168   0.2852
*         24    1,680   168   0.4079
A         24    1,680   168   0.4093
Prediction results with user clusters
For each group, rows 1, 2, and 3: traffic prediction
results for user clusters 1, 2, and 3
Row *: the aggregate user traffic prediction obtained
without clustering the users
Row A: the aggregate prediction of network traffic
based on the three user clusters
The performance of the clusters based prediction
(nmse: 0.1175) is comparable to the best prediction based on aggregate traffic (nmse: 0.0969)
Prediction of traffic in networks with a variable
number of users is possible, as long as the new user groups could be classified into the existing user clusters
Prediction based on user clusters model (2, 0, 1)×(0, 1, 1)
Test no.   S     m      n     nmse cluster 1   nmse cluster 2   nmse cluster 3   nmse aggregate   nmse cluster   nmse optimized
1          24    240    24    0.323            0.548            0.308            0.254            0.241          n/a
2          24    240    48    0.394            0.712            0.445            0.343            0.332          n/a
3          24    1200   72    1.774            1.976            0.270            0.884            0.886          0.846
4          24    1200   96    1.319            0.866            0.260            0.611            0.613          0.610
5          24    1200   120   0.840            0.703            0.245            0.463            0.467          n/a
6          24    1200   144   0.665            0.647            0.236            0.396            0.399          n/a
7          168   1008   336   0.616            0.466            0.190            0.285            0.260          n/a
8          168   1008   504   0.439            0.446            0.190            0.237            0.224          n/a
9          168   1176   24    3.401            0.747            0.168            0.365            0.507          0.436
10         168   1512   504   0.348            0.375            0.155            0.180            0.178          n/a
11         168   1680   24    0.367            0.444            0.115            0.132            0.129          n/a
12         168   1680   48    0.380            0.467            0.095            0.114            0.116          n/a
Traffic prediction with user clusters
nmse > 1.0 for cluster 1 (tests 3, 4, and 9) and for
cluster 2 (test 3) implies that prediction is worse than prediction based on the mean value of past data
Mean value prediction leads to better prediction
results shown in column “nmse optimized” (optimized cluster-based prediction) for:
Test 3: clusters 1 and 2
Test 4: cluster 1
Prediction based on clusters performs better than the
prediction based on aggregate traffic:
Tests 1, 2, 7, 8, 10, and 11
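The cluster-based aggregation with a mean-value fallback could be sketched as follows. Per-cluster forecasts are summed, and a cluster forecast whose nmse exceeds 1.0 is replaced by the mean of that cluster's past data. The nmse here assumes variance normalization over the forecast window, and all names and numbers are hypothetical. Because the fallback decision uses the observed data, this reproduces an after-the-fact evaluation rather than a live forecasting rule.

```python
def nmse(observed, predicted):
    """Variance-normalized mean square error (assumed definition)."""
    n = len(observed)
    mean = sum(observed) / n
    variance = sum((a - mean) ** 2 for a in observed) / n
    return sum((a - b) ** 2 for a, b in zip(observed, predicted)) / (n * variance)


def combine_cluster_forecasts(history, observed, forecasts):
    """Sum per-cluster forecasts, using the past mean where nmse > 1.

    history, observed, forecasts: dicts keyed by cluster id, values are lists.
    """
    horizon = len(next(iter(observed.values())))
    total = [0.0] * horizon
    for cluster, forecast in forecasts.items():
        if nmse(observed[cluster], forecast) > 1.0:
            past_mean = sum(history[cluster]) / len(history[cluster])
            forecast = [past_mean] * horizon   # mean-value prediction
        total = [t + f for t, f in zip(total, forecast)]
    return total


# Hypothetical two-cluster example with a two-hour horizon.
history = {1: [10.0, 10.0], 2: [5.0, 5.0]}
observed = {1: [9.0, 11.0], 2: [4.0, 6.0]}
forecasts = {1: [20.0, 0.0], 2: [4.0, 6.0]}
aggregate = combine_cluster_forecasts(history, observed, forecasts)
```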
Traffic prediction with user clusters
57% of cluster-based predictions perform better than
aggregate-traffic-based prediction with SARIMA model (2,0,1)×(0,1,1)168
Prediction of traffic in networks with a variable
number of users is possible, as long as the new user groups could be classified into the existing user clusters
Conclusions
- We used simulation tools and analytical methods to analyze
traffic data from three deployed networks: ChinaSat, the Internet, and E-Comm
- Network: network performance was evaluated
- Traffic characterization and modeling: models of inter-
arrival and call holding times were developed
- Users: clustering algorithms were employed to classify
network users into user clusters
- Traffic prediction: SARIMA models were used to predict
network traffic based on aggregate user traffic and based
on three user clusters
- Network anomalies: wavelet analysis was employed to detect
traffic anomalies
References: downloads
http://www.ensc.sfu.ca/~ljilja/publications_date.html
- S. Lau and Lj. Trajković, “Analysis of traffic data from a hybrid satellite-terrestrial network,” in
Proc. QShine 2007, Vancouver, BC, Canada, Aug. 2007, to appear.
- B. Vujičić, L. Chen, and Lj. Trajković, “Prediction of traffic in a public safety network,” in Proc. ISCAS
2006, Kos, Greece, May 2006, pp. 2637–2640.
- N. Cackov, J. Song, B. Vujičić, S. Vujičić, and Lj. Trajković, “Simulation of a public safety wireless
networks: a case study,” Simulation, vol. 81, no. 8, pp. 571–585, Aug. 2005.
- B. Vujičić, N. Cackov, S. Vujičić, and Lj. Trajković, “Modeling and characterization of traffic in public
safety wireless networks,” in Proc. SPECTS 2005, Philadelphia, PA, July 2005, pp. 214–223.
- J. Song and Lj. Trajković, “Modeling and performance analysis of public safety wireless networks,” in
Proc. IEEE IPCCC, Phoenix, AZ, Apr. 2005, pp. 567–572.
- H. Chen and Lj. Trajković, “Trunked radio systems: traffic prediction based on user clusters,” in Proc.
IEEE ISWCS 2004, Mauritius, Sept. 2004, pp. 76–80.
- D. Sharp, N. Cackov, N. Lasković, Q. Shao, and Lj. Trajković, “Analysis of public safety traffic on
trunked land mobile radio systems,” IEEE J. Select. Areas Commun., vol. 22, no. 7, pp. 1197–1205,
Sept. 2004.
- Q. Shao and Lj. Trajković, “Measurement and analysis of traffic in a hybrid satellite-terrestrial
network,” in Proc. SPECTS 2004, San Jose, CA, July 2004, pp. 329–336.
- N. Cackov, B. Vujičić, S. Vujičić, and Lj. Trajković, “Using network activity data to model the
utilization of a trunked radio system,” in Proc. SPECTS 2004, San Jose, CA, July 2004, pp. 517–524.
- J. Chen and Lj. Trajković, “Analysis of Internet topology data,” Proc. IEEE Int. Symp. Circuits and
Systems, Vancouver, British Columbia, Canada, May 2004, vol. IV, pp. 629-632.
References: self-similarity
- A. Feldmann, “Characteristics of TCP connection arrivals,” in Self-similar
Network Traffic and Performance Evaluation, K. Park and W. Willinger, Eds., New York: Wiley, 2000, pp. 367–399.
- T. Karagiannis, M. Faloutsos, and R. H. Riedi, “Long-range dependence: now
you see it, now you don't!,” in Proc. GLOBECOM '02, Taipei, Taiwan, Nov. 2002, pp. 2165–2169.
- W. Leland, M. Taqqu, W. Willinger, and D. Wilson, “On the self-similar
nature of ethernet traffic (extended version),” IEEE/ACM Transactions on Networking, vol. 2, no. 1, pp. 1–15, Feb. 1994.
- M. S. Taqqu and V. Teverovsky, “On estimating the intensity of long-range
dependence in finite and infinite variance time series,” in A Practical Guide to Heavy Tails: Statistical Techniques and Applications. Boston, MA: Birkhauser, 1998, pp. 177–217.
References: self-similarity
- P. Abry and D. Veitch, “Wavelet analysis of long-range dependence traffic,”
IEEE Transactions on Information Theory, vol. 44, no. 1, pp. 2–15, Jan. 1998.
- P. Abry, P. Flandrin, M. S. Taqqu, and D. Veitch, “Wavelets for the analysis,
estimation, and synthesis of scaling data,” in Self-similar Network Traffic and Performance Evaluation, K. Park and W. Willinger, Eds. New York: Wiley, 2000, pp. 39–88.
- P. Barford, A. Bestavros, A. Bradley, and M. Crovella, “Changes in Web client
access patterns: characteristics and caching implications in world wide web,” World Wide Web, Special Issue on Characterization and Performance Evaluation, vol. 2, pp. 15–28, 1999.
- Z. Bi, C. Faloutsos, and F. Korn, “The ‘DGX’ distribution for mining massive,
skewed data,” in Proc. of ACM SIGCOMM Internet Measurement Workshop, San Francisco, CA, Aug. 2001, pp. 17–26.
- M. E. Crovella and A. Bestavros, “Self-similarity in world wide web traffic:
evidence and possible causes,” IEEE/ACM Transactions on Networking, vol. 5, no. 6, pp. 835–846, Dec. 1997.
References: time series
- G. Box and G. Jenkins, Time Series Analysis: Forecasting and Control, 2nd
edition. San Francisco, CA: Holden-Day, 1976, pp. 208–329.
- P. J. Brockwell and R. A. Davis, Introduction to Time Series and Forecasting,
2nd Edition. New York: Springer-Verlag, 2002.
- N. H. Chan, Time Series: Applications to Finance. New York: Wiley-
Interscience, 2002.
- K. Burnham and D. Anderson, Model Selection and Multimodel Inference,
2nd ed. New York, NY: Springer-Verlag, 2002.
- G. Schwarz, “Estimating the dimension of a model,” Annals of Statistics,
vol. 6, no. 2, pp. 461–464, Mar. 1978.
References: clustering analysis
- P. Cheeseman and J. Stutz, “Bayesian classification (AutoClass): theory and
results,” in Advances in Knowledge Discovery and Data Mining, U. M. Fayyad,
G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Eds., AAAI Press/MIT
Press, 1996.
- J. W. Han and M. Kamber, Data Mining: Concepts And Techniques. San
Francisco: Morgan Kaufmann, 2001.
- T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical
Learning: Data Mining, Inference, and Prediction. New York: Springer, 2001.
- L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to
Cluster Analysis. New York: John Wiley & Sons, 1990.
References: data mining
- J. Han and M. Kamber, Data Mining: concept and techniques. San Diego, CA:Academic
Press, 2001.
- W. Wu, H. Xiong, and S. Shekhar, Clustering and Information Retrieval. Norwell,MA:
Kluwer Academic Publishers, 2004.
- Z. Chen, Data Mining and Uncertainty Reasoning: an integrated approach. New York,
NY: John Wiley & Sons, 2001.
- T. Kanungo, D. M. Mount, N. Netanyahu, C. Piatko, R. Silverman, and A. Y. Wu, “An
efficient k-means clustering algorithm: analysis and implementation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 7, pp. 881–892, July. 2002.
- P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining. Reading,MA:
Addison-Wesley, 2006, pp. 487–568.
- L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: an introduction to cluster
analysis. New York, NY: John Wiley & Sons, 1990.
- M. Last, A. Kandel, and H. Bunke, Eds., Data Mining in Time Series Databases.
Singapore: World Scientific Publishing Co. Pte. Ltd., 2004.
- W.-K. Ching and M. K.-P. Ng, Eds., Advances in Data Mining and Modeling. Singapore:
World Scientific Publishing Co. Pte. Ltd., 2003.
References: protocols
- D. E. Comer, Internetworking with TCP/IP, Vol 1: Principles, Protocols, and
Architecture, 4th ed. Upper Saddle River, NJ: Prentice-Hall, 2000.
- W. R. Stevens, TCP/IP Illustrated (vol. 1): The Protocols. Reading, MA: Addison-Wesley,
1994.
- J. Postel, Ed., “Transmission Control Protocol,” RFC 793, Sep. 1981.
- J. Postel, “TCP and IP bake off,” RFC 1025, Sep. 1987.
- J. Mogul and S. Deering, “Path MTU discovery,” RFC 1191, Nov. 1990.
- V. Jacobson, R. Braden, and D. Borman, “TCP extensions for high performance,” RFC
1323, May 1992.
- M. Allman, S. Floyd, and C. Partridge, “Increasing TCP’s initial window,” RFC 2414, Sep.
1998.
- M. Mathis, J. Mahdavi, S. Floyd, and A. Romanow, “TCP selective acknowledgment
options,” RFC 2018, Oct. 1996.
- M. Allman, D. Glover, and L. Sanchez, “Enhancing TCP over satellite channels using
standard mechanisms,” RFC 2488, Jan. 1999.
- M. Allman, S. Dawkins, D. Glover, J. Griner, D. Tran, T. Henderson, J. Heidemann, J.
Touch, H. Kruse, S. Ostermann, K. Scott, and J. Semke, “Ongoing TCP research related to satellites,” RFC 2760, Feb. 2000.
- J. Border, M. Kojo, J. Griner, G. Montenegro, and Z. Shelby, “Performance enhancing
proxies intended to mitigate link-related degradations,” RFC 3135, Jun. 2001.
- S. Floyd, “Inappropriate TCP resets considered harmful,” RFC 3360, Aug. 2002.
References: fingerprinting
- R. Beverly, “A Robust Classifier for Passive TCP/IP Fingerprinting,” in Proc. Passive and
Active Meas. Workshop 2004, Antibes Juan-les-Pins, France, Apr. 2004, pp. 158–167.
- C. Smith and P. Grundl, “Know your enemy: passive fingerprinting,” The
Honeynet Project, Mar. 2002. [Online]. Available:
http://www.honeynet.org/papers/finger/
- Passive OS fingerprinting tool ver. 2 (p0f v2). [Online]. Available:
http://lcamtuf.coredump.cx/p0f.shtml/
- B. Petersen, “Intrusion detection FAQ: What is p0f and what does it do?” The
SysAdmin, Audit, Network, Security (SANS) Institute. [Online]. Available: http://www.sans.org/resources/idfaq/p0f.php
- T. Miller, “Passive OS fingerprinting: details and techniques,” The SysAdmin, Audit,
Network, Security (SANS) Institute. [Online]. Available: http://www.sans.org/reading_room/special.php/
References: anomalies
- P. Barford and D. Plonka, “Characteristics of network traffic flow anomalies,” in Proc.
ACM SIGCOMM Internet Meas. Workshop 2001, Nov. 2001, pp. 69–73.
- P. Barford, J. Kline, D. Plonka, and A. Ron, “A signal analysis of network traffic
anomalies,” in Proc. ACM SIGCOMM Internet Meas. Workshop 2002, Marseille, France,
Nov. 2002, pp. 71–82.
- Y. Zhang, Z. Ge, A. Greenberg, and M. Roughan, “Network anomography,” in Proc. ACM
SIGCOMM Internet Meas. Conf. 2005, Berkeley, CA, Oct. 2005, pp. 317–330.
- A. Soule, K. Salamatian, and N. Taft, “Combining filtering and statistical methods for
anomaly detection,” in Proc. ACM SIGCOMM Internet Meas. Conf. 2005, Berkeley, CA,
Oct. 2005, pp. 331–344.
- P. Huang, A. Feldmann, and W. Willinger, “A non-intrusive, wavelet-based approach to
detecting network performance problems,” in Proc. ACM SIGCOMM Internet Meas. Workshop 2001, San Francisco, CA, Nov. 2001, pp. 213–227.
- A. Lakhina, M. Crovella, and C. Diot, “Characterization of network-wide anomalies in
traffic flows,” in Proc. ACM SIGCOMM Internet Meas. Conf. 2004, Taormina, Italy,
Oct. 2004, pp. 201–206.
- A. Lakhina, M. Crovella, and C. Diot, “Diagnosing network-wide traffic anomalies,” ACM
SIGCOMM Comput. Commun. Rev., vol. 34, no. 4, pp. 219–230, Oct. 2004.
- M. Arlitt and C. Williamson, “An analysis of TCP reset behaviour on the Internet,” ACM
SIGCOMM Comput. Commun. Rev., vol. 35, no. 1, pp. 37–44, Jan. 2005.
References: spectral analysis
- M. Faloutsos, P. Faloutsos, and C. Faloutsos, “On power-law relationships of the
Internet topology,” Proc. of ACM SIGCOMM ’99, Cambridge, MA, Aug. 1999,
pp. 251–262.
- H. Chang, R. Govindan, S. Jamin, S. Shenker, and W. Willinger, “Towards
capturing representative AS-level Internet topologies,” Proc. of ACM SIGMETRICS 2002, New York, NY, June 2002, pp. 280–281.
- D. Vukadinovic, P. Huang, and T. Erlebach, “On the Spectrum and Structure of
Internet Topology Graphs,” in H. Unger et al., editors, Innovative Internet Computing Systems, LNCS2346, pp. 83–96. Springer, Berlin, Germany, 2002.
- M. Mihail, C. Gkantsidis, and E. Zegura, “Spectral analysis of Internet
topologies,” Proc. of Infocom 2003, San Francisco, CA, Mar. 2003, vol. 1,
pp. 364–374.
- G. Huston, “Interconnection, peering and settlements-Part II,” Internet
Protocol Journal, June 1999. [Online]. Available: http://www.cisco.com/warp/public/759/ipj_2-2/ipj_2-2_ps1.html
- F. R. K. Chung, Spectral Graph Theory. Providence, Rhode Island: Conference
Board of the Mathematical Sciences, 1997, pp. 2–6.
- M. Fiedler, “Algebraic connectivity of graphs,” Czech. Math. J., vol. 23, no. 2,
pp. 298–305, 1973.
References: traffic analysis
- Y. W. Chen, “Traffic behavior analysis and modeling sub-networks,”
International Journal of Network Management, John Wiley & Sons, vol. 12,
pp. 323–330, 2002.
- Y. Fang and I. Chlamtac, “Teletraffic analysis and mobility modeling of PCS
networks,” IEEE Trans. on Communications, vol. 47, no. 7, pp. 1062–1072, July 1999.
- N. K. Groschwitz and G. C. Polyzos, “A time series model of long-term
NSFNET backbone traffic,” in Proc. IEEE International Conference on Communications (ICC'94), New Orleans, LA, May 1994, vol. 3, pp. 1400–1404.
- D. Papagiannaki, N. Taft, Z.-L. Zhang, and C. Diot, “Long-term forecasting of
Internet backbone traffic: observations and initial models,” in Proc. IEEE INFOCOM 2003, San Francisco, CA, April 2003, pp. 1178–1188.
- D. Tang and M. Baker, “Analysis of a metropolitan-area wireless network,”
Wireless Networks, vol. 8, no. 2/3, pp. 107–120, Mar.-May 2002.
- R. B. D'Agostino and M. A. Stephens, Eds., Goodness-of-Fit Techniques. New
York: Marcel Dekker, 1986, pp. 63–93, 97–145, 421–457.
- F. Barceló and J. I. Sánchez, “Probability distribution of the inter-arrival
time to cellular telephony channels,” in Proc. of the 49th Vehicular Technology Conference, May 1999, vol. 1, pp. 762–766.
- F. Barceló and J. Jordan, “Channel holding time distribution in public
telephony systems (PAMR and PCS),” IEEE Trans. Vehicular Technology, vol. 49, no. 5, pp. 1615–1625, Sept. 2000.