Network traffic characterization A historical perspective 1 - - PowerPoint PPT Presentation



slide-1
SLIDE 1

1

Network traffic characterization

A historical perspective

slide-2
SLIDE 2

2

Name             port   % bytes   % packets   bytes per packet
world-wide-web     80     56.75       44.79                819
netnews           119     24.65       12.90               1235
pop-3 mail        110      1.88        3.17                384
cuseeme          7648      0.95        1.85                333
secure web        443      0.74        0.79                603
internet chat    6667      0.27        0.74                239
file transfer      20      0.65        0.64                659
domain name        53      0.19        0.58                210
. . .

Incoming AT&T traffic by port

(18 hours of traffic to AT&T dial clients on July 22, 1997)

World Wide Web traffic dominates traffic mix

slide-3
SLIDE 3

3

MWN traffic by port

(24 hours of traffic to/from MWN clients in 2006)

Port        % Conns   % Success   % Payload
Web 80       70.82%      68.13%      72.59%
445           3.53%       0.01%       0.00%
Web 443       2.34%       2.08%       1.29%
SSH 22        2.12%       1.75%       1.71%
Mail 25       1.85%       1.05%       1.71%
1042          1.66%       0.00%       0.00%
1433          1.06%       0.00%       0.00%
135           1.04%       0.00%       0.00%
< 1024       83.68%      73.73%      79.05%
> 1024       16.32%       4.08%      20.95%

slide-4
SLIDE 4

4

Grouping IP Packets Into Flows

Group packets with the “same” address

  • Application-level: single transfer from a web server to a client
  • Host-level: multiple transfers from server to client
  • Subnet-level: multiple transfers to a group of clients

Group packets that are “close” in time

60-second spacing between consecutive packets

(Figure: packets on a timeline, grouped into flow 1, flow 2, flow 3, and flow 4)
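The two grouping rules above (same address, close in time) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the packet tuples, the `group_into_flows` name, and the choice of a (source, destination) pair as the key are hypothetical, and whether a gap of exactly 60 seconds closes a flow is a guess. Swapping the key for a 5-tuple, a host pair, or a subnet pair gives the application-, host-, or subnet-level variants.

```python
from collections import defaultdict

FLOW_TIMEOUT = 60.0  # seconds; the spacing quoted on the slide


def group_into_flows(packets, timeout=FLOW_TIMEOUT):
    """Group (timestamp, key, size) packets into flows.

    A packet joins the most recent flow with the same key if it arrives
    within `timeout` seconds of that flow's last packet; otherwise it
    starts a new flow for that key.
    """
    flows_by_key = defaultdict(list)  # key -> list of flows (lists of packets)
    for ts, key, size in sorted(packets):
        flows = flows_by_key[key]
        if flows and ts - flows[-1][-1][0] <= timeout:
            flows[-1].append((ts, key, size))
        else:
            flows.append([(ts, key, size)])
    return [flow for flows in flows_by_key.values() for flow in flows]


# Hypothetical packets: (timestamp in seconds, (src, dst) key, bytes)
packets = [
    (0.0, ("server", "client"), 1500),
    (0.5, ("server", "client"), 1500),
    (30.0, ("server", "client"), 400),
    (200.0, ("server", "client"), 1500),  # gap above 60 s, so a new flow
    (1.0, ("server", "other"), 64),
]
flows = group_into_flows(packets)
for flow in flows:
    print(flow[0][1], len(flow), "packets")
```

With this made-up input the server-to-client packets split into two flows because of the 170-second gap, and the packet to the other host forms a third.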

slide-5
SLIDE 5

5

Name             port   %bytes   %pkts   %flows   pkts per flow   bytes per packet   duration (seconds)
world-wide-web     80    56.75   44.79    74.58              12                819                 11.2
netnews           119    24.65   12.90     1.20             210               1235                132.6
pop-3 mail        110     1.88    3.17     2.80              22                384                 10.3
cuseeme          7648     0.95    1.85     0.03            1375                333                192.0
secure web        443     0.74    0.79     0.99              16                603                 14.2
internet chat    6667     0.27    0.74     0.16              89                239                384.6
file transfer      20     0.65    0.64     0.26              47                659                 30.1
domain name        53     0.19    0.58    10.69               1                210                  0.5
. . .

Incoming WorldNet traffic by port

(18 hours of traffic to WorldNet dial clients on July 22, 1997)

  • Incoming application flows with a 60-second timeout
  • Diverse flow characteristics across different protocols
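A per-port table like the one above could be derived from flow records roughly as follows. This is a sketch, not the procedure used for the WorldNet data: the record fields, the `summarize_by_port` name, and the sample values are hypothetical, and it assumes the per-flow columns are plain averages.

```python
def summarize_by_port(flow_records):
    """Aggregate flow records (dicts with port, packets, bytes, duration)."""
    totals = {}
    for r in flow_records:
        t = totals.setdefault(
            r["port"], {"flows": 0, "packets": 0, "bytes": 0, "duration": 0.0}
        )
        t["flows"] += 1
        t["packets"] += r["packets"]
        t["bytes"] += r["bytes"]
        t["duration"] += r["duration"]

    all_flows = sum(t["flows"] for t in totals.values())
    all_packets = sum(t["packets"] for t in totals.values())
    all_bytes = sum(t["bytes"] for t in totals.values())

    summary = {}
    for port, t in totals.items():
        summary[port] = {
            "pct_bytes": 100.0 * t["bytes"] / all_bytes,
            "pct_pkts": 100.0 * t["packets"] / all_packets,
            "pct_flows": 100.0 * t["flows"] / all_flows,
            "pkts_per_flow": t["packets"] / t["flows"],
            "bytes_per_packet": t["bytes"] / t["packets"],
            "mean_duration": t["duration"] / t["flows"],
        }
    return summary


# Made-up flow records, for illustration only
records = [
    {"port": 80, "packets": 10, "bytes": 8000, "duration": 10.0},
    {"port": 80, "packets": 14, "bytes": 12000, "duration": 12.0},
    {"port": 53, "packets": 1, "bytes": 200, "duration": 0.5},
]
summary = summarize_by_port(records)
for port, row in sorted(summary.items()):
    print(port, {k: round(v, 2) for k, v in row.items()})
```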

slide-6
SLIDE 6

6

Short-vs. long-lived Web flows

  • Many very short flows (30% are less than 300 bytes)
  • Many medium-sized flows (short web transfers)
  • Most bytes belong to long flows (large images, files)
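The contrast between "many short flows" and "most bytes in long flows" can be checked on any list of flow sizes. The sketch below uses invented sizes chosen to mimic the 30% figure; the `flow_size_profile` name and the `top_fraction` parameter are hypothetical, and only the 300-byte threshold comes from the slide.

```python
def flow_size_profile(flow_bytes, small_threshold=300, top_fraction=0.1):
    """Return (share of flows below the threshold, byte share of largest flows)."""
    sizes = sorted(flow_bytes)
    n = len(sizes)
    total = sum(sizes)
    small_share = sum(1 for s in sizes if s < small_threshold) / n
    top_count = max(1, int(n * top_fraction))  # at least one flow
    top_bytes_share = sum(sizes[-top_count:]) / total
    return small_share, top_bytes_share


# Invented flow sizes in bytes, for illustration only
flow_bytes = [120, 200, 250, 2_000, 3_500, 5_000, 8_000, 12_000, 40_000, 2_000_000]
small_share, top_bytes_share = flow_size_profile(flow_bytes)
print(f"flows under 300 bytes: {small_share:.0%}")
print(f"bytes carried by the largest 10% of flows: {top_bytes_share:.1%}")
```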

Flow densities are signatures

slide-7
SLIDE 7

7

Traffic measurements: Pre-1990

Early telephony: importance of measurements (e.g., Erlang, Palm, Wilkinson, ...)

Modern telephony: measurements are a scarce commodity; supposedly “well-understood” characteristics

Early data networking: importance of measurements (e.g., ARPANET measurements by Kleinrock et al.)

Modern data networking: no data, or only a few small data sets, are available

slide-8
SLIDE 8

8

Traffic measurements: Pre-1990

Traffic data analysis

  • Strictly traditional inference techniques
  • Focus on choosing the best-fitting model
  • Obsession with “squeezing a data set dry”

Traffic and performance modeling

  • Black-box or operational models dominate
  • No real need to talk to subject-matter experts
  • Traffic is viewed as “just another time series...”
  • Main objective: “What can be analyzed?”

slide-9
SLIDE 9

9

Post-1990: What has changed?

Traffic measurements

Abundance of traffic measurements; reproducibility

Traffic data analysis

  • Data exhibit unusual features
  • From statistical inference to scientific inference
  • Networks are complex; need for subject-matter expertise

Traffic and performance modeling

  • Need for physically based or structural models
  • Main objective: “What matters for performance?”

slide-10
SLIDE 10

10

Traffic measurement challenges

Telephone networks are static entities

  • Have hardly changed for years and decades (exception: cellular phone systems...)
  • Have evolved in a predictable manner

Modern data networks are highly dynamic entities

  • User population, services and applications
  • Traffic mix, protocols, ...
  • Data networks that don't change are suspicious
  • Internet as an example of extreme heterogeneity

slide-11
SLIDE 11

11

Traffic measurement challenges

Measuring high-speed network traffic

  • High-quality: special-purpose traffic recorders
  • High-volume: terabyte storage devices
  • Diversity: many large datasets from different networks, different times, and different points in the network
  • Sensitivity: who can record and collect what data?

High-speed network traffic is complex

  • Unusual behavior, constant surprises, ...
  • What are interesting/relevant measurements?

slide-12
SLIDE 12

12

Sample data trace

slide-13
SLIDE 13

13

Network dynamics: “Killer application”

WWW and the Internet

  • 1993: hardly any WWW traffic on the Internet
  • 1994: about 10% of total Internet traffic is WWW
  • 95/96: up to 60-70% of overall Internet traffic is WWW
  • 06/07: up to 60-70% of overall Internet traffic is P2P

New applications and services

Games? IPTV?

New network protocols

slide-14
SLIDE 14

14

Network dynamics: User population

Number of Internet hosts

  • Early 1989: 80,000
  • Early 1992: 727,000
  • Oct. 1993: 2,056,000
  • Late 1996: 10,000,000
  • Now: 100xxxxxxxxxx

Internet traffic volume (Merit, Inc.)

  • March 1991: 1.3 × 10^12 bytes/month
  • March 1994: 1.1 × 10^13 bytes/month
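To put the two traffic-volume figures in perspective, the implied growth rate can be worked out directly. The short calculation below assumes steady exponential growth between March 1991 and March 1994, which is a simplification; the variable names are illustrative.

```python
import math

volume_1991 = 1.3e12  # bytes/month, March 1991
volume_1994 = 1.1e13  # bytes/month, March 1994
years = 3

# Constant yearly multiplier that takes the 1991 volume to the 1994 volume
annual_factor = (volume_1994 / volume_1991) ** (1 / years)

# Time for the volume to double at that rate
doubling_time_months = 12 * math.log(2) / math.log(annual_factor)

print(f"growth factor per year: {annual_factor:.2f}")
print(f"doubling time: {doubling_time_months:.1f} months")
```

Under that assumption, the monthly volume roughly doubled every year over this period.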

slide-15
SLIDE 15

15

High-volume measurements

1 hour of Ethernet LAN traffic (10 Mbit/s)

  • About 1 million packets

1 day of uninterrupted Ethernet LAN measurements

  • About 2 Gigabytes of data

1 hour of ATM traffic (155 Mbit/s)

  • About 100 million packets

1 day of uninterrupted ATM measurements

  • About 1 Terabyte of data

1 day of uninterrupted 1 Gbit/s measurements

  • About 10 Terabytes of data
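A quick way to sanity-check such volumes is to compute the upper bound a link can deliver in a day. The sketch below assumes a fully utilized link and ignores capture overhead, so it gives ceilings rather than the measured values quoted above; the `bytes_per_day` helper is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(link_bits_per_second, utilization=1.0):
    """Upper bound on bytes crossing a link in one day at a given utilization."""
    return link_bits_per_second * utilization * SECONDS_PER_DAY / 8


links = {
    "Ethernet, 10 Mbit/s": 10e6,
    "ATM, 155 Mbit/s": 155e6,
    "1 Gbit/s": 1e9,
}
for name, rate in links.items():
    print(f"{name}: at most {bytes_per_day(rate) / 1e12:.2f} TB/day")

# Average utilization implied by "about 2 Gigabytes" per day on 10 Mbit/s Ethernet
implied_utilization = 2e9 / bytes_per_day(10e6)
print(f"implied Ethernet utilization: {implied_utilization:.1%}")
```

The ATM and 1 Gbit/s ceilings (about 1.7 TB and 10.8 TB) are close to the quoted figures, while the Ethernet figure corresponds to a lightly loaded LAN.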

slide-16
SLIDE 16

16

High-quality measurements

Timestamp accuracy

From millisecond to microsecond accuracy

More than just another time series

Information about all layers in network hierarchy

  • TCP/IP header information
  • Payload
  • Higher level protocol information

Active measurements

Actively injecting traffic into the network

Passive measurements

Passively monitoring network information

slide-17
SLIDE 17

17

Plain old telephone service (POTS)

Billing data

  • Signaling for each phone call
  • Billing on a call-by-call basis
  • Source, destination, start time, duration

Studies

  • Call arrival process
  • Call holding time distributions
  • Spatial calling patterns

Application

  • Network planning, dimensioning, etc.
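As an illustration of how billing records feed such studies, the sketch below derives interarrival times, the mean holding time, and the offered load in erlangs (arrival rate times mean holding time) from a handful of call records. The records and the `call_statistics` name are invented for the example, and estimating the arrival rate as the reciprocal of the mean interarrival time is one simple choice among several.

```python
def call_statistics(calls):
    """calls: list of (start_time_s, duration_s) tuples from billing records."""
    starts = sorted(start for start, _ in calls)
    interarrivals = [later - earlier for earlier, later in zip(starts, starts[1:])]
    mean_interarrival = sum(interarrivals) / len(interarrivals)
    mean_holding = sum(duration for _, duration in calls) / len(calls)
    # Offered load in erlangs: arrival rate multiplied by mean holding time
    offered_load = mean_holding / mean_interarrival
    return mean_interarrival, mean_holding, offered_load


# Invented call records: (start time in seconds, duration in seconds)
calls = [(0, 120), (60, 180), (120, 60), (180, 240)]
mean_interarrival, mean_holding, offered_load = call_statistics(calls)
print(f"mean interarrival time: {mean_interarrival:.0f} s")
print(f"mean holding time: {mean_holding:.0f} s")
print(f"offered load: {offered_load:.1f} erlangs")
```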

slide-18
SLIDE 18

18

CCS/SS7 measurements

Common Channel Signaling (CCS) Network

  • Slow but mature packet network: 56 Kbps
  • Running the Signaling System 7 (SS7) protocol
  • Measurements at the level of individual SS7 messages
  • Variable-length messages
  • Days/weeks worth of data
  • Hundreds of millions of messages

Study of SS7 traffic at message-level

Study of telephone traffic (POTS)

  • Call arrival process
  • Call holding time distributions
  • Spatial calling patterns

slide-19
SLIDE 19

19

Data sources in IP networks

Configuration data

  • Network
  • Service
  • Customer registration

Usage data

Network data for each

  • Packet, flow, dial session
  • Routers MIB: utilization, loss statistics
  • Routing tables
  • Active probes

Servers

  • Customer care
  • Email, Web hosting, E-commerce

slide-20
SLIDE 20

20

Measurement design considerations

Network operation has priority

Unless crucial for billing

Network measurement as an afterthought

  • Design of new protocols
  • Design of network hardware
  • Design of networks

Security

  • Who
  • Where
  • How
  • Impact on network

slide-31
SLIDE 31

31

Time Series

Example

  • # of packets (bytes) per 10 milliseconds
  • # of TCP connections arriving per second
  • # of modem sessions arriving per second
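A count series such as "packets per 10 milliseconds" is obtained by binning packet timestamps. The following is a minimal sketch, assuming integer timestamps in microseconds so that bin boundaries are exact; the `counts_per_bin` name and the sample timestamps are hypothetical.

```python
BIN_WIDTH_US = 10_000  # 10 milliseconds, expressed in microseconds


def counts_per_bin(timestamps_us, bin_width_us=BIN_WIDTH_US):
    """Turn packet timestamps (integer microseconds) into counts per bin."""
    if not timestamps_us:
        return []
    start = min(timestamps_us)
    n_bins = (max(timestamps_us) - start) // bin_width_us + 1
    counts = [0] * n_bins
    for ts in timestamps_us:
        counts[(ts - start) // bin_width_us] += 1
    return counts


# Hypothetical packet arrival times in microseconds
timestamps_us = [0, 2_000, 9_000, 13_000, 35_000]
series = counts_per_bin(timestamps_us)
print(series)
```

Bins with no arrivals stay in the series as zeros, which matters when the counts are later treated as a time series.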

Definitions

  • Time series: X_1, X_2, …, X_n
  • Aggregated process: X^(m)

$$X^{(m)}(k) = \frac{1}{m}\left(X_{(k-1)m+1} + \cdots + X_{km}\right), \qquad k \ge 1$$

  • Stationary time series: distribution of X independent of time
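The aggregated process averages the original series over consecutive, non-overlapping blocks of m observations. A small Python sketch of that definition follows; the `aggregate` name and the sample series are illustrative, the list index is 0-based whereas k starts at 1 in the definition, and dropping a trailing incomplete block is an assumption.

```python
def aggregate(series, m):
    """Average a series over non-overlapping blocks of size m.

    Element k-1 of the result is (X_{(k-1)m+1} + ... + X_{km}) / m.
    A trailing block with fewer than m observations is dropped.
    """
    n_blocks = len(series) // m
    return [sum(series[k * m:(k + 1) * m]) / m for k in range(n_blocks)]


# Illustrative count series (for example, packets per 10 ms)
x = [3, 1, 0, 1, 4, 2, 2, 0]
print(aggregate(x, 2))
print(aggregate(x, 4))
```

Increasing m smooths the series; comparing the aggregated series across several values of m is a common way to look at traffic on multiple time scales.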