1

What TCP/IP Protocol Headers Can Tell Us About the Web

http://www.cs.unc.edu/Research/dirt

University of North Carolina at Chapel Hill

SIGMETRICS, June 2001

Félix Hernández Campos
F. Donelson Smith
Kevin Jeffay
David Ott

2

Motivation

Traffic Modeling and Characterization

  • Can we continuously acquire network traffic data using off-the-shelf hardware and software?
  • Can we use this information to construct up-to-date, application-level traffic models?
      – Populate traffic generator with analytic distributions for simulations and lab experiments
  • Can we study the traffic generated by a large population of users while protecting their privacy?
  • Case study: Web Traffic

3

Internet Traffic Characterization

Previous Work

  • Traffic modeling before the WWW explosion
      – Danzig et al. (91, 92)
      – Paxson (94)
  • Browsing-based web traffic models
      – Mah (95)
      – Crovella et al. (95, 98)
  • Models of TCP connections in the web
      – Cleveland et al. (00)
  • Other large-scale trace analyses related to the web
      – Gribble & Brewer (97), Balakrishnan et al. (98), Wolman et al. (99), and Feldmann (00)

4

Methodology

Trace Acquisition

  • Study Internet traffic generated by a large and diverse population

[Diagram: the University of North Carolina at Chapel Hill (35,000 people) connects to the Internet over a Gigabit Ethernet link; a traffic monitor (tcpdump) is attached to the link.]

5

Methodology

Benefits of TCP/IP Header Tracing

  • Light-weight
      – Off-the-shelf hardware
      – Freely available software
  • Privacy
      – Easy to address by anonymizing IP addresses offline
  • Efficient
      – Reduces storage requirements
          » E.g. 161 GB for headers instead of 803 GB for entire packets
      – Reduces processing requirements during tracing
          » Header extraction and recording only
  • Large-scale
      – E.g. 7 days x 12 hr, 1 Gbps link (20% avg. util.), 35K users
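The privacy point above can be illustrated with a minimal sketch of offline IP address anonymization. The slides do not say which scheme was used, so this assumes a keyed hash (HMAC-SHA256) truncated to 32 bits; the function name `anonymize_ipv4` and the key are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Map an IPv4 address to a pseudonymous IPv4 address.

    Assumption (not from the slides): a keyed hash truncated to 32 bits.
    The mapping is consistent (same input and key give the same output),
    so per-host statistics survive, but prefixes are not preserved and
    collisions between distinct addresses are possible.
    """
    packed = ipaddress.IPv4Address(addr).packed           # 4 raw bytes
    digest = hmac.new(key, packed, hashlib.sha256).digest()
    return str(ipaddress.IPv4Address(digest[:4]))         # first 4 bytes as an address


if __name__ == "__main__":
    key = b"hypothetical-secret-key"
    print(anonymize_ipv4("192.0.2.10", key))
```

Because the mapping is deterministic for a given key, it can be applied to a stored trace after capture, which matches the "offline" step named on the slide.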

6

Trace Collection

Summary

  • Three sets of traces from UNC
      – October 99, October 00, April 01
      – 1-hour-long tracing periods (1-6 GB per trace)
      – 42 traces in each set
  • Two sets of traces from NLANR (for comparison)
      – October 99, October 00
      – 2 sites
          » San Diego Supercomputer Center
          » Univ. of Michigan/Merit
      – 90-second tracing periods (3-67 MB per trace)
      – 58 traces in each set

7

Trace Collection

Summary

                                         99         00         01
Packets    Total                       525 M     1873 M     2419 M
           TCP                          85%        91%        91%
           HTTP                         38%        29%        28%
Bytes      Total                      212 GB     721 GB     905 GB
           TCP                          86%        90%        91%
           HTTP                         56%        35%        36%
Total Traces Size                      36 GB     127 GB     164 GB
Avg. % of Packets Lost by Monitor       0%        0.02%     0.003%

8

Case Study: Web Traffic

Packet Capturing

  • We study a large collection of users as web content consumers
  • We only capture TCP/IP headers
      – No HTTP headers

[Diagram: web clients at the University of North Carolina at Chapel Hill send HTTP requests to web servers on the Internet, which return HTTP responses.]

9

Methodology

Do We Really Need HTTP Headers?

  • We can infer plenty of HTTP information from TCP/IP headers
      – Request size
      – Response size
      – Embedded objects per web page
      – Servers per page
      – Use of persistent connections
      – ...
  • TCP/IP headers are sufficient for
      – Constructing application-level traffic models
      – Studying the impact of new HTTP dynamics

10

Web Traffic Analysis

Processing Sequence Overview

[Diagram: tcpdump -> raw TCP/IP headers trace -> Filter & Sort -> TCP connections (port 80) -> Connection Analysis -> HTTP req/rsp exchanges -> Statistics -> HTTP distributions]
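The "Filter & Sort" stage of the processing sequence could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Header` record layout and the function `group_http_connections` are hypothetical, parsing of the tcpdump output is omitted, and the trace is assumed to contain server-to-client segments, so the web server is the source of every kept segment.

```python
from collections import defaultdict
from typing import NamedTuple


class Header(NamedTuple):
    """One captured TCP/IP header (hypothetical record layout)."""
    ts: float     # capture timestamp in seconds
    src: str      # source IP address
    sport: int    # source TCP port
    dst: str      # destination IP address
    dport: int    # destination TCP port
    seqno: int    # TCP sequence number
    ackno: int    # TCP acknowledgment number
    flags: str    # TCP flags, e.g. "SA", "A", "F"


def group_http_connections(headers, port=80):
    """Keep segments sent from `port` and group them by connection.

    A connection is identified by its (src, sport, dst, dport) 4-tuple.
    Segments within each connection are sorted by capture time.
    """
    conns = defaultdict(list)
    for h in headers:
        if h.sport == port:                      # filter: web server is the sender
            conns[(h.src, h.sport, h.dst, h.dport)].append(h)
    for segs in conns.values():
        segs.sort(key=lambda h: h.ts)            # sort: restore capture order
    return dict(conns)
```

Each value of the returned dictionary is the time-ordered list of segments of one port-80 connection, which is the input the later connection analysis needs.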

11

Methodology

Request/Response Traces

[Diagram: the web client sends an HTTP request of x bytes to the web server; the web server returns an HTTP response of y bytes.]

12

TCP/IP Headers and HTTP

Request/response Exchange

[Diagram: TCP segments exchanged between the web client (UNC) and the web server (Internet) for one HTTP request (304 bytes) and one HTTP response (2875 bytes).]

Connection setup: SYN, SYN-ACK, ACK
Client DATA: seqno 305, ackno 1
Server ACK: seqno 1, ackno 305
Server DATA: seqno 1461, ackno 305
Server DATA: seqno 2876, ackno 305
Client ACK: seqno 305, ackno 2876
Connection teardown: FIN, FIN-ACK, FIN, FIN-ACK

13

Packet Capturing

Inbound TCP/IP Headers Only

[Diagram: web clients at the University of North Carolina at Chapel Hill and web servers on the Internet, connected by a Gigabit Ethernet link.]

14

Packet Capturing

Inbound TCP/IP Headers Only

  • Two fiber links

[Diagram: the link between the University of North Carolina at Chapel Hill (web clients) and the Internet (web servers) consists of an outbound fiber and an inbound fiber.]

15

Packet Capturing

Inbound TCP/IP Headers Only

  • Only inbound TCP/IP headers are captured
      – Eliminate synchronization and buffering issues on the NIC
      – Reduce trace size

[Diagram: outbound and inbound fibers between the University of North Carolina at Chapel Hill (web clients) and the Internet (web servers); the traffic monitor (tcpdump) is attached to the inbound fiber.]

16

TCP/IP Headers and HTTP

Request/response Exchange

[Diagram: the request/response exchange of slide 12 between the web client (UNC) and the web server (Internet), annotated with the HTTP request (304 bytes) and the HTTP response (2875 bytes).]

17

TCP/IP Headers and HTTP

Server-to-client Segments Only

[Diagram: only the segments sent by the web server (Internet) to the web client (UNC) are shown; the HTTP request is 304 bytes and the HTTP response is 2875 bytes.]

SYN-ACK
ACK: seqno 1, ackno 305
DATA: seqno 1461, ackno 305
DATA: seqno 2876, ackno 305
FIN
FIN-ACK

18

Methodology

Request/Response Traces

[Diagram: between the web client and the web server, the HTTP request size (304 bytes) is computed and the HTTP response size (2875 bytes) is directly observed.]

  • Unidirectional TCP/IP header traces are sufficient for capturing application-level behavior
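The computation implied by the example above can be sketched as follows: the request size comes from how far the server's acknowledgment number advanced, and the response size from how far its sequence number advanced. This is a minimal sketch, assuming sequence numbers relative to the start of the connection (first data byte = 1, as in the slides' example), a single request/response exchange, and no wraparound; the function name `exchange_sizes` is hypothetical.

```python
def exchange_sizes(segments):
    """Estimate (request_bytes, response_bytes) for a one-exchange connection.

    `segments` is an iterable of (seqno, ackno) pairs taken from
    server-to-client headers, numbered relative to the connection start.
    The highest ackno tells how many request bytes the server received;
    the highest seqno tells how many response bytes it sent.
    """
    segments = list(segments)
    max_seq = max(seq for seq, _ in segments)
    max_ack = max(ack for _, ack in segments)
    return max_ack - 1, max_seq - 1


if __name__ == "__main__":
    # Server-to-client segments from the slides' example.
    trace = [(1, 305), (1461, 305), (2876, 305)]
    print(exchange_sizes(trace))  # (304, 2875)
```

Taking the maximum rather than the last value makes the sketch tolerant of reordered segments, although retransmissions and the sequence space used by SYN and FIN are ignored here.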

19

HTTP Characterization

Response Data Sizes – Body CDF

[Plot: x-axis: Response Size (in bytes); y-axis: Cumulative Probability (% Responses); curves for 1999 and 2001; the 0.85 level is marked.]

20

HTTP Characterization

Response Data Volumes – Body CDF

[Plot: x-axis: Response Size (in bytes); y-axis: Cumulative Probability (% Bytes); marked values: 0.25, 10000 and 1e+05.]

21

HTTP Characterization

Response Data Sizes – Tail CCDF

[Plot: x-axis: Response Size (in bytes); y-axis: Complementary Cumulative Probability; curves for 90-second traces and 1-hour-long traces.]
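For readers unfamiliar with the two kinds of plots used in this part of the talk, here is a minimal sketch of how an empirical CDF (used for the body of a distribution) and its CCDF (used for the tail) can be computed from a list of sizes. The function names and the sample values are illustrative assumptions, not data from the traces.

```python
def empirical_cdf(sizes):
    """Return (xs, cdf): sorted distinct values and P[X <= x] for each."""
    data = sorted(sizes)
    n = len(data)
    xs, cdf = [], []
    for i, x in enumerate(data, start=1):
        if xs and xs[-1] == x:
            cdf[-1] = i / n          # repeated value: raise its step
        else:
            xs.append(x)
            cdf.append(i / n)
    return xs, cdf


def ccdf(cdf):
    """Complementary CDF: P[X > x] = 1 - P[X <= x]."""
    return [1.0 - p for p in cdf]


if __name__ == "__main__":
    sample = [304, 2875, 2875, 100000]   # made-up sizes in bytes
    xs, cdf = empirical_cdf(sample)
    print(xs)         # [304, 2875, 100000]
    print(cdf)        # [0.25, 0.75, 1.0]
    print(ccdf(cdf))  # [0.75, 0.25, 0.0]
```

Tail plots of this kind are usually drawn on log-log axes, where the rare very large objects that short traces cut off become visible.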

22

HTTP Characterization

Request Data Size – Body CDF

[Plot: x-axis: Request Size (in bytes); y-axis: Cumulative Probability; curves for 1999, 2000 and 2001.]

23

HTTP Characterization

Request Data Size – Tail CCDF

[Plot: x-axis: Request Size (in bytes); y-axis: Complementary Cumulative Probability.]

24

Persistent Connections in HTTP

Effective Persistence

  • An HTTP persistent connection can use a single TCP connection to carry one or more request/response exchanges
  • This feature is supported in newer versions of the protocol
      – HTTP/1.0 (limited support)
      – HTTP/1.1
  • We study how persistent connections are used
      – We define effective persistence as two or more request/response exchanges in the same TCP connection

25

Persistent Connections in HTTP

Example – TCP/IP Headers

[Diagram: segments sent by the web server (Internet) to the web client (UNC) on a persistent connection. An ackno increase marks a request; a seqno increase marks a response.]

SYN-ACK: seqno 1, ackno 1
ACK: seqno 1, ackno 305 (Request 1, 304 bytes)
DATA: seqno 1461, ackno 305
DATA: seqno 2876, ackno 305 (Response 1, 2875 bytes)
ACK: seqno 2876, ackno 567 (Request 2, 262 bytes)
DATA: seqno 4336, ackno 567
DATA: seqno 5796, ackno 567
DATA: seqno 6341, ackno 567 (Response 2, 3465 bytes)
FIN
FIN-ACK

26

Persistent Connections in HTTP

Example – Request/Response Exchanges

[Diagram: exchanges between the web client (UNC) and the web server (Internet); request sizes are computed, response sizes are directly observed.]

HTTP Request 1: 304 bytes (computed)
HTTP Response 1: 2875 bytes (directly observed)
HTTP Request 2: 262 bytes (computed)
HTTP Response 2: 3465 bytes (directly observed)
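The persistent-connection example above suggests a simple rule: when the server's ackno increases, request bytes have arrived, and when its seqno increases, response bytes have been sent. The sketch below applies that rule to split one connection into exchanges. It is an illustration under stated assumptions rather than the authors' implementation: numbers are relative to the connection start, segments are in capture order, and pipelining, retransmissions and reordering are ignored. The names `split_exchanges` and `is_effectively_persistent` are hypothetical.

```python
def split_exchanges(segments):
    """Split one connection into (request_bytes, response_bytes) exchanges.

    `segments` holds (seqno, ackno) pairs from server-to-client headers in
    capture order, numbered relative to the connection start (first byte = 1).
    """
    exchanges = []
    req = rsp = 0
    last_seq = last_ack = 1
    for seq, ack in segments:
        if ack > last_ack:               # server received new request bytes
            if rsp > 0:                  # the previous response is complete
                exchanges.append((req, rsp))
                req = rsp = 0
            req += ack - last_ack
            last_ack = ack
        if seq > last_seq:               # server sent new response bytes
            rsp += seq - last_seq
            last_seq = seq
    if req or rsp:
        exchanges.append((req, rsp))
    return exchanges


def is_effectively_persistent(exchanges):
    """Effective persistence: two or more exchanges in one TCP connection."""
    return len(exchanges) >= 2


if __name__ == "__main__":
    # Server-to-client segments from the slides' example.
    trace = [(1, 1), (1, 305), (1461, 305), (2876, 305),
             (2876, 567), (4336, 567), (5796, 567), (6341, 567)]
    result = split_exchanges(trace)
    print(result)                             # [(304, 2875), (262, 3465)]
    print(is_effectively_persistent(result))  # True
```

With pipelined requests the ackno can advance before the previous response has finished, so a rule this simple would merge or mis-split exchanges; this is one of the uncertainties the talk lists among its limitations.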

27

Effective Persistent Connections

Summary Statistics

                                 UNC 00    NLANR 00
Objects       Persistent          49.7%       42.8%
              Non-Persistent      50.3%       57.2%
Connections   Persistent          15.1%       13.8%
              Non-Persistent      78.1%       63.4%
              Unclassified         6.8%       22.8%
Bytes         Persistent          40.4%       35.7%
              Non-Persistent      49.6%       54.3%

28

HTTP Characterization

Objects in Persistent Connections

[Plot: x-axis: No. of Request/Response Exchanges; y-axis: Cumulative Probability; the 0.87 level is marked.]

29

HTTP Characterization

Other Statistics

  • Page-based statistics (based on Mah and Crovella et al.)
      – Think times
      – Top-level vs. embedded objects
          » Requests and Responses
      – Unique TCP connections per page
      – Unique server IP addresses per page
      – Consecutive pages per server
      – Number of pages per client
      – Primary vs. secondary servers
          » Requests and Responses
  • Other non-page-based statistics
      – Number of exchanges per client

30

Limitations

TCP/IP Header Tracing

  • Uncertainties arise when application-level information is inferred from transport-level headers
  • We discuss several issues in our paper
      – Pipelining
      – User/browser interactions
          » Stop and reload
      – Caches
          » Local cache and proxies
      – TCP segment processing
          » Segment reordering
  • In summary, limited or no impact on our results

31

Summary and Conclusions

Methodology

  • Unidirectional TCP/IP header tracing is a powerful and light-weight traffic measurement methodology
  • Limitations have a minor impact on application-level results
  • We also applied this methodology to
      – SMTP
      – FTP
      – Other application-level protocols

32

Summary and Conclusions

Web Traffic Characterization

  • New data to populate traffic generators
      – Request sizes
      – Response sizes
      – Use of persistent connections
      – ...
  • 1-hour-long traces are sufficient to capture application-level behavior
      – Short traces cut off large objects, which skews the tails of the distributions
  • Persistent Connections:
      – ~15% of all the HTTP connections
      – 40-50% of all the transferred HTTP bytes