An Empirical Model of HTTP Network Traffic

SLIDE 1

An Empirical Model of HTTP Network Traffic Last Change: April 4, 1997 Page 1 of 15

An Empirical Model of HTTP Network Traffic

Bruce A. Mah
bmah@ca.sandia.gov
University of California at Berkeley and Sandia National Laboratories

IEEE INFOCOM ’97
10 April 1997

SLIDE 2

Motivation

HTTP dominates wide-area Internet traffic

  • Leading contributor to byte and packet counts across the NSFNET backbone as early as April 1995

A synthetic workload for this application is needed

  • Network simulators
  • Benchmarks

SLIDE 3

Synopsis

We have constructed a synthetic workload of HTTP network activity based on traffic traces. This model is consistent with prior Web measurement studies.

SLIDE 4

Overview

  • Prior Work
  • Model Components
  • Methodology
  • Measurements

SLIDE 5

Prior Work

Measurement Studies

  • Server logs: only measure activity at one server
  • Client logs: require instrumented clients

Static Document Surveys

  • Document indices: don’t capture user reference patterns

Synthetic Workloads

  • tcplib: predates the World Wide Web

SLIDE 6

Model Components

Each component is defined by a probability distribution

  • Document Length
  • User Think Time
  • Request Sizes
  • Consecutive Documents Per Server
  • Server Selection
  • Reply Sizes

[Figure: diagram with the labels Server, Client, Primary, Secondary]

SLIDE 7

Methodology

Packet Traces

  • tcpdump on an Ethernet
  • Easily trace many clients
  • Capture protocol overheads
  • Lose higher-level information (documents, cache behavior)

Filtering

  • Remove non-local clients
  • Remove periodic retrievals

Apply Heuristics

  • Attempt to recover some higher-level structures

Construct Probability Distributions
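One common way to turn measured values into a distribution that a simulator can draw from is an empirical CDF with inverse-transform sampling. The sketch below illustrates that general technique only; it is not taken from the author's code. The function names and the sample values are hypothetical.

```python
import bisect
import random

def empirical_cdf(samples):
    """Return (sorted values, cumulative probabilities) for observed samples."""
    values = sorted(samples)
    n = len(values)
    probs = [(i + 1) / n for i in range(n)]
    return values, probs

def sample_from_cdf(values, probs, rng):
    """Inverse-transform sampling: draw u in [0, 1), return the first
    observed value whose cumulative probability is >= u."""
    u = rng.random()
    idx = bisect.bisect_left(probs, u)
    return values[min(idx, len(values) - 1)]

# Made-up request lengths in bytes, for illustration only (not the traced data).
observed = [240, 250, 255, 260, 1000, 1010, 1020]
values, probs = empirical_cdf(observed)
rng = random.Random(0)
synthetic = [sample_from_cdf(values, probs, rng) for _ in range(5)]
print(synthetic)
```

Every synthetic value is one of the observed values, so the generated workload reproduces the measured distribution without assuming any analytic form.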

SLIDE 8

Request Sizes

Request sizes have a bimodal distribution

[Figure: CDF of request length in bytes (0 to 2000), primary and secondary requests]

SLIDE 9

Reply Sizes

HTTP reply sizes have a heavy-tailed distribution

Primary replies tend to be longer than secondary replies

[Figure: CDF of reply length in bytes (0 to 30000), primary and secondary replies]

Reply length (bytes)   Mean    Median
Primary                17932   2099
Secondary              6868    1985

SLIDE 10

Document Length

A timing heuristic can discriminate between documents

80% of documents require fewer than four file transfers

Picked idle threshold Tthresh = 1.0 seconds

[Figure: CDF of number of connections per document (0 to 10), one curve for each Tthresh in {0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0} seconds]
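A timing heuristic of this kind can be sketched as follows. The slide does not spell out the exact rule, so this sketch assumes that a new document starts whenever a client has been idle (no connection open) for longer than Tthresh before its next connection begins. The function name and the connection times are hypothetical.

```python
def group_into_documents(connections, t_thresh=1.0):
    """Group one client's connections into documents.

    connections: list of (start, end) times in seconds, sorted by start.
    Assumed rule: an idle gap longer than t_thresh starts a new document.
    """
    documents = []
    current = []
    last_end = None
    for start, end in connections:
        if current and start - last_end > t_thresh:
            documents.append(current)
            current = []
        current.append((start, end))
        # Track the latest end time seen so far; connections may overlap.
        last_end = end if last_end is None else max(last_end, end)
    if current:
        documents.append(current)
    return documents

# Made-up connection times for one client, for illustration only.
conns = [(0.0, 0.4), (0.5, 0.9), (0.6, 1.2), (5.0, 5.3), (5.4, 6.0)]
print([len(doc) for doc in group_into_documents(conns, t_thresh=1.0)])
```

Lowering `t_thresh` splits the same trace into more, smaller documents, which is the kind of sensitivity the family of curves in the figure explores.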

SLIDE 11

User Think Time

[Figure: CDF of user think time in seconds (0 to 2000)]

Mean 1313, Median 15

SLIDE 12

Consecutive Documents per Server

80% of visits to a server retrieve fewer than six documents

[Figure: CDF of consecutive documents retrieved (0 to 14)]

Mean 4.1, Median 2.0
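Counting consecutive documents per server amounts to measuring run lengths in the sequence of servers a client visits. The sketch below assumes a time-ordered list with one server name per retrieved document. The function name and the host names are hypothetical.

```python
from itertools import groupby

def consecutive_run_lengths(servers):
    """Return the length of each run of consecutive identical servers."""
    return [sum(1 for _ in group) for _, group in groupby(servers)]

# Made-up visit sequence for one client, for illustration only.
visits = ["a.example", "a.example", "b.example",
          "a.example", "a.example", "a.example"]
print(consecutive_run_lengths(visits))
```

A return visit to an earlier server counts as a new run, which matches the idea of a "visit" to a server rather than the total number of documents fetched from it.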

SLIDE 13

Server Selection

Four hosts in the top ten ranking are local

Sample size seems small, Zipf’s Law approximation

Rank  #   Location
1     43  *.cs.berkeley.edu
2     11  *.berkeley.edu
3     8   *.*.com
4     7   *.*.com
5     6   *.cs.berkeley.edu
6     6   *.*.com
7     6   www4.*.net
8     6   *.cs.berkeley.edu
9     5   www5.*.com
10    5   www.*.com
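Under a Zipf's Law approximation, the server of popularity rank r is chosen with probability proportional to 1/r. The sketch below shows one way to draw ranks from such a distribution. The `exponent` parameter and the function names are assumptions made for illustration, and this is not necessarily how the author's model selects servers.

```python
import random

def zipf_weights(n_servers, exponent=1.0):
    """Unnormalized Zipf weights: weight of rank r is 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_servers + 1)]

def pick_server_rank(weights, rng):
    """Draw a 1-based popularity rank with probability proportional to its weight."""
    ranks = range(1, len(weights) + 1)
    return rng.choices(ranks, weights=weights, k=1)[0]

weights = zipf_weights(10)
rng = random.Random(0)
draws = [pick_server_rank(weights, rng) for _ in range(10000)]
# Rank 1 should be drawn about twice as often as rank 2.
print(draws.count(1), draws.count(2))
```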

SLIDE 14

Areas for Future Work

Better Server Selection Distribution

  • Filter effects caused by local servers
  • Larger sample size

Persistent-Connection HTTP

  • Can’t use TCP connection sizes to determine request/reply sizes

Correlation between model distributions

  • Do users retrieve more or fewer consecutive documents from “popular” sites?

Newer datasets

SLIDE 15

Summary

Packet traces can help to build a model of HTTP traffic

Characterized HTTP network traffic to build a synthetic workload

Results consistent with prior Web measurement studies

C++ source code available for simulators

For more information and model data:

http://http.cs.berkeley.edu/~bmah/Software/HttpModel