Making AQM Work: Active Queue Management - PowerPoint PPT Presentation

Slide 1

Making AQM Work: An Efficient Alternative to ECN

Long Le, Jay Aikat, Kevin Jeffay, and Don Smith
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

http://www.cs.unc.edu/Research/dirt

October 2003

Slide 2

Making AQM Work

Outline

  • Background: Router-based congestion control
    – Active Queue Management
    – Explicit Congestion Notification
  • State of the art in active queue management (AQM)
    – Control theoretic v. traditional randomized dropping AQM
  • Do AQM schemes work?
    – An empirical study of the effect of AQM on web performance
  • Analysis of AQM performance
    – The case for differential congestion notification (DCN)
  • A DCN prototype and its empirical evaluation

Slide 3

Router-Based Congestion Control

Status quo

  • On the Internet today, packet loss is the end-system's only indication of congestion
  • As a switch's queues overflow, arriving packets are dropped
    – "Drop-tail" FIFO queuing is the default
  • TCP end-systems detect loss and respond by reducing their transmission rate

[Figure: packets P1-P3 queued at a router's FCFS scheduler]

Slide 4

Router-Based Congestion Control

The case against drop-tail queuing

  • Large (full) queues in routers are a bad thing
    – End-to-end latency is dominated by the length of queues at switches in the network
  • Allowing queues to overflow is a bad thing
    – Connections that transmit at high rates can starve connections that transmit at low rates
    – Causes connections to synchronize their response to congestion and become unnecessarily bursty

[Figure: full queue P1-P6 at a FCFS scheduler]

Slide 5

Router-Based Congestion Control

Active queue management (AQM)

  • Key concept: Drop packets before a queue overflows to signal incipient congestion to end-systems
  • Basic mechanism: When the queue length exceeds a threshold, packets are probabilistically dropped
  • Random Early Detection (RED) AQM:
    – Always enqueue if queue length is less than a low-water mark
    – Always drop if queue length is greater than a high-water mark
    – Probabilistically drop/enqueue if queue length is in between

[Figure: FCFS queue P1-P6 divided into "enqueue", "flip a coin", and "always drop" regions]

Slide 6

Active Queue Management

The RED Algorithm [Floyd & Jacobson 93]

[Figure: router queue length and weighted average queue length over time; below the min threshold no packets are dropped, between the min and max thresholds packets face probabilistic early drop, and above the max threshold every arrival is a forced drop]

  • RED computes a weighted moving average of queue length to accommodate bursty arrivals
  • Drop probability is a function of the current average queue length
    – The larger the queue, the higher the drop probability
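The RED policy on this slide can be sketched in a few lines. This is an illustrative sketch, not the authors' router code; the class name and the parameter values (min_th, max_th, max_p, and the EWMA weight) are hypothetical defaults chosen for readability.

```python
import random

class RedQueue:
    """Sketch of RED's per-arrival drop decision."""

    def __init__(self, min_th=5, max_th=15, max_p=0.1, weight=0.002):
        self.min_th, self.max_th = min_th, max_th
        self.max_p, self.weight = max_p, weight
        self.avg = 0.0      # weighted moving average of queue length
        self.queue = []

    def on_arrival(self, pkt):
        # EWMA of the queue length smooths out bursty arrivals
        self.avg = (1 - self.weight) * self.avg + self.weight * len(self.queue)
        if self.avg < self.min_th:
            self.queue.append(pkt)          # no drop
        elif self.avg >= self.max_th:
            pass                            # forced drop
        else:
            # probabilistic early drop: probability grows linearly
            # from 0 at min_th to max_p at max_th
            p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
            if random.random() >= p:
                self.queue.append(pkt)
```

A drop in the middle region is the "flip a coin" step from the previous slide; the averaged (rather than instantaneous) queue length is what lets short bursts through untouched.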

Slide 7

Active Queue Management

The RED Algorithm [Floyd & Jacobson 93]

[Figure: drop probability as a function of weighted average queue length; the probability is 0 below min_th, rises linearly to max_p at max_th, then jumps to 100%. A companion plot repeats the queue-length-over-time diagram with its no-drop, probabilistic-early-drop, and forced-drop regions]

Slide 8

Active Queue Management

Explicit Congestion Notification (ECN)

  • Dropping packets is a simple means of signaling congestion but it's less than ideal
    – It may take a long time for a sender to detect and react to congestion signaled by packet drops
    – There are subtle fairness issues in the way flows are treated
  • ECN: Instead of dropping packets, send an explicit signal back to the sender to indicate congestion
    – (An old concept: ICMP Source Quench, DECbit, ATM, …)

[Figure: FCFS queue P1-P6 with "enqueue", "flip a coin", and "always drop" regions]

Slide 9

Explicit Congestion Notification

Overview

  • Modify a RED router to "mark" packets rather than dropping them
  • Set a bit in a packet's header and forward it towards the ultimate destination
  • A receiver recognizes the marked packet and sets a corresponding bit in the next outgoing ACK

[Figure: data packets and ACKs flowing in both directions through a router's queue and scheduler]

Slide 10

Explicit Congestion Notification

Overview

  • When a sender receives an ACK with ECN it invokes a response similar to that for packet loss:
    – Halve the congestion window cwnd and halve the slow-start threshold ssthresh
    – Continue to use ACK-clocking to pace transmission of data packets

[Figure: router queue and scheduler with a marked ACK returning to the sender]

Slide 11

Explicit Congestion Notification

Overview

  • When a sender receives an ACK with ECN it invokes a response similar to that for packet loss
  • In any given RTT, a sender should react to either ECN or packet loss but not both!
    – Once a response has begun, wait until all outstanding data has been ACKed before beginning a new response

[Figure: router queue and scheduler with a marked ACK returning to the sender]

Slide 16

Explicit Congestion Notification

Putting the pieces together: AQM + ECN

[Figure: queue-length-over-time diagram with no-mark/drop, probabilistic-early-mark/drop, and forced-drop regions]

  • If a RED router detects congestion it will mark arriving packets
  • The router will then forward marked packets from ECN-capable senders…
  • …and drop marked packets from all other senders

Slide 17

Making AQM Work

Outline

  • Background: Router-based congestion control
    – Active Queue Management
    – Explicit Congestion Notification
  • State of the art in active queue management (AQM)
    – Control theoretic v. traditional randomized dropping AQM
  • Do AQM schemes work?
    – An empirical study of the effect of AQM on web performance
  • Analysis of AQM performance
    – The case for differential congestion notification (DCN)
  • A DCN prototype and its empirical evaluation

Slide 18

The State of the Art in AQM

Adaptive/Gentle RED (ARED)

[Figure: mark/drop probability as a function of weighted average queue length; 0 below min_th, rising linearly to max_p at max_th, then a probabilistic "gentle" drop region rising toward 100% at 2×max_th. A companion plot shows router queue length over time with the no-mark/drop, probabilistic-early-mark/drop, gentle-drop, and forced-drop regions]

Slide 19

The State of the Art in AQM

Adaptive/Gentle RED (ARED)

[Figure: the same mark/drop probability curve and queue-length diagram as the previous slide]

  • AIMD adaptation of max_p

Slide 20

The State of the Art in AQM

The Proportional Integral (PI) controller

  • PI attempts to maintain an explicit target queue length

[Figure: router queue length oscillating over time around the target queue reference (qref)]

  • PI samples the instantaneous queue length at fixed intervals and computes a mark/drop probability at the kth sample:
    – p(kT) = a × (q(kT) - qref) - b × (q((k-1)T) - qref) + p((k-1)T)
    – a, b, and T depend on link capacity, maximum RTT, and the number of flows at a router
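The PI update on this slide translates directly into code. The sketch below is illustrative; the default coefficients a and b are placeholders, since (as the slide notes) the tuned values depend on link capacity, RTT, and flow count.

```python
def pi_update(p_prev, q_now, q_prev, qref, a=2e-5, b=1.8e-5):
    """p(kT) = a*(q(kT) - qref) - b*(q((k-1)T) - qref) + p((k-1)T),
    clamped to a valid probability in [0, 1]."""
    p = a * (q_now - qref) - b * (q_prev - qref) + p_prev
    return min(1.0, max(0.0, p))
```

A queue persistently above qref keeps pushing p upward, while the negative term on the previous sample damps oscillation around the target.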

Slide 21

The State of the Art in AQM

Random Exponential Marking (REM)

  • REM is similar to PI (though it differs in details)

[Figure: router queue length oscillating over time around the target queue reference (qref)]

  • REM mark/drop probability depends on:
    – Difference between input and output rate
    – Difference between instantaneous queue length and target
    – p(t) = p(t-1) + γ × (α × (q(t) - qref) + x(t) - c)
    – prob(t) = 1 - φ^(-p(t)), with φ > 1 a constant
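The two REM formulas on this slide combine into a single update step. This sketch is illustrative; q is the instantaneous queue length, x the input rate, c the link capacity, and the constants gamma, alpha, and phi take hypothetical values here.

```python
def rem_update(p_prev, q, qref, x, c, gamma=0.001, alpha=0.1, phi=1.001):
    """p(t) = p(t-1) + gamma*(alpha*(q - qref) + x - c), kept nonnegative;
    the mark/drop probability is prob(t) = 1 - phi**(-p(t))."""
    p = max(0.0, p_prev + gamma * (alpha * (q - qref) + x - c))
    prob = 1.0 - phi ** (-p)
    return p, prob
```

Both a queue above target and an input rate above capacity raise the "price" p, and the exponential map turns that price into a marking probability that approaches 1 as p grows.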

Slide 22

Do AQM Schemes Work?

Evaluation methodology

  • Evaluate AQM schemes through "live simulation"
  • Emulate the browsing behavior of a large population of users surfing the web in a laboratory testbed
    – Construct a physical network emulating a congested peering link between two ISPs
    – Generate synthetic HTTP requests and responses but transmit over real TCP/IP stacks, network links, and switches

[Figure: browsers/servers for ISP1 and ISP2, each behind an Ethernet switch and an edge router, joined by the congested link]

Slide 23

Experimental Methodology

HTTP traffic generation

  • Synthetic web traffic generated using the UNC HTTP model [SIGMETRICS 2001, MASCOTS 2003]

[Figure: timeline of REQ/RESP exchanges between user and server; response time is the interval from request to completed response]

  • Primary random variables:
    – Request sizes/reply sizes
    – User think time
    – Persistent connection usage
    – Number of objects per persistent connection
    – Number of embedded images per page
    – Number of parallel connections
    – Consecutive documents per server
    – Number of servers per page

Slide 24

Experimental Methodology

Testbed emulating an ISP peering link

[Figure: ISP1 and ISP2 browsers/servers on 100 Mbps links to Ethernet switches, each switch connected at 1 Gbps to a FreeBSD router; the two routers are joined by the 100 Mbps congested link; emulated RTTs range over 10-150 ms]

  • AQM schemes implemented in the FreeBSD routers using ALTQ kernel extensions
  • End-systems are either a traffic generation client or server
    – Use dummynet to provide per-flow propagation delays
    – Two-way traffic generated, with equal load in each direction

Slide 25

Experimental Methodology

1 Gbps network calibration experiments

  • Experiments run on a congested 100 Mbps link
  • Primary simulation parameter: Number of simulated browsing users
  • Run calibration experiments on an uncongested 1 Gbps link to relate simulated user populations to average link utilization
    – (And to ensure offered load is linear in the number of simulated users, i.e., that end-systems are not a bottleneck)

[Figure: the same testbed with the 100 Mbps link (experiments) replaced by a 1 Gbps link (calibration)]

Slide 26

Experimental Methodology

1 Gbps network calibration experiments

  • We run experiments at offered loads of 80%, 90%, 98%, and 105% of the capacity of the 100 Mbps link
  • Ex: 98% load means a number of simulated users sufficient to generate 98 Mbps (on average) on the 1 Gbps network
  • Generating 98 Mbps of HTTP traffic requires simulating 9,330 users

[Figure: link throughput (Mbps) as a function of the number of simulated users]

Slide 27

Experimental Methodology

Experimental plan

  • Run experiments with ARED, PI, and REM using their recommended parameter settings at different offered loads
  • Compare results with drop-tail FIFO at the same offered loads…
    – (the "negative" baselines, the performance to beat)
  • …and compare with performance on the 1 Gbps network
    – (the "positive" baseline, the performance to achieve)
  • Redo the experiments with ECN

[Table: drop-tail, ARED, PI, and REM at 80%, 90%, 98%, and 105% load, plus the uncongested baseline; metrics are loss rate, utilization, response times, and completed requests]

Slide 28

Experimental Results

80% Load: Performance with packet drops

[Figure: CDFs of response times; 50% of responses complete in 125 ms or less]

Slide 29

Experimental Results

80% Load: Performance with packet drops

  • No benefit to using PI or REM over drop-tail at 80% load
  • ARED can actually make things worse

Slide 30

Experimental Results

90% Load: Performance with packet drops

  • Drop-tail, PI, & REM equivalent for the shortest 80% of responses
  • PI best overall
  • ARED not competitive

Slide 31

ECN Results

90% Load: Comparison of all schemes

  • PI & REM outperform drop-tail and approximate performance on the uncongested network

Slide 32

Impact of ECN on REM

Performance with/without ECN at 90% load

  • REM performance improved with ECN for qref = 24

Slide 33

Impact of ECN on REM

Performance with/without ECN at 90% load

  • Big improvement for qref = 240

Slide 34

Impact of ECN on ARED

Performance with/without ECN at 90% load

  • ECN has little impact on ARED performance

Slide 35

Do AQM Schemes Work?

Summary

  • For offered loads up to 80% of link capacity, no AQM scheme gives better performance than drop-tail FIFO
    – All give comparable response time performance, loss rates, and link utilization
  • For offered loads of 90% or greater…
    – Without ECN, PI results in a modest performance improvement over drop-tail and other AQM schemes
    – With ECN, both PI and REM provide significant performance improvement over drop-tail
  • ARED consistently results in the poorest performance
    – Often worse than drop-tail FIFO

Slide 36

Discussion

Why does ARED perform so poorly?

  • ARED bases mark/drop probability on the (weighted) average queue length
  • PI and REM use instantaneous measures of queue length
  • ARED's reliance on the average queue length limits its ability to react effectively in the face of bursty traffic

[Figure: weighted queue length (RED) vs. instantaneous queue length (PI/REM) over time]

Slide 37

Discussion

Why does ECN improve REM more than PI?

  • Without ECN, REM drops more packets than PI
  • REM causes more flows to experience multiple losses within a congestion window
    – Loss recovered through timeout rather than fast recovery
  • In general ECN allows more flows to avoid timeouts
    – Thus ECN is ameliorating a design flaw in REM

[Figure: REM performance with and without ECN at 90% load]

Slide 38

Discussion

Why does ARED not benefit from ECN?

  • ARED drops marked packets when the average queue size is above max_th
  • This is done to deal with potentially non-responsive flows
  • We believe this policy is a premature optimization

[Figure: ARED queue-length-over-time diagram with its no-mark/drop, probabilistic-early-mark/drop, gentle-drop, and forced-drop regions]

Slide 39

Discussion

Why does ARED perform so poorly?

  • PI and REM measure queue length in bytes
  • By default RED measures in packets
    – But ARED does have a "byte mode"
  • Drop/mark probability in PI/REM is biased by packet size
    – SYNs and pure ACKs have a lower drop probability in PI/REM
  • Differentiating at the packet level is critical
    – Is it enough?

[Figure: ARED performance with and without ECN at 90% load]

Slide 40

Discussion

Do AQM designs inherently require ECN?

  • Claim: Differentiating between flows at the flow level is important
  • ECN is required for good AQM performance because it eliminates the need for short flows to retransmit (a significant fraction of their) data
    – With ECN, short flows (mostly) no longer retransmit data
    – But their performance is still hurt by AQM
  • Why signal short flows at all?
    – They have no real transmission rate to adapt
    – Hence signaling these flows provides no benefit to the network and only hurts end-system performance

Slide 41

The Structure of Web Traffic

Distribution of response sizes

[Figure: CDF of response sizes from 1 byte to 1 GB; 87% of responses are 10K bytes or less]

Slide 42

The Structure of Web Traffic

Percent of bytes transferred by response size

[Figure: cumulative percent of bytes transferred by response size; objects that are 10K bytes or smaller account for only 20% of all bytes transferred]

Slide 43

Making AQM Work

Overview

  • Background: Router-based congestion control
    – Active Queue Management
    – Explicit Congestion Notification
  • State of the art in active queue management (AQM)
    – Control theoretic v. traditional randomized dropping AQM
  • Do AQM schemes work?
    – An empirical study of the effect of AQM on web performance
  • Analysis of AQM performance
    – The case for differential congestion notification (DCN)
  • A DCN prototype and its empirical evaluation

Slide 44

Realizing Differential Notification

Issues and approach

  • How to identify packets belonging to long-lived, high-bandwidth flows with minimal state?
    – Adopt the Estan & Varghese flow-filtering scheme developed for traffic accounting [SIGCOMM 2002]
  • How to determine when to signal congestion (by dropping packets)?
    – Use a PI-like scheme
  • Differential treatment of flows is an old idea:
    – FRED, SRED, CHOKe, SFB, AFD, RED-PD, RIO-PS, …

Slide 45

Classifying Flows

A score-boarding approach

  • Use two hash tables:
    – A "suspect" flow table HB ("high-bandwidth") and
    – A per-flow packet count table SB ("scoreboard")
    – Hash on the IP addressing 4-tuple plus protocol number
  • Arriving packets from flows in HB are subject to dropping
  • Arriving packets from other flows are inserted into SB and tested to determine if the flow should be considered high-bandwidth
    – Use a simple packet count threshold for this determination

Slide 46

Classifying Flows

A score-boarding approach

[Flowchart: if the arriving packet's flow ID is in HB, mark or drop it probabilistically (enqueue if not dropped); otherwise look the flow up in SB. If the flow is in SB and its last update was within threshold1, increment its pktcount, and copy the flow entry to HB once pktcount ≥ 4; if the last update was not within threshold1, reset pktcount. If the flow is not in SB, overwrite an existing flow entry. Whenever threshold2 time has elapsed since the last decrease, decrease pktcount by pref. Packets that are not dropped are enqueued]
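The flowchart on this slide compresses several details; the sketch below captures only its main path (count packets in SB, promote a flow to HB at a threshold of 4, subject HB flows to probabilistic drops). The timer-driven reset and decay steps (threshold1, threshold2, pref) and the hash-collision policy are omitted, and the constant drop probability here is a hypothetical stand-in for the PI-driven value DCN actually uses.

```python
import random

class DcnClassifier:
    """Two-table classifier sketched from the slide: SB ("scoreboard")
    counts packets per flow; a flow reaching PROMOTE_AT packets is
    copied to HB ("high-bandwidth"), whose packets become drop
    candidates."""

    PROMOTE_AT = 4    # packet-count threshold from the flowchart

    def __init__(self, drop_prob=0.1):
        self.hb = set()    # suspected high-bandwidth flows
        self.sb = {}       # flow id -> packet count
        self.drop_prob = drop_prob

    def on_packet(self, flow_id):
        """Return 'drop' or 'enqueue' for an arriving packet."""
        if flow_id in self.hb:
            return 'drop' if random.random() < self.drop_prob else 'enqueue'
        count = self.sb.get(flow_id, 0) + 1
        if count >= self.PROMOTE_AT:
            self.hb.add(flow_id)       # promote: now subject to dropping
            self.sb.pop(flow_id, None)
        else:
            self.sb[flow_id] = count
        return 'enqueue'
```

Note how the structure realizes the claim of slide 40: a flow that never sends more than a few packets never enters HB, so short flows are never signaled.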

Slide 47

An Alternate Approach

AFD [Pan et al. 2003]

"Approximate Fairness through Differential Dropping"

  • Sample 1 out of every s packets and store it in a shadow buffer of size b
  • Estimate a flow's rate as r_est = R × (# matches / b)
  • Drop the flow's packets with probability p = 1 - r_fair / r_est

[Figure: router queue P1-P3 and scheduler, with a flow table and shadow buffer alongside]

Slide 48

DCN Evaluation

Experimental plan

  • Run experiments with DCN, AFD, and PI at the same offered loads as before
    – PI always uses ECN; test AFD with and without ECN
    – DCN always signals congestion via drops
  • Compare DCN results against…
    – The better of PI or AFD (the performance to beat)
    – The uncongested network (the performance to approximate)

[Table: drop-tail, DCN, AFD, and PI at 80%, 90%, 98%, and 105% load, plus the uncongested baseline; metrics are loss rate, utilization, response times, and completed requests]


49

Experimental Results — 90% Load

DCN performance

  • Performance approximates that on the uncongested network

  • ECN has no effect on performance

50

Experimental Results — 98% Load

DCN performance

  • No congestion collapse effects observed
  • Minor (but expected) improvement with ECN

51

Experimental Results — 98% Load

DCN performance

                      DCN        Uncongested
Loss rate             2.6%       0%
Completed requests    15.1M      16.2M
Throughput            89.5 Mbps  98 Mbps

52

Experimental Results — 90% Load

Comparison of all schemes

All schemes give comparable performance and significantly outperform drop-tail

53

Experimental Results — 98% Load

Comparison of all schemes

  • AFD significantly underperforms DCN and PI/ECN
  • DCN outperforms PI/ECN

54

Experimental Results

AFD performance with & without ECN

  • At 90% offered load, AFD performance is good and unimproved by ECN
  • At 98% load, ECN gives counterintuitive results

55

Experimental Results — 98% Load

Tail of the response time distribution

  • DCN improves response times for 99.5% of all responses
  • The remaining 0.5% experience significantly increased response times

56

Experimental Results — 98% Load

Percentage of bytes transferred by response size

With DCN, objects 10K bytes or smaller account for 25% of all bytes transferred (was 20%)


57

Experimental Results — 98% Load

Percentage of bytes transferred by response size

Objects 100K bytes or smaller account for 70% of all bytes transferred (was 65%)

58

DCN Evaluation

Summary

  • DCN uses a simple, tunable two-tiered classification scheme with:

– Tunable storage overhead
– O(1) complexity with high probability

  • DCN, without ECN, meets or exceeds the performance of the best-performing AQM designs with ECN

– The performance of 99+% of flows is improved
– More small and “medium” flows complete per unit time

  • On heavily congested networks, DCN closely approximates the performance achieved on an uncongested network

59

Making AQM Work

Summary and Conclusions

  • We emulated a peering point between two ISPs and applied AQM in ISP border routers

  • We emulated the browsing behaviors of tens of thousands of users in a laboratory testbed

  • No AQM scheme, with or without ECN, is better than drop-tail FIFO for offered loads up to 80% of link capacity

  • For offered loads of 90% or greater, there is benefit to control theoretic AQM, but only when used with ECN

60

Making AQM Work

Summary and Conclusions

  • The reliance on ECN is required to “improve” (hurt less) the performance of short flows

– 90% of the flows in our HTTP model

  • But in absolute terms, ECN is not helping their performance

  • Heuristically signaling only long-lived, high-bandwidth flows improves the performance of most flows and eliminates the requirement for ECN

– One can operate links carrying HTTP traffic at near-saturation levels with performance approaching that achieved on an uncongested network

  • Identification of short flows can effectively be performed with tunable state and complexity


61

Making AQM Work

Future work

  • More of the same…

– Tuning, tuning, tuning…
– Re-evaluate DCN (and other AQM schemes) with more diverse traffic models (but where do we get these models?)
– Study the effect of non-responsive and malicious flows

  • New and improved…

– Deconstruct AQM and study the performance contribution of constituent components
– Understand the interplay between ECN and AQM components

62

Making AQM Work: An Efficient Alternative to ECN

Long Le, Jay Aikat, Kevin Jeffay, and Don Smith
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

http://www.cs.unc.edu/Research/dirt

October 2003