Randomized Network Algorithms: An Overview and Recent Results



slide-1
SLIDE 1

Balaji Prabhakar

Departments of EE and CS Stanford University

Randomized Network Algorithms: An Overview and Recent Results

slide-2
SLIDE 2

2

Network algorithms

  • Algorithms implemented in networks, e.g. in

– switches/routers: scheduling algorithms, routing lookup, packet classification, security

– memory/buffer managers: maintaining statistics, active queue management, bandwidth partitioning

– load balancers

– web caches: eviction schemes, placement of caches in a network

slide-3
SLIDE 3

3

Network algorithms: challenges

  • Time constraint: Need to make complicated decisions very quickly

– line speeds in the Internet core are 10Gbps (40Gbps in the near future), i.e. packets arrive roughly every 40ns

– large numbers of distinct flows in the Internet core, and of requests arriving per second at large server farms

  • But, there are limited computational resources

– due to rigid space and heat dissipation constraints

  • Algorithms need to be very simple so as to be implementable

– but simple algorithms may perform poorly, if not well-designed
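
As a rough check of the 40ns figure above, the gap between back-to-back packets follows from the packet size and the line rate. The packet sizes used here are assumptions (the slide does not state one), and `packet_interarrival_ns` is a hypothetical helper name.

```python
def packet_interarrival_ns(packet_bytes: int, line_rate_gbps: float) -> float:
    """Time between back-to-back packets of the given size, in nanoseconds."""
    bits = packet_bytes * 8
    # 1 Gbps is 1 bit per nanosecond, so bits / Gbps is already in nanoseconds.
    return bits / line_rate_gbps


if __name__ == "__main__":
    for size in (40, 50, 1500):  # assumed packet sizes in bytes
        for rate in (10.0, 40.0):  # line rates named on the slide, in Gbps
            print(f"{size} B at {rate:g} Gbps: {packet_interarrival_ns(size, rate):g} ns")
```

With small (40 to 50 byte) packets at 10Gbps the gap is a few tens of nanoseconds, which is the time budget any per-packet decision has to fit into.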

slide-4
SLIDE 4

4

IP Routers

  • Cisco GSR 12416: 6ft × 19” × 2ft; capacity: 160Gb/s; power: 4.2kW
  • Juniper M160: 3ft × 19” × 2.5ft; capacity: 80Gb/s; power: 2.6kW

slide-5
SLIDE 5

5

A Detailed Sketch

[Figure: block diagram of a router. Labels: line cards, each with a network processor, a lookup engine and packet buffers; interconnection fabric (switch); output scheduler; outputs.]

slide-6
SLIDE 6

6

Designing network algorithms

  • I will illustrate the use of two ideas for designing efficient network algorithms
  • 1. Randomization

– base decisions upon a small, randomly chosen sample of the state/input, instead of the complete state/input

  • 2. Power law distributions

– Internet packet traces exhibit power law distributions: 80% of the packets belong to 20% of the flows; i.e. most flows are small (mice), and most work is brought by a few elephants

– identifying the large flows cheaply can significantly simplify the implementation

  • Two applications

– switch scheduling

– bandwidth partitioning

slide-7
SLIDE 7

7

Randomization: An illustrative example

  • Find the youngest person from a population of 1 billion
  • Deterministic algorithm: linear search

has a complexity of 1 billion

  • A randomized version: find the youngest of 30 randomly chosen people

has a complexity of 30

  • Performance

linear search will find the absolute youngest person (rank = 1)

if R is the person found by the randomized algorithm, we can only make a probabilistic statement about the rank of R

  • thus, the performance of the randomized algorithm is good with high probability
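
One way to make "good with high probability" concrete, assuming the 30 people are drawn independently and uniformly at random: the sample misses the youngest 10% of the population only if all 30 draws land in the other 90%, which happens with probability 0.9^30, about 0.042. The sketch below computes this and checks it by simulation on ranks (rank 0 = youngest). The 10% threshold, the function names and the smaller simulated population are illustrative assumptions, not taken from the slide.

```python
import random


def miss_probability(top_fraction: float, sample_size: int) -> float:
    """Probability that no sampled person falls in the youngest `top_fraction`."""
    return (1.0 - top_fraction) ** sample_size


def sampled_best_rank(population: int, sample_size: int, rng: random.Random) -> int:
    """Rank (0 = youngest) of the youngest person in a uniform random sample."""
    return min(rng.randrange(population) for _ in range(sample_size))


if __name__ == "__main__":
    rng = random.Random(0)
    population = 1_000_000  # assumed, smaller than 1 billion to keep the run short
    trials = 2000
    hits = sum(sampled_best_rank(population, 30, rng) < population // 10
               for _ in range(trials))
    print("analytic P(R in youngest 10%):", 1.0 - miss_probability(0.1, 30))
    print("simulated fraction:", hits / trials)
```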

slide-8
SLIDE 8

8

Randomizing iterative schemes

  • Often, we want to perform some operation iteratively
  • Example: find the youngest person each year
  • Say in 2007 you choose 30 people at random

– and store the identity of the youngest person in memory

– in 2008 you choose 29 new people at random

– let R be the youngest person from these 29 + 1 = 30 people
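
The iterative scheme above can be sketched as follows. This is a simplified illustration: people are represented by their rank (0 = youngest), ranks are assumed not to change from year to year (births and deaths are ignored), and `youngest_with_memory` is a hypothetical name.

```python
import random


def youngest_with_memory(population: int, k: int, years: int,
                         rng: random.Random) -> list:
    """Each year, compare k - 1 fresh random samples with last year's winner.

    Returns the rank of the remembered person after each year.
    """
    # First year: k fresh samples, nothing in memory yet.
    best = min(rng.randrange(population) for _ in range(k))
    history = [best]
    for _ in range(years - 1):
        fresh = [rng.randrange(population) for _ in range(k - 1)]
        best = min(fresh + [best])  # the remembered person competes again
        history.append(best)
    return history


if __name__ == "__main__":
    print(youngest_with_memory(1_000_000, 30, 10, random.Random(1)))
```

Because the stored winner is never discarded unless someone younger is found, the remembered rank can only improve over time, at the same per-year cost of roughly 30 comparisons.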

slide-9
SLIDE 9

9

Randomized switch scheduling algorithms

joint work with Paolo Giaccone and Devavrat Shah

slide-10
SLIDE 10

10

A Detailed Sketch

[Figure: the router block diagram from slide 5, repeated. Labels: line cards, each with a network processor, a lookup engine and packet buffers; interconnection fabric (switch); output scheduler; outputs.]

slide-11
SLIDE 11

11

Input queued switch

  • Crossbar constraints

– each input can connect to at most one output

– each output can connect to at most one input

[Figure: crossbar fabric connecting inputs 1, 2, 3 to outputs 1, 2, 3.]
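
The two crossbar constraints say that a legal configuration is a matching between inputs and outputs. A minimal check, assuming ports are numbered from 0 and a configuration is given as (input, output) pairs; the function name is hypothetical:

```python
def is_valid_crossbar_schedule(connections, num_ports):
    """Return True if no input and no output appears in more than one connection.

    `connections` is a list of (input, output) pairs with ports numbered from 0.
    """
    inputs = [i for i, _ in connections]
    outputs = [j for _, j in connections]
    in_range = all(0 <= p < num_ports for p in inputs + outputs)
    return (in_range
            and len(set(inputs)) == len(inputs)
            and len(set(outputs)) == len(outputs))


if __name__ == "__main__":
    print(is_valid_crossbar_schedule([(0, 1), (1, 0), (2, 2)], 3))  # a full matching
    print(is_valid_crossbar_schedule([(0, 1), (2, 1)], 3))          # output 1 used twice
```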

slide-12
SLIDE 12

12

Switch scheduling

  • Crossbar constraints

– each input can connect to at most one output

– each output can connect to at most one input

[Figure: crossbar fabric connecting inputs 1, 2, 3 to outputs 1, 2, 3.]

slide-15
SLIDE 15

15

Performance measures

  • Throughput

– an algorithm is stable (or delivers 100% throughput) if, for any admissible arrival process, the average backlog is bounded

  • Average delay or average backlog (queue-size)
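
The slide does not spell out "admissible". In the input-queued switch literature it usually means that no input and no output is oversubscribed: with rates[i][j] the average number of packets per time slot from input i to output j, every row sum and every column sum is below 1. A minimal sketch under that assumed definition (the function name is hypothetical):

```python
def is_admissible(rates):
    """Check that no input (row) or output (column) is offered load >= 1.

    rates[i][j] is the average number of packets per time slot arriving at
    input i and destined for output j.
    """
    n = len(rates)
    row_loads = [sum(row) for row in rates]
    col_loads = [sum(rates[i][j] for i in range(n)) for j in range(n)]
    return all(load < 1.0 for load in row_loads + col_loads)


if __name__ == "__main__":
    uniform = [[0.3] * 3 for _ in range(3)]   # every port loaded to 0.9
    hotspot = [[0.6, 0.0], [0.6, 0.0]]        # output 0 is offered 1.2
    print(is_admissible(uniform), is_admissible(hotspot))
```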
slide-16
SLIDE 16

16

Scheduling: Bipartite graph matching

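[Figure: weighted bipartite graph between inputs and outputs, with edge weights 19, 3, 4, 21, 18, 7, 1; a schedule corresponds to a matching in this graph.]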

slide-17
SLIDE 17

17

Scheduling algorithms

[Figure: weighted bipartite graph with edge weights 19, 3, 4, 21, 18, 7, 1, and three kinds of matching on it.]

  • Practical maximal matchings: not stable
  • Max Wt Matching (edges 19, 18): stable (Tassiulas-Ephremides 92, McKeown et al. 96, Dai-Prabhakar 00)
  • Max Size Matching (edges 19, 1, 7): not stable (McKeown-Anantharam-Walrand 96)

slide-18
SLIDE 18

18

The Maximum Weight Matching Algorithm

  • MWM: performance

– throughput: stable (Tassiulas-Ephremides 92; McKeown et al 96; Dai-Prabhakar 00)

– backlogs: very low on average (Leonardi et al 01; Shah-Kopikare 02)

  • MWM: implementation

– has cubic worst-case complexity (approx. 27,000 iterations for a 30-port switch)

– MWM algorithms involve backtracking, i.e. edges laid down in one iteration may be removed in a subsequent iteration

– so the algorithm is not amenable to pipelining
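
To make the object being computed concrete, here is a brute-force sketch of a maximum weight matching on a small switch, where the weight of an edge is the queue length at that input for that output. It enumerates all N! matchings, so it is only an illustration and not the cubic-time algorithm the slide refers to; the queue matrix and function names are hypothetical.

```python
from itertools import permutations


def matching_weight(queues, matching):
    """Total queue length served when input i is connected to matching[i]."""
    return sum(queues[i][matching[i]] for i in range(len(matching)))


def max_weight_matching(queues):
    """Exhaustive search over all N! perfect matchings (small N only)."""
    n = len(queues)
    return max(permutations(range(n)), key=lambda m: matching_weight(queues, m))


if __name__ == "__main__":
    queues = [[5, 0, 2],   # hypothetical queue lengths, queues[input][output]
              [1, 9, 0],
              [0, 4, 3]]
    best = max_weight_matching(queues)
    print(best, matching_weight(queues, best))
    print("cubic cost for a 30-port switch:", 30 ** 3)
```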
slide-19
SLIDE 19

19

Switch algorithms

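[Figure: the three kinds of matching placed on a scale running from better performance (Max Wt Matching) to easier implementation (maximal matching). Max Wt Matching (edges 19, 18): stable and low backlogs. Max Size Matching (edges 19, 1, 7): not stable. Maximal matching: not stable.]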

slide-20
SLIDE 20

20

Randomized approximation to MWM

  • Consider the following randomized approximation. At every time:

– sample d matchings independently and uniformly at random

– use the heaviest of these d matchings to schedule packets
  • Ideally we would like to use a small value of d. However,…
  • Theorem. This algorithm is not stable even when d = N. In fact,

when d = N, the throughput is at most

(Giaccone-Prabhakar-Shah 02)
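
The sampling scheme described above can be sketched as follows. Matchings are represented as permutations (input i connects to output `matching[i]`), a uniformly random matching is obtained by shuffling, and the queue matrix and function name are hypothetical.

```python
import random


def heaviest_of_d_random(queues, d, rng):
    """Sample d uniform random matchings and return the heaviest with its weight."""
    n = len(queues)
    best, best_weight = None, -1  # queue lengths are non-negative
    for _ in range(d):
        candidate = list(range(n))
        rng.shuffle(candidate)  # uniform random permutation = random matching
        w = sum(queues[i][candidate[i]] for i in range(n))
        if w > best_weight:
            best, best_weight = candidate, w
    return best, best_weight


if __name__ == "__main__":
    queues = [[5, 0, 2], [1, 9, 0], [0, 4, 3]]  # hypothetical queue lengths
    print(heaviest_of_d_random(queues, d=3, rng=random.Random(0)))
```

Each time slot starts from scratch here: nothing learned in one slot is carried into the next, which is the weakness the theorem above points at.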

slide-21
SLIDE 21

21

Tassiulas’ algorithm

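[Figure: at each time, MAX compares the previous matching S(t-1) with a random matching R(t) and outputs the heavier one as the current matching S(t), which is carried over to the next time.]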

slide-22
SLIDE 22

22

Tassiulas’ algorithm: Use past sample

[Figure: previous matching S(t-1) with W(S(t-1)) = 160 and random matching R(t) with W(R(t)) = 150; MAX keeps the heavier of the two as S(t).]

slide-23
SLIDE 23

23

Performance of Tassiulas’ algorithm

Theorem (Tassiulas 98): The above scheme is stable under any admissible Bernoulli IID inputs.
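
One step of the scheme can be sketched as below: draw one random matching and keep whichever of it and the previous matching is heavier under the current queue lengths. The permutation representation, the queue matrix and the function names are assumptions made for illustration.

```python
import random


def weight(queues, matching):
    """Total queue length served when input i is connected to matching[i]."""
    return sum(queues[i][matching[i]] for i in range(len(matching)))


def tassiulas_step(queues, previous, rng):
    """Return the heavier of the previous matching and one fresh random matching."""
    candidate = list(range(len(queues)))
    rng.shuffle(candidate)
    # max() returns its first argument on ties, so the previous matching is kept.
    return max(previous, candidate, key=lambda m: weight(queues, m))


if __name__ == "__main__":
    queues = [[3, 1, 0], [0, 5, 2], [4, 0, 1]]  # hypothetical queue lengths
    schedule = [1, 2, 0]
    rng = random.Random(0)
    for t in range(5):
        schedule = tassiulas_step(queues, schedule, rng)
        print(t, schedule, weight(queues, schedule))
```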

slide-24
SLIDE 24

24

Backlogs under Tassiulas’ algorithm

[Figure: mean IQ length (log scale, 0.01 to 10000) versus normalized load (0.2 to 1), for Tassiulas and MWM.]

slide-25
SLIDE 25

25

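Reducing backlogs: the Merge operation

[Figure: S(t-1) with edge weights 10, 10, 10, 70, 60, so W(S(t-1)) = 160, and R(t) with edge weights 50, 40, 30, 10, 20, so W(R(t)) = 150. Merge compares the weights of the two matchings part by part: 30 v/s 120 and 130 v/s 30.]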

slide-26
SLIDE 26

26

Reducing backlogs: the Merge operation

[Figure: the same S(t-1), W(S(t-1)) = 160, and R(t), W(R(t)) = 150. Merge keeps the heavier side of each comparison (120 and 130), giving W(S(t)) = 250.]

slide-27
SLIDE 27

27

Performance of Merge algorithm

Theorem (GPS): The Merge scheme is stable under any admissible Bernoulli IID inputs.
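
A minimal sketch of a Merge step, assuming matchings are full permutations. The union of two such matchings splits into alternating cycles; on each cycle, both matchings use the same inputs and outputs, so the heavier side can be chosen independently per cycle. The port layout below is hypothetical, arranged only so that the edge weights reproduce the numbers on the slides (160, 150, 30 v/s 120, 130 v/s 30, and 250).

```python
def merge(queues, s, r):
    """Combine matchings s and r, keeping the heavier one on each cycle."""
    n = len(s)
    r_inv = [0] * n
    for i, j in enumerate(r):
        r_inv[j] = i  # which input r connects to output j
    merged = [None] * n
    visited = [False] * n
    for start in range(n):
        if visited[start]:
            continue
        # Walk the cycle: input -(s edge)-> output -(r edge)-> next input.
        cycle, i = [], start
        while not visited[i]:
            visited[i] = True
            cycle.append(i)
            i = r_inv[s[i]]
        w_s = sum(queues[k][s[k]] for k in cycle)
        w_r = sum(queues[k][r[k]] for k in cycle)
        chosen = s if w_s >= w_r else r
        for k in cycle:
            merged[k] = chosen[k]
    return merged


if __name__ == "__main__":
    queues = [[0] * 5 for _ in range(5)]
    s = [0, 1, 2, 3, 4]                      # S(t-1), weights 10, 10, 10, 70, 60
    r = [1, 2, 0, 4, 3]                      # R(t), weights 50, 40, 30, 10, 20
    for i, w in enumerate([10, 10, 10, 70, 60]):
        queues[i][s[i]] = w
    for i, w in enumerate([50, 40, 30, 10, 20]):
        queues[i][r[i]] = w
    m = merge(queues, s, r)
    print(m, sum(queues[i][m[i]] for i in range(5)))
```

By construction the merged matching is at least as heavy as either input, at a cost linear in the number of ports.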

slide-28
SLIDE 28

28

Merge v/s Max

[Figure: mean IQ length (log scale, 0.01 to 10000) versus normalized load (0.2 to 1), for Tassiulas, Merge and MWM.]

slide-29
SLIDE 29

29

Use arrival information: Serena

[Figure: the previous matching S(t-1), with W(S(t-1)) = 209, shown together with the arrival graph.]

slide-30
SLIDE 30

30

Use arrival information: Serena

[Figure: the previous matching S(t-1), with W(S(t-1)) = 209, and the arrival graph after pruning.]

slide-31
SLIDE 31

31

Use arrival information: Serena

[Figure: S(t-1), with W(S(t-1)) = 209, is merged with a matching derived from the arrivals (W = 121). The Merge output S(t) has edge weights 89, 3, 23, 31, 97, so W(S(t)) = 243.]

slide-32
SLIDE 32

32

Performance of Serena algorithm

Theorem (GPS): The Serena algorithm is stable under any admissible Bernoulli IID inputs.

slide-33
SLIDE 33

33

Backlogs under Serena

[Figure: mean IQ length (log scale, 0.01 to 10000) versus normalized load (0.2 to 1), for Tassiulas, Merge, Serena and MWM.]

slide-34
SLIDE 34

Bandwidth partitioning

(jointly with R. Pan, C. Psounis, C. Nair, B. Yang)

slide-35
SLIDE 35

35

The Setup

  • A congested network with many users
  • Problems:

– allocate bandwidth fairly

– control queue size and hence delay

slide-36
SLIDE 36

36

Approach 1: Network-centric

  • Network node: fair queueing
  • User traffic: any type
  • problem: complex implementation
slide-37
SLIDE 37

37

Approach 2: User-centric

  • Network node: simple FIFO
  • User traffic: responsive to congestion (e.g. TCP)
  • problem: requires user cooperation
  • For example, if the red source blasts away, it will get all of the link’s bandwidth
  • Question: Can we prevent a single source (or a small number of sources) from hogging all the bandwidth, without explicitly identifying the rogue source?

  • We will deal with full-scale bandwidth partitioning later
slide-38
SLIDE 38

38

A Randomized Algorithm: First Cut

  • Consider a single link shared by 1 unresponsive (red) flow and k distinct responsive (green) flows
  • Suppose the buffer gets congested
  • Observe: It is likely there are more packets from the red (unresponsive) source
  • So if a randomly chosen packet is evicted, it will likely be a red packet
  • Therefore, one algorithm could be: when the buffer is congested, evict a randomly chosen packet

slide-39
SLIDE 39

39

Comments

  • Unfortunately, this doesn’t work because there is a small non-zero chance of evicting a green packet
  • Since green sources are responsive, they interpret the packet drop as a congestion signal and back off
  • This only frees up more room for red packets
slide-40
SLIDE 40

40

Randomized algorithm: Second attempt

  • Suppose we choose two packets at random from the queue and compare their ids; it is quite unlikely that both will come from the same green flow
  • This suggests another algorithm: choose two packets at random and drop them both if their ids agree
  • This works: that is, it limits the maximum bandwidth the red source can consume
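
A small sketch of why comparing two packets singles out the red flow. If a fraction f of the queued packets is red and the rest is split evenly over k green flows, two independently chosen packets are both red with probability f^2, but share a green flow with probability only (1 - f)^2 / k. The even split, the values f = 0.5 and k = 32, and the function names are assumptions for illustration; `compare_and_drop` picks two distinct queue positions, so the formulas are approximations for a long queue.

```python
import random


def match_probabilities(red_fraction: float, k: int):
    """(P[both packets red], P[both packets from the same green flow])."""
    green_each = (1.0 - red_fraction) / k
    return red_fraction ** 2, k * green_each ** 2


def compare_and_drop(queue, rng):
    """Pick two queued packets at random; drop both if their flow ids agree."""
    if len(queue) < 2:
        return list(queue)
    a, b = rng.sample(range(len(queue)), 2)
    if queue[a] == queue[b]:
        return [pkt for idx, pkt in enumerate(queue) if idx not in (a, b)]
    return list(queue)


if __name__ == "__main__":
    both_red, same_green = match_probabilities(0.5, 32)
    print("P(both red) =", both_red, " P(same green flow) =", same_green)
    queue = ["red"] * 16 + [f"green{i}" for i in range(16)]
    print(len(compare_and_drop(queue, random.Random(0))))
```

Unlike single random eviction, a green flow is hit only when two of its own packets are drawn, so the penalty falls almost entirely on the flow that dominates the buffer.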

slide-41
SLIDE 41

41

Simulation Comparison: The setup

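[Figure: simulation topology. TCP sources S(1) … S(m) and UDP sources S(m+1) … S(m+n) connect over 10Mbps links to router R1; R1 and R2 are joined by a 1Mbps link; R2 connects over 10Mbps links to TCP sinks D(1) … D(m) and UDP sinks D(m+1) … D(m+n).]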

slide-42
SLIDE 42

42

1 UDP source and 32 TCP sources

[Figure: UDP throughput (Kbps) versus UDP arrival rate (Kbps, log scale from 100 to 10000), for RED and CHOKe.]

slide-43
SLIDE 43

43

A Fluid Analysis

discards from the queue

permeable tube with leakage

slide-44
SLIDE 44

44

The Equation

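$$L_i(t)\,\delta t - L_i(t+\delta t)\,\delta t = \lambda_i\,\delta t\,\frac{L_i(t)\,\delta t}{N} \quad\Longrightarrow\quad \frac{dL_i(t)}{dt} = -\frac{\lambda_i L_i(t)}{N}$$

  • Boundary Conditions

$$L_i(0) = \lambda_i\,(1 - p_i); \qquad L_i(D) = \lambda_i\,(1 - 2p_i)$$

$$p_i = \frac{1}{N}\int_0^D L_i(t)\,dt$$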
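
A short worked step, under the assumption that the model reads dL_i/dt = −λ_i L_i / N with L_i(0) = λ_i(1 − p_i) and L_i(D) = λ_i(1 − 2p_i), where λ_i is taken to be the arrival rate of flow i, N the number of packets in the queue, D the queueing delay, and p_i the probability that an arriving packet of flow i finds a match (these readings of the symbols are assumptions). The differential equation is linear, so it integrates directly:

```latex
\begin{align*}
L_i(t) &= L_i(0)\, e^{-\lambda_i t / N} = \lambda_i (1 - p_i)\, e^{-\lambda_i t / N}, \\
L_i(D) = \lambda_i (1 - 2 p_i) \;&\Longrightarrow\; (1 - p_i)\, e^{-\lambda_i D / N} = 1 - 2 p_i .
\end{align*}
```

The second line ties the matching probability p_i to the delay D; it is a consequence of the assumed equations rather than a formula quoted from the slide.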

slide-45
SLIDE 45

45

Simulation Comparison: 1UDP, 32 TCPs

[Figure: throughput versus arrival rate (log scale, 0.1 to 10), comparing the fluid model with the CHOKe ns simulation.]

slide-46
SLIDE 46

46

Complete bandwidth partitioning

  • We have just seen how to prevent a small number of sources from hogging all the bandwidth
  • However, this is far from ideal fairness

– but approaching ideal bandwidth partitioning seems very costly (recall the fair queueing algorithm)

slide-47
SLIDE 47

47

Our approach: Exploit power laws

  • Most flows are very small (mice), most bandwidth is consumed by a few large (elephant) flows: simply partition the bandwidth amongst the elephant flows
  • New problem: Quickly (automatically) identify elephant flows, allocate bandwidth to them

slide-48
SLIDE 48

48

Detecting large (elephant) flows

  • Detection:

– Flip a coin with bias p (= 0.1, say) for heads on each arriving packet, independently from packet to packet

– A flow is “sampled” if one of its packets has a head on it

  • A flow of size X has roughly a 0.1X chance of being sampled

– flows with fewer than 5 packets are sampled with prob. less than 0.5

– flows with more than 10 packets are sampled with prob. close to 1 under this rough estimate

  • Most mice will not be sampled, most elephants will be
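
The 0.1X figure is a first-order estimate. With independent coin flips, a flow of X packets is sampled with probability exactly 1 − (1 − p)^X, which the sketch below compares with the estimate pX (capped at 1). The function names and the flow sizes printed are illustrative choices.

```python
def prob_sampled(flow_size: int, p: float = 0.1) -> float:
    """Exact probability that at least one of the flow's packets comes up heads."""
    return 1.0 - (1.0 - p) ** flow_size


def prob_sampled_estimate(flow_size: int, p: float = 0.1) -> float:
    """The rough estimate p * X, capped at 1."""
    return min(1.0, p * flow_size)


if __name__ == "__main__":
    for size in (1, 2, 5, 10, 20, 50, 100):
        print(f"X = {size:3d}: exact {prob_sampled(size):.3f}, "
              f"estimate {prob_sampled_estimate(size):.3f}")
```

The exact value is always below the estimate: a 5-packet flow is sampled about 41% of the time and a 10-packet flow about 65% of the time, while flows of 50 packets or more are caught almost surely.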

H H T T T T T T T T T T H H

slide-49
SLIDE 49

49

The AFD Algorithm

[Figure: block diagram with labels D_i, Data Buffer, and Flow Table (Elephant Trap).]

  • AFD is a randomized algorithm

– joint with Rong Pan, Flavio Bonomi, Lee Breslau, Bob Olsen and Scott Shenker

  • Current implementation plans at Cisco; 5 platforms

– Apex-Chopper NPU based SPAs for GSR12000, and 7600

– Next generation MAC ASICs for 6500, and DC3

– Cat 3K wireless service cards

slide-50
SLIDE 50

50

Conclusions

  • Efficient network hardware design poses a lot of interesting algorithmic problems, mainly because of very tight constraints

  • Simple algorithms are needed
  • We’ve seen that randomization and power laws can be exploited