SLIDE 1

Low-Latency Scheduling in Large Switches

Wladek Olesinski Hans Eberle Nils Gura

Sun Microsystems Laboratories

Andres Mejia

Universidad Politecnica de Valencia

SLIDE 2

2 12/3/07

Context

  • Large switch fabrics with hundreds of ports
  • Designing single-stage switch architectures at Sun Labs based on a high-speed chip-to-chip I/O technology
  • Scheduling is challenging

> Most schemes exhibit quadratic growth in complexity (area/time/bandwidth) with #ports
> High data rates (10 Gb/s per port and up)
> Short packets/cells
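High data rates and short cells squeeze the scheduler because one schedule is needed per cell time. The arithmetic below is a small sketch that assumes a 64-byte cell (a hypothetical size; the slides do not give one) and uses 40 Gb/s only as an example of "and up". It also shows how an N x N structure grows with the port count.

```python
def cell_time_ns(cell_bytes: int, rate_gbps: float) -> float:
    """Time to transmit one cell on one port, in nanoseconds."""
    bits = cell_bytes * 8
    return bits / rate_gbps  # 1 Gb/s == 1 bit per ns


def quadratic_elements(ports: int) -> int:
    """Number of crosspoint elements in an N x N scheduler structure."""
    return ports * ports


if __name__ == "__main__":
    # ASSUMPTION: 64-byte cells; 10 Gb/s is from the slide, 40 Gb/s is an example.
    for rate in (10, 40):
        print(f"{rate} Gb/s: {cell_time_ns(64, rate):.1f} ns per cell")
    for n in (32, 256, 1024):
        print(f"N={n}: {quadratic_elements(n)} elements")
```

Under these assumptions a 64-byte cell lasts 51.2 ns at 10 Gb/s, so a new schedule is needed roughly every 51 ns, while a 256-port N x N structure already has 65,536 elements.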

SLIDE 3

Conflicting Goals – Low Latency and High Throughput

  • Low latency under low load
  • High throughput under high load

SLIDE 4

Existing Schemes

  • Iterative algorithms (e.g., PIM, iSLIP, DRRM)

> Multiple exchanges of requests and grants
> Not sufficient for large switches because of the time and bandwidth these exchanges require and because of their computational complexity

  • Pipelining iterative schemes (e.g., PMM)

> Subschedulers process several sets of requests concurrently
> In every slot, one of the subschedulers produces a schedule
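The request/grant exchange that iterative schemes repeat can be illustrated with iSLIP. The code below is a simplified software model of the textbook iSLIP rules (round-robin grant and accept pointers, with pointers updated only for matches made in the first iteration). It was not written by the presenters. Each extra iteration is another round of request and grant messages, and that cost is what becomes prohibitive with hundreds of ports.

```python
from typing import List, Optional


def islip(requests: List[List[bool]], grant_ptr: List[int],
          accept_ptr: List[int], iterations: int) -> List[Optional[int]]:
    """Return match[i] = output matched to input i (or None).

    requests[i][j] is True if input i has a cell queued for output j.
    grant_ptr[j] / accept_ptr[i] are round-robin pointers, updated in place.
    """
    n = len(requests)
    match_in: List[Optional[int]] = [None] * n   # input -> output
    match_out: List[Optional[int]] = [None] * n  # output -> input
    for it in range(iterations):
        # Request + grant: each unmatched output grants the requesting,
        # still-unmatched input at or after its grant pointer.
        grants: List[List[int]] = [[] for _ in range(n)]
        for j in range(n):
            if match_out[j] is not None:
                continue
            for k in range(n):
                i = (grant_ptr[j] + k) % n
                if match_in[i] is None and requests[i][j]:
                    grants[i].append(j)
                    break
        # Accept: each input accepts the grant closest to its accept pointer.
        for i in range(n):
            if not grants[i]:
                continue
            j = min(grants[i], key=lambda o: (o - accept_ptr[i]) % n)
            match_in[i] = j
            match_out[j] = i
            if it == 0:
                # iSLIP moves pointers one past the partner, first iteration only.
                accept_ptr[i] = (j + 1) % n
                grant_ptr[j] = (i + 1) % n
    return match_in
```

With every input requesting every output of a 2 x 2 switch, one iteration matches only input 0; a second iteration is needed to match input 1.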

SLIDE 5

PWWFA - High Throughput

  • Parallel version of Tamir/Chi's wrapped wave front arbiter (WWFA); PWWFA was presented at HPSR 2007
  • Scheduling time grows linearly with #ports
  • Multiple subschedulers work in parallel to increase throughput
  • Straightforward hardware implementation: (#ports)^2 elements, but each is small and the structure is regular
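The wrapped wave front idea can be modeled in software. Cells (i, j) with the same value of (i + j) mod N form a wrapped diagonal, or "wave". No two cells in a wave share a row or a column, so a whole wave can be decided in one step, and N waves complete a schedule. This is why time grows linearly with the port count. The sketch below is a sequential model based on that reading. The function name `wwfa` and the `start_wave` parameter are hypothetical, and hardware would evaluate each wave's cells in parallel rather than in a loop.

```python
from typing import List, Optional


def wwfa(requests: List[List[bool]], start_wave: int) -> List[Optional[int]]:
    """Sequential model of one wrapped wave front arbiter pass.

    requests[i][j] is True if input i requests output j.
    Returns match[i] = output granted to input i, or None.
    """
    n = len(requests)
    match: List[Optional[int]] = [None] * n
    out_taken = [False] * n
    for step in range(n):                      # N waves -> time linear in N
        wave = (start_wave + step) % n
        # Cells with (i + j) mod n == wave share no row or column,
        # so all of them can be evaluated in the same time step.
        for i in range(n):
            j = (wave - i) % n
            if requests[i][j] and match[i] is None and not out_taken[j]:
                match[i] = j
                out_taken[j] = True
    return match
```

When two inputs compete for one output, changing `start_wave` changes which input wins. This illustrates why rotating the starting wave between scheduling cycles helps fairness.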

SLIDE 6

Parallel Wrapped Wave Front Arbiter

[Figure: PWWFA request matrix; axes labeled Input Port and Output Port]

  • Time NT: 1st schedule ready
  • Time NT+T: 2nd schedule ready
  • ...and so on: in the next 2 periods of T, the other 2 subschedulers produce schedules

To provide fairness, subschedulers start at different waves in every scheduling cycle.
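The staggered timing on this slide can be tabulated. This is a minimal sketch that assumes T is the time to evaluate one wave, that a schedule needs N waves (hence NT), and that S subschedulers are started one period T apart, as in the slide's example with 4 subschedulers. The rotation rule `(cycle + k) % n_waves` is a hypothetical choice for illustration. The slides say only that subschedulers start at different waves in every cycle.

```python
from typing import List


def ready_times(n_waves: int, n_sub: int, t: float) -> List[float]:
    """Time at which each staggered subscheduler's first schedule is ready."""
    # Subscheduler k starts k*T after subscheduler 0 and needs N*T to finish.
    return [n_waves * t + k * t for k in range(n_sub)]


def start_wave(cycle: int, k: int, n_waves: int) -> int:
    """HYPOTHETICAL rotation: a different start wave per subscheduler and cycle."""
    return (cycle + k) % n_waves


if __name__ == "__main__":
    print(ready_times(4, 4, 1.0))
    print([start_wave(0, k, 4) for k in range(4)])
```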

SLIDE 7

Fast Scheduler – Low Latency

  • Enhance the PWWFA schedule with grants to the most recent requests
  • Observation: scheduling conflicts are rare under low load
  • The fast scheduler can therefore be simple, but it should have very low latency
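A rough model shows why conflicts are rare at low load. It simplifies the setup and is not the authors' simulation model. In each slot, each of the N inputs is assumed to issue one new request with probability p, addressed uniformly to one of N outputs. An output has a conflict when it receives two or more new requests. The binomial probability of that is 1 - (1 - p/N)^N - N(p/N)(1 - p/N)^(N-1).

```python
def conflict_probability(n: int, p: float) -> float:
    """P(a given output receives >= 2 new requests in one slot).

    ASSUMPTION: each of n inputs sends one request with probability p,
    to an output chosen uniformly at random, independently of the others.
    """
    q = p / n                                  # P(an input targets this output)
    none = (1 - q) ** n                        # zero requests
    exactly_one = n * q * (1 - q) ** (n - 1)   # exactly one request
    return 1 - none - exactly_one


if __name__ == "__main__":
    for load in (0.1, 0.5, 0.9):
        print(f"load {load:.1f}: {conflict_probability(256, load):.4f}")
```

With N = 256 this model gives a conflict at a given output about 0.5% of the time at 10% load, compared with about 23% at 90% load.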

SLIDE 8

Fast Scheduler – Two Schemes

  • Remove All

> For a given output, give a grant if there is exactly one request
> In case of conflict, defer all requests to PWWFA

  • Leave One

> In case of conflict, grant one of the conflicting requests and defer the other requests to PWWFA
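Both policies can be sketched for a single slot. This sketch makes several assumptions. Each input presents at most one fast-scheduler request (its most recent one). Inputs and outputs already used by the PWWFA schedule are masked out through enable flags, which the input/output enable signals on the implementation slides suggest. Leave One picks its winner with a round-robin pointer. The slides do not say how Leave One chooses, so `rr_ptr` and `fast_schedule` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple


def fast_schedule(fs_req: List[Optional[int]], in_en: List[bool],
                  out_en: List[bool], leave_one: bool,
                  rr_ptr: int = 0) -> Tuple[Dict[int, int], List[int]]:
    """Return (grants as input->output, conflicting inputs deferred to PWWFA).

    fs_req[i] is the output of input i's most recent request, or None.
    Requests to disabled inputs/outputs are ignored here (left for PWWFA).
    """
    by_output: Dict[int, List[int]] = {}
    for i, j in enumerate(fs_req):
        if j is not None and in_en[i] and out_en[j]:
            by_output.setdefault(j, []).append(i)
    grants: Dict[int, int] = {}
    deferred: List[int] = []
    for j, inputs in by_output.items():
        if len(inputs) == 1:
            grants[inputs[0]] = j                    # exactly one request: grant
        elif leave_one:
            # HYPOTHETICAL tie-break: first input at or after rr_ptr wins.
            winner = min(inputs, key=lambda i: (i - rr_ptr) % len(fs_req))
            grants[winner] = j
            deferred += [i for i in inputs if i != winner]
        else:
            deferred += inputs                       # Remove All: defer everyone
    return grants, sorted(deferred)
```

For example, if inputs 0 and 1 both request output 0, Remove All grants neither and defers both. Leave One grants one of them and defers only the other.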

SLIDE 9

Performance – 10% to 70% Load

  • Bernoulli traffic, N=256
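The slides do not define the traffic generators. The sketch below assumes the common definition of Bernoulli (i.i.d.) uniform traffic: in every slot, each input independently receives a new cell with probability equal to the offered load, and the cell goes to an output chosen uniformly at random. The function name `bernoulli_slot` is hypothetical.

```python
import random
from typing import List, Optional


def bernoulli_slot(n: int, load: float, rng: random.Random) -> List[Optional[int]]:
    """Destinations of cells arriving at each of n inputs in one slot (None = no cell)."""
    return [rng.randrange(n) if rng.random() < load else None for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    slots = [bernoulli_slot(256, 0.7, rng) for _ in range(1000)]
    arrivals = sum(d is not None for s in slots for d in s)
    print(arrivals / (256 * 1000))  # empirical load, close to 0.7
```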

SLIDE 10

Performance – 70% to 80% Load

  • Bernoulli traffic, N=256

SLIDE 11

Fraction of the Schedule Filled by FS

  • Bernoulli traffic, N=256

SLIDE 12

Performance – 10% to 70% Load

  • On-off traffic, N=256
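On-off traffic is often modeled as a two-state Markov source per input. While a burst is "on", the source emits one cell per slot to a fixed destination, and burst and idle lengths are geometrically distributed. The sketch assumes that model. `mean_burst` is a hypothetical parameter because the slides give no burst lengths. The off-to-on probability load / (mean_burst * (1 - load)) is chosen so that the long-run fraction of "on" slots equals the target load.

```python
import random
from typing import Optional


class OnOffSource:
    """Two-state (on/off) Markov traffic source for one input port."""

    def __init__(self, n: int, load: float, mean_burst: float, rng: random.Random):
        self.n, self.rng = n, rng
        self.p_end = 1.0 / mean_burst                       # on -> off per slot
        self.p_start = load / (mean_burst * (1.0 - load))   # off -> on per slot
        assert 0.0 < self.p_start <= 1.0, "load too high for this burst length"
        self.on, self.dest = False, 0

    def step(self) -> Optional[int]:
        """Return this slot's destination (None if idle), then advance the state."""
        out = self.dest if self.on else None
        if self.on:
            if self.rng.random() < self.p_end:
                self.on = False
        elif self.rng.random() < self.p_start:
            self.on = True
            self.dest = self.rng.randrange(self.n)   # one destination per burst
        return out
```

Every burst is preceded by at least one idle slot, so consecutive cells from a source always belong to the same burst and share a destination.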

SLIDE 13

Implementation – Remove All

[Figure: Remove All circuit; signals labeled Input enable, FS request, Grant, Output enable]
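The core of Remove All is an "exactly one request" test for each output. One way to get logarithmic depth uses a binary tree in which each subtree carries two bits: `any` (at least one request) and `many` (at least two). Two subtrees combine as any = a1 OR a2 and many = m1 OR m2 OR (a1 AND a2). The output is granted when any AND NOT many. This is an illustrative assumption and not necessarily the circuit on the slide. The bit-level model below checks the rule and makes no claim about matching the gate counts on the complexity slide.

```python
from typing import List, Tuple


def combine(left: Tuple[bool, bool], right: Tuple[bool, bool]) -> Tuple[bool, bool]:
    """Merge (any, many) summaries of two disjoint groups of request lines."""
    a1, m1 = left
    a2, m2 = right
    return (a1 or a2, m1 or m2 or (a1 and a2))


def exactly_one(requests: List[bool]) -> Tuple[bool, int]:
    """Return (exactly one request present?, number of tree levels used)."""
    level = [(r, False) for r in requests]
    depth = 0
    while len(level) > 1:
        nxt = [combine(level[k], level[k + 1]) for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an odd leftover node passes through unchanged
            nxt.append(level[-1])
        level = nxt
        depth += 1
    any_req, many = level[0]
    return any_req and not many, depth
```

For 256 request lines the tree has 8 levels. In this model each level costs about two gate delays (the AND feeding an OR), so the depth grows roughly as 2 log2(N). That scaling has the same form as the Remove All longest path quoted on the complexity slide.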

SLIDE 14

Implementation – Leave One

[Figure: Leave One circuit; signals labeled Input enable, FS request, Grant, Output enable]

SLIDE 15

Implementation – Leave One

SLIDE 16

Implementation – Complexity

  • Remove All

> 6N^2 - 3N two-input gates for N ports
> Longest path 2 log2(N) + 4, max. fanout N
> 392,448 gates for 256 ports

  • Leave One

> 12N^2 - 10N two-input gates and N^2 - N register bits for N ports
> Longest path 3 log2(N) + 3, max. fanout 3
> 783,872 gates and 65,280 register bits for 256 ports
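The 256-port figures can be checked against the general formulas. The functions below only evaluate the expressions stated on this slide. The function names are illustrative, and log2(N) is taken as ceil(log2 N), which is exact when N is a power of two.

```python
def _log2_ceil(n: int) -> int:
    """ceil(log2(n)) for n >= 2; exact log2 when n is a power of two."""
    return (n - 1).bit_length()


def remove_all_cost(n: int) -> dict:
    """Complexity formulas quoted for the Remove All fast scheduler."""
    return {"gates": 6 * n * n - 3 * n,
            "longest_path": 2 * _log2_ceil(n) + 4,
            "max_fanout": n}


def leave_one_cost(n: int) -> dict:
    """Complexity formulas quoted for the Leave One fast scheduler."""
    return {"gates": 12 * n * n - 10 * n,
            "register_bits": n * n - n,
            "longest_path": 3 * _log2_ceil(n) + 3,
            "max_fanout": 3}


if __name__ == "__main__":
    print(remove_all_cost(256))
    print(leave_one_cost(256))
```

For N = 256 the formulas reproduce the 392,448 and 783,872 gate counts and the 65,280 register bits. They also give longest paths of 20 and 27 in the slide's units. Leave One therefore costs roughly twice the gates of Remove All but has a fanout of 3 instead of N.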

SLIDE 17

Conclusions and Future Work

  • Conclusions

> Presented a hybrid PWWFA-FS scheduler that provides low latency and high throughput
> Simulation results are encouraging
> Hardware implementation is feasible with reasonable complexity

  • Future work

> Hardware implementation and characterization of the Fast Scheduler
> Improve fairness

SLIDE 18

Nils Gura

Nils.Gura@sun.com