Low-Latency Scheduling in Large Switches
Wladek Olesinski, Hans Eberle, Nils Gura (Sun Microsystems Laboratories)
Andres Mejia (Universidad Politecnica de Valencia)

Context
• Large switch fabrics with hundreds of ports
• Designing single-stage switch architectures at Sun Labs based on a high-speed chip-to-chip I/O technology
• Scheduling is challenging
  > Most schemes exhibit quadratic growth in complexity (area/time/bandwidth) with #ports
  > High data rates (10 Gb/s per port and up)
  > Short packets/cells
2 12/3/07

Conflicting Goals – Low Latency and High Throughput
• High throughput under high load
• Low latency under low load

Existing Schemes
• Iterative algorithms (e.g., PIM, iSLIP, DRRM)
  > Multiple exchanges of requests and grants
  > Not sufficient for large switches because of
    > time and bandwidth
    > computational complexity
• Pipelining iterative schemes (e.g., PMM)
  > Subschedulers process several sets of requests concurrently
  > In every slot one of the subschedulers produces a schedule

PWWFA - High Throughput
• Parallel version of Tamir/Chi's wrapped wave front arbiter; PWWFA presented at HPSR 2007
• Scheduling time grows linearly with #ports
• Multiple subschedulers working in parallel to increase throughput
• Straightforward hardware implementation; (#ports)^2 elements, but small and of regular structure

Parallel Wrapped Wave Front Arbiter
[Figure; axis labels: Input Port, Output Port]
• Time NT: 1st schedule ready
• Time NT+T: 2nd schedule ready
...and so on. In the next 2 periods of T, the other 2 subschedulers produce schedules.
To provide fairness, subschedulers start at different waves in every scheduling cycle.
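The slides do not spell out the arbitration step itself, so the following is only a minimal sketch of a basic (non-parallel) wrapped wave front arbiter as it is commonly described: cells on the same wrapped diagonal of the request matrix share no input and no output, so a whole wave can be decided at once. The function name `wwfa_schedule`, the `requests` matrix layout, and the `start_wave` parameter are assumptions made for illustration; the parallel subschedulers of PWWFA are not modeled.

```python
def wwfa_schedule(requests, start_wave=0):
    """Sketch of basic wrapped wave front arbitration (not the parallel PWWFA).

    requests[i][j] is True when input i has a cell queued for output j.
    Returns a dict mapping each granted input to its output.
    """
    n = len(requests)
    input_free = [True] * n
    output_free = [True] * n
    grants = {}
    for step in range(n):
        wave = (start_wave + step) % n
        # A wave is one wrapped diagonal: all cells with (i + j) mod n == wave.
        # No two cells of a wave share a row or a column.
        for i in range(n):
            j = (wave - i) % n
            if requests[i][j] and input_free[i] and output_free[j]:
                grants[i] = j
                input_free[i] = False
                output_free[j] = False
    return grants


if __name__ == "__main__":
    reqs = [[True, True],
            [True, False]]
    # Starting at a different wave can change which requests win.
    print(wwfa_schedule(reqs, start_wave=0))
    print(wwfa_schedule(reqs, start_wave=1))
```

In this toy 2x2 case, starting at wave 0 serves only input 0, while starting at wave 1 serves both inputs, which illustrates why the starting wave matters for fairness.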

Fast Scheduler – Low Latency
• Enhance PWWFA schedule with grants to most recent request
• Observed that scheduling conflicts are rare under low load
• Fast scheduler can thus be simple, but should have very low latency

Fast Scheduler – Two Schemes
• Remove All
  > For a given output, give a grant if there is exactly one request
  > In case of conflict, defer all requests to PWWFA
• Leave One
  > In case of conflict, grant one of the conflicting requests and defer other requests to PWWFA
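The two schemes can be sketched in software as follows. This is an illustrative model, not the hardware design from the slides: the function name `fast_schedule`, the `(input, output)` request format, and the rule for picking the winner under Leave One (lowest-numbered input) are assumptions, since the slides do not say how the surviving request is chosen. Each input is assumed to submit at most one request per slot.

```python
def fast_schedule(requests, scheme="remove_all"):
    """Sketch of the Fast Scheduler conflict rules.

    requests: list of (input, output) pairs, at most one per input (assumed).
    Returns (grants, deferred): grants maps output -> input, and deferred
    lists the requests left for PWWFA.
    """
    if scheme not in ("remove_all", "leave_one"):
        raise ValueError("scheme must be 'remove_all' or 'leave_one'")

    # Group the requesting inputs by the output they ask for.
    by_output = {}
    for inp, out in requests:
        by_output.setdefault(out, []).append(inp)

    grants, deferred = {}, []
    for out, inputs in by_output.items():
        if len(inputs) == 1:
            # Exactly one request for this output: grant it under both schemes.
            grants[out] = inputs[0]
        elif scheme == "remove_all":
            # Conflict: defer every conflicting request.
            deferred.extend((i, out) for i in inputs)
        else:
            # Conflict: grant one request, defer the rest.
            winner = min(inputs)  # assumption: lowest-numbered input wins
            grants[out] = winner
            deferred.extend((i, out) for i in inputs if i != winner)
    return grants, deferred


if __name__ == "__main__":
    reqs = [(0, 2), (1, 2), (3, 1)]  # inputs 0 and 1 conflict on output 2
    print(fast_schedule(reqs, "remove_all"))
    print(fast_schedule(reqs, "leave_one"))
```

With these requests, Remove All grants only output 1 and defers both requests for output 2, while Leave One additionally grants output 2 to one of the conflicting inputs.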

Performance – 10% to 70% Load
• Bernoulli traffic, N=256

Performance – 70% to 80% Load
• Bernoulli traffic, N=256

Schedule fraction filled by FS
• Bernoulli traffic, N=256

Performance – 10% to 70% Load
• On-off traffic, N=256

Implementation – Remove All
[Figure; signal labels: Input enable, FS request, Output enable, Grant]

Implementation – Leave One
[Figure; signal labels: Input enable, FS request, Output enable, Grant]

Implementation – Leave One

Implementation - Complexity
• Remove All
  > 6N^2 - 3N 2-input gates for N ports
  > Longest path 2 log2(N) + 4, max. fanout N
  > 392,448 gates for 256 ports
• Leave One
  > 12N^2 - 10N 2-input gates, N^2 - N register bits for N ports
  > Longest path 3 log2(N) + 3, max. fanout 3
  > 783,872 gates, 65,280 register bits for 256 ports
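The figures quoted for 256 ports follow directly from the formulas on this slide. As a quick check, the short script below evaluates them; the function names are hypothetical, and N is assumed to be a power of two so that log2(N) is an integer.

```python
import math


def remove_all_cost(n):
    """Gate count and longest path for Remove All, per the slide's formulas."""
    return {
        "gates": 6 * n * n - 3 * n,
        "longest_path": 2 * int(math.log2(n)) + 4,
        "max_fanout": n,
    }


def leave_one_cost(n):
    """Gate count, register bits, and longest path for Leave One."""
    return {
        "gates": 12 * n * n - 10 * n,
        "register_bits": n * n - n,
        "longest_path": 3 * int(math.log2(n)) + 3,
        "max_fanout": 3,
    }


if __name__ == "__main__":
    print(remove_all_cost(256))  # gates: 392448, longest_path: 20
    print(leave_one_cost(256))   # gates: 783872, register_bits: 65280, longest_path: 27
```

For N=256 the Leave One scheme needs roughly twice as many gates as Remove All, but its maximum fanout stays at 3 instead of growing with N.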

Conclusions and Future Work
• Conclusions
  > Presented hybrid PWWFA-FS scheduler that provides low latency and high throughput
  > Simulation results encouraging
  > Hardware implementation feasible with reasonable complexity
• Future work
  > Hardware implementation and characterization of the Fast Scheduler
  > Improve fairness

Nils Gura Nils.Gura@sun.com
