

  1. Frame-Aggregated Concurrent Matching Switch
     Bill Lin (University of California, San Diego), Isaac Keslassy (Technion, Israel)

  2. Background
     • The Concurrent Matching Switch (CMS) architecture was first presented at INFOCOM 2006
     • Based on any fixed-configuration switch fabric and fully distributed, independent schedulers:
       ✓ 100% throughput
       ✓ Packet ordering
       ✓ O(1) amortized time complexity
       ✓ Good delay results in simulations
     • Proofs of 100% throughput, packet ordering, and O(1) complexity were provided in the INFOCOM 2006 paper, but no delay guarantee was given

  3. This Talk
     • The focus of this talk is to provide a delay bound
     • Show that O(N log N) delay is provably achievable while retaining O(1) complexity, 100% throughput, and packet ordering
     • Show that no scheduling is required to achieve O(N log N) delay by modifying the original CMS architecture
     • Improves over the best previously-known O(N²) delay bound with the same switch properties

  4. This Talk
     • Concurrent Matching Switch
     • General Delay Bound
     • O(N log N) delay with Fair-Frame Scheduling
     • O(N log N) delay and O(1) complexity with Frame Aggregation instead of Scheduling

  5. The Problem
     Higher-performance routers are needed to keep up

  6. Classical Switch Architecture
     [Diagram: input and output linecards at rate R connected through a switch fabric, with a centralized scheduler coordinating which packets (e.g., A1, B1, C1) cross the fabric each time slot]

  7. Classical Switch Architecture
     [Diagram: same architecture as slide 6, with packets in flight]
     Centralized scheduling and per-packet switch reconfigurations are major barriers to scalability

  8. Recent Approaches
     • Scalable architectures
       ✓ Load-Balanced Switch [Chang 2002] [Keslassy 2003]
       ✓ Concurrent Matching Switch [INFOCOM 2006]
     • Characteristics
       ✓ Both based on two identical stages of fixed-configuration switches and fully decentralized processing
       ✓ No per-packet switch reconfigurations
       ✓ Constant-time local processing at each linecard
       ✓ 100% throughput
       ✓ Amenable to scalable implementation using optics

  9. Basic Load-Balanced Switch
     [Diagram: input, intermediate, and output linecards connected by two switching stages; external links run at rate R, and every internal circuit runs at the fixed uniform rate R/N]

  10. Basic Load-Balanced Switch
      [Diagram: folded load-balanced switch with fixed R/N circuits between linecards]
      • The two switching stages can be folded into one
      • Can be any (multi-stage) uniform-rate fabric
      • Just needs fixed uniform-rate circuits at R/N
      • Amenable to optical circuit switches, e.g. static WDM, waveguides, etc.
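To make the fixed uniform-rate circuits concrete, here is a minimal sketch (the round-robin pattern and all names are illustrative assumptions, not taken from the talk): at time slot t, input i is wired to intermediate linecard (i + t) mod N, so every input-intermediate pair is served exactly once every N slots, giving each pair a fixed R/N circuit with no scheduler involved.

```python
# Hypothetical sketch of a fixed connection pattern for a load-balanced
# switch. At time slot t, input i is wired to intermediate linecard
# (i + t) mod N. No scheduling decision or fabric reconfiguration is
# ever computed per packet.

N = 4  # number of linecards (illustrative value)

def connection(i, t, n=N):
    """Intermediate linecard that input i reaches at time slot t."""
    return (i + t) % n

# Over any window of N consecutive slots, each input visits every
# intermediate linecard exactly once -- a uniform-rate R/N mesh.
for i in range(N):
    visited = {connection(i, t) for t in range(N)}
    assert visited == set(range(N))
```

Because the pattern is fixed and periodic, it can be realized with static optical circuits (e.g. WDM) rather than a reconfigurable crossbar.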

  11. Basic Load-Balanced Switch
      [Diagram: packets arriving out of order at the outputs]
      The best previously-known delay bound with guaranteed packet ordering is O(N²), using Full-Ordered Frame First (FOFF)

  12. Concurrent Matching Switch
      • Retains the load-balanced switch structure and the scalability of fixed optical switches
      • Load-balances "requests" instead of packets to N parallel "schedulers"
      • Each scheduler independently solves its own matching problem
      • Scheduling complexity is amortized by a factor of N
      • Packets are delivered in order based on the matching results
      Goal: provide low average delay with packet ordering while retaining 100% throughput and scalability

  13. Concurrent Matching Switch
      [Diagram: the load-balanced switch structure with two modifications: request counters are added at the intermediate linecards, and packet buffers are moved to the inputs]

  14. Arrival Phase
      [Diagram: arriving packets are buffered at the inputs while their requests are load-balanced across the intermediate linecards]

  15. Arrival Phase
      [Diagram: continued; each intermediate linecard accumulates the requests it receives in its request counters]

  16. Matching Phase
      [Diagram: each intermediate linecard independently computes a matching over its own request counters]

  17. Departure Phase
      [Diagram: matching results are returned to the inputs, which transmit the corresponding packets in order through the fabric to their outputs]

  18. Practicality
      • All linecards operate in parallel in a fully distributed manner
      • Arrival, matching, and departure phases are pipelined
      • Any stable scheduling algorithm can be used
      • E.g., by amortizing well-studied randomized algorithms [Tassiulas 1998] [Giaccone 2003] over N time slots, CMS can achieve:
        ✓ O(1) time complexity
        ✓ 100% throughput
        ✓ Packet ordering
        ✓ Good delay results in simulations
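As a rough illustration of the arrival/matching/departure pipeline from the preceding slides, here is a hedged, minimal Python model. All names, the round-robin request spreading, and the greedy maximal matching (standing in for "any stable scheduling algorithm") are illustrative assumptions, not the authors' implementation.

```python
N = 3  # switch size (illustrative)

# One request-counter matrix per intermediate scheduler:
# counters[k][i][j] counts requests from input i to output j that
# were load-balanced to scheduler k during the arrival phase.
counters = [[[0] * N for _ in range(N)] for _ in range(N)]

def arrival_phase(t, arrivals):
    """Spread each new request (i, j) to scheduler (i + t) mod N."""
    for (i, j) in arrivals:
        counters[(i + t) % N][i][j] += 1

def matching_phase(k):
    """Scheduler k independently computes a matching on its own
    counters. A greedy maximal matching stands in here for any
    stable scheduling algorithm."""
    used_in, used_out, match = set(), set(), []
    for i in range(N):
        for j in range(N):
            if counters[k][i][j] > 0 and i not in used_in and j not in used_out:
                used_in.add(i)
                used_out.add(j)
                match.append((i, j))
    return match

def departure_phase(k, match):
    """Serve one buffered packet per matched (input, output) pair."""
    for (i, j) in match:
        counters[k][i][j] -= 1

# One pipeline step: requests arrive, scheduler 0 matches and serves.
arrival_phase(0, [(0, 0), (0, 1), (1, 0), (2, 2)])
m = matching_phase(0)
departure_phase(0, m)
```

Since each scheduler only touches its own counters, the N schedulers run fully in parallel, and each one receives roughly 1/N of the requests, which is where the factor-of-N amortization of scheduling complexity comes from.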

  19. Performance of CMS
      [Plot: average delay vs. load for UFS, FOFF, CMS, and the basic load-balanced switch; N = 128, uniform traffic. FOFF guarantees packet ordering at O(N²) delay; CMS achieves packet ordering and low delays; the basic load-balanced switch has no packet ordering guarantees]

  20. This Talk
      • Concurrent Matching Switch
      • General Delay Bound
      • O(N log N) delay with Fair-Frame Scheduling
      • O(N log N) delay and O(1) complexity with Frame Aggregation

  21. Delay Bound
      • Theorem: Given Bernoulli i.i.d. arrivals, let S be a strongly stable scheduling algorithm with average delay W_S in a single switch. Then CMS using S is also strongly stable, with average delay O(N W_S)
      • Intuition:
        ✓ Each scheduler works at an internal reference clock that is N times slower, but receives only 1/N-th of the requests
        ✓ Therefore, if O(W_S) is the average waiting time for a request to be serviced by S, then the average waiting time for CMS using S is N times longer, O(N W_S)

  22. Delay Bound
      • Any stable scheduling algorithm can be used with CMS
      • Although we previously showed good delay simulations using a randomized algorithm called SERENA [Giaccone 2003] that is amortizable to O(1) complexity, no delay bounds (W_S) are known for this class of algorithms
      • Therefore, delay bounds for CMS using these algorithms are also unknown

  23. O(N log N) Delay
      • In this talk, we want to show that CMS delay can be provably bounded by O(N log N) for Bernoulli i.i.d. arrivals, improving over the previous O(N²) bound provided by FOFF
      • This can be achieved using a known logarithmic-delay scheduling algorithm called Fair-Frame Scheduling [Neely 2004], i.e. W_S = O(log N), and therefore O(N log N) for CMS
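The scaling claim can be illustrated numerically (this is just arithmetic under an assumed constant factor c = 1, not the proof): with W_S proportional to log N per scheduler, the CMS delay N · W_S grows far more slowly than the N² FOFF-style bound.

```python
import math

# Illustrative check of the delay scaling: if a single scheduler S has
# average delay W_S = c * log2(N) (as with fair-frame scheduling), then
# a CMS built from N such schedulers -- each N times slower, each
# handling 1/N of the requests -- has average delay N * W_S.
def cms_delay(n, c=1.0):
    w_s = c * math.log2(n)   # per-scheduler delay, O(log N)
    return n * w_s           # CMS delay, O(N log N)

# For N = 128 (the simulation size from the performance slide), this
# gives 128 * 7 = 896 slot times, versus 128 * 128 = 16384 for an
# O(N^2) bound -- roughly an 18x improvement at this size.
print(cms_delay(128))  # 896.0
```

The constant c is a placeholder; the talk only claims the asymptotic order, so the absolute numbers here are for intuition only.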

  24. Fair-Frame Scheduling
      • Suppose we accumulate incoming requests over frames of T = ⌈γ log N⌉ consecutive time slots, where γ is a constant with respect to the load ρ
      • Then the row and column sums of the arrival matrix L are bounded by T with high probability

  25. Fair-Frame Scheduling
      • For example, suppose T = 3 and

            | 2 0 1 |
        L = | 1 2 0 |
            | 0 1 2 |

        then it can be decomposed into T = 3 permutation matrices:

        | 2 0 1 |   | 1 0 0 |   | 1 0 0 |   | 0 0 1 |
        | 1 2 0 | = | 0 1 0 | + | 0 1 0 | + | 1 0 0 |
        | 0 1 2 |   | 0 0 1 |   | 0 0 1 |   | 0 1 0 |

      • Logarithmic delay follows from T being O(log N)
      • The small probability of "overflows", when a max row/column sum exceeds T, is handled by servicing the excess requests in future frames
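The decomposition step on this slide can be sketched in code. This is a hedged illustration, not the talk's algorithm: a matrix whose row and column sums are all at most T can always be covered by T permutation matrices (a Birkhoff-von Neumann-style decomposition); the simple greedy peel below happens to work for the slide's example, whereas a general implementation would peel permutations via bipartite matching.

```python
def peel_permutation(L):
    """Greedily pick one positive entry per row in a fresh column,
    subtract 1 from each picked entry, and return the permutation.
    (Greedy can fail on adversarial inputs; a maximum bipartite
    matching would be used in a general implementation.)"""
    n = len(L)
    used_cols, perm = set(), {}
    for i in range(n):
        for j in range(n):
            if L[i][j] > 0 and j not in used_cols:
                perm[i] = j
                used_cols.add(j)
                break
        else:  # row has no request left: pad with any free column
            j = next(c for c in range(n) if c not in used_cols)
            perm[i] = j
            used_cols.add(j)
    for i, j in perm.items():
        if L[i][j] > 0:
            L[i][j] -= 1
    return perm

# The slide's example: every row and column sum equals T = 3.
L = [[2, 0, 1],
     [1, 2, 0],
     [0, 1, 2]]
perms = [peel_permutation(L) for _ in range(3)]
assert all(v == 0 for row in L for v in row)  # fully decomposed
```

Each peeled permutation becomes one fabric configuration within the frame, so all requests accumulated in the frame are served in T slots whenever the row/column sums stay within T.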
