Frame-Aggregated Concurrent Matching Switch

Bill Lin (University of California, San Diego), Isaac Keslassy (Technion, Israel)


SLIDE 1

Frame-Aggregated Concurrent Matching Switch

Bill Lin (University of California, San Diego), Isaac Keslassy (Technion, Israel)

SLIDE 2

Background

  • The Concurrent Matching Switch (CMS) architecture was first presented at INFOCOM 2006
  • Based on any fixed-configuration switch fabric and fully distributed, independent schedulers
  • Properties: 100% throughput, packet ordering, O(1) amortized time complexity, good delay results in simulations
  • Proofs for 100% throughput, packet ordering, and O(1) complexity were provided in the INFOCOM 2006 paper, but no delay guarantee was given

SLIDE 3

This Talk

  • The focus of this talk is to provide a delay bound
  • Show O(N log N) delay is provably achievable while retaining O(1) complexity, 100% throughput, and packet ordering
  • Show that no scheduling is required to achieve O(N log N) delay by modifying the original CMS architecture
  • Improves over the best previously-known O(N²) delay bound given the same switch properties

SLIDE 4

This Talk

  • Concurrent Matching Switch
  • General Delay Bound
  • O(N log N) delay with Fair-Frame Scheduling
  • O(N log N) delay and O(1) complexity with Frame Aggregation instead of Scheduling

SLIDE 5

The Problem

Higher-performance routers are needed to keep up.

SLIDE 6

Classical Switch Architecture

[Figure: input linecards at rate R connected through a switch fabric to output linecards, with a centralized scheduler; packets A1, A2, B1, B2, C1, C2 queued at the inputs]

SLIDE 7

Classical Switch Architecture

[Figure: the switch fabric reconfigured each time slot to deliver packets to the outputs]

Centralized scheduling and per-packet switch reconfigurations are major barriers to scalability.

SLIDE 8

Recent Approaches

  • Scalable architectures: Load-Balanced Switch [Chang 2002] [Keslassy 2003]; Concurrent Matching Switch [INFOCOM 2006]
  • Characteristics: both based on two identical stages of fixed-configuration switches and fully decentralized processing
  • No per-packet switch reconfigurations; constant-time local processing at each linecard; 100% throughput; amenable to scalable implementation using optics

SLIDE 9

Basic Load-Balanced Switch

[Figure: input linecards at rate R spread packets over fixed uniform-rate R/N links to intermediate linecards, which forward them over a second stage of R/N links to the output linecards]

SLIDE 10

Basic Load-Balanced Switch

[Figure: the same load-balanced switch structure]

The two switching stages can be folded into one. The fabric can be any (multi-stage) uniform-rate fabric; it just needs fixed uniform-rate circuits at R/N, making it amenable to optical circuit switches, e.g. static WDM, waveguides, etc.

SLIDE 11

Basic Load-Balanced Switch

[Figure: the load-balanced switch delivering packets out of order at the outputs]

The best previously-known delay bound with guaranteed packet ordering is O(N²), using Full-Ordered Frame First (FOFF).

SLIDE 12

Concurrent Matching Switch

  • Retains the load-balanced switch structure and the scalability of fixed optical switches
  • Load-balances "requests" instead of packets to N parallel "schedulers"
  • Each scheduler independently solves its own matching
  • Scheduling complexity is amortized by a factor of N
  • Packets are delivered in order based on the matching results

Goal: provide low average delay with packet ordering while retaining 100% throughput and scalability.

SLIDE 13

Concurrent Matching Switch

[Figure: CMS modifies the load-balanced switch by adding request counters at the intermediate linecards and moving the packet buffers to the inputs]

SLIDE 14

Arrival Phase

[Figure: arriving packets are buffered at the input linecards, and corresponding requests are load-balanced to the intermediate linecards]

SLIDE 15

Arrival Phase

[Figure: request counters at the intermediate linecards are updated as requests arrive]

SLIDE 16

Matching Phase

[Figure: each intermediate linecard independently computes a matching over its accumulated request counters]

SLIDE 17

Departure Phase

[Figure: grants are returned to the input linecards; granted packets traverse the two stages and depart in order]

SLIDE 18

Practicality

  • All linecards operate in parallel in a fully distributed manner
  • The arrival, matching, and departure phases are pipelined
  • Any stable scheduling algorithm can be used; e.g., by amortizing well-studied randomized algorithms [Tassiulas 1998] [Giaccone 2003] over N time slots, CMS can achieve O(1) time complexity, 100% throughput, packet ordering, and good delay results in simulations
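The three phases above can be sketched in a toy simulation. This is hypothetical illustrative code, not the authors' implementation: each input spreads its requests round-robin over the N intermediate linecards, and each linecard independently computes a matching on its own request counters before granting.

```python
# Toy CMS sketch (hypothetical names): N inputs, N outputs, and N
# intermediate "schedulers", each holding its own request-counter matrix.
N = 4

def arrival_phase(arrivals, schedulers, rr):
    """Spread each (input, output) request round-robin over the schedulers."""
    for (i, j) in arrivals:
        k = rr[i]                      # next scheduler for input i
        schedulers[k][i][j] += 1       # request counter at scheduler k
        rr[i] = (rr[i] + 1) % N

def matching_phase(request_matrix):
    """Greedy maximal matching over positive request counters."""
    used_out, match = set(), {}
    for i in range(N):
        for j in range(N):
            if request_matrix[i][j] > 0 and j not in used_out and i not in match:
                match[i] = j
                used_out.add(j)
                break
    return match

schedulers = [[[0] * N for _ in range(N)] for _ in range(N)]
rr = [0] * N
arrivals = [(0, 1), (0, 1), (1, 2), (2, 0), (3, 3)]
arrival_phase(arrivals, schedulers, rr)

# Departure phase: each scheduler grants its matching; counters decrement.
grants = []
for k in range(N):
    for i, j in matching_phase(schedulers[k]).items():
        schedulers[k][i][j] -= 1
        grants.append((k, i, j))
print(grants)
```

Note how the two identical requests (0, 1) land at different schedulers, so both can be granted in the same round; this is the amortization by a factor of N mentioned on the slide.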

SLIDE 19

Performance of CMS

[Plot: average delay vs. load, N = 128, uniform traffic. The basic load-balanced switch has no packet ordering guarantees; CMS achieves packet ordering and low delays; UFS and FOFF are also shown, with FOFF guaranteeing packet ordering at O(N²) delay]

SLIDE 20

This Talk

  • Concurrent Matching Switch
  • General Delay Bound
  • O(N log N) delay with Fair-Frame Scheduling
  • O(N log N) delay and O(1) complexity with Frame Aggregation

SLIDE 21

Delay Bound

  • Theorem: Given Bernoulli i.i.d. arrivals, let S be a strongly stable scheduling algorithm with average delay W_S in a single switch. Then CMS using S is also strongly stable, with average delay O(N·W_S)
  • Intuition: each scheduler works at an internal reference clock that is N times slower, but receives only 1/N-th of the requests. Therefore, if O(W_S) is the average waiting time for a request to be serviced by S, then the average waiting time for CMS using S is N times longer, O(N·W_S)

SLIDE 22

Delay Bound

  • Any stable scheduling algorithm can be used with CMS
  • Although we previously showed good delay simulations using a randomized algorithm called SERENA [Giaccone 2003] that is amortizable to O(1) complexity, no delay bounds (W_S) are known for this class of algorithms
  • Therefore, delay bounds for CMS using these algorithms are also unknown

SLIDE 23

O(N log N) Delay

  • In this talk, we show that CMS delay can be provably bounded by O(N log N) for Bernoulli i.i.d. arrivals, improving over the previous O(N²) bound provided by FOFF
  • This can be achieved using a known logarithmic-delay scheduling algorithm called Fair-Frame Scheduling [Neely 2004], i.e. W_S = O(log N), hence O(N log N) for CMS

SLIDE 24

Fair-Frame Scheduling

  • Suppose we accumulate incoming requests over a frame of T consecutive time slots, where T = O(log N) is set via a constant γ with respect to the load ρ
  • Then the row and column sums of the arrival matrix L are bounded by T with high probability
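The high-probability bound can be checked empirically. The sketch below uses illustrative parameters (N, ρ, γ chosen for the demo, not taken from the paper): it accumulates Bernoulli i.i.d. uniform arrivals over a frame of T = ⌈γ log N⌉ slots and counts how often any row or column sum of L exceeds T.

```python
import math, random

random.seed(1)
N, rho, gamma = 16, 0.5, 8.0         # illustrative, not from the paper
T = math.ceil(gamma * math.log(N))   # frame length, O(log N)
trials, overflows = 200, 0

for _ in range(trials):
    L = [[0] * N for _ in range(N)]
    for _ in range(T):                        # one frame of T slots
        for i in range(N):                    # each input...
            if random.random() < rho:         # ...has a Bernoulli(rho) arrival
                L[i][random.randrange(N)] += 1  # with a uniform destination
    row_max = max(sum(row) for row in L)
    col_max = max(sum(L[i][j] for i in range(N)) for j in range(N))
    if max(row_max, col_max) > T:
        overflows += 1

print(T, overflows / trials)  # overflow fraction stays small
```

Row sums can never exceed T here (at most one arrival per input per slot); only column sums can overflow, and increasing γ drives that probability down, which is the intuition behind choosing T = O(log N).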

SLIDE 25

Fair-Frame Scheduling

  • For example, suppose T = 3 and the accumulated arrival matrix is

        [2 1 0]
    L = [0 2 1]
        [1 0 2]

    then L can be decomposed into T = 3 permutation matrices:

    [2 1 0]   [1 0 0]   [1 0 0]   [0 1 0]
    [0 2 1] = [0 1 0] + [0 1 0] + [0 0 1]
    [1 0 2]   [0 0 1]   [0 0 1]   [1 0 0]

  • Logarithmic delay follows from T being O(log N)
  • With small probability the max row/column sum exceeds T; such "overflow" requests are serviced in future frames
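The decomposition in the example is an instance of the Birkhoff-von Neumann theorem: a nonnegative integer matrix whose row and column sums all equal T is a sum of T permutation matrices. A minimal sketch, repeatedly extracting a perfect matching on the positive entries (the talk's O(log log N) complexity result instead uses edge-coloring):

```python
def find_matching(M, n):
    """Perfect matching on positive entries via augmenting paths (Kuhn's)."""
    match_col = [-1] * n              # match_col[j] = row matched to column j

    def augment(i, seen):
        for j in range(n):
            if M[i][j] > 0 and j not in seen:
                seen.add(j)
                if match_col[j] == -1 or augment(match_col[j], seen):
                    match_col[j] = i
                    return True
        return False

    for i in range(n):
        assert augment(i, set())      # guaranteed when row/col sums are equal
    return match_col

def decompose(L, T):
    """Split L (all row/col sums == T) into T permutations (row -> column)."""
    n = len(L)
    M = [row[:] for row in L]
    perms = []
    for _ in range(T):
        match_col = find_matching(M, n)
        perm = {match_col[j]: j for j in range(n)}
        for i, j in perm.items():
            M[i][j] -= 1              # remove the extracted permutation
        perms.append(perm)
    return perms

# The slide's example: T = 3, every row and column sum equals 3.
L = [[2, 1, 0],
     [0, 2, 1],
     [1, 0, 2]]
perms = decompose(L, 3)
print(perms)
```

Each returned permutation is one switch configuration; serving the frame is then just playing the T configurations back.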

SLIDE 26

CMS with Fair-Frame Scheduling

  • O(N log N) delay
  • 100% throughput and packet ordering
  • O(log log N) amortized time complexity by solving the matrix decomposition with edge-coloring

Question: Can O(N log N) delay be guaranteed with O(1) complexity? Answer: Yes, and with no scheduling.

SLIDE 27

This Talk

  • Concurrent Matching Switch: 100% throughput, packet ordering, O(1) complexity, good delays, but no delay bound previously provided
  • General Delay Bound: O(N·W_S) delay, N times the delay of the scheduling algorithm used
  • CMS with Fair-Frame Scheduling: O(N log N) delay, O(log log N) complexity
  • Frame-Aggregated CMS: O(N log N) delay, O(1) complexity, and no scheduling

SLIDE 28

Frame-Aggregated CMS

  • Operates just like CMS, but each intermediate linecard accumulates requests for a superframe of N·T time slots before sending back grants in batch
  • T is determined using the same logarithmic formula as in fair-frame scheduling
  • Main idea: when the arrival request matrix L at an intermediate linecard has row/column sums bounded by T, there is no need to decompose L before returning grants (no scheduling); "overflow" requests are deferred to future superframes
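The overflow handling can be sketched as follows (a hypothetical helper, not the authors' code): requests beyond the per-row/per-column budget T are moved into the overflow matrix and deferred, so the remaining matrix satisfies the bounded-sum condition and can be granted in batch with no matching computation.

```python
def split_overflow(L, T):
    """Split L into a grantable part (row/col sums <= T) and an overflow part."""
    n = len(L)
    grant = [row[:] for row in L]
    overflow = [[0] * n for _ in range(n)]
    # Trim excess in rows, then in columns, deferring it to overflow.
    for i in range(n):
        excess = sum(grant[i]) - T
        for j in range(n):
            if excess <= 0:
                break
            take = min(grant[i][j], excess)
            grant[i][j] -= take
            overflow[i][j] += take
            excess -= take
    for j in range(n):
        excess = sum(grant[i][j] for i in range(n)) - T
        for i in range(n):
            if excess <= 0:
                break
            take = min(grant[i][j], excess)
            grant[i][j] -= take
            overflow[i][j] += take
            excess -= take
    return grant, overflow

T = 3
L = [[2, 1, 1],   # row sum 4 > T: one request must overflow
     [0, 2, 1],
     [1, 0, 2]]
grant, overflow = split_overflow(L, T)
print(grant, overflow)
```

The grantable part needs no decomposition: bounded row/column sums guarantee every request in it fits within the superframe, which is exactly the "no scheduling" claim.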

SLIDE 29

Frame-Aggregated CMS

[Figure: an overflow matrix is added at each intermediate linecard]

SLIDE 30

Frame-Aggregated CMS

[Figure: requests for a superframe accumulate at the intermediate linecards]

SLIDE 31

Frame-Aggregated CMS

[Figure: accumulated request counters at the end of the superframe]

SLIDE 32

Frame-Aggregated CMS

[Figure: with T = 3, a request matrix whose max row/column sum is 2 < T is filled with entries from the overflow matrix until the max row/column sum reaches 3 = T; grants can then be sent in batch, with no scheduling]

SLIDE 33

Frame-Aggregated CMS

[Figure: batched grants returned to the input linecards]

SLIDE 34

Frame-Aggregated CMS

[Figure: granted packets sent across the first stage to the intermediate linecards]

SLIDE 35

Frame-Aggregated CMS

[Figure: packets forwarded across the second stage toward the outputs]

SLIDE 36

Frame-Aggregated CMS

[Figure: packets depart in order at the output linecards]

Packets depart in order; delay is bounded by the superframe, O(N log N); no scheduling is required.

SLIDE 37

Summary

  • We provided a general delay bound for the CMS architecture
  • We showed that CMS delay can be provably bounded by O(N log N) for Bernoulli i.i.d. arrivals by using a fair-frame scheduler
  • We further showed that CMS delay can be provably bounded by O(N log N) with no scheduling by means of "Frame Aggregation", while retaining the packet ordering and 100% throughput guarantees
  • Our work on CMS and frame-based CMS provides a new way of thinking about scaling routers and connects a huge body of existing literature on scheduling to load-balanced routers

SLIDE 38

Thank You