Programmable Packet Scheduling at Line Rate
Anirudh Sivaraman, Suvinay Subramanian, Mohammad Alizadeh, Sharad Chole, Shang-Tse Chuang, Anurag Agrawal, Hari Balakrishnan, Tom Edsall, Sachin Katti, Nick McKeown

Programmable scheduling at line rate
• Motivation: Can’t deploy new schedulers in production networks
• The status quo in line-rate switches
[Figure: switch pipeline, In → Parser → Ingress pipeline → Queues/Scheduler → Egress pipeline → Deparser → Out. The parser, deparser and pipelines are labelled RMT or RMT, Domino; the scheduler is labelled ???]
The scheduler is still fixed

Why is programmable scheduling hard?
• Many algorithms, yet no consensus on abstractions, cf.
  • Parse graphs for parsing
  • Match-action tables for forwarding
  • Packet transactions for data-plane algorithms
• Scheduler has tight timing requirements
  • Can’t simply use an FPGA/CPU
Need expressive abstraction that can run at line rate

What does the scheduler do?
It decides
• In what order are packets sent
  • e.g., FCFS, priorities, weighted fair queueing
• At what time are packets sent
  • e.g., Token bucket shaping

A strawman programmable scheduler
[Figure: Packets → Classification → Programmable logic to decide order or time]
• Very little time on the dequeue side => limited programmability
• Can we move programmability to the enqueue side instead?

The Push-In First-Out Queue
Key observation
• In many cases, relative order of buffered packets does not change
• i.e., a packet’s place in the scheduling order is known at enqueue
The Push-In First-Out Queue (PIFO): Packets are pushed into an arbitrary location based on a rank, and dequeued from the head
[Figure: a PIFO whose entries are labelled with ranks (8, 7, 2, 13, 10, 9, 9, 5 in the drawing)]
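A minimal software sketch of the PIFO described above may help. It assumes a sorted Python list as the backing store and FIFO order among equal ranks; the class name `PIFO` and its methods are illustrative, not the hardware design.

```python
import bisect
import itertools


class PIFO:
    """Push-in first-out queue: push anywhere by rank, pop only from the head."""

    def __init__(self):
        self._entries = []             # kept sorted by (rank, arrival order)
        self._seq = itertools.count()  # assumption: equal ranks leave in arrival order

    def push(self, rank, item):
        # The arrival counter is unique, so items themselves are never compared.
        bisect.insort(self._entries, (rank, next(self._seq), item))

    def pop(self):
        rank, _, item = self._entries.pop(0)  # always dequeue from the head
        return item

    def __len__(self):
        return len(self._entries)


pifo = PIFO()
for rank, name in [(8, "a"), (2, "b"), (13, "c"), (9, "d"), (9, "e"), (5, "f")]:
    pifo.push(rank, name)
order = [pifo.pop() for _ in range(len(pifo))]
print(order)
```

Packets leave in increasing rank regardless of arrival order, which is the only property the rest of the talk relies on.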

A programmable scheduler
To program the scheduler, program the rank computation
[Figure: Rank Computation (programmable) feeding a PIFO Scheduler (fixed logic). Example rank computation: f = flow(pkt) … p.rank = T[f] + p.len. The PIFO holds ranks 9, 8, 5, 2]

A programmable scheduler
[Figure: switch pipeline with blocks In, Parser, Ingress pipeline, Rank Computation, Queues/PIFO Scheduler, Egress pipeline, Deparser, Out]
Rank computation is a packet transaction (Domino, SIGCOMM ’16)

Fair queuing
[Figure: switch pipeline with a Rank Computation stage ahead of the Queues/PIFO Scheduler]
Rank Computation
1. f = flow(p)
2. p.start = max(T[f].finish, virtual_time)
3. T[f].finish = p.start + p.len
4. p.rank = p.start
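The four steps above can be sketched as follows. The slide does not say how `virtual_time` is maintained; this sketch assumes it tracks the rank of the most recently dequeued packet (as in start-time fair queueing), and the class name `FairQueue` is illustrative.

```python
import heapq
import itertools
from collections import defaultdict


class FairQueue:
    def __init__(self):
        self.pifo = []                  # heap of (rank, seq, packet) stands in for the PIFO
        self.seq = itertools.count()
        self.finish = defaultdict(int)  # T[f].finish
        self.virtual_time = 0           # assumption: rank of the last dequeued packet

    def enqueue(self, flow, length):
        start = max(self.finish[flow], self.virtual_time)  # step 2
        self.finish[flow] = start + length                 # step 3
        heapq.heappush(self.pifo, (start, next(self.seq), (flow, length)))  # step 4

    def dequeue(self):
        rank, _, pkt = heapq.heappop(self.pifo)
        self.virtual_time = rank
        return pkt


fq = FairQueue()
for _ in range(3):
    fq.enqueue("A", 100)   # flow A builds a backlog
fq.enqueue("B", 100)       # flow B arrives later with one packet
sent = [fq.dequeue()[0] for _ in range(4)]
print(sent)
```

Flow B's packet is ranked against B's own history, so it is sent ahead of most of A's backlog even though it arrived last.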

Token bucket shaping
[Figure: switch pipeline with a Rank Computation stage ahead of the Queues/PIFO Scheduler]
Rank Computation
1. tokens = min(tokens + rate * (now - last), burst)
2. p.send = now + max((p.len - tokens) / rate, 0)
3. tokens = tokens - p.len
4. last = now
5. p.rank = p.send
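A small sketch of this rank computation, assuming the bucket starts full, rates are in bytes per second, and the rank is read as a wall-clock send time. The class name `TokenBucket` and the chosen numbers are illustrative.

```python
class TokenBucket:
    def __init__(self, rate, burst):
        self.rate = rate      # bytes per second
        self.burst = burst    # bucket depth in bytes
        self.tokens = burst   # assumption: bucket starts full
        self.last = 0.0

    def rank(self, length, now):
        # Step 1: refill, capped at the burst size.
        self.tokens = min(self.tokens + self.rate * (now - self.last), self.burst)
        # Step 2: send immediately if enough tokens, otherwise wait for the deficit.
        send = now + max((length - self.tokens) / self.rate, 0)
        # Steps 3-4: charge the packet (tokens may go negative) and record the time.
        self.tokens -= length
        self.last = now
        return send           # step 5: p.rank = p.send


tb = TokenBucket(rate=1000, burst=1500)
ranks = [tb.rank(1000, now=0.0) for _ in range(3)]
print(ranks)
```

Three back-to-back 1000-byte packets get send times 0.0, 0.5 and 1.5 seconds: the first fits in the burst, the second waits for a 500-byte deficit, the third for a 1500-byte deficit.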

Shortest remaining flow size
[Figure: switch pipeline with a PIFO Scheduler holding ranks 9, 8, 5, 2]

Shortest remaining flow size
Rank Computation
1. f = flow(p)
2. p.rank = f.rem_size
[Figure: PIFO Scheduler holding ranks 9, 8, 5, 2]
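As a rough illustration, the rank can be taken directly from the flow's remaining size. The sketch assumes that value is already known at enqueue (for example, stamped on the packet by the sender); flow names and sizes are made up.

```python
import heapq
import itertools

pifo = []                # heap of (rank, seq, flow) stands in for the PIFO
seq = itertools.count()


def enqueue(flow, rem_size):
    # p.rank = f.rem_size (assumed to be available when the packet arrives)
    heapq.heappush(pifo, (rem_size, next(seq), flow))


for flow, rem in [("f1", 9000), ("f2", 500), ("f3", 3000), ("f4", 8000)]:
    enqueue(flow, rem)
order = [heapq.heappop(pifo)[2] for _ in range(4)]
print(order)
```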

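Beyond a single PIFO
Hierarchical Packet Fair Queuing
[Figure: scheduling tree. root → Red (0.5), Blue (0.5); Red → a (0.99), b (0.01); Blue → x (0.5), y (0.5); alongside, example packets from flows a, b, x, y and the resulting transmission order]
Hierarchical scheduling algorithms need a hierarchy of PIFOs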

Tree of PIFOs
Hierarchical Packet Fair Queuing
[Figure: the scheduling tree root → Red (0.5), Blue (0.5); Red → a (0.99), b (0.01); Blue → x (0.5), y (0.5), mapped onto three PIFOs. PIFO-root (WFQ on Red & Blue) holds references R and B; PIFO-Red (WFQ on a & b) and PIFO-Blue (WFQ on x & y) hold packets]
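The tree can be sketched in software as below. This is an illustration under assumptions: each node runs a weighted start-time fair queueing rank computation, every enqueue pushes the packet into its leaf PIFO and a reference to that leaf into the root, and the names `WFQNode`, `enqueue` and `dequeue` are hypothetical.

```python
import heapq
import itertools


class WFQNode:
    """One PIFO plus a weighted fair queueing rank computation."""

    def __init__(self, weights):
        self.weights = weights
        self.heap = []                            # (rank, seq, item)
        self.seq = itertools.count()
        self.finish = {k: 0.0 for k in weights}
        self.vtime = 0.0                          # assumption: rank of last dequeued entry

    def push(self, key, length, item):
        start = max(self.finish[key], self.vtime)
        self.finish[key] = start + length / self.weights[key]
        heapq.heappush(self.heap, (start, next(self.seq), item))

    def pop(self):
        rank, _, item = heapq.heappop(self.heap)
        self.vtime = rank
        return item


root = WFQNode({"Red": 0.5, "Blue": 0.5})
leaves = {"Red": WFQNode({"a": 0.99, "b": 0.01}),
          "Blue": WFQNode({"x": 0.5, "y": 0.5})}


def enqueue(cls, flow, length):
    leaves[cls].push(flow, length, (flow, length))  # packet goes into the leaf PIFO
    root.push(cls, length, cls)                     # reference to the leaf goes into the root


def dequeue():
    cls = root.pop()          # root decides which class transmits next
    return leaves[cls].pop()  # that class's PIFO decides which packet


for cls, flow in [("Red", "a"), ("Red", "a"), ("Red", "b"), ("Blue", "x"), ("Blue", "y")]:
    enqueue(cls, flow, 100)
sent = [dequeue()[0] for _ in range(5)]
print(sent)
```

The root alternates between Red and Blue because their weights are equal, while each leaf orders its own flows independently.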

Expressiveness of PIFOs
• Fine-grained priorities: shortest-flow first, earliest deadline first, service-curve EDF
• Hierarchical scheduling: HPFQ, Class-Based Queuing
• Non-work-conserving algorithms: Token buckets, Stop-And-Go, Rate Controlled Service Disciplines
• Least Slack Time First
• Service Curve Earliest Deadline First
• Minimum and maximum rate limits on a flow
• Cannot express some scheduling algorithms, e.g., output shaping.

PIFO in hardware
• Performance targets for a shared-memory switch
  • 1 GHz pipeline (64 ports * 10 Gbit/s)
  • 1K flows/physical queues
  • 60K packets (12 MB packet buffer, 200 byte cell)
  • Scheduler is shared across ports
• Naive solution: flat, sorted array is infeasible
• Exploit observation that ranks increase within a flow
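The targets above follow from simple arithmetic, worked through here as a sanity check. The sketch assumes 1 MB = 10^6 bytes; variable names are illustrative.

```python
ports, port_rate_bps = 64, 10 * 10**9           # 64 ports * 10 Gbit/s
clock_hz = 10**9                                # 1 GHz pipeline
buffer_bytes, cell_bytes = 12 * 10**6, 200      # 12 MB buffer, 200 byte cells

aggregate_bps = ports * port_rate_bps           # total switching capacity
bytes_per_cycle = aggregate_bps / clock_hz / 8  # bytes moved per clock cycle
cells = buffer_bytes // cell_bytes              # number of buffered cells

print(aggregate_bps, bytes_per_cycle, cells)
```

So a scheduler handling one packet per cycle keeps up with 640 Gbit/s as long as packets average at least 80 bytes, and a flat sorted array would need to hold about 60K entries, against only 1K flows.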

A single PIFO block
[Figure: Enqueue → Flow Scheduler (flip-flops) holding the head element of each flow (A, B, C, D) sorted by rank → Dequeue; alongside, a Rank Store (SRAM) holding each flow's remaining ranks in per-flow FIFOs]
• 1 enqueue + 1 dequeue per clock cycle
• Can be shared among multiple logical PIFOs
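A software model of this split is sketched below, assuming ranks never decrease within a flow. Only each flow's head element is kept sorted in the flow scheduler; everything else waits in a per-flow FIFO in the rank store and is reinserted on dequeue. The class name `PIFOBlock` and the tie-breaking rule are assumptions for illustration.

```python
import bisect
from collections import defaultdict, deque


class PIFOBlock:
    def __init__(self):
        self.flow_scheduler = []              # sorted (rank, seq, flow): one entry per active flow
        self.rank_store = defaultdict(deque)  # per-flow FIFO of the remaining ranks
        self.in_scheduler = set()
        self.seq = 0

    def _insert(self, flow, rank):
        bisect.insort(self.flow_scheduler, (rank, self.seq, flow))
        self.seq += 1
        self.in_scheduler.add(flow)

    def enqueue(self, flow, rank):
        if flow in self.in_scheduler:
            self.rank_store[flow].append(rank)  # relies on ranks increasing within a flow
        else:
            self._insert(flow, rank)

    def dequeue(self):
        rank, _, flow = self.flow_scheduler.pop(0)
        self.in_scheduler.discard(flow)
        if self.rank_store[flow]:
            # Reinsert the flow's next element into the flow scheduler.
            self._insert(flow, self.rank_store[flow].popleft())
        return flow, rank


block = PIFOBlock()
arrivals = [("A", 0), ("B", 1), ("A", 2), ("C", 3), ("A", 3), ("B", 4), ("D", 4), ("C", 6)]
for flow, rank in arrivals:
    block.enqueue(flow, rank)
out = [block.dequeue() for _ in arrivals]
print(out)
```

The sorted structure never holds more entries than there are flows, which is why the hardware only needs to sort about 1K elements rather than 60K.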

Hardware feasibility
• The rank store is just a bank of FIFOs (well-understood design)
• Flow scheduler for 1K flows meets timing at 1 GHz on a 16-nm transistor library
  • Continues to meet timing until 2048 flows, fails timing at 4096
• 7 mm² area for a 5-level programmable hierarchical scheduler
  • < 4% for a typical chip

Related work
• PIFO: Used in theoretical work by Chuang et al. in the 90s
• Universal Packet Scheduling (UPS): Uses LSTF to replay all schedules, end point sets slack
  • Assumes fixed switches => cannot express fair queueing, shaping
  • Assumes single priority queue => cannot express hierarchies

Conclusion
• Programmable scheduling at line rate is within reach
• Two benefits:
  • Express new schedulers for different performance objectives
  • Express existing schedulers as software, not hardware
• Code: http://web.mit.edu/pifo

Backup slides

Limitations of PIFOs
• Output shaping: PIFOs rate limit input to a queue, not output
• Shaping and scheduling are coupled.

PIFO mesh

Proposal: scheduling in P4
• Currently not modeled at all, black box left to vendor
• Only part of the switch that isn’t programmable
• PIFOs present a candidate
• Concurrent work on Universal Packet Scheduling also requires a priority queue that is identical to a PIFO

Hardware implementation
[Figure: flow scheduler datapath. An array of (Rank, Logical PIFO ID) elements is shifted based on push and pop indices. Rank > comparators and logical PIFO ID == comparators feed priority encoders; the ports are Push 1 (ENQ), Push 2 (reinsert) and Pop (DEQ)]
• Meets timing (1 GHz) for up to 2048 flows at 16 nm
• Less than 4% area overhead (~7 mm²) for 5-level scheduler

A PIFO block
[Figure: a PIFO block with an ALU. Enqueue: (logical PIFO, rank, flow). Dequeue: (logical PIFO)]

A PIFO mesh
[Figure: three PIFO blocks, each with Enq and Deq ports, an ALU and a next-hop lookup]

Proposal: scheduling in P4
• Need to model a PIFO (or priority queue) in P4
• Requires an extern instance to model a PIFO
  • Can start by including it in a target-specific library
  • Later migrate to standard library if there’s sufficient interest
  • Section 16 of P4v1.1
• Transactions themselves can be compiled down to P4 code using the Domino DSL for stateful algorithms.

Hardware feasibility of PIFOs
• Number of flows handled by a PIFO affects timing.
• Number of logical PIFOs within a PIFO, priority and metadata width, and number of PIFO blocks only increase area.

Composing PIFOs: min. rate guarantees
Minimum rate guarantees: Provide each flow a guaranteed rate provided the sum of these guarantees is below capacity.
[Figure: PIFO-Root (prioritize flows under min. rate) holding references to A and B, above PIFO-A (FIFO for flow A) and PIFO-B (FIFO for flow B)]
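One way to realize this composition is sketched below. The slide only states that the root prioritizes flows under their minimum rate; the per-flow token bucket used here to decide "under" versus "over" is an assumption, as are the class name `MinRateScheduler` and the example numbers.

```python
import heapq
import itertools
from collections import deque


class MinRateScheduler:
    def __init__(self, min_rate, burst):
        self.min_rate, self.burst = min_rate, burst
        self.tokens = dict.fromkeys(min_rate, burst)  # assumption: buckets start full
        self.last = dict.fromkeys(min_rate, 0.0)
        self.root = []                                # PIFO-Root: (over_min, seq, flow)
        self.leaf = {f: deque() for f in min_rate}    # one FIFO per flow
        self.seq = itertools.count()

    def enqueue(self, flow, length, now):
        t = min(self.tokens[flow] + self.min_rate[flow] * (now - self.last[flow]),
                self.burst)
        self.last[flow] = now
        if length <= t:
            over_min, t = 0, t - length  # within the guarantee: high priority
        else:
            over_min = 1                 # above the guarantee: low priority
        self.tokens[flow] = t
        self.leaf[flow].append(length)
        heapq.heappush(self.root, (over_min, next(self.seq), flow))

    def dequeue(self):
        _, _, flow = heapq.heappop(self.root)
        return flow, self.leaf[flow].popleft()  # FIFO order within the flow


s = MinRateScheduler(min_rate={"A": 1000, "B": 1000}, burst=1000)
for _ in range(3):
    s.enqueue("A", 1000, now=0.0)
s.enqueue("B", 1000, now=0.0)
sent = [s.dequeue()[0] for _ in range(4)]
print(sent)
```

Because the root stores references to flows rather than to packets, packets of one flow always leave in arrival order even when their root entries have different priorities.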

Traffic Shaping
[Figure: Ingress Pipeline feeding the Scheduler, a Push-In-First-Out (PIFO) Queue]
1. update tokens
2. p.send = now + (p.len - tokens) / rate;
3. p.prio = p.send

LSTF
[Figure: Ingress Pipeline feeding the Scheduler, a Push-In-First-Out (PIFO) Queue. Stages: Initialize slack values; Add transmission delay to slack; Decrement wait time in queue from slack]
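A rough sketch of the queueing part of LSTF at a single switch follows. It assumes the rank is slack plus arrival time (the latest time the packet can leave), leaves out the transmission-delay adjustment, and represents packets as plain dictionaries; none of these details are given on the slide.

```python
import heapq
import itertools

pifo = []                # heap of (rank, seq, packet) stands in for the PIFO
seq = itertools.count()


def enqueue(pkt, now):
    pkt["arrival"] = now
    # Assumption: rank = slack + arrival time, so least remaining slack goes first.
    heapq.heappush(pifo, (pkt["slack"] + now, next(seq), pkt))


def dequeue(now):
    _, _, pkt = heapq.heappop(pifo)
    pkt["slack"] -= now - pkt["arrival"]  # decrement wait time in queue from slack
    return pkt


enqueue({"id": "p1", "slack": 10}, now=0)
enqueue({"id": "p3", "slack": 20}, now=1)
enqueue({"id": "p2", "slack": 3}, now=2)
out = [dequeue(now=t) for t in (4, 5, 6)]
ids = [p["id"] for p in out]
slacks = [p["slack"] for p in out]
print(ids, slacks)
```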

The PIFO abstraction in one slide
• PIFO: A sorted array that lets us insert an entry (packet or PIFO pointer) into a PIFO based on a programmable priority
• Entries are always dequeued from the head
• If an entry is a packet, dequeue and transmit it
• If an entry is a PIFO, dequeue it, and continue recursively
