
Packet Transactions: High-Level Programming for Line-Rate Switches



  1. Packet Transactions: High-Level Programming for Line-Rate Switches
     Anirudh Sivaraman, Alvin Cheung, Mihai Budiu, Changhoon Kim, Mohammad Alizadeh, Hari Balakrishnan, George Varghese, Nick McKeown, Steve Licking

  2. Programmability at line rate
     • Programmable: Can we express new data-plane algorithms?
       • Active queue management
       • Congestion control
       • Measurement
       • Load balancing
     • Line rate: Highest capacity supported by dedicated hardware

  3. Programmable switching chips
     Same performance as fixed-function chips, some programmability. E.g., FlexPipe, Xpliant, Tofino.
     [Figure: switch architecture. In -> Parser (Eth, VLAN, IPv4, IPv6, TCP, New) -> Ingress pipeline (match/action Stage 1 ... Stage 16) -> Queues/Scheduler -> Egress pipeline (match/action Stage 1 ... Stage 16) -> Deparser -> Out]

  4. Where do programmable switches fall short?
     • Hard to program data-plane algorithms today
       • Hardware good for stateless tasks (forwarding), not stateful ones (AQM)
       • Low-level languages (P4, POF)
     • Challenges
       • Can we program data-plane algorithms in a high-level language?
       • Can we design a stateful instruction set supporting these algorithms?

  5. Contributions
     • Packet transaction: High-level abstraction for data-plane algorithms
       • Examples of several algorithms as packet transactions
     • Atoms: A representation for switch instruction sets
       • Seven concrete stateful instructions
     • Compiler from packet transactions to atoms
       • Allows us to iteratively design switch instruction sets

  6. Packet transactions
     • Packet transaction: block of imperative code
     • Transaction runs to completion, one packet at a time, serially

     if (count == 9):
         pkt.sample = pkt.src
         count = 0
     else:
         pkt.sample = 0
         count++

     [Figure: count is persistent state; pkt.sample and pkt.src are packet fields. count steps through 0, 1, 2, ..., 9, 0 as packets p1, p2, ..., p10 arrive: p1.sample = 0, p2.sample = 0, ..., p10.sample = 1.2.3.4]
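The sampling transaction on this slide can be mimicked in ordinary Python to see the "one packet at a time, serially" semantics. This is only an illustrative sketch: the `Packet` and `SamplingTransaction` names are made up for the example, and the slide's `count++` becomes `count += 1`.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical packet with the two fields the slide uses."""
    src: str
    sample: object = 0  # 0 when not sampled, otherwise the source address


class SamplingTransaction:
    """Holds the persistent state (count) across packets."""

    def __init__(self):
        self.count = 0

    def process(self, pkt):
        # The body runs to completion for one packet before the next starts.
        if self.count == 9:
            pkt.sample = pkt.src
            self.count = 0
        else:
            pkt.sample = 0
            self.count += 1
        return pkt


txn = SamplingTransaction()
packets = [txn.process(Packet(src="1.2.3.4")) for _ in range(20)]
samples = [p.sample for p in packets]
# 1-based positions of the packets that were sampled
sampled_positions = [i + 1 for i, s in enumerate(samples) if s != 0]
print(sampled_positions)
```

With 20 identical packets, the 10th and 20th are the ones whose `sample` field carries the source address, matching `p10.sample = 1.2.3.4` on the slide.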

  7. Under the hood …
     [Figure: pipeline of match/action stages: Stage 1, Stage 2, ..., Stage 16]

  8. A machine model for line-rate switches
     [Figure: pipeline of stages (Stage 1, Stage 2, ..., Stage 16), each containing an action unit and a state unit; the packet header flows through the pipeline]

  9. A machine model for line-rate switches
     Typical requirement: 1 pkt / nanosecond
     [Figure: same pipeline of action units and state units, Stage 1, Stage 2, ..., Stage 16]

  10. A machine model for line-rate switches
     [Figure: action units and state units per stage, Stage 1, Stage 2, ..., Stage 16]

  11. A machine model for line-rate switches
     • Atom: smallest unit of atomic packet/state update
     • A switch's atoms constitute its instruction set
     [Figure: action units and state units per stage (Stage 1, Stage 2, ..., Stage 16); inset shows an example atom with state X, a constant, Add and Mul units, and a 2-to-1 mux controlled by a choice input]

  12. Stateless vs. stateful operations
     Stateless operation: pkt.f4 = pkt.f1 + pkt.f2 - pkt.f3
     Split across two pipeline stages:
         pkt.tmp = pkt.f1 + pkt.f2
         pkt.f4 = pkt.tmp - pkt.f3
     Can pipeline stateless operations
     [Figure: packet fields f1, f2, f3, tmp, f4 moving through the stages; tmp = f1 + f2, then f4 = tmp - f3]

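  13. Stateless vs. stateful operations
     Stateful operation: x = x + 1
     Split across pipeline stages:
         pkt.tmp = x
         pkt.tmp++
         x = pkt.tmp
     X should be 2, not 1!
     [Figure: two back-to-back packets. Both read x = 0 into tmp before either writes back, so each writes tmp = 1 and x ends at 1 instead of 2]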

  14. Stateless vs. stateful operations
     Stateful operation: x = x + 1
     [Figure: a single stage performs X++ atomically]
     Cannot pipeline, need atomic operation in h/w
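The lost update behind "X should be 2, not 1!" can be reproduced with a small tick-by-tick simulation. The sketch below assumes a three-stage split of `x = x + 1` (read, increment, write back), one new packet per tick, and that every stage sees the value of `x` from the start of the tick. The function names are invented for the example.

```python
def pipelined_increment(num_packets):
    """x = x + 1 split across three stages, one new packet entering per tick."""
    x = 0
    tmp = {}  # packet index -> that packet's pkt.tmp field
    for tick in range(num_packets + 2):
        first, second, third = tick, tick - 1, tick - 2  # packets in stages 1, 2, 3
        x_at_start = x  # all stages act on the state as it was when the tick began
        if 0 <= third < num_packets:
            x = tmp[third]            # stage 3: x = pkt.tmp
        if 0 <= second < num_packets:
            tmp[second] += 1          # stage 2: pkt.tmp++
        if 0 <= first < num_packets:
            tmp[first] = x_at_start   # stage 1: pkt.tmp = x
    return x


def atomic_increment(num_packets):
    """x = x + 1 done as one atomic read-modify-write per packet."""
    x = 0
    for _ in range(num_packets):
        x += 1
    return x


print(pipelined_increment(2), atomic_increment(2))
```

For two back-to-back packets the pipelined version ends with `x == 1`, because the second packet reads `x` before the first has written it back, while the atomic version ends with `x == 2`.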

  15. Stateful atoms can be fairly involved
     • Update state in one of four ways based on four predicates.
     • Each predicate can itself depend on the state.
     [Figure: atom circuit diagram built from state x, packet fields pkt_1 and pkt_2, constants, adders, subtractors, relational operators (RELOP), and 2-to-1 and 3-to-1 muxes]

  16. Compiling packet transactions
     Packet Sampling Algorithm:
         if (count == 9):
             pkt.sample = pkt.src
             count = 0
         else:
             pkt.sample = 0
             count++

     Compiler ->

     Packet Sampling Pipeline:
         Stage 1:
             pkt.old = count;
             pkt.tmp = pkt.old == 9;
             pkt.new = pkt.tmp ? 0 : (pkt.old + 1);
             count = pkt.new;
         Stage 2:
             pkt.sample = pkt.tmp ? pkt.src : 0

     [Figure: the pipeline is laid out on Stage 1, Stage 2, ..., Stage 16 of the switch]
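One way to convince yourself that the two-stage pipeline computes the same thing as the original algorithm is to run both over the same packets. The sketch below is a plain Python model, not the compiler's output format: packets are dictionaries, Stage 1 is treated as a single atomic step per packet (it holds every access to `count`), and Stage 2 runs one tick later.

```python
def transaction(count, pkt):
    """Original packet sampling algorithm; returns the new count."""
    if count == 9:
        pkt["sample"] = pkt["src"]
        return 0
    pkt["sample"] = 0
    return count + 1


def stage1(count, pkt):
    """Stage 1 of the pipeline: all state accesses, executed atomically."""
    pkt["old"] = count
    pkt["tmp"] = pkt["old"] == 9
    pkt["new"] = 0 if pkt["tmp"] else pkt["old"] + 1
    return pkt["new"]  # count = pkt.new


def stage2(pkt):
    """Stage 2 of the pipeline: stateless, uses only packet fields."""
    pkt["sample"] = pkt["src"] if pkt["tmp"] else 0


def run_serial(sources):
    count, out = 0, []
    for src in sources:
        pkt = {"src": src}
        count = transaction(count, pkt)
        out.append(pkt["sample"])
    return out


def run_pipeline(sources):
    count, in_stage2, out = 0, None, []
    for src in list(sources) + [None]:  # one extra tick drains the pipeline
        if in_stage2 is not None:       # older packet is in stage 2 this tick
            stage2(in_stage2)
            out.append(in_stage2["sample"])
        in_stage2 = None
        if src is not None:             # newer packet is in stage 1 this tick
            pkt = {"src": src}
            count = stage1(count, pkt)
            in_stage2 = pkt
    return out


sources = [f"10.0.0.{i}" for i in range(1, 26)]
print(run_serial(sources) == run_pipeline(sources))
```

Stage 2 can lag by a tick without changing the result because everything it needs (`pkt.tmp`, `pkt.src`) travels with the packet rather than living in switch state.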

  17. Designing programmable switches
     [Flowchart: Algorithm, pipeline geometry, and atom feed into the Compiler]
     • Algorithm doesn't compile? Modify pipeline geometry or atom.
     • Algorithm compiles? Move on to another algorithm.
     Focus on stateful atoms; stateless operations are easily pipelined.

  18. Demo

  19. Stateful atoms for programmable switches
     Listed from least expressive (top) to most expressive (bottom):

     Atom        Description
     R/W         Read or write state
     RAW         Read, add, and write back
     PRAW        Predicated version of RAW
     IfElseRAW   2 RAWs, one each when a predicate is true or false
     Sub         IfElseRAW with a stateful subtraction capability
     Nested      4-way predication (nests 2 IfElseRAWs)
     Pairs       Update a pair of state variables
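To make the first four rows of the table concrete, here is a rough Python reading of them. The slides give only one-line descriptions, so the signatures below are assumptions made for illustration. In particular, letting each IfElseRAW branch either add to or overwrite the state (the `("add", v)` / `("write", v)` encoding) is a guess that happens to fit the sampling counter. Sub, Nested, and Pairs are left out.

```python
def rw(state, write_value=None):
    """R/W: read the state and optionally overwrite it.

    Returns (new_state, value_read).
    """
    new_state = state if write_value is None else write_value
    return new_state, state


def raw(state, addend):
    """RAW: read, add, and write back."""
    return state + addend


def praw(state, predicate, addend):
    """PRAW: RAW that only fires when the predicate on the state holds."""
    return state + addend if predicate(state) else state


def if_else_raw(state, predicate, on_true, on_false):
    """IfElseRAW: one update when the predicate is true, another when false.

    Each update is ("add", v) or ("write", v); this encoding is an assumption.
    """
    op, value = on_true if predicate(state) else on_false
    return state + value if op == "add" else value


# The sampling counter: if (count == 9) count = 0 else count++
count = 0
history = []
for _ in range(12):
    count = if_else_raw(count, lambda c: c == 9, ("write", 0), ("add", 1))
    history.append(count)
print(history)
```

Under this reading, the counter update needs both branches, which PRAW alone cannot express since it leaves the state untouched when its predicate is false.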

  20. Expressiveness of packet transactions

     Algorithm                 LOC
     Bloom filter              29
     Heavy hitter detection    35
     Rate-Control Protocol     23
     Flowlet switching         37
     Sampled NetFlow           18
     HULL                      26
     Adaptive Virtual Queue    36
     CONGA                     32
     CoDel                     57

  21. Compilation results

     Algorithm                 LOC   Most expressive stateful atom required
     Bloom filter              29    R/W
     Heavy hitter detection    35    RAW
     Rate-Control Protocol     23    PRAW
     Flowlet switching         37    PRAW
     Sampled NetFlow           18    IfElseRAW
     HULL                      26    Sub
     Adaptive Virtual Queue    36    Nested
     CONGA                     32    Pairs
     CoDel                     57    Doesn't map

  22. Compilation results

     Algorithm                 LOC   Most expressive stateful atom required   Pipeline depth   Pipeline width
     Bloom filter              29    R/W                                      4                3
     Heavy hitter detection    35    RAW                                      10               9
     Rate-Control Protocol     23    PRAW                                     6                2
     Flowlet switching         37    PRAW                                     3                3
     Sampled NetFlow           18    IfElseRAW                                4                2
     HULL                      26    Sub                                      7                1
     Adaptive Virtual Queue    36    Nested                                   7                3
     CONGA                     32    Pairs                                    4                2
     CoDel                     57    Doesn't map                              15               3

     ~100 atom instances are sufficient

  23. Modest cost for programmability
     • All atoms meet timing at 1 GHz in a 32-nm library.
     • They occupy modest additional area relative to a switching chip.

     Atom        Description                                           Atom area (µm^2)   Area for 100 atoms relative to 200 mm^2 chip
     R/W         Read or write state                                   250                0.0125%
     RAW         Read, add, and write back                             431                0.022%
     PRAW        Predicated version of RAW                             791                0.039%
     IfElseRAW   2 RAWs, one each when a predicate is true or false    985                0.049%
     Sub         IfElseRAW with a stateful subtraction capability      1522               0.076%
     Nested      4-way predication (nests 2 IfElseRAWs)                3597               0.179%
     Pairs       Update a pair of state variables                      5997               0.30%

     <1% additional area for 100 atom instances
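The last column follows from the atom areas by unit conversion: 100 instances of an atom, converted from µm^2 to mm^2, divided by the 200 mm^2 chip area. A short check of that arithmetic, using only the numbers on the slide (the function name is made up for the example):

```python
UM2_PER_MM2 = 1_000_000  # 1 mm^2 = 10^6 µm^2

ATOM_AREA_UM2 = {
    "R/W": 250,
    "RAW": 431,
    "PRAW": 791,
    "IfElseRAW": 985,
    "Sub": 1522,
    "Nested": 3597,
    "Pairs": 5997,
}


def area_overhead_percent(atom_area_um2, instances=100, chip_area_mm2=200):
    """Area of `instances` atoms as a percentage of the chip area."""
    total_mm2 = atom_area_um2 * instances / UM2_PER_MM2
    return 100 * total_mm2 / chip_area_mm2


overheads = {name: area_overhead_percent(a) for name, a in ATOM_AREA_UM2.items()}
for name, pct in overheads.items():
    print(f"{name:10s} {pct:.4f}%")
```

For R/W this gives 250 × 100 / 10^6 = 0.025 mm^2, which is 0.0125% of 200 mm^2. The other rows agree with the slide to within its rounding, and even 100 instances of the largest atom (Pairs) stay well under 1%.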

  24. Conclusion
     • Packet transactions: an abstraction for data-plane algorithms
     • Atoms: a representation for switch instruction sets
     • A blueprint for designing switch instruction sets
     • Source code: http://web.mit.edu/domino

  25. Backup slides

  26. Sequential to pipelined code
     Step: Create one node for each instruction
         pkt.old = count
         pkt.tmp = pkt.old == 9
         pkt.new = pkt.tmp ? 0 : (pkt.old + 1)
         pkt.sample = pkt.tmp ? pkt.src : 0
         count = pkt.new

  27. Sequential to pipelined code
     Step: Packet field dependencies
         pkt.old = count
         pkt.tmp = pkt.old == 9
         pkt.new = pkt.tmp ? 0 : (pkt.old + 1)
         pkt.sample = pkt.tmp ? pkt.src : 0
         count = pkt.new

  28. Sequential to pipelined code
     Step: State dependencies
         pkt.old = count
         pkt.tmp = pkt.old == 9
         pkt.new = pkt.tmp ? 0 : (pkt.old + 1)
         pkt.sample = pkt.tmp ? pkt.src : 0
         count = pkt.new

  29. Sequential to pipelined code
     Step: Strongly connected components
         pkt.old = count
         pkt.tmp = pkt.old == 9
         pkt.new = pkt.tmp ? 0 : (pkt.old + 1)
         pkt.sample = pkt.tmp ? pkt.src : 0
         count = pkt.new

  30. Sequential to pipelined code
     Step: Condensed DAG
         Node 1:
             pkt.old = count
             pkt.tmp = pkt.old == 9
             pkt.new = pkt.tmp ? 0 : (pkt.old + 1);
             count = pkt.new
         Node 2:
             pkt.sample = pkt.tmp ? pkt.src : 0

  31. Sequential to pipelined code
     Step: Code pipelining
         Stage 1:
             pkt.old = count;
             pkt.tmp = pkt.old == 9;
             pkt.new = pkt.tmp ? 0 : (pkt.old + 1);
             count = pkt.new;
         Stage 2:
             pkt.sample = pkt.tmp ? pkt.src : 0
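The backup slides walk through the steps from sequential code to a pipeline: one node per instruction, packet-field dependencies, state dependencies, strongly connected components, a condensed DAG, and finally stage assignment. A compact Python sketch of that sequence follows. The read/write sets are written out by hand, the state dependency is modeled as a pair of edges between the read and the write of `count` (an assumption about how the cycle is formed), and the components are found by a simple reachability fixpoint rather than a dedicated SCC algorithm.

```python
STATE = {"count"}

# (instruction text, variables read, variables written), in program order
INSTRUCTIONS = [
    ("pkt.old = count", {"count"}, {"pkt.old"}),
    ("pkt.tmp = pkt.old == 9", {"pkt.old"}, {"pkt.tmp"}),
    ("pkt.new = pkt.tmp ? 0 : (pkt.old + 1)", {"pkt.tmp", "pkt.old"}, {"pkt.new"}),
    ("pkt.sample = pkt.tmp ? pkt.src : 0", {"pkt.tmp", "pkt.src"}, {"pkt.sample"}),
    ("count = pkt.new", {"pkt.new"}, {"count"}),
]


def build_edges(instrs):
    edges = set()
    for i, (_, reads_i, writes_i) in enumerate(instrs):
        for j, (_, reads_j, writes_j) in enumerate(instrs):
            # Packet field dependency: j reads a field that i wrote earlier.
            if i < j and (writes_i - STATE) & reads_j:
                edges.add((i, j))
            # State dependency: tie the read and the write of a state variable
            # together with edges in both directions, forming a cycle.
            if i != j and (reads_i & STATE) & (writes_j & STATE):
                edges.add((i, j))
                edges.add((j, i))
    return edges


def reachable(n, edges):
    """reach[i] = set of nodes reachable from i (including i)."""
    reach = [{i} for i in range(n)]
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            new = reach[b] - reach[a]
            if new:
                reach[a] |= new
                changed = True
    return reach


def assign_stages(instrs):
    n, edges = len(instrs), build_edges(instrs)
    reach = reachable(n, edges)
    # Strongly connected component of i: nodes mutually reachable with i.
    comp = [frozenset(j for j in reach[i] if i in reach[j]) for i in range(n)]
    # Longest-path layering of the condensed DAG gives the stage numbers.
    stage = {c: 1 for c in set(comp)}
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            if comp[a] != comp[b] and stage[comp[b]] < stage[comp[a]] + 1:
                stage[comp[b]] = stage[comp[a]] + 1
                changed = True
    pipeline = {}
    for i, (text, _, _) in enumerate(instrs):
        pipeline.setdefault(stage[comp[i]], []).append(text)
    return pipeline


pipeline = assign_stages(INSTRUCTIONS)
for stage_number in sorted(pipeline):
    print(f"Stage {stage_number}: " + "; ".join(pipeline[stage_number]))
```

On the sampling example, the four instructions that touch `count` or feed its update collapse into one component and land in Stage 1, while the stateless `pkt.sample` assignment lands in Stage 2, the same split shown on the code pipelining slide.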

  32. Hardware constraints
         Stage 1:
             pkt.old = count;
             pkt.tmp = pkt.old == 9;
             pkt.new = pkt.tmp ? 0 : (pkt.old + 1);
             count = pkt.new;
         Stage 2:
             pkt.sample = pkt.tmp ? pkt.src : 0
     [Figure: the two stages must fit onto the hardware pipeline, Stage 1, Stage 2, ..., Stage 16]

  33. Hardware constraints: example
     [Figure: atom with state X, constant 1, Add and Mul units, and a 2-to-1 mux whose choice input selects Add]
     • x = x + 1 maps to this atom
     • x = x * x doesn't map
     • Determines if algorithm can/cannot run at line rate
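The difference between the two updates can be illustrated with a toy model of the atom in the figure. The model assumes the atom computes either `x + c` or `x * c` for a configurable constant `c`, with the mux choosing between the two; the brute-force search over configurations is a stand-in written for this example, since the slides do not describe how the compiler actually decides whether an update maps.

```python
from itertools import product

CHOICES = ("add", "mul")        # which unit the 2-to-1 mux selects
CONSTANTS = range(-4, 5)        # small search space, enough for the example
TEST_INPUTS = range(0, 8)       # state values used to compare behaviors


def atom(x, choice, constant):
    """Toy atom: state combined with a constant by Add or Mul."""
    return x + constant if choice == "add" else x * constant


def find_configuration(update):
    """Return a (choice, constant) that reproduces `update`, or None."""
    for choice, constant in product(CHOICES, CONSTANTS):
        if all(atom(x, choice, constant) == update(x) for x in TEST_INPUTS):
            return choice, constant
    return None


increment_config = find_configuration(lambda x: x + 1)
square_config = find_configuration(lambda x: x * x)
print(increment_config, square_config)
```

`x = x + 1` is matched by selecting Add with constant 1. `x = x * x` has no matching configuration in this model because the multiplier's second operand is a fixed constant, not the state itself.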

  34. Our work
     Packet transaction in Domino:
         For each packet
             Calculate average queue size
             if min < avg < max
                 calculate probability p
                 mark packet with probability p
             else if avg > max
                 mark packet

     Compiler -> pipeline of match/action stages (Stage 1, Stage 2, ..., Stage 16)

     Program in imperative DSL, compile to run at line-rate
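The pseudocode on this slide has the shape of a RED-style marking algorithm. The sketch below fills in the parts the slide leaves open with common textbook choices, all of which are assumptions: an exponentially weighted moving average for the queue size, a marking probability that grows linearly between the two thresholds, and arbitrary parameter values. It is written in Python, not Domino, and the random draw is passed in so the behavior is reproducible.

```python
def mark_packet(state, queue_len, rand,
                min_th=5.0, max_th=15.0, max_p=0.1, weight=0.2):
    """Run the per-packet logic once; return True if the packet is marked.

    state  -- dict holding the persistent average queue size under "avg"
    rand   -- a number in [0, 1), e.g. from random.random()
    All thresholds and weights are illustrative values, not from the slides.
    """
    # Calculate average queue size (assumed: exponential moving average).
    state["avg"] = (1 - weight) * state["avg"] + weight * queue_len
    avg = state["avg"]

    if min_th < avg < max_th:
        # Calculate probability p (assumed: linear between the thresholds).
        p = max_p * (avg - min_th) / (max_th - min_th)
        return rand < p          # mark packet with probability p
    elif avg > max_th:
        return True              # mark packet
    return False


idle = mark_packet({"avg": 0.0}, queue_len=0, rand=0.0)
congested = mark_packet({"avg": 100.0}, queue_len=100, rand=0.99)
middle_low_draw = mark_packet({"avg": 10.0}, queue_len=10, rand=0.01)
middle_high_draw = mark_packet({"avg": 10.0}, queue_len=10, rand=0.9)
print(idle, congested, middle_low_draw, middle_high_draw)
```

The persistent `avg` plays the role that `count` plays in the sampling example: it is the state that a stateful atom would have to read and update atomically for each packet.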
