  1. THE DEFORESTATION OF L2 James McCauley, Mingjie Zhao, Ethan J. Jackson, Barath Raghavan, Sylvia Ratnasamy, Scott Shenker UC Berkeley, UESTC, and ICSI

  2. The Talk
     • What is AXE?
     • Why look at this?
     • How does it work?
     • Really? This actually works?

  3. The What
     • A redesign of L2 to replace Ethernet and the Spanning Tree Protocol (and its variants)
     • Targets are “normal” enterprise networks, machine rooms, small private DCs
       – Not the Googles, Microsofts, Rackspaces
       – Not networks with incredibly high utilization
       – Not networks managed by a full-time team of experts

  4. The What: Goals
     • Plug-and-play
       – If not, might as well just use L3
     • Use all links for shortest paths
       – Number one shortcoming of STP
     • Fast recovery from failure
       – Number two shortcoming of STP?

  5. The What: Goals
     • Plug-and-play
       – If not, might as well just use L3
     • Use all links for shortest paths
       – Number one shortcoming of STP
     • Fast → Packet-timescale recovery from failure
       – Number two shortcoming of STP?

  6. The What: Assumptions
     • Failure detection can be fast
       – Not traditionally the bottleneck: control-plane “hellos” were sufficient
       – Now need interrupt-driven LFS, BFD, etc.
     • There’s a market for flood-and-learn L2
       – Flood-and-learn has security implications
       – No heavy unidirectional traffic
     • No multi-access links
       – Everything is point-to-point

  7. The Why: Is L2 still a problem?
     • Still many largely-unmanaged, small/medium L2 networks!
       – Two in our building in Berkeley!
     • There have been a few interesting developments…
       – SPB, TRILL, SEATTLE, etc.
       – Provide various tradeoffs
     • AXE attempts to strike a different balance
       – Focus on two key problems
       – Keeping things as simple as possible (no control plane)

  8. The Why: Context

                      Plug-and-play   Shortest Paths   Fast Recovery   No Control Plane
      STP                   ✓               ✗                ✗                ✗
      No STP (Tree)         ✓               ✓                ✗                ✓
      TRILL/SPB             ✓               ✓                ✗                ✗
      IP (L3)               ✗               ✓                ✗                ✗
      Custom                ✗               ✓                ✓                ?
      AXE                   ✓               ✓                ✓                ✓

  9. The How: Extend Ethernet
     • Basic flood/learn Ethernet
       – When you see a packet: learn
       – When you don’t know what to do: flood
     • But AXE does not need a tree to deal with loops
       – Means flooding works for handling failures too (because alternate paths are immediately available)
       – Means that flood/learn finds short paths (because you haven’t removed links)

  10. The How: Treeless flooding
     • How do you get around the loop problem? Duplicate-packet detection
       – Multiple ways of doing it
     • Our focus: hash-based deduplication filter (see the sketch below)
       – In short: a hash table where you replace upon collision
       – Straightforward
       – Amenable to hardware/P4 implementation
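
A minimal sketch of such a filter in Python: a fixed-size table indexed by a hash of a per-packet key, overwriting whatever is in the slot on a collision. The key choice and table size here are illustrative assumptions, not the paper's exact parameters.

```python
# Hash-based deduplication filter: a fixed-size table where a colliding
# insert simply replaces the old entry. Key = (origin, nonce) is an assumed
# stand-in for whatever uniquely identifies a packet in AXE.
class DedupFilter:
    def __init__(self, slots=1024):
        self.table = [None] * slots

    def is_duplicate(self, origin, nonce):
        """Return True if this (origin, nonce) pair is still in the table."""
        key = (origin, nonce)
        idx = hash(key) % len(self.table)
        if self.table[idx] == key:
            return True            # seen recently: caller should drop
        self.table[idx] = key      # replace upon collision
        return False
```

Because collisions evict older entries, an occasional duplicate can slip past a filter like this; replace-on-collision trades perfect suppression for a small, simple table.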

  11. The How: What changes?
     • Learning is more subtle
       – Source address seen on multiple ports
       – Packets may even be going backwards!
     • Responding to failures is more subtle
       – Means we have to unlearn (outdated) state

  12. The How: Extend Header
     • Extend the packet header between switches (sketched below)
       – Nonce (per-switch sequence number): used for packet deduplication
       – Hop count: influences learning, also protects from loops
       – Flooded flag (F): tracks whether a packet is being flooded
       – Learnable flag (L): tracks whether a packet can be learned from
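
For reference, the four fields rendered as a minimal Python sketch; the slide gives no field widths, so plain types stand in for real bit fields.

```python
from dataclasses import dataclass

@dataclass
class AxeHeader:
    nonce: int        # per-switch sequence number, used for deduplication
    hop_count: int    # influences learning; also protects from loops
    flooded: bool     # F: the packet is being flooded
    learnable: bool   # L: the packet can be learned from
```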

  13. The How: Separate queues
     • Switches have a flood queue and a normal queue (sketched below)
       – The Flooded flag in the header determines which
       – Flood queue has higher priority and is shorter
       – Normal queue sized… normally
     • Intuition:
       – Delivering floods quickly stops flooding quickly
       – Deduplication only applies to floods; keeping fewer floods in flight makes dedup easier
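
A small sketch of the queue split, assuming strict-priority service and illustrative queue limits (the slide specifies neither):

```python
from collections import deque

FLOOD_LIMIT, NORMAL_LIMIT = 16, 256    # assumed sizes: flood queue is short

flood_q: deque = deque()
normal_q: deque = deque()

def enqueue(hdr, pkt):
    """Classify on the header's Flooded flag; tail-drop when a queue is full."""
    q, limit = (flood_q, FLOOD_LIMIT) if hdr.flooded else (normal_q, NORMAL_LIMIT)
    if len(q) < limit:
        q.append(pkt)

def dequeue():
    """Strict priority: drain floods first, so flooding stops quickly."""
    if flood_q:
        return flood_q.popleft()
    return normal_q.popleft() if normal_q else None
```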

  14. The How: Overview
     • Extend packet header
       – Nonce, Hop Count, Flooded/Learnable flags
     • Learning/Unlearning phase (see the combined sketch below)
       – May learn port and hop count (HC) to src
       – May unlearn path to dst if trouble was observed
     • Output phase
       – If packet is a duplicate: drop
       – If unknown dst / path failed / already flooding: flood
       – Otherwise forward according to table
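
Putting the two phases together, a simplified per-packet pipeline building on the DedupFilter and AxeHeader sketches above. The exact learning and unlearning conditions are assumptions distilled from these slides, not the paper's precise algorithm; pkt is assumed to expose src and dst addresses.

```python
def axe_process(table, dedup, pkt, hdr, in_port, live_ports):
    """table: dst address -> (port, hop_count). Returns list of output ports."""
    # Learning/unlearning phase
    if hdr.learnable:
        known = table.get(pkt.src)
        if known is None or hdr.hop_count <= known[1]:
            table[pkt.src] = (in_port, hdr.hop_count)    # learn port and HC to src
    if hdr.flooded and not hdr.learnable:
        table.pop(pkt.dst, None)                         # trouble observed: unlearn dst

    # Output phase
    if dedup.is_duplicate(pkt.src, hdr.nonce):
        return []                                        # duplicate: drop
    known = table.get(pkt.dst)
    if known is None or known[0] not in live_ports or hdr.flooded:
        hdr.flooded = True                               # unknown dst / failed / flooding
        return [p for p in live_ports if p != in_port]   # flood out all other ports
    return [known[0]]                                    # forward according to table
```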

  15. The How: Example
     • A sends a packet to B (L:True)
       – Destination B unknown; packet flooded from first hop (F:True)
       – All switches learn how to reach A
     • B sends to A (L:True)
       – Direct path following table entries to A (F:False)
       – Switches along path learn how to reach B
     • Link fails
     • A sends another packet to B (L:True)
       – Follows along path… (F:False)
       – …until it hits failure (L:False, F:True)
       – Switch floods packet out all ports (even backwards)
       – Flooded packet reaches B (successful delivery!)
       – Another duplicate of the flooded packet reaches A’s first hop; unlearn B
     • A sends another packet to B (L:True)
       – Destination B unknown; packet flooded from first hop (F:True)
       – All switches learn how to reach A again

  16. Really? Preliminaries
     • How much flooding do failures cause?
       – [Bar chart: flooding rates for Campus and Cluster topologies, all under ~0.2%]
     • How big does the deduplication filter need to be?
       – Less than 1,000 entries in our simulations
     • Does it recover from overload?
       – Yes*

  17. Really? Overview
     • Thinking back to that matrix…
       – We want plug-and-play
       – We want to support shortest paths using all links
       – We don’t want to have a control plane
       – We want packet-timescale recovery from failures

  18. Really? Failure benchmark
     • Omniscient, randomized, shortest-path routing
     • Failure → Adjustable delay → Fix routes
       – Delay of zero is optimal routing / an upper bound
     • Nonzero delay meant to roughly simulate…
       – OSPF, IS-IS, TRILL, SPB, etc.
       – …without needing to model each one in detail
     • Random shortest-cost tree rooted at each destination
     • Note: we don’t compare ourselves to STP at all

  19. Really? Failure recovery – UDP
     • Send traffic on a network with a high failure rate
     • Metric is unnecessary delivery failures
       – Packets that weren’t delivered even though optimal routing could have delivered them
     • AXE has no unnecessary delivery failures

      Recovery delay   Delivery Failures (Campus)   Delivery Failures (Cluster)
      160ms            0.27%                        0.021%
      80ms             0.19%                        0.011%
      40ms             0.13%                        0.008%
      20ms             0.08%                        0.005%
      10ms             0.05%                        0.003%
      5ms              0.03%                        0.002%
      AXE              0.00%                        0.000%

  20. Really? Failure recovery – TCP
     • Similar setup, but with TCP
     • Metric is number of flows with significantly worse FCT (flow completion time) than optimal routing
     • AXE has no significantly worse FCTs

      Recovery delay   Delayed FCTs (Campus)   Delayed FCTs (Cluster)
      160ms            0.25%                   0.036%
      80ms             0.11%                   0.020%
      40ms             0.14%                   0.015%
      20ms             0.06%                   0.010%
      10ms             0.01%                   0.007%
      5ms              0%                      0.002%
      AXE              0%                      0%

  21. The End: Not Mentioned Here
     • Multicast AXE
       – On any change (failure, join): flood+dedup and prune
       – Flooded packets have all data needed to build the tree
     • AXE with Hedera
       – Use AXE for mice & recovery
       – Centralized SDN routing for elephant flows
     • P4 implementation
       – AXE is expressible in P4
       – Performance on real hardware is an open question

  22. THE END
