THE DEFORESTATION OF L2 James McCauley, Mingjie Zhao, Ethan J. - - PowerPoint PPT Presentation



SLIDE 1

THE DEFORESTATION OF L2

James McCauley, Mingjie Zhao, Ethan J. Jackson, Barath Raghavan, Sylvia Ratnasamy, Scott Shenker UC Berkeley, UESTC, and ICSI

SLIDE 2

The Talk

  • What is AXE?
  • Why look at this?
  • How does it work?
  • Really? This actually works?

SLIDE 3

The What

  • A redesign of L2 to replace Ethernet and Spanning Tree Protocol (and its variants)
  • Targets are “normal” enterprise networks, machine rooms, small private DCs

– Not the Googles, Microsofts, Rackspaces
– Not networks with incredibly high utilization
– Not networks managed by a full-time team of experts

SLIDE 4

The What: Goals

  • Plug-and-play

– If not, might as well just use L3

  • Use all links for shortest paths

– Number one shortcoming of STP

  • Fast recovery from failure

– Number two shortcoming of STP?

SLIDE 5

The What: Goals

  • Plug-and-play

– If not, might as well just use L3

  • Use all links for shortest paths

– Number one shortcoming of STP

  • Packet-timescale recovery from failure

– Number two shortcoming of STP?

SLIDE 6

The What: Assumptions

  • Failure detection can be fast

– Not traditionally the bottleneck

  • Control plane “hellos” were sufficient

– Need interrupt-driven LFS, BFD, etc.

  • There’s a market for flood-and-learn L2

– Flood/learn has security implications
– No heavy unidirectional traffic

  • No multi-access links

– Everything is point-to-point

SLIDE 7

The Why: Is L2 still a problem?

  • Still many largely-unmanaged, small/med L2 networks!

– Two in our building in Berkeley!

  • There have been a few interesting developments…

– SPB, TRILL, SEATTLE, etc.
– Provide various tradeoffs

  • AXE attempts to strike a different balance

– Focus on two key problems
– Keeping things as simple as possible (no control plane)

SLIDE 8

The Why: Context

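[Comparison table. Columns: Plug-and-play, Shortest Paths, Fast Recovery, No Control Plane. Rows: STP, No STP (Tree), TRILL/SPB, IP (L3), Custom, AXE. AXE is checked (✓) in all four columns; the column positions of the other rows' marks are not recoverable here.]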

SLIDE 9

The How: Extend Ethernet

  • Basic flood/learn Ethernet

– When you see a packet: learn
– When you don’t know what to do: flood

  • But AXE does not need a tree to deal with loops

– Means flooding works for handling failures too

  • (because alternate paths are immediately available)

– Means that flood/learn finds short paths

  • (because you haven’t removed links)
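The basic flood/learn rule above can be sketched in a few lines. This is only an illustration of classic Ethernet learning, not of AXE itself; the class name `LearningSwitch` and its `handle` method are hypothetical names chosen for this sketch.

```python
class LearningSwitch:
    """Classic flood-and-learn bridge (illustrative sketch, not AXE)."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}  # MAC address -> port it was last seen on

    def handle(self, src, dst, in_port):
        """Return the list of ports the frame should be sent out of."""
        # When you see a packet: learn where its source lives.
        self.table[src] = in_port
        out = self.table.get(dst)
        if out is None:
            # When you don't know what to do: flood to every other port.
            return [p for p in self.ports if p != in_port]
        # A destination that lives on the ingress port needs no forwarding.
        return [] if out == in_port else [out]


sw = LearningSwitch(ports=[1, 2, 3])
first = sw.handle("A", "B", in_port=1)  # B is unknown, so the frame is flooded
reply = sw.handle("B", "A", in_port=3)  # A was learned on port 1
```

On a topology with loops, the flood branch would circulate copies indefinitely, which is why plain Ethernet relies on a spanning tree and why AXE needs another way to stop duplicates.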

SLIDE 10

The How: Treeless flooding

  • How do you get around the loop problem?
  • Duplicate-packet-detection
  • Multiple ways of doing it
  • Our focus: hash-based deduplication filter

– In short: hash table where you replace upon collision
– Straightforward
– Amenable to hardware/P4 implementation
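A minimal sketch of a "replace upon collision" filter follows. The key format, the table size, and the use of CRC32 as the hash are assumptions made for illustration; the slides do not specify any of them.

```python
import zlib


class DedupFilter:
    """Fixed-size hash table that overwrites its slot on collision."""

    def __init__(self, size=1024):
        self.slots = [None] * size

    def _index(self, key: bytes) -> int:
        # CRC32 is an arbitrary choice here; any cheap hash would do.
        return zlib.crc32(key) % len(self.slots)

    def seen_before(self, key: bytes) -> bool:
        """Return True if key is currently stored; otherwise store it."""
        i = self._index(key)
        if self.slots[i] == key:
            return True
        # Empty slot or a different key: replace whatever was there.
        self.slots[i] = key
        return False


f = DedupFilter(size=8)
first = f.seen_before(b"switch-A|nonce-1")   # new packet
second = f.seen_before(b"switch-A|nonce-1")  # duplicate of the same packet
```

Because this sketch stores the full key in each slot, it never reports a new packet as a duplicate; the cost of replacement is that an evicted entry is forgotten, so an old duplicate can occasionally slip through.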

SLIDE 11

The How: What changes?

  • Learning is more subtle

– Source address seen on multiple ports
– Packets may even be going backwards!

  • Responding to failures is more subtle

– Means we have to unlearn (outdated) state

SLIDE 12

The How: Extend Header

  • Extend the packet header between switches

– Nonce (per-switch sequence number)

  • Used for packet deduplication

– Hop count

  • Influences learning, also protects from loops

– Flooded flag: F

  • Tracks whether a packet is being flooded

– Learnable flag: L

  • Tracks whether packet can be learned from
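The four header fields can be pictured as a small record. The sketch below assumes a wire layout (32-bit nonce, 8-bit hop count, one flags byte) purely for illustration; the slides name the fields but give no widths or ordering, and `AxeHeader` is a hypothetical name.

```python
import struct
from dataclasses import dataclass

# Assumed layout, network byte order: 32-bit nonce, 8-bit hop count, 8-bit flags.
_LAYOUT = struct.Struct("!IBB")
_FLAG_LEARNABLE = 0x01
_FLAG_FLOODED = 0x02


@dataclass(frozen=True)
class AxeHeader:
    nonce: int        # per-switch sequence number, used for deduplication
    hop_count: int    # influences learning, also protects from loops
    flooded: bool     # F: packet is being flooded
    learnable: bool   # L: packet can be learned from

    def pack(self) -> bytes:
        flags = (_FLAG_FLOODED if self.flooded else 0) | (
            _FLAG_LEARNABLE if self.learnable else 0
        )
        return _LAYOUT.pack(self.nonce, self.hop_count, flags)

    @classmethod
    def unpack(cls, data: bytes) -> "AxeHeader":
        nonce, hop_count, flags = _LAYOUT.unpack(data[: _LAYOUT.size])
        return cls(
            nonce,
            hop_count,
            bool(flags & _FLAG_FLOODED),
            bool(flags & _FLAG_LEARNABLE),
        )


hdr = AxeHeader(nonce=7, hop_count=3, flooded=True, learnable=False)
wire = hdr.pack()
```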

SLIDE 13

The How: Separate queues

  • Switches have flood queue and normal queue

– The Flooded flag in the header determines which
– Flood queue has higher priority and is shorter
– Normal queue sized… normally

  • Intuition:

– Delivering floods quickly stops flooding quickly
– Deduplication only applies to floods, keeping fewer floods in flight makes dedup easier
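One way to picture the two queues is a strict-priority scheduler per output port. The capacities and the tail-drop policy below are assumptions for the sketch, and `PortQueues` is a hypothetical name; the slides only say that the flood queue is shorter and has higher priority.

```python
from collections import deque


class PortQueues:
    """Short high-priority flood queue plus a normally sized queue."""

    def __init__(self, flood_capacity=4, normal_capacity=64):
        self.flood = deque()
        self.normal = deque()
        self.flood_capacity = flood_capacity
        self.normal_capacity = normal_capacity

    def enqueue(self, packet, flooded: bool) -> bool:
        """Queue the packet by its Flooded flag; False means it was dropped."""
        queue = self.flood if flooded else self.normal
        limit = self.flood_capacity if flooded else self.normal_capacity
        if len(queue) >= limit:
            return False  # tail drop (an assumed policy)
        queue.append(packet)
        return True

    def dequeue(self):
        """Serve the flood queue first, then the normal queue."""
        if self.flood:
            return self.flood.popleft()
        if self.normal:
            return self.normal.popleft()
        return None


pq = PortQueues(flood_capacity=1, normal_capacity=2)
accepted = [
    pq.enqueue("n1", flooded=False),
    pq.enqueue("f1", flooded=True),
    pq.enqueue("f2", flooded=True),  # flood queue is already full
]
served = [pq.dequeue(), pq.dequeue(), pq.dequeue()]
```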

SLIDE 14

The How: Overview

  • Extend packet header

– Nonce, Hop Count, Flooded / Learnable flags

  • Learning/Unlearning Phase

– May learn port and HC to src
– May unlearn path to dst if trouble was observed

  • Output Phase

– If packet is a duplicate: drop
– If unknown-dst/path-failed/already-flooding: flood
– Otherwise forward according to table
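The output phase reads as a short decision chain. The sketch below only encodes the order of the three rules listed above; the function name and its boolean parameters are hypothetical, and how each condition is actually computed is left out.

```python
def output_phase(is_duplicate, dst_known, path_failed, already_flooding):
    """Return "drop", "flood", or "forward" for one packet.

    The inputs are assumed to have been computed earlier, for example
    is_duplicate by a deduplication filter applied to flooded packets.
    """
    if is_duplicate:
        return "drop"
    if not dst_known or path_failed or already_flooding:
        return "flood"
    return "forward"


decisions = [
    output_phase(True, True, False, True),     # duplicate of a flooded packet
    output_phase(False, False, False, False),  # destination not in the table
    output_phase(False, True, True, False),    # next hop toward dst is down
    output_phase(False, True, False, False),   # normal case
]
```

Checking for duplicates first matters: a flooded packet that has already been seen must be dropped even though it would otherwise match the "already-flooding" rule.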

SLIDE 15

The How: Example

  • A sends a packet to B (L:True)

– Destination B unknown; packet flooded from first hop (F:True)

  • All switches learn how to reach A
  • B sends to A (L:True)

– Direct path following table entries to A (F:False)

  • Switches along path learn how to reach B
  • Link fails
  • A sends another packet to B (L:True)

– Follows along path… (F:False)
– … until it hits failure (L:False F:True)
– Switch floods packet out all ports (even backwards)

  • Flooded packet reaches B (Successful delivery!)
  • Another duplicate of flooded packet reaches A’s first hop; unlearn B
  • A sends another packet to B (L:True)

– Destination B unknown; packet flooded from first hop (F:True)

  • All switches learn how to reach A again

SLIDE 16

Really? Preliminaries

  • How much flooding do failures cause?
  • How big does the deduplication filter need to be?

– Less than 1,000 entries in our simulations

  • Does it recover from overload?

– Yes

[Bar chart: Campus and Cluster; y-axis from 0.00% to 0.18%; bar values not recoverable]

SLIDE 17

Really? Overview

  • Thinking back to that matrix…

– We want plug and play
– We want to support shortest paths using all links
– We don’t want to have a control plane
– Packet-timescale recovery from failures

SLIDE 18

Really? Failure benchmark

  • Omniscient, randomized, shortest-path routing
  • Failure → Adjustable delay → Fix routes
  • Delay of zero is optimal routing / an upper bound
  • Nonzero delay meant to roughly simulate…

– OSPF, IS-IS, TRILL, SPB, etc.
– … without needing to model each one in detail

  • Random shortest-cost tree rooted at each destination
  • Note: we don’t compare ourselves to STP at all
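A random shortest-cost tree rooted at a destination can be built with a breadth-first search followed by a random choice among equally good next hops. The sketch assumes unit link costs and an adjacency-dict topology; the function name is hypothetical, and this is not the simulator used for the results.

```python
import random
from collections import deque


def random_shortest_path_tree(adj, dst, rng):
    """Return {node: next hop toward dst}, assuming every link costs 1."""
    # Breadth-first search from the destination gives hop distances.
    dist = {dst: 0}
    frontier = deque([dst])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                frontier.append(v)

    next_hop = {}
    for node, d in dist.items():
        if node == dst:
            continue
        # Any neighbor one hop closer lies on a shortest path; pick one at random.
        closer = sorted(n for n in adj[node] if dist.get(n) == d - 1)
        next_hop[node] = rng.choice(closer)
    return next_hop


# Square topology: A reaches D equally well through B or through C.
square = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}
tree = random_shortest_path_tree(square, "D", random.Random(0))
```

After a failure, the benchmark described above would wait for the adjustable delay and then recompute such trees on the surviving links; that step is not shown here.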

SLIDE 19

Really? Failure recovery - UDP

  • Send traffic on network with high failure rate
  • Metric is unnecessary delivery failures: packets that weren’t delivered even though optimal routing could have delivered them
  • AXE has no unnecessary delivery failures

Delivery failures, AXE vs. routes fixed after a delay (bar-chart values):

           AXE     5ms     10ms    20ms    40ms    80ms    160ms
Campus     0.00%   0.03%   0.05%   0.08%   0.13%   0.19%   0.27%
Cluster    0.000%  0.002%  0.003%  0.005%  0.008%  0.011%  0.021%
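The metric can be stated as a small computation. The sketch assumes each packet is recorded as a pair of booleans and that the rate is taken over all packets sent; the slides do not state the denominator, so that choice and the function name are assumptions.

```python
def unnecessary_failure_rate(outcomes):
    """Fraction of packets lost although optimal routing would have delivered them.

    outcomes: iterable of (delivered, optimal_would_deliver) booleans.
    """
    outcomes = list(outcomes)
    if not outcomes:
        return 0.0
    unnecessary = sum(
        1 for delivered, optimal_ok in outcomes if optimal_ok and not delivered
    )
    return unnecessary / len(outcomes)


sample = [
    (True, True),    # delivered
    (False, True),   # lost, but a working path existed: unnecessary failure
    (False, False),  # lost, and no path existed: not counted
    (True, True),    # delivered
]
rate = unnecessary_failure_rate(sample)
```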

SLIDE 20

Really? Failure recovery - TCP

  • Similar setup, but with TCP
  • Metric is number of flows with significantly worse FCT than optimal routing
  • AXE has no significantly worse FCTs

Delayed FCTs, AXE vs. routes fixed after a delay (bar-chart values):

           AXE     5ms     10ms    20ms    40ms    80ms    160ms
Campus     0%      0%      0.01%   0.06%   0.14%   0.11%   0.25%
Cluster    0%      0.002%  0.007%  0.010%  0.015%  0.020%  0.036%

SLIDE 21

The End: Not Mentioned Here

  • Multicast AXE

– On any change (failure; join), flood+dedup and prune
– Flooded packets have all data needed to build tree

  • AXE with Hedera

– Use AXE for mice & recovery
– Centralized SDN routing for elephant flows

  • P4 implementation

– AXE is expressible in P4
– Performance on real hardware is an open question

SLIDE 22


THE END