Data Driven Connectivity


slide-1
SLIDE 1

Data Driven Connectivity

Junda Liu, Aurojit Panda, Ankit Singla, Brighten Godfrey, Michael Schapira, Scott Shenker

slide-4
SLIDE 4

Division of Concerns

  • Routing is a control plane operation.
      • Operates on the order of milliseconds.
  • Packet forwarding is a data plane operation.
      • Operates on the order of microseconds.
slide-8
SLIDE 8

Link Failures Are Hard

  • Some users require low latency packet delivery.
  • Some users require high reliability.
  • Control Plane response to link failure is too slow.
slide-12
SLIDE 12

Today’s Solution

  • Rely on precomputed backup paths
  • Typically support single link failures.
  • State grows exponentially for more links.
  • Hard to generalize. Hard to configure.
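
To make the state-growth bullet concrete, here is a rough counting sketch. It assumes, pessimistically, that a backup scheme stores one precomputed entry per distinct set of failed links; real schemes may share state, so treat this as an upper bound for illustration. The function name and the 100-link network are hypothetical.

```python
from math import comb

def failure_scenarios(num_links: int, max_failures: int) -> int:
    """Count the distinct sets of 1..max_failures simultaneously failed links."""
    return sum(comb(num_links, k) for k in range(1, max_failures + 1))

# Hypothetical 100-link network: scenarios to plan for as tolerance grows.
for k in (1, 2, 3):
    print(k, failure_scenarios(100, k))
```

Tolerating every possible combination (max_failures equal to num_links) gives 2^num_links - 1 scenarios, which is where the exponential growth comes from.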
slide-13
SLIDE 13

Routing is the Problem!

  • Routing conflates two functions
      • Optimality: use good paths
          • Inherently global, requires coordination.
      • Connectivity: deliver packets
          • Can it be local?
slide-20
SLIDE 20

Data Plane Connectivity

  • Can we push connectivity to the data plane?
  • What would it take?
  • No FIB changes at packet rate.
  • No additional data in packet header.
  • Impossible
slide-21
SLIDE 21

Data Plane Connectivity

  • Relax constraints
  • Change a few bits in FIB at packet rates.
  • Clearly feasible, but is it enough?
slide-24
SLIDE 24

Guaranteeing Connectivity

  • 1. Take advantage of available redundancy.
  • 2. Restore connectivity at data speeds.
  • 3. Achieve optimality at control speeds.
slide-27
SLIDE 27

Using Redundancy: DAGs

  • Current paths to a destination do not use all links
  • Extend routing tables to increase redundancy.

Destination
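
One possible way to use every link when building a per-destination DAG is sketched below. It is an illustrative construction, not necessarily the one used in the talk: each link is oriented toward the endpoint that is closer to the destination, with ties broken by node name. The function name, the adjacency-dict format, and the four-node topology are assumptions, and the graph is assumed to be connected.

```python
from collections import deque

def build_dag(adj, dest):
    """Orient every link of a connected undirected graph toward dest.

    Returns, for each node, the sorted list of neighbors it may forward to.
    """
    # Breadth-first search from the destination gives hop distances.
    dist = {dest: 0}
    queue = deque([dest])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)

    def rank(node):
        # Total order on nodes: distance first, node name as tie-breaker.
        return (dist[node], node)

    # A link points from the higher-ranked endpoint to the lower-ranked one,
    # so the orientation is acyclic and dest is the only sink.
    return {n: sorted(v for v in adj[n] if rank(v) < rank(n)) for n in adj}

# Hypothetical topology: D is the destination.
adj = {
    "D": ["A", "B"],
    "A": ["D", "B", "C"],
    "B": ["D", "A", "C"],
    "C": ["A", "B"],
}
out_links = build_dag(adj, "D")
print(out_links)
```

A shortest-path tree would give B and C a single next hop each; here B can also forward via A, and C via either A or B, which is the extra redundancy the slide refers to.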

slide-28
SLIDE 28

Restoring Connectivity

slide-32
SLIDE 32

Reverse to Reconnect

  • Link failure can disconnect a DAG.
  • Disconnected node reverses all links to point out.
  • Finite set of reversals reconnect DAG.
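
The reversal rule above can be sketched as a centralized simulation in the style of classic full link reversal. This is only an illustration of the idea, not the data plane mechanism itself: the edge-set representation, the function name, and the step limit are assumptions, and the destination is assumed to remain reachable in the underlying graph.

```python
def full_reversal(edges, nodes, dest, max_steps=10_000):
    """Repeatedly let a node with no outgoing links reverse all of its links.

    edges is a set of (u, v) pairs meaning "u forwards to v".
    Returns the final edge set and the number of node reversals performed.
    """
    edges = set(edges)
    for steps in range(max_steps):
        stuck = [
            n for n in nodes
            if n != dest
            and not any(u == n for u, _ in edges)   # no outgoing link
            and any(v == n for _, v in edges)       # but something to reverse
        ]
        if not stuck:
            return edges, steps
        n = stuck[0]
        incoming = {(u, v) for (u, v) in edges if v == n}
        edges -= incoming
        edges |= {(v, u) for (u, v) in incoming}    # point every link out of n
    raise RuntimeError("no convergence; is the destination still reachable?")

# Hypothetical DAG toward D: A->D, B->A, C->B, C->D. The A-D link then fails.
nodes = ["D", "A", "B", "C"]
after_failure = {("B", "A"), ("C", "B"), ("C", "D")}
final_edges, steps = full_reversal(after_failure, nodes, "D")
print(sorted(final_edges), steps)
```

In this example A reverses, then B, then A again, after which every node has a path A -> B -> C -> D.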
slide-35
SLIDE 35

Reversals in Data Plane

  • Two challenges must be addressed
  • Notifications can be lost.
  • Notifications can be delayed.
slide-38
SLIDE 38

Walk Through

Source

slide-45
SLIDE 45

Create an OUT Link

  • Reverse link direction
  • Increment Local Sequence
  • Forward packet

[Figure: per-link state with Remote Sequence, Local Sequence, and Reversible fields]
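
The three steps above might look roughly like the sketch below. The slides name the per-link fields but do not spell out their encoding, so the LinkState class, the unbounded integer counters, and the idea that the forwarded packet carries the local sequence number are all assumptions made for illustration. The Reversible field from the figure is omitted because the slides do not describe how it is used.

```python
from dataclasses import dataclass

@dataclass
class LinkState:
    """Hypothetical per-link state kept by one node."""
    direction: str = "IN"   # "IN" or "OUT", from this node's point of view
    local_seq: int = 0      # bumped each time this node reverses the link
    remote_seq: int = 0     # last sequence number seen from the neighbor

def create_out_link(link: LinkState, payload: str) -> dict:
    """Reverse the link to point OUT, bump the local sequence, forward the packet."""
    link.direction = "OUT"                 # 1. reverse link direction
    link.local_seq += 1                    # 2. increment local sequence
    return {"payload": payload,            # 3. forward the packet, tagged with
            "seq": link.local_seq}         #    the new sequence (assumption)

link = LinkState()
packet = create_out_link(link, "data")
print(link, packet)
```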

slide-50
SLIDE 50

Dealing with Notifications

  • Receive on link pointing OUT
  • Compare sequence numbers
  • See if anything changed
  • Reverse link

[Figure: per-link state with Remote Sequence, Local Sequence, and Reversible fields]
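
A minimal sketch of the receive-side check described above, under stated assumptions: each packet is taken to carry the sender's sequence number for the link, counters are unbounded integers, and a larger number than the stored remote sequence means the neighbor has reversed the link. The class and function names are hypothetical, and actual hardware would likely use only a few bits.

```python
from dataclasses import dataclass

@dataclass
class LinkState:
    """Hypothetical per-link state kept by one node."""
    direction: str = "OUT"  # "IN" or "OUT", from this node's point of view
    local_seq: int = 0
    remote_seq: int = 0     # last sequence number seen from the neighbor

def on_receive(link: LinkState, packet_seq: int) -> bool:
    """Handle a packet arriving on this link; return True if the link reversed."""
    if link.direction != "OUT":
        return False                      # arrival on an IN link is the normal case
    if packet_seq <= link.remote_seq:
        return False                      # nothing changed: stale or delayed packet
    link.remote_seq = packet_seq          # the neighbor reversed this link
    link.direction = "IN"                 # so reverse our view to match
    return True

fresh = LinkState(direction="OUT", remote_seq=0)
reversed_now = on_receive(fresh, 1)       # neighbor bumped its sequence to 1

stale = LinkState(direction="OUT", remote_seq=1)
ignored = on_receive(stale, 1)            # same sequence as before: no change
print(reversed_now, fresh, ignored, stale)
```

Because every later packet on the link carries the same sequence number, a lost notification is simply repeated by the next packet, and a delayed one compares as stale.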

slide-53
SLIDE 53

Zooming Out

slide-54
SLIDE 54

What about Optimality?

slide-58
SLIDE 58

Safe Control Plane

  • Cannot interfere with data plane.
  • Build a safe primitive
  • Set all edges of a node to point out
  • Described in paper
slide-59
SLIDE 59

Evaluation

slide-61
SLIDE 61

Evaluation Overview

  • Test on WAN and datacenter topologies
      • Stretch, throughput, latency
  • Effect of FIB update delays
      • On latency and throughput
  • End-to-end benefits of using DDC.
slide-62
SLIDE 62

End-to-End Test

  • 8 Pod FatTree
  • Partition aggregate workload
  • 5 link failures
  • Simulated effect for 550 seconds
slide-63
SLIDE 63

Requests Fulfilled

  • Bucketed 10 second intervals.
  • Percentage requests satisfied.
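
The metric on this slide can be computed along the following lines. This is a sketch only: the record format of (timestamp in seconds, satisfied flag) and the sample data are made up for illustration and are not results from the talk.

```python
from collections import defaultdict

def satisfied_by_bucket(requests, bucket_s=10):
    """Percentage of satisfied requests per bucket_s-second interval.

    requests is an iterable of (timestamp_seconds, satisfied) pairs.
    Returns {bucket_start_seconds: percentage}, ordered by bucket start.
    """
    totals = defaultdict(int)
    satisfied = defaultdict(int)
    for timestamp, ok in requests:
        bucket = int(timestamp // bucket_s) * bucket_s
        totals[bucket] += 1
        satisfied[bucket] += bool(ok)
    return {b: 100.0 * satisfied[b] / totals[b] for b in sorted(totals)}

# Made-up sample: four requests in [0, 10), two in [10, 20).
sample = [(1.0, True), (4.2, True), (7.5, True), (9.9, False),
          (12.0, True), (19.5, True)]
percentages = satisfied_by_bucket(sample)
print(percentages)
```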
slide-64
SLIDE 64

Request Latency

slide-65
SLIDE 65

FIB Update Delay

  • What is the impact of delayed FIB changes on packet latency?
  • Three link failures: all traffic in test affected.
  • Focus on behavior before convergence.
slide-66
SLIDE 66

FIB Update Delay

Overall, ~99% of packets are delivered in under 3 ms. No packets are dropped; there is just a long latency tail.

slide-67
SLIDE 67

FIB Update Delay

  • What is the impact of delayed FIB changes on TCP throughput?
  • Use a WAN topology (AS 2914)
  • 1 Gbps links
  • 5 link failures
slide-68
SLIDE 68

FIB Update Delay

slide-69
SLIDE 69

In the Same Vein...

  • FCP (SIGCOMM ’07)
      • Unbounded bits in header
      • Extensive FIB changes on failure packet
  • Packet Re-Cycling (HotNets ’10)
      • First solve an NP-Complete problem.
      • log(network diameter) bits in header.
  • DDC is simpler.
slide-70
SLIDE 70

Potential Impact

  • ASICs implement DDC
  • Connectivity guaranteed by the data plane.
  • Control Plane focuses on optimality/functionality.
slide-71
SLIDE 71

Questions?