TOWARDS LOSSLESS DATA CENTER RECONFIGURATION: CONSISTENT NETWORK UPDATES IN SDNS




slide-1
SLIDE 1

TOWARDS LOSSLESS DATA CENTER RECONFIGURATION: CONSISTENT NETWORK UPDATES IN SDNS

KLAUS-TYCHO FOERSTER

slide-2
SLIDE 2

Joint work with…

1. Consistent Updates in Software Defined Networks: On Dependencies, Loop Freedom, and Blackholes (IFIP Networking 2016)
   Klaus-Tycho Foerster, Ratul Mahajan, Roger Wattenhofer
2. On Consistent Migration of Flows in SDNs (INFOCOM 2016)
   Sebastian Brandt, Klaus-Tycho Foerster, Roger Wattenhofer
3. The Power of Two in Consistent Network Updates: Hard Loop Freedom, Easy Flow Migration (ICCCN 2016)
   Klaus-Tycho Foerster, Roger Wattenhofer
4. Augmenting Flows for the Consistent Migration of Multi-Commodity Single-Destination Flows in SDNs (Pervasive Mob. Comput. 2017)
   Sebastian Brandt, Klaus-Tycho Foerster, Roger Wattenhofer
5. Local Checkability, No Strings Attached: (A)cyclicity, Reachability, Loop Free Updates in SDNs (Theoret. Comput. Sci 2016)
   Klaus-Tycho Foerster, Thomas Luedi, Jochen Seidel, Roger Wattenhofer
6. Understanding and Mitigating Packet Corruption in Data Center Networks (SIGCOMM 2017)
   Danyang Zhuo, Monia Ghobadi, Ratul Mahajan, Klaus-Tycho Foerster, Arvind Krishnamurthy, Thomas Anderson
7. Survey of Consistent Network Updates (under submission, arXiv 1609.02305)
   Klaus-Tycho Foerster, Stefan Schmid, Stefano Vissicchio
8. Loop-Free Route Updates for Software-Defined Networks (under submission, extended version of their PODC 2015)
   Klaus-Tycho Foerster, Arne Ludwig, Jan Marcinkowski, Stefan Schmid
9. Not so Lossless Flow Migration (under submission, partially contained in Dissertation)
   Sebastian Brandt, Klaus-Tycho Foerster, Laurent Vanbever, Roger Wattenhofer

slide-3
SLIDE 3

First Motivation: Link Repair

Zhuo et al.: Understanding and Mitigating Packet Corruption in Data Center Networks (SIGCOMM 2017).

Root Cause                  Relative Ratio
Connector contamination     17-57%
Bent or damaged cable       14-48%
Decaying transmitter        < 1%
Loose or bad transceiver    6-45%
Shared component failure    10-26%

Relative contributions of corruption in 15 DCNs (350K switch-to-switch optical links, over 7 months)

slide-4
SLIDE 4

Toy Example

d v u

slide-5
SLIDE 5

Toy Example

d v u

slide-6
SLIDE 6

Toy Example

d v u d v u

slide-7
SLIDE 7

Toy Example

d v u d v u d v u

slide-8
SLIDE 8

Appears in Practice

“Data plane updates may fall behind the control plane acknowledgments and may be even reordered.” Kuzniar et al., PAM 2015

“some switches can ‘straggle,’ taking substantially more time than average (e.g., 10-100x) to apply an update” Jin et al., SIGCOMM 2014

“…the inbound latency is quite variable with a […] standard deviation of 31.34ms…” He et al., SOSR 2015

slide-9
SLIDE 9

Toy Example

d v u d v u d v u

slide-10
SLIDE 10

Toy Example

d v u d v u d v u

slide-11
SLIDE 11

Software-Defined Networking

Centralized controller updates network rules for optimization

Controller (control plane) updates the switches/routers (data plane)

slide-12
SLIDE 12
old network rules → network updates → new network rules

slide-13
SLIDE 13
old network rules → network updates → new network rules

slide-14
SLIDE 14
old network rules → network updates → new network rules

possible solution: be fast!
e.g., B4 [Jain et al., 2013]

slide-15
SLIDE 15
old network rules → network updates → new network rules

possible solution: synchronize time well!
e.g., TimedSDN [Mizrahi et al., 2014-17], Chronus [Zheng et al., 2017]

slide-16
SLIDE 16
old network rules → network updates → new network rules

possible solution: be consistent!

e.g.,

  • per-router ordering [Vanbever et al., 2012]
  • two phase commit [Reitblatt et al., 2012]
  • SWAN [Hong et al., 2013]
  • Dionysus [Jin et al., 2014]
  • ….
slide-17
SLIDE 17
old network rules → network updates → new network rules

possible solution: be consistent!

slide-18
SLIDE 18
slide-19
SLIDE 19

Ordering Solution: Go backwards through the new Tree

  • Always works for single-destination rules
  • Also for multi-destination with sufficient memory („split“)
  • Schedule length: tree depth (up to Ω(n) )
  • Optimal algorithms?

d v u d v u d v u
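
A minimal sketch of the "go backwards through the new tree" ordering, assuming the new rules form a tree rooted at the destination. The function name `backward_schedule` and the input format (a dict from each node to its new next hop) are illustrative assumptions, not taken from the talk. Nodes are grouped by their depth in the new tree, so a node only updates after its new next hop has updated, and the number of rounds equals the tree depth.

```python
from collections import defaultdict

def backward_schedule(new_next_hop, dest):
    """Group nodes into update rounds by their depth in the new tree."""
    # Invert the next-hop map: parent -> nodes that will forward to it.
    children = defaultdict(list)
    for node, parent in new_next_hop.items():
        children[parent].append(node)

    rounds = []
    frontier = [dest]  # the destination itself needs no update
    while frontier:
        # Next round: all nodes whose new next hop was handled last round.
        frontier = sorted(c for p in frontier for c in children[p])
        if frontier:
            rounds.append(frontier)
    return rounds

# Hypothetical new tree on the toy nodes: u forwards to v, v forwards to d.
schedule = backward_schedule({"u": "v", "v": "d"}, "d")
print(schedule)  # [['v'], ['u']]
```

A path of n nodes yields n rounds under this ordering, which is the linear worst case mentioned in the bullet on schedule length.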

slide-20
SLIDE 20

Optimal Schedule?

  • 3-round schedule? NP-complete! [Ludwig et al., 2015]
  • (Sublinear schedule for 2 destinations w/o split: NP-complete)
slide-21
SLIDE 21

Optimal Schedule?

  • 3-round schedule? NP-complete! [Ludwig et al., 2015]
  • (Sublinear schedule for 2 destinations w/o split: NP-complete)
  • However: greedy updates always finish (eventually).
slide-22
SLIDE 22

Optimal Schedule?

  • 3-round schedule? NP-complete! [Ludwig et al., 2015]
  • (Sublinear schedule for 2 destinations w/o split: NP-complete)
  • However: greedy updates always finish (eventually).
  • Maximizing greedy update: NP-complete!
  • But: Can be approximated well.
  • Feedback Arc Set / Max. Acyclic Subgraph

[ICCCN ‘16] & [Amiri et al., ‘16]

slide-23
SLIDE 23

Optimal Schedule?

  • 3-round schedule? NP-complete! [Ludwig et al., 2015]
  • (Sublinear schedule for 2 destinations w/o split: NP-complete)
  • However: greedy updates always finish (eventually).
  • Maximizing greedy update: NP-complete!
  • But: Can be approximated well.
  • Feedback Arc Set / Max. Acyclic Subgraph
  • Bad news: Greedy can turn O(1) instances into Ω(n) schedules 
  • What to do?

[ICCCN ‘16] & [Amiri et al., ‘16] [Ludwig et al., 2015]
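
A rough sketch of a greedy loop-free schedule, under the assumption that a round is safe when the graph containing the old and the new rule of every node updated in that round (plus the current rules of all other nodes) has no cycle. The names `has_cycle` and `greedy_rounds` are hypothetical, and the greedy choice here is only maximal with respect to a fixed scan order, not the maximum set discussed on the slide.

```python
def has_cycle(edges):
    """Kahn-style check: True if the directed edge set contains a cycle."""
    nodes = {n for edge in edges for n in edge}
    indeg = {n: 0 for n in nodes}
    for _, b in edges:
        indeg[b] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for a, b in edges:
            if a == n:
                indeg[b] -= 1
                if indeg[b] == 0:
                    ready.append(b)
    return removed < len(nodes)

def greedy_rounds(old, new):
    """Greedily pack loop-free updates into rounds (old/new: node -> next hop)."""
    pending = sorted(n for n in new if old[n] != new[n])
    done, rounds = set(), []
    while pending:
        batch = []
        for cand in pending:
            trial = set(batch) | {cand}
            edges = set()
            for n in new:
                if n in done or n in trial:
                    edges.add((n, new[n]))  # new rule may already be active
                if n not in done:
                    edges.add((n, old[n]))  # old rule may still be active
            if not has_cycle(edges):
                batch.append(cand)
        if not batch:
            raise ValueError("no safe update found; inputs must be loop-free")
        rounds.append(batch)
        done.update(batch)
        pending = [n for n in pending if n not in done]
    return rounds

# Hypothetical instance: old route u -> v -> d, new route v -> u -> d.
print(greedy_rounds({"u": "v", "v": "d"}, {"u": "d", "v": "u"}))  # [['u'], ['v']]
```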

slide-24
SLIDE 24

Relax! [Ludwig et al., 2015]

Two key ideas:

  • 1. destination d based → source-destination pairs (s,d)
  • 2. no loops → no loops between (s,d)
slide-25
SLIDE 25

Relax! [Ludwig et al., 2015]

Two key ideas:

  • 1. destination d based → source-destination pairs (s,d)
  • 2. no loops → no loops between (s,d)

s d

slide-26
SLIDE 26

Relax! [Ludwig et al., 2015]

  • Non-relaxed: Ω(n) rounds

s d

slide-27
SLIDE 27

Relax! [Ludwig et al., 2015]

  • Non-relaxed: Ω(n) rounds
  • Relaxed?

s d

slide-28
SLIDE 28

Relax! [Ludwig et al., 2015]

  • Non-relaxed: Ω(n) rounds
  • Relaxed?

s d

slide-29
SLIDE 29

Relax! [Ludwig et al., 2015]

  • Non-relaxed: Ω(n) rounds
  • Relaxed?

s d

slide-30
SLIDE 30

Relax! [Ludwig et al., 2015]

  • Non-relaxed: Ω(n) rounds
  • Relaxed?

s d

slide-31
SLIDE 31

Relax! [Ludwig et al., 2015]

  • Non-relaxed: Ω(n) rounds
  • Relaxed? Just 3 rounds

s d

slide-32
SLIDE 32

Relax! [Ludwig et al., 2015]

  • Non-relaxed: Ω(n) rounds
  • Relaxed? Just 3 rounds
  • In general: O(log n) rounds („Peacock“)

s d

slide-33
SLIDE 33

Relax! [Ludwig et al., 2015]

  • Non-relaxed: Ω(n) rounds
  • Relaxed? Just 3 rounds
  • In general: O(log n) rounds („Peacock“)

s d

slide-34
SLIDE 34

Relax!

  • Non-relaxed: Ω(n) rounds
  • Relaxed? Just 3 rounds
  • In general: O(log n) rounds („Peacock“) – Optimal?
slide-35
SLIDE 35

Relax!

  • Non-relaxed: Ω(n) rounds
  • Relaxed? Just 3 rounds
  • In general: O(log n) rounds („Peacock“) – Optimal?
  • Ω(log n) instances exist for Peacock
slide-36
SLIDE 36

Relax!

  • Non-relaxed: Ω(n) rounds
  • Relaxed? Just 3 rounds
  • In general: O(log n) rounds („Peacock“) – Optimal?
  • Ω(log n) instances exist for Peacock
  • Worst case for relaxed? – Unknown!
  • Worst known: 7 rounds (n > 1000)

[Ludwig et al., 2015]
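
To make the two loop-freedom notions concrete, here is a small sketch that checks a single forwarding snapshot (one next hop per node). The reading of "relaxed" as "no loop on the walk from the source s to d" is a paraphrase of the slide's "no loops between (s,d)", and the helper names are hypothetical.

```python
def walk_loops(next_hop, start, dest):
    """Follow forwarding rules from start; True if a node repeats before dest."""
    seen = set()
    node = start
    while node != dest and node in next_hop:
        if node in seen:
            return True
        seen.add(node)
        node = next_hop[node]
    return False

def strongly_loop_free(next_hop, dest):
    """No loop anywhere: the walk from every node must be loop-free."""
    return not any(walk_loops(next_hop, n, dest) for n in next_hop)

def relaxed_loop_free(next_hop, source, dest):
    """Only the walk that starts at the source has to be loop-free."""
    return not walk_loops(next_hop, source, dest)

# Hypothetical snapshot: s reaches d directly, while a and b form a loop
# that no packet from s can ever enter.
state = {"s": "d", "a": "b", "b": "a"}
print(strongly_loop_free(state, "d"))      # False
print(relaxed_loop_free(state, "s", "d"))  # True
```

The snapshot fails the strong check but passes the relaxed one, which is the extra freedom the relaxed schedules exploit.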

slide-37
SLIDE 37

Greedy updates

slide-38
SLIDE 38

Decentralized Updates for „Tree-Ordering“

  • So far: every round:
  • Controller computes and sends out updates
  • Switches implement them and send acks
  • Controller receives acks
slide-39
SLIDE 39

Decentralized Updates for „Tree-Ordering“

  • So far: every round:
  • Controller computes and sends out updates
  • Switches implement them
  • Controller receives acks
  • Alternative: Use the duality with so-called proof labeling schemes

SDN switch (Verifier) Centralized Controller (Prover)

slide-40
SLIDE 40

Decentralized Updates for „Tree-Ordering“

When should I update?

slide-41
SLIDE 41

Decentralized Updates for „Tree-Ordering“

Once my parent updates!

slide-42
SLIDE 42

Decentralized Updates for „Tree-Ordering“

Once my parent updates! Send parent ID

slide-43
SLIDE 43

Decentralized Updates for „Tree-Ordering“

I updated

slide-44
SLIDE 44

Decentralized Updates for „Tree-Ordering“

I updated I‘ll update too!

slide-45
SLIDE 45

Decentralized Updates for „Tree-Ordering“

+ Only one controller-switch interaction per route change
+ New route changes can be pushed before old ones are done
+ Incorrect updates can be locally detected

  • Requires switch-to-switch communication e.g., [Nguyen et al., SOSR 2017]

Foerster et al.: Local Checkability, No Strings Attached: (A)cyclicity, Reachability, Loop Free Updates in SDNs (Theoret. Comput. Sci 2017)
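
A toy simulation of the decentralized idea, assuming the controller hands each switch only the ID of its parent in the new tree, and a switch updates as soon as that parent announces "I updated". The message queue and the function name `decentralized_update` are illustrative assumptions; real switch-to-switch signalling is not modelled.

```python
from collections import defaultdict, deque

def decentralized_update(new_parent, dest):
    """Simulate switches that update once their new parent reports 'I updated'."""
    waiting = defaultdict(list)  # parent ID -> switches waiting on it
    for switch, parent in new_parent.items():
        waiting[parent].append(switch)

    order = []
    inbox = deque([dest])  # the destination needs no update and starts the wave
    while inbox:
        sender = inbox.popleft()  # sender announces "I updated"
        for switch in sorted(waiting[sender]):
            order.append(switch)  # "I'll update too!"
            inbox.append(switch)
    return order

# Hypothetical new tree: v and w forward to d, u forwards to v.
order = decentralized_update({"u": "v", "v": "d", "w": "d"}, "d")
print(order)  # ['v', 'w', 'u']
```

In this sketch the controller is involved only once, when it distributes the parent IDs; every later step is triggered by a neighbour's announcement.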

slide-46
SLIDE 46
slide-47
SLIDE 47

Saeed Akhoondian Amiri, Szymon Dudycz, Stefan Schmid, Sebastian Wiederrecht: Congestion-Free Rerouting of Flows on DAGs. CoRR abs/1611.09296 (2016)

slide-48
SLIDE 48

Consistent Migration of Flows

Introduced in SWAN (Hong et al., SIGCOMM 2013)
Idea: Flows can be on the old or new route

For all edges: $\sum_{\forall F} \max(\text{old}, \text{new}) \le \text{capacity}$

Unsplittable flows: Hard… (Algorithms out there: integer programs…)
What about splittable flows?

slide-49
SLIDE 49

Consistent Migration of Flows

Introduced in SWAN (Hong et al., SIGCOMM 2013)
Idea: Flows can be on the old or new route

For all edges: $\sum_{\forall F} \max(\text{old}, \text{new}) \le \text{capacity}$

No ordering exists (2/3 + 2/3 > 1)

2/3 2/3
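
The per-edge condition can be checked mechanically. The sketch below assumes two flows of size 2/3 that swap two unit-capacity links; this is a guess at the pictured instance, and the edge names `top` and `bottom` are made up. For every edge it sums, over all flows, the larger of the old and the new allocation and compares the result with the capacity.

```python
from fractions import Fraction

def one_step_migration_ok(old, new, capacity):
    """Check that the sum over flows of max(old, new) fits on every edge."""
    for edge, cap in capacity.items():
        load = sum(max(old[f].get(edge, 0), new[f].get(edge, 0)) for f in old)
        if load > cap:
            return False
    return True

two_thirds = Fraction(2, 3)
capacity = {"top": 1, "bottom": 1}
old = {"A": {"top": two_thirds}, "B": {"bottom": two_thirds}}
new = {"A": {"bottom": two_thirds}, "B": {"top": two_thirds}}

print(one_step_migration_ok(old, new, capacity))  # False, since 2/3 + 2/3 > 1
```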

slide-50
SLIDE 50

Consistent Migration of Flows

Approach of SWAN: use slack x (i.e., %)
Here x = 1/3
Move slack x ⇒ 1/x − 1 staged partial moves

2/3 2/3

slide-51
SLIDE 51

Consistent Migration of Flows

Approach of SWAN: use slack x (i.e., %)
Here x = 1/3
Move slack x ⇒ 1/x − 1 staged partial moves

Update 1 of 2

1/3 1/3

slide-52
SLIDE 52

Consistent Migration of Flows

Approach of SWAN: use slack x (i.e., %)
Here x = 1/3
Move slack x ⇒ 1/x − 1 staged partial moves

Update 1 of 2

1/3 1/3

slide-53
SLIDE 53

Consistent Migration of Flows

Approach of SWAN: use slack x (i.e., %)
Here x = 1/3
Move slack x ⇒ 1/x − 1 staged partial moves

Update 2 of 2

2/3 2/3
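
The staged moves can be sketched as linear interpolation between the old and the new allocation. This assumes the slack (written x here) is the free fraction of every link, uses ceil(1/x) − 1 updates so that non-integer 1/x is handled, and reuses the hypothetical two-flow swap with made-up edge names `top` and `bottom`. Each transition is then checked against the sum-of-max condition.

```python
from fractions import Fraction
from math import ceil

def staged_allocations(old, new, slack):
    """Return the allocations before, between, and after the staged moves."""
    steps = max(1, ceil(1 / slack) - 1)
    stages = []
    for j in range(steps + 1):
        t = Fraction(j, steps)  # fraction of each flow already moved
        stage = {}
        for f in old:
            edges = set(old[f]) | set(new[f])
            stage[f] = {e: (1 - t) * old[f].get(e, 0) + t * new[f].get(e, 0)
                        for e in edges}
        stages.append(stage)
    return stages

def transition_ok(before, after, capacity):
    """Sum over flows of max(before, after) must fit on every edge."""
    return all(
        sum(max(before[f].get(e, 0), after[f].get(e, 0)) for f in before) <= cap
        for e, cap in capacity.items()
    )

two_thirds = Fraction(2, 3)
capacity = {"top": 1, "bottom": 1}
old = {"A": {"top": two_thirds}, "B": {"bottom": two_thirds}}
new = {"A": {"bottom": two_thirds}, "B": {"top": two_thirds}}

stages = staged_allocations(old, new, Fraction(1, 3))
print(len(stages) - 1)  # 2 updates
print(all(transition_ok(a, b, capacity) for a, b in zip(stages, stages[1:])))  # True
```

After the first update each flow carries 1/3 on its old link and 1/3 on its new link, which matches the "Update 1 of 2" picture.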

slide-54
SLIDE 54

Consistent Migration of Flows

No slack on flow edges? 1 1

slide-55
SLIDE 55

Consistent Migration of Flows

Alternate routes?

slide-56
SLIDE 56

Conceptually similar: 15-puzzle

How to move to reach goal? Generalized:

  • Exponentially many possibilities

This variant in P (also on graphs)

  • 15 puzzle: Johnson 1879, Am. J. of Math.
  • ….
  • n-1 agents: Kornhauser et al., FOCS 1984
  • ….
  • n agents (rotations): Foerster et al., CIAC 2017
  • Etc…
slide-57
SLIDE 57

To Slack or not to Slack? Slack of x on all flow edges?

1/x − 1 updates

slide-58
SLIDE 58

To Slack or not to Slack? What if not?

Try to create slack

slide-59
SLIDE 59

To Slack or not to Slack? Combinatorial approach

Augmenting paths

slide-60
SLIDE 60

Combinatorial Approach

Move single commodities at a time

e

1 1

u v

slide-61
SLIDE 61

Combinatorial Approach

Where to increase flow?

+ + + + + e u v

slide-62
SLIDE 62

Combinatorial Approach

Where to push back flow?

− − e − − − − − u v

slide-63
SLIDE 63

Combinatorial Approach

Resulting residual network

e u v

slide-64
SLIDE 64

Combinatorial Approach

We found an augmenting path ⇒ create slack on e

e − u v

slide-65
SLIDE 65

High-level Algorithm Idea

No slack on flow edges?
Find augmenting paths on both initial and desired state
Success? Use slack to migrate

Can’t create slack on some flow edge?
Consistent migration impossible
By contradiction (else augmenting paths would create slack)

Runtime: O(F m³)
(F being #commodities, m being #edges)

Brandt et al.: On Consistent Migration of Flows in SDNs (INFOCOM 2016).
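
A simplified single-commodity illustration of creating slack with an augmenting path; it is not the algorithm from the paper, which handles multiple commodities. It assumes a flow edge from u to v (called e in this sketch), builds the residual network without e, and searches for a path from u to v over edges where flow can be increased (+) or pushed back (−). If a path exists, some flow is rerouted along it and e gains slack. The function name `create_slack` and the dict-based graph format are hypothetical.

```python
from collections import deque

def create_slack(capacity, flow, e):
    """Reroute flow off edge e=(u, v) via a residual path; None if impossible."""
    u, v = e
    arcs = {}  # node -> list of (neighbour, edge, sign, residual amount)
    for (a, b), cap in capacity.items():
        if (a, b) == e:
            continue  # the detour must avoid e itself
        f = flow.get((a, b), 0)
        if cap - f > 0:  # "+": flow can be increased on (a, b)
            arcs.setdefault(a, []).append((b, (a, b), +1, cap - f))
        if f > 0:        # "-": flow on (a, b) can be pushed back
            arcs.setdefault(b, []).append((a, (a, b), -1, f))

    parent = {u: None}
    queue = deque([u])
    while queue and v not in parent:  # breadth-first search from u to v
        node = queue.popleft()
        for nxt, edge, sign, res in arcs.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, edge, sign, res)
                queue.append(nxt)
    if v not in parent:
        return None  # no augmenting path, so no slack can be created here

    path, node = [], v
    while parent[node] is not None:
        prev, edge, sign, res = parent[node]
        path.append((edge, sign, res))
        node = prev
    delta = min([flow.get(e, 0)] + [res for _, _, res in path])

    new_flow = dict(flow)
    for edge, sign, _ in path:
        new_flow[edge] = new_flow.get(edge, 0) + sign * delta
    new_flow[e] = new_flow.get(e, 0) - delta
    return new_flow

# Hypothetical instance: e = (u, v) is saturated, the detour u -> w -> v is idle.
capacity = {("u", "v"): 1, ("u", "w"): 1, ("w", "v"): 1}
result = create_slack(capacity, {("u", "v"): 1}, ("u", "v"))
print(result[("u", "v")])  # 0
```

When the function returns `None`, no detour exists in this simplified setting, which mirrors the slide's contradiction argument for declaring a consistent migration impossible.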

slide-66
SLIDE 66

Algorithmic Ideas Overview

Loop Freedom

  • Greedy
  • Relax: Peacock
  • Proof Labeling

Consistent Flow Migration

  • Standard: integer/linear* programs
  • Alternative: augmenting flows

*polynomial runtime?

slide-67
SLIDE 67

TOWARDS LOSSLESS DATA CENTER RECONFIGURATION: CONSISTENT NETWORK UPDATES IN SDNS

KLAUS-TYCHO FOERSTER