slide-1
SLIDE 1

Polynomial-Time What-If Analysis for Prefix-Manipulating MPLS Networks

Stefan Schmid University of Vienna, Austria Jiri Srba Aalborg University, Denmark

… and Segment Routing!

slide-2
SLIDE 2

Teaser: Can we verify reachability under k failures without trying exponentially many options?

  • Yes. MUCH FASTER!

An Automata-Theoretic Approach.

slide-3
SLIDE 3

Kudos to collaborators: Jesper Stenbjerg Jensen, Jonas Sand Madsen, Troels Beck Krøgh at Aalborg University, Denmark

slide-4
SLIDE 4

Configuring Networks is Hard…

  • We discovered a misconfiguration on this pair of switches that caused what's called a “bridge loop” in the network.
  • A network change was […] executed incorrectly […] more “stuck” volumes and added more requests to the re-mirroring storm
  • Service outage was due to a series of internal network events that corrupted router data tables
  • Experienced a network connectivity issue […] interrupted the airline's flight departures, airport processing and reservations systems

Credits: Nate Foster

Datacenter, enterprise and carrier networks are mission-critical infrastructures, but even tech-savvy companies struggle to operate them reliably.

slide-5
SLIDE 5

Example: BGP in Datacenter

… Especially Under Failures

[Figure: datacenter with routers A, B, C, D, E, F, G, H, X and Y and clusters G1, G2, P1 and P2; X and Y connect the datacenter to the Internet]

  • Clusters G1 and G2 host services that should be globally reachable; clusters P1 and P2 host services that should be accessible only internally.
  • X and Y announce to the Internet what is from G* (prefix). X and Y block what is from P*.

What can go wrong?

  • If link (G,X) fails and traffic from G is rerouted via Y and C to X: X announces (does not block) G and H, as it comes from C. (Note: BGP.)

Credits: Beckett et al. (SIGCOMM 2016): Bridging Network-wide Objectives and Device-level Configurations.

slide-10
SLIDE 10

Network Administration Today

  • Many forwarding tables with many rules, distributed across the network
  • Sysadmin responsible for:
    • Reachability: Can traffic from ingress port A reach egress port B?
    • Loop-freedom: Are the routes implied by the forwarding rules loop-free?
    • Non-reachability: Is it ensured that traffic originating from A never reaches B?
    • Waypoint enforcement: Is it ensured that traffic from A to B is always routed via a node C (e.g., a firewall)?
  • … even under (multiple) failures!

[Figure: nodes A, B and C] Policy ok? What if...?!

k failures = $\binom{n}{k}$ possibilities
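To make the combinatorics concrete, here is a small Python sketch that counts the failure scenarios a brute-force what-if analysis would have to try. It assumes n is the number of links and that every set of at most k failed links is a separate scenario, so it sums $\binom{n}{i}$ for i = 0..k; the function name `failure_scenarios` is illustrative only.

```python
from math import comb


def failure_scenarios(n_links: int, k: int) -> int:
    """Number of distinct sets of at most k failed links out of n_links links."""
    # Exactly i failures can happen in comb(n_links, i) ways; sum over i = 0..k.
    return sum(comb(n_links, i) for i in range(k + 1))


if __name__ == "__main__":
    for n, k in [(10, 2), (100, 3), (1000, 4)]:
        print(f"n={n}, k<={k}: {failure_scenarios(n, k)} scenarios")
```

For 1000 links and k = 4 the count already exceeds 4 × 10^10, which is why enumerating scenarios one by one does not scale.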

slide-14
SLIDE 14

The Good News

  • Networks are becoming more programmable and logically centralized, have open interfaces, …
  • … are based on formal foundations…
  • … researchers develop high-level specification languages such as NetKAT.

This enables more automated network operation and verification!

slide-15
SLIDE 15

The Bad News

  • For many traditional networks (still predominant!), such benefits are not available yet
  • Many existing tools cannot deal with failures
  • Super-polynomial runtime, verification PSPACE-hard
  • Other limitations: e.g., fixed header size

slide-16
SLIDE 16

Tractability of Verification

[Figure: a switch with interfaces in, out, in' and out', and a self-loop]

Reachability is undecidable in SDN: a network can emulate a Turing machine.

  • Idea: the packet header stores the Turing machine configuration (tape, head, state).
  • Switch action: each time the packet arrives, the switch performs one Turing machine step and updates the header.
  • Self-loop: could be replaced by a “dummy switch”.
  • Only once the machine accepts or rejects is the packet forwarded to out. Is out ever reached? Undecidable!
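The construction above can be illustrated with a toy simulation. This is a hypothetical Python sketch, not the authors' proof: the header encodes a Turing machine configuration as (tape, head, state), each visit to the switch performs one step, and the packet leaves via out only once the machine halts. The names `switch`, `route`, `WALKER` and `LOOPER` are made up for illustration, and `max_hops` is merely a simulation cutoff: deciding whether out is ever reached amounts to deciding halting.

```python
from typing import Dict, Optional, Tuple

BLANK = "_"
HALT = {"accept", "reject"}

# Hypothetical header encoding: (tape, head position, machine state).
Header = Tuple[Tuple[str, ...], int, str]
# delta[(state, read symbol)] = (new state, written symbol, head move in {-1, 0, +1})
Delta = Dict[Tuple[str, str], Tuple[str, str, int]]


def switch(header: Header, delta: Delta) -> Tuple[str, Header]:
    """One visit of the packet: perform one TM step and choose the outgoing port."""
    tape, head, state = header
    symbol = tape[head] if head < len(tape) else BLANK
    new_state, write, move = delta[(state, symbol)]
    cells = list(tape) + [BLANK] * max(0, head + 1 - len(tape))  # grow tape on demand
    cells[head] = write
    new_header = (tuple(cells), max(0, head + move), new_state)  # tape is one-way infinite
    # Only a halted machine leaves via `out`; otherwise the packet loops back via out'/in'.
    port = "out" if new_state in HALT else "out'"
    return port, new_header


def route(header: Header, delta: Delta, max_hops: int) -> Optional[Header]:
    """Follow the packet for at most max_hops visits (a cutoff, not a decision)."""
    for _ in range(max_hops):
        port, header = switch(header, delta)
        if port == "out":
            return header
    return None  # not reached within the cutoff; undecidable in general


# Example machine: move right over 1s and accept at the first blank.
WALKER: Delta = {("q0", "1"): ("q0", "1", +1), ("q0", BLANK): ("accept", BLANK, 0)}
# Example machine that never halts: it keeps rewriting the same cell.
LOOPER: Delta = {("q0", "1"): ("q0", "1", 0)}
```

Calling `route((("1", "1", "1"), 0, "q0"), WALKER, 10)` reaches out after four visits, whereas `LOOPER` keeps the packet circulating until the cutoff is hit.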

slide-21
SLIDE 21

Our Contribution

Polynomial-Time What-if Analysis for Prefix Rewriting Networks

  • E.g., MPLS networks or Segment Routing networks
  • Independently of the number of failures! No need to try combinations.
  • Supports arbitrary header sizes!
  • Reachability, loop-freedom, waypointing, etc.!

slide-23
SLIDE 23

MPLS Networks

Default routing of two flows

  • MPLS: forwarding based on top label of label stack

[Figure: nodes v1 to v8 with ingress interfaces in1, in2 and egress interfaces out1, out2; labels 10, 20, 11, 21, 12, 22; label operations along the paths: push, swap, swap, pop, pop]
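Forwarding on the top label can be sketched as one table lookup per hop. The Python below is an illustration under assumptions: the table `FLOW1`, its labels and the path v1, v2, v3, v4 are hypothetical and do not reproduce the exact configuration in the figure; the point is that each node inspects and rewrites only the top of the label stack.

```python
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, ...]     # ("push", label) | ("swap", label) | ("pop",)
Stack = Tuple[str, ...]  # label stack, top of stack first
# Hypothetical flow table: (node, top label or None if the stack is empty) -> (next hop, op)
Table = Dict[Tuple[str, Optional[str]], Tuple[str, Op]]


def apply_op(stack: Stack, op: Op) -> Stack:
    """Apply one MPLS operation to the label stack (only the top is touched)."""
    if op[0] == "push":
        return (op[1],) + stack
    if not stack:
        raise ValueError(f"{op[0]} on empty label stack")
    if op[0] == "swap":
        return (op[1],) + stack[1:]
    if op[0] == "pop":
        return stack[1:]
    raise ValueError(f"unknown operation {op[0]!r}")


def route(table: Table, node: str, stack: Stack, max_hops: int = 32) -> List[Tuple[str, str, Stack]]:
    """Forward a packet hop by hop, looking up only the top label at each node."""
    trace = []
    for _ in range(max_hops):
        top = stack[0] if stack else None
        next_hop, op = table[(node, top)]
        stack = apply_op(stack, op)
        trace.append((node, next_hop, stack))
        if next_hop.startswith("out"):  # reached an egress interface
            return trace
        node = next_hop
    raise RuntimeError("hop limit exceeded (forwarding loop?)")


# Illustrative single-flow configuration (assumed, not the exact slide tables).
FLOW1: Table = {
    ("v1", None): ("v2", ("push", "10")),
    ("v2", "10"): ("v3", ("swap", "11")),
    ("v3", "11"): ("v4", ("swap", "12")),
    ("v4", "12"): ("out1", ("pop",)),
}

for hop in route(FLOW1, "v1", ()):
    print(hop)
```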

slide-27
SLIDE 27

MPLS Networks: 1 Failure

  • MPLS: forwarding based on top label of label stack
  • For failover: push and pop label

[Figure, left: default routing of two flows over nodes v1 to v8, from in1 and in2 to out1 and out2, with labels 10, 20, 11, 21, 12, 22]

[Figure, right: one failure; push 30 to route around (v2,v3); label stacks 12, 22, 30|11, 30|21, 11, 21]

If (v2,v3) failed, push 30 and forward to v6. (Annotations: Pop; Normal swap.)

What about multiple link failures?
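The single-failure detour can be simulated by consulting a failover table whenever the chosen outgoing link is down. In this hypothetical Python sketch the tables `FT` and `FFT` are assumptions modeled loosely on the slide (in particular the ingress push of 10, the hop from v6 back to v3 and the egress pop are illustrative); it shows the stack 30|11 appearing on the detour and disappearing after the pop.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Stack = Tuple[str, ...]  # label stack, top first
Link = Tuple[str, str]

# Hypothetical flow table for one flow: (node, top label or None) ->
# (next hop, word that replaces the top label; () means pop).
FT: Dict[Tuple[str, Optional[str]], Tuple[str, Stack]] = {
    ("v1", None): ("v2", ("10",)),  # ingress: push 10
    ("v2", "10"): ("v3", ("11",)),  # normal swap 10 -> 11
    ("v3", "11"): ("v4", ("12",)),  # normal swap 11 -> 12
    ("v4", "12"): ("out1", ()),     # egress: pop
    ("v6", "30"): ("v3", ()),       # end of the detour: pop 30
}
# Hypothetical failover table: protected link -> (alternative next hop, label to push).
FFT: Dict[Link, Tuple[str, str]] = {("v2", "v3"): ("v6", "30")}


def route(node: str, stack: Stack, failed: FrozenSet[Link] = frozenset(),
          max_hops: int = 32) -> List[Tuple[str, str, Stack]]:
    """Forward one packet; handles a single failed protected link per hop."""
    trace = []
    for _ in range(max_hops):
        top = stack[0] if stack else None
        next_hop, word = FT[(node, top)]
        stack = word + stack[1:]          # replace the top label (prefix rewriting)
        if (node, next_hop) in failed:    # protected link is down: use the backup
            next_hop, label = FFT[(node, next_hop)]
            stack = (label,) + stack      # push the detour label on top
        trace.append((node, next_hop, stack))
        if next_hop.startswith("out"):
            return trace
        node = next_hop
    raise RuntimeError("hop limit exceeded (forwarding loop?)")


print(route("v1", ()))
print(route("v1", (), frozenset({("v2", "v3")})))
```

With (v2,v3) down, the second hop becomes v2 to v6 with stack ('30', '11'); v6 pops 30 and the packet continues on its normal path from v3.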

slide-30
SLIDE 30

MPLS Networks: 2 Failures

[Figure, three panels over nodes v1 to v8 with in1, in2, out1, out2: original routing; one failure (push 30 to route around (v2,v3)); two failures (first push 30 to route around (v2,v3), then push recursively 40 to route around (v2,v6)). Label stacks shown include 10, 20, 11, 21, 12, 22, 30|11, 30|21, 31|11, 31|21, 40|30|11 and 40|30|21.]

But masking links one by one can be inefficient: (v7,v3,v8) could be shortcut to (v7,v8).

More efficient, but also more complex! How complex?
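Recursive protection can be sketched as a loop that keeps masking failed links and pushes one label per masked link. The entry for ("v2", "v3") follows the slide; the backup hop v7 for ("v2", "v6") is an assumption, since the slide only says that 40 routes around (v2,v6). The function name `protect` is hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

Stack = Tuple[str, ...]  # label stack, top first
Link = Tuple[str, str]

# Hypothetical failover table: protected link -> (alternative next hop, label to push).
# ("v2", "v3") -> v6 with 30 follows the slide; the backup hop v7 for ("v2", "v6")
# is an assumption, the slide does not spell out the detour for label 40.
FFT: Dict[Link, Tuple[str, str]] = {
    ("v2", "v3"): ("v6", "30"),
    ("v2", "v6"): ("v7", "40"),
}


def protect(node: str, next_hop: str, stack: Stack,
            failed: FrozenSet[Link]) -> Tuple[str, Stack]:
    """Apply link protection recursively until a live outgoing link is found."""
    seen = set()
    while (node, next_hop) in failed:
        link = (node, next_hop)
        if link in seen or link not in FFT:
            raise RuntimeError(f"no live backup for link {link}")
        seen.add(link)
        next_hop, label = FFT[link]
        stack = (label,) + stack  # each masked link adds one label on top
    return next_hop, stack


both_down = frozenset({("v2", "v3"), ("v2", "v6")})
print(protect("v2", "v3", ("11",), both_down))
```

With both links down the result is ('v7', ('40', '30', '11')), i.e., the stack 40|30|11 from the figure. Each additional masked link adds a label, which is one reason the analysis must not assume a fixed header size.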

slide-31
SLIDE 31

Forwarding Tables for Our Example

[Figure: flow table and failover tables for the example; failover table columns: Protected link | Alternative link | Label]

slide-33
SLIDE 33

Polynomial-Time Verification: An Automata-Theoretic Approach

[Figure: MPLS configurations, Segment Routing, etc. are compiled into a Pushdown Automaton / Prefix Rewriting System (theory), whose answers are interpreted back as “What if...?!” results. Example rules: pX ⇒ qXX, pX ⇒ qYX, qY ⇒ rYY, rY ⇒ r, rX ⇒ pX]

Use cases: the sysadmin issues queries to test certain properties, or does so automatically on a regular basis!
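The example rules can be read as a pushdown system: in pX ⇒ qXX, p and q are control states and the first stack symbol after the state is the top of the stack. The Python sketch below is our own illustration (with an arbitrary stack-length cutoff `max_len`); it applies the rules as prefix rewriting and explores the configurations reachable from pX. Here the set happens to be finite; in general it is infinite, which is why the analysis works on automata representations instead of enumerating configurations.

```python
from collections import deque
from typing import List, Set, Tuple

Config = Tuple[str, str]  # (control state, stack word), top of stack first
# The slide's rules, read as <state, top symbol> => <new state, word replacing the top>.
RULES: List[Tuple[str, str, str, str]] = [
    ("p", "X", "q", "XX"),  # pX => qXX (push)
    ("p", "X", "q", "YX"),  # pX => qYX (push)
    ("q", "Y", "r", "YY"),  # qY => rYY (push)
    ("r", "Y", "r", ""),    # rY => r   (pop)
    ("r", "X", "p", "X"),   # rX => pX  (swap, here to the same label)
]


def successors(config: Config) -> List[Config]:
    """One prefix-rewriting step: only the control state and top symbol are matched."""
    state, stack = config
    if not stack:
        return []
    top, rest = stack[0], stack[1:]
    return [(new, word + rest) for (old, sym, new, word) in RULES
            if old == state and sym == top]


def reachable(start: Config, max_len: int = 10) -> Set[Config]:
    """Breadth-first exploration, cut off at stacks longer than max_len."""
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in successors(queue.popleft()):
            if len(nxt[1]) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(reachable(("p", "X"))))
```

Starting from pX, the exploration finds exactly pX, qXX, qYX, rYYX, rYX and rX; qXX is a dead end because no rule matches state q with top X.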

slide-34
SLIDE 34

Questions with Answers in Polynomial Time

Interface Connectivity Problem

  • Can a packet arriving at interface A with label-stack header h reach an interface B?
  • Does the route avoid a given set of nodes?
  • Will the packet always traverse a given waypoint?
  • What subset of headers guarantees that a given interface is not reachable under at most k link failures?
  • And everything for up to k failures!

[Figure: a packet with label stack 5|12|4 is routed from A to B using push/pop/swap; C (with firewall) is a waypoint to use; D is blacklisted and must be avoided]

slide-35
SLIDE 35

Questions with Answers in Polynomial Time

Transparency

  • MPLS: transit networks!
  • Will a packet with an empty label stack arriving at ingress interface A always leave at egress interface B also with the empty label stack?
  • Also under k failures?

[Figure: A to B; label stack empty at A, empty at B?]

Cyclic and repeated routing

  • Will some server receive a given packet more than r times during the routing?
  • What is the maximum stack size during the routing?
  • Under failures as well…

[Figure: A to B; label stack size?]
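For a single, concrete routing sequence these properties are easy to evaluate, which can help when reading the witness traces a verifier returns. The Python sketch below is hypothetical (the trace format, `check_trace` and the example trace are assumptions); the actual verification question is whether some routing under up to k failures violates the property, which a single-trace check cannot answer.

```python
from collections import Counter
from typing import Dict, List, Tuple

Stack = Tuple[str, ...]  # label stack, top first
Hop = Tuple[str, Stack]  # (node, label stack while the packet is at that node)


def check_trace(trace: List[Hop], r: int) -> Dict[str, object]:
    """Evaluate the slide's properties on ONE concrete routing sequence."""
    visits = Counter(node for node, _ in trace)
    return {
        # Transparency: enters with an empty stack and leaves with an empty stack.
        "transparent": trace[0][1] == () and trace[-1][1] == (),
        # Repeated routing: nodes that see the packet more than r times.
        "visited_more_than_r": sorted(n for n, c in visits.items() if c > r),
        # Largest label stack observed along the way.
        "max_stack": max(len(stack) for _, stack in trace),
    }


# Hypothetical trace that passes v2 twice and leaves B with a label left over.
TRACE: List[Hop] = [
    ("A", ()), ("v1", ("10",)), ("v2", ("30", "11")), ("v6", ("11",)),
    ("v2", ("30", "11")), ("B", ("11",)),
]
print(check_trace(TRACE, r=1))
```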

slide-36
SLIDE 36

Our Approach

The clue: exploit the specific structure of MPLS rules

  • OpenFlow rules: arbitrary rewriting
    in × L* → out × L*   (header size not fixed!)

vs

  • (Simplified) MPLS rules: prefix rewriting
    FT: in × L → out × OP, where OP = {swap, push, pop}
    FFT: out × L → out × OP, where OP = {swap, push, pop}

slide-37
SLIDE 37
A Network Model

  • A general network consists of:
    • Nodes
    • Links
    • Incoming interfaces
    • Outgoing interfaces
    • Set of labels in the packet header

slide-38
SLIDE 38

A Network Model

  • A general network: interface function
  • The interface function maps each outgoing interface to its next-hop node and each incoming interface to its previous-hop node.

slide-39
SLIDE 39

A Network Model

  • A general network: routing function
  • For each set of failed links, the routing function defines, for all incoming interfaces and packet headers, the outgoing interfaces together with the modified headers.

slide-40
SLIDE 40

Routing in Network

  • A packet routing sequence can be represented using tuples, each read as: node receives… … on interface… … packet with header… … forwards it to live next hop… … with new header… … given that these links are down.
  • Packet routing is then an (in)finite sequence of such tuples.

slide-41
SLIDE 41

MPLS Network Model

  • MPLS supports three operations on header sequences: push, swap and pop
  • The local routing table maps an incoming interface and a label to a next hop and an operation
  • The local link protection function suggests a backup interface for a protected one; the backup operation is typically a push

slide-42
SLIDE 42
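MPLS Pushdown Prefix Rewriting System

  • A prefix rewriting system is a finite set $R$ of rewriting rules $v \Rightarrow w$ with $v, w \in \Sigma^*$
  • It generates a transition system on $\Sigma^*$ such that $vu \to wu$ iff $(v \Rightarrow w) \in R$, for all $u \in \Sigma^*$ (replace prefix)
  • A prefix rewriting system is called a pushdown system if $\Sigma = P \cup \Gamma$ and for all rules $v \Rightarrow w$: $v \in P\Gamma$ and $w \in P\Gamma^*$

First symbol of v and w: control state of the pushdown system. Second symbol of v: top-of-stack label.

Push, swap and pop correspond to rules whose right-hand side w contains two, one and zero stack symbols, respectively.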

slide-43
SLIDE 43

MPLS Pushdown Prefix Rewriting System

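  • Control states: encode the node and incoming link and, when forwarding, how many times we have already tried to reroute at this node
  • Labels: the stack symbols, with a special symbol at the bottom of the stack
  • A packet with a given header arriving at an incoming interface of a node is represented as a pushdown configuration (control state: node and incoming link; stack: the header)
  • A packet to be forwarded at a node to an outgoing interface is represented by a configuration whose control state also records how many times we have tried to reroute at this node already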

slide-44
SLIDE 44

Example Rules: Regular Forwarding on Top-Most Label

  • Push: push label on stack
  • Swap: swap top of stack
  • Pop: pop top of stack

slide-45
SLIDE 45

Example Failover Rules

  • Failover-Push, Failover-Swap, Failover-Pop: enumerate all rerouting options
  • Example rewriting sequence: try default, then try first backup, then try second backup
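The idea of enumerating rerouting options with a counter can be sketched as a small compiler from a prioritized forwarding table to pushdown rules. This is a simplified, hypothetical encoding, not the paper's exact construction: the control state is (node, option being tried, failures used so far), moving on to the next option spends one unit of the failure budget k, and the counter does not remember which links failed, so it over-approximates consistent failure scenarios. It illustrates why the rule set grows linearly in k rather than with $\binom{n}{k}$.

```python
from typing import Dict, List, Tuple

Word = Tuple[str, ...]                 # labels, top of stack first
State = Tuple[str, int, int]           # (node, option being tried, failures used)
Rule = Tuple[State, str, State, Word]  # <state, top> => <state', word>
# Hypothetical table: (node, top label) -> prioritized options [(next hop, word)],
# default first, then first backup, second backup, ...; `word` replaces the top label.
Table = Dict[Tuple[str, str], List[Tuple[str, Word]]]


def compile_rules(table: Table, k: int) -> List[Rule]:
    """Enumerate rerouting options as pushdown rules with a failure budget k."""
    rules: List[Rule] = []
    for (node, top), options in table.items():
        for i, (next_hop, word) in enumerate(options):
            for f in range(k + 1):
                here = (node, i, f)
                # Use option i: forward to next_hop and reset the option counter.
                rules.append((here, top, (next_hop, 0, f), word))
                # Assume option i's link is down: try the next option (costs 1 failure).
                if f < k and i + 1 < len(options):
                    rules.append((here, top, (node, i + 1, f + 1), (top,)))
    return rules


# Default: swap 10 -> 11 towards v3; backup: swap to 11 and push 30 towards v6.
TABLE: Table = {("v2", "10"): [("v3", ("11",)), ("v6", ("30", "11"))]}
for rule in compile_rules(TABLE, k=1):
    print(rule)
```

For k = 1 the single table entry yields five rules, among them the failover rule that leaves state (v2, 1, 1) for v6 with word 30|11.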

slide-48
SLIDE 48

Why Polynomial Time?!

  • Arbitrary number k of failures: how can I avoid checking all $\binom{n}{k}$ many options?!
  • Even if we reduce to a pushdown automaton: simple operations such as emptiness testing or intersection on Push-Down Automata (PDA) are computationally non-trivial, and some (e.g., deciding whether the languages of two PDAs intersect) are even undecidable!

k failures = $\binom{n}{k}$ possibilities

The Clue: this is not how we will use the PDA!

The words in our language are sequences of pushdown stack symbols, not the labels of transitions.

slide-49
SLIDE 49

Time for Automata Theory!

Julius Richard Büchi (1924-1984), Swiss logician

  • Classic result by Büchi (1964): the set of all reachable configurations of a pushdown automaton is a regular set
  • Hence, we can operate only on Nondeterministic Finite Automata (NFAs) when reasoning about the pushdown automata
  • The resulting regular operations all run in polynomial time
  • Important result of model checking
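Büchi's result has a well-known constructive counterpart: the pre* saturation procedure (Bouajjani, Esparza and Maler, 1997), which adds transitions to a finite automaton until it accepts every configuration from which a given regular set of configurations is reachable; pushdown model checkers such as Moped implement algorithms of this kind. Below is a naive Python version, applied to the example rules pX ⇒ qXX, pX ⇒ qYX, qY ⇒ rYY, rY ⇒ r, rX ⇒ pX with target set {pX}. The function names and the final state `f` are our own, and the sketch assumes the initial automaton has no transitions into control states.

```python
from typing import Iterable, Set, Tuple

Rule = Tuple[str, str, str, str]   # (p, top, p2, word): <p, top u> => <p2, word u>
Trans = Set[Tuple[str, str, str]]  # P-automaton transitions (source, symbol, target)


def read(trans: Trans, start: str, word: str) -> Set[str]:
    """States reachable from `start` by reading `word` (one symbol per character)."""
    current = {start}
    for symbol in word:
        current = {dst for (src, a, dst) in trans if src in current and a == symbol}
    return current


def pre_star(rules: Iterable[Rule], trans: Trans) -> Trans:
    """Saturate a P-automaton so that it also accepts all predecessor configurations."""
    rules = list(rules)
    result = set(trans)
    changed = True
    while changed:
        changed = False
        for p, top, p2, word in rules:
            # If p2 can read `word` to reach q, then <p, top ...> can also reach q.
            for q in read(result, p2, word):
                if (p, top, q) not in result:
                    result.add((p, top, q))
                    changed = True
    return result


def accepts(trans: Trans, finals: Set[str], state: str, stack: str) -> bool:
    """Is configuration <state, stack> accepted (control states are initial states)?"""
    return bool(read(trans, state, stack) & finals)


RULES = [("p", "X", "q", "XX"), ("p", "X", "q", "YX"), ("q", "Y", "r", "YY"),
         ("r", "Y", "r", ""), ("r", "X", "p", "X")]
TARGET: Trans = {("p", "X", "f")}  # accepts exactly the configuration pX
PRE = pre_star(RULES, TARGET)
print(sorted(PRE))
```

The saturated automaton is finite, yet it accepts infinitely many configurations: rY^mX for every m ≥ 0 and qY^mX for every m ≥ 1. The configuration pXX is correctly rejected, because its extra bottom X can never be popped.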

slide-50
SLIDE 50

Preliminary Query Language: Example

Question: Beginning with an empty header [], can we get from s1 to s7 in any number of steps, and end with an empty header []?

Query: []s1 >> s7[]   (“>>”: take multiple steps; “[]”: empty header)

Output: Yes, and a witness trace (excerpt)

slide-51
SLIDE 51

Example 2: Traversal Testing

Traversal test: Can traffic starting with [] go through s3, under up to k=1 failures?

Query: k=1 [] s1 >> s3 >> s7 []

YES! [Witness trace with 1 failure (a link is down); operations: push, pop, swap, swap, swap]

slide-52
SLIDE 52

Example 3: Traversal with 2 Failures

Traversal test with k=2: Can traffic go through s5, under up to k=2 failures?

Query: k=2 [] s1 >> s5 >> s7 []

YES! [Witness trace with 2 failures; operations: push, push (note the stack size!), pop, pop]

slide-53
SLIDE 53

Example 4: Transparency Violation

Transparency with k=3: Can transparency be violated under up to k=3 failures?

Query: k=3 [] s1 >> s7 [+]

YES! [Witness trace with 3 failures; the header is empty at the start and non-empty at the end]

Root cause is a misconfiguration in s5, causing it to swap to 11 instead of popping when doing the failover on s5-s4.
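The query syntax in these examples can be captured by a small parser. The grammar below is our own guess at the fragment used on the slides ([], [+], >> and an optional k=N prefix); it is not the tool's actual grammar, and treating a missing k as 0 is an assumption.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical grammar for the query fragment on these slides:
#   [k=N] HEADER node >> node >> ... >> node HEADER
# where HEADER is "[]" (empty label stack) or "[+]" (non-empty label stack).
QUERY_RE = re.compile(r"^\s*(?:k=(\d+)\s+)?(\[\+?\])\s*(.*?)\s*(\[\+?\])\s*$")


@dataclass
class Query:
    k: int             # max. number of link failures (assumed 0 when omitted)
    start_header: str  # "[]" or "[+]"
    path: List[str]    # nodes to visit in order; ">>" = any number of steps
    end_header: str


def parse_query(text: str) -> Query:
    match = QUERY_RE.match(text)
    if match is None:
        raise ValueError(f"unsupported query: {text!r}")
    k, start, middle, end = match.groups()
    path = [node.strip() for node in middle.split(">>")]
    if not all(path):
        raise ValueError(f"empty node name in query: {text!r}")
    return Query(int(k) if k else 0, start, path, end)


for q in ["[]s1 >> s7[]", "k=1 [] s1 >> s3 >> s7 []", "k=3 [] s1 >> s7 [+]"]:
    print(parse_query(q))
```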

slide-54
SLIDE 54

Preliminary Tool

Query processing flow:

  • Part 1: Parses the query and constructs a Push-Down System (PDS), in Python 3
  • Part 2: Reachability analysis of the constructed PDS, using the Moped tool

slide-55
SLIDE 55

Preliminary Evaluation

  • Small queries are fast: 1000s of links within seconds
  • The bottleneck is large queries
  • The number of failures affects performance only linearly!

[Plot: runtime in secs; axis ticks include 1000s and 100,000s]
slide-56
SLIDE 56

Summary

  • Polynomial-time verification of MPLS reachability and policy-related properties like waypointing
  • For an arbitrary number of failures (up to linear in n)!
  • Supports arbitrary header sizes (“infinite”)
  • Also allows computing the headers which do (not) fulfill a property
  • Can also support a constant number of stateful nodes
  • Extends to Segment Routing networks based on MPLS (SR-MPLS)
  • Leverages theory from Prefix Rewriting Systems and Büchi's classic result

slide-57
SLIDE 57

Future Work

  • Other networks and properties which can be verified in polynomial time?
  • A good tradeoff between expressiveness and polynomial-time verifiability?
  • We're looking for industrial case studies and collaborations

Thank you! Questions?

slide-58
SLIDE 58

Further Reading

  • Polynomial-Time What-If Analysis for Prefix-Manipulating MPLS Networks. Stefan Schmid and Jiri Srba. 37th IEEE Conference on Computer Communications (INFOCOM), Honolulu, Hawaii, USA, April 2018.
  • WNetKAT: A Weighted SDN Programming and Verification Language. Kim G. Larsen, Stefan Schmid, and Bingtian Xue. 20th International Conference on Principles of Distributed Systems (OPODIS), Madrid, Spain, December 2016.
  • TI-MFA: Keep Calm and Reroute Segments Fast. Klaus-Tycho Foerster, Mahmoud Parham, Marco Chiesa, and Stefan Schmid. IEEE Global Internet Symposium (GI), Honolulu, Hawaii, USA, April 2018.
  • Local Fast Failover Routing With Low Stretch. Klaus-Tycho Foerster, Yvonne-Anne Pignolet, Stefan Schmid, and Gilles Tredan. ACM SIGCOMM Computer Communication Review (CCR), 2018.