Polynomial-Time What-If Analysis for Prefix-Manipulating MPLS Networks
Stefan Schmid (University of Vienna, Austria), Jiri Srba (Aalborg University, Denmark)
… and Segment Routing! ...
Teaser: Can we verify reachability under k failures without trying exponentially many options?
An Automata-Theoretic Approach.
Kudos to collaborators: Jesper Stenbjerg Jensen, Jonas Sand Madsen, Troels Beck Krøgh at Aalborg University, Denmark
We discovered a misconfiguration on this pair of switches that caused what's called a “bridge loop” in the network.
A network change was […] executed incorrectly […] more “stuck” volumes and added more requests to the re-mirroring storm.
Service outage was due to a series of internal network events that corrupted router data tables.
Experienced a network connectivity issue […] interrupted the airline's flight departures, airport processing and reservations systems.
Credits: Nate Foster
Datacenter, enterprise, carrier networks: mission-critical infrastructures. But even tech-savvy companies struggle to provide reliable operations.
Example: BGP in Datacenter
Cluster with services that should be globally reachable. Cluster with services that should be accessible only internally.
X and Y announce to Internet what is from G* (prefix). X and Y block what is from P*.
What can go wrong?
Credits: Beckett et al. (SIGCOMM 2016): Bridging Network-wide Objectives and Device-level Configurations.
distributed across network
Can A reach egress port B?
Are the forwarding rules loop-free?
Is traffic from A to B always routed via a node C (e.g., a firewall)?
[Figure: network with nodes A, B, C. Policy ok?]
What if...?!
k failures = $\binom{n}{k}$ possibilities
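To get a feeling for the blow-up, the count of failure scenarios can be sketched as follows. This is only an illustration: reading n as the number of links is an assumption, and the function name is hypothetical.

```python
from math import comb

def failure_scenarios(num_links, max_failures):
    """Number of distinct sets of failed links with at most max_failures members.

    Assumption: n in the slide is the number of links in the network.
    """
    return sum(comb(num_links, i) for i in range(max_failures + 1))

if __name__ == "__main__":
    # A brute-force what-if analysis would have to check each of these scenarios.
    for k in range(4):
        print(k, failure_scenarios(1000, k))
```

Even for a moderate network, enumerating every scenario quickly becomes impractical as k grows.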
Enables a more automated network operation and verification!
[Figure: switch with incoming interface in’, a self-loop, and outgoing interface out.]
Self-loop: could be replaced by “dummy switch”.
Idea: packet header stores Turing machine configuration (tape, head, state).
Switch action: each time a packet arrives, the switch performs one Turing machine step and updates the header.
Only on accept or reject is the packet forwarded to out. Is out ever reached? Undecidable!
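The construction can be made concrete with a toy simulation. This is a sketch under stated assumptions: the transition table `DELTA`, the port names "self" and "out", and the hop bound are all hypothetical and only serve to show one Turing machine step per switch visit.

```python
# Hypothetical machine: scan right over 0s, accept on the first 1, reject on blank.
# (state, symbol) -> (new_state, written_symbol, head_move)
DELTA = {
    ("scan", "0"): ("scan", "0", +1),
    ("scan", "1"): ("accept", "1", 0),
    ("scan", "_"): ("reject", "_", 0),
}

def switch(header):
    """One switch visit performs one Turing machine step on the packet header.

    header is (tape, head, state). Returns (port, new_header), where port is
    "out" once the machine has halted and "self" (the self-loop) otherwise.
    """
    tape, head, state = header
    if state in ("accept", "reject"):
        return "out", header
    symbol = tape[head] if head < len(tape) else "_"
    new_state, written, move = DELTA[(state, symbol)]
    cells = list(tape) + ["_"] * (head + 1 - len(tape))  # extend tape with blanks
    cells[head] = written
    return "self", ("".join(cells), head + move, new_state)

def forward(header, max_hops):
    """Follow the self-loop for at most max_hops visits.

    Returns (hops_before_out, final_header), or None if out was not reached.
    The bound is needed because, in general, reaching out is undecidable.
    """
    for hops in range(max_hops):
        port, header = switch(header)
        if port == "out":
            return hops, header
    return None
```

For example, `forward(("001", 0, "scan"), 10)` leaves via out after three self-loop visits, while a too-small hop bound returns `None` without telling us whether out would ever be reached.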
Independently of the number of failures! No need to try combinations.
E.g., MPLS networks or Segment Routing networks.
Reachability, loop-freedom, waypointing, etc.!
Default routing of two flows
[Figure: network with nodes v1 to v8 and ingress interfaces in1, in2. Link labels: 10, 20, 11, 21, 12, 22. Operations shown: push, swap, swap, pop, pop.]
One failure: push 30: route around (v2,v3)
If (v2,v3) failed, push 30 and forward to v6.
[Figure: label stacks 30|11 and 30|21 on the detour. Callouts: Pop, Normal swap.]
Two failures: first push 30: route around (v2,v3). Push recursively 40: route around (v2,v6)
[Figure: label stacks 40|30|11 and 40|30|21 on the second detour, then 30|11, 30|21, 31|11, 31|21. Callouts: Push 30, Push 40.]
But masking links one-by-one […]: (v7,v3,v8) could be shortcut to (v7,v8).
Failover table columns: Protected link, Alternative link, Label
MPLS configurations, Segment Routing etc.
Compilation / Interpretation
Pushdown Automaton and Prefix Rewriting Systems Theory
Example rules: pX ⇒ qXX, pX ⇒ qYX, qY ⇒ rYY, rY ⇒ r, rX ⇒ pX
What if...?!
Use cases: Sysadmin issues queries to test certain properties, or do it […]
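The example rules can be executed directly as prefix rewriting on strings. The sketch below assumes a configuration is written as a string whose first character is the control state and whose remaining characters are the stack, top first; the function names are hypothetical.

```python
# The five example rules from the slide, as (left-hand prefix, right-hand prefix).
RULES = [("pX", "qXX"), ("pX", "qYX"), ("qY", "rYY"), ("rY", "r"), ("rX", "pX")]

def successors(config):
    """All configurations reachable in one step by rewriting a prefix."""
    return {rhs + config[len(lhs):] for lhs, rhs in RULES if config.startswith(lhs)}

def reachable(start, max_len):
    """Explore configurations of length at most max_len reachable from start.

    The length bound keeps this naive search finite; in general the set of
    reachable configurations of a pushdown system can be infinite.
    """
    seen = {start}
    frontier = [start]
    while frontier:
        config = frontier.pop()
        for nxt in successors(config):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen
```

Starting from `"pX"`, the rule pX ⇒ qYX leads through `"rYYX"`, `"rYX"` and `"rX"` back to `"pX"`, whereas `"qXX"` is a dead end because no rule has the prefix qX.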
Interface Connectivity Problem
[…] label-stack header h reach an interface B?
[…] nodes?
[…] waypoint?
[…] a given interface is not reachable under at most k link failures?
[Figure: route from A to B. C: with firewall (waypoint: use!). D: blacklisted (avoid). Push/pop/swap on the label stack, e.g., 5|12|4.]
Transparency
[…] stack arriving at ingress interface A always leave at egress interface B also with the empty label-stack?
[Figure: A to B. Label stack at A: empty. Label stack at B: empty?]
Cyclic and repeated routing
[…] packet more than r times during the routing?
[…] the routing?
[Figure: A to B. Label stack: size?]
Header size not fixed!
FT: in × L → out × OP, where OP = {swap, push, pop}
FFT: out × L → out × OP, where OP = {swap, push, pop}
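The two table signatures can be read as dictionary lookups. The following is a minimal sketch, not the authors' implementation: interface names such as "v2-v3" are hypothetical, and it assumes the FFT is consulted with the top label that results after the FT operation has been applied.

```python
def apply_op(stack, op):
    """Apply one of swap/push/pop to a label stack (list, top of stack first)."""
    kind, label = op
    if kind == "push":
        return [label] + stack
    if kind == "swap":
        return [label] + stack[1:]
    if kind == "pop":
        return stack[1:]
    raise ValueError(f"unknown operation: {kind}")

def route(ft, fft, in_if, stack, failed):
    """Forward one packet at one node.

    ft:  (in_interface, top_label)  -> (out_interface, op)
    fft: (out_interface, top_label) -> (backup_out_interface, op)
    failed: set of outgoing interfaces whose link is down.
    Raises KeyError if no rule or backup exists, and IndexError if the stack
    is empty when a lookup is needed (both left unhandled in this sketch).
    """
    out_if, op = ft[(in_if, stack[0])]
    stack = apply_op(stack, op)
    while out_if in failed:  # header size is not fixed: each failover may push
        out_if, op = fft[(out_if, stack[0])]
        stack = apply_op(stack, op)
    return out_if, stack

if __name__ == "__main__":
    ft = {("v1-v2", 10): ("v2-v3", ("swap", 11))}
    fft = {("v2-v3", 11): ("v2-v6", ("push", 30))}
    print(route(ft, fft, "v1-v2", [10], failed={"v2-v3"}))
```

With the link behind "v2-v3" marked as failed, the packet leaves on "v2-v6" with the stack 30|11, which mirrors the "push 30 and forward to v6" behaviour of the example.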
Nodes
Links
Incoming interfaces
Outgoing interfaces
Set of labels in packet header
Interface function: maps each outgoing interface to its next-hop node and each incoming interface to its previous-hop node. That is: […] and […]
Routing function: for each set of failed links, the routing function defines, for all incoming interfaces and packet headers, the outgoing interface and the modified header.
Packet routing sequence can be represented using tuples:
Each tuple reads: node receives, on interface, a packet with header, forwards it to the live next hop, with a new header, given that these links are down.
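One way to hold such a routing sequence in code is a list of named tuples. This is an illustrative sketch; the field names and the consistency check are assumptions, not notation from the talk.

```python
from typing import FrozenSet, List, NamedTuple, Tuple

class Step(NamedTuple):
    node: str                      # node that receives the packet
    in_interface: str              # interface it arrives on
    header: Tuple[int, ...]        # label stack on arrival, top first
    out_interface: str             # live next hop it is forwarded to
    new_header: Tuple[int, ...]    # label stack after the operation
    failed_links: FrozenSet[str]   # links that are down

def is_consistent(trace: List[Step]) -> bool:
    """Check that consecutive steps agree.

    Each step must receive the header the previous step produced, and the
    set of failed links must stay the same along the whole trace.
    """
    return all(
        a.new_header == b.header and a.failed_links == b.failed_links
        for a, b in zip(trace, trace[1:])
    )
```

A trace that changes the header between two hops without an operation, or that switches failure sets midway, is rejected by `is_consistent`.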
Protected link → backup link; typically: push.
Interface + label maps to next hop and operation.
Prefix rewriting rules: replace a prefix v of the configuration by w.
First symbol of v and w: control state of pushdown system. Second symbol of v: top of stack label.
Rule types: push, swap, pop.
[…] generates a transition system […]
[…] at bottom
A packet arriving at an incoming interface of a node is represented as a pushdown configuration: […]
Forwarding to an outgoing interface is represented by a configuration: […]
How many times have we tried to reroute at this node already?
Node and incoming link
Push label on stack
Swap top of stack
Pop top of stack
Enumerate all rerouting options
Try default
Try first backup
Try second backup
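A compilation from forwarding tables to pushdown rules could look roughly as follows. This is one possible encoding written for illustration, not necessarily the one used by the authors: the state names, the per-node reroute counter bounded by k, and the `link` map are assumptions. The sketch also does not check which links are actually failed; it only enumerates the default and backup options.

```python
def rewrite(label, op):
    """Stack word that replaces the top label after applying op."""
    kind, new = op
    if kind == "push":
        return (new, label)
    if kind == "swap":
        return (new,)
    if kind == "pop":
        return ()
    raise ValueError(f"unknown operation: {kind}")

def compile_rules(ft, fft, link, labels, k):
    """Build rules ((state, top_label), (new_state, replacement_word)).

    ft:   (in_interface, label)  -> (out_interface, op)
    fft:  (out_interface, label) -> (backup_interface, op)
    link: out_interface -> in_interface of the next node
    Control states are either an incoming interface or a pair
    (out_interface, reroutes_so_far).
    """
    rules = []
    # Try default: apply the FT entry, no reroute attempted yet.
    for (in_if, label), (out_if, op) in ft.items():
        rules.append(((in_if, label), ((out_if, 0), rewrite(label, op))))
    # Try first backup, second backup, ...: each FFT use bumps the counter.
    for (out_if, label), (backup_if, op) in fft.items():
        for i in range(k):
            rules.append((((out_if, i), label), ((backup_if, i + 1), rewrite(label, op))))
    # Cross the link: header unchanged, counter dropped at the next node.
    # An empty stack would need a bottom-of-stack marker among the labels.
    for out_if, next_in in link.items():
        for i in range(k + 1):
            for label in labels:
                rules.append((((out_if, i), label), (next_in, (label,))))
    return rules
```

The number of generated rules grows with the table sizes, the label set and k, not with the number of failure combinations.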
How can I avoid checking all $\binom{n}{k}$ many options?!
[…] automaton: simple operations such as emptiness testing or intersection on Push-Down Automata (PDA) are computationally non-trivial and sometimes even undecidable!
The Clue: this is not how we will use the PDA!
Julius Richard Büchi (1924-1984), Swiss logician
The set of reachable configurations of a pushdown automaton is a regular set.
Hence we can work with Nondeterministic Finite Automata (NFAs) when reasoning about the pushdown automata.
[…] polynomial time
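The regularity result is what makes saturation-style algorithms work: the set of configurations that can reach a regular target set is computed by adding transitions to a finite automaton until nothing changes. Below is a minimal sketch of the textbook pre* saturation, applied to the example rules pX ⇒ qXX, pX ⇒ qYX, qY ⇒ rYY, rY ⇒ r, rX ⇒ pX. It is a naive fixpoint loop, not the optimized procedure a real tool would use, and the extra automaton state "f" is an assumption of this example.

```python
def reach(trans, state, word):
    """Automaton states reachable from `state` by reading `word`."""
    current = {state}
    for symbol in word:
        current = {t for (s, a, t) in trans if s in current and a == symbol}
    return current

def pre_star(rules, trans):
    """Saturate the automaton so that it accepts all predecessors of its language.

    rules: list of ((p, gamma), (p2, w)) meaning  p gamma => p2 w.
    trans: set of (state, symbol, state); control states double as automaton
    states, and the initial automaton must have no transitions into them.
    """
    trans = set(trans)
    changed = True
    while changed:
        changed = False
        for (p, gamma), (p2, w) in rules:
            # If p2 can read w and end in q, then p can read gamma and end in q.
            for q in reach(trans, p2, w):
                if (p, gamma, q) not in trans:
                    trans.add((p, gamma, q))
                    changed = True
    return trans

def accepts(trans, finals, state, word):
    """Does the automaton accept configuration (state, word)?"""
    return bool(reach(trans, state, word) & finals)

RULES = [(("p", "X"), ("q", "XX")), (("p", "X"), ("q", "YX")),
         (("q", "Y"), ("r", "YY")), (("r", "Y"), ("r", "")),
         (("r", "X"), ("p", "X"))]
# Target set: the single configuration rX.
SATURATED = pre_star(RULES, {("r", "X", "f")})
```

After saturation, `accepts(SATURATED, {"f"}, "p", "X")` holds because pX can reach rX, and the same automaton answers the question for infinitely many configurations such as r followed by any number of Y and then X, without enumerating them.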
Question: Beginning with an empty header [], can we get from s1 to s7 in any number of steps, and end with an empty header []?
Query: [] s1 >> s7 []
Here >> means: take multiple steps; [] means: empty header.
Output: Yes and witness trace (excerpt)
Traversal test: Can traffic starting with [] go through s3, under up to k=1 failures?
Query: k=1 [] s1 >> s3 >> s7 []
[Figure: witness trace with 1 failure (link down). Operations along the path: push, pop, swap, swap, swap.]
Traversal test with k=2: Can traffic go through s5, under up to k=2 failures?
Query: k=2 [] s1 >> s5 >> s7 []
[Figure: witness trace with 2 failures. Operations: push, push, pop, pop. Callout: stack size!]
Transparency with k=3: Can transparency be violated under up to k=3 failures?
Query: k=3 [] s1 >> s7 [+]
[Figure: witness trace with 3 failures. Header labels in the figure: empty, non-empty.]
Root cause is a misconfiguration in s5, causing it to swap to 11 instead of popping when doing the failover on s5-s4.
Query processing flow:
Part 1: Parses query and constructs Push-Down System (PDS)
Part 2: Reachability analysis of constructed PDS
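The parsing half of Part 1 can be sketched for the three example queries shown in this talk. The regular expression and the returned dictionary layout are assumptions made for illustration; the tool's real query language may be richer, and treating a missing k as 0 is a guess.

```python
import re

# Optional "k=<n>", then "[initial header]", a path "a >> b >> c", "[final header]".
QUERY = re.compile(r"^\s*(?:k=(\d+)\s+)?\[(.*?)\]\s*(.+?)\s*\[(.*?)\]\s*$")

def parse_query(text):
    """Split a query string into its parts (hypothetical structure)."""
    match = QUERY.match(text)
    if match is None:
        raise ValueError(f"cannot parse query: {text!r}")
    k, initial, path, final = match.groups()
    return {
        "k": int(k) if k else 0,  # assumption: no k means no failures
        "initial_header": initial.strip(),
        "waypoints": [node.strip() for node in path.split(">>")],
        "final_header": final.strip(),
    }

if __name__ == "__main__":
    print(parse_query("k=1 [] s1 >> s3 >> s7 []"))
```

The parsed waypoints and header constraints would then drive the construction of the pushdown system and the reachability question posed on it.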
Fast for small queries: 1000s of links, within seconds
Bottleneck: large queries
[Plot: running time in secs; axis ticks in the 1000s and 100,000s.]
# failures affects performance
related properties like waypointing
classic result
polynomial time?
verifiability?
Polynomial-Time What-If Analysis for Prefix-Manipulating MPLS Networks. Stefan Schmid and Jiri Srba. 37th IEEE Conference on Computer Communications (INFOCOM), Honolulu, Hawaii, USA, April 2018.
WNetKAT: A Weighted SDN Programming and Verification Language. Kim G. Larsen, Stefan Schmid, and Bingtian Xue. 20th International Conference on Principles of Distributed Systems (OPODIS), Madrid, Spain, December 2016.
TI-MFA: Keep Calm and Reroute Segments Fast. Klaus-Tycho Foerster, Mahmoud Parham, Marco Chiesa, and Stefan Schmid. IEEE Global Internet Symposium (GI), Honolulu, Hawaii, USA, April 2018.
Local Fast Failover Routing With Low Stretch. Klaus-Tycho Foerster, Yvonne-Anne Pignolet, Stefan Schmid, and Gilles Tredan. ACM SIGCOMM Computer Communication Review (CCR), 2018.