SLIDE 1 Improving network agility with seamless BGP reconfigurations
IRTF Open Meeting, IETF87
Laurent Vanbever, vanbever@cs.princeton.edu
July 30, 2013
Based on joint work with Stefano Vissicchio, Luca Cittadini, Cristel Pelsser, Pierre François and Olivier Bonaventure
SLIDE 2
“When you are changing the tires of a moving car ...”
SLIDE 3
“When you are changing the tires of a moving car, make sure one wheel is ...”
SLIDE 4
Why do seamless BGP reconfigurations matter?
BGP is critical for ISPs: it enforces business relationships and is responsible for most of the traffic
BGP configuration is often changed: on average, 400+ changes per month in a Tier1
Changing a BGP configuration can impact availability, even if the initial and final configurations are safe
SLIDE 5
1 BGP reconfiguration: A crash course
2 Finding an ordering: Is it easy? Does it exist?
3 Reconfiguration framework: Overcome complexity
SLIDE 6
1 BGP reconfiguration: A crash course
Finding an ordering: Is it easy? Does it exist?
Reconfiguration framework: Overcome complexity
SLIDE 7 BGP is the only inter-domain routing protocol used today
[Figure: ASes AS1, AS10, AS20, AS30, AS40, AS50; labels: Border Gateway Protocol, Autonomous System]
SLIDE 8 BGP comes in two flavors
[Figure: ASes AS1, AS10, AS20, AS30, AS40, AS50]
SLIDE 9 external BGP (eBGP) exchanges reachability information between ASes
[Figure: ASes AS1, AS10, AS20, AS30, AS40, AS50; highlighted: eBGP sessions]
SLIDE 10 internal BGP (iBGP) distributes externally learned routes within the AS
[Figure: ASes AS1, AS10, AS20, AS30, AS40, AS50; highlighted: iBGP sessions]
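The eBGP/iBGP distinction on the last two slides can be sketched in a few lines. This is only an illustration: the function name `session_type` is hypothetical, and the rule it encodes is the usual one (a session is internal when both endpoints belong to the same AS, external otherwise).

```python
def session_type(local_as: int, peer_as: int) -> str:
    """Classify a BGP session from the AS numbers of its two endpoints."""
    # Same AS on both ends: internal BGP. Different ASes: external BGP.
    return "iBGP" if local_as == peer_as else "eBGP"


print(session_type(1, 10))  # a session between AS1 and AS10
print(session_type(1, 1))   # a session between two routers inside AS1
```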
SLIDE 11 Plain iBGP mandates a full-mesh of iBGP sessions
O(n^2) iBGP sessions, where n is the number of routers ... quickly becomes totally unmanageable
[Figure: iBGP full-mesh. Fair warning: some sessions are missing]
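To make the quadratic growth concrete, here is a small sketch that counts the sessions of a full mesh, one per unordered pair of routers, i.e. n(n-1)/2. The helper name `full_mesh_sessions` and the sample sizes are chosen for illustration only.

```python
def full_mesh_sessions(n: int) -> int:
    """Number of iBGP sessions in a full mesh of n routers: one per router pair."""
    return n * (n - 1) // 2


# Sample network sizes (illustrative values only).
for n in (10, 36, 100, 1000):
    print(f"{n} routers -> {full_mesh_sessions(n)} sessions")
```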
SLIDE 12
With Route Reflection, iBGP routers are hierarchically organized
SLIDE 13
[Figure labels: Route Reflectors, Clients]
Route Reflectors relay route updates between iBGP neighbors
SLIDE 14
[Figure labels: Route Reflectors, Clients]
Route Reflectors relay route updates between iBGP neighbors
Lower layers rely on upper layers to learn and propagate routing information
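The relaying behavior can be sketched as follows. The slides do not spell out the rules, so this assumes the standard route-reflection behavior: routes from clients go to everyone, routes from non-client iBGP peers go to clients only. As a simplification, the sender is excluded from the targets. The function name and router names are hypothetical.

```python
def reflect_targets(learned_from, clients, non_clients):
    """Return the iBGP neighbors to which a route reflector relays a route.

    learned_from: neighbor that sent the route (any other value means eBGP-learned)
    clients, non_clients: sets of iBGP neighbor names of the reflector
    """
    if learned_from in clients:
        # From a client: relay to the other clients and to all non-client peers.
        return (clients | non_clients) - {learned_from}
    if learned_from in non_clients:
        # From a non-client iBGP peer: relay to clients only.
        return set(clients)
    # Learned over eBGP: advertise to every iBGP neighbor.
    return clients | non_clients


clients, peers = {"C1", "C2"}, {"P1"}
print(reflect_targets("C1", clients, peers))
print(reflect_targets("P1", clients, peers))
```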
SLIDE 15
iBGP and eBGP need to be carefully configured
A BGP configuration is composed of
iBGP: client sessions, route-reflector sessions, peer sessions
eBGP: external sessions, routing policies
SLIDE 16
Each part of a BGP configuration can be changed
Typical reconfiguration scenarios consist of
iBGP: add sessions, remove sessions, change type
SLIDE 17
Each part of a BGP configuration can be changed
Typical reconfiguration scenarios consist of
iBGP: add sessions, remove sessions, change type
eBGP: add sessions, remove sessions, modify policies
SLIDE 18 Reconfiguring BGP can be disruptive
BGP reconfigurations can create signaling anomalies, dissemination anomalies, forwarding anomalies, or any combination of those
[Griffin, SIGCOMM02] [Vissicchio, INFOCOM12] [Griffin, SIGCOMM02]
SLIDE 19 Reconfiguring BGP can be disruptive
BGP reconfigurations can create signaling anomalies, dissemination anomalies, forwarding anomalies, or any combination of those
routing oscillations, black holes, forwarding loops, traffic shifts
SLIDE 20 Reconfiguring BGP can be disruptive
BGP reconfigurations can create signaling anomalies, dissemination anomalies, forwarding anomalies, or any combination of those
How much?
SLIDE 21
Let’s migrate from a full-mesh to a RR topology
SLIDE 22
Let’s migrate from a full-mesh to a RR topology, following best practices
Establish the RR sessions in a bottom-up manner, then remove the full-mesh sessions [Herrero10]
SLIDE 23 Best practices do not work
60% of the experiments were subject to loops for > 35% of the steps
[Figure: % of migration steps with anomalies (x-axis) vs. Tier1 (50) experiments, cumul. frequency (y-axis); curve: Loops]
SLIDE 24 Best practices do not work
100% of the experiments were subject to traffic shifts for > 40% of the steps
[Figure: % of migration steps with anomalies (x-axis) vs. Tier1 (50) experiments, cumul. frequency (y-axis); curves: Traffic shifts, Loops]
SLIDE 25 Let’s tune BGP policies
[Figure: AS1 with egress points E1, E2, E3, E4, E5; ASes AS2, AS3, AS4; prefix P]
SLIDE 26 AS1 learns a destination P via 5 egress points
[Figure: same topology as slide 25]
SLIDE 27 Initially, each egress point is equally preferred
[Figure: same topology as slide 25]
preference: 60 60 60 60 60
SLIDE 28 Depending on its position, each egress receives a percentage of the traffic
[Figure: same topology as slide 25]
preference: 60 60 60 60 60
usage: 40% 20% 10% 10% 10%
SLIDE 29 Let’s say that AS2 becomes more preferred
[Figure: AS1 with egress points E1, E2, E3, E4, E5; ASes AS2, AS3, AS4]
preference: 60 60 60 60 60
usage: 40% 20% 10% 10% 10%
SLIDE 30 Let’s say that AS2 becomes more preferred
preference: 120 60 60 60 60
usage: 40% 20% 10% 10% 10%
SLIDE 31 Let’s say that AS2 becomes more preferred
preference: 120 60 60 60 60
usage: 100% 0% 0% 0% 0%
60% of the traffic experiences a traffic shift
SLIDE 32 Let’s say that AS2 becomes more preferred
preference: 120 60 60 60 60
usage: 100% 0% 0% 0% 0%
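Why a single preference change pulls all traffic to one egress can be sketched with a simplified route selection: highest preference first, then the closest egress as a tie-breaker. This ignores the other steps of the real BGP decision process. The IGP distances, the choice of E1 as the egress whose preference is raised, and the name `select_egress` are all assumptions made for the example.

```python
def select_egress(preference, igp_distance):
    """Pick the egress with the highest preference; break ties on lowest IGP distance."""
    return min(preference, key=lambda e: (-preference[e], igp_distance[e]))


# Hypothetical IGP distances from one router of AS1 to each egress point.
igp_distance = {"E1": 5, "E2": 1, "E3": 3, "E4": 4, "E5": 2}

before = {egress: 60 for egress in igp_distance}  # all egresses equally preferred
after = dict(before, E1=120)                      # assumption: E1 gets the higher preference

print(select_egress(before, igp_distance))  # the closest egress wins the tie
print(select_egress(after, igp_distance))   # the preferred egress wins despite its distance
```

With equal preferences, each router picks its closest egress, which is why usage depends on position. Once one egress has a higher preference, distance no longer matters and every router picks it.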
SLIDE 33 Let’s say that AS2 becomes more preferred
[Figure: AS1 with egress points E1, E2, E3, E4, E5; ASes AS2, AS3, AS4]
preference: 120 120 60 60 60
usage: 100% 0% 0% 0% 0%
SLIDE 34 Let’s say that AS2 becomes more preferred
preference: 120 120 60 60 60
usage: 67% 33% 0% 0% 0%
60% of the traffic experiences a traffic shift
33% of the traffic experiences a traffic shift
SLIDE 35 Let’s say that AS2 becomes more preferred
preference: 120 120 120 60 60
usage: 67% 33% 0% 0% 0%
SLIDE 36 Let’s say that AS2 becomes more preferred
preference: 120 120 120 60 60
usage: 56% 28% 16% 0% 0%
60% of the traffic experiences a traffic shift
33% of the traffic experiences a traffic shift
16% of the traffic experiences a traffic shift
SLIDE 37 During the migration, 109% of the traffic has been shifted
preference: 120 120 120 60 60
usage: 56% 28% 16% 0% 0%
60% of the traffic experiences a traffic shift
33% of the traffic experiences a traffic shift
16% of the traffic experiences a traffic shift
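The 109% figure is the sum of the three per-step shifts. The sketch below recomputes it from the usage values listed on the slides, under the assumption that the traffic shifted in a step equals the sum of the per-egress usage gains. That definition and the name `shifted` are chosen here because they reproduce the slide numbers; the slides do not define the metric.

```python
def shifted(before, after):
    """Traffic (in % of the total) that moved to another egress: sum of per-egress gains."""
    return sum(max(0, a - b) for b, a in zip(before, after))


# Usage per egress after each step, copied in the order listed on the slides.
usage = [
    [40, 20, 10, 10, 10],  # initial split (the listed values sum to 90)
    [100, 0, 0, 0, 0],     # first preference raised to 120
    [67, 33, 0, 0, 0],     # second preference raised to 120
    [56, 28, 16, 0, 0],    # third preference raised to 120
]

per_step = [shifted(b, a) for b, a in zip(usage, usage[1:])]
print(per_step, sum(per_step))
```

The total exceeds 100% because the same traffic can be moved more than once during the migration.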
SLIDE 38 Tuning eBGP policies can create huge traffic shifts
50% of the routers experience > 1 traffic shift (TS) for each prefix
[Figure: avg # traffic shifts per router per prefix (x-axis) vs. Tier1 experiments, cumul. frequency (y-axis); curve: max LP]
SLIDE 39
BGP reconfiguration: A crash course
2 Finding an ordering: Is it easy? Does it exist?
Reconfiguration framework: Overcome complexity
SLIDE 40 To avoid reconfiguration problems, a proper operational ordering must be enforced
Given an initial and a final anomaly-free BGP configuration,
find a sequence of configuration changes such that signaling anomalies, dissemination anomalies, and forwarding anomalies never occur during any migration step
SLIDE 41
Find a sequence of configuration changes
SLIDE 42
Find a sequence of configuration changes
Does it always exist?
SLIDE 43
Find a sequence of configuration changes
Does it always exist?
Is it easy to compute?
SLIDE 44 We model iBGP configurations by using extended Stable Path Problem instances
[Figure: nodes E1, E2, R1, R2; egress rankings E2 E1 E2 E1; prefix P]
SLIDE 45 We model iBGP configurations by using extended Stable Path Problem instances
[Figure: same as slide 44; annotation: Egress-point to prefix P]
SLIDE 46 We model iBGP configurations by using extended Stable Path Problem instances
[Figure: same as slide 44; annotations: Egress-point to prefix P, Egress-points in decreasing preference order]
SLIDE 47 We model iBGP configurations by using extended Stable Path Problem instances
[Figure: same as slide 44; annotations: Egress-point to prefix P, Egress-points in decreasing preference order, Best-learned egress point]
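The selection rule implied by the figure annotations can be sketched as follows: each router holds a ranking of egress points in decreasing preference order and uses the best one it has learned. This is a minimal illustration rather than the extended Stable Path Problem formalism itself, and the name `best_learned` is hypothetical.

```python
def best_learned(ranking, learned):
    """Return the most preferred egress point among those a router has learned, or None."""
    for egress in ranking:  # ranking is in decreasing preference order
        if egress in learned:
            return egress
    return None  # no route to the prefix


ranking = ["E2", "E1"]  # the router prefers E2 over E1
print(best_learned(ranking, {"E1"}))        # only E1 is learned
print(best_learned(ranking, {"E1", "E2"}))  # both are learned, E2 is preferred
```

The difference between the two calls is the point of the model: what a router selects depends on which egress points the iBGP sessions let it learn, so adding or removing a session can change its choice.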
SLIDE 48 A stable BGP configuration determines the forwarding paths being used
[Figure: BGP configuration (nodes E1, E2, R1, R2; egress rankings E2 E1 E2 E1; prefix P) and IGP configuration (link weights 1 2 1 1 1) give the resulting forwarding paths]
SLIDE 49 A seamless migration ordering might not always exist
[Figure: initial BGP configuration and final BGP configuration over routers E1, E2, R1, R2, RR1, RR2, S; prefix P]
SLIDE 50 A seamless migration ordering might not always exist
[Figure: same configurations as slide 49; highlighted: removed session]
SLIDE 51 A seamless migration ordering might not always exist
[Figure: same configurations as slide 49; highlighted: added session]
SLIDE 52
[Figure: routers E1, E2, R1, R2, RR1, RR2, S; prefix P. Path preferences: E1 E2 S; E2 E1 S; E2 E1 S; E1 E2 S. IGP configuration: link weights of 1 and 100]
SLIDE 53 The initial configuration is anomaly-free
[Figure: same routers and path preferences as slide 52]
SLIDE 54 The final configuration is anomaly-free
[Figure: same routers and path preferences as slide 52]
SLIDE 55 Let’s add the final session before removing the initial one
[Figure: routers E1, E2, R1, R2, RR1, RR2, S; prefix P; path preferences E1 E2 S; E2 E1 S; E2 E1 S; E1 E2 S]
SLIDE 56 Let’s add the final session before removing the initial one
[Figure: same topology as slide 55]
SLIDE 57 R1 now learns and selects E2, forcing RR1 to use E2 as well
[Figure: same topology as slide 55]
SLIDE 58 RR1 uses RR2 to reach E2, and RR2 uses RR1 to reach E1 ...
[Figure: same topology as slide 55]
SLIDE 59 ... which creates a forwarding loop
[Figure: same topology as slide 55; highlighted: Forwarding Loop]
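The loop on this slide can be checked mechanically by following next hops until the packet reaches an egress point or revisits a router. The sketch below assumes that RR1 and RR2 forward directly to each other, as the slide text suggests. The function name `find_loop` and the next-hop table are illustrative.

```python
def find_loop(next_hop, start, egresses):
    """Follow next hops from start; return the looping path, or None if none is found."""
    path = [start]
    while path[-1] not in egresses:
        nxt = next_hop.get(path[-1])
        if nxt is None:
            return None  # no next hop: a black hole, not a loop
        if nxt in path:
            # The packet comes back to a router it already crossed.
            return path[path.index(nxt):] + [nxt]
        path.append(nxt)
    return None  # an egress point was reached


# RR1 sends traffic for P towards RR2, and RR2 sends it back towards RR1.
next_hop = {"RR1": "RR2", "RR2": "RR1"}
print(find_loop(next_hop, "RR1", {"E1", "E2"}))
```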
SLIDE 60 Let’s remove the initial session before adding the final one
[Figure: routers E1, E2, R1, R2, RR1, RR2, S; prefix P; path preferences E1 E2 S; E2 E1 S; E2 E1 S; E1 E2 S]
SLIDE 61 Let’s remove the initial session before adding the final one
[Figure: same topology as slide 60]
SLIDE 62 When we remove the session, R2 and RR2 stop learning E1 and switch to E2
[Figure: same topology as slide 60]
SLIDE 63 R1 uses R2 to reach E1, and R2 uses R1 to reach E2 ...
[Figure: same topology as slide 60]
SLIDE 64 ... which creates a forwarding loop as well
[Figure: same topology as slide 60; highlighted: Forwarding Loop]
SLIDE 65
Find a sequence of configuration changes
Does it always exist? No.
SLIDE 66
Find a sequence of configuration changes
Does it always exist? No.
Is it easy to compute?
SLIDE 67
Finding a seamless migration ordering is computationally hard
Deciding if an ordering free from signaling anomalies exists is NP-hard (reduction in polynomial time from 3-SAT)
SLIDE 68
Finding a seamless migration ordering is computationally hard
Deciding if an ordering free from signaling anomalies exists is NP-hard (reduction in polynomial time from 3-SAT)
The same reduction applies to dissemination anomalies, forwarding anomalies, and iBGP or eBGP reconfigurations
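To get a feel for the search space, here is a naive exhaustive search over orderings. It is a sketch only: `is_safe` stands for a hypothetical oracle that says whether the network is anomaly-free once a given set of changes has been applied, and the number of orderings it tries grows as n! with the number of changes. The small instance reuses the two-session example from the earlier slides, where both intermediate states contain a forwarding loop.

```python
from itertools import permutations


def find_safe_ordering(changes, is_safe):
    """Try every ordering; return the first whose intermediate states are all safe."""
    for order in permutations(changes):
        applied = set()
        for change in order:
            applied.add(change)
            if not is_safe(frozenset(applied)):
                break  # this ordering hits an anomaly, try the next one
        else:
            return list(order)
    return None  # no seamless ordering exists


changes = ["add session", "remove session"]
# Hypothetical oracle: only the initial and final configurations are anomaly-free.
safe_states = {frozenset(), frozenset(changes)}
print(find_safe_ordering(changes, lambda state: state in safe_states))
```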
SLIDE 69
Find a sequence of configuration changes
Does it always exist? No.
Is it easy to compute? No.
SLIDE 70
Find a sequence of configuration changes
Does it always exist? No.
Is it easy to compute? No.
An algorithmic approach is not viable
SLIDE 71
BGP reconfiguration: A crash course
Finding an ordering: Is it easy? Does it exist?
3 Reconfiguration framework: Overcome complexity
SLIDE 72
Why is BGP reconfiguration so complex?
Local reconfiguration can have global impact in an unpredictable manner
SLIDE 73
Why is BGP reconfiguration so complex?
Local reconfiguration can have global impact in an unpredictable manner
To avoid that, we could run each configuration in an independent routing plane
Similar to IGP reconfiguration [Vanbever, SIGCOMM11] and Shadow configuration [Alimi, SIGCOMM08]
SLIDE 74 The reconfiguration framework leverages Ships-In-The-Night (SITN) migration for BGP
SITN migrations consist of (1) running multiple BGP routing planes, (2) waiting for each plane to converge, (3) modifying the plane responsible for forwarding
[Figure: abstract model of a router. Control-plane: init BGP. Data-plane: init forwarding paths]
SLIDE 75 The reconfiguration framework leverages Ships-In-The-Night (SITN) migration for BGP
SITN migrations consist of (1) running multiple BGP routing planes, (2) waiting for each plane to converge, (3) modifying the plane responsible for forwarding
[Figure: abstract model of a router. Control-plane: init BGP, final BGP. Data-plane: init forwarding paths]
SLIDE 76 The reconfiguration framework leverages Ships-In-The-Night (SITN) migration for BGP
SITN migrations consist of (1) running multiple BGP routing planes, (2) waiting for each plane to converge, (3) modifying the plane responsible for forwarding
[Figure: abstract model of a router. Control-plane: init BGP, final BGP. Data-plane: init forwarding paths]
SLIDE 77 The reconfiguration framework leverages Ships-In-The-Night (SITN) migration for BGP
SITN migrations consist of (1) running multiple BGP routing planes, (2) waiting for each plane to converge, (3) modifying the plane responsible for forwarding
[Figure: abstract model of a router. Control-plane: init BGP, final BGP. Data-plane: final forwarding paths]
SLIDE 78 The reconfiguration framework leverages Ships-In-The-Night (SITN) migration for BGP
SITN migrations consist of (1) running multiple BGP routing planes, (2) waiting for each plane to converge, (3) modifying the plane responsible for forwarding
[Figure: abstract model of a router. Control-plane: init BGP, final BGP. Data-plane: final forwarding paths]
BGP SITN can be deployed on today’s routers using BGP/MPLS VPN technology
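The three SITN steps can be sketched with a toy router model. Everything here is hypothetical (the `Router` class, the plane names, and the `plane_converged` callback that stands in for a real convergence check), and the sketch checks convergence once instead of actually waiting. It is not how the framework is implemented on real routers.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    name: str
    # Control-plane: maps each BGP routing plane to whether it has converged.
    planes: dict = field(default_factory=lambda: {"init": True})
    # Data-plane: the routing plane currently used for forwarding.
    forwarding_plane: str = "init"


def sitn_migrate(routers, plane_converged):
    """Run the three SITN steps; return True if forwarding was switched."""
    # 1. Run the final BGP plane alongside the initial one on every router.
    for router in routers:
        router.planes["final"] = False
    # 2. Check that the final plane has converged everywhere.
    for router in routers:
        router.planes["final"] = plane_converged(router, "final")
    if not all(router.planes["final"] for router in routers):
        return False  # keep forwarding on the initial plane
    # 3. Modify the plane responsible for forwarding.
    for router in routers:
        router.forwarding_plane = "final"
    return True


routers = [Router("R1"), Router("R2")]
print(sitn_migrate(routers, lambda router, plane: True))
print([router.forwarding_plane for router in routers])
```

Because the initial plane keeps forwarding until the final plane is ready, a reconfiguration made in the final plane cannot disturb live traffic while it converges.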
SLIDE 79
Let’s reconfigure a network from an iBGP full-mesh ...
GEANT: European research network, 36 routers (virtualized), 53 links
SLIDE 80
Let’s reconfigure a network from an iBGP full-mesh to an iBGP hierarchy
GEANT: European research network, 36 routers (virtualized), 53 links
iBGP hierarchy: Top, Middle, Bottom
SLIDE 81 Following best practices, traffic was lost for 30% of the process
[Figure: migration steps (x-axis) vs. # of failed pings (y-axis); median, 5% and 95%; curve: current best practices]
Losses from 7 routers; 60% of the GEANT routing table is impacted!
Average results (30 repetitions) computed on 120+ pings per step from every router to 16 summary prefixes
SLIDE 82 Following our approach, lossless reconfiguration was achieved
[Figure: migration steps (x-axis) vs. # of failed pings (y-axis); median, 5% and 95%; curve: current best practices]
Current best practices: losses from 7 routers; 60% of the GEANT routing table is impacted!
No loss occurred with our approach
Average results (30 repetitions) computed on 120+ pings per step from every router to 16 summary prefixes
SLIDE 83
BGP reconfiguration: A crash course
Finding an ordering: Is it easy? Does it exist?
Reconfiguration framework: Overcome complexity
SLIDE 84
Contributions
1 Study BGP reconfiguration, both practically and theoretically
2 Show that a (seamless) operational ordering might be needed, might not exist, and is computationally hard to find
3 Implement and validate a BGP reconfiguration framework
SLIDE 85
Improving network agility with seamless BGP reconfigurations
IRTF Open Meeting, IETF87
Laurent Vanbever, http://vanbever.eu
July 30, 2013