 
How does LIFEGUARD locate a failure?
[Figure (during outage): source GMU, target Smartkom; ASes shown: Level3, Telia, TransTelecom, ZSTTK, NTT, Rostelecom. Probe annotations: "NTT: Ping? From: GMU", then "GMU: Ping! From: NTT".]
- Forward path works
LIFEGUARD: Practical Repair of Persistent Route Failures
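
Step 2 of the approach, isolating which direction of a path has failed, can be sketched as a small classifier. This is a minimal sketch that assumes spoofed pings sent through a helper vantage point V that still reaches the target in both directions. The class and function names are hypothetical, and real probing (loss, rate limiting, retries) is left out.

```python
from dataclasses import dataclass
from enum import Enum


class FailedDirection(Enum):
    NONE = "no failure"
    FORWARD = "source -> target broken"
    REVERSE = "target -> source broken"
    BOTH = "both directions (or target down)"


@dataclass
class SpoofedPingResults:
    # Assumes a hypothetical helper vantage point V that reaches target T both ways.
    # forward_ok: S pinged T using V's address as source and V got the reply,
    #             so the S -> T direction works.
    # reverse_ok: V pinged T using S's address as source and S got the reply,
    #             so the T -> S direction works.
    forward_ok: bool
    reverse_ok: bool


def isolate_direction(r: SpoofedPingResults) -> FailedDirection:
    """Classify which direction of the S <-> T path has failed."""
    if r.forward_ok and r.reverse_ok:
        return FailedDirection.NONE
    if r.forward_ok:
        return FailedDirection.REVERSE
    if r.reverse_ok:
        return FailedDirection.FORWARD
    return FailedDirection.BOTH


# The slide's situation: the forward path works but the round trip does not.
print(isolate_direction(SpoofedPingResults(forward_ok=True, reverse_ok=False)))
```

Once the failing direction is known, only historical paths in that direction need to be tested.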

How does LIFEGUARD locate a failure? (continued)
[Figure: same topology; probe annotation: "Rostelecom: Ping? From: GMU".]
- Forward path works
- Rostelecom is not forwarding traffic towards GMU

How LIFEGUARD Locates Failures
LIFEGUARD:
1. Maintains a historical path atlas in the background
2. Isolates the direction of the failure and measures the working direction
3. Tests historical paths in the failing direction to prune candidate failure locations
4. Locates the failure at the horizon of reachability
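
Steps 3 and 4 can be sketched as a walk along one historical path from the atlas. This assumes the path is ordered from the hop nearest the source outward, and that a probe function (the hypothetical `still_reaches_source`, for example a ping sent with the source's address) reports whether a hop can still reach the source. It is a simplification that stops at the first hop that fails, in the spirit of NTT answering while Rostelecom does not.

```python
from typing import Callable, Optional, Sequence, Tuple


def locate_at_horizon(
    historical_path: Sequence[str],
    still_reaches_source: Callable[[str], bool],
) -> Optional[Tuple[Optional[str], str]]:
    """Walk a historical path in the failing direction, ordered from the hop
    nearest the source outward, and return (last hop that still reaches the
    source, first hop that does not). Returns None if every hop still works."""
    last_reachable: Optional[str] = None
    for hop in historical_path:
        if still_reaches_source(hop):
            last_reachable = hop
        else:
            # First hop past the horizon of reachability: suspect this hop
            # (or its link to last_reachable).
            return last_reachable, hop
    return None


# Hypothetical hops H1..H4; only H1 and H2 still answer pings from the source.
reachable = {"H1", "H2"}
print(locate_at_horizon(["H1", "H2", "H3", "H4"], lambda hop: hop in reachable))
# ('H2', 'H3')
```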

Our Approach and Outline
LIFEGUARD: Locating Internet Failures Effectively and Generating Usable Alternate Routes Dynamically
- Locate the ISP / link causing the problem
- Suggest that other ISPs reroute around the problem
- What would we like to add to BGP to enable this?
- What can we deploy today, using only available protocols and router support?

Our Goal for Failure Avoidance
- Enable content / service providers to repair persistent routing problems affecting them, regardless of which ISP is causing them
Setting
- Assume we can locate the problem
- Assume we are multi-homed / have multiple data centers
- Assume we speak BGP
- We use BGP-Mux to speak BGP to the real Internet, with 5 US universities as providers

Self-Repair of Forward Paths
Straightforward: choose a path that avoids the problem.

A Mechanism for Failure Avoidance
- Forward path: choose a route that avoids the ISP or ISP-ISP link
- Reverse path: we want others to choose paths to my prefix P that avoid ISP or ISP-ISP link X
- We want a BGP announcement AVOID(X,P):
  ‣ Any ISP with a route to P that avoids X uses such a route
  ‣ Any ISP not using X need only pass on the announcement
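
The forward-path choice and the AVOID(X,P) rule can be sketched in a few lines. This is a minimal illustration, not a BGP implementation: AS paths are plain lists, X may be an AS or an AS-AS link, and keeping the old route when no alternative exists is an assumption the slide does not state. The function names are hypothetical, and the AS names are borrowed from the following slides only as sample data.

```python
from typing import Optional, Sequence, Tuple, Union

# What to avoid: a single AS such as "L3", or an AS-AS link such as ("L3", "ATT").
Avoid = Union[str, Tuple[str, str]]


def uses(path: Sequence[str], avoid: Avoid) -> bool:
    """True if the AS path traverses the AS, or the link in either direction."""
    if isinstance(avoid, str):
        return avoid in path
    return any({x, y} == set(avoid) for x, y in zip(path, path[1:]))


def choose_route(candidates: Sequence[Sequence[str]], avoid: Avoid) -> Optional[Sequence[str]]:
    """Forward path: the most preferred candidate route that avoids the problem."""
    for path in candidates:
        if not uses(path, avoid):
            return path
    return None


def react_to_avoid(
    current: Sequence[str], alternatives: Sequence[Sequence[str]], avoid: Avoid
) -> Tuple[Sequence[str], bool]:
    """Hypothetical behavior of one ISP receiving AVOID(X,P).
    Returns (route now used toward P, whether it switched). In every case the
    ISP passes the announcement on to its neighbors."""
    if not uses(current, avoid):
        return current, False  # not using X: only pass the announcement on
    replacement = choose_route(alternatives, avoid)
    if replacement is None:
        return current, False  # assumption: no alternative, keep the old route
    return replacement, True  # a route that avoids X exists: use it


print(react_to_avoid(["UW", "L3", "ATT", "WS"], [["UW", "Sprint", "Qwest", "WS"]], "L3"))
```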

Ideal Self-Repair of Reverse Paths
[Figure: the announcement AVOID(L3,WS) propagating from WS to other ISPs]

Do paths exist that AVOID the problem?
LIFEGUARD repairs outages by instructing others to avoid particular routes.
Q: Do alternative routes exist?
A: Alternate policy-compliant paths exist in 90% of simulated AVOID(X,P) announcements.
- Simulated 10 million AVOIDs on actual measured routes.
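
One way to ask whether an alternate path exists for a single AVOID(X,P) is a breadth-first search over AS links. This sketch assumes that "policy-compliant" means the common valley-free model (climb customer-to-provider links, cross at most one peering link, then descend provider-to-customer). The relationship encoding, helper names, and small topology are invented for illustration; this is not the 10-million-AVOID simulation or its data.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical link labels, from the first AS's point of view.
C2P, P2C, P2P = "customer-to-provider", "provider-to-customer", "peer-to-peer"
Links = Dict[Tuple[str, str], str]


def add_provider(links: Links, customer: str, provider: str) -> None:
    links[(customer, provider)] = C2P
    links[(provider, customer)] = P2C


def add_peering(links: Links, a: str, b: str) -> None:
    links[(a, b)] = P2P
    links[(b, a)] = P2P


def valley_free_path_exists(links: Links, src: str, dst: str, avoid: str) -> bool:
    """Is there a valley-free path src -> dst that skips AS `avoid`?
    Phase 0: only customer->provider hops so far (still climbing).
    Phase 1: a peer or provider->customer hop was used; only descend now."""
    adj: Dict[str, List[Tuple[str, str]]] = {}
    for (a, b), rel in links.items():
        adj.setdefault(a, []).append((b, rel))
    seen: Set[Tuple[str, int]] = {(src, 0)}
    queue = deque([(src, 0)])
    while queue:
        node, phase = queue.popleft()
        if node == dst:
            return True
        for nxt, rel in adj.get(node, []):
            if nxt == avoid:
                continue
            if phase == 0 and rel == C2P:
                state = (nxt, 0)
            elif rel == P2C or (phase == 0 and rel == P2P):
                state = (nxt, 1)
            else:
                continue  # would form a valley or cross a second peering link
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


# Invented relationships that loosely echo the next slides' ASes (not measured data).
links: Links = {}
add_provider(links, "WS", "ATT")
add_provider(links, "WS", "Qwest")
add_provider(links, "ATT", "L3")
add_provider(links, "Qwest", "Sprint")
add_provider(links, "UW", "L3")
add_provider(links, "UW", "Sprint")
print(valley_free_path_exists(links, "UW", "WS", avoid="L3"))  # True, via Sprint and Qwest
```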

Practical Self-Repair of Reverse Paths
[Figure: WS announces its prefix and routes to WS build up hop by hop: ATT → WS; Qwest → WS; L3 → ATT → WS; Sprint → Qwest → WS; AISP → Qwest → WS; UW → L3 → ATT → WS.]

Practical Self-Repair of Reverse Paths (continued)
[Figure: to approximate AVOID(L3,WS), WS announces WS → L3 → WS. New routes: ATT → WS → L3 → WS; Qwest → WS → L3 → WS; AISP → Qwest → WS → L3 → WS; Sprint → Qwest → WS → L3 → WS. L3's old route L3 → ATT → WS becomes "?", and UW moves from UW → L3 → ATT → WS to UW → Sprint → Qwest → WS → L3 → WS.]
- BGP loop prevention encourages a switch to the working path.
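
The poisoned announcement and the loop check that drives the switch can be sketched as follows. The sketch assumes the standard BGP behaviors of prepending one's own AS when re-announcing a route and discarding any route whose AS_PATH already contains one's own AS. The helper names are hypothetical; the AS names follow the slide.

```python
from typing import List, Sequence


def poisoned_announcement(origin: str, poison: Sequence[str]) -> List[str]:
    """AS_PATH the origin sends: itself, the poisoned AS(es), then itself again,
    so the rightmost (origin) AS stays correct, e.g. WS-L3-WS."""
    return [origin, *poison, origin]


def reannounce(asn: str, as_path: Sequence[str]) -> List[str]:
    """A BGP speaker prepends its own AS when passing a route on."""
    return [asn, *as_path]


def accepts(receiver: str, as_path: Sequence[str]) -> bool:
    """BGP loop prevention: reject a route whose AS_PATH already holds our AS."""
    return receiver not in as_path


ws = poisoned_announcement("WS", ["L3"])  # ['WS', 'L3', 'WS']
via_att = reannounce("ATT", ws)           # ['ATT', 'WS', 'L3', 'WS']
via_qwest = reannounce("Qwest", ws)       # ['Qwest', 'WS', 'L3', 'WS']
print(accepts("L3", via_att))             # False: L3 sees itself and drops the route
print(accepts("Sprint", via_qwest))       # True: Sprint keeps a working route
```

Because L3 can no longer accept any route to WS that carries the poison, neighbors such as UW that previously went through L3 have to pick a path that avoids it.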

Stuff I Don’t Have Time to Talk About
Results from real poisonings
- Poisoning in the wild / poisoning anomalies
- Case study of restoring connectivity
Making poisoning flexible
- Monitoring the broken path while it is disabled
- Allowing ISPs without alternatives to use the disabled route
LIFEGUARD’s scalability
- Overhead and speed of failure location
- Router update load if many ISPs deploy our approach
Alternatives to poisoning
- Compatibility with secure routing (BGPSEC, etc.)
- Comparing to other route control mechanisms

Can poisoning approximate AVOID effects?
LIFEGUARD’s poisoning repairs outages by disabling routes to induce route exploration.
Q: Does poisoning disrupt working routes?
A: No. As I will describe:
(a) Under certain circumstances, we can disable a link without disabling the full ISP.
(b) We can speed BGP convergence by carefully crafting announcements.

What if some routes in an ISP still work?
[Figure: ASes A, B1, B2, C1, C2, C3, C4, D1, D2 and origin O; legend: network link, transitive link, original (pre-poisoning) path, new (post-poisoning) path; path labels O-O-O and O-A-O.]
- We only want C3 to change its route, to avoid the A-B2 link
- Forward direction is easy: choose a different route
- Poisoning seems blunt, disabling an entire ISP
- Selective advertising via just D1 is also blunt
- If D1 and D2 (transitively) connect to different PoPs of A, selectively poison via D2 and not D1
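
Selective poisoning can be sketched as a per-provider announcement plan. The input `provider_pop` (which PoP of A each provider's routes reach) is a hypothetical, already-measured mapping; providers leading to PoPs we want to disable get the poisoned path O-A-O, and the rest get O-O-O so path lengths match. This illustrates the idea on the slide and is not LIFEGUARD's actual planner.

```python
from typing import Dict, List, Optional, Set


def selective_poison_plan(
    origin: str,
    target_as: str,
    provider_pop: Dict[str, str],
    pops_to_disable: Set[str],
) -> Optional[Dict[str, List[str]]]:
    """Hypothetical planner: choose the AS_PATH to announce via each provider.

    provider_pop maps each of our providers to the PoP of target_as that its
    routes (transitively) enter. Providers reaching a PoP we want to disable
    get the poisoned path; the rest get an unpoisoned path of equal length.
    """
    poison_via = {p for p, pop in provider_pop.items() if pop in pops_to_disable}
    if not poison_via or poison_via == set(provider_pop):
        # Nothing to poison, or poisoning every provider would disable all of target_as.
        return None
    return {
        p: ([origin, target_as, origin] if p in poison_via else [origin, origin, origin])
        for p in provider_pop
    }


print(selective_poison_plan("O", "A", {"D1": "PoP-1", "D2": "PoP-2"}, {"PoP-2"}))
# {'D1': ['O', 'O', 'O'], 'D2': ['O', 'A', 'O']}
```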

Can poisoning approximate AVOID effects?
LIFEGUARD’s poisoning repairs outages by disabling routes to induce route exploration.
Q: Does poisoning disrupt working routes?
A: No. As I will describe:
(a) “Selective poisoning” can avoid 73% of links without disabling the entire AS.
  ‣ Real-world results from the 5-provider BGP-Mux testbed
(b) We can speed BGP convergence by carefully crafting announcements.

Naive Poisoning Causes Transient Loss
- Some ISPs may have working paths that avoid problem ISP X
- Naively, poisoning causes path exploration even for these ISPs
- Path exploration causes transient loss
[Figure: origin O and ISPs A to F; label AVOID(X,P). Before poisoning: A-O, B-A-O, D-A-O, E-D-A-O, F-B-A-O. O then announces O-X-O; over successive builds ISPs transiently hold a mix of old and new routes until all converge on A-O-X-O, B-A-O-X-O, D-A-O-X-O, E-D-A-O-X-O, F-B-A-O-X-O.]

Prepend to Reduce Path Exploration
- Most routing decisions are based on:
  (1) next-hop ISP
  (2) path length
- Keep these fixed to speed convergence
- Prepending prepares ISPs for a later poison
[Figure: origin O announces the prepended path O-O-O; routes: A-O-O-O, B-A-O-O-O, D-A-O-O-O, E-D-A-O-O-O, F-B-A-O-O-O; label AVOID(X,P).]
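
The prepending trick can be checked with simple path arithmetic. This sketch assumes, as the slide does, that ISPs mostly rank routes by next hop and path length; it only compares path lengths before and after poisoning, with and without a prepended baseline. The helper names are hypothetical.

```python
from typing import List, Optional


def origin_path(origin: str, prepend: bool, poison: Optional[str] = None) -> List[str]:
    """AS_PATH the origin announces.
    Baseline: O-O-O when prepending, plain O otherwise. Poisoned: O-X-O."""
    if poison is not None:
        return [origin, poison, origin]
    return [origin, origin, origin] if prepend else [origin]


def route_at(chain: List[str], announced: List[str]) -> List[str]:
    """Path seen at the first AS in `chain`, e.g. ["F", "B", "A"] + O-O-O."""
    return chain + announced


def length_change_on_poison(chain: List[str], poison: str, prepend: bool) -> int:
    before = route_at(chain, origin_path("O", prepend))
    after = route_at(chain, origin_path("O", prepend, poison))
    return len(after) - len(before)


print(length_change_on_poison(["B", "A"], "X", prepend=False))  # 2: B-A-O -> B-A-O-X-O
print(length_change_on_poison(["B", "A"], "X", prepend=True))   # 0: B-A-O-O-O -> B-A-O-X-O
```

The chain toward O, and so the next hop, is unchanged in both cases; with the prepended baseline the path length is unchanged too, so ISPs that never used X keep both decision inputs fixed.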