
SWIFT: Predictive Fast Reroute upon Remote BGP Disruptions

SWIFT: Predictive Fast Reroute upon Remote BGP Disruptions. Laurent Vanbever, ETH Zürich (D-ITET). Munich Internet Research Retreat, November 25, 2016. Human factors are responsible for 50% to 80% of network outages. Juniper Networks, Whats


  1. Fibbing computes routing messages to inject in ~1ms [Plot: computation time (s), log scale from 0.001 to 10, vs. % of nodes changing next-hop (0 to 80); legend entries: simple, merger (5th), merger (median), merger (95th)]

  2. Check out our webpage fibbing.net

  3. Network programmability through synthesis: Fibbing (“the inputs”) and SyNET (“the functions”); current focus, under submission

  4. Fibbing is limited by the configurations running on the routers. Works with a single protocol family: Dijkstra-based shortest-path routing. Can lead to loads of messages if the configuration is not adapted. Suffers from reliability issues: need to remove the lies upon failures.

  5. Inputs: Network specification (N), Physical topology (φ N), High-level requirements (φ R) → SyNET → Outputs: router configuration (excerpt):

        ip multicast-routing
        !
        interface Loopback0
         ip address 120.1.7.7 255.255.255.255
         ip ospf 1 area 0
        !
        interface Ethernet0/0
         no ip address
        !
        interface Ethernet0/0.17
         encapsulation dot1Q 17
         ip address 125.1.17.7 255.255.255.0
         ip pim bsr-border
         ip pim sparse-mode
        !
        router ospf 1
         router-id 120.1.7.7
         redistribute bgp 700 subnets
        !
        router bgp 700
         neighbor 125.1.17.1 remote-as 100
         !
         address-family ipv4
          redistribute ospf 1 match internal external 1 external 2
          neighbor 125.1.17.1 activate
         !
         address-family ipv4 multicast
          network 125.1.79.0 mask 255.255.255.0
          redistribute ospf 1 match internal external 1 external 2
          neighbor 125.1.17.1 activate
        !

  6. SyNET can generate configurations for (small) networks [Empty table: columns # routers (4, 9, 16); rows # protocols (static; static, OSPF; static, OSPF, BGP)]

  7. SyNET can generate configurations for (small) networks

        # protocols \ # routers    4        9         16
        static                     1.8s     18.2s     116.1s
        static, OSPF               4.2s     37.0s     197.0s
        static, OSPF, BGP          13.8s    189.4s    577.4s

  8. Check out our webpage synet.ethz.ch

  9. Network programmability through synthesis: Fibbing (“the inputs”) and SyNET (“the functions”)

  10. Now that we have programmability, what can we do with it?

  11. Adaptive Networked System [Diagram: stages Analyze, Plan, Monitor, Execute; labels control, algorithms, visibility, programmability]

  12. SWIFT: Predictive Fast Reroute upon Remote BGP Disruptions. Laurent Vanbever, ETH Zürich (D-ITET). Munich Internet Research Retreat, November 25, 2016

  13. 25.9 seconds

  14. 25.9 seconds: max. monthly downtime under a 99.999% SLA
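
The 25.9-second figure follows from the availability target. A quick check, assuming a 30-day month (the slide does not state the month length; the function name is illustrative):

```python
def max_downtime_seconds(availability: float, days: int = 30) -> float:
    """Downtime budget, in seconds, for a given availability over `days` days."""
    seconds_in_period = days * 24 * 60 * 60
    return (1.0 - availability) * seconds_in_period


# 99.999% availability over an assumed 30-day month.
budget = max_downtime_seconds(0.99999)
print(f"{budget:.1f} seconds")  # prints 25.9 seconds
```

With a 31-day month the same formula gives about 26.8 seconds, so the slide's number corresponds to the 30-day assumption.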

  15. IP routers are slow to converge upon remote link and node failures

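  16. [Diagram: router R1]

  17. [Diagram: R1 connected to R2 and R3; ports labeled 0 and 1 at R1]

  18. R1 prefers to send traffic via R2 when possible, as it is much cheaper ($) than via R3 ($$$) [Diagram: R1 connected to R2 and R3; ports labeled 0 and 1 at R1]

  19. [Diagram: same topology; the path via R2 ($) is marked "preferred" over the path via R3 ($$$)]

  20. [Diagram: extended topology; node labels R1, R2, R3, R4, R5; ports labeled 0 and 1 at R1]

  21. [Diagram: extended topology annotated with 300k and 600k labels]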

  22. R1’s Forwarding Table (entry: prefix → next-hop): 1: 1.0.0.0/24 → 0; 2: 1.0.1.0/16 → 0; …; 300k: 100.0.0.0/8 → 0; …; 600k: 200.99.0.0/24 → 0 [Diagram: extended topology with 300k and 600k labels]

  23. What if R3 fails? R1’s Forwarding Table (entry: prefix → next-hop): 1: 1.0.0.0/24 → 0; 2: 1.0.1.0/16 → 0; …; 300k: 100.0.0.0/8 → 0; …; 600k: 200.99.0.0/24 → 0

  24. R2 sends 300k routing messages (WITHDRAWs) withdrawing the routes from R3. R1’s Forwarding Table (entry: prefix → next-hop): 1: 1.0.0.0/24 → 0; 2: 1.0.1.0/16 → 0; …; 300k: 100.0.0.0/8 → 0; …; 600k: 200.99.0.0/24 → 0

  25. R1 receives the messages one-by-one and updates its forwarding table entry-by-entry. R1’s Forwarding Table (300k WITHDRAWs): 1: 1.0.0.0/24 → 0; 2: 1.0.1.0/16 → 0; …; 300k: 100.0.0.0/8 → 0; …; 600k: 200.99.0.0/24 → 0

  26. R1’s Forwarding Table (300k WITHDRAWs): 1: 1.0.0.0/24 → 1; 2: 1.0.1.0/16 → 0; …; 300k: 100.0.0.0/8 → 0; …; 600k: 200.99.0.0/24 → 0

  27. R1’s Forwarding Table (300k WITHDRAWs): 1: 1.0.0.0/24 → 1; 2: 1.0.1.0/16 → 1; …; 300k: 100.0.0.0/8 → 0; …; 600k: 200.99.0.0/24 → 0

  28. R1’s Forwarding Table (300k WITHDRAWs): 1: 1.0.0.0/24 → 1; 2: 1.0.1.0/16 → 1; …; 300k: 100.0.0.0/8 → 1; …; 600k: 200.99.0.0/24 → 0
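
The entry-by-entry behavior walked through in slides 24 to 28 can be sketched as follows. This is a toy model, not router code: the function name, the four-entry table, and the choice of which prefixes are withdrawn are illustrative assumptions that mirror the slide's example.

```python
from typing import Dict, List


def apply_withdrawals(fib: Dict[str, int], withdrawn: List[str], backup_port: int) -> int:
    """Process withdrawals one by one, rewriting one forwarding entry per message.

    Returns the number of individual forwarding-table writes performed.
    """
    writes = 0
    for prefix in withdrawn:
        if prefix in fib and fib[prefix] != backup_port:
            fib[prefix] = backup_port  # one write per affected prefix
            writes += 1
    return writes


# Toy forwarding table: every prefix initially leaves via port 0.
fib = {"1.0.0.0/24": 0, "1.0.1.0/16": 0, "100.0.0.0/8": 0, "200.99.0.0/24": 0}

# Prefixes withdrawn by R2 (assumed subset, matching the entries that flip in the slides).
withdrawn = ["1.0.0.0/24", "1.0.1.0/16", "100.0.0.0/8"]

writes = apply_withdrawals(fib, withdrawn, backup_port=1)
print(writes, fib)
```

The number of writes grows linearly with the number of withdrawn prefixes, which is why a 300k-prefix withdrawal burst translates into 300k separate table updates.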

  29. Internet convergence: a two-phase process. Phase 1: learning about the failure. Phase 2: updating forwarding entries.

  30. Internet convergence: a two-phase process. Phase 1: learning about the failure. Phase 2: updating forwarding entries. Both of which are terribly slow…

  31. Internet convergence: a two-phase process. Phase 1: learning about the failure. Phase 2: updating forwarding entries.

  32. We measured how long it takes for large bursts of BGP updates to propagate in the Internet. Dataset: a month (July ’16) worth of Internet updates from ~200 routers scattered around the globe. Methodology: detect the beginning and end of a burst using a 10 sec sliding window.
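
A minimal sketch of sliding-window burst detection, under stated assumptions: the slide only gives the 10-second window, so the start and stop thresholds below are hypothetical values, and `detect_bursts` is an illustrative name rather than the authors' tool.

```python
from collections import deque
from typing import Iterable, List, Tuple


def detect_bursts(
    timestamps: Iterable[float],
    window: float = 10.0,
    start_threshold: int = 5,  # hypothetical value, not from the talk
    stop_threshold: int = 2,   # hypothetical value, not from the talk
) -> List[Tuple[float, float]]:
    """Return (start, end) times of bursts in a sorted stream of message times.

    A burst starts when the trailing `window` seconds hold at least
    `start_threshold` messages, and ends once they hold fewer than
    `stop_threshold`.
    """
    recent = deque()
    bursts = []
    burst_start = None
    last_in_burst = None

    for t in timestamps:
        recent.append(t)
        # Drop messages that fell out of the trailing window.
        while recent[0] <= t - window:
            recent.popleft()

        if burst_start is None:
            if len(recent) >= start_threshold:
                burst_start = recent[0]
                last_in_burst = t
        elif len(recent) < stop_threshold:
            # The stream went quiet: close the burst at its last message.
            bursts.append((burst_start, last_in_burst))
            burst_start = None
        else:
            last_in_burst = t

    if burst_start is not None:
        bursts.append((burst_start, last_in_burst))
    return bursts


times = [0, 1, 2, 3, 4, 5, 30, 100, 101, 102, 103, 104, 105, 106]
bursts = detect_bursts(times)
print(bursts)  # [(0, 5), (100, 106)]
```

The isolated message at t=30 closes the first burst without opening a new one, because a single message in the window stays below the start threshold.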

  33. [Plot: burst size (log scale, 10^3 to 10^6) and nb of bursts (log scale) vs. burst duration (sec). Nb of bursts per duration bin: 0-2: 1101; 2-8: 809; 8-15: 308; 15-30: 247; 30-60: 92; 60-90: 21; 90-120: 14; 120-200: 18; >200: 9]

  34. We found a total of 2619 bursts over the month [Plot: burst size and nb of bursts vs. burst duration (sec)]

  35. ~15% of the bursts take more than 15s to be learned [Plot: burst size and nb of bursts vs. burst duration (sec)]

  36. ~10% of the bursts contained more than 100k prefixes [Plot: burst size and nb of bursts vs. burst duration (sec)]
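
The headline numbers on slides 34 and 35 can be cross-checked against the bar labels of the histogram. The pairing of counts with duration bins below is a reading of the extracted plot labels (an assumption); it is supported by the fact that the counts add up to the stated total.

```python
# Bursts per duration bin (seconds), as read off the histogram labels.
bursts_per_bin = {
    "0-2": 1101, "2-8": 809, "8-15": 308, "15-30": 247, "30-60": 92,
    "60-90": 21, "90-120": 14, "120-200": 18, ">200": 9,
}

total = sum(bursts_per_bin.values())

# Bins covering durations above 15 seconds.
slow_bins = ["15-30", "30-60", "60-90", "90-120", "120-200", ">200"]
slow = sum(bursts_per_bin[b] for b in slow_bins)

print(total)                  # 2619
print(f"{slow / total:.1%}")  # 15.3%
```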

  37. Internet convergence: a two-phase process. Phase 1: learning about the failure. Phase 2: updating forwarding entries.

  38. We measured how long it takes recent routers to update a growing number of forwarding entries. Cisco Nexus 7k: ETH recent routers, 25 deployed.

  39. [Plot: convergence time (s), log scale from 0.1 to 150, vs. # of prefixes (1K to 500K)]

  40. [Plot: convergence time (s), log scale from 0.1 to 150, vs. # of prefixes (1K to 500K); annotation: worst-case]

  41. [Plot: convergence time (s), log scale from 0.1 to 150, vs. # of prefixes (1K to 500K); annotations: worst-case, median case]

  42. Traffic can be lost for several minutes (~2.5 min.) [Plot: convergence time (s), log scale from 0.1 to 150, vs. # of prefixes (1K to 500K)]

  43. Internet convergence: a two-phase process. Phase 1: learning about the failure. Phase 2: updating forwarding entries. Prefix-based and hence slow.

  44. SWIFT: Predictive Fast Rerouting. Joint work with: Thomas Holterbach, Alberto Dainotti, Stefano Vissicchio

  45. SWIFT: Predictive Fast Rerouting. Speed up… learning about the failure.

  46. SWIFT: Predictive Fast Rerouting. Speed up… learning about the failure. Solution: predict the extent of a failure from few messages.

  47. SWIFT: Predictive Fast Rerouting. Speed up… learning about the failure. Solution: predict the extent of a failure from few messages. Challenge: speed and precision.

  48. SWIFT: Predictive Fast Rerouting. Speed up… (a) learning about the failure; (b) updating the data plane. Solution for (a): predict the extent of a failure from few messages. Challenge for (a): speed and precision.

  49. SWIFT: Predictive Fast Rerouting. Speed up… (a) learning about the failure; (b) updating the data plane. Solution for (a): predict the extent of a failure from few messages. Solution for (b): update groups of entries instead of individual ones. Challenge for (a): speed and precision.

  50. SWIFT: Predictive Fast Rerouting. Speed up… (a) learning about the failure; (b) updating the data plane. Solution for (a): predict the extent of a failure from few messages. Solution for (b): update groups of entries instead of individual ones. Challenge for (a): speed and precision. Challenge for (b): failure model.
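
One way to picture "update groups of entries instead of individual ones" is a two-stage lookup, where prefixes point to a shared group and only the group points to a port. This is a sketch under assumptions: the slides do not describe the encoding, and the group names, table layout, and function names here are hypothetical.

```python
from typing import Dict

# Stage 1: each prefix maps to a group tag (hypothetical tags naming what
# the prefix's path depends on).
prefix_to_group: Dict[str, str] = {
    "1.0.0.0/24": "via-failed-router",
    "1.0.1.0/16": "via-failed-router",
    "100.0.0.0/8": "via-failed-router",
    "200.99.0.0/24": "unaffected",
}

# Stage 2: each group maps to an output port.
group_to_port: Dict[str, int] = {"via-failed-router": 0, "unaffected": 0}


def lookup(prefix: str) -> int:
    """Resolve a prefix to its output port through the group indirection."""
    return group_to_port[prefix_to_group[prefix]]


def reroute_group(group: str, backup_port: int) -> int:
    """Reroute every prefix in `group` with a single table write."""
    group_to_port[group] = backup_port
    return 1  # number of writes, independent of the group's size


writes = reroute_group("via-failed-router", backup_port=1)
print(writes, lookup("1.0.0.0/24"), lookup("200.99.0.0/24"))
```

The cost of a reroute now depends on the number of groups touched rather than the number of prefixes, at the price of deciding in advance which prefixes share a group.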

  51. SWIFT: Predictive Fast Rerouting. (1) Predicting out of few messages; (2) Updating groups of entries; (3) Supercharging existing systems

  52. SWIFT: Predictive Fast Rerouting. (1) Predicting out of few messages; Updating groups of entries; Supercharging existing systems
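
To make "predicting out of few messages" concrete, here is a deliberately simple heuristic. It is not SWIFT's inference algorithm, which this excerpt does not describe: it assumes R1 knows the path of each prefix, blames the link crossed by the most withdrawn paths (ties go to the link carrying more prefixes overall), and predicts that every prefix crossing that link is affected. The paths and the node names A, B, C are made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]


def links_of(path: List[str]) -> List[Link]:
    """Consecutive hops of a path, e.g. [R2, R3, A] -> [(R2, R3), (R3, A)]."""
    return list(zip(path, path[1:]))


def predict_affected(
    routes: Dict[str, List[str]], withdrawn: List[str]
) -> Tuple[Link, Set[str]]:
    """Guess the failed link from a few withdrawals and list the prefixes crossing it."""
    if not withdrawn:
        raise ValueError("need at least one withdrawal to make a prediction")

    crossing = Counter()  # withdrawn paths per link
    for prefix in withdrawn:
        crossing.update(links_of(routes[prefix]))

    total = Counter()  # all known paths per link
    for path in routes.values():
        total.update(links_of(path))

    suspect = max(crossing, key=lambda link: (crossing[link], total[link]))
    affected = {p for p, path in routes.items() if suspect in links_of(path)}
    return suspect, affected


# Hypothetical paths as seen from R1.
routes = {
    "1.0.0.0/24": ["R2", "R3", "A"],
    "1.0.1.0/16": ["R2", "R3", "B"],
    "100.0.0.0/8": ["R2", "R3", "C"],
    "200.99.0.0/24": ["R2", "R4"],
}

# Only two withdrawals have arrived so far.
suspect, affected = predict_affected(routes, ["1.0.0.0/24", "1.0.1.0/16"])
print(suspect, sorted(affected))
```

In this toy run, 100.0.0.0/8 is flagged before its own withdrawal arrives. The slides name the challenge as "speed and precision": a guess made from too few messages may blame the wrong link.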
