SDN at Google: Opportunities for WAN Optimization

Edward Crabbe, Vytautas Valancius, 8/1/2012. Some slides taken from Urs Hölzle's ONS 2012 keynote.


  1. SDN at Google: Opportunities for WAN Optimization
     Edward Crabbe, Vytautas Valancius, 8/1/2012
     Some slides taken from Urs Hölzle's ONS 2012 keynote
     Google Confidential and Proprietary

  2. Topics
     ● SDN at Google today
     ● Example SDN Use Case: TE
     ● Our SDN Experience So Far
     ● Research Opportunities

  4. Google's WAN
     ● Two backbones
       ○ Internet facing (user traffic)
         ■ smooth/diurnal
         ■ externally originated/destined flows
       ○ Datacenter traffic (internal)
         ■ bursty/bulk
         ■ all internal flows
     ● Widely varying requirements: loss sensitivity, availability, topology, etc.
     ● Difference in node density, degree and geographic placement
     ● Thus: built two separate logical networks
       ○ I-Scale
       ○ G-Scale

  5. Internet Backbone Scale
     “If Google were an ISP, as of this month it would rank as the second largest carrier on the planet.” [ATLAS 2010 Traffic Report, Arbor Networks]

  6. WAN TCO
     ● Cost/bit should go down with additional scale, not up
       ○ Consider analogies with compute and storage
     ● However, cost/bit doesn't naturally decrease with size
       ○ Complexity in pairwise interactions and any-to-any communication requires more advanced forecasting and control mechanisms
       ○ Lack of control and determinism in distributed protocols necessitates worst case over-provisioning
       ○ Complexity of automated configuration to deal with non-standard vendor configuration APIs
       ○ Existing routing mechanisms do not allow for
         ■ scheduling
         ■ optimization of explicit objectives

  7. A Solution: WAN Fabrics
     ● Goal: manage the WAN as a system, not as a collection of individual boxes
     ● Current equipment and protocols don't allow this
       ○ Internet protocols are node centric, not system centric
       ○ Lack of uniformity in support for monitoring and operations
       ○ Optimized for survivability and “eventual consistency” in routing

  8. Why Software Defined WAN
     ● Separate hardware from software
       ○ Choose hardware based on necessary features
       ○ Choose software based on TE requirements (not protocol requirements)
     ● Logically centralized network control
       ○ More deterministic
       ○ More efficient
     ● Separate monitoring, management, and operation from individual boxes
     ● Flexibility and innovation velocity

  9. Advantages of Centralized TE
     ● Better efficiency with global visibility
     ● Converges faster to target optimum on failure
     ● Higher efficiency
       ○ allows for explicit definition of cost functions
       ○ allows for in-house development of optimization algorithms
     ● Deterministic behavior
       ○ simplifies planning vs. over-provisioning for worst case variability
       ○ can directly mirror production event streams for testing
     ● Supports innovation and more robust SW development
     ● Controller uses modern server hardware
       ○ significantly higher performance

  10. Topics
      ● SDN at Google today
      ● Example SDN Use Case: TE
      ● Our SDN Experience So Far
      ● Research Opportunities

  11. Practical SDN TE Use Cases
      ● Deadlock Resolution
      ● Bin Packing
      ● Scheduling / Calendaring
      ● Predictability
      ● Adaptive TE Control Loops
      ● Constraint Relaxation
      ● GCO
      ● Max-Min Fairness
      ...

  13. Deadlock causes:
      ● control / dataplane decoupling
      ● RFC 3209 implies no teardown on reservation increase failure
        ○ demand will be mis-signaled for long periods
      ● lack of global LSP state
      ● lack of LSP-level ingress admission control
        ○ would require another online or offline control mechanism
        ○ tension between overprovisioning level and transport elasticity
      [Diagram: topology with nodes A, B, C, D, E]
      Links (metric, capacity): A-C (1, 20); B-C (1, 20); C-E (10, 5); C-D (1, 10); D-E (1, 10)
      LSP demands (time: LSP, src→dst, demand): 1: LSP 1, A→E, 2; 2: LSP 2, B→E, 2; 3: LSP 1, A→E, 20

  14. Deadlock
      [Diagram: topology with nodes A, B, C, D, E]
      Links (metric, capacity): A-C (1, 20); B-C (1, 20); C-E (10, 5); C-D (1, 10); D-E (1, 10)
      LSP demands (time: LSP, src→dst, demand): 1: LSP 1, A→E, 2; 2: LSP 2, B→E, 2; 3: LSP 1, A→E, 20

  16. Deadlock
      ● LSP 1:
        ○ demand cannot be satisfied
        ○ LSP not torn down due to RFC 3209
        ○ usage uncontrolled due to control/data plane decoupling
        ○ ⇒ information in IGP, RSVP is inaccurate
      ● LSP 2:
        ○ lack of visibility w/r/t LSP 1 misbehavior results in unnecessary, potentially prolonged degradation in service
        ○ could be rerouted along C-E link modulo flow performance constraints
      [Diagram: topology with nodes A, B, C, D, E]
      Links (metric, capacity): A-C (1, 20); B-C (1, 20); C-E (10, 5); C-D (1, 10); D-E (1, 10)
      LSP demands (time: LSP, src→dst, demand): 1: LSP 1, A→E, 2; 2: LSP 2, B→E, 2; 3: LSP 1, A→E, 20

  17. Deadlock
      ● lack of LSP-level ingress admission control
        ○ would require another online or offline control mechanism
          ■ offline: need northbound API
          ■ online: back to autobw issues
        ○ tension between overprovisioning level and transport elasticity
      [Diagram: topology with nodes A, B, C, D, E]
      Links (metric, capacity): A-C (1, 20); B-C (1, 20); C-E (10, 5); C-D (1, 10); D-E (1, 10)
      LSP demands (time: LSP, src→dst, demand): 1: LSP 1, A→E, 2; 2: LSP 2, B→E, 2; 3: LSP 1, A→E, 20
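
The deadlock scenario on the slides above can be replayed with a small simulation. This is a minimal sketch under assumptions, not the presenters' tooling: `cspf` is a hypothetical helper that picks the lowest-metric path among links with enough unreserved capacity, links are treated as undirected with a single shared capacity, and a failed resize keeps the old reservation while the offered traffic still grows (one reading of the RFC 3209 bullet). Link metrics, capacities and demands are taken from the slide tables.

```python
import heapq

# link: (metric, capacity), from the slide tables
LINKS = {
    ("A", "C"): (1, 20), ("B", "C"): (1, 20), ("C", "E"): (10, 5),
    ("C", "D"): (1, 10), ("D", "E"): (1, 10),
}
# (time, LSP, src, dst, demand), from the slide tables
EVENTS = [(1, "LSP1", "A", "E", 2), (2, "LSP2", "B", "E", 2),
          (3, "LSP1", "A", "E", 20)]


def cspf(residual, src, dst, demand):
    """Hypothetical helper: lowest-metric path over links that can
    still reserve `demand`. Returns a list of links, or None."""
    adj = {}
    for (a, b), (metric, _cap) in LINKS.items():
        if residual[(a, b)] >= demand:
            adj.setdefault(a, []).append((b, metric, (a, b)))
            adj.setdefault(b, []).append((a, metric, (a, b)))
    heap, seen = [(0, src, [])], set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, metric, link in adj.get(node, []):
            if nxt not in seen:
                heapq.heappush(heap, (cost + metric, nxt, path + [link]))
    return None


def simulate(events):
    residual = {link: cap for link, (_m, cap) in LINKS.items()}
    lsps = {}
    for _time, name, src, dst, demand in events:
        old = lsps.get(name)
        if old:  # a resize may reuse the LSP's own reservation
            for link in old["path"]:
                residual[link] += old["reserved"]
        path, reserved = cspf(residual, src, dst, demand), demand
        if path is None:
            if old is None:
                continue  # a new LSP that cannot be signaled
            # Assumption: resize failed, LSP is not torn down and keeps
            # its old reservation, but the traffic it carries still grows.
            path, reserved = old["path"], old["reserved"]
        for link in path:
            residual[link] -= reserved
        lsps[name] = {"path": path, "reserved": reserved, "offered": demand}
    return lsps, residual


def link_view(lsps, residual):
    """Per link: (reserved in control plane, offered in data plane, capacity)."""
    view = {}
    for link, (_m, cap) in LINKS.items():
        offered = sum(l["offered"] for l in lsps.values() if link in l["path"])
        view[link] = (cap - residual[link], offered, cap)
    return view


lsps, residual = simulate(EVENTS)
view = link_view(lsps, residual)
for link, (reserved, offered, cap) in view.items():
    print(link, "reserved", reserved, "offered", offered, "capacity", cap)
```

Under these assumptions, links C-D and D-E each show 4 units reserved against a capacity of 10 while 22 units are offered, so the signaled state understates the load and LSP 2 shares a congested path it has no reason to leave.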

  18. Bin Packing causes:
      ● lack of global LSP state
      ● bin packing is a sequencing problem - NP-Hard
        ○ Better to solve w/ some throughput optimization
      [Diagram: topology with nodes A, B, C, D, E]
      Links (metric, capacity): A-C (1, 10); B-C (1, 10); C-E (10, 5); C-D (1, 10); D-E (1, 10)
      LSP demands (time: LSP, src→dst, demand): 1: LSP 1, A→E, 5; 2: LSP 2, B→E, 10

  19. Bin Packing
      [Diagram: topology with nodes A, B, C, D, E]
      Links (metric, capacity): A-C (1, 10); B-C (1, 10); C-E (10, 5); C-D (1, 10); D-E (1, 10)
      LSP demands (time: LSP, src→dst, demand): 1: LSP 1, A→E, 5; 2: LSP 2, B→E, 10

  20. Bin Packing
      ● unable to shuffle demands w/o
        ○ some offline control
        ○ stateful knowledge of network LSPs
      ● 33% efficiency in capacity usage
        ○ efficiency dictated by order of event arrival
      [Diagram: topology with nodes A, B, C, D, E; one LSP marked X]
      Links (metric, capacity): A-C (1, 10); B-C (1, 10); C-E (10, 5); C-D (1, 10); D-E (1, 10)
      LSP demands (time: LSP, src→dst, demand): 1: LSP 1, A→E, 5; 2: LSP 2, B→E, 10
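
The effect of arrival order on packing can be checked with a few lines of code. The sketch below is an illustration under assumptions: `cspf` is a hypothetical helper that picks the lowest-metric path with enough free capacity, links are undirected with one shared capacity, and efficiency is measured as placed demand over total demand, which is one way to arrive at the 33% figure. The brute-force search over orderings only works because there are two demands; it is not a proposed algorithm.

```python
import heapq
from itertools import permutations

# link: (metric, capacity), from the slide tables
LINKS = {
    ("A", "C"): (1, 10), ("B", "C"): (1, 10), ("C", "E"): (10, 5),
    ("C", "D"): (1, 10), ("D", "E"): (1, 10),
}
# (LSP, src, dst, demand) in arrival order, from the slide tables
ARRIVALS = [("LSP1", "A", "E", 5), ("LSP2", "B", "E", 10)]


def cspf(residual, src, dst, demand):
    """Hypothetical helper: lowest-metric path over links that can
    still reserve `demand`. Returns a list of links, or None."""
    adj = {}
    for (a, b), (metric, _cap) in LINKS.items():
        if residual[(a, b)] >= demand:
            adj.setdefault(a, []).append((b, metric, (a, b)))
            adj.setdefault(b, []).append((a, metric, (a, b)))
    heap, seen = [(0, src, [])], set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, metric, link in adj.get(node, []):
            if nxt not in seen:
                heapq.heappush(heap, (cost + metric, nxt, path + [link]))
    return None


def placed_demand(order):
    """Place demands one at a time, in the given order, never moving
    an LSP once it is placed. Returns the total demand admitted."""
    residual = {link: cap for link, (_m, cap) in LINKS.items()}
    placed = 0
    for _name, src, dst, demand in order:
        path = cspf(residual, src, dst, demand)
        if path is None:
            continue  # no single path has room for this demand
        for link in path:
            residual[link] -= demand
        placed += demand
    return placed


total = sum(demand for *_rest, demand in ARRIVALS)
as_arrived = placed_demand(ARRIVALS)
best = max(placed_demand(order) for order in permutations(ARRIVALS))
print(f"arrival order: {as_arrived}/{total}")
print(f"best order:    {best}/{total}")
```

In arrival order, LSP 1 takes the low-metric path through D and leaves no path with 10 free units for LSP 2, so 5 of 15 units are placed. Placing LSP 2 first lets LSP 1 fall back to the C-E link, and all 15 units fit.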

  21. Scheduling causes:
      ● autobw empirically derives demand with single-period hysteresis
        ○ unable to use
          ■ historical timeseries
          ■ a priori knowledge of demand
        ○ network must be overprovisioned for either
          ■ offline: worst case demand over reopt interval ( ⇔ ) online: (autobw) reopt trigger threshold + safety margin
      [Diagram: topology with nodes A, B, C, D, E]
      Links (metric, capacity): A-C (1, 20); B-C (1, 20); C-E (10, 10); C-D (1, 10); D-E (1, 10)
      LSP demands (time: LSP, src→dst, demand): 1: LSP 1, A→E, 2; 2: LSP 2, B→E, 7; 3: LSP 1, A→E, 7; 3+k: LSP 1, A→E, 7

  22. Scheduling
      [Diagram: topology with nodes A, B, C, D, E]
      Links (metric, capacity): A-C (1, 20); B-C (1, 20); C-E (10, 10); C-D (1, 10); D-E (1, 10)
      LSP demands (time: LSP, src→dst, demand): 1: LSP 1, A→E, 2; 2: LSP 2, B→E, 7; 3: LSP 1, A→E, 7; 3+k: LSP 1, A→E, 7

  25. Scheduling
      [Diagram: topology with nodes A, B, C, D, E]
      Links (metric, capacity): A-C (1, 10); B-C (1, 10); C-E (10, 10); C-D (1, 10); D-E (1, 10)
      LSP demands (time: LSP, src→dst, demand): 1: LSP 1, A→E, 2; 2: LSP 2, B→E, 7; 3: LSP 1, A→E, 7; 3+k: LSP 1, A→E, 7
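
The "single period hysteresis" point in the scheduling slides can be illustrated with a toy model. The sketch below is an assumption-heavy simplification and not any vendor's auto-bandwidth implementation: `autobw_reservations` is a hypothetical function that sets each interval's reservation to the peak rate sampled in the previous interval, and the per-interval samples are invented for illustration. Only the demand values 2 and 7 for LSP 1 come from the slide tables.

```python
def autobw_reservations(samples_per_interval, initial):
    """Toy model: the reservation for an interval is the peak rate
    observed during the previous interval (single-period look-back)."""
    reservations = []
    current = initial
    for samples in samples_per_interval:
        reservations.append(current)
        current = max(samples)
    return reservations


def shortfall(samples_per_interval, reservations):
    """Peak traffic above the reservation in each interval."""
    return [max(0, max(samples) - reserved)
            for samples, reserved in zip(samples_per_interval, reservations)]


# Hypothetical rate samples for LSP 1: demand steps from 2 to 7
# partway through the second interval.
LSP1_SAMPLES = [[2, 2], [2, 7], [7, 7]]

reactive = autobw_reservations(LSP1_SAMPLES, initial=2)
# With a priori knowledge of the step, a scheduler could reserve the
# peak of each interval ahead of time.
scheduled = [max(samples) for samples in LSP1_SAMPLES]

print("reactive reservations: ", reactive)
print("reactive shortfall:    ", shortfall(LSP1_SAMPLES, reactive))
print("scheduled reservations:", scheduled)
print("scheduled shortfall:   ", shortfall(LSP1_SAMPLES, scheduled))
```

In this model the reactive reservation lags the step by one interval, leaving 5 units of traffic unreserved during the second interval. That gap is what a safety margin or overprovisioning has to absorb when neither history nor advance knowledge of demand is available.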

  26. Predictability causes:
      ● routers act independently and asynchronously ⇒ path dictated by order of event arrival
      [Diagram: topology with nodes A, B, C, D, E]
      Links (metric, capacity): A-C (1, 10); B-C (1, 10); C-E (1, 10); C-D (1, 10); D-E (1, 10)
      LSP demands, first ordering (time: LSP, src→dst, demand): 1: LSP 1, A→E, 7; 2: LSP 2, B→E, 7
      VS
      LSP demands, second ordering (time: LSP, src→dst, demand): 1: LSP 2, B→E, 7; 2: LSP 1, A→E, 7

  27. Predictability
      [Diagram: topology with nodes A, B, C, D, E]
      Links (metric, capacity): A-C (1, 10); B-C (1, 10); C-E (1, 10); C-D (1, 10); D-E (1, 10)
      LSP demands, first ordering (time: LSP, src→dst, demand): 1: LSP 1, A→E, 7; 2: LSP 2, B→E, 7
      VS
      LSP demands, second ordering (time: LSP, src→dst, demand): 1: LSP 2, B→E, 7; 2: LSP 1, A→E, 7
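
The two orderings in the predictability slides can be compared directly. The sketch below is an illustration under assumptions: `cspf` is a hypothetical helper that picks the lowest-metric path with enough free capacity, links are undirected with one shared capacity, and the metrics come from the slide's link table (C-E with metric 1).

```python
import heapq

# link: (metric, capacity), from the slide's link table
LINKS = {
    ("A", "C"): (1, 10), ("B", "C"): (1, 10), ("C", "E"): (1, 10),
    ("C", "D"): (1, 10), ("D", "E"): (1, 10),
}
# (LSP, src, dst, demand): the same two demands in both arrival orders
FIRST_ORDER = [("LSP1", "A", "E", 7), ("LSP2", "B", "E", 7)]
SECOND_ORDER = [("LSP2", "B", "E", 7), ("LSP1", "A", "E", 7)]


def cspf(residual, src, dst, demand):
    """Hypothetical helper: lowest-metric path over links that can
    still reserve `demand`. Returns a list of links, or None."""
    adj = {}
    for (a, b), (metric, _cap) in LINKS.items():
        if residual[(a, b)] >= demand:
            adj.setdefault(a, []).append((b, metric, (a, b)))
            adj.setdefault(b, []).append((a, metric, (a, b)))
    heap, seen = [(0, src, [])], set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, metric, link in adj.get(node, []):
            if nxt not in seen:
                heapq.heappush(heap, (cost + metric, nxt, path + [link]))
    return None


def place(order):
    """Place each LSP as it arrives; return the path chosen per LSP."""
    residual = {link: cap for link, (_m, cap) in LINKS.items()}
    paths = {}
    for name, src, dst, demand in order:
        path = cspf(residual, src, dst, demand)
        if path is not None:
            for link in path:
                residual[link] -= demand
        paths[name] = path
    return paths


first = place(FIRST_ORDER)
second = place(SECOND_ORDER)
for name in ("LSP1", "LSP2"):
    print(name, "first ordering:", first[name], "second ordering:", second[name])
```

Whichever LSP arrives first takes the direct C-E link and pushes the other onto the longer path through D, so the same pair of demands yields different paths depending only on arrival order.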
