SDN at Google: Opportunities for WAN Optimization

SLIDE 1

Google Confidential and Proprietary

SDN at Google

Opportunities for WAN Optimization

Edward Crabbe, Vytautas Valancius
8/1/2012

Some slides taken from Urs Hölzle's ONS 2012 keynote

SLIDE 2

Topics

  • SDN at Google today
  • Example SDN Use Case: TE
  • Our SDN Experience So Far
  • Research Opportunities
SLIDE 4

Google's WAN

  • Two backbones

    ○ Internet facing (user traffic)
      ■ smooth/diurnal
      ■ externally originated/destined flows
    ○ Datacenter traffic (internal)
      ■ bursty/bulk
      ■ all internal flows

  • Widely varying requirements: loss sensitivity, availability, topology, etc.
  • Difference in node density, degree and geographic placement
  • thus: built two separate logical networks

    ○ I-Scale
    ○ G-Scale

SLIDE 5

Internet Backbone Scale

“If Google were an ISP, as of this month it would rank as the second largest carrier on the planet.”

[ATLAS 2010 Traffic Report, Arbor Networks]

SLIDE 6

WAN TCO

  • Cost/bit should go down with additional scale, not up
    ○ Consider analogies with compute and storage
  • However, cost/bit doesn't naturally decrease with size
    ○ Complexity in pairwise interactions and any-to-any communication requires more advanced forecasting and control mechanisms
    ○ Lack of control and determinism in distributed protocols necessitates worst case over-provisioning
    ○ Complexity of automated configuration to deal with non-standard vendor configuration APIs
    ○ Existing routing mechanisms do not allow for
      ■ scheduling
      ■ optimization of explicit objectives
SLIDE 7

A Solution: WAN Fabrics

  • Goal: manage the WAN as a system, not as a collection of individual boxes
  • Current equipment and protocols don't allow this
    ○ Internet protocols are node centric, not system centric
    ○ lack of uniformity in support for monitoring and operations
    ○ Optimized for survivability and “eventual consistency” in routing

SLIDE 8

Why Software Defined WAN

  • Separate hardware from software
    ○ Choose hardware based on necessary features
    ○ Choose software based on TE requirements (not protocol requirements)
  • Logically centralized network control
    ○ More deterministic
    ○ More efficient
  • Separate monitoring, management, and operation from individual boxes
  • Flexibility and Innovation Velocity
SLIDE 9

Advantages of Centralized TE

  • Better efficiency with global visibility
  • Converges faster to target optimum on failure
  • Higher Efficiency
    ○ allows for explicit definition of cost functions
    ○ allows for in-house development of optimization algorithms
  • Deterministic behavior
    ○ simplifies planning vs. over-provisioning for worst case variability
    ○ Can directly mirror production event streams for testing
  • Supports innovation and more robust SW development
  • Controller uses modern server hardware
    ○ significantly higher performance

SLIDE 10

Topics

  • SDN at Google today
  • Example SDN Use Case: TE
  • Our SDN Experience So Far
  • Research Opportunities
SLIDE 11

Practical SDN TE Use Cases

  • Deadlock Resolution
  • Bin Packing
  • Scheduling / Calendaring
  • Predictability
  • Adaptive TE Control Loops
  • Constraint Relaxation
  • GCO
  • Max-Min Fairness

...

SLIDE 13

Deadlock

[Figure: topology of nodes A, B, C, D, E with the links and metrics listed below]

Link  Metric  Capacity
A-C   1       20
B-C   1       20
C-E   10      5
C-D   1       10
D-E   1       10

Time  LSP  Src  Dst  Demand
1     1    A    E    2
2     2    B    E    2
3     1    A    E    20

causes:

  • control / dataplane decoupling
  • RFC 3209 implies no teardown on reservation increase failure
    ○ demand will be mis-signaled for long periods
  • lack of global LSP state
  • lack of LSP level ingress admission control
    ○ would require another online or offline control mechanism
    ○ tension between overprovisioning level and transport elasticity

SLIDE 16

Deadlock

[Figure: topology of nodes A, B, C, D, E with the links and metrics listed below]

Link  Metric  Capacity
A-C   1       20
B-C   1       20
C-E   10      5
C-D   1       10
D-E   1       10

Time  LSP  Src  Dst  Demand
1     1    A    E    2
2     2    B    E    2
3     1    A    E    20

  • LSP 1:
    ○ demand cannot be satisfied
    ○ LSP not torn down due to RFC 3209
    ○ usage controlled due to control/data plane decoupling
    ○ ⇒ information in IGP, RSVP is inaccurate
  • LSP 2:
    ○ lack of visibility with respect to LSP 1 misbehavior results in unnecessary, potentially prolonged degradation in service
    ○ could be rerouted along C-E link modulo flow performance constraints
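
The LSP 1 case can be reproduced with a small constrained shortest-path (CSPF) sketch in Python over the link table on this slide. The helper names (`cspf`, `reserve`, `link_key`) are hypothetical, and the sketch assumes that each LSP actually sends its full demand along its signaled path; neither comes from the talk or from any router implementation.

```python
import heapq

# Link -> (IGP metric, capacity), copied from the table on this slide.
LINKS = {
    ("A", "C"): (1, 20),
    ("B", "C"): (1, 20),
    ("C", "E"): (10, 5),
    ("C", "D"): (1, 10),
    ("D", "E"): (1, 10),
}


def link_key(u, v):
    """Return the LINKS key for an undirected hop between u and v."""
    return (u, v) if (u, v) in LINKS else (v, u)


def cspf(src, dst, demand, residual):
    """Lowest-metric path on which every link has `demand` left, or None."""
    heap = [(0, src, [src])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for (u, v), (metric, _capacity) in LINKS.items():
            if node not in (u, v) or residual[(u, v)] < demand:
                continue
            neighbor = v if node == u else u
            if neighbor not in visited:
                heapq.heappush(heap, (cost + metric, neighbor, path + [neighbor]))
    return None


def reserve(path, demand, residual):
    """Subtract `demand` from every link along `path`."""
    for u, v in zip(path, path[1:]):
        residual[link_key(u, v)] -= demand


residual = {link: capacity for link, (_metric, capacity) in LINKS.items()}

path_1 = cspf("A", "E", 2, residual)   # time 1: LSP 1 signals 2 units
reserve(path_1, 2, residual)
path_2 = cspf("B", "E", 2, residual)   # time 2: LSP 2 signals 2 units
reserve(path_2, 2, residual)

# Time 3: LSP 1 wants 20. Its own 2 units are counted as available
# (shared reservation while re-signaling), then a path is searched.
reserve(path_1, -2, residual)
resignal = cspf("A", "E", 20, residual)
reserve(path_1, 2, residual)           # no path found: old reservation stays

reserved_cd = LINKS[("C", "D")][1] - residual[("C", "D")]
offered_cd = 20 + 2                    # assumed data-plane load on C-D
print(path_1, path_2, resignal, reserved_cd, offered_cd)
```

Under these assumptions no path can carry 20 units, so the control plane keeps advertising 4 reserved units on C-D while the link is offered 22 against a capacity of 10, which is the inaccuracy in IGP and RSVP state that the slide describes.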

SLIDE 17

Deadlock

[Figure: topology of nodes A, B, C, D, E with the links and metrics listed below]

Link  Metric  Capacity
A-C   1       20
B-C   1       20
C-E   10      5
C-D   1       10
D-E   1       10

Time  LSP  Src  Dst  Demand
1     1    A    E    2
2     2    B    E    2
3     1    A    E    20

  • lack of LSP level ingress admission control
    ○ would require another online or offline control mechanism
      ■ offline: need northbound API
      ■ online: back to autobw issues
    ○ tension between overprovisioning level and transport elasticity

SLIDE 18

Bin Packing

[Figure: topology of nodes A, B, C, D, E with the links and metrics listed below]

Link  Metric  Capacity
A-C   1       10
B-C   1       10
C-E   10      5
C-D   1       10
D-E   1       10

Time  LSP  Src  Dst  Demand
1     1    A    E    5
2     2    B    E    10

causes:

  • lack of global LSP state
  • bin packing is a sequencing problem - NP-Hard
    ○ Better to solve w/ some throughput optimization

SLIDE 20

Bin Packing

[Figure: topology of nodes A, B, C, D, E with the links and metrics listed below; one placement is marked X]

Link  Metric  Capacity
A-C   1       10
B-C   1       10
C-E   10      5
C-D   1       10
D-E   1       10

Time  LSP  Src  Dst  Demand
1     1    A    E    5
2     2    B    E    10

  • unable to shuffle demands w/o
    ○ some offline control
    ○ stateful knowledge of network LSPs
  • 33% efficiency in capacity usage
    ○ efficiency dictated by order of event arrival
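
The 33% figure can be checked with a short Python sketch that places the two LSPs first-come-first-served on the lowest-metric path with enough residual capacity. The candidate path lists and the function names `place_in_order` and `carried` are assumptions made for illustration; the capacities, metrics and demands are those in the tables on this slide.

```python
CAPACITY = {"A-C": 10, "B-C": 10, "C-E": 5, "C-D": 10, "D-E": 10}
METRIC = {"A-C": 1, "B-C": 1, "C-E": 10, "C-D": 1, "D-E": 1}

# The two loop-free paths from each source to E (hand-enumerated).
PATHS = {
    "A": [["A-C", "C-D", "D-E"], ["A-C", "C-E"]],
    "B": [["B-C", "C-D", "D-E"], ["B-C", "C-E"]],
}

# LSP id -> (source, demand); both LSPs terminate at E.
DEMANDS = {1: ("A", 5), 2: ("B", 10)}


def place_in_order(order):
    """Greedily place LSPs in arrival order; return {lsp: path} for those that fit."""
    residual = dict(CAPACITY)
    placed = {}
    for lsp in order:
        src, demand = DEMANDS[lsp]
        by_metric = sorted(PATHS[src], key=lambda p: sum(METRIC[l] for l in p))
        for path in by_metric:
            if all(residual[link] >= demand for link in path):
                for link in path:
                    residual[link] -= demand
                placed[lsp] = path
                break
    return placed


def carried(placed):
    """Total demand of the LSPs that were placed."""
    return sum(DEMANDS[lsp][1] for lsp in placed)


# Capacity into E: the C-E link plus the C-D-E detour.
cut = CAPACITY["C-E"] + min(CAPACITY["C-D"], CAPACITY["D-E"])

arrival = place_in_order([1, 2])    # order of the slide's event table
reverse = place_in_order([2, 1])    # same demands, opposite order

print(carried(arrival), carried(reverse), cut)
```

In arrival order only LSP 1 fits, so 5 of the 15 units of capacity toward E are used (33%). Placing LSP 2 first lets both fit and uses all 15, which is the kind of reshuffling a controller with global LSP state could perform.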

SLIDE 21

Scheduling

[Figure: topology of nodes A, B, C, D, E with the links and metrics listed below]

Link  Metric  Capacity
A-C   1       20
B-C   1       20
C-E   10      10
C-D   1       10
D-E   1       10

Time  LSP  Src  Dst  Demand
1     1    A    E    2
2     2    B    E    7
3     1    A    E    7
3+k   1    A    E    7

causes:

  • autobw empirically derives demand with single period hysteresis
    ○ unable to use
      ■ historical timeseries
      ■ a priori knowledge of demand
    ○ network must be overprovisioned for either
      ■ offline: worst case demand over reopt interval
      ■ (⇔) online: (autobw) reopt trigger threshold + safety margin
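
The lag between times 3 and 3+k can be illustrated with a deliberately simplified Python model of an auto-bandwidth style reservation, where each adjust interval reserves the peak seen in the previous interval. The function names, the sample series and the interval length are hypothetical; only the demand values 2 and 7 for LSP 1 come from the slide, and this is not any vendor's autobw algorithm.

```python
def autobw_reservations(samples, interval):
    """Reservation in force at each sample when every adjust interval
    reserves the peak demand observed during the previous interval."""
    reservations = []
    current = samples[0]
    for i in range(len(samples)):
        if i > 0 and i % interval == 0:
            current = max(samples[i - interval:i])
        reservations.append(current)
    return reservations


def shortfall(samples, reservations):
    """Demand that exceeds the reservation at each sample."""
    return [max(0, s - r) for s, r in zip(samples, reservations)]


# LSP 1 demand per sample: 2 units, then a step to 7 (values from the slide).
lsp1_demand = [2, 2, 7, 7, 7, 7]

reactive = autobw_reservations(lsp1_demand, interval=3)
scheduled = list(lsp1_demand)   # reservation set from a priori knowledge

print(reactive, shortfall(lsp1_demand, reactive))
print(scheduled, shortfall(lsp1_demand, scheduled))
```

In this model the reactive reservation stays at 2 until the next adjust interval, leaving 5 units unsignaled for one sample, while a reservation scheduled from known demand has no shortfall. That gap is what the overprovisioning margin on the slide has to absorb.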

SLIDE 25

Scheduling

[Figure: topology of nodes A, B, C, D, E with the links and metrics listed below]

Link  Metric  Capacity
A-C   1       10
B-C   1       10
C-E   10      10
C-D   1       10
D-E   1       10

Time  LSP  Src  Dst  Demand
1     1    A    E    2
2     2    B    E    7
3     1    A    E    7
3+k   1    A    E    7

SLIDE 26

Predictability

[Figure: topology of nodes A, B, C, D, E with the links and metrics listed below]

Link  Metric  Capacity
A-C   1       10
B-C   1       10
C-E   1       10
C-D   1       10
D-E   1       10

Time  LSP  Src  Dst  Demand
1     1    A    E    7
2     2    B    E    7

VS

Time  LSP  Src  Dst  Demand
1     2    B    E    7
2     1    A    E    7

causes:

  • routers act independently and asynchronously ⇒ path dictated by order of event arrival
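
A minimal Python sketch of the two event tables shows the order dependence. The `route` function and the hand-enumerated candidate paths are assumptions for illustration; link capacities and demands are taken from the slide, and every link has metric 1, so the path through C-E is the shorter one.

```python
CAPACITY = {"A-C": 10, "B-C": 10, "C-E": 10, "C-D": 10, "D-E": 10}

# Candidate paths per source, shortest first (all link metrics are 1).
PATHS = {
    "A": [("A-C", "C-E"), ("A-C", "C-D", "D-E")],
    "B": [("B-C", "C-E"), ("B-C", "C-D", "D-E")],
}


def route(arrivals):
    """Place (lsp, src, demand) events in the given order on the
    shortest path that still has enough residual capacity."""
    residual = dict(CAPACITY)
    result = {}
    for lsp, src, demand in arrivals:
        for path in PATHS[src]:
            if all(residual[link] >= demand for link in path):
                for link in path:
                    residual[link] -= demand
                result[lsp] = path
                break
    return result


first_table = [(1, "A", 7), (2, "B", 7)]
second_table = [(2, "B", 7), (1, "A", 7)]

order_1 = route(first_table)
order_2 = route(second_table)

# A controller that sees both demands can apply one canonical order
# (here simply sorted by LSP id) regardless of arrival order.
canonical = route(sorted(second_table))

print(order_1)
print(order_2)
print(canonical == order_1)
```

Whichever LSP arrives first takes the C-E link and the other is pushed onto C-D-E, so the same demand set yields two different network states. Sorting by LSP id is only one possible canonical order; the point is that a centralized computation can make the outcome independent of event timing.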

SLIDE 31

Topics

  • SDN at Google today
  • Example SDN Use Case: TE
  • Our SDN Experience So Far
  • Research Opportunities
SLIDE 32

Google SDN Experiences

  • Much faster iteration time: deployed production-grade centralized traffic engineering in two months
    ○ fewer devices to update
    ○ much better testing ahead of rollout
  • Simplified, high fidelity test environment
    ○ Can emulate entire backbone in software
  • Hitless SW upgrades and new features
    ○ Almost no packet loss and no capacity degradation
    ○ Most feature releases do not touch the switch
      ■ most state does not have to be carried by network protocols

SLIDE 33

Topics

  • SDN at Google today
  • Example SDN Use Case: TE
  • Our SDN Experience So Far
  • Research Opportunities
SLIDE 34

SDN has been Around for Quite a While

  • Ipsilon GSMP (1996)
  • Cambridge's The Tempest (1998)
  • IETF ForCES (2000)
  • IETF PCE (2004)
  • Princeton's Routing Control Platform (2004)
  • 4D Initiative (2005)
  • Ethane (2007)
  • OpenFlow (2008)

SLIDE 35

SDN Opportunities

And yet all of SDN is in its infancy:

  • 1. Controller-Switch abstractions (south-bound)
  • 2. Controller-Application abstractions (north-bound)
  • 3. Controller-Controller abstractions (east-west)
  • 4. Applications
SLIDE 36

SDN Opportunities

And yet all of SDN is in its infancy:

  • 1. Controller-Switch abstractions (south-bound)
  • 2. Controller-Application abstractions (north-bound)
  • 3. Controller-Controller abstractions (east-west)
  • 4. Applications

Structural

SLIDE 37

SDN South-Bound

  • OpenFlow: Still bare-bones but enough for initial production deployment with a priori knowledge of system capabilities
  • ForCES: untested, no open-source implementation currently
  • PCEP: low adoption currently
  • IRS(???), many other less developed protocols.

All of these abstractions are lacking in expressiveness and/or adoption.

SLIDE 38

SDN North-Bound

  • What should the north-bound API look like?
  • Should industry:
    ○ standardize?
    ○ wait for a de-facto controller to emerge with its own interfaces and an app store?
  • policy
    ○ composition
    ○ decomposition
    ○ optimal state distribution
  • Some researchers are tackling this problem
    ○ Stanford ONRC
    ○ Nick@(?): Procera
    ○ JRex@ Princeton: http://www.frenetic-lang.org/papers/

SLIDE 39

SDN East-West

  • Inter-domain SDN...
SLIDE 40

SDN Applications

Having a centralized view allows new applications. Many of these applications require novel research. A few of the most interesting to us are:

  • Traffic Engineering
    ○ Intra-domain
    ○ Inter-domain egress
    ○ optimization
    ○ scheduling
    ○ control theory
  • Security
  • Event Based Control
SLIDE 41

Some Examples of Recent Google Research from INFOCOM 2012:

  • "How to split a flow" by Tzvika Hartman, Avinatan Hassidim, Haim Kaplan, Danny Raz, and Michal Segalov
  • "Upward max-min fairness" by Emilie Danna, Avinatan Hassidim, Haim Kaplan, Alok Kumar, Yishay Mansour, Danny Raz, and Michal Segalov (runner up for best paper)
  • "A practical algorithm for balancing the max-min fairness and throughput objectives in traffic engineering" by Emilie Danna, Subhasree Mandal, and Arjun Singh

SLIDE 42

Conclusions

  • Despite its relative immaturity, SDN is ready for real-world use
    ○ Google's datacenter WAN successfully runs on SDN (OpenFlow)
    ○ Enables rapid rich feature deployment

  • Many Research Opportunities