Networking Challenges for the Next Decade, Amin Vahdat



SLIDE 1

Networking Challenges for the Next Decade

Amin Vahdat

On behalf of Google Technical Infrastructure and Google Cloud Platform

April 4, 2017

SLIDE 2

Google Network

More than a collection of data centers

[Map legend: Google Global Cache edge nodes; points of presence (>100); network fiber]

Labeled on the map:

  • Unity (US, JP) 2010
  • SJC (JP, HK, SG) 2013
  • FASTER (US, JP, TW) 2016

SLIDE 3

Google Cloud Regions

Adding 11 new regions

[Map: current and future regions, each marked with its number of zones (2 to 4 per region)]

Regions shown: Frankfurt, Singapore, S Carolina, N Virginia, Belgium, London, Taiwan, Mumbai, Sydney, Oregon, Iowa, São Paulo, Finland, Tokyo, Montreal, California, Netherlands

SLIDE 4

Ubiquitous Cloud... 10x Scaling

  • Datacenter (10x): Next-gen disaggregation of storage, memory and compute
  • Campus & Metro (10x): Cloud regions and campus expansion driving DC interconnect
  • WAN (10x): Cloud replication and bandwidth intensive cloud services (e.g., turnkey video, IoT)

Step Function Disruptions: Bandwidth, Latency, Availability, Predictability

SLIDE 5

The Pillars of SDN @ Google

  • B4: WAN Interconnect
  • Andromeda: NFV and network virtualization
  • Jupiter: Datacenter Networking

SLIDE 6

The Pillars of SDN @ Google

  • B4: WAN Interconnect
  • Andromeda: NFV and network virtualization
  • Jupiter: Datacenter Networking
  • Espresso: SDN for public Internet

SLIDE 7

B4: Google's Software Defined WAN

B4: [Jain et al, SIGCOMM 13] BwE: [Kumar et al, SIGCOMM 15]

SLIDE 8

B4: From Copy Network to Business Critical

[Chart: B4 traffic, 2012 to 2016]

B4: [Jain et al, SIGCOMM 13] BwE: [Kumar et al, SIGCOMM 15]

SLIDE 9

Andromeda

[Diagram labels]

  • VNET: 5.4/16
  • VNET: 192.168.32/24
  • VNET: 10.1.1/24
  • NFV: Load Balancing, DoS, ACLs, VPN
  • Internal Network: ToR 10.1.1/24, ToR 10.1.2/24, ToR 10.1.3/24, ToR 10.1.4/24
  • Google Infrastructure Services
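The diagram shows a virtual network (VNET 10.1.1/24) whose address range coincides with a physical ToR subnet (10.1.1/24). One way to picture how a virtualization layer can keep these apart is a lookup keyed on the pair (virtual network, virtual IP) rather than on the IP alone. The following is a minimal sketch under that assumption; the class name, tenant names, and host addresses are hypothetical, and the slide's shorthand prefixes are written out in full (10.1.1.0/24 for 10.1.1/24).

```python
import ipaddress


class VirtualNetworkDirectory:
    """Toy directory: (virtual network, virtual IP) -> physical host address.

    Hypothetical illustration; not a description of Andromeda's internals.
    """

    def __init__(self):
        self._vnets = {}     # vnet name -> ip_network
        self._location = {}  # (vnet name, virtual ip) -> physical host ip

    def add_vnet(self, name, prefix):
        self._vnets[name] = ipaddress.ip_network(prefix)

    def place_vm(self, vnet, virtual_ip, physical_host):
        addr = ipaddress.ip_address(virtual_ip)
        if addr not in self._vnets[vnet]:
            raise ValueError(f"{virtual_ip} is outside {vnet}")
        self._location[(vnet, addr)] = ipaddress.ip_address(physical_host)

    def lookup(self, vnet, virtual_ip):
        """Return the physical host for a VM, or None if unknown in this vnet."""
        return self._location.get((vnet, ipaddress.ip_address(virtual_ip)))


directory = VirtualNetworkDirectory()
directory.add_vnet("tenant-a", "10.1.1.0/24")      # same range as a physical subnet
directory.add_vnet("tenant-b", "192.168.32.0/24")
directory.place_vm("tenant-a", "10.1.1.5", "10.1.2.17")
directory.place_vm("tenant-b", "192.168.32.5", "10.1.3.9")
```

Because the virtual network is part of the key, the same virtual IP can exist in several tenants without colliding with each other or with the physical address plan.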

SLIDE 10

Google Datacenter Network Innovation

And hardware scale that we could not buy

[Chart: capacity over time across fabric generations: 4 Post, Firehose 1.0, Firehose 1.1, Watchtower, Saturn, Jupiter]

1.3 Pb/s clusters in 2013

SLIDE 11

The Pillars of SDN @ Google

  • B4: WAN Interconnect
  • Andromeda: NFV and network virtualization
  • Jupiter: Datacenter Networking
  • Public Internet?

SLIDE 12

The Pillars of SDN @ Google

  • B4: WAN Interconnect
  • Andromeda: NFV and network virtualization
  • Jupiter: Datacenter Networking
  • Espresso: SDN for public Internet

SLIDE 13

Espresso in Context

[Diagram labels: Google, Data Center, Jupiter, B4]

SLIDE 14

Espresso in Context

[Diagram labels: Google, Data Center, Jupiter, B4, B2, Peering Metro]

SLIDE 15

Espresso in Context

[Diagram labels: Google, Data Center, Jupiter, B4, B2, Peering Metro, Espresso, Internet, User]

SLIDE 16

Espresso: Before and After

Before: Router Centric Protocols (Cloud 1.0)

  • Local view
  • Connectivity first
  • Coarse fault recovery

After: Espresso SDN Peering

  • Per-metro and global view
  • Application signals
  • Real-time optimization

SLIDE 17

Espresso Architecture Overview

[Diagram, Espresso Metro: Peering Fabric, Label-switched Fabric, BGP speaker, eBGP Peering, External Peer]

SLIDE 18

Espresso Architecture Overview

[Diagram, Espresso Metro: Hosts, Packet Processor, Label-switched Fabric, Peering Fabric, BGP speaker, eBGP Peering, External Peer]

Labeled packets specify egress

SLIDE 19

Espresso Architecture Overview

[Diagram, Espresso Metro: Hosts, Packet Processor, Local Control, Label-switched Fabric, Peering Fabric, BGP speaker, eBGP Peering, External Peer]

[Outside the metro: Global Controller, Application Signals]

Labeled packets specify egress
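"Labeled packets specify egress" suggests that the egress decision is made before a packet reaches the peering fabric, so the fabric only has to switch on the label. A minimal sketch of that idea follows, assuming a host-side table of (prefix, label) pairs filled in by a controller and resolved by longest-prefix match. The class name, prefixes, and label values are hypothetical and not taken from the talk.

```python
import ipaddress


class HostPacketProcessor:
    """Toy host-side table mapping destination prefixes to egress labels.

    Hypothetical illustration: label values and the controller-programmed
    table are assumptions made for this sketch.
    """

    def __init__(self):
        self._table = {}  # ip_network -> egress label

    def program(self, prefix, egress_label):
        """Install or overwrite the label for a prefix, as a controller might."""
        self._table[ipaddress.ip_network(prefix)] = egress_label

    def label_for(self, dst):
        """Return the label of the longest matching prefix, or None."""
        addr = ipaddress.ip_address(dst)
        best = None
        for net, label in self._table.items():
            if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, label)
        return None if best is None else best[1]


pp = HostPacketProcessor()
pp.program("203.0.113.0/24", 1001)    # hypothetical label: egress via peer A
pp.program("203.0.113.128/25", 1002)  # more specific prefix: egress via peer B
```

Reprogramming a single table entry changes the egress for that prefix without touching the fabric, which is one plausible reading of how a controller could apply application signals in real time.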

SLIDE 20

Next Decade Challenges in Networking

The next wave in computing

  • Serverless compute in Cloud 3.0
  • IoT
  • Tightly coupled, general purpose distributed computing

It’s time to put it all together

  • Agile Scale
  • Jitter
  • Isolation
  • Performance is great, but only meaningful with availability, manageability, and velocity

SLIDE 21

Cloud 1.0 (Last Decade)

Virtualization delivers capex savings to enterprise DCs

SLIDE 22

Cloud 2.0: HW on Demand (Now)

  • Public cloud frees enterprise from private HW infrastructure
  • Scheduling, load balancing primitives, “big data” query processing

[Timeline: Cloud 1.0, Cloud 2.0]

SLIDE 23

The Third Wave of Cloud Computing

Cloud 3.0: Compute, not servers

  • Serverless compute, real-time intelligence, and machine learning
  • Not data placement, load balancing, OS configuration and patching

[Timeline: Cloud 1.0, Cloud 2.0, Cloud 3.0]

SLIDE 24

The Third Wave of Cloud Computing

Networking should be aiming for Cloud 3.0

[Timeline: Cloud 1.0, Cloud 2.0, Cloud 3.0]

SLIDE 25

Networking and Cloud 3.0

  • Storage disaggregation: the datacenter is the storage appliance
  • Seamless telemetry and scale up/down
  • Transparent live migration
  • Open Marketplace of services, securely placed and accessed

SLIDE 26

Networking and Cloud 3.0

  • Applications+Functions, not VMs
  • Policy, not middleboxes
  • Actionable Intelligence, not data processing
  • SLOs, not placement/load balancing/scheduling

SLIDE 27

Next Decade Challenges in Networking

  • The network will enable next-generation compute infrastructure
  • The network can define next-generation storage infrastructure
  • The right network infrastructure can deliver fundamental new capability

SLIDE 28

How we Prioritize Infrastructure Work

  • Availability
  • Manageability
  • Velocity
  • Stranding
  • Performance

SLIDE 29

Availability is Paramount

  • First things first: an insecure infrastructure is an unavailable infrastructure
  • Stability is more important than efficiency
  • Network management is critical
  • Configuration is hard
  • Automation matters but can be counter to availability

“Evolve or Die: High-Availability Design Principles Drawn from Google’s Network Infrastructure.” SIGCOMM 2016.

SLIDE 30

Build for Velocity

  • Velocity is the speed of iteration
  • Retrospective on “Tussle in Cyberspace: Defining Tomorrow’s Internet”
  • Build for hitless upgrades and self-validation
  • Debugging and tracing matter
      ○ Without visibility, performance does not matter
  • Network fabrics built for expansion and evolution
  • Launch and Iterate

SLIDE 31

Isolation is Critical; Stranding is Terrible

Isolation with reservations is easy but leads to huge resource stranding

  • General-purpose, shared infrastructure to approximate custom-built and reserved

Isolation has many components

  • Latency, bandwidth, but also the control plane
  • Accounting and chargeback are big missing pieces

Congestion Control is still really hard

  • Rationalizing multiple control loops: flow, endpoint, flow group, Traffic Engineering
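The stranding claim can be made concrete with simple arithmetic: capacity that is reserved for one tenant but idle cannot be used by anyone else. The sketch below compares reserved capacity with actual use. The function name, tenant names, and numbers are made up for illustration and do not come from the talk.

```python
def stranded_fraction(reserved, used):
    """Fraction of reserved capacity that sits idle.

    reserved and used map a tenant name to bandwidth in the same unit
    (for example Gb/s). Use above the reservation is not counted as idle.
    """
    total_reserved = sum(reserved.values())
    if total_reserved == 0:
        return 0.0
    idle = sum(max(reserved[t] - used.get(t, 0), 0) for t in reserved)
    return idle / total_reserved


# Hypothetical tenants: each reserves for its own peak, but peaks rarely coincide.
reserved = {"batch": 40, "serving": 40, "storage": 20}
used = {"batch": 10, "serving": 25, "storage": 15}
print(f"stranded: {stranded_fraction(reserved, used):.0%}")
```

In this made-up case half of the reserved capacity is idle, which is the gap a shared, general-purpose infrastructure would try to recover while still approximating the isolation of reservations.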

SLIDE 32

Performance only Matters if End to End

Amdahl’s law applies, so an incredible, localized optimization that takes any effort to adopt will be ignored

1. Scale
2. Jitter
3. Storage Disaggregation

Must optimize from the application all the way to the end user
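Amdahl's law states that if a fraction p of the total time is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A short sketch shows why a localized optimization has limited end-to-end effect; the 10% share and 100x factor below are hypothetical numbers chosen for illustration.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the work is sped up by factor s."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


# Hypothetical: one component accounts for 10% of end-to-end time
# and is made 100x faster.
print(round(amdahl_speedup(0.10, 100.0), 2))
```

Under these assumed numbers the end-to-end gain is about 1.11x, despite the 100x local improvement.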

SLIDE 33

How we Prioritize Infrastructure Work

  • Availability
  • Manageability
  • Velocity
  • Stranding
  • Performance

SLIDE 34

Next Decade Challenges in Networking

The next wave of computing

  • Serverless compute in Cloud 3.0
  • IoT
  • Tightly coupled, general purpose distributed computing

It’s time to put it all together

  • Agile Scale
  • Jitter
  • Isolation
  • Performance is great, but only meaningful with availability, manageability, and velocity

SLIDE 35

Thank You!

SLIDE 36

Open Source

  • Google MapReduce
  • Google Bigtable
  • Google Borg
  • Google Dremel

Google Cloud Platform

SLIDE 37

Open Source

  • TCP BBR
  • gRPC
  • OpenConfig
  • QUIC
  • ...