Networking Challenges for the Next Decade
Amin Vahdat
On behalf of Google Technical Infrastructure and Google Cloud Platform APRIL 4, 2017
Google Network: more than a collection of data centers
[Map legend: Google Global Cache edge nodes; points of presence (>100); network fiber; submarine cables Unity (US, JP) 2010, SJC (JP, HK, SG) 2013, FASTER (US, JP, TW) 2016]
[Map: current and future cloud regions, each marked with its number of zones (2 to 4). Regions shown: Frankfurt, Singapore, S Carolina, N Virginia, Belgium, London, Taiwan, Mumbai, Sydney, Oregon, Iowa, São Paulo, Finland, Tokyo, Montreal, California, Netherlands]
Adding 11 new regions
Datacenter: Next-gen disaggregation of storage, memory and compute
Campus & Metro: Cloud regions and campus expansion driving DC interconnect
WAN: Cloud replication and bandwidth-intensive cloud services (e.g., turnkey video, IoT)
Step Function Disruptions: Bandwidth, Latency, Availability, Predictability
B4: WAN Interconnect
Andromeda: NFV and network virtualization
Jupiter: Datacenter Networking
Espresso: SDN for public Internet
B4: [Jain et al., SIGCOMM 2013]
BwE: [Kumar et al., SIGCOMM 2015]
[Chart: B4 traffic, 2012 to 2016]
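[Diagram: Andromeda network virtualization. Virtual networks (VNET: 5.4/16, VNET: 192.168.32/24, VNET: 10.1.1/24) run over the internal network, where ToR switches serve subnets 10.1.1/24, 10.1.2/24, 10.1.3/24 and 10.1.4/24. Network functions shown: load balancing, DoS, ACLs, VPN. Also shown: Google Infrastructure Services]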
And hardware scale that we could not buy
[Chart: capacity versus time for datacenter network generations labeled 4 Post, Firehose 1.0, Firehose 1.1, Watchtower, Saturn and Jupiter. Annotation: 1.3 Pb/s clusters in 2013]
Public Internet?
[Diagram, built up over three slides: Jupiter in the Google data center connected to B4; then B2 and the peering metro added; then Espresso at the peering metro, facing the Internet and the user]
Router Centric Protocols: local view; connectivity first; coarse fault recovery
Espresso SDN Peering: per-metro and global view; application signals; real-time optimization
Espresso Metro
[Diagram, built up over three slides. Components shown: external peer, eBGP peering, BGP speaker, peering fabric, label-switched fabric, hosts with a packet processor, local control, global controller, application signals]
Labeled packets specify egress
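The note that labeled packets specify egress can be illustrated with a minimal sketch. It assumes a host-side packet processor that picks a label by longest-prefix match on the destination, and a peering fabric that forwards purely on that label. The table contents, peer names, port names and function names below are all hypothetical, chosen for illustration; the slides do not describe Espresso's actual data structures.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Egress:
    peer: str  # hypothetical external peer name
    port: str  # hypothetical peering-fabric port


# Hypothetical fabric-side table: label -> egress (peer, port).
LABEL_TO_EGRESS = {
    100: Egress("peer-a", "pf1:eth3"),
    200: Egress("peer-b", "pf2:eth7"),
}

# Hypothetical host-side table, assumed to be programmed by a controller:
# destination prefix -> label.
PREFIX_TO_LABEL = {
    ipaddress.ip_network("0.0.0.0/0"): 100,
    ipaddress.ip_network("203.0.113.0/24"): 100,
    ipaddress.ip_network("203.0.113.128/25"): 200,
}


def choose_label(dst: str) -> int:
    """Host side: longest-prefix match on the destination picks the label."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in PREFIX_TO_LABEL if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return PREFIX_TO_LABEL[best]


def fabric_forward(label: int) -> Egress:
    """Fabric side: the label alone determines the egress; no IP lookup."""
    return LABEL_TO_EGRESS[label]


if __name__ == "__main__":
    for dst in ("203.0.113.5", "203.0.113.200", "198.51.100.1"):
        label = choose_label(dst)
        print(dst, label, fabric_forward(label))
```

In this sketch the routing decision lives entirely in the host-side table, so a controller could change which peer carries a prefix by reprogramming hosts without touching the fabric.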
The next wave in computing
distributed computing
It’s time to put it all together
meaningful with availability, manageability, and velocity
Cloud 1.0
Virtualization delivers capex savings to enterprise DCs
Cloud 2.0: HW on Demand
Public cloud frees enterprise from private HW infrastructure
Scheduling, load balancing primitives, “big data” query processing
Cloud 3.0: Compute, not servers
Serverless compute, real-time intelligence, and machine learning
Not data placement, load balancing, OS configuration and patching
Storage disaggregation: the datacenter is the storage appliance
Seamless telemetry and scale up/down
Transparent live migration
Open Marketplace
Applications + Functions, not VMs
Policy, not middleboxes
Actionable Intelligence, not data processing
SLOs, not placement/load balancing/scheduling
The network will enable next-generation compute infrastructure
The network can define next-generation storage infrastructure
The right network infrastructure can deliver fundamental new capability
Availability, Manageability, Velocity, Stranding, Performance
“Evolve or Die: High-Availability Design Principles Drawn from Google’s Network Infrastructure.” SIGCOMM 2016.
Defining Tomorrow’s Internet”
self-validation
○ Without visibility, performance does not matter
evolution
Isolation with reservations is easy but leads to huge resource stranding
Isolation has many components
Congestion Control is still really hard
Amdahl’s law applies, so an incredible but localized optimization that takes any effort to adopt will be ignored
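A short worked example of the Amdahl's law point, as a sketch: if a fraction p of end-to-end time is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The function name and the sample numbers are illustrative assumptions, not figures from the talk.

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of total time is made
    `local_speedup` times faster and the rest is unchanged."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


if __name__ == "__main__":
    # Illustrative numbers only: a 100x improvement to a component that
    # accounts for 5% of end-to-end time.
    print(round(amdahl_speedup(0.05, 100.0), 3))
```

With these sample numbers the overall gain is about 1.05x, which is why a localized optimization has to be nearly free to adopt before anyone will take it up.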
1. Scale
2. Jitter
3. Storage Disaggregation
Must optimize from the application all the way to the end user
Google MapReduce, Google Bigtable, Google Borg, Google Dremel
TCP BBR, gRPC, OpenConfig, QUIC, ...