Flexible Networking at Large Mega-Scale: Exploring issues and solutions
SLIDE 1
SLIDE 2
What is “Mega-Scale”?
One or more of:
- > 10,000 compute nodes
- > 100,000 IP addresses
- > 1 Tb/s aggregate bandwidth
- Massive East/West traffic between tenants
Yahoo is “Mega-Scale”
SLIDE 3
What are our goals?
- Mega-Scale, with
○ Reliability
■ Yahoo supports ~200 million users/day -- it must be reliable
○ Flexibility
■ Yahoo has 100s of internal and user-facing services
○ Simplicity
■ Undue complexity is the enemy of scale!
SLIDE 4
Our Strategy
Leverage high-performance network design with:
➢ OpenStack
➢ Augmented with additional automation
➢ Hosting applications designed to be “disposable”
- Fortunately, we already had many of the needed pieces
SLIDE 5
Traditional network design
- Large layer 2 domains
- Cheap to build and manage
- Allows great flexibility of solutions
- Leverage pre-existing network design
- IP mobility across the entire domain
It’s Simple. But...
SLIDE 6
L2 Networks Have Limits
- The L2 Domain can only be extended so far
○ Hardware TCAM limitations (size and update rate)
○ STP scaling/stability issues
- But an L3 network can
○ scale larger
○ at less cost
○ but limits flexibility
SLIDE 7
Potential Solutions
- Why not use a Software Defined Network?
○ Overlay allows IP mobility but
■ Control plane limits scale and reliability
■ Overhead at on-ramp boundaries
○ OpenFlow-based solutions
■ Not ready for mega-scale yet w/ L3 support
■ Control plane complexities
Not Ready for Mega-Scale
SLIDE 8
Our Solution
- Use Clos design network backplane
- Each cabinet has a Top-Of-Rack router
○ Cabinet is a separate L2 domain
○ Cabinets “own” one or more subnets (CIDRs)
○ OpenStack is patched to “know” which subnet to use (sketched below)
- Network backplane supports East-West and North-South traffic equally well
- Structure is ideal if we decide to deploy SDN overlay
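To make the “cabinet owns its subnets” idea concrete, here is a minimal sketch (not the actual Yahoo patch) of how a network-selection step could map a hypervisor to its rack’s CIDR. The RACK_SUBNETS table and the hostname convention are hypothetical.

```python
import ipaddress

# Minimal sketch (not the actual patch): each cabinet "owns" one or more
# CIDRs, and instance network selection follows the hypervisor's rack.
# RACK_SUBNETS and the hostname convention are hypothetical.
RACK_SUBNETS = {
    "rack-a01": ["10.1.0.0/24", "10.1.1.0/24"],
    "rack-a02": ["10.1.2.0/24"],
}

def rack_of(hypervisor_hostname: str) -> str:
    # e.g. "compute-17.rack-a01.dc1.example.com" -> "rack-a01"
    return hypervisor_hostname.split(".")[1]

def pick_subnet(hypervisor_hostname: str):
    """Return the first cabinet-owned subnet for this hypervisor's rack."""
    cidrs = RACK_SUBNETS[rack_of(hypervisor_hostname)]
    return ipaddress.ip_network(cidrs[0])

print(pick_subnet("compute-17.rack-a01.dc1.example.com"))  # 10.1.0.0/24
```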
SLIDE 9
A solution for scale: Layer 3 to the rack
[Diagram: compute racks, each an L2 domain behind its TOR router, connect to an L3 Clos fabric; one rack also carries the admin services (API, DB, MQ, etc.)]
- Clos-based L3 network
- TOR (Top Of Rack) routers
SLIDE 10
Adding Robustness With Availability Zones
SLIDE 11
Problems
- No IP Mobility Between Cabinets
○ Moving a VM between cabinets requires a re-IP
○ Many small subnets rather than one or more large ones
○ Scheduling complexities:
■ Availability zones, rack-awareness
- Other issues
○ Coordination between clusters
○ Integration with existing infrastructure
You call that “flexible?”
SLIDE 12
(re-)Adding Flexibility
- Leverage Load Balancing
○ Allows VMs to be added and removed
(remember, our VMs are mostly “disposable”)
○ Conceals IP changes (such as rack-to-rack movement; sketched below)
○ Facilitates high-availability
○ Is the key to flexibility in what would otherwise be a constrained architecture
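A rough sketch of how the VIP conceals a re-IP when a VM is replaced or moved to another cabinet. The LoadBalancerClient class and its methods are hypothetical placeholders, not a specific vendor or LBaaS API.

```python
# Rough sketch: replace a VM behind a stable VIP. Clients keep using the
# VIP; only the pool membership changes. LoadBalancerClient is a
# hypothetical placeholder, not a real vendor or LBaaS API.
class LoadBalancerClient:
    def __init__(self):
        self.pools = {}  # vip -> set of member IPs

    def add_member(self, vip, member_ip):
        self.pools.setdefault(vip, set()).add(member_ip)

    def remove_member(self, vip, member_ip):
        self.pools.get(vip, set()).discard(member_ip)

def replace_vm(lb, vip, old_ip, new_ip):
    """Bring the replacement VM into service, then retire the old one."""
    lb.add_member(vip, new_ip)      # new VM, possibly in a different cabinet
    lb.remove_member(vip, old_ip)   # old IP disappears; the VIP never changes

lb = LoadBalancerClient()
lb.add_member("203.0.113.10", "10.1.0.5")
replace_vm(lb, "203.0.113.10", old_ip="10.1.0.5", new_ip="10.1.2.9")
print(lb.pools["203.0.113.10"])  # {'10.1.2.9'}
```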
SLIDE 13
(re-)Adding Flexibility (cont’d)
- Automate it:
○ Load Balancer Management
■ Device selection based on capacity & quotas (sketch below)
■ Association between service groups and VIPs
■ Assignment of VMs to VIPs
○ Availability Zone selection & balancing
○ Multiple cluster integration
- Implement “Service Groups”
○ (external to OpenStack -- for now)
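The “device selection based on capacity & quotas” step above can be sketched roughly as follows; the device list, quota model, and numbers are purely illustrative.

```python
# Sketch of load-balancer device selection based on capacity and quotas.
# The device list, quota model, and numbers are purely illustrative.
DEVICES = [
    {"name": "lb-01", "capacity": 1000, "vips_in_use": 940},
    {"name": "lb-02", "capacity": 1000, "vips_in_use": 310},
]
TENANT_QUOTA = {"search": 50}   # max VIPs a tenant may hold
TENANT_USAGE = {"search": 48}

def select_device(tenant: str) -> dict:
    """Pick the device with the most free capacity, enforcing tenant quota."""
    if TENANT_USAGE.get(tenant, 0) >= TENANT_QUOTA.get(tenant, 0):
        raise RuntimeError(f"tenant {tenant} has exhausted its VIP quota")
    return max(DEVICES, key=lambda d: d["capacity"] - d["vips_in_use"])

print(select_device("search")["name"])  # lb-02 (more headroom than lb-01)
```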
SLIDE 14
Service Groups
- Consists of groups of VMs running the same application (sketched below)
- Can be a layer of an application stack, an implementation of an internal service, or a user-facing server
- Present an API that functions behind a VIP
○ Web services everywhere!
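As a hedged illustration only (field names are not the production schema), the service-group concept boils down to a stable VIP plus a changeable set of member VMs:

```python
# Illustrative only: a service group is a stable VIP plus a changeable set
# of member VMs. Field names are not the production schema.
from dataclasses import dataclass, field

@dataclass
class ServiceGroup:
    name: str                                   # e.g. "ads-serving-tier"
    vip: str                                    # the address callers use
    members: set = field(default_factory=set)   # fixed IPs of backing VMs

    def add_vm(self, fixed_ip: str):
        self.members.add(fixed_ip)

    def remove_vm(self, fixed_ip: str):
        self.members.discard(fixed_ip)

group = ServiceGroup(name="ads-serving-tier", vip="203.0.113.20")
group.add_vm("10.1.0.7")
```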
SLIDE 15
Service Group Creation
SLIDE 16
Integrating With OpenStack
SLIDE 17
Putting It Together
- Registration of hosts and services
○ A VM is associated with a service group at creation (see sketch below)
○ A tag associated with the service group is accessible to resource allocation
- Control of load balancers
○ Allocates and controls hardware
○ Manages VMs for each service group
○ Provides elasticity and robustness
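A minimal sketch of the registration step described above: associate the instance with its service group at creation time and surface the association as a tag/metadata entry that resource-allocation logic can read. The register_instance helper and the registry structure are hypothetical.

```python
# Minimal sketch of the registration step: tag an instance with its service
# group at creation time so resource allocation and external automation can
# see the association. register_instance and the registry are hypothetical.
SERVICE_GROUP_REGISTRY = {}   # service group name -> list of instance ids

def register_instance(instance_id: str, service_group: str, metadata: dict) -> dict:
    """Record the association and surface it as instance metadata/tag."""
    metadata["service_group"] = service_group
    SERVICE_GROUP_REGISTRY.setdefault(service_group, []).append(instance_id)
    return metadata

meta = register_instance("vm-0123", "ads-serving-tier", metadata={})
print(meta)  # {'service_group': 'ads-serving-tier'}
```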
SLIDE 18
Putting It Together (cont’d)
- OpenStack Extensions and Patches
○ Three points of integration:
- 1. Intercept request before issue
- 2a. Select network based on hypervisor
- 2b. Transmit new instance information to external automation
- 3. Transmit deleted instance information to external automation
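Points 2b and 3 amount to forwarding Nova’s instance lifecycle notifications to the external automation. A minimal sketch, assuming notifications arrive off the message queue as an event type plus payload dict (Nova’s standard compute.instance.*.end event names); the forward_* hooks are hypothetical.

```python
# Minimal sketch: route Nova instance lifecycle notifications (as received
# from the message queue) to external automation. Event type strings follow
# Nova's standard notification naming; the forward_* helpers and payload
# shape shown here are hypothetical.
def forward_new_instance(instance_id, fixed_ips):
    print(f"register {instance_id} ({fixed_ips}) with LB/service-group automation")

def forward_deleted_instance(instance_id):
    print(f"deregister {instance_id} from LB/service-group automation")

def handle_notification(event_type: str, payload: dict):
    if event_type == "compute.instance.create.end":      # integration point 2b
        forward_new_instance(payload["instance_id"], payload.get("fixed_ips"))
    elif event_type == "compute.instance.delete.end":    # integration point 3
        forward_deleted_instance(payload["instance_id"])

handle_notification("compute.instance.create.end",
                    {"instance_id": "vm-0123", "fixed_ips": ["10.1.2.9"]})
```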
SLIDE 19
Whither OpenStack?
- Our Goals:
○ Minimize patching code
○ Minimize points of integration with external systems
○ Contribute back patches of general use
○ Replace custom code with community code:
■ Use Heat for automation
■ Use LBaaS to control load balancers
○ Share our experiences
SLIDE 20
Complications
- OpenStack clusters don’t exist in a vacuum -- this makes scaling them harder
○ Existing physical infrastructure
○ Existing management infrastructure
○ Interaction with off-cluster resources
○ Security and organizational policies
○ Requirements of existing software stack
○ Stateful applications introduce complexities
SLIDE 21
Conclusion
- Mega-Scale has unique issues
○ Many potential solutions don’t scale sufficiently
○ Some flexibility must be sacrificed
*BUT*
○ Mega-Scale also admits solutions that aren’t practical or cost-effective at smaller scale
○ Automation and integration with external infrastructure are key
SLIDE 22
Questions?
email: edhall@yahoo-inc.com