SLIDE 1

Gatekeeper: Supporting Bandwidth Guarantees for Multi-tenant Datacenter Networks

Henrique Rodrigues , Yoshio Turner , Jose Renato Santos , Paolo Victor , Dorgival Guedes

HP Labs

WIOV 2011, Portland, OR

SLIDE 2

The Problem: Network Performance Isolation

SLIDE 3

Suppose that you have a datacenter…


SLIDE 6

And you are an IaaS provider …

[Figure: two tenants with 70% BW and 30% BW guarantees]

SLIDE 9

... and your network faces this traffic pattern:

TCP is flow-based, not tenant-aware...

[Figure: two TCP tenants with 70% BW and 30% BW guarantees sharing the link]

SLIDE 11

It becomes worse with these transport protocols:

UDP will consume most of the bandwidth.

[Figure: TCP tenant (70% BW guarantee) vs. UDP tenant (30% BW guarantee) sharing the link]

SLIDE 14

It becomes worse with these transport protocols:

Using rate limiters at each server doesn't solve the problem: even with each of the three UDP senders capped at 30%, the aggregate RX at the receiver is still 90% of the link.

[Figure: TCP tenant (70% BW guarantee) receiving alongside three rate-capped UDP senders (30% each)]

SLIDE 15

The Problem: Network Performance Isolation

  • How can we ensure that all tenants get at least the minimum amount of network resources they need to keep their services up?
  • In other words, how do we provide network performance isolation in multi-tenant datacenters?

SLIDE 16

Practical requirements for a traffic isolation mechanism/system

SLIDE 17

Requirements for a practical solution

  • Scalability

A datacenter supports thousands of physical servers hosting tens of thousands of tenants and tens to hundreds of thousands of VMs

  • Intuitive service model

Straightforward for tenants to understand and specify their network performance needs

  • Robustness against untrusted tenants

The IaaS model allows tenants to run arbitrary code, giving them total control over the guest network stack; malicious users could jeopardize the performance of other tenants

  • Flexibility / Predictability

What should we do with idle bandwidth? Work-conserving vs. non-work-conserving allocation?
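The work-conserving trade-off can be sketched in code (an illustrative policy with invented names, not Gatekeeper's actual algorithm): a non-work-conserving policy caps each tenant at its guarantee even when the link is idle, while a work-conserving one redistributes spare capacity to tenants with unmet demand.

```python
def allocate(capacity, guarantees, demands, work_conserving=True):
    """Split link capacity among tenants with minimum guarantees.

    guarantees/demands: dicts tenant -> rate (same units as capacity).
    Non-work-conserving: each tenant is capped at its guarantee.
    Work-conserving: spare capacity is shared among tenants that
    still have unmet demand, in proportion to their guarantees.
    """
    # Start by granting each tenant min(demand, guarantee).
    alloc = {t: min(demands[t], guarantees[t]) for t in guarantees}
    if not work_conserving:
        return alloc
    spare = capacity - sum(alloc.values())
    # Redistribute spare bandwidth proportionally to guarantees,
    # never exceeding a tenant's demand (iterate until settled).
    for _ in range(len(guarantees)):
        hungry = {t for t in alloc if alloc[t] < demands[t]}
        if not hungry or spare <= 1e-9:
            break
        total_g = sum(guarantees[t] for t in hungry)
        next_spare = 0.0
        for t in hungry:
            share = spare * guarantees[t] / total_g
            grant = min(share, demands[t] - alloc[t])
            alloc[t] += grant
            next_spare += share - grant
        spare = next_spare
    return alloc
```

For a 1000 Mbps link with guarantees A=700, B=300 and demands A=400, B=900, the non-work-conserving policy leaves 300 Mbps idle (B stays at 300), while the work-conserving one lets B use A's unused 200 Mbps on top of its own guarantee.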

SLIDE 18

Existing solutions don’t meet all these requirements

[Comparison table: TCP, BW Capping (policing), Secondnet, Seawall, and AF-QCN scored on Scalability, Flexibility/Predictability, Intuitive Model, and Robustness. Surviving marks: TCP ✗ ✗ ✗, Secondnet ✔ ✔, Seawall ✗ ✗, AF-QCN ✗ ✗. No existing solution meets all four requirements.]

SLIDE 19

Our approach

SLIDE 20

Assumption

Bisection bandwidth should not be a problem:

  • Emerging multi-path technologies will enable high-bandwidth networks with full bisection bandwidth
  • Smart tenant placement: tenant VMs are placed close to each other in the network topology
  • Results from datacenter traffic analysis show that most congestion happens within racks, not at the core

SLIDE 21

Our approach

  • Assume the core is over-provisioned and manage bandwidth at the edge
  • This addresses the scalability challenge: only a limited number of tenants share each edge link

SLIDE 22

Tenant Performance Model Abstraction

[Figure: ten VMs, each with its own bandwidth guarantee (BW1–BW10), attached to a single virtual switch]

  • Simple abstraction to tenant
  • Model similar to physical servers connected to a switch
  • Guaranteed bandwidth for each VM (TX and RX)
  • Minimum and Maximum rate per vNIC
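One way to represent this service model is a small record per vNIC (a sketch; field names are illustrative, not from the paper):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VNicGuarantee:
    """Bandwidth guarantee for one vNIC, in Mbit/s.

    The tenant sees its VMs as physical servers on a dedicated
    switch: min_* is always available, max_* caps how much idle
    capacity the vNIC may additionally use.
    """
    vm: str
    min_tx: float  # guaranteed transmit rate
    max_tx: float  # transmit ceiling when spare bandwidth exists
    min_rx: float  # guaranteed receive rate
    max_rx: float  # receive ceiling when spare bandwidth exists

    def __post_init__(self):
        # A guarantee above its own ceiling would be unsatisfiable.
        if self.min_tx > self.max_tx or self.min_rx > self.max_rx:
            raise ValueError("minimum rate must not exceed maximum")
```

Setting min equal to max gives the predictable (non-work-conserving) service; max above min allows the flexible, work-conserving one.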
SLIDE 23

Gatekeeper

  • Provides network isolation for multi-tenant datacenters using a distributed mechanism
  • Agents implemented in the virtualization layer dynamically coordinate bandwidth allocation based on tenants' guarantees

SLIDE 24
Gatekeeper

  • Agents in the VMM control transmission (TX) and coordinate reception (RX)

SLIDE 26

Gatekeeper - Overview

[Figure: UDP and TCP tenants with 70% BW and 30% BW guarantees; the receiver's link becomes congested ("Congestion!")]

SLIDE 27

Gatekeeper - Overview

[Figure: after feedback from the receiving agent, each sending agent acknowledges ("OK! Reducing TX") and throttles its transmission]
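A minimal sketch of this RX-side reaction, with invented names and a much simpler policy than the paper's controller: when measured RX exceeds the guarantee, the receiving agent asks every sender to scale down proportionally.

```python
def rx_feedback(guarantee, measured_rx, sender_rates):
    """Receiver-side reaction to RX congestion (simplified sketch).

    guarantee, measured_rx: rates in the same units (e.g. Mbit/s).
    sender_rates: dict sender -> current TX rate toward this vNIC.
    If the vNIC receives more than its guarantee, every sender is
    told to scale its TX rate by guarantee/measured_rx, so the
    aggregate converges toward the guarantee.
    """
    if measured_rx <= guarantee:
        return dict(sender_rates)  # no congestion: leave rates alone
    scale = guarantee / measured_rx
    return {s: rate * scale for s, rate in sender_rates.items()}
```

With a 300 Mbps RX guarantee and three senders transmitting 300 Mbps each (900 Mbps measured), every sender is asked to drop to 100 Mbps, bringing the aggregate back to the guarantee.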

SLIDE 28

Gatekeeper Architecture

SLIDE 29

Gatekeeper Prototype

  • Xen/Linux
  • Gatekeeper integrated into Linux Open vSwitch
  • Leverages the Linux traffic control mechanism (HTB) for rate control
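A sketch of how per-tenant HTB classes could be configured through `tc` (illustrative device name and class IDs; not the prototype's actual setup): each tenant's class gets `rate` equal to its guarantee and `ceil` equal to the link speed, so HTB lends idle bandwidth as the work-conserving mode requires.

```python
def htb_commands(dev, link_mbit, tenants):
    """Build `tc` commands for per-tenant HTB classes (sketch).

    tenants: dict name -> (classid_minor, guarantee_mbit).
    Each tenant class gets rate=guarantee (its minimum) and
    ceil=link speed, so HTB borrows idle bandwidth for it.
    """
    cmds = [
        f"tc qdisc add dev {dev} root handle 1: htb",
        f"tc class add dev {dev} parent 1: classid 1:1 htb "
        f"rate {link_mbit}mbit",
    ]
    for name, (minor, guarantee) in sorted(tenants.items()):
        cmds.append(
            f"tc class add dev {dev} parent 1:1 classid 1:{minor} htb "
            f"rate {guarantee}mbit ceil {link_mbit}mbit  # {name}"
        )
    return cmds
```

For the talk's example (gigabit link, 70%/30% guarantees), this yields one root class at 1000 Mbit and two tenant classes with rates of 700 and 300 Mbit, both allowed to borrow up to the full link.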

SLIDE 30

Example - RX

2 Tenants share a gigabit link:

  • Tenant A: 70% of the link, 1 TCP flow
  • Tenant B: 30% of the link, 3 flows (TCP or UDP)
SLIDE 31

Example - TX

2 Tenants share a gigabit link:

  • Tenant A: 70% of the link, 1 TCP flow
  • Tenant B: 30% of the link, 3 flows (TCP or UDP)
SLIDE 32

Example – Results without Gatekeeper

[Bar chart, Transmit (TX) scenario: throughput (0–1000 Mbps) of Tenant A (TCP) and Tenant B versus tenant B's traffic type (none / TCP / UDP), with no control and with a TX rate cap]

SLIDE 33

Example – Results without Gatekeeper

[Bar charts, Receive (RX) and Transmit (TX) scenarios: throughput (0–1000 Mbps) of Tenant A (TCP) and Tenant B versus tenant B's traffic type (none / TCP / UDP), with no control and with RX/TX rate caps]

SLIDE 34

Example – Results without Gatekeeper

  • Bandwidth capping doesn't reallocate unused bandwidth (non-work-conserving)
  • UDP consumes most of the switch resources

[Bar charts, Receive (RX) and Transmit (TX) scenarios: throughput of Tenant A (TCP) and Tenant B versus tenant B's traffic type, with no control and with RX/TX rate caps]

SLIDE 35

Example – Results with Gatekeeper

[Bar charts, Receive (RX) and Transmit (TX) scenarios: throughput (0–1000 Mbps) of Tenant A (TCP) and Tenant B versus tenant B's traffic type (none / TCP / UDP), comparing no control, rate cap, Gatekeeper predictable, and Gatekeeper flexible]

SLIDE 36

Summary

  • Gatekeeper provides network bandwidth guarantees at the server virtualization layer

§ Extends the hypervisor to control RX bandwidth

  • Prototype implemented and used to demonstrate Gatekeeper in simple scenarios

  • Future work

§ Evaluate Gatekeeper at larger scales (HP Labs Open Cirrus testbed, 100+ nodes)

§ Further explore the design space (functions to decrease/increase rate, etc.)

§ Evaluate Gatekeeper with more realistic benchmarks and applications

SLIDE 37

Contacts:

Gatekeeper: Supporting Bandwidth Guarantees for Multi-tenant Datacenter Networks

WIOV 2011, Portland, OR

{hsr,dorgival}@dcc.ufmg.br {yoshio_turner,joserenato.santos}@hp.com

Acknowledgements:

Brasil