SLIDE 1

CrossBow: A vertically integrated QoS stack

Sunay Tripathi, Nicolas Droux, Thirumalai Srinivasan, Kais Belgaied, Venu Iyer

Aug 21st, 2009, Sigcomm WREN 2009, Barcelona

Presented by Sunay Tripathi, Distinguished Engineer, Sun Microsystems Inc (Sunay.Tripathi@Sun.Com)

SLIDE 2

www.opensolaris.org/os/project/crossbow

Issues in host-based QoS solutions

  • Performance
    > Additional classification/queuing for all packets
    > QoS layers typically sit high in the stack, after the bulk of the work is already done
    > Packets must be DMA'd into the system before any policy can be applied
  • QoS layers are typically a "bump in the stack"
  • Management complexities
SLIDE 3

Crossbow: Solaris Networking Stack

  • 8 years of development work to achieve:
    > Scalability across multi-core CPUs and multi-10GigE bandwidth
    > Virtualization, QoS, and High Availability designed in
    > Exploitation of advanced NIC features
  • Key enabler for:
    > Server and network consolidation
    > Resource partitioning
    > Cloud computing

SLIDE 4

Crossbow "Hardware Lanes"

Ground-up design for multi-core and multi-10GigE

  • Linear scalability using "hardware lanes" with dedicated resources
  • Network virtualization and QoS designed into the stack
  • More efficiency due to dynamic polling and packet chaining

[Figure: a physical NIC with a hardware classifier steering VLAN-separated traffic into per-VNIC hardware lanes, each with its own hardware rings/DMA, kernel threads and queues, and squeue, feeding virtual machines/zones and an application switch.]

SLIDE 5

Hardware Lanes and Dynamic Polling

  • Partition the NIC hardware (Rx/Tx rings, DMA), kernel queues/threads, and CPUs to allow the creation of "hardware lanes" which can be assigned to VNICs and flows
  • Use dynamic polling on Rx/Tx rings to schedule the rate of packet arrival and transmission on a per-lane basis
  • Effect of dynamic polling:

mpstat (older driver)

  intr   ithr  csw   icsw  migr  smtx  srw  syscl  usr  sys  wt  idl
  10818  8607  4558  1547  161   1797  289  19112  17   69   0   12

mpstat (GLDv3-based driver)

  intr   ithr  csw   icsw  migr  smtx  srw  syscl  usr  sys  wt  idl
  2823   1489  875   151   93    261   1    19825  15   57   0   27

~75% fewer interrupts, ~85% fewer mutexes, ~85% fewer context switches, ~15% more CPU free

  • Use dynamic polling for bandwidth partitioning and isolation without any support from switches and routers
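The per-CPU statistics above come from mpstat(1M); a similar snapshot can be collected on an OpenSolaris system while the workload runs by sampling at a fixed interval (here 5 seconds), then comparing the intr, csw, and smtx columns between driver configurations:

# mpstat 5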

SLIDE 6

Crossbow Flows: Service Virtualization

[Figure: a flow classifier on each NIC steers traffic for individual services and protocols (VoIP, HTTPS, TCP, UDP, default) into dedicated squeues; each squeue is backed by its own compute resources: a CPU with a virtual squeue, kernel threads/queues, and a memory partition.]

SLIDE 7

Crossbow Flows

Crossbow flows can be based on:

  > Services (protocol + remote/local ports)
  > Transport (TCP, UDP, SCTP, iSCSI, etc.)
  > Remote and local IP addresses
  > Remote IP subnets
  > DSCP labels

The following attributes can be set on each flow:

  > Bandwidth limits
  > Priorities
  > CPUs

# flowadm create-flow -l bge0 protocol=tcp,local_port=443 -p maxbw=50M http-1
# flowadm set-flowprop -l bge0 -p maxbw=100M http-1
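Flows created this way can be listed, inspected, and torn down with the same tool (a sketch based on the flowadm(1M) subcommands; exact output columns vary by release):

# flowadm show-flow -l bge0
# flowadm show-flowprop http-1
# flowadm remove-flow http-1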

SLIDE 8

Virtual Network Containers

[Figure: in the Solaris global zone, a flow classifier on NIC bge0 steers traffic over dedicated Rx/Tx DMA rings to VNIC1 (100Mbps) and VNIC2 (200Mbps); each VNIC has its own virtual squeue and exclusive IP instance, serving zones xb1-z1 and xb1-z2 and external clients xb2 and xb3.]

Virtualization

  • Flows
  • Virtual NICs & virtual switches
  • Virtual wire

Resource control

  • Bandwidth partitioning
  • NIC hardware partitioning
  • CPU/priority assignment

Observability

  • Real-time usage for each link/flow
  • Finer-grained stats per link/flow
  • History at no cost
SLIDE 9

Defense against DoS/DDoS

  • DDoS attacks can cripple entire server farms and all the services they offer
  • With Crossbow, only the impacted service or virtual machine takes the hit instead of the entire grid
  • Under attack, impacted services start all new connections under a lower-priority flow with limited bandwidth
  • Connections transition to appropriate-priority stacks after application authentication
  • IDS systems can use Crossbow APIs to create zero-bandwidth flows based on the remote IP addresses or subnets of the attackers and minimize their impact
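At the command line, the last point could look like the following (a hypothetical sketch: the subnet is invented, and the Crossbow API usage described on the slide is approximated here via the flowadm(1M) front end, using the same attribute style as the earlier flow examples):

# flowadm create-flow -l bge0 remote_ip=192.168.10.0/24 -p maxbw=0 attacker-net-1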

SLIDE 10

BACKUP

SLIDE 11

Solaris Core Network Functionality

[Figure: block diagram of the Solaris networking stack, from 1GigE/10GigE drivers (Neptune, Niantic, etc.), FCoE, and IPoIB, through the Generic LAN Driver (GLDv3: aggr, SR-IOV, vanity names), Crossbow network virtualization (virtual NICs, virtual switches, virtual wire, flows, QoS, observability, L2 classification/filtering), the scalable virtualized TCP/IP stack with kernel sockets, and kernel services (L2 bridge, L3/L4 load balancer, IPFilter firewall, IP tunnels), up to user-level routing protocols (Quagga), VRRP (routing HA), IP multipathing, and perf/diag tools, with developer tools and management interfaces on top.]

  • Networking Services
    > Routing protocols using Quagga
    > L3/L4 load balancer kernel modules
    > IP firewall (IPFilter)
    > DNS, DHCP, NTP, SIP, VOIP, etc.

  • Scalable & Virtualized Network Stack
    > Kernel socket & socket filter
    > Modernized TCP/IP stack
    > QoS: bandwidth limits, priorities, CPU bindings
    > IP multipathing (IPMP)
    > IP tunneling
    > Defense against DDoS attacks

  • Crossbow: Virtual Networking
    > VNICs, VSwitches, VWire
    > Service virtualization (flows)
    > L2 services: classification, filtering

  • Generic LAN Driver v3 – GLDv3
    > Aggregation
    > Vanity names
    > Drivers (1GbE and 10GbE, FCoE, IPoIB)

System APIs: Kernel Socket API, MAC Client API, MAC Driver API, IP Hooks API

SLIDE 12

Virtual NIC (VNIC) & Virtual Switches

Virtual NICs:

  > Functionally equivalent to physical NICs:
    > IP addresses assigned statically or via DHCP, and snooped individually
    > Appear in the MIB as a separate 'if', with the configured link speed shown as 'ifspeed'
    > Can be created over link aggregations, or assigned to IPMP groups for load balancing and failover support
  > Can have multiple hardware lanes assigned to them
  > Can be created over a physical NIC (without needing a vswitch) to provide external connectivity, with switching done in NIC hardware
  > Have configurable link speed, CPU, and priority assignment
  > Standards-based end-to-end network virtualization:
    > VLAN tags and Priority Flow Control (PFC) assigned to a VNIC extend hardware lanes to the switch
    > No configuration changes needed on the switch to support virtualization

Virtual Switches:

  > Can be created to provide private connectivity between virtual machines
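The configurable link speed and CPU assignment mentioned above can be changed after a VNIC is created using dladm's link-property subcommands (a sketch based on dladm(1M); vnic1 is an assumed existing VNIC):

# dladm set-linkprop -p maxbw=300M vnic1
# dladm set-linkprop -p cpus=2,3 vnic1
# dladm show-linkprop -p maxbw,cpus vnic1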

SLIDE 13

Virtual NIC & Virtual Switch Usage

# dladm create-vnic -l bge1 vnic1
# dladm create-vnic -l bge1 -m random -p maxbw=100M -p cpus=4,5,6 vnic2
# dladm create-etherstub vswitch1
# dladm show-etherstub
LINK
vswitch1
# dladm create-vnic -l vswitch1 -p maxbw=1000M vnic3
# dladm show-vnic
LINK   OVER      MACTYPE  MACVALUE     BANDWIDTH  CPUS
vnic1  bge1      factory  0:1:2:3:4:5  -          -
vnic2  bge1      random   2:5:6:7:8:9  max=100M   4,5,6
vnic3  vswitch1  random   4:3:4:7:0:1  max=1000M

A VNIC can also be created directly on a VLAN, with a bandwidth limit and CPU bindings:

# dladm create-vnic -l ixgbe0 -v 1055 -p maxbw=500M -p cpus=1,2 vnic9
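Tearing this example configuration down uses the matching delete subcommands (a sketch based on dladm(1M); VNICs over an etherstub must be deleted before the etherstub itself):

# dladm delete-vnic vnic3
# dladm delete-vnic vnic2
# dladm delete-vnic vnic1
# dladm delete-etherstub vswitch1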
SLIDE 14

Physical Wire w/Physical Machines vs. Virtual Wire w/Virtual Network Machines

[Figure: two panels. The physical deployment connects a client, a router, and hosts 1 and 2 through switches 1 and 3, with ports 1/2/3 (10.0.0.1-3) and ports 6/9 (20.0.0.1/3) on links of 1 Gbps and 100 Mbps. The virtual wire recreates the same topology on a single system: VNICs 1/2/3/6/9 carry the same addresses and configured link speeds, connected through EtherStub 1 and EtherStub 3, with a virtual router in place of the physical one.]

SLIDE 15

Crossbow extends H/W Lanes to the Switch

[Figure: a switch connected to a physical NIC whose packet classifier steers traffic over dedicated Rx/Tx rings to VNIC A (100Mbps, zone/virtual machine A) and VNIC B (500Mbps, zone/virtual machine B); when client A sends traffic to virtual machine A faster than 100 Mbps, VNIC A sends a pause frame asking the switch to slow the incoming traffic for VM A.]

  • Dedicated path from the switch to the virtual machine
  • VNIC A can send a PFC pause to the switch, forcing the traffic from client A to slow down
  • Incoming traffic for virtual machine B (which has a higher configured link speed) does not suffer

SLIDE 16

Virtual Machines

[Figure: a NIC with a hardware flow classifier feeding virtual NICs for the Solaris host OS and two Solaris guest OSes through the NIC virtualization engine; guest OS 1's traffic is further split into HTTP, HTTPS, and default squeues, while the host OS and guest OS 2 each receive all their traffic through a single virtual squeue.]

SLIDE 17

Dynamic Polling: Effect on Throughput

[Charts: bi-directional throughput (Gbps) for a high-load TCP read/write test with 5 clients (pktsz=1500, wrtsz=8k), comparing Crossbow (Xbow2) with Fedora 2.6; packets received via interrupt vs. poll, per hardware lane; and packet-chain length distribution (<10, 10-50, >50 packets) per lane.]

Config details: 5 clients, 1 server, 10GigE links; 3 clients reading (10 threads each), 2 clients writing (10 threads each); all clients/server: x4150, dual-socket 8x2.8GHz Intel CPUs, Intel Oplin (ixgbe) 10GigE NIC.

SLIDE 18

Dynamic Polling: Effect on Latency

[Charts: transactions/sec for UDP 66-byte-packet latency tests under high and low load, comparing Crossbow (Xbow2) with Fedora 2.6 as the number of clients grows from 1 to 5; the interrupt/poll ratio of received packets; and packet-chain length distributions (<10, 10-50, >50 packets) per client count.]

SLIDE 19

Virtual Network Machines

Networking as a Service (NaaS)

  • Virtual network machines enable Networking as a Service:
    > Monetize via the subscription model in the cloud, using virtualized networking services like vRouter, vLoadBalancer, vFirewall, vDHCPserver, vDNSserver, etc.
  • Virtualized networking services wrapped in a Solaris Zone/Xen/VirtualBox instance running on dedicated networking blades/appliances
  • Open-source virtualized networking services:
    > VNICs and vswitches provide virtualized ports similar to physical ports
    > Enable virtual networks with configurable link speeds using virtual wire
  • Management for virtual network machines:
    > Solaris command line
    > Cisco-style 'cli'
    > Web based

SLIDE 20

The network is the computer

[Figure: virtual network machine appliances for the cloud, built on Crossbow: perimeter firewall, XML message switch, intrusion detection, application switch, OSPF router, SSL VPN gateway, Asterisk VoIP PBX.]

Networking as a Service (NaaS): subscription based or dedicated appliance

SLIDE 21

Virtual Network Machines over 10GbE

[Figure: an OpenSolaris N2/x64 server/blade running virtual network machines (vFirewall, vVPN, vRouter, vNTP, vDHCP, vDNS, vLDAP, ...), each with its own TCP/UDP/IP stack over virtual NICs backed by dedicated Rx/Tx DMA rings; flow classifiers and offload engines on 10GbE NIC/NIU hardware, with NIC A facing the WAN and NIC B facing a VLAN'd data-center Ethernet fabric; APIs for ISVs at each layer; dedicated CPUs.]

SLIDE 22

Cloud Virtual Machines over 40Gb/s IB

[Figure: an OpenSolaris N2/x64 server/blade running Xen Dom0 and DomU guests; each domain's TCP/IP stack (plus iSER, NFS, RDMA, and IPoIB in Dom0 via IBTF) runs over VNIC/EoIB instances backed by dedicated Rx/Tx queue pairs in the InfiniBand firmware of HCAs A and B, with Dom0 and DomU traffic separated by IB partitions; APIs for ISVs at each layer; dedicated CPUs.]

SLIDE 23

Open Storage Networking

  • Priority-based Flow Control (PFC)
    > 8 Ethernet virtual lanes, each with its own pause mechanism
    > Extends the Crossbow hardware-virtualized lanes to the switch
  • Enhanced Transmission Selection (ETS)
    > Adds class-of-service support within an Ethernet virtual lane
    > Extends the Crossbow flow-based QoS to the switch
  • Link Layer Discovery Protocol (LLDP) and congestion notification (optional)
  • PFC and ETS are useful in normal virtualization and server QoS scenarios
  • PFC, ETS, and LLDP are necessary to implement the Data Center Bridging Exchange protocol (DCBX) and FCoE

SLIDE 24

Join Us...

  • Our communities and projects are open on OpenSolaris.org:
    > CrossBow: http://opensolaris.org/os/project/crossbow
    > VNM: http://opensolaris.org/os/project/vnm
    > Networking: http://opensolaris.org/os/community/networking
  • Where you will find:
    > Active discussions, design docs, FAQs, source code drops, preliminary binary releases, etc.