Sunay Tripathi, Distinguished Engineer, Sun Microsystems Inc Sunay.Tripathi@Sun.Com
CrossBow: A vertically integrated QoS stack
Sunay Tripathi, Nicolas Droux, Thirumalai Srinivasan, Kais Belgaied, Venu Iyer
August 21st, 2009, SIGCOMM WREN 2009, Barcelona
www.opensolaris.org/os/project/crossbow 2
> Additional Classification/Queuing for all packets
> QoS layers typically high up in the stack (bulk of the work already done)
> Packet needs to be DMA'd into the system before any policy can be applied
> Scalability across multi-core CPUs and multi-10gigE bandwidth
> Virtualization, QoS, High Availability designed in
> Exploit advanced NIC features
> Server and Network Consolidation
> Resource partitioning
> Cloud computing
[Figure: Hardware lanes on a physical machine. The physical NIC classifier steers traffic into separate hardware rings/DMA, each served by its own kernel threads and queues (squeue) and a virtual NIC, up to a virtual machine/zone or application. The lanes are VLAN separated at the switch.]
Mpstat (older driver)
 intr  ithr   csw  icsw  migr  smtx  srw  syscl  usr  sys  wt  idl
10818  8607  4558  1547   161  1797  289  19112   17   69   0   12

Mpstat (GLDv3 based driver)
 intr  ithr   csw  icsw  migr  smtx  srw  syscl  usr  sys  wt  idl
 2823  1489   875   151    93   261    1  19825   15   57   0   27

> ~75% Fewer Interrupts
> ~85% Fewer Mutexes
> ~85% Fewer Context Switches
> ~15% More CPU Free
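The summary percentages can be checked against the two mpstat samples with a little arithmetic. The sketch below recomputes the reductions from the intr, csw and smtx columns and the idle gain from the idl column. The helper name `percent_fewer` is made up for illustration, and the slide's figures are rounded, so the computed values (for example, roughly 81% for csw in these particular samples) will not match them exactly.

```python
# Counter values copied from the two mpstat samples on this slide.
older = {"intr": 10818, "csw": 4558, "smtx": 1797, "idl": 12}
gldv3 = {"intr": 2823, "csw": 875, "smtx": 261, "idl": 27}


def percent_fewer(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Relative reductions for interrupts, context switches and mutex spins.
reductions = {key: percent_fewer(older[key], gldv3[key])
              for key in ("intr", "csw", "smtx")}

# Idle time is already a percentage, so the gain is a difference in points.
idle_gain_points = gldv3["idl"] - older["idl"]

for name, value in reductions.items():
    print(f"{name}: {value:.1f}% fewer")
print(f"idle: +{idle_gain_points} percentage points")
```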
[Figure: Compute resources partitioned per lane. NIC 1 and NIC 2 each have a flow classifier that feeds virtual squeues bound to CPU 1, CPU 2, ... CPU 'n'. Each virtual squeue has its own kernel threads/queues and memory partition. Under NIC 1 the squeues are split by service (VOIP, HTTPS, DEFAULT); under NIC 2 they are split by transport (TCP, UDP, DEFAULT).]
> Services (protocol + remote/local ports)
> Transport (TCP, UDP, SCTP, iSCSI, etc)
> Remote and local IP addresses
> Remote IP Subnets
> DSCP labels

> B/W limits
> Priorities
> CPUs
# flowadm create-flow -l bge0 protocol=tcp,local_port=443 -p maxbw=50M http-1
# flowadm set-flowprop -l bge0 -p maxbw=100M http-1
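To make the flow idea concrete, here is a minimal Python sketch of matching a packet against flow attributes such as those in the flowadm example. It is an illustration only, not Crossbow code: the `Flow` class and `classify` function are hypothetical names, and the first-match lookup order is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record: match attributes plus a bandwidth property."""
    name: str
    protocol: Optional[str] = None    # e.g. "tcp"; None matches any protocol
    local_port: Optional[int] = None  # None matches any local port
    maxbw_mbps: Optional[int] = None  # bandwidth limit in Mbps, if any

    def matches(self, protocol, local_port):
        return ((self.protocol is None or self.protocol == protocol) and
                (self.local_port is None or self.local_port == local_port))


def classify(flows, protocol, local_port):
    """Return the first flow whose attributes match, or None for the default lane."""
    for flow in flows:
        if flow.matches(protocol, local_port):
            return flow
    return None


# Mirrors the attributes of the flow created with flowadm on this slide.
flows = [Flow("http-1", protocol="tcp", local_port=443, maxbw_mbps=50)]

hit = classify(flows, "tcp", 443)
miss = classify(flows, "udp", 53)
print(hit.name if hit else "default")
print(miss.name if miss else "default")
```

Traffic that matches no flow falls through to a default lane in this sketch, which corresponds to the DEFAULT squeue shown in the diagrams.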
[Figure: Demo setup covering virtualization, resource control and observability. NIC bge0 in the Solaris global zone has a flow classifier and Rx/Tx DMA rings. VNIC1 (100Mbps) and VNIC2 (200Mbps) each have a virtual squeue and serve zones xb1-z1 and xb1-z2, each with an exclusive IP instance. Clients xb2 and xb3 generate the traffic.]
[Figure: OpenSolaris networking stack, top to bottom. Developer tools and management interfaces; routing protocols (Quagga), VRRP (routing HA), IP Multi Pathing, perf/diag tools; L2 bridge, L3/L4 load balancer, IPFilter (firewall), IP tunnels; kernel sockets; scalable virtualized TCP/IP stack; Crossbow network virtualization (virtual NICs, virtual switches, Virtual Wire, flows, QoS, observability, L2 classification and filtering); Generic LAN Driver GLDv3 (Aggr, SR-IOV, vanity names); 1gigE/10gigE (Neptune, Niantic, etc), FCoE, IPoIB.]
> Routing Protocols using Quagga
> L3/L4 Load Balancer kernel modules
> IP Firewall (IPFilter)
> DNS, DHCP, NTP, SIP, VOIP, etc
> Kernel Socket & Socket Filter
> Modernized TCP/IP Stack
> QoS: B/W limits, Priorities, CPU bindings
> IP Multi Pathing (IPMP)
> IP Tunneling
> Defense against DDoS attacks
> VNICs, VSwitches, VWire
> Service Virtualization (Flows)
> L2 Services: Classification, Filtering
> Aggregation
> Vanity Names
> Drivers (1GbE and 10GbE, FCoE, IPoIB)

SYS APIs: Kernel Socket API, MAC Client API, MAC Driver API, IP Hooks API
> Functionally physical NICs:
  > IP address assigned statically or via DHCP and snooped individually
  > Appear in MIB as separate 'if' with configured link speed shown as 'ifspeed'
  > VNICs can be created over Link Aggregation or can be assigned to IPMP groups for load balancing and failover support
> VNICs can have multiple hardware lanes assigned to them
> Can be created over a physical NIC (without needing a Vswitch) to provide external connectivity with switching done in NIC H/W
> VNICs have configurable link speed, CPU and priority assignment
> Standards based End to End Network Virtualization
  > VLAN tags and Priority Flow Control (PFC) assigned to VNIC extend Hardware Lanes to Switch
  > No configuration changes needed on switch to support virtualization
> Can be created to provide private connectivity between Virtual Machines
# dladm create-vnic -l bge1 vnic1
# dladm create-vnic -l bge1 -m random -p maxbw=100M -p cpus=4,5,6 vnic2
# dladm create-etherstub vswitch1
# dladm show-etherstub
LINK
vswitch1
# dladm create-vnic -l vswitch1 -p maxbw=1000M vnic3
# dladm show-vnic
LINK    OVER      MACTYPE  MACVALUE     BANDWIDTH  CPUS
vnic1   bge1      factory  0:1:2:3:4:5  -          -
vnic2   bge1      random   2:5:6:7:8:9  max=100M   4,5,6
vnic3   vswitch1  random   4:3:4:7:0:1  max=1000M
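For scripting around a listing like this, the show-vnic table can be turned into records. The sketch below is a rough illustration that assumes whitespace-separated columns exactly as printed on this slide; `parse_show_vnic` is a hypothetical helper, and padding a missing trailing column with "-" is an assumption made for convenience.

```python
# Output copied from the `dladm show-vnic` listing on this slide.
SHOW_VNIC = """\
LINK    OVER      MACTYPE  MACVALUE     BANDWIDTH  CPUS
vnic1   bge1      factory  0:1:2:3:4:5  -          -
vnic2   bge1      random   2:5:6:7:8:9  max=100M   4,5,6
vnic3   vswitch1  random   4:3:4:7:0:1  max=1000M
"""


def parse_show_vnic(text):
    """Split a whitespace-aligned table into one dict per VNIC."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.split()
        # Assumption: a missing trailing column means "not set".
        fields += ["-"] * (len(header) - len(fields))
        rows.append(dict(zip(header, fields)))
    return rows


vnics = parse_show_vnic(SHOW_VNIC)

# VNICs that carry a bandwidth limit, and the ones built on the etherstub.
limited = [v["LINK"] for v in vnics if v["BANDWIDTH"].startswith("max=")]
on_vswitch = [v["LINK"] for v in vnics if v["OVER"] == "vswitch1"]
print(limited)
print(on_vswitch)
```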
[Figure: Virtual Wire. Top: a physical network in which Client, Router, Host 1 and Host 2 connect through Switch 1 and Switch 3 on ports 1, 2, 3, 6 and 9 (addresses printed as 10.0.01, 10.0.02, 10.0.03, 20.0.03, 20.0.01) over 1 Gbps and 100 Mbps links. Bottom: the same topology virtualized, with EtherStub 1 and EtherStub 3 replacing the switches, VNIC1, VNIC2, VNIC3, VNIC6 and VNIC9 replacing the ports with the same addresses and link speeds, and a virtual router replacing the router.]
[Figure: A switch connects Client A to a physical NIC with a packet classifier and separate Rx/Tx rings. VNIC A (100Mbps) serves Zone/Virtual Machine A and VNIC B (500Mbps) serves Zone/Virtual Machine B. Client A sends traffic to Virtual Machine A faster than 100 Mbps, so a pause frame is sent by VNIC-A to the switch asking it to slow the incoming traffic for VM-A. Surviving caption fragments: "... from client A to slow down" and "... configured link speed) does not suffer".]
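The pause behaviour in this figure can be approximated with a simple per-VNIC rate check. The slide does not say how Crossbow decides when to send a pause frame, so the fixed measurement window below is purely an assumption, and `VnicRateMonitor` is a hypothetical name. The sketch only shows the idea of comparing received bits against the configured link speed.

```python
class VnicRateMonitor:
    """Hypothetical per-VNIC monitor: flags when arrivals exceed the link speed."""

    def __init__(self, link_speed_bps, window_s=0.01):
        self.link_speed_bps = link_speed_bps
        self.window_s = window_s          # assumed measurement window
        self.window_start = 0.0
        self.bits_in_window = 0

    def on_packet(self, now_s, size_bytes):
        """Account for one packet; return True if a pause frame should be sent."""
        if now_s - self.window_start >= self.window_s:
            # Start a new measurement window.
            self.window_start = now_s
            self.bits_in_window = 0
        self.bits_in_window += size_bytes * 8
        budget_bits = self.link_speed_bps * self.window_s
        return self.bits_in_window > budget_bits


# VNIC A is configured at 100 Mbps, as in the figure.
vnic_a = VnicRateMonitor(link_speed_bps=100_000_000)

# Client A bursts 1500-byte packets every 10 microseconds (about 1.2 Gbps).
burst = [vnic_a.on_packet(i * 0.00001, 1500) for i in range(100)]
first_pause = burst.index(True)

# A well-behaved sender at one packet per millisecond (12 Mbps) never trips it.
vnic_b = VnicRateMonitor(link_speed_bps=100_000_000)
calm = [vnic_b.on_packet(i * 0.001, 1500) for i in range(100)]

print(first_pause)
print(any(calm))
```

With a 10 ms window the budget at 100 Mbps is 1,000,000 bits, so the burst exceeds it on the 84th packet (index 83), while the 12 Mbps sender stays under it throughout.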
[Figure: Crossbow with virtual machines. A Solaris host OS runs Solaris Guest OS 1 and Solaris Guest OS 2. The NIC's H/W flow classifier feeds one virtual NIC per OS. The host OS VNIC and the Guest OS 2 VNIC each carry all traffic for that OS into a single virtual squeue through the NIC virtualization engine. Guest OS 1 traffic is further split into HTTP, HTTPS and DEFAULT squeues.]
High Load TCP Read/Write Test, 5 Clients (pktsz=1500; wrtsz=8k)

[Chart: Bi-directional throughput (Gbps), Xbow2 vs. Fedora 2.6. 5 clients read/write, 3 reading/2 writing, 10 threads/client.]

Config Details:
> 5 Clients; 1 Server – 10GigE links
> 3 clients reading (10 threads each)
> 2 clients writing (10 threads each)
> All clients/server: x4150, dual socket, 8x2.8GHz Intel CPU
> 10 GigE NIC – Intel Oplin (ixgbe)

[Chart: Pkts Rcv'd via interrupt/poll. X axis: lane number (1 to 4). Y axis: number of packets. Series: pkts by interrupt, pkts by poll, total pkts.]

[Chart: Chain lengths. X axis: lane number (1 to 4). Y axis: share of chains (0% to 100%). Series: chains > 50 pkts, chains 10 – 50 pkts, chains < 10 pkts.]
UDP 66byte pkt High Load Latency Test

[Chart: Txn/Sec (66 byte packets) vs. number of clients (1 to 5), Xbow2 vs. Fedora 2.6.]

[Chart: Pkt chain lengths vs. number of clients (1 to 5). Series: chains > 50 pkts, chains 10 – 50 pkts, chains < 10 pkts.]

[Chart: Pkts received via interrupt/poll ratio vs. number of clients (1 to 5). Series: poll, interrupts.]

UDP 66byte pkt Low Load Latency Test

[Chart: Txn/Sec (66 byte packets) vs. number of clients (1 to 5), Xbow2 vs. Fedora 2.6.]

[Chart: Pkts received via interrupt/poll percent vs. number of clients (1 to 5). Series: pkts by poll, pkts by interrupts.]

[Chart: Number of chains < 10 pkts vs. number of clients (1 to 5).]
– Monetize via the subscription model in cloud using virtualized networking services like vRouter, vloadbalancer, vFirewall, vDHCPserver, vDNSserver, etc running on dedicated Networking blades/appliance
– Open Source Virtualized Networking Services
– VNICs and Vswitches provide the virtualized ports similar to physical ports
– Enable Virtual Networks with configurable link speeds using Virtual Wire
– Solaris command line
– Cisco Style 'cli'
– Web based
The network is the computer
Perimeter FW, XML Message Switch, Intrusion Detection, Application Switch 1, Router - OSPF, SSL VPN Gateway, Asterisk VoIP PBX
[Figure: OpenSolaris N2/x64 server/blades hosting virtual network services (vFirewall, vVPN, vRouter, vNTP, vDHCP, vDNS, vLDAP, ..). Each service has its own IP and TCP/UDP stack on top of Virtual NIC A or Virtual NIC B, with dedicated Rx/Tx DMA rings and dedicated CPUs. NIC A and NIC B (10GbE NIC/NIU) each provide a flow classifier and offload engine, and connect to the WAN and to a data center VLAN'd Ethernet fabric. APIs for ISVs are available at each layer.]
[Figure: OpenSolaris N2/x64 server/blades with Dom0 and two DomUs over InfiniBand. Apps run over TCP/IP and over iSER, NFS, ...; IBTF and VNIC/EoIB sit on top of Rx/Tx Q-Pairs with dedicated CPUs. HCA A and HCA B run InfiniBand firmware and carry RDMA and IPoIB in a Dom0 IB partition and a DomU IB partition. APIs for ISVs are available at each layer.]
> 8 ethernet virtual lanes with their own pause mechanism
> Extend the Crossbow H/W Virtualized Lanes to the switch
> Add Class of service support within the ethernet virtual lane
> Extend the Crossbow flow based QoS to the switch
> CrossBow: http://opensolaris.org/os/project/crossbow
> VNM: http://opensolaris.org/os/project/vnm
> Networking: http://opensolaris.org/os/community/networking