Distributed Systems CS6421
Networking: SDN and NFV
Prof. Tim Wood
SDN + NFV
- Networks are changing: trying to achieve the same level of customization, flexibility, and automation found in the cloud
- Software-based networks: SDN (Software-Defined Networking) and NFV (Network Function Virtualization)

Adapted from slides (with thanks to many people's material that was re-used: David Koll, Univ. of Goettingen, Germany; Jennifer Rexford, Princeton; Nick McKeown, Stanford; and others).
"The average cloud environment might have 50 dedicated servers to one admin, and what you really need to get to is 500 servers to one admin, or what happened in the case of Microsoft, 10,000 servers. Without automation we don't have speed and scale - the very reason we want to go to the cloud." (Microsoft)

"Even simple topologies take days or weeks to create. Workload placement and mobility are restricted by physical network limitations, and hardware dependencies require vendor-specific expertise. Network configuration is performed manually and maintenance is both expensive and resource-intensive." (VMware)
MAC (hardware) addresses:
- Assigned by the vendor of the interface card
- Cannot be aggregated across hosts in a LAN
[Figure: hosts with flat MAC addresses (mac1, mac2, ..., mac5) attached to a switch in a LAN]
IP addresses (hierarchical):
- Allocated by ICANN, regional registries, ISPs, and within individual organizations
- Variable-length prefix identified by a mask length
[Figure: two LANs, 1.2.3.0/24 (hosts 1.2.3.4, 1.2.3.7, 1.2.3.156) and 5.6.7.0/24 (hosts 5.6.7.8, 5.6.7.9, 5.6.7.212), connected by routers across a WAN]
Forwarding table: prefixes may be nested, and routers identify the longest matching prefix.
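To make longest-prefix match concrete, here is a minimal sketch in C; the prefixes and the linear-scan table layout are illustrative only (real routers use tries or TCAMs):

```c
#include <stdio.h>
#include <stdint.h>

/* One forwarding-table entry: network prefix, mask length, next hop. */
struct route {
    uint32_t prefix;       /* network address, host byte order */
    int      len;          /* prefix length in bits */
    const char *nexthop;
};

/* Example table: 1.2.3.0/24 is nested inside 1.0.0.0/8. */
static struct route table[] = {
    { 0x01000000,  8, "router A" },   /* 1.0.0.0/8   */
    { 0x01020300, 24, "router B" },   /* 1.2.3.0/24  */
    { 0x05060700, 24, "router C" },   /* 5.6.7.0/24  */
};

/* Return the next hop for the longest prefix that matches dst. */
static const char *lpm_lookup(uint32_t dst)
{
    const char *best = "default route";
    int best_len = -1;
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        uint32_t mask = table[i].len ? ~0u << (32 - table[i].len) : 0;
        if ((dst & mask) == table[i].prefix && table[i].len > best_len) {
            best = table[i].nexthop;
            best_len = table[i].len;
        }
    }
    return best;
}

int main(void)
{
    /* 1.2.3.4 matches both 1.0.0.0/8 and 1.2.3.0/24; the /24 wins. */
    printf("%s\n", lpm_lookup(0x01020304));
    return 0;
}
```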
Forwarding (data plane):
- Directing a data packet to an outgoing link
- An individual router using its forwarding table

Routing (control plane):
- Computing the paths that packets will follow
- Routers talking amongst themselves
- Each individual router creating its forwarding table
Shortest-path routing:
- Compute paths from a source u to all other nodes
- Cost of a path: the cost of the path through each link
- Store the next hop along the least-cost path (e.g., toward destination s)
[Figure: example topology with nodes u, v, w, x, y, z, s, t and link weights; from u, the next hop is (u,v) for v, y, and z, and (u,w) for w, x, s, and t]
Link-state routing:
- Flood the entire topology to all nodes
- Each node computes shortest paths itself, using Dijkstra's algorithm
[Figure: the same topology; every node runs Dijkstra to build its own forwarding table]
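A compact sketch of Dijkstra's algorithm over an adjacency matrix; the node count and link weights below are illustrative, not the exact topology from the figure:

```c
#include <stdio.h>
#include <limits.h>

#define N 6   /* number of nodes */

/* Compute least-cost distances from src to every node.
 * adj[i][j] is the link weight, or 0 if there is no link. */
static void dijkstra(int adj[N][N], int src, int dist[N], int prev[N])
{
    int done[N] = {0};
    for (int i = 0; i < N; i++) { dist[i] = INT_MAX; prev[i] = -1; }
    dist[src] = 0;

    for (int iter = 0; iter < N; iter++) {
        /* Pick the closest node not yet finalized. */
        int u = -1;
        for (int i = 0; i < N; i++)
            if (!done[i] && (u == -1 || dist[i] < dist[u])) u = i;
        if (dist[u] == INT_MAX) break;   /* remaining nodes unreachable */
        done[u] = 1;

        /* Relax u's outgoing links. */
        for (int v = 0; v < N; v++)
            if (adj[u][v] && dist[u] + adj[u][v] < dist[v]) {
                dist[v] = dist[u] + adj[u][v];
                prev[v] = u;   /* walk prev[] back to src for the next hop */
            }
    }
}

int main(void)
{
    int adj[N][N] = {
        /* u  v  w  x  y  z */
        {  0, 3, 2, 0, 0, 0 },   /* u */
        {  3, 0, 1, 4, 0, 0 },   /* v */
        {  2, 1, 0, 1, 5, 0 },   /* w */
        {  0, 4, 1, 0, 0, 3 },   /* x */
        {  0, 0, 5, 0, 0, 1 },   /* y */
        {  0, 0, 0, 3, 1, 0 },   /* z */
    };
    int dist[N], prev[N];
    dijkstra(adj, 0, dist, prev);
    for (int i = 0; i < N; i++)
        printf("node %d: cost %d via %d\n", i, dist[i], prev[i]);
    return 0;
}
```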
Problem: the same path for all tenants!
[Figure: the same topology with VM-a through VM-d attached; every flow follows the same shortest path]
Green VMs are paying more for higher-bandwidth networking, so we would like to support different paths!
How should link weights be set?
- Inversely proportional to link capacity?
- Proportional to propagation delay?
- Network-wide optimization based on traffic?
Traffic engineering as an optimization problem:
- Inputs: network topology, link capacities, traffic matrix
- Output: a setting of the link weights
- Objective: minimize the maximum-utilized link, or minimize a sum of link congestion
Transient disruptions when routes change:
- Triggered by a link weight change, or a node/link failure or recovery
- Nodes temporarily disagree about how to route
- Leading to transient loops and blackholes
Limitations of traffic engineering with link weights:
- We change weights instead of paths, a complex and indirect optimization problem
- We cannot control which router updates its routes first

And there is much more to manage than routing:
- Routing and forwarding
- Naming and addressing
- Access control
- Quality of service
- …
Software-Defined Networking (SDN):
- Decouples the control plane from the data plane

Images taken from materials of the Open Networking Foundation: https://www.opennetworking.org/
- All such protocols can be done in software, controlled by a central instance
- Scalable, easily manageable, better interoperability
Northbound interface:
- Connects applications with the control plane
- Allows for programming of routing, QoS, etc.

Southbound interface:
- Between the control and data planes
- Allows direct access to the forwarding plane

Controller:
- Sets up rules, actions, etc. for the network devices
- Core element of SDN
Benefits:
- Elastic resource allocation (e.g., to match QoS agreements)
- Distribution of the load on links (e.g., between backbone and application servers in SaaS)
- Scalability (no need to manually configure each of thousands, or even millions, of devices)
- Overhead reduction
- …and more

OpenFlow: specification by the Open Networking Foundation: https://www.opennetworking.org/images/stories/downloads/sdn-resources/onf-specifications/openflow/openflow-spec-v1.3.4.pdf (March 2014)
OpenFlow Protocol
[Figure: control programs A and B speak the OpenFlow protocol to the control path of an Ethernet switch, which programs the hardware data path]
[Figure: control programs A and B install rules in the flow table(s) of the packet-forwarding elements, e.g.:]
- "If header = p, send to port 4"
- "If header = ?, send to me"
- "If header = q, overwrite header with r, add header s, and send to ports 5,6"
Match: arbitrary bits in headers
- Match on any header field, but not packet data
- Allows "any" flow granularity

Action:
- Forward to port(s), drop, or send to the controller
- Overwrite header with a mask, push or pop headers
- Forward at a specific bit-rate

[Figure: a packet's header is matched against a bit pattern with wildcards, e.g., Match: 1000x01xx0101001x]
Switching - can customize based on known MAC addresses:
  Switch Port=*, MAC src=*, MAC dst=00:1f:.., Eth type=*, VLAN ID=*, IP Src=*, IP Dst=*, IP Prot=*, TCP sport=*, TCP dport=*  →  Action: port6

Flow switching - fine-grained switching for each TCP connection:
  Switch Port=port3, MAC src=00:20.., MAC dst=00:1f.., Eth type=0800, VLAN ID=vlan1, IP Src=1.2.3.4, IP Dst=5.6.7.8, IP Prot=4, TCP sport=17264, TCP dport=80  →  Action: port6

Firewall - not just switching, but also dropping/rate limiting/etc.:
  Switch Port=*, MAC src=*, MAC dst=*, Eth type=*, VLAN ID=*, IP Src=*, IP Dst=*, IP Prot=*, TCP sport=*, TCP dport=22  →  Action: drop
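A minimal sketch of how a switch could match a packet against a rule set like the ones above; the field subset, wildcard encoding, and "first match wins" ordering are simplifications (real OpenFlow tables use explicit priorities, per-field masks, and TCAM hardware):

```c
#include <stdio.h>

#define WILD 0   /* 0 means "match anything" for these illustrative fields */

struct pkt  { unsigned in_port, eth_type, ip_src, ip_dst, tcp_dport; };
struct rule { unsigned in_port, eth_type, ip_src, ip_dst, tcp_dport;
              const char *action; };

/* A field matches if the rule leaves it wildcarded or the values are equal. */
static int field_ok(unsigned r, unsigned p) { return r == WILD || r == p; }

static const char *lookup(const struct rule *rules, int n, const struct pkt *p)
{
    for (int i = 0; i < n; i++)        /* rules assumed sorted by priority */
        if (field_ok(rules[i].in_port,   p->in_port)  &&
            field_ok(rules[i].eth_type,  p->eth_type) &&
            field_ok(rules[i].ip_src,    p->ip_src)   &&
            field_ok(rules[i].ip_dst,    p->ip_dst)   &&
            field_ok(rules[i].tcp_dport, p->tcp_dport))
            return rules[i].action;
    return "send to controller";        /* table miss */
}

int main(void)
{
    struct rule rules[] = {
        { WILD, WILD, WILD, WILD, 22, "drop" },                    /* firewall  */
        { 3, 0x0800, 0x01020304, 0x05060708, 80, "output port6" }, /* one flow  */
    };
    struct pkt p = { 3, 0x0800, 0x01020304, 0x05060708, 80 };
    printf("%s\n", lookup(rules, 2, &p));   /* prints "output port6" */
    return 0;
}
```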
[Figure sequence: host H2 sends a packet destined for H4. The first switch has no matching flow entry, so it sends the packet to the controller as a Packet-In; the controller replies with a Packet-Out (action: send out eth2) and installs flow rules along the path; subsequent H2 to H4 packets match the installed rules and are forwarded entirely in the data plane without contacting the controller]
Data plane (switches): maintains a flow table
Control plane: defines flow table rules for switches
[Figure: a client and a host connected through three SDN switches, all managed by a single SDN controller]
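A toy sketch of the reactive controller logic from the walkthrough above; this is generic pseudologic written in C, not the API of any particular controller platform, and the route-selection function is purely illustrative:

```c
#include <stdio.h>

#define MAX_RULES 16

struct rule { int dst_host; int out_port; };

static struct rule flow_table[MAX_RULES];   /* the switch's flow table */
static int n_rules = 0;

/* Controller side: pick an output port for a destination.
 * Here "topology knowledge" is just a made-up mapping. */
static int controller_route(int dst_host) { return dst_host % 4; }

static void controller_packet_in(int dst_host)
{
    int port = controller_route(dst_host);
    if (n_rules < MAX_RULES)                       /* Flow-Mod: install rule */
        flow_table[n_rules++] = (struct rule){ dst_host, port };
    printf("controller: install dst=H%d -> port %d, Packet-Out on port %d\n",
           dst_host, port, port);
}

/* Switch side: look up the flow table; on a miss, raise a Packet-In. */
static void switch_forward(int dst_host)
{
    for (int i = 0; i < n_rules; i++)
        if (flow_table[i].dst_host == dst_host) {
            printf("switch: hit, forward to port %d\n", flow_table[i].out_port);
            return;
        }
    printf("switch: miss, Packet-In to controller\n");
    controller_packet_in(dst_host);
}

int main(void)
{
    switch_forward(4);   /* first H2 -> H4 packet: miss, controller installs rule */
    switch_forward(4);   /* later packets: handled entirely in the data plane */
    return 0;
}
```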
Network Function Virtualization (NFV):
- Make an efficient, customizable data plane
- Take functions that are traditionally hardware middleboxes: proxies, IDS, DPI, etc.
- Run network functions (NFs) in virtual machines
- Making them easier to deploy and manage
[Figure: router and switch functions running in software above a virtualization layer]
The traditional approach: perform network functionality on custom ASICs.
From hardware routers and switches to software:
- PacketShader [Han, SIGCOMM '10]
- Netmap [Rizzo, ATC '12] and DPDK: fast packet processing on commodity 10 Gbps NICs
- ClickOS [Martins, NSDI '14] and NetVM [Hwang, NSDI '14]
Examples of network functions, AKA "middleboxes":
- Switches, routers, firewalls, NAT
- Intrusion Detection Systems (IDS)
- Intrusion Prevention Systems (IPS), which act on traffic flows
- Cellular functions (Evolved Packet Core, EPC)
- Proxies, caches, load balancers, etc.
[Figure: on a standard Linux H/W platform, packets reaching user applications incur packet copies, interrupt handling, and system calls]
Can you handle being interrupted 60 million times per second?
Recent NICs and OS support allow user space apps to directly access packet data
DPDK:
- High-performance I/O library
- Poll-mode driver reads packets from the NIC
- Packets bypass the OS and are copied directly into user-space memory
- Low-level library: does not provide protocol stacks or NF management
About DPDK:
- Where to find it:
- What to use it for: … packet data
- Why try it:
- Alternatives:
What is "line rate"? At 40 Gbps line rate (or 4x10G), the packet arrival rate sets the budget to process each packet:

  Packet size:           64 bytes                 1024 bytes
  40G packets/second:    59.5 million each way    4.8 million each way
  Packet arrival rate:   16.8 ns                  208.8 ns
  2 GHz clock cycles:    33 cycles                417 cycles

[Chart: packets per second vs. packet size (64 B to 1504 B), contrasting typical server packet sizes with network infrastructure packet sizes]
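Where those per-packet budgets come from, assuming roughly 20 B of Ethernet overhead per packet on the wire (8 B preamble + 12 B inter-frame gap):

- 64 B packets: (64 + 20) x 8 = 672 bits per packet; 40 Gbps / 672 bits ≈ 59.5 Mpps; 1 / 59.5 M ≈ 16.8 ns per packet; 16.8 ns x 2 GHz ≈ 33 cycles
- 1024 B packets: (1024 + 20) x 8 = 8352 bits per packet; 40 Gbps / 8352 bits ≈ 4.8 Mpps; ≈ 208.8 ns per packet; ≈ 417 cycles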
Overheads of the standard kernel path, and the DPDK techniques that address them:
- Interrupt and context-switch overhead → polling
- Kernel/user overhead → user-mode driver
- Core-to-thread scheduling overhead → pthread affinity
- 4K paging overhead → huge pages
- PCI bridge I/O overhead → high-throughput bulk-mode I/O calls
- Plus lockless inter-core communication
Interrupts are very distracting! The CPU has to stop doing useful work to handle incoming packets. Coalescing interrupts helps, but still causes problems.
[Figure: execution ping-pongs between App and Kernel on every interrupt (interrupt and context-switch overhead)]
Polling: continuously loop looking for new packet arrivals. Trade-off?
[Figure: with polling, the app busy-waits in user space checking for packets, avoiding interrupts and context switches]
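A minimal sketch of a DPDK-style poll-mode receive loop. EAL initialization, port configuration, and mempool creation are omitted; a real application must do all of that setup first:

```c
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Busy-poll one RX queue forever: no interrupts and no system calls on the
 * data path; the core spends 100% of its cycles checking for packets. */
static void rx_loop(uint16_t port_id)
{
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        /* Ask the NIC driver for up to BURST_SIZE packets (non-blocking). */
        uint16_t n = rte_eth_rx_burst(port_id, 0 /* queue */, bufs, BURST_SIZE);

        for (uint16_t i = 0; i < n; i++) {
            /* ... inspect payload via rte_pktmbuf_mtod(bufs[i], ...) ... */
            rte_pktmbuf_free(bufs[i]);   /* return the buffer to its mempool */
        }
        /* If n == 0 we simply loop again: the trade-off is burned CPU cycles
         * in exchange for low latency and no interrupt overhead. */
    }
}
```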
In the standard kernel path:
- The NIC driver operates in kernel mode
- Packets are DMAed into kernel memory
- Packet data is then copied into user space for the application
- Applications use system calls to interface with the OS
Why is copying so bad?
[Figure: standard kernel I/O path (kernel/user overhead): the NIC DMAs packet data into descriptor rings and socket buffers (skb's) in kernel memory; the kernel-space driver handles CSRs, configuration descriptors, and interrupts; the network stack processes the packet; and the data is finally copied to the application across the system-call boundary]
From Intel DPDK University Lecture
User-mode driver:
- Gives the application direct access to the NIC
- Configures the NIC to DMA data directly into user-space memory
[Figure: DPDK path (avoiding kernel/user overhead): a small UIO driver in kernel space exposes the NIC, the DPDK poll-mode driver (PMD) runs in user space as part of the DPDK application, and the NIC DMAs packet data and descriptor rings directly into user-space memory, bypassing the kernel stack and system calls. From the Intel DPDK University Lecture]
The Linux networking stack has a lot of extra components; an NFV middlebox doesn't use all of this: TCP, UDP, sockets, …
NFV middleboxes just need the packet data.
Linux Scheduler can move threads between cores
[Figure: the scheduler migrates App threads across cores, causing core-to-thread scheduling overhead]
Pin threads and dedicate cores
[Figure: each App thread pinned to its own dedicated core, avoiding core-to-thread scheduling overhead]
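A minimal sketch of pinning a thread to a core with the Linux pthread affinity API; the core number chosen here is illustrative:

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Pin the calling thread to a single CPU core so the scheduler
 * never migrates it (and its cached state) to another core. */
static int pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *worker(void *arg)
{
    int core = *(int *)arg;
    if (pin_to_core(core) != 0)
        fprintf(stderr, "failed to pin to core %d\n", core);
    /* ... run the packet-processing loop on this dedicated core ... */
    return NULL;
}

int main(void)
{
    pthread_t t;
    int core = 2;                      /* dedicate core 2 to this worker */
    pthread_create(&t, NULL, worker, &core);
    pthread_join(t, NULL);
    return 0;
}
```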
4K paging overhead:
- With 4 KB pages, touching packet buffers at line rate means walking a huge number of page table entries every second
- With 1 GB huge pages, only a handful of entries are needed every second!
https://courses.cs.washington.edu/courses/cse378/00au/CSE378-00.Lec28/sld004.htm
How big is the TLB?
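A minimal sketch of allocating a packet-buffer pool backed by huge pages on Linux. It assumes huge pages have already been reserved by the administrator (e.g., via /proc/sys/vm/nr_hugepages); DPDK performs this kind of setup for you:

```c
#include <sys/mman.h>
#include <stdio.h>

#define POOL_SIZE (2UL * 1024 * 1024 * 1024)   /* 2 GB packet-buffer pool */

int main(void)
{
    /* MAP_HUGETLB asks the kernel to back this mapping with huge pages,
     * so the whole pool needs only a handful of TLB entries instead of
     * roughly half a million 4 KB entries. */
    void *pool = mmap(NULL, POOL_SIZE, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (pool == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");   /* likely no huge pages reserved */
        return 1;
    }
    /* ... carve the pool into fixed-size packet buffers ... */
    munmap(pool, POOL_SIZE);
    return 0;
}
```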
- Thread synchronization is expensive
- Producer/consumer architecture: one thread enqueues packets (producer) while another dequeues them (consumer)
- Use lock-free communication between the cores
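A minimal sketch of a single-producer/single-consumer lock-free ring, in the spirit of the rings DPDK uses between cores; this is a simplified illustration, not DPDK's actual implementation:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 256            /* must be a power of two */

struct ring {
    void *slots[RING_SIZE];
    _Atomic unsigned head;       /* written only by the producer */
    _Atomic unsigned tail;       /* written only by the consumer */
};

/* Producer: enqueue one packet pointer; returns false if the ring is full. */
static bool ring_enqueue(struct ring *r, void *pkt)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;                          /* full */
    r->slots[head & (RING_SIZE - 1)] = pkt;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer: dequeue one packet pointer; returns false if the ring is empty. */
static bool ring_dequeue(struct ring *r, void **pkt)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;                          /* empty */
    *pkt = r->slots[tail & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```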
- The PCIe bus uses messaging protocols for the CPU to interact with devices (NICs)
- Each message incurs some overhead
- Better to make larger bulk requests over PCIe
- DPDK helps batch requests into bulk operations
Trade-offs?
DPDK provides efficient I/O… but that's about it. It doesn't help with NF management or orchestration.
Chain together functionality to build more complex services
[Figure: a simple service chain: Firewall → NAT → Router → Server]
Chain together functionality to build more complex services
[Figure: a more complex service chain combining a firewall, IDS, router, NAT, cache, transcoder, DPI, and a traffic mirror]
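A toy sketch of expressing a service chain as an ordered list of packet-handler functions; this is purely illustrative, since real platforms such as OpenNetVM steer packets between separate NF processes over shared-memory rings rather than calling functions in one process:

```c
#include <stdio.h>
#include <stdbool.h>

struct pkt { unsigned dst_port; };

/* Each NF inspects or modifies the packet; returning false drops it. */
typedef bool (*nf_fn)(struct pkt *);

static bool firewall(struct pkt *p) { return p->dst_port != 22; }        /* block SSH  */
static bool nat(struct pkt *p)      { p->dst_port = 8080; return true; } /* rewrite    */
static bool router(struct pkt *p)   { printf("route to port %u\n", p->dst_port); return true; }

int main(void)
{
    nf_fn chain[] = { firewall, nat, router };   /* Firewall -> NAT -> Router */
    struct pkt p = { 80 };

    for (unsigned i = 0; i < sizeof(chain) / sizeof(chain[0]); i++)
        if (!chain[i](&p)) {                     /* an NF dropped the packet */
            printf("packet dropped by NF %u\n", i);
            return 0;
        }
    return 0;
}
```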
SDN-aware: Controller can dictate flow rules for NFs
[Figure: OpenNetVM architecture, all in user space: a DPDK-based NF Manager runs RX, TX, and manager threads and moves packets between NIC 1/NIC 2 and the NFs using its flow table (FT); each NF runs in its own container (Containers 1-4), linking against NFlib, so 3rd-party library code can be wrapped as an NF; packets, flow tables, service chains, and ring buffers live in shared memory. Made at GW!]
DPDK only helps with raw packet I/O. It doesn't provide any protocol stacks!
Linux TCP stack is not designed for high performance
[Chart: TCP connection setup performance, connections/sec (x 10^5) vs. number of CPU cores, on Linux 3.10.16 with an Intel Xeon E5-2690 and an Intel 10 Gbps NIC: instead of scaling, performance melts down]
[Chart: CPU usage breakdown for a web server (Lighttpd) serving a 64-byte file on Linux 3.10: 83% of CPU usage is spent inside the kernel: kernel (without TCP/IP) 45%, packet I/O 4%, TCP/IP 34%, application 17%]
Figures from Jeong's mTCP talk at NSDI '14
mTCP: a user-space TCP stack
Key ideas:
- Pair each application thread with a dedicated, per-core mTCP thread (see the architecture figure below)
[Chart: transactions/sec (x 10^5) vs. number of CPU cores, comparing Linux (shared fd in process), REUSEPORT (shared listen socket), MegaPipe*, and mTCP]
* [OSDI '12] MegaPipe: A New Programming Interface for Scalable Network I/O, Berkeley
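A sketch of a per-core mTCP accept loop. The mtcp_* calls mirror the BSD socket/epoll API as described in the mTCP paper and source, but the exact signatures shown here are approximate and should be checked against the mTCP headers, not treated as authoritative:

```c
#include <sys/socket.h>
#include <mtcp_api.h>
#include <mtcp_epoll.h>

/* One of these loops runs per core; each core has its own TCP context,
 * so there is no shared state (and no locking) between cores. */
void run_core(int core)
{
    mctx_t mctx = mtcp_create_context(core);        /* per-core mTCP context */
    int ep = mtcp_epoll_create(mctx, 1024);
    int lsock = mtcp_socket(mctx, AF_INET, SOCK_STREAM, 0);
    /* ... mtcp_bind() and mtcp_listen() as with normal sockets ... */

    struct mtcp_epoll_event evs[1024];
    for (;;) {
        int n = mtcp_epoll_wait(mctx, ep, evs, 1024, -1);
        for (int i = 0; i < n; i++) {
            if (evs[i].data.sockid == lsock)
                mtcp_accept(mctx, lsock, NULL, NULL);   /* new connection */
            /* else: mtcp_read()/mtcp_write() on the ready socket */
        }
    }
}
```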
Inefficient small-packet processing in the kernel.
[Chart: per-core scaling results; setup: Linux 3.10.12, Intel Xeon E5-2690, 32 GB RAM, Intel 10 Gbps NIC. Figure from Jeong's mTCP talk at NSDI '14]
Responding to a packet arrival only incurs a context switch, not a full system call
[Figure: Linux TCP vs. mTCP. With Linux TCP, the application thread uses BSD sockets and Linux epoll and crosses into the kernel (kernel TCP, packet I/O, NIC device driver) via system calls; with mTCP, the application thread talks to a per-core mTCP thread (mTCP sockets, mTCP epoll) built on a user-level packet I/O library, so only a context switch is needed. Figure from Jeong's mTCP talk at NSDI '14]
Application performance with mTCP:
- Web server (Lighttpd), workload from the SpecWeb2009 set
  [Chart: throughput (Gbps): Linux 1.24, REUSEPORT 1.79, MegaPipe 2.69, mTCP 4.02]
- SSL proxy (SSLShader): 1024-bit RSA, 128-bit AES, HMAC-SHA1
  [Chart: transactions/sec (x 10^3) vs. number of concurrent flows (4K / 8K / 16K): Linux 26,762 / 28,208 / 27,725; mTCP 31,710 / 36,505 / 37,739]
Slide from Jeong's mTCP talk, NSDI '14
Most middleboxes deal with TCP traffic:
- 95.7% of traffic is TCP (vs. UDP, etc.) [1]
- Virtual appliances deployed in service provider data centers [2]: web security gateway 47%, mail security gateway 63%, web application firewall 67%

[1] "Comparison of Caching Strategies in Modern Cellular Backhaul Networks", ACM MobiSys 2013.
[2] IHS Infonetics Cloud & Data Center Security Strategies & Vendor Leadership: Global Service Provider Survey, Dec. 2014.
Slide from Jamshed's mOS talk, NSDI '17
What if your middlebox (not an endpoint server) needs TCP processing? Proxies, L4/L7 load balancers, DPI, IDS, etc.
Options today, each of which tangles flow management in with their IDS logic:
- Borrow code from an open-source IDS (e.g., Snort, Suricata)
- Borrow code from an open-source kernel (e.g., Linux/FreeBSD)
- Implement your own flow management code
Table from Jamshed's mOS talk, NSDI '17
mOS: a reusable protocol stack for middleboxes
- Key idea: allow customizable processing based on flow-level "events"
- Separately track client-side and server-side state
[Figure: mOS stack emulation. The middlebox runs both a server-side and a client-side TCP stack; as SYN, SYN/ACK, and DATA/ACK segments flow between the TCP client and TCP server, it mirrors each endpoint's state transitions (CLOSED → SYN_SENT, LISTEN → SYN_RCVD → ESTABLISHED) and maintains a receive buffer]
Figure from Jamshed’s mOS talk at NSDI 17
Event types: base events (built into the stack) and user events defined on top of them.
[Chart: porting existing tools to mOS touches only a small fraction of their code; lines modified vs. total lines: Snort3 2,104; nDPI 765; PRADS 615]
[Chart: mOS throughput (Gbps) on 1, 4, and 16 CPU cores for two sample monitors (counting packets, searching for a string) with 64 B and 8 KB files, scaling to roughly 53.0 and 42.5 Gbps at 16 cores]
Figures from Jamshed's mOS talk at NSDI '17
We have ported mOS/mTCP to run on OpenNetVM:
- Allows deployment of mixed NFs and endpoints
- Allows several different mTCP endpoints on the same host
[Figure: the OpenNetVM NF Manager hosting regular NFs alongside an mTCP-based lighttpd endpoint and an mOS proxy. Made at GW!]
Mixing NFs and endpoints blurs the line between the application and the network.
Guyue Liu – George Washington University
✓ Consolidate Stack Processing
✓ Customizable Stack Modules
✓ Unified Event Interface
Microboxes = µStack + µEvent
= stack snapshot + parallel stacks + parallel events + event hierarchy + publish/subscribe interface
[Figure: Microboxes architecture; NFs share consolidated µStack modules and interact through µEvent publish/subscribe channels. Made at GW!]