Bringing the Power of eBPF to Open vSwitch
Linux Plumbers Conference 2018
William Tu, Joe Stringer, Yifeng Sun, Yi-Hung Wei
VMware Inc. and Cilium.io
Outline
Introduction and Motivation
OVS-eBPF Project
OVS-AF_XDP Project
Fast Path, Slow Path, and Datapath
[Diagram: OVS architecture. An SDN controller speaks OpenFlow to ovs-vswitchd (slow path, in userspace); the OVS kernel module (fast path, in the kernel) sits alongside the IP/routing stack and sockets, above the driver and hardware.]
Device RX Hook
[eBPF background: programs attach at hooks such as the device RX hook; availability of features depends on kernel versions; data is passed from/to the kernel through eBPF maps.]
OVS-eBPF datapath: Parse -> Lookup -> Actions
Goal: implement the OVS datapath entirely with eBPF. ovs-vswitchd manages the eBPF program, and state is shared between the eBPF program and eBPF maps.
[Diagram: OVS-eBPF architecture. The eBPF datapath attaches at the TC hook in the kernel, alongside IP/routing, above the driver and hardware; the slow path stays in userspace, the fast path in the kernel.]
Difficulties
OVS kernel datapath
Slow Path: ovs-vswitchd receives the upcalled packet, performs flow translation, and programs the flow entry into the flow table in the OVS kernel module, which then executes the actions on the packet.
Fast Path: Parser -> Flow Table (EMC + Megaflow) -> Actions; upcalls and flow installation both go over netlink.
EMC: Exact Match Cache
OVS-eBPF datapath
Slow Path: on a miss, the eBPF datapath upcalls the packet and its metadata to ovs-vswitchd (perf ring buf -> netlink); ovs-vswitchd performs flow translation and programs the flow entry into an eBPF map, then triggers the lookup again.
Fast Path: Parser -> Flow Table (eBPF hash map) -> Actions; flow installation goes netlink TLV -> fixed array -> eBPF map.
Limitation on flow installation: the TLV format is currently not supported by the BPF verifier.
Solution: convert the TLV into a fixed-length array.
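The conversion can be sketched as follows. This is a minimal userspace illustration, not OVS code: the TLV layout is modeled on netlink attributes (2-byte length covering the header, 2-byte type, then payload), and the names, sizes, and the little-endian host assumption in the test data are all this sketch's own. The point is the shape of the result: a fixed-size array, indexed by attribute type, that a BPF verifier can reason about.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV header modeled on netlink attributes. */
struct tlv_hdr {
    uint16_t len;   /* total length, including this header */
    uint16_t type;
};

#define MAX_ATTR_TYPE 16
#define MAX_ATTR_LEN  32

/* Fixed-length layout: one fixed-size slot per attribute type. */
struct flat_attrs {
    uint8_t present[MAX_ATTR_TYPE];
    uint8_t data[MAX_ATTR_TYPE][MAX_ATTR_LEN];
};

/* Walk the TLV buffer and copy each attribute into its fixed slot.
 * Returns 0 on success, -1 on malformed or oversized input. */
static int tlv_to_fixed(const uint8_t *buf, size_t buflen,
                        struct flat_attrs *out)
{
    size_t off = 0;
    memset(out, 0, sizeof(*out));
    while (off + sizeof(struct tlv_hdr) <= buflen) {
        struct tlv_hdr h;
        memcpy(&h, buf + off, sizeof(h));
        if (h.len < sizeof(h) || off + h.len > buflen)
            return -1;
        size_t payload = h.len - sizeof(h);
        if (h.type >= MAX_ATTR_TYPE || payload > MAX_ATTR_LEN)
            return -1;
        memcpy(out->data[h.type], buf + off + sizeof(h), payload);
        out->present[h.type] = 1;
        off += h.len;  /* real netlink additionally pads to 4 bytes */
    }
    return off == buflen ? 0 : -1;
}
```

Bounded sizes (`MAX_ATTR_TYPE`, `MAX_ATTR_LEN`) are the essential property: the verifier can then prove every map access in the eBPF program is in range.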
DP actions are a list of actions to execute on the packet. Example:
set(tunnel(tun_id=0x5,src=2.2.2.2,dst=1.1.1.1,ttl=64,flags(df|key))),output:1
A flow carries a list of actions to execute on the packet: FlowTable -> Act1 -> Act2 -> Act3 -> …
Challenge: the list is variable-length, and an eBPF program cannot iterate it in one unbounded pass.
Solution: one eBPF program per action; after each action, look up the remaining actions in a map and tail-call into the program for the next action.
[Diagram: FlowTable -> eBPF Act1 -> (map lookup, tail call) -> eBPF Act2 -> (map lookup, tail call) -> …]
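The chain above can be sketched as plain C. This is a userspace analogy with assumed names, not the OVS-eBPF code: in a real datapath each handler would be a separate eBPF program in a `BPF_MAP_TYPE_PROG_ARRAY` and the jump would be `bpf_tail_call()` (which never returns on success); here ordinary function pointers play that role.

```c
#include <assert.h>

struct pkt { int mark; int out_port; };

enum act_type { ACT_SET_MARK, ACT_OUTPUT, ACT_END };

struct ctx {
    struct pkt *pkt;
    const enum act_type *actions; /* per-flow action list (an eBPF map in reality) */
    int idx;                      /* next action to execute */
};

static void dispatch(struct ctx *c); /* the "tail call" into the next program */

/* Each handler performs its action, then hands off to the next one. */
static void act_set_mark(struct ctx *c) { c->pkt->mark = 1;     c->idx++; dispatch(c); }
static void act_output(struct ctx *c)   { c->pkt->out_port = 1; c->idx++; dispatch(c); }

/* Stand-in for the eBPF program array, indexed by action type. */
static void (*const prog_array[])(struct ctx *) = { act_set_mark, act_output };

static void dispatch(struct ctx *c)
{
    enum act_type t = c->actions[c->idx];
    if (t != ACT_END)
        prog_array[t](c); /* in eBPF: bpf_tail_call(ctx, &prog_array, t) */
}
```

The design choice this mirrors: no loop ever appears in any single program, so each program stays small and verifiable, at the cost of one map lookup and tail call per action.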
Evaluation: a 14.88 Mpps sender injects packets into one port; we measure the receiving packet rate at the other port.
Setup: kernel 4.9-rc3 on the OVS server; 16-core Intel Xeon E5 2650 2.4 GHz, 32 GB memory; DPDK packet generator; Intel X3540-AT2 dual-port 10G NIC.
[Diagram: sender -> eth0 ingress -> BPF + eBPF datapath (br0) -> eth1 egress]
eBPF DP Actions                          Mpps
Redirect (no parser, lookup, actions)    1.90
Output                                   1.12
Set dst_mac + Output                     1.14
Set GRE tunnel + Output                  0.48

OVS Kernel DP Actions                    Mpps
Output                                   1.34
Set dst_mac + Output                     1.23
Set GRE tunnel + Output                  0.57
All measurements are based on single flow, single core.
Features
Lessons Learned
1. Reimplement all features in userspace
2. Performance
Userspace Datapath
[Diagram: OVS with the DPDK library — SDN controller on top, ovs-vswitchd with both slow and fast path in userspace, directly above the hardware.]
Another datapath implementation in userspace.

AF_XDP: a socket family that delivers raw frames at high speed from the driver level. Userspace and kernel share packet memory and exchange descriptors over Rx/Tx and Fill/Completion rings, achieving line rate (14 Mpps)! (From "DPDK PMD for AF_XDP".)
Goal: use AF_XDP as a fast packet I/O channel to the userspace OVS datapath, keeping the datapath in userspace.
[Diagram: AF_XDP architecture. Kernel: driver + XDP beneath the network stacks; user space: the OVS userspace datapath connected through an AF_XDP socket; hardware below.]
umem memory region: multiple 2 KB chunk elements; ring descriptors point to umem elements.
Rx Ring: users receive packets. Tx Ring: users send packets.
Fill Ring: for the kernel to receive packets. Completion Ring: for the kernel to signal send complete.
One Rx/Tx ring pair per AF_XDP socket; one Fill/Completion ring pair per umem region.
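The descriptor rings above share one mechanic, which this minimal sketch illustrates (names are assumed; the real layouts live in <linux/if_xdp.h>): a single-producer/single-consumer ring whose producer and consumer indices grow without wrapping, with the slot chosen by masking against a power-of-two size.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8            /* must be a power of two */

struct ring {
    uint32_t producer;         /* advanced by the producing side */
    uint32_t consumer;         /* advanced by the consuming side */
    uint64_t desc[RING_SIZE];  /* descriptors: addresses into umem */
};

/* Producer side: publish one umem address into the ring. */
static int ring_put(struct ring *r, uint64_t umem_addr)
{
    if (r->producer - r->consumer == RING_SIZE)
        return -1;                               /* ring full */
    r->desc[r->producer & (RING_SIZE - 1)] = umem_addr;
    r->producer++;
    return 0;
}

/* Consumer side: take the oldest descriptor out of the ring. */
static int ring_get(struct ring *r, uint64_t *umem_addr)
{
    if (r->producer == r->consumer)
        return -1;                               /* ring empty */
    *umem_addr = r->desc[r->consumer & (RING_SIZE - 1)];
    r->consumer++;
    return 0;
}
```

In the real AF_XDP rings the two indices live in memory mapped between kernel and userspace, so each side needs memory barriers around them; this sketch omits that concurrency detail.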
Receive walkthrough (umem of 8 chunk elements, addresses 1-8; X marks a chunk in use):
(1) Umem mempool = {1, 2, 3, 4, 5, 6, 7, 8}; Rx and Fill rings empty.
(2) Userspace GETs four elements and programs them into the Fill ring; Fill ring = 1 2 3 4, mempool = {5, 6, 7, 8}.
(3) The kernel receives four packets, puts them into the four umem chunks, and transitions the descriptors to the Rx ring for users; Rx ring = 1 2 3 4.
(4) Userspace GETs four more elements and programs the Fill ring (so the kernel can keep receiving packets); Fill ring = 5 6 7 8, mempool = {}.
(5) OVS userspace processes the packets.
(6) OVS userspace finishes packet processing and recycles the chunks to the mempool; mempool = {1, 2, 3, 4}; back to state (2).
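The mempool side of this cycle can be sketched as a small stack of free chunk addresses (all names and sizes here are this sketch's assumptions, chosen to match the ptr_array-with-top idea described below): GET pops an address to program into the Fill ring; recycling after processing pushes it back.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CHUNKS 8
#define CHUNK_SIZE 2048

/* Stack-style pool of free umem chunk addresses. */
struct umem_pool {
    uint64_t free_addrs[NUM_CHUNKS];
    int top;                       /* number of free chunks */
};

static void pool_init(struct umem_pool *p)
{
    p->top = 0;
    for (int i = 0; i < NUM_CHUNKS; i++)
        p->free_addrs[p->top++] = (uint64_t)i * CHUNK_SIZE;
}

/* GET a free chunk to program into the Fill ring. */
static int pool_get(struct umem_pool *p, uint64_t *addr)
{
    if (p->top == 0)
        return -1;                 /* all chunks in use */
    *addr = p->free_addrs[--p->top];
    return 0;
}

/* Recycle a chunk once the packet in it has been processed. */
static void pool_put(struct umem_pool *p, uint64_t addr)
{
    p->free_addrs[p->top++] = addr;
}
```

O(1) get/put with no searching is what makes this safe on the per-packet path.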
umem memory pool management: three designs were considered.
Idea: keep the addresses of the free 2K umem chunks in a ptr_array with a top index; GET pops from the top, recycling pushes back.
[Diagram: multiple 2K umem chunk memory region; ptr_array with top pointer; X marks chunks in use.]
Packet metadata: two designs were considered. The chosen one stores packet metadata in another memory region that maps one-to-one to the umem chunks.
[Diagram: packet data in the 2K-chunk umem region; packet metadata in a separate, one-to-one mapped region.]
Evaluation: a 19 Mpps sender (Intel XL710 40GbE) injects packets into one port; we measure the receiving packet rate at the other port.
Setup: kernel 4.19-rc3 and OVS 2.9; 16-core Intel Xeon E5 2650 2.4 GHz, 32 GB memory; DPDK packet generator; Netronome NFP-4000 NIC.
[Diagram: sender -> ingress eth0 -> AF_XDP userspace datapath (br0) -> egress]
Experiments and Results:

          XDPSOCK    OVS-AFXDP    Linux Kernel
rxdrop    19 Mpps    19 Mpps      < 2 Mpps
l2fwd     17 Mpps    14 Mpps      < 2 Mpps
Future Work
Discussion
Comparison:

                        OVS-eBPF                 OVS-AF_XDP               OVS Kernel Module
Maintenance cost        Low                      Low                      High
Performance             Comparable with kernel   High, with cost of CPU   Standard (< 2 Mpps)
Development effort      High                     Low                      Medium
New feature deployment  Easy                     Easy                     Hard due to ABI change
Safety                  High due to verifier     Depends on reviewers     Depends on reviewers
Questions?